Backup one S3 bucket to another bucket by way of EC2

Site Admin No Comments »

Let’s say you want to make a backup of one of your S3 buckets because you’re always a little worried that one of your scripts might burn it, or you might get a little sloppy with S3Fox. You just need a decent safety net to prevent things from going all to hell. In my case, I just want a backup bucket that has a decently recent copy of everything from my main bucket.

Actually accomplishing this is a bit tough, though, as there is no “copy bucket” function in S3. So, you have to resort to pulling all the files somewhere and then pushing them back to S3. Very roundabout, and potentially very expensive due to paying the in/out bandwidth fees. That is, unless you use an EC2 node, where bandwidth to and from S3 is free. That’s what I did, and it’s humming away as I type this. So far, so good.

Boot up your EC2 node

To accomplish this, you’re going to need to boot an EC2 node. I suggest using ElasticFox and firing up one of Alestic’s Ubuntu AMIs. I used ami-cb8d61a2, which is an Ubuntu 8.04 LTS image.

I made sure to boot one with EBS. To be honest, I don’t know what EBS is exactly, but I needed to back up a 20GB bucket, and I don’t think the storage on the default instances is enough for this. With the AMI I loaded, there was a directory at /mnt with 150GB of storage, so I was ready to go.

Start Syncing

Here’s a simple bash script I wrote to handle most of this.

I really don’t recommend just running the script. Instead, read it, understand it, and run the commands by hand. I’m definitely no bash scripting genius, so it’s likely I’ve messed something up.

Oh, and in case it’s not clear…NO WARRANTY. If this script burns you, I don’t accept any liability.

#!/bin/bash

# Setup your AWS credentials
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXX

# Get the latest s3sync
# (work from the home directory so the download matches the path tar expects)
cd /home/ubuntu
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
tar xzf /home/ubuntu/s3sync.tar.gz

# Must make a directory on the EBS volume, using root
# sudo mkdir /mnt/bucket_backup
# sudo chown ubuntu:ubuntu /mnt/bucket_backup
ln -s /mnt/bucket_backup /home/ubuntu/bucket_backup

# Run the sync
nohup s3sync/s3sync.rb -r --progress --make-dirs main_bucket: bucket_backup/

# Sync back to S3
nohup s3sync/s3sync.rb -r --delete --progress --make-dirs bucket_backup/ backup_bucket:

Waiting…

Right now, I’m getting transfer speeds of roughly 1MB/s. For my 20GB bucket, it’s going to take 5 or 6 hours each way to back it up. At current EC2 prices, that should run me about $1. Not bad. Perhaps if I had been more careful to make sure I loaded the EC2 node in the correct data center it would go faster. I dunno. I don’t mind waiting.
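For anyone who wants to plug in their own numbers, here is a back-of-the-envelope sketch of that estimate in Ruby. The bucket size and speed come from above; the hourly price and the "billed per started hour" rule are assumptions for illustration, not quoted EC2 rates.

```ruby
# Rough estimate of how long the two-way sync takes and what it costs.
BUCKET_GB      = 20     # bucket size from the post
SPEED_MB_PER_S = 1.0    # observed transfer speed from the post
HOURLY_PRICE   = 0.10   # ASSUMPTION: dollars per instance-hour, adjust to your instance

# Hours needed to move size_gb gigabytes at speed_mb_per_s megabytes/second.
def transfer_hours(size_gb, speed_mb_per_s)
  size_mb = size_gb * 1024.0
  size_mb / speed_mb_per_s / 3600.0
end

# Cost of pulling the bucket down and pushing it back up.
# ASSUMPTION: every started hour is billed in full.
def round_trip_cost(size_gb, speed_mb_per_s, hourly_price)
  billed_hours = (transfer_hours(size_gb, speed_mb_per_s) * 2).ceil
  billed_hours * hourly_price
end

puts format('one way: %.1f h', transfer_hours(BUCKET_GB, SPEED_MB_PER_S))
puts format('round trip cost: $%.2f', round_trip_cost(BUCKET_GB, SPEED_MB_PER_S, HOURLY_PRICE))
```

With these inputs, one direction works out to a bit under 6 hours, which lines up with the 5 or 6 hours I’m seeing.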

Automating?

Obviously, this is a low-tech solution. It has to be run manually, which is a big downside. But, it will provide a basic safety net as long as I can remember to do it every couple weeks.
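If I ever do get around to automating it, a first step might look something like the sketch below: a small Ruby wrapper that runs the two s3sync commands from the script above in order and refuses to push to the backup bucket if the download failed. The wrapper itself (`run_backup`, `SYNC_STEPS`, the `runner` parameter) is hypothetical and untested against a real bucket.

```ruby
# The two s3sync invocations from the bash script, as argument lists.
SYNC_STEPS = [
  %w[s3sync/s3sync.rb -r --progress --make-dirs main_bucket: bucket_backup/],
  %w[s3sync/s3sync.rb -r --delete --progress --make-dirs bucket_backup/ backup_bucket:]
].freeze

# Runs each step in order and stops at the first failure, so a broken
# download never gets mirrored (with --delete!) onto the backup bucket.
# `runner` is injectable so the logic can be exercised without s3sync.
def run_backup(steps = SYNC_STEPS, runner: ->(cmd) { system(*cmd) })
  steps.each do |cmd|
    return false unless runner.call(cmd)
  end
  true
end
```

Calling `run_backup` from a scheduled job on the EC2 node would be the obvious next step, but that still leaves booting and shutting down the node as a manual chore.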

I hope this helps someone!

ABingo problems – Abingo is not missing constant Alternative!

Plugins, Ruby on Rails 1 Comment »

If you’re seeing that exception, then I may have a solution for you. Following the reasoning of this blog post, it seems that it’s an issue with how Abingo namespaces the models.

Changing the two ABingo models to explicitly set the class name fixed it for me. YMMV.

class Abingo::Alternative < ActiveRecord::Base
  include Abingo::ConversionRate

  belongs_to :experiment, :class_name => "::Abingo::Experiment"
  serialize :content
  ...
end

class Abingo::Experiment < ActiveRecord::Base
  include Abingo::Statistics
  include Abingo::ConversionRate

  has_many :alternatives, :dependent => :destroy, :class_name => "::Abingo::Alternative"
  validates_uniqueness_of :test_name
  ...
end

How not to sync your blog to your Rails app

Obsidian Portal, Ruby on Rails 4 Comments »

I’ve been playing with PubSubHubbub (PuSH) and Superfeedr for several hours now, and overall I’m mainly just disappointed. I’m not sure who is to blame, but it’s not the plug-and-play experience I had hoped for.

The Goal

I actually have a purpose, not just play. I wanted to sync up the Obsidian Portal blog (Atom feed available) with the Obsidian Portal homepage. Essentially, we’d like to display a teaser from the latest post on the blog, along with a list of some other recent posts that might be interesting.

The Plan

After putting it in the back of my head for a while, the idea of using PuSH to handle syncing the blog entries over to Obsidian Portal jumped to the forefront. The basic idea was simple: instead of building an RSS poller/parser/whatever, just have someone POST notifications of new blog entries directly to the main Obsidian Portal app. That’s exactly what PuSH is for. Excellent!

Looking around, I found that Superfeedr was a PuSH hub that seemed perfect for this kind of thing.

Poll vs Notify

The first thing I did was start playing around with using Superfeedr to poll the blog feed. This immediately became a frustrating experience. The polling speed was 15 minutes, so every little test/debug/experiment I wanted to try had a 15 minute window. Completely unacceptable.

So, I went out and downloaded a WordPress plugin to support PuSH on the blog side. Now the blog should send notifications to Superfeedr so there would be no need for polling.

Um…nope. I’m not sure if it’s the plugin or Superfeedr, but it’s still polling at a rate of about every 15 minutes. It’s sometimes faster, but on the order of 5-10 minutes instead of 15. That makes it tough to do the sort of playing around that is necessary to figure out the quirks of a new tool.

Post body, not params

Now, this is my own damn fault, so blame lies squarely in my lap on this one. I just want to make sure that other Rails guys reading this know not to make the same mistake.

The Atom entry you receive as part of a notification is in the POST body, not a parameter! This took me a long time to figure out (long time due to waiting 15 minutes between tests, due to polling :( ). Here’s eventually what I did in the action that receives a notification:

atom = request.body.read

Now you’ve got the Atom text and can parse and go.
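As a rough illustration of the "parse and go" part, here is a minimal sketch using REXML. The sample notification is made up for the example (a real Superfeedr payload will carry more fields), and `first_entry_from` is a hypothetical helper name; `body_io` stands in for `request.body`, or anything else that responds to `read`.

```ruby
require 'rexml/document'
require 'stringio'

# Namespace map so XPath can address elements in the Atom namespace.
ATOM_NS = { 'atom' => 'http://www.w3.org/2005/Atom' }

# Made-up notification body, for illustration only.
SAMPLE_NOTIFICATION = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <entry>
      <title>New post</title>
      <link href="http://example.com/new-post"/>
      <summary>Teaser text</summary>
    </entry>
  </feed>
XML

# Reads the raw POST body and returns the title and link of the first
# Atom entry, or nil if the body contains no entry.
def first_entry_from(body_io)
  doc   = REXML::Document.new(body_io.read)
  entry = REXML::XPath.first(doc, '//atom:entry', ATOM_NS)
  return nil unless entry

  title = REXML::XPath.first(entry, 'atom:title', ATOM_NS)
  link  = REXML::XPath.first(entry, 'atom:link', ATOM_NS)
  { title: title && title.text, link: link && link.attributes['href'] }
end

puts first_entry_from(StringIO.new(SAMPLE_NOTIFICATION)).inspect
```

In a controller action you would pass `request.body` instead of the `StringIO`.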

Summary vs Content

The next wtf? moment came when dealing with summary vs content. In the Atom protocol, it’s possible to have both the full-text content and a summary. This would be perfect for me, since I could leave the full-text content for any Atom subscribers we have, while using the summary field for the excerpt that appears on the Obsidian Portal homepage. WordPress actually puts a summary/excerpt in by default. Excellent!

Bad news adventurers: For some reason, Superfeedr only displays the summary element if there is not already a content element. So, it’s content XOR summary. There may be a valid technical reason for this, but otherwise it makes absolutely no sense to me. Why truncate possibly valuable data? Why not preserve the underlying Atom feed as closely as possible?

An easy solution for me is to remove the content and only include the summary, which WordPress supports out of the box. Still, I really hate this, as I (as a user) much prefer when a feed includes the full text of the article. None of that “come to the site and see all my ads to read the rest!” crap. It irritates me that I’ve been unnecessarily backed into this corner.

No Updates

The worst thing of all is that Superfeedr doesn’t seem to send notifications for updated entries, only sparkly new ones. This is completely unacceptable for my purposes. Let’s examine a common scenario:

Blog post: “Go their too read more…”

First 20 comments: “You used the wrong ‘there’ and ‘to’, dumbass…”

Ok, so I update the blog post and thank the commenters for their kind words. Still, Superfeedr never sends a notification, so the changes aren’t reflected on the Obsidian Portal homepage which gets seen by way more people than the blog.

This decision really has me scratching my head. The Atom protocol (and WordPress feed) have an updated datetime field that can be used to determine when the last update occurred. It seems simple to check that field and send a notification if it’s changed since the last notification. Again, I don’t want to blame Superfeedr here if it’s a PuSH thing.
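To make the idea concrete, here is a sketch of the kind of check I have in mind. It is not how Superfeedr or any hub actually works; `needs_notification?` and the `last_seen` hash (entry id mapped to the timestamp of the last notification) are hypothetical names for the example.

```ruby
require 'time'

# Returns true when an entry has never been seen, or when its Atom
# <updated> value (an ISO 8601 string) is newer than the timestamp
# recorded at the last notification.
def needs_notification?(entry_id, updated_at, last_seen)
  previous = last_seen[entry_id]
  previous.nil? || Time.iso8601(updated_at) > previous
end

last_seen = { 'post-1' => Time.iso8601('2010-01-01T10:00:00Z') }

puts needs_notification?('post-1', '2010-01-01T12:00:00Z', last_seen) # edited after last notification
puts needs_notification?('post-1', '2010-01-01T10:00:00Z', last_seen) # unchanged
puts needs_notification?('post-2', '2010-01-01T09:00:00Z', last_seen) # brand new entry
```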

Going Forward

At this point, having spent probably 5-6 hours playing with this, I’m reluctant to just chuck it. I’m in that “Surely the answer is just around the corner” mindset, but the lack of updates may be a deal-killer. If I have to manually poll for updates or manually edit entries that require update, then PuSH/Superfeedr becomes pretty much useless to me.

Should I switch to the official Google hub? I assumed that a commercial operation like Superfeedr would be preferable to the reference implementation, and I like their web panel more, but the lack of updates plus stripping the summary field are big no-nos in my book. Since I’ve already invested so much time, I guess I owe it to myself to give the Google hub a try before just throwing my laptop out the window.

Am I crazy? Stupid? I thought PuSH was supposed to be easier than this. If I’m not mistaken, my problem/goal is a textbook example of where to use PuSH/Superfeedr, but so far it’s been nothing but a major hassle.

Unobtrusive onload Google Analytics with jQuery

Uncategorized 2 Comments »

Update: This post was live for about 5 minutes before Michael in the comments brought Google’s official asynchronous solution to my attention. That’s probably a much better way to go. I’ll leave the original post here as google bait though… ;)

I’ve gone ahead and moved the loading and execution of the GA javascript into an external file so as to speed up loading of the page and not delay firing of the onload event.

For every page you want tracked, make sure to include a div (or any element) with the id “ga-pageview-tracking” and include the javascript file shown below.

It’s fairly simple, and I’ve extracted the core of it into a single javascript file. I thought about uploading it to github, but it’s just so tiny.

(function($){
  var GA_ID = "PUT-YOUR-GA-ID-HERE";

  function gaTrackPageview() {
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
    var src = gaJsHost + "google-analytics.com/ga.js";

    if($('#ga-pageview-tracking').length) {
      $.getScript(src, function(data, textStatus) {
        var tracker = _gat._getTracker(GA_ID);
        tracker._trackPageview();
      });
    }
  }

  $(document).ready(function() {
    gaTrackPageview();
  });
})(jQuery);

Don’t monetize to cover costs

Business 5 Comments »

I hear over and over from people who are thinking about trying to monetize a fun side project “to cover hosting costs.” I have even succumbed to this line of thinking a few times: “Let’s just throw on AdSense and see what happens..” Whenever you get this idea, you need to resist temptation and push aside your thoughts of casual money. Especially in the case of AdSense, you’ll just end up making your site a little uglier, while probably not even making enough money to cover the time you spent inserting the ad code.

First, let’s examine what it really takes to “cover costs.” Even assuming your side project is a fairly hefty web app that requires its own VPS, you’re still probably looking at no more than $100/mo in hosting fees. Put in context, you’re probably spending less on your project than you’re spending on your cell phone, and that $100 estimate is on the high end. Most side projects can run on the crappy shared hosting you’ve got your blog on, or piggy-backed on a VPS you’re using for something else. In those cases, it’s essentially free.

Still want to cover your hosting costs? Well that’s easy. Just cancel your cable tv, or skip eating at a restaurant twice. Costs covered.

Second and most important, remember that it’s perfectly acceptable to have hobbies that cost you money and provide nothing in return except enjoyment. My favorite comparison here is rec-league softball. I played for several seasons and I loved it. I had to spend about $100 to get all my equipment, and then each season had a registration fee of about $60. It never occurred to me to monetize my softball game in order to cover those costs. Maybe I could have sold ad space on my jersey, or some crap like that, but it probably would have been a huge waste of time. Why should a web hobby be any different?

The only reason to monetize a web project is if you intend to make serious money. There’s no guarantee that you’ll succeed, but a sizable payout (defined however you want) should be the goal. Until you’re ready to look in the mirror and say, “Let’s make some money!” then don’t worry about it. Just take pleasure in your hobby and the knowledge that you’re making the web a better place.

Update: A lot of people on Hacker News disagreed with my “suck it up” mentality, and a few had some really good ideas on how to cover costs without resorting to AdSense.

From jeff18:

  • Ask a buddy to let you put the site on one of their underutilized servers. Sharing a VPS amongst a group of friends is a great way to spread costs.
  • Ask a company to sponsor you.
  • Ask for donations from the community. This is probably one of the best ways to “cover costs” if that’s your true goal. Just run a fundraiser once a year or so.

Regarding Jeff’s company sponsorship idea, I can personally say that’s a good one. I (as Obsidian Portal) offered to host the RPG Bloggers Network for free, in exchange for a “hosted by” link and a note to contact me for an introduction in the email sent to new members. They ultimately turned down the offer, but it would have been a killer deal for me.

From uggedal: Collect referral fees. Uggedal supports wasitup with referral bonuses to Linode. Referrals to your hosting company make perfect sense for a hacker project, but you may have to get more creative if your subject domain isn’t hacker-centric.

Rackspace Email and sSMTP on a Slicehost server

Ruby on Rails, Site Admin No Comments »

This is just a brain-dump of everything I’ve learned while stumbling my way through setting up sSMTP with Rackspace email. For the record, I created a single email account on Rackspace and set up sSMTP on my server to authenticate with the credentials for this account.

From line override – @mydomain

When I enabled FromLineOverride, my emails stopped sending. I thought it was spam filters gobbling them up, but it turned out that Rackspace was refusing to send. Looking in the mail log (/var/log/mail.log), I saw this over and over:

RCPT TO: (550 5.1.0 : Sender address rejected: User unknown in relay recipient table)

I fought with this for several hours until the Rackspace help chat technician was able to guide me to a solution. Apparently, if you’re sending emails with a From address from your domain then the From address has to match up to either a real Rackspace email account, or it has to match one of your aliases. Forget about sending from no-reply@mydomain.com unless that’s a real box.

From line override – @mydomain – with catch-all set

Creating a catch-all email seems to change this behavior. Once your catch-all is set, you can send from whatever address you want on your domain. I guess it makes sense, as now any address is a valid return address on your domain.

From line override – @otherdomain

Strangely enough, if you use a From address with a different domain, it all works fine. So, you can pretend all day long to send emails from other domains and Rackspace doesn’t care. But, send an email from a pretend address on your own domain and you’re screwed. Weird.
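The three cases above boil down to a rule that can be sketched in a few lines of Ruby. To be clear, this models only the behaviour I observed, not anything Rackspace documents; `sender_accepted?` and its parameters are made up for the illustration.

```ruby
# Returns true if, based on the behaviour described above, the relay
# would accept mail with the given From address.
#   my_domain - the domain hosted with the email provider
#   mailboxes - local parts of real accounts and aliases on that domain
#   catch_all - whether a catch-all address is configured
def sender_accepted?(from, my_domain:, mailboxes:, catch_all: false)
  local, domain = from.downcase.split('@', 2)
  return true unless domain == my_domain.downcase # other domains were not checked
  return true if catch_all                        # any local part is valid
  mailboxes.map(&:downcase).include?(local)       # must be a real box or alias
end

boxes = %w[admin deploy]
puts sender_accepted?('no-reply@mydomain.com', my_domain: 'mydomain.com', mailboxes: boxes)
puts sender_accepted?('no-reply@mydomain.com', my_domain: 'mydomain.com', mailboxes: boxes, catch_all: true)
puts sender_accepted?('anyone@otherdomain.com', my_domain: 'mydomain.com', mailboxes: boxes)
```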

cron jobs and From line

You have very little control over the From line created by your cron jobs. AFAIK, they’ll always use only the username of the user executing the job. If the output is emailed out then sSMTP will append your domain to this username. Rackspace will reject the email if there’s no account or alias with that name. This means you may need to set up an alias for deploy or whatever user you use for executing your app’s cron jobs, assuming you want to receive the emailed output.

ssmtp.conf is case sensitive

Maybe I’m wrong here, but I swear that ssmtp.conf is case sensitive for YES/NO, even though I’ve seen them used interchangeably in different tutorials. In my case, FromLineOverride only made a difference when I used “no” and not “NO”. Every other YES/NO option seemed to ignore the case. Maybe I’m just crazy.

SPF record

Don’t forget to set up your SPF record! Assuming you’re just going to send email through Rackspace, the following record should work:

v=spf1 include:emailsrvr.com -all

Note: That’s a hard-fail, since I’m a hardass ;)
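If the record syntax looks cryptic, this small Ruby sketch breaks it into its parts. It only handles simple mechanisms with an optional qualifier prefix (it does not treat modifiers such as `redirect=` specially), and `describe_spf` is a made-up helper name for the example.

```ruby
# Result that each SPF qualifier prefix maps to; no prefix means "+".
QUALIFIERS = { '+' => 'pass', '-' => 'fail', '~' => 'softfail', '?' => 'neutral' }.freeze

# Splits an SPF record into [mechanism, result] pairs.
def describe_spf(record)
  version, *terms = record.split
  raise ArgumentError, 'not an SPF record' unless version == 'v=spf1'

  terms.map do |term|
    qualifier = QUALIFIERS.key?(term[0]) ? term[0] : '+'
    mechanism = term.sub(/\A[+\-~?]/, '')
    [mechanism, QUALIFIERS[qualifier]]
  end
end

p describe_spf('v=spf1 include:emailsrvr.com -all')
# => [["include:emailsrvr.com", "pass"], ["all", "fail"]]
```

The trailing `-all` is the hard-fail part: anything not matched by an earlier mechanism fails. Swapping it for `~all` would make it a soft-fail instead.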

DKIM

Nope, not as far as I know. How hard would it be for providers to add this service? Maybe impossible if it would involve them signing emails with your key or something like that.

Well, that’s it. If I’m wrong, or there’s anything I should add, please let me know in the comments.

Update on DoLeaf progress

Business No Comments »

I actually got a request on Hacker News to write up a “where are we now” on DoLeaf. Since I’m an egotistical bastard who loves talking about his projects, here I am with an update. But, since I’m also an entrepreneur, I’m going to try and use you, the reader, to my benefit!

Timeline

Before launching into the particulars, let me lay out the timeline. DoLeaf as an idea was born around September of 2008. The business was officially founded and coding started around January 2009, and we officially launched into our beta in July of 2009. That means we’re a little over a year into it now. So, for all you get-rich-quickies, stop reading now, since this is a story of the long view, not a 6-month hockey stick fairy tale.

Recruiting Sellers

We knew the hardest part initially would be getting sellers to sign up. Fortunately for us, it seems that we were right to assume that there was a need unserved out there.

As of now, we have 11 active stores, and a couple more that are finalizing their preparations before going live. Overall, the response has been very positive from the nursery community. They see the value in what we’re offering, and it’s definitely a nicer alternative for them than paying some shady consultant $1000s to set up a rickety online shopping cart.

We also hired a marketing consultant to help us reach out to the sellers via magazines and such. Overall, we’ve been very pleased with her work and it allowed us to recruit our first crop of sellers. Believe me, your first user is 1000x harder to recruit than your 100th. It’s hard to convince someone to be the first to jump.

Where are these orders coming from?

We’ve done essentially no marketing or advertising to buyers. Instead, we’ve been focusing on recruiting sellers in order to make sure we have a robust catalog of listings. We knew we had a chicken(sellers) v egg(buyers) problem, so we decided to try and find some chickens.

Even still, we’ve had a fairly brisk pace of orders. I’m not going to list actual numbers, but suffice it to say that we expected pretty much 0 orders (aside from friends whose arms we twisted) without advertising. The brisk (and increasing) pace was pretty baffling until we looked into how the customers were arriving.

SEO is still King

We forgot to factor in the power of SEO + targeted searching. Initially, we had hoped to place high for search phrases like “garden marketplace” or “buy plants online”.

Imagine our surprise to find out that we placed high for things like “siam ruby banana” or “peachy sunrise daylily”. Moreover, the conversion rates for these terms are crazy high. Of course, anyone familiar with search engine marketing can see why. People searching for very specific things have money in hand ready to buy. People wanting a Siam Ruby Banana can have one ordered in a couple clicks.

We also (on the advice of a savvy seller) decided to submit our listings to Google Product Search. It took an hour or so to whip up an Atom feed that they crawl. Sales started rolling in the very next day. Every day I love Google a little bit more.

Forget Social Media

Initially we had planned to try and be all kinds of social savvy. Forget it. Now that we’ve seen the power of SEO, we’re going to walk that road. Antiquated? Old-school? Maybe so, but I’m convinced that DoLeaf can corner the market on a good chunk of plant botanical names. I’m confident that we can be on Google page 1 when you search on a botanical or common name of a plant we carry. I’m much less confident that I can get people on Twitter to fawn all over us and fan us on Facebook. That’s fine with me, though. We’re looking for sales, not fans.

Overwhelming Enthusiasm

For better or worse, we’ve got one star seller who is pushing DoLeaf to the max. We expected most sellers to list a handful of plants and take a wait-and-see approach. For the most part, that’s what they’ve done.

…Except for one. One seller found DoLeaf and turned the dial up to 11. He’s listed over 300 plants, and accounts for probably 75% of our total listings. Not surprisingly, he also accounts for the lion’s share of our sales. It’s a little scary, but exhilarating at the same time. He sees the same potential that we do, and really believes in what we’re doing. I just hope that we can make it worth the time he’s invested. As always, I’m more worried about disappointing our users than about making a profit. I’m a firm believer that if our sellers are successful and happy, then it’s impossible for DoLeaf to fail.

Buyers: The Next Frontier

Now that we’ve got a decent selection of plants, we’re ready to turn our focus toward enticing buyers. We’re going to…wait for it…try and reach people offline. This is a bit of a stretch for us, but we’ve been reminded over and over that the gardening community still has a very strong physical presence. So, we’re talking to print magazine editors, going to local gardening meetings, and generally trying to get more involved with the real-life side of gardening. It can be difficult, but at the very least, I’m learning more about plants, which is a subject I truly enjoy. You can see me outside at all hours of the day inspecting my tulip bulbs and climbing roses.

I’m not sure how successful we’ll be in our buyer outreach, but the fact that our SEO efforts are proceeding so well gives me confidence that even if we fail as advertisers, we’ll still do OK on sales.

YC 2k10?

Since most people reading this will be coming from Hacker News, I’ll go ahead and say that we’re planning on applying yet again for YCombinator. However, I’m going to be smart about it this time, unlike before. I’m going to dust off our previous application, edit some dates, and resubmit. Then I’ll forget about it. We’ll probably do the same with TechStars.

Why so little effort? Because at this point I’m confident that we don’t need them. We could definitely use the help, but we’ve crossed the barrier of proving it could work. Now all we have to do is show that we have the skill and perseverance to be the ones to actually make it work.

Wanna Help?

Since our SEO efforts are proving so fruitful, I’m going to ask for your help. If you have a blog, please write a short post about DoLeaf. Pick out some crazy plants (here are a few specimens) and link to them from a blog post. That’s it! We’ll be eternally grateful, and you’ll be helping a small startup, as well as the small, family-owned businesses we serve. It counts as your good deed for the day.

CloudFront SSL with Rails and attachment_fu

Plugins, Ruby on Rails No Comments »

One of the most irritating things about CloudFront is the lack of SSL support. It’s incredibly frustrating to install an SSL certificate, get all your routing set up, then watch the browser freak out because one teeny-tiny image comes through without encryption. A major pain in the ass.

Anyways, it’s possible to sidestep the issue by requesting the image directly from S3 instead of CloudFront. You are no longer leveraging the CDN, but in my case I’d rather have the page load slightly slower than have the browser complain about security flaws.

CloudFront Helper

I wrote the following helper to make it all easy:

module CloudfrontHelper
  # Will return a URL to an S3/Cloudfront image. If the current request is HTTPS, then it will return
  # an HTTPS URL (ie. S3) and if it is HTTP then it will return a Cloudfront URL.
  def cf_img_url(s3_image, *params)
    if request.ssl?
      s3_image.s3_url(*params)
    else
      s3_image.public_filename(*params)
    end
  end
end

SSL Config in amazon_s3.yml

The final step is to turn on SSL support for attachment_fu:

production:
  bucket_name: my-bucket
  access_key_id: asdf
  secret_access_key: xxxx
  distribution_domain: [my-cloud-distribution]
  use_ssl: true

Example Usage

Now, anywhere you need to display an image that’s hosted on S3/CloudFront, just use the cf_img_url helper and it will automatically route to either the S3/https version or the CloudFront/http one depending on the protocol for the request. Simple!

<%= image_tag(cf_img_url(@user.profile_pic)) %>

Web Entrepreneurship Presentation at KC Ruby

Uncategorized 2 Comments »

I had the privilege of speaking at the Kansas City Ruby User Group the other night on the topic of Web Entrepreneurship. This was the second presentation on this subject. The first was an in-depth walk through how Jon Crawford went from a full-time consultant to building his idea, finding his team, and putting his full effort into Storenvy. You can view his post regarding his presentation here: http://joncrawford.com/entries/web-entrepreneurship-presentation-at-kcrug.

Jon made many great points throughout his presentation, and I wanted to use those building blocks to discuss how fellow entrepreneurs could build their online startup while keeping their day job. As I mention in the presentation, I love the experience that consulting provides, and given my already limited sleep schedule, I find that building my own startup during nights and weekends satisfies my hunger while also allowing me to keep the lights on. I definitely wanted to lay out all the advantages, disadvantages, and discipline that a 60+ hour work week requires.

Web Entrepreneurship – While Keeping Your Day Job – Part 1


Web Entrepreneurship -- Ryan Felton -- Part 1 from Steven Chau on Vimeo.

Web Entrepreneurship – While Keeping Your Day Job – Part 2


Web Entrepreneurship -- Ryan Felton -- Part 2 from Steven Chau on Vimeo.

Slides

Update

Seth Godin gave a presentation today for Jelly Groups saying that you’re going to burn out quickly, and that freelancers are different than entrepreneurs. He mentioned that your goal should be to work like crazy freelancing to build up enough to live off when you quit freelancing and switch to being an entrepreneur. The major issue with this (as mentioned in the presentation) is healthcare. Also, if you are going to take this approach, I recommend preparing yourself to live as a full-time entrepreneur for 13 months, as that tends to be the make-or-break period.

Rails, Textile, and javascript WYSIWYG roundup – part 2

Uncategorized 1 Comment »

In part 1, I examined a few of the editors and tried to give some plusses/minuses of each one. At the end, I mentioned markItUp! as a possible editor, but couldn’t make a recommendation due to lack of experience with it.

Now that I’ve used it, I can say definitively that it’s amazing! To be fair, it’s not a true “wysiwyg” editor and is instead a set of buttons and aids for editing some sort of markup language. However, if you are only looking for something that will make editing easier on your users, markItUp! is perfect.

Skip the others and go straight to markItUp. You’ll be glad you did.
