Let’s say you want to make a backup of one of your S3 buckets because you’re always a little worried that one of your scripts might burn it, or you might get a little sloppy with S3Fox. You just need a decent safety net to prevent things from going all to hell. In my case, I just want a backup bucket that has a decently recent copy of everything from my main bucket.
Actually accomplishing this is a bit tough, though, as there is no “copy bucket” function in S3. So you have to resort to pulling all the files somewhere and then pushing them back to S3. Very roundabout, and potentially very expensive due to paying the in/out bandwidth fees. That is, unless you do it from an EC2 node, since transfers between EC2 and S3 in the same region carry no bandwidth charge. That’s what I did, and it’s humming away as I type this. So far, so good.
Boot up your EC2 node
To accomplish this, you’re going to need to boot an EC2 node. I suggest using ElasticFox and firing up one of Alestic’s Ubuntu AMIs. I used ami-cb8d61a2, which is an Ubuntu 8.04 LTS image.
I made sure to boot one with EBS. To be honest, I don’t know exactly what EBS (Amazon Elastic Block Store) is, but I needed to back up a 20GB bucket, and I didn’t think the storage on the default instances would be enough. With the AMI I loaded, there was a directory at /mnt with 150GB of storage, so I was ready to go.
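Before starting, it’s worth confirming that /mnt really has room for the whole bucket. Here’s a minimal sketch, assuming a standard `df` that supports `-P` and `-k`; the helper name `has_room_for` is my own invention, not part of s3sync or the AMI.

```bash
#!/bin/bash
# Sketch: check that the filesystem holding a directory has enough free
# space for the bucket. has_room_for is a hypothetical helper; the /mnt
# path and 20GB size are the figures from this post.

# has_room_for DIR NEEDED_KB
# Returns 0 if the filesystem holding DIR has at least NEEDED_KB free.
has_room_for() {
  local dir="$1" needed_kb="$2" avail_kb
  # -P forces one line per filesystem; -k reports sizes in 1K blocks.
  # Field 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  # Fail cleanly if df could not report anything (e.g. DIR is missing).
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: a 20GB bucket is 20 * 1024 * 1024 KB.
needed=$((20 * 1024 * 1024))
if has_room_for /mnt "$needed"; then
  echo "/mnt has enough free space"
else
  echo "/mnt looks too small for the backup" >&2
fi
```

Leave yourself some headroom beyond the bucket’s size, since the local copy also needs directory entries and filesystem overhead.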
Start Syncing
Here’s a simple bash script I wrote to handle most of this.
I really don’t recommend just running the script. Instead, read it, understand it, and run the commands by hand. I’m definitely no bash scripting genius, so it’s likely I’ve messed something up.
Oh, and in case it’s not clear… NO WARRANTY. If this script burns you, I don’t accept any liability.
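#!/bin/bash
# Stop at the first command that fails
set -e

# Setup your AWS credentials
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXX

# Work from the ubuntu user's home so the relative paths below resolve
cd /home/ubuntu

# Get the latest s3sync (s3sync.rb is a Ruby script, so Ruby must be installed)
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
tar xzf s3sync.tar.gz

# Must make a directory on the large /mnt volume, using root,
# then link it into the home directory (-n keeps reruns from nesting links)
sudo mkdir -p /mnt/bucket_backup
sudo chown ubuntu:ubuntu /mnt/bucket_backup
ln -sfn /mnt/bucket_backup /home/ubuntu/bucket_backup

# Run the sync: pull everything from main_bucket into the local copy.
# nohup lets a long sync survive an SSH disconnect; output goes to nohup.out
nohup s3sync/s3sync.rb -r --progress --make-dirs main_bucket: bucket_backup/

# Sync back to S3: push the local copy to backup_bucket.
# --delete removes objects from backup_bucket that no longer exist locally
nohup s3sync/s3sync.rb -r --delete --progress --make-dirs bucket_backup/ backup_bucket: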
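Between the pull and the push, it can be reassuring to check that the local copy looks about the right size. Here’s a minimal sketch, assuming GNU find (which Ubuntu ships); `summarize_dir` is a hypothetical helper of my own, not something s3sync provides.

```bash
#!/bin/bash
# Sketch: print the number of regular files and their total size in bytes
# under a directory. summarize_dir is a hypothetical helper.
# Assumes GNU find (for -printf) and filenames without embedded newlines.

# summarize_dir DIR -> prints "<files> files, <bytes> bytes"
summarize_dir() {
  local dir="$1" files bytes
  # -H follows DIR itself if it is a symlink (as ~/bucket_backup is);
  # -type f counts regular files only, skipping the directories.
  files=$(find -H "$dir" -type f | wc -l)
  # Sum each file's size in bytes; %.0f avoids awk printing big totals
  # in scientific notation.
  bytes=$(find -H "$dir" -type f -printf '%s\n' |
    awk '{ total += $1 } END { printf "%.0f\n", total }')
  # $((files)) strips any padding wc -l adds.
  echo "$((files)) files, $bytes bytes"
}
```

After the first sync finishes, run `summarize_dir /home/ubuntu/bucket_backup` and compare the result with what you expect the main bucket to hold before pushing anything to the backup bucket.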
Waiting…
Right now, I’m getting transfer speeds of roughly 1MB/s. For my 20GB bucket, that means 5 or 6 hours each way. At current EC2 prices, that should run me about $1. Not bad. Perhaps it would go faster if I had been more careful to launch the EC2 node in the same region as the bucket. I dunno. I don’t mind waiting.
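The time estimate is just size divided by throughput. Here’s a tiny sketch of that arithmetic using the figures above (20GB at roughly 1MB/s); `estimate_hours` is a hypothetical helper, and it treats 1GB as 1024MB.

```bash
#!/bin/bash
# Sketch: estimate transfer time from size and observed throughput.
# estimate_hours is a hypothetical helper; 1GB is treated as 1024MB.

# estimate_hours SIZE_GB RATE_MB_PER_S -> hours, to one decimal place
estimate_hours() {
  local size_gb="$1" rate_mbs="$2"
  # GB -> MB, divide by MB/s to get seconds, then by 3600 to get hours.
  awk -v gb="$size_gb" -v rate="$rate_mbs" \
    'BEGIN { printf "%.1f\n", gb * 1024 / rate / 3600 }'
}

echo "One way:    $(estimate_hours 20 1) hours"
echo "Round trip: $(estimate_hours 40 1) hours"
```

At 1MB/s, 20GB works out to about 5.7 hours one way and about 11.4 hours for the pull plus the push, which is where the “5 or 6 hours each way” figure comes from.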
Automating?
Obviously, this is a low-tech solution. It has to be run manually, which is a big downside. But it will provide a basic safety net as long as I remember to do it every couple of weeks.
I hope this helps someone!