Back up a filesystem to Amazon S3


Every server needs to be backed up periodically. The trouble is finding an affordable place to store your filesystem if it contains large amounts of data. Amazon S3 is a good fit, offering reasonably priced standard storage ($0.0300 per GB) as well as reduced redundancy storage ($0.0240 per GB) at the time of writing. Current pricing is listed at http://aws.amazon.com/s3/pricing/.

This short tutorial shows how to back up a server's filesystem using s3cmd, a command line tool for uploading, retrieving, and managing data in Amazon S3. The setup uses a cron job to automate the backup process, with the filesystem scheduled to sync nightly.

How to install s3cmd?

This example assumes you are using CentOS or RHEL. The s3cmd package is available from the standard RPM repositories.
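Assuming the package is available in your configured repositories (on some CentOS/RHEL releases it is provided by EPEL), a single yum command installs it:

yum install s3cmd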

After installation, s3cmd is ready to configure.

Configuring s3cmd

An Access Key and Secret Key are required from your AWS account. These credentials can be found on the IAM page.

Start by logging in to AWS and navigating to the Identity & Access Management (IAM) service. Here you will first create a new user. I have excluded my username below.

[Image: aws-create-new-user]

Create a new user

Next, create a group. This group holds the permission that allows your user to access all of your S3 buckets. Notice under Permissions that the group has been granted the "AmazonS3FullAccess" policy, which means any user in this group can modify any S3 bucket. To grant your new user access to the group, click "Add Users to Group" and select your new user from the list.

[Image: aws-create-group]

Create a new group and give it the "AmazonS3FullAccess" permission. Then add your new user to this group.

For s3cmd to connect to AWS, it requires a set of user security credentials. Generate an access key for the new user by navigating back to the user details page. Look toward the bottom of the page for the "Security Credentials" tab. Under Access Keys, click "Create Access Key". This generates an Access Key ID and a Secret Access Key; both are required for configuring s3cmd.

[Image: aws-create-access-key]

On the user details page, generate a new access key

You now have a user set up with permission to access the S3 API. Back on your server, you need to give s3cmd the new access key. To begin the interactive configuration, type:
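s3cmd --configure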

The interactive setup will prompt you to enter your Access Key ID and Secret Access Key, along with a few other options you can leave at their defaults.
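To confirm the credentials work, you can list the buckets on the account (the list will be empty if you have not created any yet):

s3cmd ls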

At this point s3cmd is fully configured and ready to push data to S3. The final step is to create your own S3 bucket. This bucket will serve as the storage location for our filesystem.

Setting up your first S3 bucket

Navigate to the AWS S3 service and create a new bucket. You can give the bucket any name you want and pick the region where the data will be stored. This bucket name will be used in the s3cmd command.
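If you prefer to stay on the command line, s3cmd can also create the bucket for you (mybucket below is a placeholder; substitute your own bucket name):

s3cmd mb s3://mybucket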

[Image: aws-create-s3-bucket]

Each file pushed to S3 is assigned a storage class of either standard or reduced redundancy storage. This is configurable when syncing files. For the purposes of this tutorial, all files will be stored in reduced redundancy storage.

Standard vs Reduced Redundancy Storage

The primary difference between the two options is durability, i.e. how likely it is that a stored object is ever lost. Standard storage is designed for 99.999999999% durability, whereas reduced redundancy storage (RRS) replicates each object fewer times and is designed for 99.99% durability, so there is a slightly higher chance of losing a file; access speed is the same for both. For the use case of this tutorial all files are stored in RRS since, as noted previously, it is cheaper than standard storage.
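As an illustration, the storage class can also be set on a one-off upload; the file name and bucket below are placeholders:

s3cmd put --reduced-redundancy backup.tar.gz s3://mybucket/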

Configuring a simple cronjob

To enter the crontab editor, simply type:
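crontab -e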

Once in the editor, create the cronjob below, which will run Monday through Friday at 3:30 a.m.
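A minimal entry along these lines will do the job (the local path and bucket name are placeholders):

30 3 * * 1-5 s3cmd sync --delete-removed --reduced-redundancy /path/to/backup/ s3://mybucket/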

This cronjob calls the s3cmd sync command and loads the default configuration you entered above. The --delete-removed option tells s3cmd to scan for locally deleted files and remove them from the remote S3 bucket as well. The --reduced-redundancy option places all files in RRS for cost savings. Any folder location can be synced; just change the path to your desired location, and make sure to change mybucket to the name of your S3 bucket.
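Before relying on the schedule, it can be worth running the sync once by hand with s3cmd's dry-run flag, which reports what would be transferred without uploading anything:

s3cmd sync --dry-run --delete-removed --reduced-redundancy /path/to/backup/ s3://mybucket/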

The server is now configured to do nightly backups of the filesystem to AWS S3 using the s3cmd library. Enjoy!