
Smart LAMP Application Backups using Amazon’s S3

Repeat after me: I must back up my LAMP applications! If you’re like me, you’ve hacked your way through several iterations of backup scripts over the years. I can’t even imagine how screwed I’d be if my server went down without a backup. I have thousands of hours’ worth of code and ultra-important, sensitive data.

In the past, I put together scripts with hard-coded commands to tar and FTP the applications and data to a remote server. As I added new applications, or as applications matured (and spread into other areas of the server), I would simply modify the original scripts to include the new stuff. Of course (because I’m lazy sometimes), on several occasions I inadvertently neglected to include something in my backup scripts. Over time, they became ugly, unmanageable, and over-complicated.

So I came up with a new strategy. Why not develop a set of scripts that:

a. back up the server nightly, directly to remote storage, and
b. do not have to be constantly modified as new applications or application components are added.

Those seem like reasonable goals.

First, a little background: I have a dedicated Linux server running CentOS (although this technique should work on just about any Linux distribution). I host roughly 20 different domains on this server (not all of them are mine). For the stuff I develop, these LAMP applications typically consist of:

1. an htdocs directory (where the web-application is served from).
2. one or more MySQL databases (for application data and framework specific databases and tables)
3. associated scripts (for cron jobs, external activities, etc)

To do this, you may have to reorganize your applications. Just be careful not to break anything as you follow these instructions, and make sure you account for config files, crontabs, and the like.

Let’s get started:

1. Re-organize your applications, and use consistent naming conventions: Choose a path on your server, and store your application and scripts folders there. I use /usr/local. My application folders always follow the pattern htdocs.domain, where domain is the name of the domain (e.g. techkismet.com). So for this website, the full path of the application is /usr/local/htdocs.techkismet. Remember, if you move your application, you will need to update your Apache configuration files to point to the new htdocs directory.

I do the same thing with my application scripts. Using the same example as before, my scripts folder would be /usr/local/scripts.techkismet.
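Whenever you move an htdocs folder like this, it is worth double-checking which Apache config files still reference the old location before you reload. A quick sketch, assuming the stock CentOS config layout (adjust the paths if your Apache lives elsewhere):

grep -r "DocumentRoot" /etc/httpd/conf /etc/httpd/conf.d
# Update any DocumentRoot (and matching <Directory>) entries to the new
# /usr/local/htdocs.<domain> location, then reload Apache:
service httpd reload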

2. Create a home for your backup scripts and associated files: I used /usr/local/backups. You should also create a sub-folder within that folder called “temp”. This is where your script will temporarily put all of the files being backed up.
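Setting that up only takes a couple of commands; the chmod is my own suggestion, since unencrypted database dumps will be sitting in this tree:

mkdir -p /usr/local/backups/temp
# Keep the dumps and keys away from other local users
chmod 700 /usr/local/backups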

3. Build your backup shell script

The main backup script is a Bash shell script that identifies and backs up each of your applications and databases. This file is called /usr/local/backups/backup.sh. Here is my version as a starting point:



#!/bin/sh

# First remove the contents of the temp directory - the files in there are from yesterday.
rm -rf /usr/local/backups/temp/*

# auto-dump all mysql databases
for a in $(echo "show databases" | mysql -u user --password=password | egrep -v '^Database|^information_schema|^test'); do mysqldump -u user --password=password $a > /usr/local/backups/temp/$a.sql; done;

# Auto backup all htdocs.* and scripts.* folders in /usr/local (these are all of the application and scripts directories);
cd /usr/local
for a in $(ls -d htdocs.*); do tar -czf /usr/local/backups/temp/$a-backup.tgz $a/* >/dev/null 2>&1; done;
for a in $(ls -d scripts.*); do tar -czf /usr/local/backups/temp/$a-backup.tgz $a/* >/dev/null 2>&1; done;

# Additional directories/files to handle - put any other files you wish to back up here.  
tar -czf /usr/local/backups/temp/other-files.tgz /path/to/local/files/* /path/to/another/file >/dev/null 2>&1

# Backup this script and others in this directory
tar -czf /usr/local/backups/temp/backupscripts.tgz /usr/local/backups/*.sh  /usr/local/backups/s3-bash/* >/dev/null 2>&1

Let’s take a closer look at what we’re doing here:

MySQL database backup:


for a in $(echo "show databases" | mysql -u user --password=password | egrep -v '^Database|^information_schema|^test'); do mysqldump -u user --password=password $a > /usr/local/backups/temp/$a.sql; done;

This nifty command will iterate through your MySQL databases and do a mysqldump of each into a file of the same name (with a .sql extension) in the temp folder. You should obviously change user and password to a MySQL user that has the appropriate rights from localhost.

The “egrep -v” command excludes from the list the items matched by the pattern. The first item (“Database”) is required: that text appears in the output of “show databases” and is obviously not a real database. You can add any other databases you prefer not to back up to this list; I’m excluding the databases “information_schema” and “test” from my backups.

So this multi-piped command produces a list of databases as strings, and each of these strings will be passed to the mysqldump command. The corresponding backup files will be created and stored in the temp folder.
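One caveat with the approach above: the password appears on the command line, where anyone running ps at the right moment can see it (and newer MySQL releases will warn about exactly that). If you would rather keep it out of sight, a sketch of an alternative is to point both commands at an option file instead; the /usr/local/backups/.my.cnf path and backupuser name here are just placeholders:

# /usr/local/backups/.my.cnf (chmod 600) contains:
#   [client]
#   user=backupuser
#   password=yourpassword
for a in $(echo "show databases" | mysql --defaults-extra-file=/usr/local/backups/.my.cnf | egrep -v '^Database|^information_schema|^test'); do mysqldump --defaults-extra-file=/usr/local/backups/.my.cnf $a > /usr/local/backups/temp/$a.sql; done;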

Application and Scripts Folder backup:

The next few commands will go through a comparable process for the htdocs and scripts folders. The commands:


cd /usr/local
for a in $(ls -d htdocs.*); do tar -czf /usr/local/backups/temp/$a-backup.tgz $a/* >/dev/null 2>&1; done;
for a in $(ls -d scripts.*); do tar -czf /usr/local/backups/temp/$a-backup.tgz $a/* >/dev/null 2>&1; done;

These are a little less complicated, but they achieve the same objective. The ls -d command lists all directories that match the pattern htdocs.* (and scripts.*), and each result of that listing is passed in turn to the tar command. When this is finished, all of these directories will have been tarred (using a filename that matches the directory name) and stored in the temp directory.
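For what it’s worth, if any of your directory names ever pick up spaces or other odd characters, a plain glob loop does the same job without parsing ls (a minor variation on the commands above):

cd /usr/local
for a in htdocs.* scripts.*; do tar -czf "/usr/local/backups/temp/$a-backup.tgz" "$a" >/dev/null 2>&1; done;

One small difference: tarring "$a" rather than $a/* also picks up any dot-files sitting directly beneath each directory.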

Other items to back up:

There will always be additional files (not included in the structures identified above) that you will want to back up: Apache configuration files, php.ini, digital certificates, server documentation/notes, other application configuration files, etc. You can back these up either in groups or individually, using something like the following command:


# Additional directories/files to handle
tar -czf /usr/local/backups/temp/other-files.tgz /path/to/local/files/* /path/to/another/file >/dev/null 2>&1

Add any other files as you see fit.

And, for good measure, you may wish to back up the “backup” directory and scripts themselves.


# Backup this script and others in this directory
tar -czf /usr/local/backups/temp/backupscripts.tgz /usr/local/backups/*.sh  /usr/local/backups/s3-bash/* >/dev/null 2>&1

4. CRON it
You should cron this backup script to run at an appropriate time. I chose 4:00 AM, as that is when my domain activity is at its lowest.
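For reference, a crontab entry for that schedule (added via crontab -e, as a user that can read all of these paths) would look something like this:

# Nightly application and database backup at 4:00 AM
0 4 * * * /usr/local/backups/backup.sh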

Next step: Sending the files to Amazon’s S3
So we’ve only really done half the work so far, right? You need to send these locally backed-up files to a remote storage area. This is where Amazon’s S3 comes into play. You probably noticed that I’ve decoupled the backup script above from the transmission script (which we’re about to cover). That was intentional: if I need to re-transmit for any reason, I can do so without regenerating all of the files.

I recently started using S3, and I’m very cooled out by it. It’s a web-services accessible data-storage service, and it can be used to publicly or privately store anything you want. You can find out more by visiting http://aws.amazon.com/s3. Pricing is reasonable. They charge for bandwidth and storage. My typical bill runs around $2.40 a month – and my typical nightly backup is right around 600 MB.

As I mentioned earlier, I send full backups each night. There are more-advanced tools that let you “sync” between local data stores and S3, which should cut down on your bandwidth usage and could yield substantial savings. I may write a future post about this technique.
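If you want to experiment with syncing right away, the official AWS command-line tool (a much later arrival than the scripts in this post) has a sync mode; a minimal sketch, assuming the aws CLI is installed and configured and reusing my bucket name, would be:

# Upload only files that are new or changed since the last run
aws s3 sync /usr/local/backups/temp/ s3://daily-backup/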

So, on to the details. Once I signed up for AWS S3, I was able to retrieve my access identifier and my “secret” access key. For my scripts, I used a utility called S3-Bash by Raphael James Cohn. This set of Bash scripts allows you to access your S3 buckets (get, put, and delete) from the Linux command line.

So do this:

1. Sign up for AWS S3 (if you haven’t already). Get your access identifier and secret keys. Copy and paste them someplace where you can have them readily available.
2. Download the S3-Bash code.
3. Create a folder in your backups folder called s3-bash, and place the S3-bash scripts in there.
4. Create a text file for each key: one for your access key and one for your secret key. I used the file names “key.txt” and “secret_40.txt” respectively. These files are used by the s3-bash commands to associate the transmission with your AWS account. Please NOTE: if you create them with an editor such as vim, it will automatically store a line feed at the end of the line. The files need to be exactly 20 and 40 bytes respectively, and the commands will not work if the sizes are off (one way to create them cleanly is shown just below). Also note that your server time needs to be accurate for the s3-bash commands to work; you can read about this requirement on the AWS knowledge base.
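One way to create those two files without a stray newline in the first place is printf, which writes exactly what you hand it (the key values below are obviously placeholders for your own):

printf '%s' 'YOURACCESSKEYID12345' > /usr/local/backups/s3-bash/key.txt
printf '%s' 'YourSecretAccessKeyGoesHere1234567890abc' > /usr/local/backups/s3-bash/secret_40.txt
# Verify the sizes - these should show 20 and 40 bytes
wc -c /usr/local/backups/s3-bash/key.txt /usr/local/backups/s3-bash/secret_40.txt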

5. You will need to create your destination “bucket” (or folder) on S3. There are command-line scripts that can do this, but I found it easiest to download and install a browser plugin for Firefox called S3Fox. You can download/install it at https://addons.mozilla.org/en-US/firefox/addon/3247. If you don’t use Firefox, you might want to look into the other options on the AWS knowledge base.

Once you’ve installed the plugin (and set it up to access your account), create the bucket that will store your backup files. Mine is called “daily-backup”.

6. Create a script called /usr/local/backups/daily2s3.sh. Here is what mine looks like including comments:


#!/bin/sh

# This script is executed nightly, after backup.sh has produced the files.
# The key files below are required for access to S3.
# They must be exactly 20 bytes (key.txt) and 40 bytes (secret_40.txt).
# Use dd to strip the trailing LF if a file was created with vi, for example:
#
# dd if=<21bytefile.txt> of=<20bytefile> bs=1 count=20

cd /usr/local/backups/temp
for a in $(ls); do /usr/local/backups/s3-bash/s3-put -k `cat /usr/local/backups/s3-bash/key.txt` -s /usr/local/backups/s3-bash/secret_40.txt -T $a -c application/octet-stream /daily-backup/$a; done;

As you can see, this script iterates through the files created in the temp folder and “put”s them in the “daily-backup” bucket on S3. The access key is pulled in with an inline cat, while the secret key is passed as a file path (via -s) rather than on the command line, so it stays out of sight if someone happens to run a “ps ax” while the command is running. You should cron this script as well. Make sure you schedule it after the backup script, and give the backup script plenty of time to finish; in my case, I scheduled the two scripts 30 minutes apart.
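The matching crontab entry, 30 minutes behind the backup itself, would be something like:

# Ship last night's backup files to S3 at 4:30 AM
30 4 * * * /usr/local/backups/daily2s3.sh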

That’s it

I hope you find this useful. Please feel free to leave me comments/feedback!


{ 3 } Comments

  1. Calvin | February 20, 2008 at 4:00 am | Permalink

    Sorry, mine is not a comment, rather a question.

    Why do you first redirect to /dev/null and then to &1?

  2. Alvin Kreitman | May 5, 2008 at 6:49 am | Permalink

    Calvin: I’m sorry for the major delay in replying to your comment. It got lost in a bevy of spam comments and I didn’t notice it until just today. The answer to your question is that /dev/null is a nice place to send all standard output, so when this runs as a cron job I don’t get emails detailing every little thing. The latter part of the command, 2>&1, simply redirects all error output to standard output, so that is not displayed either. See a good description of this technique here: http://www.xaprb.com/blog/2006/06/06/what-does-devnull-21-mean/

  3. tardigrade | September 28, 2008 at 8:13 pm | Permalink

    Thanks for putting the script together and sharing it. The transmission script in particular has been really useful. I was struggling to find a script to help me back up my virtual servers’ tars to S3. s3-bash makes everything a lot simpler, and your transmission script is the icing on the cake that I needed!

    Many thanks.
