
Automating EC2 EBS Snapshot Cleanup

I’ve recently taken on the task of building, and now administering, a cluster of Amazon EC2 instances. The EC2 command line tools provide all of the basic functionality you’ll need for creating instances, EBS volumes, and snapshots, and for nearly everything else you would ever need to do with those assets. The one missing piece was a script to clean up old snapshots. Snapshots are stored in Amazon S3 and accumulate there until you delete them, and you pay for that storage.

So the problem in a nutshell is this: I have 10 volumes, each of which is snapshotted by a cron job at various times of the day (how often depends on the volume). With 10 volumes, my S3 storage costs can get out of hand quite quickly, so I needed a script that scans my snapshots and removes the oldest ones so I’m not paying for that storage.

It is important to keep at least a couple of snapshots for each volume, and in some cases I’d like to keep several. For example, one of my volumes stores the main database for the CMS and is backed up once every two hours. For that volume, I always want my choice of the last 10 snapshots to restore: if the database suddenly becomes corrupt, it may be necessary to restore earlier backups to see where and when the corruption started. Other volumes may only need the last 1 or 2 snapshots. So the script had to be flexible enough to let me specify how many backups to keep for each volume.
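The retention rule described above boils down to two steps per volume: look up the keep count (a per-volume override, else the default), then drop everything past that count in a newest-first list. Here is a minimal sketch of just that rule; the function names `keep_count_for` and `split_keep_delete` and the volume IDs are hypothetical and not part of the script itself.

```php
<?php
// Hypothetical helpers illustrating the retention rule; not taken from the script below.

/** Per-volume override if one exists, otherwise the default keep count. */
function keep_count_for(string $volume, array $keep_by_volume, int $default): int
{
    return array_key_exists($volume, $keep_by_volume) ? $keep_by_volume[$volume] : $default;
}

/**
 * Split a volume's snapshot IDs (already ordered newest first) into
 * the ones to keep and the ones to delete.
 */
function split_keep_delete(array $newest_first, int $keep): array
{
    $keep = max(0, $keep); // a negative length would make array_slice count from the end
    return [
        'keep'   => array_slice($newest_first, 0, $keep),
        'delete' => array_slice($newest_first, $keep),
    ];
}

// Example: the CMS database volume keeps 10, everything else keeps the default of 2.
$keep_by_volume = ['vol-db' => 10];
$keep = keep_count_for('vol-logs', $keep_by_volume, 2);
$plan = split_keep_delete(['snap-c', 'snap-b', 'snap-a'], $keep);
// $plan['keep'] is ['snap-c', 'snap-b'] and $plan['delete'] is ['snap-a'].
```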


PHP as an OS scripting language

My shell scripting experience is pretty limited, especially for anything advanced. So instead of Bash, I chose PHP for this script. I’ve had decent success automating tasks with PHP over the years. It’s not nearly as clean as Perl or a native shell script, but it works, and it serves two other purposes: it keeps the entire application consistent, and it lets me reuse components and libraries written for other parts of the application.
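Using PHP as an OS scripting language mostly means calling external commands and reading their output, which the script does with PHP's built-in `exec()`. As a sketch, here is a small wrapper that also checks the exit status (the optional third argument to `exec()`), so a failed command is not silently treated as empty output. The `run_command` name is hypothetical; the original script calls `exec()` directly.

```php
<?php
/**
 * Run a shell command and return its output lines.
 * Throws if the command exits with a non-zero status.
 * (Hypothetical helper, not part of the original script.)
 */
function run_command(string $cmd): array
{
    $output = [];
    $status = 0;
    // exec() appends one array element per output line, without trailing newlines.
    exec($cmd, $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Command failed (exit $status): $cmd");
    }
    return $output;
}

// Any argument built from data should be quoted with escapeshellarg().
$lines = run_command('echo ' . escapeshellarg('hello world'));
```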

A few things to know before you try to get this working:

1. This script assumes you have the Amazon EC2 API tools installed and the required environment variables set. See Amazon’s documentation for that step.
2. This script is set up to run as a cron job, so the file must be executable (chmod +x).
3. I make no guarantees this will work for you, but it works perfectly for me.
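A cron job runs with a much smaller environment than an interactive shell, so it can help to check the prerequisites up front and fail loudly. This is a minimal sketch; the default variable list (`EC2_HOME`, `JAVA_HOME`, `EC2_PRIVATE_KEY`, `EC2_CERT`) is an assumption based on the legacy Java-based EC2 API tools, so check Amazon's documentation for the set your version needs. The function name is hypothetical.

```php
<?php
/**
 * Return the names of required environment variables that are unset or empty.
 * The default list is an assumption based on the legacy EC2 API tools.
 */
function missing_env_vars(array $names = ['EC2_HOME', 'JAVA_HOME', 'EC2_PRIVATE_KEY', 'EC2_CERT']): array
{
    $missing = [];
    foreach ($names as $name) {
        $value = getenv($name);
        if ($value === false || $value === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Usage at the top of the cleanup script (commented out so this sketch has no side effects):
// if ($missing = missing_env_vars()) {
//     fwrite(STDERR, 'Missing environment variables: ' . implode(', ', $missing) . "\n");
//     exit(1);
// }
```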

Here’s the code:

        #!/usr/bin/php
        <?php
        # Set the first line to the location of your PHP binary.

        # Default number of snapshots to keep per volume
        $keep_qty = 2;

        # Per-volume overrides: map each volume ID to the number of snapshots to keep.  Volumes not listed here use the default $keep_qty.
        $keep_by_volume = array("vol-identifier-1" => 2, "vol-identifier-2" => 10, "vol-identifier-3" => 6, "vol-identifier-4" => 10);

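        # This script depends on the EC2 API tools and their environment variables.
        # $out stores the output of the command-line tool "ec2-describe-snapshots", one line per element.
        $out = array();
        $status = 0;
        exec('ec2-describe-snapshots', $out, $status);
        if ($status !== 0) {
                fwrite(STDERR, "ec2-describe-snapshots failed with exit status $status\n");
                exit(1);
        }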

        /*
        The command line output looks like this:
        SNAPSHOT	snap-0Xa4X6Xe	vol-ident-1	completed	2009-10-29T05:00:34+0000	100%
        SNAPSHOT	snap-7X3XdXX8	vol-ident-1	completed	2009-10-30T05:00:34+0000	100%
        SNAPSHOT	snap-X6aX2XX5	vol-ident-2	completed	2009-10-29T05:00:21+0000	100%
        SNAPSHOT	snap-XeXXcX76	vol-ident-2	completed	2009-10-29T10:00:08+0000	100%       
        SNAPSHOT	snap-X8XdXfX1	vol-ident-2	completed	2009-10-29T16:00:18+0000	100%
       */

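        # Store each snapshot in its own array element.
        $snaps = array();
        foreach ($out as $snap) {
                # The output above is tab-separated; explode() replaces the deprecated split().
                # Skip anything that isn't a SNAPSHOT record (blank lines or any other record type).
                $fields = explode("\t", $snap);
                if (count($fields) < 5 || $fields[0] !== 'SNAPSHOT') {
                        continue;
                }
                $snaps[] = $fields;
        }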

        # Convert the start time (5th column) to a Unix timestamp so it sorts numerically.
        foreach ($snaps as $inx => $s) {
                $snaps[$inx][4] = strtotime($s[4]);
        }

        # array_multisort() sorts on separate column arrays, so pull out the two sort keys
        # (volume ID and timestamp) and sort $snaps along with them:
        # by volume ascending, then newest snapshot first.
        $vol_col = array();
        $time_col = array();
        foreach ($snaps as $key => $row) {
                $vol_col[$key] = $row[2];
                $time_col[$key] = $row[4];
        }

        # sort it
        array_multisort($vol_col, SORT_ASC, $time_col, SORT_DESC, $snaps);

        # Now build a consolidated array mapping each volume ID to a comma-separated
        # list of its snapshot IDs, newest first.
        $all_snaps = array();
        foreach ($snaps as $s) {
                if (empty($all_snaps[$s[2]])) {
                        $all_snaps[$s[2]] = $s[1];
                } else {
                        $all_snaps[$s[2]] .= "," . $s[1];
                }
        }

        /*
         # At this point, the array looks like this:
         $all_snaps['volume-id-1'] = 'snapshot-id-1,snapshot-id-2'
         $all_snaps['volume-id-2'] = 'snapshot-id-1,snapshot-id-2'
        */

        # Since the snapshots are sorted from newest to oldest, we can walk each volume's list
        # and delete every entry past that volume's keep count.
        foreach ($all_snaps as $volume => $vol_snaps) {
                $snap_arr = explode(",", $vol_snaps);

                $count = 1;
                # is this volume in the special keep_by_volume array we declared up top?
                if (array_key_exists($volume, $keep_by_volume)) {
                        $keep = $keep_by_volume[$volume];
                } else {
                        $keep = $keep_qty;
                }

                # Show how many snapshots we're keeping for this particular volume
                print "Volume $volume; keeping $keep\n";

                # Iterate through snapshots for this volume
                foreach ($snap_arr as $s) {
                        if ($count <= $keep) {
                                print "Keeping: $volume/ $s \n";
                        } else {
                                # Delete the snapshot and print the output of the ec2-delete-snapshot command.
                                # escapeshellarg() guards against unexpected characters in the ID.
                                print "Deleting: $volume/ $s\n";
                                $out = array();
                                $cmd = "ec2-delete-snapshot " . escapeshellarg($s);
                                exec($cmd, $out, $status);
                                print "Output from command $cmd (exit status $status): \n";
                                foreach ($out as $o) {
                                        print $o . "\n";
                                }
                        }
                        $count++;
                }
        }

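Because the script deletes as soon as it decides, a dry run is a cheap safety net before putting it in cron. The following is a minimal sketch that reproduces the same decisions (group by volume, newest first, keep N) as a pure function over the `ec2-describe-snapshots` lines, so it can be checked against the sample output without touching AWS. It uses `usort()` instead of the column-array `array_multisort()` approach, needs PHP 7.4 or later for the arrow function, and assumes the column layout shown in the sample (SNAPSHOT, snapshot ID, volume ID, status, start time, progress). `plan_deletions` is a hypothetical name.

```php
<?php
/**
 * Decide which snapshots would be deleted, without deleting anything.
 * Assumes tab-separated lines laid out as in the sample output:
 * SNAPSHOT <snapshot-id> <volume-id> <status> <start-time> <progress>
 * Returns volume ID => snapshot IDs to delete, newest first.
 */
function plan_deletions(array $lines, array $keep_by_volume, int $keep_qty): array
{
    $by_volume = [];
    foreach ($lines as $line) {
        $f = explode("\t", trim($line));
        if (count($f) < 5 || $f[0] !== 'SNAPSHOT') {
            continue; // skip blank lines and any non-snapshot records
        }
        $by_volume[$f[2]][] = ['id' => $f[1], 'time' => strtotime($f[4])];
    }

    $plan = [];
    foreach ($by_volume as $volume => $snaps) {
        // Newest first, comparing Unix timestamps.
        usort($snaps, fn($a, $b) => $b['time'] <=> $a['time']);
        $keep = $keep_by_volume[$volume] ?? $keep_qty;
        $doomed = array_slice($snaps, max(0, $keep));
        if ($doomed) {
            $plan[$volume] = array_column($doomed, 'id');
        }
    }
    return $plan;
}
```

With the five sample lines shown in the script's comment and `$keep_qty = 2`, only `snap-X6aX2XX5` on `vol-ident-2` would be selected; `vol-ident-1` has exactly two snapshots and keeps both.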

So there you have it. I'd love to see someone post a Bash version of a script that does this exact same thing.

I hope you found this useful.

UPDATE: I was made aware of an existing solution that does much of what I've built here. Oren Solomianik offered his solution as an alternative; see http://code.google.com/p/ec2-delete-old-snapshots/. It looks pretty good! The main difference is that his script decides based on how many days old a snapshot is, whereas mine guarantees that a certain number of backups (configurable by volume ID) is always available.

Also, Oren's tool uses the Amazon EC2 PHP Library, which I purposely chose not to use. It's not that I have anything against it; I was simply trying to avoid extra requirements.
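To make the difference between the two policies concrete: a pure age-based rule deletes anything older than a cutoff, which can leave a volume with no snapshots at all if its backups stop running, while a count-based rule always leaves the newest N. Here is a minimal sketch of an age-based selector for comparison. It illustrates the idea only and is not code from Oren's project; the function name and sample data are hypothetical.

```php
<?php
/**
 * Age-based selection: return IDs of snapshots older than $days days
 * relative to $now (a Unix timestamp). $snaps maps snapshot ID => start time.
 */
function older_than(array $snaps, int $days, int $now): array
{
    $cutoff = $now - $days * 86400; // seconds per day; fine for UTC timestamps
    $old = [];
    foreach ($snaps as $id => $started) {
        if (strtotime($started) < $cutoff) {
            $old[] = $id;
        }
    }
    return $old;
}

$snaps = [
    'snap-a' => '2009-10-29T05:00:34+0000',
    'snap-b' => '2009-10-30T05:00:34+0000',
];
$now = strtotime('2009-11-15T00:00:00+0000');

// Both snapshots are more than 7 days old, so an age-based rule removes both,
// whereas a count-based rule with keep = 2 would remove neither.
$old = older_than($snaps, 7, $now);
```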

Another update: dohpaz42 points out that the split function is deprecated as of PHP 5.3 and shouldn't be depended on; use explode instead. (split was removed entirely in PHP 7.0.) Thanks, Ken!!
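`split()` took a POSIX regular expression, so it has two replacements: `explode()` when the delimiter is a fixed string (the tab in the EC2 output, or the comma used to join snapshot IDs), and `preg_split()` when you really need a pattern. A quick sketch using one of the sample lines from the script; the whitespace pattern is only an illustration of when `preg_split()` is the better fit.

```php
<?php
$line = "SNAPSHOT\tsnap-0Xa4X6Xe\tvol-ident-1\tcompleted\t2009-10-29T05:00:34+0000\t100%";

// Fixed delimiter: explode() is the direct replacement for split("\t", ...).
$fields = explode("\t", $line);

// Pattern delimiter: preg_split() with a PCRE pattern, here any run of whitespace,
// which would also cope with columns padded by spaces instead of tabs.
$fields_ws = preg_split('/\s+/', trim($line));

// First three columns: record type, snapshot ID, volume ID.
list($type, $snapshot_id, $volume_id) = $fields;
```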

{ 5 } Comments

  1. David Gildeh | March 12, 2010 at 12:57 pm

    Hi Alvin,

    This is a great script. The only issue is that you hard-code the arguments (volumes and number to keep) in the script instead of taking them as command-line arguments. I’ve combined the best of both, mainly your script plus the command-line handling from ec2-delete-old-snapshots, with a few other improvements, to completely automate our backups:

    http://www.sambastream.com/blogs/dgildeh/12-03-10/implementing-revolving-backups-aws-ec2

    Hope this helps,

    Thanks,

    David

  2. Alvin Kreitman | March 12, 2010 at 1:10 pm

    Very nice David. I think these are great improvements.

  3. ajmfulcher | April 29, 2011 at 3:36 pm

    I’ve put together a little web application to automate EBS snapshots and retention: Ebs2s3. More details here: http://ajmfulcher.blogspot.com/2011/04/ebs2s3-automated-backup-for-amazon-ebs.html

    Hope it’s useful!

  4. Colin Johnson | October 7, 2012 at 7:37 pm

    I’ve created “AWS Missing Tools”, an open source tool that can back up multiple EBS volumes in one run and automatically purge old snapshots. Running it from cron, you could configure it as follows: ec2-automate-backup -s tag -t Backup=true -k 14 -p. When run, that command backs up all EBS volumes tagged “Backup=true” and sets the resulting snapshots to be purged automatically after 14 days.

  5. Robert | April 20, 2013 at 4:42 pm

    Hi Alvin

    Great post!

    We had exactly the same problem: loads of unused volumes and snapshots.

    We used a combination of EC2 commands and Bash scripts, which we run every few months as a spring clean.

    Let me know what you think.

    http://www.robertsindall.co.uk/blog/how-to-clean-up-amazon-ebs-volumes-and-snapshots/

    Cheers

    Robert

{ 2 } Trackbacks

  1. […] good thing about this is that it doesn’t have to be modified as you add new volumes. Please see my earlier posting on snapshot cleanup to have a complete […]

  2. […] PHP script is based on Alvin Kreitman’s PHP script here which I liked because it works by keeping the last N snapshots (as opposed to time period like the […]
