Simple S3 Log Archival

UPDATE: if anyone knows of a non-broken syntax highlighting plugin for wordpress that supports bash or some other shell syntax, let me know :-/

Apache logs, database backups, and the like can get large on busy web sites. If you rotate logs or perform backups regularly, they also get numerous, and as we all know, large * numerous = expensive, or rapidly filling disk partitions, or both.

Amazon’s S3 service, along with a simple downloadable suite of tools, and a shell script or two can ease your life considerably. Here’s one way to do it:

  1. Get an Amazon Web Services account by going to the AWS website.
  2. Download the ‘aws’ command line tool from here and install it.
  3. Write a couple of shell scripts, and schedule them using cron.

Once you have your Amazon account, you’ll be able to get an access key and secret key. You can copy these to a file and aws will use them to authenticate operations against S3. The aws utility’s web site (in #2 above) has good documentation on how to get set up in a flash.
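
As a quick illustration, the setup boils down to something like the snippet below. The ~/.awssecret location and the two-line format are what the aws tool's documentation described when I set this up; double-check against the docs for whatever version you download:

# Store the credentials where the 'aws' tool can find them.
# Line 1 is the Access Key ID, line 2 is the Secret Access Key.
cat > ~/.awssecret <<'EOF'
YOUR_ACCESS_KEY_ID
YOUR_SECRET_ACCESS_KEY
EOF
chmod 600 ~/.awssecret   # keep the secret readable only by you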

With items 1 and 2 out of the way, you’re just left with writing a shell script (or two) and scheduling them via cron. Here are some simple example scripts I used to get started (you can add more complex/site-specific stuff once you know it’s working).

The first one is just a simple log compression script that gzips the log files and moves them out of the directory where the active log files live. It has nothing to do with Amazon Web Services, so you can use it on its own if you want:

#!/bin/bash

# Compress rotated web logs older than a day and move them to the archive directory.
LOGDIR='/mnt/fs/logs/httplogs'
ARCHIVE='/mnt/fs/logs/httplogs/archive'

if cd "$LOGDIR"; then
    # Only touch rotated logs (*_log.*) that haven't been modified in over a day.
    for i in `find . -maxdepth 1 -name "*_log.*" -mtime +1`; do
        gzip "$i"
    done

    mv "$LOGDIR"/*.gz "$ARCHIVE"/.
else
    echo "Failed to cd to log directory"
fi

Before launching this in any kind of production environment, you might want to add a few more features, like checking that the archive partition has enough free space before moving things onto it, but this is a decent start.
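
For instance, a bare-bones free-space check (reusing the $ARCHIVE variable from the script above; the 10GB threshold is an arbitrary placeholder, tune it for your setup) might look roughly like this:

# Minimal sketch of a free-space check before moving archives.
# 'df -P' prints available space in 1K blocks; bail out if we're under ~10GB.
MIN_FREE_KB=$((10 * 1024 * 1024))
FREE_KB=`df -P "$ARCHIVE" | awk 'NR==2 {print $4}'`
if [ "$FREE_KB" -lt "$MIN_FREE_KB" ]; then
    echo "Not enough free space on the archive partition; skipping move"
    exit 1
fi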

The second one is a wrapper around the aws ‘s3put’ command, and it moves stuff from the archive location to S3. It checks a return code, and then if things went ok, it deletes the local gzip files.

#!/bin/bash

# Push the compressed logs up to S3, then remove the local copies on success.
cd /mnt/fs/logs/httplogs/archive || exit 1

for i in *.gz; do
    [ -e "$i" ] || continue    # skip the literal '*.gz' when there's nothing to upload
    s3put addthis-logs/ "$i"
    if [ $? -eq 0 ]; then
        echo "Moved $i to s3"
        rm -f "$i"
    else
        echo "Failed to move $i to s3... Continuing"
    fi
done

I wish there were a way in aws to check for the existence of an object in a bucket without it trying to cat the file to stdout, but I don’t think there is. That would be a more reliable check than just looking at the return code. I’ll work on that at some point.
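
If it turns out the tool’s listing command (s3ls / ‘aws ls’) accepts a bucket/prefix argument, a post-upload check might look roughly like the sketch below. To be clear, that listing syntax is an assumption on my part, not something I’ve verified, so check the aws docs before relying on it:

# Hypothetical post-upload check; assumes 's3ls addthis-logs/<key>' prints
# matching key names. Verify the syntax against the aws documentation first.
if s3ls "addthis-logs/$i" | grep -q "$i"; then
    echo "Verified $i in the bucket; removing local copy"
    rm -f "$i"
else
    echo "Could not verify $i in the bucket; keeping local copy"
fi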

Scheduling all of this in cron is left as an exercise for the reader. I purposely split the work into two scripts so I could run the compression script every day but the archival script only once a week or so. You could also write a third script that checks the disk usage on your log partition and runs either or both of the other scripts if it climbs too high.
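
For what it’s worth, the crontab entries could look something like this; the script names and log path are placeholders, so substitute wherever you actually keep yours:

# m h dom mon dow  command
# Compress yesterday's logs every night at 1am; push to S3 on Sundays at 3am.
0 1 * * * /usr/local/bin/compress_logs.sh >> /var/log/log_archive.log 2>&1
0 3 * * 0 /usr/local/bin/archive_logs_to_s3.sh >> /var/log/log_archive.log 2>&1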

I used ‘aws’ because it was the first tool I found, by the way. I have only recently found ‘boto’, a Python-based utility that looks like it’s probably the equivalent of the Perl-based ‘aws’. I’m happy to have found that and look forward to giving it a shot!

  • http://standalone-sysadmin.blogspot.com Matt Simmons

    This is interesting.

    What sort of things do you use AWS for? Are you using S3 and EC2?

  • m0j0

    Hi Matt,

    I use S3 for log archival, and soon I’ll be moving database backups (old ones) to S3 as well. Other things might migrate there over time as I become better acquainted with the service. I’m still not confident enough to host anything “live” there, but I know people are doing that.

    I started out by diving head first into Hadoop, S3, EC2… the works. What I found is that it requires you to really immerse yourself in the ways of AWS. It’s a great service, but there’s a lot of stuff that isn’t done – mostly in the area of administrative tools. I also had a couple of conceptual problems relating to failover/redundancy/architecting-for-failure within the EC2 service environment, and I had so many other things on my plate (still do, unfortunately) that this has been put on the back burner for the time being.

    Things are progressing rapidly in EC2-land. Soon we’ll have persistent storage, which actually solves a number of the other issues I had with EC2 (via hacks that would rely on persistent storage), and IP addresses that are at least somewhat predictable. People are also writing better tools to manage all of this stuff, and there’s more reading material about how to make running services in that environment a less sanity-depleting experience. :)

  • http://standalone-sysadmin.blogspot.com Matt Simmons

    Cool, thanks for the rundown on it!

    I’m going to keep an eye on this. We recently invested in 20 blades for our primary and secondary sites, and a 50k SAN, so I don’t think we’d use it this round, but in the future I can see where this would be very handy.

    How long have you been using it?

  • http://bzimmer.ziclix.com Brian Zimmer

    For the syntax highlighting, you might want to try Pygments, which claims to support bash. I wrote about my use of it here though I don’t use it as a plugin.

  • http://weblog.bluepenguin.us Paul Holbrook

    It’s a clever idea, but how cost effective is it? At .15/GB/month, 100 gig of S3 storage costs you $180 a year.

    I guess it depends on your alternatives. Certainly commodity hard drives are far cheaper, but enterprise level SAN/NAS storage is much more.