Amazon S3 Backup Script

After years of a somewhat laissez-faire attitude toward my VPS backups, I experimented with a bunch of crappy solutions: SpiderOak (choked on the billions of files in my Maildir), Dropbox (same problem, plus it generated enormous local cache files), and rsync (worked great, but required a second VPS). My current solution is as follows:

#!/bin/bash
BACKUPDIR=/home/joel/backups

FILES="/home/joel/Maildir/ /home/joel/mailbak/ /home/joel/www/"

# Remove the SNAR file if it's the first of the month. Or just archive for
# now?

# Month-day stamp used in the archive filename
DOM=`date +%m-%d`
# Year-month stamp: a new SNAR file (and therefore a full backup) each month
SNAR=`date +%y-%m`

# $FILES is deliberately left unquoted so it expands to the paths listed above
tar vcf "$BACKUPDIR/backup.$DOM.tlz" --lzma -g "$BACKUPDIR/backup-$SNAR.snar" \
 --exclude-from "$BACKUPDIR/exclude" --no-check-device $FILES

s3cmd put "$BACKUPDIR/backup.$DOM.tlz" s3://my.s3.bucket/

# Delete the local copy only if the upload succeeded; otherwise stop here
# so the failure message ends up in the cron mail.
if [ $? -eq 0 ]
then
  rm "$BACKUPDIR/backup.$DOM.tlz"
else
  echo "UPLOAD FAILED - Manual intervention required."
  exit 1
fi

It uses GNU tar's built-in incremental backup feature. On the first day of each month I get a full backup, because the SNAR file is named after the current month and so a fresh month's SNAR doesn't exist yet. The archive is then uploaded to Amazon S3, whose free tier gives you a generous 5 GB of storage before you have to start paying.
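
Restoring from a listed-incremental set means extracting the month's full archive first and then each daily archive in order. A minimal sketch, assuming the files have been pulled back down with s3cmd (the dates and /restore/target are illustrative):

# Fetch one archive and unpack it onto the restore tree.
# -g /dev/null makes tar honour the incremental metadata without
# needing (or touching) the original .snar file.
s3cmd get "s3://my.s3.bucket/backup.03-01.tlz" /tmp/backup.03-01.tlz
tar -x --lzma -g /dev/null -f /tmp/backup.03-01.tlz -C /restore/target
# ...then repeat for backup.03-02.tlz, backup.03-03.tlz, and so on.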

I found that the --no-check-device option was important: the SNAR file records device numbers, and a reboot of my OpenVZ VPS changes them, which made tar think that everything had changed.

The script then checks for the beginning of a new month and, upon successful upload, deletes all of the previous month's backups so that I stay under 5 GB. Ideally I'd keep two full backups around so that I always had at least a month of incremental history, but for my current use case that isn't very important. Another option is to have S3 move things to Glacier automatically after a certain time; there's a sketch of that after the snippet below.

# Clean up previous month's backups if upload was successful
if [ `date +%d` = "01" ]
then
  for delfile in `s3cmd ls s3://my.s3.bucket/ | cut -d \/ -f 4`; do

    [ "$delfile" != "backup.$DOM.tlz" ] && s3cmd del "s3://my.s3.bucket/$delfile"

  done
fi
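
As an alternative to deleting old archives outright, a lifecycle rule on the bucket can move them to Glacier automatically. A rough sketch, assuming an s3cmd new enough to have setlifecycle (the 30-day window and the rule ID are arbitrary, and the exact XML S3 expects has shifted a little over the years):

# Transition anything starting with "backup." to Glacier after 30 days
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>archive-old-backups</ID>
    <Prefix>backup.</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
EOF
s3cmd setlifecycle lifecycle.xml s3://my.s3.bucket/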

If it's run from cron, it emails you the list of changed files every day; but because tar prints the name of every directory as well as every changed file, I also added a | grep -v "/$" to the tar command so that the list is short enough that I don't just ignore it.
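
In practice that just means tacking the filter onto the end of the tar line from above; the archive itself goes to the file named by f, so the pipe only touches the verbose listing:

tar vcf "$BACKUPDIR/backup.$DOM.tlz" --lzma -g "$BACKUPDIR/backup-$SNAR.snar" \
 --exclude-from "$BACKUPDIR/exclude" --no-check-device $FILES | grep -v "/$"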

s3cmd uploads data as fast as it can, which was swamping my VPS and making it unresponsive. I've created a version of s3cmd with rate limiting, which adds a --speedlimit option.
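
If you'd rather not maintain a patched s3cmd, a generic traffic shaper such as trickle can cap the upload from the outside. A rough sketch, assuming trickle is installed (the 512 KB/s cap is arbitrary):

# -s: standalone mode (no trickled daemon); -u: upload cap in KB/s
trickle -s -u 512 s3cmd put "$BACKUPDIR/backup.$DOM.tlz" s3://my.s3.bucket/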

If you want to encrypt files before uploading them, s3cmd has a -e option that shells out to GPG. It works fine except that it creates a temporary file as big as your tar archive, which could be enormous. It's better to just encrypt on the fly, with something like:

tar cf - --lzma -g "$BACKUPDIR/backup-$SNAR.snar" --exclude-from "$BACKUPDIR/exclude" \
 --no-check-device $FILES \
 | gpg -c --no-use-agent --passphrase-file /root/backups/passphrase --yes \
 -o "$BACKUPDIR/backup.$DOM.tlz.gpg" -