Somewhere within the first two weeks after Catherine was born, I found the time to implement a (mostly) comprehensive backup strategy for our website data and our home computer data. We’ve actually been operating without any type of backup for a while now, living dangerously as it were. So I decided that with all of the changes we’ve been going through recently, and the rapid growth in the amount of data we’ve been collecting (web and email content, images, video, etc.), it would be smart to implement a good method for backing everything up.
If you’re interested in how I set it up, please read on.
My goal in developing this backup strategy was to create a system whereby all of the data that was important to us would be backed up to at least one duplicate medium on a nightly basis, for the purposes of data retrieval if a file should be deleted or a drive should fail. Other measures such as high availability were not part of the plan; as this is just a personal backup scenario, I don’t need to spend the time or money to implement a system that can automatically switch over in case of emergency. I’m fine with manually recovering the system if necessary, provided I have backup files in place to do so. Also, ‘data that is important to us’ does not include applications and most configuration files, since they can be easily replaced by downloading or reinstalling from the original installation discs.
There are several pieces of hardware and software involved in this system:
* 80 GB internal SATA drive, in our Mac mini Core Duo. This drive holds the core operating system, applications, and basic user data.
* 300 GB external PATA drive, connected to our Mac mini Core Duo. This drive holds all media files: iTunes, iPhoto, iMovie, iDVD, etc.
* 300 GB external PATA drive, connected to our Mac mini Core Duo. This drive serves to back up everything that is on the primary 300 GB media drive.
* [rsync](http://samba.anu.edu.au/rsync/), a piece of open source software that can be used to synchronize files between remote or local directories. It is included by default with [Mac OS X](http://www.apple.com/macosx/).
* rsync-backup, a Perl script I wrote that wraps around rsync and allows me to write simple configuration files to tell rsync how and where to back up data. If anyone is interested, I can supply the code to this script.
* mysqldump, part of the [MySQL](http://www.mysql.com/) database software package. It allows MySQL databases to be dumped into raw files so that they can be transferred to other machines or backed up.
* dump-mysql, a Perl script I wrote that wraps around mysqldump and allows me to write simple configuration files to tell mysqldump what to back up. I can supply the code to this script, as well, if anyone is interested.
* [cron](http://en.wikipedia.org/wiki/Cron), a software service that allows programs to run automatically at specified times, with no user intervention needed. cron is also included by default with [Mac OS X](http://www.apple.com/macosx/).
* EnergySaver, software built into [Mac OS X](http://www.apple.com/macosx/) that allows the user to control power-saving features, as well as schedule unattended startup and shutdown times.
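The real dump-mysql is a Perl script, and its configuration format isn't shown in this post, so the following is only a rough Python sketch of the general idea: read a small configuration file and turn it into one `mysqldump` command per database. The config layout, the `build_dump_commands` name, the user name, the paths, and the database names are all made up for illustration. It also assumes the MySQL password lives in an option file such as `~/.my.cnf` rather than on the command line.

```python
import configparser
from pathlib import PurePosixPath

# Hypothetical configuration; the actual dump-mysql format is not shown in the post.
EXAMPLE_CONFIG = """
[dump]
user = backup
output_dir = /home/example/db-dumps
databases = wordpress movabletype gallery2
"""


def build_dump_commands(config_text):
    """Return one mysqldump argument list per database named in the config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["dump"]

    output_dir = PurePosixPath(section["output_dir"])
    user = section["user"]

    commands = []
    for database in section["databases"].split():
        dump_file = output_dir / (database + ".sql")
        commands.append([
            "mysqldump",
            "--user=" + user,                  # password assumed to come from ~/.my.cnf
            "--result-file=" + str(dump_file),  # write the dump to a file instead of stdout
            database,
        ])
    return commands
```

Each returned list could be handed to `subprocess.run(command, check=True)` on the web host. Keeping the command building separate from the running makes it easy to print the commands and check them before anything touches the database.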
Here is how everything fits together each night:
1. Just before midnight every night, dump-mysql dumps the contents of all of the [MySQL](http://www.mysql.com/) databases on our web host into files in my home directory there. This is important because almost everything on our website stores its content in MySQL: the Movable Type and WordPress blogging systems, our Gallery2 installation, and some wiki software. These databases aren’t normally stored in my home directory on the web host, and thus aren’t accessible for backup purposes unless they are dumped to files. Once the databases have been dumped to a directory, they are ready to be picked up for backup along with the rest of our website data (images, HTML files, etc.).
2. At around midnight, EnergySaver wakes up our Mac mini. I put it to sleep every night before I go to bed, in order to save energy and keep the house nice and quiet. I wake it up a few minutes before everything starts, just in case it needs some time to wipe the proverbial crud out of its eyes. 😉 Such lead time probably isn’t necessary, but it makes me feel better.
3. A few minutes after midnight, cron on our Mac mini starts the first of three rsync-backup jobs. This job synchronizes the files from our web hosting provider with my Mac OS user’s home directory, located on the 80 GB internal drive.
4. After this job finishes, the second job starts. This job synchronizes my user’s home directory on the 80 GB internal drive into a directory on the primary 300 GB media drive. This serves to back up all of my preference and bookmark files, as well as the files synchronized from our web hosting provider.
5. Finally, the third job runs. This one synchronizes the primary 300 GB media drive with the backup 300 GB drive. The backup drive is only written to when the backups are running – for regular daily use, I only access the primary media drive. This helps preserve the integrity of the backup drive.
6. After all of the jobs are finished, the Mac puts itself back to sleep.
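The three rsync stages in the steps above can be sketched roughly as follows. This is not the rsync-backup script itself (that one is written in Perl and driven by its own configuration files); it is a minimal Python illustration of running the jobs in order and stopping if one fails. The host name, the paths, the volume names, and the choice of plain `rsync -a` are all assumptions.

```python
import subprocess

# Hypothetical sources and destinations standing in for the three stages.
# A trailing slash on the source tells rsync to copy the directory's contents.
JOBS = [
    ("web host -> home directory", "user@example.com:site/", "/Users/example/Backups/site/"),
    ("home directory -> media drive", "/Users/example/", "/Volumes/Media/Backups/home/"),
    ("media drive -> backup drive", "/Volumes/Media/", "/Volumes/MediaBackup/"),
]


def rsync_command(source, destination):
    """Build the rsync argument list; -a (archive) preserves times, permissions, and symlinks."""
    return ["rsync", "-a", source, destination]


def run_jobs(jobs, run=subprocess.call):
    """Run each job in order and stop at the first failure.

    `run` takes an argument list and returns an exit code. It defaults to
    subprocess.call, and can be swapped for a stand-in when testing.
    Returns the names of the jobs that completed successfully.
    """
    completed = []
    for name, source, destination in jobs:
        exit_code = run(rsync_command(source, destination))
        if exit_code != 0:
            break  # later stages depend on this one, so don't continue
        completed.append(name)
    return completed
```

Stopping at the first failure matters because each stage feeds the next: if the copy from the web host fails, there is little point in pushing a stale home directory further down the chain. A crontab entry along the lines of `5 0 * * * /usr/bin/python3 /path/to/nightly_backup.py` (a hypothetical path) would start the whole sequence at 12:05 a.m.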
This all happens whilst mother, father, and daughter are all blissfully sleeping. Unless, of course, mother and baby are blissfully awake for a feeding. In that case, mother and daughter are treated to the warm glow of the Mac as it wakes up, and father remains blissfully asleep, knowing that his family and his Mac are well taken care of. 😉
So in the end, we have:
* All of my user preferences backed up in two places (each of the 300 GB external drives).
* All of our media content backed up in one place (the backup 300 GB external drive).
* All of our website content and database information backed up in *four* places! This is my *favorite* part of the backup strategy. [Site5](http://www.site5.com/) keeps their own backups, which we can access upon request, then we have the content backed up on the Mac mini’s internal hard drive, then *that* is backed up onto the primary 300 GB drive, which is then backed up to the backup 300 GB drive. Niiiiiiiiiice.
At the moment, the major shortcoming of this system is that we have no off-site backup. In other words, if our apartment were to go up in flames, or our computer equipment were to be stolen, all of this backup data would be lost along with the originals. Sure, we have insurance that will pay for the contents of our apartment (with special schedules for the computer equipment), but that doesn’t recover the data for you.
Ideally, I would like to implement a weekly or monthly off-site backup, wherein I would take a complete backup of our system to a safe deposit box. The easiest way to do this would be to purchase another external hard drive, clone the first backup drive onto it, and then take one of the drives to the bank. Then, we would periodically take the working backup drive to the bank, take the other drive out of the bank, and put it in place at home as the new working backup drive. The drive at the bank would always be a bit out of date if we needed to do serious disaster recovery, but it’s certainly a lot better than having no data at all. The only thing stopping me from doing this is simply trying to work the hardware purchase and the safe deposit box into our budget. Actually, the safe deposit box would probably be a good idea in general, because we have a number of other things that should probably be stored there. I’ve looked into them before and they aren’t terribly expensive (at least not for the size we would need), so maybe we’ll do that soon. We’ll see. 🙂
I hope that this discussion of backup strategy was enlightening and thought-provoking. Please let me know if you have any questions about how I have set things up, or comments and suggestions for improving the system.