Time Machine vs. ZFS + rsync

Update: I actually got the fslogger idea from the end of this entry working, so I can now do incremental backups. It’s not really a product yet, but it isn’t hard to do. Here is the super rough version of it.

I can’t stand inefficiency. Time Machine is fundamentally a very inefficient mechanism for backing up large files that change. It’s so bad, in fact, that products like Parallels and VMware exclude your disk images from backups. Here is the basic algorithm:

1) Get the list of files that have changed since the last backup
2) Create a new directory in the backup store
3) Copy every file that has changed since the last backup
4) For every file (or even whole directory) that has not changed, create a hard link in the new backup pointing to its copy in the last backup
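The four steps can be sketched in plain shell. This is a hypothetical helper (`hl_backup`), not Apple’s implementation. It approximates step 1 by comparing file contents with `cmp` rather than tracking filesystem events. It also handles only regular files, with no directory hard links.

```shell
#!/bin/sh
# Hypothetical helper illustrating hard-link snapshots (not Apple's code).
# Usage: hl_backup SRC PREV NEW
#   SRC  directory to back up
#   PREV previous backup (need not exist on the first run)
#   NEW  backup directory to create
hl_backup() {
    src=$1 prev=$2 new=$3
    mkdir -p "$new" || return 1                    # step 2: new backup directory
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        mkdir -p "$new/$(dirname "$f")"
        # step 1 (approximated): unchanged if the bytes match the last backup
        if [ -f "$prev/$f" ] && cmp -s "$src/$f" "$prev/$f"; then
            ln "$prev/$f" "$new/$f"                # step 4: hard link, no new data
        else
            cp -p "$src/$f" "$new/$f"              # step 3: whole-file copy
        fi
    done
}
```

Even in this sketch, a one-byte change to a multi-gigabyte VM image falls through to `cp` and copies the entire file.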

Step 1 is pretty efficient for Time Machine because it hooks into the filesystem to track changes as they occur. Step 2 is obviously easy. Step 3 is a doozy: if you change 1 byte in a VMware image, it copies the whole multi-gigabyte file over to the backup store. That’s a poor result for such a small change, and it would quickly consume your disk, flushing valuable older changes out of the system. Step 4 is also very efficient because hard links are trivial to create and use virtually no space, though Apple did have to make special changes to HFS+ to allow hard links to directories so that Time Machine could be more efficient.

The obvious big problem here is that if a file changes at all, you need to copy the whole thing to your backup device. That isn’t viable over the internet, or even over WiFi, for really big files that are updated often, like VM images. If you’ve wondered why Apple is considering integrating ZFS directly into Mac OS X, now you know why. ZFS lets you do something very special: create a snapshot of a whole filesystem. A snapshot is essentially a copy of that filesystem at a particular point in time. ZFS makes it without copying whole files when they change, because it tracks changes at the block level instead. This capability is the key to a more efficient way to back up your system with multi-level snapshots.
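Here is the block-level point made concrete. The sketch below (a hypothetical `changed_blocks` helper) counts how many fixed-size blocks differ between two versions of a file. Roughly, a copy-on-write snapshot only needs to keep the old versions of those blocks, not a second copy of the whole file. The 128 KiB default matches ZFS’s default `recordsize`. Real ZFS space accounting also involves metadata and compression, so treat the count as an approximation.

```shell
#!/bin/sh
# Hypothetical helper: count fixed-size blocks that differ between two
# equally sized files. Usage: changed_blocks OLD NEW [BLOCKSIZE]
# BLOCKSIZE defaults to 131072 bytes (128 KiB, ZFS's default recordsize).
changed_blocks() {
    # cmp -l prints one line per differing byte: "<1-based offset> <old> <new>"
    cmp -l "$1" "$2" |
        awk -v bs="${3:-131072}" '
            { b = int(($1 - 1) / bs)          # block index of this byte
              if (!(b in seen)) { seen[b] = 1; n++ } }
            END { print n + 0 }'
}
```

For example, changing one byte anywhere in a VM image yields exactly 1 changed block. That means about 128 KiB of new snapshot data instead of gigabytes.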

Enter rsync. Rsync has been around for a long time. System administrators everywhere use it to efficiently update files in one location with files from another location, even over the internet. It does this by comparing files at the block level and sending only the differences needed to update the files on the other end. With the right command line options you can essentially make one filesystem a carbon copy of another. Combining rsync with ZFS snapshots gives you a backup solution that is much better than most out there:
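Roughly, the receiving side splits its old copy into blocks and sends a checksum for each one. The sender then transmits only the blocks whose checksums don’t match. The sketch below shows that idea using aligned blocks and POSIX `cksum`; the helpers `block_sums` and `blocks_to_send` are hypothetical. It is a simplification: real rsync also uses a rolling checksum to find matching data at shifted offsets, plus a stronger hash to confirm matches.

```shell
#!/bin/sh
# Hypothetical helpers sketching rsync's idea with aligned blocks only.
# block_sums FILE [BS]: print "<index> <cksum> <bytes>" for each block.
block_sums() {
    f=$1 bs=${2:-1024}
    size=$(wc -c < "$f" | tr -d ' ')
    n=$(( ($size + $bs - 1) / $bs ))
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s %s\n' "$i" \
            "$(dd if="$f" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)"
        i=$((i + 1))
    done
}

# blocks_to_send NEW OLD_SIGS [BS]: print indices of NEW's blocks whose
# checksum differs from the signature list the receiver sent (OLD_SIGS).
blocks_to_send() {
    block_sums "$1" "${3:-1024}" |
        awk -v sigs="$2" '
            BEGIN { while ((getline line < sigs) > 0) {
                        split(line, a, " "); old[a[1]] = a[2] " " a[3] } }
            { if (old[$1] != $2 " " $3) print $1 }'
}
```

Only the signature file and the mismatched blocks would need to cross the network. That is why rsync stays cheap for a large file with a small change.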

1) Rsync your current filesystem to a ZFS filesystem — remote or attached storage
2) Take a snapshot of the resulting filesystem to forever capture its state

Those are the two steps. Nothing more. Here is the script that I use to backup my Macbook Air to my server at home:

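#!/bin/sh
# Mirror the home directory "sam" into the ZFS filesystem on the home server
cd /Users || exit 1
time rsync -av --delete sam 192.168.1.90:/Volumes/zdisk/macbookair
# Snapshot the result, named after the current Unix time (backticks expand locally)
ssh 192.168.1.90 sudo zfs snapshot zdisk/macbookair@`date "+%s"`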

This results in a set of filesystems and snapshots that looks like this:

NAME                          USED  AVAIL  REFER  MOUNTPOINT
zdisk/macbookair             14.9G   898G  14.6G  /Volumes/zdisk/macbookair
zdisk/macbookair@1225350709   125M      -  14.6G  -
zdisk/macbookair@1225351248   117M      -  14.6G  -
zdisk/macbookair@1225418584  21.7M      -  14.6G  -
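Each snapshot is named with `date "+%s"`, so the suffix is a Unix timestamp. The small hypothetical helper `snapshot_date` below turns a snapshot name back into a readable UTC date. It tries the BSD/macOS form `date -r SECONDS` first and falls back to the GNU form `date -d @SECONDS`.

```shell
#!/bin/sh
# Hypothetical helper: print the UTC time encoded in a snapshot name such as
# zdisk/macbookair@1225350709 (the suffix comes from `date "+%s"`).
snapshot_date() {
    ts=${1##*@}                                   # keep text after the last '@'
    # BSD/macOS date takes -r SECONDS; GNU date's -r means "reference file",
    # which fails here and triggers the GNU-style -d @SECONDS fallback.
    date -u -r "$ts" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null ||
        date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For example, `snapshot_date zdisk/macbookair@1225350709` prints `2008-10-30 07:11:49 UTC`. To label every snapshot, something like `zfs list -H -o name -t snapshot | while read -r s; do echo "$s $(snapshot_date "$s")"; done` should work.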

This obviously isn’t as awesome as using Time Machine to recover my files: I don’t have a great UI, I have to run a script, and I generally have to know more about the system than a Time Machine user does. However… I can update a VM without sending gigs of data over the internet to back it up, and I don’t have to give up on backing it up at all.

The only downside is that even a backup with no changes still takes about 8 minutes, because rsync has to go through all my files. The next step would be to integrate fslogger into the solution so that only the files that actually changed get examined.
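Here is one way the fslogger idea could plug in. Collect the changed paths, keep only those under the backed-up directory, make them relative, and hand the list to rsync’s `--files-from`. The hypothetical `changed_list` helper below assumes the changed paths have already been extracted from fslogger’s output as one absolute path per line. fslogger’s own output format isn’t covered here, so that extraction step is left out.

```shell
#!/bin/sh
# Hypothetical helper: turn absolute changed paths (one per line on stdin,
# assumed to be extracted from fslogger output) into a de-duplicated list
# relative to ROOT, suitable for rsync --files-from.
# Usage: changed_list ROOT < paths > list
changed_list() {
    root=${1%/}                                  # drop a trailing slash
    while IFS= read -r p; do
        case $p in
            "$root"/*) printf '%s\n' "${p#"$root"/}" ;;  # keep paths under ROOT
        esac
    done | sort -u
}
```

The list could then drive a run like `rsync -av --files-from=list /Users 192.168.1.90:/Volumes/zdisk/macbookair`, with paths such as `sam/Documents/x` taken relative to `/Users`. With `--files-from`, `-a` no longer implies `-r`. Deleted files are also not propagated, so a periodic full run with `--delete` is still needed.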

This entry was posted in Apple, Technology.

11 Responses to Time Machine vs. ZFS + rsync

  1. If you’re backing up onto a non-mac filesystem (e.g., a NAS box), make sure the remote rsync has ‘fake super’ in its rsyncd.conf; this will help preserve xattrs, ACLs, etc. On your mac, use rsync with -aHAXxs.

    It’s not perfect (larger xattrs get dumped, and xattrs on symlinks don’t work), but better than rsync out of the box.
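For the NAS case described in this comment, a minimal server-side module might look like the sketch below. The module name `macbackup`, the path `/volume1/macbackup`, and the `write_rsyncd_conf` helper are all made up for illustration. `read only = no` is there because rsyncd modules are read-only by default.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal rsyncd.conf module with fake super.
# Module name and path are placeholders; adjust for your NAS.
write_rsyncd_conf() {
    cat > "$1" <<'EOF'
[macbackup]
    path = /volume1/macbackup
    read only = no
# save ownership, ACLs and non-user xattrs as xattrs on the server's files
    fake super = yes
EOF
}
```

The Mac side would then push to the daemon with the flags from the comment, e.g. `rsync -aHAXxs /Users/sam/ nas::macbackup/`, where `nas` is a placeholder hostname. The flags do the following:
- `-H` preserves hard links
- `-A` preserves ACLs
- `-X` preserves extended attributes
- `-x` keeps rsync on one filesystem
- `-s` protects file names from the remote shell

`-A` and `-X` require rsync 3.x rather than the 2.6.9 that ships with Leopard.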

  2. ssp says:

    Actually, Time Machine is even worse than you state, as it seems to determine whether a file has changed solely by looking at its modification date and size. In particular, Time Machine will not notice changes to a file if its modification date didn’t change. Also, conceptually (as opposed to filesystem snapshots, I suppose), Time Machine’s behaviour when you change ownership of or access permissions on folders is probably less than desirable as well.

    All the more reason for a better way to do backups, like the one you present. But Time Machine’s advantage of making backups really trivial, without scanning your full drive, is hard to compete with.

  3. sam says:

    @Mark: I am backing up to ZFS running on my Mac OS X box using the latest bits from http://zfs.macosforge.org/trac/wiki/ along with the Mac version of rsync on both sides. Additionally, I am only backing up things in my Users directory — no system files, do I still need the additional flags?

    @ssp: I actually think it would be quite simple to use this tool http://www.osxbook.com/software/fslogger/ to track filesystem changes, which would make the rsync solution that much better. Looks like I’ll need to put a nice UI on it as well.

  4. Cristian Yxen says:

    1. Snapshots are not something new; other filesystems have had this feature for years, UFS2 for example.

    2. There is no need for rsync and scanning your full hard drive if your local disk is ZFS too. ZFS supports incremental snapshots, which can be copied and appended to a ZFS filesystem on another disk or a remote machine.

    So the way would be:

    1) make a ZFS snapshot on your local disk
    2) send the initial snapshot to the backup disk
    3) take more snapshots on your local disk
    4) copy the incremental snapshot (containing only the differences between two snapshots) to the backup disk
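Cristian’s steps map onto `zfs snapshot`, `zfs send` and `zfs receive`. The sketch below uses a hypothetical `replicate` helper, and the dataset, host and snapshot names are placeholders. It sends a full stream the first time and an incremental stream (`zfs send -i`) afterwards.

```shell
#!/bin/sh
# Hypothetical helper following the four steps above.
# Usage: replicate DATASET HOST DEST_DATASET NEW_SNAP [PREV_SNAP]
#   e.g. replicate tank/home backuphost backup/home t2 t1
replicate() {
    ds=$1 host=$2 dest=$3 new=$4 prev=$5
    zfs snapshot "$ds@$new" || return 1                 # steps 1 and 3
    if [ -z "$prev" ]; then
        # step 2: full stream; DEST_DATASET must not exist yet on HOST
        zfs send "$ds@$new" | ssh "$host" zfs receive "$dest"
    else
        # step 4: only the blocks that changed between PREV and NEW
        zfs send -i "$ds@$prev" "$ds@$new" | ssh "$host" zfs receive "$dest"
    fi
}
```

For an incremental receive, the destination’s most recent snapshot has to be PREV. If the backup copy was modified in between, `zfs receive -F` rolls it back first.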

  5. sam says:

    @Cristian: 1) Didn’t mean to imply that they were new, but they are new to the Mac. 2) That is ideal for a ZFS only environment — hoping we can do just that in Snow Leopard. Right now though I’m still on Leopard and using HFS+ as my main filesystem.

  6. Jeff says:

    Great article! I’ve been looking for a way to use fsevent with rsync for months now, and as far as I know, you are the first to hack something together. It’s been a little while since your post, so I’m wondering if you have any updates or have made any improvements to the script? For example, implementing incremental deletes or allowing full volume “/” backups? Also, I’m curious what sort of additional overhead running fslogger constantly between backups creates?

    I started experimenting with rsync to ZFS a while back, but ended up coming back to Time Machine for various reasons. One was that I could use the AirDisk connected to my AirPort Extreme instead of running a whole ZFS server; another was some compatibility issues between Mac metadata and OpenSolaris or FreeBSD. However, I’m once again becoming frustrated at the time and resources it takes to do an hourly TM backup; wouldn’t it be nice to just send file deltas! I would be interested to know whether the script could be altered to drop ZFS and back up an entire volume to an HFS+ disk image on an AirDisk. That could use either hard-linked snapshots (à la rsnapshot) or just a regular, non-versioned backup. Unfortunately, I’m not a programmer, so I would have no idea where to begin. Great script though, and I look forward to testing it out with ZFS.

  7. sam says:

    I haven’t done any more work on it, but I am running it 24/7 on 2 of my machines (a Mac Pro and a MacBook Air) and the overhead seems to be very low. As for incremental deletes, I decided to just run the full backup periodically rather than add that functionality. I believe full-volume backups are possible, as I recently used rsync to move my boot disk. You need some additional parameters to make it work: rsync -xrlptgoEv did everything needed to make a full backup.

    Sam

    Reference: http://rna.urmc.rochester.edu/john/public/pages/backup%20Mac%20easy%20methods.html

