I’ve been thinking rather a lot about ZFS snapshots recently, and have a few ideas that I thought people would be interested in. I’ll mention one now (but the other will have to wait a bit – watch this space).

On zfs-discuss, some folks were thinking that a mechanism to take snapshots automatically on a schedule would be a good idea. I agreed, suggested that this would be really nice if it were integrated with SMF, and posted some thoughts as to how that could work.

Well, I’ve now got a working prototype. Based on SMF, it works by creating a cron job that takes periodic snapshots of the filesystem you specify.
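To give a rough idea of what "a cron job that takes periodic snapshots" boils down to, here is a minimal sketch. It is not the method script from the tarball: the helper names `snapshot_name` and `cron_line`, the hourly schedule and the `/usr/sbin/zfs` path are all assumptions made for illustration. Only the `zfs-auto-snap-` prefix and timestamp format are taken from the snapshot names shown in the `zfs list` output further down.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the real method script.

# Build a snapshot name such as space/timf@zfs-auto-snap-2006-05-10-19:00:00.
# $1 = filesystem, $2 = optional timestamp (defaults to the current time).
snapshot_name() {
    fs="$1"
    stamp="${2:-$(date +%Y-%m-%d-%H:%M:%S)}"
    printf '%s@zfs-auto-snap-%s\n' "$fs" "$stamp"
}

# Print a crontab line that snapshots $1 at the top of every hour.
# The % signs are backslash-escaped because cron treats a bare % as a newline.
# The schedule and the zfs path are assumptions for this sketch.
cron_line() {
    fs="$1"
    printf '%s %s@zfs-auto-snap-%s\n' \
        '0 * * * * /usr/sbin/zfs snapshot' \
        "$fs" \
        '`date +\%Y-\%m-\%d-\%H:\%M:\%S`'
}
```

Running `cron_line space/timf` just prints the line; getting it into the crontab (and removing it again when the service is disabled) is the part the SMF method script takes care of.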

Rather than a single default instance of the service, I was thinking that you should have multiple instances, one per set of automatic snapshots you want to take. I’ve also got support for creating recursive snapshots – though, again, getting a -r flag for zfs snapshot would make that support a bit nicer (and atomic!)
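Without a -r flag on zfs snapshot, a recursive snapshot has to be built by hand: list the filesystem and its children, then snapshot each one in turn. The sketch below shows that approach; the function name `snapshot_children` is hypothetical and this is not necessarily how the method script does it. Note that it is not atomic, since each child is snapshotted at a slightly different moment.

```shell
#!/bin/sh
# Sketch only: a hand-rolled, non-atomic recursive snapshot.

# Snapshot filesystem $1 and every filesystem beneath it, using label $2.
snapshot_children() {
    fs="$1"
    label="$2"
    # -H drops the header line, -o name prints only the dataset names,
    # -t filesystem skips existing snapshots, -r walks the children.
    zfs list -H -o name -t filesystem -r "$fs" | while read -r child; do
        zfs snapshot "${child}@${label}"
    done
}
```

For example, `snapshot_children space/timf zfs-auto-snap-2006-05-10-19:00:00` would snapshot space/timf and each filesystem below it with the same label.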

I haven’t yet implemented the “rolling snapshot” functionality, which would keep only the last x snapshots. For that, I’m waiting on the new -s flag for zfs list, which would allow me to sort by snapshot creation date and remove the oldest (tail -1 is your friend!). I’m also a bit limited by what cron can do at the moment, but I reckon what I’ve got is good enough for starters.
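For what it's worth, the rolling-snapshot logic itself is small once the snapshots can be listed in creation order. The sketch below is an assumption about how it might look, not code from the tarball: the hypothetical function `snapshots_to_destroy` reads snapshot names from stdin, oldest first, and prints the ones that fall outside the newest N. With an ascending sort the oldest names come first, so it uses head; with a descending sort, tail would do the same job.

```shell
#!/bin/sh
# Sketch only: pick which snapshots to remove, keeping the newest $1.
# Expects one snapshot name per line on stdin, sorted oldest first.
snapshots_to_destroy() {
    keep="$1"
    # Buffer the names so they can be counted before anything is printed.
    names=$(cat)
    [ -n "$names" ] || return 0
    total=$(printf '%s\n' "$names" | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # The first $excess lines are the oldest snapshots.
    printf '%s\n' "$names" | head -n "$excess"
}

# Possible usage, assuming a zfs list that supports "-s creation":
#   zfs list -H -o name -t snapshot -s creation -r space/timf \
#       | grep '^space/timf@zfs-auto-snap-' \
#       | snapshots_to_destroy 10 \
#       | xargs -n 1 zfs destroy
```

Filtering on the zfs-auto-snap- prefix before counting means manual snapshots are never candidates for removal.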

What does it look like? Well, here’s a “screenshot”:

# svcs | grep zfs
online         18:36:11 svc:/system/filesystem/zfs/auto-snapshot:space-timf
# svcs -l svc:/system/filesystem/zfs/auto-snapshot:space-timf
fmri         svc:/system/filesystem/zfs/auto-snapshot:space-timf
name         ZFS automatic snapshots
enabled      true
state        online
next_state   none
state_time   Wed May 10 18:36:11 2006
logfile      /var/svc/log/system-filesystem-zfs-auto-snapshot:space-timf.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   require_all/none svc:/system/cron (online)

Okay, not that exciting to look at. To make life easier, I also wrote a simple admin GUI, which asks you the right questions and constructs the instance manifest for you. It does need a pretty recent version of zenity (thanks, Glynn!) to work, but that’s included in Solaris, so you should be okay. Here’s what that looks like:

[Screenshot: Tim's ZFS auto-snapshot admin script]

Oh, if you’d like to give this a whirl, get a copy of this tarball and do the following:

# cp zfs-auto-snapshot/lib/svc/method/zfs-auto-snapshot /lib/svc/method
# svccfg import zfs-auto-snapshot/zfs-auto-snapshot.xml
#  [ now create an instance, using my GUI, or your text editor of choice ]
# svccfg import my-auto-snapshot-instance.xml
# svcadm enable svc:/system/filesystem/zfs/auto-snapshot:tank-foo

If all goes well, you should see a new entry in your crontab (check using crontab -l) and you’ll start getting regular snapshots. And since this is SMF, you can disable with svcadm disable svc:/system/filesystem/zfs/auto-snapshot:tank-foo. I’ve tested this on snv_35, and it seems to be alright, but let me know if you encounter anything weird.

Now, there’s more work to do: in particular, the error handling here isn’t stellar (I’d like the service to degrade if we weren’t able to take a snapshot for some reason). I also need to implement the rolling snapshot functionality, and I should probably be a bit more sensitive with respect to security roles and profiles. Still, for a first attempt, I think this is cool.
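On the error-handling point, one way the "degrade on failure" behaviour could work is for the cron-invoked script to mark its own service instance as degraded when zfs snapshot fails. This is only a sketch of that idea, not something the current method script does; the function name `take_snapshot_or_degrade` is hypothetical, and it relies on `svcadm mark degraded`, which needs suitable privileges.

```shell
#!/bin/sh
# Sketch only: flag the SMF instance as degraded if a snapshot fails.
# $1 = FMRI of the auto-snapshot instance, $2 = full snapshot name.
take_snapshot_or_degrade() {
    fmri="$1"
    snap="$2"
    if zfs snapshot "$snap"; then
        return 0
    fi
    echo "snapshot $snap failed, marking $fmri degraded" >&2
    svcadm mark degraded "$fmri"
    return 1
}
```

A degraded instance then shows up in svcs output, which is a lot more visible than a failed cron job that only sends mail.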

Here’s me showing all the snapshots I have, both automatic and manual, of one of my filesystems:

# zfs list -r space/timf
space/timf                                    1.28G  24.9G  1.28G  /space/timf
space/timf@backup                             1.66M      -   458M  -
space/timf@more-recent                         114K      -   989M  -
space/timf@something_else                     87.5K      -  1003M  -
space/timf@zfs-auto-snap-2006-05-10-19:00:00      0      -  1.28G  -

Any/all comments welcome!

Update 11th May: I’ve fixed a bug in the method script that could cause one auto-snapshot cron job to overwrite a separate job for a child filesystem of the parent. I’ve also tweaked the GUI to change the snapshot period depending on the interval type. The link above is updated to point to the new tarball.

Update 12th May: Fixed another bug in how cron jobs are created (oops).

Update 8th June: You probably want to see the more recent posts on this topic.