Today’s a very interesting day for storage systems – it’s cool to see
the Fishworks team announcing the Sun Storage
7000 series systems: congratulations one and all. Great things are afoot and,
in my opinion, these are fantastic systems.
While I’m not working on storage systems at Sun any more, I do feel a
certain amount of empathy for those guys: I am working on a software appliance
in the form of xVM Server, and I can certainly
appreciate what it takes to start with a perfectly working OpenSolaris install,
strip it down to the bare minimum, add stuff to make it shine especially brightly for a given task, and (of
particular focus for me at the moment!) get a product out to the market.
That said, in my previous job in the Solaris ZFS test group,
I did run into the Fishworks project, and that story might be worth
telling. (And if
there are rose-coloured glasses coming across in this post, I apologise: I
love my current job, and as much fun as QE was, it was also pretty grueling at times ;-)
It was coming into October 2007, and
PSARC 2007/618 – the addition of L2ARC devices to ZFS – was looming. These devices, along with separate ZFS Intent Log devices (as a pair, affectionately
known as ReadZilla and WriteZilla) and their intelligent application in a
hybrid storage pool, are some of the most exciting things about the
products being announced today, and I’ve really been looking
forward to today’s announcement: it always gives me a kick to see Sun
technology hit the market when I’ve been able to contribute to the product
personally, even in the small way that I did in this case.
Brendan had got in touch with
the ZFS test group to see whether we could do anything to help out.
Our job as QE engineers on ZFS was to write and maintain the
test suite. Clearly we needed to update the test suite to work with
these new L2ARC devices. We’d done the
same thing for slog devices, but in this case, we were looking for
test coverage quickly. There was a ton of other work piling up on my
plate: Solaris 10 update testing for ZFS, the Newboot Sparc work for
Nevada, test sponsor duties for the fingerprint authentication
project, on top of all the other daily stuff going on. Busy busy.
So, I started hacking about to see how quickly I could get us a
very general set of tests on the L2ARC. The answer? Pretty quickly indeed.
Rather than start from scratch by coming up with a closed set of
assertions about L2ARC devices, discussing those assertions with
colleagues, making sure they were carefully worded, before setting
about implementing tests to verify each assertion, I decided to just wing it.
Now that’s not to say that we shouldn’t also go about
writing tests properly, but for a quick fix (in every sense of the word),
I wrote a 90-line shell wrapper around zpool,
which you can download here, if you’re interested.
The wrapper maintained a list of devices that it’d try to add to every
zpool created with the wrapper; creating a pool would use up one device
from the list, destroying the pool via the wrapper would
return the device to the list. Pretty simple. This gave us a phenomenal
amount of testing for free.
We could use this with our existing test
suite, and it would add an L2ARC device to every pool. We could test big
and small L2ARC devices, ones based on lofi devices backed by files in
/tmp or ramdisks (attempting to simulate really fast disks, despite the
weird VM hoops we were jumping through – which resulted in great
hilarity when run with our somewhat insane stress tests running on really
large machines…) and generally give the code a good run through.
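The bookkeeping described above can be sketched roughly as follows. This is not the original shell script: it is a small Python model, written for illustration, that only builds the zpool command lines rather than running them. The class name, method names and device names are all hypothetical, and falling back to a plain `zpool create` when the list is empty is an assumption about what a wrapper like this might do.

```python
class CacheDeviceWrapper:
    """Hypothetical model of the wrapper's device list (not the original script)."""

    def __init__(self, devices):
        self.free = list(devices)   # L2ARC candidates not yet handed out
        self.in_use = {}            # pool name -> cache device it was given

    def create(self, pool, vdevs):
        """Build a 'zpool create' command, adding a cache device if one is free."""
        argv = ["zpool", "create", pool, *vdevs]
        if self.free:
            dev = self.free.pop(0)
            self.in_use[pool] = dev
            argv += ["cache", dev]
        # Assumption: with no free device, create the pool without a cache vdev.
        return argv

    def destroy(self, pool):
        """Build a 'zpool destroy' command and return the pool's device to the list."""
        dev = self.in_use.pop(pool, None)
        if dev is not None:
            self.free.append(dev)
        return ["zpool", "destroy", pool]


# Made-up device names, purely for illustration.
w = CacheDeviceWrapper(["c1t0d0", "c1t1d0"])
print(" ".join(w.create("tank", ["mirror", "c0t0d0", "c0t1d0"])))
print(" ".join(w.destroy("tank")))
```

Because existing tests only ever see a command that behaves like zpool, every pool they create picks up a cache device without the tests themselves changing.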
The wrapper found a respectable number of bugs, and was worth its weight
in gold, despite its lack of formality in terms of the way we usually write
tests. I’m not sure if it’s still being used by the ZFS QE team, but I was
pretty fond of it.
I think one of the reasons why L2ARC was so pleasant to test was down
to its design. Like the intent log devices, L2ARC devices integrate beautifully
into the rest of the system, with very little extra work on behalf of the
user: and that usually makes test engineers happy too (or at least lets
them concentrate on the underlying feature, rather than having to spend
extra time making sure the CLI was working properly).
Of course helping on L2ARC testing wasn’t all work – I was lucky enough to make it over to
the Bay Area for the first OpenSolaris developer summit that month, and
while in town Brendan was kind enough to invite me up to the Fishworks
office for a quick chat about the testing, a look around, and a rather
excellent burger for lunch. I even got the chance to discover that I’m
completely dreadful at Fish-pong, perhaps lacking in the basic grounding
of American football, table tennis and volleyball rules that my Irish
upbringing just didn’t provide – but that’s another story.
I never got a chance to test on one of the physical Storage 7000 series
boxes themselves, nor to play with what looks like one of the snappiest
web interfaces I’ve seen in a long time. Instead, I was focusing on L2ARC itself,
and helping to make sure it was solid enough to integrate into Solaris. However,
that same operating system is the very one that underpins these appliances, so in that
sense, I’m glad I could help!
Although yes, today’s announcements are both software and hardware: indeed, xVM Server’s not much without the right hardware to back it up either.