In an earlier post, I talked a little about what the ON tech lead job entails. In this post, I’m going to talk about some of the changes I made to keep raising the quality bar for the release.
Doing quality assurance for something as large as an operating system presents a few problems, similar to those for any software project, just on a larger scale:
- writing a set of comprehensive automated tests (and having a means to analyze and baseline results)
- ensuring those tests are maintained and executed frequently
- knowing what tests to execute when making a change to the OS
In Solaris, we have several dedicated test teams to execute our automated tests periodically (both on nightly builds, and on biweekly milestone builds) as well as on-demand, and bugs are filed when problems occur. Each team tends to focus on a different area of the OS. We test changes from one consolidation at a time, before doing wider-area testing with changes from all consolidations built into a single OS image.
However, finding problems only after the relevant change has integrated into the source tree is often too late. It’s one thing for a change to break an application, but an entirely different thing if it causes the system to panic – no more testing can be done on those bits, and the breakage you introduced impacts everybody.
To reduce the chance of that happening, we try to build quality into our development processes, making the break-fix cycle as short as possible. To get a sense of where potential gaps were, I spent time looking at all of the testing that we do for Solaris, and documented it on a wiki page, ordered chronologically.
I won’t repeat the entire thing here, but thought it might be interesting to at least show you the headings and subheadings. Some of this refers to specific teams that perform testing; other parts simply indicate the type of testing performed. The list now includes some of the test improvements I added, and I’ll talk about those later.
- Before you push
- Your Desktop/VM/cloud instances/LRT
- The build
- Your test teams
- DIY-PIT and QSE-PIT
- Selftest Performance testing
- Project PIT runs
- AK Testing
- After you push
- Incremental builds
- Incremental boots
- Incremental unit-tests
- Periodic Live media boots
- Nightly builds
- Running the nightly bits on the build machines
- Nightly WOS builds
- Nightly gate tests
- Nightly ON-PIT tests
- Bi-weekly ON-PIT tests
- After the WOS is built
- Solaris RE tests
- Solaris SST
- ZFSSA Systems Test
- Conformance testing
- Performance testing
- Jurassic, Compute Ranch build machines, shared group build machines
- Platinum beta customers
- Earlier releases
- SRU testing
(A note on the terminology here: “WOS” stands for “Wad Of Stuff” – it’s the biweekly Solaris image that’s constructed by bundling together all of the latest software from every consolidation into a single image which can be freshly installed, or upgraded to.
“PIT” stands for “Pre-Integration Test”, typically meaning testing performed on changes pushed to each consolidation’s source tree, but not yet built into a WOS image.)
Running the bits you build
I’ve talked before about the ON culture of running the bits you develop, so I won’t repeat myself here, except to say that the gate machine, the gate build machines, and all developers are expected to run at least biweekly, if not nightly, bits. As engineers, we tend to be a curious lot and enjoy experimenting with shiny new software – it’s amazing (and a little worrying) to discover bugs that the test suites don’t. As we find such gaps in test suites, we file bugs against them so that the test suites continually improve.
Building the bits you run
Building Solaris itself turns out to be a good stress test for the operating system, invoking thousands of processes and putting a reasonable load on the system, more so if it’s a shared build machine.
The build itself also does a significant amount of work verifying that the software it’s producing is correct: apart from the obvious tools that run during the build, like lint and Parfait (an internal static analysis tool), there is a host of other checks that perform verification on the binaries that are produced.
Indeed, to maximise the chances of finding errors during the build, we compile the source tree with two separate compilers (currently Oracle Developer Studio and gcc), discarding the binaries produced by the “shadow compiler”. Since the compilers produce different warnings, one will sometimes report errors that the other misses, which is an aid to developers.
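To make the shadow-compile idea concrete, here’s a minimal sketch. In ON the primary compiler is Oracle Developer Studio and the shadow is gcc; here both default to gcc purely so the sketch is runnable, and the file name and flags are illustrative, not the actual build logic.

```shell
#!/bin/sh
# Sketch of a "shadow compile": build a file with the primary compiler and
# keep its object, then rebuild with a second compiler purely for its
# diagnostics, discarding the shadow object.
CC=${CC:-gcc}            # primary (Studio in ON; gcc here for portability)
SHADOW_CC=${SHADOW_CC:-gcc}

cat > hello.c <<'EOF'
#include <stdio.h>

int
main(void)
{
	int unused;	/* deliberately unused, to provoke a warning */

	(void) printf("hello\n");
	return (0);
}
EOF

# Primary compiler: this object is the one we actually keep.
$CC -c -o hello.o hello.c

# Shadow compiler: send the object to /dev/null, keep only the warnings.
$SHADOW_CC -Wall -Wextra -c -o /dev/null hello.c 2> shadow-warnings.txt

echo "shadow compiler reported $(wc -l < shadow-warnings.txt) warning line(s)"
```

The point is that the shadow pass costs only compile time: its output binaries are thrown away, but its extra diagnostics still reach the developer.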
The problem with catching problems early
As much as possible, we emphasise pre-integration testing to find problems early. The flip side is that not every engineer has access to some of our larger hardware configurations, and the test labs containing them are a finite resource.
Another problem is that even with access to those large systems, how do you know which tests ought to be executed? Since lab-time is limited and some tests can take a long time to complete, we simply can’t run all the tests before every integration because then we’d never be able to effectively make changes.
Tests for Solaris were commonly developed by separate teams of test engineers who maintained and updated them, rather than by the developers owning their own tests (this wasn’t a hard rule, of course – some developers modified those test suites directly).
In some cases where engineers did write their own tests, the test code was often stored in their home directories – they’d know to execute the tests the next time they were making changes to their code, but nobody else would know of their existence, and breakage could occur.
The build also lacked any way to describe which tests ought to be executed when a given piece of software changed, so determining what testing to perform for any given change became a question of “received wisdom” and experience.
Continuous integration in ON
For some time (5+ years before my time, as far as I can tell), the ON gate has had a simple incremental build facility. As each commit happened to the source tree, some custom code, driven by cron and procmail, would select one of four previously built workspaces, pull the latest changes, and kick off a build.
This typically found build-time problems quickly, so we’d be able to either backout changes that didn’t build, or get in touch with the responsible developer to arrange a fix before too many engineers were impacted, but the binaries that were produced by those incremental builds were simply discarded, which seemed like a lost opportunity to me.
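The workspace-selection logic described above can be sketched roughly as follows. The paths are invented, and the pull and build steps are stubbed out so the control flow runs anywhere; the real gate scripts are not shown in this post.

```shell
#!/bin/sh
# Sketch of a cron-driven incremental build: of four previously built
# workspaces, claim the first idle one, update it, and rebuild in it.
WSROOT=$(mktemp -d)
mkdir -p "$WSROOT/ws1" "$WSROOT/ws2" "$WSROOT/ws3" "$WSROOT/ws4"

claimed=""
for ws in "$WSROOT"/ws1 "$WSROOT"/ws2 "$WSROOT"/ws3 "$WSROOT"/ws4; do
	# mkdir is atomic, so a lock directory doubles as a mutex against a
	# concurrent cron invocation claiming the same workspace.
	if mkdir "$ws/.lock" 2>/dev/null; then
		claimed=$ws
		break
	fi
done

if [ -z "$claimed" ]; then
	echo "all incremental workspaces busy; skipping this run" >&2
	exit 0
fi

echo "incremental build starting in $claimed"
# Real version: (cd "$claimed" && hg pull -u && run the incremental build)
touch "$claimed/build.log"
rmdir "$claimed/.lock"
```

Because each workspace keeps the objects from its previous build, each new commit only forces a rebuild of what actually changed, keeping the turnaround short.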
Even worse, from time to time, we’d get integrations that built fine, but actually failed to boot on specific systems due to inadequate testing!
So, to modernize our build infrastructure and plug this gap, I started looking into using Jenkins not only to periodically build the source tree, but also to update a series of kernel zones with those changes and make sure we could boot the resulting OS.
That was a pretty simple change, and I was pleased with how it turned out. Once it had been in place for a few months, I started to wonder what else we could do with those newly booted zones.
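Here’s a rough sketch of the kind of boot check such a Jenkins job could run. The zone name is invented, and the Solaris commands are stubbed out as shell functions so the control flow is runnable anywhere; this is not the actual pipeline code.

```shell
#!/bin/sh
# Sketch of a post-build boot check: boot a kernel zone that has been
# updated with the freshly built bits, then poll until it reaches
# multi-user or the check times out and fails the build.
ZONE=buildtest-kz	# hypothetical zone name

# Stubs standing in for the real Solaris commands on a build server.
zoneadm() { echo "(stub) zoneadm $*"; }
milestone_state() {
	# Real version would be something like:
	#   zlogin "$ZONE" svcs -Ho state <multi-user milestone FMRI>
	echo online
}

zoneadm -z "$ZONE" boot

tries=0
until [ "$(milestone_state)" = "online" ]; do
	tries=$((tries + 1))
	if [ "$tries" -ge 60 ]; then
		echo "zone $ZONE failed to reach multi-user" >&2
		exit 1
	fi
	sleep 10
done
echo "boot check passed for $ZONE"
```

The value of a check like this is that a change which builds cleanly but can’t boot is caught within one build cycle, rather than after it has stopped every downstream tester.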
Developer notifications and unit testing
I’ve mentioned already that in a large engineering organisation, it’s difficult to know what to test, and being dependent on a separate test group to implement your test suite can be frustrating. Of course, there can be advantages in that separation of duty – having a different pair of eyes looking at changes and writing tests can find problems that a developer would otherwise be blind to.
Given our experience with the IPS consolidation, and its use of unit tests, one of the Solaris development teams working in ON decided to take a similar route, wanting to add their tests to the ON source tree directly.
Rather than deal with the chaos of multiple teams following suit, I felt it was time to formalize how tests were added to the source tree, and to write a simple unit-test framework to allow those tests to be discovered and executed, as well as a way to advertise specific other testing and development advice that could be relevant when we detect modifications to a given source file.
Obviously there were limits to what we could do here – some tests require specific hardware or software configurations, and so aren’t appropriate for a set of build-time tests; others are too long-running to really be considered “unit tests”.
Still others may require elevated privileges, or may attempt to modify the test machine during execution, so it can be tricky to decide when to write a separate test suite vs. when to enroll in the gate unit-test framework.
As part of this work, I modified our “hg pbchk” command (part of our Mercurial extension that performs basic pre-putback verification on changesets about to be integrated to the source tree, essentially ensuring the integration paperwork is correct).
The pbchk command now loads all of the test descriptor files found in the workspace, reports if tests are associated with the sources being modified, and will print specific developer notifications that ought to be emitted when a given source file changes.
I think of it as a “robotic CRT advocate”, pointing out testing that ought to be run prior to integration. (The CRT, or “Change Review Team”, is a group of senior engineers who must pre-approve each and every putback to the ON source tree; they see the results of ‘hg pbchk’ during their review and verify that testing was completed.)
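For a sense of what drives this, here is a hypothetical test descriptor of the sort the framework and pbchk might consume. Every field name below is invented for illustration – the real descriptor schema isn’t shown in this post – but the idea is to map source paths to a runnable test and its synopsis:

```ini
# Hypothetical descriptor (fuser.cfg); all field names are invented.
[fuser-test]
synopsis = A series of tests to exercise fuser
sources  = usr/src/cmd/fuser/.*
command  = usr/src/test/fuser/runtests
```

With something like this in the tree, pbchk can match the files in a changeset against each descriptor’s source patterns and tell the developer which tests apply.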
Over time, that test framework is getting more and more use, and we now have tests that are easy to run, implemented as a simple build(1) task. Here are the ones we have today:
```
timf@whero build test
INFO : No config file passed: using defaults
STATE : Starting task 'test'
Usage: build test <file name or section name>

FILE                  SECTION NAME                    SYNOPSIS
ak-divergence.cfg     ak-divergence                   [ no synopsis available ]
build.cfg             build-notify                    Warn when users modify AK build tools
build.cfg             build-test                      A series of tests that exercise build(1)
corediag.cfg          core-diag-test                  A series of tests that exercise coremond.
crypto-fips.cfg       crypto-fips-140                 Crypto Framework FIPS 140-2 Boundary Change note.
daxstat.cfg           daxstat-test                    A series of tests to exercise daxstat
elfsignrange.cfg      elfsignrange                    Warn about grub2 duplication of elfsignrange code
fuser.cfg             fuser-test                      A series of tests to exercise fuser
fwenum.cfg            fwenum-unit                     Firmware Enumerator unit tests
libc.cfg              gettext                         A simple test for gettext(3C)
gk.cfg                gk-test                         A series of tests that exercise the gk tool
ipf2pf.cfg            ipf2pf-test                     Test verifies ipf2pf is still sane
kom.cfg               kom-test                        Unit tests for the KOM framework
libcmdutils.cfg       libcmdutils-test                A series of tests that exercise libcmdutils
libdax.cfg            libdax-test                     A series of tests that exercise libdax
libdiskmgt.cfg        libdiskmgt-test                 Test for dumping libdiskmgt cache and do inuse operation.
libkstat2.cfg         libkstat2_basic                 A series of basic tests that exercise libkstat2
libkstat2.cfg         libkstat2_priv                  A series of privileged tests that exercise libkstat2
libnvpair.cfg         libnvpair-test-27               libnvpair unit tests (Python 2.7)
libnvpair.cfg         libnvpair-test-34               libnvpair unit tests (Python 3.4)
libsdwarf.cfg         libsdwarf-test                  A series of tests that exercise libsdwarf.
libuuid.cfg           libuuid-test                    A series of tests that exercise libuuid
libv12n.cfg           libv12n-test                    A series of tests that exercise libv12n.
mdb.cfg               mdb-ctf                         Mdb CTF unit tests
memtype.cfg           memtype-test                    A series of tests for memory types and attributes.
netcfg.cfg            netcfg-noexec                   A series of tests that verify libnetcfg operation
odoc.cfg              odoc-test                       A series of odoctool tests
pbchk.cfg             pbchk-test                      Warn that pbchk tests must be run manually
pfiles.cfg            pfiles-test                     A series of tests to exercise pfiles
rad-auth_1.cfg        rad-auth_1                      Tests for RAD module authentication version 1
rad-loccsm_1.cfg      rad-loccsm_1                    RAD locale test module loccsm version 1
odocprovider.cfg      rad-module-odocprovider_1-test  Tests for RAD module odocprovider
rad-test.cfg          rad-test                        A series of tests that exercise RAD daemon
rad-test-rbac_1.cfg   rad-test-rbac_1                 Testing RBAC integration in RAD
libc.cfg              sendfile-test                   sendfile unit tests for blocking socket and NULL offset
libc.cfg              sendsig                         A unit test for the user signal (only for SPARC)
smf.cfg               smf                             A series of tests that exercise SMF
smf.cfg               smf-python                      Tests for solaris.smf.*
smf.cfg               smf-sysconfig                   The intersection of sysconfig and SMF
snoop.cfg             snoop                           Warn about PSARC 2010/429
spliceadm.cfg         spliceadm-test                  A series of tests to exercise spliceadm
sstore.cfg            sstore-unit                     Statistics store unit tests
sstore.cfg            sstore-unit2                    Statistics store unit tests (Python 2)
libstackdiag.cfg      stackdiag-crosscheck            Feed stackdb records to libstackdiag, verify results
sstore.cfg            statcommon                      Warn about statcommon
sxadm.cfg             sxadm-aslr                      A series of tests that exercise aslr
sxadm.cfg             sxadm-noexec                    A series of tests that exercise nx heap/nx stack
sxadm.cfg             sxadm-test                      A series of tests that exercise sxadm
timespec.cfg          timespec-test                   Test for timespeccmp macro
updatedrv-test.cfg    updatedrv-test                  Warn about update_drv dynamic device configuration test
vboot.cfg             vboot                           Warn about grub2 duplication of verified boot cert code
verify_key2.cfg       verify_key2                     Warn about duplication of verified boot development key definitions
vm2.cfg               vm2-test                        Unit tests for the VM2 code
webuicoord.cfg        webuicoord-unit-2.7             WebUI Coordinator unit tests (Python 2.7)
webuiprefs.cfg        webuiprefs-unit-2.7             WebUI Preferences unit tests (Python 2.7)
zfs.cfg               zloop-test                      Testing zloop, a framework for ZFS unit tests
zonecfg.cfg           zonecfg                         Warn about ModifyingZonecfg wiki
zones.cfg             zones-test                      Test wrappers around various zones system calls

STATE : Finishing task 'test'
SUMMARY : 'test' completed successfully and took 0:00:00.
```
Of course, this is still a small fraction of the overall number of tests that run on the OS, but my hope is that we will continue to extend these unit tests over time. From my past experience as a test developer on the ZFS Test team, the easier you make tests to execute, the more likely a developer is actually going to run them!
In conjunction with the more comprehensive Jenkins pipelines we have recently finished work on, this framework has been well received, and has found problems before customers do – which continues to make me very happy.