In an earlier post, I talked a little about what the ON tech lead job entails. In this post, I’m going to talk about some of the changes I made to keep raising the quality bar for the release.

Doing quality assurance for something as large as an operating system presents a few problems, similar to those for any software project, just on a larger scale:

  • writing a set of comprehensive automated tests (and having a means to analyze and baseline results)
  • ensuring those tests are maintained and executed frequently
  • knowing what tests to execute when making a change to the OS

In Solaris, we have several dedicated test teams to execute our automated tests periodically (both on nightly builds, and on biweekly milestone builds) as well as on-demand, and bugs are filed when problems occur. Each team tends to focus on a different area of the OS. We test changes from one consolidation at a time, before doing wider-area testing with changes from all consolidations built into a single OS image.

However, only finding problems after the relevant change has integrated to the source tree is often too late. It’s one thing to cause a problem with an application, but an entirely different thing if your change causes the system to panic – no more testing can be done on those bits, and the breakage you introduced impacts everybody.

To reduce the chance of that happening, we try to build quality into our development processes, keeping the break-fix cycle as short as possible. To get a sense of where potential gaps were, I spent time looking at all of the testing that we do for Solaris and documented it in a wiki page, ordered chronologically.

I won’t repeat the entire thing here, but thought it might be interesting to at least show you the headings and subheadings. Some of this refers to specific teams that perform testing; other parts simply indicate the type of testing performed. The list now contains some of the test improvements I added, and I’ll talk about those later.

  • Before you push
    • You
    • Your Desktop/VM/cloud instances/LRT
    • The build
    • Your test teams
      • DIY-PIT and QSE-PIT
      • Selftest Performance testing
      • Project PIT runs
      • AK Testing
  • After you push
    • Incremental builds
    • Incremental boots
    • Incremental unit-tests
    • Periodic Live media boots
    • Nightly builds
    • Running the nightly bits on the build machines
    • Nightly WOS builds
    • Nightly gate tests
    • Nightly ON-PIT tests
    • Bi-weekly ON-PIT tests
  • After the WOS is built
    • Solaris RE tests
    • Solaris SST
    • SWFVT
    • ZFSSA Systems Test
    • Conformance testing
    • Performance testing
    • Jurassic, Compute Ranch build machines, shared group build machines
    • Platinum beta customers
  • Earlier releases
    • SRU testing

(A note on the terminology here: “WOS” stands for “Wad Of Stuff” – it’s the biweekly Solaris image constructed by bundling together all of the latest software from every consolidation into a single image, which can be freshly installed or upgraded to.

“PIT” stands for “Pre-Integration Test”, typically meaning testing performed on changes pushed to each consolidation’s source tree, but not yet built into a WOS image.)

Running the bits you build

I’ve talked before about the ON culture of running the bits you develop, so won’t repeat myself here, except to say that the gate machine, the gate build machines, and all developers are expected to run at least biweekly, if not nightly, bits. As engineers, we tend to be a curious lot and enjoy experimenting with shiny new software – it’s amazing (and a little worrying) to discover bugs that the test suites don’t. As we find such gaps in test suites, we file bugs against them so that the test suites continually improve.

Building the bits you run

Building Solaris itself turns out to be a good stress test for the operating system, invoking thousands of processes and putting a reasonable load on the system, more so if it’s a shared build machine.

The build itself also does a significant amount of work verifying that the software it’s producing is correct: apart from the obvious tools that run during the build, like lint and Parfait (an internal static analysis tool), there are a host of other checks that perform verification on the binaries that are produced.

Indeed, to maximise the chances of finding errors during the build, we compile the source tree with two separate compilers (currently Oracle Developer Studio and gcc), discarding the binaries produced by the “shadow compiler”. As the different compilers produce different warnings, sometimes one will report errors that the other misses, which can be an aid to developers.
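To make that concrete, here’s a minimal sketch of the shadow-compile idea (this is not the actual ON build logic, and the file names and log paths are made up): each source file is compiled with the primary compiler to produce the object we keep, then compiled again with the shadow compiler purely for its diagnostics, with the shadow object thrown away.

# Illustrative only; not the real ON build rules.
# Primary compile: this object is what actually goes into the build.
cc  -c foo.c -o foo.o        2>> primary-warnings.log
# Shadow compile: run purely for gcc's warnings; the object is discarded.
gcc -c foo.c -o /dev/null    2>> shadow-warnings.log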

The problem with catching problems early

As much as possible, we emphasise pre-integration testing to find problems early. The flip side of that is that not every engineer has access to some of our larger hardware configurations, and the test labs containing them are a finite resource.

Another problem is that, even with access to those large systems, how do you know which tests ought to be executed? Since lab time is limited and some tests can take a long time to complete, we simply can’t run all the tests before every integration, because then we’d never be able to make changes effectively.

A common way for tests to be developed for Solaris was to have separate teams of test engineers maintain and update tests, rather than developers owning their own tests (this wasn’t the rule, of course – some developers modified those test suites directly).

In some cases where engineers did write their own tests, the test code was stored in their home directories – they’d know to execute the tests the next time they were making changes to their code, but nobody else would know the tests existed, and breakage could occur.

The build also lacked any way to describe which tests ought to be executed when a given piece of software changed, so it became a question of “received wisdom” and experience to determine what testing needed to be performed for any given change.

Continuous integration in ON

For some time (5+ years before my time, as far as I can tell), the ON gate has had a simple incremental build facility. As each commit happened to the source tree, some custom code, driven by cron and procmail, would select one of four previously built workspaces, pull the latest changes, and kick off a build.
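Just to illustrate the scheme (the real implementation is internal; the workspace paths, lock file, and build command below are hypothetical stand-ins), the core of it amounts to something like this:

#!/bin/sh
# Hypothetical sketch of the incremental build mechanism described above:
# pick the first idle, previously built workspace, pull the latest pushed
# changesets, and kick off an incremental rebuild in the background.
for ws in /builds/incr-1 /builds/incr-2 /builds/incr-3 /builds/incr-4; do
    [ -e "$ws/.building" ] && continue      # this workspace is already busy
    touch "$ws/.building"
    (
        cd "$ws" &&
        hg pull -u &&                       # update to the latest changes
        run_incremental_build > build.log 2>&1  # stand-in for the real build step
        rm -f "$ws/.building"
    ) &
    break                                   # one workspace per commit
done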

This typically found build-time problems quickly, so we’d be able either to back out changes that didn’t build, or to get in touch with the responsible developer to arrange a fix before too many engineers were impacted. However, the binaries produced by those incremental builds were simply discarded, which seemed like a lost opportunity to me.

Even worse, from time to time, we’d get integrations that built fine, but actually failed to boot on specific systems due to inadequate testing!

So, to modernize our build infrastructure and plug this gap, I started looking into using Jenkins not only to build the source tree periodically, but also to update a series of kernel zones with those changes and make sure that we were capable of booting the resulting OS.
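As a rough sketch of the kind of boot check a Jenkins job can drive (the zone name, publisher name, and repository URI here are hypothetical, and the real pipeline does rather more than this), the idea is simply to update a kernel zone from the freshly built packages, reboot it, and wait for it to come back up:

#!/bin/sh
# Hedged sketch only: point a kernel zone at the freshly built packages,
# update it, reboot it, and verify it reaches the multi-user milestone.
ZONE=on-nightly-kz
REPO=http://pkg-server.example.com/on-nightly

zlogin "$ZONE" pkg set-publisher -G '*' -g "$REPO" nightly || exit 1
zlogin "$ZONE" pkg update --accept
zoneadm -z "$ZONE" reboot || exit 1

# poll until the multi-user-server milestone is online (or give up)
i=0
while [ $i -lt 60 ]; do
    state=`zlogin "$ZONE" svcs -H -o state milestone/multi-user-server 2>/dev/null`
    [ "$state" = "online" ] && exit 0       # the new bits boot; job passes
    sleep 10
    i=`expr $i + 1`
done
echo "zone $ZONE failed to boot the new bits" >&2
exit 1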

That was a pretty simple change, and I was pleased with how it turned out. Once that had been in place for a few months, I started to wonder what else we could do with those newly booted zones.

Developer notifications and unit testing

I’ve mentioned already that in a large engineering organisation, it’s difficult to know what to test, and being dependent on a separate test group to implement your test suite can be frustrating. Of course, there can be advantages in that separation of duty – having a different pair of eyes looking at changes and writing tests can find problems that a developer would otherwise be blind to.

Given our experience with the IPS consolidation, and its use of unit tests, one of the Solaris development teams working in ON decided to take a similar route, wanting to add their tests to the ON source tree directly.

Rather than deal with the chaos of multiple teams following suit, I felt it was time to formalize how tests were added to the source tree, and to write a simple unit-test framework to allow those tests to be discovered and executed, as well as a way to advertise other specific testing and development advice that could be relevant when we detect modifications to a given source file.

Obviously there were some limits on what we could do here: some tests require specific hardware or software configurations, and so wouldn’t be appropriate for a set of build-time tests; other tests are too long-running to really be considered “unit tests”.

Other tests may require elevated privileges, or may attempt to modify the test machine during execution, so it can be tricky to determine when to write a separate test suite, vs. when to enroll in the gate unit-test framework.

As part of this work, I modified our “hg pbchk” command (part of our Mercurial extension that performs basic pre-putback verification on changesets about to be integrated to the source tree, essentially ensuring the integration paperwork is correct).

The pbchk command now loads all of the test descriptor files found in the workspace, reports whether tests are associated with the sources being modified, and prints any specific developer notifications that ought to be emitted when a given source file changes.

I think of it as a “Robotic CRT advocate”, pointing out testing that ought to be run prior to integration. (The CRT, or “Change Review Team”, is a group of senior engineers who must pre-approve each and every putback to the ON source tree; they see the results of ‘hg pbchk’ during their review and will verify that testing was completed.)
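The mapping itself lives in the test descriptor files. Purely to illustrate the concept (the real descriptor format and the pbchk internals are not shown here; the “test-map.txt” file, its layout, and the paths below are hypothetical), the check boils down to matching the files being modified against patterns that test sections have declared an interest in:

#!/bin/sh
# Conceptual sketch only: report which test sections claim an interest in
# the files being modified. "test-map.txt" is a hypothetical file with
# lines of the form "<glob pattern> <section name>".
hg status -man | while read f; do            # names of modified/added files
    while read pattern section; do
        case "$f" in
            $pattern) echo "$f: consider running '$section'" ;;
        esac
    done < test-map.txt
done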

Over time, that test framework is getting more and more use, and we now have tests that are easy to run, implemented as a simple build(1) task. Here are the ones we have today:

timf@whero[171] build test
INFO    : No config file passed: using defaults
STATE   : Starting task 'test'

Usage: build test <file name or section name>

FILE                 SECTION NAME                   SYNOPSIS
ak-divergence.cfg    ak-divergence          [ no synopsis available ]
build.cfg            build-notify           Warn when users modify AK build tools
build.cfg            build-test             A series of tests that exercise build(1)
corediag.cfg         core-diag-test         A series of tests that exercise coremond.
crypto-fips.cfg      crypto-fips-140        Crypto Framework FIPS 140-2 Boundary Change note.
daxstat.cfg          daxstat-test           A series of tests to exercise daxstat
elfsignrange.cfg     elfsignrange           Warn about grub2 duplication of elfsignrange code
fuser.cfg            fuser-test             A series of tests to exercise fuser
fwenum.cfg           fwenum-unit            Firmware Enumerator unit tests
libc.cfg             gettext                A simple test for gettext(3C)
gk.cfg               gk-test                A series of tests that exercise the gk tool
ipf2pf.cfg           ipf2pf-test            Test verifies ipf2pf is still sane
kom.cfg              kom-test               Unit tests for the KOM framework
libcmdutils.cfg      libcmdutils-test       A series of tests that exercise libcmdutils
libdax.cfg           libdax-test            A series of tests that exercise libdax
libdiskmgt.cfg       libdiskmgt-test        Test for dumping libdiskmgt cache and do inuse operation.
libkstat2.cfg        libkstat2_basic        A series of basic tests that exercise libkstat2
libkstat2.cfg        libkstat2_priv         A series of privileged tests that exercise libkstat2
libnvpair.cfg        libnvpair-test-27      libnvpair unit tests (Python 2.7)
libnvpair.cfg        libnvpair-test-34      libnvpair unit tests (Python 3.4)
libsdwarf.cfg        libsdwarf-test         A series of tests that exercise libsdwarf.
libuuid.cfg          libuuid-test           A series of tests that exercise libuuid
libv12n.cfg          libv12n-test           A series of tests that exercise libv12n.
mdb.cfg              mdb-ctf                Mdb CTF unit tests
memtype.cfg          memtype-test           A series of tests for memory types and attributes.
netcfg.cfg           netcfg-noexec          A series of tests that verify libnetcfg operation
odoc.cfg             odoc-test              A series of odoctool tests
pbchk.cfg            pbchk-test             Warn that pbchk tests must be run manually
pfiles.cfg           pfiles-test            A series of tests to exercise pfiles
rad-auth_1.cfg       rad-auth_1             Tests for RAD module authentication version 1
rad-loccsm_1.cfg     rad-loccsm_1           RAD locale test module loccsm version 1
odocprovider.cfg     rad-module-odocprovider_1-test Tests for RAD module odocprovider
rad-test.cfg         rad-test               A series of tests that exercise RAD daemon
rad-test-rbac_1.cfg  rad-test-rbac_1        Testing RBAC integration in RAD
libc.cfg             sendfile-test          sendfile unit tests for blocking socket and NULL offset
libc.cfg             sendsig                A unit test for the user signal (only for SPARC)
smf.cfg              smf                    A series of tests that exercise SMF
smf.cfg              smf-python             Tests for solaris.smf.*
smf.cfg              smf-sysconfig          The intersection of sysconfig and SMF
snoop.cfg            snoop                  Warn about PSARC 2010/429
spliceadm.cfg        spliceadm-test         A series of tests to exercise spliceadm
sstore.cfg           sstore-unit            Statistics store unit tests
sstore.cfg           sstore-unit2           Statistics store unit tests (Python 2)
libstackdiag.cfg     stackdiag-crosscheck   Feed stackdb records to libstackdiag, verify results
sstore.cfg           statcommon             Warn about statcommon
sxadm.cfg            sxadm-aslr             A series of tests that exercise aslr
sxadm.cfg            sxadm-noexec           A series of tests that exercise nx heap/nx stack
sxadm.cfg            sxadm-test             A series of tests that exercise sxadm
timespec.cfg         timespec-test          Test for timespeccmp macro
updatedrv-test.cfg   updatedrv-test         Warn about update_drv dynamic device configuration test
vboot.cfg            vboot                  Warn about grub2 duplication of verified boot cert code
verify_key2.cfg      verify_key2            Warn about duplication of verified boot development key definitions
vm2.cfg              vm2-test               Unit tests for the VM2 code
webuicoord.cfg       webuicoord-unit-2.7    WebUI Coordinator unit tests (Python 2.7)
webuiprefs.cfg       webuiprefs-unit-2.7    WebUI Preferences unit tests (Python 2.7)
zfs.cfg              zloop-test             Testing zloop, a framework for ZFS unit tests
zonecfg.cfg          zonecfg                Warn about ModifyingZonecfg wiki
zones.cfg            zones-test             Test wrappers around various zones system calls

STATE   : Finishing task 'test'
SUMMARY : 'test' completed successfully and took 0:00:00.
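Running an individual suite is then just a matter of passing its file or section name to the same task, per the usage message above. For example, picking one section name from the listing (output not shown here):

build test libuuid-test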

Of course, this is still a small fraction of the overall number of tests that run on the OS, but my hope is that we will continue to extend these unit tests over time. From my past experience as a test developer on the ZFS Test team, the easier you make tests to execute, the more likely a developer is to actually run them!

In conjunction with the more comprehensive Jenkins pipelines we have recently finished work on, this framework has been well received, and has found problems before customers do – which continues to make me very happy.
