I should have written about this a few days ago, but better late than never.

With the putback of:

changeset:   2236:7b074b5316ec
user:        Tim Foster 
date:        Tue Feb 22 10:00:49 2011 +1300
        16015 pkgdepend needs python runpath hints
        16020 pkgdepend doesn't find native modules
        17477 typo in pkgdepend man page
        17596 python search path not generated correctly for xml.dom.minidom
        17615 pkgdepend generate needs an exclusion mechanism
        17619 pkgdepend generate is broken for 64-bit binaries when passing relative run paths

pkgdepend(1) has become better at determining dependencies. I'd done some work on pkgdepend before, and it was nice to revisit the code.

For those unfamiliar with the tool, I thought I'd write an introduction to it (which I should have written last time).

pkgdepend in a nutshell

pkgdepend is used before publishing an IPS package to discover what other packages are needed in order for the contents of that package to function properly. Whenever the package is installed, the packaging system then uses that information to install those dependencies for you automatically.

During the creation of a package, the process of running pkgdepend on your manifests is broken into two phases, each with its own subcommand.

pkgdepend generate

The first is called 'generate'. This is where the code examines each of the files you're intending to publish in your package. Depending on the type of file it is, we look for clues in that file to see what other files it may depend on.

Those clues could be as simple as the path that comes after the '#!' in UNIX scripts (so for a Perl script with '#!/usr/bin/perl' at the top of it, obviously you need to have Perl installed in order to run the script) or could be complex, such as digging around in the ELF headers in an ELF binary to find the "NEEDED" libraries, determining Python module imports in a Python script, or looking at 'require_all' SMF services in an SMF manifest.
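As a rough illustration of the simplest of those clues, here's a minimal Python sketch that pulls the interpreter path out of a '#!' line. It isn't pkgdepend's actual code, and the function name shebang_interpreter is made up for this example.

```python
def shebang_interpreter(first_line):
    """Return the interpreter path named on a '#!' line, or None.

    Hypothetical helper for illustration; not part of pkgdepend.
    """
    if not first_line.startswith("#!"):
        return None
    # Everything after '#!' is the interpreter, optionally followed
    # by arguments; only the interpreter path is a file dependency.
    parts = first_line[2:].split()
    # A bare '#!' names no interpreter at all.
    return parts[0] if parts else None


print(shebang_interpreter("#!/usr/bin/perl -w\n"))  # /usr/bin/perl
```

The path that comes back is a file the script needs, which is exactly the sort of file-level dependency the generate phase records.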

There's a list of all the things used so far to determine dependencies in the pkgdepend(1) man page.

Once pkgdepend has gathered the set of files it thinks should be dependencies for the files you're delivering, it outputs another copy of your manifest, this time with partially complete 'depend' actions.

I say partially complete because all we know at this stage is that your package will need a bunch of files in order for it to function properly: we don't yet know what delivers those files. That's where the second phase of pkgdepend comes in: dependency resolution.

pkgdepend resolve

During dependency resolution, via the 'pkgdepend resolve' subcommand, we take that partially complete list of depend actions, and try to determine which package delivers each file the package depends on.

In order to do this, pkgdepend needs to be pointed at an image populated with all the packages that package could depend on. In most cases, the image is simply the machine you're building the packages on (remember, in IPS terms, every package is installed to an "image": your running copy of Solaris is itself an image), though you could choose to point 'pkgdepend resolve' to an alternate boot environment containing a different image.

Assuming we're successful, you are then presented with a version of your package with all dependencies converted from just the filenames needed to satisfy each dependency, to the actual packages IPS will install for you in order for your package to function.
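To make the idea behind resolution concrete, here's a small Python sketch that turns a list of needed files into package names by looking at what each package in an image delivers. It's a simplification under assumptions of my own: the function name, the dict-based "image", the package names, and the choice to treat a file with several providers as unresolved are all hypothetical, not pkgdepend's real data structures or behaviour.

```python
def resolve(file_deps, image_contents):
    """Map each needed file to the single package that delivers it.

    file_deps: iterable of file paths the package needs.
    image_contents: dict mapping package name -> set of delivered paths
                    (a stand-in for the image being resolved against).
    Returns (resolved, unresolved): a dict of path -> package name,
    and a list of paths that could not be pinned to one package.
    """
    # Invert the image contents: path -> list of packages delivering it.
    delivered_by = {}
    for pkg_name, paths in image_contents.items():
        for path in paths:
            delivered_by.setdefault(path, []).append(pkg_name)

    resolved, unresolved = {}, []
    for path in file_deps:
        providers = delivered_by.get(path, [])
        if len(providers) == 1:
            resolved[path] = providers[0]
        else:
            # Nothing delivers the file, or more than one package does.
            unresolved.append(path)
    return resolved, unresolved


# Hypothetical package names, for illustration only.
image = {
    "example/perl": {"usr/bin/perl"},
    "example/libfoo": {"usr/lib/libfoo.so.1"},
}
print(resolve(["usr/bin/perl", "usr/lib/libmissing.so.1"], image))
```

The unresolved list is the interesting part: those entries correspond to the cases where resolution can't give you a clean answer.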

Things that can go wrong

I say "assuming we're successful" because, unfortunately, sometimes we're not.

There are several things that can go wrong:

  • an ELF header entry could be incorrectly specified at build time, or could contain $VARIABLES that pkgdepend doesn't know how to expand
  • a file might be delivered by multiple packages on your system, in multiple places
  • a Python script might modify sys.path, a shell script might modify $LD_LIBRARY_PATH, etc.
  • we could deliver scripts only meant to be read, not run (demo scripts, for example) which could cause either fake dependencies, or dependencies which could never resolve

All of the things above can result in error messages from pkgdepend, where it's unable to determine exactly what we should be depending on - this is the part of pkgdepend I was trying to fix in my putback.

It fixes a few bugs in pkgdepend when dealing with Python modules and kernel modules, and it introduces two new IPS attributes:

  • pkg.depend.bypass-generate
  • pkg.depend.runpath

The first, pkg.depend.bypass-generate, is used to specify regular expressions matching files on which we should never generate dependencies. This gets us around the cases where multiple packages deliver files in several places, or where $VARIABLES aren't being expanded. Bypassing dependencies this way is useful, though you do need to be careful where and how you apply it -- if you bypass a legitimate dependency, then there's a good chance your package won't function properly if the packages it depends on aren't installed.
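The effect of a bypass can be sketched in a few lines of Python: drop any candidate dependency path that matches one of the supplied regular expressions. This only illustrates the idea and isn't the real implementation. The function name apply_bypass and the example paths are invented, and matching against the whole path (re.fullmatch) is an assumption on my part, so check the pkgdepend(1) man page for the exact matching rules.

```python
import re


def apply_bypass(candidate_paths, bypass_patterns):
    """Return the candidate paths not matched by any bypass pattern.

    Hypothetical helper; whole-path matching is an assumption.
    """
    compiled = [re.compile(pattern) for pattern in bypass_patterns]
    kept = []
    for path in candidate_paths:
        if any(rx.fullmatch(path) for rx in compiled):
            # A bypassed path never becomes a dependency.
            continue
        kept.append(path)
    return kept


# Hypothetical paths, for illustration only.
candidates = ["usr/lib/libfoo.so.1", "opt/demo/lib/libbar.so.1"]
print(apply_bypass(candidates, [r"opt/demo/.*"]))
```

Note how broad a pattern like 'opt/demo/.*' is: everything under that directory disappears from the dependency list, legitimate or not, which is why it pays to keep the expressions as narrow as you can.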

The second, pkg.depend.runpath, is used to change the standard set of directories, on a per-file-type basis, that pkgdepend searches for file dependencies. This gets us around the case where programs are installed in non-standard locations.
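Here's a toy Python model of why the run path matters: a needed file name is only found if one of the searched directories actually contains it. The directory lists, file names and function name are all hypothetical, and this says nothing about how pkgdepend stores or parses the attribute itself.

```python
import posixpath


def find_in_runpath(file_name, run_paths, delivered):
    """Return the first run-path location where file_name is delivered.

    run_paths: ordered list of directories to search.
    delivered: set of paths known to be delivered somewhere.
    Returns the matching path, or None if no directory has the file.
    """
    for directory in run_paths:
        candidate = posixpath.join(directory, file_name)
        if candidate in delivered:
            return candidate
    return None


# Hypothetical layout: the library lives outside the usual directories.
delivered = {"opt/example/lib/libfoo.so.1"}
default_runpath = ["lib", "usr/lib"]
custom_runpath = default_runpath + ["opt/example/lib"]

print(find_in_runpath("libfoo.so.1", default_runpath, delivered))  # None
print(find_in_runpath("libfoo.so.1", custom_runpath, delivered))
```

With only the default directories the search comes up empty; adding the non-standard directory lets the same file name be found.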

What's next?

Alongside this work, I've been working on the ON package manifests to greatly reduce the number of pkgdepend errors being reported during the ON build. (Sadly, I can't share the work on the ON manifests, but they will go back once snv_160 is available internally. If you're an ON engineer, there'll be a Flag Day attached to this, making snv_160 the minimum build on which you can build the gate.) Quite soon after that, we'll be able to enable error-reporting from the pkgdepend phase of the build, and that will be fabulous.

I'd strongly encourage those working on Illumos and other derivatives of the OpenSolaris codebase to investigate the new pkgdepend functionality, and put in the time to get their gate pkgdepend-clean too.

Why? Well, in my view, one of the problems with SVR4 packaging was that it lacked any sort of automatic dependency analysis. This meant that packages declared manual, often-bogus dependencies on other packages - and dependencies that aren't correct make minimisation of systems very difficult.

When we determine dependencies automatically, minimisation becomes a lot easier.

Crucially, so does package refactoring: if we split or merge packages, so long as those new packages are installed on the image being used to resolve dependencies, the packages that have dependencies on those split/merged packages automatically pick up the new package names the next time they're published.

However, without actually checking the exit status from the pkgdepend phase of the build, you're having to insert more manual dependency actions than should be strictly necessary, and that's a bad thing.

Of course, sometimes we can't avoid inserting manual dependencies - pkgdepend isn't finished yet, and there's more we could be doing to determine dependencies at package publication time. However, the tool does make life a lot easier. So, if you're ever tempted to insert a manual dependency into your package, please do think carefully about it, and please add a comment to the manifest explaining in detail why that manual dependency is really required.
