I should have written about this a few days ago, but better late than never.
With the putback of:
changeset: 2236:7b074b5316ec
user: Tim Foster
date: Tue Feb 22 10:00:49 2011 +1300
description:
16015 pkgdepend needs python runpath hints
16020 pkgdepend doesn't find native modules
17477 typo in pkgdepend man page
17596 python search path not generated correctly for xml.dom.minidom
17615 pkgdepend generate needs an exclusion mechanism
17619 pkgdepend generate is broken for 64-bit binaries when passing relative run paths
pkgdepend(1) has become better at determining dependencies. I'd done some work on pkgdepend before, and it was nice to revisit the code.
For those unfamiliar with the tool, I thought I'd write an introduction to it (which I should have done last time).
pkgdepend in a nutshell
pkgdepend is used before publishing an IPS package to discover which other packages are needed for the contents of that package to function properly. Whenever your package is installed, the packaging system then uses those dependencies to install the required packages for you automatically.
During the creation of a package, the process of running pkgdepend on your manifests is broken into two phases, each with its own subcommand.
The first is called 'generate'. This is where the code examines each of the files you're intending to publish in your package. Depending on the type of file, we look for clues in it that indicate which other files it may depend on.
Those clues could be as simple as the path that follows the '#!' in UNIX scripts (for a Perl script with '#!/usr/bin/perl' at the top, you obviously need Perl installed in order to run it), or they could be more complex, such as digging around in the ELF headers of a binary to find its "NEEDED" libraries, determining which modules a Python script imports, or looking at 'require_all' SMF services in an SMF manifest.
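To make the 'generate' idea concrete, here's a minimal Python sketch of two of those clues: reading the interpreter from a '#!' line, and listing the modules a Python script imports using the standard ast module. The function names are hypothetical and this is not pkgdepend's actual code. The real tool also has to map each module name onto candidate files along the Python search path, which this sketch skips.

```python
import ast


def shebang_dependency(script_text):
    """Return the interpreter path from a script's '#!' line, or None."""
    first_line = script_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return None
    words = first_line[2:].split()
    # The first word is the interpreter; anything after it is an argument.
    return words[0] if words else None


def python_imports(source):
    """Return the names of the absolute imports in a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # 'from xml.dom import minidom' records 'xml.dom'; relative
            # imports (level > 0) are skipped in this sketch.
            modules.add(node.module)
    return modules


print(shebang_dependency("#!/usr/bin/perl -w\nprint 'hello';\n"))  # /usr/bin/perl
print(sorted(python_imports("import os\nfrom xml.dom import minidom\n")))  # ['os', 'xml.dom']
```

A fuller implementation would also have to consider that a name like 'minidom' in a 'from ... import' statement may itself be a submodule that needs its own file.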
There's a list of all the things used so far to determine dependencies in the pkgdepend(1) man page.
Once pkgdepend has gathered the set of files it thinks your delivered files depend on, it outputs another copy of your manifest, this time with partially complete 'depend' actions.
I say partially complete because all we know at this stage is that your package needs a set of files in order to function properly: we don't yet know which packages deliver those files. That's where the second phase of pkgdepend comes in: dependency resolution.
During dependency resolution, via the 'pkgdepend resolve' subcommand, we take that partially complete list of depend actions and try to determine which package delivers each file the package depends on.
To do this, pkgdepend needs to be pointed at an image populated with all the packages your package could depend on. In IPS terms, every package is installed into an "image", and your running copy of Solaris is itself an image, so in most cases the image is simply the machine you're building the packages on. You could, however, point 'pkgdepend resolve' at an alternate boot environment containing a different image.
Assuming we're successful, you're then presented with a version of your manifest in which each dependency has been converted from the filename needed to satisfy it into the actual package IPS will install for you so that your package can function.
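Here's a rough sketch of the resolution step, assuming the image can be modelled as a hypothetical dictionary from package name to the set of paths that package delivers (the package names below are made up). In this sketch a file dependency resolves cleanly only when exactly one package delivers it. Anything else is simply reported as an error, and the real tool may handle some of these cases differently.

```python
def build_file_index(image):
    """Map each delivered path to the set of packages delivering it."""
    index = {}
    for pkg, paths in image.items():
        for path in paths:
            index.setdefault(path, set()).add(pkg)
    return index


def resolve(file_deps, image):
    """Convert file dependencies into package dependencies.

    Returns (resolved, errors): 'resolved' maps a path to the single
    package delivering it; 'errors' maps a path to the reason it failed.
    """
    index = build_file_index(image)
    resolved, errors = {}, {}
    for path in file_deps:
        providers = index.get(path, set())
        if len(providers) == 1:
            resolved[path] = next(iter(providers))
        elif not providers:
            errors[path] = "no package in the image delivers this file"
        else:
            errors[path] = "delivered by: " + ", ".join(sorted(providers))
    return resolved, errors


# Hypothetical image contents: package name -> paths it delivers.
image = {
    "example/perl": {"usr/bin/perl"},
    "example/libfoo": {"usr/lib/libfoo.so.1"},
    "example/libfoo-compat": {"usr/lib/libfoo.so.1"},
}
resolved, errors = resolve(
    ["usr/bin/perl", "usr/lib/libfoo.so.1", "usr/bin/python"], image)
print(resolved)
for path, reason in sorted(errors.items()):
    print(path, "->", reason)
```

Two of the three dependencies fail here, for two of the reasons described in the next section: one file is delivered by more than one package, and another isn't delivered by anything in the image.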
Things that can go wrong
I say "assuming we're successful" because, unfortunately, sometimes we're not.
There are several things that can go wrong:
- an ELF header entry could be incorrectly specified at build time, or could contain variables that pkgdepend doesn't know how to expand
- a file might be delivered by multiple packages on your system, in multiple places
- a Python script might modify sys.path, a shell script might modify
- we could deliver scripts only meant to be read, not run (demo scripts, for example) which could cause either fake dependencies, or dependencies which could never resolve
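To illustrate the first problem above, here's a small sketch that expands run path tokens. It treats only $ORIGIN (the directory containing the file) as expandable and reports any other $NAME or ${NAME} token it finds. That choice is an assumption made purely for illustration, and the function name is hypothetical. Check pkgdepend(1) for the tokens the real tool understands.

```python
import posixpath
import re

# Matches $NAME or ${NAME} tokens in a run path entry.
TOKEN_RE = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")


def expand_runpath(entry, file_path):
    """Expand $ORIGIN in a run path entry and list any other tokens.

    Only $ORIGIN is treated as expandable here, purely for illustration.
    Returns (expanded_entry, unknown_token_names).
    """
    origin = posixpath.dirname(file_path)
    unknown = []

    def substitute(match):
        name = match.group(1)
        if name == "ORIGIN":
            return origin
        unknown.append(name)
        return match.group(0)  # leave tokens we can't expand untouched

    expanded = TOKEN_RE.sub(substitute, entry)
    return posixpath.normpath(expanded), unknown


print(expand_runpath("$ORIGIN/../lib", "usr/bin/foo"))    # ('usr/lib', [])
print(expand_runpath("/opt/$VENDOR/lib", "usr/bin/foo"))  # ('/opt/$VENDOR/lib', ['VENDOR'])
```

Any path that still contains an unknown token can't be searched for in the image, so a dependency resolved through it is bound to fail.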
All of the things above can result in error messages from pkgdepend when it's unable to determine exactly what we should be depending on. This is the part of pkgdepend I was trying to fix in my putback.
It fixes a few bugs in pkgdepend when dealing with Python modules and kernel modules, and it introduces two new IPS attributes:
pkg.depend.bypass-generate is used to specify regular expressions matching files on which we should never generate dependencies. This gets us around the cases where multiple packages deliver a file in several places, or where $VARIABLES aren't being expanded. Bypassing dependencies this way is useful, but you do need to be careful where and how you apply it: if you bypass a legitimate dependency, there's a good chance your package won't function properly when the packages it depends on aren't installed.
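Conceptually, pkg.depend.bypass-generate acts as a filter over the dependencies that 'generate' would otherwise emit for a file. Here's a minimal sketch, assuming each pattern has to match the whole candidate path. The exact matching rules are described in pkgdepend(1), and the function and data below are hypothetical.

```python
import re


def apply_bypass(candidate_deps, bypass_patterns):
    """Split candidate dependency paths into (kept, bypassed).

    Assumption for this sketch: a pattern must match the whole path,
    so re.fullmatch is used rather than re.search.
    """
    compiled = [re.compile(p) for p in bypass_patterns]
    kept, bypassed = [], []
    for path in candidate_deps:
        if any(rx.fullmatch(path) for rx in compiled):
            bypassed.append(path)
        else:
            kept.append(path)
    return kept, bypassed


# Hypothetical dependencies 'generate' found for a demo script.
candidates = ["usr/bin/perl", "usr/share/demo/helper.pl", "lib/libc.so.1"]
kept, bypassed = apply_bypass(candidates, [r"usr/share/demo/.*"])
print(kept)      # ['usr/bin/perl', 'lib/libc.so.1']
print(bypassed)  # ['usr/share/demo/helper.pl']
```

Note how an over-broad pattern such as .* would silently discard every dependency of the file, which is exactly the kind of careless bypass to avoid.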
pkg.depend.runpath is used to change the standard set of directories, per file type, that pkgdepend searches for file dependencies. This gets us around the case where programs are installed in non-standard locations.
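Here's a sketch of what changing the runpath means for the search. It assumes that a per-file runpath simply replaces the default directories for that file type. The DEFAULT_RUNPATHS table is a hypothetical placeholder, not pkgdepend's real defaults, and the function name is made up.

```python
import posixpath

# Hypothetical per-file-type default search directories; these are
# placeholders, not pkgdepend's real defaults.
DEFAULT_RUNPATHS = {
    "elf": ["lib", "usr/lib"],
    "python": ["usr/lib/python2.6", "usr/lib/python2.6/vendor-packages"],
}


def find_dependency(name, file_type, delivered, runpath=None):
    """Return the delivered paths that could satisfy a dependency on 'name'.

    If 'runpath' is given it replaces the default directories for the
    file type (an assumption of this sketch).
    """
    search = runpath if runpath is not None else DEFAULT_RUNPATHS[file_type]
    candidates = (posixpath.join(d, name) for d in search)
    return [path for path in candidates if path in delivered]


# Paths delivered by packages in a hypothetical image.
delivered = {"usr/lib/libfoo.so.1", "opt/myapp/lib/libbar.so.1"}
print(find_dependency("libbar.so.1", "elf", delivered))
print(find_dependency("libbar.so.1", "elf", delivered,
                      runpath=["opt/myapp/lib", "usr/lib"]))
```

Without a runpath, the library installed under opt/myapp/lib is never found and the dependency can't be resolved. With one, the search succeeds.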
Alongside this work, I've been working on the ON package manifests to greatly reduce the number of pkgdepend errors reported during the ON build. Sadly, I can't share the work on the ON manifests, but the changes will go back once snv_160 is available internally. If you're an ON engineer, there'll be a Flag Day attached to this, making snv_160 the minimum build on which you can build the gate. Quite soon after that, we'll be able to enable error reporting from the pkgdepend phase of the build, and that will be fabulous.
I'd strongly encourage those working on Illumos and other derivatives of the OpenSolaris codebase to investigate the new pkgdepend functionality, and to put in the time to get their gates pkgdepend-clean too.
Why? Well, in my view, one of the problems with SVR4 packaging was that it lacked any sort of automatic dependency analysis. This meant that packages declared manual, often bogus, dependencies on other packages, and incorrect dependencies make minimisation of systems very difficult.
When we determine dependencies automatically, minimisation becomes a lot easier.
Crucially, so does package refactoring: if we split or merge packages, so long as those new packages are installed on the image being used to resolve dependencies, the packages that have dependencies on those split/merged packages automatically pick up the new package names the next time they're published.
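The refactoring argument can be sketched in a few lines. The file dependencies recorded at generate time don't change when packages are split, so re-resolving them against an image containing the split packages yields the new package names. The image contents and package names below are hypothetical.

```python
def resolve_to_packages(file_deps, image):
    """Map each file dependency to the sorted list of packages delivering it."""
    return {path: sorted(pkg for pkg, paths in image.items() if path in paths)
            for path in file_deps}


# File dependencies recorded by 'generate'; refactoring doesn't change them.
file_deps = ["usr/bin/widget", "usr/lib/libwidget.so.1"]

# Hypothetical image before the refactoring: one package delivers both files.
before = {"example/widget": {"usr/bin/widget", "usr/lib/libwidget.so.1"}}
# Hypothetical image after the library is split into its own package.
after = {"example/widget": {"usr/bin/widget"},
         "example/libwidget": {"usr/lib/libwidget.so.1"}}

print(resolve_to_packages(file_deps, before))
print(resolve_to_packages(file_deps, after))
```

Nothing in the dependent package's manifest had to be edited by hand. Only the image used for resolution changed.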
However, if you don't actually check the exit status of the pkgdepend phase of the build, you end up inserting more manual dependency actions than should strictly be necessary, and that's a bad thing.
Of course, sometimes we can't avoid inserting manual dependencies: pkgdepend isn't finished yet, and there's more we could be doing to determine dependencies at package publication time. Still, the tool does make life a lot easier. So, if you're ever tempted to insert a manual dependency into your package, please think carefully about it, and add a comment to the manifest explaining in detail why that manual dependency is really required.