Dependency Management Rant

It is spectacularly amazing to me how many seasoned software engineers adopt the ostrich method of dealing with dependency management: pretend it’s not a problem.

Maybe it’s because I help run a software quality assurance company that I’m so sensitive to this issue, or maybe I’ve drifted into that job because I understand that being a good software engineer has very little to do with how impressive your code looks. I don’t know. Suffice it to say that I’ve run into a huge variety of problems in my career that were all caused by bad dependency management, or a complete lack thereof.

So let’s start first with a quick definition of what it is that I’m talking about. The thing is, dependency management means different things to different people: some view it in terms of package management (such as in operating system installations), others in terms of libraries loaded into your software at run-time, and yet others in terms of libraries available to your software at compile-time. And it’s exactly in these perception differences that some of the problems lie, I think.

All these views share an underlying problem, though: when you write software, you rarely write it completely from scratch; instead, you rely on operating system APIs and third-party libraries to do a lot of the work. On the computer on which you write your code, specific versions of OS APIs and third-party libraries will be installed, and whether intentionally or not, you will write your code to accommodate all the quirks and oddities of that particular environment.

Through rigorous testing, you can perhaps guarantee that your software will run in an exact replica of your development environment — perhaps. But as for any other environment, all bets are off. So how do you deal with that?

In one form or another, you have no choice but to make a complete list of the OS APIs and third-party libraries that your software is using. That’s the easy first step.

A second step is to note down the version of each OS API or third-party library that you’ve got installed in your development environment. APIs can change from library version to library version, and those changes can break the behaviour of your software. Viewed from a certain angle, an incompatible API version is a worse problem than a missing library. Your software can warn at start-up if a required library is not available, but if it is available, there is no guarantee that an API function it provides actually does the expected thing — that is, the same thing that it’s doing in the version you were developing against.
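As a rough sketch of those first two steps, assuming a Python project whose dependencies are installed as packages: the list of names and versions is just a dictionary, and a start-up check compares it against what is actually installed. The function name `check_dependencies` and the package names in the test data are made up for illustration; only `importlib.metadata.version` and `PackageNotFoundError` come from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def check_dependencies(required, lookup=version):
    """Compare recorded versions against what is installed.

    `required` maps a distribution name to the exact version recorded in
    the development environment (hypothetical names in this sketch).
    `lookup` returns the installed version for a name and raises
    PackageNotFoundError when the distribution is missing.
    Returns a list of human-readable problems; empty means all matched.
    """
    problems = []
    for name, expected in sorted(required.items()):
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: missing (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{name}: found {installed}, expected {expected}")
    return problems
```

Note what this does and does not buy you: it catches a missing library and a version that differs from the one you developed against, but it says nothing about whether a function in a differing version still behaves the same way.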

Let’s leave API compatibility for a moment — I’ll get back to it at a later date. For the purposes of this article, let’s continue with your list of libraries (and now versions). Having that list is a necessary condition for managing dependencies properly, but it is not sufficient. In order to ensure your software runs properly, you have to ensure that these dependencies are available in every environment your software is meant to run in. To do so, you have several options (though I wouldn’t call all of them solutions) available to you:

  • Statically link against all dependencies. That doesn’t work across all languages, and can bloat your software’s distribution size and memory usage, in some cases drastically. Nevertheless, because of its ease of use, it’s a strategy adopted by most Mac OS X software[1].
  • Rely on the operating system’s ability to resolve dependencies at install-time. Very few operating systems provide reliable dependency management as part of their distribution mechanism[2].
  • Provide your own install-time dependency resolution mechanism. This is a solution adopted by many Windows applications; a problem here is that two different packages might require two conflicting versions of the same dependency. That is not a problem you can hope to solve without operating system support, and Windows lacks support for this.
  • Rely on the operating system’s ability to resolve dependencies at run-time. This is the approach adopted by the Android mobile operating system, and I have previously pointed out how it fails.
  • Leave it up to the user. That’ll earn you their gratitude…
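The conflict mentioned in the third option can be made concrete with a small sketch. It is illustrative only: the function name `find_conflicts` and all package names are invented, and it assumes every package pins exact versions, whereas real installers usually have to deal with version ranges.

```python
from collections import defaultdict


def find_conflicts(packages):
    """Return dependencies that packages pin to different versions.

    `packages` maps a package name to its {dependency: exact_version} pins.
    The result maps each contested dependency to {version: [packages]}.
    """
    wanted = defaultdict(lambda: defaultdict(list))
    for package, pins in packages.items():
        for dependency, pinned in pins.items():
            wanted[dependency][pinned].append(package)
    # Only dependencies requested in more than one version are conflicts.
    return {
        dependency: {v: sorted(users) for v, users in versions.items()}
        for dependency, versions in wanted.items()
        if len(versions) > 1
    }
```

Detecting the conflict is the easy part. Resolving it means either installing both versions side by side or refusing one of the packages, and an installer that only knows about its own package cannot make that decision for the whole system.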

As you can glean from the above list, dependency management falls in the realm of either the operating system or the person who packages a piece of software[3]. In the case of Linux, most distributions opt for the first approach. Unfortunately, that requires that the distribution vendor keep tight control over the default distribution channel for third-party software such as your own: your software is either in the distribution, or it’s not.

Not all is lost, though: by providing a strict framework for how software is distributed and how dependencies are to be managed, it is entirely possible for third-party software to be installed from unofficial sources (channels) using the same distribution mechanism. In effect, it makes your work as an independent software vendor easier, as you can fall back on a well-established dependency management solution[4].

If, of course, you’re in the lucky position that you control both the operating system on which your software is supposed to run, and some custom software that runs on top of it, you automatically control the whole software stack, and can adopt approaches similar to the first outlined above: simply bundle everything you need.

To summarize: dependency management is a difficult problem even under the best circumstances. On the other hand, there are systems out there that have solved the problem well. One could learn from them, at the very least.

  1. Technically they bundle dependencies in a private directory from which they’re loaded dynamically.
  2. I’d point to Debian Linux as the only operating system I know of that handles this relatively satisfactorily.
  3. Usually a release manager, which is why it’s possible to view dependency management as part of release management.
  4. The downside is that many Linux distributions implement different dependency management solutions, so you have to pick and choose the platforms you want to distribute your software to, and none of this helps with Windows or OS X software.

  • http://twitter.com/emilianbold Emilian Bold

    It’s not a simple problem. The way I see it, dependency management is something the whole industry is more or less ignoring.

    At compile-time I might say Maven should suffice for simple Java stuff. 

    But what about a build system that uses specific OSes? How do we declare a compile-time dependency that needs some native DLLs compiled on 3 OSes?

    Then, a complex deployment needs its own thing via Puppet/Chef.

    I’m not certain there even is a project that tries to tackle these kinds of problems in a global way.

    I think the real dependency graph would scare people not only due to its size but because it will expose how brittle everything is.

    • http://www.unwesen.de/ unwesen

      No, it’s not a simple problem. However, at least some aspects of solving it are well-explored: we know it’s too complex to handle manually. We also know it’s a question of interfaces and their versions, not merely of libraries.

      My problem isn’t so much that there are no tools to handle all of this complexity well enough (though that would be nice!). My problem is that too many developers don’t consider it a problem at all.

      You just need to take a look at e.g. rubygems (just because it’s used in the RoR community): gem (package) versions are used only to determine whether a gem is out of date. There is no concept of an interface version at all. So if you update a gem, you can break all sorts of software. Clearly whoever came up with rubygems did not consider versioning a priority.

      Because of that, there’s now Bundler, which basically copies specific versions of gems into the RoR app’s local search directory. The end result is the same not-quite-statically linked thing that happens on OS X natively.