Bundler searches an NP-complete problem
'bundle install' may take a very long time...
Bundler is a tool for managing and resolving dependencies of Ruby programs. It resolves dependencies and installs Gems and is similar to Apache Maven for Java jars or Apt for Debian packages or many other similar tools.
The dependency resolution algorithm used in bundler (up to at least version 1.1) attempts an exhaustive search (using backtracking or backjumping) of what is an NP-complete problem. In most cases, a solution is found quickly. But in some real-world cases, the solution is not found quickly, and bundler appears to have hung.
I first encountered this problem at work. We have several internal libraries packaged as gems. Our build server auto-increments the patch level for each build, and developers may increment the major or minor version from time to time. This results in a great many versions in our internal gemserver; a dangerous situation for any algorithm checking every combination of those many versions.
Most of our gems list a dependency on 'rspec', '~> 2.5.0'. But one listed this as 'rspec', '>= 2.5.0'. The gem with this less restrictive requirement happened to be resolved first, pulling in the latest version of rspec, 2.8 at the time. When bundler's resolver algorithm encountered gems with the more restrictive rspec dependency, it backtracked, trying every older version of that first gem, all the others, and finally all the rspec versions. After nearly 45 minutes, bundler converged on a solution. (Run DEBUG_RESOLVER=y bundle install to see debugging output during dependency resolution.)
I was a bit surprised at this. I had not thought much about the dependency resolution problem, but dependency resolution seems to be NP-complete. Indeed, the EDOS project performed an excellent review of the problem and real-world implementations. They conclude:
This means that automatic package installation tools like APT, URPMI or SMART live dangerously on the edge of intractability, and must carefully apply heuristics that may be either safe (the approach advocated by SMART), and hence still not guaranteed to avoid intractability, or unsafe, thus accepting the risk of not always finding a solution when it exists.
It appears that Apt takes the "unsafe" approach of not guaranteeing a solution even when one exists, but bundler takes the "safe" approach while risking intractability. The documentation of bundle install should mention this.
I was able to contrive a test case showing this worst-case behavior. I hope to find time to at least add some feedback during bundle install to indicate to the user that the resolver is searching for a solution. I personally think bundler should by default not attempt an exhaustive search, but finding an "unsafe" algorithm that works in most cases is non-trivial.
This is not an unknown problem. It gets a small mention in Yehuda Katz' blog. Chris Continanza gives a more in-depth description in his talk, Bundler: The Easy, The Hard, and the NP-Complete.
