I'm in the process of writing a package manager, and for that I want the dependency resolution to be as powerful as possible.
Each package has a list of versions, and each version contains the following information:
A comparable ID
Dependencies (a list of packages and for each package a set of acceptable versions)
Conflicts (a list of packages and for each package a set of versions that cause issues together with this version)
Provides (a list of packages and for each package a set of versions that this package also provides/contains)
For the current state I have a list of packages and their current versions.
I now want to be able to, given the list of available packages and the current state, determine a version for each package in a list of requested packages, taking the given constraints into account (dependencies, conflicting packages, packages provided by other packages). Circular dependencies are possible.
If no valid state can be reached with the current versions, the versions of the already-installed packages may be changed, though this should only be done if necessary. Should it not be possible to reach a valid state at all, as much information about the reason as possible should be available (to tell the user "it could work if you remove X", etc.).
Ideally it should also be possible to "lock" packages to a specific version, in which case the version of that package may NOT be changed.
What I'm trying to accomplish is very similar to what existing package managers already do, with the difference that not necessarily the latest version of a package needs to be used (an assumption which most package managers seem to make).
The only idea I have so far is building a structure of all possible states, for all possible versions of the packages in question, and then removing invalid states. I really hope this is not the only solution, since it feels very "brute force"-ish. Staying under a few seconds for ~500 available packages with ~100 versions each, and ~150 installed packages would be a good goal (though the faster the better).
I don't believe this is a language-specific question, but to better illustrate it, here is a bit of pseudocode:
struct Version
    integer id
    list<Package, set<integer>> dependencies
    list<Package, set<integer>> conflicts
    list<Package, set<integer>> provides

struct Package
    string id
    list<Version> versions

struct State
    map<Package, Version> packages
    map<Package, boolean> isVersionLocked

State resolve(State initialState, list<Package> availablePackages, list<Package> newPackages)
{
    // do stuff here
}
(If you have actual code, or know of an existing implementation of something that does this (in any language, C++ preferred), feel free to mention it anyway.)
It's NP-hard
Some bad news: This problem is NP-hard, so unless P=NP, there is no algorithm that can efficiently solve all instances of it. I'll prove this by showing how to convert, in polynomial time, any given instance of the NP-hard problem 3SAT into a dependency graph structure suitable for input to your problem, and how to turn the output of any dependency resolution algorithm on that problem back into a solution to the original 3SAT problem, again in polynomial time. The logic is basically that if there were some algorithm that could solve your dependency resolution problem in polynomial time, then it would also solve any 3SAT instance in polynomial time -- and since computer scientists have spent decades looking for such an algorithm without finding one, it is believed that no such algorithm exists.
I'll assume in the following that at most one version of any package can be installed at any time. (This is equivalent to assuming that there are implicit conflicts between every pair of distinct versions of the same package.)
First, let's formulate a slightly relaxed version of the dependency resolution problem in which we assume that no packages are already installed. All we want is an algorithm that, given a "target" package, either returns a set of package versions to install that (a) includes some version of the target package and (b) satisfies all dependency and conflict properties of every package in the set, or returns "IMPOSSIBLE" if no set of package versions will work. Clearly if this problem is NP-hard, then so is the more general problem in which we also specify a set of already-installed package versions that are not to be changed.
Constructing the instance
Suppose we are given a 3SAT instance containing n clauses and k variables. We will create 2 packages for each variable: one corresponding to the literal x_k, and one corresponding to the literal !x_k. The x_k package will have a conflict with the !x_k package, and vice versa, ensuring that at most one of these two packages will ever be installed by the package manager. All of these "literal" packages will have just a single version, and no dependencies.
For each clause we will also create a single "parent" package, and 7 versions of a "child" package. Each parent package will be dependent on any of the 7 versions of its child package. Child packages correspond to ways of choosing at least one item from a set of 3 items, and will each have 3 dependencies on the corresponding literal packages. For example, a clause (p, !q, r) will have child package versions having dependencies on the literal packages (p, q, !r), (!p, !q, !r), (!p, q, r), (p, !q, !r), (p, q, r), (!p, !q, r), and (p, !q, r): the first 3 versions satisfy exactly one of the literals p, !q or r; the next 3 versions satisfy exactly 2; and the last satisfies all 3.
Finally, we create a "root" package, which has all of the n parent clause packages as its dependencies. This will be the package that we ask the package manager to install.
If we run the package manager on this set of 2k + 8n + 1 package versions, asking it to install the root package, it will either return "IMPOSSIBLE", or a list of package versions to install. In the former case, the 3SAT problem is unsatisfiable. In the latter case, we can extract values for the variables easily: if the literal package for x_k was installed, set x_k to true; if the literal package !x_k was installed, set x_k to false. (Note that there won't be any variables with neither literal package installed: each variable appears in at least one clause, and each clause produces 7 child package versions, at least one of which must be installed, and which will force installation of one of the two literals for that variable.)
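The per-clause gadget can be sketched in code (an illustration of the construction described above, not part of the original answer; literal names like 'p' and '!p' stand in for the literal packages):

```python
from itertools import product

def child_versions(clause):
    """One child-package version per truth assignment of the clause's three
    variables that satisfies the clause; each version's dependency list names
    the literal package ('p' or '!p') it depends on.
    clause: (variable, is_positive) pairs, e.g. (p, !q, r) is
    [('p', True), ('q', False), ('r', True)]."""
    versions = []
    for assignment in product([True, False], repeat=len(clause)):
        # the assignment satisfies the clause iff some literal evaluates to true
        if any(value == positive for (_, positive), value in zip(clause, assignment)):
            versions.append([name if value else '!' + name
                             for (name, _), value in zip(clause, assignment)])
    return versions

vs = child_versions([('p', True), ('q', False), ('r', True)])
print(len(vs))  # 7: every assignment except (!p, q, !r), which falsifies the clause
```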
Even some restrictions are hard
This construction doesn't make any use of pre-installed packages or "Provides" information, so the problem remains NP-hard even when those aren't permitted. More interestingly, given our assumption that at most one version of any package can be installed at a time, the problem remains NP-hard even if we don't permit conflicts: instead of making the literals x_k and !x_k separate packages with conflict clauses in each direction, we just make them two different versions of the same package!
Related
How does Gradle resolve dependencies when version ranges are involved? Unfortunately, I couldn't find a sufficient explanation in the official docs.
Suppose we have a project A that has two declared dependencies B and C. Those dependencies specify a range of versions, i.e. for B it is [2.0, 3.0) and for C it is [1.0, 1.5].
B itself does not have a dependency on C until version 2.8, which introduced a strict dependency on C in the range [1.1, 1.2].
Looking at this example, we might determine that the resolved versions are:
B: 2.8 (because it is the highest version in the range)
C: 1.2 (because given B at 2.8, this is the highest version that satisfies both required ranges)
In general, it is not clear to me how this entire algorithm is carried out exactly. In particular, when ranges are involved, every possible choice of a concrete version inside a range might introduce different transitive dependencies (as in my example, where B introduces a dependency on C only at version 2.8), which can themselves declare dependencies with ranges, and so on, making the number of possibilities explode quickly.
Does it apply some sort of greedy strategy, in that it tries to settle a version as early as possible, and if later a new dependency is encountered that conflicts with the already chosen one, it backtracks and chooses another version?
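Purely as a toy sketch of that speculated highest-first, backtracking strategy (this is not Gradle's actual algorithm; the available versions and dependency data are invented to match the example above):

```python
available = {
    "B": [(2, 0), (2, 5), (2, 8)],           # all lie in [2.0, 3.0)
    "C": [(1, 0), (1, 1), (1, 2), (1, 5)],
}
# deps[pkg][version] -> list of (dep_pkg, lo, hi), bounds inclusive;
# B's upper bound (2, 9) approximates [2.0, 3.0) over the listed versions
deps = {
    "A": {(0, 0): [("B", (2, 0), (2, 9)), ("C", (1, 0), (1, 5))]},
    "B": {(2, 8): [("C", (1, 1), (1, 2))]},  # only B 2.8 depends on C
    "C": {},
}

def resolve(pending, chosen):
    """Pick the highest available version in range for each pending
    (pkg, lo, hi); backtrack if a later constraint rules out an earlier pick."""
    if not pending:
        return chosen
    (pkg, lo, hi), rest = pending[0], pending[1:]
    if pkg in chosen:                        # already settled: must fit the range
        return resolve(rest, chosen) if lo <= chosen[pkg] <= hi else None
    for v in sorted(available[pkg], reverse=True):   # try highest first
        if lo <= v <= hi:
            new_deps = deps.get(pkg, {}).get(v, [])
            result = resolve(rest + new_deps, {**chosen, pkg: v})
            if result is not None:
                return result
    return None                              # no version works: backtrack

print(resolve(deps["A"][(0, 0)], {}))        # {'B': (2, 8), 'C': (1, 2)}
```

Here C is first greedily settled at 1.5, then B 2.8's constraint [1.1, 1.2] forces backtracking to C 1.2, matching the resolution speculated above.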
Any help in understanding this is very much appreciated.
EDIT: I've read here that the problem in general is NP-hard. So does Gradle actually simplify the process somehow, to make it solvable in polynomial time? And if so, how?
I'm trying to use NuSMV as a satisfiability checker for LTL formulae, i.e. I want to know whether there exists a model for a given formula.
I know that NuSMV can be used for this purpose as well, both because it is theoretically possible and because I see it cited in a lot of papers that deal with satisfiability (one of them even claims that NuSMV is one of the fastest satisfiability checkers out there).
I see that NuSMV ships with a tool called ltl2smv that apparently translates an LTL formula into an SMV module, but I don't know how to use the output. Giving it directly to NuSMV produces an error message about "main" being undefined, so I suppose I have to define a main module and use the generated one in some way. Since I've never used NuSMV as a model checker, I have no idea how its language works, and the User Guide is overwhelming given that I only need this specific use case, which, by the way, is not mentioned anywhere in said guide.
So how can I use NuSMV to check the satisfiability of an LTL formula? Is there a place where this use case is documented?
Have a look at the chapter about LTL model checking in NuSMV's user manual. It comes with an example of how LTL specifications can be expressed in a module and checked:
MODULE main
VAR
...
ASSIGN
...
LTLSPEC <LTL specification 1>
LTLSPEC <LTL specification 2>
...
NuSMV checks whether the specifications hold on all possible paths. To check whether there exists a model (i.e. a path) for your formula, you can enter its negation, and the model checker will give you a counter-example if one exists. That counter-example is then an example of your original formula.
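For instance (a minimal sketch; the formula and the free variable are my own example, not from the manual), to check whether G p is satisfiable you would model-check its negation:

```
MODULE main
VAR
  p : boolean;
LTLSPEC !(G p)
```

Since p is left unconstrained, every boolean sequence is a path of this model, so NuSMV reports the specification false and prints a trace on which p holds forever, i.e. a model of G p. If NuSMV instead reported the specification true, G p would be unsatisfiable.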
One way is to use PolSAT. This takes as input an LTL formula, and feeds it to a number of different LTL solvers. This is usually faster than just using NuSMV alone. If you replace the NuSMV binary with /bin/false and run ./polsat 'Gp & F X ~ p' it will abort and leave behind a file ../NuSMV/tmpin.smv containing something like:
MODULE main
VAR
  Gp:boolean;
  p:boolean;
LTLSPEC
  !(((Gp)&(F (X (!(p))))))
(Note that PolSAT interpreted Gp as a single variable). You can then run NuSMV directly with the command ../NuSMV/nusmv/NuSMV < ../NuSMV/tmpin.smv.
If you want to install PolSAT, it can presently be downloaded from https://lab301.cn/home/pages/lijianwen/. v0.1 has a bit of difficulty building on modern machines; you may need to downgrade bison to version 2.7 (see e.g. https://askubuntu.com/questions/444982/install-bison-2-7-in-ubuntu-14-04).
When working with indexed collections (most often immutable Vectors), I often use coll.last as what I supposed to be a convenient shortcut for coll(coll.size - 1). While randomly inspecting my sources, I clicked through to see the implementation of last, and the IntelliJ IDE took me to the TraversableLike.last implementation, which traverses all elements to eventually reach the last one.
This was a surprise to me, and I am not sure now what the reason for this is. Is last really implemented this way? Is there some reason preventing last from being implemented efficiently for IndexedSeq (or perhaps for IndexedSeqLike)?
(Scala SDK used is 2.11.4)
IndexedSeq does not override last (it only inherits it from TraversableLike) - the fact that a particular sequence supports indexed access does not necessarily make indexed lookups faster than traversals. However, such optimized implementations are given in IndexedSeqOptimized, which I would expect many implementations to inherit from. In the specific case of Vector, last is overridden explicitly in the class itself.
IndexedSeq has constant access time for an arbitrary element; LinearSeq has linear time. TraversableLike is just the common interface, and you may find that last is overridden inside the IndexedSeqOptimized trait:
A template trait for indexed sequences of type IndexedSeq[A] which
optimizes the implementation of several methods under the
assumption of fast random access.
def last: A = if (length > 0) this(length - 1) else super.last
You may also find the quick random access implementation inside Vector.getElem - it uses a tree of arrays with a high branching factor, so apply is effectively O(1). Vector doesn't use IndexedSeqOptimized, but it has its own overridden last:
override /*TraversableLike*/ def last: A = {
    if (isEmpty) throw new UnsupportedOperationException("empty.last")
    apply(length - 1)
}
So it's a bit of a mess inside the Scala collections, which is very common for Scala internals. In any case, last on IndexedSeqs is O(1) de facto, regardless of this tricky collections architecture.
The intricacy of the Scala collections is actually an active topic. A talk (and slides) criticizing Scala's collection framework can be found at Paul Phillips: Scala Collections: Why Not?, and Paul Phillips has been developing his own alternative version of the standard library.
The resolution problem is described in the modularity chapter of the OSGi R4 core specification. It's a constraint satisfaction problem and certainly a challenging problem to solve efficiently, i.e. not by brute force. The main complications are the uses constraint, which has non-local effects, and the freedom to drop optional imports to obtain a successful resolution.
NP-Completeness is dealt with elsewhere on StackOverflow.
There has already been plenty of speculation about the answer to this question, so please avoid speculation. Good answers will include a proof or, failing that, a compelling informal argument.
The answer to this question will be valuable to those projects building resolvers for OSGi, including the Eclipse Equinox and Apache Felix open source projects, as well as to the wider OSGi community.
Yes.
The approach taken by the edos paper Pascal quoted can be made to work with OSGi. Below I’ll show how to reduce any 3-SAT instance to an OSGi bundle resolution problem. This site doesn’t seem to support mathematical notation, so I’ll use the kind of notation that’s familiar to programmers instead.
Here’s a definition of the 3-SAT problem we’re trying to reduce:
First define A to be a set of propositional atoms and their negations A = {a(1), … ,a(k),na(1), … ,na(k)}. In simpler language, each a(i) is a boolean and we define na(i)=!a(i)
Then 3-SAT instances S have the form: S = C(1) & … & C(n)
where C(i) = L(i,1) | L(i,2) | L(i,3) and each L(i,j) is a member of A
Solving a particular 3-SAT instance involves finding a set of values, true or false for each a(i) in A, such that S evaluates to true.
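In code, evaluating an assignment against an instance S is straightforward (a small illustration with made-up clause data, representing each literal as an (index, is-positive) pair):

```python
# S = C(1) & C(2) with C(1) = a(1) | na(2) | a(3) and C(2) = na(1) | a(2) | a(3)
S = [[(1, True), (2, False), (3, True)],
     [(1, False), (2, True), (3, True)]]

def evaluate(instance, assignment):
    # S evaluates to true iff every clause contains at least one literal
    # made true by the assignment (dict: atom index -> bool)
    return all(any(assignment[i] == positive for i, positive in clause)
               for clause in instance)

print(evaluate(S, {1: True, 2: True, 3: False}))  # True
```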
Now let’s define the bundles we’ll use to construct an equivalent resolution problem. In the following, all bundle and package versions are 0, and import version ranges are unrestricted, except where specified.
The whole expression S will be represented by Bundle BS
Each clause C(i) will be represented by a bundle BC(i)
Each atom a(j) will be represented by a bundle BA(j) version 1
Each negated atom na(j) will be represented by a bundle BA(j) version 2
Now for the constraints, starting with the atoms:
BA(j) version 1
-export package PA(j) version 1
-for each clause C(i) containing atom a(j) export PM(i) and add PA(j) to PM(i)’s uses directive
BA(j) version 2
-export package PA(j) version 2
-for each clause C(i) containing negated atom na(j) export PM(i) and add PA(j) to PM(i)’s uses directive
BC(i)
-export PC(i)
-import PM(i) and add it to the uses directive of PC(i)
-for each atom a(j) in clause C(i) optionally import PA(j) version [1,1] and add PA(j) to the uses directive of the PC(i) export
-for each atom na(j) in clause C(i) optionally import PA(j) version [2,2] and add PA(j) to the uses directive of the PC(i) export
BS
-no exports
-for each clause C(i) import PC(i)
-for each atom a(j) in A import PA(j) [1,2]
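To make the encoding concrete, here is a sketch of the manifests this produces (my own illustration, not from the proof; the clause and names are made up, and real manifests need extra boilerplate such as Bundle-ManifestVersion). For the atom bundle BA(1) version 1, assuming a(1) appears in clause C(1):

```
Bundle-SymbolicName: BA1
Bundle-Version: 1
Export-Package: PA1;version="1",
 PM1;uses:="PA1"
```

And for the clause bundle BC(1), assuming C(1) = (a(1) | na(2) | a(3)):

```
Bundle-SymbolicName: BC1
Export-Package: PC1;uses:="PM1,PA1,PA2,PA3"
Import-Package: PM1,
 PA1;version="[1,1]";resolution:=optional,
 PA2;version="[2,2]";resolution:=optional,
 PA3;version="[1,1]";resolution:=optional
```

BS would then import PC(1) together with PA(1), PA(2), PA(3), each with range [1,2], forcing a single consistent choice of atom versions via the uses constraints.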
A few words of explanation:
The AND relationship between the clauses is implemented by having BS import from each BC(i) a package PC(i) that is only exported by that bundle.
The OR relationship works because BC(i) imports package PM(i), which is only exported by the bundles representing its members, so at least one of them must be present; and because it optionally imports from each bundle representing a member some PA(j) version x, a package unique to that bundle.
The NOT relationship between BA(j) version 1 and BA(j) version 2 is enforced by uses constraints. BS imports each package PA(j) without version constraints, so it must import either PA(j) version 1 or PA(j) version 2 for each j. Also, the uses constraints ensure that any PA(j) imported by a clause bundle BC(i) acts as an implied constraint on the class space of BS, so BS cannot be resolved if both versions of PA(j) appear in its implied constraints. So only one version of BA(j) can be in the solution.
Incidentally, there is a much easier way to implement the NOT relationship - just add the singleton:=true directive to each BA(j). I haven’t done it this way because the singleton directive is rarely used, so this seems like cheating. I’m only mentioning it because in practice, no OSGi framework I know of implements uses based package constraints properly in the face of optional imports, so if you were to actually create bundles using this method then testing them could be a frustrating experience.
Other remarks:
A reduction from 3-SAT that doesn't use optional imports is also possible, although it is longer. It basically involves an extra layer of bundles to simulate the optionality using versions. A reduction from 1-in-3 SAT is equivalent to a reduction from 3-SAT and looks simpler, although I haven't stepped through it.
Apart from proofs that use singleton:=true, all of the proofs I know about depend on the transitivity of uses constraints. Note that both singleton:=true and transitive uses are non-local constraints.
The proof above actually shows that the OSGi resolution problem is NP-Complete or worse. To demonstrate that it’s not worse, we need to show that any solution to the problem can be verified in polynomial time. Most of the things that need to be checked are local, e.g. looking at each non-optional import and checking that it is wired to a compatible export. Verifying these is O(num-local-constraints). Constraints based on singleton:=true need to look at all singleton bundles and check that no two have the same bundle symbolic name; the number of checks is less than num-bundles × num-bundles. The most complicated part is checking that the uses constraints are satisfied. For each bundle this involves walking the uses graph to gather all of the constraints and then checking that none of them conflict with the bundle’s imports. Any reasonable walking algorithm would turn back whenever it encountered a wire or uses relationship it had seen before, so the maximum number of steps in the walk is (num-wires-in-framework + num-uses-in-framework). The maximum cost of checking whether a wire or uses relationship has been walked before is less than the log of this. Once the constrained packages have been gathered, the cost of the consistency check for each bundle is less than num-imports-in-bundle × num-exports-in-framework. Everything here is polynomial or better.
This paper provides a demonstration: http://www.cse.ucsd.edu/~rjhala/papers/opium.html
From memory I thought this paper contained the demonstration; sorry for not checking that before. Here is the other link that I meant to copy, which I'm sure provides a demonstration on page 48: http://www.edos-project.org/bin/download/Main/Deliverables/edos%2Dwp2d1.pdf
I don't understand why the version numbers of the Boost library are incremented only by 0.01 (e.g. 1.33, 1.34, and so on) even though major additions are made, such as entire new libraries. Is there any strong motivation behind this?
It says in the Boost FAQ:
What do the Boost version numbers mean?
The scheme is x.y.z, where x is incremented only for massive changes, such as a reorganization of many libraries; y is incremented whenever a new library is added; and z is incremented for maintenance releases. y and z are reset to 0 if the value to the left changes.