Say we have the following lists:
list 1: x, y, z
list 2: w, x
list 3: u
And we want to merge them such that the order within each individual list is respected. A solution for the above problem might be w, x, y, z, u.
This problem is easy if we have a comparison key (e.g. string comparison; a < z), as this gives us a reference to any element's position relative to other elements in the combined list. But what about the case when we don't have a key? For the above problem, we could restate the problem as follows:
x < y AND y < z AND w < x where x, y, z, w, u are in {0, 1, 2, 3, 4}
The way I'm currently solving this type of problem is to model the problem as a constraint satisfaction problem -- I run the AC3 arc consistency algorithm to eliminate inconsistent values, and then run a recursive backtracking algorithm to make the assignments. This works fine, but it seems like overkill.
Is there a general algorithm or simpler approach to confront this type of problem?
Construct a graph with a node for every letter in your lists.
x y z
w u
Add a directed edge from letter X to letter Y for every pair of consecutive letters in any list.
x -> y -> z
^
|
w u
Topologically sort the graph nodes to obtain a final list that satisfies all your constraints.
If there were ever a cycle in your graph, the topological sorting algorithm would detect that cycle, revealing a contradiction in the constraints induced by your original lists.
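The steps above can be sketched in Python roughly as follows. This is a minimal illustration using Kahn's algorithm for the topological sort; the function name `merge_lists` and the choice to raise `ValueError` on a cycle are assumptions made for this example, not part of the original answer.

```python
from collections import deque


def merge_lists(lists):
    """Merge lists so that the order inside every input list is preserved."""
    successors = {}  # node -> set of nodes that must come after it
    indegree = {}    # node -> number of distinct direct predecessors

    for lst in lists:
        for item in lst:
            successors.setdefault(item, set())
            indegree.setdefault(item, 0)
        # One directed edge per pair of consecutive items.
        for a, b in zip(lst, lst[1:]):
            if b not in successors[a]:
                successors[a].add(b)
                indegree[b] += 1

    # Kahn's algorithm: repeatedly emit a node with no remaining predecessors.
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("the lists impose contradictory orderings (cycle)")
    return order


print(merge_lists([["x", "y", "z"], ["w", "x"], ["u"]]))
```

Any order it returns keeps w before x, x before y, and y before z; the position of u is unconstrained.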
I would like to run Edmonds' matching algorithm (the blossom algorithm) on a graph (example graph in picture), but how do I start with an empty matching set?
The algorithm works this way:
Given: graph G and matching M in G
Task: find a matching M' with |M'| = |M| + 1, or |M'| = |M| if M is maximum
1 let F be the forest consisting of all M-exposed nodes;
2 while there is an outer node x and an edge {x, y} with y \notin V(F), add {x, y} and the matching edge covering y to F;
3 if there are adjacent outer nodes x, y in the same tree, then shrink the cycle (M-blossom) in F \cup {x, y} and go to Step 2;
4 if there are adjacent outer nodes x, y in different trees, then augment M along the M-augmenting path P(x) \cup {x, y} \cup P(y);
5 in reverse order, undo each shrinking and re-establish near-perfect matchings in blossoms.
You don't begin the algorithm with an empty M. You have to provide one, generally by generating it with a greedy algorithm that scans every edge e of the graph G and adds e to M if M + e still forms a matching.
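The greedy construction mentioned above might look like this in Python. It is only a sketch: the name `greedy_matching` and the representation of the graph as a list of `(u, v)` pairs are assumptions for illustration. The result is a maximal matching (no further edge can be added), not necessarily a maximum one, which is why it is only a starting point for the augmenting steps.

```python
def greedy_matching(edges):
    """Scan the edges once, keeping each edge whose endpoints are both still free."""
    matched = set()  # vertices already covered by the matching
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path 1 - 2 - 3 - 4: the result depends on the order the edges are scanned in.
print(greedy_matching([(1, 2), (2, 3), (3, 4)]))  # two edges
print(greedy_matching([(2, 3), (1, 2), (3, 4)]))  # one edge; an augmenting path remains
```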
There are two sets, s1 and s2, each containing pairs of letters. A pair is only equivalent to another pair if their letters are in the same order, so they're essentially strings (of length 2). The sets s1 and s2 are disjoint, neither set is empty, and each pair of letters only appears once.
Here is an example of what the two sets might look like:
s1 = { ax, bx, cy, dy }
s2 = { ay, by, cx, dx }
The set of all letters in (s1 ∪ s2) is called sl. The set sr is a set of letters of your choice, but must be a subset of sl. Your goal is to define a mapping m from letters in sl to letters in sr, which, when applied to s1 and s2, will generate the sets s1' and s2', which also contain pairs of letters and must also be disjoint.
The most obvious m just maps each letter to itself. In this example (shown below), s1 is equivalent to s1', and s2 is equivalent to s2' (but given any other m, that would not be the case).
a -> a
b -> b
c -> c
d -> d
x -> x
y -> y
The goal is to construct m such that sr (the set of letters on the right-hand side of the mapping) has the fewest number of letters possible. To accomplish this, you can map multiple letters in sl to the same letter in sr. Note that depending on s1 and s2, and depending on m, you could potentially break the rule that s1' and s2' must be disjoint. For example, you would obviously break that rule by mapping every letter in sl to a single letter in sr.
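To make the rule concrete, here is a small Python sketch that applies a mapping to both sets and checks whether the results are still disjoint. The helper names `apply_mapping` and `is_valid` are made up for this illustration, and pairs are assumed to be two-character strings as in the example.

```python
def apply_mapping(pairs, m):
    """Map every letter of every pair through m; the result is again a set of pairs."""
    return {m[a] + m[b] for a, b in pairs}


def is_valid(s1, s2, m):
    """A mapping is valid when the mapped sets s1' and s2' share no pair."""
    return apply_mapping(s1, m).isdisjoint(apply_mapping(s2, m))


s1 = {"ax", "bx", "cy", "dy"}
s2 = {"ay", "by", "cx", "dx"}

identity = {ch: ch for ch in "abcdxy"}
everything_to_a = {ch: "a" for ch in "abcdxy"}
merged = {"a": "a", "b": "a", "c": "c", "d": "c", "x": "x", "y": "y"}

print(is_valid(s1, s2, identity))         # True
print(is_valid(s1, s2, everything_to_a))  # False: both sets collapse to {"aa"}
print(is_valid(s1, s2, merged))           # True, using only four letters
```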
So, given s1 and s2, how can someone construct an m that minimizes sr, while ensuring that s1' and s2' are disjoint?
Here is a simplified visualization of the problem:
This problem is NP-hard. To show this, consider reducing graph coloring to this problem.
Proof:
Let G=(V,E) be the graph for which we want to solve the minimal graph coloring problem. Formally, we want to compute the chromatic number of the graph, which is the lowest k for which G is k-colourable.
To reduce the graph coloring problem to the problem described here, define
s1 = { zu : (u,v) \in E }
s2 = { zv : (u,v) \in E }
where z is a magic value unused other than in constructing s1 & s2.
By construction of the sets above, for any mapping m and any edge (u,v) we must have m(u) != m(v), otherwise the disjointedness of s1' and s2' would be violated. Thus, any optimal sr is the set of optimal colors (with the exception of z) to color the graph G and m is the mapping that defines which node is assigned which color. QED.
The proof above may give the intuition that researching graph coloring approximations would be a good start, and indeed it probably would, but there is a confounding factor involved. This confounding factor is that for two elements ab \in s1 and cd \in s2, if m(a) = m(c) then m(b) != m(d). Logically, this is equivalent to the statement m(a) != m(c) or m(b) != m(d). These types of constraints, in isolation, do not map naturally to an analogous graph problem (because of the or statement.)
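For very small inputs, the pairwise constraint m(a) != m(c) or m(b) != m(d) can simply be checked exhaustively. The following Python sketch does that; it is exponential in the number of letters and is only meant to illustrate the constraint, not to be a practical solver. The name `min_mapping` is an assumption for this example. Restricting the targets to the first k letters loses nothing, because renaming the letters of sr does not affect disjointness.

```python
from itertools import product


def min_mapping(s1, s2):
    """Return a mapping that uses as few target letters as possible (brute force)."""
    letters = sorted({ch for pair in s1 | s2 for ch in pair})
    for k in range(1, len(letters) + 1):
        targets = letters[:k]  # try mappings into at most k letters
        for images in product(targets, repeat=len(letters)):
            m = dict(zip(letters, images))
            # Every pair ab in s1 and cd in s2 must differ in at least one position.
            if all(m[a] != m[c] or m[b] != m[d] for a, b in s1 for c, d in s2):
                return m
    return None


s1 = {"ax", "bx", "cy", "dy"}
s2 = {"ay", "by", "cx", "dx"}
best = min_mapping(s1, s2)
print(best, len(set(best.values())))
```

For the example sets from the question, two target letters turn out to be enough.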
There are ways to formulate this problem as a (binary) ILP and solve it as such. This would likely give you (slightly) inferior results to a custom designed & tuned branch-and-bound implementation (assuming you want to find the optimal solution) but would work with turn-key solvers.
If you are more interested in approximations (possibly with guaranteed ratios of optimality) I would investigate an SDP relaxation of your problem and an appropriate rounding scheme. This level of work would likely be the kind one would invest in a small-to-medium sized research paper.
Consider the 3 colorful sets problem. We are given m sets of size three over the elements {1, ..., n}. In other words, we are given sets S1, S2, ..., Sm where, for every i, Si = {x, y, z} for some x, y, z ∈ {1, ..., n}. I want to pick a set of elements E ⊆ {1, ..., n} that maximizes the number of sets containing exactly one element of E, namely, that maximizes |{i : |Si ∩ E| = 1}|. A polynomial-time 3-approximation algorithm would be an acceptable solution.
I am thinking of either a randomized algorithm that guarantees the approximation ratio or a deterministic one. I have some ideas but I am not sure how to actually implement them. Any help would be appreciated.
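One possible randomized sketch, offered as an assumption rather than as the intended answer: put each element into E independently with probability 1/3. If every Si has three distinct elements, a given set contains exactly one chosen element with probability 3 · (1/3) · (2/3)^2 = 4/9, so the expected number of such sets is 4m/9, which is at least a third of any optimum (the optimum is at most m). The function names below are hypothetical, and repeating the draw only improves the odds of a good sample; it does not turn the expectation into a worst-case guarantee.

```python
import random


def count_exactly_one(sets, chosen):
    """Number of sets that contain exactly one element of `chosen`."""
    return sum(1 for s in sets if len(set(s) & chosen) == 1)


def random_pick(n, p=1 / 3, rng=random):
    """Include each element of {1, ..., n} independently with probability p."""
    return {e for e in range(1, n + 1) if rng.random() < p}


def best_of_trials(sets, n, trials=100, seed=0):
    """Repeat the random pick and keep the best sample seen."""
    rng = random.Random(seed)
    best, best_count = set(), 0
    for _ in range(trials):
        chosen = random_pick(n, rng=rng)
        count = count_exactly_one(sets, chosen)
        if count > best_count:
            best, best_count = chosen, count
    return best, best_count


sets = [(1, 2, 3), (3, 4, 5), (1, 4, 6)]
print(best_of_trials(sets, n=6))
```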
How would you output all the possible topological sorts for a directed acyclic graph? For example, given a graph where V points to W and X, W points to Y and Z, and X points to Z:
V --> W --> Y
W --> Z
V --> X --> Z
How do you topologically sort this graph to produce all possible results? I was able to use a breadth-first search to get V, W, X, Y, Z and a depth-first search to get V, W, Y, Z, X, but I wasn't able to output any other sorts.
An algorithm for generating all topological sorts for a given DAG (aka generating all linear extensions of a partial order) is given in the paper "Generating Linear Extensions Fast" by Pruesse and Ruskey. The algorithm has an amortized running time that is linear in the output (i.e., if it outputs M topological sorts, it runs in time O(M)).
Note that in general you can't really have anything that has a runtime that's efficient with respect to the size of the input since the size of the output can be exponentially larger than the input. For example, a completely disconnected DAG of N nodes has N! possible topological sorts.
It might be possible to count the number of orderings faster, but the only way to actually generate all orderings that I can think of is with a full brute-force recursion. (I say "brute force", but this is still much better than the brutest-possible brute force approach of testing every possible permutation :) )
Basically, at every step there is a set S of vertices remaining (i.e. which have not been added to the order yet), and a subset X of these can be safely added in the next step. This subset X is exactly the set of vertices that have no in-edges from vertices in S.
For a given partial solution L consisting of some number of vertices that are already in the order, the set S of remaining vertices, and the set X of vertices in S that have no in-edges from other vertices in S, the call Generate(L, X, S) will generate all valid topological orders beginning with L.
Generate(L, X, S):
    If X is empty:
        Either L is already a complete solution, in which case it contains all n vertices and S is also empty, or the original graph contains a cycle.
        If S is empty:
            Output L as a solution.
        Otherwise:
            Report that a cycle exists. (In fact, every vertex remaining in S lies on a cycle or is reachable from one.)
    Otherwise:
        For each x in X:
            Let L' be L with x added to the end.
            Let X' be X\{x} plus any vertices whose only in-edge among vertices in S came from x.
            Let S' = S\{x}.
            Generate(L', X', S')
To kick things off, find the set X of all vertices having no in-edges and call Generate((), X, V). Because every x chosen in the "For each" loop is different, every partial solution L' generated by the iterations of this loop must also be distinct, so no solution is generated more than once by any call to Generate(), including the top-level call.
In practice, forming X' can be done more efficiently than the above pseudocode suggests: When we choose x, we can delete all out-edges from x, but also add them to a temporary list of edges, and by tracking the total number of in-edges for each vertex (e.g. in an array indexed by vertex number) we can efficiently detect which vertices now have 0 in-edges and should thus be added to X'. Then at the end of the loop iteration, all the edges that we deleted can be restored from the temporary list.
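A Python sketch of Generate() using the in-degree bookkeeping just described. Instead of physically deleting and restoring edges, it decrements and then re-increments in-edge counters, which has the same effect. The function name `all_topological_sorts` and the edge-list input are assumptions for this example.

```python
def all_topological_sorts(vertices, edges):
    """Yield every topological order of the graph as a list."""
    successors = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    order = []  # the partial solution L

    def generate(available):  # `available` plays the role of X
        if not available:
            if len(order) == len(vertices):
                yield list(order)
                return
            raise ValueError("graph contains a cycle")
        for x in available:
            order.append(x)
            newly_free = []
            for w in successors[x]:  # "delete" the out-edges of x
                indegree[w] -= 1
                if indegree[w] == 0:
                    newly_free.append(w)
            yield from generate([v for v in available if v != x] + newly_free)
            for w in successors[x]:  # restore the out-edges of x
                indegree[w] += 1
            order.pop()

    yield from generate([v for v in vertices if indegree[v] == 0])


edges = [("V", "W"), ("V", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")]
for result in all_topological_sorts("VWXYZ", edges):
    print(", ".join(result))
```

For the graph in the question this produces five orders, including V, W, Y, X, Z.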
So this approach is flawed! I'm unsure whether it can be salvaged, so I'll leave it up for a little while. If anyone can spot how to fix it, either grab what you can and post a new answer or edit mine.
Specifically, I used the below algorithm on the example from the comment and it will not output the example given, so it is clearly flawed.
The way I've learned to do a topological sort is the following:
Create a list of all the elements with no arrows pointing into it
Create a dictionary of element -> number, where element here is any element in the original collection that has an arrow into it, and the number is how many elements point to it.
Create a dictionary of element -> list, where element here is any element in the original collection that has an arrow out of it, and the list is all the elements those arrows point to
In your example, the two dictionaries and the list would be like this:
D1      D2          List
W: 1    V: W, X     V
Y: 1    W: Y, Z
Z: 2    X: Z
X: 1
Then, start a loop where on each iteration you do the following:
Output all elements of the list, these currently have no arrows pointing into them. Make a temporary copy of the list, and clear the list, preparing it for the following iteration
Loop through the temporary copy, and find each element (if it exists) in the dictionary that is element -> list
For each element in those lists, decrement the corresponding number in the element -> number dictionary by 1 (removing 1 arrow). Once a number for an element here reaches 0, add that element to the list (it has no arrows left)
If the list is non-empty, redo the iteration loop
If you reach this point, and the dictionary with element -> number still has any elements left in it with a number above 0 (if you want to, you can remove the elements as you go in the above iteration once their numbers reach zero to make this part easier), then you have a cycle, since the above loop should not terminate until all arrows have been removed.
For your example, each iteration would output the following:
V
W, X (the 2nd iteration outputs both W and X)
Y, Z
If you want to know how I arrived at this solution, simply go through my iteration description step by step using the above dictionaries and list as the starting point.
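The iteration described above can be sketched in Python as follows. The function name `topological_layers` and the edge-list input are assumptions for illustration; the two dictionaries correspond to D1 (element -> number) and D2 (element -> list).

```python
def topological_layers(vertices, edges):
    """Return the elements output by each iteration, as a list of lists."""
    remaining = {v: 0 for v in vertices}  # D1: element -> number of incoming arrows
    targets = {v: [] for v in vertices}   # D2: element -> elements it points to
    for u, v in edges:
        targets[u].append(v)
        remaining[v] += 1

    current = [v for v in vertices if remaining[v] == 0]  # the list
    layers = []
    seen = 0
    while current:
        layers.append(current)
        seen += len(current)
        following = []
        for u in current:
            for v in targets[u]:
                remaining[v] -= 1  # remove one arrow
                if remaining[v] == 0:
                    following.append(v)
        current = following

    if seen != len(remaining):
        raise ValueError("cycle detected: some arrows were never removed")
    return layers


edges = [("V", "W"), ("V", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")]
print(topological_layers("VWXYZ", edges))
```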
Now, to specifically answer your question, how to output all combinations. The only places where "combinations" comes into play is per iteration. Basically, all the elements that you output in the first step of the iteration (the ones you made a temporary copy of) are considered "equivalent" and any internal ordering between these would have no impact on the topological sort.
So, do this:
In the first point in the iteration, place those elements into a list, and add that to another list, giving you a list of lists
This list of lists will now contain each iteration as one element, and each element will be yet another list with the elements output in that iteration
Now, combine all permutations of the first list with all the permutations of the second list with all the permutations of the third list, and so on
This means taking this output:
V
W, X
Y, Z
This gives you 1 * 2 * 2 = 4 permutations in total: you combine all permutations of the 1st iteration (there is 1) with all permutations of the 2nd iteration (there are 2: W, X and X, W) and all permutations of the 3rd iteration (there are 2: Y, Z and Z, Y).
The final list of permutations that are valid topological sorts would be this:
V, W, X, Y, Z
V, X, W, Y, Z
V, W, X, Z, Y
V, X, W, Z, Y
Here is the example from the comment:
A and B with no in-edges. Both A and B have an edge to C, but only A has an edge to D. Neither C nor D has any out-edges.
Which gives:
A --> C
A --> D
B --> C
Dictionaries and list:
D1      D2          List
C: 2    A: C, D     A
D: 1    B: C        B
Iterations would output:
A, B
D, C
All permutations (2 * 2 = 4):
A, B, D, C
A, B, C, D
B, A, D, C
B, A, C, D
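To see concretely which ordering the per-iteration permutation approach misses, the following Python sketch compares it with a brute-force check of every permutation on the example from the comment. All helper names are hypothetical, and the brute-force check is only practical for tiny graphs.

```python
from itertools import permutations, product


def layers(vertices, edges):
    """Group the vertices by the iteration in which they lose their last in-edge."""
    indegree = {v: sum(1 for _, t in edges if t == v) for v in vertices}
    current = [v for v in vertices if indegree[v] == 0]
    result = []
    while current:
        result.append(current)
        following = []
        for u in current:
            for s, t in edges:
                if s == u:
                    indegree[t] -= 1
                    if indegree[t] == 0:
                        following.append(t)
        current = following
    return result


def layer_product_orders(vertices, edges):
    """Combine all permutations of each layer, as the answer above proposes."""
    per_layer = [permutations(layer) for layer in layers(vertices, edges)]
    return [[v for part in combo for v in part] for combo in product(*per_layer)]


def brute_force_orders(vertices, edges):
    """Keep every permutation in which each edge points forward."""
    valid = []
    for perm in permutations(vertices):
        position = {v: i for i, v in enumerate(perm)}
        if all(position[u] < position[v] for u, v in edges):
            valid.append(list(perm))
    return valid


edges = [("A", "C"), ("A", "D"), ("B", "C")]
layered = layer_product_orders("ABCD", edges)
everything = brute_force_orders("ABCD", edges)
missing = [order for order in everything if order not in layered]
print(len(layered), len(everything), missing)
```

The layered approach finds four orderings while five are valid; the missing one is A, D, B, C, where D is output before B even though the two end up in different iterations.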
We have three sets S1, S2, S3. I need to find x,y,z such that
x ∈ S1
y ∈ S2
z ∈ S3
Let min denote the minimum value out of x, y, z.
Let max denote the maximum value out of x, y, z.
The range max - min should be as small as possible.
Of course, the full brute-force solution described by IVlad is simple and therefore easier and faster to write, but its complexity is O(n^3).
Since the question is tagged algorithm, I would like to post a more complex algorithm that has O(n^2) worst-case and O(n log n) average complexity (I'm almost sure about this, but I'm too lazy to write a proof).
Algorithm description
Consider some abstract (X, Y, Z) tuple. We want to find a tuple that has a minimal distance between its maximum and minimum element. What we can say at this point is that the distance is determined by the maximum element and the minimum element alone. Therefore, the value of the element between them doesn't matter as long as it lies between the maximum and the minimum.
So, here is the approach. We allocate an additional set (let's call it S) and combine the initial sets (X, Y, Z) into it. We also need to be able to look up the initial set of every element in S (so, if we point to some element in S, let's say S[10], and ask "Where did this guy come from?", our application should answer something like "He comes from Y").
After that, let's sort our new set S by its keys (this would be O(n log n), or O(n) in certain cases).
Determining the minimal distance
Now the interesting part comes. What we want to do is to compute some artificial value; let's call it the minimal distance and denote it d[x], where x is some element from S. This value is the minimal max - min distance that can be achieved using the current element together with elements that precede / follow it in the sequence.
Consider the following example. This is our S set (the first line shows indexes, the second shows values, and the letters X, Y and Z refer to the initial sets):
0  1  2  3  4  5   6   7
--------------------------
1  2  4  5  8  10  11  12
Y  Z  Y  X  Y  Y   X   Z
Let's say we want to compute the minimal distance for the element with index 4. In fact, that minimal distance corresponds to the best (x, y, z) tuple that can be built using the selected element.
In our case (S[4]), we can say that our (x, y, z) tuple would definitely look like (something, 8, something), because it must contain the element we're computing the distance for (pretty obvious, hehe).
Now, we have to fill the gaps. We know that the elements we're looking for should come from X and Z. And we want those elements to be the best in terms of max - min distance. There is an easy way to select them.
We make a bidirectional run (run left, then run right from the current element) looking for the first elements not from Y. In this case we would look for the nearest elements from X and Z in each of the two directions (4 elements total).
This search is exactly what we need: if we select the first element from X while running (left / right, doesn't matter), that element would suit us better, in terms of distance, than any other element of X that follows it in that direction. This happens because our S set is sorted.
In case of my example (counting the distance for element with index number 4), we would mark elements with indexes 6 and 7 as suitable from the right side and elements with indexes 1 and 3 from the left side.
Now, we have to test the 4 cases that can happen and take the one with the minimal distance. In our particular case we have the following (elements returned by the previous routine):
Z  X  Y  X   Z
2  5  8  11  12
We should test every (X, Y, Z) tuple that can be built using these elements, take the tuple with the minimal distance and save that distance for our element. In this example, the (11, 8, 12) tuple has the best distance, 4. So, we store d[4] = 4 (4 here is the element index).
Yielding the result
Now that we know how to find the distance, let's do it for every element in our S set (this operation would take O(n^2) in the worst case and less time, something like O(n log n), on average).
After we have that distance value for every element in our set, just select the element with minimal distance and run our distance counting algorithm (which is described above) for it once again, but now save the (-, -, -) tuple. It would be the answer.
Pseudocode
Here comes the pseudocode. I tried to make it easy to read, but its implementation would be more complex, because you'll need to code set lookups ("determine set for element"). Also note that the determine tuple and determine distance routines are basically the same, but the former yields the actual tuple.
COMBINE (X, Y, Z) -> S
SORT(S)
FOREACH (v in S)
    DETERMINE_DISTANCE(v, S) -> d[v]
DETERMINE_TUPLE(MIN(d[v]))
P.S
I'm pretty sure that this method could be easily used for (-, -, -, ... -) tuple seeking, still resulting in good algorithmic complexity.
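A Python sketch of the approach for three sets. One deliberate deviation, stated as an assumption: instead of running left and right from every element, it precomputes the nearest element of each set to the left and to the right in two linear passes, so the work after sorting is linear. The function name `min_range_triple` is hypothetical.

```python
from itertools import product


def min_range_triple(xs, ys, zs):
    """Return (range, (x, y, z)) with minimal max - min, or None if a set is empty."""
    # Combine the sets into one sorted list of (value, origin) pairs.
    merged = sorted((v, origin) for origin, values in enumerate((xs, ys, zs))
                    for v in values)
    n = len(merged)

    # left[i][k]: nearest value from set k at or before index i (None if absent).
    left, last = [None] * n, [None, None, None]
    for i, (v, origin) in enumerate(merged):
        last[origin] = v
        left[i] = tuple(last)

    # right[i][k]: nearest value from set k at or after index i.
    right, upcoming = [None] * n, [None, None, None]
    for i in range(n - 1, -1, -1):
        v, origin = merged[i]
        upcoming[origin] = v
        right[i] = tuple(upcoming)

    best = None
    for i, (v, origin) in enumerate(merged):
        others = [k for k in range(3) if k != origin]
        # Up to two candidates (left and right neighbour) per other set: 4 cases.
        candidates = [[c for c in (left[i][k], right[i][k]) if c is not None]
                      for k in others]
        for a, b in product(*candidates):
            triple = [None, None, None]
            triple[origin], triple[others[0]], triple[others[1]] = v, a, b
            spread = max(triple) - min(triple)
            if best is None or spread < best[0]:
                best = (spread, tuple(triple))
    return best


# The example from the text: X = {5, 11}, Y = {1, 4, 8, 10}, Z = {2, 12}.
print(min_range_triple([5, 11], [1, 4, 8, 10], [2, 12]))
```

On the example data the best tuple overall is (11, 10, 12) with a range of 2, which is better than the value 4 computed above for the single element at index 4.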
best = infinity (a really large number in practice, like 1000000000)
solution = (-, -, -)
for each x ∈ S1
    for each y ∈ S2
        for each z ∈ S3
            t = max(x, y, z) - min(x, y, z)
            if t < best
                best = t
                solution = (x, y, z)