reconstruct a sequence from a set of partial orders - algorithm

I have a set of element pairs. Each of these pairs means: in the final sequence, the first element precedes the second element.
The set contains enough pairs to reconstruct a unique sequence.
E.g.:
If my set of pairs is {(A, B), (A, C), (C, B)},
i.e. A precedes B, A precedes C, and C precedes B,
my final sequence is ACB.
Now, I need an algorithm to reconstruct sequences from pair sets of this kind.
Efficiency is critical. Any smart tip is welcome!

Create a directed graph from those pairs, then perform a topological sort.
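A minimal sketch of that in Python (assuming the pairs arrive as tuples; this is Kahn's algorithm and runs in O(V + E)):

    from collections import defaultdict, deque

    def reconstruct(pairs):
        """Rebuild the sequence from precedence pairs via Kahn's algorithm."""
        succ = defaultdict(list)   # successors of each element
        indeg = defaultdict(int)   # number of unmet predecessors
        nodes = set()
        for a, b in pairs:
            succ[a].append(b)
            indeg[b] += 1
            nodes.update((a, b))
        queue = deque(n for n in nodes if indeg[n] == 0)
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for m in succ[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    queue.append(m)
        if len(order) != len(nodes):
            raise ValueError("pairs contain a cycle")
        return order

    print("".join(reconstruct([("A", "B"), ("A", "C"), ("C", "B")])))  # ACB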

This is the problem of topological sorting of a directed graph.

Related

Sort algorithm for "weak order"?

I have unordered elements in a vector. There's no transitivity; if element A > B and B > C, A > C doesn't need to be true.
I need to sort them so that an element is greater than its following one.
For example, if we have three elements A, B and C, and:
A<B, A>C
B<C, B>A
C<A, C>B
and the vector is <A,B,C>, we would need to sort it as <A,C,B>.
I've done the sorting with bubble sort and other classic sorting algorithms that require O(n^2) time, but that doesn't look efficient.
Is there a more efficient algorithm?
Thanks.
Consider your data as a graph, where the elements of your array A, B, C are vertices, and a directed edge from vertex x to vertex y represents the comparison x > y.
The requirement to order the elements such that each adjacent pair x, y satisfies x>y is, in the graph view of your problem, the same as finding a Hamiltonian path through the vertices.
There are no apparent restrictions on your > relation (for example, it's not transitive, and it's fine for it to contain cycles), so the graph you get is an arbitrary graph. You're therefore left with the problem of finding a Hamiltonian path in an arbitrary graph, which is an NP-complete problem. That means you're not going to find an easy, efficient solution.
What you are seeking is called topological sorting.
I ended up using quicksort. I choose a pivot element and partition the elements into two halves: elements less than the pivot and elements greater than the pivot. Quicksort is then executed recursively on those two halves. That way I get O(n log n) time complexity on average.
Thanks for the comments!
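For reference, a minimal sketch of that quicksort-style partition, assuming every pair of elements is comparable (a tournament) and that the comparison is exposed as a hypothetical function greater(x, y):

    def weak_sort(items, greater):
        """Arrange items so each element is greater than the one after it.

        Works when every pair is comparable (a tournament), even if the
        relation is not transitive: every tournament has a Hamiltonian path,
        and this quicksort-style partition finds one in O(n log n) on average.
        """
        if len(items) <= 1:
            return items
        pivot, rest = items[0], items[1:]
        above = [x for x in rest if greater(x, pivot)]
        below = [x for x in rest if not greater(x, pivot)]
        return weak_sort(above, greater) + [pivot] + weak_sort(below, greater)

    # Relation from the question: B>A, A>C, C>B.
    beats = {("B", "A"), ("A", "C"), ("C", "B")}
    print(weak_sort(["A", "B", "C"], lambda x, y: (x, y) in beats))
    # ['B', 'A', 'C'] -- valid, since B>A and A>C; the question's
    # <A,C,B> is another valid answer (the relation has cycles).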

Topological sort with an objective function

I have a DAG with N nodes, i.e., 1, 2, ..., N, and each node has a weight (we can call it time) x_1, x_2, ..., x_N. I want to do a topological sort, but the difficulty is that I have an objective function when sorting. My objective function is to minimize the total time between several pairs of nodes.
For example, I have a DAG with 7 nodes, and I want a specific topological sort such that (1,3) + (2,4) is minimized, where (A,B) denotes the total time of the nodes placed between A and B in the sorted order. For instance, in the sort [1, 6, 3, 2, 5, 4, 7], (1,3) = x_6 and (2,4) = x_5. Based on the DAG, I want to find a sort that minimizes (1,3) + (2,4).
I have been thinking about this problem for a while. Generating all possible topological sorts (reference link) and evaluating the objective function for each is always a possible solution, but it takes too much time if N is large. I was also advised to use branch-and-bound pruning when generating the sorts (I am not very familiar with branch-and-bound, but I don't think it will dramatically reduce the complexity).
Any (optimal or heuristic) algorithm for this kind of problem? It would be perfect if the algorithm can also be applied to other objective functions such as minimizing the total starting time for some nodes. Any suggestion is appreciated.
PS: Or alternatively, is it possible to formulate this problem as a linear integer optimization problem?
One way to solve this is as follows:
First we run the all-pairs shortest path algorithm Floyd-Warshall. This algorithm can be coded in essentially 5 lines and runs in O(V^3) time. It generates the shortest paths between every pair of vertices in the graph, i.e., it produces a V x V matrix of shortest paths as its output.
It's trivial to modify this algorithm so that we also get the count of vertices included in each of the O(N^2) paths. So now we can eliminate all paths that have fewer than N vertices. We order the remaining paths by cost and test each one to see whether the topological sort property is violated. If it is not violated, we have found our desired result.
The last step above, i.e., testing the topological sort, can be performed in O(V+E) for each of the O(V^2) paths. This yields a worst-case runtime of O(V^4). In practice it should be fast, however, because Floyd-Warshall can be made very cache friendly and we would only be testing a small fraction of the O(N^2) paths. Also, if your DAG is not dense, you might be able to optimize the topological testing as well with appropriate data structures.
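A hedged sketch of the Floyd-Warshall step, extended to count the vertices on each shortest path (the function name and edge format here are illustrative, not from the answer):

    import math

    def floyd_warshall_with_counts(n, edges):
        """dist[i][j] = cost of the shortest i->j path;
        cnt[i][j] = number of vertices on that path."""
        dist = [[math.inf] * n for _ in range(n)]
        cnt = [[0] * n for _ in range(n)]
        for i in range(n):
            dist[i][i], cnt[i][i] = 0, 1
        for u, v, w in edges:               # edges as (u, v, weight) triples
            dist[u][v], cnt[u][v] = w, 2
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
                        cnt[i][j] = cnt[i][k] + cnt[k][j] - 1  # k counted twice
        return dist, cnt

The paths with cnt equal to N are the Hamiltonian candidates the answer proposes to test, cheapest first.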
Here's an idea:
For simplicity, first suppose that you have a single pair to optimize (I'll comment on the general case later), and suppose you already have your graph topologically sorted into an array.
Take the array segment starting at the lower (in terms of your topological order) node of the pair, say l, and ending at the higher node, say h. For every single node sorted between l and h, calculate whether it is bounded from below by l and/or bounded from above by h. You can calculate the former property by marking nodes in an "upward" BFS from l, cutting at nodes sorted above h; and similarly, the latter by marking in a "downward" BFS from h, cutting at nodes sorted below l. The complexity of either pass will be O( B*L ), where B is the branching factor, and L is the number of nodes originally sorted between l and h.
Now, all nodes that are not bounded from above by h can be moved above h, and all nodes that are not bounded from below by l can be moved below l (the two sets may overlap), all without violating the topological sorting of the array, provided that the original sorted order within each group of nodes relocated upward or downward is preserved.
This same procedure can be applied to as many pairs as needed, provided that the segments that they cut from the original sorting order do not overlap.
If any two pairs overlap, say (l1, h1) and (l2, h2), so that e.g. l1 < l2 < h1 < h2 in the original sorted order, you have the following two cases:
1) In the trivial case where h1 and l2 happen to be unrelated in the topological order, then you should be able to optimize the two pairs mostly independently from each other, only applying some care to move either l2 above h1 or h1 below l2 (but not e.g. h1 below l1, if that should turn out to be possible.)
2) If l2 < h1 in the topological order, you can treat both pairs as the single pair (l1, h2), and then possibly apply the procedure once more to (l2, h1).
As it's not clear what the complete process will achieve in the nontrivial overlapping case, especially if you have more complicated overlapping patterns, it may turn out to be better to treat all pairs uniformly, regardless of overlap. In this case, the order can be processed repeatedly as long as each run yields an improvement over the previous one (I'm not sure whether the procedure is monotonic in terms of the objective function though -- probably not).
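To make the two marking passes concrete, here is a sketch for a single pair, assuming adjacency lists succ and pred, the current order as a list, and a pos map giving each node's index in it (all names hypothetical):

    from collections import deque

    def movable_sets(succ, pred, order, pos, l, h):
        """Find nodes between l and h that can be moved above h or below l."""
        def mark(start, adj, outside):
            # BFS from start, cutting at nodes sorted outside the segment.
            seen, q = {start}, deque([start])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in seen and not outside(v):
                        seen.add(v)
                        q.append(v)
            return seen

        bounded_below = mark(l, succ, lambda v: pos[v] > pos[h])  # reachable from l
        bounded_above = mark(h, pred, lambda v: pos[v] < pos[l])  # can reach h
        segment = order[pos[l] + 1 : pos[h]]
        move_up = [v for v in segment if v not in bounded_above]    # may go above h
        move_down = [v for v in segment if v not in bounded_below]  # may go below l
        return move_up, move_down

Relocating move_up above h and move_down below l, preserving each group's internal order, keeps the array topologically sorted as described above.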

Sorting a list knowing the results of how some of the elements compare?

The goal is to sort a list X of n unknown variables {x0, x1, x2, ... x(n-1)} using a list C of m comparison results (booleans). Each comparison is between two of the n variables, e.g. x2 < x5, and the pair indices for each of the comparisons are fixed and given ahead of time. Also given: All pairs in C are unique (even when flipped, e.g. the pair x0, x1 means there is no pair x1, x0), and never compare a variable against itself. That means C has at most n*(n-1)/2 entries.
So the question is: can I prove that my list C of m comparisons is sufficient to sort the list X? Obviously it would be if C had the largest possible length (contained all possible comparisons). But what about shorter lists?
Then, if it has been proven that C contains enough information to sort, how do I actually go about performing the sort?
Let's imagine that you have the collection of objects to be sorted and form a graph from them with one node per object. You're then given a list of pairs indicating how the comparisons go. You can think of these as edges in the graph: if you know that object x compares less than object y, then you can draw an edge from x to y.
Assuming that the results of the comparisons are consistent - that is, you don't have any cycles - you should have a directed acyclic graph.
Think about what happens if you topologically sort this DAG. What you'll end up with is one possible ordering of the values that's consistent with all of the constraints. The reason for this is that in a topological ordering, you won't place an element x before an element y if there is any transitive series of edges leading from y to x, and there's a transitive series of edges leading from y to x if there's a chain of comparisons that transitively indicates that y precedes x.
You can actually make a stronger claim: the set of all topological orderings of the DAG is exactly the set of all possible orderings that satisfy all the constraints. We've already argued that every topological ordering satisfies all the constraints, so all we need to do now is argue that every sequence satisfying all the constraints is a valid topological ordering. The argument here is essentially that if you obey all the constraints, you never place any element in the sequence before something that it transitively compares less than, so you never place any element in the sequence before something that has a path to it.
This then gives us a nice way to solve the problem: take the graph formed this way and see if it has exactly one topological ordering. If so, then that ordering is the unique sorted order. If not, then there are two or more orderings.
So how best to go about this? Well, one of the standard algorithms for doing a topological sort is to annotate each node with its indegree, then repeatedly pull off a node of indegree zero and adjust the indegrees of its successors. The DAG has exactly one topological ordering if in the course of performing this algorithm, at every stage there is exactly one node of indegree zero, since in that case the topological ordering is forced.
With the right setup and data structures, you can implement this to run in time O(n + m), where n is the number of nodes and m is the number of constraints. I'll leave those details as a proverbial exercise to the reader. :-)
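Here is one way those details might look, as a hedged sketch assuming the constraints arrive as (smaller, larger) index pairs:

    from collections import defaultdict, deque

    def unique_sorted_order(n, constraints):
        """Return the unique order satisfying all constraints, or None if
        the constraints are ambiguous or cyclic."""
        succ = defaultdict(list)
        indeg = [0] * n
        for a, b in constraints:    # a compares less than b
            succ[a].append(b)
            indeg[b] += 1
        queue = deque(i for i in range(n) if indeg[i] == 0)
        order = []
        while queue:
            if len(queue) > 1:
                return None         # two candidates: the order is not forced
            u = queue.popleft()
            order.append(u)
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order if len(order) == n else None  # None also on a cycle

This is O(n + m): each constraint is touched once, and each node enters the queue once.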
Your problem can be reduced to the well-known Topological sort.
Proving that "C contains enough information to sort" amounts to proving the uniqueness of the topological order:
If a topological sort has the property that all pairs of consecutive vertices in the sorted order are connected by edges, then these edges form a directed Hamiltonian path in the DAG. If a Hamiltonian path exists, the topological sort order is unique; no other order respects the edges of the path. Conversely, if a topological sort does not form a Hamiltonian path, the DAG will have two or more valid topological orderings, for in this case it is always possible to form a second valid ordering by swapping two consecutive vertices that are not connected by an edge to each other. Therefore, it is possible to test in linear time whether a unique ordering exists, and whether a Hamiltonian path exists, despite the NP-hardness of the Hamiltonian path problem for more general directed graphs (Vernet & Markenzon 1997).
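That linear-time test is short to sketch: topologically sort once, then check that every consecutive pair in the order is joined by an edge (assuming the edges are available as pairs):

    def order_is_unique(order, edges):
        """True iff consecutive vertices in the topological order are all
        directly connected, i.e. the edges contain a Hamiltonian path."""
        edge_set = set(edges)
        return all((a, b) in edge_set for a, b in zip(order, order[1:]))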

Algorithm to convert directed acyclic graphs to sequences

If D depends on B and C, which each depend on A, I want ABCD (or ACBD) as the result; that is, generate a flat sequence from the graph such that all nodes appear before any of their descendants. For example, we may need to install dependencies for X before installing X.
What is a good algorithm for this?
In questions like this, terminology is crucial in order to find the correct links.
The dependencies you describe form a partially ordered set (poset). Simply put, that is a set with an order operator for which the comparison of some pairs might be undefined. In your example, B and C are incomparable (neither B depends on C, nor C depends on B).
An extension of the order operator is one that respects the original partial order and adds some extra comparisons to previously incomparable elements. In the extreme: a linear extension leads to a total ordering. For every partial ordering such an extension exists.
Algorithms to obtain a linear extension from a poset are called topological sorting. Wikipedia provides the following very simple algorithm:
    L ← Empty list that will contain the sorted elements
    S ← Set of all nodes with no incoming edges
    while S is non-empty do
        remove a node n from S
        add n to tail of L
        for each node m with an edge e from n to m do
            remove edge e from the graph
            if m has no other incoming edges then
                insert m into S
    if graph has edges then
        return error (graph has at least one cycle)
    else
        return L (a topologically sorted order)
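A direct translation of that pseudocode into Python might look like this (assuming the graph is given as a dict mapping each node to the nodes that depend on it):

    def topo_sort(succ):
        """Kahn's algorithm: succ maps each node to its dependents."""
        nodes = set(succ) | {m for ms in succ.values() for m in ms}
        indeg = {n: 0 for n in nodes}
        for ms in succ.values():
            for m in ms:
                indeg[m] += 1
        S = [n for n in nodes if indeg[n] == 0]
        L = []
        while S:
            n = S.pop()
            L.append(n)
            for m in succ.get(n, ()):
                indeg[m] -= 1
                if indeg[m] == 0:
                    S.append(m)
        if len(L) != len(nodes):
            raise ValueError("graph has at least one cycle")
        return L

    # Example from the question: D depends on B and C, which depend on A.
    print(topo_sort({"A": ["B", "C"], "B": ["D"], "C": ["D"]}))
    # e.g. ['A', 'C', 'B', 'D'] -- any linear extension is acceptable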

How hard is this in terms of computational complexity?

So I have a problem that is basically like this: I have a bunch of strings, and I want to construct a DAG such that every path corresponds to a string and vice versa. However, I have the freedom to permute my strings arbitrarily. The order of characters does not matter. The DAGs that I generate have a cost associated with them. Basically, the cost of a branch in the DAG is proportional to the length of its child paths.
For example, let's say I have the strings BAAA, CAAA, DAAA, and I construct a DAG representing them without permuting them. I get:
() -> (B, C, D) -> A -> A -> A
where the tuple represents branching.
A cheaper representation for my purposes would be:
() -> A -> A -> A -> (B, C, D)
The problem is: Given n strings, permute the strings such that the corresponding DAG has the cheapest cost, where the cost function is: If we traverse the graph from the source in depth first, left to right order, the total number of nodes we visit, with multiplicity.
So the cost of the first example is 12, because we must visit the A's multiple times on the traversal. The cost of the second example is 6, because we only visit the A's once before we deal with the branches.
I have a feeling this problem is NP-hard. It seems like a question about formal languages, and I'm not familiar enough with those sorts of algorithms to figure out how I should go about the reduction. I don't need a complete answer per se, but if someone could point out a class of well-known problems that seem related, I would much appreciate it.
To rephrase:
Given words w1, …, wn, compute permutations x1 of w1, …, xn of wn to minimize the size of the trie storing x1, …, xn.
Assuming an alphabet of unlimited size, this problem is NP-hard via a reduction from vertex cover. (I believe it might be fixed-parameter tractable in the size of the alphabet.) The reduction is easy: given a graph, let each vertex be its own letter and create a two-letter word for each edge.
There is exactly one node at depth zero, and as many nodes at depth two as there are edges. The possible sets of nodes at depth one are exactly the sets of nodes that are vertex covers.
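A small illustrative sketch of the reduction (none of these names come from the answer): each edge (u, v) becomes the two-letter word uv, and flipping a word decides which endpoint is shared at depth one of the trie.

    def trie_size(words):
        """Count trie nodes (including the root) for a set of words."""
        nodes = {()}  # each distinct prefix, as a tuple, is one node
        for w in words:
            for i in range(1, len(w) + 1):
                nodes.add(tuple(w[:i]))
        return len(nodes)

    # Triangle graph with edges a-b, b-c, a-c, one word per edge:
    naive = ["ab", "bc", "ca"]    # depth-1 letters {a, b, c}: cover of size 3
    better = ["ab", "cb", "ca"]   # flipped "bc": depth-1 letters {a, c},
                                  # a vertex cover of size 2
    print(trie_size(naive), trie_size(better))  # 7 vs 6

The node count is 1 (root) + |depth-1 letters| + |edges|, so minimizing the trie is exactly minimizing the vertex cover.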
