Converting a DAG to Binary Tree - algorithm

I am trying to convert a DAG to a binary tree. Consider the following graph
I want to have the following output for the above graph.
Since A, B, C, E form a diamond, to convert it into a tree I need to move B and C into a line.
I have tried the following:
Topological Sort : Output is A -> D -> B -> C -> E -> F .
Topological Sort with order : A -> [B,C,D] -> E -> F
The topological sort gives us a straight-line order. But I want to preserve the existing sequence where possible, i.e. A -> D. However, where there is a diamond, I want each node to have only one parent, and I want those parents to be sequenced as well.
Is there a way to generate a tree from a DAG for above cases?

Algorithm in pseudo-code
Run a topological sort on the graph
For every node B, in reverse order of the topological sort:
    If B has more than one parent:
        Order its parents A1, A2, ..., An in the order of the topological sort
        For every i in 1..n-1:
            Add an arc from Ai to A(i+1)
            Remove the arc from Ai to B
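To make the pseudo-code concrete, here is a minimal Python sketch of the same idea (the function name dag_to_tree and the edge-list input format are mine, not from the question): a Kahn-style topological sort fixes an order, and then nodes are visited in reverse order, chaining multiple parents into a line.

from collections import defaultdict

def dag_to_tree(nodes, edges):
    """Chain the parents of any multi-parent node so the DAG becomes a tree.

    `edges` is a list of (parent, child) arcs; the graph is assumed acyclic.
    """
    children, parents = defaultdict(set), defaultdict(set)
    for a, b in edges:
        children[a].add(b)
        parents[b].add(a)

    # Topological sort (Kahn's algorithm) of the original DAG.
    indegree = {n: len(parents[n]) for n in nodes}
    stack = [n for n in nodes if indegree[n] == 0]
    topo = []
    while stack:
        n = stack.pop()
        topo.append(n)
        for m in children[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                stack.append(m)
    rank = {n: i for i, n in enumerate(topo)}

    # Visit nodes in reverse topological order; chain parents A1..An.
    for b in reversed(topo):
        if len(parents[b]) > 1:
            ordered = sorted(parents[b], key=rank.get)
            for a1, a2 in zip(ordered, ordered[1:]):
                children[a1].add(a2)     # add arc Ai -> A(i+1)
                parents[a2].add(a1)
                children[a1].discard(b)  # remove arc Ai -> B
                parents[b].discard(a1)
    return children                      # child sets of the resulting tree

# For the diamond from the question (A -> B, A -> C, B -> E, C -> E, plus
# A -> D and E -> F), one possible result is A -> D and A -> B -> C -> E -> F.
tree = dag_to_tree(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "E"), ("E", "F")],
)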
Proof of correctness
The algorithm always terminates, since it is a loop of fixed length; its time complexity is O(N^2) where N is the number of nodes
Immediately after the step on a given node B, B has no more than one parent
If the step on a given node C has already been executed, then executing the step on a node B that comes before C in the topological order only adds arcs to nodes that come before C in the topological order; hence once a node's step has been executed, it never gains new parents.
This proves that the algorithm terminates and that every node has at most one parent after executing the algorithm. Since we only remove parents from nodes which had more than one parent, I think it also satisfies your question.

Related

Sort a directed graph that contains exactly one cycle

I have a graph that contains exactly one cycle, but I need to sort it using a "topological sort" (of course, an actual topological sort does not handle cycles). I am wondering how this can be done?
For example:
A -> B
B -> C
C -> D
D -> A
Possible solutions are:
A -> B -> C -> D
B -> C -> D -> A
C -> D -> A -> B
D -> A -> B -> C
I see there is this algorithm suggested here, but it's overcomplicated for my use case.
There are a few approaches and implementations for topological sort. The most intuitive one I have found is to:
Identify the nodes with no incoming edges (this can be done by creating an adjacency list together with a dictionary of incoming-edge counts for the vertices).
Add these to the sorted list.
Remove each such node from the graph and decrement the incoming-edge count of every node it points to.
Repeat the process until no nodes with a count of 0 remain.
If the sorted list ends up smaller than the number of vertices, you will know at this point that you have a cycle and can terminate the algorithm.
There are many code samples with various implementations online, but this algorithm should help guide the implementation of a basic topological sort.
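As a rough illustration only (not taken from the answer above), here is how those steps might look in Python, with the cycle check expressed as the sorted list coming out shorter than the vertex count:

from collections import deque

def topological_sort(adjacency):
    """Kahn-style sort. `adjacency` maps every vertex (even sinks, with an
    empty list) to the vertices it has outgoing edges to. Returns the sorted
    list, or None if a cycle kept some vertices from being sorted."""
    in_degree = {v: 0 for v in adjacency}        # incoming-edge counts
    for targets in adjacency.values():
        for t in targets:
            in_degree[t] += 1

    ready = deque(v for v, d in in_degree.items() if d == 0)
    ordered = []
    while ready:
        v = ready.popleft()
        ordered.append(v)
        for t in adjacency[v]:                   # "remove" v from the graph
            in_degree[t] -= 1
            if in_degree[t] == 0:
                ready.append(t)

    return ordered if len(ordered) == len(adjacency) else None

For the single-cycle example above (A -> B -> C -> D -> A) this returns None; one simple workaround is to delete one edge of the cycle and sort the rest, which yields one of the four rotations listed in the question.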

Big O in Adjacency List - remove vertex and remove edge (time complexity cost of performing various operations on graphs)

I have to prepare an explanation of the time complexity of removing a vertex (O(|V| + |E|)) and removing an edge (O(|E|)) in an adjacency list.
When removing a vertex from a graph with V vertices and E edges, we need to go through all the edges (O(|E|)), of course, to check which ones need to be removed along with the vertex, but why do we need to check all vertices?
I also don't understand why, in order to remove an edge, we need to go through all the edges.
I think I may have misunderstood something from the beginning, so would you kindly help with those two points?
To remove a vertex, you first need to find the vertex in your data structure. The time complexity of this find operation depends on the data structure you use: if you use a HashMap, it will be O(1); if you use a List, it will be O(V).
Once you have identified the vertex that needs to be removed, you need to remove all the edges of that vertex. Since you are using an adjacency list, you simply iterate over the edge list of the vertex you found in the previous step and update all those nodes. The run-time of this step is O(deg(V)). Assuming a simple graph, the maximum degree of a node is O(V); for sparse graphs it will be much lower.
Hence the run-time of removeVertex will only be O(V).
Consider a graph like this:
A -> A
A -> B
A -> C
A -> D
B -> C
The adjacency list will look like this.
A: A -> B -> C -> D -> NULL
B: C -> NULL
C: NULL
D: NULL
Let's remove the vertex C. We have to go through all edges to see whether each one needs to be removed; that is O(|E|). Otherwise, how would you find that A->C needs to be removed? After that, we need to remove the list C: NULL from the top-level container. Depending on the top-level container, you may or may not need O(|V|) time for this. For example, if the top-level container is an array and you don't allow holes, then you need to copy the array. If the top level is a list, you will need to scan through the list to find the node representing C to delete.
From the original graph, let's remove the edge A->D. We have to go through the whole linked list A -> B -> C -> D to find the node D and remove it. That is why you need to go through all vertices: in the worst case, a vertex connects to all other vertices, so you need to go through all of them to delete that element, which is O(|V|). Depending on your top-level container, again, you may or may not be able to find the right list quickly, which can cost you another O(|V|); but in no case I can imagine is removing an edge O(|E|) in an adjacency list representation.
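A small Python sketch (a dict of lists; the function names are mine) may make the costs above easier to see: the top-level lookup is O(1) with a dict, the scan of one edge list is O(deg(V)), and the scan of every edge list is what gives vertex removal its O(|V| + |E|) bound.

def remove_edge(adj, u, v):
    """Delete the edge u -> v: scan u's edge list, O(deg(u)), at worst O(|V|)."""
    adj[u].remove(v)

def remove_vertex(adj, v):
    """Delete vertex v and every edge touching it.

    Dropping v's own list is O(1) with a dict, but every other edge list must
    still be scanned for edges pointing at v, hence O(|V| + |E|) overall.
    """
    adj.pop(v, None)
    for u in adj:
        if v in adj[u]:
            adj[u].remove(v)

adj = {"A": ["A", "B", "C", "D"], "B": ["C"], "C": [], "D": []}
remove_edge(adj, "A", "D")   # adj["A"] becomes ["A", "B", "C"]
remove_vertex(adj, "C")      # adj becomes {"A": ["A", "B"], "B": [], "D": []}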

Algorithm to convert directed acyclic graphs to sequences

If D depends on B and C, which each depend on A, I want ABCD (or ACBD) as the result; that is, generate a flat sequence from the graph such that all nodes appear before any of their descendants. For example, we may need to install the dependencies of X before installing X.
What is a good algorithm for this?
In questions like this, terminology is crucial in order to find the correct links.
The dependencies you describe form a partially ordered set (poset). Simply put, that is a set with an order operator for which the comparison of some pairs may be undefined. In your example, B and C are incomparable (neither does B depend on C, nor C on B).
An extension of the order operator is one that respects the original partial order and adds some extra comparisons between previously incomparable elements. In the extreme, a linear extension leads to a total ordering; such an extension exists for every partial ordering.
Algorithms to obtain a linear extension from a poset are called topological sorting. Wikipedia provides the following very simple algorithm:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
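For what it's worth, Python 3.9+ ships a topological sort in the standard library as graphlib.TopologicalSorter; a tiny example for the A/B/C/D dependencies from the question:

from graphlib import TopologicalSorter

# Each key maps to the set of things it depends on (its predecessors).
deps = {"D": {"B", "C"}, "B": {"A"}, "C": {"A"}}

print(list(TopologicalSorter(deps).static_order()))
# e.g. ['A', 'B', 'C', 'D'] or ['A', 'C', 'B', 'D']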

Topological sort variant algorithm

I have a set of data on which I need to perform a topological sort, with a few assumptions and constraints, and I was wondering if anyone knew an existing, efficient algorithm that would be appropriate for this.
The data relationships are known to form a DAG (so no cycles to worry about).
An edge from A to B indicates that A depends on B, so B must appear before A in the topological ordering.
The graph is not necessarily connected; that is, for any two nodes N and M there may be no way to get from N to M by following edges (even if you ignore edge direction).
The data relationships are singly linked. This means that when there is an edge directed from A to B, only the A node contains information about the existence of the edge.
The problem can be formulated as follows:
Given a set of nodes S in graph G which may or may not have incoming edges, find a topological ordering of the subgraph G' consisting of all of the nodes in G that are reachable from any node in set S (obeying edge direction).
This confounds the usual approaches to topological sorting because they require that the nodes in set S do not have any incoming edges, which is something that is not true in my case. The pathological case is:
A --> B --> D
|     ^     ^
|     |     |
\---> C ----/
Where S = {B, C}. An appropriate ordering would be D, B, C, but if a normal topological sort algorithm happened to consider B before C, it would end up with C, D, B, which is completely wrong. (Note that A does not appear in the resulting ordering since it is not reachable from S; it's there to give an example where all of the nodes in S might have incoming edges)
Now, I have to imagine that this is a long-solved problem, since this is essentially what programs like apt and yum have to do when you specify multiple packages in one install command. However, when I search for keyphrases like "dependency resolution algorithm", I get results describing normal topological sorting, which does not handle this particular case.
I can think of a couple of ways to do this, but none of them seem particularly elegant. I was wondering if anyone had some pointers to an appropriate algorithm, preferably one that can operate in a single pass over the data.
I don't think you'll find an algorithm that can do this with a single pass over the data. I would perform a breadth-first search, starting with the nodes in S, and then do a topological sort on the resulting subgraph.
I think you can do a topological sort of the entire graph and then select only the nodes that are reachable from the set: do depth-first searches from the nodes in the set, in the order resulting from the sort, and you will only descend into the subtree of a node if it wasn't visited before.
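A sketch of the first suggestion (collect the subgraph reachable from S, then topologically sort just that subgraph) might look like this in Python. The function name and the dict-of-lists representation are mine, and the edge convention matches the question: an edge A -> B means A depends on B, so B must come first.

from collections import deque

def dependency_order(adj, roots):
    """Order the nodes reachable from `roots` so every node appears after
    all of its dependencies. `adj` maps a node to the nodes it depends on."""
    # 1. Breadth-first search from the root set to find the subgraph G'.
    reachable, queue = set(), deque(roots)
    while queue:
        n = queue.popleft()
        if n in reachable:
            continue
        reachable.add(n)
        queue.extend(adj.get(n, ()))

    # 2. Kahn's algorithm on G', counting *outgoing* edges so that nodes
    #    with no remaining dependencies are emitted first.
    out_degree = {n: sum(1 for m in adj.get(n, ()) if m in reachable)
                  for n in reachable}
    dependants = {n: [] for n in reachable}    # reverse edges within G'
    for n in reachable:
        for m in adj.get(n, ()):
            if m in reachable:
                dependants[m].append(n)

    ready = deque(n for n, d in out_degree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependants[n]:
            out_degree[d] -= 1
            if out_degree[d] == 0:
                ready.append(d)
    return order                               # assumes G' is acyclic

# The pathological case from the question:
adj = {"A": ["B", "C"], "B": ["D"], "C": ["B", "D"], "D": []}
print(dependency_order(adj, {"B", "C"}))       # e.g. ['D', 'B', 'C']; A is excluded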

Graph serialization

I'm looking for a simple algorithm to 'serialize' a directed graph. In particular I've got a set of files with interdependencies on their execution order, and I want to find the correct order at compile time. I know it must be a fairly common thing to do - compilers do it all the time - but my google-fu has been weak today. What's the 'go-to' algorithm for this?
Topological Sort (From Wikipedia):
In graph theory, a topological sort or topological ordering of a directed acyclic graph (DAG) is a linear ordering of its nodes in which each node comes before all nodes to which it has outbound edges. Every DAG has one or more topological sorts.
Pseudo code:
L ← Empty list where we put the sorted elements
Q ← Set of all nodes with no incoming edges
while Q is non-empty do
    remove a node n from Q
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into Q
if graph has edges then
    output error message (graph has a cycle)
else
    output message (proposed topologically sorted order: L)
I would expect tools that need this to simply walk the graph in a depth-first manner: when they hit a leaf, they just process it (e.g. compile it) and remove it from the graph (or mark it as processed, and treat nodes whose dependencies are all processed as leaves).
As long as it's a DAG, this simple stack-based walk should be trivial.
I've come up with a fairly naive recursive algorithm (pseudocode):
Map<Object, List<Object>> source; // map of each object to its dependency list
List<Object> dest; // destination list

function resolve(a):
    if (dest.contains(a)) return;
    foreach (b in source[a]):
        resolve(b);
    dest.add(a);

foreach (a in source):
    resolve(a);
The biggest problem with this is that it has no ability to detect cyclic dependencies: it can go into infinite recursion (i.e. stack overflow ;-p). The only way around that I can see would be to flip the recursive algorithm into an iterative one with a manual stack, and manually check the stack for repeated elements.
Anyone have something better?
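One way to keep the recursive shape but fail loudly on a cycle is to track the nodes currently being resolved. A rough Python equivalent of the pseudocode above (using a done set so the dest.contains check is cheap; the names are mine) might be:

def resolve_all(source):
    """`source` maps each object to the list of objects it depends on."""
    dest, done, in_progress = [], set(), set()

    def resolve(a):
        if a in done:
            return
        if a in in_progress:
            # We came back to a node we are still expanding: a cycle.
            raise ValueError(f"cyclic dependency involving {a!r}")
        in_progress.add(a)
        for b in source.get(a, ()):
            resolve(b)
        in_progress.discard(a)
        done.add(a)
        dest.append(a)

    for a in source:
        resolve(a)
    return dest

This still recurses, so very deep graphs can hit the interpreter's recursion limit; converting it to the manual-stack version mentioned above removes that limit while keeping the same cycle check.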
If the graph contains cycles, how can there exist allowed execution orders for your files?
It seems to me that if the graph contains cycles, then you have no solution, and this is reported correctly by the above algorithm.

Resources