Topological sort while traversing? - algorithm

Is it possible to topologically sort a directed acyclic graph while traversing it?
One of the extra conditions that holds true for my case is that there is always exactly one vertex that has no incoming edges in my DAG. (My case is a file dependency structure in compilation with only a single entry file.)
I'm wondering if it would be possible to build the topologically sorted list while traversing the graph instead of finding every vertex first and then sorting afterwards.

You can find a topological sort of a DAG by running a modified DFS that traverses the graph:
From Wikipedia:
An algorithm for topological sorting is based on depth-first search.
The algorithm loops through each node of the graph, in an arbitrary
order, initiating a depth-first search that terminates when it hits
any node that has already been visited since the beginning of the
topological sort or the node has no outgoing edges (i.e. a leaf node):
L ← Empty list that will contain the sorted nodes
while there are unmarked nodes do
    select an unmarked node n
    visit(n)
function visit(node n)
    if n has a permanent mark then return
    if n has a temporary mark then stop (not a DAG)
    mark n temporarily
    for each node m with an edge from n to m do
        visit(m)
    mark n permanently
    add n to head of L
You can find many implementations if you google it, one implementation you can find here.
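As a rough illustration, the pseudocode above might look like this in Python for the single-entry case from the question. This is a sketch, not a reference implementation: the function name `topological_order`, the adjacency-dict representation and the sample file names are assumptions made for the example. Because every vertex of such a DAG is reachable from the one vertex without incoming edges, a single `visit(entry)` call is enough, and the list is built during the traversal itself.

```python
def topological_order(graph, entry):
    """Return the vertices reachable from `entry` in topological order.

    `graph` maps each vertex to the list of vertices it has edges to.
    """
    TEMPORARY, PERMANENT = 1, 2
    marks = {}
    order = []  # filled in post-order, reversed at the end

    def visit(n):
        state = marks.get(n)
        if state == PERMANENT:
            return
        if state == TEMPORARY:
            raise ValueError(f"cycle detected at {n!r}: not a DAG")
        marks[n] = TEMPORARY
        for m in graph.get(n, ()):
            visit(m)
        marks[n] = PERMANENT
        # Appending here and reversing once at the end is equivalent
        # to "add n to head of L" in the pseudocode.
        order.append(n)

    visit(entry)
    order.reverse()
    return order


# Hypothetical dependency structure with a single entry file.
deps = {
    "main": ["parser", "util"],
    "parser": ["lexer", "util"],
    "lexer": ["util"],
    "util": [],
}
print(topological_order(deps, "main"))
```

This prints `['main', 'parser', 'lexer', 'util']`. If an edge from n to m means "n depends on m", as in a compilation setting, dropping the final `reverse()` would list each file after all of its dependencies.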

Related

Finding strongly connected components - Kosaraju’s Algorithm

In a directed graph, to find strongly connected components with Kosaraju's Algorithm, why do we have to transpose the adjacency matrix (reversing the direction of all edges) if we could instead reverse the list of nodes ordered by their finishing time and then traverse the original graph?
In other words, we would find the finish times of all vertices and start traversing from the lowest finish time to the greatest (by increasing finish time).
Additionally, if we do a topological sort on some DAG, then reverse the edges (transpose the adjacency matrix) and do a topological sort again, should we get equal arrays, just in reversed order?
This won't give the SCCs. Consider 2 subgraphs S1 and S2. For both S1 and S2 to be part of a single SCC, there must be a path from S1 to S2 and also a path from S2 to S1. The way you described it, they would be counted as a single SCC even if there is only a path from S1 to S2.
Running DFS on both the original and the reversed graph makes sure that only components which have paths in both directions get combined into an SCC.
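To make the two-pass idea concrete, here is a minimal Python sketch of Kosaraju's algorithm. The function name `kosaraju_scc`, the adjacency-dict input (assumed to contain every vertex as a key) and the sample graph are choices made for illustration only.

```python
def kosaraju_scc(graph):
    """Return the strongly connected components of `graph` as a list of sets.

    Assumes every vertex appears as a key of `graph`.
    """
    # Pass 1: DFS on the original graph, recording vertices as they finish.
    visited, finish_order = set(), []

    def dfs_finish(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs_finish(v)
        finish_order.append(u)

    for u in graph:
        if u not in visited:
            dfs_finish(u)

    # Build the transposed graph (every edge reversed).
    transposed = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            transposed[v].append(u)

    # Pass 2: DFS on the transposed graph in decreasing finish time.
    assigned, components = set(), []

    def dfs_collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in transposed[u]:
            if v not in assigned:
                dfs_collect(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = set()
            dfs_collect(u, component)
            components.append(component)
    return components


example = {1: [2, 3], 2: [1], 3: []}
print(kosaraju_scc(example))
```

On this graph the first pass finishes the vertices in the order 2, 3, 1. A DFS over the original edges started from the lowest finish time (vertex 2) would reach 1 and then 3, merging all three vertices into one component even though 3 has no path back. The transposed second pass yields `{1, 2}` and `{3}` instead.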
Additionally, if we do topological sorting on some DAG, and then reverse edges (transpose adjacency matrix) and do topological sorting again - should we get to equal arrays, just in reversed order?
Not necessarily. Consider a trivial example (1->2, 1->3). Topological sort = (1,2,3). Reversed graph (2->1, 3->1). A valid topological sort = (2,3,1), which is not the reverse of (1,2,3).

Give an order for deleting vertices from a graph such that it doesn't disconnect the graph

This is a question from Algorithm Design by Steven Skiena (for interview prep):
An articulation vertex of a graph G is a vertex whose deletion disconnects G. Let G be a graph with n vertices and m edges. Give a simple O(n + m) algorithm that finds a deletion order for the n vertices such that no deletion disconnects the graph.
This is what I thought:
Run DFS on the graph and keep updating each node's oldest reachable ancestor (based on which we decide whether it's a bridge cut node, parent cut node or root cut node).
If we find a leaf node (vertex) or a node which is not an articulation vertex, delete it.
At the end of the DFS, we'd be left with all those nodes in the graph which were found to be articulation vertices.
The graph will remain connected as the articulation vertices are intact. I've tried it on a couple of graphs and it seems to work, but it feels too simple for the book.
In 2 steps:
make the graph a DAG using any traversal algorithm
do a topological sort
Each step finishes within O(m+n).
Assuming the graph is connected, then any random node reaches a subgraph whose spanning tree may be deleted in post-order without breaking the connectedness of the graph. Repeat in this manner until the graph is all gone.
Utilize DFS to track the exit time of each vertex;
Delete vertices in the order of recorded exit time;
If we always delete the leaves of a tree one by one, the rest of the tree remains connected. One particular way of doing this is to assign a pre-order number to each vertex as the graph is traversed using DFS or BFS. Sort the vertices in descending order (based on pre-order numbers). Remove vertices from the graph in that order. Note that the leaves are always deleted first.
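The exit-time idea from the answers above can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is connected and undirected, it is given as an adjacency dict, and the name `deletion_order` is made up for the example. Each vertex is emitted when the DFS leaves it, so it is always a leaf of the remaining DFS tree at the moment it is deleted.

```python
def deletion_order(graph, start):
    """Return the vertices of a connected undirected graph in DFS exit order.

    `graph` maps each vertex to a list of its neighbours.
    """
    visited = {start}
    order = []
    # Explicit stack of (vertex, iterator over its neighbours) to avoid recursion.
    stack = [(start, iter(graph[start]))]
    while stack:
        node, neighbours = stack[-1]
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(graph[nxt])))
                break
        else:
            # No unvisited neighbour left: the DFS exits this vertex.
            stack.pop()
            order.append(node)
    return order


example = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(deletion_order(example, 1))
```

For this sample graph the result is `[4, 3, 2, 1]`. Each vertex and each adjacency entry is examined once, which keeps the work within O(n + m).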

How to find the maximum-weight path between two vertices in a DAG?

In a DAG G, with non negative weighted edges, how do you find the maximum-weight path between two vertices in G?
Thank you guys!
You can solve this in O(n + m) time (where n is the number of nodes and m the number of edges) using a topological sort. Begin by doing a topological sort on the reverse graph, so that the nodes are ordered in such a way that no node is visited before all of its children have been visited.
Now, we're going to label all the nodes with the weight of the highest-weight path starting with that node. This is done based on the following recursive observation:
The weight of the highest-weight path starting from a sink node (any node with no outgoing edges) is zero, since the only path starting from that node is the length-zero path of just that node.
The weight of the highest-weight path starting from any other node is given by the maximum weight of any path formed by following an outgoing edge to a node, then taking the maximum-weight path from that node.
Because we have the nodes reverse-topologically sorted, we can visit all of the nodes in an order that guarantees that if we ever try following an edge and looking up the cost of the heaviest path at the endpoint of that edge, we will have already computed the maximum-weight path starting at that endpoint. This means that once we have the reverse topological sorted order, we can apply the following algorithm to all the nodes in that order:
If the node has no outgoing edges, record the weight of the heaviest path starting at that node (denoted d(u)) as zero.
Otherwise, for each edge (u, v) leaving the current node u, compute l(u, v) + d(v), and set d(u) to be the largest value attained this way.
Once we've done this step, we can make one last pass over all the nodes and return the highest value of d attained by any node.
The runtime of this algorithm can be analyzed as follows. Computing a topological sort can be done in O(n + m) time using many different methods. When we then scan over each node and each outgoing edge from each node, we visit each node and edge exactly once. This means that we spend O(n) time on the nodes and O(m) time on the edges. Finally, we spend O(n) time on one final pass over the nodes to find the highest-weight path. This gives a grand total of O(n + m) time, which is linear in the size of the input.
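A small Python sketch of the labelling described above, under some assumptions: the graph is a DAG given as a dict that maps every node to a dict of `{successor: edge weight}`, and the name `heaviest_path_weights` is invented for the example. The reverse topological order is obtained here as a DFS post-order, which is one of several ways to get it.

```python
def heaviest_path_weights(graph):
    """Return (best, d) where d[u] is the weight of the heaviest path starting at u.

    Assumes `graph` is a non-empty DAG with every node present as a key.
    """
    # DFS post-order lists every node after all nodes reachable from it,
    # i.e. a reverse topological order.
    visited, post_order = set(), []

    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        post_order.append(u)

    for u in graph:
        if u not in visited:
            visit(u)

    d = {}
    for u in post_order:
        # d(u) = max over edges (u, v) of l(u, v) + d(v); zero for a sink.
        d[u] = max((w + d[v] for v, w in graph[u].items()), default=0)
    return max(d.values()), d


example = {
    "s": {"a": 2, "b": 1},
    "a": {"t": 2},
    "b": {"a": 3, "t": 1},
    "t": {},
}
best, d = heaviest_path_weights(example)
print(best, d["b"])
```

For the sample graph this prints `6 5`: the heaviest path is s, b, a, t with weight 1 + 3 + 2.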
A simple brute-force algorithm can be written using recursive functions.
Start with an empty vector (in C++: std::vector) and insert the first node.
Then call your recursive function with the vector as argument; it does the following:
loop over all neighbours and for each neighbour:
    copy the vector
    add the neighbour
    call itself
Also add the total weight as argument to the recursive function and add the weight in every recursive call.
The function should stop whenever it reaches the end node. Then compare the total weight with the maximum weight you have so far (use a global variable) and if the new total weight is bigger, set the maximum weight and store the vector.
The rest is up to you.
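For comparison, the brute-force recursion described above could be sketched in Python like this. The name `brute_force_max_path` and the weighted adjacency-dict format are assumptions for the example, and a small dict stands in for the global variable mentioned in the answer. It enumerates every path from the start node, so its running time can grow exponentially with the size of the graph.

```python
def brute_force_max_path(graph, start, end):
    """Try every path from `start` to `end` in a DAG and keep the heaviest one.

    `graph` maps each node to a dict of {neighbour: edge weight}.
    """
    best = {"weight": None, "path": None}

    def explore(path, total):
        node = path[-1]
        if node == end:
            # Reached the end node: compare with the best found so far.
            if best["weight"] is None or total > best["weight"]:
                best["weight"], best["path"] = total, path
            return
        for neighbour, weight in graph[node].items():
            # `path + [neighbour]` copies the list before extending it.
            explore(path + [neighbour], total + weight)

    explore([start], 0)
    return best["weight"], best["path"]


example = {
    "s": {"a": 2, "b": 1},
    "a": {"t": 2},
    "b": {"a": 3, "t": 1},
    "t": {},
}
print(brute_force_max_path(example, "s", "t"))
```

This prints `(6, ['s', 'b', 'a', 't'])`. If the end node is not reachable from the start node, both values stay `None`.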

Most efficient way to visit nodes of a DAG in order

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.
If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.
What's the most efficient algorithm to do this?
You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.
The complexity of such an algorithm is O(|V|+|E|), which is quite good. You want to traverse all nodes, so for anything faster you would have to solve the problem without even looking at all the edges, which would be dangerous, because a single edge could wreak havoc on the order.
There are some answers here:
Good graph traversal algorithm
and here:
http://en.wikipedia.org/wiki/Topological_sorting
In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).
If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.
As mentioned in the other answers, this problem can be solved by Topological Sorting.
A very simple algorithm for that (not the most efficient):
Keep an array (or map) indegree[] where indegree[node] = number of incoming edges of node
while there is at least one node n with indegree[n] = 0:
    for each node n in nodes where indegree[n] = 0:
        visit(n)
        indegree[n] = -1 # mark n as visited
        for each node x adjacent to n:
            indegree[x] = indegree[x] - 1 # its parent has been visited, so one less edge coming into it
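You can traverse a DAG in O(N + E) (without a separate topsort pass) by just running your DFS from every node with zero indegree, because those are the valid starting points. This reaches every node: because the graph has no cycles, zero-indegree nodes must exist, and every node is reachable from at least one of them. Note that a plain pre-order DFS does not by itself guarantee that all parents of a node are visited before the node; for that guarantee, use the reverse of the DFS post-order.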

Graph serialization

I'm looking for a simple algorithm to 'serialize' a directed graph. In particular I've got a set of files with interdependencies on their execution order, and I want to find the correct order at compile time. I know it must be a fairly common thing to do - compilers do it all the time - but my google-fu has been weak today. What's the 'go-to' algorithm for this?
Topological Sort (From Wikipedia):
In graph theory, a topological sort or topological ordering of a directed acyclic graph (DAG) is a linear ordering of its nodes in which each node comes before all nodes to which it has outbound edges. Every DAG has one or more topological sorts.
Pseudo code:
L ← Empty list where we put the sorted elements
Q ← Set of all nodes with no incoming edges
while Q is non-empty do
    remove a node n from Q
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into Q
if graph has edges then
    output error message (graph has a cycle)
else
    output message (proposed topologically sorted order: L)
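The pseudocode above might translate to Python roughly as follows. This is a sketch with assumptions: the graph is a dict mapping every node to the list of nodes it has edges to, the name `kahn_topological_sort` is made up, and decrementing an in-degree counter stands in for "remove edge e from the graph" so the input is left unmodified.

```python
from collections import deque


def kahn_topological_sort(graph):
    """Return a topological order of `graph`, or raise if it has a cycle.

    Assumes every node appears as a key of `graph`.
    """
    indegree = {n: 0 for n in graph}
    for targets in graph.values():
        for m in targets:
            indegree[m] += 1

    # Q: all nodes with no incoming edges.
    queue = deque(n for n in graph if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in graph[n]:
            indegree[m] -= 1  # counts as removing the edge n -> m
            if indegree[m] == 0:
                queue.append(m)

    # Any node left out still has an incoming edge, so the graph has a cycle.
    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order


example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(kahn_topological_sort(example))
```

For the sample graph this prints `['a', 'b', 'c', 'd']`. With edges pointing from a file to the files that depend on it, the result is a valid processing order.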
I would expect tools that need this to simply walk the tree in a depth-first manner and, when they hit a leaf, just process it (e.g. compile) and remove it from the graph (or mark it as processed, and treat nodes with all leaves processed as leaves).
As long as it's a DAG, this simple stack-based walk should be trivial.
I've come up with a fairly naive recursive algorithm (pseudocode):
Map<Object, List<Object>> source; // map of each object to its dependency list
List<Object> dest; // destination list
function resolve(a):
    if (dest.contains(a)) return;
    foreach (b in source[a]):
        resolve(b);
    dest.add(a);
foreach (a in source):
    resolve(a);
The biggest problem with this is that it has no ability to detect cyclic dependencies - it can go into infinite recursion (i.e. stack overflow ;-p). The only way around that that I can see would be to flip the recursive algorithm into an iterative one with a manual stack, and manually check the stack for repeated elements.
Anyone have something better?
If the graph contains cycles, how can there exist allowed execution orders for your files? It seems to me that if the graph contains cycles, then you have no solution, and this is reported correctly by the above algorithm.
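One way the naive recursive `resolve` from this thread could be given cycle detection while staying recursive is to track the objects on the current recursion path. The Python below is a sketch of that idea, not the poster's code: `resolve_all`, `done` and `in_progress` are hypothetical names, and `source` is assumed to map each object to its dependency list as in the pseudocode.

```python
def resolve_all(source):
    """Return all objects ordered so that dependencies come first.

    Raises ValueError if the dependencies contain a cycle.
    """
    dest = []            # resolved order, dependencies first
    done = set()         # objects already placed in dest
    in_progress = set()  # objects on the current recursion path

    def resolve(a):
        if a in done:
            return
        if a in in_progress:
            # We came back to an object we are still resolving.
            raise ValueError(f"cyclic dependency involving {a!r}")
        in_progress.add(a)
        for b in source.get(a, ()):
            resolve(b)
        in_progress.remove(a)
        done.add(a)
        dest.append(a)

    for a in source:
        resolve(a)
    return dest


example = {"app": ["lib", "util"], "lib": ["util"], "util": []}
print(resolve_all(example))
```

This prints `['util', 'lib', 'app']`. Using sets for the membership checks also avoids the linear `dest.contains(a)` scan of the original pseudocode.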
