Finding strongly connected components - Kosaraju’s Algorithm

In a directed graph, to find strongly connected components (with Kosaraju’s Algorithm), why do we have to transpose the adjacency matrix (reverse the direction of all edges) if we could instead take the nodes in reverse order of their finishing times and then traverse the original graph?
In other words, we would find the finish times of all vertices and then start traversing from the lowest finish time to the greatest (in increasing order of finish time).
Additionally, if we do a topological sort on some DAG, then reverse the edges (transpose the adjacency matrix) and do a topological sort again, should we get two equal arrays, just in reverse order?

This won't give you the SCCs. Consider 2 subgraphs S1 and S2. For S1 and S2 to be part of a single SCC, there must be a path from S1 to S2 and also a path from S2 to S1. The approach you describe can count them as a single SCC even if there is only a path from S1 to S2.
Running DFS on both the original graph (to get finish times) and the reversed graph (to collect the components) makes sure that only vertices with paths in both directions get combined into one SCC.
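A minimal sketch of the two passes in Python, assuming vertices are numbered 0..n-1 and edges are given as (u, v) pairs; the function name `kosaraju_scc` is hypothetical. Pass 1 records finish times on the original graph, pass 2 runs DFS on the transposed graph in decreasing finish time, and each DFS tree of pass 2 is one SCC. Both passes are iterative to stay clear of Python's recursion limit.

```python
from collections import defaultdict

def kosaraju_scc(n, edges):
    """Return the SCCs (lists of vertices) of a directed graph on 0..n-1."""
    graph = defaultdict(list)
    transposed = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        transposed[v].append(u)  # every edge reversed

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = [False] * n
    finish_order = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                finish_order.append(node)  # all descendants done: node finishes

    # Pass 2: DFS on the transposed graph, latest-finishing vertex first.
    assigned = [False] * n
    components = []
    for start in reversed(finish_order):
        if assigned[start]:
            continue
        assigned[start] = True
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in transposed[node]:
                if not assigned[nxt]:
                    assigned[nxt] = True
                    stack.append(nxt)
        components.append(component)
    return components
```

For example, with edges 0->1, 1->0, 0->2, a DFS from 0 that visits 1 before 2 finishes the vertices in the order 1, 2, 0. Traversing the original graph from the earliest-finished vertex 1 reaches 0 and then 2, which would merge {2} into the component {0, 1} although 2 cannot reach back. The transposed pass above returns [[0, 1], [2]] instead.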
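Additionally, if we do a topological sort on some DAG, then reverse the edges (transpose the adjacency matrix) and do a topological sort again, should we get two equal arrays, just in reverse order?
Not necessarily. Consider a trivial example with edges 1->2 and 1->3. One topological sort is (1, 2, 3). The reversed graph has edges 2->1 and 3->1, and a topological sort of it may be (2, 3, 1). Reversing any topological order of a DAG always gives a valid topological order of the reversed graph, here (3, 2, 1), but topological orders are generally not unique, so the second sort need not return that one.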
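A small check that makes the example concrete; `is_topological_order` is a hypothetical helper that tests whether every edge (u, v) has u placed before v. Both (3, 2, 1) and (2, 3, 1) pass for the reversed graph, which is why a second sort can legitimately return either.

```python
def is_topological_order(edges, order):
    """True if every edge (u, v) has u placed before v in `order`."""
    position = {vertex: i for i, vertex in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)

edges = [(1, 2), (1, 3)]
reversed_edges = [(v, u) for u, v in edges]

order = [1, 2, 3]
assert is_topological_order(edges, order)
# Reversing a valid order always yields a valid order of the reversed graph...
assert is_topological_order(reversed_edges, order[::-1])   # (3, 2, 1)
# ...but it is not the only valid one.
assert is_topological_order(reversed_edges, [2, 3, 1])
```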

Related

Reason why all DAGs have more than one topological sort order

I am wondering why all Directed Acyclic Graphs have more than one topological sort order.
I have searched on Google, and most results just breeze through the fact that they have at least one topological sort. But I am thinking along the lines of how a singly linked list is implemented:
A -> B -> C -> D
This might mean that there is only one way the topological sort can technically go: D, C, B, A...
However, it may be the case that this is not a directed acyclic graph, but I am not sure how to refute it, since it is directed (A to B, etc.), acyclic (there are no cycles back to any start) and a graph (it is technically a tree).
Thank you so much for any clarifications!
It's not true that all DAGs have more than one topological sort. Remember that we can construct a topological sort by repeatedly removing a vertex with no incoming edges.
Consider a DAG that contains a path visiting all of its vertices (a Hamiltonian path; note that this path cannot form a cycle, otherwise the graph wouldn't be a DAG). We start by removing a vertex with no incoming edges and repeat. We'll find that the topological sort has an edge between each consecutive pair of vertices. To form a different topological sort, we would at some step have to remove a different vertex with no incoming edges, but that would mean there are at least 2 vertices with no incoming edges at that step, and then it would be impossible for a single path to start at one of them and reach all the others.
Since we started with a DAG having a path connecting all the vertices, this is a contradiction. Hence, a DAG with a path connecting all the vertices has a unique topological sort.
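The removal procedure described above is Kahn's algorithm, and it doubles as a uniqueness test: the order is unique exactly when there is never more than one vertex with no incoming edges to choose from. A sketch, assuming vertices 0..n-1 and (u, v) edge pairs; `topological_sort_is_unique` is a hypothetical name.

```python
from collections import deque

def topological_sort_is_unique(n, edges):
    """Kahn's algorithm; returns (order, unique). Raises ValueError on a cycle."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Vertices with no incoming edges are the candidates for removal.
    ready = deque(v for v in range(n) if indegree[v] == 0)
    order, unique = [], True
    while ready:
        if len(ready) > 1:
            unique = False  # two candidates: either could come next
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle")
    return order, unique
```

With the linked list from the question mapped to 0 -> 1 -> 2 -> 3, this returns ([0, 1, 2, 3], True) under the convention that u comes before v for every edge u -> v, while edges 0 -> 1 and 0 -> 2 give ([0, 1, 2], False).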

How to build an array to present relationship of nodes in strongly connected component?

For a directed graph G(V, E) with n nodes, I want to create an integer array a of length n. If there is a path from node 1 to node 2, then a[1] <= a[2]; if they are in the same strongly connected component, a[1] = a[2]; if there is no path from node 2 to node 3, we have a[2] > a[3].
I think the time complexity should be O(n + m), because that is the time complexity of finding strongly connected components. But I am not sure how to output the array. Could anybody help? Thanks.
Once you have found every strongly connected component (SCC) of a graph, you can build the condensation of the graph by contracting each SCC into a single vertex. The condensation is a directed acyclic graph whose vertices you can number using topological sorting. Every step has linear complexity.
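A sketch of the last two steps (contraction and topological numbering), assuming the SCC labels were already computed by any SCC algorithm and are numbered 0..k-1; `comp` and `rank_from_components` are hypothetical names. Vertices in one SCC share a value, and a path u -> v between different SCCs gives a[u] < a[v] because topological positions increase along every condensation edge.

```python
from collections import deque

def rank_from_components(n, edges, comp):
    """comp[v] is the SCC id of vertex v (ids 0..k-1). Returns the array a."""
    k = max(comp) + 1 if n else 0
    # Condensation: one node per SCC, keeping only edges between different SCCs.
    succ = [set() for _ in range(k)]
    for u, v in edges:
        if comp[u] != comp[v]:
            succ[comp[u]].add(comp[v])
    indegree = [0] * k
    for c in range(k):
        for d in succ[c]:
            indegree[d] += 1

    # Kahn's algorithm assigns each SCC its position in a topological order.
    ready = deque(c for c in range(k) if indegree[c] == 0)
    position = [0] * k
    next_pos = 0
    while ready:
        c = ready.popleft()
        position[c] = next_pos
        next_pos += 1
        for d in succ[c]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return [position[comp[v]] for v in range(n)]
```

For edges 0->1, 1->0, 1->2, 2->3, the result is [0, 0, 1, 2] regardless of how the SCC ids themselves were assigned.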
Tarjan's SCC algorithm already does almost what you want; you need only one additional bookkeeping step.
Recall that Tarjan's algorithm outputs the strongly connected components one by one, already ordered by the condensation (strictly, in reverse topological order: a component is emitted only after every component reachable from it). So all you have to do is save the index of the current SCC in every cell that corresponds to a node of that SCC. Up to the direction of the numbering, this is already the array you want.
Depending on the representation of the graph and the implementation, you might want to save N - idx in the array cells instead, where N is the total number of components found. This is because it essentially doesn't matter in which direction you traverse the graph: the strongly connected components of the graph with all arrows reversed are the same, only their order is reversed. Use whichever is easier and faster to access in your concrete implementation.
Tarjan's algorithm traverses the graph only once (a single DFS) and runs in O(|V| + |E|) time. Keeping an additional array doesn't change that.
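A minimal recursive sketch of Tarjan's algorithm with that extra bookkeeping, assuming vertices 0..n-1; `tarjan_labels` is a hypothetical name. Because components come out sinks first, the k-th emitted component (0-based) gets label N - 1 - k, the 0-based form of the N - idx idea, so u -> v implies a[u] <= a[v]. Being recursive, it suits small graphs; deep graphs would need an iterative rewrite.

```python
def tarjan_labels(n, edges):
    """Return a with a[u] == a[v] inside an SCC and a[u] <= a[v] for u -> v."""
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)

    index = [None] * n      # DFS discovery index
    low = [0] * n           # lowest index reachable from v's DFS subtree
    on_stack = [False] * n
    stack = []
    emitted = [None] * n    # emission number of each vertex's SCC
    counter = 0
    found = 0

    def strongconnect(v):
        nonlocal counter, found
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in graph[v]:
            if index[w] is None:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole SCC
            while True:
                w = stack.pop()
                on_stack[w] = False
                emitted[w] = found  # the one extra bookkeeping step
                if w == v:
                    break
            found += 1

    for v in range(n):
        if index[v] is None:
            strongconnect(v)
    # Flip the numbering so labels increase along edges.
    return [found - 1 - e for e in emitted]
```

For edges 0->1, 1->0, 1->2, 2->3 this returns [0, 0, 1, 2].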

Topological sort while traversing?

Is it possible to topologically sort a directed acyclic graph while traversing it?
One of the extra conditions that holds true for my case is that there is always exactly one vertex that has no incoming edges in my DAG. (My case is a file dependency structure in compilation with only a single entry file.)
I'm wondering if it would be possible to build the topologically sorted list while traversing the graph, instead of finding every vertex first and sorting afterwards.
You can find a topological sort of a DAG by running a modified DFS that traverses the graph:
From Wikipedia:
An algorithm for topological sorting is based on depth-first search. The algorithm loops through each node of the graph, in an arbitrary order, initiating a depth-first search that terminates when it hits any node that has already been visited since the beginning of the topological sort or the node has no outgoing edges (i.e. a leaf node):

    L ← Empty list that will contain the sorted nodes
    while there are unmarked nodes do
        select an unmarked node n
        visit(n)

    function visit(node n)
        if n has a permanent mark then return
        if n has a temporary mark then stop (not a DAG)
        mark n temporarily
        for each node m with an edge from n to m do
            visit(m)
        mark n permanently
        add n to head of L
You can find many implementations by searching online; one implementation can be found here.
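A sketch of the quoted pseudocode in Python for the single-entry case from the question. In a finite DAG every vertex has an ancestor with no incoming edges, so if the entry file is the only such vertex, one `visit(entry)` reaches everything and the order is built during that single traversal. The names `topo_order_from_entry` and `deps` are hypothetical, and edges are assumed to point from a file to the files it imports.

```python
def topo_order_from_entry(entry, deps):
    """DFS from one entry vertex; returns (order, build_order).

    order: every vertex before the vertices it points to (the list L).
    build_order: the reverse, i.e. every file after all of its dependencies.
    """
    TEMPORARY, PERMANENT = 1, 2
    mark = {}
    post_order = []

    def visit(n):
        if mark.get(n) == PERMANENT:
            return
        if mark.get(n) == TEMPORARY:
            raise ValueError(f"cycle detected at {n!r} (not a DAG)")
        mark[n] = TEMPORARY
        for m in deps.get(n, ()):
            visit(m)
        mark[n] = PERMANENT
        post_order.append(n)  # appending to the tail; L is this list reversed

    visit(entry)
    return post_order[::-1], post_order
```

For `deps = {"main": ["util", "io"], "io": ["util"]}`, the order is ["main", "io", "util"] and the build order is ["util", "io", "main"]. For a compiler, the post-order is often the more useful of the two, since it lists each file after everything it depends on.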

Prim and Kruskal's algorithms complexity

Given an undirected connected graph with weights w: E -> {1,2,3,4,5,6,7}, meaning there are only 7 possible weights.
I need to find a spanning tree using Prim's algorithm in O(n + m) and using Kruskal's algorithm in O(m·α(m,n)).
I have no idea how to do this and really need some guidance on how the weights can help me here.
You can sort the edge weights faster.
In Kruskal's algorithm you don't need an O(M lg M) sort; you can use counting sort (or any other O(M) sort), since there are only 7 distinct keys. So the final complexity is O(M) for sorting plus O(M·α(M, N)) for the union-find phase, for a total of O(M·α(M, N)).
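A sketch of that idea, assuming vertices 0..n-1 and edges as (u, v, w) triples with integer w in 1..7; `kruskal_small_weights` is a hypothetical name. One bucket per weight is counting sort in its simplest form, and union-find uses union by rank with path halving, which is what gives the α factor.

```python
def kruskal_small_weights(n, edges, max_weight=7):
    """MST edges of a connected undirected graph with weights in 1..max_weight."""
    # Counting sort: one bucket per possible weight, O(m + max_weight).
    buckets = [[] for _ in range(max_weight + 1)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))

    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # same tree already: the edge would close a cycle
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # union by rank
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    tree = []
    for bucket in buckets:  # buckets visited in increasing weight
        for u, v, w in bucket:
            if union(u, v):
                tree.append((u, v, w))
    return tree
```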
For Prim's algorithm, you don't need a heap; you need 7 lists/queues/arrays/anything with constant-time insertion and retrieval, one for each weight. When you are looking for the cheapest outgoing edge, you check whether one of these lists is nonempty, starting from the cheapest one, and use that edge (discarding it if both its endpoints are already in the tree). Since 7 is a constant and each edge is inserted and removed at most once, the whole algorithm runs in O(N + M) time.
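A sketch of the bucketed Prim, assuming vertices 0..n-1 and an adjacency list `adj[u]` of (v, w) pairs with integer w in 1..7; `prim_small_weights` is a hypothetical name. Each bucket is a plain Python list used as a stack, which is fine because any edge of the cheapest weight may be taken. An edge is pushed only from whichever endpoint joins the tree first, so it is pushed at most once.

```python
def prim_small_weights(n, adj, max_weight=7):
    """MST edges (u, v, w) of a connected undirected graph on 0..n-1."""
    buckets = [[] for _ in range(max_weight + 1)]
    in_tree = [False] * n
    tree = []

    def add_vertex(u):
        in_tree[u] = True
        for v, w in adj[u]:
            if not in_tree[v]:
                buckets[w].append((u, v, w))

    add_vertex(0)
    while len(tree) < n - 1:
        # Scan a constant number of buckets for the cheapest nonempty one.
        for w in range(1, max_weight + 1):
            if buckets[w]:
                break
        else:
            raise ValueError("graph is not connected")
        u, v, w = buckets[w].pop()
        if in_tree[v]:
            continue  # stale edge: both endpoints already in the tree
        tree.append((u, v, w))
        add_vertex(v)
    return tree
```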
As I understand it, answering homework assignments is not popular, but hopefully this will be useful to other people than just you ;)
Prim:
Prim is an algorithm for finding a minimum spanning tree (MST), just as Kruskal is.
An easy way to visualize the algorithm is to draw the graph on a piece of paper.
Then you draw a movable line (a cut) around all the nodes you have selected; the set A is the set of nodes inside the cut. Then you choose the smallest edge crossing the cut, i.e. from a node inside the line to a node outside it. Always choose the edge with the lowest weight. After adding the new node, you move the cut so that it contains the newly added node. Then you repeat until all nodes are within the cut.
A short summary of the algorithm is:
1. Create a set A, which will contain the chosen vertices. It initially contains a starting node of your choice.
2. Create another set B. It is initially empty and will be used to record all chosen edges.
3. Choose an edge (u, v), that is, an edge from node u to node v. It must be the edge with the smallest weight among those with u inside A and v not inside A. (If several edges have equal weight, any of them can be chosen.)
4. Add the edge (u, v) to B and v to A.
5. Repeat steps 3 and 4 until A = V, where V is the set of all vertices.
The sets A and B now describe your spanning tree! The MST contains the nodes in A, and B describes how they are connected.
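A sketch of these steps with a binary heap holding the edges that cross the cut, keeping the names A and B from the summary; `prim_mst` is a hypothetical name, and `adj` is assumed to map every vertex to a list of (neighbour, weight) pairs. Ties in weight fall back to comparing vertex labels inside the heap tuples, so labels are assumed to be mutually comparable (e.g. all strings or all ints).

```python
import heapq

def prim_mst(adj, start):
    """Return (A, B): chosen vertices and chosen edges (u, v, weight)."""
    A = {start}   # step 1: the starting node
    B = []        # step 2: no edges chosen yet
    # Candidate edges crossing the cut, keyed by weight.
    crossing = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(crossing)
    while crossing and len(A) < len(adj):
        w, u, v = heapq.heappop(crossing)  # step 3: cheapest candidate
        if v in A:
            continue  # both ends inside the cut: no longer crossing
        A.add(v)      # step 4: move the cut to include v
        B.append((u, v, w))
        for x, wx in adj[v]:
            if x not in A:
                heapq.heappush(crossing, (wx, v, x))
    return A, B
```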
Kruskal:
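Kruskal is similar to Prim, except there is no cut: you always choose the smallest remaining edge anywhere in the graph, as long as it doesn't create a cycle.
1. Create a set A, which is initially empty. It will be used to store the chosen edges.
2. Choose the edge e with minimum weight from the set E that has not been considered yet. Since the graph is undirected, (u,v) = (v,u), so each edge is considered only once. If adding e to A would create a cycle, discard it and choose again.
3. Add e to A.
4. Repeat steps 2 and 3 until A contains |V| - 1 edges, that is, until A spans all vertices (or until all edges have been considered).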
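A sketch of these steps in Python with a general-purpose sort; `kruskal_mst` is a hypothetical name and edges are assumed to be (weight, u, v) tuples of an undirected graph. A small union-find detects the cycle case: if u and v already have the same root, the edge is skipped. The loop stops once |V| - 1 edges are chosen.

```python
def kruskal_mst(vertices, edges):
    """Return the MST edges (u, v, weight) of a connected undirected graph."""
    parent = {v: v for v in vertices}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    A = []
    for w, u, v in sorted(edges):  # O(E log E), dominates the running time
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # u and v already connected: the edge would close a cycle
        parent[ru] = rv
        A.append((u, v, w))
        if len(A) == len(vertices) - 1:  # spanning tree complete
            break
    return A
```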
As for performance: Kruskal runs in O(E log E) time, dominated by sorting the edges. The running time of Prim depends on which data structure you use to find the minimum crossing edge: with a binary heap it is O(E log V), while scanning an array of distances over an adjacency matrix gives O(V²).
Hope this helps!

Topological sort by arcs

I really just need some guidance:
Topological sort by arcs, definition (from my question): a way of ordering all the arcs in a directed graph so that all arcs that enter a vertex appear before the ones that leave this vertex.
There is no need to change anything in the topological sort; you can use it as is and post-process its output.
High-level pseudocode:
1. Run topological sort; let the resulting array be arr.
2. Create an empty edge list; let it be l.
3. For each vertex v in arr (in order):
   3.1. For each (v,u) in E:
        3.1.1. Append (v,u) to l.
4. Return l.
The advantage of this method is you can use topological sort as black box, without modifying it and just post-process to get the desired result.
Correctness [sketch of proof]:
For each edge (v,u), u appears after v in the topological sort. Edge (v,u) is appended while processing v, and every edge leaving u is appended later, while processing u. Therefore every arc entering u is placed before every arc leaving u.
Complexity:
O(|V|+|E|) for the topological sort and O(|V|+|E|) for post-processing (iterating over all vertices and all edges), so O(|V|+|E|) in total.
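A sketch of the black-box approach in Python, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) as the vertex sort; `sort_arcs` is a hypothetical name and arcs are assumed to be (v, u) pairs. `TopologicalSorter` expects a mapping from each node to its predecessors, and `static_order()` yields predecessors first; it raises `graphlib.CycleError` if the graph is not a DAG.

```python
from graphlib import TopologicalSorter

def sort_arcs(edges):
    """Order arcs so every arc entering a vertex precedes every arc leaving it."""
    outgoing = {}
    predecessors = {}
    for v, u in edges:
        outgoing.setdefault(v, []).append((v, u))
        predecessors.setdefault(u, set()).add(v)
        predecessors.setdefault(v, set())  # make sure v is a node too

    # Step 1: black-box topological sort of the vertices.
    arr = list(TopologicalSorter(predecessors).static_order())
    # Steps 3-4: emit each vertex's outgoing arcs in that order.
    arcs = []
    for v in arr:
        arcs.extend(outgoing.get(v, []))
    return arcs
```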
"Traditional" topological sort is sorting vertices, while this one is sorting arcs. Otherwise the principle is the same...
