How to find Strongly Connected Components in a Graph? - algorithm

I am trying to self-study Graph Theory, and am now trying to understand how to find SCCs in a graph. I have read several different questions/answers on SO (e.g., 1,2,3,4,5,6,7,8), but I can't find one with a complete step-by-step example I could follow.
According to CORMEN (Introduction to Algorithms), one method is:
Call DFS(G) to compute finishing times f[u] for each vertex u
Compute Transpose(G)
Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component
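The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not CLRS's own pseudocode: the function name kosaraju_scc and the {vertex: [successors]} dictionary representation are assumptions made for illustration, and the recursive DFS would hit Python's recursion limit on very deep graphs.

```python
from collections import defaultdict

def kosaraju_scc(graph):
    """Return the SCCs of a directed graph given as {vertex: [successors]}."""
    # Collect every vertex, including those that only appear as successors.
    vertices = list(graph)
    known = set(vertices)
    for succs in graph.values():
        for v in succs:
            if v not in known:
                known.add(v)
                vertices.append(v)

    visited = set()

    def dfs(u, adj, out):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                dfs(v, adj, out)
        out.append(u)  # u finishes only after all of its descendants

    # Step 1: DFS on G; finish_order lists vertices by increasing f[u].
    finish_order = []
    for u in vertices:
        if u not in visited:
            dfs(u, graph, finish_order)

    # Step 2: compute Transpose(G) by reversing every edge.
    transpose = defaultdict(list)
    for u, succs in graph.items():
        for v in succs:
            transpose[v].append(u)

    # Step 3: DFS on Transpose(G), main loop in order of decreasing f[u].
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            tree = []
            dfs(u, transpose, tree)
            # Step 4: each tree of this forest is one SCC.
            components.append(tree)
    return components
```

For a hypothetical graph such as {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}, the function groups a, b, c into one component and d, e into another.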
Observe the following graph (question is 3.4 from here. I have found several solutions here and here, but I am trying to break this down and understand it myself.)
Step 1: Call DFS(G) to compute finishing times f[u] for each vertex u
Running DFS starting on vertex A:
Please notice RED text formatted as [Pre-Visit, Post-Visit]
Step 2: Compute Transpose(G)
Step 3. Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
Okay, so vertices in order of decreasing post-visit(finishing times) values:
{E, B, A, H, G, I, C, D, F, J}
So at this step, we run DFS on G^T but start with each vertex from above list:
DFS(E): {E}
DFS(B): {B}
DFS(A): {A}
DFS(H): {H, I, G}
DFS(G): remove from list since it is already visited
DFS(I): remove from list since it is already visited
DFS(C): {C, J, F, D}
DFS(J): remove from list since it is already visited
DFS(F): remove from list since it is already visited
DFS(D): remove from list since it is already visited
Step 4: Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component.
So we have five strongly connected components: {E}, {B}, {A}, {H, I, G}, {C, J, F, D}
This is what I believe is correct. However, solutions I found here and here say SCCs are {C,J,F,H,I,G,D}, and {A,E,B}. Where are my mistakes?

Your steps are correct and your answer is also correct. If you examine the other answers you linked, you can see that they used a different algorithm: first you run DFS on G transposed, and then you run an undirected components algorithm on G, processing the vertices in decreasing order of their post numbers from the previous step.
The problem is that they ran this last step on G transposed instead of on G and thus got an incorrect answer. If you read Dasgupta from page 98 onwards you will see a detailed explanation of the algorithm they tried to use.

Your answer is correct. As per CLRS, "A strongly connected component of a directed graph G = (V,E) is a maximal set of vertices C, such that for every pair of vertices u and v, we have both u ~> v and v ~> u, i.e. vertices v and u are reachable from each other."
If you assume {C, J, F, H, I, G, D} is correct, there is no way to reach G from D (amongst many other fallacies). The same goes for the other set: there is no way to reach E from A.
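The definition can be checked mechanically for any candidate set. The sketch below is only an illustration: the helper names reachable and is_mutually_reachable are made up here, the graph is assumed to be a {vertex: [successors]} dictionary, and the check covers mutual reachability only, not maximality.

```python
def reachable(graph, start):
    """Set of vertices reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_mutually_reachable(graph, candidate):
    """True if every vertex of candidate can reach every other one."""
    candidate = set(candidate)
    return all(candidate <= reachable(graph, u) for u in candidate)
```

A proposed component that fails this test, such as a set containing a vertex with no path back to the others, cannot be a strongly connected component.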

Related

Find disjoint sets of vertexes in a Graph

I want to find a simple method to generate sets of disjoint parts in a Graph. In other words, in the following Graph, I want to get two sets of {A, B, C, D} and {E, F}.
You can use any graph traversal algorithm (BFS and DFS are the most common).
Whenever the algorithm is "stuck" (there are no more nodes to traverse), you have finished finding one component. Mark it, then choose any vertex that has not been traversed yet and start from it to find the next component.
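A minimal BFS-based sketch of this idea, assuming an undirected graph stored as a symmetric adjacency dictionary (the function name connected_components and the example edges are assumptions, since the original picture is not available):

```python
from collections import deque

def connected_components(adj):
    """Split an undirected graph {vertex: [neighbours]} into components."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue  # already part of an earlier component
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        # The traversal is "stuck": one component is complete.
        components.append(component)
    return components
```

With hypothetical edges A-B, A-C, B-D and E-F, this yields one component holding A, B, C, D and another holding E, F.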

How to traverse a Graph based on the angle between two edges

I am stuck in a problem regarding Graph traversal based on the angle between two edges. I would like to summarize the problem as follows, given 5 vertices a,b,c,d,e and the edges (a, b), (b, c), (c, d), (d, e).
Suppose I want to traverse the graph based on the angle between two consecutive edges, for example angle((a, b), (b, c)). If the angle is greater than 10 degrees, I should stop at b and start the process again.
What steps do I need to consider to approach this problem, and which concrete programming structures should I use?
If I understand correctly, when angle((a,b),(b,c)) returns a value of over some threshold (10, in your example), you should stop traversing the graph.
This means that effectively, this node (b) is not helping by connecting the two edges ((a,b) and (b,c)). It might be useful for some other set of edges, but that specific connection is not available.
What I suggest is swapping the roles of edges and nodes. Every edge in G becomes a node in G', and every node in G becomes an edge in G' (joining the two edges that meet at it) only if angle() returns a value lower than your threshold.
On G' you can now run BFS, DFS or any other algorithm of your liking. When you are done, use the reverse transformation to "translate" your answer back into the original graph in question.
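One possible way to build G' is sketched below. Several things here are assumptions rather than part of the question: vertices are given 2D coordinates in a dictionary pos, the angle is taken to be the turning angle between the direction of (a, b) and the direction of (b, c), and the names turning_angle and build_edge_graph are made up for the example. The double loop is quadratic in the number of edges, which is fine for a sketch.

```python
import math

def turning_angle(p, q, r):
    """Angle in degrees between direction p->q and direction q->r."""
    v1 = (q[0] - p[0], q[1] - p[1])
    v2 = (r[0] - q[0], r[1] - q[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def build_edge_graph(pos, edges, threshold_deg):
    """G': one node per edge of G; (a, b) links to (b, c) only if the
    turn at b is below the threshold."""
    edge_graph = {e: [] for e in edges}
    for (a, b) in edges:
        for (c, d) in edges:
            if b == c and (a, b) != (c, d):
                if turning_angle(pos[a], pos[b], pos[d]) < threshold_deg:
                    edge_graph[(a, b)].append((c, d))
    return edge_graph
```

Running BFS or DFS on the returned dictionary then only follows pairs of edges whose turn is below the threshold, and each visited node of G' maps straight back to an edge of G.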

DFS on directed graph & Kosaraju's algorithm

I'm having trouble understanding Kosaraju's algorithm for finding the strongly connected components of a directed graph. Here's what I have on my notebook (I'm a student :D):
Start from an arbitrary vertex (label it with #1) and perform a DFS. When you can't go any further, label the last visited vertex with #2, and start another DFS (skipping vertices already labeled), and so on.
Transpose the graph.
Do a DFS starting from each vertex in reverse order; the vertices visited by each DFS belong to the same SCC.
I have this example:
And after the first step starting from E, the labels are:
E
G
K
J
I
H
F
C
D
B
A
So here comes the thing: Is there a difference for DFS in directed/undirected graphs?
I ran the first step in my head ignoring the arrows (as if the graph were undirected) and only got the correct #1 for E (of course) and #2 for G, but #3 fell onto J, not K. So I thought maybe I should respect the arrows, and did a DFS considering that, but after the first pass starting from E, I can't go anywhere from G (which is #2), so I'm stuck there.
Is there anything about DFS on directed graphs that I'm not aware of? I've been taught DFS only on undirected graphs!
Your second step is incomplete. See Wikipedia:
Kosaraju's algorithm works as follows:
Let G be a directed graph and S be an empty stack.
While S does not contain all vertices:
Choose an arbitrary vertex v not in S. Perform a depth-first search starting at v. Each time that depth-first search finishes expanding a vertex u, push u onto S.
Reverse the directions of all arcs to obtain the transpose graph.
While S is nonempty:
Pop the top vertex v from S. Perform a depth-first search starting at v in the transpose graph. The set of visited vertices will give the strongly connected component containing v; record this and remove all these vertices from the graph G and the stack S. Equivalently, breadth-first search (BFS) can be used instead of depth-first search.
So you shouldn't only do something with the last vertex and first vertices, but with each vertex in the DFS.
Also note that you should be backtracking - when you can't go further, you go to the previous vertex and continue from there.
And no, you can't treat it as an undirected graph: the direction of the edges matters significantly.
So, starting from E, you'd, for example, go F, then G, then back to F, then H, then K, then I, then J, then back to I, K, H, F, and finally E, having pushed all visited vertices onto the stack.
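The backtracking and the "push when finished" rule of the first pass can be sketched with an explicit path stack. This is an illustrative sketch only: the function name finishing_stack and the small example graph are assumptions, not the graph from the question.

```python
def finishing_stack(graph):
    """First pass of Kosaraju: push each vertex onto S when the DFS
    has finished expanding it. graph is {vertex: [successors]}."""
    visited, S = set(), []
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        # Each entry keeps a vertex and an iterator over its successors,
        # so the search resumes where it left off after backtracking.
        path = [(root, iter(graph.get(root, ())))]
        while path:
            u, successors = path[-1]
            for v in successors:
                if v not in visited:  # only follow arrows u -> v
                    visited.add(v)
                    path.append((v, iter(graph.get(v, ()))))
                    break
            else:
                path.pop()   # nothing left to explore: backtrack
                S.append(u)  # u is finished
    return S
```

On a hypothetical graph with arcs E->F, F->G, F->H and H->E, the search goes E, F, G, backtracks to F, visits H, and then unwinds, so G is pushed first and E last.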

Why do we need to run DFS on the reverse of a graph in Kosaraju's algorithm?

There's a famous algorithm to find the strongly connected components called Kosaraju's algorithm, which uses two DFSs to solve this problem and runs in Θ(|V| + |E|) time.
First we run DFS on the reverse of the graph (GR, the graph with every edge reversed) to compute the reverse postorder of the vertices, and then we run a second DFS on the main graph G, taking vertices in that reverse postorder, to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I'm not getting the intuition behind the need for the reverse postorder.
How does it help the second DFS find the strongly connected components?
Suppose the result of the first DFS is:
----------v1--------------v2-----------
where "-" stands for any number of vertices, and all the vertices of a strongly connected component g appear between v1 and v2.
The DFS post order gives the following guarantees:
no vertex after v2 points to g in the reverse graph (that is to say, you cannot reach these vertices from g in the original graph)
no vertex before v1 can be pointed to from g in the reverse graph (that is to say, you cannot reach g from these vertices in the original graph)
In short, the first DFS ensures that, in the second DFS, strongly connected components that are visited earlier cannot have any edge pointing to other, still unvisited strongly connected components.
Some Detailed Explanation
Let's simplify the graph as follows:
the whole graph is G
G contains two strongly connected components: one is g, the other is a single vertex v
there is only one edge between v and g, either from v to g or from g to v; call this edge e
g' and e' denote the reverses of g and e
The situations in which this algorithm could fail can be summarized as:
starting the DFS from v when e' points from v to g'
starting the DFS from a vertex inside g' when e' points from g' to v
For situation 1
the original graph looks like g-->v, and the reversed graph looks like g'<--v.
To start the second DFS from v, the post order generated by the first DFS would need to be something like
g1, g2, g3, ..., v
but you can easily check that neither starting the first DFS from v nor from g' gives such a post order. So in this situation, the first DFS guarantees that the second DFS will not start from a vertex that is both outside a strongly connected component and points to it.
For situation 2
Similarly to situation 1, in situation 2, where the original graph is g<--v and the reversed one is g'-->v, it is guaranteed that v is visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you get the knowledge about all nodes that are reachable from that node (you get this information after the first DFS is finished).
Then, when you reverse all the edges and run the DFS once more, for every node you visit you get the knowledge about all nodes that can reach that node in the non-reversed graph (again, you get this info after the second DFS has finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the neighbours you can visit. (I hope this is pretty understandable.) Then, let's say your second DFS reaches that node X, but this time all the edges are reversed. If from your node X "you can see" any other nodes, it means that before reversing the edges, node X was reachable from all the neighbours you see now. By calling the second DFS in the correct order you get, for every node X, all the nodes that were reachable from X in both DFS trees (and so, for every node X, you get the nodes that were both reachable from X and could reach X; those are strongly connected components by definition).
Suppose the list L is the post-order DFS visit of nodes. u->v indicates that there exists a forwarding path from u to v.
If u->v and not v->u, then u must appear at the left of v in L. The nodes in a SCC, such as v and w, however, may appear in any arbitrary order on the list L.
So, if a node x appears strictly before y on the list L, there are three cases:
case1: x->y and y->x, like the case of v and w
case2: x->y and not y->x, like the case of u and v
case3: not x->y and not y->x
Kosaraju's algorithm iterates through L from left to right and runs a DFS starting from each node on the transpose graph (where the direction of every edge is reversed). If some node is reachable by the DFS and does not yet belong to any SCC, then we add this node to the SCC of the current root.
In case 1, we will add y to the SCC of x. In case 3, y and x are in different SCCs.
Case 2 requires some special attention. At the time we call DFS from y, x is already in some other SCC, so we will not add x to the SCC of y. Imagine if you called the DFS starting from root y before the DFS starting from root x, then x would be added to the SCC of y, which is wrong.
In short, the first DFS arranges those nodes which can reach y but can not be reached from y on its left. So the second DFS is able to avoid adding such nodes x to the SCC of y.
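Case 2 can be made concrete with a tiny experiment. The sketch below assumes an original graph with the single edge x -> y (so x and y are two separate SCCs) and a hypothetical helper second_pass that runs only the second phase over a given order; it is meant to illustrate the argument, not to be a full implementation.

```python
def second_pass(transpose, order):
    """Second phase only: DFS on the transpose graph, taking roots in
    the given order and skipping nodes that already belong to an SCC."""
    assigned, components = set(), []
    for root in order:
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [root], [root]
        while stack:
            u = stack.pop()
            for v in transpose.get(u, ()):
                if v not in assigned:
                    assigned.add(v)
                    component.append(v)
                    stack.append(v)
        components.append(component)
    return components

# Original graph: x -> y, so the transpose has the single edge y -> x.
transpose = {"y": ["x"], "x": []}
good = second_pass(transpose, ["x", "y"])  # x is processed before y
bad = second_pass(transpose, ["y", "x"])   # y is processed before x
```

With x first, x is already assigned when the search from y runs, so the two nodes stay in separate components. With y first, the search from y absorbs x and wrongly reports a single component.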

Linear time algorithm to make a graph strongly connected

We have a weakly acyclic digraph.
Also we are given a set A which holds the vertices of G that have in-degree zero and a set B which holds the vertices that have out-degree zero (the size of A is smaller than the size of B).
On top of that, we also know that if items in A and B have a particular order (e.g. A = a1, a2, ..., am and B = b1, b2, ... , bn) a DFS started at ai visits bi (1≤ i ≤ m).
Is it possible to design a linear time algorithm which makes G strongly connected by adding to it as few edges as possible?
Add arcs bj -> aj+1 for j = 1, ..., m-1 and arcs bj -> a1 for j = m, ..., n.
The resulting graph is strongly connected because the a's and b's are strongly connected by the added arcs and the paths from ai to bi and, for every node x, there exist i, j such that there exists a path in the original graph from ai to x and a path in the original graph from x to bj.
We cannot use fewer arcs, because an outgoing arc must be added to each of b1, ..., bn.
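The construction amounts to a couple of list comprehensions. In this sketch the function name connecting_arcs is an assumption, and a and b are assumed to be lists already ordered so that a DFS from a[i] reaches b[i], as the question states.

```python
def connecting_arcs(a, b):
    """a = [a1..am] (in-degree zero), b = [b1..bn] (out-degree zero),
    with m <= n and a path from a_i to b_i for every i <= m."""
    m, n = len(a), len(b)
    assert 1 <= m <= n
    # b_j -> a_{j+1} for j = 1, ..., m-1 (0-based indices below)
    arcs = [(b[j], a[j + 1]) for j in range(m - 1)]
    # b_j -> a_1 for j = m, ..., n
    arcs += [(b[j], a[0]) for j in range(m - 1, n)]
    return arcs
```

The result always contains exactly n arcs, one leaving each vertex of B, and it is produced in time linear in the sizes of A and B.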
Edited - Following does not produce solution with least links:
You can run Tarjan's strongly connected components algorithm (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) in linear time. I propose that you do this and note that "no strongly connected component will be identified before any of its successors". Therefore the first strongly connected component emitted must not be a successor of any of the other components. I suggest that every time you emit a strongly connected component which has no successor, you add a link connecting it to this first component. I suggest that you also add a link every time you essentially restart the Tarjan algorithm with a non-recursive call to strongconnect(), connecting the first component to the vertex you are restarting at.
With these links you can get from the first strong component to every other component, and from every other component to the first strong component. - unfortunately this is not necessarily the solution with the least links - see second comment by Per below.
