I'm having trouble understanding Kosaraju's algorithm for finding the strongly connected components of a directed graph. Here's what I have in my notebook (I'm a student :D):
Start from an arbitrary vertex (label it with #1) and perform a DFS. When you can't go any further, label the last visited vertex with #2, and start another DFS (skipping vertices already labeled), and so on.
Transpose the graph.
Do a DFS starting from each vertex in reverse label order. The vertices visited during each DFS belong to the same SCC.
I have this example:
And after the first step starting from E, the labels are:
E
G
K
J
I
H
F
C
D
B
A
So here comes the thing: Is there a difference for DFS in directed/undirected graphs?
I mentally ran the first step ignoring the arrows (as if the graph were undirected) and only got the correct #1 for E (of course) and #2 for G, but #3 fell onto J, not K. So I thought maybe I should respect the arrows, and did a DFS considering that, but after the first pass starting from E, I can't go anywhere from G (which is #2), so I'm stuck there.
Is there anything about DFS on directed graphs that I'm not aware of? I've been taught DFS only on undirected graphs!
Your second step is incomplete. See Wikipedia:
Kosaraju's algorithm works as follows:
Let G be a directed graph and S be an empty stack.
While S does not contain all vertices:
Choose an arbitrary vertex v not in S. Perform a depth-first search starting at v. Each time that depth-first search finishes expanding a vertex u, push u onto S.
Reverse the directions of all arcs to obtain the transpose graph.
While S is nonempty:
Pop the top vertex v from S. Perform a depth-first search starting at v in the transpose graph. The set of visited vertices will give the strongly connected component containing v; record this and remove all these vertices from the graph G and the stack S. Equivalently, breadth-first search (BFS) can be used instead of depth-first search.
So you shouldn't label only the first and last vertices of each DFS; every vertex is pushed onto the stack when the DFS finishes expanding it.
Also note that you should be backtracking - when you can't go further, you go to the previous vertex and continue from there.
And no, you can't treat it as an undirected graph - the direction of the edges matters significantly.
So, starting from E, you'd, for example, go F, then G, then back to F, then H, then K, then I, then J, then back to I, K, H, F, and finally E, having pushed all visited vertices onto the stack.
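The Wikipedia steps quoted above could be sketched in Python roughly as follows. This is only a minimal illustration, not a reference implementation: the function name `kosaraju_scc`, the edge-list input format and the small example graph are assumptions made here, and the example graph is not the one from the question.

```python
from collections import defaultdict


def kosaraju_scc(vertices, edges):
    """Return the strongly connected components as a list of sets."""
    graph = defaultdict(list)
    transpose = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        transpose[v].append(u)  # same arc with its direction reversed

    # Pass 1: DFS on the original graph, pushing each vertex once it is finished.
    visited = set()
    stack = []

    def dfs_finish(u):
        visited.add(u)
        for w in graph[u]:
            if w not in visited:
                dfs_finish(w)
        stack.append(u)  # u is fully expanded, so it goes onto S

    for v in vertices:
        if v not in visited:
            dfs_finish(v)

    # Pass 2: pop vertices and run a DFS on the transpose graph.
    assigned = set()
    components = []

    def dfs_collect(u, component):
        assigned.add(u)
        component.add(u)
        for w in transpose[u]:
            if w not in assigned:
                dfs_collect(w, component)

    while stack:
        v = stack.pop()
        if v not in assigned:  # skip vertices already placed in a component
            component = set()
            dfs_collect(v, component)
            components.append(component)
    return components


# Hypothetical example: a 3-cycle A, B, C feeding into a 2-cycle D, E.
example_edges = [("A", "B"), ("B", "C"), ("C", "A"),
                 ("C", "D"), ("D", "E"), ("E", "D")]
example_sccs = kosaraju_scc("ABCDE", example_edges)
```

Instead of physically removing vertices from the graph and the stack, the sketch keeps an `assigned` set and skips vertices that already belong to a component, which has the same effect. The recursion is fine for small inputs; a large graph would need an explicit stack.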
Problem : A directed graph G with n vertices and a special vertex u is provided. We call a vertex v ‘interesting’ if there is a path from v to a vertex w such that there is a cycle containing the vertices w and u. Write an O(n) time algorithm which takes G (the whole graph) and the node u as input and returns all the interesting vertices.
Inefficient Algorithm : My idea initially was to consider the node u and compute all the cycles that contain u. (This itself seems like traversing through the nodes using DFS and then forward-tracking as well when you encounter a visited node) Now from each vertex on these cycles we can compute the number of nodes in the graph that do not belong to the cycle(s) but are connected with each particular vertex w not equal to u on a cycle. Add all these values to get the desired answer. This isn't an O(n) algorithm.
There are two cases:
If there are no cycles containing u, then no vertex can be "interesting".
If there are any cycles containing u, then a vertex v is "interesting" if and only if there's a path from v to u. (We don't need to worry about the w in the problem description, because if a cycle contains two vertices u and w, then any path that ends at u can be extended to end at w and vice versa.)
So, the most efficient algorithm would be as follows:
Using DFS, determine if u is in a cycle. (We don't need to find all cycles; we just need to determine whether there are any.)
Using DFS in the "reverse" direction, find all vertices from which u is reachable.
DFS requires O(|V| + |E|) time, which is greater than O(n) = O(|V|) unless |E| is in O(n); but then, there's no way to even read in the entire graph definition in less than |E| time, so this is unavoidable. Whoever gave you this question must not have really thought this through.
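As a rough sketch of the two steps above, assuming the graph is given as a list of directed edges: the names `reachable_from` and `interesting_vertices` are made up for this illustration, and an iterative BFS is used in place of DFS, which gives the same reachability sets.

```python
from collections import defaultdict, deque


def reachable_from(start, adjacency):
    """Return every vertex reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def interesting_vertices(edges, u):
    forward = defaultdict(list)
    backward = defaultdict(list)
    for a, b in edges:
        forward[a].append(b)
        backward[b].append(a)

    # Searching the reversed edges from u finds all vertices that can reach u.
    can_reach_u = reachable_from(u, backward)

    # u lies on a cycle exactly when one of its successors can get back to u.
    on_cycle = any(s in can_reach_u for s in forward[u])
    return can_reach_u if on_cycle else set()
```

A single reverse traversal serves both purposes here: it answers the cycle question and yields the result set, so the total work stays at O(|V| + |E|).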
There's a famous algorithm to find the strongly connected components called Kosaraju's algorithm, which uses two DFSs to solve this problem, and runs in Θ(|V| + |E|) time.
First we use DFS on the reverse of the graph (GR) to compute the reverse postorder of the vertices, and then we apply a second DFS on the main graph G, taking vertices in reverse postorder, to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I'm not getting the intuition behind the need of the reverse post order.
How does it help the second DFS to find the strongly connected components?
Suppose the result of the first DFS is:
----------v1--------------v2-----------
where "-" stands for any number of vertices, and all the vertices of a strongly connected component g appear between v1 and v2.
The DFS post order gives the following guarantees:
no vertex after v2 points to g in the reversed graph (that is to say, you cannot reach these vertices from g in the original graph)
no vertex before v1 is pointed to from g in the reversed graph (that is to say, you cannot reach g from these vertices in the original graph)
In a word, the first DFS ensures that in the second DFS, strongly connected components that are visited earlier cannot have any edge pointing to strongly connected components that are still unvisited.
Some Detailed Explanation
Let's simplify the graph as follows:
the whole graph is G
G contains two strongly connected components, one is g, the other one is a single vertex v
there is only one edge between v and g, either from v to g or from g to v, the name of this edge is e
g', e' represent the reverse of g, e
The situations in which this algorithm could fail can be summarized as
start DFS from v, and e' points from v to g'
start DFS from a vertex inside of g', and e' points from g' to v
For situation 1
the original graph would be like g-->v, and the reversed graph looks like g'<--v.
To start the second DFS from v, the post order generated by the first DFS needs to be something like
g1, g2, g3, ..., v
but you will easily find that neither starting the first DFS from v nor from a vertex of g' can give you such a post order. So in this situation, the first DFS guarantees that the second DFS will not start from a vertex that both lies outside a strongly connected component and points to it.
For situation 2
Similar to situation 1: in situation 2, where the original graph is g<--v and the reversed one is g'-->v, it is guaranteed that v is visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you get the knowledge about all nodes that are reachable from that node (you get this information after the first DFS is finished).
Then, when you invert all the edges and run the DFS once more, for every node you visit you get the knowledge about all nodes that can reach that node in the non-inverted graph (again, you get this info after the second DFS is finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the nodes you can visit. (I hope this is pretty understandable.) Then, let's say your second DFS reaches that node X, but this time all the edges are inverted. If from node X "you can see" any other nodes, it means that before inverting the edges, node X was reachable from all the nodes you see now. By calling the second DFS in the correct order you get, for every node X, all the nodes that were reachable from X in both DFS trees (and so, for every node X you get the nodes that were both reachable from X and could reach X; those form a strongly connected component by definition).
Suppose the list L is the reverse post-order of a DFS over the nodes (the node that finishes last comes first). u->v indicates that there exists a directed path from u to v.
If u->v and not v->u, then u must appear to the left of v in L. The nodes within one SCC, such as v and w, however, may appear in any order relative to each other on the list L.
So, if a node x appears strictly before y on the list L, one of three cases holds:
case1: x->y and y->x, like the case of v and w
case2: x->y and not y->x, like the case of u and v
case3: not x->y and not y->x
Kosaraju's algorithm iterates through L from left to right and runs a DFS from each node on the transpose graph (where the direction of every edge is reversed). If some node is reachable by the DFS and does not yet belong to any SCC, we add it to the SCC of the current root.
In case 1, we will add y to the SCC of x. In case 3, y and x are in different SCCs.
Case 2 requires some special attention. At the time we call DFS from y, x is already in some other SCC, so we will not add x to the SCC of y. Imagine if you called the DFS starting from root y before the DFS starting from root x, then x would be added to the SCC of y, which is wrong.
In short, the first DFS places the nodes that can reach y but cannot be reached from y to the left of y. So the second DFS is able to avoid adding such nodes x to the SCC of y.
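The ordering claim can be checked by brute force on a small graph. The sketch below assumes L is the reverse of the DFS post-order on the original graph; the helper names `reverse_postorder`, `reaches` and `order_is_safe` are invented for this illustration, and the quadratic check is meant only for tiny inputs.

```python
from collections import defaultdict
from itertools import combinations


def reverse_postorder(vertices, graph):
    """List the vertices so that the one finishing last comes first."""
    visited, order = set(), []

    def dfs(u):
        visited.add(u)
        for w in graph[u]:
            if w not in visited:
                dfs(w)
        order.append(u)  # post-order: appended when u is finished

    for v in vertices:
        if v not in visited:
            dfs(v)
    return order[::-1]


def reaches(graph, src, dst):
    """Return True if there is a directed path from src to dst."""
    stack, seen = [src], {src}
    while stack:
        x = stack.pop()
        if x == dst:
            return True
        for y in graph[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def order_is_safe(vertices, edges):
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    L = reverse_postorder(vertices, graph)
    # For x strictly before y in L, the situation
    # "y reaches x but x does not reach y" must never occur.
    return all(
        reaches(graph, x, y) or not reaches(graph, y, x)
        for x, y in combinations(L, 2)
    )
```

For the u, v, w example (an edge u->v plus a cycle between v and w), `order_is_safe` holds no matter which vertex the first DFS happens to start from.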
Let's say we run the Kosaraju-Sharir algorithm on a directed graph, and we have an arc (u,v) in this graph.
In this algorithm we have two DFS passes.
Now suppose we insert vertex u into the first depth tree T.
Where can v appear? Is it in another tree created earlier or maybe later?
Thanks in advance !
I'm learning for a test... So this is a kind of homework I guess but I really have no clue!
Kosaraju's algorithm is based on the fact that the transpose of a graph has exactly the same strongly connected components (SCCs) as the original graph.
1) You have a graph G and an empty Stack S.
2) While S does not contain all the nodes in G, choose an arbitrary vertex u not yet in S and do a DFS from u. When you are done exploring a node v during this DFS, push v onto S.
Back to your question: if there is a directed edge (u,v), v is either in the same DFS tree as u or in a tree created earlier, never in a later one. v is pushed onto S before u unless v is an ancestor of u in the DFS tree (so that u and v lie on a common cycle); in that case u is pushed first. Other nodes may be pushed between v and u.
3) You do DFS of Transpose of G, by popping vertices from stack S, till S is empty. This will get you all the SCC's in the Graph G.
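To see where v can land relative to u for an arc (u,v), the first pass can be instrumented to record which DFS tree each vertex joins and the order in which vertices are pushed. This is only an illustrative sketch; `first_pass` and the three-vertex example are assumptions made here.

```python
from collections import defaultdict


def first_pass(vertices, edges):
    """Run the first DFS pass; return (tree index per vertex, push order)."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    tree_of, finish_order = {}, []

    def dfs(x, tree):
        tree_of[x] = tree
        for y in graph[x]:
            if y not in tree_of:
                dfs(y, tree)
        finish_order.append(x)  # this is the moment x would be pushed onto S

    tree = 0
    for v in vertices:
        if v not in tree_of:
            dfs(v, tree)
            tree += 1
    return tree_of, finish_order


# Hypothetical example: a and b form a cycle, c points into it.
example_edges = [("a", "b"), ("b", "a"), ("c", "a")]
tree_of, finish_order = first_pass("abc", example_edges)
# Every arc (u, v) satisfies tree_of[v] <= tree_of[u]: v is never in a later tree.
# For the arc (b, a), a is pushed after b because a is b's ancestor in the tree.
```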
The wiki: http://en.wikipedia.org/wiki/Kosaraju%27s_algorithm is pretty instructive. I have implemented the algorithm and it is available here.
http://khanna111.com/wordPressBlog/2012/04/11/strongly-connected-components-of-a-graph/
The primary thing to understand is that, after each pass of the first step, the parents end up nearest the top of the stack. In the second step they are therefore popped first, and the search operates on the transpose, where the nodes that were strongly connected in the original graph remain strongly connected.
The whole reason for the first pass is to get the parents to the top of the stack.
What would the following algorithm look like:
a linear-time algorithm which, given an undirected graph G, and a particular edge e in it, determines whether G has a cycle containing e
I have the following idea:
for each v that belongs to V,
    if v is a descendant of e and (e,v) has not been traversed then check following:
        if we visited e before v and left v before we left e then
            the graph contains cycle
I am not sure if this is your homework so I'll just give a little hint - use the properties of breadth-first search tree (with root in any of the two vertices of the edge e), its subtrees which are determined by neighbors of the root and the edges between those subtrees.
Per comingstorm's hint, an undirected edge is itself a cycle. A<->B back and forth as many times as you like.
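One common way to get a linear-time check, which differs from the BFS-subtree hint above, is to drop e = (a, b) from the graph and test whether b is still reachable from a. The sketch below assumes that "cycle" means a simple cycle that does not reuse an edge and that the graph has no parallel edges; `edge_on_cycle` is a name invented for this example.

```python
from collections import defaultdict, deque


def edge_on_cycle(edges, e):
    """Return True if the undirected edge e lies on a simple cycle."""
    a, b = e
    adjacency = defaultdict(list)
    for x, y in edges:
        if {x, y} == {a, b}:
            continue  # leave out e itself (assumes no parallel edges)
        adjacency[x].append(y)
        adjacency[y].append(x)

    # BFS from a; if b is still reachable, that path plus e closes a cycle.
    seen = {a}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            return True
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

Building the adjacency lists and running one BFS both take O(|V| + |E|) time.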
I need to check if a directed graph is strongly connected, or, in other words, if every node can be reached from every other node (not necessarily through a direct edge).
One way of doing this is running a DFS or BFS from every node and checking that all the others are reachable.
Is there a better approach to do that?
Consider the following algorithm.
Start at a random vertex v of the graph G, and run a DFS(G, v).
If DFS(G, v) fails to reach every other vertex in the graph G, then there is some vertex u, such that there is no directed path from v to u, and thus G is not strongly connected.
If it does reach every vertex, then there is a directed path from v to every other vertex in the graph G.
Reverse the direction of all edges in the directed graph G.
Again run a DFS starting at v.
If the DFS fails to reach every vertex, then there is some vertex u, such that in the original graph there is no directed path from u to v.
On the other hand, if it does reach every vertex, then in the original graph there is a directed path from every vertex u to v.
Thus, if G "passes" both DFSs, it is strongly connected. Furthermore, since a DFS runs in O(n + m) time, this algorithm runs in O(2(n + m)) = O(n + m) time, since it requires 2 DFS traversals.
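A minimal Python sketch of this two-traversal check might look like the following. It assumes every endpoint of an edge appears in the vertex list; the names `reaches_all` and `is_strongly_connected` are made up for the example, and the reversed graph is built as a second adjacency map rather than by modifying G in place.

```python
from collections import defaultdict


def reaches_all(start, adjacency, n):
    """Iterative DFS; True if all n vertices are reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == n


def is_strongly_connected(vertices, edges):
    vertices = list(vertices)
    if not vertices:
        return True  # convention chosen here for the empty graph
    forward, backward = defaultdict(list), defaultdict(list)
    for a, b in edges:
        forward[a].append(b)
        backward[b].append(a)  # edge direction reversed
    v = vertices[0]  # any start vertex works
    n = len(vertices)
    return reaches_all(v, forward, n) and reaches_all(v, backward, n)
```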
Tarjan's strongly connected components algorithm (or Gabow's variation) will of course suffice; if there's only one strongly connected component, then the graph is strongly connected.
Both are linear time.
As with a normal depth first search, you track the status of each node: new, seen but still open (it's in the call stack), and seen and finished. In addition, you store the depth when you first reached a node, and the lowest such depth that is reachable from the node (you know this after you finish a node). A node is the root of a strongly connected component if the lowest reachable depth is equal to its own depth. This works even if the depth by which you reach a node from the root isn't the minimum possible.
To check just for whether the whole graph is a single SCC, initiate the dfs from any single node, and when you've finished, if the lowest reachable depth is 0, and every node was visited, then the whole graph is strongly connected.
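A compact recursive sketch of Tarjan's algorithm along these lines is shown below. It uses the discovery number of each node where the description above says "depth", and `tarjan_scc` and its variable names are assumptions made for this illustration.

```python
from collections import defaultdict


def tarjan_scc(vertices, edges):
    """Return the strongly connected components as a list of sets."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    index = {}    # discovery number of each node
    lowest = {}   # lowest discovery number reachable through open nodes
    open_stack, on_stack = [], set()
    components = []

    def visit(u):
        index[u] = lowest[u] = len(index)
        open_stack.append(u)
        on_stack.add(u)
        for w in graph[u]:
            if w not in index:          # new node: recurse, then inherit
                visit(w)
                lowest[u] = min(lowest[u], lowest[w])
            elif w in on_stack:         # seen but still open
                lowest[u] = min(lowest[u], index[w])
        if lowest[u] == index[u]:       # u is the root of a component
            component = set()
            while True:
                w = open_stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == u:
                    break
            components.append(component)

    for v in vertices:
        if v not in index:
            visit(v)
    return components
```

With this sketch, a non-empty graph is strongly connected exactly when `len(tarjan_scc(vertices, edges)) == 1`.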
To check if every node has both paths to and from every other node in a given graph:
1. DFS/BFS from all nodes:
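Tarjan's algorithm assigns every node a depth d[i]. Initially, the root has the smallest depth. A post-order DFS then updates d[i] = min(d[j]) over every neighbor j of i. BFS also works with the same reduction rule d[i] = min(d[j]).
function dfs(i)
    d[i] = i
    mark i as visited
    for each neighbor j of i:
        if j is not visited then dfs(j)
        d[i] = min(d[i], d[j])
If there is a directed path from u to v, then d[u] <= d[v]. Within an SCC, d[v] <= d[u] <= d[v], so all the nodes of an SCC have the same depth. To tell whether a graph is a single SCC, we check whether all nodes have the same d[i]. Note that a single pass of dfs can read d[j] before j has finished, which leaves a stale value: with edges 0->1, 1->2, 2->1, 1->0 and node 1 visiting neighbor 2 before 0, node 2 ends with d = 1 while nodes 0 and 1 end with 0, even though the graph is strongly connected. Repeat the update d[i] = min(d[i], d[j]) until no value changes before comparing.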
2. Two DFS/BFS from the single node:
It is a simplified version of Kosaraju's algorithm. Starting from the root, we check if every node can be reached by DFS/BFS. Then, reverse the direction of every edge. We check if every node can be reached from the same root again. See C++ code.
You can calculate the All-Pairs Shortest Path and see if any is infinite.
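For instance, with Floyd-Warshall on unit edge weights this could be sketched as below. The vertices are assumed to be numbered 0..n-1 and `strongly_connected_by_apsp` is a name chosen for this example. Note that this takes O(n^3) time, so it is far slower than the two-traversal approach on large graphs.

```python
import math


def strongly_connected_by_apsp(n, edges):
    """Floyd-Warshall on unit weights; vertices are assumed to be 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for a, b in edges:
        dist[a][b] = min(dist[a][b], 1)  # every edge has weight 1

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    # Strongly connected exactly when no pair is at infinite distance.
    return all(d < math.inf for row in dist for d in row)
```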
Tarjan's Algorithm has already been mentioned. But I usually find Kosaraju's Algorithm easier to follow even though it needs two traversals of the graph. IIRC, it is also pretty well explained in CLRS.
test-connected(G)
{
    choose a vertex x
    make a list L of vertices reachable from x,
    and another list K of vertices to be explored.
    initially, L = K = x.
    while K is nonempty
        find and remove some vertex y in K
        for each edge (y, z)
            if (z is not in L)
                add z to both L and K
    if L has fewer than n items
        return disconnected
    else return connected
}
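A direct Python rendering of this pseudocode might look as follows; the function name `is_connected_from` and the edge-list input are assumptions for the sketch. As written it only tests whether every vertex is reachable from x, so for strong connectivity of a directed graph the same test would have to be run a second time on the reversed edges.

```python
from collections import defaultdict


def is_connected_from(n, edges, x):
    """Return True if all n vertices are reachable from x along the edges."""
    adjacency = defaultdict(list)
    for y, z in edges:
        adjacency[y].append(z)

    L = {x}   # vertices known to be reachable from x
    K = [x]   # vertices still to be explored
    while K:
        y = K.pop()
        for z in adjacency[y]:
            if z not in L:
                L.add(z)
                K.append(z)
    return len(L) == n
```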
You can use a simple DFS-based algorithm in the style of Kosaraju's that does two DFS traversals of the graph:
The idea is, if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected.
In step 2 of the algorithm, we check if all vertices are reachable from v. In step 4, we check if all vertices can reach v (In reversed graph, if all vertices are reachable from v, then all vertices can reach v in original graph).
Algorithm :
1) Initialize all vertices as not visited.
2) Do a DFS traversal of graph starting from any arbitrary vertex v. If DFS traversal doesn’t visit all vertices, then return false.
3) Reverse all arcs (or find transpose or reverse of graph)
4) Mark all vertices as not-visited in reversed graph.
5) Do a DFS traversal of reversed graph starting from same vertex v (Same as step 2). If DFS traversal doesn’t visit all vertices, then return false. Otherwise return true.
Time Complexity: Time complexity of above implementation is same as Depth First Search which is O(V+E) if the graph is represented using adjacency list representation.
One way of doing this would be to generate the Laplacian matrix for the graph, then calculate the eigenvalues, and finally count the number of zeros. The graph is strongly connected if there exists only one zero eigenvalue.
Note: Pay attention to the slightly different method for creating the Laplacian matrix for directed graphs.
The algorithm to check if a graph is strongly connected is quite straightforward. But why does the below algorithm work?
Algorithm: suppose there is a graph with vertices [A, B, C......Z]
Choose any random node, say J, and perform DFS from it. If all the nodes are reachable then continue to step 2.
Reverse the directions of the edges of the graph by doing transpose.
Again run DFS from node J and check if all the nodes are visited. If yes then the graph is strongly connected and return true.
Performing step 1 makes sense because we have to check if we can reach all the nodes from that node. After this, next logical step could be
i) Now do this for all other nodes
ii) or try to reach node J from every other node. Because once you reach node J, you are sure that you can reach every other node because of step 1.
This is what we are trying to do in steps 2 & 3. If in a transposed graph node J is able to reach all other nodes then this implies that in original graph all other nodes can reach J.