Algorithm to check if a directed graph is strongly connected

I need to check if a directed graph is strongly connected, in other words, if every node can be reached from every other node (not necessarily through a direct edge).
One way of doing this is to run a DFS or BFS from every node and check that all other nodes are reachable.
Is there a better approach?

Consider the following algorithm.
Start at a random vertex v of the graph G, and run a DFS(G, v).
If DFS(G, v) fails to reach every other vertex in the graph G, then there is some vertex u, such that there is no directed path from v to u, and thus G is not strongly connected.
If it does reach every vertex, then there is a directed path from v to every other vertex in the graph G.
Reverse the direction of all edges in the directed graph G.
Again run a DFS starting at v.
If the DFS fails to reach every vertex, then there is some vertex u, such that in the original graph there is no directed path from u to v.
On the other hand, if it does reach every vertex, then in the original graph there is a directed path from every vertex u to v.
Thus, if G "passes" both DFSs, it is strongly connected. Furthermore, a DFS runs in O(n + m) time (n vertices, m edges), and the algorithm needs two DFS traversals plus one pass to reverse the edges, so it runs in O(2(n + m)) = O(n + m) time overall.
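The two traversals described above can be sketched in Python as follows. This is only a minimal illustration, not the answerer's own code: the function names, the edge-list input format, and the choice to treat the empty graph as strongly connected are all assumptions made for the example, and every edge endpoint is assumed to appear in `vertices`.

```python
from collections import defaultdict


def reachable_from(adj, start):
    """Return the set of vertices reachable from start (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_strongly_connected(vertices, edges):
    """vertices: iterable of hashable labels; edges: iterable of (u, v) pairs."""
    vertices = list(vertices)
    if not vertices:
        return True  # assumption: the empty graph counts as strongly connected
    forward = defaultdict(list)   # original graph
    backward = defaultdict(list)  # graph with every edge reversed
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)
    start = vertices[0]
    n = len(set(vertices))
    # Pass 1: start must reach everything. Pass 2: everything must reach start.
    return (len(reachable_from(forward, start)) == n
            and len(reachable_from(backward, start)) == n)
```

Building the reversed adjacency lists up front costs one extra pass over the edges, which keeps the whole check within O(n + m).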

Tarjan's strongly connected components algorithm (or Gabow's variation) will of course suffice; if there's only one strongly connected component, then the graph is strongly connected.
Both are linear time.
As with a normal depth first search, you track the status of each node: new, seen but still open (it's in the call stack), and seen and finished. In addition, you store the depth when you first reached a node, and the lowest such depth that is reachable from the node (you know this after you finish a node). A node is the root of a strongly connected component if the lowest reachable depth is equal to its own depth. This works even if the depth by which you reach a node from the root isn't the minimum possible.
To check just whether the whole graph is a single SCC, start the DFS from any single node. When it has finished, the whole graph is strongly connected if every node was visited and the start node (depth 0) is the only node whose lowest reachable depth equals its own depth.
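A possible sketch of that single-DFS check in Python is shown below. It is a simplification rather than full Tarjan: the usual explicit stack is left out, on the assumption that we stop caring about lowlink values as soon as any vertex other than the start turns out to be an SCC root. The function name and the adjacency-list input (vertices numbered 0..n-1, start vertex 0) are choices made for this example. The DFS is recursive, so very deep graphs would need an iterative version.

```python
def single_scc_by_lowlink(n, adj):
    """adj[v] lists the successors of v, for vertices 0..n-1.

    Returns True iff the whole graph is one strongly connected component.
    """
    if n == 0:
        return True  # assumption: the empty graph counts as strongly connected
    index = [None] * n   # discovery order ("depth when first reached")
    low = [0] * n        # lowest index reachable from the vertex
    counter = 0
    only_root_is_scc_root = True

    def dfs(v):
        nonlocal counter, only_root_is_scc_root
        index[v] = low[v] = counter
        counter += 1
        for w in adj[v]:
            if index[w] is None:
                dfs(w)
                low[v] = min(low[v], low[w])
            else:
                # Until some SCC is closed, every visited vertex is still open,
                # so this update matches Tarjan's "w is on the stack" case.
                low[v] = min(low[v], index[w])
        if v != 0 and low[v] == index[v]:
            only_root_is_scc_root = False  # v closes an SCC that excludes vertex 0

    dfs(0)
    return only_root_is_scc_root and all(i is not None for i in index)
```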

To check if every node has both paths to and from every other node in a given graph:
1. DFS/BFS from all nodes:
Tarjan's algorithm assigns every node a depth d[i]. Initially, the root has the smallest depth. We then apply the post-order DFS update d[i] = min(d[i], d[j]) over every neighbor j of i. BFS also works with the same reduction rule d[i] = min(d[i], d[j]).
function dfs(i)
    d[i] = i
    mark i as visited
    for each neighbor j of i:
        if j is not visited then dfs(j)
        d[i] = min(d[i], d[j])
If there is a forward path from u to v, then d[u] <= d[v]. Within an SCC, d[v] <= d[u] <= d[v], so all the nodes in an SCC have the same depth. To tell whether the graph is a single SCC, we check whether all nodes have the same d[i]. Note that a neighbor j that is visited but not yet finished may still hold a provisional d[j], so a single pass is not always enough; the update may have to be repeated until no d[i] changes.
2. Two DFS/BFS from the single node:
This is a simplified version of Kosaraju’s algorithm. Starting from the root, we check whether every node can be reached by DFS/BFS. Then we reverse the direction of every edge and check whether every node can again be reached from the same root.

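You can compute all-pairs shortest paths and check whether any distance is infinite. This is much slower than the linear-time approaches (Floyd-Warshall, for example, takes O(n^3) time).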
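As a rough illustration of the all-pairs idea, the sketch below uses Warshall's boolean transitive closure instead of actual shortest-path distances, since "distance is infinite" only means "not reachable". The function name and the input format (vertices 0..n-1 plus an edge list) are assumptions for the example.

```python
def strongly_connected_by_closure(n, edges):
    """Return True iff every vertex reaches every other vertex (O(n^3) time)."""
    # reach[i][j] is True when j is known to be reachable from i.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        reach[u][v] = True
    # Warshall's algorithm: allow vertex k as an intermediate stop.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return all(all(row) for row in reach)
```

This is mainly useful when the closure matrix is needed anyway; for a plain yes/no answer the two-DFS check is cheaper.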

Tarjan's Algorithm has been already mentioned. But I usually find Kosaraju's Algorithm easier to follow even though it needs two traversals of the graph. IIRC, it is also pretty well explained in CLRS.

test-connected(G)
{
    choose a vertex x
    make a list L of vertices reachable from x,
    and another list K of vertices to be explored.
    initially, L = K = x.
    while K is nonempty
        find and remove some vertex y in K
        for each edge (y, z)
            if (z is not in L)
                add z to both L and K
    if L has fewer than n items
        return disconnected
    else return connected
}

You can use a simple DFS-based algorithm, derived from Kosaraju’s, that does two DFS traversals of the graph:
The idea is, if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected.
In step 2 of the algorithm, we check if all vertices are reachable from v. In step 5, we check if all vertices can reach v (in the reversed graph, if all vertices are reachable from v, then all vertices can reach v in the original graph).
Algorithm :
1) Initialize all vertices as not visited.
2) Do a DFS traversal of graph starting from any arbitrary vertex v. If DFS traversal doesn’t visit all vertices, then return false.
3) Reverse all arcs (or find transpose or reverse of graph)
4) Mark all vertices as not-visited in reversed graph.
5) Do a DFS traversal of reversed graph starting from same vertex v (Same as step 2). If DFS traversal doesn’t visit all vertices, then return false. Otherwise return true.
Time Complexity: the same as depth-first search, which is O(V+E) if the graph is represented using an adjacency list.

One way of doing this would be to generate the Laplacian matrix for the graph, then calculate the eigenvalues, and finally count the number of zeros. The graph is strongly connected if there exists only one zero eigenvalue.
Note: Pay attention to the slightly different method for creating the Laplacian matrix for directed graphs.

The algorithm to check if a graph is strongly connected is quite straightforward. But why does the below algorithm work?
Algorithm: suppose there is a graph with vertices [A, B, C, ..., Z].
1. Choose any node, say J, and perform a DFS from it. If all the nodes are reachable, continue to step 2.
2. Reverse the directions of the edges of the graph by taking its transpose.
3. Again run a DFS from node J and check whether all the nodes are visited. If yes, the graph is strongly connected; return true.
Performing step 1 makes sense because we have to check if we can reach all the nodes from that node. After this, next logical step could be
i) Now do this for all other nodes
ii) or try to reach node J from every other node. Because once you reach node J, you are sure that you can reach every other node because of step 1.
This is what we are trying to do in steps 2 & 3. If in a transposed graph node J is able to reach all other nodes then this implies that in original graph all other nodes can reach J.

Related

Specific graph problem that needs a more creative solution

A directed graph (|V| = a, |E| = b) is given.
Each vertex has a specific weight. For each vertex (1..a), we want to find the maximum-weight vertex that is reachable from it.
Update 1: one nice answer was provided by @Paul, in O(b + a log a), but I am looking for an O(a + b) algorithm, if one exists.
Is there a different, more efficient way of doing it?
Yes, it's possible to modify Tarjan's SCC algorithm to solve this problem in linear time.
Tarjan's algorithm uses two node fields to drive its SCC finding logic: index, which represents the order in which the algorithm discovers the nodes; and lowlink, the minimum index reachable by a sequence of tree arcs followed by a back arc. As part of the same depth-first traversal, we can compute another field, maxweight, which has one of two meanings:
For a node not yet included in a finished SCC, it represents the maximum weight reachable by a sequence of tree arcs, optionally followed by a cross arc to another SCC and then any subsequent path.
For nodes in a finished SCC, it represents the maximum weight reachable.
The logic for computing maxweight is as follows. If we discover an arc from v to a new node w, then vw is a tree arc, so we compute w.maxweight recursively and update v.maxweight = max(v.maxweight, w.maxweight). If w is on the stack, then we do nothing, because vw is a back arc and not included in the definition of maxweight. Otherwise, vw is a cross arc, and we do the same update that we would have done for a tree arc, just without the recursive call.
When Tarjan's algorithm identifies an SCC, it's because it has a node r with r.lowlink == r.index. Since r is the depth-first search root of this SCC, its value of maxweight is correct for the whole SCC. Instead of recording each node in the SCC, we simply update its maxweight to r.maxweight.
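A sketch of this modified Tarjan traversal in Python could look like the following. The function name and the input format (a list of weights and adjacency lists for vertices 0..n-1) are assumptions for illustration, and each vertex is taken to reach itself, so its own weight counts. The traversal is recursive, so large inputs would need an explicit stack.

```python
def max_reachable_weight(weights, adj):
    """Return best[v] = maximum weight among vertices reachable from v (v included)."""
    n = len(weights)
    index = [None] * n         # discovery order
    lowlink = [0] * n
    on_stack = [False] * n
    maxweight = list(weights)  # every vertex starts with its own weight
    stack = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in adj[v]:
            if index[w] is None:      # tree arc: recurse, then pull up both fields
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
                maxweight[v] = max(maxweight[v], maxweight[w])
            elif on_stack[w]:         # arc into the open SCC: only lowlink changes
                lowlink[v] = min(lowlink[v], index[w])
            else:                     # cross arc into a finished SCC
                maxweight[v] = max(maxweight[v], maxweight[w])
        if lowlink[v] == index[v]:    # v is the root of an SCC
            while True:
                w = stack.pop()
                on_stack[w] = False
                maxweight[w] = maxweight[v]  # the root's value holds for the whole SCC
                if w == v:
                    break

    for v in range(n):
        if index[v] is None:
            visit(v)
    return maxweight
```

For example, with weights `[1, 5, 3, 10]` and arcs 0->1, 1->2, 2->1, vertices 0, 1 and 2 all get 5, while the isolated vertex 3 keeps 10.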
Sort all nodes by weight in decreasing order and create the graph g' with all edges in E reversed (i.e. if there's an edge a -> b in g, there's an edge b -> a in g'). In this graph you can now propagate the maximum-value by simple DFS. Do this iteratively for all nodes and terminate when a maximum-weight has already been assigned.
As pseudocode:
dfs_assign_weight_reachable(node, weight):
    if node.max_weight_reachable >= weight:
        return
    node.max_weight_reachable = weight
    for n in neighbors of node:
        dfs_assign_weight_reachable(n, weight)
g' = g with all edges reversed
nodes = nodes from g' sorted descendingly by weight
assign max_weight_reachable = -inf to each node in nodes
for node in nodes:
    dfs_assign_weight_reachable(node, node.weight)
UPDATE:
The tight bound is O(b + a log a). The a log a term comes from the sorting step. Each edge gets visited once during the reversal step and once while the maximum weights are assigned, which gives the b term.
Acknowledgement:
I'd like to thank @SerialLazer for the time invested in a discussion about the time complexity of the above algorithm and for helping me figure out the correct bound.
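The sort-and-propagate answer can be turned into runnable Python roughly as follows. This is a sketch under a few assumptions: vertices are numbered 0..n-1, the graph arrives as an edge list, `None` plays the role of the -inf marker, and an explicit stack replaces the recursion of the pseudocode.

```python
def max_reachable_weight_sorted(weights, edges):
    """Return best[v] = maximum weight reachable from v, in O(b + a log a) time."""
    n = len(weights)
    reversed_adj = [[] for _ in range(n)]
    for u, v in edges:
        reversed_adj[v].append(u)  # edge u -> v becomes v -> u
    best = [None] * n              # None means "not assigned yet"
    # Heaviest vertices first, so the first value a vertex receives is its maximum.
    for source in sorted(range(n), key=lambda v: weights[v], reverse=True):
        if best[source] is not None:
            continue
        best[source] = weights[source]
        stack = [source]
        while stack:
            node = stack.pop()
            for pred in reversed_adj[node]:
                if best[pred] is None:
                    best[pred] = weights[source]
                    stack.append(pred)
    return best
```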

Find an O(n) algorithm that returns all interesting vertices of the graph

Problem : A directed graph G with n vertices and a special vertex u is provided. We call a vertex v ‘interesting’ if there is a path from v to a vertex w such that there is a cycle containing the vertices w and u. Write an O(n) time algorithm which takes G (the whole graph) and the node u as input and returns all the interesting vertices.
Inefficient algorithm: My initial idea was to take the node u and compute all the cycles that contain u. (This itself seems to amount to traversing the nodes using DFS and forward-tracking as well when you encounter a visited node.) Then, for each vertex w not equal to u on these cycles, we can count the nodes of the graph that do not belong to the cycle(s) but are connected to w. Adding all these values gives the desired answer. This isn't an O(n) algorithm.
There are two cases:
If there are no cycles containing u, then no vertex can be "interesting".
If there are any cycles containing u, then a vertex v is "interesting" if and only if there's a path from v to u. (We don't need to worry about the w in the problem description, because if a cycle contains two vertices u and w, then any path that ends at u can be extended to end at w and vice versa.)
So, the most efficient algorithm would be as follows:
1. Using DFS, determine if u is in a cycle. (We don't need to find all cycles; we just need to determine whether there are any.)
2. Using DFS in the "reverse" direction, find all vertices from which u is reachable.
DFS requires O(|V| + |E|) time, which is greater than O(n) = O(|V|) unless |E| is in O(n); but then, there's no way to even read in the entire graph definition in less than |E| time, so this is unavoidable. Whoever gave you this question must not have really thought this through.
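The two steps of this answer might be sketched in Python like this. The function name and input format are made up for the example, and one detail is an assumption on my part: u is considered to lie on a cycle when some predecessor of u is reachable from u, which also counts a self-loop at u as a cycle.

```python
def interesting_vertices(n, edges, u):
    """Return the set of 'interesting' vertices for special vertex u (vertices 0..n-1)."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    def reach(adj, start):
        """Iterative DFS returning every vertex reachable from start via adj."""
        seen = {start}
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    # Step 1: u is on a cycle iff u can reach one of its own predecessors.
    from_u = reach(succ, u)
    if not any(p in from_u for p in pred[u]):
        return set()
    # Step 2: a search over reversed edges finds everything that can reach u.
    return reach(pred, u)
```

Both searches touch each vertex and edge at most once, so the running time is O(|V| + |E|), matching the bound discussed above.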

Why do we need to run DFS on the reverse of a graph in Kosaraju's algorithm?

There's a famous algorithm for finding strongly connected components called Kosaraju's algorithm, which uses two DFS passes and runs in Θ(|V| + |E|) time.
First we run DFS on the reverse of the graph (G^R) to compute the reverse postorder of the vertices, and then we run a second DFS on the original graph G, taking the vertices in that reverse postorder, to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I don't get the intuition behind the need for the reverse postorder.
How does it help the second DFS find the strongly connected components?
Suppose the result of the first DFS is:
----------v1--------------v2-----------
where "-" stands for any number of vertices, and all the vertices of a strongly connected component g appear between v1 and v2.
The post order of the DFS gives the following guarantees:
all vertices after v2 do not point to g in the reverse graph (that is, you cannot reach these vertices from g in the original graph)
all vertices before v1 cannot be pointed to from g in the reverse graph (that is, you cannot reach g from these vertices in the original graph)
In short, the first DFS ensures that in the second DFS, strongly connected components that are visited earlier cannot have any edge pointing to other unvisited strongly connected components.
Some Detailed Explanation
Let's simplify the graph as follows:
the whole graph is G
G contains two strongly connected components: one is g, the other is a single vertex v
there is only one edge between v and g, either from v to g or from g to v; call this edge e
g', e' represent the reverse of g, e
The situations in which this algorithm could fail can be summarized as:
1. start the DFS from v, and e' points from v to g'
2. start the DFS from a vertex inside g', and e' points from g' to v
For situation 1
the original graph would look like g-->v, and the reversed graph like g'<--v.
To start the second DFS from v, the post order generated by the first DFS would need to be something like
g1, g2, g3, ..., v
but you can easily check that neither starting the first DFS from v nor from g' can give you such a post order. So in this situation, the first DFS guarantees that the second DFS will not start from a vertex that is both outside of and pointing to a strongly connected component.
For situation 2
similarly to situation 1, in situation 2, where the original graph is g<--v and the reversed one is g'-->v, it is guaranteed that v is visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you learn which nodes are reachable from that node (you have this information once the first DFS is finished).
Then, when you reverse all the edges and run the DFS once more, for every node you visit you learn which nodes can reach that node in the original graph (again, you have this information once the second DFS has finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the neighbours you can visit. (I hope this is pretty understandable.) Then, let's say your second DFS reaches that node X, but this time all the edges are reversed. If from node X "you can see" any other nodes, it means that before the edges were reversed, node X was reachable from all the neighbours you see now. By calling the second DFS in the correct order, you get for every node X all the nodes that were reachable from X in both DFS trees (and so, for every node X, you get the nodes that were both reachable from X and could reach X; those form strongly connected components by definition).
Suppose the list L is the reverse post-order of a DFS on the graph, that is, the nodes listed by decreasing finish time. u->v indicates that there exists a forward path from u to v.
If u->v and not v->u, then u must appear to the left of v in L. The nodes within an SCC, such as v and w, however, may appear in any order in L.
So, if a node x appears strictly before y in L, there are three cases:
case1: x->y and y->x, like the case of v and w
case2: x->y and not y->x, like the case of u and v
case3: not x->y and not y->x
Kosaraju's algorithm iterates through L from left to right and runs a DFS from each node on the transpose graph (where the direction of every edge is reversed). If some node is reachable by this DFS and does not yet belong to any SCC, we add it to the SCC of the current root.
In case 1, we will add y to the SCC of x. In case 3, y and x are in different SCCs.
Case 2 requires some special attention. At the time we call DFS from y, x is already in some other SCC, so we will not add x to the SCC of y. Imagine if you called the DFS starting from root y before the DFS starting from root x: then x would be added to the SCC of y, which is wrong.
In short, the first DFS places the nodes that can reach y but cannot be reached from y to the left of y. So the second DFS avoids adding such nodes x to the SCC of y.
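To make the ordering argument concrete, here is a hedged Python sketch of Kosaraju's algorithm in the variant this last answer describes: the first pass runs on the original graph and records the post-order, and the second pass scans that order reversed while searching the transpose graph. The function name, the edge-list input and the component numbering are illustrative choices, and both passes use explicit stacks instead of recursion.

```python
def kosaraju_scc(n, edges):
    """Return component[v] = id of the SCC containing v, for vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]  # transpose graph
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: post-order of a DFS on the original graph.
    visited = [False] * n
    postorder = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(adj[w])))
                    break
            else:
                postorder.append(v)  # all successors done: v is finished
                stack.pop()

    # Pass 2: scan the list L (reversed post-order) and search the transpose graph.
    component = [None] * n
    count = 0
    for root in reversed(postorder):
        if component[root] is not None:
            continue  # already claimed by an earlier root
        component[root] = count
        stack = [root]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if component[w] is None:
                    component[w] = count
                    stack.append(w)
        count += 1
    return component
```

Running the second pass in plain post-order instead of reversed post-order would reproduce the failure from case 2: a root y could absorb a node x that reaches y but is not reachable from it.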

Using BFS or DFS to determine the connectivity in a non connected graph?

How can I design an algorithm using BFS or DFS to determine the connected components of a disconnected graph? The algorithm must be able to report the set of vertices of each connected component.
This is my approach:
1) Initialize all vertices as not visited.
2) Do a DFS traversal of the graph starting from any arbitrary vertex v. If the DFS traversal doesn’t visit all vertices, then return false.
3) Reverse all arcs (or find the transpose or reverse of the graph).
4) Mark all vertices as not visited in the reversed graph.
5) Do a DFS traversal of the reversed graph starting from the same vertex v (same as step 2). If the DFS traversal doesn’t visit all vertices, then return false. Otherwise return true.
The idea is, if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected. In step 2, we check if all vertices are reachable from v. In step 5, we check if all vertices can reach v (in the reversed graph, if all vertices are reachable from v, then all vertices can reach v in the original graph).
Any idea how to improve this solution?
How about
let vertices = input
let results = empty list
while there are vertices in vertices:
    create a set S
    choose an arbitrary unexplored vertex, and put it in S.
    run BFS/DFS from that vertex, and with each vertex found, remove it from vertices and add it to S.
    add S to results
return results
When this completes, you'll have a list of sets of vertices, where each set was made from graph searching from some vertex (making the vertices in each set connected). Assuming an undirected graph, this should work OK (off the top of my head).
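A minimal Python version of that loop, assuming an undirected graph given as a vertex collection plus an edge list (both input formats and the function name are assumptions for the example), might be:

```python
from collections import deque


def connected_components(vertices, edges):
    """Return a list of sets, one per connected component of an undirected graph."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    unexplored = set(adj)
    results = []
    for start in adj:  # dict order keeps the output deterministic
        if start not in unexplored:
            continue
        unexplored.remove(start)
        component = {start}
        queue = deque([start])
        while queue:  # BFS from start
            u = queue.popleft()
            for w in adj[u]:
                if w in unexplored:
                    unexplored.remove(w)
                    component.add(w)
                    queue.append(w)
        results.append(component)
    return results
```

Each vertex leaves `unexplored` exactly once and each edge is scanned from both endpoints, so the total work stays O(V + E).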
This can be done easily using either BFS or DFS in time complexity of O(V+E).
// this is the DFS solution
// dfs(i) is assumed to mark every vertex it reaches as visited and to print it
numCC = 0;
dfs_num.assign(V, UNVISITED); // sets all vertices’ state to UNVISITED
for (int i = 0; i < V; i++) // for each vertex i in [0..V-1]
    if (dfs_num[i] == UNVISITED) // if vertex i is not visited yet
        printf("CC %d:", ++numCC), dfs(i), printf("\n");
The output of above code for 3 connected components would be something like :
// CC 1: 0 1 2 3 4
// CC 2: 5
// CC 3: 6 7 8
A standard approach for solving this problem is to run DFS starting from each node.
Start by labeling all nodes as unvisited. Then, iterate over the nodes in any order. For each node, if it's not already labeled as being in a connected component, run DFS from that node and mark all reachable nodes as being in the same CC. If the node was already marked, skip it. This then discovers all CC's of the graph one CC at a time.
Moreover, this is very efficient. If there are m edges and n nodes, the runtime is O(n) for the first step (marking all nodes as unvisited) and O(m + n) for the second, since each node and edge are visited at most twice. Thus the overall runtime is O(m + n).
Hope this helps!
Since you seem to be working with a directed graph, and you want to find the connected components (not the strongly connected ones), you have to convert your graph to an undirected graph first. So for each edge, add a temporary edge in the opposite direction. Then you can use a simple DFS starting from each vertex that hasn't been visited yet to find the connected components. Finally, you can remove the temporary edges.
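One way to sketch this for a directed input in Python is shown below. Instead of adding and later removing temporary edges in the graph itself, this version builds a separate symmetric adjacency list, which is a small deviation from the description; the function name and the labelling scheme are likewise assumptions.

```python
def weakly_connected_components(n, directed_edges):
    """Return label[v] = component number of v, ignoring edge directions."""
    undirected = [[] for _ in range(n)]
    for u, v in directed_edges:
        undirected[u].append(v)
        undirected[v].append(u)  # the "temporary" edge in the opposite direction
    label = [None] * n
    count = 0
    for s in range(n):
        if label[s] is not None:
            continue
        label[s] = count
        stack = [s]
        while stack:  # DFS over the symmetric adjacency lists
            u = stack.pop()
            for w in undirected[u]:
                if label[w] is None:
                    label[w] = count
                    stack.append(w)
        count += 1
    return label
```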

Explanation of Algorithm for finding articulation points or cut vertices of a graph

I have searched the net and could not find any explanation of a DFS algorithm for finding all articulation vertices of a graph. There is not even a wiki page.
From reading around, I got to know the basic facts from here. PDF
There is a variable at each node which is actually looking at back edges and finding the closest and upmost node towards the root node. After processing all edges it would be found.
But I do not understand how to find this down & up variable at each node during the execution of DFS. What is this variable doing exactly?
Please explain the algorithm.
Thanks.
Finding articulation vertices is an application of DFS.
In a nutshell,
1. Apply DFS on a graph. Get the DFS tree.
2. A node which is visited earlier is a "parent" of those nodes which are reached by it and visited later.
3. If any child of a node does not have a path to any of the ancestors of its parent, it means that removing this node would make this child disjoint from the graph.
4. There is an exception: the root of the tree. If it has more than one child, then it is an articulation point, otherwise not.
Point 3 essentially means that this node is an articulation point.
Now for a child, this path to the ancestors of the node would be through a back-edge from it or from any of its children.
All this is explained beautifully in this PDF.
I'll try to develop an intuitive understanding on how this algorithm works and also give commented pseudocode that outputs Bi-Components as well as bridges.
It's actually easy to develop a brute force algorithm for articulation points. Just take out a vertex, and run BFS or DFS on a graph. If it remains connected, then the vertex is not an articulation point, otherwise it is. This will run in O(V(E+V)) = O(EV) time. The challenge is how to do this in linear time (i.e. O(E+V)).
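For reference, the brute-force idea can be written in a few lines of Python. The sketch below generalizes "remains connected" to "the number of connected components does not increase", so that it also behaves sensibly on graphs that are disconnected to begin with; that generalization, the function name and the edge-list input are assumptions for the example.

```python
def articulation_points_bruteforce(n, edges):
    """Return the articulation points of an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def count_components(removed):
        """Count connected components, pretending vertex `removed` is deleted."""
        seen = [False] * n
        if removed is not None:
            seen[removed] = True  # never enter the removed vertex
        components = 0
        for s in range(n):
            if seen[s]:
                continue
            components += 1
            seen[s] = True
            stack = [s]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if not seen[y]:
                        seen[y] = True
                        stack.append(y)
        return components

    base = count_components(None)
    # One full traversal per vertex: O(V(E+V)) overall.
    return [v for v in range(n) if count_components(v) > base]
```

Slow as it is, a version like this is handy as a reference when testing a linear-time implementation on small random graphs.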
Articulation points connect two (or more) subgraphs. This means there are no edges from one subgraph to another. So imagine you are within one of these subgraphs and visiting its nodes. As you visit a node, you flag it and then move on to the next unflagged node using some available edge. While you are doing this, how do you know you are still within the same subgraph? The insight here is that if you are within the same subgraph, you will eventually see a flagged node through an edge while visiting an unflagged node. This is called a back edge and indicates that you have a cycle. As soon as you find a back edge, you can be confident that all the nodes from that flagged node to the one you are visiting right now are part of the same subgraph and there are no articulation points in between. If you didn't see any back edges, then all the nodes you visited so far are articulation points.
So we need an algorithm that visits vertices and marks all nodes between the target of a back edge and the node currently being visited as being within the same subgraph. There may obviously be subgraphs within subgraphs, so we need to select the largest subgraph we have so far. These subgraphs are called Bi-Components. We can implement this algorithm by assigning each bi-component an ID which is initialized as just a count of the number of vertices we have visited so far. Later, as we find back edges, we can reset the bi-component ID to the lowest we have found so far.
We obviously need two passes. In the first pass, we want to figure out which vertex we can see from each vertex through back edges, if any. In the second pass we want to visit vertices in the opposite direction and collect the minimum bi-component ID (i.e. earliest ancestor accessible from any descendants). DFS naturally fits here. In DFS we go down first and then come back up so both of the above passes can be done in a single DFS traversal.
Now without further ado, here's the pseudocode:
time = 0
visited[i] = false for all i
parent[i] = null for all i
GetArticulationPoints(u)
    visited[u] = true
    u.st = time++
    u.low = u.st //keeps track of highest ancestor reachable from any descendants
    dfsChild = 0 //needed because the root is an articulation point only if it has more than one DFS child
    isCut = false
    for each ni in adj[u]
        if not visited[ni]
            parent[ni] = u //set before recursing so that ni can skip the tree edge back to u
            ++dfsChild
            GetArticulationPoints(ni)
            u.low = Min(u.low, ni.low) //while coming back up, get the lowest reachable ancestor from descendants
            if ni.low >= u.st then isCut = true //the subtree of ni cannot bypass u
            if ni.low > u.st then Output edge (u, ni) as bridge
        else if ni <> parent[u] //while going down, note down the back edges
            u.low = Min(u.low, ni.st)
    //For dfs root node, we can't mark it as articulation point because
    //disconnecting it may not decompose graph. So we have extra check just for root node.
    if (parent[u] != null and isCut) or (parent[u] = null and dfsChild > 1)
        Output u as articulation point
    output u.low as bicomponent ID
One fact that seems to be left out of all the explanations:
Fact #1: In a depth first search spanning tree (DFSST), every backedge connects a vertex to one of its ancestors.
This is essential for the algorithm to work, it is why an arbitrary spanning tree won't work for the algorithm. It is also the reason why the root is an articulation point iff it has more than 1 child: there cannot be a backedge between the subtrees rooted at the children of the spanning tree's root.
A proof of the statement is, let (u, v) be a backedge where u is not an ancestor of v, and (WLOG) u is visited before v in the DFS. Let p be the deepest ancestor of both u and v. Then the DFS would have to visit p, then u, then somehow revisit p again before visiting v. But it isn't possible to revisit p before visiting v because there is an edge between u and v.
Call V(c) the set of vertices in the subtree rooted at c in the DFSST
Call N(c) the set of vertices that have a neighbor in V(c) (by tree edge or by backedge)
Fact #2:
For a non root node u,
If u has a child c such that N(c) ⊆ V(c) ∪ {u} then u is an articulation point.
Reason: for every vertex w in V(c), every path from the root to w must contain u. If not, such a path would have to contain a backedge that connects an ancestor of u to a descendant of u (by Fact #1), so N(c) would contain a vertex outside V(c) ∪ {u}.
Fact #3:
The converse of fact #2 is also true.
Reason: if no child c of u satisfies that condition, then every descendant of u has a path to the root that doesn't pass through u.
A descendant in V(c) can bypass u with a path through a backedge that connects V(c) to N(c) \ (V(c) ∪ {u}).
So for the algorithm, you only need to know 2 things about each non-root vertex u:
The depth of the vertex, say D(u)
The minimum depth of N(u), also called the lowpoint, let's say L(u)
So if a vertex u has a child c, and L(c) is less than D(u), then the subtree rooted at c has a backedge that reaches an ancestor of u, so c can bypass u. If this holds for every child of u, then u is not an articulation point, by Fact #3. Conversely, if some child c has L(c) not less than D(u), then u is an articulation point, by Fact #2.
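The depth/lowpoint test can be sketched in Python as below. Two things here are assumptions rather than part of the explanation above: `low[c]` ignores the tree edge from c back to its parent, so it may exceed D(u) and the test becomes `low[c] >= depth[u]`; and the graph is taken to be simple (no parallel edges), because the parent is skipped by vertex identity. The function name and input format are illustrative, and the DFS is recursive.

```python
def articulation_points(n, edges):
    """Return the sorted articulation points of an undirected simple graph (vertices 0..n-1)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    depth = [None] * n  # D(u): depth in the DFS tree
    low = [0] * n       # lowpoint: minimum depth reachable from the subtree of u
    cut = set()

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        children = 0
        for w in adj[u]:
            if depth[w] is None:  # tree edge to a new child
                children += 1
                dfs(w, u, d + 1)
                low[u] = min(low[u], low[w])
                # Fact #2: the subtree of w cannot climb above u.
                if parent is not None and low[w] >= depth[u]:
                    cut.add(u)
            elif w != parent:     # backedge to an ancestor
                low[u] = min(low[u], depth[w])
        # The root is an articulation point iff it has more than one child.
        if parent is None and children > 1:
            cut.add(u)

    for s in range(n):
        if depth[s] is None:
            dfs(s, None, 0)
    return sorted(cut)
```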
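If the low value of some child v of u is greater than or equal to the dfsnum of u, then u is an articulation point (the DFS root is instead an articulation point if it has more than one child).
#include <algorithm>
#include <iostream>
using namespace std;
int adjMatrix[256][256];
int low[256], num=0, dfsnum[256]; // fill dfsnum[] with -1 before the first call
// Call as cutvertex(root, -1) for an undirected graph stored in adjMatrix.
void cutvertex(int u, int parent){
    low[u]=dfsnum[u]=num++;
    int children=0;
    bool cut=false;
    for (int v = 0; v < 256; ++v)
    {
        if(!adjMatrix[u][v]) continue; // only follow real edges
        if(dfsnum[v]==-1)
        {
            ++children;
            cutvertex(v, u);
            // non-root u: the subtree of v cannot reach above u
            if(parent!=-1 && low[v]>=dfsnum[u]) cut=true;
            low[u]=min(low[u], low[v]);
        }
        else if(v!=parent){
            low[u]=min(low[u], dfsnum[v]); // back edge
        }
    }
    if(parent==-1 && children>1) cut=true; // root rule
    if(cut) cout<<"Cut Vertex: "<<u<<"\n";
}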

Resources