Determine if a graph is semi-connected or not - algorithm

A directed graph G = (V, E) is said to be semi-connected if, for all pairs of vertices u, v in V we have u -> v or v-> u path.
Give an efficient algorithm to determine whether or not G is semi-connected

Trivial O(V^3) solution could be to use floyd warshal all-to-all shortest path, but that's an overkill (in terms of time complexity).
It can be done in O(V+E).
Claim:
A DAG is semi connected in a topological sort, for each i, there there is an edge (vi,vi+1)
Proof:
Given a DAG with topological sort v1,v2,...,vn:
If there is no edge (vi,v(i+1)) for some i, then there is also no path (v(i+1),vi) (because it's a topological sort of a DAG), and the graph is not semi connected.
If for every i there is an edge (vi,vi+1), then for each i,j (i < j) there is a path vi->vi+1->...->vj-1->vj, and the graph is semi connected.
From this we can get the algorithm:
Find Maximal SCCs in the graph.
Build the SCC graph G'=(U,E') such that U is a set of SCCs. E'= { (V1,V2) | there is v1 in V1 and v2 in V2 such that (v1,v2) is in E }.
Do topological sort on G'.
Check if for every i, there is edge Vi,V(i+1).
Correctness proof:
If the graph is semi connected, for a pair (v1,v2), such that there is a path v1->...->v2 - Let V1, V2 be their SCCs. There is a path from V1 to V2, and thus also from v1 to v2, since all nodes in V1 and V2 are strongly connected.
If the algorithm yielded true, then for any two given nodes v1, v2 - we know they are in SCC V1 and V2. There is a path from V1 to V2 (without loss of generality), and thus also from v1 to v2.
As a side note, also every semi-connected graph has a root (vertex r that leads to all vertices):
Proof:
Assume there is no root. Define #(v) = |{u | there is a path from v to u}| (number of nodes that has a path from v to them).
Choose a such that #(a) = max{#(v) | for all v}.
a is not a root, so there is some node u that has no path from a to it. Since graph is semi connected, it means there is a path u->...->a. But that means #(u) >= #(a) + 1 (all nodes reachable from a and also u).
Contradiction to maximality of #(a), thus there is a root.

Amit's soltuin described completely the most efficient approach. I might just add that one can replace step 4 by checking whether there exists more than one topological order of G'. If yes, then the graph is not semi connected. Otherwise, the graph is semi connected. This can be easily incorporated in Kahn's algorithm for finding topological order of a graph.
Another less efficient solution that works in quadratic time is the following.
First, construct another graph G* which is the reverse of the original graph. Then for each vertex v of G, you run a DFS from v in G and consider the set of reachable nodes as R_v. If R_v != V(G), then run another DFS from v in G* and let the set of reachable nodes be R_v. If the union of R_v and R_v is not V(G) then the graph is not semi connected.

The main idea behind step 3 and 4 of Amit's algorithms is to check whether your depth-first forest is comprised of multiple depth-first trees. Presence of a single tree is a necessary condition for semi-connectivity as multiple trees represent unconnected nodes.
Similar ideas: Hamiltonian path, Longest path length

Related

Finding if a graph is still strongly connected after edge removal

A strongly connected digraph is a directed graph in which for each two vertices ๐‘ข and ๐‘ฃ,
there is a directed path from ๐‘ข to ๐‘ฃ and a direct path from ๐‘ฃ to ๐‘ข. Let ๐บ = (๐‘‰, ๐ธ) be a
strongly connected digraph, and let ๐‘’ = (๐‘ข, ๐‘ฃ) โˆˆ ๐ธ be an edge in the graph.
Design an efficient algorithm which decides whether ๐บ
โ€ฒ = (๐‘‰, ๐ธ โˆ– {๐‘’}), the graph
without the edge ๐‘’ is strongly connected. Explain its correctness and analyze its running
time.
So what I did is run BFS and sum the labels, once on the original graph from ๐‘ข and then again in G' without the edge (again from ๐‘ข)
and then : if second sum (in G') < original sum (in G) then the graph isn't strongly connected.
P.S this is a question from my exam which I only got 3/13 points and I'm wondering if i should appeal..
As Sneftel points out, the distance labels can only increase. If u no longer has a path to v, then I guess v's label will be infinite, so the sum of labels will change from finite to infinite. Yet the sum can increase without the graph losing strong connectivity, e.g.,
u<----->v
\ /|
\| /
w
where v's label increases from 1 to 2 because of the indirect path through w.
Since the graph G is strongly connected, G' is strongly connected if and only if there is a path from u to v (this path would replace the edge e).
You can use any path finding algorithm to solve this problem.

Find spanning tree that specific v has k degrees

can someone explain for me how to solve it i know that we should use DFS but i cant understand what we do after that.
input : undirected graph G and specific v that belong to the G
output : spanning tree that v has k degree
I will suggest the following way.
Here I assume G is connected.
First remove v from the graph, find spanning tree for each of the remaining component.
Now you may have a single spanning tree or a forest depending on the graph, you can add back v and use the edges to connecting v and each of the spanning tree.
Then you will have a spanning tree of G, and there will be three cases.
case 1: degree v > k, in this case, the task is impossible
case 2: degree v = k, you have the answer.
case 3: degree v < k, then you just add unused edges of v. Each time you add an edge you will create a cycle, then you can just choose an edge which does not touch v and remove it.
You keep adding edges until you have your answer or all edges run out.
However, I cannot think of a fast way to query a cycle besides doing bfs/dfs.
Update: There is a faster way for case 3 by Matt, after connecting v to k appropriate neighbors, use Kruskal's or Prim's algorithm to fill in the rest of the spanning tree, starting with the edges from v that you already have.
Here is an algorithm that I am providing, but its pretty much brute force.
Condition for such tree to exist: if the degree of v < k, then such tree does not exist.
else follow the algorithm as:
Pick k vertices out of all the adjacent vertices of v,
1.mark all adjacent vertices of v as VISITED.
2.From each of those adjacent vertices , call DFS and the spanning tree grows.
3.After all DFS is complete,if all vertices of graph G are marked VISITED, then we
have our spanning tree, if a single vertex or more are left UNVISITED, then the
4.pick another set of k vertices.
if v has X as degree and X > k, then step 4 has to be repeated XCk(X choose k) times in the worst case.
We can think this way too:
For the given vertex v among all the neighbor v_i s.t. (v,v_i) in E(G) pick k edges with the smallest weights w(v,v_i), we need O(d*lg(d)) time where deg(v)=d >= k. Add these edges to the spanning tree edge set S. We have to add these edges to S anyway to ensure that the constraint holds.
Remove vertex v from the graph G along with all edges incident on v. Now run Prim/Kruskal on the graph G \ {v} starting with the edge set S, the algorithm itself will add edges ensuring the acyclic and minimum properties. This will run in O(m*lg(n)).
Assuming d small step 1 & 2 runs in O(m*lg(n)).

Why do we need to run DFS on the complement of a graph in the Kosaraju's algorithm?

There's a famous algorithm to find the strongly connected components called Kosaraju's algorithm, which uses two DFS's to solve this problem, and runs in ฮธ(|V| + |E|) time.
First we use DFS on complement of the graph (GR) to compute reverse postorder of vertices, and then we apply second DFS on the main graph G by taking vertices in reverse post order to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I'm not getting the intuition behind the need of the reverse post order.
How does it helps the second DFS to find the strongly connected components?
suppose result of the first DFS is:
----------v1--------------v2-----------
where "-" indicates any number and all the vertices in a strongly connected component g appear between v1 and v2.
DFS by post order gives the following guarantee that
all vertices after v2 would not points to g in the reverse graph(that is to say, you cannot reach these vertices from g in the origin graph)
all vertices before v1 cannot be pointed to from g in the reverse graph(that is to say, you cannot reach g from these vertices in the origin graph)
in one word, the first DFS ensures that in the second DFS, strongly connected components that are visited earlier cannot have any edge points to other unvisited strongly connected components.
Some Detailed Explanation
let's simplify the graph as follow:
the whole graph is G
G contains two strongly connected components, one is g, the other one is a single vertex v
there is only one edge between v and g, either from v to g or from g to v, the name of this edge is e
g', e' represent the reverse of g, e
the situation in which this algorithm could fail can be conclude as
start DFS from v, and e' points from v to g'
start DFS from a vertex inside of g', and e' points from g' to v
For situation 1
origin graph would be like g-->v, and the reversed graph looks like g'<--v.
To start the second DFS from v, the post order generated by first DFS need to be something like
g1, g2, g3, ..., v
but you would easily find out that neither starting the first DFS from v nor from g' can give you such a post order, so in this situation, it is guaranteed be the first DFS that the second DFS would not start from a vertex that both be out of and points to a strongly connected component.
For situation 2
similar to the situation 1, in situation 2, where the origin graph is g<--v and the reversed on is g'-->v, it is guaranteed that v would be visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you get the knowledge about all nodes that are reachable from that node (you get this information after the first DFS is finished).
Then, when you inverse all the vertices and run the DFS once more, for every node you visit you get the knowledge about all nodes that can reach that node in the non-inverted graph (again, you get this info after the second DFS finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the neighbours you can visit. (I hope this is pretty understandable). Then, let's say your second DFS reaches that node X, but this time all the vertices are inverted. If then from your node X "you can see" any other nodes, it means that before inverting the vertices the node X was reachable from all the neighbours you see now. By calling the second DFS in the correct order you get for every node X all the nodes that where reachable from X in both DFS trees (and so, for every node X you get the nodes that were both reachable from X and could reach X - those are strongly connected components by definition).
Suppose the list L is the post-order DFS visit of nodes. u->v indicates that there exists a forwarding path from u to v.
If u->v and not v->u, then u must appear at the left of v in L. The nodes in a SCC, such as v and w, however, may appear in any arbitrary order on the list L.
So, if a node x appear strictly before y on the list L:
case1: x->y and y->x, like the case of v and w
case2: x->y and not y->x, like the case of u and v
case3: not x->y and not y->x
The Kosaraju's algorithm iterates through L from left to right and run DFS starting from each node on the transpose graph (where the direction of edges are reversed). If some node is reachable by DFS and it does not belong to any SCC, then we add this node to the SCC of current root.
In case 1, we will add y to the SCC of x. In case 3, y and x are in different SCCs.
Case 2 requires some special attention. At the time we call DFS from y, x is already in some other SCC, so we will not add x to the SCC of y. Imagine if you called the DFS starting from root y before the DFS starting from root x, then x would be added to the SCC of y, which is wrong.
In short, the first DFS arranges those nodes which can reach y but can not be reached from y on its left. So the second DFS is able to avoid adding such nodes x to the SCC of y.

Design an algorithm to detect cycle in graph G

What would the following algorithm look like:
a linear-time algorithm which, given an undirected graph G, and a particular edge e in it, determines whether G has a cycle containing e
I have following Idea:
for each v that belongs to V,
if v is a descendant of e and (e,v) has not been traversed then check following:
if we visited e before v and left v before we left e then
the graph contains cycle
I am not sure if this is your homework so I'll just give a little hint - use the properties of breadth-first search tree (with root in any of the two vertices of the edge e), its subtrees which are determined by neighbors of the root and the edges between those subtrees.
Per comingstorm's hint, an undirected edge is itself a cycle. A<->B back and forth as many times as you like.

Algorithm to check if directed graph is strongly connected

I need to check if a directed graph is strongly connected, or, in other words, if all nodes can be reached by any other node (not necessarily through direct edge).
One way of doing this is running a DFS and BFS on every node and see all others are still reachable.
Is there a better approach to do that?
Consider the following algorithm.
Start at a random vertex v of the graph G, and run a DFS(G, v).
If DFS(G, v) fails to reach every other vertex in the graph G, then there is some vertex u, such that there is no directed path from v to u, and thus G is not strongly connected.
If it does reach every vertex, then there is a directed path from v to every other vertex in the graph G.
Reverse the direction of all edges in the directed graph G.
Again run a DFS starting at v.
If the DFS fails to reach every vertex, then there is some vertex u, such that in the original graph there is no directed path from u to v.
On the other hand, if it does reach every vertex, then in the original graph there is a directed path from every vertex u to v.
Thus, if G "passes" both DFSs, it is strongly connected. Furthermore, since a DFS runs in O(n + m) time, this algorithm runs in O(2(n + m)) = O(n + m) time, since it requires 2 DFS traversals.
Tarjan's strongly connected components algorithm (or Gabow's variation) will of course suffice; if there's only one strongly connected component, then the graph is strongly connected.
Both are linear time.
As with a normal depth first search, you track the status of each node: new, seen but still open (it's in the call stack), and seen and finished. In addition, you store the depth when you first reached a node, and the lowest such depth that is reachable from the node (you know this after you finish a node). A node is the root of a strongly connected component if the lowest reachable depth is equal to its own depth. This works even if the depth by which you reach a node from the root isn't the minimum possible.
To check just for whether the whole graph is a single SCC, initiate the dfs from any single node, and when you've finished, if the lowest reachable depth is 0, and every node was visited, then the whole graph is strongly connected.
To check if every node has both paths to and from every other node in a given graph:
1. DFS/BFS from all nodes:
Tarjan's algorithm supposes every node has a depth d[i]. Initially, the root has the smallest depth. And we do the post-order DFS updates d[i] = min(d[j]) for any neighbor j of i. Actually BFS also works fine with the reduction rule d[i] = min(d[j]) here.
function dfs(i)
d[i] = i
mark i as visited
for each neighbor j of i:
if j is not visited then dfs(j)
d[i] = min(d[i], d[j])
If there is a forwarding path from u to v, then d[u] <= d[v]. In the SCC, d[v] <= d[u] <= d[v], thus, all the nodes in SCC will have the same depth. To tell if a graph is a SCC, we check whether all nodes have the same d[i].
2. Two DFS/BFS from the single node:
It is a simplified version of the Kosarajuโ€™s algorithm. Starting from the root, we check if every node can be reached by DFS/BFS. Then, reverse the direction of every edge. We check if every node can be reached from the same root again. See C++ code.
You can calculate the All-Pairs Shortest Path and see if any is infinite.
Tarjan's Algorithm has been already mentioned. But I usually find Kosaraju's Algorithm easier to follow even though it needs two traversals of the graph. IIRC, it is also pretty well explained in CLRS.
test-connected(G)
{
choose a vertex x
make a list L of vertices reachable from x,
and another list K of vertices to be explored.
initially, L = K = x.
while K is nonempty
find and remove some vertex y in K
for each edge (y, z)
if (z is not in L)
add z to both L and K
if L has fewer than n items
return disconnected
else return connected
}
You can use Kosarajuโ€™s DFS based simple algorithm that does two DFS traversals of graph:
The idea is, if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected.
In step 2 of the algorithm, we check if all vertices are reachable from v. In step 4, we check if all vertices can reach v (In reversed graph, if all vertices are reachable from v, then all vertices can reach v in original graph).
Algorithm :
1) Initialize all vertices as not visited.
2) Do a DFS traversal of graph starting from any arbitrary vertex v. If DFS traversal doesnโ€™t visit all vertices, then return false.
3) Reverse all arcs (or find transpose or reverse of graph)
4) Mark all vertices as not-visited in reversed graph.
5) Do a DFS traversal of reversed graph starting from same vertex v (Same as step 2). If DFS traversal doesnโ€™t visit all vertices, then return false. Otherwise return true.
Time Complexity: Time complexity of above implementation is same as Depth First Search which is O(V+E) if the graph is represented using adjacency list representation.
One way of doing this would be to generate the Laplacian matrix for the graph, then calculate the eigenvalues, and finally count the number of zeros. The graph is strongly connection if there exists only one zero eigenvalue.
Note: Pay attention to the slightly different method for creating the Laplacian matrix for directed graphs.
The algorithm to check if a graph is strongly connected is quite straightforward. But why does the below algorithm work?
Algorithm: suppose there is a graph with vertices [A, B, C......Z]
Choose any random node, say J, and perform DFS from it. If all the nodes are reachable then continue to step 2.
Reverse the directions of the edges of the graph by doing transpose.
Again run DFS from node J and check if all the nodes are visited. If yes then the graph is strongly connected and return true.
Performing step 1 makes sense because we have to check if we can reach all the nodes from that node. After this, next logical step could be
i) Now do this for all other nodes
ii) or try to reach node J from every other node. Because once you reach node J, you are sure that you can reach every other node because of step 1.
This is what we are trying to do in steps 2 & 3. If in a transposed graph node J is able to reach all other nodes then this implies that in original graph all other nodes can reach J.

Resources