Having trouble figuring out this DFS riddle

one of the df trees consists of only one vertex v though v has both
incoming and outgoing edges without a self-loop.
Write G and its df forest.
How can a vertex possibly have an outgoing edge, yet be the only vertex in its tree? Do cross edges not count, or is there a more clever solution?

A DFS tree is usually defined to contain only the nodes that were reached via tree edges, not cross edges. As a hint: if there's an edge (v, u) and DFS is started on u, then u won't appear in v's DFS tree.
Hope this helps!
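The hint can be checked on a tiny example. The sketch below is only an illustration: the graph a -> v -> u, the vertex names, and the helper dfs_forest are assumptions made here, not part of the original exercise. Starting the search at u, then v, then a leaves v alone in its own tree even though it has one incoming and one outgoing edge.

```python
def dfs_forest(graph, order):
    """Return the DFS forest as a list of trees.

    graph: dict mapping each vertex to a list of successors.
    order: the order in which unvisited vertices are tried as roots.
    Each tree is the list of its vertices in discovery order.
    """
    visited = set()
    forest = []

    def visit(u, tree):
        visited.add(u)
        tree.append(u)
        for w in graph[u]:
            if w not in visited:
                visit(w, tree)

    for root in order:
        if root not in visited:
            tree = []
            visit(root, tree)
            forest.append(tree)
    return forest


# Hypothetical graph: a -> v -> u (v has an incoming and an outgoing edge).
graph = {"a": ["v"], "v": ["u"], "u": []}

# Roots tried in the order u, v, a: every tree is a single vertex.
lonely = dfs_forest(graph, ["u", "v", "a"])

# Roots tried in the order a, v, u: one tree containing everything.
single = dfs_forest(graph, ["a", "v", "u"])
```

With the first root order, the edge (v, u) is a cross edge into the tree that already contains u, and (a, v) is a cross edge into v's tree.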

Related

How to describe an algorithm to decide whether T is a depth-first spanning tree rooted at s?

Suppose we have a connected undirected graph G, a start vertex s, and a spanning tree T of G. How can I describe an algorithm to decide whether T is a depth-first spanning tree rooted at s?
All DFS trees T for an undirected graph G have the following property:
if {u, v} is an edge in G, then u is an ancestor of v in T or v is an ancestor of u in T.
To see why, assume without loss of generality that u is visited before v in the DFS. When building the DFS tree node for u, we will either (1) choose to visit node v as a neighbor of u, making node u a parent of node v, or (2) starting at node u we will visit some other neighbor z, and in recursively exploring z we will visit v, in which case u is a parent of z and z is an ancestor of v.
Moreover, we can make a stronger claim: any spanning tree meeting the above criterion is the DFS tree for some DFS of G. Here’s how to see this. Start with the root node of T and look at its children. Given any two subtrees of the root, none of the nodes in those subtrees can be adjacent to one another in G, since otherwise by the above property one of those nodes would have to be an ancestor of the other. Therefore, each subtree consists of a set of nodes that are all reachable from one another via paths that only involve the nodes within that subtree. We can then recursively assemble one possible DFS ordering by starting at the root, recursively building DFS trees for the subgraphs represented by the subtrees in any order we’d like, and gluing those DFS orders together.
With this observation in mind, we can check very quickly whether T can be a DFS tree rooted at s by running a DFS over T starting at s, tracking which nodes have been visited as it runs. After all children of a node v have been processed, check whether all the neighbors of v in graph G have been visited. If so, great! If not, it means that some neighbor of v is neither an ancestor nor a descendant, and the tree isn’t a DFS tree. If this process terminates without finding any violations, the process itself traces out a DFS of G using the edges of T, so T is definitely a valid DFS tree.
This algorithm runs in time O(m + n), which is as fast as possible here. After all, if you don’t look at all the nodes or edges of G, you can’t be sure whether the tree is a valid DFS tree because you can’t check the core property listed above.
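A minimal sketch of that check follows. It assumes T is already known to be a spanning tree of G rooted at s and is given as a parent-to-children mapping. The function name is_dfs_tree and both input representations are choices made for this illustration.

```python
def is_dfs_tree(adj, children, s):
    """Check whether the rooted spanning tree T can be a DFS tree of G.

    adj: dict mapping each vertex of G to a list of its neighbors (undirected).
    children: dict mapping a vertex to the list of its children in T
              (leaves may be omitted).
    s: the root of T.
    """
    visited = set()

    def walk(v):
        visited.add(v)
        for c in children.get(v, []):
            if not walk(c):
                return False
        # Ancestors were visited on the way down and descendants were just
        # visited, so an unvisited neighbor is neither of the two.
        return all(w in visited for w in adj[v])

    return walk(s)


# Hypothetical example: a triangle on s, a, b.
triangle = {"s": ["a", "b"], "a": ["s", "b"], "b": ["s", "a"]}
path_tree = {"s": ["a"], "a": ["b"]}   # s - a - b
star_tree = {"s": ["a", "b"]}          # a and b are siblings

path_ok = is_dfs_tree(triangle, path_tree, "s")
star_ok = is_dfs_tree(triangle, star_tree, "s")
```

The star fails because the edge {a, b} joins two siblings, which no DFS of the triangle can produce. The recursion is kept for readability; a deep tree would need an explicit stack.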

DFS on directed graph & Kosaraju's algorithm

I'm having trouble understanding Kosaraju's algorithm for finding the strongly connected components of a directed graph. Here's what I have in my notebook (I'm a student :D):
Start from an arbitrary vertex (label it with #1) and perform a DFS. When you can't go any further, label the last visited vertex with #2, and start another DFS (skipping vertices already labeled), and so on.
Transpose the graph.
Do DFS starting from each vertex in reverse order, those vertices which end visited after each DFS belong to the same SCC.
I have this example:
And after the first step starting from E, the labels are:
E
G
K
J
I
H
F
C
D
B
A
So here comes the thing: Is there a difference for DFS in directed/undirected graphs?
I did a mental test of the first step on my mind ignoring the arrows (just like it was undirected) and only got correct #1 for E (of course) and #2 for G, but #3 fell onto J, not K. So I thought maybe I should respect the arrows, and did a DFS considering that, but after the first pass starting from E, I can't go anywhere from G (which is #2), so I'm stuck there.
Is there anything about DFS on directed graphs that I'm not aware of? I've been taught DFS only on undirected graphs!
Your second step is incomplete. See Wikipedia:
Kosaraju's algorithm works as follows:
1. Let G be a directed graph and S be an empty stack.
2. While S does not contain all vertices:
   Choose an arbitrary vertex v not in S. Perform a depth-first search starting at v. Each time that depth-first search finishes expanding a vertex u, push u onto S.
3. Reverse the directions of all arcs to obtain the transpose graph.
4. While S is nonempty:
   Pop the top vertex v from S. Perform a depth-first search starting at v in the transpose graph. The set of visited vertices will give the strongly connected component containing v; record this and remove all these vertices from the graph G and the stack S. Equivalently, breadth-first search (BFS) can be used instead of depth-first search.
So you shouldn't only do something with the last vertex and first vertices, but with each vertex in the DFS.
Also note that you should be backtracking - when you can't go further, you go to the previous vertex and continue from there.
And no, you can't treat it as an undirected graph - the direction of the edges matters significantly.
So, starting from E, you'd, for example, go F, then G, then back to F, then H, then K, then I, then J, then back to I, K, H, F, and finally E, having pushed all visited vertices onto the stack.
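The quoted steps can be sketched in Python as follows. This is an illustrative reading of the description, not a reference implementation: the function name kosaraju_scc, the dict-of-lists graph format, and the small example graph are all assumptions. Instead of deleting vertices from the graph, the second pass skips vertices that already belong to a component.

```python
def kosaraju_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph: dict mapping every vertex to a list of its successors.
    """
    visited = set()
    stack = []  # vertices in order of finishing time

    def fill(u):
        visited.add(u)
        for w in graph[u]:
            if w not in visited:
                fill(w)
        stack.append(u)  # pushed when the search finishes expanding u

    for v in graph:
        if v not in visited:
            fill(v)

    # Reverse every arc to obtain the transpose graph.
    transpose = {v: [] for v in graph}
    for u in graph:
        for w in graph[u]:
            transpose[w].append(u)

    assigned = set()
    components = []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for w in transpose[u]:
            if w not in assigned:
                collect(w, comp)

    while stack:
        v = stack.pop()
        if v not in assigned:
            comp = []
            collect(v, comp)
            components.append(comp)
    return components


# Hypothetical graph: a cycle a -> b -> c -> a, and a cycle d <-> e fed from c.
example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
components = kosaraju_scc(example)
```

Note that the first pass follows edge directions and backtracks, pushing each vertex only once all of its successors have been handled.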

Why do we need to run DFS on the reverse of a graph in Kosaraju's algorithm?

There's a famous algorithm to find the strongly connected components called Kosaraju's algorithm, which uses two DFS's to solve this problem, and runs in Θ(|V| + |E|) time.
First we use DFS on the reverse (transpose) of the graph (GR) to compute the reverse postorder of the vertices, and then we apply a second DFS on the main graph G, taking vertices in reverse postorder, to compute the strongly connected components.
Although I understand the mechanics of the algorithm, I'm not getting the intuition behind the need of the reverse post order.
How does it help the second DFS to find the strongly connected components?
Suppose the result of the first DFS is:
----------v1--------------v2-----------
where "-" indicates any number and all the vertices in a strongly connected component g appear between v1 and v2.
A DFS post order gives the following guarantees:
all vertices after v2 do not point to g in the reverse graph (that is to say, you cannot reach these vertices from g in the original graph)
all vertices before v1 cannot be pointed to from g in the reverse graph (that is to say, you cannot reach g from these vertices in the original graph)
In short, the first DFS ensures that, in the second DFS, strongly connected components that are visited earlier cannot have any edge pointing to other unvisited strongly connected components.
Some Detailed Explanation
let's simplify the graph as follow:
the whole graph is G
G contains two strongly connected components, one is g, the other one is a single vertex v
there is only one edge between v and g, either from v to g or from g to v, the name of this edge is e
g', e' represent the reverse of g, e
The situations in which this algorithm could fail can be summarized as:
1. start DFS from v, and e' points from v to g'
2. start DFS from a vertex inside g', and e' points from g' to v
For situation 1
The original graph looks like g-->v, and the reversed graph looks like g'<--v.
To start the second DFS from v, the post order generated by the first DFS would need to be something like
g1, g2, g3, ..., v
but you can easily check that neither starting the first DFS from v nor from g' can give you such a post order, so in this situation it is guaranteed by the first DFS that the second DFS will not start from a vertex that both lies outside of and points to a strongly connected component.
For situation 2
Similar to situation 1: in situation 2, where the original graph is g<--v and the reversed one is g'-->v, it is guaranteed that v would be visited before any vertex in g'.
When you run DFS on a graph for the first time, for every node you visit you get the knowledge about all nodes that are reachable from that node (you get this information after the first DFS is finished).
Then, when you invert all the edges and run the DFS once more, for every node you visit you get the knowledge about all nodes that can reach that node in the non-inverted graph (again, you get this info after the second DFS has finished).
Example: let's say your first DFS reaches node X. From that node "you can see" all the neighbours you can visit. (I hope this is pretty understandable). Then, let's say your second DFS reaches that node X, but this time all the edges are inverted. If then from your node X "you can see" any other nodes, it means that before inverting the edges the node X was reachable from all the neighbours you see now. By calling the second DFS in the correct order you get for every node X all the nodes that were reachable from X in both DFS trees (and so, for every node X you get the nodes that were both reachable from X and could reach X - those are strongly connected components by definition).
Suppose the list L is the reverse post-order of a DFS visit of the nodes. u->v indicates that there exists a forward path from u to v.
If u->v and not v->u, then u must appear at the left of v in L. The nodes in a SCC, such as v and w, however, may appear in any arbitrary order on the list L.
So, if a node x appear strictly before y on the list L:
case1: x->y and y->x, like the case of v and w
case2: x->y and not y->x, like the case of u and v
case3: not x->y and not y->x
Kosaraju's algorithm iterates through L from left to right and runs a DFS starting from each node on the transpose graph (where the direction of the edges is reversed). If some node is reachable by the DFS and does not yet belong to any SCC, then we add this node to the SCC of the current root.
In case 1, we will add y to the SCC of x. In case 3, y and x are in different SCCs.
Case 2 requires some special attention. At the time we call DFS from y, x is already in some other SCC, so we will not add x to the SCC of y. Imagine if you called the DFS starting from root y before the DFS starting from root x, then x would be added to the SCC of y, which is wrong.
In short, the first DFS arranges those nodes which can reach y but can not be reached from y on its left. So the second DFS is able to avoid adding such nodes x to the SCC of y.

Forward Edge in an Undirected Graph

CLRS - Chapter 22
Theorem 22.10
In a depth-first search of an undirected graph G, every edge of G is
either a tree edge or a back edge.
Proof Let (u,v) be an arbitrary edge of G, and suppose without loss of
generality that u.d < v.d. Then the search must discover and finish v
before it finishes u (while u is gray), since v is on u’s adjacency
list. If the first time that the search explores edge (u,v), it is in
the direction from u to v, then v is undiscovered (white) until that
time, for otherwise the search would have explored this edge already
in the direction from v to u. Thus, (u,v) becomes a tree edge. If the
search explores (u,v) first in the direction from v to u, then (u,v)
is a back edge, since u is still gray at the time the edge is first
explored.
I most certainly understand the proof, but I'm not quite convinced by the idea of forward edges.
In the above image, there is a forward edge from the first vertex to the third vertex (first row). The first vertex is the source.
As I understand it, DFS(S) would include a forward edge 1 -> 3. (I am obviously wrong, but I need somebody to set me straight!)
It looks like you didn't include the definition of "forward edge," so I'll start with the definition I learned.
Assuming u.d < v.d, DFS labels the edge (u,v) a forward edge if
when crossing the edge from u to v, v has already been marked as visited.
Because of that though, I claim that you cannot have forward edges in an undirected graph.
Assume for the sake of contradiction that it was possible. Therefore, the destination node is already marked as visited. Thus, DFS has already gone there and crossed all of the adjacent edges. In particular, you had to have already crossed that edge in the opposite direction. Thus, the edge has already been marked as a certain type of edge and thus won't be marked as a "forward edge".
Because of this, forward edges can only occur in directed graphs.
Now, just in case you mixed up "forward edges" and "tree edges", the edge you describe still isn't necessarily a tree edge. It is only a tree edge if when crossing, that was the first time you've visited the destination node. The easy way to think about it in undirected graphs is that when you traverse an edge, it is a back edge if the destination node has been reached already, and a tree edge otherwise.
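That rule of thumb can be sketched as a short classifier. The triangle below mirrors the situation in the question (an edge from the first to the third vertex), but the vertex numbering, the function name classify_undirected_edges, and the adjacency-list format are assumptions made for this illustration. The sketch assumes a simple graph with no parallel edges or self-loops.

```python
def classify_undirected_edges(adj, source):
    """Label each edge reachable from source as "tree" or "back".

    adj: dict mapping each vertex to a list of neighbors (undirected graph).
    Returns a dict keyed by frozenset({u, v}).
    """
    discovered = set()
    labels = {}

    def visit(u):
        discovered.add(u)
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in labels:
                continue  # already labeled when first explored from the other end
            if v in discovered:
                labels[edge] = "back"
            else:
                labels[edge] = "tree"
                visit(v)

    visit(source)
    return labels


# Hypothetical triangle: 1 - 2 - 3, plus the edge 1 - 3.
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
labels = classify_undirected_edges(triangle, 1)
```

The edge {1, 3} is first explored from vertex 3 while vertex 1 is still on the recursion stack, so it is labeled as a back edge. By the time the search returns to vertex 1 and looks at it again, it already has a label, so it never becomes a forward edge.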
I believe that what you are missing is some assumption about the order in which the algorithm would visit the different vertices.
Let's assume the algorithm visits the vertices in lexicographic order. Let's name the vertices this way:
     -------
    |       |
    S - A - B
    |   |   |
    C - D - E
In this case, the tree edges will be S->A, A->B, B->E, E->D, D->C. The rest of the edges are back edges.
Now let's rename the graph:
     -------
    |       |
    S - B - A
    |   |   |
    C - D - E
In this case, the tree edges will be S->A, A->B, B->D, D->C, D->E (note that S->A and S->B are not the same edges as in the previous example).
As you can see, the output depends on the order in which the algorithm selects the vertices. When the graph is anonymous, any output may be correct.
In the DFS tree of a general graph, there are TREE, FORWARD, BACK and CROSS edges.
In the DFS tree of an undirected graph, the would-be FORWARD edges are labeled as BACK edges.
The would-be CROSS edges are labeled as TREE edges.
In both cases, the reason is that the edges can be traversed in both directions, but you first encounter them as BACK and TREE and second time as FORWARD and maybe CROSS and they are already labeled.
In a sense, an edge is both FORWARD and BACK and can be both CROSS and TREE, but is first found as BACK and TREE, respectively.

Algorithm to check if directed graph is strongly connected

I need to check if a directed graph is strongly connected, or, in other words, if every node can be reached from every other node (not necessarily through a direct edge).
One way of doing this is running a DFS or BFS from every node and checking that all others are reachable.
Is there a better approach to do that?
Consider the following algorithm.
Start at a random vertex v of the graph G, and run a DFS(G, v).
If DFS(G, v) fails to reach every other vertex in the graph G, then there is some vertex u, such that there is no directed path from v to u, and thus G is not strongly connected.
If it does reach every vertex, then there is a directed path from v to every other vertex in the graph G.
Reverse the direction of all edges in the directed graph G.
Again run a DFS starting at v.
If the DFS fails to reach every vertex, then there is some vertex u, such that in the original graph there is no directed path from u to v.
On the other hand, if it does reach every vertex, then in the original graph there is a directed path from every vertex u to v.
Thus, if G "passes" both DFSs, it is strongly connected. Furthermore, since a DFS runs in O(n + m) time, this algorithm runs in O(2(n + m)) = O(n + m) time, since it requires 2 DFS traversals.
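The two passes above could look roughly like this in Python. The names reachable_from and is_strongly_connected and the dict-of-lists graph format are assumptions of this sketch. An explicit stack is used instead of recursion, which does not change which vertices are reached.

```python
def reachable_from(graph, start):
    """Return the set of vertices reachable from start (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in graph[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_strongly_connected(graph):
    """graph: dict mapping every vertex to a list of its successors."""
    if not graph:
        return True  # convention chosen here for the empty graph
    v = next(iter(graph))  # any start vertex works

    # Pass 1: v must reach every vertex.
    if len(reachable_from(graph, v)) != len(graph):
        return False

    # Pass 2: in the reversed graph, v must again reach every vertex,
    # which means every vertex reaches v in the original graph.
    reverse = {u: [] for u in graph}
    for u in graph:
        for w in graph[u]:
            reverse[w].append(u)
    return len(reachable_from(reverse, v)) == len(graph)


# Hypothetical examples.
cycle = {0: [1], 1: [2], 2: [0]}
dead_end = {0: [1, 2], 1: [0], 2: []}  # vertex 2 cannot get back to 0
```

Both passes and the reversal touch each vertex and edge a constant number of times, matching the O(n + m) bound stated above.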
Tarjan's strongly connected components algorithm (or Gabow's variation) will of course suffice; if there's only one strongly connected component, then the graph is strongly connected.
Both are linear time.
As with a normal depth first search, you track the status of each node: new, seen but still open (it's in the call stack), and seen and finished. In addition, you store the depth when you first reached a node, and the lowest such depth that is reachable from the node (you know this after you finish a node). A node is the root of a strongly connected component if the lowest reachable depth is equal to its own depth. This works even if the depth by which you reach a node from the root isn't the minimum possible.
To check just for whether the whole graph is a single SCC, initiate the dfs from any single node, and when you've finished, if the lowest reachable depth is 0, and every node was visited, then the whole graph is strongly connected.
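For comparison, here is a compact sketch of Tarjan's algorithm in the usual index/low-link formulation, with the "single component" test built on top. The function names and the graph format are assumptions of this sketch, and it uses discovery indices rather than tree depth for the bookkeeping.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph: dict mapping every vertex to a list of its successors.
    """
    index = {}        # discovery index of each vertex
    low = {}          # lowest index reachable through vertices still on the stack
    on_stack = set()
    stack = []
    components = []

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for w in graph[u]:
            if w not in index:
                visit(w)
                low[u] = min(low[u], low[w])
            elif w in on_stack:
                low[u] = min(low[u], index[w])
        if low[u] == index[u]:
            # u is the root of a component: pop everything above it.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            components.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def strongly_connected_via_tarjan(graph):
    """True when the (non-empty) graph forms a single component."""
    return len(tarjan_scc(graph)) == 1


# Hypothetical example: 0 -> 1 -> 2 -> 1 and 1 -> 0.
loops = {0: [1], 1: [2, 0], 2: [1]}
```

In this example vertex 2 ends with a low value of 1 while vertices 0 and 1 end with 0, yet all three are popped together when vertex 0 finishes. The low values inside one component need not be equal; what matters is which vertices are roots.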
To check if every node has both paths to and from every other node in a given graph:
1. DFS/BFS from all nodes:
Tarjan's algorithm supposes every node has a depth d[i]. Initially, the root has the smallest depth. And we do the post-order DFS updates d[i] = min(d[j]) for any neighbor j of i. Actually BFS also works fine with the reduction rule d[i] = min(d[j]) here.
function dfs(i)
    d[i] = i
    mark i as visited
    for each neighbor j of i:
        if j is not visited then dfs(j)
        d[i] = min(d[i], d[j])
If there is a forwarding path from u to v, then d[u] <= d[v]. In the SCC, d[v] <= d[u] <= d[v], thus, all the nodes in SCC will have the same depth. To tell if a graph is a SCC, we check whether all nodes have the same d[i].
2. Two DFS/BFS from the single node:
It is a simplified version of Kosaraju’s algorithm. Starting from the root, we check if every node can be reached by DFS/BFS. Then, reverse the direction of every edge. We check if every node can be reached from the same root again.
You can calculate the All-Pairs Shortest Path and see if any is infinite.
Tarjan's Algorithm has been already mentioned. But I usually find Kosaraju's Algorithm easier to follow even though it needs two traversals of the graph. IIRC, it is also pretty well explained in CLRS.
test-connected(G)
{
    choose a vertex x
    make a list L of vertices reachable from x,
    and another list K of vertices to be explored.
    initially, L = K = x.
    while K is nonempty
        find and remove some vertex y in K
        for each edge (y, z)
            if (z is not in L)
                add z to both L and K
    if L has fewer than n items
        return disconnected
    else return connected
}
You can use Kosaraju’s DFS based simple algorithm that does two DFS traversals of graph:
The idea is, if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected.
In step 2 of the algorithm, we check if all vertices are reachable from v. In step 5, we check if all vertices can reach v (in the reversed graph, if all vertices are reachable from v, then all vertices can reach v in the original graph).
Algorithm :
1) Initialize all vertices as not visited.
2) Do a DFS traversal of graph starting from any arbitrary vertex v. If DFS traversal doesn’t visit all vertices, then return false.
3) Reverse all arcs (or find transpose or reverse of graph)
4) Mark all vertices as not-visited in reversed graph.
5) Do a DFS traversal of reversed graph starting from same vertex v (Same as step 2). If DFS traversal doesn’t visit all vertices, then return false. Otherwise return true.
Time Complexity: Time complexity of above implementation is same as Depth First Search which is O(V+E) if the graph is represented using adjacency list representation.
One way of doing this would be to generate the Laplacian matrix for the graph, then calculate the eigenvalues, and finally count the number of zeros. The graph is strongly connected if there exists only one zero eigenvalue.
Note: Pay attention to the slightly different method for creating the Laplacian matrix for directed graphs.
The algorithm to check if a graph is strongly connected is quite straightforward. But why does the below algorithm work?
Algorithm: suppose there is a graph with vertices [A, B, C......Z]
Choose any random node, say J, and perform DFS from it. If all the nodes are reachable then continue to step 2.
Reverse the directions of the edges of the graph by doing transpose.
Again run DFS from node J and check if all the nodes are visited. If yes then the graph is strongly connected and return true.
Performing step 1 makes sense because we have to check if we can reach all the nodes from that node. After this, next logical step could be
i) Now do this for all other nodes
ii) or try to reach node J from every other node. Because once you reach node J, you are sure that you can reach every other node because of step 1.
This is what we are trying to do in steps 2 & 3. If in a transposed graph node J is able to reach all other nodes then this implies that in original graph all other nodes can reach J.
