An algorithm may come upon a node a second time, i.e. there might be two paths to the node. The algorithm needs to know which path was shorter.
When the Best First Search reaches a node that it has visited before, it is possible that the previous visit had a longer path. When this happens, the open and closed lists need updating.
This cannot happen with an A* search.
Question: Can this happen with a DFS?
The answer is yes, but I thought it was no. Why is it yes? I thought that once a node has been visited, it won't go back to it.
DFS strategy will visit a node as many times as it finds a path to it. It wouldn't continue visiting from that node down, but it would register the visit itself. This is essential for DFS edge classification.
For example, consider running DFS on a graph where A has edges to B and C, and B has an edge to C.
When you reach node C for the first time, the path is A-B-C. When you reach C for the second time, the path is A-C, which is shorter.
If you have a graph like this
A
|\
B \
| E
C /
|/
D
and traverse it depth-first, from left to right, the following paths will be visited in this order:
A
AB
ABC
ABCD
AE
AED
As you can see, the first visit to D happens on a longer path (ABCD) than the second one (AED).
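To make this concrete, here is a minimal Python sketch (my own code, with the example graph above written out as an adjacency list) of a DFS that registers every path along which it reaches a node; it prints A, AB, ABC, ABCD, AE, AED, so D is indeed reached twice, first on the longer path.

graph = {
    'A': ['B', 'E'],   # left-to-right child order, as in the example above
    'B': ['C'],
    'C': ['D'],
    'E': ['D'],
    'D': [],
}

visited = set()

def dfs(node, path=''):
    path += node
    print(path)              # every arrival at a node is registered ...
    if node in visited:      # ... but an already-visited node is not expanded again
        return
    visited.add(node)
    for child in graph[node]:
        dfs(child, path)

dfs('A')                     # prints A, AB, ABC, ABCD, AE, AED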
Related
In Kosaraju's algorithm, finishing times are generated from the reversed graph. Then, the strongly connected components are discovered from the original graph by performing DFS, starting from the greatest to the lowest finishing times generated earlier.
Can the finishing times for Kosaraju's algorithm be generated from the original graph? Then, could the strongly connected components be discovered by performing DFS, starting from the lowest finishing time to the greatest finishing time?
It seems to me like it would be the case, but that's just my hunch.
No, this approach will not necessarily work. Consider the following graph:
A ---> C
^ |
| |
| v
B
Let's try running your proposed algorithm. We'll begin running the DFS at node A, and whenever we have a choice of which node to visit next, we'll pick the alphabetically first. The DFS will look like this:
Begin exploring A.
Begin exploring B.
Finish exploring B.
Begin exploring C.
Finish exploring C.
Finish exploring A.
That gives us the finishing times as B, C, A.
If we now run a DFS in the original graph, going in the order of finishing times, our first DFS will be from node B. That will find nodes A, B, and C, and so we'd incorrectly conclude that {A, B, C} is a strongly connected component of the graph, even though the SCCs here are actually {A, B} and {C}.
We could then ask - why doesn't this work? That is, what's so special about going in reverse? The answer is that if we run a DFS and record the finishing times of the nodes, the first node visited in each SCC will have the last finishing time of all the nodes in the SCC. That's why the very last node we find in the ordering must be in a source SCC, for example. On the other hand, the finishing times of the other nodes in the SCCs are not guaranteed to have any particular properties. (That's how I found this graph - I constructed it so that the first node to finish in one of the SCCs would be sequenced before a node in one of the sink SCCs).
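For what it's worth, here is a small Python sketch (the adjacency list is my own encoding of the graph above) that reproduces the failure: a forward DFS gives the finishing order B, C, A, and the second pass, started from the lowest finisher B, already reaches all three nodes.

graph = {'A': ['B', 'C'], 'B': ['A'], 'C': []}   # A <-> B, A -> C

def finishing_order(g):
    # Order in which the DFS calls finish, visiting children alphabetically.
    seen, order = set(), []
    def dfs(u):
        seen.add(u)
        for v in sorted(g[u]):
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in sorted(g):
        if u not in seen:
            dfs(u)
    return order

def reachable(g, start):
    # Plain DFS reachability from a single node.
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(g[u])
    return seen

order = finishing_order(graph)            # ['B', 'C', 'A']
print(order, reachable(graph, order[0]))  # the pass from B already collects {'A', 'B', 'C'}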
I am confused about this answer. Why can't DFS decide whether there is a cycle in a directed graph while visiting each node and each edge at most once? Using the white, gray, black method, one should be able to find a cycle if there is a back edge.
For an unconnected directed graph, why can't one do the following: run DFS from an arbitrary node v and visit as many nodes as v is connected to, then run DFS on another unvisited arbitrary node in the graph, if any, until all nodes are visited?
It seems to me that DFS should be able to find a cycle, if one exists, in at most O(|V|+|E|) time. Is this claim in the above-mentioned answer wrong?
"It is possible to visit a node multiple times in a DFS without a
cycle existing"
Moreover, as this other answer suggests, if a cycle exists, DFS should find it after exploring at most |V| edges, so the run time is really O(|V|).
What am I missing?
Update and conclusions:
Based on Pham Trung's comments, it looks like the "simple DFS" in that answer refers to a DFS starting from one node in a strongly connected graph. As I understand it, for the general case where the graph might be unconnected, the following statements should be true:
Using DFS and starting from an arbitrary unvisited node in an unconnected graph, it is true that each node might be visited more than once, but using white-gray-black coloring, a cycle, if one exists, will be correctly found.
The run time of such a DFS algorithm is O(d·|V| + |E|), where d is the maximum in-degree among all nodes (i.e. the maximum number of times we can visit each node using such a DFS-based algorithm).
Moreover, as this other answer suggests, if a cycle has not been found after exploring O(|V|) edges, it does not exist. So the runtime is really O(|V|).
Imagine we have this simple graph with these edges:
1 -> 3
2 -> 3
1 ----->3
        ^
        |
2--------
So, in our first DFS, we discover nodes 1 and 3. Then we continue with a DFS from node 2; now we encounter node 3 again, but is this a cycle? Obviously not.
One more example:
1 -> 3
1 -> 2
2 -> 3
1----->3
|      ^
|      |
|      |
v      |
2-------
So, starting with node 1, we visit node 3, backtrack to node 2, and now we encounter node 3 one more time; in this case it is not a cycle either.
As far as I understand, the simple depth-first search from Jay Conrod's answer means a normal, original DFS (only checking for connected components). In the same answer, he also described how to modify the simple DFS to detect the existence of a cycle, which is exactly the algorithm the OP has cited. And right below, another answer also mentioned a lemma from the famous Introduction to Algorithms book:
A directed graph G is acyclic if and only if a depth-first search of G yields no back edges
In short, the OP's understanding of how to detect a cycle in a directed graph is correct; it is just that some complexities and shortcuts have led to the misunderstanding.
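To make the conclusion concrete, here is a minimal Python sketch of the white/gray/black coloring discussed above (my own code, not taken from any of the cited answers): a cycle is reported exactly when a gray node, i.e. one still on the recursion stack, is reached again, and restarting from every remaining white node covers unconnected graphs.

WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    # graph: dict mapping each node to a list of its successors.
    color = {u: WHITE for u in graph}

    def dfs(u):
        color[u] = GRAY                  # u is on the current recursion stack
        for v in graph[u]:
            if color[v] == GRAY:         # back edge -> cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK                 # u and everything below it is finished
        return False

    # Restart from every still-white node so unconnected parts are covered too.
    return any(color[u] == WHITE and dfs(u) for u in graph)

print(has_cycle({1: [3], 2: [3], 3: []}))       # False (first example above)
print(has_cycle({1: [3, 2], 2: [3], 3: []}))    # False (second example above)
print(has_cycle({1: [2], 2: [3], 3: [1]}))      # True (an actual cycle)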
In a breadth first search of a directed graph (cycles possible), when a node is dequeued, all its children that have not yet been visited are enqueued, and the process continues until the queue is empty.
One time, I implemented it the other way around, where all of a node's children are enqueued, and the visitation status is checked instead when a node is dequeued. If a node being dequeued has been visited before, it is discarded and the process continues with the next node in the queue.
But the result was wrong. Wikipedia also says:
depth-first search ...... The non-recursive implementation is similar to breadth-first search but differs from it in two ways: it uses a stack instead of a queue, and it delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before pushing the vertex.
However, I cannot wrap my head around what exactly the difference is. Why does depth first search check when popping items off, while breadth first search must check before enqueuing?
DFS
Suppose you have a graph:
A---B---E
| |
| |
C---D
And you run a DFS from A.
You would expect it to visit the nodes A, B, D, C, E if using a depth first search (assuming a certain ordering of the children).
However, if you mark nodes as visited before placing them on the stack, then you will visit A,B,D,E,C because C was marked as visited when we examined A.
In some applications where you just want to visit all connected nodes this is a perfectly valid thing to do, but it is not technically a depth first search.
BFS
In breadth first search you can mark the nodes as visited either before or after pushing to the queue. However, it is more efficient to check before as you do not end up with lots of duplicate nodes in the queue.
I don't understand why your BFS code failed in this case; perhaps if you post the code it will become clearer?
DFS checks whether a node has been visited when popping it off the stack because it may have been visited at a "deeper" level. For example:
A--B--C--E
|     |
-------
If we start at A, then B and C will be put on the stack; assume we put them on the stack so that B will be processed first. When B is now processed, we want to go down to C and finally to E, which would not happen if we marked C as visited when we discovered it from A. Now once we proceed from B, we find the yet unvisited C and put it on the stack a second time. After we have finished processing E, all remaining C entries on the stack need to be ignored, which marking C as visited (when it is popped) takes care of for us.
As @PeterdeRivaz said, for BFS it's not a matter of correctness but of efficiency whether we check nodes for having been visited when enqueuing or when dequeuing.
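Here is a minimal Python sketch of the two conventions (my own code, reusing the first example graph above): the iterative DFS defers the visited check to pop time, which is exactly what lets C be reached again at a deeper level, while the BFS marks nodes before enqueuing them so the queue stays free of duplicates.

from collections import deque

def dfs_iterative(graph, start):
    # Check the visited flag when a node is POPPED off the stack.
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:          # a deeper path may have reached it already
            continue
        visited.add(node)
        order.append(node)
        for child in reversed(graph[node]):   # keep left-to-right child order
            stack.append(child)
    return order

def bfs(graph, start):
    # Check the visited flag BEFORE a node is enqueued.
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph[node]:
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return order

# The graph from the first example: A---B---E, A---C, B---D, C---D.
graph = {'A': ['B', 'C'], 'B': ['A', 'D', 'E'], 'C': ['A', 'D'],
         'D': ['B', 'C'], 'E': ['B']}
print(dfs_iterative(graph, 'A'))   # ['A', 'B', 'D', 'C', 'E']
print(bfs(graph, 'A'))             # ['A', 'B', 'C', 'D', 'E']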
I have a directed cyclic graph consisting of nodes a, b, c, d, e, f, g, where every node is connected to every other node. The edges may be unidirectional or bidirectional. I need to print out a valid order, e.g. f->a->c->b->e->d->g, such that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
Basically, first we can try to find a start node: a node such that there is no incoming edge to it (there can be at most one such node). I may find a start node or may not. Also, I will do some preprocessing to find the total number of nodes (let's call it n). Now I will start a DFS from the start node, marking nodes as visited when I reach them and counting how many nodes I have visited. If I can reach n nodes by this method, I am done. If I hit a node from which there are no outgoing edges to any unvisited node, I have hit a dead end, and I will just mark that node as unvisited again, reduce the pointer, and go back to its previous node to try a different route.
This was the case when I find a start node. If I don't find a start node, I will just have to try this with various nodes.
I have no idea if I am even close to the solution. Can anyone help me in this regard?
In my opinion, if there is no incoming edge to a node, that node is a start node. You can traverse the graph using this start node. And if this start node cannot visit all n nodes, then there is no solution (as you said that all the nodes must be present in the output list). This is because if you start from some other node, you won't be able to reach this start node.
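As a rough illustration of this answer (my own Python; the graph literal below just encodes the example ordering f->a->c->b->e->d->g from the question, not a fully connected graph): find the node with no incoming edge, if there is one, and check whether every node is reachable from it.

def find_start(graph):
    # The only possible start node is one with no incoming edge.
    has_incoming = {v for children in graph.values() for v in children}
    candidates = set(graph) - has_incoming
    return next(iter(candidates)) if candidates else None

def reaches_all(graph, start):
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u])
    return seen == set(graph)

graph = {'f': ['a'], 'a': ['c'], 'c': ['b'], 'b': ['e'], 'e': ['d'], 'd': ['g'], 'g': []}
start = find_start(graph)                 # 'f'
print(start, reaches_all(graph, start))   # f True: the necessary condition from this answer holds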
The problem with your solution is that if you enter a loop you don't know if and when to exit.
A DFS search in these conditions can easily become a non-polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated; I hope there's room for simplifications.
Here is my suggested solution:
1) For each node, construct the table of the nodes it can reach (if a can reach b and c, b can reach d, and c can reach e, then a can reach b, c, d, e, even though there is not a single path from a passing through all of them). A rough sketch of this step appears after the list.
If no node can reach all the other ones, you're done: the path you're looking for does not exist.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table at the previous point.
Once you have found one loop, you can shrink it (and its nodes) to a representative node whose incoming (outgoing) connections are the union of the incoming (outgoing) connections of the nodes in the loop.
You keep reducing loops until you cannot do any more.
3) At this point you are left with an acyclic graph. If there is a path connecting all nodes, there is a single node connected to all the others, and starting from it you can perform a depth first search.
4) Write down the path by replacing the traversal of representative nodes with a loop from the entry point of the loop to the exit point.
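A rough Python sketch of step 1 (my own code, using the small example from that step): for each node, compute the set of nodes it can reach; as noted in step 2, any node that can reach itself lies on a loop.

def reachability_table(graph):
    # Step 1: for each node, the set of nodes reachable from it.
    table = {}
    for start in graph:
        seen, stack = set(), list(graph[start])
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(graph[u])
        table[start] = seen
    return table

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['e'], 'd': [], 'e': []}
table = reachability_table(graph)
print(table['a'])                                  # {'b', 'c', 'd', 'e'}
print([u for u, r in table.items() if u in r])     # nodes lying on a loop (step 2): [] here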
I am trying to use BFS to get all paths between two nodes in a cyclic graph.
I found that BFS doesn't keep track of its previous node, so I need to add some other collection to achieve this.
The question is: should we avoid using BFS to get all paths between two nodes and use DFS instead, or can BFS also give a potential solution?
If it can, please provide the logic for it.
I'll assume you only want to find simple paths (with no cycles); otherwise there could be infinitely many paths. If this is not a requirement, either BFS or DFS would theoretically work (though DFS would just keep exploring one cycle and never find other shorter paths), but lifting this requirement doesn't really make sense.
With both BFS and DFS you should have a processed flag per node to prevent infinitely many paths in cyclic graphs.
Consider the following:
A with children B, C
B with children C, D
C with children B, D
With BFS, you'd miss A -> C -> B -> D because B would have been marked as processed before processing C.
With DFS, this wouldn't be a problem, because the processed flag should be reset when ascending back up the tree.
You could keep track of the entire path so far for each node, but this is not viable since it would require a lot of extra storage space and time.
You could get rid of the processed flag for BFS and stop processing when the number of nodes in a path exceeds the number of nodes in the graph, then remove all non-simple paths from the output.
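Here is a minimal Python sketch of the DFS variant described above (my own code): a node counts as "processed" only while it sits on the current path and is released when the search ascends back up, so A -> C -> B -> D is not lost.

def all_simple_paths(graph, start, goal):
    paths, path = [], []

    def dfs(node):
        path.append(node)                  # node is "processed" while on the current path
        if node == goal:
            paths.append(list(path))
        else:
            for child in graph.get(node, []):
                if child not in path:      # only skip nodes already on this path
                    dfs(child)
        path.pop()                         # release the mark when ascending back up

    dfs(start)
    return paths

graph = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['B', 'D']}   # the example above
for p in all_simple_paths(graph, 'A', 'D'):
    print(' -> '.join(p))
# A -> B -> C -> D, A -> B -> D, A -> C -> B -> D, A -> C -> D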