So I recently implemented a non-recursive version of DFS. It turns out that I can mark nodes "visited" either as soon as they are pushed on the stack or when they are popped off it. The problem I was working on specifically said to mark a node "visited" when it is pushed on the stack. Are both versions some kind of DFS, or is one DFS and the other not? Any suggestions are welcome.
What I think is that if I do it the second way, it will mimic the recursive DFS. But why does the other one work?
A recursive DFS (please ignore this):
dfsRec(node)
{
    visitedArray[node] = 1;          // mark the node on entry
    for all neighbours of node
        if neighbour isn't visited
            dfsRec(neighbour);
}
dfs(startNode)
{
    visitedArray = all zeros;        // nothing visited yet
    dfsRec(startNode);
}
The problem with the second way (i.e. marking nodes visited only when they are popped) shows up whenever your graph has a cycle. Because nodes are not marked visited until they are popped off the stack, any node reachable through a cycle can be pushed again before it is marked. If nothing stops those extra copies from being processed again, DFS keeps going in circles, pushing the same nodes again and again until you run out of memory.
Note that the issue is not too different from the recursive implementation of DFS: recursive implementation will cause stack overflow instead of running out of memory, but the reason for it would be the same.
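To make the repeated pushes concrete, here is a minimal Python sketch on a three-node cycle. The function name `dfs_mark_on_pop`, the adjacency-dict representation, and the guard that skips already-visited nodes at pop time are all assumptions for illustration, not the asker's code. With the guard, the search terminates, but a node can still sit on the stack more than once.

```python
def dfs_mark_on_pop(graph, start):
    """Iterative DFS that sets the visited flag only when a node is popped."""
    visited = set()
    order = []
    stack = [start]
    pushes = 1                      # count every push, including the start node
    while stack:
        node = stack.pop()
        if node in visited:         # assumed guard: drop stale duplicates
            continue
        visited.add(node)
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:   # not yet popped, so it may already be on the stack
                stack.append(nb)
                pushes += 1
    return order, pushes


# Hypothetical example graph: a triangle A-B-C-A.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
order, pushes = dfs_mark_on_pop(triangle, "A")
print(order, pushes)  # ['A', 'C', 'B'] 4
```

Three nodes produce four pushes here because B is pushed once from A and once from C before it is ever popped.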
So hopefully this is a simple question, but I can't seem to find the answer.
The time complexity of DFS is allegedly O(|V|+|E|). Now I'm having issues seeing why it depends on the number of edges. The usual explanation I've seen goes as follows:
Say we implement a DFS using an explicit stack (for simplicity). Say we have a graph where each node is connected to all the rest. We start at some node, visit it and then push all its neighbors onto the stack. Now we pop the next node and push all of its neighbors onto the stack. We repeat until we have visited all the nodes.
Let's pretend that in each iteration the node on top of the stack has not been visited yet (the best-case scenario for this graph). In this case we visit all the nodes in |V| moves, but for each of them we push |V|-1 nodes onto the stack, which means that all the edges are pushed onto the stack and the complexity is O(|E|).
A few notes. I'm arguing that the complexity is LESS than that so this proof that only looks at the best scenario for a worst case graph is fine. I'm also assuming that |E| is always larger than |V|. In fact, I'm assuming it's O(|V|^2). This means that O(|V|+|E|) and O(|E|) mean the same thing to me.
Ok, now here's my deal. What if we don't use an explicit stack?
The explosion here is due to the fact that we keep stacking up useless nodes that will never be processed. What if we instead just recurse? The advantage is that we can check if we're done before each recursive call.
Since there's no explicit stack and I'm still only visiting nodes I haven't seen before, I don't see how I can exceed the complexity of O(|V|).
"The explosion here is due to the fact that we keep stacking up useless nodes that will never be processed. What if we instead just recurse? The advantage is that we can check if we're done before each recursive call."
That check still contributes to the run time. For each node you visit, you need to see which of its neighbors still need to be visited, which means checking each adjacent edge.
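A small Python sketch can make this visible by counting the checks. The function `dfs_count`, the `counts` dictionary, and the six-node complete graph are assumptions chosen for illustration. Each node is visited once, yet every adjacency entry is still inspected.

```python
def dfs_count(graph, node, visited, counts):
    """Recursive DFS that tallies node visits and neighbor checks."""
    visited.add(node)
    counts["visits"] += 1
    for nb in graph[node]:
        counts["edge_checks"] += 1      # the "do I still need this neighbor?" test
        if nb not in visited:
            dfs_count(graph, nb, visited, counts)


# Hypothetical complete graph on 6 nodes: every node is adjacent to the other 5.
n = 6
complete = {u: [v for v in range(n) if v != u] for u in range(n)}
counts = {"visits": 0, "edge_checks": 0}
dfs_count(complete, 0, set(), counts)
print(counts)  # {'visits': 6, 'edge_checks': 30}
```

The 6 visits correspond to |V|, while the 30 checks correspond to the 6 * 5 adjacency entries, i.e. each undirected edge examined from both ends.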
From a topcoder article:
"In BFS We mark a vertex visited as we push it into the queue, not as
we pop it in case of DFS."
NOTE: This refers to a DFS implementation using an explicit stack (pseudo-DFS).
My question is: why is that so? In BFS, why can't we mark a vertex visited after popping it from the queue, instead of while pushing it onto the queue?
Your confusion probably comes from thinking about trees too much, but BFS and DFS can be run on any graph. Consider for example a graph with a loop like A-B-C-A. If you go breadth-first starting from A, you will first add B and C to the list. Then, you will pop B and, unless they were marked as visited, you will add C and A to the list, which is obviously wrong. If instead you go depth first from A, you will then visit B and from there go to C and then to A, unless A was already marked as visited.
So, in summary, you need to mark a vertex as seen as soon as you first see it, no matter which algorithm you take. However, if you only consider DAGs, you will find that things get a bit easier, because there you simply don't have any loop like the above. Anyway, the whole point is that you don't get stuck in a loop, and for that there are multiple variants. Setting a flag is one way, checking a set of visited vertices is another and in some cases like trees, you don't need to do anything but just iterate the edges in order.
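The A-B-C-A loop above can be tried out in a few lines of Python. This is only a sketch: the `bfs` function, its `mark_on_enqueue` switch, and the skip-on-dequeue guard in the second mode are assumptions for illustration. It compares how many entries enter the queue under each marking strategy.

```python
from collections import deque


def bfs(graph, start, mark_on_enqueue):
    """BFS that marks nodes either when enqueued or when dequeued."""
    visited = set()
    order = []
    queue = deque([start])
    enqueued = 1
    if mark_on_enqueue:
        visited.add(start)
    while queue:
        node = queue.popleft()
        if not mark_on_enqueue:
            if node in visited:         # assumed guard: discard duplicates
                continue
            visited.add(node)
        order.append(node)
        for nb in graph[node]:
            if nb not in visited:
                if mark_on_enqueue:
                    visited.add(nb)
                queue.append(nb)
                enqueued += 1
    return order, enqueued


# The loop A-B-C-A from the answer, as an adjacency dict.
cycle = {"A": ["B", "C"], "B": ["C", "A"], "C": ["A", "B"]}
print(bfs(cycle, "A", mark_on_enqueue=True))   # (['A', 'B', 'C'], 3)
print(bfs(cycle, "A", mark_on_enqueue=False))  # (['A', 'B', 'C'], 4)
```

In this sketch, marking on dequeue puts C in the queue twice (once from A, once from B); the guard on dequeue is what discards the second copy.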
I am looking at the non-recursive DFS and BFS of a general graph. Besides the fact that the former uses a stack instead of a queue, the only difference is that it "delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before pushing the vertex." Why is this "visited" check order different? Or put it another way, can we change BFS to non-recursive DFS by simply replacing queue in BFS with stack?
I checked all posts I can find such as this and this, but none clarifies this question.
Yes, that is the only difference.
The DFS algorithm you show from Wikipedia has a bug (or at least a serious inefficiency) in it: it will reinsert into S nodes which have already been visited. The BFS one is more sensibly designed, and you could change it to use a stack.
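As a rough illustration of swapping the container, the Python sketch below uses one loop for both cases and marks a node as soon as it is discovered. The name `traverse`, the `use_stack` flag, and the sample graph are assumptions, not the Wikipedia pseudocode.

```python
from collections import deque


def traverse(graph, start, use_stack):
    """Visit every reachable node once; the container decides the order."""
    frontier = deque([start])
    seen = {start}                       # mark on discovery, before pushing
    order = []
    while frontier:
        # pop() takes the newest entry (stack), popleft() the oldest (queue)
        node = frontier.pop() if use_stack else frontier.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return order


# Hypothetical directed graph used only for this example.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(traverse(graph, 1, use_stack=False))  # [1, 2, 3, 4]
print(traverse(graph, 1, use_stack=True))   # [1, 3, 4, 2]
```

Both calls visit each node exactly once and never hold duplicates in the container. Whether the stack version's order matches a recursive DFS on a given graph is a separate question.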
I saw this from an answer to another question
IVlad says that the stack will contain the cycle. But while searching through a graph, wouldn't the nodes that make up the cycle have been popped off in the process?
Maybe he meant in a visited nodes stack? But even then, the visited stack does not cleanly contain the cycle. What I mean is that although the cycle is there, it could have other visited nodes sandwiched between the cycle no?
When you use DFS to find a cycle in a graph, you usually implement the DFS recursively. Recursive methods use the call stack to store their data, so when you find a cycle inside the recursive method, you have the whole path that leads to the current node. IVlad means the running program's call stack, not an explicit stack that you use to implement your DFS.
Also you can store the nodes (Path) in another stack.
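One way to keep the path in a separate stack is sketched below in Python for a directed graph. The function `find_cycle`, the three-colour marking, and the sample graph are assumptions for illustration. The `path` list mirrors the call stack, so finished nodes are popped from it and cannot end up sandwiched inside the reported cycle.

```python
def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited, on the current path, finished
    color = {v: WHITE for v in graph}
    path = []                           # explicit copy of the recursion path

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nb in graph[node]:
            if color[nb] == GRAY:       # back edge: nb is still on the path
                return path[path.index(nb):] + [nb]
            if color[nb] == WHITE:
                found = visit(nb)
                if found:
                    return found
        path.pop()                      # node is finished, so it leaves the path
        color[node] = BLACK
        return None

    for v in graph:
        if color[v] == WHITE:
            found = visit(v)
            if found:
                return found
    return None


# Hypothetical graph: C is a dead end visited before the cycle B -> D -> E -> B.
g = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["E"], "E": ["B"]}
print(find_cycle(g))  # ['B', 'D', 'E', 'B']
```

C is visited before the cycle is found, but it has already been popped from `path` by then, so the result contains only the cycle's own nodes.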
In a breadth first search of a directed graph (cycles possible), when a node is dequeued, all its children that have not yet been visited are enqueued, and the process continues until the queue is empty.
One time, I implemented it the other way around, where all of a node's children are enqueued, and the visitation status is checked instead when a node is dequeued. If a node being dequeued has been visited before, it is discarded and the process continues with the next node in the queue.
But the result is wrong. Wikipedia also says
depth-first search ...... The non-recursive implementation is similar
to breadth-first search but differs from it in two ways: it uses a
stack instead of a queue, and it delays checking whether a vertex has
been discovered until the vertex is popped from the stack rather than
making this check before pushing the vertex.
However, I cannot wrap my head around what exactly is the difference. Why does depth first search check when popping items out and breadth first search must check before enqueuing?
DFS
Suppose you have a graph:
A---B---E
|   |
|   |
C---D
And you search DFS from A.
You would expect it to search the nodes A,B,D,C,E if using a depth first search (assuming a certain ordering of the children).
However, if you mark nodes as visited before placing them on the stack, then you will visit A,B,D,E,C because C was marked as visited when we examined A.
In some applications where you just want to visit all connected nodes this is a perfectly valid thing to do, but it is not technically a depth first search.
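The two orders can be reproduced with a short Python sketch. The function `dfs`, its `mark_on_push` switch, and the neighbor ordering in the adjacency dict are assumptions chosen so that B is processed before C, matching the "certain ordering of the children" above.

```python
def dfs(graph, start, mark_on_push):
    """Stack-based traversal that marks nodes on push or on pop."""
    visited = set()
    order = []
    stack = [start]
    if mark_on_push:
        visited.add(start)
    while stack:
        node = stack.pop()
        if not mark_on_push:
            if node in visited:          # stale duplicate, already handled deeper
                continue
            visited.add(node)
        order.append(node)
        # Push in reverse so the first listed neighbor is popped first.
        for nb in reversed(graph[node]):
            if nb not in visited:
                if mark_on_push:
                    visited.add(nb)
                stack.append(nb)
    return order


# The graph drawn above: edges A-B, B-E, A-C, B-D, C-D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": ["B"],
}
print(dfs(graph, "A", mark_on_push=False))  # ['A', 'B', 'D', 'C', 'E']
print(dfs(graph, "A", mark_on_push=True))   # ['A', 'B', 'D', 'E', 'C']
```

With marking on pop, C is reached through D, as a recursive DFS would do. With marking on push, C was claimed while A was being examined, so D cannot descend into it and C is visited last.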
BFS
In breadth first search you can mark the nodes as visited either before or after pushing to the queue. However, it is more efficient to check before as you do not end up with lots of duplicate nodes in the queue.
I don't understand why your BFS code failed in this case, perhaps if you post the code it will become clearer?
DFS checks whether a node has been visited when popping it because it may have been visited at a "deeper" level in the meantime. For example:
A--B--C--E
|     |
-------
If we start at A, then B and C will be put on the stack; assume we put them on the stack so B will be processed first. When B is now processed, we want to go down to C and finally to E, which would not happen if we marked C as visited when we discovered it from A. Now once we proceed from B, we find the yet unvisited C and put it on the stack a second time. After we finished processing E, all C entries on the stack need to be ignored, which marking as visited will take care of for us.
As @PeterdeRivaz said, for BFS it's not a matter of correctness but of efficiency whether we check nodes for having been visited when enqueuing or when dequeuing.