Breadth first search: the timing of checking visitation status - algorithm

In a breadth first search of a directed graph (cycles possible), when a node is dequeued, all its children that has not yet been visited are enqueued, and the process continues until the queue its empty.
One time, I implement it the other way around, where all a node's children are enqueued, and the visitation status is checked instead when a node is dequeued. If a node being dequeued has been visited before, it is discarded and the process continue to the next in queue.
But the result is wrong. Wikipedia also says
depth-first search ...... The non-recursive implementation is similar
to breadth-first search but differs from it in two ways: it uses a
stack instead of a queue, and it delays checking whether a vertex has
been discovered until the vertex is popped from the stack rather than
making this check before pushing the vertex.
However, I cannot wrap my head around what exactly is the difference. Why does depth first search check when popping items out and breadth first search must check before enqueuing?

DFS
Suppose you have a graph:
A---B---E
| |
| |
C---D
And you search DFS from A.
You would expect it to search the nodes A,B,D,C,E if using a depth first search (assuming a certain ordering of the children).
However, if you mark nodes as visited before placing them on the stack, then you will visit A,B,D,E,C because C was marked as visited when we examined A.
In some applications where you just want to visit all connected nodes this is a perfectly valid thing to do, but it is not technically a depth first search.
BFS
In breadth first search you can mark the nodes as visited either before or after pushing to the queue. However, it is more efficient to check before as you do not end up with lots of duplicate nodes in the queue.
I don't understand why your BFS code failed in this case, perhaps if you post the code it will become clearer?

DFS checks whether a node has been visited when dequeing because it may have been visited at a "deeper" level. For example:
A--B--C--E
| |
-------
If we start at A, then B and C will be put on the stack; assume we put them on the stack so B will be processed first. When B is now processed, we want to go down to C and finally to E, which would not happen if we marked C as visited when we discovered it from A. Now once we proceed from B, we find the yet unvisited C and put it on the stack a second time. After we finished processing E, all C entries on the stack need to be ignored, which marking as visited will take care of for us.
As #PeterdeRivaz said, for BFS it's not a matter of correctness, but efficiency whether we check nodes for having been visited when enqueuing or dequeuing.

Related

Efficient Graph Traversal for Node Editor Evaluation

I have a directed acyclic graph created by users, where each node (vertex) of the graph represents an operation to perform on some data. The outputs of a node depend on its inputs (obviously), and that input is provided by its parents. The outputs are then passed on to its children. Cycles are guaranteed to not be present, so can be ignored.
This graph works on the same principle as the Shader Editor in Blender. Each node performs some operation on its input, and this operation can be arbitrarily expensive. For this reason, I only want to evaluate these operations when strictly required.
When a node is updated, via user input or otherwise, I need to reevaluate every node which depends on the output of the updated node. However, given that I can't justify evaluating the same node multiple times, I need a way to determine the correct order to update the nodes. A basic breadth-first traversal doesn't solve the problem. To see why, consider this graph:
A traditional breadth-first traversal would result in D being evaluated prior to B, despite D depending on B.
I've tried doing a breadth-first traversal in reverse (that is, starting with the O1 and O2 nodes, and traversing up the graph), but I seem to run into the same problem. A reversed breadth-first traversal will visit D before B, thus I2 before A, resulting in I2 being ordered after A, despite A depending on I2.
I'm sure I'm missing something relatively simple here, and I feel as though the reverse traversal is key, but I can't seem to wrap my head around it and get all the pieces to fit. I suppose one potential solution is to use the reverse traversal as intended, but rather than avoiding visiting each node more than once, just visiting each node each time it comes up, ensuring that it has a definitely correct ordering. But visiting each node multiple times and the exponential scaling that comes with that is a very unattractive solution.
Is there a well-known efficient algorithm for this type of problem?
Yes, there is a well known efficient algorithm. It's topological sorting.
Create a dictionary with all nodes and their corresponding in-degree, let's call it indegree_dic. in-degree is the number of parents/or incoming edges to that node. Have a set S of the nodes with in-degree equal to zero.
Taken from the Wikipedia page with some modification:
L ← Empty list that will contain the nodes sorted topologically
S ← Set of all nodes with no incoming edge that haven't been added to L yet
while S is not empty do
remove a node n from S
add n to L
for each child node m of n do
decrement m's indegree
if indegree_dic[m] equals zero then
delete m from indegree_dic
insert m into S
if indegree_dic has length > 0 then
return error (graph is not a DAG)
else
return L (a topologically sorted order)
This sort is not unique. I mention that because it has some impact on your algorithm.
Now, whenever a change happens to any of the nodes, you can safely avoid recalculation of any nodes that come before the changed node in your topologically sorted list, but need to nodes that come after it. You can be sure that all the parents are processed before their children if you follow the sorted list in your calculation.
This algorithm is not optimal, as there could be nodes after the changed node, that are not children of that node. Like in the following scenario:
A
/ \
B C
One correct topological sort would be [A, B, C]. Now, suppose B changes. You skip A because nothing has changed for it, but recalculate C because it comes after B. But you actually don't need to, because B has no effect on C whatsoever.
If the impact of this isn't big, you could use this algorithm and keep the implementation easier and less prone to bugs. But if efficiency is key, here are some ideas that may help:
You can do a topological sort each time and include the which node has change as a factor. When choosing nodes from S in the above algorithm, choose every other node that you can before you choose the changed node. In other words, you choose the changed node from S only when S has length 1. This guarantees that you process every node that isn't below the hierarchy of the changed node before it. This approach helps when the sorting is much cheaper then processing the nodes.
Another approach, which I'm not entirely sure is correct, is to look after the changed node in the topological sorted list and start processing only when you reach the first child of the changed node.
Another way relies on idea 1 but is helpful if you can do some pre-processing. You can create topological sorts for each case of one node being changed. When a node is changed, you try to put it in the ordering as late as possible. You save all these ordering in a node to ordering dictionary and based on which node has changed you choose that ordering.

How does Depth First-Search visit a node?

An algorithm may come upon a node a second time, i.e. there might be two paths to the node. The algorithm needs to know which path was shorter.
When the Best First Search reaches a node that it has visited before, it is possible that the previous visit had a longer path. When this happens, the open and closed lists need updating.
This can not happen with an A* search.
Question: Can this happen with a DFS?
The answer is yes, but I thought it was no. Why is it yes? I thought that once a node has been visited it wont go back to it.
DFS strategy will visit a node as many times as it finds a path to it. It wouldn't continue visiting from that node down, but it would register the visit itself. This is essential for DFS edge classification.
For example, consider running DFS on this graph:
When you reach node C for the first time, the path is A-B-C. When you reach C for the second time, the path is A-C, which is shorter.
If you have a graph like this
A
|\
B \
| E
C /
|/
D
and traverse it depth-first, from left to right, the following pathes will be visited in that order:
A
AB
ABC
ABCD
AE
AED
You see, the first visit of D is on a longer path (ABCD) than the second one (AED).

Dfs Vs Bfs confusion

From a topcoder article:
"In BFS We mark a vertex visited as we push it into the queue, not as
we pop it in case of DFS."
NOTE: This is said in case of dfs implementation using explicit stack.(pseudo dfs).
My question is why so? why we can not mark a vertex visited after popping from queue, instead while pushing onto the queue in case of bfs ?
Your confusion probably comes from thinking about trees too much, but BFS and DFS can be run on any graph. Consider for example a graph with a loop like A-B-C-A. If you go breadth-first starting from A, you will first add B and C to the list. Then, you will pop B and, unless they were marked as visited, you will add C and A to the list, which is obviously wrong. If instead you go depth first from A, you will then visit B and from there go to C and then to A, unless A was already marked as visited.
So, in summary, you need to mark a vertex as seen as soon as you first see it, no matter which algorithm you take. However, if you only consider DAGs, you will find that things get a bit easier, because there you simply don't have any loop like the above. Anyway, the whole point is that you don't get stuck in a loop, and for that there are multiple variants. Setting a flag is one way, checking a set of visited vertices is another and in some cases like trees, you don't need to do anything but just iterate the edges in order.

How can I do this graph traversal?

I have a Directed Cyclic graph consisting of node a, b, c, d, e,f g, where ever node is connected to every other node. The edges may be unidirectional or bidirectional. I need to printout a valid order like this for eg. f->a->c->b->e->d->g such that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
Basically first we can try to find a start node. If there is a node such that there is no incoming edge to it (there could be atmost one such node). I may find a start node or may not. Also I will do some preprocessing to find the total number of nodes(lets call it n). Now I will start a DFS from the start node marking nodes as visited when I reach them and counting how many nodes I visited. If I can reach n nodes by this method. I am done. If I hit a node, from which there are no outgoing edges to any unvisited node, I have hit a dead end, and I will just mark that node as unvisited again, reduce the pointer and go to its previous node to try a different route.
This was the case when I find a start node. If I dont find a start node, I will just have to try this with various nodes.
I have no idea if I am even close to the solution. Can anyone help me in this regard?
In my opinion, if there is no incoming edge to a node, it means that node is a start node. You can traverse the graph using this start node. And if this start node can not visit all the n nodes, then there is no solution (as you said that all the nodes must be present in the output list.). This is because if you start with some other nodes, you won't be able to reach this start node.
The problem with your solution is that if you enter a loop you don't know if and when to exit.
A DFS search in these conditions can easily became a non polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated I hope there's room for simplifications.
Here my suggested solution
1) For each node construct the table of the nodes it can reach (if a can reach b and c; b can reach d; c can reach e; a can reach b,c,d,e even tough there is not a single pathfrom a passing through all of them).
If no node can reach all the other ones you're done: there is no the path you're looking for.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table at the previous point.
Once you have find one loop you can shrink it (and its nodes) to the representative node whose ingoing (outgoing) connections are the union of the ingoing (outgoing) connections of the nodes in the loop.
You keep reducing loops until you cannot do any more.
3) At this point you are left with an acyclic graph, If there is a path connecting all nodes, there is a single node connected to all and starting from it you can perform depth first search.
4)
Write down the path by replacing the traversal of representative nodes with a loop from the entry point of the loop to the exit point.

Can someone explain Breadth-first search?

Can someone explain Breadth-first search to solve the below kind of problems
I need to find all the paths between 4 and 7
You look at all the nodes adjacent to your start node. Then you look at all the nodes adjacent to those (not returning to nodes you've already looked at). Repeat until satisfying node found or no more nodes.
For the kind of problem you indicate, you use the process described above to build a set of paths, terminating any that arrive at the desired destination node, and when your graph is exhausted, the set of paths that so terminated is your solution set.
Breadth first search (BFS) means that you process all of your starting nodes' direct children, before going to deeper levels. BFS is implemented with a queue which stores a list of nodes that need to be processed.
You start the algorithm by adding your start node to the queue. Then you continue to run your algorithm until you have nothing left to process in the queue.
When you dequeue an element from the queue, it becomes your active node. When you are processing your active node, you add all of its children to the back of the queue. You terminate the algorithm when you have an empty queue.
Since you are dealing with a graph instead of a tree, you need to store a list of your processed nodes so you don't enter into a loop.
Once you reach node 7 you have a matching path and you can stop doing the BFS for that recursive path. Meaning that you simply don't add its children to the queue. To avoid loops and to know the exact path you've visited, as you do your recursive BFS calls, pass up the already visited nodes.
Think of it as websites with links to other sites on them. Let A be our root node or our starting website.
A is the starting page (Layer 1)
A links to AA, AB, AC (Layer 2)
AA links to AAA, AAB, AAC (Layer 3)
AB links to ABA, ABB, ABC (Layer 3)
AC links to ACA, ACB, ACC (Layer 3)
This is only three layers deep. You search one layer at a time. So in this case you would start at layer A. If that does not match you go to the next layer, AA, AB and AC. If none of those websites is the one you are searching for then you follow the links and go to the next layer. In other words, you look at one layer at a time.
A depth first search (its complement) you would go from A to AA to AAA. In other words you would go DEEP before going WIDE.
You test each node connected to the root node. Then you test each node connected to the previous nodes. So on, until you find your answer.
Basically, each iteration tests nodes that are the same distance away from the root node.
BEGIN
4;
4,2;
4,2,1; 4,2,3; 4,2,5;
4,2,1;(fail) 4,2,3,6; 4,2,5,6; 4,2,5,11;
4,2,3,6,7;(pass) 4,2,3,6,8; 4,2,5,6,7;(pass) 4,2,5,6,8; 4,2,5,11,12;
4,2,3,6,8,9; 4,2,3,6,8,10; 4,2,5,6,8,9; 4,2,5,6,8,10; 4,2,5,11,12;(fail)
4,2,3,6,8,9;(fail) 4,2,3,6,8,10;(fail) 4,2,5,6,8,9;(fail) 4,2,5,6,8,10;(fail)
END

Resources