From a topcoder article:
"In BFS We mark a vertex visited as we push it into the queue, not as
we pop it in case of DFS."
NOTE: This is said about a DFS implementation that uses an explicit stack (pseudo-DFS).
My question is: why? In BFS, why can't we mark a vertex as visited after popping it from the queue, instead of when pushing it onto the queue?
Your confusion probably comes from thinking about trees too much, but BFS and DFS can be run on any graph. Consider for example a graph with a loop like A-B-C-A. If you go breadth-first starting from A, you will first add B and C to the list. Then, you will pop B and, unless they were marked as visited, you will add C and A to the list, which is obviously wrong. If instead you go depth first from A, you will then visit B and from there go to C and then to A, unless A was already marked as visited.
So, in summary, you need to mark a vertex as seen as soon as you first see it, no matter which algorithm you take. However, if you only consider DAGs, you will find that things get a bit easier, because there you simply don't have any loop like the above. Anyway, the whole point is that you don't get stuck in a loop, and for that there are multiple variants. Setting a flag is one way, checking a set of visited vertices is another and in some cases like trees, you don't need to do anything but just iterate the edges in order.
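To make the A-B-C-A example concrete, here is a minimal Python sketch. The function name `bfs`, the `use_visited` flag and the `max_pops` cap are my own choices for illustration; the cap only exists so that the unmarked variant stops at all.

```python
from collections import deque

def bfs(graph, start, use_visited=True, max_pops=8):
    """Return the vertices in the order they are popped (at most max_pops of them)."""
    visited = {start}          # the start vertex is marked when it is pushed
    queue = deque([start])
    order = []
    while queue and len(order) < max_pops:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if use_visited:
                if w in visited:
                    continue   # already seen: do not push it again
                visited.add(w) # mark as soon as we first see it
            queue.append(w)
    return order

# The loop A-B-C-A as an undirected graph.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

print(bfs(triangle, "A"))                                 # ['A', 'B', 'C']
print(bfs(triangle, "A", use_visited=False, max_pops=6))  # ['A', 'B', 'C', 'A', 'C', 'A']
```

Without the marking, the queue never empties on this graph; only the cap stops the second call.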
I have a very interesting problem regarding graphs and friendship networks. It is as follows:
A teacher wants to make sure that his students aren't cheating by making sure no pair of people who know each other get the same homework. He believes he can do this by making only two different versions of the homework. Design an algorithm (pseudo code) to test whether this is possible or not for a given graph of students and their connections. The algorithm should be based on DFS or BFS.
I approached the problem using BFS and modifying it by comparing to already visited nodes
Modified BFS(v)
    Mark v as visited
    Enqueue v
    While queue is not empty
        Dequeue v
        Assign either homework to v
        For all unvisited neighbors a of v
            Give them the opposite homework
            Mark them as visited
            Add them to queue
        For all visited neighbors b of v
            Check if their homework is the same as v, terminate if it is
This algorithm seems to work for small handmade graphs where I can write out every step. Is it bound to work for all such graphs? If the algorithm is correct, is the pseudo code also acceptable? I am a little uncertain about the order of enqueueing/dequeueing and when it happens, as well as whether I can make the comparison on the last row without, say, a temporary variable to keep track of the homework for the "current" node.
I would appreciate any help/input, and if you have ideas for another algorithm (for example using DFS), I would appreciate that as well.
You made a mistake: the "Assign either homework to v" step goes before the loop, not inside it.
Then every vertex in the queue will already have homework assigned, and there is never any choice as to which homework to assign to neighbors. This obviously works for all connected graphs. It's a good algorithm.
You need to handle unconnected graphs as well, with a loop over all vertices that runs BFS(v) when v is unvisited.
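As a rough Python sketch of the corrected algorithm, including the outer loop for disconnected graphs: the function name `two_versions_possible` and the dict-of-lists graph format are assumptions of mine, and the two homework versions are simply encoded as 0 and 1.

```python
from collections import deque

def two_versions_possible(graph):
    """graph: dict mapping each student to the list of students they know (symmetric)."""
    homework = {}                      # student -> 0 or 1; doubles as the "visited" marker
    for start in graph:                # outer loop handles disconnected graphs
        if start in homework:
            continue
        homework[start] = 0            # assigned before the BFS loop, not inside it
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in homework:
                    homework[w] = 1 - homework[v]   # opposite version
                    queue.append(w)
                elif homework[w] == homework[v]:
                    return False       # two acquaintances got the same version
    return True

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(two_versions_possible(square))    # True
print(two_versions_possible(triangle))  # False
```

No temporary variable is needed for the "current" node, because every dequeued vertex already has its homework stored.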
I have a directed acyclic graph created by users, where each node (vertex) of the graph represents an operation to perform on some data. The outputs of a node depend on its inputs (obviously), and that input is provided by its parents. The outputs are then passed on to its children. Cycles are guaranteed to not be present, so can be ignored.
This graph works on the same principle as the Shader Editor in Blender. Each node performs some operation on its input, and this operation can be arbitrarily expensive. For this reason, I only want to evaluate these operations when strictly required.
When a node is updated, via user input or otherwise, I need to reevaluate every node which depends on the output of the updated node. However, given that I can't justify evaluating the same node multiple times, I need a way to determine the correct order to update the nodes. A basic breadth-first traversal doesn't solve the problem. To see why, consider this graph:
A traditional breadth-first traversal would result in D being evaluated prior to B, despite D depending on B.
I've tried doing a breadth-first traversal in reverse (that is, starting with the O1 and O2 nodes, and traversing up the graph), but I seem to run into the same problem. A reversed breadth-first traversal will visit D before B, thus I2 before A, resulting in I2 being ordered after A, despite A depending on I2.
I'm sure I'm missing something relatively simple here, and I feel as though the reverse traversal is key, but I can't seem to wrap my head around it and get all the pieces to fit. I suppose one potential solution is to use the reverse traversal as intended, but rather than avoiding visiting each node more than once, just visiting each node each time it comes up, ensuring that it has a definitely correct ordering. But visiting each node multiple times and the exponential scaling that comes with that is a very unattractive solution.
Is there a well-known efficient algorithm for this type of problem?
Yes, there is a well known efficient algorithm. It's topological sorting.
Create a dictionary that maps every node to its in-degree; let's call it indegree_dic. The in-degree is the number of parents (incoming edges) of that node. Keep a set S of the nodes with in-degree equal to zero.
Taken from the Wikipedia page with some modification:
L ← Empty list that will contain the nodes sorted topologically
S ← Set of all nodes with no incoming edge that haven't been added to L yet
while S is not empty do
    remove a node n from S
    delete n from indegree_dic
    add n to L
    for each child node m of n do
        decrement indegree_dic[m]
        if indegree_dic[m] equals zero then
            insert m into S
if indegree_dic has length > 0 then
    return error (graph is not a DAG)
else
    return L (a topologically sorted order)
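The in-degree approach above can be sketched in Python roughly as follows. The name `topological_sort` and the input format (a dict that maps every node, including leaf nodes, to a list of its children) are assumptions for the example; the cycle check compares list lengths rather than deleting dictionary entries.

```python
from collections import deque

def topological_sort(children):
    """children: dict mapping every node to a list of its child nodes."""
    indegree = {n: 0 for n in children}
    for n in children:
        for m in children[n]:
            indegree[m] += 1
    ready = deque(n for n in children if indegree[n] == 0)   # the set S
    order = []                                               # the list L
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in children[n]:
            indegree[m] -= 1
            if indegree[m] == 0:       # all parents of m are already in the order
                ready.append(m)
    if len(order) != len(children):    # some node never reached in-degree zero
        raise ValueError("graph is not a DAG")
    return order

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topological_sort(dag))  # ['A', 'B', 'C', 'D']
```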
This sort is not unique. I mention that because it has some impact on your algorithm.
Now, whenever a change happens to any of the nodes, you can safely skip recalculating any node that comes before the changed node in your topologically sorted list, but you need to recalculate the nodes that come after it. You can be sure that all parents are processed before their children if you follow the sorted list in your calculation.
This algorithm is not optimal, as there could be nodes after the changed node, that are not children of that node. Like in the following scenario:
  A
 / \
B   C
One correct topological sort would be [A, B, C]. Now, suppose B changes. You skip A because nothing has changed for it, but recalculate C because it comes after B. But you actually don't need to, because B has no effect on C whatsoever.
If the impact of this isn't big, you could use this algorithm and keep the implementation easier and less prone to bugs. But if efficiency is key, here are some ideas that may help:
1. You can redo the topological sort each time and take into account which node has changed. When choosing nodes from S in the above algorithm, choose every other node that you can before you choose the changed node. In other words, you choose the changed node from S only when it is the only node left in S. This guarantees that every node that is not below the changed node in the hierarchy comes before it in the ordering. This approach helps when the sorting is much cheaper than processing the nodes.
2. Another approach, which I'm not entirely sure is correct, is to look past the changed node in the topologically sorted list and start processing only when you reach the first child of the changed node.
3. Another way relies on idea 1 but is helpful if you can do some pre-processing. You can create a topological sort for each case of one node being changed. When a node is changed, you try to put it in the ordering as late as possible. You save all these orderings in a node-to-ordering dictionary and, based on which node has changed, you choose the corresponding ordering.
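A different option, which is not one of the ideas listed above and is only a sketch on my part: collect the descendants of the changed node with a plain traversal, then keep only those nodes from an already computed topological order. The function name `nodes_to_recompute` and the dict-of-children format are assumptions. On the A/B/C example it skips C when B changes.

```python
def nodes_to_recompute(children, order, changed):
    """Return the changed node and its descendants, in the given topological order."""
    affected = {changed}
    stack = [changed]
    while stack:                       # plain DFS over the children edges
        n = stack.pop()
        for m in children[n]:
            if m not in affected:
                affected.add(m)
                stack.append(m)
    # Filtering keeps the parent-before-child guarantee of the original order.
    return [n for n in order if n in affected]

children = {"A": ["B", "C"], "B": [], "C": []}
order = ["A", "B", "C"]                # one valid topological order
print(nodes_to_recompute(children, order, "B"))  # ['B']
print(nodes_to_recompute(children, order, "A"))  # ['A', 'B', 'C']
```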
So hopefully this is a simple question, but I can't seem to find the answer.
The time complexity of DFS is allegedly O(|V|+|E|). Now I'm having issues seeing why it depends on the number of edges. The usual explanation I've seen goes as follows:
Say we implement a DFS using an explicit stack (for simplicity). Say we have a graph where each node is connected to all the rest. We start at some node, visit it and then push all its neighbors onto the stack. Now we pop the next node and push all of its neighbors onto the stack. We repeat until we have visited all the nodes.
Let's pretend that in each iteration the node on top of the stack has not been visited yet (best case scenario for this graph). In this case we visit all the nodes in |V| moves, but for each of them we push |V|-1 nodes onto the stack, which means that all the edges are pushed onto the stack and the complexity is O(|E|).
A few notes. I'm arguing that the complexity is LESS than that so this proof that only looks at the best scenario for a worst case graph is fine. I'm also assuming that |E| is always larger than |V|. In fact, I'm assuming it's O(|V|^2). This means that O(|V|+|E|) and O(|E|) mean the same thing to me.
Ok, now here's my deal. What if we don't use an explicit stack?
The explosion here is due to the fact that we keep stacking up useless nodes that will never be processed. What if we instead just recurse? The advantage is that we can check if we're done before each recursive call.
Since there's no explicit stack and I'm still only visiting nodes I haven't seen before, I don't see how I can exceed the complexity of O(|V|).
"The explosion here is due to the fact that we keep stacking up useless nodes that will never be processed. What if we instead just recurse? The advantage is that we can check if we're done before each recursive call."
That check still contributes to the run time. For each node you visit, you need to see which of its neighbors still need to be visited, which means checking each adjacent edge.
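To see this in numbers, here is a small recursive DFS in Python that counts how many neighbor checks it performs. The names `dfs_count` and `checks` are made up for this sketch, and the complete graph on 6 vertices is the "each node is connected to all the rest" case from the question.

```python
def dfs_count(graph, start):
    """Recursive DFS; returns (number of visited nodes, number of neighbor checks)."""
    visited = set()
    checks = 0

    def visit(v):
        nonlocal checks
        visited.add(v)
        for w in graph[v]:
            checks += 1            # one check per adjacent edge, visited or not
            if w not in visited:
                visit(w)

    visit(start)
    return len(visited), checks

n = 6
complete = {v: [w for w in range(n) if w != v] for v in range(n)}
print(dfs_count(complete, 0))  # (6, 30)
```

Only 6 recursive calls are made, but the loop body runs 6 * 5 = 30 times, once per adjacency list entry, which is where the |E| term comes from.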
In a breadth-first search of a directed graph (cycles possible), when a node is dequeued, all its children that have not yet been visited are enqueued, and the process continues until the queue is empty.
One time, I implemented it the other way around: all of a node's children are enqueued, and the visited status is checked when a node is dequeued instead. If a node being dequeued has been visited before, it is discarded and the process continues with the next node in the queue.
But the result is wrong. Wikipedia also says
depth-first search ...... The non-recursive implementation is similar
to breadth-first search but differs from it in two ways: it uses a
stack instead of a queue, and it delays checking whether a vertex has
been discovered until the vertex is popped from the stack rather than
making this check before pushing the vertex.
However, I cannot wrap my head around what exactly is the difference. Why does depth first search check when popping items out and breadth first search must check before enqueuing?
DFS
Suppose you have a graph:
A---B---E
|   |
|   |
C---D
And you search DFS from A.
You would expect it to search the nodes A,B,D,C,E if using a depth first search (assuming a certain ordering of the children).
However, if you mark nodes as visited before placing them on the stack, then you will visit A,B,D,E,C because C was marked as visited when we examined A.
In some applications where you just want to visit all connected nodes this is a perfectly valid thing to do, but it is not technically a depth first search.
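Both variants can be tried on the graph above with a short Python sketch. The function names are mine, and the neighbors are pushed in reverse so that the first listed neighbor is popped first (that is the "certain ordering of the children" assumed here).

```python
def dfs_mark_on_pop(graph, start):
    """Explicit-stack DFS that checks and marks a node when it is popped."""
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue               # stale duplicate left on the stack
        visited.add(v)
        order.append(v)
        for w in reversed(graph[v]):
            if w not in visited:
                stack.append(w)
    return order

def dfs_mark_on_push(graph, start):
    """Same loop, but a node is marked as soon as it is pushed."""
    visited, order, stack = {start}, [], [start]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in reversed(graph[v]):
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return order

graph = {"A": ["B", "C"], "B": ["A", "D", "E"], "C": ["A", "D"],
         "D": ["B", "C"], "E": ["B"]}
print(dfs_mark_on_pop(graph, "A"))   # ['A', 'B', 'D', 'C', 'E']
print(dfs_mark_on_push(graph, "A"))  # ['A', 'B', 'D', 'E', 'C']
```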
BFS
In breadth first search you can mark the nodes as visited either before or after pushing to the queue. However, it is more efficient to check before as you do not end up with lots of duplicate nodes in the queue.
I don't understand why your BFS code failed in this case, perhaps if you post the code it will become clearer?
DFS checks whether a node has been visited when popping it from the stack because it may have been visited at a "deeper" level in the meantime. For example:
A--B--C--E
|     |
-------
If we start at A, then B and C will be put on the stack; assume we put them on the stack so B will be processed first. When B is now processed, we want to go down to C and finally to E, which would not happen if we marked C as visited when we discovered it from A. Now once we proceed from B, we find the yet unvisited C and put it on the stack a second time. After we finished processing E, all C entries on the stack need to be ignored, which marking as visited will take care of for us.
As @PeterdeRivaz said, for BFS it's not a matter of correctness but of efficiency whether we check nodes for having been visited when enqueuing or when dequeuing.
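The efficiency difference for BFS can be illustrated with a small Python sketch that counts queue insertions. Both function names and the `pushes` counter are assumptions made for this example; the graph is the A, B, C, E graph drawn above.

```python
from collections import deque

def bfs_check_on_enqueue(graph, start):
    """Mark a node when it is enqueued; returns (visit order, number of pushes)."""
    visited, queue = {start}, deque([start])
    order, pushes = [], 1
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
                pushes += 1
    return order, pushes

def bfs_check_on_dequeue(graph, start):
    """Mark a node when it is dequeued; duplicates in the queue are discarded."""
    visited, queue = set(), deque([start])
    order, pushes = [], 1
    while queue:
        v = queue.popleft()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                queue.append(w)
                pushes += 1
    return order, pushes

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "E"], "E": ["C"]}
print(bfs_check_on_enqueue(graph, "A"))  # (['A', 'B', 'C', 'E'], 4)
print(bfs_check_on_dequeue(graph, "A"))  # (['A', 'B', 'C', 'E'], 5)
```

The visit order is the same in both cases; the second variant only pays for the extra copy of C that sits in the queue.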
Is there an algorithm that can check, in a directed graph, if a vertex, let's say V2, is reachable from a vertex V1, without traversing all the vertices?
You might find a route to that node without traversing all the edges, and if so you can give a yes answer as soon as you do. Nothing short of traversing all the edges can confirm that the node isn't reachable (unless there's some other constraint you haven't stated that could be used to eliminate the possibility earlier).
Edit: I should add that it depends on how often you need to do queries versus how large (and dense) your graph is. If you need to do a huge number of queries on a relatively small graph, it may make sense to pre-process the data in the graph to produce a matrix with a bit at the intersection of any V1 and V2 to indicate whether there's a connection from V1 to V2. This doesn't avoid traversing the graph, but it can avoid traversing the graph at the time of the query. I.e., it's basically a greedy algorithm that assumes you're going to eventually use enough of the combinations that it's easiest to just traverse them all and store the result. Depending on the size of the graph, the pre-processing step may be slow, but once it's done executing a query becomes quite fast (constant time, and usually a pretty small constant at that).
Depth first search or breadth first search. Stop when you find one. But there's no way to tell there's none without going through every one, no. You can sometimes improve the performance with heuristics if you have additional information about the graph. For example, if the graph represents a coordinate space like a real map, and most of the time you know that there's going to be a mostly direct path, then you can have the depth-first search look along lines that "aim towards the target" first. However, imagine the case where the start and end points are right next to each other, but with no direct connection in between, and to find the path you have to go way out of the way. You have to check every case in order to be exhaustive.
I doubt it has a name, but a breadth-first search might go like this:
Add V1 to a queue of nodes to be visited
While there are nodes in the queue:
    Remove the next node from the queue
    If the node is V2, return true
    Mark the node as visited
    For every node at the end of an outgoing edge which is not yet visited:
        Add this node to the queue
    End for
End while
Return false
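A runnable Python version of this breadth-first search might look like the sketch below. The name `is_reachable` and the dict-of-lists graph format are assumptions; unlike the pseudocode, it marks a node when it is added to the queue, so no node is queued twice.

```python
from collections import deque

def is_reachable(graph, v1, v2):
    """graph: dict mapping a node to the list of nodes its outgoing edges lead to."""
    visited = {v1}
    queue = deque([v1])
    while queue:
        node = queue.popleft()
        if node == v2:
            return True                 # stop as soon as the target is found
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return False                        # everything reachable from v1 was explored

graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(is_reachable(graph, "a", "c"))  # True
print(is_reachable(graph, "a", "d"))  # False
```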
Create an adjacency matrix when the graph is created. At the same time you do this, create matrices consisting of the powers of the adjacency matrix up to the number of nodes in the graph. To find if there is a path from node u to node v, check the matrices (starting from M^1 and going to M^n) and examine the value at (u, v) in each matrix. If, for any of the matrices checked, that value is greater than zero, you can stop the check because there is indeed a connection. (This gives you even more information as well: the power tells you the number of steps between nodes, and the value tells you how many paths there are between nodes for that step number.)
(Note that if you know the number of steps in the longest path in your graph, for whatever reason, you only need to create a number of matrices up to that power. As well, if you want to save memory, you could just store the base adjacency matrix and create the others as you go along, but for large matrices that may take a fair amount of time if you aren't using an efficient method of doing the multiplications, whether from a library or written on your own.)
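The matrix-power check can be sketched in plain Python as follows. This version computes the powers on the fly instead of storing them, uses a naive cubic multiplication, and the names `mat_mul` and `shortest_walk_length` are my own.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shortest_walk_length(adj, u, v):
    """Return the smallest k in 1..n with (adj^k)[u][v] > 0, or None if there is none."""
    n = len(adj)
    power = adj                        # adj^1
    for k in range(1, n + 1):
        if power[u][v] > 0:
            return k                   # there is a connection using k steps
        power = mat_mul(power, adj)    # move on to adj^(k+1)
    return None

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]                      # edges 0 -> 1 and 1 -> 2
print(shortest_walk_length(adj, 0, 2))  # 2
print(shortest_walk_length(adj, 2, 0))  # None
```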
It would probably be easiest to just do a depth- or breadth-first search, though, as others have suggested, not only because they're comparatively easy to implement but also because you can generate the path between nodes as you go along. (Technically you'd be generating multiple paths and discarding loops/dead-end ones along the way, but whatever.)
In principle, you can't determine that a path exists without traversing some part of the graph, because the failure case (a path does not exist) cannot be determined without traversing the entire graph.
You MAY be able to improve your performance by searching backwards (search from destination to starting point), or by alternating between forward and backward search steps.
Any good AI textbook will talk at length about search techniques. Elaine Rich's book was good in this area. Amazon is your FRIEND.
You mentioned here that the graph represents a road network. If the graph is planar, you could use Thorup's Algorithm, which creates an O(n log n) space data structure that takes O(n log n) time to build and answers queries in O(1) time.
Another approach to this problem would allow you to ignore all of the vertices. If you were to only look at the edges, you can produce a transitive closure array that will show you each vertex that is reachable from any other vertex.
Start with your list of edges:
Va -> Vc
Va -> Vd
....
Create an array with the start location as the rows and the end location as the columns. Fill the array with 0s. For each edge in the list of edges, place a 1 at the (start, end) coordinate of the edge.
Now you iterate a few times until either V1,V2 is 1 or there are no changes.
For each row RowN:
    NextRowN = RowN
    For each column C that is true in RowN:
        OR row C into NextRowN (boolean OR, element by element)
    Set RowN to NextRowN
If you run this algorithm until the end, you will quickly have a complete list of all reachable vertices without looking at any of them. The runtime is proportional to the number of edges. This would work well with a reasonable implementation and a reasonable number of edges.
A slightly more complex version of this algorithm would be to only calculate the vertices reachable by V1. To do this, you would focus your scope on the ones that are currently reachable at any given time. You can also limit adding rows to only one time, since the other rows are never changing.
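A minimal Python sketch of the full row-OR iteration, assuming the vertices are numbered 0..n-1 and storing each row as an integer bitset (bit j of row i means "j is reachable from i"). The function names are made up for the example.

```python
def transitive_closure(n, edges):
    """Return a list of n bitset rows; bit j of rows[i] is set if j is reachable from i."""
    rows = [0] * n
    for start, end in edges:
        rows[start] |= 1 << end        # place a 1 at (start, end)
    changed = True
    while changed:                     # iterate until no row changes any more
        changed = False
        for i in range(n):
            next_row = rows[i]
            for j in range(n):
                if (rows[i] >> j) & 1: # column j is true for row i
                    next_row |= rows[j]
            if next_row != rows[i]:
                rows[i] = next_row
                changed = True
    return rows

def reachable(rows, v1, v2):
    return bool((rows[v1] >> v2) & 1)

rows = transitive_closure(4, [(0, 2), (0, 3), (2, 1)])
print(reachable(rows, 0, 1))  # True
print(reachable(rows, 1, 0))  # False
```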
In order to be sure, you either have to find a path, or traverse all vertices that are reachable from V1 once.
I would recommend an implementation of depth first or breadth first search that stops when it encounters a vertex that it has already seen. The vertex will be processed on the first occurrence only. You need to make sure that the search starts at V1 and stops when it runs out of vertices or encounters V2.