I think I am fairly certain that the DFS algorithm for the problem should not be any different from a normal DFS, but just wanted to get some feedback from other. Here is my problem:
I would like to perform DFS on a graphs for which I do not know all the nodes. When I start the search all I know is just the start node. Based on the start node's properties, I can determine it's set of child nodes. The children of the nodes that have just been discovered can be further discovered as above.
I am planning to use an algorithm similar to a normal DFS (where the Graph is known before hand) except that every time I reach a node, I now need to discover it's child nodes.
Is this a reasonable approach? I am I missing something?
That seems like a perfectly reasonable approach.
However, without knowing the exact details of what you're trying to do, it's next-to-impossible to say for sure whether it will work or be the best approach.
If this is a large (possibly infinite) graph (or it contains cycles and there are too many nodes to keep track of all of them as you do the DFS to prevent getting stuck in a cycle) and you're looking for some node not too deep in the graph, either BFS or DFS with iterative deepening might be a better idea.
DFS stands for Depth First Search. and by "Search" it means it does not know what is ahead of it. So DFS on a known graph is no different from DFS on a graph where you get to know the child just at the time you reach the parent.
I've learned how these algorithms work, but what are they used for?
Do we use them to:
find a certain node in a graph or
to find a shortest path or
to find a cycle in a graph
?
Both of them just visit all the nodes and mark them visited, and I don't see the point of doing that.
I am sort of lost here what I am learning.
BFS and DFS are graph search algorithms that can be used for a variety of different purposes.
One common application of the two search techniques is to identify all nodes that are reachable from a given starting node. For example, suppose that you have a collection of computers that each are networked to a handful of other computers. By running a BFS or DFS from a given node, you will discover all other computers in the network that the original computer is capable of directly or indirectly talking to. These are the computers that come back marked.
BFS specifically can be used to find the shortest path between two nodes in an unweighted graph. Suppose, for example, that you want to send a packet from one computer in a network to another, and that the computers aren't directly connected to one another. Along what route should you send the packet to get it to arrive at the destination as quickly as possible? If you run a BFS and at each iteration have each node store a pointer to its "parent" node, you will end up finding route from the start node to each other node in the graph that minimizes the number of links that have to be traversed to reach the destination computer.
DFS is often used as a subroutine in more complex algorithms. For example, Tarjan's algorithm for computing strongly-connected components is based on depth-first search. Many optimizing compiler techniques run a DFS over an appropriately-constructed graph in order to determine in which order to apply a specific series of operations. Depth-first search can also be used in maze generation: by taking a grid of nodes and linking each node to its neighbors, you can construct a graph representing a grid. Running a random depth-first search over this graph then produces a maze that has exactly one solution.
This is by no means an exhaustive list. These algorithms have all sorts of applications, and as you start to explore more advanced algorithms you will often find yourself relying on DFS and BFS as building blocks. It's similar to sorting - sorting by itself isn't all that interesting, but being able to sort a list of values is enormously useful as a subroutine in more complex algorithms.
Hope this helps!
I am working on a recursive DFS to retrieve all paths between two nodes in an undirected and unweighted graph for now. It takes the start and end node, and DFS on the node and its adjacent nodes recursively while saving the paths.
I was wondering whether there is a more efficient way to find all paths?
There are exponential number of simple paths, and DFS is basically creating all of them 0 so your approach is correct, though time consuming (but this is a part of the problem itself, not the algorithm).
You might be able to optimize it a bit by eliminating from the graph nodes that do not lead to the target, if such nodes exist - effectively trimming unsuccesful searches before calculating them.
Be aware that if the graph contain cycles - there could be infinite number of paths (though finite number of simple paths). Note that to avoid an infinite loop and get all simple paths, your DFS will need to maintain a visited set, that is modified per path (once "discovering" a node insert it to set, and once it is popped from the stack, remove it from the set).
You can adapt Dijkstra's algorithm, also see A Recursive Algorithm to Find all Paths Between Two Given Nodes
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
This post was edited and submitted for review 6 months ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I understand the differences between DFS and BFS, but I'm interested to know what factors to consider when choosing DFS vs BFS.
Things like avoiding DFS for very deep trees, etc.
That heavily depends on the structure of the search tree and the number and location of solutions (aka searched-for items).
If you know a solution is not far from the root of the tree, a
breadth first search (BFS) might be better.
If the tree is very deep and solutions are rare, depth first search
(DFS) might take an extremely long time, but BFS could be faster.
If the tree is very wide, a BFS might need too much memory, so it
might be completely impractical.
If solutions are frequent but located deep in the tree, BFS could be
impractical.
If the search tree is very deep you will need to restrict the search
depth for depth first search (DFS), anyway (for example with
iterative deepening).
But these are just rules of thumb; you'll probably need to experiment.
I think in practice you'll usually not use these algorithms in their pure form anyway. There could be heuristics that help to explore promising parts of the search space first, or you might want to modify your search algorithm to be able to parallelize it efficiently.
Depth-first Search
Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
For example in games like Chess, tic-tac-toe when you are deciding what move to make, you can mentally imagine a move, then your opponent’s possible responses, then your responses, and so on. You can decide what to do by seeing which move leads to the best outcome.
Only some paths in a game tree lead to your win. Some lead to a win by your opponent, when you reach such an ending, you must back up, or backtrack, to a previous node and try a different path. In this way you explore the tree until you find a path with a successful conclusion. Then you make the first move along this path.
Breadth-first search
The breadth-first search has an interesting property: It first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex. You start a BFS, and when you find the specified vertex, you know the path you’ve traced so far is the shortest path to the node. If there were a shorter path, the BFS would have found it already.
Breadth-first search can be used for finding the neighbour nodes in peer to peer networks like BitTorrent, GPS systems to find nearby locations, social networking sites to find people in the specified distance and things like that.
Nice Explanation from
http://www.programmerinterview.com/index.php/data-structures/dfs-vs-bfs/
An example of BFS
Here’s an example of what a BFS would look like. This is something like Level Order Tree Traversal where we will use QUEUE with ITERATIVE approach (Mostly RECURSION will end up with DFS). The numbers represent the order in which the nodes are accessed in a BFS:
In a depth first search, you start at the root, and follow one of the branches of the tree as far as possible until either the node you are looking for is found or you hit a leaf node ( a node with no children). If you hit a leaf node, then you continue the search at the nearest ancestor with unexplored children.
An example of DFS
Here’s an example of what a DFS would look like. I think post order traversal in binary tree will start work from the Leaf level first. The numbers represent the order in which the nodes are accessed in a DFS:
Differences between DFS and BFS
Comparing BFS and DFS, the big advantage of DFS is that it has much lower memory requirements than BFS, because it’s not necessary to store all of the child pointers at each level. Depending on the data and what you are looking for, either DFS or BFS could be advantageous.
For example, given a family tree if one were looking for someone on the tree who’s still alive, then it would be safe to assume that person would be on the bottom of the tree. This means that a BFS would take a very long time to reach that last level. A DFS, however, would find the goal faster. But, if one were looking for a family member who died a very long time ago, then that person would be closer to the top of the tree. Then, a BFS would usually be faster than a DFS. So, the advantages of either vary depending on the data and what you’re looking for.
One more example is Facebook; Suggestion on Friends of Friends. We need immediate friends for suggestion where we can use BFS. May be finding the shortest path or detecting the cycle (using recursion) we can use DFS.
Breadth First Search is generally the best approach when the depth of the tree can vary, and you only need to search part of the tree for a solution. For example, finding the shortest path from a starting value to a final value is a good place to use BFS.
Depth First Search is commonly used when you need to search the entire tree. It's easier to implement (using recursion) than BFS, and requires less state: While BFS requires you store the entire 'frontier', DFS only requires you store the list of parent nodes of the current element.
DFS is more space-efficient than BFS, but may go to unnecessary depths.
Their names are revealing: if there's a big breadth (i.e. big branching factor), but very limited depth (e.g. limited number of "moves"), then DFS can be more preferrable to BFS.
On IDDFS
It should be mentioned that there's a less-known variant that combines the space efficiency of DFS, but (cummulatively) the level-order visitation of BFS, is the iterative deepening depth-first search. This algorithm revisits some nodes, but it only contributes a constant factor of asymptotic difference.
When you approach this question as a programmer, one factor stands out: if you're using recursion, then depth-first search is simpler to implement, because you don't need to maintain an additional data structure containing the nodes yet to explore.
Here's depth-first search for a non-oriented graph if you're storing “already visited” information in the nodes:
def dfs(origin): # DFS from origin:
origin.visited = True # Mark the origin as visited
for neighbor in origin.neighbors: # Loop over the neighbors
if not neighbor.visited: dfs(neighbor) # Visit each neighbor if not already visited
If storing “already visited” information in a separate data structure:
def dfs(node, visited): # DFS from origin, with already-visited set:
visited.add(node) # Mark the origin as visited
for neighbor in node.neighbors: # Loop over the neighbors
if not neighbor in visited: # If the neighbor hasn't been visited yet,
dfs(neighbor, visited) # then visit the neighbor
dfs(origin, set())
Contrast this with breadth-first search where you need to maintain a separate data structure for the list of nodes yet to visit, no matter what.
One important advantage of BFS would be that it can be used to find the shortest path between any two nodes in an unweighted graph.
Whereas, we cannot use DFS for the same.
The following is a comprehensive answer to what you are asking.
In simple terms:
Breadth First Search (BFS) algorithm, from its name "Breadth", discovers all the neighbours of a node through the out edges of the node then it discovers the unvisited neighbours of the previously mentioned neighbours through their out edges and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth). That's why it can be used to find the shortest path (if there is any) from a node (origional source) to another node if the weights of the edges are uniform.
Depth First Search (DFS) algorithm, from its name "Depth", discovers the unvisited neighbours of the most recently discovered node x through its out edges. If there is no unvisited neighbour from the node x, the algorithm backtracks to discover the unvisited neighbours of the node (through its out edges) from which node x was discovered, and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth).
Both BFS and DFS can be incomplete. For example if the branching factor of a node is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then BFS is not complete even though the searched key can be at a distance of few edges from the origional source. This infinite branching factor can be because of infinite choices (neighbouring nodes) from a given node to discover.
If the depth is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then DFS is not complete even though the searched key can be the third neighbor of the origional source. This infinite depth can be because of a situation where there is, for every node the algorithm discovers, at least a new choice (neighbouring node) that is unvisited before.
Therefore, we can conclude when to use the BFS and DFS. Suppose we are dealing with a manageable limited branching factor and a manageable limited depth. If the searched node is shallow i.e. reachable after some edges from the origional source, then it is better to use BFS. On the other hand, if the searched node is deep i.e. reachable after a lot of edges from the origional source, then it is better to use DFS.
For example, in a social network if we want to search for people who have similar interests of a specific person, we can apply BFS from this person as an origional source, because mostly these people will be his direct friends or friends of friends i.e. one or two edges far.
On the other hand, if we want to search for people who have completely different interests of a specific person, we can apply DFS from this person as an origional source, because mostly these people will be very far from him i.e. friend of friend of friend.... i.e. too many edges far.
Applications of BFS and DFS can vary also because of the mechanism of searching in each one. For example, we can use either BFS (assuming the branching factor is manageable) or DFS (assuming the depth is manageable) when we just want to check the reachability from one node to another having no information where that node can be. Also both of them can solve same tasks like topological sorting of a graph (if it has).
BFS can be used to find the shortest path, with unit weight edges, from a node (origional source) to another. Whereas, DFS can be used to exhaust all the choices because of its nature of going in depth, like discovering the longest path between two nodes in an acyclic graph. Also DFS, can be used for cycle detection in a graph.
In the end if we have infinite depth and infinite branching factor, we can use Iterative Deepening Search (IDS).
I think it depends on what problems you are facing.
shortest path on simple graph -> bfs
all possible results -> dfs
search on graph(treat tree, martix as a graph too) -> dfs
....
Some algorithms depend on particular properties of DFS (or BFS) to work. For example the Hopcroft and Tarjan algorithm for finding 2-connected components takes advantage of the fact that each already visited node encountered by DFS is on the path from root to the currently explored node.
For BFS, we can consider Facebook example. We receive suggestion to add friends from the FB profile from other other friends profile. Suppose A->B, while B->E and B->F, so A will get suggestion for E And F. They must be using BFS to read till second level.
DFS is more based on scenarios where we want to forecast something based on data we have from source to destination. As mentioned already about chess or sudoku.
Once thing I have different here is, I believe DFS should be used for shortest path because DFS will cover the whole path first then we can decide the best. But as BFS will use greedy's approach so might be it looks like its the shortest path, but the final result might differ.
Let me know whether my understanding is wrong.
According to the properties of DFS and BFS.
For example,when we want to find the shortest path.
we usually use bfs,it can guarantee the 'shortest'.
but dfs only can guarantee that we can come from this point can achieve that point ,can not guarantee the 'shortest'.
Because Depth-First Searches use a stack as the nodes are processed, backtracking is provided with DFS. Because Breadth-First Searches use a queue, not a stack, to keep track of what nodes are processed, backtracking is not provided with BFS.
When tree width is very large and depth is low use DFS as recursion stack will not overflow.Use BFS when width is low and depth is very large to traverse the tree.
This is a good example to demonstrate that BFS is better than DFS in certain case. https://leetcode.com/problems/01-matrix/
When correctly implemented, both solutions should visit cells that have farther distance than the current cell +1.
But DFS is inefficient and repeatedly visited the same cell resulting O(n*n) complexity.
For example,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,
I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph where there are millions of nodes. The algorithm is quite complex and I have to admit I haven't taken the time to fully understand it, I'm just using it. Now I have a need to calculate the dominator trees of the direct children of the root node and possibly recurse down the graph to a certain depth repeating this operation. I.e. when I calculate the dominator tree for a child of the root node I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW just as aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
boost::lengauer_tarjan_dominator_tree_without_dfs might help.
Judging by the lack of comments, I guess there aren't many people on Stackoverflow with the relevent experience to help you. I'm one of those people, but I don't want such an interesting question go down with with a dull thud so I'll try and lend a hand.
My first thought is that if this graph is generated by other compilers would it be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem?
My second thought is that, the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with that node. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you're constructing this tree, you'd start at the root and work your way out to the leaf nodes. For each node, you'd store the result of the computation for all the ancestory thus far. That way, you should only ever have to look at the parent node and the current node data you're processing to compute the value for your new node.
I hope that helps!
Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.
I do not fully understand your question, but it seems to me you want to have some incremental update feature. I researched a while ago what algorithms are their but it seemed to me that there's no known way for large graphs to do this quickly (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)