What defines left and right in graphs and trees? - algorithm

I'm learning about tree traversals and I can't seem to find any clear rules for how DFS or BFS algorithms decide which path to take first. I've seen variations of left first or least first.
Is left taken as being first child in the list?
Does this mean that (for a given node) the depth of a vertex in a graph that is part of a cycle is taken using the leftward path?
Also doesn't using a 'least first' rule make for a slower algorithm?
Thanks

Left is only meaningful for trees where the child nodes are oldered. Otherwise usually the author refers to first in the list of child nodes. Depth of a vertex is also not well defined in a graph that is not a tree, but if you refer to depth with respect to a given node that would usually be the shortest distance from the starting node.
I am not sure what does least first mean but if it refers to the key values of nodes and there is no ordering in the child nodes, finding the least will take more time of course.
Hope this helps.

Related

Is DFS classification of edges valid?

DFS can be used to classify edges to tree, forward, back and cross.
Given the classification of edges and number of vertices, can we determine in linear complexity, is it a valid result of DFS? And if so - how?
For example, here is invalid classification: (it's impossible to get one like that, regardless of the root vertex we choose and the order of visiting children)
Apologies for a somewhat brief answer: yes, and here's an algorithm. Verify that the tree edges indeed form a tree. For every other edge, compute the leafmost common ancestor of its endpoints. For forward edges, this should be the tail. For back edges, this should be the head. For cross edges, this should be neither.
Assuming that everything looks good so far, the rootward paths from the head and tail of a cross edge arrive at the LCA via different children. In every possible DFS order that generates the given classification, the head's child is visited before the tail's child. Collect all of these constraints and verify that there is no cycle.
I claim that, together, these checks are sound and complete. Completeness is verified by linearizing the constraints with a topological sort and constructing a plausible DFS order.

Advantage of depth first search over breadth first search or vice versa

I have studied the two graph traversal algorithms,depth first search and breadth first search.Since both algorithms are used to solve the same problem of graph traversal I would like to know how to choose between the two.I mean is one more efficient than the other or any reason why i would choose one over the other in a particular scenario ?
Thank You
Main difference to me is somewhat theoretical. If you had an infinite sized graph then DFS would never find an element if it exists outside of the first path it chooses. It would essentially keep going down the first path and would never find the element. The BFS would eventually find the element.
If the size of the graph is finite, DFS would likely find a outlier (larger distance between root and goal) element faster where BFS would find a closer element faster. Except in the case where DFS chooses the path of the shallow element.
In general, BFS is better for problems related to finding the shortest paths or somewhat related problems. Because here you go from one node to all node that are adjacent to it and hence you effectively move from path length one to path length two and so on.
While DFS on the other end helps more in connectivity problems and also in finding cycles in graph(though I think you might be able to find cycles with a bit of modification of BFS too). Determining connectivity with DFS is trivial, if you call the explore procedure twice from the DFS procedure, then the graph is disconnected (this is for an undirected graph). You can see the strongly connected component algorithm for a directed graph here, which is a modification of DFS. Another application of the DFS is topological sorting.
These are some applications of both the algorithms:
DFS:
Connectivity
Strongly Connected Components
Topological Sorting
BFS:
Shortest Path(Dijkstra is some what of a modification of BFS).
Testing whether the graph is Bipartitie.
When traversing a multiply-connected graph, the order in which nodes are traversed may greatly influence (by many orders of magnitude) the number of nodes to be tracked by the traversing method. Some kinds of algorithms will be massively better when using breadth-first; others will be massively better when using depth-search.
At one extreme, doing a depth-first search on a binary tree with N leaf nodes requires that the traversing method keep track of lgN nodes while a breadth-first search would require keeping track of at least N/2 nodes (since it might scan all other nodes before it scans any leaf nodes; immediately prior to scanning the first leaf node, it would have encountered N/2 of the leafs' parent nodes which have to be tracked separately since none of them reference each other).
On the other extreme, doing a flood-fill on an NxN grid with a method that, if its pixel hasn't been colored yet, colors that pixel and then flood-fills its neighbors will require enqueuing O(N) pixel coordinates if using breadth-first search, but O(N^2) pixel coordinates if using depth-first. When using breadth-first search, paint will seem to "spread out", regardless of the shape to be painted; when using depth-first algorithm to paint a rectangular spiral, each line of which is straight on one side and jagged on the other (which sides should be straight and jagged depends upon the exact algorithm used), all of the straight sections will get painted before any of the jagged ones, meaning that the system must track the location of every jag separately.
For a complete/perfect tree, DFS takes a linear amount of space with respect to the depth of the tree whereas BFS takes an exponential amount of space with respect to the depth of the tree. This is because for BFS the maximum number of nodes in the queue is proportional to the number of nodes in one level of the tree. In DFS the maximum number of nodes in the stack is proportional to the depth of the tree.

What are the practical factors to consider when choosing between Depth-First Search (DFS) and Breadth-First Search (BFS)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
This post was edited and submitted for review 6 months ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I understand the differences between DFS and BFS, but I'm interested to know what factors to consider when choosing DFS vs BFS.
Things like avoiding DFS for very deep trees, etc.
That heavily depends on the structure of the search tree and the number and location of solutions (aka searched-for items).
If you know a solution is not far from the root of the tree, a
breadth first search (BFS) might be better.
If the tree is very deep and solutions are rare, depth first search
(DFS) might take an extremely long time, but BFS could be faster.
If the tree is very wide, a BFS might need too much memory, so it
might be completely impractical.
If solutions are frequent but located deep in the tree, BFS could be
impractical.
If the search tree is very deep you will need to restrict the search
depth for depth first search (DFS), anyway (for example with
iterative deepening).
But these are just rules of thumb; you'll probably need to experiment.
I think in practice you'll usually not use these algorithms in their pure form anyway. There could be heuristics that help to explore promising parts of the search space first, or you might want to modify your search algorithm to be able to parallelize it efficiently.
Depth-first Search
Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
For example in games like Chess, tic-tac-toe when you are deciding what move to make, you can mentally imagine a move, then your opponent’s possible responses, then your responses, and so on. You can decide what to do by seeing which move leads to the best outcome.
Only some paths in a game tree lead to your win. Some lead to a win by your opponent, when you reach such an ending, you must back up, or backtrack, to a previous node and try a different path. In this way you explore the tree until you find a path with a successful conclusion. Then you make the first move along this path.
Breadth-first search
The breadth-first search has an interesting property: It first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex. You start a BFS, and when you find the specified vertex, you know the path you’ve traced so far is the shortest path to the node. If there were a shorter path, the BFS would have found it already.
Breadth-first search can be used for finding the neighbour nodes in peer to peer networks like BitTorrent, GPS systems to find nearby locations, social networking sites to find people in the specified distance and things like that.
Nice Explanation from
http://www.programmerinterview.com/index.php/data-structures/dfs-vs-bfs/
An example of BFS
Here’s an example of what a BFS would look like. This is something like Level Order Tree Traversal where we will use QUEUE with ITERATIVE approach (Mostly RECURSION will end up with DFS). The numbers represent the order in which the nodes are accessed in a BFS:
In a depth first search, you start at the root, and follow one of the branches of the tree as far as possible until either the node you are looking for is found or you hit a leaf node ( a node with no children). If you hit a leaf node, then you continue the search at the nearest ancestor with unexplored children.
An example of DFS
Here’s an example of what a DFS would look like. I think post order traversal in binary tree will start work from the Leaf level first. The numbers represent the order in which the nodes are accessed in a DFS:
Differences between DFS and BFS
Comparing BFS and DFS, the big advantage of DFS is that it has much lower memory requirements than BFS, because it’s not necessary to store all of the child pointers at each level. Depending on the data and what you are looking for, either DFS or BFS could be advantageous.
For example, given a family tree if one were looking for someone on the tree who’s still alive, then it would be safe to assume that person would be on the bottom of the tree. This means that a BFS would take a very long time to reach that last level. A DFS, however, would find the goal faster. But, if one were looking for a family member who died a very long time ago, then that person would be closer to the top of the tree. Then, a BFS would usually be faster than a DFS. So, the advantages of either vary depending on the data and what you’re looking for.
One more example is Facebook; Suggestion on Friends of Friends. We need immediate friends for suggestion where we can use BFS. May be finding the shortest path or detecting the cycle (using recursion) we can use DFS.
Breadth First Search is generally the best approach when the depth of the tree can vary, and you only need to search part of the tree for a solution. For example, finding the shortest path from a starting value to a final value is a good place to use BFS.
Depth First Search is commonly used when you need to search the entire tree. It's easier to implement (using recursion) than BFS, and requires less state: While BFS requires you store the entire 'frontier', DFS only requires you store the list of parent nodes of the current element.
DFS is more space-efficient than BFS, but may go to unnecessary depths.
Their names are revealing: if there's a big breadth (i.e. big branching factor), but very limited depth (e.g. limited number of "moves"), then DFS can be more preferrable to BFS.
On IDDFS
It should be mentioned that there's a less-known variant that combines the space efficiency of DFS, but (cummulatively) the level-order visitation of BFS, is the iterative deepening depth-first search. This algorithm revisits some nodes, but it only contributes a constant factor of asymptotic difference.
When you approach this question as a programmer, one factor stands out: if you're using recursion, then depth-first search is simpler to implement, because you don't need to maintain an additional data structure containing the nodes yet to explore.
Here's depth-first search for a non-oriented graph if you're storing “already visited” information in the nodes:
def dfs(origin): # DFS from origin:
origin.visited = True # Mark the origin as visited
for neighbor in origin.neighbors: # Loop over the neighbors
if not neighbor.visited: dfs(neighbor) # Visit each neighbor if not already visited
If storing “already visited” information in a separate data structure:
def dfs(node, visited): # DFS from origin, with already-visited set:
visited.add(node) # Mark the origin as visited
for neighbor in node.neighbors: # Loop over the neighbors
if not neighbor in visited: # If the neighbor hasn't been visited yet,
dfs(neighbor, visited) # then visit the neighbor
dfs(origin, set())
Contrast this with breadth-first search where you need to maintain a separate data structure for the list of nodes yet to visit, no matter what.
One important advantage of BFS would be that it can be used to find the shortest path between any two nodes in an unweighted graph.
Whereas, we cannot use DFS for the same.
The following is a comprehensive answer to what you are asking.
In simple terms:
Breadth First Search (BFS) algorithm, from its name "Breadth", discovers all the neighbours of a node through the out edges of the node then it discovers the unvisited neighbours of the previously mentioned neighbours through their out edges and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth). That's why it can be used to find the shortest path (if there is any) from a node (origional source) to another node if the weights of the edges are uniform.
Depth First Search (DFS) algorithm, from its name "Depth", discovers the unvisited neighbours of the most recently discovered node x through its out edges. If there is no unvisited neighbour from the node x, the algorithm backtracks to discover the unvisited neighbours of the node (through its out edges) from which node x was discovered, and so forth, till all the nodes reachable from the origional source are visited (we can continue and take another origional source if there are remaining unvisited nodes and so forth).
Both BFS and DFS can be incomplete. For example if the branching factor of a node is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then BFS is not complete even though the searched key can be at a distance of few edges from the origional source. This infinite branching factor can be because of infinite choices (neighbouring nodes) from a given node to discover.
If the depth is infinite, or very big for the resources (memory) to support (e.g. when storing the nodes to be discovered next), then DFS is not complete even though the searched key can be the third neighbor of the origional source. This infinite depth can be because of a situation where there is, for every node the algorithm discovers, at least a new choice (neighbouring node) that is unvisited before.
Therefore, we can conclude when to use the BFS and DFS. Suppose we are dealing with a manageable limited branching factor and a manageable limited depth. If the searched node is shallow i.e. reachable after some edges from the origional source, then it is better to use BFS. On the other hand, if the searched node is deep i.e. reachable after a lot of edges from the origional source, then it is better to use DFS.
For example, in a social network if we want to search for people who have similar interests of a specific person, we can apply BFS from this person as an origional source, because mostly these people will be his direct friends or friends of friends i.e. one or two edges far.
On the other hand, if we want to search for people who have completely different interests of a specific person, we can apply DFS from this person as an origional source, because mostly these people will be very far from him i.e. friend of friend of friend.... i.e. too many edges far.
Applications of BFS and DFS can vary also because of the mechanism of searching in each one. For example, we can use either BFS (assuming the branching factor is manageable) or DFS (assuming the depth is manageable) when we just want to check the reachability from one node to another having no information where that node can be. Also both of them can solve same tasks like topological sorting of a graph (if it has).
BFS can be used to find the shortest path, with unit weight edges, from a node (origional source) to another. Whereas, DFS can be used to exhaust all the choices because of its nature of going in depth, like discovering the longest path between two nodes in an acyclic graph. Also DFS, can be used for cycle detection in a graph.
In the end if we have infinite depth and infinite branching factor, we can use Iterative Deepening Search (IDS).
I think it depends on what problems you are facing.
shortest path on simple graph -> bfs
all possible results -> dfs
search on graph(treat tree, martix as a graph too) -> dfs
....
Some algorithms depend on particular properties of DFS (or BFS) to work. For example the Hopcroft and Tarjan algorithm for finding 2-connected components takes advantage of the fact that each already visited node encountered by DFS is on the path from root to the currently explored node.
For BFS, we can consider Facebook example. We receive suggestion to add friends from the FB profile from other other friends profile. Suppose A->B, while B->E and B->F, so A will get suggestion for E And F. They must be using BFS to read till second level.
DFS is more based on scenarios where we want to forecast something based on data we have from source to destination. As mentioned already about chess or sudoku.
Once thing I have different here is, I believe DFS should be used for shortest path because DFS will cover the whole path first then we can decide the best. But as BFS will use greedy's approach so might be it looks like its the shortest path, but the final result might differ.
Let me know whether my understanding is wrong.
According to the properties of DFS and BFS.
For example,when we want to find the shortest path.
we usually use bfs,it can guarantee the 'shortest'.
but dfs only can guarantee that we can come from this point can achieve that point ,can not guarantee the 'shortest'.
Because Depth-First Searches use a stack as the nodes are processed, backtracking is provided with DFS. Because Breadth-First Searches use a queue, not a stack, to keep track of what nodes are processed, backtracking is not provided with BFS.
When tree width is very large and depth is low use DFS as recursion stack will not overflow.Use BFS when width is low and depth is very large to traverse the tree.
This is a good example to demonstrate that BFS is better than DFS in certain case. https://leetcode.com/problems/01-matrix/
When correctly implemented, both solutions should visit cells that have farther distance than the current cell +1.
But DFS is inefficient and repeatedly visited the same cell resulting O(n*n) complexity.
For example,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,

How to modify preorder tree traversal algorithm to handle nodes with multiple parents?

I've been searching for a while now and can't seem to find an alternative solution. I need the tree traversal algorithm in such a way that a node can have more than 1 parent, if it's possible (found a great article here: Storing Hierarchical Data in a Database). Are there any algorithms so that, starting from a root node, we can determine the sequence and dependencies of nodes (currently reading topological sorting)?
The structure you described isn't a tree, it's a directed graph. As it would be suitable for hierarchical drawing you might be tempted to think of it as a tree (which itself is an acyclic connected graph).
Typical traversal algorithms for graphs are depth-first and breadth-first. The graph implementation is only different as it records the nodes it has already visited in order to avoid visiting certain nodes multiple times. However, if your data structure guarantees that it's acyclic, you can use tree algorithms on your graph by simply treating "parents" as "children".
I made a simple sketch to illustrate what I mean (the perfect chance to try Google Docs' new drawing feature):
As you see, it's possible to treat any graph that has an acyclic directed form as a tree and apply tree algorithms on it. As soon as you can't guarantee this property you'll have to go for dedicated graph algorithms.
A tree is basically a directed unweighted graph, where each vertice has N or less edges, and no cycles can happen.
If your'e certain there are no cycles in your tree, you could just treat a parent as another child of the specified node, and preform a preorder traversal normally.
However, if cycles might happen, you need graph algorithms.
Specifically: Breadth first search.
Just checking for maybe a simple case: can the two parents have different parents?
If no you could turn them into single node (conceptually) and have a tree again.
Otherwise you will have to split the child node and duplicate a branch for the other parent.
(This can of course lead to inconsistency and/or inneficient algorithms later, depending if you will need to maintain the data structure).
The above options hold if you insist on having the tree structure, which by definition can have only one parent.
So maybe you need to step back and explain what are you trying to accomplish and why it must be a tree structure if nodes can have two parents.
You aren't describing a tree here. You can NOT call your graph a tree.
A tree is an undirected graph without cycles. Parent/child relationship is NOT an interpretation of directions drawn on the edges. They are the result of naming one vertex the root.
We name a vertex "parent" to current, because it's the next one to the path to root. All other vertexes adjacent to current one are "children".
You can't just lay out an arbitrary graph in such a way that "parents" are "above" or "point to vertex", and children are "below" or "vertex points to them". A tree is a tree because a root is picked. What you depict in your question is not a tree. And tree traversal algorithms are NOT applicable to traversing arbitrary graphs.
There are several graph traversal algorithms, such as breadth-first search or depth-first search (check side notes in those pages for more). Use them instead of trying to tie your full-featured graph into your knowledge about trees.

Is there an effient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

Wikipedia: Directed Acyclic Graph
Not sure if leaf node is still proper terminology since it's not really a tree (each node can have multiple children and also multiple parents) and also I'm actually trying to find all the root nodes (which is really just a matter of semantics, if you reverse the direction of all the edges it'd they'd be leaf nodes).
Right now we're just traversing the entire graph (that's reachable from the specified node), but that's turning out to be somewhat expensive, so I'm wondering if there's a better algorithm for doing this. One thing I'm thinking is that we keep track of nodes that have been visited already (while traversing a different path) and don't recheck those.
Are there any other algorithmic optimizations?
We also thought about keeping a list of root nodes that this node is a descendant of, but it seems like maintaining such a list would be fairly expensive as well if we need to check if it changes every time a node is added, moved, or removed.
Edit:
This is more than just finding a single node, but rather finding ALL nodes that are endpoints.
Also there is no master list of nodes. Each node has a list of it's children and it's parents. (Well, that's not completely true, but pulling millions of nodes from the DB ahead of time is prohibitively expensive and would likely cause an OutOfMemory exception)
Edit2:
May or may not change possible solutions, but the graph is bottom-heavy in that there's at most a few dozen root nodes (what I'm trying to find) and some millions (possibly tens or hundreds of millions) leaf nodes (where I'm starting from).
There are a few methods that each may be faster depending on your structure, but in general what youre going to want is a traversal.
A depth first search, goes through each possible route, keeping track of nodes that have already been visited. It's a recursive function, because at each node you have to branch and try each child node of it. There's no faster method if you dont know which way to look for the object you just have to try each way! You definitely need to keep track of where you have already been because it would be wasteful otherwise. It should require on the order of the number of nodes to do a full traversal.
A breadth first search is similar but visits each child of the node before "moving on" and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if it is expected to be all the way down a path, because it forces you to traverse every possible edge.
Youre right about maybe keeping a list of known root nodes, the tradeoff there is that you basically have to do the search whenever you alter the graph. If you are altering the graph rarely this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.
EDIT: Info Update.
It sounds like we are actually looking for a path between two arbitrary nodes, the root/leaf semantic keeps getting switched. The DepthFirstSearch (DFS) starts at one node, and then for each unvisited child, recurse. Break if you find the target node. Due to the way recursion evaluates, this will traverse all the way down the 'left' path, then enumerate nodes at this distance before ever getting to the 'right' path. This is time costly and inefficient if the target node is potentially the first child on the right. BreadthFirst walks in steps, covering all children before moving forward. Because your graph is bottom heavy like a tree, both will be approximately the same execution time.
When the graph is bottom heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have more parents than children, this direction will be much faster. You can also combine the approaches, stepping one up and one down , then comparing lists of nodes, and meeting somewhere in the middle. (this combination might seem the fastest if you ignore that twice as much work is done at each step).
However, since you said that your graph is stored as a list of lists of children, you have no real way of traversing the graph backwards. A node does not know what its parents are. This is a problem. To fix it you have to get a node to know what its parents are by adding that data on graph update, or by creating a duplicate of the whole structure (which you have said is too large). It will need the whole structure to be rewritten, which sounds probably out of the question due to it being a large database at this point.
There's a lot of work to do.
http://en.wikipedia.org/wiki/Graph_(data_structure)
Just color (keep track of) visited nodes.
Sample in Python:
def reachable(nodes, edges, start, end):
color = {}
for n in nodes:
color[n] = False
q = [start]
while q:
n = q.pop()
if color[n]:
continue
color[n] = True
for adj in edges[n]:
q.append(adj)
return color[end]
For a vertex x you want to compute a bit array f(x), each bit corresponds to a root vertex Ri, and 1 (resp 0) means "x can (resp can't) be reached from root vertex Ri.
You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.
Keep U not too big, and precompute f for each vertex x of U.
Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.

Resources