I am working on a recursive DFS to retrieve all paths between two nodes in an undirected and unweighted graph for now. It takes the start and end node, and DFS on the node and its adjacent nodes recursively while saving the paths.
I was wondering whether there is a more efficient way to find all paths?
There are exponential number of simple paths, and DFS is basically creating all of them 0 so your approach is correct, though time consuming (but this is a part of the problem itself, not the algorithm).
You might be able to optimize it a bit by eliminating from the graph nodes that do not lead to the target, if such nodes exist - effectively trimming unsuccesful searches before calculating them.
Be aware that if the graph contain cycles - there could be infinite number of paths (though finite number of simple paths). Note that to avoid an infinite loop and get all simple paths, your DFS will need to maintain a visited set, that is modified per path (once "discovering" a node insert it to set, and once it is popped from the stack, remove it from the set).
You can adapt Dijkstra's algorithm, also see A Recursive Algorithm to Find all Paths Between Two Given Nodes
Related
As the title says, I have a graph that contains cycles and is directed. It's strongly connected so there's no danger of getting "stuck". Given a start node, I want to find the a path (ideally the shortest but that's not the thing I'm optimising for) that visits every node.
It's worth saying that many of the nodes in this graph are frequently connected both ways - i.e. it's almost undirected. I'm wondering if there's a modified DFS that might work well for this particular use case?
If not, should I be looking at the Held-Karp algortihm? The visit once and return to starting point restrictions don't apply for me.
The easiest approach would probably be to choose a root arbitrarily and compute a BFS tree on G (i.e., paths from the root to each other vertex) and a BFS tree on the transpose of G (i.e., paths from each other vertex to the root). Then for each other vertex you can navigate to and from the root by alternating tree paths. There are various quick optimizations to this method.
Another possibility would be to use A* on the search space consisting of states current node × set of visited nodes, with heuristic equal to the number of nodes not visited yet. The worst-case running time is comparable to Held–Karp (which you could also apply after running Floyd–Warshall to form a complete unsymmetric distance matrix).
I just want to get the distance of source node from every node. But it is different than graph problems since it is a tree and path between every node is unique so I expect answer to be in more efficient time.
Is it possible to get answer in efficient time?
You're absolutely right that in a tree, the difficulty of finding a path between two nodes is a lot lower than in a general graph because once you find any path (at least, one without cycles) you know it's the shortest. So all you have to do is just find all paths starting at the given node and going to each other node. You can do this with either a depth-first or a breadth-first search in time O(n). To find the lengths, just keep track of the lengths of the edges you've seen along the paths you've traveled as you travel them.
This is not different from "graph problems": a tree is a special case of a graph. Dijkstra's algorithm is a standard of graph traversal. Just modify it a little: keep all of the path lengths as you find them, and don't worry about the compare-update step, since you're going to keep all of the results. Continue until you run out of nodes to check, and there are your path lengths.
To expand on the title, I need all simple (non-cyclical) paths between all nodes in a very large undirected graph.
The most obvious optimization I can think of is that once I have calculated all the paths between a particular pair of nodes I can just reverse them all instead of recalculating when I need to go the other way.
I was looking into transitive closures and the Floyd–Warshall algorithm, but it looks like the best I could do if I went down that route would be to find only the shortest paths between all nodes.
Any ideas? Right now I'm looking at running a DFS on every node in the graph, which seems to me to be significantly less than optimal.
I don't understand the reasoning behind your idea that DFS is significantly less than optimal. In fact, DFS is clearly optimal here.
If you traverse the graph, limiting the branching only to vertices which haven't been visited in this branch so far, then the total number of nodes in the DFS tree will be equal to the number of simple paths from the starting vertex to all other vertices. As all of these paths are a part of your output, the algorithm cannot be meaningfully improved, as you can't reduce complexity below the size of the output.
There is simply no way to output a factorial amount of data in polynomial time, regardless of what the problem is or what algorithm you are using.
I have studied the two graph traversal algorithms,depth first search and breadth first search.Since both algorithms are used to solve the same problem of graph traversal I would like to know how to choose between the two.I mean is one more efficient than the other or any reason why i would choose one over the other in a particular scenario ?
Thank You
Main difference to me is somewhat theoretical. If you had an infinite sized graph then DFS would never find an element if it exists outside of the first path it chooses. It would essentially keep going down the first path and would never find the element. The BFS would eventually find the element.
If the size of the graph is finite, DFS would likely find a outlier (larger distance between root and goal) element faster where BFS would find a closer element faster. Except in the case where DFS chooses the path of the shallow element.
In general, BFS is better for problems related to finding the shortest paths or somewhat related problems. Because here you go from one node to all node that are adjacent to it and hence you effectively move from path length one to path length two and so on.
While DFS on the other end helps more in connectivity problems and also in finding cycles in graph(though I think you might be able to find cycles with a bit of modification of BFS too). Determining connectivity with DFS is trivial, if you call the explore procedure twice from the DFS procedure, then the graph is disconnected (this is for an undirected graph). You can see the strongly connected component algorithm for a directed graph here, which is a modification of DFS. Another application of the DFS is topological sorting.
These are some applications of both the algorithms:
DFS:
Connectivity
Strongly Connected Components
Topological Sorting
BFS:
Shortest Path(Dijkstra is some what of a modification of BFS).
Testing whether the graph is Bipartitie.
When traversing a multiply-connected graph, the order in which nodes are traversed may greatly influence (by many orders of magnitude) the number of nodes to be tracked by the traversing method. Some kinds of algorithms will be massively better when using breadth-first; others will be massively better when using depth-search.
At one extreme, doing a depth-first search on a binary tree with N leaf nodes requires that the traversing method keep track of lgN nodes while a breadth-first search would require keeping track of at least N/2 nodes (since it might scan all other nodes before it scans any leaf nodes; immediately prior to scanning the first leaf node, it would have encountered N/2 of the leafs' parent nodes which have to be tracked separately since none of them reference each other).
On the other extreme, doing a flood-fill on an NxN grid with a method that, if its pixel hasn't been colored yet, colors that pixel and then flood-fills its neighbors will require enqueuing O(N) pixel coordinates if using breadth-first search, but O(N^2) pixel coordinates if using depth-first. When using breadth-first search, paint will seem to "spread out", regardless of the shape to be painted; when using depth-first algorithm to paint a rectangular spiral, each line of which is straight on one side and jagged on the other (which sides should be straight and jagged depends upon the exact algorithm used), all of the straight sections will get painted before any of the jagged ones, meaning that the system must track the location of every jag separately.
For a complete/perfect tree, DFS takes a linear amount of space with respect to the depth of the tree whereas BFS takes an exponential amount of space with respect to the depth of the tree. This is because for BFS the maximum number of nodes in the queue is proportional to the number of nodes in one level of the tree. In DFS the maximum number of nodes in the stack is proportional to the depth of the tree.
I'm looking for an algorithm to count the number of paths crossing a specific node in a DAG (similar to the concept of 'betweenness'), with the following conditions and constraints:
I need to do the counting for a set of source/destination nodes in the graph, and not all nodes, i.e. for a middle node n, I want to know how many distinct shortest paths from set of nodes S to set of nodes D pass through n (and by distinct, I mean every two paths that have at least one non-common node)
What are the algorithms you may suggest to do this, considering that the DAG may be very large but sparse in edges, and hence preference is not given to deep nested loops on nodes.
You could use a breadth first search for each pair of Src/Dest nodes and see which of those have your given node in the path. You would have to modify the search slightly such that once you've found your shortest path, you continue to empty the queue until you reach a path that causes you to increase the size. In this way you're not bound by random chance if there are multiple shortest paths. This is only an option with non-weighted graphs, of course.