dijkstra's shortest path algorithm backtracks? - hadoop

I am trying to implement dijkstra's shortest path algorithm using map reduce.
I have two questions:
Does this algorithm backtrack to re-evaluate distances when a path that was not initially selected turns out to be shorter overall? For example, suppose there are two possible paths to a destination with edge weights 1->2->5 and 2->3->2. The first path would be selected initially because 1 < 2, but the total weight is lower for the second path (2->3->2), so I want to know whether Dijkstra's algorithm takes care of backtracking.
Please give me a brief idea of what the map and reduce functions will look like in this case. I am thinking of emitting in map function as and in reduce function and in reduce function I iterate over associated weights to find the least weighted neighbour, but after that, how does it proceed? Please give me a good idea of how it works from scratch in a cluster and what happens internally.

Dijkstra's does not perform backtracking to re-evaluate the distances.
http://upload.wikimedia.org/wikipedia/commons/5/57/Dijkstra_Animation.gif
That gif should help you understand how Dijkstra's algorithm re-evaluates distances. It avoids backtracking by storing the "shortest path to node n" found so far inside node n.
During traversal, if the algorithm comes across node n again, it simply compares the distance it has just traversed to reach node n with the value stored in node n. If the new distance is greater, it is ignored; if it is smaller, it replaces the value stored in node n.
Dijkstra's does, however, have a limitation with negative edge weights: once a node's distance has been finalized it is never revisited, so a negative edge discovered later can leave an incorrect result, and if the graph contains a negative cycle no shortest path is defined at all. That is something you should be wary of.
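To make the "compare and replace" step concrete, here is a minimal single-machine sketch in Python (not a MapReduce implementation). The graph encodes the two paths from the question; the node names `s`, `a`, `b`, `x`, `y`, `t` and the function name `dijkstra` are made up for illustration, and non-negative weights are assumed.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter distance was stored later
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            # Compare with the value stored for the neighbor; keep the smaller one.
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Path 1: s -1-> a -2-> b -5-> t (total 8)
# Path 2: s -2-> x -3-> y -2-> t (total 7)
graph = {
    "s": [("a", 1), ("x", 2)],
    "a": [("b", 2)],
    "b": [("t", 5)],
    "x": [("y", 3)],
    "y": [("t", 2)],
}
print(dijkstra(graph, "s")["t"])  # 7
```

The distance stored for `t` is first set to 8 via the first path and then replaced by 7 when the second path reaches it, with no backtracking involved.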

Related

Does Dijkstra's algorithm always return the "shortest" (least number of edges) path?

There are two functions that I wish to minimize:
a. the number of "obstacles" on the path (assume each obstacle increases the cost); and
b. total number of edges between the source and the destination.
If I had to minimize just (a), I would have used Dijkstra's algorithm; if I had to minimize just (b), I would have used BFS.
But given that I have to minimize both, can I use Dijkstra's algorithm only? In other words, if I find the path with the least cost from the obstacles, does Dijkstra's algorithm also guarantee that the path length thus obtained (between source and destination) would be the shortest?
When discussing paths on weighted graphs, the term "shortest path" means the path with the lowest total cost. Think of the weights as distances. This is the path that Dijkstra's algorithm will find.
You can use any cost function you like, as long as the cost to get from one vertex to another is always positive or zero. As mentioned in comments, however, you can only minimize one function at a time. This is a general fact that has nothing to do with Dijkstra's algorithm.
The cost function that you seem to be suggesting is perfectly fine: the cost to move to a normal vertex is 1, while the cost to move to an "obstacle" vertex is higher. Dijkstra's algorithm is the appropriate way to find a path with the lowest total cost.
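A small sketch of that cost function, assuming an unweighted adjacency list and a set of obstacle vertices. The penalty value `OBSTACLE_COST = 10`, the function name `cheapest_path`, and the example graph are all hypothetical. The example also shows that the lowest-cost path is not necessarily the one with the fewest edges.

```python
import heapq

OBSTACLE_COST = 10  # hypothetical penalty for entering an obstacle vertex

def cheapest_path(adjacency, obstacles, source, target):
    """Return (path, cost) where entering a normal vertex costs 1."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break  # the first time the target is popped its distance is final
        if d > dist[node]:
            continue  # stale heap entry
        for nxt in adjacency[node]:
            step = OBSTACLE_COST if nxt in obstacles else 1
            if d + step < dist.get(nxt, float("inf")):
                dist[nxt] = d + step
                parent[nxt] = node
                heapq.heappush(heap, (d + step, nxt))
    if target not in dist:
        return None, float("inf")
    path, node = [], target
    while node is not None:  # walk the parent links back to the source
        path.append(node)
        node = parent[node]
    return path[::-1], dist[target]

# A-B-D has 2 edges but passes through obstacle B; A-C-E-D has 3 edges and none.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "E": ["C", "D"],
    "D": ["B", "E"],
}
obstacles = {"B"}
print(cheapest_path(adjacency, obstacles, "A", "D"))  # (['A', 'C', 'E', 'D'], 3)
```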

Finding shortest path distances from a given node s to ALL the nodes in V in a graph with two negative edges

I have a follow-up question on this:
Finding shortest path distances in a graph containing at most two negative edges
Ranveer's solution looks great, but it is not fast enough because I need an O(|E| + |V|*log|V|) algorithm.
I guess Dukeling's solution works great. It makes sense and it operates in the same running time of Dijkstra's algorithm.
However, my goal is to find shortest path distances from a given node s to ALL the nodes in V.
If I apply Dukeling's algorithm with each node in V in turn as the end vertex e, I will need to run it |V| - 1 times. Then the running time will be O(|V||E| + |V|^2*log|V|).
Any help would be appreciated!
Dijkstra's algorithm, in its original form, finds all the shortest paths from a source node to all other nodes in the graph.
You have (at least) two options for your problem:
1. Use Bellman-Ford. It's not as slow as its big-oh would suggest, at least not necessarily. Make sure you implement it like you would a breadth-first search: using a FIFO queue. This means you will insert a node into the queue every time the distance to it is updated, and only if it isn't already in the queue. Other optimizations are also possible, but this should already give you a fast algorithm in practice;
2. Use Dijkstra's, but modified similarly to Bellman-Ford: the default Dijkstra's never inserts a node twice into the priority queue. Make sure you reinsert nodes if you have updated the distance to them. This will deal with negative cost edges. It essentially makes the algorithm closer to the Bellman-Ford described above, but using a priority queue instead of a FIFO queue. This will also get you closer to your desired complexity.
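The first option can be sketched roughly as follows. This is a minimal illustration, assuming the graph has no negative cycle reachable from the source (otherwise the loop never terminates); the function name `bellman_ford_queue` and the example graph with two negative edges are invented for the example.

```python
from collections import deque

def bellman_ford_queue(graph, source):
    """Queue-based Bellman-Ford; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0}
    queue = deque([source])
    in_queue = {source}
    while queue:
        node = queue.popleft()
        in_queue.discard(node)
        for neighbor, weight in graph.get(node, []):
            candidate = dist[node] + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                # Re-insert only when the distance improved and the node
                # is not already waiting in the queue.
                if neighbor not in in_queue:
                    queue.append(neighbor)
                    in_queue.add(neighbor)
    return dist

# Example with two negative edges (a->c and c->d) and no negative cycle.
graph = {
    "s": [("a", 4), ("b", 2)],
    "a": [("c", -3)],
    "b": [("a", 1), ("c", 5)],
    "c": [("d", -1)],
    "d": [],
}
print(bellman_ford_queue(graph, "s"))
```

The second option has the same shape with `heapq` in place of the `deque`: push `(candidate, neighbor)` whenever a distance improves and skip popped entries whose distance is larger than the stored one.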

Efficient way for finding the nodes having maximum degree of separation in a connected graph

I am working on a graph library. It has to have a function which finds the two nodes that are most separated, i.e. the pair for which the minimum number of nodes that must be traversed to reach the target node from the source node is the largest.
One naive way would be to calculate the degree of separation from one node to all other nodes and repeat the same for every node.
The complexity of this turns out to be O(n^2).
Is there a better solution to this problem?
Use Floyd-Warshall algorithm to find all pairs shortest path. Then iterate through results and find one with the longest path.
Without any assumptions on the graph, Floyd-Warshall is the way to go.
If your graph is sparse (i.e. it has relatively few edges per node, or |E|<<|N|^2), then Johnson's algorithm is likely to be faster.
With unit edge weight (which seems to be your case), a naïve approach by computing the furthest node (with BFS) for each node leads to O(|N|.|E|). This can probably be improved further, but I don't see a way right now.
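A minimal sketch of that naive unit-weight approach, assuming a connected undirected graph stored as an adjacency list. The function names and the example graph are made up for illustration.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def most_separated_pair(adjacency):
    """Run one BFS per node and keep the largest distance seen."""
    best = (0, None, None)  # (distance, source, target)
    for source in adjacency:
        dist = bfs_distances(adjacency, source)
        far = max(dist, key=dist.get)
        if dist[far] > best[0]:
            best = (dist[far], source, far)
    return best

# Chain 1-2-3-4 with an extra leaf 5 attached to node 2.
adjacency = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(most_separated_pair(adjacency))  # (3, 1, 4)
```

Each BFS costs O(|N| + |E|), so the whole loop is O(|N|.|E|) on a connected graph.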

Most time efficient method of finding all simple paths between all nodes in an undirected graph

To expand on the title, I need all simple (non-cyclical) paths between all nodes in a very large undirected graph.
The most obvious optimization I can think of is that once I have calculated all the paths between a particular pair of nodes I can just reverse them all instead of recalculating when I need to go the other way.
I was looking into transitive closures and the Floyd–Warshall algorithm, but it looks like the best I could do if I went down that route would be to find only the shortest paths between all nodes.
Any ideas? Right now I'm looking at running a DFS on every node in the graph, which seems to me to be significantly less than optimal.
I don't understand the reasoning behind your idea that DFS is significantly less than optimal. In fact, DFS is clearly optimal here.
If you traverse the graph, limiting the branching only to vertices which haven't been visited in this branch so far, then the total number of nodes in the DFS tree will be equal to the number of simple paths from the starting vertex to all other vertices. As all of these paths are a part of your output, the algorithm cannot be meaningfully improved, as you can't reduce complexity below the size of the output.
There is simply no way to output a factorial amount of data in polynomial time, regardless of what the problem is or what algorithm you are using.
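For reference, the DFS described above can be sketched like this, assuming an adjacency-list graph. The function name `simple_paths_from` and the triangle example are invented. Every step of the search yields exactly one simple path, which is why the running time is proportional to the size of the output.

```python
def simple_paths_from(adjacency, start):
    """Yield every simple path that starts at `start` and has at least one edge."""
    path = [start]
    on_path = {start}

    def extend():
        for nxt in adjacency[path[-1]]:
            if nxt in on_path:
                continue  # already visited in this branch: would create a cycle
            path.append(nxt)
            on_path.add(nxt)
            yield list(path)      # each DFS tree node is one simple path
            yield from extend()
            path.pop()
            on_path.remove(nxt)

    yield from extend()

# Triangle graph: 4 simple paths from each vertex, 12 in total.
adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
for p in simple_paths_from(adjacency, "A"):
    print(p)
```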

Suggestions for KSPA on undirected graph

There is a custom implementation of KSPA which needs to be rewritten. The current implementation uses a modified Dijkstra's algorithm whose pseudocode is roughly explained below. I think it is commonly known as KSPA using an edge-deletion strategy. (I am a novice in graph theory.)
Step:-1. Calculate the shortest path between any given pair of nodes using the Dijkstra algorithm. k = 0 here.
Step:-2. Set k = 1
Step:-3. Extract all the edges from all the ‘k-1’ shortest path trees. Add the same to a linked list Edge_List.
Step:-4. Create a combination of ‘k’ edges from Edge_List to be deleted at once such that each edge belongs to a different SPT (Shortest Path Tree). This can be done by inspecting the ‘k’ value for each edge of the combination considered. The ‘k’ value has to be different for each of the edge of the chosen combination.
Step:-5. Delete the combination of edges chosen in the above step temporarily from the graph in memory.
Step:-6. Re-run Dijkstra for the same pair of nodes as in Step:-1.
Step:-7. Add the resulting path into a temporary list of paths. Paths_List.
Step:-8. Restore the deleted edges back into the graph.
Step:-9. Go to Step:-4 to get another combination of edges for deletion until all unique combinations are exhausted. This is nothing but choosing ‘r’ edges at a time among ‘n’ edges => nCr.
Step:-10. The ‘k+1’ th shortest path is = Minimum(Paths_List).
Step:-11. k = k + 1. Go to Step:-3 while k < N.
Step:-12. STOP
As I understand the algorithm, to get the kth shortest path, ‘k-1’ SPTs are to be found between each source-destination pair, and ‘k-1’ edges, one from each SPT, are to be deleted simultaneously for every combination.
Clearly this algorithm has combinatorial complexity and clogs the server on large graphs. People suggested Eppstein's algorithm to me (http://www.ics.uci.edu/~eppstein/pubs/Epp-SJC-98.pdf). But this white paper cites a 'digraph' and I did not see a mention that it works only for digraphs. I just wanted to ask folks here if anyone has used this algorithm on an undirected graph?
If not, are there good algorithms (in terms of time-complexity) to implement KSPA on an undirected graph?
Thanks in advance,
Time complexity: O(K*(E*log(K)+V*log(V)))
Memory complexity of O(K*V) (+O(E) for storing the input).
We perform a modified Dijkstra as follows:
For each node, instead of keeping the best currently known cost of a route from the start node, we keep the best K routes from the start node.
When updating a node's neighbours, we don't check whether it improves the best currently known path (like Dijkstra does); we check whether it improves the worst of the K' best currently known paths.
After we have already processed the first of a node's K best routes, we no longer need to find K best routes but only the K-1 remaining, and after another one K-2. That's what I called K'.
For each node we will keep two priority queues for the K' best currently known path lengths.
In one priority queue the shortest path is on top. We use this priority queue to determine which of the K' is best and will be used in the regular Dijkstra's priority queue as the node's representative.
In the other priority queue the longest path is on top. We use this one to compare candidate paths to the worst of the K' paths.
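A rough sketch of this idea in Python, with some simplifications that are my own assumptions rather than part of the description above: it keeps a single max-heap per node (the "longest on top" queue) and lets one global min-heap play the role of the per-node "shortest on top" queues, it returns only path lengths, and the paths it counts may revisit nodes. The function name `k_shortest_lengths` and the example graph are hypothetical. For an undirected graph, list every edge in both directions.

```python
import heapq

def k_shortest_lengths(graph, source, target, k):
    """Lengths of the k shortest source->target routes (nodes may repeat)."""
    # best[node]: max-heap (values negated) of up to k best known lengths.
    best = {source: [0]}
    heap = [(0, source)]  # global min-heap of (length, node) candidates
    while heap:
        d, node = heapq.heappop(heap)
        if d > -best[node][0]:
            continue  # this length was evicted from the node's k best
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            kept = best.setdefault(neighbor, [])
            if len(kept) < k:
                heapq.heappush(kept, -candidate)
                heapq.heappush(heap, (candidate, neighbor))
            elif candidate < -kept[0]:
                # Improves on the worst of the k best: replace it.
                heapq.heapreplace(kept, -candidate)
                heapq.heappush(heap, (candidate, neighbor))
    return sorted(-x for x in best.get(target, []))

# Routes to t: s-a-t (2), s-b-t (3), s-a-b-t (3).
graph = {
    "s": [("a", 1), ("b", 2)],
    "a": [("t", 1), ("b", 1)],
    "b": [("t", 1)],
    "t": [],
}
print(k_shortest_lengths(graph, "s", "t", 3))  # [2, 3, 3]
```

To recover the actual routes rather than just their lengths, each stored length would also need a back-pointer to the (node, length) entry it was extended from.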
