I am implementing a bidirectional Dijkstra's algorithm and having issues understanding how the various implementations of the stopping condition work out.
Take this graph for example, where we're starting from node A and targeting node J. Below the graph I have listed the cluster (processed) and relaxed (fringe) nodes at the time the algorithm stops:
The accepted answer to “Bidirectional Dijkstra” by NetworkX explains that the algorithm stops when the same node has been processed in both directions. In my graph that will be node F. If that were the case, the algorithm would stop after finding a path of length 9 going from A-B-C...H-I-J. But this wouldn't be the shortest path, because A and J have a direct edge of length 8, which is never taken because the weight 8 is never popped from the priority queue.
Even in this Java implementation of bidirectional Dijkstra on GitHub, the stopping condition is:
double mtmp = DISTANCEA.get(OPENA.min()) +
DISTANCEB.get(OPENB.min());
if (mtmp >= bestPathLength) return PATH;
This algorithm stops when the minimum weights at the top of the forward and backward queues add up to at least the best path length found so far. But this wouldn't return the correct shortest path either: in that case the tops would be node G(6) and node E(5), totalling 11, which is greater than the best path found so far, of length 9.
I don't understand why both of these stopping conditions seem to yield an incorrect shortest path. I am not sure what I am misunderstanding.
What is a good stopping condition for Dijkstra's bidirectional algorithm? Also, what would be a stopping condition for bidirectional A*?
Conceptually the stopping condition for Dijkstra's algorithm, whether bidirectional or not, is that you stop when the best path you've found is as good as any path you might find if you continue. Only closed paths (the ones in your "cluster" sets above) count as found.
For bidirectional Dijkstra's, a path is "found" whenever the same vertex exists in both the forward and reverse closed sets. That part is easy, but how good is the best path you might find in the future?
To make sure the answer you get is correct, your evaluation of the best path you might find needs to be accurate, or an underestimate. Let's consider the possibilities:
(1) A path might be made with a vertex that is open in both directions.
(2) A vertex that is open in one direction might make a path with one that is already closed in the other direction.
The problem is case (2). The sets and priority queues we use for Dijkstra's algorithm do not allow us to make a very useful underestimate of the best path in this case. The smallest distance in a closed set is always 0, and if we add this to the minimum open from the other direction, then we come up with:
double mtmp = Math.min(DISTANCEA.get(OPENA.min()), DISTANCEB.get(OPENB.min()));
This works, and will produce the correct answer, but it will make the algorithm run until the best complete path is found in at least one direction. Unidirectional Dijkstra's would be faster in many cases.
I think this "bidirectional Dijkstra's" idea needs significant rework in order to be really good.
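For comparison, here is a minimal sketch of the formulation many implementations use, assuming an undirected graph with non-negative weights (the class, record and method names are made up for illustration). It keeps the sum-of-queue-minimums test from the question, but it updates the best-path bound mu every time an edge is relaxed into a vertex that the other search has already labelled, not only when a vertex is closed in both directions. With that bookkeeping, the A-J edge of length 8 from the question sets mu to 8 as soon as A is expanded, so the path of length 9 is never returned.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class BidirectionalDijkstra {
    /** Edge of an undirected graph; weights are assumed to be non-negative. */
    public record Edge(int to, double weight) {}

    /** Returns the s-t shortest distance, or +Infinity if t is unreachable from s. */
    public static double shortestDistance(List<List<Edge>> adj, int s, int t) {
        int n = adj.size();
        double[][] dist = new double[2][n];        // dist[0]: from s (forward), dist[1]: from t (backward)
        boolean[][] closed = new boolean[2][n];
        for (double[] row : dist) Arrays.fill(row, Double.POSITIVE_INFINITY);
        Comparator<double[]> byDistance = Comparator.comparingDouble(a -> a[0]);
        List<PriorityQueue<double[]>> queues =     // entries are {distance, vertex}
                List.of(new PriorityQueue<>(byDistance), new PriorityQueue<>(byDistance));
        dist[0][s] = 0;
        dist[1][t] = 0;
        queues.get(0).add(new double[] {0, s});
        queues.get(1).add(new double[] {0, t});
        double mu = (s == t) ? 0 : Double.POSITIVE_INFINITY;   // best s-t path length seen so far

        while (!queues.get(0).isEmpty() && !queues.get(1).isEmpty()) {
            double topF = queues.get(0).peek()[0], topB = queues.get(1).peek()[0];
            if (topF + topB >= mu) break;            // the stopping test quoted in the question
            int side = topF <= topB ? 0 : 1;         // advance the side with the smaller minimum
            int u = (int) queues.get(side).poll()[1];
            if (closed[side][u]) continue;           // stale entry: u was already closed with a shorter label
            closed[side][u] = true;
            for (Edge e : adj.get(u)) {
                double nd = dist[side][u] + e.weight();
                if (nd < dist[side][e.to()]) {
                    dist[side][e.to()] = nd;
                    queues.get(side).add(new double[] {nd, e.to()});
                }
                // Any relaxed edge whose far end already has a label from the other side
                // completes a candidate s-t path, even if no vertex is closed in both directions.
                mu = Math.min(mu, nd + dist[1 - side][e.to()]);
            }
        }
        return mu;
    }
}
```

Stale queue entries can only make the queue minimums smaller than the true ones, which makes the stopping test more conservative, so the sketch uses lazy deletion instead of a decrease-key operation.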
I read in the Cracking the Coding Interview book that a bidirectional search gives the shortest path between two points in a graph.
I don't get why it is guaranteed to be a shortest path. Doesn't the collision point depend on the order in which vertices are queued during the breadth-first search?
Doesn't the collision point depend on the order in which vertices are queued during the breadth-first search?
Yes, it does. However, whichever node ends up being chosen, it will connect one of the shortest paths between the source and target nodes.
So if there are multiple such choices, it can be through any of them depending on queueing order as you said. But you are guaranteed that the resulting path will be of the same optimal length.
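A minimal sketch of a bidirectional breadth-first search on an unweighted, undirected graph, assuming an adjacency map (the class and method names are illustrative). It expands one whole level at a time, always from the smaller frontier. Because each side's visited set is a complete ball around its source whenever a level starts, the first collision found already gives a shortest path, whichever vertex it happens to be.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class BidirectionalBfs {
    /** Number of edges on a shortest s-t path in an undirected graph, or -1 if none exists. */
    public static int shortestPathLength(Map<Integer, List<Integer>> adj, int s, int t) {
        if (s == t) return 0;
        Map<Integer, Integer> distS = new HashMap<>(Map.of(s, 0));
        Map<Integer, Integer> distT = new HashMap<>(Map.of(t, 0));
        Deque<Integer> frontierS = new ArrayDeque<>(List.of(s));
        Deque<Integer> frontierT = new ArrayDeque<>(List.of(t));
        while (!frontierS.isEmpty() && !frontierT.isEmpty()) {
            // Expand exactly one whole level, from whichever frontier is smaller.
            int found = frontierS.size() <= frontierT.size()
                    ? expandLevel(adj, frontierS, distS, distT)
                    : expandLevel(adj, frontierT, distT, distS);
            if (found >= 0) return found;
        }
        return -1;                                   // one side ran out: no s-t path
    }

    /** Expands one level; returns the path length at the first collision, or -1 if none. */
    private static int expandLevel(Map<Integer, List<Integer>> adj, Deque<Integer> frontier,
                                   Map<Integer, Integer> mine, Map<Integer, Integer> other) {
        int levelSize = frontier.size();
        for (int i = 0; i < levelSize; i++) {
            int u = frontier.poll();
            for (int v : adj.getOrDefault(u, List.of())) {
                if (mine.containsKey(v)) continue;   // already reached from this side
                mine.put(v, mine.get(u) + 1);
                // Both balls were complete before this level, so any collision here is optimal.
                if (other.containsKey(v)) return mine.get(v) + other.get(v);
                frontier.add(v);
            }
        }
        return -1;
    }
}
```

If the two searches instead alternate one vertex at a time, the balls are no longer complete and stopping at the first collision can overshoot, which is why the sketch works level by level.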
Most of the time when implementing a pathfinding algorithm such as A*, we seek to minimize the travel cost along the path. We could also seek to find the optimal path with the fewest number of turns. This could be done by, instead of having a grid of location states, having a grid of location-direction states. For any given location in the old grid, we would have 4 states in that spot representing that location moving left, right, up, or down. That is, if you were expanding to a node above you, you would actually be adding the 'up' state of that node to the priority queue, since we've found the quickest route to this node when going UP. If you were going that direction anyway, we wouldn't add anything to the weight. However, if we had to turn from the current node to get to the expanded node, we would add a small epsilon to the weight such that two shortest paths in distance would not be equal in cost if their number of turns differed. As long as epsilon << cost of moving between nodes, it's still the shortest path.
I now pose a similar problem, but with relaxed constraints. I no longer wish to find the shortest path, not even a path with the fewest turns. My only goal is to find a path of ANY length with numTurns <= n. To clarify, the goal of this algorithm would be to answer the question:
"Does there exist a path P from locations A to B such that there are fewer than or equal to n turns?"
I'm asking whether using some sort of greedy algorithm here would be helpful, since I do not require minimum distance nor turns. The problem is, if I'm NOT finding the minimum, the algorithm may search through more squares on the board. That is, normally a shortest path algorithm searches the least number of squares it has to, which is key for performance.
Are there any techniques that come to mind that would provide an efficient way (better or same as A*) to find such a path? Again, A* with fewest turns provides the "optimal" solution for distance and #turns. But for my problem, "optimal" is the fastest way the function can return whether there is a path of <=n turns between A and B. Note that there can be obstacles in the path, but other than that, moving from one square to another is the same cost (unless turning, as mentioned above).
I've been brainstorming, but I cannot think of anything other than A* with the turn states. It might not be possible to do better than this, but I thought there may be a clever exploitation of my relaxed conditions. I've even considered using just numTurns as the cost of moving on the board, but that could waste a lot of time searching dead paths. Thanks very much!
Edit: Final clarification - Path does not have to have least number of turns, just <= n. Path does not have to be a shortest path, it can be a huge path if it only has n turns. The goal is for this function to execute quickly, I don't even need to record the path. I just need to know whether there exists one. Thanks :)
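The location-direction state grid described at the start of this question can be sketched as follows, assuming a grid of strings where '#' marks an obstacle (the class name and grid encoding are assumptions). For brevity it is plain Dijkstra without an A* heuristic, and instead of a floating-point epsilon it charges a large integer per step plus 1 per turn, which orders paths by length first and by number of turns second.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public final class TurnAwareGridSearch {
    private static final int[] DR = {-1, 1, 0, 0};   // directions: up, down, left, right
    private static final int[] DC = {0, 0, -1, 1};

    /** Returns {steps, turns} of a shortest path with the fewest turns among shortest paths, or null. */
    public static long[] search(String[] grid, int sr, int sc, int tr, int tc) {
        int rows = grid.length, cols = grid[0].length();
        long stepCost = 4L * rows * cols + 1;        // one step outweighs any possible number of turns
        long[][][] best = new long[rows][cols][4];   // best cost per (row, col, direction) state
        for (long[][] plane : best) for (long[] cell : plane) Arrays.fill(cell, Long.MAX_VALUE);
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int d = 0; d < 4; d++) {                // the first move is never counted as a turn
            best[sr][sc][d] = 0;
            pq.add(new long[] {0, sr, sc, d});
        }
        while (!pq.isEmpty()) {
            long[] cur = pq.poll();
            int r = (int) cur[1], c = (int) cur[2], d = (int) cur[3];
            if (cur[0] > best[r][c][d]) continue;    // stale entry
            if (r == tr && c == tc) return new long[] {cur[0] / stepCost, cur[0] % stepCost};
            for (int nd = 0; nd < 4; nd++) {
                int nr = r + DR[nd], nc = c + DC[nd];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || grid[nr].charAt(nc) == '#') continue;
                long cost = cur[0] + stepCost + (nd == d ? 0 : 1);   // a turn adds the "epsilon"
                if (cost < best[nr][nc][nd]) {
                    best[nr][nc][nd] = cost;
                    pq.add(new long[] {cost, nr, nc, nd});
                }
            }
        }
        return null;                                 // target unreachable
    }
}
```

Using an integer step cost larger than any possible turn count avoids having to pick a floating-point epsilon that is "small enough".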
I have implemented the Hungarian algorithm, a solution to the assignment problem, as described by this article, but it fails on a few percent of random cost matrices.
I've spent weeks debugging it (I started when I asked this question, not full time though). I took random cost matrices for which the algorithm fails, performed the algorithm with good old pen and paper, and compared that with my implementation to see what went wrong. This led me to a few bugs which I've corrected now, but I have encountered an example for which I do not get the right solution when solving it by hand. For anyone who is interested: the cost matrix of that example is {{0,6,4,3},{3,2,1,2},{0,7,6,4},{3,8,5,3}}, for which the correct solution has the sum 9 = 4+2+0+3 (in that order). In that example there is eventually a matched edge not on the equality subgraph, and I think that is impossible, indicating something is wrong.
Either I don't fully understand the solution, which is a viable option, or there is an extremely subtle bug in the presented solution, which I will elaborate on below.
I realize I have to introduce some terminology, but since this is a detailed question I am not going to explain all concepts in full detail, as anyone needing that explanation probably wouldn't be able to answer my question anyway.
The input of the problem is a weighted complete bipartite graph with n nodes on each partition.
The presented method specifies finding n augmenting paths.
An augmenting path is an alternating path starting and ending at unmatched nodes.
An alternating path is a path alternating between matched and unmatched edges on the equality subgraph.
These alternating paths are grown in a breadth-first manner, stopping only when either:
An augmenting path is found or
the alternating paths cannot be grown any further.
And a fact crucial to the possible bug: the algorithm remembers which nodes the alternating paths have encountered, which affects the algorithm in a part irrelevant to this question.
When an augmenting path is found, the presented method says to stop growing the alternating paths. I believe this is incorrect. I think all alternating paths need to be grown up to the cost of the found augmenting path. Notice that the alternating paths are grown in a breadth-first manner, so this only grows paths whose costs can tie with the found path. This small change might result in some nodes being marked as 'visited by alternating path' which otherwise wouldn't have been marked, which affects the algorithm further on.
The actual question:
Should alternating paths whose cost equals the cost of the augmenting path (and which start at the same node) also be explored? This is contrary to the presented method, which says to stop as soon as an augmenting path is found, regardless of any ties in cost with other paths.
Looking at the presentation of the Hungarian algorithm in "The Stanford GraphBase" you can track its progress towards a solution as adding a constant to every cell in a row of the cost matrix, or every cell in a column of the cost matrix, and see that you have a solution when you have a complete set of independent zeros in the altered matrix.
I have read the paper you refer to only once. Is it the case that finding an augmenting path allows you to increase the number of independent zeros in the altered matrix? If so, then finding n augmenting paths, as in their Figure 3 step 2, will find a good solution, because you must then have n independent zeros. If so, then you can check your implementation of the algorithm by checking that each augmenting path found adds an independent zero, even in the case when there are other paths that it could have found but stopped short of finding.
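Along the same lines of checking an implementation, it can help to compare against an independent reference solver on random matrices such as the 4x4 example in the question. Below is a minimal sketch of the common O(n^3) shortest-augmenting-path formulation of the Hungarian method with row and column potentials; it is not the breadth-first variant from the article, and the class name is illustrative. Note that it, too, ends each phase at the first free column it reaches.

```java
import java.util.Arrays;

public final class Hungarian {
    /** Minimum-cost assignment for a square cost matrix: result[row] = assigned column. */
    public static int[] solve(int[][] cost) {
        int n = cost.length;
        final long INF = Long.MAX_VALUE / 4;
        long[] u = new long[n + 1], v = new long[n + 1];  // row and column potentials (1-indexed)
        int[] p = new int[n + 1];      // p[j] = row currently matched to column j (0 = free)
        int[] way = new int[n + 1];    // way[j] = previous column on the alternating path to j
        for (int i = 1; i <= n; i++) {
            p[0] = i;                  // virtual column 0 holds the row being inserted
            int j0 = 0;
            long[] minv = new long[n + 1];
            Arrays.fill(minv, INF);
            boolean[] used = new boolean[n + 1];
            do {
                used[j0] = true;
                int i0 = p[j0], j1 = 0;
                long delta = INF;
                for (int j = 1; j <= n; j++) {
                    if (used[j]) continue;
                    long cur = cost[i0 - 1][j - 1] - u[i0] - v[j];   // reduced cost of edge (i0, j)
                    if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                    if (minv[j] < delta) { delta = minv[j]; j1 = j; }
                }
                for (int j = 0; j <= n; j++) {   // shift potentials so a new edge becomes tight
                    if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                    else minv[j] -= delta;
                }
                j0 = j1;
            } while (p[j0] != 0);              // stop at the first free column reached
            do {                               // augment: flip matched/unmatched along the path
                int j1 = way[j0];
                p[j0] = p[j1];
                j0 = j1;
            } while (j0 != 0);
        }
        int[] result = new int[n];
        for (int j = 1; j <= n; j++) result[p[j] - 1] = j - 1;
        return result;
    }
}
```

On the example matrix from the question it should pick columns 2, 1, 0, 3 for rows 0 to 3, i.e. 4+2+0+3 = 9.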
I've been tasked with writing an implementation of the A* algorithm (heuristics provided) that will solve the travelling salesman problem. I understand the algorithm; it's simple enough, but I just can't see the code that implements it. I mean, I get it: a priority queue for the nodes, sorted by distance + heuristic(node), and add the closest node onto the path. The question is, like, what happens if the closest node can't be reached from the previous closest node? How does one actually take a "graph" as a function argument? I just can't see how the algorithm actually functions, as code.
I read the Wikipedia page before posting the question. Repeatedly. It doesn't really answer the question: searching the graph is way, way different from solving the TSP. For example, you could construct a graph where taking the closest node at any given time always results in a backtrack, since two paths of the same length aren't equal, whereas if you're just trying to go from A to B then two paths of the same length are equal.
You could construct a graph in which some nodes are never reached if you always go to the closest node first.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
I found a solution here
Use minimum spanning tree as a heuristic.
Set
Initial State: Agent in the start city and has not visited any other city
Goal State: Agent has visited all the cities and reached the start city again
Successor Function: Generates all cities that have not yet been visited
Edge-cost: distance between the cities represented by the nodes, use this cost to calculate g(n).
h(n): distance to the nearest unvisited city from the current city + estimated distance to travel all the unvisited cities (MST heuristic used here) + nearest distance from an unvisited city to the start city. Note that this is an admissible heuristic function.
You may consider maintaining a list of visited cities and a list of unvisited cities to facilitate computations.
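A minimal sketch of the setup above, assuming a complete graph given as a symmetric distance matrix with zeros on the diagonal and at most about 20 cities, since visited sets are stored as bitmasks (class and helper names are illustrative). States are (current city, visited set), h(n) is the three-part estimate from the list above, and a state is re-queued whenever a cheaper g is found, which keeps the search optimal even if the heuristic is not consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public final class TspAStar {
    /** Length of an optimal tour from start through every city and back. */
    public static double solve(double[][] d, int start) {
        int n = d.length, full = (1 << n) - 1;                   // bit i of a mask = city i visited
        Map<Long, Double> bestG = new HashMap<>();               // best known g per (city, mask) state
        PriorityQueue<double[]> open = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        push(open, bestG, d, start, start, 1 << start, 0.0);    // initial state: at home, only home visited
        while (!open.isEmpty()) {
            double[] e = open.poll();                            // {f, g, city, mask}
            double g = e[1];
            int city = (int) e[2], mask = (int) e[3];
            if (g > bestG.get(key(city, mask))) continue;        // stale queue entry
            if (mask == full && city == start) return g;         // goal: everything visited, back home
            if (mask == full) {                                  // only the closing edge remains
                push(open, bestG, d, start, start, full, g + d[city][start]);
            } else {
                for (int next = 0; next < n; next++)             // successors: every unvisited city
                    if ((mask & (1 << next)) == 0)
                        push(open, bestG, d, start, next, mask | (1 << next), g + d[city][next]);
            }
        }
        return Double.POSITIVE_INFINITY;
    }

    private static void push(PriorityQueue<double[]> open, Map<Long, Double> bestG,
                             double[][] d, int start, int city, int mask, double g) {
        long k = key(city, mask);
        if (g < bestG.getOrDefault(k, Double.POSITIVE_INFINITY)) {
            bestG.put(k, g);
            open.add(new double[] {g + h(d, start, city, mask), g, city, mask});
        }
    }

    private static long key(int city, int mask) {
        return ((long) mask << 32) | city;
    }

    /** h(n): nearest unvisited city + MST(unvisited) + nearest unvisited city back to start. */
    private static double h(double[][] d, int start, int city, int mask) {
        List<Integer> unvisited = new ArrayList<>();
        for (int c = 0; c < d.length; c++) if ((mask & (1 << c)) == 0) unvisited.add(c);
        if (unvisited.isEmpty()) return d[city][start];          // 0 at the goal (zero diagonal assumed)
        double toNearest = Double.POSITIVE_INFINITY, backHome = Double.POSITIVE_INFINITY;
        for (int c : unvisited) {
            toNearest = Math.min(toNearest, d[city][c]);
            backHome = Math.min(backHome, d[c][start]);
        }
        return toNearest + mstWeight(d, unvisited) + backHome;
    }

    /** Prim's algorithm on the complete graph induced by the given cities. */
    private static double mstWeight(double[][] d, List<Integer> cities) {
        int k = cities.size();
        double[] link = new double[k];       // cheapest edge from each city to the growing tree
        boolean[] inTree = new boolean[k];
        Arrays.fill(link, Double.POSITIVE_INFINITY);
        link[0] = 0;
        double total = 0;
        for (int step = 0; step < k; step++) {
            int best = -1;
            for (int i = 0; i < k; i++)
                if (!inTree[i] && (best < 0 || link[i] < link[best])) best = i;
            inTree[best] = true;
            total += link[best];
            for (int i = 0; i < k; i++)
                if (!inTree[i]) link[i] = Math.min(link[i], d[cities.get(best)][cities.get(i)]);
        }
        return total;
    }
}
```

Each of the three parts of h(n) bounds a different piece of the remaining tour (the first edge, the path through the unvisited cities, the last edge), which is why the sum never overestimates.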
The confusion here is that the graph on which you are trying to solve the TSP is not the graph you are performing an A* search on.
See related: Sudoku solving algorithm C++
To solve this problem you need to:
Define your:
TSP states
TSP initial state
TSP goal state(s)
TSP state successor function
TSP state heuristic
Apply a generic A* solver to this TSP state graph
A quick example I can think up:
TSP states: list of nodes (cities) currently in the TSP cycle
TSP initial state: the list containing a single node, the travelling salesman's home town
TSP goal state(s): a state is a goal if it contains every node in the graph of cities
TSP successor function: can add any node (city) that isn't in the current cycle to the end of the list of nodes in the cycle to get a new state
The cost of the transition is equal to the cost of the edge you're adding to the cycle
TSP state heuristic: you decide
If it's just a problem of understanding the algorithm and how it works, you might want to consider drawing a graph on paper, assigning weights to it and drawing it out. You can probably also find some animations that show Dijkstra's shortest path; Wikipedia has a good one. The only difference between Dijkstra and A* is the addition of the heuristic, and that you stop the search as soon as you reach the target node. As far as using it to solve the TSP, good luck with that!
Think about this a little more abstractly. Forget about A* for a moment; it's just Dijkstra's with a heuristic anyway. Before, you wanted to get from A to B. What was your goal? To get to B. The goal was to get to B with the least cost. At any given point, what was your current "state"? Probably just your location on the graph.
Now, you want to start at A, then go to both B and C. What is your goal now? To pass over both B and C, maintaining least cost. You can generalize this with more nodes: D, E, F, ... or just N nodes. Now, at any given point, what is your current "state"? This is critical: it ISN'T just your location in the graph--it's also which of B or C or whatever nodes you have visited so far in the search.
Implement your original algorithm so that it calls some function asking if it has reached "the goal state" after each move. Before, the function would have just said "yes, you're at state B, therefore you are at the goal". But now, let that function return "yes, you're at the goal state" if the search's path has passed over each of the points of interest. It'll know whether or not the search has passed over all points of interest because that's included in the current state.
After you get that, improve the search with some heuristic, and A* it up.
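A minimal sketch of that idea, assuming a weighted graph stored as adjacency lists and fewer than about 20 points of interest (all names here are hypothetical). The search state is the pair (location, set of points of interest passed so far), stored as a bitmask, and the goal test looks at the whole state rather than at the location alone. It is plain Dijkstra; adding an admissible heuristic would turn it into A*.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public final class PointsOfInterestSearch {
    /** Weighted edge; weights are assumed to be non-negative. */
    public record Edge(int to, int weight) {}

    /** Cheapest walk from start that passes over every vertex in pois, or -1 if impossible. */
    public static long cheapestCover(List<List<Edge>> adj, int start, int[] pois) {
        int n = adj.size(), full = (1 << pois.length) - 1;
        int[] poiBit = new int[n];                       // bit contributed by each vertex (0 if not a POI)
        for (int i = 0; i < pois.length; i++) poiBit[pois[i]] |= 1 << i;
        long[][] best = new long[n][full + 1];           // state = (location, set of POIs visited)
        for (long[] row : best) Arrays.fill(row, Long.MAX_VALUE);
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        int startMask = poiBit[start];
        best[start][startMask] = 0;
        pq.add(new long[] {0, start, startMask});
        while (!pq.isEmpty()) {
            long[] cur = pq.poll();
            int u = (int) cur[1], mask = (int) cur[2];
            if (cur[0] > best[u][mask]) continue;        // stale entry
            if (isGoal(mask, full)) return cur[0];       // the goal test looks at the whole state
            for (Edge e : adj.get(u)) {
                int nextMask = mask | poiBit[e.to()];
                long cost = cur[0] + e.weight();
                if (cost < best[e.to()][nextMask]) {
                    best[e.to()][nextMask] = cost;
                    pq.add(new long[] {cost, e.to(), nextMask});
                }
            }
        }
        return -1;
    }

    /** "Yes, you're at the goal state" once every point of interest has been passed over. */
    private static boolean isGoal(int mask, int full) {
        return mask == full;
    }
}
```

Because a location can appear with different visited sets, the search is free to walk back over the same vertices, which a location-only state would forbid.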
To answer one of your questions...
To pass a graph as a function argument, you have several options. You could pass a pointer to an array containing all the nodes. You could pass just the one starting node and work from there, if it's a fully connected graph. And finally, you could write a graph class with whatever data structures you need inside it, and pass a reference to an instance of that class.
As for your other question about closest nodes, isn't part of A* search that it will backtrack as needed? Or you could implement your own sort of backtracking to handle that kind of situation.
The question is, like, what happens if the closest node can't be reached from the previous closest node?
This step isn't necessary. As in, you aren't computing a path from the previous closest to the current closest, you are trying to get to your goal node, and the current closest is the only thing that matters (e.g. the algorithm doesn't care that last iteration you were 100km away, because this iteration you are only 96km away).
As a broad introduction, A* doesn't directly construct a path: it explores until it definitely knows that the path is contained within the region it has explored, and then constructs the path based on the information recorded during the exploration.
(I'm going to use the code in the Wikipedia article as a reference implementation to aid my explanation.)
You have two sets of nodes: closedset and openset
closedset holds nodes that have been fully evaluated, that is, you know exactly how far they are from start and all their neighbours are in one of the two sets. Thus there is no more computation you can do with them, and so we can (sort of) ignore them. (Basically these are completely contained within the border.)
openset holds "border" nodes, you know how far these are from start, but you haven't touched their neighbours yet, so they are on the edge of your search so far.
(Implicitly, there is a third set: completely untouched nodes. But you don't really touch them until they are in openset so they don't matter.)
At a given iteration, if you've got nodes to explore (that is, nodes in openset), you need to work out which one to explore. This is where the heuristic comes in: added to a node's known distance from start, it gives you a hint about which point on the border will be the best to explore next, by estimating which node lies on the shortest path to goal.
The previous closest node is irrelevant, it just expanded the border a bit, adding new nodes to openset. These new nodes are now candidates for the closest node in this iteration.
At first, openset only contains start, but then you iterate and at each step the border is expanded a little (in the most promising direction), until you eventually reach goal.
When A* is actually doing the exploration, it doesn't worry about which nodes came from where. It doesn't need to, because it knows their distance from start and the heuristic function and that's all it needs.
However, to reconstruct the path later, you need some record of the path; this is what camefrom is. For a given node, camefrom links it to its predecessor on the best path found from start, so you can reconstruct the shortest path by following the links backwards from goal.
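A minimal grid sketch of the pieces described above: openSet as a priority queue ordered by distance-from-start plus heuristic, closedSet, and cameFrom. It assumes unit-cost 4-way moves, '#' for walls, and the Manhattan distance as the heuristic; the names loosely mirror the Wikipedia pseudocode, and the code itself is illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public final class GridAStar {
    private static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    /** Cells {row, col} of a shortest 4-connected path from start to goal ('#' = wall), or an empty list. */
    public static List<int[]> findPath(String[] grid, int[] start, int[] goal) {
        int rows = grid.length, cols = grid[0].length();
        int startId = start[0] * cols + start[1], goalId = goal[0] * cols + goal[1];
        int[] gScore = new int[rows * cols];                  // best known distance from start
        Arrays.fill(gScore, Integer.MAX_VALUE);
        Map<Integer, Integer> cameFrom = new HashMap<>();     // node -> predecessor on best known path
        Set<Integer> closedSet = new HashSet<>();             // fully evaluated nodes
        PriorityQueue<int[]> openSet =                        // the border, as {f = g + h, node}
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        gScore[startId] = 0;
        openSet.add(new int[] {manhattan(startId, goalId, cols), startId});
        while (!openSet.isEmpty()) {
            int current = openSet.poll()[1];                  // most promising border node
            if (current == goalId) return reconstruct(cameFrom, current, cols);
            if (!closedSet.add(current)) continue;            // stale entry for an evaluated node
            int r = current / cols, c = current % cols;
            for (int[] m : MOVES) {
                int nr = r + m[0], nc = c + m[1];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || grid[nr].charAt(nc) == '#') continue;
                int neighbor = nr * cols + nc, tentative = gScore[current] + 1;
                if (tentative < gScore[neighbor]) {           // better route to neighbor found
                    gScore[neighbor] = tentative;
                    cameFrom.put(neighbor, current);
                    openSet.add(new int[] {tentative + manhattan(neighbor, goalId, cols), neighbor});
                }
            }
        }
        return List.of();                                     // openSet exhausted: goal unreachable
    }

    /** Follows cameFrom links backwards from the goal, then returns the path start-first. */
    private static List<int[]> reconstruct(Map<Integer, Integer> cameFrom, int node, int cols) {
        LinkedList<int[]> path = new LinkedList<>();
        for (Integer cur = node; cur != null; cur = cameFrom.get(cur)) {
            path.addFirst(new int[] {cur / cols, cur % cols});
        }
        return path;
    }

    /** Manhattan distance: admissible and consistent for unit-cost 4-way moves. */
    private static int manhattan(int a, int b, int cols) {
        return Math.abs(a / cols - b / cols) + Math.abs(a % cols - b % cols);
    }
}
```

Note that the exploration loop never looks at cameFrom; it is only read once, after the goal has been popped.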
How does one actually take a "graph" as a function argument?
By passing one of the representations of a graph.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
You need a different heuristic and a different end condition: goal is no longer a single node any more, but the state of having everything connected; and your heuristic is some estimate of the length of the shortest path connecting the remaining nodes.
I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there are about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: Find all pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j]=INF
for i=1 to n: d[i][i]=0
for each edge (u,v) of length w: d[u][v]=d[v][u]=min(d[u][v], w)
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the k 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, its running time is O(n^3) for the Floyd-Warshall part, and O(k!n) for the all permutations part, and 100^3+(12!)(100) is practically peanuts unless you have some really restrictive constraints.]
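A runnable version of the pseudocode above, as a sketch assuming an undirected graph with integer edge lengths given as {u, v, length} triples and nodes numbered from 0 (class and method names are illustrative). It enumerates permutations recursively by swapping elements in place and restores the array afterwards.

```java
import java.util.Arrays;

public final class MustPassBruteForce {
    public static final long INF = Long.MAX_VALUE / 4;   // large, but two of them still fit in a long

    /** All-pairs shortest distances via Floyd-Warshall; edges are {u, v, length}, undirected. */
    public static long[][] allPairs(int n, int[][] edges) {
        long[][] d = new long[n][n];
        for (long[] row : d) Arrays.fill(row, INF);
        for (int i = 0; i < n; i++) d[i][i] = 0;
        for (int[] e : edges) {
            d[e[0]][e[1]] = Math.min(d[e[0]][e[1]], e[2]);
            d[e[1]][e[0]] = Math.min(d[e[1]][e[0]], e[2]);
        }
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    d[i][j] = Math.min(d[i][j], d[i][k] + d[k][j]);
        return d;
    }

    /** Shortest start -> end distance visiting every must-pass node, trying all orders. */
    public static long shortestThrough(long[][] d, int start, int end, int[] mustPass) {
        return permute(d, mustPass, 0, start, 0, end);
    }

    // mustPass[0..depth-1] is the fixed prefix of the current order; prev is its last node.
    private static long permute(long[][] d, int[] a, int depth, int prev, long soFar, int end) {
        if (soFar >= INF) return INF;                           // some leg so far is unreachable
        if (depth == a.length) return Math.min(INF, soFar + d[prev][end]);
        long best = INF;
        for (int i = depth; i < a.length; i++) {
            swap(a, depth, i);                                  // choose a[i] as the next stop
            best = Math.min(best, permute(d, a, depth + 1, a[depth], soFar + d[prev][a[depth]], end));
            swap(a, depth, i);                                  // undo the choice
        }
        return best;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

With a dozen must-pass nodes this still walks all 12! orders; pruning any prefix whose cost already exceeds the best complete order found so far is an easy speed-up.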
Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end.
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 in between, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This coming from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard, because the original question does not require that must-pass nodes are visited only once. This makes the answer much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go, but a simple one would be to simply work a binary tree backwards. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end]; this is your new target distance to beat. Now try [start->a->c->b->end] and if that's better, set that as the target (and remember that it came from that pattern of nodes). Work backwards over the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(Where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care.)
Andrew Top has the right idea:
1) Dijkstra's algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known heuristics for any NP-complete problem. The only other thing to remember is that after you expand the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering the amount of nodes and edges is relatively finite, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard, so no known algorithm solves it in polynomial time.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about must-pass nodes in ANY order. I have been trying to find a solution for the case where the order of the must-pass nodes is fixed. I found my answer, but since no question on StackOverflow asked about this, I'm posting it here so that as many people as possible can benefit from it.
If the order of the must-pass nodes is fixed, then you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order) and stop at e. Then you can run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path would be given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
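A minimal sketch of the fixed-order case, assuming an undirected graph with non-negative integer weights (class and method names are illustrative): one plain Dijkstra run per consecutive pair of stops, summed. Keeping a predecessor array in each run would give the concatenated path as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public final class OrderedWaypoints {
    /** Weighted undirected edge; weights are assumed to be non-negative. */
    public record Edge(int to, long weight) {}

    /** Plain Dijkstra: distance from source to target, or Long.MAX_VALUE if unreachable. */
    static long dijkstra(List<List<Edge>> adj, int source, int target) {
        long[] dist = new long[adj.size()];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<long[]> pq = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        pq.add(new long[] {0, source});
        while (!pq.isEmpty()) {
            long[] cur = pq.poll();
            int u = (int) cur[1];
            if (cur[0] > dist[u]) continue;                 // stale entry
            if (u == target) return cur[0];                 // target settled: distance is final
            for (Edge e : adj.get(u)) {
                long nd = cur[0] + e.weight();
                if (nd < dist[e.to()]) {
                    dist[e.to()] = nd;
                    pq.add(new long[] {nd, e.to()});
                }
            }
        }
        return Long.MAX_VALUE;
    }

    /** dijkstras(s, k1) + dijkstras(k1, k2) + ... + dijkstras(kLast, e), legs in the given order. */
    public static long orderedCost(List<List<Edge>> adj, int... stops) {
        long total = 0;
        for (int i = 0; i + 1 < stops.length; i++) {
            long leg = dijkstra(adj, stops[i], stops[i + 1]);
            if (leg == Long.MAX_VALUE) return Long.MAX_VALUE;   // some leg is unreachable
            total += leg;
        }
        return total;
    }
}
```

For example, orderedCost(graph, s, k1, k2, k3, e) computes exactly the sum written above.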
How about using brute force on the dozen 'must visit' nodes? You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this
Do an A* search from the start node to every point on the circuit
For each of these, do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction)
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere is whether it is OK for the same vertex to be visited more than once in the path. Most of the answers here assume that it's OK to visit the same edge multiple times, but my take given the question (a path should not visit the same vertex more than once!) is that it is not OK to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.