Does the A* algorithm always return the path with the lowest cost?
I'm running the algorithm and it proposes a path which doesn't have the minimum cost (I found another one with a lower cost).
Why does it propose this path and not the other one, which costs less?
Does it use criteria other than cost to choose the proposed path?
This is an example of what I'm asking about: the green path has a lower cost, but the algorithm proposes the orange one.
The heuristic must return a value that is smaller than or equal to the actual minimal cost. Otherwise the algorithm may return a wrong result.
Does the A* algorithm always return the path with the lowest cost?
Yes, as long as you have used an admissible heuristic. If you are getting a suboptimal path, there are two options:
Your A* implementation is wrong, and therefore the algorithm returns a suboptimal path.
The heuristic function you are using is not admissible, i.e., for some node it gives an estimate that is larger than the cost of the real shortest path from that node to the goal.
It must be one of these two, since A* is proven to always find the optimal path whenever an admissible heuristic function is used.
Imagine that the first node in the green path, which has cost 1 to be reached from A, has a heuristic value, according to whatever function you are using, of h(green_1) = 20. That value overestimates the real shortest path from that node to the goal node B, which is 6. Now assume the heuristic estimates of all nodes in the orange path equal the real shortest-path cost from each node to B. Thus,
f(green_1)= g(green_1) + h(green_1) = 1 + 20 = 21
All nodes on the orange path will have smaller f(n) values, and therefore green_1 will not be selected for expansion. The goal node will be added to the OPEN list with f(B) = g(B) + h(B) = 11 + 0 and expanded, and since our heuristic has only promised us a path with cost 21 from the green side, which is worse than the already found orange path, the algorithm will finish and return the suboptimal solution.
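To make the overestimation failure concrete, here is a minimal Python sketch. The graph, node names, and heuristic tables are made up to mirror the green/orange situation described above: a cheap route through g1 (cost 7) and a pricier route through o1 (cost 11).

```python
import heapq

def a_star(graph, h, start, goal):
    """Textbook A*: graph maps node -> [(neighbor, edge_cost)], h maps node -> estimate."""
    open_heap = [(h[start], 0, start, [start])]
    closed = set()
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in closed:
                heapq.heappush(open_heap, (g + w + h[nbr], g + w, nbr, path + [nbr]))
    return None

# Hypothetical graph: "green" route A->g1->B costs 1+6=7, "orange" route A->o1->B costs 5+6=11.
graph = {'A': [('g1', 1), ('o1', 5)], 'g1': [('B', 6)], 'o1': [('B', 6)]}

admissible = {'A': 0, 'g1': 6, 'o1': 6, 'B': 0}       # never overestimates
overestimating = {'A': 0, 'g1': 20, 'o1': 6, 'B': 0}  # h(g1)=20 > true cost 6

print(a_star(graph, admissible, 'A', 'B'))      # (7, ['A', 'g1', 'B'])  -> optimal
print(a_star(graph, overestimating, 'A', 'B'))  # (11, ['A', 'o1', 'B']) -> suboptimal
```

With the inflated h(g1), the goal is expanded at f = 11 before g1 (stuck at f = 21) ever gets a chance, exactly as in the f(green_1) = 21 calculation above.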
I will illustrate my question through this example.
Consider the digraph above.
For the path from A to C, denoted (A,C), there exist three paths:
A->B->C, A->C, A->B->D->C
and their geometric means are:
Math.pow(0.1*0.4, 1.0/2) = 0.2
Math.pow(0.4, 1) = 0.4
Math.pow(0.1*0.1*0.8, 1.0/3) = 0.2
Obviously, the maximum value is 0.4, which is to say the maximum geometric-mean path is A->C.
What I want to achieve is to get the maximum geometric-mean path for every pair of vertices. My current method is to use DFS to get all paths for every pair of vertices, then compute the geometric mean of every path and take the maximum one.
However, the number of vertices is more than 300 and the graph is very complex, so it takes far too much time to get results.
So I want to know if there exists a more elegant algorithm to solve this problem more quickly. I know the Floyd-Warshall algorithm for multi-source shortest paths, but it seems I can't use it to solve my problem. I will appreciate any advice, links, or anything relevant.
Since the geometric mean is equal to (L_1 L_2 ... L_n)^(1/n), its natural logarithm is equal to 1/n * (log(L_1) + log(L_2) + ... + log(L_n)). Since the log function is strictly monotonic, the path with the maximum geometric mean edge length is identical to the path with the maximum arithmetic mean log(edge length). So, the first simplification is to replace each edge length with its logarithm and reframe your condition as searching for the maximum arithmetic mean edge length. Naturally, any edge of length 0 should be removed, as a path including such an edge can never attain the maximum (unless every edge has length 0). This rephrasing doesn't necessarily help that much, but it removes some artificial (i.e. apparent only at first glance) difficulty.
Next, the fact that you want the maximum mean edge length, rather than maximum total edge length, must be dealt with. Among all paths of n edges, the one with the maximum arithmetic mean edge length is the one with the maximum total length. So, choosing the path with the maximum mean edge length is equivalent to choosing the path maximizing L_n / n, where L_n is the length of the maximum n-edge path. I haven't thought through the details, but it seems to me that it should be possible to compute L_n straightforwardly (i.e. with as much difficulty as finding the path with the maximum total length overall, which is still NP-hard), maybe with dynamic programming.
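As a quick sanity check of the log-transform equivalence, here is an illustrative Python snippet using the three A->C paths from the question (this only verifies the equivalence on fixed candidate paths; it is not an algorithm for the general problem):

```python
import math

def geometric_mean(weights):
    # (L_1 * L_2 * ... * L_n) ** (1/n)
    return math.prod(weights) ** (1 / len(weights))

def mean_log(weights):
    # arithmetic mean of the log edge weights
    return sum(math.log(w) for w in weights) / len(weights)

# The three A->C paths from the question, as lists of edge weights.
paths = {
    'A->B->C':    [0.1, 0.4],
    'A->C':       [0.4],
    'A->B->D->C': [0.1, 0.1, 0.8],
}

best_by_gm  = max(paths, key=lambda p: geometric_mean(paths[p]))
best_by_log = max(paths, key=lambda p: mean_log(paths[p]))
print(best_by_gm, best_by_log)  # both criteria pick 'A->C'
```

Because log is strictly monotonic, both rankings always agree, so the search can work entirely on summed logs.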
I'm searching for an algorithm to find a path between two nodes with minimum cost and maximum length, given a maximum cost, in an undirected weighted complete graph. Weights are non-negative.
As it stands now I'm using DFS, and it's pretty slow (the number of nodes is high, and so is the maximum length). I already discard all the impossible nodes in every iteration of the DFS.
Could someone point me to a known algorithm for better handling of this problem?
To clarify: ideally the algorithm should search for the path of minimum cost, but is allowed to add cost if this means visiting more nodes. It should end when it concludes that it's impossible to reach more than n nodes without crossing the cost limit and it's impossible to reach n nodes with less cost.
Update
Example of a graph. We have to go from A to B. Cost limit is set to 5:
This path (in red) is OK, but the algorithm should continue searching for better solutions.
This is better because, although the cost increases to 4, the path contains 1 more node.
Here the path contains 3 nodes, so it's a lot better than before, and the cost is an acceptable 5.
Finally, this solution is even better: the path also contains 3 nodes but has cost 4, which is less than before.
Hope the images explain it better than text.
Idea 1:
In my opinion your problem is a variation of the Pareto-optimal shortest path search problem, because you refer to 2 different optimality metrics:
Longest Path by edge count
Shortest Path by edge weight
Of course some side constraints just make the problem easier to calculate.
You have to implement a multi-criteria Dijkstra for Pareto-optimal results. I found two promising papers in English on this problem:
A multicriteria Pareto-optimal path algorithm
On a multicriteria shortest path problem
Unfortunately I wasn't able to find the PDF files for those papers, and the papers I read before were in German :(
Nevertheless this should be your entry point and will lead you to an algorithm that solves your problem nicely and smoothly.
Idea 2:
Another way to solve this problem could lie in the calculation of Hamiltonian paths, because the longest path in a complete graph is indeed a Hamiltonian path. After calculating all such paths you still have to find the one with the smallest total edge-weight cost. This scenario is useful if the length of the path is in every case more relevant than the cost.
Idea 3:
If the cost of the edges is the more important factor, you should calculate all paths between those two nodes up to a given maximum length and search for the one with the most edges used.
Conclusion:
I think the best results will be obtained by using idea 1. But I don't know your scenario too well, so the other ideas might be an option too.
This problem can be formulated as a multi-objective constraint satisfaction problem with priorities:
First, the solution must satisfy the constraint on maximum cost.
Next, the solution must have the maximum number of nodes (1st objective).
Finally, the solution must have minimum cost (2nd objective).
This problem is NP-hard, so there is no exact polynomial-time algorithm for it. But a simple local search algorithm may help you:
First, use Dijkstra's algorithm to find the minimum-cost path, called P. If its cost is bigger than the maximum cost, no solution satisfies the constraint.
Next, try to add more nodes to P by using 2 move operators:
Insert: select a node outside P and insert it at the best position in P.
Replace: select a node outside P and replace a node inside P (when the insert operator can't be used).
Finally, try to reduce the cost by using the replace operator.
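The initial-path-plus-insert step can be sketched in Python. Everything here is hypothetical: the cost matrix, the budget of 5, and the greedy acceptance rule are assumptions for illustration, and a real implementation would also need the replace operator and the final cost-reduction pass:

```python
def path_cost(path, cost):
    # total edge cost along a node sequence
    return sum(cost[a][b] for a, b in zip(path, path[1:]))

def insert_moves(path, cost, budget, nodes):
    """Greedy 'insert' operator: repeatedly add the outside node whose best
    insertion position increases the total cost the least, as long as the
    path stays within the budget."""
    path = list(path)
    improved = True
    while improved:
        improved = False
        best = None  # (extra_cost, node, insert_position)
        for v in set(nodes) - set(path):
            for i in range(len(path) - 1):
                a, b = path[i], path[i + 1]
                extra = cost[a][v] + cost[v][b] - cost[a][b]
                if best is None or extra < best[0]:
                    best = (extra, v, i + 1)
        if best and path_cost(path, cost) + best[0] <= budget:
            path.insert(best[2], best[1])
            improved = True
    return path

# Hypothetical 4-node complete graph (symmetric costs), budget 5, going A -> B.
cost = {
    'A': {'A': 0, 'B': 3, 'C': 1, 'D': 2},
    'B': {'A': 3, 'B': 0, 'C': 1, 'D': 2},
    'C': {'A': 1, 'B': 1, 'C': 0, 'D': 1},
    'D': {'A': 2, 'B': 2, 'C': 1, 'D': 0},
}
p = insert_moves(['A', 'B'], cost, budget=5, nodes=cost)
print(p, path_cost(p, cost))  # visits all 4 nodes within the budget
```

Starting from the Dijkstra-style direct path A-B, the operator first inserts C (which actually lowers the cost) and then D, ending with a 4-node path that still respects the budget.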
I have an application that would benefit from using A*; however, for legacy reasons, I need it to continue generating exactly the same paths it did before when there are multiple best-paths to choose from.
For example, consider this maze
...X
FX.S
....
S = start
F = finish
X = wall
. = empty space
with direction-priorities Up; Right; Down; Left. Using breadth-first search, we will find the path DLLLU; however, using A*, we immediately go left and end up finding the path LULLD.
I've tried making sure to always expand in the correct direction when breaking ties, and overwriting the PreviousNode pointers when moving from a more important direction, but neither works in that example. Is there a way to do this?
If the original algorithm was BFS, you are looking for the smallest of the shortest paths where "smallest" is according to the lexicographic order induced by some total order Ord on the edges (and of course "shortest" is according to path length).
The idea of tweaking weights suggested by amit is a natural one, but I don't think it is very practical because the weights would need to have a number of bits comparable to the length of a path to avoid discarding information, which would make the algorithm orders of magnitude slower.
Thankfully this can still be done with two simple and inexpensive modifications to A*:
Once we reach the goal, instead of returning an arbitrary shortest path to the goal, we should continue visiting nodes until the path length increases, so that we visit all nodes that belong to a shortest path.
When reconstructing the path, we build the set of nodes that contribute to the shortest paths. This set has a DAG structure when considering all shortest path edges, and it is now easy to find the lexicography smallest path from start to goal in this DAG, which is the desired solution.
Schematically, classic A* is:
path_length = infinity for every node
path_length[start] = 0
while score(goal) > minimal score of unvisited nodes:
    x := any unvisited node with minimal score
    mark x as visited
    for y in unvisited neighbors of x:
        path_length_through_x = path_length[x] + d(x,y)
        if path_length[y] > path_length_through_x:
            path_length[y] = path_length_through_x
            ancestor[y] = x
return [..., ancestor[ancestor[goal]], ancestor[goal], goal]
where score(x) stands for path_length[x] + heuristic(x, goal).
We simply turn the strict while loop inequality into a non-strict one and add a path reconstruction phase:
path_length = infinity for every node
path_length[start] = 0
while score(goal) >= minimal score of unvisited nodes:
    x := any unvisited node with minimal score
    mark x as visited
    for y in unvisited neighbors of x:
        path_length_through_x = path_length[x] + d(x,y)
        if path_length[y] > path_length_through_x:
            path_length[y] = path_length_through_x

optimal_nodes = [goal]
for every x in optimal_nodes:  // note: we dynamically add nodes in the loop
    for y in neighbors of x not in optimal_nodes:
        if path_length[x] == path_length[y] + d(x,y):
            add y to optimal_nodes

path = [start]
x = start
while x != goal:
    z = undefined
    for y in neighbors of x that are in optimal_nodes:
        if path_length[y] == path_length[x] + d(x,y):
            z = y if z is undefined or (x,y) is smaller than (x,z) according to Ord
    x = z
    append x to path
return path
Warning: to quote Knuth, I have only proven it correct, not tried it.
As for the performance impact, it should be minimal: the search loop only visits nodes with a score that is 1 unit higher than classic A*, and the reconstruction phase is quasi-linear in the number of nodes that belong to a shortest path. The impact is smaller if, as you imply, there is only one shortest path in most cases. You can even optimize for this special case e.g. by remembering an ancestor node as in the classic case, which you set to a special error value when there is more than one ancestor (that is, when path_length[y] == path_length_through_x). Once the search loop is over, you attempt to retrieve a path through ancestor as in classic A*; you only need to execute the full path reconstruction if an error value was encountered when building the path.
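As a concrete sketch of the scheme above, here is a Python version specialized to a unit-cost grid with h = 0 (so the search phase is plain Dijkstra), run on the maze from the question. The function name and the greedy forward walk (take the highest-priority direction that stays on the shortest-path DAG) are our additions, not part of the original answer:

```python
import heapq

# Directions in priority order U, R, D, L -- the total order Ord on edges.
DIRS = [('U', -1, 0), ('R', 0, 1), ('D', 1, 0), ('L', 0, -1)]

def lexi_smallest_shortest_path(grid, start, goal):
    # Phase 1: search with the non-strict stopping rule (h = 0 here), so
    # every node on some shortest path gets its final distance recorded.
    dist, heap, seen = {start: 0}, [(0, start)], set()
    while heap:
        d, x = heapq.heappop(heap)
        if x in seen:
            continue
        if goal in dist and d > dist[goal]:
            break
        seen.add(x)
        for _, dr, dc in DIRS:
            y = (x[0] + dr, x[1] + dc)
            if y in grid and d + 1 < dist.get(y, float('inf')):
                dist[y] = d + 1
                heapq.heappush(heap, (d + 1, y))
    # Phase 2: backward sweep collecting the nodes of the shortest-path DAG.
    optimal, stack = {goal}, [goal]
    while stack:
        x = stack.pop()
        for _, dr, dc in DIRS:
            y = (x[0] + dr, x[1] + dc)
            if y in dist and y not in optimal and dist[y] == dist[x] - 1:
                optimal.add(y)
                stack.append(y)
    # Phase 3: forward walk, taking the first direction in Ord that stays
    # on the DAG at each step.
    moves, x = [], start
    while x != goal:
        for name, dr, dc in DIRS:
            y = (x[0] + dr, x[1] + dc)
            if y in optimal and dist.get(y) == dist[x] + 1:
                moves.append(name)
                x = y
                break
    return ''.join(moves)

# The maze from the question: S = (1,3), F = (1,0), X = wall.
rows = ["...X",
        "FX.S",
        "...."]
grid = {(r, c) for r, row in enumerate(rows)
               for c, ch in enumerate(row) if ch != 'X'}
print(lexi_smallest_shortest_path(grid, (1, 3), (1, 0)))  # DLLLU
```

On this maze the reconstruction returns DLLLU, the BFS-preferred path, rather than the LULLD that vanilla A*/Dijkstra tie-breaking can produce.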
I would build the preference on the path order directly into the heuristic function.
Let's look at the breadth-first algorithm first.
Define a function for every path that the breadth-first algorithm chooses:
Consider that we are running a depth-first algorithm which is at the n-th depth.
The decisions previously made by the algorithm: x_i \in {U,R,D,L}.
Assign U=0, R=1, D=2, L=3.
Then define:
g(x_1,..,x_n) = sum_{i=1}^n x_i * (1/4)^i
Let's fix this step's g value as g'.
At every step when the algorithm visits a deeper node than this one, the g() value will be greater.
At every future step when one of the x_i, i in {1..n}, is changed, it will be greater; hence the g function always increases while running depth-first.
Note: if the depth-first algorithm is successful, it selects the path with the minimal g() value.
Note: g() < 1 because max(L,R,U,D) = 3.
Adding g to A*'s heuristic function won't interfere with the shortest path length, because the minimum edge length is >= 1 while g < 1.
The first solution an A* modified like this would find would be the one that the depth-first search would find.
For your example:
h_bread = g(DLLLU) = (23330)_4 * c
h_astar = g(LULLD) = (30332)_4 * c
(.)_4 is base 4
c is a constant (4^{-5})
For your example: h_bread < h_astar
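These base-4 values are easy to verify numerically. In this small Python check, the constant c = 4^{-5} is folded in by the (1/4)^i weights of g:

```python
DIGIT = {'U': 0, 'R': 1, 'D': 2, 'L': 3}

def g(moves):
    # g(x_1,..,x_n) = sum_{i=1}^n x_i * (1/4)^i, always < 1
    return sum(DIGIT[m] * 0.25 ** i for i, m in enumerate(moves, start=1))

print(g('DLLLU'))  # 0.74609375  == (23330)_4 * 4**-5 == 764/1024
print(g('LULLD'))  # 0.810546875 == (30332)_4 * 4**-5 == 830/1024
```

So g(DLLLU) < g(LULLD): the BFS-preferred path gets the smaller tie-breaking value, as claimed.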
I've come up with two ways of doing this. Both require continuing the algorithm while the top of the queue has distance-to-start g-value <= g(end-node). Since the heuristic used in A* is admissible, this guarantees that every node that belongs to some best-path will eventually be expanded.
The first method is, when we come to a conflict (i.e. we find two nodes with the same f-value that could potentially both be the parent of some node along the best-path), we resolve it by backtracking to the first point along the path where they meet (we can do this easily in O(path-length)). We then simply check the direction priorities of both paths, and go with whichever path would have the higher priority in a BFS search.
The second method only works for grids where each node touches the horizontally- and vertically- (and possibly diagonally-) adjacent nodes (i.e. 4-connected grid-graphs). We do the same thing as before, but instead of backtracking to resolve a conflict, we compare the nodes along the paths from the start, and find the first place they differ. The first place they differ will be the same critical node as before, from which we can check direction-priorities.
We do this by storing the best path so far for each node. Normally this would be cumbersome, but since we have a 4-connected graph, we can do it pretty efficiently by storing each direction taken along the path. This takes only 2 bits per node. Thus, we can essentially encode the path using integers: with 32-bit registers we can compare 16 nodes at a time; 32 nodes with 64-bit registers; and 64(!) nodes at a time with 128-bit registers (like the SSE registers in x86 and x64 processors), making this search very inexpensive even for paths with hundreds of nodes.
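A sketch of the 2-bit encoding idea in Python (register widths aside, arbitrary-precision ints already give the single-XOR comparison; the function names here are made up for illustration):

```python
DIR_BITS = {'U': 0, 'R': 1, 'D': 2, 'L': 3}

def encode(moves):
    """Pack a sequence of grid moves into one int, 2 bits per move, so whole
    path prefixes can be compared with a single integer operation."""
    code = 0
    for m in moves:
        code = (code << 2) | DIR_BITS[m]
    return code

def first_difference(code_a, code_b, n_moves):
    """Index of the first move where two encoded n-move paths diverge,
    found with one XOR plus a bit-length check (None if identical)."""
    diff = code_a ^ code_b
    if diff == 0:
        return None
    # The highest set bit of the XOR marks the earliest differing move.
    return n_moves - 1 - (diff.bit_length() - 1) // 2

p1, p2 = 'DLLLU', 'LULLD'
print(encode(p1), encode(p2))                       # 764 830
print(first_difference(encode(p1), encode(p2), 5))  # 0 -- they diverge at the first move
```

Finding the first differing move then pinpoints the critical node at which to compare direction priorities, without walking the two paths node by node.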
I implemented both of these, along with #generic human's algorithm, to test the speed. On a 50x50 grid with 400 towers,
#generic human's algorithm ran about 120% slower than normal A*
my backtracking algorithm ran about 55% slower than normal A*
The integer-encoding algorithm ran less than 10% slower than normal A*
Thus, since my application uses 4-connected graphs, it seems the integer-encoding algorithm is best for me.
I've copied an email I wrote to a professor here. It includes more detailed descriptions of the algorithms, along with sketches of proofs that they work.
In general, there is no non-trivial way to do this:
Breadth-first search finds the shortest path of lowest order, determined by the order in which vertices are considered. And this order must take precedence over any other factor when breaking ties between paths of equal length.
Example: If the nodes are considered in the order A, B, C, then Node A < Node C. Thus if there is a tie between a shortest path beginning with A and one beginning with C, the one with A will be found.
On the other hand, A* search will find the shortest path of lowest order determined by the heuristic value of the node. Thus the heuristic must take into account the lowest lexicographic path to each node. And the only way to find that is BFS.
FIRST,
The ideal path was (in order of importance):
1. shortest
My heuristic (f) was:
manhattan distance (h) + path length (g)
This was buggy because it favored paths which veered towards the target then snaked back.
SECOND,
The ideal path was:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
My heuristic stayed the same. I checked for the second criteria at the end, after reaching the target. The heuristic was made slightly inefficient (to fix the veering problem) which also resulted in the necessary adjacent coordinates always being searched.
THIRD,
The ideal path:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
3. takes the least number of turns
Now I tried making the heuristic (f):
manhattan distance (h) + path length (g) * number of turns (t)
This of course works for criteria #1 and #3, and fixes the veering problem inherently. Unfortunately it's now so efficient that testing for criteria #2 at the end is not working because the set of nodes explored is not large enough to reconstruct the optimal solution.
Can anyone advise me how to fit criteria #2 into my heuristic (f), or how else to tackle this problem?
CRITERIA 2 example: If the goal is (4,6) and the paths to (3,6) and (4,5) are of identical length, then the ideal solution should go through (3,6), because it approaches from the Y plane, instead of (4,5), which comes from the X plane. However, if the lengths are not identical, then the shorter path must be favored regardless of what plane it approaches in.
You seem to be confusing the A* heuristic, what Russell & Norvig call h, with the partial path cost g. Together, these constitute the priority f = g + h.
The heuristic should be an optimistic estimate of how much it costs to reach the goal from the current point. Manhattan distance is appropriate for h if steps go up, down, left and right and take at least unit cost.
Your criterion 2, however, should go in the path cost g, not in h. I'm not sure what exactly you mean by "approaches the destination from the same y coordinate", but you can forbid/penalize entry into the goal node by giving all other approaches an infinite or very high path cost. There's strictly no need to modify the heuristic h.
The number of turns taken so far should also go in the partial path cost g. You may want to include in h an (optimistic) estimate of how many turns there are left to take, if you can compute such a figure cheaply.
Answering my own question with somewhat of a HACK. Still interested in other answers, ideas, comments, if you know of a better way to solve this.
Hacked manhattan distance is calculated towards the nearest square in the Y plane, instead of the destination itself:
dy = min(absolute_value(dy), absolute_value(dy-1));
Then when constructing the heuristic (f):
h = hacked_manhattan_distance();
if (h < 2)
    // we are close to the goal:
    // switch back to the real distance
    h = real_manhattan_distance();
This seems true but I can't find anyone on the internet saying it is, so I'd like to make sure. Please tell me if you agree and if so, why. Ideally a link to a paper, or, if you disagree, a counterexample.
Let G be a directed graph with some negative edges. We want to run A* on G.
First off, if G has negative cycles reachable from the source and reaching the goal, there is no admissible heuristic, as it's impossible to underestimate the cost of reaching the goal because it's -∞.
If there are no such cycles however, there may be some admissible heuristic. In particular, the sum of all negative edges will always underestimate the cost of reaching the goal.
I'm under the impression that in this case, A* may work just fine.
P.S. I could run Bellman-Ford on the graph, detect negative cycles, and if there are none, reweigh to get rid of negative edges. But, if I know there are no negative cycles, I can just skip that and run A*.
This is trivially wrong. The cost of a vertex is the sum of the heuristic and the path built so far... while the heuristic underestimates the cost to reach the goal, the sum of the heuristic and the path taken so far may not. Maxima culpa.
It seems that sorting the open set with a function that underestimates the cost to reach the goal, while going through a given vertex may work though... if one uses <sum of negative edges in the graph> as such a function, it looks like it degenerates into a graph traversal.
Consider this example with 3 nodes and 3 weighted edges:
1 2 -10
1 3 -3
3 2 -8
From 1 to 2 there is a direct edge with weight -10. So you reach this first and establish it as the minimal path to 2. However, there is the path 1-3-2, whose weight (-3 + -8 = -11) is smaller.
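A short Python sketch makes the failure concrete: a textbook Dijkstra that finalizes a node the first time it is popped (an invariant that only holds for non-negative weights) commits to 1->2 = -10 before it ever relaxes the cheaper 1->3->2 = -11. The function name is ours:

```python
import heapq

def dijkstra_final(edges, src):
    """Dijkstra with the usual invariant: a node's distance is declared
    final the first time it is popped.  Valid only for non-negative weights."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    final, heap = {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in final:
            continue
        final[u] = d  # the invariant that negative edges break
        for v, w in adj.get(u, []):
            if v not in final:
                heapq.heappush(heap, (d + w, v))
    return final

# The example above: the direct edge 1->2 = -10 is popped and finalized
# before the cheaper detour 1->3->2 = -11 is discovered.
edges = [(1, 2, -10), (1, 3, -3), (3, 2, -8)]
print(dijkstra_final(edges, 1)[2])  # -10, but the true shortest path costs -11
```

This is exactly why the greedy finalization step of Dijkstra-style search cannot be trusted with negative edges, even when there are no negative cycles.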
In the example given by top rated answer:
(2,-10) goes into priority queue. Agreed.
So does (3,x), where x <= -11, as the heuristic is admissible.
Now (3,x) gets popped, as x < -10, and we get to the correct solution.
I can't put this as a comment as I don't have enough reputation.
This is trivially wrong. The cost of a vertex is the sum of the heuristic and the path built so far... while the heuristic underestimates the cost to reach the goal, the sum of the heuristic and the path taken so far may not.
A* will never expand a node whose f-value (the sum of the heuristic and the path taken so far) is greater than the optimal path length. This is because, along the optimal path, there is always at least one node with f-value less than or equal to the optimal cost.
Thus, even with negative-weight edges, A* will find the optimal path if such a path exists, as long as there are only finitely many nodes with f-value less than the optimal cost.