Studying some variants of the A* algorithm

I recently started learning about the A* algorithm and its variants and came across this paper [1]. It compares three variants of the algorithm, with the heuristic changed in each one.
For A*(1) it has f(i) = g(i) + h(i), where g(i) denotes the path-cost function from the start point to the current position i, and the heuristic function h(i) is the Euclidean distance from the current point to the target point.
And for A*(2) it has f(i) = g(i) + h(i) + h(j), where j is the parent node of the current point and h(j) is the Euclidean distance from that parent node to the target point.
The results show that A*(2) is generally faster than A*(1) when tried on randomly generated mazes. I am not able to explain why this is the case. I tried to compare the two heuristics and concluded the contrary.
My logic says that if we travel from a point that is farther from the target to a nearer point, the f(i) value would be higher than when we travel from a point closer to the target to one that is farther away, because we are adding in the Euclidean distance of the parent node. Basically, to reach a specific node, the path that is leading away from the target will have a lower f(i).
And since the f(i) value is lower, it would move up the priority queue. This works against our goal: a path that is moving away from the target is prioritized over the path that is getting closer.
What is wrong with this logic and why does it not align with the results cited in the paper?
[1] - https://www.researchgate.net/publication/238009053_A_comparative_study_of_A-star_algorithms_for_search_and_rescue_in_perfect_maze

In a perfect maze like they use in the paper, A* has little advantage over depth-first search, breadth-first search and Dijkstra. They all perform more or less the same.
The power of A* is that the heuristic can encode a 'sense of direction' into the algorithm. If your target node is north of you, then it makes sense to start searching for a path northwards. But in a perfect maze, this sense of direction is useless. The northward path may be a dead-end and you'd be forced to backtrack. A* is much better suited to wide open grids with sparse obstacles.
Setting it to h(i) + h(j) more or less doubles the weight of the heuristic. I think you'll see the same performance improvement if you use something like f(i) = g(i) + h(i) * 1.5 or f(i) = g(i) + h(i) * 2. This makes the algorithm greedier: more likely to examine nodes closer to the target. The downside is that you are no longer guaranteed to find the shortest path; you'll find /any/ path. But in a perfect maze there is only one path to find, so this is not a real problem in this scenario.
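That greediness can be sketched with a weight parameter on the heuristic (a minimal illustration, not the paper's code; the grid encoding 0 = free, 1 = blocked and the function names are assumptions):

```python
import heapq
import math
from itertools import count

def a_star(grid, start, goal, weight=1.0):
    """A* on a 4-connected grid of 0 (free) / 1 (blocked) cells.
    weight > 1 inflates the heuristic (weighted A*), making the search
    greedier: it tends to expand fewer nodes but may return a longer path."""
    rows, cols = len(grid), len(grid[0])

    def h(p):  # Euclidean distance to the goal
        return math.hypot(p[0] - goal[0], p[1] - goal[1])

    tick = count()                       # heap tie-breaker
    g_cost = {start: 0}
    parent = {start: None}
    heap = [(weight * h(start), next(tick), start)]
    closed = set()
    while heap:
        _, _, node = heapq.heappop(heap)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:                 # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 0:
                ng = g_cost[node] + 1
                if ng < g_cost.get(nxt, float('inf')):
                    g_cost[nxt] = ng
                    parent[nxt] = node
                    heapq.heappush(heap, (ng + weight * h(nxt), next(tick), nxt))
    return None
```

With weight = 1 the Euclidean heuristic is admissible and the path is shortest; with weight = 2 the search behaves like the "doubled heuristic" discussed above and only any path is guaranteed.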
I wrote an online widget that allows you to experiment with a few path finding algorithms. Use it to draw a maze, and see the effect of the "greedy" option.

Related

Dijkstra Algorithm with Chebyshev Distance

I have been using Dijkstra's algorithm to find the shortest path in the graph API given by the Princeton University Algorithms, Part 2 course, and I have figured out how to find the path with Chebyshev distance.
With Chebyshev distance a move to any neighbouring node costs only 1, so the zigzag has no impact on the total cost, but according to the graph (the red circle), why does the pathfinding line move in a zigzag instead of going straight?
Will the same thing happen if I use the A* algorithm?
If you want to prioritize "straight lines" you should take the direction of the previous step into account. One possible way is to create a graph G'(V', E') where V' consists of all neighbouring pairs of vertices. For example, v = (v_prev, v_cur) would define a vertex in the path where v_cur is the last vertex of the path and v_prev is the previous one. Then in the "updating distances" step of the shortest-path algorithm you could choose the best distance with the best (non-changing) direction.
We can also add to the distance an extra component equal to the number of direction changes, and then find the minimal-distance path with the minimal number of direction changes.
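A sketch of that vertex-pair graph G' (illustrative names and a hypothetical penalty value; the point is that the search state is (previous cell, current cell), so a change of direction can be charged):

```python
import heapq
from itertools import count

def direction(a, b):
    """Unit step taken to go from cell a to cell b."""
    return (b[0] - a[0], b[1] - a[1])

def dijkstra_straightish(grid, start, goal, turn_penalty=0.1):
    """Dijkstra over states (prev, cur): the vertex-pair graph G'.
    Each move costs 1, plus turn_penalty when the direction changes,
    so among equal-length paths the one with fewer turns wins."""
    rows, cols = len(grid), len(grid[0])
    tick = count()                            # heap tie-breaker
    best = {(None, start): 0.0}
    heap = [(0.0, next(tick), None, start)]
    while heap:
        d, _, prev, cur = heapq.heappop(heap)
        if d > best.get((prev, cur), float('inf')):
            continue                          # stale heap entry
        if cur == goal:
            return d
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < rows and 0 <= ny < cols) or grid[nx][ny]:
                continue
            cost = 1.0
            if prev is not None and direction(prev, cur) != direction(cur, nxt):
                cost += turn_penalty          # charge the direction change
            nd = d + cost
            if nd < best.get((cur, nxt), float('inf')):
                best[(cur, nxt)] = nd
                heapq.heappush(heap, (nd, next(tick), cur, nxt))
    return float('inf')
```

Note the penalty must be small relative to the step cost if path length is still the primary criterion.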
It shouldn't be straight in particular according to Dijkstra or A*; as you say, the zigzag has no impact on the total cost. I'll assume, by the way, that you want to prevent useless zigzagging in particular, and have no general preference for a move that goes in the same direction as the previous move.
Dijkstra and A* do not have a built-in dislike for "weird paths", they only explicitly care about the cost, implicitly that means they also care about how you handle equal costs. There are a couple of things you can do about that:
Use tie-breaking to make them prefer straight moves whenever two nodes have equal cost (G or F, depending on whether you're doing Dijkstra or A*). This gives some trouble around obstacles because two choices that eventually lead to equal-length paths do not necessarily have the same F score, so they might not get tie-broken. It'll never give you a sub-optimal path though.
Slightly increase your diagonal cost, it doesn't have to be a whole lot, say 10 for straight and 11 for diagonal. This will just avoid any diagonal move that isn't a shortcut. But obviously: if that doesn't match the actual cost, you can now find sub-optimal paths. The bigger the cost difference, the more that will happen. In practice it's relatively rare, and paths have to be long enough (accumulating enough cost-difference that it becomes worth an entire extra move) before it happens.
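The second option is just a cost-function tweak; a sketch using the 10/11 figures from above:

```python
def move_cost(dx, dy):
    """Cost of one grid step given its direction components.
    Orthogonal steps cost 10; diagonal steps cost 11, slightly more,
    so a diagonal is only taken when it actually shortens the path,
    never for a cost-neutral zigzag."""
    return 11 if dx != 0 and dy != 0 else 10
```

The same caveat as above applies: if 10/11 does not match the true geometry, sufficiently long paths can come out sub-optimal.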

Is A-star guaranteed to give the shortest path in a 2D grid

I am working with the A-star algorithm, wherein I have a 2D grid and some obstacles. At the moment I have only vertical and horizontal obstacles, but they can be densely placed.
Now, A-star works well (i.e. the shortest path is found in most cases), but when I try to go from the top-left corner to the bottom-right, I sometimes see that the path is not the shortest: there is some clumsiness in it.
The path seems to deviate from what the shortest path should be.
Here is what my algorithm does: I start from the source and move outward, calculating for each neighbour the distance from the source plus the estimated distance to the destination. I keep choosing the minimum cell and repeat until the cell I encounter is the destination, at which point I stop.
My question is: why is A-star not guaranteed to give me the shortest path? Or is it, and I am doing something wrong?
Thanks.
A-star is guaranteed to provide the shortest path according to your metric function (not necessarily 'as the bird flies'), provided that your heuristic is "admissible", meaning that it never over-estimates the remaining distance.
Check this link: http://theory.stanford.edu/~amitp/GameProgramming/Heuristics.html
In order to assist in determining your implementation error, we will need details on both your metric, and your heuristic.
Update:
OP's metric function is 10 for an orthogonal move, and 14 for a diagonal move.
OP's heuristic only considers orthogonal moves, and so is "inadmissible"; it overestimates by ignoring the cheaper diagonal moves.
The only cost of an overly conservative heuristic is that additional nodes are visited before the minimum path is found; the cost of an overly aggressive heuristic is that a non-optimal path may be returned. OP should use a heuristic of:
7 * (deltaX + deltaY)
which is a very slight underestimate on the possibility of a direct diagonal path, and so should also be performant.
Update #2:
To really squeeze out performance, this is close to an optimum while still being very fast:
7 * min(deltaX, deltaY) + 10 * (max(deltaX, deltaY) - min(deltaX, deltaY))
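Written out as a pair of helpers (Python sketch under the 10/14 metric from the question; the second function is only for checking admissibility on an obstacle-free grid):

```python
def octile_underestimate(x1, y1, x2, y2):
    """Conservative heuristic for a grid where an orthogonal move costs 10
    and a diagonal move costs 14: charge 7 (half the diagonal cost) per
    paired step and 10 per leftover orthogonal step. Since 7 <= 14 and
    10 <= 10 per step, it never exceeds the true cost, so it is admissible."""
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    return 7 * min(dx, dy) + 10 * (max(dx, dy) - min(dx, dy))

def true_open_grid_cost(x1, y1, x2, y2):
    """Exact optimal cost on an obstacle-free grid: move diagonally while
    both axes remain (14 each), then straight (10 each)."""
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    return 14 * min(dx, dy) + 10 * (max(dx, dy) - min(dx, dy))
```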
Update #3:
The 7 above is derived from 14/2, where 14 is the diagonal cost in the metric.
Only your heuristic changes; the metric is "a business rule" and drives all the rest. If you are interested on A-star for a hexagonal grid, check out my project here: http://hexgridutilities.codeplex.com/
Update #4 (on performance):
My impression of A-star is that it staggers between regions of O(N^2) performance and areas of almost O(N) performance. But this is so dependent on the grid or graph, the obstacle placement, and the start and end points, that it is hard to generalize. For grids and graphs of known particular shapes or flavours there are a variety of more efficient algorithms, but they often get more complicated as well; TANSTAAFL.
I'm sure you are doing something wrong (maybe some implementation flaw; your idea of A* sounds correct). A* is guaranteed to give the shortest path; this can be proved mathematically.
See the Wikipedia page; it will give you all the information you need to solve your problem.
No.
A* is one of the fastest pathfinding algorithms, but it doesn't necessarily give the shortest path. If you are looking for correctness over speed, then it's best to use Dijkstra's algorithm.

Modified a-star pathfinding heuristic design

FIRST,
The ideal path was (in order of importance):
1. shortest
My heuristic (f) was:
manhattan distance (h) + path length (g)
This was buggy because it favored paths that veered toward the target and then snaked back.
SECOND,
The ideal path was:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
My heuristic stayed the same. I checked for the second criterion at the end, after reaching the target. The heuristic was made slightly less efficient (to fix the veering problem), which also meant the necessary adjacent coordinates were always searched.
THIRD,
The ideal path:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
3. takes the least number of turns
Now I tried making the heuristic (f):
manhattan distance (h) + path length (g) * number of turns (t)
This of course works for criteria #1 and #3, and inherently fixes the veering problem. Unfortunately it's now so efficient that testing for criterion #2 at the end does not work, because the set of explored nodes is no longer large enough to reconstruct the optimal solution.
Can anyone advise me how to fit criterion #2 into my heuristic (f), or how else to tackle this problem?
CRITERIA 2 example: If the goal is (4,6) and the paths to (3,6) and (4,5) are of identical length, then the ideal solution should go through (3,6), because it approaches from the Y plane, instead of (4,5), which comes from the X plane. However, if the lengths are not identical, then the shortest path must be favored regardless of which plane it approaches from.
You seem to be confusing the A* heuristic, what Russell & Norvig call h, with the partial path cost g. Together, these constitute the priority f = g + h.
The heuristic should be an optimistic estimate of how much it costs to reach the goal from the current point. Manhattan distance is appropriate for h if steps go up, down, left and right and take at least unit cost.
Your criterion 2, however, should go in the path cost g, not in h. I'm not sure what exactly you mean by "approaches the destination from the same y coordinate", but you can forbid/penalize entry into the goal node by giving all other approaches an infinite or very high path cost. There's strictly no need to modify the heuristic h.
The number of turns taken so far should also go in the partial path cost g. You may want to include in h an (optimistic) estimate of how many turns there are left to take, if you can compute such a figure cheaply.
Answering my own question with somewhat of a HACK. Still interested in other answers, ideas, comments, if you know of a better way to solve this.
Hacked manhattan distance is calculated towards the nearest square in the Y plane, instead of the destination itself:
dy = min(absolute_value(dy), absolute_value(dy-1));
Then when constructing heuristic (f):
h = hacked_manhattan_distance();
if (h < 2)
    // we are close to the goal;
    // switch back to the real manhattan distance
    h = real_manhattan_distance();

A* Search Modification

The Wikipedia listing for A* search states:
In other words, the closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower f value than at any previous iteration.
However, in doing so, I have found that I receive erroneous results in an otherwise functional A* search implementation. Can someone shed some light on how one would make this modification?
Make sure your heuristic meets the following:
h(x) <= d(x,y) + h(y)
which means that your heuristic function is consistent (monotone); in particular, it does not overestimate the cost of getting from your current location to the destination or goal.
For example, suppose you are on a grid and you are trying to get from A to B, both points on this grid. A good heuristic function is the Euclidean distance between the current location and the goal:
h(x) = sqrt[ (crtX -goalX)^2 + (crtY -goalY)^2 ]
This heuristic does not overestimate because of the triangle inequality.
More on triangle inequality: http://en.wikipedia.org/wiki/Triangle_inequality
More on Euclidean distance: http://mathworld.wolfram.com/Distance.html

Algorithm to find two points furthest away from each other

I'm looking for an algorithm to be used in a racing game I'm making. The map/level/track is randomly generated, so I need to find two locations, start and goal, that make use of most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions; up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the "bird path", for lack of a better term: the path between A and B should be longer if there is a wall (or other blocking area) between them.
I'm unsure where to start. Comments are very welcome, and proposed solutions are preferred in pseudocode.
Edit: Right. After looking through gs's code I gave it another shot. Instead of Python, this time I wrote it in C++. But still, even after reading up on Dijkstra's algorithm, flood fill and Hosam Aly's solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. Full source is on pastie. The only interesting lines (I guess) are the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate it if someone would be kind enough to point out the differences between the algorithms.
In Hosam Aly's algorithm, is the only difference that he scans from the borders instead of every node?
In Dijkstra's you keep track of and overwrite the distance walked, but not in flood fill; but that's about it?
Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 };
for each point p on the border
flood-fill all points in the map to find the most distant point
if newDistance > bestSolution.distance
bestSolution = { p, distantP, newDistance }
end if
end loop
I guess this would be in O(n^2). If I am not mistaken, it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 represents the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood fill would access a point a maximum of 4 times (from all directions). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n^2). (If the map is square, the order would be O(16 n^1.5).)
Update: as per the comments, since the map is more of a maze (than one with simple obstacles as I was thinking initially), you could apply the same logic as above, but checking all points in the map (as opposed to points on the border only). This should be in the order of O(4n^2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, thus the coefficient in the formulas above.
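The loop above can be sketched in Python (a breadth-first flood fill; the grid encoding 0 = free, 1 = blocked and the helper names are assumptions):

```python
from collections import deque

def farthest_from(grid, start):
    """BFS flood fill from start; returns (cell, distance) of the point
    farthest from start, measuring path length through free cells only."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    q = deque([start])
    far, far_d = start, 0
    while q:
        x, y = q.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < rows and 0 <= ny < cols
                    and grid[nx][ny] == 0 and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                if dist[nxt] > far_d:
                    far, far_d = nxt, dist[nxt]
                q.append(nxt)
    return far, far_d

def best_pair_from_border(grid):
    """Flood-fill from every free border cell, keeping the pair of points
    with the greatest path distance (the loop from the answer above)."""
    rows, cols = len(grid), len(grid[0])
    best = (None, None, 0)
    for x in range(rows):
        for y in range(cols):
            on_border = x in (0, rows - 1) or y in (0, cols - 1)
            if on_border and grid[x][y] == 0:
                far, d = farthest_from(grid, (x, y))
                if d > best[2]:
                    best = ((x, y), far, d)
    return best
```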
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to #Georg for his review.
P.S. Any comments or corrections are welcome.
Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was slower by a great margin; probably this is because of the very limited number of edges, which prevents this algorithm from playing to its strengths.
These were the times; each time the field size was quadrupled, and 3 out of 10 fields were obstacles.
Size        Hosam Aly    Floyd-Warshall
(10x10)     0m0.002s     0m0.007s
(20x20)     0m0.009s     0m0.307s
(40x40)     0m0.166s     0m22.052s
(80x80)     0m2.753s     -
(160x160)   0m48.028s    -
The time of Hosam Aly's algorithm seems to be quadratic, therefore I'd recommend using that algorithm.
Also, the memory consumption of Floyd-Warshall is O(n^2), clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.
It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.
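A sketch of that double sweep (the BFS helper mirrors the flood fill elsewhere in the thread; 0 = free, 1 = blocked is an assumption):

```python
from collections import deque

def bfs_farthest(grid, start):
    """Flood fill from start; return (farthest cell, its path distance)."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    q = deque([start])
    far, far_d = start, 0
    while q:
        x, y = q.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < rows and 0 <= ny < cols
                    and grid[nx][ny] == 0 and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                if dist[nxt] > far_d:
                    far, far_d = nxt, dist[nxt]
                q.append(nxt)
    return far, far_d

def approx_diameter_endpoints(grid, seed):
    """Double sweep: farthest point from an arbitrary seed, then farthest
    point from that. Exact on trees (so on perfect mazes), and usually a
    near-maximal pair on general maps."""
    a, _ = bfs_farthest(grid, seed)
    b, d = bfs_farthest(grid, a)
    return a, b, d
```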
I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(E log V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth-first search, which is O(V + E); on this grid each vertex has at most 4 edges, so that is O(V). It can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs [pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the nxn (0-1) adjacency matrix of an unweighted, undirected graph, G
All-Pairs-Distances(A)
Z = A * A
Let B be the nxn matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
if b_ij = 1 for all i != j return 2B - A //base case
T = All-Pairs-Distances(B)
X = T * A
Let D be the nxn matrix s.t. d_ij = 2t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2t_ij - 1
return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)
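For completeness, a direct (pure-Python, unoptimized) transcription of that pseudocode; a serious implementation would plug in a fast matrix-multiplication routine, which is where the algorithm's speed comes from:

```python
def mat_mul(A, B):
    """Naive n x n integer matrix product (stand-in for fast matmul)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_distances(A):
    """Seidel's algorithm on the 0-1 adjacency matrix of a connected,
    unweighted, undirected graph; returns the distance matrix D."""
    n = len(A)
    Z = mat_mul(A, A)
    # B: pairs at distance 1 or 2 (the "squared" graph)
    B = [[1 if i != j and (A[i][j] == 1 or Z[i][j] > 0) else 0
          for j in range(n)] for i in range(n)]
    if all(B[i][j] == 1 for i in range(n) for j in range(n) if i != j):
        # base case: everything is within distance 2, so D = 2B - A
        return [[2 * B[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    T = all_pairs_distances(B)       # distances in the squared graph
    X = mat_mul(T, A)
    deg = [sum(row) for row in A]
    # d_ij = 2*t_ij if x_ij >= t_ij * deg(j), else 2*t_ij - 1
    return [[2 * T[i][j] - (1 if X[i][j] < T[i][j] * deg[j] else 0)
             for j in range(n)] for i in range(n)]
```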
Finished a Python mockup of the Dijkstra solution to the problem.
Code got a bit long, so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
At the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
It doesn't seem to work though; it always reports the top-left and bottom-right corners as the longest path: 58 tiles. Which of course is true when you don't have obstacles. But even after adding a couple of randomly placed ones, the program still finds that pair the farthest. Maybe it's still true; it's hard to test without more advanced shapes.
But maybe it can at least show my ambition.
Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth-first search, you simply explore all edges once, in a way that guarantees that you find the shortest path to each node: i.e. by exploring the edges in the order you find them.
So basically the difference is that Dijkstra's has to 'backtrack' and look at edges it has explored before to make sure it is following the shortest route, while the breadth-first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral, but with the outer end going back to the middle, you could have two points one at the heart of the spiral and the other in the end of the spiral, both in the middle!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).
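That per-node sweep might look like this (illustrative sketch; 0 = free, 1 = blocked, one O(V) BFS per free cell for O(V^2) overall):

```python
from collections import deque

def bfs_distances(grid, start):
    """Breadth-first search from start; each grid cell has at most 4
    edges, so this is O(V). Returns {cell: path distance}."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    q = deque([start])
    while q:
        x, y = q.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < rows and 0 <= ny < cols
                    and grid[nx][ny] == 0 and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                q.append(nxt)
    return dist

def most_distant_pair(grid):
    """Exact longest shortest path: BFS from every free cell and keep
    the best (start, end, distance) triple seen."""
    best = (None, None, 0)
    for x in range(len(grid)):
        for y in range(len(grid[0])):
            if grid[x][y] == 0:
                for cell, d in bfs_distances(grid, (x, y)).items():
                    if d > best[2]:
                        best = ((x, y), cell, d)
    return best
```

Unlike the border-only or double-sweep variants, this exhaustive version also finds pairs buried in the middle of a spiral, at the cost of the extra factor of V.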
Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)
If your objects (points) do not move frequently, you can perform such a calculation in much less than O(n^3) time.
All you need is to break the space into large grid cells and pre-calculate the inter-cell distances. Then selecting point pairs that occupy the most distant cells is a matter of a simple table lookup. In the average case you will need to pairwise-check only a small set of objects.
This solution works if the distance metric is continuous. Thus if, for example, there are many barriers in the map (as in mazes), this method might fail.
