Why isn't my heuristic for the A* algorithm admissible?

I am going through the CS 188 course available to the public at edx.org. Right now I have to develop a heuristic for an A* search that eats all the pellets, as shown here:
My heuristic, which I was sure would work (as both admissible and consistent), went like this:
initialize a heuristic accumulator h to 0
initialize pos to the current position of pacman
while pellets not eaten:
    get the nearest pellet from pos using an A* search (with Manhattan distance as its heuristic)
    add that distance to h
    remove the pellet from pellets
    set pos to the position of that pellet
I also cache previously computed distances, so the A* search for the nearest pellet is not repeated if the same pair of positions has already come up while evaluating another state. It solves the problem very quickly, and the resulting path is optimal.
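For reference, here is a minimal Python sketch of the heuristic as described; dist(a, b) is an assumed (memoized) helper that returns the true maze distance, e.g. computed by the inner A* search with a Manhattan-distance heuristic:

    def greedy_chain_heuristic(position, pellets, dist):
        # dist(a, b) is assumed to return the true maze distance between a and b,
        # e.g. from an inner A* search, memoized in a cache.
        h = 0
        pos = position
        remaining = set(pellets)
        while remaining:
            nearest = min(remaining, key=lambda p: dist(pos, p))  # closest pellet
            h += dist(pos, nearest)                               # chain its distance
            remaining.remove(nearest)
            pos = nearest                                         # continue from there
        return h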
When I use this algorithm in the autograder, it fails the admissibility test.
Don't worry, I am not asking for a solution to the problem, only why my current heuristic is not admissible. When I work through the example in the picture in my head, the heuristic never overestimates the cost.
So if anyone is able to understand this and has any ideas, your input is greatly appreciated!

A heuristic for A* needs to provide a number that is no more than the best possible cost. Your heuristic is a plausible greedy solution that does not guarantee this. Suppose there is a single line of pellets and the pac-man is slightly off centre on this line. The cheapest solution is to work out which end of the line is nearer, eat all the pellets up to that end, and then move in the other direction to eat all the remaining pellets, never having to double back along the longer half of the line.
Your greedy heuristic moves first to whichever pellet is nearest the pac-man, which might not lie on the side that should be cleared first, so it may return a cost greater than the optimal cost: it returns the cost of one possible solution, which is not necessarily the optimal one.
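A concrete counterexample on a straight corridor makes this visible. Put the pac-man at x = 0 with pellets at x = 1, x = 10 and x = -3, so Manhattan distance is the true distance. The greedy chain goes to 1 (cost 1), then -3 (cost 4), then 10 (cost 13), giving h = 18, while the optimal order -3, 1, 10 costs only 16, so the heuristic overestimates and is not admissible. A tiny script to check the arithmetic:

    from itertools import permutations

    def greedy_chain(start, pellets):
        h, pos, remaining = 0, start, list(pellets)
        while remaining:
            nearest = min(remaining, key=lambda p: abs(pos - p))
            h += abs(pos - nearest)
            remaining.remove(nearest)
            pos = nearest
        return h

    def optimal_tour(start, pellets):
        # brute-force the cheapest visiting order
        return min(
            sum(abs(b - a) for a, b in zip((start,) + order, order))
            for order in permutations(pellets)
        )

    print(greedy_chain(0, [1, 10, -3]))   # 18
    print(optimal_tour(0, [1, 10, -3]))   # 16 -> the heuristic overestimates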

Here is a way to set up a heuristic that is feasible for your problem. First, if your goal is to eat all pellets in minimum distance, then your current approach is too greedy to give a feasible estimate. Here is a way to redesign your heuristic:
Goal: eat all pellets in minimum path length.
Heuristic estimate:
1. Use A* to calculate the shortest path from the current position to each pellet independently.
2. Cost function: (sum of the shortest-path lengths from the current position to all unvisited pellets) * 2 + total distance from the start state.
This cost function is an upper bound.
Note: there may be a more efficient way to calculate the shortest paths to the uneaten pellets at each state; that would need some research.
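As a rough sketch of the estimate described above (shortest_path_length is an assumed helper that runs an independent A* or BFS from the current position to a pellet; this only illustrates the formula, not a tested implementation):

    def cost_estimate(current_pos, unvisited_pellets, distance_from_start, shortest_path_length):
        # (sum of shortest paths from the current position to every unvisited
        # pellet) * 2 + total distance travelled from the start state
        remaining = sum(shortest_path_length(current_pos, p) for p in unvisited_pellets)
        return 2 * remaining + distance_from_start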

Related

Studying some variants of the A* algorithm

I recently started learning about the A* algorithm and its variants and came across this paper [1]. It basically has three variants of the algorithm with the heuristic value changed in each one of them.
For A*(1) it has f(i) = g(i) + h(i), where g(i) denotes the path cost from the start point to the current position i, and the heuristic function h(i) is the Euclidean distance from the current point to the target point.
For A*(2) it has f(i) = g(i) + h(i) + h(j), where j is the parent node of the current point and h(j) is the Euclidean distance from that parent node to the target point.
The results show that A*(2) is generally faster than A*(1) when tried on randomly generated mazes. I am not able to explain why this is the case. I tried to compare the two heuristics and reached the opposite conclusion.
My logic says that if we travel from a point that is farther from the target to a nearer one, the f(i) value will be higher than if we travel from a point closer to the target to one that is farther away, because we also count the Euclidean distance of the parent node. Basically, to reach a specific node, the path that leads away from the target will have a lower f(i).
And since the f(i) value is lower, it will be higher up in the priority queue. This works against our goal, as a path that moves away from the target is prioritized over a path that is getting closer.
What is wrong with this logic and why does it not align with the results cited in the paper?
[1] - https://www.researchgate.net/publication/238009053_A_comparative_study_of_A-star_algorithms_for_search_and_rescue_in_perfect_maze
In a perfect maze like they use in the paper, A* has little advantage over depth-first search, breadth-first search and Dijkstra. They all perform more or less the same.
The power of A* is that the heuristic can encode a 'sense of direction' into the algorithm. If your target node is north of you, then it makes sense to start searching for a path northwards. But in a perfect maze, this sense of direction is useless. The northward path may be a dead-end and you'd be forced to backtrack. A* is much better suited to wide open grids with sparse obstacles.
Setting it to h(i) + h(j) more or less doubles the weight of the heuristic. I think you'll see the same performance improvement if you use something like f(i) = g(i) + h(i) * 1.5 or f(i) = g(i) + h(i) * 2. This makes the algorithm greedier, more likely to examine nodes closer to the target. The downside is that you are no longer guaranteed to find the shortest path; you'll find /any/ path. But in a perfect maze there is only one path to find, so this is not a real problem in this scenario.
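For reference, a minimal sketch of that weighted variant (the neighbours function and heuristic h are placeholders you would supply; with an admissible h and w > 1 the path found can be up to w times longer than the optimum, which is harmless in a perfect maze):

    import heapq

    def weighted_a_star(start, goal, neighbours, h, w=2.0):
        # f = g + w * h: larger w makes the search greedier and usually expands
        # fewer nodes, at the price of losing the optimality guarantee.
        open_heap = [(w * h(start, goal), 0, start)]
        g = {start: 0}
        parent = {start: None}
        while open_heap:
            _, g_cur, cur = heapq.heappop(open_heap)
            if g_cur > g.get(cur, float("inf")):
                continue                      # stale queue entry
            if cur == goal:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            for nxt, step_cost in neighbours(cur):
                new_g = g_cur + step_cost
                if new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    parent[nxt] = cur
                    heapq.heappush(open_heap, (new_g + w * h(nxt, goal), new_g, nxt))
        return None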
I wrote an online widget that allows you to experiment with a few path finding algorithms. Use it to draw a maze, and see the effect of the "greedy" option.

Dijkstra Algorithm with Chebyshev Distance

I have been using Dijkstra's algorithm to find the shortest path in the graph API provided with the Princeton University Algorithms, Part 2 course, and I have figured out how to find the path using the Chebyshev distance.
With Chebyshev distance you can move to any neighbouring node at a cost of only 1, so the zigzag has no impact on the total cost. But, as the red circle in the graph shows, why does the path-finding line zigzag instead of moving straight?
Will the same thing happen if I use the A* algorithm?
If you want to prioritize "straight lines" you should take the direction of the previous step into account. One possible way is to create a graph G'(V', E') where V' consists of all pairs of neighbouring vertices. For example, vertex v = (v_prev, v_cur) would define a vertex on the path where v_cur is the last vertex of the path and v_prev is the one before it. Then, in the "update distances" step of the shortest-path algorithm, you can choose the best distance with the best (non-changing) direction.
We can also add an extra component to the distance, equal to the number of direction changes, and find the minimal-distance path with the minimal number of direction changes.
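A sketch of that idea, assuming a grid where passable(cell) and the set of allowed moves are supplied by you: Dijkstra over (cell, incoming direction) states with lexicographic (distance, direction changes) costs, so among equally short paths the straightest one wins.

    import heapq

    def straightest_shortest_path(start, goal, passable, moves):
        # Dijkstra over (cell, incoming direction) states; costs are compared
        # lexicographically as (distance, direction changes).
        INF = (float("inf"), float("inf"))
        no_dir = (0, 0)                            # sentinel: no previous step yet
        best = {(start, no_dir): (0, 0)}
        heap = [((0, 0), start, no_dir)]
        while heap:
            cost, cell, direction = heapq.heappop(heap)
            if cost > best.get((cell, direction), INF):
                continue                           # stale entry
            if cell == goal:
                return cost                        # (distance, direction changes)
            for d in moves:
                nxt = (cell[0] + d[0], cell[1] + d[1])
                if not passable(nxt):
                    continue
                dist, turns = cost
                new_cost = (dist + 1, turns + (0 if direction in (d, no_dir) else 1))
                if new_cost < best.get((nxt, d), INF):
                    best[(nxt, d)] = new_cost
                    heapq.heappush(heap, (new_cost, nxt, d))
        return None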
It shouldn't be straight in particular: according to Dijkstra or A*, as you say, the zigzag has no impact on the total cost. I'll assume, by the way, that you want to prevent useless zig-zagging in particular, and have no general preference for a move that goes in the same direction as the previous move.
Dijkstra and A* do not have a built-in dislike for "weird paths"; they only explicitly care about the cost, which implicitly means they also care about how you handle equal costs. There are a couple of things you can do about that:
Use tie-breaking to make them prefer straight moves whenever two nodes have equal cost (G or F, depending on whether you're doing Dijkstra or A*). This gives some trouble around obstacles because two choices that eventually lead to equal-length paths do not necessarily have the same F score, so they might not get tie-broken. It'll never give you a sub-optimal path though.
Slightly increase your diagonal cost. It doesn't have to be by much, say 10 for straight and 11 for diagonal. This will avoid any diagonal move that isn't an actual shortcut. But obviously, if that doesn't match the actual cost, you can now find sub-optimal paths. The bigger the cost difference, the more often that will happen. In practice it is relatively rare, and paths have to be long enough (accumulating enough cost difference that it becomes worth an entire extra move) before it happens.
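The second option is just a one-line change in the step cost; a sketch using the 10/11 example above:

    def step_cost(d_row, d_col):
        # straight moves cost 10; diagonals are made slightly more expensive (11),
        # so the algorithm only takes a diagonal when it is an actual shortcut
        return 11 if d_row != 0 and d_col != 0 else 10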

Understanding an Inconsistent Heuristic

Say I have a grid with some squares designated as "goal" squares. I am using A* in order to navigate this grid, trying to visit every goal square at least once using non-diagonal movement. Once a goal square has been visited, it is no longer considered a goal square. Think Pac Man, moving around and trying to eat all the dots.
I am looking for a consistent heuristic to give A* to aid in navigation. I decided to try a "return the Manhattan Distance to the nearest unvisited goal" heuristic for any given location. I have been told that this is not a consistent heuristic but I do not understand why.
Moving one square towards the closest goal square has a cost of one, and the Manhattan distance is also reduced by one. Landing on a goal square will either increase the value of the heuristic (because it will now measure to the next nearest unvisited goal) or end the search (if that goal was the last unvisited one).
h(N) <= c(N, P) + h(P) seems to always hold true. What is it that makes this heuristic inconsistent, or is my instructor mistaken?
If you are asking how to use A* to find the shortest path through all the goals, the answer is: you can't (with only one iteration). This is the Travelling Salesman Problem, an NP-complete problem. To solve this using A*, you'd need to try every permutation of goal orderings. Each path from a single start to a single goal could then be solved using A* (so you'd need to run the algorithm multiple times for each permutation).
However, if you are asking how to use A* to find the shortest path from a single start to any one of a number of goals, your solution works fine, and your heuristic is indeed consistent. The minimum of multiple consistent heuristics is still a consistent heuristic, which is easy to prove.
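A sketch of that any-goal heuristic (pos and the goals are assumed to be (row, col) tuples):

    def nearest_goal_heuristic(pos, goals):
        # minimum Manhattan distance to any remaining goal; the minimum of
        # consistent heuristics is itself consistent for a single-start,
        # any-goal search
        if not goals:
            return 0
        return min(abs(pos[0] - g[0]) + abs(pos[1] - g[1]) for g in goals)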

Pathfinding - A path of less than or equal to n turns

Most of the time when implementing a pathfinding algorithm such as A*, we seek to minimize the travel cost along the path. We could also seek the optimal path with the fewest turns. This can be done by having a grid of location-direction states instead of a grid of location states. For any given location in the old grid, we would have 4 states in that spot representing that location reached while moving left, right, up, or down. That is, if you were expanding to a node above you, you would actually be adding the 'up' state of that node to the priority queue, since we've found the quickest route to this node when going up. If you were going that direction anyway, we wouldn't add anything to the weight. However, if we had to turn from the current node to get to the expanded node, we would add a small epsilon to the weight so that two paths of equal length would not have equal cost if their number of turns differed. As long as epsilon is << the cost of moving between nodes, it is still the shortest path.
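A sketch of that location-direction expansion (the neighbours function, heuristic h and epsilon value are assumptions; eps just has to stay small enough that the accumulated turn penalties can never outweigh one real move):

    import heapq
    from itertools import count

    def a_star_fewest_turns(start, goal, neighbours, h, move_cost=1.0, eps=1e-6):
        # A* over (location, incoming direction) states; turning adds a tiny
        # epsilon so that, among shortest paths, the one with fewest turns wins.
        # neighbours(loc) is assumed to yield (next_loc, direction) pairs.
        tie = count()
        g = {(start, None): 0.0}
        heap = [(h(start), next(tie), (start, None))]
        while heap:
            _, _, (loc, direction) = heapq.heappop(heap)
            if loc == goal:
                return g[(loc, direction)]
            for nxt, d in neighbours(loc):
                step = move_cost + (0.0 if direction in (None, d) else eps)
                new_g = g[(loc, direction)] + step
                if new_g < g.get((nxt, d), float("inf")):
                    g[(nxt, d)] = new_g
                    heapq.heappush(heap, (new_g + h(nxt), next(tie), (nxt, d)))
        return None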
I now pose a similar problem, but with relaxed constraints. I no longer wish to find the shortest path, not even a path with the fewest turns. My only goal is to find a path of ANY length with numTurns <= n. To clarify, the goal of this algorithm would be to answer the question:
"Does there exist a path P from locations A to B such that there are fewer than or equal to n turns?"
I'm asking whether using some sort of greedy algorithm here would be helpful, since I do not require minimum distance nor turns. The problem is, if I'm NOT finding the minimum, the algorithm may search through more squares on the board. That is, normally a shortest path algorithm searches the least number of squares it has to, which is key for performance.
Are there any techniques that come to mind that would provide an efficient way (better or same as A*) to find such a path? Again, A* with fewest turns provides the "optimal" solution for distance and #turns. But for my problem, "optimal" is the fastest way the function can return whether there is a path of <=n turns between A and B. Note that there can be obstacles in the path, but other than that, moving from one square to another is the same cost (unless turning, as mentioned above).
I've been brainstorming, but I cannot think of anything other than A* with the turn states. It might not be possible to do better than this, but I thought there might be a clever way to exploit my relaxed conditions. I've even considered using just numTurns as the cost of moving on the board, but that could waste a lot of time searching dead paths. Thanks very much!
Edit: Final clarification - the path does not have to have the fewest turns, just <= n. It does not have to be a shortest path; it can be a huge path as long as it has at most n turns. The goal is for this function to execute quickly, and I don't even need to record the path, I just need to know whether one exists. Thanks :)

Is A-star guaranteed to give the shortest path in a 2D grid

I am working with the A-star algorithm, where I have a 2D grid and some obstacles. The obstacles are only vertical and horizontal, but they can be densely placed.
The A-star search works well (i.e. the shortest path is found in most cases), but when I try to go from the top-left corner to the bottom-right, I sometimes see that the path is not the shortest, i.e. there is some clumsiness in it.
The path seems to deviate from what the shortest path should be.
Here is what my algorithm does: I start from the source and move outward, calculating for each neighbour the value of (distance from source + distance from destination). I keep choosing the minimum cell and repeating until the cell I encounter is the destination, at which point I stop.
My question is, why is A-star not guaranteed to give me the shortest path? Or is it, and am I doing something wrong?
Thanks.
A-star is guaranteed to provide the shortest path according to your metric function (not necessarily 'as the bird flies'), provided that your heuristic is "admissible", meaning that it never over-estimates the remaining distance.
Check this link: http://theory.stanford.edu/~amitp/GameProgramming/Heuristics.html
In order to assist in determining your implementation error, we will need details on both your metric, and your heuristic.
Update:
OP's metric function is 10 for an orthogonal move, and 14 for a diagonal move.
OP's heuristic only considers orthogonal moves, and so is "inadmissible"; it overestimates by ignoring the cheaper diagonal moves.
The only cost of an overly conservative heuristic is that additional nodes are visited before the minimum path is found; the cost of an overly aggressive heuristic is that a non-optimal path may be returned. OP should use a heuristic of:
7 * (deltaX + deltaY)
which is a very slight underestimate that allows for the possibility of a direct diagonal path, and so should also be performant.
Update #2:
To really squeeze out performance, this is close to an optimum while still being very fast:
7 * min(deltaX, deltaY) + 10 * (max(deltaX, deltaY) - min(deltaX, deltaY))
Update #3:
The 7 above is derived from 14/2, where 14 is the diagonal cost in the metric.
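Putting the two suggested heuristics into code, assuming deltaX and deltaY are the absolute coordinate differences to the goal and the metric is the OP's 10/14:

    def heuristic_simple(delta_x, delta_y):
        # 7 * (deltaX + deltaY): a slight underestimate that leaves room for a
        # purely diagonal path (7 being half the diagonal cost of 14)
        return 7 * (delta_x + delta_y)

    def heuristic_mixed(delta_x, delta_y):
        # 7 * min + 10 * (max - min): charges the overlapping part at half the
        # diagonal cost and the remainder as straight moves; it still never
        # overestimates with the 10/14 metric
        return 7 * min(delta_x, delta_y) + 10 * (max(delta_x, delta_y) - min(delta_x, delta_y))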
Only your heuristic changes; the metric is "a business rule" and drives all the rest. If you are interested on A-star for a hexagonal grid, check out my project here: http://hexgridutilities.codeplex.com/
Update #4 (on performance):
My impression of A-star is that it staggers between regions of O(N^2) performance and areas of almost O(N) performance. But this is so dependent on the grid or graph, the obstacle placement, and the start and end points, that it is hard to generalize. For grids and graphs of known particular shapes or flavours there are a variety of more efficient algorithms, but they often get more complicated as well; TANSTAAFL.
I'm sure you are doing something wrong (maybe some implementation flaw; your idea of A* sounds correct). A* is guaranteed to give the shortest path; this can be proved mathematically.
See the wiki page; it will give you all the information you need to solve your problem.
NO
A* is one of the fastest pathfinding algorithms, but it doesn't necessarily give the shortest path. If you are looking for correctness over speed, then it's best to use Dijkstra's algorithm.
