The Wikipedia article on the A* search algorithm says:
Here, g(n) is the known cost of getting from the initial node to n;
this value is tracked by the algorithm. h(n) is a heuristic estimate
of the cost to get from n to any goal node. For the algorithm to find
the actual shortest path, the heuristic function must be admissible,
meaning that it never overestimates the actual cost to get to the
nearest goal node. The heuristic function is problem-specific and must
be provided by the user of the algorithm.
It specifically states that the h() function must not overestimate the distance. Yet it seems to me that in my code, if my heuristic h() function returns infinity (or zero), it performs just as well and still finds the shortest path.
So why should it be admissible? Isn't a value of infinity an overestimate? I feel like my node graph is complex enough. Are there specific situations where this would make a difference that I perhaps have not reproduced in my graph?
Addendum:
See this fiddle and feel free to mess with the h function at line 221. Click on the floorplan to move the red dot.
Any of the following commented lines work equally well for the h() function.
var h = function(a,b) {
    //return calcDistance(a,b);
    //return 0;
    return 999999;
}
If your heuristic is not admissible, then you will sometimes "settle for less than the best."
Suppose your search has just reached the goal node. Can you stop? Or is there yet to be found a better path to the goal?
If the heuristic never overestimates the shortest path from any node to the goal, you can look at each frontier node N and compare (cost to reach N) + (heuristic at N) against (cost of the path to the goal you have already found). If there is no frontier node N that could still lead to a shorter path to the goal, then you're done.
If your heuristic is not admissible, this reasoning will not work.
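To see where admissibility bites, here is a minimal sketch (a toy graph and function names of my own, not the code from the fiddle). With an admissible heuristic A* returns the optimal path; with a heuristic that overestimates one node it stops too early and returns the longer route. Note also that a heuristic returning the same huge constant for every node, like the 999999 above, shifts every f value equally and so does not change the expansion order, which is why it still appears to work.

import heapq

def astar(graph, start, goal, h):
    # Minimal A*: graph maps node -> list of (neighbor, edge_cost) pairs.
    open_heap = [(h(start), 0, start, [start])]        # entries are (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g                              # stop when the goal is popped
        for nbr, cost in graph.get(node, []):
            ng = g + cost
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(open_heap, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None, float("inf")

# Toy graph: S -> A -> G costs 1 + 1 = 2, the direct edge S -> G costs 3.
graph = {"S": [("A", 1), ("G", 3)], "A": [("G", 1)]}

admissible = lambda n: 0                                # never overestimates
inadmissible = lambda n: 5 if n == "A" else 0           # overestimates A's true cost (1)

print(astar(graph, "S", "G", admissible))               # (['S', 'A', 'G'], 2) -- optimal
print(astar(graph, "S", "G", inadmissible))             # (['S', 'G'], 3) -- suboptimal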
Related
Given an initial state and a single final state in a maze, is it possible to design a maze in which breadth first search expands less nodes than A* with manhattan distance as heuristic function? The cost of expanding to all nodes is 1.
It's impossible. The intuition is that your heuristic is more informed than BFS's, and that is also the basis of the proof.
Formally:
h'(n) = 0 is also an admissible heuristic function.
BFS is basically A* using h' as its heuristic function, since it always expands based on f'(n) = g(n) + h'(n) = g(n).
h dominates h', since h'(n) <= h(n) for all n.
Since h dominates h' and is monotone, the nodes expanded by the algorithm using h are a subset of those expanded by the algorithm using h'. More information and a proof are in this thread, and in the original article.
QED
There are three ways this can happen in general:
1. A* gets an inconsistent heuristic. A* can do up to 2^N expansions of N states with an inconsistent heuristic. For an undirected graph, an inconsistent heuristic has some states with |h(a, g) - h(b, g)| > c(a, b) even if the heuristic is admissible (h(a, g) <= c(a, g)). Manhattan distance is consistent, so this won't work in your example.
2. In a breadth-first search it is assumed that all costs are 1. Thus, when the goal is generated, BFS can terminate immediately, knowing that it has the optimal cost to the goal. A* typically does not terminate until the goal is selected for expansion. Note that if A* were guaranteed that all edges have uniform cost (or some minimum cost), then A* could do the same.
3. A* is only optimal up to necessary expansions - those with f(n) < C*, where C* is the optimal solution cost. If a problem instance has more than one state with f(n) = C*, then A* could do worse if it did poorly with tie-breaking and BFS got lucky with tie-breaking.
So, now consider this example:
....._.
SXXXXG.
.......
Here, a . is open space, as is the _ cell. S is the start and G is the goal. The X cells are impassable, and the _ cell can be reached and expanded, but you can't go down from that state to the goal.
If A* is unlucky it will expand the top route first (assume it expands largest g-cost first, but this also works with other tie-breaking). That route looks promising until it reaches the end, after which A* will have to expand the full alternate route.
Assume BFS gets lucky and expands the state below G before the _ state. Then BFS will be able to terminate without expanding the _ state, and do fewer expansions than A*.
To be clear, label the states in the previous example as follows:
abcdefg
SXXXXG.
hijklmn
A* with unlucky tie-breaking will expand a, b, c, d, e, f and generate g (but not expand it). Then it will continue and expand h, i, j, k, l, and m, after which it can terminate with the solution.
BFS with lucky tie-breaking will expand h, a, i, b, j, c, k, d, l, e, and m and then terminate with the optimal solution. BFS does one less expansion than A*, because it doesn't expand f.
So, yes, it is possible for BFS to beat A* if the tie-breaking works out in favor of BFS.
Such examples will always be possible unless A* and BFS always expand states in the same order, or if further restrictions are put on the maze so that the heuristic is always perfect next to the goal.
See this paper for common misconceptions about A* search. One addresses the misconception that a better heuristic will do less work, and another addresses the misconception that A* with a 0-heuristic is the same as BFS.
--
Note 1: There is some discussion in the comments about whether the _ state is expanded if it doesn't have any successors. The original A* paper states:
Starting with the node s, they generate some part of the subgraph G, by repetitive application of the successor operator R. During the course of the algorithm, if R is applied to a node, we say that the algorithm has expanded that node.
Thus, if we apply the R operator and find no successors, we have still expanded the node. (They use a Greek letter, which I've replaced with R.) But with a small edit to the map above, the _ state does have one successor, so _ is expanded by A* and a successor is generated (not the goal).
Note 2:
There is a question in the comments about Theorem 3 in the A* paper. That theorem states that there always exists some tie-breaking scheme for A* that will be at least as good as any other less informed algorithm. There are two problems with this theorem. First, it only states that A* is capable of beating another algorithm given the right tie-breaking, not that it will always beat every other algorithm. The second problem is that BFS is more informed than A*. BFS knows all edges have unit cost and A* does not. So, the theorem does not apply to BFS, because BFS has more information.
Note 3:
The question only asks "Is this possible." My answer provided here shows the precise conditions under which it is possible. The other answers (one of which has now been deleted) categorically state that it can never happen, and thus are incorrect.
Does the A* algorithm definitely return the path with the least cost?
I'm running this algorithm and it proposes a path which doesn't have the minimum cost (I found another one with a lower cost).
Why does it propose this path and not the other one (with the lower cost)?
Does it have criteria other than cost for choosing the proposed path?
This is an example of what I'm asking about: the green path has a lower cost, but the algorithm proposes the orange one.
The heuristic must return a value that is smaller than or equal to the actual minimal cost. Otherwise the algorithm may return a wrong (suboptimal) result.
Does the A* algorithm definitely return the path with the least cost?
Yes, as long as you use an admissible heuristic. Since you are getting a suboptimal path, there are two options:
1. Your A* implementation is wrong, and the algorithm therefore returns a suboptimal path.
2. The heuristic function you are using is not admissible, i.e. for some node it gives an estimate that is larger than the real shortest-path cost from that node to the goal.
It must be one of these two, since A* is proven to always find an optimal path whenever an admissible heuristic function is used (and, if you use a closed set without re-expanding nodes, a consistent one).
Imagine that the first node in the green path, which costs 1 to reach from A, has a heuristic value of h(green_1) = 20 according to whatever function you are using. That value overestimates the real shortest path from that node to the goal node B, which is 6. Now assume the heuristic estimates of all nodes in the orange path equal the real shortest-path cost from each node to B. Thus,
f(green_1) = g(green_1) + h(green_1) = 1 + 20 = 21
All f(n) values along the orange path will be smaller, so green_1 will never be selected for expansion. The goal node will be added to the OPEN list with f(B) = g(B) + h(B) = 11 + 0 = 11 and expanded; since the heuristic has only promised a path of cost 21 on the green side, which is worse than the orange path already found, the algorithm finishes and returns the suboptimal solution.
The problem: finding the path to the closest of multiple goals on a rectangular grid with obstacles. Only moving up/down/left/right is allowed (no diagonals). I did see this question and answers, and this, and that, among others. I didn't see anyone use or suggest my particular approach. Do I have a major mistake in my approach?
My most important constraint here is that it is very cheap for me to represent the path (or any list, for that matter) as a "stack", or a "singly-linked-list", if you want. That is, constant time access to the top element, O(n) for reversing.
The obvious (to me) solution is to search the path from any of the goals to the starting point, using a manhattan distance heuristic. The first path from the goal to the starting point would be a shortest path to the closest goal (one of many, possibly), and I don't need to reverse the path before following it (it would be in the "correct" order, starting point on top and goal at the end).
In pseudo-code:
A*(start, goals) :
    init_priority_queue(start, goals, p_queue)
    return path(start, p_queue)

init_priority_queue(start, goals, p_queue) :
    for (g in goals) :
        f = manhattan_distance(start, g)        # g-cost of a goal is 0, so f = h
        insert(f, [g], p_queue)                 # a path containing only the goal

path(start, p_queue) :
    f, path = extract_min(p_queue)
    if (top(path) == start) :
        return path
    else :
        expand(start, path, p_queue)
        return path(start, p_queue)

expand(start, path, p_queue) :
    this = top(path)
    for (n in next(this)) :
        f = cost(path) + 1 + manhattan_distance(start, n)   # f = g + h
        new_path = push(n, path)
        insert(f, new_path, p_queue)
To me it seems only natural to reverse the search in this way. Is there a think-o in here?
And another question: assuming that my priority queue is stable on elements with the same priority (if two elements have the same priority, the one inserted later will come out earlier). I have left my next above undefined on purpose: randomizing the order in which the possible next tiles on a rectangular grid are returned seems a very cheap way of finding an unpredictable, rather zig-zaggy path through a rectangular area free of obstacles, instead of going along two of the edges (a zig-zag path is just statistically more probable). Is that correct?
It's correct and, as far as I can see, efficient in the big-O sense: O(N log N) as long as the heuristic is admissible and consistent, where N is the number of cells in the grid, assuming you use a priority queue whose operations work in O(log N). The zig-zag will also work.
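For concreteness, here is a minimal Python rendering of the reversed, multi-goal approach (function names and toy grid are my own). It follows the pseudocode above: seed the open list with the goals and prioritize by f = path cost so far + Manhattan distance to the start.

import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reverse_multi_goal_astar(grid, start, goals):
    # Search from the goals back to the start; the returned path is already
    # ordered start -> ... -> goal, so it never needs to be reversed.
    rows, cols = len(grid), len(grid[0])
    open_heap, best_g = [], {}
    for g in goals:                                     # seed the frontier with every goal
        best_g[g] = 0
        heapq.heappush(open_heap, (manhattan(start, g), 0, [g]))
    while open_heap:
        f, cost, path = heapq.heappop(open_heap)
        cell = path[0]                                  # "top" of the path stack
        if cell == start:
            return path
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = cost + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap,
                                   (ng + manhattan(start, (nr, nc)), ng, [(nr, nc)] + path))
    return None

# 0 = free cell, 1 = obstacle; two goals, the closer one should win.
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(reverse_multi_goal_astar(grid, (0, 0), [(2, 3), (0, 3)]))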
P.S. For this sort of problem there is a more efficient "priority queue" that works in O(1). By this sort of problem I mean the case where the effective distance between every pair of nodes is a very small constant (3 in this problem).
Edit: as requested in the comments, here are the details of a constant-time "priority queue" for this problem.
First, transform the graph as follows. Let the potential of a node in the graph (i.e. a cell in the grid) be the Manhattan distance from that node to the goal (i.e. the heuristic), and write the potential of node i as P(i). In the original graph there is an edge of weight 1 between adjacent cells; in the modified graph the weight w(i, j) is changed into w(i, j) - P(i) + P(j). This is exactly the graph used in the proof that A* is optimal and terminates in polynomial time when the heuristic is admissible and consistent. Note that the Manhattan distance heuristic for this problem is both admissible and consistent.
The first key observation is that A* on the original graph is exactly the same as Dijkstra on the modified graph, because the "value" of node i in the modified graph is exactly its distance from the origin node plus P(i). The second key observation is that the weight of every edge in the transformed graph is either 0 or 2. Thus, we can simulate A* with a "deque" (a doubly-linked list) instead of an ordinary priority queue: whenever we relax an edge of weight 0, push the node onto the front of the deque, and whenever we relax an edge of weight 2, push it onto the back.
Thus, this algorithm simulates A* and works in linear time in the worst case.
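A rough sketch of that simulation in Python (single goal, 4-connected unit-cost grid; the names are my own): the reduced weight of a step from i to j is 1 - P(i) + P(j), which is 0 when the Manhattan distance to the goal decreases and 2 when it increases, so a deque replaces the priority queue.

from collections import deque

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def deque_astar(grid, start, goal):
    # A* on a 4-connected unit-cost grid, simulated as Dijkstra on the
    # reweighted (0/2-weight) graph and run with a deque instead of a heap.
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]          # g-cost in the ORIGINAL graph
    dist[start[0]][start[1]] = 0
    dq = deque([start])
    while dq:
        r, c = dq.popleft()
        if (r, c) == goal:
            return dist[r][c]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = dist[r][c] + 1
                if ng < dist[nr][nc]:
                    dist[nr][nc] = ng
                    # reduced weight = 1 - P(current) + P(neighbor): either 0 or 2
                    if manhattan((nr, nc), goal) < manhattan((r, c), goal):
                        dq.appendleft((nr, nc))         # weight 0: front of the deque
                    else:
                        dq.append((nr, nc))             # weight 2: back of the deque
    return None

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(deque_astar(grid, (0, 0), (2, 3)))                # shortest path length: 5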
I have a tree like the one below. Numbers on the edges are costs (g) and number in the nodes are the estimated distance from the goal from the heuristic function (h). The goal is shaded in grey.
If I start at S, would the traversal for A* search (f(x) = g(x) + h(x)) be as follows: S > B > H > M?
This is a funny question, because if we instead look at the greedy search algorithm, where the function for determining the next move is f(x) = h(x), we consider only the values in the nodes and select the smallest one. Based on this we start at S and then go on to A (lowest value first), but the leftmost branch is wrong since it does not lead to any of the goal nodes. Would I be correct to assume that a greedy search will fail on this tree?
Firstly, this is not a tree, it's a DAG, because some nodes have multiple parents.
Secondly, yes, A* will return the correct result with this heuristic, because the heuristic is admissible (ie. it never overestimates the true cost). If that were not true, A* might not return the correct result.
No, the greedy search will walk through S->A->D->B->F.
A heuristic only tries to speed up the search; it won't make the search fail. In the worst case it just takes longer than a search with no heuristic.
The Wikipedia listing for A* search states:
In other words, the closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower f value than at any previous iteration.
However, in doing so, I have found that I receive erroneous results in an otherwise functional A* search implementation. Can someone shed some light on how one would make this modification?
Make sure your heuristic meets the following:
h(x) <= d(x,y) + h(y)
This is the consistency (monotonicity) condition; combined with h(goal) = 0, it also guarantees that your heuristic never overestimates the cost of getting from your current location to the goal.
For example, suppose you are on a grid and trying to get from A to B, both points on the grid. A good heuristic function is the Euclidean distance between the current location and the goal:
h(x) = sqrt[ (crtX -goalX)^2 + (crtY -goalY)^2 ]
This heuristic does not overestimate because of the triangle inequality.
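If you want to convince yourself, here is a throwaway numeric check (my own names, assuming a 4-connected grid with step cost d(x, y) = 1) that the Euclidean heuristic satisfies h(x) <= d(x, y) + h(y) for neighboring cells:

import math, random

def h(p, goal):
    return math.hypot(p[0] - goal[0], p[1] - goal[1])   # Euclidean distance

goal = (7, 3)
for _ in range(1000):
    x = (random.randint(0, 20), random.randint(0, 20))
    # the four grid neighbors of x, each at step cost d(x, y) = 1
    for y in ((x[0] + 1, x[1]), (x[0] - 1, x[1]), (x[0], x[1] + 1), (x[0], x[1] - 1)):
        assert h(x, goal) <= 1 + h(y, goal) + 1e-9      # h(x) <= d(x, y) + h(y)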
More on triangle inequality: http://en.wikipedia.org/wiki/Triangle_inequality
More on Euclidean distance: http://mathworld.wolfram.com/Distance.html
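As for the modification itself, one way to read the quoted sentence is sketched below in Python (my own function names, not the Wikipedia pseudocode): drop the closed set and remember, for each node, the lowest f value ever pushed; a node is re-added to the open list only when a strictly lower f is found. With a consistent heuristic this amounts to comparing g values, since h(n) is fixed per node.

import heapq

def astar_no_closed_set(graph, start, goal, h):
    # Tree-search A* without a closed set: a node is (re)added to the open
    # list only when we find a strictly lower f value for it than ever before.
    best_f = {start: h(start)}
    open_heap = [(h(start), 0, start, [start])]          # (f, g, node, path)
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            ng = g + cost
            nf = ng + h(nbr)
            if nf < best_f.get(nbr, float("inf")):       # the "lower f than before" test
                best_f[nbr] = nf
                heapq.heappush(open_heap, (nf, ng, nbr, path + [nbr]))
    return None, float("inf")

graph = {"S": [("A", 1), ("G", 3)], "A": [("G", 1)]}
print(astar_no_closed_set(graph, "S", "G", lambda n: 0))  # (['S', 'A', 'G'], 2)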