How is Manhattan distance an admissible heuristic?

Isn't it true that, while counting the moves for one tile, other tiles can also be brought closer to their goal positions? And hence counting the moves for each tile separately can give us a count greater than the minimum number of moves required to reach the goal state?
This question is in context of Manhattan distance for 15-Puzzle.
Here is the Question in different words:
Can we use Manhattan distance as an admissible heuristic for the N-Puzzle? To implement A* search we need an admissible heuristic. Is the Manhattan heuristic a candidate? If yes, how do you counter the above argument (the first three sentences of the question)?
Definitions: A* is a kind of search algorithm. It uses a heuristic function to determine the estimated distance to the goal. As long as this heuristic function never overestimates the distance to the goal, the algorithm will find the shortest path, probably faster than breadth-first search would. A heuristic that satisfies that condition is admissible.

Admissible heuristics must not overestimate the number of moves needed to solve the puzzle. Since you can only move one block at a time, and only in one of four directions, the best case for any block is that it has a clear, unobstructed path to its goal position; in that case each move reduces its Manhattan distance by exactly one, so the number of moves equals its Manhattan distance.
Every other situation is worse: it takes at least as many moves as the Manhattan distance to get the block into the right place. Summing over all blocks, the heuristic therefore never overestimates and is admissible.

Formal Proof:
By definition of h, h(s∗) = 0 if s∗ is the goal state. Let C∗ be the cost of an optimal solution, and assume for proof by contradiction that C∗ < h(s0) for some initial state s0. Note that, since each action moves only one tile, performing an action can reduce h by at most one. Since the goal can be reached in C∗ actions, we have h(s∗) ≥ h(s0) − C∗ > 0, which is a contradiction, since h(s∗) should be zero. Therefore we must have h(s0) ≤ C∗ for all s0, and h is admissible.
(Source: University of California, Irvine)
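As a concrete illustration of the h used in that proof, here is a minimal sketch for the 15-puzzle. It assumes the board is a flat tuple of 16 entries with 0 for the blank and a goal layout of 1..15 followed by the blank; those conventions are assumptions made for the example, not part of the original question.

def manhattan_heuristic(board, size=4):
    """Sum of the Manhattan distances of all tiles from their goal squares.

    `board` is a flat tuple of length size*size, with 0 for the blank.
    The assumed goal places tile t at index t - 1 and the blank last.
    Each move slides one tile by one square, so it changes this sum by
    at most 1 -- exactly the fact the proof above relies on.
    """
    total = 0
    for index, tile in enumerate(board):
        if tile == 0:              # the blank does not count
            continue
        goal_index = tile - 1      # assumed goal layout: 1..15, blank last
        total += abs(index // size - goal_index // size)
        total += abs(index % size - goal_index % size)
    return total

# One move away from the goal (tile 15 and the blank swapped): h = 1.
print(manhattan_heuristic((1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 15)))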

Related

How suboptimal can a path found by A* be, when using a heuristic that overestimates the remaining distance a little?

Suppose you use the A* algorithm with a heuristic that can overestimate the remaining distance by a few meters. Can it happen that the final path is several kilometres longer than the truly shortest path? Can you give an example of a graph in which this happens, and what kind of graph would it be?
A scenario in which Euclidean (straight-line) distance can overestimate the remaining distance is:
The graph vertices are situated at (x, y) coordinates on a plane, where x and y are floating-point numbers.
There are arcs of floating-point lengths between some vertices of the graph. The length of an arc is no smaller than the Euclidean distance between its endpoints (but can be greater for bent, non-straight arcs).
However, while running the A* algorithm you use integer arithmetic: arc lengths are rounded down, while the A* estimate is rounded up (this is unreasonable, but it is just an example of how small the differences are). So you round the length of each arc down to an integer number of meters, and you round the A* estimate up to an integer number of meters.
Is there a formula that gives an upper bound on the suboptimality of the final path, given an upper bound on how much the A* heuristic overestimates the remaining distance?
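To make the rounding scenario concrete with made-up numbers: suppose the only remaining arc to the goal is a straight segment whose true length is 10.4 m. Rounding arc lengths down makes the search see a remaining cost of 10 m, while rounding the Euclidean estimate up gives a heuristic value of 11 m, so the heuristic overestimates the (rounded) remaining cost by 1 m, even though the true geometric estimate never exceeded the true arc length.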
A* returns when the partial answer it retrieves (the one with the smallest estimated total distance to the goal) has in fact reached the goal. Standard A* is guaranteed to find the correct answer because, by the definition of the heuristic, all the estimated total distances are lower bounds, so none of the other answers can do better.
Suppose the heuristic for an answer that will in fact end up with total distance T can be as large as KT, where K > 1. If A* retrieves an answer with estimated cost KT and thinks it has succeeded because that answer reaches the goal, it might not be the best answer. We know that every partial answer still in the pool has estimated cost at least KT, but because of the overestimating heuristic, an answer with estimate KT might actually turn into an answer with total cost T < KT (though it cannot turn into anything cheaper). So with this sort of heuristic, the answer returned by A* can be up to K times as expensive as the best answer, but no more.
This is actually summarized in the Wikipedia entry at http://en.wikipedia.org/wiki/A_search_algorithm#Bounded_relaxation - using such heuristics deliberately is one way to speed up A* at the cost of returning worse answers.
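As a hedged illustration of that bounded-relaxation idea, here is a small weighted-A* sketch: an admissible heuristic is deliberately inflated by a factor K, which can make the search return a suboptimal path, but never one more than K times the optimal cost. The toy graph, heuristic values, and choice of K are all made up for the example.

import heapq

def weighted_a_star(graph, h, start, goal, K=1.0):
    """A* with the heuristic inflated by K (K = 1 is ordinary A*).

    With an admissible h and K > 1, the estimate K*h may overestimate,
    so the returned path is only guaranteed to be within a factor K
    of optimal -- the bound discussed above.
    """
    open_heap = [(K * h[start], 0, start, [start])]
    best_g = {}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                      # already reached this node at least as cheaply
        best_g[node] = g
        for neighbour, cost in graph[node]:
            heapq.heappush(open_heap, (g + cost + K * h[neighbour],
                                       g + cost, neighbour, path + [neighbour]))
    return None, float("inf")

# Toy graph: the optimal route A->B->D costs 2, the direct edge A->D costs 3.
graph = {"A": [("B", 1), ("D", 3)], "B": [("D", 1)], "D": []}
h = {"A": 2, "B": 1, "D": 0}              # exact remaining costs, so admissible
print(weighted_a_star(graph, h, "A", "D", K=1.0))  # (['A', 'B', 'D'], 2): optimal
print(weighted_a_star(graph, h, "A", "D", K=4.0))  # (['A', 'D'], 3): suboptimal, but <= K * 2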

Understanding an Inconsistent Heuristic

Say I have a grid with some squares designated as "goal" squares. I am using A* in order to navigate this grid, trying to visit every goal square at least once using non-diagonal movement. Once a goal square has been visited, it is no longer considered a goal square. Think Pac Man, moving around and trying to eat all the dots.
I am looking for a consistent heuristic to give A* to aid in navigation. I decided to try a "return the Manhattan Distance to the nearest unvisited goal" heuristic for any given location. I have been told that this is not a consistent heuristic but I do not understand why.
Moving one square towards the closest goal square has a cost of one, and the Manhattan distance is also reduced by one. Landing on a goal square will either increase the value of the heuristic (because it will now seek the next nearest unvisited goal) or end the search (if that goal was the last unvisited one).
h(N) ≤ c(N, P) + h(P) seems to always hold. What is it that makes this heuristic inconsistent, or is my instructor mistaken?
If you are asking how to use A* to find the shortest path through all the goals, the answer is: you can't (with only one run). This is essentially the Travelling Salesman Problem, which is NP-hard. To solve it using A*, you'd need to try every permutation of goal orderings; each leg from a single start to a single goal could then be solved with A* (so you'd run the algorithm multiple times per permutation).
However, if you are asking how to use A* to find the shortest path from a single start to any one of a number of goals, your solution works fine, and your heuristic is indeed consistent. The minimum of multiple consistent heuristics is still a consistent heuristic, which is easy to prove.
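A minimal sketch of that heuristic, with the proof idea spelled out in the docstring; it assumes positions and goals are given as (x, y) tuples on a 4-connected grid.

def nearest_goal_manhattan(position, unvisited_goals):
    """Minimum Manhattan distance from `position` to any unvisited goal.

    Each individual "Manhattan distance to goal g_i" is consistent on a
    4-connected grid, and the minimum of consistent heuristics is itself
    consistent: h(N) <= h_i(N) <= c(N, P) + h_i(P) for every i, so taking
    the i that minimises h_i(P) gives h(N) <= c(N, P) + h(P).
    """
    x, y = position
    if not unvisited_goals:        # nothing left to visit
        return 0
    return min(abs(x - gx) + abs(y - gy) for gx, gy in unvisited_goals)

# From (0, 0) with unvisited goals at (2, 3) and (5, 1), the estimate is 5.
print(nearest_goal_manhattan((0, 0), [(2, 3), (5, 1)]))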

Why isn't my heuristic for the A* algorithm admissible?

I am going through the CS 188 course available to the public at edx.org. Right now I have to develop a heuristic for an A* search that eats all the pellets, as shown in the problem's example picture.
My heuristic, which I was sure would work (as both admissible and consistent), went like this:
initialize a heuristic accumulator called h to 0
initialize pos to be the current position of pacman
while pellets not eaten:
    get the nearest pellet from pos using an A* search (Manhattan distance as its heuristic)
    add that distance to h
    remove the pellet from pellets
    set pos to be the position of that pellet
I also cache previously computed distances, so the A* search for the nearest pellet isn't repeated if it has already been done while evaluating another state. It is able to solve the problem very quickly, and the outcome is optimal.
When I use this heuristic in the autograder, it fails the admissibility test.
Don't worry, I am not asking for a solution to the problem, only why my current heuristic is not admissible. When I go through the example in the picture in my head, the heuristic never overestimates the cost.
So if anyone is able to understand this and has any ideas, your input is greatly appreciated!
A heuristic for A* needs to provide a number that is no more than the best possible cost. Your heuristic is a plausible greedy solution that does not guarantee this. Suppose there is a single line of pellets and the Pac-Man is slightly off centre on this line. The cheapest solution is to work out which end of the line is nearer, eat all the pellets out to that end, and then move in the other direction to eat all the remaining pellets, without having to double back along the longer half of the line.
Your greedy heuristic moves first to whichever pellet is nearest the Pac-Man, which might not be on the side that leads to the shortest tour, so it may return a cost greater than the optimal cost - it returns the cost of one possible solution, which may not be the optimal one.
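As a concrete, made-up instance: put a pellet on every square of a corridor from x = 0 to x = 10 and start the Pac-Man at x = 4. The optimal tour clears the shorter left side first: 4 steps to x = 0, then 10 steps to x = 10, for a true cost of 14. The nearest pellet is 1 step away on either side, so nothing stops the greedy chain from heading right first, eating up to x = 10 (6 steps) and then walking all the way back to x = 0 (10 more steps), for an estimate of 16 > 14. The heuristic can therefore overestimate the optimal cost.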
Here is a way to set up a heuristic that is feasible for your problem. First, if your goal is to eat all pellets over a minimum path length, then your solution is too greedy to be a feasible heuristic for it. Here is a way to redesign your heuristic:
Goal: eat all pellets in minimum path length.
Heuristic estimate:
1. Use A* to calculate the shortest path from the current position to each pellet independently.
2. Cost function: (sum of the shortest-path distances to all unvisited pellets) * 2 + total distance from the start state.
The cost function is an upper bound.
Note: there may be more efficient ways to calculate the shortest paths to the uneaten pellets at each state; that would need some research.
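For comparison, here is a sketch of a different heuristic that is commonly used for this kind of pellet problem and is easy to see is admissible. It is not the cost function proposed above, and it assumes pellet positions are given as (x, y) tuples.

def farthest_pellet_heuristic(position, pellets):
    """Manhattan distance to the farthest uneaten pellet.

    Pac-Man must eventually reach every pellet, including the farthest
    one, so the true cost is at least this value: the heuristic never
    overestimates and is admissible (and consistent, being a max of
    consistent heuristics).
    """
    x, y = position
    if not pellets:                # all pellets eaten: goal state
        return 0
    return max(abs(x - px) + abs(y - py) for px, py in pellets)

# From (0, 0) with pellets at (2, 3) and (5, 1), the estimate is 6.
print(farthest_pellet_heuristic((0, 0), [(2, 3), (5, 1)]))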

A* Search Modification

The Wikipedia listing for A* search states:
In other words, the closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower f value than at any previous iteration.
However, in doing so, I have found that I receive erroneous results in an otherwise functional A* search implementation. Can someone shed some light on how one would make this modification?
Make sure your heuristic meets the following:
h(x) <= d(x,y) + h(y)
which is the consistency (monotonicity) condition; together with h(goal) = 0, it also means that your heuristic never overestimates the cost of getting from your current location to the goal.
For example, suppose you are on a grid and trying to get from A to B, both points on the grid. A good heuristic function is the Euclidean distance between the current location and the goal:
h(x) = sqrt[ (crtX -goalX)^2 + (crtY -goalY)^2 ]
This heuristic does not overestimate because of the triangle inequality.
More on triangle inequality: http://en.wikipedia.org/wiki/Triangle_inequality
More on Euclidean distance: http://mathworld.wolfram.com/Distance.html
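Here is a hedged sketch of the modification the quoted Wikipedia sentence describes: no closed set, and a neighbour is pushed onto the open set only if its new f value beats the best f seen for that node so far. Because h(n) is fixed per node, comparing f values per node is the same as comparing g values, so this only ever discards paths that are no better than one already generated; with a heuristic meeting the condition above, it still finds an optimal path when one exists. The graph format and the tiny example are made up for illustration.

import heapq

def a_star_no_closed_set(graph, h, start, goal):
    """A* without a closed set: a node is (re)added to the open set only
    if its new f value is lower than the best f seen for it so far.

    `graph` maps node -> list of (neighbour, edge_cost); `h` maps node ->
    heuristic estimate. A sketch of the described modification, not a
    drop-in replacement for any particular implementation.
    """
    best_f = {start: h[start]}
    open_heap = [(h[start], 0, start, [start])]
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for neighbour, cost in graph[node]:
            new_g = g + cost
            new_f = new_g + h[neighbour]
            # Accept the neighbour only if this is strictly the best f so far.
            if new_f < best_f.get(neighbour, float("inf")):
                best_f[neighbour] = new_f
                heapq.heappush(open_heap, (new_f, new_g, neighbour,
                                           path + [neighbour]))
    return None, float("inf")

# Tiny made-up example: the optimal route A->C->G costs 3.
graph = {"A": [("B", 1), ("C", 2)], "B": [("G", 3)], "C": [("G", 1)], "G": []}
h = {"A": 3, "B": 3, "C": 1, "G": 0}
print(a_star_no_closed_set(graph, h, "A", "G"))  # (['A', 'C', 'G'], 3)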

Does A* work with negative weights as long as the heuristic is admissible?

This seems true but I can't find anyone on the internet saying it is, so I'd like to make sure. Please tell me if you agree and if so, why. Ideally a link to a paper, or, if you disagree, a counterexample.
Let G be a directed graph with some negative edges. We want to run A* on G.
First off, if G has negative cycles reachable from the source and reaching the goal, there is no admissible heuristic, as it's impossible to underestimate the cost of reaching the goal because it's -∞.
If there are no such cycles however, there may be some admissible heuristic. In particular, the sum of all negative edges will always underestimate the cost of reaching the goal.
I'm under the impression that in this case, A* may work just fine.
P.S. I could run Bellman-Ford on the graph, detect negative cycles, and if there are none, reweight to get rid of negative edges. But, if I know there are no negative cycles, I can just skip that and run A*.
This is trivially wrong. The cost of a vertex is the sum of the heuristic and the path built so far... while the heuristic underestimates the cost to reach the goal, the sum of the heuristic and the path taken so far may not. Mea culpa.
It seems that sorting the open set by a function that underestimates the cost of reaching the goal through a given vertex may still work, though... if one uses the sum of the negative edges in the graph as such a function, it looks like the search degenerates into a plain graph traversal.
Consider this example with 3 nodes and 3 weighted edges, listed as (from, to, weight):
1 2 -10
1 3 -3
3 2 -8
From 1 to 2 there is a direct path with weight -10. So you get this first and establish it as the minimal path to 2. However, there is a path (1-3-2) with weight -11, which is smaller than the first one.
In the example given by the top-rated answer:
(2, -10) goes into the priority queue. Agreed.
So does (3, x) where x ≤ -11, since the heuristic is admissible: the true remaining cost from 3 is -8, so f(3) = -3 + h(3) ≤ -3 + (-8) = -11.
Now (3, x) gets popped first, as x < -10, and we get to the correct solution.
In response to the edit above ("This is trivially wrong. The cost of a vertex is the sum of the heuristic and the path built so far... while the heuristic underestimates the cost to reach the goal, the sum of the heuristic and the path taken so far may not."):
A* will never expand a node whose f-value (the sum of the heuristic and the cost of the path taken so far) is greater than the optimal path cost. This is because, along the optimal path, there is always at least one open node with f-value less than or equal to the optimal cost.
Thus, even with negative-weight edges, A* will find the optimal path if such a path exists, as long as only finitely many nodes have f-values less than the optimal cost.
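To sanity-check this on the 3-node example above, here is a small plain tree-search A* sketch (no closed set), using the exact remaining costs as the heuristic so that it is admissible; the node names and edge weights are the ones given earlier.

import heapq

def a_star(graph, h, start, goal):
    """Plain tree-search A*: pop the entry with the smallest f = g + h.
    Used here only to check the small negative-weight example below."""
    open_heap = [(h[start], 0, start, [start])]
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for neighbour, cost in graph[node]:
            heapq.heappush(open_heap, (g + cost + h[neighbour],
                                       g + cost, neighbour, path + [neighbour]))
    return None, float("inf")

# Edges from the example: 1->2 (-10), 1->3 (-3), 3->2 (-8); the goal is 2.
graph = {1: [(2, -10), (3, -3)], 3: [(2, -8)], 2: []}
h = {1: -11, 3: -8, 2: 0}      # exact remaining costs, hence admissible
print(a_star(graph, h, 1, 2))  # ([1, 3, 2], -11): the cheaper path is found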
