A* heuristic, overestimation/underestimation? - algorithm

I am confused about the terms overestimation and underestimation. I understand how the A* algorithm works, but I am unsure of the effects of having a heuristic that overestimates or underestimates.
Is overestimation when you take the square of the straight-line (bird's-eye) distance? And why would it make the algorithm incorrect? The same heuristic is used for all nodes.
Is underestimation when you take the square root of the straight-line (bird's-eye) distance? And why is the algorithm still correct?
I can't find an article which explains it nice and clear so I hope someone here has a good description.

You're overestimating when the heuristic's estimate is higher than the actual final path cost. You're underestimating when it's lower (you don't have to underestimate, you just have to not overestimate; correct estimates are fine). If your graph's edge costs are all 1, then the examples you give would provide overestimates and underestimates, though the plain coordinate distance also works peachy in a Cartesian space.
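As a rough illustration of the unit-cost case, here is a small sketch. It assumes a 4-connected grid with no obstacles, so the true remaining cost is simply the Manhattan distance; the function names are made up for this example.

```python
import math

def true_cost(a, b):
    # Assumption: 4-connected grid, unit edge costs, no obstacles,
    # so the real remaining cost is the Manhattan distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def straight_line(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def squared_line(a, b):
    return straight_line(a, b) ** 2

def root_line(a, b):
    return math.sqrt(straight_line(a, b))

start, goal = (0, 0), (3, 4)
for h in (root_line, straight_line, squared_line):
    estimate = h(start, goal)
    verdict = "overestimates" if estimate > true_cost(start, goal) else "does not overestimate"
    print(f"{h.__name__}: {estimate:.2f} vs true cost {true_cost(start, goal)} -> {verdict}")
```

For this pair of cells, only the squared distance exceeds the true cost; the plain distance and its square root stay at or below it.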
Overestimating doesn't exactly make the algorithm "incorrect"; what it means is that you no longer have an admissible heuristic, which is a condition for A* to be guaranteed to produce optimal behavior. With an inadmissible heuristic, the algorithm can wind up doing tons of superfluous work examining paths that it should be ignoring, and possibly finding suboptimal paths because of exploring those. Whether that actually occurs depends on your problem space. It happens because the path cost is 'out of joint' with the estimate cost, which essentially gives the algorithm messed up ideas about which paths are better than others.
I'm not sure whether you will have found it, but you may want to look at the Wikipedia A* article. I mention (and link) mainly because it's almost impossible to Google for it.

From the Wikipedia A* article, the relevant part of the algorithm description is:
The algorithm continues until a goal node has a lower f value than any node in the queue (or until the queue is empty).
The key idea is that, with underestimation, A* will only stop exploring a potential path to the goal once it knows that the total cost of the path will exceed the cost of a known path to the goal. Since the estimate of a path's cost is always less than or equal to the path's real cost, A* can discard a path as soon as the estimated cost exceeds the total cost of a known path.
With overestimation, A* has no idea when it can stop exploring a potential path as there can be paths with lower actual cost but higher estimated cost than the best currently known path to the goal.
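To make the stopping rule concrete, here is a minimal A* sketch that stops as soon as the goal is the lowest-f entry in the queue. The graph and both heuristic tables are hypothetical, chosen only so that the overestimating table sends the search down the more expensive route.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of the first goal popped from the queue, or None."""
    # Queue entries are (f, g, node, path) with f = g + h(node).
    queue = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while queue:
        f, g, node, path = heapq.heappop(queue)
        if node == goal:
            # The goal now has the lowest f in the queue, so the search stops.
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for neighbour, weight in graph[node]:
            new_g = g + weight
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    queue,
                    (new_g + h(neighbour), new_g, neighbour, path + [neighbour]),
                )
    return None

# Hypothetical graph: S-A-G costs 4, S-B-G costs 3 (the optimum).
graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 3)], "B": [("G", 1)], "G": []}
admissible = {"S": 3, "A": 3, "B": 1, "G": 0}.get    # never above the true remaining cost
overestimate = {"S": 3, "A": 0, "B": 5, "G": 0}.get  # B is really only 1 away from G

print(a_star(graph, admissible, "S", "G"))
print(a_star(graph, overestimate, "S", "G"))
```

With the admissible table the search returns the cost-3 path through B. With the overestimate on B, the goal reached through A (cost 4) is popped while B is still waiting in the queue with f = 7.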

Intuitive Answer
For A* to work correctly (always finding the 'best' solution, not just any), your estimation function needs to be optimistic.
Optimism here means that your expectations are always better than reality.
An optimist will try many things that might disappoint in the end, but they will find all the good opportunities.
A pessimist expects bad results, and so will not try many things. Because of this, they may miss some golden opportunities.
So for A*, being optimistic means always underestimating the costs (i.e. "it's probably not that far"). When you do that, once you have found a path, you might still feel excited about several unexplored options that could be even better.
That means you won't stop at the first solution, and still try those other ones. Most will probably disappoint (not be better), but it guarantees you will always find the best solution. Of course trying out more options takes more work (time).
A pessimistic A* will always overestimate cost (e.g. "that option is probably pretty bad"). Once it has found a solution and it knows the true cost of the path, every other path will seem worse (because estimates are always worse than reality), and it will never try any alternative once the goal is found.
The most effective A* is one that never overestimates, but estimates either perfectly or just slightly too optimistically. Then you won't be naive and try too many bad options.
A nice lesson for everyone. Always be slightly optimistic!

Short answer
@chaos's answer is a bit misleading in my opinion (the word "can" should be highlighted):
Overestimating doesn't exactly make the algorithm "incorrect"; what it means is that you no longer have an admissible heuristic, which is a condition for A* to be guaranteed to produce optimal behavior. With an inadmissible heuristic, the algorithm can wind up doing tons of superfluous work
As @AlbertoPL is hinting:
You can find an answer quicker by overestimating, but you may not find the shortest path.
In the end (beside the mathematical optimum), the optimal solution strongly depends on whether you consider computing resources, runtime, special types of "Maps"/State Spaces, etc.
Long answer
As an example, I could think of a real-time application where a robot reaches the target faster by using an overestimating heuristic, because the time gained by starting earlier is bigger than the time gained by taking the shortest path but waiting longer for that solution to be computed.
To give you a better impression, I share some exemplary results that I quickly created with Python. The results stem from the same A* algorithm; only the heuristic differs. Each node (grid cell) has edges to all eight neighbors except walls. Diagonal edges cost sqrt(2) ≈ 1.41.
The first picture shows the paths returned by the algorithm for a simple example case. You can see some suboptimal paths from overestimating heuristics (red and cyan). On the other hand, there are multiple optimal paths (blue, yellow, green), and it depends on the heuristic which one is found first.
The different images show all expanded nodes when the target is reached. The color shows the estimated path cost using this node (considering the "already taken" path from start to this node as well; mathematically it's the cost so far plus the heuristic for this node). At any time the algorithm expands the node with lowest estimated total cost (described before).
1. Zero (blue)
Corresponds to the Dijkstra algorithm
Nodes expanded: 2685
Path length: 89.669
2. As the crow flies (yellow)
Nodes expanded: 658
Path length: 89.669
3. Ideal (green)
Shortest path without obstacles (if you follow the eight directions)
Highest possible estimate without overestimating (hence "ideal")
Nodes expanded: 854
Path length: 89.669
4. Manhattan (red)
Shortest path without obstacles (if you don't move diagonally; in other words: cost of "moving diagonally" is estimated as 2)
Overestimates
Nodes expanded: 562
Path length: 92.840
5. As the crow flies times ten (cyan)
Overestimates
Nodes expanded: 188
Path length: 99.811
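The original code behind these results is not shown, so the following is only a sketch of how the five heuristics might be written for an 8-connected grid with diagonal cost sqrt(2). The function names and the example offset are assumptions; dx and dy are the absolute column and row distances to the target.

```python
import math

SQRT2 = math.sqrt(2)

def zero(dx, dy):
    return 0.0

def crow_flies(dx, dy):
    return math.hypot(dx, dy)

def ideal(dx, dy):
    # Obstacle-free cost on an 8-connected grid (often called octile distance):
    # min(dx, dy) diagonal steps plus the remaining straight steps.
    return max(dx, dy) + (SQRT2 - 1) * min(dx, dy)

def manhattan(dx, dy):
    return dx + dy

def crow_flies_times_ten(dx, dy):
    return 10 * math.hypot(dx, dy)

dx, dy = 30, 40  # hypothetical offset between a node and the target
for h in (zero, crow_flies, ideal, manhattan, crow_flies_times_ten):
    print(f"{h.__name__:>22}: {h(dx, dy):7.2f}")
```

The first three never exceed the obstacle-free cost, while the last two do whenever a diagonal move would help, which matches the "Overestimates" labels in the list.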

As far as I know, you want to typically underestimate so that you may still find the shortest path. You can find an answer quicker by overestimating, but you may not find the shortest path. Hence why overestimation is "incorrect", whereas underestimating can still provide the best solution.
I'm sorry that I cannot provide any insight as to the birdview lines...

Consider the evaluation function f(x) = g(x) + h(x), where g(x) is the real cost from the start node to the current node and h(x) is the predicted cost from the current node to the goal. Assume the optimal cost is R. Then:
h(x) makes a difference in the early stage of the search. Given three nodes A, B, C:
(*) => current pos: A
A -------> B - ... -> C
|_______________________| => the prediction range of h(x)
Once you step onto B, the cost from A to B is known exactly, and the prediction h(x) no longer includes it:
          (*) => current pos: B
A -------> B - ... -> C
           |____________| => the prediction range of h(x)
When we say under-estimate, it means that your h(x) keeps f(x) ≤ R for all x on the optimal path to the goal.
Over-estimation indeed makes the algorithm incorrect:
Assume R is 19, and that the two costs 20, 21 are the costs of paths that have already reached the goal:
Front                  Rear
--------------------------- => This is a priority queue PQ.
| 20 | 20 | 30 | ... | 99 |
  ^-------- => This is the "fake" optimal.
But say f(y) = g(y) + h(y), where y really is on the right path to the optimal cost R. Because h(y) is overestimated, f(y) currently sits at 30 in the PQ. Before we get the chance to update 30 to 19, the algorithm pops 20 from the PQ and wrongly assumes that it is the "optimal" solution.
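The queue situation above can be mimicked with Python's heapq. This is only a toy sketch: the entries and their descriptions are hypothetical stand-ins for the nodes in the example.

```python
import heapq

# Hypothetical queue contents: (f value, description).
queue = []
heapq.heappush(queue, (20, "goal, reached by a path of cost 20"))
heapq.heappush(queue, (30, "y, on the optimal path, but h(y) is overestimated"))
heapq.heappush(queue, (99, "some other node"))

f, what = heapq.heappop(queue)
print(f, what)
```

The goal entry comes out first, so the search stops before y is ever expanded and the path of cost 19 is never discovered.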

Related

Why do you need to pick a heuristic close to the actual path costs?

I'm a computing science student and I currently have a subject about artificial intelligence. This subject covers various best-first search pathfinding algorithms, such as A* search.
My study material claims that it is usual to pick a heuristic following the next rule:
0 ≤ h(n) ≤ h*(n)
I understand this bit, but after that the material claims your heuristic should be optimistic or close to f(n). According to the material this should result in fewer nodes being expanded.
Why should you pick a heuristic as close as possible to f(n), and why does this result in (a lot) fewer nodes being expanded?
Thanks in advance for replying, you'll be helping me out so much for my next exam!
Example:
Find the shortest way to a specific field on a chessboard-like grid with obstacles.
Let's say you can only go left, right, up, or down.
A heuristic gives you a guess of how many steps to the goal you will need from each of the four possible fields you can go to at every iteration.
Now your heuristic can be:
always optimal: if that's the case, you will always go to the correct next field and find your best path right away.
always lower or optimal: here it might happen that you go to the wrong field at some point, but when you reach your goal (or a few fields later) your algorithm will see that the path you found (or the actual heuristic) is greater than the heuristic of the field you should have gone to before.
In other words: your heuristic always gives you a number lower than or equal to the actual steps you have to make. Hence, if you find a path shorter than or equal to all heuristics of fields you didn't visit, you can be sure your path is optimal.
sometimes higher: if your heuristic sometimes gives you more steps than you would actually need, you can never be sure that the path you have found is an optimal path!
So the worst thing to happen is an overestimation of the way by your heuristic, because you might not find the optimal path. Therefore you have the condition 0 ≤ h(n) ≤ h*(n).
And the closer you are to your optimal heuristic, the fewer "wrong fields" you visit in your search. And the fewer wrong fields you visit, the faster you are.
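To see the effect on the number of visited fields, here is a small sketch for the four-direction grid described above. The grid size, wall position, and function names are all made up for illustration; it compares h = 0 with the Manhattan distance, which never overestimates on such a grid.

```python
import heapq

def search(size, start, goal, walls, h):
    """A* on a 4-connected grid with unit steps; returns (path cost, fields expanded)."""
    queue = [(h(start, goal), 0, start)]
    best_g = {start: 0}
    closed = set()
    while queue:
        _, g, node = heapq.heappop(queue)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:
            return g, len(closed)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in walls and g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(queue, (g + 1 + h(nxt, goal), g + 1, nxt))
    return None

def zero(a, b):
    return 0

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

walls = {(1, 1)}  # a single hypothetical obstacle
for h in (zero, manhattan):
    cost, expanded = search(5, (0, 0), (2, 2), walls, h)
    print(f"{h.__name__}: cost {cost}, fields expanded {expanded}")
```

Both runs find a path of the same length, but the run guided by the Manhattan distance looks at fewer fields because it never leaves the rectangle between start and goal.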

What are the problems associated with Best First Search in Artificial Intelligence?

I know the general issues include local maxima and plateaus; however, I am curious whether there are any more issues associated with this specific search, and what my best course of action would be in order to overcome them.
Can someone also give me an example of which sort of problem this search would be good to use for?
Problems with best first search:
It is greedy. In many cases it leads to a very quick solution (because the number of developed nodes does not increase exponentially; it increases linearly with the depth of the solution!). However, it is usually not optimal, because your heuristic function has some error and sometimes gives the wrong answer about which node to explore next.
There is also an issue with an infinite branch. Assume you are following a branch where the node at depth i has a heuristic value of h(v_i) = 2^-i. You will never get to zero, but greedy best first will keep developing these nodes.
Example:
        2
       / \
      /   \
     /     \
    1      1.5
    |       |
   1/2      1
    |       |
   1/4      0
    |
   1/8
    |
   1/16
    |
   ...
Note that the above is an admissible heuristic function, but nevertheless best first search will never find the solution; it will get stuck in the infinite branch.
Solutions:
To overcome it, we can use an uninformed algorithm (such as Dijkstra's algorithm, or BFS for unweighted graphs).
You can use a combination of "best first search" and Dijkstra, which is known as the A* algorithm.
The A* algorithm is actually a greedy best first algorithm, but instead of choosing according to h(v), you choose which node to explore next with f(v) = h(v) + g(v) (where g(v) is the "so far cost"). The algorithm is complete (finds a solution if one exists) and optimal (finds the "best" solution) if it is given an admissible heuristic function.
When to use Best First Search anyway:
If you have a perfect heuristic (denoted as h* in the literature), best first search will find an optimal solution - and fast.
If you don't care about optimal solution, you just want to find one solution fast - it usually does the trick (but you will have to be careful for the infinite branch problem).
When we use A*, we actually use best first search - but on f:V->R instead of on h:V->R.
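Here is a small sketch of that last point, using a graph shaped like the tree in the example. It assumes every edge costs 1 (the example does not give edge costs), and the node encoding and the expansion budget are made up. The same loop runs twice, once ordered by h and once by g + h.

```python
import heapq
from itertools import count

GOAL = ("R", 2)

def successors(node):
    # Hypothetical implicit graph mirroring the tree above; every edge costs 1.
    if node == "root":
        return [("L", 0), ("R", 0)]
    side, depth = node
    if side == "L":
        return [("L", depth + 1)]  # the left branch never ends
    return [("R", depth + 1)] if depth < 2 else []

def h(node):
    if node == "root":
        return 2.0
    side, depth = node
    if side == "L":
        return 2.0 ** -depth  # 1, 1/2, 1/4, ...
    return (1.5, 1.0, 0.0)[depth]

def best_first(priority, max_expansions=1000):
    """Expand the node with the lowest priority(g, node); give up after max_expansions."""
    tie = count()  # unique tie-breaker so nodes themselves are never compared
    queue = [(priority(0, "root"), next(tie), 0, "root")]
    for _ in range(max_expansions):
        if not queue:
            return None
        _, _, g, node = heapq.heappop(queue)
        if node == GOAL:
            return g  # cost of the path that reached the goal
        for child in successors(node):
            heapq.heappush(queue, (priority(g + 1, child), next(tie), g + 1, child))
    return None  # budget exhausted without reaching the goal

print(best_first(lambda g, node: h(node)))      # greedy best first: orders by h only
print(best_first(lambda g, node: g + h(node)))  # A*: orders by f = g + h
```

Ordered by h alone, the search keeps following the left branch until the budget runs out. Ordered by g + h, the growing g eventually makes the left branch look worse than the right one, and the goal is reached.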

Finding fastest path at a cost, less or equal to a specified

Here's visualisation of my problem.
I've been trying to use Dijkstra on that; however, it hasn't worked.
The complication, as I see it, is that Dijkstra's algorithm throws away information that you need to keep around: if you are trying to get from A to E in
    B
   / \
  A   D - E
   \ /
    C
And ABD is shorter than ACD, Dijkstra's will forget that ACD was ever a possibility (it uses ACD as the canonical route from A to D). But if ABD has a higher cost than ACD, and ABDE is above the quota while ACDE is below, the now eliminated ACD was correct. The problem is that Dijkstra's algorithm assumes that if one path is at least as long as another, it is weakly dominated: there is no reason to prefer it. And in one dimension of comparison, paths are weakly ordered: given any two paths, one weakly dominates the other.
But here we have two dimensions of comparison, and so that ordering does not hold: one path can be shorter, the other cheaper. Since we can only discard dominated paths, we must keep all paths that do not already exceed the budget and are not dominated. I have put a bit of work into implementing this approach; it looks doable, but I cannot find an argument for a worst-case bound below exponential complexity (although normal performance should be much better, since in a sane graph most paths are dominated).
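A minimal sketch of that idea follows; it is not the implementation mentioned above. Each queue entry is a (length, cost) label, and a label is dropped only when another label already expanded at the same node is at least as short and at least as cheap. The graph, its edge values, and the budgets are hypothetical.

```python
import heapq

def shortest_within_budget(graph, start, goal, budget):
    """Return (length, cost, path) of the shortest path whose cost fits the budget."""
    # Each queue entry is one label: (length, cost, node, path).
    queue = [(0, 0, start, [start])]
    labels = {}  # node -> (length, cost) labels already expanded there
    while queue:
        length, cost, node, path = heapq.heappop(queue)
        if node == goal:
            return length, cost, path
        # Skip a label dominated by one already expanded at this node.
        if any(l <= length and c <= cost for l, c in labels.get(node, [])):
            continue
        labels.setdefault(node, []).append((length, cost))
        for nxt, edge_length, edge_cost in graph[node]:
            if cost + edge_cost <= budget:
                heapq.heappush(
                    queue,
                    (length + edge_length, cost + edge_cost, nxt, path + [nxt]),
                )
    return None

# Hypothetical edges as (neighbour, length, cost): ABD is shorter, ACD is cheaper.
graph = {
    "A": [("B", 1, 5), ("C", 2, 1)],
    "B": [("D", 1, 5)],
    "C": [("D", 2, 1)],
    "D": [("E", 1, 1)],
    "E": [],
}
print(shortest_within_budget(graph, "A", "E", budget=6))
print(shortest_within_budget(graph, "A", "E", budget=20))
```

With the tight budget the longer but cheaper route through C is returned; with the generous budget the shorter route through B wins. The number of labels kept per node can still grow large, which is where the exponential worst case comes from.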
You can also, as Billiska notes, use k-th shortest routes algorithms and then proceed through their results until you find one below the budget. That uses time O(m+ K*n*log(m/n)); but unless someone sees an upper bound on K such that K is guaranteed to include a path under the budget (if one exists), we need to set K to be the total number of paths, again yielding exponential complexity (although again a strategy of incrementally increasing K would likely yield a reasonable average runtime, at least if length and cost are reasonably correlated).
EDIT:
Complicating (perhaps fatally) the implementation of my proposed modification is that Dijkstra's algorithm relies on an ordering of the accessibility of nodes, such that we know that if we take the unexplored node to which we have the shortest path, we will never find a better route to it (since all other routes are already known to be longer). If that shortest route is also expensive, that need not hold; even after exploring a node, we must be prepared to update paths out of it on the basis of longer but cheaper routes into it. I suspect that this will prevent it from reaching polynomial time in the worst case.
Basically you need to find the first shortest-path, check if it works, then find the second shortest-path, check if it works, and so on...
Dijkstra's algorithm isn't designed to work with such task.
And just a Google search on this new definition of the problem,
I arrive at Stack Overflow question on finding kth-shortest-paths.
I haven't read into it yet, so don't ask me.
I hope this helps.
I think you can do it with Dijkstra, but you have to change the way you calculate the tentative distance in each step. Instead of taking only the distance into account, also consider the cost. The new distance should be a 2-D value (dist, cost); when you choose the minimal distance, take the one with minimal dist AND cost <= 6. That's it.
I hope this is correct.

Hill climbing and single-pair shortest path algorithms

I have a bit of a strange question. Can anyone tell me where to find information about, or give me a little bit of an introduction to using shortest path algorithms that use a hill climbing approach? I understand the basics of both, but I can't put the two together. Wikipedia has an interesting part about solving the Travelling Sales Person with hill climbing, but doesn't provide a more in-depth explanation of how to go about it exactly.
For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much better route is obtained.
As far as I understand it, you should pick any path and then iterate through it and make optimisations along the way. For instance going back and picking a different link from the starting node and checking whether that gives a shorter path.
I am sorry - I did not make myself very clear. I understand how to apply the idea to Travelling Salesperson. I would like to use it on a shortest distance algorithm.
You could just randomly exchange two cities.
Your first path is: A B C D E F A with length 200
Now you change it by swapping C and D: A B D C E F A with length 350 - Worse!
Next step: A B C D F E A: length 150 - You improved your solution. ;-)
Hill climbing algorithms are really easy to implement but have several problems with local maxima! [A better approach based on the same idea is simulated annealing.]
Hill climbing is a very simple kind of evolutionary optimization; genetic algorithms are a much more sophisticated class of algorithms.
Another good metaheuristic for solving the TSP is ant colony optimization
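The swap-and-compare loop described above could look roughly like this. The city coordinates are invented (the lengths 200, 350, and 150 in the example are not reproduced), and the step count and seed are arbitrary.

```python
import math
import random

def tour_length(tour, coords):
    # Closed tour: the last city connects back to the first.
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def hill_climb(tour, coords, steps=2000, seed=0):
    rng = random.Random(seed)
    best = list(tour)
    best_length = tour_length(best, coords)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]  # swap two cities
        length = tour_length(candidate, coords)
        if length < best_length:  # keep only improvements
            best, best_length = candidate, length
    return best, best_length

# Hypothetical coordinates for the six cities A..F.
coords = {"A": (0, 0), "B": (5, 1), "C": (1, 4), "D": (6, 5), "E": (2, 8), "F": (7, 9)}
start = list("ABCDEF")
print(tour_length(start, coords))
print(hill_climb(start, coords))
```

Because only improving swaps are accepted, the result can never be worse than the starting tour, but it may well be a local optimum rather than the best tour.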
Examples would be genetic algorithms or expectation maximization in data clustering. By iterating single steps, one tries to reach a better solution with every step. The problem is that this only finds a local maximum/minimum; it is never guaranteed to find the global maximum/minimum.
A solution for the travelling salesman problem as a genetic algorithm for which we need:
Representation of the solution as order of visited cities, e.g. [New York, Chicago, Denver, Salt Lake City, San Francisco]
Fitness function as the travelled distance
Selection of the best results is done by selecting items randomly depending on their fitness, the higher the fitness, the higher the probability that the solution is chosen to survive
Mutation would be switching two cities in a list, e.g. [A,B,C,D] becomes [A,C,B,D]
Crossing of two possible solutions [B,A,C,D] and [A,B,D,C] results in [B,A,D,C], i.e. cutting both lists in the middle and using the beginning of one parent and the end of the other parent to form the child
The algorithm then:
initialization of the starting set of solutions
calculation of the fitness of every solution
until desired maximum fitness or until no changes happen any more
selection of the best solutions
crossing and mutation
fitness calculation of every solution
It is possible that the result differs with every execution of the algorithm; therefore it should be executed more than once.
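A compact sketch of those steps is below. Two things are my own assumptions rather than part of the description: the crossover fills the second half with the missing cities in the other parent's order (a plain cut-and-join can repeat a city), and the best tour of each generation is carried over unchanged. Fitness is taken as the inverse of the travelled distance so that shorter tours are more likely to be selected. The coordinates and parameters are invented.

```python
import math
import random

def tour_length(tour, coords):
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def mutate(tour, rng):
    # Swap two cities, e.g. [A,B,C,D] -> [A,C,B,D].
    child = list(tour)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(first, second):
    # First half of one parent, then the missing cities in the other parent's
    # order (an assumption that keeps every child a valid tour).
    head = first[: len(first) // 2]
    return head + [city for city in second if city not in head]

def evolve(coords, population_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    cities = list(coords)
    population = [rng.sample(cities, len(cities)) for _ in range(population_size)]
    for _ in range(generations):
        # Shorter tours are fitter, so weight selection by inverse length.
        weights = [1.0 / tour_length(t, coords) for t in population]
        elite = min(population, key=lambda t: tour_length(t, coords))
        children = [elite]  # keep the best tour so far (an added safeguard)
        while len(children) < population_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            children.append(mutate(crossover(mother, father), rng))
        population = children
    return min(population, key=lambda t: tour_length(t, coords))

# Hypothetical city coordinates.
coords = {"A": (0, 0), "B": (5, 1), "C": (1, 4), "D": (6, 5), "E": (2, 8), "F": (7, 9)}
best = evolve(coords)
print(best, tour_length(best, coords))
```

Changing the seed changes the result, which is why running it more than once is worthwhile.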
I'm not sure why you would want to use a hill-climbing algorithm, since Dijkstra's algorithm has polynomial complexity, O(|E| + |V| log |V|) using Fibonacci heaps:
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
If you're looking for a heuristic approach to the single-pair problem, then you can use A*:
http://en.wikipedia.org/wiki/A*_search_algorithm
but an improvement in efficiency is dependent on having an admissible heuristic estimate of the distance to the goal.
To hillclimb the TSP you should have a starting route. Of course picking a "smart" route wouldn't hurt.
From that starting route you make one change and compare the result. If it's higher you keep the new one, if it's lower keep the old one. Repeat this until you reach a point from where you can't climb anymore, which becomes your best result.
Obviously, with TSP, you will more than likely hit a local maximum. But it is possible to get decent results.

Matching algorithm

Odd question here, not really code but logic; I hope it's OK to post it here. Here it is:
I have a data structure that can be thought of as a graph.
Each node can support many links, up to a per-node limit.
All links are bidirectional, and each link has a cost. The cost depends on the Euclidean distance between the nodes, the minimum value of a parameter across the two nodes, and a global modifier.
I wish to find the maximum cost for the graph.
I'm wondering if there is a clever way to find such a matching, rather than going through it by brute force, which is ugly, and I'm not sure how I'd even do that without spending 7 million years running it.
To clarify:
Global variable = T
many nodes N each have E,X,Y,L
L is the max number of links each node can have.
cost of link A,B = Sqrt( min([a].e | [b].e) ) x ( 1 + Sqrt( sqrt( sqr([a].x-[b].x) + sqr([a].y-[b].y) ) )/75 + Sqrt(t)/10 )
total cost = sum of all links, and we wish to maximize this.
The average value for nodes is 40-50; it can range over 20..600.
The average node linking factor is 3, with a range of 0-10.
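For readers who find the formula easier to follow as code, here is one possible reading of it. It assumes that min([a].e | [b].e) means the smaller of the two E values and that t is the global variable T; the Node class and the sample values are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    e: float
    x: float
    y: float
    l: int  # maximum number of links; not used by the cost itself

def link_cost(a, b, t):
    # Euclidean distance between the two nodes.
    distance = math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2)
    return math.sqrt(min(a.e, b.e)) * (1 + math.sqrt(distance) / 75 + math.sqrt(t) / 10)

a = Node(e=16, x=0, y=0, l=3)
b = Node(e=25, x=5625, y=0, l=3)
print(link_cost(a, b, t=100))
```

With these sample values the three terms in the bracket are each 1, so the cost is sqrt(16) times 3.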
For the sake of completeness for anybody else who looks at this article, I would suggest revisiting your graph theory algorithms:
Dijkstra
A*
Greedy
Depth / Breadth First
Even dynamic programming (in some situations)
etc.
In there somewhere is the correct solution for your problem. I would suggest looking at Dijkstra first.
I hope this helps someone.
If I understand the problem correctly, there is likely no polynomial solution. Therefore I would implement the following algorithm:
Find some solution by being greedy. To do that, you sort all edges by cost and then go through them starting with the highest, adding an edge to your graph while possible, and skipping it when a node can't accept more edges.
Look at your edges and try to change them to achieve a higher cost by using a heuristic. The first that comes to my mind: you cycle through all 4-tuples of nodes (A,B,C,D), and if your current graph has edges AB, CD but AC, BD would be better, then you make the change.
Optionally do the same thing with 6-tuples, or use other genetic algorithms (they are called that way because they work by mutations).
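The greedy first step could be sketched like this. Only the edge selection is shown, not the 4-tuple improvement pass; the edge costs and link limits are hypothetical.

```python
def greedy_links(costs, capacity):
    """costs: {(u, v): cost}; capacity: {node: max links}. Returns the chosen edges."""
    remaining = dict(capacity)
    chosen = []
    # Go through the edges from the most to the least expensive.
    for (u, v), cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        if remaining[u] > 0 and remaining[v] > 0:
            chosen.append((u, v))
            remaining[u] -= 1
            remaining[v] -= 1
    return chosen

# Hypothetical edge costs and per-node link limits.
costs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 7, ("C", "D"): 3, ("A", "D"): 2}
capacity = {"A": 1, "B": 2, "C": 2, "D": 1}
chosen = greedy_links(costs, capacity)
print(chosen, sum(costs[edge] for edge in chosen))
```

Here A can take only one link, so after AB is chosen the second most expensive edge AC has to be skipped. That kind of situation is what the improvement pass is meant to revisit.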
This is equivalent to the traveling salesman problem (and is therefore NP-Complete) since if you could solve this problem efficiently, you could solve TSP simply by replacing each cost with its reciprocal.
This means you can't solve exactly. On the other hand, it means that you can do exactly as I said (replace each cost with its reciprocal) and then use any of the known TSP approximation methods on this problem.
Seems like a max flow problem to me.
Is it possible to solve this by greedily selecting the next most expensive option from any given start point (omitting jumps to visited nodes) and stopping once all nodes are visited? If you get to a dead end, backtrack to the previous spot where you are not at a dead end and greedily select again. It would require some work and probably something like a stack to keep your paths in. I think this would work quite effectively provided the costs are well ordered and non-negative.
Use Genetic Algorithms. They are designed to solve the problem you state rapidly reducing time complexity. Check for AI library in your language of choice.

Resources