How to calculate the heuristic value in the A* algorithm? - algorithm

I am doing a project to code the A* algorithm for the shortest path problem. In order to determine the shortest path using the A* algorithm, I understand that we first have to get the heuristic values. Does anyone know how to calculate and determine the heuristic value for each node? [I made up the map on my own, so no heuristic values are given.]

A* and heuristic
A* always requires a heuristic; it is defined in terms of heuristic estimates of the remaining distance. In principle, A* is just the ordinary Dijkstra algorithm that additionally uses heuristic guesses for the distance to the target.
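To make the "Dijkstra plus heuristic" view concrete, here is a minimal Python sketch (not from the original answer). It assumes the graph is a dict mapping each node to a list of (neighbour, edge_cost) pairs and that h is consistent; the names `a_star` and `Graph` are illustrative. Passing `h = lambda u: 0` turns it into plain Dijkstra.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Assumed representation: node -> list of (neighbour, non-negative edge cost)
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def a_star(graph: Graph, source: Hashable, target: Hashable,
           h: Callable[[Hashable], float]) -> Tuple[float, Optional[List[Hashable]]]:
    """Return (cost, path) from source to target, or (inf, None) if unreachable.

    Nodes are expanded in order of f(u) = g(u) + h(u), where g(u) is the best
    known distance from the source. With h(u) = 0 this is ordinary Dijkstra.
    Assumes h is monotone (consistent), so a node is final once it is expanded.
    """
    g = {source: 0.0}
    parent: Dict[Hashable, Optional[Hashable]] = {source: None}
    closed = set()
    tie = count()  # tie-breaker so nodes never have to be compared directly
    queue = [(h(source), next(tie), source)]
    while queue:
        _, _, u = heapq.heappop(queue)
        if u in closed:
            continue  # stale queue entry
        if u == target:
            path = []
            node: Optional[Hashable] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[target], path[::-1]
        closed.add(u)
        for v, w in graph.get(u, []):
            new_g = g[u] + w
            if new_g < g.get(v, float("inf")):
                g[v] = new_g
                parent[v] = u
                heapq.heappush(queue, (new_g + h(v), next(tie), v))
    return float("inf"), None
```

With a good heuristic the result is the same as with h = 0; only the number of expanded nodes shrinks.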
The heuristic function should run fast, in O(1) at query time; otherwise you won't gain much from it. As a heuristic you can select any function h for which (with t the target node):
h is admissible: h(u) <= dist(u, t) for every node u (never overestimate)
h is monotone (consistent): h(u) <= cost(u, v) + h(v) for every edge (u, v) (triangle inequality)
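If you build your own map (as in the question), it can be worth verifying a candidate heuristic by brute force on a small test graph. The Python sketch below (same assumed dict-of-adjacency-lists representation, hypothetical function names) computes the true dist(u, t) with one Dijkstra on the reversed graph and then checks both conditions. It is a testing aid, not something to run at query time. Recall that a monotone h with h(t) = 0 is automatically admissible.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def distances_to(graph: Graph, target: Hashable) -> Dict[Hashable, float]:
    """dist(u, target) for every u that can reach target (Dijkstra on reversed edges)."""
    reverse: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    dist = {target: 0.0}
    tie = count()
    queue = [(0.0, next(tie), target)]
    while queue:
        d, _, v = heapq.heappop(queue)
        if d > dist[v]:
            continue  # stale entry
        for u, w in reverse.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(queue, (d + w, next(tie), u))
    return dist


def check_heuristic(graph: Graph, target: Hashable,
                    h: Callable[[Hashable], float], eps: float = 1e-9) -> Tuple[bool, bool]:
    """Return (admissible, monotone) for heuristic h towards target.

    eps is a small tolerance for floating-point rounding in the comparisons.
    """
    dist = distances_to(graph, target)
    # Admissible: h(u) <= dist(u, t); nodes that cannot reach t impose no constraint.
    admissible = all(h(u) <= d + eps for u, d in dist.items())
    # Monotone: h(u) <= cost(u, v) + h(v) for every edge (u, v).
    monotone = all(h(u) <= w + h(v) + eps
                   for u, edges in graph.items() for v, w in edges)
    return admissible, monotone
```

A heuristic can be admissible without being monotone; the second test case below is such an example.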
There are however some heuristics that are frequently used in practice like:
Straight-line distance (as-the-crow-flies)
Landmark heuristic (pre-compute distances for all nodes to a set of selected nodes (landmarks))
Dependent on your application you might also find other heuristic functions useful.
Straight-line heuristic
The straight-line distance (or as-the-crow-flies distance) is straightforward and easy to compute, provided that for any two nodes u, v you know the exact location, i.e. longitude and latitude.
You then define h as the Euclidean distance or, if you want more precise results, you take into account that the Earth is (approximately) a sphere and use the great-circle distance. Both methods run in O(1).
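A minimal Python sketch of both variants follows, assuming each node's position is stored in a hypothetical `coords` dict. The great-circle distance uses the haversine formula with a mean Earth radius of 6371 km. Plain Euclidean distance only makes sense on projected (x, y) coordinates in the same unit as the edge costs; applied to raw latitude/longitude it returns degrees, not metres. Either way, h is admissible as long as no path is shorter than the straight line between its endpoints, which holds when every edge cost is at least the straight-line length of that edge.

```python
import math
from typing import Callable, Dict, Hashable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; the sphere is an approximation


def euclidean(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Straight-line distance between two points in a planar (projected) coordinate system."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def great_circle(p: Tuple[float, float], q: Tuple[float, float],
                 radius: float = EARTH_RADIUS_M) -> float:
    """Haversine distance in metres between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))  # clamp guards against rounding


def straight_line_heuristic(coords: Dict[Hashable, Tuple[float, float]], target: Hashable,
                            metric: Callable = great_circle) -> Callable[[Hashable], float]:
    """h(u) = straight-line distance from u to the target; O(1) per call."""
    t = coords[target]
    return lambda u: metric(coords[u], t)
```

For example, one degree of longitude along the equator comes out at roughly 111.2 km.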
Landmark heuristic
Here you pre-select some important nodes in your graph. Ideally you always choose a node that is part of frequently used shortest-paths.
However, that knowledge is often not available, so you can instead select nodes that are farthest from the already selected landmarks. You can do so by greedy farthest selection: pick the node u which maximizes min_l dist(l, u), where l ranges over the already selected landmarks. For this you can run a Dijkstra from a set of nodes, which is very easy to implement: just add multiple nodes at once (all current landmarks) to your Dijkstra starting queue. Then run the Dijkstra until all distances have been computed and pick the node with the greatest shortest-path distance as the next landmark. This way your landmarks are spread evenly over the whole graph.
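The greedy farthest selection could look like the following Python sketch. It assumes an undirected graph stored with both edge directions (for directed graphs, reachability from the landmarks matters), and the starting node `first` is an arbitrary choice, e.g. a random node; the function names are hypothetical. Each round runs one multi-source Dijkstra seeded with all current landmarks at distance 0.

```python
import heapq
from itertools import count
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def multi_source_dijkstra(graph: Graph, sources: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Distance from the nearest source to every reachable node (all sources start at 0)."""
    tie = count()
    dist: Dict[Hashable, float] = {}
    queue = []
    for s in sources:
        dist[s] = 0.0
        queue.append((0.0, next(tie), s))
    heapq.heapify(queue)
    while queue:
        d, _, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, next(tie), v))
    return dist


def select_landmarks_farthest(graph: Graph, k: int, first: Hashable) -> List[Hashable]:
    """Greedy farthest selection: add the node maximizing min_l dist(l, u), k landmarks in total."""
    landmarks = [first]
    while len(landmarks) < k:
        dist = multi_source_dijkstra(graph, landmarks)
        candidates = [(d, u) for u, d in dist.items() if u not in landmarks]
        if not candidates:
            break  # every reachable node is already a landmark
        landmarks.append(max(candidates, key=lambda c: c[0])[1])
    return landmarks
```

On a path 0-1-2-3-4 with unit weights, starting at 0, this picks 4 (the far end) and then 2 (the middle).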
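After selecting the landmarks, you pre-compute the distances from all landmarks to all other nodes and vice versa (from all nodes to all landmarks) and store them. For the first, just run a Dijkstra starting at each landmark until all distances have been computed; for the second, run the same Dijkstra on the reversed graph (for undirected graphs one run per landmark suffices).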
The heuristic h for any node u, where v is the target node, is then defined as follows:
h(u) = max_l(max(dist(u, l) - dist(v, l), dist(l, v) - dist(l, u)))
or, for undirected graphs, simply
h(u) = max_l|dist(l, u) - dist(l, v)|
where max_l denotes the maximum over all landmarks l. Both terms are lower bounds on dist(u, v) by the triangle inequality, e.g. dist(u, l) <= dist(u, v) + dist(v, l).
After pre-computing these distances, evaluating h takes O(k) time for k landmarks, i.e. constant time for a fixed, small number of landmarks. The pre-computation might take a minute or more, but that should be no problem since you only need to do it once and never again at query time.
Note that you can also select the landmarks randomly which is faster but the results may vary in quality.
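Putting the pre-computation and the directed-graph formula together, a Python sketch might look like this. It assumes the same adjacency-list representation, computes dist(l, u) with a forward Dijkstra and dist(u, l) with a Dijkstra on the reversed graph, and skips terms whose distances are infinite (unreachable), which keeps h a lower bound. It also clamps h at 0, which is harmless because distances are non-negative. All names are illustrative.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]
Table = Dict[Hashable, Dict[Hashable, float]]  # landmark -> {node: distance}
INF = float("inf")


def dijkstra(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Shortest distances from source to every reachable node."""
    dist = {source: 0.0}
    tie = count()
    queue = [(0.0, next(tie), source)]
    while queue:
        d, _, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, next(tie), v))
    return dist


def precompute_landmarks(graph: Graph, landmarks: Iterable[Hashable]) -> Tuple[Table, Table]:
    """Return (from_l, to_l) with from_l[l][u] = dist(l, u) and to_l[l][u] = dist(u, l)."""
    reverse: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    landmarks = list(landmarks)
    from_l = {l: dijkstra(graph, l) for l in landmarks}
    to_l = {l: dijkstra(reverse, l) for l in landmarks}  # Dijkstra on reversed edges
    return from_l, to_l


def landmark_heuristic(from_l: Table, to_l: Table, target: Hashable) -> Callable[[Hashable], float]:
    """h(u) = max over l of max(dist(u,l) - dist(v,l), dist(l,v) - dist(l,u)), v = target."""
    def h(u: Hashable) -> float:
        best = 0.0  # distances are non-negative, so 0 is always a valid lower bound
        for l in from_l:
            d_ul, d_vl = to_l[l].get(u, INF), to_l[l].get(target, INF)
            d_lu, d_lv = from_l[l].get(u, INF), from_l[l].get(target, INF)
            if d_ul < INF and d_vl < INF:
                best = max(best, d_ul - d_vl)
            if d_lu < INF and d_lv < INF:
                best = max(best, d_lv - d_lu)
        return best
    return h
```

The returned h can be passed to any A* implementation that accepts a node-to-estimate function.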
Comparison
Some time ago I created an image which compares some shortest-path algorithms I've implemented (PathWeaver at GitHub); the image itself is not reproduced here.
It shows a query from top left to bottom right (inside the city). Marked are all nodes that were visited by the respective algorithm: the fewer marks, the faster the algorithm found the shortest path.
The compared algorithms are
Ordinary Dijkstra (baseline, visits all nodes within that distance)
A* with straight-line heuristic (not a good estimate for road networks)
A* with landmarks (randomly computed) (good)
A* with landmarks (greedy farthest selected) (good)
Arc-Flags (okay)
Note that Arc-Flags is a different algorithm. It requires a region, like a rectangle around a city. It then selects all boundary nodes (nodes inside the rectangle that are directly adjacent to nodes outside it). From those boundary nodes it performs a reversed Dijkstra (reverse all edges and then run Dijkstra). This way you efficiently pre-compute the shortest paths from all nodes to the boundary. Edges which are part of such a shortest path are then marked (the arcs are flagged). At query time, when the target lies inside the region, you run an ordinary Dijkstra but only consider marked edges, so you follow shortest paths to the boundary.
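For a single region, the pre-computation and query could be sketched in Python as below. This is a simplified illustration under several assumptions: a directed adjacency-list graph, the region given as a set of node ids, boundary nodes taken to be region nodes with an incoming edge from outside, one reversed Dijkstra per boundary node, and all edges inside the region always flagged. An edge (u, v) is flagged when dist(u, b) = cost(u, v) + dist(v, b) for some boundary node b, i.e. it lies on a shortest path to that boundary node. The flagged search is only valid when the target lies inside the region; names are hypothetical.

```python
import heapq
import math
from itertools import count
from typing import Dict, Hashable, List, Optional, Set, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]
Edge = Tuple[Hashable, Hashable]
INF = float("inf")


def dijkstra(graph: Graph, source: Hashable,
             allowed: Optional[Set[Edge]] = None) -> Dict[Hashable, float]:
    """Shortest distances from source; if `allowed` is given, only those edges are used."""
    dist = {source: 0.0}
    tie = count()
    queue = [(0.0, next(tie), source)]
    while queue:
        d, _, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale entry
        for v, w in graph.get(u, []):
            if allowed is not None and (u, v) not in allowed:
                continue  # edge not flagged for this region
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, next(tie), v))
    return dist


def arc_flags_for_region(graph: Graph, region: Set[Hashable]) -> Set[Edge]:
    """Return the set of edges (u, v) flagged for `region`."""
    reverse: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    # Boundary nodes: inside the region, with an incoming edge from outside.
    boundary = {v for u, edges in graph.items() if u not in region
                for v, _ in edges if v in region}
    # Edges inside the region are always flagged.
    flags = {(u, v) for u in region for v, _ in graph.get(u, []) if v in region}
    for b in boundary:
        to_b = dijkstra(reverse, b)  # to_b[u] = dist(u, b), via reversed edges
        for u, edges in graph.items():
            for v, w in edges:
                if u in to_b and v in to_b and math.isclose(to_b[u], w + to_b[v], abs_tol=1e-9):
                    flags.add((u, v))  # (u, v) lies on a shortest path to b
    return flags
```

At query time, `dijkstra(graph, s, flags)` gives the same distances to targets in the region as an unrestricted search, while relaxing fewer edges.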
This technique can be combined with others like A* and you can select many different rectangles, like all commonly searched cities.
There's also another algorithm I know of (but have never implemented), called Contraction Hierarchies. It exploits the fact that you usually start on a small town road, then switch to a bigger road, then a highway, and in the end vice versa until you reach your destination. It therefore gives each node a level (its importance), and the search first tries to reach a high level as quickly as possible and to stay there as long as possible.
To make this possible, it pre-computes shortcuts: additional edges that each represent a shortest path, like super-highways.
Bottom line
The right choice for a heuristic and also for an algorithm in general heavily depends on your model.
As seen for road networks, especially near smaller towns, the straight-line heuristic doesn't perform well since there often is no road that follows the straight line. Also, for long distances you tend to first drive onto the highway, which sometimes means driving in the opposite direction for some minutes.
However, in games, where you can often move wherever you like, the straight-line heuristic performs significantly better. But as soon as you introduce roads where you can travel faster (like by car) or many obstacles like big mountains, it might get bad again.
The landmark heuristic performs well on most networks since it is built from true shortest-path distances. However, it needs some pre-computation and trades space, since you have to store all the pre-computed data.

Heuristic values are wholly domain-dependent, especially admissible ones (which A* requires). So, for example, finding the shortest path on a geographic map might involve a heuristic of the straight-line distance between two nodes, which could be approximated reasonably well by computing the Euclidean distance between the (latitude, longitude) of the two points.

Related

Heuristic value in A* algorithm

I am learning the A* algorithm and Dijkstra's algorithm, and found out that the only difference is the heuristic value used by A*. But how can I get these heuristic values in my graph? I found an example graph for the A* algorithm (from A to J). Can you help me understand how these heuristic values are calculated?
The red numbers denote the heuristic values.
My current problem is creating a maze escape.
In order to get a heuristic that estimates (lower bounds) the minimum path cost between two nodes there are two possibilities (that I know of):
Knowledge about the underlying space the graph is part of
As an example assume the nodes are points on a plane (with x and y coordinate) and the cost of each edge is the euclidean distance between the corresponding nodes. In this case you can estimate (lower bound) the path cost from node U to node V by calculating the euclidean distance between U.position and V.position.
Another example would be a road network where you know it lies on the Earth's surface. The cost on the edges might represent travel times in minutes. In order to estimate the path cost from node U to node V, you can calculate the great-circle distance between the two and divide it by the maximum travel speed possible.
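As a sketch (Python; `coords` and the function names are hypothetical), the heuristic for edge costs in minutes divides the haversine great-circle distance by the maximum speed. It is only a lower bound if `max_speed_kmh` really is the highest speed possible on any edge of the graph; a value that is too small makes h overestimate, and A* may then return suboptimal routes.

```python
import math
from typing import Callable, Dict, Hashable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in km between (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def travel_time_heuristic(coords: Dict[Hashable, Tuple[float, float]], target: Hashable,
                          max_speed_kmh: float) -> Callable[[Hashable], float]:
    """Lower bound (in minutes) on the travel time from u to the target."""
    t = coords[target]
    return lambda u: great_circle_km(coords[u], t) / max_speed_kmh * 60.0
```

The heuristic then has the same unit (minutes) as the edge costs, which is what A* needs.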
Graph Embedding
Another possibility is to embed your graph in a space where you can estimate the path distance between two nodes efficiently. This approach does not make any assumptions on the underlying space but requires precomputation.
For example, you could define a landmark L in your graph. Then you precalculate the distance between each node of the graph and your landmark and save this distance at the node. In order to estimate the path distance during A* search, you can now use the precalculated distances as follows: for an undirected graph, the path distance between nodes U and V is lower bounded by |dist(U, L) - dist(V, L)|. You can improve this heuristic by using more than one landmark.
For your graph you could use node A and node H as landmarks, which gives each node two coordinates (its distances to A and to H); the image showing this embedding is not reproduced here. You would have to precompute the shortest paths between the nodes A and H and all other nodes beforehand in order to compute this embedding. When you want to estimate, for example, the distance between two nodes B and J, you can compute the distance in each of the two dimensions and use the maximum of the two distances as the estimate. This corresponds to the L-infinity norm.
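In code, the embedding is just a tuple of precomputed landmark distances per node, and the estimate is the L-infinity distance between two tuples. The Python sketch below assumes an undirected graph and that the per-landmark distance tables were already computed (for example with one Dijkstra per landmark); the function names are illustrative, and the tiny path graph in the test (A-B-C-D with costs 1, 2, 3, landmarks A and D) is made up for illustration, not the graph from the question.

```python
from typing import Dict, Hashable, List, Tuple


def embed(node: Hashable, landmark_dists: List[Dict[Hashable, float]]) -> Tuple[float, ...]:
    """Coordinates of a node: its distance to each landmark (one dimension per landmark)."""
    return tuple(d[node] for d in landmark_dists)


def linf_estimate(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """L-infinity distance between two embedded nodes; a lower bound on their path distance."""
    return max(abs(x - y) for x, y in zip(a, b))
```

On that path graph, B embeds as (1, 5) and D as (6, 0), so the estimate for B to D is max(5, 5) = 5, which here equals the true distance.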
The heuristic is an estimate of the additional distance you would have to traverse to get to your destination.
It is problem specific and appears in different forms for different problems. For your graph, a good heuristic could be the physical distance from the node to the destination on the drawing, measured with a tape measure or ruler. Funny, right? But that's exactly how my college professor did it: he held a tape measure against the blackboard and came up with a very good heuristic.
So h(A) could be 10 units, meaning the length physically measured with a ruler from A to J.
Of course, for your algorithm to work the heuristic must be admissible; if it is not, it could give you a wrong answer.

Dijkstra Algorithm with Chebyshev Distance

I have been using Dijkstra's algorithm to find the shortest path with the graph API from Princeton University's Algorithms, Part II, and I have figured out how to find the path with Chebyshev distance.
With Chebyshev distance every move to any neighbouring node costs only 1, so zigzagging has no impact on the total cost. But why does the path in the graph (see the red circle) zigzag instead of going straight?
Will the same thing happen if I use the A* algorithm?
If you want to prioritize "straight lines" you should take the direction of previous step into account. One possible way is to create a graph G'(V', E') where V' consists of all neighbour pairs of vertices. For example, vertex v = (v_prev, v_cur) would define a vertex in the path where v_cur is the last vertex of the path and v_prev is the previous vertex. Then on "updating distances" step of the shortest path algorithm you could choose the best distance with the best (non-changing) direction.
We can also add a second component to the distance, equal to the number of direction changes, and find the path with minimal distance and, among those, the minimal number of direction changes.
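A Python sketch of this idea on an 8-connected grid where every move costs 1 (as with Chebyshev distance). It assumes grid cells are (x, y) tuples and obstacles are a set; each search state is (cell, last move), which plays the role of the (v_prev, v_cur) pairs described above, and the cost is the pair (moves, direction changes), compared lexicographically so path length is never sacrificed for fewer turns. Names are illustrative.

```python
import heapq
from itertools import count
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]
# The eight moves of an 8-connected grid; each costs 1 under Chebyshev distance.
MOVES = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def fewest_turns_path(width: int, height: int, blocked: Set[Cell], start: Cell,
                      goal: Cell) -> Optional[Tuple[Tuple[int, int], List[Cell]]]:
    """Return ((moves, turns), path), minimising moves first, then direction changes."""
    start_state = (start, None)  # (cell, move used to enter it)
    best = {start_state: (0, 0)}
    parent = {start_state: None}
    tie = count()
    queue = [((0, 0), next(tie), start_state)]
    while queue:
        cost, _, state = heapq.heappop(queue)
        if cost > best[state]:
            continue  # stale entry
        cell, last_move = state
        if cell == goal:  # first goal state popped has the lexicographically smallest cost
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return cost, path[::-1]
        for move in MOVES:
            nxt = (cell[0] + move[0], cell[1] + move[1])
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            turn = 1 if last_move is not None and move != last_move else 0
            new_cost = (cost[0] + 1, cost[1] + turn)
            new_state = (nxt, move)
            if new_cost < best.get(new_state, (float("inf"), float("inf"))):
                best[new_state] = new_cost
                parent[new_state] = state
                heapq.heappush(queue, (new_cost, next(tie), new_state))
    return None
```

Because the first component is the ordinary move count, the path is still a shortest path; the second component only decides between equally short paths.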
According to Dijkstra or A*, there is no reason for the path to be straight in particular; as you say, it has no impact on the total cost. I'll assume, by the way, that you want to prevent useless zig-zagging in particular, and have no general preference for a move that goes in the same direction as the previous move.
Dijkstra and A* do not have a built-in dislike for "weird paths". They only explicitly care about the cost, which implicitly means that the result also depends on how you handle equal costs. There are a couple of things you can do about that:
Use tie-breaking to make them prefer straight moves whenever two nodes have equal cost (G or F, depending on whether you're doing Dijkstra or A*). This gives some trouble around obstacles because two choices that eventually lead to equal-length paths do not necessarily have the same F score, so they might not get tie-broken. It'll never give you a sub-optimal path though.
Slightly increase your diagonal cost. It doesn't have to be a lot, say 10 for straight and 11 for diagonal. This will just avoid any diagonal move that isn't a shortcut. But obviously, if that doesn't match the actual cost, you can now find sub-optimal paths. The bigger the cost difference, the more often that will happen. In practice it's relatively rare, and paths have to be long enough (accumulating enough cost difference that it becomes worth an entire extra move) before it happens.
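Here is a Python sketch of the second option: A* on an 8-connected grid with the example costs 10 (straight) and 11 (diagonal). The heuristic `10 * max(dx, dy) + (11 - 10) * min(dx, dy)` is the usual octile-style estimate adapted to these costs; it is the exact cost on an empty grid, so it stays admissible (and consistent) with obstacles. The grid layout, the names, and the use of exactly these costs are assumptions for illustration.

```python
import heapq
from itertools import count
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]
STRAIGHT, DIAGONAL = 10, 11  # example costs from the answer above


def heuristic(cell: Cell, goal: Cell) -> int:
    """Exact cost on an obstacle-free grid, hence admissible."""
    dx, dy = abs(cell[0] - goal[0]), abs(cell[1] - goal[1])
    return STRAIGHT * max(dx, dy) + (DIAGONAL - STRAIGHT) * min(dx, dy)


def a_star_grid(width: int, height: int, blocked: Set[Cell], start: Cell,
                goal: Cell) -> Optional[Tuple[int, List[Cell]]]:
    """Return (cost, path) or None; diagonal moves cost slightly more than straight ones."""
    g = {start: 0}
    parent = {start: None}
    closed: Set[Cell] = set()
    tie = count()
    queue = [(heuristic(start, goal), next(tie), start)]
    while queue:
        _, _, cell = heapq.heappop(queue)
        if cell in closed:
            continue  # stale entry
        if cell == goal:
            path = []
            node = cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        closed.add(cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                nxt = (cell[0] + dx, cell[1] + dy)
                if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                    continue
                step = DIAGONAL if dx != 0 and dy != 0 else STRAIGHT
                new_g = g[cell] + step
                if new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(queue, (new_g + heuristic(nxt, goal), next(tie), nxt))
    return None
```

With these costs a zigzag between two cells on the same row is always more expensive than the straight line, so the straight path wins.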

Algorithm for determining largest covered area

I'm looking for an algorithm which I'm sure must have been studied, but I'm not familiar enough with graph theory to even know the right terms to search for.
In the abstract, I'm looking for an algorithm to determine the set of routes between a certain starting vertex and the reachable vertices [x1, x2, ..., xn], when each edge has a weight and each route can have at most a given total weight x.
In more practical terms, I have road network and for each road segment a length and maximum travel speed. I need to determine the area that can be reached within a certain time span from any starting point on the network. If I can find the furthest away points that are reachable within that time, then I will use a convex hull algorithm to determine the area (this approximates enough for my use case).
So my question, how do I find those end points? My first intuition was to use Dijkstra's algorithm and stop once I've 'consumed' a certain 'budget' of time, subtracting from that budget on each road segment; but I get stuck when the algorithm should backtrack but has used its budget. Is there a known name for this problem?
If I understood the problem correctly, your initial guess is right. Dijkstra's algorithm, or any other algorithm finding a shortest path from a vertex to all other vertices (like A*) will fit.
In the simplest case you can construct the graph where the weight of each edge is the minimum time required to pass that road segment; if you have its length and maximum allowed speed, that is simply length / speed. Run the algorithm from the starting point and pick those vertices whose shortest path is at most x. As simple as that.
If you want to optimize things, note that Dijkstra's algorithm settles vertices in non-decreasing order of their shortest-path distance, which is expected when you deal with graphs with non-negative weights. On each step you pick an unused vertex with the minimum current shortest path. If this path is greater than x, you may stop: no remaining vertex can have a shortest path of at most x.
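A minimal Python sketch of this early-stopping variant, assuming edge weights are already travel times (e.g. segment length divided by maximum speed) stored in a dict-of-adjacency-lists graph; the function name is illustrative. It returns the shortest travel time for each vertex reachable within the budget.

```python
import heapq
from itertools import count
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def reachable_within(graph: Graph, source: Hashable, budget: float) -> Dict[Hashable, float]:
    """Return {vertex: shortest travel time} for every vertex with time <= budget."""
    tentative = {source: 0.0}
    settled: Dict[Hashable, float] = {}
    tie = count()
    queue = [(0.0, next(tie), source)]
    while queue:
        d, _, u = heapq.heappop(queue)
        if d > budget:
            break  # popped distances never decrease, so nothing else fits the budget
        if u in settled:
            continue  # stale entry
        settled[u] = d
        for v, w in graph.get(u, []):
            if v not in settled and d + w < tentative.get(v, float("inf")):
                tentative[v] = d + w
                heapq.heappush(queue, (d + w, next(tie), v))
    return settled
```

No backtracking is needed: a vertex is only reported once its shortest time is final.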
If you need to determine exactly the points between vertices that a vehicle can reach in the given time, it is just a small extension of the above algorithm. As a next step, consider all edges (u, v) where u can be reached in time x while v cannot, i.e. if we define the shortest travel time to vertex w as t(w), we have t(u) <= x and t(v) > x. Now use some basic math to interpolate the point between u and v at fraction (x - t(u)) / w(u, v) of the way from u, where w(u, v) is the travel time of the edge; this equals (x - t(u)) / (t(v) - t(u)) when the edge lies on a shortest path to v.
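The interpolation step could be sketched in Python as follows, taking a `times` dict such as the one produced by a budget-limited Dijkstra. It uses the edge's own travel time w(u, v) as the denominator: when the edge lies on a shortest path to v this equals t(v) - t(u), and in general it is the fraction of the edge that can actually be driven from u. Coordinates are assumed planar (x, y); linear interpolation of latitude/longitude is only an approximation for short edges. Like the answer, it ignores edges whose two endpoints are both reachable, even though the middle part of such an edge might not be. Names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]
Point = Tuple[float, float]


def frontier_points(graph: Graph, coords: Dict[Hashable, Point],
                    times: Dict[Hashable, float], budget: float) -> List[Point]:
    """Points on edges (u, v) with t(u) <= budget < t(v) that are reached exactly at `budget`.

    `times` must hold the shortest travel time t(w) of every vertex with t(w) <= budget.
    """
    points = []
    for u, t_u in times.items():
        for v, w in graph.get(u, []):
            if v in times:
                continue  # skip edges whose end vertex is itself reachable
            f = (budget - t_u) / w  # fraction of the edge drivable from u, 0 <= f < 1
            (x1, y1), (x2, y2) = coords[u], coords[v]
            points.append((x1 + f * (x2 - x1), y1 + f * (y2 - y1)))
    return points
```

These points (plus the reachable vertices) can then be fed to the convex hull step from the question.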
Using breadth-first search from the starting node seems a good way to solve the problem in O(V+E) time. Well, that's what Dijkstra does, but it stops after finding the smallest path. In your case, however, you must continue collecting routes for your set of routes until no route can be extended while keeping its weight less than or equal to the maximum total weight.
And I don't think there is any backtracking in Dijkstra's algorithm.

Minimum manhattan distance with certain blocked points

The minimum Manhattan distance between any two points in the Cartesian plane is the sum of the absolute differences of their X and Y coordinates: for two points (X,Y) and (U,V) the distance is ABS(X-U) + ABS(Y-V). Now, how should I determine the minimum distance between several pairs of points, moving only parallel to the coordinate axes, such that certain given points must not be visited on the selected path? I need a very efficient algorithm, because the number of avoided points can range up to 10000, with the same range for the number of queries. The absolute values of the coordinates are less than 50000. I am given the set of points to be avoided in the beginning, so I might use some offline algorithm and/or precomputation.
As an example, the Manhattan distance between (0,0) and (1,1) is 2 from either path (0,0)->(1,0)->(1,1) or (0,0)->(0,1)->(1,1). But, if we are given the condition that (1,0) and (0,1) cannot be visited, the minimum distance increases to 6. One such path would then be: (0,0)->(0,-1)->(1,-1)->(2,-1)->(2,0)->(2,1)->(1,1).
This problem can be solved by breadth-first search or depth-first search, with breadth-first search being the standard approach. You can also use the A* algorithm which may give better results in practice, but in theory (worst case) is no better than BFS.
This is provable because your problem reduces to solving a maze. Obviously you can have so many obstacles that the grid essentially becomes a maze. It is well known that BFS or DFS are the only way to solve mazes. See Maze Solving Algorithms (wikipedia) for more information.
My final recommendation: use the A* algorithm and hope for the best.
You are not understanding the solutions here or we are not understanding the problem:
1) You have a Cartesian plane. Therefore, every node has exactly 4 adjacent nodes, (x+/-1, y) and (x, y+/-1) (ignoring the boundaries).
2) Do a BFS (or DFS, A*). All you can traverse is x +/- 1 or y +/- 1. Pre-store your 10000 obstacles and just check on demand whether a neighbouring node is visitable; you don't need a real graph object.
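The implicit-graph BFS described in this answer might look like the following Python sketch. Obstacles are a set of (x, y) tuples, neighbours are generated on the fly, and `limit` (default 50000, taken from the question's coordinate bound) keeps the search finite when the goal is walled off; the function name is illustrative.

```python
from collections import deque
from typing import Optional, Set, Tuple

Point = Tuple[int, int]


def grid_bfs(start: Point, goal: Point, blocked: Set[Point], limit: int = 50_000) -> Optional[int]:
    """Length of the shortest axis-parallel path avoiding `blocked`, or None if there is none."""
    if start in blocked or goal in blocked:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return dist[(x, y)]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            # Neighbours are generated on demand; no explicit graph object is built.
            if (abs(nxt[0]) <= limit and abs(nxt[1]) <= limit
                    and nxt not in blocked and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return None
```

For the question's example, `grid_bfs((0, 0), (1, 1), {(1, 0), (0, 1)})` returns 6.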
If it's too slow: you said you can do an offline calculation. With one bit per cell, the roughly 10^10 cells only require 1.25 GB for an indexed obstacle lookup table. Leave the algorithm running?
Where am I going wrong?

Modifying A star routing for a euclidean graph with edge weights as well as distances

I'm in the process of writing an application to suggest circular routes over OpenStreetMap data subject to some constraints (the Orienteering Problem). The innermost loop of the algorithm I'm trialling requires finding the lowest-cost path between two given points. Given the layout of the graph (basically Euclidean), the A star algorithm seems likely to produce results fastest. However, as well as distances on my edges (representing actual distances on the map), I also have a series of weights (currently scaled from 0.0, least desirable, to 1.0, most desirable) indicating how desirable the particular edge (road/path/etc.) is, calculated according to some metrics I've devised for my application.
I would like to modify my distances based on these weights. I am aware that the standard A star heuristic relies on the true cost of the path being at least as great as the estimate (based on a euclidean distance between the points). So my first thought was to come up with a scheme where the minimum edge distance is the real distance (for weight 1.0) and the distance is increased as the weight decreases (for instance quadrupling the distance for weight 0.0). Does this seem a sensible approach, or is there a better standard technique for fast routing under these circumstances?
I believe your approach is the most sane one. As it happens, I'm working on a similar problem, and I decided to use exactly the same strategy.
The A* algorithm doesn't necessarily rely on "true distances". It's not even about distances: you may minimize some other physical quantity, as long as the heuristic function has the same physical units.
For instance, my problem is to minimize the path time, whereas the velocity at any given point depends on the location, time, and the chosen direction. My heuristic function is the rough distance (my problem is on the Earth surface, calculating the great circle distance is somewhat pricey) divided by the maximum allowed velocity. That is, it has units of time, and it's interpreted as the most optimistic time to reach the finishing point from the given location.
The relevant question is: "what do you actually want to minimize?". You need to end up with a single "modified distance", so that your pathfinding algorithm can pick the smallest one.
The continued usefulness of the A* algorithm depends on exactly how you integrate "desirability" into your routing distance. A* requires an "admissible heuristic" which is optimistic: the heuristic estimate for your "modified distance" must not exceed the actual "modified distance" (otherwise, the path it finds may not actually be optimal...). One way to ensure that is to make sure that the modified distance is always larger than the original, Euclidean distance for any given step; then, any A* heuristic admissible to minimize the Euclidean distance will also be admissible to minimize the modified distance.
For example, if you compute modified_distance = euclidean_distance / desirability_rating (for 0<desirability_rating<=1), your modified_distance will never be smaller than the euclidean_distance: whatever A* heuristic you were using for your unweighted paths will still be optimistic, so it will be usable for A*. (although, in regions where every path is undesirable, a highly over-optimistic A* heuristic may not improve performance as much as you would like...)
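Both weighting schemes from this thread are one-liners; the Python sketch below (hypothetical function names) shows the question's linear scheme with a factor of 4 at weight 0.0 and this answer's division scheme. In both, the modified cost is never smaller than the real length, so a straight-line heuristic on the real lengths remains admissible.

```python
def modified_linear(length: float, weight: float, max_factor: float = 4.0) -> float:
    """Question's scheme: real length at weight 1.0, max_factor * length at weight 0.0."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    # Factor falls linearly from max_factor (weight 0) to 1 (weight 1), so it is always >= 1.
    return length * (1.0 + (max_factor - 1.0) * (1.0 - weight))


def modified_by_division(length: float, desirability: float) -> float:
    """This answer's scheme: length / desirability, for 0 < desirability <= 1."""
    if not 0.0 < desirability <= 1.0:
        raise ValueError("desirability must be in (0, 1]")
    return length / desirability
```

The division scheme grows without bound as desirability approaches 0, whereas the linear scheme caps the penalty at `max_factor`.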
Fast(er) routing can be done with A*. In my own project I see a speed-up of approx. 4 times compared to Dijkstra, so it is even faster than bidirectional Dijkstra.
But there are many more techniques to improve query speed, e.g. introducing shortcuts and running A* on that graph.

Resources