Bidirectional A* (A-star) Search - algorithm

I'm implementing a bidirectional A* search (bidirectional as in the search is performed from both the origin and destination simultaneously, and when these two searches meet, I'll have my shortest path - at least with a bit of extra logic thrown in).
Does anyone have any experience with taking a unidirectional A* and bidirectionalising(!) it - what sort of performance gain can I expect? I'd reckoned on it more-or-less halving the search time, at a minimum - but may I see bigger gains that this? I'm using the algorithm to determine shortest routes on a road network - if that's in any way relevant (I've read about MS's "Reach" algorithm, but want to take baby-steps towards this rather than jumping straight in).

In the best possible case it'd run in O(b^(n/2)) instead of O(b^n), but that's only if you're lucky :)
(where b is your branching factor and n is the number of nodes a unidirectional A* would consider)
It all depends on how easily the two searches meet, if they find each other at a good halfway point early in the search you've done away with a lot of search time, but if they branch into wildly different directions you may end up with something slower than simple A* (because of all the extra bookkeeping)

You may want to try https://github.com/sroycode/tway
there is a benchmarking script that compares this with standard astar
( for NY city roads data it seems to give a 30% time benefit )

You may consider clustering as much more efficient booster. Also see this article

Related

Traveling Salesman Problem for a large number of vertices

I need to solve TSP for a large number of vertices(30-100) with good accuracy and adequate time(like 1-2 days). My graph can contain asymmetrical edges(g[i][j] not equal g[j][i]).
I tried greedy, little(maybe my bad, but that shows worse results than greedy), simple genetic algo(barely better than greedy) , dynamic for O(2^n*n) (fast out of memory).
Well, 30-100 is not really large number of vertices. Did you miss some zeroes? Or you are facing some special hard to solve cases like p43 from TSPLIB?
In any case, if you are looking for a good heuristic I used to use Ant Colony Optimization for Asymmetric TSP. It is easy to implement and providers quite good performance.
You might take a look at my old implementation: https://github.com/aligusnet/optimer/tree/master/src/heuristics/aco
If you can accept not an optimal, but "close to optimal" solution, I can suggest you to use "random traveling" algorithm. Idea of this algorithm - do not BFS/DFS search through entire combination tree, but search just random DFS-subtrees.
For example, you have vertices [A-Z], and you start point within [A]. Try 10000 attempts for each path (total 32 prefix), started from [A-B-...], [A-C-...] and so on, where [...] is randomly selected full-depth path through your graph, according your rules. Keep cost of appropriate paths within array, where cost is sum of costs from each prefix. Because of you use equal attempts to all "start prefixes", sum of minimal prefix will show you best step from [A]. Of course, this is not guarantee for optimal, but this is high probability to be so.
For example, sum of 10,000 attempts withing path [A-K] is lowest. Next step - accept first step [A-K], and again repeat algorithm, until you found the solution.
Here's the TSP source code of the OptaPlanner implementation, fwiw. It deals with datasets up to 10k visits pretty good when NearbySelection is activated (or up to 500 visits or so if it's not activated) - to go above 10k you'll need to activate Partitioned Search which comes at a trade-off.
It has asymmetric datasets (using OpenStreetMap data) in the import/belgium/road-time directory. It can't prove if it reaches the optimal solution or not. Usually termination is set on either a few minutes or on a few unimproved minutes.
Benchmarks showed that Late Acceptance has slightly better results than Simulated Annealing and Tabu Search, given a specific set of MoveSelectors configured, but your mileage may vary...

What algorithm to use to compute the fastest order to build buildings?

For a game I'm playing I would like to know the fastest order to build a number of buildings. The game in question is OGame, if you're familiar with it, then that is a plus, but obviously I will explain the basics of the game:
The player has a number of differences resources available.
The player can build buildings, maximum one building at a time.
Buildings have different levels, for example Building A level 1, Building A level 2, etc.
The resource cost to build buildings increases per level.
Buildings cost time to build too, this also increases per level.
Some of the buildings produce the different resources, this also increases per level.
Some of the buildings change the calculations such that buildings get built faster.
I have explicitly chosen to not show the equations as they are not straight-forward and should not be needed to suggest an algorithm.
I have chosen to model this with the following actions:
StartUpgradeBuildingAction: This action starts the upgrade process by subtracting the cost from the available resources.
FinishUpgradeBuildingAction: This action finishes the upgrade process by going forward in time. This also produces resources.
WaitAction: This action forwards the time by X seconds and meanwhile produces the resources according to the resource production.
It should be noted that the state space is infinite and can be characterized by the fact that there are multiple paths to the final configuration (where all requested buildings have been built), each possibly having a different time it takes and a different amount of resources you end up with in the end. Now I am most interested in the path (the order) that is the fastest, and if there are multiple equal paths, then the path that costs the least should be preferred.
I have already tried the following approaches:
Breadth-First Search
Depth-First Search
Iterative Deepening Depth-First Search
Iterative Deepening A* Search
A* Search
Unfortunately all of these algorithms either take too long, or use too much memory.
As googling has not given me any further leads, I ask the following questions here:
Is there an already existing model that matches my problem? I would think that Business Information Systems for example would have encountered this type of problem already.
Does an algorithm exist that gives the best solution? If so, which one?
Is there an algorithm that gives a solution that is close to the best solution? If so, which one?
Any help is appreciated.
There is not the one algorithm that gives you the best solution for your problem. The approaches you tried are all reasonable. However, having tried A* search doesn't mean a lot since A* search depends on a heuristic that evaluates a certain configuration (i.e. it assigns a value to the combination of the amount of time passed, number and selection of buildings built, available resources, etc.). With a good heuristic, A* search might lead you to a very good solution quickly. Finding this heuristic requires good knowledge of the parameters (costs of buildings, benefits of upgrades etc.)
However, my feeling is that your problem is structured in such a way that a series of build decisions can clearly outperform another series of decisions after a small number of steps. Lets say you build buildings A, B and C in this order. You build each one as soon as the required resources are available. Then you try the order C, A, B. You will probably find that one alternative dominates the other insofar that you have the same buildings but in one alternative you have more resources than in the other. Of course this is less likely if you have many different resources. You might have more of resource X but less of Y which makes the situations harder to compare. If it is possible, the good thing is you don't need a heuristic but you clearly see which path you should follow and which to cut off.
Anyway, I would explore how many steps it takes until you find paths that you can dismiss based on such a consideration. If you find them quick, it makes sense to follow a breadth-first strategy and prune branches as soon as possible. A depth first search bears the risk that you spend a lot of time exploring inferior paths.

A* Algorithm for very large graphs, any thoughts on caching shortcuts?

I'm writing a courier/logistics simulation on OpenStreetMap maps and have realised that the basic A* algorithm as pictured below is not going to be fast enough for large maps (like Greater London).
The green nodes correspond to ones that were put in the open set/priority queue and due to the huge number (the whole map is something like 1-2 million), it takes 5 seconds or so to find the route pictured. Unfortunately 100ms per route is about my absolute limit.
Currently, the nodes are stored in both an adjacency list and also a spatial 100x100 2D array.
I'm looking for methods where I can trade off preprocessing time, space and if needed optimality of the route, for faster queries. The straight-line Haversine formula for the heuristic cost is the most expensive function according to the profiler - I have optimised my basic A* as much as I can.
For example, I was thinking if I chose an arbitrary node X from each quadrant of the 2D array and run A* between each, I can store the routes to disk for subsequent simulations. When querying, I can run A* search only in the quadrants, to get between the precomputed route and the X.
Is there a more refined version of what I've described above or perhaps a different method I should pursue. Many thanks!
For the record, here are some benchmark results for arbitrarily weighting the heuristic cost and computing the path between 10 pairs of randomly picked nodes:
Weight // AvgDist% // Time (ms)
1 1 1461.2
1.05 1 1327.2
1.1 1 900.7
1.2 1.019658848 196.4
1.3 1.027619169 53.6
1.4 1.044714394 33.6
1.5 1.063963413 25.5
1.6 1.071694171 24.1
1.7 1.084093229 24.3
1.8 1.092208509 22
1.9 1.109188175 22.5
2 1.122856792 18.2
2.2 1.131574742 16.9
2.4 1.139104895 15.4
2.6 1.140021962 16
2.8 1.14088128 15.5
3 1.156303676 16
4 1.20256964 13
5 1.19610861 12.9
Surprisingly increasing the coefficient to 1.1 almost halved the execution time whilst keeping the same route.
You should be able to make it much faster by trading off optimality. See Admissibility and optimality on wikipedia.
The idea is to use an epsilon value which will lead to a solution no worse than 1 + epsilon times the optimal path, but which will cause fewer nodes to be considered by the algorithm. Note that this does not mean that the returned solution will always be 1 + epsilon times the optimal path. This is just the worst case. I don't know exactly how it would behave in practice for your problem, but I think it is worth exploring.
You are given a number of algorithms that rely on this idea on wikipedia. I believe this is your best bet to improve the algorithm and that it has the potential to run in your time limit while still returning good paths.
Since your algorithm does deal with millions of nodes in 5 seconds, I assume you also use binary heaps for the implementation, correct? If you implemented them manually, make sure they are implemented as simple arrays and that they are binary heaps.
There are specialist algorithms for this problem that do a lot of pre-computation. From memory, the pre-computation adds information to the graph that A* uses to produce a much more accurate heuristic than straight line distance. Wikipedia gives the names of a number of methods at http://en.wikipedia.org/wiki/Shortest_path_problem#Road_networks and says that Hub Labelling is the leader. A quick search on this turns up http://research.microsoft.com/pubs/142356/HL-TR.pdf. An older one, using A*, is at http://research.microsoft.com/pubs/64505/goldberg-sp-wea07.pdf.
Do you really need to use Haversine? To cover London, I would have thought you could have assumed a flat earth and used Pythagoras, or stored the length of each link in the graph.
There's a really great article that Microsoft Research wrote on the subject:
http://research.microsoft.com/en-us/news/features/shortestpath-070709.aspx
The original paper is hosted here (PDF):
http://www.cc.gatech.edu/~thad/6601-gradAI-fall2012/02-search-Gutman04siam.pdf
Essentially there's a few things you can try:
Start from the both the source as well as the destination. This helps to minimize the amount of wasted work that you'd perform when traversing from the source outwards towards the destination.
Use landmarks and highways. Essentially, find some positions in each map that are commonly taken paths and perform some pre-calculation to determine how to navigate efficiently between those points. If you can find a path from your source to a landmark, then to other landmarks, then to your destination, you can quickly find a viable route and optimize from there.
Explore algorithms like the "reach" algorithm. This helps to minimize the amount of work that you'll do when traversing the graph by minimizing the number of vertices that need to be considered in order to find a valid route.
GraphHopper does two things more to get fast, none-heuristic and flexible routing (note: I'm the author and you can try it online here)
A not so obvious optimization is to avoid 1:1 mapping of OSM nodes to internal nodes. Instead GraphHopper uses only junctions as nodes and saves roughly 1/8th of traversed nodes.
It has efficient implements for A*, Dijkstra or e.g. one-to-many Dijkstra. Which makes a route in under 1s possible through entire Germany. The (none-heuristical) bidirectional version of A* makes this even faster.
So it should be possible to get you fast routes for greater London.
Additionally the default mode is the speed mode which makes everything an order of magnitudes faster (e.g. 30ms for European wide routes) but less flexible, as it requires preprocessing (Contraction Hierarchies). If you don't like this, just disable it and also further fine-tune the included streets for car or probably better create a new profile for trucks - e.g. exclude service streets and tracks which should give you a further 30% boost. And as with any bidirectional algorithm you could easily implement a parallel search.
I think it's worth to work-out your idea with "quadrants". More strictly, I'd call it a low-resolution route search.
You may pick X connected nodes that are close enough, and treat them as a single low-resolution node. Divide your whole graph into such groups, and you get a low-resolution graph. This is a preparation stage.
In order to compute a route from source to target, first identify the low-res nodes they belong to, and find the low-resolution route. Then improve your result by finding the route on high-resolution graph, however restricting the algorithm only to nodes that belong to hte low-resolution nodes of the low-resolution route (optionally you may also consider neighbor low-resolution nodes up to some depth).
This may also be generalized to multiple resolutions, not just high/low.
At the end you should get a route that is close enough to optimal. It's locally optimal, but may be somewhat worse than optimal globally by some extent, which depends on the resolution jump (i.e. the approximation you make when a group of nodes is defined as a single node).
There are dozens of A* variations that may fit the bill here. You have to think about your use cases, though.
Are you memory- (and also cache-) constrained?
Can you parallelize the search?
Will your algorithm implementation be used in one location only (e.g. Greater London and not NYC or Mumbai or wherever)?
There's no way for us to know all the details that you and your employer are privy to. Your first stop thus should be CiteSeer or Google Scholar: look for papers that treat pathfinding with the same general set of constraints as you.
Then downselect to three or four algorithms, do the prototyping, test how they scale up and finetune them. You should bear in mind you can combine various algorithms in the same grand pathfinding routine based on distance between the points, time remaining, or any other factors.
As has already been said, based on the small scale of your target area dropping Haversine is probably your first step saving precious time on expensive trig evaluations. NOTE: I do not recommend using Euclidean distance in lat, lon coordinates - reproject your map into a e.g. transverse Mercator near the center and use Cartesian coordinates in yards or meters!
Precomputing is the second one, and changing compilers may be an obvious third idea (switch to C or C++ - see https://benchmarksgame.alioth.debian.org/ for details).
Extra optimization steps may include getting rid of dynamic memory allocation, and using efficient indexing for search among the nodes (think R-tree and its derivatives/alternatives).
I worked at a major Navigation company, so I can say with confidence that 100 ms should get you a route from London to Athens even on an embedded device. Greater London would be a test map for us, as it's conveniently small (easily fits in RAM - this isn't actually necessary)
First off, A* is entirely outdated. Its main benefit is that it "technically" doesn't require preprocessing. In practice, you need to pre-process an OSM map anyway so that's a pointless benefit.
The main technique to give you a huge speed boost is arc flags. If you divide the map in say 5x6 sections, you can allocate 1 bit position in a 32 bits integer for each section. You can now determine for each edge whether it's ever useful when traveling to section {X,Y} from another section. Quite often, roads are bidirectional and this means only one of the two directions is useful. So one of the two directions has that bit set, and the other has it cleared. This may not appear to be a real benefit, but it means that on many intersections you reduce the number of choices to consider from 2 to just 1, and this takes just a single bit operation.
Usually A* comes along with too much memory consumption rather than time stuggles.
However I think it could be useful to first only compute with nodes that are part of "big streets" you would choose a highway over a tiny alley usually.
I guess you may already use this for your weight function but you can be faster if you use some priority Queue to decide which node to test next for further travelling.
Also you could try reducing the graph to only nodes that are part of low cost edges and then find a way from to start/end to the closest of these nodes.
So you have 2 paths from start to the "big street" and the "big street" to end.
You can now compute the best path between the two nodes that are part of the "big streets" in a reduced graph.
Old question, but yet:
Try to use different heaps that "binary heap". 'Best asymptotic complexity heap' is definetly Fibonacci Heap and it's wiki page got a nice overview:
https://en.wikipedia.org/wiki/Fibonacci_heap#Summary_of_running_times
Note that binary heap has simpler code and it's implemented over array and traversal of array is predictable, so modern CPU executes binary heap operations much faster.
However, given dataset big enough, other heaps will win over binary heap, because of their complexities...
This question seems like dataset big enough.

concrete examples of heuristics

What are concrete examples (e.g. Alpha-beta pruning, example:tic-tac-toe and how is it applicable there) of heuristics. I already saw an answered question about what heuristics is but I still don't get the thing where it uses estimation. Can you give me a concrete example of a heuristic and how it works?
Warnsdorff's rule is an heuristic, but the A* search algorithm isn't. It is, as its name implies, a search algorithm, which is not problem-dependent. The heuristic is. An example: you can use the A* (if correctly implemented) to solve the Fifteen puzzle and to find the shortest way out of a maze, but the heuristics used will be different. With the Fifteen puzzle your heuristic could be how many tiles are out of place: the number of moves needed to solve the puzzle will always be greater or equal to the heuristic.
To get out of the maze you could use the Manhattan Distance to a point you know is outside of the maze as your heuristic. Manhattan Distance is widely used in game-like problems as it is the number of "steps" in horizontal and in vertical needed to get to the goal.
Manhattan distance = abs(x2-x1) + abs(y2-y1)
It's easy to see that in the best case (there are no walls) that will be the exact distance to the goal, in the rest you will need more. This is important: your heuristic must be optimistic (admissible heuristic) so that your search algorithm is optimal. It must also be consistent. However, in some applications (such as games with very big maps) you use non-admissible heuristics because a suboptimal solution suffices.
A heuristic is just an approximation to the real cost, (always lower than the real cost if admissible). The better the approximation, the fewer states the search algorithm will have to explore. But better approximations usually mean more computing time, so you have to find a compromise solution.
Most demonstrative is the usage of heuristics in informed search algorithms, such as A-Star. For realistic problems you usually have large search space, making it infeasible to check every single part of it. To avoid this, i.e. to try the most promising parts of the search space first, you use a heuristic. A heuristic gives you an estimate of how good the available subsequent search steps are. You will choose the most promising next step, i.e. best-first. For example if you'd like to search the path between two cities (i.e. vertices, connected by a set of roads, i.e. edges, that form a graph) you may want to choose the straight-line distance to the goal as a heuristic to determine which city to visit first (and see if it's the target city).
Heuristics should have similar properties as metrics for the search space and they usually should be optimistic, but that's another story. The problem of providing a heuristic that works out to be effective and that is side-effect free is yet another problem...
For an application of different heuristics being used to find the path through a given maze also have a look at this answer.
Your question interests me as I've heard about heuristics too during my studies but never saw an application for it, I googled a bit and found this : http://www.predictia.es/blog/aco-search
This code simulate an "ant colony optimization" algorithm to search trough a website.
The "ants" are workers which will search through the site, some will search randomly, some others will follow the "best path" determined by the previous ones.
A concrete example: I've been doing a solver for the game JT's Block, which is roughly equivalent to the Same Game. The algorithm performs a breadth-first search on all possible hits, store the values, and performs to the next ply. Problem is the number of possible hits quickly grows out of control (10e30 estimated positions per game), so I need to prune the list of positions at each turn and only take the "best" of them.
Now, the definition of the "best" positions is quite fuzzy: they are the positions that are expected to lead to the best final scores, but nothing is sure. And here comes the heuristics. I've tried a few of them:
sort positions by score obtained so far
increase score by best score obtained with a x-depth search
increase score based on a complex formula using the number of tiles, their color and their proximity
improve the last heuristic by tweaking its parameters and seeing how they perform
etc...
The last of these heuristic could have lead to an ant-march optimization: there's half a dozen parameters that can be tweaked from 0 to 1, and an optimizer could find the optimal combination of these. For the moment I've just manually improved some of them.
The second of this heuristics is interesting: it could lead to the optimal score through a full depth-first search, but such a goal is impossible of course because it would take too much time. In general, increasing X leads to a better heuristic, but increases the computing time a lot.
So here it is, some examples of heuristics. Anything can be an heuristic as long as it helps your algorithm perform better, and it's what makes them so hard to grasp: they're not deterministic. Another point with heuristics: they're supposed to lead to quick and dirty results of the real stuff, so there's a trade-of between their execution time and their accuracy.
A couple of concrete examples: for solving the Knight's Tour problem, one can use Warnsdorff's rule - an heuristic. Or for solving the Fifteen puzzle, a possible heuristic is the A* search algorithm.
The original question asked for concrete examples for heuristics.
Some of these concrete examples were already given. Another one would be the number of misplaced tiles in the 15-puzzle or its improvement, the Manhattan distance, based on the misplaced tiles.
One of the previous answers also claimed that heuristics are always problem-dependent, whereas algorithms are problem-independent. While there are, of course, also problem-dependent algorithms (for instance, for every problem you can just give an algorithm that immediately solves that very problem, e.g. the optimal strategy for any tower-of-hanoi problem is known) there are also problem-independent heuristics!
Consequently, there are also different kinds of problem-independent heuristics. Thus, in a certain way, every such heuristic can be regarded a concrete heuristic example while not being tailored to a specific problem like 15-puzzle. (Examples for problem-independent heuristics taken from planning are the FF heuristic or the Add heuristic.)
These problem-independent heuristics base on a general description language and then they perform a problem relaxation. That is, the problem relaxation only bases on the syntax (and, of course, its underlying semantics) of the problem description without "knowing" what it represents. If you are interested in this, you should get familiar with "planning" and, more specifically, with "planning as heuristic search". I also want to mention that these heuristics, while being problem-independent, are dependent on the problem description language, of course. (E.g., my before-mentioned heuristics are specific to "planning problems" and even for planning there are various different sub problem classes with differing kinds of heuristics.)

I need an algorithm to find the best path

I need an algorithm to find the best solution of a path finding problem. The problem can be stated as:
At the starting point I can proceed along multiple different paths.
At each step there are another multiple possible choices where to proceed.
There are two operations possible at each step:
A boundary condition that determine if a path is acceptable or not.
A condition that determine if the path has reached the final destination and can be selected as the best one.
At each step a number of paths can be eliminated, letting only the "good" paths to grow.
I hope this sufficiently describes my problem, and also a possible brute force solution.
My question is: is the brute force is the best/only solution to the problem, and I need some hint also about the best coding structure of the algorithm.
Take a look at A*, and use the length as boundary condition.
http://en.wikipedia.org/wiki/A%2a_search_algorithm
You are looking for some kind of state space search algorithm. Without knowing more about the particular problem, it is difficult to recommend one over another.
If your space is open-ended (infinite tree search), or nearly so (chess, for example), you want an algorithm that prunes unpromising paths, as well as selects promising ones. The alpha-beta algorithm (used by many OLD chess programs) comes immediately to mind.
The A* algorithm can give good results. The key to getting good results out of A* is choosing a good heuristic (weighting function) to evaluate the current node and the various successor nodes, to select the most promising path. Simple path length is probably not good enough.
Elaine Rich's AI textbook (oldie but goodie) spent a fair amount of time on various search algorithms. Full Disclosure: I was one of the guinea pigs for the text, during my undergraduate days at UT Austin.
did you try breadth-first search? (BFS) that is if length is a criteria for best path
you will also have to modify the algorithm to disregard "unacceptable paths"
If your problem is exactly as you describe it, you have two choices: depth-first search, and breadth first search.
Depth first search considers a possible path, pursues it all the way to the end (or as far as it is acceptable), and only then is it compared with other paths.
Breadth first search is probably more appropriate, at each junction you consider all possible next steps and use some score to rank the order in which each possible step is taken. This allows you to prioritise your search and find good solutions faster, (but to prove you have found the best solution it takes just as long as depth-first searching, and is less easy to parallelise).
However, your problem may also be suitable for Dijkstra's algorithm depending on the details of your problem. If it is, that is a much better approach!
This would also be a good starting point to develop your own algorithm that performs much better than iterative searching (if such an algorithm is actually possible, which it may not be!)
A* plus floodfill and dynamic programming. It is hard to implement, and too hard to describe in a simple post and too valuable to just give away so sorry I can't provide more but searching on flood fill and dynamic programming will put you on the path if you want to go that route.

Resources