How to solve this linear programming problem? - algorithm

I'm not so good at linear programming, so I'm posting this problem here.
I hope somebody can point me in the right direction.
It is not a homework problem, so please don't misunderstand.
I have a 5x5 matrix (25 nodes). The distance between each node and its adjacent (neighbor) nodes is 1 unit. A node can be in one of two conditions: cache or access. If a node i is a cache node, an access node j can access it at a cost of Dij x Aij (Access Cost). Dij is the Manhattan distance between nodes i and j. Aij is the access frequency from node i to j.
For a node i to become a cache node, it needs to cache from an existing cache node k at a cost of Dik x C, where C is an integer constant (Cache Cost). C is called the caching frequency.
A is provided as a 25x25 matrix of integers giving the access frequency between any pair of nodes i and j. D is provided as a 25x25 matrix containing the Manhattan distances between any pair of nodes i and j.
Assuming there is one cache node in the matrix, find the set of other cache nodes and access nodes such that the total cost is minimized.
Total Cost = Total Cache Cost + Total Access Cost.

I've tackled a few problems that are something like this.
First, if you don't need an exact answer, I'd generally suggest looking into something like a genetic algorithm, or doing a greedy algorithm. It won't be optimal, but it won't generally be bad either. And it will be much faster than an exact algorithm. For instance, you can start with all points as cache points, then find the point whose conversion to a non-caching point reduces your cost the most. Continue until removing the next one makes the cost go up, and use that as your solution. This won't be the best solution, but it will generally be reasonably good.
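The greedy removal described above could look roughly like the following. This is only a sketch under loud assumptions: the cost model is a simplification that is not taken from the question (each non-root cache node pays C per unit of distance to its nearest other cache node, and each access node pays a single hypothetical per-node access count times its distance to the nearest cache node), and the names total_cost, greedy_cache_set and access are made up for illustration.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_cost(cache, root, nodes, access, C):
    """Assumed, simplified cost model (not the full A and D matrices)."""
    cost = 0
    for x in nodes:
        if x in cache:
            if x != root:
                # Cache cost: C per unit of distance to the nearest other cache node.
                cost += C * min(manhattan(x, k) for k in cache if k != x)
        else:
            # Access cost: accesses times distance to the nearest cache node.
            cost += access[x] * min(manhattan(x, k) for k in cache)
    return cost

def greedy_cache_set(root, nodes, access, C):
    cache = set(nodes)  # start with every node caching
    best = total_cost(cache, root, nodes, access, C)
    while len(cache) > 1:
        # Cost after demoting each removable cache node in turn.
        trials = [(total_cost(cache - {x}, root, nodes, access, C), x)
                  for x in cache if x != root]
        new_cost, x = min(trials)
        if new_cost >= best:
            break  # the next removal would not reduce the cost
        cache.remove(x)
        best = new_cost
    return cache, best

# Hypothetical 5x5 instance: uniform access counts, root cache in the centre.
nodes = [(r, c) for r in range(5) for c in range(5)]
access = {x: 3 for x in nodes}
cache, cost = greedy_cache_set((2, 2), nodes, access, C=10)
```

Swapping in the real cost function only requires replacing total_cost; the greedy loop itself does not depend on the model.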
If you do need an exact answer, you will need to brute-force search a lot of data. Assuming that the initial cache point is specified, you'll have 2^24 = 16,777,216 possible sets of cache points to search. That is expensive.
The trick to doing it more cheaply (note, not cheaply, just more cheaply) is finding ways to prune your search. Take to heart the fact that if doing 100 times as much work on each set you look at lets you remove an average of 10 points from consideration as cache points, then your overall algorithm will visit 0.1% as many sets, and your code will run 10 times faster. Therefore it is worth putting a surprising amount of energy into pruning early and often, even if the pruning step is fairly expensive.
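The arithmetic behind that claim can be checked in a few lines. The two numbers are the ones quoted in the paragraph above; the variable names are just for illustration.

```python
# Numbers quoted above: pruning costs 100x more work per set, and removes
# an average of 10 points from consideration as cache points.
work_factor_per_set = 100
points_removed = 10

# Each removed point is one fewer yes/no choice, so the number of sets
# to visit shrinks by a factor of 2**10 = 1024.
fraction_of_sets_visited = 1 / 2 ** points_removed
speedup = 1 / (work_factor_per_set * fraction_of_sets_visited)

print(f"fraction of sets visited: {fraction_of_sets_visited:.4%}")
print(f"overall speedup: {speedup:.2f}x")
```

So "0.1% as many sets" is 1/1024, and "10 times faster" is 1024/100.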
Often you want multiple pruning strategies. One of them is usually "the best we can do from here is worse than the best we have found previously." This works better if you've already found a pretty good best solution. Therefore it is often worth a bit of effort to do some local optimization in your search for solutions.
Typically these optimizations won't change the fact that you are doing a tremendous amount of work. But they do let you do orders of magnitude less work.
My initial try at this would take advantage of the following observations.
1. Suppose that x is a cache point, and y is its nearest caching neighbor. Then you can always make some path from x to y cache "for free" if you just route the cache update traffic from x to y along that path. Therefore without loss of generality the set of cache points is connected on the grid.
2. If the minimum cost we could wind up with exceeds the current best cost we have found, we are not on our way to a global solution.
3. As soon as the sum of the access rate from all points at distance greater than 1 from the cache points plus the highest access frequency of a neighbor to the cache point that you can still use is less than the cache frequency, adding more cache points is always going to be a loss. (This would be an "expensive condition that lets us stop 10 minutes early.")
4. The highest access neighbor of the current set of cache points is a reasonable candidate for the next cache point to try. (There are several other heuristics you can try, but this one is reasonable.)
5. Any point whose total access frequency exceeds the cache frequency absolutely must be a caching point.
This might not be the best set of observations to use. But it is likely to be pretty reasonable. To take advantage of this you'll need at least one data structure you might not be familiar with. If you don't know what a priority queue is, then look around for an efficient one in your language of choice. If you can't find one, a heap is pretty easy to implement and works pretty well as a priority queue.
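In Python, for example, the standard heapq module is a binary min-heap, and negating the priority turns it into the "highest access count first" queue needed here. The following is a minimal sketch; the class name MaxPriorityQueue and the sample access counts are made up for illustration.

```python
import heapq

class MaxPriorityQueue:
    """Max-priority queue built on heapq (a min-heap) by negating priorities."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so the items themselves are never compared

    def push(self, item, priority):
        heapq.heappush(self._heap, (-priority, self._counter, item))
        self._counter += 1

    def pop(self):
        neg_priority, _, item = heapq.heappop(self._heap)
        return item, -neg_priority

    def __len__(self):
        return len(self._heap)

# Hypothetical access counts for three neighbours of the root cache node.
queue = MaxPriorityQueue()
queue.push((2, 3), 40)
queue.push((3, 2), 75)
queue.push((3, 4), 10)
best_node, best_accesses = queue.pop()  # the neighbour with the most accesses
```

Copying the underlying list is enough to clone such a queue, because a copy of a valid heap is still a valid heap.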
With that in mind, assuming that you have been given the information you've described and an initial cache node P, here is pseudo-code for an algorithm to find the best.
# Data structures to be dynamically maintained:
#   AT[x, n] - how many accesses x needs that currently need to go distance n.
#   D[x] - The distance from x to the nearest cache node.
#   CA[x] - Boolean yes/no for whether x is a cache node.
#   B[x] - Boolean yes/no for whether x is blocked from being a cache node.
#   cost - Current cost
#   distant_accesses - The sum of the total number of accesses made from more than
#       distance 1 from the cache nodes.
#   best_possible_cost - C * nodes_in_cache + sum(min(total accesses, C) for non-cache nodes)
#   *** Sufficient data structures to be able to unwind changes to all of the above before
#       returning from recursive calls (I won't specify accesses to them, but they need to
#       be there)
#   best_cost - The best cost found.
#   best_solution - The best solution found.
initialize all of those data structures (including best)
create neighbors priority queue of neighbors of root cache node (ordered by accesses)
call extend_current_solution(neighbors)
do what we want with the best solution
function extend_current_solution (available_neighbors):
    if cost < best_cost:
        best_cost = cost
        best_solution = CA # current set of cache nodes.
    if best_cost < best_possible_cost:
        return # pruning time
    neighbors = clone(available_neighbors)
    while neighbors:
        node = remove best from neighbors
        if distant_accesses + accesses(node) < C:
            return # this is condition 3 above
        make node in cache set
            - add it to CA
            - update costs
            - add its immediate neighbors to neighbors
        call extend_current_solution(neighbors)
        unwind changes just made
        make node in blocked set
        call extend_current_solution(neighbors)
        unwind changes to blocked set
    return
It will take a lot of work to write this, and you'll need to be careful to maintain all data structures. But my bet is that - despite how heavyweight it looks - you'll find that it prunes your search space enough to run more quickly than your existing solution. (It still won't be snappy.)
Good luck!
Update
When I thought about this more, I realized that a better observation is to note that if you can cut the "not a cache node, not a blocked node" set into two pieces, then you can solve those pieces independently. Each of those sub problems is orders of magnitude faster to solve than the whole problem, so seek to do so as fast as possible.
A good heuristic to do that is to follow the following rules:
While no edge has been reached:
    Drive towards the closest edge. Distance is measured by how short the shortest path is along the non-cache, non-blocked set.
    If two edges are equidistant, break ties according to the following preference order: (1, x), (x, 1), (5, x), (x, 5).
    Break any remaining ties by preferring to drive towards the center of an edge.
    Break any remaining ties randomly.
While an edge has been reached and your component still has edges that could become cache pieces:
    If you can immediately move into an edge and split the edge pieces into two components, do so. Both for "edge in cache set" and "edge not in cache set" you'll get 2 independent subproblems that are more tractable.
    Else move on a shortest path towards the piece in the middle of your section of edge pieces.
    If there is a tie, break it in favor of whatever makes the line from the added piece to the added cache element as close to diagonal as possible.
    Break any remaining ties randomly.
If you fall through here, choose randomly. (You should have a pretty small subproblem at this point. No need to be clever.)
If you try this starting out with (3, 3) as a cache point, you'll find that within the first few decisions you manage to cut into two even problems 7/16 of the time, you block in the cache point and finish 1/16 of the time, you manage to cut out a 2x2 block into a separate problem 1/4 of the time (making the overall solution run 16 times faster for that piece), and 1/4 of the time you wind up well on your way towards a solution that is either being boxed in (and quickly exhausted) or is a candidate for a solution with a lot of cache points, which gets pruned for being on track to be a bad solution.
I won't give pseudo-code for this variation. It will have a lot of similarities to what I had above, with a number of important details to handle. But I would be willing to bet money that in practice it will run orders of magnitude faster than your original solution.

The solution is a set, so this is not a linear programming problem. What it is is a special case of connected facility location. Bardossy and Raghavan have a heuristic that looks promising: http://terpconnect.umd.edu/~raghavan/preprints/confl.pdf

Is the spiral cache an analogy to the solution? http://strumpen.net/ibm-rc24767.pdf

Related

Is there an established algorithm for reaching a goal through the sequential application of known cause and effect tuples?

Let's say that I want to create a fictional product called g.
I know that:
a+b=c
x+y=z
and finally that
c+z=g
So clearly if I start off with products
a,b,x,y
I can create g in three steps:
a+b=c
x+y=z
c+z=g
So a naive algorithm for reaching a goal could be:
For each component required to make the goal (here c and z), recursively find a cause and effect tuple that can create that component.
But there are snags with that algorithm.
For example, let's say that my cause and effect tuples are:
a+b=c
x+y+c=z (NOTE THE EXTRA 'c' REQUIRED!!)
c+z=g
Now when I run my naive algorithm I will do
a+b=c
x+y+c=z (Using up the 'c' I created in the previous step)
c+z=g (Uh oh! I can't do this because I don't have the 'c' any more)
It seems like quite a basic area of research - how we can combine known causes and effects to reach a goal - so I suspect that work must have been done on it, but I've looked around and couldn't find anything and I don't really know where to look now.
Many thanks for any assistance!
Assuming that using a product consumes one item of it, which can then be replaced by producing a second item of that product, I would model this by giving each product a cost and working out how to minimize the cost of the final product. In this case I think this is the same as minimizing the costs of every product, because minimizing the cost of an input never increases the cost of any output. You end up with loads of equations like
a=min(b+c, d+e, f+g)
where a is the cost of a product that can be produced in alternative ways, one way consuming units with cost of b and c, another way consuming units with costs of d and e, another way consuming units with costs of f and g, and so on. There may be cycles in the associated graph.
One way to solve such a problem would be to start by assigning the cost infinity to all products not originally provided as inputs (with costs) and then repeatedly reducing costs where equations show a way of calculating a cost less than the current cost, keeping track of re-calculations caused by inputs not yet considered or reductions in costs. At each stage I would consider the consequences of the smallest input or recalculated value available, with ties broken by a second component which amounts to a tax on production. The outputs produced from a calculation are always at least as large as any input, so newly produced values are always larger than the recalculated value considered, and the recalculated value considered at each stage never decreases, which should reduce repeated recalculation.
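A stripped-down version of that idea can be sketched as follows. Note that this is a plain repeat-until-nothing-changes relaxation, not the smallest-value-first ordering with tie-breaking described above, and it assumes non-negative costs. The function name cheapest_costs and the recipes layout (product mapped to a list of alternative input lists) are made up for illustration.

```python
import math

def cheapest_costs(base_costs, recipes):
    """Relax cost[p] = min over alternatives of sum(input costs) until stable."""
    cost = {p: math.inf for p in recipes}  # unknown products start at infinity
    cost.update(base_costs)                # products supplied as inputs
    changed = True
    while changed:
        changed = False
        for product, alternatives in recipes.items():
            for inputs in alternatives:
                candidate = sum(cost.get(i, math.inf) for i in inputs)
                if candidate < cost[product]:
                    cost[product] = candidate
                    changed = True
    return cost

# The example from the question, with every raw product costing 1 unit.
base_costs = {"a": 1, "b": 1, "x": 1, "y": 1}
recipes = {
    "c": [["a", "b"]],
    "z": [["x", "y", "c"]],   # note the extra c
    "g": [["c", "z"]],
}
costs = cheapest_costs(base_costs, recipes)
```

Here g comes out at 6 units, which reflects that c has to be made twice: once to be consumed by z and once to be consumed by g.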
Another way would be to turn this into a linear program and throw it at a highly optimized guaranteed polynomial time (at least in practice) linear programming solver.
a = min(b+c, d+e, f+g)
becomes
a = b+c-x
a = d+e-y
a = f+g-z
x >= 0
y >= 0
z >= 0
minimize sum(x+y+z+....)

Minimal data structure to prevent a 2D-grid traveler from repeating itself

I'm sorry if this is a duplicate of some thread, but I'm really not sure how to describe the question.
I'm wondering what the minimal data structure is to prevent a 2D-grid traveler from repeating itself (i.e. traveling to some point it has already visited). The traveler can only move horizontally or vertically, 1 step at a time. For my special case (below), the 2D grid is actually a lower-left triangle where one coordinate never exceeds the other.
For example, in the 1D case, this can be done simply by recording the direction of the last move. If the direction changes, it's repeating itself.
For the 2D case it becomes complicated. The most trivial way would be to keep a list of the points visited so far, but I'm wondering whether there are more efficient ways to do that.
I'm implementing a more-or-less "4-finger" algorithm for 4-sum where the 2 fingers in the middle move in two directions (the fingers are i, j, k, and l):
i=> <=j=> <=k=> <=l
1 2 3 ... 71 72 ... 123 124 ... 201 202 203
The directions the fingers travel are decided (or suggested) by some algorithm, but this might lead to an infinite loop. Therefore, I have to refuse a suggestion if the 2 fingers in the middle start to repeat a previous position.
EDIT
Over the past few days, I found 2 solutions. Neither is an ideal solution to this problem, but they're at least somewhat usable:
As @Sorin mentioned below, one solution would be to save a bit array representing the state of all cells. For the triangular-grid example here, we can even condense the array to cut the memory cost by half (though this requires k^2 time to compute the bit position, where k is the number of degrees of freedom, i.e. 2 here; a standard array would use only linear time).
Another solution would be to avoid backward travel directly: set up the algorithm such that j and k only move in one direction (this is probably greedy).
But still, since the 2D-grid traveler has the nice property that it moves along an axis 1 step at a time, I'm wondering whether there are more "specialized" representations for this kind of movement.
Thanks for your help!
If you are looking for optimal lookup complexity, then a hashset is the best thing. You need O(N) memory but all lookups & insertions will be O(1).
If it's often that you visit most of the cells then you can even skip the hash part and store a bit array. That is store one bit for every cell and just check if the corresponding bit is 0 or 1. This is much more compact in memory (at least 32x, one bit vs. one int, but likely more as you also skip storing some pointers internal to the datastructure, 64 bits).
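For the lower-left triangle in the question, the bit array can be packed so that only cells with j <= i get a bit. A minimal sketch, assuming 0-based coordinates with 0 <= j <= i < n; the class name TriangularVisited is made up for illustration.

```python
class TriangularVisited:
    """One bit per cell (i, j) with 0 <= j <= i < n, packed into a bytearray."""

    def __init__(self, n):
        self.n = n
        cells = n * (n + 1) // 2               # number of cells in the triangle
        self._bits = bytearray((cells + 7) // 8)

    def _index(self, i, j):
        if not (0 <= j <= i < self.n):
            raise IndexError("cell outside the lower-left triangle")
        return i * (i + 1) // 2 + j            # rows 0..i-1 hold i*(i+1)/2 cells

    def visit(self, i, j):
        """Mark (i, j) as visited; return True if it had been visited before."""
        k = self._index(i, j)
        byte, mask = k >> 3, 1 << (k & 7)
        seen = bool(self._bits[byte] & mask)
        self._bits[byte] |= mask
        return seen

visited = TriangularVisited(204)
first_time = visited.visit(123, 72)    # first visit to this (k, j) pair
second_time = visited.visit(123, 72)   # the traveler is repeating itself
```

In the 4-finger setting the two middle fingers would supply (i, j) here, and a True return value is the signal to reject the suggested move.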
If this still takes too much space, you could use a Bloom filter, but that will give you some false positives (it tells you that you've visited a cell when in fact you haven't). If that's something you can live with, the space savings are fairly huge.
Other structures like BSP or Kd-trees could work as well. Once you reach a point where everything is either free or occupied (ignoring the unused cells in the upper triangle) you can store all that information in a single node.
This is hard to recommend because of its complexity and because it will likely also use O(N) memory in many cases, but with a larger constant. Also, all checks will be O(log N).

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to perform:
For each delivery, the pickup needs to come before the dropoff.
The vehicle is quite small and the packages vary in size. The total carriage cannot exceed some upper bound (e.g. 1 cubic metre).
Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution:
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
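One simple way to experiment with such moves is to apply a swap and re-check the whole schedule, rather than encoding each rule separately. The sketch below does that for precedence and capacity only (deadlines are not checked). The stop representation, the function names and the sample volumes are all assumptions made for illustration.

```python
def is_feasible(stops, size, capacity, carried=()):
    """Check precedence and capacity for stops = [("pickup" | "dropoff", job), ...]."""
    on_board = set(carried)                 # jobs picked up before planning began
    load = sum(size[j] for j in on_board)
    if load > capacity:
        return False
    for kind, job in stops:
        if kind == "pickup":
            if job in on_board:
                return False                # picked up twice
            on_board.add(job)
            load += size[job]
            if load > capacity:
                return False                # vehicle would be over capacity
        else:
            if job not in on_board:
                return False                # dropoff before its pickup
            on_board.remove(job)
            load -= size[job]
    return True

def swap_adjacent_if_feasible(stops, i, size, capacity, carried=()):
    """Return a copy with stops i and i+1 swapped, or None if a constraint breaks."""
    candidate = list(stops)
    candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
    return candidate if is_feasible(candidate, size, capacity, carried) else None

size = {"A": 600, "B": 400, "C": 500}       # hypothetical volumes in litres
route = [("pickup", "A"), ("dropoff", "A"), ("pickup", "B"), ("dropoff", "B")]
swapped = swap_adjacent_if_feasible(route, 1, size, capacity=1000)
```

Swapping the dropoff of A with the pickup of B works here because the free space before the dropoff (400) is at least the size of B, which is the f >= p condition above. With item C (500) in place of B the same swap is rejected.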
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
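As a small illustration of that scoring idea, the sketch below ranks a few made-up candidates by the weighted score. It also shows a tuple key as an alternative: Python compares tuples element by element, so the same ordering is obtained without having to assume that total_travel_time stays below the weight. The function names and sample numbers are hypothetical.

```python
def score(num_deadline_violations, total_travel_time, weight=1000):
    """Weighted score: violations dominate as long as travel time < weight."""
    return weight * num_deadline_violations + total_travel_time

def score_key(num_deadline_violations, total_travel_time):
    """Alternative: tuples compare lexicographically, so no weight is needed."""
    return (num_deadline_violations, total_travel_time)

# Hypothetical candidates as (late deliveries, travel time) pairs.
candidates = [(1, 120.0), (0, 950.0), (0, 400.0), (2, 50.0)]
ranked = sorted(candidates, key=lambda c: score(*c))
ranked_by_tuple = sorted(candidates, key=lambda c: score_key(*c))
```

Both orderings agree on this data; they would diverge only if a travel time reached the weight of 1000.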
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
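A minimal sketch of that pool handling with Python's heapq follows. It assumes each pool entry is a (score, solution) tuple with the solution stored as a tuple of stops, so that identical solutions compare equal and sort next to each other even among entries with tied scores. The helper names are made up for illustration.

```python
import heapq

def pop_best_unique(pool, last=None):
    """Pop the best entry, discarding copies of the previously extracted one."""
    while pool:
        entry = heapq.heappop(pool)
        if entry != last:
            return entry
    return None  # pool exhausted

def trim_pool(pool, k):
    """Keep only the k best entries; a sorted list is already a valid min-heap."""
    return heapq.nsmallest(k, pool)

# Hypothetical pool: (score, solution) with solutions as tuples of stop names.
pool = [(5, ("a", "b")), (3, ("b", "a")), (3, ("b", "a")), (4, ("a", "c"))]
heapq.heapify(pool)
first = pop_best_unique(pool)           # best solution
second = pop_best_unique(pool, first)   # skips the duplicate of `first`
```

Calling trim_pool whenever the pool has doubled in size gives the periodic rebuild described above.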
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here actually seems to be the Knapsack Problem, rather than the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.

Exploration Algorithm

Massively edited this question to make it easier to understand.
Given an environment with arbitrary dimensions and arbitrary positioning of an arbitrary number of obstacles, I have an agent exploring the environment with a limited range of sight (obstacles don't block sight). It can move in the four cardinal directions of NSEW, one cell at a time, and the graph is unweighted (each step has a cost of 1). Linked below is a map representing the agent's (yellow guy) current belief of the environment at the instant of planning. Time does not pass in the simulation while the agent is planning.
http://imagizer.imageshack.us/a/img913/9274/qRsazT.jpg
What exploration algorithm can I use to maximise the cost-efficiency of utility, given that revisiting cells is allowed? Each cell holds a utility value. Ideally, I would seek to maximise the sum of utility of all cells SEEN (not visited) divided by the path length, although if that is too complex for any suitable algorithm then the number of cells seen will suffice. There is a maximum path length but it is generally in the hundreds or higher. (The actual test environments used on my agent are at least 4x bigger, although theoretically there is no upper bound on the dimensions that can be set, and the maximum path length would thus increase accordingly.)
I consider BFS and DFS to be intractable, A* to be non-optimal given a lack of suitable heuristics, and Dijkstra's inappropriate for generating a single unbroken path. Is there any algorithm you can think of? Also, I need help with loop detection, which I've never done before, since this is the first time I'm allowing revisits.
One approach I have considered is to reduce the map into a spanning tree, except that instead of defining it as a tree that connects all cells, it is defined as a tree that can see all cells. My approach would result in the following:
http://imagizer.imageshack.us/a/img910/3050/HGu40d.jpg
In the resultant tree, the agent can go from a node to any adjacent nodes that are 0-1 turn away at intersections. This is as far as my thinking has gotten right now. A solution generated using this tree may not be optimal, but it should at least be near-optimal with much fewer cells being processed by the algorithm, so if that would make the algorithm more likely to be tractable, then I guess that is an acceptable trade-off. I'm still stuck with thinking how exactly to generate a path for this however.
Your problem is very similar to a canonical Reinforcement Learning (RL) problem, the Grid World. I would formalize it as a standard Markov Decision Process (MDP) and use any RL algorithm to solve it.
The formalization would be:
States s: your NxM discrete grid.
Actions a: UP, DOWN, LEFT, RIGHT.
Reward r: the value of the cells that the agent can see from the destination cell s', i.e. r(s,a,s') = sum(value(seen(s'))).
Transition function: P(s' | s, a) = 1 if s' is not out of the boundaries or a black cell, 0 otherwise.
Since you are interested in the average reward, the discount factor is 1 and you have to normalize the cumulative reward by the number of steps. You also said that each step has cost one, so you could subtract 1 from the immediate reward r at each time step, but this would not add anything since you will already average by the number of steps.
Since the problem is discrete the policy could be a simple softmax (or Gibbs) distribution.
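A softmax (Gibbs) policy over the four actions can be sketched as follows. The per-action preference values would come from whatever the learning algorithm maintains for the current state; here they are simply made-up numbers, and the function names are hypothetical.

```python
import math
import random

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

def softmax_probs(preferences, temperature=1.0):
    """Turn per-action preferences into probabilities (Gibbs distribution)."""
    top = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp((p - top) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(preferences, temperature=1.0, rng=random):
    """Draw one action; a high temperature explores, a low one exploits."""
    probs = softmax_probs(preferences, temperature)
    return rng.choices(ACTIONS, weights=probs)[0]

# Hypothetical preferences for one state: RIGHT currently looks best.
prefs = [0.2, 0.1, 0.0, 1.5]
probs = softmax_probs(prefs, temperature=0.5)
action = sample_action(prefs, temperature=0.5)
```

Lowering the temperature over time is one common way to move from an almost uniform random policy towards a nearly deterministic one.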
As a solving algorithm you can use Q-learning, which guarantees the optimality of the solution provided a sufficient number of samples. However, if your grid is too big (and you said that there is no limit) I would suggest policy search algorithms, like policy gradient or relative entropy (although they guarantee convergence only to local optima). You can find something about Q-learning basically everywhere on the Internet. For a recent survey on policy search I suggest this.
The cool thing about these approaches is that they encode the exploration in the policy (e.g., the temperature in a softmax policy, the variance in a Gaussian distribution) and will try to maximize the cumulative long term reward as described by your MDP. So usually you initialize your policy with a high exploration (e.g., a completely random policy) and by trial and error the algorithm will make it deterministic and converge to the optimal one (however, sometimes a stochastic policy is also optimal).
The main difference between all the RL algorithms is how they perform the update of the policy at each iteration and manage the tradeoff exploration-exploitation (how much should I explore VS how much should I exploit the information I already have).
As suggested by Demplo, you could also use Genetic Algorithms (GA), but they are usually slower and require more tuning (elitism, crossover, mutation...).
I have also tried some policy search algorithms on your problem and they seem to work well, although I initialized the grid randomly and do not know the exact optimal solution. If you provide some additional details (a test grid, the max number of steps and if the initial position is fixed or random) I can test them more precisely.

A* Algorithm for very large graphs, any thoughts on caching shortcuts?

I'm writing a courier/logistics simulation on OpenStreetMap maps and have realised that the basic A* algorithm as pictured below is not going to be fast enough for large maps (like Greater London).
The green nodes correspond to ones that were put in the open set/priority queue and due to the huge number (the whole map is something like 1-2 million), it takes 5 seconds or so to find the route pictured. Unfortunately 100ms per route is about my absolute limit.
Currently, the nodes are stored in both an adjacency list and also a spatial 100x100 2D array.
I'm looking for methods where I can trade off preprocessing time, space and if needed optimality of the route, for faster queries. The straight-line Haversine formula for the heuristic cost is the most expensive function according to the profiler - I have optimised my basic A* as much as I can.
For example, I was thinking if I chose an arbitrary node X from each quadrant of the 2D array and run A* between each, I can store the routes to disk for subsequent simulations. When querying, I can run A* search only in the quadrants, to get between the precomputed route and the X.
Is there a more refined version of what I've described above or perhaps a different method I should pursue. Many thanks!
For the record, here are some benchmark results for arbitrarily weighting the heuristic cost and computing the path between 10 pairs of randomly picked nodes:
Weight   AvgDist (ratio)   Time (ms)
1        1                 1461.2
1.05     1                 1327.2
1.1      1                 900.7
1.2      1.019658848       196.4
1.3      1.027619169       53.6
1.4      1.044714394       33.6
1.5      1.063963413       25.5
1.6      1.071694171       24.1
1.7      1.084093229       24.3
1.8      1.092208509       22
1.9      1.109188175       22.5
2        1.122856792       18.2
2.2      1.131574742       16.9
2.4      1.139104895       15.4
2.6      1.140021962       16
2.8      1.14088128        15.5
3        1.156303676       16
4        1.20256964        13
5        1.19610861        12.9
Surprisingly, increasing the coefficient to 1.1 almost halved the execution time whilst keeping the same route.
You should be able to make it much faster by trading off optimality. See Admissibility and optimality on wikipedia.
The idea is to use an epsilon value which will lead to a solution no worse than 1 + epsilon times the optimal path, but which will cause fewer nodes to be considered by the algorithm. Note that this does not mean that the returned solution will always be 1 + epsilon times the optimal path. This is just the worst case. I don't know exactly how it would behave in practice for your problem, but I think it is worth exploring.
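A minimal sketch of the weighted variant is below: the only change from plain A* is that the open set is ordered by g + weight * h, with weight = 1 + epsilon. The toy grid, the wall and all function names are assumptions made for illustration, not the OpenStreetMap setup from the question.

```python
import heapq

def weighted_a_star(neighbors, h, start, goal, weight=1.0):
    """A* ordering the open set by g + weight * h. Returns (cost, expanded)."""
    open_heap = [(weight * h(start), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best_g.get(node, float("inf")):
            continue                      # stale queue entry
        if node == goal:
            return g, expanded
        expanded += 1
        for nxt, step in neighbors(node):
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + weight * h(nxt), new_g, nxt))
    return None, expanded

BLOCKED = {(10, y) for y in range(18)}    # hypothetical wall with a gap at the top
GOAL = (19, 19)

def grid_neighbors(node):
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 20 and 0 <= ny < 20 and (nx, ny) not in BLOCKED:
            yield (nx, ny), 1

def manhattan(node):
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])

optimal, n_opt = weighted_a_star(grid_neighbors, manhattan, (0, 0), GOAL, weight=1.0)
bounded, n_bnd = weighted_a_star(grid_neighbors, manhattan, (0, 0), GOAL, weight=1.5)
```

Comparing n_opt and n_bnd on real data shows how many fewer nodes are expanded for a given weight, which is the trade-off the benchmark table in the question measures.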
You are given a number of algorithms that rely on this idea on wikipedia. I believe this is your best bet to improve the algorithm and that it has the potential to run in your time limit while still returning good paths.
Since your algorithm does deal with millions of nodes in 5 seconds, I assume you also use binary heaps for the implementation, correct? If you implemented them manually, make sure they are implemented as simple arrays and that they are binary heaps.
There are specialist algorithms for this problem that do a lot of pre-computation. From memory, the pre-computation adds information to the graph that A* uses to produce a much more accurate heuristic than straight line distance. Wikipedia gives the names of a number of methods at http://en.wikipedia.org/wiki/Shortest_path_problem#Road_networks and says that Hub Labelling is the leader. A quick search on this turns up http://research.microsoft.com/pubs/142356/HL-TR.pdf. An older one, using A*, is at http://research.microsoft.com/pubs/64505/goldberg-sp-wea07.pdf.
Do you really need to use Haversine? To cover London, I would have thought you could have assumed a flat earth and used Pythagoras, or stored the length of each link in the graph.
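To get a feel for how close the flat-earth shortcut is, here is a sketch comparing Haversine with an equirectangular (Pythagoras on a local projection) approximation. The Earth radius constant and the two sample coordinates in London are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def flat_earth_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: Pythagoras on a locally flat projection."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)

# Two hypothetical points roughly 13 km apart within Greater London.
a = (51.5074, -0.1278)
b = (51.5560, 0.0500)
exact = haversine_m(*a, *b)
approx = flat_earth_m(*a, *b)
relative_error = abs(approx - exact) / exact
```

The approximation needs one cosine and one square root per call instead of several trigonometric functions. If strict admissibility of the A* heuristic matters, it may be worth scaling the approximate value down slightly, since it is not guaranteed to be a lower bound.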
There's a really great article that Microsoft Research wrote on the subject:
http://research.microsoft.com/en-us/news/features/shortestpath-070709.aspx
The original paper is hosted here (PDF):
http://www.cc.gatech.edu/~thad/6601-gradAI-fall2012/02-search-Gutman04siam.pdf
Essentially there's a few things you can try:
Start from both the source and the destination. This helps to minimize the amount of wasted work that you'd perform when traversing from the source outwards towards the destination.
Use landmarks and highways. Essentially, find some positions in each map that are commonly taken paths and perform some pre-calculation to determine how to navigate efficiently between those points. If you can find a path from your source to a landmark, then to other landmarks, then to your destination, you can quickly find a viable route and optimize from there.
Explore algorithms like the "reach" algorithm. This helps to minimize the amount of work that you'll do when traversing the graph by minimizing the number of vertices that need to be considered in order to find a valid route.
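The first suggestion, searching from both ends, can be sketched as a bidirectional Dijkstra. This is a minimal illustration, not the method from the linked article. It assumes the graph is a dict of adjacency lists with non-negative weights, and the helper names are made up.

```python
import heapq
import math

def reverse_graph(graph):
    """Build the reversed graph needed by the backward search."""
    rg = {u: [] for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs:
            rg.setdefault(v, []).append((u, w))
    return rg

def bidirectional_dijkstra(graph, rgraph, source, target):
    """Shortest distance from source to target, or math.inf if unreachable."""
    if source == target:
        return 0.0
    dist = [{source: 0.0}, {target: 0.0}]        # [forward, backward]
    heaps = [[(0.0, source)], [(0.0, target)]]
    adj = [graph, rgraph]
    done = [set(), set()]
    best = math.inf                              # best source-target length seen so far
    while heaps[0] and heaps[1]:
        # No undiscovered path can be shorter than the two queue minima combined.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in done[side]:
            continue
        done[side].add(u)
        for v, w in adj[side].get(u, ()):
            nd = d + w
            if nd < dist[side].get(v, math.inf):
                dist[side][v] = nd
                heapq.heappush(heaps[side], (nd, v))
            if v in dist[1 - side]:
                # The two searches meet at v: record the combined length.
                best = min(best, dist[side][v] + dist[1 - side][v])
    return best
```

Each search only has to grow to roughly half the radius, so on a road network the two explored areas together are usually much smaller than the single area a one-directional search would cover.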
GraphHopper does two more things to get fast, non-heuristic and flexible routing (note: I'm the author, and you can try it online here)
A not so obvious optimization is to avoid 1:1 mapping of OSM nodes to internal nodes. Instead GraphHopper uses only junctions as nodes and saves roughly 1/8th of traversed nodes.
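To show the idea of keeping only junctions as routing nodes, here is a rough sketch. It is not GraphHopper's actual code. It assumes each way is a list of node ids and that seg_len(a, b) returns the length of one segment. Both names are made up for this example.

```python
from collections import Counter

def junction_edges(ways, seg_len):
    """Collapse each way into edges that run between junctions or way endpoints.

    ways:    list of node-id sequences (one per OSM-like way)
    seg_len: function (a, b) -> length of the segment between neighbouring nodes
    Returns a list of (from_node, to_node, total_length) tuples.
    """
    # A node that appears more than once across all ways is treated as a junction.
    uses = Counter(n for way in ways for n in way)
    edges = []
    for way in ways:
        if len(way) < 2:
            continue
        start, length = way[0], 0.0
        for i, (a, b) in enumerate(zip(way, way[1:])):
            length += seg_len(a, b)
            is_last = (i == len(way) - 2)
            if uses[b] > 1 or is_last:
                edges.append((start, b, length))
                start, length = b, 0.0
    return edges
```

The intermediate nodes are only needed to draw the route, so their geometry can be stored on the edge and kept out of the search entirely.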
It has efficient implementations of A*, Dijkstra and, for example, one-to-many Dijkstra, which makes a route through all of Germany possible in under 1s. The (non-heuristic) bidirectional version of A* makes this even faster.
So it should be possible to get you fast routes for greater London.
Additionally, the default mode is the speed mode, which makes everything orders of magnitude faster (e.g. 30ms for Europe-wide routes) but less flexible, as it requires preprocessing (Contraction Hierarchies). If you don't like this, just disable it, and also further fine-tune the included streets for cars or, probably better, create a new profile for trucks, e.g. exclude service streets and tracks, which should give you a further 30% boost. And as with any bidirectional algorithm, you could easily implement a parallel search.
I think it's worth working out your idea with "quadrants". More strictly, I'd call it a low-resolution route search.
You may pick X connected nodes that are close enough, and treat them as a single low-resolution node. Divide your whole graph into such groups, and you get a low-resolution graph. This is a preparation stage.
In order to compute a route from source to target, first identify the low-res nodes they belong to, and find the low-resolution route. Then improve your result by finding the route on the high-resolution graph, restricting the algorithm to nodes that belong to the low-resolution nodes of the low-resolution route (optionally you may also consider neighbor low-resolution nodes up to some depth).
This may also be generalized to multiple resolutions, not just high/low.
In the end you should get a route that is close enough to optimal. It's locally optimal, but may be somewhat worse than the global optimum, to an extent that depends on the resolution jump (i.e. the approximation you make when a group of nodes is defined as a single node).
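Here is a rough sketch of the two-stage search described above. It makes several simplifying assumptions: cell_of maps every node to a precomputed group id, the low-resolution route is found with a plain BFS over groups (ignoring distances), and no neighbor groups are added. All names are made up for this example.

```python
import heapq
import math
from collections import deque

def coarse_cell_path(graph, cell_of, src_cell, dst_cell):
    """BFS over low-res nodes; two cells are adjacent if some edge crosses between them."""
    cell_adj = {}
    for u, nbrs in graph.items():
        for v, _w in nbrs:
            cu, cv = cell_of[u], cell_of[v]
            if cu != cv:
                cell_adj.setdefault(cu, set()).add(cv)
    parent = {src_cell: None}
    queue = deque([src_cell])
    while queue:
        c = queue.popleft()
        if c == dst_cell:
            break
        for n in cell_adj.get(c, ()):
            if n not in parent:
                parent[n] = c
                queue.append(n)
    if dst_cell not in parent:
        return []
    path, c = [], dst_cell
    while c is not None:
        path.append(c)
        c = parent[c]
    return path[::-1]

def restricted_dijkstra(graph, cell_of, allowed_cells, source, target):
    """Dijkstra on the full graph that never enters a node outside allowed_cells."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            if cell_of[v] not in allowed_cells:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

def low_res_route(graph, cell_of, source, target):
    cells = coarse_cell_path(graph, cell_of, cell_of[source], cell_of[target])
    return restricted_dijkstra(graph, cell_of, set(cells), source, target)
```

Because the fine search is confined to the corridor, a shortcut through a group that is not on the low-resolution route is never seen. That is exactly the loss of global optimality mentioned above.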
There are dozens of A* variations that may fit the bill here. You have to think about your use cases, though.
Are you memory- (and also cache-) constrained?
Can you parallelize the search?
Will your algorithm implementation be used in one location only (e.g. Greater London and not NYC or Mumbai or wherever)?
There's no way for us to know all the details that you and your employer are privy to. Your first stop thus should be CiteSeer or Google Scholar: look for papers that treat pathfinding with the same general set of constraints as you.
Then downselect to three or four algorithms, do the prototyping, test how they scale up and finetune them. You should bear in mind you can combine various algorithms in the same grand pathfinding routine based on distance between the points, time remaining, or any other factors.
As has already been said, given the small scale of your target area, dropping Haversine is probably your first step, saving precious time on expensive trig evaluations. NOTE: I do not recommend using Euclidean distance in lat, lon coordinates - reproject your map into, e.g., a transverse Mercator projection centered near the middle of the area and use Cartesian coordinates in yards or meters!
Precomputing is the second one, and changing compilers may be an obvious third idea (switch to C or C++ - see https://benchmarksgame.alioth.debian.org/ for details).
Extra optimization steps may include getting rid of dynamic memory allocation, and using efficient indexing for search among the nodes (think R-tree and its derivatives/alternatives).
I worked at a major Navigation company, so I can say with confidence that 100 ms should get you a route from London to Athens even on an embedded device. Greater London would be a test map for us, as it's conveniently small (easily fits in RAM - this isn't actually necessary)
First off, A* is entirely outdated. Its main benefit is that it "technically" doesn't require preprocessing. In practice, you need to pre-process an OSM map anyway so that's a pointless benefit.
The main technique to give you a huge speed boost is arc flags. If you divide the map into, say, 5x6 sections, you can allocate 1 bit position in a 32-bit integer for each section. You can now determine for each edge whether it's ever useful when traveling to section {X,Y} from another section. Quite often, roads are bidirectional and this means only one of the two directions is useful. So one of the two directions has that bit set, and the other has it cleared. This may not appear to be a real benefit, but it means that on many intersections you reduce the number of choices to consider from 2 to just 1, and this takes just a single bit operation.
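The query side of arc flags can be sketched as follows. The precomputation of the flags is assumed to have been done elsewhere. The 5x6 grid comes from the text above, but the graph layout and function names are made up for this example.

```python
import heapq
import math

COLS, ROWS = 6, 5  # 30 sections, so one bit each fits in a 32-bit integer

def region_bit(col, row):
    """Bit mask for the section at (col, row)."""
    return 1 << (row * COLS + col)

def dijkstra_arc_flags(graph, source, target, target_bit):
    """Dijkstra that skips edges whose flag for the target's section is cleared.

    graph: node -> list of (neighbour, weight, flags), where flags is an int
           with bit k set if the edge is useful for reaching section k.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w, flags in graph.get(u, ()):
            if not flags & target_bit:
                continue  # a single bit test prunes this edge
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf
```

The pruning is only as good as the flags, so the sketch returns correct distances only if the precomputation set the bit on every edge that lies on some shortest path into the target section.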
A* usually struggles with memory consumption rather than running time.
However, I think it could be useful to first compute only with nodes that are part of "big streets"; you would usually choose a highway over a tiny alley.
I guess you may already use this for your weight function, but you can be faster if you use a priority queue to decide which node to test next.
Also, you could try reducing the graph to only nodes that are part of low-cost edges, and then find a way from the start and from the end to the closest of these nodes.
So you have 2 paths: from the start to the "big street", and from the "big street" to the end.
You can now compute the best path between the two nodes that are part of the "big streets" in a reduced graph.
Old question, but yet:
Try using heaps other than the binary heap. The heap with the best asymptotic complexity is definitely the Fibonacci heap, and its wiki page has a nice overview:
https://en.wikipedia.org/wiki/Fibonacci_heap#Summary_of_running_times
Note that a binary heap has simpler code and is implemented over an array, and array traversal is predictable, so a modern CPU executes binary heap operations much faster.
However, given a big enough dataset, other heaps will win over the binary heap because of their better asymptotic complexities...
The dataset in this question seems big enough.

Resources