AI: Fastest algorithm to find if path exists?

I am looking for a pathfinding algorithm to use for an AI controlling an entity in a 2D grid that needs to find a path from A to B. It does not have to be the shortest path but it needs to be calculated very fast. The grid is static (never changes) and some grid cells are occupied by obstacles.
I'm currently using A*, but it is too slow for my purposes because it always tries to find the shortest path. The main performance problem occurs when no path exists, in which case A* ends up exploring far too many cells.
Is there a different algorithm I could use that could find a path faster than A* if the path doesn't have to be the shortest path?
Thanks,
Luminal

Assuming your grid is static and never changes, you can calculate the connected components of your graph once after building the grid.
Then you can easily check whether the source and target vertices are in the same component. If yes, execute A*; if not, skip the search entirely, as there can't be a path between components.
You can get the connected components of a graph using BFS or DFS.
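A minimal sketch of that precomputation in Python (the grid encoding, 4-connectivity, and function names are assumptions for illustration):

from collections import deque

def label_components(walkable):
    # walkable: 2D list of bools; returns a grid of component ids (-1 = obstacle)
    rows, cols = len(walkable), len(walkable[0])
    comp = [[-1] * cols for _ in range(rows)]
    current = 0
    for sr in range(rows):
        for sc in range(cols):
            if walkable[sr][sc] and comp[sr][sc] == -1:
                queue = deque([(sr, sc)])   # BFS flood fill of one component
                comp[sr][sc] = current
                while queue:
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and walkable[nr][nc] and comp[nr][nc] == -1):
                            comp[nr][nc] = current
                            queue.append((nr, nc))
                current += 1
    return comp

def may_have_path(comp, a, b):
    # O(1) check: run A* only when this returns True
    return comp[a[0]][a[1]] == comp[b[0]][b[1]] != -1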

To find a path instead of the shortest path, use any graph traversal (e.g. depth-first or best-first). It won't necessarily be faster, in fact it may check many more nodes than A* on some graphs, so it depends on your data. However, it will be easier to implement and the constant factors will be significantly lower.
To avoid searching for a path when there is none, you could create disjoint sets (once, after you build the graph) to very quickly check whether two given points are connected. This takes linear space and linear time to build, and lookup takes amortized practically-constant time, but you still need to run your full algorithm at times, as it will only tell you whether there is a path, not where that path goes.
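A sketch of such a disjoint-set (union-find) structure with path compression and union by rank; indexing cells as integers is an assumption:

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:   # union by rank
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

Build it once by calling union on every pair of adjacent walkable cells; afterwards find(u) == find(v) answers "is there a path?" in amortized near-constant time.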
If you're already building data structures beforehand, and have a bit more time and space to trade for instant shortest paths at run-time, you can have your cake and eat it too: The Floyd-Warshall algorithm gives you all shortest paths in comparatively modest O(|V|^3) time, which is the most bang for the buck considering there are |V|² (start, destination) pairs. It computes a |V| * |V| matrix, which could be a bit large, but consider that this is an integer matrix and you only need |V| * |V| * log_2 |V| bits (for example, that's 1.25 MiB for 1024 vertices).

You can use either DFS or BFS, since you just want to know whether the two vertices are connected. Both run in O(|V| + |E|), which is O(|V|) on a grid where each cell has at most four neighbors.
Use either of these two algorithms if your heuristic takes nontrivial time to compute; otherwise I think A* should run about as well as, or better than, DFS or BFS.
As another option you can use the Floyd-Warshall algorithm (O(|V|^3)) to calculate, after you create the grid, the shortest path between every pair of vertices, doing all the heavy lifting at the start of the simulation and then storing all shortest paths in a hash for O(1) access. If this turns out to be too memory-hungry, you can instead keep a matrix next such that next[i][j] stores the first vertex to move to on the way from vertex i to vertex j. The path from i to j is then (i, k1 = next[i][j]), (k1, k2 = next[k1][j]), ..., (kr, j).
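A sketch of Floyd-Warshall with the next matrix and path reconstruction described above; the 0/1 adjacency-matrix input is an assumption:

INF = float('inf')

def floyd_warshall(adj):
    n = len(adj)
    dist = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
            for i in range(n)]
    nxt = [[i if i == j else (j if adj[i][j] else None) for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]   # first step of the i -> k -> j route
    return dist, nxt

def reconstruct(nxt, i, j):
    if nxt[i][j] is None:
        return None                         # no path between i and j
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path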

If the graph is small enough, you can precompute all shortest paths using the Floyd-Warshall algorithm. This takes O(|V|²) memory for storing the paths, and O(|V|³) time for the precomputation.
Obviously this is not an option for very large graphs. For those you should use Thomas's answer and precompute the connected components (takes linear time and memory) to avoid the most expensive A* searches.

A*, BFS, DFS, and all other search algorithms will have to check exactly the same number of nodes when there is no path, so "use DFS first" is not a good answer. And using Floyd-Warshall is completely unnecessary.
For static graphs, the solution is simple; see Thomas's answer. For non-static graphs, the problem is more complicated; see this answer for a good algorithm.

In case your maze never changes and any path that exists exists forever, could you not use a mapping algorithm which would find "regions" of your maze and store those in some format? The memory usage would be linear in the number of nodes or cells, and the speed is that of accessing and comparing two elements of an array.
The computation (splitting into regions) would probably be more time consuming, but it's done once at the beginning. Maybe you could adapt some flood fill algorithm for this purpose?
Not sure if it's clear from the above, but I think an example always helps. I hope you'll forgive me for using PHP syntax.
example (maze 5x5) ([] marks an obstacle):
   0 1 2 3 4
  +----------+
0 |[][] 2    |
1 |  [][]    |
2 |    []  []|
3 | 1  []    |
4 |    []    |
  +----------+
indexed regions (using numeric hash instead of 'x:y' may be better):
$regions=array(
'0:1'=>1,'0:2'=>1,'0:3'=>1,'0:4'=>1,'1:2'=>1,'1:3'=>1,'1:4'=>1,
'2:0'=>2,'3:0'=>2,'3:1'=>2,'3:2'=>2,'3:3'=>2,'3:4'=>2,'4:0'=>2,
'4:1'=>2,'4:3'=>2,'4:4'=>2
);
then your task is only to find whether your start and end point are both in the same region:
if ( $regions[$startPoint] == $regions[$endPoint] )
    pathExists();
Now, if I am not mistaken, the A* complexity (speed) depends on the distance between start and end points (or the length of the solution), and that may perhaps be used to speed up your search.
I would try to create some "junction nodes" (JN) in the maze. Those could be placed according to some function (so you can quickly find the nearest JN), and you would have the paths between all neighbouring JNs precalculated.
So then you need only search for a solution from the start point to its nearest JN and from the end point to its nearest JN (where all of them are in the same region (the above)). Now I can see a scenario (if the function isn't well chosen with regard to the complexity of the maze) that has several regions, some of which may have no JN at all, or where all your JNs fall onto obstacles in the maze... So it may be better to define those JNs manually if possible, or to make the JN-placing function non-trivial (taking into consideration the area of each region).
Once you reach the JNs, you may have the paths between them either indexed (to quickly retrieve the predefined path between the start and end JNs) or perform another A* pathfinding, except this time only on the set of junction nodes; as there are far fewer of those, this path search between JNs will be faster.

You may consider using an Anytime A* algorithm (ANA* or other variants).
These will start by performing a greedy best first search to find an initial path.
It will then make incremental improvements by re-running the search with the heuristic function weighted less and less.
You can call off the search at any time and get its best path found so far.
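A simplified sketch of that anytime scheme (not the exact ANA* algorithm): run weighted A* with a shrinking heuristic weight and keep the best path found so far. The astar(grid, start, goal, weight) helper, expanding nodes by f = g + weight * h, is an assumed routine:

def anytime_astar(grid, start, goal, astar, weights=(5.0, 2.0, 1.5, 1.0), out_of_time=None):
    best = None
    for w in weights:                          # greediest (largest weight) first
        path = astar(grid, start, goal, weight=w)
        if path is not None and (best is None or len(path) < len(best)):
            best = path                        # keep the best path found so far
        if out_of_time is not None and out_of_time():
            break                              # the caller can call off the search at any time
    return best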

Related

create a path between the source and the destination in MATLAB

I want to simulate an algorithm using MATLAB. This algorithm is designed to find the optimal path among different paths, to be used for routing. The algorithm starts at a specific node (the source), then moves to a node that is a neighbor of the source node, and so on until it reaches the destination. I have an n by n matrix (n is the number of nodes) that contains 0s and 1s and is used to determine whether one node is a neighbor of another (it has a value of 1 if two nodes are neighbors). So how can I create the different paths from the source to the destination?
That's related to the mathematical theory of graphs. Are you familiar with it?
I would say using the BFS algorithm is going to be the easiest way. Among its applications you'll find "finding the shortest path between two nodes u and v (with path length measured by number of edges)". You can find some theory about it here http://en.wikipedia.org/wiki/Breadth-first_search, along with the algorithm in pseudocode.
If it isn't quite good enough for your purpose, you can move to the best available one: Dijkstra's algorithm ( http://en.wikipedia.org/wiki/Dijkstras_algorithm ). But I don't think you need something that sophisticated.
Also, I would like to remind you that MATLAB isn't the best choice when you have to do loop-heavy work, since loops tend to be time consuming in it.
I hope this helps.
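For illustration (in Python rather than MATLAB), a BFS sketch over the n x n 0/1 neighbor matrix described in the question, returning one shortest path by hop count; the matrix A and 0-based node ids are assumptions:

from collections import deque

def bfs_path(A, source, dest):
    n = len(A)
    prev = [None] * n
    seen = [False] * n
    seen[source] = True
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == dest:                    # walk the prev chain back to the source
            path = [u]
            while prev[path[-1]] is not None:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in range(n):
            if A[u][v] == 1 and not seen[v]:
                seen[v] = True
                prev[v] = u
                queue.append(v)
    return None                          # no path exists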

Difference and advantages between dijkstra & A star [duplicate]

This question already has answers here:
How does Dijkstra's Algorithm and A-Star compare?
(12 answers)
Closed 4 years ago.
I read this:
http://en.wikipedia.org/wiki/A*_search_algorithm
It says A* is faster than using Dijkstra and uses best-first search to speed things up.
If I need the algorithm to run in milliseconds, when does A* become the most prominent choice?
From what I understand it does not necessarily return the best results.
If I need quick results, is it better to pre-compute the paths? It may take megabytes of space to store them.
It says A* is faster than using Dijkstra and uses best-first search to speed things up.
A* is basically an informed variation of Dijkstra.
A* is considered a "best first search" because it greedily chooses which vertex to explore next, according to the value of f(v) [f(v) = h(v) + g(v)] - where h is the heuristic and g is the cost so far.
Note that if you use an uninformative heuristic function (h(v) = 0 for each v), A* chooses which vertex to develop next according to the so-far cost g(v) alone, the same as Dijkstra's algorithm; so with h(v) = 0, A* reduces to Dijkstra's algorithm.
If I need the algorithm to run in milliseconds, when does A* become the most prominent choice?
Not quite; it depends on a lot of things. If you have a decent heuristic function, then from my personal experience, greedy best-first search (choosing according to the heuristic function alone) is usually significantly faster than A* (but not even near optimal).
From what I understand it does not necessarily return the best results.
A* is both complete (finds a path if one exists) and optimal (always finds the shortest path) if you use an admissible heuristic function. If your function is not admissible, all bets are off.
If I need quick results, is it better to pre-compute the paths? It may take megabytes of space to store them.
This is a common optimization done on some problems, for example on the 15-puzzle problem, but it is more advanced. A path from point A to point B is called a Macro. Some paths are very useful and should be remembered. A Machine Learning component is added to the algorithm in order to speed things up by remembering these Macros.
Note that the path from point A to point B here is usually not in the states graph, but in the problem itself (for example, how to move a square from the lowest line to the upper line...)
To speed things up:
If you have a heuristic and you find it too slow and you want a quicker solution, even if not optimal, A* Epsilon is usually faster than A*, while giving you a bound on the optimality of the path (how close it is to being optimal).
Dijkstra is a special case of A* (when the heuristic is zero).
A* search:
It has two cost functions.
g(n): same as Dijkstra. The real cost to reach a node n.
h(n): approximate cost from node n to the goal node. This is a heuristic function, and it should never overestimate the cost; that is, the real cost to reach the goal node from node n should be greater than or equal to h(n). This is called an admissible heuristic.
The total cost of each node is calculated by f(n)=g(n)+h(n)
Dijkstra's:
It has one cost function, which is the real cost from the source to each node: f(n)=g(n)
It finds the shortest path from source to every other node by considering only real cost.
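A compact sketch of A*, as defined above, on a 4-connected grid with the admissible Manhattan heuristic; the grid encoding (a set of blocked cells) and function names are assumptions:

import heapq

def astar(blocked, rows, cols, start, goal):
    def h(p):                                    # Manhattan distance never overestimates
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    g = {start: 0}
    prev = {}
    frontier = [(h(start), start)]               # priority = f(n) = g(n) + h(n)
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == goal:                            # rebuild the path from prev links
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        if f > g[u] + h(u):
            continue                             # stale queue entry, skip it
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + d[0], u[1] + d[1])
            if 0 <= v[0] < rows and 0 <= v[1] < cols and v not in blocked:
                if v not in g or g[u] + 1 < g[v]:
                    g[v] = g[u] + 1
                    prev[v] = u
                    heapq.heappush(frontier, (g[v] + h(v), v))
    return None                                  # with h = 0 this is exactly Dijkstra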
A* is just like Dijkstra; the only difference is that A* tries to look for a better path by using a heuristic function which gives priority to nodes that are supposed to be better than others, while Dijkstra just explores all possible paths.
Its optimality depends on the heuristic function used, so yes, it can return a non-optimal result; at the same time, the better the heuristic for your specific layout, the better the results (and possibly the speed).
It is meant to be faster than Dijkstra even if it requires more memory and more operations per node, since it explores a lot fewer nodes and the gain is good in any case.
Precomputing the paths could be the only way if you need real-time results and the graph is quite large, but usually you wish to pathfind the route less frequently (I'm assuming you want to calculate it often).
These algorithms can be used for pathfinding and graph traversal, the process of plotting an efficiently directed path between multiple points, called nodes.
The formula for A* is f = g + h, where g is the actual cost and h is the heuristic cost.
The formula for Dijkstra's is f = g; there is no heuristic cost. When we are using A* and the heuristic cost is 0, it is equal to Dijkstra's algorithm.
Short answer:
A* uses heuristics to optimize the search. That is, you are able to define a function that to some degree can estimate the cost from one node to the target. This is particularly useful when you are searching for a path on a geographical representation (map) where you can, for instance, guess the distance to the target from a given graph node. Hence, A* is typically used for pathfinding in games etc., whereas Dijkstra is used in more generic cases.
No, A* won't always give the best path.
If the heuristic is the "geographical" distance, the following example gives a non-optimal path:
[airport] - [road] - [start] -> [road] -> [road] -> [road] -> [road] -> [target] - [airport]
|----------------------------------------------------------------|

Generating a tower defense maze (longest maze with limited walls) - near-optimal heuristic?

In a tower defense game, you have an NxM grid with a start, a finish, and a number of walls.
Enemies take the shortest path from start to finish without passing through any walls (they aren't usually constrained to the grid, but for simplicity's sake let's say they are. In either case, they can't move through diagonal "holes")
The problem (for this question at least) is to place up to K additional walls to maximize the path the enemies have to take. For example, for K=14 (illustration omitted).
My intuition tells me this problem is NP-hard if (as I'm hoping to do) we generalize this to include waypoints that must be visited before moving to the finish, and possibly also without waypoints.
But, are there any decent heuristics out there for near-optimal solutions?
[Edit] I have posted a related question here.
I present a greedy approach that may be close to optimal (though I couldn't find an approximation factor). The idea is simple: we should block the cells which are in critical places of the maze. These places help measure the connectivity of the maze. We can consider vertex connectivity and find a minimum vertex cut which disconnects the start and finish (s, f). After that we remove some critical cells.
To turn it into a graph, take the dual of the maze. Find a minimum (s, f) vertex cut on this graph. Then we examine each vertex in this cut. We remove a vertex if its deletion increases the length of all s-f paths, or if it is on a minimum-length path from s to f. After eliminating a vertex, recursively repeat the above process, k times in total.
But there is an issue with this: we might remove a vertex which cuts every path from s to f. To prevent this we can weight the cutting node as high as possible: first compute the minimum (s, f) cut; if the cut is just one node, give that vertex a high weight like n^3 and compute the minimum (s, f) cut again. The single cutting vertex from the previous calculation won't belong to the new cut because of the weighting.
But if there is just one path between s and f (after some iterations) we can't improve it. In this case we can fall back to a normal greedy step, such as removing a node from one of the shortest paths from s to f which doesn't belong to any cut; after that we can again work with minimum vertex cuts.
The algorithm running time in each step is:
min-cut + path finding for all nodes in min-cut
O(min cut) + O(n^2)*O(number of nodes in min-cut)
And because the number of nodes in the min cut cannot be greater than O(n^2), in a very pessimistic situation the algorithm is O(k*n^4), but normally it shouldn't take more than O(k*n^3), because normally the min-cut algorithm dominates the path finding, and path finding normally doesn't take O(n^2).
I guess the greedy choice is a good start point for simulated annealing type algorithms.
P.S: Minimum vertex cut is similar to minimum edge cut, and a similar max-flow/min-cut approach can be applied to it: just split each vertex into two vertices, one Vi and one Vo (its input and output sides), joined by an edge; converting the undirected graph to a directed one is also not hard.
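A sketch of that computation using networkx, which implements minimum node cuts directly (internally via max-flow on the split Vi/Vo graph); modeling each walkable cell as a graph node is an assumption, and s and f must not be adjacent:

import networkx as nx

def grid_min_vertex_cut(walkable, s, f):
    G = nx.Graph()
    rows, cols = len(walkable), len(walkable[0])
    for r in range(rows):
        for c in range(cols):
            if walkable[r][c]:
                for dr, dc in ((1, 0), (0, 1)):    # link each cell to its right/down neighbor
                    nr, nc = r + dr, c + dc
                    if nr < rows and nc < cols and walkable[nr][nc]:
                        G.add_edge((r, c), (nr, nc))
    return nx.minimum_node_cut(G, s, f)            # set of cells whose removal separates s from f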
It can easily be shown (proof left as an exercise to the reader) that it is enough to search for solutions in which every one of the K blockades is put on the current minimum-length route. Note that if there are multiple minimal-length routes then all of them have to be considered. The reason is that if you don't put any of the remaining blockades on the current minimum-length route then it does not change; hence you can put the first available blockade on it immediately during search. This speeds up even a brute-force search.
But there are more optimizations. You can also always decide to place the next blockade so that it becomes the FIRST blockade on the current minimum-length route, i.e. you work so that if you place the blockade on the 10th square of the route, then you mark squares 1..9 as "permanently open" until you backtrack. This again saves an exponential number of placements to search during backtracking.
You can then apply heuristics to cut down the search space or to reorder it, e.g. first try those blockade placements that increase the length of the current minimum-length route the most. You can then run the backtracking algorithm for a limited amount of real-time and pick the best solution found thus far.
I believe we can reduce the contained maximum manifold problem to boolean satisfiability and show NP-completeness through dependency on this subproblem. Because of this, the algorithms spinning_plate provided are reasonable as heuristics, precomputing and machine learning are reasonable, and the trick becomes finding the best heuristic solution if we wish to blunder forward here.
Consider a board like the following:
..S........
#.#..#..###
...........
...........
..........F
This has many of the problems that cause greedy and gate-bound solutions to fail. If we look at that second row:
#.#..#..###
Our logic gates are, in 0-based 2D array ordered as [row][column]:
[1][4], [1][5], [1][6], [1][7], [1][8]
We can re-render this as an equation to satisfy the block:
if ([1][9] AND ([1][10] AND [1][11]) AND ([1][12] AND [1][13])):
traversal_cost = INFINITY; longest = False # Infinity does not qualify
Excepting infinity as an unsatisfiable case, we backtrack and rerender this as:
if ([1][14] AND ([1][15] AND [1][16]) AND [1][17]):
traversal_cost = 6; longest = True
And our hidden boolean relationship falls amongst all of these gates. You can also show that geometric proofs can't fractalize recursively, because we can always create a wall that's exactly N-1 width or height long, and this represents a critical part of the solution in all cases (therefore, divide and conquer won't help you).
Furthermore, because perturbations across different rows are significant:
..S........
#.#........
...#..#....
.......#..#
..........F
We can show that, without a complete set of computable geometric identities, the complete search space reduces itself to N-SAT.
By extension, we can also show that this is trivial to verify and non-polynomial to solve as the number of gates approaches infinity. Unsurprisingly, this is why tower defense games remain so fun for humans to play. Obviously, a more rigorous proof is desirable, but this is a skeletal start.
Do note that you can significantly reduce the n term in your n-choose-k relation. Because we can recursively show that each perturbation must lie on the critical path, and because the critical path is always computable in O(V+E) time (with a few optimizations to speed things up for each perturbation), you can significantly reduce your search space at a cost of a breadth-first search for each additional tower added to the board.
Because we may tolerably assume O(n^k) for a deterministic solution, a heuristic approach is reasonable. My advice thus falls somewhere between spinning_plate's answer and Soravux's, with an eye towards machine learning techniques applicable to the problem.
The 0th solution: Use a tolerable but suboptimal AI, in which spinning_plate provided two usable algorithms. Indeed, these approximate how many naive players approach the game, and this should be sufficient for simple play, albeit with a high degree of exploitability.
The 1st-order solution: Use a database. Given the problem formulation, you haven't quite demonstrated the need to compute the optimal solution on the fly. Therefore, if we relax the constraint of approaching a random board with no information, we can simply precompute the optimum for all K tolerable for each board. Obviously, this only works for a small number of boards: with V! potential board states for each configuration, we cannot tolerably precompute all optimums as V becomes very large.
The 2nd-order solution: Use a machine-learning step. Promote each step as you close a gap that results in a very high traversal cost, running until your algorithm converges or no more optimal solution can be found than greedy. A plethora of algorithms are applicable here, so I recommend chasing the classics and the literature for selecting the correct one that works within the constraints of your program.
The best heuristic may be a simple heat map generated by a locally state-aware, recursive depth-first traversal, sorting the results by most to least commonly traversed after the O(V^2) traversal. Proceeding through this output greedily identifies all bottlenecks, and doing so without making pathing impossible is entirely possible (checking this is O(V+E)).
Putting it all together, I'd try an intersection of these approaches, combining the heat map and critical path identities. I'd assume there's enough here to come up with a good, functional geometric proof that satisfies all of the constraints of the problem.
At the risk of stating the obvious, here's one algorithm:
1) Find the shortest path
2) Try blocking each node on that path and see which block results in the longest shortest path
3) Repeat K times
Naively, this will take O(K*(V + E log E)^2), but with a little work you could improve step 2 by only recalculating partial paths.
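A sketch of that greedy loop, assuming unit-cost moves so plain BFS suffices for the shortest path; shortest_path (a BFS helper returning a list of cells or None) and the extra set of placed walls are assumptions:

def greedy_walls(walkable, start, finish, k, shortest_path):
    extra = set()                              # walls placed so far
    for _ in range(k):
        path = shortest_path(walkable, extra, start, finish)
        if path is None:
            break                              # defensive: placements below always keep a path
        best_cell, best_len = None, len(path)
        for cell in path:
            if cell in (start, finish):
                continue
            trial = shortest_path(walkable, extra | {cell}, start, finish)
            if trial is not None and len(trial) > best_len:
                best_cell, best_len = cell, len(trial)   # longest resulting detour
        if best_cell is None:
            break                              # no single wall lengthens the path
        extra.add(best_cell)
    return extra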
As you mention, simply trying to break the path is difficult, because if most breaks simply add a length of 1 (or 2), it's hard to find the choke points that lead to big gains.
If you take the minimum vertex cut between the start and the end, you will find the choke points for the entire graph. One possible algorithm is this
1) Find the shortest path
2) Find the min-cut of the whole graph
3) Find the maximal contiguous node set that intersects one point on the path, block those.
4) Wash, rinse, repeat
3) is the big part and why this algorithm may perform badly, too. You could also try
the smallest node set that connects with other existing blocks.
finding all groupings of contiguous vertices in the vertex cut, and testing each of them for the longest path, a la the first algorithm
The last one is what might be most promising
Here is a thought. In your grid, group adjacent walls into islands and treat every island as a graph node. Distance between nodes is the minimal number of walls that is needed to connect them (to block the enemy).
In that case you can start maximizing the path length by blocking the cheapest arcs.
I have no idea if this would work, because you could make new islands using your points, but it could help work out where to put walls.
I suggest using a modified breadth first search with a K-length priority queue tracking the best K paths between each island.
I would, for every island of connected walls, pretend that it is a light (a special light that can only send out horizontal and vertical rays of light).
Use ray-tracing to see which other islands the light can hit
say island 1 (i1) hits i2, i3, i4, and i5, but doesn't hit i6, i7, ...
then you would have line(i1,i2), line(i1,i3), line(i1,i4) and line(i1,i5)
Mark the distance of all grid points to be infinity. Set the start point as 0.
Now use breadth first search from the start. For every grid point, mark its distance as the minimum distance of its neighbors plus one.
But.. here is the catch..
every time you get to a grid point that is on a line() between two islands, instead of recording the distance as the minimum of its neighbors, you need to make it a priority queue of length K, and record the K shortest paths to that line() from any of the other line()s.
This priority queue then stays the same until you get to the next line(), where it aggregates all priority queues going into that point.
You haven't shown the need for this algorithm to be realtime, but I may be wrong about this premise. You could then precalculate the block positions.
If you can do this beforehand and then simply make the AI build the maze rock by rock as if it were a kind of tree, you could use genetic algorithms to ease your need for heuristics. You would need to load some genetic algorithm framework and start with a population of non-movable blocks (your map) and randomly-placed movable blocks (blocks that the AI would place). Then you evolve the population by performing crossovers and mutations over the movable blocks, and evaluate the individuals by giving more reward to the longest calculated path. You would then simply have to write a resource-efficient path calculator, without the need for heuristics in your code. From the last generation of your evolution, you would take the highest-ranking individual, which would be your solution, thus your desired block pattern for this map.
Genetic algorithms are proven to take you, under ideal conditions, to a local maximum (or minimum) in reasonable time, which may be impossible to reach with analytic solutions on a sufficiently large data set (i.e. a big enough map in your situation).
You haven't stated the language in which you are going to develop this algorithm, so I can't propose frameworks that may perfectly suit your needs.
Note that if your map is dynamic, meaning that the map may change over tower defense iterations, you may want to avoid this technique since it may be too intensive to re-evolve an entire new population every wave.
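A minimal genetic-algorithm skeleton for the scheme described above; the representation (an individual = a set of K wall positions), the operators, and path_length (a BFS-based fitness returning None when the maze is sealed) are all assumptions:

import random

def evolve(free_cells, k, path_length, pop_size=50, generations=200):
    def fitness(ind):
        length = path_length(ind)
        return -1 if length is None else length    # punish sealed mazes

    def crossover(a, b):
        return set(random.sample(list(a | b), k))  # pick k walls from both parents

    def mutate(ind):
        child = set(ind)
        if random.random() < 0.3:                  # swap one wall for a random cell
            child.discard(random.choice(list(child)))
            child.add(random.choice(free_cells))
        return child

    population = [set(random.sample(free_cells, k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 5]        # keep the best fifth
        population = elite + [
            mutate(crossover(*random.sample(elite, 2)))
            for _ in range(pop_size - len(elite))
        ]
    return max(population, key=fitness)

A real implementation would also repair individuals whose size drifts below k after mutation, and stop early once fitness converges.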
I'm not at all an algorithms expert, but looking at the grid makes me wonder if Conway's game of life might somehow be useful for this. With a reasonable initial seed and well-chosen rules about birth and death of towers, you could try many seeds and subsequent generations thereof in a short period of time.
You already have a measure of fitness in the length of the creeps' path, so you could pick the best one accordingly. I don't know how well (if at all) it would approximate the best path, but it would be an interesting thing to use in a solution.

How would you modify BFS to find the shortest path from A to B, given that the graph is very large?

What I mean by "very large graph" is that each vertex has 1000 adjacent vertices, while in the final solution the distance from A to B is just 6 (say).
In such a situation, using the basic BFS algorithm would be wasteful, as it puts all 1000 of A's adjacent vertices in the queue, then in the next round 1000 for each of these, and so on; by the time I reach B I would have considered 1000^6 vertices.
Any Idea how to optimize? Or rather is there a way?
One easy thing to do is to work from both directions:
Start with nbrsA = {A} and nbrsB = {B}. At each step, do this:
get new neighbors of nbrsA, looking for B; if not found, set nbrsA = new neighbors
get new neighbors of nbrsB, looking for A; if not found, set nbrsB = new neighbors
compare nbrsA and nbrsB
If the graph is big but highly connected, then you'll save a fair amount of space this way, but it does cost some extra time to compare the sets of neighbors.
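A sketch of that bidirectional search; it grows the smaller frontier first, finishes a whole level before concluding, and returns the shortest distance. The neighbors(u) callable is an assumption:

def bidirectional_bfs(a, b, neighbors):
    if a == b:
        return 0
    dist_a, dist_b = {a: 0}, {b: 0}
    frontier_a, frontier_b = {a}, {b}
    while frontier_a and frontier_b:
        if len(frontier_a) > len(frontier_b):      # always expand the cheaper side
            frontier_a, frontier_b = frontier_b, frontier_a
            dist_a, dist_b = dist_b, dist_a
        best = None
        next_frontier = set()
        for u in frontier_a:
            for v in neighbors(u):
                if v in dist_b:                    # the two frontiers touch here
                    cand = dist_a[u] + 1 + dist_b[v]
                    best = cand if best is None else min(best, cand)
                elif v not in dist_a:
                    dist_a[v] = dist_a[u] + 1
                    next_frontier.add(v)
        if best is not None:
            return best                            # finish the level, then report the minimum
        frontier_a = next_frontier
    return None                                    # the two sides never meet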
You could change the BFS to use Dijkstra's algorithm. BFS and Dijkstra's are related in certain ways, so this modification should be acceptable. And since this was a phone screen, there's a good chance they wanted you to see the relationship there.
I agree, this is an interesting problem. As far as I can see, there are two things that might help you:
Search forwards and backwards at the same time. If you really had a minimum degree of 1000, then you would need to investigate on the order of 1000^d nodes in the d-th iteration of a BFS. Meeting in the middle effectively reduces 1000^(2d) to 2*1000^d.
At some point BFS will be too expensive in terms of memory consumption. To avoid this you can switch to 'iterative deepening': emulate a BFS by doing a depth-first search limited to depth 1, then a DFS limited to depth 2, etc. The overhead is a small constant factor (which of course is not desirable), but this way you can avoid memory problems while discovering nodes in the same order as a BFS would.
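A sketch of iterative deepening: depth-limited DFS, repeated with growing limits, so that memory stays O(depth) and the goal is first reached at its BFS depth; neighbors(u) is again an assumed callable:

def iddfs(start, goal, neighbors, max_depth=50):
    def dls(node, depth, on_path):
        if node == goal:
            return True
        if depth == 0:
            return False
        for v in neighbors(node):
            if v not in on_path:                 # avoid cycles along the current path
                on_path.add(v)
                if dls(v, depth - 1, on_path):
                    return True
                on_path.remove(v)
        return False

    for limit in range(max_depth + 1):
        if dls(start, limit, {start}):
            return limit                         # the length of a shortest path
    return None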
One of two things is true:
Those 1000 vertices are often shared, meaning once they're found in round 1 they will get ignored in round 2 drastically lowering your search field,
OR
Those 1000 vertices are unique, meaning you have millions or billions of nodes, meaning you're still on the expected scale of the algorithm.
Either way, not much to do about it.

Algorithm to find two points furthest away from each other

I'm looking for an algorithm to be used in a racing game I'm making. The map/level/track is randomly generated, so I need to find two locations, start and goal, that make use of most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions: up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the "bird's path", for lack of a better word: the path between A and B should be longer if there is a wall (or other blocking area) between them.
I'm unsure where to start; comments are very welcome, and proposed solutions are preferred in pseudocode.
Edit: Right. After looking through gs's code I gave it another shot. Instead of Python, this time I wrote it in C++. But still, even after reading up on Dijkstra's algorithm, flood fill, and Hosam Aly's solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. The full source is on pastie. The only interesting lines (I guess) are the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate the help if someone would be kind enough to point out the differences in the algorithms.
In Hosam Aly's algorithm, is the only difference that he scans from the borders instead of every node?
In Dijkstra's you keep track of and overwrite the distance walked, but not in flood fill; but that's about it?
Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 };
for each point p on the border
flood-fill all points in the map to find the most distant point
if newDistance > bestSolution.distance
bestSolution = { p, distantP, newDistance }
end if
end loop
I guess this would be in O(n^2). If I am not mistaken, it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 represents the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood-fill would access a point a maximum of 4 times (from all directions). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n^2). (If the map is square, the order would be O(16n^1.5).)
Update: as per the comments, since the map is more of a maze (than one with simple obstacles as I was thinking initially), you could use the same logic above, but checking all points in the map (as opposed to points on the border only). This should be in order of O(4n^2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, thus the coefficient in the formulas above.
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to @Georg for his review.
P.S. Any comments or corrections are welcome.
Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was slower by a great margin, probably because of the very limited number of edges in these graphs.
These were the times; each time the field size was quadrupled, and 3 out of 10 cells were obstacles.
Size        Hosam Aly    Floyd-Warshall
(10x10)     0m0.002s     0m0.007s
(20x20)     0m0.009s     0m0.307s
(40x40)     0m0.166s     0m22.052s
(80x80)     0m2.753s     -
(160x160)   0m48.028s    -
The time of Hosam Aly's algorithm seems to be quadratic, therefore I'd recommend using that algorithm.
Also, the memory consumption of Floyd-Warshall is O(n^2), clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.
It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.
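A sketch of those two sweeps, assuming a 2D boolean grid; each sweep is one BFS flood fill that tracks the farthest cell reached:

from collections import deque

def farthest_from(walkable, src):
    rows, cols = len(walkable), len(walkable[0])
    dist = {src: 0}
    queue = deque([src])
    far = src
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (r + dr, c + dc)
            if (0 <= v[0] < rows and 0 <= v[1] < cols
                    and walkable[v[0]][v[1]] and v not in dist):
                dist[v] = dist[(r, c)] + 1
                if dist[v] > dist[far]:
                    far = v
                queue.append(v)
    return far, dist[far]

def approx_most_distant_pair(walkable, seed):
    u, _ = farthest_from(walkable, seed)    # sweep 1: farthest from an arbitrary cell
    v, d = farthest_from(walkable, u)       # sweep 2: farthest from u
    return u, v, d                          # candidate start, goal, and path length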
I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(E log V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth first search, which is O(V). This can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper "On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs" [pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the n x n (0-1) adjacency matrix of an unweighted, undirected graph, G

All-Pairs-Distances(A):
    Z = A * A
    Let B be the n x n matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
    if b_ij = 1 for all i != j, return 2B - A   // base case
    T = All-Pairs-Distances(B)
    X = T * A
    Let D be the n x n matrix s.t. d_ij = 2*t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2*t_ij - 1
    return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)
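A hedged numpy transcription of that pseudocode, assuming an integer adjacency matrix of a connected graph:

import numpy as np

def all_pairs_distances(A):
    n = len(A)
    Z = A @ A
    B = (((A == 1) | (Z > 0)) & ~np.eye(n, dtype=bool)).astype(A.dtype)
    if B.sum() == n * n - n:               # every off-diagonal b_ij is 1: base case
        return 2 * B - A
    T = all_pairs_distances(B)
    X = T @ A
    deg = A.sum(axis=1)
    # d_ij = 2*t_ij, minus 1 wherever x_ij < t_ij * degree(j)
    return 2 * T - (X < T * deg[np.newaxis, :]).astype(A.dtype)

Calling D = all_pairs_distances(A) on an adjacency matrix built with dtype=np.int64 then yields the most distant pair via np.unravel_index(np.argmax(D), D.shape).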
Finished a Python mockup of the Dijkstra solution to the problem.
Code got a bit long so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
In the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
It doesn't seem to work though; it always picks the top-left and bottom-right corners as the longest path (58 tiles). Which of course is true when you don't have obstacles. But even after adding a couple of randomly placed ones, the program still finds that one the longest. Maybe it's still true; it's hard to test without more advanced shapes.
But maybe it can at least show my ambition.
Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth first search, you simply explore all edges once, in a way that guarantees that you find the shortest path to each node, i.e. by exploring the edges in the order you find them.
So basically the difference is that Dijkstra's may have to revise the tentative distances of nodes it has reached before (when a shorter route to them turns up) to make sure it is following the shortest route, while the breadth first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral, but with the outer end going back to the middle, you could have two points one at the heart of the spiral and the other in the end of the spiral, both in the middle!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).
Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)
If your objects (points) do not move frequently you can perform such a calculation in much less than O(n^3) time.
All you need is to break the space into large grids and pre-calculate the inter-grid distance. Then selecting point pairs that occupy most distant grids is a matter of simple table lookup. In the average case you will need to pair-wise check only a small set of objects.
This solution works if the distance metrics are continuous. Thus if, for example there are many barriers in the map (as in mazes), this method might fail.
