I am looking for a pathfinding algorithm to use for an AI controlling an entity in a 2D grid that needs to find a path from A to B. It does not have to be the shortest path but it needs to be calculated very fast. The grid is static (never changes) and some grid cells are occupied by obstacles.
I'm currently using A*, but it is too slow for my purposes because it always computes the shortest path. The main performance problem occurs when no path exists, in which case A* explores far too many cells.
Is there a different algorithm I could use that could find a path faster than A* if the path doesn't have to be the shortest path?
Thanks,
Luminal
Assuming your grid is static and doesn't change, you can calculate the connected components of your graph once, after building the grid.
Then you can easily check whether the source and target vertices are in the same component. If they are, run A*; if not, don't, since there can't be a path between different components.
You can get the connected components of a graph using BFS or DFS.
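A minimal Python sketch of that one-time precomputation, assuming the grid is a list of equal-length strings with '#' marking obstacles and 4-connected movement (both are my assumptions, not from the question). `label_components` and `may_have_path` are hypothetical names; the flood fill is a plain BFS.

```python
from collections import deque


def label_components(grid):
    """Label every open cell with a component id using BFS flood fill.

    grid: list of equal-length strings, '#' = obstacle, anything else = open.
    Returns a dict mapping (row, col) -> component id; obstacles are absent.
    """
    rows, cols = len(grid), len(grid[0])
    label = {}
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '#' or (r, c) in label:
                continue
            # Unlabelled open cell: flood-fill its whole component.
            label[(r, c)] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] != '#' and (nr, nc) not in label):
                        label[(nr, nc)] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return label


def may_have_path(label, start, goal):
    """True only if both cells are open and lie in the same component."""
    return start in label and goal in label and label[start] == label[goal]
```

For example, with a wall down the middle column, `may_have_path` answers in constant time and A* is only started when it returns True:

```python
grid = ["..#..",
        "..#..",
        "..#.."]
labels = label_components(grid)
may_have_path(labels, (0, 0), (2, 1))  # True: same side of the wall
may_have_path(labels, (0, 0), (0, 4))  # False: skip A* entirely
```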
To find any path rather than the shortest one, use any graph traversal (e.g. depth-first or best-first search). It won't necessarily be faster; in fact it may check many more nodes than A* on some graphs, so it depends on your data. However, it will be easier to implement and the constant factors will be significantly lower.
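As a rough illustration of the best-first option, here is a greedy best-first search in Python. It assumes the same hypothetical grid format ('#' = obstacle, 4-connected moves) and a Manhattan-distance heuristic; the function name is made up. It always expands the frontier cell that looks closest to the goal and ignores the cost travelled so far, which is why the path it returns is not guaranteed to be shortest.

```python
import heapq


def greedy_best_first(grid, start, goal):
    """Find *a* path (not necessarily the shortest) from start to goal.

    Expands the frontier cell with the smallest Manhattan distance to goal,
    ignoring the distance already travelled. Returns a list of (row, col)
    cells, or None if goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    came_from = {start: None}          # doubles as the visited set
    frontier = [(h(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:    # walk the parent links back to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and nxt not in came_from):
                came_from[nxt] = cell
                heapq.heappush(frontier, (h(nxt), nxt))
    return None
```

Note that when no path exists this still visits every cell reachable from the start, which is exactly why the connectivity precomputation is worth pairing with it.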
To avoid searching for a path when there is none, you could build disjoint sets (once, after you have built the graph) to check very quickly whether two given points are connected. This takes linear space and essentially linear time to build, and each lookup takes amortized, practically constant time. You still need to run your full algorithm when the points are connected, though, because disjoint sets only tell you whether a path exists, not where it goes.
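A minimal disjoint-set (union-find) sketch in Python, using path halving and union by size. The grid format ('#' = obstacle, 4-connected) and the helper names `build_connectivity` and `connected` are assumptions for illustration; each cell (r, c) is mapped to the integer r * cols + c.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:   # attach the smaller tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def build_connectivity(grid):
    """Union every open cell with its open right and down neighbours."""
    rows, cols = len(grid), len(grid[0])
    ds = DisjointSet(rows * cols)
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '#':
                continue
            if c + 1 < cols and grid[r][c + 1] != '#':
                ds.union(r * cols + c, r * cols + c + 1)
            if r + 1 < rows and grid[r + 1][c] != '#':
                ds.union(r * cols + c, (r + 1) * cols + c)
    return ds


def connected(grid, ds, a, b):
    """True if cells a and b are both open and in the same set."""
    cols = len(grid[0])
    if grid[a[0]][a[1]] == '#' or grid[b[0]][b[1]] == '#':
        return False
    return ds.find(a[0] * cols + a[1]) == ds.find(b[0] * cols + b[1])
```

Only the right and down neighbours are united, since the left and up unions were already made from the other cell.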
If you're already building data structures beforehand, and have a bit more time and space to trade for instant shortest paths at run-time, you can have your cake and eat it too: The Floyd-Warshall algorithm gives you all shortest paths in comparatively modest O(|V|^3) time, which is the most bang for the buck considering there are |V|² (start, destination) pairs. It computes a |V| * |V| matrix, which could be a bit large, but consider that this is an integer matrix and you only need |V| * |V| * log_2 |V| bits (for example, that's 1.25 MiB for 1024 vertices).
You can use either DFS or BFS, since you just want to know whether the two vertices are connected. Both algorithms run in O(|V| + |E|), which is O(|V|) on a grid because every cell has at most a constant number of neighbours.
Use either of these two algorithms if your heuristic takes non-trivial time to compute; otherwise I think A* should perform similarly to, or better than, DFS or BFS.
As another option, you can use the Floyd-Warshall algorithm (O(V^3)) after you create the grid to calculate the shortest path between each pair of vertices, doing all the heavy lifting at the start of the simulation. You can then store all shortest paths in a hash for O(1) access or, if that turns out to use too much memory, keep only a matrix next such that next[i][j] stores the vertex to move to when going from vertex i to vertex j. The path from i to j is then (i, k1=next[i][j]), (k1, k2=next[k1][j]) ... (kr, j).
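A Python sketch of Floyd-Warshall with the next matrix and the path reconstruction described above, assuming vertices numbered 0..n-1 and an undirected, weighted edge list (the function names are hypothetical). It is O(n^3) time and O(n^2) memory, so it only makes sense for fairly small graphs.

```python
INF = float("inf")


def floyd_warshall_with_next(n, edges):
    """All-pairs shortest paths plus a 'next hop' matrix.

    n: number of vertices 0..n-1; edges: iterable of (u, v, weight), undirected.
    dist[i][j] is the shortest distance; nxt[i][j] is the vertex to move to
    from i on a shortest path to j (None if j is unreachable from i).
    """
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
        nxt[i][i] = i
    for u, v, w in edges:
        if w < dist[u][v]:                 # keep the cheapest parallel edge
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    for k in range(n):
        for i in range(n):
            if dist[i][k] == INF:
                continue
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first hop towards k
    return dist, nxt


def reconstruct(nxt, i, j):
    """Follow next hops: i, next[i][j], next[next[i][j]][j], ..., j."""
    if nxt[i][j] is None:
        return None
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path
```

For example, on the chain 0-1-2-3 with unit weights plus a direct 0-3 edge of weight 5, `reconstruct(nxt, 0, 3)` returns `[0, 1, 2, 3]`.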
If the graph is small enough, you can precompute all shortest paths using the Floyd-Warshall algorithm. This takes O(|V|²) memory for storing the paths, and O(|V|³) time for the precomputation.
Obviously this is not an option for very large graphs. For those you should use Thomas's answer and precompute the connected components (takes linear time and memory) to avoid the most expensive A* searches.
A*, BFS, DFS, and all other search algorithms will have to check exactly the same number of nodes when there is no path, so "use DFS first" is not a good answer. And using Floyd-Warshall is completely unnecessary.
For static graphs, the solution is simple; see Thomas's answer. For non-static graphs, the problem is more complicated; see this answer for a good algorithm.
If your maze never changes, so that any path that exists will always exist, could you not use a mapping algorithm which finds "regions" of your maze and stores them in some format? The memory usage would be linear in the number of nodes (cells), and each query costs only two array lookups and a comparison.
Computing the regions would probably be more time-consuming, but it's done only once, at the beginning. Maybe you could adapt a flood fill algorithm for this purpose?
Not sure if it's clear from the above, but I think an example always helps. I hope you'll forgive me for using PHP syntax.
Example (5x5 maze; [] marks an obstacle, 1 and 2 label the two regions):
     0 1 2 3 4
    +----------+
  0 |[][] 2    |
  1 |  [][]    |
  2 |    []  []|
  3 | 1  []    |
  4 |    []    |
    +----------+
Indexed regions, keyed by 'x:y' (column:row); a numeric key instead of the 'x:y' string may be better:
$regions=array(
'0:1'=>1,'0:2'=>1,'0:3'=>1,'0:4'=>1,'1:2'=>1,'1:3'=>1,'1:4'=>1,
'2:0'=>2,'3:0'=>2,'3:1'=>2,'3:2'=>2,'3:3'=>2,'3:4'=>2,'4:0'=>2,
'4:1'=>2,'4:3'=>2,'4:4'=>2
);
Then your task is only to find whether your start and end points are both in the same region:
if ( isset($regions[$startPoint], $regions[$endPoint])  // obstacles have no entry
     && $regions[$startPoint] == $regions[$endPoint] )
    pathExists();
Now, if I am not mistaken, the running time of A* depends on the distance between the start and end points (the length of the solution), and that could perhaps be used to speed up your search.
I would try to create some "junction nodes" (JN) in the maze. Their positions could be given by a function (so you can quickly find the nearest JN), and the paths between all neighbouring JNs would be precalculated.
Then you only need to search for a path from the start point to its nearest JN and from the end point to its nearest JN (all of them in the same region, as above). I can see a scenario, though, where the function is poorly chosen for the complexity of the maze: the maze has several regions, some of which have no JN at all, or all your JNs fall onto obstacles. So it may be better to define the JNs manually if possible, or to make the JN placement function non-trivial (taking into account the area of each region).
Once you reach the JNs, you can either index the paths between them (to quickly retrieve a predefined path between the start and end JNs) or perform another A* search, this time only on the set of junction nodes; as there are far fewer of those, this search will be faster.
You may consider using an Anytime A* algorithm (ANA* or other variants).
These will start by performing a greedy best first search to find an initial path.
It will then make incremental improvements by re-running with the heuristic function weighted less and less.
You can call off the search at any time and get its best path found so far.
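A simplified Python sketch of the anytime idea. This is not ANA* itself: it just restarts weighted A* (f = g + w * h) with a decreasing weight, so each run starts from scratch instead of reusing the previous search as ARA*/ANA* do. The grid format, the Manhattan heuristic and the weight schedule (5, 2, 1.5, 1) are all assumptions. Because it is a generator, the caller can stop consuming it at any time and keep the best path seen so far.

```python
import heapq


def weighted_astar(grid, start, goal, w):
    """A* on a 4-connected grid with f = g + w * h (Manhattan h).

    For w > 1 the path may be up to w times longer than optimal but is
    usually found after fewer expansions; w = 1 gives plain A*.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}
    came_from = {start: None}
    frontier = [(w * h(start), 0, start)]
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cost > g[cell]:
            continue  # stale heap entry; a cheaper route was found later
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(frontier, (new_cost + w * h(nxt), new_cost, nxt))
    return None


def anytime_paths(grid, start, goal, weights=(5.0, 2.0, 1.5, 1.0)):
    """Yield each strictly shorter path found; stop iterating at any time."""
    best = None
    for w in weights:
        path = weighted_astar(grid, start, goal, w)
        if path is None:
            return  # unreachable: a smaller weight will not help
        if best is None or len(path) < len(best):
            best = path
            yield best
```

Since the last weight is 1 and the Manhattan heuristic is consistent on this grid, letting the generator run to completion yields a shortest path.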
I have a graph with the following attributes:
Undirected
Not weighted
Each vertex has a minimum of 2 and maximum of 6 edges connected to it.
Vertex count will be < 100
Graph is static and no vertices/edges can be added/removed or edited.
I'm looking for paths between a random subset of the vertices (at least 2). The paths should be simple paths that go through each vertex only once.
My end goal is to have a set of routes so that you can start at one of the subset vertices and reach any of the other subset vertices. It's not necessary to pass through all the subset nodes when following a route.
All of the algorithms I've found (Dijkstra, depth-first search, etc.) seem to deal with paths between two vertices and with shortest paths.
Is there a known algorithm that will give me all the paths (I suppose these are subgraphs) that connect these subset of vertices?
edit:
I've created a (warning! programmer art) animated gif to illustrate what i'm trying to achieve: http://imgur.com/mGVlX.gif
There are two stages pre-process and runtime.
pre-process
I have a graph and a subset of the vertices (blue nodes)
I generate all the possible routes that connect all the blue nodes
runtime
I can start at any blue node select any of the generated routes and travel along it to reach my destination blue node.
So my task is more about creating all of the subgraphs (routes) that connect all blue nodes, rather than creating a path from A->B.
There are so many ways to approach this that, in order not to confuse things, here's a separate answer addressing the description of your core problem:
Finding ALL possible subgraphs that connect your blue vertices is probably overkill if you're only going to use one at a time anyway. I would rather use an algorithm that finds a single one, but randomly (so not any shortest path algorithm or such, since it will always be the same).
If you want to save one of these subgraphs, you simply have to save the seed you used for the random number generator and you'll be able to produce the same subgraph again.
Also, if you really want to find a bunch of subgraphs, a randomized algorithm is still a good choice since you can run it several times with different seeds.
The only real downside is that you will never know if you've found every single one of the possible subgraphs, but it doesn't really sound like that's a requirement for your application.
So, on to the algorithm: depending on the properties of your graph(s), the optimal algorithm might vary, but you could always start off with a simple random walk, starting from one blue node and walking to another blue one (while making sure you're not walking in your own old footsteps). Then choose a random node on that path and start walking to the next blue node from there, and so on.
For certain graphs, this has very bad worst-case complexity but might suffice for your case. There are of course more intelligent ways to find random paths, but I'd start out easy and see if it's good enough. As they say, premature optimization is evil ;)
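A Python sketch of the seeded, randomised approach. It uses a slight variant of the walk described above (an assumption on my part, not the answer's exact procedure): each new blue node searches randomly, never revisiting a vertex, until it reaches any vertex of the subgraph built so far, which keeps the result a tree. The adjacency format (`adj[v]` is a list of neighbours) and both function names are hypothetical. Pass `blue` as a list so that the same seed reproduces the same subgraph.

```python
import random


def random_simple_path(adj, start, targets, rng):
    """Randomised stack-based search from start until a vertex in targets is popped.

    Each vertex is visited at most once, so the parent chain is a simple path.
    Returns [start, ..., t] with t in targets, or None if none is reachable.
    """
    parent = {start: None}
    stack = [start]
    while stack:
        v = stack.pop()
        if v in targets:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        neighbours = list(adj[v])
        rng.shuffle(neighbours)  # random exploration order
        for u in neighbours:
            if u not in parent:
                parent[u] = v
                stack.append(u)
    return None


def random_connecting_subgraph(adj, blue, seed):
    """Grow a random tree that connects every vertex in the list `blue`.

    The same seed (and the same adjacency-list order) reproduces the same tree.
    Returns a set of undirected edges, each a frozenset({a, b}).
    """
    rng = random.Random(seed)
    order = list(blue)
    rng.shuffle(order)
    tree_nodes = {order[0]}
    edges = set()
    for b in order[1:]:
        if b in tree_nodes:
            continue  # already picked up by an earlier path
        path = random_simple_path(adj, b, tree_nodes, rng)
        if path is None:
            raise ValueError(f"blue vertex {b!r} is not connected to the others")
        edges.update(frozenset(e) for e in zip(path, path[1:]))
        tree_nodes.update(path)
    return edges
```

Storing only the seed is enough to regenerate a route later, and calling it with different seeds gives you a bunch of different subgraphs.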
A simple breadth-first search will give you the shortest paths from one source vertex to all other vertices. So you can perform a BFS starting from each vertex in the subset you're interested in, to get the distances to all other vertices.
Note that in some places, BFS will be described as giving the path between a pair of vertices, but this is not necessary: You can keep running it until it has visited all nodes in the graph.
This algorithm is similar to Johnson's algorithm, but greatly simplified thanks to the fact that your graph is unweighted.
Time complexity: since each vertex has at most 6 edges, each BFS takes O(n), and the total takes O(kn), where n is the number of vertices and k is the size of the subset. As a comparison, the Floyd-Warshall algorithm takes O(n^3).
What you're searching for is (if I understand it correctly) not really all paths, but rather all spanning trees. Read the Wikipedia article about spanning trees here to determine whether those are what you're looking for. If they are, there is a paper you would probably want to read:
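A short Python sketch of the "one BFS per subset vertex" idea, assuming an adjacency dict `adj[v]` listing neighbours (function names are hypothetical). Each BFS keeps running until the whole graph is visited rather than stopping at a single target.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def distances_from_subset(adj, subset):
    """One BFS per subset vertex: O(k * (n + m)) in total."""
    return {s: bfs_distances(adj, s) for s in subset}
```

For a path graph 0-1-2-3 and subset [0, 3], `distances_from_subset(adj, [0, 3])[0][3]` is 3.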
Gabow, Harold N.; Myers, Eugene W. (1978). "Finding All Spanning Trees of Directed and Undirected Graphs". SIAM J. Comput. 7, p. 280.
I need to find the shortest path in a graph with the least number of added nodes. The start and end nodes are not important. If there is no path in the graph between just the specified n nodes, I can add some nodes to complete the shortest tree, but I want to add as few new nodes as possible.
What algorithm can I use to solve this problem?
Start with the start node.
If it is the target node, you are done.
Check whether any connected node is the target node. If so, you are done.
Check whether any of the connected nodes is connected to the target node. If so, you are done.
Otherwise, add a node that is connected to both the start and end nodes. Done.
I recommend using a genetic algorithm. More information here and here.
Briefly, a GA is an algorithm for finding exact or approximate solutions to optimization and search problems.
You create an initial population of possible solutions and evaluate them with a fitness function to find out which of them are most suitable. You then apply techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover, to produce the next generation.
After several generations, you should converge on a suitable (read: short) solution to the problem.
If I understand correctly, you want to minimize the number of nodes in the path (instead of the sum of weights, as general algorithms do).
If that is the case, assign equal weight to all the edges and find the shortest path (using the generic algorithms). That gives you what you need.
And if there is no path, just add that edge to the graph.
Sands.
PS: If you give each edge a weight of 1, the number of nodes in the path, excluding the source and destination nodes, is the path's total weight minus 1.
I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there are about a dozen labelled 'mustpass'.
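With all weights equal to 1, "the generic algorithm" reduces to a breadth-first search. A small Python sketch, assuming an adjacency dict `adj[v]` listing neighbours (the function name is hypothetical):

```python
from collections import deque


def fewest_hops_path(adj, source, target):
    """BFS path with the fewest edges, or None if target is unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return None
```

With the path returned, `len(path) - 1` is the total weight (number of edges) and `len(path) - 2` is the number of intermediate nodes, matching the PS above. For example, with edges A-B, B-C, C-D, A-E, E-D, the search from A to D returns ['A', 'E', 'D'], which has one intermediate node.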
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: find all-pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j] = INF
for i=1 to n: d[i][i] = 0
for each edge (u,v): d[u][v] = d[v][u] = 1     //or the edge's length, if edges are weighted
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds in any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, the running time is O(n^3) for the Floyd-Warshall part and O(k!·k) for the all-permutations part, since each permutation sums k+1 table entries. 100^3 + 12!·12 is about 5.7×10^9 simple additions, which is fine in a compiled language unless you have some really restrictive constraints.]
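A runnable Python version of the pseudocode above that also tracks the winning permutation and expands it into an actual vertex sequence via a next-hop matrix. It assumes vertices numbered 0..n-1 and an unweighted, undirected edge list; `all_pairs` and `best_tour` are hypothetical names. Vertices may repeat in the expanded walk, which matches this answer's reading of the problem. In pure Python, 12! permutations will take far longer than a few seconds, so treat it as a sketch for checking small cases.

```python
from itertools import permutations

INF = float("inf")


def all_pairs(n, edges):
    """Floyd-Warshall on an unweighted, undirected graph with vertices 0..n-1."""
    d = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        d[i][i], nxt[i][i] = 0, i
    for u, v in edges:
        d[u][v] = d[v][u] = 1
        nxt[u][v], nxt[v][u] = v, u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    nxt[i][j] = nxt[i][k]
    return d, nxt


def best_tour(n, edges, start, end, mustpass):
    """Try every order of the mustpass vertices; return (length, full walk)."""
    d, nxt = all_pairs(n, edges)
    best_len, best_order = INF, None
    for order in permutations(mustpass):
        stops = (start, *order, end)
        length = sum(d[a][b] for a, b in zip(stops, stops[1:]))
        if length < best_len:
            best_len, best_order = length, stops
    if best_order is None:
        return INF, None  # some mustpass vertex is unreachable
    # Expand the winning order into a vertex-by-vertex walk.
    path = [start]
    for a, b in zip(best_order, best_order[1:]):
        while a != b:
            a = nxt[a][b]
            path.append(a)
    return best_len, path
```

For example, on edges 0-1, 1-2, 2-3, 1-4 with start 0, end 3 and mustpass [4, 2], the best order visits 4 first and the walk is [0, 1, 4, 1, 2, 3] with length 5.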
Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass); then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end.
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman problem, but I think it is closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have direct access to which other nodes, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could have equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest-cost path between any two points. Since you need to get from point A to point B and visit about 12 nodes in between, even a brute-force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This comes from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard, because the original question does not require that must-pass nodes are visited only once. That makes it much simpler to just brute-force the answer after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way, but a simple one is to work a binary tree backwards. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances along [start->a->b->c->end]; this is your target distance to beat. Now try [start->a->c->b->end], and if that's shorter, make it the new target (and remember which order of nodes it came from). Work backwards over the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care)
Andrew Top has the right idea:
1) Dijkstra's algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best-known heuristics for any NP-complete problem. The only other thing to remember is that after you expand the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering that the number of nodes and edges is relatively small, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard: no known algorithm solves it in polynomial time.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about must-pass nodes in ANY order. I had been searching for a solution where the order of the must-pass nodes is fixed. I found my answer, and since no question on StackOverflow asked this, I'm posting it here so that as many people as possible can benefit from it.
If the order of the must-pass nodes is defined, you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order) and stop at e. Then you can run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path are given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
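A Python sketch of that chaining, assuming a weighted adjacency dict `adj[u] = [(v, weight), ...]` with non-negative weights and node labels that can be compared (e.g. strings). `dijkstra_path` and `ordered_route` are hypothetical names; the route concatenates the legs while dropping the duplicated junction vertex between them.

```python
import heapq


def dijkstra_path(adj, source, target):
    """Return (cost, path) of a cheapest source->target path, or (inf, None)."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None


def ordered_route(adj, waypoints):
    """Chain Dijkstra over consecutive waypoints, e.g. [s, k1, k2, k3, e]."""
    total, route = 0, [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        cost, leg = dijkstra_path(adj, a, b)
        if leg is None:
            return float("inf"), None
        total += cost
        route.extend(leg[1:])  # leg[0] == a is already the end of route
    return total, route
```

Note that consecutive legs may share vertices, since the fixed order can force you to double back.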
How about using brute force on the dozen 'must visit' nodes. You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this:
Do an A* search from the start node to every point on the circuit.
For each of these, do an A* search from the next and previous nodes on the circuit to the end (because you can follow the circuit round in either direction).
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere, is whether it is ok for the same vertex to be visited more than once in the path. Most of the answers here assume that it's ok to visit the same edge multiple times, but my take given the question (a path should not visit the same vertex more than once!) is that it is not ok to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.
I am attempting to write (or extend an existing) graph search algorithm that will let me find the path that gets closest to the destination node, given that there is no guarantee that the nodes are connected.
To provide a realistic application of this, let's say I need to get from Brampton, Ontario to Hamilton, Ontario. I know my possible options at my start point are local transit, GO bus or walking. I know that walking is the least desired way to get to my destination, so I look at the GO bus first. I know I can take the GO bus to a point close to Hamilton, but at that point the bus turns and goes in another direction, and that closest point is at a place where I have no options (other than walking, but the algorithm would only consider walking for short distances; otherwise it would consider the route not feasible).
Using the same example, if the algorithm finds a way that is longer but gets me closer to the destination node (or possibly to the destination node itself), that would be a higher-weighted path (the weightings don't matter much during the search, only when the results are delivered; the results would be listed by how close each path gets to the destination, in ascending order). For example, one GO bus may get me 3km from the destination node, while 3 public transit buses would get me 500m away.
So my question is two fold:
1) What algorithm should I start with that does something similar
2) How would I programmatically express that it's OK if nodes don't connect, so that it doesn't just jump from node A to node R? Would starting from the end and working backward accomplish this?
Edit: I forgot to ask how to aim for the best approximate solution because especially with a large graph there will be possibly millions of solutions for this problem.
Thanks,
Michael
Read up on the A* algorithm. It is a generalization of Dijkstra's shortest path algorithm that allows you to specify a heuristic, which provides a lower bound on the distance between two vertices. In your case, the heuristic function would simply return the Euclidean distance.
Run the algorithm and keep track of the vertex with the best characteristic value, which you somehow compute from the graph distance from source and Euclidean distance to target. The only tricky part is to determine when to terminate (unless you want to traverse the entire graph).
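A Python sketch of that idea, with two assumptions of my own: the "characteristic value" is simply the straight-line distance to the target, and termination happens either at the target, when the frontier is exhausted, or after an optional expansion budget (`max_expansions`, a hypothetical parameter). Vertex coordinates come from a hypothetical `coords` dict, and edge costs are assumed to be at least the straight-line distance so the Euclidean heuristic stays admissible. `math.dist` needs Python 3.8+.

```python
import heapq
import math


def astar_or_closest(coords, adj, source, target, max_expansions=None):
    """A* with a Euclidean heuristic that falls back to the closest vertex.

    coords: {v: (x, y)}; adj: {v: [(u, cost), ...]}.
    Returns (reached_target, path). If the target is unreachable (or the
    budget runs out), path leads to the expanded vertex with the smallest
    straight-line distance to the target.
    """
    def h(v):
        return math.dist(coords[v], coords[target])

    g = {source: 0.0}
    parent = {source: None}
    heap = [(h(source), 0.0, source)]
    closest = source
    expansions = 0
    while heap:
        _, cost, v = heapq.heappop(heap)
        if cost > g[v]:
            continue  # stale heap entry
        if v == target:
            closest = v
            break
        if h(v) < h(closest):
            closest = v  # best fallback seen so far
        expansions += 1
        if max_expansions is not None and expansions >= max_expansions:
            break
        for u, w in adj.get(v, ()):
            new_cost = cost + w
            if new_cost < g.get(u, math.inf):
                g[u] = new_cost
                parent[u] = v
                heapq.heappush(heap, (new_cost + h(u), new_cost, u))
    path, v = [], closest
    while v is not None:
        path.append(v)
        v = parent[v]
    return closest == target, path[::-1]
```

If the target cannot be reached, the caller gets `(False, path)` ending at the vertex that got closest, which is the "3km vs 500m" ranking criterion in its simplest form.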
Why can't you assume that all nodes are connected? In the real world they normally are, e.g. you can always walk or call a taxi.
In this case, you could simply change your model as follows: you have one graph for each transportation method. Nodes at the same place are connected with edges of weight 0 (e.g. if you are dropped off by car at an airport or train station).
Then, label each vertex and edge with the transportation type and you can simply use existing routing algorithms. Oh, by the way: A* will not scale well to really big networks. To get something that software like Yahoo/Google/Microsoft Maps actually would use, have a look here. The work of this research group includes the winner of the DIMACS shortest path challenge.
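A Python sketch of that layered model, assuming routes are given per transportation mode as undirected `(place_a, place_b, cost)` legs; the input format and function names are hypothetical. Each `(place, mode)` pair becomes a vertex, vertices at the same place are joined by zero-cost transfer edges, and an ordinary Dijkstra search is started from every mode available at the start place.

```python
import heapq


def build_layered_graph(routes):
    """routes: {mode: [(place_a, place_b, cost), ...]}, treated as undirected."""
    adj = {}
    places = {}
    for mode, legs in routes.items():
        for a, b, cost in legs:
            adj.setdefault((a, mode), []).append(((b, mode), cost))
            adj.setdefault((b, mode), []).append(((a, mode), cost))
            places.setdefault(a, set()).add(mode)
            places.setdefault(b, set()).add(mode)
    # Zero-cost transfers between modes at the same place.
    for place, modes in places.items():
        for m1 in modes:
            for m2 in modes:
                if m1 != m2:
                    adj[(place, m1)].append(((place, m2), 0))
    return adj, places


def cheapest_trip(routes, start, goal):
    """Multi-source Dijkstra over (place, mode) vertices; (cost, path) or None."""
    adj, places = build_layered_graph(routes)
    if start not in places or goal not in places:
        return None
    heap = [(0, (start, m)) for m in places[start]]
    heapq.heapify(heap)
    dist = {v: 0 for _, v in heap}
    parent = {v: None for _, v in heap}
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        if v[0] == goal:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return d, path[::-1]
        for u, w in adj.get(v, ()):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                parent[u] = v
                heapq.heappush(heap, (d + w, u))
    return None
```

For example, with a bus line Brampton -> X -> Y (costs 30 and 20), a transit leg Y -> Hamilton (15) and a walking leg Brampton -> Hamilton (300), where X and Y are made-up intermediate stops, the cheapest trip costs 65 and switches from bus to transit at Y.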
Sounds very much like a travelling salesman problem with additional node characteristics. Just be wary that this type of problem is NP-complete, and your best bet would be to go with some sort of approximation algorithm.