Find the shortest path in a graph which visits certain nodes - algorithm

I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there are about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)

Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with numbers this small (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have these distances in a table, try all permutations of your 'mustpass' nodes; the rest is just table lookups.
Something like this:
//Precomputation: find all-pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j] = INF
for i=1 to n: d[i][i] = 0
for each edge (i,j) with weight w: d[i][j] = d[j][i] = w
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)

//Now try all permutations
shortest = INF
for each permutation a[1], a[2], ..., a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]] + d[a[1]][a[2]] + ... + d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds in any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, the running time is O(n^3) for the Floyd-Warshall part and O(k!n) for the all-permutations part, and 100^3 + (12!)(100) is practically peanuts unless you have some really restrictive constraints.]
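For reference, a minimal runnable Python sketch of the same idea. The graph representation (a dict of weighted adjacency dicts) and the names start, end and must_pass are assumptions for illustration, not anything from the question:

    from itertools import permutations

    INF = float('inf')

    def shortest_via_all(graph, start, end, must_pass):
        # graph: {node: {neighbour: weight, ...}, ...}; list every undirected edge both ways
        nodes = list(graph)
        # Floyd-Warshall: all-pairs shortest distances
        d = {i: {j: INF for j in nodes} for i in nodes}
        for i in nodes:
            d[i][i] = 0
            for j, w in graph[i].items():
                d[i][j] = min(d[i][j], w)
        for k in nodes:
            for i in nodes:
                for j in nodes:
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        # Try every visiting order of the must-pass nodes
        best = INF
        for perm in permutations(must_pass):
            route = [start, *perm, end]
            best = min(best, sum(d[a][b] for a, b in zip(route, route[1:])))
        return best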

Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end

This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
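As a sketch of what that "simplified graph" step could look like (purely illustrative Python; all_pairs is assumed to be a table of shortest-path distances in the original graph, e.g. from Floyd-Warshall or repeated Dijkstra runs):

    def build_reduced_graph(all_pairs, critical_nodes):
        # Complete graph on the critical nodes (start, end, must-pass); each edge
        # weight is the shortest-path distance between its endpoints in the
        # original graph. A TSP-style search then runs on this small graph.
        reduced = {u: {} for u in critical_nodes}
        for u in critical_nodes:
            for v in critical_nodes:
                if u != v:
                    reduced[u][v] = all_pairs[u][v]
        return reduced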

Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 inbetween, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This is coming from the mind of a video game programmer who's dealt with these kinds of things before.

This is not a TSP problem and not NP-hard, because the original question does not require that must-pass nodes are visited only once. This makes the answer much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go, but a simple one is to work through the permutations of the must-pass nodes. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end]; this is your new target distance to beat. Now try [start->a->c->b->end] and, if that's better, set that as the target (and remember that it came from that pattern of nodes). Work through the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(Where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care.)

Andrew Top has the right idea:
1) Dijkstra's algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best-known heuristics for an NP-complete problem. The only other thing to remember is that after you expand the graph back out after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.

Considering the number of nodes and edges is relatively small, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard: no known algorithm solves it in polynomial time, whichever one you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem

The question asks about must-pass nodes in ANY order. I was searching for a solution for a defined order of must-pass nodes. I found my answer, but since no question on StackOverflow asked this, I'm posting it here so that the maximum number of people can benefit from it.
If the order of the must-pass nodes is defined, then you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order), and stop at e. Then you can run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path are given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
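A small sketch of that idea in Python, assuming the networkx library is available (the graph G and the node names are illustrative; for the unordered case you would wrap this in a loop over permutations, as in the first answer):

    import networkx as nx

    def fixed_order_cost(G, stops):
        # stops, e.g. ["s", "k1", "k2", "k3", "e"], in the required visiting order;
        # sum the shortest-path lengths between consecutive stops.
        return sum(nx.dijkstra_path_length(G, a, b) for a, b in zip(stops, stops[1:]))

    # Illustrative usage:
    # G = nx.Graph()
    # G.add_weighted_edges_from([("s", "k1", 2), ("k1", "k2", 1), ("k2", "k3", 4), ("k3", "e", 3)])
    # print(fixed_order_cost(G, ["s", "k1", "k2", "k3", "e"]))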

How about using brute force on the dozen 'must visit' nodes. You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this:
Do an A* search from the start node to every point on the circuit.
For each of these, do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction).
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.

One thing that is not mentioned anywhere is whether it is OK for the same vertex to be visited more than once in the path. Most of the answers here assume that it's OK to visit the same edge multiple times, but my take, given the question (a path should not visit the same vertex more than once!), is that it is not OK to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.


Find all cyclic paths in a directed graph

The title is self-explanatory. Here's a solution that I found on the internet that can help do this. Here's the link
I don't understand why not visiting a vertex whose weight is below the given threshold will solve the problem.
Additionally, I have no idea how to solve this with or without that approach.
Let's restrict this to simple cycles -- those which contain no subcycles. For each node in the graph, begin a depth-first search for that node. Record each branch of the recursion tree which results in a match. While searching, never cross over nodes already traversed in the branch.
Consider the complete directed graph on n vertices. There are n(n-1) arcs and n! simple cycles of length n. The algorithm above isn't much worse than this at all. Simply constructing a new copy of the answer would take nearly as much time as running the above algorithm to do it, in the worst case at least.
If you want to find cycles in a directed (or even undirected) graph there is an intuitive way to do it:
For each edge (u, v) in the graph
1. Temporarily ignore the edge (u, v) in step 2
2. Run an algorithm to find all paths from v to u (using a backtracking algorithm)
3. Output the computed paths in step 2 along with the edge (u, v) as cycles in the graph
Note that you will get duplicate cycles this way since a cycle of length k will be found k times.
You can play with this idea to find cycles with specific properties as well. For example, if you are aiming to find the shortest weighted cycle in the graph instead of finding all cycles, you can use Dijkstra in step 2 and take the minimum over all the cycles you find. If you want to find the cycle with the fewest edges, you can use BFS in step 2.
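A minimal sketch of the backtracking version in Python (the adjacency-dict format is an assumption; as noted above, each cycle of length k is reported k times):

    def all_simple_paths(graph, u, v, visited=None):
        # graph: {node: iterable of out-neighbours} for a directed graph
        visited = visited or {u}
        if u == v:
            yield [u]
            return
        for w in graph[u]:
            if w not in visited:
                for rest in all_simple_paths(graph, w, v, visited | {w}):
                    yield [u] + rest

    def all_cycles(graph):
        # For each edge (u, v), every simple path from v back to u, closed by
        # the edge (u, v) itself, is a cycle.
        for u in graph:
            for v in graph[u]:
                for path in all_simple_paths(graph, v, u):
                    yield [u] + path   # u -> v -> ... -> u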
If you are struggling more with finding all paths in a graph, this question might help you, although it's for a slightly different problem.
Counting/finding paths with backtracking

dijkstra algorithm, run only one time for shortest paths of some nodes (not two, not the whole graph)

So, Dijkstra's algorithm is (the best one) used to search for the shortest paths in a weighted (no negative weights) and connected graph. It can be used to find the shortest path between two points/vertices, AND it can be used to find the shortest paths to all the vertices.
Questions:
Is my understanding correct?
Can it also be used to find the shortest paths for some pairs of vertices? For example, the graph has A, B, C, D, E, F, G, H, I, J, K, and we are only interested in the shortest paths of A,B and C,K. Is it possible to run the algorithm only one time to find out both paths?
You will need to run two Dijkstras. One starting from A and one from C.
What you could do is run it from {A, C} (a set Dijkstra) until you have found paths to B and K. But that is no guarantee that the resulting paths are actually from A to B and from C to K; they could just as well be C, B and C, K. Actually, all combinations of {A, C}, B and {A, C}, K are then possible.
the best one
Not at all. It is a good concept and heavily used as a basis for many other similar algorithms. There are many variants, like A*, Arc-Flags, and others. But raw Dijkstra is super slow since it searches equally in all directions.
Imagine a query where you have modeled the whole world. Your destination is 1 hour away. Then Dijkstra will find shortest paths to all nodes that can be reached in 1 hour. So it will also consider a short flight to your neighboring country, even if it's the totally wrong direction. The algorithm A* is a simple modification of Dijkstra that tries to improve on that by introducing a heuristic function that is able to make (hopefully) good guesses about shortest path distances. By that your Dijkstra gets a sense of direction and tries to first prioritize a search into the direction of the destination.
A simple heuristic is as-the-crows-fly. Note that this heuristic does not perform well on road networks and especially bad on transit networks (you often need to drive 10 mins into the wrong direction to get on a highway that lets you arrive earlier in the end, or you need to first drive to some big city to get a good fast train). Other heuristics involve computing landmarks, they yield pretty good results but need a lot of pre-computation and space (usually not a problem).
First of all Dijkstra is not the best algorithm and many heuristics outperform it in most practical implementations.
we are only interested in the shortest paths of A,B ; C,K
It looks like you could take a look at the A* algorithm, which eliminates nodes that will not lead you to your final destination with least cost. But it requires a good estimation (and hence the word heuristic) of the distances from various nodes to the destination.
Coming to your main question, the answer is both yes and no, with some overhead. We all know that whenever a node gets removed from the min-heap, it's done for the algorithm.
So, as soon as your destination node gets removed from the min-heap, you can just terminate; but that does not mean it found the shortest distance only for the given pair. All the nodes that got removed from the min-heap before the destination node are also ones for which it has found the shortest path.
Also, it's possible that your destination node turns out to be the last node removed from the min-heap, which basically means that you have computed shortest paths from the source to all nodes.
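A small sketch of Dijkstra with that early exit, in plain Python (the adjacency-dict graph format is an assumption; note that every node popped from the heap before the target also ends up with its final distance):

    import heapq

    def dijkstra_early_exit(graph, source, target):
        # graph: {node: {neighbour: weight, ...}, ...}
        dist = {source: 0}
        settled = {}                      # nodes already popped: their distances are final
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in settled:
                continue
            settled[u] = d
            if u == target:
                return d, settled         # stop here; `settled` holds the extra results
            for v, w in graph[u].items():
                nd = d + w
                if v not in settled and nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return float('inf'), settled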

Algorithm to cover all edges given starting node

Working on an algorithm for a game I am developing with a friend and we got stuck. Currently, we have a cyclic undirected graph, and we are trying to find the quickest path from starting node S that covers every edge. We are not looking for a tour and there can be repeated edges.
Any ideas on an algorithm or approximation? I'm sure this problem is NP-hard, but I don't believe it's TSP.
Route Inspection
This is known as the route inspection problem and it does have a polynomial solution.
The basic idea (see the link for more details) is that it is easy to solve for an Eulerian path (where we visit every edge once), but an Eulerian path is only possible for certain graphs.
In particular, a graph has to be connected and have either 0 or 2 vertices of odd degree.
However, it is possible to generalise this for other graphs by adding additional edges in the cheapest way that will produce a graph that does have an Eulerian path. (Note that we have added more edges so we may travel multiple times over edges in the original graph.)
The way of choosing the best additional edges to add is a minimum-weight matching problem on the odd-degree vertices, which can be solved in O(n^3).
P.S.
Coincidentally I wrote a simple demo earlier today (link to game) for a planar max-cut problem. The solution to this turns out to be based on exactly the same route inspection problem :)
EDIT
I just spotted from the comments that in your particular case your graph may be a tree.
If so, then I believe the answer is much simpler as you just need to do a DFS over the tree making sure to visit the shallowest subtree first.
For example, suppose the tree has edges S->A, S->B and B->C. S has two subtrees, and you should visit A first because its subtree is shallower.
The total edges visited will equal the number of edges visited in a full DFS, minus the depth of the last leaf visited, which is why to minimise the total edges you want to maximise the depth of the last leaf, and hence visit the shallowest subtree first.
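A small sketch of that tree case in Python (the child-list tree format is an assumption; it builds the full DFS walk with shallower subtrees first, then trims the pointless climb back up after the deepest leaf):

    def cover_tree_edges(tree, root):
        # tree: {node: [children, ...]}
        def depth(u):
            return 1 + max((depth(c) for c in tree.get(u, [])), default=0)

        walk = [root]
        def dfs(u):
            for c in sorted(tree.get(u, []), key=depth):   # shallowest subtree first
                walk.append(c)
                dfs(c)
                walk.append(u)                             # walk back up the edge
        dfs(root)

        # The climb back up after the last (deepest) leaf is wasted, so trim it.
        seen, last_first_visit = set(), 0
        for i, u in enumerate(walk):
            if u not in seen:
                seen.add(u)
                last_first_visit = i
        return walk[:last_first_visit + 1]

    # e.g. cover_tree_edges({"S": ["A", "B"], "B": ["C"]}, "S") -> ["S", "A", "S", "B", "C"]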
This is somewhat like the Eulerian Path. The main distinction is that there may be dead-ends and you may be able to modify the algorithm to suit your needs. Pruning dead-ends is one option or you may be able to reduce the graph into a number of connected components.
DFS will work here. However, you must have a good evaluation function to prune branches early; otherwise you cannot solve this problem fast. You can refer to my discussion and implementation in Java here http://www.capacode.com/?p=650
Detail of my evaluation function
My first try: if the length of the current path plus the distance from U to G is not shorter than the minimum length we have found (stored in the minLength variable), we will not visit U next, because it cannot lead to a shorter path.
Actually, the above evaluation function is not efficient because it only works when we have already visited most of the cities. We need a more precise estimate of the minimum length to reach G with all cities visited.
Assume s is the length from S to U. To go from U, visit all remaining cities, and reach G, the length is at least s' = s + ∑ minDistance(K), where K is an unvisited city different from U and minDistance(K) is the minimum distance from K to an unvisited city. Basically, for each unvisited city, we assume that we can reach it via its shortest edge. Note that those shortest edges may not compose a valid path. Then, we will not visit U if s' ≥ minLength.
With that evaluation function, my program can handle the problem with 20 cities within 1 second. I also add another optimization to improve the performance more. Before running the program, I use a greedy algorithm to get a good initial value for minLength: at each city, we visit the nearest unvisited city next. The reason is that when we have a smaller minLength, we can prune more.
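A sketch of that branch-and-bound DFS in Python (the distance-matrix-as-nested-dict format and the function names are assumptions; the bound is the one described above: each unvisited city is charged at least its cheapest incident edge):

    def shortest_visit_all(dist, S, G):
        # dist[u][v]: distance between cities u and v; find the shortest route
        # from S to G that visits every other city exactly once.
        cities = set(dist)
        best = float('inf')

        def lower_bound(unvisited):
            # each unvisited city must still be entered via at least its cheapest edge
            return sum(min(dist[k][x] for x in dist[k] if x != k) for k in unvisited)

        def dfs(u, unvisited, length):
            nonlocal best
            if length + lower_bound(unvisited) >= best:
                return                                    # prune this branch
            if not unvisited:
                best = min(best, length + dist[u][G])
                return
            for v in sorted(unvisited, key=lambda v: dist[u][v]):  # nearest first helps pruning
                dfs(v, unvisited - {v}, length + dist[u][v])

        dfs(S, cities - {S, G}, 0)
        return best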

Using A* to solve Travelling Salesman

I've been tasked to write an implementation of the A* algorithm (heuristics provided) that will solve the travelling salesman problem. I understand the algorithm, it's simple enough, but I just can't see the code that implements it. I mean, I get it. Priority queue for the nodes, sorted by distance + heuristic(node), add the closest node on to the path. The question is, like, what happens if the closest node can't be reached from the previous closest node? How does one actually take a "graph" as a function argument? I just can't see how the algorithm actually functions, as code.
I read the Wikipedia page before posting the question. Repeatedly. It doesn't really answer the question- searching the graph is way, way different to solving the TSP. For example, you could construct a graph where the shortest node at any given time always results in a backtrack, since two paths of the same length aren't equal, whereas if you're just trying to go from A to B then two paths of the same length are equal.
You could derive a graph by which some nodes are never reached by always going closest first.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
I found a solution here
Use minimum spanning tree as a heuristic.
Set
Initial State: Agent in the start city and has not visited any other city
Goal State: Agent has visited all the cities and reached the start city again
Successor Function: Generates all cities that have not yet been visited
Edge-cost: distance between the cities represented by the nodes, use this cost to calculate g(n).
h(n): distance to the nearest unvisited city from the current city + estimated distance to travel all the unvisited cities (MST heuristic used here) + nearest distance from an unvisited city to the start city. Note that this is an admissible heuristic function.
 
You may consider maintaining a list of visited cities and a list of unvisited cities to facilitate computations.
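A sketch of that setup in Python, searching over (current city, set of visited cities) states with the MST-based heuristic described above (the nested-dict distance matrix is an assumption; this is only an illustration, not a tuned implementation):

    import heapq
    from itertools import count

    def mst_weight(dist, nodes):
        # Prim's algorithm over a subset of cities; used inside the heuristic h(n).
        nodes = list(nodes)
        if len(nodes) < 2:
            return 0
        in_tree, total = {nodes[0]}, 0
        while len(in_tree) < len(nodes):
            w, v = min((dist[u][x], x) for u in in_tree for x in nodes if x not in in_tree)
            in_tree.add(v)
            total += w
        return total

    def tsp_a_star(dist, start):
        # State = (current city, frozenset of visited cities); goal: all visited, back at start.
        cities = frozenset(dist)
        tie = count()

        def h(city, visited):
            remaining = cities - visited
            if not remaining:
                return dist[city][start]
            return (min(dist[city][k] for k in remaining)      # reach the unvisited part
                    + mst_weight(dist, remaining)              # span the unvisited part
                    + min(dist[k][start] for k in remaining))  # return to the start city

        visited0 = frozenset([start])
        heap = [(h(start, visited0), 0, next(tie), start, visited0)]
        best_g = {(start, visited0): 0}
        while heap:
            f, g, _, city, visited = heapq.heappop(heap)
            if visited == cities:
                return g + dist[city][start]   # close the tour
            for nxt in cities - visited:
                ng = g + dist[city][nxt]
                state = (nxt, visited | {nxt})
                if ng < best_g.get(state, float('inf')):
                    best_g[state] = ng
                    heapq.heappush(heap, (ng + h(*state), ng, next(tie), *state))
        return float('inf')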
The confusion here is that the graph on which you are trying to solve the TSP is not the graph you are performing an A* search on.
See related: Sudoku solving algorithm C++
To solve this problem you need to:
Define your:
TSP states
TSP initial state
TSP goal state(s)
TSP state successor function
TSP state heuristic
Apply a generic A* solver to this TSP state graph
A quick example I can think up:
TSP states: list of nodes (cities) currently in the TSP cycle
TSP initial state: the list containing a single node, the travelling salesman's home town
TSP goal state(s): a state is a goal if it contains every node in the graph of cities
TSP successor function: can add any node (city) that isn't in the current cycle to the end of the list of nodes in the cycle to get a new state
The cost of the transition is equal to the cost of the edge you're adding to the cycle
TSP state heuristic: you decide
If it's just a problem of understanding the algorithm and how it works, you might want to consider drawing a graph on paper, assigning weights to it, and working it through by hand. Also, you can probably find some animations that show Dijkstra's shortest path; Wikipedia has a good one. The only difference between Dijkstra and A* is the addition of the heuristic, and that you stop the search as soon as you reach the target node. As far as using it to solve the TSP, good luck with that!
Think about this a little more abstractly. Forget about A* for a moment, it's just dijkstra's with a heuristic anyway. Before, you wanted to get from A to B. What was your goal? To get to B. The goal was to get to B with the least cost. At any given point, what was your current "state"? Probably just your location on the graph.
Now, you want to start at A, then go to both B and C. What is your goal now? To pass over both B and C, maintaining least cost. You can generalize this with more nodes: D, E, F, ... or just N nodes. Now, at any given point, what is your current "state"? This is critical: it ISN'T just your location in the graph--it's also which of B or C or whatever nodes you have visited so far in the search.
Implement your original algorithm so that it calls some function asking if it has reached "the goal state" after making each move. Before, the function would have just said "yes, you're at state B, therefore you are at the goal". But now, let that function return "yes, you're at the goal state" if the search's path has passed over each of the points of interest. It'll know whether or not the search has passed over all points of interest because that's included in the current state.
After you get that, improve the search with some heuristic, and A* it up.
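A compact sketch of that state formulation in Python: the state is (current node, set of must-pass nodes seen so far), and plain Dijkstra is run over this state graph (A* would only add a heuristic term to the priority; the adjacency-dict graph format and names are assumptions):

    import heapq

    def shortest_path_via(graph, start, end, must_pass):
        # graph: {node: {neighbour: weight, ...}, ...}
        must_pass = frozenset(must_pass)
        init = (start, must_pass & {start})
        dist = {init: 0}
        heap = [(0, start, init[1])]
        while heap:
            d, u, seen = heapq.heappop(heap)
            if u == end and seen == must_pass:
                return d                      # goal test: at 'end' AND everything seen
            if d > dist.get((u, seen), float('inf')):
                continue
            for v, w in graph[u].items():
                nseen = seen | ({v} & must_pass)
                if d + w < dist.get((v, nseen), float('inf')):
                    dist[(v, nseen)] = d + w
                    heapq.heappush(heap, (d + w, v, nseen))
        return float('inf')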
To answer one of your questions...
To pass a graph as a function argument, you have several options. You could pass a pointer to an array containing all the nodes. You could pass just the one starting node and work from there, if it's a fully connected graph. And finally, you could write a graph class with whatever data structures you need inside it, and pass a reference to an instance of that class.
As for your other question about closest nodes, isn't part of A* search that it will backtrack as needed? Or you could implement your own sort of backtracking to handle that kind of situation.
The question is, like, what happens if the closest node can't be reached from the previous closest node?
This step isn't necessary. As in, you aren't computing a path from the previous closest to the current closest, you are trying to get to your goal node, and the current closest is the only thing that matters (e.g. the algorithm doesn't care that last iteration you were 100km away, because this iteration you are only 96km away).
As a broad introduction, A* doesn't directly construct a path: it explores until it definitely knows that the path is contained within the region it has explored, and then constructs the path based on the information recorded during the exploration.
(I'm going to use the code in the Wikipedia article as a reference implementation to aid my explanation.)
You have two sets of nodes: closedset and openset.
closedset holds nodes that have been fully evaluated, that is, you know exactly how far they are from start and all their neighbours are in one of the two sets. Thus there is no more computation you can do with them and so we can (sort of) ignore them. (Basically these are completely contained within the border.)
openset holds "border" nodes, you know how far these are from start, but you haven't touched their neighbours yet, so they are on the edge of your search so far.
(Implicitly, there is a third set: completely untouched nodes. But you don't really touch them until they are in openset so they don't matter.)
At a given iteration, if you've got nodes to explore (that is, nodes in openset), you need to work out which one to explore. This is the job of the heuristic, it basically gives you a hint about which point on the border will be the best to explore next by telling you which node it thinks will have the shortest path to goal.
The previous closest node is irrelevant, it just expanded the border a bit, adding new nodes to openset. These new nodes are now candidates for the closest node in this iteration.
At first, openset only contains start, but then you iterate and at each step the border is expanded a little (in the most promising direction), until you eventually reach goal.
When A* is actually doing the exploration, it doesn't worry about which nodes came from where. It doesn't need to, because it knows their distance from start and the heuristic function and that's all it needs.
However to reconstruct the path later, you need to have some record of the path, this is what camefrom is. For a given node, camefrom links it to the node that is closest to start, so you can reconstruct the shortest path by following the links backwards from goal.
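For illustration, that reconstruction step is tiny (assuming camefrom is a dict mapping each node to its predecessor, as in the Wikipedia pseudocode being described):

    def reconstruct_path(camefrom, goal):
        # Follow the camefrom links backwards from goal until the start
        # (the only node with no predecessor), then reverse.
        path = [goal]
        while path[-1] in camefrom:
            path.append(camefrom[path[-1]])
        path.reverse()
        return path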
How does one actually take a "graph" as a function argument?
By passing one of the representations of a graph.
I don't really see how A* applies to the TSP. I mean, finding a route from A to B, sure, I get that. But the TSP? I don't see the connection.
You need a different heuristic and a different end condition: goal is no longer a single node any more, but the state of having everything connected; and your heuristic is some estimate of the length of the shortest path connecting the remaining nodes.

Graph Algorithm To Find All Paths Between N Arbitrary Vertices

I have a graph with the following attributes:
Undirected
Not weighted
Each vertex has a minimum of 2 and maximum of 6 edges connected to it.
Vertex count will be < 100
Graph is static and no vertices/edges can be added/removed or edited.
I'm looking for paths between a random subset of the vertices (at least 2). The paths should be simple paths that only go through any vertex once.
My end goal is to have a set of routes so that you can start at one of the subset vertices and reach any of the other subset vertices. It's not necessary to pass through all the subset nodes when following a route.
All of the algorithms I've found (Dijkstra,Depth first search etc.) seem to be dealing with paths between two vertices and shortest paths.
Is there a known algorithm that will give me all the paths (I suppose these are subgraphs) that connect these subset of vertices?
edit:
I've created a (warning! programmer art) animated gif to illustrate what i'm trying to achieve: http://imgur.com/mGVlX.gif
There are two stages pre-process and runtime.
pre-process
I have a graph and a subset of the vertices (blue nodes)
I generate all the possible routes that connect all the blue nodes
runtime
I can start at any blue node, select any of the generated routes, and travel along it to reach my destination blue node.
So my task is more about creating all of the subgraphs (routes) that connect all blue nodes, rather than creating a path from A->B.
There are so many ways to approach this, and in order not to confuse things, here's a separate answer that addresses the description of your core problem:
Finding ALL possible subgraphs that connect your blue vertices is probably overkill if you're only going to use one at a time anyway. I would rather use an algorithm that finds a single one, but randomly (so not any shortest path algorithm or such, since it will always be the same).
If you want to save one of these subgraphs, you simply have to save the seed you used for the random number generator and you'll be able to produce the same subgraph again.
Also, if you really want to find a bunch of subgraphs, a randomized algorithm is still a good choice since you can run it several times with different seeds.
The only real downside is that you will never know if you've found every single one of the possible subgraphs, but it doesn't really sound like that's a requirement for your application.
So, on to the algorithm: Depending on the properties of your graph(s), the optimal algorithm might vary, but you could always start off with a simple random walk, starting from one blue node, walking to another blue one (while making sure you're not walking in your own old footsteps). Then choose a random node on that path and start walking to the next blue one from there, and so on.
For certain graphs, this has very bad worst-case complexity but might suffice for your case. There are of course more intelligent ways to find random paths, but I'd start out easy and see if it's good enough. As they say, premature optimization is evil ;)
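A rough sketch of that randomized idea in Python (the adjacency-list graph format is an assumption, and as the answer warns, the worst-case behaviour of a restarting self-avoiding walk can be bad on some graphs):

    import random

    def random_connecting_routes(graph, blue_nodes, seed=None):
        # graph: {node: [neighbours, ...]}; returns one set of routes that
        # connects all blue nodes. Reuse the same seed to reproduce it later.
        rng = random.Random(seed)

        def random_walk(src, dst):
            # self-avoiding random walk from src to dst; restart on a dead end
            while True:
                path, seen = [src], {src}
                while path[-1] != dst:
                    options = [n for n in graph[path[-1]] if n not in seen]
                    if not options:
                        break                      # dead end: try a fresh walk
                    nxt = rng.choice(options)
                    path.append(nxt)
                    seen.add(nxt)
                if path[-1] == dst:
                    return path

        blues = list(blue_nodes)
        rng.shuffle(blues)
        routes, reached = [], [blues[0]]
        for target in blues[1:]:
            start = rng.choice(reached)            # branch off what we have so far
            walk = random_walk(start, target)
            routes.append(walk)
            reached.extend(walk)
        return routes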
A simple breadth-first search will give you the shortest paths from one source vertex to all other vertices. So you can perform a BFS starting from each vertex in the subset you're interested in, to get the distances to all other vertices.
Note that in some places, BFS will be described as giving the path between a pair of vertices, but this is not necessary: You can keep running it until it has visited all nodes in the graph.
This algorithm is similar to Johnson's algorithm, but greatly simplified thanks to the fact that your graph is unweighted.
Time complexity: Since there is a constant number of edges per vertex, each BFS will take O(n), and the total will take O(kn), where n is the number of vertices and k is the size of the subset. As a comparison, the Floyd-Warshall algorithm will take O(n^3).
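A quick sketch of that in Python (adjacency-list format assumed; one BFS per subset vertex gives unweighted shortest distances to everything else):

    from collections import deque

    def bfs_distances(graph, source):
        # graph: {node: [neighbours, ...]} (unweighted, undirected)
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    # distances from every "blue" vertex to all other vertices:
    # all_pairs = {b: bfs_distances(graph, b) for b in blue_nodes}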
What you're searching for is (if I understand it correctly) not really all paths, but rather all spanning trees. Read the Wikipedia article about spanning trees here to determine if those are what you're looking for. If it is, there is a paper you would probably want to read:
Gabow, Harold N.; Myers, Eugene W. (1978). "Finding All Spanning Trees of Directed and Undirected Graphs". SIAM J. Comput. 7 (280).
