Is there an efficient algorithm for finding or approximating the shortest walk of a graph which must visit some subset of vertices of the graph?

The title is a mouthful, but put simply, I have a large, undirected, incomplete graph, and I need to visit some subset of its vertices in the (approximately) shortest time possible. Note that this isn't TSP, as I don't need to visit all vertices.
The naive approach would be to brute-force the solution by trying every possible walk which includes the required vertices, using A*, for example, to calculate the walks between required vertices. However, this is O(n!) where n is the number of required vertices. That is infeasible for me, as n > 40 in my average case and n ≈ 80 in my worst case.
Is there a more efficient algorithm for this, perhaps one that approximates the solution?
This question is similar to the question linked here, but differs in that my graph is larger than the one in that question. There are several other similar questions, but none that I've seen solves exactly my specific problem.

If you allow visiting the same nodes several times, find the shortest path between each pair of mandatory vertices. Then solve the TSP over the mandatory vertices, using those shortest-path costs as edge weights. If you disallow multiple visits, the problem is much harder.
I am afraid you cannot escape the TSP.
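For the revisits-allowed case, here is a minimal sketch of that reduction, assuming the graph is a connected, weighted, undirected networkx graph; the nearest-neighbour ordering is just a placeholder for a proper TSP heuristic (2-opt, Lin-Kernighan, an ILP solver, etc.):

import networkx as nx

# Sketch: metric closure over the required vertices, then a crude TSP-style heuristic.
# Assumes G is a connected nx.Graph with 'weight' edge attributes.
def approx_shortest_walk(G, required, start=None):
    required = list(required)
    dist, path = {}, {}
    for u in required:                       # shortest paths from each required vertex
        d, p = nx.single_source_dijkstra(G, u, weight="weight")
        dist[u], path[u] = d, p

    start = start if start is not None else required[0]
    unvisited = set(required) - {start}
    order = [start]
    while unvisited:                         # nearest-neighbour ordering (no optimality guarantee)
        last = order[-1]
        nxt = min(unvisited, key=lambda v: dist[last][v])
        order.append(nxt)
        unvisited.remove(nxt)

    walk = [start]                           # expand the ordering back into a real walk
    for a, b in zip(order, order[1:]):
        walk.extend(path[a][b][1:])
    cost = sum(dist[a][b] for a, b in zip(order, order[1:]))
    return walk, cost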

Related

Find the second maximum weighted matching in a complete bipartite graph

Given a weighted complete bipartite graph G=(V, U, E), the maximum weighted bipartite matching problem, i.e., the assignment problem, aims to find a matching in G where the sum of edge weights is maximized. I know there are methods (e.g., the Hungarian algorithm) that solve this problem. Now, I want to solve a slightly different problem:
Given a weighted complete bipartite graph G=(V, U, E), I would like to find both the maximum weighted bipartite matching and the second maximum weighted bipartite matching in G. Any ideas would be much appreciated.
There is a general algorithm called Lawler-Murty which allows you to find the top K solutions of combinatorial optimization problems (including matching) in successive calls. It is described at https://core.ac.uk/download/pdf/82129717.pdf in the context of matching.
Basically, after finding the best answer, you add constraints to the problem which create a number of sub-problems, such that the answers found so far are ruled out but every other answer still turns up as the answer to one of the sub-problems. The second best answer will then turn up as the best answer to one of the sub-problems. When you do this repeatedly you end up with a large tree of sub-problems to solve. For matching problems, you can reduce the time taken to solve a sub-problem by reusing some of the work from previous problems.
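For just the second-best matching, that partitioning collapses to a single round: forbid, one at a time, each edge used by the optimal matching, re-solve, and keep the best of the restricted optima (every other matching avoids at least one optimal edge, so it survives in one of the sub-problems). A rough sketch using SciPy's assignment solver; the large negative penalty standing in for "this edge is forbidden" is an assumption of this sketch:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch: best and second-best assignment on a square weight matrix W (maximisation).
def best_and_second_best(W):
    W = np.asarray(W, dtype=float)
    rows, cols = linear_sum_assignment(W, maximize=True)
    best_val, best = W[rows, cols].sum(), list(zip(rows, cols))

    FORBIDDEN = -1e18                       # stand-in for "this edge may not be used"
    second_val, second = -np.inf, None
    for r, c in best:
        W2 = W.copy()
        W2[r, c] = FORBIDDEN                # rule out one edge of the optimum...
        rr, cc = linear_sum_assignment(W2, maximize=True)
        val = W[rr, cc].sum()               # ...and score the restricted optimum on W
        if val > second_val:
            second_val, second = val, list(zip(rr, cc))
    return (best_val, best), (second_val, second)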

Minimal spanning tree with K extra nodes

Assume we're given a graph on a 2D plane with n nodes and an edge between each pair of nodes, whose weight is the Euclidean distance between them. The initial problem is to find the MST of this graph, and it's quite clear how to solve that using Prim's or Kruskal's algorithm.
Now let's say we have k extra nodes which we can place at any integer point on our 2D plane. The problem is to find locations for these nodes such that the new graph has the smallest possible MST; it is not necessary to use all of the extra nodes.
It is obviously impossible to find the exact solution (in poly time), but the goal is to find the best approximate one (which can be found within 1 sec). Maybe you can come up with some hints on the most efficient way of going through possible solutions, or point me to articles where a similar problem is covered.
This is a very interesting problem to work on. You have many options for attacking it. The best known heuristics in such situations are genetic algorithms, particle swarm optimization, differential evolution and many others of this kind.
What is nice about this kind of heuristic is that you can limit its execution to a certain amount of time (say, 1 second). If it were my task, I would try genetic algorithms first.
You could try a greedy algorithm: work on the longest edges in the MST first, since those potentially give the largest savings.
Select the longest edge, then from each of its endpoints take the incident MST edges closest in angle to the chosen one.
From these, select the best Steiner point.
Fix up the MST, and repeat until the 1 second is gone.
The challenge is what to do if one of the vertices involved is itself a Steiner point.
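A rough sketch of that greedy loop, assuming the input is a list of 2D integer points and using networkx for the MST; the rounded centroid of the longest edge's endpoints plus one tree neighbour is only a crude stand-in for the true Steiner (Fermat) point:

import math, time
import networkx as nx

# Sketch: repeatedly try to shorten the MST by splitting its longest edge
# with an extra (Steiner) point, up to k extra points or a time budget.
def mst_of(points):
    G = nx.Graph()
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            G.add_edge(i, j, weight=math.dist(p, points[j]))
    T = nx.minimum_spanning_tree(G)
    return T.size(weight="weight"), T

def greedy_steiner(points, k, budget=1.0):
    points = list(points)
    best, T = mst_of(points)
    added, deadline = 0, time.monotonic() + budget
    while added < k and time.monotonic() < deadline:
        u, v, _ = max(T.edges(data=True), key=lambda e: e[2]["weight"])
        third = next((w for w in T[u] if w != v), None)
        if third is None:
            third = next((w for w in T[v] if w != u), None)
        if third is None:
            break
        cx = round((points[u][0] + points[v][0] + points[third][0]) / 3)
        cy = round((points[u][1] + points[v][1] + points[third][1]) / 3)
        cand = points + [(cx, cy)]           # candidate integer grid point
        w, T2 = mst_of(cand)
        if w < best:
            best, T, points, added = w, T2, cand, added + 1
        else:
            break                            # crude candidate didn't help; give up
    return best, points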

Shortest path to connect n points

I have n points and I need to connect all of them minimizing the final distance. The image above shows an algorithm where each node connects to its nearest neighbour, but the final output might be really off.
I've been searching a lot; I know some pathfinding algorithms, but I'm not aware of one that solves exactly this case. I found a question on Math Stack Exchange, but the answer doesn't provide an algorithm - https://math.stackexchange.com/a/581844/156584.
Is there any algorithm that solves exactly this problem? Otherwise I can brute-force it.
Edit: Some clarification regarding the result I'm expecting: each node can be connected to at most 2 other nodes, creating one continuous path (like taking a pen and, without ever lifting it, connecting the nodes while minimizing the total distance). I don't want to create a cycle (that would be the travelling salesman problem).
PS: this question can also be phrased as "given a complete graph with n vertices, choose a set of edges such that the graph is connected, every vertex has degree at most two, and the sum of the edge weights is minimized".
This problem is known as the shortest Hamiltonian path problem, and it is NP-hard. So if the number of points is small, you can use backtracking or dynamic programming to find an optimal solution. If the number of points is large, you can use heuristics and/or approximations to obtain a relatively good answer (it is not always possible to find the best one in that case, though).
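For the small-n case, a sketch of the dynamic-programming (Held-Karp style) approach; it assumes a dense symmetric distance matrix and runs in O(2^n * n^2) time and O(2^n * n) memory, so it's only practical up to roughly 20 points:

from math import inf

# Sketch: shortest open Hamiltonian path over a distance matrix, any endpoints.
def shortest_hamiltonian_path(dist):
    n = len(dist)
    dp = [[inf] * n for _ in range(1 << n)]      # dp[mask][v]: best path over 'mask' ending at v
    parent = [[-1] * n for _ in range(1 << n)]
    for v in range(n):
        dp[1 << v][v] = 0.0
    for mask in range(1 << n):
        for v in range(n):
            if dp[mask][v] == inf or not (mask >> v) & 1:
                continue
            for w in range(n):
                if (mask >> w) & 1:
                    continue
                nmask = mask | (1 << w)
                cand = dp[mask][v] + dist[v][w]
                if cand < dp[nmask][w]:
                    dp[nmask][w] = cand
                    parent[nmask][w] = v
    full = (1 << n) - 1
    end = min(range(n), key=lambda v: dp[full][v])
    path, mask, v = [], full, end                # walk the parent pointers backwards
    while v != -1:
        path.append(v)
        mask, v = mask ^ (1 << v), parent[mask][v]
    return dp[full][end], path[::-1]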

Minimal cost cyclic path in a graph - A variant of TSP

For example, we have a graph consisting of vertices (cities) and edges (roads), and each edge (road) has a particular cost; find the minimal cost to visit all cities AT LEAST ONCE. Cost is the sum of the edge costs of the edges traversed.
The part "AT LEAST ONCE" caught me. In a TSP we can visit a node only once, according to Wikipedia. Consider the graph,
A-B 11
A-C 5
B-C 2
B-E 4
C-E 3
C-D 20
D-E 100
In a TSP, the cyclic path would be A-B-E-D-C-A with cost 140 (or A-C-D-E-B-A, also cost 140). Whereas under my problem description we can visit each vertex AT LEAST ONCE, so we can have the closed walk A-C-D-C-E-B-A with cost 63, which is much less than the TSP tour. This is where I had a problem. Is there a specific algorithm here? I'm pretty sure plain TSP won't work well here.
Pointers or pseudocode will be very helpful.
For each pair of nodes, you can apply a shortest path algorithm and calculate the shortest distance. This gives a new cost matrix over all pairs.
Now the problem is reduced to the Travelling Salesman Problem on that matrix.
Then you can apply any TSP solving technique.
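Concretely, for the example graph above, a small sketch that builds the metric closure with Floyd-Warshall and then brute-forces the tour over the closure (brute force is fine only because this instance has five cities):

from itertools import permutations
from math import inf

# Sketch: metric closure of the example graph, then exhaustive TSP on the closure.
edges = {("A","B"): 11, ("A","C"): 5, ("B","C"): 2, ("B","E"): 4,
         ("C","E"): 3, ("C","D"): 20, ("D","E"): 100}
nodes = ["A", "B", "C", "D", "E"]

d = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
for (u, v), w in edges.items():
    d[u][v] = d[v][u] = min(d[u][v], w)
for k in nodes:                                  # Floyd-Warshall
    for i in nodes:
        for j in nodes:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])

best_cost, best_order = inf, None                # exhaustive tour, A fixed as start
for perm in permutations(nodes[1:]):
    order = ["A", *perm, "A"]
    cost = sum(d[a][b] for a, b in zip(order, order[1:]))
    if cost < best_cost:
        best_cost, best_order = cost, order

print(best_cost, best_order)                     # cost 59, e.g. A-B-E-C-D-A

Each closure edge is then expanded through its shortest path in the original graph (track predecessors during Floyd-Warshall to recover them), giving a closed walk of cost 59, which is even a little cheaper than the 63 walk quoted in the question.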
Given that you are allowing a vertex to be visited multiple times, this effectively turns your incomplete graph into a complete graph (all vertices connected), which is what TSP requires. Solving your problem in the general case is exactly the same as solving the metric TSP. The good news is that this is a heavily researched topic. The bad news is that you can't sidestep the TSP, since your problem is identical to a form of it.
As pointed out by others, you complete the graph by computing the shortest cost between each pair of vertices and adding those edges where missing. You also need to replace any existing direct edge for which you've found a lower indirect path cost so that you have a Metric TSP. You can store with the new synthetic edges their actual paths (through intermediate vertices) so you can recover those for your final answer, or you can recompute those paths as needed upon receiving the result of the TSP.
Now you can solve this as a TSP. However, solving TSP optimally is too expensive in the general case, so you'll likely want to use an approximate solution algorithm. A variety of these (e.g. Christofides algorithm, Lin–Kernighan heuristic) are available which make differing tradeoffs between guaranteed levels of optimality and performance of the algorithm.
If you actually don't care about completing the cycle, and just want a minimum path that visits all vertices, starting and ending at any vertex, this is a somewhat different problem. For this, read my answer here: https://stackoverflow.com/a/33601043/5237297

Find the shortest path in a graph which visits certain nodes

I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there's about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: Find all-pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j] = (weight of edge i-j if it exists, else INF)
for i=1 to n: d[i][i] = 0
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, the running time is O(n^3) for the Floyd-Warshall part and O(k!·k) for the permutations part, and 100^3 + 12!·12 is still quite tractable unless you have some really restrictive constraints.]
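A runnable rendering of the permutation step (illustrative, not the original poster's code), assuming the all-pairs distance table d produced by the Floyd-Warshall step above:

from itertools import permutations
from math import inf

# Sketch: try every ordering of the 'mustpass' nodes between 'start' and 'end'.
def best_order(d, start, end, mustpass):
    best_cost, best_seq = inf, None
    for perm in permutations(mustpass):
        seq = (start, *perm, end)
        cost = sum(d[a][b] for a, b in zip(seq, seq[1:]))
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_cost, best_seq

The actual walk is then recovered by concatenating the stored shortest paths between consecutive nodes of the best sequence.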
Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end.
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 inbetween, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the travelling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This is coming from the mind of a video game programmer who's dealt with these kinds of things before.
This does not have to be treated as a hard TSP instance, because the original question does not require that must-pass nodes be visited only once, and there are only a handful of them. That makes it much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way, but a simple one would be to work through the permutations. Imagine a list of nodes [start,a,b,c,end]. Sum the pairwise shortest distances along [start->a->b->c->end]; this is your new target distance to beat. Now try [start->a->c->b->end], and if that's better set it as the target (and remember that it came from that pattern of nodes). Work through the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(Where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care.)
Andrew Top has the right idea:
1) Dijkstra's algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known heuristics for an NP-complete problem. The only other thing to remember is that after you expand the path back out into the original graph after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
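The short-circuiting step can be as simple as the sketch below (names are illustrative): scan the expanded walk and, whenever a vertex reappears, cut out the loop between its two occurrences, provided the cut does not delete a required vertex that appears nowhere else in the walk.

# Sketch: remove loops from an expanded walk without losing any required vertex.
def short_circuit(walk, required):
    required = set(required)
    changed = True
    while changed:
        changed = False
        first = {}
        for i, v in enumerate(walk):
            if v in first:
                shortened = walk[: first[v] + 1] + walk[i + 1:]   # cut the loop out
                if required <= set(shortened):                    # no required node lost
                    walk, changed = shortened, True
                    break
            else:
                first[v] = i
    return walk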
Considering that the number of nodes and edges is relatively small, you can probably enumerate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard, no matter what algorithm you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question asks about visiting the must-pass nodes in ANY order. I had been searching for a solution where the order of the must-pass nodes is fixed; since no question on Stack Overflow covered that case, I'm posting it here so that as many people as possible can benefit from it.
If the order of the must-pass nodes is fixed, you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order) and stop at e. Then what you can do is run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path are given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
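A minimal sketch of that chained-Dijkstra idea, assuming a weighted networkx graph and an ordered list of waypoints [s, k1, k2, k3, e] (the function name is illustrative):

import networkx as nx

# Sketch: fixed-order must-pass nodes reduce to a chain of independent shortest paths.
def path_through_ordered_waypoints(G, waypoints):
    total, full_path = 0, [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        cost, leg = nx.single_source_dijkstra(G, a, target=b, weight="weight")
        total += cost
        full_path.extend(leg[1:])
    return total, full_path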
How about using brute force on the dozen 'must visit' nodes? You can enumerate all the possible orderings of the 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths marked with * like this:
Do an A* search from the start node to every point on the circuit.
For each of these, do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction).
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere is whether it is OK for the same vertex to be visited more than once in the path. Most of the answers here assume that it's OK to visit the same edge multiple times, but my take, given the question (a path should not visit the same vertex more than once!), is that it is not OK to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.
