I have a digraph which is strongly connected (i.e. there is a path from i to j and j to i for each pair of nodes (i, j) in the graph G). I wish to find a strongly connected graph out of this graph such that the sum of all edges is the least.
To put it differently, I need to get rid of edges in such a way that after removing them, the graph will still be strongly connected and of least cost for the sum of edges.
I think it's an NP hard problem. I'm looking for an optimal solution, not approximation, for a small set of data like 20 nodes.
A more general description: Given a graph G(V,E), find a graph G'(V,E') such that if there exists a path from v1 to v2 in G, then there also exists a path from v1 to v2 in G', and the sum of the weights of the edges in E' is the least possible. So it's similar to finding a minimum equivalent graph, only here we want to minimize the sum of edge weights rather than the number of edges.
My approach so far:
I thought of solving it using TSP with multiple visits, but it is not correct. My goal here is to cover each city but using a minimum cost path. So, it's more like the cover set problem, I guess, but I'm not exactly sure. I'm required to cover each and every city using paths whose total cost is minimum, so visiting already visited paths multiple times does not add to the cost.

Your problem is known as minimum spanning strong sub(di)graph (MSSS) or, more generally, minimum cost spanning sub(di)graph and is NP-hard indeed. See also another book: page 501 and page 480.
I'd start with removing all edges that don't satisfy the triangle inequality - you can remove edge a -> c if going a -> b -> c is cheaper. This reminds me of TSP, but don't know if that leads anywhere.
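A minimal sketch of that pruning step, assuming non-negative weights, nodes numbered 0..n-1 and edges given as a dict (the function name and input format are made up for illustration). It drops every edge that is strictly more expensive than the shortest path between its endpoints; such an edge can always be replaced by that path, so it cannot be part of a minimum-cost solution:

```python
def prune_dominated_edges(n, edges):
    """edges: dict {(u, v): weight} of a digraph on nodes 0..n-1.

    Returns the edges that are not strictly more expensive than the
    shortest u -> v path (assumes non-negative weights).
    """
    INF = float("inf")
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
    # Floyd-Warshall: all-pairs shortest distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Keep an edge only if no cheaper detour exists.
    return {(u, v): w for (u, v), w in edges.items() if w <= dist[u][v]}
```

This only shrinks the instance; the remaining problem is still NP-hard.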
My previous answer was to use the Chu-Liu/Edmonds algorithm which solves Arborescence problem; as Kazoom and ShreevatsaR pointed out, this doesn't help.

I would try this in a dynamic programming kind of way.
0- put the graph into a list
1- make a list of new subgraphs of each graph in the previous list, where you remove one different edge for each of the new subgraphs
2- remove duplicates from the new list
3- remove all graphs from the new list that are not strongly connected
4- compare the best graph from the new list with the current best, if better, set new current best
5- if the new list is empty, the current best is the solution, otherwise, recurse/loop/goto 1
In Lisp, it could perhaps look like this:
(defun best-subgraph (digraphs &optional (current-best (best digraphs)))
  (let* ((new-list (remove-if-not #'strongly-connected
                                  (remove-duplicates (list-subgraphs-1 digraphs)
                                                     :test #'digraph-equal)))
         (this-best (best (cons current-best new-list))))
    (if (null new-list)
        this-best
        (best-subgraph new-list this-best))))
The definitions of strongly-connected, list-subgraphs-1, digraph-equal, best, and better are left as an exercise for the reader.

This problem is equivalent to the problem described here:

Few ideas that helped me to solve the famous facebull puzzle:
Preprocessing step:
Pruning: remove every edge a-b for which another path (for example a-c-b) is cheaper or has the same cost.
Graph decomposition: you can solve subproblems independently if the graph has articulation points.
Merge vertices into one virtual vertex if there is only one outgoing edge.
Calculation step:
Get an approximate solution using the directed TSP with repeated visits. Use Floyd-Warshall and then solve the assignment problem in O(N^3) using the Hungarian method. If we get a single cycle, it is the directed TSP solution; if not, use branch-and-bound TSP. After that we have an upper bound: the cost of the minimum-cost cycle.
Exact solution: a branch and bound approach. We remove the vertices from the shortest cycle and try to build a strongly connected graph with a lower cost than the upper bound.
That's all folks. If you want to test your solutions - try it here:

Sounds like you want to use the Dijkstra algorithm

Seems like what you basically want is an optimal solution for traveling-salesman where it is permitted for nodes to be visited more than once.
Hmm. Could you solve this by essentially iterating over each node i and then doing a minimum spanning tree of all the edges pointing to that node i, unioned with another minimum spanning tree of all the edges pointing away from that node?

A 2-approximation to the minimal strongly connected subgraph is obtained by taking a union of a minimal in-branching and minimal out-branching, both rooted at the same (but arbitrary) vertex.
An out-branching, also known as arborescence, is a directed tree rooted at a single vertex spanning all vertexes. An in-branching is the same with reverse edges. These can be found by Edmonds' algorithm in time O(VE), and there are speedups to O(E log(V)) (see the wiki page). There is even an open source implementation.
The original reference for the 2-approximation result is the paper by JaJa and Frederickson, but the paper is not freely accessible.
There is even a 3/2 approximation by Adrian Vetta (PDF), but more complicated than the above.


Find all cyclic paths in a directed graph

The title is self explanatory. Here's a solution that I found in the internet that can help do this. Here's the link
I don't understand why not visiting a vertex having weight below the given threshold will solve the problem.
Additionally, I have no idea how to solve this using/not using this.
Let's restrict this to simple cycles - those which contain no subcycles. For each node in the graph, begin a depth-first search from that node and record each branch of the recursion tree which leads back to it. While searching, never cross over nodes already traversed in the branch.
Consider the complete directed graph on n vertices. There are n(n-1) arcs and (n-1)! simple cycles of length n. The algorithm above isn't much worse than this at all. Simply constructing a new copy of the answer would take nearly as much time as running the above algorithm, in the worst case at least.
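The depth-first search described above could be sketched like this. The adjacency-dict format and the function name are assumptions, and there is one addition not in the description: each search only walks through nodes larger than its start node, so every cycle is reported exactly once (starting at its smallest node) instead of once per rotation.

```python
def simple_cycles(graph):
    """graph: dict mapping node -> iterable of successor nodes.

    Nodes must be mutually comparable (e.g. all ints or all strings).
    Returns each simple cycle as a list of nodes, smallest node first.
    """
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.append(path[:])          # closed a cycle
            elif nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                on_path.remove(nxt)             # backtrack
                path.pop()

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles
```

It is recursive and exponential in the worst case, so it is only meant for small graphs.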
If you want to find cycles in a directed (or even undirected) graph there is an intuitive way to do it:
For each edge (u, v) in the graph
1. Temporarily ignore the edge (u, v) in step 2
2. Run an algorithm to find all paths from v to u (using a backtracking algorithm)
3. Output the computed paths in step 2 along with the edge (u, v) as cycles in the graph
Note that you will get duplicate cycles this way since a cycle of length k will be found k times.
You can play with this idea to find cycles with specific properties as well. For example, if you want the shortest weighted cycle in the graph rather than all cycles, you can use Dijkstra in step 2 and take the minimum over all the cycles you find. If you want the cycle with the fewest edges, you can use a BFS in step 2.
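Here is a rough sketch of the shortest-weighted-cycle variant, assuming a directed graph with non-negative weights stored as an adjacency dict (names and input format are hypothetical). For every edge (u, v) it ignores that edge, runs Dijkstra from v, and closes the cycle with the edge itself:

```python
import heapq


def dijkstra(adj, source, skip_edge=None):
    """Shortest distances from source; adj maps node -> list of (succ, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj.get(u, ()):
            if (u, v) == skip_edge:
                continue                  # step 1: temporarily ignore (u, v)
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def shortest_cycle_length(adj):
    """Length of the cheapest directed cycle, or infinity if the graph is acyclic."""
    best = float("inf")
    for u in adj:
        for v, w in adj[u]:
            back = dijkstra(adj, v, skip_edge=(u, v)).get(u)   # step 2
            if back is not None:
                best = min(best, w + back)                     # step 3
    return best
```

For an undirected graph stored with both directions, the reverse copy of the edge would have to be skipped as well, otherwise every edge looks like a cycle of length two.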
If you are struggling more with finding all paths in a graph, this question might help you, although it's for a slightly different problem.
Counting/finding paths with backtracking

Algorithm to cover all edges given starting node

Working on an algorithm for a game I am developing with a friend and we got stuck. Currently, we have a cyclic undirected graph, and we are trying to find the quickest path from starting node S that covers every edge. We are not looking for a tour and there can be repeated edges.
Any ideas on an algorithm or approximation? I'm sure this problem is NP-hard, but I don't believe it's TSP.
Route Inspection
This is known as the route inspection problem and it does have a polynomial solution.
The basic idea (see the link for more details) is that it is easy to solve for an Eulerian path (where we visit every edge once), but an Eulerian path is only possible for certain graphs.
In particular, a graph has to be connected and have either 0 or 2 vertices of odd degree.
However, it is possible to generalise this for other graphs by adding additional edges in the cheapest way that will produce a graph that does have an Eulerian path. (Note that we have added more edges so we may travel multiple times over edges in the original graph.)
Choosing the best set of additional edges is a minimum-weight perfect matching problem on the odd-degree vertices, which can be solved in O(n^3).
Coincidentally I wrote a simple demo earlier today (link to game) for a planar max-cut problem. The solution to this turns out to be based on exactly the same route inspection problem :)
I just spotted from the comments that in your particular case your graph may be a tree.
If so, then I believe the answer is much simpler as you just need to do a DFS over the tree making sure to visit the shallowest subtree first.
For example, suppose you have a tree with edges S->A and S->B->C. S has two subtrees, and you should visit A first because it is shallower.
The total edges visited will equal the number of edges visited in a full DFS, minus the depth of the last leaf visited, which is why to minimise the total edges you want to maximise the depth of the last leaf, and hence visit the shallowest subtree first.
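The counting argument above can be turned into a few lines of code. This is only a sketch for an unweighted tree given as a parent-to-children dict (a made-up input format): a full DFS walks every edge twice, and stopping at the deepest leaf saves the walk back up.

```python
def min_walk_covering_tree(children, root):
    """children: dict node -> list of child nodes of an unweighted rooted tree.

    Returns the fewest edge traversals needed to cover every edge
    starting at root, without having to return to root.
    """
    edges = 0
    deepest = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in children.get(node, ()):
            edges += 1
            stack.append((child, depth + 1))
    # Full DFS = 2 * edges; we skip the climb back from the last (deepest) leaf.
    return 2 * edges - deepest
```

For a tree with edges S->A, S->B and B->C this gives 2*3 - 2 = 4, for instance the walk S, A, S, B, C.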
This is somewhat like the Eulerian Path. The main distinction is that there may be dead-ends and you may be able to modify the algorithm to suit your needs. Pruning dead-ends is one option or you may be able to reduce the graph into a number of connected components.
DFS will work here. However, you must have a good evaluation function to prune branches early; otherwise you cannot solve this problem fast. You can refer to my discussion and implementation in Java here
Detail of my evaluation function
My first try is: if the length of the current path plus the distance from U to G is not shorter than the minimum length found so far (stored in the minLength variable), we do not visit U next, because it cannot lead to a shorter path.
Actually, the above evaluation function is not efficient because it only works when we have already visited most of the cities. We need a more precise lower bound on the length needed to reach G with all cities visited.
Assume s is the length from S to U. From U, to reach G and pass through all cities, the length is at least s' = s + ∑ minDistance(K), where K is an unvisited city different from U, and minDistance(K) is the minimum distance from K to an unvisited state. Basically, for each unvisited state, we assume that we can reach that city with the shortest edge. Note that those shortest edges may not compose a valid path. Then, we will not visit U if s' ≥ minLength.
With that evaluation function, my program can handle the problem with 20 cities within 1 second. I also added another optimization to improve the performance further. Before running the program, I use a greedy algorithm to get a good value for minLength. Specifically, from each city, we visit the nearest city next. The reason is that with a smaller minLength, we can prune more.

Finding X the lowest cost trees in graph

I have Graph with N nodes and edges with cost. (graph may be Complete but also can contain zero edges).
I want to find K trees in the graph (K < N) to ensure every node is visited and cost is the lowest possible.
Any recommendations what the best approach could be?
I tried to reduce the problem to finding just a single minimal spanning tree, but didn't succeed.
Thank you for any hint!
A little detail, which can be significant: the cost is not related to crossing the edge. The cost is the price to BUILD such an edge. Once an edge is built, you can traverse it forwards and backwards at no cost. The problem is not to "ride along all nodes", the problem is about "creating a net among all nodes". I am sorry for the previous explanation.
The story
Here is the story I have heard and am trying to solve.
There is a city without a connection to electricity. The electrical company is able to connect just K houses to electricity directly. The other houses can be connected by dropping cables from already connected houses, but dropping a cable costs something. The goal is to choose which K houses will be connected directly to the power plant and which houses will be connected with separate cables, so as to ensure minimal cable cost and coverage of all houses :)
As others have mentioned, this is NP hard. However, if you're willing to accept a good solution, you could use simulated annealing. For example, the traveling salesman problem is NP hard, yet near-optimal solutions can be found using simulated annealing, e.g.
You are describing something like a cardinality constrained path cover. It's in the Traveling Salesman/ Vehicle routing family of problems and is NP-Hard. To create an algorithm you should ask
Are you only going to run it on small graphs.
Are you only going to run it on special cases of graphs which do have exact algorithms.
Can you live with a heuristic that solves the problem approximately.
Assume you can find a minimum spanning tree in O(V^2) using prim's algorithm.
For each vertex, find the minimum spanning tree with that vertex as the root.
This will be O(V^3) as you run the algorithm V times.
Sort these by total mass (sum of weights of their vertices) of the graph. This is O(V^2 lg V) which is consumed by the O(V^3) so essentially free in terms of order complexity.
Take the X least massive graphs - the roots of these are your "anchors" that are connected directly to the grid, as they are mostly likely to have the shortest paths. To determine which route it takes, you simply follow the path to root in each node in each tree and wire up whatever is the shortest. (This may be further optimized by sorting all paths to root and using only the shortest ones first. This will allow for optimizations on the next iterations. Finding path to root is O(V). Finding it for all V X times is O(V^2 * X). Because you would be doing this for every V, you're looking at O(V^3 * X). This is more than your biggest complexity, but I think the average case on these will be small, even if their worst case is large).
I cannot prove that this is the optimal solution. In fact, I am certain it is not. But when you consider an electrical grid of 100,000 homes, you can not consider (with any practical application) an NP hard solution. This gives you a very good solution in O(V^3 * X), which I imagine is going to give you a solution very close to optimal.
Looking at your story, I think that what you call a path can be a tree, which means that we don't have to worry about Hamiltonian circuits.
Looking at the proof of correctness of Prim's algorithm, consider taking a minimum spanning tree and removing the most expensive X-1 links. I think the proof there shows that the result has the same cost as the best possible answer to your problem: the only difference is that when you compare edges, you may find that the new edge joins two separated components, but in this case you can maintain the number of separated components by removing an edge with cost at most that of the new edge.
So I think an answer for your problem is to take a minimum spanning tree and remove the X-1 most expensive links. This is certainly the case for X=1!
Here is attempt at solving this...
For X=1 I can calculate minimal spanning tree (MST) with Prim's algorithm from each node (this node is the only one connected to the grid) and select the one with the lowest overall cost
For X=2 I create an extra node (the power plant node) beside my graph. I connect it with a random node (e.g. N0) by an edge with cost 0. I am now sure I have one power plant plug right (the random node will definitely be in one of the trees, so the whole tree will be connected). Now the iterative part. I take another node (e.g. N1) and again connect it with PP by a 0-cost edge. Now I calculate the MST. Then I repeat this process, replacing N1 with N2, N3, ...
So I will test every pair [N0, NX]. The lowest cost MST wins.
For X>2 it is really the same as for X=2, but I have to test connecting every (X-1)-tuple to PP and calculate the MST
with x^2 for MST I have complexity about (N over X-1) * x^2... Pretty complex, but I think it will give me THE OPTIMAL solution
what do you think?
edit by random node I mean random but FIXED node
attempt to visualize for x=2 (each description belongs to image above it)
Let this be our city, nodes A - F are houses, edges are candidates to future cables (each has some cost to build)
Just for image, this could be the solution
Let the green one be the power plant, this is how can look connection to one tree
But this different connection is really the same (connection to power plant(pp) cost the same, cables remains untouched). That is why we can set one of the nodes as fixed point of contact to the pp. We can be sure, that the node will be in one of the trees, and it does not matter where in the tree is.
So let this be our fixed situation with G as PP. Edge (B,G) with zero cost is added.
Now I am trying to connect second connection with PP (A,G, cost 0)
Now I calculate the MST from the PP. Because the red edges are the cheapest (they can actually have even negative cost), it is certain that both of them will be in the MST.
So when running MST I get something like this. Imagine detaching the PP: two MINIMAL COST trees are left. This is the best solution when A and B are the connections to the PP. I store the cost and move on.
Now I do the same for B and C connections
I could get something like this, so compare cost to previous one and choose the better one.
This way I have to try all the connection pairs (B,A) (B,C) (B,D) (B,E) (B,F) and the cheapest one is the winner.
For X=3 I would just test other tuples with one fixed again. (A,B,C) (A,B,D) ... (A,C,D) ... (A,E,F)
I just came up with the easy solution as follows:
N - node count
C - direct connections to the grid
E - available edges
1, Sort all edges by cost
2, Repeat (N-C) times:
   Take the cheapest edge available
   Check whether adding this edge would create a cycle among the already added edges
   If not, add this edge
3, That is all... You will end up with C disjoint sets of edges, connect every set to the grid
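The steps above are essentially Kruskal's algorithm stopped early. A small sketch, assuming nodes are numbered 0..N-1 and edges come as (cost, u, v) tuples (both assumptions, as is the function name), and reading "repeat (N-C) times" as "until N-C edges have been accepted":

```python
def cheapest_forest(n, edges, c):
    """Pick edges so that the n nodes end up in c trees of minimal total cost.

    edges: list of (cost, u, v) with u, v in 0..n-1.
    Returns the accepted edges; each resulting tree then gets one grid connection.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for cost, u, v in sorted(edges):        # step 1: cheapest edges first
        if len(chosen) == n - c:            # step 2: stop after N-C accepted edges
            break
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding the edge creates no cycle
            parent[ru] = rv
            chosen.append((cost, u, v))
    return chosen
```

If the input graph itself has more than C connected components, fewer than N-C edges are returned and more grid connections are unavoidable.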
Sounds like the famous Traveling Salesman problem. The problem is known to be NP-hard. Take a look at the Wikipedia as your starting point:

Minimal path - all edges at least once

I have a directed graph with a lot of cycles, probably strongly connected, and I need to get a minimal cycle from it. I mean I need the shortest cycle in the graph that covers every edge at least once.
I have been searching for an algorithm or some theoretical background, but the only thing I have found is the Chinese postman algorithm. But that solution is not for directed graphs.
Can anybody help me? Thanks
Edit>> All edges of that graph have the same cost - for instance 1
Take a look at this paper - Directed Chinese Postman Problem. That is the correct problem classification though (assuming there are no more restrictions).
If you're just reading into theory, take a good read at this page, which is from the Algorithms Design Manual.
Key quote (the second half for the directed version):
The optimal postman tour can be constructed by adding the appropriate edges to the graph G so as to make it Eulerian. Specifically, we find the shortest path between each pair of odd-degree vertices in G. Adding a path between two odd-degree vertices in G turns both of them to even-degree, thus moving us closer to an Eulerian graph. Finding the best set of shortest paths to add to G reduces to identifying a minimum-weight perfect matching in a graph on the odd-degree vertices, where the weight of edge (i,j) is the length of the shortest path from i to j. For directed graphs, this can be solved using bipartite matching, where the vertices are partitioned depending on whether they have more ingoing or outgoing edges. Once the graph is Eulerian, the actual cycle can be extracted in linear time using the procedure described above.
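For the last step of that quote (extracting the cycle once the graph is Eulerian), a common choice is Hierholzer's algorithm. The sketch below assumes the graph is already Eulerian, i.e. connected with in-degree equal to out-degree at every vertex, and that duplicated arcs simply appear several times in the successor lists; it does not do the matching step. The function name and input format are made up.

```python
def eulerian_circuit(adj, start):
    """adj: dict node -> list of successors, one list entry per arc.

    Returns a closed walk (list of nodes) that uses every arc exactly once,
    assuming the directed graph is Eulerian.
    """
    remaining = {u: list(vs) for u, vs in adj.items()}   # unused arcs per node
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())   # follow an unused arc
        else:
            circuit.append(stack.pop())        # dead end: node is finished
    circuit.reverse()
    return circuit
```

Every arc is pushed and popped once, so the running time is linear in the number of arcs.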
I doubt that it's optimal, but you could do a queue based search assuming the graph is guaranteed to have a cycle. Each queue entry would contain a list of nodes representing paths. When you take an element off the queue, add all possible next steps to the queue, ensuring you are not re-visiting nodes. If the last node is the same as the first node, you've found the minimum cycle.
what you are looking for is called "Eulerian path". You can google it to find enough info, basics are here
And about algorithm, there is an algorithm called Fleury's algorithm, google for it or take a look here
I think it might be worth while just simply writing which vertices are odd and then find which combo of them will lead to the least amount of extra time (if the weights are for times or distances) then the total length will be every edge weight plus the extra. For example, if the odd order vertices are A,B,C,D try AB&CD then AC&BD and so on. (I'm not sure if this is a specifically named method, it just worked for me).
edit: just realised this mostly only works for undirected graphs.
The special case in which the network consists entirely of directed edges can be solved in polynomial time. I think the original paper is Matching, Euler tours and the Chinese postman (1973) - a clear description of the algorithm for the directed graph problem begins on page 115 (page 28 of the pdf):
When all of the edges of a connected graph are directed and the graph
is symmetric, there is a particularly simple and attractive algorithm for
specifying an Euler tour...
The algorithm to find an Euler tour in a directed, symmetric, connected graph G is to first find a spanning arborescence of G. Then, at
any node n, except the root r of the arborescence, specify any order for
the edges directed away from n so long as the edge of the arborescence
is last in the ordering. For the root r, specify any order at all for the
edges directed away from r.
This algorithm was used by van Aardenne-Ehrenfest and de Bruijn to
enumerate all Euler tours in a certain directed graph [ 1 ].

Find the shortest path in a graph which visits certain nodes

I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there's about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( / is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: Find all pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j]=INF
for i=1 to n: d[i][i]=0
for each edge (i,j) with length w: d[i][j]=d[j][i]=w
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, its running time is O(n^3) for the Floyd-Warshall part, and O(k!n) for the all permutations part, and 100^3+(12!)(100) is practically peanuts unless you have some really restrictive constraints.]
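For reference, here is roughly how the approach could look in runnable form. It is a sketch under assumptions: nodes are numbered 0..n-1, the graph is undirected and given as (u, v, length) triples, and the function name is made up. It also remembers which order of 'mustpass' nodes was best.

```python
from itertools import permutations


def shortest_route(n, edges, start, end, mustpass):
    """Return (length, order) of the shortest start -> end walk through mustpass."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:                 # undirected: fill both directions
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    # Floyd-Warshall: all-pairs shortest distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # Try every visiting order of the mustpass nodes.
    best, best_order = INF, None
    for order in permutations(mustpass):
        stops = (start, *order, end)
        length = sum(d[a][b] for a, b in zip(stops, stops[1:]))
        if length < best:
            best, best_order = length, order
    return best, best_order
```

To recover the actual path rather than just its length, a next-hop table would have to be filled alongside d in the Floyd-Warshall loop.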
Run Dijkstra's Algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 inbetween, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that its only overcomplicating things. ^_^ This coming from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard because the original question does not require that must-pass nodes are visited only once. This makes the answer much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go but a simple one would be to simply work a binary tree backwards. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end] this is your new target distance to beat. Now try [start->a->c->b->end] and if that's better set that as the target (and remember that it came from that pattern of nodes). Work backwards over the permutations:
One of those will be shortest.
(where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care)
Andrew Top has the right idea:
1) Dijkstra's Algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known for any NP Complete problem. The only other thing to remember is that after you expanded out the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering the amount of nodes and edges is relatively finite, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard: no known algorithm solves it in polynomial time.
The question talks about must-pass in ANY order. I have been trying to search for a solution about the defined order of must-pass nodes. I found my answer but since no question on StackOverflow had a similar question I'm posting here to let maximum people benefit from it.
If the order or must-pass is defined then you could run dijkstra's algorithm multiple times. For instance let's assume you have to start from s pass through k1, k2 and k3 (in respective order) and stop at e. Then what you could do is run dijkstra's algorithm between each consecutive pair of nodes. The cost and path would be given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
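A small sketch of that sum, assuming non-negative edge weights and an adjacency dict of (neighbour, weight) lists; the names dijkstra and ordered_route_cost are only for illustration:

```python
import heapq


def dijkstra(adj, source, target):
    """Shortest distance from source to target, or infinity if unreachable."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d                      # first pop of target is optimal
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def ordered_route_cost(adj, stops):
    """Cost of visiting stops in the given order, e.g. [s, k1, k2, k3, e]."""
    return sum(dijkstra(adj, a, b) for a, b in zip(stops, stops[1:]))
```

Keeping predecessor pointers in each run and concatenating the legs would give the path itself instead of only the cost.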
How about using brute force on the dozen 'must visit' nodes. You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this
Do an A* search from the start node to every point on the circuit
for each of these do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction)
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere, is whether it is ok for the same vertex to be visited more than once in the path. Most of the answers here assume that it's ok to visit the same edge multiple times, but my take given the question (a path should not visit the same vertex more than once!) is that it is not ok to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.
