Find cycle of shortest length in a directed graph with positive weights - algorithm

I was asked this question in an interview, but I couldn't come up with any decent solution. So, I told them the naive approach of finding all the cycles then picking the cycle with the least length.
I'm curious to know what is an efficient solution to this problem.

You can easily modify the Floyd-Warshall algorithm. (If you're not familiar with graph theory at all, I suggest checking it out, e.g. by getting a copy of Introduction to Algorithms.)
Traditionally, you start with path[i][i] = 0 for each i. But you can instead start with path[i][i] = INFINITY. This doesn't affect the algorithm itself, since those zeroes were never used in the computation anyway (path[i][j] never changes for k == i or k == j).
In the end, path[i][i] is the length of the shortest cycle going through i. Consequently, you need to find min(path[i][i]) over all i. And if you want the cycle itself (not only its length), you can recover it just as you would a normal path: by memorizing k during the execution of the algorithm.
In addition, you can also use Dijkstra's algorithm to find the shortest cycle going through any given node. If you run this modified Dijkstra for each node, you'll get the same result as with Floyd-Warshall. And since each Dijkstra run is O(n^2), you get the same O(n^3) overall complexity.

The pseudocode is a simple modification of Dijkstra's algorithm.

for all u in V:
    for all v in V:
        path[u][v] = infinity

for all s in V:
    path[s][s] = 0
    H = makequeue(V)   .. using the values in the path[s] array as keys
    while H is not empty:
        u = deletemin(H)
        for all edges (u, v) in E:
            if path[s][v] > path[s][u] + l(u, v) or (v == s and path[s][s] == 0):
                .. the second case lets the first edge back into s record a cycle
                path[s][v] = path[s][u] + l(u, v)
                decreaseKey(H, v)

lengthMinCycle = INT_MAX
for all v in V:
    if path[v][v] < lengthMinCycle and path[v][v] != 0:
        lengthMinCycle = path[v][v]

if lengthMinCycle == INT_MAX:
    print("The graph is acyclic.")
else:
    print("Length of minimum cycle is ", lengthMinCycle)

Time Complexity: O(|V|^3)
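For reference, here is a runnable Python sketch of the Dijkstra-per-node idea above (not the answerer's code; the function name and adjacency-list format are my own, and with a binary heap each run is O(E log V) rather than the O(n^2) array version quoted above):

import heapq
import math

def shortest_cycle(n, adj):
    """Runs Dijkstra from every node s and closes cycles via edges (u, s).

    n: number of vertices (0..n-1); adj[u] = list of (v, w) pairs with w > 0.
    Returns the length of the shortest cycle, or math.inf if the graph is acyclic.
    """
    best = math.inf
    for s in range(n):
        dist = [math.inf] * n
        dist[s] = 0
        pq = [(0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                        # stale queue entry
            for v, w in adj[u]:
                if v == s:
                    best = min(best, d + w)     # an edge back to the source closes a cycle
                elif d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
    return best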

Perform a DFS.
During the DFS, keep track of the type of each edge.
The edge types are Tree Edge, Back Edge, Down Edge and Parent Edge.
Whenever you encounter a Back Edge, you have found a cycle; keep a separate counter to record its length.
See Algorithms in C++ Part 5 - Robert Sedgewick for more details.
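To make that concrete, here is a small Python sketch of DFS edge classification (my own illustration, not from the book). It records, for every back edge, the number of edges on the cycle it closes along the current DFS path; note that this detects cycles and their lengths along the DFS tree, but in a weighted graph it does not by itself guarantee the minimum-weight cycle.

import sys

def cycle_via_back_edges(n, adj):
    """DFS edge classification; reports the fewest edges over cycles closed by back edges.

    n: number of vertices (0..n-1); adj[u] = list of successors of u.
    This is a cycle detector with lengths, not a guaranteed minimum-weight cycle finder.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * n
    depth = [0] * n
    best = sys.maxsize

    def visit(u, d):
        nonlocal best
        color[u], depth[u] = GREY, d
        for v in adj[u]:
            if color[v] == WHITE:                      # tree edge
                visit(v, d + 1)
            elif color[v] == GREY:                     # back edge: closes a cycle
                best = min(best, depth[u] - depth[v] + 1)
            # edges to BLACK vertices are down/cross edges: no cycle along the tree path
        color[u] = BLACK

    for s in range(n):
        if color[s] == WHITE:
            visit(s, 0)
    return best                                        # sys.maxsize means "no cycle found"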

What you will have to do is assign another weight to each node, which is always 1. Now run any shortest-path algorithm from one node back to the same node using these weights. But while considering the intermediate paths, you will have to ignore the paths whose actual weights are negative.

Below is a simple modification of the Floyd-Warshall algorithm.

V = 4
INF = 999999

def minimumCycleLength(graph):
    # dist[i][j] = length of the shortest path from i to j
    dist = [[graph[i][j] for j in range(V)] for i in range(V)]
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    # dist[i][i] is now the length of the shortest cycle through i
    # (the diagonal starts at INF, not 0, so it is only filled in by real cycles)
    length = INF
    for i in range(V):
        length = min(length, dist[i][i])
    return length

graph = [[INF, 1,   1,   INF],
         [INF, INF, 1,   INF],
         [1,   INF, INF, 1],
         [INF, INF, INF, 1]]

print(minimumCycleLength(graph))

Related

DAG Kth shortest path dynamic programming

This is not for homework. I am working through a practice test (not graded) in preparation for a final in a couple of weeks. I have no idea where to go with this one.
Let G = (V, E) be a DAG (directed acyclic graph) of n vertices and m edges.
Each edge (u, v) of E has a weight w(u, v) that is an arbitrary value (positive, zero, or negative).
Let k be an input positive integer.
A path in G is called a k-link path if the path has no more than k edges. Let s and t be two vertices of G. A k-link shortest path from s to t is defined as a k-link path from s to t that has the minimum total sum of edge weights among all possible k-link s-to-t paths in G.
Design an O(k(m+ n)) time algorithm to compute a k-link shortest path from s to t.
Any help on the algorithm would be greatly appreciated.
Let dp[amount][currentVertex] give us the length of the shortest path in G which starts from s, ends at currentVertex, and consists of amount edges.

make all values of dp unset
dp[0][s] = 0

for pathLength in (0, 1, .., k-1)                               // (1)
    for vertex in V
        if dp[pathLength][vertex] is set
            for each u where (vertex, u) is in E                // (2), the other endpoint of the edge
                if dp[pathLength+1][u] is unset or greater than dp[pathLength][vertex] + cost(vertex, u)
                    set dp[pathLength+1][u] = dp[pathLength][vertex] + cost(vertex, u)

best = infinity
for pathLength in (0, 1, .., k)
    if dp[pathLength][t] is set and dp[pathLength][t] < best
        best = dp[pathLength][t]
The algorithm above will give you the length of the k-link shortest path from s to t in G. Its time complexity is dominated by loop (1). Loop (1) alone has complexity O(k), while its inner part (2) simply traverses the graph. If you use an adjacency list, (2) can be implemented in O(n+m). Therefore the overall complexity is O(k*(n+m)).
However, this will give you only the length of the path, and not the path itself. You can modify this algorithm by storing the previous vertex for each value of dp[][]. Thus, whenever you set the value of dp[pathLength+1][u] with the value of dp[pathLength][vertex] + cost(vertex, u) for some variables vertex, u, pathLength you would know that the previous used vertex was vertex. Therefore, you would store it like prev[pathLength+1][u] = vertex.
After that, you can recover the path you want. The idea is to go backwards using the links you created in prev:

pLen = the pathLength such that dp[pathLength][t] is minimal
curVertex = t
path = []                    // empty array
while pLen >= 0
    insert curVertex at the beginning of path
    curVertex = prev[pLen][curVertex]
    pLen = pLen - 1

Now path stores the k-link shortest path from s to t in G.
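For completeness, here is a runnable Python sketch of the same DP (my own function name and edge-list format, not code from the answer above):

import math

def k_link_shortest_path(n, edges, s, t, k):
    """dp[m][v] = length of the shortest s->v path that uses exactly m edges.

    n: number of vertices (0..n-1); edges: list of (u, v, w); k: max number of edges.
    Returns (length, path), or (math.inf, None) if t is unreachable within k edges.
    """
    INF = math.inf
    dp = [[INF] * n for _ in range(k + 1)]
    prev = [[None] * n for _ in range(k + 1)]
    dp[0][s] = 0
    for m in range(k):                               # loop (1)
        for (u, v, w) in edges:                      # loop (2): traverse every edge once
            if dp[m][u] != INF and dp[m][u] + w < dp[m + 1][v]:
                dp[m + 1][v] = dp[m][u] + w
                prev[m + 1][v] = u
    best_m = min(range(k + 1), key=lambda m: dp[m][t])
    if dp[best_m][t] == INF:
        return INF, None
    path, cur = [t], t                               # walk the prev links back to s
    for m in range(best_m, 0, -1):
        cur = prev[m][cur]
        path.append(cur)
    path.reverse()
    return dp[best_m][t], path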

Topological sort to find the number of paths to t

I have to develop an O(|V|+|E|) algorithm related to topological sort which, in a directed acyclic graph (DAG), determines the number of paths from each vertex of the graph to t (t is a node with out-degree 0). I have developed a modification of DFS as follows:

DFS(G, t):
    for each vertex u ∈ V do
        color(u) = WHITE
        paths_to_t(u) = 0
    for each vertex u ∈ V do
        if color(u) == WHITE then
            DFS-Visit(u, t)

DFS-Visit(u, t):
    color(u) = GREY
    for each v ∈ neighbors(u) do
        if v == t then
            paths_to_t(u) = paths_to_t(u) + 1
        else
            if color(v) == WHITE then
                DFS-Visit(v, t)
            paths_to_t(u) = paths_to_t(u) + paths_to_t(v)
    color(u) = BLACK
But I am not sure if this algorithm is related to topological sort or if should I restructure my work with another point of view.
It can be done using Dynamic Programming and topological sort as follows:
Topologically sort the vertices; let the ordered vertices be v1, v2, ..., vn
create a new array of size t, call it arr
init: arr[t] = 1
for i from t-1 down to 1 (descending, inclusive):
    arr[i] = 0
    for each edge (v_i, v_j) such that i < j <= t:
        arr[i] += arr[j]

When you are done, for each i in [1, t], arr[i] indicates the number of paths from v_i to v_t.
Now, proving the above claim is easy (compared to your algorithm, which I have no idea whether it's correct or how to prove it); it is done by induction:
Base: arr[t] == 1, and indeed there is a single path from t to t: the empty one.
Hypothesis: the claim is true for each k in the range m < k <= t.
Proof: we need to show the claim is correct for m.
Look at each out-edge of v_m: (v_m, v_i).
The number of paths to v_t starting from v_m that use this edge (v_m, v_i) is exactly arr[i] (induction hypothesis). Summing over all out-edges of v_m gives the total number of paths from v_m to v_t, and this is exactly what the algorithm does.
Thus, arr[m] = #paths from v_m to v_t.
QED
Time complexity:
The first step (topological sort) takes O(V+E).
The loop iterates over all edges once and over all vertices once, so it is O(V+E) as well.
This gives a total complexity of O(V+E).
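Here is a runnable Python sketch of the above (my own names; it builds the topological order with Kahn's algorithm and numbers the vertices 0..n-1):

from collections import defaultdict

def count_paths_to_t(n, edges, t):
    """Counts, for every vertex, the number of paths to t in a DAG.

    n: number of vertices (0..n-1); edges: list of (u, v) pairs; t: the target vertex.
    Returns a list cnt where cnt[u] is the number of u -> t paths.
    """
    # Kahn's algorithm for a topological order, O(V + E)
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    order, stack = [], [u for u in range(n) if indeg[u] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)

    # Process vertices in reverse topological order; cnt[t] = 1 (the empty path)
    cnt = [0] * n
    cnt[t] = 1
    for u in reversed(order):
        if u != t:
            cnt[u] = sum(cnt[v] for v in adj[u])
    return cnt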

Path finding algorithm on graph considering both nodes and edges

I have an undirected graph. For now, assume that the graph is complete. Each node has a certain value associated with it. All edges have a positive weight.
I want to find a path between any 2 given nodes such that the sum of the values associated with the path nodes is maximum while at the same time the path length is within a given threshold value.
The solution should be "global", meaning that the path obtained should be optimal among all possible paths. I tried a linear programming approach but am not able to formulate it correctly.
Any suggestions or a different method of solving would be of great help.
Thanks!
If you are looking for an algorithm on a general graph, your problem is NP-complete. Assume the path-length threshold is n-1 and each vertex has value 1; if you could solve your problem, you could decide whether the given graph has a Hamiltonian path. In fact, if the maximum-value path has value n, then you have a Hamiltonian path. I think you can use something like the Held-Karp relaxation to find a good solution.
This might not be perfect, but if the threshold value (T) is small enough, there's a simple algorithm that runs in O(n^3 T^2). It's a small modification of Floyd-Warshall.
d = int array with size n x n x (T + 1)
initialize all d[i][j][k] to -infty
for i in nodes:
    d[i][i][0] = value[i]
for e:(u, v) in edges:
    d[u][v][w(e)] = value[u] + value[v]
for t in 1 .. T:
    for k in nodes:
        for t' in 1 .. t-1:
            for i in nodes:
                for j in nodes:
                    d[i][j][t] = max(d[i][j][t],
                                     d[i][k][t'] + d[k][j][t - t'] - value[k])
The result is the pair (i, j) with the maximum d[i][j][t] for all t in 0..T
EDIT: this assumes that the paths are allowed to be non-simple, i.e. they can contain cycles.
EDIT2: This also assumes that if a node appears more than once in a path, it will be counted more than once. This is apparently not what OP wanted!
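For what it's worth, here is a rough Python transcription of the pseudocode above (my own names, not the answerer's code). It assumes positive integer edge weights, so that a weight can index the third dimension, and an undirected edge list; it inherits the caveats from the two EDITs:

import math

def best_value_within_budget(n, edges, value, T):
    """d[i][j][t] = best node-value sum over i->j walks of total edge weight exactly t.

    n: number of nodes; edges: list of (u, v, w) with positive integer w;
    value: list of node values; T: total edge-weight budget.
    Walks may repeat nodes, and repeated nodes are counted repeatedly.
    """
    NEG = -math.inf
    d = [[[NEG] * (T + 1) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        d[i][i][0] = value[i]
    for (u, v, w) in edges:
        if w <= T:                                   # edges heavier than the budget are useless
            gain = value[u] + value[v]
            d[u][v][w] = max(d[u][v][w], gain)
            d[v][u][w] = max(d[v][u][w], gain)       # the graph is undirected
    for t in range(1, T + 1):
        for k in range(n):
            for tp in range(1, t):
                for i in range(n):
                    for j in range(n):
                        if d[i][k][tp] > NEG and d[k][j][t - tp] > NEG:
                            cand = d[i][k][tp] + d[k][j][t - tp] - value[k]
                            if cand > d[i][j][t]:
                                d[i][j][t] = cand
    # best over every pair (i, j) and every budget t in 0..T
    return max(d[i][j][t] for i in range(n) for j in range(n) for t in range(T + 1))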
Integer program (this may be a good idea or maybe not):
For each vertex v, let x_v be 1 if vertex v is visited and 0 otherwise. For each arc a, let y_a be the number of times arc a is used. Let s be the source and t be the destination. The objective is

maximize ∑_v value(v) · x_v.

The constraints are

∑_a value(a) · y_a ≤ threshold

∀v: ∑_{a has head v} y_a − ∑_{a has tail v} y_a = {−1 if v = s; 1 if v = t; 0 otherwise} (conserve flow)

∀v ≠ s: x_v ≤ ∑_{a has head v} y_a (must enter a vertex to visit it)

∀v: x_v ≤ 1 (visit each vertex at most once)

∀v ∉ {s, t}, ∀ cuts S that separate vertex v from {s, t}: x_v ≤ ∑_{a : tail(a) ∉ S ∧ head(a) ∈ S} y_a (benefit only from vertices not on isolated loops).
To solve, do branch and bound with the relaxation values. Unfortunately, the last group of constraints are exponential in number, so when you're solving the relaxed dual, you'll need to generate columns. Typically for connectivity problems, this means using a min-cut algorithm repeatedly to find a cut worth enforcing. Good luck!
If you just add the weight of a node to the weights of its outgoing edges, you can forget about the node weights. Then you can use any of the standard algorithms for the shortest-path problem.
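As a small illustration of that transformation (my own sketch; as other answers note, a plain shortest-path run still won't maximize node values under a length budget, so this only shows how node weights can be folded into edge weights):

def fold_node_weights_into_edges(adj, node_value):
    """adj[u] = list of (v, w) edge pairs; node_value[u] = the weight attached to node u.
    Returns a new adjacency dict whose edge weights include the source node's weight.
    The destination node's weight still has to be added once at the end of the path."""
    return {u: [(v, w + node_value[u]) for (v, w) in nbrs] for u, nbrs in adj.items()}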

Route problem in a graph: minimize average edge cost instead of total cost

I have a weighted graph, no negative weights, and I would like to find the path from one node to another, trying to minimize the cost for the single step. I don't need to minimize the total cost of the trip (as e.g. Dijkstra does) but the average step-cost. However, I have a constraint: K, the maximum number of nodes in the path.
So, for example, to go from A to J, maybe Dijkstra would find this path (the weights are in parentheses)
A (4) D (6) J -> total cost: 10
and the algorithm I need, setting K = 10, would find something like
A (1) B (2) C (2) D (1) E (3) F (2) G (1) H (3) J -> total cost: 15
Is there any well known algorithm for this problem?
Thanks in advance.
Eugenio
Edit as answer to templatetypedef.
Some questions:
1) The fact that it may take a cycle multiple times to drive down the average is not good for my problem: maybe I should have mentioned it, but I don't want to visit the same node more than once
2) Is it possible to exploit the fact that I don't have negative weights?
3) When you said O(kE) you meant for the whole algorithm or just for the additional part?
Let's take this simple implementation in C, where n = number of nodes, e = number of edges, d is a vector with the distances, p a vector with the predecessors, and a structure edges (u, v, w) stores the edges of the graph:

for (i = 0; i < n; ++i)
    d[i] = INFINITY;
d[s] = 0;

for (i = 0; i < n - 1; ++i)
    for (j = 0; j < e; ++j)
        if (d[edges[j].u] + edges[j].w < d[edges[j].v]) {
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
        }
I'm not sure how I should modify the code according to your answer; to take the average into consideration instead of the total cost, would this be enough?

for (i = 0; i < n; ++i)
    d[i] = INFINITY;
d[s] = 0;

for (i = 0; i < n - 1; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j)
        if ((d[edges[j].u] + edges[j].w) / (steps + 1) < d[edges[j].v] / steps) {
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
}
But anyway, I don't know how to take the K limit into consideration at the same time... Thanks again in advance for your help.
Edit
Since I can afford some errors, I'm thinking about this naive solution:
precompute all the shortest paths and store them in A
precompute all the shortest paths on a modified graph, where I cut the edges above a certain weight, and store them in B
When I need a path, I look it up in A; e.g. from x to y the path is
x->z->y
Then for each step I look in B:
for x->z I check whether there is a connection in B; if not, I keep x->z, otherwise I replace x->z with the sub-path provided by B, which could be something like x->j->h->z; then I do the same for z->y.
Each time I will also check that I'm not adding a cyclic path.
Maybe I will get some weird paths, but it could work in most cases.
If I extend the solution by trying different "cut thresholds", maybe I can also come close to respecting the K constraint.
I believe that you can solve this using a modified version of the Bellman-Ford algorithm.
Bellman-Ford is based on the following dynamic programming recurrence that tries to find the shortest path from some start node s to each other node that's of length no greater than m for some m. As a base case, when you consider paths of length zero, the only reachable node is s and the initial values are
BF(s, t, 0) = infinity
BF(s, s, 0) = 0
Then, if we know the values for a path of length m, we can find it for paths of length m + 1 by noting that the old path may still be valid, or we want to extend some path by length one:
BF(s, t, m + 1) = min {
    BF(s, t, m),
    BF(s, u, m) + d(u, t) for any node u connected to t
}
The algorithm as a whole works by noting that any shortest path must have length no greater than n and then using the above recurrence and dynamic programming to compute the value of BF(s, t, n) for all t. Its overall runtime is O(EV), since there are E edges to consider at each step and V total vertices.
Let's see how we can change this algorithm to solve your problem. First, to limit this to paths of length k, we can just cut off the Bellman-Ford iteration after finding all shortest paths of length up to k.
To find the path with lowest average cost is a bit trickier. At each point, we'll track two quantities - the length of the shortest path reaching a node t and the average length of that path. When considering new paths that can reach t, our options are to either keep the earlier path we found (whose cost is given by the shortest path so far divided by the number of nodes in it) or to extend some other path by one step. The new cost of that path is then given by the total cost from before plus the edge length, divided by the number of edges in the old path plus one. If we take the cheapest of these and then record both its cost and number of edges, at the end we will have computed the path with lowest average cost of length no greater than k in time O(kE).
As an initialization, we will say that the path from the start node to itself has length 0 and average cost 0 (the average cost doesn't matter, since whenever we multiply it by the number of edges we get 0). We will also say that every other node is at distance infinity, by saying that the average cost of an edge is infinity and that the number of edges is one. That way, if we ever try computing the cost of a path formed by extending the path, it will appear to have average cost infinity and won't be chosen.
Mathematically, the solution looks like this. At each point we store the average edge cost and the total number of edges at each node:
BF(s, t, 0).edges = 1
BF(s, t, 0).cost  = infinity
BF(s, s, 0).edges = 0
BF(s, s, 0).cost  = 0

BF(s, t, m + 1).cost = min {
    BF(s, t, m).cost,
    (BF(s, u, m).cost * BF(s, u, m).edges + d(u, t)) / (BF(s, u, m).edges + 1)
}

BF(s, t, m + 1).edges = {
    BF(s, t, m).edges      if you chose the first option above
    BF(s, u, m).edges + 1  otherwise, where u is as above
}
Note that this may not find a simple path of length k, since minimizing the average cost might require you to take a cycle with low (positive or negative) cost multiple times to drive down the average. For example, if a graph has a cost-zero loop, you should just keep taking it as many times as you can.
EDIT: In response to your new questions, this approach won't work if you don't want to duplicate nodes on a path. As @comestibles has pointed out, this version of the problem is NP-hard, so unless P = NP you shouldn't expect to find any good polynomial-time algorithm for this problem.
As for the runtime, the algorithm I've described above runs in total time O(kE). This is because each iteration of computing the recurrence takes O(E) time and there are a total of k iterations.
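A rough Python transcription of the recurrence above (my own function and variable names, not templatetypedef's code; it keeps the caveat that paths are not forced to be simple):

import math

def min_average_cost_path(n, edges, s, t, k):
    """Tracks, per node, the average cost and the edge count of the best path found so far.

    n: number of nodes; edges: list of (u, v, w) with w >= 0; s/t: endpoints;
    k: maximum number of edges. Returns the smallest average edge cost of an s->t path
    with at most k edges found by the recurrence (math.inf if t is unreachable).
    """
    INF = math.inf
    cost = [INF] * n          # average cost so far (infinity for unreached nodes)
    nedges = [1] * n          # edge count so far (1 for unreached nodes, per the initialization)
    cost[s], nedges[s] = 0.0, 0
    for _ in range(k):
        new_cost, new_edges = cost[:], nedges[:]      # "keep the earlier path" option
        for (u, v, w) in edges:
            if cost[u] == INF:
                continue
            cand = (cost[u] * nedges[u] + w) / (nedges[u] + 1)   # extend the path by one edge
            if cand < new_cost[v]:
                new_cost[v] = cand
                new_edges[v] = nedges[u] + 1
        cost, nedges = new_cost, new_edges
    return cost[t]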
Finally, let's look at your proposed code. I've reprinted it here:
for (i = 0; i < n - 1; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j) {
        if ((d[edges[j].u] + edges[j].w) / (steps + 1) < d[edges[j].v] / steps) {
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
    }
}
Your first question was how to take k into account. This can be done easily by rewriting the outer loop to count up to k, not n - 1. That gives us this code:
for (i = 0; i < k; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j) {
        if ((d[edges[j].u] + edges[j].w) / (steps + 1) < d[edges[j].v] / steps) {
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
    }
}
One problem that I'm noticing is that the modified Bellman-Ford algorithm needs to have each candidate best path store its number of edges independently, since each node's optimal path might be reached by a different number of edges. To fix this, I would suggest having the d array store two values - the number of edges required to reach the node and the average cost of a node along that path. You would then update your code by replacing the steps variable in these equations with the cached path lengths.
Hope this helps!
For the new version of your problem, there's a reduction from Hamilton path (making your problem intractable). Take an instance of Hamilton path (i.e., a graph whose edges are assumed to have unit weight), add source and sink vertices and edges of weight 2 from the source to all others and from the sink to all others. Set K = |V| + 2 and request a path from source to sink. There exists a Hamilton path if and only if the optimal mean edge length is (|V| + 3)/(|V| + 2).
Care to tell us why you want these paths so that we can advise you of a reasonable approximation strategy?
You can slightly modify the Bellman-Ford algorithm to find the minimum-cost path using at most K edges/nodes.
If the number of edges is fixed, then you have to minimize total cost, because the average cost would be TotalCost/NumberOfEdges.
One solution would be to iterate NumberOfEdges from 1 to K, find the minimal total cost for each, and choose the minimum TotalCost/NumberOfEdges.
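A short Python sketch of that suggestion (my own names; it uses an exact-edge-count, Bellman-Ford-style relaxation, and the walks it considers may repeat vertices):

import math

def min_average_over_edge_counts(n, edges, s, t, K):
    """For each edge count m = 1..K, compute the minimum total cost of an s->t walk
    with exactly m edges, then return the smallest TotalCost / NumberOfEdges.

    n: number of nodes; edges: list of (u, v, w); s/t: endpoints; K: max edge count.
    """
    INF = math.inf
    exact = [INF] * n          # exact[v] = min cost of a walk s -> v using exactly m edges
    exact[s] = 0
    best = INF
    for m in range(1, K + 1):
        nxt = [INF] * n
        for (u, v, w) in edges:
            if exact[u] + w < nxt[v]:
                nxt[v] = exact[u] + w
        exact = nxt
        if exact[t] < INF:
            best = min(best, exact[t] / m)
    return best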

Shortest path with a fixed number of edges

Find the shortest path through a graph in efficient time, with the additional constraint that the path must contain exactly n nodes.
We have a directed, weighted graph. It may, or may not contain a loop. We can easily find the shortest path using Dijkstra's algorithm, but Dijkstra's makes no guarantee about the number of edges.
The best we could come up with was to keep a list of the best n paths to a node, but this uses a huge amount of memory over vanilla Dijkstra's.
It is a simple dynamic programming algorithm.
Let us assume that we want to go from vertex x to vertex y.
Make a table D[.,.], where D[v,k] is the cost of the shortest path of length k from the starting vertex x to the vertex v.
Initially D[x,1] = 0. Set D[v,1] = infinity for all v != x.
For k = 2 to n:
    For each vertex v:
        D[v,k] = min over u of ( D[u,k-1] + wt(u,v) ), where we assume that wt(u,v) is infinite for missing edges.
        P[v,k] = the u that gave us the above minimum.
The length of the shortest path will then be stored in D[y,n].
If we have a graph with fewer edges (sparse graph), we can do this efficiently by only searching over the u that v is connected to. This can be done optimally with an array of adjacency lists.
To recover the shortest path:
Path = empty list
v = y
For k= n downto 1:
Path.append(v)
v = P[v,k]
Path.append(x)
Path.reverse()
The last node is y. The node before that is P[y,n]. We can keep following backwards, and we will eventually arrive at P[v,2] = x for some v.
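A runnable Python sketch of this DP, including the path recovery (my own names; vertices are 0..num_vertices-1 and edges is a list of (u, v, w) triples):

import math

def shortest_path_with_n_nodes(num_vertices, edges, x, y, n):
    """D[v][k] = min cost of a path from x to v that uses exactly k nodes (k-1 edges);
    P[v][k] remembers the predecessor for path recovery.
    Returns (cost, path) or (math.inf, None) if no such path exists.
    """
    INF = math.inf
    D = [[INF] * (n + 1) for _ in range(num_vertices)]
    P = [[None] * (n + 1) for _ in range(num_vertices)]
    D[x][1] = 0
    for k in range(2, n + 1):
        for (u, v, w) in edges:
            if D[u][k - 1] + w < D[v][k]:
                D[v][k] = D[u][k - 1] + w
                P[v][k] = u
    if D[y][n] == INF:
        return INF, None
    # walk the predecessors back from y
    path, v = [], y
    for k in range(n, 1, -1):
        path.append(v)
        v = P[v][k]
    path.append(x)
    path.reverse()
    return D[y][n], path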
The alternative that comes to my mind is a depth first search (as opposed to Dijkstra's breadth first search), modified as follows:
stop "depth"-ing if the required vertex count is exceeded
record the shortest found (thus far) path having the correct number of nodes.
Run time may be abysmal, but it should come up with the correct result while using a very reasonable amount of memory.
Interesting problem. Did you discuss using a heuristic graph search (such as A*), adding a penalty for going over or under the node count? This may or may not be admissible, but if it did work, it may be more efficient than keeping a list of all the potential paths.
In fact, you may be able to use backtracking to limit the amount of memory being used for the Dijkstra variation you discussed.
A rough idea of an algorithm:
Let A be the start node, and let S be a set of nodes (each with a path). The invariant is that at the end of step n, S will contain all nodes that are exactly n steps from A, and the paths will be the shortest paths of that length. When n is 0, that set is {A (empty path)}. Given such a set at step n - 1, you get to step n by starting with an empty set S1 and:

for each (node X, path P) in S:
    for each edge from X to some node Y:
        if Y is not in S1, add (Y, P + Y) to S1
        if (Y, P1) is already in S1, set its path to the shorter of P1 and P + Y
There are only n steps, and each step should take less than max(N, E), which makes the entire algorithm O(n^3) for a dense graph and O(n^2) for a sparse graph.
This algorithm was arrived at by looking at Dijkstra's, although it is a different algorithm.
Let's say we want the shortest distance from node x to y using exactly t steps (edges). A simple DP solution would be

A[t][x][y] = min over k of { A[1][x][k] + A[t-1][k][y] }

where the intermediate node k varies from 0 to n-1.
A[1][i][j] = r[i][j];  p[1][i][j] = j;          // base case (for all i, j): direct edge weights

for (t = 2; t <= n; t++)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
        {
            A[t][i][j] = BG;  p[t][i][j] = -1;  // BG is the "big" sentinel used as infinity
            for (k = 0; k < n; k++)
                if (A[1][i][k] < BG && A[t-1][k][j] < BG)
                    if (A[1][i][k] + A[t-1][k][j] < A[t][i][j])
                    {
                        A[t][i][j] = A[1][i][k] + A[t-1][k][j];
                        p[t][i][j] = k;
                    }
        }
To trace back the path:
void output(int a, int b, int t)
{
    while (t)
    {
        cout << a << " ";
        a = p[t][a][b];
        t--;
    }
    cout << b << endl;
}
