Shortest path between two vertices in a complete graph [duplicate] - algorithm

This question already has answers here:
Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1
I have a complete graph with N vertices and I need to find the shortest path from a given source to a given destination. All edges have an initial cost A; then, for K edges, the cost changes to B. What is the best way to find the minimal cost (i.e., the length of the shortest path) between vertex 1 and vertex N? The input is N K A B, followed by the K edges (the edges with cost B).
where:
2 <= N <= 500000
0 <= K <= 500000
1 <= A, B <= 500000
I've tried Dijkstra, but it takes too much time (~2 min), and I need something like 2 s.

If the edge between 1 and N has cost A:
1) if A <= B, then the lowest cost is A.
2) if A > B, use BFS to find the fewest hops from 1 to N using only the edges with cost B. If the BFS reaches N in L hops, return min(L*B, A); otherwise return A. It is a standard BFS and costs O(N+K).
If the edge between 1 and N has cost B:
1) if A >= B, then the answer is B.
2) if A < B, find the fewest hops from 1 to N using only the edges with cost A. Let S[h] be the set of vertices first reached after h hops and S' the set of vertices not reached yet; then it can be solved as follows.
min_dis() {
    S[0] = {1};
    int h = 0;
    S' = {2, ..., N};                     // vertices not reached yet
    while (S[h] is not empty) {
        S[h+1] = {};
        for_each (v1 in a copy of S') {   // S' shrinks inside the loop
            for (v2 in S[h]) {
                if (cost[v1][v2] == A) {  // v1 is one cheap hop away from S[h]
                    S[h+1].insert(v1);
                    S'.remove(v1);
                    if (v1 == N) return min((h+1)*A, B);
                    break;
                }
            }
        }
        h++;
    }
    return B;
}
We can prove that this algorithm is also O(N+K): each time the test cost[v1][v2] == A is true, the size of S' decreases by 1, and the test can be false at most K times in total, because each pair (v1, v2) is tested at most once and there are at most K edges with cost B. (This assumes cost[v1][v2] is an O(1) lookup, e.g. a hash set of the cost-B edges.) So it is guaranteed to finish in O(N+K).
In total, the algorithm is O(N+K), which should comfortably fit the 2-second time limit.
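For concreteness, here is a minimal C++ sketch of the whole case analysis, assuming vertices numbered 1..N and the K cost-B edges given as pairs. The name `minCost` and its parameters are hypothetical, not part of the problem statement. Instead of the nested loop over S' and S[h], it keeps the not-yet-reached vertices in a list and, for each dequeued vertex, marks its cost-B neighbours; a vertex left behind is always charged to a cost-B edge, so this is the same counting argument and should stay within O(N + K).

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: vertices are 1..n, bEdges lists the K undirected edges
// of cost B; every other pair of vertices is joined by an edge of cost A.
long long minCost(int n, long long A, long long B,
                  const std::vector<std::pair<int, int>>& bEdges) {
    std::vector<std::vector<int>> adjB(n + 1);
    for (const auto& e : bEdges) {
        adjB[e.first].push_back(e.second);
        adjB[e.second].push_back(e.first);
    }
    bool directIsB = std::find(adjB[1].begin(), adjB[1].end(), n) != adjB[1].end();
    long long direct = directIsB ? B : A;      // cost of the edge 1-N
    long long cheap = std::min(A, B);
    if (direct == cheap) return direct;        // direct edge is already the cheapest possible

    std::vector<int> dist(n + 1, -1);          // hop count from vertex 1, -1 = not reached
    dist[1] = 0;
    std::queue<int> q;
    q.push(1);
    if (!directIsB) {
        // A > B: ordinary BFS that walks only the cost-B edges.
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adjB[u])
                if (dist[v] < 0) { dist[v] = dist[u] + 1; q.push(v); }
        }
    } else {
        // A < B: BFS over the complement of the cost-B edges (the cost-A edges).
        std::vector<int> unvisited;
        for (int v = 2; v <= n; ++v) unvisited.push_back(v);
        std::vector<char> isBNeighbor(n + 1, 0);
        while (!q.empty() && !unvisited.empty()) {
            int u = q.front(); q.pop();
            for (int v : adjB[u]) isBNeighbor[v] = 1;
            std::vector<int> keep;             // still unreached after processing u
            for (int v : unvisited) {
                if (isBNeighbor[v]) keep.push_back(v);          // u-v costs B
                else { dist[v] = dist[u] + 1; q.push(v); }      // u-v costs A
            }
            for (int v : adjB[u]) isBNeighbor[v] = 0;
            unvisited.swap(keep);
        }
    }
    return dist[n] < 0 ? direct : std::min(direct, dist[n] * cheap);
}
```

For example, `minCost(3, 1, 10, {{1, 3}})` returns 2: the direct edge costs 10, but 1 -> 2 -> 3 uses two cost-A edges.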

Related

Negative Cycles in Bellman-Ford

In a directed graph with V nodes and E edges, the Bellman-Ford algorithm relaxes every vertex (or rather, the edges going out of every vertex) (V - 1) times. This is because the shortest path from the source to any other node contains at most (V - 1) edges. In the V-th iteration, if an edge can be relaxed, it indicates the presence of a negative cycle.
Now, I need to find the other nodes "ruined" by this negative cycle. That is, some nodes not on the negative cycle now have a distance of negative infinity from the source because of one or more nodes on the path from the source to the node that lie in a negative cycle.
One way to accomplish this is to run Bellman-Ford and take note of the nodes on negative cycles. Then, run DFS/BFS from these nodes to mark other nodes.
However, why can't we run the Bellman-Ford 2 * (V - 1) times to detect such nodes without resorting to DFS/BFS? If my understanding is right, relaxing all vertices 2 * (V - 1) times should allow the negative cycles to "propagate" their values to all other connected nodes.
Additional Details: I encountered this situation when solving this online problem: https://open.kattis.com/problems/shortestpath3
The Java code that I used (along with BFS/DFS that is not shown here) is as follows:
// Relax all vertices n - 1 times,
// and relax one more time (vv == n) to find negative cycles.
for (int vv = 1; vv <= n; vv++) {
    // Relax each vertex...
    for (int v = 0; v < n; v++) {
        // ...that has been reached so far, over each of its edges
        if (distTo[v] != (int) 1e9) {
            for (int i = 0; i < adjList[v].size(); i++) {
                int dest = adjList[v].get(i).fst;
                int wt = adjList[v].get(i).snd;
                if (distTo[v] + wt < distTo[dest]) {
                    distTo[dest] = distTo[v] + wt;
                    if (vv == n) {
                        isInfinite[v] = true;
                        isInfinite[dest] = true;
                    }
                }
            }
        }
    }
}
Consider a graph with N=4, M=5:
A -> B weight 1000
A -> C weight 1000
C -> D weight -1
D -> C weight -1
D -> B weight 1000
Let A be our source and B be the destination.
Now obviously there is a negative cycle (C <-> D). But whether we run the algorithm N times, 2N times, or even 3N times, the shortest path from A to B is still 1000. Since the negative cycle only reduces the distance by a small amount each time it is used, it does not propagate to the other nodes as we expect it to.
A solution would be to mark the distance as negative infinity once a cycle affecting a node is identified. That way the negative cycle "takes precedence" over other shortest paths through other nodes.
Yours sincerely,
A fellow coder who has spent lots of time on this problem.
In the classical setting, every node "on" a negative-length cycle reachable from the source has an arbitrarily small distance to the source.
So in each iteration after the (V-1)-th, the path from the source to such nodes can keep getting shorter.
The task requires you to return -infinity for all such nodes.
You could use a modified version of the Bellman-Ford algorithm that marks the distance of all such nodes as -infinity, and then run it another V-1 times to propagate the -infinity to all other nodes reachable from the cycle. But this takes a lot of extra time compared with just running DFS or BFS from the nodes on the cycle.
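A minimal C++ sketch of the Bellman-Ford plus BFS approach, assuming 0-based vertices and a plain edge list; `Edge`, `shortestWithNegInf`, `kInf` and `kNegInf` are hypothetical names. After n - 1 passes, the head of any edge that can still be relaxed has a true distance of -infinity, so those heads are used as BFS seeds and everything reachable from them is marked.

```cpp
#include <limits>
#include <queue>
#include <vector>

struct Edge { int from, to; long long w; };

const long long kInf = std::numeric_limits<long long>::max();     // unreachable
const long long kNegInf = std::numeric_limits<long long>::min();  // affected by a negative cycle

// Bellman-Ford from src, then a BFS that marks everything reachable from a
// vertex whose distance can still be improved after n - 1 passes.
std::vector<long long> shortestWithNegInf(int n, const std::vector<Edge>& edges, int src) {
    std::vector<long long> dist(n, kInf);
    dist[src] = 0;
    for (int pass = 0; pass < n - 1; ++pass)
        for (const Edge& e : edges)
            if (dist[e.from] != kInf && dist[e.from] + e.w < dist[e.to])
                dist[e.to] = dist[e.from] + e.w;

    std::vector<std::vector<int>> adj(n);
    for (const Edge& e : edges) adj[e.from].push_back(e.to);

    // Seeds: heads of edges that are still relaxable in the n-th pass.
    std::vector<char> negInf(n, 0);
    std::queue<int> q;
    for (const Edge& e : edges)
        if (dist[e.from] != kInf && dist[e.from] + e.w < dist[e.to] && !negInf[e.to]) {
            negInf[e.to] = 1;
            q.push(e.to);
        }
    // Propagate -infinity to every vertex reachable from a seed.
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (!negInf[v]) { negInf[v] = 1; q.push(v); }
    }
    for (int v = 0; v < n; ++v)
        if (negInf[v]) dist[v] = kNegInf;
    return dist;
}
```

On the 4-node example above (A=0, B=1, C=2, D=3), B, C and D come out as `kNegInf` while A stays at 0, which is exactly the case the repeated-relaxation idea misses.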

Longest path in ordered graph

Let G = (V, E) be a directed graph with nodes v_1, v_2,..., v_n. We say that G is an ordered graph if it has the following properties.
Each edge goes from a node with lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.
Each node except v_n has at least one edge leaving it. That is, for every node v_i with i < n, there is at least one edge of the form (v_i, v_j).
Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n.
My attempt:
Dynamic programming: Opt(i) = max {Opt(j)} + 1, over all j such that j is reachable from i.
Is there perhaps a better way to do this? I think even with memoization my algorithm will still be exponential. (this is just from an old midterm review I found online)
Your approach is right, you will have to do
Opt(i) = max {Opt(j)} + 1, for all j such that j is reachable from i
However, this is exponential only if you run it without memoization. With memoization, you will have the memoized optimal value for every node j, j > i, when you are on node i.
For the worst case complexity, let us assume that every two nodes that could be connected are connected. This means, v_1 is connected with (v_2, v_3, ... v_n); v_i is connected with (v_(i+1), v_(i+2), ... v_n).
Number of Vertices (V) = n
Hence, number of edges (E) = n*(n-1)/2 = O(V^2)
Let us focus our attention on a vertex v_k. For this vertex, we have to go through the already derived optimal values of (n-k) nodes.
Number of ways of reaching v_k directly = (k-1)
Hence the worst-case time complexity is sum((k-1)*(n-k)) for k = 1 to n, which is a sum of a degree-2 polynomial in k, and hence results in O(n^3) time complexity.
Simplistically, the worst case time complexity is O(n^3) == O(V^3) == O(E) * O(V) == O(EV).
Thanks to the first property, this problem can be solved in O(V^2), or even better in O(E), where V is the number of vertices and E is the number of edges. It uses a dynamic programming approach quite similar to the one you gave. Let opt[i] be the length of the longest path from v_1 to v_i. Then
opt[i] = max(opt[j]) + 1, over all j < i such that there is an edge (v_j, v_i);
using this equation, it can be solved in O(V^2).
Even better, we can evaluate it in another order: process the nodes by increasing index and push each opt[v] forward along v's outgoing edges.
int LongestPath() {
    for (int v = 1; v <= V; ++v) opt[v] = -1;   // -1 means "not reachable from v_1"
    opt[1] = 0;
    for (int v = 1; v <= V; ++v) {
        if (opt[v] >= 0) {
            /* Each edge is visited at most once,
               so the total running time is O(V + E).
            */
            for_each (v' that can be reached from v by one edge)
                opt[v'] = max(opt[v] + 1, opt[v']);
        }
    }
    return opt[V];
}
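A runnable C++ version of the forward DP, assuming vertices 1..n and adjacency lists where `adj[i]` holds the successors j > i (index 0 unused); `longestPath` is a hypothetical name. Because every edge goes from a lower to a higher index, opt[v] is already final when the loop reaches v.

```cpp
#include <algorithm>
#include <vector>

// Longest path (in edges) from v_1 to v_n in an ordered graph.
// adj has size n + 1; adj[i] lists the j > i with an edge (v_i, v_j).
// Returns -1 if v_n is not reachable from v_1.
int longestPath(int n, const std::vector<std::vector<int>>& adj) {
    std::vector<int> opt(n + 1, -1);   // -1 = not reachable from v_1
    opt[1] = 0;
    for (int v = 1; v <= n; ++v) {
        if (opt[v] < 0) continue;      // skip nodes v_1 cannot reach
        for (int w : adj[v])           // each edge is relaxed exactly once
            opt[w] = std::max(opt[w], opt[v] + 1);
    }
    return opt[n];
}
```

For example, with edges 1->2, 1->4, 2->3, 3->4 the function returns 3 (the path 1->2->3->4).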

Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1

You are given a complete undirected graph with N vertices. All but K edges have a cost of A. Those K edges have a cost of B and you know them (as a list of pairs). What's the minimum cost from node 0 to node N - 1.
2 <= N <= 500k
0 <= K <= 500k
1 <= A, B <= 500k
The problem is, obviously, when those K edges cost more than the other ones and node 0 and node N - 1 are connected by a K-edge.
Dijkstra doesn't work. I've even tried something very similar with a BFS.
Step 1: Let G(0) be the set of "good" nodes adjacent to node 0.
Step 2: For each node in G(0):
    compute G(node)
    if G(node) contains N - 1
        return step
    else
        add node to some queue
    repeat step 2 and increment step
The problem is that this uses up a lot of time due to the fact that for every node you have to make a loop from 0 to N - 1 in order to find the "good" adjacent nodes.
Does anyone have any better ideas? Thank you.
Edit: Here is a link to the ACM contest problem: http://acm.ro/prob/probleme/B.pdf
This is laborious case work:
A < B and 0 and N-1 are joined by A -> trivial.
B < A and 0 and N-1 are joined by B -> trivial.
B < A and 0 and N-1 are joined by A ->
Do BFS on graph with only K edges.
A < B and 0 and N-1 are joined by B ->
You can check in O(N) time whether there is a path of length 2*A (try every vertex in the middle).
To check other path lengths, the following algorithm should do the trick:
Let X(d) be the set of nodes reachable from 0 using d short (cost A) edges and no fewer. You can find X(d) as follows: take each vertex v with unknown distance and iteratively check the edges between v and the vertices of X(d-1). If you find a short edge, then v is in X(d); otherwise you stepped on a long edge. Since there are at most K long edges, you can step on them at most K times, so you should find the distance of every vertex in O(N + K) time overall.
I propose a solution to a somewhat more general problem where you might have more than two types of edges and the edge weights are not bounded. For your scenario the idea is probably a bit overkill, but the implementation is quite simple, so it might be a good way to go about the problem.
You can use a segment tree to make Dijkstra more efficient. You will need the operations
set an upper bound on a range: given U, L, R, set x[i] = min(x[i], U) for all i with L <= i <= R
find the global minimum
The upper bounds can be pushed down the tree lazily, so both can be implemented in O(log n).
When relaxing the outgoing edges of a vertex, look up its edges with cost B, sort them, and update the ranges in between (the cost-A edges) all at once.
The runtime should be O(n log n + m log m) if you sort all the edges upfront (by outgoing vertex).
EDIT: Got accepted with this approach. The good thing about it is that it avoids any kind of special casing. It's still ~80 lines of code.
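Below is a minimal C++ sketch of this segment-tree Dijkstra, written from the description above rather than from the answerer's accepted code. It assumes 0-based vertices and the K cost-B edges as pairs; `ChminTree` and `completeGraphDist` are hypothetical names. Finalized vertices are "killed" so that later range updates skip them, which keeps each pop at O((deg_B(u) + 1) log n).

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

const long long kInf = std::numeric_limits<long long>::max();

// Hypothetical segment tree over positions 0..n-1 supporting
//   chmin(l, r, u): x[i] = min(x[i], u) for every alive i in [l, r]
//   argmin():       {index, value} of the smallest alive x[i], or {-1, kInf}
//   kill(i):        retire position i (its distance is final)
class ChminTree {
    int n;
    std::vector<long long> mn, lazy;   // min over alive leaves, pending chmin bound
    std::vector<char> alive;           // does the subtree still contain an alive leaf?

    void apply(int node, long long u) {
        if (!alive[node]) return;
        mn[node] = std::min(mn[node], u);
        lazy[node] = std::min(lazy[node], u);
    }
    void push(int node) {
        if (lazy[node] == kInf) return;
        apply(2 * node, lazy[node]);
        apply(2 * node + 1, lazy[node]);
        lazy[node] = kInf;
    }
    void pull(int node) {
        mn[node] = std::min(mn[2 * node], mn[2 * node + 1]);
        alive[node] = alive[2 * node] || alive[2 * node + 1];
    }
    void chmin(int node, int lo, int hi, int l, int r, long long u) {
        if (r < lo || hi < l) return;
        if (l <= lo && hi <= r) { apply(node, u); return; }
        push(node);
        int mid = (lo + hi) / 2;
        chmin(2 * node, lo, mid, l, r, u);
        chmin(2 * node + 1, mid + 1, hi, l, r, u);
        pull(node);
    }
    void kill(int node, int lo, int hi, int i) {
        if (lo == hi) { alive[node] = 0; mn[node] = kInf; return; }
        push(node);
        int mid = (lo + hi) / 2;
        if (i <= mid) kill(2 * node, lo, mid, i);
        else kill(2 * node + 1, mid + 1, hi, i);
        pull(node);
    }

public:
    explicit ChminTree(int size)
        : n(size), mn(4 * size, kInf), lazy(4 * size, kInf), alive(4 * size, 1) {}
    void chmin(int l, int r, long long u) { if (l <= r) chmin(1, 0, n - 1, l, r, u); }
    void kill(int i) { kill(1, 0, n - 1, i); }
    std::pair<int, long long> argmin() {
        if (mn[1] == kInf) return {-1, kInf};
        int node = 1, lo = 0, hi = n - 1;
        while (lo < hi) {              // descend toward a leaf holding the minimum
            push(node);
            int mid = (lo + hi) / 2;
            if (mn[2 * node] == mn[node]) { node = 2 * node; hi = mid; }
            else { node = 2 * node + 1; lo = mid + 1; }
        }
        return {lo, mn[node]};
    }
};

// Hypothetical driver: complete graph on vertices 0..n-1, every edge costs A
// except the listed ones, which cost B. Returns the distance from s to t.
long long completeGraphDist(int n, long long A, long long B,
                            const std::vector<std::pair<int, int>>& bEdges, int s, int t) {
    std::vector<std::vector<int>> adjB(n);
    for (const auto& e : bEdges) {
        adjB[e.first].push_back(e.second);
        adjB[e.second].push_back(e.first);
    }
    for (auto& nb : adjB) std::sort(nb.begin(), nb.end());

    ChminTree tree(n);
    tree.chmin(s, s, 0);
    while (true) {
        std::pair<int, long long> best = tree.argmin();
        int u = best.first;
        long long d = best.second;
        if (u < 0) return -1;          // t unreachable (cannot happen in a complete graph)
        if (u == t) return d;
        tree.kill(u);                  // d is final for u
        int from = 0;                  // start of the current gap of cost-A neighbours
        for (int v : adjB[u]) {
            tree.chmin(from, v - 1, d + A);   // gap before v: cost-A edges
            tree.chmin(v, v, d + B);          // v itself: cost-B edge
            from = v + 1;
        }
        tree.chmin(from, n - 1, d + A);
    }
}
```

Ranges that include already-finalized vertices (including u itself) are harmless, because `apply` ignores dead subtrees.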
In the case when A < B, I would go with kind of a BFS, where you would check where you can't reach instead of where you can. Here's the pseudocode:
G(k) is the set of nodes reachable by k cheap edges and no fewer. We start with G(0) = {v0}.
while G(k) isn't empty and G(k) doesn't contain vN-1 and k*A < B
    count = array[N] of zeroes
    for every node n in G(k)
        for every expensive edge (n, m)
            count[m]++
    # now count[m] == |G(k)| iff m can't be reached by a cheap edge from any node of G(k)
    set G(k+1) to {m : count[m] < |G(k)|} minus {n : n is in G(0), ..., G(k)}
    k++
This way you avoid iterating through the (many) cheap edges and only iterate through the relatively few expensive edges.
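A C++ sketch of this counting BFS, assuming 0-based vertices and that the K expensive pairs are distinct with no self-loops (duplicates would inflate the counts); `cheapEdgeBfs` is a hypothetical name. Two small deviations from the pseudocode, labelled here: instead of zeroing a fresh array[N] per level it resets only the entries it touched, and it scans only the not-yet-reached vertices, so each vertex left behind is paid for by expensive edges into the current level.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper for the case A < B where the edge 0 - (n-1) costs B.
// Vertices are 0..n-1; `expensive` lists the K cost-B edges.
// Returns the cheapest cost from 0 to n-1.
long long cheapEdgeBfs(int n, long long A, long long B,
                       const std::vector<std::pair<int, int>>& expensive) {
    std::vector<std::vector<int>> adjB(n);
    for (const auto& e : expensive) {
        adjB[e.first].push_back(e.second);
        adjB[e.second].push_back(e.first);
    }
    std::vector<int> count(n, 0);       // count[m] = frontier vertices with an expensive edge to m
    std::vector<int> frontier = {0};    // G(k)
    std::vector<int> unvisited;         // vertices not in G(0), ..., G(k)
    for (int v = 1; v < n; ++v) unvisited.push_back(v);
    long long k = 0;
    while (!frontier.empty() && k * A < B) {
        for (int u : frontier)
            for (int m : adjB[u]) count[m]++;
        std::vector<int> next, rest;
        for (int m : unvisited) {
            // m has a cheap edge into G(k) unless every vertex of G(k) has an expensive edge to m
            if (count[m] < static_cast<int>(frontier.size())) next.push_back(m);
            else rest.push_back(m);
        }
        for (int u : frontier)          // reset only the entries we touched
            for (int m : adjB[u]) count[m] = 0;
        ++k;
        if (std::find(next.begin(), next.end(), n - 1) != next.end())
            return std::min(k * A, B);
        frontier.swap(next);
        unvisited.swap(rest);
    }
    return B;                           // fall back to the direct expensive edge
}
```

For example, `cheapEdgeBfs(4, 3, 5, {{0, 3}})` returns 5: the cheap route 0 -> 1 -> 3 would cost 6, so the direct edge wins.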
As you have correctly noted, the problem comes when A > B and the edge from 0 to N-1 has a cost of A.
In this case you can simply delete all edges in the graph that have a cost of A: any route that uses a cost-A edge costs at least A, so it is no better than the direct edge, and any cheaper route must use only edges with cost B.
Then you can perform a simple BFS since the costs of all edges are the same. It will give you optimal performance as pointed out by this link: Finding shortest path for equal weighted graph
Moreover, you can stop your BFS when the total cost exceeds A.

Floyd's Algorithm Explanation

Concerning
floyds(int a[][100],int n).
What does 'a' represent, and what does each of the two dimensions of a represent?
What does 'n' represent?
I have a list of locations and a list of connections between those locations, and I have computed the distance for each pair of connected locations. Now I need to find the shortest path between any two given locations (Floyd's), but I need to understand how to apply floyds(int a[][100],int n) to my locations array, city dictionaries, and connection arrays.
FYI - Using objective C - iOS.
n is the number of nodes in the graph.
a is the distance matrix of the graph: a[i][j] is the cost (or distance) of the edge from node i to node j.
(Also read the definition of adjacency matrix if you need more help with the concept.)
/* Assume a function edgeCost(i,j) which returns the cost of the edge from i to j
   (infinity if there is none).
   Also assume that n is the number of vertices and edgeCost(i,i) = 0
*/

int path[][];
/* A 2-dimensional matrix. At each step in the algorithm, path[i][j] is the shortest path
   from i to j using intermediate vertices (1..k−1). Each path[i][j] is initialized to
   edgeCost(i,j).
*/

procedure FloydWarshall ()
   for k := 1 to n
      for i := 1 to n
         for j := 1 to n
            path[i][j] = min ( path[i][j], path[i][k]+path[k][j] );
http://en.wikipedia.org/wiki/Floyd-Warshall
The Wikipedia article is very good.
floyd-warshall(W)   // W is the adjacency-matrix representation of the graph
    n = W.rows
    for k = 1 to n
        for i = 1 to n
            for j = 1 to n
                W[i][j] = min(W[i][j], W[i][k] + W[k][j])
    return W
It's a dynamic-programming algorithm. After the k-th iteration, W[i][j] is the length of the shortest path between i and j whose intermediate vertices (excluding i and j) all come from the set {1, 2, ..., k}. In min(W[i][j], W[i][k] + W[k][j]), W[i][j] is the shortest path between i and j computed in the (k-1)-th iteration; since its intermediate vertices come from {1, 2, ..., k-1}, this path does not include vertex k. In W[i][k] + W[k][j], we include vertex k in the path. Whichever of the two is smaller is the shortest path at the k-th iteration.
Basically we check that whether we should include vertex k in the path or not.
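To connect this to the `floyds(int a[][100], int n)` signature in the question, here is a C/C++ sketch of what such a function typically does. It assumes each location has already been mapped to an index 0..n-1 (for example through the city dictionary) and that missing connections are marked with a large sentinel; `kNoEdge` is a hypothetical choice, and the body is the standard 0-based Floyd-Warshall triple loop, not necessarily what the question's original function contains.

```cpp
#include <algorithm>

const int kNoEdge = 1000000000;  // hypothetical "infinity" marker for missing connections

// a[i][j]: distance of the direct connection i -> j (kNoEdge if none, 0 on the diagonal).
// n: number of locations actually used (n <= 100, the fixed second dimension).
// After the call, a[i][j] holds the shortest distance from location i to location j.
void floyds(int a[][100], int n) {
    for (int k = 0; k < n; ++k)              // allow location k as an intermediate stop
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (a[i][k] < kNoEdge && a[k][j] < kNoEdge)   // avoid adding two "infinities"
                    a[i][j] = std::min(a[i][j], a[i][k] + a[k][j]);
}
```

With three locations where 0-1 is 4 and 1-2 is 5 (and no direct 0-2 connection), a[0][2] becomes 9 after the call. Reconstructing the actual route would need an extra "next hop" matrix, which this sketch leaves out.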

Route problem in a graph: minimize average edge cost instead of total cost

I have a weighted graph with no negative weights, and I would like to find the path from one node to another that minimizes the cost per step. I don't need to minimize the total cost of the trip (as Dijkstra does, for example) but the average step cost. However, I have a constraint: K, the maximum number of nodes in the path.
So, for example, to go from A to J, Dijkstra might find this path (edge weights in parentheses):
A (4) D (6) J -> total cost: 10
and the algorithm I need, setting K = 10, would find something like
A (1) B (2) C (2) D (1) E (3) F (2) G (1) H (3) J -> total cost: 15
Is there any well known algorithm for this problem?
Thanks in advance.
Eugenio
Edit in reply to templatetypedef:
Some questions:
1) Taking a cycle multiple times to drive down the average is not acceptable for my problem: maybe I should have mentioned it, but I don't want to visit the same node more than once.
2) Is it possible to exploit the fact that I don't have negative weights?
3) When you said O(kE), did you mean for the whole algorithm or just for the additional part?
Let's take this simple implementation in C, where n is the number of nodes, e the number of edges, d an array of distances, p an array of predecessors, and edges an array of structures (u, v, w) storing the edges of the graph:
for (i = 0; i < n; ++i)
    d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i) {
    for (j = 0; j < e; ++j) {
        if (d[edges[j].u] + edges[j].w < d[edges[j].v]) {
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
        }
    }
}
I'm not sure how I should modify the code according to your answer; to take into consideration the average instead of the total cost should this be enough?
for (i = 0; i < n; ++i)
    d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j) {
        if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
    }
}
But anyway, I don't know how to take the K limit into account at the same time... Thanks again in advance for your help.
Edit
Since I can afford some errors, I'm thinking about this naive solution:
precompute all the shortest paths and store them in A
precompute all the shortest paths on a modified graph, where I cut the edges above a certain weight, and store them in B
When I need a path, I look it up in A; e.g., from x to y the path is
x->z->y
then for each step I look in B:
for x->z, I check whether there is a connection in B; if not, I keep x->z, otherwise I replace x->z with the subpath provided by B, which could be something like x->j->h->z; then I do the same for z->y.
Each time, I will also check whether I'm adding a cycle.
Maybe I will get some weird paths, but it could work in most cases.
If I extend the solution by trying different "cut thresholds", maybe I can also come close to respecting the K constraint.
I believe that you can solve this using a modified version of the Bellman-Ford algorithm.
Bellman-Ford is based on the following dynamic programming recurrence that tries to find the shortest path from some start node s to each other node that's of length no greater than m for some m. As a base case, when you consider paths of length zero, the only reachable node is s and the initial values are
BF(s, t, 0) = infinity   (for t ≠ s)
BF(s, s, 0) = 0
Then, if we know the values for a path of length m, we can find it for paths of length m + 1 by noting that the old path may still be valid, or we want to extend some path by length one:
BF(s, t, m + 1) = min {
BF(s, t, m),
BF(s, u, m) + d(u, t) for any node u connected to t
}
The algorithm as a whole works by noting that any shortest path must have length no greater than n and then using the above recurrence and dynamic programming to compute the value of BF(s, t, n) for all t. Its overall runtime is O(EV), since there are E edges to consider at each step and V total vertices.
Let's see how we can change this algorithm to solve your problem. First, to limit it to paths of length k, we can simply stop the Bellman-Ford iteration after finding all shortest paths of length up to k. Finding the path with the lowest average cost is a bit trickier. At each node t, we track two quantities: the number of edges of the best path found so far to t and the average edge cost of that path. When considering new paths that reach t, our options are either to keep the path we found earlier (whose average cost is its total cost divided by its number of edges) or to extend some other path by one edge. The average cost of such an extended path is (old total cost + edge length) / (old number of edges + 1). If we take the cheapest of these options and record both its average cost and its number of edges, at the end we will have computed the path of length no greater than k with the lowest average cost, in time O(kE). For initialization, we say that the path from the start node to itself has length 0 and average cost 0 (the average cost doesn't matter, since whenever we multiply it by the number of edges we get 0). We also say that every other node is at distance infinity, by saying that the average cost of an edge is infinity and that the number of edges is one. That way, if we ever try to compute the cost of a path formed by extending such a path, it will appear to have average cost infinity and won't be chosen.
Mathematically, the solution looks like this. At each point we store the average edge cost and the total number of edges at each node:
BF(s, t, 0).edges = 1          (for t ≠ s)
BF(s, t, 0).cost = infinity    (for t ≠ s)
BF(s, s, 0).edges = 0
BF(s, s, 0).cost = 0
BF(s, t, m + 1).cost = min {
BF(s, t, m).cost,
(BF(s, u, m).cost * BF(s, u, m).edges + d(u, t)) / (BF(s, u, m).edges + 1)
}
BF(s, t, m + 1).edges = {
BF(s, t, m).edges if you chose the first option above.
BF(s, u, m).edges + 1 else, where u is as above
}
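To make the recurrence concrete, here is a C++ sketch that runs it for k rounds; `WEdge`, `Best` and `lowestAverage` are hypothetical names, and directed edges are assumed. It implements the recurrence exactly as written above (one kept (average, edge count) pair per node, layered so that round m + 1 reads only round m), so it inherits the answer's assumptions: the walks it finds may reuse vertices, and keeping a single pair per node is the answer's design choice, not something this sketch proves optimal.

```cpp
#include <limits>
#include <vector>

struct WEdge { int u, v; double w; };   // directed edge u -> v with weight w
struct Best { double avg; int edges; };  // average edge cost and edge count of the kept path

// Hypothetical implementation of the BF(s, t, m) recurrence, run for k rounds.
std::vector<Best> lowestAverage(int n, const std::vector<WEdge>& edges, int s, int k) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<Best> cur(n, Best{inf, 1});  // BF(s, t, 0): cost infinity, 1 edge
    cur[s] = Best{0.0, 0};                   // BF(s, s, 0): cost 0, 0 edges
    for (int round = 0; round < k; ++round) {
        std::vector<Best> next = cur;        // first option: keep BF(s, t, m)
        for (const WEdge& e : edges) {
            if (cur[e.u].avg == inf) continue;   // extending an unreached node stays infinite
            double total = cur[e.u].avg * cur[e.u].edges + e.w;
            double avg = total / (cur[e.u].edges + 1);
            if (avg < next[e.v].avg) next[e.v] = Best{avg, cur[e.u].edges + 1};
        }
        cur = next;
    }
    return cur;
}
```

For edges 0->1 (4), 1->2 (6), 0->2 (20), one round gives node 2 an average of 20 over 1 edge, and two rounds replace it with 5 over 2 edges.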
Note that this may not find a simple path of length k, since minimizing the average cost might require you to take a cycle with low (positive or negative) cost multiple times to drive down the average. For example, if a graph has a cost-zero loop, you should just keep taking it as many times as you can.
EDIT: In response to your new questions, this approach won't work if you don't want to duplicate nodes on a path. As #comestibles has pointed out, this version of the problem is NP-hard, so unless P = NP you shouldn't expect to find any good polynomial-time algorithm for this problem.
As for the runtime, the algorithm I've described above runs in total time O(kE). This is because each iteration of computing the recurrence takes O(E) time and there are a total of k iterations.
Finally, let's look at your proposed code. I've reprinted it here:
for (i = 0; i < n - 1; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j) {
        if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
    }
}
Your first question was how to take k into account. This can be done easily by rewriting the outer loop to count up to k, not n - 1. That gives us this code:
for (i = 0; i < k; ++i) {
    steps = 0;
    for (j = 0; j < e; ++j) {
        if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
            d[edges[j].v] = d[edges[j].u] + edges[j].w;
            p[edges[j].v] = edges[j].u;
            steps++;
        }
    }
}
One problem I notice is that the modified Bellman-Ford algorithm needs each candidate best path to store its number of edges independently, since the optimal path to each node might use a different number of edges. To fix this, I would suggest having the d array store two values: the number of edges needed to reach the node and the average edge cost along that path. You would then update your code by replacing the steps variable in these expressions with the cached path lengths.
Hope this helps!
For the new version of your problem, there's a reduction from Hamilton path (making your problem intractable). Take an instance of Hamilton path (i.e., a graph whose edges are assumed to have unit weight), add source and sink vertices, and add edges of weight 2 from the source to all original vertices and from the sink to all original vertices. Set K = |V| + 2 and request a path from source to sink. A Hamilton path exists if and only if the optimal mean edge length is (|V| + 3)/(|V| + 1): such a path uses two weight-2 edges plus |V| - 1 unit edges, i.e. a total cost of |V| + 3 over |V| + 1 edges.
Care to tell us why you want these paths so that we can advise you of a reasonable approximation strategy?
You can slightly modify the Bellman-Ford algorithm to find the minimum path using at most K edges/nodes.
If the number of edges is fixed, then you have to minimize the total cost, because the average cost would be TotalCost/NumberOfEdges.
One of the solutions would be to iterate NumberOfEdges from 1 to K, find minimal total cost and choose minimum TotalCost/NumberOfEdges.
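A C++ sketch of this last suggestion, assuming directed edges (add both directions for an undirected graph) and treating the limit as a maximum number of edges (pass K - 1 if K counts nodes); `Arc` and `minMeanWalk` are hypothetical names. It computes, layer by layer, the minimum total cost of reaching each node with exactly e edges, then takes the best TotalCost / e. Because layers allow revisiting vertices, it finds walks rather than simple paths; the simple-path version is the NP-hard one discussed above.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Arc { int u, v; double w; };   // directed edge u -> v with weight w

// dp for exactly e edges: cur[v] = minimum total cost of a walk s -> v with e edges.
// Returns the smallest cur[t] / e over 1 <= e <= maxEdges (infinity if t is never reached).
double minMeanWalk(int n, const std::vector<Arc>& arcs, int s, int t, int maxEdges) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> prev(n, inf), cur(n, inf);
    prev[s] = 0.0;                    // zero edges: only s is reachable
    double best = inf;
    for (int e = 1; e <= maxEdges; ++e) {
        std::fill(cur.begin(), cur.end(), inf);
        for (const Arc& a : arcs)
            if (prev[a.u] != inf && prev[a.u] + a.w < cur[a.v])
                cur[a.v] = prev[a.u] + a.w;
        if (cur[t] != inf) best = std::min(best, cur[t] / e);   // average over exactly e edges
        prev.swap(cur);
    }
    return best;
}
```

The cost is O(maxEdges * E) time with two O(V) arrays, the same order as the modified Bellman-Ford described earlier.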
