Let G = (V, E) be a directed graph with nodes v_1, v_2,..., v_n. We say that G is an ordered graph if it has the following properties.
Each edge goes from a node with lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.
Each node except v_n has at least one edge leaving it. That is, for every node v_i, there is at least one edge of the form (v_i, v_j).
Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n.
If you want to see the nice latex version: here
My attempt:
Dynamic programming. Opt(i) = max {Opt(j)} + 1. for all j such such j is reachable from i.
Is there perhaps a better way to do this? I think even with memoization my algorithm will still be exponential. (this is just from an old midterm review I found online)
Your approach is right, you will have to do
Opt(i) = max {Opt(j)} + 1} for all j such that j is reachable from i
However, this is exponential only if you run it without memoization. With memoization, you will have the memoized optimal value for every node j, j > i, when you are on node i.
For the worst case complexity, let us assume that every two nodes that could be connected are connected. This means, v_1 is connected with (v_2, v_3, ... v_n); v_i is connected with (v_(i+1), v_(i+2), ... v_n).
Number of Vertices (V) = n
Hence, number of edges (E) = n*(n+1)/2 = O(V^2)
Let us focus our attention on a vertex v_k. For this vertex, we have to go through the already derived optimal values of (n-k) nodes.
Number of ways of reaching v_k directly = (k-1)
Hence worst case time complexity => sigma((k-1)*(n-k)) from k=1 to k=n, which is a sigma of power 2 polynomical, and hence will result in O(n^3) Time complexity.
Simplistically, the worst case time complexity is O(n^3) == O(V^3) == O(E) * O(V) == O(EV).
Thanks to the first property, this problem can be solved O(V^2) or even better with O(E) where V is the number of vertices and E is the number of edges. Indeed, it uses the dynamic programming approach which is quiet similar with the one you gives. Let opt[i] be the length of the longest path for v_1 to v_i. Then
opt[i] = max(opt[j]) + 1 where j < i and we v_i and v_j is connected,
using this equation, it can be solved in O(V^2).
Even better, we can solve this in another order.
int LongestPath() {
for (int v = 1; v <= V; ++v) opt[v] = -1;
opt[1] = 0;
for (int v = 1; v <= V; ++v) {
if (opt[v] >= 0) {
/* Each edge can be visited at most once,
thus the runtime time is bounded by |E|.
*/
for_each( v' can be reached from v)
opt[v'] = max(opt[v]+1, opt[v']);
}
}
return opt[V];
}
Related
In an undirected graph with V vertices and E edges how would you count the number of triangles in O(|V||E|)? I see the algorithm here but I'm not exactly sure how that would be implemented to achieve that complexity. Here's the code presented in that post:
for each edge (u, v):
for each vertex w:
if (v, w) is an edge and (w, u) is an edge:
return true
return false
Would you use an adjacency list representation of the graph to traverse all edges in the outer loop and then an adjacency matrix to check for the existence of the 2 edges in the inner loop?
Also, I saw a another solution presented as O(|V||E|) which involves performing a depth-first search on the graph and when you encounter a backedge (u,v) from the vertex u you're visiting check if the grandparent of the vertex u is vertex v. If it is then you have found a triangle. Is this algorithm correct? If so, wouldn't this be O(|V|+|E|)? In the post I linked to there is a counterexample for the breadth-first search solution offered up but based on the examples I came up with it seems like the depth-first search method I outlined above works.
Firstly, note that the algorithm does not so much count the number of triangles, but rather returns whether one exists at all.
For the first algorithm, the analysis becomes simple if we assume that we can do the lookup of (a, b) is an edge in constant time. (Since we loop over all vertices for all edges, and only do something with constant time we get O(|V||E|*1). ) Telling whether something is a member of a set in constant time can be done using for example a hashtable/set. We could also, as you said, do this by the use of the adjacency matrix, which we could create beforehand by looping over all the edges, not changing our total complexity.
An adjacency list representation for looping over the edges could perhaps be used, but traversing this may be O(|V|+|E|), giving us the total complexity O(|V||V| + |V||E|) which may be more than we wanted. If that is the case, we should instead loop over this first, and add all our edges to a normal collection (like a list).
For your proposed DFS algorithm, the problem is that we cannot be sure to encounter a certain edge as a backedge at the correct moment, as is illustrated by the following counterexample:
A -- B --- C -- D
\ / |
E ----- F
Here if we look from A-B-C-E, and then find the backedge E-B, we correctly find the triangle; but if we instead go A-B-C-D-F-E, the backedges E-B, and E-C, do no longer satisfy our condition.
This is a naive approach to count the number of cycles.
We need the input in the form of an adjacency matrix.
public int countTricycles(int [][] adj){
int n = adj.length;
int count = 0;
for(int i = 0; i < n ;i++){
for(int j = 0; j < n; j++){
if(adj[i][j] != 0){
for(int k = 0; k < n; k++){
if(k!=i && adj[j][k] != 0 && adj[i][k] != 0 ){
count++;
}
}
}
}
}
return count/6;
}
The complexity would be O(n^3).
I have a graph G with a starting node S and an ending node E. What's special with this graph is that instead of edges having costs, here it's the nodes that have a cost. I want to find the way (a set of nodes, W) between S and E, so that max(W) is minimized. (In reality, I am not interested of W, just max(W)) Equivalently, if I remove all nodes with cost larger than k, what's the smallest k so that S and E are still connected?
I have one idea, but want to know if it is correct and optimal. Here's my current pseudocode:
L := Priority Queue of nodes (minimum on top)
L.add(S, S.weight)
while (!L.empty) {
X = L.poll()
return X.weight if (X == G)
mark X visited
foreach (unvisited neighbour N of X, N not in L) {
N.weight = max(N.weight, X.weight)
L.add(N, N.weight)
}
}
I believe it is worst case O(n log n) where n is the number of nodes.
Here are some details for my specific problem (percolation), but I am also interested of algorithms for this problem in general. Node weights are randomly uniformly distributed between 0 and a given max value. My nodes are Poisson distributed on the R²-plane, and an edge between two nodes exists if the distance between two nodes is less than a given constant. There are potentially very many nodes, so they are generated on the fly (hidden in the foreach in the pseudocode). My starting node is in (0,0) and the ending node is any node on a distance larger than R from (0,0).
EDIT: The weights on the nodes are floating point numbers.
Starting from an empty graph, you can insert vertices (and their edges to existing neighbours) one at a time in increasing weight order, using a fast union/find data structure to maintain the set of connected components. This is just like the Kruskal algorithm for building minimum spanning trees, but instead of adding edges one at a time, for each vertex v that you process, you would combine the components of all of v's neighbours.
You also keep track of which two components contain the start and end vertices. (Initially comp(S) = S and comp(E) = E; before each union operation, the two input components X and Y can be checked to see whether either one is either comp(S) or comp(E), and the latter updated accordingly in O(1) time.) As soon as these two components become a single component (i.e. comp(S) = comp(E)), you stop. The vertex just added is the maximum weight vertex on the the path between S and E that minimises the maximum weight of any vertex.
[EDIT: Added time complexity info]
If the graph contains n vertices and m edges, it will take O(n log n) time to sort the vertices by weight. There will be at most m union operations (since every edge could be used to combine two components). If a simple disjoint set data structure is used, all of these union operations could be done in O(m + n log n) time, and this would become the overall time complexity; if path compression is also used, this drops to O(m A(n)), where A(n) is the incredibly slowly growing inverse Ackermann function, but the overall time complexity remains unchanged from before because the initial sorting dominates.
Assuming integer weights, Pham Trung's binary search approach will take O((n + m) log maxW) time, where maxW is the heaviest vertex in the graph. On sparse graphs (where m = O(n)), this becomes O(n log maxW), while mine becomes O(n log n), so here his algorithm will beat mine if log(maxW) << log(n) (i.e. if all weights are very small). If his algorithm is called on a graph with large weights but only a small number of distinct weights, then one possible optimisation would be to sort the weights in O(n log n) time and then replace them all with their ranks in the sorted order.
This problem can be solved by using binary search.
Assume that the solution is x, Starting from the start, we will use BFS or DFS to discover the graph, visit only those nodes which have weight <= x. So, in the end, if Start and End is connected, x can be the solution. We can find the optimal value for x by applying binary search.
Pseudo code
int min = min_value_of_all_node;
int max = max_value_of_all_node;
int result = max;
while(min<= max){
int mid = (min + max)>>1;
if(BFS(mid)){//Using Breadth first search to discover the graph.
result = min(mid, result);
max = mid - 1;
}else{
min = mid + 1;
}
}
print result;
Note: we only need to apply those weights that exist in the graph, so this can help to reduce time complexity of the binary search to O(log n) with n is number of distinct weights
If the weights are float, just use the following approach:
List<Double> listWeight ;//Sorted list of weights
int min = 0;
int max = listWeight.size() - 1;
int result = max;
while(min<= max){
int mid = (min + max)>>1;
if(BFS(listWeight.get(mid))){//Using Breadth first search to discover the graph.
result = min(mid, result);
max = mid - 1;
}else{
min = mid + 1;
}
}
print listWeight.get(result);
I have to develop an O(|V|+|E|) algorithm related to topological sort which, in a directed acyclic graph (DAG), determines the number of paths from each vertex of the graph to t (t is a node with out-degree 0). I have developed a modification of DFS as follow:
DFS(G,t):
for each vertex u ∈ V do
color(u) = WHITE
paths_to_t(u) = 0
for each vertex u ∈ V do
if color(u) == WHITE then
DFS-Visit(u,t)
DFS-Visit(u,t):
color(u) = GREY
for each v ∈ neighbors(u) do
if v == t then
paths_to_t(u) = paths_to_t(u) + 1
else then
if color(v) == WHITE then
DFS-Visit(v)
paths_to_t(u) = paths_to_t(u) + paths_to_t(v)
color(u) = BLACK
But I am not sure if this algorithm is related to topological sort or if should I restructure my work with another point of view.
It can be done using Dynamic Programming and topological sort as follows:
Topological sort the vertices, let the ordered vertices be v1,v2,...,vn
create new array of size t, let it be arr
init: arr[t] = 1
for i from t-1 to 1 (descending, inclusive):
arr[i] = 0
for each edge (v_i,v_j) such that i < j <= t:
arr[i] += arr[j]
When you are done, for each i in [1,t], arr[i] indicates the number of paths from vi to vt
Now, proving the above claim is easy (comparing to your algorithm, which I have no idea if its correct and how to prove it), it is done by induction:
Base: arr[t] == 1, and indeed there is a single path from t to t, the empty one.
Hypothesis: The claim is true for each k in range m < k <= t
Proof: We need to show the claim is correct for m.
Let's look at each out edge from vm: (v_m,v_i).
Thus, the number of paths to vt starting from v_m that use this edge (v_m,v_i). is exactly arr[i] (induction hypothesis). Summing all possibilities of out edges from v_m, gives us the total number of paths from v_m to v_t - and this is exactly what the algorithm do.
Thus, arr[m] = #paths from v_m to v_t
QED
Time complexity:
The first step (topological sort) takes O(V+E).
The loop iterate all edges once, and all vertices once, so it is O(V+E) as well.
This gives us total complexity of O(V+E)
I have a weighted graph, no negative weights, and I would like to find the path from one node to another, trying to minimize the cost for the single step. I don't need to minimize the total cost of the trip (as e.g. Dijkstra does) but the average step-cost. However, I have a constraint: K, the maximum number of nodes in the path.
So for example to go from A to J maybe Dijkstra would find this path (between parenthesis the weight)
A (4) D (6) J -> total cost: 10
and the algorithm I need, setting K = 10, would find something like
A (1) B (2) C (2) D (1) E (3) F (2) G (1) H (3) J -> total cost: 15
Is there any well known algorithm for this problem?
Thanks in advance.
Eugenio
Edit as answer to templatetypedef.
Some questions:
1) The fact that it can happen to take a cycle multiple times to drive down the average is not good for my problem: maybe I should have mentioned it but I don' want to visit the same node more than once
2) Is it possible to exploit the fact that I don't have negative weights?
3) When you said O(kE) you meant for the whole algorithm or just for the additional part?
Let's take this simple implementation in C where n=number of nodes e=number of edges, d is a vector with the distances, p a vector with the predecessor and a structure edges (u,v,w) memorize the edges in the graphs
for (i = 0; i < n; ++i)
d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i)
for (j = 0; j < e; ++j)
if (d[edges[j].u] + edges[j].w < d[edges[j].v]){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
}
I'm not sure how I should modify the code according to your answer; to take into consideration the average instead of the total cost should this be enough?
for (i = 0; i < n; ++i)
d[i] = INFINITY;
d[s] = 0;
for (i = 0; i < n - 1; ++i)
steps = 0;
for (j = 0; j < e; ++j)
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
But anyway I don't know how take into consideration the K limit at the same time...Thanks again in advance for your help.
Edit
Since I can afford some errors I'm thinking about this naif solution:
precompute all the shortest paths and memorize in A
precompute all the shortest paths on a modified graph, where I cut the edges over a certain weight and memorize them in B
When I need a path, I look in A, e.g. from x to y this is the path
x->z->y
then for each step I look in B,
so for x > z I see if there is a connection in B, if not I keep x > z otherwise I fill the path x > z with the subpath provided by B, that could be something like x->j->h->z; then I do the same for z->y.
Each time I will also check if I'm adding a cyclic path.
Maybe I will get some weird paths but it could work in most of the case.
If I extend the solution trying with different "cut thresholds" maybe I can also be close to respect the K constrain.
I believe that you can solve this using a modified version of the Bellman-Ford algorithm.
Bellman-Ford is based on the following dynamic programming recurrence that tries to find the shortest path from some start node s to each other node that's of length no greater than m for some m. As a base case, when you consider paths of length zero, the only reachable node is s and the initial values are
BF(s, t, 0) = infinity
BF(s, s, 0) = 0
Then, if we know the values for a path of length m, we can find it for paths of length m + 1 by noting that the old path may still be valid, or we want to extend some path by length one:
BF(s, t, m + 1) = min {
BF(s, t, m),
BF(s, u, m) + d(u, t) for any node u connected to t
}
The algorithm as a whole works by noting that any shortest path must have length no greater than n and then using the above recurrence and dynamic programming to compute the value of BF(s, t, n) for all t. Its overall runtime is O(EV), since there are E edges to consider at each step and V total vertices.
Let's see how we can change this algorithm to solve your problem. First, to limit this to paths of length k, we can just cut off the Bellman-Ford iteration after finding all shortest paths of length up to k. To find the path with lowest average cost is a bit trickier. At each point, we'll track two quantities - the length of the shortest path reaching a node t and the average length of that path. When considering new paths that can reach t, our options are to either keep the earlier path we found (whose cost is given by the shortest path so far divided by the number of nodes in it) or to extend some other path by one step. The new cost of that path is then given by the total cost from before plus the edge length divided by the number of edges in the old path plus one. If we take the cheapest of these and then record both its cost and number of edges, at the end we will have computed the path with lowest average cost of length no greater than k in time O(kE). As an initialization, we will say that the path from the start node to itself has length 0 and average cost 0 (the average cost doesn't matter, since whenever we multiply it by the number of edges we get 0). We will also say that every other node is at distance infinity by saying that the average cost of an edge is infinity and that the number of edges is one. That way, if we ever try computing the cost of a path formed by extending the path, it will appear to have average cost infinity and won't be chosen.
Mathematically, the solution looks like this. At each point we store the average edge cost and the total number of edges at each node:
BF(s, t, 0).edges = 1
BF(s, t, 0).cost = infinity
BF(s, s, 0).edges = 0
BF(s, s, 0).cost = 0
BF(s, t, m + 1).cost = min {
BF(s, t, m).cost,
(BF(s, u, m).cost * BF(s, u, m).edges + d(u, t)) / (BF(s, u, m).edges + 1)
}
BF(s, t, m + 1).edges = {
BF(s, t, m).edges if you chose the first option above.
BF(s, u, m).edges + 1 else, where u is as above
}
Note that this may not find a simple path of length k, since minimizing the average cost might require you to take a cycle with low (positive or negative) cost multiple times to drive down the average. For example, if a graph has a cost-zero loop, you should just keep taking it as many times as you can.
EDIT: In response to your new questions, this approach won't work if you don't want to duplicate nodes on a path. As #comestibles has pointed out, this version of the problem is NP-hard, so unless P = NP you shouldn't expect to find any good polynomial-time algorithm for this problem.
As for the runtime, the algorithm I've described above runs in total time O(kE). This is because each iteration of computing the recurrence takes O(E) time and there are a total of k iterations.
Finally, let's look at your proposed code. I've reprinted it here:
for (i = 0; i < n - 1; ++i) {
steps = 0;
for (j = 0; j < e; ++j) {
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
}
}
Your first question was how to take k into account. This can be done easily by rewriting the outer loop to count up to k, not n - 1. That gives us this code:
for (i = 0; i < k; ++i) {
steps = 0;
for (j = 0; j < e; ++j) {
if ( (d[edges[j].u]+ edges[j].w)/(steps+1) < d[edges[j].v]/steps){
d[edges[j].v] = d[edges[j].u] + edges[j].w;
p[edges[j].v] = u;
steps++;
}
}
}
One problem that I'm noticing is that the modified Bellman-Ford algorithm needs to have each candidate best path store its number of edges independently, since each node's optimal path might be reached by a different number of edges. To fix this, I would suggest having the d array store two values - the number of edges required to reach the node and the average cost of a node along that path. You would then update your code by replacing the steps variable in these equations with the cached path lengths.
Hope this helps!
For the new version of your problem, there's a reduction from Hamilton path (making your problem intractable). Take an instance of Hamilton path (i.e., a graph whose edges are assumed to have unit weight), add source and sink vertices and edges of weight 2 from the source to all others and from the sink to all others. Set K = |V| + 2 and request a path from source to sink. There exists a Hamilton path if and only if the optimal mean edge length is (|V| + 3)/(|V| + 2).
Care to tell us why you want these paths so that we can advise you of a reasonable approximation strategy?
You can slightly modify Bellman-Ford algorithm to find minimum path using at most K edges/nodes.
If the number of edges is fixed than you have to minimize total cost, because average cost would be TotalCost/NumberOfEdges.
One of the solutions would be to iterate NumberOfEdges from 1 to K, find minimal total cost and choose minimum TotalCost/NumberOfEdges.
Find the shortest path through a graph in efficient time, with the additional constraint that the path must contain exactly n nodes.
We have a directed, weighted graph. It may, or may not contain a loop. We can easily find the shortest path using Dijkstra's algorithm, but Dijkstra's makes no guarantee about the number of edges.
The best we could come up with was to keep a list of the best n paths to a node, but this uses a huge amount of memory over vanilla Dijkstra's.
It is a simple dynamic programming algorithm.
Let us assume that we want to go from vertex x to vertex y.
Make a table D[.,.], where D[v,k] is the cost of the shortest path of length k from the starting vertex x to the vertex v.
Initially D[x,1] = 0. Set D[v,1] = infinity for all v != x.
For k=2 to n:
D[v,k] = min_u D[u,k-1] + wt(u,v), where we assume that wt(u,v) is infinite for missing edges.
P[v,k] = the u that gave us the above minimum.
The length of the shortest path will then be stored in D[y,n].
If we have a graph with fewer edges (sparse graph), we can do this efficiently by only searching over the u that v is connected to. This can be done optimally with an array of adjacency lists.
To recover the shortest path:
Path = empty list
v = y
For k= n downto 1:
Path.append(v)
v = P[v,k]
Path.append(x)
Path.reverse()
The last node is y. The node before that is P[y,n]. We can keep following backwards, and we will eventually arrive at P[v,2] = x for some v.
The alternative that comes to my mind is a depth first search (as opposed to Dijkstra's breadth first search), modified as follows:
stop "depth"-ing if the required vertex count is exceeded
record the shortest found (thus far) path having the correct number of nodes.
Run time may be abysmal, but it should come up with the correct result while using a very reasonable amount of memory.
Interesting problem. Did you discuss using a heuristic graph search (such as A*), adding a penalty for going over or under the node count? This may or may not be admissible, but if it did work, it may be more efficient than keeping a list of all the potential paths.
In fact, you may be able to use backtracking to limit the amount of memory being used for the Dijkstra variation you discussed.
A rough idea of an algorithm:
Let A be the start node, and let S be a set of nodes (plus a path). The invariant is that at the end of step n, S will all nodes that are exactly n steps from A and the paths will be the shortest paths of that length. When n is 0, that set is {A (empty path)}. Given such a set at step n - 1, you get to step n by starting with an empty set S1 and
for each (node X, path P) in S
for each edge E from X to Y in S,
If Y is not in S1, add (Y, P + Y) to S1
If (Y, P1) is in S1, set the path to the shorter of P1 and P + Y
There are only n steps, and each step should take less than max(N, E), which makes the
entire algorithm O(n^3) for a dense graph and O(n^2) for a sparse graph.
This algorith was taken from looking at Dijkstra's, although it is a different algorithm.
let say we want shortest distance from node x to y of k step
simple dp solution would be
A[k][x][y] = min over { A[1][i][k] + A[t-1][k][y] }
k varies from 0 to n-1
A[1][i][j] = r[i][j]; p[1][i][j]=j;
for(t=2; t<=n; t++)
for(i=0; i<n; i++) for(j=0; j<n; j++)
{
A[t][i][j]=BG; p[t][i][j]=-1;
for(k=0; k<n; k++) if(A[1][i][k]<BG && A[t-1][k][j]<BG)
if(A[1][i][k]+A[t-1][k][j] < A[t][i][j])
{
A[t][i][j] = A[1][i][k]+A[t-1][k][j];
p[t][i][j] = k;
}
}
trace back the path
void output(int a, int b, int t)
{
while(t)
{
cout<<a<<" ";
a = p[t][a][b];
t--;
}
cout<<b<<endl;
}