Topological sort to find the number of paths to t - algorithm

I have to develop an O(|V|+|E|) algorithm related to topological sort which, in a directed acyclic graph (DAG), determines the number of paths from each vertex of the graph to t (t is a node with out-degree 0). I have developed the following modification of DFS:
DFS(G,t):
    for each vertex u ∈ V do
        color(u) = WHITE
        paths_to_t(u) = 0
    for each vertex u ∈ V do
        if color(u) == WHITE then
            DFS-Visit(u,t)

DFS-Visit(u,t):
    color(u) = GREY
    for each v ∈ neighbors(u) do
        if v == t then
            paths_to_t(u) = paths_to_t(u) + 1
        else
            if color(v) == WHITE then
                DFS-Visit(v,t)
            paths_to_t(u) = paths_to_t(u) + paths_to_t(v)
    color(u) = BLACK
But I am not sure whether this algorithm is really related to topological sort, or whether I should restructure my work from another point of view.

It can be done using dynamic programming and topological sort as follows:
Topological sort the vertices; let the ordered vertices be v_1, v_2, ..., v_n, and let t be the position of the target vertex in this order.
Create a new array of size t; call it arr.
init: arr[t] = 1
for i from t-1 down to 1 (descending, inclusive):
    arr[i] = 0
    for each edge (v_i, v_j) such that i < j <= t:
        arr[i] += arr[j]
When you are done, for each i in [1, t], arr[i] is the number of paths from v_i to v_t.
Now, proving the above claim is easy (compared to your algorithm, which I have no idea whether it is correct or how to prove), and it is done by induction:
Base: arr[t] == 1, and indeed there is a single path from t to t, the empty one.
Hypothesis: The claim is true for each k in range m < k <= t
Proof: We need to show the claim is correct for m.
Let's look at each out edge from v_m: (v_m, v_i).
The number of paths to v_t starting from v_m that use this edge (v_m, v_i) is exactly arr[i] (induction hypothesis). Summing over all out edges of v_m gives the total number of paths from v_m to v_t, and this is exactly what the algorithm does.
Thus, arr[m] = #paths from v_m to v_t.
QED
Time complexity:
The first step (topological sort) takes O(V+E).
The loop iterates over all edges once and all vertices once, so it is O(V+E) as well.
This gives us total complexity of O(V+E)
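
For reference, here is a minimal runnable Python sketch of the DP above. The adjacency-dict representation (every vertex, including sinks, appears as a key) and the function name count_paths_to are my own choices, not part of the original answer.

from collections import deque

def count_paths_to(graph, t):
    # graph: dict mapping each vertex to a list of its successors (a DAG).
    # Returns a dict vertex -> number of paths from that vertex to t.
    # Runs in O(V + E).

    # Topological sort via Kahn's algorithm.
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Fill the table in reverse topological order: every successor of u
    # is finished before u, exactly as in the induction above.
    paths = {u: 0 for u in graph}
    paths[t] = 1                       # the empty path from t to t
    for u in reversed(order):
        if u != t:
            paths[u] = sum(paths[v] for v in graph[u])
    return paths

# Tiny example: two paths from 'a' to 't' (a->t and a->b->t).
g = {'a': ['b', 't'], 'b': ['t'], 't': []}
print(count_paths_to(g, 't'))          # {'a': 2, 'b': 1, 't': 1}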

A function definition for directed acyclic graphs

Let's consider the following problem: for a directed acyclic graph G = (V, E) we define a "level" function l(u) for each vertex u such that:
1. l(u) >= 0 for every u
2. If there is a path from u to v (u -> v), then l(u) > l(v)
3. For each vertex u, l(u) is the minimum integer that satisfies both conditions 1 and 2.
The problem says:
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
Here is a possible algorithm based on topological sort:
First we find the transpose of G, which is G^T = (V, E^T) with E^T = {(u,v) : (v,u) ∈ E}. This takes O(|V|+|E|) in total with an adjacency-list implementation: O(|V|) for allocation, plus the sum over all v in V of |Adj[v]|, which is O(|E|). Topological sort takes Theta(|V|+|E|), since it consists of a BFS plus |V| list insertions, each taking O(1).
TRANSPOSE(G){
    Allocate |V| list pointers for G^T, i.e. Adj'[]
    for(i = 1; i <= |V|; i++){
        for every vertex v in Adj[i]{
            add vertex i to Adj'[v]
        }
    }
}
L = TopSort(G)
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
Maybe I am missing something, but this seems really obvious to me: if you define it as the minimum that satisfies those conditions, how can there be more than one?
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
I think your topological sort idea is correct (note that a topological sort can be computed with a BFS), but it should be performed on the transposed graph (reverse the direction of every edge). Then the first vertices in the topological sort get 0, the next ones get 1, and so on. For example, for the transposed graph:
1   2   3
*-->*-->*
        ^
*-------|
1
I have numbered the nodes with their positions in the topological sort. You number the nodes by implementing the topological sort using a BFS. When you extract a node from your FIFO queue, you subtract 1 from the indegree of each of its direct successors. When a successor's indegree becomes 0, you insert it in the queue and number it as (number of the extracted node) + 1. In my example, the nodes numbered 1 start with indegree 0. Then, the bottom-most 1 subtracts one from the indegree of the node labeled 3, but that indegree will be 1, not zero, so we don't insert it in the queue. We do insert 2, however, because its indegree becomes 0.
Pseudocode:
G = G^T                      # work on the transposed graph
Q = a FIFO queue
# indegree(v) = how many edges are going into node v
push all nodes with indegree 0 into Q
set l(v) = 0 for all nodes with indegree 0
while not Q.Empty():
    x = Q.Pop()
    for all direct successors v of x:
        if indegree[v] > 0:
            indegree[v] = indegree[v] - 1
            if indegree[v] == 0:
                Q.Push(v)
                l[v] = l[x] + 1
You can also do it with a DFS that computes the value of each node once the recursion returns, as:
value(v) = 1 + max{value(c) : c a child of v}   (and value(v) = 0 if v has no children)
Note that this DFS is not done on the transposed graph, because we let the recursion handle the traversal in topological sort order.
Let's say you have a topological sort of G. Then you can consider the vertices in reverse order: if there is an edge u -> v, then v comes before u in this order.
If you loop over the nodes in this order, let l(u) = 0 if u has no outgoing edges, and l(u) = 1 + max(l(v) : there is an edge (u, v)) otherwise. This is optimal and gives you an O(|V| + |E|) algorithm for this problem.
Proof is left as an exercise. :D
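
For concreteness, here is a small runnable Python sketch of this reverse-topological-order computation. The adjacency-dict representation and the helper name topological_order are my own choices, not part of the original answers.

from collections import deque

def topological_order(graph):
    # Kahn's algorithm: repeatedly remove vertices of indegree 0.
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order

def levels(graph):
    # l(u) = 0 for sinks, otherwise 1 + max over direct successors; O(V + E).
    l = {}
    for u in reversed(topological_order(graph)):
        l[u] = 0 if not graph[u] else 1 + max(l[v] for v in graph[u])
    return l

g = {'a': ['b'], 'b': ['c'], 'c': [], 'd': ['c']}
print(levels(g))    # {'c': 0, 'b': 1, 'd': 1, 'a': 2}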

Longest path in ordered graph

Let G = (V, E) be a directed graph with nodes v_1, v_2,..., v_n. We say that G is an ordered graph if it has the following properties.
Each edge goes from a node with lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.
Each node except v_n has at least one edge leaving it. That is, for every node v_i, there is at least one edge of the form (v_i, v_j).
Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n.
My attempt:
Dynamic programming: Opt(i) = max{Opt(j)} + 1, over all j such that there is an edge (v_i, v_j).
Is there perhaps a better way to do this? I think even with memoization my algorithm will still be exponential. (This is just from an old midterm review I found online.)
Your approach is right; you will have to compute
Opt(i) = max{Opt(j)} + 1 over all j such that there is an edge (v_i, v_j).
However, this is exponential only if you run it without memoization. With memoization, you will already have the memoized optimal value for every node j, j > i, when you are on node i.
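
For illustration, here is a short memoized Python sketch of this recurrence. The 1-based vertex indices and the convention of returning None when v_n is unreachable are my own choices.

from functools import lru_cache

def longest_path_length(n, edges):
    # Length (in edges) of the longest path from v_1 to v_n in an ordered graph.
    # edges: list of pairs (i, j) with i < j, vertices numbered 1..n.
    adj = {i: [] for i in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)

    @lru_cache(maxsize=None)
    def opt(i):
        # Longest path from v_i to v_n; -inf means v_n is unreachable from v_i.
        if i == n:
            return 0
        return 1 + max((opt(j) for j in adj[i]), default=float('-inf'))

    result = opt(1)
    return None if result == float('-inf') else result

# Example: the longest path 1 -> 2 -> 3 -> 4 has 3 edges.
print(longest_path_length(4, [(1, 2), (2, 3), (1, 4), (3, 4)]))   # 3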
For the worst-case complexity, let us assume that every two nodes that could be connected are connected. That is, v_1 is connected to (v_2, v_3, ..., v_n), and in general v_i is connected to (v_(i+1), v_(i+2), ..., v_n).
Number of vertices (V) = n
Hence, number of edges (E) = n*(n-1)/2 = O(V^2)
Let us focus our attention on a vertex v_k. For this vertex, we have to go through the already derived optimal values of (n-k) nodes.
Number of ways of reaching v_k directly = (k-1)
Hence the worst-case time complexity is the sum of (k-1)*(n-k) for k = 1 to n, which is a sum of a degree-2 polynomial and hence results in O(n^3) time complexity.
Simplistically, the worst-case time complexity is O(n^3) == O(V^3) == O(E) * O(V) == O(EV).
Thanks to the first property, this problem can be solved in O(V^2), or even better in O(E), where V is the number of vertices and E is the number of edges. It uses a dynamic programming approach quite similar to the one you give. Let opt[i] be the length of the longest path from v_1 to v_i. Then
opt[i] = max(opt[j]) + 1, where j < i and v_j and v_i are connected.
Using this equation, it can be solved in O(V^2).
Even better, we can solve it in another order:
int LongestPath() {
    for (int v = 1; v <= V; ++v) opt[v] = -1;
    opt[1] = 0;
    for (int v = 1; v <= V; ++v) {
        if (opt[v] >= 0) {
            /* Each edge is relaxed at most once,
               so the running time is bounded by |E|. */
            for_each (v' that can be reached from v by one edge)
                opt[v'] = max(opt[v] + 1, opt[v']);
        }
    }
    return opt[V];
}

Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1

You are given a complete undirected graph with N vertices. All but K edges have a cost of A. Those K edges have a cost of B and you know them (as a list of pairs). What's the minimum cost from node 0 to node N - 1?
2 <= N <= 500k
0 <= K <= 500k
1 <= A, B <= 500k
The problem is, obviously, when those K edges cost more than the other ones and node 0 and node N - 1 are connected by a K-edge.
Dijkstra doesn't work. I've even tried something very similar with a BFS.
Step 1: Let G(0) be the set of nodes adjacent to node 0 via a "good" (cheap) edge.
Step 2: For each node in G(0):
            compute G(node)
            if G(node) contains N - 1
                return step
            else
                add node to some queue
        repeat Step 2 and increment step
The problem is that this uses up a lot of time due to the fact that for every node you have to make a loop from 0 to N - 1 in order to find the "good" adjacent nodes.
Does anyone have any better ideas? Thank you.
Edit: Here is a link from the ACM contest: http://acm.ro/prob/probleme/B.pdf
This is laborious case work:
A < B and 0 and N-1 are joined by an A edge -> trivial.
B < A and 0 and N-1 are joined by a B edge -> trivial.
B < A and 0 and N-1 are joined by an A edge ->
    Do a BFS on the graph with only the K edges.
A < B and 0 and N-1 are joined by a B edge ->
    You can check in O(N) time whether there is a path of cost 2*A (try every vertex in the middle).
    To check other path lengths, the following algorithm should do the trick:
    Let X(d) be the set of nodes reachable from 0 by using d of the shorter edges. You can find X(d) with the following algorithm: take each vertex v with unknown distance and iteratively check the edges between v and the vertices of X(d-1). If you find a short edge, then v is in X(d); otherwise you stepped on a long edge. Since there are at most K long edges, you can step on them at most K times in total. So you can find the distances of all vertices in O(N + K) total time.
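
The level-by-level search described in the last case is, in effect, a BFS on the complement of the expensive-edge graph. Here is a minimal Python sketch of that idea; representing the K expensive edges as a set of frozensets is my own choice.

from collections import deque

def cheap_bfs_distance(n, expensive, src, dst):
    # BFS distance from src to dst using only cheap (cost-A) edges of the
    # complete graph, i.e. a BFS on the complement of the expensive-edge graph.
    # Each membership test below either removes a vertex from `unvisited`
    # (at most N times in total) or hits an expensive edge (each expensive
    # edge is tested at most twice), so the whole search is O(N + K).
    # expensive: set of frozenset({u, v}) for the K cost-B edges.
    unvisited = set(range(n))
    unvisited.discard(src)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        reached = []
        for v in unvisited:
            if frozenset((u, v)) not in expensive:   # (u, v) is a cheap edge
                dist[v] = dist[u] + 1
                queue.append(v)
                reached.append(v)
        unvisited.difference_update(reached)
    return dist.get(dst)                             # None if unreachable via cheap edges

In the "A < B and 0, N-1 joined by a B edge" case, the answer is then min(B, A * cheap_bfs_distance(N, expensive, 0, N - 1)), or just B if the cheap-only search never reaches N-1, since any path that uses an expensive edge already costs at least B, which is exactly the price of the direct edge.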
I propose a solution to a somewhat more general problem where you might have more than two types of edges and the edge weights are not bounded. For your scenario the idea is probably a bit overkill, but the implementation is quite simple, so it might be a good way to go about the problem.
You can use a segment tree to make Dijkstra more efficient. You will need the operations:
set an upper bound in a range: given u, L, R, for all x[i] with L <= i <= R, set x[i] = min(x[i], u)
find the global minimum
The upper bounds can be pushed down the tree lazily, so both operations can be implemented in O(log n).
When relaxing outgoing edges, look for the edges with cost B, sort them and update the ranges in between all at once.
The runtime should be O(n log n + m log m) if you sort all the edges upfront (by outgoing vertex).
EDIT: Got accepted with this approach. The good thing about it is that it avoids any kind of special casing. It's still ~80 lines of code.
In the case when A < B, I would go with a kind of BFS, where you check where you can't reach instead of where you can. Here's the pseudocode:
G(k) is the set of nodes reachable by k cheap edges and no fewer. We start with G(0) = {v0}.
while G(k) isn't empty and G(k) doesn't contain vN-1 and k*A < B:
    cnt = array[N] of zeroes
    for every node n in G(k):
        for every expensive edge (n, m):
            cnt[m]++
    # now cnt[m] == |G(k)| iff m can't be reached by a cheap edge from any node of G(k)
    set G(k+1) to {m : cnt[m] < |G(k)|} except {n : n is in G(0), ..., G(k)}
    k++
This way you avoid iterating through the (many) cheap edges and only iterate through the relatively few expensive edges.
As you have correctly noted, the problem comes when A > B and the edge from 0 to N-1 has a cost of A.
In this case you can simply delete all edges in the graph that have a cost of A: any route that uses a cost-A edge already costs at least A, which is what the direct 0 -> N-1 edge costs, so a better route can only use edges of cost B.
Then you can perform a simple BFS, since the costs of all remaining edges are the same. It will give you optimal performance, as pointed out by this link: Finding shortest path for equal weighted graph.
Moreover, you can stop your BFS as soon as the total cost exceeds A (at that point the direct cost-A edge is at least as good).
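
A small Python sketch of this case, assuming the K cost-B edges come as a list of pairs; the function name and signature are my own.

from collections import deque

def cheapest_when_a_greater(n, a, b, b_edges):
    # Case A > B and the direct 0 -> n-1 edge costs A.
    # BFS the graph made of the B edges only and compare the best all-B path
    # against the direct cost-A edge.  (As noted above, the BFS could also be
    # stopped early once dist[u] * b >= a.)
    adj = [[] for _ in range(n)]
    for u, v in b_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return a if dist[n - 1] == -1 else min(a, dist[n - 1] * b)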

Graph Minimum Spanning Tree using BFS

This is a problem from a practice exam that I'm struggling with:
Let G = (V, E) be a weighted undirected connected graph, with positive
weights (you may assume that the weights are distinct). Given a real
number r, define the subgraph Gr = (V, {e in E | w(e) <= r}). For
example, G0 has no edges (obviously disconnected), and Ginfinity = G
(which by assumption is connected). The problem is to find the
smallest r such that Gr is connected.
Describe an O(mlogn)-time algorithm that solves the problem by
repeated applications of BFS or DFS.
The real problem is doing it in O(mlogn). Here's what I've got:
r = min( w(e) )                                   => O(m)
while true do                                     => O(m)
    Gr = G with edges e | w(e) > r removed        => O(m)
    if | BFS( Gr ).V | < |V|                      => O(m + n)
        r++ (or r = next smallest w(e))
    else
        return r
That's a whopping O(m^2 + mn). Any ideas for getting it down to O(mlogn)? Thanks!
You are iterating over all possible edge costs, which results in an outer loop of O(m) iterations. Notice that if the graph is disconnected when you discard all edges with weight > w(e), it is also disconnected when you discard all edges with weight > w(e') for any w(e') < w(e). You can use this monotonicity to do a binary search over the edge costs, reducing the number of connectivity checks to O(log n).
lo = min(w(e) for e in edges), hi = max(w(e) for e in edges)
while lo < hi:
    mid = (lo + hi) / 2
    if connected(graph after discarding all e where w(e) > mid):
        hi = mid        # connected: the answer is at most mid
    else:
        lo = mid + 1    # disconnected: the answer is larger than mid
return lo
The binary search takes O(log(max_e - min_e)) iterations (you can actually bring it down to O(log(edges)) by searching over the sorted edge weights), and discarding edges and determining connectivity can be done in O(edges + vertices), so the whole thing runs in O((edges + vertices) * log(edges)).
Warning: I have not tested this in code yet, so there may be bugs. But the idea should work.
How about the following algorithm?
First take a list of all edges (or of all distinct edge lengths) from the graph and sort them. That takes O(m log m) = O(m log n) time: m is at most n^2, so O(log m) = O(log n^2) = O(2 log n) = O(log n).
It is obvious that r should be equal to the weight of some edge. So you can do a binary search on the index of the edge in the sorted array.
For each index you try, you take the length of the corresponding edge as r, and check the graph for connectivity, only using the edges of length <= r, with BFS or DFS.
Each iteration of the binary search takes O(m), and you have to make O(log m) = O(log n) iterations.
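
Putting the two answers together, here is a compact Python sketch: a binary search over the sorted edge weights, with connectivity checked by a BFS that only follows edges of weight at most r. The adjacency-list format is my own choice.

from collections import deque

def smallest_connecting_r(n, edges):
    # edges: list of (u, v, w) with positive weights, vertices 0..n-1,
    # and the full graph is assumed connected.
    # O(log m) binary-search steps, each doing an O(m + n) BFS.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def connected(r):
        # BFS from vertex 0 using only edges of weight <= r.
        seen = [False] * n
        seen[0] = True
        queue = deque([0])
        count = 1
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if w <= r and not seen[v]:
                    seen[v] = True
                    count += 1
                    queue.append(v)
        return count == n

    weights = sorted(w for _, _, w in edges)
    lo, hi = 0, len(weights) - 1     # binary search on indices into the sorted weights
    while lo < hi:
        mid = (lo + hi) // 2
        if connected(weights[mid]):
            hi = mid                 # connected: try a smaller r
        else:
            lo = mid + 1             # disconnected: r must be larger
    return weights[lo]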

Find cycle of shortest length in a directed graph with positive weights

I was asked this question in an interview, but I couldn't come up with any decent solution. So, I told them the naive approach of finding all the cycles then picking the cycle with the least length.
I'm curious to know what is an efficient solution to this problem.
You can easily modify Floyd-Warshall algorithm. (If you're not familiar with graph theory at all, I suggest checking it out, e.g. getting a copy of Introduction to Algorithms).
Traditionally, you start with path[i][i] = 0 for each i. But you can instead start from path[i][i] = INFINITY. It won't affect the algorithm itself, as those zeroes weren't used in the computation anyway (since path[i][j] never changes for k == i or k == j).
In the end, path[i][i] is the length of the shortest cycle going through i. Consequently, you need to find min(path[i][i]) over all i. And if you want the cycle itself (not only its length), you can do it just as it's usually done with normal paths: by memorizing k during the execution of the algorithm.
In addition, you can also use Dijkstra's algorithm to find a shortest cycle going through any given node. If you run this modified Dijkstra for each node, you'll get the same result as with Floyd-Warshall. And since each Dijkstra is O(n^2), you'll get the same O(n^3) overall complexity.
The pseudo code is a simple modification of Dijkstra's algorithm.
for all u in V:
for all v in V:
path[u][v] = infinity
for all s in V:
path[s][s] = 0
H = makequeue (V) .. using pathvalues in path[s] array as keys
while H is not empty:
u = deletemin(H)
for all edges (u,v) in E:
if path[s][v] > path[s][u] + l(u, v) or path[s][s] == 0:
path[s][v] = path[s][u] + l(u,v)
decreaseKey(H, v)
lengthMinCycle = INT_MAX
for all v in V:
if path[v][v] < lengthMinCycle & path[v][v] != 0 :
lengthMinCycle = path[v][v]
if lengthMinCycle == INT_MAX:
print(“The graph is acyclic.”)
else:
print(“Length of minimum cycle is ”, lengthMinCycle)
Time Complexity: O(|V|^3)
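
An alternative, arguably less error-prone way to implement the Dijkstra variant is to run a plain Dijkstra from each vertex s and then close the cycle over the edges leading back into s. A rough Python sketch follows; the adjacency-dict format is my own choice, and with a binary heap the total cost is O(V * (V + E) * log V) rather than the O(|V|^3) quoted above for the array-based implementation.

import heapq

def shortest_cycle(graph):
    # graph: dict mapping every vertex to a list of (successor, weight) pairs,
    # with positive weights.  Returns the length of the shortest directed
    # cycle, or None if the graph is acyclic.
    best = None
    for s in graph:
        # Standard Dijkstra from s.
        dist = {s: 0}
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue
            for v, w in graph[u]:
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        # Close the cycle: shortest path s -> u plus an edge u -> s.
        for u, d in dist.items():
            for v, w in graph[u]:
                if v == s:
                    best = d + w if best is None else min(best, d + w)
    return best

g = {0: [(1, 2)], 1: [(2, 3)], 2: [(0, 1)], 3: []}
print(shortest_cycle(g))    # 6, the cycle 0 -> 1 -> 2 -> 0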
Perform a DFS.
During the DFS, keep track of the type of each edge.
The edge types are Tree Edge, Back Edge, Down Edge and Parent Edge.
Keep track of when you get a Back Edge, and use another counter to get the length.
See Algorithms in C++ Part 5 - Robert Sedgewick for more details.
What you will have to do is to assign another weight to each node which is always 1. Now run any shortest path algorithm from one node to the same node using these weights. But while considering the intermediate paths, you will have to ignore the paths whose actual weights are negative.
Below is a simple modification of the Floyd-Warshall algorithm.

V = 4
INF = 999999

def minimumCycleLength(graph):
    dist = [[0] * V for i in range(V)]
    for i in range(V):
        for j in range(V):
            dist[i][j] = graph[i][j]
    for k in range(V):
        for i in range(V):
            for j in range(V):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    # The shortest cycle through i is dist[i][i]; the diagonal starts at INF
    # (no self-loop) and only becomes finite if a cycle through i exists.
    length = INF
    for i in range(V):
        length = min(length, dist[i][i])
    return length

graph = [[INF, 1, 1, INF],
         [INF, INF, 1, INF],
         [1, INF, INF, 1],
         [INF, INF, INF, 1]]

length = minimumCycleLength(graph)
print(length)
