How to count all reachable nodes in a directed graph? - algorithm

There is a directed graph (which might contain cycles), and each node has a value on it, how could we get the sum of reachable value for each node. For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: To get the sum for all nodes, I think the time complexity is O(n + m), the n is the number of nodes, and m stands for the number of edges. DFS should be used,for each node we should use a method recursively to find its sub node, and save the sum of sub node when finishing the calculation for it, so that in the future we don't need to calculate it again. A set is needed to be created for each node to avoid endless calculation caused by loop.
Does it work? I don't think it is elegant enough, especially many sets have to be created. Is there any better solution? Thanks.

This can be done by first finding Strongly Connected Components (SCC), which can be done in O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has value which is the sum of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT previous answer (striked out) is a mistake from this point, editing with a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
DFS(v):
if v.visited:
return 0
if v is leaf:
return v.value
v.visited = true
return sum([DFS(u) for u in v.children])
This is O(V^2 + VE) worst vase, but since the graph has less nodes, V
and E are now significantly lower.
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after topological sort of the DAG).
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
apply value for all node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).

You can use DFS or BFS algorithms for solving Your problem.
Both have complexity O(V + E)
You dont have to count all values for all nodes. And you dont need recursion.
Just make something like this.
Typically DFS looks like this.
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
choose some vertex v from front of list
visit v
for each unmarked neighbor w
mark w
add it to end of list
In Your case You have to add some lines
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
choose some vertex v from front of list
visit v
sum += v->value
for each unmarked neighbor w
mark w
add it to end of list

Related

Find the N highest cost vertices that has a path to S, where S is a vertex in an undirected Graph G

I would like to know, what would be the most efficient way (w.r.t., Space and Time) to solve the following problem:
Given an undirected Graph G = (V, E), a positive number N and a vertex S in V. Assume that every vertex in V has a cost value. Find the N highest cost vertices that is connected to S.
For example:
G = (V, E)
V = {v1, v2, v3, v4},
E = {(v1, v2),
(v1, v3),
(v2, v4),
(v3, v4)}
v1 cost = 1
v2 cost = 2
v3 cost = 3
v4 cost = 4
N = 2, S = v1
result: {v3, v4}
This problem can be solved easily by the graph traversal algorithm (e.g., BFS or DFS). To find the vertices connected to S, we can run either BFS or DFS starting from S. As the space and time complexity of BFS and DFS is same (i.e., time complexity: O(V+E), space complexity: O(E)), here I am going to show the pseudocode using DFS:
Parameter Definition:
* G -> Graph
* S -> Starting node
* N -> Number of connected (highest cost) vertices to find
* Cost -> Array of size V, contains the vertex cost value
procedure DFS-traversal(G,S,N,Cost):
let St be a stack
let Q be a min-priority-queue contains <cost, vertex-id>
let discovered is an array (of size V) to mark already visited vertices
St.push(S)
// Comment: if you do not want to consider the case "S is connected to S"
// then, you can consider commenting the following line
Q.push(make-pair(S, Cost[S]))
label S as discovered
while St is not empty
v = St.pop()
for all edges from v to w in G.adjacentEdges(v) do
if w is not labeled as discovered:
label w as discovered
St.push(w)
Q.push(make-pair(w, Cost[w]))
if Q.size() == N + 1:
Q.pop()
let ret is a N sized array
while Q is not empty:
ret.append(Q.top().second)
Q.pop()
Let's first describe the process first. Here, I run the iterative version of DFS to traverse the graph starting from S. During the traversal, I use a priority-queue to keep the N highest cost vertices that is reachable from S. Instead of the priority-queue, we can use a simple array (or even we can reuse the discovered array) to keep the record of the reachable vertices with cost.
Analysis of space-complexity:
To store the graph: O(E)
Priority-queue: O(N)
Stack: O(V)
For labeling discovered: O(V)
So, as O(E) is the dominating term here, we can consider O(E) as the overall space complexity.
Analysis of time-complexity:
DFS-traversal: O(V+E)
To track N highest cost vertices:
By maintaining priority-queue: O(V*logN)
Or alternatively using array: O(V*logV)
The overall time-complexity would be: O(V*logN + E) or O(V*logV + E)

Pairs of vertices in graph having specific distance

Given a tree with N vertices and a positive number K. Find the number of distinct pairs of the vertices which have a distance of exactly K between them. Note that pairs (v, u) and (u, v) are considered to be the same pair (1 ≤ N ≤ 50000, 1 ≤ K ≤ 500).
I am not able to find an optimum solution for this. I can do BFS from each vertex and count the no of vertices that is reachable from that and having distance less than or equal to K. But then in worst case the complexity will be order of 2. Is there any faster way around??
You can achieve that in more simple way.
Run DFS on tree and for each vertex calculate the distance from the root - save those in array (access by o(1)).
For each pair of vertex in your graph:
Find their LCA (Lowest common ancestor there are algorithm to do that in 0(1)).
Assume u and v are 2 arbitrary vertices and w is their LCA -> subtract the distance from the w to the root from u to the root - now you have the distance between u and w. Do the same for v -> with o(1) you got the distance for (v,w) and (u,w) -> sum them together and you get the (v,u) distance - now all you have to do is compare to K.
Final complexity is o(n^2)
Improving upon the other answer's approach, we can make some key observations.
To calculate distances between two nodes you need their LCA(Lowest Common Ancestor) and depths as the other answer is trying to do. The formula used here is:
Dist(u, v) = depth[u] + depth[v] - 2 * depth[ lca(u, v) ]
Depth[x] denotes distance of x from root, precomputed using DFS once starting from root node.
Now here comes the key observation, you already have distance value K, assume that dist(u, v) = K using this assumption calculate(predict?) depth of LCA. By putting K in above formula we get:
depth[ lca(u, v) ] = (depth[u] + depth[v] - K) / 2
Now that you have depth of LCA you know that distance between u and lca is depth[u] - depth[ lca(u, v) ] and between v and lca is depth[v] - depth[ lca(u, v) ], let this be X and Y respectively.
Now we know that LCA is the lowest common ancestor thus, the Xth parent of u and Yth parent of v should be the LCA, so now if Xth parent of u and Yth parent of v is indeed the same node then we can say that our pre-assumption about distances between the nodes was true and the distance between the two nodes is K.
You can caluculate the Xth and Yth ancestor of the nodes in O(logN) complexity using Binary Lifting Algorithm with a preprocessing of O(NLogN) time, this preprocessing can be included directly in your DFS when a node is visited for the first time.
Corner Cases:
Calcuated depth of LCA should not be a fraction or negative.
If depth of any node u or v matches the calculated depth of the node then that node is the ancestor of the other node.
Consider this tree:
Assuming K = 4, we get that depth[lca] = 1 using the formula above, and if we get the Xth and Yth ancestor of u and v we will get the same node 1, which should validate our assumption but this is not true since the distance between u and v is actually 2 as visible in the picture above. This is because LCA in this case is actually 2, to handle this case calcuate X-1th and Y-1th ancestor of u and v too, respectively and check if they are different.
Final Complexity: O(NlogN)

A function definition for directed acyclic graphs

Let's consider the following problem: For a directed acyclic graph G = (V,E) we define the function "levels" for each vertex u, as l(u) such that:
1. l(u)>=0 for every u
2. If there is a path from u to v (u -> v) then l(u)>l(v)
3. For each vertex u, l(u) is the minimum integer that satisfies both conditions 1 and 2.
The problem says:
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
Here is a possible algorithm based on topological sort:
First we find the transpose of G which is G^T, defined as G^T = (V,E^T), where E^T={(u,v): (v,u) is in E} which takes O(|V|+|E|) in total if based on adjacency list implementation:
(O(|V|) for allocation and sum for all v in V of |Adj[v]| = O(|E|)). Topological sort takes Theta(|V|+|E|) since it includes a BFS and |V| insertions in list each of which take O(1).
TRANSPOSE(G){
Allocate |V| list pointers for G^T i.e. (Adj'[])
for(i = 1, i <= |V|, i++){
for every vertex v in Adj[i]{
add vertex i to Adj'[v]
}
}
}
L = TopSort(G)
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
Maybe I am missing something, but this seems really obvious to me: if you define it as the minimum that satisfies those conditions, how can there be more than one?
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
I think your topological sort idea is correct (note that a topological sort is a BFS), but it should be performed on the transposed graph (reverse the direction of every edge). Then the first values in the topological sort get 0, the next get 1 etc. For example, for the transposed graph:
1 2 3
*-->*-->*
^
*-------|
1
I have numbered the nodes with their positions in the topological sort. You number the nodes by implementing the topological sort using a BFS. When you extract a node from your FIFO queue, you subtract 1 from the indegree of all of its reachable nodes. When that indegree becomes 0 you insert the node it became 0 for in the queue and you number it as exracted_node + 1. In my example, the nodes numbered 1 start with indegree 0. Then, the bottom-most 1 subtract one from the indegree of the node labeled 3, but that indegree will be 1, not zero, so we don't insert it in the queue. We insert 2 however because its indegree will become 0.
Pseudocode:
G = G^t
Q = a FIFO queue
push all nodes with indegree 0 in Q
set l(v) = 0 for all nodes with indegree 0
indegree(v) = how many edges are going into node v
while not Q.Empty():
x = Q.Pop()
for all nodes v reachable from x:
if indegree[v] > 0:
indegree[v] = indegree[v] - 1
if indegree[v] == 0:
Q.Push(v)
l[v] = l[x] + 1
You can also do it with a DFS that computes the value of each node once the recursion returns, as:
value(v) = 1 + max{value(c), c a child of v}
Note that the DFS is not dont on the transposed graph, because we'll let the recursion handle the traversal in topological sort order.
Let's say you have a topological sort of G. Then you can consider vertices in reversed order: if you have a u -> v edge, then v comes before u in ordering.
If you loop on the nodes with this order, then let l(u) = 0 if there is no outgoing edges and l(u) = 1 + max(l(v), for each v such that there is an edge (u, v)). This is optimal and give you an O(|V| + |E|) algorithm to solve this problem.
Proof is left as an exercise. :D

Maximum weighted path between two vertices in a directed acyclic Graph

Love some guidance on this problem:
G is a directed acyclic graph. You want to move from vertex c to vertex z. Some edges reduce your profit and some increase your profit. How do you get from c to z while maximizing your profit. What is the time complexity?
Thanks!
The problem has an optimal substructure. To find the longest path from vertex c to vertex z, we first need to find the longest path from c to all the predecessors of z. Each problem of these is another smaller subproblem (longest path from c to a specific predecessor).
Lets denote the predecessors of z as u1,u2,...,uk and dist[z] to be the longest path from c to z then dist[z]=max(dist[ui]+w(ui,z))..
Here is an illustration with 3 predecessors omitting the edge set weights:
So to find the longest path to z we first need to find the longest path to its predecessors and take the maximum over (their values plus their edges weights to z).
This requires whenever we visit a vertex u, all of u's predecessors must have been analyzed and computed.
So the question is: for any vertex u, how to make sure that once we set dist[u], dist[u] will never be changed later on? Put it in another way: how to make sure that we have considered all paths from c to u before considering any edge originating at u?
Since the graph is acyclic, we can guarantee this condition by finding a topological sort over the graph. topological sort is like a chain of vertices where all edges point left to right. So if we are at vertex vi then we have considered all paths leading to vi and have the final value of dist[vi].
The time complexity: topological sort takes O(V+E). In the worst case where z is a leaf and all other vertices point to it, we will visit all the graph edges which gives O(V+E).
Let f(u) be the maximum profit you can get going from c to u in your DAG. Then you want to compute f(z). This can be easily computed in linear time using dynamic programming/topological sorting.
Initialize f(u) = -infinity for every u other than c, and f(c) = 0. Then, proceed computing the values of f in some topological order of your DAG. Thus, as the order is topological, for every incoming edge of the node being computed, the other endpoints are calculated, so just pick the maximum possible value for this node, i.e. f(u) = max(f(v) + cost(v, u)) for each incoming edge (v, u).
Its better to use Topological Sorting instead of Bellman Ford since its DAG.
Source:- http://www.utdallas.edu/~sizheng/CS4349.d/l-notes.d/L17.pdf
EDIT:-
G is a DAG with negative edges.
Some edges reduce your profit and some increase your profit
Edges - increase profit - positive value
Edges - decrease profit -
negative value
After TS, for each vertex U in TS order - relax each outgoing edge.
dist[] = {-INF, -INF, ….}
dist[c] = 0 // source
for every vertex u in topological order
if (u == z) break; // dest vertex
for every adjacent vertex v of u
if (dist[v] < (dist[u] + weight(u, v))) // < for longest path = max profit
dist[v] = dist[u] + weight(u, v)
ans = dist[z];

Number of paths between two nodes in a DAG

I want to find number of paths between two nodes in a DAG. O(V^2) and O(V+E) are acceptable.
O(V+E) reminds me to somehow use BFS or DFS but I don't know how.
Can somebody help?
Do a topological sort of the DAG, then scan the vertices from the target backwards to the source. For each vertex v, keep a count of the number of paths from v to the target. When you get to the source, the value of that count is the answer. That is O(V+E).
The number of distinct paths from node u to v is the sum of distinct paths from nodes x to v, where x is a direct descendant of u.
Store the number of paths to target node v for each node (temporary set to 0), go from v (here the value is 1) using opposite orientation and recompute this value for each node (sum the value of all descendants) until you reach u.
If you process the nodes in topological order (again opposite orientation) you are guaranteed that all direct descendants are already computed when you visit given node.
Hope it helps.
This question has been asked elsewhere on SO, but nowhere has the simpler solution of using DFS + DP been mentioned; all solutions seem to use topological sorting. The simpler solution goes like this (paths from s to t):
Add a field to the vertex representation to hold an integer count. Initially, set vertex t’s count to 1 and other vertices’ count to 0. Start running DFS with s as the start vertex. When t is discovered, it should be immediately marked as finished (BLACK), without further processing starting from it. Subsequently, each time DFS finishes a vertex v, set v’s count to the sum of the counts of all vertices adjacent to v. When DFS finishes vertex s, stop and return the count computed for s. The time complexity of this solution is O(V+E).
Pseudo-code:
simple_path (s, t)
if (s == t)
return 1
else if (path_count != NULL)
return path_count
else
path_count = 0
for each node w ϵ adj[s]
do path_count = path_count + simple_path(w, t)
end
return path_count
end

Resources