Number of paths between two nodes in a DAG - algorithm

I want to find the number of paths between two nodes in a DAG. O(V^2) and O(V+E) are both acceptable.
O(V+E) suggests that BFS or DFS is involved somehow, but I don't know how.
Can somebody help?

Do a topological sort of the DAG, then scan the vertices from the target backwards to the source. For each vertex v, keep a count of the number of paths from v to the target. When you get to the source, the value of that count is the answer. That is O(V+E).
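The backward scan can be sketched in Python as follows. This is only an illustration under assumptions that are not in the answer: the graph is a dict mapping each node to a list of its successors (every node appears as a key), and the names count_paths, graph, source and target are hypothetical. The topological order is obtained here with Kahn's algorithm, which is just one way to get it.

```python
from collections import deque

def count_paths(graph, source, target):
    """graph: dict node -> list of successors; every node is a key (assumed)."""
    # Kahn's algorithm: repeatedly remove nodes whose in-degree is 0.
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    queue = deque(v for v in graph if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    # Scan backwards: paths[v] = number of paths from v to target.
    paths = {v: 0 for v in graph}
    paths[target] = 1
    for v in reversed(order):
        if v != target:
            paths[v] = sum(paths[w] for w in graph[v])
    return paths[source]
```

Every edge is looked at a constant number of times, which is where the O(V+E) bound comes from.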

The number of distinct paths from node u to v is the sum, over every direct successor x of u, of the number of distinct paths from x to v.
Store the number of paths to the target node v for each node (initially 0), then work backwards from v (whose value is 1) along reversed edges, setting the value of each node to the sum of the values of its direct successors, until you reach u.
If you process the nodes in reverse topological order, all direct successors of a node are guaranteed to be computed before you visit that node.
Hope it helps.

This question has been asked elsewhere on SO, but nowhere has the simpler solution of using DFS + DP been mentioned; all solutions seem to use topological sorting. The simpler solution goes like this (paths from s to t):
Add a field to the vertex representation to hold an integer count. Initially, set vertex t’s count to 1 and other vertices’ count to 0. Start running DFS with s as the start vertex. When t is discovered, it should be immediately marked as finished (BLACK), without further processing starting from it. Subsequently, each time DFS finishes a vertex v, set v’s count to the sum of the counts of all vertices adjacent to v. When DFS finishes vertex s, stop and return the count computed for s. The time complexity of this solution is O(V+E).
Pseudo-code:
simple_path(s, t)
    if (s == t)
        return 1
    else if (path_count[s] != NULL)
        return path_count[s]
    else
        path_count[s] = 0
        for each node w ∈ adj[s]
            path_count[s] = path_count[s] + simple_path(w, t)
        end
        return path_count[s]
end
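For reference, a rough Python version of the same DFS + DP idea. It assumes the graph is a dict of successor lists and keeps the per-vertex counts in a dict instead of a field on the vertex; both choices, and the parameter names, are my own and not part of the answer.

```python
def simple_path(adj, s, t, path_count=None):
    """Number of paths from s to t in a DAG given as dict node -> successors."""
    if path_count is None:
        path_count = {}          # memo: node -> number of paths from it to t
    if s == t:
        return 1
    if s not in path_count:
        # Computed once per vertex, when the DFS finishes that vertex.
        path_count[s] = sum(simple_path(adj, w, t, path_count) for w in adj[s])
    return path_count[s]
```

Because each vertex is expanded at most once, the work is proportional to the vertices and edges reachable from s. For very deep graphs the recursion may hit Python's recursion limit.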

Related

How to count all reachable nodes in a directed graph?

Given a directed graph (which might contain cycles) in which each node has a value, how can we compute, for each node, the sum of the values of the nodes reachable from it? For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: DFS should be used. For each node, we recursively visit its successors and save the computed sum when we finish, so that we don't need to calculate it again later. A set has to be created for each node to avoid endless recursion caused by cycles. I think the time complexity to get the sums for all nodes is O(n + m), where n is the number of nodes and m is the number of edges.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there a better solution? Thanks.
This can be done by first finding the Strongly Connected Components (SCCs), which takes O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has a value which is the sum of the values of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U_1, U_2, ..., U_k | U_i is an SCC of the graph G}
E' = {(U_i,U_j) | i != j and there is a node u_i in U_i and a node u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of the question linked in the comments.
EDIT: the previous answer (struck out) was a mistake from this point on; editing with a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
DFS(v):
    if v.visited:
        return 0
    v.visited = true
    return v.value + sum([DFS(u) for u in v.children])
Clear the visited flags before each top-level call; the result includes the value of the start node itself, so subtract it if a node should not count towards its own sum.
This is O(V^2 + VE) in the worst case, but since G' can have far fewer nodes and edges than the original graph, V and E may now be significantly lower.
Some local optimizations can be made: for example, if a node has a single child, you can reuse the pre-calculated value of the child instead of running DFS on it again, since there is no risk of counting twice in this case.
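Putting the SCC step and the per-node DFS together, a sketch in Python could look like the following. Several things here are assumptions on my side: the input format (a dict of values plus a list of edges), the name reachable_sums, the use of Kosaraju's algorithm for the SCCs (any linear-time SCC algorithm would do), and the convention from the question that a node's own value is not part of its sum.

```python
from collections import defaultdict

def reachable_sums(values, edges):
    """values: dict node -> number; edges: list of directed (u, v) pairs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Kosaraju, pass 1: record nodes in order of DFS finishing time.
    order, seen = [], set()
    def visit(u):
        seen.add(u)
        for w in graph[u]:
            if w not in seen:
                visit(w)
        order.append(u)
    for u in values:
        if u not in seen:
            visit(u)

    # Pass 2: DFS on the reversed graph, latest finisher first, labels SCCs.
    comp = {}
    def assign(u, label):
        comp[u] = label
        for w in reverse[u]:
            if w not in comp:
                assign(w, label)
    for u in reversed(order):
        if u not in comp:
            assign(u, u)

    # Build G': one node per SCC, valued with the sum of its members.
    comp_value, comp_edges = defaultdict(int), defaultdict(set)
    for u in values:
        comp_value[comp[u]] += values[u]
    for u, v in edges:
        if comp[u] != comp[v]:
            comp_edges[comp[u]].add(comp[v])

    # One traversal of G' per SCC; the visited set prevents double counting.
    def total_from(c):
        visited, stack, total = {c}, [c], 0
        while stack:
            x = stack.pop()
            total += comp_value[x]
            for y in comp_edges[x]:
                if y not in visited:
                    visited.add(y)
                    stack.append(y)
        return total

    comp_total = {c: total_from(c) for c in comp_value}
    # Exclude the node's own value, following the question's example.
    return {u: comp_total[comp[u]] - values[u] for u in values}
```

The SCC part is linear; the final loop is the O(V^2 + VE) part, measured on G' rather than on the original graph.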
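A DP solution for this problem (on the DAG G') can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after a topological sort of the DAG). Note that the recurrence counts a node once for every path that reaches it, so it gives the exact sum only when each node of G' is reachable from i along at most one path.
Pseudo code:
Find SCCs
Build G'
Topologically sort G'
Find D[i] for each node in G'
Apply the value D[i] to every node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).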
You can use DFS or BFS to solve your problem.
Both have complexity O(V + E).
You don't have to compute the values for all nodes, and you don't need recursion.
Just do something like this.
A typical list-based traversal looks like this (taking vertices from the front of the list and adding to the end makes it a BFS):
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add a few lines:
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
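In Python the traversal above might look like this. The dict-based graph and values, and the name reachable_sum, are assumptions for the sake of the example. As in the pseudocode, the sum includes the value of the starting vertex, and one call handles a single starting vertex only.

```python
from collections import deque

def reachable_sum(graph, values, start):
    """Sum of values over start and every vertex reachable from it."""
    marked = {start}
    queue = deque([start])
    total = 0
    while queue:
        v = queue.popleft()          # take from the front of the list
        total += values[v]
        for w in graph[v]:
            if w not in marked:      # marking on insertion handles cycles
                marked.add(w)
                queue.append(w)      # add to the end of the list
    return total
```

Running it once per vertex gives all the sums, at O(V + E) per start vertex.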

Linear-time algorithm for number of distinct paths from each vertex in a directed acyclic graph

I am working on the following past paper question for an algorithms module:
Let G = (V, E) be a simple directed acyclic graph (DAG).
For a pair of vertices v, u in V, we say v is reachable from u if there is a (directed) path from u to v in G.
(We assume that every vertex is reachable from itself.)
For any vertex v in V, let R(v) be the reachability number of vertex v, which is the number of vertices u in V that are reachable from v.
Design an algorithm which, for a given DAG, G = (V, E), computes the values of R(v) for all vertices v in V.
Provide the analysis of your algorithm (i.e., correctness and running time analysis).
(Optimally, one should try to design an algorithm running in O(n + m) time.)
So far, I have the following thoughts:
The following algorithm for finding a topological sort of a DAG might be useful:
TopologicalSort(G)
    1. Run DFS on G and compute a DFS-numbering, N // A DFS-numbering is a numbering (starting from 1) of the vertices of G, representing the point at which the DFS-call on a given vertex v finishes.
    2. Let the topological sort be the function a(v) = n - N[v] + 1 // n is the number of nodes in G and N[v] is the DFS-number of v.
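As a concrete reading of TopologicalSort(G), here is a small Python sketch. The graph representation (dict of successor lists with every vertex as a key) and the variable names are assumptions; the formula a(v) = n - N[v] + 1 is taken from the description above.

```python
def topological_sort(graph):
    """Return a dict v -> a(v), the position of v in a topological order."""
    n = len(graph)
    number = {}        # N[v]: order in which the DFS call on v finishes (from 1)
    visited = set()

    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        number[v] = len(number) + 1   # v finishes after all its successors

    for v in graph:
        if v not in visited:
            dfs(v)
    # A vertex that finishes last comes first in the topological order.
    return {v: n - number[v] + 1 for v in graph}
```

For every edge (u, w) the result satisfies a(u) < a(w), because w always finishes before u does.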
My second thought is that dynamic programming might be a useful approach, too.
However, I am currently not sure how to combine these two ideas into a solution.
I would appreciate any hints!
EDIT: Unfortunately the approach below is not correct in general. It may count multiple times the nodes that can be reached via multiple paths.
The ideas below are valid if the DAG is a polytree, since this guarantees that there is at most one path between any two nodes.
You can use the following steps:
1. Find all nodes with 0 in-degree (i.e. no incoming edges). This can be done in O(n + m), e.g. by looping through all edges and marking those nodes that are the end of any edge. The nodes with 0 in-degree are those which have not been marked.
2. Start a DFS from each node with 0 in-degree. After the DFS call for a node ends, we want to have computed the reachability of that node. To achieve this, we add up the reachability of the successors of this node. Some of these values might have already been computed (if the successor was already visited by DFS), which is what makes this a dynamic programming solution.
The following pseudocode describes the DFS:
function DFS(node) {
    visited[node] = true;
    reachability[node] = 1;
    for each successor of node {
        if (!visited[successor]) {
            DFS(successor);
        }
        reachability[node] += reachability[successor];
    }
}
After calling this for all nodes with 0 in-degree, the reachability array will contain the reachability for all nodes in the graph.
The overall complexity is O(n + m).
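Since adding up the successors' counts overcounts nodes that can be reached along several paths, one exact alternative is to keep, for each node, the set of nodes it can reach and take the union over its successors. The sketch below does this with Python integers used as bitsets. This is a different technique from the one in the answer, and it is not O(n + m): each edge costs one OR over an n-bit mask, so the work is roughly proportional to n * m divided by the machine word size. All names are hypothetical.

```python
def reachability_numbers(graph):
    """graph: dict node -> list of successors (a DAG, every node is a key)."""
    index = {v: i for i, v in enumerate(graph)}   # bit position of each node
    reach = {}                                    # node -> bitmask of reachable nodes

    def dfs(v):
        mask = 1 << index[v]          # every node reaches itself
        for w in graph[v]:
            if w not in reach:
                dfs(w)
            mask |= reach[w]          # set union, so nothing is counted twice
        reach[v] = mask

    for v in graph:
        if v not in reach:
            dfs(v)
    return {v: bin(reach[v]).count("1") for v in graph}
```

On the diamond a->b, a->c, b->d, c->d this yields R(a) = 4, whereas summing the successors' counts would give 5.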
I'd suggest using a Breadth First Search approach.
For every node, add all the nodes it is connected to to the queue. In addition to that, maintain a separate array for calculating the reachability.
For example, if there is an edge A->B, then
1.) Mark A as traversed
2.) B is added to the queue
3.) arr[B]+=1
This way, we can get R(v) for all vertices in O(|V| + |E|) time through arr[].

Minimum Spanning tree different from another

Assume we are given
an undirected graph g where every node i, 1 <= i < n, is connected to all j, i < j <= n
and a source s.
We want to find the total costs (defined as the sum of all edges' weights) of the cheapest minimum spanning tree that differs from the minimum distance tree of s (i.e. from the MST obtained by running prim/dijkstra on s) by at least one edge.
What would be the best way to tackle this? Because currently, I can only think of some kind of fixed-point iteration:
1. run dijkstra on (g,s) to obtain reference graph r that we need to differ from
2. costs := sum(edge_weights_of(r))
3. change := 0
4. for each vertex u in r, run a bfs and note for each reached vertex v the longest edge on the path from u to v.
5. iterate through all edges e = (a,b) in g: and find e'=(a',b') that is NOT in r and minimizes newchange := weight(e') - weight(longest_edge(a',b'))
6. if(first_time_here OR newchange < 0) then change += newchange
7. if(newchange < 0) goto 4
8. result := costs + change
That seems to waste a lot of time... It relies on the fact that adding an edge to a spanning tree creates a cycle from which we can remove the longest edge.
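The cycle argument can be made concrete with a small sketch: for every edge that is not in the reference tree, look up the heaviest tree edge on the path between its endpoints and compute what the swap would cost. This covers only a single swap (roughly steps 4 and 5 of the list above), not the full iteration. The function name and the (u, v, weight) edge format are assumptions, and the code assumes at least one non-tree edge exists.

```python
from collections import defaultdict

def cheapest_tree_with_one_swap(tree_edges, all_edges):
    """Edges are (u, v, weight) triples of an undirected graph."""
    tree = defaultdict(list)
    for u, v, w in tree_edges:
        tree[u].append((v, w))
        tree[v].append((u, w))
    in_tree = {frozenset((u, v)) for u, v, _ in tree_edges}

    def heaviest_on_path(src, dst):
        # Walk the unique tree path from src to dst, tracking the largest weight.
        stack = [(src, None, float("-inf"))]
        while stack:
            node, parent, heaviest = stack.pop()
            if node == dst:
                return heaviest
            for nxt, w in tree[node]:
                if nxt != parent:
                    stack.append((nxt, node, max(heaviest, w)))
        raise ValueError("tree does not connect the endpoints")

    cost = sum(w for _, _, w in tree_edges)
    # Adding a non-tree edge closes a cycle; dropping the heaviest tree edge
    # on that cycle gives a spanning tree again.
    deltas = [w - heaviest_on_path(u, v)
              for u, v, w in all_edges if frozenset((u, v)) not in in_tree]
    return cost + min(deltas)
```

Each path query here is O(n), so the whole thing is O(n * m); precomputing the heaviest edge for all pairs, as in step 4, trades memory for speed.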
I also thought about using Kruskal to get an overall minimum spanning tree and only using the above algorithm to replace a single edge when the trees from both, prim and kruskal, happen to be the same, but that doesn't seem to work as the result would be highly dependent on the edges selected during a run of kruskal.
Any suggestions/hints?
You can do it using Prim's algorithm.
Prim's algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
    1. find the smallest edge connecting T to G-T
    2. add it to T
}
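For reference, the plain version of Prim's algorithm can be written in Python with a heap as the frontier. The adjacency format (dict of (weight, neighbour) lists) and the name prim_mst are assumptions, and the graph is assumed to be connected.

```python
import heapq

def prim_mst(graph, start):
    """graph: dict node -> list of (weight, neighbour); undirected, connected."""
    in_tree = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree_edges, total = [], 0
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)   # smallest edge leaving T
        if v in in_tree:
            continue                         # stale entry, both ends already in T
        in_tree.add(v)
        tree_edges.append((u, v, w))
        total += w
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return total, tree_edges
```

Ties between equally light edges are broken by the heap's tuple ordering, which is exactly the choice point the modified version below tries to exploit.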
Now let's modify it.
Suppose you already have one minimum spanning tree, say Tree(E,V).
Use this algorithm:
Prim's algorithm (Modified):
let T be a single vertex
let isOther = false
while (T has fewer than n vertices)
{
    1. find the smallest edge (say e) connecting T to G-T
    2. if more than one smallest edge is found {
           check which of them is in E(Tree)
           choose one that is not
           add it to T
           set isOther = true
       }
       else if exactly one edge is found {
           add it to T
           if E(Tree) doesn't contain this edge, set isOther = true
           else leave isOther unchanged
       }
}
If isOther = true, you have found another tree, T, that differs from Tree(E,V).
Otherwise the graph has a single minimum spanning tree.

Solving cycle in undirected graph in log space?

A slightly more theoretical question, but here it is nonetheless.
Setting
Let:
UCYCLE = { ⟨G⟩ : G is an undirected graph that contains a simple cycle }.
My Solution
We show UCYCLE is in L by constructing an algorithm M that decides UCYCLE using logarithmic space.
M = "On input ⟨G⟩ where G = (V,E):
1. For each v_i in V, for each v_j in Neighbor(v_i), store the current v_i and v_j.
2. Traverse the edge (v_i,v_j) and then follow all possible paths through G using DFS.
3. If we encounter v_k in Neighbor(v_i) \ {v_j} so that there is an edge (v_i,v_k) in E, then ACCEPT. Else REJECT."
First we claim M decides UCYCLE. If there exists a cycle in $G$, then it must start and end on some vertex $v_i$; step one of $M$ tries all such $v_i$'s and therefore must find the desired vertex. Next, suppose the cycle starts at $v_i$. Then there must exist a starting edge $(v_i,v_j)$ so that if we follow the cycle, we come back to $v_i$ through a different edge $(v_k,v_i)$, so we accept in step three. Since the graph is undirected, we can always come back to $v_i$ through $(v_i,v_j)$, but $M$ does not accept this case. By construction, neither does $M$ accept if we come upon some $v_k \in Neighbor(v_i) \setminus \{v_j\}$ but there is no edge from $v_k$ to $v_i$.
Now we show M runs in logarithmic space. If the vertices are labeled $1,\ldots,n$ where $|V| = n$, then it requires $\log(n)$ bits to specify each $v_i$. Next note that in $M$ we only need to keep track of the current $v_i$ and $v_j$, so $M$ uses $2\log(n) = O(\log n)$ space, which puts UCYCLE in L.
My Problem
My problem is how do you perform DFS on the graph in $log(n)$ space. For example, in the worst case where each vertex has degree $n$, you'd have to keep a counter of which vertex you took on a particular path, which would require $n log(n)$ space.
The state you maintain as you search is four vertices: (v_i, v_j, prev, current).
The next state is: (v_i, v_j, current, v) where v is the next neighbour of current after prev (wrapping back to the first if prev is the numerically last neighbour of current).
You stop when current is v_i again, and you have found a cycle if you arrived there from a neighbour other than v_j.
In pseudo-code, something like this:
for v_i in vertices
    for v_j in neighbours(v_i)
        current, prev = v_j, v_i
        while current != v_i
            idx = neighbours(current).index(prev)
            idx = (idx + 1) % len(neighbours(current))
            current, prev = neighbours(current)[idx], current
        if prev != v_j
            return FOUND_A_CYCLE
return NO_CYCLES_EXIST
Intuitively, this says: for each point in a maze, and for each corridor from that point, follow the left-hand wall; if you come back to the start point through a corridor other than the original one, you've found a cycle.
While it's easy to see that this algorithm uses O(log n) space, some proof is necessary to show that this algorithm terminates.
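Here is a small Python sketch of the wall-following walk, written so that it stops once the walk is back at v_i and then checks which neighbour it arrived from. The adjacency format (dict of sorted neighbour lists of a simple graph) and the name has_cycle are assumptions. Python lists are of course not a log-space model; the point is only that the search state is the four vertex names v_i, v_j, prev and current.

```python
def has_cycle(adj):
    """adj: dict vertex -> sorted list of neighbours (simple undirected graph)."""
    for v_i in adj:
        for v_j in adj[v_i]:
            prev, current = v_i, v_j
            # Keep turning to "the next neighbour after the one we came from".
            while current != v_i:
                nbrs = adj[current]
                idx = (nbrs.index(prev) + 1) % len(nbrs)
                prev, current = current, nbrs[idx]
            # In a forest the walk can only come back through v_j itself.
            if prev != v_j:
                return True
    return False
```

The walk is a permutation on directed edges (each directed edge has exactly one successor and one predecessor), so it must eventually come back to the edge it started on, and hence to v_i; that is the termination argument in a nutshell.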

Count paths with Topological Sort

I have a DAG and I need to count all the paths from any node to any other node. I've researched a little bit and found that it can be done with some kind of topological order, but so far the solutions I've seen are incomplete or wrong.
So what is the correct way to do it?
Thanks.
As this is a DAG you can topologically sort the nodes in O(V+E) time. Let's assume the source vertex is S, with d[S] = 1. Then, starting from S, process the nodes in topological order. When we're processing node U and there is an edge U->V, V has of course not been processed yet (why? because V comes after U in the topological order). So you can reach V from S via node U in d[U] ways, where d[U] is the number of paths from S to U.
So the number of paths from S to any node V is d[V] = d[x1]+d[x2]+d[x3]+ . . . +d[xy], where the edges into V are x1->V, x2->V, . . . xy->V.
This algorithm takes O(V+E) to topologically sort the graph and then at most O(V*E) to calculate the number of paths. You can reduce the path-counting step to O(V+E) by using an adjacency list instead of an adjacency matrix, which is the most efficient solution so far.
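A possible Python rendering of this forward DP, assuming an adjacency-list graph given as a dict of successor lists. The function name paths_from and the use of Kahn's algorithm for the ordering are my own choices for the illustration.

```python
from collections import deque

def paths_from(graph, source):
    """Return dict d with d[v] = number of paths from source to v."""
    # Topological order via Kahn's algorithm.
    indegree = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(v for v in graph if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    d = {v: 0 for v in graph}
    d[source] = 1
    # When u is processed, d[u] is final, so it can be pushed along each edge u->v.
    for u in order:
        for v in graph[u]:
            d[v] += d[u]
    return d
```

Nodes that come before the source in the order, or that the source cannot reach, simply keep the value 0.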
You can use recursion to count all of the paths in a tree/DAG. Here is the pseudocode:
function numPaths(node1, node2):
    // base case, one path from node to itself
    if (node1 == node2): return 1
    totalPaths = 0
    for edge in node1.edges:
        nextNode = edge.destinationNode
        totalPaths += numPaths(nextNode, node2)
    return totalPaths
Edit:
A good dynamic programming approach to this problem, in the spirit of the Floyd-Warshall algorithm, is the following.
Assume G(V,E)
Let d[i][j] = the number of all the paths from i to j
Then d[i][j] = sigma d[next][j] for all (i,next) in E
Does it seem too slow? Okay, just memoise it (some people call this dynamic programming). Like this:
memset(d, -1, sizeof(d)); // set all elements of array d to -1 at the very beginning
int saya(int i, int j)
{
    if (d[i][j] != -1) return d[i][j]; // d[i][j] has already been calculated
    if (i == j) return d[i][j] = 1;    // trivial case
    d[i][j] = 0;
    for e in i.edges
        d[i][j] += saya(e.next, j);
    return d[i][j];
}
Now saya(i,j) will return the number of all the paths from i to j.
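The same memoised recurrence is short in Python if functools.lru_cache plays the role of the d table. The wrapper make_counter and the dict-of-successors input are assumptions for this sketch.

```python
from functools import lru_cache

def make_counter(edges):
    """edges: dict node -> list of successors of a DAG. Returns saya(i, j)."""
    @lru_cache(maxsize=None)          # plays the role of the d[i][j] table
    def saya(i, j):
        if i == j:
            return 1                  # trivial case: the empty path
        return sum(saya(nxt, j) for nxt in edges[i])
    return saya
```

Usage would be something like saya = make_counter(graph) followed by saya(i, j) for any pair; each (i, j) pair is computed at most once.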

Resources