Linear-time algorithm for number of distinct paths from each vertex in a directed acyclic graph - algorithm

I am working on the following past paper question for an algorithms module:
Let G = (V, E) be a simple directed acyclic graph (DAG).
For a pair of vertices v, u in V, we say v is reachable from u if there is a (directed) path from u to v in G.
(We assume that every vertex is reachable from itself.)
For any vertex v in V, let R(v) be the reachability number of vertex v, which is the number of vertices u in V that are reachable from v.
Design an algorithm which, for a given DAG, G = (V, E), computes the values of R(v) for all vertices v in V.
Provide the analysis of your algorithm (i.e., correctness and running time
analysis).
(Optimally, one should try to design an algorithm running in
O(n + m) time.)
So far, I have the following thoughts:
The following algorithm for finding a topological sort of a DAG might be useful:
TopologicalSort(G)
1. Run DFS on G and compute a DFS-numbering, N // A DFS-numbering is a numbering (starting from 1) of the vertices of G, representing the point at which the DFS-call on a given vertex v finishes.
2. Let the topological sort be the function a(v) = n - N[v] + 1 // n is the number of nodes in G and N[v] is the DFS-number of v.
My second thought is that dynamic programming might be a useful approach, too.
However, I am currently not sure how to combine these two ideas into a solution.
I would appreciate any hints!

EDIT: Unfortunately the approach below is not correct in general. It may count multiple times the nodes that can be reached via multiple paths.
The ideas below are valid if the DAG is a polytree, since this guarantees that there is at most one path between any two nodes.
You can use the following steps:
find all nodes with 0 in-degree (i.e. no incoming edges).
This can be done in O(n + m), e.g. by looping through all edges
and marking those nodes that are the end of any edge. The nodes with 0
in-degree are those which have not been marked.
Start a DFS from each node with 0 in-degree.
After the DFS call for a node ends, we want to have computed for that
node the information of its reachability.
In order to achieve this, we need to add the reachability of the
successors of this node. Some of these values might have already been
computed (if the successor was already visited by DFS), therefore this
is a dynamic programming solution.
The following pseudocode describes the DFS code:
function DFS(node) {
    visited[node] = true;
    reachability[node] = 1;
    for each successor of node {
        if (!visited[successor]) {
            DFS(successor);
        }
        reachability[node] += reachability[successor];
    }
}
After calling this for all nodes with 0 in-degree, the reachability
array will contain the reachability for all nodes in the graph.
The overall complexity is O(n + m).
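Since summing the successors' values double-counts vertices on a general DAG, here is a minimal sketch of a variant that stays correct by propagating sets of reachable vertices instead of counts. It is not linear-time (roughly O(n·m / w) with w-bit words), and as far as I know there is no simple O(n + m) method for arbitrary DAGs. The function name `reachability_counts`, the 0..n-1 vertex numbering, and the use of Python integers as bitsets are my own assumptions for illustration.

```python
from collections import deque

def reachability_counts(n, edges):
    """Return [R(0), ..., R(n-1)] for a DAG on vertices 0..n-1.

    Every vertex counts as reachable from itself. Reachable sets are
    stored as Python ints used as bitsets (an illustrative choice).
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Topological order by repeatedly removing zero in-degree vertices.
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Process sinks first: a vertex reaches itself plus everything
    # its successors reach. The union removes duplicates.
    reach = [0] * n
    for u in reversed(order):
        mask = 1 << u
        for v in adj[u]:
            mask |= reach[v]
        reach[u] = mask
    return [bin(mask).count("1") for mask in reach]
```

On the diamond 0->1, 0->2, 1->3, 2->3, the summing DFS would report 5 for vertex 0, while this sketch reports 4.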

I'd suggest using a Breadth First Search approach.
For every node, add all the nodes it is connected to to the queue. In addition to that, maintain a separate array for calculating the reachability.
For example, if there is an edge A->B, then
1.) Mark A as traversed
2.) B is added to the queue
3.) arr[B]+=1
This way, we can get R(v) for all vertices in O(|V| + |E|) time through arr[].

Related

Depth first search (DFS) vs breadth first search (BFS) pseudocode and complexity

I have to develop pseudocode for an algorithm that computes the number of connected
components in a graph G = (V, E) given vertices V and edges E.
I know that I can use either depth-first search or breadth-first search to calculate the number of connected components.
However, I want to use the most efficient algorithm to solve this problem, but I am unsure of the complexity of each algorithm.
Below is an attempt at writing DFS in pseudocode form.
function DFS((V,E))
    mark each node in V with 0
    count ← 0
    for each vertex in V do
        if vertex is marked with 0 then
            DFSExplore(vertex)
function DFSExplore(vertex)
    count ← count + 1
    mark vertex with count
    for each edge (vertex, neighbour) do
        if neighbour is marked with 0 then
            DFSExplore(neighbour)
Below is an attempt at writing BFS in pseudocode form.
function BFS((V, E))
    mark each node in V with 0
    count ← 0, init(queue) #create empty queue
    for each vertex in V do
        if vertex is marked 0 then
            count ← count + 1
            mark vertex with count
            inject(queue, vertex) #queue containing just vertex
            while queue is non-empty do
                u ← eject(queue) #dequeues u
                for each edge (u, w) adjacent to u do
                    if w is marked with 0 then
                        count ← count + 1
                        mark w with count
                        inject(queue, w) #enqueues w
My lecturer said that BFS has the same complexity as DFS.
However, when I searched up the complexity of depth-first search it was O(V^2), while the complexity of breadth-first search is O(V + E) when adjacency list is used and O(V^2) when adjacency matrix is used.
I want to know how to calculate the complexity of DFS / BFS and I want to know how I can adapt the pseudocode to solve the problem.
The time complexity of both DFS and BFS is the same, i.e. O(V + E), if you are using an adjacency list. So which one is better depends entirely on the type of problem you are solving. If the goal is near the starting vertex, BFS would be a better choice. If memory is a concern, DFS is a better option because there is no need to store child pointers.
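To make the O(V + E) bound concrete, here is a minimal sketch of counting connected components with BFS on an adjacency list. The counter is incremented once per BFS start rather than once per visited vertex, which is the adaptation the question asks about. The function name `count_components` and the 0..n-1 vertex numbering are assumptions for illustration.

```python
from collections import deque

def count_components(n, edges):
    """Count connected components of an undirected graph on 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        # An unseen vertex means a component we have not explored yet.
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    return components
```

Each vertex is enqueued at most once and each adjacency list is scanned once, which is where the O(V + E) total comes from.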

Algorithm to Compute square of a directed graph(represented in form of an adjacency list)

I am working on constructing an algorithm to compute G^2 of a directed graph given in the form of an adjacency list, where G^2 = (V, E') and (u,v) ∈ E' if there is a path of length 2 between u and v in G. I understand the question well and have found an algorithm that I assume is correct; however, its runtime is O(VE^2), where V is the number of vertices and E is the number of edges of the graph. How could I do this in O(VE) time to make it more efficient?
Here is the algorithm, I came up with:
for vertex in Vertices
    for neighbor in Neighbors
        for n in Neighbors
            if(n!=neighbor)
                then-> if(n.value==neighbor)
                    add this to a new adjacency list
                    break; // this means we have found a path of size 2 between vertex and neighbor
            continue otherwise
The problem can be solved in O(VE) time using BFS (breadth-first search). BFS traverses the graph level by level: first it visits all the vertices at distance 1 from the source vertex, then all the vertices at distance 2, and so on. We can take advantage of this and terminate the BFS once we have reached the vertices at distance 2.
Following is the pseudocode:
For each vertex v in V
{
    Do a BFS with v as source vertex
    {
        For all vertices u at distance of 2 from v
            add u to adjacency list of v
        and terminate BFS
    }
}
Since BFS takes O(V + E) time and we invoke it for every vertex, the total time is O(V(V + E)) = O(V^2 + VE), which is O(VE) whenever E is at least on the order of V. Just remember to start with fresh data structures for every BFS traversal.
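As an alternative sketch, the two-step neighbours can also be enumerated directly instead of running a BFS. This also picks up vertices that have both a direct edge and a length-2 path from the source, which a BFS that only looks at shortest distance exactly 2 would skip. The function name `square_graph` and the dict-of-lists representation are assumptions for illustration; the `sorted` call is only there to give a stable, readable output.

```python
def square_graph(adj):
    """Return the adjacency lists of G^2.

    adj maps each vertex to the list of its out-neighbours.
    (u, v) is an edge of G^2 if some w has edges u->w and w->v in G.
    """
    squared = {}
    for u in adj:
        two_step = set()
        for w in adj[u]:
            # Everything one more step away from a neighbour of u.
            two_step.update(adj.get(w, ()))
        squared[u] = sorted(two_step)
    return squared
```

For a fixed u, the inner loops scan each adjacency list at most once, so the work per vertex is O(E) and the total is O(VE), ignoring the sorting.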

Minimum Spanning tree different from another

Assume we are given
an undirected graph g where every node i, 1 <= i < n, is connected to all j, i < j <= n
and a source s.
We want to find the total costs (defined as the sum of all edges' weights) of the cheapest minimum spanning tree that differs from the minimum distance tree of s (i.e. from the MST obtained by running prim/dijkstra on s) by at least one edge.
What would be the best way to tackle this? Because currently, I can only think of some kind of fixed-point iteration
1. run dijkstra on (g,s) to obtain reference graph r that we need to differ from
2. costs := sum(edge_weights_of(r))
3. change := 0
4. for each vertex u in r, run a bfs and note for each reached vertex v the longest edge on the path from u to v.
5. iterate through all edges e = (a,b) in g and find e' = (a',b') that is NOT in r and minimizes newchange := weight(e') - weight(longest_edge(a',b'))
6. if (first_time_here OR newchange < 0) then change += newchange
7. if (newchange < 0) goto 4
8. result := costs + change
That seems to waste a lot of time... It relies on the fact that adding an edge to a spanning tree creates a cycle from which we can remove the longest edge.
I also thought about using Kruskal to get an overall minimum spanning tree and only using the above algorithm to replace a single edge when the trees from both, prim and kruskal, happen to be the same, but that doesn't seem to work as the result would be highly dependent on the edges selected during a run of kruskal.
Any suggestions/hints?
You can do it using Prim's algorithm.
Prim's algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
    1. find the smallest edge connecting T to G-T
    2. add it to T
}
Now let's modify it.
Suppose you already have one minimum spanning tree, say Tree(E,V).
Use this algorithm:
Prim's algorithm (Modified):
let T be a single vertex
let isOther = false
while (T has fewer than n vertices)
{
    1. find the smallest edge (say e) connecting T to G-T
    2. if more than one such edge is found {
           check which one you have in E(Tree)
           choose one different from this
           add it to T
           set isOther = true
       }
       else if only one edge is found {
           add it to T
           if E(Tree) doesn't contain this edge, set isOther = true
           else don't touch isOther (keep its value)
       }
}
If isOther = true, you have found another tree, T, that is different from Tree(E,V).
Otherwise the graph has a single minimum spanning tree.
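For comparison, here is a rough sketch of the single-exchange idea from the question (add one edge that is not in the reference tree, remove the longest edge on the cycle it closes), done in one pass rather than iterated. It assumes the reference tree r is the Prim MST grown from s, that the graph is simple and connected, and it relies on the standard exchange argument that the cheapest spanning tree different from an MST differs from it by one such swap. The function name and the (u, v, w) edge format are hypothetical.

```python
import heapq

def cheapest_other_spanning_tree(n, edges, s=0):
    """Cost of the cheapest spanning tree differing from the Prim MST.

    edges: list of (u, v, w) on vertices 0..n-1 (simple, connected).
    Returns None if the graph has no edge outside the reference tree.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    # Prim from s builds the reference tree r.
    in_tree = [False] * n
    tree_adj = [[] for _ in range(n)]
    tree_edges = set()
    cost = 0
    heap = [(0, s, s)]
    while heap:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if u != v:
            cost += w
            tree_adj[u].append((w, v))
            tree_adj[v].append((w, u))
            tree_edges.add((min(u, v), max(u, v)))
        for wx, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (wx, v, x))

    # longest[a][b]: heaviest edge on the tree path from a to b.
    longest = [[0] * n for _ in range(n)]
    for a in range(n):
        stack = [(a, a)]  # (vertex, parent)
        while stack:
            u, parent = stack.pop()
            for w, v in tree_adj[u]:
                if v != parent:
                    longest[a][v] = w if u == a else max(longest[a][u], w)
                    stack.append((v, u))

    # Best swap: add a non-tree edge, drop the heaviest edge on its cycle.
    best = None
    for u, v, w in edges:
        if (min(u, v), max(u, v)) in tree_edges:
            continue
        change = w - longest[u][v]
        if best is None or change < best:
            best = change
    return None if best is None else cost + best
```

The path maxima take O(V^2) time in total and the final scan takes O(E), so no fixed-point iteration is needed under these assumptions.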

Graph algorithm to calculate node degree

I'm trying to implement the topological-sort algorithm for a DAG. (http://en.wikipedia.org/wiki/Topological_sorting)
First step of this simple algorithm is finding nodes with zero degree, and I cannot find any way to do this without a quadratic algorithm.
My graph implementation is a simple adjacency list and the basic process is to loop through every node and for every node go through each adjacency list so the complexity will be O(|V| * |V|).
The complexity of topological-sort is O(|V| + |E|), so I think there must be a way to calculate the degree for all nodes in linear time.
You can maintain the indegree of all vertices while removing nodes from the graph and maintain a linked list of zero indegree nodes:
indeg[x] = indegree of node x (compute this by going through the adjacency lists)
zero = [ x in nodes | indeg[x] = 0 ]
result = []
while zero != []:
    x = zero.pop()
    result.push(x)
    for y in adj(x):
        indeg[y]--
        if indeg[y] = 0:
            zero.push(y)
That said, topological sort using DFS is conceptually much simpler, IMHO:
result = []
visited = {}
dfs(x):
    if x in visited: return
    visited.insert(x)
    for y in adj(x):
        dfs(y)
    result.push(x)
for x in V: dfs(x)
reverse(result)
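A runnable sketch of the first (in-degree based) variant, including the single linear pass that computes the in-degrees. The function name `topological_order`, the dict-of-lists input, and the cycle check at the end are my additions for illustration.

```python
def topological_order(adj):
    """Return a topological order of a DAG given as {node: [successors]}."""
    # One pass over all adjacency lists: O(|V| + |E|).
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] = indeg.get(v, 0) + 1

    zero = [u for u in indeg if indeg[u] == 0]
    result = []
    while zero:
        x = zero.pop()
        result.append(x)
        for y in adj.get(x, ()):
            # "Removing" x lowers the in-degree of each successor.
            indeg[y] -= 1
            if indeg[y] == 0:
                zero.append(y)

    if len(result) != len(indeg):
        raise ValueError("graph has a cycle, no topological order exists")
    return result
```

No nested loop over all pairs of nodes is needed: every edge is looked at once when counting in-degrees and once more when its source is removed.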
You can achieve it in O(|V| + |E|). Follow the steps below:
Create two lists, inDegree and outDegree, which maintain the counts of incoming and outgoing edges for each node, and initialize them to 0.
Now traverse the given adjacency list: for each edge (u,v) in graph g, increment the outdegree count for u and the indegree count for v.
You can traverse the adjacency list in O(|V| + |E|), and will then have the indegree and outdegree of each node in O(|V| + |E|).
The complexity that you mentioned for visiting adjacent nodes, O(n^2), is not quite correct: if you think carefully, you will notice that this is more like a BFS. You visit each node and each edge only once, so the complexity is O(m + n), where n is the number of nodes and m is the number of edges.
You can also use DFS for topological sorting. You won't need additional pass to calculate in-degree after processing each node.
http://www.geeksforgeeks.org/topological-sorting/

Count paths with Topological Sort

I have a DAG and I need to count all the paths from any node to another node. I've researched a little and found that it can be done with a topological order, but the solutions I have found so far are incomplete or wrong.
So what is the correct way to do it?
Thanks.
As this is a DAG, you can topologically sort the nodes in O(V+E) time. Let's assume the source vertex is S. Then, starting from S, process the nodes in that topological order. When we're processing node U and there is an edge U->V, V has of course not yet been processed (why? because it's a directed acyclic graph). So you can reach V from S via node U in d[U] ways, where d[U] is the number of paths from S to U.
So the number of paths from S to any node V is d[V] = d[x1] + d[x2] + d[x3] + ... + d[xy], where the edges into V are x1->V, x2->V, ..., xy->V.
This algorithm takes O(V+E) to topologically sort the graph and then at most O(V*E) to calculate the number of paths. You can reduce the time for calculating the number of paths to O(V+E) by using an adjacency list instead of an adjacency matrix, and this is the most efficient solution so far.
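A minimal sketch of this recurrence, assuming the graph is a dict of successor lists. The function name `count_paths_from` is hypothetical, and the topological order is obtained from a reversed DFS post-order restricted to the vertices reachable from the source.

```python
def count_paths_from(adj, source):
    """Return {v: number of paths from source to v} for a DAG.

    adj maps each vertex to the list of its successors.
    Only vertices reachable from source appear in the result.
    """
    order = []
    visited = set()

    def dfs(u):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                dfs(v)
        order.append(u)  # post-order: u is appended after its successors

    dfs(source)
    order.reverse()  # reversed post-order is a topological order

    paths = {u: 0 for u in order}
    paths[source] = 1
    for u in order:
        # paths[u] is final here, so push it along every edge u -> v.
        for v in adj.get(u, ()):
            paths[v] += paths[u]
    return paths
```

The recursive DFS is fine for small graphs; for very deep graphs an explicit stack would avoid Python's recursion limit.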
You can use recursion to count all of the paths in a tree/DAG. Here is the pseudocode:
function numPaths(node1, node2):
    // base case, one path from node to itself
    if (node1 == node2): return 1
    totalPaths = 0
    for edge in node1.edges:
        nextNode = edge.destinationNode
        totalPaths += numPaths(nextNode, node2)
    return totalPaths
Edit:
A good dynamic approach to this problem is the Floyd-Warshall algorithm.
Assume G(V,E)
Let d[i][j] = the number of all the paths from i to j
Then d[i][j]= sigma d[next][j] for all (i,next) in E
It seems too slow? Okay. Just memoize it (some call this dynamic programming). Like this:
memset(d,-1,sizeof(d)) // set all elements of array d to -1 at the very beginning
saya(int i,int j)
{
    if (d[i][j]!=-1) return d[i][j]; // d[i][j] has been calculated
    if (i==j) return d[i][j]=1; // trivial case
    d[i][j]=0;
    for e in i.edges
        d[i][j]+=saya(e.next,j);
    return d[i][j];
}
Now saya(i,j) will return the number of all the paths from i to j.
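The same memoized recurrence can be sketched in Python with `functools.lru_cache` taking the place of the d array. The wrapper name `make_path_counter` and the dict-of-lists graph are assumptions for illustration.

```python
from functools import lru_cache

def make_path_counter(adj):
    """Return a function num_paths(i, j) counting paths i -> j in a DAG.

    adj maps each vertex to the list of its successors.
    """
    @lru_cache(maxsize=None)
    def num_paths(i, j):
        if i == j:
            return 1  # the empty path from a vertex to itself
        # Sum over the first edge i -> k of every path from i to j.
        return sum(num_paths(k, j) for k in adj.get(i, ()))

    return num_paths
```

Each (i, j) pair is computed once, so answering all pairs costs O(V * (V + E)) in total.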
