Finding the Reachability Count for all vertices of a DAG - algorithm

I am trying to find a fast algorithm with modest space requirements to solve the following problem.
For each vertex of a DAG find the sum of its in-degree and out-degree in the DAG's transitive closure.
Given this DAG:
I expect the following result:
Vertex   Reachability Count   Reachable Vertices in Closure
7        5                    (11, 8, 2, 9, 10)
5        4                    (11, 2, 9, 10)
3        3                    (8, 9, 10)
11       5                    (7, 5, 2, 9, 10)
8        3                    (7, 3, 9)
2        3                    (7, 5, 11)
9        5                    (7, 5, 11, 8, 3)
10       4                    (7, 5, 11, 3)
It seems to me that this should be possible without actually constructing the transitive closure. I haven't been able to find anything on the net that exactly describes this problem. I've got some ideas about how to do this, but I wanted to see what the SO crowd could come up with.

For an exact answer, I think it's going to be hard to beat KennyTM's algorithm. If you're willing to settle for an approximation, then the tank counting method ( http://www.guardian.co.uk/world/2006/jul/20/secondworldwar.tvandradio ) may help.
Assign each vertex a random number in the range [0, 1). Use a linear-time dynamic program like polygenelubricants's to compute, for each vertex v, the minimum label minreach(v) among all vertices reachable from v (including v itself). Then estimate the number of vertices reachable from v as 1/minreach(v) - 1. For better accuracy, repeat several times and take a median of means at each vertex.
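A sketch of the estimator, assuming a Kahn-style topological order (function and variable names are mine; labels can be passed in to make the sweep deterministic for testing):

```python
import random

def min_label_estimates(n, edges, labels=None):
    """Estimate, for each vertex of a DAG on 0..n-1, the size of its
    reachable set (including itself): if m labels are i.i.d. uniform on
    [0, 1), their minimum is around 1/(m + 1), so m ~ 1/min - 1."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm for a topological order.
    order = [v for v in range(n) if indeg[v] == 0]
    for u in order:                       # list grows while we iterate
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    if labels is None:
        labels = [random.random() for _ in range(n)]
    # Reverse topological sweep: minreach[u] = min label over u and
    # everything reachable from u.
    minreach = list(labels)
    for u in reversed(order):
        for v in adj[u]:
            minreach[u] = min(minreach[u], minreach[v])
    return [1.0 / m - 1.0 for m in minreach], minreach
```

Repeating with fresh labels and taking a median of the per-vertex estimates, as suggested above, tightens the accuracy.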

For each node, use BFS or DFS to find the out-reachability.
Do it again for the reversed direction to find the in-reachability.
Time complexity: O(MN + N^2), space complexity: O(M + N).
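A minimal sketch of this per-vertex BFS (the question's figure is missing, so the edge list used below is an assumption chosen to be consistent with the closure table above: 7→11, 7→8, 5→11, 3→8, 3→10, 11→2, 11→9, 11→10, 8→9; the function names are mine):

```python
from collections import deque

def reach_count(adj, s):
    """Number of vertices reachable from s (excluding s itself), via BFS."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) - 1

def reachability_counts(vertices, edges):
    """One BFS per vertex on the forward edges (out-reachability) plus
    one per vertex on the reversed edges (in-reachability)."""
    adj, radj = {}, {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        radj.setdefault(v, []).append(u)
    return {v: reach_count(adj, v) + reach_count(radj, v) for v in vertices}
```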

I have constructed a viable solution to this question. I base my solution on a modification of the topological sorting algorithm. The algorithm below calculates only the in-degree in the transitive closure. The out-degree can be computed in the same fashion with edges reversed and the two counts for each vertex summed to determine the final "reachability count".
for each vertex V
    inCount[V] = inDegree(V)      // inDegree() is O(1)
    if inCount[V] == 0
        pending.addTail(V)

while pending not empty
    process(pending.removeHead())

function process(V)
    for each edge (V, V2)
        predecessors[V2].add(predecessors[V])  // probably O(|predecessors[V]|)
        predecessors[V2].add(V)
        inCount[V2] -= 1
        if inCount[V2] == 0
            pending.addTail(V2)
    count[V] = sizeof(predecessors[V])  // store final answer for V
    predecessors[V] = EMPTY             // save some memory
Assuming that the set operations are O(1), this algorithm runs in O(|V| + |E|). More likely, however, the set-union operation predecessors[V2].add(predecessors[V]) makes it somewhat worse. The additional steps required by the set unions depend on the shape of the DAG. I believe the worst case is O(|V|^2 + |E|). In my tests this algorithm has shown better performance than any other I have tried so far.
Furthermore, by disposing of predecessor sets for fully processed vertices, this algorithm will typically use less memory than most alternatives. It is true, however, that the worst case memory consumption of the above algorithm matches that of constructing the transitive closure, but that will not be true for most DAGs.
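A sketch of the algorithm in Python (names like closure_in_counts are mine; Python sets stand in for the predecessor sets):

```python
from collections import deque

def closure_in_counts(vertices, edges):
    """In-degree of every vertex in the transitive closure, computed by
    pushing predecessor sets along a topological (Kahn) order."""
    adj = {v: [] for v in vertices}
    in_count = {v: 0 for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        in_count[v] += 1
    preds = {v: set() for v in vertices}
    count = {}
    pending = deque(v for v in vertices if in_count[v] == 0)
    while pending:
        u = pending.popleft()           # all of u's predecessors are done,
        for v in adj[u]:                # so preds[u] is final here
            preds[v] |= preds[u]
            preds[v].add(u)
            in_count[v] -= 1
            if in_count[v] == 0:
                pending.append(v)
        count[u] = len(preds[u])
        preds[u] = set()                # free memory for processed vertices
    return count

def reachability_counts_topo(vertices, edges):
    # Out-reachability is in-reachability of the reversed DAG.
    out_c = closure_in_counts(vertices, [(v, u) for u, v in edges])
    in_c = closure_in_counts(vertices, edges)
    return {v: in_c[v] + out_c[v] for v in vertices}
```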

OMG IT'S WRONG! SORRY!
I'll leave this up until a good alternative is available. CW-ed so feel free to discuss and expand on this if possible.
Use dynamic programming.
for each vertex V
    count[V] = UNKNOWN

for each vertex V
    getCount(V)

function getCount(V)
    if count[V] == UNKNOWN
        count[V] = 0
        for each edge (V, V2)
            count[V] += getCount(V2) + 1
    return count[V]
This is O(|V|+|E|) with an adjacency list. It counts only the out-degree in the transitive closure. To count the in-degrees, call getCount with the edges reversed. To get the sum, add up the counts from both calls.
To see why this is O(|V|+|E|), consider this: each vertex V will be visited exactly 1 + in-degree(V) times: once directly, and once for every edge (*, V). On subsequent visits, getCount(V) simply returns the memoized count[V] in O(1).
Another way to look at it is to count how many times each edge will be followed along: exactly once.
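The over-count is easy to demonstrate on a diamond-shaped DAG, where the shared sink is reached through both branches and therefore counted twice (a small sketch, names mine):

```python
UNKNOWN = -1

def dp_count(adj, v, count):
    # The memoized DP from the answer above: over-counts shared descendants.
    if count[v] == UNKNOWN:
        count[v] = 0
        for w in adj.get(v, []):
            count[v] += dp_count(adj, w, count) + 1
    return count[v]

# Diamond: a -> b, a -> c, b -> d, c -> d
adj = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
count = {v: UNKNOWN for v in 'abcd'}
result = dp_count(adj, 'a', count)
# a truly reaches 3 vertices (b, c, d), but the DP reports 4,
# because d is counted once via b and once via c.
```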

I assume that you have a list of all vertices, and that each vertex has an id and a list of vertices you can directly reach from it.
You can then add another field (or however you represent that) that holds the vertices you can also indirectly reach. I would do this in a recursive depth-first search, memoizing the results in the field of the respective reached nodes. As a data structure for this, you would perhaps use some sort of tree which allows efficient removal of duplicates.
The in-reachability can be done separately by adding the inverse links, but it can also be done in the same pass as the out-reachability, by accumulating the currently out-reaching nodes and adding them to the corresponding fields of the reached nodes.


How can I know all possible ways that the edges are connected if I know the topological sort?
Here is the original problem:
Now little C has topologically sorted a simple (no multiple edges) directed acyclic graph, but accidentally lost the original. Besides the topological sequence, little C only remembers that the original graph had k edges, and that there is one vertex u in the graph that can reach all the other vertices. He wants to know how many simple directed acyclic graphs satisfy these requirements. Since the answer may be large, you only need to output the remainder of the answer modulo m.
I have just learned topological sort. I wonder how I can use it in reverse? I know the final topological order, e.g. (1 2 3 4), that there is one vertex that reaches all other vertices, and that there are 4 edges in all, but I need the number of all possible ways the edges could be linked.
I think this problem has something to do with counting permutations, and the specific vertex u has to be first in the topologically sorted list.
NOTICE: the maximum can be up to 200'000, so you definitely cannot brute-force this problem!
Let the topological order be u = 1, 2, …, n. Since 1 can reach all other
vertices, the topological order begins with 1. Each node v > 1, being
reachable from u, must have arcs from one or more nodes < v. These
choices are linked only by the constraint on the number of arcs.
We end up computing Count[v][m] (modulo whatever the modulus is) as
the number of reconstructions on 1, 2, …, v with exactly m arcs. The
answer is Count[n][k].
Count[1][m] = 1 if m == 0 else 0
for v > 1, Count[v][m] = sum for j = 1 to min(m, v-1) of (v-1 choose j) * Count[v-1][m-j]
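A sketch of this DP in Python (the function name and the use of math.comb are mine; for large inputs the binomials would be precomputed modulo the modulus):

```python
from math import comb

def count_reconstructions(n, k, mod):
    """Count[v][m] = number of reconstructions on vertices 1..v with
    exactly m arcs; each v > 1 picks a nonempty subset of earlier
    vertices as its in-neighbors. Answer: Count[n][k] mod `mod`."""
    count = [[0] * (k + 1) for _ in range(n + 1)]
    count[1][0] = 1
    for v in range(2, n + 1):
        for m in range(k + 1):
            total = 0
            for j in range(1, min(m, v - 1) + 1):
                total += comb(v - 1, j) * count[v - 1][m - j]
            count[v][m] = total % mod
    return count[n][k]
```

For example, with n = 3 and k = 2 the two valid graphs are {1→2, 1→3} and {1→2, 2→3}.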

What is the Time Complexity of finding all possible ways from one to another?

Here I use pseudocode to present my algorithm; it's a variation of DFS.
The coding style imitates Introduction to Algorithms. Every time we come across a vertex, its color is painted BLACK. Suppose the starting vertex is START and the target vertex is TARGET, and the graph is represented as G = (V, E). One more thing: assume that the graph is connected, and strongly connected if it's a directed graph.
FIND-ALL-PATH(G, START, TARGET)
    for each vertex u in G.V
        u.color = WHITE
    path = 0                          // store the result
    DFS-VISIT(G, START)

DFS-VISIT(G, u)
    if u == TARGET
        path = path + 1
        return
    u.color = BLACK
    for each v in G.Adj[u]
        if v.color == WHITE
            DFS-VISIT(G, v)
    u.color = WHITE                   // re-paint the vertex to find other possible ways
How do I analyze the time complexity of the algorithm above? If it were normal DFS, it would of course be O(N+E), because each vertex is visited only once and each edge is visited twice. But what about this version? It seems hard to pin down how many times each vertex or edge is visited.
To analyze the time complexity of FIND-ALL-PATH, let's first look at the time complexity of DFS-VISIT. I am assuming you are using an adjacency list to represent the graph.
In one call of DFS-VISIT, every vertex reachable from u (the vertex passed as the argument) is explored once (i.e., its color is changed to BLACK). Since the function is recursive, each recursion gets its own stack frame, and the set G.Adj[u] examined in each frame contains exactly the vertices adjacent to u. Therefore every entry of all the adjacency lists put together is examined exactly once, and each examination does a constant amount of work (an O(1) operation). There are E list entries overall for a directed graph and 2E for an undirected graph. So DFS-VISIT takes O(E) time, where E is the number of edges. Some books add an extra O(N), where N is the number of vertices, and state the overall time of DFS-VISIT as O(N+E); the extra O(N) accounts for the loop over all vertices, or for initialization. For a connected graph N is at most E + 1, so this term does not change the asymptotic time of DFS-VISIT.
The time complexity of FIND-ALL-PATH is then N times the complexity of DFS-VISIT, where N is the number of vertices in the graph. So the algorithm you wrote is not exactly the depth-first traversal algorithm, but it does the same kind of work; it takes longer because you call DFS-VISIT for each vertex of the graph. FIND-ALL-PATH could be optimized by checking, before calling DFS-VISIT, whether the vertex's color has already been changed to BLACK (which is what a standard depth-first traversal does).
i.e. you should have written the function like this:
FIND-ALL-PATH(G, START, END)
    for each vertex u in G.V
        u.color = WHITE
    path = 0                          // store the result
    for each vertex u in G.V
        if u.color is WHITE
            DFS-VISIT(G, u)
Now the function written above will have the same time complexity as DFS-VISIT.
Also note that initializing the color of all vertices to WHITE takes O(N) time.
So the overall time complexity of your FIND-ALL-PATH is O(N) + O(N*(N+E)), and you can ignore the first O(N) (it is dominated by the other term).
Thus, time complexity = O(N*(N+E)), or, if you count just O(E) for DFS-VISIT, O(N*E).
Let me know if you have doubts about any point mentioned above.
For a directed graph:
Suppose there are 7 vertices {0, 1, 2, 3, 4, 5, 6} and consider the worst case, where every vertex is connected to every other vertex. The number of edge traversals needed to explore all paths from x to 6 is:
(6->6) = 0
(5->6) = 1
(4->6) = (4->5->...->6) + (4->6) = (1 + 1) + (1 + 0) = 3
(3->6) = (3->4->...->6) + (3->5->...->6) + (3->6) = (1+3) + (1+1) + (1+0) = 7
(2->6) = 4 + 7 + 3 + 1 = 15
(1->6) = 5 + 15 + 7 + 3 + 1 = 31
(0->6) = 6 + 31 + 15 + 7 + 3 + 1 = 63
So the time to cover all paths from 0 to 6 = (1 + 3 + 7 + 15 + ... + T(n-1) + T(n)) + (total number of vertices - 1) = (2^(n+1) - 2 - n) + (V - 1), with n = V - 1.
So the final time complexity = O(2^V).
For an undirected graph:
Every edge will be traversed twice: 2 * ((2^(n+1) - 2 - n) + (V - 1)) = O(2^(V+1)) = O(2^V).

Finding two minimum spanning trees in graph such that their sum is minimal

I'm trying to solve a pretty complex graph problem: we are given an undirected graph with N (N <= 10) nodes and M (M <= 25) edges.
Say we have two sets of edges, A and B. The same edge cannot appear in both A and B, and some edges may be left unused by either set. Each edge has a value assigned to it. We want to minimize the total sum of the two trees.
Please note that in both sets A and B the edges should form a connected graph containing all N nodes.
Example
N = 2, M = 3
Edges: (1, 2, 10), (1, 2, 20), (2, 1, 30)
We want to return the result 30: set A takes the first edge and set B the second edge.
N = 5
M = 8
Edges: {
(1,2,10),
(1,3,10),
(1,4,10),
(1,4,20),
(1,5,20),
(2,3,20),
(3,4,20),
(4,5,30),
}
set A contains edges {(1,2,10), (1,3,10), (1,4,10), (1,5,20)}
while set B contains {(1,4,20), (2,3,20), (3,4,20), (4,5,30)}
What I tried
First I coded a greedy solution: generate a minimum spanning tree, then build the second tree from the remaining edges. But it fails on some test cases. So I started thinking about this solution:
We want to split the edges into two groups, and each group needs exactly N - 1 edges so that it forms a tree with no unwanted edges. In the worst case we will use (N-1) + (N-1) edges, that is, at most 18 edges. These are small numbers, so we can run a backtracking algorithm with some optimizations.
I still haven't coded the backtracking because I'm not sure it will work; please write what you think. Thanks in advance.
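For inputs this small, a brute force over pairs of edge-disjoint spanning trees is at least a correct baseline to test a backtracking solution against (a sketch, names mine; it enumerates all (N-1)-edge subsets, so it is only feasible for tiny M):

```python
from itertools import combinations

def is_spanning_tree(n, edge_subset):
    """Check that n-1 edges over vertices 1..n form a spanning tree,
    via union-find: n-1 edges with no cycle must connect everything."""
    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v, _ in edge_subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                     # cycle
        parent[ru] = rv
    return True

def min_two_disjoint_trees(n, edges):
    """Minimum total weight of two edge-disjoint spanning trees,
    by brute force over all pairs; parallel edges are distinguished
    by their index in `edges`."""
    idx = range(len(edges))
    trees = [s for s in combinations(idx, n - 1)
             if is_spanning_tree(n, [edges[i] for i in s])]
    best = None
    for a in trees:
        sa = set(a)
        for b in trees:
            if sa.isdisjoint(b):
                cost = sum(edges[i][2] for i in a + b)
                if best is None or cost < best:
                    best = cost
    return best
```

On the question's examples this returns 30 and 140 respectively (in the second example every edge must be used, so the answer is the sum of all edge values).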

Directed graph: checking for cycle in adjacency matrix

Is there an alternative to the DFS algorithm for checking whether there are cycles in a directed graph represented as an adjacency matrix?
I have found only piecemeal information on the properties of matrices.
Maybe I can multiply the matrix A by itself n times and check for a non-zero diagonal in each resulting matrix.
If this approach is right, how can I explicitly extract the list of vertices representing a cycle?
And what about the complexity of this hypothetical algorithm?
Thanks in advance for your help.
Thanks in advance for your help.
Let's say after n iterations, you have a matrix where the cell at row i and column j is M[n][i][j]
By definition M[n][i][j] = sum over k (M[n - 1][i][k] * A[k][j]). Let's say M[13][5][5] > 0, meaning it has a cycle of length 13 starting at 5 and ending at 5. To have M[13][5][5] > 0, there must be some k such that M[12][5][k] * A[k][5] > 0. Let's say k = 6, now you know one more node in the cycle (6). It also follows that M[12][5][6] > 0 and A[6][5] > 0
To have M[12][5][6] > 0, there must be some k such that M[11][5][k] * A[k][6] > 0. Let's say k = 9, now, you know one more node in the cycle (9). It also follows that M[11][5][9] > 0 and A[9][6] > 0
Then, you can do the same repetitively to find other nodes in the cycle.
Depth-first search can be modified to decide the existence of a cycle. The first time that the algorithm discovers a node which has previously been visited, the cycle can be extracted from the stack, as the previously found node must still be on the stack; it would make sense to use a user-defined stack instead of the call stack. The complexity would be O(|V|+|E|), as for unmodified depth-first search itself.
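A sketch of that modification (names mine): GRAY marks vertices on the explicit stack, and the cycle is the stack suffix starting at the revisited vertex.

```python
def find_cycle(adj):
    """Return one cycle as a list of vertices (first == last), or None.
    Colors: WHITE = unvisited, GRAY = on the current DFS stack,
    BLACK = fully explored."""
    color = {u: 'WHITE' for u in adj}
    stack = []

    def visit(u):
        color[u] = 'GRAY'
        stack.append(u)
        for v in adj[u]:
            if color[v] == 'GRAY':              # back edge: v is on the stack
                return stack[stack.index(v):] + [v]
            if color[v] == 'WHITE':
                cycle = visit(v)
                if cycle:
                    return cycle
        stack.pop()
        color[u] = 'BLACK'
        return None

    for u in adj:
        if color[u] == 'WHITE':
            cycle = visit(u)
            if cycle:
                return cycle
    return None
```

This runs in O(|V|+|E|), the same as unmodified depth-first search.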

Finding the number of paths of given length in a undirected unweighted graph

'Length' of a path is the number of edges in the path.
Given a source and a destination vertex, I want to find the number of paths from the source vertex to the destination vertex of a given length k.
We can visit each vertex as many times as we want, so if a path from a to b goes like this: a -> c -> b -> c -> b it is considered valid. This means there can be cycles and we can go through the destination more than once.
Two vertices can be connected by more than one edge. So if vertex a and vertex b are connected by two edges, then the paths a -> b via edge 1 and a -> b via edge 2 are considered different.
Number of vertices N is <= 70, and K, the length of the path, is <= 10^9.
As the answer can be very large, it is to be reported modulo some number.
Here is what I have thought so far:
We can use breadth-first search without marking any vertices as visited. At each iteration we keep track of the number of edges n_e used so far on that path, and the product p of the edge multiplicities along the path.
The search should terminate whenever n_e exceeds k; if we ever reach the destination with n_e equal to k, we terminate that branch and add p to our count of the number of paths.
I think we could use depth-first search instead of breadth-first search, as we do not need shortest paths, and the queue used in breadth-first search might grow too large.
The second algorithm I am thinking about is something similar to Floyd-Warshall's algorithm, using this approach. Only we don't need shortest paths, so I am not sure this is correct.
The problem with my first algorithm is that K can be up to 1000000000, which means the search would run until paths have 10^9 edges, with n_e incremented by just 1 at each level. That will be very slow, and I am not sure it will ever terminate for large inputs.
So I need a different approach to solve this problem; any help would be greatly appreciated.
So, here's a nifty graph theory trick that I remember for this one.
Make an adjacency matrix A, where A[i][j] is 1 if there is an edge between i and j, and 0 otherwise.
Then, the number of paths of length k between i and j is just the [i][j] entry of A^k.
So, to solve the problem, build A and construct A^k using matrix multiplication (the usual trick for doing exponentiation applies here). Then just look up the necessary entry.
EDIT: Well, you need to do the modular arithmetic inside the matrix multiplication to avoid overflow issues, but that's a much smaller detail.
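A sketch of the modular matrix exponentiation (plain lists, names mine; since the question allows parallel edges, A[i][j] can hold the edge multiplicity instead of just 0/1):

```python
def mat_mult(a, b, mod):
    """Multiply two n x n matrices, reducing entries modulo `mod`."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) % mod
             for j in range(n)] for i in range(n)]

def mat_pow(a, k, mod):
    """Binary exponentiation: O(n^3 log k) arithmetic operations."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while k:
        if k & 1:
            result = mat_mult(result, a, mod)
        a = mat_mult(a, a, mod)
        k >>= 1
    return result
```

For example, on a triangle (3-cycle) there are exactly 2 walks of length 2 and 2 walks of length 3 from a vertex back to itself.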
Actually, the [i][j] entry of A^k gives the total number of different "walks", not "paths", in a simple graph. We can easily prove this by mathematical induction.
However, the harder question is to find the total number of different "paths" in a given graph.
There are quite a few different algorithms for that, but an upper bound is:
(n-2)*(n-3)*...*(n-k), where k is the given path length.
Let me add some more content to the above answers (as this is the extended problem I faced). The extended problem is:
Find the number of paths of length k in a given undirected tree.
The solution is simple: for the given adjacency matrix A of the graph G, find A^(k-1) and A^k, and then count the number of 1s among the elements above the diagonal (or below).
Let me also add the Python code.
import numpy as np

def count_paths(v, n, a):
    # v: number of vertices, n: expected path length
    paths = 0
    b = np.array(a, copy=True)
    for i in range(n - 2):
        b = np.dot(b, a)      # b = a^(n-1)
    c = np.dot(b, a)          # c = a^n
    x = c - b
    for i in range(v):
        for j in range(i + 1, v):
            if x[i][j] == 1:
                paths = paths + 1
    return paths

print(count_paths(5, 2, np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
])))
