return source nodes of a directed graph - algorithm

A source in a directed graph is a node that has no edges going into it. Give a linear-time algorithm
that takes as input a directed graph in adjacency list format, and outputs all of its sources.
solution:
Finding the sources of a directed graph.
We will keep an array in[u] which holds the indegree (number of incoming edges) of each node. For a
source, this value is zero.
function sources(G)
Input: Directed graph G = (V,E)
Output: A list of G's source nodes
    for all u ∈ V: in[u] = 0
    for all u ∈ V:
        for all edges (u,w) ∈ E:
            in[w] = in[w] + 1
    L = empty linked list
    for all u ∈ V:
        if in[u] is 0: add u to L
    return L
The thing I particularly do not understand about the code above is the innermost for loop in the first code block: what exactly does in[w] = in[w] + 1 mean? I think it is counting the indegree of each node, but I cannot picture how exactly it does that. Can someone please help me visualize this?

in[w] = in[w] + 1 adds one to the count of edges going into w: each edge (u,w) found in u's adjacency list is one edge entering w.
Maybe an example will help:
Consider a simple graph:
a ---> b
The adjacency list representation is:
a: {b}
b: {}
Now the algorithm will loop through all vertices.
For a, it will loop over the edge (a,b) and increase b's count.
For b, there are no edges.
After the loops, a's count is still zero, so a is a source vertex; b's count is 1, so b is not.
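The same counting can be sketched in Python. This is a minimal sketch, assuming the graph is given as a dict that maps every node to the list of nodes its outgoing edges point to; the function name and that representation are choices made for this example, not part of the original exercise.

```python
def sources(graph):
    """Return the nodes of `graph` that have no incoming edges.

    `graph` is an adjacency list: a dict mapping each node to a list of
    its out-neighbours. Every node is assumed to appear as a key.
    """
    # Start every indegree at zero.
    indegree = {u: 0 for u in graph}
    # Each entry w in u's list is one edge (u, w) entering w.
    for u in graph:
        for w in graph[u]:
            indegree[w] += 1
    # Sources are exactly the nodes nobody pointed at.
    return [u for u in graph if indegree[u] == 0]


example = {"a": ["b"], "b": []}
print(sources(example))  # ['a']
```

Each node and each adjacency-list entry is touched a constant number of times, which is where the linear O(|V| + |E|) bound comes from.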

Related

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it, how could we get the sum of reachable value for each node. For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: to get the sums for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m is the number of edges. DFS should be used: for each node we recursively visit its sub-nodes and save each sub-node's sum once it has been calculated, so that we don't need to calculate it again in the future. A set needs to be created for each node to avoid endless calculation caused by cycles.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding Strongly Connected Components (SCC), which can be done in O(|V|+|E|). Then, build a new graph, G', for the SCCs (each SCC is a node in the graph), where each node has value which is the sum of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT: the previous answer (struck out) was mistaken from this point on; it is replaced by a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
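DFS(v):
    if v.visited:
        return 0
    v.visited = true
    return v.value + sum([DFS(u) for u in v.children])
Reset the visited flags before starting from each new node; the result includes v's own value, so subtract it if only the other reachable nodes should be counted.
This is O(V^2 + VE) in the worst case, but since G' has at most as many nodes and edges as G, V and E can be significantly lower.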
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
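This can be calculated in linear time (after a topological sort of the DAG). Note that the recurrence counts a node once for every path that leads to it, so it gives the exact reachable sum only when no node of G' is reachable along two different paths.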
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
Assign D[U_i] to every node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).
You can use the DFS or BFS algorithm to solve your problem.
Both have complexity O(V + E).
You don't have to compute the values for all nodes, and you don't need recursion.
Just do something like this.
Typically BFS looks like this (the list is used as a queue, since vertices are taken from the front and added at the end):
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add some lines:
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
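The traversal above could be written in Python roughly as follows. This is only a sketch: the dict-based adjacency list, the `value` dict and the function name are assumptions made for the example. As in the pseudocode, the sum includes the starting vertex's own value.

```python
from collections import deque


def reachable_sum(graph, value, start):
    """Sum value[v] over every node v reachable from `start` (start included).

    `graph` maps each node to a list of its out-neighbours and `value`
    maps each node to a number; both names are illustrative.
    """
    marked = {start}          # nodes already placed in the queue
    queue = deque([start])
    total = 0
    while queue:
        v = queue.popleft()   # take from the front of the list
        total += value[v]
        for w in graph[v]:
            if w not in marked:
                marked.add(w)
                queue.append(w)  # add to the end of the list
    return total


# A small graph with a cycle 1 -> 2 -> 3 -> 1; node values equal node ids.
graph = {1: [2], 2: [3], 3: [1, 4], 4: []}
value = {n: n for n in graph}
print(reachable_sum(graph, value, 1))  # 10
```

Marking nodes when they are queued keeps cycles from causing repeated visits. One call costs O(V + E); calling it once per starting node costs O(V(V + E)) in total.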

How to update MST from the old MST if one edge is deleted

I am studying algorithms, and I have seen an exercise like this
I can solve this problem in exponential time, but I don't know how to prove that it can be done in linear time O(E+V).
I will appreciate any help.
Let G be the graph where the minimum spanning tree T is embedded; let A and B be the two trees remaining after (u,v) is removed from T.
Premise P: Select minimum weight edge (x,y) from G - (u,v) that reconnects A and B. Then T' = A + B + (x,y) is a MST of G - (u,v).
Proof of P: It's obvious that T' is a tree. Suppose it were not minimum. Then there would be a MST - call it M - of smaller weight. And either M contains (x,y), or it doesn't.
If M contains (x,y), then it must have the form A' + B' + (x,y) where A' and B' are minimum weight trees that span the same vertices as A and B. These can't have weight smaller than A and B, otherwise T would not have been an MST. So M is not smaller than T' after all, a contradiction; M can't exist.
If M does not contain (x,y), then there is some other path Q from x to y in M. At least one edge of Q passes from a vertex in A to a vertex in B; call such an edge c. Now, c has weight at least that of (x,y), else we would have picked it instead of (x,y) to form T'. Note that Q+(x,y) is a cycle, so M - c + (x,y) is also a spanning tree. If c had greater weight than (x,y), this new tree would have smaller weight than M, contradicting the assumption that M is a MST. If c had the same weight as (x,y), the new tree would be a MST containing (x,y), which the previous case rules out. Again M can't exist.
Since in either case, M can't exist, T' must be a MST. QED
Algorithm
Traverse A and color all its vertices Red. Similarly label B's vertices Blue. Now traverse the edge list of G - (u,v) to find a minimum weight edge connecting a Red vertex with a Blue. The new MST is this edge plus A and B.
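A rough Python sketch of this coloring approach is given below. The edge format (u, v, weight) as tuples, the function name, and the requirement that the removed edge appears as the identical tuple in both lists are assumptions made for this example.

```python
def repair_mst(tree_edges, graph_edges, removed):
    """Rebuild an MST after `removed` is deleted from both tree and graph.

    Edges are (u, v, weight) tuples; `removed` must appear in exactly
    this form in `tree_edges` (and in `graph_edges`, where it is skipped).
    Returns the new list of tree edges, or None if no edge reconnects
    the two parts.
    """
    remaining = [e for e in tree_edges if e != removed]

    # Adjacency list of the forest A + B.
    adj = {}
    for u, v, _ in remaining:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Color the part containing one endpoint of the removed edge "red";
    # every other vertex is implicitly "blue".
    red = {removed[0]}
    stack = [removed[0]]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in red:
                red.add(y)
                stack.append(y)

    # Scan the edge list once for the lightest red-blue edge.
    best = None
    for edge in graph_edges:
        if edge == removed:
            continue
        u, v, w = edge
        if (u in red) != (v in red) and (best is None or w < best[2]):
            best = edge
    return None if best is None else remaining + [best]
```

The traversal touches only the V - 2 remaining tree edges and the scan touches each graph edge once, so the total work is O(V + E).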
When you remove one of the edges, the MST breaks into two parts; let's call them a and b. Iterate over all vertices in part a and look at all their adjacent edges (other than the removed one). Every edge that links part a to part b is a candidate; the lightest candidate, together with the two parts, forms the new MST.
Pseudocode :
best = none
for(all vertices in part a){
    u = current vertex;
    for(all adjacent edges of u, except the removed edge){
        v = adjacent vertex of u for the current edge
        if(u and v belong to different parts of the MST and the edge is lighter than best) best = current edge;
    }
}
new MST = part a + part b + best
Complexity is O(V + E)
Note : You can keep a simple array to check if vertex is in part a of the MST or part b.
Also note that in order to get the O(V + E) complexity, you need to have an adjacency list representation of the graph.
Let G' be the old MST with the edge removed. G' consists of two connected components.
Let each node in the graph have a componentID. Set the componentID of every node according to the component it belongs to. This can be done with a simple BFS on G', for example. This is an O(V) operation, as G' has only V nodes and V-2 edges.
Once all the nodes have been flagged, iterate over all unused edges and find the one with the least weight that connects the two components (componentIDs of the two nodes will be different). This is an O(E) operation.
Thus the total runtime is O(V+E).

A function definition for directed acyclic graphs

Let's consider the following problem: For a directed acyclic graph G = (V,E) we define the function "levels" for each vertex u, as l(u) such that:
1. l(u)>=0 for every u
2. If there is a path from u to v (u -> v) then l(u)>l(v)
3. For each vertex u, l(u) is the minimum integer that satisfies both conditions 1 and 2.
The problem says:
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
Here is a possible algorithm based on topological sort:
First we find the transpose of G, which is G^T = (V,E^T), where E^T = {(u,v): (v,u) is in E}. With an adjacency list implementation this takes O(|V|+|E|) in total: O(|V|) for allocation, plus the sum over all v in V of |Adj[v]|, which is O(|E|). Topological sort takes Theta(|V|+|E|), since it includes a BFS and |V| list insertions, each of which takes O(1).
TRANSPOSE(G){
    Allocate |V| list pointers for G^T i.e. (Adj'[])
    for(i = 1; i <= |V|; i++){
        for every vertex v in Adj[i]{
            add vertex i to Adj'[v]
        }
    }
}
L = TopSort(G)
a. Prove that for every DAG the above function is uniquely defined, i.e. it's the only function that satisfies conditions 1,2 and 3.
Maybe I am missing something, but this seems really obvious to me: if you define it as the minimum that satisfies those conditions, how can there be more than one?
b. Find an O(|V| + |E|) algorithm that calculates this function for every vertex.
I think your topological sort idea is correct (note that a topological sort can be implemented as a BFS), but it should be performed on the transposed graph (reverse the direction of every edge). Then the first nodes in the topological sort get 0, the next get 1, etc. For example, for the transposed graph:
1 2 3
*-->*-->*
^
*-------|
1
I have numbered the nodes with their positions in the topological sort. You number the nodes by implementing the topological sort using a BFS. When you extract a node from your FIFO queue, you subtract 1 from the indegree of every node it has an edge to. When such an indegree becomes 0, you insert that node in the queue and number it as the extracted node's number + 1. In my example, the nodes numbered 1 start with indegree 0. Then the bottom-most 1 subtracts one from the indegree of the node labeled 3, but that indegree is then 1, not zero, so we don't insert it in the queue. We do insert 2, however, because its indegree becomes 0.
Pseudocode:
G = G^T
indegree[v] = how many edges are going into node v
Q = a FIFO queue
push all nodes with indegree 0 in Q
set l[v] = 0 for all nodes with indegree 0
while not Q.Empty():
    x = Q.Pop()
    for all edges (x, v):
        indegree[v] = indegree[v] - 1
        if indegree[v] == 0:
            Q.Push(v)
            l[v] = l[x] + 1
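For reference, the queue-based pseudocode might look like this in Python. It is a sketch under the assumption that the DAG is a dict mapping every node to the list of its successors; the function name `levels` is made up for the example.

```python
from collections import deque


def levels(graph):
    """Compute l(u) for every node of a DAG given as {node: [successors]}.

    Every node is assumed to appear as a key. The traversal runs on the
    transposed graph, starting from the nodes with no outgoing edges.
    """
    # Build the transposed adjacency list.
    transposed = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            transposed[v].append(u)

    # Indegree in the transposed graph equals outdegree in the original.
    indegree = {u: len(graph[u]) for u in graph}

    level = {u: 0 for u in graph if indegree[u] == 0}
    queue = deque(level)
    while queue:
        x = queue.popleft()
        for v in transposed[x]:
            indegree[v] -= 1
            if indegree[v] == 0:
                # x is the last processed neighbour of v in the transposed graph.
                level[v] = level[x] + 1
                queue.append(v)
    return level
```

For the DAG {"a": ["b", "c"], "b": ["c"], "c": []} this assigns c level 0, b level 1 and a level 2: a gets 2 rather than 1 because its level is set only when its last neighbour has been processed.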
You can also do it with a DFS that computes the value of each node once the recursion returns, as:
value(v) = 1 + max{value(c), c a child of v}, with value(v) = 0 if v has no children
Note that the DFS is not done on the transposed graph, because we let the recursion handle the traversal in topological sort order.
Let's say you have a topological sort of G. Then you can consider vertices in reversed order: if you have a u -> v edge, then v comes before u in ordering.
If you loop over the nodes in this order, let l(u) = 0 if u has no outgoing edges, and l(u) = 1 + max(l(v) for each v such that there is an edge (u, v)) otherwise. This is optimal and gives you an O(|V| + |E|) algorithm to solve this problem.
Proof is left as an exercise. :D
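One way to try the reversed-order idea is with Python's standard `graphlib` module (available from Python 3.9). This is a sketch; the function name and the dict-of-successors input are assumptions for the example. `TopologicalSorter` expects a mapping from each node to its predecessors, so passing a mapping to successors yields every node after all of its successors, which is the reversed order described above.

```python
from graphlib import TopologicalSorter


def levels_by_reverse_order(graph):
    """Compute l(u) for a DAG given as {node: [successors]}."""
    # Feeding successors where predecessors are expected produces an
    # order in which every successor v of u comes before u.
    order = TopologicalSorter(graph).static_order()
    level = {}
    for u in order:
        # A node with no outgoing edges gets 1 + (-1) = 0.
        level[u] = 1 + max((level[v] for v in graph.get(u, ())), default=-1)
    return level
```

Every node and edge is examined a constant number of times after the sort, so the level computation itself stays within O(|V| + |E|).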

counting total degrees of a graph

In a directed graph, the total degree of a node is the number of edges going into it plus the number of edges going out of it. Give a linear-time algorithm that takes as input a directed graph (in adjacency list format, as always), and computes the total degree of every node. The output of the algorithm should be an array total[.], with an entry for each node.
this is my pseudo-code for this problem:
procedure total degree(G)
Input: Directed graph G=(V,E)
Output: array total[.] with an entry for each node
    for all u in V: in[u]=0
    for all u in V:
        for all (u,v) in E:
            in[v]=in[v]+1
    for all u in V: out[u]=0
    for all u in V:
        for all (u,v) in E:
            out[u]=out[u]+1
    for all u in V: total[u]=0
    for all u in V:
        total[u]=in[v]+out[u]
    return total[u]
Can someone confirm that I did this right, or tell me what I need to fix if I made a mistake? What I'm really unsure about is whether I did the outdegrees (out[.]) right.
I used this code as a reference point to come up with my own:
function sources(G)
Input: Directed graph G = (V,E)
Output: A list of G's source nodes
    for all u in V: in[u] = 0
    for all u in V:
        for all edges (u,w) in E:
            in[w] = in[w] + 1
    L = empty linked list
    for all u in V:
        if in[u] is 0: add u to L
    return L
Your two counting blocks are not the problem: the first increments in[v] for the node each edge enters, and the second increments out[u] for the node each edge leaves, so both arrays are computed correctly.
The mistakes are in the last block: v is not defined there, so the sum must use in[u], and you should return the whole array rather than a single entry:
for all u in V:
    total[u]=in[u]+out[u]
return total
Alternatively, you could count them all in one go:
Assuming input G=(V,E) is a list of nodes (V) and a list of edges (E) represented by node pairs ((u, v)), and assuming duplicates should count, all you need to do is count the nodes (both out and in) in the edge list.
for all u in V
    total[u] = 0
for all (u, v) in E
    total[u] = total[u] + 1
    total[v] = total[v] + 1
return total
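The one-pass count can be sketched in Python for the adjacency list format the exercise asks for. The dict representation and the function name are assumptions for this example.

```python
def total_degree(graph):
    """Return {node: indegree + outdegree} for a directed graph.

    `graph` maps every node to the list of nodes its outgoing edges
    point to; duplicate entries count as separate edges.
    """
    total = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            total[u] += 1  # the edge (u, v) leaves u
            total[v] += 1  # the edge (u, v) enters v
    return total


print(total_degree({"a": ["b", "c"], "b": ["c"], "c": []}))
# {'a': 2, 'b': 2, 'c': 2}
```

Walking the adjacency lists visits each edge exactly once, so the running time is O(|V| + |E|).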

Finding a New Minimum Spanning Tree After a New Edge Was Added to The Graph

Let G = (V, E) be a weighted, connected and undirected graph and let T be a minimum spanning tree of G. Let e be any edge not in E, with weight W(e).
Prove or disprove:
T U {e} is an edge set that contains a minimum spanning tree of G' = (V, E U {e}).
Well, it sounds true to me, so I decided to prove it but I just get stuck every time...
For example, if e is the new edge with minimum weight, who can promise us that the edges in T weren't chosen in a bad way that would prevent us from obtaining a new minimum weight without the 'help' of other edges in E - T ?
I would appreciate any help,
Thanks in advance.
Let [a(1), a(2), ..., a(n-1)] be a sequence of edges selected from E to construct MST of G by Kruskal's algorithm (in the order they were selected - weight(a(i)) <= weight(a(i + 1))).
Let's now consider how Kruskal's Algorithm behaves being given as input E' = E U {e}.
Let i = min{i: weight(e) < weight(a(i))}. First, the algorithm chooses edges [a(1), ..., a(i - 1)] (e hasn't been processed yet, so it behaves the same). Then it needs to decide on e: if e is dropped, the solution for E' will be the same as for E. So let's suppose that the first i edges selected by the algorithm are [a(1), ..., a(i - 1), e]; I will call this new sequence a'. The algorithm continues, and as long as its following selections (for j > i) satisfy a'(j) = a(j - 1), we are fine. There are two scenarios that break this streak (let's say the streak breaks at index k + 1):
1) Algorithm selects some edge e' that is not in T, and weight(e') < weight(a(k+1)). By now a' sequence is:
[a(1), ..., a(i-1), e, a(i), a(i+1), ..., a(k-1), a(k), e']
But if it were possible to append e' to this list, it would also be possible to append it to [a(1), ..., a(k-1), a(k)]. Kruskal's algorithm didn't do that when looking for the MST of G, which leads to a contradiction.
2) Algorithm politely selected:
[a(1), ..., a(i-1), e, a(i), a(i+1), ..., a(k-1), a(k)]
but decided to drop edge a(k+1). If e were not present in the list, the algorithm would decide to append a(k+1). That means that in graph (V, {a(1), ..., a(k)}) edge a(k+1) would connect the same components as edge e. Consequently, after the algorithm has considered edge a(k+1), the division into connected components (determined by the set of selected edges) is the same for both G and G'. So after processing a(k+1) the algorithm proceeds in the same way in both cases.
Whenever an edge is added to a graph without adding a node, that edge creates a cycle with the minimum spanning tree of the graph; the cycle length may vary from 2 to n, where n is the number of nodes in the graph.
T = Minimum spanning tree of G
Now, to find the MST for (T + added edge), we just have to remove one edge from that cycle: remove the edge that has maximum weight.
So T' always comes from T U {e}.
And if you are thinking that this doesn't prove that the new MST will be an edge set of T U {e}, then analyse Kruskal's algorithm for the new graph: if e is of minimum weight it must be selected for the MST according to Kruskal's algorithm, and likewise here, if it is minimum it cannot be the edge removed from the cycle.
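The cycle argument can be sketched in Python as follows. It is an illustration only: the (u, v, weight) tuple format, the function name, and the assumptions that the tree's edge tuples are distinct and that the tree spans both endpoints of the new edge are all made up for this example.

```python
def mst_after_adding_edge(tree_edges, new_edge):
    """Return an MST of the graph after `new_edge` is added.

    `tree_edges` is an MST of the old graph as (u, v, weight) tuples;
    `new_edge` has the same form. Only edges of T plus the new edge
    are ever used.
    """
    # Adjacency list of the tree, remembering the edge behind each link.
    adj = {}
    for edge in tree_edges:
        u, v, _ = edge
        adj.setdefault(u, []).append((v, edge))
        adj.setdefault(v, []).append((u, edge))

    a, b, w = new_edge

    # Traverse the tree from a, recording how each node was reached.
    parent_edge = {a: None}
    stack = [a]
    while stack:
        x = stack.pop()
        for y, edge in adj.get(x, []):
            if y not in parent_edge:
                parent_edge[y] = (x, edge)
                stack.append(y)

    # Walk back from b to a: this tree path plus new_edge is the cycle.
    heaviest = None
    node = b
    while parent_edge[node] is not None:
        prev, edge = parent_edge[node]
        if heaviest is None or edge[2] > heaviest[2]:
            heaviest = edge
        node = prev

    # Drop the heaviest edge on the cycle; if that is the new edge, keep T.
    if heaviest is None or heaviest[2] <= w:
        return list(tree_edges)
    return [e for e in tree_edges if e != heaviest] + [new_edge]
```

Because only the tree is traversed, the update takes O(V) time and never looks at the edges in E - T.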
