adjacency-list representation of a directed graph - algorithm

Given an adjacency-list representation of a directed graph, how long does it take
to compute the out-degree of every vertex? How long does it take to compute the
in-degrees?
Thanks

Adjacency-list representation of a directed graph:
Out-degree of each vertex
The out-degree of a vertex u is equal to the length of Adj[u].
The sum of the lengths of all the adjacency lists in Adj is |E|.
Thus the time to compute the out-degree of every vertex is Θ(V + E).
In-degree of each vertex
The in-degree of a vertex u is equal to the number of times it appears in all the lists in Adj.
If we search all the lists for each vertex, time to compute the in-degree of every vertex is Θ(VE)
Alternatively, we can allocate an array T of size |V| and initialize its entries to zero.
We only need to scan the lists in Adj once, incrementing T[u] when we see 'u' in the lists.
The values in T will be the in-degrees of every vertex.
This can be done in Θ(V + E) time with Θ(V) additional storage.
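The single-pass counting described above can be sketched in Python as follows. This is only an illustration under assumptions not in the original: vertices are numbered 0..n-1, `adj[u]` is a Python list of the heads of edges leaving u, and the function name `degrees` is made up for this example. Since `len` on a Python list is O(1), the out-degree part costs only Θ(V) here; with linked lists you would walk each list, which gives the Θ(V + E) bound stated above.

```python
def degrees(adj):
    """Return (out_degree, in_degree) for a directed graph.

    adj[u] lists every vertex v such that there is an edge u -> v.
    """
    n = len(adj)
    # Out-degree of u is simply the length of its adjacency list.
    out_degree = [len(neighbors) for neighbors in adj]
    # in_degree plays the role of the array T: one counter per vertex.
    in_degree = [0] * n
    for u in range(n):
        for v in adj[u]:
            in_degree[v] += 1  # edge u -> v contributes to v's in-degree
    return out_degree, in_degree


if __name__ == "__main__":
    example = [[1, 2], [2], [0], [2]]
    print(degrees(example))  # ([2, 1, 1, 1], [1, 1, 3, 0])
```

Every list is scanned exactly once, so the in-degree pass is Θ(V + E) with Θ(V) extra storage.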

Both are O(m + n) where m is the number of edges and n is the number of vertices.
Start a set of counters, two for each vertex: one for in-degree and one for out-degree.
Scan the edges. For the out vertex of each edge, add one to the out-degree counter for that vertex. For the in vertex of each edge, add one to the in-degree counter for that vertex. This is O(m) operation.
Output the out-degree and in-degree counters for each vertex, which is O(n).
That's how you get O(m + n).

out-degree for every vertex:theta(E)
in-degree for each vertex:O(E)
E is the number of edges of the graph

Since it is a directed graph and only the adjacency list is given, the time taken to count the out-degrees would be theta(M+N), where M is the number of vertices and N is the number of edges.
For the in-degrees, for any node you have to count the number of occurrences of that node in the adjacency lists of all the other vertices, so it would take theta(MN).
However, if you maintain an array of size M, you can count the in-degrees in theta(M+N) with additional storage of theta(M).

Computing both the in-degree and out-degree takes theta(m + n) for a graph with m vertices and n edges. It is theta(m + n) and not just O(m + n) because, whatever the graph may be, the algorithm has to go through every vertex (m of them) and every edge (n of them).

Given an adjacency-list representation Adj of a directed graph, the out-degree of a vertex u is equal to the length of Adj[u],
and the sum of the lengths of all the adjacency lists in Adj is |E|. Thus the time to compute the out-degree of every vertex is Θ(|V| + |E|).
The in-degree of a vertex u is equal to the number of times it appears in all the lists in Adj. If we search all the lists for each
vertex, the time to compute the in-degree of every vertex is Θ(|V| · |E|).
(Alternatively, we can allocate an array T of size |V| and initialize its entries to zero. Then we only need to scan the lists in
Adj once, incrementing T[u] when we see u in the lists. The values in T will be the in-degrees of every vertex. This can be
done in Θ(|V| + |E|) time with Θ(|V|) additional storage.)

Related

Algorithm to Compute square of a directed graph(represented in form of an adjacency list)

I am working on constructing an algorithm to compute G^2 of a directed graph that is a form of an adjacency list, where G^2 = (V,E'), where E' is defined as (u,v)∈E′ if there is a path of length 2 between u and v in G. I understand the question very well and have found an algorithm which I assume is correct, however the runtime of my algorithm is O(VE^2) where V is the number of vertices and E is the number of Edges of the graph. I was wondering how I could do this in O(VE) time in order to make it more efficient?
Here is the algorithm, I came up with:
for vertex in Vertices
    for neighbor in Neighbors
        for n in Neighbors
            if (n != neighbor)
                then -> if (n.value == neighbor)
                    add this to a new adjacency list
                    break; // this means we have found a path of size 2 between vertex and neighbor
                continue otherwise
The problem can be solved in time O(VE) using BFS (breadth-first search). BFS traverses the graph level by level: first it visits all the vertices at a distance of 1 from the source vertex, then all the vertices at a distance of 2, and so on. We can take advantage of this and terminate the BFS once we have reached the vertices at a distance of 2.
Following is the pseudocode:
For each vertex v in V
{
    Do a BFS with v as source vertex
    {
        For all vertices u at distance of 2 from v
            add u to adjacency list of v
        and terminate BFS
    }
}
Since BFS takes time O(V + E) and we invoke it for every vertex, the total time is O(V(V + E)) = O(V^2 + VE), which is O(VE) whenever E is at least on the order of V. Just remember to start with fresh data structures for every BFS traversal.
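For comparison, here is a minimal Python sketch of a different technique from the BFS above: a direct two-hop enumeration that follows every edge u -> v and then every edge v -> w. It assumes vertices are numbered 0..n-1 and `adj[u]` is a list of out-neighbors without duplicates; the name `square_graph` and the `last_seen` stamp array are made up for this example. One difference worth noting: this version records w whenever some path of length exactly 2 exists, even if w is also a direct neighbor of u, whereas a BFS cut off at distance 2 would report such a w at distance 1.

```python
def square_graph(adj):
    """Return adjacency lists of G^2: u -> w iff some path u -> v -> w exists in G."""
    n = len(adj)
    # last_seen[w] == u means w was already added to squared[u];
    # this avoids duplicates without re-clearing a set for every source.
    last_seen = [-1] * n
    squared = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:          # first hop
            for w in adj[v]:      # second hop
                if last_seen[w] != u:
                    last_seen[w] = u
                    squared[u].append(w)
    return squared


if __name__ == "__main__":
    graph = [[1, 2], [2], [3], []]
    print(square_graph(graph))  # [[2, 3], [3], [], []]
```

For a fixed u, the inner two loops touch each adjacency list at most once, so the work per source is O(V + E) and the total is O(V(V + E)), the same bound as the BFS approach.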

Maximum weighted path between two vertices in a directed acyclic Graph

Would love some guidance on this problem:
G is a directed acyclic graph. You want to move from vertex c to vertex z. Some edges reduce your profit and some increase your profit. How do you get from c to z while maximizing your profit? What is the time complexity?
Thanks!
The problem has an optimal substructure. To find the longest path from vertex c to vertex z, we first need to find the longest path from c to all the predecessors of z. Each problem of these is another smaller subproblem (longest path from c to a specific predecessor).
Let's denote the predecessors of z as u1, u2, ..., uk and let dist[z] be the length of the longest path from c to z. Then dist[z] = max over i of (dist[ui] + w(ui, z)).
Here is an illustration with 3 predecessors omitting the edge set weights:
So to find the longest path to z we first need to find the longest path to its predecessors and take the maximum over (their values plus their edges weights to z).
This requires whenever we visit a vertex u, all of u's predecessors must have been analyzed and computed.
So the question is: for any vertex u, how to make sure that once we set dist[u], dist[u] will never be changed later on? Put it in another way: how to make sure that we have considered all paths from c to u before considering any edge originating at u?
Since the graph is acyclic, we can guarantee this condition by finding a topological sort over the graph. A topological sort is like a chain of vertices where all edges point left to right. So if we are at vertex vi then we have considered all paths leading to vi and have the final value of dist[vi].
The time complexity: topological sort takes O(V+E). In the worst case where z is a leaf and all other vertices point to it, we will visit all the graph edges which gives O(V+E).
Let f(u) be the maximum profit you can get going from c to u in your DAG. Then you want to compute f(z). This can be easily computed in linear time using dynamic programming/topological sorting.
Initialize f(u) = -infinity for every u other than c, and f(c) = 0. Then, proceed computing the values of f in some topological order of your DAG. Thus, as the order is topological, for every incoming edge of the node being computed, the other endpoints are calculated, so just pick the maximum possible value for this node, i.e. f(u) = max(f(v) + cost(v, u)) for each incoming edge (v, u).
It's better to use topological sorting instead of Bellman-Ford since it's a DAG.
Source:- http://www.utdallas.edu/~sizheng/CS4349.d/l-notes.d/L17.pdf
EDIT:-
G is a DAG with negative edges.
Some edges reduce your profit and some increase your profit
Edges - increase profit - positive value
Edges - decrease profit - negative value
After TS, for each vertex U in TS order - relax each outgoing edge.
dist[] = {-INF, -INF, ….}
dist[c] = 0 // source
for every vertex u in topological order
    if (u == z) break; // dest vertex
    for every adjacent vertex v of u
        if (dist[v] < (dist[u] + weight(u, v))) // < for longest path = max profit
            dist[v] = dist[u] + weight(u, v)
ans = dist[z];
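The relaxation loop above can be turned into a small runnable Python sketch. The details here are assumptions for illustration only: vertices are numbered 0..n-1, edges are given as `(u, v, weight)` triples, the topological order is produced with in-degree counting (any valid topological order would do), and the function name `max_profit_path` is made up. Vertices not reachable from the source are skipped so that their -infinity value never takes part in a relaxation.

```python
from collections import deque


def max_profit_path(n, edges, source, target):
    """Best total weight of a source -> target path in a DAG, or None if unreachable."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological order via in-degree counting.
    queue = deque(u for u in range(n) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    neg_inf = float("-inf")
    dist = [neg_inf] * n
    dist[source] = 0
    for u in order:
        if dist[u] == neg_inf:
            continue  # u is not reachable from the source
        for v, w in adj[u]:
            if dist[v] < dist[u] + w:  # "<" because we want the longest path
                dist[v] = dist[u] + w
    return None if dist[target] == neg_inf else dist[target]


if __name__ == "__main__":
    dag = [(0, 1, 5), (0, 2, -2), (1, 3, -4), (2, 3, 6), (1, 2, 3)]
    print(max_profit_path(4, dag, 0, 3))  # 14, via 0 -> 1 -> 2 -> 3
```

Both the topological sort and the relaxation pass visit each vertex and edge a constant number of times, so the whole thing runs in O(V + E).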

Understanding Time complexity calculation for Dijkstra Algorithm

As per my understanding, I have calculated time complexity of Dijkstra Algorithm as big-O notation using adjacency list given below. It didn't come out as it was supposed to and that led me to understand it step by step.
Each vertex can be connected to (V-1) vertices, hence the number of adjacent edges to each vertex is V - 1. Let us say E represents V-1 edges connected to each vertex.
Finding & Updating each adjacent vertex's weight in min heap is O(log(V)) + O(1) or O(log(V)).
Hence from step1 and step2 above, the time complexity for updating all adjacent vertices of a vertex is E*(logV). or E*logV.
Hence time complexity for all V vertices is V * (E*logV) i.e O(VElogV).
But the time complexity for Dijkstra Algorithm is O(ElogV). Why?
Dijkstra's shortest path algorithm is O(ElogV) where:
V is the number of vertices
E is the total number of edges
Your analysis is correct, but your symbols have different meanings! You say the algorithm is O(VElogV) where:
V is the number of vertices
E is the maximum number of edges attached to a single node.
Let's rename your E to N. So one analysis says O(ElogV) and another says O(VNlogV). Both are correct and in fact E = O(VN). The difference is that ElogV is a tighter estimation.
Adding a more detailed explanation as I understood it just in case:
O(for each vertex using min heap: for each edge linearly: push vertices to min heap that edge points to)
V = number of vertices
O(V * (pop vertex from min heap + find unvisited vertices in edges * push them to min heap))
E = number of edges on each vertex
O(V * (pop vertex from min heap + E * push unvisited vertices to min heap)). Note, that we can push the same node multiple times here before we get to "visit" it.
O(V * (log(heap size) + E * log(heap size)))
O(V * ((E + 1) * log(heap size)))
O(V * (E * log(heap size)))
E = V because each vertex can reference all other vertices
O(V * (V * log(heap size)))
O(V^2 * log(heap size))
heap size is V^2 because we push to it every time we want to update a distance and can have up to V comparisons for each vertex. E.g. for the last vertex, 1st vertex has distance 10, 2nd has 9, 3rd has 8, etc, so, we push each time to update
O(V^2 * log(V^2))
O(V^2 * 2 * log(V))
O(V^2 * log(V))
V^2 is also a total number of edges, so if we let E = V^2 (as in the official naming), we will get the O(E * log(V))
let n be the number of vertices and m be the number of edges.
Since with Dijkstra's algorithm you have O(n) delete-mins and O(m) decrease_keys, each costing O(log n), the total run time using binary heaps will be O(log(n)(m + n)). It is entirely possible to amortize the cost of decrease_key down to O(1) using Fibonacci heaps, resulting in a total run time of O(n log n + m), but in practice this is often not done since the constant factor penalties of FHs are pretty big and on random graphs the number of decrease_keys is way lower than its respective upper bound (more in the range of O(n*log(m/n)), which is way better on sparse graphs where m = O(n)). So always be aware of the fact that the total run time depends both on your data structures and on the input class.
In a dense (or complete) graph, E log V > V^2.
Using linked data and a binary heap is not always best.
In that case, I prefer to use just a matrix and track the minimum length per row.
Only V^2 time is needed.
If E < V^2 / log V,
or the maximum number of edges per vertex is less than some constant K,
then use a binary heap.
I find it easier to think at this complexity in the following way:
The nodes are first inserted in a priority queue and then extracted one by one, leading to O(V log V).
Once a node is extracted, we iterate through its edges and update the priority queue accordingly. Note that every edge is explored only once, moreover, updating the priority queue is O(log V), leading to an overall O(E log V).
TLDR. You have V extractions from the priority queue and E updates to the priority queue, leading to an overall O((V + E) log V).
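To make the "V extractions plus E updates" accounting concrete, here is a minimal Python sketch using the standard `heapq` module. It is an illustration under assumptions, not the exact variant analysed above: `heapq` has no decrease-key operation, so this version pushes a new entry on every successful relaxation and skips stale entries when they are popped. The graph layout (`adj[u]` as a list of `(v, weight)` pairs with non-negative weights) and the name `dijkstra` are choices made for this example.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source; adj[u] is a list of (v, weight), weight >= 0."""
    dist = [float("inf")] * len(adj)
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)        # extraction: O(log heap size)
        if d > dist[u]:
            continue                      # stale entry, u was already finalized
        for v, w in adj[u]:               # each edge is relaxed at most once
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # update: O(log heap size)
    return dist


if __name__ == "__main__":
    graph = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]
    print(dijkstra(graph, 0))  # [0, 3, 1, 4]
```

With this lazy variant the heap can hold up to E entries, so each operation costs O(log E), which is still O(log V) because E is at most V^2. The total is O((V + E) log V).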
Let's try to analyze the algorithm as given in CLRS book.
for each loop in line 7: for any vertex say 'u' the number of times the loop runs is equal to the number of adjacent vertices of 'u'.
The number of adjacent vertices for a node is always less than or equal to the total number of edges in the graph.
If we take V (because of the while loop in line 4) and E (because of the for each in line 7) and compute the complexity as VElog(V), it would be equivalent to assuming each vertex has E edges incident on it, but in reality there will be at most E edges incident on a single vertex. (The case of E adjacent vertices for a single vertex happens in a star graph, for the internal vertex.)
V:Number of Vertices,
E:Number of total_edges
Suppose the Graph is dense
The complexity would be O(V*logV) + O( (1+2+...+V)*logV)
1+2+...+V = V*(V+1)/2 ~ V^2 ~ E because the graph is dense
So the complexity would be O(ElogV).
The sum 1+2+...+ V refers to: For each vertex v in G.adj[u] but not in S
If you think about it, Q has V vertices before the first vertex is extracted, then V-1, then V-2, ..., then 1.
E is the number of edges and V is the number of vertices. The number of edges is at most (V * (V - 1)) / 2, approximately V^2.
So we can add at most V^2 edges to the min heap, and restoring heap order after an insertion takes O(Log(V^2)).
Every time we insert a new element into the min heap, this reordering happens. We will have E edges, so we do it E times, and the total time complexity is
O(E * Log(V^2)) = O(2 * E * Log(V))
Omitting the constant 2:
O(E * Log(V))

Graph algorithm to calculate node degree

I'm trying to implement the topological-sort algorithm for a DAG. (http://en.wikipedia.org/wiki/Topological_sorting)
The first step of this simple algorithm is finding nodes with zero in-degree, and I cannot find any way to do this without a quadratic algorithm.
My graph implementation is a simple adjacency list and the basic process is to loop through every node and for every node go through each adjacency list, so the complexity will be O(|V| * |V|).
The complexity of topological sort is O(|V| + |E|), so I think there must be a way to calculate the degree for all nodes in linear time.
You can maintain the indegree of all vertices while removing nodes from the graph and maintain a linked list of zero indegree nodes:
indeg[x] = indegree of node x (compute this by going through the adjacency lists)
zero = [ x in nodes | indeg[x] = 0 ]
result = []
while zero != []:
    x = zero.pop()
    result.push(x)
    for y in adj(x):
        indeg[y]--
        if indeg[y] = 0:
            zero.push(y)
That said, topological sort using DFS is conceptually much simpler, IMHO:
result = []
visited = {}
dfs(x):
    if x in visited: return
    visited.insert(x)
    for y in adj(x):
        dfs(y)
    result.push(x)
for x in V: dfs(x)
reverse(result)
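The first approach in this answer (in-degree counting with a list of zero in-degree nodes) can be written as runnable Python along these lines. The sketch assumes vertices are numbered 0..n-1 and `adj[u]` is the list of successors of u; the name `topological_order` and the cycle check at the end are additions for this example and not part of the pseudocode above.

```python
def topological_order(adj):
    """Return one topological order of a DAG given as adjacency lists."""
    n = len(adj)
    # One pass over all adjacency lists computes every in-degree: O(V + E).
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1

    zero = [u for u in range(n) if indeg[u] == 0]
    result = []
    while zero:
        x = zero.pop()
        result.append(x)
        for y in adj[x]:
            indeg[y] -= 1          # "remove" the edge x -> y
            if indeg[y] == 0:
                zero.append(y)

    if len(result) != n:
        # Some vertices never reached in-degree zero, so there is a cycle.
        raise ValueError("graph has a cycle")
    return result


if __name__ == "__main__":
    print(topological_order([[1, 2], [3], [3], []]))  # [0, 2, 1, 3]
```

Each vertex is pushed and popped once and each edge is examined once, so the whole procedure stays within O(|V| + |E|).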
You can achieve it in O(|V| + |E|). Follow the steps below:
Create two lists, inDegree and outDegree, which maintain the counts of incoming and outgoing edges for each node, and initialize them to 0.
Now traverse the given adjacency list: for each edge (u, v) in graph g, increment the out-degree count of u and increment the in-degree count of v.
You can traverse the adjacency list in O(|V| + |E|), and will then have the in-degree and out-degree of each node in O(|V| + |E|).
The complexity that you mentioned for visiting adjacent nodes, O(n^2), is not quite correct, because if you think carefully, you will notice that this is more like a BFS search. So you visit each node and each edge only once. Therefore, the complexity is O(m + n), where n is the number of nodes and m is the edge count.
You can also use DFS for topological sorting. You won't need additional pass to calculate in-degree after processing each node.
http://www.geeksforgeeks.org/topological-sorting/

Time complexity of Prim's MST Algorithm

Can someone explain to me why Prim's Algorithm using an adjacency matrix results in a time complexity of O(V^2)?
(Sorry in advance for the sloppy looking ASCII math, I don't think we can use LaTEX to typeset answers)
The traditional way to implement Prim's algorithm with O(V^2) complexity is to have an array in addition to the adjacency matrix, let's call it distance, which holds the minimum distance from each vertex to the tree.
This way, we only ever check distance to find the next target, and since we do this V times and there are V members of distance, our complexity is O(V^2).
This on its own wouldn't be enough, as the original values in distance would quickly become out of date. To update this array, at the end of each step we iterate through the newly added vertex's row of the adjacency matrix and update distance appropriately. This doesn't affect our time complexity since it merely means that each step takes O(V+V) = O(2V) = O(V). Therefore our algorithm is O(V^2).
Without using distance we have to iterate through all E edges every single time, which at worst contains V^2 edges, meaning our time complexity would be O(V^3).
Proof:
To prove that without the distance array it is impossible to compute the MST in O(V^2) time, consider that then on each iteration with a tree of size n, there are V-n vertices to potentially be added.
To calculate which one to choose we must check each of these to find their minimum distance from the tree and then compare that to each other and find the minimum there.
In the worst case scenario, each of the nodes contains a connection to each node in the tree, resulting in n * (V-n) edges and a complexity of O(n(V-n)).
Since our total would be the sum of each of these steps as n goes from 1 to V, our final time complexity is:
(sum O(n(V-n)) as n = 1 to V) = O(1/6(V-1) V (V+1)) = O(V^3)
QED
Note: This answer just borrows jozefg's answer and tries to explain it more fully since I had to think a bit before I understood it.
Background
An Adjacency Matrix representation of a graph constructs a V x V matrix (where V is the number of vertices). The value of cell (a, b) is the weight of the edge linking vertices a and b, or zero if there is no edge.
Adjacency Matrix
    A  B  C  D  E
   --------------
A   0  1  0  3  2
B   1  0  0  0  2
C   0  0  0  4  3
D   3  0  4  0  1
E   2  2  3  1  0
Prim's Algorithm is an algorithm that takes a graph and a starting node, and finds a minimum spanning tree on the graph - that is, it finds a subset of the edges so that the result is a tree that contains all the nodes and the combined edge weights are minimized. It may be summarized as follows:
Place the starting node in the tree.
Repeat until all nodes are in the tree:
Find all edges that join nodes in the tree to nodes not in the tree.
Of those edges, choose one with the minimum weight.
Add that edge and the connected node to the tree.
Analysis
We can now start to analyse the algorithm like so:
At every iteration of the loop, we add one node to the tree. Since there are V nodes, it follows that there are O(V) iterations of this loop.
Within each iteration of the loop, we need to find and test edges in the tree. If there are E edges, the naive searching implementation uses O(E) to find the edge with minimum weight.
So in combination, we should expect the complexity to be O(VE), which may be O(V^3) in the worst case.
However, jozefg gave a good answer to show how to achieve O(V^2) complexity.
Distance to Tree
             |  A   B   C   D   E
             |--------------------
Iteration 0  |  0   1*  #   3   2
          1  |  0   0   #   3   2*
          2  |  0   0   3   1*  0
          3  |  0   0   3*  0   0
          4  |  0   0   0   0   0
NB. # = infinity (not connected to tree)
    * = minimum weight edge in this iteration
Here the distance vector represents the smallest weighted edge joining each node to the tree, and is used as follows:
Initialize with the edge weights to the starting node A with complexity O(V).
To find the next node to add, simply find the minimum element of distance (and remove it from the list). This is O(V).
After adding a new node, there are O(V) new edges connecting the tree to the remaining nodes; for each of these determine if the new edge has less weight than the existing distance. If so, update the distance vector. Again, O(V).
Using these three steps reduces the searching time from O(E) to O(V), and adds an extra O(V) step to update the distance vector at each iteration. Since each iteration is now O(V), the overall complexity is O(V^2).
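The three steps above can be sketched in Python roughly as follows. Assumptions made for this example: the graph is undirected and given as a V x V matrix where 0 means "no edge" (as in the matrix shown earlier), the function returns only the total weight of the spanning tree, and the names `prim_mst_weight`, `in_tree` and `distance` are illustrative. Instead of removing chosen nodes from the list, this version marks them in `in_tree`.

```python
def prim_mst_weight(matrix, start=0):
    """Total weight of a minimum spanning tree, O(V^2) with an adjacency matrix."""
    n = len(matrix)
    inf = float("inf")
    in_tree = [False] * n
    distance = [inf] * n      # cheapest known edge joining each node to the tree
    distance[start] = 0
    total = 0
    for _ in range(n):        # one node joins the tree per iteration
        # Step 2: linear scan for the closest node not yet in the tree, O(V).
        u = -1
        for v in range(n):
            if not in_tree[v] and (u == -1 or distance[v] < distance[u]):
                u = v
        if distance[u] == inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += distance[u]
        # Step 3: the new node's row may offer cheaper edges to the rest, O(V).
        for v in range(n):
            w = matrix[u][v]
            if w != 0 and not in_tree[v] and w < distance[v]:
                distance[v] = w
    return total


if __name__ == "__main__":
    m = [
        [0, 1, 0, 3, 2],
        [1, 0, 0, 0, 2],
        [0, 0, 0, 4, 3],
        [3, 0, 4, 0, 1],
        [2, 2, 3, 1, 0],
    ]
    print(prim_mst_weight(m))  # 7
```

On the example matrix the nodes join the tree in the order A, B, E, D, C with edge weights 1, 2, 1 and 3.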
First of all, it's obviously at least on the order of V^2, i.e. Ω(V^2), because that is how big the adjacency matrix is.
Looking at http://en.wikipedia.org/wiki/Prim%27s_algorithm, you need to execute the step "Repeat until Vnew = V" V times.
Inside that step, you need to work out the shortest link between any vertex in Vnew and any vertex outside Vnew. Maintain an array of size V, holding for each vertex either infinity (if the vertex is already in Vnew) or the length of the shortest link between any vertex in Vnew and that vertex (so in the beginning this just comes from the length of links between the starting vertex and every other vertex). To find the next vertex to add to Vnew, just search this array, at cost V. Once you have a new vertex, look at all the links from that vertex to every other vertex and see if any of them give shorter links from Vnew to that vertex. If they do, update the array. This also costs V.
So you have V steps (V vertexes to add) each taking cost V, which gives you O(V^2)
