Graph In-degree Calculation from Adjacency-list - algorithm

I came across a question which required calculating the in-degree of each node of a graph from its adjacency-list representation:
for each u
    for each Adj[i] where i != u
        if (i,u) ∈ E
            in-degree[u] += 1
Now, according to me, its time complexity should be O(|V||E| + |V|^2), but the solution I referred to instead described it as O(|V||E|).
Please help and tell me which one is correct.

Rather than O(|V||E|), the complexity of computing in-degrees is O(|V| + |E|). Consider the following pseudocode for computing the in-degree of each node:
for each u
    indegree[u] = 0;
for each u
    for each v ∈ Adj[u]
        indegree[v]++;
The first loop has linear complexity O(|V|). For the second part: a loose bound is that the inner loop executes at most |E| times for each of the |V| iterations of the outer loop, which suggests O(|V||E|). In fact, the body of the inner loop executes exactly once per edge, so the second part is only O(|E|), and the whole computation is O(|V| + |E|).
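For concreteness, here is a minimal Python sketch of this pass (the dict-of-lists representation and the names are illustrative, not part of the original question):

# Computes in-degrees in O(|V| + |E|), assuming the graph is a dict
# mapping each vertex to its adjacency list.
def in_degrees(adj):
    indegree = {u: 0 for u in adj}   # first loop: O(|V|)
    for u in adj:                    # second part: each edge (u, v)
        for v in adj[u]:             # is touched exactly once
            indegree[v] += 1
    return indegree

# Example with edges 0->1, 0->2, 1->2:
print(in_degrees({0: [1, 2], 1: [2], 2: []}))  # {0: 0, 1: 1, 2: 2}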

According to http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Graphs.html, Section 4.2, with an adjacency list representation,
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
So, in the notation used here, the time complexity of computing the in-degree of a node is O(|V| + |E|).
This can be reduced at the cost of using extra space, however. The wiki also states that
adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
An example of a package which implements this approach is the Python package NetworkX. As you can see from the constructor of the DiGraph object for directed graphs, NetworkX keeps track of both self._succ and self._pred, which are dictionaries representing the successors and predecessors of each node, respectively. This allows it to compute each node's in_degree efficiently.
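For example, a short usage sketch (expected output shown in comments):

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c")])

# Because DiGraph also maintains the predecessor dict, in_degree is a
# cheap lookup rather than a scan over all edges.
print(G.in_degree("c"))     # 2
print(dict(G.in_degree()))  # {'a': 0, 'b': 1, 'c': 2}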

O(|V|+|E|) is the correct answer: you visit each vertex, which is O(|V|) in total, and each time you visit a vertex you scan only its own edges, which is O(|E|) summed over all vertices. Also, since usually |E| >> |V|, saying O(|E|) is often correct as well.

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, Merge Sort, whose time complexity is always O(n log n).
Examples I've thought of are :
A directed graph with |E| = |V|-1, where the time complexity will indeed be O(V).
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), as each vertex has an outgoing edge to every other vertex besides itself.
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, you can always say it is O(E). The reason to include V in the time complexity is to cover the graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges at all, just to detect that they have none. And so we must say O(V+E).
The complexity falls out easily if you walk through the algorithm. Let Q be the FIFO queue, initially containing the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total complexity is the sum of (1 + |adj[u]|) over all vertices, which is O(V+E), since the sum of the adjacencies is O(E) (2E for an undirected graph and E for a directed one).
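A minimal Python sketch of this, mirroring the pseudocode above (the adjacency-dict input format is an illustrative assumption):

from collections import deque

def bfs(adj, source):
    marked = {source}
    Q = deque([source])
    order = []
    while Q:                      # runs once per vertex: O(V)
        u = Q.popleft()
        order.append(u)
        for v in adj[u]:          # runs once per edge overall: O(E)
            if v not in marked:
                marked.add(v)
                Q.append(v)
    return order

print(bfs({0: [1, 2], 1: [2], 2: [3], 3: []}, 0))  # [0, 1, 2, 3]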
Consider a situation where you have a graph, maybe even with cycles: you start the search from the root and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry; they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E. This makes the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list, which is O(V). This means that the total time is O(E*V).
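A minimal sketch of a properly indexed adjacency list in Python (one illustrative layout among several): vertices are array indices, so each edge entry is a direct O(1) index into the vertex array, never something to search for.

n = 4
edges = [(0, 1), (1, 2), (0, 2), (0, 3)]

adj = [[] for _ in range(n)]
for u, v in edges:       # undirected: each edge stored twice, 2E entries
    adj[u].append(v)
    adj[v].append(u)

print(adj)  # [[1, 2, 3], [0, 2], [1, 0], [0]]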

Time complexity of greedy algorithm to find a maximal independent set of a graph

Consider a simple greedy algorithm to find a maximal independent set. I think it will take O(n) time, since no vertex will be visited more than twice. Why does the wiki say it takes O(m) time?
Greedy(G)
    while G is not empty (visit vertices v in an arbitrary order)
        mark v as IS and v's neighbors as Non-IS
    return all IS vertices
If you run it on the complete bipartite graph K_{n/2,n/2}, the neighbors on the side not chosen each get marked as Non-IS n/2 times (once per chosen neighbor), so the total work is Θ(n^2) = Θ(m).
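A minimal Python sketch of the greedy algorithm (the representation and names are illustrative); the inner loop over neighbours is exactly where the O(m) cost comes from:

def greedy_mis(adj):
    state = {}                       # vertex -> "IS" or "Non-IS"
    for v in adj:                    # arbitrary order
        if v not in state:           # v has no IS neighbour yet
            state[v] = "IS"
            for w in adj[v]:         # total work over all v is O(m)
                state[w] = "Non-IS"  # may re-mark an already-marked vertex
    return [v for v, s in state.items() if s == "IS"]

# K_{2,2}: vertices 0,1 on one side, 2,3 on the other; each right-hand
# vertex is re-marked Non-IS twice.
print(greedy_mis({0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}))  # [0, 1]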

Minimum spanning tree to minimize cost

Can someone please help me solve this problem?
We have a set E of roads, a set H of highways, and a set V of different cities. We also have a cost x(i) associated with each road i and a cost y(i) associated with each highway i. We want to build roads to connect the cities, with the conditions that there is always a path between any pair of cities and that we can build at most one highway, which may be cheaper than a road.
Set E and set H are different, and their respective costs are unrelated.
Design an algorithm to build the roads (with at most one highway) that minimize the total cost.
So, what we have is a fully connected graph.
Solution steps:
1. Find the minimum spanning tree for the roads alone and record its cost as the minimum.
2. Add one highway to the roads graph and compute the minimum spanning tree cost again.
3. Compare the cost from step 2 with the current minimum and replace it if it is smaller.
4. Remove that highway.
5. Go back to step 2 and repeat the steps for each remaining highway.
Overall this runs the MST computation m times, i.e. m * mst_cost(n); a naive sketch in code follows below.
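A naive sketch of these steps using NetworkX (the inputs are illustrative, and it assumes a highway never duplicates an existing road edge):

import networkx as nx

roads = [(0, 1, 4.0), (1, 2, 5.0), (0, 2, 7.0), (2, 3, 6.0)]
highways = [(0, 3, 3.0), (1, 3, 9.0)]

G = nx.Graph()
G.add_weighted_edges_from(roads)

def mst_cost(g):
    return sum(w for _, _, w in nx.minimum_spanning_tree(g).edges(data="weight"))

best = mst_cost(G)            # step 1: roads only
for u, v, w in highways:      # steps 2-5: try each highway in turn
    G.add_edge(u, v, weight=w)
    best = min(best, mst_cost(G))
    G.remove_edge(u, v)

print(best)  # 12.0 with these illustrative numbers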
Using Prim's or Kruskal's to build an MST: O(E log V).
The problem is the constraint of at most 1 highway.
1. Naive method to solve this:
For each possible highway, build the MST from scratch.
Time complexity of this solution: O(H E log V)
2. Alternative
Idea: once you have built an MST, you can refine it into a better spanning tree when an additional edge becomes available that you had not considered before.
Suppose the new edge connects (u,v). If you use this edge, you can remove the most expensive edge in the path between vertices u and v in the MST. You can find the path naively in O(V) time.
Using this idea, the time complexity is the cost to build the initial MST O(E log V) and the time to try to refine the MST with each of the H highways. The total algorithmic complexity is therefore O(E log V + H V), which is better than the first solution.
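A sketch of this second solution in Python (Kruskal for the initial MST, a naive DFS walk for the path maximum; the inputs and names are illustrative):

def kruskal(n, edges):                        # edges: (w, u, v)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):             # O(E log E) = O(E log V)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def max_edge_on_path(tree, n, u, v):          # naive O(V) walk in the MST
    adj = [[] for _ in range(n)]
    for w, a, b in tree:
        adj[a].append((b, w))
        adj[b].append((a, w))
    def dfs(x, target, par, best):
        if x == target:
            return best
        for y, w in adj[x]:
            if y != par:
                found = dfs(y, target, x, max(best, w))
                if found is not None:
                    return found
        return None
    return dfs(u, v, -1, 0)

roads = [(4.0, 0, 1), (5.0, 1, 2), (7.0, 0, 2), (6.0, 2, 3)]
highways = [(3.0, 0, 3), (9.0, 1, 3)]
n = 4

mst = kruskal(n, roads)
base = sum(w for w, _, _ in mst)
best = base
for w, u, v in highways:                      # O(H V) refinement attempts
    best = min(best, base - max_edge_on_path(mst, n, u, v) + w)
print(best)  # 12.0 with these illustrative numbers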
3. Optimized refinement
Instead of the naive path-searching method of the second solution, we can find a faster way. One related problem is LCA (lowest common ancestor). A good way of solving LCA is using jump pointers: first you root the tree, then each vertex gets jump pointers towards the root (1 step, 2 steps, 4 steps, etc.). Pre-processing costs O(V log V) time, and finding the LCA of 2 vertices is O(log V) worst case (actually O(log(depth of tree)), which is usually better).
Once you have found the LCA, that implicitly gives you the path between vertices u and v. However, finding the most expensive edge to delete could still be expensive, since traversing the path is costly.
In 1-dimensional problems, the range-maximum-query (RMQ) can be employed. This uses a segment tree to solve the RMQ in O(log N) time.
Instead of a 1-dimensional space (like an array), we have a tree. However, we can apply the same idea, and build a segment tree-like structure. In fact, this is equivalent to bundling an extra piece of information with each jump pointer. To find the LCA, each vertex in the tree will have log(tree depth) jump pointers towards the root. We can bundle the maximum edge weight of the edges we jump over with the jump pointer. The cost of adding this information is the same as creating the jump pointer in the first place. Therefore, a slight refinement to the LCA algorithm allows us to find the maximum edge weight on the path between vertices u and v in O(log (depth)) time.
Finally, putting it together, the algorithmic complexity of this 3rd solution is O(E log V + H log V) or equivalently O((E+H) log V).

graph - How to find Minimum Directed Cycle (minimum total weight)?

Here is an exercise:
Let G be a weighted directed graph with n vertices and m edges, where all edges have positive weight. A directed cycle is a directed path that starts and ends at the same vertex and contains at least one edge. Give an O(n^3) algorithm to find a directed cycle in G of minimum total weight. Partial credit will be given for an O((n^2)*m) algorithm.
Here is my algorithm.
I do a DFS. Each time I find a back edge, I know I've got a directed cycle.
Then I temporarily go backwards along the parent array (until I have travelled through all vertices in the cycle) and calculate the total weight.
Then I compare the total weight of this cycle with min, which always holds the minimum total weight found so far. After the DFS finishes, the minimum directed cycle has been found.
Ok, then about the time complexity.
To be honest, I don't know the time complexity of my algorithm.
For DFS, the traversal takes O(m+n) (if m is the number of edges and n is the number of vertices). Each vertex might point back to one of its ancestors and thus form a cycle. When a cycle is found, it takes O(n) to sum up the total weight.
So I think the total time is O(m + n*n). But obviously that is wrong, as the exercise states that the optimal time is O(n^3) and the partial-credit time is O((n^2)*m).
Can anyone help me with:
Is my algorithm correct?
What is the time complexity if my algorithm is correct?
Is there any better algorithm for this problem?
You can use Floyd-Warshall algorithm here.
The Floyd-Warshall algorithm finds shortest path between all pairs of vertices.
The algorithm is then very simple: go over all pairs (u,v) and find the pair that minimizes dist(u,v) + dist(v,u), since this pair indicates a cycle from u back to u with weight dist(u,v) + dist(v,u). If the graph also allows self-loops (an edge (u,u)), you will also need to check those on their own, because such cycles (and only those) are not covered by the algorithm.
Pseudocode:
run Floyd-Warshall on the graph
min <- infinity
pair <- None
for each pair of vertices u,v
    if (dist(u,v) + dist(v,u) < min):
        min <- dist(u,v) + dist(v,u)
        pair <- (u,v)
(u,v) <- pair
return path(u,v) + path(v,u)
path(u,v) + path(v,u) is actually the path found from u to v and then from v to u, which is a cycle.
The algorithm's run time is O(n^3), since Floyd-Warshall is the bottleneck; the final loop over pairs takes only O(n^2) time.
I think correctness in here is trivial, but let me know if you disagree with me and I'll try to explain it better.
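For reference, a compact Python sketch of this approach (the adjacency-matrix is built from an illustrative edge list). The diagonal is deliberately left at infinity so that dist(u,v) + dist(v,u) always measures a genuine cycle:

INF = float("inf")

def min_directed_cycle(n, edges):             # edges: (u, v, w), w > 0
    dist = [[INF] * n for _ in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):                        # Floyd-Warshall: O(n^3)
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    best = INF
    for u in range(n):                        # the O(n^2) scan over pairs
        for v in range(n):
            if u != v:
                best = min(best, dist[u][v] + dist[v][u])
    return best

# The only cycle, 0 -> 1 -> 2 -> 0, has weight 6.
print(min_directed_cycle(3, [(0, 1, 1), (1, 2, 2), (2, 0, 3)]))  # 6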
Is my algorithm correct?
No. Let me give a counterexample. Imagine you start DFS from u; there are two paths p1 and p2 from u to v and one path p3 from v back to u, with p1 shorter than p2.
Assume you start by taking the p2 path to v and walk back to u via p3. One cycle is found, but apparently it's not the minimum one. Then you continue exploring u by taking the p1 path, but since v is fully explored, the DFS ends without finding the minimum cycle.
"For each vertex, it might point back to one of its ancestors and thus forms a cycle"
I think it might point back to any of its ancestors which means N
Also, how are u going to mark vertexes when you came out of its dfs, you may come there again from other vertex and its going to be another cycle. So this is not (n+m) dfs anymore.
So ur algo is incomplete
The same applies to question 2, the time complexity. As for question 3, whether there is a better algorithm:
During one DFS, I think each vertex should be either unseen or checked, and for a checked vertex you can store the minimum weight of the path back to the starting vertex. Then, if at some later stage you find an edge to that vertex, you don't have to search for this path any more.
This DFS will find the minimum directed cycle containing the first vertex, and it's O(n^2) (O(n+m) if you store the graph as an adjacency list).
Doing it from every other vertex as well gives O(n^3) (O(n*(n+m))).
Sorry for my English; I'm not good at terminology.
I did a similar kind of thing, but I did not use any visited array for the DFS (which was needed for my algorithm to work correctly), and hence I realised that my algorithm was of exponential complexity.
Since you are finding all cycles, it is not possible to do this in less than exponential time, as there can be 2^(e-v+1) cycles.

Fastest algorithm for detecting a loop in a graph

Given an undirected graph what is the best algorithm to detect if it contains a cycle or not?
A breadth first or depth first search while keeping track of visited nodes is one method, but it's O(n^2). Is there anything faster?
The BFS and DFS algorithms for a given graph G(V,E) have time complexity O(|V|+|E|), i.e. linear in the size of the input. You can apply some heuristics if you have a very specialized graph, but in general DFS is not a bad choice for this. You can check here for some more information. In any case, you have to traverse the whole graph.
Here's your O(V) algorithm:
def hasCycles(G, V, E):
    if E >= V:
        return True
    else:
        # here E < V
        # perform O(E+V) = O(V) algorithm
        ...
The ... can be performed with DFS. If you have E < V and the edges are stored in a meaningful way (as a list), you can probably do O(E) plus logarithmic factors, which would make the whole algorithm O(min(E,V)) plus logs.
Hope you like this answer, though a bit late!
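For completeness, a sketch that fills in the ... above with a DFS cycle check (it assumes a simple undirected graph given as an edge list; a forest has at most V-1 edges, so E >= V guarantees a cycle):

def has_cycle(n, edges):
    if len(edges) >= n:          # E >= V: a cycle must exist
        return True
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    def dfs(u, par):
        seen[u] = True
        for w in adj[u]:
            if not seen[w]:
                if dfs(w, u):
                    return True
            elif w != par:       # visited non-parent neighbour: cycle
                return True
        return False
    return any(not seen[u] and dfs(u, -1) for u in range(n))

print(has_cycle(4, [(0, 1), (1, 2)]))          # False
print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True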
Testing for the presence of a cycle in a graph G(V,E) using depth-first search is O(|V|+|E|) if the graph is represented using an adjacency list.
It is necessary to traverse the entire graph to show there are no cycles. If you are simply interested in the presence/absence of a cycle, you can obviously stop as soon as a cycle is discovered.
If you have a simple graph, you can calculate the cyclomatic number:
C = E − N + P
where C is the number of independent cycles, E is the number of edges, N is the number of nodes and P is the number of connected components. If your graph is connected, this is:
C = E − N + 1
The graph contains a cycle exactly when C > 0.
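A small sketch of this test in Python, counting components with union-find (the input format is illustrative):

def cyclomatic_number(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    components = sum(1 for v in range(n) if find(v) == v)
    return len(edges) - n + components        # C = E - N + P

print(cyclomatic_number(4, [(0, 1), (1, 2), (0, 2), (0, 3)]))  # 1 -> cycle
print(cyclomatic_number(4, [(0, 1), (2, 3)]))                  # 0 -> acyclic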
