Can someone explain the O(V+E) time complexity of the BFS algorithm, given that there is a for loop inside the while loop? I am working through the time complexity of BFS and I still don't get it.
The reason the time complexity is O(V+E) is that as BFS goes through the graph, it marks nodes as visited so it knows not to revisit them. Because each edge has only two endpoints, and each endpoint can be visited only once, an edge can be observed at most twice. So BFS never visits any "thing" more than twice: each of the V nodes is looked at once, and each of the E edges is looked at at most twice. This means that T(V, E) <= 2(V + E), which makes T(V, E) = O(V + E).
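To make that counting concrete, here is a minimal Python sketch (the adjacency-list format and counter names are my own, not from the question) that tallies how many times BFS touches nodes and edges:

```python
from collections import deque

def bfs_with_counters(adj, source):
    """BFS over an adjacency-list graph, counting node visits and edge looks.

    `adj` maps each vertex to a list of neighbors (undirected: each edge
    appears in both endpoint lists, so it is looked at twice).
    """
    visited = {source}
    queue = deque([source])
    node_visits = 0
    edge_looks = 0
    while queue:
        v = queue.popleft()
        node_visits += 1          # each vertex is dequeued at most once
        for u in adj[v]:
            edge_looks += 1       # each undirected edge is seen from both ends
            if u not in visited:
                visited.add(u)
                queue.append(u)
    return node_visits, edge_looks

# A path graph 0-1-2-3: V = 4, E = 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_with_counters(adj, 0))  # (4, 6): node_visits = V, edge_looks = 2E
```

The inner for loop is not a separate factor multiplied by the while loop: summed over the whole run, its iterations total at most 2E, which is exactly the T(V, E) <= 2(V + E) bound above.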
Here is a snippet of my pseudocode to find the MST of each strongly connected component (SCC) of a given graph G:

K <- number of SCCs from applying Kosaraju's algorithm on G    # O(V + E)
for each of the K components:
    apply Kruskal's algorithm to that component
According to what I have learnt, Kruskal's algorithm runs in O(E log V) time.
However, I am unsure of the worst-case time complexity of the loop. My thought is that the worst case occurs when K = 1, so the big-O time complexity would simply be O(E log V).
I do not know if my thoughts are correct or, if they are, what the justification for them is.
Yes, intuitively you're saving the cost of comparing edges in one component with edges in another. Formally, f(V, E) = E log V is a convex function, so f(V1, E1) + f(V2, E2) <= f(V1 + V2, E1 + E2), which implies that the cost of handling multiple components separately is never more than the cost of handling them together. Of course, as you observe, there may be only one component, in which case there are no savings to be had.
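As a quick sanity check of that inequality (the component sizes below are arbitrary, not from the question):

```python
import math

def f(V, E):
    """Kruskal's cost model from the answer above: f(V, E) = E * log V."""
    return E * math.log(V)

# Two hypothetical components vs. the same vertices/edges merged into one
V1, E1 = 5, 10
V2, E2 = 7, 20
print(f(V1, E1) + f(V2, E2) <= f(V1 + V2, E1 + E2))  # True
```

Handling the two components separately costs about 55.0, versus about 74.6 for one merged graph of the same total size, so splitting never hurts.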
Let's say I have implemented Dijkstra's using a PriorityQueue, so that adding to and removing from the set of unvisited nodes takes O(log n).
The PQ will contain at most E entries, so to empty it we get O(E). While the PQ is not empty, we take the best node, remove it, visit it if not already visited, and go through all of its neighbors (potentially adding them to the PQ).
What I do not understand: how can going through all neighbors (at worst V) for at worst E items not have time complexity O(E*V)? I have seen so many explanations saying we are supposed to just look at the operations, observe how many times they execute, and draw our conclusions from that. I do not see how we can disregard the fact that we are looping through V neighbors; an empty for-loop over n items is still O(n).
For me the final complexity seems to be O(V + E*V log E) instead of O(V + V log E). I mean, there are a lot of variants, but the main point is that I am missing something trivial :P
First point of terminology that you seem to have confused. E is not the number of items, it is the number of edges between vertices. V is the number of vertices, which (depending on context) is likely to be the number of items.
Next, "this vertex is a neighbor of that vertex" means that there is an edge between them. Each edge contributes 2 neighbor relationships. (One in each direction.) Therefore 2 E is the number of neighbor relationships that can exist, total.
Your intuition that every one of V nodes can have up to V-1 neighbors, for a total of V^2 - V neighbor relationships, is correct - but you can tell how close you are to that worst case from the number of edges.
Therefore we wind up with the following potential work:
for each of E edges:
    for each vertex on that edge:
        O(1) work to check whether it was processed yet
        (processing the vertex if needed is accounted for separately)
for each of V vertices:
    O(log(V)) to add to the priority queue
    O(log(V)) to remove from the priority queue
    (processing its edges is accounted for separately)
The first chunk is O(E). The second chunk is O(V log(V)). The total is O(E + V log(V)).
Hopefully this explanation clarifies why the complexity is what it is.
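For reference, here is one common Python sketch using heapq with lazy deletion (the names are my own). Note one hedge: this variant pushes a heap entry per successful relaxation, up to E entries, so its bound is O((V + E) log V); the V-adds accounting above assumes a heap that stores each vertex once with decrease-key. The O(1) visited check per popped entry is the per-edge work described above:

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra with a binary heap and lazy deletion.

    `adj` maps vertex -> list of (neighbor, weight) pairs (assumed format).
    Each edge is examined once per endpoint; stale heap entries are
    discarded with an O(1) membership check instead of a decrease-key.
    """
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:             # lazy deletion: skip stale entries in O(1)
            continue
        done.add(v)
        for u, w in adj[v]:       # totals O(E) iterations over the whole run
            nd = d + w
            if u not in dist or nd < dist[u]:
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
print(dijkstra(adj, 0))  # {0: 0, 1: 3, 2: 1, 3: 4}
```

The key point for the original question: the inner loop does not run V times for every pop; summed over all pops it runs at most once per edge endpoint, i.e. O(E) times in total.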
Let G(V, E) be an undirected graph with positive edge weights. Dijkstra's single-source shortest path algorithm can be implemented using the binary heap data structure with time complexity:
1. O(|V|^2)
2. O(|E|+|V|log|V|)
3. O(|V|log|V|)
4. O((|E|+|V|)log|V|)
========================================================================
Correct answer is -
O((|E|+|V|)log|V|)
=========================================================================
My approach is as follows -
O(V + V + VlogV + ElogV) = O(ElogV)
O(V) to initialize.
O(V) to build the heap.
O(VlogV) to perform Extract-Min.
O(ElogV) to perform Decrease-Key.
Now, as I get O(ElogV), and when I see the options, a part of me says the correct one is O(VlogV) because for a sparse graph |V| = |E|, but as I said the correct answer is O((|E|+|V|)log|V|). So, where am I going wrong?
Well, you are correct that the complexity is actually O(E log V).
Since E can be up to (V^2 - V)/2, this is not the same as O(V log V).
If every vertex has an edge, then V <= 2E, so in that case, O(E log V) = O( (E+V) log V). That is the usual case, and corresponds to the "correct" answer.
But technically, O(E log V) is not the same as O((E+V) log V), because there may be a whole bunch of disconnected vertices in V. When that is the case, however, Dijkstra's algorithm will never see all those vertices, since it only finds vertices connected to the single source. So, when the difference between these two complexities matters, you are right and the "correct answer" is not.
Let me put it this way. The correct answer is O((E+V)logV). If some vertices are not reachable from the source vertex, VlogV could be more than ElogV. But if we assume that every other vertex is reachable from the source, the graph has at least V-1 edges, so the bound is ElogV. It comes down to reachability from the source vertex.
I am trying to develop an algorithm which will be able to find the minimum spanning tree of a graph. I know there are already many existing algorithms for this. However, I am trying to eliminate the sorting of edges required in Kruskal's algorithm. The algorithm I have developed so far has a part where counting of disjoint sets is needed, and I need an efficient method for it. After a lot of study I came to know that the only possible way is using BFS or DFS, which has a complexity of O(V+E), whereas Kruskal's algorithm has a complexity of O(ElogE). Now my question is: which one is better, O(V+E) or O(ElogE)?
In general, E = O(V^2), but that bound may not be tight for all graphs. In particular, in a sparse graph, E = O(V), but for an algorithm complexity is usually stated as a worst-case value.
O(V + E) is a way of indicating that the complexity depends on how many edges there are. In a sparse graph, O(V + E) = O(V + V) = O(V), while in a dense graph O(V + E) = O(V + V^2) = O(V^2).
Another way of looking at it is to see that in big-O notation, O(X + Y) means the same thing as O(max(X, Y)).
Note that this is only useful when V and E might have the same magnitude. For Kruskal's algorithm, the dominating factor is that you need to sort the list of edges. Whether you have a sparse graph or a dense graph, that step dominates anything that might be O(V), so one simply writes O(E lg E) instead of O(V + E lg E).
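For illustration, here is a compact Kruskal sketch (the helper names and input format are my own) in which the sort is the only super-linear step; everything else is near-linear union-find work:

```python
def kruskal(n, edges):
    """Kruskal's MST: the O(E log E) sort dominates the union-find passes.

    `n` is the number of vertices (0..n-1); `edges` is a list of
    (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # O(E log E) -- the dominating step
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components: keep it
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
print(kruskal(4, edges))  # ([(0, 2, 1), (1, 2, 2), (2, 3, 3)], 6)
```

Replacing the sort with an O(V + E) counting pass, as the question proposes, would only change the overall bound if the remaining work stayed below O(E lg E).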
I understand that time complexity of BFS in a graph traversal is O( V + E ) since every vertex and every edge will be explored in the worst case.
Well, is the exact time complexity V + 2E?
Every vertex is explored once, plus every adjacent vertex is examined.
The sum of the degrees of all the vertices in a graph = number of edges * 2 = 2E.
Thus the time complexity is V + 2E. Am I correct?
For a random graph, the time complexity is O(V+E): Breadth-first search
As stated in the link, according to the topology of your graph, O(E) may vary from O(V) (if your graph is acyclic) to O(V^2) (if all vertices are connected with each other).
Therefore the time complexity varies from O(V + V) = O(V) to O(V + V^2) = O(V^2) according to the topology of your graph.
Besides, since |V| <= 2|E| whenever the graph has no isolated vertices, in that case O(V + E) = O(3E) = O(E) is also correct.
Assumptions
Let's assume that G is connected and undirected. If it's not connected, then you can apply the idea below to every connected component of G independently. In addition, let's assume that G is represented using adjacency lists, and that for every vertex v we can decide whether v was visited in O(1) time, for example using a lookup table.
Analyze
If you want to count the exact number of steps in the BFS you can observe that:
Since G is connected, BFS will visit every vertex exactly once, so we count |V| node visits. Notice that a single visit may involve more operations than just marking the current vertex visited, even before counting the loop over its edges.
For every vertex v we want to count, how many edges the BFS examines at this vertex.
You have to loop over all edges of v to perform the BFS. If you skip one edge, then it's easy to show that BFS is not correct. So every edge is examined twice.
One question may arise here: is there any need to examine the edge (p, v) at vertex v, where p is the parent of v in the already-constructed BFS tree, i.e. we came to v directly from p? Of course you don't have to consider this edge, but deciding to skip it also costs at least one additional operation:
for (v, u) in v.edges:
    if u == p:  # p is the parent of v in the already constructed tree
        continue
    if not visited[u]:
        BFS(u, parent=v)
It examines the same number of edges as the code below, but has higher cost, because for all but one edge we run two if-statements rather than one.
for (v, u) in v.edges:
    if not visited[u]:  # if p is the parent of v, then p is already visited
        BFS(u, parent=v)
Conclusion
You may even devise a different method to skip the edge (v, p), but it always takes at least one operation, so it's wasted effort.
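A small instrumented sketch (iterative rather than recursive, with a hypothetical `checks` counter, just to compare the two loops) bears this out on a triangle graph:

```python
from collections import deque

def bfs_checks(adj, source, skip_parent):
    """Count if-checks done while scanning edges, with and without the
    extra 'skip the parent edge' test."""
    visited = {source}
    parent = {source: None}
    queue = deque([source])
    checks = 0
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if skip_parent:
                checks += 1          # the extra `u == parent[v]` test
                if u == parent[v]:
                    continue
            checks += 1              # the `not visited` test
            if u not in visited:
                visited.add(u)
                parent[u] = v
                queue.append(u)
    return checks

# Triangle: 3 vertices, 3 edges, each edge scanned from both ends
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(bfs_checks(adj, 0, False), bfs_checks(adj, 0, True))  # 6 10
```

The plain version does one check per edge scan (2E = 6); the parent-skipping version saves two visited-checks but pays six extra parent-checks, so it does strictly more work.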