Why is the complexity of BFS O(V+E) instead of O(V*E)?

Some pseudocode here (disregard my style)

Starting from v1 (enqueued):

function BFS(queue Q)
    v2 = dequeue Q
    enqueue all unvisited connected nodes of v2 into Q
    BFS(Q)
end // maybe minor problems here
Since there are V vertices in the graph, and these V vertices are connected by E edges, and visiting the connected nodes (equivalent to visiting the connected edges) happens in the inner loop (the outer loop being the recursion itself), it seems to me that the complexity should be O(V*E) rather than O(V+E). Can anyone explain this to me?

E is not the number of edges adjacent to each vertex - it's actually the total number of edges in the graph. Defining it this way is useful because you don't necessarily have the same number of edges on every single vertex.
Since each edge gets visited once by the time the BFS ends, you get O(E) complexity from that part. Then you add the O(V) for visiting each vertex once and get O(V + E) in total.

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike say Merge Sort, whose time complexity is always O(n*log n).
Examples I've thought of are:
A directed graph with |E| = |V|-1, where indeed the time complexity will be O(V)
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), since each vertex has an outgoing edge to every other vertex besides itself
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, then you can always say it is O(E). The reason to include V in the time complexity, is to cover for the graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they don't have edges. And so we must say O(V+E).
The complexity falls out easily if you walk through the algorithm. Let Q be the FIFO queue that initially contains the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Also, each time we pop u we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total work is Sum over all vertices u of (1 + |adj[u]|), which is O(V+E), since the sum of the adjacencies is O(E) (2E for an undirected graph and E for a directed one).
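To make that accounting concrete, here is a minimal runnable sketch in Python (the adjacency-dict representation and the names bfs_with_counters, dequeues, edge_checks are just for this illustration). It counts how many times a vertex is dequeued and how many adjacency entries are examined, matching the Sum(1 + |adj[u]|) argument above.

from collections import deque

def bfs_with_counters(adj, source):
    """BFS over an adjacency-list dict {vertex: [neighbors]}.
    Returns (dequeues, edge_checks) to illustrate the Sum(1 + |adj[u]|) accounting."""
    visited = {source}
    q = deque([source])
    dequeues = 0
    edge_checks = 0
    while q:
        u = q.popleft()
        dequeues += 1              # each vertex is dequeued at most once -> O(V)
        for v in adj[u]:
            edge_checks += 1       # each adjacency entry is touched once -> O(E) (2E if undirected)
            if v not in visited:
                visited.add(v)
                q.append(v)
    return dequeues, edge_checks

# Example: an undirected triangle plus a pendant vertex, stored in both directions.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(bfs_with_counters(adj, 0))   # (4, 8): V dequeues, 2E edge examinations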
Consider a situation where you have a graph shaped like a tree, possibly with extra edges forming cycles; you start the search from the root and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry; they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E, which makes the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list, which is O(V); this means that the total time is O(E*V).
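To make the "directly indexed" point concrete, here is a tiny Python sketch (the names vertices, adj and neighbors are illustrative, not from the answer above): vertices live in an array, and each adjacency entry stores the neighbor's index, so following an edge is a constant-time lookup rather than a search over the vertex list.

# vertices[i] holds the data for vertex i; adj[i] holds the indices of i's neighbors,
# so following an edge is a single O(1) list lookup, never an O(V) search.
vertices = ["A", "B", "C", "D"]
adj = [
    [1, 2],   # A -> B, C
    [2],      # B -> C
    [3],      # C -> D
    [],       # D
]

def neighbors(u):
    # O(1) access to the edge list, O(1) per edge to reach the neighbor's entry
    return [vertices[v] for v in adj[u]]

print(neighbors(0))   # ['B', 'C']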

Runtime of the single-source shortest paths algorithm for directed acyclic graphs

Here is the Algorithm:
Topologically sort the vertices of G
Initialize-Single-Source(G, s)
for each vertex u, taken in topologically sorted order
    for each vertex v in G.Adjacent[u]
        Relax(u, v, w)
Topological sort has runtime O(V + E), where V is the number of vertices and E is the number of edges.
Initialize-Single-Source(G, s) has runtime O(V).
The main question is the double for loop: its running time is O(V + E). But I cannot understand why it's not O(V*E). For every vertex we go through every edge, and normally a nested loop (two for loops together) has complexity O(N^2), but in this case it's not true.
For each vertex u, you only iterate through the edges that go out from u. Each distinct edge is visited only once, and that's why the algorithm takes O(V+E) time.
This assumes you are using a graph representation (like adjacency lists, not a matrix) that allows quick access to every vertex's adjacent edges. The topological sort also requires this.
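For illustration, here is a short Python sketch of the whole pipeline, under the assumption that the graph is given as a dict mapping each vertex to a list of (successor, weight) pairs (the names dag_shortest_paths, adj, dist are hypothetical). Kahn's algorithm stands in for the topological sort; the relaxation loop then touches each edge exactly once.

from collections import deque

def dag_shortest_paths(adj, source):
    """Single-source shortest paths in a DAG.
    adj: {u: [(v, w), ...]}, every vertex present as a key; no cycles assumed."""
    # Topological sort (Kahn's algorithm): O(V + E)
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v, _ in adj[u]:
            indeg[v] += 1
    q = deque(u for u in adj if indeg[u] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    # Initialize-Single-Source: O(V)
    dist = {u: float("inf") for u in adj}
    dist[source] = 0

    # The "double loop": every edge (u, v) is relaxed exactly once -> O(V + E)
    for u in order:
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:   # Relax(u, v, w)
                dist[v] = dist[u] + w
    return dist

adj = {"s": [("a", 2), ("b", 6)], "a": [("b", 3)], "b": []}
print(dag_shortest_paths(adj, "s"))     # {'s': 0, 'a': 2, 'b': 5}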

The running time complexity of BFS for an unconnected, undirected graph

I know that for a connected undirected graph, the running time of BFS is O(V+E). But what if the graph is not connected? Then I assume we need to run a loop to check each vertex's status first (visited or not).
Here is simple pseudocode for my idea. Assume every vertex is colored white at the beginning as the sign of not visited, gray as visited, and black as done.
// BFS for an unconnected undirected graph
BFS(G):
    for each v in G:
        if v.color is white do:
            v.color = gray;
            enqueue(Q, v);
            while Q is not empty do:
                u = dequeue(Q);
                for each s adjacent to u with s.color white do:
                    s.color = gray;
                    enqueue(Q, s);
                u.color = black;
This is my guess at pseudocode for the unconnected undirected graph. I am having trouble figuring out the running time. I think it is still O(V + E), but I cannot really give a reasonable explanation.
Could you help me clarify the running time of this pseudocode? Or if my pseudocode is inefficient, please let me know a more efficient one.
Thanks!
Yes, the time complexity is still O(V + E).
Just check each loop and the maximum number of times it can possibly run.
The outer loop will have O(V) steps.
The loop processing the queue will also take O(V) steps in total, as each node in the graph is inserted into the queue at most once (when it is first discovered and colored gray).
The tricky part is the third loop, which checks the adjacent nodes of u. Note that we have already established that u will represent each node in the graph exactly once. If you are using an adjacency list for graph representation, this step will take O(E) time in total.
Total time complexity: O(V + E).
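Here is a runnable Python version of the same idea (the adjacency-dict representation and the names bfs_all_components, components are just for this sketch), using a visited set instead of the white/gray/black colors. The outer loop touches every vertex, and the inner BFS touches every adjacency entry at most once overall, so the total stays at O(V + E) even when the graph is disconnected.

from collections import deque

def bfs_all_components(adj):
    """BFS covering every connected component of an undirected graph.
    adj: {vertex: [neighbors]}. Returns a list of components (lists of vertices)."""
    visited = set()
    components = []
    for v in adj:                      # outer loop: O(V)
        if v not in visited:           # a "white" vertex: start a new BFS here
            visited.add(v)
            q = deque([v])
            component = []
            while q:                   # each vertex enters the queue at most once overall
                u = q.popleft()
                component.append(u)
                for s in adj[u]:       # each adjacency entry examined once overall: O(E)
                    if s not in visited:
                        visited.add(s)
                        q.append(s)
            components.append(component)
    return components

# Two components plus an isolated vertex.
adj = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
print(bfs_all_components(adj))   # [[0, 1], [2, 3], [4]]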
Yes you are correct. The outer loop iterates over all vertices at most once so is in O(|V|). The inner portion is the BFS for connected graphs, that is O(|V|+|E|). Then, overall it stays in O(|V|+|E|) since you look at every vertex and every edge at most O(1) times.
As a more general explanation: a graph has a linear number of vertices but can have a quadratic number of edges (think of a complete graph). So if the graph is disconnected, you simply have fewer edges to traverse.

Is the worst time complexity of BFS in a graph traversal n+2E?

I understand that time complexity of BFS in a graph traversal is O( V + E ) since every vertex and every edge will be explored in the worst case.
Well, is the exact time complexity v+2E?
Every vertex is explored once + every adjacent vertex is examined.
The sum of the degrees of all the vertices in a graph = number of edges * 2 = 2E.
Thus the time complexity is n+2E. Am I correct?
For an arbitrary graph, the time complexity is O(V+E); see Breadth-first search.
As stated in the link, depending on the topology of your graph, O(E) may vary from O(V) (if your graph is acyclic) to O(V^2) (if all vertices are connected with each other).
Therefore the time complexity varies from O(V + V) = O(V) to O(V + V^2) = O(V^2), according to the topology of your graph.
Besides, since |V| <= 2|E| (when there are no isolated vertices), O(V + E) = O(3E) = O(E) is also correct, but the bound is looser.
Assumptions
Let's assume that G is connected and undirected. If it's not connected, then you can apply the idea below to every connected component of G independently. In addition, let's assume that G is represented as adjacency lists, and that for every vertex v we can decide in O(1) time whether v was visited, for example using a lookup table.
Analysis
If you want to count the exact number of steps in the BFS you can observe that:
Since G is connected, BFS will visit every vertex exactly once, so we count |V| visits of nodes. Notice that during one visit you may perform more operations than just marking the current vertex visited, not counting the loop over its edges.
For every vertex v we want to count how many edges the BFS examines at this vertex.
You have to loop over all edges of v to perform the BFS; if you skip an edge, it's easy to show that BFS is no longer correct. So every edge is examined twice, once from each endpoint.
One question may arise here: do we really need to examine the edge (p, v) at vertex v, where p is the parent of v in the already constructed BFS tree, i.e. we came to v directly from p? Of course you don't have to consider this edge, but deciding to skip it also costs at least one additional operation:
for (v, u) in v.edges:
    if u == p:  # p is the parent of v in the already constructed tree
        continue
    if not visited[u]:
        BFS(u, parent=v)
This examines the same number of edges as the code below, but does more work per edge, because for all but one edge we run two if-statements rather than one.
for (v, u) in v.edges:
    if not visited[u]:  # if p is the parent of v, then p is already visited
        BFS(u, parent=v)
Conclusion
You may even devise a different method to skip the edge (v, p), but it always takes at least one operation, so it's wasted effort.

Algorithm to check if directed graph is strongly connected

I need to check if a directed graph is strongly connected, or, in other words, if all nodes can be reached by any other node (not necessarily through direct edge).
One way of doing this is running a DFS or BFS from every node and checking that all other nodes are still reachable.
Is there a better approach to do that?
Consider the following algorithm.
Start at a random vertex v of the graph G, and run a DFS(G, v).
If DFS(G, v) fails to reach every other vertex in the graph G, then there is some vertex u, such that there is no directed path from v to u, and thus G is not strongly connected.
If it does reach every vertex, then there is a directed path from v to every other vertex in the graph G.
Reverse the direction of all edges in the directed graph G.
Again run a DFS starting at v.
If the DFS fails to reach every vertex, then there is some vertex u, such that in the original graph there is no directed path from u to v.
On the other hand, if it does reach every vertex, then in the original graph there is a directed path from every vertex u to v.
Thus, if G "passes" both DFSs, it is strongly connected. Furthermore, since a DFS runs in O(n + m) time, this algorithm runs in O(2(n + m)) = O(n + m) time, since it requires 2 DFS traversals.
Tarjan's strongly connected components algorithm (or Gabow's variation) will of course suffice; if there's only one strongly connected component, then the graph is strongly connected.
Both are linear time.
As with a normal depth first search, you track the status of each node: new, seen but still open (it's in the call stack), and seen and finished. In addition, you store the depth when you first reached a node, and the lowest such depth that is reachable from the node (you know this after you finish a node). A node is the root of a strongly connected component if the lowest reachable depth is equal to its own depth. This works even if the depth by which you reach a node from the root isn't the minimum possible.
To check just for whether the whole graph is a single SCC, initiate the dfs from any single node, and when you've finished, if the lowest reachable depth is 0, and every node was visited, then the whole graph is strongly connected.
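For illustration, here is a small Python sketch of that single-DFS check. It is not the answerer's exact formulation: it uses Tarjan's standard bookkeeping (a global discovery index and an explicit component stack) rather than the DFS depth, and the adjacency-dict representation and the name is_single_scc are assumptions for this example.

def is_single_scc(adj):
    """Single strongly connected component check via one Tarjan-style DFS.
    adj: {vertex: [successors]} for a non-empty directed graph; recursive,
    so it assumes the graph is small enough for Python's recursion limit."""
    index = {}                 # discovery order of each visited node
    low = {}                   # smallest discovery index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    counter = 0
    scc_roots = 0

    def dfs(u):
        nonlocal counter, scc_roots
        index[u] = low[u] = counter
        counter += 1
        stack.append(u)
        on_stack.add(u)
        for v in adj[u]:
            if v not in index:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:            # edge back into the still-open part of the DFS
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:             # u is the root of one strongly connected component
            scc_roots += 1
            while True:                    # pop that whole component off the stack
                w = stack.pop()
                on_stack.discard(w)
                if w == u:
                    break

    dfs(next(iter(adj)))
    # Single SCC iff the one DFS reached every vertex and found exactly one component root.
    return len(index) == len(adj) and scc_roots == 1

print(is_single_scc({0: [1], 1: [2], 2: [0]}))   # True: directed 3-cycle
print(is_single_scc({0: [1], 1: [], 2: [0]}))    # False: 1 cannot reach 0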
To check if every node has both paths to and from every other node in a given graph:
1. DFS/BFS from all nodes:
Tarjan's algorithm assigns every node a depth d[i]. Initially, the root has the smallest depth, and we do post-order DFS updates d[i] = min(d[j]) over every neighbor j of i. Actually, BFS also works fine with the reduction rule d[i] = min(d[j]) here.
function dfs(i)
    d[i] = i
    mark i as visited
    for each neighbor j of i:
        if j is not visited then dfs(j)
        d[i] = min(d[i], d[j])
If there is a forward path from u to v, then d[u] <= d[v]. In an SCC, d[v] <= d[u] <= d[v]; thus all the nodes in an SCC will have the same depth. To tell whether a graph is a single SCC, we check whether all nodes have the same d[i].
2. Two DFS/BFS from a single node:
This is a simplified version of Kosaraju's algorithm. Starting from the root, we check whether every node can be reached by DFS/BFS. Then we reverse the direction of every edge and check whether every node can be reached from the same root again. See the C++ code.
You can calculate the all-pairs shortest paths and see if any distance is infinite.
Tarjan's Algorithm has been already mentioned. But I usually find Kosaraju's Algorithm easier to follow even though it needs two traversals of the graph. IIRC, it is also pretty well explained in CLRS.
test-connected(G)
{
    choose a vertex x
    make a list L of vertices reachable from x,
    and another list K of vertices to be explored.
    initially, L = K = x.
    while K is nonempty
        find and remove some vertex y in K
        for each edge (y, z)
            if (z is not in L)
                add z to both L and K
    if L has fewer than n items
        return disconnected
    else return connected
}
You can use Kosaraju's simple DFS-based algorithm, which does two DFS traversals of the graph:
The idea is that if every node can be reached from a vertex v, and every node can reach v, then the graph is strongly connected.
In step 2 of the algorithm, we check whether all vertices are reachable from v. In step 5, we check whether all vertices can reach v (in the reversed graph, if all vertices are reachable from v, then all vertices can reach v in the original graph).
Algorithm:
1) Initialize all vertices as not visited.
2) Do a DFS traversal of the graph starting from any arbitrary vertex v. If the DFS traversal doesn't visit all vertices, then return false.
3) Reverse all arcs (i.e. find the transpose or reverse of the graph).
4) Mark all vertices as not visited in the reversed graph.
5) Do a DFS traversal of the reversed graph starting from the same vertex v (same as step 2). If the DFS traversal doesn't visit all vertices, then return false. Otherwise return true.
Time complexity: The time complexity of the above implementation is the same as that of depth-first search, which is O(V+E) if the graph is represented using adjacency lists.
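A minimal Python sketch of the five steps above (the adjacency-dict representation and the names is_strongly_connected and reaches_all are just for illustration), using an iterative DFS for each pass:

def is_strongly_connected(adj):
    """Two-pass strong connectivity check. adj: {vertex: [successors]}, non-empty.
    Pass 1: DFS from an arbitrary vertex v on the graph (step 2).
    Pass 2: DFS from v on the reversed graph (steps 3-5)."""
    def reaches_all(graph, start):
        seen = {start}
        stack = [start]
        while stack:                     # iterative DFS, O(V + E)
            u = stack.pop()
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(graph)

    v = next(iter(adj))
    if not reaches_all(adj, v):          # some vertex is unreachable from v
        return False
    reversed_adj = {u: [] for u in adj}  # step 3: transpose the graph
    for u in adj:
        for w in adj[u]:
            reversed_adj[w].append(u)
    return reaches_all(reversed_adj, v)  # can every vertex reach v in the original graph?

print(is_strongly_connected({"a": ["b"], "b": ["c"], "c": ["a"]}))   # True
print(is_strongly_connected({"a": ["b"], "b": [], "c": ["a"]}))      # False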
One way of doing this would be to generate the Laplacian matrix for the graph, then calculate the eigenvalues, and finally count the number of zeros. The graph is strongly connected if there exists only one zero eigenvalue.
Note: Pay attention to the slightly different method for creating the Laplacian matrix for directed graphs.
The algorithm to check if a graph is strongly connected is quite straightforward. But why does the algorithm below work?
Algorithm: suppose there is a graph with vertices [A, B, C, ..., Z]
1. Choose any random node, say J, and perform DFS from it. If all the nodes are reachable, then continue to step 2.
2. Reverse the directions of the edges of the graph by taking its transpose.
3. Again run DFS from node J and check whether all the nodes are visited. If yes, then the graph is strongly connected; return true.
Performing step 1 makes sense because we have to check whether we can reach all the nodes from that node. After this, the next logical step could be to
i) do this for all other nodes, or
ii) try to reach node J from every other node, because once you reach node J, you are sure that you can reach every other node thanks to step 1.
This is what we are trying to do in steps 2 and 3. If in the transposed graph node J is able to reach all other nodes, then this implies that in the original graph all other nodes can reach J.
