The running time complexity of BFS for a disconnected undirected graph

I know that for a connected undirected graph, the running time of BFS is O(V+E). But what if the graph is not connected? Then I assume we need a loop that first checks each vertex's state (visited or not).
Here is simple pseudocode for my idea. Assume every vertex starts with the color white, meaning not visited. Gray means visited, and black means finished.
// BFS for a disconnected undirected graph
BFS(G):
    for each v in G:
        if (v.color is white) do:
            v.color = gray;
            enqueue(Q, v);
            while Q is not empty do:
                u = dequeue(Q);
                for each s adjacent to u with color white do:
                    s.color = gray;
                    enqueue(Q, s);
                u.color = black;
This is my guess at pseudocode for a disconnected undirected graph. I am having trouble figuring out the running time. I think it is still O(V + E), but I cannot give a solid explanation.
How can I justify the running time of this pseudocode? Or, if my pseudocode is inefficient, please show me a more efficient version.
Thanks!

Yes, the time complexity is still O(V + E).
Just check each loop and the maximum number of times it can run.
The outer loop takes O(V) steps.
The loop over the queue also takes O(V) steps in total, because each node is inserted into the queue only once (while it is still white).
The tricky part is the third loop, which checks the nodes adjacent to u. We have already established that u takes the value of each node in the graph exactly once. If you use an adjacency list to represent the graph, this loop therefore takes O(E) time in total, summed over all values of u.
Total time complexity: O(V + E).
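The loop counts above can be checked with a small sketch. This is only an illustration, assuming the graph is given as a Python dict mapping each vertex to its neighbor list; the function name `bfs_all_components`, the counters, and the example graph are hypothetical and not part of the question.

```python
from collections import deque

def bfs_all_components(adj):
    """BFS over every component of a graph given as {vertex: [neighbors]}.

    Returns (order, dequeues, edge_checks) so the loop counts can be inspected.
    """
    visited = {v: False for v in adj}   # O(V) initialization ("white")
    order = []
    dequeues = 0      # iterations of the queue loop, summed over all starts
    edge_checks = 0   # iterations of the neighbor loop, summed over all vertices
    for start in adj:                   # outer loop: V iterations
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            dequeues += 1
            order.append(u)
            for s in adj[u]:
                edge_checks += 1
                if not visited[s]:
                    visited[s] = True
                    queue.append(s)
    return order, dequeues, edge_checks

# Hypothetical example: two components plus an isolated vertex (V = 6, E = 3).
graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
order, dequeues, edge_checks = bfs_all_components(graph)
```

For this graph the queue loop runs V = 6 times and the neighbor loop runs 2E = 6 times, even though the outer loop restarts BFS three times.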

Yes, you are correct. The outer loop visits each vertex once, so it is in O(|V|). The inner portion is the BFS for connected graphs, which is O(|V|+|E|) summed over all components. Overall it stays in O(|V|+|E|), since you look at every vertex and every edge O(1) times.
As a more general explanation: a graph has a linear number of vertices but can have a quadratic number of edges (think of a complete graph). So if the graph is disconnected, you simply have fewer edges to traverse.

Related

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS, which is O(V+E).
However, I still struggle to understand why the time complexity is O(V+E) and not O(V*E).
I know that O(V+E) stands for O(max[V,E]), and my only guess is that it has something to do with the density of the graph and not with the algorithm itself, unlike, say, Merge Sort, where the time complexity is always O(n*logn).
Examples I've thought of are:
A directed graph with |E| = |V|-1, where the time complexity will be O(V)
A directed graph with |E| = |V|*(|V|-1), where the complexity would in fact be O(|E|) = O(|V|*|V|), as each vertex has an outgoing edge to every other vertex besides itself
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, you can always say it is O(E). The reason to include V in the time complexity is to cover graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they don't have edges. And so we must say O(V+E).
The complexity follows easily if you walk through the algorithm. Let Q be the FIFO queue, which initially contains the source node. BFS basically does the following:
while Q not empty
    pop u from Q
    for each adjacency v of u
        if v is not marked
            mark v
            push v into Q
Since each node is added once and removed once, the while loop runs O(V) times. Each time we pop u, we perform |adj[u]| operations, where |adj[u]| is the number of adjacencies of u.
Therefore the total complexity is the sum of (1 + |adj[u]|) over all vertices u in V, which is O(V+E), since the sum of adjacencies is O(E) (2E for an undirected graph and E for a directed one).
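Written out, the sum looks as follows. This is a sketch of the same argument, assuming every vertex is reached (otherwise the left-hand side is only smaller); it uses `cases` and `\text` from amsmath.

```latex
\sum_{u \in V} \bigl(1 + |\mathrm{adj}[u]|\bigr)
  = |V| + \sum_{u \in V} |\mathrm{adj}[u]|
  = \begin{cases}
      |V| + 2|E| & \text{undirected graph} \\
      |V| + |E|  & \text{directed graph}
    \end{cases}
  = O(|V| + |E|)
```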
Consider a situation where you have a tree-like graph, maybe even with cycles. You start the search from the root, and your target is the last leaf. In this case you will traverse all the edges before you reach your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry. They never have to search through the vertex array/list for a matching entry; they can look it up directly. This ensures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E, which makes the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list, which is O(V). The total time is then O(E*V).
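A rough sketch of the difference, assuming vertices are numbered 0..V-1. The helper names `neighbors_direct` and `neighbors_by_search` and the path graph are hypothetical, chosen only to make the lookup cost visible.

```python
def neighbors_direct(adj, u):
    """Adjacency list indexed by vertex number: adj[u] is found in O(1)."""
    return adj[u]

def neighbors_by_search(vertex_list, label):
    """Vertex list of (label, neighbor_labels) pairs with no index.

    Finding the entry for `label` scans the list, which costs O(V) per lookup.
    Returns the neighbor list and the number of entries scanned.
    """
    steps = 0
    for name, nbrs in vertex_list:
        steps += 1
        if name == label:
            return nbrs, steps
    raise KeyError(label)

# Hypothetical path graph 0 - 1 - 2 - 3 (V = 4, E = 3).
adj = [[1], [0, 2], [1, 3], [2]]
vertex_list = [(i, nbrs) for i, nbrs in enumerate(adj)]

# Direct indexing: one O(1) step per edge entry, 2E steps in total.
edge_entries = sum(len(neighbors_direct(adj, u)) for u in range(len(adj)))

# Unindexed list: following each edge entry requires a scan of the vertex list.
scan_steps = 0
for _, nbrs in vertex_list:
    for v in nbrs:
        _, steps = neighbors_by_search(vertex_list, v)
        scan_steps += steps
```

Here `edge_entries` is 2E = 6, while `scan_steps` is 15, because every edge entry pays for a scan that can be as long as V.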

Why is the complexity of BFS O(V+E) instead of O(E)?

This is a generic BFS implementation:
For a connected graph with V nodes and E total number of edges, we know that every edge will be considered twice in the inner loop. So if the total number of iterations in the inner loop of BFS is going to be 2 * number of edges E, isn't the runtime going to be O(E) instead?
This is a case where one needs to look a little deeper at the implementation. In particular, how do I determine if a node is visited or not?
The traditional algorithm does this by coloring the vertices. All vertices are colored white at first, and they get colored black as they are visited. Thus visitation can be determined simply by looking at the color of the vertex. If you use this approach, then you have to do O(V) worth of initialization work setting the color of each vertex to white at the start.
You could manage your colors differently. You could maintain a data structure containing all visited nodes. If you did this, you could avoid the O(V) initialization cost. However, you will pay that cost elsewhere in the data structure. For example, if you stored them all in a balanced tree, each "if w is not visited" check now costs O(log V).
This obviously gives you a choice. You can have O(V+E) using the traditional coloring approach, or you can have O(E log V) by storing this information in your own data structure.
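The two options can be sketched side by side. This is an illustration under assumptions: the function names, the `neighbors` callable, and the example graphs are hypothetical. Note also that Python's built-in `set` is a hash table with expected O(1) membership tests, not a balanced tree, so the second variant only stands in for the "separate visited structure" idea and does not show the O(log V) cost.

```python
from collections import deque

def bfs_color_array(adj, source):
    """Traditional approach: one flag per vertex, initialized up front."""
    visited = [False] * len(adj)        # O(V) initialization
    visited[source] = True
    queue = deque([source])
    reached = []
    while queue:
        u = queue.popleft()
        reached.append(u)
        for w in adj[u]:
            if not visited[w]:          # O(1) check
                visited[w] = True
                queue.append(w)
    return reached

def bfs_visited_set(neighbors, source):
    """No per-vertex initialization: only reached vertices are ever stored."""
    visited = {source}
    queue = deque([source])
    reached = []
    while queue:
        u = queue.popleft()
        reached.append(u)
        for w in neighbors(u):
            if w not in visited:        # cost depends on the structure used
                visited.add(w)
                queue.append(w)
    return reached

# Six vertices, two edges: vertices 3, 4, 5 are isolated.
adj = [[1, 2], [0], [0], [], [], []]
reached_array = bfs_color_array(adj, 0)

# Same edges stored sparsely; vertices without edges never appear at all.
sparse_edges = {0: [1, 2], 1: [0], 2: [0]}
reached_set = bfs_visited_set(lambda u: sparse_edges.get(u, ()), 0)
```

Both calls reach the same vertices. The first pays for every vertex during initialization; the second touches only what the search actually reaches.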
You specify a connected graph in your problem. In this case, O(V+E) == O(E) because the number of vertices can never be more than E+1. However, the time complexity of BFS is typically given with respect to an arbitrary graph, which can include a very sparse graph.
If a graph is sufficiently sparse (say, a million vertices and five edges), the cost of initialization may be great enough that you want to switch to an O(E log V) algorithm. However, such graphs are pretty rare in a practical setting. In practice, the traditional approach (giving each vertex a color) is so blindingly fast compared to the fancier data structures that you choose the traditional coloring scheme for everything except the most extraordinarily sparse graphs.
If you maintained a dedicated color property on your vertices, with the invariant that all nodes are black between algorithm invocations, you could drop the cost to O(E) by doing each BFS twice. On the first pass you would set every node you reach to white, and on the second pass you would turn them all black again. If you had a very sparse graph, this could be more efficient.
Well, let's break it up into easy pieces...
You keep a visited array, and by looking it up you decide whether to push a node into the queue or not. Once a node is visited, you don't push it again. So how many nodes get pushed into the queue? At most V nodes, so the complexity of this part is O(V).
Now, each time you take a node out of the queue, you visit all of its neighboring nodes. Over all V nodes, how many neighbors will you come across? It is the number of edges if the graph is directed (unidirectional), or 2 * the number of edges if the graph is undirected (bidirectional). So the complexity of this part is O(E) for a directed graph and O(2 * E) for an undirected one.
So the total complexity is O(V + E) or O(V + 2 * E), or in general O(V + E).
Because a graph might have fewer edges than vertices.
Consider this graph:
1 ---- 2
|
|
3 ---- 4
There are 4 vertices but only 3 edges, and in BFS you have to traverse each and every vertex. That's why the time complexity is O(V+E): it accounts for both V and E.

Why is the complexity of BFS O(V+E) instead of O(V*E)?

Some pseudocode here (disregard my style)
Starting from v1(enqueued):
function BFS(queue Q)
    v2 = dequeue Q
    enqueue all unvisited connected nodes of v2 into Q
    BFS(Q)
end // maybe minor problems here
Since there are V vertices in the graph, these V vertices are connected by E edges, and getting the connected nodes (equivalent to visiting the connected edges) happens in the inner loop (the outer loop is the recursion itself), it seems to me that the complexity should be O(V*E) rather than O(V+E). Can anyone explain this to me?
E is not the number of edges adjacent to each vertex; it's the total number of edges in the graph. Defining it this way is useful because you don't necessarily have the same number of edges on every single vertex.
Since each edge gets visited a constant number of times by the time the BFS ends, you get O(E) complexity from that part. Then you add the O(V) for visiting each vertex once and get O(V + E) in total.
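To make this concrete, here is a small sketch that records how many inner-loop iterations each vertex causes. The function name `inner_loop_counts` and the star-shaped example graph are hypothetical; the graph is assumed to be a dict of neighbor lists.

```python
from collections import deque

def inner_loop_counts(adj, source):
    """Run BFS and record how many neighbor-loop iterations each vertex causes."""
    visited = {source}
    queue = deque([source])
    counts = {}
    while queue:
        u = queue.popleft()
        counts[u] = 0
        for v in adj[u]:
            counts[u] += 1          # one iteration per edge entry of u, not E
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return counts

# Hypothetical star graph: vertex 0 is joined to 1..5, so V = 6 and E = 5.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
counts = inner_loop_counts(star, 0)
total_inner_iterations = sum(counts.values())
```

The center contributes 5 iterations and each leaf contributes 1, so the total is 2E = 10, well below V*E = 30.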

Is the worst time complexity of BFS in a graph traversal n+2E?

I understand that the time complexity of BFS in a graph traversal is O(V + E), since every vertex and every edge will be explored in the worst case.
Is the exact time complexity V + 2E?
Every vertex is explored once, plus all of its adjacent vertices.
The sum of the degrees of all the vertices in a graph = number of edges * 2 = 2E.
Thus the time complexity is n + 2E, where n is the number of vertices. Am I correct?
For an arbitrary graph, the time complexity is O(V+E); see Breadth-first search.
As stated in the link, depending on the topology of your graph, O(E) may vary from O(V) (if your graph is connected and acyclic) to O(V^2) (if all vertices are connected with each other).
Therefore the time complexity varies from O(V + V) = O(V) to O(V + V^2) = O(V^2), depending on the topology of your graph.
Besides, since |V| <= 2 |E| when no vertex is isolated, O(3E) = O(E) is also correct, but the bound is looser.
Assumptions
Let's assume that G is connected and undirected. If it's not connected, you can apply the idea below to every connected component of G independently. In addition, let's assume that G is represented as adjacency lists and that for every vertex v we can decide whether v was visited in O(1) time, for example using a lookup table.
Analysis
If you want to count the exact number of steps in the BFS you can observe that:
Since G is connected, BFS will visit every vertex exactly once, so we count |V| vertex visits. Notice that a single visit may involve more operations than just marking the current vertex as visited, even without counting the loop over its edges.
For every vertex v, we want to count how many edges the BFS examines at this vertex.
You have to loop over all edges of v to perform the BFS. If you skip one edge, it's easy to show that BFS is not correct. So every edge is examined twice, once from each endpoint.
One question may arise here. Someone could ask whether there is a need to examine the edge (p, v) at vertex v, where p is the parent of v in the BFS tree constructed so far, i.e. we came to v directly from p. Of course you don't have to consider this edge, but deciding to skip it also costs at least one additional operation:
for (v, u) in v.edges:
    if u == p:  # p is the parent of v in already constructed tree
        continue
    if not visited[u]:
        BFS(u, parent=v)
It examines the same number of edges as the code below, but does more work, because for all but one edge we run two if-statements rather than one.
for (v, u) in v.edges:
    if not visited[u]:  # if p is the parent of v, then p is already visited
        BFS(u, parent=v)
Conclusion
You may even develop a different method to skip edge (v, p), but it always takes at least one operation, so it's a wasteful effort.
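The comparison between the two loops can be counted directly. The sketch below is an assumption-laden illustration: it uses an iterative, queue-based BFS with a `parent` dict instead of the recursive call in the snippets, and the name `bfs_checks`, the `skip_parent` flag, and the square-shaped graph are hypothetical.

```python
from collections import deque

def bfs_checks(adj, source, skip_parent):
    """Return (edges_examined, condition_evaluations) for one BFS run."""
    visited = {source}
    parent = {source: None}
    queue = deque([source])
    edges_examined = 0
    condition_evaluations = 0
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            edges_examined += 1
            if skip_parent:
                condition_evaluations += 1      # the `u == p` test
                if u == parent[v]:
                    continue
            condition_evaluations += 1          # the `not visited[u]` test
            if u not in visited:
                visited.add(u)
                parent[u] = v
                queue.append(u)
    return edges_examined, condition_evaluations

# Hypothetical 4-cycle: edges 0-1, 0-2, 1-3, 2-3 (V = 4, E = 4).
square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
plain = bfs_checks(square, 0, skip_parent=False)
with_skip = bfs_checks(square, 0, skip_parent=True)
```

Both variants examine 2E = 8 edge entries. The plain loop evaluates 8 conditions, while the parent-skipping loop evaluates 13, which matches the point that the extra test costs more than it saves.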

Worst Case Time Complexity of Depth First Search

I know the answer to this particular question is O(V + E) and for a Graph like a tree, it makes sense because each Vertex is being explored once only.
However let's say there is a cycle in the graph.
For example, let's take up an undirected graph with four vertices A-B-C-D.
A is connected to both B and C, and both B and C are connected to D. So there are four edges in total: A->B, A->C, B->D, C->D, and vice versa.
Let's do DFS(A).
It will explore B first, then B's neighbor D, then D's neighbor C. After that, C will not have any unexplored edges, so it will come back to D, then B, then A.
Then A will traverse its second edge and try to explore C; since C is already explored, it will not do anything and DFS will end.
But over here Vertex "C" has been traversed twice, not once. Clearly worst case time complexity can be directly proportional to V.
Any ideas?
If you do not maintain a visited set that you use to avoid revisiting already visited nodes, DFS is not O(V+E). In fact, it is not a complete algorithm: it might not even find a path when one exists, because it can get stuck in an infinite loop.
Note that for infinite graphs, if you are looking for a path from s to t, DFS is not guaranteed to complete even with a visited set, since you might get stuck in an infinite branch.
If you are interested in keeping DFS's advantage of efficient space consumption while still being complete, you might use iterative deepening DFS, but it will not trivially solve the problem if you are looking to discover the whole graph rather than a path to a specific node.
EDIT: DFS pseudo code with visited set.
DFS(v,visited):
    for each u such that (v,u) is an edge:
        if (u is not in visited):
            visited.add(u)
            DFS(u,visited)
It is easy to see that you invoke the recursion on a vertex if and only if it is not yet visited, thus the answer is indeed linear in the number of vertices and edges.
You can visit each vertex and edge of the graph a constant number of times and still be O(V+E). An alternative way of looking at it is that the cost is charged to the edge, not to the vertex.
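A short sketch of the pseudocode on the A-B-C-D example from the question, with bookkeeping added to count recursive calls and edge examinations. The `calls` and `edge_checks` containers are hypothetical additions, and the sketch assumes the start vertex is put into `visited` before the first call.

```python
def dfs(adj, v, visited, calls, edge_checks):
    """Recursive DFS that records calls per vertex and every edge examination."""
    calls[v] = calls.get(v, 0) + 1
    for u in adj[v]:
        edge_checks.append((v, u))      # looking at an edge costs O(1)
        if u not in visited:
            visited.add(u)
            dfs(adj, u, visited, calls, edge_checks)

# The graph from the question: A-B, A-C, B-D, C-D.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
visited = {"A"}                         # assumption: mark the start up front
calls = {}
edge_checks = []
dfs(adj, "A", visited, calls, edge_checks)
```

Vertex C is looked at twice, once from D and once from A, but `dfs` is called on it only once. The second look is one of the 2E = 8 edge examinations, so the total work is still proportional to V + E.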

Resources