Time Complexity of BFS and DFS for bot Matrix and Adjacency List - algorithm

enter image description here
enter image description here
above are the pseudocode of BFS and DFS.
Now with my calculation I think time complexity for both the code will be O(n), but I also have another confusion that it might be O(V+E) where V stands for Vertex and E stands for Edges. Can anyone give me a detailed time complexity of both the pseudocode.
So in short, what will be the time complexity of the BFS and DFS on both Matrix and Adjacency List.

Let us analyze the time complexity of BFS first for adjacency list implementation.
For breadth-first search, that's what we do:
Start from a node and mark it as visited. Then mark all of the neighbors of that node as visited and add them to a queue. Then fetch the next node from the queue and perform the same operation until the queue is empty. If queue is empty but there are still unvisited nodes, call the BFS function again for that node.
When we are at a node, we check each neighbor of that node to fill up the queue. If a neighbor is already visited (visited[int(neighbor) - 1] = 1), we do not add it to the queue. Neighbor of a node is another node connected to it by an edge, therefore checking all neighbors means checking all the edges. This makes our time complexity O(E). Also since we add each node to the queue (and pop it later), it makes the time complexity O(V).
So which one should we take?
Well, we should take the maximum of E and V. That's why we say O(V+E). If one of them is larger than the other, then smaller can be seen as a constant.
For example, if we have a connected graph with N many nodes, we'll have N*(N-1) edges. At each node, we will check all the neighbors, which makes N*(N-1) many checks. Therefore time complexity will be max(N, N*(N-1)) = N*(N-1) = O(N^2)
For example, if we have a sparse graph with N many nodes, and say sqrt(N) many edges, we have to say time complexity of BFS should be O(N).
Same logic can be applied for DFS. You visit each node and check each edge to dive into the depths of the graph. And again it makes it O(V+E).
As to your assumption, it is partially correct. However, as I explained above we cannot say that time complexity will always be O(N). (I assume N is the number of vertices, you didn't specify that in your question.)
Notice that these are for the adjacency list implementation.
For adjacency matrix implementation, to check neighbors of a node, we have to check all the columns corresponding to the related row, which makes O(V). And we have to do it for all vertices, therefore it is O(V^2).
So, for matrix implementation, time complexity is not dependent on the number of edges. However in most cases O(V+E) << O(V^2), therefore prefer adjacency list implementation.

Related

Time complexity of DFS on DAG without using a visited array

What would be the worst-case time complexity for Depth first search on a Directed Acyclic graph with V nodes and E edges if I do not use something like a visited array and traverse same nodes multiple times?
I believe that the worst-case graph would be one where you repeatedly form acyclic "diamonds" like A->B, A->C, B->D, C->D, such that when at A, you're presented with two paths to get to D. Chaining these together, you'd end up with an exponential number of paths to explore. Specifically, you'd end up with O(2^(n/3)) time complexity to explore such a graph.
I'm not entirely certain that a 2-node choice is the worst, but it does at least provide an upper bound on the worst case time.

Time Complexity Analysis of BFS

I know that there are a ton of questions out there about the time complexity of BFS which is : O(V+E)
However I still struggle to understand why is the time complexity O(V+E) and not O(V*E)
I know that O(V+E) stands for O(max[V,E]) and my only guess is that it has something to do with the density of the graph and not with the algorithm itself unlike say Merge Sort where it's time complexity is always O(n*logn).
Examples I've thought of are :
A Directed Graph with |E| = |V|-1 and yeah the time complexity will be O(V)
A Directed Graph with |E| = |V|*|V-1| and the complexity would in fact be O(|E|) = O(|V|*|V|) as each vertex has an outgoing edge to every other vertex besides itself
Am I in the right direction? Any insight would be really helpful.
Your "examples of thought" illustrate that the complexity is not O(V*E), but O(E). True, E can be a large number in comparison with V, but it doesn't matter when you say the complexity is O(E).
When the graph is connected, then you can always say it is O(E). The reason to include V in the time complexity, is to cover for the graphs that have many more vertices than edges (and thus are disconnected): the BFS algorithm will not only have to visit all edges, but also all vertices, including those that have no edges, just to detect that they don't have edges. And so we must say O(V+E).
The complexity comes off easily if you walk through the algorithm. Let Q be the FIFO queue where initially it contains the source node. BFS basically does the following
while Q not empty
pop u from Q
for each adjacency v of u
if v is not marked
mark v
push v into Q
Since each node is added once and removed once then the while loop is done O(V) times. Also each time we pop u we perform |adj[u]| operations where |adj[u]| is the number of
adjacencies of u.
Therefore the total complexity is Sum (1+|adj[u]|) over all V which is O(V+E) since the sum of adjacencies is O(E) (2E for undirected graph and E for a directed one)
Consider a situation when you have a tree, maybe even with cycles, you start search from the root and your target is the last leaf of your tree. In this case you will traverse all the edges before you get into your destination.
E.g.
0 - 1
1 - 2
0 - 2
0 - 3
In this scenario you will check 4 edges before you actually find a node #3.
It depends on how the adjacency list is implemented. A properly implemented adjacency list is a list/array of vertices with a list of related edges attached to each vertex entry.
The key is that the edge entries point directly to their corresponding vertex array/list entry, they never have to search through the vertex array/list for a matching entry, they can just look it up directly. This insures that the total number of edge accesses is 2E and the total number of vertex accesses is V+2E. This makes the total time O(E+V).
In improperly implemented adjacency lists, the vertex array/list is not directly indexed, so to go from an edge entry to a vertex entry you have to search through the vertex list which is O(V), which means that the total time is O(E*V).

Why is the complexity of BFS O(V+E) instead of O(E)? [duplicate]

This question already has answers here:
Why is time complexity for BFS/DFS not simply O(E) instead of O(E+V)?
(2 answers)
Breadth First Search time complexity analysis
(8 answers)
Closed 2 years ago.
This is a generic BFS implementation:
For a connected graph with V nodes and E total number of edges, we know that every edge will be considered twice in the inner loop. So if the total number of iterations in the inner loop of BFS is going to be 2 * number of edges E, isn't the runtime going to be O(E) instead?
This is a case where one needs to look a little deeper at the implementation. In particular, how do I determine if a node is visited or not?
The traditional algorithm does this by coloring the vertices. All vertices are colored white at first, and they get colored black as they are visited. Thus visitation can be determined simply by looking at the color of the vertex. If you use this approach, then you have to do O(V) worth of initialization work setting the color of each vertex to white at the start.
You could manage your colors differently. You could maintain a data structure containing all visited nodes. If you did this, you could avoid the O(V) initialization cost. However, you will pay that cost elsewhere in the data structure. For example, if you stored them all in a balanced tree, each if w is not visited now costs O(log V).
This obviously gives you a choice. You can have O(V+E) using the traditional coloring approach, or you can have O(E log V) by storing this information in your own data structure.
You specify a connected graph in your problem. In this case, O(V+E) == O(E) because the number of vertices can never be more than E+1. However, the time complexity of BFS is typically given with respect to an arbitrary graph, which can include a very sparse graph.
If a graph is sufficiently sparse (say, a million vertices and five edges), the cost of initialization may be great enough that you want to switch to a O(E ln V) algorithm. However, these are pretty rare in a practical setting. In a practical setting, the speed of the traditional approach (giving each vertex a color) is just so blinding fast compared to the more fancy data structures that you choose this traditional coloring scheme for everything except the most extraordinarily sparse graphs.
If you maintained a dedicated color property on your vertices with an invariant rule that all nodes are black between algotihm invocations, you could drop the cost to O(E) by doing each BFS twice. On your first pass, you could set them all to white, and then do a second pass to turn them all black. If you had a very sparse graph, this could be more efficient.
Well, let's break it up into easy pieces...
You've kept a visited array, and by looking it up, you decide whether to push a node into the queue or not. Once visited, you don't push it again. So, how many nodes get pushed into the queue: (of course) V nodes. And it's complexity is O(V).
Now, each time, you take out a node from queue and visit all of its neighboring nodes. Now, following this way, for all of V nodes, how many node you'll come across. Well, it's the number of edges if the graph is unidirectional, or, 2 * number of edges if the graph is bidirectional. So, the complexity would be O(E) for unidirectional and O(2 * E) for bidirectional.
So, the ultimate(i.e. total) complexity would be O(V + E) or O(V + 2 * E) or generally, we may say O(v + E).
Because there might be graph having edges less than number of vertices.
Consider this graph:
1 ---- 2
|
|
3 ---- 4
There are 4 vertices but only 3 edges, and in BFS you have to traverse each and every vertex. Thatswhy time complexity is O(V+E) as it considers both V as well as E.

Graph In-degree Calculation from Adjacency-list

I came across this question in which it was required to calculate in-degree of each node of a graph from its adjacency list representation.
for each u
for each Adj[i] where i!=u
if (i,u) ∈ E
in-degree[u]+=1
Now according to me its time complexity should be O(|V||E|+|V|^2) but the solution I referred instead described it to be equal to O(|V||E|).
Please help and tell me which one is correct.
Rather than O(|V||E|), the complexity of computing indegrees is O(|E|). Let us consider the following pseudocode for computing indegrees of each node:
for each u
indegree[u] = 0;
for each u
for each v \in Adj[u]
indegree[v]++;
First loop has linear complexity O(|V|). For the second part: for each v, the innermost loop executes at most |E| times, while the outermost loop executes |V| times. Therefore the second part appears to have complexity O(|V||E|). In fact, the code executes an operation once for each edge, so a more accurate complexity is O(|E|).
According to http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Graphs.html, Section 4.2, with an adjacency list representation,
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
So, in the notation used here, the time complexity of computing the in-degree of a node is O(|V| + |E|).
This can be reduced at the cost of additional space of using extra space, however. The Wiki also states that
adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
An example of a package which implements this approach is the Python package Networkx. As you can see from the constructor of the DiGraph object for directional graphs, networkx keeps track of both self._succ and self._pred, which are dictionaries representing the successors and predecessors of each node, respectively. This allows it to compute each node's in_degree efficiently.
O(|V|+|E|) is the correct answer, because you visit each vertex in O(|V|) and each time you visit a fraction of the edges so O(|E|) in total, also usually |E|>>|V| so O(|E|) is also correct

Time complexity of creating a minimal spanning tree if the number of edges is known

Suppose that the number of edges of a connected graph is known and the weight of each edge is distinct, would it possible to create a minimal spanning tree in linear time?
To do this we must look at each edge; and during this loop there can contain no searches otherwise it would result in at least n log n time. I'm not sure how to do this without searching in the loop. It would mean that, somehow we must only look at each edge once, and decide rather to include it or not based on some "static" previous values that does not involve a growing data structure.
So.. let's say we keep the endpoints of the node in question, then look at the next node, if the next node has the same vertices as prev, then compare the weight of prev and current node and keep the lower one. If the current node's endpoints are not equal to prev, then it is in a different component .. now I am stuck because we cannot create a hash or array to keep track of the component nodes that are already added while look through each edge in linear time.
Another approach I thought of is to find the edge with the minimal weight; since the edge weights are distinct this edge will be part of any MST. Then.. I am stuck. Since we cannot do this for n - 1 edges in linear time.
Any hints?
EDIT
What if we know the number of nodes, the number of edges and also that each edge weight is distinct? Say, for example, there are n nodes, n + 6 edges?
Then we would only have to find and remove the correct 7 edges correct?
To the best of my knowledge there is no way to compute an MST faster by knowing how many edges there are in the graph and that they are distinct. In the worst case, you would have to look at every edge in the graph before finding the minimum-cost edge (which must be in the MST), which takes Ω(m) time in the worst case. Therefore, I'll claim that any MST algorithm must take Ω(m) time in the worst case.
However, if we're already doing Ω(m) work in the worst-case, we could do the following preprocessing step on any MST algorithm:
Scan over the edges and count up how many there are.
Add an epsilon value to each edge weight to ensure the edges are unique.
This can be done in time Ω(m) as well. Consequently, if there were a way to speed up MST computation knowing the number of edges and that the edge costs are distinct, we would just do this preprocessing step on any current MST algorithm to try to get faster performance. Since to the best of my knowledge no MST algorithm actually tries to do this for performance reasons, I would suspect that there isn't a (known) way to get a faster MST algorithm based on this extra knowledge.
Hope this helps!
There's a famous randomised linear-time algorithm for minimum spanning trees whose complexity is linear in the number of edges. See "A randomized linear-time algorithm to find minimum spanning trees" by Karger, Klein, and Tarjan.
The key result in the paper is their "sampling lemma" -- that, if you independently randomly select a subset of the edges with probability p and find the minimum spanning tree of this subgraph, then there are only |V|/p edges that are better than the worst edge in the tree path connecting its ends.
As templatetypedef noted, you can't beat linear-time. That all edge weights are distinct is a common assumption that simplifies analysis; if anything, it makes MST algorithms run a little slower.
The fact that a number of edges (N) is known does not influence the complexity in any way. N is still a finite but unbounded variable, and each graph will have different N. If you place a upper bound on N, say, 1 million, then the complexity is O(1 million log 1 million) = O(1).
The fact that each edge has distinct weight does not influence the program either, because it does not say anything about the graph's structure. Therefore knowledge about current case cannot influence further processing, as we cannot predict how the graph's structure will look like in the next step.
If the number of edges is close to n, like in this case n-6 (after edit), we know that we only need to remove 7 edges as every spanning tree has only n-1 edges.
The Cycle Property shows that the most expensive edge in a cycle does not belong to any Minimum Spanning tree(assuming all edges are distinct) and thus, should be removed.
Now you can simply apply BFS or DFS to identify a cycle and remove the most expensive edge. So, overall, we need to run BFS 7 times. This takes 7*n time and gives us a time complexity of O(n). Again, this is only true if the number of edges is close to the number of nodes.

Resources