In CLRS' book, the analysis of Dijkstra's algorithm is as follows:
How many times do you need to use the heap? Once to pull each node off the heap (i.e., Extract-Min in CLRS) --- O(N); and every time you look at an edge --- O(E) in total --- you might need to change a distance (i.e., Decrease-Key in CLRS), which means fixing the heap order. Each heap operation takes O(logN) work.
Thus, total time complexity: O((N + E)logN), which is O(ElogN) if all vertices are reachable from the source.
My Question is:
Why does the complexity become O(ElogN) if all vertices are reachable from the source? Why can we ignore the O(NlogN) part of O((N + E)logN)?
If all vertices are reachable from the source, then there are at least N-1 edges in the graph; therefore E >= N-1, so N = O(E) and O((N + E) log N) = O((E + E) log N) = O(E log N).
If all nodes are reachable from the source, there must be at least N-1 edges. So E >= N-1, thus N <= E+1 and N+E <= 2E+1, which is in O(E).
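For concreteness, here is a minimal Python sketch of the heap-based version analysed above (not CLRS's exact pseudocode). Python's heapq has no Decrease-Key, so the sketch re-pushes an entry and skips stale ones on pop; the heap then holds O(E) entries, and since log E = O(log N) the bound is still O((N + E) log N).

import heapq

def dijkstra(adj, source):
    # adj: dict mapping node -> list of (neighbor, weight) pairs.
    # Returns a dict of shortest distances from source.
    dist = {source: 0}
    heap = [(0, source)]                          # (distance, node)
    while heap:
        d, u = heapq.heappop(heap)                # Extract-Min: O(log N)
        if d > dist.get(u, float('inf')):
            continue                              # stale entry, skip it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd                      # "Decrease-Key" by re-pushing
                heapq.heappush(heap, (nd, v))     # O(log N)
    return dist

# dijkstra({'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}, 'a')
# -> {'a': 0, 'b': 1, 'c': 3}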
Related
In this passage from my textbook:
where are the inequalities from? (The ones that I've marked with red rectangles.) I feel that they describe a relationship between vertices and edges in a graph, but I don't understand it.
You have two implementations of Dijkstra’s algorithm to choose from. One runs in time O((m + n) log n) = O(m log n), assuming the graph is connected. The other runs in time O(n²). The question is where the crossover point is between these two runtimes. Equating and simplifying gives that
m log n = n²
m = n² / log n
So if m is asymptotically smaller than n² / log n, you’d prefer the heap implementation, and if m is asymptotically bigger than n² / log n you’d prefer the unsorted sequence approach.
(Note that, with a Fibonacci heap, the runtime of Dijkstra’s algorithm is O(m + n log n), which is never asymptotically worse than O(n²).)
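As a rough illustration of the crossover, here is a tiny Python sketch (my own, not from the textbook) that plugs concrete n and m into the two bounds; it ignores constant factors, so treat it as a rule of thumb only.

import math

def preferred_dijkstra(n, m):
    # Compare the two asymptotic costs for concrete n and m (base-2 log).
    heap_cost = (m + n) * math.log2(n)    # O((m + n) log n)
    array_cost = n * n                    # O(n^2)
    return "heap" if heap_cost < array_cost else "unsorted sequence"

# n = 1000: n^2 / log2(n) is about 100,000
# preferred_dijkstra(1000, 10_000)   -> "heap" (sparse graph)
# preferred_dijkstra(1000, 400_000)  -> "unsorted sequence" (dense graph)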
The time complexity of BFS, or DFS, on a graph is O(V+E) because we traverse all the nodes and edges of the graph. (I get that) But for a binary tree, the time complexity of BFS and DFS is O(V)... Why is that?
I am assuming it is because of the following: O(V+E) = O(V + V-1) = O(2V) = O(V). Is this the correct reasoning? If not, an intuitive explanation would be much appreciated. Thanks
All trees have n - 1 edges, n being the number of nodes. The time complexity is still technically O(V + E), but that equates to O(n + (n-1)) = O(n).
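A minimal sketch of a level-order (BFS) traversal of a binary tree, assuming nodes expose left and right attributes: each node is enqueued and dequeued exactly once and its at most two child links are examined once, which is the O(V + E) = O(n) bound in action.

from collections import deque

def bfs_count(root):
    # Level-order traversal; returns the number of nodes visited.
    # Each node is enqueued and dequeued exactly once, so the work is O(n).
    if root is None:
        return 0
    visited = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        visited += 1
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return visited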
You can actually see it in a different way, without using graphs.
Let n be the number of nodes, and denote by f(n) the number of steps required to traverse the whole tree (note that the time complexity will then be O(f(n))).
Consider that for each node we need to: visit it, possibly descend into its left subtree, possibly descend into its right subtree, and eventually return to it.
All these 4 operations can happen at most once per node. Agree?
From this we deduce that f(n) <= 4n,
because each of the n nodes contributes at most those 4 operations.
Obviously, at the same time, n <= f(n)
because we need to visit each node at least once.
Therefore,
n <= f(n) <= 4n
Applying the O notation, we get
O(n) <= O(f(n)) <= O(4n)
Recalling that O(4n) = O(n) by the properties of O (invariance under multiplication by nonzero constants), we get that
O(n) <= O(f(n)) <= O(4n) = O(n),
or
O(n) <= O(f(n)) <= O(n)
Notice that the left side of this chain of inequalities is equal to the right side, meaning that it is not only a chain of inequalities but a chain of equalities, or
O(n) = O(f(n)) = O(n)
meaning that the complexity is O(n).
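If you want to check the f(n) <= 4n bound experimentally, here is a small Python sketch (my own, assuming nodes with left/right attributes) that counts one step for visiting a node, one for each attempted descent into a child, and one for returning to the parent.

def traversal_steps(node):
    # Returns (number_of_nodes, steps) for a recursive traversal,
    # counting: visit, descend left, descend right, return to parent.
    if node is None:
        return 0, 0
    nodes, steps = 1, 1                    # visit this node
    for child in (node.left, node.right):
        c_nodes, c_steps = traversal_steps(child)
        nodes += c_nodes
        steps += c_steps + 1               # one step to descend toward the child
    steps += 1                             # one step to return to the parent
    return nodes, steps

# For every binary tree, steps <= 4 * nodes (exactly 4n with this accounting).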
I am calculating the time complexity for Kruskal's algorithm like this (please see the algorithm in the image attached):
T(n) = O(1) + O(V) + O(E log E) + O(V log V)
= O(E log E) + O(V log V)
as |E| >= |V| - 1, so O(V log V) = O(E log E), and
T(n) = O(E log E) + O(E log E)
= O(E log E)
The CLRS Algorithm:
Is this correct, or am I doing something wrong? Please tell me.
Kruskal is O(E log E); your derivation is right. You could also say O(E log V) because E <= V * V, so log(E) <= 2 log(V) (I don't know why I remember that, other than that I think a prof put that on an exam at one point...)
Since |E| can be much larger than |V| (but never more than |V|²), we prefer a tighter upper bound with V terms instead of E terms:
|E| <= |V|²
log |E| <= log |V|² = 2 log |V|
so the running time of MST-KRUSKAL is O(E log V).
The runtime for Kruskal's algorithm is O(E log E), not O(E log V).
The edges have to be sorted first, which takes O(E log E); this dominates the time for verifying whether the edge under consideration is a safe edge or not, which takes O(E log V). And |E| >= |V| - 1 (the corner case being when the graph is already a tree), so it is safe to state the runtime as O(E log E).
O(E log E) is definitely O(E log V), because E <= V² (fully connected graph):
E log E <= E log(V²) = 2E log V = O(E log V)
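To tie the pieces together, here is a minimal Python sketch of Kruskal's algorithm (my own, with vertices numbered 0..n-1 and a union-find using union by size with path halving); sorting the edge list is the O(E log E) = O(E log V) term discussed above, and the union-find work is nearly linear.

def kruskal(n, edges):
    # n vertices numbered 0..n-1; edges is a list of (weight, u, v) tuples.
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):          # O(E log E)
        ru, rv = find(u), find(v)
        if ru != rv:                       # safe edge: it joins two components
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            mst.append((u, v, w))
            total += w
    return mst, total

# kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)])
# -> ([(0, 1, 1), (1, 2, 2), (2, 3, 3)], 6)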
All other answers are correct, but we can consider the following case, which gives us a time complexity of O(|E|).
The following answer is from the Algorithms book by Dasgupta, chapter 5, page 140, the section on path compression:
In the time complexity computation of this algorithm, the dominant part is the edge-sorting step, which is O(|E| log|E|) or, as the other answers explained, O(|E| log|V|).
But, what if the given edges are sorted?
Or if the weights are small (say, O(|E|)) so that sorting can be done in linear time (like applying counting sort).
In such a case, the data-structure part (the union-find) becomes the bottleneck, and it is useful to think about improving its performance beyond log n per operation.
The solution is to use the path-compression method while doing the find() operation.
The amortized cost then turns out to be just barely more than O(1), down from the earlier O(log n). For more details please check this reference.
The brief idea is: whenever find(v) is called to find the root of the set that v belongs to, the parent links of all nodes on that path are changed to point directly at the root. This way, if you then call find(x) on any node x on that same path, you get the set's root (label) in O(1). Hence, in this case the bottleneck is the union-find operation; with the described solution it is nearly O(1) amortized, so the running time of the algorithm in this situation is O(|E|).
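A minimal sketch of a union-find with the full path compression described above (plus union by rank); the class name and interface are my own, not Dasgupta's.

class DisjointSet:
    # Union-find with path compression in find() and union by rank.
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, v):
        # After this call, every node on the path from v points
        # directly at the root, so later finds on that path are O(1).
        if self.parent[v] != v:
            self.parent[v] = self.find(self.parent[v])
        return self.parent[v]

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False                   # u and v are already in the same set
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        return True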
From line 5 to line 9 the complexity is O(E):
line 5: O(E)
line 6: O(1)
line 7: O(1)
line 8: O(1)
line 9: O(1)
Up to line 5 you have calculated the complexity correctly. Finally, the dominating factor here is O(E lg E), so the complexity is O(E lg E).
I am going through exercises for an exam in algorithm analysis and this is one of them:
Present an algorithm that takes as input a list of n elements (that
are comparable) and sorts them in O(n log m) time, where m is the
number of distinct values in the input list.
I have read about the common sorting algorithms and I really can't come up with a solution.
Thanks for your help
You can build an augmented balanced binary search tree on the n elements. The augmented info stored at each node would be its frequency. You build this structure with n insertions into the tree; the time to do this is O(n lg m), since the tree contains only m nodes. Then you do an in-order traversal of this tree: visit the left subtree, print the element stored at the root f times, where f is its frequency (this is the augmented info), and finally visit the right subtree. This traversal takes O(n + m) time. So the running time of this simple procedure is O(n lg m + n + m) = O(n lg m), since m <= n.
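As a sketch of a closely related variant (not the augmented BST above): count frequencies in a hash map (this assumes the elements are hashable, which the exercise does not guarantee) and sort only the m distinct keys. This runs in O(n + m log m), which is within the required O(n log m) bound.

def sort_with_few_distinct(items):
    # Count frequencies of the m distinct values, sort only the m keys,
    # then expand the counts back into a sorted list.
    counts = {}
    for x in items:                         # O(n)
        counts[x] = counts.get(x, 0) + 1
    result = []
    for key in sorted(counts):              # O(m log m)
        result.extend([key] * counts[key])  # O(n) output in total
    return result

# sort_with_few_distinct([3, 1, 3, 2, 1, 3]) -> [1, 1, 2, 3, 3, 3]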
Can anyone explain why the time complexity of generating a binary heap from an unsorted array using bottom-up heap construction is O(n)?
(Solution found so far: I found in the Thomas and Goodrich book that the total sum of the sizes of the paths of the internal nodes while constructing the heap is 2n-1, but I still don't understand their explanation.)
Thanks.
The normal BUILD-HEAP procedure for generating a binary heap from an unsorted array is implemented as below:
BUILD-HEAP(A)
heap-size[A] ← length[A]
for i ← ⌊length[A]/2⌋ downto 1
do HEAPIFY(A, i)
Here the HEAPIFY procedure takes O(h) time, where h is the height of the tree, and there are O(n) such calls, making the running time O(n h). Taking h = lg n, we can say that the BUILD-HEAP procedure takes O(n lg n) time.
For tighter analysis, we can observe that heights of most nodes are small.
Actually, at any height h, there can be at most ⌈n/2^(h+1)⌉ nodes, which we can easily prove by induction.
So, the running time of BUILD-HEAP can be written as

∑_{h=0}^{lg n} ⌈n/2^(h+1)⌉ · O(h) = O(n · ∑_{h=0}^{lg n} h/2^h)

Now, for |x| < 1,

∑_{k=0}^{∞} k·x^k = x/(1-x)²

Putting x = 1/2,

∑_{h=0}^{∞} h/2^h = (1/2) / (1 - 1/2)² = 2

Hence, the running time becomes

O(n · ∑_{h=0}^{lg n} h/2^h) = O(n · ∑_{h=0}^{∞} h/2^h) = O(n)

So, this gives a running time of O(n).
N.B. The analysis is taken from this.
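For reference, a minimal Python sketch of the same bottom-up construction (my own, using 0-based indexing and a max-heap); sift_down plays the role of HEAPIFY and costs O(h) at a node of height h, so the total is O(n) as derived above.

def build_heap(a):
    # Bottom-up construction of a max-heap in place (0-based indexing).
    n = len(a)

    def sift_down(i):
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < n and a[left] > a[largest]:
                largest = left
            if right < n and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Internal nodes are indices 0 .. n//2 - 1; the leaves are already heaps.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)
    return a

# build_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7])
# -> [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]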
Check out wikipedia:
Building a heap:
A heap could be built by successive insertions. This approach requires O(n log n) time because each insertion takes O(log n) time and there are n elements. However, this is not the optimal method. The optimal method starts by arbitrarily putting the elements on a binary tree, respecting the shape property. Then, starting from the lowest level and moving upwards, sift the root of each subtree downward as in the deletion algorithm until the heap property is restored.
http://en.wikipedia.org/wiki/Binary_heap