Time complexity for generating binary heap from unsorted array - algorithm

Can anyone explain why the time complexity for generating a binary heap from an unsorted array using bottom-up heap construction is O(n)?
(Solution found so far: I found in the Thomas and Goodrich book that the total sum of the sizes of the paths for internal nodes while constructing the heap is 2n-1, but I still don't understand their explanation.)
Thanks.

The usual BUILD-HEAP procedure for generating a binary heap from an unsorted array is implemented as below:
BUILD-HEAP(A)
  heap-size[A] ← length[A]
  for i ← length[A]/2 downto 1
      do HEAPIFY(A, i)
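For concreteness, here is a rough Python sketch of the same procedure (my own code, not from the question: 0-based indexing, min-heap variant, with HEAPIFY written as the usual iterative sift-down):

def heapify(a, i, size):
    # Sift a[i] down until the subtree rooted at i satisfies the min-heap property.
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_heap(a):
    # Only internal nodes (indices len(a)//2 - 1 down to 0) need to be sifted down.
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))

a = [9, 4, 7, 1, 3, 8, 5]
build_heap(a)
print(a)   # the array now satisfies the min-heap property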
Here the HEAPIFY procedure takes O(h) time, where h is the height of the node it is called on, and there are O(n) such calls, making the running time O(n h). Bounding every height by h = lg n, we can say that the BUILD-HEAP procedure takes O(n lg n) time.
For tighter analysis, we can observe that heights of most nodes are small.
Actually, at any height h, there can be at most ⌈n/2^(h+1)⌉ nodes, which we can easily prove by induction.
So, the running time of BUILD-HEAP can be written as,
∑_{h=0}^{lg n} ⌈n/2^(h+1)⌉ · O(h) = O(n · ∑_{h=0}^{lg n} h/2^h)
Now,
∑_{k=0}^{∞} k·x^k = x/(1-x)^2
Putting x = 1/2: ∑_{h=0}^{∞} h/2^h = (1/2) / (1 - 1/2)^2 = 2
Hence, running time becomes,
O(n · ∑_{h=0}^{lg n} h/2^h) = O(n · ∑_{h=0}^{∞} h/2^h) = O(n)
So, this gives a running time of O(n).
N.B. The analysis is taken from this.
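As a quick sanity check (my own addition, not part of the quoted analysis), you can evaluate the per-height bound ∑ ⌈n/2^(h+1)⌉·h for a few sizes and see that it stays linear in n:

import math

def work_bound(n):
    # Sum over heights h = 0 .. floor(lg n) of ceil(n / 2^(h+1)) * h.
    return sum(math.ceil(n / 2 ** (h + 1)) * h for h in range(n.bit_length()))

for n in (15, 1023, 10 ** 6):
    print(n, work_bound(n), 2 * n)   # the bound stays at most about 2*n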

Check out Wikipedia:
Building a heap:
A heap could be built by successive insertions. This approach requires O(n log n) time because each insertion takes O(log n) time and there are n elements. However this is not the optimal method. The optimal method starts by arbitrarily putting the elements on a binary tree, respecting the shape property. Then starting from the lowest level and moving upwards, shift the root of each subtree downward as in the deletion algorithm until the heap property is restored.
http://en.wikipedia.org/wiki/Binary_heap

Related

Average time complexity of finding top-k elements

Consider the task of finding the top-k elements in a set of N independent and identically distributed floating point values. By using a priority queue / heap, we can iterate once over all N elements and maintain a top-k set by the following operations:
if the element x is "worse" than the heap's head: discard x ⇒ complexity O(1)
if the element x is "better" than the heap's head: remove the head and insert x ⇒ complexity O(log k)
The worst case time complexity of this approach is obviously O(N log k), but what about the average time complexity? Due to the iid-assumption, the probability of the O(1) operation increases over time, and we rarely have to perform the costly O(log k), especially for k << N.
Is this average time complexity documented in any citable reference? What's the average time complexity? If you have a citeable reference for your answer please include it.
Consider the i-th largest element and a particular permutation (arrival order). It will be inserted into the k-sized heap if it appears before no more than k-1 of the (i-1) larger elements in the permutation.
The probability of that heap-insertion happening is 1 if i <= k, and k/i if i > k.
From this, you can compute the expected number of heap adjustments using linearity of expectation. It is ∑_{i=1}^{k} 1 + ∑_{i=k+1}^{n} k/i = k + k·(H(n) - H(k)) = k·(1 + H(n) - H(k)), where H(n) is the n-th harmonic number.
This is approximately k log(n) (for k << n), and you can compute your average cost from there.
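For what it's worth, here is a rough empirical check of that expectation (my own sketch, not a citable reference): it maintains the top-k in a min-heap via Python's heapq, counts the O(log k) heap updates, and compares the average count against k*(1 + H(n) - H(k)).

import heapq
import random

def count_heap_updates(values, k):
    # Maintain a min-heap of the current top-k and count the O(log k) operations.
    heap, updates = [], 0
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, x)
            updates += 1
        elif x > heap[0]:                  # "better" than the worst of the current top-k
            heapq.heapreplace(heap, x)
            updates += 1
        # else: x is "worse" than the heap's head and is discarded in O(1)
    return updates

def H(n):
    # n-th harmonic number
    return sum(1.0 / i for i in range(1, n + 1))

n, k, trials = 10000, 50, 20
observed = sum(count_heap_updates([random.random() for _ in range(n)], k)
               for _ in range(trials)) / trials
predicted = k * (1 + H(n) - H(k))
print(observed, predicted)                 # the two figures should be close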

Specific algorithm sorting n elements with m distinct values

I am going through exercises for an exam in algorithm analysis and this is one of them:
Present an algorithm that takes as input a list of n elements (that
are comparable) and sorts them in O(n log m) time, where m is the
number of distinct values in the input list.
I have read about the common sorting algorithms and I really can't come up with a solution.
Thanks for your help
You can build an augmented balanced binary search tree on the n elements. The augmented info stored at each node is its frequency. You build this structure with n insertions into the tree; the time to do this is O(n lg m), since there are only m nodes. Then you do an in-order traversal of this tree: visit the left subtree, then print the element stored at the root f times, where f is its frequency (this is the augmented info), and finally visit the right subtree. This traversal takes O(n + m) time. So the running time of this simple procedure is O(n lg m + n + m) = O(n lg m), since m <= n.
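A minimal sketch of this idea (my own code; for brevity it uses a plain unbalanced BST, so the stated O(n lg m) bound only holds if you swap in a self-balancing tree such as an AVL or red-black tree):

class Node:
    def __init__(self, key):
        self.key, self.count = key, 1      # augmented info: the key's frequency
        self.left = self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key == root.key:
        root.count += 1                    # duplicate: just bump the frequency
    elif key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def in_order(node, out):
    if node is not None:
        in_order(node.left, out)
        out.extend([node.key] * node.count)    # emit the key f times
        in_order(node.right, out)

def sort_with_few_distinct(values):
    root = None
    for v in values:                       # n insertions into a tree with at most m nodes
        root = insert(root, v)
    out = []
    in_order(root, out)                    # O(n + m) traversal
    return out

print(sort_with_few_distinct([3, 1, 2, 3, 1, 1, 2]))   # [1, 1, 1, 2, 2, 3, 3]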

O(n) - the next permutation lexicographically

I'm just wondering what the efficiency (big-O) of this algorithm is:
Find the largest index k such that a[k] < a[k + 1]. If no such index exists, the permutation is the last permutation.
Find the largest index l such that a[k] < a[l]. Since k + 1 is such an index, l is well defined and satisfies k < l.
Swap a[k] with a[l].
Reverse the sequence from a[k + 1] up to and including the final element a[n].
As I understand it, the worst case is O(n) = n (when k is the first element of the previous permutation) and the best case is O(n) = 1 (when k is the last element of the previous permutation).
Can I say that O(n) = n/2?
O(n) = n/2 makes no sense. Let f(n) = n be the running time of your algorithm. Then the right way to say it is that f(n) is in O(n). O(n) is a set of functions that are at most asymptotically linear in n.
Your optimization makes the expected running time g(n) = n/2. g(n) is also in O(n). In fact, O(n) = O(n/2), so your saving of half of the time does not change the asymptotic complexity.
All steps in the algorithm take O(n) time asymptotically.
Your averaging is incorrect. Just because the best case is O(1) and the worst case is O(n), you can't say the algorithm takes O(n) = n/2. Big-O notation gives an upper bound on the algorithm's running time.
So the algorithm is still O(n), irrespective of the best-case scenario.
There is no such thing as O(n) = n/2.
When you do big-O calculations, you're just trying to find the functional dependency; you don't care about constant factors. So there's no O(n) = n/2, just like there's no O(n) = 5n.
Asymptotically, O(n) is the same as O(n/2). In any case, if you run the algorithm for each of the n! permutations, the total work is much greater than your estimate (on the order of n!).
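For reference, a direct Python translation of the four steps (my own sketch; it mutates the list in place and returns False when the input is already the last permutation). Each step is at most one linear scan, which is where the O(n) worst case per call comes from.

def next_permutation(a):
    # Step 1: find the largest k such that a[k] < a[k + 1].
    k = len(a) - 2
    while k >= 0 and a[k] >= a[k + 1]:
        k -= 1
    if k < 0:
        return False                       # this is already the last permutation
    # Step 2: find the largest l such that a[k] < a[l].
    l = len(a) - 1
    while a[l] <= a[k]:
        l -= 1
    # Step 3: swap a[k] with a[l].
    a[k], a[l] = a[l], a[k]
    # Step 4: reverse the suffix a[k + 1 .. n].
    a[k + 1:] = reversed(a[k + 1:])
    return True

p = [1, 2, 3]
while True:
    print(p)                               # prints the 3! = 6 permutations in order
    if not next_permutation(p):
        break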

Asymptotic time complexity of inserting n elements to a binary heap already containing n elements

Suppose we have a binary heap of n elements and wish to insert n more elements (not necessarily one after another). What would be the total time required for this?
I think it's Θ(n log n), as one insertion takes log n.
Given: a heap of n elements and n more elements to be inserted, so in the end there will be 2*n elements. A heap can be created in two ways: 1. successive insertion, and 2. the build-heap method. Among these, the build-heap method takes O(n) time to construct the heap, as explained in
How can building a heap be O(n) time complexity?. So the total time required is O(2*n), which is the same as O(n).
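A minimal sketch of this rebuild idea using Python's heapq (sizes and names are illustrative, not from the answer): n successive pushes cost O(n log n), while heapifying the combined array of 2n elements costs O(n).

import heapq
import random

n = 10 ** 5
existing = list(range(n))
heapq.heapify(existing)                    # the heap we already have (n elements)
new_elements = [random.uniform(0, n) for _ in range(n)]

# Option 1: n successive insertions -> O(n log n)
h1 = existing[:]
for x in new_elements:
    heapq.heappush(h1, x)

# Option 2: throw everything together and rebuild bottom-up -> O(2n) = O(n)
h2 = existing + new_elements
heapq.heapify(h2)

assert sorted(h1) == sorted(h2)            # both are valid heaps over the same multiset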
Assuming we are given:
priority queue implemented by standard binary heap H (implemented on array)
n current size of heap
We have following insertion properties:
W(n) = WorstCase(n) = Θ(lg n) -> W(n) = Ω(lg n) and W(n) = O(lg n)
A(n) = AverageCase(n) = Θ(lg n) -> A(n) = Ω(lg n) and A(n) = O(lg n)
B(n) = BestCase(n) = Θ(1) -> B(n) = Ω(1) and B(n) = O(1)
So for every case, we have
T(n) = Ω(1) and T(n) = O(lg n)
The worst case is when we insert a new minimum value, so the up-heap (sift-up) has to travel the whole branch.
The best case is when, for a min-heap (heap with the minimum on top), we insert a big value (the biggest on the updated branch), so the up-heap stops immediately.
You've asked about a series of n operations on a heap that already contains n elements, so its size will grow from n to 2*n. Asymptotically, n = Θ(n) and 2*n = Θ(n), which simplifies our equations: we don't have to worry about the growth of n, since it grows only by a constant factor.
So, writing Xi(n) for the cost of n insertions, we have:
Wi(n) = Θ(n lg n)
Ai(n) = Θ(n lg n)
Bi(n) = Θ(n)
which implies, for all cases:
Ti(n) = Ω(n) and Ti(n) = O(n lg n)
It's not Θ(n log n), it's O(n log n), since some of the insertions can take less than exactly log n time; therefore n insertions take time <= n log n
=> time complexity = O(n log n)

Why O(N Log N) to build Binary search tree?

Getting ready for an exam. This is not a homework question.
I figured that the worst case is O(N^2) to build a BST (each insert requires up to N-1 comparisons; summing all the comparisons gives 0 + 1 + ... + N-1 ~ N^2). This is the case for a skewed BST.
Insertion into a (balanced) BST is O(log N), so why is the best case O(N log N) to construct the tree?
My best guess: since a single insertion is log N, summing all the insertions somehow gives us N log N.
Thanks!
As you wrote :) A single insertion is O(log N). Because the height of a balanced tree of N elements is log N, you need up to log N comparisons to insert a single element. You need to do N of these insertions, so N*log N.
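One detail worth adding (my own note, a standard estimate rather than part of the answer): even though the i-th insertion into a balanced tree costs only O(log i) rather than O(log N), the total is still
log 1 + log 2 + ... + log N = log(N!) = Θ(N log N)
by Stirling's approximation, so summing the insertions really does give Θ(N log N) and not something smaller.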
