Finding k smallest elements in a min heap - worst-case complexity - algorithm

I have a min heap with n elements and want to find the k smallest numbers in it. What is the worst-case complexity?
Here is my approach: somewhere on StackOverflow I read that the complexity of finding the i-th smallest number in a min heap is O(i). So if we want to find the n-1 smallest numbers (finding all n is pointless, since that would be the entire heap), the total complexity would look something like this:
O(n-1) + O(n-2) + O(n-3) + ... + O(2) + O(1) = O(n(n-1)/2) = O(n^2)
Is this correct?

No, the time is much better than that: O(k log(n)) very easily, and O(k) if you're smart.
Finding and removing the smallest element from the heap is O(log(n)). This leads to O(k log(n)) time very easily.
BUT the result that you are thinking of is Frederickson's paper https://ac.els-cdn.com/S0890540183710308/1-s2.0-S0890540183710308-main.pdf?_tid=382a1cac-e4f7-11e7-8ac9-00000aab0f02&acdnat=1513713791_08f4df78a8821855e8ec788da063ea2f which shows how to find the value of the kth smallest element in time O(k). You then use the fact that a heap is a binary tree: start from the root and do a recursive search for every element smaller than that value, then fill out the rest of your list with copies of the kth smallest value.
In that search you will wind up looking at up to k-1 elements that are smaller than that value, and for some of them at up to 2 children that are too large to bother with, for a maximum of 3k-3 elements examined. This makes the whole algorithm O(k).
That link died due to bitrot. Hopefully https://www.sciencedirect.com/science/article/pii/S0890540183710308 lasts longer.
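For illustration, here is a minimal Python sketch of the traversal step only. It assumes the value of the kth smallest element has already been found (that is the O(k) result from the paper, not implemented here) and uses the usual array layout of a binary heap; the function name is illustrative.

```python
def k_smallest_given_value(heap, k, kth_value):
    """Collect the k smallest elements of an array-based min heap,
    given the value of the kth smallest element (assumed to have
    been found separately in O(k)).

    Visits only elements smaller than kth_value plus their immediate
    children, i.e. at most 3k - 3 nodes, so the traversal is O(k).
    """
    result = []
    stack = [0] if heap else []          # start at the root
    while stack:
        i = stack.pop()
        if i < len(heap) and heap[i] < kth_value:
            result.append(heap[i])
            stack.append(2 * i + 1)      # left child
            stack.append(2 * i + 2)      # right child
    # Fewer than k elements are strictly smaller than kth_value,
    # so pad with copies of it (this handles duplicates).
    result.extend([kth_value] * (k - len(result)))
    return result
```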

I am doubtful that it is possible to identify the kth smallest element in time O(k). The best I have seen before is an O(k log k) algorithm, which also conveniently solves your problem by identifying the k smallest elements. You can read the details in another answer on StackOverflow or on Quora.
The basic idea is to manipulate a secondary heap. Initially, this secondary heap contains only the root of the original heap. At each step, the algorithm deletes the min of the secondary heap and inserts its two original children (that is, its children from the original heap) into the secondary heap.
This algorithm has the nice property that on step i, the element it deletes from the secondary heap is the ith smallest element overall. So after k steps, the set of items deleted from the secondary heap is exactly the k smallest elements. The algorithm is O(k log k) because there are O(k) deletions/insertions into a secondary heap whose size is bounded by O(k).
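Here is a minimal Python sketch of that idea, assuming the original heap is in the usual array layout; heapq serves as the secondary heap and the function name is illustrative.

```python
import heapq

def k_smallest(heap, k):
    """Return the k smallest elements of an array-based min heap in
    O(k log k), using a secondary heap of (value, index) pairs."""
    if not heap or k <= 0:
        return []
    secondary = [(heap[0], 0)]      # initially just the original root
    result = []
    while secondary and len(result) < k:
        value, i = heapq.heappop(secondary)
        result.append(value)        # the len(result)-th smallest overall
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(secondary, (heap[child], child))
    return result

print(k_smallest([1, 3, 2, 7, 4, 5, 6], 3))  # [1, 2, 3]
```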
EDIT: I stand corrected! btilly's answer provides a solution in O(k) using a result from this paper.

There is a recent (2019) algorithm that finds the k smallest elements of a binary min-heap in time O(k) using the soft heap data structure. It is dramatically simpler than Frederickson's original O(k)-time heap selection algorithm. See "Selection from Heaps, Row-Sorted Matrices, and X+Y Using Soft Heaps" by Kaplan et al.

Related

A heap with n elements that supports Insert and Extract-Min: which of the following tasks can you achieve in O(log n) time?

Question 3
You are given a heap with n elements that supports Insert and Extract-Min. Which of the following tasks can you achieve in O(log n) time?
Find the median of the elements stored in the heap.
Find the fifth-smallest element stored in the heap.
Find the largest element stored in the heap.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
"Find the median of the elements stored in the heap. " should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
It depends on the running times of the Insert and Extract-Min operations. In traditional heaps, both take Θ(log n) time. However, in finger-tree-based heaps, only Insert takes Θ(log n) time, while Extract-Min takes O(1). There, you can find the fifth-smallest element in O(5) = O(1) time, the median in O(n/2) = O(n) time, and the largest element in O(n) time.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
The lowest level of the heap contains half of the elements. More precisely, half of the elements of the heap are leaves (they have no children), and the largest element in the heap is one of those. Finding the largest element, then, requires examining n/2 items. Except that the heap only supports Insert and Extract-Min, so you end up calling Extract-Min on every element; finding the largest element therefore takes O(n log n) time.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
This can be done in O(log n) time. Actually 5*log(n), because you have to call Extract-Min five times, but we ignore constant factors. It is not constant time, however, because the complexity of Extract-Min depends on the size of the heap.
"Find the median of the elements stored in the heap." should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
The median is the middle element, so you only have to remove n/2 elements from the heap. But removing an item from the heap is an O(log n) operation, so the complexity is O((n/2) log n), and since we ignore constant factors in algorithmic analysis, that is O(n log n).
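To make the counting concrete, here is a small illustrative sketch using Python's heapq, where each heappop plays the role of Extract-Min and costs O(log n):

```python
import heapq

def ith_smallest(heap, i):
    """i-th smallest element via i calls to extract-min: O(i log n).
    Works on a copy so the original heap is left intact."""
    h = list(heap)
    for _ in range(i - 1):
        heapq.heappop(h)
    return heapq.heappop(h)

h = list(range(100, 0, -1))
heapq.heapify(h)
print(ith_smallest(h, 5))                  # fifth smallest: 5 pops, O(log n) each
print(ith_smallest(h, (len(h) + 1) // 2))  # median: ~n/2 pops, O(n log n)
```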

Can I find the second largest element with a min heap in O(lg n) time?

My opinion:
In a min heap, the largest element is always one of the leaves. The number of leaves is between lg(n+1)/2 and lg(n+1). By linearly searching among the leaves I can always find the largest element in the heap in O(lg n) time. Deleting that item costs constant time. Linearly searching the leaves again then gives the second largest element in O(lg n) time.
However, the professors in MIT say we can't do that.
Quiz solution in MIT 6.006
I wonder what's wrong with my solution.
The maximum number of nodes in a binary tree of height k is 2^(k+1) - 1, and 2^k of those are leaves. So roughly half of all the nodes are leaves: that is O(n) of them, which is not even close to your log estimate. Your algorithm will therefore run in O(n), not O(lg n).
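Concretely, in the usual array layout the leaves are exactly the indices n//2 through n-1, i.e. about half of all nodes. A small illustrative sketch:

```python
import heapq

def largest_in_min_heap(heap):
    """The maximum of a min heap must be a leaf, and the leaves occupy
    indices n//2 .. n-1 of the array: about n/2 candidates, so this
    scan is O(n), not O(lg n)."""
    n = len(heap)
    return max(heap[n // 2:])

h = [9, 4, 7, 1, 3, 8, 5]
heapq.heapify(h)
print(largest_in_min_heap(h))  # 9
```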

Get the k smallest elements of an array using quick sort

How would you find the k smallest elements of an unsorted array using quicksort (other than just sorting and taking the first k elements)? Would the worst-case running time be the same, O(n^2)?
You can adapt quicksort: after each partition, recurse only into the side of the array that contains position k and skip the other side, until your pivot lands at position k. At that point the first k elements are the k smallest; if you don't need the output sorted, you can stop there.
Warning: non-rigorous analysis ahead.
However, I think the worst-case time complexity will still be O(n^2). That occurs when you always pick the biggest or smallest element as your pivot, so the recursion devolves into bubble-sort-like behaviour (i.e., you never pick a pivot that actually divides and conquers).
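Here is a minimal sketch of that partial quicksort in Python, using a plain Lomuto partition with the last element as pivot (so the O(n^2) worst case described above still applies); the names are illustrative.

```python
def partition(a, lo, hi):
    """Lomuto partition around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def k_smallest(a, k):
    """Rearrange a so that its first k elements are the k smallest
    (in no particular order), then return them.
    Average O(n); worst case O(n^2) with this pivot choice."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p == k:
            break           # everything left of p is among the k smallest
        elif p < k:
            lo = p + 1      # k lies in the right part: recurse right only
        else:
            hi = p - 1      # pivot landed past k: recurse left only
    return a[:k]

print(k_smallest([5, 3, 8, 1, 9, 2, 7], 3))  # the 3 smallest, not necessarily sorted
```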
Another solution (if the only purpose of this collection is to pick out the k min elements) is to use a heap limited to k nodes, i.e. tree height ceil(log(k)). Note that to evict the largest of the k retained elements cheaply, this bounded heap should keep max-heap order rather than min-heap order. Each insert or removal then costs O(log(k)), so processing all n elements takes O(n*log(k)) total (versus O(n*log(n)) for a full heapsort). A full heapsort, like mergesort, would instead give the whole array back in sorted order in linearithmic worst-case time.
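A minimal sketch of that bounded-heap variant: Python's heapq only provides a min heap, so values are negated to obtain the max-heap behaviour needed for evicting the largest of the k retained elements (names are illustrative).

```python
import heapq

def k_smallest_bounded_heap(a, k):
    """Keep the k smallest elements of a in a size-k max heap:
    O(n log k) total, versus O(n log n) for a full sort."""
    heap = []                              # stores negated values
    for x in a:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:                 # largest retained element > x
            heapq.heappushpop(heap, -x)    # evict it, keep x
    return sorted(-v for v in heap)

print(k_smallest_bounded_heap([9, 1, 8, 2, 7, 3], 3))  # [1, 2, 3]
```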

Why is the runtime of building a heap by inserting elements worse than using heapify?

In the CLRS book, building a heap with BUILD-HEAP (repeatedly sifting elements down) has complexity O(n). A heap can also be built by repeatedly calling Insert, which takes O(n lg n) in the worst case.
My question is: is there any insight into why the latter method performs worse?
I ask because I feel there are simple insights behind the math. For example,
quicksort, merge sort, and heapsort are all based on avoiding unnecessary comparisons, but with different methods:
quicksort: with a balanced partition, there is no need to compare the left subset to the right subset.
merge sort: simply compare the two minimum elements of the two sub-arrays.
heapsort: if A is larger than B, then A is larger than all of B's descendants, and there is no need to compare A with them.
The main difference between the two is the direction in which they work: upwards (the O(n log n) algorithm) or downwards (the O(n) algorithm).
In the O(n log n) algorithm, done by making n insertions, each insertion might potentially bubble up an element from the bottom of the (current) heap all the way to the top. Imagine you have built all of the heap except the last full layer, and that every value you insert into that layer is smaller than everything already in the heap. Then each new element has to bubble all the way up to the root. During this time the heap has height (roughly) log n - 1, so the total number of swaps is (roughly) (n log n)/2 - n/2, giving a worst-case runtime of Θ(n log n).
In the O(n) algorithm, done by building the heap in one pass, new elements are placed at the tops of various smaller heaps and then bubbled down. Intuitively, the vast majority of the nodes sit near the bottom of the heap, where the distance to bubble down is short, so most nodes require very little work; only the few nodes near the top can travel far.
The major difference in the runtimes comes down to this direction. In the O(n log n) version, since elements are bubbled up, the runtime is bounded by the sum of the lengths of the paths from each node to the root of the tree, which is Θ(n log n). In the O(n) version, the runtime is bounded by the sum of the lengths of the paths from each node down to the leaves, which is much lower, O(n), hence the better runtime.
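To make the two directions concrete, here is a minimal sketch of both builds in Python: sift_up per insertion versus a CLRS-style sift_down pass over the internal nodes (names are illustrative).

```python
def sift_up(a, i):
    """Bubble a[i] up toward the root: O(depth of node i)."""
    while i > 0 and a[(i - 1) // 2] > a[i]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def sift_down(a, i, n):
    """Bubble a[i] down toward the leaves: O(height of node i)."""
    while 2 * i + 1 < n:
        c = 2 * i + 1                      # left child
        if c + 1 < n and a[c + 1] < a[c]:
            c += 1                         # pick the smaller child
        if a[i] <= a[c]:
            break
        a[i], a[c] = a[c], a[i]
        i = c

def build_by_insertion(a):
    """n insertions, each up to O(log n): Theta(n log n) worst case."""
    for i in range(len(a)):
        sift_up(a, i)

def build_by_heapify(a):
    """Sift down every internal node, bottom-up: O(n) total."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
```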
Hope this helps!

Is there a "tournament" algorithm to find k-th largest element?

I know that we can find the 2nd largest element in an array of size N in N + log(N) - 2 comparisons using a "tournament" algorithm. Now I wonder whether we can find the k-th largest element using a similar tournament.
I know there is an O(N) "selection" algorithm to find the k-th largest element. It uses Quickselect with a "good" pivot, which can itself be found in O(N). We can also build a heap from the array in O(N) and extract k elements from it.
I wonder if there is another approach.
I believe you can make this an O(N log k) algorithm: iterate over the array while maintaining a min-heap of the k largest elements encountered so far. The first k elements go directly into the heap. Every subsequent element is compared against the root of the heap (its minimum); if it is larger, the root is removed and the new element inserted, which is an O(log k) operation on a heap of size k. When the algorithm is done, and the sequence had length at least k, the root of the heap holds the kth largest element and the rest of the heap holds the larger ones.
This approach has worse worst-case behaviour than the median-of-medians O(n) solution, but it is much easier to implement and behaves well for small k, so it may be well suited to many practical applications.
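A minimal sketch of that scan in Python using heapq, where heappushpop performs the compare-and-replace in one call (the function name is illustrative):

```python
import heapq

def kth_largest(a, k):
    """Maintain a min heap of the k largest elements seen so far:
    O(n log k). Afterwards the heap's root is the kth largest and
    the rest of the heap holds the k-1 larger elements."""
    heap = list(a[:k])         # assumes len(a) >= k
    heapq.heapify(heap)
    for x in a[k:]:
        if x > heap[0]:        # larger than the current kth largest
            heapq.heappushpop(heap, x)
    return heap[0]

print(kth_largest([5, 2, 9, 1, 7, 3], 2))  # 7
```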
