Time complexity of Heap Sort [duplicate] - algorithm

This question already has answers here:
How can building a heap be O(n) time complexity?
(18 answers)
Closed 3 years ago.
I read everywhere that the time complexity of heapsort is O(n log n) in the worst case. But I also read everywhere that it is a common misconception that building a heap takes O(n log n); instead, you can build a heap in O(n). So, given that a heap can be built in O(n), look at the following sorting algorithm and tell me where I am wrong in analyzing its time complexity.
Put n elements into a heap (time: O(n))
Until the heap is empty, pop each element and copy it into an array. (Time: O(n). Why? Because, in the same way all elements can be put into a heap in O(n), they can all be extracted in O(n) as well. Right?)
All in all, the complexity is O(n) + O(n), which is O(n). But here we also need O(n) additional memory.
I know the traditional heapsort has O(n log n) time complexity and O(1) memory complexity. But isn't this heapsort too? And it gives O(n) even in the worst case, unlike the traditional heapsort algorithm.

Note that you can't sort an array in O(n) without any additional information about your data. In fact, we can prove a lower bound of Ω(n log n) on sorting an array with any comparison-based algorithm, and for the same reason that lower bound applies to heapsort!
Meaning: you can't, ever, sort arbitrary data in O(n) using comparisons alone; any linear-time sorting algorithm has to assume some prior knowledge about your data.
For more information about how to prove this Ω(n log n) lower bound, search for "decision trees".
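To see concretely where the analysis in the question breaks down, here is a minimal sketch of the proposed algorithm using Python's heapq module (the choice of Python, heapq, and the function name are mine, for illustration). The build step really is O(n), but every pop has to restore the heap property by sifting an element down a tree of height about log n, so the extraction phase is Θ(n log n), not O(n):

import heapq

def heapsort_via_heap(items):
    heap = list(items)
    heapq.heapify(heap)                  # bottom-up build: O(n)
    out = []
    while heap:
        out.append(heapq.heappop(heap))  # each pop restores the heap: O(log n)
    return out                           # O(n) extra memory, O(n log n) total time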

Related

sorting algorithm met in an interview: sorting a doubly linked list

Sort a doubly linked list with O(1) space complexity and O(n log n) average time complexity; the O(n^2) worst case should be avoided. How can you meet these requirements?
My interviewer hinted that it should be an improved version of heap sort.

can we use heapsort to sort an unordered set of numbers in linear time?

Quick question: as we know, we can build a heap from an unordered set of numbers in linear time. Can anyone explain how, and does that mean heapsort can sort the set in linear time?
Thanks in advance!
No.
Although you can build a heap (presumably implemented as a complete binary tree) in linear (O(n)) time, each extraction from the heap takes O(log(n)) time in order to maintain the heap invariant. Thus, the assembly of the sorted array from the binary heap takes O(n log(n)) time total, just as all optimal binary comparison-based sorting algorithms do.
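For reference, here is a sketch of a single extraction from an array-backed binary min-heap (my own illustration, assuming 0-based indexing); the sift-down loop is what costs O(log n), because it descends at most one level per iteration down a tree of height about log2(n):

def extract_min(heap):
    # Remove and return the smallest element of an array-backed min-heap.
    last = heap.pop()                     # O(1): detach the last leaf
    if not heap:
        return last
    root, heap[0] = heap[0], last         # move the last leaf to the root
    i, n = 0, len(heap)
    while True:                           # sift down: at most ~log2(n) iterations
        left, right, smallest = 2*i + 1, 2*i + 2, i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return root
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest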

Heapsort: why not use "Soft Heap" to boost the performance?

From the Soft Heap Wikipedia page, it seems that min extraction takes only constant time, so using a soft heap to perform heapsort should lead to an amortized O(n) sort. Even if the constant is large, for very large n this algorithm should be very useful. But I've never heard people mention this. Is there a reason people do not use it?
Thanks!
The Soft Heap suffers from "corruption" (read the page you link to), which makes it inapplicable as a component of a general-purpose sorting routine. You will simply get the wrong answer most of the time.
If you have some application that requires a sort but could deal with the "corrupted" results you would get from a Soft Heap as part of the implementation, then this would give you a potential speedup.
The Fibonacci heap does not suffer from "corruption"; however, it has an O(log n) delete time. You can use a Fibonacci heap as part of a general-purpose sorting routine, but the overall performance of your sort will be O(n log n).
To follow up on @Rob's point:
There is a theoretical limit on the efficiency of comparison-based sorting algorithms, of which heapsort is one. No comparison-based sort can have a runtime better than Ω(n log n) in the average case. Since heapsort is Θ(n log n), this means that it's asymptotically optimal and there can't be an O(n) average-time variant (at least, not a comparison-based one). The proof of this argument comes from information theory: without doing at least Ω(n log n) comparisons, there is no way to reliably distinguish the input permutation from any of the other input permutations.
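To make the counting argument concrete (a standard derivation, not specific to this answer): a comparison sort must be able to produce every one of the n! orderings of its input, and each comparison has only two outcomes, so its decision tree needs at least n! leaves and therefore height at least

    log2(n!) >= log2((n/2)^(n/2)) = (n/2) * log2(n/2) = Ω(n log n)

using the fact that n! contains at least n/2 factors that are each at least n/2. The height of the tree is the worst-case number of comparisons; the average depth of the leaves (the average-case number of comparisons) obeys the same Ω(n log n) bound, since a binary tree with L leaves has average leaf depth at least about log2(L).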
The soft heap was invented by starting with a binomial heap and corrupting some fraction of the keys such that inserting and dequeuing n elements from a soft heap does not necessarily sort them. (The original paper on soft heaps mentions in its abstract that the ingenuity of the structure is artificially decreasing the "entropy" of the values stored to beat the Ω(n log n) barrier). This is the reason why the soft heap can support O(1)-time operations: unlike a normal heap structure, it doesn't always sort, and therefore is not bound by the runtime barrier given above. Consequently, the very fact that n objects can be enqueued and dequeued from a soft heap in O(n) time immediately indicates that you cannot use a soft heap to speed up heapsort.
More generally, there is no way to use any comparison-based data structure to build a sorting algorithm unless you do at least Ω(n log n) work on average when using that data structure. For example, this earlier question explains why you can't convert a binary heap to a BST in O(n) time, since doing so would let you sort in O(n) time purely using comparisons (build the heap in O(n) time, then convert to a BST in O(n) time, then do an inorder traversal in O(n) time to recover the sorted sequence).
Hope this helps!

Analysis of speed and memory for heapsort

I tried googling and wiki'ing these questions but can't seem to find concrete answers. Most of what I found involved using proofs with the master theorem, but I'm hoping for something in plain English that can be more intuitively remembered. Also I am not in school and these questions are for interviewing.
MEMORY:
What exactly does it mean to determine big-o in terms of memory usage? For example, why is heapsort considered to run with O(1) memory when you have to store all n items? Is it because you are creating only one structure for the heap? Or is it because you know its size and so you can create it on the stack, which is always constant memory usage?
SPEED:
How is the creation of the heap done in O(n) time if adding an element is O(1) but percolating it is O(log n)? Wouldn't that mean you do n inserts at O(1), making it O(n), and percolate after each insert at O(log n), so O(n) * O(log n) = O(n log n) in total? I also noticed that most implementations of heapsort use a heapify function to create the heap instead of percolating. Since heapify does n comparisons at O(log n), that would be O(n log n), and with n inserts at O(1) we would get O(n) + O(n log n) = O(n log n)? Wouldn't the first approach yield better performance than the second with small n?
I kind of assumed this above, but is it true that doing an O(1) operation n times would result in O(n) time? Or does n * O(1) = O(1)?
So I found some useful info about building a binary heap from wikipedia: http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap.
I think my main source of confusion was how "inserting" into a heap is described as both O(1) and O(log n); the first really shouldn't be called an insertion, but rather a build step. You wouldn't use heapify any more after you've already created your heap; instead you'd use the O(log n) insertion method.
Adding items one at a time while maintaining the heap property runs in O(n log n), whereas placing all the items first without respecting the heap property and then heapifying actually runs in O(n); the reason isn't very intuitive and requires a proof, so I was wrong about that.
The removal step to get the items out in order costs the same, O(n log n), once either method has produced a heap that respects the heap property.
So in the end you have O(n) (placing the items) + O(n) (heapify) + O(n log n) (removal) = O(n log n) for the build-heap method, and O(n log n) (repeated insertion) + O(n log n) (removal) = O(n log n) for the insertion method. Obviously the first is preferable, especially for small n.
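For what it's worth, here is a minimal in-place heapsort sketch (my own illustration in Python) that shows both points: the first loop in heapsort is the O(n) bottom-up build, sifting down from the last internal node up to the root, and everything happens inside the input array, which is why heapsort is usually quoted as needing O(1) extra memory:

def sift_down(a, start, end):
    # Restore the max-heap property for the subtree rooted at `start`;
    # the heap occupies a[0:end].
    root = start
    while 2*root + 1 < end:
        child = 2*root + 1
        if child + 1 < end and a[child] < a[child + 1]:
            child += 1                       # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heapsort(a):
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # bottom-up build ("heapify"): O(n) total
        sift_down(a, start, n)
    for end in range(n - 1, 0, -1):          # n - 1 extractions, each O(log n)
        a[0], a[end] = a[end], a[0]          # move the current max to its final slot
        sift_down(a, 0, end)                 # all in place: O(1) extra memory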

How to compute the algorithmic space complexity

I am reviewing my data structures and algorithm analysis lesson, and I have a question about how to determine the space complexity of the merge sort and quicksort algorithms:
The depth of recursion is only O(log n) for linked list merge-sort
The amount of extra storage space needed for contiguous quick sort is O(n).
My thoughts:
Both use a divide-and-conquer strategy, so I guess the space complexity of linked-list merge sort should be the same as that of contiguous quicksort. Actually I'd opt for O(log n), because before every recursive call the list is divided in half.
Thanks for any pointers.
The worst case depth of recursion for quicksort is not (necessarily) O(log n), because quicksort doesn't divide the data "in half", it splits it around a pivot which may or may not be the median. It's possible to implement quicksort to address this[*], but presumably the O(n) analysis was of a basic recursive quicksort implementation, not an improved version. That would account for the discrepancy between what you say in the blockquote, and what you say under "my thoughts".
Other than that I think your analysis is sound - neither algorithm uses any extra memory other than a fixed amount per level of recursion, so depth of recursion dictates the answer.
Another possible way to account for the discrepancy, I suppose, is that the O(n) analysis is just wrong. Or, "contiguous quicksort" isn't a term I've heard before, so if it doesn't mean what I think it does ("quicksorting an array"), it might imply a quicksort that's necessarily space-inefficient in some sense, such as returning an allocated array instead of sorting in-place. But it would be silly to compare quicksort and mergesort on the basis of the depth of recursion of the mergesort vs. the size of a copy of the input for the quicksort.
[*] Specifically, instead of calling the function recursively on both parts, you put it in a loop. Make a recursive call on the smaller part, and loop around to do the bigger part, or equivalently push (pointers to) the larger part onto a stack of work to do later, and loop around to do the smaller part. Either way, you ensure that the depth of the stack never exceeds log n, because each chunk of work not put on the stack is at most half the size of the chunk before it, down to a fixed minimum (1 or 2 if you're sorting purely with quicksort).
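A minimal sketch of that technique (my own illustration in Python, assuming a standard Lomuto partition): recurse only into the smaller part and loop over the larger one, so each recursive call handles at most half of the current range and the stack depth stays O(log n):

def partition(a, lo, hi):
    # Lomuto partition around a[hi]; returns the pivot's final index.
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        m = partition(a, lo, hi)
        if m - lo < hi - m:
            quicksort(a, lo, m - 1)   # recurse into the smaller part...
            lo = m + 1                # ...and loop around to handle the larger part
        else:
            quicksort(a, m + 1, hi)
            hi = m - 1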
I'm not really familiar with the term "contiguous quicksort". But quicksort can have either O(n) or O(log n) space complexity depending on how it is implemented.
If it is implemented as follows:
quicksort(start, stop) {
    if (start >= stop) return;       // base case: at most one element
    m = partition(start, stop);      // pivot lands at its final index m
    quicksort(start, m - 1);         // first recursive call
    quicksort(m + 1, stop);          // second recursive call
}
Then the worst-case space complexity is O(n), not O(log n) as is commonly believed.
This is because the stack space is proportional to the depth of the recursion, and nothing in the code above bounds that depth: with unlucky pivots (for example, an already-sorted array partitioned around its last element), each call peels off only one element, so the recursion goes n levels deep before it unwinds, giving
S(n) = S(n - 1) + O(1) = O(n)
If the partitioning always divided the array into two equal parts (the best case), the depth, and hence the stack space, would only be O(log n); a plain recursive quicksort cannot guarantee that.
If we replace the second quicksort call in the snippet above with a loop (tail-call elimination) and always recurse into the smaller part, then each recursive call handles at most half of the current range, so S(n) = S(n/2) + O(1) = O(log n) is guaranteed.
Perhaps "contiguous quicksort" refers to the plain recursive implementation above, because the two quicksort calls are next to each other, in which case the quoted O(n) is its worst-case space complexity.

Resources