Analysis of speed and memory for heapsort

I tried Googling and searching Wikipedia for these questions but can't seem to find concrete answers. Most of what I found involved proofs using the Master Theorem, but I'm hoping for something in plain English that can be remembered more intuitively. Also, I am not in school; these questions are for interview preparation.
MEMORY:
What exactly does it mean to determine big-O in terms of memory usage? For example, why is heapsort considered to run with O(1) memory when you have to store all n items? Is it because you are creating only one structure for the heap? Or is it because you know its size, and so you can create it on the stack, which is always constant memory usage?
SPEED:
How is the creation of the heap done in O(n) time if adding elements is O(1) but percolating is O(log n)? Wouldn't that mean you do n inserts at O(1) each, making that part O(n), and percolate after each insert at O(log n), so n * O(log n) = O(n log n) in total? I also noticed most implementations of heapsort use a heapify function to create the heap instead of percolating. Since heapify does n sift-downs at O(log n) each, that would be O(n log n), and with the n inserts at O(1) we would get O(n) + O(n log n) = O(n log n)? Wouldn't the first approach yield better performance than the second for small n?
I kind of assumed this above, but is it true that doing an O(1) operation n times would result in O(n) time? Or does n * O(1) = O(1)?

So I found some useful info about building a binary heap on Wikipedia: http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap.
I think my main source of confusion was how "inserting" into a heap can be both O(1) and O(log n); the O(1) step really shouldn't be called an insertion, it's just appending an element during the build. Once the heap has been built you wouldn't use heapify anymore; instead you'd use the O(log n) insertion method.
Adding items one at a time while maintaining the heap property runs in O(n log n), whereas placing all the items without respecting the heap property and then heapifying actually runs in O(n). The reason isn't very intuitive and requires a proof, so I was wrong about that.
The removal step to get the ordered items costs the same, O(n log n), once either method has produced a valid heap.
So in the end you'd have O(n) + O(n) + O(n log n) = O(n log n) for the build-then-heapify method, and O(n log n) + O(n log n) = O(n log n) for the insertion method. Obviously the first is preferable, especially for small n.
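For a concrete picture of the two build strategies discussed above, here is a minimal Python sketch (function names and details are my own, not from any of the answers; both build a max-heap in a plain list):

def build_heap_by_insertion(items):
    # O(n log n): append each item, then sift it up to restore the heap property.
    heap = []
    for x in items:
        heap.append(x)                                  # O(1) append
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] < heap[i]:   # sift up: O(log n) per item
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2
    return heap

def build_heap_by_heapify(items):
    # O(n): copy everything first (ignoring the heap property), then sift down
    # from the last parent. Most nodes sit near the leaves and only sift a
    # short distance, which is why the total work turns out to be linear.
    heap = list(items)
    n = len(heap)
    def sift_down(i):
        while True:
            child = 2 * i + 1
            if child >= n:
                return
            if child + 1 < n and heap[child + 1] > heap[child]:
                child += 1
            if heap[i] >= heap[child]:
                return
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)
    return heap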

Related

Time complexity of Heap Sort [duplicate]

This question already has answers here:
How can building a heap be O(n) time complexity?
I read everywhere that the time complexity of heapsort is O(n log n) in the worst case. But I also read everywhere that it is a common misconception that building a heap takes O(n log n); instead, you can build a heap in O(n). So, given that a heap can be built in O(n), look at the following sorting algorithm and tell me where I am wrong in analyzing its time complexity.
Put n elements into a heap (time: O(n))
Until the heap is empty, pop each element and copy it into an array. (Time: O(n). Why? Because in the same way that all elements can be put into a heap in O(n), all of them can also be extracted in O(n). Right?)
All in all, the complexity is O(n) + O(n), which is O(n). But here we also need an additional O(n) of memory.
I know the traditional heapsort has time complexity of O(nlog(n)) and memory complexity of O(1). But isn't this heapsort too? And it provides O(n) even in the worst case, unlike the traditional heapsort algorithm.
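To make the two steps concrete, here is the proposed procedure written with Python's heapq module (my own illustration, not from the question; heapq is a min-heap):

import heapq

def proposed_sort(items):
    heap = list(items)
    heapq.heapify(heap)                  # step 1: O(n), as stated
    out = []
    while heap:                          # step 2: n pops in total...
        out.append(heapq.heappop(heap))  # ...but each pop sifts down in O(log n),
    return out                           # so this loop is O(n log n), not O(n)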
Note that you can't sort an array in O(n) without any additional information about your data. In fact, we can prove a lower bound of Ω(n log n) for sorting an array with any comparison-based algorithm, and for the same reason that lower bound applies to heapsort!
Meaning: you can't, ever, sort arbitrary data in O(n). Any linear-time sorting algorithm has to assume some prior knowledge about your data.
For more information about how to prove this Ω(n log n) lower bound, search for "decision trees".
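For intuition, the decision-tree argument can be sketched in one line (standard textbook reasoning, not from this thread): a comparison sort must be able to distinguish all n! possible input orderings, and k comparisons can produce at most 2^k distinct outcomes, so

2^k >= n!   =>   k >= log2(n!) >= (n/2) * log2(n/2) = Ω(n log n)

which is why no comparison-based sort, heapsort included, can run in O(n) on arbitrary input.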

Sort Stack Ascending Order (Space Analysis)

I was going through the book "Cracking the Coding Interview" and came across the question
"Write a program to sort a stack in ascending order. You may use additional stacks to hold items, but you may not copy the elements into any other data structures (such as an array). The stack supports the following operations: push, pop, peek, isEmpty."
The book gave an answer with O(n^2) time complexity and O(n) space.
However, I came across this blog providing an answer with O(n log n) time complexity using a quicksort approach.
What I was wondering is whether the space complexity is O(n^2), though, since each call to the method initializes another two stacks, along with making another two recursive calls.
I'm still a little shaky on space complexity. I'm not sure whether this would be O(n^2) space, given that the new stacks spawned by each recursive call are smaller than the ones a level up.
If anyone could give a little explanation behind their answer, that would be great.
The space complexity is also O(n log n) in the average case. If the space complexity were O(n^2), then how could the time complexity be O(n log n), since each unit of space allocated needs at least one access?
So, in the average case, assuming the stack is divided in half each time, at the i-th depth of recursion each sub-stack has size O(n/2^i), and there are 2^i recursion branches at that depth.
So the total size allocated at the i-th depth is O(n/2^i) * 2^i = O(n).
Since the maximum depth is log n, the overall space complexity is O(n log n).
However, in the worst case, the space complexity is O(n^2).
In this method of quicksort, the space complexity will exactly follow the time complexity, and the reason is quite simple: you divide the sub-stacks recursively (around the pivot) until each element sits in a stack of size one. That takes about log n levels of division (2^x = n at depth x), and at the end you have n stacks, each of size one. Hence the total space complexity will be O(n log n).
Keep in mind that in this case the space complexity follows the time complexity exactly, because we are literally occupying new space at each level of recursion. So, in the worst case, the space complexity will be O(n^2).
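For reference, here is a minimal Python sketch of the quicksort-on-stacks idea being analyzed (the naming is mine, not the blog's; plain lists are used only as stacks, via append and pop):

def sort_stack(stack):
    # Sort a stack into ascending order (bottom .. top) using only extra stacks.
    # Each call allocates two new helper stacks, which is why the space
    # allocated tracks the recursion pattern discussed above.
    if len(stack) <= 1:
        return stack
    pivot = stack.pop()
    smaller, larger = [], []            # two new stacks per call
    while stack:
        x = stack.pop()
        (smaller if x <= pivot else larger).append(x)
    result = sort_stack(smaller)        # recurse on each partition
    result.append(pivot)
    for x in sort_stack(larger):
        result.append(x)
    return result

With a good pivot the sub-stacks halve at each level (the average case above); with a bad pivot one side keeps nearly everything, which is the O(n^2) worst case.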

Heapsort: why not use "Soft Heap" to boost the performance?

From the Soft Heap Wikipedia page, it seems that min extraction takes only constant time, so using a soft heap to perform heapsort should lead to an amortized O(n) sort. Even if the constant is large, for very large n this algorithm should be very useful. But I've never heard anyone mention this. Is there a reason people do not use it?
Thanks!
The Soft Heap suffers from "corruption" (read the page you link to), which makes it inapplicable as a component of a general-purpose sorting routine. You will simply get the wrong answer most of the time.
If you have some application that requires a sort but could deal with the "corrupted" results you would get from a Soft Heap as part of the implementation, then this would give you a potential speedup.
The Fibonacci Heap does not suffer from "corruption"; however, it has an O(log n) delete-min time. You can use a Fibonacci Heap as part of a general-purpose sorting routine, but the overall performance of your sort will still be O(n log n).
To follow up on @Rob's point:
There is a theoretical limit on the efficiency of comparison-based sorting algorithms, of which heapsort is one. No comparison-based sort can have a runtime better than Ω(n log n) in the average case. Since heapsort is Θ(n log n), this means that it's asymptotically optimal and there can't be an O(n) average-time variant (at least, not a comparison-based one). The proof of this claim comes from information theory: without doing at least Ω(n log n) comparisons, there is no way to reliably distinguish the input permutation from any of the other input permutations.
The soft heap was invented by starting with a binomial heap and corrupting some fraction of the keys, such that inserting and dequeuing n elements from a soft heap does not necessarily sort them. (The original paper on soft heaps mentions in its abstract that the ingenuity of the structure lies in artificially decreasing the "entropy" of the stored values in order to beat the Ω(n log n) barrier.) This is the reason the soft heap can support O(1)-time operations: unlike a normal heap structure, it doesn't always sort, and therefore is not bound by the runtime barrier given above. Consequently, the very fact that n objects can be enqueued and dequeued from a soft heap in O(n) time immediately tells you that you cannot use a soft heap to speed up heapsort.
More generally, there is no way to use any comparison-based data structure to build a sorting algorithm unless you do at least Ω(n log n) work on average when using that data structure. For example, this earlier question explains why you can't convert a binary heap to a BST in O(n) time, since doing so would let you sort in O(n) time purely using comparisons (build the heap in O(n) time, then convert to a BST in O(n) time, then do an inorder traversal in O(n) time to recover the sorted sequence).
Hope this helps!

Running time of heap sort when all elements are identical

Can we say that, when all elements in an array A of size n are identical, the running time of heap sort is O(n)?
--> If this is the case, is O(n) the best-case running time of heapsort?
When all elements are equal, building the heap takes O(n) steps, because when an element gets added to the heap, after one compare (O(1)) we see that it is already in the correct position.
Removing the root is also O(1): when we swap the tail and the root, the heap property is still satisfied.
All elements get added to the heap in O(n) and removed in O(n). So yes, in this case heapsort is O(n). I can't think of a better case, so heapsort's best case must be O(n).
"Heapsort's best case is O(n)" means in English something like: there exist arrays of size n such that heapsort needs at most k*n compares to sort them. That's nice in theory, but in practice it doesn't say much about how good or fast heapsort is.
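To see the linear behaviour described in the answer above concretely, here is a small self-contained sketch (my own code, not from the thread) that runs an in-place heapsort and counts comparisons; with all-equal keys the count comes out as a small multiple of n:

def heapsort_counting_compares(a):
    compares = 0
    def sift_down(end, i):
        nonlocal compares
        while True:
            child = 2 * i + 1
            if child >= end:
                return
            if child + 1 < end:         # pick the larger of the two children
                compares += 1
                if a[child + 1] > a[child]:
                    child += 1
            compares += 1
            if a[i] >= a[child]:        # with equal keys we stop right here
                return
            a[i], a[child] = a[child], a[i]
            i = child
    n = len(a)
    for i in range(n // 2 - 1, -1, -1): # build the max-heap bottom-up
        sift_down(n, i)
    for end in range(n - 1, 0, -1):     # repeatedly move the max to the tail
        a[0], a[end] = a[end], a[0]
        sift_down(end, 0)
    return compares

print(heapsort_counting_compares([7] * 1000))   # roughly 3n comparisons, i.e. O(n)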

How to compute the algorithmic space complexity

I am reviewing my data structures and algorithm analysis lessons, and I have a question about how to determine the space complexity of the merge sort and quicksort algorithms.
The depth of recursion is only O(log n) for linked list merge-sort
The amount of extra storage space needed for contiguous quick sort is O(n).
My thoughts:
Both use a divide-and-conquer strategy, so I guess the space complexity of linked-list merge sort should be the same as that of contiguous quicksort. Actually, I opt for O(log n), because before every iteration or recursive call the list is divided in half.
Thanks for any pointers.
The worst-case depth of recursion for quicksort is not (necessarily) O(log n), because quicksort doesn't divide the data "in half"; it splits it around a pivot, which may or may not be the median. It's possible to implement quicksort to address this[*], but presumably the O(n) analysis was of a basic recursive quicksort implementation, not an improved version. That would account for the discrepancy between what you say in the blockquote and what you say under "my thoughts".
Other than that, I think your analysis is sound: neither algorithm uses any extra memory beyond a fixed amount per level of recursion, so the depth of recursion dictates the answer.
Another possible way to account for the discrepancy, I suppose, is that the O(n) analysis is just wrong. Or, "contiguous quicksort" isn't a term I've heard before, so if it doesn't mean what I think it does ("quicksorting an array"), it might imply a quicksort that's necessarily space-inefficient in some sense, such as returning an allocated array instead of sorting in-place. But it would be silly to compare quicksort and mergesort on the basis of the depth of recursion of the mergesort vs. the size of a copy of the input for the quicksort.
[*] Specifically, instead of calling the function recursively on both parts, you put it in a loop. Make a recursive call on the smaller part, and loop around to do the bigger part, or equivalently push (pointers to) the larger part onto a stack of work to do later, and loop around to do the smaller part. Either way, you ensure that the depth of the stack never exceeds log n, because each chunk of work not put on the stack is at most half the size of the chunk before it, down to a fixed minimum (1 or 2 if you're sorting purely with quicksort).
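As a minimal sketch of the footnote's idea (my own code, using a simple Lomuto partition): the recursion only ever descends into the smaller part, so the call depth stays O(log n), while the loop handles the larger part in the same stack frame.

def partition(a, start, stop):
    # Lomuto partition around a[stop]; returns the pivot's final index.
    pivot = a[stop]
    i = start
    for j in range(start, stop):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[stop] = a[stop], a[i]
    return i

def quicksort_small_first(a, start=0, stop=None):
    if stop is None:
        stop = len(a) - 1
    while start < stop:
        m = partition(a, start, stop)
        if m - start < stop - m:
            quicksort_small_first(a, start, m - 1)   # recurse on the smaller part
            start = m + 1                            # loop around for the larger part
        else:
            quicksort_small_first(a, m + 1, stop)    # recurse on the smaller part
            stop = m - 1                             # loop around for the larger part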
I'm not really familiar with the term "contiguous quicksort". But quicksort can have either O(n) or O(log n) space complexity depending on how it is implemented.
If it is implemented as follows:
quicksort(start, stop) {
    if (start >= stop) return;       // base case: 0 or 1 elements
    m = partition(start, stop);      // pivot ends up at index m
    quicksort(start, m - 1);         // first recursive call
    quicksort(m + 1, stop);          // second recursive call
}
Then the space complexity is O(n), not O(log n) as is commonly believed.
This is because you push onto the stack twice at each level, so the space complexity is determined by the recurrence:
T(n) = 2*T(n/2)
This assumes the partitioning divides the array into two equal parts (the best case). The solution to this recurrence, according to the Master Theorem, is T(n) = O(n).
If we replace the second quicksort call with tail recursion in the code snippet above, then you get T(n) = T(n/2) and therefore T(n) = O(log n) (by case 2 of the Master Theorem).
Perhaps the "contiguous quicksort" refers to the first implementation because the two quicksort calls are next to each other, in which case the space complexity is O(n).
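For concreteness, the tail-recursive variant mentioned above could look something like this (my own sketch, assuming a partition routine like the Lomuto one sketched earlier; note it does not pick the smaller side, so the O(log n) depth only holds when the partitions are balanced):

def quicksort_tail(a, start, stop):
    # The second recursive call is replaced by a loop (tail call eliminated).
    # Only one recursive call remains per level, so with balanced partitions
    # the stack depth follows T(n) = T(n/2), i.e. O(log n).
    while start < stop:
        m = partition(a, start, stop)
        quicksort_tail(a, start, m - 1)   # recurse on the left part
        start = m + 1                     # loop instead of a second call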
