I know that when inserting the numbers 1, 2, 3, …, n into an initially empty min-heap in the order 1, 2, 3, …, n, you just need to append them one by one.
But I can't quite work out how to calculate the time complexity of two different cases: inserting them in reverse order (n, n-1, n-2, …, 2, 1), or inserting 2n numbers in the order (1, n+1, 2, n+2, 3, n+3, …, n-1, 2n-1, n, 2n). I know that in the reverse case you have to move each inserted number up along the height of the heap (which is log n), but I am not quite sure about the remaining parts...
As you say, when you insert the numbers 1..n in order into a min-heap, insertion is O(1) per item, because all you have to do is append the number to the end of the array.
When you insert in reverse order, every item is inserted into the bottom row and has to be sifted up through the heap to the root. Every insertion moves up about log(n) rows, so insertion is O(log n) per item.
The average cost, when you're inserting items in random order, as discussed at some length in Argument for O(1) average-case complexity of heap insertion and the articles it links to, is a constant, something like 1.6.
So there is a very strong argument that the average complexity of binary heap insertion is O(1).
In your particular case, your insertions alternate between O(1) and O(log n), so the average cost per insertion is O((1 + log n)/2) = O(log n), which comes to O(n log n) to insert all of the items.
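To make the two costs concrete, here is a minimal sketch (my own, not from the question) of array-based min-heap insertion that counts swaps; an increasing sequence never swaps, while a decreasing one climbs to the root every time:

```python
def heap_insert(heap, value):
    """Append `value` to a 0-indexed array min-heap, then sift it up.
    Returns the number of swaps, i.e. how many levels it climbed."""
    heap.append(value)
    i = len(heap) - 1
    swaps = 0
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
        swaps += 1
    return swaps

h = []
print(sum(heap_insert(h, x) for x in range(1, 17)))      # 0 swaps: O(1) each
h = []
print(sum(heap_insert(h, x) for x in range(16, 0, -1)))  # 38 swaps, roughly sum of log2(i)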
Consider a binary max-heap with n elements. It will have a height of O(log n). When a new element is inserted into the heap, it is propagated up the heap so that the max-heap property is always satisfied.
The new element will be added as a child on the last level. But after insertion, the max-heap property may be violated, so a heapify (sift-up) step is used. This has a time complexity of O(log n), i.e. the height of the heap.
But can we make it even more efficient?
When many inserts and deletes are performed, this procedure makes things slow. Also, it is a strict requirement that the heap be a valid max-heap after every insertion.
The objective is to reduce the time complexity of the heapify method. This is possible only if the number of comparisons is reduced.
"The objective is to reduce the time complexity of the heapify method."
That is a pity, because that is impossible, in contrast to:
Reduce the time complexity of multiple inserts and deletes:
Imagine not inserting into the n-item heap immediately,
building an auxiliary one (or even a list).
On delete (extract-min?), place one item from the auxiliary structure (now at size k) "in the spot emptied" and do a sift-down or sift-up as required; this is cheap as long as k << n.
If the auxiliary data structure is not significantly smaller than the main one, merge them.
Such ponderings lead to advanced heaps like Fibonacci, pairing, Brodal…
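As an illustration only, here is a simplified sketch of the buffering idea on top of Python's heapq; the class name and the square-root merge threshold are my own choices, not a standard structure:

```python
import heapq

class BufferedHeap:
    """Buffer inserts in a small auxiliary heap and merge into the
    main heap once the buffer stops being much smaller than it."""

    def __init__(self, items=()):
        self.main = list(items)
        heapq.heapify(self.main)           # O(n)
        self.aux = []                      # small auxiliary heap, size k

    def insert(self, x):
        heapq.heappush(self.aux, x)        # O(log k) instead of O(log n)
        if len(self.aux) ** 2 > len(self.main):
            for v in self.aux:             # merge back: O(k log n)
                heapq.heappush(self.main, v)
            self.aux = []

    def extract_min(self):
        # The overall minimum is at the root of one of the two heaps.
        if self.aux and (not self.main or self.aux[0] < self.main[0]):
            return heapq.heappop(self.aux)
        return heapq.heappop(self.main)
```

The threshold keeps the auxiliary heap around sqrt(n) items, so buffered inserts cost O(log k) = O(log sqrt(n)) rather than O(log n), at the price of an occasional merge.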
The time complexity of the insert operation in a heap depends on the number of comparisons that are made. One could imagine using some overhead to implement a smart binary search along the leaf-to-root path.
However, the time complexity is not determined by the number of comparisons alone. Time complexity accounts for all work that must be performed, and in this case the number of writes is also O(log n) in the worst case, and that number of writes cannot be reduced.
The number of nodes whose values need to change during an insert is O(log n) in the worst case. Reducing the number of comparisons alone is therefore not enough to reduce the complexity.
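To illustrate why fewer comparisons alone don't help, here is a sketch (my own, assuming a 0-indexed heapq-style array) of insertion that binary-searches the leaf-to-root path, cutting comparisons to O(log log n) while the writes stay O(log n):

```python
def insert_bsearch(heap, value):
    """Insert into a 0-indexed array min-heap. Values along the
    root-to-leaf path are non-decreasing, so the landing spot of
    `value` can be found by binary search in O(log log n) comparisons.
    The shifting still performs O(log n) writes, so the overall
    complexity remains O(log n)."""
    heap.append(value)
    # Indices on the path from the new leaf's parent up to the root.
    path = []
    i = len(heap) - 1
    while i > 0:
        i = (i - 1) // 2
        path.append(i)
    # path[0] (parent) .. path[-1] (root) hold non-increasing values;
    # count how many of them are strictly greater than `value`.
    lo, hi = 0, len(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if heap[path[mid]] > value:
            lo = mid + 1
        else:
            hi = mid
    # Shift those `lo` ancestors down one level each, then drop `value` in.
    i = len(heap) - 1
    for j in range(lo):
        heap[i] = heap[path[j]]
        i = path[j]
    heap[i] = value
```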
For the following question:
Question 3
You are given a heap with n elements that supports Insert and Extract-Min. Which of the following tasks can you achieve in O(log n) time?
Find the median of the elements stored in the heap.
Find the fifth-smallest element stored in the heap.
Find the largest element stored in the heap.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
"Find the median of the elements stored in the heap. " should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
It depends on what the running times of the insert and extract-min operations are. In traditional heaps, both take ϴ(log n) time. However, in finger-tree-based heaps, only insert takes ϴ(log n) time, while extract-min takes O(1) time. There, you can find the fifth-smallest element in O(5) = O(1) time and the median in O(n/2) = O(n) time. You can also find the largest element in O(n) time.
Why is "Find the largest element stored in the heap."not correct, my understanding here is that you can use logN time to go to the bottom of the heap, and one of the element there must be the largest element.
The lowest level of the heap contains half of the elements. More precisely, half of the elements of the heap are leaves, i.e. they have no children. The largest element in the heap is one of those. Finding the largest element of the heap, then, will require that you examine n/2 items. Except that the heap only supports insert and extract-min, so you end up having to call extract-min on every element. Finding the largest element will take O(n log n) time.
"Find the fifth-smallest element stored in the heap." this should take constant time right, because you only need to go down 5 layers at most?
This can be done in log(n) time. Actually 5*log(n) because you have to call extract-min five times. But we ignore constant factors. However it's not constant time because the complexity of extract-min depends on the size of the heap.
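A minimal sketch with Python's heapq, assuming the heap is stored as a heapq-style list with at least five elements; the popped values are pushed back so the heap is left intact:

```python
import heapq

def fifth_smallest(heap):
    """Five extract-mins: O(5 log n) = O(log n)."""
    popped = [heapq.heappop(heap) for _ in range(5)]
    for v in popped:                 # restore the heap, also O(log n)
        heapq.heappush(heap, v)
    return popped[-1]
```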
"Find the median of the elements stored in the heap." should this take O(n) time? because we extract min for the n elements to get a sorted array, and take o(1) to find the median of it?
The median is the middle element. So you only have to remove n/2 elements from the heap. But removing an item from the heap is a log(n) operation. So the complexity is O(n/2 log n) and since we ignore constant factors in algorithmic analysis, it's O(n log n).
I thought about doing this with a sorted array, saving the index of the median so that reading it takes O(1), but I couldn't think of any way to do the insert in O(1) while keeping the array sorted.
I'd really appreciate it if someone could help me with this problem.
What you are asking for is impossible, because it would allow comparison-based sorting in O(n) time:
Suppose you have an unsorted array of length n.
Find the minimum element and maximum element in O(n) time.
Insert all n elements into the data structure, each insertion takes O(1) time so this takes O(n) time.
Insert n-1 extra copies of the minimum element. This also takes O(n) time.
Initialise an output array of length n.
Do this n times:
Read off the median of the elements currently in the data structure, and write it at the next position into the output array. This takes O(1) time.
Insert two copies of the maximum element into the data structure. This takes O(1) time.
The above algorithm supposedly runs in O(n) time, and the result is a sorted array of the elements from the input array. But this is impossible, because comparison-sorting takes Ω(n log n) time. Therefore, the supposed data structure cannot exist.
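Here is the reduction as code. MedianStructure is the hypothetical data structure from the argument above; it cannot actually exist with O(1) insert and O(1) median, so the stub below uses a sorted list just to show that the procedure really sorts:

```python
import bisect

class MedianStructure:
    """Stand-in for the hypothetical structure. Here insert is O(n);
    the point of the proof is that O(1) insert plus O(1) median
    cannot both exist."""
    def __init__(self):
        self.xs = []
    def insert(self, x):
        bisect.insort(self.xs, x)          # O(n) here, O(1) claimed
    def median(self):
        return self.xs[len(self.xs) // 2]  # middle of an odd-sized list

def impossible_sort(a):
    """If insert and median were O(1), this would comparison-sort
    `a` in O(n) time, a contradiction."""
    n = len(a)
    lo, hi = min(a), max(a)                # O(n)
    ds = MedianStructure()
    for x in a:                            # n inserts
        ds.insert(x)
    for _ in range(n - 1):                 # pad with n-1 copies of the min
        ds.insert(lo)
    out = []
    for _ in range(n):
        out.append(ds.median())            # size is always odd: 2n-1+2t
        ds.insert(hi)                      # two max copies shift the
        ds.insert(hi)                      # median one position right
    return out

print(impossible_sort([3, 1, 4, 1, 5]))    # [1, 1, 3, 4, 5]
```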
I was given a question to find the top log(n) elements in an unsorted array. I know that I can do this in O(n) time with a selection algorithm to find the log(n)-th largest element and then find all elements larger than it. However, would it be possible to use a heap or other priority queue to do it in O(n) time as well?
To get the top k items from an unsorted list, you employ the Quickselect algorithm to partition the array so that the top k elements end up in the first k positions.
Quickselect is, on average, O(n). However, in pathological cases it can take O(n^2) time.
Using a heap, selecting the top k elements from a list of n is O(n log k). If you're taking the top log(n) items, then k = log(n), and your complexity is O(n log(log n)).
From a performance perspective, using a heap to select the top log(n) items from a list will almost certainly be faster than using Quickselect. The reason is that Quickselect, even though it's linear on average, has a high constant factor: it takes almost the same time to select the top 10 items from a list of 1,000,000 as it does to select the top 100,000 items. My research shows that using a heap to do the selection is faster than Quickselect when the number to be selected is a small fraction (less than 1%) of the total number of items. Considering that the log base 2 of 1,000,000 is about 20, it's almost certain that a heap selection algorithm would be faster than Quickselect here.
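For concreteness, here is the standard bounded-heap selection with Python's heapq (a sketch, not the exact benchmark code referred to above):

```python
import heapq

def top_k(items, k):
    """Select the k largest items in O(n log k) by keeping a min-heap
    of size k: the heap root is the smallest of the current top k."""
    heap = []
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the root, push x: O(log k)
    return sorted(heap, reverse=True)

print(top_k(range(100), 5))              # [99, 98, 97, 96, 95]
```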
How would you find the k smallest elements from an unsorted array using quicksort (other than just sorting and taking the k smallest elements)? Would the worst case running time be the same O(n^2)?
You could optimize quicksort: all you have to do is skip the recursive calls on the portions of the array that don't contain position k, i.e. recurse only into the side of the partition where position k lies. If you don't need your output sorted, you can stop once a pivot lands at position k.
Warning: non-rigorous analysis ahead.
However, I think the worst-case time complexity will still be O(n^2). That occurs when you always pick the biggest or smallest element as your pivot, and the recursion devolves into quadratic, bubble-sort-like behavior (i.e. you aren't able to pick a pivot that divides and conquers).
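Here's a sketch of that truncated quicksort (Hoare-style partitioning with a random pivot; the function names are my own):

```python
import random

def k_smallest(a, k):
    """Move the k smallest elements of `a` (in arbitrary order) to the
    front by recursing only into the side that contains position k-1.
    Average O(n), worst case O(n^2)."""
    def select(lo, hi):
        if lo >= hi:
            return
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                      # Hoare-style partition
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Now a[lo..j] <= pivot <= a[i..hi].
        if k - 1 <= j:
            select(lo, j)                  # position k-1 is on the left
        elif k - 1 >= i:
            select(i, hi)                  # position k-1 is on the right
        # otherwise a[k-1] sits in the all-pivot middle gap: done
    select(0, len(a) - 1)
    return a[:k]

print(k_smallest([5, 1, 4, 2, 3], 2))      # [1, 2] or [2, 1]
```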
Another solution (if the only purpose of this collection is to pick out the k smallest elements) is to use a heap limited to k nodes, i.e. tree height ceil(log(k)). Note that to keep only the k smallest, it should be a max-heap, so the largest of the current candidates sits at the root and can be evicted. Each insert or removal then costs O(log k), so processing all n items takes O(n log k) worst-case (versus O(n log n) for a full heapsort or mergesort), and it gives you the k smallest back in sorted order.
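A sketch of that bounded-heap variant. Python's heapq is a min-heap, so values are negated here to simulate the max-heap described above:

```python
import heapq

def k_smallest_sorted(items, k):
    """Keep the k smallest seen so far in a size-k max-heap
    (simulated by negating values). Each of the n items costs at
    most O(log k), for O(n log k) overall."""
    heap = []                            # holds -x for the k smallest x
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:               # x beats the current k-th smallest
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)

print(k_smallest_sorted([9, 2, 7, 1, 8, 3], 3))   # [1, 2, 3]
```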