What is the extract-min run-time and why? - data-structures

Why does extract-min take O(lg n) in a heap?

See here: https://en.wikipedia.org/wiki/Binary_heap#Extract
In a few words, you do three steps:
First, replace the root of the heap with the last element on the last level. This takes O(1).
Second, compare the new root with its children (still O(1)). If the heap property holds, you're done!
Now the tricky part: if the new root is bigger than one of its children (we assume the minimum sits at the top), swap them; if it is bigger than both children, swap with the lesser one, since that is the value you want to bring up. You may now have the same problem in that child's subtree, so repeat steps 2 and 3 there. You never repeat this more times than the number of levels in the tree, since each repetition moves you one level down, and a balanced binary tree has O(lg N) levels.
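To make those steps concrete, here is a minimal sketch in Python, assuming the min-heap is stored in a plain list in the usual array layout (children of index i at 2*i+1 and 2*i+2); extract_min is just an illustrative name, not a standard library call.

def extract_min(heap):
    # Step 1: remember the minimum and move the last element into the root slot -- O(1)
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i, n = 0, len(heap)
        # Steps 2 and 3: sift the new root down, one level per iteration -- at most O(lg n) levels
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:                                  # smaller than both children: done
                break
            heap[i], heap[child] = heap[child], heap[i]     # swap with the lesser child
            i = child                                       # go one level down
    return smallest

On a heap of a million elements the loop runs at most about 20 times, which is exactly where the O(lg n) bound comes from.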

Related

Why is the top down approach of heap construction less efficient than bottom up even though its order of growth is lower O(log n) over O(n)?

How is the bottom up approach of heap construction of the order O(n)? Anany Levitin says in his book that this is more efficient compared to top down approach which is of order O(log n). Why?
That to me seems like a typo.
There are two standard algorithms for building a heap. The first is to start with an empty heap and to repeatedly insert elements into it one at a time. Each individual insertion takes time O(log n), so we can upper-bound the cost of this style of heap-building at O(n log n). It turns out that, in the worst case, the runtime is Θ(n log n), which happens if you insert the elements in reverse-sorted order.
The other approach is the heapify algorithm, which builds the heap directly by starting with each element in its own binary heap and progressively coalescing them together. This algorithm runs in time O(n) regardless of the input.
The reason why the first algorithm requires time Θ(n log n) is that, if you look at the second half of the elements being inserted, you'll see that each of them is inserted into a heap whose height is Θ(log n), so the cost of doing each bubble-up can be high. Since there are n / 2 elements and each of them might take time Θ(log n) to insert, the worst-case runtime is Θ(n log n).
On the other hand, the heapify algorithm spends the majority of its time working on small heaps. Half the elements are inserted into heaps of height 0, a quarter into heaps of height 1, an eighth into heaps of height 2, etc. This means that the bulk of the work is spent inserting elements into small heaps, which is significantly faster.
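A small illustration of the two strategies using Python's heapq module (the data and variable names are made up for the example); pushing in a loop is the repeated-insertion approach, while heapify is the linear-time algorithm described above.

import heapq
import random

data = [random.random() for _ in range(10**6)]

# Repeated insertion: n pushes, each costing up to O(log n) -- O(n log n) in the worst case
h1 = []
for x in data:
    heapq.heappush(h1, x)

# Heapify: builds the heap in place in O(n), regardless of the input order
h2 = list(data)
heapq.heapify(h2)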
If you consider swapping to be your basic operation -
In top-down construction, the tree is constructed first and a heapify function is called on the nodes. The worst case would swap log n times (to sift the element to the top of the tree, where the height of the tree is log n) for each of the n/2 leaf nodes. This results in an O(n log n) upper bound.
In bottom up construction, you assume all the leaf nodes to be in order in the first pass, so heapify is now called only on n/2 nodes. At each level, the number of possible swaps increases but the number of nodes on which it happens decreases.
For example -
At the level right above the leaf nodes, we have n/4 nodes that can have at most 1 swap each.
At their parent level, we have n/8 nodes that can have at most 2 swaps each, and so on.
On summation, we'll come up with a O(n) efficiency for bottom up construction of a heap.
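Spelling that summation out (a sketch using the same per-level counts as above; the n/2 leaves need no swaps and are ignored):

\frac{n}{4}\cdot 1 + \frac{n}{8}\cdot 2 + \frac{n}{16}\cdot 3 + \cdots
  = n\sum_{h\ge 1}\frac{h}{2^{h+2}}
  = \frac{n}{4}\sum_{h\ge 1}\frac{h}{2^{h}}
  = \frac{n}{4}\cdot 2
  = \frac{n}{2}
  = O(n)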
Top-down and bottom-up generally refer to ways of solving a problem, especially in computer science algorithms.
Top down :
Take the whole problem and split it into two or more parts.
Find solution to these parts.
If these parts turn out to be too big to be solved as a whole, split them further and find solutions to those sub-parts.
Merge solutions according to the sub-problem hierarchy thus created after all parts have been successfully solved.
Bottom up :
Breaking the problem into the smallest possible (and practical) parts.
Finding solutions to these small sub-problems.
Merging the solutions you get iteratively (again and again) until you have merged all of them into the final solution to the "big" problem.
The main difference in approach is splitting versus merging. You either start big and split "down" as required, or start with the smallest pieces and merge your way "up" to the final solution.
In the regular heapify(), we perform two comparisons on each node from top to bottom to find the largest of three elements:
Parent node with left child
The larger node from the first comparison with the second child
Bottom-up heapsort, on the other hand, only compares the two children and follows the larger child to the end of the tree ("top-down"). From there, the algorithm goes back towards the tree root ("bottom-up") and searches for the first element larger than the root. From this position, all elements are moved one position towards the root, and the root element is placed in the field that has become free.
Binary Heap can be built in two ways:
Top-Down Approach
Bottom-Up Approach
In the Top-Down Approach, you first begin with 3 elements. You consider 2 of them as heaps and the third as a key k. You then create a new heap by joining these two sub-heaps with the key as the root node. Then, you perform Heapify to maintain the heap order (either Min or Max Heap order).
Then, we take two such heaps (containing 3 elements each) and another element as the key, and create a new heap. We keep repeating this process, increasing the size of each sub-heap, until all elements are added.
This process adds half the elements at the bottom level, a quarter at the second-last level, an eighth at the third-last level, and so on; summing these contributions, the approach takes O(n) time.
In the bottom-up approach, we first simply create a complete binary tree from the given elements. We then apply a DownHeap operation on each parent of the tree, starting from the last parent and going up the tree to the root. This is a much simpler approach. Although DownHeap's worst case is O(log n) and we apply it to the n/2 internal nodes, which naively suggests O(n log n), most of those nodes sit near the bottom of the tree and sift down only a few levels, so the total work sums to O(n).
Regards.

Time complexity of inserting into a heap

I am trying to mostly understand the reasoning behind the Big O and Omega of inserting a new element in a heap. I know I can find answers online but I really like having a thorough understanding rather than just finding answers online and just memorizing blindly.
So for instance if we have the following heap (represented in array format)
[8,6,7,3,5,3,4,1,2]
If we decide to insert a new element "4" our array will look like this now
[8,6,7,3,5,3,4,1,2,4]
It would be placed at index 9, and since this is a 0-indexed array its parent would be index 4, which holds the element 5. In this case we would not need to do anything, because 4 < 5 and the heap property is not violated. So the best case is Omega(1).
However, if the new element we insert is 100, then we would have to sift it up, which has a running time of O(log n); therefore, in the worst case, inserting a new element into the heap takes O(log n).
Can someone correct me if I am wrong? I am not sure my understanding or reasoning is 100% right.
Yes, you are right about the best-case running time. For the worst-case running time, you are also right that it is Theta(lg n). The reason is that your heap is always assumed to be BALANCED, i.e. every level of nodes is full except possibly the bottom one. So when you insert an element at the bottom level and swap it from one level up to the next, the number of nodes at each level roughly halves as you go up, and you can only do this swap log_2(n) = O(lg n) times before you are at the root node (the level at the top of the heap that has just one node). If you insert a value that belongs at the top of the heap, it starts at the bottom and you will indeed have to do about log_2(n) swaps to get it to the top where it belongs. So the number of swaps in the worst case is Theta(lg n).
Your understanding is correct.
Since the heap is a complete binary tree, its height is about lg n (where n is the number of elements).
In the worst case (the element inserted at the bottom has to be swapped at every level, from the bottom up to the root, to maintain the heap property), 1 swap is needed at every level. Therefore, the maximum number of times this swap is performed is log n.
Hence, insertion in a heap takes O(log n) time.
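As a hedged sketch of both cases, here is an insertion routine for a max-heap stored in a Python list, using the asker's example above; insert is an illustrative name rather than a standard API.

def insert(heap, value):
    heap.append(value)                    # place the new element at the bottom -- O(1)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:       # heap property already holds: 0 swaps (best case)
            break
        heap[i], heap[parent] = heap[parent], heap[i]   # one swap per level
        i = parent                        # at most about lg n levels -- O(log n) worst case

h = [8, 6, 7, 3, 5, 3, 4, 1, 2]
insert(h, 4)     # parent is 5, so no swap is needed
insert(h, 100)   # bubbles all the way up to the root in about lg n swaps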

Why siftDown is better than siftUp in heapify?

To build a MAX heap tree, we can either siftDown or siftUp. By sifting down, we start from the root and compare it to its two children; if one of the children is larger, we swap the element with the larger of the two and keep sifting it down until we reach a leaf node (or, again, until the element is larger than both of its children). If both children are smaller, we stop.
Now we will only need to do that n/2 times, because the number of leaves is n/2, and the leaves will satisfy the heap property when we finish heapifying the last element on the level before the last (before the leaves) - so we will be left with n/2 elements to heapify.
Now if we use siftUp, we will start with the leaves, and eventually we will need to heapify all n elements.
My question is: when we use siftDown, aren't we basically doing two comparisons (comparing the element to both of its children), instead of only one comparison when using siftUp, since we only compare the element to its one parent? If yes, wouldn't that mean that we're doubling the complexity and really ending up with the same exact complexity as sifting up?
Actually, building a heap with repeated calls of siftDown has a complexity of O(n) whereas building it with repeated calls of siftUp has a complexity of O(nlogn).
This is due to the fact that when you use siftDown, the time taken by each call decreases with the depth of the node because these nodes are closer to the leaves. When you use siftUp, the number of swaps increases with the depth of the node because if you are at full depth, you may have to swap all the way to the root. As the number of nodes grows exponentially with the depth of the tree, using siftUp gives a more expensive algorithm.
Moreover, if you are using a Max-heap to do some sort of sorting where you pop the max element of the heap and then reheapify it, it's easier to do so by using siftDown. You can reheapify in O(logn) time by popping the max element, putting the last element at the root node (which was empty because you popped it) and then sifting it down all the way back to its correct spot.
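As a rough way to see this in numbers, here is a sketch (not a careful benchmark; all names are made up) that counts comparisons for both ways of building a max-heap. Ascending input is used because it is the worst case for repeated sift-up insertion into a max-heap; sift-down stays linear for any input.

def sift_down(a, i, n, count):
    # push a[i] down, swapping with the larger child, until the heap property holds
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n:
            count[0] += 1
            if a[left] > a[largest]:
                largest = left
        if right < n:
            count[0] += 1
            if a[right] > a[largest]:
                largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def sift_up(a, i, count):
    # bubble a[i] up while it is larger than its parent
    while i > 0:
        parent = (i - 1) // 2
        count[0] += 1
        if a[i] <= a[parent]:
            return
        a[i], a[parent] = a[parent], a[i]
        i = parent

def build_with_sift_down(values):
    a, count = list(values), [0]
    for i in range(len(a) // 2 - 1, -1, -1):   # only the n/2 internal nodes
        sift_down(a, i, len(a), count)
    return count[0]

def build_with_sift_up(values):
    a, count = list(values), [0]
    for i in range(1, len(a)):                 # all n elements
        sift_up(a, i, count)
    return count[0]

n = 1 << 16
print(build_with_sift_down(range(n)))   # grows roughly linearly in n
print(build_with_sift_up(range(n)))     # grows roughly like n * lg n

Even though sift-down pays two comparisons per level, its total stays within a small constant multiple of n, while the sift-up total is larger by roughly a factor of lg n.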

O(klogk) time algorithm to find kth smallest element from a binary heap

We have an n-node binary heap which contains n distinct items (smallest item at the root). For a k<=n, find a O(klogk) time algorithm to select kth smallest element from the heap.
O(klogn) is obvious, but couldn't figure out a O(klogk) one. Maybe we can use a second heap, not sure.
Well, your intuition was right that we need an extra data structure to achieve O(klogk), because if we simply perform operations on the original heap, the term logn will remain in the resulting complexity.
Guessing from the targeted complexity O(klogk), I feel like creating and maintaining a heap of size k to help me achieve the goal. As you may be aware, building a heap of size k in top-down fashion takes O(klogk), which really reminds me of our goal.
The following is my try (not necessarily elegant or efficient) in an attempt to attain O(klogk):
We create a new min heap, initializing its root to be the root of the original heap.
We update the new min heap by deleting its current root and inserting, from the original heap, the two children of the node we just deleted. We repeat this process k - 1 times.
The resulting heap will consist of k nodes, and its root is the kth smallest element in the original heap.
Notes: Nodes in the new heap should store indexes of their corresponding nodes in the original heap, rather than the node values themselves. Each iteration of step 2 adds a net of one more node to the new heap (one deleted, two inserted), so after k - 1 iterations the new heap has size k. During the ith iteration, the node deleted is the ith smallest element in the original heap.
Time Complexity: each iteration takes O(3 log k) time to delete one element from and insert two into the new heap. Over k - 1 iterations, that is O(3k log k) = O(k log k).
Hope this solution inspires you a bit.
Assuming that we're using a minheap, so that a root node is always smaller than its children nodes.
Create a sorted list toVisit, which contains the nodes which we will traverse next. This is initially just the root node.
Create an array smallestNodes. Initially this is empty.
While length of smallestNodes < k:
Remove the smallest Node from toVisit
add that node to smallestNodes
add that node's children to toVisit
When you're done, the kth smallest node is in smallestNodes[k-1].
Depending on the implementation of toVisit, you can get insertion in O(log k) time and removal of the smallest node in at most O(log k) time (for example with a heap, since you only ever remove the topmost node). That makes O(k*log(k)) total.
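A minimal sketch of this idea in Python, assuming the original min-heap is given in the usual array layout; heapq plays the role of the auxiliary structure of (value, index) pairs, and kth_smallest is just an illustrative name.

import heapq

def kth_smallest(heap, k):
    # heap: a min-heap in array form (children of index i at 2*i+1 and 2*i+2), 1 <= k <= len(heap)
    candidates = [(heap[0], 0)]       # the auxiliary heap never grows beyond about k entries
    for _ in range(k - 1):
        value, i = heapq.heappop(candidates)                # next-smallest element -- O(log k)
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(candidates, (heap[child], child))   # O(log k) each
    return candidates[0][0]           # after k - 1 pops, the top of the auxiliary heap is the kth smallest

h = [1, 3, 6, 5, 9, 8, 7, 10, 12, 11]   # a valid min-heap in array form
print(kth_smallest(h, 4))               # -> 6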

Listing values in a binary heap in sorted order using breadth-first search?

I'm currently reading this paper and on page five, it discusses properties of binary heaps that it considers to be common knowledge. However, one of the points they make is something that I haven't seen before and can't make sense of. The authors claim that if you are given a balanced binary heap, you can list the elements of that heap in sorted order in O(log n) time per element using a standard breadth-first search. Here's their original wording:
In a balanced heap, any new element can be
inserted in logarithmic time. We can list the elements of a heap in order by weight, taking logarithmic
time to generate each element, simply by using breadth first search.
I'm not sure what the authors mean by this. The first thing that comes to mind when they say "breadth-first search" would be a breadth-first search of the tree elements starting at the root, but that's not guaranteed to list the elements in sorted order, nor does it take logarithmic time per element. For example, running a BFS on this min-heap produces the elements out of order no matter how you break ties:
        1
       / \
     10   100
    /  \
  11    12
This always lists 100 before either 11 or 12, which is clearly wrong.
Am I missing something? Is there a simple breadth-first search that you can perform on a heap to get the elements out in sorted order using logarithmic time each? Clearly you can do this by destructively modifying heap by removing the minimum element each time, but the authors' intent seems to be that this can be done non-destructively.
You can get the elements out in sorted order by traversing the heap with a priority queue (which requires another heap!). I guess this is what he refers to as a "breadth first search".
I think you should be able to figure it out (given your rep in algorithms) but basically the key of the priority queue is the weight of a node. You push the root of the heap onto the priority queue. Then:
while pq isn't empty
pop off pq
append to output list (the sorted elements)
push children (if any) onto pq
I'm not really sure (at all) if this is what he was referring to but it vaguely fitted the description and there hasn't been much activity so I thought I might as well put it out there.
If you know that all elements lower than 100 are on the left, you can go left; but even if you do step onto 100, you will see that there are no suitable elements on that side and back out. Either way, you visit any given node at most twice before realising that the element you are searching for is not below it. That means you make at most 2*log(N) moves in the tree, which simplifies to O(log N) per element.
The point is that even if you "screw up" and traverse to the "wrong" node, you only visit that node once.
EDIT
This is essentially how heapsort works: you can imagine having to restore the heap, at O(log n) cost, each time you take out the top element.

Resources