Time complexity of inserting into a heap - algorithm

I am mostly trying to understand the reasoning behind the big-O and big-Omega bounds for inserting a new element into a heap. I know I can find the answers online, but I would rather build a thorough understanding than just memorize them blindly.
So for instance if we have the following heap (represented in array format)
[8,6,7,3,5,3,4,1,2]
If we decide to insert a new element 4, our array will now look like this:
[8,6,7,3,5,3,4,1,2,4]
It would be placed at index 9, and since this is a 0-indexed array, its parent would be index 4, which holds the element 5. In this case we would not need to do anything, because 4 < 5 and the binary heap property is not violated. So the best case is Omega(1).
However, if the new element we insert is 100, then we would have to sift it up toward the root (swapping it with its parent level by level), which has a running time of O(log n); therefore, in the worst case, inserting a new element into the heap takes O(log n).
Can someone correct me if I am wrong? I am not sure whether my understanding and reasoning are 100% right.

Yes, you are right about the best-case running time. For the worst-case running time, you are also right that it is Theta(lg n). The reason is that your heap is always assumed to be BALANCED: every level of nodes is full except possibly the bottom one, and the number of nodes per level roughly halves as you move up, so there are only about log_2(n) levels. When you insert an element at the bottom and swap it upward, each swap moves it up exactly one level, so you can do this swap at most log_2(n) = O(lg n) times before you reach the root node (the level at the top of the heap that has just one node). Conversely, if you insert a value that belongs at the top of the heap, it starts at the bottom and you will indeed have to do roughly log_2(n) swaps to get it to the top where it belongs. So the number of swaps in the worst case is Theta(lg n).

Your understanding is correct.
Since the heap has a complete binary tree structure, its height is lg n (where n is the number of elements).
In the worst case, the element inserted at the bottom has to be swapped at every level, from the bottom up to the root, to maintain the heap property; one swap is needed per level. Therefore, the maximum number of times this swap is performed is log n.
Hence, insertion in a heap takes O(log n) time.
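To make the sift-up step concrete, here is a minimal sketch of max-heap insertion on a Python list; the function name and details are illustrative, not taken from the question.

    def insert_max_heap(heap, value):
        # Append the value, then sift it up until the max-heap property holds.
        heap.append(value)
        i = len(heap) - 1
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] <= heap[parent]:
                break                 # best case: one comparison, no swaps -> Omega(1)
            heap[i], heap[parent] = heap[parent], heap[i]   # swap up one level
            i = parent                # worst case: climbs all ~log2(n) levels -> O(log n)

    heap = [8, 6, 7, 3, 5, 3, 4, 1, 2]
    insert_max_heap(heap, 4)     # 4 < its parent 5, so no swaps are needed
    insert_max_heap(heap, 100)   # 100 is sifted all the way up to the root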

Related

Comparing complexity of Binary Indexed Tree operations with normal approach

I was going through this article to understand BIT : https://www.hackerearth.com/practice/notes/binary-indexed-tree-or-fenwick-tree/#c217533
In it, the author says the following at one point:
If we look at the for loop in the update() operation, we can see that the loop runs at most the number of bits in index x, which is restricted to be less than or equal to n (the size of the given array), so we can say that the update operation takes at most O(log2(n)) time.
My question is: if x can go up to n (the size of the given array), how is the time complexity any different from the normal approach he mentions at the start? In that approach, update is O(1), and prefixsum(int k) can take up to n steps.
The key is that you don't advance the loop by a step of 1, but by a step of size x & -x (the lowest set bit of x).
This is equivalent to going upward in the tree to the next node that needs to include the current one. Since the lowest set bit at least doubles on every step, the loop runs at most log2(n) times, which gives you a worst case of O(log n).
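As a rough illustration of that stepping, here is a minimal Fenwick tree sketch in Python (1-indexed, as in the article); the class and method names are just illustrative.

    class FenwickTree:
        def __init__(self, n):
            self.n = n
            self.tree = [0] * (n + 1)      # 1-indexed; tree[0] is unused

        def update(self, x, delta):
            # Add delta at position x; each step jumps to the next covering node.
            while x <= self.n:
                self.tree[x] += delta
                x += x & -x                # lowest set bit at least doubles -> <= log2(n) steps

        def prefix_sum(self, x):
            # Sum of positions 1..x; each step drops the lowest set bit.
            s = 0
            while x > 0:
                s += self.tree[x]
                x -= x & -x                # <= log2(n) steps
            return s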

Why is the top down approach of heap construction less efficient than bottom up even though its order of growth is lower O(log n) over O(n)?

How is the bottom-up approach of heap construction of order O(n)? Anany Levitin says in his book that it is more efficient than the top-down approach, which is of order O(log n). Why?
That to me seems like a typo.
There are two standard algorithms for building a heap. The first is to start with an empty heap and to repeatedly insert elements into it one at a time. Each individual insertion takes time O(log n), so we can upper-bound the cost of this style of heap-building at O(n log n). It turns out that, in the worst case, the runtime is Θ(n log n), which happens if you insert the elements in reverse-sorted order.
The other approach is the heapify algorithm, which builds the heap directly by starting with each element in its own binary heap and progressively coalescing them together. This algorithm runs in time O(n) regardless of the input.
The reason why the first algorithm requires time Θ(n log n) is that, if you look at the second half of the elements being inserted, you'll see that each of them is inserted into a heap whose height is Θ(log n), so the cost of doing each bubble-up can be high. Since there are n / 2 elements and each of them might take time Θ(log n) to insert, the worst-case runtime is Θ(n log n).
On the other hand, the heapify algorithm spends the majority of its time working on small heaps. Half the elements are inserted into heaps of height 0, a quarter into heaps of height 1, an eighth into heaps of height 2, etc. This means that the bulk of the work is spent inserting elements into small heaps, which is significantly faster.
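As a minimal sketch of the two build strategies (using Python's heapq, which is a min-heap; the contrast in growth rates is the same for max-heaps):

    import heapq

    def build_by_insertion(values):
        # Repeated insertion: n pushes at O(log n) each -> O(n log n) worst case.
        heap = []
        for v in values:
            heapq.heappush(heap, v)
        return heap

    def build_by_heapify(values):
        # Floyd's heapify: sift-down from the last parent -> O(n) total.
        heap = list(values)
        heapq.heapify(heap)
        return heap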
If you consider swapping to be your basic operation -
In top-down construction, the tree is constructed first and a heapify function is called on the nodes. In the worst case, it would swap log n times (to sift the element to the top of the tree, where the height of the tree is log n) for each of the n/2 leaf nodes. This results in an O(n log n) upper bound.
In bottom-up construction, you assume all the leaf nodes are already in order in the first pass, so heapify is now called only on the n/2 internal nodes. At each level, the number of possible swaps increases, but the number of nodes on which it happens decreases.
For example -
At the level right above the leaf nodes, we have n/4 nodes that can have at most 1 swap each.
At its parent level, we have n/8 nodes that can have at most 2 swaps each, and so on.
Summing over all levels (written out below), we come up with O(n) efficiency for bottom-up construction of a heap.
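Writing that sum out, with h denoting a node's height above the leaves (about n/2^(h+1) nodes at height h, each needing at most h swaps):

    sum over h = 1 .. log n of  (n / 2^(h+1)) * h
        <= (n/2) * sum over h >= 1 of  h / 2^h
        =  (n/2) * 2
        =  n

so the total number of swaps is at most about n, i.e. O(n).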
It generally refers to a way of solving a problem, especially in computer science algorithms.
Top down:
Take the whole problem and split it into two or more parts.
Find solutions to these parts.
If these parts turn out to be too big to be solved as a whole, split them further and find solutions to those sub-parts.
Merge the solutions according to the sub-problem hierarchy thus created, once all parts have been successfully solved.
Bottom up:
Break the problem into the smallest possible (and practical) parts.
Find solutions to these small sub-problems.
Merge the solutions you get iteratively (again and again) until you have merged all of them into the final solution to the "big" problem.
The main difference in approach is splitting versus merging. You either start big and split "down" as required, or start with the smallest pieces and merge your way "up" to the final solution.
In the regular heapify(), we perform two comparisons on each node from top to bottom to find the largest of three elements:
the parent node with the left child;
the larger node from the first comparison with the second child.
Bottom-up heapsort, on the other hand, only compares the two children and follows the larger child to the end of the tree ("top-down"). From there, the algorithm goes back towards the tree root ("bottom-up") and searches for the first element larger than the root. From this position, all elements are moved one position towards the root, and the root element is placed in the slot that has become free.
Binary Heap can be built in two ways:
Top-Down Approach
Bottom-Up Approach
In the Top-Down Approach, you first begin with 3 elements. You consider 2 of them as heaps and the third as a key k. You then create a new heap by joining these two sub-heaps with the key as the root node. Then you perform heapify to maintain the heap order (either min- or max-heap order).
Then we take two such heaps (containing 3 elements each) and another element as the key k, and create a new heap. We keep repeating this process, increasing the size of each sub-heap, until all elements are added.
This process adds half the elements at the bottom level, a quarter at the second-to-last level, an eighth at the third-to-last level, and so on; therefore, the complexity of this approach works out to roughly O(n).
In the bottom-up approach, we first simply create a complete binary tree from the given elements. We then apply the DownHeap operation to each parent of the tree, starting from the last parent and going up the tree to the root. This is a much simpler approach. Since DownHeap's worst case is O(log n) and we apply it to n/2 nodes of the tree, a simple bound for this method is O(n log n); a more careful sum over node heights, as in the other answers, shows the total work is actually O(n).

Min-Heap to Max-Heap, Comparison

I want to find the maximum number of comparisons needed when converting a min-heap with n nodes into a max-heap. I think a min-heap can be converted into a max-heap in O(n); does that mean there is no shortcut, and we simply have to re-create the heap?
As a crude lower bound: given a tree with the (min- or max-) heap property, we have no prior idea of how the values at the leaves compare to one another. In a max-heap, the values at the leaves may all be less than all of the values at the interior nodes. If the heap has the topology of a complete binary tree, then even finding the min requires at least roughly n/2 comparisons, where n is the number of tree nodes.
If you have a min-heap of known size, then you can create a binary max-heap of its elements by filling an array from back to front with the values obtained by repeatedly deleting the root node from the min-heap until it is exhausted. Under some circumstances this can even be done in place. Using the rule that the root node is element 0 and the children of node i are elements 2i+1 and 2i+2, the (max-) heap condition will automatically be satisfied for the heap represented by the new array.
Each deletion from a min-heap of size m requires up to about log(m) element comparisons to restore the heap condition, however. I think that adds up to O(n log n) comparisons for the whole job. I am doubtful that you can do it with any lower complexity without adding conditions. In particular, if you do not perform genuine heap deletions (incurring the cost of restoring the heap condition), then I think you incur comparable additional costs to ensure that you end up with a heap at the end.
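A minimal sketch of that back-to-front idea (using Python's heapq as the min-heap; the resulting array is in descending order, which automatically satisfies the max-heap condition):

    import heapq

    def min_heap_to_max_heap(min_heap):
        # Pop the minimum repeatedly and fill a new array from back to front.
        h = list(min_heap)              # assumed to already satisfy the min-heap condition
        out = [0] * len(h)
        for i in range(len(out) - 1, -1, -1):
            out[i] = heapq.heappop(h)   # each pop costs O(log m) comparisons
        return out                      # descending order -> a valid max-heap

    print(min_heap_to_max_heap([1, 2, 3, 5, 4, 7, 6]))  # [7, 6, 5, 4, 3, 2, 1]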

Median of large amount of numbers for each sets of given size

The question is:
find the median of a large data set (n numbers) over each window of a fixed size (k numbers)
What I thought is:
maintain 2 heaps: a max-heap for the numbers less than the current median and a min-heap for the numbers greater than the current median.
The main concept is to FIND the first element of the previous window in one of the heaps (depending on whether it is < or > the current median) and replace it with the new element we encounter.
Now rebalance so that |size(heap1) - size(heap2)| = 1 or 0, because the median is the average of the two top elements if size1 == size2, and otherwise the top element of the heap whose size is greater than the other's.
The problem I am facing is that the time complexity increases: finding the outgoing element in a heap takes O(k) time, so the total is O(n*k), and I am not able to achieve the desired complexity of O(n*log k) (as was required in the source of the question).
How should it be reduced, without using extra space?
edit: input: 1 4 3 5 6 2, k = 4
medians:
from 1 4 3 5 = (4+3)/2
from 4 3 5 6 = (4+5)/2
from 3 5 6 2 = (3+5)/2
You can solve this problem using an order-statistic tree, which is a BST with some additional information that allows finding medians, quantiles and other order statistics in O(log n) time in a tree with n elements.
First, construct an OST with the first k elements. Then, in a loop:
Find and report the median value.
Remove the first element that was inserted into the tree (you can find out which element this was in the array).
Insert the next element from the array.
Each of these steps takes O(log k) if the tree is self-balancing, because we maintain the invariant that the tree never grows beyond size k, which also gives O(k) auxiliary space. The preprocessing takes O(k log k) time while the loop repeats n + 1 - k times for a total time of O(n log k).
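As an illustration only (not a true order-statistic tree, but a structure with the same style of O(log k) operations), here is a sketch using the third-party sortedcontainers.SortedList in Python; the library choice and names are assumptions, not part of the answer above.

    from sortedcontainers import SortedList   # pip install sortedcontainers

    def sliding_medians(a, k):
        # Median of every window of size k, in O(n log k) overall.
        window = SortedList(a[:k])
        out = []
        for i in range(k, len(a) + 1):
            # Report the median of the current window.
            if k % 2:
                out.append(window[k // 2])
            else:
                out.append((window[k // 2 - 1] + window[k // 2]) / 2)
            if i < len(a):
                window.remove(a[i - k])    # drop the element leaving the window
                window.add(a[i])           # insert the new element
        return out

    print(sliding_medians([1, 4, 3, 5, 6, 2], 4))   # [3.5, 4.5, 4.0]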
If you can find a balanced tree implementation that gives you efficient access to the central element, you should probably use it. You could also do this with heaps, much as you suggest, as long as you keep an extra array of length k that tells you where each element of the window lives in its heap, and which heap it is in. You will have to modify the code that maintains the heaps so that it updates this array when it moves things around, but heap code is a lot easier to write and a lot smaller than balanced-tree code. Then you don't need to search through the whole heap to remove the item that has just gone off the edge of the window, and the cost comes down to O(n log k).
This problem is similar to the one you face in an efficient implementation of Dijkstra's shortest path, where you need to delete (update, in Dijkstra's case) an element that is not at the top of the heap. You can use the same workaround for this problem, but with extra space. You cannot use a built-in heap library; create your own heap data structure, but maintain pointers to each element in the heap, and update these pointers whenever you add or remove an element. After calculating the median of the first k elements, delete the outgoing element directly from the appropriate heap (min or max, according to whether it is greater or less than the median) using the pointers, and then heapify at that position. The heap sizes then change, and you can get the new median using the same logic you already use for adjusting the heap sizes and calculating the median.
Heapify takes O(log k), hence your total cost will be O(n*log k), but you will need O(n) more space for the pointers.

Algorithm for merging two max heaps?

Is there an efficient algorithm for merging 2 max-heaps that are stored as arrays?
It depends on the type of heap.
If it's a standard binary heap, where every node has up to two children and the tree is filled so that the leaves lie on at most two different rows, you cannot do better than O(n) for a merge.
Just put the two arrays together and build a new heap out of the result, which takes O(n).
For better merging performance, you could use another heap variant like a Fibonacci-Heap which can merge in O(1) amortized.
Update:
Note that it is worse to insert all elements of the first heap one by one into the second heap (or vice versa), since each insertion takes O(log n), for O(n log n) in total.
As your comment states, you don't seem to know how the heap is optimally built in the first place (again, for a standard binary heap):
Create an array and put the elements of both heaps into it in some arbitrary order.
Now start at the lowest level. The lowest level contains trivial max-heaps of size 1, so this level is done.
Move a level up. Whenever the heap condition of one of the "sub-heaps" is violated, swap the root of that sub-heap with its larger child. Afterwards, level 2 is done.
Move to level 3. Whenever the heap condition is violated, proceed as before: swap the node down with its larger child, repeating recursively, until everything satisfies the heap condition up to level 3.
...
When you reach the top, you have created a new heap in O(n).
I omit a proof here, but intuitively this works because most of the work is done on the bottom levels, in small "sub-heaps" where little content has to be swapped to re-establish the heap condition. This is much better than inserting every element into one of the heaps one by one, where every insertion operates on the whole heap and takes O(log n), for O(n log n) in total.
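A minimal sketch of that O(n) merge for array-backed max-heaps (the function names are just illustrative):

    def sift_down(a, i, n):
        # Push a[i] down until both of its children are <= it (max-heap, 0-indexed).
        while True:
            largest = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < n and a[left] > a[largest]:
                largest = left
            if right < n and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    def merge_max_heaps(h1, h2):
        # Concatenate, then heapify bottom-up (Floyd's method) -> O(n) total.
        a = h1 + h2
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):   # from the last parent down to the root
            sift_down(a, i, n)
        return a

    print(merge_max_heaps([8, 6, 7], [9, 4, 5]))   # [9, 8, 7, 6, 4, 5]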
Update 2: A binomial heap allows merging in O(log(n)) and would conform to your O(log(n)^2) requirement.
Two binary heaps of sizes n and k can be merged in O(log n * log k) comparisons. See
Jörg-R. Sack and Thomas Strothotte, An algorithm for merging heaps, Acta Informatica 22 (1985), 172-186.
I think what you're looking for in this case is a Binomial Heap.
A binomial heap is a collection of binomial trees and a member of the mergeable-heap family. The worst-case running time for a union (merge) of two or more binomial heaps with n total items is O(lg n).
See http://en.wikipedia.org/wiki/Binomial_heap for more information.
