proving level of median in heap - algorithm

I need to prove that the median of binary heap (doesn't matter if it is a min heap or max heap) can be in the lowest level of the heap (in the leaf). I am not sure how to prove it. I thought about using the fact that a heap is a complete binary tree but I am not sure about it. How can I prove it?

As #Evg mentioned in the comments, if all elements are the same, this is trivially true. Assume that all elements need to be different, and let us focus on the case with an odd amount of nodes 2H+1 and a min heap (the max heap case is similar). To create the min heap where the median is at the bottom, first insert the smallest H elements.
There are two cases. Case 1; after doing this the binary tree formed by these H elements is completely filled (every layer is filled) then you can just insert the remaining H+1 elements on the last layer (which you can do since the maximum capacity of the last layer equals (#total_nodes+1)/2 which is precisely H+1).
Case 2 The last layer still has some unfilled spaces. In this case, take the smallest remaining nodes from the largest H elements until this layer is filled (note that there will be no upward movement in your heap since these elements are already larger than whatever is in the tree). Then start the next layer by inserting the median. Finally insert the remaining nodes, which won't be moved upwards either since they are larger than whatever is in the layer above, by construction. By the same argument about the capacity of the last layer, you will not need to start a new layer during this process.
In the case where there are an even amount of nodes 2H, you can argue similarly, but you would have to define the median as H+1 smallest node (otherwise the statement you want to prove is false, as you can see by noticing that the only possible min-heap for the set {1,2} is the tree with root at 1 and leaf at 2).

Easiest way to prove it is just to make one:
1
2 3
4 5 6 7
Any complete heap with nodes in level order will have the median at the left-most leaf, but you don't have to prove that.

Related

How many comparisons a call to removeMin() will make in max heap of 7-ary tree?

Assume that a max heap with 10^6 elements is stored in a complete 7-ary tree. Approximately how many comparisons a call to removeMin() will make?
5000
50
10^6
500
5
My solution: The number of comparisons should be equal to the number of leaf nodes at most because in max heap, the min. can be found at any of the leaf nodes which is not in the above options. Better approach was to take the square of ( log of 10^6 to the base 7) which gives 50 but this is only when we are sure that the minimum element will follow a single branch across tree which in the case of max heap is not correct.
I hope that you can help.
There's no "natural" way to remove the minimum value from a max heap. You simply have to look at all the leaf nodes to figure out which one happens to be the minimum.
The question then is how many leaf nodes there are. Intuitively, we'd expect the fraction of nodes in the heap that are leaves to be pretty close to the total number of nodes. Take it to the limit - if you have a 1,000,000-ary heap, you'd have one node in the top layer and all remaining 999,999 elements in the next layer. Even in the smallest case where the heap is a binary heap, you'd expect roughly half the elements to be in the bottom layer.
More specifically, let's do some math! How many leaves will a 7-ary heap with n nodes have? Well, each node in the tree will either
be a leaf, or
have seven children,
with one possible exception that, since the bottommost row might not be full, there might be one node with fewer than seven children. Since that's just a one-off, we can ignore that last node when we're dealing with millions of elements. A quick proof by induction can be used to show that any tree where each node either has no children or seven children will have seven times as many leaf nodes as internal nodes (prove this!), so we'd expect than (7/8)ths of the nodes will be leaves, for a total of 875,000 leaves to check.
As a result, the best answer here would be roughly 106 comparisons.
Min element can be any of the leaves of a max heap or any type, and there's no order there. All elements from A[10^6/7 + 1] onwards (where A is the array storing the leaves) are leaf nodes and need to be checked. This means 8571412 comparisons just to find the minimum. After that there is no simple way to 'remove' the minimum without introducing a gap that cannot be filled by simply shifting the leaves.
This is a misprint. Maybe the teacher wanted to ask removeMax, for which the answer is close to 50 -- see below:
There are 7 comparisons per level done by the heapify since each node has 7 children. If h is the height of the heap then that's 7*h comparisons.
Rough analysis: (here ~ means approximately)
h ~ log_7(10^6) = 7.1, hence total comparisons 7*7.1 ~ 50
More accurate analysis:
A 7-ary heap would have elements: 1 + 7 + 7^2 + ... + 7^h = 10^6
On the left side is a geometric series, that sums to: (7^h -1)/6 = 10^6
=> 7^h = 6*10^6 + 1
=> h = lg_7(6*10^6 + 1) = 8 (approximately) , hence 7*8 = 56, still from the options 50 is the closest.
*A is array to sort heap.

Why does it require no more than 1 + logN compares when inserting a new node into a heap?

I think when inserting a new node into a heap, the amount of nodes it might passes by is logN, why is it (1 + logN), where is 1 from?
This is necessary to account for the border case when the number of notes is 2n. A heap of n levels fits 2n-1 objects, so adding one more object starts the new level:
Black squares represent seven elements of a three-level heap. Red element is number eight. If your search takes you to the location of this last element, you end up with four comparisons, even though log28 is three.

Intitutive idea behind the build heap function

I want to know about the following statement in build heap function
for i=A.length/2 downto 1
As this step was deduced by hit & trial to find out the parent of leaves or there was something else in the mind of the person who developed this algorithmBelow is build heap function-
Build_Max_Heap(A)
A.heap_size=A.length
for i=A.length/2 downto 1
Max_Heapify(A,i)
The nodes of the second half of the array are leaves(explanation follows in next paragraph) and are thus trivially 1-node max heaps already, and hence Max_Heapify need not be done for them.
Even if you call Max_Heapify for those nodes, no harm in terms of time complexity as the leaf nodes have no nodes below and Max_Heapify would return immediately anyway.
Mathematically it's easy to establish that the second half of the array are leaves by the way summation of a Geometric Progression works. Recall that a heap is a complete binary tree, which means it will have all nodes filled in each level, except possibly the last level, in which it will be filled partially from left to right. For sake of simplicity, let's assume the number of nodes in our heap as 2^N. Clearly this tree has N levels with all levels filled.
First level has 2^0 node
Second level has 2^1 nodes
Third level has 2^2 nodes
N-1 level has 2^N-2 nodes
Nth level has 2^N-1 nodes
Sum of the count of nodes in all levels except last = 2^0 + 2^1...+ 2^N-2
= 2^N-1 - 1
This is the number of nodes in the last level off by one. That is, the the total number of nodes in all levels except the last is almost same as the number of nodes in just the last level, which directly implies that the last level must have half the total number of all nodes in the heap in it. By this observation, we get A.length/2.

Why does a Binary Heap has to be a Complete Binary Tree?

The heap property says:
If A is a parent node of B then the key of node A is ordered with
respect to the key of node B with the same ordering applying across
the heap. Either the keys of parent nodes are always greater than or
equal to those of the children and the highest key is in the root node
(this kind of heap is called max heap) or the keys of parent nodes are
less than or equal to those of the children and the lowest key is in
the root node (min heap).
But why in this wiki, the Binary Heap has to be a Complete Binary Tree? The Heap Property doesn't imply that in my impression.
According to the wikipedia article you provided, a binary heap must conform to both the heap property (as you discussed) and the shape property (which mandates that it is a complete binary tree). Without the shape property, one would lose the runtime advantage that the data structure provides (i.e. the completeness ensures that there is a well defined way to determine the new root when an element is removed, etc.)
Every item in the array has a position in the binary tree, and this position is calculated from the array index. The positioning formula ensures that the tree is 'tightly packed'.
For example, this binary tree here:
is represented by the array
[1, 2, 3, 17, 19, 36, 7, 25, 100].
Notice that the array is ordered as if you're starting at the top of the tree, then reading each row from left-to-right.
If you add another item to this array, it will represent the slot below the 19 and to the right of the 100. If this new number is less than 19, then values will have to be swapped around, but nonetheless, that is the slot that will be filled by the 10th item of the array.
Another way to look at it: try constructing a binary heap which isn't a complete binary tree. You literally cannot.
You can only guarantee O(log(n)) insertion and (root) deletion if the tree is complete. Here's why:
If the tree is not complete, then it may be unbalanced and in the worst case, simply a linked list, requiring O(n) to find a leaf, and O(n) for insertion and deletion. With the shape requirement of completeness, you are guaranteed O(log(n)) operations since it takes constant time to find a leaf (last in array), and you are guaranteed that the tree is no deeper than log2(N), meaning the "bubble up" (used in insertion) and "sink down" (used in deletion) will require at most log2(N) modifications (swaps) of data in the heap.
This being said, you don't absolutely have to have a complete binary tree, but you just loose these runtime guarantees. In addition, as others have mentioned, having a complete binary tree makes it easy to store the tree in array format forgoing object reference representation.
The point that 'complete' makes is that in a heap all interior (not leaf) nodes have two children, except where there are no children left -- all the interior nodes are 'complete'. As you add to the heap, the lowest level of nodes is filled (with childless leaf nodes), from the left, before a new level is started. As you remove nodes from the heap, the right-most leaf at the lowest level is removed (and pushed back in at the top). The heap is also perfectly balanced (hurrah!).
A binary heap can be looked at as a binary tree, but the nodes do not have child pointers, and insertion (push) and deletion (pop or from inside the heap) are quite different to those procedures for an actual binary tree.
This is a direct consequence of the way in which the heap is organised. The heap is held as a vector with no gaps between the nodes. The parent of the i'th item in the heap is item (i - 1) / 2 (assuming a binary heap, and assuming the top of the heap is item 0). The left child of the i'th item is (i * 2) + 1, and the right child one greater than that. When there are n nodes in the heap, a node has no left child if (i * 2) + 1 exceeds n, and no right child if (i * 2) + 2 does.
The heap is a beautiful thing. It's one flaw is that you do need a vector large enough for all entries... unlike a real binary tree, you cannot allocate a node at a time. So if you have a heap for an indefinite number of items, you have to be ready to extend the underlying vector as and when needed -- or run some fragmented structure which can be addressed as if it was a vector.
FWIW: when stepping down the heap, I find it convenient to step to the right child -- (i + 1) * 2 -- if that is < n then both children are present, if it is == n only the left child is present, otherwise there are no children.
By maintaining binary heap as a complete binary gives multiple advantages such as
1.heap is complete binary tree so height of heap is minimum possible i.e log(size of tree). And insertion, build heap operation depends on height. So if height is minimum then their time complexity will be reduced.
2.All the items of complete binary tree stored in contiguous manner in array so random access is possible and it also provide cache friendliness.
In order for a Binary Tree to be considered a heap two it must meet two criteria. 1) It must have the heap property. 2) it must be a complete tree.
It is possible for a structure to have either of these properties and not have the other, but we would not call such a data structure a heap. You are right that the heap property does not entail the shape property. They are separate constraints.
The underlying structure of a heap is an array where every node is an index in an array so if the tree is not complete that means that one of the index is kept empty which is not possible beause it is coded in such a way that each node is an index .I have given a link below so that u can see how the heap structure is built
http://www.sanfoundry.com/java-program-implement-min-heap/
Hope it helps
I find that all answers so far either do not address the question or are, essentially, saying "because the definition says so" or use a similar circular argument. They are surely true but (to me) not very informative.
To me it became immediately obvious that the heap must be a complete tree when I remembered that you insert a new element not at the root (as you do in a binary search tree) but, rather, at the bottom right.
Thus, in a heap, a new element propagates from the bottom up - it is "moved up" within the tree till it finds a suitable place.
In a binary search tree a newly inserted element moves the other way round - it is inserted at the root and it "moves down" till it finds its place.
The fact that each new element in a heap starts as the bottom right node means that the heap is going to be a complete tree at all times.

Finding the smallest element of a binary max heap stored as Ahnentafel array

I have a binary max heap (largest element at the top), and I need to keep it of constant size (say 20 elements) by getting rid of the smallest element each time I get to 20 elements. The binary heap is stored in an array, with children of node i at 2*i and 2*i+1 (i is zero based). At any point, the heap has 'n_elements' elements, between 0 and 20. For example, the array [16,14,10,8,7,9,3,2,4] would be a valid max binary heap, with 16 having children 14 and 10, 14 having children 8 and 7 ...
To find the smallest element, it seems that in general I have to traverse the array from n_elements/2 to n_elements: the smallest element is not necessarily the last one in the array.
So, with only that array, it seems any attempt at finding/removing the smallest elt is at least O(n). Is that correct?
For any given valid Max Heap the minimum will be at the leaf nodes only. The next question is how to find the leaf nodes of the heap in the array ?. If we carefully observe the last node of the array it will be the last leaf node. Get the parent of the leaf node by the formula
parent node index = (leaf Node Index)/2
Start linear search from the index (parent node index +1) to last leaf node index get the minimum value in that range.
FindMinInMaxHeap(Heap heap)
startIndex = heap->Array[heap->lastIndex/2]
if startIndex == 0
return heap->Array[startIndex]
Minimum = heap->Array[startIndex + 1]
for count from startIndex+2 to heap->lastIndex
if(heap->Array[count] < Minimum)
Minimum := heap->Array[count]
print Minimum
There isn't any way I can think of by which you can get better that O(n) performance for finding and removing the smallest element from a max heap by using the heap alone. One approach that you can take is:
If you are creating this heap data structure yourself, you can keep a separate pointer to the location of the smallest element in the array. So whenever a new element is added to the heap, check if the new element is smaller. If yes, update the pointer etc. Then finding the smallest element would be O(1).
MBo raises a good point in the comment about how to get the next smallest element after each removal. You'll still need to do the O(n) thing to find the next smallest element after each removal. So removal would still be O(n). But finding the smallest element would be O(1)
If you need faster removal as well, you'll need to also maintain a min-heap of all the elements. In that case, removal would be O(log(n)). Insertion will take 2x time because you have to insert into two heaps and it will also take 2x space.
By the way, if you have only 20 elements at any point of time, this is not really going to matter much (unless it is a homework problem or you are just doing it for fun). It would really matter only if you plan to scale it to thousands of values.
There is minmax heap data structure: http://en.wikipedia.org/wiki/Min-max_heap . Of course, it's code is rather complex, but with two separate heaps we have to use a lot of additional space (for the second heap, for maintaining one-to-one mapping) and to do the job twice.

Resources