Max Heapify Algorithm

I am a bit confused. If I have an array, I have to build a tree. To compare the children I have to know how large my array is; in this case N = 6, so I divide it by 2 and get 3. That means I start at index 3 and compare that node with its children. If the child is greater than the parent node then I swap them, otherwise I don't. Then I go to index 2 and compare with the children; if a child is greater than the parent node, I swap them. Then at index 1 I compare with the children and swap if needed. So I have created a max heap. But now I don't get it: why do I have to exchange A[1] with A[6], then A[1] with A[5]? At the end I don't get a max heap, I get a min heap? What does "heapify" mean?
Thanks a lot, I appreciate every answer!
One of my exercises is to illustrate the steps of Heapsort by filling in the arrays and the tree representations.

There are many implementations of a heap data structure, but this question concerns a specific one: the implicit binary heap. Heapsort works in place, so it uses this design. A binary heap requires a complete binary tree, which can be represented implicitly by the array itself: for every A[n] in a zero-based array,
A[0] is the root; if n != 0, A[floor((n-1)/2)] is the parent;
if 2n+1 is within the range of the array, then A[2n+1] is the left child, otherwise the node is a leaf;
if 2n+2 is within the range of the array, then A[2n+2] is the right child.
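To make these index rules concrete, here is a minimal sketch (Python, zero-based; the helper names are mine):

def parent(n):
    return (n - 1) // 2    # valid only for n != 0; A[0] is the root

def left(n):
    return 2 * n + 1       # a child only if 2n+1 < len(A); otherwise A[n] is a leaf

def right(n):
    return 2 * n + 2       # a child only if 2n+2 < len(A)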
Say one's array is [10,14,19,21,23,31]; by the above rules it is implicitly a binary tree with 10 at the root.
This does not satisfy the max-heap invariant, so one must heapify, typically using Floyd's heap construction, which sifts down and runs in O(n). Now you have a heap and an empty sorted array, ([31,23,19,21,14,10],[]). (This is all implicit; the heap takes no extra memory, it is just an array in memory.)
We pop off the maximum element of the heap, swap the last element into its place, and sift down to restore the heap shape. Now the heap is one smaller, and we have taken the maximum element and unshifted it onto the front of our sorted array: ([23,21,19,10,14],[31]),
repeat, ([21,14,19,10],[23,31]),
([19,14,10],[21,23,31]),
([14,10],[19,21,23,31]),
([10],[14,19,21,23,31]),
The heap size is one, so one's final sorted array is [10,14,19,21,23,31]. If one used a min-heap and the same algorithm, then the array would be sorted the other way.
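A minimal sketch of the whole process described above (Python; the names sift_down and heapsort are mine):

def sift_down(a, i, n):
    # restore the max-heap property for a[0:n], starting at node i
    while True:
        largest = i
        if 2 * i + 1 < n and a[2 * i + 1] > a[largest]:
            largest = 2 * i + 1
        if 2 * i + 2 < n and a[2 * i + 2] > a[largest]:
            largest = 2 * i + 2
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapsort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # Floyd's O(n) heap construction
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]        # pop the max into the sorted suffix
        sift_down(a, 0, end)               # re-heapify the shrunken heap

a = [10, 14, 19, 21, 23, 31]
heapsort(a)    # the array passes through exactly the (heap, sorted) states shown above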

Heap sort is a two-phase process. In the first phase, you turn the array into a heap with the maximum value at the top, A[1]. This is the first transition (circled in red in the exercise's figure). After this phase, the heap occupies the array from index 1 to 6, and the biggest value is at index 1, in A[1].
In the second phase we sort the values. This is a multistep process in which we extract the biggest value from the heap and put it in its place in the sorted array.
The heap occupies the left part of the array and shrinks toward the left. The sorted array occupies the right end of the array and grows to the left.
At each step we swap the top of the heap, A[1], which contains the biggest value of the heap, with the last value of the heap. The sorted array has then grown by one position to the left. Since the value now in A[1] is not the biggest, we must restore the heap. This operation is called max-heapify. After it, A[1] again contains the biggest value in the heap, whose size has been reduced by one element.
By repeatedly extracting the biggest value left in the heap, we sort the values in the array.
The drawing of the binary tree is very confusing: its size should shrink at each step, because the size of the heap shrinks.

Related

Heap Sort - max heap

I have a list of numbers, [17,98,89,42,67,54,89,25,38], which is to be inserted into an empty heap from left to right. What will be the resulting heap?
Although this isn't the heap sort algorithm (which sorts a data set using a heap), building a heap does require some comparison work to ensure it remains a heap (you want a max heap, so all children are less than their parent). The way to build a max heap is to put the newest element in the next available spot (the left-most unfilled spot in the deepest row, or the left-most spot of a new row). Once an element is inserted, it is swapped with its parent as long as it is bigger than its parent, until it either becomes the root of the tree or finds a parent bigger than itself. This is done until all elements are inserted, and the result does indeed have the max element as the root.
It is important to note that for a min heap the minimum element is the root, so instead of all parents being bigger than their children, all children are bigger than their parents. In building a min heap, still add new vertices in the same spot, but swap the new child with its parent if the child is less than the parent, instead of greater.
Two images were attached with the resulting max and min heaps; 89a corresponds to the first 89 and 89b to the second, for clarification purposes (images: Max Heap, Min Heap).
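A minimal sketch of this insertion-based construction (Python; sift_up is my name for the swap-with-parent step):

def sift_up(heap, i):
    # swap the new element with its parent until the max-heap property holds
    while i > 0 and heap[i] > heap[(i - 1) // 2]:
        heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
        i = (i - 1) // 2

def build_max_heap(values):
    heap = []
    for v in values:
        heap.append(v)                 # next available spot: end of the array
        sift_up(heap, len(heap) - 1)
    return heap

print(build_max_heap([17, 98, 89, 42, 67, 54, 89, 25, 38]))
# -> [98, 67, 89, 38, 42, 54, 89, 17, 25] (ties such as the two 89s stay
#    put here because only a strictly greater child moves up)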

Why does a Binary Heap have to be a Complete Binary Tree?

The heap property says:
If A is a parent node of B then the key of node A is ordered with
respect to the key of node B with the same ordering applying across
the heap. Either the keys of parent nodes are always greater than or
equal to those of the children and the highest key is in the root node
(this kind of heap is called max heap) or the keys of parent nodes are
less than or equal to those of the children and the lowest key is in
the root node (min heap).
But why, in this wiki, does the Binary Heap have to be a Complete Binary Tree? In my impression, the Heap Property doesn't imply that.
According to the wikipedia article you provided, a binary heap must conform to both the heap property (as you discussed) and the shape property (which mandates that it is a complete binary tree). Without the shape property, one would lose the runtime advantage that the data structure provides (i.e. the completeness ensures that there is a well defined way to determine the new root when an element is removed, etc.)
Every item in the array has a position in the binary tree, and this position is calculated from the array index. The positioning formula ensures that the tree is 'tightly packed'.
For example, this binary tree here:

            1
          /   \
         2     3
        / \   / \
      17  19 36  7
     /  \
   25   100

is represented by the array
[1, 2, 3, 17, 19, 36, 7, 25, 100].
Notice that the array is ordered as if you're starting at the top of the tree, then reading each row from left-to-right.
If you add another item to this array, it will represent the slot below the 19 and to the right of the 100. If this new number is less than 19, then values will have to be swapped around, but nonetheless, that is the slot that will be filled by the 10th item of the array.
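A quick way to see this row ordering (Python; row r of the implicit tree occupies indices 2**r - 1 through 2**(r+1) - 2):

def rows(a):
    r = 0
    while 2 ** r - 1 < len(a):
        print(a[2 ** r - 1 : 2 ** (r + 1) - 1])
        r += 1

rows([1, 2, 3, 17, 19, 36, 7, 25, 100])
# prints [1], then [2, 3], then [17, 19, 36, 7], then [25, 100]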
Another way to look at it: try constructing a binary heap which isn't a complete binary tree. You literally cannot.
You can only guarantee O(log(n)) insertion and (root) deletion if the tree is complete. Here's why:
If the tree is not complete, then it may be unbalanced and in the worst case, simply a linked list, requiring O(n) to find a leaf, and O(n) for insertion and deletion. With the shape requirement of completeness, you are guaranteed O(log(n)) operations since it takes constant time to find a leaf (last in array), and you are guaranteed that the tree is no deeper than log2(N), meaning the "bubble up" (used in insertion) and "sink down" (used in deletion) will require at most log2(N) modifications (swaps) of data in the heap.
This being said, you don't absolutely have to have a complete binary tree, but you then lose these runtime guarantees. In addition, as others have mentioned, having a complete binary tree makes it easy to store the tree in array format, forgoing an object-reference representation.
The point that 'complete' makes is that in a heap all interior (not leaf) nodes have two children, except where there are no children left -- all the interior nodes are 'complete'. As you add to the heap, the lowest level of nodes is filled (with childless leaf nodes), from the left, before a new level is started. As you remove nodes from the heap, the right-most leaf at the lowest level is removed (and pushed back in at the top). The heap is also perfectly balanced (hurrah!).
A binary heap can be looked at as a binary tree, but the nodes do not have child pointers, and insertion (push) and deletion (pop or from inside the heap) are quite different to those procedures for an actual binary tree.
This is a direct consequence of the way in which the heap is organised. The heap is held as a vector with no gaps between the nodes. The parent of the i'th item in the heap is item (i - 1) / 2 (assuming a binary heap, and assuming the top of the heap is item 0). The left child of the i'th item is (i * 2) + 1, and the right child one greater than that. When there are n nodes in the heap, a node has no left child if (i * 2) + 1 is not less than n, and no right child if (i * 2) + 2 is not.
The heap is a beautiful thing. Its one flaw is that you do need a vector large enough for all entries... unlike a real binary tree, you cannot allocate a node at a time. So if you have a heap for an indefinite number of items, you have to be ready to extend the underlying vector as and when needed -- or run some fragmented structure which can be addressed as if it were a vector.
FWIW: when stepping down the heap, I find it convenient to step to the right child -- (i + 1) * 2 -- if that is < n then both children are present, if it is == n only the left child is present, otherwise there are no children.
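A sketch of that stepping convention (Python; children is a hypothetical helper, zero-based with the top of the heap at item 0):

def children(a, i):
    n = len(a)
    r = (i + 1) * 2            # index of the right child
    if r < n:
        return a[r - 1], a[r]  # both children are present
    if r == n:
        return (a[r - 1],)     # only the left child is present
    return ()                  # no children: node i is a leaf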
Maintaining the binary heap as a complete binary tree gives multiple advantages, such as:
1. A complete binary tree has the minimum possible height, i.e. log(size of tree), and the insertion and build-heap operations depend on the height, so their time complexity is reduced.
2. All the items of a complete binary tree are stored contiguously in an array, so random access is possible and it also provides cache friendliness.
In order for a binary tree to be considered a heap, it must meet two criteria: 1) it must have the heap property; 2) it must be a complete tree.
It is possible for a structure to have either of these properties without having the other, but we would not call such a data structure a heap. You are right that the heap property does not entail the shape property. They are separate constraints.
The underlying structure of a heap is an array where every node corresponds to an index in the array, so if the tree were not complete, some index would have to be kept empty, which is not possible because the code maps each node to an index. I have given a link below so you can see how the heap structure is built:
http://www.sanfoundry.com/java-program-implement-min-heap/
Hope it helps.
I find that all answers so far either do not address the question or are, essentially, saying "because the definition says so" or use a similar circular argument. They are surely true but (to me) not very informative.
To me it became immediately obvious that the heap must be a complete tree when I remembered that you insert a new element not at the root (as you do in a binary search tree) but, rather, at the bottom right.
Thus, in a heap, a new element propagates from the bottom up - it is "moved up" within the tree till it finds a suitable place.
In a binary search tree a newly inserted element moves the other way round - it is inserted at the root and it "moves down" till it finds its place.
The fact that each new element in a heap starts as the bottom right node means that the heap is going to be a complete tree at all times.

Data Structure that supports queue like operations and mode finding

This was an interview question asked to me almost 3 years back, and I was pondering it again a while back.
Design a data structure that supports the following operations:
insert_back(), remove_front() and find_mode(). Best complexity
required.
The best solution I could think of was O(logn) for insertion and deletion and O(1) for mode. This is how I solved it: Keep a queue DS for handling which element is inserted and deleted.
Also keep an array which is max heap ordered and a hash table.
The hashtable contains an integer key and an index into the heap array location of that element. The heap array contains an ordered pair (count,element) and is ordered on the count property.
Insertion: Insert the element into the queue. Look up the element's heap-array index in the hashtable. If none exists, add the element to the heap, heapify upwards, and store its final location in the hashtable. Increment the count at that location and heapify upwards or downwards as needed to restore the heap property.
Deletion: Remove the element at the head of the queue. Look up its heap-array index in the hashtable, decrement the count in the heap, and re-heapify upwards or downwards as needed to restore the heap property.
Find Mode: The element at the top of the heap array (getMax()) gives us the mode.
Can someone please suggest something better. The only optimization I could think of was using a Fibonacci heap but I am not sure if that is a good fit in this problem.
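For concreteness, a sketch of the design above (Python; the class and method names are mine; the heap stores [count, element] pairs ordered on count, and pos is the hashtable from element to heap-array index):

from collections import deque

class ModeHeap:
    def __init__(self):
        self.heap = []    # [count, element] pairs, max-heap ordered on count
        self.pos = {}     # element -> index into self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def increment(self, x):    # O(log n)
        if x not in self.pos:
            self.pos[x] = len(self.heap)
            self.heap.append([0, x])
        i = self.pos[x]
        self.heap[i][0] += 1
        while i > 0 and self.heap[i][0] > self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)    # heapify upwards
            i = (i - 1) // 2

    def decrement(self, x):    # O(log n)
        i = self.pos[x]
        self.heap[i][0] -= 1
        while True:                        # heapify downwards
            big = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(self.heap) and self.heap[c][0] > self.heap[big][0]:
                    big = c
            if big == i:
                break
            self._swap(i, big)
            i = big

    def find_mode(self):       # O(1): the root holds the largest count
        return self.heap[0][1]

q, h = deque(), ModeHeap()
# insert_back(x):  q.append(x); h.increment(x)
# remove_front():  h.decrement(q.popleft())
# find_mode():     h.find_mode()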
I think there is a solution with O(1) for all operations.
You need a deque, and two hashtables.
The first one is a linked hashtable, where for each element you store its count, the next element in count order, and the previous element in count order. You can then look up the next and previous elements' entries in that hashtable in constant time. For this hashtable you also keep and update the element with the largest count. (element -> count, next_element, previous_element)
In the second hashtable for each distinct number of elements, you store the elements with that count in the start and in the end of the range in the first hashtable. Note that the size of this hashtable will be less than n (it's O(sqrt(n)), I think). (count -> (first_element, last_element))
Basically, when you add an element to or remove an element from the deque, you can find its new position in the first hashtable by analyzing its next and previous elements, and the values for the old and new count in the second hashtable in constant time. You can remove and add elements in the first hashtable in constant time, using algorithms for linked lists. You can also update the second hashtable and the element with the maximum count in constant time as well.
I'll try writing pseudocode if needed, but it seems to be quite complex with many special cases.
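Since the answer stops short of pseudocode, here is a hedged sketch of one way to get the O(1) bounds (Python; all names are mine). It is not the exact linked structure described above: instead of threading elements in count order, it keeps a set of elements per count value, relying on the fact that a count only ever changes by 1, so the maximum count also moves by at most 1 per operation:

from collections import deque, defaultdict

class ModeQueue:
    def __init__(self):
        self.q = deque()
        self.count = defaultdict(int)
        self.buckets = defaultdict(set)    # count -> elements with that count
        self.max_count = 0

    def insert_back(self, x):    # O(1)
        self.q.append(x)
        c = self.count[x]
        if c:
            self.buckets[c].discard(x)
        self.count[x] = c + 1
        self.buckets[c + 1].add(x)
        self.max_count = max(self.max_count, c + 1)

    def remove_front(self):      # O(1)
        x = self.q.popleft()
        c = self.count[x]
        self.buckets[c].discard(x)
        self.count[x] = c - 1
        if c > 1:
            self.buckets[c - 1].add(x)
        if self.max_count == c and not self.buckets[c]:
            self.max_count = c - 1    # the mode's count can only drop by 1
        return x

    def find_mode(self):         # O(1); assumes the queue is non-empty
        return next(iter(self.buckets[self.max_count]))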

How worthwhile is this optimization for Heapsort?

The traditional heapsort algorithm swaps the last element of the heap with the root of the current heap after every heapification, then continues the process again. However, I noticed that this is somewhat unnecessary.
After a heapification of a sub-array, while the root contains the highest value (if it's a max-heap), the next 2 elements in the array must follow the root into the sorted array, either in the same order as they are now, or exchanged if they are reverse-sorted. So instead of just swapping the root with the last element, wouldn't it be better to swap the first 3 elements (after exchanging the 2nd and 3rd elements if necessary) with the last 3 elements, so that the 2 subsequent heapifications (for the 2nd and 3rd elements) are dispensed with?
Is there any disadvantage to this method (apart from the if-needed swap of the 2nd and 3rd elements, which should be trivial)? If not, and it is indeed better, how much of a performance boost will it give? Here is the pseudo-code (written out as runnable, zero-based Python):
def heapify(a, i, n):
    # assumes both subtrees of node i are heaps; node i itself might not be,
    # i.e. one of its children may be greater than its value. If so, find the
    # greater of its two children and swap the parent with that child. This
    # might leave that child's subtree no longer a heap, so recurse.
    largest = i
    if 2 * i + 1 < n and a[2 * i + 1] > a[largest]:
        largest = 2 * i + 1
    if 2 * i + 2 < n and a[2 * i + 2] > a[largest]:
        largest = 2 * i + 2
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)

def create_heap(a):
    # all elements past index floor(n/2) are leaf nodes, so heapify() need
    # only be applied to the elements in the first half of the array
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))

def heapsort(a):
    create_heap(a)                 # a is now a max-heap
    for n in range(len(a) - 1, 0, -1):
        # the root a[0] is the maximum, so swap it with a[n]; the root now
        # holds an element smaller than its children (both still heaps),
        # and the heap length has shrunk by one, as a[n] is already placed
        a[0], a[n] = a[n], a[0]
        heapify(a, 0, n)
Suppose the array is [4,1,3,2,16,9,10,14,8,7]. Building the heap turns it into [16,14,10,8,7,9,3,2,4,1]; to keep the walkthrough short, drop the trailing 1 and work with [16,14,10,8,7,9,3,2,4]. Heapsort's first iteration swaps 16 and 4, leading to [4,14,10,8,7,9,3,2,16]. Since this has rendered the root of the new heap [4,14,10,8,7,9,3,2] un-heaped (14 and 10 both being greater than 4), run another heapify to produce [14,8,10,4,7,9,3,2]. Now, 14 being the root, swap it with 2 to yield [2,8,10,4,7,9,3,14], making the array [2,8,10,4,7,9,3,14,16]. Again 2 is un-heaped, so another heapify makes the heap [10,8,9,4,7,2,3]. Then 10 is swapped with 3, making the array [3,8,9,4,7,2,10,14,16]. My point is that instead of doing the 2nd and 3rd heapifications to place 10 and 14 before 16, we can tell from the first heapification that, because 10 and 14 immediately follow 16, they are the 2nd and 3rd largest elements (in one order or the other). So after one comparison between them (in case they are already sorted; here 14 comes before 10), I swap all three of (16,14,10) with (3,2,4), making the array [3,2,4,8,7,9,16,14,10]. This brings us to a condition similar to the one after the two further heapifications: [3,8,9,4,7,2,10,14,16] with the original method, versus [3,2,4,8,7,9,16,14,10] now. Both still need further heapification, but the second method has arrived at this point directly, at the cost of just one comparison between two elements (14 and 10).
The second largest element of the heap is present in the second or third position, but the third largest can be present further down, at depth 2. (See the figure in http://en.wikipedia.org/wiki/Heap_(data_structure) ). Furthermore, after swapping the first three elements with the last three, the heapify method would first heapify the first subtree of the root, followed by the second subtree of the root, followed by the whole tree. Thus the total cost of this operation is close to three times the cost of swapping the top element with the last and calling heapify. So you won't gain anything by doing this.

Finding the smallest element of a binary max heap stored as Ahnentafel array

I have a binary max heap (largest element at the top), and I need to keep it at a constant size (say 20 elements) by getting rid of the smallest element each time I reach 20 elements. The binary heap is stored in an array, with the children of node i at 2*i+1 and 2*i+2 (i is zero-based). At any point, the heap has n_elements elements, between 0 and 20. For example, the array [16,14,10,8,7,9,3,2,4] would be a valid max binary heap, with 16 having children 14 and 10, 14 having children 8 and 7, and so on.
To find the smallest element, it seems that in general I have to traverse the array from n_elements/2 to n_elements: the smallest element is not necessarily the last one in the array.
So, with only that array, it seems any attempt at finding/removing the smallest element is at least O(n). Is that correct?
For any given valid max heap, the minimum will be among the leaf nodes only. The next question is how to find the leaf nodes in the array. The last element of the array is the last leaf node, and its parent sits at index (lastIndex - 1)/2 (zero-based, as in the question), which is the last internal node.
A linear search from the index just past that parent up to the last index yields the minimum of the leaf range.
def find_min_in_max_heap(a):
    # leaves occupy indices floor(n/2) .. n-1 in a zero-based heap array;
    # the minimum of a max heap must be one of them
    n = len(a)
    start = n // 2             # first leaf: one past the last internal node
    minimum = a[start]
    for i in range(start + 1, n):
        if a[i] < minimum:
            minimum = a[i]
    return minimum
There isn't any way I can think of to get better than O(n) performance for finding and removing the smallest element from a max heap using the heap alone. One approach you can take:
If you are creating this heap data structure yourself, you can keep a separate pointer to the location of the smallest element in the array. Whenever a new element is added to the heap, check whether the new element is smaller; if yes, update the pointer, and so on. Finding the smallest element would then be O(1).
MBo raises a good point in the comment about how to get the next smallest element after each removal. You'll still need to do the O(n) thing to find the next smallest element after each removal. So removal would still be O(n). But finding the smallest element would be O(1)
If you need faster removal as well, you'll need to also maintain a min-heap of all the elements. In that case, removal would be O(log(n)). Insertion will take 2x time because you have to insert into two heaps and it will also take 2x space.
By the way, if you have only 20 elements at any point of time, this is not really going to matter much (unless it is a homework problem or you are just doing it for fun). It would really matter only if you plan to scale it to thousands of values.
There is the min-max heap data structure: http://en.wikipedia.org/wiki/Min-max_heap . Its code is rather complex, of course, but with two separate heaps we have to use a lot of additional space (for the second heap and for maintaining a one-to-one mapping between them) and do the job twice.
