Why does a Binary Heap has to be a Complete Binary Tree? - data-structures

The heap property says:
If A is a parent node of B then the key of node A is ordered with
respect to the key of node B with the same ordering applying across
the heap. Either the keys of parent nodes are always greater than or
equal to those of the children and the highest key is in the root node
(this kind of heap is called max heap) or the keys of parent nodes are
less than or equal to those of the children and the lowest key is in
the root node (min heap).
But why in this wiki, the Binary Heap has to be a Complete Binary Tree? The Heap Property doesn't imply that in my impression.

According to the wikipedia article you provided, a binary heap must conform to both the heap property (as you discussed) and the shape property (which mandates that it is a complete binary tree). Without the shape property, one would lose the runtime advantage that the data structure provides (i.e. the completeness ensures that there is a well defined way to determine the new root when an element is removed, etc.)

Every item in the array has a position in the binary tree, and this position is calculated from the array index. The positioning formula ensures that the tree is 'tightly packed'.
For example, this binary tree here:
is represented by the array
[1, 2, 3, 17, 19, 36, 7, 25, 100].
Notice that the array is ordered as if you're starting at the top of the tree, then reading each row from left-to-right.
If you add another item to this array, it will represent the slot below the 19 and to the right of the 100. If this new number is less than 19, then values will have to be swapped around, but nonetheless, that is the slot that will be filled by the 10th item of the array.
Another way to look at it: try constructing a binary heap which isn't a complete binary tree. You literally cannot.

You can only guarantee O(log(n)) insertion and (root) deletion if the tree is complete. Here's why:
If the tree is not complete, then it may be unbalanced and in the worst case, simply a linked list, requiring O(n) to find a leaf, and O(n) for insertion and deletion. With the shape requirement of completeness, you are guaranteed O(log(n)) operations since it takes constant time to find a leaf (last in array), and you are guaranteed that the tree is no deeper than log2(N), meaning the "bubble up" (used in insertion) and "sink down" (used in deletion) will require at most log2(N) modifications (swaps) of data in the heap.
This being said, you don't absolutely have to have a complete binary tree, but you just loose these runtime guarantees. In addition, as others have mentioned, having a complete binary tree makes it easy to store the tree in array format forgoing object reference representation.

The point that 'complete' makes is that in a heap all interior (not leaf) nodes have two children, except where there are no children left -- all the interior nodes are 'complete'. As you add to the heap, the lowest level of nodes is filled (with childless leaf nodes), from the left, before a new level is started. As you remove nodes from the heap, the right-most leaf at the lowest level is removed (and pushed back in at the top). The heap is also perfectly balanced (hurrah!).
A binary heap can be looked at as a binary tree, but the nodes do not have child pointers, and insertion (push) and deletion (pop or from inside the heap) are quite different to those procedures for an actual binary tree.
This is a direct consequence of the way in which the heap is organised. The heap is held as a vector with no gaps between the nodes. The parent of the i'th item in the heap is item (i - 1) / 2 (assuming a binary heap, and assuming the top of the heap is item 0). The left child of the i'th item is (i * 2) + 1, and the right child one greater than that. When there are n nodes in the heap, a node has no left child if (i * 2) + 1 exceeds n, and no right child if (i * 2) + 2 does.
The heap is a beautiful thing. It's one flaw is that you do need a vector large enough for all entries... unlike a real binary tree, you cannot allocate a node at a time. So if you have a heap for an indefinite number of items, you have to be ready to extend the underlying vector as and when needed -- or run some fragmented structure which can be addressed as if it was a vector.
FWIW: when stepping down the heap, I find it convenient to step to the right child -- (i + 1) * 2 -- if that is < n then both children are present, if it is == n only the left child is present, otherwise there are no children.

By maintaining binary heap as a complete binary gives multiple advantages such as
1.heap is complete binary tree so height of heap is minimum possible i.e log(size of tree). And insertion, build heap operation depends on height. So if height is minimum then their time complexity will be reduced.
2.All the items of complete binary tree stored in contiguous manner in array so random access is possible and it also provide cache friendliness.

In order for a Binary Tree to be considered a heap two it must meet two criteria. 1) It must have the heap property. 2) it must be a complete tree.
It is possible for a structure to have either of these properties and not have the other, but we would not call such a data structure a heap. You are right that the heap property does not entail the shape property. They are separate constraints.

The underlying structure of a heap is an array where every node is an index in an array so if the tree is not complete that means that one of the index is kept empty which is not possible beause it is coded in such a way that each node is an index .I have given a link below so that u can see how the heap structure is built
http://www.sanfoundry.com/java-program-implement-min-heap/
Hope it helps

I find that all answers so far either do not address the question or are, essentially, saying "because the definition says so" or use a similar circular argument. They are surely true but (to me) not very informative.
To me it became immediately obvious that the heap must be a complete tree when I remembered that you insert a new element not at the root (as you do in a binary search tree) but, rather, at the bottom right.
Thus, in a heap, a new element propagates from the bottom up - it is "moved up" within the tree till it finds a suitable place.
In a binary search tree a newly inserted element moves the other way round - it is inserted at the root and it "moves down" till it finds its place.
The fact that each new element in a heap starts as the bottom right node means that the heap is going to be a complete tree at all times.

Related

How to get the n-th value of a b-tree

Is there general pseudocode or related data structure to get the nth value of a b-tree? For example, the eighth value of this tree is 13 [1,4,9,9,11,11,12,13].
If I have some values sorted in a b-tree, I would like to find the nth value without having to go through the entire tree. Is there a better structure for this problem? The data order could update anytime.
You are looking for order statistics tree. The idea of it, is in addition to any data stored in nodes - also store the size of the subtree in the node, and keep them updated in insertions and deletions.
Since you are "touching" O(logn) nodes for each insert/delete operation - keeping it up to date still keeps the O(logn) behavior of these.
FindKth() is then done by eliminating subtrees that their bigger index is still smaller than k, and checking the next one. Since you don't need to go to the depth of each subtree, only directly to the required one (and checking the nodes in the path to this element) - you need to "touch" O(logn) nodes, which makes this operation O(logn) as well.

Delete certain element from heap [duplicate]

As I understand, binary heap does not support removing random elements. What if I need to remove random elements from a binary heap?
Obviously, I can remove an element and re-arrange the entire heap in O(N). Can I do better?
Yes and no.
The problem is a binary heap does not support search for an arbitrary element. Finding it is itself O(n).
However, if you have a pointer to the element (and not only its value) - you can swap the element with the rightest leaf, remove this leaf, and than re-heapify the relevant sub-heap (by sifting down the newly placed element as much as needed). This results in O(logn) removal, but requires a pointer to the actual element you are looking for.
Amit is right in his answer but here is one more nuance:
the position of the removed item (where you put the right-most leaf) can be required to be bubbled up (compare with parent and move up until parent is larger than you).
Sometimes it is required to bubble down (compare with children and move down until all children are smaller than you). It all depends on the case.
Depends on what is meant by "random element." If it means that the heap contains elements [e1, e2, ..., eN] and one wants to delete some ei (1 <= i <= N), then this is possible.
If you are using a binary heap implementation from some library, it might be that it doesn't provide you with the API that you need. In that case, you should look for another library that has it.
If you were to implement it yourself, you would need two additional calls:
A procedure deleteAtIndex(heap, i) that deletes the node at index i
by positioning the last element in the heap array at i, decrementing the
element count, and finally shuffling down/up
the new ith element to maintain the heap invariant. The most
common use of this procedure is to "pop" the heap by calling
deleteAtIndex(heap, 1) -- assuming 1-origin indexing. This operation
will run O(log n) (though, to be complete, I'll note that the highest bound can be
improved up to O(log(log n)) depending on some assumptions about your elements' keys).
A procedure deleteElement(heap, e) that deletes the element e (your arbitrary element).
Your heap algorithm would maintain an array ElementIndex such that ElementIndex[e] returns
the current index of element e: calling deleteAtIndex(heap, ElementIndex[e])
will then do what you want. It will also run in O(log n) because the array access is constant.
Since binary heaps are often used in algorithms that merely pop the highest (or lowest)
priority element (rather than deleting arbitrary elements), I imagine that some libraries might miss on the deleteAtIndex API to save space (the extra ElementIndex array mentioned above).

How to remove elements from a binary heap?

As I understand, binary heap does not support removing random elements. What if I need to remove random elements from a binary heap?
Obviously, I can remove an element and re-arrange the entire heap in O(N). Can I do better?
Yes and no.
The problem is a binary heap does not support search for an arbitrary element. Finding it is itself O(n).
However, if you have a pointer to the element (and not only its value) - you can swap the element with the rightest leaf, remove this leaf, and than re-heapify the relevant sub-heap (by sifting down the newly placed element as much as needed). This results in O(logn) removal, but requires a pointer to the actual element you are looking for.
Amit is right in his answer but here is one more nuance:
the position of the removed item (where you put the right-most leaf) can be required to be bubbled up (compare with parent and move up until parent is larger than you).
Sometimes it is required to bubble down (compare with children and move down until all children are smaller than you). It all depends on the case.
Depends on what is meant by "random element." If it means that the heap contains elements [e1, e2, ..., eN] and one wants to delete some ei (1 <= i <= N), then this is possible.
If you are using a binary heap implementation from some library, it might be that it doesn't provide you with the API that you need. In that case, you should look for another library that has it.
If you were to implement it yourself, you would need two additional calls:
A procedure deleteAtIndex(heap, i) that deletes the node at index i
by positioning the last element in the heap array at i, decrementing the
element count, and finally shuffling down/up
the new ith element to maintain the heap invariant. The most
common use of this procedure is to "pop" the heap by calling
deleteAtIndex(heap, 1) -- assuming 1-origin indexing. This operation
will run O(log n) (though, to be complete, I'll note that the highest bound can be
improved up to O(log(log n)) depending on some assumptions about your elements' keys).
A procedure deleteElement(heap, e) that deletes the element e (your arbitrary element).
Your heap algorithm would maintain an array ElementIndex such that ElementIndex[e] returns
the current index of element e: calling deleteAtIndex(heap, ElementIndex[e])
will then do what you want. It will also run in O(log n) because the array access is constant.
Since binary heaps are often used in algorithms that merely pop the highest (or lowest)
priority element (rather than deleting arbitrary elements), I imagine that some libraries might miss on the deleteAtIndex API to save space (the extra ElementIndex array mentioned above).

Finding the smallest element of a binary max heap stored as Ahnentafel array

I have a binary max heap (largest element at the top), and I need to keep it of constant size (say 20 elements) by getting rid of the smallest element each time I get to 20 elements. The binary heap is stored in an array, with children of node i at 2*i and 2*i+1 (i is zero based). At any point, the heap has 'n_elements' elements, between 0 and 20. For example, the array [16,14,10,8,7,9,3,2,4] would be a valid max binary heap, with 16 having children 14 and 10, 14 having children 8 and 7 ...
To find the smallest element, it seems that in general I have to traverse the array from n_elements/2 to n_elements: the smallest element is not necessarily the last one in the array.
So, with only that array, it seems any attempt at finding/removing the smallest elt is at least O(n). Is that correct?
For any given valid Max Heap the minimum will be at the leaf nodes only. The next question is how to find the leaf nodes of the heap in the array ?. If we carefully observe the last node of the array it will be the last leaf node. Get the parent of the leaf node by the formula
parent node index = (leaf Node Index)/2
Start linear search from the index (parent node index +1) to last leaf node index get the minimum value in that range.
FindMinInMaxHeap(Heap heap)
startIndex = heap->Array[heap->lastIndex/2]
if startIndex == 0
return heap->Array[startIndex]
Minimum = heap->Array[startIndex + 1]
for count from startIndex+2 to heap->lastIndex
if(heap->Array[count] < Minimum)
Minimum := heap->Array[count]
print Minimum
There isn't any way I can think of by which you can get better that O(n) performance for finding and removing the smallest element from a max heap by using the heap alone. One approach that you can take is:
If you are creating this heap data structure yourself, you can keep a separate pointer to the location of the smallest element in the array. So whenever a new element is added to the heap, check if the new element is smaller. If yes, update the pointer etc. Then finding the smallest element would be O(1).
MBo raises a good point in the comment about how to get the next smallest element after each removal. You'll still need to do the O(n) thing to find the next smallest element after each removal. So removal would still be O(n). But finding the smallest element would be O(1)
If you need faster removal as well, you'll need to also maintain a min-heap of all the elements. In that case, removal would be O(log(n)). Insertion will take 2x time because you have to insert into two heaps and it will also take 2x space.
By the way, if you have only 20 elements at any point of time, this is not really going to matter much (unless it is a homework problem or you are just doing it for fun). It would really matter only if you plan to scale it to thousands of values.
There is minmax heap data structure: http://en.wikipedia.org/wiki/Min-max_heap . Of course, it's code is rather complex, but with two separate heaps we have to use a lot of additional space (for the second heap, for maintaining one-to-one mapping) and to do the job twice.

Data structure supporting Add and Partial-Sum

Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:
Add(i,y) -- Add the value y to the ith number.
Partial-sum(i) -- Return the sum of the first i numbers, i.e.
There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.
How to design a data structure for above algorithm?
Construct a balanced binary tree with n leaves; stick the elements along the bottom of the tree in their original order.
Augment each node in the tree with "sum of leaves of subtree"; a tree has #leaves-1 nodes so this takes O(n) setup time (which we have).
Querying a partial-sum goes like this: Descend the tree towards the query (leaf) node, but whenever you descend right, add the subtree-sum on the left plus the element you just visited, since those elements are in the sum.
Modifying a value goes like this: Find the query (left) node. Calculate the difference you added. Travel to the root of the tree; as you travel to the root, update each node you visit by adding in the difference (you may need to visit adjacent nodes, depending if you're storing "sum of leaves of subtree" or "sum of left-subtree plus myself" or some variant); the main idea is that you appropriately update all the augmented branch data that needs updating, and that data will be on the root path or adjacent to it.
The two operations take O(log(n)) time (that's the height of a tree), and you do O(1) work at each node.
You can probably use any search tree (e.g. a self-balancing binary search tree might allow for insertions, others for quicker access) but I haven't thought that one through.
You may use Fenwick Tree
See this question

Resources