I want to know about the following statement in the build-heap function:
for i=A.length/2 downto 1
Was this starting index found by trial and error while looking for the parents of the leaves, or did the person who developed this algorithm have something else in mind? The build-heap function is below:
Build_Max_Heap(A)
    A.heap_size = A.length
    for i = A.length/2 downto 1
        Max_Heapify(A, i)
The nodes in the second half of the array are leaves (explanation follows in the next paragraph) and are thus trivially 1-node max heaps already, so Max_Heapify need not be called on them.
Even if you do call Max_Heapify on those nodes, it does no harm to the time complexity: leaf nodes have no nodes below them, so Max_Heapify would return immediately anyway.
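The pseudocode above can be sketched in Python as follows. This is only an illustration, not the textbook's code: it uses 0-based indexing (so the loop runs from n//2 - 1 down to 0 instead of from A.length/2 down to 1), and the names max_heapify and build_max_heap are my own.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max heap (0-based)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return  # heap property holds here; a leaf returns on the first pass
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn the list a into a max heap in place and return it."""
    n = len(a)
    # Indices n//2 .. n-1 are leaves, so start at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
    return a
```

Calling build_max_heap on a list rearranges it so that every parent is at least as large as its children.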
Mathematically, it's easy to establish that the second half of the array consists of leaves from the way the sum of a geometric progression works. Recall that a heap is a complete binary tree, which means every level is full except possibly the last, which is filled partially from left to right. For the sake of simplicity, let's assume the number of nodes in our heap is 2^N - 1. This tree has N levels, all of them full.
First level has 2^0 = 1 node
Second level has 2^1 nodes
Third level has 2^2 nodes
Level N-1 has 2^(N-2) nodes
Level N has 2^(N-1) nodes
Sum of the count of nodes in all levels except the last = 2^0 + 2^1 + ... + 2^(N-2)
= 2^(N-1) - 1
This is the number of nodes in the last level, off by one. That is, the total number of nodes in all levels except the last is almost the same as the number of nodes in the last level alone, which means the last level holds roughly half of all the nodes in the heap. From this observation we get A.length/2.
Assume that a max heap with 10^6 elements is stored in a complete 7-ary tree. Approximately how many comparisons will a call to removeMin() make?
5000
50
10^6
500
5
My solution: The number of comparisons should be at most the number of leaf nodes, because in a max heap the minimum can be at any of the leaf nodes; that count is not among the options above. Another approach is to take the square of log base 7 of 10^6, which gives about 50, but that only works if we are sure the minimum lies along a single branch of the tree, which is not the case in a max heap.
I hope that you can help.
There's no "natural" way to remove the minimum value from a max heap. You simply have to look at all the leaf nodes to figure out which one happens to be the minimum.
The question then is how many leaf nodes there are. Intuitively, we'd expect the number of leaves to be pretty close to the total number of nodes. Take it to the limit: if you have a 1,000,000-ary heap, you'd have one node in the top layer and all remaining 999,999 elements in the next layer. Even in the smallest case, where the heap is a binary heap, you'd expect roughly half the elements to be in the bottom layer.
More specifically, let's do some math! How many leaves will a 7-ary heap with n nodes have? Well, each node in the tree will either
be a leaf, or
have seven children,
with one possible exception: since the bottommost row might not be full, there might be one node with fewer than seven children. Since that's just a one-off, we can ignore that node when we're dealing with millions of elements. A quick proof by induction shows that any tree where each node has either no children or seven children has about six times as many leaf nodes as internal nodes (exactly 6I + 1 leaves for I internal nodes; prove this!), so we'd expect about (6/7)ths of the nodes to be leaves, for a total of roughly 857,000 leaves to check.
As a result, the best answer here would be roughly 10^6 comparisons.
The min element can be any of the leaves of a max heap of any arity, and there is no order among the leaves. All elements from A[10^6/7 + 1] onwards (where A is the array storing the heap) are leaf nodes and need to be checked. That is 857,143 leaves, so 857,142 comparisons just to find the minimum. After that there is no simple way to 'remove' the minimum without introducing a gap that cannot be filled by simply shifting the leaves.
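The leaf count can be checked with a few lines of Python. This is a small sketch under my own assumptions: the heap is stored 0-based in an array, so the parent of index i in a d-ary heap is (i - 1) // d, and the function name leaf_count is made up for illustration.

```python
def leaf_count(n, d):
    """Number of leaves in a complete d-ary heap with n nodes (0-based array)."""
    if n <= 1:
        return n
    # The last node sits at index n - 1; its parent is the last internal node.
    last_parent = (n - 2) // d
    # Indices 0 .. last_parent are internal, everything after that is a leaf.
    return n - (last_parent + 1)


leaves = leaf_count(10**6, 7)   # 857143 leaves
comparisons = leaves - 1        # comparisons for a linear scan over the leaves
```

For a binary heap (d = 2) the same function gives about half the nodes, in line with the A.length/2 rule.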
This is a misprint. Maybe the teacher meant to ask about removeMax, for which the answer is close to 50; see below:
There are 7 comparisons per level done by the heapify since each node has 7 children. If h is the height of the heap then that's 7*h comparisons.
Rough analysis: (here ~ means approximately)
h ~ log_7(10^6) = 7.1, hence total comparisons 7*7.1 ~ 50
More accurate analysis:
A 7-ary heap with h full levels has 1 + 7 + 7^2 + ... + 7^(h-1) elements; set this equal to 10^6
On the left side is a geometric series that sums to: (7^h - 1)/6 = 10^6
=> 7^h = 6*10^6 + 1
=> h = log_7(6*10^6 + 1) = 8 (approximately), hence 7*8 = 56; of the options, 50 is still the closest.
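The estimate can be reproduced with a short Python sketch. It assumes, as the answer does, about d comparisons per level during the sift-down (d - 1 to pick the largest child and 1 against the parent); the function names are hypothetical.

```python
def heap_height(n, d):
    """Height (in edges) of a complete d-ary heap with n >= 1 nodes."""
    height, capacity, level_size = 0, 1, 1
    # Add full levels until the tree has room for all n nodes.
    while capacity < n:
        level_size *= d
        capacity += level_size
        height += 1
    return height


def remove_max_comparisons(n, d):
    """Rough upper estimate: about d comparisons on each level below the root."""
    return d * heap_height(n, d)


estimate = remove_max_comparisons(10**6, 7)   # 7 * 8 = 56
```

Eight full levels of a 7-ary tree hold only 960,800 nodes, so 10^6 nodes spill into a ninth level, which is why the height comes out as 8.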
*A is the array that stores the heap.
I need to prove that the median of binary heap (doesn't matter if it is a min heap or max heap) can be in the lowest level of the heap (in the leaf). I am not sure how to prove it. I thought about using the fact that a heap is a complete binary tree but I am not sure about it. How can I prove it?
As @Evg mentioned in the comments, if all elements are the same, this is trivially true. Assume that all elements must be different, and let us focus on the case with an odd number of nodes 2H+1 and a min heap (the max heap case is similar). To create a min heap where the median is at the bottom, first insert the smallest H elements.
There are two cases. Case 1: after doing this, the binary tree formed by these H elements is completely filled (every layer is full). Then you can just insert the remaining H+1 elements on the last layer (which you can do since the maximum capacity of the last layer equals (#total_nodes+1)/2, which is precisely H+1).
Case 2: the last layer still has some unfilled spaces. In this case, take the smallest of the largest H elements, one at a time, until this layer is filled (note that there will be no upward movement in your heap, since these elements are larger than whatever is already in the tree). Then start the next layer by inserting the median. Finally insert the remaining elements, which won't be moved upwards either, since by construction they are larger than whatever is in the layer above. By the same argument about the capacity of the last layer, you will not need to start a new layer during this process.
In the case where there is an even number of nodes 2H, you can argue similarly, but you would have to define the median as the (H+1)-th smallest element (otherwise the statement you want to prove is false, as you can see by noticing that the only possible min-heap for the set {1,2} is the tree with root 1 and leaf 2).
Easiest way to prove it is just to make one:
   1
 2   3
4 5 6 7
Any heap with every level full and its nodes in sorted level order will have the median at the left-most leaf, but you don't have to prove that.
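The example above can be checked mechanically. The following Python sketch stores the tree 0-based in a list; the helper names is_min_heap and is_leaf are my own.

```python
def is_min_heap(a):
    """True if every parent is <= its children (0-based array layout)."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


def is_leaf(a, i):
    """True if index i has no children in the array a."""
    return 2 * i + 1 >= len(a)


heap = [1, 2, 3, 4, 5, 6, 7]           # the tree drawn above, level by level
median = sorted(heap)[len(heap) // 2]  # middle element of an odd-sized set
median_index = heap.index(median)
median_is_leaf = is_min_heap(heap) and is_leaf(heap, median_index)
```

Here the median is 4, it sits at index 3, and that index has no children, so the median is a leaf of a valid min heap.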
I think that when inserting a new node into a heap, the number of nodes it might pass by is logN. Why is it (1 + logN)? Where does the 1 come from?
This is necessary to account for the border case when the number of nodes is 2^n. A heap of n levels fits 2^n - 1 objects, so adding one more object starts a new level:
Black squares represent seven elements of a three-level heap. The red element is number eight. If your search takes you to the location of this last element, you end up with four comparisons, even though log_2 8 is three.
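The extra 1 is easy to see numerically. In this Python sketch (the function name nodes_on_path is made up), positions are numbered from 1 in level order, so the number of nodes on the path from the root to position n is floor(log2 n) + 1, which for a positive integer equals its bit length.

```python
def nodes_on_path(n):
    """Nodes on the path from the root to position n (1-based, level order)."""
    # For n >= 1, n.bit_length() equals floor(log2(n)) + 1.
    return n.bit_length()


seven = nodes_on_path(7)   # 3: the heap still has three levels
eight = nodes_on_path(8)   # 4: element eight starts a fourth level
```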
When building the heap, we start calling max_heapify(A,i) from the middle of the tree, i.e. floor(n/2), until the root in decreasing fashion to maintain heap property. I've read some reasons behind this but I still don't understand why. Kindly, can someone explain the reason of that?
Thank you.
If we do it this way, the time complexity is linear in the worst case. The idea of the proof is to observe that when an element is sifted down, another element moves up, and an element can never go down once it has been moved up. Thus, the number of times each leaf goes down is zero, the number of times each element one level above the leaves goes down is at most 1, and so on. If we compute this sum explicitly, it turns out to be O(N).
If we start from the end and sift elements up the time complexity is O(N log N) (for example, if the array is reversed).
To sum up, this way is more efficient.
Note: we could have started from the last element, but a leaf can never go down anyway, so it would be useless (the time complexity would stay linear, though).
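To see the difference in practice, here is a small Python experiment that counts swaps. It is a sketch under my own assumptions: 0-based arrays, a max heap, and, as the comparison strategy, inserting the elements one at a time and sifting each one up. The function names are hypothetical.

```python
def sift_down_build(values):
    """Bottom-up build of a max heap; returns the number of swaps performed."""
    a, n, swaps = list(values), len(values), 0
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and a[child] > a[largest]:
                    largest = child
            if largest == i:
                break
            a[i], a[largest] = a[largest], a[i]
            i = largest
            swaps += 1
    return swaps


def sift_up_build(values):
    """Build a max heap by sifting each new element up; returns the swap count."""
    a, swaps = list(values), 0
    for end in range(1, len(a)):
        i = end
        while i > 0 and a[(i - 1) // 2] < a[i]:
            parent = (i - 1) // 2
            a[i], a[parent] = a[parent], a[i]
            i = parent
            swaps += 1
    return swaps


data = list(range(1023))            # ascending input: every insert climbs to the root
down_swaps = sift_down_build(data)  # stays below len(data)
up_swaps = sift_up_build(data)      # grows like N log N
```

On this input the bottom-up build needs fewer swaps than there are elements, while the sift-up build needs several times as many.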
If we start the heapify process from the beginning (the root node), that would be wrong: the rest of the heap would not yet be a max heap, so we can't guarantee that the root node ends up holding the largest value.
So it makes sense to start from the end. That is, a bottom-up approach makes sense.
But if we start from the very end, we're starting from the leaf nodes, which cannot go down (heapify has nothing to do when called on a leaf). So instead we start one level above the leaves, at the parent of the last leaf, and keep calling heapify on every node from there up to the root.
With 0-based indexing, the index of the parent of the last leaf node is n / 2 - 1, where n is the size of the array.
We can calculate this easily:
The last child node or leaf node has the index of n - 1
so
c = n - 1
If p is the index of its parent node
Then
c = 2 * p + 1
p = (c - 1) / 2
Substitute
c = n - 1
p = (n - 2) / 2
p = n / 2 - 1
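With integer division this is floor(n / 2) - 1 in 0-based indexing, which is the same node as floor(n / 2) in 1-based indexing.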
Hope it makes sense now!
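The derivation can be double-checked with a few lines of Python (0-based indexing, binary heap; the function name last_parent is my own):

```python
def last_parent(n):
    """0-based index of the parent of the last node in a heap of n >= 2 nodes."""
    # c = n - 1 is the last node, and its parent is (c - 1) // 2.
    return (n - 2) // 2


# For every size, last_parent(n) has a child and the next index does not.
sizes_ok = all(
    last_parent(n) == n // 2 - 1
    and 2 * last_parent(n) + 1 < n
    and 2 * (last_parent(n) + 1) + 1 >= n
    for n in range(2, 1000)
)
```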
Is it 2n? Just checking.
Terminology
The order of a B-Tree is inconsistently defined in the literature.
(see for example the terminology section of Wikipedia's article on B-Trees)
Some authors consider it to be the minimum number of keys a non-leaf node may hold, while others consider it to be the maximum number of children nodes a non-leaf node may hold (which is one more than the maximum number of keys such a node could hold).
Yet many others skirt around the ambiguity by assuming a fixed length key (and fixed sized nodes), which makes the minimum and maximum the same, hence the two definitions of the order produce values that differ by 1 (as said the number of keys is always one less than the number of children.)
I define depth as the number of nodes found in the search path to a leaf record, and inclusive of the root node and the leaf node. In that sense, a very shallow tree with only a root node pointing directly to leaf nodes has depth 2. If that tree were to grow and require an intermediate level of non-leaf nodes, its depth would be 3 etc.
How many elements can be held in a B-Tree of order n?
Assuming fixed length keys, and assuming that "order" n is defined as the maximum number of child nodes, the answer is:
(Average Number of elements that fit in one Leaf-node) * n ^ (depth - 1)
How do I figure?...:
The data (the "elements") is only held in leaf nodes. So the number of elements held is the average number of elements that fit in one leaf node, times the number of leaf nodes.
The number of leaf nodes is itself driven by the number of children that fit in a non-leaf node (the order). For example the non-leaf node just above a leaf node, points to n (the order) leaf-nodes. Then, the non-leaf node above this non-leaf node points to n similar nodes etc, hence "to the power of (depth -1)".
Note that the formula above generally holds when using the averages (of keys held in a non-leaf node, and of elements held in a leaf node) rather than assuming fixed key length and fixed record length: trees typically have a node size commensurate with the key and record sizes, so each node holds enough keys or records that the actual number held in any node varies relatively little from the average.
Example:
A tree of depth 4 (a root node, two levels of non-leaf nodes and, obviously, one level of leaf nodes) and of order 12 (non-leaf nodes can hold up to 11 keys, hence point to 12 nodes below them), and such that leaf nodes can contain 5 elements each, would:
- have its root node point to 12 nodes below it
- each node below it points to 12 nodes below them (hence there will be 12 * 12 nodes in "layer 3", assuming the root is layer 1 etc.; this numbering, by the way, is also ambiguously defined...)
- each node in "layer 3" will point to 12 leaf nodes (hence there will be 12 * 12 * 12 leaf nodes)
- each leaf node has 5 elements (in this example case)
Hence.. such a tree will hold...
Nb Of Elements in said tree = 5 * 12 * 12 * 12
= 5 * (12 ^ 3)
= 5 * 12 ^ (depth - 1)
= 8640
Recognize the formula on the 3rd line.
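The same arithmetic as a small Python function. This is a sketch of the formula above only, assuming "order" means the maximum number of children and that every node is full; the function and parameter names are made up.

```python
def btree_capacity(order, depth, per_leaf):
    """Elements held when every node is full: per_leaf * order ** (depth - 1)."""
    # depth counts the root and the leaf level, so there are depth - 1 fan-out steps.
    leaf_nodes = order ** (depth - 1)
    return per_leaf * leaf_nodes


example = btree_capacity(order=12, depth=4, per_leaf=5)   # 5 * 12**3 = 8640
```

Raising the depth by one multiplies the capacity by the order, which is why shallow B-Trees can hold so many records.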
What is generally remarkable about B-Trees, and what makes them popular, is that a relatively shallow tree (one with a limited number of "hops" between the root and the sought record) can hold a relatively high number of records. This number is multiplied by the order at each level.
My book says that the order of a B-tree is the maximum number of pointers that can be stored in a node. (p. 348) The number of "keys" is one less than the order. So a node in a B-tree of order n can hold up to n-1 keys.
The book is "File Structures", second edition, by Michael J. Folk.
If your formula for the number of elements doesn't include an exponentiation somewhere, you've done it wrong.
A binary tree of order 5 can hold 2^0 + 2^1 + 2^2 + 2^3 + 2^4 elements, so 31 .. (which is 2^order - 1).
Edit:
I appear to have gotten order and depth / length mixed up. What on earth is the order of a binary tree? You appear to discuss B-trees as if they don't, by the very nature of their definition, hold a maximum of two child elements per element.
If the order of a B-tree is m, then at most m-1 keys can be inserted into a node. After that, the node splits.
For example, if the order is 3, a node can hold at most 2 keys; when a 3rd key arrives, the node splits, following the rules of a self-balancing search tree.