In CLRS, third edition, page 155, it says of MAX-HEAPIFY:
The children’s subtrees each have size at most 2n/3—the worst case
occurs when the bottom level of the tree is exactly half full.
I understand why the worst case occurs when the bottom level of the tree is exactly half full; this is also answered in the question "worst case in MAX-HEAPIFY: the worst case occurs when the bottom level of the tree is exactly half full".
My question is: how do you get 2n/3?
Why is the size of a child subtree at most 2n/3 when the bottom level is half full?
How is that calculated?
Thanks
In a tree where every node has either 0 or 2 children, the number of nodes with 0 children is one more than the number of nodes with 2 children. (Explanation for a perfect tree: the number of nodes at level h is 2^h, which by the geometric series formula equals (the number of nodes at levels 0 to h-1) + 1, and the nodes at levels 0 to h-1 are exactly the nodes with 2 children.)
        ROOT
       /    \
      L      R
     / \    / \
    /   \  /   \
    -----  -----
    *****
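Let k be the number of nodes in R. If you remove the bottom level of L, what remains has the same shape as R, so L has k internal nodes and, by the fact above, k + 1 leaves: the number of nodes in L is k + (k + 1) = 2k + 1. The total number of nodes is n = 1 + (2k + 1) + k = 3k + 2 (root plus L plus R). The ratio is (2k + 1)/(3k + 2), which is bounded above by 2/3, since 3(2k + 1) = 6k + 3 < 6k + 4 = 2(3k + 2). No constant less than 2/3 works, because the limit as k goes to infinity is 2/3.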
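A small Python sketch can check this numerically. It assumes the usual 1-based array layout of a heap (children of index i at 2i and 2i + 1); `subtree_size` is a helper introduced here for illustration, not something from CLRS. The first loop builds exactly the worst case above (R perfect with k = 2^h - 1 nodes, so n = 3k + 2), and the second checks that no heap size does worse than (2n - 1)/3.

```python
def subtree_size(i: int, n: int) -> int:
    """Number of nodes in the subtree rooted at index i of an n-node heap (1-based)."""
    count, lo, hi = 0, i, i
    while lo <= n:
        # Indices lo..hi form one level of this subtree; clip the level to n.
        count += min(hi, n) - lo + 1
        lo, hi = 2 * lo, 2 * hi + 1
    return count


# Worst case from the answer: R is perfect with k = 2^h - 1 nodes, L has 2k + 1.
for h in range(1, 10):
    k = 2**h - 1
    n = 3 * k + 2
    assert subtree_size(2, n) == 2 * k + 1   # left subtree L
    assert subtree_size(3, n) == k           # right subtree R

# For every heap size, the left subtree has at most (2n - 1)/3 < 2n/3 nodes.
for n in range(2, 5000):
    assert 3 * subtree_size(2, n) <= 2 * n - 1
```

The bound is met with equality exactly at n = 3k + 2 with k = 2^h - 1, which is the half-full case.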
First understand that the maximum number of elements in a subtree occurs for the left subtree of a tree whose last level is half full. Draw this on a piece of paper to see it.
Once that is clear, the bound of 2N/3 is easy to get.
Let us assume that the total number of nodes in the tree is N.
Number of nodes in the tree = 1 + (Number of nodes in Left Subtree) + (Number of nodes in Right Subtree)
For our case, where the tree's last level is half full, if we assume that the right subtree has height h, then the left subtree has height h + 1:
Number of nodes in Left Subtree = 1 + 2 + 4 + 8 + ... + 2^(h+1) = 2^(h+2) - 1 .....(i)
Number of nodes in Right Subtree = 1 + 2 + 4 + 8 + ... + 2^h = 2^(h+1) - 1 .....(ii)
Thus, plugging into:
Number of nodes in the tree = 1 + (Number of nodes in Left Subtree) + (Number of nodes in Right Subtree)
=> N = 1 + (2^(h+2)-1) + (2^(h+1)-1)
=> N = 1 + 3*(2^(h+1)) - 2
=> N = 3*(2^(h+1)) -1
=> 2^(h+1) = (N + 1)/3
Plugging in this value into equation (i), we get:
Number of nodes in Left Subtree = 2^(h+2)-1 = 2*(N+1)/3 -1 =(2N-1)/3 < (2N/3)
Hence the upper bound on the maximum number of nodes in a subtree for a tree with N nodes is 2N/3.
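The algebra can be double-checked with exact arithmetic. This Python sketch uses the standard `fractions` module; `worst_case_sizes` is a name introduced here for illustration. It evaluates equations (i) and (ii) as explicit sums for a range of h and confirms N = 3*(2^(h+1)) - 1 and Left = (2N - 1)/3 < 2N/3.

```python
from fractions import Fraction


def worst_case_sizes(h: int) -> tuple[int, int, int]:
    """(left, right, N) when the right subtree is perfect of height h, the left of height h + 1."""
    left = sum(2**j for j in range(h + 2))    # 1 + 2 + ... + 2^(h+1)
    right = sum(2**j for j in range(h + 1))   # 1 + 2 + ... + 2^h
    return left, right, 1 + left + right


for h in range(12):
    left, right, N = worst_case_sizes(h)
    assert left == 2 ** (h + 2) - 1           # equation (i)
    assert right == 2 ** (h + 1) - 1          # equation (ii)
    assert N == 3 * 2 ** (h + 1) - 1
    assert left == Fraction(2 * N - 1, 3) < Fraction(2 * N, 3)
```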
For a perfect binary tree with h levels, the number of nodes is f(h) = 2^h - 1. In the case above we have a nearly complete binary tree whose bottom level is half full. We can visualize it as the root + a perfect left subtree + a perfect right subtree. If the original tree has h levels, then the left subtree has h - 1 levels and the right subtree has h - 2. So the equation becomes
n = 1 + f(h-1) + f(h-2) (1)
We want to solve (1) for f(h-1) in terms of n. First express f(h-2) through f(h-1):
f(h-2) = 2^(h-2) - 1 = (2^(h-1) - 1 + 1)/2 - 1 = (f(h-1) + 1)/2 - 1 = (f(h-1) - 1)/2 (2)
Using above in (1) we have
n = 1 + f(h-1) + (f(h-1) - 1)/2 = 1/2 + 3*f(h-1)/2
=> f(h-1) = 2*(n-1/2)/3
Hence f(h-1) = (2n - 1)/3 < 2n/3.
To add to swen's answer, here is how (2k + 1)/(3k + 2) tends to 2/3 as k tends to infinity:
Lim_(k -> inf) (2k + 1) / (3k + 2) = Lim_(k -> inf) k(2 + 1 / k) / k(3 + 2 / k) = Lim_(k -> inf) (2 + 1 / k) / (3 + 2 / k)
Since 1/k and 2/k both tend to 0, applying the limit gives 2/3.
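The same conclusion can be checked with exact arithmetic: 2/3 - (2k + 1)/(3k + 2) = (2(3k + 2) - 3(2k + 1))/(3(3k + 2)) = 1/(3(3k + 2)), which is positive for every k and shrinks to 0. A short Python sketch using the standard `fractions` module (the helper name `ratio` is illustrative):

```python
from fractions import Fraction


def ratio(k: int) -> Fraction:
    """Exact value of (2k + 1) / (3k + 2), the left subtree's share of the tree."""
    return Fraction(2 * k + 1, 3 * k + 2)


for k in range(1000):
    gap = Fraction(2, 3) - ratio(k)
    assert gap == Fraction(1, 3 * (3 * k + 2))        # positive and shrinking to 0
    assert ratio(k) < ratio(k + 1) < Fraction(2, 3)   # increases toward 2/3
```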
Number of nodes at -
level 0 i.e. root is 2^0
level 1 is 2^1
level 2 is 2^2
...
level n is 2^n
Sum of the nodes from level 0 up to level n:
S = 2^0 + 2^1 + 2^2 + ... + 2^n
From geometric series summation rule we know that
x^0 + x^1 + x^2 + ... + x^(n) = (x^(n+1) - 1)/(x-1)
Substituting x = 2, we get
S = 2^(n+1) - 1. i.e. 2^(n+1) = S + 1
Since 2^(n+1) is the number of nodes at level n+1, in a perfect tree whose last level is n+1 the number of nodes with 0 children (the leaves) is one more than the number of nodes with 2 children (all the nodes at levels 0 through n).
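A minimal Python sketch of this counting argument for a perfect binary tree (`perfect_tree_counts` is a hypothetical helper introduced here): the internal nodes are levels 0 through depth - 1, the leaves are the bottom level, and the geometric series with x = 2 gives exactly one more leaf than internal nodes.

```python
def perfect_tree_counts(depth: int) -> tuple[int, int]:
    """(internal, leaves) for a perfect binary tree whose leaves are at `depth`."""
    internal = sum(2**level for level in range(depth))  # levels 0 .. depth-1
    leaves = 2**depth                                    # the bottom level
    return internal, leaves


for depth in range(20):
    internal, leaves = perfect_tree_counts(depth)
    assert internal == 2**depth - 1     # geometric series with x = 2
    assert leaves == internal + 1       # one more leaf than internal nodes
```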
Now let's calculate the number of nodes in the left subtree, the right subtree, and the whole tree.
Assume that the number of non-leaf nodes in the left subtree of the root = k.
By the above reasoning, the number of leaf nodes in the left subtree of the root = k + 1.
Total number of nodes in the right subtree of the root = k, as the bottom level is exactly half full (the right subtree has the same shape as the left subtree without its bottom level).
Total number of nodes in the left subtree of the root = k + k + 1 = 2k + 1.
Total number of nodes in the tree, n = (2k + 1) + k + 1 = 3k + 2.
Ratio of nodes in the left subtree to total nodes = (2k + 1)/(3k + 2), which is bounded above by 2/3.
That's why CLRS says that the children's subtrees each have size at most 2n/3.
The original question is below.
For any binary tree with n nodes and i internal nodes, the relation between n and i is _____ <= i.
In my opinion the answer is n/2 <= i, but I can't figure out what condition would make n/2 = i.
I also want to ask: does the root node count as an internal node?
In a full tree there are 1 + 2 + ... + 2^(h-1) internal nodes and 2^h leaf nodes (where h is the height of the tree).
That means the total number of nodes is
n = 1 + 2 + ... + 2^h = 2^(h+1) - 1
i = 1 + 2 + ... + 2^(h-1) = 2^h - 1
Now, for the relation you are looking for:
n - i = 2^(h+1) - 1 - 2^h + 1 = 2^(h+1) - 2^h = 2*2^h - 2^h = 2^h
Since 2^h = i+1, you get:
n - i = i+1
n - 1 = 2i
(n - 1)/2 = i
(Note that a non-integer result is not an issue, since n is always odd in a full tree.)
Now, all you have left to show is that full tree is indeed the one with the highest such ratio (this is left for the reader).
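As a sanity check (not a proof), a brute-force Python sketch can enumerate every binary tree shape with up to 15 nodes and record the possible internal-node counts. It assumes a node is internal whenever it has at least one child, so the root is internal whenever n > 1; `internal_counts` is an illustrative name. The check agrees with (n - 1)/2 <= i, which also follows from the fact that a binary tree has at most i + 1 leaves, so n <= 2i + 1; the minimum i = ⌈(n - 1)/2⌉ is reached for every n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def internal_counts(n: int) -> frozenset:
    """Every possible number of internal nodes over all binary trees with n nodes.

    A node counts as internal when it has at least one child, so the root is
    internal whenever n > 1.
    """
    if n == 0:
        return frozenset({0})
    root_internal = 1 if n > 1 else 0
    result = set()
    for left in range(n):                  # size of the root's left subtree
        right = n - 1 - left
        for a in internal_counts(left):
            for b in internal_counts(right):
                result.add(root_internal + a + b)
    return frozenset(result)


for n in range(1, 16):
    counts = internal_counts(n)
    assert all(2 * i >= n - 1 for i in counts)   # (n - 1)/2 <= i always holds
    assert min(counts) == n // 2                  # ceil((n - 1)/2) is attained
```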
How do you prove that a binary heap with n nodes has exactly ⌈n / 2⌉ leaf nodes?
Let x be the height of the tree. If the heap is a perfect tree, the number of leaves is 2^x, and
=> 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^x = n
=> 2^(x+1) - 1 = n (by the sum formula for powers of 2)
=> 2^(x+1) = n + 1
=> log(n+1) = x + 1 (logs base 2)
=> log(n+1) - 1 = x
=> log(n+1) - log(2) = x
=> x = log((n+1)/2)
=> number of leaves = 2^x = 2^(log((n+1)/2)) = (n+1)/2, which equals ⌈n / 2⌉ because n is odd in a perfect tree.
A good intuition for this is to think about this inductively. For the n = 0 case, there aren't any leaves, and for the n = 1 case the root is the only leaf. For each added node after that, it either (1) adds a child to a node that was previously a leaf and now has one child, not changing the number of leaf nodes, or (2) adds a child to a node that already has one child, increasing the number of leaves by one. Using induction, you can formalize this to prove that the number of leaves in a binary heap is ⌈n / 2⌉.
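That inductive argument can be simulated directly. The Python sketch below assumes the 1-based array layout, where the new node at index j is the left child of j // 2 when j is even (case 1) and the right child when j is odd (case 2); `leaves_by_insertion` is an illustrative name.

```python
def leaves_by_insertion(n: int) -> int:
    """Leaf count of an n-node heap, adding nodes at indices 1, 2, ..., n in order."""
    leaves = 0
    for j in range(1, n + 1):
        if j == 1:
            leaves = 1       # the root alone is a leaf
        elif j % 2 == 0:
            pass             # case (1): left child of j // 2, which was a leaf; net change 0
        else:
            leaves += 1      # case (2): right child of j // 2, which already had a child
    return leaves


for n in range(1000):
    assert leaves_by_insertion(n) == (n + 1) // 2   # ceil(n / 2)
```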
Let's say we have n nodes, arranged by level:
0-level
1-level
...
d-level // 2^d nodes in total. How many of them are leaves depends on how many nodes the level below has.
last-level-containing-x-nodes // all leaf nodes
Leaf nodes can come only from the last two levels.
The last level has x leaf nodes.
The second-to-last level has 2^d nodes in total; ⌈x / 2⌉ of them are parents of last-level nodes, so it has 2^d - ⌈x / 2⌉ leaf nodes.
So the total number of leaf nodes is
= (Leaf nodes at bottom-most level) + (Leaf nodes at second bottom-most level)
= (x + (2^d - ⌈x / 2⌉)) //Equation 1
Total number of nodes in the heap:
n = (2^(d + 1) - 1) + x
2*(2^d) = n-x+1
2^d = (n-x+1)/2
Now substitute 2^d into Equation 1:
(x + (2^d - ⌈x / 2⌉))
=(x + (n-x+1)/2 - ⌈x / 2⌉)
= n/2 + x/2 + 1/2 - ⌈x / 2⌉
For even n, x is odd (n = 2^(d+1) - 1 + x, and 2^(d+1) - 1 is odd), so ⌈x / 2⌉ = x/2 + 1/2:
= n/2 + x/2 + 1/2 - (x/2 + 1/2)
= n/2 + x/2 + 1/2 - x/2 - 1/2
= n/2
= ⌈n / 2⌉ /*Since n is even*/
For odd n, x is even, so ⌈x / 2⌉ = x/2:
= n/2 + x/2 + 1/2 - x/2
= n/2 + 1/2
= ⌈n / 2⌉ /*Since n is odd*/
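Equation 1 can be checked against a direct leaf count in a short Python sketch. It assumes a 1-based array heap, where node i is a leaf exactly when 2i > n; the helper names are made up for this example, and d is chosen so that levels 0..d are completely full while the last level holds the remaining x nodes.

```python
def leaves_equation_1(n: int) -> int:
    """Leaf count from Equation 1: x + (2^d - ceil(x / 2)), for n >= 1."""
    d = (n + 1).bit_length() - 2          # levels 0..d are completely full
    x = n - (2 ** (d + 1) - 1)            # nodes on the partially filled last level
    return x + (2**d - (x + 1) // 2)      # (x + 1) // 2 == ceil(x / 2)


def leaves_direct(n: int) -> int:
    """Count indices i in 1..n whose left child 2i lies past the end of the array."""
    return sum(1 for i in range(1, n + 1) if 2 * i > n)


for n in range(1, 2000):
    assert leaves_equation_1(n) == leaves_direct(n) == (n + 1) // 2   # ceil(n / 2)
```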
To solve the problem, we need to find the last internal node and the first leaf node of the heap.
From the array layout of a heap (1-based indexing), for a node with index i:
Parent[i] = floor(i/2)
Left[i] = 2i
Right[i] = 2i+1
Claim: the last internal node is at floor(n/2), and the first leaf node is at floor(n/2)+1.
Left[floor(n/2)]
= 2 * floor(n/2)
<= 2 * n/2
= n
From Left[ floor(n/2) ] <= n, we know that floor(n/2) has a left child within the bounds of the heap, so it is an internal node. Every node before floor(n/2) has both children within the heap, since for i < floor(n/2) we get 2i + 1 <= 2 * floor(n/2) - 1 < n. So the internal nodes are indexed 1, 2, 3, 4, ... floor(n/2), which means the index of the last internal node equals the number of internal nodes in the heap.
Left[floor(n/2) + 1]
= 2 * (floor(n/2) + 1)
> 2 * (n/2 - 1 + 1)
= 2 * n/2
= n
From Left[ floor(n/2)+1 ] > n, we know that floor(n/2)+1 has its left child out of the bounds of the heap, so it is the first leaf node. Also, all nodes after floor(n/2)+1 have their left child out of bounds. So the nodes after floor(n/2) (the last internal node) have no children: they are all leaves, indexed floor(n/2)+1, floor(n/2)+2, floor(n/2)+3, ... n.
Thus, from above we know that:
number of leaves = (number of elements in the heap) - (number of internal nodes)
= n - floor(n/2)
= ceil(n/2)
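These index facts can be checked mechanically for many heap sizes. The Python sketch below (the `leaf_indices` helper is hypothetical) uses only Left[i] = 2i from the definitions above and confirms that the leaves are exactly floor(n/2)+1 .. n, so there are n - floor(n/2) = ceil(n/2) of them.

```python
def leaf_indices(n: int) -> range:
    """Indices floor(n/2)+1 .. n of a 1-based n-node heap."""
    return range(n // 2 + 1, n + 1)


for n in range(1, 2000):
    last_internal = n // 2
    if last_internal >= 1:
        assert 2 * last_internal <= n          # Left[floor(n/2)] is inside the heap
    assert 2 * (last_internal + 1) > n         # Left[floor(n/2)+1] is outside it
    # Node i is a leaf exactly when its left child 2i is past the end of the array.
    assert list(leaf_indices(n)) == [i for i in range(1, n + 1) if 2 * i > n]
    assert len(leaf_indices(n)) == n - n // 2 == (n + 1) // 2
```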
In my answer to this question, I used two formulas that I arrived at by ad-hoc means, and I am at a loss for a simple explanation for why these formulas work. Here is the problem in full:
Consider a perfect or complete K-ary tree of height H where every node is labeled by its rank in a breadth-first traversal, and its dual where every node is labeled in depth-first order. Here is an example with K=2, H=2:
      _ 0 _                 _ 0 _
     /     \               /     \
    1       2             1       4
   / \     / \           / \     / \
  3   4   5   6         2   3   5   6
For the BF-ordered tree, the i-th child of a node N at depth D (counting children from i = 0) is given by:
K*N + 1 + i
For the DF-ordered tree, the i-th child of a node N at depth D is given by:
N + 1 + i*step, where step = (K^(H - D) - 1) / (K - 1)
What is an intuitive explanation for these formulas?
The BF-ordered formula makes sense to me when looking at any hand-drawn example, but I can't put into words why it works. In the DF-ordered case, the best I can come up with is this:
For a node N at depth D in a DFS-numbered K-ary tree of height H, its first child is simply N+1 because it is the next node to be visited in a depth-first traversal. The second child of N will be visited directly after visiting the entire sub-tree rooted at the first child (N+1), which is itself a complete K-ary tree of height H - (D + 1). The size of any complete K-ary tree is given by the sum of a finite geometric series. The size of said sub-tree is the distance between the first and second children, and, in fact, it is the same distance between all siblings since each of their sub-trees are the same size. If we call this distance step, then:
1st child is N + 1
2nd child is N + 1 + step
3rd child is N + 1 + step + step
...and so on.
Can anyone provide a better explanation for how or why these formulas work?
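One way to convince yourself that both formulas hold is to build the numberings and compare. The Python sketch below names each node by its path of child indices from the root (a representation chosen here for convenience, with children counted from 0), labels nodes in breadth-first and depth-first pre-order, and checks both formulas for small K >= 2 and H. The helper names are illustrative.

```python
from itertools import product


def bf_labels(K: int, H: int) -> dict:
    """Breadth-first rank of every node, keyed by its path of child indices."""
    labels = {}
    for depth in range(H + 1):
        # product() yields paths in lexicographic order = left to right in a level.
        for path in product(range(K), repeat=depth):
            labels[path] = len(labels)
    return labels


def df_labels(K: int, H: int) -> dict:
    """Depth-first (pre-order) rank of every node, keyed by its path."""
    labels = {}

    def visit(path: tuple) -> None:
        labels[path] = len(labels)
        if len(path) < H:
            for i in range(K):
                visit(path + (i,))

    visit(())
    return labels


def check(K: int, H: int) -> None:
    """Assert both child formulas for every internal node (requires K >= 2)."""
    bf, df = bf_labels(K, H), df_labels(K, H)
    for path, n_bf in bf.items():
        D = len(path)
        if D == H:
            continue                               # leaves have no children
        step = (K ** (H - D) - 1) // (K - 1)       # size of each child's subtree
        for i in range(K):
            child = path + (i,)
            assert bf[child] == K * n_bf + 1 + i
            assert df[child] == df[path] + 1 + i * step


for K in range(2, 5):
    for H in range(5):
        check(K, H)
```

With K = 2 and H = 2 the two label maps reproduce the example trees in the question.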
For the BFS:
If node N is at depth D and there are a nodes before N at depth D (and b nodes after):
N = K^0 + K^1 + ... + K^(D-1) + a
How many nodes are labeled before its first child? There are b remaining nodes at depth D, and a * K nodes at depth D+1 (the children of the a nodes before N) that come before it. So if C is the label of the first child of N:
C = N + b + a * K + 1
C = K^0 + K^1 + ... + K^(D-1) + a + b + a * K + 1
There are K^D nodes at depth D, so a + b + 1 = K^D; therefore:
C = K^0 + K^1 + ... + K^(D-1) + K^D + a * K
C = 1 + (K^0 + ... + K^(D-2) + K^(D-1) + a) * K
C = 1 + N * K
For the DFS:
To compute the size of the step you have to compute the number of nodes in the sub-tree rooted at a child; since a sub-tree of a perfect K-ary tree is itself a perfect K-ary tree, you can compute its number of nodes with the geometric series formula.
I'm getting stumped by this problem:
You have a tree with every internal node having k children, with k >= 2.
What is the maximum number of nodes that such a tree can have, if its depth is d? Prove your
answer by induction on d.
So I realize that if k were 2, the geometric series would be 1 + 2 + 4 + 8 + ... + 2^n, but I can't figure out how to bring the depth into it or how to prove it inductively.
The number of nodes in a full k-ary tree of n levels is (k^n - 1)/(k - 1).
A binary tree of 5 levels, for example, has 31 nodes (1 + 2 + 4 + 8 + 16). Or:
(2^5 - 1)/(2 - 1) = 31/1 = 31
A 4-ary tree of 4 levels has 85 nodes (1 + 4 + 16 + 64)
(4^4 - 1)/(4 - 1) = 255/3 = 85
If you write out a few of those for different values of k, you should be able to derive the inductive proof.
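If the depth d counts edges from the root (so a tree of depth d has d + 1 levels, an assumption about the question's convention), the formula above becomes (k^(d+1) - 1)/(k - 1). The inductive step is that a tree of depth d is a root plus at most k subtrees of depth d - 1, and 1 + k(k^d - 1)/(k - 1) = (k^(d+1) - 1)/(k - 1). A quick Python sketch comparing the recursive count with the closed form (function names are illustrative):

```python
def max_nodes_recursive(k: int, d: int) -> int:
    """Inductive definition: depth 0 is a lone root; depth d adds k subtrees of depth d - 1."""
    return 1 if d == 0 else 1 + k * max_nodes_recursive(k, d - 1)


def max_nodes_closed(k: int, d: int) -> int:
    """Closed form (k^(d+1) - 1) / (k - 1) for a full k-ary tree of depth d."""
    return (k ** (d + 1) - 1) // (k - 1)


for k in range(2, 8):
    for d in range(12):
        assert max_nodes_recursive(k, d) == max_nodes_closed(k, d)
```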
An N-ary tree has N sub-nodes for each node. If the tree has M non-leaf nodes, how do you find the number of leaf nodes?
First of all, if the root is level 0, then the K-th level of the tree has N^K nodes. Add up the level sizes N^0, N^1, N^2, ... level by level until the running total reaches M. This tells you how many levels the non-leaf nodes occupy. The leaf nodes are all on the next level, so the number of leaf nodes is the number of nodes on that last level: N^lastLevel.
Here is an example: N = 3, M = 4.
Level 0 = 3^0 = 1
Level 1 = 3^1 = 3
1 + 3 = 4 = M
So the non-leaf nodes occupy levels 0 and 1 (counting from 0), and the leaves are on level 2.
The answer is 3^2 = 9.
Note: You can also find the level number directly by noticing that M is the sum of a geometric progression: 1 + 3 + 9 + 27 + ... = M
Hope it is clear.
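The level-by-level procedure above can be written as a short Python sketch. Like the answer, it assumes the non-leaf nodes exactly fill levels 0 through some last level of a perfect N-ary tree; the function name and the error raised otherwise are choices made for this example.

```python
def leaves_by_levels(N: int, M: int) -> int:
    """Leaves of a perfect N-ary tree with M non-leaf nodes, found level by level."""
    total, level = 0, 0
    while total < M:
        total += N ** level       # level `level` holds N^level nodes
        level += 1
    if total != M:
        raise ValueError("M non-leaf nodes do not fill whole levels of a perfect N-ary tree")
    return N ** level             # the leaves all sit on the next level


print(leaves_by_levels(3, 4))     # prints 9: levels 0 and 1 are non-leaf
```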
Mathematically speaking, the number of nodes per level grows as a geometric progression:
0th level - 1
1st level - n
2nd level - n ^2
3rd level - n ^ 3
....
mth level - n ^ m
So the total number of nodes in levels 0 through m-1 is 1 + n + n^2 + ... + n^(m-1).
There is a standard formula for 1 + a + a^2 + a^3 + ... + a^(m-1), namely
(1 - a^m)/(1 - a); with a = n, let's call this quantity K.
What we need is the number of leaf nodes, which is n^m, and what we have is K, the total number of non-leaf nodes. Rearranging K = (n^m - 1)/(n - 1) gives
n ^ m = K *(n-1) + 1.
For example, if a 3-ary tree has 40 non-leaf nodes, this formula gives 40*(3-1) + 1 = 81 leaf nodes, which is the right answer.
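The rearranged relation says leaves = (n - 1) * (non-leaf nodes) + 1. As a sanity check, here is a Python sketch suggesting it does not depend on the tree being perfect: it grows random full n-ary trees (every node has 0 or n children) by turning a randomly chosen leaf into an internal node, then counts nodes by walking the tree. The nested-list representation and the helper names are assumptions made for this example.

```python
import random


def random_full_tree(n: int, expansions: int, rng: random.Random) -> list:
    """Random full n-ary tree; each node is a list holding 0 or n child nodes."""
    root: list = []
    leaves = [root]
    for _ in range(expansions):
        leaf = leaves.pop(rng.randrange(len(leaves)))  # pick any current leaf
        children = [[] for _ in range(n)]
        leaf.extend(children)                          # it now has n children
        leaves.extend(children)
    return root


def count_nodes(node: list) -> tuple[int, int]:
    """Return (internal, leaves) for the subtree rooted at node."""
    if not node:
        return 0, 1
    internal, leaves = 1, 0
    for child in node:
        child_internal, child_leaves = count_nodes(child)
        internal += child_internal
        leaves += child_leaves
    return internal, leaves


rng = random.Random(42)
for n in range(2, 6):
    for expansions in range(0, 120, 7):
        internal, leaves = count_nodes(random_full_tree(n, expansions, rng))
        assert internal == expansions
        assert leaves == internal * (n - 1) + 1
```

Each expansion removes one leaf and adds n new ones, so every step adds one internal node and n - 1 leaves, whichever leaf is chosen.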