Binary search trees with n as the root - algorithm

In my book, I read the following statement:
Suppose that the tree is a binary search tree, then:
If the tree contains all values from 1 to n exactly once, and n is the
root of the tree, then the height of the tree cannot be log2(n)
(rounded up)
Why does this statement hold?

If n is the root, that means that all of the remaining (n-1) elements are on one side of the tree. Thus, the height of the tree is at least d = 1 + log2(n-1)
Little algebra:
log2(n-1) + 1 = log2(n-1) + log2(2) = log2(2(n-1)) > log2(n) for n > 2
So the tree that has n at the root cannot attain height log2(n) if it has more than 2 elements.
Regarding the rouned up part:
Let n=17, 17 in the root. Remaining 16 elements comprise a perfect binary tree of height 4, so total tree depth is 5, which contradicts the claim that the depth can't be ceil(log2(17)) == 5.

Related

Proof that a binary tree with n leaves has a height of at least log n

I've been able to create a proof that shows the maximum total nodes in a tree is equal to n = 2^(h+1) - 1 and logically I know that the height of a binary tree is log n (can draw it out to see) but I'm having trouble constructing a formal proof to show that a tree with n leaves has "at least" log n. Every proof I've come across or been able to put together always deals with perfect binary trees, but I need something for any situation. Any tips to lead me in the right direction?
Lemma: the number of leaves in a tree of height h is no more than 2^h.
Proof: the proof is by induction on h.
Base Case: for h = 0, the tree consists of only a single root node which is also a leaf; here, n = 1 = 2^0 = 2^h, as required.
Induction Hypothesis: assume that all trees of height k or less have fewer than 2^k leaves.
Induction Step: we must show that trees of height k+1 have no more than 2^(k+1) leaves. Consider the left and right subtrees of the root. These are trees of height no more than k, one less than the height of the whole tree. Therefore, each has at most 2^k leaves, by the induction hypothesis. Since the total number of leaves is just the sum of the numbers of leaves of the subtrees of the root, we have n = 2^k + 2^k = 2^(k+1), as required. This proves the claim.
Theorem: a binary tree with n leaves has height at least log(n).
We have already noted in the lemma that the tree consisting of just the root node has one leaf and height zero, so the claim is true in that case. For trees with more nodes, the proof is by contradiction.
Let n = 2^a + b where 0 < b <= 2^a. Now, assume the height of the tree is less than a + 1, contrary to the theorem we intend to prove. Then the height is at most a. By the lemma, the maximum number of leaves in a tree of height a is 2^a. But our tree has n = 2^a + b > 2^a leaves, since 0 < b; a contradiction. Therefore, the assumption that the height was less than a+1 must have been incorrect. This proves the claim.

Number of nodes in the bottom level of a balanced binary tree

I am wondering about two questions that came up when studying about binary search trees. They are the following:
What is the maximum number of nodes in the bottom level of a balanced binary search tree with n nodes?
What is the minimum number of nodes in the bottom level of a balanced binary search tree with n nodes?
I cannot find any formulas in my textbook regarding this. Is there any way to answers these questions? Please let me know.
Using notation:
H = Balanced binary tree height
L = Total number of leaves in a full binary tree of height H
N = Total number of nodes in a full binary tree of height H
The relation is L = (N + 1) / 2 as demonstrated below. That would be the maximum number of leaf nodes for a given tree height H. The minimum number of nodes at a given height is 1 (cannot be zero, because then the tree height would be reduced by one).
Drawing trees with increasing heights, one can observe that:
H = 1, L = 1, N = 1
H = 2, L = 2, N = 3
H = 3, L = 4, N = 7
H = 4, L = 8, N = 15
...
The relation between tree height (H) and the total number of leaves (L)
and the total number of nodes (N) becomes apparent:
L = 2^(H-1)
N = (2^H) - 1
The correctness is easily proven using mathematical induction.
Examples above show that it is true for small H.
Simply put in the value of H (e.g. H=1) and compute L and N.
Assuming the formulas are true for some H, one can show they are also true for HH=H+1:
For L, the assumption is that L=2^(H-1) is true.
As each node has two children, increasing the height by one
is going to replace each leaf node with two new leaves, effectively
doubling the total number of leaves. Therefore, in case of HH=H+1,
the total number of leaves (LL) is going to be doubled:
LL = L * 2
= 2^(H-1) * 2
= 2^(H)
= 2^(HH-1)
For N, the assumption is that N=(2^H)-1 is true.
Increasing the height by one (HH=H+1) increases the total number
of nodes by the total number of added leaf nodes. Therefore,
NN = N + LL
= (2^H) - 1 + 2^(HH-1)
= 2^(HH-1) - 1 + 2^(HH-1)
= 2 * 2^(HH-1) - 1
= (2^HH) - 1
Applying the mathematical induction, the correctness is proven.
H can be expressed in terms of N:
N = (2^H) - 1 // +1 to both sides
N + 1 = 2^H // apply log2 monotone function to both sides
log2(N+1) = log2(2^H)
= H * log2(2)
= H
The direct relation between L and N (which is the answer to the question asked) is:
L = 2^(H - 1) // replace H = log2(N + 1)
= 2^(log2(N + 1) - 1)
= 2^(log2(N + 1) - log2(2))
= 2^(log2( (N + 1) / 2 ))
= (N + 1) / 2
For Big O analysis, the constants are discarded, so the Binary Search Tree lookup time complexity (i.e. H with respect to the input size N) is O(log2(N)). Also, keeping in mind the formula for changing the logarithm base:
log2(N) = log10(N) / log10(2)
and discarding the constant factor 1/log10(2), where instead of 10 one can have an arbitrary logarithm base, the time complexity is simply O(log(N)) regardless of the chosen logarithm base constant.
Assuming that it's a full binary tree, the number of nodes in the leaf will always be equal to (n/2)+1.
For the minimum number of nodes, the total number of nodes could be 1 (satisfying the condition that it should be a balanced tree).
I got the answers from my professor.
1) Maximum number of nodes at the last level: ⌈n/2⌉
If there is a balanced binary search tree with 7 nodes, then the answer would be ⌈7/2⌉ = 4 and for a tree with 15 nodes, the answer would be ⌈15/2⌉ = 8.
But what is troubling is the fact that this formula gives the right answer only when the last level of a balanced tree is completely filled from left to right.
For example, a balanced binary search tree with 5 nodes, the above formula gives an answer of 3 which is not true because a tree with 5 nodes can contain a maximum nodes of 4 nodes at the last level. So I am guessing he meant full balanced binary search tree.
2) Minimum number of nodes at the last level: 1
The maximum number of nodes at level L in a binary tree is 2^L (if you assume that the vertex is level 0). This is easy to see because at each level you spawn 2 children from each previous leaf. The fact that it is balanced/search tree is irrelevant. So you have to find the biggest L such that 2^L < n and subtract it from n. Which in math language is:
The minimum number of nodes depends on the way you balance your tree. There can be height-balanced trees, weight-balanced trees and I assume other balanced trees. Even with height balanced trees you can define what do you mean by a balanced tree. Because technically a tree of 2^N nodes that has a hight of N + 2 is still a balanced tree.

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time, because of the fact that the height of the tree (with n nodes) would be log(n).
How would you prove this?
Now here I am not giving mathematical proof. Try to understand the problem using log to the base 2. Log2 is the normal meaning of log in computer science.
First, understand it is binary logarithm (log2n) (logarithm to the base 2).
For example,
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height the number of nodes in a fully balanced tree are
Height Nodes Log calculation
0 1 log21 = 0
1 3 log23 = 1
2 7 log27 = 2
3 15 log215 = 3
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to be height 3 because log2 of any number from 8 to 15 is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
I hope this helps.
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level the height is exactly N.
Generalization to all balanced binary trees: If the tree has less nodes than 2^N we for sure don't need more halvings. We might need less or the same amount but never more.
Assuming that we have a complete tree to work with, we can say that at depth k, there are 2k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree will increase the number of nodes in the entire tree by the number of nodes that were in the previous level times two.
The height k of the tree is log(N), where N is the number of nodes. This can be stated as
log2(N) = k,
and it is equivalent to
N = 2k
To understand this, here's an example:
16 = 24 => log2(16) = 4
The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Searching and Sorting Algorithms ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2N) for N nodes?
w = width of base (maximum number of leaves)
h = height of tree (maximum number of edges from root to leaf)
Divide w by 2 (h times) to get to 1, which counts the single root node at top.
N = w + w/2 + ... + 1
N = 2h + ... + 21 + 20
= (1-2h+1) / (1-2) = 2h+1-1
log2(N+1) = h+1
Check: if N=1, h=0. If h=1, N=3.
This formula is for if the bottom level is full. N will not always be so great, but would still have the same height, h. So we must take the log's ceiling.

Finding the no of leaf nodes

An N-ary tree has N sub-nodes for each node. If the tree has M non-leaf nodes, How to find the no of leaf nodes?
First of all if the root is level 0, then the K-th level of the tree will have N^K nodes. You can start incrementing a counter level by level until you get M nodes. This way you will find how many levels is the tree consisting of. And the number of leaf nodes is the number of nodes on the last level - it is N^lastLevel.
Here is an example: N = 3, M = 4.
First level = 3^0 = 1
Second level = 3^1 = 3
1 + 3 = 4
So we found that the tree has two levels(counting from 0).
The answer is 3^2 = 9.
Note: You can find the level number also directly, by noticing that M is a sum of geometric progression: 1 + 3 + 9 + 27 ... = M
Hope it is clear.
Mathematically speaking the nodes increase in the geometric progression.
0th level - 1
1st level - n
2nd level - n ^2
3rd level - n ^ 3
....
mth level - n ^ m
So the total number of nodes at m-1st level is 1 + n + n^2 + .. + n ^ m-1.
Now there is a good formula to calculate 1 + a + a^2 + a^3 + ... + a^m , which is
(1 - n^(m+1))/(1-n), lets call this quantity K.
Now what we need is the number of leaf nodes which is n ^ m, and what we have is K. i.e. total number of non-leaf nodes. Doing some mathematical formula adjustment you will find that
n ^ m = K *(n-1) + 1.
e.g. Lets say in 3-ary tree the total number of non-leaf nodes are 40, then using this formula you get the total number of leaf-nodes as 81 which is the right answer.

Minimum and maximum height of binary search trees, 2-3-4 trees and B trees

Can anyone please tell me how you find the min/max height of B trees, 2-3-4 trees and binary search trees?
Thanks.
PS: This is not homework.
Minimal and Maximal height of a 2-4 tree
For maximal height of a 2-4 tree, we will be having one key per node, hence it will behave like a Binary Search Tree.
keys at level 0 = 1
keys at level 1 = 2
keys at level 2 = 4 and so on. . . .
Adding total number of keys at each level we get a GP on solving which we will get the maximal height of the tree.
Hence, height = log2(n+1) - 1
Solving it for a total of 10^6 keys we will get :
⇒ 1 * (2^0+ 2^1 + 2^2 + . .. . . . +2^h) = 10^6
⇒ 1*(2^(h+1) - 1) = 10^6
⇒ h = log2(10^6 + 1) - 1
⇒ Maximal height of 2-4 tree with a total of 10^6 keys is 19
For minimal height of a 2-4 tree, we will be having three keys(maximum possible number) per node.
keys at level 0 = 3
keys at level 1 = 3*(4)
keys at level 2 = 3*(4^2) and so on . . .
Hence, height = log4(n+1) - 1
Adding total number of keys at each level we will get a GP on solving which we will get the minimal height. Solving it for a total of 10^6 keys we get:
⇒ 3 * (4^0+ 4^1 + 4^2 + . .. . . . +4^h) = 10^6
⇒ (4^(h+1) - 1) = 10^6
⇒ h = log4(10^6 + 1) - 1
⇒ Minimal height of 2-4 tree is 9
Binary Search Tree
For maximal height we will have a continuous chain of length n(total number of nodes) hence giving us a height equal to n-1(as height starts from 0).
For minimal height we will have a perfectly balanced tree and as solved earlier we will have a height equal to log2(n+1)-1
If you want to know the length of the longest branch you have to traverse the whole tree keeping note of "the longest branch so far".
Start from root node and look for its children
if it is having a child node then
Select the left most child and store others in any one data structure
else
if the height of that node is maximum til now
set it as max
end if
end if
Loop through all nodes of tree and whatever you get at last is the maximum height
Similar you can do for minimum
Minimal height of a binary tree is O(log n), maximal is O(n), depending on how balanced it is.
Wikipedia has a lovely bit about B Tree Heights.
I'm not familiar with 2-3-4 trees, but according to wikipedia they have similar isometry to red-black and B trees, so the above link should educate you on that as well.
As for B trees, the min/max heights depend on the branching factor chosen for the implementation.
Binary trees have a maximum height of n when input is inserted in order, the minimum height is of course log_2(n) when the tree is perfectly balanced. When input is inserted in random order the average height is about 1.39 * log_2 n.
I am not too familiar with b trees but the minimum height is of course log_m(n) when perfectly balanced (m is the number of children per node). According to Wikipedia the maximum height is log_(m/2)(n).
2-3 trees have a maximum height of log_2(n) when the tree consists of only 2-nodes and the minimum height is about log_3(n) [~0.631 log_2(n)] when the tree consists of only 3-nodes.

Resources