Best case height for a binary tree with N internal nodes

I am working through Algorithms in C++ by Robert Sedgewick and came across the following statement:
The height of a binary tree with N internal nodes is at least lg N and at most N-1. The best case occurs in a balanced tree with 2^i internal nodes at every level except possibly the bottom level. If the height is "h" then we must have
2^(h-1) < N+1 <= 2^h
since there are N+1 external nodes.
There wasn't much explanation surrounding the inequality, so my question is: how did the author deduce the inequality and what is it showing exactly?
Thanks!

The inequality 2^(h-1) < N + 1 <= 2^h shows that, for a given height h, there is a range of node counts that all share h as their minimum height. It reflects the property that every binary tree containing N nodes has a height of at least ⌊lg N⌋ + 1 (equivalently ⌈lg(N+1)⌉), where height counts the levels of nodes.
For example, a tree with 4, 5, 6, or 7 nodes has a minimum height of 3. With one node fewer than this range you can have a tree of height 2; with one more, the best you can do is a height of 4.
If we map out the minimum height for a tree that grows from 3 nodes to 8 nodes, taking the base-2 logarithm of N, rounding down, and adding 1, the inequality becomes clear:
N   lg N   min height
3   1.58   2   (N+1 = 4 = 2^2: the largest N for h = 2)
4   2.00   3   (N+1 = 5 > 2^2: the smallest N for h = 3)
5   2.32   3
6   2.58   3
7   2.81   3   (N+1 = 8 = 2^3: the largest N for h = 3)
8   3.00   4   (N+1 = 9 > 2^3: the smallest N for h = 4)
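To see the ranges for every height at once, here is a minimal Python sketch that groups node counts by minimum height and checks Sedgewick's inequality for each one. It assumes, as in the table, that height counts levels of internal nodes, so the minimum height is ⌊lg N⌋ + 1; the helper min_height is our own name, and it relies on Python's int.bit_length() to get that value exactly, without floating point.

```python
def min_height(n: int) -> int:
    """Minimum number of levels for n >= 1 internal nodes: floor(lg n) + 1."""
    # For n >= 1, int.bit_length() equals floor(log2(n)) + 1, computed exactly.
    return n.bit_length()


# Group node counts by the minimum height they require.
ranges = {}
for n in range(1, 16):
    h = min_height(n)
    # Sedgewick's inequality: 2^(h-1) < N+1 <= 2^h.
    assert 2 ** (h - 1) < n + 1 <= 2 ** h
    ranges.setdefault(h, []).append(n)

for h, counts in ranges.items():
    print(h, counts)
```

Each height h collects exactly 2^(h-1) node counts, from 2^(h-1) up to 2^h - 1, which is the range the inequality describes.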
It might be useful to notice that the range (made up of N+1 different quantities) is directly related to the number of external nodes of a given tree. Take a tree with 3 nodes and a height of 2:
  *
 / \
*   *
Add one node to this tree,
    *                *                *                *
   / \              / \              / \              / \
  *   *     or     *   *     or     *   *     or     *   *
 /                  \                  /                  \
*                    *                *                    *
and regardless of where you place it, the height increases by 1. We can then keep adding leaf nodes without changing the height until the tree contains 7 nodes in total; at that point, any further addition increases the minimum possible height once more:
     *
   /   \
  *     *
 / \   / \
*   * *   *
Originally, N was equal to 3 nodes, which meant N+1 = 4 and we saw that there were 4 quantities that had a common minimum height.
If you need more information, I suggest you look up the properties of complete and balanced binary trees.

Let's call the minimum height required to fit N nodes in a binary tree minheight(N).
One way to derive a lower bound on the tree height for a given number N of nodes is to work from the other direction: given a tree of height h, what is the maximum number of nodes that can be packed into it?
Let's call this function of height maxnodes(h), where height counts levels (a lone root has height 1). Clearly the number of nodes in a binary tree of given height is maximised when the tree is perfect, i.e. when every level is completely filled, so each internal node has 2 children. Induction quickly shows that maxnodes(h) = 2^h - 1.
So, if we have N nodes, every h for which maxnodes(h) >= N is an upper bound for minheight(N): that is, you could fit all N nodes on a tree of that height. Of all these upper bounds, the best (tightest) one will be the minimum. So what we want to find is the smallest h such that
N <= maxnodes(h) = 2^h - 1
So how to find this smallest satisfying value of h?
The important property of maxnodes(h) is that it is nondecreasing w.r.t. h (in fact it's strictly increasing, but nondecreasing is sufficient). What that means is that you can never fit more nodes into a full binary tree by reducing its height. (Obvious really, but it helps to spell things out sometimes!) This makes rearranging the above inequality to find the minimum value of h easy:
2^h - 1 >= N
2^h >= N+1 # This is the RHS of your inequality, just flipped around
h >= log2(N+1) # This step is only allowed because log(x) is nondecreasing
h must be an integer, so the smallest value of h satisfying h >= log2(N+1) is RoundUp(log2(N+1)).
I find this to be the most useful way to describe the lower bound, but it can also be used to derive the LHS of the inequality you're asking about. Starting from the second inequality in the previous block:
2^h >= N+1
Let h be the smallest integer satisfying this inequality, i.e. h = RoundUp(log2(N+1)); every larger integer satisfies it too. Since h is the minimum satisfying value, anything lower must not satisfy the inequality, so in particular h-1 does not satisfy it. If a >= inequality does not hold between two real (finite) numbers, then the corresponding < inequality must hold, so:
2^(h-1) < N+1
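The search described in this answer translates almost line for line into code. The following is a minimal Python sketch: maxnodes and minheight use the names from this answer, height again counts levels, and ceil_log2 is a hypothetical helper that computes ⌈log2 x⌉ with integer arithmetic so that floating-point rounding in math.log2 cannot skew the comparison.

```python
def maxnodes(h: int) -> int:
    """Nodes in a perfect binary tree with h levels: 2^h - 1."""
    return 2 ** h - 1


def minheight(n: int) -> int:
    """Smallest h with maxnodes(h) >= n, found by a linear search."""
    h = 0
    while maxnodes(h) < n:  # maxnodes is increasing, so the first hit is the minimum
        h += 1
    return h


def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()


for n in range(1000):
    h = minheight(n)
    assert h == ceil_log2(n + 1)            # h = RoundUp(log2(N+1))
    if h > 0:
        # Both halves of Sedgewick's inequality: 2^(h-1) < N+1 <= 2^h.
        assert 2 ** (h - 1) < n + 1 <= 2 ** h
```

The linear search is only there to mirror the argument; in practice ceil_log2(n + 1) gives the same answer directly.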

Related

Proof that a binary tree with n leaves has a height of at least log n

I've been able to create a proof that shows the maximum total number of nodes in a tree of height h is n = 2^(h+1) - 1, and intuitively I know that the height of a balanced binary tree is about log n (I can draw it out to see), but I'm having trouble constructing a formal proof to show that a tree with n leaves has height "at least" log n. Every proof I've come across or been able to put together deals with perfect binary trees, but I need something that works in every case. Any tips to lead me in the right direction?
Lemma: the number of leaves in a tree of height h is no more than 2^h.
Proof: the proof is by induction on h.
Base Case: for h = 0, the tree consists of only a single root node which is also a leaf; here, n = 1 = 2^0 = 2^h, as required.
Induction Hypothesis: assume that all trees of height k or less have no more than 2^k leaves.
Induction Step: we must show that trees of height k+1 have no more than 2^(k+1) leaves. Consider the left and right subtrees of the root (a missing subtree contributes no leaves). These are trees of height no more than k, one less than the height of the whole tree. Therefore, each has at most 2^k leaves, by the induction hypothesis. Since the root is not a leaf, the total number of leaves is just the sum of the numbers of leaves of the subtrees of the root, so n <= 2^k + 2^k = 2^(k+1), as required. This proves the claim.
Theorem: a binary tree with n leaves has height at least log(n).
We have already noted in the lemma that the tree consisting of just the root node has one leaf and height zero, so the claim is true in that case. For trees with more nodes, the proof is by contradiction.
For n >= 2, write n = 2^a + b where a >= 0 and 0 < b <= 2^a. Now, assume the height of the tree is less than a + 1, contrary to what we intend to prove. Then the height is at most a. By the lemma, the maximum number of leaves in a tree of height a is 2^a. But our tree has n = 2^a + b > 2^a leaves, since 0 < b; a contradiction. Therefore, the height is at least a + 1. Finally, n = 2^a + b <= 2^(a+1), so log(n) <= a + 1 <= height. This proves the claim.
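The lemma and theorem can also be sanity-checked by brute force on small trees. This minimal Python sketch encodes a tree as nested (left, right) tuples with None for a missing child (an encoding chosen just for this example), counts height in edges as the proof does, and checks every shape of height at most 3. It illustrates the claim; it does not replace the induction.

```python
from itertools import product
from math import log2

# A tree is a pair (left, right); each child is either None or another such pair.


def trees_up_to_height(h):
    """All non-empty binary tree shapes whose height (in edges) is at most h."""
    if h < 0:
        return []
    subtrees = [None] + trees_up_to_height(h - 1)
    return [(left, right) for left, right in product(subtrees, repeat=2)]


def height(t):
    """Height in edges; a lone root has height 0."""
    left, right = t
    return 1 + max((height(c) for c in (left, right) if c is not None), default=-1)


def leaves(t):
    """Number of nodes with no children."""
    left, right = t
    if left is None and right is None:
        return 1
    return sum(leaves(c) for c in (left, right) if c is not None)


for t in trees_up_to_height(3):
    h, n = height(t), leaves(t)
    assert n <= 2 ** h      # the lemma
    assert h >= log2(n)     # the theorem
```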

Number of nodes in the bottom level of a balanced binary tree

I am wondering about two questions that came up when studying about binary search trees. They are the following:
What is the maximum number of nodes in the bottom level of a balanced binary search tree with n nodes?
What is the minimum number of nodes in the bottom level of a balanced binary search tree with n nodes?
I cannot find any formulas in my textbook regarding this. Is there any way to answer these questions? Please let me know.
Using notation:
H = Balanced binary tree height (counted in levels, so a lone root has H = 1)
L = Total number of leaves in a full (perfect) binary tree of height H
N = Total number of nodes in a full (perfect) binary tree of height H
The relation is L = (N + 1) / 2, as demonstrated below. That is the maximum number of leaf nodes for a given tree height H. The minimum number of nodes on the bottom level at a given height is 1 (it cannot be zero, because then the tree height would be reduced by one).
Drawing trees with increasing heights, one can observe that:
H = 1, L = 1, N = 1
H = 2, L = 2, N = 3
H = 3, L = 4, N = 7
H = 4, L = 8, N = 15
...
The relation between tree height (H) and the total number of leaves (L)
and the total number of nodes (N) becomes apparent:
L = 2^(H-1)
N = (2^H) - 1
The correctness is easily proven using mathematical induction.
Examples above show that it is true for small H.
Simply put in the value of H (e.g. H=1) and compute L and N.
Assuming the formulas are true for some H, one can show they are also true for HH=H+1:
For L, the assumption is that L=2^(H-1) is true.
As every leaf gets two children when the height increases by one,
each leaf node is replaced by two new leaves, effectively
doubling the total number of leaves. Therefore, in the case of HH=H+1,
the total number of leaves (LL) is doubled:
LL = L * 2
= 2^(H-1) * 2
= 2^(H)
= 2^(HH-1)
For N, the assumption is that N=(2^H)-1 is true.
Increasing the height by one (HH=H+1) increases the total number
of nodes by the total number of added leaf nodes. Therefore,
NN = N + LL
= (2^H) - 1 + 2^(HH-1)
= 2^(HH-1) - 1 + 2^(HH-1)
= 2 * 2^(HH-1) - 1
= (2^HH) - 1
Applying the mathematical induction, the correctness is proven.
H can be expressed in terms of N:
N = (2^H) - 1 // +1 to both sides
N + 1 = 2^H // apply log2 monotone function to both sides
log2(N+1) = log2(2^H)
= H * log2(2)
= H
The direct relation between L and N (which is the answer to the question asked) is:
L = 2^(H - 1) // replace H = log2(N + 1)
= 2^(log2(N + 1) - 1)
= 2^(log2(N + 1) - log2(2))
= 2^(log2( (N + 1) / 2 ))
= (N + 1) / 2
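The induction step (double the leaves, then add the new level to the running total) can be run as a loop to spot-check these formulas. A minimal Python sketch, assuming H counts levels as in this answer; perfect_tree_counts is a hypothetical name.

```python
def perfect_tree_counts(height: int) -> tuple[int, int]:
    """(leaves, nodes) of a perfect binary tree with `height` levels (H = 1 is a lone root)."""
    leaves, nodes = 1, 1            # H = 1: the root is the only node and the only leaf
    for _ in range(height - 1):
        leaves *= 2                 # every leaf gets two children
        nodes += leaves             # the new level consists only of the new leaves
    return leaves, nodes


for H in range(1, 21):
    L, N = perfect_tree_counts(H)
    assert L == 2 ** (H - 1)
    assert N == 2 ** H - 1
    assert L == (N + 1) // 2        # N + 1 is even, so the division is exact
```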
For Big O analysis, constants are discarded, so the lookup time complexity of a balanced binary search tree (i.e. H with respect to the input size N) is O(log2(N)). Also, keeping in mind the formula for changing the logarithm base:
log2(N) = log10(N) / log10(2)
and discarding the constant factor 1/log10(2), where instead of 10 one can have an arbitrary logarithm base, the time complexity is simply O(log(N)) regardless of the chosen logarithm base constant.
Assuming that it's a full binary tree, the number of leaf nodes will always be equal to (n/2)+1 with integer division, which is the same as (n+1)/2 because a full binary tree always has an odd number of nodes.
For the minimum, the bottom level can hold as few as 1 node (still satisfying the condition that the tree is balanced).
I got the answers from my professor.
1) Maximum number of nodes at the last level: ⌈n/2⌉
If there is a balanced binary search tree with 7 nodes, then the answer would be ⌈7/2⌉ = 4 and for a tree with 15 nodes, the answer would be ⌈15/2⌉ = 8.
But what is troubling is the fact that this formula gives the right answer only when the last level of a balanced tree is completely filled from left to right.
For example, for a balanced binary search tree with 5 nodes, the above formula gives an answer of 3, which is not possible: the first two levels use 3 nodes, so only 2 nodes are left for the last level, even though that level has room for 4. So I am guessing he meant a full balanced binary search tree.
2) Minimum number of nodes at the last level: 1
The maximum number of nodes at level L in a binary tree is 2^L (if you assume that the root is level 0). This is easy to see because at each level every node of the previous level can spawn 2 children. The fact that it is a balanced/search tree is irrelevant to this bound. So, for a complete tree, find the biggest L such that 2^L <= n: the levels above level L hold 2^L - 1 nodes, and the remaining n - (2^L - 1) nodes sit on the bottom level. In math language, the count is n - 2^⌊log2 n⌋ + 1.
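For a complete tree (every level full except possibly the last), the levels above the bottom one hold 2^d - 1 nodes, where d = ⌊log2 n⌋ is the depth of the bottom level, so the bottom level holds n - 2^d + 1 nodes. Here is a minimal Python sketch under that complete-tree assumption, printed next to the professor's ⌈n/2⌉; bottom_level_size is a hypothetical name, and other balancing schemes can leave a different number of nodes at the bottom.

```python
def bottom_level_size(n: int) -> int:
    """Nodes on the last level of a complete binary tree with n >= 1 nodes."""
    d = n.bit_length() - 1          # depth of the bottom level: floor(log2(n))
    return n - (2 ** d - 1)         # everything not on the full levels above it


for n in (5, 7, 15):
    # Compare with the professor's ceil(n/2), written as (n + 1) // 2.
    print(n, bottom_level_size(n), (n + 1) // 2)
```

This prints 5 2 3, then 7 4 4, then 15 8 8: the two formulas agree when the last level is completely full and disagree for n = 5.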
The minimum number of nodes depends on the way you balance your tree. There are height-balanced trees, weight-balanced trees, and I assume other kinds of balanced trees. Even with height-balanced trees, you have to define what you mean by a balanced tree, because technically a tree of 2^N nodes that has a height of N + 2 is still a balanced tree.

Disjoint Set and Union Data Structure

A union-find structure is a data structure supporting the following operations:
● find(x), which returns the representative of node x, and
● union(x, y), which merges the sets containing x and y into a single set.
find(x) has a time complexity of O(n), so to improve this we are advised to use the concept of ranks, i.e. the larger connected component eats up the smaller one, which improves the time complexity to O(log n).
I could not understand how merging trees on the basis of their rank (depth) improves the time complexity, or how the O(log n) time complexity is achieved.
Please help me understand the concept of merging trees on the basis of their rank.
The key is to understand that the height of any tree representing a set of n elements is at most log(n) + 1; thus, following parent pointers from any given node to its root takes O(log(n)) steps.
We now have to prove the claim that each tree in the disjoint-set forest has height at most log(n) + 1, where n is the number of nodes in that tree. We will prove it by induction, showing that each union(x, y) preserves this property.
Base: when we begin, we have n different trees, all of size 1 and height 1. Since log(1) + 1 = 1, each tree indeed has height at most log(n) + 1.
Union(x,y): We unite two sets, x of size n1 and y of size n2. Without loss of generality, let n1<=n2.
By the induction hypothesis, the height h1 of the tree representing x is at most log(n1)+1.
So, the union operation is done by changing x's root to point to y's root. This means that the maximal height of any node that was in x is now at most
h1 + 1 <= log(n1) + 1 + 1 = log(n1) + log(2) + 1 = log(2*n1) + 1 = log(n1 + n1) + 1 <= log(n1 + n2) + 1
So, we have just found that for every node that was formerly in x, the maximal depth from the root is at most log(n1 + n2) + 1, and the size of the new tree (x and y united) is n1 + n2, so the desired property holds for every node that was formerly in x.
For nodes of y, the distance to the root is unchanged, while the size of the tree does not shrink, so the property holds there too.
In conclusion, for every node that was in x or y, the maximal depth from the new root is now at most log(n1 + n2) + 1, as required.
QED
Remark: all logarithms in this answer are base 2.
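The proof above covers attaching the smaller tree under the larger one (union by size), so this minimal Python sketch does exactly that and measures the resulting heights. Assumptions: DisjointSet and its height method are our own names, path compression is deliberately left out so the trees keep their shape, and height counts the nodes on a root path, matching the answer's base case log(1) + 1 = 1.

```python
from math import log2


class DisjointSet:
    """Union-find with union by size and no path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:          # follow parent pointers up to the root
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] > self.size[ry]:   # make rx the root of the smaller tree
            rx, ry = ry, rx
        self.parent[rx] = ry                # smaller tree hangs under the larger one
        self.size[ry] += self.size[rx]

    def height(self, x: int) -> int:
        """Nodes on the path from x to its root (a lone node has height 1)."""
        h = 1
        while self.parent[x] != x:
            x = self.parent[x]
            h += 1
        return h


# Merge equal-sized trees pairwise: 1+1, then 2+2, then 4+4, ...
n = 1024
ds = DisjointSet(n)
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        ds.union(i, i + step)
    step *= 2

tallest = max(ds.height(x) for x in range(n))
print(tallest, log2(n) + 1)
```

Pairwise merging of equal-sized trees is the worst case for this scheme: each round adds one level, so 1024 singletons end with height 11, exactly log2(1024) + 1.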

Number of comparisons to find an element in a BST with 635 elements?

I am a first-year Computer Science student at university, so please give me an understandable justification.
I have a binary search tree, balanced by height, which has 635 nodes. What is the number of comparisons that will occur in the worst-case scenario, and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2^(h+1) - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height at least 8.312882955. Of course, trees can't have fractional height, so we take the ceiling and find that the height of the tree is 9. Since the maximum number of comparisons made is h + 1, at most 10 comparisons are made during a lookup.
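To double-check the count of 10, the following minimal Python sketch counts comparisons directly. It assumes the keys are 0..634 and that the tree is the minimum-height tree whose root is the middle key of each range, which is exactly the tree a binary search over a sorted array walks; comparisons_to_find is a hypothetical name. This models the complete-tree assumption above, not every height-balanced (AVL) tree, some of which can be taller.

```python
def comparisons_to_find(n: int, key: int) -> int:
    """Comparisons to find `key` in a balanced BST over the keys 0..n-1.

    The tree is implicit: the root of each key range is its middle element.
    """
    lo, hi, count = 0, n - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1                  # one comparison against this node
        if key == mid:
            return count
        if key < mid:
            hi = mid - 1            # descend left
        else:
            lo = mid + 1            # descend right
    return count                    # walked off the tree


n = 635
worst = max(comparisons_to_find(n, k) for k in range(n))
print(worst)                        # 10, i.e. floor(lg 635) + 1
```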
Hope this helps!
Without any loss of generality, you can say the maximum number of comparisons will be the height of the BST (counted in levels). You don't have to visit every node in the tree, because each comparison takes you one level closer to the node you are looking for.
Let's say it is a balanced BST (every node above the last level has 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 4
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

Proof that the height of a balanced binary-search tree is log(n)

The binary-search algorithm takes log(n) time because the height of a balanced tree with n nodes is log(n).
How would you prove this?
I am not giving a mathematical proof here; instead, try to understand the problem using logarithms to base 2. Log2 is the usual meaning of log in computer science.
First, understand that this is the binary logarithm (log2 n), the logarithm to base 2.
For example, rounding down to the nearest integer:
the binary logarithm of 1 is 0
the binary logarithm of 2 is 1
the binary logarithm of 3 is 1
the binary logarithm of 4 is 2
the binary logarithm of 5, 6, 7 is 2
the binary logarithm of 8-15 is 3
the binary logarithm of 16-31 is 4 and so on.
For each height, the number of nodes in a fully balanced tree is:
Height   Nodes   Log calculation
0        1       floor(log2 1) = 0
1        3       floor(log2 3) = 1
2        7       floor(log2 7) = 2
3        15      floor(log2 15) = 3
Consider a balanced tree with between 8 and 15 nodes (any number, let's say 10). It is always going to have height 3, because log2 of any number from 8 to 15, rounded down, is 3.
In a balanced binary tree the size of the problem to be solved is halved with every iteration. Thus roughly log2n iterations are needed to obtain a problem of size 1.
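The halving argument can be counted directly. A minimal Python sketch (halvings is a hypothetical name) that repeatedly halves a problem size, rounding down, until it reaches 1:

```python
def halvings(n: int) -> int:
    """How many times a size n >= 1 can be halved (rounding down) before it reaches 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for n in (1, 3, 7, 8, 10, 15, 16):
    print(n, halvings(n))
```

The counts are 0, 1, 2, 3, 3, 3, 4 for the sizes listed, matching the rounded-down binary logarithms above; in particular, a 10-node problem needs 3 halvings. In Python the same value is n.bit_length() - 1.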
I hope this helps.
Let's assume at first that the tree is complete - it has 2^N leaf nodes. We try to prove that you need N recursive steps for a binary search.
With each recursion step you cut the number of candidate leaf nodes exactly by half (because our tree is complete). This means that after N halving operations there is exactly one candidate node left.
As each recursion step in our binary search algorithm corresponds to exactly one height level, the height is exactly N.
Generalization to all balanced binary trees: if the tree has fewer leaves than 2^N, we certainly don't need more halvings. We might need fewer or the same number, but never more.
Assuming that we have a complete tree to work with, we can say that at depth k, there are 2^k nodes. You can prove this using simple induction, based on the intuition that adding an extra level to the tree will increase the number of nodes in the entire tree by the number of nodes that were in the previous level times two.
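The height k of the tree is log(N), where N is the number of nodes at depth k (in a complete tree this is just over half of all nodes, so log2 of the total node count differs from k by less than 1). This can be stated as
log2(N) = k,
and it is equivalent to
N = 2^k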
To understand this, here's an example:
16 = 2^4 => log2(16) = 4
The height of the tree and the number of nodes are related exponentially. Taking the log of the number of nodes just allows you to work backwards to find the height.
Just look up the rigorous proof in Knuth, Volume 3 - Sorting and Searching ... He does it far more rigorously than anyone else I can think of.
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
You can find it in any good Computer Science library and on the bookshelves of many (very) old geeks.
Why is the height of a balanced binary tree equal to ceil(log2N) for N nodes?
w = width of base (maximum number of leaves)
h = height of tree (maximum number of edges from root to leaf)
Divide w by 2 (h times) to get to 1, which counts the single root node at top.
N = w + w/2 + ... + 1
N = 2^h + ... + 2^1 + 2^0
= (1 - 2^(h+1)) / (1 - 2) = 2^(h+1) - 1
log2(N+1) = h+1
Check: if N=1, h=0. If h=1, N=3.
This formula assumes the bottom level is full. N will not always be that large, but a tree with fewer nodes on its bottom level still has the same height h, so we must take the log's ceiling: h = ceil(log2(N+1)) - 1.
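The geometric sum and the ceiling step can be checked together. A minimal Python sketch, with height counted in edges as in this answer and a hypothetical ceil_log2 helper that computes ⌈log2 x⌉ exactly with integers:

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1, avoiding floating-point rounding."""
    return (x - 1).bit_length()


for h in range(12):
    full = sum(2 ** i for i in range(h + 1))     # N = 2^0 + 2^1 + ... + 2^h
    assert full == 2 ** (h + 1) - 1              # the geometric-series closed form
    # N ranges from 2^h (one node on the bottom level) to 2^(h+1) - 1 (bottom level full);
    # every such N has height h, and the ceiling recovers it.
    for n in range(2 ** h, full + 1):
        assert ceil_log2(n + 1) - 1 == h
```

For comparison, ceil(log2 N) matches this height only when N is a power of two; for N = 3 it gives 2, while the tree's height is 1.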

Resources