Get a binary tree of height n + 1 from one of height n

"To get a binary tree of height n + 1 from one of height n, we can create, at most, two leaves in place of each previous one."
Introduction to Formal Languages and Automata - Peter Linz
Could someone explain to me (visually) how we get here a binary tree of height n + 1 by just doubling the leaves?

The words to stress in the quote are "in place of".
Here is an example of a binary tree with height 2:
    1
   / \
  2   3
 /
4
It has two leaves: one with value 3 and one with value 4.
If we replace each of those leaves with a little subtree that has a parent node with 2 leaves, then we get:
      1
     / \
    2   3
   /   / \
  4   c   d
 / \
a   b
Here we see that the resulting tree has height 3.
In general, when a tree has a height of 𝑛, then there is at least one leaf at level 𝑛 in that tree (when starting the level numbering at 0). When that leaf is replaced with a little subtree with a parent node that has 2 leaves, it is clear that those new leaves reside on level 𝑛 + 1, giving the tree a height of 𝑛 + 1.
The words "at most, two leaves" indicate that the above would also work if we replaced the existing leaves with little subtrees that just have a parent with one leaf.
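As a rough illustration of the "in place of" step, here is a minimal Python sketch. The tuple encoding and the names `LEAF`, `height`, `count_leaves` and `grow` are assumptions made for this example only; they do not come from Linz's book.

```python
# A tree is either None (no child) or a tuple (left, right).
# A leaf is a node with no children: (None, None).
LEAF = (None, None)

def height(tree):
    """Edges on the longest root-to-leaf path; a lone leaf has height 0."""
    if tree is None:
        return -1
    left, right = tree
    return 1 + max(height(left), height(right))

def count_leaves(tree):
    """Number of leaves in the tree."""
    if tree is None:
        return 0
    if tree == LEAF:
        return 1
    left, right = tree
    return count_leaves(left) + count_leaves(right)

def grow(tree):
    """Put a parent node with two new leaves in place of every leaf."""
    if tree is None:
        return None
    if tree == LEAF:
        return (LEAF, LEAF)
    left, right = tree
    return (grow(left), grow(right))

# The height-2 example above: node 1 has children 2 and 3, node 2 has left child 4.
example = ((LEAF, None), LEAF)
grown = grow(example)
print(height(example), count_leaves(example))  # height and leaf count before
print(height(grown), count_leaves(grown))      # height and leaf count after
```

Because at least one leaf sits on the deepest level, replacing every leaf this way raises the height by exactly one and doubles the leaf count.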

Related

Average number of descendant nodes in a complete and full binary tree

Given a complete and full binary tree with n nodes, what is the average number of descendants a node has? For example, the root node has n - 1 descendants and each leaf node has 0 descendants, but considering all nodes, what is the average?
Let's change your question a little for a moment: let "descendants" count the node itself as well.
A leaf has 1 descendant, its parent has 3 descendants, then 7, then 15, etc. These numbers are of the form 2^k - 1, where k is the number of levels from the bottom of the tree (with k=1 for the leaves).
For K levels, you have obviously 2^K-1 nodes in your tree; since you called this value n, you have n=2^K-1 and K=log2(n+1).
Still calling a descendant as I previously did (see later for your exact question), you have 1 descendant for 2^(K-1) nodes (the leaves), 3 descendants for 2^(K-2) nodes, ... until n descendants for 2^(K-K)=1 node.
The average number of descendants will be:
Sum(k=1,K, 2^(K-k) * (2^k-1) ) / n
According to your definition of "descendants" (excluding the node itself) you have to subtract 1:
Sum(k=1,K, 2^(K-k) * (2^k-1) ) / n - 1
And by replacing K, you get:
Sum(k=1,log2(n+1), (2^k-1)(n+1) / 2^k ) / n - 1
With the program Pari-GP, I type:
f(n)=sum(k=1,round(log(n+1)/log(2)), (2^k-1)*(n+1) / 2^k ) / n - 1
and I get:
f(1)=0
f(3)=2/3
f(7)=10/7
f(15)=34/15
which looks like A036799(n)/n. If the sequence is actually the same (I didn't check carefully), you could write the simpler expression (without the sum):
f(n)= ((log(n+1)/log(2)-2)*(n+1)+2)/n
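The closed form can be checked against the direct sum with a small Python sketch. The function names below are made up for this example, and the tree is assumed to be perfect, with K levels and n = 2^K - 1 nodes.

```python
from fractions import Fraction

def average_descendants(levels):
    """Average number of descendants (excluding the node itself) in a
    perfect binary tree with `levels` levels, computed by direct summation."""
    n = 2**levels - 1
    total = 0
    for k in range(1, levels + 1):
        nodes_at_k = 2**(levels - k)   # nodes that sit k levels above the bottom
        subtree_size = 2**k - 1        # size of each such node's subtree, itself included
        total += nodes_at_k * (subtree_size - 1)
    return Fraction(total, n)

def closed_form(levels):
    """((log2(n+1) - 2) * (n+1) + 2) / n with log2(n+1) == levels."""
    n = 2**levels - 1
    return Fraction((levels - 2) * (n + 1) + 2, n)

for levels in range(1, 5):
    print(2**levels - 1, average_descendants(levels), closed_form(levels))
```

Exact fractions are used so the comparison does not depend on floating-point rounding of the logarithm.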

Number of comparisons to find an element in a BST with 635 elements?

I am a freshman in Computer Science at university, so please give me an understandable justification.
I have a height-balanced binary tree with 635 nodes. What is the number of comparisons that will occur in the worst-case scenario, and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2^(h+1) - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height 8.312882955. Of course, trees can't have fractional height, so we can take the ceiling to find that the height of the tree would be 9. Since the maximum number of comparisons made is h + 1, there are at most 10 comparisons made when doing a lookup.
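The same arithmetic can be done with integers only. This is a small sketch under the same assumption as above, namely that the tree is complete; the function names are invented for the example.

```python
def complete_tree_height(n):
    """Height (in edges) of a complete binary tree with n >= 1 nodes.

    This is the smallest h with 2**(h+1) - 1 >= n, which equals floor(log2(n)).
    """
    return n.bit_length() - 1

def worst_case_comparisons(n):
    """One comparison per level along the longest root-to-leaf path."""
    return complete_tree_height(n) + 1

print(complete_tree_height(635), worst_case_comparisons(635))
```

Using `int.bit_length()` avoids the fractional logarithm and the ceiling step altogether.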
Hope this helps!
Without any loss of generality you can say the maximum number of comparisons will be the height of the BST. You don't have to visit every node in the tree, because each comparison takes you closer to the target node.
Let's say it is a balanced BST (all nodes except last have 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 4
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.

What is the minimum sized AVL tree where a deletion causes 2 rotations?

It is well known that deletion from an AVL tree may cause several nodes to eventually be unbalanced. My question is, what is the minimum sized AVL tree such that 2 rotations are required (I'm assuming a left-right or right-left rotation is 1 rotation)? I currently have an AVL tree with 12 nodes where deletion would cause 2 rotations. My AVL tree is inserting in this order:
8, 5, 9, 3, 6, 11, 2, 4, 7, 10, 12, 1.
If you delete the 10, 9 becomes unbalanced and a rotation occurs. In doing so, 8 becomes unbalanced and another rotation occurs. Is there a smaller tree where 2 rotations are necessary after a deletion?
After reading jpalecek's comment, my real question is: Given some constant k, what is the minimum sized AVL tree that has k rotations after 1 deletion?
A tree of four nodes requires a single rotation in the worst case. The worst-case number of rotations increases by one with each term in the list: 4, 12, 33, 88, 232, 609, 1596, 4180, 10945, 28656, ...
This is Sloane's A027941 and is a Fibonacci-type sequence that can be generated with N(i)=1+N(i-1)+N(i-2) for i>=2, N(1)=2, N(0)=1.
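A quick Python sketch of the recurrence helps relate it to the list. Run as written, the recurrence yields 1, 2, 4, 7, 12, 20, 33, ..., and the listed values 4, 12, 33, ... appear as every second term. The helper names and the Fibonacci cross-check (terms equal to F(2k+3) - 1 with F(1) = F(2) = 1) are additions for illustration.

```python
def min_avl_nodes(limit):
    """Terms N(0)..N(limit) of N(i) = 1 + N(i-1) + N(i-2), N(0)=1, N(1)=2."""
    terms = [1, 2]
    while len(terms) <= limit:
        terms.append(1 + terms[-1] + terms[-2])
    return terms[:limit + 1]

def fib(m):
    """m-th Fibonacci number with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

terms = min_avl_nodes(12)
every_second = terms[2::2]          # N(2), N(4), N(6), ...
print(terms)
print(every_second)
print([fib(2 * k + 3) - 1 for k in range(1, 7)])
```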
To see why this is so, first note that rotating an imbalanced AVL tree reduces its height by one because its shorter leg is lengthened at the expense of its longer leg.
When a node is removed from an AVL tree, the AVL algorithm checks all of the removed node's ancestors for potential rebalancing. Therefore, to answer your question we need to identify trees with the minimum number of nodes for a given height.
In such a tree every node is either a leaf or has a balance factor of +1 or -1: if a node had a balance factor of zero this would mean that a node could be removed without triggering a rebalancing. And we know rebalancing makes a tree shorter.
Below, I show a set of worst-case trees. You can see that following the first two trees in the sequence, each tree is constructed by joining the previous two trees. You can also see that every node in each tree is either a leaf or has a non-zero balance factor. Therefore, each tree has the maximum height for its number of nodes.
For each tree, a removal in the left subtree will, in the worst case, cause rotations which ultimately reduce the height of that subtree by one. This balances the tree as a whole. On the other hand, removing a node from the right subtree may ultimately imbalance the tree resulting in a rotation of the root. Therefore, the right subtrees are of prime interest.
You can verify that Tree (c) and Tree (d) have one rotation upon removal, in the worst case.
Tree (c) appears as a right subtree in Tree (e) and Tree (d) as a right subtree in Tree (f). When a rotation is triggered in Tree (c) or (d) this shortens the trees resulting in a root rotation in Trees (e) and (f). Clearly, the sequence continues.
If you count the number of nodes in the trees this matches my original statement and completes the proof.
(In the trees below removing the highlighted node will result in a new maximum number of rotations.)
I am not good at proofs, and I'm sure the below is full of holes, but maybe it will spark something positive.
To effect k rotations on a minimized AVL tree following the deletion of a node, the following conditions must be met:
The target node must exist in a 4-node sub-tree.
The target node must either be on the short branch, or must be the root of the sub-tree and be replaced by the leaf of the short branch.
Each node in the ancestry of the root of the target sub-tree must be slightly out of balance (balance factor of +/-1). That is - when a balance factor of 0 is encountered, the rotation chain will cease.
The height and number of nodes of the minimized tree is calculated with the following equations.
Let H(k) = the minimum height of the tree affected by k rotations.
H(k) = 2k + 1, k > 0
Let N(h) = the number of nodes in a (min-node) AVL tree of height h.
N(0) = 0
N(1) = 1
N(h) = N(h-1) + N(h-2) + 1, h > 1
Let F(k) = the minimum number of nodes in the tree affected by k rotations.
F(k) = N(H(k))
(e.g:)
k = 1, H(k) = 3, N(3) = 4
k = 2, H(k) = 5, N(5) = 12
Proof (such as it is)
Minimum Height
A deletion can only cause a rotation for trees with 4 or more nodes.
A tree of 1 node must have a balance factor of 0.
A tree of 2 nodes must have a balance factor of +/-1, and deletion leads to a balanced tree of 1 node.
A tree of 3 nodes must have a balance factor of 0. Removal of a node results in a balance factor of +/-1 and no rotation occurs.
Therefore, deletion from a tree with fewer than 4 nodes can not result in a rotation.
The smallest sub-tree for which 1 rotation occurs on delete is 4 nodes, which has height of 3. Removal of the node in the short side will result in rotation. Likewise, removal of the root node, using the node on the short side as replacement will cause a rotation. It doesn't matter how the tree is configured:
  B        B       Removal of A or replacement of B with A
 / \      / \      results in rotation. No rotation occurs
A   C    A   D     on removal of C or D, or on replacement
     \      /      of B with C.
      D    C

    C      C       Removal of D or replacement of C with D
   / \    / \      results in rotation. No rotation occurs
  B   D  A   D     on removal of A or B, or on replacement
 /        \        of C with B.
A          B
Deletion from a 4 node tree results in a balanced tree of height 2.
  .
 / \
.   .
To effect a second rotation, the target tree must have a sibling of height 4, so that the balance factor of the root is +/-1 (and therefore has a height of 5). It doesn't matter if the affected tree is on the right or left of the parent, nor is the layout of the sibling tree important (that is, the H3 child of H4 can be on the left or right, and can be any of the 4 orientations above while the H2 child can be either of the 2 possible orientations - this needs proving).
    _._                _._
   /   \              /   \
(H4)    .            .    (H4)
       / \          / \
      .   .        .   .
           \            \
            .            .
It is clear that the third rotation requires that the grandparent of the affected tree be likewise imbalanced by +/-1, and the fourth requires the great-grandparent be imbalanced by +/-1, and so on.
By definition, the height of a sub-tree is the maximum height of each branch plus one for the root. One sibling must be 1 taller than the other to achieve the +/-1 imbalance in the root.
H(1) = 3 (as observed above)
H(k) = 1 + max(H(k - 1), H(k - 1) + 1) = 1 + H(k - 1) + 1 = H(k - 1) + 2
... Inductive proof leading to H(k) = 2k + 1 eludes me.
Minimum Nodes
By definition, the number of nodes in a sub-tree is the number of nodes in the left branch plus the number of nodes in the right branch plus 1 for the root.
Also by definition, a tree of height 0 must have 0 nodes, and a tree of height 1 must have no branches and thus 1 node.
It was shown above that one branch must be one shorter than the other.
Let N(h) = minimum number of nodes required to create a tree of height h:
N(0) = 0
N(1) = 1
// the number of nodes in the two subtrees plus the root
N(h) = N(h-1) + N(h-2) + 1
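Unrolling H(k) = H(k - 1) + 2 from H(1) = 3 gives H(k) = 3 + 2(k - 1) = 2k + 1. The equations can then be evaluated with a short Python sketch; the function names simply mirror the symbols used above, and the memoisation is an implementation convenience.

```python
from functools import lru_cache

def H(k):
    """Minimum height of a tree affected by k rotations: H(k) = 2k + 1."""
    return 2 * k + 1

@lru_cache(maxsize=None)
def N(h):
    """Minimum number of nodes in an AVL tree of height h (N(0)=0, N(1)=1)."""
    if h < 2:
        return h
    return N(h - 1) + N(h - 2) + 1

def F(k):
    """Minimum number of nodes in a tree affected by k rotations."""
    return N(H(k))

for k in range(1, 5):
    print(k, H(k), F(k))
```

For k = 2 this gives 12 nodes, which agrees with the 12-node tree in the question.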
Corollary
The minimum number of nodes is not necessarily the maximum in large trees. To wit:
Delete A from the following tree and observe that the height doesn't change following rotation. Therefore, the balance factor in the parent would not change and no additional rotation would occur.
  B          B            D
 / \          \          / \
A   D    =>    D    =>  B   E
   / \        / \        \
  C   E      C   E        C
However, in the k = 2 case, it does not matter if H(4) is minimized here - the second rotation will still occur.
    _._                _._
   /   \              /   \
(H4)    .            .    (H4)
       / \          / \
      .   .        .   .
           \            \
            .            .
Questions
What is the position of the target sub-tree? Clearly for k = 1, it is the root, and for k = 2, it is the left if the root's balance factor is -1 otherwise the right. Is there a formula for determining position for k >= 3?
What is the maximum nodes a tree can contain to effect k rotations? Is it possible to have an intermediate node in the ancestry that is not rotated, though its parent is?

Find the number of nodes of n-element heap of given height

We came across an exercise in Thomas H. Cormen's book that asks us to show a bound on the number of nodes of a given height in an n-element heap.
I am confused by this exercise: how can there be at most that many nodes?
For instance, consider this problem:
In the above problem at height 2 there are 2 nodes. But if we calculate by formula:
Greatest Integer of (10/2^2+1) = 4
it does not satisfy Thomas H. Cormen questions.
Please correct me if I am wrong here.
Thanks in Advance
In T. H. Cormen's book I observed that he numbers heights from 1, not from 0, so the formula is correct; I was interpreting it wrongly. So a leaf has height 1 and the root has height 4 in the above question.
Reading all the answers, I realized that the confusion comes from the precise definition of height. In page 153 of CLRS book, the height is defined as follows:
Viewing a heap as a tree, we define the height of a node in a heap to be the number of edges on the longest simple downward path from the node to a leaf...
Now let's look at the original heap provided by Nishant. The nodes 8, 9, 10, 6, and 7 are at height 0 (i.e., leaves). The nodes 4, 5 and 3 are at height 1. For example, there is one edge between node 5 and its leaf, node 10. Also there is one edge between node 3 and its leaf node 6. Node 6 looks like it is at height 1 but it is at height 0 and hence a leaf. The node 2 is the only node at height 2. You may wonder node 1 (the root) is two edges away from node 6 and 7 (leaves), and say node 1 is also at height 2. But if we look back at the definition, the bold-face word "longest" suggests that the longest simple downward path from the root to a leaf has 3 edges (passing node 2). Finally, the node 1 is at height 3.
In summary, there are 5, 3, 1, 1 nodes at height 0, 1, 2, 3, respectively.
Let's apply the formula to the observation we made in the above paragraph. I would like to point out that the formula given by Nishant is not correct.
It should be
ceiling(n/2^(h+1)), not ceiling(n/(2^h+1)). Sorry about the terrible formatting. I am not able to post an image yet.
Anyways, using the correct formula,
h = 0, ceiling(10/2) = 5 (nodes 8, 9, 10, 6, and 7)
h = 1, ceiling(10/4) = 3 (nodes 4, 5 and 3)
h = 2, ceiling(10/8) = 2 (node 2, but this is okay because the formula is predicting that there are at most 2 nodes at height 2.)
h = 3, ceiling(10/16) = 1 (node 1)
With the correct definition of height, the formula works.
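The counts above can be reproduced with a short Python sketch. It assumes the usual array layout of a heap with 1-based indices (children of node i are 2i and 2i+1), which matches the node numbering used in this answer; the function names are made up for the example.

```python
from collections import Counter

def node_heights(n):
    """Height of every node in an n-element heap, where height is the number
    of edges on the longest downward path from the node to a leaf."""
    height = [0] * (n + 1)              # index 0 is unused
    for i in range(n, 0, -1):           # children are processed before parents
        left, right = 2 * i, 2 * i + 1
        if left <= n:
            right_height = height[right] if right <= n else -1
            height[i] = 1 + max(height[left], right_height)
    return height[1:]

def bound(n, h):
    """ceiling(n / 2^(h+1)) using integer arithmetic."""
    return -(-n // 2 ** (h + 1))

counts = Counter(node_heights(10))
for h in sorted(counts):
    print(h, counts[h], bound(10, h))
```

Each printed row shows a height, the actual number of nodes at that height, and the upper bound given by the formula.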
It looks like your formula says there are at most [n/2^h+1] nodes of height h. In your example there are two nodes of height 2, which is less than your computed possible maximum of 4(ish).
While calculating the tight bound for Build-Max-Heap author has used this property in the equation.
In this case we call the helper Max-Heapify which takes O(h) where h is the height of the sub-tree rooted at the current node (not the height of node itself with respect to the full tree).
Therefore if we consider the sub tree rooted at leaf node, it will have height 0 and number of nodes in the tree at that level would be at most n / 2^(0+1) = n/2 (i.e h=0 for the sub tree formed from node at leaves).
Similarly for sub-tree rooted at actual root the height of the tree would be log(n) and in that case the number of nodes at that level would be 1, i.e. the ceiling of n / 2^(log(n)+1) = n/(2n).
Formula for
no. of nodes = n/(2^(h+1))
so when h is 2, and n = 10
no. of nodes = 10/(2^(2+1)) = 10/(2^3) = 10/8 = 1.25
But
ceil of 10/8 = 2
Hence there are 2 nodes which you can see from the figure.
Though it is mentioned in Cormen that height of a node is the greatest distance traveled from node to leaf(the number of edges), if you take height to be the distance of a node from the leaf, i.e. at leaf the height is zero and at root the height is log(n). The formula stands correct.
As for the leaves you have h=0; hence by the formula n/(2^(h+1))
h=0; max number of leaves in the heap will be n/2.
What about height 1? Cormen's formula gives ceil(10/(2^(1+1))) = 3, while there are 4 nodes at height 1. This is a contradiction.
It is not true that Thomas H. Cormen is counting the height of the tree starting from one, height is h = 0, 1, ..., log n and it increases as you go upwards:
and in the following formula, he added 1 plus the height:
All the confusion is coming from the fact that this will work nicely with Perfect Binary Trees, not with the one you are showing in your question, this is why he is saying AT MOST
when you consider Big-O it wouldn't really matter
This formula is wrong, it gives wrong answers in many cases like in this question for h=1 (ie second last level) it gives maximum number of nodes is 3 but there are 4 nodes. Also let us consider a tree with 4 nodes :
    a
   / \
  b   c
 /
d
node d has height 0, let us consider for height =1 using the formula n/2^(h+1) we get
4/2^(1+1) = 1
which means this level can have at most 1 node which is false !
so this formula is not right.
The formula is quite correct. Nothing is wrong with the formula!!
Lets take the tree(although its not heap yet its complete) in the question posed by Nishant on the top.
For h=0 means all leaves so ceil(10/2^(0+1))=5 so there are 5 leaves
For h=1 means all nodes which have one arc to reach the leaves, so ceil(10/2^(1+1))=3 there are 3 such nodes in your tree.
For h=2 means all nodes which have two consecutive arcs to reach leaves, so ceil(10/2^(2+1))=2, so there can be at most two such nodes; you have only one such node (left successor of the root)
For h=3 means all nodes which have three arcs to leaves, so ceil(10/2^(3+1))=1 which is the root.
Moral of the story is that you are confused between height and level. Level starts from up to down. Which means you have 4 nodes on level 2. i.e you can reach 4 nodes if you start at root and moves two arcs down.
Whereas height is completely different. Like in above case at height 0 there are 5 nodes (3 on level 3, and 2 on level 2). Hence height h of a node n means how many arcs you can travel to reach a leaf.
regards,
Hope it clarifies the point.
Safdar from Pakistan

How is Wikipedia's example of an unbalanced AVL tree really unbalanced?

The image above is from "Wikipedia's entry on AVL trees" which Wikipedia indicates is unbalanced.
How is this tree not balanced already?
Here's a quote from the article:
The balance factor of a node is the height of its right subtree minus the height of its left subtree and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.
Both the left and right subtrees have a height of 4. The right subtree of the left tree has a height of 3 which is still only 1 less than 4. Can someone explain what I'm missing?
Node 76 is unbalanced, for example, because its right subtree is of height 0 and the left is of height 3.
To be balanced, every node in the tree must either:
have no children (be a "leaf" node),
have two children,
or, if it has only one child, that child must be a leaf.
In the chart you posted, 9, 54 & 76 violate the last rule.
Properly balanced, the tree would look like:
Root: 23
(23) -> 14 & 67
(14) -> 12 & 17
(12) -> 9
(17) -> 19
(67) -> 50 & 72
(50) -> 54
(72) -> 76
UPDATE: (after almost 9 years, I created a cool online chart for the graph at draw.io)
Intuitively, it's because it's not as small as possible. e.g., 12 should be the parent of 9 and 14. As it is, 9 has no left sub-tree so it's out of balance. A tree is a hierarchical data structure so a rule like "balanced" often applies to every node and not just the root node.
You're correct the root node is balanced, but not all the nodes of the tree are.
Another way to look at this is that the height h of any node is given by:
h = 1 + max( left.height, right.height )
and a node is unbalanced whenever:
abs( left.height - right.height ) > 1
Looking at the tree above:
- Node 12 is a leaf node so its height = 1+max(0,0) = 1
- Node 14 has one child (12, on the left), so its height is = 1+max(1,0) = 2
- Node 9 has one child (14, on the right), so its height is = 1+max(0,2) = 3
To determine if node 9 is balanced or not we look at the height of its children:
- 9's left child is NULL, so 9.left.height = 0
- 9's right child (14) has height 2, so 9.right.height = 2
Now solve to show that node 9 is unbalanced:
9.unbalanced = abs( 9.left.height - 9.right.height ) > 1
9.unbalanced = abs( 0 - 2 ) > 1
9.unbalanced = abs( -2 ) > 1
9.unbalanced = 2 > 1
9.unbalanced = true
Similar calculations can be made for every other node.
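The calculation above can be written as a small Python sketch. The `Node` class and function names are assumptions for illustration, and only the fragment described in this answer (9, with right child 14, whose left child is 12) is built, not the whole Wikipedia tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Convention used above: a missing child has height 0, a leaf has height 1."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_unbalanced(node):
    """True when the heights of the two subtrees differ by more than 1."""
    return abs(height(node.left) - height(node.right)) > 1

def unbalanced_keys(node):
    """Keys of all unbalanced nodes, collected in in-order."""
    if node is None:
        return []
    here = [node.key] if is_unbalanced(node) else []
    return unbalanced_keys(node.left) + here + unbalanced_keys(node.right)

n12 = Node(12)
n14 = Node(14, left=n12)
n9 = Node(9, right=n14)
print(height(n12), height(n14), height(n9))
print(unbalanced_keys(n9))
```

Checking every node, rather than only the root, is what distinguishes the AVL condition from simply comparing the two halves of the tree.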
A height balanced tree is in many ways best! For every subtree, it has children of heights that differ by 1 or 0, where no children is a height of zero, so a leaf has height 1. It is the simplest balanced tree, with the lowest overhead to balance: a maximum of 2n rotations for an insert or delete in a tree area of height n, and often much less. A rotation is writing 3 pointers, and so is very cheap. The worst case of the height balanced tree, even though of about 44% greater maximum height, is about one comparison less efficient than a perfectly balanced full binary tree of 2^n-1 values. A perfectly balanced full binary tree is far more expensive to achieve, tends to need, on average, n-1 comparisons for a find and exactly n comparisons always for a not-found. For the tree worst case insertion order, ordered data, when 2^n-1 items are inserted, the height balanced tree that results is a perfectly balanced full binary tree!
(Rotation is a great way to balance, but comes with a catch: if the heavy grandchild is on the inside of the heavy child, a single rotate just moves it to the inside of the opposite side, with no improvement. So, if it is 1 unit higher, even though nominally balanced, you rotate that child to lighten it first. Hence a max of 2n rotations for an n level insert or delete, worst case and unlikely.)

Resources