The image above is from "Wikipedia's entry on AVL trees" which Wikipedia indicates is unbalanced.
How is this tree not balanced already?
Here's a quote from the article:
The balance factor of a node is the height of its right subtree minus the height of its left subtree and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.
Both the left and right subtrees have a height of 4. The right subtree of the left tree has a height of 3 which is still only 1 less than 4. Can someone explain what I'm missing?
Node 76 is unbalanced, for example, because its right subtree is of height 0 and the left is of height 3.
To be balanced, every node in the tree must either:
have no children (be a "leaf" node),
have two children,
or, if it has only one child, that child must be a leaf.
In the chart you posted, 9, 54 & 76 violate the last rule.
Properly balanced, the tree would look like:
Root: 23
(23) -> 14 & 67
(14) -> 12 & 17
(12) -> 9
(17) -> 19
(67) -> 50 & 72
(50) -> 54
(72) -> 76
UPDATE: (after almost 9 years, I created a cool online chart for the graph at draw.io)
Intuitively, it's because it's not as small as possible. For example, 12 should be the parent of 9 and 14. As it is, 9 has no left sub-tree so it's out of balance. A tree is a hierarchical data structure, so a rule like "balanced" often applies to every node and not just the root node.
You're correct the root node is balanced, but not all the nodes of the tree are.
Another way to look at this is that the height h of any node is given by:
h = 1 + max( left.height, right.height )
and a node is unbalanced whenever:
abs( left.height - right.height ) > 1
Looking at the tree above:
- Node 12 is a leaf node so its height = 1+max(0,0) = 1
- Node 14 has one child (12, on the left), so its height is = 1+max(1,0) = 2
- Node 9 has one child (14, on the right), so its height is = 1+max(0,2) = 3
To determine if node 9 is balanced or not we look at the height of its children:
- 9's left child is NULL, so 9.left.height = 0
- 9's right child (14) has height 2, so 9.right.height = 2
Now solve to show that node 9 is unbalanced:
9.unbalanced = abs( 9.left.height - 9.right.height ) > 1
9.unbalanced = abs( 0 - 2 ) > 1
9.unbalanced = abs( -2 ) > 1
9.unbalanced = 2 > 1
9.unbalanced = true
Similar calculations can be made for every other node.
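The calculation above can be checked mechanically. Below is a minimal Python sketch, assuming the same convention as this answer (an empty subtree has height 0, a leaf has height 1). The `Node` class and the helper names are illustrative, and only the three nodes discussed above (9, 14, 12) are built, not the whole Wikipedia tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node: Optional[Node]) -> int:
    """Height with the convention used above: empty subtree = 0, leaf = 1."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_unbalanced(node: Node) -> bool:
    """True when the two subtree heights differ by more than 1."""
    return abs(height(node.left) - height(node.right)) > 1


# Only the subtree discussed above: 9 -> (right) 14 -> (left) 12.
n12 = Node(12)
n14 = Node(14, left=n12)
n9 = Node(9, right=n14)

for n in (n12, n14, n9):
    print(n.key, "height", height(n), "unbalanced", is_unbalanced(n))
```

Recomputing the height on every call is fine for a demonstration; a real AVL implementation would normally cache the height or balance factor in each node.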
A height-balanced tree is in many ways best! For every subtree, its children have heights that differ by 1 or 0, where an empty child has height zero, so a leaf has height 1. It is the simplest balanced tree, with the lowest overhead to balance: a maximum of 2n rotations for an insert or delete in a tree area of height n, and often much less. A rotation writes 3 pointers, so it is very cheap. The worst case of the height-balanced tree, even though it has about 42% greater maximum height, is about one comparison less efficient than a perfectly balanced full binary tree of 2^n-1 values. A perfectly balanced full binary tree is far more expensive to maintain and tends to need, on average, n-1 comparisons for a find and always exactly n comparisons for a not-found. For the worst-case insertion order, ordered data, when 2^n-1 items are inserted, the resulting height-balanced tree is a perfectly balanced full binary tree!
(Rotation is a great way to balance, but comes with a catch: if the heavy grandchild is on the inside of the heavy child, a single rotate just moves it to the inside of the opposite side, with no improvement. So, if it is 1 unit higher, even though nominally balanced, you rotate that child to lighten it first. Hence a max of 2n rotations for an n level insert or delete, worst case and unlikely.)
Related
I am doing one question on dynamic programming where for a given height h, I have to calculate the maximum number of balanced binary trees. I have little confusion with base cases.
If the height is 0 then the number of balanced binary trees is 1 as for h=0 there is a root node only. But for h=1, I am not able to calculate the maximum number of balanced binary trees. Can somebody help me, please?
The solution with a good explanation and figures can be found in:
tutorialspoint with C plus plus code
geeksforgeeks with different implementation.
For the special cases 0 and 1 :
h=0 => nb = 1, at the root, the height is 0, and we have only one node, hence 1 tree.
h=1 => nb = 3, this means we have these possibilities:
Root node + only left child
Root node + only right child
Root node + left and right child
Hence, at h=1, we have 3 possible binary trees.
h=2 => nb = 15...etc.
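The base cases above fit the usual recurrence for this problem, sketched below in Python. It assumes the convention used in this answer (a single node has height 0) and treats the empty tree as having height -1 with exactly one shape; `count_balanced` is a hypothetical helper name. The idea is that a balanced tree of height h has either two subtrees of height h-1, or one of height h-1 and one of height h-2 (on either side).

```python
def count_balanced(h: int) -> int:
    """Number of height-balanced binary trees of height h (single node = height 0)."""
    if h < 0:
        return 1  # assumption: the empty tree counts as one tree of height -1
    prev2, prev1 = 1, 1  # counts for heights -1 and 0
    for _ in range(h):
        # both subtrees of height h-1, or one of h-1 and one of h-2 (two sides)
        prev2, prev1 = prev1, prev1 * prev1 + 2 * prev1 * prev2
    return prev1


print([count_balanced(h) for h in range(4)])  # [1, 3, 15, 315]
```

The counts grow very quickly, which is why versions of this exercise usually ask for the answer modulo some large prime.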
Is there a formula to calculate what the maximum and minimum height for an AVL tree, given a certain number of nodes?
For example:
Textbook question:
What is the maximum/minimum height for an AVL tree of 3 nodes, 5 nodes, and 7 nodes?
Textbook answer:
The maximum/minimum height for an AVL tree of 3 nodes is 2/2, for 5 nodes is 3/3, for 7 nodes is 4/3
I don't know if they figured it out by some magic formula, or if they draw out the AVL tree for each of the given heights and determined it that way.
The solution below is appropriate for working things out by hand and gaining an intuition. Please see the formulas at the bottom of this answer for larger trees (54+ nodes). [1]
Well, the minimum height [2] is easy: just fill each level of the tree with nodes until you run out. That height is the minimum.
To find the maximum, do the same as for the minimum, but then go back one step (remove the last placed node) and see if adding that node to the opposite sub-tree (from where it just was) violates the AVL tree property. If it does, your max height is just your min height. Otherwise this new height (which should be min height+1) is your max height.
If you need an overview of what the properties of an AVL tree are, or just a general explanation of an AVL tree, Wikipedia is a great place to start.
Example:
Let's take the 7 node example case. You fill in all levels and find a completely filled tree of height 3. (1 at level 1, 2 at level 2, 4 at level 3. 1+2+4=7 nodes.) That means 3 is your minimum.
Now find the max. Remove that last node and place it on the left subtree instead of the right. The right subtree still has height 3, but the left subtree now has height 4. However these values differ by less than 2, so it is still an AVL tree. Therefore your max height is 4. (Which is min+1)
All three examples worked out below (note that the numbers correspond to order of placement, NOT value):
Formulas:
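The technique shown above doesn't hold if you have a tree with a very large number of nodes. In this case, one can use the following formulas to calculate the min/max height [2]. The minimum is exact; the maximum comes from an upper bound and can overshoot by one for some n (for n = 3 it evaluates to 3, while the true maximum is 2).
Given n nodes [3]: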
Minimum: ceil(log2(n+1))
Maximum: floor(1.44*log2(n+2)-.328)
If you're curious, the first time max-min>1 is when n=54.
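As a cross-check on the numbers above, here is a small Python sketch that computes both heights exactly instead of relying on rounded constants. It assumes the level-counting definition of height used in this answer, uses ceil(log2(n+1)) for the minimum (computed with integer arithmetic), and finds the maximum from the minimum-node recurrence N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0 and N(1) = 1. The function names are illustrative.

```python
def min_height(n: int) -> int:
    """Smallest number of levels that can hold n nodes: ceil(log2(n + 1))."""
    return n.bit_length()  # equals ceil(log2(n + 1)) for n >= 0


def max_height(n: int) -> int:
    """Largest h such that a minimum-node AVL tree of height h fits in n nodes."""
    h = 0
    smaller, larger = 0, 1  # N(h), N(h + 1) for the current h
    while larger <= n:
        h += 1
        smaller, larger = larger, smaller + larger + 1
    return h


for n in (3, 5, 7):
    print(n, max_height(n), min_height(n))  # 2/2, 3/3, 4/3

first_gap = next(n for n in range(1, 1000) if max_height(n) - min_height(n) > 1)
print(first_gap)  # 54
```

The loop reproduces the textbook answers for 3, 5, and 7 nodes and agrees that n = 54 is the first size where the two heights differ by more than one.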
[1] Thanks to Jamie S for bringing this failure at larger node counts to my attention.
[2] Technically, the height of a tree is the longest path length (in edges) between the root and any leaf node. However the OP's textbook uses a common alternate definition of height as the number of levels in a tree. For consistency with the OP and Wikipedia, we use that definition in this post as well.
[3] These formulas are from the Wikipedia AVL page, with constants plugged in. The original source is Sorting and Searching by Donald E. Knuth (2nd Edition).
It's important to note the following defining characteristics of an AVL Tree.
AVL Tree Property
The nodes of an AVL tree abide by the BST property
AND The heights of the left and right sub-trees of any node differ by no more than 1.
Theorem: The AVL property is sufficient to maintain a worst case tree height of O(log N).
Note the following diagram.
- T1 is comprised of a T0 + 1 node, for a height of 1.
- T2 is comprised of T1 and a T0 + 1 node, giving a height of 2.
- T3 is comprised of a T2 for the left sub-tree and a T1 for the right sub-tree + 1 node, for a height of 3.
- T4 is comprised of a T3 for the left sub-tree and a T2 for the right sub-tree + 1 node, for a height of 4.
For these small trees, if you take the ceiling of log2(N), where N is the number of nodes in the tree, you get the height.
Example: T4 contains 12 nodes, and ceiling(log2(12)) = 4.
See the pattern developing here??
**The worst-case height is
Let's assume the number of nodes is n.
Trying to find out the minimum height of an AVL tree would be the same as trying to make the tree complete i.e. fill all the possible nodes at each level and then move to the next level.
So at each level the number of eligible nodes increases by 2^(h-1) where h is the height of the tree.
So at h=1, nodes(1) = 2^(1-1) = 1 node
for h=2, nodes(2) = nodes(1)+2^(2-1) = 3 nodes
for h=3, nodes(3) = nodes(2)+2^(3-1) = 7 nodes
so just find the smallest h for which nodes(h) is greater than or equal to the given number of nodes n.
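The level-filling idea can be written as a short loop. This is a minimal Python sketch with a hypothetical function name, counting height in levels as this answer does (a single node has height 1) and stopping as soon as the filled levels can hold at least n nodes.

```python
def min_height_by_levels(n: int) -> int:
    """Fill levels top-down; level h contributes 2**(h - 1) nodes."""
    h, capacity = 0, 0
    while capacity < n:
        h += 1
        capacity += 2 ** (h - 1)  # nodes(h) = nodes(h - 1) + 2^(h - 1)
    return h


print([min_height_by_levels(n) for n in (1, 3, 7, 8)])  # [1, 2, 3, 4]
```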
Now for the problem of the maximum height of an AVL tree:
Let's assume that the AVL tree has height h, with F(h) being the minimum number of nodes in an AVL tree of that height.
For its height to be maximal for its node count, assume that its left subtree FL and right subtree FR differ in height by 1 (which still satisfies the AVL property).
Now assume FL is a tree of height h-1 and FR is a tree of height h-2.
The number of nodes is then
F(h)=F(h-1)+F(h-2)+1 (Eq 1)
Adding 1 on both sides :
F(h)+1=(F(h-1)+1)+ (F(h-2)+1) (Eq 2)
So we have reduced the maximum height problem to a Fibonacci sequence. And these trees F(h) are called Fibonacci Trees.
So, F(1)=1 and F(2)=2
So, to get the maximum height, just find the largest index h for which F(h) is less than or equal to n.
So applying (Eq 1)
F(3)= F(2) + F(1)+ 1=4, so if n is between 4 and 6 the tree can have height at most 3.
F(4)= F(3)+ F(2)+ 1 = 7, similarly if n is between 7 and 11 the tree can have height at most 4.
and so on.
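To make the link to the Fibonacci numbers explicit, here is a short derivation. It assumes the standard indexing Fib(1) = Fib(2) = 1 and introduces G(h) = F(h) + 1 as a helper symbol that is not in the original answer.

```latex
\begin{align*}
G(h) &:= F(h) + 1 \\
G(h) &= G(h-1) + G(h-2) && \text{(this is Eq 2)} \\
G(1) &= 2 = \mathrm{Fib}(3), \quad G(2) = 3 = \mathrm{Fib}(4) \\
\Rightarrow\; G(h) &= \mathrm{Fib}(h+2), \qquad F(h) = \mathrm{Fib}(h+2) - 1 \\
F(3) &= \mathrm{Fib}(5) - 1 = 4, \qquad F(4) = \mathrm{Fib}(6) - 1 = 7
\end{align*}
```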
http://lcm.csa.iisc.ernet.in/dsa/node112.html
It is roughly 1.44 * log n, where n is the number of nodes.
For a more detailed description of how that was derived, you can refer to this link, starting in the middle of page 13: http://www.compsci.hunter.cuny.edu/~sweiss/course_materials/csci335/lecture_notes/chapter04.2.pdf
I am a freshman studying Computer Science at university, so please give me an understandable justification.
I have a height-balanced binary tree which has 635 nodes. What is the number of comparisons that will occur in the worst case scenario and why?
Here's one way to think about this. Every time you do a comparison in a binary search tree, one of the following happens:
You have walked off the tree. In this case, you're done.
The value you're looking for matches the node you're currently exploring. In this case, you're done.
The value you're looking for does not match the node you're exploring. In that case, you either descend to the left or descend to the right.
The key observation here is that after each step, you either terminate (yay!) or descend lower in the tree. At each point, you make one comparison. Since you can't descend forever, there are only so many comparisons that you can make - specifically, if the tree has height h, the maximum number of comparisons you can make is h + 1, which happens if you do one comparison per level.
In your question, you're given that you have a balanced binary search tree of 635 nodes. It's not 100% clear what "balanced" means in this context, since there are many different ways of determining whether a tree is balanced and they all lead to different tree heights. I'm going to assume that you are given a complete binary search tree, which is one in which all levels except the last are filled.
The reason this is important is that if you have a complete binary search tree of height h, it can have at most 2^(h+1) - 1 nodes in it. If we try to solve for the height of the tree in terms of the number of nodes, we get this:
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
lg (n + 1) = h + 1
lg (n + 1) - 1 = h
Therefore, if you have the number of nodes n, you can determine the minimum height of a complete binary search tree holding n nodes. In your case, n = 635, so we get
lg (635 + 1) - 1 = h
lg (636) - 1 = h
9.312882955 - 1 = h
8.312882955 = h
Therefore, the tree has height 8.312882955. Of course, trees can't have fractional height, so we can take the ceiling to find that the height of the tree would be 9. Since the maximum number of comparisons made is h + 1, there are at most 10 comparisons made when doing a lookup.
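The arithmetic above is easy to reproduce. This is a minimal Python sketch, assuming the same setup as this answer: height counted in edges, a complete binary search tree, and at most h + 1 comparisons per lookup. The function name is illustrative.

```python
import math


def complete_tree_height(n: int) -> int:
    """Height in edges of a complete binary tree with n nodes: ceil(lg(n + 1) - 1)."""
    return math.ceil(math.log2(n + 1) - 1)


n = 635
h = complete_tree_height(n)
print(h, h + 1)  # 9 10

# Integer-only alternative that avoids floating point: floor(lg n)
print(n.bit_length() - 1)  # 9
```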
Hope this helps!
Without loss of generality, you can say the maximum number of comparisons will be the height of the BST. You don't have to visit every node in the tree, because each comparison takes you closer to the target node.
Let's say it is a balanced BST (all nodes except those at the last level have 2 child nodes).
For instance,
Level 0 --> Height 1 --> Number of nodes = 1
Level 1 --> Height 2 --> Number of nodes = 2
Level 2 --> Height 3 --> Number of nodes = 4
Level 3 --> Height 4 --> Number of nodes = 8
......
......
Level n --> Height n+1 --> Number of nodes = 2^n or 2^(h-1)
Using the above logic, you can derive the search time for best, worst or average case.
It is well known that deletion from an AVL tree may cause several nodes to eventually be unbalanced. My question is, what is the minimum sized AVL tree such that 2 rotations are required (I'm assuming a left-right or right-left rotation is 1 rotation)? I currently have an AVL tree with 12 nodes where deletion would cause 2 rotations. My AVL tree is inserting in this order:
8, 5, 9, 3, 6, 11, 2, 4, 7, 10, 12, 1.
If you delete the 10, 9 becomes unbalanced and a rotation occurs. In doing so, 8 becomes unbalanced and another rotation occurs. Is there a smaller tree where 2 rotations are necessary after a deletion?
After reading jpalecek's comment, my real question is: Given some constant k, what is the minimum sized AVL tree that has k rotations after 1 deletion?
A tree of four nodes requires a single rotation in the worst case. The worst case number of rotations increases with each term in the list: 4, 12, 33, 88, 232, 609, 1596, 4180, 10945, 28656, ...
This is Sloane's A027941 and is a Fibonacci-type sequence that can be generated with N(i)=1+N(i-1)+N(i-2) for i>=2, N(1)=2, N(0)=1.
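For readers who want to reproduce the list, here is a small Python sketch of the recurrence as stated. One observation, offered as a reading rather than something the answer spells out: the recurrence itself produces 1, 2, 4, 7, 12, 20, 33, ..., and the values listed above appear to be its even-indexed terms N(2), N(4), N(6), and so on. The function name is hypothetical.

```python
def min_nodes_sequence(count: int) -> list[int]:
    """First `count` terms of N(i) = 1 + N(i-1) + N(i-2), with N(0) = 1, N(1) = 2."""
    terms = [1, 2]
    while len(terms) < count:
        terms.append(1 + terms[-1] + terms[-2])
    return terms[:count]


terms = min_nodes_sequence(21)
print(terms[:9])     # [1, 2, 4, 7, 12, 20, 33, 54, 88]
print(terms[2::2])   # [4, 12, 33, 88, 232, 609, 1596, 4180, 10945, 28656]
```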
To see why this is so, first note that rotating an imbalanced AVL tree reduces its height by one because its shorter leg is lengthened at the expense of its longer leg.
When a node is removed from an AVL tree, the AVL algorithm checks all of the removed node's ancestors for potential rebalancing. Therefore, to answer your question we need to identify trees with the minimum number of nodes for a given height.
In such a tree every node is either a leaf or has a balance factor of +1 or -1: if a node had a balance factor of zero this would mean that a node could be removed without triggering a rebalancing. And we know rebalancing makes a tree shorter.
Below, I show a set of worst-case trees. You can see that following the first two trees in the sequence, each tree is constructed by joining the previous two trees. You can also see that every node in each tree is either a leaf or has a non-zero balance factor. Therefore, each tree has the maximum height for its number of nodes.
For each tree, a removal in the left subtree will, in the worst case, cause rotations which ultimately reduce the height of that subtree by one. This balances the tree as a whole. On the other hand, removing a node from the right subtree may ultimately imbalance the tree resulting in a rotation of the root. Therefore, the right subtrees are of prime interest.
You can verify that Tree (c) and Tree (d) have one rotation upon removal, in the worst case.
Tree (c) appears as a right subtree in Tree (e) and Tree (d) as a right subtree in Tree (f). When a rotation is triggered in Tree (c) or (d) this shortens the trees resulting in a root rotation in Trees (d) and (f). Clearly, the sequence continues.
If you count the number of nodes in the trees this matches my original statement and completes the proof.
(In the trees below removing the highlighted node will result in a new maximum number of rotations.)
I am not good at proofs, and I'm sure the below is full of holes, but maybe it will spark something positive.
To effect k rotations on a minimized AVL tree following the deletion of a node, the following conditions must be met:
The target node must exist in a 4-node sub-tree.
The target node must either be on the short branch, or must be the root of the sub-tree and be replaced by the leaf of the short branch.
Each node in the ancestry of the root of the target sub-tree must be slightly out of balance (balance factor of +/-1). That is - when a balance factor of 0 is encountered, the rotation chain will cease.
The height and number of nodes of the minimized tree is calculated with the following equations.
Let H(k) = the minimum height of the tree affected by k rotations.
H(k) = 2k + 1, k > 0
Let N(h) = the number of nodes in a (min-node) AVL tree of height h.
N(0) = 0
N(1) = 1
N(h) = N(h-1) + N(h-2) + 1, h > 1
Let F(k) = the minimum number of nodes in the tree affected by k rotations.
F(k) = N(H(k))
(e.g:)
k = 1, H(k) = 3, N(3) = 4
k = 2, H(k) = 5, N(5) = 12
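Evaluating the equations as written (H(k) = 2k + 1 and N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0, N(1) = 1) is straightforward. The Python below is only a sketch of that evaluation; the function names mirror the symbols used in this answer.

```python
def H(k: int) -> int:
    """Minimum height of the tree affected by k rotations, per the formula above."""
    return 2 * k + 1


def N(h: int) -> int:
    """Nodes in a min-node AVL tree of height h: N(h) = N(h-1) + N(h-2) + 1."""
    a, b = 0, 1  # N(0), N(1)
    for _ in range(h):
        a, b = b, a + b + 1
    return a


def F(k: int) -> int:
    """Minimum number of nodes in the tree affected by k rotations."""
    return N(H(k))


print([(k, H(k), F(k)) for k in (1, 2, 3)])  # [(1, 3, 4), (2, 5, 12), (3, 7, 33)]
```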
Proof (such as it is)
Minimum Height
A deletion can only cause a rotation for trees with 4 or more nodes.
A tree of 1 node must have a balance factor of 0.
A tree of 2 nodes must have a balance factor of +/-1, and deletion leads to a balanced tree of 1 node.
A tree of 3 nodes must have a balance factor of 0. Removal of a node results in a balance factor of +/-1 and no rotation occurs.
Therefore, deletion from a tree with fewer than 4 nodes can not result in a rotation.
The smallest sub-tree for which 1 rotation occurs on delete is 4 nodes, which has height of 3. Removal of the node in the short side will result in rotation. Likewise, removal of the root node, using the node on the short side as replacement will cause a rotation. It doesn't matter how the tree is configured:
  B        B       Removal of A or replacement of B with A
 / \      / \      results in rotation. No rotation occurs
A   C    A   D     on removal of C or D, or on replacement
     \      /      of B with C.
      D    C
    C        C     Removal of D or replacement of C with D
   / \      / \    results in rotation. No rotation occurs
  B   D    A   D   on removal of A or B, or on replacement
 /          \      of C with B.
A            B
Deletion from a 4 node tree results in a balanced tree of height 2.
  .
 / \
.   .
To effect a second rotation, the target tree must have a sibling of height 4, so that the balance factor of the root is +/-1 (and therefore has a height of 5). It doesn't matter if the affected tree is on the right or left of the parent, nor is the layout of the sibling tree important (that is, the H3 child of H4 can be on the left or right, and can be any of the 4 orientations above while the H2 child can be either of the 2 possible orientations - this needs proving).
     _._              _._
    /   \            /   \
 (H4)    .          .    (H4)
        / \        / \
       .   .      .   .
            \          \
             .          .
It is clear that the third rotation requires that the grandparent of the affected tree be likewise imbalanced by +/-1, and the fourth requires the great-grandparent be imbalanced by +/-1, and so on.
By definition, the height of a sub-tree is the maximum height of each branch plus one for the root. One sibling must be 1 taller than the other to achieve the +/-1 imbalance in the root.
H(1) = 3 (as observed above)
H(k) = 1 + max(H(k - 1), H(k - 1) + 1) = 1 + H(k - 1) + 1 = H(k - 1) + 2
... Inductive proof leading to H(k) = 2k + 1 eludes me.
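For what it's worth, the closed form seems to follow by a short induction from the two lines above, taking H(1) = 3 and H(k) = H(k-1) + 2 as given:

```latex
\begin{align*}
\text{Base case: } & H(1) = 3 = 2\cdot 1 + 1. \\
\text{Inductive step: } & \text{assume } H(k-1) = 2(k-1) + 1 \text{ for some } k \ge 2. \\
& H(k) = H(k-1) + 2 = 2(k-1) + 1 + 2 = 2k + 1.
\end{align*}
```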
Minimum Nodes
By definition, the number of nodes in a sub-tree is the number of nodes in the left branch plus the number of nodes in the right branch plus 1 for the root.
Also by definition, a tree of height 0 must have 0 nodes, and a tree of height 1 must have no branches and thus 1 node.
It was shown above that the one branch must be one shorter than the other.
Let N(h) = minimum number of nodes required to create a tree of height h:
N(0) = 0
N(1) = 1
// the number of nodes in the two subtrees plus the root
N(h) = N(h-1) + N(h-2) + 1
Corollary
The minimum number of nodes is not necessarily the maximum in large trees. To wit:
Delete A from the following tree and observe that the height doesn't change following rotation. Therefore, the balance factor in the parent would not change and no additional rotation would occur.
  B            B                D
 / \            \              / \
A   D     =>     D     =>     B   E
   / \          / \            \
  C   E        C   E            C
However, in the k = 2 case, it does not matter if H(4) is minimized here - the second rotation will still occur.
     _._              _._
    /   \            /   \
 (H4)    .          .    (H4)
        / \        / \
       .   .      .   .
            \          \
             .          .
Questions
What is the position of the target sub-tree? Clearly for k = 1, it is the root, and for k = 2, it is the left if the root's balance factor is -1 otherwise the right. Is there a formula for determining position for k >= 3?
What is the maximum nodes a tree can contain to effect k rotations? Is it possible to have an intermediate node in the ancestry that is not rotated, though its parent is?
We came across a question in Thomas H. Cormen's book that asks us to show that there are at most ceiling(n/2^(h+1)) nodes of height h in any n-element heap.
I am confused by this question: how can there be at most that many nodes?
For instance, consider this problem:
In the above problem at height 2 there are 2 nodes. But if we calculate by formula:
Greatest Integer of (10/2^2+1) = 4
it does not satisfy Thomas H. Cormen questions.
Please correct me if I am wrong here.
Thanks in Advance
In Cormen, I observed that he numbers heights starting from 1, not from 0, so the formula is correct and my interpretation was wrong. So a leaf has height 1 and the root has height 4 in the question above.
Reading all the answers, I realized that the confusion comes from the precise definition of height. In page 153 of CLRS book, the height is defined as follows:
Viewing a heap as a tree, we define the height of a node in a heap to be the number of edges on the longest simple downward path from the node to a leaf...
Now let's look at the original heap provided by Nishant. The nodes 8, 9, 10, 6, and 7 are at height 0 (i.e., leaves). The nodes 4, 5 and 3 are at height 1. For example, there is one edge between node 5 and its leaf, node 10. Also there is one edge between node 3 and its leaf node 6. Node 6 looks like it is at height 1 but it is at height 0 and hence a leaf. The node 2 is the only node at height 2. You may wonder node 1 (the root) is two edges away from node 6 and 7 (leaves), and say node 1 is also at height 2. But if we look back at the definition, the bold-face word "longest" suggests that the longest simple downward path from the root to a leaf has 3 edges (passing node 2). Finally, the node 1 is at height 3.
In summary, there are 5, 3, 1, 1 nodes at height 0, 1, 2, 3, respectively.
Let's apply the formula to the observation we made in the above paragraph. I would like to point out that the formula given by Nishant is not correct.
It should be
ceiling(n/2^(h+1)), not ceiling(n/(2^h+1)). Sorry about the terrible formatting. I am not able to post an image yet.
Anyways, using the correct formula,
h = 0, ceiling(10/2) = 5 (nodes 8, 9, 10, 6, and 7)
h = 1, ceiling(10/4) = 3 (nodes 4, 5 and 3)
h = 2, ceiling(10/8) = 2 (node 2, but this is okay because the formula is predicting that there are at most 2 nodes at height 2.)
h = 3, ceiling(10/16) = 1 (node 1)
With the correct definition of height, the formula works.
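The counts above can be verified with a few lines of Python. This sketch assumes the usual array layout for a heap (nodes numbered 1..n, node i has children 2i and 2i + 1) and the CLRS definition of height quoted above; `node_heights` is a hypothetical helper name.

```python
from collections import Counter


def node_heights(n: int) -> dict[int, int]:
    """Height (edges on the longest downward path to a leaf) of each node 1..n
    in an array-backed heap where node i has children 2i and 2i + 1."""
    heights: dict[int, int] = {}
    for i in range(n, 0, -1):  # children are processed before their parents
        kids = [c for c in (2 * i, 2 * i + 1) if c <= n]
        heights[i] = 1 + max(heights[c] for c in kids) if kids else 0
    return heights


n = 10
counts = Counter(node_heights(n).values())
for h in sorted(counts):
    bound = -(-n // 2 ** (h + 1))  # integer ceiling of n / 2^(h+1)
    print(f"h={h}: {counts[h]} nodes, bound {bound}")
# h=0: 5 nodes, bound 5
# h=1: 3 nodes, bound 3
# h=2: 1 nodes, bound 2
# h=3: 1 nodes, bound 1
```

In every row the actual count stays at or below the bound, which is all the "at most" claim requires.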
It looks like your formula says there are at most [n/2^h+1] nodes of height h. In your example there are two nodes of height 2, which is less than your computed possible maximum of 4(ish).
While calculating the tight bound for Build-Max-Heap author has used this property in the equation.
In this case we call the helper Max-Heapify which takes O(h) where h is the height of the sub-tree rooted at the current node (not the height of node itself with respect to the full tree).
Therefore, if we consider the sub-tree rooted at a leaf node, it will have height 0, and the number of nodes in the tree at that level would be at most n / 2^(0+1) = n/2 (i.e. h=0 for the sub-tree formed from a node at the leaves).
Similarly, for the sub-tree rooted at the actual root, the height of the tree would be log(n), and in that case the number of nodes at that level would be 1, i.e. the ceiling of n / 2^(log n + 1) = n/(2n) = 1/2, which is 1.
Formula for
no. of nodes = n/(2^(h+1))
so when h is 2, and n = 10
no. of nodes = 10/(2^(2+1)) = 10/(2^3) = 10/8 = 1.25
But
ceil of 10/8 = 2
Hence there are 2 nodes which you can see from the figure.
Though it is mentioned in Cormen that height of a node is the greatest distance traveled from node to leaf(the number of edges), if you take height to be the distance of a node from the leaf, i.e. at leaf the height is zero and at root the height is log(n). The formula stands correct.
As for the leaves you have h=0; hence by the formula n/(2^(h+1))
h=0; max number of leaves in the heap will be n/2.
what about height 1. Cormen's theory gives 10/(2^(1+1))=3(ceil) while there is 4 nodes at height 1. This is a contradiction.
It is not true that Thomas H. Cormen is counting the height of the tree starting from one, height is h = 0, 1, ..., log n and it increases as you go upwards:
and in the following formula, he added 1 plus the height:
All the confusion comes from the fact that this works out exactly for perfect binary trees, not for the one you are showing in your question; this is why he says AT MOST.
when you consider Big-O it wouldn't really matter
This formula is wrong, it gives wrong answers in many cases like in this question for h=1 (ie second last level) it gives maximum number of nodes is 3 but there are 4 nodes. Also let us consider a tree with 4 nodes :
    a
   / \
  b   c
 /
d
node d has height 0, let us consider for height =1 using the formula n/2^(h+1) we get
4/2^(1+1) = 1
which means this level can have at most 1 node which is false !
so this formula is not right.
The formula is quite correct. Nothing is wrong with the formula!!
Lets take the tree(although its not heap yet its complete) in the question posed by Nishant on the top.
For h=0 means all leaves, so ceil(10/2^(0+1))=5, so there are 5 leaves.
For h=1 means all nodes which have one arc to reach the leaves, so ceil(10/2^(1+1))=3; there are 3 such nodes in your tree.
For h=2 means all nodes which have two consecutive arcs to reach leaves, so ceil(10/2^(2+1))=2 is the upper bound, and you have only one such node (the left successor of the root), which is within the bound.
For h=3 means all nodes which have three arcs to leaves, so ceil(10/2^(3+1))=1, which is the root.
Moral of the story is that you are confused between height and level. Level starts from up to down. Which means you have 4 nodes on level 2. i.e you can reach 4 nodes if you start at root and moves two arcs down.
Whereas height is completely different. Like in above case at height 0 there are 5 nodes (3 on level 3, and 2 on level 2). Hence height h of a node n means how many arcs you can travel to reach a leaf.
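The split between level and height can be seen directly in a few lines of Python. This is a sketch assuming the standard array numbering of the 10-node heap (root is node 1 at level 0, node i has children 2i and 2i + 1); the helper name is illustrative.

```python
def level(i: int) -> int:
    """Level of node i (root = level 0) in an array-backed heap indexed from 1."""
    return i.bit_length() - 1


n = 10
leaves = [i for i in range(1, n + 1) if 2 * i > n]  # height-0 nodes: no children
by_level: dict[int, list[int]] = {}
for i in leaves:
    by_level.setdefault(level(i), []).append(i)

print(by_level)  # {2: [6, 7], 3: [8, 9, 10]}
print([i for i in range(1, n + 1) if level(i) == 2])  # [4, 5, 6, 7]
```

So the five height-0 nodes are spread over two levels, while level 2 holds four nodes of mixed height.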
regards,
Hope it clarifies the point.
Safdar from Pakistan