Homework Help - AVL Tree - algorithm

This is a homework question. I am only seeking help because I am very lost as to approaching this problem.
Instead of using the height as the balance factor, this AVL tree uses the size as the balance factor.
I have to prove that the height is O(log n).
My thought
Approach 1
Since an AVL tree has a height of O(log n), I can basically prove that this is an AVL tree. So it would have the same height as a regular AVL tree. However, I am not sure how to do this completely. I essentially need to show that the height of one side is at most +/- 1 relative to the other. If it is, then it would be an AVL tree.
Approach 2
I can prove the Omega(log n) bound fairly easily: I can use the best case, where the tree is balanced, and in that case the height is log(n).
After that, I can try to prove the Big O case. This is the part that I am unsure about. I have no clue how to show that the upper bound of the height is log(n).
Could someone nudge me in the right direction please? Which approach is suggested?

It's unlikely that the first approach will work out - the condition for an AVL tree is probably not equivalent to the description you have. The second approach is more feasible.
Here's a further hint : Induct. Or perhaps in more CS terms, recurse :) Can you prove that the required height property is true for all modified-AVL trees with n nodes? If so, you know that this property is already true for your right and left subtrees. What can you do with that?
EDIT : More detailed answer - Assume that you have proved that for all trees with at most N nodes, the height is bounded by c log n, where n is the number of nodes, for some sufficiently large constant c. Given a tree with N+1 nodes, let L and R be the number of nodes in its left and right subtrees, respectively. Since L<=N and R<=N, by our induction assumption both these subtrees have heights bounded by c log (number of nodes). The height of the full tree is bounded above by 1 + max(c log L, c log R). Now, L+R=N, L<=2R and R<=2L give 3L<=2N and 3R<=2N, that is, L<=2N/3 and R<=2N/3. So max(log L, log R) <= log(2N/3) = log N + log(2/3). Therefore 1 + c*max(log L, log R) <= 1 + c*log N + c*log(2/3) <= c*log(N+1), provided c is large enough that 1 + c*log(2/3) <= 0, i.e. c >= 1/log(3/2).
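As a numerical sanity check of the last inequality (not a replacement for the proof), here is a minimal Python sketch. The function name `inductive_step_holds` and the choice c = 1/log2(3/2) are assumptions made for illustration. It also assumes the size-balance condition L<=2R and R<=2L used in the answer above.

```python
import math

# Smallest constant for which 1 + c*log2(2/3) <= 0 (assumption: logs are base 2).
C = 1 / math.log2(3 / 2)

def inductive_step_holds(n, c=C):
    """Check 1 + c*log2(2n/3) <= c*log2(n+1) for a tree with n+1 nodes."""
    lhs = 1 + c * math.log2(2 * n / 3)
    rhs = c * math.log2(n + 1)
    return lhs <= rhs

# The step holds for every n tried when c is large enough ...
print(all(inductive_step_holds(n) for n in range(1, 10_000)))
# ... but fails for a constant that is too small, e.g. c = 1 with n = 10.
print(inductive_step_holds(10, c=1.0))
```

With c = 1 the step breaks as soon as 4n/3 > n+1, which is why the proof needs "sufficiently large c" rather than c = 1.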

Related

Complete binary tree time complexity

Suppose someone wants to generate a complete binary tree. The tree has h levels, where h can be any positive integer and is given as input to the algorithm. What complexity class will the algorithm lie in, and why?
A complete binary tree is a tree in which every level except possibly the last is full of nodes, so we can state the time complexity as an upper bound.
If we know the height of the tree is h, then the maximum number of possible nodes in the tree is 2^h - 1.
Therefore, time complexity = O(2^h - 1).
To sell your algorithm in the market, you need tight upper bounds to prove that your algorithm is better than the others'.
A slightly tighter upper bound for this problem can be given once you know exactly how many nodes there are in the tree. Let's say there are N.
Then, the time complexity = O(N).
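To make the node count concrete, here is a small Python sketch that builds a tree with every level full and counts its nodes. The tuple representation and the names `build_perfect_tree` and `count_nodes` are illustrative assumptions, not part of the question.

```python
def build_perfect_tree(levels):
    """Build a tree with `levels` full levels as nested (left, right) tuples.

    None stands for the empty tree. One node is created per call that
    returns a tuple, so the work done is proportional to the node count.
    """
    if levels == 0:
        return None
    return (build_perfect_tree(levels - 1), build_perfect_tree(levels - 1))

def count_nodes(tree):
    """Count the nodes of a tree built by build_perfect_tree."""
    if tree is None:
        return 0
    return 1 + count_nodes(tree[0]) + count_nodes(tree[1])

for h in range(1, 6):
    print(h, count_nodes(build_perfect_tree(h)), 2**h - 1)
```

For each h, the counted number of nodes matches 2^h - 1, so generating the tree takes time linear in the number of nodes.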

Big O for Height of Balanced Binary Tree

Perhaps a dumb question. In a balanced binary tree where n is the total number of nodes, I understand why the height is equal to log(n). What I don't understand is what people mean when they refer to the height as being O(log(n)). I've only seen Big O used in the context of algorithms, where if an algorithm runs in O(n) and if the input doubles then the running time doubles. But height isn't an algorithm. How does this apply to the height of a tree? What does it mean for the height to be O(log(n))?
This is because a complete binary tree of n nodes does not have height log(n).
Consider a complete binary tree of height k. Such a tree has 2^k leaf nodes. How many nodes does it have in total? If you look at each level, you will find that it has 1 + 2 + 4 + 8 + ... + 2^k nodes, or 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^k.
After some math, you will find that this series equals 2^(k+1) - 1.
So, if your tree has n nodes, what is its height? If you solve the equation n = 2^(k+1) - 1 with respect to k, you obtain k = log2(n+1) - 1.
This expression is slightly less nice than log2(n), and it is certainly not the same number. However, by the properties of big-O notation,
log2(n+1) - 1 = O(log(n)).
In the source you are reading, the emphasis is on the fact that the height grows as fast as log(n). They belong to the same complexity class. This information can be useful when designing algorithms, since you know that doubling the input will increase the tree height only by a constant. This property gives tree structures immense power. So even if the expression for the exact height of the tree is more complicated (and if your binary tree is not complete, it will look more complicated still), it is of logarithmic complexity with respect to n.
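The closed form can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the tree is assumed to have every level full, so that n + 1 is an exact power of two.

```python
def perfect_tree_nodes(k):
    """Sum the level sizes 2^0 + 2^1 + ... + 2^k of a tree of height k."""
    return sum(2**level for level in range(k + 1))

def height_from_nodes(n):
    """Return log2(n + 1) - 1, computed exactly with integer arithmetic.

    Assumes n + 1 is a power of two, so (n + 1).bit_length() - 1 == log2(n + 1).
    """
    return (n + 1).bit_length() - 2

for k in range(6):
    n = perfect_tree_nodes(k)
    print(k, n, 2**(k + 1) - 1, height_from_nodes(n))
```

Each row shows the height k, the summed node count, the closed form 2^(k+1) - 1, and the height recovered from n.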
To add to Berthur's excellent answer, Big-Oh notation is not specific to analysis of algorithms; it applies to any function. In analysis of algorithms we care about the function T(n) which gives the (typically worst-case) runtime for input size n, and we want to know an upper bound (Big-Oh) on that function's rate of growth. Here, there is a function that gives the true height of a tree with whatever property, and we want an upper bound on that function's rate of growth. We could find upper bounds on arbitrary functions devoid of any context at all like f(n) = n!^(1/2^n) or whatever.
I think they mean it takes O(log(n)) steps to walk the tree from the root down to a leaf.

Prove that a binary tree with n leaves has a height of at least log n

I know that the number of leaves in a binary tree of height h can be at most 2^h, and I also know how to prove this using induction. Where do I go from here?
I found this previously answered question, but it doesn't make any sense to me because I don't understand what the proof by contradiction in the theorem section has to do with a binary tree's height being at least log(n). I was expecting him to talk about how log(n) relates to the number of leaves and height, but instead he goes on to talk about how to do a proof by contradiction using n = 2^a + b.
Can anyone help me understand how we can prove that the height of a BT with n leaves will be at least log n?
Consider a binary tree, and let h be its height and n be the number of its leaves.
By your first sentence, n <= 2^h. Taking a log base 2 on both sides (which preserves the inequality because log is monotonic), we have log(n) <= h. That immediately gives you what you wanted: the height is at least log(n), where n is the number of leaves.
So you know that your binary tree has the following properties:
n <= 2^h
Now you can use a log on both sides because both are positive and that's it from a math point of view.
In order to understand it better you can do the following:
If the tree is not full, then somewhere you traded two possible leaves for one actual leaf (because that node could have had two children). So for any given height h, the maximum number of leaves is reached by a full tree, which has 2^h leaves, i.e. a height of log(n).
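The bound n <= 2^h used in the answers above can also be checked empirically. The sketch below generates random binary trees and compares leaf count against height. The tuple representation, the helper names, and the convention that a single node has height 0 are assumptions for illustration.

```python
import math
import random

def random_tree(n, rng):
    """Return a random binary tree with n nodes as nested (left, right) tuples."""
    if n == 0:
        return None
    left_size = rng.randint(0, n - 1)
    return (random_tree(left_size, rng), random_tree(n - 1 - left_size, rng))

def height(tree):
    """Height in edges; a single node has height 0, the empty tree -1."""
    if tree is None:
        return -1
    return 1 + max(height(tree[0]), height(tree[1]))

def leaves(tree):
    """Number of nodes with no children."""
    if tree is None:
        return 0
    if tree[0] is None and tree[1] is None:
        return 1
    return leaves(tree[0]) + leaves(tree[1])

rng = random.Random(0)
for n in (1, 5, 20, 100):
    t = random_tree(n, rng)
    # Every sampled tree should satisfy leaves <= 2^height, i.e. height >= log2(leaves).
    print(n, leaves(t), height(t), leaves(t) <= 2 ** height(t))
```

Random sampling cannot prove the inequality, but a counterexample would show up as a False in the last column.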

Balanced Binary Search Trees on the basis of size of left and right child subtrees

I have two questions:
What is the difference between a nearly balanced BST and a nearly complete binary tree? If the definition of the former were clear we could differentiate them, but I was not able to find a relevant article.
Today, in my class I was taught about the condition to be balanced as:
max(size(root.left), size(root.right)) <= 3*n/4    (eqn 1)
Hence H(n), the height of a tree of n nodes satisfying the above property, is <= 1 + H(3*n/4).
Continuing the recursive steps, we get a bound of O(log n).
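The recursion can be unrolled numerically. This Python sketch counts how many times n can be replaced by floor(3n/4) before reaching a single node and compares that with log base 4/3 of n. The function name `unroll_steps` is a made-up helper, and using the floor is an assumption based on subtree sizes being integers.

```python
import math

def unroll_steps(n):
    """Count applications of H(n) <= 1 + H(3n/4) until one node is left."""
    steps = 0
    while n > 1:
        n = (3 * n) // 4  # a subtree holds at most floor(3n/4) nodes
        steps += 1
    return steps

for n in (10, 100, 1000, 10**6):
    print(n, unroll_steps(n), round(math.log(n, 4 / 3), 2))
```

The step count stays below log_{4/3}(n) + 1, which is the O(log n) bound with a constant of roughly 2.4 in front of log2(n).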
My question is: is this a specific type of BST? For example, in the case of AVL trees, as I remember, the condition is that the difference in the heights of the left and right children is at most 1. Or is this a more general result, so that equation 1 as stated earlier can be reduced to prove the result for AVL trees as well? I.e., will any balanced BST result in the difference of heights of siblings being at most 1?
In case it is different from AVL, how do we manage the insertion and deletion operations in this new kind of tree?
EDIT : Also, can you explain why 3*n/4 in particular?
My Thought: It is because we can then surely say that H(n) <= 1+H(3*n/4), since if we take something like 3n/5, which is less than 3n/4, then H(3n/5) won't necessarily be less than H(2n/5), as the ratio of 3n/5 and 2n/5 is less than 2 and, as we know, a factor of 2 in the number of nodes increases the height by 1.
So we can't surely write H(n) <= 1 + H(3n/5); it may be H(2n/5) in place of H(3n/5) as well. Am I right?
A nearly complete BST is a BST where all levels are filled, except the last one. Definitions are kind of messed up here (some call this property perfect). Please refer to Wikipedia for this.
Being balanced is a less strict criterion, i.e. all (nearly) complete BSTs are balanced, but not all balanced BSTs are complete. That Wikipedia article has a definition for that too. In my world a BST is balanced if it leads to O(log n) operation cost.
For example, one can say, a BST is balanced, if each subtree has at most epsilon * n nodes, where epsilon < 1 (for example epsilon = 3/4 or even epsilon = 0.999 -- which are practically not balanced at all).
The reason for that is that the height of such a BST is roughly log_{1/epsilon} n = log_2 n / (- log_2 epsilon) = O(log n), but for epsilon = 0.99 the factor 1 / (- log_2 0.99) is about 69, which is a huge constant. You can try to prove that with the usual ratio of epsilon = 1/2, where both subtrees have roughly the same size.
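To get a feel for how the constant in front of log_2 n depends on epsilon, here is a short Python sketch. The function name `height_factor` is an assumption for illustration. It simply evaluates 1 / (-log_2 epsilon) from the formula above.

```python
import math

def height_factor(epsilon):
    """Return the multiplier c in height ~ c * log2(n) for a given epsilon < 1."""
    return 1 / -math.log2(epsilon)

for eps in (0.5, 0.75, 0.99, 0.999):
    print(eps, round(height_factor(eps), 1))
```

The factor is exactly 1 for epsilon = 1/2 and grows without bound as epsilon approaches 1, which is why such trees are balanced only in the asymptotic sense.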
I don't know of a common BST which uses this 3/4. Common BSTs are for example Red-Black-Trees, Splay-Trees or - if you are on a hard disk - a whole family of B-Trees. For your example, you can probably implement the operations by augmenting each node with two integers representing the number of nodes in the left and the right subtree respectively. When inserting or deleting something, you update the numbers as you walk from the root to the leaf (or up), and if the condition is violated, you do a rotation.
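A minimal sketch of the size bookkeeping might look like the following Python. It is only an illustration under stated assumptions: the class and function names are made up, each node stores one total subtree size instead of two separate counters, and the rebalancing rotation itself is left out. The sketch only detects where eqn 1 is violated.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here

def size(node):
    return node.size if node else 0

def insert(node, key):
    """Plain BST insert that refreshes subtree sizes on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def violates(node):
    """True if max(size(left), size(right)) > 3*n/4, using integer arithmetic."""
    return 4 * max(size(node.left), size(node.right)) > 3 * node.size

def find_violations(node):
    """Return the keys of all nodes where the size condition fails."""
    if node is None:
        return []
    found = find_violations(node.left) + find_violations(node.right)
    if violates(node):
        found.append(node.key)
    return found

root = None
for key in [1, 2, 3, 4, 5]:  # sorted input degenerates into a chain
    root = insert(root, key)
print(find_violations(root))
```

A real implementation would restructure the subtree at the violating node (by rotations or by rebuilding it) instead of only reporting it.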

What exactly does log n height mean?

I came to know that the height of random BSTs, Red-Black trees and some other trees is O(log n).
I wonder how this can be. Let's say I have a tree like this
The height of the tree is essentially the depth of the tree, which in this case will be 4 (not counting the root level). But how can people say that the height can be represented by O(log n) notation?
I'm very new to algorithms, and this point is confusing me a lot. Where am I missing the point?
In algorithm complexity the variable n typically refers to the total number of items in a collection or involved in some calculation. In this case, n is the total number of nodes in the tree. So, in the picture you posted n=31. If the height of the tree is O(log n), that means the height grows at most proportionally to the log of n (up to a constant factor). Since this is a binary tree, you'd use log base 2.
⌊log₂(31)⌋ = 4
Therefore, the height of the tree should be about 4—which is exactly the case in your example.
As I explained in a comment, a binary tree can have multiple cases:
In the degenerate case, a binary tree is simply a chain, and its height is O(n).
In the best case (for most search algorithms), a complete binary tree has the property that for any node, the heights of its subtrees are the same. In this case the height will be the floor of log(n) (base 2, or base k, for k branches). You can prove this by induction on the size of the tree (structural induction on the constructors).
In the general case you will have a mix of these: a tree constructed so that any node has subtrees with possibly different heights.
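The two extreme cases can be reproduced with a plain, unbalanced BST by changing only the insertion order. The Python below is a sketch: the list-based node layout and the names `bst_height` and `balanced_order` are assumptions for illustration. Height is counted in edges.

```python
def bst_height(keys):
    """Insert keys into an unbalanced BST and return its height in edges."""
    root = None
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]  # [key, left, right]
            continue
        node, depth = root, 0
        while True:
            idx = 1 if key < node[0] else 2
            depth += 1
            if node[idx] is None:
                node[idx] = [key, None, None]
                break
            node = node[idx]
        height = max(height, depth)
    return height

def balanced_order(lo, hi):
    """Keys lo..hi listed so that each subtree's median is inserted first."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + balanced_order(lo, mid - 1) + balanced_order(mid + 1, hi)

print(bst_height(range(1, 32)))          # sorted input: a chain
print(bst_height(balanced_order(1, 31))) # median-first input: a full tree
```

With 31 keys, the sorted order gives a chain of height n - 1 = 30, while the median-first order gives height floor(log2(31)) = 4.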
