Balanced Binary Search Trees on the basis of size of left and right child subtrees - algorithm

I have two questions:
What is the difference between a nearly balanced BST and a nearly complete binary tree? If the definition of the former were clear I could differentiate them myself, but I have not been able to find a relevant article.
Today, in my class I was taught that the condition to be balanced is:
max(size(root.left), size(root.right)) <= 3*n/4    (eqn 1)
Hence H(n), the height of a tree of n nodes with the above property, satisfies H(n) <= 1 + H(3*n/4).
Continuing the recursive steps we get a bound of O(log n).
My question is: is this a specific type of BST? For example, in AVL trees, as I remember, the condition is that the heights of the left and right children differ by at most 1. Or is this a more general result, so that equation 1 as stated earlier can be reduced to prove the result for AVL trees as well? I.e., will any balanced BST have sibling heights that differ by at most 1?
In case it is different from AVL, how do we manage the insertion and deletion operations in this new kind of tree?
EDIT : Also if you can explain why 3*n/4 only ?
My thought: it is because we can then surely say that H(n) <= 1 + H(3*n/4). If we take something like 3n/5, which is less than 3n/4, then H(3n/5) won't necessarily be less than H(2n/5), since the ratio of 3n/5 to 2n/5 is less than 2 and, as we know, a factor of 2 in the number of nodes increases the height by 1.
So we cannot be sure that H(n) <= 1 + H(3n/5); it may be H(2n/5) in place of H(3n/5) as well. Am I right?

A nearly complete BST is a BST where all levels are filled, except the last one. Definitions are kind of messed up here (some call this property perfect). Please refer to Wikipedia for this.
Being balanced is a less strict criterion, i.e. all (nearly) complete BSTs are balanced, but not all balanced BSTs are complete. That Wikipedia article has a definition for that too. In my world, a BST is balanced if it leads to O(log n) operation cost.
For example, one can say that a BST is balanced if each subtree has at most epsilon * n nodes, where epsilon < 1 (for example epsilon = 3/4, or even epsilon = 0.999, which is practically not balanced at all).
The reason is that the height of such a BST is roughly log_{1/epsilon} n = log_2 n / (-log_2 epsilon) = O(log n), although the constant can be huge: for epsilon = 0.99 it is 1 / (-log_2 0.99), which is about 69. You can try to prove that with the usual ratio of epsilon = 1/2, where both subtrees have roughly the same size.
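To get a feeling for how the constant in front of log_2 n behaves as epsilon approaches 1, here is a small sketch. The function name `height_constant` is made up for this illustration; it only evaluates the formula 1 / (-log_2 epsilon) from above.

```python
import math

def height_constant(epsilon: float) -> float:
    """Constant c such that the height is roughly c * log2(n)
    when every subtree holds at most epsilon * n nodes."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return 1.0 / -math.log2(epsilon)

# Compare a tight balance condition with increasingly loose ones.
for eps in (0.5, 0.75, 0.99, 0.999):
    print(f"epsilon = {eps}: height is roughly {height_constant(eps):.1f} * log2(n)")
```

The height stays O(log n) for every fixed epsilon < 1, but the hidden constant grows quickly once epsilon gets close to 1.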
I don't know of a common BST which uses this 3/4. Common BSTs are for example red-black trees, splay trees or, if you are on a hard disk, a whole family of B-trees. For your example, you can probably implement the operations by augmenting each node with two integers representing the number of nodes in the left and the right subtree respectively. When inserting or deleting something, you update the numbers as you walk from the root to the leaf (or back up), and if the condition is violated, you do a rotation.
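A minimal sketch of the bookkeeping part, under a few assumptions of mine: each node stores a single `size` field (the size of its whole subtree) instead of two separate counters, n in eqn 1 is taken to be the size of the subtree rooted at the node being checked, and the rebalancing step itself is left out. `Node`, `insert` and `first_violation` are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # number of nodes in the subtree rooted here

def size(node: Optional[Node]) -> int:
    return node.size if node is not None else 0

def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert that keeps the size counters up to date (no rebalancing)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Recompute on the way back up; duplicates leave the size unchanged.
    root.size = 1 + size(root.left) + size(root.right)
    return root

def first_violation(root: Optional[Node], num: int = 3, den: int = 4) -> Optional[Node]:
    """Return the topmost node where max(size(left), size(right)) > (num/den) * n,
    or None if the condition holds everywhere. Integer arithmetic avoids rounding."""
    if root is None:
        return None
    if den * max(size(root.left), size(root.right)) > num * root.size:
        return root
    found = first_violation(root.left, num, den)
    if found is not None:
        return found
    return first_violation(root.right, num, den)
```

A real implementation would repair the node returned by `first_violation` (by rotations, or by rebuilding that subtree as scapegoat trees do) instead of only reporting it.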

Related

Prove that a binary tree with n leaves has a height of at least log n

I know that the number of leaves in a binary tree of height h can be at most 2^h, and I also know how to prove this using induction. Where do I go from here?
I found this previously answered question, but it doesn't make any sense to me because I don't understand what the proof by contradiction in the theorem section has to do with a binary tree's height being at least log(n). I was expecting him to talk about how log(n) relates to the number of leaves and the height, but instead he goes on to talk about how to do a proof by contradiction using n = 2^a + b.
Can anyone help me understand how we can prove that the height of a BT with n leaves will be at least log n?
Consider a binary tree, and let h be its height and n be the number of its leaves.
By your first sentence, n <= 2^h. Taking a log base 2 on both sides (which preserves the inequality because log is monotonic), we have log(n) <= h. That immediately gives you what you wanted: the height is at least log(n), where n is the number of leaves.
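If you want to convince yourself empirically before doing the proof, the inequality can be checked on random trees. This is only a sanity check, not a proof; the tuple representation and the helper names are my own choices for this sketch, and height is counted in edges so that a single node has height 0.

```python
import random

def random_tree(n, rng):
    """Random binary tree with n nodes as nested (left, right) pairs; None is the empty tree."""
    if n == 0:
        return None
    left_count = rng.randint(0, n - 1)  # how many of the remaining nodes go left
    return (random_tree(left_count, rng), random_tree(n - 1 - left_count, rng))

def height(tree):
    """Edges on the longest root-to-leaf path; a single node has height 0."""
    if tree is None:
        return -1
    return 1 + max(height(tree[0]), height(tree[1]))

def leaf_count(tree):
    if tree is None:
        return 0
    left, right = tree
    if left is None and right is None:
        return 1
    return leaf_count(left) + leaf_count(right)

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(200):
    tree = random_tree(rng.randint(1, 100), rng)
    # n <= 2^h, which is the same statement as log2(n) <= h
    assert leaf_count(tree) <= 2 ** height(tree)
```

The check uses the integer form n <= 2^h rather than log2(n) <= h to stay clear of floating-point rounding.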
So you know that your binary tree has the following properties:
n <= 2^h
Now you can use a log on both sides because both are positive and that's it from a math point of view.
In order to understand it better you can do the following:
If you have a tree which is not full, you traded 2 possible leaves for one actual leaf (because there could have been 2 children). So for any given height h the number of leaves is largest in a full tree, which has n = 2^h leaves, i.e. a height of log(n).

Time Efficiency of Binary Search Tree

For the time efficiency of inserting into a binary search tree,
I know that the best/average case of insertion is O(log n), whereas the worst case is O(n).
What I'm wondering is whether there is any way to ensure that we always get the best/average case when inserting, besides implementing an AVL tree (a balanced BST)?
Thanks!
There is no guaranteed log n complexity without balancing a binary search tree. While searching/inserting/deleting, you have to navigate through the tree in order to position yourself at the right place and perform the operation. The key question is: what is the number of steps needed to get to the right position? If the BST is balanced, you can expect about 2^(i-1) nodes at level i. This means that if the tree has k levels (k is called the height of the tree), the expected number of nodes in the tree is 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1 = n, which gives k = log2(n + 1), roughly log n, and that is the number of steps needed to navigate from the root to a leaf.
Having said that, there are various implementations of balanced BST. You mentioned AVL, the other very popular is red-black tree, which is used e.g. in C++ for implementing std::map or in Java for implementing TreeMap.
The worst case, O(n), can happen when you don't balance BST and your tree degenerates into a linked list. It is clear that in order to position at the end of the list (which is a worst case), you have to iterate through the whole list, and this requires n steps.
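To see the degeneration concretely, here is a small sketch of a plain, unbalanced BST (my own illustrative code, not taken from any library) that inserts the same keys once in sorted order and once in shuffled order and reports how many levels each tree ends up with.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Unbalanced BST insert, written iteratively; returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while cur.key != key:  # duplicates are ignored
        side = "left" if key < cur.key else "right"
        child = getattr(cur, side)
        if child is None:
            setattr(cur, side, Node(key))
            break
        cur = child
    return root

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def levels(root):
    """Number of levels in the tree, counted level by level (no recursion)."""
    count = 0
    frontier = [root] if root is not None else []
    while frontier:
        count += 1
        frontier = [c for node in frontier
                    for c in (node.left, node.right) if c is not None]
    return count

n = 1000
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable
print("sorted input:  ", levels(build(range(n))), "levels")
print("shuffled input:", levels(build(shuffled)), "levels")
```

Sorted input produces one level per key, which is the linked-list case. Shuffled input usually gives a much shallower tree, but nothing guarantees it, which is why self-balancing trees exist.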

Homework Help - AVL Tree

This is a homework question. I am only seeking help because I am very lost as to approaching this problem.
Instead of using the height as the balance factor, this AVL tree uses the size as the balance factor.
I have to prove that the height is O(log n).
My thought
Approach 1
Since an AVL tree has a height of O(log n), I can basically prove that this is an AVL tree. So it would have the same height as a regular AVL tree. However, I am not sure how to do this completely. I essentially need to show that the height of one side is at most +/- 1 relative to the other. If it is, then it would be an AVL tree.
Approach 2
I can prove the Omega(log n) fairly easily, I can use the best case where it is balanced and that in that case, the height is log(n).
After that, I can try to prove the Big O case. This is the part that I am unsure about. I have no clue how to show that the upper bound of the height is log(n).
Could someone nudge me to the right direction please? Which approach is suggested?
It's unlikely that the first approach will work out - the condition for an AVL tree is probably not equivalent to the description you have. The second approach is more feasible.
Here's a further hint : Induct. Or perhaps in more CS terms, recurse :) Can you prove that the required height property is true for all modified-AVL trees with n nodes? If so, you know that this property is already true for your right and left subtrees. What can you do with that?
EDIT: More detailed answer. Assume that you have proved that every tree with at most N nodes has height bounded by c log(number of nodes), for some sufficiently large c. Given a tree with N+1 nodes, let L and R be the number of nodes in its left and right subtrees, respectively. Since L<=N and R<=N, by our induction assumption these subtrees have heights bounded by c log L and c log R. The height of the full tree is bounded above by 1 + max(c log L, c log R). Now, L+R=N, L<=2R and R<=2L, giving L<=2N/3 and R<=2N/3. So max(log L, log R) <= log(2N/3) = log N + log(2/3), and hence 1 + c*max(log L, log R) <= 1 + c*log N + c*log(2/3) < c*log(N+1) for sufficiently large c, because log(2/3) is negative.
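Written out as a worked inequality, with logarithms taken base 2 and h(m) denoting the largest height of a tree with m nodes (both are notational assumptions of mine, as is the explicit threshold for c), the inductive step looks like this:

```latex
\begin{align*}
L + R = N,\quad L \le 2R
  &\;\Longrightarrow\; L \le 2(N - L)
   \;\Longrightarrow\; L \le \tfrac{2N}{3}
   \qquad\text{(and symmetrically } R \le \tfrac{2N}{3}\text{)} \\[4pt]
h(N+1) &\le 1 + c\,\max(\log_2 L,\ \log_2 R) \\
       &\le 1 + c\,\log_2\!\tfrac{2N}{3} \\
       &=   c\,\log_2 N + \bigl(1 - c\,\log_2\tfrac{3}{2}\bigr) \\
       &\le c\,\log_2 N
        \qquad\text{whenever } c \ge \frac{1}{\log_2(3/2)} \approx 1.71 \\
       &<   c\,\log_2 (N+1).
\end{align*}
```

So any constant c of at least about 1.71 makes the induction go through, provided the base case (a tree with a single node) is handled separately.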

Why lookup in a Binary Search Tree is O(log(n))?

I can see how, when looking up a value in a BST, we discard half the tree every time we compare a node with the value we are looking for.
However I fail to see why the time complexity is O(log(n)). So, my question is:
If we have a tree of N elements, why the time complexity of looking up the tree and check if a particular value exists is O(log(n)), how do we get that?
Your question seems to be well answered here but to summarise in relation to your specific question it might be better to think of it in reverse; "what happens to the BST solution time as the number of nodes goes up"?
Essentially, in a BST every time you double the number of nodes you only increase the number of steps to solution by one. To extend this, four times the nodes gives two extra steps. Eight times the nodes gives three extra steps. Sixteen times the nodes gives four extra steps. And so on.
The base 2 log of the first number in these pairs is the second number in these pairs. It's base 2 log because this is a binary search (you halve the problem space each step).
For me the easiest way was to look at a graph of log2(n), where n is the number of nodes in the binary tree. As a table this looks like:
log2(n) = d
log2(1) = 0
log2(2) = 1
log2(4) = 2
log2(8) = 3
log2(16)= 4
log2(32)= 5
log2(64)= 6
and then I draw a little binary tree, this one goes from depth d=0 to d=3:
d=0         O
          /   \
d=1     R       B
       / \     / \
d=2   R   B   R   B
     / \ / \ / \ / \
d=3  R B R B R B R B
So the number of nodes n roughly doubles (for example, n goes from 7 to 15, an increase of 8) when the depth d goes from d=2 to d=3, an increase of only 1. The additional processing (or time) required therefore grows by only 1 additional computation (or iteration), because the amount of processing is related to d.
After doubling the number of nodes, we go down only 1 additional level of depth, from d=2 to d=3, to find the node we want among all n nodes. We never search the whole tree, only the half of it that we need at each step.
We can write this as d = log2(n), approximately, where d tells us how much computation (how many iterations) we need to do to reach any node in the tree when there are n nodes in the tree.
This can be shown mathematically very easily.
Before I present that, let me clarify something. The complexity of lookup or find in a balanced binary search tree is O(log(n)). For a binary search tree in general, it is O(n). I'll show both below.
In a balanced binary search tree, in the worst case, the value I am looking for is in a leaf of the tree. I'll basically traverse from the root to the leaf, looking at each layer of the tree only once, thanks to the ordered structure of BSTs. Therefore, the number of comparisons I need to do is the number of layers of the tree. Hence the problem boils down to finding a closed-form expression for the number of layers of a tree with n nodes.
This is where we'll do a simple induction. A full tree with only 1 layer has only 1 node. A full tree of 2 layers has 1+2 nodes, one of 3 layers has 1+2+4 nodes, etc. The pattern is clear: a full tree with k layers has exactly
n=2^0+2^1+...+2^{k-1}
nodes. This is a geometric series, which implies
n=2^k-1,
equivalently:
k = log_2(n+1)
We know that big-oh is interested in large values of n, hence constants are irrelevant. Hence the O(log(n)) complexity.
I'll give another, much shorter, way to show the same result. While looking for a value we repeatedly split the tree into two halves, and we have to do this k times, where k is the number of layers, so the following is true:
(n+1)/2^k = 1,
which implies the exact same result. You have to convince yourself about where that +1 in n+1 is coming from, but it is okay even if you don't pay attention to it, since we are talking about large values of n.
Now let's discuss the general binary search tree. In the worst case it is completely unbalanced, meaning every node has only one child (and the tree becomes a linked list). See e.g. https://www.cs.auckland.ac.nz/~jmor159/PLDS210/niemann/s_fig33.gif
In this case, to find the value in the leaf, I need to iterate on all nodes, hence O(n).
A final note is that these complexities hold true for not only find, but also insert and delete operations.
Whenever you see a runtime that has an O(log n) factor in it, there's a very good chance that you're looking at something of the form "keep dividing the size of some object by a constant." So probably the best way to think about this question is - as you're doing lookups in a binary search tree, what exactly is it that's getting cut down by a constant factor, and what exactly is that constant?
For starters, let's imagine that you have a perfectly balanced binary tree, something that looks like this:
       *
     /   \
   *       *
  / \     / \
 *   *   *   *
/ \ / \ / \ / \
* * * * * * * *
At each point in doing the search, you look at the current node. If it's the one you're looking for, great! You're totally done. On the other hand, if it isn't, then you either descend into the left subtree or the right subtree and then repeat this process.
If you walk into one of the two subtrees, you're essentially saying "I don't care at all about what's in that other subtree." You're throwing all the nodes in it away. And how many nodes are in there? Well, with a quick visual inspection - ideally one followed up with some nice math - you'll see that you're tossing out about half the nodes in the tree.
This means that at each step in a lookup, you either (1) find the node that you're looking for, or (2) toss out half the nodes in the tree. Since you're doing a constant amount of work at each step, you're looking at the hallmark of O(log n) behavior: the remaining work drops by a constant factor at each step, and so that can only happen logarithmically many times.
Now of course, not all trees look like this. AVL trees have the fun property that each time you descend down into a subtree, you throw away roughly a golden ratio fraction of the total nodes. This therefore guarantees you can only take logarithmically many steps before you run out of nodes - hence the O(log n) height. In a red/black tree, each step throws away (roughly) a quarter of the total nodes, and since you're shrinking by a constant factor you again get the O(log n) lookup time you'd like. The very fun scapegoat tree has a tuneable parameter that's used to determine how tightly balanced it is, but again you can show that every step you take throws away some constant factor based on this tuneable parameter, giving O(log n) lookups.
However, this analysis breaks down for imbalanced trees. If you have a purely degenerate tree - one where every node has exactly one child - then every step down the tree that you take only tosses away a single node, not a constant fraction. That means that the lookup time gets up to O(n) in the worst case, since the number of times you can subtract a constant from n is O(n).
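Here is a small sketch that counts the nodes visited during a lookup in both extremes. The tuple representation `(key, left, right)` and the helper names are assumptions made for this illustration only.

```python
def build_balanced(keys, lo=0, hi=None):
    """Balanced BST from a sorted list, as nested (key, left, right) tuples."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2  # the middle key becomes the root of this subtree
    return (keys[mid],
            build_balanced(keys, lo, mid),
            build_balanced(keys, mid + 1, hi))

def build_chain(keys):
    """Degenerate BST from a sorted list: every node has only a right child."""
    tree = None
    for key in reversed(keys):
        tree = (key, None, tree)
    return tree

def lookup_steps(tree, target):
    """Return (found, number of nodes visited)."""
    steps = 0
    while tree is not None:
        steps += 1
        key, left, right = tree
        if target == key:
            return True, steps
        tree = left if target < key else right  # discard the other subtree
    return False, steps

keys = list(range(1023))  # 1023 = 2**10 - 1, so the balanced tree is perfect
balanced = build_balanced(keys)
chain = build_chain(keys)
print("balanced, worst lookup:", max(lookup_steps(balanced, k)[1] for k in keys))
print("chain, lookup of last key:", lookup_steps(chain, keys[-1])[1])
```

In the balanced tree each step discards about half of the remaining keys, while in the chain each step discards exactly one.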
If we have a tree of N elements, why the time complexity of looking up
the tree and check if a particular value exists is O(log(n)), how do
we get that?
That's not true. By default, a lookup in a Binary Search Tree is not O(log(n)), where n is a number of nodes. In the worst case, it can become O(n). For instance, if we insert values of the following sequence n, n - 1, ..., 1 (in the same order), then the tree will be represented as below:
        n
       /
    n - 1
     /
  n - 2
   /
 ...
 1
A lookup for a node with value 1 has O(n) time complexity.
To make a lookup more efficient, the tree must be balanced so that its maximum height is proportional to log(n). In such case, the time complexity of lookup is O(log(n)) because finding any leaf is bounded by log(n) operations.
But again, not every Binary Search Tree is a Balanced Binary Search Tree. You must balance it to guarantee the O(log(n)) time complexity.

What exactly does log n height mean?

I came to know that the height of random BSTs, red-black trees and some other trees is O(log n).
I wonder how this can be. Let's say I have a tree like this
The height of the tree is essentially the depth of the tree, which in this case will be 4 (not counting the root level). But how can people say that the height can be represented by the O(log n) notation?
I'm very new to algorithms, and this point is confusing me a lot. What am I missing?
In algorithm complexity the variable n typically refers to the total number of items in a collection or involved in some calculation. In this case, n is the total number of nodes in the tree. So, in the picture you posted n=31. If the height of the tree is O(log n) that means that the height of the tree is proportional to the log of n. Since this is a binary tree, you'd use log base 2.
⌊log₂(31)⌋ = 4
Therefore, the height of the tree should be about 4—which is exactly the case in your example.
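The same arithmetic can be checked for other perfect trees. In this sketch `floor_log2` is a helper name of mine; it relies on the fact that for a positive integer n, `n.bit_length() - 1` equals the floor of log2(n).

```python
def floor_log2(n: int) -> int:
    """Floor of log2(n) for a positive integer n, without floating point."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n.bit_length() - 1

# A perfect binary tree of height h (counted in edges) has 2**(h + 1) - 1 nodes.
for h in range(6):
    n = 2 ** (h + 1) - 1
    print(f"height {h}: n = {n:2d} nodes, floor(log2(n)) = {floor_log2(n)}")
```

For the 31-node tree in the question this gives 4, matching the height counted by hand.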
As I explained in a comment, a binary tree can have multiple cases:
In the degenerate case, a binary tree is simply a chain, and its height is O(n).
In the best case (for most search algorithms), a complete binary tree has the property that for any node, the heights of its subtrees are the same. In this case the height will be the floor of log(n) (base 2, or base k for k branches). You can prove this by induction on the size of the tree (structural induction on the constructors).
In the general case you will have a mix of these: a tree in which any node may have subtrees of different heights.

Resources