How to generate an AVL tree as lopsided as possible? - algorithm

I saw this in a paper, and someone argued that deleting a node from an AVL tree requires at most log(n) rotations. I believe we can reach that bound by generating an AVL tree that is as lopsided as possible. The problem is how to do this. It would help me a lot in researching rotations on removal. Thanks very much!

If you want to make a maximally lopsided AVL tree, you are looking for a Fibonacci tree, which is defined inductively as follows:
A Fibonacci tree of order 0 is empty.
A Fibonacci tree of order 1 is a single node.
A Fibonacci tree of order n + 2 is a node whose left child is a Fibonacci tree of order n and whose right child is a Fibonacci tree of order n + 1.
For example, here's a Fibonacci tree of order 5:
Fibonacci trees represent the maximum amount of skew an AVL tree can have: every internal node already has subtrees whose heights differ by exactly one, so making any node more lopsided would push its balance factor beyond the limit AVL trees allow.
You can use this definition to very easily generate maximally lopsided AVL trees:
function FibonacciTree(int order) {
    if order = 0, return the empty tree.
    if order = 1, create a single node and return it.
    otherwise:
        let left = FibonacciTree(order - 2)
        let right = FibonacciTree(order - 1)
        return a tree whose left child is "left" and whose right child is "right."
}
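The pseudocode above can be turned into a small runnable sketch. The `Node` class and the helper names `height`, `size` and `is_avl_shaped` are assumptions made for illustration (keys are omitted because only the shape matters here); height is counted in levels, so the empty tree has height 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical shape-only node: no key, just the two children.
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None


def fibonacci_tree(order: int) -> Optional[Node]:
    """Build a Fibonacci tree of the given order (None is the empty tree)."""
    if order == 0:
        return None
    if order == 1:
        return Node()
    # Order n + 2: left child of order n, right child of order n + 1.
    return Node(left=fibonacci_tree(order - 2), right=fibonacci_tree(order - 1))


def height(t: Optional[Node]) -> int:
    """Height counted in levels; the empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def size(t: Optional[Node]) -> int:
    """Number of nodes in the tree."""
    return 0 if t is None else 1 + size(t.left) + size(t.right)


def is_avl_shaped(t: Optional[Node]) -> bool:
    """Check that every node's subtrees differ in height by at most 1."""
    if t is None:
        return True
    balanced_here = abs(height(t.left) - height(t.right)) <= 1
    return balanced_here and is_avl_shaped(t.left) and is_avl_shaped(t.right)
```

With these definitions, a tree of order 5 has 12 nodes spread over 5 levels, and every node still satisfies the AVL balance condition.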
Hope this helps!

Related

Algorithm to count number of nodes in an AVL tree

Assume the following notation/operations on AVL trees. An empty AVL tree is denoted E. A
non-empty AVL tree T has three attributes:
• The key T.key is the root node’s key.
• The left child T.left is T’s left subtree, which is an AVL tree (possibly E).
• The right child T.right is T’s right subtree, which is an AVL tree (possibly E).
I'm trying to write an algorithm (pseudocode would do) Count(T, lo, hi) that counts and returns the number of nodes in an AVL tree with root T, where the key value is in the range lo ≤ key ≤ hi. I want it to have time complexity O(n) where n is the number of nodes in the AVL tree T. One idea I had was recursion but this didn't seem to have the required complexity. Any ideas?
You can add a global variable such as a counter, traverse the tree in pre-order (this costs O(n + e), where e is the number of edges), and add 1 for each node whose key lies in the range lo ≤ key ≤ hi.
Alternatively, to track the total number of nodes, keep a counter in the data structure: add 1 when you insert a node and subtract 1 when you remove one.
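A recursive Count(T, lo, hi) does meet the O(n) requirement, because each node is visited at most once. Here is a minimal sketch, assuming a hypothetical `Tree` class with the `key`, `left` and `right` attributes from the question, and `None` standing in for the empty tree E. It also uses the search-tree ordering to skip subtrees that cannot contain keys in the range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    # Hypothetical node type mirroring T.key, T.left, T.right from the question.
    key: int
    left: "Optional[Tree]" = None
    right: "Optional[Tree]" = None


def count(t: Optional[Tree], lo: int, hi: int) -> int:
    """Count the nodes whose key satisfies lo <= key <= hi."""
    if t is None:
        return 0
    if t.key < lo:
        # Everything in the left subtree is smaller still, so skip it.
        return count(t.right, lo, hi)
    if t.key > hi:
        # Everything in the right subtree is larger still, so skip it.
        return count(t.left, lo, hi)
    return 1 + count(t.left, lo, hi) + count(t.right, lo, hi)
```

Even without the pruning, the recursion touches every node once, so the worst case is O(n).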

Why is the time complexity of performing n union find (union by size) operations O(n log n)?

In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.
To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.
In union by size, when performing a union, we make the root of the smaller tree point to the root of the larger. This implies O(n log n) time for performing n union find operations. Each time we follow a pointer, we are going to a subtree of size at least double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.
I do not understand how, for each union operation, the find operation is always O(log n). Can someone please explain how the worst-case complexity is actually computed?
Let's assume for the moment that each tree of height h contains at least 2^h nodes. What happens if you join two such trees?
If they are of different heights, the height of the combined tree is the same as the height of the taller one, so the new tree still has at least 2^h nodes (same height but more nodes).
If they are the same height, the resulting tree's height increases by one, and it contains at least 2^h + 2^h = 2^(h+1) nodes. So the condition still holds.
The most basic trees (1 node, height 0) also fulfill the condition. It follows that all trees that can be constructed by joining two trees together fulfill it as well.
Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h), this immediately gives log2(n) >= h >= steps.
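The invariant argued above (a node at depth d sits in a tree of at least 2^d nodes) can be checked with a small experiment. This is only a sketch: the class `UnionFind` and its `depth` helper are names chosen for illustration, and path compression is deliberately left out so that the pointer chains stay visible.

```python
class UnionFind:
    """Union by size without path compression (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # a root points to itself
        self.size = [1] * n           # only meaningful at roots

    def find(self, v: int) -> int:
        """Follow parent pointers until reaching a self-referencing root."""
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def depth(self, v: int) -> int:
        """Number of pointers a find(v) has to follow."""
        steps = 0
        while self.parent[v] != v:
            v = self.parent[v]
            steps += 1
        return steps

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        # Attach the smaller tree's root below the larger tree's root.
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```

Always merging two trees of equal size is the worst case: with 16 elements merged pairwise in rounds, the deepest node ends up 4 = log2(16) pointers away from its root, and no sequence of unions can make it deeper.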
With the path compression optimization, you can do n union find operations (union by rank or size) in O(n lg* n), where lg* n is the iterated logarithm. A tighter analysis gives O(n α(n)), where α is the inverse Ackermann function.
Note that O(n lg* n) is better than O(n log n).
In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.
We need to prove that the maximum height of any tree is log(N), where N is the number of items in the union-find structure (UF) and log is base 2 (1).
In the base case, all trees have a height of 0 = log(1), so (1) is of course satisfied.
Now, assuming all the trees satisfy (1), we need to prove that joining any 2 trees with i and j nodes (i <= j) creates a new tree whose maximum height is log(i + j) (2).
Because the join takes the root node of the smaller tree and attaches it to the root node of the bigger one, the height of the new tree is at most:
max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
The last step holds because j <= i + j and, since i <= j, 2i <= i + j.
log(j): the height of the new tree is still the height of the bigger tree
1 + log(i): when the heights of the 2 trees are the same
See the picture below for more details:
Ref: book Algorithms

Big O Time Complexity for Recursive Pattern

I have a question about the runtime of a recursive pattern.
Example 1
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
I can understand that the runtime for the above code is O(2^N) because if I pass 5, it calls 4 twice then each 4 calls 3 twice and follows till it reaches 1 i.e., something like O(branches^depth).
Example 2
Balanced Binary Tree
int sum(Node node) {
    if (node == null) {
        return 0;
    }
    return sum(node.left) + node.value + sum(node.right);
}
I read that the runtime for the above code is O(2^log N) since the tree is balanced, but I still see it as O(2^N). Can anyone explain it?
When the number of elements is halved each time, the runtime is log N. But how does a binary tree work here?
Is it 2^log N just because it is balanced?
What if it is not balanced?
Edit:
I know that O(2^log N) simplifies to O(N), but I am seeing it as O(2^N).
Thanks!
Summing a binary tree has complexity O(n), like any other tree here, because you ultimately traverse all of the elements of the tree. Halving does nothing special; it only means the sum for each child is calculated separately.
The term comes about this way because, if the tree is balanced, it has log_2(n) levels and 2^(log_2(n)) is the number of elements in the tree (leaf + non-leaf).
Again, if it is not balanced, it doesn't matter. We are doing an operation for which every element needs to be considered, making the runtime O(n).
Where could it have mattered? If we were searching for an element, then whether the tree is balanced or not would have mattered.
I'll take a stab at this.
In a balanced binary tree, you should have half the child nodes to the left and half to the right of each parent node. The first layer of the tree is the root, with 1 element, then 2 elements in the next layer, then 4 elements in the next, then 8, and so on. So for a tree with L layers, you have 2^L - 1 nodes in the tree.
Reversing this, if you have N elements to insert into a tree, you end up with a balanced binary tree of depth L = log_2(N), so you only ever need to call your recursive algorithm for log_2(N) layers. At each layer, you are doubling the number of calls to your algorithm, so in your case you end up with 2^log_2(N) calls and O(2^log_2(N)) run time. Note that 2^log_2(N) = N, so it's the same either way, but we'll get to the advantage of a binary tree in a second.
If the tree is not balanced, you end up with depth greater than log_2(N), so you have more recursive calls. In the extreme case, when all of your children are to the left (or right) of their parent, you have N recursive calls, but each call returns immediately from one of its branches (no child on one side). Thus you would have O(N) run time, which is the same as before. Every node is visited once.
An advantage of a balanced tree is in cases like search. If the left-hand child is always less than the parent, and the right-hand child is always greater than, then you can search for an element n among N nodes in O(log_2(N)) time (not 2^log_2(N)!). If, however, your tree is severely imbalanced, this search becomes a linear traversal of all of the values and your search is O(N). If N is extremely large, or you perform this search a ton, this can be the difference between a tractable and an intractable algorithm.
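The claim that the shape of the tree does not change the cost of the sum can be checked by counting calls. This is a sketch in Python rather than the question's Java-style code; `tree_sum`, `balanced` and `chain` are hypothetical helper names, and the one-element list `calls` is just a simple mutable counter.

```python
from typing import List, Optional


class Node:
    def __init__(self, value: int,
                 left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.value = value
        self.left = left
        self.right = right


def tree_sum(node: Optional[Node], calls: List[int]) -> int:
    """Same recursion as sum() in the question; calls[0] counts invocations."""
    calls[0] += 1
    if node is None:
        return 0
    return tree_sum(node.left, calls) + node.value + tree_sum(node.right, calls)


def balanced(values: List[int]) -> Optional[Node]:
    """Build a balanced tree by always picking the middle element as root."""
    if not values:
        return None
    mid = len(values) // 2
    return Node(values[mid], balanced(values[:mid]), balanced(values[mid + 1:]))


def chain(values: List[int]) -> Optional[Node]:
    """Build a degenerate tree in which every node has only a left child."""
    node = None
    for v in values:
        node = Node(v, left=node)
    return node
```

For N nodes the recursion is entered once per node and once per empty child, that is 2N + 1 times in total, whether the tree is balanced or a chain. Both shapes are therefore O(N).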

Cracking the Coding Interview 6th Edition: 10.10. Rank from Stream

The problem statement is as follows:
Imagine you are reading in a stream of integers. Periodically, you wish to be able to look up the rank of a number x (the number of values less than or equal to x). Implement the data structures and algorithms to support these operations. That is, implement the method track(int x), which is called when each number is generated, and the method getRankOfNumber(int x), which returns the number of values less than or equal to x (not including x itself).
EXAMPLE: Stream (in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3
getRankOfNumber(1) = 0
getRankOfNumber(3) = 1
getRankOfNumber(4) = 3
The suggested solution uses a modified Binary Search Tree, where each node stores the number of nodes to the left of that node. The time complexity for both methods is O(logN) for a balanced tree and O(N) for an unbalanced tree, where N is the number of nodes.
But how can we construct a balanced BST from a stream of random integers? Won't the tree become unbalanced in due time if we keep on adding to the same tree and the root is not the median? Shouldn't the worst case complexity be O(N) for this solution (in which case a HashMap with O(1) and O(N) for track() and getRankOfNumber() respectively would be better)?
You just need to build an AVL or Red-Black tree to get the O(lg n) complexities you desire.
The rank is fairly simple. Let count(T) be the number of elements in the tree rooted at T.
The rank of a node N is built up as follows:
First, there are count(N's left subtree) nodes before N (elements smaller than N).
Let A be N's parent. If N is the right child of A, then there are another 1 + count(A's left subtree) nodes before N.
If A is the right child of some B, then there are another 1 + count(B's left subtree) nodes before N.
Continue recursively all the way up to the root, adding 1 + count(left subtree) for every ancestor that you reach from its right child.
As the height of a balanced tree is O(lg n), this method takes O(lg n) to return a rank (O(lg n) to find the node + O(lg n) to walk back up and accumulate the rank), provided that every node stores the sizes of its left and right subtrees.
Hope that helps :)
Building a Binary Search Tree (BST) from the stream of numbers should be easy to imagine. All the values less than a node go to its left and all the values greater than it go to its right.
Then the rank of any x is the number of nodes in the left subtree of the node with value x, plus, for each ancestor where the search path went right, one plus the size of that ancestor's left subtree.
Operations to be done: find the node with value x, O(logN) in a balanced tree, then count the nodes of its left subtree. Counting by traversal costs time proportional to the size of that subtree, so the total is O(logN) only if the count is already stored in the node.
To get the size of the left subtree in O(1), you can keep another class variable 'leftSubTreeSize' in the Node class and update it during insertion of a node.
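The left-subtree-size idea discussed in this thread can be sketched as follows. The class names `RankTracker` and `_Node` and the field `left_size` are assumptions made for illustration. The tree here is a plain BST with no rebalancing, so the O(log N) bound only holds while the tree happens to stay balanced; a self-balancing tree would also have to update `left_size` during rotations.

```python
from typing import Optional


class _Node:
    def __init__(self, value: int) -> None:
        self.value = value
        self.left: "Optional[_Node]" = None
        self.right: "Optional[_Node]" = None
        self.left_size = 0  # number of nodes in the left subtree


class RankTracker:
    """Plain (unbalanced) BST whose nodes remember their left subtree size."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def track(self, x: int) -> None:
        if self.root is None:
            self.root = _Node(x)
            return
        node = self.root
        while True:
            if x <= node.value:
                # x lands in the left subtree, so that subtree grows by one.
                node.left_size += 1
                if node.left is None:
                    node.left = _Node(x)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = _Node(x)
                    return
                node = node.right

    def get_rank_of_number(self, x: int) -> int:
        """Number of tracked values <= x, not counting x itself; -1 if absent."""
        rank = 0
        node = self.root
        while node is not None:
            if x == node.value:
                return rank + node.left_size
            if x < node.value:
                node = node.left
            else:
                # Going right skips this node and its whole left subtree.
                rank += node.left_size + 1
                node = node.right
        return -1
```

Fed the example stream 5, 1, 4, 4, 5, 9, 7, 13, 3, this sketch reproduces the ranks from the problem statement: 0 for 1, 1 for 3 and 3 for 4.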

Finding the minimum and maximum height in a AVL tree, given a number of nodes?

Is there a formula to calculate the maximum and minimum height of an AVL tree, given a certain number of nodes?
For example:
Textbook question:
What is the maximum/minimum height for an AVL tree of 3 nodes, 5 nodes, and 7 nodes?
Textbook answer:
The maximum/minimum height for an AVL tree of 3 nodes is 2/2, for 5 nodes is 3/3, for 7 nodes is 4/3
I don't know if they figured it out by some magic formula, or if they draw out the AVL tree for each of the given heights and determined it that way.
The solution below is appropriate for working things out by hand and gaining an intuition; please see the formulas at the bottom of this answer for larger trees (54+ nodes). [1]
Well, the minimum height [2] is easy: just fill each level of the tree with nodes until you run out. That height is the minimum.
To find the maximum, do the same as for the minimum, but then go back one step (remove the last placed node) and see if adding that node to the opposite sub-tree (from where it just was) violates the AVL tree property. If it does, your max height is just your min height. Otherwise this new height (which should be min height+1) is your max height.
If you need an overview of what the properties of an AVL tree are, or just a general explanation of an AVL tree, Wikipedia is a great place to start.
Example:
Let's take the 7 node example case. You fill in all levels and find a completely filled tree of height 3. (1 at level 1, 2 at level 2, 4 at level 3. 1+2+4=7 nodes.) That means 3 is your minimum.
Now find the max. Remove that last node and place it on the left subtree instead of the right. The right subtree still has height 3, but the left subtree now has height 4. However these values differ by less than 2, so it is still an AVL tree. Therefore your max height is 4. (Which is min+1)
All three examples worked out below (note that the numbers correspond to order of placement, NOT value):
Formulas:
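The technique shown above doesn't hold if you have a tree with a very large number of nodes. In this case, one can use the following formulas to calculate the min/max height [2]. The minimum is exact; the maximum comes from an upper bound with rounded constants, so it can overshoot by one just below the node count at which the next height becomes possible (for example, n = 3 gives 3, while the true maximum is 2).
Given n nodes [3]: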
Minimum: ceil(log2(n+1))
Maximum: floor(1.44*log2(n+2)-.328)
If you're curious, the first time max-min>1 is when n=54.
[1] Thanks to Jamie S for bringing this failure at larger node counts to my attention.
[2] Technically, the height of a tree is the longest path length (in edges) between the root and any leaf node. However, the OP's textbook uses a common alternate definition of height as the number of levels in a tree. For consistency with the OP and Wikipedia, we use that definition in this post as well.
[3] These formulas are from the Wikipedia AVL page, with constants plugged in. The original source is Sorting and Searching by Donald E. Knuth (2nd Edition).
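For checking small cases without drawing trees, the two heights can also be computed directly. This is a sketch with hypothetical function names: the minimum uses the closed form given above, while the maximum deliberately avoids the rounded-constant formula and instead uses the standard recurrence for the fewest nodes an AVL tree of a given height can have, N(h) = N(h-1) + N(h-2) + 1. Height is counted in levels, as in the rest of this answer.

```python
def min_height(n: int) -> int:
    """ceil(log2(n + 1)), computed with integers to avoid rounding issues."""
    # For n >= 0, n.bit_length() equals ceil(log2(n + 1)).
    return n.bit_length()


def max_height(n: int) -> int:
    """Largest h such that an AVL tree with h levels fits into n nodes."""
    if n == 0:
        return 0
    h = 1
    prev, curr = 0, 1  # N(0) and N(1): fewest nodes for heights 0 and 1
    while True:
        nxt = curr + prev + 1  # N(h + 1)
        if nxt > n:
            return h
        prev, curr = curr, nxt
        h += 1
```

This reproduces the textbook values (3 nodes: 2/2, 5 nodes: 3/3, 7 nodes: 4/3 as max/min) and confirms that n = 54 is the first node count where the two heights differ by more than one.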
It's important to note the following defining characteristics of an AVL Tree.
AVL Tree Property
The nodes of an AVL tree abide by the BST property
AND The heights of the left and right sub-trees of any node differ by no more than 1.
Theorem: The AVL property is sufficient to maintain a worst case tree height of O(log N).
Note the following diagram.
- T1 is comprised of a T0 + 1 node, for a height of 1.
- T2 is comprised of T1 and a T0 + 1 node, giving a height of 2.
- T3 is comprised of a T2 for the left sub-tree and a T1 for the right
sub-tree + 1 node, for a height of 3.
- T4 is comprised of a T3 for the left sub-tree and a T2 for the right
sub-tree + 1 node, for a height of 4.
For these trees, taking the ceiling of log2(N), where N is the number of nodes in the AVL tree, gives the height.
Example: T4 contains 12 nodes. ceil(log2(12)) = 4.
See the pattern developing here??
**The worst-case height is
Let's assume the number of nodes is n.
Finding the minimum height of an AVL tree is the same as making the tree complete, i.e. filling all the possible nodes at each level before moving to the next level.
So at each level h the number of eligible nodes increases by 2^(h-1), where h is the height of the tree so far.
So at h=1, nodes(1) = 2^(1-1) = 1 node
for h=2, nodes(2) = nodes(1)+2^(2-1) = 3 nodes
for h=3, nodes(3) = nodes(2)+2^(3-1) = 7 nodes
So just find the smallest h for which nodes(h) is greater than or equal to the given number of nodes n.
Now for the problem of the maximum height of an AVL tree:
Let's assume that the AVL tree has height h, with F(h) being the minimum number of nodes in an AVL tree of that height.
For the height to be maximal, let's assume that its left subtree FL and right subtree FR differ in height by 1 (which still satisfies the AVL property).
Now assume FL is a tree of height h-1 and FR is a tree of height h-2.
Then the number of nodes is
F(h)=F(h-1)+F(h-2)+1 (Eq 1)
Adding 1 to both sides:
F(h)+1=(F(h-1)+1)+ (F(h-2)+1) (Eq 2)
So we have reduced the maximum height problem to a Fibonacci sequence. These trees F(h) are called Fibonacci trees.
So, F(1)=1 and F(2)=2
So in order to get the maximum height, just find the largest index h for which F(h) is less than or equal to n.
So applying (Eq 1)
F(3)= F(2) + F(1)+ 1=4, so if n is at least 4 (and less than F(4)) the tree can have height 3.
F(4)= F(3)+ F(2)+ 1 = 7, similarly if n is at least 7 (and less than F(5) = 12) the tree can have height 4.
and so on.
http://lcm.csa.iisc.ernet.in/dsa/node112.html
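The step from (Eq 2) to a logarithmic height bound can be written out as follows. The symbols G(h), Fib_k, φ and ψ are introduced here for illustration and are not part of the answer above; Fib_k is the usual Fibonacci sequence with Fib_1 = Fib_2 = 1.

```latex
\begin{align*}
G(h) &:= F(h) + 1, \qquad G(h) = G(h-1) + G(h-2), \quad G(1) = 2,\ G(2) = 3 \\
     &\Rightarrow\ G(h) = \mathrm{Fib}_{h+2}
      \quad\Rightarrow\quad F(h) = \mathrm{Fib}_{h+2} - 1 \\[4pt]
\mathrm{Fib}_k &= \frac{\varphi^{k} - \psi^{k}}{\sqrt{5}} > \frac{\varphi^{k}}{\sqrt{5}} - 1,
      \qquad \varphi = \frac{1+\sqrt{5}}{2},\ \psi = \frac{1-\sqrt{5}}{2} \\[4pt]
n \ge F(h) &\Rightarrow\ n + 2 > \frac{\varphi^{h+2}}{\sqrt{5}} \\
     &\Rightarrow\ h < \log_{\varphi}(n+2) + \log_{\varphi}\sqrt{5} - 2
      \approx 1.4404\,\log_2(n+2) - 0.3277
\end{align*}
```

As a sanity check, F(3) = Fib_5 - 1 = 4 and F(4) = Fib_6 - 1 = 7, matching the values computed from (Eq 1).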
It is roughly 1.44 * log2(n), where n is the number of nodes.
For a more detailed description of how that was derived, you can refer to this link, starting in the middle of page 13: http://www.compsci.hunter.cuny.edu/~sweiss/course_materials/csci335/lecture_notes/chapter04.2.pdf
