I know how to find the minimum numbers of nodes in an AVL tree of height h (which includes external nodes) with the formula n(h) = n(h-1) + n(h-2) + 1 but I was wondering if there was a formula to just find the minimum internal nodes only of an AVL tree with height h.
So for n(3) = 4, if we're only counting internal nodes. n(4) = 7, if we're only counting internal nodes. I can draw it out and count the internal nodes but when you get to bigger AVL trees it's a mess.
I can't seem to find anything on this and trying to find a pattern with consistent answers has only led to hours of frustration. Thanks in advance.
Yep, there’s a nice way to calculate this. Let’s begin with the two simplest AVL trees, which have order 0 and order 1:
* *
|
*
This first tree has no internal nodes, and the second has one internal node. This gives us our base cases for a recurrence relation:
I(0) = 0
I(1) = 1
From here, we notice that the way to get the fewest internal nodes in an AVL tree of order n+2 is to pick two trees of order n and n+1 as children (minimizing the number of nodes) that have the fewest internal nodes possible. The resulting tree will have a number of internal nodes equal to the number of internal nodes in the two subtrees, plus one for the new root. This means that
I(n+2) = I(n) + I(n+1) + 1.
Applying this recurrence gives us the sequence
0, 1, 2, 4, 7, 12, 20, etc.
And hey - have we seen this before somewhere? We have! Adding one to each term gives us
1, 2, 3, 5, 8, 13, 21, etc.
which is the Fibonacci sequence, shifted down two positions! So our hypothesis is that
I(n) = F(n+2) - 1
You can prove that this is the case by induction on n.
Here’s a different way to arrive at this result. Imagine you take an AVL tree of height n and remove all the leaves. You’re now left with an AVL tree of height n-1 (prove this!), and all of the remaining nodes in this tree are the internal nodes of the original tree. The smallest possible number of nodes in an AVL tree of height n is F(n+2)-1, matching our result.
Related
Let us assume that we have a tree consisting on N nodes. The task is to find all longest unique paths in the tree. For example, if the tree looks like following:
Then there are three longest unique paths in the tree: 1 - 2 - 3 - 4 - 5, 6 - 2 - 3 - 4 - 5 and 1 - 2 - 6.
I want to programmatically find and store all such paths for a given tree.
One way to do it would be to compute paths between each pair of node in the tree and then reject the paths which are contained in any other path. However, I am looking for an efficient way to do it. My questions are as follows:
Is it possible to compute this information in less than O(N^2)? I have not been able to think of a solution which would be faster than O(N^2).
If yes, could you be kind enough to guide me towards the solution.
The reason why I want to try it out is because I am trying to solve this problem: KNODES
An algorithm with a time complexity below O(N^2) may only exist, if every solution for a tree with N nodes can be encoded in less than O(N^2) space.
Suppose a complete binary tree with n leaves (N=n log n). The solution to the problem will contain a path for every set of 2 leaves. That means, the solution will have O(n^2) elements. So for this case we can encode the solution as the 2-element sets of leaves.
Now consider a nearly complete binary tree with m leaves, which was created by only removing arbitrary leaves from a complete binary tree with n leaves. When comparing the solution of this tree to that of the complete binary tree, both will share a possibly empty set of paths. In fact for every subset of paths of a solution of a complete binary tree, there will exist at least one binary tree with m leaves as mentioned above, that contains every solution of such a subset. (We intentionally ignore the fact that a tree with m leaves may have some more paths in the solution where at least some of the path ends are not leaves of the complete binary tree.)
Only that part of the solution for a binary tree with m leaves will be encoded by a number with (n^2)/2 bits. The index of a bit in this number represents an element in the upper right half of a matrix with n columns and rows.
For n=4 this would be:
x012
xx34
xxx5
The bit at index i will be set if the undirected path row(i),column(i) is contained in the solution.
As we have already statet that a solution for a tree with m leaves may contain any subset of the solution to the complete binary tree with n>=m leaves, every binary number with (n^2)/2 bits will represent a solution for a tree with m leaves.
Now encoding every possible number with (n^2)/2 bits with less than (n^2)/2 is not possible. So we have shown that solutions at least require O(n^2) space to be represented. Using N=n log n from above we yield a space requirement of at least O(N^2).
Therefore there doens't exist an algorithm with time complexity less than O(N^2)
As far as I could understand, you have a tree without a selected root. Your admissible paths are the paths that do not allow to visit tree nodes twice (you are not allowed to return back). And you need to find all such admissible paths that are not subpaths of any admissible path.
So if I understood right, then if a node has only one edge, than the admissible path either start or stop at this node. If tree is connected, then you can get from any node to any node by one admissible path.
So you select all nodes with one edge, call it S. Then select one of S and walk the whole tree saving the paths to the ends (path, not the walk order). Then you do this with every other item in S and remove duplicated paths (they can be in reverse order: like starting from 1: 1 - 2 - 6 and starting from 6: 6 - 2 - 1).
So here you have to visit all the nodes in the tree as much times as you have leafs in the tree. So complexity depends on the branching factor (in the worst case it is O(n^2). There are some optimizations that can reduce the amount of operations, like you don't have to walk the tree from the last of S.
In this picture the longest paths are {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {1, 2, 3, 7}, {1, 2, 3, 8}, {1, 2, 3, 9}, {1, 2, 3, 10}
For tree like this storing all longest paths will cost you O(N2)
Let's take the tree which looks like the star with n nodes and n-1 edges.
Then you have got C(n-1, 2) unique longest paths.
So the lower limit of complexity can't be less than O(n^2).
Suppose you have got Binary Tree like on your picture.
class Node(object):
def __init__(self, key):
self.key = key
self.right = None
self.left = None
self.parent = None
def append_left(self, node):
self.left = node
node.parent = self
def append_right(self, node):
self.right = node
node.parent = self
root = Node(3)
root.append_left(Node(2))
root.append_right(Node(4))
root.left.append_left(Node(1))
root.left.append_right(Node(6))
root.right.append_right(Node(5))
And we need to get all paths between all leaves. So in your tree they are:
1, 2, 6
6, 2, 3, 4, 5
1, 2, 3, 4, 5
You can do this in (edit: not linear, quadratic) time.
def leaf_paths(node, paths = [], stacks = {}, visited = set()):
visited.add(node)
if node.left is None and node.right is None:
for leaf, path in stacks.iteritems():
path.append(node)
paths.append(path[:])
stacks[node] = [node]
else:
for leaf, path in stacks.iteritems():
if len(path) > 1 and path[-2] == node:
path.pop()
else:
path.append(node)
if node.left and node.left not in visited:
leaf_paths(node.left, paths, stacks, visited)
elif node.right and node.right not in visited:
leaf_paths(node.right, paths, stacks, visited)
elif node.parent:
leaf_paths(node.parent, paths, stacks, visited)
return paths
for path in leaf_paths(root):
print [n.key for n in path]
An output for your tree will be:
[1, 2, 6]
[6, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
The idea is to track all visited leaves while traversing a tree. And to keep stack of paths for each leaf. So here is memory/performance tradeoff.
Draw the tree. Let v be a vertex and p its parent. The length of the longest path including v but not p = (height of left subtree of v) + (height of right subtree of v).
The maximum over all v is the longest path in the graph. You can calculate this in O(n):
First calculate all the intermediate heights. Start at the leaves and work up: (height below v) = 1 + max(height below left child, height below right child)
Then calculate the sum (height of left subtree of v) + (height of right subtree of v) for each vertex v, and take the maximum. This is the length of longest path in the graph.
Is there a formula to calculate what the maximum and minimum height for an AVL tree, given a certain number of nodes?
For example:
Textbook question:
What is the maximum/minimum height for an AVL tree of 3 nodes, 5 nodes, and 7 nodes?
Textbook answer:
The maximum/minimum height for an AVL tree of 3 nodes is 2/2, for 5 nodes is 3/3, for 7 nodes is 4/3
I don't know if they figured it out by some magic formula, or if they draw out the AVL tree for each of the given heights and determined it that way.
The solution below is appropriate for working things out by hand and gaining an intuition, please see the exact formulas at the bottom of this answer for larger trees (54+ nodes).1
Well the minimum height2 is easy, just fill each level of the tree with nodes until you run out. That height is the minimum.
To find the maximum, do the same as for the minimum, but then go back one step (remove the last placed node) and see if adding that node to the opposite sub-tree (from where it just was) violates the AVL tree property. If it does, your max height is just your min height. Otherwise this new height (which should be min height+1) is your max height.
If you need an overview of what the properties of an AVL tree are, or just a general explanation of an AVL tree, Wikipedia is a great place to start.
Example:
Let's take the 7 node example case. You fill in all levels and find a completely filled tree of height 3. (1 at level 1, 2 at level 2, 4 at level 3. 1+2+4=7 nodes.) That means 3 is your minimum.
Now find the max. Remove that last node and place it on the left subtree instead of the right. The right subtree still has height 3, but the left subtree now has height 4. However these values differ by less than 2, so it is still an AVL tree. Therefore your max height is 4. (Which is min+1)
All three examples worked out below (note that the numbers correspond to order of placement, NOT value):
Formulas:
The technique shown above doesn't hold if you have a tree with a very large number nodes. In this case, one can use the following formulas to calculate the exact min/max height2.
Given n nodes3:
Minimum: ceil(log2(n+1))
Maximum: floor(1.44*log2(n+2)-.328)
If you're curious, the first time max-min>1 is when n=54.
1Thanks to Jamie S for bringing this failure at larger node counts to my attention.
2Technically, the height of a tree is the longest path length (in edges) between the root and any leaf node. However the OP's textbook uses a common alternate definition of height as the number of levels in a tree. For consistency with the OP and Wikipedia, we use that definition in this post as well.
3These formulas are from the Wikipedia AVL page, with constants plugged in. The original source is Sorting and searching by Donald E. Knuth (2nd Edition).
It's important to note the following defining characteristics of an AVL Tree.
AVL Tree Property
The nodes of an AVL tree abide by the BST property
AND The heights of the left and right sub-trees of any node differ by no more than 1.
Theorem: The AVL property is sufficient to maintain a worst case tree height of O(log N).
Note the following diagram.
- T1 is comprised of a T0 + 1 node, for a height of 1.
- T2 is comprised of T1 and a T0 + 1 node, giving a height of 2.
- T3 is comprised of a T2 for the left sub-tree and a T1 for the right
sub-tree + 1 node, for a height of 3.
- T4 is comprised of a T3 for the left sub-tree and a T2 for the right
sub-tree + 1 node, for a height of 4.
If you take the ceiling of O(log N), where N represents the number of nodes in an AVL tree, you get the height.
Example) T4 contains 12 nodes. [ceiling]O(log 12) = 4.
See the pattern developing here??
**The worst-case height is
Lets assume the number of nodes is n
Trying to find out the minimum height of an AVL tree would be the same as trying to make the tree complete i.e. fill all the possible nodes at each level and then move to the next level.
So at each level the number of eligible nodes increases by 2^(h-1) where h is the height of the tree.
So at h=1, nodes(1) = 2^(1-1) = 1 node
for h=2, nodes(2) = nodes(1)+2^(2-1) = 3 nodes
for h=3, nodes(3) = nodes(2)+2^(3-1) = 7 nodes
so just find the smallest h, for which nodes(h) is greater than the given number of nodes n.
Now for the problem of maximum height of an AVL tree:-
lets assume that the AVL tree is of height h, F(h) being the number of nodes in the AVL tree,
for its height to be maximum lets assume that its left subtree FL and right subtree FR have a difference in height of 1(as it satisfies the AVL property).
Now assuming FL is a tree with height h-1 and FR be a tree with height h-2.
now the number of nodes in
F(h)=F(h-1)+F(h-2)+1 (Eq 1)
Adding 1 on both sides :
F(h)+1=(F(h-1)+1)+ (F(h-2)+1) (Eq 2)
So we have reduced the maximum height problem to a Fibonacci sequence. And these trees F(h) are called Fibonacci Trees.
So, F(1)=1 and F(2)=2
so in order to get the maximum height just find the index of the the number in the fibonacci sequence which is less than or equal to n.
So applying (Eq 1)
F(3)= F(2) + F(1)+ 1=4, so if n is between 2 and 4 tree will have height 3.
F(4)= F(3)+ F(2)+ 1 = 7, similarly if n is between 4 and 7 tree will have height 4.
and so on.
http://lcm.csa.iisc.ernet.in/dsa/node112.html
It is roughly 1.44 * log n, where n is the number of nodes.
For a more detailed description on how that was derived. You can refer to this link starting on the middle of page 13: http://www.compsci.hunter.cuny.edu/~sweiss/course_materials/csci335/lecture_notes/chapter04.2.pdf
This Wolfram link talked a bit about 'Labelled' Binary tree. So is there something called 'Unlabelled' binary tree as well ? A concise explanation of Both would be really nice.
Why am i searching for this ?
I'm trying to answer this question :
We are given a set of n distinct elements and an unlabeled binary tree with n nodes. In how many ways can we populate the tree with the given set so that it becomes a binary search tree?
Now, i know the number of Binary trees given n nodes is the nth Catalan number, but now i'm confused : which of the above two types does this formula apply to then ?
PS: some help with the question in quotes would be very nice too :)
A binary tree can have labels assigned to each node or not. For a given unlabeled binary tree with n nodes we have n! ways to assign labels. (Consider an in-order traversal of the nodes and which we want to map to a permutation of labels 1..n)
From the above we can see that nth Catalan number gives the number of unlabeled binary trees.
Take for example n = 3. We have the following trees 5 trees:
1. o 2. o 3. o 4. o 5. o
\ \ / \ / /
o o o o o o
/ \ / \
o o o o
in general this number is given by the formulae of the N-th Catalan Number.
To get the number of labeled trees you have to multiply by n! so for n = 3 we have 30 trees in total. Basically for each of the five unlabeled BSTs above we create !3 = 6 labeled BSTs with labels:
1: 1, 2, 3
2: 1, 3, 2
3: 2, 1, 3
4: 2, 3, 1
5: 3, 1, 2
6: 3, 2, 1
Hope this helps to understand the difference.
Well, as far as I understood, the "unlabeled" means that we don't know the nodes of this tree.
The problem then asks us how many different ways to assign the element to the node so the tree will be binary search tree.
UPDATE: It means that we want to set value for each node from that N elements, so we don't break the main binary search tree condition. It means that after all nodes have values, we still have binary tree - that the key in any node is larger than the keys in all nodes in that node's left sub-tree and smaller than the keys in all nodes in that node's right sub-tree.
Binary search tree on Wikipedia
We have came across a question in Thomas H. Cormen which are asking for showing
Here I am confused by this question that how there will be at most nodes
For instance, consider this problem:
In the above problem at height 2 there are 2 nodes. But if we calculate by formula:
Greatest Integer of (10/2^2+1) = 4
it does not satisfy Thomas H. Cormen questions.
Please correct me if I am wrong here.
Thanks in Advance
In Tmh Corman I observed that he is doing height numbering from 1 not from 0 so the formula is correct, I was doing wrong Interpration. So leaf as height 1 and root has height 4 for above question
Reading all the answers, I realized that the confusion comes from the precise definition of height. In page 153 of CLRS book, the height is defined as follows:
Viewing a heap as a tree, we define the height of a node in a heap to be the number of edges on the longest simple downward path from the node to a leaf...
Now let's look at the original heap provided by Nishant. The nodes 8, 9, 10, 6, and 7 are at height 0 (i.e., leaves). The nodes 4, 5 and 3 are at height 1. For example, there is one edge between node 5 and its leaf, node 10. Also there is one edge between node 3 and its leaf node 6. Node 6 looks like it is at height 1 but it is at height 0 and hence a leaf. The node 2 is the only node at height 2. You may wonder node 1 (the root) is two edges away from node 6 and 7 (leaves), and say node 1 is also at height 2. But if we look back at the definition, the bold-face word "longest" suggests that the longest simple downward path from the root to a leaf has 3 edges (passing node 2). Finally, the node 1 is at height 3.
In summary, there are 5, 3, 1, 1 nodes at height 0, 1, 2, 3, respectively.
Let's apply the formula to the observation we made in the above paragraph. I would like to point out that the formula given by Nishant is not correct.
It should be
ceiling(n/2^(h+1)) not ceiling(n/(2^h+1). Sorry about the terrible formatting. I am not able to post an image yet.
Anyways, using the correct formula,
h = 0, ceiling(10/2) = 5 (nodes 8, 9, 10, 6, and 7)
h = 1, ceiling(10/4) = 3 (nodes 4, 5 and 3)
h = 2, ceiling(10/8) = 2 (node 2, but this is okay because the formula is predicting that there are at most 2 nodes at height 2.)
h = 3, ceiling(10/16) = 1 (node 1)
With the correct definition of height, the formula works.
It looks like your formula says there are at most [n/2^h+1] nodes of height h. In your example there are two nodes of height 2, which is less than your computed possible maximum of 4(ish).
While calculating the tight bound for Build-Max-Heap author has used this property in the equation.
In this case we call the helper Max-Heapify which takes O(h) where h is the height of the sub-tree rooted at the current node (not the height of node itself with respect to the full tree).
Therefore if we consider the sub tree rooted at leaf node, it will have height 0 and number of nodes in the tree at that level would be at most n / 20+1 = n/2 (i.e h=0 for the sub tree formed from node at leaves).
Similarly for sub-tree rooted at actual root the height of the tree would be log(n) and in that case the number of nodes at that level would be 1 i.e floor of n / 2logn+1 = [n/n+1].
Formula for
no. of nodes = n/(2^(h+1))
so when h is 2, and n = 10
no. of nodes = 10/(2^(2+1)) = 10/(2^3) = 10/8 = 1.25
But
ceil of 10/8 = 2
Hence there are 2 nodes which you can see from the figure.
Though it is mentioned in Cormen that height of a node is the greatest distance traveled from node to leaf(the number of edges), if you take height to be the distance of a node from the leaf, i.e. at leaf the height is zero and at root the height is log(n). The formula stands correct.
As for the leaves you have h=0; hence by the formula n/(2^(h+1))
h=0; max number of leaves in the heap will be n/2.
what about height 1. Cormen's theory gives 10/(2^(1+1))=3(ceil) while there is 4 nodes at height 1. This is a contradiction.
It is not true that Thomas H. Cormen is counting the height of the tree starting from one, height is h = 0, 1, ..., log n and it increases as you go upwards:
and in the following formula, he added 1 plus the height:
All the confusion is coming from the fact that this will work nicely with Perfect Binary Trees, not with the one you are showing in your question, this is why he is saying ON MOST
when you consider Big-O it wouldn't really matter
This formula is wrong, it gives wrong answers in many cases like in this question for h=1 (ie second last level) it gives maximum number of nodes is 3 but there are 4 nodes. Also let us consider a tree with 4 nodes :
a
/ \
b c
/
d
node d has height 0, let us consider for height =1 using the formula n/2^(h+1) we get
4/2^(1+1) = 1
which means this level can have at most 1 node which is false !
so this formula is not right.
The formula is quite correct. Nothing is wrong with the formula!!
Lets take the tree(although its not heap yet its complete) in the question posed by Nishant on the top.
For h=0 means all leaves so ceil(10/2^(0+1)=5) so there are 5 leaves
For h=1 means all nodes which have one arc to reach the leaves, so ceil(10/2^(1+1))=3 there are 3 such nodes in your tree.
For h=2 means all nodes which have two consecutive arcs to reach leaves, so ceil(10/2^(2+1))=1 so you have only one such node(left successor of the root)
For h=3 means all nodes which have three arcs to leaves, so ceil(10/2^(3+1))=1 which is the root.
Moral of the story is that you are confused between height and level. Level starts from up to down. Which means you have 4 nodes on level 2. i.e you can reach 4 nodes if you start at root and moves two arcs down.
Whereas height is completely different. Like in above case at height 0 there are 5 nodes (3 on level 3, and 2 on level 2). Hence height h of a node n means how many arcs you can travel to reach a leaf.
regards,
Hope it clarifies the point.
Safdar from Pakistan
Can anybody give me proof how the number of nodes in strictly binary tree is 2n-1 where n is the number of leaf nodes??
Proof by induction.
Base case is when you have one leaf. Suppose it is true for k leaves. Then you should proove for k+1. So you get the new node, his parent and his other leaf (by definition of strict binary tree). The rest leaves are k-1 and then you can use the induction hypothesis. So the actual number of nodes are 2*(k-1) + 3 = 2k+1 == 2*(k+1)-1.
just go with the basics, assuming there are x nodes in total, then we have n nodes with degree 1(leaves), 1 with degree 2(the root) and x-n-1 with degree 3(the inner nodes)
as a tree with x nodes will have x-1 edges. so summing
n + 3*(x-n-1) + 2 = 2(x-1) (equating the total degrees)
solving for x we get x = 2n-1
I'm guessing that what you really want is something like a proof that the depth is log2(N), where N is the number of nodes. In this case, the answer is fairly simple: for any given depth D, the number of nodes is 2D.
Edit: in response to edited question: the same fact pretty much applies. Since the number of nodes at any depth is 2D, the number of nodes further up the tree is 2D-1 + 2D-2 + ...20 = 2D-1. Therefore, the total number of nodes in a balanced binary tree is 2D + 2D-1. If you set n = 2D, you've gone the full circle back to the original equation.
I think you are trying to work out a proof for: N = 2L - 1 where L is the number
of leaf nodes and N is the total number of nodes in a binary tree.
For this formula to hold you need to put a few restrictions on how the binary
tree is constructed. Each node is either a leaf, which means it has no children, or
it is an internal node. Internal nodes have 3
possible configurations:
2 child nodes
1 child and 1 internal node
2 internal nodes
All three configurations imply that an internal node connects to two other nodes. This explicitly
rules out the situation where node connects to a single child as in:
o
/
o
Informal Proof
Start with a minimal tree of 1 leaf: L = 1, N = 1 substitute into N = 2L - 1 and the see that
the formula holds true (1 = 1, so far so good).
Now add another minimal chunk to the tree. To do that you need to add another two nodes and
tree looks like:
o
/ \
o o
Notice that you must add nodes in pairs to satisfy the restriction stated earlier.
Adding a pair of nodes always adds
one leaf (two new leaf nodes, but you loose one as it becomes an internal node). Node growth
progresses as the series: 1, 3, 5, 7, 9... but leaf growth is: 1, 2, 3, 4, 5... That is why the formula
N = 2L - 1 holds for this type of tree.
You might use mathematical induction to construct a formal proof, but this works find for me.
Proof by mathematical induction:
The statement that there are (2n-1) of nodes in a strictly binary tree with n leaf nodes is true for n=1. { tree with only one node i.e root node }
let us assume that the statement is true for tree with n-1 leaf nodes. Thus the tree has 2(n-1)-1 = 2n-3 nodes
to form a tree with n leaf nodes we need to add 2 child nodes to any of the leaf nodes in the above tree. Thus the total number of nodes = 2n-3+2 = 2n-1.
hence, proved
To prove: A strictly binary tree with n leaves contains 2n-1 nodes.
Show P(1): A strictly binary tree with 1 leaf contains 2(1)-1 = 1 node.
Show P(2): A strictly binary tree with 2 leaves contains 2(2)-1 = 3 nodes.
Show P(3): A strictly binary tree with 3 leaves contains 2(3)-1 = 5 nodes.
Assume P(K): A strictly binary tree with K leaves contains 2K-1 nodes.
Prove P(K+1): A strictly binary tree with K+1 leaves contains 2(K+1)-1 nodes.
2(K+1)-1 = 2K+2-1
= 2K+1
= 2K-1 +2*
* This result indicates that, for each leaf that is added, another node must be added to the father of the leaf , in order for it to continue to be a strictly binary tree. So, for every additional leaf, a total of two nodes must be added, as expected.
int N = 1000; insert here the value of N
int sum = 0; // the number of total nodes
int currFactor = 1;
for (int i = 0; i< log(N); ++i) //the is log(N) levels
{
sum += currFactor;
currFactor *= 2; //in each level the number of node is double than the upper level
}
if(sum == 2*N - 1)
{
cout<<"wow that the number of nodes is 2*N-1";
}