Ideal height of tree structure - data-structures

How can I calculate the ideal height of a tree structure?
When I have this tree
I know the height is 4.
There's a formula that says that the ideal height of a tree is 2 ^ height - 1 but that doesn't make sense to me (since it would be 15).
Can someone please explain?

Well, first of all, that formula applies only to binary trees. Second, it does not give a height: 2^height - 1 is the maximum number of nodes in the tree. For a saturated (full) binary tree of height 4, the number of nodes will be 15.

That formula is for the maximum number of nodes that can be included in a binary tree of that height. Assuming you want the tree to be as shallow as possible, you want to know the minimum height of such a tree given the number of nodes. So you simply invert:
nodes = 2^height - 1
to get
height = log2(nodes + 1)
rounded up.
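As a rough sketch of the inversion above (the function names are made up for illustration, and height is counted in levels, so a single node has height 1):

```python
def max_nodes(height: int) -> int:
    """Maximum number of nodes in a binary tree with `height` levels."""
    return 2 ** height - 1


def min_height(nodes: int) -> int:
    """Smallest number of levels a binary tree needs to hold `nodes` nodes."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # For integers, ceil(log2(nodes + 1)) equals the bit length of `nodes`,
    # which avoids floating-point rounding for large values.
    return nodes.bit_length()


print(max_nodes(4))    # 15
print(min_height(15))  # 4
print(min_height(16))  # 5
```

Using `bit_length` rather than `math.log2` is only a design choice to keep the arithmetic exact; the formula is the same.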

The height of a tree is the maximum height among all the nodes in the tree. Now say you have this tree:
        1
      /   \
     2     3
    / \   / \
   4   5 6   7
The height of the tree is 3 (counting nodes; all root-to-leaf paths have the same length, so take 1-2-5 as the longest). There are three levels, and the number of nodes on each level is:
        1          = 2^0
      /   \
     2     3       = 2^1
    / \   / \
   4   5 6   7     = 2^2
total = 2^0 + 2^1 + 2^2. This is a geometric progression with sum 2^3 - 1, hence the number of nodes = 2^height - 1.
If you talk about levels instead (they start from 0), the number of nodes = 2^(level+1) - 1.
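A quick way to convince yourself of the geometric sum is to add up the levels directly. This is only an illustrative sketch; `nodes_per_level` is a name invented here:

```python
def nodes_per_level(height: int) -> list[int]:
    """Node counts of a full binary tree, one entry per level (level 0 first)."""
    return [2 ** level for level in range(height)]


levels = nodes_per_level(3)
total = sum(levels)

print(levels)  # [1, 2, 4]
print(total)   # 7, the same as 2**3 - 1
```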

Related

Given a red-black tree on n nodes, what is the maximum number of red nodes on any root to leaf path?

This was a quiz question. I'm not sure whether my answer was right. Please help me out.
Lets say the height is h, since no two consecutive nodes (as we go up the tree) can be red, wouldn't the max number of red nodes be h/2? (h = log n)
Somehow, I feel that is not the correct answer.
Any help/input would be greatly appreciated!
Thank you so much in advance!
** Edit ** This answer assumes a definition of height as the number of nodes in the longest path from root to leaf (used, e.g., in lecture notes here), including the "virtual" black leaf nodes. The more common definition counts the number of edges and does not include the leaf nodes. With that definition the answer is round(h/2), and if you include the leaf nodes in the height, round_down(h/2). ** Edit ends **
If you follow the rule that the root node is black, as in Wikipedia, then the correct answer is the largest integer smaller than h/2. This is just because the root and the leaves are black, and half of the nodes in between (rounded up) can be red, i.e. ceil((h-2)/2).
You can also find the rule just by considering some small red-black trees of different heights.
Case h=1 root is black -> 0 red nodes
Case h=2 root is black and leaves are black -> 0 red nodes
Case h=3 root is black, second level can be red, and leaves must be black -> max 1 red node
Case h=4 root is black, second level can be red, third level must be black, and leaves must be black -> max 1 red node
Case h=5 black, red, black, red, black -> max 2 red nodes.
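The cases above can be reproduced with a small greedy colouring. This is a sketch under the same assumptions as this answer (h counts nodes on the path, the root and the leaf are black, and a red node never follows a red node); `reddest_path` and `max_red` are hypothetical names:

```python
def reddest_path(h: int) -> list[str]:
    """Colours along one root-to-leaf path of h nodes, as red as the rules allow."""
    colours: list[str] = []
    for position in range(h):
        is_root = position == 0
        is_leaf = position == h - 1
        previous_red = bool(colours) and colours[-1] == "R"
        # Root and leaf must be black; a red node cannot follow a red node.
        if is_root or is_leaf or previous_red:
            colours.append("B")
        else:
            colours.append("R")
    return colours


def max_red(h: int) -> int:
    """Maximum number of red nodes on a path of h nodes."""
    return reddest_path(h).count("R")


for h in range(1, 6):
    print(h, "".join(reddest_path(h)), max_red(h))
```

For h = 1 to 5 this gives 0, 0, 1, 1, 2 red nodes, matching the list of cases.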
The h as a function of n is trickier, but it can be shown that h <= 2 log2(n+1), which guarantees the logarithmic search time. For a proof see, e.g., Searching and Search Trees II (page 11). The proof is based on the fact that the rules of a red-black tree guarantee that a subtree rooted at x contains at least 2^(bh(x)) - 1 internal nodes, where bh(x) is the black height: the number of black nodes on a path from x down to a leaf. This is proven by induction. Then note that at least half of the nodes on any such path are black, because no two consecutive nodes can be red (we are speaking of subtrees, so the root can be red), which gives bh(x) >= h/2. Using these results we get n >= 2^bh(x) - 1 >= 2^(h/2) - 1. Solving for h, we get the answer h <= 2 log2(n+1).
As the question was a quiz, it should be enough to say that h is proportional to log(n) or even about log(n).
Let's first see how few nodes (minimising n) are needed to make a path with 1 red node (* is black):
    *
   / \
  *   R
     / \
    *   *
So n must be at least 5 when 1 red node is needed. It has 3 leaf nodes, and 2 internal nodes. Removing any node will require to drop the red node as well to stay within the rules.
If we want to extend this tree to get a path with 2 red nodes we could apply the following two steps:
All leaves get two black children
The right-most leaf (just added) is turned into a red node, and it gets 2 black children.
The dollar signs are the added black nodes compared to the previous tree:
        *
       / \
      *   R
     /|  / \
    $ $ *   *
       /|  / \
      $ $ $   R
             / \
            $   $
We choose to place that path with the red nodes on the right side; this choice does not influence the conclusions. Note that it does not help to add red nodes in other, shorter paths, as this will only increase the number of nodes without increasing the path with the most red nodes.
The number of leaf nodes (L) doubles with step 1, while the nodes that were leaves become internal nodes (I).
The second step increases both the number of internal nodes and number of leaves with 1. More formally put, we can find these formulas, where the index r represents the number of red nodes:
L_1 = 3
I_1 = 2
L_(r+1) = 2 * L_r + 1
I_(r+1) = I_r + L_r + 1
Put in a table for increasing r:
r | L | I | n=L+I
----+-----+-----+-------
1 | 3 | 2 | 5
2 | 7 | 6 | 13
3 | 15 | 14 | 29
4 | 31 | 30 | 61
... | ... | ... | ...
We can see the following is true:
L_r = 2^(r+1) - 1
I_r = 2^(r+1) - 2
And so:
n_r = 2^(r+2) - 3
So we have a formula for knowing the minimum number of nodes needed to have a path with r red nodes. We need a different relation: the maximum for r when given n.
From the above we can derive:
r = ⌊ log2(n+3) ⌋ - 2
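To check the recurrence against the closed forms, something like the following can be used. It is only a sketch; the function names are invented here, and `bit_length` stands in for the floor of log2 so the arithmetic stays in integers:

```python
def min_nodes_by_recurrence(r: int) -> int:
    """Minimum n for a path with r red nodes, using the L/I recurrence (r >= 1)."""
    leaves, internal = 3, 2
    for _ in range(r - 1):
        # Both right-hand sides use the previous values of leaves and internal.
        leaves, internal = 2 * leaves + 1, internal + leaves + 1
    return leaves + internal


def min_nodes_closed_form(r: int) -> int:
    """Closed form n_r = 2^(r+2) - 3."""
    return 2 ** (r + 2) - 3


def max_red_nodes(n: int) -> int:
    """r = floor(log2(n + 3)) - 2, for n >= 1."""
    return (n + 3).bit_length() - 1 - 2


for r in range(1, 5):
    print(r, min_nodes_by_recurrence(r), min_nodes_closed_form(r))

print(max_red_nodes(12), max_red_nodes(13))  # 1 2
```

The last line shows the step in the formula: 12 nodes still allow only 1 red node on a path, 13 nodes allow 2.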

How can you calculate depth of a binary tree with less complexity?

Given a binary search tree t, it is rather easy to get its depth using recursion, as the following:
def node_height(t):
    # Assumes a missing child is stored as None.
    if t is None:
        return 0
    # Height counted in nodes: 1 for t itself plus the taller subtree.
    height_left = node_height(t.left)
    height_right = node_height(t.right)
    return 1 + max(height_left, height_right)
However, I noticed that its complexity increases exponentially, and thus should perform very badly when we have a deep tree. Is there any faster algorithm for doing this?
If you store the height as a field in the Node object, you can add 1 as you add nodes to the tree (and subtracting during remove).
That'll make the operation constant time for getting the height of any node, but it adds some additional complexity into the add/remove operations.
This kind of extends from what @cricket_007 mentioned in his answer.
So, if you do a ( 1 + max(height_left,height_right) ), you end up having to visit every node, which is essentially an O(N) operation. For an average case with a balanced tree, you would be looking at something like T(n) = 2T(n/2) + Θ(1).
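To see that the recursion is linear rather than exponential, you can count how many nodes it touches. The sketch below assumes a minimal `Node` class with `left`/`right` set to `None` for missing children; `complete_tree` and the `counter` argument exist only for this demonstration:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def node_height(t, counter):
    """Height in nodes; counter[0] is incremented once per visited node."""
    if t is None:
        return 0
    counter[0] += 1
    return 1 + max(node_height(t.left, counter), node_height(t.right, counter))


def complete_tree(levels):
    """Build a full binary tree with the given number of levels."""
    if levels == 0:
        return None
    return Node(levels, complete_tree(levels - 1), complete_tree(levels - 1))


counter = [0]
height = node_height(complete_tree(4), counter)
print(height, counter[0])  # 4 15
```

Each of the 15 nodes is visited exactly once, so the work grows with the number of nodes, not with 2^(number of nodes).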
Now, this can be improved to a time of O(1) if you can store the height of a certain node. In that case, the height of the tree would be equal to the height of the root. So, the modification you would need to make would be to your insert(value) method. At the beginning, the root is given a default height of 0. The node to be added is assigned a height of 0. For every node you encounter while trying to add this new node, increase node.height by 1 if needed, and ensure it is set to 1 + max(left child's height, right child's height). So, the height function will simply return node.height, hence allowing for constant time. The time complexity for the insert will also not change; we just need some extra space to store n integer values, where n is the number of nodes.
The following is shown to give an understanding of what I am trying to say.
        5 [0]
- insert 2 [increase height of root by 1]
        5 [1]
       /
      /
 [0] 2
- insert 1 [increase height of node 2 by 1, increase height of node 5 by 1]
            5 [2]
           /
          /
     [1] 2
        /
       /
  [0] 1
- insert 3 [new height of node 2 = 1 + max(height of node 1, height of node 3)
            = 1 + 0 = 1; height of node 5 also does not change]
            5 [2]
           /
          /
     [1] 2
        / \
       /   \
  [0] 1     3 [0]
- insert 6 [new height of node 5 = 1 + max(height of node 2, height of node 6)
            = 1 + 1 = 2]
            5 [2]
           / \
          /   \
     [1] 2     6 [0]
        / \
       /   \
  [0] 1     3 [0]
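A minimal sketch of the stored-height idea, reproducing the insert sequence 5, 2, 1, 3, 6 from the walkthrough. The `Node` class and the recursive `insert` are assumptions for illustration (a plain unbalanced BST, with a leaf having height 0 as in the diagrams); removal is not covered:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 0  # a newly added node is a leaf


def height(node):
    """Stored height, with -1 for an empty subtree so that a leaf gets 0."""
    return -1 if node is None else node.height


def insert(node, value):
    """Insert value below node and refresh heights on the way back up."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    node.height = 1 + max(height(node.left), height(node.right))
    return node


root = None
heights = []
for v in [5, 2, 1, 3, 6]:
    root = insert(root, v)
    heights.append(root.height)  # constant-time lookup of the tree height

print(heights)  # [0, 1, 2, 2, 2]
```

Only the nodes on the path from the root to the new node are touched, so the insert keeps its usual cost and reading the height is a single field access.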

How to make a N-level tree in pyramid fashion, such that each child may(doesn't mean has to) have 2 parents?

The question may look very simple, and probably the answer is too, but I always get confused in the tree questions.
Ok so I want to make a tree something like:
          3           level 0
         / \
        4   5         level 1 ..
       / \ / \
      6   7   8
     / \ / \ / \
    9  10  11  12
What are such trees called? Sorry, I'm a beginner..
Function can pass an array[] of ints, or function can take input till N = 3 (denoting level 3 with 10 nodes). Also can you give solution in C/C++/Java.
Given your requirements are only for traversal, I would simply implement this using an array a, containing each level as a contiguous sub-array. Level i (counting levels from 1) then occurs in entries L(i-1) up to but not including L(i), where L(n) = n*(n+1)/2. In particular, the jth value (counting from 0) on the ith level is in a[L(i-1)+j].
As long as you always keep track of i and j, you can now easily navigate through your pyramid.
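Here is a rough sketch of that indexing in Python rather than C/C++/Java (the arithmetic carries over directly). Note one deliberate shift: levels are numbered from 0 here, as in the question's drawing, so the offset is `tri(level)` instead of L(i-1). All function names are made up for the example:

```python
def tri(n: int) -> int:
    """Triangular number n*(n+1)/2: how many entries precede level n (0-based)."""
    return n * (n + 1) // 2


def index(level: int, j: int) -> int:
    """Position in the flat array of the j-th value (0-based) on a 0-based level."""
    return tri(level) + j


def children(level: int, j: int) -> list[tuple[int, int]]:
    """The two nodes below (level, j)."""
    return [(level + 1, j), (level + 1, j + 1)]


def parents(level: int, j: int) -> list[tuple[int, int]]:
    """The one or two nodes above (level, j); the row above has `level` entries."""
    return [(level - 1, k) for k in (j - 1, j) if 0 <= k < level]


a = list(range(3, 13))  # the pyramid from the question: 3 / 4 5 / 6 7 8 / 9 10 11 12

print(a[index(2, 1)])                            # 7
print([a[index(l, k)] for l, k in parents(2, 1)])   # [4, 5]
print([a[index(l, k)] for l, k in children(1, 0)])  # [6, 7]
```

Edge nodes such as 6 only have one parent, which is why `parents` filters out positions that fall outside the row above.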

Balanced Binary Search Tree for numbers

I wanted to draw a balanced binary search tree for numbers from 1 to 20.
            _______10_______
           /                \
       ___5___               15
      /       \            /    \
     3         8         13      18
    / \       / \       /  \    /  \
   2   4     7   9    12   14  17   19
  /         /        /        /
 1         6        11       16
Is the above tree correct and balanced?
In answer to your original question as to whether or not you need to first calculate the height, no, you don't need to. You just have to understand that a balanced tree is one where the height difference between the tallest and shortest node is zero or one, and the simplest way to achieve this is to ensure that you always pick the midpoint of the possible list, when populating the top node in a sub-tree.
Your sample tree is balanced since all leaf nodes are either at the bottom or next-to-bottom level, hence the difference in heights between any two leaf nodes is at most one.
To create a balanced tree from the numbers 1 through 20 inclusive, you can just make the root entry 10 or 11 (the midpoint being 10.5 for those numbers), so that there's an equal quantity of numbers in either sub-tree.
Then just do that recursively for each sub-tree. On the lower side of 10, 5 is the midpoint:
              10
            /    \
           5      11-thru-19 sub-tree
         /   \
  1-thru-4   6-thru-9
  sub-tree   sub-tree
Just expand on that and you'll end up with something like:
            _______10_______
           /                \
       ___5___               15
      /       \            /    \
     2         7         13      17
    / \       / \       /       /  \
   1   3     6   8    11      16    18   <- depth of highest leaf node
        \         \     \             \
         4         9     12            19  <- depth of lowest leaf node
                                        ^
                                        |
                                  Difference is 1
The midpoint can be found at the number where the difference between quantities above and below that numbers is one or zero. For the whole list of numbers 1 through 20 inclusive, there are nine less than 10 and ten greater than 10 (or, if you chose 11 as the midpoint, the quantities are ten and nine).
The difference between your sample and mine is probably to do with the fact that I preferred to pick the midpoint by rounding down where there was a choice (meaning my right sub-trees tend to be "heavier"). Because your left sub-trees are heavier, you appear to have rounded up.
After choosing 10 as the initial midpoint, there's no leeway on the left sub-tree, you have to choose 5 since it has four above and below it. Any other midpoint would result in a difference of at least two between the two halves (for example, choosing 4 as the midpoint would have the two halves of size three and five). This can still give you a balanced sub-tree depending on the data but it's "safer" to choose the midpoint.
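The midpoint rule can be sketched as a short recursive build. This is an illustration only: `Node`, `build` and `leaf_depths` are names invented here, and the midpoint is rounded down where there is a choice, as described above:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build(values, lo=0, hi=None):
    """Build a balanced BST from the sorted slice values[lo..hi] (inclusive)."""
    if hi is None:
        hi = len(values) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2  # midpoint, rounded down
    return Node(values[mid], build(values, lo, mid - 1), build(values, mid + 1, hi))


def leaf_depths(node, depth=0):
    """Depths of all leaf nodes, with the root at depth 0."""
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [depth]
    return leaf_depths(node.left, depth + 1) + leaf_depths(node.right, depth + 1)


root = build(list(range(1, 21)))
depths = leaf_depths(root)
print(root.value, root.left.value, root.right.value)  # 10 5 15
print(max(depths) - min(depths))                      # 1
```

Because each split leaves the two halves differing in size by at most one, the leaves end up on the bottom two levels only.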

Analytical solution to predict array size of binary tree

I'm constructing a binary tree for a sequence of data and the tree is stored in a 1-based array. So if index of parent node is idx,
the left child is 2 * idx and the right is 2 * idx + 1.
Every iteration, I sort current sequence based on certain criteria, select the median element as parent, tree[index] = sequence[median], then do same operation on left(the sub sequence before median) and right(the subsequence after median) recursively.
E.g., if there are 3 elements in total, the tree will be:
    1
   / \
  2   3
The array size to store the tree is also 3.
4 elements:
      1
     / \
    2   3
   /
  4
The array size to store the tree is also 4.
5 elements:
        1
      /   \
     2     3
    / \   /
   4  null 5
The array size to store the tree has to be 6, since there is a hole between 4 and 5.
Thus, the array size is determined only by the number of elements. I believe there is an analytical solution for it, I just can't prove it.
Any suggestion will be appreciated.
Thanks.
Every level of a full binary tree contains twice as many nodes as the previous level. If you have n nodes, then the number of levels required (the height of the tree) is log2(n), rounded down to a whole number, plus 1. So if you have 5 nodes, your binary tree will have a height of 3.
The number of nodes in a full binary tree of height h is (2^h) - 1. So you know that the maximum size array you need for 5 items is 7, assuming all the levels are filled except possibly the last one.
The last row of your tree will contain n - ((2^(h-1)) - 1) nodes, that is, n minus the nodes in the levels above it. The last level of a full tree contains 2^(h-1) nodes. Assuming you want it balanced so half of the nodes are on the left and half are on the right, and the right side is left-filled, that is, you want this:
          1
      2       3
    4   5   6   7
   8 9     10 11
The number of array spaces required for the last level of your tree, then, is either 1, or it's half the number required by a full tree, plus half the nodes required by your tree.
So:
n = 5
height = roundDown(log2(n)) + 1
fullTreeNodes = (2^height) - 1
fullTreeLeafNodes = 2^(height-1)
nodesOnLeafLevel = n - (fullTreeNodes - fullTreeLeafNodes)
Now comes the fun part. If there is more than 1 node required on the leaf level, and you want to balance the sides, you need half of fullTreeLeafNodes, plus half of nodesOnLeafLevel. In the tree above, for example, the leaf level has a potential for 8 nodes. But you only have 4 leaf nodes. You want two of them on the left side, and two on the right. So you need to allocate space for 4 nodes on the left side (2 for the left side items, and 2 empty spaces), plus two more for the two right side items.
if (nodesOnLeafLevel == 1)
    arraySize = n
else
    arraySize = (fullTreeNodes - fullTreeLeafNodes/2) + (nodesOnLeafLevel / 2)
You really shouldn't have any holes. They are created by your partitioning algorithm, but that algorithm is incorrect.
For 1-5 items, your trees should look like:
  1      2        2          3          4
        /        / \        / \        / \
       1        1   3      2   4      2   5
                          /          / \
                         1          1   3
The easiest way to populate the tree is to do an in-order traversal of the node locations, filling items from the sequence in order.
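A possible sketch of that in-order fill, assuming the 1-based layout from the question (children of `idx` at `2 * idx` and `2 * idx + 1`) and an already sorted sequence. `fill_in_order` is a name made up for the example:

```python
def fill_in_order(items):
    """Place sorted items into a hole-free 1-based array tree; slot 0 is unused."""
    n = len(items)
    tree = [None] * (n + 1)
    source = iter(items)

    def visit(idx):
        if idx > n:  # locations beyond n do not exist in the tree
            return
        visit(2 * idx)            # fill the left subtree first
        tree[idx] = next(source)  # then this location
        visit(2 * idx + 1)        # then the right subtree

    visit(1)
    return tree


print(fill_in_order([1, 2, 3, 4, 5]))  # [None, 4, 2, 5, 1, 3]
```

For 5 items the root is 4 with children 2 and 5, and 2 has children 1 and 3, so locations 1 to 5 are all used and the array needs no holes.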
I'm close to formalizing a solution. By intuition, first find the largest power of 2 below N, then check whether N - 2^m is even or odd to decide which part of the leaf level needs to grow.
int32_t rup2 = roundUpPower2(nPoints);
if (rup2 == nPoints || rup2 == nPoints + 1)
{
    return nPoints;
}
int32_t leaveLevelCapacity = rup2 / 2;
int32_t allAbove = leaveLevelCapacity - 1;
int32_t pointsOnLeave = nPoints - allAbove;
int32_t iteration = roundDownLog2(pointsOnLeave);
int32_t leaveSize = 1;
int32_t gap = leaveLevelCapacity;
for (int32_t i = 1; i <= iteration; ++i)
{
    leaveSize += gap / 2;
    gap /= 2;
}
return (allAbove + leaveSize);
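One way to gain some confidence in the snippet above is to compare it with a brute-force simulation of the median split for small inputs. The sketch below is a Python port, not the original code, and it rests on assumptions: the median split sends n // 2 elements to the left (inferred from the 4-element example in the question), `roundUpPower2` is taken to mean the smallest power of two >= its argument, and `roundDownLog2` the floor of log2:

```python
def simulated_size(n: int, idx: int = 1) -> int:
    """Largest 1-based array index used when n elements are split around the median."""
    if n == 0:
        return 0
    left = n // 2          # elements before the median (assumed split rule)
    right = n - 1 - left   # elements after the median
    return max(idx,
               simulated_size(left, 2 * idx),
               simulated_size(right, 2 * idx + 1))


def closed_form_size(n_points: int) -> int:
    """Port of the C++ snippet, for n_points >= 1."""
    rup2 = 1 << (n_points - 1).bit_length()  # assumed meaning of roundUpPower2
    if rup2 == n_points or rup2 == n_points + 1:
        return n_points
    leaf_level_capacity = rup2 // 2
    all_above = leaf_level_capacity - 1
    points_on_leaf = n_points - all_above
    iterations = points_on_leaf.bit_length() - 1  # assumed meaning of roundDownLog2
    leaf_size = 1
    gap = leaf_level_capacity
    for _ in range(iterations):
        leaf_size += gap // 2
        gap //= 2
    return all_above + leaf_size


for n in range(1, 18):
    print(n, simulated_size(n), closed_form_size(n))
```

The two columns agree for the small sizes printed here; that is a sanity check, not a proof that the formula holds for every N.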
