Analytical solution to predict array size of binary tree - algorithm

I'm constructing a binary tree for a sequence of data, and the tree is stored in a 1-based array. So if the index of a parent node is idx,
the left child is 2 * idx and the right child is 2 * idx + 1.
In every iteration, I sort the current sequence based on certain criteria, select the median element as the parent, tree[index] = sequence[median], and then do the same operation recursively on the left (the subsequence before the median) and on the right (the subsequence after the median).
For example, with 3 elements in total, the tree will be:
  1
 / \
2   3
The array size needed to store the tree is 3.
With 4 elements:
    1
   / \
  2   3
 /
4
The array size needed to store the tree is also 4.
With 5 elements:
       1
     /   \
    2     3
   / \   /
  4 null 5
The array size needed to store the tree has to be 6, since there is a hole between 4 and 5.
Since the array size is determined only by the number of elements, I believe there is an analytical solution for it; I just can't prove it.
Any suggestion will be appreciated.
Thanks.
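To make the layout in the question concrete, here is a minimal Python sketch that simulates it and reports the largest 1-based slot used, which is the required array size. It assumes the median of each subsequence is taken at index len // 2, which reproduces the 4-element example above (two elements end up on the left); `simulated_size` is a hypothetical name.

```python
def simulated_size(n):
    """Largest 1-based slot used by the median-split layout for n elements.

    Assumption: the median of a subsequence of length `count` sits at index
    count // 2, so count // 2 elements go left and the rest (minus the
    median itself) go right.
    """
    largest = 0
    pending = [(1, n)]  # (slot index, number of elements to place there)
    while pending:
        slot, count = pending.pop()
        if count == 0:
            continue
        largest = max(largest, slot)
        left = count // 2
        right = count - left - 1
        pending.append((2 * slot, left))       # left child slot
        pending.append((2 * slot + 1, right))  # right child slot
    return largest


print([simulated_size(n) for n in range(1, 10)])
# [1, 2, 3, 4, 6, 6, 7, 8, 12]
```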

Every level of a binary tree can hold twice as many nodes as the previous level. If you have n nodes, then the number of levels required (the height of the tree) is floor(log2(n)) + 1, i.e. log2(n) rounded down, plus 1. So if you have 5 nodes, your binary tree will have a height of 3.
The number of nodes in a full binary tree of height h is (2^h) - 1, so the maximum array size you need for 5 items is 7, assuming all the levels are filled except possibly the last one.
The last level of your tree will contain n - (2^(h-1) - 1) nodes, while the last level of a full tree contains 2^(h-1) nodes. Assuming you want it balanced, so that half of the nodes are on the left and half are on the right, and the right side is left-filled, you want this:
1
2 3
4 5 6 7
8 9 10 11
The number of array slots required for the last level of your tree, then, is either 1, or half the number of slots in the last level of a full tree plus half the number of nodes actually on your tree's last level.
So:
n = 5
height = floor(log2(n)) + 1
fullTreeNodes = (2^height) - 1
fullTreeLeafNodes = 2^(height-1)
nodesOnLeafLevel = n - (fullTreeLeafNodes - 1)
Now comes the fun part. If there is more than 1 node required on the leaf level, and you want to balance the sides, you need half of fullTreeLeafNodes, plus half of nodesOnLeafLevel. In the tree above, for example, the leaf level has a potential for 8 nodes. But you only have 4 leaf nodes. You want two of them on the left side, and two on the right. So you need to allocate space for 4 nodes on the left side (2 for the left side items, and 2 empty spaces), plus two more for the two right side items.
if (nodesOnLeafLevel == 1)
    arraySize = n
else
    arraySize = (fullTreeNodes - fullTreeLeafNodes/2) + (nodesOnLeafLevel / 2)
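Below is a rough Python transcription of the pseudocode above. It assumes the height is floor(log2 n) + 1 (computed as n.bit_length()) and that the divisions are integer (floor) divisions, which the answer does not specify; `answer_array_size` is a hypothetical name. It models the left-filled last level that this answer describes and is not guaranteed to match the median-split layout from the question.

```python
def answer_array_size(n):
    """Array size per this answer's formula, for n >= 1."""
    height = n.bit_length()                       # floor(log2 n) + 1
    full_tree_nodes = 2 ** height - 1
    full_tree_leaf_nodes = 2 ** (height - 1)
    nodes_on_leaf_level = n - (full_tree_leaf_nodes - 1)
    if nodes_on_leaf_level == 1:
        return n
    return (full_tree_nodes - full_tree_leaf_nodes // 2) + nodes_on_leaf_level // 2


print(answer_array_size(5))   # 6
print(answer_array_size(11))  # 13, i.e. 7 + 4 + 2 as described for the 11-node tree
```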

You really shouldn't have any holes. They are created by your partitioning algorithm, but that algorithm is incorrect.
For 1-5 items, your trees should look like:
1 item:  1

2 items:   2
          /
         1

3 items:   2
          / \
         1   3

4 items:     3
            / \
           2   4
          /
         1

5 items:     4
            / \
           2   5
          / \
         1   3
The easiest way to populate the tree is to do an in-order traversal of the node locations, filling items from the sequence in order.
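A minimal Python sketch of that idea, assuming the items are already sorted and using a hypothetical helper name `fill_complete_tree`: it visits slots 1..n of a complete tree in in-order and assigns the items one by one, so the array has no holes.

```python
def fill_complete_tree(sorted_items):
    """Return a 1-based array (slot 0 unused) holding a complete BST of the items."""
    n = len(sorted_items)
    tree = [None] * (n + 1)
    items = iter(sorted_items)

    def visit(slot):
        if slot > n:              # slot lies outside the complete tree
            return
        visit(2 * slot)           # left subtree first
        tree[slot] = next(items)  # then this slot
        visit(2 * slot + 1)       # then the right subtree

    visit(1)
    return tree


print(fill_complete_tree([1, 2, 3, 4, 5]))  # [None, 4, 2, 5, 1, 3]
```

The result matches the 5-item tree shown in this answer: 4 at the root, 2 and 5 as its children, and 1 and 3 under 2.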

I'm close to formalizing a solution. Intuitively: first find the largest power of 2 that is < N, then check whether N - 2^m is even or odd, and decide which part of the leaf level needs to grow.

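#include <cstdint>

// Smallest power of 2 that is >= x (x >= 1).
static int32_t roundUpPower2(int32_t x) { int32_t p = 1; while (p < x) p *= 2; return p; }
// floor(log2(x)) for x >= 1.
static int32_t roundDownLog2(int32_t x) { int32_t r = 0; while (x > 1) { x /= 2; ++r; } return r; }

// Array size needed for nPoints elements (function name is illustrative).
int32_t treeArraySize(int32_t nPoints)
{
    int32_t rup2 = roundUpPower2(nPoints);
    // A power of 2, or a perfect tree with 2^h - 1 nodes, has no holes.
    if (rup2 == nPoints || rup2 == nPoints + 1)
    {
        return nPoints;
    }
    int32_t leaveLevelCapacity = rup2 / 2;       // slots on the last level
    int32_t allAbove = leaveLevelCapacity - 1;   // nodes in the full levels above it
    int32_t pointsOnLeave = nPoints - allAbove;  // nodes on the last level
    int32_t iteration = roundDownLog2(pointsOnLeave);
    int32_t leaveSize = 1;
    int32_t gap = leaveLevelCapacity;
    for (int32_t i = 1; i <= iteration; ++i)
    {
        leaveSize += gap / 2;  // add half of the remaining gap
        gap /= 2;
    }
    return (allAbove + leaveSize);
}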
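The loop above appears to collapse into a closed form. With c = leaveLevelCapacity = 2^d and k = roundDownLog2(pointsOnLeave), the loop adds c/2 + c/4 + ... + c/2^k = c - c/2^k, so the result is (c - 1) + 1 + (c - c/2^k) = 2^(d+1) - 2^(d-k), where d = floor(log2 N). The Python sketch below (hypothetical name `closed_form_size`) is written from that algebra. The two early-return cases come out of the same expression (k = 0 when N is a power of 2, k = d when N = 2^(d+1) - 1), so no branch is needed. The expected list in the comment was worked out by hand from the question's median-split rule, taking the median at index len // 2.

```python
def closed_form_size(n):
    """Array size for the median-split layout, n >= 1 (derived from the loop above)."""
    d = n.bit_length() - 1               # floor(log2 n): depth of the last level
    points_on_leaf = n - (1 << d) + 1    # nodes on the last level, 1 .. 2**d
    k = points_on_leaf.bit_length() - 1  # roundDownLog2(pointsOnLeave)
    return (1 << (d + 1)) - (1 << (d - k))


print([closed_form_size(n) for n in range(1, 18)])
# [1, 2, 3, 4, 6, 6, 7, 8, 12, 12, 14, 14, 14, 14, 15, 16, 24]
```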

Related

Segment tree data position to tree position relation

I wonder if there is any relation between the position of a value in data[] and its position in tree[].
int data[N];
int tree[M]; // lets M = 2^X-1, where X = nearest ceiling power of 2 to N;
void build_segment_tree();
Can I say that the n-th value of data[] maps to the i-th value of tree[]? Is there a mathematical relation?
You certainly can. For example, a segment tree is used for its capability to store
segment information.
Now you will see that if you want to create a segment tree out of N elements, then
you will need ceil(log_2(N))+1 levels, and in the last level you will find all the
length-1 ranges, i.e. the single elements.
These elements will be precisely in the position (1-index) 2^ceil(log_2(N)) to 2^ceil(log_2(N))+N-1.
                [1-8]
              /       \
         [1-4]         [5-8]
         /   \         /   \
    [1-2]    [3-4]  [5-6]   [7-8]
     / \      / \    / \     / \
   [1] [2] [3] [4] [5] [6] [7] [8]
                 1-11
             /          \
          1-6            7-11
         /   \          /    \
      1-3     4-6    7-9      10-11
      / \     / \    / \      /   \
   1-2   3  4-5  6  7-8  9   10    11
   / \      / \     / \
  1   2    4   5   7   8
This answer is only valid for segment trees whose number of elements is a power of 2.
For other N, the leaves are not necessarily laid out in order,
so the statement above is false when N is not a power of 2.
In that case there is no simple formula-based rule.
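A small Python sketch to check this, assuming a recursive segment tree where node v covers [lo, hi], splits at mid = (lo + hi) // 2, and keeps its children at 2v and 2v + 1. Data indices are 0-based here and tree indices are 1-based; `leaf_positions` and `formula_position` are hypothetical names. For N = 8 every leaf lands at 2^ceil(log2 N) + i; for N = 11 they do not.

```python
def leaf_positions(n):
    """Tree index (1-based) of the leaf holding data index i (0-based), for each i."""
    pos = {}

    def build(v, lo, hi):
        if lo == hi:               # single element: this node is a leaf
            pos[lo] = v
            return
        mid = (lo + hi) // 2
        build(2 * v, lo, mid)      # left half
        build(2 * v + 1, mid + 1, hi)  # right half

    build(1, 0, n - 1)
    return [pos[i] for i in range(n)]


def formula_position(n, i):
    """2^ceil(log2 n) + i: the answer's range, shifted to a 0-based data index."""
    return (1 << (n - 1).bit_length()) + i


print(leaf_positions(8))   # [8, 9, 10, 11, 12, 13, 14, 15]
print(leaf_positions(11))  # [16, 17, 9, 20, 21, 11, 24, 25, 13, 14, 15]
```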

How can you calculate depth of a binary tree with less complexity?

Given a binary search tree t, it is rather easy to get its depth using recursion, as follows:
def node_height(t):
    # Assumes a missing child is None; an empty subtree has height 0,
    # so a leaf has height 1.
    if t is None:
        return 0
    height_left = node_height(t.left)
    height_right = node_height(t.right)
    return 1 + max(height_left, height_right)
However, I noticed that its complexity increases exponentially, and thus should perform very badly when we have a deep tree. Is there any faster algorithm for doing this?
If you store the height as a field in the Node object, you can add 1 as you add nodes to the tree (and subtract during remove).
That'll make the operation constant time for getting the height of any node, but it adds some additional complexity into the add/remove operations.
This kind of extends from what #cricket_007 mentioned in his answer.
So, if you compute 1 + max(height_left, height_right) recursively, you end up having to visit every node, which is essentially an O(N) operation. For an average case with a balanced tree, you would be looking at something like T(n) = 2T(n/2) + Θ(1), which solves to Θ(n).
Now, this can be improved to a time of O(1) if you can store the height of a certain node. In that case, the height of the tree would be equal to the height of the root. So, the modification you would need to make would be to your insert(value) method. At the beginning, the root is given a default height of 0. The node to be added is assigned a height of 0. For every node you encounter while trying to add this new node, increase node.height by 1 if needed, and ensure it is set to 1 + max(left child's height, right child's height). So, the height function will simply return node.height, hence allowing for constant time. The time complexity for the insert will also not change; we just need some extra space to store n integer values, where n is the number of nodes.
The following trace illustrates what I mean.
    5 [0]
- insert 2 [increase height of root by 1]
        5 [1]
       /
      /
 [0] 2
- insert 1 [increase height of node 2 by 1, increase height of node 5 by 1]
          5 [2]
         /
        /
   [1] 2
      /
     /
[0] 1
- insert 3 [new height of node 2 = 1 + max(height of node 1, height of node 3)
            = 1 + 0 = 1; height of node 5 also does not change]
          5 [2]
         /
        /
   [1] 2
      / \
     /   \
[0] 1     3 [0]
- insert 6 [new height of node 5 = 1 + max(height of node 2, height of node 6)
            = 1 + 1 = 2]
          5 [2]
         / \
        /   \
   [1] 2     6 [0]
      / \
     /   \
[0] 1     3 [0]
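A minimal Python sketch of this answer's idea, assuming a plain (unbalanced) BST where duplicates go to the right and an empty subtree counts as height -1 so that a leaf has height 0, as in the trace. `Node`, `insert` and `height` are hypothetical names. Heights are recomputed only on the nodes along the insertion path, so insert stays O(depth) and reading the tree height is O(1).

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 0  # a new node is a leaf: height 0


def height(node):
    """Cached height; an empty subtree is taken to be -1 (an assumption)."""
    return node.height if node is not None else -1


def insert(root, value):
    """Insert value into the BST rooted at root and return the root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    # Only nodes on the insertion path reach this line.
    root.height = 1 + max(height(root.left), height(root.right))
    return root


root = None
for v in [5, 2, 1, 3, 6]:  # same insertion order as the trace
    root = insert(root, v)
print(height(root))  # 2
```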

Heap sort pseudo code algorithm

In the heap sort algorithm:
n=m
for k:= m div 2 down to 0
    downheap(k);
repeat
    t:=a[0]
    a[0]:=a[n-1]
    a[n-1]:=t
    n--
    downheap(0);
until n <= 0
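A hedged Python transcription of the pseudocode, assuming a 0-based array and a max-heap (the pseudocode does not say which kind of heap; with a max-heap, repeatedly moving a[0] to the end yields ascending order). The signature downheap(a, k, n) is a hypothetical choice that passes the heap size explicitly.

```python
def downheap(a, k, n):
    """Sift a[k] down within a[0:n] (0-based max-heap: children 2k+1, 2k+2)."""
    while True:
        child = 2 * k + 1
        if child >= n:                 # k is a leaf within a[0:n]
            return
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                 # pick the larger child
        if a[k] >= a[child]:
            return
        a[k], a[child] = a[child], a[k]
        k = child


def heap_sort(a):
    """Sort list a in place (ascending) and return it."""
    m = len(a)
    n = m
    # "for k := m div 2 down to 0" (inclusive); k = m div 2 is always a leaf,
    # so that first call does nothing.
    for k in range(m // 2, -1, -1):
        downheap(a, k, n)
    # "repeat ... until n <= 0", written as a while loop so that an empty
    # list never touches a[-1].
    while n > 0:
        a[0], a[n - 1] = a[n - 1], a[0]  # move the current maximum to the end
        n -= 1
        downheap(a, 0, n)
    return a


print(heap_sort([5, 3, 2, 4, 1]))  # [1, 2, 3, 4, 5]
```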
Can someone please explain to me what is done in these lines?
n=m
for k:= m div 2 down to 0
    downheap(k);
I think that is the heap-building process, but what is meant by for k:= m div 2 down to 0?
Also, is n the number of items? So in an array representation, is the last element stored at a[n-1]?
But why do it for n >= 0? Can't we finish at n > 0, because the first element gets sorted automatically?
n=m
for k:= m div 2 down to 0
    downheap(k);
In a binary heap, half of the nodes have no children. So you can build a heap by starting at the midpoint and sifting items down. What you're doing here is building the heap from the bottom up. Consider this array of five items:
[5, 3, 2, 4, 1]
Or, as a tree:
     5
   3   2
  4 1
The length is 5, so we want to start at index 2 (assume a 1-based heap array). downheap, then, will look at the node labeled 3 and compare it with the smallest child. Since 1 is smaller than 3, we swap the items giving:
     5
   1   2
  4 3
Since we reached the leaf level, we're done with that item. Move on to the first item, 5. It's larger than its smallest child, 1, so we swap items:
     1
   5   2
  4 3
But the item 5 is still larger than its children, so we do another swap:
     1
   3   2
  4 5
And we're done. You have a valid heap.
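To reproduce this walk-through, here is a hedged Python sketch of the bottom-up build using a 1-based min-heap (slot 0 unused) that compares each item with its smallest child, as the answer does. `build_min_heap` and `downheap_min` are hypothetical names.

```python
def downheap_min(h, k, n):
    """Sift h[k] down in the 1-based min-heap h[1..n]; h[0] is unused."""
    while 2 * k <= n:
        child = 2 * k
        if child + 1 <= n and h[child + 1] < h[child]:
            child += 1                 # smallest child
        if h[k] <= h[child]:
            break
        h[k], h[child] = h[child], h[k]
        k = child


def build_min_heap(items):
    h = [None] + list(items)           # shift to 1-based indexing
    n = len(items)
    for k in range(n // 2, 0, -1):     # 1-based: start at n div 2, stop at k = 1
        downheap_min(h, k, n)
    return h[1:]


print(build_min_heap([5, 3, 2, 4, 1]))  # [1, 3, 2, 4, 5]
```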
It's instructive to do that by hand (with pencil and paper) to build a larger heap--say 10 items. That will give you a very good understanding of how the algorithm works.
For purposes of building the heap in this way, it doesn't matter if the array indexes start at 0 or 1. If the array is 0-based, then you end up making one extra call to downheap, but that doesn't do anything because the node you're trying to move down is already a leaf node. So it's slightly inefficient (one extra call to downheap), but not harmful.
It is important, however, that if your root node is at index 1, that you stop your loop with n > 0 rather than n >= 0. In the latter case, you could very well end up adding a bogus value to your heap and removing an item that's supposed to be there.
for k:= m div 2 down to 0
This appears to be pseudocode for:
for(int k = m/2; k >= 0; k--)
Or possibly
for(int k = m/2; k > 0; k--)
Depending on whether "down to 0" is inclusive or not.
Also, is n the number of items?
Initially, yes, but it is decremented on the line n--.
Can't we finish at n > 0, because the first element gets sorted automatically?
Yes, this is effectively what happens. Once n becomes zero at n--, execution is most of the way through the loop body, so the only thing executed after that, before until n <= 0 terminates the loop, is downheap(0);

How many permutations of a given array result in BST's of height 2?

A BST is generated (by successive insertion of nodes) from each permutation of keys from the set {1,2,3,4,5,6,7}. How many permutations determine trees of height two?
I've been stuck on this simple question for quite some time. Any hints, anyone?
By the way the answer is 80.
Consider: how could the tree have height 2?
-It needs to have 4 as root, 2 as the left child, 6 right child, etc.
How come 4 is the root?
-It needs to be inserted first. That fixes one position; the other 6 numbers can still move around in the permutation.
And?
-After the first insert there are still 6 places left, 3 for the left and 3 for the right subtrees. That's 6 choose 3 = 20 choices.
Now what?
-For the left and right subtrees, their roots need to be inserted first, then the children's order does not affect the tree - 2, 1, 3 and 2, 3, 1 gives the same tree. That's 2 for each subtree, and 2 * 2 = 4 for the left and right subtrees.
So?
In conclusion: C(6, 3) * 2 * 2 = 20 * 2 * 2 = 80.
Note that there is only one possible shape for this tree - it has to be perfectly balanced. It therefore has to be this tree:
4
/ \
2 6
/ \ / \
1 3 5 7
This requires 4 to be inserted first. After that, the insertions need to build up the subtrees holding 1, 2, 3 and 5, 6, 7 in the proper order. This means that we will need to insert 2 before 1 and 3 and need to insert 6 before 5 and 7. It doesn't matter what relative order we insert 1 and 3 in, as long as they're after the 2, and similarly it doesn't matter what relative order we put 5 and 7 in as long as they're after 6. You can therefore think of what we need to insert as 2 X X and 6 Y Y, where the X's are the children of 2 and the Y's are the children of 6. We can then find all possible ways to get back the above tree by finding all interleaves of the sequences 2 X X and 6 Y Y, then multiplying by four (the number of ways of assigning X and Y the values 1, 3, 5, and 7).
So how many ways are there to interleave? Well, you can think of this as the number of ways to permute the sequence L L L R R R, since each permutation of L L L R R R tells us whether to take the next item from the Left sequence or the Right sequence. There are 6! / (3! 3!) = 20 ways to do this. Since each of those twenty interleaves gives four possible insertion sequences, there end up being a total of 20 × 4 = 80 possible ways to do this.
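A brute-force check in Python, assuming distinct keys and height measured in edges (so the tree above has height 2). It inserts every permutation of 1..7 into a BST and counts the ones of height 2, then compares with the C(6, 3) × 2 × 2 count; `bst_height` is a hypothetical name.

```python
from itertools import permutations
from math import comb


def bst_height(keys):
    """Height (in edges) of the BST obtained by inserting keys in order."""
    left, right = {}, {}       # child links, keyed by parent key
    root, height = keys[0], 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            depth += 1
            links = left if key < node else right
            if node in links:
                node = links[node]     # keep descending
            else:
                links[node] = key      # attach as a new leaf
                break
        height = max(height, depth)
    return height


brute = sum(1 for p in permutations(range(1, 8)) if bst_height(p) == 2)
interleavings = comb(6, 3) * 2 * 2
print(brute, interleavings)  # 80 80
```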
Hope this helps!
I've created a table for the number of permutations possible with 1 - 12 elements, with heights up to 12, and included the per-root break down for anybody trying to check that their manual process (described in other answers) is matching with the actual values.
http://www.asmatteringofit.com/blog/2014/6/14/permutations-of-a-binary-search-tree-of-height-x
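Here is some C++ code to accompany the accepted answer, including a simple ncr(n, r); hope someone will find it useful. solve(n, h) counts the permutations of n keys whose BST has height h (a single node has height 0).
// Binomial coefficient C(n, r); each intermediate value is C(n - r + k, k), so the division is exact.
long long ncr(int n, int r) {
    long long res = 1;
    for (int k = 1; k <= r; k++)
        res = res * (n - r + k) / k;
    return res;
}

int solve(int n, int h) {
    if (n <= 1)
        return (h == 0);
    int ans = 0;
    for (int i = 0; i < n; i++) {          // i keys go into the left subtree
        int res = 0;
        for (int j = 0; j < h - 1; j++) {  // one subtree strictly shorter than h - 1
            res = res + solve(i, j) * solve(n - i - 1, h - 1);
            res = res + solve(n - i - 1, j) * solve(i, h - 1);
        }
        res = res + solve(i, h - 1) * solve(n - i - 1, h - 1);  // both subtrees of height h - 1
        ans = ans + ncr(n - 1, i) * res;   // ways to interleave the two subtrees' insertions
    }
    return ans;
}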
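For readers who prefer Python, here is a direct transcription of the recursive count above, memoised with functools.lru_cache and using math.comb in place of ncr; `count_perms` is a hypothetical name. As in the C++ version, an empty tree and a single node both get height 0, which is harmless because an empty subtree never raises its parent's height above 1 + (height of the other subtree).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_perms(n, h):
    """Permutations of n distinct keys whose BST has height exactly h (in edges)."""
    if n <= 1:
        return 1 if h == 0 else 0
    total = 0
    for i in range(n):                 # i keys go to the left subtree
        r = n - 1 - i                  # the rest go to the right subtree
        ways = 0
        for j in range(h - 1):         # one side strictly shorter than h - 1
            ways += count_perms(i, j) * count_perms(r, h - 1)
            ways += count_perms(r, j) * count_perms(i, h - 1)
        ways += count_perms(i, h - 1) * count_perms(r, h - 1)
        total += comb(n - 1, i) * ways
    return total


print(count_perms(7, 2))  # 80
```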
The tree must have 4 as the root and 2 and 6 as the left and right child, respectively. There is only one choice for the root, and the insertion must start with 4; once the root is inserted, however, there are many insertion orders. There are 2 choices for the second insertion: 2 or 6. If we choose 2 for the second insertion, there are three cases for where 6 goes. If 6 is the third insertion (4, 2, 6, -, -, -, -), there are 4! = 24 choices for the rest of the insertions. If 6 is the fourth insertion (4, 2, -, 6, -, -, -), there are 2 choices for the third insertion, 1 or 3, and 3! choices for the rest, so 2 * 3! = 12. In the last case, 6 is the fifth insertion (4, 2, -, -, 6, -, -): there are 2 choices for the third and fourth insertions ((1 and 3) or (3 and 1)) as well as for the last two insertions ((5 and 7) or (7 and 5)), so there are 4 choices. In total, if 2 is the second insertion, we have 24 + 12 + 4 = 40 choices for the rest of the insertions. Similarly, there are 40 choices if the second insertion is 6, so the total number of different insertion orders is 80.

Determine distance between two random nodes in a tree

Given a general tree, I want the distance between two nodes v and w.
Wikipedia states the following:
Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from v to w can be computed as the distance from the root to v, plus the distance from the root to w, minus twice the distance from the root to their lowest common ancestor.
Let d(x) denote the distance of node x from the root, which we take to be node 1. d(x,y) denotes the distance between two vertices x and y, and lca(x,y) denotes the lowest common ancestor of the vertex pair x and y.
Thus if we have 4 and 8, lca(4,8) = 2 therefore, according to the description above, d(4,8) = d(4) + d(8) - 2 * d(lca(4,8)) = 2 + 3 - 2 * 1 = 3. Great, that worked!
However, the case stated above seems to fail for the vertex pair (8,3) (lca(8,3) = 2) d(8,3) = d(8) + d(3) - 2 * d(2) = 3 + 1 - 2 * 1 = 2. This is incorrect however, the distance d(8,3) = 4 as can be seen on the graph. The algorithm seems to fail for anything that crosses over the defined root.
What am I missing?
You missed that lca(8,3) = 1, not 2. Since d(1) = 0, this gives:
d(8,3) = d(8) + d(3) - 2 * d(1) = 3 + 1 - 2 * 0 = 4
For the appropriate 2 node, namely the one on the right, d(lca(8,2)) == 0, not 1 as you have it in your derivation. The distance from the root (which is the lca in this case) to itself is zero. So
d(8,2) = d(8) + d(2) - 2 * d(lca(8,2)) = 3 + 1 - 2 * 0 = 4
The fact that you have two nodes labeled 2 is probably confusing things.
Edit: The post has been edited so that a node originally labeled 2 is now labeled 3. In this case, the derivation is now correct but the statement
the distance d(8,2) = 4 as can be seen on the graph
is incorrect, d(8,2) = 2.
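A minimal Python sketch of the formula d(v) + d(w) - 2 * d(lca(v, w)), on a hypothetical tree (the question's graph is not reproduced here): node 1 is the root with children 2 and 3, node 2 has children 4 and 5, and node 5 has child 8. This gives the depths used in the question (d(4) = 2, d(8) = 3, d(3) = 1). The LCA is found naively by lifting the deeper node to equal depth and then walking both up together, which costs O(depth) per query; faster LCA structures exist but are not shown.

```python
def depths_and_parents(children, root):
    """Parent and depth of every node reachable from root."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            parent[c] = v
            depth[c] = depth[v] + 1
            stack.append(c)
    return parent, depth


def lca(u, v, parent, depth):
    while depth[u] > depth[v]:     # lift the deeper node first
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:                  # then climb together
        u, v = parent[u], parent[v]
    return u


def distance(u, v, parent, depth):
    return depth[u] + depth[v] - 2 * depth[lca(u, v, parent, depth)]


children = {1: [2, 3], 2: [4, 5], 5: [8]}  # hypothetical example tree
parent, depth = depths_and_parents(children, 1)
print(distance(4, 8, parent, depth), distance(8, 3, parent, depth))  # 3 4
```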

Resources