Enumerate search trees - algorithm

According to this question, the number of different search trees of a given size is equal to a Catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:
Node* id2tree(int id); // return root of tree
int tree2id(Node* root); // return id of tree
(I ask because the binary code for the tree (see one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e., a variable-length code for integers:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc
Notice that the number of integers of each code length is 1, 1, 2, 5, ... (the Catalan sequence).)

It should be possible to convert the id to tree and back.
The id and bitstrings being:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
First consider the fact that, given a bitstring, we can easily move to the tree (a recursive method) and vice versa (a preorder traversal, outputting 1 for a node and 0 for an empty leaf position).
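The bitstring/tree conversion described here can be sketched as follows. This is a minimal Python sketch, not the asker's C interface: the `Node` class and the names `tree_to_bits` and `bits_to_tree` are assumptions made for illustration, and an absent child is represented by `None`.

```python
class Node:
    """Hypothetical node type: only the shape matters, so it stores no key."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def tree_to_bits(node):
    """Preorder traversal: 1 for a node, 0 for an empty (leaf) position."""
    if node is None:
        return "0"
    return "1" + tree_to_bits(node.left) + tree_to_bits(node.right)


def bits_to_tree(bits):
    """Recursive inverse of tree_to_bits."""
    def parse(i):
        # Returns the subtree that starts at index i and the index after it.
        if bits[i] == "0":
            return None, i + 1
        left, i = parse(i + 1)
        right, i = parse(i)
        return Node(left, right), i

    tree, end = parse(0)
    if end != len(bits):
        raise ValueError("trailing bits after a complete tree")
    return tree
```

For example, `tree_to_bits(bits_to_tree("1100100"))` gives back `"1100100"`.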
The main challenge comes from trying to map the id to the bitstring and vice versa.
Suppose we listed out the trees of n nodes as follows:
Left subtree n-1 nodes, right subtree 0 nodes. (C_{n-1}*C_0 of them)
Left subtree n-2 nodes, right subtree 1 node. (C_{n-2}*C_1 of them)
Left subtree n-3 nodes, right subtree 2 nodes. (C_{n-3}*C_2 of them)
...
...
Left subtree 0 nodes, right subtree n-1 nodes. (C_0*C_{n-1} of them)
C_r = r-th Catalan number.
The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.
So say we have an id S (counting from 0, as in the list above). We first find an n such that
C_0 + C_1 + ... + C_n <= S < C_0 + C_1 + ... + C_{n+1}
Then S corresponds to a tree with n+1 nodes. (If S = 0, the tree is empty.)
Now consider P = S - (C_0 + C_1 + ... + C_n), which is the 0-based position in the enumeration of the trees with n+1 nodes.
Next we find an r such that
C_n*C_0 + C_{n-1}*C_1 + ... + C_{n-r+1}*C_{r-1} <= P < C_n*C_0 + C_{n-1}*C_1 + ... + C_{n-r}*C_r
(for r = 0 the left-hand sum is empty, i.e. 0). This tells us that the left subtree has n-r nodes and the right subtree has r nodes.
Considering Q = P - (C_n*C_0 + C_{n-1}*C_1 + ... + C_{n-r+1}*C_{r-1}), we can now figure out the exact left subtree enumeration position (only considering trees of that size) and the exact right subtree enumeration position: since the left subtree is kept fixed while the right subtrees are enumerated, the left position is Q div C_r and the right position is Q mod C_r. We then recursively form the bitstring.
Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.
Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.
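The id/bitstring mapping described in this answer can be sketched in Python, whose integers are arbitrary-precision, so the large numbers are not a problem. This is only a sketch under stated assumptions: ids are taken to be 0-based as in the listing, and the names `catalan`, `rank_to_bits`, `id_to_bits` and `bits_to_id` are made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def catalan(n):
    """n-th Catalan number via the standard recurrence."""
    if n == 0:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))


def rank_to_bits(size, pos):
    """Bitstring of the pos-th (0-based) tree among those with `size` nodes."""
    if size == 0:
        return "0"
    # Largest left subtree first, as in the enumeration above.
    for left in range(size - 1, -1, -1):
        right = size - 1 - left
        block = catalan(left) * catalan(right)
        if pos < block:
            # Left subtree stays fixed while the right one is enumerated.
            lpos, rpos = divmod(pos, catalan(right))
            return "1" + rank_to_bits(left, lpos) + rank_to_bits(right, rpos)
        pos -= block
    raise ValueError("position out of range")


def id_to_bits(tree_id):
    """Skip all smaller tree sizes, then rank within the right size."""
    size = 0
    while tree_id >= catalan(size):
        tree_id -= catalan(size)
        size += 1
    return rank_to_bits(size, tree_id)


def bits_to_id(bits):
    def parse(i):
        # Returns (node count, position among trees of that size, next index).
        if bits[i] == "0":
            return 0, 0, i + 1
        lsize, lpos, i = parse(i + 1)
        rsize, rpos, i = parse(i)
        size = lsize + rsize + 1
        # Trees with a larger left subtree come earlier in the enumeration.
        offset = sum(catalan(l) * catalan(size - 1 - l)
                     for l in range(size - 1, lsize, -1))
        return size, offset + lpos * catalan(rsize) + rpos, i

    size, pos, _ = parse(0)
    return sum(catalan(k) for k in range(size)) + pos
```

With these definitions, `[id_to_bits(i) for i in range(9)]` reproduces the nine codes listed in the question, and `bits_to_id` inverts the mapping.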

For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the number of children) for every node, only restricted by having the total reach exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).
But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.
Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).
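The variable-base encoding of permutations mentioned here might look like the following Python sketch. The function names are hypothetical, and the digit order (least significant digit first, bases N down to 1) is just one possible convention.

```python
def index_to_permutation(index, items):
    """Decode a number into a permutation using bases N, N-1, ..., 1."""
    pool = list(items)
    perm = []
    for base in range(len(pool), 0, -1):
        index, digit = divmod(index, base)  # next variable-base digit
        perm.append(pool.pop(digit))        # pick the digit-th remaining item
    return perm


def permutation_to_index(perm, items):
    """Inverse of index_to_permutation for the same list of items."""
    pool = list(items)
    index, weight = 0, 1
    for base, value in zip(range(len(pool), 0, -1), perm):
        digit = pool.index(value)
        pool.pop(digit)
        index += digit * weight
        weight *= base
    return index
```

For four items, the indices 0 to 23 map to the 24 distinct permutations and back.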
So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.
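A small brute-force check of this idea, as a Python sketch: it inserts every permutation of 1..N into an initially empty binary search tree and collects the distinct results. The tuple representation `(key, left, right)` and the names `insert` and `bst_shapes` are assumptions for illustration. For small N the number of distinct trees matches the Catalan numbers, which also shows that several permutations can lead to the same tree (for example `[2, 1, 3]` and `[2, 3, 1]`), so the permutation index is not a one-to-one id for the tree.

```python
from itertools import permutations


def insert(tree, key):
    """Plain (unbalanced) BST insertion; a tree is None or (key, left, right)."""
    if tree is None:
        return (key, None, None)
    root, left, right = tree
    if key < root:
        return (root, insert(left, key), right)
    return (root, left, insert(right, key))


def bst_shapes(n):
    """All distinct BSTs obtained by inserting permutations of 1..n."""
    shapes = set()
    for perm in permutations(range(1, n + 1)):
        tree = None
        for key in perm:
            tree = insert(tree, key)
        shapes.add(tree)
    return shapes
```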
I hope this is a good starting point for a discussion.

Related

How many different rooted unlabelled binary trees have exactly 9 nodes and are left-heavy?

I know how many trees are possible using nth Catalan number but don't know how to find left-heavy trees. Is there any technique?
The answer is: 2357
I provide here a reasoned approach (no programming involved) and code to produce the same result, but via a more brute-force method.
A. Reasoning
Intuitively, it seems easier to count by exclusion. So that leads to this approach:
- Count all the binary trees with 9 nodes. As you already indicated, this corresponds to the 9th Catalan number. This is C9 = 4862.
- Subtract the number of trees whose roots are balanced, i.e. where the two subtrees of the root have equal heights (let's call those subtrees L and R). That gives us the number of trees that are either left- or right-heavy.
- As there are just as many left- as right-heavy trees, divide this result by two to get the final result.
So now we can focus on calculating the number mentioned in the second bullet point:
Counting trees that are strictly balanced at the root
A tree of height 2 would have at most 7 nodes (when it is full), so the height needs to be at least 3. A tree of height 5 (that is balanced at the root) needs at least 5 nodes in L (on a single path), and 5 in R (also on a single path), so the height cannot be more than 4. We thus have only two possibilities: the height of a 9-node binary tree that is balanced at the root is either 3 or 4.
Let's deal with these two cases separately:
1. When the height of the tree is 3
In this case we have a tree with 4 levels. Let's analyse each level:
There are 3 nodes in the first two levels: the root and its two children.
The third level is the first level where there can be some variation: that level has between 2 and 4 nodes. Let's deal with those three cases one by one:
1.a Third level has 2 nodes
Here the third level has one node in L and one in R. Each can either be a left or right child of its parent. So there are two possibilities at either side: 2x2 = 4 possibilities.
There is no variation possible in the fourth level: the four remaining nodes are children of the two nodes in the third level.
Possibilities: 4
1.b Third level has 3 nodes
There are 4 ways to select three positions from the four available positions in the third level. Either L or R gets only one node. Let's call this node x.
In the fourth level we need to distribute the three remaining nodes, such that L and R get at least one of those. This is achieved when x gets either one or two children.
When x gets two children, the other remaining node has 4 possible positions: it can be the left or right child of either of the two third-level nodes on the other side.
When x gets one child, it can be either a left or right child, and the other two remaining nodes can occupy those same four positions in 6 ways: 2x6=12.
So given a choice in the third level, there are 4+12=16 possible configurations for the fourth level.
Combining this with the possibilities in the third level, we get 4x16:
Possibilities: 64
1.c Third level has 4 nodes
The third level is thus full. The two remaining nodes on the fourth level need to be split between L and R, and so each has 4 possible positions. This gives 4x4 = 16 possibilities in total.
Possibilities: 16
2. When the height of the tree is 4
When the height is 4, then by consequence L and R each have only one leaf: they are chains of 4 nodes each. This is the only way to make the root strictly balanced and get a height of 4.
There is no choice for the root node of L (it is the left child of the root), but from there on, each next descendant in L can be either a left or right child of its parent. The shape of L has thus 2^3 = 8 possibilities. Considering the same for R, we have a total of 8x8 = 64 shapes.
Possibilities: 64
Total
Taking all of the above together, we have 4+64+16 + 64 = 148 possible shapes that give a tree with a balanced root.
So applying the approach set out at the top, the total number of left-heavy binary trees with 9 unlabelled nodes is (4862-148)/2 = 2357
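The exclusion argument can be cross-checked with a short Python sketch. The function names are made up for this illustration, and the height convention is an assumption consistent with the reasoning above: a single node has height 0 and an empty subtree has height -1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def up_to_height(n, h):
    """Number of binary tree shapes with n nodes and height at most h."""
    if n == 0:
        return 1 if h >= -1 else 0  # the empty tree has height -1
    if h < 0:
        return 0
    return sum(up_to_height(i, h - 1) * up_to_height(n - 1 - i, h - 1)
               for i in range(n))


def with_height(n, h):
    """Number of shapes with n nodes and height exactly h."""
    return up_to_height(n, h) - up_to_height(n, h - 1)


def balanced_at_root(n):
    """Shapes whose two root subtrees have equal height."""
    return sum(with_height(i, h) * with_height(n - 1 - i, h)
               for h in range(-1, n) for i in range(n))


def left_heavy(n):
    total = up_to_height(n, n)  # no height restriction: the Catalan number
    return (total - balanced_at_root(n)) // 2


print(up_to_height(9, 9), balanced_at_root(9), left_heavy(9))  # 4862 148 2357
```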
B. Code
To make this a programming challenge, here is an implementation in JavaScript that defines the following functions:
countTreesUpToHeight(n, height): count all binary trees with n nodes, that are not higher than the given height. Uses recursion.
countTreesWithHeight(n, height): count all binary trees with n nodes, that have exactly the given height. Uses the preceding function.
countLeftHeavy(n): the main function. Uses the other two functions to count all combinations where the root's left subtree is higher than the right one.
So this approach is not like the exclusion approach above. It actually counts the combinations of interest. The output is the same.
function countTreesUpToHeight(n, height) {
    if (n > 2**(height+1) - 1) return 0; // too many nodes to fit within height
    if (n < 2) return 1;
    let count = 0;
    for (let i = 0; i < n; i++) {
        count += countTreesUpToHeight(i, height-1)
               * countTreesUpToHeight(n-1-i, height-1);
    }
    return count;
}

function countTreesWithHeight(n, height) {
    return countTreesUpToHeight(n, height) - countTreesUpToHeight(n, height-1);
}

function countLeftHeavy(n) {
    let count = 0;
    // make choices for the height of the whole tree
    // (the left subtree then has height-1, the right one at most height-2)
    for (let height = 0; height < n; height++) {
        // make choices for the number of nodes in the left subtree
        for (let i = 0; i < n; i++) {
            // multiply the number of combinations for the left subtree
            // with those for the right subtree
            count += countTreesWithHeight(i, height-1)
                   * countTreesUpToHeight(n-1-i, height-2);
        }
    }
    return count;
}

let result = countLeftHeavy(9);
console.log(result); // 2357

An algorithm to operate on a binary tree leaves?

I didn't systematically learn the Data Structure and Algorithm course in the uni (just read some books) and would like to ask if there is a well-defined algorithm to do the following work for a binary tree:
For a given binary tree and a positive integer n, search its leaves. If the difference between the depths of two adjacent leaves (imagine all leaves are displayed as an array, so two adjacent leaves may be in two different subtrees) is larger than n, subdivide the leaf with the lower depth. Recursively do this operation until no subdivision is required.
The following figure is a demonstration, for n:
Since leaf 1's depth is smaller than leaf 2's by 2, leaf 1 needs to be subdivided:
Now no further subdivision is required.
1. Initialize P as empty.
2. DFS the remaining part of the tree until you are in a leaf L; exit if there are no more leaves.
3. Compare the current depth DL with the depth DP of the previous leaf P; go to [5] if P is empty.
4. If |DL - DP| ≥ N, split either L (when DL < DP) or P (otherwise).
5. Consider L the new P and go to [2].
Repeat this until the tree converges (i.e. until no splits can be performed).
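A rough Python sketch of this repeat-until-converged idea follows. Everything about the representation is an assumption: a leaf is an empty list `[]`, an internal node is a two-element list `[left, right]`, and "subdividing" a leaf means giving it two new leaf children. The sketch uses the question's criterion (difference strictly larger than n) rather than the ≥ N test in the steps above, and it rescans all leaves on every pass instead of continuing the DFS.

```python
def leaves_with_depth(node, depth=0, out=None):
    """Collect (leaf, depth) pairs from left to right."""
    if out is None:
        out = []
    if not node:                      # an empty list is a leaf
        out.append((node, depth))
    else:
        for child in node:
            leaves_with_depth(child, depth + 1, out)
    return out


def balance_leaves(root, n):
    """Split shallow leaves until adjacent leaf depths differ by at most n."""
    changed = True
    while changed:
        changed = False
        leaves = leaves_with_depth(root)
        for (a, depth_a), (b, depth_b) in zip(leaves, leaves[1:]):
            if abs(depth_a - depth_b) > n:
                shallow = a if depth_a < depth_b else b
                if not shallow:       # skip if already split during this pass
                    shallow.extend([[], []])
                    changed = True
    return root
```

For instance, with `root = [[], [[[], []], []]]` (leaf depths 1, 3, 3, 2) and `n = 1`, the first leaf is split once and the leaf depths become 2, 2, 3, 3, 2.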

Find the kth node in Nth level of binary tree [duplicate]

Possible Duplicate:
K-th element in a heap tree
Given a binary tree: if the parent is 0, then the left child is 0 and the right child is 1; if the parent is 1, then the left child is 1 and the right child is 0. The root of the tree is 0. Find the value of the kth node at the Nth level.
I tried to solve in this way. Suppose first level has 0, second level has 01, third level has 01 - 10 (i.e complement of first half).
Similarly 0110 1001 on the fourth level.
Now how can I generalize this solution or any other way to solve this question?
To generalize your idea, you could write a recursive procedure that gives the list of the elements of the nth level of the tree, since (like you said) every level can be obtained concatenating the upper level and its complement:
getLevel(level)
    if level == 0
        return [0]
    upperLevel = getLevel(level - 1)
    return upperLevel + complement(upperLevel)
Where [...] is a list, + is the concatenation of lists and complement changes 0 into 1 and vice versa.
Having this, you just have to get the kth element of the list generated by getLevel(n).
This is probably not the optimal solution, it's just built on your idea (and it's easy).
I manually generated the first several bits, and got 0110100110010110. Google reveals this is the Thue-Morse sequence, sequence A010060 in OEIS. The comments on the OEIS page have this line:
a(n) = S2(n) mod 2, where S2(n) = sum of digits of n, n in base-2 notation.
Here n is what in your case is k (counted from 0), and N in your case does not matter. So, to determine a(n), count the number of 1's in the binary representation of n and take the least significant bit of this count.
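Both answers can be compared in a few lines of Python. The function names are assumptions for illustration; `get_level` follows the recursive pseudocode (with level 0 being the root), and `kth_value` uses the bit-count formula with k counted from 0.

```python
def get_level(level):
    """Values on the given level: the level above followed by its complement."""
    if level == 0:
        return [0]
    upper = get_level(level - 1)
    return upper + [1 - bit for bit in upper]


def kth_value(k):
    """Parity of the number of 1 bits in k (the Thue-Morse sequence)."""
    return bin(k).count("1") % 2


level = get_level(4)
print("".join(map(str, level)))                                   # 0110100110010110
print(all(kth_value(k) == bit for k, bit in enumerate(level)))    # True
```

The second form needs no list at all, so it works for levels far too large to generate.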

generate all structurally distinct full binary trees with n leaves

This is homework, and I'm having difficulty approaching it. Please give me some ideas for recursive and DP solutions. Thanks a lot.
generate and print all structurally distinct full binary
trees with n leaves in dotted parentheses form,
"full" means all internal (non-leaf) nodes have
exactly two children.
For example, there are 5 distinct full binary trees
with 4 leaves each.
In Python you could do this
def gendistinct(n):
    leafnode = '(.)'
    dp = []  # dp[i] holds all trees with i+1 leaves
    newset = set()
    newset.add(leafnode)
    dp.append(newset)
    for i in range(1, n):
        newset = set()
        # combine every tree with j+1 leaves with every tree with i-j leaves
        for j in range(i):
            for leftchild in dp[j]:
                for rightchild in dp[i-j-1]:
                    newset.add('(' + '.' + leftchild + rightchild + ')')
        dp.append(newset)
    return dp[-1]

alltrees = gendistinct(4)
for tree in alltrees:
    print(tree)
Another Python example with a different strategy.
This is recursive and uses generators. It is slower than the other implementation here but should use less memory since only one list should ever exist in memory at a time.
#!/usr/bin/env python
import itertools

def all_possible_trees(n):
    if n == 1:
        yield 'l'
    for split in range(1, n):
        gen_left = all_possible_trees(split)
        gen_right = all_possible_trees(n-split)
        for left, right in itertools.product(gen_left, gen_right):
            yield [left, right]

if __name__ == '__main__':
    import sys
    n = int(sys.argv[1])
    for thing in all_possible_trees(n):
        print(thing)
I don't see an obvious way to do it with recursion, but no doubt there is one.
Rather, I would try a dynamic programming approach.
Note that under your definition of full tree, a tree with n leaves has n-1 internal nodes. Also note that the trees can be generated from smaller trees by joining together at the root two trees with sizes 1 to n-1 leaves on the left with n-1 to 1 leaves on the right.
Note also that the "trees" of various sizes can be stored as dotted parenthesis strings. To build a new tree from these, concatenate ( Left , Right ).
So start with the single tree with 1 leaf (that is, a single node). Build the lists of trees of increasing size up to n. To build the list of k-leaf trees, for each j = 1 to k-1, for each tree of j leaves, for each tree of k-j leaves, concatenate to build the tree (with k leaves) and add to the list.
As you build the n-leaf trees, you can print them out rather than store them.
There are 5*1 + 2*1 + 1*2 + 1*5 = 14 trees with 5 leaves.
There are 14*1 + 5*1 + 2*2 + 1*5 + 1*14 = 42 trees with 6 leaves.
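The counts above follow the same recurrence as the construction, so they can be checked without building any strings. A minimal Python sketch (the function name is an assumption):

```python
def count_full_trees(n):
    """Number of structurally distinct full binary trees with n leaves."""
    counts = [0, 1]  # counts[k] = number of trees with k leaves; one 1-leaf tree
    for k in range(2, n + 1):
        # j leaves on the left, k - j leaves on the right
        counts.append(sum(counts[j] * counts[k - j] for j in range(1, k)))
    return counts[n]


print([count_full_trees(n) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```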
You can use recursion: at the i-th step you consider the i-th level of the tree and choose which nodes will be present on this level according to the constraints:
- there is a parent on the previous level
- no single children are present (by your definition of a "full" tree)
The recursion ends when you have exactly N nodes.

Counting Treaps

Consider the problem of counting the number of structurally distinct binary search trees:
Given N, find the number of structurally distinct binary search trees containing the values 1 .. N
It's pretty easy to give an algorithm that solves this: fix every possible number in the root, then recursively solve the problem for the left and right subtrees:
countBST(numKeys)
    if numKeys <= 1
        return 1
    else
        result = 0
        for i = 1 .. numKeys
            leftBST = countBST(i - 1)
            rightBST = countBST(numKeys - i)
            result += leftBST * rightBST
        return result
I've recently been familiarizing myself with treaps, and I posed the following problem to myself:
Given N, find the number of distinct treaps containing the values 1 .. N with priorities 1 .. N. Two treaps are distinct if they are structurally different relative to EITHER the key OR the priority (read on for clarification).
I've been trying to figure out a formula or an algorithm that can solve this for a while now, but I haven't been successful. This is what I noticed though:
The answers for n = 2 and n = 3 seem to be 2 and 6, based on me drawing trees on paper.
If we ignore the part that says treaps can also be different relative to the priority of the nodes, the problem seems to be identical to counting just binary search trees, since we'll be able to assign priorities to each BST such that it also respects the heap invariant. I haven't proven this though.
I think the hard part is accounting for the possibility to permute the priorities without changing the structure. For example, consider this treap, where the nodes are represented as (key, priority) pairs:
      (3, 5)
      /    \
 (2, 3)    (4, 4)
   /          \
(1, 1)        (5, 2)
We can permute the priorities of both the second and third levels while still maintaining the heap invariant, so we get more solutions even though no keys switch place. This probably gets even uglier for bigger trees. For example, this is a different treap from the one above:
      (3, 5)
      /    \
 (2, 4)    (4, 3)    // swapped priorities
   /          \
(1, 1)        (5, 2)
I'd appreciate if anyone can share any ideas on how to approach this. It seemed like an interesting counting problem when I thought about it. Maybe someone else thought about it too and even solved it!
Interesting question! I believe the answer is N factorial!
Given a tree structure, there is exactly one way to fill in the binary search tree key values.
Thus all we need to do is count the number of different heaps.
Given a heap, consider an in-order traversal of the tree.
This corresponds to a permutation of the numbers 1 to N.
Now given any permutation of {1,2...,N}, you can construct a heap as follows:
Find the position of the largest element. The elements to its left form the left subtree and the elements to its right form the right subtree. These subtrees are formed recursively by finding the largest element and splitting there.
This gives rise to a heap, as we always choose the max element, and the in-order traversal of that heap is the permutation we started with. Thus we have a way of going from a heap to a permutation and back uniquely.
Thus the required number is N!.
As an example:
    5
   / \
  3   4      In-order traversal -> 35142
     / \
    1   2
Now start with 35142. Largest is 5, so 3 is left subtree and 142 is right.
    5
   / \
  3  {142}
In 142, 4 is largest and 1 is left and 2 is right, so we get
    5
   / \
  3   4
     / \
    1   2
The only way to fill in binary search keys for this is:
      (2,5)
      /   \
  (1,3)   (4,4)
          /   \
      (3,1)   (5,2)
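The permutation-to-heap construction used in this example can be sketched in Python. The tuple representation `(value, left, right)` and the function names are assumptions for illustration.

```python
from itertools import permutations


def heap_from_permutation(perm):
    """Split at the largest element; the two sides become the subtrees."""
    if not perm:
        return None
    i = perm.index(max(perm))
    return (perm[i],
            heap_from_permutation(perm[:i]),
            heap_from_permutation(perm[i + 1:]))


def in_order(tree):
    """In-order traversal; recovers the permutation the heap was built from."""
    if tree is None:
        return []
    value, left, right = tree
    return in_order(left) + [value] + in_order(right)


heap = heap_from_permutation([3, 5, 1, 4, 2])
print(heap)            # (5, (3, None, None), (4, (1, None, None), (2, None, None)))
print(in_order(heap))  # [3, 5, 1, 4, 2]

# Every permutation of 1..4 yields a different heap, so there are 4! = 24.
print(len({heap_from_permutation(p) for p in permutations(range(1, 5))}))  # 24
```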
For a more formal proof:
If H_N is the number of heaps on 1...N, then we have that
H_N = Sum_{L=0 to N-1} H_L * H_{N-1-L} * (N-1 choose L)
(basically we pick the max and assign it to the root, choose the size of the left subtree, choose that many elements, and recurse on left and right).
Now,
H_0 = 1
H_1 = 1
H_2 = 2
H_3 = 6
If H_n = n! for 0 ≤ n ≤ K,
then H_{K+1} = Sum_{L=0 to K} L! * (K-L)! * (K!/(L!*(K-L)!)) = Sum_{L=0 to K} K! = (K+1)!
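The recurrence can also be evaluated numerically as a sanity check. A small Python sketch (the function name is an assumption), using `math.comb` for the binomial coefficient:

```python
from math import comb, factorial


def count_heaps(n):
    """Evaluate H_n from the recurrence, bottom-up."""
    h = [1]  # H_0 = 1
    for size in range(1, n + 1):
        # the maximum goes to the root; choose which `left` elements go left
        h.append(sum(h[left] * h[size - 1 - left] * comb(size - 1, left)
                     for left in range(size)))
    return h[n]


print([count_heaps(n) for n in range(6)])                        # [1, 1, 2, 6, 24, 120]
print(all(count_heaps(n) == factorial(n) for n in range(12)))    # True
```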
def countBST(numKeys:Long):Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _ => (1L to numKeys).map{i=>countBST(i-1) * countBST(numKeys-i)}.sum
}
You didn't actually define structural similarity for treaps -- you just gave examples. I'm going to assume the following definition: two trees are structurally different if and only if they have a different shape, or there exist nodes a (from tree A) and b (from tree B) such that a and b are in the same position, and the priorities of the children of a are in the opposite order of the priorities of the children of b. (It's obvious that if two treaps on the same values have the same shape, then the values in corresponding nodes are the same.)
In other words, if we visualize two trees by just giving the priorities on the nodes, the following two trees are structurally similar:
       7                 7
     6   5             6   5
    4 3 2 1           2 1 4 3     <--- does not change the relative order
                                       of the children of any node
                                       6's left child is still greater than 6's right child
                                       5's left child is still greater than 5's right child
but the following two trees are structurally different:
       7                 7
     5   6             6   5      <--- changes the relative order of the children
    4 3 2 1           4 3 2 1          of node 7
Thus for the treap problem, each internal node has 2 orderings, and these two orderings do not otherwise affect the shape of the tree. So...
def countTreap(numKeys:Long):Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _ => 2 * countBST(numKeys-1) + //2 situations when the tree has only 1 child
            2 * (2L to (numKeys-1)).map{i=>countBST(i-1) * countBST(numKeys-i)}.sum
            // and for each situation where the tree has 2 children, this node
            // contributes 2 orderings the priorities of its children
            // (which is independent of the shape of the tree below this level)
}

Resources