generate all structurally distinct full binary trees with n leaves - algorithm

This is a homework, I have difficulties in thinking of it. Please give me some ideas on recursions and DP solutions. Thanks a lot
generate and print all structurally distinct full binary
trees with n leaves in dotted parentheses form,
"full" means all internal (non-leaf) nodes have
exactly two children.
For example, there are 5 distinct full binary trees
with 4 leaves each.

In Python you could do this
def gendistinct(n):
leafnode = '(.)'
dp = []
newset = set()
newset.add(leafnode)
dp.append(newset)
for i in range(1,n):
newset = set()
for j in range(i):
for leftchild in dp[j]:
for rightchild in dp[i-j-1]:
newset.add('(' + '.' + leftchild + rightchild + ')')
dp.append(newset)
return dp[-1]
alltrees = gendistinct(4)
for tree in alltrees:
print tree

Another Python example with a different strategy.
This is recursive and uses generators. It is slower than the other implementation here but should use less memory since only one list should ever exist in memory at a time.
#!/usr/bin/env python
import itertools
def all_possible_trees(n):
if n == 1:
yield 'l'
for split in range(1, n):
gen_left = all_possible_trees(split)
gen_right = all_possible_trees(n-split)
for left, right in itertools.product(gen_left, gen_right):
yield [left, right]
if __name__ == '__main__':
import sys
n = int(sys.argv[1])
for thing in all_possible_trees(n):
print(thing)

I don't see an obvious way to do it with recursion, but no doubt there is one.
Rather, I would try a dynamic programming approach.
Note that under your definition of full tree, a tree with n leaves has n-1 internal nodes. Also note that the trees can be generated from smaller trees by joining together at the root two trees with sizes 1 to n-1 leaves on the left with n-1 to 1 leaves on the right.
Note also that the "trees" of various sizes can be stored as dotted parenthesis strings. To build a new tree from these, concatenate ( Left , Right ).
So start with the single tree with 1 leaf (that is, a single node). Build the lists of trees of increasing size up to n. To build the list of k-leaf trees, for each j = 1 to k-1, for each tree of j leaves, for each tree of k-j leaves, concatenate to build the tree (with k leaves) and add to the list.
As you build the n-leaf trees, you can print them out rather than store them.
There are 5*1 + 2*1 + 1*2 + 1*5 = 14 trees with 5 leaves.
There are 14*1 + 5*1 + 2*2 + 1*5 + 1*14 = 42 trees with 6 leaves.

U can use recursion, on i-th step u consider i-th level of tree and u chose which nodes will be present on this level according to constraints:
- there is parent on previous level
- no single children present (by your definition of "full" tree)
recursion ends when u have exactly N nodes.

Related

Can one extract the in-order given pre- and post-order for a binary search tree over < with only O(n) <-comparisons?

Assume one has a binary search tree B, which is not necessarily balanced, over some domain D with the strict order relation < and n elements.
Given B's extracted pre-order R, post-order T:
Is it possible to compute B's in-order S in O(n) without access to <?
Is it possible to compute B's in-order S using only O(n) comparisons with <?
Furthermore is it possible to compute S in O(n) total?
Note: This is a re-post of a now-deleted unanswered question.
1. In-order without relation <
This is impossible as explained in Why it is impossible to construct Binary Tree with Pre-Order, Post Order and Level Order traversals given?. In short, the input R = cba, T = abc is ambiguous and could stem from both these trees:
a a
/ \
b b
\ /
c c
S = bca S = acb
2. In-order in O(n) comparisons
Using the < relation, one can suddenly differentiate trees like the above although the produce the same pre-order R and post-order T. Given:
R = Ca
with C some arbitrary non-empty range of a's children nodes with C = u...v (i.e. range starts with u and ends with v) one can infer the following:
(1) a < u -> a has 1 direct child (to its right) -> all children are greater than a
(2) v < a -> a has 1 direct child (to its left) -> all children are less than a
(3) otherwise -> a has 2 direct children
Recursion in (1) and (2) is trivial and we have spent O(1) <. In case of (3) we have something of the form:
R = XbYca
T = abXcY
where X and Y are arbitrary sequences. We can split this into the recursion steps:
R = XbYca
T = abXcY
/ \
R = Xb R = Yc
T = bX T = cY
Note that this requires no comparisons with <, but requires to split both ranges. Since X and Y need not be the same length, finding the splitting-point requires us to find b in R, which can be done in O(n) for each level of the recursion tree. So in total we need O(n*d) equality comparisons, where d is the depth of the original binary search tree B (and the recursion tree which mirrors B).
In each recursion step we use at most 2 < comparisons, getting out one element of the range, hence we cannot use more than 2*n < comparisons (which is in O(n)).
3. In-order in O(n) total
In the algorithm given above the problem is that finding the point where to split the range containing all children cannot be done better than in linear time if one cannot afford a lookup table for all elements.
But, if the universe over which B is defined is small enough to be able to create an index table for all entries, one can pre-parse R (e.g. R = xbyca) in O(n) and create a table like this:
a -> 4
b -> 1
c -> 3
d -> N/A
e -> N/A
....
x -> 0
y -> 2
z -> N/A
Only if this is feasible one can achieve overall O(n) with the algorithm described in 2. It consume O(2^D) space.
Is it otherwise possible to produce the in-order S in O(n).
I don't have a proof for this, but do not think it is possible. The rationale is that the problem is too similar to comparison sorting which cannot be better than linearythmic.
In "2." we get away with linear comparisons because we can exploit the structure of the input in conjunction with a lot of equality checks to partially reconstruct the original structure of the binary search tree at least locally. However, I don't see how the size of each sub-tree can be extracted in less than linear time.

How many non-isomorphic full binary trees can n nodes generate?

Isomorphism means that arbitary sub-trees of a full binary tree swapping themselves can be identical to another one.
The answer is definitely not Catalan Number, because the amount of Catalan Number counts the isomorphic ones.
I assume by n nodes you mean n internal nodes. So it will have 2n+1 vertices, and 2n edges.
Next, we can put an order on binary trees as follows. A tree with more nodes is bigger. If two trees have the same number of nodes, compare the left side, and break ties by comparing the right. If two trees are equal in this order, it isn't hard to show by induction that they are the same tree.
For your problem, we can assume that for each isomorphism class we are only interested in the maximal tree in that isomorphism class. Note that this means that both the left and the right subtrees must also be maximal in their isomorphism classes, and the left subtree must be the same as or bigger than the right.
So suppose that f(n) is the number of non-isomorphic binary trees with n nodes. We can now go recursively. Here are our cases:
n=0 there is one, the empty tree.
n=1 there is one. A node with 2 leaves.
n > 1. Let us iterate over m, the number on the right. If 2m+1 < n then there are f(m) maximal trees on the right, f(n-m-1) on the left, and all of those are maximal for f(m) * f(n-m-1). If 2m+1 = n then we want a maximal tree on the right with m nodes, and a maximal tree on the left with m nodes, and the one on the right has to be smaller than or equal to the one on the left. But there is a well-known formula for how many ways to do that, which is f(m) * (f(m) + 1) / 2. And finally we can't have n < 2m+1 because in that case we don't have a maximal tree.
Using this, you should be able to write a recursive function to calculate the answer.
UPDATE Here is such a function:
cache = {0: 1, 1:1}
def count_nonisomorphic_full_trees (n):
if n not in cache:
answer = 0
for m in range(n):
c = count_nonisomorphic_full_trees(m)
if n < m+m+1:
break
elif n == m+m+1:
answer = answer + c*(c+1)//2
else:
d = count_nonisomorphic_full_trees(n-m-1)
answer = answer + c*d
cache[n] = answer
return cache[n]
Note that it starts off slower than the Catalan numbers but still grows exponentially.

Proof for number of internal nodes in a tree

I was reading about compressed tries and read the following:
a compressed trie is a tree which has L leaves and every internal node in the trie has at least 2 children.
Then the author wrote that a tree with L leaves such that every internal node has alteast 2 children, has atmost L-1 internal nodes. I am really confused why this is true.
Can somebody offer a inductive proof for it?
Inductive proof:
we will prove it by induction on L - the number of leaves in the tree.
base: a tree made out of 1 leaf is actually a tree with only a root. It has 0 internal nodes, and the claim is true.
assume the claim is true for a compressed tree with L leaves.
Step: let T be a tree with L+1 leaves. choose an arbitrary leaf, let it be l, and trim it.
In order to make the tree compressed again - you need to make l's father a leaf [if l's father has more then 2 sons including l, skip this step]. We do it by giving it the same value as l's brother, and trimming the brother.
Now you have a tree T' with L leaves.
By induction: T' has at most L-1 internal nodes.
so, T had L-1+1 = L internal nodes, and L+1 leaves, at most.
Q.E.D.
alternative proof:
a binary tree with L leaves has L-1 internal nodes (1 + 2 + 4 + ... + L/2 = L-1)
Since at "worst case" you have a binary tree [every internal node has at least 2 sons!], then you cannot have more then L-1 internal nodes!
You should try drawing a tree with L internal nodes, where every node has 2 children and there are L leaves. If you see why this is impossible, it won't be hard to figure out why it does work for L-1 internal nodes.
Ok, so I'll have a go.
Define trees for a start:
T_0 = { Leaf }
T_i = T_i-1 union { Node(c1, ..., cn) | n >= 1 && ci in T_i-1 }
Trees = sum T_i
Now, the (sketch) proof of your assertion.
It is easy to check it for T_0
For T_i: if t \in T_i it is either in T_i-1 or in the new elements. In the former case, use IH. In the latter case check the assertion (simple: if cis have L_i leaves, t has L = L_1 + ... + L_n leaves. It also hase no more than L_1 - 1 + L_2 - 1 + ... + L_n - 1 + 1 inner nodes (by IH for children, +1 for self). Because we assume each inner node has at least two children (that's the fact from the trie definition), it is no more than L_1 + l_2 + ... + L_n - 2 + 1 = L - 1).
By induction, if the assertion holds for t in T_i for all i, it holds for t in Trees.

binary tree data structures

Can anybody give me proof how the number of nodes in strictly binary tree is 2n-1 where n is the number of leaf nodes??
Proof by induction.
Base case is when you have one leaf. Suppose it is true for k leaves. Then you should proove for k+1. So you get the new node, his parent and his other leaf (by definition of strict binary tree). The rest leaves are k-1 and then you can use the induction hypothesis. So the actual number of nodes are 2*(k-1) + 3 = 2k+1 == 2*(k+1)-1.
just go with the basics, assuming there are x nodes in total, then we have n nodes with degree 1(leaves), 1 with degree 2(the root) and x-n-1 with degree 3(the inner nodes)
as a tree with x nodes will have x-1 edges. so summing
n + 3*(x-n-1) + 2 = 2(x-1) (equating the total degrees)
solving for x we get x = 2n-1
I'm guessing that what you really want is something like a proof that the depth is log2(N), where N is the number of nodes. In this case, the answer is fairly simple: for any given depth D, the number of nodes is 2D.
Edit: in response to edited question: the same fact pretty much applies. Since the number of nodes at any depth is 2D, the number of nodes further up the tree is 2D-1 + 2D-2 + ...20 = 2D-1. Therefore, the total number of nodes in a balanced binary tree is 2D + 2D-1. If you set n = 2D, you've gone the full circle back to the original equation.
I think you are trying to work out a proof for: N = 2L - 1 where L is the number
of leaf nodes and N is the total number of nodes in a binary tree.
For this formula to hold you need to put a few restrictions on how the binary
tree is constructed. Each node is either a leaf, which means it has no children, or
it is an internal node. Internal nodes have 3
possible configurations:
2 child nodes
1 child and 1 internal node
2 internal nodes
All three configurations imply that an internal node connects to two other nodes. This explicitly
rules out the situation where node connects to a single child as in:
o
/
o
Informal Proof
Start with a minimal tree of 1 leaf: L = 1, N = 1 substitute into N = 2L - 1 and the see that
the formula holds true (1 = 1, so far so good).
Now add another minimal chunk to the tree. To do that you need to add another two nodes and
tree looks like:
o
/ \
o o
Notice that you must add nodes in pairs to satisfy the restriction stated earlier.
Adding a pair of nodes always adds
one leaf (two new leaf nodes, but you loose one as it becomes an internal node). Node growth
progresses as the series: 1, 3, 5, 7, 9... but leaf growth is: 1, 2, 3, 4, 5... That is why the formula
N = 2L - 1 holds for this type of tree.
You might use mathematical induction to construct a formal proof, but this works find for me.
Proof by mathematical induction:
The statement that there are (2n-1) of nodes in a strictly binary tree with n leaf nodes is true for n=1. { tree with only one node i.e root node }
let us assume that the statement is true for tree with n-1 leaf nodes. Thus the tree has 2(n-1)-1 = 2n-3 nodes
to form a tree with n leaf nodes we need to add 2 child nodes to any of the leaf nodes in the above tree. Thus the total number of nodes = 2n-3+2 = 2n-1.
hence, proved
To prove: A strictly binary tree with n leaves contains 2n-1 nodes.
Show P(1): A strictly binary tree with 1 leaf contains 2(1)-1 = 1 node.
Show P(2): A strictly binary tree with 2 leaves contains 2(2)-1 = 3 nodes.
Show P(3): A strictly binary tree with 3 leaves contains 2(3)-1 = 5 nodes.
Assume P(K): A strictly binary tree with K leaves contains 2K-1 nodes.
Prove P(K+1): A strictly binary tree with K+1 leaves contains 2(K+1)-1 nodes.
2(K+1)-1 = 2K+2-1
= 2K+1
= 2K-1 +2*
* This result indicates that, for each leaf that is added, another node must be added to the father of the leaf , in order for it to continue to be a strictly binary tree. So, for every additional leaf, a total of two nodes must be added, as expected.
int N = 1000; insert here the value of N
int sum = 0; // the number of total nodes
int currFactor = 1;
for (int i = 0; i< log(N); ++i) //the is log(N) levels
{
sum += currFactor;
currFactor *= 2; //in each level the number of node is double than the upper level
}
if(sum == 2*N - 1)
{
cout<<"wow that the number of nodes is 2*N-1";
}

Enumerate search trees

According to this question the number of different search trees of a certain size is equal to a catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:
Node* id2tree(int id); // return root of tree
int tree2id(Node* root); // return id of tree
(I ask because the binary code for the tree (se one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e, a variable length code for integers
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc
notice that the number of integers of each code length is 1, 1, 2, 5,.. (the catalan sequence). )
It should be possible to convert the id to tree and back.
The id and bitstrings being:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
First consider the fact that given a bitstring, we can easily move to the tree (a recursive method) and viceversa (preorder, outputting 1 for parent and 0 for leaf).
The main challenge comes from trying to map the id to the bitstring and vice versa.
Suppose we listed out the trees of n nodes as follows:
Left sub-tree n-1 nodes, Right sub-tree 0 nodes. (Cn-1*C0 of them)
Left sub-tree n-2 nodes, Right sub-tree 1 node. (Cn-2*C1 of them)
Left sub-tree n-3 nodes, right sub-tree 2 nodes. (Cn-3*C2 of them)
...
...
Left sub-tree 0 nodes, Right sub-tree n-1 nodes. (C0*Cn-1 of them)
Cr = rth catalan number.
The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.
So say we have an id = S say. We first find an n such that
C0 + C1 + C2 + ... + Cn < S <= C0+C1+ C2 + ... +Cn+1
Then S would correspond to a tree with n+1 nodes.
So you now consider P = S - (C0+C1+C2+ ...+Cn), which is the position in the enumeration of the trees of n+1 nodes.
Now we figure out an r such that Cn*C0 + Cn-1*C1 + .. + Cn-rCr < P <= CnC0 + Cn-1*C1 + .. + Cn-r+1*Cr-1
This tell us how many nodes the left subtree and the right subtree have.
Considering P - Cn*C0 + Cn-1*C1 + .. + Cn-r*Cr , we can now figure out the exact left subtree enumeration position(only considering trees of that size) and the exact right subtree enumeration position and recursively form the bitstring.
Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.
Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.
For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the amount of children) for every node, only restricted by having the total reach exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).
But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.
Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).
So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.
I hope this is a good starting point for a discussion.

Resources