Counting Treaps - algorithm

Consider the problem of counting the number of structurally distinct binary search trees:
Given N, find the number of structurally distinct binary search trees containing the values 1 .. N
It's pretty easy to give an algorithm that solves this: fix every possible number in the root, then recursively solve the problem for the left and right subtrees:
countBST(numKeys)
    if numKeys <= 1
        return 1
    else
        result = 0
        for i = 1 .. numKeys
            leftBST = countBST(i - 1)
            rightBST = countBST(numKeys - i)
            result += leftBST * rightBST
        return result
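The recursion above recomputes the same subproblems many times, so it takes exponential time. Here is a minimal bottom-up sketch in JavaScript (the name `countBSTMemo` is hypothetical) that fills a table instead, in O(N^2) time. The values it produces are the Catalan numbers. Plain Numbers stay exact up to numKeys = 30; beyond that, BigInt would be needed.

```javascript
// Bottom-up version of countBST: table[k] = number of BST shapes on k keys.
function countBSTMemo(numKeys) {
  const table = [1]; // exactly one (empty) tree on 0 keys
  for (let k = 1; k <= numKeys; k++) {
    let result = 0;
    // fix key i at the root: i - 1 keys go left, k - i keys go right
    for (let i = 1; i <= k; i++) {
      result += table[i - 1] * table[k - i];
    }
    table.push(result);
  }
  return table[numKeys];
}

console.log([0, 1, 2, 3, 4, 5].map((n) => countBSTMemo(n)).join(" ")); // 1 1 2 5 14 42
```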
I've recently been familiarizing myself with treaps, and I posed the following problem to myself:
Given N, find the number of distinct treaps containing the values 1 .. N with priorities 1 .. N. Two treaps are distinct if they are structurally different relative to EITHER the key OR the priority (read on for clarification).
I've been trying to figure out a formula or an algorithm that can solve this for a while now, but I haven't been successful. This is what I noticed though:
The answers for n = 2 and n = 3 seem to be 2 and 6, based on me drawing trees on paper.
If we ignore the part that says treaps can also be different relative to the priority of the nodes, the problem seems to be identical to counting just binary search trees, since we'll be able to assign priorities to each BST such that it also respects the heap invariant. I haven't proven this though.
I think the hard part is accounting for the possibility to permute the priorities without changing the structure. For example, consider this treap, where the nodes are represented as (key, priority) pairs:
         (3, 5)
        /      \
    (2, 3)    (4, 4)
    /              \
(1, 1)            (5, 2)
We can permute the priorities of both the second and third levels while still maintaining the heap invariant, so we get more solutions even though no keys switch place. This probably gets even uglier for bigger trees. For example, this is a different treap from the one above:
         (3, 5)
        /      \
    (2, 4)    (4, 3)      // swapped priorities
    /              \
(1, 1)            (5, 2)
I'd appreciate it if anyone could share any ideas on how to approach this. It seemed like an interesting counting problem when I thought about it. Maybe someone else has thought about it too and even solved it!

Interesting question! I believe the answer is N factorial!
Given a tree structure, there is exactly one way to fill in the binary search tree key values.
Thus all we need to do is count the different number of heaps.
Given a heap, consider an in-order traversal of the tree.
This corresponds to a permutation of the numbers 1 to N.
Now given any permutation of {1,2...,N}, you can construct a heap as follows:
Find the position of the largest element. The elements to its left form the left subtree and the elements to its right form the right subtree. These subtrees are formed recursively by finding the largest element and splitting there.
This gives rise to a heap, as we always choose the max element, and the in-order traversal of that heap is the permutation we started with. Thus we have a way of going from a heap to a permutation and back uniquely.
Thus the required number is N!.
As an example:
    5
   / \
  3   4      In-order traversal -> 35142
     / \
    1   2
Now start with 35142. Largest is 5, so 3 is left subtree and 142 is right.
    5
   / \
  3   {142}
In 142, 4 is largest and 1 is left and 2 is right, so we get
    5
   / \
  3   4
     / \
    1   2
The only way to fill in binary search keys for this is:
      (2,5)
     /     \
  (1,3)   (4,4)
          /    \
       (3,1)  (5,2)
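The permutation-to-heap construction described above (split at the largest element, recurse on both sides) is what is usually called building a Cartesian tree. A minimal JavaScript sketch, with the hypothetical name `treapFromPriorities`: it takes the priorities listed by in-order position and assigns keys 1..N by in-order position, so for 35142 it should reproduce the (key, priority) treap just shown.

```javascript
// priorities[j] is the priority of the node with key j + 1 (in-order position j).
// Returns a nested {key, priority, left, right} tree, or null for an empty range.
function treapFromPriorities(priorities, lo = 0, hi = priorities.length) {
  if (lo >= hi) return null;
  // The largest priority in [lo, hi) becomes the root of this subtree.
  let top = lo;
  for (let j = lo + 1; j < hi; j++) {
    if (priorities[j] > priorities[top]) top = j;
  }
  return {
    key: top + 1, // keys are the in-order positions 1..N
    priority: priorities[top],
    left: treapFromPriorities(priorities, lo, top), // elements to its left
    right: treapFromPriorities(priorities, top + 1, hi), // elements to its right
  };
}

const t = treapFromPriorities([3, 5, 1, 4, 2]);
console.log(t.key, t.priority); // 2 5
```

The linear scan for the maximum makes this O(N^2) in the worst case; a stack-based construction runs in O(N), but the simple version mirrors the description step by step.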
For a more formal proof:
If $H_N$ is the number of heaps on 1...N, then we have that
$$H_N = \sum_{L=0}^{N-1} \binom{N-1}{L} H_L \, H_{N-1-L}$$
(basically we pick the max and assign it to the root, choose the size L of the left subtree, choose which L of the remaining N-1 elements go there, and recurse on the left and right).
Now,
$H_0 = 1$
$H_1 = 1$
$H_2 = 2$
$H_3 = 6$
If $H_n = n!$ for $0 \le n \le K$, then
$$H_{K+1} = \sum_{L=0}^{K} L!\,(K-L)!\,\frac{K!}{L!\,(K-L)!} = \sum_{L=0}^{K} K! = (K+1)\,K! = (K+1)!$$
which completes the induction.
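The recurrence can also be checked numerically. A small JavaScript sketch (all function names are hypothetical) evaluates H_N directly from the sum over left-subtree sizes and compares it with N!.

```javascript
// Binomial coefficient via the multiplicative formula; every intermediate
// value is itself a binomial coefficient, so the division is exact.
function binomial(n, k) {
  let result = 1;
  for (let j = 1; j <= k; j++) result = (result * (n - k + j)) / j;
  return result;
}

function factorial(n) {
  let f = 1;
  for (let j = 2; j <= n; j++) f *= j;
  return f;
}

// H[N] = sum over left-subtree sizes L of C(N-1, L) * H[L] * H[N-1-L]
function heapCount(maxN) {
  const H = [1];
  for (let N = 1; N <= maxN; N++) {
    let sum = 0;
    for (let L = 0; L <= N - 1; L++) {
      sum += binomial(N - 1, L) * H[L] * H[N - 1 - L];
    }
    H.push(sum);
  }
  return H;
}

console.log(heapCount(6).join(" ")); // 1 1 2 6 24 120 720
```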

def countBST(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _ => (1L to numKeys).map { i => countBST(i - 1) * countBST(numKeys - i) }.sum
}
You didn't actually define structural similarity for treaps -- you just gave examples. I'm going to assume the following definition: two trees are structurally different if and only if they have a different shape, or there exist nodes a (from tree A) and b (from tree B) such that a and b are in the same position, and the priorities of the children of a are in the opposite order of the priorities of the children of b. (It's obvious that if two treaps on the same values have the same shape, then the values in corresponding nodes are the same.)
In other words, if we visualize two trees by just giving the priorities on the nodes, the following two trees are structurally similar:
       7                 7
     6   5             6   5
    4 3 2 1           2 1 4 3     <--- does not change the relative order
                                       of the children of any node
6's left child is still greater than 6's right child
5's left child is still greater than 5's right child
but the following two trees are structurally different:
       7                 7
     5   6             6   5      <--- changes the relative order of the children
    4 3 2 1           4 3 2 1          of node 7
Thus for the treap problem, each node with two children has 2 possible orderings of its children's priorities, and these two orderings do not otherwise affect the shape of the tree. So...
def countTreap(numKeys: Long): Long = numKeys match {
  case 0L => 1L
  case 1L => 1L
  case _ =>
    // 2 situations when the root has only 1 child
    2 * countTreap(numKeys - 1) +
    // and for each situation where the root has 2 children, this node
    // contributes 2 orderings of the priorities of its children
    // (independent of the treaps counted recursively below this level)
    2 * (2L to (numKeys - 1)).map { i => countTreap(i - 1) * countTreap(numKeys - i) }.sum
}
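Under that definition (a shape plus, at every node with two children, a choice of which child has the higher priority), the count can be cross-checked by brute force. The sketch below is JavaScript rather than Scala, and all its names are hypothetical: it enumerates every shape and weights it by 2^k, where k is the number of nodes with two children. For n = 3 it gives 6, matching the question.

```javascript
// All binary tree shapes with n nodes, as nested [left, right] pairs
// (null stands for an empty subtree).
function shapes(n) {
  if (n === 0) return [null];
  const out = [];
  for (let left = 0; left < n; left++) {
    for (const l of shapes(left)) {
      for (const r of shapes(n - 1 - left)) out.push([l, r]);
    }
  }
  return out;
}

// Number of nodes that have both a left and a right child.
function twoChildNodes(t) {
  if (t === null) return 0;
  const [l, r] = t;
  return (l !== null && r !== null ? 1 : 0) + twoChildNodes(l) + twoChildNodes(r);
}

// Each two-child node contributes an independent choice of which child
// has the higher priority, so a shape with k such nodes counts 2^k times.
function countTreapsBruteForce(n) {
  return shapes(n).reduce((acc, t) => acc + 2 ** twoChildNodes(t), 0);
}

console.log([1, 2, 3, 4, 5].map((n) => countTreapsBruteForce(n)).join(" ")); // 1 2 6 20 72
```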


Data Structures and Algorithms in C++ 2nd Ed - Goodrich. Page 295 question on vector-based structure binary tree worst case for space 2^n - 1

Let me explain as best as I can. This is about representing a binary tree with a vector.
According to author, the implementation is as follows:
A simple structure for representing a binary tree T is based on a way of numbering the nodes of T. For every node v of T, let f(v) be the integer defined as follows:
• If v is the root of T, then f(v) = 1
• If v is the left child of node u, then f(v) = 2 f(u)
• If v is the right child of node u, then f(v) = 2 f(u) + 1
The numbering function f is known as a level numbering of the nodes in a binary tree T, because it numbers the nodes on each level of T in increasing order from left to right, although it may skip some numbers (see figures below).
Let n be the number of nodes of T, and let fM be the maximum value of f(v) over all the nodes of T. The vector S has size N = fM + 1, since the element of S at index 0 is not associated with any node of T. Also, S will have, in general, a number of empty elements that do not refer to existing nodes of T. For a tree of height h, N = O(2^h). In the worst case, this can be as high as 2^n − 1.
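To make the level numbering concrete, here is a minimal JavaScript sketch (the node shape and the names `node` and `toLevelVector` are assumptions for illustration). It follows the book's definition N = fM + 1, so the vector only has to reach the highest-numbered node; for the chain 1 → 2 → 5 → 11 that is 12 slots.

```javascript
// Hypothetical minimal node type for illustration.
const node = (value, left = null, right = null) => ({ value, left, right });

// Store a tree in an array by level numbering: f(root) = 1,
// f(left child of u) = 2 f(u), f(right child of u) = 2 f(u) + 1.
// The array has N = fM + 1 slots; index 0 and unused slots hold null.
function toLevelVector(root) {
  const placed = []; // [f(v), value] pairs collected during the traversal
  let fMax = 0;
  function visit(t, f) {
    if (t === null) return;
    placed.push([f, t.value]);
    fMax = Math.max(fMax, f);
    visit(t.left, 2 * f);
    visit(t.right, 2 * f + 1);
  }
  visit(root, 1);
  const vector = new Array(fMax + 1).fill(null);
  for (const [f, value] of placed) vector[f] = value;
  return vector;
}

// The chain 1 -> 2 -> 5 -> 11 (labels equal to level numbers).
const example = node(1, node(2, null, node(5, null, node(11))));
console.log(toLevelVector(example).length); // 12
```

Since f doubles with every level, a long chain makes the array length grow exponentially in the number of nodes, which is exactly the worst case under discussion.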
Question:
The last statement, that the worst case is 2^n − 1, does not seem right. Here n is the number of nodes. I think he meant 2^h − 1 instead of 2^n − 1. Using figure a) as an example (15 nodes), 2^n − 1 would be 2^15 − 1 = 32768 − 1 = 32767, which does not make sense.
Any insight is appreciated.
Thanks.
The worst case is when the tree degenerates into a chain from the root, where each node has two children, but at least one of them is always a leaf. When this chain has n nodes, the height of the tree is (n − 1)/2, i.e. about n/2. The vector must span all the levels and allocate room for full levels, even though this degenerate tree has only two nodes per level (one on the chain and one leaf). The size S of the vector is still O(2^h), but since in this degenerate case h is O(n/2) = O(n), this makes it O(2^n) in the worst case.
The formula 2^n − 1 seems to suggest the author does not have a proper binary tree in mind; in that case the above reasoning should be done with a degenerate tree that consists of a single chain where every node has at most one child.
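A short derivation of where the book's 2^n − 1 comes from, assuming (as a guess at what the author means) that the worst case is a chain of n nodes in which every node is the right child of its parent, with v_1 the root and v_k the node at depth k − 1:

```latex
f(v_1) = 1, \qquad f(v_{k+1}) = 2 f(v_k) + 1
\quad\Longrightarrow\quad f(v_k) = 2^k - 1
\qquad \text{(induction: } 2(2^k - 1) + 1 = 2^{k+1} - 1\text{)}

f_M = f(v_n) = 2^n - 1, \qquad N = f_M + 1 = 2^n
```

So 2^n − 1 is exactly the largest level number in such a chain; the vector itself needs one more slot for the unused index 0.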
Example of worst case
Here is an example tree (not a proper tree, but the principle for proper trees is similar):
      1
     /
    2
     \
      5
       \
        11
So n = 4, and h = 3.
The vector however needs to store all the slots where nodes could have been, so something like this:
                 1
           /           \
         2               _
       /   \           /   \
      _     5         _     _
     / \   / \       / \   / \
    _   _ _   11    _   _ _   _
...so the vector has a size of 1+2+4+8 = 15. (Even 16 when we account for the unused slot 0 in the vector)
This illustrates that the size S of the vector is always O(2^h). In this worst case (worst with respect to n, not with respect to h), S is O(2^n).
Example n=6
When n=6, we could have this as a best case:
      1
     / \
    2   3
   / \   \
  4   5   7
This tree can be represented by a vector of size 8, where the entries at index 0 and index 6 are filled with nulls (unused).
However, for n=6 we could have a worst case ("worst" for the impact on the vector size) when the tree is very unbalanced:
1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          7
Now the tree's height is 5 instead of 2, and the vector needs to put the last node (still labelled 7, although its level number is now 63) in the slot at index 63, so S is 64. Remember that the vector spans each complete binary level, which doubles in size at each next level.
So when n is 6 and the vector is allocated level by level, S can be 8, 16, 32, or 64. It depends on the shape of the tree. In each case we have that S = O(2^h). But when we express S in terms of n, there is variation: the best case is S = O(n), while the worst case is S = O(2^n).

How many different rooted unlabelled binary trees have exactly 9 nodes and are left-heavy?

I know how many trees are possible using nth Catalan number but don't know how to find left-heavy trees. Is there any technique?
The answer is: 2357
I provide here a reasoned approach (no programming involved) and code to produce the same result, but via a more brute-force method.
A. Reasoning
Intuitively, it seems easier to count by exclusion. So that leads to this approach:
Count all the binary trees with 9 nodes. As you already indicated, this corresponds to the 9th Catalan number. This is C9 = 4862.
Subtract the number of trees whose roots are balanced, i.e. where the two subtrees of the root have equal heights (let's call those subtrees L and R). That gives us the number of trees that are either left- or right-heavy.
As there are just as many left- as right-heavy trees, divide this result by two to get the final result.
So now we can focus on calculating the number mentioned in the second bullet point:
Counting trees that are strictly balanced at the root
A tree of height 2 would have at most 7 nodes (when it is full), so the height needs to be at least 3. A tree of height 5 that is balanced at the root needs at least 5 nodes in L (on a single path) and 5 in R (also on a single path), which together with the root makes 11 > 9 nodes, so the height cannot be more than 4. We thus have only two possibilities: the height of a 9-node binary tree that is balanced at the root is either 3 or 4.
Let's deal with these two cases separately:
1. When the height of the tree is 3
In this case we have a tree with 4 levels. Let's analyse each level:
There are 3 nodes in the first two levels: the root and its two children.
The third level is the first level where there can be some variation: that level has between 2 and 4 nodes. Let's deal with those three cases one by one:
1.a Third level has 2 nodes
Here the third level has one node in L and one in R. Each can either be a left or right child of its parent. So there are two possibilities at either side: 2x2 = 4 possibilities.
There is no variation possible in the fourth level: the four remaining nodes are children of the two nodes in the third level.
Possibilities: 4
1.b Third level has 3 nodes
There are 4 ways to select three positions from the four available positions in the third level. Either L or R gets only one node. Let's call this node x.
In the fourth level we need to distribute the three remaining nodes, such that L and R get at least one of those. This is achieved when x gets either one or two children.
When x gets two children, the other remaining node has 4 possible positions: the child slots below the two third-level nodes on the other side.
When x gets one child, it can be either a left or right child, and the other two remaining nodes can occupy those same four positions in 6 ways: 2x6=12.
So given a choice in the third level, there are 4+12=16 possible configurations for the fourth level.
Combining this with the possibilities in the third level, we get 4x16:
Possibilities: 64
1.c Third level has 4 nodes
The third level is thus full. The two remaining nodes on the fourth level need to be split between L and R, and so each has 4 possible positions. This gives 4x4 = 16 possibilities in total.
Possibilities: 16
2. When the height of the tree is 4
When the height is 4, then by consequence L and R each have only one leaf: they are chains of 4 nodes each. This is the only way to make the root strictly balanced and get a height of 4.
There is no choice for the root node of L (it is the left child of the root), but from there on, each next descendant in L can be either a left or right child of its parent. The shape of L thus has 2^3 = 8 possibilities. Considering the same for R, we have a total of 8x8 = 64 shapes.
Possibilities: 64
Total
Taking all of the above together, we have 4+64+16 + 64 = 148 possible shapes that give a tree with a balanced root.
So applying the approach set out at the top, the total number of left-heavy binary trees with 9 unlabelled nodes is (4862-148)/2 = 2357
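The exclusion argument can be checked mechanically as well. This JavaScript sketch (names are hypothetical; the empty tree has height -1 and a single node height 0) counts trees by size and height with memoization, counts the trees that are balanced at the root, and applies (total - balanced) / 2.

```javascript
// Number of binary trees with n nodes and height <= h.
const memo = new Map();
function upTo(n, h) {
  if (n === 0) return h >= -1 ? 1 : 0;
  if (h < 0) return 0;
  const key = n + "," + h;
  if (memo.has(key)) return memo.get(key);
  let count = 0;
  for (let i = 0; i < n; i++) count += upTo(i, h - 1) * upTo(n - 1 - i, h - 1);
  memo.set(key, count);
  return count;
}

// Number of binary trees with n nodes and height exactly h.
const exactly = (n, h) => upTo(n, h) - upTo(n, h - 1);

// Trees whose root has two subtrees (L and R) of equal height.
function balancedAtRoot(n) {
  let count = 0;
  for (let i = 0; i < n; i++) { // i nodes in L, n - 1 - i in R
    for (let h = -1; h < n; h++) count += exactly(i, h) * exactly(n - 1 - i, h);
  }
  return count;
}

// Exclusion: (all trees - balanced ones) / 2, since left- and right-heavy
// trees are mirror images of each other.
function leftHeavyByExclusion(n) {
  const total = upTo(n, n - 1); // every n-node tree has height <= n - 1
  return (total - balancedAtRoot(n)) / 2;
}

console.log(balancedAtRoot(9), leftHeavyByExclusion(9)); // 148 2357
```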
B. Code
To make this a programming challenge, here is an implementation in JavaScript that defines the following functions:
countTreesUpToHeight(n, height): count all binary trees with n nodes, that are not higher than the given height. Uses recursion.
countTreesWithHeight(n, height): count all binary trees with n nodes, that have exactly the given height. Uses the preceding function.
countLeftHeavy(n): the main function. Uses the other two functions to count all combinations where the root's left subtree is higher than the right one.
So this approach is not like the exclusion approach above. It actually counts the combinations of interest. The output is the same.
function countTreesUpToHeight(n, height) {
  // an empty tree has height -1, a single node has height 0
  if (n > 2**(height+1) - 1) return 0; // too many nodes to fit within height
  if (n < 2) return 1;
  let count = 0;
  for (let i = 0; i < n; i++) {
    count += countTreesUpToHeight(i, height-1)
           * countTreesUpToHeight(n-1-i, height-1);
  }
  return count;
}
function countTreesWithHeight(n, height) {
  return countTreesUpToHeight(n, height) - countTreesUpToHeight(n, height-1);
}
function countLeftHeavy(n) {
  let count = 0;
  // make choices for the height of the whole tree: the left subtree then
  // has height exactly height-1, the right subtree at most height-2
  for (let height = 0; height < n; height++) {
    // make choices for the number of nodes in the left subtree
    for (let i = 0; i < n; i++) {
      // multiply the number of combinations for the left subtree
      // with those for the right subtree
      count += countTreesWithHeight(i, height-1)
             * countTreesUpToHeight(n-1-i, height-2);
    }
  }
  return count;
}
let result = countLeftHeavy(9);
console.log(result); // 2357

Center node in a tree (in a minimum sum of distances sense)

My problem is the following:
Given a tree (V, E), find the center node v such that sum{w in V}[dist(v,w)] is minimum, where dist(v,w) is the number of edges in shortest path from v to w. The algorithm should run in O(n) time (n being the number of nodes in a tree).
The questions here and here also ask for the center node but define it differently.
I haven't rigorously gone through the steps but I actually think that the solution to my problem should be similar to the solution of this problem.
However, I decided that I should share my problem with the community as it took me a while to navigate to the link, which however does not answer the question directly.
I came up with this solution:
1) Choose an arbitrary node r as the root. For each subtree of the rooted tree, calculate the number of nodes in that subtree (the leaves are single-node trees).
As an example for this tree
        1
      / | \
     2  3  4
    / \     \
   5   6     7
      / \
     8   9
the result would be
        9
      / | \
     5  1  2
    / \     \
   1   3     1
      / \
     1   1
2) Calculate the sum of distances for this chosen root. For the example, if you choose vertex 1 as a root, the sum of distances is
0 + 1 + 1 + 1 + 2 + 2 + 2 + 3 + 3 = 15
3) Traverse the tree in a depth-first-search manner. For example, starting from vertex 1, we traverse to vertex 4. We observe that for 7 nodes (1,2,3,5,6,8,9) we are getting further by 1 (add 7 = 9-2 to the score), and for the other 2 (4,7) we are getting closer by 1 (subtract 2). This gives a sum of distances equal to 15+(9-2)-2 = 20.
Suppose we traverse from 4 to 7 next. Now we get the sum of distances equal to 20+(9-1)-1 = 27 (getting further from 8 vertices, and getting closer to 1 vertex).
As another example if we traverse from 1 to 2, we get a sum of distances equal to 15+(9-5)-5 = 14. Vertex 2 is actually the solution for this example.
This would be my algorithm.
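The three steps above can be sketched in JavaScript as follows. The adjacency-list input (nodes 0..n-1, node 0 as the arbitrary root) and the name `minDistanceSumNode` are assumptions for illustration. The traversal is iterative, so long paths do not overflow the call stack, and the whole thing runs in O(n).

```javascript
// Assumed input: adj[v] lists the neighbours of v in a tree on nodes 0..n-1.
function minDistanceSumNode(adj) {
  const n = adj.length;
  const parent = new Array(n).fill(-1);
  const seen = new Array(n).fill(false);
  const order = []; // preorder: every parent appears before its children
  const stack = [0];
  seen[0] = true;
  while (stack.length > 0) {
    const v = stack.pop();
    order.push(v);
    for (const w of adj[v]) {
      if (!seen[w]) {
        seen[w] = true;
        parent[w] = v;
        stack.push(w);
      }
    }
  }

  // Step 1: subtree sizes, accumulated from the children upwards.
  const size = new Array(n).fill(1);
  for (let k = n - 1; k > 0; k--) size[parent[order[k]]] += size[order[k]];

  // Step 2: sum of distances from the root = sum of all depths.
  const depth = new Array(n).fill(0);
  for (let k = 1; k < n; k++) depth[order[k]] = depth[parent[order[k]]] + 1;
  const sums = new Array(n).fill(0);
  sums[0] = depth.reduce((a, b) => a + b, 0);

  // Step 3: moving from parent p to child c, n - size[c] nodes get one
  // edge farther and size[c] nodes get one edge closer.
  for (let k = 1; k < n; k++) {
    const c = order[k];
    sums[c] = sums[parent[c]] + (n - size[c]) - size[c];
  }

  let best = 0;
  for (let v = 1; v < n; v++) if (sums[v] < sums[best]) best = v;
  return { best, sums };
}

// The example tree from the question, with vertex k stored as index k - 1.
const edges = [[0, 1], [0, 2], [0, 3], [1, 4], [1, 5], [3, 6], [5, 7], [5, 8]];
const adj = Array.from({ length: 9 }, () => []);
for (const [a, b] of edges) { adj[a].push(b); adj[b].push(a); }
const { best, sums } = minDistanceSumNode(adj);
console.log(best + 1, sums[best]); // 2 14
```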
Each edge e={a,b} has the following properties:
a_count = number of nodes on a's side of e (including a)
b_count = number of nodes on b's side of e (including b)
a_sum = sum of distances from a to the nodes on a's side
b_sum = sum of distances from b to the nodes on b's side
a_count for edge e={a,b} can be evaluated as follows:
* for every other edge f={a,c} of a (f ≠ e), take the node count on c's side of f, and sum these
* add 1 for a itself
a_sum for edge e={a,b} can be evaluated as follows:
* for every other edge f={a,c} of a, take the distance sum on c's side of f (measured from c), and sum these
* add a_count - 1, because each of the a_count - 1 nodes other than a is one edge farther from a than from the c it hangs under
You can do the calculation in a recursive function that takes a node and a direction (i.e. one directed edge), saving the results in a global array.
If you run this function on every edge of the tree in both directions, you get the full calculation for all edges. Total time for all calculations is O(n): each directed edge is computed only once, because once you enter some subtree, the recursion finishes the whole subtree in that direction, and later calls just read the result from the global array, so you only make about 2n calls to the function (two per edge).
For a node a, the final measure is the sum of b_count + b_sum over all edges {a, b} connected to it. Do one pass of this evaluation over all nodes and select the node with the minimal value.
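A minimal JavaScript sketch of this edge-based scheme (the adjacency-list input and all names are assumptions for illustration). `side(a, b)` memoizes (b_count, b_sum) for b's side of the edge {a, b}. Each directed edge is computed once, but each computation scans the other edges of b, so the work is proportional to the sum of squared degrees; for a star-shaped tree you would keep a per-node total to stay at O(n). The recursion also gets as deep as the longest path.

```javascript
// side(a, b) describes b's side of edge {a, b}: how many nodes it has
// (b_count) and the sum of their distances to b (b_sum).
function distanceSumsByEdges(adj) {
  const memo = new Map(); // "a>b" -> { count, sum }, filled once per directed edge

  function side(a, b) {
    const key = a + ">" + b;
    if (memo.has(key)) return memo.get(key);
    let count = 1; // b itself
    let sum = 0;
    for (const c of adj[b]) {
      if (c === a) continue; // every edge of b except {a, b}
      const s = side(b, c);
      count += s.count;
      sum += s.sum + s.count; // each node beyond c is one edge farther from b
    }
    const result = { count, sum };
    memo.set(key, result);
    return result;
  }

  // Final measure for node v: sum of b_count + b_sum over its edges {v, b}.
  return adj.map((neighbors, v) =>
    neighbors.reduce((total, b) => {
      const s = side(v, b);
      return total + s.count + s.sum;
    }, 0)
  );
}

// The example tree from the question, with vertex k stored as index k - 1.
const exampleEdges = [[0, 1], [0, 2], [0, 3], [1, 4], [1, 5], [3, 6], [5, 7], [5, 8]];
const exampleAdj = Array.from({ length: 9 }, () => []);
for (const [a, b] of exampleEdges) { exampleAdj[a].push(b); exampleAdj[b].push(a); }
const totals = distanceSumsByEdges(exampleAdj);
console.log(totals.indexOf(Math.min(...totals)) + 1); // 2
```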

Number of distinct smaller elements to the left of each position in an array

I stumbled upon the problem of finding the number of distinct elements to the left and less than the element for each position in array.
Example:
For the array 1 1 2 4 5 3 6 the answer would be 0 0 1 2 3 2 5
It's straightforward to solve the problem in O(n^2); I wish to know if the problem could be solved in O(n log n).
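Yes, you can just insert the elements into a balanced (red-black, AVL, whatever) binary search tree, storing the total subtree node count in each node. Process the array from left to right: for each element, first count the stored values smaller than it, then insert it only if it is not already in the tree, since only distinct values should be counted. Updates are O(log N), as you only update the counts along the path to the root, and counting is also O(log N): walking from the root to the element's position, every time you go right you add the node count of the left subtree plus 1 for the node itself.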
This is how a tree might look after inserting [0,1,2,3,5,6], the subtree nodecounts in parentheses.
        2(6)
       /    \
    1(2)    5(3)
    /       /   \
 0(1)    3(1)   6(1)
While inserting 6 (assuming it's last), you add:
2 (node count of left subtree of 2)
1 (the node with 2, because you take the right path, so root is smaller)
1 (the left subtree of 5)
1 (the node with 5, same reason, no left subtree to add)
Total 5. The tree is a bit too small to see the savings from keeping the totals, but note that you don't need to visit the 0 node, it's accounted for in its parent - the 1 node.
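The answer's structure is a size-augmented balanced BST. A shorter alternative, swapped in here and not the technique the answer describes, is a Fenwick (binary indexed) tree indexed by each value's rank among the sorted distinct values; it gives the same O(n log n) bound. The function name is hypothetical.

```javascript
// Swapped-in technique: a Fenwick tree over value ranks replaces the
// size-augmented balanced BST described in the answer.
function distinctSmallerToLeft(arr) {
  const values = [...new Set(arr)].sort((a, b) => a - b);
  const rank = new Map(values.map((v, i) => [v, i + 1])); // 1-based ranks
  const tree = new Array(values.length + 1).fill(0); // Fenwick array
  const seen = new Set();

  // Mark rank i as present.
  const add = (i) => {
    for (; i < tree.length; i += i & -i) tree[i] += 1;
  };
  // Number of present ranks in 1..i.
  const prefix = (i) => {
    let s = 0;
    for (; i > 0; i -= i & -i) s += tree[i];
    return s;
  };

  return arr.map((x) => {
    const r = rank.get(x);
    const answer = prefix(r - 1); // distinct values seen so far that are < x
    if (!seen.has(x)) { // count each distinct value only once
      seen.add(x);
      add(r);
    }
    return answer;
  });
}

console.log(distinctSmallerToLeft([1, 1, 2, 4, 5, 3, 6]).join(" ")); // 0 0 1 2 3 2 5
```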

Enumerate search trees

According to this question the number of different search trees of a certain size is equal to a Catalan number. Is it possible to enumerate those trees? That is, can someone implement the following two functions:
Node* id2tree(int id); // return root of tree
int tree2id(Node* root); // return id of tree
(I ask because the binary code for the tree (see one of the answers to this question) would be a very efficient code for representing arbitrarily large integers of unknown range, i.e., a variable-length code for integers:
0 -> 0
1 -> 100
2 -> 11000
3 -> 10100
4 -> 1110000
5 -> 1101000
6 -> 1100100
7 -> 1011000
8 -> 1010100
etc
Notice that the number of integers of each code length is 1, 1, 2, 5, ... (the Catalan sequence).)
It should be possible to convert the id to tree and back.
First consider the fact that given a bitstring, we can easily move to the tree (a recursive method) and vice versa (a preorder traversal, outputting 1 for each node and 0 for each empty subtree).
The main challenge comes from trying to map the id to the bitstring and vice versa.
Suppose we listed out the trees of n nodes as follows:
Left subtree n-1 nodes, right subtree 0 nodes ($C_{n-1} C_0$ of them)
Left subtree n-2 nodes, right subtree 1 node ($C_{n-2} C_1$ of them)
Left subtree n-3 nodes, right subtree 2 nodes ($C_{n-3} C_2$ of them)
...
...
Left subtree 0 nodes, right subtree n-1 nodes ($C_0 C_{n-1}$ of them)
$C_r$ = the r-th Catalan number.
The enumeration you have given seems to come from the following procedure: we keep the left subtree fixed, enumerate through the right subtrees. Then move onto the next left subtree, enumerate through the right subtrees, and so on. We start with the maximum size left subtree, then next one is max size -1, etc.
So say we have an id $S$. We first find the $n$ such that
$$C_0 + C_1 + \dots + C_n \le S < C_0 + C_1 + \dots + C_{n+1}$$
Then $S$ corresponds to a tree with $n+1$ nodes (ids are 0-based as in the list above, where id 0 is the empty tree).
So you now consider $P = S - (C_0 + C_1 + \dots + C_n)$, which is the 0-based position of the tree in the enumeration of the trees with $n+1$ nodes.
Now we figure out the $r$ such that
$$\sum_{j=0}^{r-1} C_{n-j} C_j \le P < \sum_{j=0}^{r} C_{n-j} C_j$$
This tells us that the left subtree has $n-r$ nodes and the right subtree has $r$ nodes.
Setting $Q = P - \sum_{j=0}^{r-1} C_{n-j} C_j$, the left subtree is at position $\lfloor Q / C_r \rfloor$ and the right subtree at position $Q \bmod C_r$ among the trees of their sizes (the right subtree varies fastest), and we can recursively form the bitstring.
Mapping the bitstring to the id should be similar, as we know what the left subtree and right subtrees look like, all we would need to do is find the corresponding positions and do some arithmetic to get the ID.
Not sure how helpful it is though. You will be working with some pretty huge numbers all the time.
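A JavaScript sketch of both directions for the ordering described above (larger left subtree first, right subtree varying fastest). All names are hypothetical, and `MAX_NODES = 20` is an assumption that keeps every id below Number.MAX_SAFE_INTEGER; for really huge ids you would switch to BigInt.

```javascript
// Catalan numbers C[0..MAX_NODES] from the convolution recurrence.
const MAX_NODES = 20; // assumption: keeps every id a safe integer
const C = [1];
for (let m = 1; m <= MAX_NODES; m++) {
  let s = 0;
  for (let j = 0; j < m; j++) s += C[j] * C[m - 1 - j];
  C.push(s);
}

// Bitstring (preorder, "1" = node, "0" = empty subtree) of the tree with
// `size` nodes at 0-based position `pos` among the trees of that size.
function treeAt(size, pos) {
  if (size === 0) return "0";
  const n = size - 1; // nodes shared by the two subtrees
  for (let r = 0; r <= n; r++) {
    const block = C[n - r] * C[r]; // left gets n - r nodes, right gets r
    if (pos < block) {
      return "1" + treeAt(n - r, Math.floor(pos / C[r])) + treeAt(r, pos % C[r]);
    }
    pos -= block;
  }
  throw new RangeError("position out of range");
}

function idToBits(id) {
  let size = 0;
  while (id >= C[size]) { // skip all smaller trees
    id -= C[size];
    size++;
    if (size > MAX_NODES) throw new RangeError("id too large for MAX_NODES");
  }
  return treeAt(size, id);
}

// Parse one subtree starting at bits[i]; returns its size, position and end index.
function parse(bits, i) {
  if (bits[i] === "0") return { size: 0, pos: 0, next: i + 1 };
  if (bits[i] !== "1") throw new SyntaxError("malformed tree code");
  const left = parse(bits, i + 1);
  const right = parse(bits, left.next);
  const n = left.size + right.size;
  if (n + 1 > MAX_NODES) throw new RangeError("tree too large for MAX_NODES");
  let pos = 0;
  for (let r = 0; r < right.size; r++) pos += C[n - r] * C[r]; // earlier blocks
  pos += left.pos * C[right.size] + right.pos;
  return { size: n + 1, pos, next: right.next };
}

function bitsToId(bits) {
  const { size, pos, next } = parse(bits, 0);
  if (next !== bits.length) throw new SyntaxError("trailing bits");
  let id = pos;
  for (let k = 0; k < size; k++) id += C[k];
  return id;
}

console.log(idToBits(6), bitsToId("1010100")); // 1100100 8
```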
For general (non-search) binary trees I can see how this would be possible, since when building up the tree there are three choices (the amount of children) for every node, only restricted by having the total reach exactly N. You could find a way to represent such a tree as a sequence of choices (by building up the tree in a specific order), and represent that sequence as a base-3 number (or perhaps a variable base would be more appropriate).
But for binary search trees, not every organisation of elements is acceptable. You have to obey the numeric ordering constraints as well. On the other hand, since insertion into a binary search tree is well-defined, you can represent an entire tree of N elements by having a list of N numbers in a specific insertion order. By permuting the numbers to be in a different order, you can generate a different tree.
Permutations are of course easily counted by using variable-base numbers: You have N choices for the first item, N-1 for the second, etc. That gives you a sequence of N numbers that you can encode as a number with base varying from N to 1. Encoding and decoding from variable-base to binary or decimal is trivially adapted from a normal fixed-base conversion algorithm. (The ones that use modulus and division operations).
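A minimal JavaScript sketch of that variable-base conversion (function names are hypothetical). Indices are expected in the range 0 <= index < n!; the digit of base n picks the first item, the digit of base n - 1 picks the second, and so on, which lists the permutations in lexicographic order.

```javascript
// Index -> permutation: write `index` in the variable base (1, 2, ..., n)
// using modulus and division, then let the most significant digit pick first.
function indexToPermutation(index, items) {
  const digits = [];
  for (let base = 1; base <= items.length; base++) {
    digits.push(index % base);
    index = Math.floor(index / base);
  }
  const pool = items.slice();
  const result = [];
  for (let k = digits.length - 1; k >= 0; k--) {
    result.push(pool.splice(digits[k], 1)[0]);
  }
  return result;
}

// Permutation -> index: the same digits read back with Horner's rule.
function permutationToIndex(perm, items) {
  const pool = items.slice();
  let index = 0;
  for (const x of perm) {
    const digit = pool.indexOf(x); // position among the items not used yet
    index = index * pool.length + digit;
    pool.splice(digit, 1);
  }
  return index;
}

console.log(indexToPermutation(3, ["a", "b", "c"]).join("")); // bca
```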
So you can convert a number to and from a permutation, and given a list of numbers you can convert a permutation (of that list) from and to a binary search tree. Now I think that you could get all the possible binary search trees of size N by permuting just the integers 1 to N, but I'm not entirely sure, and attempting to prove that is a bit too much for this post.
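For small N the guess can at least be checked by brute force. This JavaScript sketch (names are hypothetical) inserts every ordering of 1..N into a plain unbalanced BST and counts the distinct shapes, encoded with the same preorder bitstring as in the question. The counts come out as the Catalan numbers, which fits the fact that inserting the keys of any BST in preorder rebuilds exactly that tree.

```javascript
// Insert keys in the given order into an unbalanced BST and return the
// shape as a preorder bitstring: "1" for a node, "0" for an empty subtree.
function shapeOfInsertionOrder(order) {
  let root = null;
  for (const key of order) {
    const fresh = { key, left: null, right: null };
    if (root === null) { root = fresh; continue; }
    let cur = root;
    while (true) {
      const side = key < cur.key ? "left" : "right";
      if (cur[side] === null) { cur[side] = fresh; break; }
      cur = cur[side];
    }
  }
  const encode = (t) => (t === null ? "0" : "1" + encode(t.left) + encode(t.right));
  return encode(root);
}

// All orderings of `items`, generated recursively.
function* permutations(items) {
  if (items.length === 0) { yield []; return; }
  for (let i = 0; i < items.length; i++) {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const p of permutations(rest)) yield [items[i], ...p];
  }
}

// Number of distinct BST shapes reachable from all orderings of 1..n.
function distinctShapes(n) {
  const keys = Array.from({ length: n }, (_, i) => i + 1);
  const seen = new Set();
  for (const p of permutations(keys)) seen.add(shapeOfInsertionOrder(p));
  return seen.size;
}

console.log([1, 2, 3, 4, 5, 6].map((n) => distinctShapes(n)).join(" ")); // 1 2 5 14 42 132
```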
I hope this is a good starting point for a discussion.
