Number of binary search trees over n distinct elements - algorithm

How many binary search trees can be constructed from n distinct elements? And how can we find a mathematically proved formula for it?
Example:
If we have 3 distinct elements, say 1, 2, 3, there
are 5 binary search trees.

Given n elements, the number of binary search trees that can be made from those elements is given by the nth Catalan number (denoted Cn). This is equal to
Intuitively, the Catalan numbers represent the number of ways that you can create a structure out of n elements that is made in the following way:
Order the elements as 1, 2, 3, ..., n.
Pick one of those elements to use as a pivot element. This splits the remaining elements into two groups - those that come before the element and those that come after.
Recursively build structures out of those two groups.
Combine those two structures together with the one element you removed to get the final structure.
This pattern perfectly matches the ways in which you can build a BST from a set of n elements. Pick one element to use as the root of the tree. All smaller elements must go to the left, and all larger elements must go to the right. From there, you can then build smaller BSTs out of the elements to the left and the right, then fuse them together with the root node to form an overall BST. The number of ways you can do this with n elements is given by Cn, and therefore the number of possible BSTs is given by the nth Catalan number.
Hope this helps!

I am sure this question is not just to count using a mathematical formula.. I took out some time and wrote the program and the explanation or idea behind the calculation for the same.
I tried solving it with recursion and dynamic programming both. Hope this helps.
The formula is already present in the previous answer:
So if you are interested in learning the solution and understanding the apporach you can always check my article Count Binary Search Trees created from N unique elements

Let T(n) be the number of bsts of n elements.
Given n distinct ordered elements, numbered 1 to n, we select i as the root.
This leaves (1..i-1) in the left subtree for T(i-1) combinations and (i+1..n) in the right subtree for T(n-i) combinations.
Therefore:
T(n) = sum_i=1^n(T(i-1) * T(n-i))
and of course T(1) = 1

Related

Calculating the number of 2-3 trees if the number of nodes is given

I'm trying to find out to get count of available trees' number if the number of data are given..
ex) If there are 8 different data, How many tree can be made?
The question is not totally clear. I'll assume you want the number of full binary (resp. ternary) trees with exactly n leaves (if you're rather looking for n nodes in total, you'll be able to find it from my result; if you want the number of trees that can store a given set of n different data, i.e. take into account all the ways to store your data in such a tree, then you should also be able to get it easily).
Let's consider the case of full binary trees. If you want n leaves in it, you can call k the number of leaves in the left subtree, you'll have n-k on the right. These subtrees have N(k) and N(n-k) possibilities respectively,
Then you have N(n)=sum(N(k)*N(n-k), k=1..n-1). With N(1)=1.
This is the (a?) definition of Catalan Numbers: https://en.wikipedia.org/wiki/Catalan_number
N(n)=C(n-1)
You can do something similar for ternary trees

Huffman Tree with Max-Height, Nice Questions?

I ran into a nice question in one Solution of Homework in DS course.
which of the following (for large n) create the most height for Huffman Tree. the elements of each sequence in following option shows the frequencies of character in input text and not shown the characters.
1) sequence of n equal numbers
2) sequence of n consecutive Fibonacci numbers.
3) sequence <1,2,3,...,n>
4) sequence <1^2,2^2,3^2,...,n^2>
Anyone could say, why this solution select (2)? thanks to anyone.
Let's analyze the various options here.
A sequence of N equal numbers means a balanced tree will be created with the actual symbols at the bottom leaf nodes.
A sequence 1-N has the property that as you start grouping the two lowest element their sum will quickly rise above other elements, here's an example:
As you can see, the groups from 4+5 and 7+8 did not by themselves contribute to the height of the tree.
After grouping the two 3-nodes into a 6, nodes 4 and 5 are the next in line, which means that each new group formed won't contribute to its height. Most will, but not all, and that's the important fact.
A sequence using squares (note: squares as in the third sequence in the question, 1^2, 2^2, 3^2, 4^2, ..., N^2, not square diagram elements) has somewhat the same behavior as a sequence of 1-N, some of the time other elements than the one that was just formed will be used, which cuts down on the height:
As you can see here, the same happened to 36+49, it did not contribute to the height of the tree.
However, the fibonacci sequence is different. As you group the two lowest nodes, their sum will at most topple the next item but not more than one of them, which means that each new group being formed will be used in the next as well, so that each new group formed will contribute to the height of the tree. This is different from the other 3 examples.

Minimum number of swaps to sort a list of triangles

Let L = list of S's where S = list of lengths of sides of a triangle.
Then, I need to find the minimum number of swaps required to sort the list L.
The list L is said to be sorted:
if the elements within each S list are sorted in non-decreasing order and
if the elements at ith index of each S list are sorted in non-decreasing order.
where 0 <= i <= 2
Note: Two types of swap operations can be done :
Either elements withing a S list can be swapped (requires 1 swap)
Two complete S lists can be swapped without changing the order of elements.
(requires 3 swaps)
Any efficient algorithm in terms of Time Complexity to find the minimum number of swaps required to sort the list L whenever possible?
EDIT:
As pointed out correctly by #Mbo, it is not always possible to sort such a list L. So, it would be great if someone provides an algorithm to check if the list L can be sorted followed by sorting if possible.
Maybe I misunderstood, but it seems to me that you have to sort the 3 elements in each S. Once this is done, it is simply a matter of sorting the triplets in order of the list {S0[0],S1[0]...} and then checking whether the sequences {S0[1], S1[1].....}, {S0[2], S1[2]..} are in increasing order. If you have n triangles the asymptotic worst case complexity would be O(n)

Finding closest number in a range

I thought a problem which is as follows:
We have an array A of integers of size n, and we have test cases t and in every test cases we are given a number m and a range [s,e] i.e. we are given s and e and we have to find the closest number of m in the range of that array(A[s]-A[e]).
You may assume array indexed are from 1 to n.
For example:
A = {5, 12, 9, 18, 19}
m = 13
s = 4 and e = 5
So the answer should be 18.
Constraints:
n<=10^5
t<=n
All I can thought is an O(n) solution for every test case, and I think a better solution exists.
This is a rough sketch:
Create a segment tree from the data. At each node, besides the usual data like left and right indices, you also store the numbers found in the sub-tree rooted at that node, stored in sorted order. You can achieve this when you construct the segment tree in bottom-up order. In the node just above the leaf, you store the two leaf values in sorted order. In an intermediate node, you keep the numbers in the left child, and right child, which you can merge together using standard merging. There are O(n) nodes in the tree, and keeping this data should take overall O(nlog(n)).
Once you have this tree, for every query, walk down the path till you reach the appropriate node(s) in the given range ([s, e]). As the tutorial shows, one or more different nodes would combine to form the given range. As the tree depth is O(log(n)), that is the time per query to reach these nodes. Each query should be O(log(n)). For all the nodes which lie completely inside the range, find the closest number using binary search in the sorted array stored in those nodes. Again, O(log(n)). Find the closest among all these, and that is the answer. Thus, you can answer each query in O(log(n)) time.
The tutorial I link to contains other data structures, such as sparse table, which are easier to implement, and should give O(sqrt(n)) per query. But I haven't thought much about this.
sort the array and do binary search . complexity : o(nlogn + logn *t )
I'm fairly sure no faster solution exists. A slight variation of your problem is:
There is no array A, but each test case contains an unsorted array of numbers to search. (The array slice of A from s to e).
In that case, there is clearly no better way than a linear search for each test case.
Now, in what way is your original problem more specific than the variation above? The only added information is that all the slices come from the same array. I don't think that this additional constraint can be used for an algorithmic speedup.
EDIT: I stand corrected. The segment tree data structure should work.

minimum number of comparisons needed

what is the minimum number of comparisons needed to find the largest element from 4 distinct elements? I know for 5 distinct numbers it is 6, floor(5/2) * 3; this is from clrs book. but I know there is no one general formula for finding this, or is there?
edit clarification
these 4 elements could be in any different order(for all permutations of these 4 elements) im not interested in a counting technique to keep track of the largest element as you traverse the elements, but comparisons like > or <.
for 4 elements the min. number of comparisons is 3.
In general, to find largest of N elements you need N-1 comparisons. This gives you 4 for 5 numbers, not 6.
Proof:
there is always a solution with N-1 comparisons: just compare first two and then select the larger and compare with next one, select the larger and compare with next one etc....
there cannot be shorter solution because this solution would not compare all the elements.
QED.
I know it does not answer the original question, but I enjoyed reading this not-so-intuitive post on the minimum number of comparisons needed to find the smallest AND the largest number from an unsorted array (with proof).
Think of it as a competition. By comparing two elements you have a looser and a winner.
So if you have n elements and need 1 final winner you need n-1 comparisons to rule out the other ones.
for elements a,b,c,d
if a>b+c+d, then it only required one comparison to know that a is the biggest.
You do have to get lucky though.

Resources