Compute the number of binary trees with i nodes

Let Bi be the number of binary trees with i nodes. Compute B10.
This is a problem I've come upon.
I've been able to come up with these so far:
B0=1
B1=1
B2=2
B3=5
B4=12
It quickly gets a bit too much as i gets bigger.
Can anyone think of a better way to compute Bi than just drawing out the trees and counting them?

I typed your sequence into OEIS and it came up with a few results.
A promising result is A000669 - the number of series-reduced planted trees with n leaves. The following example is provided: a(4)=5 with the following series-reduced planted trees: (oooo), (oo(oo)), (o(ooo)), (o(o(oo))), ((oo)(oo)). That said, our trees are not necessarily planted.
However, after a bit of work, I must inform you that your value for B4 is incorrect - the correct answer is 14. Then the answer is clear: the Catalan numbers. The Catalan numbers count a strange and varied collection of things, including the problem you've presented here (via Wolfram). It is worth noting Catalan number identity (8) here - the recurrence that defines the Catalan numbers, B(n+1) = B(0)B(n) + B(1)B(n-1) + ... + B(n)B(0). This summation can be thought of as deciding how many nodes will be to the left of the root (the rest will be to the right).
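To sanity-check the values, here is a minimal Python sketch of that recurrence (the function name is mine, not part of the original question):

def num_binary_trees(n):
    # b[m] counts binary trees with m nodes; for a tree with m nodes,
    # choose how many of the remaining m-1 nodes go into the left subtree.
    b = [1] * (n + 1)
    for m in range(1, n + 1):
        b[m] = sum(b[left] * b[m - 1 - left] for left in range(m))
    return b[n]

print(num_binary_trees(4))   # 14
print(num_binary_trees(10))  # 16796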
An easier way to conceptualize this is using Dyck words. Let X mean 'left parenthesis' and Y mean '0'. (I am using a list representation for trees - nodes to the left are lists on the left of an element and vice versa; if a node has no left or right lists it is considered a leaf.) We will put in right parentheses where appropriate. Then our trees for B3 are as follows:
(((0)0)0) => X X X Y Y Y
((0)0(0)) => X X Y Y X Y
(0(0(0))) => X Y X Y X Y
((0(0))0) => X X Y X Y Y
(0((0)0)) => X Y X X Y Y
From Wikipedia, the five 2n-length Dyck words of this form are XXXYYY, XYXXYY, XYXYXY, XXYYXY, and XXYXYY. And finally, the closed form
Bn = (1 / (n + 1)) * (2n choose n) = (2n)! / ((n + 1)! n!)
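A minimal Python check of the closed form (math.comb is in the standard library from Python 3.8 on):

import math

def catalan(n):
    # Closed form: C(2n, n) / (n + 1); the division is always exact.
    return math.comb(2 * n, n) // (n + 1)

print(catalan(10))  # 16796, so B10 = 16796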

Related

How to find number of steps to transform (a,b) to (x,y)

Given 2 numbers a=1 and b=1.
At each step, you can do one of the following:
a+=b;
b+=a;
If it is possible to transform a into x and b into y, find the minimum number of steps needed.
x and y can be arbitrarily large (more than 10^15)
My approach so far was just to do a recursive backtrack which will be around O(2^min(x,y)) in complexity (too large). DP won't do either since the states can be more than 10^15.
Any idea? Is there any number theory that is needed to solve this?
P.s. This is not a homework.
Given that you reached some (x,y), the only way to get there is if you added the smaller value into what is now the larger value. Say x > y; then the only possible previous state is (x-y, y).
Also note that the number of steps to get to x,y is the same to get to y,x.
So the solution you are looking for is something like
steps(x, y):
    if x < y: return steps(y, x)
    if y == 1: return x - 1
    if y == 0: throw error  # You can't get this combination.
    return x // y + steps(y, x % y)
In other words, find the depth of a node in the Calkin–Wilf tree. The node exists iff gcd(a, b) = 1. You can modify the gcd algorithm to give the number of operations as a byproduct (sum all of the quotients computed along the way and subtract one).
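A minimal Python sketch of that modified gcd computation, assuming x, y >= 1 and applying the "subtract one" mentioned above (the function name is mine):

from math import gcd

def steps(x, y):
    # Sum the quotients produced by the Euclidean algorithm, then subtract one.
    if gcd(x, y) != 1:
        raise ValueError("(x, y) is not reachable from (1, 1)")
    total = 0
    while y:
        total += x // y
        x, y = y, x % y
    return total - 1

print(steps(5, 3))  # 3: (1,1) -> (2,1) -> (2,3) -> (5,3)
print(steps(1, 1))  # 0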

How to prove the correctness of the algorithm for "Arrange given numbers to form the biggest number"?

Arrange given numbers to form the biggest number gives the algorithm.
It uses the following text to prove the correctness of the algorithm:
So how do we go about it? The idea is to use any comparison based sorting algorithm. In the used sorting algorithm, instead of using the default comparison, write a comparison function myCompare() and use it to sort numbers. Given two numbers X and Y, how should myCompare() decide which number to put first – we compare two numbers XY (Y appended at the end of X) and YX (X appended at the end of Y). If XY is larger, then X should come before Y in output, else Y should come before. For example, let X and Y be 542 and 60. To compare X and Y, we compare 54260 and 60542. Since 60542 is greater than 54260, we put Y first.
Consider three numbers: X, Y and Z. Use X -> Y to indicate that X should come before Y. A comparison-based algorithm can use the following two comparisons to sort X, Y and Z into XYZ: XY >= YX => X -> Y and YZ >= ZY => Y -> Z. But these two comparisons do not necessarily ensure that XYZ is the largest number. In other words, the fact that X should come before Y and Y should come before Z does not necessarily ensure that XYZ forms the largest number. Take YZX as an example. To prove XYZ >= YZX, we need to prove that X(YZ) >= (YZ)X, which means that X should come before YZ as a whole to form a bigger number.
Can anyone give a formal proof of the correctness of the algorithm?
First we will prove that if X "<" Y and Y "<" Z then X "<" Z. Assuming that they have p, q and r digits respectively, the first two relations reduce to
X * 10^q + Y ≥ Y * 10^p + X ⇒ X * (10^q - 1) ≥ Y * (10^p - 1)
Y * 10^r + Z ≥ Z * 10^q + Y ⇒ Y * (10^r - 1) ≥ Z * (10^q - 1)
We want to prove
X * 10^r + Z ≥ Z * 10^p + X which is equivalent to X * (10^r - 1) ≥ Z * (10^p - 1)
But this can be proved simply by multiplying the first two inequalities together (all quantities are nonnegative) and cancelling the common positive factor Y * (10^q - 1) from both sides.
Now that we have shown that the relation is transitive (and thus can be used to define a sort order), it is easy to show that it works to solve the problem.
Suppose the numbers given are A, B, C … such that A "<" B "<" C "<" D…. We will show that A has to come first in the final number. If not, we have a string like (some prefix)XA(some suffix) as the final number. Easily, (some prefix)AX(some suffix) is a larger number because A "<" X for all X due to transitivity. Continuing in this fashion A bubbles to the left till it becomes the first element.
Now that we have fixed the first element, the same argument can be applied to B and so on to show that the best solution is ABCD…
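A minimal Python sketch of the comparator-based sort the proof is about (functools.cmp_to_key adapts the pairwise comparison; the function names are mine):

from functools import cmp_to_key

def largest_number(nums):
    strs = [str(n) for n in nums]

    def compare(x, y):
        # Put x before y if the concatenation xy is at least yx.
        if x + y > y + x:
            return -1
        if x + y < y + x:
            return 1
        return 0

    strs.sort(key=cmp_to_key(compare))
    return ''.join(strs)

print(largest_number([542, 60]))  # '60542'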

Data structure to hold and retrieve points in a plane

Definition 1: Point (x,y) is controlling point (x',y') if and only if x < x' and y < y'.
Definition 2: Point (x,y) is controlled by point (x',y') if and only if x' < x and y' < y.
I'm trying to come up with data structure to support the following operations:
Add(x,y) - Adds a point (x,y) to the system in O(logn) complexity, where n is the number of points in the system.
Remove(x,y) - Removes a point (x,y) from the system in O(logn) complexity, where n is the number of points in the system.
Score(x,y) - Returns the number of points (x,y) controls minus the number of points that (x,y) is controlled by. Worst case complexity O(logn).
I've tried to solve it using multiple AVL trees, but could not come up with elegant enough solution.
Point (x,y) is controlling point (x',y') if and only if x < x' and y < y'.
Point (x,y) is controlled by point (x',y') if and only if x' < x and y' < y.
Let's assume that (x,y) is the middle of a square divided into four quadrants: A (upper-left), B (upper-right), C (lower-left) and D (lower-right).
(x,y) controls the points in quadrant B and is controlled by the points in quadrant C.
The output required is the number of points (x,y) controls minus the number of points (x,y) is controlled by, which is the number of points in B minus the number of points in C, B - C (referring to the number of points in A, B, C, D as simply A, B, C, D).
We can easily calculate the number of points in A+C, that's simply the number of points with x' < x.
Same goes for C+D (points with y' < y), A+B (points with y' > y) and B+D (points with x' > x).
We add up A+C to C+D which is A+2C+D.
Add up A+B to B+D which is A+2B+D.
Deduct the two: A+2B+D-(A+2C+D) = 2B-2C, divide by two: (2B-2C)/2 = B-C which is the output needed.
(I'm assuming handling the 1D case is simple enough and there is no need to explain.)
For the sake of future reference
Solution outline:
We will maintain two AVL trees.
Tree_X: will hold points sorted by their X coordinate.
Tree_Y: will hold points sorted by their Y coordinate.
Each node within both trees will hold the following additional data:
Number of leaves in left sub-tree.
Number of leaves in right sub-tree.
For a point (x,y) we will define regions A, B, C, D:
Point (x',y') is in A if x' < x and y' > y.
Point (x',y') is in B if x' > x and y' > y.
Point (x',y') is in C if x' < x and y' < y.
Point (x',y') is in D if x' > x and y' < y.
Now it is clear that Score(x,y) = |B| - |C|.
However |A|+|C|, |B|+|D|, |A|+|B|, |C|+|D| could be easily retrieved from our two AVL trees, as we will soon see.
And notice that [(|A| + |B| + |B| + |D|) - (|A| + |C| + |C| + |D|)]/2 = |B| - |C|.
Implementation of required operations:
Add(x,y) - We will add the point (x,y) to both of our AVL trees. Since the additional data we are storing is affected only along the insertion path, and since the insertion takes O(logn), the total cost of Add(x,y) is O(logn).
Remove(x,y) - We will remove the point (x,y) from both of our AVL trees. Since the additional data we are storing is affected only along the removal path, and since the removal takes O(logn), the total cost of Remove(x,y) is O(logn).
Score(x,y) - I will show how to calculate |B|+|D|; the other sums are obtained in a similar way with the same complexity. It is clear that |B|+|D| is the number of points which satisfy x' > x. To calculate this number we will:
Find x in Tree_X. Complexity O(logn).
Go upwards in Tree_X from that node towards the root, starting with the number of elements in its right sub-tree; every time we move up from a node that is a left child, add one for the parent plus the number of elements in the parent's right sub-tree. Complexity O(logn).
Total cost of Score(x,y) is O(logn).
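For reference, a minimal Python sketch of the counting identity alone, using plain sorted lists and bisect and assuming all x coordinates and all y coordinates are distinct; it does not meet the O(logn) bound for Add/Remove, which is exactly what the augmented AVL trees above provide:

import bisect

class PointSet:
    def __init__(self):
        self.xs = []  # all x coordinates, kept sorted
        self.ys = []  # all y coordinates, kept sorted

    def add(self, x, y):
        bisect.insort(self.xs, x)
        bisect.insort(self.ys, y)

    def remove(self, x, y):
        self.xs.pop(bisect.bisect_left(self.xs, x))
        self.ys.pop(bisect.bisect_left(self.ys, y))

    def score(self, x, y):
        n = len(self.xs)
        a_plus_c = bisect.bisect_left(self.xs, x)       # points with x' < x
        b_plus_d = n - bisect.bisect_right(self.xs, x)  # points with x' > x
        c_plus_d = bisect.bisect_left(self.ys, y)       # points with y' < y
        a_plus_b = n - bisect.bisect_right(self.ys, y)  # points with y' > y
        # [(|A|+|B| + |B|+|D|) - (|A|+|C| + |C|+|D|)] / 2 = |B| - |C|
        return ((a_plus_b + b_plus_d) - (a_plus_c + c_plus_d)) // 2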

Algorithm to enumerate paths

Say you are standing at point 0 on the real line. At each step, you can either move to the left l places, or to the right r places. You intend to get to the number p. Also, there are some numbers on which you are not allowed to step. You want to count in how many ways you can do this. All numbers mentioned are integers (l and r positive, of course). What would be a good method for counting this?
Note. You can step on p itself in the journey as well, so the answer is infinity in some cases.
It is just like asking "how many integer solutions (x, y) are there to L*x + R*y = P?".
I believe there are a number of articles on this problem.
This is not an algorithmic question but rather a math question. Nevertheless, here is the solution. Let us assume that your numbers l and r are positive integers (none of them are zero).
A solution exists if, and only if, the equation r * x - l * y = p has nonnegative integer solutions (x, y). The equation expresses the fact that we walked x times to the right and y times to the left, in any order. This is a linear Diophantine equation (closely related to Bézout's identity), and we know precisely what its solutions look like.
If gcd(r,l) divides p then there exists an integer solution (x0, y0), and every other solution is of the form x = x0 + k * l / gcd(l,r), y = y0 + k * r / gcd(l,r), where k runs through the integers. Clearly, if k is larger than both -x0 * gcd(l,r) / l and -y0 * gcd(l,r) / r then x and y are nonnegative, so we have infinitely many solutions.
If gcd(r,l) does not divide p then there are no solutions because the left hand side is always divisible by gcd(l,r) but the right-hand side is not.
In summary, your algorithm for counting the solutions looks like this:
if p % gcd(l, r) == 0:
    return Infinity
else:
    return 0
At this point it seems pointless to try to enumerate all the paths, because that will be a rather boring exercise. For each nonnegative solution (x,y) we simply enumerate all possible ways of arranging x moves to the right and y moves to the left. There will be (x+y)!/(x! * y!) such paths (among the x+y steps pick x which will be the moves to the right).
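A minimal Python sketch of the counting logic above, ignoring the forbidden positions just as the answer does (math.gcd and math.comb are standard library):

import math

def count_paths(l, r, p):
    # Infinitely many paths exist iff gcd(l, r) divides p; otherwise none.
    return math.inf if p % math.gcd(l, r) == 0 else 0

def arrangements(x, y):
    # Number of ways to order x right-moves and y left-moves: (x+y)! / (x! y!).
    return math.comb(x + y, x)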

Minimum range of 3 sets

We have three sets S1, S2, S3. I need to find x,y,z such that
x ∈ S1
y ∈ S2
z ∈ S3
let min denote the minimum value out of x,y,z
let max denote the maximum value out of x,y,z
The range denoted by max-min should be the MINIMUM possible value
Of course, the full brute-force solution described by IVlad is simple and therefore easier and faster to write, but its complexity is O(n^3).
Judging by your algorithm tag, I would like to post a more complex algorithm that has O(n^2) worst-case and O(n log n) average complexity (almost sure about this, but I'm too lazy to write a proof).
Algorithm description
Consider thinking about some abstract (X, Y, Z) tuple. We want to find a tuple that has a minimal distance between its maximum and minimum elements. What we can say at this point is that the distance is actually created by our maximum element and minimum element. Therefore, the value of the element between them really doesn't matter as long as it really lies between the maximum and the minimum.
So, here is the approach. We allocate an additional set (let's call it S) and combine every initial set (X, Y, Z) into it. We also need the ability to look up the initial set of every element in the set we've just created (so, if we point to some element in S, let's say S[10], and ask "Where did this guy come from?", our application should answer something like "He comes from Y").
After that, let's sort our new set S by its keys (this would be O(n log n), or O(n) in certain cases).
Determining the minimal distance
Now comes the interesting part. What we want to do is compute some artificial value, let's call it the minimal distance, and mark it as d[x], where x is some element from S. This value refers to the minimal max - min distance which can be achieved using the elements that are predecessors / successors of the current element in the sequence.
Consider the following example - this is our S set (the first row shows indexes, the second row shows values, and the letters X, Y and Z refer to the initial sets):
index:  0   1   2   3   4   5   6   7
value:  1   2   4   5   8  10  11  12
set:    Y   Z   Y   X   Y   Y   X   Z
Let's say we want to compute the minimal distance for the element with index 4. In fact, that minimal distance corresponds to the best (x, y, z) tuple that can be built using the selected element.
In our case (S[4]), we can say that our (x, y, z) pair would definitely look like (something, 8, something), because it should have the element we're counting the distance for (pretty obvious, hehe).
Now, we have to fill the gaps. We know that the elements we're seeking should be from X and Z, and we want those elements to be the best in terms of max - min distance. There is an easy way to select them.
We make a bidirectional run (run left, then run right from the current element), looking for the first elements not from Y. In this case we would look for the nearest element from X and the nearest element from Z in each of the two directions (4 elements total).
This finding method is what we need: if we select the first element from X while running (left / right, doesn't matter), that element suits us better, in terms of distance, than any other element from X that follows it. This happens because our S set is sorted.
In case of my example (counting the distance for element with index number 4), we would mark elements with indexes 6 and 7 as suitable from the right side and elements with indexes 1 and 3 from the left side.
Now we have to test the 4 cases that can happen and take the case where our distance is minimal. In our particular case we have the following (elements returned by the previous routine):
set:    Z   X   Y   X   Z
value:  2   5   8  11  12
We should test every (X, Y, Z) tuple that can be built using these elements, take the tuple with minimal distance and save that distance for our element. In this example, we would say that the (11, 8, 12) tuple has the best distance of 4. So, we store d[4] = 4 (4 here is the element index).
Yielding the result
Now, when we know how to find the distance, let's do it for every element in our S set (this operation takes O(n^2) in the worst case and something like O(n log n) on average).
After we have that distance value for every element in our set, just select the element with minimal distance and run our distance counting algorithm (which is described above) for it once again, but now save the (-, -, -) tuple. It would be the answer.
Pseudocode
Here comes the pseudocode. I tried to make it easy to read, but its implementation would be more complex, because you'll need to code the set lookups ("determine set for element"). Also note that the determine distance and determine tuple routines are basically the same, but the latter yields the actual tuple instead of the distance.
COMBINE (X, Y, Z) -> S
SORT(S)
FOREACH (v in S)
    DETERMINE_DISTANCE(v, S) -> d[v]
DETERMINE_TUPLE(MIN(d[v]))
P.S.
I'm pretty sure that this method could be easily used for (-, -, -, ... -) tuple seeking, still resulting in good algorithmic complexity.
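For reference, here is a minimal Python sketch of the approach described above (the function name is mine; it keeps the best tuple directly instead of storing d[v] and re-running the distance routine):

import itertools

def min_range_triple(s1, s2, s3):
    # Combine the three sets, remembering the origin (0, 1 or 2) of every element.
    combined = sorted([(v, 0) for v in s1] + [(v, 1) for v in s2] + [(v, 2) for v in s3])
    best = None
    for i, (value, origin) in enumerate(combined):
        # Gather the nearest elements from the other two sets on both sides.
        candidates = {origin: [value]}
        for direction in (-1, 1):
            seen = set()
            j = i + direction
            while 0 <= j < len(combined) and len(seen) < 2:
                v, o = combined[j]
                if o != origin and o not in seen:
                    seen.add(o)
                    candidates.setdefault(o, []).append(v)
                j += direction
        if len(candidates) < 3:
            continue
        # Test every (x, y, z) tuple that can be built from the candidates.
        for x, y, z in itertools.product(candidates[0], candidates[1], candidates[2]):
            spread = max(x, y, z) - min(x, y, z)
            if best is None or spread < best[0]:
                best = (spread, (x, y, z))
    return best  # (distance, (x, y, z)), or None if one of the sets is empty

print(min_range_triple([5, 11], [1, 4, 8, 10], [2, 12]))  # (2, (11, 10, 12))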
For reference, the brute-force solution mentioned above:
min = infinity (really large number in practice, like 1000000000)
solution = (-, -, -)
for each x ∈ S1
    for each y ∈ S2
        for each z ∈ S3
            t = max(x, y, z) - min(x, y, z)
            if t < min
                min = t
                solution = (x, y, z)
