Ratio of leaves to total nodes in a Fibonacci call stack - algorithm

If you were to look at a recursive implementation of calculating the nth Fibonacci number (root 100, children 99 and 98, grandchildren 98, 97, 97, and 96, etc. etc.), roughly what would be the ratio of the number of leaves to the total number of nodes in the recursive tree?
            100
           /   \
         99     98
        /  \   /  \
      98    97 97    96
      .     .  .     .
Not homework, just academically curious about this. (And yes, I realize that a recursive implementation is a god-awful way to calculate Fibonacci numbers)

The number of leaves is simply F(n), since F(i) is the number of leaves beneath the node that computes F(i). Do you see why? (Hint: use induction.)
The number of non-leaf (internal) nodes is the number of leaf nodes minus 1. This is a property of full binary trees, and the call tree is full because every non-base call makes exactly two recursive calls. So the total number of nodes is F(n) + F(n) - 1 = 2F(n) - 1.
The ratio of leaves to total nodes, F(n) / (2F(n) - 1), thus approaches 1/2 as n grows large.
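A quick way to convince yourself (a small sketch of my own, not from the answer): count the leaf calls and the total calls of the naive recursion directly and watch the ratio settle at 1/2.

#include <cstdio>

// Counts the calls in the naive Fibonacci recursion tree for fib(n),
// treating n <= 1 as a leaf (base-case) call.
void countCalls(int n, long long &leaves, long long &nodes) {
    ++nodes;
    if (n <= 1) { ++leaves; return; }
    countCalls(n - 1, leaves, nodes);
    countCalls(n - 2, leaves, nodes);
}

int main() {
    for (int n = 5; n <= 30; n += 5) {
        long long leaves = 0, nodes = 0;
        countCalls(n, leaves, nodes);
        // nodes is always 2*leaves - 1, so the ratio approaches 1/2.
        printf("n=%d leaves=%lld nodes=%lld ratio=%.4f\n",
               n, leaves, nodes, (double)leaves / nodes);
    }
}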

The call tree for fib(x) consists of the leaves of fib(x-1) and the leaves of fib(x-2), so you get the same recurrence as for the Fibonacci numbers themselves.
If the termination points (leaves) are fib(1) and fib(0), then:
tree     number of leaves
fib(2)   2
fib(3)   3
fib(4)   5
fib(5)   8
fib(6)   13
...
and numofleaves(x) = fib(x+1).
For the number of nodes you get the equation numnodes(x) = 1 + numnodes(x-1) + numnodes(x-2).

I actually worked out a proof by induction that shows that the number of leaves in a Fibonacci tree always exceeds the number of internal nodes.
Proof: Let E(n) be the # of leaves of the Fibonacci call tree for input n,
and M(n) be the # of internal nodes of the Fibonacci call tree for input n.
Claim: E(n) >= M(n) + 1 for all n >= 0.
Base cases:
f(0): E(0) = 1, M(0) = 0
f(1): E(1) = 1, M(1) = 0
f(2): E(2) = 2, M(2) = 1
f(3): E(3) = 3, M(3) = 2
The number of leaves of the tree for input n equals the sum of the leaves of its two sub-trees:
E(n) = E(n - 1) + E(n - 2)
The number of internal nodes equals the internal nodes of the two sub-trees, plus the root:
M(n) = M(n - 1) + M(n - 2) + 1
E(n) = E(n - 1) + E(n - 2) >= [M(n - 1) + 1] + [M(n - 2) + 1]   (by the inductive hypothesis)
So, E(n) >= M(n - 1) + M(n - 2) + 2 = [M(n - 1) + M(n - 2) + 1] + 1
So, E(n) >= M(n) + 1.
QED

Related

Complexity of searching sorted matrix

Suppose we have a matrix of size NxN of numbers where all the rows and columns are in increasing order, and we want to find if it contains a value v. One algorithm is to perform a binary search on the middle row, to find the elements closest in value to v: M[row,col] < v < M[row,col+1] (if we find v exactly, the search is complete). Since the matrix is sorted we know that v is larger than all elements in the sub-matrix M[0..row, 0..col] (the top-left quadrant of the matrix), and similarly it's smaller than all elements in the sub-matrix M[row..N-1, col+1..N-1] (the bottom right quadrant). So we can recursively search the top right quadrant M[0..row-1, col+1..N-1] and the bottom left quadrant M[row+1..N-1, 0..col].
The question is: what is the complexity of this algorithm?
Example: Suppose we have the 5x5 matrix shown below and we are searching for the number 25:
0 10 20 30 40
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
In the first iteration we perform binary search on the middle row and find the closest element which is smaller than 25 is 22 (at row=2 col=2). So now we know 25 is larger than all items in the top-left 3x3 quadrant:
0 10 20
1 11 21
2 12 22
Similarly we know 25 is smaller than all elements in the bottom right 3x2 quadrant:
32 42
33 43
34 44
So, we recursively search the remaining quadrants - the top right 2x2:
30 40
31 41
and the bottom left 2x3:
3 13 23
4 14 24
And so on. We essentially divided the matrix into 4 quadrants (which might be of different sizes depending on the result of the binary search on the middle row), and then we recursively search two of the quadrants.
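For concreteness, here is a minimal sketch of that quadrant search (my own illustration, not code from the question; the names quadrantSearch and contains are made up):

#include <cstdio>
#include <vector>
using namespace std;

// Searches the sub-matrix M[top..bottom][left..right] of a row- and column-sorted matrix.
bool quadrantSearch(const vector<vector<int>>& M, int v,
                    int top, int bottom, int left, int right) {
    if (top > bottom || left > right) return false;
    int row = (top + bottom) / 2;
    // Binary search the middle row for the largest column whose value is < v.
    int lo = left, hi = right, col = left - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (M[row][mid] == v) return true;
        if (M[row][mid] < v) { col = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    // v is larger than everything in M[top..row][left..col] and smaller than everything
    // in M[row..bottom][col+1..right], so only two quadrants remain to be searched.
    return quadrantSearch(M, v, top, row - 1, col + 1, right)   // top-right quadrant
        || quadrantSearch(M, v, row + 1, bottom, left, col);    // bottom-left quadrant
}

bool contains(const vector<vector<int>>& M, int v) {
    if (M.empty() || M[0].empty()) return false;
    return quadrantSearch(M, v, 0, (int)M.size() - 1, 0, (int)M[0].size() - 1);
}

int main() {
    vector<vector<int>> M = {{0,10,20,30,40}, {1,11,21,31,41}, {2,12,22,32,42},
                             {3,13,23,33,43}, {4,14,24,34,44}};
    printf("%d %d\n", contains(M, 25), contains(M, 24));   // prints 0 1
}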
The worst-case running time is Theta(n). Certainly this is as good as it gets for correct algorithms (consider an anti-diagonal, with elements less than v above and elements greater than v below). As far as upper bounds go, the bound for an n-row, m-column matrix is O(n log(2 + m/n)), as evidenced by the correct recurrence
f(n, m) = log m + max_{0 <= j <= m-1} [f(n/2, j) + f(n/2, m-1-j)],
where there are two sub-problems, not one. This recurrence is solvable by the substitution method.
Hypothesis (c to be chosen later):
f(n, m) ≤ c n log(2 + m/n) - log(m) - 2
Substituting into the recurrence:
f(n, m) = log m + max_{0 <= j <= m-1} [f((n-1)/2, j) + f((n-1)/2, m-j)]
        ≤ log m + max_{0 <= j <= m-1} [c (n/2) log(2 + j/(n/2)) - log(j) - 2
                                       + c (n/2) log(2 + (m-j)/(n/2)) - log(m-j) - 2]
        [the maximum is attained at j = m/2, by the concavity of log]
        ≤ log m + c n log(2 + m/n) - 2 log(m/2) - 4
        = log m + c n log(2 + m/n) - 2 log(m) - 2
        = c n log(2 + m/n) - log(m) - 2.
Set c large enough that, for all n, m,
c n log(2 + m/n) - log(m) - 2 ≥ log(m),
where log(m) is the cost of the base case n = 1.
If you find your element after n steps, then the searchable range has size N = 4^n. Then, time complexity is O(log base 4 of N) = O(log N / log 4) = O(0.5 * log N) = O(log N).
In other words, your algorithm is two times faster than binary search, which is O(log N).
A consideration on binary search on matrices:
Binary search on 2D matrices (and on N-dimensional matrices in general) is nothing different from binary search on a sorted 1D vector. In fact, C for instance stores them in row-major fashion (as a concatenation of rows: [[row0],[row1],...,[rowk]]).
This means one can use the well-known binary search on the matrix as follows (with complexity log(n*m)):
// Treat the rows*cols matrix as one flat, sorted, row-major array and binary search it.
template<typename T>
bool binarySearch_2D(T target, T** matrix, int rows, int cols){
    int a = 0;
    int b = rows * cols - 1;        // last cell index (NCELLS - 1, i.e. ROWS*COLS - 1)
    bool found = false;
    while(!found && a <= b){
        int half = (a + b) / 2;
        int r = half / cols;        // row of the flattened index
        int c = half % cols;        // column of the flattened index
        T v = matrix[r][c];
        if(v == target)
            found = true;
        else if(target > v)
            a = half + 1;
        else // target < v
            b = half - 1;
    }
    return found;
}
The complexity of this algorithm will be:
O(log2(n*n))
= O(log2(n))
This is because you are eliminating half of the matrix in one iteration.
EDIT:
Recurrence relation:
Assuming n to be the total number of elements in the matrix,
=> T(n) = T(n/2) + log(sqrt(n))
=> T(n) = T(n/2) + log(n^(1/2))
=> T(n) = T(n/2) + 1/2 * log(n)
Here, a = 1, b = 2.
Therefore, c = logb(a) = log2(1) = 0
=> n^c = n^0
Also, f(n) = n^0 * 1/2 * log(n)
According to case 2 of Master Theorem,
T(n) = O((log(n))^2)
You can use a recursive function and apply the master theorem to find the complexity.
Assume n is the number of elements in the matrix.
Cost for one step is binary search on sqrt(n) elements and you get two problems, in worst case same size each with n/4 elements: 2*T(n/4). So we have:
T(n)=2*T(n/4)+log(sqrt(n))
equal to
T(n)=2*T(n/4)+log(n)/2
Now apply master theorem case 1 (a=2, b=4, f(n)=log(n)/2, and f(n) is in O(n^(log_b(a) - eps)) = O(n^(1/2 - eps)) for some eps > 0, therefore we have case 1)
=> Total running time T(n) is in O(n^(log_b(a))) = O(n^(1/2))
or equal to
O(sqrt(n))
which equals the height (or width) of the matrix if both sides are the same length.
Let's assume that we have the following matrix:
1 2 3
4 5 6
7 8 9
Let's search for value 7 using binary search as you specified:
Search nearest value to 7 in middle row: 4 5 6, which is 6.
Hmm we have a problem, 7 is not in the following submatrix:
6
9
So what to do? One solution would be to apply binary search to every row, which has a complexity of n log(n). So walking the matrix is a better solution.
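Here is a minimal sketch of that "walking the matrix" idea (the standard staircase search starting from the top-right corner; my own illustration, not code from the answer), which takes O(N + M) steps:

#include <cstdio>
#include <vector>
using namespace std;

bool staircaseSearch(const vector<vector<int>>& M, int v) {
    if (M.empty() || M[0].empty()) return false;
    int r = 0, c = (int)M[0].size() - 1;   // start at the top-right corner
    while (r < (int)M.size() && c >= 0) {
        if (M[r][c] == v) return true;
        if (M[r][c] > v) --c;   // everything below in this column is even larger: discard the column
        else ++r;               // everything to the left in this row is even smaller: discard the row
    }
    return false;
}

int main() {
    vector<vector<int>> M = {{1,2,3},{4,5,6},{7,8,9}};
    printf("%d\n", staircaseSearch(M, 7) ? 1 : 0);   // prints 1
}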
Edit:
Recurrence relation:
T(N*N) = T(N*N/2) + log(N)
if we normalize the function to one variable with M = N^2:
T(M) = T(M/2) + log(sqrt(M))
T(M) = T(M/2) + log(M)/2
According to Master Theorem Case #2, complexity is
(log(M))^2
=> (2log(N))^2
=> (log(N))^2
Edit 2:
Sorry, I answered your question from my mobile. Now, when you think about it, M[0...row-1, col+1...N-1] doesn't make much sense, right? Consider my example: if you search for a value that is smaller than all values in the middle row, you'll always end up with the leftmost number. Similarly, if you search for a value that is greater than all values in the middle row, you'll end up with the rightmost number. So the algorithm can be reworded as follows:
Search middle row with custom binary search that returns 1 <= idx <= N if found, idx = 0 or idx = N+1 if not found. After binary search if idx = 0, start the search in the upper submatrix: M[0...row][0...N].
If the index is N + 1 start the search in the lower submatrix: M[row+1...N][0...N]. Otherwise, we are done.
You suggest that the complexity should be: 2T(M/4) + log(M)/2. But at each step, we divide the whole matrix in two and only process one of the halves.
Moreover, if you agree that T(N*N) = T(N*N/2) + log(N) is correct, then you can substitute all N*N expressions with M.

What is the minimal depth of a 4-ary tree with n nodes?

The question is:
What is the minimal depth of a 4-ary tree with n nodes?
I can't find the correct log that is the answer, I know that for n = 1 the depth is 0, if 2 <= n <= 5 it is 1, if 6 <= n <= 21 it is 2
Thanks in advance!
That is a math question.
Let's find the relation f between the height h and the number of nodes n in a full tree. I'll do it with recursion.
n = f(h). The base is easy, as you said: f(0) = 1.
We can see that each level contains exactly 4^i nodes, where i is the distance from the root. So, summing over all levels we have:
f(h) = 4^h + f(h-1) = 4^h + 4^(h-1) + ... + 4^1 + 4^0 = (4^(h+1)-1)/3 = n [sum of a geometric series]
Isolating h:
h = log_4(3n+1) - 1, and you should take the ceil() of that, because you want it to apply to non-full trees as well.
Generalization for k-ary is easy now, as:
f_k(h) = (k^(h+1)-1)/(k-1), so h = ceil(log_k((k-1)n + 1) - 1)
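A quick sanity check (a small sketch of my own, not from the answer) that reproduces the values in the question; it finds the minimal depth by growing a full k-ary tree level by level, which sidesteps floating-point issues with the closed-form log:

#include <cstdio>

// Minimal depth of a k-ary tree with n nodes: the smallest h with (k^(h+1) - 1)/(k - 1) >= n.
int minDepth(long long n, int k) {
    long long capacity = 1;    // nodes in a full tree of depth 0
    long long levelNodes = 1;  // nodes on the deepest level so far
    int h = 0;
    while (capacity < n) {
        levelNodes *= k;
        capacity += levelNodes;
        ++h;
    }
    return h;
}

int main() {
    long long tests[] = {1, 2, 5, 6, 21, 22};
    // For k = 4 this prints depth 0 for n = 1, 1 for n = 2..5, 2 for n = 6..21, 3 for n = 22.
    for (long long n : tests)
        printf("n = %lld -> depth %d\n", n, minDepth(n, 4));
}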

k-ary Trees Induction Proof

I'm getting stumped by this problem:
You have a tree with every internal node having k children, with k >= 2.
What is the maximum number of nodes that such a tree can have, if its depth is d? Prove your
answer by induction on d.
So I realize that if k was 2, the geometric series would be 1 + 2 + 4 + 8...+2^n, but I can't figure out how to include depth and how to prove it inductively.
The number of items in a full k-ary tree of n levels is (k^n - 1)/(k - 1).
A binary tree of 5 levels, for example, has 31 nodes (1 + 2 + 4 + 8 + 16). Or:
(2^5 - 1)/(2 - 1) = 31/1 = 31
A 4-ary tree of 4 levels has 85 nodes (1 + 4 + 16 + 64). Or:
(4^4 - 1)/(4 - 1) = 255/3 = 85
If you write out a few of those for different values of k, you should be able to derive the inductive proof.
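For intuition, here is a tiny check (my own sketch, not part of the answer) that the closed form (k^(d+1) - 1)/(k - 1) for a tree of depth d, i.e. d+1 levels, matches the level-by-level sum:

#include <cstdio>

int main() {
    for (int k = 2; k <= 4; ++k) {
        for (int d = 0; d <= 5; ++d) {
            long long sum = 0, pow = 1;
            for (int i = 0; i <= d; ++i) { sum += pow; pow *= k; }  // 1 + k + ... + k^d
            long long closed = (pow - 1) / (k - 1);                 // pow is now k^(d+1)
            printf("k=%d d=%d: sum=%lld closed=%lld\n", k, d, sum, closed);
        }
    }
}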

Worst case in Max-Heapify - How do you get 2n/3?

In CLRS, third Edition, on page 155, it is given that in MAX-HEAPIFY,
The children’s subtrees each have size at most 2n/3—the worst case
occurs when the bottom level of the tree is exactly half full.
I understand why it is worst when the bottom level of the tree is exactly half full. And it is also answered in this question worst case in MAX-HEAPIFY: "the worst case occurs when the bottom level of the tree is exactly half full"
My question is: how do you get 2n/3?
Why, if the bottom level is half full, is the size of a child's subtree up to 2n/3?
How do you calculate that?
Thanks
In a tree where each node has exactly either 0 or 2 children, the number of nodes with 0 children is one more than the number of nodes with 2 children. {Explanation: in a perfect binary tree the number of nodes at depth h is 2^h, which by the summation formula of a geometric series equals (the number of nodes at depths 0 to h-1) + 1; and all the nodes at depths 0 to h-1 are the nodes with exactly 2 children.}
         ROOT
        /    \
       L      R
      / \    / \
     /   \  /   \
     -----  -----
     *****
Let k be the number of nodes in R. The number of nodes in L is k + (k + 1) = 2k + 1. The total number of nodes is n = 1 + (2k + 1) + k = 3k + 2 (root plus L plus R). The ratio is (2k + 1)/(3k + 2), which is bounded above by 2/3. No constant less than 2/3 works, because the limit as k goes to infinity is 2/3.
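To see the bound numerically, here is a small check of my own (not from the answer): the right subtree is a complete tree with k nodes, the left subtree holds 2k + 1 of the n = 3k + 2 nodes, and the fraction climbs toward 2/3 as the heap grows.

#include <cstdio>

int main() {
    for (int h = 1; h <= 6; ++h) {           // h = height of the right (complete) subtree
        long long k = (1LL << (h + 1)) - 1;  // nodes in the right subtree
        long long left = 2 * k + 1;          // nodes in the left subtree
        long long n = 3 * k + 2;             // total nodes, including the root
        printf("k=%lld  left/total = %lld/%lld = %.4f\n", k, left, n, (double)left / n);
    }
}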
Understand that the maximum number of elements in a subtree occurs for the left subtree of a tree that has its last level half full. Draw this on a piece of paper to convince yourself.
Once that is clear, the bound of 2N/3 is easy to get.
Let us assume that the total number of nodes in the tree is N.
Number of nodes in the tree = 1 + (Number of nodes in Left Subtree) + (Number of nodes in Right Subtree)
For our case, where the tree has its last level half full, if we assume that the right subtree is of height h, then the left subtree is of height (h+1):
Number of nodes in Left Subtree  = 1+2+4+8+...+2^(h+1) = 2^(h+2)-1 .....(i)
Number of nodes in Right Subtree = 1+2+4+8+...+2^(h)   = 2^(h+1)-1 .....(ii)
Thus, plugging into:
Number of nodes in the tree = 1 + (Number of nodes in Left Subtree) + (Number of nodes in Right Subtree)
=> N = 1 + (2^(h+2)-1) + (2^(h+1)-1)
=> N = 1 + 3*(2^(h+1)) - 2
=> N = 3*(2^(h+1)) -1
=> 2^(h+1) = (N + 1)/3
Plugging in this value into equation (i), we get:
Number of nodes in Left Subtree = 2^(h+2)-1 = 2*(N+1)/3 -1 =(2N-1)/3 < (2N/3)
Hence the upper bound on the maximum number of nodes in a subtree for a tree with N nodes is 2N/3.
For a complete binary tree with h levels, the number of nodes is f(h) = 2^h - 1. In the above case we have a nearly complete binary tree with the bottom level half full. We can visualize this as the root + a left complete tree + a right complete tree. If the original tree has h levels, then the left subtree has h - 1 levels and the right subtree has h - 2. So the equation becomes
n = 1 + f(h-1) + f(h-2) (1)
We want to solve above for f(h-1) expressed as in terms of n
f(h-2) = 2^(h-2) - 1 = (2^(h-1)-1+1)/2 - 1 = (f(h-1) - 1)/2 (2)
Using above in (1) we have
n = 1 + f(h-1) + (f(h-1) - 1)/2 = 1/2 + 3*f(h-1)/2
=> f(h-1) = 2*(n - 1/2)/3 = (2n - 1)/3
Hence the larger subtree has about 2n/3 nodes, i.e. O(2n/3).
To add to swen's answer, here is how (2k + 1)/(3k + 2) tends to 2/3 as k tends to infinity:
Lim_(k -> inf) (2k + 1)/(3k + 2) = Lim_(k -> inf) k(2 + 1/k) / (k(3 + 2/k)) = Lim_(k -> inf) (2 + 1/k)/(3 + 2/k)
Apply the limit, and you get 2/3.
Number of nodes at:
level 0 (the root): 2^0
level 1: 2^1
level 2: 2^2
...
level n: 2^n
Summation of all nodes from level 0 up to level n,
S = 2^0 + 2^1 + 2^2 + ... + 2^n
From geometric series summation rule we know that
x^0 + x^1 + x^2 + ... + x^(n) = (x^(n+1) - 1)/(x-1)
Substituting x = 2, we get
S = 2^(n+1) - 1. i.e. 2^(n+1) = S + 1
As 2^(n+1) is the total nodes at level n+1, we can say that the number of nodes with 0 children is one more than the number of nodes with 2 children.
Now let's calculate the number of nodes in the left subtree, in the right subtree, and in total.
Assume that the number of non-leaf nodes in the left subtree of the root = k.
By the above reasoning, the number of leaf nodes in the left subtree of the root = k + 1.
Number of nodes in the right subtree of the root = k, since the bottom level is exactly half full (the right subtree looks like the left subtree with its bottom level of leaves removed).
Total number of nodes in the left subtree of the root = k + (k + 1) = 2k + 1.
Total number of nodes in the tree, n = (2k + 1) + k + 1 = 3k + 2.
Ratio of nodes in the left subtree to total nodes = (2k + 1)/(3k + 2), which is bounded above by 2/3.
That's the reason for saying that the children's subtrees each have size at most 2n/3.

Finding the number of leaf nodes

An N-ary tree has N sub-nodes for each node. If the tree has M non-leaf nodes, how do you find the number of leaf nodes?
First of all, if the root is level 0, then the K-th level of the tree will have N^K nodes. You can keep incrementing a counter level by level until you accumulate M nodes. This way you will find how many levels the tree consists of. And the number of leaf nodes is the number of nodes on the last level: N^lastLevel.
Here is an example: N = 3, M = 4.
First level = 3^0 = 1
Second level = 3^1 = 3
1 + 3 = 4
So we found that the internal nodes span two levels (levels 0 and 1, counting from 0).
The answer is 3^2 = 9.
Note: You can find the level number also directly, by noticing that M is a sum of geometric progression: 1 + 3 + 9 + 27 ... = M
Hope it is clear.
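A minimal sketch of that counting approach (my own helper; the name leavesByCounting is made up):

#include <cstdio>

// Sum the levels of a perfect N-ary tree until the running total of internal
// nodes reaches M; the next level holds the leaves.
long long leavesByCounting(long long M, int N) {
    long long internal = 0;   // internal nodes counted so far
    long long level = 1;      // nodes on the current level (N^0 at the root)
    while (internal < M) {
        internal += level;
        level *= N;           // move to the next level
    }
    return level;             // N^lastLevel = number of leaves
}

int main() {
    printf("%lld\n", leavesByCounting(4, 3));   // the example above: prints 9
}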
Mathematically speaking, the nodes increase in a geometric progression.
0th level - 1
1st level - n
2nd level - n^2
3rd level - n^3
...
mth level - n^m
So the total number of non-leaf nodes (levels 0 through m-1) is 1 + n + n^2 + ... + n^(m-1).
Now there is a good formula to calculate 1 + a + a^2 + ... + a^(m-1), which is
(a^m - 1)/(a - 1); with a = n, let's call this quantity K.
Now what we need is the number of leaf nodes, which is n^m, and what we have is K, i.e. the total number of non-leaf nodes. Doing some rearrangement of the formula you will find that
n^m = K*(n-1) + 1.
E.g. let's say in a 3-ary tree the total number of non-leaf nodes is 40 (1 + 3 + 9 + 27); then using this formula you get 81 leaf nodes (3^4), which is the right answer.
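And as a closed-form sketch (my own helper; leafCount is a made-up name):

#include <cstdio>

// For a perfect N-ary tree with M non-leaf nodes, the number of leaves is M*(N - 1) + 1.
long long leafCount(long long M, int N) {
    return M * (N - 1) + 1;
}

int main() {
    printf("%lld\n", leafCount(40, 3));   // the example above: prints 81
}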
