How to solve this recurrence to find the time complexity - algorithm

There is a version of merge sort where the array is divided into parts of size n/3 and 2n/3 each time (instead of two halves of size n/2 as in the original).
The recurrence here would be:
T(n)=T(n/3)+T(2n/3)+n
Now the problem is, how to solve this to get the time complexity of this implementation?

The Akra–Bazzi method handles some cases that are more complex than the ones the Master Theorem is intended for.
In this example you get the same Theta(n log n) as for equal parts: solving (1/3)^p + (2/3)^p = 1 gives p = 1, so T(n) = Theta(n * (1 + Integral from 1 to n of u/u^2 du)) = Theta(n * (1 + ln n)) = Theta(n log n).

T(n) denotes the total time taken by the algorithm.
We can find the time complexity of this recurrence relation with a recursion tree.
T(n)=T(n/3)+T(2n/3)+n ------- (1)
The root node of T(n) has cost n and expands into 2 parts:
T(n/3) and T(2n/3)
In the next step we find the root costs of T(n/3) and T(2n/3).
To compute T(n/3), substitute n/3 in place of n in equation (1):
T(n/3)=T(n/9)+T(2n/9)+n/3
To compute T(2n/3), substitute 2n/3 in place of n in equation (1):
T(2n/3)=T(2n/9)+T(4n/9)+2n/3
The root node of T(n/3) has cost n/3 and expands into 2 parts:
T(n/9) and T(2n/9)
Keep expanding until you reach individual elements, i.e. T(1).
Calculation of depth:
The depth i along a path is found from n/(b^i)=1.
The left-most path shrinks by a factor of 3 per level, giving n/(3^i)=1; the right-most path shrinks by a factor of 3/2 per level, giving n/((3/2)^i)=1.
For example, if n=9 then n/3=3 and 2n/3=6;
at the next level n/9=1, 2n/9=2 and 4n/9=4.
The right part of the recursion tree, n->2n/3->4n/9->..., is the longest path, so it determines where the tree stops.
The left part of the tree, using n/(3^i)=1, gives the shortest path, of depth (log n base 3).
So, using the right part of the tree, n/((3/2)^i)=1:
n=(3/2)^i
log n=i*log(3/2)
i=(log n base 3/2)
Now, calculating the cost of each level:
The cost of each level is at most n (exactly n on every level above the shallowest leaf).
T(n) <= cost of level * depth
T(n) <= n * i
T(n) = O(n(log n base 3/2))
Since the first (log n base 3) levels each cost exactly n, T(n) is also at least n(log n base 3), so T(n) = Theta(n log n).
Equivalently, T(n)=n+n+n+... over at most i levels, i.e. T(n) <= n * i.
You can also find the time complexity using the Akra–Bazzi method.
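As a sanity check on the recursion-tree result, the recurrence can also be evaluated numerically. This is only a minimal sketch in Python, assuming integer splits (floor(n/3) and the remainder), a base case T(1) = 1, and a hypothetical helper name T; none of these details come from the question itself.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = T(n/3) + T(2n/3) + n with integer splits.

    Assumptions (not from the original post): T(1) = 1, and the
    "n/3" part is floor(n/3) but never smaller than one element.
    """
    if n <= 1:
        return 1
    left = max(1, n // 3)   # the "n/3" part
    right = n - left        # the "2n/3" part
    return T(left) + T(right) + n


if __name__ == "__main__":
    # If T(n) is Theta(n log n), this ratio should stay bounded as n grows.
    for n in (10, 1_000, 100_000):
        print(n, T(n), round(T(n) / (n * math.log2(n)), 3))
```

If the printed ratio T(n) / (n log n) stays within constant bounds as n grows, that is consistent with Theta(n log n).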

Related

Complexity of array sum with divide and conquer

Let the following algorithm be:
sum(v, i, j) {
    if i == j
        return v[i]
    else {
        k = (i + j) / 2
        return sum(v, i, k) + sum(v, k+1, j)
    }
}
The time complexity of this algorithm is O(n), but how can I prove (in natural language) its complexity? The problem always gets divided in two new problems so that would be O(log n), but where does the rest of the complexity come from?
Applying master theorem yields the expected result, O(n).
Thanks.
From a high-level perspective, your algorithm acts as if it is traversing a balanced binary tree, where each node covers a specific interval [i, j]. Its children divide the interval into 2 roughly equal parts, namely [i, (i+j)/2] and [(i+j)/2 + 1, j].
Let's assume that the parts are exactly equal (in other words, for the sake of the proof, the length of the array n is a power of 2).
Think of it in the following way. There are n leaves in the balanced binary tree your algorithm is traversing. Each is responsible for an interval of length 1. There are n/2 nodes of the tree that are the parents of these n leaves. Those n/2 nodes have n/4 parents. This goes all the way up until you reach the root node of the tree, which covers the entire interval.
Think of how many nodes there are in this tree: n + (n/2) + (n/4) + (n/8) + ... + 2 + 1. Since we initially assumed that n = 2^k, this is a geometric series, for which the summation formula is well known. It turns out that there are 2^(k+1) - 1 = 2 * (2^k) - 1 = 2n - 1 nodes in that tree. Each node does a constant amount of work besides its recursive calls, so traversing all nodes of that tree takes O(n) time.
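The node count can be checked directly by counting invocations instead of summing values. A small Python sketch, where count_calls is a hypothetical helper that mirrors the recursion structure of sum(v, i, j) from the question:

```python
def count_calls(i, j):
    """Number of invocations sum(v, i, j) makes, including the top-level call."""
    if i == j:
        return 1                      # a leaf: interval of length 1
    k = (i + j) // 2                  # same split point as in the question
    return 1 + count_calls(i, k) + count_calls(k + 1, j)


if __name__ == "__main__":
    for n in (1, 2, 8, 100, 1024):
        print(n, count_calls(0, n - 1))   # compare with 2n - 1
```

The count comes out as 2n - 1 for every n >= 1, not only for powers of 2, because the recursion tree always has n leaves and every inner node has exactly two children.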
Dividing the problem into two parts does not necessarily mean that the complexity is log(n).
I guess you are referring to the binary search algorithm, but there one half is skipped at every division, because we know the search key would be on the other side.
Just by looking at the pseudocode, a recursive call is made for both halves at every division and nothing is skipped. Why would it be log(n)?
O(n) is the correct complexity.

Is CLRS completely accurate to state that max-heapify running time is described by the recurrence `T(n) = T(2n/3) + O(1)`?

In CLRS on page 155, about max-heaps, the running time of max-heapify is described as T(n) = T(2n/3) + O(1).
I understand why the first recursive call is on a subproblem of size 2n/3 in the case where we have a nearly complete binary tree (always the case with heaps) in which the deepest level of nodes is half full (and we are recursing on the child that is the root of the subtree that contains these nodes at the deepest level). A more in depth explanation of this is here.
What I don't understand is: after that first recursive call, the subtree is now a complete binary tree, so the next recursive calls will be on problems of size n/2.
So is it accurate to simply state that the running time of max-heapify is described by the recurrence T(n) = T(2n/3) + O(1)?
Converting my comment to an answer: if you assume that T(n), the time required to run max-heapify on a subtree with n nodes, is a nondecreasing function of n, then we know that T(m) ≤ T(n) for any m ≤ n. You're correct that the ratio of 2n / 3 is the worst-case ratio and that after the first level of the recurrence it won't be reached, but under the above assumption you can safely conclude that T(n / 2) ≤ T(2n / 3), so we can upper-bound the recurrence as
T(n) ≤ T(2n / 3) + O(1)
even if strict equality doesn't hold. That then lets us use the master theorem to conclude that T(n) = O(log n).
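To see where the 2n / 3 bound comes from, one can compute the size of the root's left subtree for every heap size n. The following Python sketch uses a hypothetical helper, left_subtree_size, and assumes the usual heap shape (all levels full except possibly the deepest, which is filled from the left):

```python
def left_subtree_size(n):
    """Size of the root's left subtree in a heap-shaped tree with n nodes."""
    if n <= 1:
        return 0
    h = n.bit_length() - 1           # height of the tree (edges on the longest path)
    last = n - (2 ** h - 1)          # number of nodes on the deepest level
    full_above = 2 ** (h - 1) - 1    # left-subtree nodes above the deepest level
    # The left subtree owns the first half of the deepest level's slots.
    return full_above + min(last, 2 ** (h - 1))


if __name__ == "__main__":
    worst = max(range(2, 5000), key=lambda n: left_subtree_size(n) / n)
    print(worst, left_subtree_size(worst) / worst)
```

The ratio left_subtree_size(n) / n never exceeds 2/3, and it gets closest when the deepest level is exactly half full, which matches the case described in the question.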

Why is the Fibonacci Sequence Big O(2^n) instead of O(logn)?

I took discrete math (in which I learned about master theorem, Big Theta/Omega/O) a while ago and I seem to have forgotten the difference between O(logn) and O(2^n) (not in the theoretical sense of Big Oh). I generally understand that algorithms like merge and quick sort are O(nlogn) because they repeatedly divide the initial input array into sub arrays until each sub array is of size 1 before recursing back up the tree, giving a recursion tree that is of height logn + 1. But if you calculate the height of a recursive tree using n/b^x = 1 (when the size of the subproblem has become 1 as was stated in an answer here) it seems that you always get that the height of the tree is log(n).
If you solve the Fibonacci sequence using recursion, I would think that you would also get a tree of size logn, but for some reason, the Big O of the algorithm is O(2^n). I was thinking that maybe the difference is because you have to remember all of the fib numbers for each subproblem to get the actual fib number meaning that the value at each node has to be recalled, but it seems that in merge sort, the value of each node has to be used (or at least sorted) as well. This is unlike binary search, however, where you only visit certain nodes based on comparisons made at each level of the tree so I think this is where the confusion is coming from.
So specifically, what causes the Fibonacci sequence to have a different time complexity than algorithms like merge/quick sort?
The other answers are correct, but don't make it clear - where does the large difference between the Fibonacci algorithm and divide-and-conquer algorithms come from? Indeed, the shape of the recursion tree for both classes of functions is the same - it's a binary tree.
The trick to understand is actually very simple: consider the size of the recursion tree as a function of the input size n.
In the Fibonacci recursion, the input size n is the height of the tree; for sorting, the input size n is the width of the tree. In the former case, the size of the tree (i.e. the complexity) is exponential in the input size; in the latter, it is the input size multiplied by the height of the tree, which is usually just a logarithm of the input size.
More formally, start with these facts about binary trees:
In a binary tree where every non-leaf node has two children, the number of leaves n is equal to the number of non-leaf nodes plus one. The size of such a tree is therefore 2n-1.
In a perfect binary tree, all non-leaf nodes have two children and all leaves are at the same depth.
The height h of a perfect binary tree with n leaves is equal to log(n); for a random binary tree, h = O(log(n)) on average; and for a degenerate binary tree, h = n-1.
Intuitively:
For sorting an array of n elements with a recursive algorithm, the recursion tree has n leaves. It follows that the width of the tree is n; the height of the tree is O(log(n)) on average and O(n) in the worst case.
For calculating a Fibonacci sequence element k with the recursive algorithm, the recursion tree has k levels (to see why, consider that fib(k) calls fib(k-1), which calls fib(k-2), and so on). It follows that the height of the tree is k. To estimate a lower bound on the width and number of nodes in the recursion tree, consider that since fib(k) also calls fib(k-2), there is a perfect binary tree of height k/2 as part of the recursion tree. If extracted, that perfect subtree would have 2^(k/2) leaf nodes. So the width of the recursion tree is at least 2^(k/2), i.e. Omega(2^(k/2)), which is exponential in k.
The crucial difference is that:
for divide-and-conquer algorithms, the input size is the width of the binary tree.
for the Fibonacci algorithm, the input size is the height of the tree.
Therefore the number of nodes in the tree is O(n) in the first case, but 2^O(n) in the second. The Fibonacci tree is much larger compared to the input size.
You mention Master theorem; however, the theorem cannot be applied to analyze the complexity of Fibonacci because it only applies to algorithms where the input is actually divided at each level of recursion. Fibonacci does not divide the input; in fact, the functions at level i produce almost twice as much input for the next level i+1.
To address the core of the question, that is "why Fibonacci and not Mergesort", you should focus on this crucial difference:
The tree you get from Mergesort has n elements on each level, and there are log(n) levels.
The tree you get from Fibonacci has n levels because of the presence of F(n-1) in the formula for F(n), and the number of elements on each level can vary greatly: it can be very low (near the root, or near the lowest leaves) or very high. This, of course, is because of repeated computation of the same values.
To see what I mean by "repeated computation", look at the tree for the computation of F(6):
Fibonacci tree picture from: http://composingprograms.com/pages/28-efficiency.html
How many times do you see F(3) being computed?
Consider the following implementation
int fib(int n)
{
    if (n < 2)
        return n;
    return fib(n-1) + fib(n-2);
}
Let T(n) denote the number of operations that fib performs to calculate fib(n). Because fib(n) calls fib(n-1) and fib(n-2), T(n) is at least T(n-1) + T(n-2). This in turn means that T(n) >= fib(n). There is a closed formula for fib(n) that grows like a constant (the golden ratio, about 1.618) to the power of n. Therefore T(n) is at least exponential. QED.
To my understanding, the mistake in your reasoning is the assumption that, when a recursive implementation is used to evaluate f(n), where f denotes the Fibonacci sequence, the input size is reduced by a factor of 2 (or some other factor) at each call, which is not the case. Each call (except for the 'base cases' 0 and 1) makes exactly 2 recursive calls, as there is no possibility to re-use previously calculated values. In the light of the presentation of the master theorem on Wikipedia, the recurrence
f(n) = f (n-1) + f(n-2)
is a case for which the master theorem cannot be applied.
With the recursive algorithm, you have at most about 2^N operations (additions) for fibonacci(N), so it is O(2^N).
With a cache (memoization), you have approximately N operations, so it is O(N).
Algorithms with complexity O(N log N) often combine iterating over every item (O(N)) with splitting, recursing and merging: splitting by 2 gives log N levels of recursion.
Merge sort time complexity is O(n log(n)). Quick sort best case is O(n log(n)), worst case O(n^2).
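The difference between the naive and the memoized version can be made concrete by counting how often the function body runs. This is a sketch in Python; fib_naive, fib_memo and the one-element list used as a counter are illustrative names, not taken from any of the answers.

```python
def fib_naive(n, counter):
    """Plain recursive Fibonacci; counter[0] counts every invocation."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_memo(n, counter, cache=None):
    """Memoized Fibonacci; counter[0] counts only the non-cached computations."""
    if cache is None:
        cache = {}
    if n in cache:
        return cache[n]
    counter[0] += 1
    result = n if n < 2 else fib_memo(n - 1, counter, cache) + fib_memo(n - 2, counter, cache)
    cache[n] = result
    return result


if __name__ == "__main__":
    for n in (10, 20, 25):
        naive, memo = [0], [0]
        fib_naive(n, naive)
        fib_memo(n, memo)
        print(n, naive[0], memo[0])
```

The naive count grows by roughly the golden ratio with each increment of n (it equals 2*fib(n+1) - 1), while the memoized count is just n + 1.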
The other answers explain why naive recursive Fibonacci is O(2^n).
In case you read that Fibonacci(n) can be O(log(n)): this is possible if it is calculated using iteration and repeated squaring, with either the matrix method or the Lucas sequence method. Example code for the Lucas sequence method (note that n is divided by 2 on each loop):
/* lucas sequence method */
int fib(int n) {
    int a, b, p, q, qq, aq;
    a = q = 1;
    b = p = 0;
    while(1) {
        if(n & 1) {
            aq = a*q;
            a = b*q + aq + a*p;
            b = b*p + aq;
        }
        n /= 2;
        if(n == 0)
            break;
        qq = q*q;
        q = 2*p*q + qq;
        p = p*p + qq;
    }
    return b;
}
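For readers who want to experiment with the method above, here is a Python port of the same update rules, checked against a plain iterative loop. The names fib_fast and fib_iter are made up for this sketch; Python integers do not overflow, so larger n can be tried than with a C int.

```python
def fib_fast(n):
    """Fibonacci via repeated squaring of the (p, q) transformation, O(log n) steps."""
    a, b = 1, 0
    p, q = 0, 1
    while n > 0:
        if n & 1:
            # Apply the current transformation once to (a, b).
            a, b = b * q + a * q + a * p, b * p + a * q
        # Square the transformation.
        p, q = p * p + q * q, 2 * p * q + q * q
        n //= 2
    return b


def fib_iter(n):
    """Reference implementation: n additions."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x


if __name__ == "__main__":
    print(all(fib_fast(n) == fib_iter(n) for n in range(200)))
```

The loop runs once per bit of n, which is where the O(log n) bound on the number of arithmetic steps comes from.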
Contrary to the other answers, a master theorem can be applied, but it has to be the master theorem for decreasing functions rather than the master theorem for dividing functions. Even without the theorem, the following recurrence relation can be solved by substitution:
f(n) = f(n-1) + f(n-2)
Since f(n-2) <= f(n-1), this is bounded above by
f(n) = 2*f(n-1) + c
Let's assume c is equal to 1, since it is a constant and doesn't affect the complexity:
f(n) = 2*f(n-1) + 1
and substitute this function k times:
f(n) = 2*[2*f(n-2) + 1] + 1
f(n) = 2^2*f(n-2) + 2 + 1
f(n) = 2^2*[2*f(n-3) + 1] + 2 + 1
f(n) = 2^3*f(n-3) + 4 + 2 + 1
.
.
.
f(n) = 2^k*f(n-k) + 2^(k-1) + 2^(k-2) + ... + 4 + 2 + 1
Now let k = n and take f(0) = 1:
f(n) = 2^n*f(0) + 2^(n-1) + 2^(n-2) + ... + 4 + 2 + 1
f(n) = 2^n + 2^n - 1 = 2^(n+1) - 1, thus the complexity is O(2^n)
Check this video for master theorem for decreasing functions.
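The substitution can be checked numerically. This Python sketch assumes f(0) = 1 for the bounding recurrence f(n) = 2*f(n-1) + 1, in which case the closed form is 2^(n+1) - 1; the helper names upper_bound and naive_calls are hypothetical. It also compares the bound with the exact number of calls made by the naive recursive fib.

```python
def upper_bound(n):
    """Iterate f(n) = 2*f(n-1) + 1 starting from the assumed f(0) = 1."""
    value = 1
    for _ in range(n):
        value = 2 * value + 1
    return value


def naive_calls(n):
    """Exact number of calls made by naive fib(n): c(n) = 1 + c(n-1) + c(n-2)."""
    prev, cur = 1, 1          # c(0) = c(1) = 1
    for _ in range(n - 1):
        prev, cur = cur, 1 + cur + prev
    return cur if n >= 1 else prev


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, naive_calls(n), upper_bound(n), 2 ** (n + 1) - 1)
```

The bound is valid but not tight: the exact call count grows like the golden ratio to the power n, which is still O(2^n).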

Binary tree level order traversal time complexity

HERE it is explained that method 1 of level order traversal has a time complexity of O(n^2). Can someone please explain this to me? I am not sure how the author is saying that printGivenLevel() takes O(n).
"Time Complexity: O(n^2) in worst case. For a skewed tree, printGivenLevel() takes O(n) time where n is the number of nodes in the skewed tree. So time complexity of printLevelOrder() is O(n) + O(n-1) + O(n-2) + .. + O(1) which is O(n^2)."
On the contrary HERE, it seems to be proved that it is O(n)
In the attached code, printGivenLevel() is indeed O(n) in the worst case.
The complexity function of printGivenLevel() is:
T(n) = T(left) + T(right) + O(1)
where left = size of left subtree
right = size of right subtree
In the worst case, each node in the tree has at most one child, so it looks something like this:
1
 2
  3
   4
    5
     6
      7
       8
        ...
Now, note that the way the algorithm works, you start from the root and travel all the way down to the required level, decreasing the level variable every time you recurse. So, in order to get to the nth level, you need at least n invocations of printGivenLevel(), and the complexity function of printGivenLevel() for the above example is:
T(n) = T(n-1) + T(0) + O(1) = O(n) (can be proved by unrolling the recurrence; the master theorem does not cover this subtract-by-one form)
The first implementation calls printGivenLevel() once for each level, so for the same example you get a worst-case running time of O(n^2): printing level k takes O(k), and summing over levels 1 to n gives O(1 + 2 + 3 + ... + n) =(*) O(n(n+1)/2) = O(n^2), where the equality marked with (*) is the sum of an arithmetic progression.
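The quadratic behaviour on a skewed tree can be reproduced by counting node visits. This is a Python sketch of the per-level approach under discussion, not the code from the linked page: Node, collect_level, level_order and skewed are names invented here, and the function collects values into a list instead of printing them.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def collect_level(node, level, out, counter):
    """Walk down from node and record the values found at the given level."""
    if node is None:
        return
    counter[0] += 1                      # one visit of a real node
    if level == 1:
        out.append(node.data)
    else:
        collect_level(node.left, level - 1, out, counter)
        collect_level(node.right, level - 1, out, counter)


def level_order(root):
    """Restart from the root once per level; return (values, node visits)."""
    out, counter = [], [0]
    for level in range(1, height(root) + 1):
        collect_level(root, level, out, counter)
    return out, counter[0]


def skewed(n):
    """Build the chain 1 -> 2 -> ... -> n using right children only."""
    root = None
    for data in range(n, 0, -1):
        root = Node(data, right=root)
    return root


if __name__ == "__main__":
    values, visits = level_order(skewed(8))
    print(values, visits)
```

For the 8-node chain the traversal visits 1 + 2 + ... + 8 = 36 nodes, i.e. n(n+1)/2, which is the O(n^2) worst case described above.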
We can perform level order traversal in an easy way that always (best, average and worst case) has O(n) time complexity.
Simple Python code:
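def level_order(self):
    # Breadth-first traversal: each node is appended and printed once, so O(n).
    if self.root is None:
        return
    print(self.root.data, end=' ')
    a = [self.root]              # nodes on the current level
    while len(a) != 0:
        tmp = []                 # children of the current level
        for i in a:
            if i.left is not None:
                tmp.append(i.left)
                print(i.left.data, end=' ')
            if i.right is not None:
                tmp.append(i.right)
                print(i.right.data, end=' ')
        a = tmp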
Explanation: a is a list of the nodes at the current level; tmp is a list that collects their child nodes. If len(a)=0, the previous level was the last one, so the loop ends.

recurrence relation for tree

Suppose there is a tree with the number of child nodes increasing from 2 to 4, then 8, and so on. How can we write a recurrence relation for such a tree?
Using the substitution method: T(n)=2T(n/2)+n
=2[2T(n/2^2)+n/2]+n
=2^2T(n/2^2)+n+n
=2^2[2T(n/2^3)+n/2^2]+n+n
=2^3T(n/2^3)+n+n+n
Similarly, substituting k times, we get
T(n)=2^k T(n/2^k)+nk
The recursion terminates when n/2^k=1, i.e.
k=log n base 2.
Thus, taking T(1)=1, T(n)=2^(log n base 2)+n(log n)
=n+n log n
so the tight bound for the above recurrence relation is
Theta(n log n) (base of log is 2)
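The closed form from the substitution can be verified for powers of two. A minimal Python sketch, assuming T(1) = 1 (the derivation above does not depend on the exact base-case constant); T here is just an illustrative function name:

```python
def T(n):
    """Evaluate T(n) = 2*T(n/2) + n for n a power of two, with assumed T(1) = 1."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


if __name__ == "__main__":
    for k in range(0, 11):
        n = 2 ** k
        # Closed form from the substitution: n + n * log2(n) = n + n * k
        print(n, T(n), n + n * k)
```

Both columns agree for every power of two, which is the n + n log n result derived above.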
Take a look at this link.
T(n) = 2 T(n/2) + O(n) [the O(n) is for Combine]
T(1) = O(1)
