What's the big-O complexity of this recursive algorithm? - algorithm

I am following a course of Algorithms and Data Structures.
Today, my professor said the complexity of the following algorithm is 2^n.
I waited until the lesson was over, approached him and told him I actually believed it was an O(n) algorithm. I had done the computation to prove it and wanted to show it to him, but he kept saying it was not, without giving me any convincing explanation.
The algorithm is recursive, and it has this complexity:
T(n) = 1          if n = 1
T(n) = 2T(n/2)    otherwise
I computed it down to be a O(n), this way:
Let's expand T(n)
T(n) = 2 [2 * T(n/(2^2))]
= 2^2 * T(n/(2^2))
= 2^2 * [2 * T(n/(2^3))]
= 2^3 * T(n/(2^3))
= ...
= 2^i * T(n/(2^i)).
We stop when the term inside the T is 1, that is:
n/(2^i) = 1 ==> n = 2^i ==> i = log n (base 2)
After the substitution, we obtain
T(n) = 2^(log n) * T(1)
= n * 1
= O(n).
Since this algorithm came up in a lesson on Merge Sort, I noted that Merge Sort, which is famously O(n log n), has the recurrence 2T(n/2) + Θ(n) (obviously higher than 2T(n/2)), and I asked him why an algorithm with a lower recurrence would get a higher big-O, because at that point it is counter-intuitive to me. He replied, word for word, "If you think that is counter-intuitive, you have serious problem in your math."
My questions are:
Is there any fallacy in my demonstration?
Wouldn't the last situation be counter-intuitive?
Yes, this is also a vent.

Proof - 1
This recurrence falls under case 3 of the Master Theorem (in the form T(n) = aT(n/b) + Θ(n^c)), with
a = 2;
b = 2; and,
c = -∞ (since there is no additive term)
and thus log_b(a) = 1, which is bigger than c = -∞. Therefore the running time is Θ(n^1) = Θ(n).
Proof - 2
Intuitively, you are breaking the problem of size n into 2 problems of size n/2 and the cost to join the result of two sub-problems is 0 (i.e. there is no constant component in the recurrence).
Hence at the bottom-most level you have n problems of cost 1 each, resulting in the running time of n * O(1) which is equal to O(n).
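As a quick numerical check of the argument above, here is a minimal Python sketch. The function names t_no_merge and t_merge_sort are made up for illustration, n is assumed to be a power of two, and T(1) = 1 is assumed for both recurrences. The merge-sort-style recurrence 2T(n/2) + n is included only for comparison.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def t_no_merge(n: int) -> int:
    """T(n) = 2*T(n/2) with T(1) = 1 (n assumed to be a power of two)."""
    if n == 1:
        return 1
    return 2 * t_no_merge(n // 2)


@lru_cache(maxsize=None)
def t_merge_sort(n: int) -> int:
    """T(n) = 2*T(n/2) + n with T(1) = 1 (merge-sort-style recurrence)."""
    if n == 1:
        return 1
    return 2 * t_merge_sort(n // 2) + n


for k in range(11):
    n = 2 ** k
    # Without a combine cost, the total is just the number of leaves: n.
    assert t_no_merge(n) == n
    # With a linear combine cost, each of the k levels adds n: n*k + n.
    assert t_merge_sort(n) == n * k + n
```

The only difference between the two functions is the extra + n per call, and that is exactly what turns Θ(n) into Θ(n log n).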
Edit: Just to complete this answer I will also add the answers to specific questions asked by you.
Is there any fallacy in my demonstration?
No. It is correct.
Wouldn't the last situation be counter-intuitive?
Definitely it is counter-intuitive. See the Proof-2 above.

You are correct in computing the time complexity of the given recurrence. If we are measuring the input size as n (which we should), then your professor is wrong in claiming that the time complexity is 2^n.
You should probably discuss it with him and clear any misunderstanding that you might have.

You are clearly correct that a function T(n) which satisfies that recurrence relation is O(n). It is essentially obvious since it says that the complexity of a given problem is twice that of a problem which is half the size. You can't get much more linear than that. For example -- the complexity of searching through a list of 1000 elements with a linear search is twice that of searching through a list with 500 elements.
If your professor is also correct, then perhaps you are incorrect about the complexity satisfying that recurrence. Alternatively, sometimes there is some confusion about how the input size is being measured. For example, an integer n is exponential in the number of bits needed to specify it. Brute-force trial division of an integer n is O(sqrt(n)), which is much better than O(n). The reason this doesn't contradict the fact that brute-force factoring is essentially worthless for, e.g., cracking RSA is that for, say, a 256-bit key the relevant n is around 2^256.
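To make the point about input size concrete, here is a small Python sketch (the helper name trial_division_steps is hypothetical). It counts how many candidate divisors brute-force trial division tries for a prime n and compares that with the number of bits of n.

```python
import math


def trial_division_steps(n: int) -> int:
    """Count the candidate divisors tried by brute-force trial division of n."""
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            break  # found a factor, stop early
        d += 1
    return steps


n = 65537  # a prime (2^16 + 1), so no factor is found early
steps = trial_division_steps(n)
bits = n.bit_length()

# The step count is bounded by sqrt(n) ...
assert steps <= math.isqrt(n)
# ... but sqrt(n) is about 2^(bits/2), i.e. exponential in the bit length.
print(f"n = {n}, bits = {bits}, divisors tried = {steps}")
```

Measured in n the work is about sqrt(n); measured in the number of bits it roughly doubles every time the number gets two bits longer.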

Related

The complexity of divide-and-conquer algorithms

Assume that the size of the input problem increases with an integer n. Let T(n) be the time complexity of a divide-and-conquer algorithm to solve this problem. Then T(n) satisfies an equation of the form:
T(n) = a T(n/b) + f(n).
Now my question is: how can a and b be unequal?
It seems that they should be equal because the number of recursive calls must be equal to b (size of a sub-problem).
In software, time is often wasted on control operations, like function calls. So usually a > b.
Also, there are situations where the problem calls for just one "recursive call" (which would then be just iteration), for example, binary search. In these cases, a < b.
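Another way to see that a and b are independent: a is how many subproblems each call creates, while b is how much each subproblem shrinks. The sketch below (count_leaves is a made-up helper, and n is assumed to be a power of b) counts the base-case subproblems for a few (a, b) pairs. Binary search is the a = 1, b = 2 case mentioned above; a = 3, b = 2 is the shape of Karatsuba multiplication, which makes three recursive calls on half-size inputs.

```python
def count_leaves(n: int, a: int, b: int) -> int:
    """Base-case subproblems when a size-n problem spawns a subproblems of size n // b."""
    if n <= 1:
        return 1
    return a * count_leaves(n // b, a, b)


n = 1024
print(count_leaves(n, 1, 2))  # a < b: one half-size call per level (binary search)
print(count_leaves(n, 2, 2))  # a = b: two half-size calls (merge sort)
print(count_leaves(n, 3, 2))  # a > b: three half-size calls (Karatsuba-style)
```

With a < b the recursion is a single chain, with a = b the subproblems partition the input, and with a > b the subproblems overlap or recompute, so the total number of leaves grows faster than n.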

Divide and Conquer alg that runs in constant time?

Can you point out to me an example of a divide and conquer algorithm that runs in CONSTANT time? I'm in an "OMG! I cannot think of any such thing" kind of situation. Point me to something please. Thanks
I know that an algorithm that follows the recurrence T(n) = 2T(n/2) + n would be merge sort. We're dividing the problem into 2 subproblems, each of size n/2. Then we're taking n time to combine everything back into one sorted array.
I also know that T(n) = T(n/2) + 1 would be binary search.
But what is T(n) = 1?
For a divide-and-conquer algorithm to run in constant time, it needs to do no more than a fixed amount of work on any input. Therefore, it can make at most a fixed number of recursive calls on any input, since if the number of calls was unbounded, the total work done would not be a constant. Moreover, it needs to do a constant amount of work across all those recursive calls.
This eliminates basically any reasonable-looking recurrence relation. Anything of the form
T(n) = aT(n / b) + O(n^k)
is immediately out of the question, because the number of recursive calls would grow as a function of the input n.
You could make some highly contrived divide-and-conquer algorithms that run in constant time. For example, consider this problem:
Return the first element of the input array.
This could technically be solved with divide-and-conquer by noting that
The first element of a one-element array is equal to itself.
The first element of an n-element array is the first element of the subarray of just the first element.
The recurrence is then
T(n) = T(1) + O(1)
T(1) = 1
As you can see, this is a very odd-looking recurrence, but it does work out.
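For illustration only, the contrived algorithm might look like this in Python. The function name and the (lo, hi) index-range convention are assumptions of this sketch; it passes indices instead of slicing so that the "divide" step does not itself copy O(n) elements.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_element(arr: Sequence[T], lo: int, hi: int) -> T:
    """Return the first element of the non-empty range arr[lo:hi]."""
    if hi - lo == 1:
        # Base case: the first element of a one-element range is that element.
        return arr[lo]
    # "Divide": recurse on the one-element subrange holding the first element.
    return first_element(arr, lo, lo + 1)


data = [7, 8, 9, 10]
print(first_element(data, 0, len(data)))
```

Whatever the input size, there is at most one recursive call and a constant amount of work per call, which matches T(n) = T(1) + O(1).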
I've never heard of anything like this coming up in practice, but if I think of anything I'll try to update this answer with details. (A note: I'm not expecting to ever update this answer. ^_^)
Hope this helps!

What's the complexity of for i: for o = i+1

for i = 0 to size(arr)
    for o = i + 1 to size(arr)
        do stuff here
What's the worst-time complexity of this? It's not N^2, because the second one decreases by one every i loop. It's not N, it should be bigger. N-1 + N-2 + N-3 + ... + N-N+1.
It is N ^ 2, since it's the product of two linear complexities.
(There's a reason asymptotic complexity is called asymptotic and not identical...)
See Wikipedia's explanation on the simplifications made.
Think of it like you are working with a n x n matrix. You are approximately working on half of the elements in the matrix, but O(n^2/2) is the same as O(n^2).
When you want to determine the complexity class of an algorithm, all you need is to find the fastest growing term in the complexity function of the algorithm. For example, if you have complexity function f(n)=n^2-10000*n+400, to find O(f(n)), you just have to find the "strongest" term in the function. Why? Because for n big enough, only that term dictates the behavior of the entire function. Having said that, it is easy to see that both f1(n)=n^2-n-4 and f2(n)=n^2 are in O(n^2). However, they, for the same input size n, don't run for the same amount of time.
In your algorithm, if n=size(arr), the do stuff here code will run f(n)=n+(n-1)+(n-2)+...+2+1 times. It is easy to see that f(n) represents a sum of an arithmetic series, which means f(n)=n*(n+1)/2, i.e. f(n)=0.5*n^2+0.5*n. If we assume that do stuff here is O(1), then your algorithm has O(n^2) complexity.
for i = 0 to size(arr)
I assumed that the loop ends when i becomes greater than size(arr), not equal to it. However, if the latter is the case, then f(n)=0.5*n^2-0.5*n, and it is still in O(n^2). Remember that O(1), O(n), O(n^2), ... are complexity classes, and that complexity functions of algorithms are functions that describe, for the input size n, how many steps there are in the algorithm.
It's n*(n-1)/2, which is O(n^2).
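A short Python sketch can confirm the count by brute force. It assumes that "to size(arr)" excludes the upper bound, as range does in Python; the function name is made up for this example.

```python
def count_inner_iterations(n: int) -> int:
    """Count how often the body of the nested loop runs for an array of size n."""
    count = 0
    for i in range(n):
        for o in range(i + 1, n):
            count += 1  # stands in for "do stuff here"
    return count


for n in (1, 2, 10, 100):
    # (n-1) + (n-2) + ... + 1 + 0 = n*(n-1)/2
    assert count_inner_iterations(n) == n * (n - 1) // 2
```

The count is about half of n^2, and the constant factor 1/2 disappears in big-O notation.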

Understanding Master Theorem

Generic form: T(n) = aT(n/b) + f(n)
So I must compare n^logb(a) with f(n)
if n^logb(a) > f(n) is case 1 and T(n)=Θ(n^logb(a))
if n^logb(a) < f(n) is case 2 and T(n)=Θ((n^logb(a))(logb(a)))
Is that correct? Or have I misunderstood something?
And what about case 3? When does it apply?
Master Theorem for Solving Recurrences
Recurrences occur in a divide and conquer strategy of solving complex problems.
What does it solve?
It solves recurrences of the form T(n) = aT(n/b) + f(n).
a should be greater than or equal to 1. This means that the problem is at least reduced to a smaller sub problem once. At least one recursion is needed.
b should be greater than 1. Which means at every recursion, the size of the problem is reduced to a smaller size. If b is not greater than 1, that means our sub problems are not of smaller size.
f(n) must be positive for relatively larger values of n.
Consider the below image:
Let's say we have a problem of size n to be solved. At each step, the problem can be divided into a sub problems and each sub problem is of smaller size, where the size is reduced by a factor of b.
The above statement in simple words means that a problem of size n can be divided into a sub problems of relatively smaller sizes n/b.
Also, the above diagram shows that at the end when we have divided the problems multiple times, each sub problem would be so small that it can be solved in constant time.
For the below derivation consider log to the base b.
Let us say that H is the height of the tree, then H = logn. The number of leaves = a^logn.
Total work done at Level 1 : f(n)
Total work done at Level 2 : a * f(n/b)
Total work done at Level 3 : a * a * f(n/b^2)
Total work done at last Level : number of leaves * θ(1). This is equal to n^loga, since a^logn = n^loga.
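The per-level totals above can be tabulated with a small Python sketch. The helper work_per_level is hypothetical, n is assumed to be a power of b, and each leaf is charged exactly 1 unit of work.

```python
from typing import Callable, List


def work_per_level(n: int, a: int, b: int, f: Callable[[int], int]) -> List[int]:
    """Total work at each level of the recursion tree of T(n) = a*T(n/b) + f(n)."""
    levels = []
    count, size = 1, n  # number of subproblems at this level, and their size
    while size > 1:
        levels.append(count * f(size))  # a^i * f(n / b^i)
        count *= a
        size //= b
    levels.append(count)  # leaf level: a^(log n) subproblems, 1 unit each
    return levels


print(work_per_level(8, 2, 2, lambda m: 0))      # no combine cost: leaves dominate
print(work_per_level(8, 2, 2, lambda m: m))      # linear combine cost: levels are equal
print(work_per_level(8, 2, 2, lambda m: m * m))  # quadratic combine cost: root dominates
```

Summing a list gives T(n) for that recurrence, and looking at which entries are largest shows where the work concentrates.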
The three cases of the Master Theorem
Case 1:
Now let us assume that the cost of operation is increasing by a significant factor at each level and by the time we reach the leaf level the value of f(n) becomes polynomially smaller than the value n^loga. Then the overall running time will be heavily dominated by the cost of the last level. Hence T(n) = θ(n^loga).
Case 2:
Let us assume that the cost of the operation on each level is roughly equal. In that case f(n) is roughly equal to n^loga. Hence, the total running time would be f(n) times the total number of levels.
T(n) = θ(n^loga * log^(k+1) n), where f(n) = θ(n^loga * log^k n) and k >= 0. For k = 0 the extra factor logn is just the height of the tree.
Note: Here k+1 is the exponent of the log, not its base.
Case 3:
Let us assume that the cost of the operation on each level is decreasing by a significant factor at each level and by the time we reach the leaf level the value of f(n) becomes polynomially larger than the value n^loga. Then the overall running time will be heavily dominated by the cost of the first level. Hence T(n) = θ(f(n)).
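For the common special case f(n) = Θ(n^c), the three cases reduce to comparing c with log_b(a). The following Python sketch uses the case numbering of this answer; the function name master_case is made up, and it does not handle the log^k n extension of case 2. For polynomial f(n) the extra regularity condition that case 3 normally requires holds automatically, so it is not checked here.

```python
import math


def master_case(a: int, b: int, c: float) -> int:
    """Classify T(n) = a*T(n/b) + Θ(n^c) as case 1, 2 or 3."""
    critical = math.log(a, b)  # exponent of the leaf-level cost n^(log_b a)
    if math.isclose(c, critical):
        return 2  # every level costs about the same: Θ(n^c * log n)
    if c < critical:
        return 1  # leaves dominate: Θ(n^(log_b a))
    return 3      # root dominates: Θ(n^c)


print(master_case(2, 2, 1))              # merge sort: T(n) = 2T(n/2) + n
print(master_case(1, 2, 0))              # binary search: T(n) = T(n/2) + 1
print(master_case(2, 2, float("-inf")))  # T(n) = 2T(n/2), no additive term
print(master_case(1, 2, 1))              # T(n) = T(n/2) + n
```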
If you are interested in more detailed reading and examples to practice, visit my blog entry Master Method to Solve Recurrences
I think you have misunderstood it.
if n^logb(a) > f(n) is case 1 and T(n)=Θ(n^logb(a))
Here you should not be worried about f(n): the result you are getting is T(n)=Θ(n^logb(a)).
f(n) is part of T(n), and if you get the result T(n) then that value will be inclusive of f(n).
So there is no need to consider that part.
Let me know if you are not clear.

Determining BigO of a recurrence

T(1) = c
T(n) = T(n/2) + dn
How would I determine BigO of this quickly?
Use repeated backsubstitution and find the pattern. An example here.
I'm not entirely sure what dn is, but assuming you mean a constant multiplied by n:
According to Wolfram Alpha, the recurrence equation solution for:
f(n) = f(n / 2) + cn
is:
f(n) = 2c(n - 1) + c1
which would make this O(n).
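The closed form is easy to sanity-check numerically. This Python sketch uses the question's names (c for T(1), d for the multiplier of n) and assumes n is a power of two; the function name t is just for illustration.

```python
def t(n: int, c: int, d: int) -> int:
    """T(n) = T(n/2) + d*n with T(1) = c (n assumed to be a power of two)."""
    if n == 1:
        return c
    return t(n // 2, c, d) + d * n


c, d = 5, 3
for k in range(11):
    n = 2 ** k
    # d*(n + n/2 + ... + 2) + c = 2*d*(n - 1) + c
    assert t(n, c, d) == 2 * d * (n - 1) + c
```

The costs dn, dn/2, dn/4, ... form a geometric series, so the total stays below 2dn + c, which is linear in n.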
Well, the recurrence part of the relationship is the T(n/2) part, which is in effect halving the value of n each time.
Thus you will need approx. (log2 n) steps to get to the termination condition. The dn term is not constant per step, though: the steps cost dn, dn/2, dn/4, ..., which sum to less than 2dn, so the overall cost of the algorithm is O(n) rather than O(log2 n).
Note that as stated, the problem won't necessarily terminate since halving an arbitrary value of n repeatedly is unlikely to exactly hit 1. I suspect that the T(n/2) part should actually read T(floor (n / 2)) or something like that in order to ensure that this terminates.
Use the master theorem;
see http://en.wikipedia.org/wiki/Master_theorem
By the way, the asymptotic behaviour of your recurrence is O(n), assuming d is a positive constant (independent of the problem size n).

Resources