What is big O of this pseudocode?

For this pseudocode, what would be big O of this:
BinarySum(A, i, n) {
    if n=1 then
        return A[i] // base case
    return BinarySum(A, i, n/2) + BinarySum(A, i+[n/2], [n/2])
}
This is really confusing because I think it is O(log n), since you divide by 2 each time you run the function, but some people I spoke to think it is O(n) and not O(log n), because the algorithm doesn't halve the problem by picking only one half of the array. What is it?

TL;DR
The runtime is Θ(n).
How to determine runtime
The recurrence relation for the algorithm is
T(n) = 2T(n/2) + O(1)
because every call makes two recursive calls, each on half of the array, and does a constant amount of work itself. By the way, this is the same recurrence relation that describes a binary tree traversal.
You can use the Master theorem to determine the runtime here, since a >= 1, b > 1 and the recurrence relation has the form
T(1) = 1; T(n) = aT(n/b) + f(n)
This is case one of the theorem, which requires f(n) = O(n^(log_b(a) - ε)) for some ε > 0. Here a = 2 and b = 2, so log_2(2) = 1 and n^(log_b(a)) = n¹ = n. Since f(n) = O(1) = O(n^(1 - ε)) with ε = 1, the condition holds.
When case one is applicable, the Master theorem says that the runtime of the algorithm is Θ(n^(log_b(a))), which here is Θ(n).
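As a quick sanity check on that result, here is a minimal Python sketch (Python is my choice for illustration, not the asker's language) that evaluates the recurrence directly for powers of two. The constant 1 stands in for the O(1) term, and the name `recurrence_cost` is made up for this example.

```python
def recurrence_cost(n: int) -> int:
    """Evaluate T(n) = 2*T(n/2) + 1 with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * recurrence_cost(n // 2) + 1
```

For n = 1, 2, 4, 8 this gives 1, 3, 7, 15, that is 2n - 1, which grows linearly in n.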

First, let me say that I like the other answer that links the recurrence with the traversal of binary trees. For a balanced binary tree, this is indeed the recurrence, and so the complexity must necessarily be the same as a depth-first traversal, which we all know is O(n). I like this analogy because it clearly says that the result doesn't just apply to the recurrence T(n) = 2T(n/2) + O(1) but to anything where you split the input into chunks of sizes m[0], m[1], ... that sum to size n and do T(n) = T(m[0]) + T(m[1]) + T(m[2]) + ... + O(1). You don't have to split the input into two equally sized parts to get O(n); you just have to spend constant time and then recurse on disjoint parts of the input.
Using the Master theorem is, I feel, a bit of overkill for this one. It is a big gun, and it gives us the correct answer, but if you are like me, it doesn't give you much intuition about why the answer is what it is. With this particular recurrence, we can get the correct answer and an intuitive understanding of it with a few drawings.
We can break down what happens at each level of the recursion and maybe draw it like this:
We have the work that we are handling on the left, i.e., the size of the input and the actual time we spend at each function call on the right. We have input size n at the first level, but we only spend one "computation unit" of time on it. We have two chunks of size n/2 that we spend two units of time on at the next level. At the next level, we have four chunks, each of size n/4, and we spend four units on them. This continues until our chunks have size one, and we have n of those, that we spend n units of time on.
The total time we spend is the area of the red blocks on the right. The depth of the recursion is O(log n) but we don't need to worry about that to analyse the time. We will just flip the "time" bit and look at it this way:
The total time we spend must be n for the original bottom layer (now top layer), n/2 for the next layer, n/4 for the next, and so on. Move n outside of parentheses, and all we have to worry about is the sum 1+1/2+1/4+1/8+.... The sum ends at some point, of course. We only have O(log n) terms. But we don't have to worry about that at all, because even if it continued forever we wouldn't sum to more than two.
You might recognise this as a geometric series. They have the form sum x^i for i=0 to infinity, and when |x|<1 they converge to 1/(1-x). Proving this takes a little calculus, but when x = 1/2 as we have, it is easy to draw the series and get the result from there.
Take the size n layer and then start putting the remaining layers next to each other under it. First, you put down the n/2 layer. It takes half of the space. Then you put the n/4 layer next to it, and it takes half of the remaining space. The n/8 layer will take half of the remaining space, the n/16 layer will take half of the remaining space, and it will continue like this as if it were a reenactment of Zeno's paradox.
If you keep taking half of what is left forever, you can never get more than you started with, so adding up all the layers except the first cannot give you more time spent than you spent on the very first layer. The total time you would spend if you kept recursing forever (and time worked like real numbers) would be linear. Since you stop before forever, it is still going to be linear. Infinity gives us O(n), so recursion depth O(log n) will as well.
Of course, getting O(n) from observing that T(n) < T'(n) = O(n), where T'(n) continues subdividing forever, only tells us that T(n) = O(n) and not that T(n) = Omega(n). For that, you have to show that you don't spend substantially less time than n, but considering that the largest layer is n, it should be obvious that the recursion also runs in Omega(n).
If you don't cut the data size in half in every recursion, but cut the data into some other chunks that add up to n, you still get O(n) of course (think of traversing a tree), but it gets a hell of a lot harder to draw, and I've never managed to come up with a good illustration of that. For splitting the data in half, though, the drawing is simple and the conclusions we draw from it give us the correct running time.
The same drawing also tells us that the recurrence T(n) = T(n/2) + O(n) is in O(n). Here, we don't have to flip the first drawing, because we start out with the largest layer on top. We spend time n then n/2 then n/4 and so on, and we end up spending 2n time units. Because 2 isn't special here, we have T(n) = T(f·n) + O(n) = O(n) for any fraction 0 ≤ f < 1, it is just a lot harder to draw when f ≠ 1/2.
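If the drawing is hard to picture, the layer argument can also be checked numerically. The sketch below adds up the layers n, f·n, f²·n, ... in Python; the function name `total_layer_work` and the choice to stop once a layer is smaller than one unit are assumptions made for illustration.

```python
def total_layer_work(n: float, f: float = 0.5) -> float:
    """Sum the layers n, f*n, f^2*n, ... until a layer drops below one unit."""
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    total = 0.0
    layer = float(n)
    while layer >= 1:
        total += layer
        layer *= f  # the next layer is a fraction f of the current one
    return total
```

With f = 1/2 and n = 1024 the layers add up to 2047, just under 2n. For other fractions the total stays below n/(1 - f), which is still a constant times n.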

First a bug:
BinarySum(A, i, n) {
    if n=1 then
        return A[i] // base case
    return BinarySum(A, i, n/2) + BinarySum(A, i+(n/2), (n - n/2))
    //                                                   ^^^^^^^
}
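The second half might have a different length than the first: when n is odd, n/2 rounds down, so passing n/2 as the length of the second half drops the last value. The second call needs n - n/2 elements instead.
Over the whole recursion, this can drop several values.
On the complexity: we are computing A[0] + A[1] + A[2] + ... + A[n-1].
The recursion goes down to a single A[i], and every + corresponds to exactly one call that combines a left and a right part. So there are n-1 internal calls plus n leaves, 2n-1 calls in total, which is O(n). The shape of the call tree is irrelevant to that count. BinarySum is not faster than a plain left-to-right sum unless the two halves are computed in parallel.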
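To see the dropped values in practice, here is a small Python sketch of both variants. It assumes that n/2 in the pseudocode means integer division that rounds down and that n >= 1; the name `binary_sum_truncating` is invented here for the variant that passes n/2 to both halves.

```python
def binary_sum(a, i, n):
    """Sum a[i:i+n] recursively; the second half gets n - n//2 elements."""
    if n == 1:
        return a[i]  # base case
    half = n // 2
    return binary_sum(a, i, half) + binary_sum(a, i + half, n - half)


def binary_sum_truncating(a, i, n):
    """Same recursion, but both halves get n//2 elements (loses values for odd n)."""
    if n == 1:
        return a[i]  # base case
    half = n // 2
    return binary_sum_truncating(a, i, half) + binary_sum_truncating(a, i + half, half)
```

For the list [1, 2, 3, 4, 5] the first function returns 15, while the truncating one returns 10 because the final 5 is never visited.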

There's a difference between making just one recursive function call, like binary search does, and making two recursive function calls, like your code does. In both cases, the maximum depth of the recursion is O(log n), but the total costs of the algorithms are different. I think you are confusing the maximum depth of the recursion with the total running time of the algorithm.
The given function does a constant amount of work before making two recursive function calls. Let c denote the work done by a function outside its recursive calls. You can draw a recursion tree where each node is the cost of a function call. The root node has a cost of c. The root node has two children because there are two recursive calls, each with a cost of c. Each of these children makes two further recursive calls; hence, the root node has 4 grandchildren, each with a cost of c. This continues until we hit the base case.
The total cost of the recursion tree is the cost of the root node (which is c), plus the cost of its children (which is 2c), plus the cost of the grandchildren (which is 4c), and so on, until we hit the n leaves (which have a total cost of nc, where for simplicity we'll assume n is a power of 2). The total cost of all levels of the recursion tree is c+2c+4c+8c+...+nc = O(nc) = O(n). Here, we used the fact that in an increasing geometric series, the total sum is dominated by the last term (the sum is essentially just the last term, up to constant factors, which are subsumed in asymptotic notation). This sum had O(log n) terms, but the sum is O(n).
Equivalently, the recurrence describing the running time of your algorithm is T(n) = 2T(n/2)+c, and by the Master theorem, the solution is T(n) = O(n). This is different from binary search, which has the recurrence T(n)=1T(n/2)+c, which has the solution T(n)=O(log n). For binary search, the total cost of all levels of the recursion tree would be c+c+...+c; here, the sum has O(log n) terms and the sum is O(log n).
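For contrast with the two-call recursion, here is a minimal recursive binary search in Python that also reports how many calls it made. It is only a sketch: the half-open range [lo, hi) and the returned (index, calls) pair are my own conventions, not anything from the question.

```python
def binary_search(items, target, lo, hi):
    """Search sorted items[lo:hi]; return (index or -1, number of calls made)."""
    if lo >= hi:
        return -1, 1  # empty range: target is not present
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid, 1
    if items[mid] < target:
        index, calls = binary_search(items, target, mid + 1, hi)
    else:
        index, calls = binary_search(items, target, lo, mid)
    return index, calls + 1  # exactly one recursive call per level
```

On a sorted list of 1024 elements, no search needs more than 12 calls, whereas the two-call BinarySum recursion on 1024 elements makes 2047 calls.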

Related

Time complexity of a recursive and a non-recursive algorithm

I have two pseudo-code algorithms:
RandomAlgorithm(modVec[0 to n − 1])
    b = 0;
    for i = 1 to n do
        b = 2·b + modVec[n − i];
    for i = 1 to b do
        modVec[i mod n] = modVec[(i + 1) mod n];
    return modVec;
Second:
AnotherRecursiveAlgo(multiplyVec[1 to n])
    if n ≤ 2 do
        return multiplyVec[1] × multiplyVec[1];
    return
        multiplyVec[1] × multiplyVec[n] +
        AnotherRecursiveAlgo(multiplyVec[1 to n/3]) +
        AnotherRecursiveAlgo(multiplyVec[2n/3 to n]);
I need to analyse the time complexity for these algorithms:
For the first algorithm, I found that the first loop is O(n). The second loop has a best case and a worst case: in the best case it is O(1) because the loop runs once, and in the worst case b becomes very large in the first loop. I don't know how to express this idea as a time complexity, because I usually get b = sum (from 1 to n-1) of 2^(n-1) · modVec[n-1], and I get stuck there.
For the second algorithm, I just don't see how to work out the time complexity. It usually depends on n, so I think we need a formula.
Thanks for the help.
The first problem is a little strange, all right.
If it helps, envision modVec as an array of 1's and 0's.
In this case, the first loop converts this array to a value.
This is O(n)
For instance, (1, 1, 0, 1, 1) will yield b = 27.
Your second loop runs b times. The dominating term for the value of b is 2^(n-1), a.k.a. O(2^n). The assignment you do inside the loop is O(1).
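To check the value of b by hand, the first loop can be transcribed almost literally into Python. This sketch assumes, as suggested above, that modVec holds only 0s and 1s; the function name `bits_to_value` is made up for this example.

```python
def bits_to_value(mod_vec):
    """First loop of RandomAlgorithm: b = 2*b + modVec[n - i] for i = 1..n."""
    n = len(mod_vec)
    b = 0
    for i in range(1, n + 1):
        b = 2 * b + mod_vec[n - i]  # reads modVec from the last entry to the first
    return b
```

`bits_to_value([1, 1, 0, 1, 1])` returns 27, and a vector of n ones gives the largest possible value, 2^n - 1, which is how many times the second loop would then run.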
The second algorithm does depend on n. Your base case is a simple multiplication, O(1). The recursion step has three terms:
simple multiplication
recur on n/3 elements
recur on n/3 elements (from 2n/3 to the end is n/3 elements)
Just as binary partitions give a recursion depth of log[2] n, this one gives a depth of log[3] n. The base of that logarithm doesn't matter, but the coefficient (two recursive calls) does: each level doubles the number of calls, so there are about 2^(log[3] n) = n^(log[3] 2) calls in total. With O(1) work per call, that is roughly O(n^0.63) rather than O(log N).
Does that push you to a solution?
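One way to test a guess for this kind of recurrence is simply to count the calls. The Python sketch below models only the shape of the recursion: a base case at n <= 2 and two recursive calls on about n/3 elements each. Rounding the sub-array sizes down to n // 3 is a simplifying assumption, and `recursive_calls` is a made-up name.

```python
def recursive_calls(n: int) -> int:
    """Count calls of a recursion with base case n <= 2 and two calls on n // 3."""
    if n <= 2:
        return 1
    return 1 + 2 * recursive_calls(n // 3)
```

For n = 3^k this returns 2^(k+1) - 1, so the count doubles every time n is tripled. That is growth like n^(log_3 2), about n^0.63, which is faster than log n but slower than n.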
First Loop
To me this boils down to O(First-For-Loop) + O(Second-For-Loop).
O(First-For-Loop) is simple: O(n).
O(Second-For-Loop), interestingly, also depends on n, because the loop bound b is computed from the n entries of modVec. Therefore it can be written as O(f(n)), where f(n) is some function of n. I'm not completely sure I understand f(n) from the code presented.
The answer consequently becomes O(n) + O(f(n)). This boils down to O(n) or O(f(n)), depending on which one is larger and more dominant (since the lower-order terms don't matter in big-O notation).
Second Loop
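In this case, I see that each call to the function evaluates three terms.
The first term is an O(1) multiplication, so it won't matter.
The second and the third terms recurse into the function.
Therefore each function call results in 2 further recursive calls, each on about a third of the input.
Consequently, the recursion depth is about log_3(n), the number of calls is about 2^(log_3(n)) = n^(log_3(2)), and the time complexity is O(n^(log_3(2))), roughly O(n^0.63). O(2^n) would only be a very loose upper bound.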

Time Complexity of the following code fragment?

I calculated it to be O(N^2), but my instructor marked it incorrect in the exam. The correct answer was O(1). Can anyone help me understand how the time complexity came out to be O(1)?
The outer loop will run 2N times (int j = 2 * N, later decremented by 1 every time).
And since N is not changing, and i is always assigned the value of N (int i = N), the inner loop will always run log N (base 2) times.
(Notice the way i changes: i = i div 2.)
Therefore, the complexity is O(NlogN)
Question: What happens when you repeatedly halve the input (or search space), like in binary search?
Answer: Well, you get log(N) complexity. (Reference: The Algorithm Design Manual by Steven S. Skiena)
Look at the inner loop in your algorithm: i = i div 2 makes it a log(N) complexity loop. Therefore the overall complexity will be N log(N).
Take this with a pinch of salt: whenever you repeatedly divide your input (search space) by 2, 3, 4 or any other constant greater than 1, you get log(N) complexity.
P.S.: the complexity of your algorithm is nowhere near O(1).
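The code fragment itself is not shown here, so the following Python sketch is only a reconstruction from the descriptions above (j starts at 2 * N and is decremented by 1, i starts at N and is halved). The loop conditions j > 0 and i > 0 are assumptions, as is the name `count_inner_steps`.

```python
def count_inner_steps(n: int) -> int:
    """Count inner-loop iterations of the (assumed) nested loops."""
    steps = 0
    j = 2 * n
    while j > 0:      # outer loop: runs 2N times
        i = n
        while i > 0:  # inner loop: i is halved until it reaches 0
            steps += 1
            i //= 2
        j -= 1
    return steps
```

Under those assumptions, N = 8 gives 16 outer iterations with 4 inner iterations each, 64 steps in total. In general the count is 2N times the number of bits in N, which is the N log N behaviour described in the answers.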

Logarithmic function in time complexity

How does a program's worst case or average case depend on the log function? How does the base of the log come into play?
The log factor appears when you split your problem into k parts, of size n/k each, and then "recurse" (or mimic recursion) on some of them.
A simple example is the following loop:
foo(n):
    while n > 0:
        print n
        n = n/2
The above (with integer division) will print n, n/2, n/4, ..., 1, and there are O(log n) such values.
The complexity of the above program is O(log n), since each print requires a constant amount of time, and the number of values n takes along the way is O(log n).
If you are looking for "real life" examples, in quicksort (and for simplicity let's assume splitting into exactly two halves), you split the array of size n into two subarrays of size n/2 with O(n) partitioning work, and then you recurse on both of them.
This makes the complexity function of:
T(n) = 2T(n/2) + O(n)
From master theorem, this is in Theta(nlogn).
Similarly, on binary search - you split the problem to two parts, and recurse only on one of them:
T(n) = T(n/2) + 1
Which will be in Theta(logn)
The base is not a factor in big O complexity, because
log_k(n) = log_2(n)/log_2(k)
and log_2(k) is constant, for any constant k.
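Both points can be tried out in a few lines of Python. This is just an illustrative sketch: `halving_steps` counts the iterations of a halving loop with integer division, and `log_base` applies the change-of-base formula; both names are made up here.

```python
import math


def halving_steps(n: int) -> int:
    """Number of times n can be halved (integer division) before reaching 0."""
    steps = 0
    while n > 0:
        n //= 2
        steps += 1
    return steps


def log_base(n: float, k: float) -> float:
    """log_k(n) computed through base 2: log_2(n) / log_2(k)."""
    return math.log2(n) / math.log2(k)
```

`halving_steps(1024)` is 11, one more than log_2(1024), and `log_base(81, 3)` comes out as 4 up to floating-point rounding. Changing k only rescales the result by the constant 1 / log_2(k).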

Prove 3-Way Quicksort Big-O Bound

For 3-way Quicksort (dual-pivot quicksort), how would I go about finding the Big-O bound? Could anyone show me how to derive it?
There's a subtle difference between finding the complexity of an algorithm and proving it.
To find the complexity of this algorithm, you can do as amit said in the other answer: you know that on average, you split your problem of size n into three smaller problems of size n/3, so you will get, in log_3(n) steps on average, to problems of size 1. With experience, you will start getting the feeling of this approach and be able to deduce the complexity of algorithms just by thinking about them in terms of subproblems involved.
To prove that this algorithm runs in O(nlogn) in the average case, you use the Master Theorem. To use it, you have to write the recursion formula giving the time spent sorting your array. As we said, sorting an array of size n can be decomposed into sorting three arrays of sizes n/3 plus the time spent building them. This can be written as follows:
T(n) = 3T(n/3) + f(n)
Where T(n) is a function giving the resolution "time" for an input of size n (actually the number of elementary operations needed), and f(n) gives the "time" needed to split the problem into subproblems.
For 3-Way quicksort, f(n) = c*n because you go through the array, check where to place each item and eventually make a swap. This places us in Case 2 of the Master Theorem, which states that if f(n) = O(n^(log_b(a)) log^k(n)) for some k >= 0 (in our case k = 0) then
T(n) = O(n^(log_b(a)) log^(k+1)(n))
As a = 3 and b = 3 (we get these from the recurrence relation, T(n) = aT(n/b)), this simplifies to
T(n) = O(n log n)
And that's a proof.
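To make the shape of that recurrence concrete, here is a deliberately simple Python sketch of a dual-pivot quicksort. It is not an in-place or production implementation: it builds new lists, and picking the first and last elements as pivots is an assumption made for brevity. The three list comprehensions are the O(n) splitting work f(n), and the three recursive calls are the 3T(n/3) part when the split happens to be even.

```python
def dual_pivot_quicksort(items):
    """Return a sorted copy of items using two pivots and three sublists."""
    if len(items) <= 1:
        return list(items)
    p, q = sorted((items[0], items[-1]))  # the two pivots, with p <= q
    rest = items[1:-1]
    low = [x for x in rest if x < p]         # elements below both pivots
    mid = [x for x in rest if p <= x <= q]   # elements between the pivots
    high = [x for x in rest if x > q]        # elements above both pivots
    return (dual_pivot_quicksort(low) + [p]
            + dual_pivot_quicksort(mid) + [q]
            + dual_pivot_quicksort(high))
```

For example, `dual_pivot_quicksort([5, 3, 8, 1, 9, 2])` returns `[1, 2, 3, 5, 8, 9]`. With these pivot choices an already sorted input gives the unbalanced worst case, which is why the O(n log n) bound above is an average-case statement.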
Well, the same proof actually holds.
Each iteration splits the array into 3 sublists; on average, the size of each of these sublists is n/3.
Thus the number of iterations needed is log_3(n), because you need to find the number of times you compute (((n/3) /3) /3) ... until you get to one. This gives you the formula:
n/(3^i) = 1
Which is satisfied for i = log_3(n).
Each iteration is still going over all the input (but in a different sublist) - same as quicksort, which gives you O(n*log_3(n)).
Since log_3(n) = log(n)/log(3) = log(n) * CONSTANT, you get that the run time is O(nlogn) on average.
Note that even if you take a more pessimistic approach to calculating the sizes of the sublists, by taking the minimum and maximum of a uniform distribution, you still get a first sublist of size n/4, a second sublist of size n/2 and a last sublist of size n/4. This again decays in log_k(n) iterations (with a different constant k > 1), which again yields O(nlogn) overall.
Formally, the proof will be something like:
Each iteration takes at most c_1 * n ops to run, for each n > N_1, for some constants c_1, N_1. (This is the definition of big O notation, together with the claim that each iteration is O(n) excluding recursion. Convince yourself why this is true. Note that here "iteration" means all the work done by the algorithm at a certain "level", and not the work of a single recursive invocation.)
As seen above, you have log_3(n) = log(n)/log(3) iterations in the average case (taking the optimistic version here; the same principles can be used for the pessimistic one).
Now, we get that the running time T(n) of the algorithm is:
for each n > N_1:
T(n) <= c_1 * n * log(n)/log(3)
T(n) <= c_1 * nlogn
By definition of big O notation, it means T(n) is in O(nlogn) with M = c_1 and N = N_1.
QED

Big O, what is the complexity of summing a series of n numbers?

I always thought the complexity of:
1 + 2 + 3 + ... + n is O(n), and summing two n by n matrices would be O(n^2).
But today I read from a textbook, "by the formula for the sum of the first n integers, this is n(n+1)/2" and then thus: (1/2)n^2 + (1/2)n, and thus O(n^2).
What am I missing here?
The big O notation can be used to determine the growth rate of any function.
In this case, it seems the book is not talking about the time complexity of computing the value, but about the value itself. And n(n+1)/2 is O(n^2).
You are confusing the complexity of the runtime with the size (magnitude) of the result.
The running time of summing, one after the other, the first n consecutive numbers is indeed O(n). [1]
But the size of the result, that is the value of “sum from 1 to n” = n(n + 1) / 2, is O(n ^ 2).
[1] But for arbitrarily large numbers this is simplistic, since adding large numbers takes longer than adding small numbers. For a precise runtime analysis, you indeed have to consider the size of the result. However, this isn’t usually relevant in programming, nor even in purely theoretical computer science. In both domains, adding two numbers is usually considered an O(1) operation unless explicitly required otherwise by the domain (e.g. when implementing an operation for a bignum library).
n(n+1)/2 is the quick way to sum a consecutive sequence of N integers (starting from 1). I think you're confusing an algorithm with big-oh notation!
If you thought of it as a function, then the big-oh complexity of this function is O(1):
public int sum_of_first_n_integers(int n) {
    return (n * (n+1))/2;
}
The naive implementation would have big-oh complexity of O(n).
public int sum_of_first_n_integers(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i; // add the loop index, not n
    }
    return sum;
}
Even just looking at each cell of a single n-by-n matrix is O(n^2), since the matrix has n^2 cells.
What you are asking about is not really the complexity of a problem, but rather the complexity of an algorithm.
In your case, if you choose to iterate through all the numbers, the complexity is, indeed, O(n).
But that's not the most efficient algorithm. A more efficient one is to apply the formula n*(n+1)/2, which takes constant time, and thus the complexity is O(1).
So my guess is that this is actually a reference to Cracking the Coding Interview, which has this paragraph on a StringBuffer implementation:
On each concatenation, a new copy of the string is created, and the
two strings are copied over, character by character. The first
iteration requires us to copy x characters. The second iteration
requires copying 2x characters. The third iteration requires 3x, and
so on. The total time therefore is O(x + 2x + ... + nx). This reduces
to O(xn²). (Why isn't it O(xnⁿ)? Because 1 + 2 + ... n equals n(n+1)/2
or, O(n²).)
For whatever reason I found this a little confusing on my first read-through, too. The important bit to see is that n is multiplying n, or in other words that n² is happening, and that dominates. This is why ultimately O(xn²) is just O(n²) -- the x is sort of a red herring.
You have a formula that doesn't depend on the number of numbers being added, so it's a constant-time algorithm, or O(1).
If you add each number one at a time, then it's indeed O(n). The formula is a shortcut; it's a different, more efficient algorithm. The shortcut works when the numbers being added are all 1..n. If you have a non-contiguous sequence of numbers, then the shortcut formula doesn't work and you'll have to go back to the one-by-one algorithm.
None of this applies to the matrix of numbers, though. To add two matrices, it's still O(n^2) because you're adding n^2 distinct pairs of numbers to get a matrix of n^2 results.
There's a difference between summing N arbitrary integers and summing N that are all in a row. For 1+2+3+4+...+N, you can take advantage of the fact that they can be divided into pairs with a common sum, e.g. 1+N = 2+(N-1) = 3+(N-2) = ... = N + 1. So that's N+1, N/2 times. (If there's an odd number, one of them will be unpaired, but with a little effort you can see that the same formula holds in that case.)
That is not O(N^2), though. It's just a formula that uses N^2, actually O(1). O(N^2) would mean (roughly) that the number of steps to calculate it grows like N^2, for large N. In this case, the number of steps is the same regardless of N.
Adding the first n numbers:
Consider the algorithm:
Series_Add(n)
return n*(n+1)/2
this algorithm indeed runs in O(|n|^2), where |n| is the length (the bits) of n and not the magnitude, simply because multiplication of 2 numbers, one of k bits and the other of l bits runs in O(k*l) time.
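The O(k*l) cost of multiplication can be made visible by counting digit operations in schoolbook multiplication. The Python sketch below works on base-10 digit lists stored least significant digit first; that representation and the helper names `to_digits`, `from_digits` and `schoolbook_multiply` are choices made for this illustration, not how any particular bignum library does it.

```python
def to_digits(value: int) -> list:
    """Base-10 digits of a non-negative integer, least significant first."""
    return [int(ch) for ch in reversed(str(value))]


def from_digits(digits: list) -> int:
    """Inverse of to_digits (leading zeros are allowed)."""
    return int("".join(str(d) for d in reversed(digits)))


def schoolbook_multiply(a_digits: list, b_digits: list):
    """Return (product digits, number of single-digit multiplications)."""
    result = [0] * (len(a_digits) + len(b_digits))
    ops = 0
    for i, a in enumerate(a_digits):
        carry = 0
        for j, b in enumerate(b_digits):
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10
            carry = total // 10
            ops += 1  # one digit-by-digit multiplication
        result[i + len(b_digits)] += carry
    return result, ops
```

Multiplying the 4-digit numbers 1234 and 5678 this way takes 4 * 4 = 16 digit multiplications and gives 7006652. Doubling the number of digits in both factors quadruples the count.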
Careful
Considering this algorithm:
Series_Add_pseudo(n):
sum=0
for i= 1 to n:
sum += i
return sum
This is the naive approach. You might assume that this algorithm runs in linear time, or at least in polynomial time. This is not the case.
The input representation (length) of n is O(log n) bits (in any positional encoding other than unary), and although the algorithm runs linearly in the magnitude of n, it runs exponentially (2^(log n)) in the length of the input.
This is actually the pseudo-polynomial case: the algorithm appears to be polynomial, but it is not.
You could even try it in python (or any programming language), for a medium length number like 200 bits.
Applying the first algorithm the result comes in a split second, and applying the second, you have to wait a century...
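For anyone who wants to try it, here is a small Python sketch of both algorithms. The function names follow the pseudocode above, with `series_add_loop` standing in for Series_Add_pseudo; Python integers are arbitrary precision, so a 200-bit input is no problem for the formula.

```python
def series_add(n: int) -> int:
    """Closed form: one multiplication, one addition and one division."""
    return n * (n + 1) // 2


def series_add_loop(n: int) -> int:
    """Naive loop: performs n additions, so only usable for small n."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
```

For small inputs the two agree, for example both give 500500 for n = 1000. With n = 2**200 - 1, `series_add(n)` returns immediately (the result has 399 bits), while `series_add_loop(n)` would need about 2^200 iterations, so do not actually run it on that input.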
1+2+3+...+n is always less than n+n+n...+n n times. you can rewrite this n+n+..+n as n*n.
f(n) = O(g(n)) if there exists a positive integer n0 and a positive
constant c, such that f(n) ≤ c * g(n) ∀ n ≥ n0
This works since Big-Oh represents an upper bound of the function, where the function f(n) is the sum of the natural numbers up to n.
Now, talking about time complexity: for small numbers, an addition should be a constant amount of work. But the size of n could be humongous; you can't deny that possibility.
Adding integers can take a linear amount of time when n is really large. So you can say that addition is an O(n) operation and you're adding n items, so that alone would make it O(n^2). Of course, it will not always take n^2 time, but it's the worst case when n is really large. (Upper bound, remember?)
Now, let's say you directly try to achieve it using n(n+1)/2. Just one multiplication and one division, this should be a constant operation, no?
No.
using a natural size metric of number of digits, the time complexity of multiplying two n-digit numbers using long multiplication is Θ(n^2). When implemented in software, long multiplication algorithms must deal with overflow during additions, which can be expensive. Wikipedia
That again leaves us to O(n^2).
It's equivalent to O(n^2), because the sum is equal to (n^2 + n) / 2 and in big O you ignore constants, so even though the squared n is divided by 2, you still have quadratic growth.
Think about O(n) and O(n/2). We similarly don't distinguish the two: O(n/2) is just O(n) with a smaller constant factor, and the growth rate is still linear.
What that means is that as n increases, if you were to plot the number of operations on a graph, you would see an n^2 curve appear.
You can see that already:
when n = 2 you get 3
when n = 3 you get 6
when n = 4 you get 10
when n = 5 you get 15
when n = 6 you get 21
And if you plot it like I did here:
You see that the curve is similar to that of n^2, you will have a smaller number at each y, but the curve is similar to it. Thus we say that the magnitude is the same, because it will grow in time complexity similarly to n^2 as n grows bigger.
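The same comparison can be made without a plot. The Python sketch below computes the values listed above and their ratio to n^2; the helper names `triangular` and `ratio_to_square` are made up for this illustration.

```python
def triangular(n: int) -> int:
    """Sum 1 + 2 + ... + n via the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def ratio_to_square(n: int) -> float:
    """How large the sum is compared with n^2."""
    return triangular(n) / (n * n)
```

For n = 2 to 6 `triangular` gives 3, 6, 10, 15, 21, matching the list above. The ratio to n^2 is 0.75 at n = 2 and settles towards 0.5 as n grows, which is the constant factor that big O ignores.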
The sum of the series of the first n natural numbers can be found in two ways. The first way is to add all the numbers in a loop. In this case the algorithm is linear, and the code looks like this:
int sum = 0;
for (int i = 1; i <= n; i++) {
    sum += i; // add the loop index, not n
}
return sum;
This is analogous to 1+2+3+4+......+n. In this case the complexity of the algorithm is calculated as the number of times the addition operation is performed, which is O(n).
The second way of finding the sum of the first n natural numbers is the direct formula n*(n+1)/2. This formula uses multiplication instead of repeated addition. Multiplication does not have linear time complexity. There are various multiplication algorithms, with time complexities such as O(N^2) for long multiplication and lower exponents for faster methods, where N is the number of digits. The cost of multiplication therefore depends on the implementation and the processor's architecture, but for analysis purposes the time complexity of multiplication is considered to be O(N^2). Therefore, when one uses the second way to find the sum, the time complexity will be O(N^2).
Here the multiplication operation is not the same as the addition operation. Anybody with knowledge of computer organisation can easily understand the internal working of the multiplication and addition operations. A multiplier circuit is more complex than an adder circuit and requires much more time than the adder circuit to compute the result. So the time complexity of the sum of the series can't be constant.
