How do you find the algorithmic complexity of code fragments? - big-o

I don't know what the procedure for this would be. How should I think about it, and how do I determine what the big-O will be? What is the process for solving it?
Example 1:
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n*3; j++)
            System.out.println("Apple");
Example 2:
    for (i = 1; i < n*n*n; i *= n)
        System.out.println("Banana");
Thank you

The short answer is that you count how deeply the loops are nested. If there is no loop, it is O(1) (constant); one loop over the input is O(N); two nested loops are O(N^2); and three nested loops are O(N^3), assuming each loop runs a number of times proportional to N.
However, that's only the short answer. You can also have loops which reduce the input by half on each iteration, which gives a log N term. And you can have pathological brute-force functions which try every possibility; these are typically exponential or worse. Usually they make heavy use of recursion, and each recursive step barely chips away at the problem.
Be aware that library functions are often not O(1), and that has to be factored in.
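As a rough illustration of the halving case, here is a minimal Java sketch that counts how many times a loop runs when it halves its input on each iteration. The class and method names (HalvingLoop, countHalvings) are made up for this example.

```java
// Hypothetical helper for illustration; not part of the original question.
class HalvingLoop {
    // Counts the iterations of a loop that halves n until it reaches 1.
    static int countHalvings(int n) {
        int steps = 0;
        while (n > 1) {
            n /= 2;   // integer division: the remaining input shrinks by half
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 1024, 1_000_000}) {
            System.out.println("n = " + n + " -> " + countHalvings(n) + " iterations");
        }
    }
}
```

Going from n = 1024 to n = 1,000,000 roughly doubles the iteration count instead of multiplying it by a thousand, which is what a log N term looks like in practice.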

Big-O describes how an algorithm's cost grows with the input size. Say you loop through an array of size n, and n is 2,000. O(n) means that the number of steps grows at most linearly with n, so here it is on the order of 2,000 calculations (up to a constant factor). Big-O is an upper bound, and it is most often quoted for the worst-case scenario of your algorithm. There are other notations for other bounds: you also have Ω(n) for a lower bound and Θ(n) for a tight bound.
Check this out to kind of get an idea of the difference in efficiency:
http://bigocheatsheet.com/
Informally:
"T(n) is O(f(n))" basically means that f(n) describes the upper bound for T(n)
"T(n) is Ω(f(n))" basically means that f(n) describes the lower bound for T(n)
"T(n) is Θ(f(n))" basically means that f(n) describes the exact bound for T(n)
A good way to approach this for simple situations is to plug a couple of easy numbers in for n and see what happens. So say n is size 10:
In Example 1:
    for (i = 1; i <= n; i++)                // loop through this n times
        for (j = 1; j <= n*3; j++)          // for each of those n times, loop through 3*n times
            System.out.println("Apple");    // negligible time (O(1))
If it were just the outer loop, it would be O(n). However, once you add the inner loop you get O(n^2): although your input is (say) 10, you're doing 300 operations (30 prints for each of the 10 outer iterations; 30 * 10). That is 3n^2 operations, but we generally leave the 3 out, so it's O(n^2). Most nested for loops where you aren't modifying the loop variable by n are O(n^2).
If it's easier you can visualize it as the polynomial 3n * n = 3n^2 worst case.
I'll let you try the next one... hint in the bold statement above.
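One way to check this kind of reasoning is to replace the print statements with counters and see how the counts change as n grows. The following is only a sketch: the class and method names (ExampleLoops, countExample1, countExample2) are invented here, and the second counter uses a long loop variable so that n*n*n does not overflow for moderate n.

```java
// Illustrative counters; the names here are assumptions, not from the question.
class ExampleLoops {
    // Example 1: counts how many times "Apple" would be printed.
    static long countExample1(int n) {
        long prints = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n * 3; j++) {
                prints++;  // stands in for System.out.println("Apple")
            }
        }
        return prints;
    }

    // Example 2: counts how many times "Banana" would be printed.
    static int countExample2(int n) {
        long limit = (long) n * n * n;  // long avoids int overflow for moderate n
        int prints = 0;
        for (long i = 1; i < limit; i *= n) {
            prints++;  // stands in for System.out.println("Banana")
        }
        return prints;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println("n = " + n
                + "  example1 = " + countExample1(n)
                + "  example2 = " + countExample2(n));
        }
    }
}
```

Watching which count grows with n, and how fast, is usually enough to guess the complexity class before proving it.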

Related

What is the time complexity of Brute Force Algorithm of Kadane's Algorithm? [duplicate]

Here is a segment of an algorithm I came up with:
    for (int i = 0; i < n - 1; i++)
        for (int j = i; j < n; j++)
            (...)
I am using this "double loop" to test all possible 2-element sums in an array of size n.
Apparently (and I have to agree with it), this "double loop" is O(n²):
n + (n-1) + (n-2) + ... + 1 = sum from 1 to n = (n (n - 1))/2
Here is where I am confused:
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            (...)
This second "double loop" also has a complexity of O(n²), when it is clearly (at worst) much (?) better than the first.
What am I missing? Is the information accurate? Can someone explain this "phenomenon"?
(n (n - 1))/2 simplifies to n²/2 - n/2. If you use really large numbers for n, the growth rate of n/2 will be dwarfed in comparison to n², so for the sake of calculating Big-O complexity, you effectively ignore it. Likewise, the "constant" value of 1/2 doesn't grow at all as n increases, so you ignore that too. That just leaves you with n².
Just remember that complexity calculations are not the same as "speed". One algorithm can be five thousand times slower than another and still have a smaller Big-O complexity. But as you increase n to really large numbers, general patterns emerge that can typically be classified using simple formulae: 1, log n, n, n log n, n², etc.
It sometimes helps to create a graph and see what kind of line appears:
Even though the zoom factors of these two graphs are very different, you can see that the type of curve it produces is almost exactly the same.
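If no graph is at hand, counting the iterations of both loops for growing n shows the same thing numerically. This is a minimal sketch; the class and method names (LoopCounts, triangular, full) are made up for the illustration.

```java
// Hypothetical counters for the two "double loops" from the question.
class LoopCounts {
    // First snippet: j starts at i, so the inner loop shrinks as i grows.
    static long triangular(int n) {
        long count = 0;
        for (int i = 0; i < n - 1; i++)
            for (int j = i; j < n; j++)
                count++;
        return count;
    }

    // Second snippet: both loops always run n times.
    static long full(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                count++;
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 5000}) {
            long t = triangular(n);
            long f = full(n);
            System.out.println("n = " + n + "  first = " + t + "  second = " + f
                + "  ratio = " + (double) t / f);
        }
    }
}
```

The ratio between the two counts settles near one half as n grows, so the two loops differ only by a constant factor and share the same n² growth.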
Constant factors.
Big-O notation ignores constant factors, so even though the second loop is slower by a constant factor, they end up with the same time complexity.
Right there in the definition it tells you that you can pick any old constant factor:
... if and only if there is a positive constant M ...
This is because we want to analyse the growth rate of an algorithm - constant factors just complicate things and are often system-dependent (operations may vary in duration on different machines).
You could just count certain types of operations, but then the question becomes which operation to pick, and what if that operation isn't predominant in some algorithm. Then you'll need to relate operations to each other (in a system-independent way, which is probably impossible), or you could just assign the same weight to each, but that would be fairly inaccurate as some operations would take significantly longer than others.
And how useful would saying O(15n² + 568n + 8 log n + 23 sqrt(n) + 17) (for example) really be? As opposed to just O(n²).
(For the purpose of the below, assume n >= 2)
Note that we actually have asymptotically smaller (i.e. smaller as we approach infinity) terms here, but we can always simplify that to a matter of constant factors. (It's n(n+1)/2, not n(n-1)/2)
n(n+1)/2 = n²/2 + n/2
and
n²/2 <= n²/2 + n/2 <= n²
Given that we've just shown that n(n+1)/2 lies between C.n² and D.n², for two constants C and D, we've also just shown that it's O(n²).
Note - big-O notation is actually strictly an upper bound (so we only care that it's smaller than a function, not between two), but it's often used to mean Θ (big-Theta), which cares about both bounds.
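Spelled out, the sandwich argument looks like this. It is a sketch of the same bound, using only the fact that n ≤ n² for n ≥ 1 (which covers the n ≥ 2 assumed here):

```latex
\begin{align*}
\frac{n(n+1)}{2} &= \frac{n^2}{2} + \frac{n}{2} \\
\frac{n^2}{2} \;\le\; \frac{n^2}{2} + \frac{n}{2}
  &\;\le\; \frac{n^2}{2} + \frac{n^2}{2} \;=\; n^2
  \qquad \text{(since } n \le n^2 \text{ for } n \ge 1\text{)}
\end{align*}
```

So the constants can be taken as C = 1/2 and D = 1: the upper bound alone gives O(n²), and the two bounds together give Θ(n²).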
From The Big O page on Wikipedia
In typical usage, the formal definition of O notation is not used directly; rather, the O notation for a function f is derived by the following simplification rules:
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted
Big-O is used only to give the asymptotic behaviour - that one is a bit faster than the other doesn't come into it - they're both O(N^2)
You could also say that the first loop is O(n(n-1)/2). The fancy mathematical definition of big-O is something like:
function "f" is big-O of function "g" if there exist constants c and n such that f(x) < c*g(x) for all x > n.
It's a fancy way of saying that g, scaled by some constant, is an upper bound on f past some point. It then follows that n(n-1)/2 = (n^2-n)/2 is O(n^2), which is neater for quick analysis.
AFAIK, your second code snippet
    for (int i = 0; i < n; i++)        // this loop runs n times
        for (int j = 0; j < n; j++)    // this loop also runs n times
            (...)
So essentially, it has O(n*n) = O(n^2) time complexity.
In Big-O analysis, constant factors are neglected and only the highest-order term is considered. That is to say, if the complexity is O(n^2+k), it simplifies to O(n^2); the constant k is ignored.
Likewise, if the complexity is O(n^2+n), it simplifies to O(n^2); the lower-order term n is ignored.
So in your first case, the complexity O(n(n - 1)/2) can be simplified to
O(n^2/2 - n/2) = O(n^2/2) (Ignoring the lower order n/2)
= O(1/2 * n^2)
= O(n^2) (Ignoring the constant factor 1/2)

Big-O notation of an algorithm that runs max(n,0) times?

I have the following algorithm:
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (j % i == 0) System.out.println(i + " " + j);
This will run max(n,0) times.
Would the Big-O notation be O(n)? If not, what is it and why?
Thank you.
You haven't stated what you are trying to measure with the Big-O notation. Let's assume it's time complexity. Next we have to define what the dependent variable is against which you want to measure the complexity. A reasonable choice here is the absolute value of n (as opposed to the bit-length), since you are dealing with fixed-length ints and not arbitrary-length integers.
You are right that the println is executed O(n) times, but that's counting how often a certain line is hit, it's not measuring time complexity.
It's easy to see that the if statement is hit O(n^2) times, so we have already established that the time complexity is bounded from below by Omega(n^2). As a commenter has already noted, the if-condition is only true for j=0, so I suspect that you actually meant to write i % j instead of j % i? This matters because the time complexity of the println(i + " " + j)-statement is certainly not O(1), but O(log n) (you can't possibly print x characters in less than x steps), so at first sight there is a possibility that the overall complexity is strictly worse than O(n^2).
Assuming that you meant to write i % j we could make the simplifying assumption that the condition is always true, in which case we would obtain the upper bound O(n^2 log n), which is strictly worse than O(n^2)!
However, noting that the number of divisors of a number i is bounded by O(Sqrt(i)), the condition is true at most O(Sqrt(n)) times per outer iteration, so we actually have O(n^2 + n*Sqrt(n)*log(n)). But since Sqrt(n) * log(n) grows more slowly than n, this amounts to O(n^2).
You can dig deeper into number theory to find tighter bounds on the number of divisors, but that doesn't make a difference since the n^2 stays the dominating factor.
So the tightest upper bound is indeed O(n^2), but it's not as obvious as it seems at first sight.
max(n,0) would indeed be O(n). However, your algorithm is in O(n**2). The outer loop runs about n times, and the inner loop runs i times, which is on average about n/2. That makes O(n**2 / 2) = O(n**2). However, unlike the runtime of the algorithm, the number of times println is reached is in O(n), as this happens once per outer iteration (when j is 0).
So, the answer depends on what exactly you want to measure.
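The difference between the two measurements can be made concrete by counting them separately. This is a minimal sketch of the loop as posted (with j % i, not i % j); the class and method names (CheckVsPrint, count) are invented for the example, and the println is replaced by a counter.

```java
// Hypothetical instrumented version of the loop from the question.
class CheckVsPrint {
    // Returns {times the if-condition is evaluated, times it is true}.
    static long[] count(int n) {
        long checks = 0;
        long hits = 0;
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) {
                checks++;
                if (j % i == 0) {
                    hits++;  // this is where the original calls println
                }
            }
        }
        return new long[] {checks, hits};
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long[] c = count(n);
            System.out.println("n = " + n + "  checks = " + c[0] + "  prints = " + c[1]);
        }
    }
}
```

The number of checks grows quadratically while the number of prints grows linearly, which is why the answer depends on which of the two you are counting.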

theoretical analysis of comparisons

I'm first asked to develop a simple sorting algorithm that sorts an array of integers in ascending order and put it to code:
    int i, j;
    for (i = 0; i < n - 1; i++)
    {
        if (A[i] > A[i+1])
            swap(A, i+1, i);
        for (j = n - 2; j > 0; j--)
            if (A[j] < A[j-1])
                swap(A, j-1, j);
    }
Now that I have the sort function, I'm asked to do a theoretical analysis for the running time of the algorithm. It says that the answer is O(n^2) but I'm not quite sure how to prove that complexity.
What I know so far is that the 1st loop runs from 0 to n-1, (so n-1 times), and the 2nd loop from n-2 to 0, (so n-2 times).
Doing the recurrence relation:
let C(n) = the number of comparisons
for C(2) = C(n-1) + C(n-2)
= C(1) + C(0)
C(2) = 0 comparisons?
C(n) in general would then be: C(n-1) + C(n-2) comparisons?
If anyone could guide me step by step, that would be greatly appreciated.
When doing a "real" big-O time complexity analysis, you select one operation to count, usually the one that dominates the running time. In your case you could choose either the comparison or the swap, since in the worst case there will be a lot of swaps.
Then you calculate how many times that operation will be executed as a function of the input size. You are quite right in your analysis; you simply do this:
C = O((n - 1)(n - 2)) = O(n^2 -3n + 2) = O(n^2)
I arrive at these numbers by reasoning about the control flow of your code. You have one outer for-loop, and inside it another for-loop. The outer loop iterates n - 1 times and the inner one n - 2 times. Since they are nested, the total number of inner iterations is the product of the two, because for every iteration of the outer loop the whole inner loop runs, doing n - 2 iterations.
As you might know you always remove all but the dominating term when doing time complexity analysis.
There is a lot to add about worst-case complexity and average case, lower bounds, but this will hopefully make you grasp how to reason about big O time complexity analysis.
I've seen a lot of different techniques for actually analyzing the expression, such as your recurrence relation. However, I personally prefer to just reason about the code instead. There are few algorithms whose upper bounds are hard to compute; lower bounds, on the other hand, are in general very hard to compute.
Your analysis is correct: the outer loop makes n-1 iterations. The inner loop makes n-2.
So, for each iteration of the outer loop, you have n-2 iterations on the internal loop. Thus, the total number of steps is (n-1)(n-2) = n^2-3n+2.
The dominating term (which is what matters in big-O analysis) is n^2, so you get O(n^2) runtime.
I personally wouldn't use the recurrence method in this case. Writing the recurrence equation is usually helpful in recursive functions, but in simpler algorithms like this, sometimes it's just easier to look at the code and do some simple math.
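To double-check the arithmetic, the comparisons can simply be counted. The sketch below keeps the loop structure from the question and adds a counter; the class name, the counter, the swap helper and the descending test input are all assumptions made for the illustration. It only counts comparisons and makes no claim about whether the routine sorts every input.

```java
// Hypothetical instrumented version of the loops from the question.
class SortComparisons {
    static long comparisons;  // incremented at every element comparison

    static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    // Same loop structure as in the question, with a counter added.
    static void run(int[] A) {
        int n = A.length;
        for (int i = 0; i < n - 1; i++) {
            comparisons++;
            if (A[i] > A[i + 1]) swap(A, i + 1, i);
            for (int j = n - 2; j > 0; j--) {
                comparisons++;
                if (A[j] < A[j - 1]) swap(A, j - 1, j);
            }
        }
    }

    // Runs the loops on a descending array of length n (an arbitrary choice).
    static long countFor(int n) {
        int[] a = new int[n];
        for (int k = 0; k < n; k++) a[k] = n - k;
        comparisons = 0;
        run(a);
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 50, 500}) {
            System.out.println("n = " + n + "  comparisons = " + countFor(n));
        }
    }
}
```

The inner loop contributes (n-1)(n-2) comparisons and the outer if adds n-1 more, so the total is (n-1)^2, which is still dominated by the n^2 term.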

What is Big O of a loop?

I was reading about Big O notation. It stated,
The big O of a loop is the number of iterations of the loop into
number of statements within the loop.
Here is a code snippet,
    for (int i = 0; i < n; i++)
    {
        cout << "Hello World" << endl;
        cout << "Hello SO";
    }
Now according to the definition, the Big O should be O(n*2) but it is O(n). Can anyone help me out by explaining why that is?
Thanks in advance.
If you check the definition of the O() notation you will see that (multiplicative) constants don't matter.
The work to be done within the loop is not 2. There are two statements, for each of them you have to do a couple of machine instructions, maybe it's 50, or 78, or whatever, but this is completely irrelevant for the asymptotic complexity calculations because they are all constants. It doesn't depend on n. It's just O(1).
O(1) = O(2) = O(c) where c is a constant.
O(n) = O(3n) = O(cn)
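A small counting sketch makes the point about constants visible. It is written in Java rather than the C++ of the question, with the two output statements replaced by counter increments; the class and method names (StatementCount, statementsExecuted) are invented for the example.

```java
// Hypothetical counter standing in for the loop from the question.
class StatementCount {
    static long statementsExecuted(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            count++;  // stands in for: cout << "Hello World" << endl;
            count++;  // stands in for: cout << "Hello SO";
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long c = statementsExecuted(n);
            System.out.println("n = " + n + "  statements = " + c
                + "  statements / n = " + (double) c / n);
        }
    }
}
```

The ratio of statements to n stays fixed no matter how large n gets, and a fixed ratio is exactly the kind of constant that O() discards.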
O(...) is used to measure the loop against a mathematical function (like n^2, n^m, ...).
So if you have a loop like this
    for (int i = 0; i < n; i++) {
        // sumfin
    }
The function that best describes the time this loop takes is O(n) (where n is a number between 0 and infinity).
If you have a loop like this
    for (int i = 0; i < n*2; i++) {
    }
This means it will take O(n*2) time; math function = n*2 (which is the same class as O(n), since constant factors are dropped).
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
        }
    }
These loops take O(n^2) time; math function = n*n
This way you can estimate how long your loop needs for n = 10, 100 or 1000.
This way you can build graphs for loops and such.
Big-O notation ignores constant multipliers by design (and by definition), so being O(n) and being O(2n) is exactly the same thing. We usually write O(n) because that is shorter and more familiar, but O(2n) means the same.
First, don't call it "the Big O". That is wrong and misleading. What you are really trying to find is asymptotically how many instructions will be executed as a function of n. The right way to think about O(n) is not as a function, but rather as a set of functions. More specifically:
O(n) is the set of all functions f(x) such that there exists some constant M and some number x_0 where for all x > x_0, f(x) < M x.
In other words, as n gets very large, at some point the growth of the function (for example, number of instructions) will be bounded above by a linear function with some constant coefficient.
Depending on how you count instructions that loop can execute a different number of instructions, but no matter what it will only iterate at most n times. Therefore the number of instructions is in O(n). It doesn't matter if it repeats 6n or .5n or 100000000n times, or even if it only executes a constant number of instructions! It is still in the class of functions in O(n).
To expand a bit more, the class O(n*2) = O(0.1*n) = O(n), and the class O(n) is strictly contained in the class O(n^2). As a result, that loop is also in O(2*n) (because O(2*n) = O(n)), and contained in O(n^2) (but that upper bound is not tight).
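As a worked check of these claims against the set definition, here is a short sketch with witness constants chosen for illustration (M = 3 and x_0 = 0 are just one valid choice):

```latex
\begin{align*}
2x &< 3x \quad \text{for all } x > 0
  &&\Rightarrow\; 2x \in O(x) \text{ with } M = 3,\ x_0 = 0 \\
x &\le x^2 \quad \text{for all } x \ge 1
  &&\Rightarrow\; x \in O(x^2) \\
x^2 &> M x \quad \text{whenever } x > M
  &&\Rightarrow\; x^2 \notin O(x) \text{ for any constant } M
\end{align*}
```

The first line is why O(2n) and O(n) are the same class; the last two together are why O(n) is strictly contained in O(n^2).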
O(n) means the loop's time complexity increases linearly with the number of elements.
2*n is still linear, so you say the loop is of order O(n).
The loop you posted is O(n) since the instructions in the loop take constant time. Two times a constant is still a constant.
The fastest growing term in your program is the loop and the rest is just a constant, so we choose the fastest growing term, which is the loop: O(n).
If your program also had a nested loop in it, this O(n) would be ignored and your algorithm would be O(n^2), because the nested loop would be the fastest growing term.
Usually big O notation expresses the number of principal operations in a function.
In this case you're iterating over n elements, so the complexity is O(n).
It is surely not O(n^2), since quadratic is the complexity of algorithms like bubble sort, which compare every element in the input with all other elements.
As you may remember, bubble sort, in order to move an element to its right position, compares elements against the others in the list (the bubbling behaviour).
At most, you can claim that your algorithm has complexity O(2n), since it prints 2 phrases for every element in the input, but in big-O notation O(n) is equivalent to O(2n).

Resources