I have the following algorithm and I need to calculate its best-case, worst-case and average-case complexity:
    for (i=0; i<N; i++){
        for (j=0; j<N; j++){
            if ((tab[i][j] % 2 != 0) && (tab[j][i] % 2 != 0)){
                tab[i][i] += tab[i][j] + tab[j][i];
            }
        }
    }
The question is: do I count the if tests (as one operation or two, because the condition contains two comparisons), or only the assignments?
I guess complexity is n^2, but I don't know how to calculate best-case, worst-case and average-case complexity.
The complexity is N^2 in every case. The actual number of operations lies between cN^2 and CN^2, where c and C are constants with c < C. The actual number of operations differs between the worst, best and average cases, but that doesn't change the quadratic nature of the algorithm.
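A minimal sketch of the two extremes, assuming small test matrices filled with a single value (the class name `ParityCount` and the helper `filled` are hypothetical, introduced only for illustration). It runs the loop from the question and counts how often the assignment executes; the if test itself is always evaluated N^2 times.

```java
import java.util.Arrays;

class ParityCount {
    // Runs the loop from the question on a copy of tab and returns how many
    // times the assignment inside the if statement executes.
    static long countAssignments(int[][] tab) {
        int n = tab.length;
        int[][] t = new int[n][];
        for (int r = 0; r < n; r++) {
            t[r] = tab[r].clone();
        }
        long assignments = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if ((t[i][j] % 2 != 0) && (t[j][i] % 2 != 0)) {
                    t[i][i] += t[i][j] + t[j][i];
                    assignments++;
                }
            }
        }
        return assignments;
    }

    // Hypothetical helper: an n x n matrix in which every entry is `value`.
    static int[][] filled(int n, int value) {
        int[][] m = new int[n][n];
        for (int[] row : m) {
            Arrays.fill(row, value);
        }
        return m;
    }

    public static void main(String[] args) {
        int n = 4;
        System.out.println(countAssignments(filled(n, 2))); // all even: best case
        System.out.println(countAssignments(filled(n, 1))); // all odd: worst case
    }
}
```

With N = 4 this prints 0 and then 16: the all-even matrix never enters the if body, while the all-odd matrix enters it on every one of the N^2 iterations (adding two odd numbers to the diagonal keeps it odd). Both counts are bounded by a constant times N^2, which is why the asymptotic class is the same.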
The complexity is asymptotic. So, O(c*n) is taken as O(n) where c is a constant. If you want to calculate the actual number of operations, then for example, in the i loop:
Initialization of i is one operation.
i < N comparison occurs N+1 times
i++ increment operation occurs N times.
So, the loop itself has 2*N+2 operations, plus the operations in the loop body N times.
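The tally above can be checked with a small sketch that rewrites the for loop as a while loop and counts each part separately. The class name `LoopOverhead` is made up for this example, and treating each part as "one operation" is the same simplifying assumption the answer makes.

```java
class LoopOverhead {
    // Returns {initializations, comparisons, increments} for the loop header
    // for (i = 0; i < n; i++), written out as an explicit while loop.
    static long[] count(int n) {
        long inits = 0, comparisons = 0, increments = 0;
        int i = 0;
        inits++;                    // i = 0 happens once
        while (true) {
            comparisons++;          // i < n is evaluated n + 1 times
            if (!(i < n)) {
                break;
            }
            // the loop body would run here
            i++;
            increments++;           // i++ happens n times
        }
        return new long[] {inits, comparisons, increments};
    }

    public static void main(String[] args) {
        long[] c = count(10);
        System.out.println(c[0] + " + " + c[1] + " + " + c[2] + " = " + (c[0] + c[1] + c[2]));
    }
}
```

For N = 10 this prints `1 + 11 + 10 = 22`, which matches 2*N+2.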
Here is a segment of an algorithm I came up with:
    for (int i = 0; i < n - 1; i++)
        for (int j = i; j < n; j++)
            (...)
I am using this "double loop" to test all possible 2-element sums in an array of size n.
Apparently (and I have to agree with it), this "double loop" is O(n²):
n + (n-1) + (n-2) + ... + 1 = sum from 1 to n = (n (n - 1))/2
Here is where I am confused:
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            (...)
This second "double loop" also has a complexity of O(n²), when it is clearly (at worst) much (?) better than the first.
What am I missing? Is the information accurate? Can someone explain this "phenomenon"?
(n (n - 1))/2 simplifies to n²/2 - n/2. If you use really large numbers for n, the growth rate of n/2 will be dwarfed in comparison to n², so for the sake of calculating Big-O complexity, you effectively ignore it. Likewise, the "constant" value of 1/2 doesn't grow at all as n increases, so you ignore that too. That just leaves you with n².
Just remember that complexity calculations are not the same as "speed". One algorithm can be five thousand times slower than another and still have a smaller Big-O complexity. But as you increase n to really large numbers, general patterns emerge that can typically be classified using simple formulae: 1, log n, n, n log n, n², etc.
It sometimes helps to create a graph and see what kind of line appears:
Even though the zoom factors of these two graphs are very different, you can see that the type of curve it produces is almost exactly the same.
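The same pattern can be seen numerically. The following is only a sketch (the class and method names are hypothetical) that counts how many times the innermost statement of each double loop from the question runs, and prints the ratio of the two counts.

```java
class DoubleLoops {
    // First loop from the question: j starts at i.
    static long triangular(int n) {
        long count = 0;
        for (int i = 0; i < n - 1; i++)
            for (int j = i; j < n; j++)
                count++;
        return count;
    }

    // Second loop from the question: j always starts at 0.
    static long square(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                count++;
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long t = triangular(n);
            long s = square(n);
            System.out.println(n + ": " + t + " vs " + s + ", ratio " + ((double) t / s));
        }
    }
}
```

As n grows, the ratio settles near 1/2: the two counts differ by a roughly constant factor, so both curves have the same quadratic shape.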
Constant factors.
Big-O notation ignores constant factors, so even though the second loop is slower by a constant factor, they end up with the same time complexity.
Right there in the definition it tells you that you can pick any old constant factor:
... if and only if there is a positive constant M ...
This is because we want to analyse the growth rate of an algorithm; constant factors just complicate things and are often system-dependent (operations may vary in duration on different machines).
You could just count certain types of operations, but then the question becomes which operation to pick, and what if that operation isn't predominant in some algorithm. Then you'll need to relate operations to each other (in a system-independent way, which is probably impossible), or you could just assign the same weight to each, but that would be fairly inaccurate as some operations would take significantly longer than others.
And how useful would saying O(15n² + 568n + 8 log n + 23 sqrt(n) + 17) (for example) really be? As opposed to just O(n²).
(For the purpose of the below, assume n >= 2)
Note that we actually have asymptotically smaller (i.e. smaller as we approach infinity) terms here, but we can always simplify that to a matter of constant factors. (It's n(n+1)/2, not n(n-1)/2)
n(n+1)/2 = n²/2 + n/2
and
n²/2 <= n²/2 + n/2 <= n²
Given that we've just shown that n(n+1)/2 lies between C.n² and D.n², for two constants C and D, we've also just shown that it's O(n²).
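Written out step by step, the two bounds used above follow from n/2 >= 0 and from n <= n² for n >= 1. This is just a worked version of the same inequality, with C = 1/2 and D = 1 as the concrete constants:

```latex
\begin{align*}
\frac{n(n+1)}{2} &= \frac{n^2}{2} + \frac{n}{2} \\
\frac{n^2}{2} &\le \frac{n^2}{2} + \frac{n}{2}
  && \text{since } \tfrac{n}{2} \ge 0 \\
\frac{n^2}{2} + \frac{n}{2} &\le \frac{n^2}{2} + \frac{n^2}{2} = n^2
  && \text{since } n \le n^2 \text{ for } n \ge 1
\end{align*}
```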
Note - big-O notation is actually strictly an upper bound (so we only care that it's smaller than a function, not between two), but it's often used to mean Θ (big-Theta), which cares about both bounds.
From The Big O page on Wikipedia
In typical usage, the formal definition of O notation is not used
directly; rather, the O notation for a function f is derived by the
following simplification rules:
If f(x) is a sum of several terms, the
one with the largest growth rate is kept, and all others omitted
Big-O is used only to give the asymptotic behaviour - that one is a bit faster than the other doesn't come into it - they're both O(N^2)
You could also say that the first loop is O(n(n-1)/2). The fancy mathematical definition of big-O is something like:
function "f" is big-O of function "g" if there exist constants c and n such that f(x) <= c*g(x) for all x > n.
It's a fancy way of saying that g is an upper bound past some point, with some constant applied. It then follows that n(n-1)/2 = (n^2-n)/2 is O(n^2), which is neater for quick analysis.
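To make the definition concrete, here is a rough sketch that tests a candidate pair of constants over a finite range. The class `BigOWitness` and the method `holds` are invented for this example, and a finite check like this can only refute a choice of constants or make it plausible; it is not a proof.

```java
import java.util.function.LongUnaryOperator;

class BigOWitness {
    // Checks f(x) <= c * g(x) for every integer x with n0 < x <= limit.
    static boolean holds(LongUnaryOperator f, LongUnaryOperator g,
                         long c, long n0, long limit) {
        for (long x = n0 + 1; x <= limit; x++) {
            if (f.applyAsLong(x) > c * g.applyAsLong(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LongUnaryOperator half = x -> x * (x - 1) / 2; // n(n-1)/2
        LongUnaryOperator square = x -> x * x;         // n^2

        // n(n-1)/2 <= 1 * n^2 for all x > 0
        System.out.println(holds(half, square, 1, 0, 10_000));
        // n^2 <= 1 * n(n-1)/2 fails, but c = 3 works for all x > 2
        System.out.println(holds(square, half, 1, 0, 10_000));
        System.out.println(holds(square, half, 3, 2, 10_000));
    }
}
```

This prints `true`, `false`, `true`. The last line shows the bound also works in the other direction with a larger constant, which is why the two functions are in the same complexity class.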
AFAIK, your second code snippet
    for (int i = 0; i < n; i++)        // this loop runs n times
        for (int j = 0; j < n; j++)    // this loop also runs n times
            (...)
So essentially, it's getting a O(n*n) = O(n^2) time complexity.
In Big-O notation, constant factors are dropped and only the highest-order term is kept. That is, if the complexity is O(n^2+k), it is written as O(n^2); the constant k is ignored.
Likewise, if the complexity is O(n^2+n), it is written as O(n^2); the lower-order term n is ignored.
So in your first case, the complexity O(n(n - 1)/2) can be simplified to
O(n^2/2 - n/2) = O(n^2/2) (Ignoring the lower order n/2)
= O(1/2 * n^2)
= O(n^2) (Ignoring the constant factor 1/2)
I have the following algorithm:
    for(int i = 1; i < n; i++)
        for(int j = 0; j < i; j++)
            if(j % i == 0) System.out.println(i + " " + j);
This will run max(n,0) times.
Would the Big-O notation be O(n)? If not, what is it and why?
Thank you.
You haven't stated what you are trying to measure with the Big-O notation. Let's assume it's time complexity. Next we have to define what the dependent variable is against which you want to measure the complexity. A reasonable choice here is the absolute value of n (as opposed to the bit-length), since you are dealing with fixed-length ints and not arbitrary-length integers.
You are right that the println is executed O(n) times, but that's counting how often a certain line is hit, it's not measuring time complexity.
It's easy to see that the if statement is hit O(n^2) times, so we have already established that the time complexity is bounded from below by Omega(n^2). As a commenter has already noted, the if-condition is only true for j=0, so I suspect that you actually meant to write i % j instead of j % i? This matters because the time complexity of the println(i + " " + j)-statement is certainly not O(1), but O(log n) (you can't possibly print x characters in less than x steps), so at first sight there is a possibility that the overall complexity is strictly worse than O(n^2).
Assuming that you meant to write i % j we could make the simplifying assumption that the condition is always true, in which case we would obtain the upper bound O(n^2 log n), which is strictly worse than O(n^2)!
However, noting that the number of divisors of n is bounded by O(Sqrt(n)), we actually have O(n^2 + n*Sqrt(n)*log(n)). But since O(Sqrt(n) * log(n)) < O(n), this amounts to O(n^2).
You can dig deeper into number theory to find tighter bounds on the number of divisors, but that doesn't make a difference since the n^2 stays the dominating factor.
So the tightest upper bound is indeed O(n^2), but it's not as obvious as it seems at first sight.
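The square-root bound on the number of divisors can be spot-checked with a short sketch. The names here are hypothetical, and the concrete constant 2 in "at most 2*sqrt(m) divisors" is a standard fact (divisors come in pairs d and m/d) rather than something stated in the answer.

```java
class DivisorBound {
    // Number of positive divisors of m, by trial division.
    static int divisors(int m) {
        int count = 0;
        for (int d = 1; d <= m; d++) {
            if (m % d == 0) {
                count++;
            }
        }
        return count;
    }

    // True if divisors(m) <= 2 * sqrt(m) for every 1 <= m <= limit.
    // Compared as squares to stay in integer arithmetic.
    static boolean boundHolds(int limit) {
        for (int m = 1; m <= limit; m++) {
            long d = divisors(m);
            if (d * d > 4L * m) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(divisors(12));     // 1, 2, 3, 4, 6, 12
        System.out.println(boundHolds(2000));
    }
}
```

This prints 6 and then `true`, consistent with the claim that the printing work stays well below the n^2 cost of evaluating the condition.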
max(n,0) would indeed be O(n). However, your algorithm is O(n**2): the outer loop runs about n times, and the inner loop runs i times, which is n/2 on average. That makes O(n**2 / 2) = O(n**2). Unlike the runtime of the algorithm, the number of times println is reached is O(n), since it happens once per value of i (when j = 0), that is, n - 1 times.
So, the answer depends on what exactly you want to measure.
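A small sketch of that distinction, using the loop exactly as posted (with `j % i`) but counting instead of printing. The class name `TestsVersusPrints` is made up for the example.

```java
class TestsVersusPrints {
    // Returns {number of if tests, number of println calls} for the posted loop.
    static long[] count(int n) {
        long tests = 0, prints = 0;
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) {
                tests++;
                if (j % i == 0) {
                    prints++;   // stands in for System.out.println(i + " " + j)
                }
            }
        }
        return new long[] {tests, prints};
    }

    public static void main(String[] args) {
        long[] c = count(10);
        System.out.println(c[0] + " tests, " + c[1] + " prints");
    }
}
```

For n = 10 this prints `45 tests, 9 prints`: the condition is evaluated n(n-1)/2 times, while the print is reached only n - 1 times (once per i, at j = 0).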
for example if we have
    public void search(){
        for (int i=0; i<n; i++){
            .............
        }
        for (int j=0; j<n; j++){
            .............
        }
    }
would search() be O(n) + O(n) = O(2n) ?
Yes, the time is linear in n, but because constant factors are not interesting in O-notation, you can still write O(n). Even if you had 10 consecutive for-loops from 1 to n, it would still be O(n). Constant factors are dropped.
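As a rough illustration, assuming each loop body counts as one step (the class name is hypothetical), two consecutive loops take 2n steps, and doubling n doubles the total, which is the linear scaling that O(n) describes.

```java
class ConsecutiveLoops {
    // Counts the body executions of two loops run one after the other.
    static long steps(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            steps++;
        }
        for (int j = 0; j < n; j++) {
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(steps(1000)); // 2 * 1000
        System.out.println(steps(2000)); // doubling n doubles the work
    }
}
```

This prints 2000 and then 4000.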
Edit (answer to your comment):
If i, n and k are independent, yes, it would be O(k + (n-i)).
But you could simplify it if you for example know that k = O(n). (e.g. k ≈ n/2). Then you could write it as O(n - i) (because in O(2n - i), the 2 is dropped).
If i and k are both linearly dependent on n, it would even be O(n).
Yes, this is O(2n) which is equivalent to saying it is O(n), the coefficient doesn't matter really.
Big-O notation expresses time or space complexity. It does not measure time or space directly, it measures how the change in number of elements impacts time or space. While two loops are certainly slower than one, they both scale exactly the same: in direct proportion to n. That is what O(n) means.
(Slight simplification here, it does not account for why O(n^2 + n) = O(n^2), for example, but it should suffice for the matter at hand.)
I'm studying the running time of programs and have come across the Big O notation. One is asked to prove that T(n) is O(f(n)) by proving that there exist an integer x and a constant c > 0 such that for all integers n >= x, T(n) <= c*f(n).
The examples I've seen prove this by "picking" values for x and c. I understand that you can plug values into the equation and see if they are correct, but is there a way to actually calculate x or c? Or, at least, some rules of thumb on how to pick them so one isn't plugging in values endlessly?
The values come from examining the algorithm whose running time is T. For example, when you have a simple loop:
    for (i=0; i < n; ++i) {
        sum += i;
    }
then you perform ++i and sum+=i n times each, the test i<n n+1 times, and i=0 once, for a total of T(n) = 3n + 2 operations. So f(n)==n, and c==4 works with x==2, because 3n + 2 <= 4n exactly when n >= 2 (for n==0 or n==1, the fixed cost of i=0 and the final i<n test pushes T(n) above 4n). This gives you an O(n) performance (linear in the number of inputs).
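One way to avoid plugging in values by hand is to count the operations exactly and let a loop search for the threshold. The sketch below does that for the simple loop; the class and method names are hypothetical, and it counts the test `i<n` as n + 1 operations (it also runs once when the loop exits), so the exact total is 3n + 2. A common rule of thumb, stated here as an assumption rather than something from the answer, is to take c as the sum of the coefficients: 3n + 2 <= 3n + 2n = 5n for every n >= 1.

```java
class SimpleLoopCost {
    // Exact operation count for: for (i=0; i < n; ++i) { sum += i; }
    static long operations(int n) {
        long ops = 0;
        long sum = 0;
        int i = 0;
        ops++;                      // i = 0
        while (true) {
            ops++;                  // i < n
            if (!(i < n)) {
                break;
            }
            sum += i;
            ops++;                  // sum += i
            ++i;
            ops++;                  // ++i
        }
        return ops;
    }

    // Smallest x >= 1 such that operations(n) <= c * n for all x <= n <= limit,
    // or -1 if the bound already fails at n = limit.
    static int smallestX(long c, int limit) {
        int x = -1;
        for (int n = limit; n >= 1; n--) {
            if (operations(n) <= c * n) {
                x = n;
            } else {
                break;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(smallestX(4, 1000)); // c = 4
        System.out.println(smallestX(5, 1000)); // c = 5
        System.out.println(smallestX(3, 1000)); // c = 3 is too small
    }
}
```

This prints 2, 1 and -1: with c = 4 the bound holds from n = 2 onward, c = 5 works from n = 1, and c = 3 never works because of the constant term. As with any finite search, the result only covers n up to the chosen limit.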
For the nested loops:
    for (i=0; i < n; ++i) {
        for (j=0; j<n; ++j) {
            sum += j;
        }
    }
The calculations are similar, with f(n)==n^2, giving you O(n^2).
So there is no cut-and-dried way of computing the exact values of c and x, but most of the time the hard part is coming up with f, and the "smallest" such f at that (an O(n^2) algorithm is also an O(n^3) algorithm according to the definition you provided, but you want to characterize that algorithm as O(n^2) rather than O(n^3)). The ordering of the fs is based on their growth as n approaches infinity: f(n)=n^3 grows more slowly than f(n)=2^n, even though for small n the former is larger than the latter.
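The crossover between n^3 and 2^n is easy to locate with a short sketch (the class and method names are made up for illustration; the limit is kept below 63 so that 2^n fits in a `long`).

```java
class Crossover {
    // Largest n in [1, limit] for which n^3 >= 2^n, or -1 if there is none.
    // limit must stay below 63 so that 1L << n does not overflow.
    static int lastWhereCubeWins(int limit) {
        int last = -1;
        for (int n = 1; n <= limit; n++) {
            long cube = (long) n * n * n;
            long power = 1L << n;
            if (cube >= power) {
                last = n;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(lastWhereCubeWins(60));
    }
}
```

This prints 9: n^3 is the larger value for n from 2 to 9 (729 versus 512 at n = 9), and from n = 10 onward (1000 versus 1024) 2^n stays ahead.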
Note that in theory the actual values of x and c become irrelevant as n approaches infinity, which is why they don't show up in the O(n) notation itself. This does not mean, however, that for (relatively) small values of n the number of instructions cannot be much larger than f(n) (e.g. you have 1000 instructions within the for loop).
Also, O notation is usually quoted for the worst case, which might be much higher than what you observe in real life (average-case cost) or over the overall usage of a data structure (amortized cost), for example.