What's the big O of this little code snippet? - algorithm

for i := 1 to n do
    j := 2;
    while j < i do
        j := j^4;
I'm really confused when it comes to Big-O notation, so I'd like to know if it's O(n log n). That's my gut, but I can't prove it. I know the while loop is probably faster than log n, but I don't know by how much!
Edit: the caret denotes exponent.

The problem is the number of iterations the while loop is executed for a given i.
On every iteration j := j^4 and at the beginning j := 2, so after x iterations j = 2^(4^x)
j < i is equivalent to x < log_4(log_2(i))
I'd risk a statement, that the complexity is O(n * log_4(log_2(n)))
You can get rid of constant factors in Big O notation. log_4(x) = log(x) / log(4) and log(4) is a constant. Similarly you can change log_2(x) to log(x). The complexity can be expressed as O(n*log(log(n)))
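The double-logarithm count can be checked empirically. Below is a minimal Python sketch, not part of the original answer; the helper names `inner_iterations`, `predicted_iterations`, and `total_iterations` are made up for illustration. It counts how often the while loop body runs and compares that with ceil(log_4(log_2(i))).

```python
import math

def inner_iterations(i):
    """Count how many times the while loop body runs for a given i."""
    j, count = 2, 0
    while j < i:
        j = j ** 4  # the caret in the question denotes exponentiation
        count += 1
    return count

def predicted_iterations(i):
    """ceil(log_4(log_2(i))) for i >= 2, using log_4(y) = log_2(y) / 2."""
    return math.ceil(math.log2(math.log2(i)) / 2)

def total_iterations(n):
    """Total inner-loop iterations over the whole outer for loop."""
    return sum(inner_iterations(i) for i in range(1, n + 1))

if __name__ == "__main__":
    for i in (3, 16, 17, 1000, 65537):
        print(i, inner_iterations(i), predicted_iterations(i))
```

Because the inner count never exceeds ceil(log_4(log_2(n))), the total over all i is at most n times that value, which is where the O(n log log n) bound comes from.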

Off the cuff, I'd guess it is O(n slog4 n), where slog represents the inverse of the tetration operator. Tetration is the next operation after exponentiation. Just like multiplication is iterated addition, and exponentiation is iterated multiplication, tetration is iterated exponentiation.
My reasoning is, if you multiplied j by 4 each iteration then the function would be O(n log4 n). But since you are exponentiating it each iteration, you need a correspondingly more powerful operator than log: slog.

Related

What is the time complexity of Brute Force Algorithm of Kadane's Algorithm? [duplicate]

Here is a segment of an algorithm I came up with:
for (int i = 0; i < n - 1; i++)
    for (int j = i; j < n; j++)
        (...)
I am using this "double loop" to test all possible 2-element sums in an array of size n.
Apparently (and I have to agree with it), this "double loop" is O(n²):
n + (n-1) + (n-2) + ... + 1 = sum from 1 to n = (n (n - 1))/2
Here is where I am confused:
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        (...)
This second "double loop" also has a complexity of O(n²), when it is clearly (at worst) much (?) better than the first.
What am I missing? Is the information accurate? Can someone explain this "phenomenon"?
(n (n - 1))/2 simplifies to n²/2 - n/2. If you use really large numbers for n, the growth rate of n/2 will be dwarfed in comparison to n², so for the sake of calculating Big-O complexity, you effectively ignore it. Likewise, the "constant" value of 1/2 doesn't grow at all as n increases, so you ignore that too. That just leaves you with n².
Just remember that complexity calculations are not the same as "speed". One algorithm can be five thousand times slower than another and still have a smaller Big-O complexity. But as you increase n to really large numbers, general patterns emerge that can typically be classified using simple formulae: 1, log n, n, n log n, n², etc.
It sometimes helps to create a graph and see what kind of line appears:
Even though the zoom factors of these two graphs are very different, you can see that the type of curve it produces is almost exactly the same.
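To see the same pattern in numbers rather than on a graph, here is a small Python sketch (my own illustration; the function names `count_triangular` and `count_square` are hypothetical) that counts the iterations of both double loops from the question:

```python
def count_triangular(n):
    """Iterations of the first double loop, where j starts at i."""
    count = 0
    for i in range(n - 1):
        for j in range(i, n):
            count += 1
    return count

def count_square(n):
    """Iterations of the second double loop, where j always starts at 0."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count

if __name__ == "__main__":
    for n in (10, 100, 1000):
        t, s = count_triangular(n), count_square(n)
        print(n, t, s, t / s)  # the ratio settles near 1/2
```

The ratio between the two counts approaches a constant (about 1/2) instead of growing with n, which is exactly the kind of difference Big-O ignores.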
Constant factors.
Big-O notation ignores constant factors, so even though the second loop is slower by a constant factor, they end up with the same time complexity.
Right there in the definition it tells you that you can pick any old constant factor:
... if and only if there is a positive constant M ...
This is because we want to analyse the growth rate of an algorithm - constant factors just complicate things and are often system-dependent (operations may vary in duration on different machines).
You could just count certain types of operations, but then the question becomes which operation to pick, and what if that operation isn't predominant in some algorithm. Then you'll need to relate operations to each other (in a system-independent way, which is probably impossible), or you could just assign the same weight to each, but that would be fairly inaccurate as some operations would take significantly longer than others.
And how useful would saying O(15n² + 568n + 8 log n + 23 sqrt(n) + 17) (for example) really be? As opposed to just O(n²).
(For the purpose of the below, assume n >= 2)
Note that we actually have asymptotically smaller (i.e. smaller as we approach infinity) terms here, but we can always simplify that to a matter of constant factors. (It's n(n+1)/2, not n(n-1)/2)
n(n+1)/2 = n²/2 + n/2
and
n²/2 <= n²/2 + n/2 <= n²
Given that we've just shown that n(n+1)/2 lies between C.n² and D.n², for two constants C and D, we've also just shown that it's O(n²).
Note - big-O notation is actually strictly an upper bound (so we only care that it's smaller than a function, not between two), but it's often used to mean Θ (big-Theta), which cares about both bounds.
From The Big O page on Wikipedia
In typical usage, the formal definition of O notation is not used directly; rather, the O notation for a function f is derived by the following simplification rules:
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted
Big-O is used only to give the asymptotic behaviour - that one is a bit faster than the other doesn't come into it - they're both O(N^2)
You could also say that the first loop is O(n(n-1)/2). The fancy mathematical definition of big-O is something like:
function "f" is big-O of function "g" if there exist constants c and n such that f(x) < c*g(x) for all x > n.
It's a fancy way of saying g is an upper bound past some point with some constant applied. It then follows that n(n-1)/2 = (n^2-n)/2 is O(n^2), which is neater for quick analysis.
AFAIK, your second code snippet
for (int i = 0; i < n; i++)        // this loop runs n times
    for (int j = 0; j < n; j++)    // this loop also runs n times
        (...)
So essentially, it's getting a O(n*n) = O(n^2) time complexity.
Per big-O theory, constant factors are neglected and only the highest-order term is considered. That is to say, if the complexity is O(n^2+k), then the actual complexity is O(n^2); the constant k is ignored.
(Or) if the complexity is O(n^2+n), then the actual complexity is O(n^2); the lower-order term n is ignored.
So in your first case, the complexity O(n(n - 1)/2) can be simplified to
O(n^2/2 - n/2) = O(n^2/2) (Ignoring the lower order n/2)
= O(1/2 * n^2)
= O(n^2) (Ignoring the constant factor 1/2)

Complexity/big theta of loop with multiplicative increment of i

I'm trying to find the time complexity/ big theta of the following:
def f(n):
    i = 2
    while i < n:
        print(i)
        i = i*i
The only approach I know is to find a general formula for i_k and then solve the equation i_k >= n. However, I end up with log(log n / log 2) / log(2) as my k value, which seems awfully wrong to me, and I'm not sure how I would translate that into a big theta expression. Any help would be appreciated!
That answer looks good, actually! If you rewrite log x / log 2 as log2 x (or lg x, for short), what you have is that the number of iterations is lg lg n. Since the value of i in iteration k of the loop is 2^(2^k), this means that the loop stops when i reaches the value 2^(2^(lg lg n)) = 2^(lg n) = n, which matches the loop bound.
More generally, the number of times you can square a value before it exceeds n is Θ(log log n), and similarly the number of square roots you can take before you drop a number n down to a constant is Θ(log log n), so your answer is pretty much what you’d expect.
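A quick way to convince yourself is to count the squarings directly. The sketch below is an illustration only; `count_squarings` and `lg_lg` are names I made up, and the loop is the one from the question with the print replaced by a counter.

```python
import math

def count_squarings(n):
    """Number of loop iterations of f(n): square i, starting at 2, until i >= n."""
    i, count = 2, 0
    while i < n:
        i = i * i
        count += 1
    return count

def lg_lg(n):
    """lg lg n, valid for n >= 2."""
    return math.log2(math.log2(n))

if __name__ == "__main__":
    for n in (5, 17, 1000, 2**64):
        print(n, count_squarings(n), lg_lg(n))
```

For n of the form 2^(2^k) the two numbers coincide exactly; otherwise the iteration count is lg lg n rounded up.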

How are the following equivalent to O(N)

I am reading an example where the following are equivalent to O(N):
O(N + P), where P < N/2
O(N + log N)
Can someone explain in layman's terms how it is that the two examples above are the same thing as O(N)?
We always take the greater one in case of addition.
In both the cases N is bigger than the other part.
In first case P < N/2 < N
In second case log N < N
Hence the complexity is O(N) in both the cases.
Let f and g be two functions defined on some subset of the real numbers. One writes
f(x) = O(g(x)) as x -> infinity
if and only if there is a positive constant M such that for all sufficiently large values of x, the absolute value of f(x) is at most M multiplied by the absolute value of g(x). That is, f(x) = O(g(x)) if and only if there exists a positive real number M and a real number x0 such that
|f(x)| <= M |g(x)| for all x > x0
So in your case 1:
f(N) = N + P <= N + N/2
We could set M = 2. Then:
|f(N)| <= 3/2 |N| <= 2 |N| (N0 could be any number)
So:
N + P = O(N)
In your second case, we could also set M = 2 and N0 = 1 to satisfy:
|N + logN| <= 2 |N| for N > 1
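The two inequalities can be spot-checked numerically. This is a rough Python sketch under my own assumptions: `bound_holds` is a hypothetical helper, the first case uses the worst case P = N/2, and the logarithm is taken base 2. A finite check like this illustrates the definition but is not a proof.

```python
import math

def bound_holds(f, g, M, x0, x_max):
    """Check |f(x)| <= M * |g(x)| for every integer x with x0 < x <= x_max."""
    return all(abs(f(x)) <= M * abs(g(x)) for x in range(x0 + 1, x_max + 1))

# Case 1: worst case P = N/2, so f(N) = N + N/2, with M = 2.
case1 = bound_holds(lambda n: n + n / 2, lambda n: n, M=2, x0=0, x_max=10_000)

# Case 2: f(N) = N + log N, with M = 2 and N0 = 1.
case2 = bound_holds(lambda n: n + math.log2(n), lambda n: n, M=2, x0=1, x_max=10_000)

if __name__ == "__main__":
    print(case1, case2)
```

Trying the same check with f(N) = N^2 and g(N) = N fails for any fixed M once the range is large enough, which is why N^2 is not O(N).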
Big O notation usually only provides an upper bound on the growth rate of the function (wiki). In both of your cases, P < N and log N < N, so O(N + P) = O(2N) = O(N), and likewise O(N + log N) = O(2N) = O(N). Hope that answers your question.
For the sake of understanding you can assume that O(n) represents that the complexity is of the order of n and also that O notation represents the upper bound(or the complexity in worst case). So, when I say that O(n+p) it represents that the order of n+p.
Let's assume that in the worst case p = n/2; then what would be the order of n + n/2? It would still be O(n), that is, linear, because constant factors do not form a part of the Big-O notation.
Similarly for O(n + log n): log n can never be greater than n, so the overall complexity turns out to be linear.
In short
If N is a function and C is a constant:
O(N+N/2):
If C=2, then for any N>1 :
(C=2)*N > N+N/2,
2*N>3*N/2,
2> 3/2 (true)
O(N+logN):
If C=2, then for any N>2 :
(C=2)*N > N+logN,
2*N > N+logN,
2>(N+logN)/N,
2> 1 + logN/N (limit logN/N is 0),
2>1+0 (true)
Counterexample O(N^2):
No C exists such that C*N > N^2 :
C > N^2/N,
C>N (contradiction).
Boring mathematical part
I think the source of confusion is that equals sign in O(f(x))=O(N) does not mean equality! Usually if x=y then y=x. However consider O(x)=O(x^2) which is true, but reverse is false: O(x^2) != O(x)!
O(f(x)) is an upper bound of how fast a function is growing.
Upper bound is not an exact value.
If g(x)=x is an upper bound for some function f(x), then function 2*g(x) (and in general anything growing faster than g(x)) is also an upper bound for f(x).
The formal definition is: function f(x) is bounded by some other function g(x) if you can choose a constant C such that, starting from some x_0, C*g(x) is always greater than f(x).
f(x)=N+N/2 is the same as 3*N/2=1.5*N. If we take g(x)=N and our constant C=2 then 2*g(x)=2*N is growing faster than 1.5*N:
If C=2 and x_0=1 then for any n>(x_0=1) 2*N > 1.5*N.
same applies to N+log(N):
C*N>N+log(N)
C>(N+logN)/N
C>1+log(N)/N
...take n_0=2
C>1+1/2
C>3/2=1.5
use C=2: 2*N>N+log(N) for any N>(n_0=2),
e.g.
2*3>3+log(3), 6>3+1.58=4.58
...
2*100>100+log(100), 200>100+6.64
...
Now the interesting part is: no such constant exists for N & N^2. E.g. N squared grows faster than N:
C*N > N^2
C > N^2/N
C > N
obviously no single constant exists which is greater than a variable. Imagine such a constant exists C=C_0. Then starting from N=C_0+1 function N is greater than constant C, therefore such constant does not exist.
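The argument above can be made concrete with a tiny search. The following Python sketch (the name `first_violation` is mine, chosen for illustration) finds the first N at which C*N stops being an upper bound for N^2:

```python
def first_violation(C):
    """Smallest positive integer N with N**2 > C * N, found by linear search."""
    N = 1
    while N * N <= C * N:
        N += 1
    return N

if __name__ == "__main__":
    for C in (5, 100, 1000):
        print(C, first_violation(C))
```

For every positive integer C the search stops at N = C + 1, so whatever constant you pick, the bound breaks one step later.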
Why is this useful in computer science?
In most cases calculating exact algorithm time or space does not make sense as it would depend on hardware speed, language overhead, algorithm implementation details and many other factors.
Big O notation provides means to estimate which algorithm is better independently from real world complications. It's easy to see that O(N) is better than O(N^2) starting from some n_0 no matter which constants are there in front of two functions.
Another benefit is ability to estimate algorithm complexity by just glancing at program and using Big O properties:
for x in range(N):
    sub-calc with O(C)
has complexity of O(N) and
for x in range(N):
    sub-calc with O(C_0)
    sub-calc with O(C_1)
still has complexity of O(N) because of "multiplication by constant rule".
for x in range(N):
    sub-calc with O(N)
has complexity of O(N*N)=O(N^2) by "product rule".
for x in range(N):
    sub-calc with O(C_0)
for y in range(N):
    sub-calc with O(C_1)
has complexity of O(N+N)=O(2*N)=O(N) by "definition (just take C=2*C_original)".
for x in range(N):
    sub-calc with O(C)
for x in range(N):
    sub-calc with O(N)
has complexity of O(N^2) because "the fastest growing term determines O(f(x)) if f(x) is a sum of other functions" (see explanation in the mathematical section).
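The loop patterns above are pseudocode. As a runnable illustration, here is a Python sketch in which every constant-time sub-calculation is replaced by incrementing a step counter. This is my simplification, and the function names are hypothetical.

```python
def sequential_loops(N):
    """Two loops one after the other: N + N steps, i.e. O(N)."""
    steps = 0
    for x in range(N):
        steps += 1
    for y in range(N):
        steps += 1
    return steps

def nested_loops(N):
    """A loop whose body is itself O(N): N * N steps, i.e. O(N^2)."""
    steps = 0
    for x in range(N):
        for y in range(N):
            steps += 1
    return steps

def linear_then_quadratic(N):
    """An O(N) loop followed by an O(N^2) loop: N + N*N steps, i.e. O(N^2)."""
    steps = 0
    for x in range(N):
        steps += 1
    for x in range(N):
        for y in range(N):
            steps += 1
    return steps

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, sequential_loops(N), nested_loops(N), linear_then_quadratic(N))
```

As N grows, the N term in the last function becomes negligible next to N*N, which is the "fastest growing term" rule in action.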
Final words
There is much more to Big-O than I can write here! For example in some real world applications and algorithms beneficial n_0 might be so big that an algorithm with worse complexity works faster on real data.
The CPU cache might introduce an unexpected hidden factor into an otherwise asymptotically good algorithm.
Etc...

Big-O notation with 2 variables. Given m <= n, can we reduce O(nm)?

Assuming m <= n, can you reduce O(nm) to O(n)?
I would think not, since m can be equal to n
In the case of m < n, I would think that O(nm) can be reduced to O(n) since m is strictly less than n and hence becomes insignificant as n approaches infinity
Am I correct in my above assumptions? If so, why? What is the more formal way of showing this?
If m is a constant (E.g.: 2) or lower than a constant, you are right: O(mn) = O(n).
But because you wrote m < n, I suppose that m also goes to infinity, but slower than n.
In this case, you are wrong.
Consider m = log(n) as an example and everything should be clear.
O(mn) = O(n*log(n))
which is different than O(n).
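To make the m = log(n) example tangible, here is a short Python sketch. It is an illustration under my own assumptions: the names are hypothetical and m is taken as the floor of log_2(n). It counts the iterations of an n-by-m nested loop and divides by n.

```python
def floor_log2(n):
    """floor(log_2(n)) for a positive integer n, computed without floating point."""
    return n.bit_length() - 1

def nested_work(n, m):
    """Count the iterations of an n-by-m nested loop."""
    count = 0
    for _ in range(n):
        for _ in range(m):
            count += 1
    return count

def work_per_n(n, m_of_n):
    """Work divided by n, where m is derived from n by the function m_of_n."""
    return nested_work(n, m_of_n(n)) / n

if __name__ == "__main__":
    for n in (2**4, 2**8, 2**12):
        constant_m = work_per_n(n, lambda k: 2)  # m bounded by a constant
        log_m = work_per_n(n, floor_log2)        # m grows like log n
        print(n, constant_m, log_m)
```

With a constant m the work per n stays fixed, so O(mn) = O(n); with m = log n the work per n keeps growing, so no constant factor can absorb it.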
That would be true for O(m+n), but not for O(mn).
Given m <= n, all you can say about O(mn) is that it's O(n**2) at worst.
If m is bounded by a constant, O(mn) becomes O(n)
Simply put, you cannot reduce O(m*n) to O(n) if there is no boundary condition on m.
m < n means m can be anything from 0 to n-1.
Let's say that m is bounded and its value cannot grow beyond C:
m <= C
In this case
O(m*n) can be reduced to O(n)
P.S.: Do read this plain English explanation of Big O notation

Calculating tilde-complexity of for-loop with cubic index

Say I have following algorithm:
for (int i = 1; i < N; i *= 3) {
    sum++;
}
I need to calculate the complexity using tilde-notation, which basically means that I have to find a tilde-function so that when I divide the complexity of the algorithm by this tilde-function, the limit in infinity has to be 1.
I don't think there's any need to calculate the exact complexity, we can ignore the constants and then we have a tilde-complexity.
By looking at the growth of the index, I assume that this algorithm is
~ log N
But rather than having a binary logarithmic function, the base in this case is 3.
Does this matter for the exact notation? Is the order of growth exactly the same and thus can we ignore the base when using Tilde-notation? Do I approach this correctly?
You are right, the for loop executes ceil(log_3 N) times, where log_3 N denotes the base-3 logarithm of N.
No, you cannot ignore the base when using the tilde notation.
Here's how we can derive the time complexity.
We will assume that each iteration of the for loop costs C, for some constant C>0.
Let T(N) denote the number of executions of the for-loop. Since at j-th iteration the value of i is 3^j, it follows that the number of iterations that we make is the smallest j for which 3^j >= N. Taking base-3 logarithms of both sides we get j >= log_3 N. Because j is an integer, j = ceil(log_3 N). Thus T(N) ~ ceil(log_3 N).
Let S(N) denote the time complexity of the for-loop. The "total" time complexity is thus C * T(N), because the cost of each of the T(N) iterations is C, which in tilde notation we can write as S(N) ~ C * ceil(log_3 N).
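To illustrate why the base matters for tilde notation, the Python sketch below counts the loop iterations and divides by logarithms in two different bases. The function names are made up for this example, and the per-iteration cost C is left out.

```python
import math

def count_iterations(N):
    """Number of iterations of: for (i = 1; i < N; i *= 3)."""
    i, count = 1, 0
    while i < N:
        i *= 3
        count += 1
    return count

def ratio_to_log(N, base):
    """Iteration count divided by the logarithm of N in the given base."""
    return count_iterations(N) / math.log(N, base)

if __name__ == "__main__":
    for N in (3**10, 3**20, 3**40):
        print(N, ratio_to_log(N, 3), ratio_to_log(N, 2))
```

The ratio against log_3 N tends to 1, as tilde notation requires, while the ratio against log_2 N tends to 1/log_2(3), roughly 0.63. So ~ log_2 N would be the wrong tilde function even though both are O(log N).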

Resources