Inconsistencies in Big-O Analysis of a Basic "Algorithm" - algorithm

I recently learned about formal Big-O analysis of algorithms; however, I don't see why these 2 algorithms, which do virtually the same thing, would have drastically different running times. The algorithms both print numbers 0 up to n. I will write them in pseudocode:
Algorithm 1:
def countUp(int n){
for(int i = 0; i <= n; i++){
print(n);
}
}
Algorithm 2:
def countUp2(int n){
for(int i = 0; i < 10; i++){
for(int j = 0; j < 10; j++){
... (continued so that this can print out all values 0 - Integer.MAX_VALUE)
for(int z = 0; z < 10; z++){
print("" + i + j + ... + k);
if(("" + i + j + k).stringToInt() == n){
quit();
}
}
}
}
}
So, the first algorithm runs in O(n) time, whereas the second algorithm (depending on the programming language) runs in something close to O(n^10). Is there anything with the code that causes this to happen, or is it simply the absurdity of my example that "breaks" the math?

In countUp, the loop hits all numbers in the range [0,n] once, thus resulting in a runtime of O(n).
In countUp2, you do somewhat the exact same thing, a bunch of times. The bounds on all your loops is 10.
Say you have 3 loop running with a bound of 10. So, outer loop does 10, inner does 10x10, innermost does 10x10x10. So, worst case your innermost loop will run 1000 times, which is essentially constant time. So, for n loops with bounds [0, 10), your runtime is 10^n which, again, can be called constant time, O(1), since it is not dependent on n for worst case analysis.
Assuming you can write enough loops and that the size of n is not a factor, then you would need a loop for every single digit of n. Number of digits in n is int(math.floor(math.log10(n))) + 1; lets call this dig. So, a more strict upper bound on the number of iterations would be 10^dig (which can be kinda reduced to O(n); proof is left to the reader as an exercise).

When analyzing the runtime of an algorithm, one key thing to look for is the loops. In algorithm 1, you have code that executes n times, making the runtime O(n). In algorithm 2, you have nested loops that each run 10 times, so you have a runtime of O(10^3). This is because your code runs the innermost loop 10 times for each run of the middle loop, which in turn runs 10 times for each run of the outermost loop. So the code runs 10x10x10 times. (This is purely an upper bound however, because your if-statement may end the algorithm before the looping is complete, depending on the value of n).

To count up to n in countUp2, then you need the same number of loops as the number of digits in n: so log(n) loops. Each loop can run 10 times, so the total number of iterations is 10^log(n) which is O(n).

The first runs in O(n log n) time, since print(n) outputs O(log n) digits.
The second program assumes an upper limit for n, so is trivially O(1). When we do complexity analysis, we assume a more abstract version of the programming language where (usually) integers are unbounded but arithmetic operations still perform in O(1). In your example you're mixing up the actual programming language (which has bounded integers) with this more abstract model (which doesn't). If you rewrite the program[*] so that is has a dynamically adjustable number of loops depending on n (so if your number n has k digits, then there's k+1 nested loops), then it does one iteration of the innermost code for each number from 0 up to the next power of 10 after n. The inner loop does O(log n) work[**] as it constructs the string, so overall this program too is O(n log n).
[*] you can't use for loops and variables to do this; you'd have to use recursion or something similar, and an array instead of the variables i, j, k, ..., z.
[**] that's assuming your programming language optimizes the addition of k length-1 strings so that it runs in O(k) time. The obvious string concatenation implementation would be O(k^2) time, meaning your second program would run in O(n(log n)^2) time.

Related

What are the number of operations in this method?

I am currently learning about asymptotic analysis, however I am unsure of how many operations are in these nested loops. Would somebody be able to help me understand how to approach them? Also what would the Big-O notation be? (This is also my first time on stack overflow so please forgive any formatting errors in the code).
public static void primeFactors(int n){
while( n % 2 == 0){
System.out.print(2 + " ");
n /= 2;
}
for(int i = 3; i <= Math.sqrt(n); i += 2){
while(n % i == 0){
System.out.print(i + " ");
n /= i;
}
}
}
In the worst case, the first loop (while) is O(log(n)). In the second loop, the outer loop runs in O(sqrt(n)) and the inner loop runs in O(log_i(n)). Hence, the time complexity of the second loop (inner and outer in total) is:
O(sum_{i = 3}^{sqrt(n)} log_i(n))
Therefore, the time complexity of the mentioned algorithm is O(sqrt(n) log(n)).
Notice that, if you mean n is modified inside the inner loop, and it affects on sqrt(n) in the outer loop, so the complexity of the second loop is O(sqrt(n)). Therefore, under this assumtion, the time complexity of the alfgorithm will be O(sqrt(n)) + O(log(n)) == O(sqrt(n)).
First, we see that the first loop is really a special case of the inner loop that occurs in the other loop, with i equal to two. It was separated as a special case in order to be able to increase i with steps of 2 instead of 1. But from the point of view of asymptotic complexity the step by 2 makes no difference: it represents a constant coefficient, which we can ignore. And so for our analysis we can just rewrite the code to this:
public static void primeFactors(int n){
for(int i = 2; i <= Math.sqrt(n); i += 1){ // note the change in start and increment value
while(n % i == 0){
System.out.print(i + " ");
n /= i;
}
}
}
The number of times that n/i is executed, corresponds to the number of non-distinct prime divisors that a number has. According to this Q&A that number of times is O(loglogn). It is not straightforward to derive this, so I had to look it up.
We should also consider the number of times the for loop iterates. The Math.sqrt(n) boundary for i can lower as the for loop iterates. The more divisions take place, the (much) fewer iterations the for loop has to make.
We can see that at the time that the loop exits, i has surpassed the square root of the greatest prime divisor of n. In the worst case that greatest prime divisor is n itself (when n is prime). So the for loop can iterate up to the square root of n, so O(√n) times. In that case the inner loop never iterates (no divisions).
We should thus see which is more determining, and we get O(√n + loglogn). This is O(√n).
The first loop divides n by 2 until it is not divisible by 2 anymore. The maximum number of time this can happen is log2(n).
The second loop, at first sight, seems to run sqrt(n) times the inner loop which is also O(log(n)), but that is actually not the case. Everytime the while condition of the second loop is satisfied, n is drastically decreased, and since the condition of a for-loop is executed on each iteration, sqrt(n) also decreases. The worst case actually happens if the condition of while loop is never satisfied, i.e. if n is prime.
Hence the overall time complexity is O(sqrt(n)).

Why is this algorithm O(nlogn)?

I'm reading a book on algorithm analysis and have found an algorithm which I don't know how to get the time complexity of, although the book says that it is O(nlogn).
Here is the algorithm:
sum1=0;
for(k=1; k<=n; k*=2)
for(j=1; j<=n; j++)
sum1++;
Perhaps the easiest way to convince yourself of the O(n*lgn) running time is to run the algorithm on a sheet of paper. Consider what happens when n is 64. Then the outer loop variable k would take the following values:
1 2 4 8 16 32 64
The log_2(64) is 6, which is the number of terms above plus one. You can continue this line of reasoning to conclude that the outer loop will take O(lgn) running time.
The inner loop, which is completely independent of the outer loop, is O(n). Multiplying these two terms together yields O(lgn*n).
In your first loop for(k=1; k<=n; k*=2), variable k reaches the value of n in log n steps since you're doubling the value in each step.
The second loop for(j=1; j<=n; j++) is just a linear cycle, so requires n steps.
Therefore, total time is O(nlogn) since the loops are nested.
To add a bit of mathematical detail...
Let a be the number of times the outer loop for(k=1; k<=n; k*=2) runs. Then this loop will run 2^a times (Note the loop increment k*=2). So we have n = 2^a. Solve for a by taking base 2 log on both sides, then you will get a = log_2(n)
Since the inner loop runs n times, total is O(nlog_2(n)).
To add to #qwerty's answer, if a is the number of times the outer loop runs:
    k takes values 1, 2, 4, ..., 2^a and 2^a <= n
    Taking log on both sides: log_2(2^a) <= log_2(n), i.e. a <= log_2(n)
So the outer loop has a upper bound of log_2(n), i.e. it cannot run more than log_2(n) times.

Theta Notation and Worst Case Running time nested loops

This is the code I need to analyse:
i = 1
while i < n
do
j = 0;
while j <= i
do
j = j + 1
i = 2i
So, the first loop should run log(2,n) and the innermost loop should run log(2,n) * (i + 1), but I'm pretty sure that's wrong.
How do I use a theta notation to prove it?
An intuitive way to think about this is to see how much work your inner loop is doing for a fixed value of outer loop variable i. It's clearly as much as i itself. Thus, if the value of i is 256, then then you will do j = j + 1 that many times.
Thus, total work done is the sum of the values that i takes in the outer loop's execution. That variable is increasing much rapidly to catch up with n. Its values, as given by i = 2i (it should be i = 2*i), are going to be like: 2, 4, 8, 16, ..., because we start with 2 iterations of the inner loop when i = 1. This is a geometric series: a, ar, ar^2 ... with a = 1 and r = 2. The last term, as you figured out will be n and there will be log2 n terms in the series. And that is simple summation of a geometric series.
It doesn't make much sense to have a worst case or a best case for this algorithm because there are no different permutations of the input which is just a number n in this case. Best case or worst case are relevant when a particular input (e.g. a particular sequence of numbers) affects the running time of the algorithm.
The running time then is the sum of geometric series (a.(r^num_terms - 1)/(r-1)):
T(n) = 2 + 4 + ... 2^(log2 n)
= 2 . (2^log2 n - 1)
= 2 . (n - 1)
⩽ 3n = O(n)
Thus, you can't be doing work that is more than some constant multiple of n. Hence, the running time of this algorithm is O(n).
You can't be doing some work that is less than some (other) constant multiple of n, since you have to go through the increment in inner loop as shown above. Thus, the running time of this algorithm is also ≥ c.n i.e. it is Ω(n).
Together, this means that running time of this algorithm is Θ(n).
You can't use i in your final expression; only n.
You can easily see that the inner loop executes i times each time it is reached. And it sounds like you've figured out the different values that i can have. So add up those values, and you have the total amount of work.

The efficiency of an algorithm

I am having a hard time understanding the efficiency of an algorithm and how do you really determine that, that particular sentence or part is lg n, O (N) or log base 2 (n)?
I have two examples over here.
doIt() can be expressed as O(n)=n^2.
First example.
i=1
loop (i<n)
doIt(…)
i=i × 2
end loop
The cost of the above is as follows:
i=1 ... 1
loop (i<n) ... lg n
doIt(…) ... n^2 lg n
i=i × 2 ... lg n
end loop
Second example:
static int myMethod(int n){
int i = 1;
for(int i = 1; i <= n; i = i * 2)
doIt();
return 1;
}
The cost of the above is as follows:
static int myMethod(int n){ ... 1
int i = 1; ... 1
for(int i = 1; i <= n; i = i * 2) ... log base 2 (n)
doIt(); ... log base 2 (n) * n^2
return 1; ... 1
}
All this have left me wondering, how do you really find out what cost is what? I've been asking around, trying to understand but there is really no one who can really explain this to me. I really wanna understand how do I really determine the cost badly. Anyone can help me on this?
The big O notation is not measuring how long the program will run. It says how fast will the running time increase as the size of the problem grows.
For example, if calculating something is O(1), that could be a very long time, but it is independent of the size of the problem.
Normally, you're not expecting to estimate costs of such things as cycle iterator (supposing storing one integer value and changing it N times is too minor to include in result estimation).
What really matters - that in terms of Big-O, Big-Theta e.t.c you're expected to find functional dependence, i.e. find a function of one argument (N), for which:
Big-O: entire algorithm count of operation grows lesser than F(N)
Big-Theta: entire algorithm count of operation grows equal to F(N)
Big-Omega: entire algorithm count of operation grows greater than F(N)
so, remember - you're not trying to find a number of operations, you're trying to find functional estimation for that, i.e. functional dependence between amount of incoming data N and some function from N, which indicates speed of growth for operation's count.
So, O(1), for example, indicates, that whole algorithm will not depend from N (it is constant). You can read more here.
Also, there are different types of estimations. You can estimate memory or execution time, for example - that will be different estimations in common case.

How to calculate worst case analysis of this algorithm?

sum = 0;
for(int i = 0; i < N; i++)
for(int j = i; j >= 0; j--)
sum++;
From what I understand, the first line is 1 operation, 2nd line is (i+1) operations, 3rd line is (i-1) operations, and 4th line is n operations. Does this mean that the running time would be 1 + (i+1)(i-1) + n? It's just these last steps that confuse me.
To analyze the algorithm you don't want to go line by line asking "how much time does this particular line contribute?" The reason is that each line doesn't execute the same number of times. For example, the innermost line is executed a whole bunch of times, compared to the first line which is run just once.
To analyze an algorithm like this, try identifying some quantity whose value is within a constant factor of the total runtime of the algorithm. In this case, that quantity would probably be "how many times does the line sum++ execute?", since if we know this value, we know the total amount of time that's spent by the two loops in the algorithm. To figure this out, let's trace out what happens with these loops. On the first iteration of the outer loop, i == 0 and so the inner loop will execute exactly once (counting down from 0 to 0). On the second iteration of the outer loop, i == 1 and the inner loop executes exactly twice (first with j == 1, once with j == 0. More generally, on the kth iteration of the outer loop, the inner loop executes k + 1 times. This means that the total number of iterations of the innermost loop is given by
1 + 2 + 3 + ... + N
This quantity can be shown to be equal to
N (N + 1) N^2 + N N^2 N
--------- = ------- = --- + ---
2 2 2 2
Of these two terms, the N^2 / 2 term is the dominant growth term, and so if we ignore its constant factors we get a runtime of O(N2).
Don't look at this answer as something you should memorize - think of all of the steps required to get to the answer. We started by finding some quantity to count, and then saw how that quantity was influenced by the execution of the loops. From this, we were able to derive a mathematical expression for that quantity, which we then simplified. Finally, we took the resulting expression and determined the dominant term, which serves as the big-O for the overall function.
Work from inside-out.
sum++
This is a single operation on it's own, as it doesn't repeat.
for(int j = i; j >= 0; j--)
This loops i+1 times. There are several operations in there, but you probably don't mean to count the number of asm instructions. So I'll assume for this question this is a multiplier of i+1. Since the loop contents is a single operation, the loop and its block perform i+1 operations.
for(int i = 0; i < N; i++)
This loops N times. So as before, this is a multiplier of N. Since the block performs i+1 operations, this loop performs N(N+1)/2 operations in total. And that's your answer! If you want to consider big-O complexity, then this simplifies to O(N2).
It's not additive: the inner loop happens once for EACH iteration of the outer loop. So it's O(n2).
By the way, this is a good example of why we use asymptotic notation for this kind of thing -- depending on the definition of "operation" the exact details of the count could vary pretty widely. (Like, is sum++ a single operation, or is it add sum to 1 giving temp; load temp to sum?) But since we know that all that can be hidden in a constant factor, it's still going to be O(n2).
No; you don't count a specific number of operations for each line and then add them up. The entire point of constructions like 'for' is to make it possible for a given line of code to run more than once. You're supposed to use thinking and logic skills to figure out how many times the line 'sum++' will run, as a function of N. Hint: it runs once for every time that the third line is encountered.
How many times is the second line encountered?
Each time the second line is encountered, the value of 'i' is set. How many times does the third line run with that value of i? Therefore, how many times will it run overall? (Hint: if I give you a different amount of money on several different occasions, how do you find out the total amount I gave you?)
Each time the third line is encountered, the fourth line happens once.
Which line happens most often? How often does it happen, in terms of N?
So guess what interest you is the sum++ and how many time you execute it.
The final stat of sum would give you that answer.
Actually your loop is just:
Sigma(n) n goes from 1 to N.
Which equal to: N*(N+1) / 2 This give you in big-o-notation O(N^2)
Also beside the name of you question there is no worst case in you algorithm.
Or you could say that the worst case is when N goes to infinity.
Using Sigma notation to represent your loops:

Resources