I was going through the running-time calculation examples for Big-Oh in Data Structures and Algorithm Analysis in C by Mark Weiss.
The example is:
int sum(int N)
{
/*1*/    int sum = 0;
/*2*/    for (int i = 1; i <= N; i++)
/*3*/        sum = sum + i*i*i;
/*4*/    return sum;
}
I understand the cost for each line, but I am confused about line 2: it contains three things, initialization, testing, and increment. The total cost is 1 for the initialization, (n+1) for the testing, and n for all the increments. I want to know why the cost of the testing is (n+1): since the condition makes the loop run n times, why isn't the testing cost n?
Let's say n = 5. The condition is evaluated 6 times; on the 6th evaluation it is false and the loop exits, hence n+1 tests.
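To see this concretely, here is a minimal sketch of my own (instrumentation added by me, not from the book) that rewrites the for loop as a while loop and counts every evaluation of the condition:

#include <stdio.h>

int main(void) {
    int N = 5;
    int sum = 0, tests = 0, i = 1;   /* initialization: cost 1 */
    while (1) {
        tests++;                     /* one test per evaluation of i <= N */
        if (!(i <= N))
            break;                   /* the (n+1)th test fails and exits */
        sum = sum + i * i * i;
        i++;                         /* n increments in total */
    }
    printf("N = %d, sum = %d, tests = %d\n", N, sum, tests);
    /* prints: N = 5, sum = 225, tests = 6 */
    return 0;
}

The condition is evaluated once for each of the n successful iterations plus once more for the final failing check, which is where the n+1 comes from.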
I have gone through a couple of examples of analyzing the time complexity of a program using the operation-count and step-count techniques, but those examples are small and straightforward. In real-world programming, however, if someone hands you production or live code and asks for its worst-case time complexity, how does one start analyzing it?
Are there any techniques or rules of thumb for analyzing a program? Say I have a function, shown below, that has if conditions and a loop. For some values of n it goes deep, but for other values it does not.
So the maximum number of operations is the worst case, and the minimum number of operations is the best case.
How can we know when the maximum number of operations occurs if we have this kind of conditional statement? (Other functions can have much deeper conditionals, and not just loops.)
int l = 0;
while (l <= n) {
    int m = l + (n - l) / 2;
    if (arr[m] == x) return;
    if (arr[m] < x) l = m + 1;
    else            n = m - 1;
}
How does one calculate the worst and best case by looking at the above code? Does one perform a step count for the program by substituting n = 1 through 20, getting some values, and then trying to derive the function? I would like to know how people analyze the time complexity of existing code when it has this kind of branching.
A step-by-step analysis, or a set of steps to follow to solve the above problem, would be greatly helpful.
Since m is set to the midpoint between l and n each time, and the loop continues by either increasing l by one or setting n to m-1, the following scenario gives the maximum number of operations:
In each iteration, the `else` branch is taken and `n` is set to `m-1`.
Let's see what happens in this case. Since the range is halved each time while l stays at 0, after O(log(n)) iterations n drops below l and the loop terminates.
Therefore, the time complexity of the loop is O(log(n)).
Notice that the other cases move l towards n faster. For example, if l = m + 1 is taken, l jumps to about n/2, and in the next iteration m is already near n; hence we reach the end of the loop in just a couple of iterations.
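To make the worst and best case concrete, here is a small sketch of my own (the sorted array and the target values are made-up examples) that instruments the loop from the question and counts iterations:

#include <stdio.h>

/* Counts loop iterations of the search above; `break` replaces the
   original `return` so we can report the count. */
static int count_iterations(const int *arr, int n, int x) {
    int l = 0, iters = 0;
    while (l <= n) {
        iters++;
        int m = l + (n - l) / 2;
        if (arr[m] == x) break;
        if (arr[m] < x) l = m + 1;
        else            n = m - 1;
    }
    return iters;
}

int main(void) {
    int arr[16];
    for (int i = 0; i < 16; i++) arr[i] = 2 * i;  /* sorted: 0, 2, ..., 30 */
    /* Best case: x is the middle element, found on the first test. */
    printf("best:  %d\n", count_iterations(arr, 15, arr[7]));
    /* Worst case: x is smaller than every element, so the `else`
       branch fires every time: about log2(n) iterations. */
    printf("worst: %d\n", count_iterations(arr, 15, -1));
    return 0;
}

Running this prints 1 iteration for the best case and 4 for the worst case with 16 elements, matching the O(log(n)) analysis above.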
So I was taught that using the recurrence relation of Fibonacci numbers, I could get an O(n) algorithm. But due to the large size of Fibonacci numbers for large n, the additions take proportionally longer, meaning that the time complexity is no longer linear.
That's all well and fine, but in this graph (source), why is there a number near 1800000 that takes significantly longer to compute than its neighbours?
EDIT: As mentioned in the answers, the outlier is around 180000, not 1800000
The outlier occurred at 180,000, not 1,800,000. I don't know exactly how big integers are stored in Python, but stored as binary, fib(180000) is about 125,000 bits, i.e. roughly 15-16 KB. I suspect an issue with the testing as to why 180,000 would take significantly longer than 181,000 or 179,000.
@cdlane mentioned the time it would take to compute fib(170000) through fib(200000), but with an increment of 1000 that is about 30 test cases, run 10 times each, which would take less than 20 minutes.
The linked article mentioned a matrix variation for calculating Fibonacci numbers, which is an O(log2(n)) process. This can be further optimized using a Lucas sequence, which follows similar logic (repeated squaring, as in raising a matrix to a power). Example C code for 64-bit unsigned integers:
#include <stdint.h>

uint64_t fibl(uint64_t n) {
    uint64_t a, b, p, q, qq, aq;
    a = q = 1;
    b = p = 0;
    while (1) {
        if (n & 1) {              /* odd bit: multiply current factor in */
            aq = a * q;
            a = b*q + aq + a*p;
            b = b*p + aq;
        }
        n >>= 1;
        if (n == 0)
            break;
        qq = q * q;               /* square (p, q) for the next bit of n */
        q = 2*p*q + qq;
        p = p*p + qq;
    }
    return b;                     /* fib(n); exact up to n = 93 in 64 bits */
}
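For reference, a possible driver (my addition, not part of the original answer), assuming the fibl above is defined in the same file:

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    for (uint64_t n = 90; n <= 93; n++)
        printf("fib(%" PRIu64 ") = %" PRIu64 "\n", n, fibl(n));
    /* fib(93) = 12200160415121876738 is the largest Fibonacci number
       that fits in a uint64_t; fib(94) would overflow. */
    return 0;
}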
Probably the measurement of time was done on a system where other tasks were running simultaneously, causing a single computation to take longer than expected.
Part of the graph that the OP didn't copy says:
Note: Each data point was averaged over 10 calculcations[sic].
So that seems to negate that explanation unless it was a huge difference and we're only seeing it averaged with normal times.
(My 2012 vintage Mac mini calculates the 200,000th number in 1/2 a second using the author's code. The machine that produced this data took over 4 seconds to calculate the 200,000th number. I pulled out a 2009 vintage MacBook to confirm that this is believable.)
I did my own crude graph with a modified version of the code that didn't restart at the beginning each time but kept a running total of the time and came up with the following chart:
The times are a bit longer than expected as the algorithm I used adds some printing overhead into each calculation. But no big anomaly between 180,000 and 195,000.
I recently learned about formal Big-O analysis of algorithms; however, I don't see why these 2 algorithms, which do virtually the same thing, would have drastically different running times. The algorithms both print numbers 0 up to n. I will write them in pseudocode:
Algorithm 1:
def countUp(int n){
    for(int i = 0; i <= n; i++){
        print(n);
    }
}
Algorithm 2:
def countUp2(int n){
    for(int i = 0; i < 10; i++){
        for(int j = 0; j < 10; j++){
            ... (continued so that this can print out all values 0 - Integer.MAX_VALUE)
            for(int z = 0; z < 10; z++){
                print("" + i + j + ... + z);
                if(("" + i + j + ... + z).stringToInt() == n){
                    quit();
                }
            }
        }
    }
}
So, the first algorithm runs in O(n) time, whereas the second algorithm (depending on the programming language) runs in something close to O(n^10). Is there anything with the code that causes this to happen, or is it simply the absurdity of my example that "breaks" the math?
In countUp, the loop hits all numbers in the range [0,n] once, thus resulting in a runtime of O(n).
In countUp2, you do essentially the same thing, many times over. The bounds on all your loops are 10.
Say you have 3 loops running with a bound of 10. The outer loop does 10 iterations, the middle one 10x10, the innermost 10x10x10. So in the worst case your innermost loop runs 1000 times, which is essentially constant. More generally, for k loops with bounds [0, 10), your runtime is 10^k, which again can be called constant time, O(1), since it does not depend on n in the worst-case analysis.
Assuming you can write enough loops and that the size of n is not a factor, you would need a loop for every digit of n. The number of digits in n is int(math.floor(math.log10(n))) + 1; let's call this dig. So a stricter upper bound on the number of iterations would be 10^dig (which can be reduced to O(n); the proof is left to the reader as an exercise).
When analyzing the runtime of an algorithm, one key thing to look at is the loops. In algorithm 1, you have code that executes n times, making the runtime O(n). In algorithm 2, you have nested loops that each run 10 times, so with three of them the runtime is O(10^3), i.e. a constant. The code runs the innermost loop 10 times for each run of the middle loop, which in turn runs 10 times for each run of the outermost loop, so the code runs 10x10x10 times. (This is purely an upper bound, however, because the if-statement may end the algorithm before the looping is complete, depending on the value of n.)
To count up to n in countUp2, you need as many loops as n has digits, i.e. about log10(n) of them. Each loop can run 10 times, so the total number of iterations is 10^(log10 n) = n, which is O(n).
The first runs in O(n log n) time, since print(n) outputs O(log n) digits.
The second program assumes an upper limit for n, so it is trivially O(1). When we do complexity analysis, we assume a more abstract version of the programming language where (usually) integers are unbounded but arithmetic operations still perform in O(1). In your example you're mixing up the actual programming language (which has bounded integers) with this more abstract model (which doesn't). If you rewrite the program[*] so that it has a dynamically adjustable number of loops depending on n (so if your number n has k digits, then there are k+1 nested loops), then it does one iteration of the innermost code for each number from 0 up to the next power of 10 after n. The inner loop does O(log n) work[**] as it constructs the string, so overall this program too is O(n log n).
[*] you can't use for loops and variables to do this; you'd have to use recursion or something similar, and an array instead of the variables i, j, k, ..., z.
[**] that's assuming your programming language optimizes the addition of k length-1 strings so that it runs in O(k) time. The obvious string concatenation implementation would be O(k^2) time, meaning your second program would run in O(n(log n)^2) time.
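To make footnote [*] concrete, here is a sketch of my own (the name count_up and its parameters are made up for illustration) that replaces the fixed nest of loops with recursion over digit positions, so the nesting depth grows with the number of digits of n:

#include <stdio.h>
#include <stdlib.h>

/* Recursively simulates `depth` nested base-10 loops. `buf` holds the
   digits chosen so far; once every position is filled we have one
   candidate number, mirroring the innermost print/compare step. */
static int count_up(char *buf, int pos, int depth, long n) {
    if (pos == depth) {
        buf[pos] = '\0';
        long value = atol(buf);       /* the "stringToInt" step */
        printf("%s\n", buf);
        return value == n;            /* nonzero means "quit()" */
    }
    for (int d = 0; d < 10; d++) {    /* one virtual loop level */
        buf[pos] = '0' + d;
        if (count_up(buf, pos + 1, depth, n))
            return 1;
    }
    return 0;
}

int main(void) {
    long n = 42;
    int digits = snprintf(NULL, 0, "%ld", n);  /* number of digits of n */
    char buf[32];
    count_up(buf, 0, digits, n);  /* enumerates 00, 01, ..., 42, then stops */
    return 0;
}

Because the recursion depth tracks the digit count of n, the enumeration performs on the order of n iterations, matching the O(n log n) analysis above once the string work is included.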
I am having a hard time understanding the efficiency of an algorithm, and how you really determine whether a particular statement or part is lg n, O(N), or log base 2 (n).
I have two examples over here.
doIt() can be expressed as O(n^2).
First example.
i = 1
loop (i < n)
    doIt(…)
    i = i × 2
end loop
The cost of the above is as follows:
i = 1           ... 1
loop (i < n)    ... lg n
    doIt(…)     ... n^2 lg n
    i = i × 2   ... lg n
end loop
Second example:
static int myMethod(int n){
    int i = 1;
    for(i = 1; i <= n; i = i * 2)
        doIt();
    return 1;
}
The cost of the above is as follows:
static int myMethod(int n){          ... 1
    int i = 1;                       ... 1
    for(i = 1; i <= n; i = i * 2)    ... log base 2 (n)
        doIt();                      ... log base 2 (n) * n^2
    return 1;                        ... 1
}
All this has left me wondering: how do you really figure out which cost is which? I've been asking around, trying to understand, but no one has really been able to explain it to me. I badly want to understand how to determine the cost. Can anyone help me with this?
Big-O notation does not measure how long the program will run. It says how fast the running time increases as the size of the problem grows.
For example, if calculating something is O(1), it could take a very long time, but that time is independent of the size of the problem.
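As an illustration (a made-up sketch of my own, not from the answer), both functions below do real work, but only one of them has a cost that grows with n:

#include <stdio.h>

/* O(1): always does 100 million iterations regardless of n.
   Slow in wall-clock terms, but the cost does not grow with n. */
long constant_work(long n) {
    (void)n;                          /* n is deliberately ignored */
    long acc = 0;
    for (long i = 0; i < 100000000L; i++)
        acc += i;
    return acc;
}

/* O(n): one iteration per unit of n. Fast for small n,
   but the cost grows linearly with the problem size. */
long linear_work(long n) {
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc += i;
    return acc;
}

int main(void) {
    printf("%ld %ld\n", constant_work(10), linear_work(10));
    return 0;
}

For small n, constant_work is far slower than linear_work, yet its big-O class is smaller; big-O only describes growth, not absolute speed.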
Normally you are not expected to estimate the cost of things like the loop iterator (storing one integer value and changing it N times is too minor to include in the estimate).
What really matters is that in terms of Big-O, Big-Theta, etc. you are expected to find a functional dependence, i.e. a function F of one argument (N) for which:
Big-O: the algorithm's operation count grows no faster than F(N)
Big-Theta: the algorithm's operation count grows at the same rate as F(N)
Big-Omega: the algorithm's operation count grows at least as fast as F(N)
So remember: you are not trying to find an exact number of operations; you are trying to find a functional estimate, i.e. the dependence between the amount of incoming data N and some function of N that indicates how fast the operation count grows.
So O(1), for example, indicates that the whole algorithm does not depend on N (it is constant). You can read more here.
Also, there are different kinds of estimates. You can estimate memory or execution time, for example; in general these are different estimates.
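As a quick empirical check of the lg n counts in the examples above (a sketch of my own; doIt() is stubbed out since its body isn't given):

#include <stdio.h>

static void doIt(void) { /* stub: assume O(n^2) work as stated */ }

int main(void) {
    for (int n = 2; n <= 1 << 20; n *= 4) {
        int iterations = 0;
        for (int i = 1; i < n; i *= 2) {  /* the loop being analyzed */
            doIt();
            iterations++;
        }
        printf("n = %8d  iterations = %2d\n", n, iterations);
        /* iterations tracks lg n: 1, 3, 5, ... as n quadruples */
    }
    return 0;
}

Doubling i until it reaches n takes about log base 2 (n) steps, which is exactly the cost attached to the loop line in both tables.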
sum = 0;
for (int i = 0; i < N; i++)
    for (int j = i; j >= 0; j--)
        sum++;
From what I understand, the first line is 1 operation, 2nd line is (i+1) operations, 3rd line is (i-1) operations, and 4th line is n operations. Does this mean that the running time would be 1 + (i+1)(i-1) + n? It's just these last steps that confuse me.
To analyze the algorithm you don't want to go line by line asking "how much time does this particular line contribute?" The reason is that each line doesn't execute the same number of times. For example, the innermost line is executed a whole bunch of times, compared to the first line which is run just once.
To analyze an algorithm like this, try identifying some quantity whose value is within a constant factor of the total runtime of the algorithm. In this case, that quantity would probably be "how many times does the line sum++ execute?", since if we know this value, we know the total amount of time spent by the two loops in the algorithm. To figure this out, let's trace what happens in these loops. On the first iteration of the outer loop, i == 0, so the inner loop executes exactly once (counting down from 0 to 0). On the second iteration, i == 1 and the inner loop executes exactly twice (once with j == 1, once with j == 0). More generally, when i == k, the inner loop executes k + 1 times. This means that the total number of iterations of the innermost loop is given by
1 + 2 + 3 + ... + N
This quantity can be shown to be equal to
N(N + 1)   N^2 + N   N^2   N
-------- = ------- = --- + -
   2          2       2    2
Of these two terms, the N^2 / 2 term is the dominant growth term, and so if we ignore its constant factor we get a runtime of O(N^2).
Don't look at this answer as something you should memorize - think of all of the steps required to get to the answer. We started by finding some quantity to count, and then saw how that quantity was influenced by the execution of the loops. From this, we were able to derive a mathematical expression for that quantity, which we then simplified. Finally, we took the resulting expression and determined the dominant term, which serves as the big-O for the overall function.
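To convince yourself, here is a small sketch (my addition) that counts the sum++ executions directly and compares them with N(N+1)/2:

#include <stdio.h>

int main(void) {
    for (int N = 1; N <= 5; N++) {
        long count = 0;
        for (int i = 0; i < N; i++)
            for (int j = i; j >= 0; j--)
                count++;              /* stands in for sum++ */
        printf("N = %d: count = %ld, N(N+1)/2 = %d\n",
               N, count, N * (N + 1) / 2);
    }
    return 0;
}

The two columns agree for every N, confirming the closed form derived above.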
Work from inside-out.
sum++
This is a single operation on its own, as it doesn't repeat.
for(int j = i; j >= 0; j--)
This loops i+1 times. There are several operations in there, but you probably don't mean to count individual asm instructions, so for this question I'll treat this as a multiplier of i+1. Since the loop body is a single operation, the loop and its block perform i+1 operations.
for(int i = 0; i < N; i++)
This loops N times, so as before it is a multiplier of N. Since the block performs i+1 operations for each i from 0 to N-1, this loop performs 1 + 2 + ... + N = N(N+1)/2 operations in total. And that's your answer! If you want to consider big-O complexity, this simplifies to O(N^2).
It's not additive: the inner loop happens once for EACH iteration of the outer loop. So it's O(n^2).
By the way, this is a good example of why we use asymptotic notation for this kind of thing: depending on the definition of "operation", the exact count can vary pretty widely. (Is sum++ a single operation, or is it "add sum to 1 giving temp; load temp to sum"?) But since we know that all that can be hidden in a constant factor, it's still going to be O(n^2).
No; you don't count a specific number of operations for each line and then add them up. The entire point of constructions like 'for' is to make it possible for a given line of code to run more than once. You're supposed to use thinking and logic skills to figure out how many times the line 'sum++' will run, as a function of N. Hint: it runs once for every time that the third line is encountered.
How many times is the second line encountered?
Each time the second line is encountered, the value of 'i' is set. How many times does the third line run with that value of i? Therefore, how many times will it run overall? (Hint: if I give you a different amount of money on several different occasions, how do you find out the total amount I gave you?)
Each time the third line is encountered, the fourth line happens once.
Which line happens most often? How often does it happen, in terms of N?
So what interests you is sum++ and how many times it executes.
The final value of sum gives you that answer.
Actually, your loop is just the sum of n for n from 1 to N,
which equals N(N+1)/2. In big-O notation this gives O(N^2).
Also, despite the title of your question, there is no worst case in your algorithm.
Or you could say that the worst case is when N goes to infinity.
Using Sigma notation to represent your loops:

N-1  i       N-1
 Σ   Σ 1  =   Σ  (i + 1)  =  1 + 2 + ... + N  =  N(N + 1)/2  =  O(N^2)
i=0 j=0      i=0