This is likely ground that has been covered but I have yet to find an explanation that I am able to understand. It is likely that I will soon feel embarrassed.
For instance, I am trying to find the order of magnitude using Big-O notation of the following:
count = 0;
for (i = 1; i <= N; i++)
count++;
Where do I begin to find what defines the magnitude? I'm relatively bad at mathematics and, even though I've tried a few resources, have yet to find something that can explain the way a piece of code is translated to an algebraic equation. Frankly, I can't even surmise a guess as to what the Big-O efficiency is regarding this loop.
These notations (big O, big omega, theta) describe how "difficult" (or complex) an algorithm becomes asymptotically, as its input gets bigger and bigger.
For big O, take two functions f(x) and g(x). Saying f(x) = O(g(x)) means you can find a constant c > 0 and a point x0 such that c*g(x) >= f(x) for every x >= x0. That is why the definition contains "asymptotically": the two functions may behave in any way at the beginning (for example f(x) > c*g(x) for the first few x), but from that single point on c*g(x) always stays on top. So you are interested in behavior in the long run, not for small numbers only. Big-O notation is sometimes called an upper bound because the growth will never be asymptotically faster than this function; in practice it is most often applied to the worst possible scenario.
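To make the "from a single point on" idea concrete, here is a minimal sketch that searches for that point numerically. The class name BigOCrossover, the method crossover, and the two sample functions f(n) = 100n + 500 and g(n) = n^2 are assumptions chosen for illustration; they do not come from the question.

```java
import java.util.function.LongUnaryOperator;

final class BigOCrossover {
    // Returns the smallest n0 such that g(n) >= f(n) for every n in [n0, limit].
    // A result of limit + 1 means no such point was found inside the range.
    static long crossover(LongUnaryOperator f, LongUnaryOperator g, long limit) {
        long lastViolation = 0;
        for (long n = 1; n <= limit; n++) {
            if (f.applyAsLong(n) > g.applyAsLong(n)) {
                lastViolation = n; // f is still above g here
            }
        }
        return lastViolation + 1;
    }

    public static void main(String[] args) {
        LongUnaryOperator f = n -> 100 * n + 500; // hypothetical linear cost
        LongUnaryOperator g = n -> n * n;         // hypothetical quadratic bound
        System.out.println(crossover(f, g, 10_000)); // prints 105
    }
}
```

For small n the linear function is larger, but from n = 105 onward n^2 stays on top, which is all that f(n) = O(n^2) asks for. A finite scan like this only illustrates the definition; it does not prove the bound.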
That is the "mathematical" part. When it comes to practice you usually ask: How many times the algorithm will have to process something? How many operations will be done?
For your simple loop, it is easy because as your N will grow, the complexity of algorithm will grow linearly (as simple linear function), so the complexity is O(N). For N=10 you will have to do 10 operations, for N=100 => 100 operations, for N=1000 => 1000 operations... So the growth is truly linear.
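One way to convince yourself is to run the loop from the question and look at the counter. This is only a sketch; the class name LinearCount and the method countTo are made up for the example.

```java
final class LinearCount {
    // Runs the loop from the question and returns how many times the body executed.
    static long countTo(int n) {
        long count = 0;
        for (int i = 1; i <= n; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println("N=" + n + " -> " + countTo(n) + " operations");
        }
    }
}
```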
I'll provide a few more examples:
for (int i = 0; i < N; i++) {
if (i == randomNumber()) {
// do something...
}
}
Here it seems that the complexity will be lower because I added a condition inside the loop, so there is a chance that the number of "do something" operations will be lower. But we don't know how many times the condition will pass. It may pass every time, and the check itself still runs N times, so using big-O (the worst case) we again have to say that the complexity is O(N).
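A small sketch of the same loop with counters shows both numbers. To keep it deterministic, the random comparison is swapped for a caller-supplied predicate; that swap, the class name ConditionalLoop, and the method run are assumptions for illustration.

```java
import java.util.function.IntPredicate;

final class ConditionalLoop {
    // Returns {number of condition checks, number of times the body ran}.
    static int[] run(int n, IntPredicate condition) {
        int checks = 0;
        int bodyRuns = 0;
        for (int i = 0; i < n; i++) {
            checks++;                 // the check happens on every iteration
            if (condition.test(i)) {
                bodyRuns++;           // "do something" happens only sometimes
            }
        }
        return new int[] {checks, bodyRuns};
    }
}
```

Whatever the predicate does, the check count is always N and the body count is at most N, so the loop never costs more than a constant times N.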
Another example:
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
// do something
}
}
Here, as N gets bigger and bigger, the number of operations grows more rapidly. N=10 means that you will have to do 10x10 operations, N=100 => 100x100 operations, N=1000 => 1000x1000 operations. You can see the growth is no longer linear; it is N x N, so we have O(N x N), usually written O(N^2).
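The same counting trick works for the nested loops. This is a sketch with made-up names (NestedCount, countNested), using a distinct inner variable j.

```java
final class NestedCount {
    // Counts how many times the innermost body of two nested N-loops executes.
    static long countNested(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNested(10));   // prints 100
        System.out.println(countNested(100));  // prints 10000
        System.out.println(countNested(1000)); // prints 1000000
    }
}
```

Multiplying N by 10 multiplies the work by 100, which is the signature of quadratic growth.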
For the last example I will use the idea of a full binary tree. I hope you know what a binary tree is. If you have just a reference to the root and you want to walk down to the left-most leaf (from top to bottom), how many operations will you have to do if the tree has N nodes? The algorithm would be something similar to:
Node actual = root;
while(actual.left != null) {
actual = actual.left;
}
// in actual we have left-most leaf
How many operations will you have to do (how long will the loop execute)? Well, that depends on the depth of the tree, right? And what is the depth of a full binary tree? It is about log(N), with base 2. So here the complexity is O(log(N)). Generally we don't care about the base of the logarithm; what we care about is the kind of function (linear, quadratic, logarithmic...).
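Here is a runnable sketch of that walk. It assumes a tree in which every level is completely filled; the Node class, the build helper, and the step counter are hypothetical additions, since the original snippet only shows the loop.

```java
final class LeftmostLeaf {
    static final class Node {
        Node left;
        Node right;
    }

    // Builds a tree with the given number of completely filled levels (levels >= 1),
    // so it has 2^levels - 1 nodes.
    static Node build(int levels) {
        Node node = new Node();
        if (levels > 1) {
            node.left = build(levels - 1);
            node.right = build(levels - 1);
        }
        return node;
    }

    // Walks from the root to the left-most leaf and counts the steps taken.
    static int stepsToLeftmostLeaf(Node root) {
        int steps = 0;
        Node actual = root;
        while (actual.left != null) {
            actual = actual.left;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int levels : new int[] {4, 10, 20}) {
            long nodes = (1L << levels) - 1;
            System.out.println(nodes + " nodes -> "
                    + stepsToLeftmostLeaf(build(levels)) + " steps");
        }
    }
}
```

With 15 nodes the walk takes 3 steps, with 1023 nodes 9 steps, and with 1048575 nodes 19 steps: the node count explodes while the step count grows by one per level.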
Your example is the order
O(N)
Where N=number of elements, and a comparable computation is performed on each, thus
for (int i=0; i < N; i++) {
// some process performed N times
}
Big-O notation is probably easier than you think; in everyday code you will find examples of O(N) in loops, list iterations, searches, and any other process that does work once per member of a set. It is the abstraction that is unfamiliar at first: O(N) means "some unit of work", repeated N times. This "something" can be an incrementing counter, as in your example, or it can be a lengthy and resource-intensive computation. Most of the time in algorithm design the 'big-O', or complexity, is more important than the unit of work, and this is especially relevant as N becomes large. The description 'limiting' or 'asymptotic' is mathematically significant: it means that an algorithm of lesser complexity will always beat one of greater complexity, no matter how significant the unit of work, given that N is large enough, or "as N grows".
Another example, to understand the general idea
for (int i=0; i < N; i++) {
for (int j=0; j < N; j++) {
// process here NxN times
}
}
Here the complexity is
O(N^2)
For example, if N=10, then the second "algorithm" will take 10 times longer than the first, because 10x10 = 100 (= ten times larger). If you consider what will happen when N equals, say, a million or a billion, you should be able to work out that it will take a million or a billion times longer. So if you can find a way to do something in O(N) that a super-computer does in O(N^2), you should be able to beat it with your old x386, pocket watch, or other old tool.
The following code reverses an array. What is its runtime?
My heart says it is O(n/2), but my friend says O(n). Which is correct? Please answer with a reason. Thank you so much.
void reverse(int[] array) {
for (int i = 0; i < array.length / 2; i++) {
int other = array.length - i - 1;
int temp = array[i];
array[i] = array[other];
array[other] = temp;
}
}
Big-O complexity captures how the run-time scales with n as n gets arbitrarily large. It isn't a direct measure of performance. f(n) = 1000n and f(n) = n/128 + 10^100 are both O(n) because they both scale linearly with n even though the first scales much more quickly than the second, and the second is actually prohibitively slow for all n because of the large constant cost. Nonetheless, they have the same complexity class. For these sorts of reasons, if you want to differentiate actual performance between algorithms or define the performance of any particular algorithm (rather than how performance scales with n) asymptotic complexity is not the best tool. If you want to measure performance, you can count the exact number of operations performed by the algorithm, or better yet, provide a representative set of inputs and just measure the execution time on those inputs.
As for the particular problem, yes, the for loop runs n/2 times, but you also do some constant number of operations, c, in each of those loops (subtractions, array accesses, variable assignments, conditional check on i). Maybe c=10, it's not really important to count precisely to determine the complexity class, just to know that it's constant. The run-time is then f(n)=c*n/2, which is O(n): the fact that you only do n/2 for-loops doesn't change the complexity class.
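As a rough check, the reverse method from the question can be instrumented to report how many loop iterations it performs. The class name ReverseCount and the returned counter are additions for illustration only.

```java
final class ReverseCount {
    // Reverses the array in place and returns the number of loop iterations.
    static int reverse(int[] array) {
        int iterations = 0;
        for (int i = 0; i < array.length / 2; i++) {
            int other = array.length - i - 1;
            int temp = array[i];
            array[i] = array[other];
            array[other] = temp;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(reverse(new int[1_000])); // prints 500
        System.out.println(reverse(new int[2_000])); // prints 1000
    }
}
```

Doubling n doubles the iteration count. The factor 1/2 is fixed, so it disappears into the constant and the class stays O(n).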
I'm having trouble understanding something. I'm not even sure if it's correct.
In Cracking the Coding Interview, there is a section that asks you to determine the Big O for a number of functions. For the most part, they're predictable.
However, one of them throws me for a loop.
Apparently, this evaluates to O(ab):
void printUnorderedPairs(int[] arrayA, int[] arrayB) {
for (int i = 0; i < arrayA.length; i++){
for (int j = 0; j < arrayB.length; j++){
for (int k = 0; k < 100000; k++){
System.out.println(arrayA[i] + "," + arrayB[j]);
}
}
}
}
With the rationale that
"100,000 units of work is still constant, so the runtime is O(ab)."
I'm trying to see why this could make sense, but I just can't yet; naturally, I expected O(abc).
Yes, 100,000 is a constant and arrayA and arrayB are arrays, but we're taking the length of the arrays. At the time of running these for loops, won't array[x].length be a constant (assuming the size of the arrays don't change during their execution)?
So, is the book right? If so, I would really appreciate insight and intuition so I don't fall into the same trap in the future.
Thanks!
Time complexity is generally expressed as the number of required elementary operations on an input of size n, where elementary operations are assumed to take a constant amount of time on a given computer and change only by a constant factor when run on a different computer.
O(ab) is the complexity in the above case because arrayA and arrayB are of variable length, fully determined by the calling function, while 100000 is a constant that no external factor can change.
Complexity is the measure of Unknown
The arrays A and B have an unspecified length, and all you can do is to give an indication of the complexity that is a function of these two lengths. Nothing else is variable in the given code.
What the authors meant by constant is a value that is fixed regardless of the input size, unlike the lengths of the input arrays, which might change. For instance, printUnorderedPairs might be called with different arrays as parameters, and those arrays might have different sizes.
The point of Big-O is to examine how the calculation grows as the inputs grow. It's clear that it would double if A doubled, and likewise if B doubled. So linear in those two.
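A sketch that counts the units of work instead of printing makes the scaling visible. The class name PairWork, the method units, and the constant name INNER are assumptions; only the loop structure is taken from the question.

```java
final class PairWork {
    static final int INNER = 100_000; // the fixed inner bound from the question

    // Counts how many times the innermost statement would run for arrays of
    // length a and b, without actually printing anything.
    static long units(int a, int b) {
        long units = 0;
        for (int i = 0; i < a; i++) {
            for (int j = 0; j < b; j++) {
                for (int k = 0; k < INNER; k++) {
                    units++;
                }
            }
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(units(4, 3)); // prints 1200000
        System.out.println(units(8, 3)); // prints 2400000
        System.out.println(units(4, 6)); // prints 2400000
    }
}
```

The count is a * b * 100000. Callers can change a and b, but nothing they pass in can change the 100000, so it acts as a constant factor and the growth is O(ab).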
What might be confusing you is that you could easily replace the 100k with C, yet another input that the runtime is linear in. But the code as written does not take the 100k as a variable; it's a constant.
A similar thing in Big-O problems is where you step through an array a fixed number of times. That doesn't change the Big-O. For example if you step through an array to find the max, that's O(n). Stepping through it twice to find the min and the max is... also O(n). And in fact it's the same as stepping through it once to find the min and max in a single sweep.
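The min/max example can be sketched as follows, counting how many array elements each version visits. The names MinMax, twoPasses, and onePass are hypothetical, and both methods assume a non-empty array.

```java
final class MinMax {
    // Two separate sweeps; returns {min, max, elements visited}.
    static int[] twoPasses(int[] a) {
        int visits = 0;
        int min = a[0];
        for (int x : a) {
            visits++;
            if (x < min) min = x;
        }
        int max = a[0];
        for (int x : a) {
            visits++;
            if (x > max) max = x;
        }
        return new int[] {min, max, visits};
    }

    // One combined sweep; returns {min, max, elements visited}.
    static int[] onePass(int[] a) {
        int visits = 0;
        int min = a[0];
        int max = a[0];
        for (int x : a) {
            visits++;
            if (x < min) min = x;
            if (x > max) max = x;
        }
        return new int[] {min, max, visits};
    }
}
```

For n elements the first version visits 2n elements and the second visits n. Both counts are a constant times n, so both are O(n).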
I've written this code for bubble sort. Can someone explain its time complexity to me? It works similarly to 2 for loops, but I still want to confirm the time complexity.
public int[] sortArray(int[] inpArr)
{
int i = 0;
int j = 0;
while(i != inpArr.length-1 && j != inpArr.length-1)
{
if(inpArr[i] > inpArr[i+1])
{
int temp = inpArr[i];
inpArr[i] = inpArr[i+1];
inpArr[i+1] = temp;
}
else
{
i++;
}
if(i==inpArr.length-1)
{
j++;
i = 0;
}
}
return inpArr;
}
This would have O(n^2) time complexity. Actually, it would probably be both O(n^2) and theta(n^2).
Look at the logic of your code. You are performing the following:
1. Loop through the input array.
2. If the current item is bigger than the next, switch the two.
3. If that is not the case, increase the index (and essentially check the next item, so repeat steps 1-2).
4. Once your index is length-1 of the input array, i.e. it has gone through the entire array, your index is reset (the i=0 line), j is increased, and the process restarts.
This means the array is walked from start to end once for every value of j, about n times in total, so you have a WORST-CASE time complexity of O(n^2). Because j has to reach length-1 before the loop stops, whatever the input looks like, the bound is tight and the running time is theta(n^2).
Some sorting algorithms have a BEST CASE that is better than their worst case, with a lower bound (big omega) such as omega(n lg(n)), but I'm not even sure that's achievable with this code.
Your time complexity is O(n^2) in the worst case. As written, the loop only stops once j reaches length-1, so even an already sorted array costs about n^2 comparisons; a bubble sort that stops after a pass with no swaps would have an O(n) best case. The average case still performs O(n^2) comparisons, though with fewer swaps than the worst case. This is because you're essentially doing the same thing as having two for loops. If you're interested in algorithmic efficiency, I'd recommend checking out pre-existing libraries that sort. The computer scientists who work on these sorts of things really are intense. Java's Arrays.sort() for object arrays uses Timsort, a merge-sort-based algorithm first written for Python. The disadvantage of your (and every) bubble sort is that it's really inefficient for big, disordered arrays.
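One way to check these bounds is to instrument the loop from the question and count comparisons. The sketch below keeps the original control flow; the class name WhileBubbleSort, the counter, and the guard for arrays shorter than two elements are additions for illustration.

```java
final class WhileBubbleSort {
    // Sorts in place with the question's while loop and returns the number of
    // element comparisons performed.
    static long sortAndCount(int[] inpArr) {
        long comparisons = 0;
        if (inpArr.length < 2) {
            return comparisons; // added guard: nothing to compare
        }
        int i = 0;
        int j = 0;
        while (i != inpArr.length - 1 && j != inpArr.length - 1) {
            comparisons++;
            if (inpArr[i] > inpArr[i + 1]) {
                int temp = inpArr[i];
                inpArr[i] = inpArr[i + 1];
                inpArr[i + 1] = temp;
            } else {
                i++;
            }
            if (i == inpArr.length - 1) {
                j++;   // one full pass finished
                i = 0;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCount(new int[] {1, 2, 3, 4, 5})); // prints 16
        System.out.println(sortAndCount(new int[] {5, 4, 3, 2, 1})); // prints 26
    }
}
```

For five elements, a sorted input costs (n-1)^2 = 16 comparisons and a reversed input costs 16 plus one extra comparison per swap, 26 in total. Both grow quadratically, because nothing in the loop lets it stop before j reaches length-1.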
I recently learned about formal Big-O analysis of algorithms; however, I don't see why these 2 algorithms, which do virtually the same thing, would have drastically different running times. The algorithms both print numbers 0 up to n. I will write them in pseudocode:
Algorithm 1:
def countUp(int n){
for(int i = 0; i <= n; i++){
print(i);
}
}
Algorithm 2:
def countUp2(int n){
for(int i = 0; i < 10; i++){
for(int j = 0; j < 10; j++){
... (continued so that this can print out all values 0 - Integer.MAX_VALUE)
for(int z = 0; z < 10; z++){
print("" + i + j + ... + k);
if(("" + i + j + k).stringToInt() == n){
quit();
}
}
}
}
}
So, the first algorithm runs in O(n) time, whereas the second algorithm (depending on the programming language) runs in something close to O(n^10). Is there anything with the code that causes this to happen, or is it simply the absurdity of my example that "breaks" the math?
In countUp, the loop hits all numbers in the range [0,n] once, thus resulting in a runtime of O(n).
In countUp2, you do roughly the same thing, a bunch of times. The bounds on all your loops are 10.
Say you have 3 loops, each running with a bound of 10. The outer loop body runs 10 times, the middle one 10x10 times, and the innermost 10x10x10 times. So in the worst case your innermost loop body runs 1000 times, which is essentially constant time. In general, for k loops with bounds [0, 10), the body runs 10^k times, which again can be called constant time, O(1), as long as k is fixed and does not depend on n.
Assuming you can write enough loops and that the size of n is not a factor, then you would need a loop for every single digit of n. The number of digits in n is int(math.floor(math.log10(n))) + 1; let's call this dig. So, a stricter upper bound on the number of iterations would be 10^dig (which can be reduced to O(n); proof is left to the reader as an exercise).
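A quick numeric sketch of why 10^dig stays within a constant factor of n: since n has dig digits, 10^(dig-1) <= n < 10^dig, so 10^dig <= 10n. The class and method names below are made up, and the digit count is computed by repeated division, which for n >= 1 gives the same value as the floor(log10(n)) + 1 formula.

```java
final class DigitLoops {
    // Number of decimal digits of n, for n >= 1.
    static int digits(long n) {
        int d = 0;
        while (n > 0) {
            n /= 10;
            d++;
        }
        return d;
    }

    // Worst-case iteration count with one 10-way loop per digit of n: 10^dig.
    static long worstCaseIterations(long n) {
        long total = 1;
        int dig = digits(n);
        for (int k = 0; k < dig; k++) {
            total *= 10;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseIterations(999));  // prints 1000
        System.out.println(worstCaseIterations(1000)); // prints 10000
    }
}
```

The iteration count is always more than n and never more than 10n, which is exactly what O(n) allows.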
When analyzing the runtime of an algorithm, one key thing to look for is the loops. In algorithm 1, you have code that executes n times, making the runtime O(n). In algorithm 2, you have nested loops that each run 10 times, so you have a runtime of O(10^3). This is because your code runs the innermost loop 10 times for each run of the middle loop, which in turn runs 10 times for each run of the outermost loop. So the code runs 10x10x10 times. (This is purely an upper bound however, because your if-statement may end the algorithm before the looping is complete, depending on the value of n).
To count up to n in countUp2, then you need the same number of loops as the number of digits in n: so log(n) loops. Each loop can run 10 times, so the total number of iterations is 10^log(n) which is O(n).
The first runs in O(n log n) time, since each print call outputs O(log n) digits.
The second program assumes an upper limit for n, so is trivially O(1). When we do complexity analysis, we assume a more abstract version of the programming language where (usually) integers are unbounded but arithmetic operations still perform in O(1). In your example you're mixing up the actual programming language (which has bounded integers) with this more abstract model (which doesn't). If you rewrite the program[*] so that it has a dynamically adjustable number of loops depending on n (so if your number n has k digits, then there's k+1 nested loops), then it does one iteration of the innermost code for each number from 0 up to the next power of 10 after n. The inner loop does O(log n) work[**] as it constructs the string, so overall this program too is O(n log n).
[*] you can't use for loops and variables to do this; you'd have to use recursion or something similar, and an array instead of the variables i, j, k, ..., z.
[**] that's assuming your programming language optimizes the addition of k length-1 strings so that it runs in O(k) time. The obvious string concatenation implementation would be O(k^2) time, meaning your second program would run in O(n(log n)^2) time.
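As a rough illustration of footnote [*], here is one possible way to replace the hand-written nested loops with an array of digits that is advanced like an odometer. It is a sketch under assumptions, not the rewrite the answer had in mind: it uses exactly as many digit positions as n has digits, it stops as soon as n is reached (like the quit() in the question), and the names OdometerCount and countUp2 are chosen for the example.

```java
final class OdometerCount {
    // Visits 0, 1, 2, ..., n using an array of decimal digits instead of
    // nested loops, and returns how many numbers were visited. Assumes n >= 0.
    static long countUp2(int n) {
        int width = Integer.toString(n).length();
        int[] digits = new int[width]; // digits[0] is the most significant digit
        long visited = 0;
        while (true) {
            visited++;
            // Rebuild the current value from its digits: O(width) = O(log n) work.
            int value = 0;
            for (int d : digits) {
                value = value * 10 + d;
            }
            if (value == n) {
                return visited;
            }
            // Advance the odometer: roll trailing 9s over to 0, then bump a digit.
            int pos = width - 1;
            while (digits[pos] == 9) {
                digits[pos] = 0;
                pos--;
            }
            digits[pos]++;
        }
    }

    public static void main(String[] args) {
        System.out.println(countUp2(123)); // prints 124
    }
}
```

Each of the n + 1 visited numbers costs O(log n) work to rebuild from its digits, which matches the O(n log n) total described above.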
The question is rather simple, but I just can't find a good enough answer. On the most upvoted SO question regarding the big-O notation, it says that:
For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering).
Now let's consider the simple bubble sort algorithm:
for (int i = arr.length - 1; i > 0; i--) {
for (int j = 0; j < i; j++) {
if (arr[j] > arr[j+1]) {
switchPlaces(...)
}
}
}
I know that worst case is O(n²) and best case is O(n), but what is n exactly? If we attempt to sort an already sorted array (best case), we would end up doing nothing, so why is it still O(n)? We are looping through 2 for-loops still, so if anything it should be O(n²). n can't be the number of comparison operations, because we still compare all the elements, right?
When analyzing the Big-O performance of sorting algorithms, n typically represents the number of elements that you're sorting.
So, for example, if you're sorting n items with Bubble Sort, the runtime performance in the worst case will be on the order of O(n^2) operations. This is why Bubble Sort is considered to be an extremely poor sorting algorithm, because it doesn't scale well with increasing numbers of elements to sort. As the number of elements to sort increases linearly, the worst case runtime increases quadratically.
Here is an example graph demonstrating how various algorithms scale in terms of worst-case runtime as the problem size N increases. The dark-blue line represents an algorithm that scales linearly, while the magenta/purple line represents a quadratic algorithm.
Notice that for sufficiently large N, the quadratic algorithm eventually takes longer than the linear algorithm to solve the problem.
Graph taken from http://science.slc.edu/~jmarshall/courses/2002/spring/cs50/BigO/.
See Also
The formal definition of Big-O.
I think two things are getting confused here, n and the function of n that is being bounded by the Big-O analysis.
By convention, for any algorithm complexity analysis, n is the size of the input if nothing different is specified. For any given algorithm, there are several interesting functions of the input size for which one might calculate asymptotic bounds such as Big-O.
The commonest such function for a sorting algorithm is the worst case number of comparisons. If someone says a sorting algorithm is O(n^2), without specifying anything else, I would assume they mean the worst case comparison count is O(n^2), where n is the input size.
Another interesting function is the amount of work space, that is, space in addition to the array being sorted. Bubble sort's work space is O(1), constant space, because it only uses a few variables regardless of the array size.
Bubble sort can be coded to do only n-1 array element comparisons in the best case, by finishing after any pass that does no exchanges. See this pseudo code implementation, which uses swapped to remember whether there were any exchanges. If the array is already sorted the first pass does no exchanges, so the sort finishes after one pass.
n is usually the size of the input. For an array, that would be the number of elements.
To see the different cases, you would need to change the algorithm:
for (int i = arr.length - 1; i > 0 ; i--) {
boolean swapped = false;
for (int j = 0; j<i; j++) {
if (arr[j] > arr[j+1]) {
switchPlaces(...);
swapped = true;
}
}
if(!swapped) {
break;
}
}
Your algorithm's best/worst cases are both O(n^2), but with the possibility of returning early, the best-case is now O(n).
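To compare the two variants side by side, the sketch below counts comparisons with and without the early exit. The class name BubbleSortCount, the earlyExit flag, and the inlined swap (standing in for switchPlaces) are assumptions made so the example is self-contained.

```java
final class BubbleSortCount {
    // Bubble sort that returns the number of element comparisons performed.
    // With earlyExit set, it stops after the first pass that swaps nothing.
    static long sort(int[] arr, boolean earlyExit) {
        long comparisons = 0;
        for (int i = arr.length - 1; i > 0; i--) {
            boolean swapped = false;
            for (int j = 0; j < i; j++) {
                comparisons++;
                if (arr[j] > arr[j + 1]) {
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (earlyExit && !swapped) {
                break;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(sort(new int[] {1, 2, 3, 4, 5}, false)); // prints 10
        System.out.println(sort(new int[] {1, 2, 3, 4, 5}, true));  // prints 4
        System.out.println(sort(new int[] {5, 4, 3, 2, 1}, true));  // prints 10
    }
}
```

On sorted input of length n, the plain version still does n(n-1)/2 comparisons while the early-exit version does n-1. On reversed input both do n(n-1)/2, so only the best case improves.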
n is the array length. You want to find T(n), the algorithm's complexity.
Accessing memory is much more expensive than checking an if condition, so you define T(n) to be the number of memory accesses.
In the given algorithm, both the best case (BC) and the worst case (WC) use O(n^2) memory accesses, because you check the if-condition O(n^2) times.
To make the complexity better: hold a flag, and if you don't do any swaps during a pass of the main loop, your array is sorted and you can break.
Now, in the BC the array is already sorted and you access all elements once, so O(n).
And in the WC it is still O(n^2).