When (not how or why) to calculate Big O of an algorithm - algorithm

I was asked this question in an interview recently and was curious as to what others thought.
"When should you calculate Big O?"
Most sites/books talk about HOW to calc Big O but not actually when you should do it. I'm an entry level developer and I have minimal experience so I'm not sure if I'm thinking on the right track. My thinking is you would have a target Big O to work towards, develop the algorithm then calculate the Big O. Then try to refactor the algorithm for efficiency.
My question then becomes is this what actually happens in industry or am I far off?

"When should you calculate Big O?"
When you care about the Time Complexity of the algorithm.
When do I care?
When you need to make your algorithm to be able to scale, meaning that it's expected to have big datasets as input to your algorithm (e.g. number of points and number of dimensions in a nearest neighbor algorithm).
Most notably, when you want to compare algorithms!
You are asked to do a task, for which several algorithms can be applied to. Which one do you choose? You compare the Space, Time and development/maintenance complexities of them, and choose the one that best fits your needs.

Big O or asymptotic notations allow us to analyze an algorithm's running time by identifying its behavior as the input size for the algorithm increases.
So whenever you need to analyse your algorithm's behavior with respect to growth of the input, you will calculate this. Let me give you an example -
Suppose you need to query over 1 billion data. So you wrote a linear search algorithm. So is it okay? How would you know? You will calculate Big-o. It's O(n) for linear search. So in worst case it would execute 1 billion instruction to query. If your machine executes 10^7 instruction per second(let's assume), then it would take 100 seconds. So you see - you are getting an runtime analysis in terms of growth of the input.

When we are solving an algorithmic problem we want to test the algorithm irrespective of hardware where we are running the algorithm. So we have certain asymptotic notation using which we can define the time and space complexities of our algorithm.
Theta-Notation: Used for defining average case complexity as it bounds the function from top and bottom
Omega-Notation: Bounds the function from below. It is used for best-time complexity
Big-O Notation: This is important as it tells about worst-case complexity and it bounds the function from top.
Now I think the answer to Why BIG-O is calculated is that using it we can get a fair idea that how bad our algorithm can perform as the size of input increases. And If we can optimize our algorithm for worst case then average and best case will take care for themselves.

I assume that you want to ask "when should I calculate time complexity?", just to avoid technicalities about Theta, Omega and Big-O.
Right attitude is to guess it almost always. Notable exceptions include piece of code you want to run just once and you are sure that it will never receive bigger input.
The emphasis on guess is because it does not matter that much whether complexity is constant or logarithmic. There is also a little difference between O(n^2) and O(n^2 log n) or between O(n^3) and O(n^4). But there is a big difference between constant and linear.
The main goal of the guess, is the answer to the question: "What happens if I get 10 times larger input?". If complexity is constant, nothing happens (in theory at least). If complexity is linear, you will get 10 times larger running time. If complexity is quadratic or bigger, you start to have problems.
Secondary goal of the guess is the answer to question: 'What is the biggest input I can handle?". Again quadratic will get you up to 10000 at most. O(2^n) ends around 25.
This might sound scary and time consuming, but in practice, getting time complexity of the code is rather trivial, since most of the things are either constant, logarithmic or linear.

It represents the upper bound.
Big-oh is the most useful because represents the worst-case behavior. So, it guarantees that the program will terminate within a certain time period, it may stop earlier, but never later.
It gives the worst time complexity or maximum time require to execute the algorithm

Related

Why big-Oh is not always a worst case analysis of an algorithm?

I am trying to learn analysis of algorithms and I am stuck with relation between asymptotic notation(big O...) and cases(best, worst and average).
I learn that the Big O notation defines an upper bound of an algorithm, i.e. it defines function can not grow more than its upper bound.
At first it sound to me as it calculates the worst case.
I google about(why worst case is not big O?) and got ample of answers which were not so simple to understand for beginner.
I concluded it as follows:
Big O is not always used to represent worst case analysis of algorithm because, suppose a algorithm which takes O(n) execution steps for best, average and worst input then it's best, average and worst case can be expressed as O(n).
Please tell me if I am correct or I am missing something as I don't have anyone to validate my understanding.
Please suggest a better example to understand why Big O is not always worst case.
Big-O?
First let us see what Big O formally means:
In computer science, big O notation is used to classify algorithms
according to how their running time or space requirements grow as the
input size grows.
This means that, Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. Here, O means order of the function, and it only provides an upper bound on the growth rate of the function.
Now let us look at the rules of Big O:
If f(x) is a sum of several terms, if there is one with largest
growth rate, it can be kept, and all others omitted
If f(x) is a product of several factors, any constants (terms in the
product that do not depend on x) can be omitted.
Example:
f(x) = 6x^4 − 2x^3 + 5
Using the 1st rule we can write it as, f(x) = 6x^4
Using the 2nd rule it will give us, O(x^4)
What is Worst Case?
Worst case analysis gives the maximum number of basic operations that
have to be performed during execution of the algorithm. It assumes
that the input is in the worst possible state and maximum work has to
be done to put things right.
For example, for a sorting algorithm which aims to sort an array in ascending order, the worst case occurs when the input array is in descending order. In this case maximum number of basic operations (comparisons and assignments) have to be done to set the array in ascending order.
It depends on a lot of things like:
CPU (time) usage
memory usage
disk usage
network usage
What's the difference?
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.
The important point here is we're talking in terms of growth, not number of operations. However, with algorithms we do talk about the number of operations relative to the input size.
Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.
In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.
When someone says that the time complexity of MergeSort is O(nlogn), they usually mean that the number of comparisons that MergeSort makes is O(nlogn). That in itself doesn’t tell us what the time complexity of any particular MergeSort might be because that would depend how much time it takes to make a comparison. In other words, the O(nlogn) refers to comparisons as the primitive operation.
The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is O(nlogn), is implicitly referencing an model of computation where a comparison takes constant time and everything else is free.
Example -
If we are sorting strings that are kk bytes long, we might take “read a byte” as a primitive operation that takes constant time with everything else being free.
In this model, MergeSort makes O(nlogn) string comparisons each of which makes O(k) byte comparisons, so the time complexity is O(k⋅nlogn). One common implementation of RadixSort will make k passes over the n strings with each pass reading one byte, and so has time complexity O(nk).
The two are not the same thing.  Worst-case analysis as other have said is identifying instances for which the algorithm takes the longest to complete (i.e., takes the most number of steps), then formulating a growth function using this.  One can analyze the worst-case time complexity using Big-Oh, or even other variants such as Big-Omega and Big-Theta (in fact, Big-Theta is usually what you want, though often Big-Oh is used for ease of comprehension by those not as much into theory).  One important detail and why worst-case analysis is useful is that the algorithm will run no slower than it does in the worst case.  Worst-case analysis is a method of analysis we use in analyzing algorithms.
Big-Oh itself is an asymptotic measure of a growth function; this can be totally independent as people can use Big-Oh to not even measure an algorithm's time complexity; its origins stem from Number Theory.  You are correct to say it is the asymptotic upper bound of a growth function; but the manner you prescribe and construct the growth function comes from your analysis.  The Big-Oh of a growth function itself means little to nothing without context as it only says something about the function you are analyzing.  Keep in mind there can be infinitely many algorithms that could be constructed that share the same time complexity (by the definition of Big-Oh, Big-Oh is a set of growth functions).
In short, worst-case analysis is how you build your growth function, Big-Oh notation is one method of analyzing said growth function.  Then, we can compare that result against other worst-case time complexities of competing algorithms for a given problem.  Worst-case analysis if done correctly yields the worst-case running time if done exactly (you can cut a lot of corners and still get the correct asymptotics if you use a barometer), and using this growth function yields the worst-case time complexity of the algorithm.  Big-Oh alone doesn't guarantee the worst-case time complexity as you had to make the growth function itself.  For instance, I could utilize Big-Oh notation for any other kind of analysis (e.g., best case, average case).  It really depends on what you're trying to capture.  For instance, Big-Omega is great for lower bounds.
Imagine a hypothetical algorithm that in best case only needs to do 1 step, in the worst case needs to do n2 steps, but in average (expected) case, only needs to do n steps. With n being the input size.
For each of these 3 cases you could calculate a function that describes the time complexity of this algorithm.
1 Best case has O(1) because the function f(x)=1 is really the highest we can go, but also the lowest we can go in this case, omega(1). Since Omega is equal to O (the upper bound and lower bound), we state that this function, in the best case, behaves like theta(1).
2 We could do the same analysis for the worst case and figure out that O(n2 ) = omega(n2 ) =theta(n2 ).
3 Same counts for the average case but with theta( n ).
So in theory you could determine 3 cases of an algorithm and for those 3 cases calculate the lower/upper/thight bounds. I hope this clears things up a bit.
https://www.google.co.in/amp/s/amp.reddit.com/r/learnprogramming/comments/3qtgsh/how_is_big_o_not_the_same_as_worst_case_or_big/
Big O notation shows how an algorithm grows with respect to input size. It says nothing of which algorithm is faster because it doesn't account for constant set up time (which can dominate if you have small input sizes). So when you say
which takes O(n) execution steps
this almost doesn't mean anything. Big O doesn't say how many execution steps there are. There are C + O(n) steps (where C is a constant) and this algorithm grows at rate n depending on input size.
Big O can be used for best, worst, or average cases. Let's take sorting as an example. Bubble sort is a naive O(n^2) sorting algorithm, but when the list is sorted it takes O(n). Quicksort is often used for sorting (the GNU standard C library uses it with some modifications). It preforms at O(n log n), however this is only true if the pivot chosen splits the array in to two equal sized pieces (on average). In the worst case we get an empty array one side of the pivot and Quicksort performs at O(n^2).
As Big O shows how an algorithm grows with respect to size, you can look at any aspect of an algorithm. Its best case, average case, worst case in both time and/or memory usage. And it tells you how these grow when the input size grows - but it doesn't say which is faster.
If you deal with small sizes then Big O won't matter - but an analysis can tell you how things will go when your input sizes increase.
One example of where the worst case might not be the asymptotic limit: suppose you have an algorithm that works on the set difference between some set and the input. It might run in O(N) time, but get faster as the input gets larger and knocks more values out of the working set.
Or, to get more abstract, f(x) = 1/x for x > 0 is a decreasing O(1) function.
I'll focus on time as a fairly common item of interest, but Big-O can also be used to evaluate resource requirements such as memory. It's essential for you to realize that Big-O tells how the runtime or resource requirements of a problem scale (asymptotically) as the problem size increases. It does not give you a prediction of the actual time required. Predicting the actual runtimes would require us to know the constants and lower order terms in the prediction formula, which are dependent on the hardware, operating system, language, compiler, etc. Using Big-O allows us to discuss algorithm behaviors while sidestepping all of those dependencies.
Let's talk about how to interpret Big-O scalability using a few examples. If a problem is O(1), it takes the same amount of time regardless of the problem size. That may be a nanosecond or a thousand seconds, but in the limit doubling or tripling the size of the problem does not change the time. If a problem is O(n), then doubling or tripling the problem size will (asymptotically) double or triple the amounts of time required, respectively. If a problem is O(n^2), then doubling or tripling the problem size will (asymptotically) take 4 or 9 times as long, respectively. And so on...
Lots of algorithms have different performance for their best, average, or worst cases. Sorting provides some fairly straightforward examples of how best, average, and worst case analyses may differ.
I'll assume that you know how insertion sort works. In the worst case, the list could be reverse ordered, in which case each pass has to move the value currently being considered as far to the left as possible, for all items. That yields O(n^2) behavior. Doubling the list size will take four times as long. More likely, the list of inputs is in randomized order. In that case, on average each item has to move half the distance towards the front of the list. That's less than in the worst case, but only by a constant. It's still O(n^2), so sorting a randomized list that's twice as large as our first randomized list will quadruple the amount of time required, on average. It will be faster than the worst case (due to the constants involved), but it scales in the same way. The best case, however, is when the list is already sorted. In that case, you check each item to see if it needs to be slid towards the front, and immediately find the answer is "no," so after checking each of the n values you're done in O(n) time. Consequently, using insertion sort for an already ordered list that is twice the size only takes twice as long rather than four times as long.
You are right, in that you can say certainly say that an algorithm runs in O(f(n)) time in the best or average case. We do that all the time for, say, quicksort, which is O(N log N) on average, but only O(N^2) worst case.
Unless otherwise specified, however, when you say that an algorithm runs in O(f(n)) time, you are saying the algorithm runs in O(f(n)) time in the worst case. At least that's the way it should be. Sometimes people get sloppy, and you will often hear that a hash table is O(1) when in the worst case it is actually worse.
The other way in which a big O definition can fail to characterize the worst case is that it's an upper bound only. Any function in O(N) is also in O(N^2) and O(2^N), so we would be entirely correct to say that quicksort takes O(2^N) time. We just don't say that because it isn't useful to do so.
Big Theta and Big Omega are there to specify lower bounds and tight bounds respectively.
There are two "different" and most important tools:
the best, worst, and average-case complexity are for generating numerical function over the size of possible problem instances (e.g. f(x) = 2x^2 + 8x - 4) but it is very difficult to work precisely with these functions
big O notation extract the main point; "how efficient the algorithm is", it ignore a lot of non important things like constants and ... and give you a big picture

Big O based comparison of an algorithm's running time

This was during a Data Structures and Algorithms lesson that I encountered this doubt. I have looked up several resources,both online and offline, but still have doubts. I was asked to figure out the running time of an algorithm and I did that correctly - O(n^3) . But what puzzled me was this question - how much slower does the algorithm work if n goes from ,say, 90 to 900,000 ? My doubts were :
The algorithm has a running time of O(n^3) , so obviously it will take more time for a larger input. But how do I compare an algorithm's performance for different inputs based on just the worst case time complexity ?
Can we just plug in the values of 'n' and divide the big-O to get a ratio ?
Please correct me wherever I am mistaken ! Thanks !
Obviously,your thinking is correct!
As the algorithm goes for the size(n),IN THE WORST-CASE,if size is increased the speed will go inversely with n^3.
Your assumption is correct.You just divide after putting values for n=900,000 by n=90 and obtain the result.This factor will be the slowing factor!
Here,slowing factor = (900,000)^3 / (90)^3 = 10^12 !
Hence, slow factor=10^12 .Your program will slow down by a factor of 10^12 !!!Such a drastic change!Or in other words,your efficiency will fall 10^(-12) times!!!
EDIT based on suggested comments :-
But how do I compare an algorithm's performance for different inputs
based on just the worst case time complexity ?
As hinted by G.Bach,one of the commentator on this post,The basic idea underlying your question is itself contradictory!You should have talked about Big-Theta Notation,instead of Big-O to think of a general solution! Big-O is an upper-bound whereas Big-Theta is a tight bound. When people only worry about what's the worst that can happen, big-O is sufficient; i.e. it says that "it can't get much worse than this". The tighter the bound the better, of course, but a tight bound isn't always easy to compute.
So,in worst case analysis your question would be answered the way both of us have answered.Now,one narrow reason that people use O instead of Ω is to drop disclaimers about worst or average cases.But,for a tighter analysis you'll have to check for both O and big-Omega as well to frame a question. This question can suitably be found solution for varied size of n though.I leave it for you to figure out.But,if you have any doubts,PLEASE,feel free to comment.
Allso,your running time is not directly related to worst case analysis,but,somehow it is related and can be reframed this way!
P.S.---> I have taken some ideas/statements from Big-oh vs big-theta. So,THANKS to the authors too!
I hope it helps.Feel free to comment if any info skimmed!
Your understanding is correct.
If the only performance measure you have is worst-case time complexity, that's all you can compare. If you had more information you could make a better estimate.
Yes, just substitute the values of n and divide 900,0003 by 903 to get the ratio.

How to translate algorithm complexity to time necessary for computation

If I know complexity of an algorithm, can I predict how long it will compute in real life?
A bit more context:
I have been trying to solve university assignment which has to find the best possible result in a game from given position. I have written an algorithm and it works, however very slow. The complexity is O(n)=5^n . For 24 elements it computes a few minutes. I'm not sure if it's because my implementation is wrong, or if this algorithm is simply very slow. Is there a way for me to approximate how much time any algorithm should take?
Worst case you can base on extrapolation. So having time on N=1,2,3,4 elements (the more the better) and O-notation estimation for algorithm complexity you can estimate time for any finite number. Another question this estimation precision goes lower and lower as N increases.
What you can do with it? Search for error estimation algorithms for such approaches. In practice it usually gives good enough result.
Also please don't forget about model adequateness checks. So having results for N=1..10 and O-notation complexity you should check 'how good' your results correlate with your O-model (if you can select numbers for O-notation formula that meets your results). If you cannot get numbers, you need either more numbers to get wider picture or ... OK, you can have wrong complexity estimation :-).
Useful links:
Brief review on algorithm complexity.
Time complexity catalogue
Really good point to start - look for examples based on code as input.
You cannot predict running time based on time complexity alone. There are many factors involved: hardware speed, programming language, implementation details, etc. The only thing you can predict using the complexity is expected time increase when the size of the input increases.
For example, personally, I've seen O(N^2) algorithms take longer than O(N^3) ones, especially on small values of N, such as it is in your case. And by, the way, 5^24 is a huge number (5.9e16). I wouldn't be surprised if that took a few hours on a supercomputer, let alone on some mid-range personal pc, which most of us are using.

What is the purpose of Big-O notation in computer science if it doesn't give all the information needed?

What is the use of Big-O notation in computer science if it doesn't give all the information needed?
For example, if one algorithm runs at 1000n and one at n, it is true that they are both O(n). But I still may make a foolish choice based on this information, since one algorithm takes 1000 times as long as the other for any given input.
I still need to know all the parts of the equation, including the constant, to make an informed choice, so what is the importance of this "intermediate" comparison? I end up loosing important information when it gets reduced to this form, and what do I gain?
What does that constant factor represent? You can't say with certainty, for example, that an algorithm that is O(1000n) will be slower than an algorithm that's O(5n). It might be that the 1000n algorithm loads all data into memory and makes 1,000 passes over that data, and the 5n algorithm makes five passes over a file that's stored on a slow I/O device. The 1000n algorithm will run faster even though its "constant" is much larger.
In addition, some computers perform some operations more quickly than other computers do. It's quite common, given two O(n) algorithms (call them A and B), for A to execute faster on one computer and B to execute faster on the other computer. Or two different implementations of the same algorithm can have widely varying runtimes on the same computer.
Asymptotic analysis, as others have said, gives you an indication of how an algorithm's running time varies with the size of the input. It's useful for giving you a good starting place in algorithm selection. Quick reference will tell you that a particular algorithm is O(n) or O(n log n) or whatever, but it's very easy to find more detailed information on most common algorithms. Still, that more detailed analysis will only give you a constant number without saying how that number relates to real running time.
In the end, the only way you can determine which algorithm is right for you is to study it yourself and then test it against your expected data.
In short, I think you're expecting too much from asymptotic analysis. It's a useful "first line" filter. But when you get beyond that you have to look for more information.
As you correctly noted, it does not give you information on the exact running time of an algorithm. It is mainly used to indicate the complexity of an algorithm, to indicate if it is linear in the input size, quadratic, exponential, etc. This is important when choosing between algorithms if you know that your input size is large, since even a 1000n algorithm well beat a 1.23 exp(n) algorithm for large enough n.
In real world algorithms, the hidden 'scaling factor' is of course important. It is therefore not uncommon to use an algorithm with a 'worse' complexity if it has a lower scaling factor. Many practical implementations of sorting algorithms are for example 'hybrid' and will resort to some 'bad' algorithm like insertion sort (which is O(n^2) but very simple to implement) for n < 10, while changing to quicksort (which is O(n log(n)) but more complex) for n >= 10.
Big-O tells you how the runtime or memory consumption of a process changes as the size of its input changes. O(n) and O(1000n) are both still O(n) -- if you double the size of the input, then for all practical purposes the runtime doubles too.
Now, we can have an O(n) algorithm and an O(n2) algorithm where the coefficient of n is 1000000 and the coefficient of n2 is 1, in which case the O(n2) algorithm would outperform the O(n) for smaller n values. This doesn't change the fact, however, that the second algorithm's runtime grows more rapidly than the first's, and this is the information that big-O tells us. There will be some input size at which the O(n) algorithm begins to outperform the O(n2) algorithm.
In addition to the hidden impact of the constant term, complexity notation also only considers the worst case instance of a problem.
Case in point, the simplex method (linear programming) has exponential complexity for all known implementations. However, the simplex method works much faster in practice than the provably polynomial-time interior point methods.
Complexity notation has much value for theoretical problem classification. If you want some more information on practical consequences check out "Smoothed Analysis" by Spielman: http://www.cs.yale.edu/homes/spielman
This is what you are looking for.
It's main purpose is for rough comparisons of logic. The difference of O(n) and O(1000n) is large for n ~ 1000 (n roughly equal to 1000) and n < 1000, but when you compare it to values where n >> 1000 (n much larger than 1000) the difference is miniscule.
You are right in saying they both scale linearly and knowing the coefficient helps in a detailed analysis but generally in computing the difference between linear (O(cn)) and exponential (O(cn^x)) performance is more important to note than the difference between two linear times. There is a larger value in the comparisons of runtime of higher orders such as and Where the performance difference scales exponentially.
The overall purpose of Big O notation is to give a sense of relative performance time in order to compare and further optimize algorithms.
You're right that it doesn't give you all information, but there's no single metric in any field that does that.
Big-O notation tells you how quickly the performance gets worse, as your dataset gets larger. In other words, it describes the type of performance curve, but not the absolute performance.
Generally, Big-O notation is useful to express an algorithm's scaling performance as it falls into one of three basic categories:
Linear
Logarithmic (or "linearithmic")
Exponential
It is possible to do deep analysis of an algorithm for very accurate performance measurements, but it is time consuming and not really necessary to get a broad indication of performance.

Estimation of program execution time from complexity

I want to know, how i can estimate the time that my program will take to execute on my machine (for example a 2.5 Ghz machine), if i have an estimation of its worst case time complexity?
For Example : - If I have a program which is O(n^2), in worst case, and n<100000, how can i know /estimate before writing the actual program/procedure, the time that it will take to execute in seconds?
Wouldn't it be good to know how a program actually performs, and it will also save writing code which eventually turns out to be inefficient!
Help greatly appreciated.
Since big O complexity ignores linear coefficients and smaller terms, it is impossible to estimate the performance of an algorithm given only its big o complexity.
In fact, for any specific N, you cannot predict which of two given algorithms will execute faster.
For example, O(N) is not always faster than O(N*N) since an algorithm that takes 100000000*n steps is O(N) is slower than an algorithm than takes N*N steps for many small values of N.
These linear coefficients and asymptotically smaller terms vary from platform to platform and even amongst algorithms of the same equivalence class (in terms of big O measure). 3
The problem you are trying to use big O notation for is not the one it is designed to solve.
Instead of dealing with complecity, you might want to have a look at Worst Case Execution Time (WCET). This area of research most likely corresponds to what you are looking for.
http://en.wikipedia.org/wiki/Worst-case_execution_time
Multiply N^2 by the time You spend in an iteration of the innermost loop, and You have a ballpark estimate.

Resources