https://www.freecodecamp.org/news/big-o-notation-why-it-matters-and-why-it-doesnt-1674cfa8a23c/
Exponentials have greater complexity than polynomials as long as the coefficients are positive multiples of n
O(2ⁿ) is more complex than O(n⁹⁹), but O(2ⁿ) is actually less complex
than O(1). We generally take 2 as base for exponentials and logarithms
because things tends to be binary in Computer Science, but exponents
can be changed by changing the coefficients. If not specified, the
base for logarithms is assumed to be 2.
I thought O(1) is the simplest in complexity. Could anyone help me explain why O(2ⁿ) is less complex than O(1) ?
Errata. The author made an obvious mistake and you caught it. It's not the only mistake in the article. For example, I would expect O(n*log(n)) to be the more appropriate complexity for sorting algorithms than the one they claim (quoted below). Otherwise, you'd be able to sort a set without even seeing all of the data.
"As complexity is often related to divide and conquer algorithms, O(log(n)) is generally a good complexity you can reach for sorting algorithms."
It might be worthwhile to try to contact the author and give him a heads up so he can correct it and avoid confusing anyone else with misinformation.
Related
BigO always checks the upper bound. So we can measure ..the way we write the code, so that there will be less time complexity and thus increase our code performance. But why do we use the lowerbound (omega) ? I did not understand the use of omega in real time. Can anybody please suggest me on this
It's a precision feature. It happens that it is usually easier to prove that an algorithm will take, say, O(n) operations to complete than proving that it will take at least O(n) operations (BTW, in this context operation means elementary computations such as the logical and arithmetic ones.)
By providing a lower bound, you are also giving an estimate of the best case scenario, as the big-O notation only provides an upper bound.
From a practical viewpoint, this has the benefit of telling that no matter what, any algorithm will require so many (elementary) steps (or more).
Note also, that it is also useful to have estimates of the average, the worst and the best cases, because these will shed more light on the complexity of the algorithm.
There are problems whose inherent complexity is known to be at least of some order (meaning there is a mathematical theorem proving the fact). So, no matter the algorithm, these problems cannot be solved with less that a certain number of calculations. This is also useful because lets you know whether a given algorithm is sub-optimal or matches the inherent complexity of the problem.
For a function with a run time of (cn)! where c is a coefficient >= 0 and c != n, would the tight bound of the run be Θ(n!) or Θ((cn)!)? Right now, I believe it would be Θ((cn)!) since they would differ by a coefficent >= n since cn != n.
Thanks!
Edit: A more specific example to clarify what I'm asking:
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
You can use Stirling's approximation to get that if c>1 then (cn)! is asymptotically larger than pow(c,n)*n!, which is not O(n!) since the quotient diverges. As a more elementary approach consider this example for c=2: (2n)!=(2n)(2n-1)...(n+1)n!>n!n! and (n!n!)/n!=n! diverges, so (2n)! is NOT O(n!).
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
I think there are two answers to your question.
The shorter one is from the purely theoretical point of view. Of those 3 only the n! lies in the class of Θ(n!). The second lies in the O(n!) (note big-O instead of big-Theta) and (7n)! is slower than Θ(n!), it lies in Θ((7n)!)
There is also a longer but more practical answer. And to get to it we first need to understand what is the big deal with this whole big-O and big-Theta business in the first place?
The thing is that for many practical tasks there are many algorithms and not all of them are equally or even similarly efficient. So the practical question is: can we somehow capture this difference in performance in an easy to understand and compare way? And this is the problem that big-O/big-Theta are trying to solve. The idea behind this method is that if we look at some algorithm with some complicated real formula for the exact time, there is only 1 term that grows faster than all others and thus dominates the time as the problem gets bigger. So let's compress this big formula to that dominant term. Then we can compare those terms and if they are different, we can easily say which is the better algorithm (7*n^2 is clearly better than 2*n^3).
Another idea is that the term "operation" is usually not that well defined at the level people usually think about algorithms. Which "operation" actually maps to a single CPU instruction and which to a few depends on many factors such as particular hardware. Also the instructions themselves can take different time to execute. Moreover sometimes the algorithm's working time is dominated by memory access than CPU instructions and those components are not easily additive. The morale of this story is that if two algorithms are different only in a scalar coefficient, you can't really compare those algorithms just theoretically. You need to compare some implementations in some particular environment. This is why algorithms complexity measure typically boils down to something like O(n^k) where k is a constant.
There is one more consideration: practicality. If the algorithm is some polynomial, there is a huge practical difference between cases a=3 and a=4 in O(n^a). But if it is something like O(2^(n^a)), then there is not much difference what exactly the a as along as a>1. This is because 2^n grows fast enough to make it impractical for almost any realistic n irrespective of a. So in practical terms it is often good enough approximation to put all such algorithms into a single "exponential algorithms" bucket and say they are all impractical even despite the fact there is a huge difference between them. This is where some mathematically unconventional notations like 2^O(n) come from.
From this last practical perspective the difference between Θ(n!) and Θ((7n)!) is also very little: both are totally impractical because both lie beyond even the exponential bucket of 2^O(n) (see Stirling's formula that shows that n! grows a bit faster than (n/e)^n). So it makes sense to put all such algorithms in another bucket of "factorial complexity" and mark them as impractical as well.
Consider an^2 + bn + c. I understand that for large n, bn and c become insignificant.
I also understand that for large n, the differences between 2n^2 and n^2 are pretty insignificant compared to the differences between, say n^2 and n*log(n).
However, there is still an order of 2 difference between 2n^2 and n^2. Does this matter in practice? Or do people just think about algorithms without coefficients? Why?
The actual coefficients matter if you're interested in timing. But big-O isn't actually about timing, it's about scalability. When you see an algorithm described as O(n^2), you don't really know how long it will take to solve a problem of size n on a particular computer in a particular language with a particular compiler, but you know that a problem of size 2n should take about 4 times as long.
The reason you can ignore the coefficients is that if you consider the ratio of different size problems, the lower order terms' coefficients are asymptotically dominated, and the highest order term's coefficients cancel in the ratio.
We use time complexity analysis to help us estimate the time cost and understand how far we can go. For example, the lower bound time complexity for sorting algorithm is O(nlgn), it is proved in theory, and we should never try to design a algorithm better than this.
For the coefficient, in many case it's not easy to find a accurate number in theory, since it could be effect by the input data. But it doesn't mean it's not important. Quicksort is the most widely used sorting algorithm, since the coefficient of time complexity is really small, which is only 1.39NlgN for average case.
And another interesting fact about quicksort is that we all know that the worst case for quicksort will cost O(N^2). We can use Median of Medians algorithm to reduce the worst case time complexity of quicksort to O(NlgN), but we seldom use this version in practice. It's because that the coefficient of Median of Medians version is too big, which make it unusable.
What is the use of Big-O notation in computer science if it doesn't give all the information needed?
For example, if one algorithm runs at 1000n and one at n, it is true that they are both O(n). But I still may make a foolish choice based on this information, since one algorithm takes 1000 times as long as the other for any given input.
I still need to know all the parts of the equation, including the constant, to make an informed choice, so what is the importance of this "intermediate" comparison? I end up loosing important information when it gets reduced to this form, and what do I gain?
What does that constant factor represent? You can't say with certainty, for example, that an algorithm that is O(1000n) will be slower than an algorithm that's O(5n). It might be that the 1000n algorithm loads all data into memory and makes 1,000 passes over that data, and the 5n algorithm makes five passes over a file that's stored on a slow I/O device. The 1000n algorithm will run faster even though its "constant" is much larger.
In addition, some computers perform some operations more quickly than other computers do. It's quite common, given two O(n) algorithms (call them A and B), for A to execute faster on one computer and B to execute faster on the other computer. Or two different implementations of the same algorithm can have widely varying runtimes on the same computer.
Asymptotic analysis, as others have said, gives you an indication of how an algorithm's running time varies with the size of the input. It's useful for giving you a good starting place in algorithm selection. Quick reference will tell you that a particular algorithm is O(n) or O(n log n) or whatever, but it's very easy to find more detailed information on most common algorithms. Still, that more detailed analysis will only give you a constant number without saying how that number relates to real running time.
In the end, the only way you can determine which algorithm is right for you is to study it yourself and then test it against your expected data.
In short, I think you're expecting too much from asymptotic analysis. It's a useful "first line" filter. But when you get beyond that you have to look for more information.
As you correctly noted, it does not give you information on the exact running time of an algorithm. It is mainly used to indicate the complexity of an algorithm, to indicate if it is linear in the input size, quadratic, exponential, etc. This is important when choosing between algorithms if you know that your input size is large, since even a 1000n algorithm well beat a 1.23 exp(n) algorithm for large enough n.
In real world algorithms, the hidden 'scaling factor' is of course important. It is therefore not uncommon to use an algorithm with a 'worse' complexity if it has a lower scaling factor. Many practical implementations of sorting algorithms are for example 'hybrid' and will resort to some 'bad' algorithm like insertion sort (which is O(n^2) but very simple to implement) for n < 10, while changing to quicksort (which is O(n log(n)) but more complex) for n >= 10.
Big-O tells you how the runtime or memory consumption of a process changes as the size of its input changes. O(n) and O(1000n) are both still O(n) -- if you double the size of the input, then for all practical purposes the runtime doubles too.
Now, we can have an O(n) algorithm and an O(n2) algorithm where the coefficient of n is 1000000 and the coefficient of n2 is 1, in which case the O(n2) algorithm would outperform the O(n) for smaller n values. This doesn't change the fact, however, that the second algorithm's runtime grows more rapidly than the first's, and this is the information that big-O tells us. There will be some input size at which the O(n) algorithm begins to outperform the O(n2) algorithm.
In addition to the hidden impact of the constant term, complexity notation also only considers the worst case instance of a problem.
Case in point, the simplex method (linear programming) has exponential complexity for all known implementations. However, the simplex method works much faster in practice than the provably polynomial-time interior point methods.
Complexity notation has much value for theoretical problem classification. If you want some more information on practical consequences check out "Smoothed Analysis" by Spielman: http://www.cs.yale.edu/homes/spielman
This is what you are looking for.
It's main purpose is for rough comparisons of logic. The difference of O(n) and O(1000n) is large for n ~ 1000 (n roughly equal to 1000) and n < 1000, but when you compare it to values where n >> 1000 (n much larger than 1000) the difference is miniscule.
You are right in saying they both scale linearly and knowing the coefficient helps in a detailed analysis but generally in computing the difference between linear (O(cn)) and exponential (O(cn^x)) performance is more important to note than the difference between two linear times. There is a larger value in the comparisons of runtime of higher orders such as and Where the performance difference scales exponentially.
The overall purpose of Big O notation is to give a sense of relative performance time in order to compare and further optimize algorithms.
You're right that it doesn't give you all information, but there's no single metric in any field that does that.
Big-O notation tells you how quickly the performance gets worse, as your dataset gets larger. In other words, it describes the type of performance curve, but not the absolute performance.
Generally, Big-O notation is useful to express an algorithm's scaling performance as it falls into one of three basic categories:
Linear
Logarithmic (or "linearithmic")
Exponential
It is possible to do deep analysis of an algorithm for very accurate performance measurements, but it is time consuming and not really necessary to get a broad indication of performance.
Is somebody aware of a natural program or algorithm that has a non-monotone worst-case behavior?
By non-monotone worst-case behavior I mean that there is a natural number n such that the worst-case runtime for inputs of size n+1 is less than the worst-case runtime for inputs of size n.
Of course, it is easy to construct a program with this behavior. It might even be the case that this happens for small n (like n = 1) in natural programs. But I'm interested in a useful algorithm that is non-monotone for large n.
Is somebody aware of a natural program or algorithm that has a
non-monotone worst-case behavior?
Please define "natural program or algorithm". The concept "algorithm" has a definition I'm aware of, and there are certainly algorithms (as you correctly admit) which have non-monotone worst-case runtime complexity. Is a program "natural" if it does no unecessary work or has minimal runtime complexity for the class of problem it solves? In that case, would you argue that BubbleSort isn't an algorithm? More importantly, I can define a problem the most efficient solution to which has non-monotone worst-case behavior. Would such a problem be "unnatural"? What is your definition of a "natural problem"?
Of course, it is easy to construct a program with this behavior.
Then what's the real question? Until you commit to a definition of natural/useful algorithms and problems, your question has no answer. Are you interested only in pre-existing algorithms which people already use in the real world? If so, you should state that, and the problem becomes one of searching the literature. Frankly, I cannot imagine a reasonable definition of "natural, useful algorithm" which would preclude many examples of algorithms with non-monotone runtime complexity...
But I'm interested in a useful algorithm that is non-monotone for
large n.
Please define "useful algorithm". The concept "algorithm" has a definition I'm aware of, and there are certainly algorithms (as you correctly admit) which have non-monotone worst-case runtime complexity. Is an algorithm "useful" if it correctly solves some problem? I can easily define a problem which can be solved by an algorithm with non-monotone runtime complexity.
Think about a binary search.
When implementing binary search you need to think about the case where the array segment which you're splitting is of odd length. At that point you have 2 choices:
1. Round up/down
2. Check both indexes and make a decision before continuing.
If you choose the first case (lets assume you round down). For odd length arrays where the number you're searching for is the one passed the middle point, you'll have an extra iteration to make.
If that odd array was added one more element it would have saved you that extra iteration.
If you went for the second case, then most executions of the algorithm with more odd iterations then even would require more comparisons then if it was used with an extra element.
Note that all this is very implementation dependent, and so there can't be a real answer without a real algorithm (and moreover a real implementation).
Also all this is based on the assuming you're talking about actual run-time diff and not asymptotic diff. If that's not the case, then the answer would be "no". There is no algorithms with non-monotonic worst case asymptotic running time. That would defy the concept of "worst case".