Expected syntax for Big O notation - algorithm

are there a limited amount of basic O Notations, considering you are meant to 'distil' them down to their most important part?
O(n^2):
O(n):
O(1):
O(log n) logarithmic
O(n!) factorial
O(na) polynomial
Or are you expected to work out variations such as O(n^4) etc... and if so, is that the only exception? the power of X one?

Generally, you distill Big-O notation (and related Bachman-Landau notations like Big-Theta and Big-Omega) down to the operation of the fastest-growing N term. So, you remove/simplify lesser terms (N2 + N == O(N2)) and nonvariable coefficients of the term (O(4N2) == O(N2)), but NOT powers or exponent bases (O(34N) == O(3N)). You also don't strip variable coefficients; NlogN is NlogN, NOT logN or N.
So, you will normally only see numbers in a Big-Oh notation if if the complexity is polynomial (power of N) or exponential (Nth power of a base). The most common Big-Oh notations are much as you show, with the addition of NlogN (VERY common).
However, if you are differentiating between two algorithms of equal general complexity, you MAY add lesser terms and/or coefficients back in to demonstrate the relative difference; an algorithm that performs linearly but has double the instructions of another might be described as O(2N) when comparing it with the other O(N) algorithm. However, taken individually, both algorithms are linear (O(N)).
Some Big-O notations are not algebraic, and may involve multiple variables in therir simplest general-case form. The counting sort, for instance, is complexity O(Max(N,M)), where N is the number of elements in the list, and M is the range of those elements. Often it is possible to reduce this in specific cases by defining M in terms of N and thus reducing to a single variable (if the list in question is of the first N squares, M = N2-1), but in the general case both variables are independent and significant. BucketSort's complexity is officially O(N), but really it's more like O(NlogM) where M is the maximum value of the list of N elements. M is usually considered insignificant, but that depends on the values you normally sort (sorting 5 values each in the billions will require more loops to compare each power of 10 than traversals through the list to put them in the buckets) and on the radix used (RadixSort is a base-2 BucketSort; again, sorting values with a greater log2 value will require more loops than traversals).

The Big-O notation is a way to provide an upper bound on the limiting behaviour of a function. There are no restrictions on its functional form. However, there are certain conventions, as explained by Wikipedia:
In typical usage, the formal definition of O notation is not used directly; rather, the O notation for a function f(x) is derived by the following simplification rules:
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
If f(x) is a product of several factors, any constants (terms in the product that do not depend on x) are omitted.
There are, of course, some functional forms that show up more frequently that others. Some common classes are listed here.

No, the number of different O-classes is not finite.
As you already mentioned O(n^x) describes a different set for every x. And that is not the only "exception". O(x^n) is also a different set for every x. Likewise O(n^n), O(n^n^n), O(n^n^n^n) etc. are all different sets (and you can of course continue that ad infinitum).

In general, you split the expression into a sum of products, keep the largest term, and divide by constants to simplify it as much as possible.
ex:
n(2n+3log(n)) => 2n^2+3nlog(n) => 2n^2 => n^2
(n+1)(2nlog(n)+n) => 2n^2log(n)+n^2+2nlog(n)+n => 2n^2log(n) => n^2log(n)

Related

Asymptotic analysis, Upper bound

I have some confusion regarding the Asymptotic Analysis of Algorithms.
I have been trying to understand this upper bound case, seen a couple of youtube videos. In one of them, there was an example of this equation
where we have to find the upper bound of the equation 2n+3. So, by looking at this, one can say that it is going o be O(n).
My first question :
In algorithmic complexity, we have learned to drop the constants and find the dominant term, so is this Asymptotic Analysis to prove that theory? or does it have other significance? otherwise, what is the point of this analysis when it is always going to be the biggest n in the equation, example- if it were n+n^2+3, then the upper bound would always be n^2 for some c and n0.
My second question :
as per rule the upper bound formula in Asymptotic Analysis must satisfy this condition f(n) = O(g(n)) IFF f(n) < c.g(n) where n>n0,c>0, n0>=1
i) n is the no of inputs, right? or does n represent the number of steps we perform? and does f(n) represents the algorithm?
ii) In the following video to prove upper bound of the equation 2n+3 could be n^2 the presenter considered c =1, and that is why to satisfy the equation n had to be >= 3 whereas one could have chosen c= 5 and n=1 as well, right? So then why were, in most cases in the video, the presenter was changing the value of n and not c to satisfy the conditions? is there a rule, or is it random? Can I change either c or n(n0) to satisfy the condition?
My Third Question:
In the same video, the presenter mentioned n0 (n not) is the number of steps. Is that correct? I thought n0 is the limit after which the graph becomes the upper bound (after n0, it satisfies the condition for all values of n); hence n0 also represents the input.
Would you please help me understand because people come up with different ideas in different explanations, and I want to understand them correctly?
Edit
The accepted answer clarified all of the questions except the first one. I have gone through many articles on the web, and here I am documenting my conclusion if anyone else has the same question. This will help them.
My first question was
In algorithmic complexity, we have learned to drop the constants and
find the dominant term, so is this Asymptotic Analysis to prove that
theory?
No, Asymptotic Analysis describes the algorithmic complexity, which is all about understanding or visualizing the Asymptotic behavior or the tail behavior of a function or a group of functions by plotting mathematical expression.
In computer science, we use it to evaluate (note: evaluate is not measuring) the performance of an algorithm in terms of input size.
for example, these two functions belong to the same group
mySet = set()
def addToMySet(n):
for i in range(n):
mySet.add(i*i)
mySet2 = set()
def addToMySet2(n):
for i in range(n):
for j in range(500):
mySet2.add(i*j)
Even though the execution time of the addToMySet2(n) is always > the execution time of addToMySet(n), the tail behavior of both of these functions would be the same with respect to the largest n, if one plot them in a graph the tendency of that graph for both of the functions would be linear thus they belong to the same group. Using Asymptotic Analysis, we get to see the behavior and group them.
A mistake that I made assuming upper bound represents the worst case. In reality, The upper bound of any algorithm is associated with all of the best, average, and worst cases. so the correct way of putting that would be
upper/lower bound in the best/average/worst case of an
algorithm
.
We can't relate the upper bound of an algorithm with the worst-case time complexity and the lower bound with the best-case complexity. However, an upper bound can be higher than the worst-case because upper bounds are usually asymptotic formulae that have been proven to hold.
I have seen this kind of question like find the worst-case time complexity of such and such algorithm, and the answer is either O(n) or O(n^2) or O(log-n), etc.
For example, if we consider the function addToMySet2(n), one would say the algorithmic time complexity of that function is O(n), which is technically wrong because there are three factors bound, bound type, (inclusive upper bound and strict upper bound) and case are involved determining the algorithmic time complexity.
When one denote O(n) it is derived from this Asymptotic Analysis f(n) = O(g(n)) IFF for any c>0, there is a n0>0 from which f(n) < c.g(n) (for any n>n0) so we are considering upper bound of best/average/worst case. In the above statement the case is missing.
I think We can consider, when not indicated, the big O notation generally describes an asymptotic upper bound on the worst-case time complexity. Otherwise, one can also use it to express asymptotic upper bounds on the average or best case time complexities
The whole point of asymptotic analysis is to compare algorithms performance scaling. For example, if I write two version of the same algorithm, one with O(n^2) time complexity and the other with O(n*log(n)) time complexity, I know for sure that the O(n*log(n)) one will be faster when n is "big". How big? it depends. You actually can't know unless you benchmark it. What you know is at some point, the O(n*log(n)) will always be better.
Now with your questions:
the "lower" n in n+n^2+3 is "dropped" because it is negligible when n scales up compared to the "dominant" one. That means that n+n^2+3 and n^2 behave the same asymptotically. It is important to note that even though 2 algorithms have the same time complexity, it does not mean they are as fast. For example, one could be always 100 times faster than the other and yet have the exact same complexity.
(i) n can be anything. It may be the size of the input (eg. an algorithm that sorts a list) but it may also be the input itself (eg. an algorithm that give the n-th prime number) or a number of iteration, etc
(ii) he could have taken any c, he chose c=1 as an example as he could have chosen c=1.618. Actually the correct formulation would be:
f(n) = O(g(n)) IFF for any c>0, there is a n0>0 from which f(n) < c.g(n) (for any n>n0)
the n0 from the formula is a pure mathematical construct. For c>0, it is the n value from which the function f is bounded by g. Since n can represent anything (size of a list, input value, etc), it is the same for n0

Is there a way to simplify O(n^c) · O(c^n)?

I'm trying to figure out if there's a way to simplify O(nc) · O(cn) but couldn't come up with anything. Is this equal to infinity, or would it fall under one of the more common type of complexities (O(n), O(nc), O(cn), O(n log n), O(log n) or O(n!))?
O(nc) · O(cn) means the set of functions that can be expressed as f(n) · g(n) where f(n) ∈ O(nc) and g(n) ∈ O(cn).
Since nc ∈ O((1+δ)n) for arbitrarily small positive δ, we have O(1n) ⊂ O(nc) ⊂ O((1+δ)n). (Do you see why?)
Additionally, it's a general property of big-O notation that O(foo) · O(bar) = O(foo · bar); so we can always move multiplication inside the O (or pull multiplication outside the O). (Do you see why?)
Combining these two observations, we have O(cn) ⊂ O(nc) · O(cn) ⊂ O((c+δ)n) for arbitrarily small positive δ. So you can reasonably simplify O(nc) · O(cn) to O((c+δ)n). This is analogous to how O(n log n) is sometimes called "quasilinear", because it's a subset of O(n1+δ) for arbitrarily small positive δ.
e^N grows faster then all functions from your list of asymptotically equivalent candidates.
To find out whether g(N) = E^n*n^E and f(N) = e^N have the same order of grows we need the limit of g(N)/f(N) -> 1 when N->infinity. But:
So given function doesn't fall under one of the more common type of complexities (O(n), O(nc), O(cn), O(n log n), O(log n)) - it grows faster.
Following log log plot illustrates and compares order of grows of mentioned functions:
Some algorithms has exponential brute force solution but may be simplified(restrictions added) and become faster. The brute force solution of the traveling salesman problem is O(n!) which is approximately O(N^N).
Nowadays input with N = 30 is solvable in minutes for 2^N. For N = 1..30 you can observe behavior of c^n*n^c and c^N using following loglog plot:

Why do we ignore co-efficients in Big O notation?

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the co-efficients?
For example this answer says that the final complexity of 2N + 2 is O(N); we remove the leading co-efficient 2 and the final constant 2 as well.
Removing the final constant of 2 perhaps understandable. After all, N may be very large and so "forgetting" the final 2 may only change the grand total by a small percentage.
However I cannot clearly understand how removing the leading co-efficient does not make difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?
The purpose of the Big-O notation is to find what is the dominant factor in the asymptotic behavior of a function as the value tends towards the infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x->infinity
if and only if there is a positive real M and a real x_0 such as
|f(x)| <= M|g(x)| for all x > x_0.
That's in wikipedia. What that actually means is that there is a point (after x_0) after which some multiple of g(x) dominates f(x). That definition acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x)+K = O(f(x)), f(x^n+x^n-1)=O(x^n), etc. It's just a matter of using the definition to prove those.
In special, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation take. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with it. So, for this purpose, isn't easy to reason about the difference between n and 2n.
From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.
Big O provides a good estimate of what algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 and an n^2 factor we ignore the n^2 factor, because even if the n^2 factor has a large constant it will eventually be dominated by the n^3 factor.
However, real algorithms incorporate more than simple Big O analysis, for example a sorting algorithm will often start with a O(n * log(n)) partitioning algorithm like quicksort or mergesort, and when the partitions become small enough the algorithm will switch to a simpler O(n^2) algorithm like insertionsort - for small inputs insertionsort is generally faster, although a basic Big O analysis doesn't reveal this.
The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors are smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants. Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever) and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' constant factors would hide this step.
An other thing is that, what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove 99N^2 portion even? Will it not make difference when let's say N is one miilion?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.
Big O notation is not an absolute measure of complexity.
Rather it is a designation of how complexity will change as the variable changes. In other words as N increases the complexity will increase
Big O(f(N)).
To explain why terms are not included we look at how fast the terms increase.
So, Big O(2n+2) has two terms 2n and 2. Looking at the rate of increase
Big O(2) this term will never increase it does not contribute to the rate of increase at all so it goes away. Also since 2n increases faster than 2, the 2 turns into noise as n gets very large.
Similarly Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 will contribute a larger nominal percentage than 2n^3. However if n gets very large, say 1000000, then 99n^2 although nominally large it is insignificant (close to 1 millionth) compared to the size of 2n^3.
As a consequence Big O(n^i) < Big O(n^(i+1)).
Coefficients are removed because of the mathematical definition of Big O.
To simplify the definition says Big O(f(n)) = Big O(f(cn)) for a constant c. This needs to be taken on faith because the reason for this is purely mathematical, and as such the proof would be too complex and dry to explain in simple terms.
The mathematical reason:
The real reason why we do this, is the way Big O-Notation is defined:
A series (or lets use the word function) f(n) is in O(g(n)) when the series f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 as n approaches Infinity. The term (2*n^2)/(n^2) doesn't become infinitely large (its always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinetely large, as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O Notation basically describes, that your function f(n) is (from some value of n on to infinity) smaller than a function g(n), multiplied by a constant. With the previous example:
2*n^2 is in O(n^2), because we can find a value C, so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, for example, and the condition will be satisfied.
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! If it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers for it and have it run in the same amount of time, the real problem here is, that it scales incredibly bad with input size.
For practical applications the constants does matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) If your algorithm should be great for any input, it should have a low time complexity, and 2) that n^3 grows so much faster than n^2, that perfering n^3 over n^2 almost never makes sense.

Max and min of functions in order-notation

Order notation question, big-o notation and the like:
What does the max and min of a function mean in terms of order notation?
for example:
DEFINITION:
"Maximum" rules: Suppose that f(n) and g(n) are positive functions for all n > n0.
Then:
O[f(n) + g(n)] = O[max (f(n),g(n)) ]
etc...
I need to use these definitions to prove something for homework.. thanks for the help!
EDIT: f(n) and g(n) are supposed to represent running times of algorithms with respect to input size
With Big-O notation, you're talking about upper bounds on computation. This means that you are only interested in the largest term of the combined function as n (the variable) tends to infinity. What's more, you drop any constant multipliers as well since the formal definition of the notation allows you to discard those parts, which is important as it allows you to focus on the algorithm's behavior and not on the implementation of the algorithm.
So, we are combining two functions through summation. Well, there are two cases (strictly three, but it's symmetric):
One function is of higher order than the other. At that point, the higher order function dominates the lesser; you can pretend that the lesser doesn't exist.
Both functions are of the same order. This is just like you are doing some kind of ratio-ed sum of the two (since we've already thrown away the scaling factors) but then you just end up with the same factor and have just changed the scaling factors a bit.
The net result looks very much like the max() function, even though it's really not (it's actually a generalization of max over the space of functions), so it's very convenient to use the notation.
It is a regular max between natural numbers. f is a function mapped to numbers [f:N->N], and so is g.
Thus, f(n) is in N, and so max(f(n),g(n)) is just standard max: f(n) > g(n) ? f(n) : g(n)
O[max (f(n),g(n)) ] means: which ever is more 'expensive': f or g: it is the upper bound.

Meaning of average complexity when using Big-O notation

While answering to this question a debate began in comments about complexity of QuickSort. What I remember from my university time is that QuickSort is O(n^2) in worst case, O(n log(n)) in average case and O(n log(n)) (but with tighter bound) in best case.
What I need is a correct mathematical explanation of the meaning of average complexity to explain clearly what it is about to someone who believe the big-O notation can only be used for worst-case.
What I remember if that to define average complexity you should consider complexity of algorithm for all possible inputs, count how many degenerating and normal cases. If the number of degenerating cases divided by n tend towards 0 when n get big, then you can speak of average complexity of the overall function for normal cases.
Is this definition right or is definition of average complexity different ? And if it's correct can someone state it more rigorously than I ?
You're right.
Big O (big Theta etc.) is used to measure functions. When you write f=O(g) it doesn't matter what f and g mean. They could be average time complexity, worst time complexity, space complexities, denote distribution of primes etc.
Worst-case complexity is a function that takes size n, and tells you what is maximum number of steps of an algorithm given input of size n.
Average-case complexity is a function that takes size n, and tells you what is expected number of steps of an algorithm given input of size n.
As you see worst-case and average-case complexity are functions, so you can use big O to express their growth.
If you're looking for a formal definition, then:
Average complexity is the expected running time for a random input.
Let's refer Big O Notation in Wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes f(x)=O(g(x)) as x --> infinity if ...
So what the premise of the definition states is that the function f should take a number as an input and yield a number as an output. What input number are we talking about? It's supposedly a number of elements in the sequence to be sorted. What output number could we be talking about? It could be a number of operations done to order the sequence. But stop. What is a function? Function in Wikipedia:
a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
Are we producing exacly one output with our prior defition? No, we don't. For a given size of a sequence we can get a wide variation of number of operations. So to ensure the definition is applicable to our case we need to reduce a set possible outcomes (number of operations) to a single value. It can be a maximum ("the worse case"), a minimum ("the best case") or an average.
The conclusion is that talking about best/worst/average case is mathematically correct and using big O notation without those in context of sorting complexity is somewhat sloppy.
On the other hand, we could be more precise and use big Theta notation instead of big O notation.
I think your definition is correct, but your conclusions are wrong.
It's not necessarily true that if the proportion of "bad" cases tends to 0, then the average complexity is equal to the complexity of the "normal" cases.
For example, suppose that 1/(n^2) cases are "bad" and the rest "normal", and that "bad" cases take exactly (n^4) operations, whereas "normal" cases take exactly n operations.
Then the average number of operations required is equal to:
(n^4/n^2) + n(n^2-1)/(n^2)
This function is O(n^2), but not O(n).
In practice, though, you might find that time is polynomial in all cases, and the proportion of "bad" cases shrinks exponentially. That's when you'd ignore the bad cases in calculating an average.
Average case analysis does the following:
Take all inputs of a fixed length (say n), sum up all the running times of all instances of this length, and build the average.
The problem is you will probably have to enumerate all inputs of length n in order to come up with an average complexity.

Resources