Order notation question, big-o notation and the like:
What does the max and min of a function mean in terms of order notation?
for example:
DEFINITION:
"Maximum" rules: Suppose that f(n) and g(n) are positive functions for all n > n0.
Then:
O[f(n) + g(n)] = O[max (f(n),g(n)) ]
etc...
I need to use these definitions to prove something for homework.. thanks for the help!
EDIT: f(n) and g(n) are supposed to represent running times of algorithms with respect to input size
With Big-O notation, you're talking about upper bounds on computation. This means that you are only interested in the largest term of the combined function as n (the variable) tends to infinity. What's more, you drop any constant multipliers as well since the formal definition of the notation allows you to discard those parts, which is important as it allows you to focus on the algorithm's behavior and not on the implementation of the algorithm.
So, we are combining two functions through summation. Well, there are two cases (strictly three, but it's symmetric):
One function is of higher order than the other. At that point, the higher order function dominates the lesser; you can pretend that the lesser doesn't exist.
Both functions are of the same order. This is just like you are doing some kind of ratio-ed sum of the two (since we've already thrown away the scaling factors) but then you just end up with the same factor and have just changed the scaling factors a bit.
The net result looks very much like the max() function, even though it's really not (it's actually a generalization of max over the space of functions), so it's very convenient to use the notation.
It is a regular max between natural numbers. f is a function mapped to numbers [f:N->N], and so is g.
Thus, f(n) is in N, and so max(f(n),g(n)) is just standard max: f(n) > g(n) ? f(n) : g(n)
O[max (f(n),g(n)) ] means: which ever is more 'expensive': f or g: it is the upper bound.
Related
I‘m trying to wrap my head around the meaning of the Landau-Notation in the context of analysing an algorithm‘s complexity.
What exactly does the O formally mean in Big-O-Notation?
So the way I understand it is that O(g(x)) gives a set of functions which grow as rapidly or slower as g(x), meaning, for example in the case of O(n^2):
where t(x) could be, for instance, x + 3 or x^2 + 5. Is my understanding correct?
Furthermore, are the following notations correct?
I saw the following written down by a tutor. What does this mean? How can you use less or equal, if the O-Notation returns a set?
Could I also write something like this?
So the way I understand it is that O(g(x)) gives a set of functions which grow as rapidly or slower as g(x).
This explanation of Big-Oh notation is correct.
f(n) = n^2 + 5n - 2, f(n) is an element of O(n^2)
Yes, we can say that. O(n^2) in plain English, represents "set of all functions that grow as rapidly as or slower than n^2". So, f(n) satisfies that requirement.
O(n) is a subset of O(n^2), O(n^2) is a subset of O(2^n)
This notation is correct and it comes from the definition. Any function that is in O(n), is also in O(n^2) since growth rate of it is slower than n^2. 2^n is an exponential time complexity, whereas n^2 is polynomial. You can take limit of n^2 / 2^n as n goes to infinity and prove that O(n^2) is a subset of O(2^n) since 2^n grows bigger.
O(n) <= O(n^2) <= O(2^n)
This notation is tricky. As explained here, we don't have "less than or equal to" for sets. I think tutor meant that time complexity for the functions belonging to the set O(n) is less than (or equal to) the time complexity for the functions belonging to the set O(n^2). Anyways, this notation doesn't really seem familiar, and it's best to avoid such ambiguities in textbooks.
O(g(x)) gives a set of functions which grow as rapidly or slower as g(x)
That's technically right, but a bit imprecise. A better description contains the addenda
O(g(x)) gives the set of functions which are asymptotically bounded above by g(x), up to constant factors.
This may seem like a nitpick, but one inference from the imprecise definition is wrong.
The 'fixed version' of your first equation, if you make the variables match up and have one limit sign, seems to be:
This is incorrect: the ratio only has to be less than or equal to some fixed constant c > 0.
Here is the correct version:
where c is some fixed positive real number, that does not depend on n.
For example, f(x) = 3 (n^2) is in O(n^2): one constant c that works for this f is c = 4. Note that the requirement isn't 'for all c > 0', but rather 'for at least one constant c > 0'
The rest of your remarks are accurate. The <= signs in that expression are an unusual usage, but it's true if <= means set inclusion. I wouldn't worry about that expression's meaning.
There's other, more subtle reasons to talk about 'boundedness' rather than growth rates. For instance, consider the cosine function. |cos(x)| is in O(1), but its derivative fluctuates from negative one to positive one even as x increases to infinity.
If you take 'growth rate' to mean something like the derivative, example like these become tricky to talk about, but saying |cos(x)| is bounded by 2 is clear.
For an even better example, consider the logistic curve. The logistic function is O(1), however, its derivative and growth rate (on positive numbers) is positive. It is strictly increasing/always growing, while 1 has a growth rate of 0. This seems to conflict with the first definition without lots of additional clarifying remarks of what 'grow' means.
An always growing function in O(1) (image from the Wikipedia link):
I have some confusion regarding the Asymptotic Analysis of Algorithms.
I have been trying to understand this upper bound case, seen a couple of youtube videos. In one of them, there was an example of this equation
where we have to find the upper bound of the equation 2n+3. So, by looking at this, one can say that it is going o be O(n).
My first question :
In algorithmic complexity, we have learned to drop the constants and find the dominant term, so is this Asymptotic Analysis to prove that theory? or does it have other significance? otherwise, what is the point of this analysis when it is always going to be the biggest n in the equation, example- if it were n+n^2+3, then the upper bound would always be n^2 for some c and n0.
My second question :
as per rule the upper bound formula in Asymptotic Analysis must satisfy this condition f(n) = O(g(n)) IFF f(n) < c.g(n) where n>n0,c>0, n0>=1
i) n is the no of inputs, right? or does n represent the number of steps we perform? and does f(n) represents the algorithm?
ii) In the following video to prove upper bound of the equation 2n+3 could be n^2 the presenter considered c =1, and that is why to satisfy the equation n had to be >= 3 whereas one could have chosen c= 5 and n=1 as well, right? So then why were, in most cases in the video, the presenter was changing the value of n and not c to satisfy the conditions? is there a rule, or is it random? Can I change either c or n(n0) to satisfy the condition?
My Third Question:
In the same video, the presenter mentioned n0 (n not) is the number of steps. Is that correct? I thought n0 is the limit after which the graph becomes the upper bound (after n0, it satisfies the condition for all values of n); hence n0 also represents the input.
Would you please help me understand because people come up with different ideas in different explanations, and I want to understand them correctly?
Edit
The accepted answer clarified all of the questions except the first one. I have gone through many articles on the web, and here I am documenting my conclusion if anyone else has the same question. This will help them.
My first question was
In algorithmic complexity, we have learned to drop the constants and
find the dominant term, so is this Asymptotic Analysis to prove that
theory?
No, Asymptotic Analysis describes the algorithmic complexity, which is all about understanding or visualizing the Asymptotic behavior or the tail behavior of a function or a group of functions by plotting mathematical expression.
In computer science, we use it to evaluate (note: evaluate is not measuring) the performance of an algorithm in terms of input size.
for example, these two functions belong to the same group
mySet = set()
def addToMySet(n):
for i in range(n):
mySet.add(i*i)
mySet2 = set()
def addToMySet2(n):
for i in range(n):
for j in range(500):
mySet2.add(i*j)
Even though the execution time of the addToMySet2(n) is always > the execution time of addToMySet(n), the tail behavior of both of these functions would be the same with respect to the largest n, if one plot them in a graph the tendency of that graph for both of the functions would be linear thus they belong to the same group. Using Asymptotic Analysis, we get to see the behavior and group them.
A mistake that I made assuming upper bound represents the worst case. In reality, The upper bound of any algorithm is associated with all of the best, average, and worst cases. so the correct way of putting that would be
upper/lower bound in the best/average/worst case of an
algorithm
.
We can't relate the upper bound of an algorithm with the worst-case time complexity and the lower bound with the best-case complexity. However, an upper bound can be higher than the worst-case because upper bounds are usually asymptotic formulae that have been proven to hold.
I have seen this kind of question like find the worst-case time complexity of such and such algorithm, and the answer is either O(n) or O(n^2) or O(log-n), etc.
For example, if we consider the function addToMySet2(n), one would say the algorithmic time complexity of that function is O(n), which is technically wrong because there are three factors bound, bound type, (inclusive upper bound and strict upper bound) and case are involved determining the algorithmic time complexity.
When one denote O(n) it is derived from this Asymptotic Analysis f(n) = O(g(n)) IFF for any c>0, there is a n0>0 from which f(n) < c.g(n) (for any n>n0) so we are considering upper bound of best/average/worst case. In the above statement the case is missing.
I think We can consider, when not indicated, the big O notation generally describes an asymptotic upper bound on the worst-case time complexity. Otherwise, one can also use it to express asymptotic upper bounds on the average or best case time complexities
The whole point of asymptotic analysis is to compare algorithms performance scaling. For example, if I write two version of the same algorithm, one with O(n^2) time complexity and the other with O(n*log(n)) time complexity, I know for sure that the O(n*log(n)) one will be faster when n is "big". How big? it depends. You actually can't know unless you benchmark it. What you know is at some point, the O(n*log(n)) will always be better.
Now with your questions:
the "lower" n in n+n^2+3 is "dropped" because it is negligible when n scales up compared to the "dominant" one. That means that n+n^2+3 and n^2 behave the same asymptotically. It is important to note that even though 2 algorithms have the same time complexity, it does not mean they are as fast. For example, one could be always 100 times faster than the other and yet have the exact same complexity.
(i) n can be anything. It may be the size of the input (eg. an algorithm that sorts a list) but it may also be the input itself (eg. an algorithm that give the n-th prime number) or a number of iteration, etc
(ii) he could have taken any c, he chose c=1 as an example as he could have chosen c=1.618. Actually the correct formulation would be:
f(n) = O(g(n)) IFF for any c>0, there is a n0>0 from which f(n) < c.g(n) (for any n>n0)
the n0 from the formula is a pure mathematical construct. For c>0, it is the n value from which the function f is bounded by g. Since n can represent anything (size of a list, input value, etc), it is the same for n0
According to CourseEra course on Algorithms and Introduction to Algorithms
, a function G(n) where n is the input size is said to be a big oh notation of F(n) when there exists constants n0 and C such that this inequality holds true
F(n) <= C*G(N) ( For all N > N0 )
Now ,
This mathematical definition is very clear to me .
But as it was taught to me by my teacher today , I am confused!
He said that "Big - Oh Notations are upper bound on a function and it is like the LCM of two numbers i.e. Unique and greater than the function"
I don't think this statement was kind of correct, Is Big Oh notation really unique ?
Morover,
Thinking about Big Oh notations , I also confused myself why do we approximate the Big Oh notations to the highest degree term . ( We can easily prove the mathematical inequality though with nice choice of constants ) but what is the real use of it ??
I mean what does it signify?
We can even take F(n) as the Big Oh Notation of F(n) for the constant 1 !
I think it shows the dependence of the running time only on the highest degree term! Please Clear my doubts as I might have understood it wrongly from my book or my teacher made an error?
Is Big Oh notation really unique ?
Yes and no. By the pure formula, Big-O is of course not unique. However, to be of use for its purpose, one actually tries to find not just some upper bound, but the lowest upper bound. And this makes a meaningful "Big-O" unique.
We can even take F(n) as the Big Oh Notation of F(n) for the constant
1 !
Yes we probably can do that. However, the Big-O is used to relate classes of functions/algorithms to each other. Saying that F(n) relates to X(n) like F(n) relates to X(n) is what you get by using G(n) = F(n). Not much value in that.
That's why we try to find the unique lowest G to satisfy the equation. G(n) is usually a rather trivial function, like G(n) = n, G(n) = n², or G(n) = n*log(n), and this allows us to compare algorithms more easily because we can easily see that, e.g., G(n) = n is less than G(n) = n² for all n >= something.
Interestingly, most algorithms' complexity converges to one of the simple G(n) for large n. You could also say that, by looking at large n's, we try to separate out the "important" from the not-so-important parts of F(n); then we just omit the minor terms in F(n) and get a simplified function G(n).
In practical terms, we also want to abstract away from technical details. If I have, for instance, F(n) = 4*n and E(n) = 2*n I can use twice as much CPUs for the F algorithm and be just as good as the E one independent of the size of the input. Maybe one machine has a dedicated instruction for sqare root, so that SQRT(x) is a single step, while another machine needs much more instructions to get the result. We want to abstract away from that.
This implies one more point of view too: If I have a problem to solve, e.g. "calculate x(y)", I could present the solution as "result := x(y)", O(1). But that's not considered an algorithm. The specification of the algorithm must include a relevant level of detail to be a) meaningful and b) accessible to Big-O.
Our teacher gave us the following definition of Big O notation:
O(f(n)): A function g(n) is in O(f(n)) (“big O of f(n)”) if there exist
constants c > 0 and N such that |g(n)| ≤ c |f(n)| for all n > N.
I'm trying to tease apart the various components of this definition. First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
Next, I'm confused by the overall second portion of the statement. Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
In this formulation, O(f(n)) is a set of functions. Thus O(N) is the set of all functions that are (in simple terms) proportional to N as N tends to infinity.
The word "in" means ... "is a member of the set".
Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
It is the definition. And besides you have neglected the c term in your synopsis, and that is an important part of the definition.
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
Your intuition is incorrect in two respects.
Firstly, the real definition of O(N^2) is NOT that it takes N^2 operations. it is that it takes proportional to N^2 operations. That's where the c comes into it.
Secondly, it is only proportional to N^2 for large enough values of N. Big O notation is not about what happens for small N. It is about what happens when the problem size scales up.
Also, as a a comment notes "proportional" is not quite the right phraseology here. It might be more correct to say "tends towards proportional" ... but in reality there isn't a simple english description of what is going on here. The real definition is the mathematical one.
If you now reread the definition in the light of that, you should see that it fits just nicely.
(Note that the definitions of Big O, and related measures of complexity can also be expressed in calculus terminology; i.e. using "limits". However, generally speaking the things we are talking about are quantized; i.e. an integer number instructions, an integer number bytes of storage, etc. Calculus is really about functions involving real numbers. Hence, you could argue that the formulation above is preferable. OTOH, a real mathematician would probably see bus-sized holes in this argumentation.)
O(g(n)) looks like a function, but it is actually a set of functions. If a function f is in O(g(n)), it means that g is an asymptotic upper bound on f to within a constant factor. O(g(n)) contains all functions that are bounded from above by g(n).
More specifically, there exists a constant c and n0 such that f(n) < c * g(n) for all n > n0. This means that c * g(n) will always overtake f(n) beyond some value of n. g is asymptotically larger than f; it scales faster.
This is used in the analysis of algorithms as follows. The running time of an algorithm is impossible to specify practically. It would obviously depend on the machine on which it runs. We need a way of talking about efficiency that is unconcerned with matters of hardware. One might naively suggest counting the steps executed by the algorithm and using that as the measure of running time, but this would depend on the granularity with which the algorithm is specified and so is no good either. Instead, we concern ourselves only with how quickly the running time (this hypothetical thing T(n)) scales with the size of the input n.
Thus, we can report the running time by saying something like:
My algorithm (algo1) has a running time T(n) that is in the set O(n^2). I.e. it's bounded from above by some constant multiple of n^2.
Some alternative algorithm (algo2) might have a time complexity of O(n), which we call linear. This may or may not be better for some particular input size or on some hardware, but there is one thing we can say for certain: as n tends to infinity, algo2 will out-perform algo1.
Practically then, one should favour algorithms with better time complexities, as they will tend to run faster.
This asymptotic notation may be applied to memory usage also.
are there a limited amount of basic O Notations, considering you are meant to 'distil' them down to their most important part?
O(n^2):
O(n):
O(1):
O(log n) logarithmic
O(n!) factorial
O(na) polynomial
Or are you expected to work out variations such as O(n^4) etc... and if so, is that the only exception? the power of X one?
Generally, you distill Big-O notation (and related Bachman-Landau notations like Big-Theta and Big-Omega) down to the operation of the fastest-growing N term. So, you remove/simplify lesser terms (N2 + N == O(N2)) and nonvariable coefficients of the term (O(4N2) == O(N2)), but NOT powers or exponent bases (O(34N) == O(3N)). You also don't strip variable coefficients; NlogN is NlogN, NOT logN or N.
So, you will normally only see numbers in a Big-Oh notation if if the complexity is polynomial (power of N) or exponential (Nth power of a base). The most common Big-Oh notations are much as you show, with the addition of NlogN (VERY common).
However, if you are differentiating between two algorithms of equal general complexity, you MAY add lesser terms and/or coefficients back in to demonstrate the relative difference; an algorithm that performs linearly but has double the instructions of another might be described as O(2N) when comparing it with the other O(N) algorithm. However, taken individually, both algorithms are linear (O(N)).
Some Big-O notations are not algebraic, and may involve multiple variables in therir simplest general-case form. The counting sort, for instance, is complexity O(Max(N,M)), where N is the number of elements in the list, and M is the range of those elements. Often it is possible to reduce this in specific cases by defining M in terms of N and thus reducing to a single variable (if the list in question is of the first N squares, M = N2-1), but in the general case both variables are independent and significant. BucketSort's complexity is officially O(N), but really it's more like O(NlogM) where M is the maximum value of the list of N elements. M is usually considered insignificant, but that depends on the values you normally sort (sorting 5 values each in the billions will require more loops to compare each power of 10 than traversals through the list to put them in the buckets) and on the radix used (RadixSort is a base-2 BucketSort; again, sorting values with a greater log2 value will require more loops than traversals).
The Big-O notation is a way to provide an upper bound on the limiting behaviour of a function. There are no restrictions on its functional form. However, there are certain conventions, as explained by Wikipedia:
In typical usage, the formal definition of O notation is not used directly; rather, the O notation for a function f(x) is derived by the following simplification rules:
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
If f(x) is a product of several factors, any constants (terms in the product that do not depend on x) are omitted.
There are, of course, some functional forms that show up more frequently that others. Some common classes are listed here.
No, the number of different O-classes is not finite.
As you already mentioned O(n^x) describes a different set for every x. And that is not the only "exception". O(x^n) is also a different set for every x. Likewise O(n^n), O(n^n^n), O(n^n^n^n) etc. are all different sets (and you can of course continue that ad infinitum).
In general, you split the expression into a sum of products, keep the largest term, and divide by constants to simplify it as much as possible.
ex:
n(2n+3log(n)) => 2n^2+3nlog(n) => 2n^2 => n^2
(n+1)(2nlog(n)+n) => 2n^2log(n)+n^2+2nlog(n)+n => 2n^2log(n) => n^2log(n)