Why doesn't the additive constant in Big-O notation matter? - algorithm

In the definition of Big-O notation we care only about the coefficient C:
f(n) ≤ Cg(n) for all n ≥ k
Why don't we care about A as well:
f(n) ≤ Cg(n) + A for all n ≥ k

There are really two cases to consider here. For starters, imagine that your function g(n) has the property that g(n) ≥ 1 for all "sufficiently large" choices of n. In that case, if you know that
f(n) ≤ cg(n) + A,
then (assuming A ≥ 0) you also know that
f(n) ≤ cg(n) + Ag(n),
so
f(n) ≤ (c + A)g(n).
In other words, if your function g is always at least one, then bounding f(n) by something of the form cg(n) + A is equivalent to bounding it with something of the form c'g(n) for some new constant c'. In that sense, adding some extra flexibility into the definition of big-O notation, at least in this case, wouldn't make a difference.
In the context of the analysis of algorithms, pretty much every function g(n) you might bound something with will be at least one, and so we can "munch up" that extra additive term by choosing a larger multiple of g.
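As a quick sanity check (my own illustration, not part of the original argument), a few lines of Python confirm numerically that the additive constant is absorbed whenever g(n) ≥ 1; the values c = 3 and A = 100 below are arbitrary examples:

    # If g(n) >= 1, then c*g(n) + A <= (c + A)*g(n), so the additive
    # constant A can be absorbed into a larger multiplicative constant.
    def absorbed(c, A, g, ns):
        return all(c * g(n) + A <= (c + A) * g(n) for n in ns)

    print(absorbed(c=3, A=100, g=lambda n: n, ns=range(1, 10000)))       # True
    print(absorbed(c=3, A=100, g=lambda n: n * n, ns=range(1, 10000)))   # True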
However, big-O notation is also used in many cases to bound functions that decrease as n increases. For example, we might say that the probability that some algorithm gives back the right answer is O(1 / n), where the function 1/n drops to 0 as a function of n. In this case, we use big-O notation to talk about how fast the function drops off. If the success probability is O(1 / n²), for example, that's a better guarantee than the earlier O(1 / n) success probability, assuming n gets sufficiently large. In that case, allowing for additive terms in the definition of big-O notation would actually break things. For example, intuitively, the function 1 / n² drops to 0 faster than the function 1 / n, and using the formal definition of big-O notation you can see this because 1 / n² ≤ 1 / n for all n ≥ 1. However, with your modified definition of big-O notation, we could also say that 1 / n = O(1 / n²), since
1 / n ≤ 1 / n² + 1 for all n ≥ 1,
which is true only because the additive 1 term bounds the 1/n term, not the 1/n² we might have been initially interested in.
So the long answer to your question is "the definition you proposed above is equivalent to the regular definition of big-O if we only restrict ourselves to the case where g(n) doesn't drop to zero as a function of n, and in the case where g(n) does drop to zero as a function of n your new definition isn't particularly useful."
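To make the counterexample concrete, here is a small illustrative check (my own snippet): the modified inequality 1/n ≤ 1/n² + 1 does hold, yet the ratio (1/n) / (1/n²) = n is unbounded, so 1/n is not O(1/n²) under the standard definition.

    for n in [10, 100, 1000, 10000]:
        modified_ok = 1 / n <= 1 / n**2 + 1      # holds for every n >= 1
        ratio = (1 / n) / (1 / n**2)             # equals n, grows without bound
        print(n, modified_ok, ratio)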

Big-O notation is about what happens as the data gets larger. In other words, it is a limit as n --> infinity.
As n gets large, A remains the same. So it gets smaller and smaller in comparison. On the other hand, g(n) (presumably) gets bigger and bigger, so its contribution increases more and more.

A is constant in this case, so it does not affect the complexity much once the size of the problem grows large.
When you have a cost of 1 million, you do not care about adding a constant term of 100, for example. You care about how that 1 million grows (as a result of Cg(n)); whether it becomes 2 million, for example, when the size of the problem grows a bit. Your constant, however, is still 100, so it doesn't really affect the overall complexity.
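A tiny numeric illustration (with made-up values C = 2, A = 100 and g(n) = n^2) shows how quickly the additive constant becomes negligible:

    C, A = 2, 100
    for n in [10, 100, 1000, 10000]:
        total = C * n**2 + A
        print(n, total, A / total)   # A's share of the total shrinks toward 0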

Related

Having a bit of trouble reasoning the formal definition of Big O

My professor recently brushed over the formal definition of Big O:
To be completely honest even after him explaining it to a few different students we all seem to still not understand it at its core. The problems in comprehension mostly occurred with the following examples we went through:
So far my reasoning is as follows:
When you multiply a function's highest term by a constant, you get a new function that eventually surpasses the initial function beyond a certain n. He called this n a "witness" to the claim that the function is O(g(n)).
How is this c term created/found? He mentioned bounds a couple of times but didn't really specify what bounds signify or how to find them/use them.
I think I just need a more solid foundation of the formal definition and how these examples back up the definition.
I think that the way this definition is typically presented in terms of c values and n0's is needlessly confusing. What f(n) being O(g(n)) really means is that when you disregard constant and lower order terms, g(n) is an asymptotic upper bound for f(n) (for a function g to asymptotically upper bound f just means that past a certain point g is always greater than or equal to f). Put another way, f(n) grows no faster than g(n) as n goes to infinity.
Big O itself is a little confusing, because f(n) = O(g(n)) doesn't mean that g(n) grows strictly faster than f(n). It means that when you disregard constant and lower order terms, g(n) grows faster than f(n), or it grows at the same rate (strictly faster would be "little o"). A simple, formal way to put this concept is to say that f(n) is O(g(n)) when
lim (n → ∞) g(n) / f(n) > 0
That is, for this limit to hold true, the highest order term of f(n) can be at most a constant multiple of the highest order term of g(n). f(n) is O(g(n)) iff it grows no faster than g(n).
For example, f(n) = n is in O(g(n) = n^2), because past a certain point n^2 is always bigger than n. The limit of n^2 over n is positive (it tends to infinity), so n is in O(n^2).
As another example, f(n) = 5n^2 + 2n is in O(g(n) = n^2), because in the limit, f(n) can only be about 5 times larger than g(n). It's not infinitely bigger: they grow at the same rate. To be precise, the limit of n^2 over 5n^2 + 2n is 1/5, which is more than zero, so 5n^2 + 2n is in O(n^2). Hopefully this limit based definition provides some intuition, as it is completely equivalent mathematically to the provided definition.
Finding a particular constant c and starting value n0 for which the provided inequality holds true is just a particular way of showing that in the limit as n goes to infinity, g(n) grows at least as fast as f(n): that f(n) is in O(g(n)). That is, if you've found a value past which c*g(n) is always greater than f(n), you've shown that f(n) grows no more than a constant multiple (c times) faster than g(n) (if f grew faster than g by more than a constant multiple, finding such a c and n0 would be impossible).
There's no real art to finding a particular c and n0 value to demonstrate f(n) = O(g(n)). They can be literally whatever positive values you need them to be to make the inequality true. In fact, if it is true that f(n) = O(g(n)) then you can pick any value you want for c and there will be some sufficiently large n0 value that makes the inequality true, or, similarly you could pick any n0 value you want, and if you make c big enough the inequality will become true (obeying the restrictions that c and n0 are both positive). That's why I don't really like this formalization of big O: it's needlessly particular and proofs involving it are somewhat arbitrary, distracting away from the main concept which is the behavior of f and g as n goes to infinity.
So, as for how to handle this in practice, using one of the example questions: why is n^2 + 3n in O(n^2)?
Answer: because the limit as n goes to infinity of n^2 / (n^2 + 3n) is 1, which is greater than 0.
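If you happen to have sympy available, the limit check can be done symbolically; a minimal sketch:

    import sympy as sp

    n = sp.symbols('n', positive=True)
    f = n**2 + 3*n
    g = n**2
    print(sp.limit(g / f, n, sp.oo))   # 1, which is > 0, so f is in O(g)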
Or, if you're wanting/needing to do it the other way, pick any positive value you want for n0, and evaluate f at that value. f(1) will always be easy enough:
f(1) = 1^2 + 3*1 = 4
Then find the constant you could multiply g(1) by to get the same value as f(1) (or, if not using n0 = 1 use whatever n0 for g that you used for f).
c*g(1) = 4
c*1^2 = 4
c = 4
Then, you just combine the statements into an assertion to show that there exists a positive n0 and a constant c such that f(n) <= cg(n) for all n >= n0.
n^2 + 3n <= (4)n^2 for all n >= 1, implying n^2 + 3n is in O(n^2)
If you're using this method of proof, the above statement you use to demonstrate the inequality should ideally be immediately obvious. If it's not, maybe you want to change your n0 so that the final statement is more clearly true. I think that showing the limit of the ratio g(n)/f(n) is positive is much clearer and more direct if that route is available to you, but it is up to you.
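For completeness, here is a quick numeric spot-check of the witness pair (c, n0) = (4, 1) found above (a finite check is of course not a proof, just a sanity test):

    f = lambda n: n**2 + 3*n
    g = lambda n: n**2
    c, n0 = 4, 1
    print(all(f(n) <= c * g(n) for n in range(n0, 100000)))   # True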
Moving to a negative example, it's quite easy with the limit method to show that f(n) is not in O(g(n)). To do so, you just show that the limit of g(n) / f(n) = 0. Using the third example question: is nlog(n) + 2n in O(n)? With the limit method the answer is no: the limit of n / (nlog(n) + 2n) as n goes to infinity is 0, so nlog(n) + 2n is not in O(n).
To demonstrate it the other way, you actually have to show that there exists no positive pair of numbers n0, c such that for all n >= n0 f(n) <= cg(n).
Unfortunately, showing that f(n) = nlogn + 2n is in O(nlogn) by using c=2, n0=8 demonstrates nothing about whether f(n) is in O(n) (showing that a function is in a higher complexity class implies nothing about it not being in a lower complexity class).
To see why this is the case, we could also show that a(n) = n is in O(g(n) = nlogn) using those same c and n0 values (n <= 2nlog(n) for all n >= 8, implying n is in O(nlogn)), and yet a(n) = n clearly is in O(n). That is to say, to show f(n) = nlogn + 2n is not in O(n) with this method, you can't just show that it is in O(nlogn). You would have to show that no matter what n0 you pick, you can never find a c value large enough such that f(n) <= c·n for all n >= n0. Showing that such a pair of numbers does not exist is not impossible, but relatively speaking it's a tricky thing to do (and would probably itself involve limit equations, or a proof by contradiction).
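To see numerically why no such (c, n0) can exist here, note that the ratio f(n)/n = log(n) + 2 keeps growing; a small illustrative check (my own snippet, using base-2 logs):

    from math import log2

    f = lambda n: n * log2(n) + 2 * n
    for n in [2**8, 2**16, 2**32, 2**64]:
        print(n, f(n) / n)   # equals log2(n) + 2, so it eventually exceeds any fixed c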
To sum things up, f(n) is in O(g(n)) if the limit of g(n) over f(n) is positive, which means f(n) doesn't grow any faster than g(n). Similarly, finding a constant c and a value n0 beyond which cg(n) >= f(n) shows that f(n) cannot grow asymptotically faster than g(n), implying that when discarding constants and lower order terms, g(n) is a valid upper bound for f(n).

Role of lower order terms in big O notation

In big O notation, we always say that we should ignore constant factors and lower order terms in most cases. That is, rather than writing,
3n^2-100n+6
we are almost always satisfied with
n^2
since that term is the fastest growing term in the equation.
But I found that many algorithm courses start by comparing functions with many terms, e.g.
2n^2+120n+5 = big O of n^2
and then finding c and n0 for those long functions, before recommending to ignore the low order terms in the end.
My question is: what would I get from trying to understand and analyse these kinds of functions with many terms? Before this month I was comfortable with understanding what O(1), O(n), O(log(n)), O(n^3) mean. But am I missing some important concepts if I just rely on these typically used functions? What will I miss if I skip analysing those long functions?
Let's first of all describe what we mean when we say that f(n) is in O(g(n)):
... we can say that f(n) is O(g(n)) if we can find a constant c such
that f(n) is less than c·g(n) for all n larger than n0, i.e., for all
n > n0.
In equation form: we need to find one pair of constants (c, n0) that fulfils
f(n) < c · g(n), for all n > n0, (+)
Now, the result that f(n) is in O(g(n)) is sometimes presented in different forms, e.g. as f(n) = O(g(n)) or f(n) ∈ O(g(n)), but the statement is the same. Hence, from your question, the statement 2n^2+120n+5 = big O of n^2 is just:
f(n) = 2n^2 + 120n + 5
a result after some analysis: f(n) is in O(g(n)), where
g(n) = n^2
OK, with this out of the way, let us look at the constant term in the functions we want to analyse asymptotically, and let's do it educationally, using your example.
As the result of any big-O analysis is the asymptotic behaviour of a function, in all but some very unusual cases, the constant term has no effect whatsoever on this behaviour. The constant term can, however, affect how we choose the constant pair (c, n0) used to show that f(n) is in O(g(n)) for some functions f(n) and g(n), i.e., the non-unique constant pair (c, n0) used to show that (+) holds. We can say that the constant term will have no effect on the result of the analysis, but it can affect our derivation of this result.
Let's look at your function as well as another related function:
f(n) = 2n^2 + 120n + 5 (x)
h(n) = 2n^2 + 120n + 22500 (xx)
Using a similar approach as in this thread, for f(n), we can show:
linear term:
120n < n^2 for n > 120 (verify: 120n = n^2 at n = 120) (i)
constant term:
5 < n^2 for e.g. n > 3 (verify: 3^2 = 9 > 5) (ii)
This means that if we replace both 120n as well as 5 in (x) by n^2 we can state the following inequality result:
Given that n > 120, we have:
2n^2 + n^2 + n^2 = 4n^2 > {by (i) and (ii)} > 2n^2 + 120n + 5 = f(n) (iii)
From (iii), we can choose (c, n0) = (4, 120), and (iii) then shows that these constants fulfil (+) for f(n) with g(n) = n^2, and hence
result: f(n) is in O(n^2)
Now, for for h(n), we analogously have:
linear term (same as for f(n))
120n < n^2 for n > 120 (verify: 120n = n^2 at n = 120) (I)
constant term:
22500 < n^2 for e.g. n > 150 (verify: 150^2 = 22500) (II)
In this case, we replace 120n as well as 22500 in (xx) by n^2, but we need a larger lower bound on n for these replacements to hold, namely n > 150. Hence, the following holds:
Given that n > 150, we have:
2n^2 + n^2 + n^2 = 4n^2 > {by (I) and (II)} > 2n^2 + 120n + 22500 = h(n) (III)
In the same way as for f(n), we can here choose (c, n0) = (4, 150), and (III) then shows that these constants fulfil (+) for h(n), with g(n) = n^2, and hence
result: h(n) is in O(n^2)
Hence, we have the same result for both functions f(n) and h(n), but we had to use different constants (c, n0) to show it (i.e., a somewhat different derivation). Note finally that:
Naturally the constants (c,n0) = (4,150) (used for h(n) analysis) are also valid to show that f(n) is in O(n^2), i.e., that (+) holds for f(n) with g(n)=n^2.
However, not the reverse: (c,n0) = (4,120) cannot be used to show that (+) holds for h(n) (with g(n)=n^2).
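These claims are easy to spot-check numerically; the following sketch (a finite check only, not a proof) confirms that (4, 120) works for f(n) but not for h(n), while (4, 150) works for both:

    f = lambda n: 2 * n**2 + 120 * n + 5
    h = lambda n: 2 * n**2 + 120 * n + 22500
    g = lambda n: n**2

    def fulfils(func, c, n0, upto=10000):
        # checks (+): func(n) < c * g(n) for all n with n0 < n < upto
        return all(func(n) < c * g(n) for n in range(n0 + 1, upto))

    print(fulfils(f, 4, 120), fulfils(h, 4, 120), fulfils(h, 4, 150))   # True False True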
The core of this discussion is that:
As long as you look at sufficiently large values of n, you will be able to describe the constant terms in relations as constant < dominantTerm(n), where, in our example, we look at the relation with regard to dominant term n^2.
The asymptotic behaviour of a function will not (in all but some very unusual cases) depend on the constant terms, so we might as well skip looking at them at all. However, for a rigorous proof of the asymptotic behaviour of some function, we need to take into account also the constant terms.
Ever have intermediate steps in your work? That is likely what this is: when you are computing a big O, chances are you don't already know for sure what the highest order term is, so you keep track of them all and then determine which complexity class makes sense in the end. There is also something to be said for understanding why the lower order terms can be ignored.
Take some graph algorithms like a minimum spanning tree or a shortest path. Now, just by looking at the algorithm, can you tell what the highest order term will be? I know I wouldn't, and so I'd trace through the algorithm and collect a bunch of terms.
If you want another example, consider Sorting Algorithms and whether you want to memorize all the complexities or not. Bubble Sort, Shell Sort, Merge Sort, Quick Sort, Radix Sort and Heap Sort are a few of the more common algorithms out there. You could either memorize both the algorithm and complexity or just the algorithm and derive the complexity from the pseudo code if you know how to trace them.

Are 2^n and n*2^n in the same time complexity?

Resources I've found on time complexity are unclear about when it is okay to ignore terms in a time complexity equation, specifically with non-polynomial examples.
It's clear to me that given something of the form n^2 + n + 1, the last two terms are insignificant.
Specifically, given two categorizations, 2^n and n*(2^n), is the second in the same order as the first? Does the additional n multiplication there matter? Usually resources just say x^n is exponential and grows much faster... then move on.
I can understand why it wouldn't, since 2^n will greatly outpace n, but because they're not being added together, it would matter greatly when comparing the two equations; in fact, the difference between them will always be a factor of n, which seems important to say the least.
You will have to go to the formal definition of the big O (O) in order to answer this question.
The definition is that f(x) belongs to O(g(x)) if and only if lim sup (x → ∞) of f(x)/g(x) is finite, i.e. is not infinity. In short this means that there exists a constant M such that the value of f(x)/g(x) is never greater than M (for sufficiently large x).
In the case of your question, let f(n) = n ⋅ 2ⁿ and let g(n) = 2ⁿ. Then f(n)/g(n) is n, which still grows without bound. Therefore f(n) does not belong to O(g(n)).
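A trivial numeric illustration of that ratio (my own snippet):

    for n in [10, 20, 40, 80]:
        print(n, (n * 2**n) / 2**n)   # the ratio is exactly n, so it is unbounded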
A quick way to see that n⋅2ⁿ is bigger is to make a change of variable. Let m = 2ⁿ. Then n⋅2ⁿ = ( log₂m )⋅m (taking the base-2 logarithm on both sides of m = 2ⁿ gives n = log₂m ), and you can easily show that m log₂m grows faster than m.
I agree that n⋅2ⁿ is not in O(2ⁿ), but I thought it should be more explicit since the limit superior usage doesn't always hold.
By the formal definition of Big-O: f(n) is in O(g(n)) if there exist constants c > 0 and n₀ ≥ 0 such that for all n ≥ n₀ we have f(n) ≤ c⋅g(n). It can easily be shown that no such constants exist for f(n) = n⋅2ⁿ and g(n) = 2ⁿ. However, it can be shown that g(n) is in O(f(n)).
In other words, n⋅2ⁿ is lower bounded by 2ⁿ. This is intuitive. Although they are both exponential and thus are equally unlikely to be used in most practical circumstances, we cannot say they are of the same order because 2ⁿ necessarily grows slower than n⋅2ⁿ.
I do not argue with the other answers that say that n⋅2ⁿ grows faster than 2ⁿ. But the growth of n⋅2ⁿ is still only exponential.
When we talk about algorithms, we often say that the time complexity grows exponentially.
So, we consider 2ⁿ, 3ⁿ, eⁿ, 2.000001ⁿ, and our n⋅2ⁿ to belong to the same group of complexities with exponential growth.
To give it a bit of mathematical sense, we consider a function f(x) to grow (not faster than) exponentially if there exists a constant c > 1 such that f(x) = O(cˣ).
For n⋅2ⁿ the constant c can be any number greater than 2; let's take 3. Then:
n⋅2ⁿ / 3ⁿ = n ⋅ (2/3)ⁿ and this is less than 1 for any n, and it tends to 0 as n grows, so n⋅2ⁿ = O(3ⁿ).
So 2ⁿ grows slower than n⋅2ⁿ, which in turn grows slower than 2.000001ⁿ. But all three of them grow exponentially.
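A quick spot-check of the n⋅(2/3)ⁿ claim above (illustrative only):

    for n in [1, 2, 3, 5, 10, 50]:
        print(n, n * (2 / 3) ** n)   # stays below 1 and tends to 0 as n grows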
You asked "is the second in the same order as the first? Does the additional n multiplication there matter?" These are two different questions with two different answers.
n 2^n grows asymptotically faster than 2^n. That's that question answered.
But you could ask: "if algorithm A takes 2^n nanoseconds, and algorithm B takes n 2^n nanoseconds, what is the biggest n where I can find a solution in a second / minute / hour / day / month / year?" And the answers are n = 29/35/41/46/51/54 vs. 25/30/36/40/45/49. Not much difference in practice.
The size of the biggest problem that can be solved in time T is O (ln T) in both cases.
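For the curious, the quoted numbers can be reproduced with a short script (assuming one elementary step per nanosecond, a 30-day month, and a 365-day year):

    def biggest_n(cost, budget_ns):
        # largest n whose cost still fits in the time budget
        n = 1
        while cost(n + 1) <= budget_ns:
            n += 1
        return n

    second = 10**9   # nanoseconds
    budgets = [second, 60 * second, 3600 * second, 86400 * second,
               30 * 86400 * second, 365 * 86400 * second]
    print([biggest_n(lambda n: 2**n, b) for b in budgets])        # [29, 35, 41, 46, 51, 54]
    print([biggest_n(lambda n: n * 2**n, b) for b in budgets])    # [25, 30, 36, 40, 45, 49]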
The very simple answer is 'no'.
Compare 2^n and n.2^n:
as can be seen, n.2^n > 2^n for any n > 1,
or you can do it by applying log to both sides, and then you get
n.log(2) < n.log(2) + log(n).
Hence, by both types of analysis, that is, by
substituting a number
applying log
we see that n.2^n is greater than 2^n.
So if you get an expression like
O(2^n + n.2^n), it can be replaced by O(n.2^n).

Which algorithm is better?

I have two algorithms.
The complexity of the first one is somewhere between Ω(n^2*(logn)^2) and O(n^3).
The complexity of the second is ω(n*log(logn)).
I know that O(n^3) tells me that it can't be worse than n^3, but I don't know the difference between Ω and ω. Can someone please explain?
Big-O: An asymptotic upper bound on the performance of an algorithm (often used for the worst case). The bounding function g(n) is chosen so that, from some point on, it always has a value at least as large as the actual running time of the algorithm. [Constant factors are ignored because they are meaningless as n reaches infinity.]
Big-Ω: The opposite of Big-O; an asymptotic lower bound (often used for the best case). The bounding function g(n) is chosen so that, from some point on, it always has a value no larger than the actual running time of the algorithm. [Constant factors are ignored because they are meaningless as n reaches infinity.]
Big-Θ: The algorithm is so nicely behaved that a single function g(n) describes both the algorithm's upper and lower bounds, up to constant factors. An algorithm could then have something like this: Θ(n), with an O(c1·n) upper bound and an Ω(c2·n) lower bound, where the same g(n) = n is used throughout.
Little-o: Like Big-O, but strict. A Big-O bound is allowed to be tight (the bound and the actual performance can grow at the same rate as you head out to infinity), whereas a little-o bound is never tight: the bounding function always grows strictly faster than the actual performance. Example: o(n^7) is a valid little-o bound for a function that might actually have linear or O(n) performance.
Little-ω: Just the opposite. ω(1) [constant time] would be a valid little-omega bound for the same function above that might actually exhibit Ω(n) performance.
Big omega (Ω) lower bound:
A function f is an element of the set Ω(g) (which is often written as f(n) = Ω(g(n))) if and only if there exist c > 0 and n0 > 0 such that for every n >= n0 the following inequality is true:
f(n) >= c * g(n)
Little omega (ω) lower bound:
A function f is an element of the set ω(g) (which is often written as f(n) = ω(g(n))) if and only for each c > 0 we can find n0 > 0 (depending on the c), such that for every n >= n0 the following inequality is true:
f(n) >= c * g(n)
You can see that it's actually the same inequality in both cases, the difference is only in how we define or choose the constant c. This slight difference means that the ω(...) is conceptually similar to the little o(...). Even more - if f(n) = ω(g(n)), then g(n) = o(f(n)) and vice versa.
Returning to your two algorithms: algorithm #1 is bounded from both sides, so it looks more promising to me. Algorithm #2 can work longer than c * n * log(log(n)) for any (arbitrarily large) c, so it might eventually lose to algorithm #1 for some n. Remember, it's only asymptotic analysis, so everything depends on the actual values of these constants and the problem sizes that have practical meaning.

When analyzing the time complexity of an algorithm why do we drop the constant of the term with the largest degree

Suppose I have the following: T(n) = 5n^2 + 2n
The asymptotic tight bound of this is Θ(n^2). I want to understand the reason behind dropping the 5. I understand why we ignore the lower order terms.
Consult the definition of big-O.
Keeping things simple[*], let's define that a function g is O(f) if there exist constants C and M such that for n > M, 0 <= g(n) < Cf(n).
The presence of a positive constant multiplier in f doesn't affect this; just choose C appropriately. Your example T is O(n^2): choose a value of C greater than 5, and a value of M big enough that the +2n is irrelevant. For example, for n > 2 it's a fact that 5n^2 + 2n < 6n^2 (because n^2 > 2n), so with C = 6 and M = 2 we see that T(n) is O(n^2).
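A quick numeric spot-check of that choice of C = 6 and M = 2 (finite and illustrative only, not a proof):

    T = lambda n: 5 * n**2 + 2 * n
    print(all(T(n) < 6 * n**2 for n in range(3, 100000)))   # True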
So it's true that T(n) is O(n^2), and also true that it's O(5n^2), and O(5n^2 + 2n). The most interesting of those facts is that it's O(n^2), since it's the simplest expression and the other two are logically equivalent. If we want to compare the complexities of different functions, then we want to use simple expressions.
For big-Theta just note that we can play the same trick when f and g are the other way around. The relation "g is Theta(f)" is an equivalence relation, so what are we going to choose as the representative member of the equivalence class of T? The simplest one.
[*] Keeping things less simple, we cope with negative numbers by using a limsup rather than a plain limit. My definition above is actually sufficient but not necessary.
Everything comes back to the concept of "Order of Magnitude". Given something like
5n^2 +2n
You might think that the 5 is significant; however, when you break it down and think about the numbers, the constant really doesn't matter (graph it and you will see why). For example, say n is 50.
5 * 50^2 + 2 * 50 --> 5 * 2500 + 2 * 50 --> 12,600
As you mentioned, the 2*n is insignificant when compared to n^2. The same concept applies when viewing the constant as well... you might think that 2,500 (that is, 50^2) vs 125,000 (that is, 50^3) is a big difference; however, consider the full expressions with the constants included: if the algorithm were n^3 instead of n^2, you would now be looking at 12,600 (5*50^2 + 2*50) vs 625,100 (5*50^3 + 2*50).
So the factor that makes the most significant difference to the cost of the algorithm is the n^2 term itself, not its constant multiplier.
The constant is dropped because it does not affect which complexity class the function belongs to.
If you have two functions f(n) = c1 * n^2 and g(n) = c2 * n^3, where c1 and c2 are constants, it doesn't matter how large c1 is and how small c2 is, g(n) will ALWAYS overtake and outgrow f(n) at some value of n.
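A tiny sketch with deliberately lopsided, made-up constants (c1 = 1000, c2 = 0.001) shows the overtaking:

    c1, c2 = 1000, 0.001
    f = lambda n: c1 * n**2
    g = lambda n: c2 * n**3
    for n in [10**5, 10**7, 10**9]:
        print(n, f(n) < g(n))   # False, then True once n exceeds c1/c2 = 10**6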
