How can an algorithm have two worst case complexities? - algorithm

Steven Skiena's The Algorithm design manual's chapter 1 exercise has this question:
Let P be a problem. The worst-case time complexity of P is O(n^2) .
The worst-case time complexity of P is also Ω(n log n) . Let A be an
algorithm that solves P. Which subset of the following statements are
consistent with this information about the complexity of P?
A has worst-case time complexity O(n^2) .
A has worst-case time complexity O(n^3/2).
A has worst-case time complexity O(n).
A has worst-case time complexity ⍬(n^2).
A has worst-case time complexity ⍬(n^3) .
How can an algorithm have two worst-case time complexities?
Is the author trying to say that for some value of n (say e.g. 300) upper bound for algorithm written for solving P is of the order of O(n^2) while for another value of n (say e.g. 3000) the same algorithm worst case was Ω(n log n)?

The answer to your specific question
is the author trying to say that for some value of n (say e.g. 300) upper bound for algorithm written for solving P is of the order of O(n^2) while for another value of n (say e.g. 3000) the same algorithm worst case was Ω(n log n)?
is no. That is not how complexity functions work. :) We don't talk about different complexity classes for different values of n. The complexity refers to the entire algorithm, not to the algorithm at specific sizes. An algorithm has a single time complexity function T(n), which computes how many steps are required to carry out the computation for an input size of n.
In the problem, you are given two pieces of information:
The worst case complexity is O(n^2)
The worst case complexity is Ω(n log n)
All this means is that we can pick constants c1, c2, N1, and N2, such that, for our algorithm's function T(n), we have
T(n) ≤ c1*n^2 for all n ≥ N1
T(n) ≥ c2*n log n for all n ≥ N2
In other words, our T(n) is "asymptotically bounded below" by some constant time n log n and "asymptotically bounded above" by some constant times n^2. It can itself be anything "between" an n log n style function and an n^2 style function. It can even be n log n (since that is bounded above by n^2) or it can be n^2 (since that's bounded below by n log n. It can be something in between, like n(log n)(log n).
It's not so much that an algorithm has "multiple worst case complexities" in the sense it has different behaviors. What are you seeing is an upper bound and a lower bound! And these can, of course, be different.
Now it is possible that you have some "weird" function like this:
def p(n):
if n is even:
print n log n stars
else:
print n*2 stars
This crazy algorithm does have the bounds specified in the problem from the Skiena book. And it has no Θ complexity. That might have been what you were thinking about, but do note that it is not necessary for a complexity function to be this weird in order for us to say the upper and lower bounds differ. The thing to remember is that upper and lower bounds are not tight unless explicitly stated to be so.

Related

Why O(n log n) is greater than O(n)?

I read that O(n log n) is greater than O(n), I need to know why is it so?
For instance taking n as 1, and solving O(n log n) will be O(1 log 1) = O(0). On the same hand O(n) will be O(1)?
Which actually contradicts O(n log n) > O(n)
Let us start by clarifying what is Big O notation in the current context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting
behavior of a function when the argument tends towards a particular
value or infinity. (..) In computer science, big O notation is used to classify algorithms
according to how their run time or space requirements grow as the
input size grows.
The following statement is not accurate:
For instance taking n as 1, solving O(n log n) will be O(1 log 1) =
O(0). On the same hand O(n) will be O(1)?
One cannot simply perform "O(1 log 1)" since the Big O notation does not represent a function but rather a set of functions with a certain asymptotic upper-bound; as one can read from source:
Big O notation characterizes functions according to their growth
rates: different functions with the same growth rate may be
represented using the same O notation.
Informally, in computer-science time-complexity and space-complexity theories, one can think of the Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
and O(n log n) as:
An algorithm is said to run in quasilinear time/space if T(n) = O(n log^k n) for some positive constant k; linearithmic time/space is the case k = 1 (source).
Mathematically speaking the statement
I read that O(n log n) is greater than O(n) (..)
is not accurate, since as mentioned before Big O notation represents a set of functions. Hence, more accurate will be O(n log n) contains O(n). Nonetheless, typically such relaxed phrasing is normally used to quantify (for the worst-case scenario) how a set of algorithms behaves compared with another set of algorithms regarding the increase of their input sizes. To compare two classes of algorithms (e.g., O(n log n) and O(n)) instead of
For instance taking n as 1, solving O(n log n) will be O(1 log 1) =
O(0). On the same hand O(n) will be O(1)?
Which actually contradicts O(n log n) > O(n)
you should analyze how both classes of algorithms behaves with the increase of their input size (i.e., n) for the worse-case scenario; analyzing n when it tends to the infinity
As #cem rightly point it out, in the image "big-O denote one of the asymptotically least upper-bounds of the plotted functions, and does not refer to the sets O(f(n))"
As you can see in the image after a certain input, O(n log n) (green line) grows faster than O(n) (yellow line). That is why (for the worst-case) O(n) is more desirable than O(n log n) because one can increase the input size, and the growth rate will increase slower with the former than with the latter.
I'm going to give the you the real answer, even though it seems to be more than one step away from the way you're currently thinking about it...
O(n) and O(n log n) are not numbers, or even functions, and it doesn't quite make sense to say that one is greater than the other. It's sloppy language, but there are actually two accurate statements that might be meant by saying that "O(n log n) is greater than O(n)".
Firstly, O(f(n)), for any function f(n) of n, is the infinite set of all functions that asymptotically grow no faster than f(n). A formal definition would be:
A function g(n) is in O(f(n)) if and only if there are constants n0 and C such that g(n) <= Cf(n) for all n > n0.
So O(n) is a set of functions and O(n log n) is a set of functions, and O(n log n) is a superset of O(n). Being a superset is kind of like being "greater", so if one were to say that "O(n log n) is greater than O(n)", they might be referring to the superset relationship between them.
Secondly, the definition of O(f(n)) makes f(n) an upper bound on the asymptotic growth of functions in the set. And the upper bound is greater for O(n log n) than it is for O(n). In more concrete terms, there a constant n0 such that n log n > n, for all n > n0. The bounding function itself is asymptotically greater, and this is another thing that one might mean when saying "O(n log n) is greater than O(n)".
Finally, both of these things are mathematically equivalent. If g(n) is asymptotically greater than f(n), it follows mathematically that O(g(n)) is a superset of O(f(n)), and if O(g(n)) is a proper superset of O(f(n)), it follows mathematically that g(n) is asymptotically greater than f(n).
Therefore, even though the statement "O(n log n) is greater than O(n)" does not strictly make any sense, it has a clear and unambiguous meaning if you're willing to read it charitably.
The big O notation only has an asymptotic meaning, that is it makes sense only when n goes to infinity.
For example, a time complexity of O(100000) just means your code runs in constant time, which is asymptotically faster (smaller) than O(log n).

What is the complexity of code that does O(n*log n) work, and then O(n^2) work?

I have an algorithm that first does something in O(n*log(n)) time and then does something else in O(n^2) time. Am I correct that the total complexity would be
O(n*log(n) + n^2)
= O(n*(log(n) + n))
= O(n^2)
since log(n) + n is dominated by the + n?
The statement is correct, as O(n log n) is a subset of O(n^2); however, a formal proof would consist out of choosing and constructing suitable constants.
If the call probability of both is equal then you are right. But if the probability of both is not equal you have to do an amortized analysis where you split rare expensive calls (n²) to many fast calls (n log(n)).
For quick sort for example (which generally takes n log(n), but rarly takes n²) you can proof that average running time is n log(n) because of amortized anlysis.
one of the rules of complexity analysis is that you must remove the terms with lower exponent or lower factors.
nlogn vs n^2 (divide both by n)
logn vs n
logn is smaller than n, than you can remove it from the complexity equation
so if the complexity is O(nlogn + n^2), when n is really big, the value of nlogn is not significant if compared to n^2, this is why you remove it and rewrite as O(n^2)

the smallest algorithm’s asymptotic time complexity as a function of n

we have known some of the algorithm’s asymptotic time complexity is a function of n such as
O(log* n), O(log n), O(log log n), O(n^c) with 0< c < 1, ....
May I know what is the smallest algorithm’s asymptotic time complexity as a function of n ?
Update 1 : we look for the asymptotic time complexity function with n. O(1) is the smallest, but it does not have n.
Update 2: O(1) is the smallest time complexity we can go, but what is the next smallest well-known functions with n ? so far as I research:
O(alpha (n)) : inverse Ackermann: Amortized time per operation using a disjoint set
or O(log * n)iterated logarithmic The find algorithm of Hopcroft and Ullman on a disjoint set
Apart from the trivial O(1), the answer is: there isn't one.
If something isn't O(1) (that is, with n -> infinity, computing time goes to infinity), whatever bounding function of n you find, there's always a smaller one: just take a logarithm of the bounding function. You can do this infinitely, hence there's no smallest non-constant bounding function.
However in practice you should probably stop worrying when you reach the inverse Ackermann function :)
It is not necessary that the complexity of a given algorithm be expressible via well-known functions. Also note that big-oh is not the complexity of a given algorithm. It is an upper bound of the complexity.
You can construct functions growing as slow as you want, for instance n1/k for any k.
O(1) is as low as you can go in terms of complexity and strictly speaking 1 is a valid function, it simply is constant.
EDIT: a really slow growing function that I can think of is the iterated logarithm as complexity of disjoint set forest implemented with both path compression and union by rank.
There is always a "smaller algorithm" that whatever suggested.
O(log log log log(n)) < O(log log log(n)) < O(log log (n)) < O(log(n)).
You can put as many log as you want. But I don't know if there is real life example of these.
So my answer is you will get closer and closer to O(1).

Which algorithm is faster O(N) or O(2N)?

Talking about Big O notations, if one algorithm time complexity is O(N) and other's is O(2N), which one is faster?
The definition of big O is:
O(f(n)) = { g | there exist N and c > 0 such that g(n) < c * f(n) for all n > N }
In English, O(f(n)) is the set of all functions that have an eventual growth rate less than or equal to that of f.
So O(n) = O(2n). Neither is "faster" than the other in terms of asymptotic complexity. They represent the same growth rates - namely, the "linear" growth rate.
Proof:
O(n) is a subset of O(2n): Let g be a function in O(n). Then there are N and c > 0 such that g(n) < c * n for all n > N. So g(n) < (c / 2) * 2n for all n > N. Thus g is in O(2n).
O(2n) is a subset of O(n): Let g be a function in O(2n). Then there are N and c > 0 such that g(n) < c * 2n for all n > N. So g(n) < 2c * n for all n > N. Thus g is in O(n).
Typically, when people refer to an asymptotic complexity ("big O"), they refer to the canonical forms. For example:
logarithmic: O(log n)
linear: O(n)
linearithmic: O(n log n)
quadratic: O(n2)
exponential: O(cn) for some fixed c > 1
(Here's a fuller list: Table of common time complexities)
So usually you would write O(n), not O(2n); O(n log n), not O(3 n log n + 15 n + 5 log n).
Timothy Shield's answer is absolutely correct, that O(n) and O(2n) refer to the same set of functions, and so one is not "faster" than the other. It's important to note, though, that faster isn't a great term to apply here.
Wikipedia's article on "Big O notation" uses the term "slower-growing" where you might have used "faster", which is better practice. These algorithms are defined by how they grow as n increases.
One could easily imagine a O(n^2) function that is faster than O(n) in practice, particularly when n is small or if the O(n) function requires a complex transformation. The notation indicates that for twice as much input, one can expect the O(n^2) function to take roughly 4 times as long as it had before, where the O(n) function would take roughly twice as long as it had before.
It depends on the constants hidden by the asymptotic notation. For example, an algorithm that takes 3n + 5 steps is in the class O(n). So is an algorithm that takes 2 + n/1000 steps. But 2n is less than 3n + 5 and more than 2 + n/1000...
It's a bit like asking if 5 is less than some unspecified number between 1 and 10. It depends on the unspecified number. Just knowing that an algorithm runs in O(n) steps is not enough information to decide if an algorithm that takes 2n steps will complete faster or not.
Actually, it's even worse than that: you're asking if some unspecified number between 1 and 10 is larger than some other unspecified number between 1 and 10. The sets you pick from being the same doesn't mean the numbers you happen to pick will be equal! O(n) and O(2n) are sets of algorithms, and because the definition of Big-O cancels out multiplicative factors they are the same set. Individual members of the sets may be faster or slower than other members, but the sets are the same.
Theoretically O(N) and O(2N) are the same.
But practically, O(N) will definitely have a shorter running time, but not significant. When N is large enough, the running time of both will be identical.
O(N) and O(2N) will show significant difference in growth for small numbers of N, But as N value increases O(N) will dominate the growth and coefficient 2 becomes insignificant. So we can say algorithm complexity as O(N).
Example:
Let's take this function
T(n) = 3n^2 + 8n + 2089
For n= 1 or 2, the constant 2089 seems to be the dominant part of function but for larger values of n, we can ignore the constants and 8n and can just concentrate on 3n^2 as it will contribute more to the growth, If the n value still increases the coefficient 3 also seems insignificant and we can say complexity is O(n^2).
For detailed explanation refer here
O(n) is faster however you need to understand that when we talk about Big O, we are measuring the complexity of a function/algorithm, not its speed. And we measure this complexity asymptotically. In lay man terms, when we talk about asymptotic analysis, we take immensely huge values for n. So if you plot the graph for O(n) and O(2n), the values will stay in some particular range from each other for any value of n. They are much closer compared to the other canonical forms like O(nlogn) or O(1), so by convention we approximate the complexity to the canonical form O(n).

Big O when adding together different routines

Lets say I have a routine that scans an entire list of n items 3 times, does a sort based on the size, and then bsearches that sorted list n times. The scans are O(n) time, the sort I will call O(n log(n)), and the n times bsearch is O(n log(n)). If we add all 3 together, does it just give us the worst case of the 3 - the n log(n) value(s) or does the semantics allow added times?
Pretty sure, now that I type this out that the answer is n log(n), but I might as well confirm now that I have it typed out :)
The sum is indeed the worst of the three for Big-O.
The reason is that your function's time complexity is
(n) => 3n + nlogn + nlogn
which is
(n) => 3n + 2nlogn
This function is bounded above by 3nlogn, so it is in O(n log n).
You can choose any constant. I just happened to choose 3 because it was a good asymptotic upper bound.
You are correct. When n gets really big, the n log(n) dominates 3n.
Yes it will just be the worst case since O-notation is just about asymptotic performance.
This should of course not be taken to mean that adding these extra steps will have no effect on your programs performance. One of the O(n) steps could easily consume a huge portion of your execution time for the given range of n where your program operates.
As Ray said, the answer is indeed O(n log(n)). The interesting part of this question is that it doesn't matter which way you look at it: does adding mean "actual addition" or does it mean "the worst case". Let's prove that these two ways of looking at it produce the same result.
Let f(n) and g(n) be functions, and without loss of generality suppose f is O(g). (Informally, that g is "worse" than f.) Then by definition, there exists constants M and k such that f(n) < M*g(n) whenever n > k. If we look at in the "worst case" way, we expect that f(n)+g(n) is O(g(n)). Now looking at it in the "actual addition" way, and specializing to the case where n > k, we have f(n) + g(n) < M*g(n) + g(n) = (M+1)*g(n), and so by definition f(n)+g(n) is O(g(n)) as desired.

Resources