What does the O in Big-O Notation mean? - algorithm

I‘m trying to wrap my head around the meaning of the Landau-Notation in the context of analysing an algorithm‘s complexity.
What exactly does the O formally mean in Big-O-Notation?
So the way I understand it is that O(g(x)) gives a set of functions which grow as rapidly or slower as g(x), meaning, for example in the case of O(n^2):
where t(x) could be, for instance, x + 3 or x^2 + 5. Is my understanding correct?
Furthermore, are the following notations correct?
I saw the following written down by a tutor. What does this mean? How can you use less or equal, if the O-Notation returns a set?
Could I also write something like this?

So the way I understand it is that O(g(x)) gives a set of functions which grow as rapidly or slower as g(x).
This explanation of Big-Oh notation is correct.
f(n) = n^2 + 5n - 2, f(n) is an element of O(n^2)
Yes, we can say that. O(n^2) in plain English, represents "set of all functions that grow as rapidly as or slower than n^2". So, f(n) satisfies that requirement.
O(n) is a subset of O(n^2), O(n^2) is a subset of O(2^n)
This notation is correct and it comes from the definition. Any function that is in O(n), is also in O(n^2) since growth rate of it is slower than n^2. 2^n is an exponential time complexity, whereas n^2 is polynomial. You can take limit of n^2 / 2^n as n goes to infinity and prove that O(n^2) is a subset of O(2^n) since 2^n grows bigger.
O(n) <= O(n^2) <= O(2^n)
This notation is tricky. As explained here, we don't have "less than or equal to" for sets. I think tutor meant that time complexity for the functions belonging to the set O(n) is less than (or equal to) the time complexity for the functions belonging to the set O(n^2). Anyways, this notation doesn't really seem familiar, and it's best to avoid such ambiguities in textbooks.

O(g(x)) gives a set of functions which grow as rapidly or slower as g(x)
That's technically right, but a bit imprecise. A better description contains the addenda
O(g(x)) gives the set of functions which are asymptotically bounded above by g(x), up to constant factors.
This may seem like a nitpick, but one inference from the imprecise definition is wrong.
The 'fixed version' of your first equation, if you make the variables match up and have one limit sign, seems to be:
This is incorrect: the ratio only has to be less than or equal to some fixed constant c > 0.
Here is the correct version:
where c is some fixed positive real number, that does not depend on n.
For example, f(x) = 3 (n^2) is in O(n^2): one constant c that works for this f is c = 4. Note that the requirement isn't 'for all c > 0', but rather 'for at least one constant c > 0'
The rest of your remarks are accurate. The <= signs in that expression are an unusual usage, but it's true if <= means set inclusion. I wouldn't worry about that expression's meaning.
There's other, more subtle reasons to talk about 'boundedness' rather than growth rates. For instance, consider the cosine function. |cos(x)| is in O(1), but its derivative fluctuates from negative one to positive one even as x increases to infinity.
If you take 'growth rate' to mean something like the derivative, example like these become tricky to talk about, but saying |cos(x)| is bounded by 2 is clear.
For an even better example, consider the logistic curve. The logistic function is O(1), however, its derivative and growth rate (on positive numbers) is positive. It is strictly increasing/always growing, while 1 has a growth rate of 0. This seems to conflict with the first definition without lots of additional clarifying remarks of what 'grow' means.
An always growing function in O(1) (image from the Wikipedia link):

Related

BigO Notation, understanding

I had seen in one of the videos (https://www.youtube.com/watch?v=A03oI0znAoc&t=470s) that, If suppose f(n)= 2n +3, then BigO is O(n).
Now my question is if I am a developer, and I was given O(n) as upperbound of f(n), then how I will understand, what exact value is the upper bound. Because in 2n +3, we remove 2 (as it is a constant) and 3 (because it is also a constant). So, if my function is f(n) where n = 1, I can't say g(n) is upperbound where n = 1.
1 cannot be upperbound for 1. I find hard understanding this.
I know it is a partial (and probably wrong answer)
From Wikipedia,
Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation.
In your example,
f(n) = 2n+3 has the same growth rate as f(n) = n
If you plot the functions, you will see that both functions have the same linear growth; and as n -> infinity, the difference between the 2 gets minimal.
In Big O notation, f(n) = 2n+3 when n=1 means nothing; you need to look at the trend, not discreet values.
As a developer, you will consider big-O as a first indication for deciding which algorithm to use. If you have an algorithm which is say, O(n^2), you will try to understand whether there is another one which is, say, O(n). If the problem is inherently O(n^2), then the big-O notation will not provide further help and you will need to use other criterion for your decision. However, if the problem is not inherently O(n^2), but O(n), you should discard any algorithm that happen to be O(n^2) and find an O(n) one.
So, the big-O notation will help you to better classify the problem and then try to solve it with an algorithm whose complexity has the same big-O. If you are lucky enough as to find 2 or more algorithms with this complexity, then you will need to ponder them using a different criterion.

Can't understand particalur Big-O example in textbook

So far understanding Big-O notation and how it's calculated is ok...most of the situations are easy to understand. However, I just came across this one problem that I cannot for the life of me figure out.
Directions: select the best big-O notation for the expression.
(n^2 + lg(n))(n-1) / (n + n^2)
The answer is O(n). That's all fine and dandy, but how is that rationalized given the n^3 factor in the numerator? n^3 isn't the best, but I thought there was like a "minimum" basis between f(n) <= O(g(n))?
The book has not explained any mathematical inner-workings, everything has sort of been injected into a possible solution (taking f(n) and generating a g(n) that's slightly greater than f(n)).
Kinda stumped. Go crazy on the math, or math referencing, if you must.
Also, given a piece of code, how does one determine the time units per line? How do you determine logarithmic times based off of a line of code (or multiple lines of code)? I understand that declaring and setting a variable is considered 1 unit of time, but when things get nasty, how would I approach a solution?
If you throw this algorithm into Wolfram Alpha, you get this generic result:
If you expand (FOIL) it, you get (roughly) a cubic function divided by a quadratic function. With Big-O, constants don't matter and the larger power wins, so you'd result with something like this:
The rest from here is mathematical induction. The overall algorithm grows in a linear-like fashion with respect to larger and larger values of n. It's not quite linear so we can't say it has a Big-Omega of (n), but it does come fairly reasonably close to O(n) due to the amortized constant growth rate.
Alternatively, you could annoy mathematicians everywhere and say, "Since this is based on Big-O rules, we can drop the factor of n from the denominator and thus result in O(n) by simple division." However, it's important in my mind to consider that this is still not quite linear.
Mind, this is a less-rigorous explanation than might be satisfactory for your class, but this gives you some math-based perspective on its runtime.
Non-rigorous answer:
Distributing the numerator product, we find that the numerator is n^3 + n log(n) - n^2 - log n.
We note that the numerator grows as n^3 for large n, and the denominator grows as n^2 for large n.
We interpret that as growth as n^{3 - 2} for large n, or O(n).

Why do we ignore co-efficients in Big O notation?

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the co-efficients?
For example this answer says that the final complexity of 2N + 2 is O(N); we remove the leading co-efficient 2 and the final constant 2 as well.
Removing the final constant of 2 perhaps understandable. After all, N may be very large and so "forgetting" the final 2 may only change the grand total by a small percentage.
However I cannot clearly understand how removing the leading co-efficient does not make difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?
The purpose of the Big-O notation is to find what is the dominant factor in the asymptotic behavior of a function as the value tends towards the infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x->infinity
if and only if there is a positive real M and a real x_0 such as
|f(x)| <= M|g(x)| for all x > x_0.
That's in wikipedia. What that actually means is that there is a point (after x_0) after which some multiple of g(x) dominates f(x). That definition acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x)+K = O(f(x)), f(x^n+x^n-1)=O(x^n), etc. It's just a matter of using the definition to prove those.
In special, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation take. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with it. So, for this purpose, isn't easy to reason about the difference between n and 2n.
From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.
Big O provides a good estimate of what algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 and an n^2 factor we ignore the n^2 factor, because even if the n^2 factor has a large constant it will eventually be dominated by the n^3 factor.
However, real algorithms incorporate more than simple Big O analysis, for example a sorting algorithm will often start with a O(n * log(n)) partitioning algorithm like quicksort or mergesort, and when the partitions become small enough the algorithm will switch to a simpler O(n^2) algorithm like insertionsort - for small inputs insertionsort is generally faster, although a basic Big O analysis doesn't reveal this.
The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors are smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants. Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever) and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' constant factors would hide this step.
An other thing is that, what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove 99N^2 portion even? Will it not make difference when let's say N is one miilion?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.
Big O notation is not an absolute measure of complexity.
Rather it is a designation of how complexity will change as the variable changes. In other words as N increases the complexity will increase
Big O(f(N)).
To explain why terms are not included we look at how fast the terms increase.
So, Big O(2n+2) has two terms 2n and 2. Looking at the rate of increase
Big O(2) this term will never increase it does not contribute to the rate of increase at all so it goes away. Also since 2n increases faster than 2, the 2 turns into noise as n gets very large.
Similarly Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 will contribute a larger nominal percentage than 2n^3. However if n gets very large, say 1000000, then 99n^2 although nominally large it is insignificant (close to 1 millionth) compared to the size of 2n^3.
As a consequence Big O(n^i) < Big O(n^(i+1)).
Coefficients are removed because of the mathematical definition of Big O.
To simplify the definition says Big O(f(n)) = Big O(f(cn)) for a constant c. This needs to be taken on faith because the reason for this is purely mathematical, and as such the proof would be too complex and dry to explain in simple terms.
The mathematical reason:
The real reason why we do this, is the way Big O-Notation is defined:
A series (or lets use the word function) f(n) is in O(g(n)) when the series f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 as n approaches Infinity. The term (2*n^2)/(n^2) doesn't become infinitely large (its always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinetely large, as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O Notation basically describes, that your function f(n) is (from some value of n on to infinity) smaller than a function g(n), multiplied by a constant. With the previous example:
2*n^2 is in O(n^2), because we can find a value C, so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, for example, and the condition will be satisfied.
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! If it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers for it and have it run in the same amount of time, the real problem here is, that it scales incredibly bad with input size.
For practical applications the constants does matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) If your algorithm should be great for any input, it should have a low time complexity, and 2) that n^3 grows so much faster than n^2, that perfering n^3 over n^2 almost never makes sense.

Meaning of Big O notation

Our teacher gave us the following definition of Big O notation:
O(f(n)): A function g(n) is in O(f(n)) (“big O of f(n)”) if there exist
constants c > 0 and N such that |g(n)| ≤ c |f(n)| for all n > N.
I'm trying to tease apart the various components of this definition. First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
Next, I'm confused by the overall second portion of the statement. Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
In this formulation, O(f(n)) is a set of functions. Thus O(N) is the set of all functions that are (in simple terms) proportional to N as N tends to infinity.
The word "in" means ... "is a member of the set".
Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
It is the definition. And besides you have neglected the c term in your synopsis, and that is an important part of the definition.
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
Your intuition is incorrect in two respects.
Firstly, the real definition of O(N^2) is NOT that it takes N^2 operations. it is that it takes proportional to N^2 operations. That's where the c comes into it.
Secondly, it is only proportional to N^2 for large enough values of N. Big O notation is not about what happens for small N. It is about what happens when the problem size scales up.
Also, as a a comment notes "proportional" is not quite the right phraseology here. It might be more correct to say "tends towards proportional" ... but in reality there isn't a simple english description of what is going on here. The real definition is the mathematical one.
If you now reread the definition in the light of that, you should see that it fits just nicely.
(Note that the definitions of Big O, and related measures of complexity can also be expressed in calculus terminology; i.e. using "limits". However, generally speaking the things we are talking about are quantized; i.e. an integer number instructions, an integer number bytes of storage, etc. Calculus is really about functions involving real numbers. Hence, you could argue that the formulation above is preferable. OTOH, a real mathematician would probably see bus-sized holes in this argumentation.)
O(g(n)) looks like a function, but it is actually a set of functions. If a function f is in O(g(n)), it means that g is an asymptotic upper bound on f to within a constant factor. O(g(n)) contains all functions that are bounded from above by g(n).
More specifically, there exists a constant c and n0 such that f(n) < c * g(n) for all n > n0. This means that c * g(n) will always overtake f(n) beyond some value of n. g is asymptotically larger than f; it scales faster.
This is used in the analysis of algorithms as follows. The running time of an algorithm is impossible to specify practically. It would obviously depend on the machine on which it runs. We need a way of talking about efficiency that is unconcerned with matters of hardware. One might naively suggest counting the steps executed by the algorithm and using that as the measure of running time, but this would depend on the granularity with which the algorithm is specified and so is no good either. Instead, we concern ourselves only with how quickly the running time (this hypothetical thing T(n)) scales with the size of the input n.
Thus, we can report the running time by saying something like:
My algorithm (algo1) has a running time T(n) that is in the set O(n^2). I.e. it's bounded from above by some constant multiple of n^2.
Some alternative algorithm (algo2) might have a time complexity of O(n), which we call linear. This may or may not be better for some particular input size or on some hardware, but there is one thing we can say for certain: as n tends to infinity, algo2 will out-perform algo1.
Practically then, one should favour algorithms with better time complexities, as they will tend to run faster.
This asymptotic notation may be applied to memory usage also.

Is the time complexity of the empty algorithm O(0)?

So given the following program:
Is the time complexity of this program O(0)? In other words, is 0 O(0)?
I thought answering this in a separate question would shed some light on this question.
EDIT: Lots of good answers here! We all agree that 0 is O(1). The question is, is 0 O(0) as well?
From Wikipedia:
A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function.
From this description, since the empty algorithm requires 0 time to execute, it has an upper bound performance of O(0). This means, it's also O(1), which happens to be a larger upper bound.
Edit:
More formally from CLR (1ed, pg 26):
For a given function g(n), we denote O(g(n)) the set of functions
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
The asymptotic time performance of the empty algorithm, executing in 0 time regardless of the input, is therefore a member of O(0).
Edit 2:
We all agree that 0 is O(1). The question is, is 0 O(0) as well?
Based on the definitions, I say yes.
Furthermore, I think there's a bit more significance to the question than many answers indicate. By itself the empty algorithm is probably meaningless. However, whenever a non-trivial algorithm is specified, the empty algorithm could be thought of as lying between consecutive steps of the algorithm being specified as well as before and after the algorithm steps. It's nice to know that "nothingness" does not impact the algorithm's asymptotic time performance.
Edit 3:
Adam Crume makes the following claim:
For any function f(x), f(x) is in O(f(x)).
Proof: let S be a subset of R and T be a subset of R* (the non-negative real numbers) and let f(x):S ->T and c ≥ 1. Then 0 ≤ f(x) ≤ f(x) which leads to 0 ≤ f(x) ≤ cf(x) for all x∈S. Therefore f(x) ∈ O(f(x)).
Specifically, if f(x) = 0 then f(x) ∈ O(0).
It takes the same amount of time to run regardless of the input, therefore it is O(1) by definition.
Several answers say that the complexity is O(1) because the time is a constant and the time is bounded by the product of some coefficient and 1. Well, it is true that the time is a constant and it is bounded that way, but that doesn't mean that the best answer is O(1).
Consider an algorithm that runs in linear time. It is ordinarily designated as O(n) but let's play devil's advocate. The time is bounded by the product of some coefficient and n^2. If we consider O(n^2) to be a set, the set of all algorithms whose complexity is small enough, then linear algorithms are in that set. But it doesn't mean that the best answer is O(n^2).
The empty algorithm is in O(n^2) and in O(n) and in O(1) and in O(0). I vote for O(0).
I have a very simple argument for the empty algorithm being O(0): For any function f(x), f(x) is in O(f(x)). Simply let f(x)=0, and we have that 0 (the runtime of the empty algorithm) is in O(0).
On a side note, I hate it when people write f(x) = O(g(x)), when it should be f(x) ∈ O(g(x)).
Big O is asymptotic notation. To use big O, you need a function - in other words, the expression must be parametrized by n, even if n is not used. It makes no sense to say that the number 5 is O(n), it's the constant function f(n) = 5 that is O(n).
So, to analyze time complexity in terms of big O you need a function of n. Your algorithm always makes arguably 0 steps, but without a varying parameter talking about asymptotic behaviour makes no sense. Assume that your algorithm is parametrized by n. Only now you may use asymptotic notation. It makes no sense to say that it is O(n2), or even O(1), if you don't specify what is n (or the variable hidden in O(1))!
As soon as you settle on the number of steps, it's a matter of the definition of big O: the function f(n) = 0 is O(0).
Since this is a low-level question it depends on the model of computation.
Under "idealistic" assumptions, it is possible you don't do anything.
But in Python, you cannot say def f(x):, but only def f(x): pass. If you assume that every instruction, even pass (NOP), takes time, then the complexity is f(n) = c for some constant c, and unless c != 0 you can only say that f is O(1), not O(0).
It's worth noting big O by itself does not have anything to do with algorithms. For example, you may say sin x = x + O(x3) when discussing Taylor expansion. Also, O(1) does not mean constant, it means bounded by constant.
All of the answers so far address the question as if there is a right and a wrong answer. But there isn't. The question is a matter of definition. Usually in complexity theory the time cost is an integer --- although that too is just a definition. You're free to say that the empty algorithm that quits immediately takes 0 time steps or 1 time step. It's an abstract question because time complexity is an abstract definition. In the real world, you don't even have time steps, you have continuous physical time; it may be true that one CPU has clock cycles, but a parallel computer could easily have asynchronoous clocks and in any case a clock cycle is extremely small.
That said, I would say that it's more reasonable to say that the halt operation takes 1 time step rather than that it takes 0 time steps. It does seem more realistic. For many situations it's arguably very conservative, because the overhead of initialization is typically far greater than executing one arithmetic or logical operation. Giving the empty algorithm 0 time steps would only be reasonable to model, for example, a function call that is deleted by an optimizing compiler that knows that the function won't do anything.
It should be O(1). The coefficient is always 1.
Consider:
If something grows like 5n, you don't say O(5n), you say O(n) [in other words, O(1n)]
If something grows like 7n^2, you don't say O(7n^2), you say O(n^2) [in other words, O(1n^2)]
Likewise you should say O(1), not O(some other constant)
There is no such thing as O(0). Even an oracle machine or a hypercomputer require the time for one operation, i.e. solve(the_goldbach_conjecture), ergo:
All machines, theoretical or real, finite or infinite produce algorithms with a minimum time complexity of O(1).
But then again, this code right here is O(0):
// Hello world!
:)
I would say it's O(1) by definition, but O(0) if you want to get technical about it: since O(k1g(n)) is equivalent to O(k2g(n)) for any constants k1 and k2, it follows that O(1 * 1) is equivalent to O(0 * 1), and therefore O(0) is equivalent to O(1).
However, the empty algorithm is not like, for example, the identity function, whose definition is something like "return your input". The empty algorithm is more like an empty statement, or whatever happens between two statements. Its definition is "do absolutely nothing with your input", presumably without even the implied overhead of simply having input.
Consequently, the complexity of the empty algorithm is unique in that O(0) has a complexity of zero times whatever function strikes your fancy, or simply zero. It follows that since the whole business is so wacky, and since O(0) doesn't already mean something useful, and since it's slightly ridiculous to even discuss such things, a reasonable special case for O(0) is something like this:
The complexity of the empty algorithm is O(0) in time and space. An algorithm with time complexity O(0) is equivalent to the empty algorithm.
So there you go.
Given the formal definition of Big O:
Let f(x) and g(x) be two functions defined over the set of real numbers. Then, we write:
f(x) = O(g(x)) as x approaches infinity iff there exists a real M and a real x0 so that:
|f(x)| <= M * |g(x)| for every x > x0
As I see it, if we substitute g(x) = 0 (in order to have a program with complexity O(0)), we must have:
|f(x)| <= 0, for every x > x0 (the constraint of existence of a real M and x0 is practically lifted here)
which can only be true when f(x) = 0.
So I would say that not only the empty program is O(0), but it is the only one for which that holds. Intuitively, this should've been true since O(1) encompasses all algorithms that require a constant number of steps regardless of the size of its task, including 0. It's essentially useless to talk about O(0); it's already in O(1). I suspect it's purely out of simplicity of definition that we use O(1), where it could as well be O(c) or something similar.
0 = O(f) for all function f, since 0 <= |f|, so it is also O(0).
Not only is this a perfectly sensible question, but it is important in certain situations involving amortized analysis, especially when "cost" means something other than "time" (for example, "atomic instructions").
Let's say there is a datastructure featuring multiple operation types, for which an amortized analysis is being conducted. It could well happen that one type of operation can always be funded fully using "coins" deposited during previous operations.
There is a simple example of this: the "multipop queue" described in Cormen, Leiserson, Rivest, Stein [CLRS09, 17.2, p. 457], and also on Wikipedia. Each time an item is pushed, a coin is put on the item, for a total amortized cost of 2. When (multi) pops occur, they can be fully paid for by taking one coin from each item popped, so the amortized cost of MULTIPOP(k) is O(0). To wit:
Note that the amortized cost of MULTIPOP is a constant (0)
...
Moreover, we can also charge MULTIPOP operations nothing. To pop the
first plate, we take the dollar of credit off the plate and use it to
pay the actual cost of a POP operation. To pop a second plate, we
again have a dollar of credit on the plate to pay for the POP
operation, and so on. Thus, we have always charged enough up front to
pay for MULTIPOP operations. In other words, since each plate on the
stack has 1 dollar of credit on it, and the stack always has a
nonnegative number of plates, we have ensured that the amount of
credit is always nonnegative.
Thus O(0) is an important "complexity class" for certain amortized operations.
O(1) means the algorithm's time complexity is always constant.
Let's say we have this algorithm (in C):
void doSomething(int[] n)
{
int x = n[0]; // This line is accessing an array position, so it is time consuming.
int y = n[1]; // Same here.
return x + y;
}
I am ignoring the fact that the array could have less than 2 positions, just to keep it simple.
If we count the 2 most expensive lines, we have a total time of 2.
2 = O(1), because:
2 <= c * 1, if c = 2, for every n > 1
If we have this code:
public void doNothing(){}
And we count it as having 0 expansive lines, there is no difference in saying it has O(0) O(1), or O(1000), because for every one of these functions, we can prove the same theorem.
Normally, if the algorithm takes a constant number of steps to complete, we say it has O(1) time complexity.
I guess this is just a convention, because you could use any constant number to represent the function inside the O().
No. It's O(c) by convention whenever you don't have dependence on input size, where c is any positive constant (typically 1 is used - O(1) = O(12.37)).

Resources