Is `log(n)` base 10? - algorithm

Still getting a grip on logarithms being the opposite of exponentials. (Would it also be correct to describe them as the inverse of exponentials?)
There are lots of great SO entries already on Big-O notation including O(log n) and QuickSort n(log n) specifically. Found some useful graphs as well.
In looking at Divide and Conquer algorithms, I'm coming across n log n, which I think is n multiplied by the value of log n. I often try concrete examples like 100 log 100, to help visualize what's going on in an abstract equation.
Just read that log n assumes base 10. Does n log n translate into:
"the number n multiplied by the amount 10 needs to be raised to the power of in order to equal the number n"?
So 100 log 100 equals 200 because 10 needs to be raised to the power of two to equal 100?
Does the base change as an algorithm iterates through a set? Does the base even matter if we're talking in abstractions anyway?

Yes, the base does change depending on the way it iterates, but it doesn't matter. As you might remember, changing the base of logarithms means multiplying them by a constant. Since you mentioned that you have read about Big-O notation, then you probably already know that constants do not make a difference (O(n) is the same as O(2n) or O(1000n)).
EDIT: to clarify something you said - "I'm coming across n log n, which I think is n multiplied by the value of log n". Yes, you are right. And if you want to know why it involves log n, then think of what algorithms like divide and conquer do - they split the input (in two halves or four quarters or ten tenths, depending on the algorithm) during each iteration. The question is "How many times can that input be split up before the algorithm ends?" So you look at the input and ask how many times you can divide it by 2, or by 4, or by 10, before the operation becomes meaningless - that is, before you are down to a single element. Now you can give yourself concrete examples, starting with easy stuff like "How many times can 8 be divided by 2?" or "How many times can 1000 be divided by 10?"
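Those concrete examples are easy to automate. Here's a tiny JavaScript sketch (the function name is my own invention) that counts how many times an input can be divided by a given base before it shrinks to a single element - which is exactly what log base b of n measures:

// Count how many times n can be divided by `base` before reaching 1.
// For n >= 1 this is floor(log_base(n)).
function countDivisions(n, base) {
  let steps = 0;
  while (n > 1) {
    n = Math.floor(n / base);
    steps++;
  }
  return steps;
}

console.log(countDivisions(8, 2));     // 3, since 8 -> 4 -> 2 -> 1
console.log(countDivisions(1000, 10)); // 3, since 1000 -> 100 -> 10 -> 1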

You don't need to worry about the base - if you're dealing with algorithmic complexity, it doesn't matter which base you're in, because the difference is just a constant factor.
Fundamentally, you just need to know that log n means that as n increases exponentially, the running time (or space used) increases linearly. For example, if n=10 takes 1 minute, then n=100 would take 2 minutes, and n=1000 would take 3 minutes - roughly. (It's usually in terms of upper bounds, with smaller factors ignored... but that's the general gist of it.)
n log n is just that log n multiplied by n - so the time or space taken increases "a bit faster than linearly", basically.

The base does not matter at all. In fact people tend to drop parts of the expression, e.g. if one has O(n^4 + 2*n), this is often reduced to O(n^4). Only the most significant term needs to be considered when comparing algorithms.
For the case of comparing two closely related algorithms, say with exact costs n^4 + 2*n against n^4 + 3*n, one needs to keep the linear term in order to preserve the relevant information - in Big-O terms both are simply O(n^4).
Consider a divide and conquer approach based on bisection: your base is 2, so you may talk about ld(n) (the logarithm to base 2). On the other hand you use the O-notation to compare different algorithms by means of the same base. This being said, the difference between ld, ln, and log10 is just a constant factor.

Logarithms and exponents are inverse operations.
If
x^n = y
then
log_x(y) = n
For example,
10^3 = 1000
log_10(1000) = 3
Divide and conquer algorithms work by dividing the problem into parts that are then solved as independent problems. There can also be a combination step that combines the parts. Most divide and conquer algorithms are base 2, which means they cut the problem in half each time. For example, Binary Search works like searching a phone book for a name: you flip to the middle and ask, is the name I'm looking for in the first half or the last half (before or after what you flipped to)? Then repeat. Every time you do this you divide the problem's size by 2. Therefore it's base 2, not base 10.
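Here's what that halving looks like in code - a minimal binary search sketch in JavaScript (my own illustration, not from the original answer) that also counts how many times it halves the range:

// Binary search over a sorted array; returns the index of target (or -1)
// plus `steps`, the number of times the search space was halved.
function binarySearch(sorted, target) {
  let lo = 0, hi = sorted.length - 1, steps = 0;
  while (lo <= hi) {
    steps++;
    const mid = Math.floor((lo + hi) / 2);
    if (sorted[mid] === target) return { index: mid, steps };
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { index: -1, steps };
}

const arr = Array.from({ length: 1024 }, (_, i) => i * 2); // 1024 sorted numbers
console.log(binarySearch(arr, 2047)); // worst case: 11 steps, about log2(1024) + 1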
Order notation is primarily only concerned with the "order" of the runtime because that is what is most important when trying to determine if a problem will be tractable (solvable in a reasonable amount of time).
Examples of different orders would be:
O(1)
O(n)
O(n * log n)
O(n^2 * log n)
O(n^3.5)
O(n!)
etc.
The O here stands for "big O notation", which provides an upper bound on the growth rate of the function. Because we only care about the growth of the function for large inputs, we typically ignore lower order terms. For example
n^3 + 2 n^2 + 100 n
would be
O(n^3)
because n^3 is the largest order term it will dominate the growth function for large values of N.
When you see O(n * log n), people are just abbreviating... if you understand the algorithm it is typically log base 2, because most algorithms cut the problem in half. However, it could be log base 3 if, for example, the algorithm cut the problem into thirds.
Note:
In either case, if you were to graph the growth function, it would appear as a logarithmic curve. And a log3 n algorithm would take fewer steps than a log2 n one (they differ by a constant factor), even though both are O(log n).
The reason you do not see O(log10 n) or O(log3 n) etc. is that it just isn't that common for an algorithm to work better this way. In our phone book example you could split the pages into 3 separate thirds and compare in between 1-2 and 2-3. But then you have made 2 comparisons and only learned which 1/3 the name was in, whereas 2 binary splits would have told you which 1/4 it was in, which is more efficient.
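You can check that arithmetic directly (this is my own back-of-the-envelope calculation, not part of the original answer): a ternary split costs 2 comparisons per step but only needs log3(n) steps, and since 2/log2(3) ≈ 1.26, ternary search ends up doing roughly 26% more comparisons than binary search:

// Compare total comparisons: binary split (1 comparison per halving)
// vs. ternary split (2 comparisons per third-ing).
function comparisons(n) {
  const binary = Math.ceil(Math.log2(n));                    // 1 * log2(n)
  const ternary = Math.ceil(2 * Math.log(n) / Math.log(3));  // 2 * log3(n)
  return { binary, ternary };
}

console.log(comparisons(1e6)); // { binary: 20, ternary: 26 }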

In most programming languages I know, the function log() is the natural logarithm, base e = 2.718281....
In mathematical books sometimes it means base "ten" and sometimes base "e".
As other answers pointed out, for big-O notation the base does not matter, because, for any base x, the complexity O(log_x(n)) is the same as O(ln(n)) (here log_x means "logarithm in base x" and ln() means "logarithm in base e").
Finally, it's common that, in the analysis of several algorithms, it's more convenient to consider that log() is indeed the "logarithm in base 2" (I've seen some texts taking this approach). This is obviously related to the binary representation of numbers in computers.
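The change-of-base rule is easy to verify numerically. This small sketch (my own) shows that the ratio between two logs of the same number is a fixed constant, which is exactly the factor Big-O absorbs:

// log_b(n) = ln(n) / ln(b), so log2(n) / log10(n) is the same constant
// (ln(10)/ln(2), about 3.32) no matter what n is.
for (const n of [10, 1000, 1e6, 1e9]) {
  console.log(n, (Math.log2(n) / Math.log10(n)).toFixed(4)); // always ~3.3219
}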

Related

Shouldn't the time complexity of binary search be O(ceil(logn))?

This seems like a common question whose answer should be easy to find, but it isn't: I cannot find an answer anywhere on the Internet. I've never seen anyone ask whether the time complexity might be O(ceil(logn)), and I cannot figure it out myself, so I decided to ask a question here.
First, suppose I have a sorted array containing n numbers, and I want to search for a value in it using the binary search algorithm. The number of steps required in the worst case is listed below:
n  | steps
---+------
1  | 1
2  | 2
3  | 2
4  | 3
5  | 3
6  | 3
7  | 3
8  | 4
9  | 4
10 | 4
As you can see, the steps required for an array of n numbers are ceil(log2(n)) (where ceil(log2(n)) denotes the smallest integer that is greater than or equal to log2(n)). So I think the time complexity of binary search should be O(ceil(logn)), but according to Wikipedia, the time complexity should be O(logn). Why? Is there something wrong?
As I have already explained in two other answers (see here and here), the Big-O notation is not what most people think it is. It neither tells you anything about the speed of an algorithm nor anything about the number of processing steps.
The only thing Big-O tells you is how the processing time of an algorithm will change as the number of input elements changes. Does it stay constant? Does it rise linearly? Does it rise logarithmically? Does it rise quadratically? This is the only thing that Big-O answers.
Thus O(5) is the same as O(1000000) as both simply mean constant, which is typically written as O(1). And O(n + 100000) is the same as O(5n + 8) as both simply mean linear, which is typically written as O(n).
I know that many people will say "Yes, but O(5n) is steeper than O(2n)" and this is absolutely correct, but still they are both linear, and Big-O is not about comparing two functions of linear complexity with each other but about categorizing functions into coarse categories. People just get confused by the fact that these categories are named after mathematical functions, so they believe any function may make sense to be used for Big-O notation, but that isn't the case. Only functions with different characteristics get their own Big-O notation.
The following overview is nowhere near complete but in practice mainly the following Big-O notations are relevant:
O(1) - constant
O(log log n) - double logarithmic
O(log n) - logarithmic
O((log n)^c), c > 1 - polylogarithmic
O(n^c), 0 < c < 1 - fractional power
O(n) - linear
O(n log n) = O(log n!) - linearithmic
O(n^2) - quadratic
O(n^c), c > 1 - polynomial
O(c^n), c > 1 - exponential
O(n!) - factorial
Instead of writing these as functions one could also have given each of them just a name, but writing them as functions has two advantages: people with some math background will immediately have the image of a graph in their head, and it's easy to introduce new types without coming up with fancy names, as long as you can mathematically describe their graphs.
O(ceil(log n)) and O(log n) both represent the same asymptotic complexity (logarithmic complexity).
Or loosely put : O(ceil(log n)) = O(log n)
What you're seeing here is the difference between two different ways of quantifying how much work an algorithm does. The first approach is to count up exactly how many times something happens, yielding a precise expression for the answer. For example, in this case you're noting that the number of rounds of binary search is ⌈lg (n+1)⌉. This precisely counts the number of rounds done in binary search in the worst case, and if you need to know that exact figure, this is a good piece of information to have.
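As a sanity check, here is a small JavaScript sketch (my own, with illustrative names) that simulates the worst case of binary search and compares the measured step count against ⌈lg (n+1)⌉ - it reproduces the question's table exactly:

// Worst-case binary search steps on n elements: always descend into the
// larger half, as if the target were never found.
function worstCaseSteps(n) {
  let lo = 0, hi = n - 1, steps = 0;
  while (lo <= hi) {
    steps++;
    lo = Math.floor((lo + hi) / 2) + 1; // discard the left (smaller) half
  }
  return steps;
}

for (let n = 1; n <= 10; n++) {
  // Both columns agree: 1, 2, 2, 3, 3, 3, 3, 4, 4, 4
  console.log(n, worstCaseSteps(n), Math.ceil(Math.log2(n + 1)));
}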
However, in many cases we aren't interested in the exact amount of work that an algorithm does and are instead more interested in how that algorithm is going to scale as the inputs get larger. For example, let's suppose that we run binary search on an array of size 10^3 and find that it takes 10μs to run. How long should we expect it to take to run on an input of size 10^6? We can work out the number of rounds that binary search will perform here in the worst case by plugging n = 10^3 into our formula (11), and we could then try to work out how long we think each of those iterations is going to take (10μs / 11 rounds ≈ 0.91 μs / round), and then work out how many rounds we'll have with n = 10^6 (21) and multiply the number of rounds by the cost per round to get an estimate of about 19.1μs.
That's kind of a lot of work. Is there an easier way to do this?
That's where big-O notation comes in. When we say that the cost of binary search in the worst case is O(log n), we are not saying "the exact amount of work that binary search does is given by the expression log n." Instead, what we're saying is "the runtime of binary search, in the worst case, scales the same way that the function log n scales." Through properties of logarithms, we can see that log n^2 = 2 log n, so if you square the size of the input to a function that grows logarithmically, it's reasonable to guess that the output of that function will roughly double.
In our case, if we know that the runtime of binary search on an input of size 10^3 is 10μs and we're curious to learn what the runtime will be on an input of size 10^6 = (10^3)^2, we can make a pretty reasonable guess that it'll be about 20μs because we've squared the input. And if you look at the math I did above, that's really close to the figure we got by doing a more detailed analysis.
So in that sense, saying "the runtime of binary search is O(log n)" is a nice shorthand for saying "whatever the exact formula for the runtime of the function is, it'll scale logarithmically with the size of the input." That allows us to extrapolate from our existing data points without having to work out the leading coefficients or anything like that.
This is especially useful when you factor in that even though binary search does exactly ⌈lg (n+1)⌉ rounds, that expression doesn't exactly capture the full cost of performing the binary search. Each of those rounds might take a slightly different amount of time depending on how the comparison goes, there's probably some setup work that adds a constant term, and the cost of each round in terms of real compute time can't be determined purely from the code itself. That's why we often use big-O notation when quantifying how fast algorithms run - it lets us capture what often ends up being the most salient detail (how things will scale) without having to pin down all the mathematical particulars.
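To make the extrapolation concrete, here's a rough sketch (my own, under the assumption that the hidden constant factor stays fixed) of the estimate described above:

// If T(n) ≈ C * log2(n), one measurement pins down C, and squaring the
// input should roughly double the runtime. Assumed data point: 10 μs at n = 10^3.
function estimateLogRuntime(knownN, knownTime, targetN) {
  const c = knownTime / Math.log2(knownN); // solve T = C * log2(n) for C
  return c * Math.log2(targetN);
}

console.log(estimateLogRuntime(1e3, 10, 1e6).toFixed(1)); // 20.0 μs - double, as predicted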

How to calculate O(log n) in big O notation?

I know that O(log n) refers to an iterative reduction of the problem set N by a fixed ratio (in big O notation), but how do I actually calculate how many iterations an algorithm with log N complexity would have to perform on the problem set N before it is done (has one element left)?
You can't. You don't calculate the exact number of iterations with Big-O.
You can "derive" Big-O when you have an exact formula for the number of iterations.
Big-O just gives information about how the number of iterations grows with growing N, and only for "big" N.
Nothing more, nothing less. With this you can draw conclusions about how many more operations/how much more time the algorithm will take if you have some sample runs.
Expressed in the words of Tim Roughgarden at his courses on algorithms:
The big-Oh notation tries to provide a sweet spot for high level algorithm reasoning
That means it is intended to describe the relation between the algorithm time execution and the size of its input avoiding dependencies on the system architecture, programming language or chosen compiler.
Imagine that big-Oh notation could provide the exact execution time; that would mean that for any algorithm for which you know its big-Oh time complexity function, you could predict how it would behave on any machine whatsoever.
On the other hand, it is centered on asymptotic behaviour. That is, its description is more accurate for big n values (which is why lower order terms of your algorithm's time function are ignored in big-Oh notation). It can be reasoned that low n values do not demand that you push forward trying to improve your algorithm's performance.
Big O notation only shows an order of magnitude - not the actual number of operations that the algorithm would perform. If you need to calculate the exact number of loop iterations or elementary operations, you have to do it by hand. However, for most practical purposes the exact number is irrelevant - O(log n) tells you that the number of operations will rise logarithmically as n rises.
From big O notation you can't tell precisely how many iterations the algorithm will do; it's just an estimate. For small numbers the difference between log(n) and the actual number of iterations can be significant, but the closer you get to infinity, the less significant that difference becomes.
If you make some assumptions, you can estimate the time up to a constant factor. The big assumption is that the limiting behavior as the size tends to infinity is the same as the actual behavior for the problem sizes you care about.
Under that assumption, the upper bound on the time for a size N problem is C*log(N) for some constant C. The constant will change depending on the base you use for calculating the logarithm. The base does not matter as long as you are consistent about it. If you have the measured time for one size, you can estimate C and use that to guesstimate the time for a different size.
For example, suppose a size 100 problem takes 20 seconds. Using common logarithms, C is 10. (The common log of 100 is 2). That suggests a size 1000 problem might take about 30 seconds, because the common log of 1000 is 3.
However, this is very rough. The approach is most useful for estimating whether an algorithm might be usable for a large problem. In that sort of situation, you also have to pay attention to memory size. Generally, setting up a problem will be at least linear in size, so its cost will grow faster than an O(log N) operation.
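In code, the guesstimate from the answer above looks like this (a sketch using the example's measured point and common logarithms):

// T(N) ≈ C * log10(N). One measurement (size 100 takes 20 s) gives C = 10,
// which predicts about 30 s for size 1000.
const C = 20 / Math.log10(100);    // C = 10
console.log(C * Math.log10(1000)); // 30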

base of logarithms in time-complexity algorithms

What is the base of logarithm on all time-complexity algorithms? Is it base 10 or base e?
When we say that the average sorting complexity is O(n log n). Is the base of log n 10 or e?
In Computer Science, it's often base 2. This is because many divide and conquer algorithms that exhibit this kind of complexity are dividing the problem in two at each step.
Binary search is a classic example. At each step, we divide the array into two and only recursively search in one of the halves, until you reach a base case of a subarray of one element (or zero elements). When dividing an array of length n in two, the total number of divisions before reaching a one element array is log2(n).
This is often simplified because logarithms of different bases differ only by a constant factor, which makes them effectively equivalent in algorithm analysis.
In Big-O complexity analysis, it doesn't actually matter what the logarithm base is. (they are asymptotically the same, i.e. they differ by only a constant factor):
O(log2 N) = O(log10 N) = O(loge N)
Most of the time when mathematicians talk about logs, they implicitly mean to base e. Computer Scientists would tend to favour base 2, but it doesn't actually matter.
In terms of Big-O, the base doesn't matter because the change of base formula implies that it's only a constant factor difference.
However sometimes it's useful to count approximately how many operations an algorithm performs. In this case, most of the times it's base 2 due to the nature of most divide and conquer algorithms.
There's some explanation on this site.
Basically, logarithms of base 10 or base 2 or base e can be transformed to any other base by multiplying by a constant. So the base of the log doesn't matter.
The key thing to note is that log2(n) grows slowly. Doubling n has a relatively small effect. Logarithmic curves flatten out nicely.
It varies depending on the problem and for the most part is irrelevant. However, in many cases it is base 2; for example, binary search is log2(n). You can read about it in more detail here.

Trying to understand Big-oh notation

Hi, I would really appreciate some help with Big-O notation. I have an exam in it tomorrow, and while I can define what "f(x) is O(g(x))" means, I can't say I thoroughly understand it.
The following question ALWAYS comes up on the exam and I really need to try and figure it out. The first part seems easy (I think) - do you just pick a value for n, compute them all on a calculator and put them in order? That seems too easy though, so I'm not sure. I'm finding it very hard to find examples online.
From lowest to highest, what is the correct order of the complexities O(n^2), O(log2 n), O(1), O(2n), O(n!), O(n log2 n)?
What is the worst-case computational complexity of the Binary Search algorithm on an ordered list of length n = 2^k?
That guy should help you.
From lowest to highest, what is the correct order of the complexities O(n^2), O(log2 n), O(1), O(2n), O(n!), O(n log2 n)?
The order is the same as if you compare their limits at infinity: look at lim(a/b); if it is a finite positive constant, they are the same order, while infinity or 0 means one of them grows faster than the other.
What is the worst-case computational complexity of the Binary Search algorithm on an ordered list of length n = 2^k?
Find binary search best/worst Big-O.
Find linked list access by index best/worst Big-O.
Make conclusions.
Hey there. Big-O notation is tough to figure out if you don't really understand what the "n" means. You've already seen people talking about how O(n) == O(2n), so I'll try to explain exactly why that is.
When we describe an algorithm as having "order-n space complexity", we mean that the size of the storage space used by the algorithm gets larger with a linear relationship to the size of the problem that it's working on (referred to as n.) If we have an algorithm that, say, sorted an array, and in order to do that sort operation the largest thing we did in memory was to create an exact copy of that array, we'd say that had "order-n space complexity" because as the size of the array (call it n elements) got larger, the algorithm would take up more space in order to match the input of the array. Hence, the algorithm uses "O(n)" space in memory.
Why does O(2n) = O(n)? Because when we talk in terms of O(n), we're only concerned with the behavior of the algorithm as n gets as large as it could possibly be. If n was to become infinite, the O(2n) algorithm would take up two times infinity spaces of memory, and the O(n) algorithm would take up one times infinity spaces of memory. Since two times infinity is just infinity, both algorithms are considered to take up a similar-enough amount of room to be both called O(n) algorithms.
You're probably thinking to yourself "An algorithm that takes up twice as much space as another algorithm is still relatively inefficient. Why are they referred to using the same notation when one is much more efficient?" Because the gain in efficiency for arbitrarily large n when going from O(2n) to O(n) is absolutely dwarfed by the gain in efficiency for arbitrarily large n when going from O(n^2) to O(500n). When n is 10, n^2 is 10 times 10 or 100, and 500n is 500 times 10, or 5000. But we're interested in n as n becomes as large as possible. They cross over and become equal for an n of 500, but once more, we're not even interested in an n as small as 500. When n is 1000, n^2 is one MILLION while 500n is a "mere" half million. When n is one million, n^2 is one thousand billion - 1,000,000,000,000 - while 500n looks on in awe at the simplicity of its five-hundred-million - 500,000,000 - points of complexity. And once more, we can keep making n larger, because when using O(n) logic, we're only concerned with the largest possible n.
(You may argue that when n reaches infinity, n^2 is infinity times infinity, while 500n is five hundred times infinity, and didn't you just say that anything times infinity is infinity? That doesn't actually work for infinity times infinity. I think. It just doesn't. Can a mathematician back me up on this?)
This gives us the weirdly counterintuitive result where O(Seventy-five hundred billion spillion kajillion n) is considered an improvement on O(n * log n). Due to the fact that we're working with arbitrarily large "n", all that matters is how many times and where n appears in the O(). The rules of thumb mentioned in Julia Hayward's post will help you out, but here's some additional information to give you a hand.
One, because n gets as big as possible, O(n^2+61n+1682) = O(n^2), because the n^2 contributes so much more than the 61n as n gets arbitrarily large that the 61n is simply ignored, and the 61n term already dominates the 1682 term. If you see addition inside a O(), only concern yourself with the n with the highest degree.
Two, O(log10 n) = O(log_b n) for any base b, because log10(n) = log_b(n)/log_b(10). Hence O(log10 n) = O(log_b(n) * 1/log_b(10)). That 1/log_b(10) figure is a constant, and we've already shown that constants drop out of O(n) notation.
Very loosely, you could imagine picking extremely large values of n, and calculating them. Might exceed your calculator's range for large factorials, though.
If the definition isn't clear, a more intuitive description is that "higher order" means "grows faster than, as n grows". Some rules of thumb:
O(n^a) is a higher order than O(n^b) if a > b.
log(n) grows more slowly than any positive power of n
exp(n) grows more quickly than any power of n
n! grows more quickly than exp(kn)
Oh, and as far as complexity goes, ignore the constant multipliers.
That's enough to deduce that the correct order is O(1), O(log n), O(2n) = O(n), O(n log n), O(n^2), O(n!)
For big-O complexities, the rule is that if two things vary only by constant factors, then they are the same. If one grows faster than another ignoring constant factors, then it is bigger.
So O(2n) and O(n) are the same -- they only vary by a constant factor (2). One way to think about it is to just drop the constants, since they don't impact the complexity.
The other problem with picking n and using a calculator is that it will give you the wrong answer for certain n. Big O is a measure of how fast something grows as n increases, but at any given n the complexities might not be in the right order. For instance, at n=2, n^2 is 4 and n! is 2, but n! grows quite a bit faster than n^2.
It's important to get that right, because for running times with multiple terms you can drop the lesser terms - i.e., if f(n) is 3n^2+2n+5, you can drop the 5 (constant), drop the 2n (3n^2 grows faster), then drop the 3 (constant factor) to get O(n^2)... but if you don't know that n^2 is bigger, you won't get the right answer.
In practice, you can just know that n is linear, log(n) grows more slowly than linear, n^a > n^b if a>b, 2^n is faster than any n^a, and n! is even faster than that. (Hint: try to avoid algorithms that have n in the exponent, and especially avoid ones that are n!.)
For the second part of your question, what happens with a binary search in the worst case? At each step, you cut the space in half until eventually you find your item (or run out of places to look). For a list of length n = 2^k that is log2(2^k) = k steps. A search where you just walk through the list to find your item would take n steps. And we know from the first part that O(log(n)) < O(n), which is why binary search is faster than just a linear search.
Good luck with the exam!
In easy to understand terms the Big-O notation defines how quickly a particular function grows. Although it has its roots in pure mathematics its most popular application is the analysis of algorithms which can be analyzed on the basis of input size to determine the approximate number of operations that must be performed.
The benefit of using the notation is that you can categorize function growth rates by their complexity. Many different functions (an infinite number really) could all be expressed with the same complexity using this notation. For example, n+5, 2*n, and 4*n + 1/n all have O(n) complexity because the function g(n)=n most simply represents how these functions grow.
I put an emphasis on most simply because the focus of the notation is on the dominating term of the function. For example, O(2*n + 5) = O(2*n) = O(n) because n is the dominating term in the growth. This is because the notation assumes that n goes to infinity which causes the remaining terms to play less of a role in the growth rate. And, by convention, any constants or multiplicatives are omitted.
Read Big O notation and Time complexity for a more in-depth overview.
See this, and look up solutions here; here is the first one.

Big-O for Eight Year Olds? [duplicate]

I'm asking more about what this means to my code. I understand the concepts mathematically, I just have a hard time wrapping my head around what they mean conceptually. For example, if one were to perform an O(1) operation on a data structure, I understand that the number of operations it has to perform won't grow because there are more items. And an O(n) operation would mean that you would perform a set of operations on each element. Could somebody fill in the blanks here?
Like what exactly would an O(n^2) operation do?
And what the heck does it mean if an operation is O(n log(n))?
And does somebody have to smoke crack to write an O(x!)?
One way of thinking about it is this:
O(N^2) means for every element, you're doing something with every other element, such as comparing them. Bubble sort is an example of this.
O(N log N) means for every element, you're doing something that only needs to look at log N of the elements. This is usually because you know something about the elements that let you make an efficient choice. Most efficient sorts are an example of this, such as merge sort.
O(N!) means to do something for all possible permutations of the N elements. Traveling salesman is an example of this, where there are N! ways to visit the nodes, and the brute force solution is to look at the total cost of every possible permutation to find the optimal one.
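For the O(N!) case, here's a brute-force traveling salesman sketch in JavaScript (my own illustration; the distance matrix is made up) that literally tries every permutation of cities:

// Brute-force traveling salesman: try every permutation of cities (O(n!)).
// `dist` is an assumed symmetric distance matrix.
function shortestTour(dist) {
  const n = dist.length;
  let best = Infinity;
  const visit = (tour, remaining, cost) => {
    if (remaining.length === 0) {
      // Close the loop back to the starting city and record the total.
      best = Math.min(best, cost + dist[tour[tour.length - 1]][tour[0]]);
      return;
    }
    for (const city of remaining) {
      visit(
        [...tour, city],
        remaining.filter(c => c !== city),
        cost + dist[tour[tour.length - 1]][city]
      );
    }
  };
  visit([0], [...Array(n).keys()].slice(1), 0); // fix city 0 as the start
  return best;
}

const dist = [
  [0, 2, 9, 10],
  [2, 0, 6, 4],
  [9, 6, 0, 8],
  [10, 4, 8, 0],
];
console.log(shortestTour(dist)); // 23 for this 4-city example

Even fixing the start city, this explores (n-1)! tours, which is why it stops being feasible somewhere around a dozen cities.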
The big thing that Big-O notation means to your code is how it will scale when you double the amount of "things" it operates on. Here's a concrete example:
Big-O       | computations for 10 things | computations for 100 things
------------+----------------------------+-----------------------------
O(1)        | 1                          | 1
O(log(n))   | 3                          | 7
O(n)        | 10                         | 100
O(n log(n)) | 30                         | 700
O(n^2)      | 100                        | 10000
So take quicksort which is O(n log(n)) vs bubble sort which is O(n^2). When sorting 10 things, quicksort is 3 times faster than bubble sort. But when sorting 100 things, it's 14 times faster! Clearly picking the fastest algorithm is important then. When you get to databases with million rows, it can mean the difference between your query executing in 0.2 seconds, versus taking hours.
Another thing to consider is that a bad algorithm is one thing that Moore's law cannot help. For example, if you've got some scientific calculation that's O(n^3) and it can compute 100 things a day, doubling the processor speed only gets you 125 things in a day. However, knock that calculation to O(n^2) and you're doing 1000 things a day.
clarification:
Actually, Big-O says nothing about comparative performance of different algorithms at the same specific size point, but rather about comparative performance of the same algorithm at different size points:
            | computations  | computations   | computations
Big-O       | for 10 things | for 100 things | for 1000 things
------------+---------------+----------------+-----------------
O(1)        | 1             | 1              | 1
O(log(n))   | 1             | 3              | 7
O(n)        | 1             | 10             | 100
O(n log(n)) | 1             | 33             | 664
O(n^2)      | 1             | 100            | 10000
You might find it useful to visualize it with a graph of these functions.
Also, on a LogY/LogX scale the functions n^(1/2), n, n^2 all look like straight lines, while on a LogY/X scale 2^n, e^n, 10^n are straight lines and n! is linearithmic (it looks like n log n).
This might be too mathematical, but here's my try. (I am a mathematician.)
If something is O(f(n)), then its running time on n elements will be equal to A f(n) + B (measured in, say, clock cycles or CPU operations). It's key to understand that you also have these constants A and B, which arise from the specific implementation. B represents essentially the "constant overhead" of your operation, for example some preprocessing that you do that doesn't depend on the size of the collection. A represents the speed of your actual item-processing algorithm.
The key, though, is that you use big O notation to figure out how well something will scale. So those constants won't really matter: if you're trying to figure out how to scale from 10 to 10000 items, who cares about the constant overhead B? Similarly, other concerns (see below) will certainly outweigh the weight of the multiplicative constant A.
So the real deal is f(n). If f grows not at all with n, e.g. f(n) = 1, then you'll scale fantastically---your running time will always just be A + B. If f grows linearly with n, i.e. f(n) = n, your running time will scale pretty much as best as can be expected---if your users are waiting 10 ns for 10 elements, they'll wait 10000 ns for 10000 elements (ignoring the additive constant). But if it grows faster, like n^2, then you're in trouble; things will start slowing down way too much when you get larger collections. f(n) = n log(n) is a good compromise, usually: your operation can't be so simple as to give linear scaling, but you've managed to cut things down such that it'll scale much better than f(n) = n^2.
Practically, here are some good examples:
O(1): retrieving an element from an array. We know exactly where it is in memory, so we just go get it. It doesn't matter if the collection has 10 items or 10000; it's still at index (say) 3, so we just jump to location 3 in memory.
O(n): retrieving an element from a linked list. Here, A = 0.5, because on average you'll have to go through half of the linked list before you find the element you're looking for.
O(n^2): various "dumb" sorting algorithms. Because generally their strategy involves, for each element (n), looking at all the other elements (so times another n, giving n^2), then positioning yourself in the right place.
O(n log(n)): various "smart" sorting algorithms. It turns out that you only need to look at, say, 10 elements in a 10^10-element collection to intelligently sort yourself relative to everyone else in the collection. Because everyone else is also going to look at 10 elements, and the emergent behavior is orchestrated just right so that this is enough to produce a sorted list.
O(n!): an algorithm that "tries everything," since there are (proportional to) n! possible combinations of n elements that might solve a given problem. So it just loops through all such combinations, tries them, then stops whenever it succeeds.
don.neufeld's answer is very good, but I'd probably explain it in two parts: first, there's a rough hierarchy of O()'s that most algorithms fall into. Then, you can look at each of those to come up with sketches of what typical algorithms of that time complexity do.
For practical purposes, the only O()'s that ever seem to matter are:
O(1) "constant time" - the time required is independent of the size of the input. As a rough category, I would include algorithms such as hash lookups and Union-Find here, even though neither of those are actually O(1).
O(log(n)) "logarithmic" - it gets slower as you get larger inputs, but once your input gets fairly large, it won't change enough to worry about. If your runtime is ok with reasonably-sized data, you can swamp it with as much additional data as you want and it'll still be ok.
O(n) "linear" - the more input, the longer it takes, in an even tradeoff. Three times the input size will take roughly three times as long.
O(n log(n)) "better than quadratic" - increasing the input size hurts, but it's still manageable. The algorithm is probably decent, it's just that the underlying problem is more difficult (decisions are less localized with respect to the input data) than those problems that can be solved in linear time. If your input sizes are getting up there, don't assume that you could necessarily handle twice the size without changing your architecture around (eg by moving things to overnight batch computations, or not doing things per-frame). It's ok if the input size increases a little bit, though; just watch out for multiples.
O(n^2) "quadratic" - it's really only going to work up to a certain size of your input, so pay attention to how big it could get. Also, your algorithm may suck -- think hard to see if there's an O(n log(n)) algorithm that would give you what you need. Once you're here, feel very grateful for the amazing hardware we've been gifted with. Not long ago, what you are trying to do would have been impossible for all practical purposes.
O(n^3) "cubic" - not qualitatively all that different from O(n^2). The same comments apply, only more so. There's a decent chance that a more clever algorithm could shave this time down to something smaller, eg O(n^2 log(n)) or O(n^2.8...), but then again, there's a good chance that it won't be worth the trouble. (You're already limited in your practical input size, so the constant factors that may be required for the more clever algorithms will probably swamp their advantages for practical cases. Also, thinking is slow; letting the computer chew on it may save you time overall.)
O(2^n) "exponential" - the problem is either fundamentally computationally hard or you're being an idiot. These problems have a recognizable flavor to them. Your input sizes are capped at a fairly specific hard limit. You'll know quickly whether you fit into that limit.
And that's it. There are many other possibilities that fit between these (or are greater than O(2^n)), but they don't often happen in practice and they're not qualitatively much different from one of these. Cubic algorithms are already a bit of a stretch; I only included them because I've run into them often enough to be worth mentioning (eg matrix multiplication).
What's actually happening for these classes of algorithms? Well, I think you had a good start, although there are many examples that wouldn't fit these characterizations. But for the above, I'd say it usually goes something like:
O(1) - you're only looking at most at a fixed-size chunk of your input data, and possibly none of it. Example: the maximum of a sorted list.
Or your input size is bounded. Example: addition of two numbers. (Note that addition of N numbers is linear time.)
O(log n) - each element of your input tells you enough to ignore a large fraction of the rest of the input. Example: when you look at an array element in binary search, its value tells you that you can ignore "half" of your array without looking at any of it. Or similarly, the element you look at gives you enough of a summary of a fraction of the remaining input that you won't need to look at it.
There's nothing special about halves, though -- if you can only ignore 10% of your input at each step, it's still logarithmic.
O(n) - you do some fixed amount of work per input element. (But see below.)
O(n log(n)) - there are a few variants.
You can divide the input into two piles (in no more than linear time), solve the problem independently on each pile, and then combine the two piles to form the final solution. The independence of the two piles is key. Example: classic recursive mergesort.
Each linear-time pass over the data gets you halfway to your solution. Example: quicksort if you think in terms of the maximum distance of each element to its final sorted position at each partitioning step (and yes, I know that it's actually O(n^2) because of degenerate pivot choices. But practically speaking, it falls into my O(n log(n)) category.)
O(n^2) - you have to look at every pair of input elements.
Or you don't, but you think you do, and you're using the wrong algorithm.
O(n^3) - um... I don't have a snappy characterization of these. It's probably one of:
You're multiplying matrices
You're looking at every pair of inputs but the operation you do requires looking at all of the inputs again
the entire graph structure of your input is relevant
O(2^n) - you need to consider every possible subset of your inputs.
None of these are rigorous. Especially not linear time algorithms (O(n)): I could come up with a number of examples where you have to look at all of the inputs, then half of them, then half of those, etc. Or the other way around -- you fold together pairs of inputs, then recurse on the output. These don't fit the description above, since you're not looking at each input once, but it still comes out in linear time. Still, 99.2% of the time, linear time means looking at each input once.
A lot of these are easy to demonstrate with something non-programming, like shuffling cards.
Sorting a deck of cards by going through the whole deck to find the ace of spades, then going through the whole deck to find the 2 of spades, and so on would be worst case n^2, if the deck was already sorted backwards. You looked at all 52 cards 52 times.
In general the really bad algorithms aren't necessarily intentional, they're commonly a misuse of something else, like calling a method that is linear inside some other method that repeats over the same set linearly.
I try to explain by giving simple code examples in C# and JavaScript.
C#
For List<int> numbers = new List<int> {1,2,3,4,5,6,7,12,543,7};
O(1) looks like
return numbers.First();
O(n) looks like
int result = 0;
foreach (int num in numbers)
{
result += num;
}
return result;
O(n log(n)) looks like
int result = 0;
foreach (int num in numbers)
{
int index = numbers.Count - 1;
while (index > 1)
{
// yeah, stupid, but couldn't come up with something more useful :-(
result += numbers[index];
index /= 2;
}
}
return result;
O(n^2) looks like
int result = 0;
foreach (int outerNum in numbers)
{
foreach (int innerNum in numbers)
{
result += outerNum * innerNum;
}
}
return result;
O(n!) looks like, uhm, too tired to come up with anything simple.
But I hope you get the general point?
JavaScript
For const numbers = [ 1, 2, 3, 4, 5, 6, 7, 12, 543, 7 ];
O(1) looks like
numbers[0];
O(n) looks like
let result = 0;
for (const num of numbers){
result += num;
}
O(n log(n)) looks like
let result = 0;
for (const num of numbers){
let index = numbers.length - 1;
while (index > 1){
// yeah, stupid, but couldn't come up with something more useful :-(
result += numbers[index];
index = Math.floor(index/2)
}
}
O(n^2) looks like
let result = 0;
for (const outerNum of numbers){
for (const innerNum of numbers){
result += outerNum * innerNum;
}
}
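For the O(n!) case that the answer leaves open, one simple filler (my own sketch, in the same JavaScript style) is generating every permutation of the array:

// O(n!) looks like: generate every permutation of the array.
function permutations(arr) {
  if (arr.length <= 1) return [arr];
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    // Fix arr[i] as the first element, permute the rest recursively.
    const rest = [...arr.slice(0, i), ...arr.slice(i + 1)];
    for (const perm of permutations(rest)) {
      result.push([arr[i], ...perm]);
    }
  }
  return result;
}

console.log(permutations([1, 2, 3]).length); // 6 = 3!

Running this on the full ten-element numbers array would already produce 3,628,800 permutations, which is the whole point of the example.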
Ok - there are some very good answers here but almost all of them seem to make the same mistake and it's one that is pervading common usage.
Informally, we write that f(n) = O( g(n) ) if, up to a scaling factor and for all n larger than some n0, g(n) is larger than f(n). That is, f(n) grows no quicker than, or is bounded from above by, g(n). This tells us nothing about how fast f(n) grows, save for the fact that it is guaranteed not to be any worse than g(n).
A concrete example: n = O( 2^n ). We all know that n grows much less quickly than 2^n, so that entitles us to say that it is bounded from above by the exponential function. There is a lot of room between n and 2^n, so it's not a very tight bound, but it's still a legitimate bound.
Why do we (computer scientists) use bounds rather than being exact? Because a) bounds are often easier to prove and b) it gives us a short-hand to express properties of algorithms. If I say that my new algorithm is O(n.log n) that means that in the worst case its run-time will be bounded from above by n.log n on n inputs, for large enough n (although see my comments below on when I might not mean worst-case).
If instead, we want to say that a function grows exactly as quickly as some other function, we use theta to make that point (I'll write T( f(n) ) to mean \Theta of f(n) in markdown). T( g(n) ) is short hand for being bounded from above and below by g(n), again, up to a scaling factor and asymptotically.
That is f(n) = T( g(n) ) <=> f(n) = O(g(n)) and g(n) = O(f(n)). In our example, we can see that n != T( 2^n ) because 2^n != O(n).
Why get concerned about this? Because in your question you write 'would someone have to smoke crack to write an O(x!)?' The answer is no - because basically everything you write will be bounded from above by the factorial function. The run time of quicksort is O(n!) - it's just not a tight bound.
There's also another dimension of subtlety here. Typically we are talking about the worst case input when we use O( g(n) ) notation, so that we are making a compound statement: in the worst case running time it will not be any worse than an algorithm that takes g(n) steps, again modulo scaling and for large enough n. But sometimes we want to talk about the running time of the average and even best cases.
Vanilla quicksort is, as ever, a good example. It's T( n^2 ) in the worst case (it will actually take at least n^2 steps, but not significantly more), but T(n.log n) in the average case, which is to say the expected number of steps is proportional to n.log n. In the best case it is also T(n.log n) - but you could improve on that by, for example, checking if the array was already sorted, in which case the best-case running time would be T( n ).
How does this relate to your question about the practical realisations of these bounds? Well, unfortunately, O( ) notation hides constants which real-world implementations have to deal with. So although we can say that, for example, for a T(n^2) operation we have to visit every possible pair of elements, we don't know how many times we have to visit them (except that it's not a function of n). So we could have to visit every pair 10 times, or 10^10 times, and the T(n^2) statement makes no distinction. Lower order functions are also hidden - we could have to visit every pair of elements once, and every individual element 100 times, because n^2 + 100n = T(n^2). The idea behind O( ) notation is that for large enough n, this doesn't matter at all because n^2 gets so much larger than 100n that we don't even notice the impact of 100n on the running time. However, we often deal with 'sufficiently small' n such that constant factors and so on make a real, significant difference.
For example, quicksort (average cost T(n.log n)) and heapsort (average cost T(n.log n)) are both sorting algorithms with the same average cost - yet quicksort is typically much faster than heapsort. This is because heapsort does a few more comparisons per element than quicksort.
This is not to say that O( ) notation is useless, just imprecise. It's quite a blunt tool to wield for small n.
(As a final note to this treatise, remember that O( ) notation just describes the growth of any function - it doesn't necessarily have to be time, it could be memory, messages exchanged in a distributed system or number of CPUs required for a parallel algorithm.)
The way I describe it to my nontechnical friends is like this:
Consider multi-digit addition. Good old-fashioned, pencil-and-paper addition. The kind you learned when you were 7-8 years old. Given two three-or-four-digit numbers, you can find out what they add up to fairly easily.
If I gave you two 100-digit numbers, and asked you what they add up to, figuring it out would be pretty straightforward, even if you had to use pencil-and-paper. A bright kid could do such an addition in just a few minutes. This would only require about 100 operations.
Now, consider multi-digit multiplication. You probably learned that at around 8 or 9 years old. You (hopefully) did lots of repetitive drills to learn the mechanics behind it.
Now, imagine I gave you those same two 100-digit numbers and told you to multiply them together. This would be a much, much harder task, something that would take you hours to do - and that you'd be unlikely to do without mistakes. The reason for this is that (this version of) multiplication is O(n^2); each digit in the bottom number has to be multiplied by each digit in the top number, leaving a total of about n^2 operations. In the case of the 100-digit numbers, that's 10,000 multiplications.
No, an O(n) algorithm does not mean it will perform an operation on each element. Big-O notation gives you a way to talk about the "speed" of your algorithm independent of your actual machine.
O(n) means that the time your algorithm takes grows linearly as your input increases. O(n^2) means that the time your algorithm takes grows as the square of your input. And so forth.
The way I think about it is that you have the task of cleaning up a problem caused by some evil villain V who picks N, and you have to estimate how much longer it's going to take to finish your problem when he increases N.
O(1) -> increasing N really doesn't make any difference at all
O(log(N)) -> every time V doubles N, you have to spend an extra amount of time T to complete the task. V doubles N again, and you spend the same amount.
O(N) -> every time V doubles N, you spend twice as much time.
O(N^2) -> every time V doubles N, you spend 4x as much time. (it's not fair!!!)
O(N log(N)) -> every time V doubles N, you spend twice as much time plus a little more.
These are bounds of an algorithm; computer scientists want to describe how long it is going to take for large values of N. (which gets important when you are factoring numbers that are used in cryptography -- if the computers speed up by a factor of 10, how many more bits do you have to use to ensure it will still take them 100 years to break your encryption and not just 1 year?)
Some of the bounds can have weird expressions if it makes a difference to the people involved. I've seen stuff like O(N log(N) log(log(N))) somewhere in Knuth's Art of Computer Programming for some algorithms. (can't remember which one off the top of my head)
One thing that hasn't been touched on yet for some reason:
When you see algorithms with things like O(2^n) or O(n^3) or other nasty values it often means you're going to have to accept an imperfect answer to your problem in order to get acceptable performance.
Correct solutions that blow up like this are common when dealing with optimization problems. A nearly-correct answer delivered in a reasonable timeframe is better than a correct answer delivered long after the machine has decayed to dust.
Consider chess: I don't know exactly what the correct solution is considered to be but it's probably something like O(n^50) or even worse. It is theoretically impossible for any computer to actually calculate the correct answer--even if you use every particle in the universe as a computing element performing an operation in the minimum possible time for the life of the universe you still have a lot of zeros left. (Whether a quantum computer can solve it is another matter.)
The "Intuitition" behind Big-O
Imagine a "competition" between two functions over x, as x approaches infinity: f(x) and g(x).
Now, if from some point on (some x) one function always has a higher value than the other, then let's call this function "faster" than the other.
So, for example, if for every x > 100 you see that f(x) > g(x), then f(x) is "faster" than g(x).
In this case we would say g(x) = O(f(x)). f(x) poses a sort of "speed limit" for g(x), since eventually it passes it and leaves it behind for good.
This isn't exactly the definition of big-O notation, which also states that f(x) only has to be larger than C*g(x) for some constant C (which is just another way of saying that you can't help g(x) win the competition by multiplying it by a constant factor - f(x) will always win in the end). The formal definition also uses absolute values. But I hope I managed to make it intuitive.
And does somebody have to smoke crack to write an O(x!)?
No, just use Prolog. If you write a sorting algorithm in Prolog by just describing that each element should be bigger than the previous, and let backtracking do the sorting for you, that will be O(x!). Also known as "permutation sort".
I like don neufeld's answer, but I think I can add something about O(n log n).
An algorithm which uses a simple divide and conquer strategy is probably going to be O(log n). The simplest example of this is finding something in a sorted list. You don't start at the beginning and scan for it. You go to the middle, you decide if you should then go backwards or forwards, jump halfway to the last place you looked, and repeat until you find the item you're looking for.
If you look at the quicksort or mergesort algorithms, you will see that they both take the approach of dividing the list to be sorted in half, sorting each half (using the same algorithm, recursively), and then recombining the two halves. This sort of recursive divide and conquer strategy will be O(n log n).
If you think about it carefully, you'll see that quicksort does an O(n) partitioning pass on the whole n items, then O(n) partitioning twice on n/2 items, then 4 times on n/4 items, etc... until you get to n partitions of 1 item (which is degenerate). The number of times you divide n in half to get to 1 is approximately log n, and each level of partitioning is O(n), so recursive divide and conquer is O(n log n). Mergesort builds the other way, starting with n recombinations of 1 item, and finishing with 1 recombination of n items, where the recombination of two sorted lists is O(n).
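Here's a compact mergesort sketch (my own, in JavaScript) that shows the structure the paragraph describes: log n levels of splitting, with an O(n) recombination of two sorted lists at each level:

// Mergesort: log2(n) levels of splitting, an O(n) merge at each level,
// hence O(n log n) overall.
function mergeSort(arr) {
  if (arr.length <= 1) return arr;
  const mid = Math.floor(arr.length / 2);
  return merge(mergeSort(arr.slice(0, mid)), mergeSort(arr.slice(mid)));
}

function merge(a, b) { // combining two sorted lists is O(n)
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    out.push(a[i] <= b[j] ? a[i++] : b[j++]);
  }
  return out.concat(a.slice(i), b.slice(j));
}

console.log(mergeSort([5, 3, 8, 1, 9, 2])); // [1, 2, 3, 5, 8, 9]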
As for smoking crack to write an O(n!) algorithm, you are, unless you have no choice. The traveling salesman problem given above is believed to be one such problem.
Think of it as stacking lego blocks (n) vertically and jumping over them.
O(1) means at each step, you do nothing. The height stays the same.
O(n) means at each step, you stack c1 blocks, where c1 is a constant.
O(n^2) means at each step, you stack c2 x n blocks, where c2 is a constant and n is the number of steps taken so far.
O(n log n) means at each step, you stack c3 x log n blocks, where c3 is a constant and n is the number of steps taken so far.
Most Jon Bentley books (e.g. Programming Pearls) cover such stuff in a really pragmatic manner. This talk given by him includes one such analysis of a quicksort.
While not entirely relevant to the question, Knuth came up with an interesting idea: teaching Big-O notation in high school calculus classes, though I find this idea quite eccentric.
To understand O(n log n), remember that log n means log-base-2 of n. Then look at each part:
O(n) is, more or less, when you operate on each item in the set.
O(log n) is when the number of operations is the same as the exponent to which you raise 2, to get the number of items. A binary search, for instance, has to cut the set in half log n times.
O(n log n) is a combination – you're doing something along the lines of a binary search for each item in the set. Efficient sorts often operate by doing one loop per item, and in each loop doing a good search to find the right place to put the item or group in question. Hence n * log n.
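One way to see that (my own sketch, not from the original answer) is an insertion sort that binary-searches for each item's position. The comparison count is O(n log n), though note that inserting into the array still shifts elements, which is why real O(n log n) sorts are structured differently:

// Binary insertion sort: for each item, binary-search its position in the
// sorted prefix (O(log n) comparisons per item), then insert it there.
function binaryInsertionSort(arr) {
  const out = [];
  for (const item of arr) {
    let lo = 0, hi = out.length;
    while (lo < hi) {                // binary search for the insert position
      const mid = Math.floor((lo + hi) / 2);
      if (out[mid] <= item) lo = mid + 1;
      else hi = mid;
    }
    out.splice(lo, 0, item);         // insert at the found position
  }
  return out;
}

console.log(binaryInsertionSort([5, 3, 8, 1, 9, 2])); // [1, 2, 3, 5, 8, 9]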
Just to respond to the couple of comments on my above post:
Domenic - I'm on this site, and I care. Not for pedantry's sake, but because we - as programmers - typically care about precision. Using O( ) notation incorrectly in the style that some have done here renders it kind of meaningless; we may just as well say something takes n^2 units of time as O( n^2 ) under the conventions used here. Using the O( ) adds nothing. It's not just a small discrepancy between common usage and mathematical precision that I'm talking about, it's the difference between it being meaningful and it not.
I know many, many excellent programmers who use these terms precisely. Saying 'oh, we're programmers therefore we don't care' cheapens the whole enterprise.
onebyone - Well, not really although I take your point. It's not O(1) for arbitrarily large n, which is kind of the definition of O( ). It just goes to show that O( ) has limited applicability for bounded n, where we would rather actually talk about the number of steps taken rather than a bound on that number.
Tell your eight year old log(n) means the number of times you have to chop a length n log in two for it to get down to size n=1 :p
O(n log n) is usually sorting
O(n^2) is usually comparing all pairs of elements
Suppose you had a computer that could solve a problem of a certain size. Now imagine that we can double the performance a few times. How much bigger a problem can we solve with each doubling?
If we can solve a problem of double the size, that's O(n).
If we have some multiplier that isn't one, that's some sort of polynomial complexity. For example, if each doubling allows us to increase the problem size by about 40%, it's O(n^2), and about 30% would be O(n^3).
If we just add to the problem size, it's exponential or worse. For example, if each doubling means we can solve a problem 1 bigger, it's O(2^n). (This is why brute-forcing a cipher key becomes effectively impossible with reasonably sized keys: a 128-bit key requires about 16 quintillion times as much processing as a 64-bit.)
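The multipliers quoted above come straight from the arithmetic: for an O(n^k) algorithm, doubling the available compute multiplies the feasible problem size by 2^(1/k). A one-liner to check (my own):

// Doubling compute multiplies feasible problem size by 2^(1/k) for O(n^k).
for (const k of [1, 2, 3]) {
  console.log(`O(n^${k}):`, (2 ** (1 / k)).toFixed(2)); // 2.00, 1.41, 1.26
}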
Remember the fable of the tortoise and the hare (turtle and rabbit)?
Over the long run, the tortoise wins, but over the short run the hare wins.
That's like O(logN) (tortoise) vs. O(N) (hare).
If two methods differ in their big-O, then there is a level of N at which one of them will win, but big-O says nothing about how big that N is.
To remain sincere to the question asked, I would answer the way I would answer an 8 year old kid:
Suppose an ice-cream seller prepares a number of ice creams (say N) of different sizes, arranged in order.
You want to eat the ice cream lying in the middle.
Case 1: You can eat an ice cream only if you have eaten all the ice creams smaller than it.
You will have to eat half of all the ice creams prepared (the input). The answer directly depends on the size of the input.
The solution will be of order O(N).
Case 2: You can directly eat the ice cream in the middle.
The solution will be O(1).
Case 3: You can eat an ice cream only if you have eaten all the ice creams smaller than it, and each time you eat an ice cream you allow another kid (a new kid every time) to eat all his ice creams.
The total time taken would be N + N + N... (N/2 times).
The solution will be O(N^2).
log(n) means logarithmic growth. An example would be divide and conquer algorithms. If you have 1000 sorted numbers in an array (e.g. 3, 10, 34, 244, 1203 ...) and want to search for a number in the list (find its position), you could start by checking the value of the number at index 500. If it is lower than what you seek, jump to 750. If it is higher than what you seek, jump to 250. Then you repeat the process until you find your value (and its index). Every time we jump to the middle of the remaining search space, we can cull away half of the values without testing them, since we know, for example, that the number 3004 can't come after the number 5000 (remember, it is a sorted list).
n log(n) then means n * log(n).
I'll try to actually write an explanation for a real eight year old boy, aside from technical terms and mathematical notions.
Like what exactly would an O(n^2) operation do?
Say you are at a party, and there are n people at the party including you. How many handshakes does it take so that everyone has shaken hands with everyone else, given that people would probably forget whom they have already shaken hands with at some point?
Note: this approximates a simplex, yielding n(n-1), which is close enough to n^2.
And what the heck does it mean if an operation is O(n log(n))?
Your favorite team has won, they are standing in line, and there are n players in the team. How many handshakes would it take you to shake hands with every player, given that you will shake each one's hand multiple times - as many times as there are digits in the number of players n?
Note: this will yield n * log n, with the log to base 10 (the number of digits in n is about log10(n)).
And does somebody have to smoke crack to write an O(x!)?
You are a rich kid and in your wardrobe there are a lot of clothes; there are x drawers, one for each type of clothing, next to each other. The first drawer has 1 item, and each drawer has as many clothes as the drawer to its left plus one more, so you have something like 1 hat, 2 wigs, ... (x-1) pants, then x shirts. Now in how many ways can you dress up using a single item from each drawer?
Note: this example represents the number of leaves in a decision tree where the number of children equals the depth, which works out to 1 * 2 * 3 * ... * x.
