Related
I am currently learning about Big O Notation running times and amortized times.
I have the following question:
two algorithms based on the principle of Divide & Conquer are available to solve a problem of complexity n.
Algorithm 1 divide the problem into 18 small problems and requires O (n^2) operations to combine the sub-solutions together.
Algorithm 2 divide the problem into 64 small problems and requires O(n) operations to combine the sub-solutions together.
Which algorithm is better and faster (for large n)?
I'm guessing that the second Algorithm is better because it requires less time (O(n) is faster than O(n^2)).
Am I correct in my guess?
Does the number of small problems play a role in the speed of Algorithm or does it always require a constant Time?
In this case it's probably not intended to be a trap, but it's good to be careful and some counter-intuitive things can happen. The trap, if it happens, is mostly this: how much smaller do the sub-problems get, compared to how many of them are generated?
For example, it is true for Algorithm 1 here that if sub-problems are 1/5th of the size of the current problem or smaller (and perhaps they meant they would be 1/18th the size?), then overall the time complexity is in O(n²). But if the size of the problem only goes down by a factor of 4, we're already up to O(n2.085), and if the domain is only cut into half (but still 18 times) then it goes all the way up to O(n4.17).
Similarly for Algorithm 2, sure if it cuts a program into 64 sub problems that are each 1/64th of the size, the overall time complexity would be in O(n log n). But if the sub problems are even a little bit bigger, say 1/63rd of the size, immediately we go up a whole step in the hierarchy to O(n1.004) - a tiny constant in the exponent still, but no longer loglinear. Make the problems 1/8th the size and the complexity becomes quadratic, and if we go to a mere halving of the problem size at each step it's all the way up to O(n6)! On the other hand if the problems shrink only a little bit faster, say 1/65th of the size, immediately the complexity stops being loglinear again but this time in the other direction, becoming O(n).
So it could go either way, depending on how quickly the sub-problems shrink, which is not explicitly mentioned in your problem statement. Hopefully it is clear that merely comparing the "additional processing per step" is not sufficient, not in general anyway. A lot of processing per step is a disadvantage that cannot be overcome, but having only a little processing per step is an advantage that can be easily lost if the "shrinkage factor" is small compared to the "fan-out factor".
The Master theorem is used for asymptotic analysis for divide and conquer algorithms and will provide a way for you to get a direct answer rather than guessing.
T(n) = aT(n/b) + f(n)
where T is the main problem, n is the set of input, a is the number of subproblems you divide into, b is the factor that your input set is decreased by for each subproblem, and f(n) is the function to split and combine subproblems together. From here we find c:
f(n) is O(n^c)
For example, in your example algorithm 1, c = 2, and in algorithm 2, c = 1. The value a is 18 and 64 for algorithm 1 and 2 respectively. The next part is where your problem is missing the appropriate information since b is not provided. In other words, to get a clear answer, you need to know the factor that each subproblem divides the original input.
if c < logb(a) then T(n) is O(n^logb(a))
if c = logb(a) then T(n) is O(n^c log(n))
if c > logb(a) then T(n) is O(f(n))
for example, say n = Integer.MAX_VALUE or 2^123 then O(log(n)) = 32 and 123 so a small integer. isn't it O(1) ?
what is the difference ? I think, the reason is O(1) is constant but O(log(n)) not. Any other ideas ?
If n is bounded above, then complexity classes involving n make no sense. There is no such thing as "in the limit as 2^123 approaches infinity", except in the old joke that "a pentagon approximates a circle, for sufficiently large values of 5".
Generally, when analysing the complexity of code, we pretend that the input size isn't bounded above by the resource limits of the machine, even though it is. This does lead to some slightly odd things going on around log n, since if n has to fit into a fixed-size int type, then log n has quite a small bound, so the bound is more likely to be useful/relevant.
So sometimes, we're analysing a slightly idealised version of the algorithm, because the actual code written cannot accept arbitrarily large input.
For example, your average quicksort formally uses Theta(log n) stack in the worst case, obviously so with the fairly common implementation that call-recurses on the "small" side of the partition and loop-recurses on the "big" side. But on a 32 bit machine you can arrange to in fact use a fixed-size array of about 240 bytes to store the "todo list", which might be less than some other function you've written based on an algorithm that formally has O(1) stack use. The morals are that implementation != algorithm, complexity doesn't tell you anything about small numbers, and any specific number is "small".
If you want to account for bounds, you could say that, for example, your code to sort an array is O(1) running time, because the array has to be below the size that fits in your PC's address space, and hence the time to sort it is bounded. However, you will fail your CS assignment if you do, and you won't be providing anyone with any useful information :-)
Obviously if you know that the input will always have a fixed number of elements, the algorithm will always run in constant time. Big-O notation is used to denote worse-case running time, which describes the limit when the number of elements grows infinitely large.
The difference is that n isn't fixed. The idea behind Big-O notation is to get an idea of how the size of the input effects the running time (or memory usage). So if an algorithm always takes the same amount of time, whether n = 1 or n = Integer.MAX_VALUE, we say it is O(1). If the algorithm takes a unit of time longer each time the input size doubles, then we say it is O(logn).
Edit: to answer your specific question on the difference between O(1) and O(logn), I'll give you an example. Let's say we want an algorithm that will find the min element in an unsorted array. One approach is to go through each element and keep track of the current min. Another approach is to sort the array and then return the first element.
The first algorithm is O(n), and the second algorithm is O(nlogn). So let's say we start with an array of 16 elements. The first algorithm will run in time 16, the second algorithm will run in time 16*4. If we increase it to 17, then it becomes 17 and 17*4. We might naively say that the second algorithm takes 4 times as long as the first algorithm (if we treat the logn component as constant).
But let's look at what happens when our array contains 2^32 elements. Now the first algorithm takes 2^32 time to complete, where our second algorithm takes 32*2^32 time to complete. It takes 32 times as long. Yes, it's a small difference, but it is still a difference. If the first algorithm takes 1 minute, the second algorithm will take over half an hour!
I think you will get a better idea if it is called O(n^0).
It is a scaling function depending on the input variable N. It is a function, not number, you should never assume any number for the variable N.
It is just like that you say that a function f(x) is 3 because f(100) = 3, it is wrong. It is a function, not any particular number. A constant function f(x) = 1 is still a function, it will never equal to another function g(x) = N, i.e. g(x)=f(x)
Its the growth rate that you want to look at. O(1) implies no growth at all. While O(logn) does have growth. Even though the growth is small it is still growth.
You’re not thinking big enough. Any algorithm that runs on a computer will either run forever or terminate after some small number of steps — since the computer is only a finite state machine, you cannot write algorithms that run for an arbitrary amount of time and then terminate. By that argument, Big-O notation is only theoretical and has no purpose in a real-life computer program. Even O(2^n) hits an upper limit at O(2^INT_MAX), which is equivalent to O(1).
Realistically, though, Big-O can help you out if you know the constant factors. Even if an algorithm has an upper bound of O(log n), and n can have 32 bits, that could mean the difference between a request taking 1 second and 32 seconds.
Big-O shows how running time (or memory, etc) changes as the size of problem changes.
When size of the problem gets 10 times bigger, an O(n) solution takes 10 times as long, an O(log(n)) solution takes a bit longer, and an O(1) solution takes the same time: O(1) means 'changes as fast as constant 1', but constants don't change.
Familiarize yourself with the big-O notation in a bit more detail.
There is a reason why you leave "O(n)" in, and consider to drop "O(log n)". They both are "constants": the former is less than 32, and the latter is less than 232. But you nevertheless have a natural feeling that you can't call O(n) O(1).
However, if log(n) < 32, it means that O(n*logn) algorithm works thirty two times slower than its O(n) version. Big enough to write "log*n"s?
My program of sorting values clocks at:
100000 8s
1000000 82s
10000000 811s
Is that O(n)?
Looks like it, but in reality, you really need to analyze the algorithm, because there may be different cases based on the data. Some algorithms do better or worse with pre-sorted data, for instance. What's your algorithm?
Yes, that looks like O(n) to me - going from the 1st to 2nd case and the 2nd to 3rd case, you've made the input 10 times bigger, and it's taken 10 times longer.
In particular, it looks like you can predict the rough time by using:
f(n) = n / 12500
or
f(n) = n * 0.00008
which gives a simplest explanation of O(n) for the data provided.
EDIT: However... As has been pointed out, there are various ways in which the data could be misleading you - I rather like Dennis Palmer's idea that the IO cost is dwarfing everything else. For example, suppose you have an algorithm whose absolute number of operations is:
f(n) = 1000000000000n + (n^2)
In that case the complexity is still O(n^2), but that won't become apparent from observations until n is very large.
I think it would be accurate to say that these observations are suggestive of an O(n) algorithm but that doesn't mean it definitely is.
Time behavior doesn't work that way. All you can really say there is that those three datasets are roughly O(n) from each other. That doesn't mean the algorithim is O(n).
The first problem is that I could easily draw a curve that goes exponential ( O(e**n) ) that nonetheless includes those three points.
The big problem though is that you didn't say anything about the data. There are many sorting algorithms that approach O(n) for sorted or nearly sorted inputs (eg: Mergesort). However, their average case (typically randomly-ordered data) and worst case (often reverse-sorted data) is invariably O(nlogn) or worse.
You cannot tell.
First, the time depends on the data, environment and algorithm. If you have an array of zeroes and try to sort it, the program running time shouldn't be much different for 1000 or 1000000 elements.
Second, the O notation tells about function behavior for big value of n, starting at n0. It could be that your algorithm scales well up to certain input size, and then its behavior changes - like the g(n) function.
it looks like O(n) to me.
Yes, that is O(n) because it scales with the number of items.
1000000 = 10 * 100000
and
82s = 10 * 8s (roughly)
You cannot depend on that to say it is O(n). Bubblesort, for instance, can complete in n steps, however it is an O(n^2) algorithm.
Yes, it appears to be O(n), but I don't think you can conclusively analyze the algorithm based on it's timed performance. You might be using the most inefficient sorting algorithm, but have O(n) timed results because the data read/write time is the majority of the total execution time.
Edit: Big-O is determined by how efficient an algorithm is and how it scales. It should predict the growth of execution time as the number of input items grows. The inverse is not necessarily true. Observing a given growth in execution time could mean a few different things. If it stays true to the example numbers you've given, then you can conclude that your program runs at O(n), but as others have said, that doesn't mean your sorting algorithm is O(n).
Whenever I consider algorithms/data structures I tend to replace the log(N) parts by constants. Oh, I know log(N) diverges - but does it matter in real world applications?
log(infinity) < 100 for all practical purposes.
I am really curious for real world examples where this doesn't hold.
To clarify:
I understand O(f(N))
I am curious about real world examples where the asymptotic behaviour matters more than the constants of the actual performance.
If log(N) can be replaced by a constant it still can be replaced by a constant in O( N log N).
This question is for the sake of (a) entertainment and (b) to gather arguments to use if I run (again) into a controversy about the performance of a design.
Big O notation tells you about how your algorithm changes with growing input. O(1) tells you it doesn't matter how much your input grows, the algorithm will always be just as fast. O(logn) says that the algorithm will be fast, but as your input grows it will take a little longer.
O(1) and O(logn) makes a big diference when you start to combine algorithms.
Take doing joins with indexes for example. If you could do a join in O(1) instead of O(logn) you would have huge performance gains. For example with O(1) you can join any amount of times and you still have O(1). But with O(logn) you need to multiply the operation count by logn each time.
For large inputs, if you had an algorithm that was O(n^2) already, you would much rather do an operation that was O(1) inside, and not O(logn) inside.
Also remember that Big-O of anything can have a constant overhead. Let's say that constant overhead is 1 million. With O(1) that constant overhead does not amplify the number of operations as much as O(logn) does.
Another point is that everyone thinks of O(logn) representing n elements of a tree data structure for example. But it could be anything including bytes in a file.
I think this is a pragmatic approach; O(logN) will never be more than 64. In practice, whenever terms get as 'small' as O(logN), you have to measure to see if the constant factors win out. See also
Uses of Ackermann function?
To quote myself from comments on another answer:
[Big-Oh] 'Analysis' only matters for factors
that are at least O(N). For any
smaller factor, big-oh analysis is
useless and you must measure.
and
"With O(logN) your input size does
matter." This is the whole point of
the question. Of course it matters...
in theory. The question the OP asks
is, does it matter in practice? I
contend that the answer is no, there
is not, and never will be, a data set
for which logN will grow so fast as to
always be beaten a constant-time
algorithm. Even for the largest
practical dataset imaginable in the
lifetimes of our grandchildren, a logN
algorithm has a fair chance of beating
a constant time algorithm - you must
always measure.
EDIT
A good talk:
http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
about halfway through, Rich discusses Clojure's hash tries, which are clearly O(logN), but the base of the logarithm is large and so the depth of the trie is at most 6 even if it contains 4 billion values. Here "6" is still an O(logN) value, but it is an incredibly small value, and so choosing to discard this awesome data structure because "I really need O(1)" is a foolish thing to do. This emphasizes how most of the other answers to this question are simply wrong from the perspective of the pragmatist who wants their algorithm to "run fast" and "scale well", regardless of what the "theory" says.
EDIT
See also
http://queue.acm.org/detail.cfm?id=1814327
which says
What good is an O(log2(n)) algorithm
if those operations cause page faults
and slow disk operations? For most
relevant datasets an O(n) or even an
O(n^2) algorithm, which avoids page
faults, will run circles around it.
(but go read the article for context).
This is a common mistake - remember Big O notation is NOT telling you about the absolute performance of an algorithm at a given value, it's simply telling you the behavior of an algorithm as you increase the size of the input.
When you take it in that context it becomes clear why an algorithm A ~ O(logN) and an algorithm B ~ O(1) algorithm are different:
if I run A on an input of size a, then on an input of size 1000000*a, I can expect the second input to take log(1,000,000) times as long as the first input
if I run B on an input of size a, then on an input of size 1000000*a, I can expect the second input to take about the same amount of time as the first input
EDIT: Thinking over your question some more, I do think there's some wisdom to be had in it. While I would never say it's correct to say O(lgN) == O(1), It IS possible that an O(lgN) algorithm might be used over an O(1) algorithm. This draws back to the point about absolute performance above: Just knowing one algorithm is O(1) and another algorithm is O(lgN) is NOT enough to declare you should use the O(1) over the O(lgN), it's certainly possible given your range of possible inputs an O(lgN) might serve you best.
You asked for a real-world example. I'll give you one. Computational biology. One strand of DNA encoded in ASCII is somewhere on the level of gigabytes in space. A typical database will obviously have many thousands of such strands.
Now, in the case of an indexing/searching algorithm, that log(n) multiple makes a large difference when coupled with constants. The reason why? This is one of the applications where the size of your input is astronomical. Additionally, the input size will always continue to grow.
Admittedly, these type of problems are rare. There are only so many applications this large. In those circumstances, though... it makes a world of difference.
Equality, the way you're describing it, is a common abuse of notation.
To clarify: we usually write f(x) = O(logN) to imply "f(x) is O(logN)".
At any rate, O(1) means a constant number of steps/time (as an upper bound) to perform an action regardless of how large the input set is. But for O(logN), number of steps/time still grows as a function of the input size (the logarithm of it), it just grows very slowly. For most real world applications you may be safe in assuming that this number of steps will not exceed 100, however I'd bet there are multiple examples of datasets large enough to mark your statement both dangerous and void (packet traces, environmental measurements, and many more).
For small enough N, O(N^N) can in practice be replaced with 1. Not O(1) (by definition), but for N=2 you can see it as one operation with 4 parts, or a constant-time operation.
What if all operations take 1hour? The difference between O(log N) and O(1) is then large, even with small N.
Or if you need to run the algorithm ten million times? Ok, that took 30minutes, so when I run it on a dataset a hundred times as large it should still take 30minutes because O(logN) is "the same" as O(1).... eh...what?
Your statement that "I understand O(f(N))" is clearly false.
Real world applications, oh... I don't know.... EVERY USE OF O()-notation EVER?
Binary search in sorted list of 10 million items for example. It's the very REASON we use hash tables when the data gets big enough. If you think O(logN) is the same as O(1), then why would you EVER use a hash instead of a binary tree?
As many have already said, for the real world, you need to look at the constant factors first, before even worrying about factors of O(log N).
Then, consider what you will expect N to be. If you have good reason to think that N<10, you can use a linear search instead of a binary one. That's O(N) instead of O(log N), which according to your lights would be significant -- but a linear search that moves found elements to the front may well outperform a more complicated balanced tree, depending on the application.
On the other hand, note that, even if log N is not likely to exceed 50, a performance factor of 10 is really huge -- if you're compute-bound, a factor like that can easily make or break your application. If that's not enough for you, you'll frequently see factors of (log N)^2 or (logN)^3 in algorithms, so even if you think you can ignore one factor of (log N), that doesn't mean you can ignore more of them.
Finally, note that the simplex algorithm for linear programming has a worst case performance of O(2^n). However, for practical problems, the worst case never comes up; in practice, the simplex algorithm is fast, relatively simple, and consequently very popular.
About 30 years ago, someone developed a polynomial-time algorithm for linear programming, but it was not initially practical because the result was too slow.
Nowadays, there are practical alternative algorithms for linear programming (with polynomial-time wost-case, for what that's worth), which can outperform the simplex method in practice. But, depending on the problem, the simplex method is still competitive.
The observation that O(log n) is oftentimes indistinguishable from O(1) is a good one.
As a familiar example, suppose we wanted to find a single element in a sorted array of one 1,000,000,000,000 elements:
with linear search, the search takes on average 500,000,000,000 steps
with binary search, the search takes on average 40 steps
Suppose we added a single element to the array we are searching, and now we must search for another element:
with linear search, the search takes on average 500,000,000,001 steps (indistinguishable change)
with binary search, the search takes on average 40 steps (indistinguishable change)
Suppose we doubled the number of elements in the array we are searching, and now we must search for another element:
with linear search, the search takes on average 1,000,000,000,000 steps (extraordinarily noticeable change)
with binary search, the search takes on average 41 steps (indistinguishable change)
As we can see from this example, for all intents and purposes, an O(log n) algorithm like binary search is oftentimes indistinguishable from an O(1) algorithm like omniscience.
The takeaway point is this: *we use O(log n) algorithms because they are often indistinguishable from constant time, and because they often perform phenomenally better than linear time algorithms.
Obviously, these examples assume reasonable constants. Obviously, these are generic observations and do not apply to all cases. Obviously, these points apply at the asymptotic end of the curve, not the n=3 end.
But this observation explains why, for example, we use such techniques as tuning a query to do an index seek rather than a table scan - because an index seek operates in nearly constant time no matter the size of the dataset, while a table scan is crushingly slow on sufficiently large datasets. Index seek is O(log n).
You might be interested in Soft-O, which ignores logarithmic cost. Check this paragraph in Wikipedia.
What do you mean by whether or not it "matters"?
If you're faced with the choice of an O(1) algorithm and a O(lg n) one, then you should not assume they're equal. You should choose the constant-time one. Why wouldn't you?
And if no constant-time algorithm exists, then the logarithmic-time one is usually the best you can get. Again, does it then matter? You just have to take the fastest you can find.
Can you give me a situation where you'd gain anything by defining the two as equal? At best, it'd make no difference, and at worst, you'd hide some real scalability characteristics. Because usually, a constant-time algorithm will be faster than a logarithmic one.
Even if, as you say, lg(n) < 100 for all practical purposes, that's still a factor 100 on top of your other overhead. If I call your function, N times, then it starts to matter whether your function runs logarithmic time or constant, because the total complexity is then O(n lg n) or O(n).
So rather than asking if "it matters" that you assume logarithmic complexity to be constant in "the real world", I'd ask if there's any point in doing that.
Often you can assume that logarithmic algorithms are fast enough, but what do you gain by considering them constant?
O(logN)*O(logN)*O(logN) is very different. O(1) * O(1) * O(1) is still constant.
Also a simple quicksort-style O(nlogn) is different than O(n O(1))=O(n). Try sorting 1000 and 1000000 elements. The latter isn't 1000 times slower, it's 2000 times, because log(n^2)=2log(n)
The title of the question is misleading (well chosen to drum up debate, mind you).
O(log N) == O(1) is obviously wrong (and the poster is aware of this). Big O notation, by definition, regards asymptotic analysis. When you see O(N), N is taken to approach infinity. If N is assigned a constant, it's not Big O.
Note, this isn't just a nitpicky detail that only theoretical computer scientists need to care about. All of the arithmetic used to determine the O function for an algorithm relies on it. When you publish the O function for your algorithm, you might be omitting a lot of information about it's performance.
Big O analysis is cool, because it lets you compare algorithms without getting bogged down in platform specific issues (word sizes, instructions per operation, memory speed versus disk speed). When N goes to infinity, those issues disappear. But when N is 10000, 1000, 100, those issues, along with all of the other constants that we left out of the O function, start to matter.
To answer the question of the poster: O(log N) != O(1), and you're right, algorithms with O(1) are sometimes not much better than algorithms with O(log N), depending on the size of the input, and all of those internal constants that got omitted during Big O analysis.
If you know you're going to be cranking up N, then use Big O analysis. If you're not, then you'll need some empirical tests.
In theory
Yes, in practical situations log(n) is bounded by a constant, we'll say 100. However, replacing log(n) by 100 in situations where it's correct is still throwing away information, making the upper bound on operations that you have calculated looser and less useful. Replacing an O(log(n)) by an O(1) in your analysis could result in your large n case performing 100 times worse than you expected based on your small n case. Your theoretical analysis could have been more accurate and could have predicted an issue before you'd built the system.
I would argue that the practical purpose of big-O analysis is to try and predict the execution time of your algorithm as early as possible. You can make your analysis easier by crossing out the log(n) terms, but then you've reduced the predictive power of the estimate.
In practice
If you read the original papers by Larry Page and Sergey Brin on the Google architecture, they talk about using hash tables for everything to ensure that e.g. the lookup of a cached web page only takes one hard-disk seek. If you used B-tree indices to lookup you might need four or five hard-disk seeks to do an uncached lookup [*]. Quadrupling your disk requirements on your cached web page storage is worth caring about from a business perspective, and predictable if you don't cast out all the O(log(n)) terms.
P.S. Sorry for using Google as an example, they're like Hitler in the computer science version of Godwin's law.
[*] Assuming 4KB reads from disk, 100bn web pages in the index, ~ 16 bytes per key in a B-tree node.
As others have pointed out, Big-O tells you about how the performance of your problem scales. Trust me - it matters. I have encountered several times algorithms that were just terrible and failed to meet the customers demands because they were too slow. Understanding the difference and finding an O(1) solution is a lot of times a huge improvement.
However, of course, that is not the whole story - for instance, you may notice that quicksort algorithms will always switch to insertion sort for small elements (Wikipedia says 8 - 20) because of the behaviour of both algorithms on small datasets.
So it's a matter of understanding what tradeoffs you will be doing which involves a thorough understanding of the problem, the architecture, & experience to understand which to use, and how to adjust the constants involved.
No one is saying that O(1) is always better than O(log N). However, I can guarantee you that an O(1) algorithm will also scale way better, so even if you make incorrect assumptions about how many users will be on the system, or the size of the data to process, it won't matter to the algorithm.
Yes, log(N) < 100 for most practical purposes, and No, you can not always replace it by constant.
For example, this may lead to serious errors in estimating performance of your program. If O(N) program processed array of 1000 elements in 1 ms, then you are sure it will process 106 elements in 1 second (or so). If, though, the program is O(N*logN), then it will take it ~2 secs to process 106 elements. This difference may be crucial - for example, you may think you've got enough server power because you get 3000 requests per hour and you think your server can handle up to 3600.
Another example. Imagine you have function f() working in O(logN), and on each iteration calling function g(), which works in O(logN) as well. Then, if you replace both logs by constants, you think that your program works in constant time. Reality will be cruel though - two logs may give you up to 100*100 multiplicator.
The rules of determining the Big-O notation are simpler when you don't decide that O(log n) = O(1).
As krzysio said, you may accumulate O(log n)s and then they would make a very noticeable difference. Imagine you do a binary search: O(log n) comparisons, and then imagine that each comparison's complexity O(log n). If you neglect both you get O(1) instead of O(log2n). Similarly you may somehow arrive at O(log10n) and then you'll notice a big difference for not too large "n"s.
Assume that in your entire application, one algorithm accounts for 90% of the time the user waits for the most common operation.
Suppose in real time the O(1) operation takes a second on your architecture, and the O(logN) operation is basically .5 seconds * log(N). Well, at this point I'd really like to draw you a graph with an arrow at the intersection of the curve and the line, saying, "It matters right here." You want to use the log(N) op for small datasets and the O(1) op for large datasets, in such a scenario.
Big-O notation and performance optimization is an academic exercise rather than delivering real value to the user for operations that are already cheap, but if it's an expensive operation on a critical path, then you bet it matters!
For any algorithm that can take inputs of different sizes N, the number of operations it takes is upper-bounded by some function f(N).
All big-O tells you is the shape of that function.
O(1) means there is some number A such that f(N) < A for large N.
O(N) means there is some A such that f(N) < AN for large N.
O(N^2) means there is some A such that f(N) < AN^2 for large N.
O(log(N)) means there is some A such that f(N) < AlogN for large N.
Big-O says nothing about how big A is (i.e. how fast the algorithm is), or where these functions cross each other. It only says that when you are comparing two algorithms, if their big-Os differ, then there is a value of N (which may be small or it may be very large) where one algorithm will start to outperform the other.
you are right, in many cases it does not matter for pracitcal purposes. but the key question is "how fast GROWS N". most algorithms we know of take the size of the input, so it grows linearily.
but some algorithms have the value of N derived in a complex way. if N is "the number of possible lottery combinations for a lottery with X distinct numbers" it suddenly matters if your algorithm is O(1) or O(logN)
Big-OH tells you that one algorithm is faster than another given some constant factor. If your input implies a sufficiently small constant factor, you could see great performance gains by going with a linear search rather than a log(n) search of some base.
O(log N) can be misleading. Take for example the operations on Red-Black trees.
The operations are O(logN) but rather complex, which means many low level operations.
Whenever N is the amount of objects that is stored in some kind of memory, you're correct. After all, a binary search through EVERY byte representable by a 64-bit pointer can be achieved in just 64 steps. Actually, it's possible to do a binary search of all Planck volumes in the observable universe in just 618 steps.
So in almost all cases, it's safe to approximate O(log N) with O(N) as long as N is (or could be) a physical quantity, and we know for certain that as long as N is (or could be) a physical quantity, then log N < 618
But that is assuming N is that. It may represent something else. Note that it's not always clear what it is. Just as an example, take matrix multiplication, and assume square matrices for simplicity. The time complexity for matrix multiplication is O(N^3) for a trivial algorithm. But what is N here? It is the side length. It is a reasonable way of measuring the input size, but it would also be quite reasonable to use the number of elements in the matrix, which is N^2. Let M=N^2, and now we can say that the time complexity for trivial matrix multiplication is O(M^(3/2)) where M is the number of elements in a matrix.
Unfortunately, I don't have any real world problem per se, which was what you asked. But at least I can make up something that makes some sort of sense:
Let f(S) be a function that returns the sum of the hashes of all the elements in the power set of S. Here is some pesudo:
f(S):
ret = 0
for s = powerset(S))
ret += hash(s)
Here, hash is simply the hash function, and powerset is a generator function. Each time it's called, it will generate the next (according to some order) subset of S. A generator is necessary, because we would not be able to store the lists for huge data otherwise. Btw, here is a python example of such a power set generator:
def powerset(seq):
"""
Returns all the subsets of this set. This is a generator.
"""
if len(seq) <= 1:
yield seq
yield []
else:
for item in powerset(seq[1:]):
yield [seq[0]]+item
yield item
https://www.technomancy.org/python/powerset-generator-python/
So what is the time complexity for f? As with the matrix multiplication, we can choose N to represent many things, but at least two makes a lot of sense. One is number of elements in S, in which case the time complexity is O(2^N), but another sensible way of measuring it is that N is the number of element in the power set of S. In this case the time complexity is O(N)
So what will log N be for sensible sizes of S? Well, list with a million elements are not unusual. If n is the size of S and N is the size of P(S), then N=2^n. So O(log N) = O(log 2^n) = O(n * log 2) = O(n)
In this case it would matter, because it's rare that O(n) == O(log n) in the real world.
I do not believe algorithms where you can freely choose between O(1) with a large constant and O(logN) really exists. If there is N elements to work with at the beginning, it is just plain impossible to make it O(1), the only thing that is possible is move your N to some other part of your code.
What I try to say is that in all real cases I know off you have some space/time tradeoff, or some pre-treatment such as compiling data to a more efficient form.
That is, you do not really go O(1), you just move the N part elsewhere. Either you exchange performance of some part of your code with some memory amount either you exchange performance of one part of your algorithm with another one. To stay sane you should always look at the larger picture.
My point is that if you have N items they can't disappear. In other words you can choose between inefficient O(n^2) algorithms or worse and O(n.logN) : it's a real choice. But you never really go O(1).
What I try to point out is that for every problem and initial data state there is a 'best' algorithm. You can do worse but never better. With some experience you can have a good guessing of what is this intrisic complexity. Then if your overall treatment match that complexity you know you have something. You won't be able to reduce that complexity, but only to move it around.
If problem is O(n) it won't become O(logN) or O(1), you'll merely add some pre-treatment such that the overall complexity is unchanged or worse, and potentially a later step will be improved. Say you want the smaller element of an array, you can search in O(N) or sort the array using any common O(NLogN) sort treatment then have the first using O(1).
Is it a good idea to do that casually ? Only if your problem asked also for second, third, etc. elements. Then your initial problem was truly O(NLogN), not O(N).
And it's not the same if you wait ten times or twenty times longer for your result because you simplified saying O(1) = O(LogN).
I'm waiting for a counter-example ;-) that is any real case where you have choice between O(1) and O(LogN) and where every O(LogN) step won't compare to the O(1). All you can do is take a worse algorithm instead of the natural one or move some heavy treatment to some other part of the larger pictures (pre-computing results, using storage space, etc.)
Let's say you use an image-processing algorithm that runs in O(log N), where N is the number of images. Now... stating that it runs in constant time would make one believe that no matter how many images there are, it would still complete its task it about the same amount of time. If running the algorithm on a single image would hypothetically take a whole day, and assuming that O(logN) will never be more than 100... imagine the surprise of that person that would try to run the algorithm on a very large image database - he would expect it to be done in a day or so... yet it'll take months for it to finish.
This question already has answers here:
What is a plain English explanation of "Big O" notation?
(43 answers)
Closed 5 years ago.
I'm asking more about what this means to my code. I understand the concepts mathematically, I just have a hard time wrapping my head around what they mean conceptually. For example, if one were to perform an O(1) operation on a data structure, I understand that the number of operations it has to perform won't grow because there are more items. And an O(n) operation would mean that you would perform a set of operations on each element. Could somebody fill in the blanks here?
Like what exactly would an O(n^2) operation do?
And what the heck does it mean if an operation is O(n log(n))?
And does somebody have to smoke crack to write an O(x!)?
One way of thinking about it is this:
O(N^2) means for every element, you're doing something with every other element, such as comparing them. Bubble sort is an example of this.
O(N log N) means for every element, you're doing something that only needs to look at log N of the elements. This is usually because you know something about the elements that let you make an efficient choice. Most efficient sorts are an example of this, such as merge sort.
O(N!) means to do something for all possible permutations of the N elements. Traveling salesman is an example of this, where there are N! ways to visit the nodes, and the brute force solution is to look at the total cost of every possible permutation to find the optimal one.
The big thing that Big-O notation means to your code is how it will scale when you double the amount of "things" it operates on. Here's a concrete example:
Big-O | computations for 10 things | computations for 100 things
----------------------------------------------------------------------
O(1) | 1 | 1
O(log(n)) | 3 | 7
O(n) | 10 | 100
O(n log(n)) | 30 | 700
O(n^2) | 100 | 10000
So take quicksort which is O(n log(n)) vs bubble sort which is O(n^2). When sorting 10 things, quicksort is 3 times faster than bubble sort. But when sorting 100 things, it's 14 times faster! Clearly picking the fastest algorithm is important then. When you get to databases with million rows, it can mean the difference between your query executing in 0.2 seconds, versus taking hours.
Another thing to consider is that a bad algorithm is one thing that Moore's law cannot help. For example, if you've got some scientific calculation that's O(n^3) and it can compute 100 things a day, doubling the processor speed only gets you 125 things in a day. However, knock that calculation to O(n^2) and you're doing 1000 things a day.
clarification:
Actually, Big-O says nothing about comparative performance of different algorithms at the same specific size point, but rather about comparative performance of the same algorithm at different size points:
computations computations computations
Big-O | for 10 things | for 100 things | for 1000 things
----------------------------------------------------------------------
O(1) | 1 | 1 | 1
O(log(n)) | 1 | 3 | 7
O(n) | 1 | 10 | 100
O(n log(n)) | 1 | 33 | 664
O(n^2) | 1 | 100 | 10000
You might find it useful to visualize it:
Also, on LogY/LogX scale the functions n1/2, n, n2 all look like straight lines, while on LogY/X scale 2n, en, 10n are straight lines and n! is linearithmic (looks like n log n).
This might be too mathematical, but here's my try. (I am a mathematician.)
If something is O(f(n)), then it's running time on n elements will be equal to A f(n) + B (measured in, say, clock cycles or CPU operations). It's key to understanding that you also have these constants A and B, which arise from the specific implementation. B represents essentially the "constant overhead" of your operation, for example some preprocessing that you do that doesn't depend on the size of the collection. A represents the speed of your actual item-processing algorithm.
The key, though, is that you use big O notation to figure out how well something will scale. So those constants won't really matter: if you're trying to figure out how to scale from 10 to 10000 items, who cares about the constant overhead B? Similarly, other concerns (see below) will certainly outweigh the weight of the multiplicative constant A.
So the real deal is f(n). If f grows not at all with n, e.g. f(n) = 1, then you'll scale fantastically---your running time will always just be A + B. If f grows linearly with n, i.e. f(n) = n, your running time will scale pretty much as best as can be expected---if your users are waiting 10 ns for 10 elements, they'll wait 10000 ns for 10000 elements (ignoring the additive constant). But if it grows faster, like n2, then you're in trouble; things will start slowing down way too much when you get larger collections. f(n) = n log(n) is a good compromise, usually: your operation can't be so simple as to give linear scaling, but you've managed to cut things down such that it'll scale much better than f(n) = n2.
Practically, here are some good examples:
O(1): retrieving an element from an array. We know exactly where it is in memory, so we just go get it. It doesn't matter if the collection has 10 items or 10000; it's still at index (say) 3, so we just jump to location 3 in memory.
O(n): retrieving an element from a linked list. Here, A = 0.5, because on average you''ll have to go through 1/2 of the linked list before you find the element you're looking for.
O(n2): various "dumb" sorting algorithms. Because generally their strategy involves, for each element (n), you look at all the other elements (so times another n, giving n2), then position yourself in the right place.
O(n log(n)): various "smart" sorting algorithms. It turns out that you only need to look at, say, 10 elements in a 1010-element collection to intelligently sort yourself relative to everyone else in the collection. Because everyone else is also going to look at 10 elements, and the emergent behavior is orchestrated just right so that this is enough to produce a sorted list.
O(n!): an algorithm that "tries everything," since there are (proportional to) n! possible combinations of n elements that might solve a given problem. So it just loops through all such combinations, tries them, then stops whenever it succeeds.
don.neufeld's answer is very good, but I'd probably explain it in two parts: first, there's a rough hierarchy of O()'s that most algorithms fall into. Then, you can look at each of those to come up with sketches of what typical algorithms of that time complexity do.
For practical purposes, the only O()'s that ever seem to matter are:
O(1) "constant time" - the time required is independent of the size of the input. As a rough category, I would include algorithms such as hash lookups and Union-Find here, even though neither of those are actually O(1).
O(log(n)) "logarithmic" - it gets slower as you get larger inputs, but once your input gets fairly large, it won't change enough to worry about. If your runtime is ok with reasonably-sized data, you can swamp it with as much additional data as you want and it'll still be ok.
O(n) "linear" - the more input, the longer it takes, in an even tradeoff. Three times the input size will take roughly three times as long.
O(n log(n)) "better than quadratic" - increasing the input size hurts, but it's still manageable. The algorithm is probably decent, it's just that the underlying problem is more difficult (decisions are less localized with respect to the input data) than those problems that can be solved in linear time. If your input sizes are getting up there, don't assume that you could necessarily handle twice the size without changing your architecture around (eg by moving things to overnight batch computations, or not doing things per-frame). It's ok if the input size increases a little bit, though; just watch out for multiples.
O(n^2) "quadratic" - it's really only going to work up to a certain size of your input, so pay attention to how big it could get. Also, your algorithm may suck -- think hard to see if there's an O(n log(n)) algorithm that would give you what you need. Once you're here, feel very grateful for the amazing hardware we've been gifted with. Not long ago, what you are trying to do would have been impossible for all practical purposes.
O(n^3) "cubic" - not qualitatively all that different from O(n^2). The same comments apply, only more so. There's a decent chance that a more clever algorithm could shave this time down to something smaller, eg O(n^2 log(n)) or O(n^2.8...), but then again, there's a good chance that it won't be worth the trouble. (You're already limited in your practical input size, so the constant factors that may be required for the more clever algorithms will probably swamp their advantages for practical cases. Also, thinking is slow; letting the computer chew on it may save you time overall.)
O(2^n) "exponential" - the problem is either fundamentally computationally hard or you're being an idiot. These problems have a recognizable flavor to them. Your input sizes are capped at a fairly specific hard limit. You'll know quickly whether you fit into that limit.
And that's it. There are many other possibilities that fit between these (or are greater than O(2^n)), but they don't often happen in practice and they're not qualitatively much different from one of these. Cubic algorithms are already a bit of a stretch; I only included them because I've run into them often enough to be worth mentioning (eg matrix multiplication).
What's actually happening for these classes of algorithms? Well, I think you had a good start, although there are many examples that wouldn't fit these characterizations. But for the above, I'd say it usually goes something like:
O(1) - you're only looking at most at a fixed-size chunk of your input data, and possibly none of it. Example: the maximum of a sorted list.
Or your input size is bounded. Example: addition of two numbers. (Note that addition of N numbers is linear time.)
O(log n) - each element of your input tells you enough to ignore a large fraction of the rest of the input. Example: when you look at an array element in binary search, its value tells you that you can ignore "half" of your array without looking at any of it. Or similarly, the element you look at gives you enough of a summary of a fraction of the remaining input that you won't need to look at it.
There's nothing special about halves, though -- if you can only ignore 10% of your input at each step, it's still logarithmic.
O(n) - you do some fixed amount of work per input element. (But see below.)
O(n log(n)) - there are a few variants.
You can divide the input into two piles (in no more than linear time), solve the problem independently on each pile, and then combine the two piles to form the final solution. The independence of the two piles is key. Example: classic recursive mergesort.
Each linear-time pass over the data gets you halfway to your solution. Example: quicksort if you think in terms of the maximum distance of each element to its final sorted position at each partitioning step (and yes, I know that it's actually O(n^2) because of degenerate pivot choices. But practically speaking, it falls into my O(n log(n)) category.)
O(n^2) - you have to look at every pair of input elements.
Or you don't, but you think you do, and you're using the wrong algorithm.
O(n^3) - um... I don't have a snappy characterization of these. It's probably one of:
You're multiplying matrices
You're looking at every pair of inputs but the operation you do requires looking at all of the inputs again
the entire graph structure of your input is relevant
O(2^n) - you need to consider every possible subset of your inputs.
None of these are rigorous. Especially not linear time algorithms (O(n)): I could come up with a number of examples where you have to look at all of the inputs, then half of them, then half of those, etc. Or the other way around -- you fold together pairs of inputs, then recurse on the output. These don't fit the description above, since you're not looking at each input once, but it still comes out in linear time. Still, 99.2% of the time, linear time means looking at each input once.
A lot of these are easy to demonstrate with something non-programming, like shuffling cards.
Sorting a deck of cards by going through the whole deck to find the ace of spades, then going through the whole deck to find the 2 of spades, and so on would be worst case n^2, if the deck was already sorted backwards. You looked at all 52 cards 52 times.
In general the really bad algorithms aren't necessarily intentional, they're commonly a misuse of something else, like calling a method that is linear inside some other method that repeats over the same set linearly.
I try to explain by giving simple code examples in C# and JavaScript.
C#
For List<int> numbers = new List<int> {1,2,3,4,5,6,7,12,543,7};
O(1) looks like
return numbers.First();
O(n) looks like
int result = 0;
foreach (int num in numbers)
{
result += num;
}
return result;
O(n log(n)) looks like
int result = 0;
foreach (int num in numbers)
{
int index = numbers.Count - 1;
while (index > 1)
{
// yeah, stupid, but couldn't come up with something more useful :-(
result += numbers[index];
index /= 2;
}
}
return result;
O(n2) looks like
int result = 0;
foreach (int outerNum in numbers)
{
foreach (int innerNum in numbers)
{
result += outerNum * innerNum;
}
}
return result;
O(n!) looks like, uhm, to tired to come up with anything simple.
But I hope you get the general point?
JavaScript
For const numbers = [ 1, 2, 3, 4, 5, 6, 7, 12, 543, 7 ];
O(1) looks like
numbers[0];
O(n) looks like
let result = 0;
for (num of numbers){
result += num;
}
O(n log(n)) looks like
let result = 0;
for (num of numbers){
let index = numbers.length - 1;
while (index > 1){
// yeah, stupid, but couldn't come up with something more useful :-(
result += numbers[index];
index = Math.floor(index/2)
}
}
O(n2) looks like
let result = 0;
for (outerNum of numbers){
for (innerNum of numbers){
result += outerNum * innerNum;
}
}
Ok - there are some very good answers here but almost all of them seem to make the same mistake and it's one that is pervading common usage.
Informally, we write that f(n) = O( g(n) ) if, up to a scaling factor and for all n larger than some n0, g(n) is larger than f(n). That is, f(n) grows no quicker than, or is bounded from above by, g(n). This tells us nothing about how fast f(n) grows, save for the fact that it is guaranteed not to be any worse than g(n).
A concrete example: n = O( 2^n ). We all know that n grows much less quickly than 2^n, so that entitles us to say that it is bounded by above by the exponential function. There is a lot of room between n and 2^n, so it's not a very tight bound, but it's still a legitimate bound.
Why do we (computer scientists) use bounds rather than being exact? Because a) bounds are often easier to prove and b) it gives us a short-hand to express properties of algorithms. If I say that my new algorithm is O(n.log n) that means that in the worst case its run-time will be bounded from above by n.log n on n inputs, for large enough n (although see my comments below on when I might not mean worst-case).
If instead, we want to say that a function grows exactly as quickly as some other function, we use theta to make that point (I'll write T( f(n) ) to mean \Theta of f(n) in markdown). T( g(n) ) is short hand for being bounded from above and below by g(n), again, up to a scaling factor and asymptotically.
That is f(n) = T( g(n) ) <=> f(n) = O(g(n)) and g(n) = O(f(n)). In our example, we can see that n != T( 2^n ) because 2^n != O(n).
Why get concerned about this? Because in your question you write 'would someone have to smoke crack to write an O(x!)?' The answer is no - because basically everything you write will be bounded from above by the factorial function. The run time of quicksort is O(n!) - it's just not a tight bound.
There's also another dimension of subtlety here. Typically we are talking about the worst case input when we use O( g(n) ) notation, so that we are making a compound statement: in the worst case running time it will not be any worse than an algorithm that takes g(n) steps, again modulo scaling and for large enough n. But sometimes we want to talk about the running time of the average and even best cases.
Vanilla quicksort is, as ever, a good example. It's T( n^2 ) in the worst case (it will actually take at least n^2 steps, but not significantly more), but T(n.log n) in the average case, which is to say the expected number of steps is proportional to n.log n. In the best case it is also T(n.log n) - but you could improve that for, by example, checking if the array was already sorted in which case the best case running time would be T( n ).
How does this relate to your question about the practical realisations of these bounds? Well, unfortunately, O( ) notation hides constants which real-world implementations have to deal with. So although we can say that, for example, for a T(n^2) operation we have to visit every possible pair of elements, we don't know how many times we have to visit them (except that it's not a function of n). So we could have to visit every pair 10 times, or 10^10 times, and the T(n^2) statement makes no distinction. Lower order functions are also hidden - we could have to visit every pair of elements once, and every individual element 100 times, because n^2 + 100n = T(n^2). The idea behind O( ) notation is that for large enough n, this doesn't matter at all because n^2 gets so much larger than 100n that we don't even notice the impact of 100n on the running time. However, we often deal with 'sufficiently small' n such that constant factors and so on make a real, significant difference.
For example, quicksort (average cost T(n.log n)) and heapsort (average cost T(n.log n)) are both sorting algorithms with the same average cost - yet quicksort is typically much faster than heapsort. This is because heapsort does a few more comparisons per element than quicksort.
This is not to say that O( ) notation is useless, just imprecise. It's quite a blunt tool to wield for small n.
(As a final note to this treatise, remember that O( ) notation just describes the growth of any function - it doesn't necessarily have to be time, it could be memory, messages exchanged in a distributed system or number of CPUs required for a parallel algorithm.)
The way I describe it to my nontechnical friends is like this:
Consider multi-digit addition. Good old-fashioned, pencil-and-paper addition. The kind you learned when you were 7-8 years old. Given two three-or-four-digit numbers, you can find out what they add up to fairly easily.
If I gave you two 100-digit numbers, and asked you what they add up to, figuring it out would be pretty straightforward, even if you had to use pencil-and-paper. A bright kid could do such an addition in just a few minutes. This would only require about 100 operations.
Now, consider multi-digit multiplication. You probably learned that at around 8 or 9 years old. You (hopefully) did lots of repetitive drills to learn the mechanics behind it.
Now, imagine I gave you those same two 100-digit numbers and told you to multiply them together. This would be a much, much harder task, something that would take you hours to do - and that you'd be unlikely to do without mistakes. The reason for this is that (this version of) multiplication is O(n^2); each digit in the bottom number has to be multiplied by each digit in the top number, leaving a total of about n^2 operations. In the case of the 100-digit numbers, that's 10,000 multiplications.
No, an O(n) algorithm does not mean it will perform an operation on each element. Big-O notation gives you a way to talk about the "speed" of you algorithm independent of your actual machine.
O(n) means that the time your algorithm will take grows linearly as your input increase. O(n^2) means that the time your algorithm takes grows as the square of your input. And so forth.
The way I think about it, is you have the task of cleaning up a problem caused by some evil villain V who picks N, and you have to estimate out how much longer it's going to take to finish your problem when he increases N.
O(1) -> increasing N really doesn't make any difference at all
O(log(N)) -> every time V doubles N, you have to spend an extra amount of time T to complete the task. V doubles N again, and you spend the same amount.
O(N) -> every time V doubles N, you spend twice as much time.
O(N^2) -> every time V doubles N, you spend 4x as much time. (it's not fair!!!)
O(N log(N)) -> every time V doubles N, you spend twice as much time plus a little more.
These are bounds of an algorithm; computer scientists want to describe how long it is going to take for large values of N. (which gets important when you are factoring numbers that are used in cryptography -- if the computers speed up by a factor of 10, how many more bits do you have to use to ensure it will still take them 100 years to break your encryption and not just 1 year?)
Some of the bounds can have weird expressions if it makes a difference to the people involved. I've seen stuff like O(N log(N) log(log(N))) somewhere in Knuth's Art of Computer Programming for some algorithms. (can't remember which one off the top of my head)
One thing that hasn't been touched on yet for some reason:
When you see algorithms with things like O(2^n) or O(n^3) or other nasty values it often means you're going to have to accept an imperfect answer to your problem in order to get acceptable performance.
Correct solutions that blow up like this are common when dealing with optimization problems. A nearly-correct answer delivered in a reasonable timeframe is better than a correct answer delivered long after the machine has decayed to dust.
Consider chess: I don't know exactly what the correct solution is considered to be but it's probably something like O(n^50) or even worse. It is theoretically impossible for any computer to actually calculate the correct answer--even if you use every particle in the universe as a computing element performing an operation in the minimum possible time for the life of the universe you still have a lot of zeros left. (Whether a quantum computer can solve it is another matter.)
The "Intuitition" behind Big-O
Imagine a "competition" between two functions over x, as x approaches infinity: f(x) and g(x).
Now, if from some point on (some x) one function always has a higher value then the other, then let's call this function "faster" than the other.
So, for example, if for every x > 100 you see that f(x) > g(x), then f(x) is "faster" than g(x).
In this case we would say g(x) = O(f(x)). f(x) poses a sort of "speed limit" of sorts for g(x), since eventually it passes it and leaves it behind for good.
This isn't exactly the definition of big-O notation, which also states that f(x) only has to be larger than C*g(x) for some constant C (which is just another way of saying that you can't help g(x) win the competition by multiplying it by a constant factor - f(x) will always win in the end). The formal definition also uses absolute values. But I hope I managed to make it intuitive.
And does somebody have to smoke crack to write an O(x!)?
No, just use Prolog. If you write a sorting algorithm in Prolog by just describing that each element should be bigger than the previous, and let backtracking do the sorting for you, that will be O(x!). Also known as "permutation sort".
I like don neufeld's answer, but I think I can add something about O(n log n).
An algorithm which uses a simple divide and conquer strategy is probably going to be O(log n). The simplest example of this is finding a something in an sorted list. You don't start at the beginning and scan for it. You go to the middle, you decide if you should then go backwards or forwards, jump halfway to the last place you looked, and repeat this until you find the item you're looking for.
If you look at the quicksort or mergesort algorithms, you will see that they both take the approach of dividing the list to be sorted in half, sorting each half (using the same algorithm, recursively), and then recombining the two halves. This sort of recursive divide and conquer strategy will be O(n log n).
If you think about it carefully, you'll see that quicksort does an O(n) partitioning algorithm on the whole n items, then an O(n) partitioning twice on n/2 items, then 4 times on n/4 items, etc... until you get to an n partitions on 1 item (which is degenerate). The number of times you divide n in half to get to 1 is approximately log n, and each step is O(n), so recursive divide and conquer is O(n log n). Mergesort builds the other way, starting with n recombinations of 1 item, and finishing with 1 recombination of n items, where the recombination of two sorted lists is O(n).
As for smoking crack to write an O(n!) algorithm, you are unless you have no choice. The traveling salesman problem given above is believed to be one such problem.
Think of it as stacking lego blocks (n) vertically and jumping over them.
O(1) means at each step, you do nothing. The height stays the same.
O(n) means at each step, you stack c blocks, where c1 is a constant.
O(n^2) means at each step, you stack c2 x n blocks, where c2 is a constant, and n is the number of stacked blocks.
O(nlogn) means at each step, you stack c3 x n x log n blocks, where c3 is a constant, and n is the number of stacked blocks.
Most Jon Bentley books (e.g. Programming Pearls) cover such stuff in a really pragmatic manner. This talk given by him includes one such analysis of a quicksort.
While not entirely relevant to the question, Knuth came up with an interesting idea: teaching Big-O notation in high school calculus classes, though I find this idea quite eccentric.
To understand O(n log n), remember that log n means log-base-2 of n. Then look at each part:
O(n) is, more or less, when you operate on each item in the set.
O(log n) is when the number of operations is the same as the exponent to which you raise 2, to get the number of items. A binary search, for instance, has to cut the set in half log n times.
O(n log n) is a combination – you're doing something along the lines of a binary search for each item in the set. Efficient sorts often operate by doing one loop per item, and in each loop doing a good search to find the right place to put the item or group in question. Hence n * log n.
Just to respond to the couple of comments on my above post:
Domenic - I'm on this site, and I care. Not for pedantry's sake, but because we - as programmers - typically care about precision. Using O( ) notation incorrectly in the style that some have done here renders it kind of meaningless; we may just as well say something takes n^2 units of time as O( n^2 ) under the conventions used here. Using the O( ) adds nothing. It's not just a small discrepancy between common usage and mathematical precision that I'm talking about, it's the difference between it being meaningful and it not.
I know many, many excellent programmers who use these terms precisely. Saying 'oh, we're programmers therefore we don't care' cheapens the whole enterprise.
onebyone - Well, not really although I take your point. It's not O(1) for arbitrarily large n, which is kind of the definition of O( ). It just goes to show that O( ) has limited applicability for bounded n, where we would rather actually talk about the number of steps taken rather than a bound on that number.
Tell your eight year old log(n) means the number of times you have to chop a length n log in two for it to get down to size n=1 :p
O(n log n) is usually sorting
O(n^2) is usually comparing all pairs of elements
Suppose you had a computer that could solve a problem of a certain size. Now imagine that we can double the performance a few times. How much bigger a problem can we solve with each doubling?
If we can solve a problem of double the size, that's O(n).
If we have some multiplier that isn't one, that's some sort of polynomial complexity. For example, if each doubling allows us to increase the problem size by about 40%, it's O(n^2), and about 30% would be O(n^3).
If we just add to the problem size, it's exponential or worse. For example, if each doubling means we can solve a problem 1 bigger, it's O(2^n). (This is why brute-forcing a cipher key becomes effectively impossible with reasonably sized keys: a 128-bit key requires about 16 quintillion times as much processing as a 64-bit.)
Remember the fable of the tortoise and the hare (turtle and rabbit)?
Over the long run, the tortoise wins, but over the short run the hare wins.
That's like O(logN) (tortoise) vs. O(N) (hare).
If two methods differ in their big-O, then there is a level of N at which one of them will win, but big-O says nothing about how big that N is.
To remain sincere to the question asked I would answer the question in the manner I would answer an 8 year old kid
Suppose an ice-cream seller prepares a number of ice creams ( say N ) of different shapes arranged in an orderly fashion.
You want to eat the ice cream lying in the middle
Case 1 : - You can eat an ice cream only if you have eaten all the ice creams smaller than it
You will have to eat half of all the ice creams prepared (input).Answer directly depends on the size of the input
Solution will be of order o(N)
Case 2 :- You can directly eat the ice cream in the middle
Solution will be O(1)
Case 3 : You can eat an ice cream only if you have eaten all the ice creams smaller than it and each time you eat an ice cream you allow another kid (new kid everytime ) to eat all his ice creams
Total time taken would be N + N + N.......(N/2) times
Solution will be O(N2)
log(n) means logarithmic growth. An example would be divide and conquer algorithms. If you have 1000 sorted numbers in an array ( ex. 3, 10, 34, 244, 1203 ... ) and want to search for a number in the list (find its position), you could start with checking the value of the number at index 500. If it is lower than what you seek, jump to 750. If it is higher than what you seek, jump to 250. Then you repeat the process until you find your value (and key). Every time we jump half the search space, we can cull away testing many other values since we know the number 3004 can't be above number 5000 (remember, it is a sorted list).
n log(n) then means n * log(n).
I'll try to actually write an explanation for a real eight year old boy, aside from technical terms and mathematical notions.
Like what exactly would an O(n^2) operation do?
If you are in a party, and there are n people in the party including you. How many handshakes it take so that everyone has handshaked everyone else, given that people would probably forget who they handshaked at some point.
Note: this approximate to a simplex yielding n(n-1) which is close enough to n^2.
And what the heck does it mean if an operation is O(n log(n))?
Your favorite team has won, they are standing in line, and there are n players in the team. How many hanshakes it would take you to handshake every player, given that you will hanshake each one multiple times, how many times, how many digits are in the number of the players n.
Note: this will yield n * log n to the base 10.
And does somebody have to smoke crack to write an O(x!)?
You are a rich kid and in your wardrobe there are alot of cloths, there are x drawers for each type of clothing, the drawers are next to each others, the first drawer has 1 item, each drawer has as many cloths as in the drawer to its left and one more, so you have something like 1 hat, 2 wigs, .. (x-1) pants, then x shirts. Now in how many ways can you dress up using a single item from each drawer.
Note: this example represent how many leaves in a decision-tree where number of children = depth, which is done through 1 * 2 * 3 * .. * x