Is Big(O) machine dependent? - algorithm

I am really confused by Big(O) notation. Is Big(O) machine dependent or machine independent? (By machine I mean the computer on which we run the algorithm.)
Will sorting 1000 numbers using quicksort take the same time on an i3 processor as on an i7 processor? Why don't we consider the machine and its processor speed when calculating the time complexity? I am a neophyte in this stuff.

Big-O is a measure of scalability, not of speed. It shows you what effect on time and memory it has when you e.g. double the amount of data - does it double the execution, or quadruple it?
Whether you use i7 or i3, double is double. Whether a linear algorithm is fast or slow, double is double.
This also has another implication many people ignore. A complex algorithm such as O(n^3) can be faster than a simple algorithm such as O(n) for a given n that is below a certain limit. Example:
loop n times:
    loop n times:
        loop n times:
            sleep 1 second
is O(n^3), as it has 3 nested loops.
loop n times:
    sleep 10 seconds
is O(n), as it only has one loop. For n = 10 the first program executes for 1000 seconds, and the second one executes for only 100. So, O(n) is good! one would be tempted to say. But if you have n = 2, the first, complex program executes in only 8 seconds, while the second, simpler one executes for 20! Even for n = 3, the first executes in 27 seconds, the second one in 30. So while the n is low, a complex program might be able to outperform the simpler one. It's just that as n rises, the complex program gets slower much faster (if that makes sense) than a simple one. For n = 1000, the simple code has risen to only 10000 seconds, but the complex one is now 1000000000 seconds!
Also, this clearly shows you that complexity is not processor-dependent. A second is a second.
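The arithmetic in this example can be reproduced without actually sleeping. The following is a minimal sketch that models the two programs by their total sleep time; the function names are made up for illustration.

```python
def nested_loops_seconds(n: int) -> int:
    """Three nested loops, each innermost iteration sleeping 1 second."""
    return n ** 3


def single_loop_seconds(n: int) -> int:
    """One loop, each iteration sleeping 10 seconds."""
    return 10 * n


def first_n_where_cubic_loses(limit: int = 1000) -> int:
    """Smallest n at which the O(n^3) program takes longer than the O(n) one."""
    for n in range(1, limit + 1):
        if nested_loops_seconds(n) > single_loop_seconds(n):
            return n
    raise ValueError("no crossover found below limit")


if __name__ == "__main__":
    for n in (2, 3, 10, 1000):
        print(n, nested_loops_seconds(n), single_loop_seconds(n))
    print("cubic program becomes slower at n =", first_n_where_cubic_loses())
```

The crossover depends only on the two constants (1 second and 10 seconds), not on which processor runs the loops.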
EDIT: Also, you might want to read this question, where Big-O is explained in a number of very high-quality answers.

Big(O) Notation is the method of calculating the complexity of an algorithm, and hence the relative time it will take to run. The same algorithm, for the same data, will run faster on a faster processor, but will still take the same number of operations. It's used as a way of evaluating the relative efficiency of different algorithms to achieve the same result.

Big O notation is not architecture-dependent in any way; it is a mathematical construct. It is a very limited measure of algorithmic complexity: it only gives you a rough upper bound for how performance changes with data size.

Big(O) is algorithm dependent. Its job is to help compare the relative costs of various algorithms without the need to consider machine dependencies.
A linear search through an array will, on average, look at about half of the elements when the item is found. For all practical purposes that is O(N/2), which is the same as O(1/2 * N). For comparison you toss away the coefficient, hence it is O(N).
A balanced binary tree can hold N elements for searching as well. On average it will look through about log base 2 of N elements to find something, hence you will see its cost described as O(log2(N)).
Pop in small values for N, and there isn't a whole lot of difference between the algorithms. Pop in a large value of N, and it will be clear that the binary tree lookup is much faster.
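To see the two growth rates side by side, here is a small sketch that counts inspected elements instead of measuring time. It uses binary search over a sorted list as a stand-in for the tree lookup (an assumption made to keep the example short); all names are illustrative.

```python
def linear_search_steps(items, target):
    """Return how many elements are inspected before target is found."""
    for steps, value in enumerate(items, start=1):
        if value == target:
            return steps
    return len(items)


def binary_search_steps(sorted_items, target):
    """Return how many probes a binary search makes on a sorted list."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


def average_steps(search, n):
    """Average step count over every successful lookup in range(n)."""
    data = list(range(n))
    return sum(search(data, t) for t in data) / n


if __name__ == "__main__":
    for n in (8, 100, 2000):
        print(n, average_steps(linear_search_steps, n),
              average_steps(binary_search_steps, n))
```

For small n the two averages are close; as n grows the linear count tracks n/2 while the binary count barely moves.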

Big(O) is not machine dependent. It is a mathematical notation that denotes the complexity of an algorithm. Usually we use these notations in theory to compare algorithms' performance.

Related

Analysis with Parallel Algorithms

Everyone knows that bubblesort is O(n^2), but this is based on the number of comparisons needed to sort this. I have a question in which, if I didn't care about the number of comparisons, but the output time, then how do you do analysis of this? Is there a way to do analysis on output time instead of comparisons?
For example, if you had a bubble sort in which the comparisons for all pairs happen in parallel (even pairs, then odd pairs), the throughput time would be something like 2n-1. The number of comparisons would be high, but I don't care, as the final throughput time is quick.
So in essence, is there a common analysis for overall parallel performance time? If so, just give me some key terms and I'll learn the rest from google.
Parallel programming is a bit of red herring here. Making assumptions about run time only on big O notation can be misleading. To compare run times of algorithms you need the full equation not just the big O notation.
The problem is that big O notation tells you the dominating term as n goes to infinity. But the run time is on finite ranges of n. This is easy to understand from continuous mathematics (my background).
Consider y=Ax and y=Bx^2 Big O notation would tell you that y=Bx^2 is slower. However, between (0,A/B) it's less than y=Ax. In this case it could be faster to use the O(x^2) algorithm than the O(x) algorithm for x<A/B.
In fact I have heard of sorting algorithms which start off with an O(nlogn) algorithm and then switch to an O(n^2) algorithm when n is sufficiently small.
The best example is matrix multiplication. The naïve algorithm is O(n^3) but there are algorithms that get that down to O(n^2.3727). However, every algorithm I have seen has such a large constant that the naïve O(n^3) is still the fastest algorithm for all practical values of n and that does not look likely to change any time soon.
So really what you need is the full equation to compare. Something like An^3 (let's ignore lower order terms) and Bn^2.3727. In this case B is so incredibly large that the O(n^3) method always wins.
Parallel programming usually just lowers the constant. For example when I do matrix multiplication using four cores my time goes from An^3 to A/4 n^3. The same thing will happen with your parallel bubble sort. You will decrease the constant. So it's possible that for some range of values of n that your parallel bubble sort will beat a non-parallel (or possibly even parallel) merge sort. Though, unlike matrix multiplication, I think the range will be pretty small.
Algorithm analysis is not meant to give actual run times. That's what benchmarks are for. Analysis tells you how much relative complexity is in a program, but the actual run time for that program depends on so many other factors that strict analysis can't guarantee real-world times. For example, what happens if your OS decides to suspend your program to install updates? Your run time will be shot. Even running the same program over and over yields different results due to the complexity of computer systems (memory page faults, virtual machine garbage collection, IO interrupts, etc). Analysis can't take these into account.
This is why parallel processing doesn't usually come under consideration during analysis. The mechanism for "parallelizing" a program's components is usually external to your code, and usually based on a probabilistic algorithm for scheduling. I don't know of a good way to do static analysis on that. Once again, you can run a bunch of benchmarks and that will give you an average run time.
The time efficiency we get by parallel steps can be measured by round complexity. Where each round consists of parallel steps occurring at the same time. By doing so, we can see how effective the throughput time is, in similar analysis that we are used to.
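As a rough illustration of counting rounds rather than comparisons, the sketch below simulates odd-even transposition sort (the parallel variant of bubble sort described in the question) on a single thread. It assumes an idealised machine where every comparison inside one round could run simultaneously; the function name is made up for this example.

```python
def odd_even_transposition_sort(values):
    """Sort a copy of values; return (sorted_list, rounds, comparisons).

    Each round compares disjoint neighbouring pairs, so with enough
    processors all comparisons of one round could run at the same time.
    """
    data = list(values)
    n = len(data)
    rounds = comparisons = 0
    for r in range(n):  # n rounds are enough to sort any input of length n
        start = r % 2  # alternate between even-indexed and odd-indexed pairs
        for i in range(start, n - 1, 2):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        rounds += 1
    return data, rounds, comparisons


if __name__ == "__main__":
    result, rounds, comparisons = odd_even_transposition_sort(range(9, -1, -1))
    print(result, "rounds:", rounds, "comparisons:", comparisons)
```

The total number of comparisons still grows quadratically, but the number of rounds, which is what bounds the elapsed time on the idealised machine, grows linearly.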

Estimation of program execution time from complexity

I want to know how I can estimate the time that my program will take to execute on my machine (for example a 2.5 GHz machine), if I have an estimate of its worst-case time complexity.
For example: if I have a program which is O(n^2) in the worst case, and n<100000, how can I estimate, before writing the actual program/procedure, the time that it will take to execute in seconds?
Wouldn't it be good to know how a program actually performs, and it will also save writing code which eventually turns out to be inefficient!
Help greatly appreciated.
Since big O complexity ignores linear coefficients and smaller terms, it is impossible to estimate the performance of an algorithm given only its big O complexity.
In fact, for any specific N, you cannot predict which of two given algorithms will execute faster.
For example, O(N) is not always faster than O(N*N): an algorithm that takes 100000000*N steps is O(N), yet it is slower than an algorithm that takes N*N steps for every N below 100000000.
These linear coefficients and asymptotically smaller terms vary from platform to platform and even amongst algorithms of the same equivalence class (in terms of big O measure).
The problem you are trying to use big O notation for is not the one it is designed to solve.
Instead of dealing with complexity, you might want to have a look at Worst Case Execution Time (WCET). This area of research most likely corresponds to what you are looking for.
http://en.wikipedia.org/wiki/Worst-case_execution_time
Multiply N^2 by the time you spend in one iteration of the innermost loop, and you have a ballpark estimate.
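A minimal sketch of that ballpark calculation follows. It assumes the inner-loop body can be timed in isolation; the lambda used as the body is a hypothetical placeholder, and the result is only as good as that measurement.

```python
import time


def seconds_per_iteration(body, repeats=1_000_000):
    """Time `body` in a tight loop and return the average cost of one call."""
    start = time.perf_counter()
    for _ in range(repeats):
        body()
    return (time.perf_counter() - start) / repeats


def estimate_quadratic_seconds(n, per_iteration_seconds):
    """Ballpark run time of an O(n^2) loop nest: n * n * cost of the inner body."""
    return n * n * per_iteration_seconds


if __name__ == "__main__":
    # Hypothetical inner-loop body; replace with the real work of your loop.
    cost = seconds_per_iteration(lambda: 3 * 7 + 1, repeats=200_000)
    print(f"measured cost per iteration: {cost:.2e} s")
    print(f"estimate for n = 100000: {estimate_quadratic_seconds(100_000, cost):.0f} s")
```

For instance, at one nanosecond per inner iteration, n = 100000 gives 10^10 iterations, or roughly ten seconds. Lower-order terms, caching, and interpreter overhead are all ignored here.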

How can one test time complexity "experimentally"?

Could it be done by keeping a counter to see how many iterations an algorithm goes through, or does the time duration need to be recorded?
The currently accepted answer won't give you any theoretical estimation, unless you are somehow able to fit the experimentally measured times with a function that approximates them. This answer gives you a manual technique to do that and fills that gap.
You start by guessing the theoretical complexity function of the algorithm. You also experimentally measure the actual complexity (number of operations, time, or whatever you find practical), for increasingly larger problems.
For example, say you guess an algorithm is quadratic. Measure (say) the time, and compute the ratio of time to your guessed function (n^2):
for (int n = 5; n <= 10000; n++) {   // n: problem size
    long start = System.nanoTime();
    executeAlgorithm(n);             // the algorithm under test
    long end = System.nanoTime();
    long totalTime = end - start;
    double ratio = (double) totalTime / ((double) n * n);
    System.out.println(n + " " + ratio);
}
As n moves towards infinity, this ratio...
Converges to zero? Then your guess is too high. Repeat with something smaller (e.g. nlogn)
Diverges to infinity? Then your guess is too low. Repeat with something bigger (e.g. n^3)
Converges to a positive constant? Bingo! Your guess is on the money (at least approximates the theoretical complexity for as large n values as you tried)
Basically that uses the definition of big O notation, that f(x) = O(g(x)) <=> f(x) <= c * g(x) for all sufficiently large x, where f(x) is the actual cost of your algorithm, g(x) is the guess you put, and c is a constant. So basically you try to experimentally find the limit of f(x)/g(x); if your guess hits the real complexity, this ratio will estimate the constant c.
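The ratio technique can be tried on a deterministic example by counting operations instead of timing. In this sketch the algorithm under test is a made-up stand-in that compares every pair of elements, so its true cost is n(n-1)/2; the three guesses are chosen only for illustration.

```python
import math


def count_pair_comparisons(n):
    """Stand-in algorithm: visit every pair (i, j) with i < j and count the work."""
    operations = 0
    for i in range(n):
        for j in range(i + 1, n):
            operations += 1
    return operations


GUESSES = {
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
    "n^3": lambda n: n ** 3,
}


def ratios(guess, sizes):
    """Measured cost divided by the guessed growth function, per problem size."""
    return [count_pair_comparisons(n) / guess(n) for n in sizes]


if __name__ == "__main__":
    sizes = [10, 100, 1000]
    for name, guess in GUESSES.items():
        print(name, [round(r, 4) for r in ratios(guess, sizes)])
```

With the n^3 guess the ratio shrinks toward zero, with n log n it keeps growing, and with n^2 it settles near 0.5, which is the constant c for this particular stand-in.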
Algorithm complexity is defined as (something like:)
the number of operations the algorithm does as a function
of its input size.
So you need to try your algorithm with various input sizes (i.e. for sort - try sorting 10 elements, 100 elements etc.), and count each operation (e.g. assignment, increment, mathematical operation etc.) the algorithm does.
This will give you a good "theoretical" estimation.
If you want real-life numbers on the other hand - use profiling.
As others have mentioned, the theoretical time complexity is a function of number of cpu operations done by your algorithm. In general processor time should be a good approximation for that modulo a constant. But the real run time may vary because of a number of reasons such as:
processor pipeline flushes
Cache misses
Garbage collection
Other processes on the machine
Unless your code is systematically causing some of these things to happen, with enough number of statistical samples, you should have a fairly good idea of the time complexity of your algorithm, based on observed runtime.
The best way would be to actually count the number of "operations" performed by your algorithm. The definition of "operation" can vary: for an algorithm such as quicksort, it could be the number of comparisons of two numbers.
You could measure the time taken by your program to get a rough estimate, but various factors could cause this value to differ from the actual mathematical complexity.
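Counting comparisons can be done by instrumenting the algorithm directly. The sketch below is one possible way to do it for a simple, non-in-place quicksort; it is an illustrative variant written for this purpose, not a tuned implementation.

```python
import random


def quicksort_comparisons(values):
    """Sort a copy of values with a simple quicksort; return (sorted, comparisons)."""
    comparisons = 0

    def sort(items):
        nonlocal comparisons
        if len(items) <= 1:
            return items
        pivot = items[len(items) // 2]
        smaller, equal, larger = [], [], []
        for x in items:
            comparisons += 1  # one three-way comparison against the pivot
            if x < pivot:
                smaller.append(x)
            elif x > pivot:
                larger.append(x)
            else:
                equal.append(x)
        return sort(smaller) + equal + sort(larger)

    return sort(list(values)), comparisons


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (1_000, 10_000, 100_000):
        data = [rng.random() for _ in range(n)]
        _, count = quicksort_comparisons(data)
        print(n, count)
```

Unlike wall-clock time, the counts are unaffected by cache misses, garbage collection, or other processes, so they can be compared across machines.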
Yes.
You can track both: actual performance and number of iterations.
Might I suggest using ANTS profiler. It will provide you this kind of detail while you run your app with "experimental" data.

What's my Big O?

My program of sorting values clocks at:
100000 8s
1000000 82s
10000000 811s
Is that O(n)?
Looks like it, but in reality, you really need to analyze the algorithm, because there may be different cases based on the data. Some algorithms do better or worse with pre-sorted data, for instance. What's your algorithm?
Yes, that looks like O(n) to me - going from the 1st to 2nd case and the 2nd to 3rd case, you've made the input 10 times bigger, and it's taken 10 times longer.
In particular, it looks like you can predict the rough time by using:
f(n) = n / 12500
or
f(n) = n * 0.00008
which gives a simplest explanation of O(n) for the data provided.
EDIT: However... As has been pointed out, there are various ways in which the data could be misleading you - I rather like Dennis Palmer's idea that the IO cost is dwarfing everything else. For example, suppose you have an algorithm whose absolute number of operations is:
f(n) = 1000000000000n + (n^2)
In that case the complexity is still O(n^2), but that won't become apparent from observations until n is very large.
I think it would be accurate to say that these observations are suggestive of an O(n) algorithm but that doesn't mean it definitely is.
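One way to quantify "suggestive of O(n)" is to estimate the growth exponent between consecutive measurements. This is a sketch under the assumption that time behaves roughly like c * n^k over the measured range; it also evaluates the hypothetical cost function 10^12 * n + n^2 from above to show how a quadratic term can hide.

```python
import math


def growth_exponent(n1, t1, n2, t2):
    """Exponent k such that t grows like n**k between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Timings quoted in the question: (input size, seconds).
measurements = [(100_000, 8), (1_000_000, 82), (10_000_000, 811)]


def exponents(points):
    """Growth exponent for each pair of consecutive measurements."""
    return [growth_exponent(n1, t1, n2, t2)
            for (n1, t1), (n2, t2) in zip(points, points[1:])]


def hidden_quadratic(n):
    """Operation count 10^12 * n + n^2: O(n^2), but linear-looking for small n."""
    return 1_000_000_000_000 * n + n * n


if __name__ == "__main__":
    print("measured exponents:", exponents(measurements))
    for lo, hi in ((10**5, 10**6), (10**14, 10**15)):
        k = growth_exponent(lo, hidden_quadratic(lo), hi, hidden_quadratic(hi))
        print(f"hidden quadratic between {lo} and {hi}: exponent {k:.4f}")
```

Both measured exponents come out close to 1, and so does the exponent of the hidden quadratic at these sizes, which is exactly why three data points cannot settle the question.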
Time behavior doesn't work that way. All you can really say there is that those three datasets are roughly O(n) from each other. That doesn't mean the algorithm is O(n).
The first problem is that I could easily draw a curve that goes exponential ( O(e**n) ) that nonetheless includes those three points.
The big problem though is that you didn't say anything about the data. There are many sorting algorithms that approach O(n) for sorted or nearly sorted inputs (e.g. insertion sort or a natural merge sort). However, their average case (typically randomly-ordered data) and worst case (often reverse-sorted data) is invariably O(nlogn) or worse.
You cannot tell.
First, the time depends on the data, environment and algorithm. If you have an array of zeroes and try to sort it, the program running time shouldn't be much different for 1000 or 1000000 elements.
Second, the O notation tells about function behavior for big value of n, starting at n0. It could be that your algorithm scales well up to certain input size, and then its behavior changes - like the g(n) function.
it looks like O(n) to me.
Yes, that is O(n) because it scales with the number of items.
1000000 = 10 * 100000
and
82s = 10 * 8s (roughly)
You cannot depend on that to say it is O(n). Bubblesort, for instance, can complete in n steps, however it is an O(n^2) algorithm.
Yes, it appears to be O(n), but I don't think you can conclusively analyze the algorithm based on its timed performance. You might be using the most inefficient sorting algorithm, but have O(n) timed results because the data read/write time is the majority of the total execution time.
Edit: Big-O is determined by how efficient an algorithm is and how it scales. It should predict the growth of execution time as the number of input items grows. The inverse is not necessarily true. Observing a given growth in execution time could mean a few different things. If it stays true to the example numbers you've given, then you can conclude that your program runs at O(n), but as others have said, that doesn't mean your sorting algorithm is O(n).

O(log N) == O(1) - Why not?

Whenever I consider algorithms/data structures I tend to replace the log(N) parts by constants. Oh, I know log(N) diverges - but does it matter in real world applications?
log(infinity) < 100 for all practical purposes.
I am really curious for real world examples where this doesn't hold.
To clarify:
I understand O(f(N))
I am curious about real world examples where the asymptotic behaviour matters more than the constants of the actual performance.
If log(N) can be replaced by a constant it still can be replaced by a constant in O( N log N).
This question is for the sake of (a) entertainment and (b) to gather arguments to use if I run (again) into a controversy about the performance of a design.
Big O notation tells you about how your algorithm changes with growing input. O(1) tells you it doesn't matter how much your input grows, the algorithm will always be just as fast. O(logn) says that the algorithm will be fast, but as your input grows it will take a little longer.
O(1) versus O(logn) makes a big difference when you start to combine algorithms.
Take doing joins with indexes for example. If you could do a join in O(1) instead of O(logn) you would have huge performance gains. For example with O(1) you can join any amount of times and you still have O(1). But with O(logn) you need to multiply the operation count by logn each time.
For large inputs, if you had an algorithm that was O(n^2) already, you would much rather do an operation that was O(1) inside, and not O(logn) inside.
Also remember that Big-O of anything can have a constant overhead. Let's say that constant overhead is 1 million. With O(1) that constant overhead does not amplify the number of operations as much as O(logn) does.
Another point is that everyone thinks of O(logn) representing n elements of a tree data structure for example. But it could be anything including bytes in a file.
I think this is a pragmatic approach; O(logN) will never be more than 64. In practice, whenever terms get as 'small' as O(logN), you have to measure to see if the constant factors win out. See also
Uses of Ackermann function?
To quote myself from comments on another answer:
[Big-Oh] 'Analysis' only matters for factors that are at least O(N). For any smaller factor, big-oh analysis is useless and you must measure.
and
"With O(logN) your input size does matter." This is the whole point of the question. Of course it matters... in theory. The question the OP asks is, does it matter in practice? I contend that the answer is no, there is not, and never will be, a data set for which logN will grow so fast as to always be beaten by a constant-time algorithm. Even for the largest practical dataset imaginable in the lifetimes of our grandchildren, a logN algorithm has a fair chance of beating a constant time algorithm - you must always measure.
EDIT
A good talk:
http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
about halfway through, Rich discusses Clojure's hash tries, which are clearly O(logN), but the base of the logarithm is large and so the depth of the trie is at most 6 even if it contains 4 billion values. Here "6" is still an O(logN) value, but it is an incredibly small value, and so choosing to discard this awesome data structure because "I really need O(1)" is a foolish thing to do. This emphasizes how most of the other answers to this question are simply wrong from the perspective of the pragmatist who wants their algorithm to "run fast" and "scale well", regardless of what the "theory" says.
EDIT
See also
http://queue.acm.org/detail.cfm?id=1814327
which says
What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
(but go read the article for context).
This is a common mistake - remember Big O notation is NOT telling you about the absolute performance of an algorithm at a given value, it's simply telling you the behavior of an algorithm as you increase the size of the input.
When you take it in that context it becomes clear why an algorithm A ~ O(logN) and an algorithm B ~ O(1) are different:
if I run A on an input of size a, then on an input of size 1000000*a, I can expect the second run to take about log(1,000,000) more steps than the first (roughly 20 extra steps with a base-2 logarithm), so it is slower, though only additively
if I run B on an input of size a, then on an input of size 1000000*a, I can expect the second input to take about the same amount of time as the first input
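A quick way to put numbers on this comparison is to count worst-case probes of a binary search (a typical O(logN) algorithm) at both sizes. The sketch below assumes a = 1000 purely for illustration.

```python
def worst_case_probes(n: int) -> int:
    """Worst-case number of probes for binary search over n sorted items."""
    return n.bit_length()  # equals floor(log2(n)) + 1 for n >= 1


def constant_time_probes(n: int) -> int:
    """Stand-in for an O(1) algorithm: the cost does not depend on n."""
    return 1


if __name__ == "__main__":
    a = 1000  # assumed base input size
    for n in (a, 1_000_000 * a):
        print(n, worst_case_probes(n), constant_time_probes(n))
```

Growing the input a million-fold adds about 20 probes to the logarithmic algorithm and nothing to the constant-time one.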
EDIT: Thinking over your question some more, I do think there's some wisdom to be had in it. While I would never say it's correct to say O(lgN) == O(1), It IS possible that an O(lgN) algorithm might be used over an O(1) algorithm. This draws back to the point about absolute performance above: Just knowing one algorithm is O(1) and another algorithm is O(lgN) is NOT enough to declare you should use the O(1) over the O(lgN), it's certainly possible given your range of possible inputs an O(lgN) might serve you best.
You asked for a real-world example. I'll give you one. Computational biology. One strand of DNA encoded in ASCII is somewhere on the level of gigabytes in space. A typical database will obviously have many thousands of such strands.
Now, in the case of an indexing/searching algorithm, that log(n) multiple makes a large difference when coupled with constants. The reason why? This is one of the applications where the size of your input is astronomical. Additionally, the input size will always continue to grow.
Admittedly, these types of problems are rare. There are only so many applications this large. In those circumstances, though... it makes a world of difference.
Equality, the way you're describing it, is a common abuse of notation.
To clarify: we usually write f(x) = O(logN) to imply "f(x) is O(logN)".
At any rate, O(1) means a constant number of steps/time (as an upper bound) to perform an action regardless of how large the input set is. But for O(logN), number of steps/time still grows as a function of the input size (the logarithm of it), it just grows very slowly. For most real world applications you may be safe in assuming that this number of steps will not exceed 100, however I'd bet there are multiple examples of datasets large enough to mark your statement both dangerous and void (packet traces, environmental measurements, and many more).
For small enough N, O(N^N) can in practice be replaced with 1. Not O(1) (by definition), but for N=2 you can see it as one operation with 4 parts, or a constant-time operation.
What if all operations take 1 hour? The difference between O(log N) and O(1) is then large, even with small N.
Or if you need to run the algorithm ten million times? Ok, that took 30 minutes, so when I run it on a dataset a hundred times as large it should still take 30 minutes because O(logN) is "the same" as O(1).... eh...what?
Your statement that "I understand O(f(N))" is clearly false.
Real world applications, oh... I don't know.... EVERY USE OF O()-notation EVER?
Binary search in sorted list of 10 million items for example. It's the very REASON we use hash tables when the data gets big enough. If you think O(logN) is the same as O(1), then why would you EVER use a hash instead of a binary tree?
As many have already said, for the real world, you need to look at the constant factors first, before even worrying about factors of O(log N).
Then, consider what you will expect N to be. If you have good reason to think that N<10, you can use a linear search instead of a binary one. That's O(N) instead of O(log N), which according to your lights would be significant -- but a linear search that moves found elements to the front may well outperform a more complicated balanced tree, depending on the application.
On the other hand, note that, even if log N is not likely to exceed 50, a performance factor of 10 is really huge -- if you're compute-bound, a factor like that can easily make or break your application. If that's not enough for you, you'll frequently see factors of (log N)^2 or (logN)^3 in algorithms, so even if you think you can ignore one factor of (log N), that doesn't mean you can ignore more of them.
Finally, note that the simplex algorithm for linear programming has a worst case performance of O(2^n). However, for practical problems, the worst case never comes up; in practice, the simplex algorithm is fast, relatively simple, and consequently very popular.
About 30 years ago, someone developed a polynomial-time algorithm for linear programming, but it was not initially practical because the result was too slow.
Nowadays, there are practical alternative algorithms for linear programming (with polynomial-time worst-case, for what that's worth), which can outperform the simplex method in practice. But, depending on the problem, the simplex method is still competitive.
The observation that O(log n) is oftentimes indistinguishable from O(1) is a good one.
As a familiar example, suppose we wanted to find a single element in a sorted array of 1,000,000,000,000 elements:
with linear search, the search takes on average 500,000,000,000 steps
with binary search, the search takes on average 40 steps
Suppose we added a single element to the array we are searching, and now we must search for another element:
with linear search, the search takes on average 500,000,000,001 steps (indistinguishable change)
with binary search, the search takes on average 40 steps (indistinguishable change)
Suppose we doubled the number of elements in the array we are searching, and now we must search for another element:
with linear search, the search takes on average 1,000,000,000,000 steps (extraordinarily noticeable change)
with binary search, the search takes on average 41 steps (indistinguishable change)
As we can see from this example, for all intents and purposes, an O(log n) algorithm like binary search is oftentimes indistinguishable from an O(1) algorithm like omniscience.
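The step counts in this example can be checked with integer arithmetic. This sketch models linear search as inspecting half the array on average and binary search as the number of halvings needed to narrow n candidates down to one; both are simplifying assumptions.

```python
def linear_average_steps(n: int) -> float:
    """Average number of elements a linear search inspects: about n / 2."""
    return n / 2


def binary_steps(n: int) -> int:
    """ceil(log2(n)): halvings needed to narrow n candidates down to one."""
    return (n - 1).bit_length()


if __name__ == "__main__":
    base = 1_000_000_000_000
    for n in (base, base + 1, 2 * base):
        print(f"{n:>16,d}  linear ~{linear_average_steps(n):,.0f}  binary {binary_steps(n)}")
```

Doubling the array doubles the linear count but adds a single step to the binary one.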
The takeaway point is this: we use O(log n) algorithms because they are often indistinguishable from constant time, and because they often perform phenomenally better than linear time algorithms.
Obviously, these examples assume reasonable constants. Obviously, these are generic observations and do not apply to all cases. Obviously, these points apply at the asymptotic end of the curve, not the n=3 end.
But this observation explains why, for example, we use such techniques as tuning a query to do an index seek rather than a table scan - because an index seek operates in nearly constant time no matter the size of the dataset, while a table scan is crushingly slow on sufficiently large datasets. Index seek is O(log n).
You might be interested in Soft-O, which ignores logarithmic cost. Check this paragraph in Wikipedia.
What do you mean by whether or not it "matters"?
If you're faced with the choice of an O(1) algorithm and a O(lg n) one, then you should not assume they're equal. You should choose the constant-time one. Why wouldn't you?
And if no constant-time algorithm exists, then the logarithmic-time one is usually the best you can get. Again, does it then matter? You just have to take the fastest you can find.
Can you give me a situation where you'd gain anything by defining the two as equal? At best, it'd make no difference, and at worst, you'd hide some real scalability characteristics. Because usually, a constant-time algorithm will be faster than a logarithmic one.
Even if, as you say, lg(n) < 100 for all practical purposes, that's still a factor 100 on top of your other overhead. If I call your function, N times, then it starts to matter whether your function runs logarithmic time or constant, because the total complexity is then O(n lg n) or O(n).
So rather than asking if "it matters" that you assume logarithmic complexity to be constant in "the real world", I'd ask if there's any point in doing that.
Often you can assume that logarithmic algorithms are fast enough, but what do you gain by considering them constant?
O(logN)*O(logN)*O(logN) is very different. O(1) * O(1) * O(1) is still constant.
Also a simple quicksort-style O(nlogn) is different than O(n O(1))=O(n). Try sorting 1000 and 1000000 elements. The latter isn't 1000 times slower, it's 2000 times, because log(n^2)=2log(n)
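The 2000x figure follows directly from the cost model n * log(n). A short sketch, assuming that model with no constant factors or lower-order terms:

```python
import math


def n_log_n(n: int) -> float:
    """Idealised cost of an O(n log n) sort."""
    return n * math.log2(n)


def linear(n: int) -> float:
    """Idealised cost of an O(n) pass."""
    return float(n)


def slowdown(small: int, large: int, cost) -> float:
    """How many times more work the larger input needs under a cost model."""
    return cost(large) / cost(small)


if __name__ == "__main__":
    print("O(n):      ", slowdown(1_000, 1_000_000, linear))
    print("O(n log n):", slowdown(1_000, 1_000_000, n_log_n))
```

The base of the logarithm cancels in the ratio, so the factor of 2 comes purely from log(n^2) = 2 log(n).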
The title of the question is misleading (well chosen to drum up debate, mind you).
O(log N) == O(1) is obviously wrong (and the poster is aware of this). Big O notation, by definition, regards asymptotic analysis. When you see O(N), N is taken to approach infinity. If N is assigned a constant, it's not Big O.
Note, this isn't just a nitpicky detail that only theoretical computer scientists need to care about. All of the arithmetic used to determine the O function for an algorithm relies on it. When you publish the O function for your algorithm, you might be omitting a lot of information about its performance.
Big O analysis is cool, because it lets you compare algorithms without getting bogged down in platform specific issues (word sizes, instructions per operation, memory speed versus disk speed). When N goes to infinity, those issues disappear. But when N is 10000, 1000, 100, those issues, along with all of the other constants that we left out of the O function, start to matter.
To answer the question of the poster: O(log N) != O(1), and you're right, algorithms with O(1) are sometimes not much better than algorithms with O(log N), depending on the size of the input, and all of those internal constants that got omitted during Big O analysis.
If you know you're going to be cranking up N, then use Big O analysis. If you're not, then you'll need some empirical tests.
In theory
Yes, in practical situations log(n) is bounded by a constant, we'll say 100. However, replacing log(n) by 100 in situations where it's correct is still throwing away information, making the upper bound on operations that you have calculated looser and less useful. Replacing an O(log(n)) by an O(1) in your analysis could result in your large n case performing 100 times worse than you expected based on your small n case. Your theoretical analysis could have been more accurate and could have predicted an issue before you'd built the system.
I would argue that the practical purpose of big-O analysis is to try and predict the execution time of your algorithm as early as possible. You can make your analysis easier by crossing out the log(n) terms, but then you've reduced the predictive power of the estimate.
In practice
If you read the original papers by Larry Page and Sergey Brin on the Google architecture, they talk about using hash tables for everything to ensure that e.g. the lookup of a cached web page only takes one hard-disk seek. If you used B-tree indices to look up a page, you might need four or five hard-disk seeks to do an uncached lookup [*]. Quadrupling your disk requirements on your cached web page storage is worth caring about from a business perspective, and predictable if you don't cast out all the O(log(n)) terms.
P.S. Sorry for using Google as an example, they're like Hitler in the computer science version of Godwin's law.
[*] Assuming 4KB reads from disk, 100bn web pages in the index, ~ 16 bytes per key in a B-tree node.
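The "four or five seeks" estimate can be reproduced from the footnote's assumptions. The sketch below is a deliberately crude model: it treats each 4KB read as one B-tree node holding only keys (ignoring child pointers and partially filled nodes), and the function name is hypothetical.

```python
def btree_levels(total_keys: int, block_bytes: int = 4096, key_bytes: int = 16) -> int:
    """Levels needed so that a tree with this fanout can address total_keys."""
    fanout = block_bytes // key_bytes  # 256 keys per node under these assumptions
    levels, reachable = 0, 1
    while reachable < total_keys:
        reachable *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    pages = 100 * 10**9  # 100bn web pages, as in the footnote
    print("levels (one disk read each, uncached):", btree_levels(pages))
```

Under this model the tree needs five levels for 100bn keys, against the single seek claimed for the hash-table design.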
As others have pointed out, Big-O tells you about how the performance of your algorithm scales. Trust me - it matters. Several times I have encountered algorithms that were just terrible and failed to meet the customer's demands because they were too slow. Understanding the difference and finding an O(1) solution is often a huge improvement.
However, of course, that is not the whole story - for instance, you may notice that quicksort implementations often switch to insertion sort for small subarrays (Wikipedia says 8 - 20 elements) because of the behaviour of both algorithms on small datasets.
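A minimal sketch of such a hybrid is shown below. The cutoff of 16 is an assumed value inside the quoted 8 - 20 range, and the partition scheme is one common choice; real library sorts differ in many details.

```python
CUTOFF = 16  # assumed threshold; tune by measurement


def insertion_sort(items, lo, hi):
    """Sort items[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        current = items[i]
        j = i - 1
        while j >= lo and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def hybrid_quicksort(items, lo=0, hi=None):
    """In-place quicksort that hands small ranges to insertion sort."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        if hi - lo + 1 <= CUTOFF:
            insertion_sort(items, lo, hi)
            return
        pivot = items[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:  # partition around the pivot value
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            hybrid_quicksort(items, lo, j)
            lo = i
        else:
            hybrid_quicksort(items, i, hi)
            hi = j


if __name__ == "__main__":
    data = [5, 2, 9, 1, 5, 6, 0, 3] * 10
    hybrid_quicksort(data)
    print(data[:10], "...")
```

Insertion sort is O(n^2), but on a handful of elements its low overhead tends to beat the bookkeeping of further partitioning, which is the trade-off the constants hide.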
So it's a matter of understanding what tradeoffs you will be doing which involves a thorough understanding of the problem, the architecture, & experience to understand which to use, and how to adjust the constants involved.
No one is saying that O(1) is always better than O(log N). However, I can guarantee you that an O(1) algorithm will also scale way better, so even if you make incorrect assumptions about how many users will be on the system, or the size of the data to process, it won't matter to the algorithm.
Yes, log(N) < 100 for most practical purposes, and no, you cannot always replace it with a constant.
For example, this may lead to serious errors in estimating the performance of your program. If an O(N) program processes an array of 1000 elements in 1 ms, then you can be sure it will process 10^6 elements in 1 second (or so). If, though, the program is O(N*logN), then it will take ~2 secs to process 10^6 elements. This difference may be crucial: for example, you may think you've got enough server power because you get 3000 requests per hour and you think your server can handle up to 3600.
Another example: imagine you have a function f() that runs O(logN) iterations and on each iteration calls a function g(), which runs in O(logN) as well. If you replace both logs with constants, you will think your program runs in constant time. Reality will be cruel, though: the two logs together may give you a multiplier of up to 100*100.
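The 1 second versus ~2 seconds estimate can be checked with a few lines. This sketch assumes running time is exactly proportional to the cost function (no constant overheads, no cache effects); `extrapolate_ms` is a hypothetical helper, not a library call.

```python
import math


def extrapolate_ms(base_n, base_ms, target_n, cost):
    """Scale a measured time to target_n, assuming time is proportional to cost(n)."""
    return base_ms * cost(target_n) / cost(base_n)


linear = extrapolate_ms(1_000, 1.0, 10**6, lambda n: n)
n_log_n = extrapolate_ms(1_000, 1.0, 10**6, lambda n: n * math.log2(n))
print(f"O(N): {linear:.0f} ms, O(N log N): {n_log_n:.0f} ms")
```

The extra factor of two comes from log(10^6) / log(10^3) = 2, whatever the base of the logarithm.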
The rules of determining the Big-O notation are simpler when you don't decide that O(log n) = O(1).
As krzysio said, you may accumulate O(log n)s and then they would make a very noticeable difference. Imagine you do a binary search: O(log n) comparisons, and then imagine that each comparison's complexity is O(log n). If you neglect both you get O(1) instead of O(log^2 n). Similarly you may somehow arrive at O(log^10 n) and then you'll notice a big difference for not too large "n"s.
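A quick way to see how fast stacked logarithms grow is to print the multiplier directly. This is only an illustration with base-2 logs and made-up input sizes; `stacked_log_factor` is a hypothetical helper.

```python
import math


def stacked_log_factor(n: int, depth: int) -> float:
    """Multiplier contributed by `depth` nested O(log n) steps, using base-2 logs."""
    return math.log2(n) ** depth


for n in (2**10, 2**20, 2**30):
    print(n, stacked_log_factor(n, 1), stacked_log_factor(n, 2))
```

For about a million elements (2^20), one log costs a factor of 20 and two nested logs a factor of 400, which is hard to wave away as "constant".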
Assume that in your entire application, one algorithm accounts for 90% of the time the user waits for the most common operation.
Suppose in real time the O(1) operation takes a second on your architecture, and the O(logN) operation is basically .5 seconds * log(N). Well, at this point I'd really like to draw you a graph with an arrow at the intersection of the curve and the line, saying, "It matters right here." You want to use the log(N) op for small datasets and the O(1) op for large datasets, in such a scenario.
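In place of the graph, the crossover can be computed. The answer does not say which base the logarithm uses, so base 2 is an assumption here, and the names are illustrative.

```python
import math

CONSTANT_COST_S = 1.0      # the O(1) operation: one second regardless of N
LOG_COST_PER_STEP_S = 0.5  # the O(log N) operation: 0.5 s * log2(N)


def log_cost(n: int) -> float:
    """Running time in seconds of the O(log N) operation for input size n."""
    return LOG_COST_PER_STEP_S * math.log2(n)


def first_n_where_constant_wins() -> int:
    """Smallest N at which the O(1) operation is strictly faster."""
    n = 1
    while log_cost(n) <= CONSTANT_COST_S:
        n += 1
    return n


print(first_n_where_constant_wins())
```

Under these assumptions the two costs tie at N = 4, and the O(1) operation is ahead from N = 5 onward. With a base-10 log the tie would move to N = 100.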
For operations that are already cheap, big-O analysis and performance optimization are an academic exercise rather than something that delivers real value to the user. But if it's an expensive operation on a critical path, then you bet it matters!
For any algorithm that can take inputs of different sizes N, the number of operations it takes is upper-bounded by some function f(N).
All big-O tells you is the shape of that function.
O(1) means there is some number A such that f(N) < A for large N.
O(N) means there is some A such that f(N) < AN for large N.
O(N^2) means there is some A such that f(N) < AN^2 for large N.
O(log(N)) means there is some A such that f(N) < AlogN for large N.
Big-O says nothing about how big A is (i.e. how fast the algorithm is), or where these functions cross each other. It only says that when you are comparing two algorithms, if their big-Os differ, then there is a value of N (which may be small or it may be very large) where one algorithm will start to outperform the other.
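For reference, the four statements above are instances of one definition. This is the standard textbook formulation rather than anything specific to this answer; N_0 is the threshold that "for large N" refers to.

```latex
f(N) \in O\bigl(g(N)\bigr)
\iff
\exists A > 0,\ \exists N_0 \ \text{such that}\ 
f(N) \le A \cdot g(N) \ \text{for all } N \ge N_0
```

Taking g(N) = 1, N, N^2 and log N gives the four cases listed.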
You are right, in many cases it does not matter for practical purposes. But the key question is "how fast does N grow". For most algorithms we know of, N is the size of the input, so it grows linearly.
But some algorithms have the value of N derived in a complex way. If N is "the number of possible lottery combinations for a lottery with X distinct numbers", it suddenly matters whether your algorithm is O(1) or O(logN).
Big-O tells you that one algorithm is eventually faster than another, up to some constant factor. If your input is small enough that the constant factors dominate, you could see great performance gains by going with a linear search rather than a log(n) search of some base.
O(log N) can be misleading. Take for example the operations on Red-Black trees.
The operations are O(logN) but rather complex, which means many low level operations.
Whenever N is the amount of objects that is stored in some kind of memory, you're correct. After all, a binary search through EVERY byte representable by a 64-bit pointer can be achieved in just 64 steps. Actually, it's possible to do a binary search of all Planck volumes in the observable universe in just 618 steps.
So in almost all cases, it's safe to approximate O(log N) with O(1) as long as N is (or could be) a physical quantity, because we know for certain that in that case log N < 618.
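The 64-step figure is easy to check. Here is a small sketch that searches the integers below 2^64 by repeated halving, without materialising them; the function name and the target value are arbitrary choices for illustration.

```python
def binary_search_steps(size: int, target: int):
    """Locate target in range(size) by halving; return (steps, value found)."""
    lo, hi = 0, size  # half-open interval [lo, hi) that must contain target
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target < mid:
            hi = mid
        else:
            lo = mid
        steps += 1
    return steps, lo


steps, found = binary_search_steps(2**64, 0xDEADBEEF)
print(steps, found == 0xDEADBEEF)
```

Because the interval size is a power of two, it halves exactly on every iteration and the loop runs 64 times.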
But that is assuming N is that. It may represent something else. Note that it's not always clear what it is. Just as an example, take matrix multiplication, and assume square matrices for simplicity. The time complexity for matrix multiplication is O(N^3) for a trivial algorithm. But what is N here? It is the side length. It is a reasonable way of measuring the input size, but it would also be quite reasonable to use the number of elements in the matrix, which is N^2. Let M=N^2, and now we can say that the time complexity for trivial matrix multiplication is O(M^(3/2)) where M is the number of elements in a matrix.
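The change of variable can be checked by counting. This sketch counts the scalar multiplications of the schoolbook algorithm for an assumed side length of 20 and compares the count against both N^3 and M^(3/2); the function name is illustrative.

```python
def naive_matmul_multiplications(n: int) -> int:
    """Count scalar multiplications in schoolbook n x n matrix multiplication."""
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                count += 1  # stands for one product a[i][k] * b[k][j]
    return count


n = 20
m = n * n  # number of elements in one matrix
ops = naive_matmul_multiplications(n)
print(ops, n**3, round(m**1.5))
```

The same count is cubic in the side length and only "three-halves" in the element count, so the exponent depends entirely on what N is taken to mean.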
Unfortunately, I don't have any real world problem per se, which was what you asked. But at least I can make up something that makes some sort of sense:
Let f(S) be a function that returns the sum of the hashes of all the elements in the power set of S. Here is some pseudocode:
f(S):
    ret = 0
    for s in powerset(S):
        ret += hash(s)
    return ret
Here, hash is simply the hash function, and powerset is a generator function. Each time it's called, it will generate the next (according to some order) subset of S. A generator is necessary, because we would not be able to store the lists for huge data otherwise. Btw, here is a Python example of such a power set generator:
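def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) == 0:
        # Base case: the empty sequence has exactly one subset, the empty one.
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]] + item  # subsets that contain the first element
            yield item             # subsets that do not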
https://www.technomancy.org/python/powerset-generator-python/
So what is the time complexity for f? As with the matrix multiplication, we can choose N to represent many things, but at least two make a lot of sense. One is the number of elements in S, in which case the time complexity is O(2^N), but another sensible way of measuring it is that N is the number of elements in the power set of S. In this case the time complexity is O(N).
So what will log N be for sensible sizes of S? Well, lists with a million elements are not unusual. If n is the size of S and N is the size of P(S), then N=2^n. So O(log N) = O(log 2^n) = O(n * log 2) = O(n)
In this case it would matter: log N is just n, and it's rare that you can treat O(n) as O(1) in the real world.
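The relation N = 2^n can be confirmed by enumeration on a small set. This sketch uses `itertools.combinations` instead of the generator quoted earlier, purely to keep the example self-contained; the set size of 12 is an arbitrary choice.

```python
import math
from itertools import combinations


def count_subsets(items) -> int:
    """Enumerate every subset lazily and count them."""
    return sum(1 for r in range(len(items) + 1) for _ in combinations(items, r))


n = 12
big_n = count_subsets(range(n))
print(n, big_n, math.log2(big_n))
```

For n = 12 the count is 4096 and its base-2 logarithm is 12 again, so "log N" here is just the size of S and grows without any physical bound as S grows.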
I do not believe algorithms where you can freely choose between O(1) with a large constant and O(logN) really exist. If there are N elements to work with at the beginning, it is just plain impossible to make it O(1); the only thing that is possible is to move your N to some other part of your code.
What I am trying to say is that in all the real cases I know of, you have some space/time tradeoff, or some pre-treatment such as compiling the data to a more efficient form.
That is, you do not really go O(1), you just move the N part elsewhere. Either you trade performance in some part of your code for memory, or you trade performance in one part of your algorithm for performance in another. To stay sane you should always look at the larger picture.
My point is that if you have N items they can't disappear. In other words, you can choose between an inefficient O(n^2) (or worse) algorithm and an O(n log N) one: that is a real choice. But you never really go O(1).
What I am trying to point out is that for every problem and initial data state there is a 'best' algorithm. You can do worse but never better. With some experience you can make a good guess at what this intrinsic complexity is. Then, if your overall treatment matches that complexity, you know you have something. You won't be able to reduce that complexity, only to move it around.
If a problem is O(n) it won't become O(logN) or O(1); you'll merely add some pre-treatment such that the overall complexity is unchanged or worse, and potentially a later step will be improved. Say you want the smallest element of an array: you can search for it in O(N), or sort the array with any common O(NLogN) sort and then read the first element in O(1).
Is it a good idea to do that casually? Only if your problem also asked for the second, third, etc. elements. Then your initial problem was truly O(NLogN), not O(N).
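A minimal sketch of the two options, using Python built-ins and made-up data. The O(1) lookup only exists after the O(N log N) sort has been paid for, which is the "moving the N elsewhere" described above.

```python
import random

data = [random.randint(0, 10**6) for _ in range(10_000)]

# Option 1: one linear scan, O(N), no preprocessing.
smallest_scan = min(data)

# Option 2: pay O(N log N) once, then every rank query is an O(1) index.
ordered = sorted(data)
smallest_sorted = ordered[0]
third_smallest = ordered[2]

print(smallest_scan == smallest_sorted, third_smallest)
```

If only the minimum is ever needed, option 1 does strictly less work. Option 2 pays off once several ranks are requested from the same data.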
And it's not the same if you wait ten times or twenty times longer for your result because you simplified saying O(1) = O(LogN).
I'm waiting for a counter-example ;-) that is, any real case where you have a choice between O(1) and O(LogN) and where every O(LogN) step won't compare to the O(1). All you can do is take a worse algorithm instead of the natural one, or move some heavy treatment to some other part of the larger picture (pre-computing results, using storage space, etc.)
Let's say you use an image-processing algorithm that runs in O(log N), where N is the number of images. Now... stating that it runs in constant time would make one believe that no matter how many images there are, it would still complete its task in about the same amount of time. If running the algorithm on a single image would hypothetically take a whole day, and assuming that O(logN) will never be more than 100... imagine the surprise of the person who tries to run the algorithm on a very large image database: he would expect it to be done in a day or so... yet it'll take months for it to finish.

Resources