How does one calculate the runtime of an algorithm? - algorithm

I've read about algorithm run-time in some algorithm books, where it's expressed as, O(n). For eg., the given code would run in O(n) time for the best case & O(n3) for the worst case. What does it mean & how does one calculate it for their own code? Is it like linear time , and is it like each predefined library function has their own run-time which should be kept in mind before calling it? Thanks...

A Beginner's Guide to Big O Notation might be a good place to start:
http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
also take a look at Wikipedia
http://en.wikipedia.org/wiki/Big_O_notation
there are several related questions and good answers on stackoverflow
What is a plain English explanation of "Big O" notation?
and
Big-O for Eight Year Olds?

Should't this be in math?
If you are trying to sort with bubble sort array, that is already sorted, then
you can check, if this move along array checked anything. If not, all okey -- we done.
Than, for best case you will have O(n) compraisons(n-1, to be exact), for worst case(array is reversed) you will have O(n^2) compraisons(n(n-1)/2, to be exact).
More complicated example. Let's find maximum element of array.
Obvilously, you will always do n-1 compraisons, but how many assignments on average?
Complicated math answers: H(n) -1.
Usually, It is easy to Your Answerget best and worst scenarios, but average require a lot of math.
I would suggest you read Knuth, Volume 1. But who would not?
And, formal definition:
f(n)∈O(g(n)) means exist n∈N: for all m>n f(m)
In fact, you must read O-notation about on wiki.

The big-O notation is one kind of asymptotic notation. Asymptotic notation is an idea from mathematics, which describes the behavior of functions "in the limit" - as you approach infinity.
The problem with defining how long an algorithm takes to run is that you usually can't give an answer in milliseconds because it depends on the machine, and you can't give an answer in clock cycles or as an operation count because that would be too specific to particular data to be useful.
The simple way of looking at asymptotic notation is that it discards all the constant factors in a function. Basically, a n2 will always be bigger that b n if n is sufficiently large (assuming everything is positive). Changing the constant factors a and b doesn't change that - it changes the specific value of n where a n2 is bigger, but doesn't change that it happens. So we say that O(n2) is bigger than O(n), and forget about those constants that we probably can't know anyway.
That's useful because the problems with bigger n are usually the ones where things slow down enough that we really care. If n is small enough, the time taken is small, and the gains available from choosing different algorithms are small. When n gets big, choosing a different algorithm can make a huge difference. Also, the problems happen in the real world are often much bigger than the ones we can easily test against - and often, they keep growing over time (e.g. as databases accumulate more data).
It's a useful mathematical model that abstracts away enough awkward-to-handle detail that useful results can be found, but it's not a perfect system. We don't deal with infinite problems in the real world, and there are plenty of times when problems are small enough that those constants are relevant for real-world performance, and sometimes you just have to time things with a clock.
The MIT OCW Introduction to Algorithms course is very good for this kind of thing. The videos and other materials are available for free, and the course book (not free) is among the best books available for algorithms.

Related

Big Theta of Factorial Multiplied by a Coefficient

For a function with a run time of (cn)! where c is a coefficient >= 0 and c != n, would the tight bound of the run be Θ(n!) or Θ((cn)!)? Right now, I believe it would be Θ((cn)!) since they would differ by a coefficent >= n since cn != n.
Thanks!
Edit: A more specific example to clarify what I'm asking:
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
You can use Stirling's approximation to get that if c>1 then (cn)! is asymptotically larger than pow(c,n)*n!, which is not O(n!) since the quotient diverges. As a more elementary approach consider this example for c=2: (2n)!=(2n)(2n-1)...(n+1)n!>n!n! and (n!n!)/n!=n! diverges, so (2n)! is NOT O(n!).
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
I think there are two answers to your question.
The shorter one is from the purely theoretical point of view. Of those 3 only the n! lies in the class of Θ(n!). The second lies in the O(n!) (note big-O instead of big-Theta) and (7n)! is slower than Θ(n!), it lies in Θ((7n)!)
There is also a longer but more practical answer. And to get to it we first need to understand what is the big deal with this whole big-O and big-Theta business in the first place?
The thing is that for many practical tasks there are many algorithms and not all of them are equally or even similarly efficient. So the practical question is: can we somehow capture this difference in performance in an easy to understand and compare way? And this is the problem that big-O/big-Theta are trying to solve. The idea behind this method is that if we look at some algorithm with some complicated real formula for the exact time, there is only 1 term that grows faster than all others and thus dominates the time as the problem gets bigger. So let's compress this big formula to that dominant term. Then we can compare those terms and if they are different, we can easily say which is the better algorithm (7*n^2 is clearly better than 2*n^3).
Another idea is that the term "operation" is usually not that well defined at the level people usually think about algorithms. Which "operation" actually maps to a single CPU instruction and which to a few depends on many factors such as particular hardware. Also the instructions themselves can take different time to execute. Moreover sometimes the algorithm's working time is dominated by memory access than CPU instructions and those components are not easily additive. The morale of this story is that if two algorithms are different only in a scalar coefficient, you can't really compare those algorithms just theoretically. You need to compare some implementations in some particular environment. This is why algorithms complexity measure typically boils down to something like O(n^k) where k is a constant.
There is one more consideration: practicality. If the algorithm is some polynomial, there is a huge practical difference between cases a=3 and a=4 in O(n^a). But if it is something like O(2^(n^a)), then there is not much difference what exactly the a as along as a>1. This is because 2^n grows fast enough to make it impractical for almost any realistic n irrespective of a. So in practical terms it is often good enough approximation to put all such algorithms into a single "exponential algorithms" bucket and say they are all impractical even despite the fact there is a huge difference between them. This is where some mathematically unconventional notations like 2^O(n) come from.
From this last practical perspective the difference between Θ(n!) and Θ((7n)!) is also very little: both are totally impractical because both lie beyond even the exponential bucket of 2^O(n) (see Stirling's formula that shows that n! grows a bit faster than (n/e)^n). So it makes sense to put all such algorithms in another bucket of "factorial complexity" and mark them as impractical as well.

Right way to discuss computational complexity for small n

When discussing computational complexity, it seems everyone generally goes straight to Big O.
Lets say for example I have a hybrid algorithm such as merge sort which uses insertion sort for smaller subarrays (I believe this is called tiled merge sort). It's still ultimately merge sort with O(n log n), but I want to discuss the behaviour/characteristics of the algorithm for small n, in cases where no merging actually takes place.
For all intents and purposes the tiled merge sort is insertion sort, executing exactly the same instructions for the domain of my small n. However, Big O deals with the large and asymptotic cases and discussing Big O for small n is pretty much an oxymoron. People have yelled at me for even thinking the words "behaves like an O(n^2) algorithm in such cases". What is the correct way to describe the algorithm's behaviour in cases of small n within the context of formal theoretical computational analysis? To clarify, not just in the case where n is small, but in the case where n is never big.
One might say that for such small n it doesn't matter but I'm interested in the cases where it does, for example with a large constant such as being executed many times, and where in practice it would show a clear trend and be the dominant factor. For example the initial quadratic growth seen in the graph below. I'm not dismissing Big O, more asking for a way to properly tell both sides of the story.
[EDIT]
If for "small n", constants can easily remove all trace of a growth rate then either
only the asymptotic case is discussed, in which case there is less relevance to any practical application, or
there must be a threshold at which we agree n is no longer "small".
What about the cases where n is not "small" (n is sufficiently big that the growth rate will not to affected significantly by any practical constant), but not yet big enough to show the final asymptotic growth rate so only sub growth rates are visible (for example the shape in the image above)?
Are there no practical algorithms that exhibit this behaviour? Even if there aren't, theoretical discussion should still be possible. Do we measure instead of discussing the theory purely because that's "what one should do"? If some behaviour is observed in all practical cases, why can't there be theory that's meaningful?
Let me turn the question around the other way. I have a graph that shows segmented super-linear steps. It sounds like many people would say "this is a pure coincidence, it could be any shape imaginable" (at the extreme of course) and wouldn't bat an eyelid if it were a sine wave instead. I know in many cases the shape could be hidden by constants, but here it's quite obvious. How can I give a formal explanation of why the graph produces this shape?
I particularly like #Sneftel's words "imprecise but useful guidance".
I know Big O and asymptotic analysis isn't applicable. What is? How far can I take it?
Discuss in chat
For small n, computation complexity - how things change as n increases towards infinity - isn't meaningful as other effects dominate.
Papers I've seen which discuss behaviour for small values of n do so by measuring the algorithms on real systems, and discuss how the algorithms perform in practice rather than from a theoretical viewpoint. For example, for the graph you've added to your post I would say 'this graph demonstrates an O(N) asymptotic behaviour overall, but the growth within each tile is bounded quadratic'.
I don't know of a situation where a discussion of such behaviour from a theoretical viewpoint would be meaningful - it is well known that for small n the practical effects outweigh the effects of scaling.
It's important to remember that asymptotic analysis is an analytic simplification, not a mandate for analyzing algorithms. Take selection sort, for instance. Yes, it executes in O(n^2) time. But it is also true that it performs precisely n*(n-1)/2 comparisons, and n-1-k swaps, where k is the number of elements (other than the maximum) which start in the correct position. Asymptotic analysis is a tool for simplifying the (otherwise generally impractical) task of performance analysis, and one we can put aside if we're not interested in the "really big n" segment.
And you can choose how you express your bounds. Say a function requires precisely n + floor(0.01*2^n) operations. That's exponential time, of course. But one can also say "for data sizes up to 10 elements, this algorithm requires between n and 2*n operations." The latter describes not the shape of the curve, but an envelope around that curve, giving imprecise but useful guidance about the practicalities of the algorithm within a particular range of use cases.
You are right.
For small n, i.e. when only insertion sort is performed, the asymptotic behavior is quadratic O(n^2).
And for larger n, when tiled merge sort enters into play, the behavior switches to O(n.Log(n)).
There is no contradiction if you remember that every behavior has its domain of validity, before the switching threshold, let N, and after it.
In practice there will be a smooth blend between the curves around N. But in practice too, that value of N is so small that the quadratic behavior does not have enough "room" to manifest itself.
Another way to deal with this analysis is to say that N being a constant, the insertion sorts take constant time. But I would disagree to say that this is a must.
Let's unpack things a bit. Big-O is a tool for describing the growth rate of a function. One of the functions to which it is commonly applied is the worst-case running time of an algorithm on inputs of length n, executing on a particular abstract machine. We often forget about the last part because there is a large class of machines with random-access memory that can emulate one another with only constant-factor slowdown, and the class of problems solvable within a particular big-O running-time bound is equivalent across these machines.
If you want to talk about complexity on small inputs, then you need to measure constant factors. One way is to measure running times of actual implementations. People usually do this on physical machines, but if you're hardcore like Knuth, you invent your own assembly language complete with instruction timings. Another way is to measure something that's readily identifiable but also a useful proxy for the other work performed. For comparison sorts, this could be comparisons. For numerical algorithms, this is often floating-point operations.
Complexity is not about execution time for one n on one machine, so there is no need to consider it even if constant is large. Complexity tells you how the size of the input affects execution time. For small n, you can treat execution time as constant. This is the one side.
From the second side you are saying that:
You have a hybrid algorithm working in O(n log n) for n larger than some k and O(n^2) for n smaller than k.
The constant k is so large that algorithm works slowly.
There is no sense in such algorithm, because you could easily improve it.
Lets take Red-black tree. Operations on this tree are performed in O(n log n) time complexity, but there is a large constant. So, on normal machines, it could work slowly (i.e. slower than simpler solutions) in some cases. There is no need to consider it in analyzing complexity. You need to consider it when you are implementing it in your system: you need to check if it's the best choice considering the actual machine(s) on which it will be working and what problems it will be solving.
Read Knuth's "The Art of Computer Programming series", starting with "Volume 1. Fundamental Algorithms", section "1.2.10: Analysis of an Algorithm". There he shows (and in all the rest of his seminal work) how exact analysis can be conducted for arbitrary problem sizes, using a suitable reference machine, by taking a detailed census of every processor instruction.
Such analyses have to take into account not only the problem size, but also any relevant aspect of the input data distribution which will influence the running time. For simplification, the analysis are often limited to the study of the worst case, the expected case or the output-sensitive case, rather than a general statistical characterization. And for further simplification, asymptotic analysis is used.
Not counting the fact that except for trivial algorithms the detailed approach is mathematically highly challenging, it has become unrealistic on modern machines. Indeed, it relies on a processor behavior similar to the so-called RAM model, which assumes constant time per instruction and per memory access (http://en.wikipedia.org/wiki/Random-access_machine). Except maybe for special hardware combined to a hard real-time OS, these assumptions are nowadays completely wrong.
When you have an algorithm with a time complexity say O(n^2).And you also have an another algorithm with a time complexity, say O(n).Then from these two time complexity you can't conclude that the latter algorithm is faster than the former one for all input values.You can only say latter algorithm is asymptotically(means for sufficiently large input values)faster than the former one.Here you have to keep in mind the fact that in case of asymptotic notations constant factors are generally ignored to increase the understand-ability of the time complexity of the algorithm.As example: marge sort runs in O(nlogn) worst-case time and insertion sort runs in O(n^2) worst case time.But as the hidden constant factors in insertion sort is smaller than that of marge sort, in practice insertion sort can be faster than marge sort for small problem sizes on many machines.
Asymptotic notation does not describe the actual running-time of an algorithm.Actual running time is dependent on machine as different machine has different architecture and different Instruction Cycle Execution time.Asymptotic notation just describes asymptotically how fast an algorithm is with respect to other algorithms.But it does not describe the behavior of the algorithm in case of small input values(n<=no).The value of no (threshold) is dependent on the hidden constant factors and lower order terms.And hidden constant factors are dependent on the machine on which it will be executed.

Is there such a thing as "negative" big-O complexity? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Are there any O(1/n) algorithms?
This just popped in my head for no particular reason, and I suppose it's a strange question. Are there any known algorithms or problems which actually get easier or faster to solve with larger input? I'm guessing that if there are, it wouldn't be for things like mutations or sorting, it would be for decision problems. Perhaps there's some problem where having a ton of input makes it easy to decide something, but I can't imagine what.
If there is no such thing as negative complexity, is there a proof that there cannot be? Or is it just that no one has found it yet?
No that is not possible. Since Big-Oh is suppose to be an approximation of the number of operations an algorithm performs related to its domain size then it would not make sense to describe an algorithm as using a negative number of operations.
The formal definition section of the wikipedia article actually defines the Big-Oh notation in terms of using positive real numbers. So there actually is not even a proof because the whole concept of Big-Oh has no meaning on the negative real numbers per the formal definition.
Short answer: Its not possible because the definition says so.
update
Just to make it clear, I'm answering this part of the question: Are there any known algorithms or problems which actually get easier or faster to solve with larger input?
As noted in accepted answer here, there are no algorithms working faster with bigger input.
Are there any O(1/n) algorithms?
Even an algorithm like sleep(1/n) has to spend time reading its input, so its running time has a lower bound.
In particular, author referes relatively simple substring search algorithm:
http://en.wikipedia.org/wiki/Horspool
PS But using term 'negative complexity' for such algorithms doesn't seem to be reasonable to me.
To think in an algorithm that executes in negative time, is the same as thinking about time going backwards.
If the program starts executing at 10:30 AM and stops at 10:00 AM without passing through 11:00 AM, it has just executed with time = O(-1).
=]
Now, for the mathematical part:
If you can't come up with a sequence of actions that execute backwards in time (you never know...lol), the proof is quite simple:
positiveTime = O(-1) means:
positiveTime <= c * -1, for any C > 0 and n > n0 > 0
Consider the "C > 0" restriction.
We can't find a positive number that multiplied by -1 will result in another positive number.
By taking that in account, this is the result:
positiveTime <= negativeNumber, for any n > n0 > 0
Wich just proves that you can't have an algorithm with O(-1).
Not really. O(1) is the best you can hope for.
The closest I can think of is language translation, which uses large datasets of phrases in the target language to match up smaller snippets from the source language. The larger the dataset, the better (and to a certain extent faster) the translation. But that's still not even O(1).
Well, for many calculations like "given input A return f(A)" you can "cache" calculation results (store them in array or map), which will make calculation faster with larger number of values, IF some of those values repeat.
But I don't think it qualifies as "negative complexity". In this case fastest performance will probably count as O(1), worst case performance will be O(N), and average performance will be somewhere inbetween.
This is somewhat applicable for sorting algorithms - some of them have O(N) best-case scenario complexity and O(N^2) worst case complexity, depending on the state of data to be sorted.
I think that to have negative complexity, algorithm should return result before it has been asked to calculate result. I.e. it should be connected to a time machine and should be able to deal with corresponding "grandfather paradox".
As with the other question about the empty algorithm, this question is a matter of definition rather than a matter of what is possible or impossible. It is certainly possible to think of a cost model for which an algorithm takes O(1/n) time. (That is not negative of course, but rather decreasing with larger input.) The algorithm can do something like sleep(1/n) as one of the other answers suggested. It is true that the cost model breaks down as n is sent to infinity, but n never is sent to infinity; every cost model breaks down eventually anyway. Saying that sleep(1/n) takes O(1/n) time could be very reasonable for an input size ranging from 1 byte to 1 gigabyte. That's a very wide range for any time complexity formula to be applicable.
On the other hand, the simplest, most standard definition of time complexity uses unit time steps. It is impossible for a positive, integer-valued function to have decreasing asymptotics; the smallest it can be is O(1).
I don't know if this quite fits but it reminds me of bittorrent. The more people downloading a file, the faster it goes for all of them

How do you show that one algorithm is more efficient than another algorithm?

I'm no professional programmer and I don't study it. I'm an aerospace student and did a numeric method for my diploma thesis and also coded a program to prove that it works.
I did several methods and implemented several algorithms and tried to show the proofs why different situations needed their own algorithm to solve the task.
I did this proof with a mathematical approach, but some algorithm was so specific that I do know what they do and they do it right, but it was very hard to find a mathematical function or something to show how many iterations or loops it has to do until it finishes.
So, I would like to know how you do this comparison. Do you also present a mathematical function, or do you just do a speedtest of both algorithms, and if you do it mathematically, how do you do that? Do you learn this during your university studies, or how?
Thank you in advance, Andreas
The standard way of comparing different algorithms is by comparing their complexity using Big O notation. In practice you would of course also benchmark the algorithms.
As an example the sorting algorithms bubble sort and heap sort has complexity O(n2) and O(n log n) respective.
As a final note it's very hard to construct representative benchmarks, see this interesting post from Christer Ericsson on the subject.
While big-O notation can provide you with a way of distinguishing an awful algorithm from a reasonable algorithm, it only tells you about a particular definition of computational complexity. In the real world, this won't actually allow you to choose between two algorithms, since:
1) Two algorithms at the same order of complexity, let's call them f and g, both with O(N^2) complexity might differ in runtime by several orders of magnitude. Big-O notation does not measure the number of individual steps associated with each iteration, so f might take 100 steps while g takes 10.
In addition, different compilers or programming languages might generate more or less instructions for each iteration of the algorithm, and subtle choices in the description of the algorithm can make cache or CPU hardware perform 10s to 1000s of times worse, without changing either the big-O order, or the number of steps!
2) An O(N) algorithm might outperform an O(log(N)) algorithm
Big-O notation does not measure the number of individual steps associated with each iteration, so if O(N) takes 100 steps, but O(log(N)) takes 1000 steps for each iteration, then for data sets up to a certain size O(N) will be better.
The same issues apply to compilers as above.
The solution is to do an initial mathematical analysis of Big-O notation, followed by a benchmark-driven performance tuning cycle, using time and hardware performance counter data, as well as a good dollop of experience.
Firstly one would need to define what more efficient means, does it mean quicker, uses less system resources (such as memory) etc... (these factors are sometimes mutually exclusive)
In terms of standard definitions of efficiency one would often utilize Big-0 Notation, however in the "real world" outside academia normally one would profile/benchmark both equations and then compare the results
It's often difficult to make general assumptions about Big-0 notation as this is primarily concerned with looping and assumes a fixed cost for the code within a loop so benchmarking would be the better way to go
One caveat to watch out for is that sometimes the result can vary significantly based on the dataset size you're working with - for small N in a loop one will sometimes not find much difference
You might get off easy when there is a significant difference in the asymptotic Big-O complexity class for the worst case or for the expected case. Even then you'll need to show that the hidden constant factors don't make the "better" (from the asymptotic perspective) algorithm slower for reasonably sized inputs.
If difference isn't large, then given the complexity of todays computers, benchmarking with various datasets is the only correct way. You cannot even begin to take into account all of the convoluted interplay that comes from branch prediction accuracy, data and code cache hit rates, lock contention and so on.
Running speed tests is not going to provide you with as good quality an answer as mathematics will. I think your outline approach is correct -- but perhaps your experience and breadth of knowledge let you down when analysing on of your algorithms. I recommend the book 'Concrete Mathematics' by Knuth and others, but there are a lot of other good (and even more not good) books covering the topic of analysing algorithms. Yes, I learned this during my university studies.
Having written all that, most algoritmic complexity is analysed in terms of worst-case execution time (so called big-O) and it is possible that your data sets do not approach worst-cases, in which case the speed tests you run may illuminate your actual performance rather than the algorithm's theoretical performance. So tests are not without their value. I'd say, though, that the value is secondary to that of the mathematics, which shouldn't cause you any undue headaches.
That depends. At the university you do learn to compare algorithms by calculating the number of operations it executes depending on the size / value of its arguments. (Compare analysis of algorithms and big O notation). I would require of every decent programmer to at least understand the basics of that.
However in practice this is useful only for small algorithms, or small parts of larger algorithms. You will have trouble to calculate this for, say, a parsing algorithm for an XML Document. But knowing the basics often keeps you from making braindead errors - see, for instance, Joel Spolskys amusing blog-entry "Back to the Basics".
If you have a larger system you will usually either compare algorithms by educated guessing, making time measurements, or find the troublesome spots in your system by using a profiling tool. In my experience this is rarely that important - fighting to reduce the complexity of the system helps more.
To answer your question: " Do you also present a mathematical function, or do you just do a speedtest of both algorithms."
Yes to both - let's summarize.
The "Big O" method discussed above refers to the worst case performance as Mark mentioned above. The "speedtest" you mention would be a way to estimate "average case performance". In practice, there may be a BIG difference between worst case performance and average case performance. This is why your question is interesting and useful.
Worst case performance has been the classical way of defining and classifying algorithm performance. More recently, research has been more concerned with average case performance or more precisely performance bounds like: 99% of the problems will require less than N operations. You can imagine why the second case is far more practical for most problems.
Depending on the application, you might have very different requirements. One application may require response time to be less than 3 seconds 95% of the time - this would lead to defining performance bounds. Another might require performance to NEVER exceed 5 seconds - this would lead to analyzing worst case performance.
In both cases this is taught at the university or grad school level. Anyone developing new algorithms used in real-time applications should learn about the difference between average and worst case performance and should also be prepared to develop simulations and analysis of algorithm performance as part of an implementation process.
Hope this helps.
Big O notation give you the complexity of an algoritm in the worst case, and is mainly usefull to know how the algoritm will grow in execution time when the ammount of data that have to proccess grow up. For example (C-style syntax, this is not important):
List<int> ls = new List<int>(); (1) O(1)
for (int i = 0; i < ls.count; i++) (2) O(1)
foo(i); (3) O(log n) (efficient function)
Cost analysis:
(1) cost: O(1), constant cost
(2) cost: O(1), (int i = 0;)
O(1), (i < ls.count)
O(1), (i++)
---- total: O(1) (constant cost), but it repeats n times (ls.count)
(3) cost: O(log n) (assume it, just an example),
but it repeats n times as it is inside the loop
So, in asymptotic notation, it will have a cost of: O(n log n) (not as efficient) wich in this example is a reasonable result, but take this example:
List<int> ls = new List<int>(); (1) O(1)
for (int i = 0; i < ls.count; i++) (2) O(1)
if ( (i mod 2) == 0) ) (*) O(1) (constant cost)
foo(i); (3) O(log n)
Same algorithm but with a little new line with a condition. In this case asymptotic notation will chose the worst case and will conclude same results as above O(n log n), when is easily detectable that the (3) step will execute only half the times.
Data an so are only examples and may not be exact, just trying to illustrate the behaviour of the Big O notation. It mainly gives you the behaviour of your algoritm when data grow up (you algoritm will be linear, exponencial, logarithmical, ...), but this is not what everybody knows as "efficiency", or almost, this is not the only "efficiency" meaning.
However, this methot can detect "impossible of process" (sorry, don't know the exact english word) algoritms, this is, algoritms that will need a gigantic amount of time to be processed in its early steps (think in factorials, for example, or very big matix).
If you want a real world efficiency study, may be you prefere catching up some real world data and doing a real world benchmark of the beaviour of you algoritm with this data. It is not a mathematical style, but it will be more precise in the majority of cases (but not in the worst case! ;) ).
Hope this helps.
Assuming speed (not memory) is your primary concern, and assuming you want an empirical (not theoretical) way to compare algorithms, I would suggest you prepare several datasets differing in size by a wide margin, like 3 orders of magnitude. Then run each algorithm against every dataset, clock them, and plot the results. The shape of each algorithm's time vs. dataset size curve will give a good idea of its big-O performance.
Now, if the size of your datasets in practice are pretty well known, an algorithm with better big-O performance is not necessarily faster. To determine which algorithm is faster for a given dataset size, you need to tune each one's performance until it is "as fast as possible" and then see which one wins. Performance tuning requires profiling, or single-stepping at the instruction level, or my favorite technique, stackshots.
As others have pointed out rightfully a common way is to use the Big O-notation.
But, the Big O is only good as long as you consider processing performance of algorithms that are clearly defined and scoped (such as a bubble sort).
It's when other hardware resources or other running software running in parallell comes into play that the part called engineering kicks in. The hardware has its constraints. Memory and disk are limited resources. Disk performance even depend on the mechanics involved.
An operating system scheduler will for instance differentiate on I/O bound and CPU bound resources to improve the total performance for a given application. A DBMS will take into account disk reads and writes, memory and CPU usage and even networking in the case of clusters.
These things are hard to prove mathematically but are often easily benchmarked against a set of usage patterns.
So I guess the answer is that developers both use theoretical methods such as Big O and benchmarking to determine the speed of algorithms and its implementations.
This is usually expressed with big O notation. Basically you pick a simple function (like n2 where n is the number of elements) that dominates the actual number of iterations.

When does Big-O notation fail?

What are some examples where Big-O notation[1] fails in practice?
That is to say: when will the Big-O running time of algorithms predict algorithm A to be faster than algorithm B, yet in practice algorithm B is faster when you run it?
Slightly broader: when do theoretical predictions about algorithm performance mismatch observed running times? A non-Big-O prediction might be based on the average/expected number of rotations in a search tree, or the number of comparisons in a sorting algorithm, expressed as a factor times the number of elements.
Clarification:
Despite what some of the answers say, the Big-O notation is meant to predict algorithm performance. That said, it's a flawed tool: it only speaks about asymptotic performance, and it blurs out the constant factors. It does this for a reason: it's meant to predict algorithmic performance independent of which computer you execute the algorithm on.
What I want to know is this: when do the flaws of this tool show themselves? I've found Big-O notation to be reasonably useful, but far from perfect. What are the pitfalls, the edge cases, the gotchas?
An example of what I'm looking for: running Dijkstra's shortest path algorithm with a Fibonacci heap instead of a binary heap, you get O(m + n log n) time versus O((m+n) log n), for n vertices and m edges. You'd expect a speed increase from the Fibonacci heap sooner or later, yet said speed increase never materialized in my experiments.
(Experimental evidence, without proof, suggests that binary heaps operating on uniformly random edge weights spend O(1) time rather than O(log n) time; that's one big gotcha for the experiments. Another one that's a bitch to count is the expected number of calls to DecreaseKey).
[1] Really it isn't the notation that fails, but the concepts the notation stands for, and the theoretical approach to predicting algorithm performance. </anti-pedantry>
On the accepted answer:
I've accepted an answer to highlight the kind of answers I was hoping for. Many different answers which are just as good exist :) What I like about the answer is that it suggests a general rule for when Big-O notation "fails" (when cache misses dominate execution time) which might also increase understanding (in some sense I'm not sure how to best express ATM).
It fails in exactly one case: When people try to use it for something it's not meant for.
It tells you how an algorithm scales. It does not tell you how fast it is.
Big-O notation doesn't tell you which algorithm will be faster in any specific case. It only tells you that for sufficiently large input, one will be faster than the other.
When N is small, the constant factor dominates. Looking up an item in an array of five items is probably faster than looking it up in a hash table.
Short answer: When n is small. The Traveling Salesman Problem is quickly solved when you only have three destinations (however, finding the smallest number in a list of a trillion elements can last a while, although this is O(n). )
the canonical example is Quicksort, which has a worst time of O(n^2), while Heapsort's is O(n logn). in practice however, Quicksort is usually faster then Heapsort. why? two reasons:
each iteration in Quicksort is a lot simpler than Heapsort. Even more, it's easily optimized by simple cache strategies.
the worst case is very hard to hit.
But IMHO, this doesn't mean 'big O fails' in any way. the first factor (iteration time) is easy to incorporate into your estimates. after all, big O numbers should be multiplied by this almost-constant facto.
the second factor melts away if you get the amortized figures instead of average. They can be harder to estimate, but tell a more complete story
One area where Big O fails is memory access patterns. Big O only counts operations that need to be performed - it can't keep track if an algorithm results in more cache misses or data that needs to be paged in from disk. For small N, these effects will typically dominate. For instance, a linear search through an array of 100 integers will probably beat out a search through a binary tree of 100 integers due to memory accesses, even though the binary tree will most likely require fewer operations. Each tree node would result in a cache miss, whereas the linear search would mostly hit the cache for each lookup.
Big-O describes the efficiency/complexity of the algorithm and not necessarily the running time of the implementation of a given block of code. This doesn't mean Big-O fails. It just means that it's not meant to predict running time.
Check out the answer to this question for a great definition of Big-O.
For most algorithms there is an "average case" and a "worst case". If your data routinely falls into the "worst case" scenario, it is possible that another algorithm, while theoretically less efficient in the average case, might prove more efficient for your data.
Some algorithms also have best cases that your data can take advantage of. For example, some sorting algorithms have a terrible theoretical efficiency, but are actually very fast if the data is already sorted (or nearly so). Another algorithm, while theoretically faster in the general case, may not take advantage of the fact that the data is already sorted and in practice perform worse.
For very small data sets sometimes an algorithm that has a better theoretical efficiency may actually be less efficient because of a large "k" value.
One example (that I'm not an expert on) is that simplex algorithms for linear programming have exponential worst-case complexity on arbitrary inputs, even though they perform well in practice. An interesting solution to this is considering "smoothed complexity", which blends worst-case and average-case performance by looking at small random perturbations of arbitrary inputs.
Spielman and Teng (2004) were able to show that the shadow-vertex simplex algorithm has polynomial smoothed complexity.
Big O does not say e.g. that algorithm A runs faster than algorithm B. It can say that the time or space used by algorithm A grows at a different rate than algorithm B, when the input grows. However, for any specific input size, big O notation does not say anything about the performance of one algorithm relative to another.
For example, A may be slower per operation, but have a better big-O than B. B is more performant for smaller input, but if the data size increases, there will be some cut-off point where A becomes faster. Big-O in itself does not say anything about where that cut-off point is.
The general answer is that Big-O allows you to be really sloppy by hiding the constant factors. As mentioned in the question, the use of Fibonacci Heaps is one example. Fibonacci Heaps do have great asymptotic runtimes, but in practice the constants factors are way too large to be useful for the sizes of data sets encountered in real life.
Fibonacci Heaps are often used in proving a good lower bound for asymptotic complexity of graph-related algorithms.
Another similar example is the Coppersmith-Winograd algorithm for matrix multiplication. It is currently the algorithm with the fastest known asymptotic running time for matrix multiplication, O(n2.376). However, its constant factor is far too large to be useful in practice. Like Fibonacci Heaps, it's frequently used as a building block in other algorithms to prove theoretical time bounds.
This somewhat depends on what the Big-O is measuring - when it's worst case scenarios, it will usually "fail" in that the runtime performance will be much better than the Big-O suggests. If it's average case, then it may be much worse.
Big-O notation typically "fails" if the input data to the algorithm has some prior information. Often, the Big-O notation refers to the worst case complexity - which will often happen if the data is either completely random or completely non-random.
As an example, if you feed data to an algorithm that's profiled and the big-o is based on randomized data, but your data has a very well-defined structure, your result times may be much faster than expected. On the same token, if you're measuring average complexity, and you feed data that is horribly randomized, the algorithm may perform much worse than expected.
Small N - And for todays computers, 100 is likely too small to worry.
Hidden Multipliers - IE merge vs quick sort.
Pathological Cases - Again, merge vs quick
One broad area where Big-Oh notation fails is when the amount of data exceeds the available amount of RAM.
Using sorting as an example, the amount of time it takes to sort is not dominated by the number of comparisons or swaps (of which there are O(n log n) and O(n), respectively, in the optimal case). The amount of time is dominated by the number of disk operations: block writes and block reads.
To better analyze algorithms which handle data in excess of available RAM, the I/O-model was born, where you count the number of disk reads. In that, you consider three parameters:
The number of elements, N;
The amount of memory (RAM), M (the number of elements that can be in memory); and
The size of a disk block, B (the number of elements per block).
Notably absent is the amount of disk space; this is treated as if it were infinite. A typical extra assumption is that M > B2.
Continuing the sorting example, you typically favor merge sort in the I/O case: divide the elements into chunks of size θ(M) and sort them in memory (with, say, quicksort). Then, merge θ(M/B) of them by reading the first block from each chunk into memory, stuff all the elements into a heap, and repeatedly pick the smallest element until you have picked B of them. Write this new merge block out and continue. If you ever deplete one of the blocks you read into memory, read a new block from the same chunk and put it into the heap.
(All expressions should be read as being big θ). You form N/M sorted chunks which you then merge. You merge log (base M/B) of N/M times; each time you read and write all the N/B blocks, so it takes you N/B * (log base M/B of N/M) time.
You can analyze in-memory sorting algorithms (suitably modified to include block reads and block writes) and see that they're much less efficient than the merge sort I've presented.
This knowledge is courtesy of my I/O-algorithms course, by Arge and Brodal (http://daimi.au.dk/~large/ioS08/); I also conducted experiments which validate the theory: heap sort takes "almost infinite" time once you exceed memory. Quick sort becomes unbearably slow, merge sort barely bearably slow, I/O-efficient merge sort performs well (the best of the bunch).
I've seen a few cases where, as the data set grew, the algorithmic complexity became less important than the memory access pattern. Navigating a large data structure with a smart algorithm can, in some cases, cause far more page faults or cache misses, than an algorithm with a worse big-O.
For small n, two algorithms may be comparable. As n increases, the smarter algorithm outperforms. But, at some point, n grows big enough that the system succumbs to memory pressure, in which case the "worse" algorithm may actually perform better because the constants are essentially reset.
This isn't particularly interesting, though. By the time you reach this inversion point, the performance of both algorithms is usually unacceptable, and you have to find a new algorithm that has a friendlier memory access pattern AND a better big-O complexity.
This question is like asking, "When does a person's IQ fail in practice?" It's clear that having a high IQ does not mean you'll be successful in life and having a low IQ does not mean you'll perish. Yet, we measure IQ as a means of assessing potential, even if its not an absolute.
In algorithms, the Big-Oh notation gives you the algorithm's IQ. It doesn't necessarily mean that the algorithm will perform best for your particular situation, but there's some mathematical basis that says this algorithm has some good potential. If Big-Oh notation were enough to measure performance you would see a lot more of it and less runtime testing.
Think of Big-Oh as a range instead of a specific measure of better-or-worse. There's best case scenarios and worst case scenarios and a huge set of scenarios in between. Choose your algorithms by how well they fit within the Big-Oh range, but don't rely on the notation as an absolute for measuring performance.
When your data doesn't fit the model, big-o notation will still work, but you're going to see an overlap from best and worst case scenarios.
Also, some operations are tuned for linear data access vs. random data access, so one algorithm while superior in terms of cycles, might be doggedly slow if the method of calling it changes from design. Similarly, if an algorithm causes page/cache misses due to the way it access memory, Big-O isn't going to going to give an accurate estimate of the cost of running a process.
Apparently, as I've forgotten, also when N is small :)
The short answer: always on modern hardware when you start using a lot of memory. The textbooks assume memory access is uniform, and it is no longer. You can of course do Big O analysis for a non-uniform access model, but that is somewhat more complex.
The small n cases are obvious but not interesting: fast enough is fast enough.
In practice I've had problems using the standard collections in Delphi, Java, C# and Smalltalk with a few million objects. And with smaller ones where the dominant factor proved to be the hash function or the compare
Robert Sedgewick talks about shortcomings of the big-O notation in his Coursera course on Analysis of Algorithms. He calls particularly egregious examples galactic algorithms because while they may have a better complexity class than their predecessors, it would take inputs of astronomical sizes for it to show in practice.
https://www.cs.princeton.edu/~rs/talks/AlgsMasses.pdf
Big O and its brothers are used to compare asymptotic mathematical function growth. I would like to emphasize on the mathematical part. Its entirely about being able reduce your problem to a function where the input grows a.k.a scales. It gives you a nice plot where your input (x axis) related to the number of operations performed(y-axis). This is purely based on the mathematical function and as such requires us to accurately model the algorithm used into a polynomial of sorts. Then the assumption of scaling.
Big O immediately loses its relevance when the data is finite, fixed and constant size. Which is why nearly all embedded programmers don't even bother with big O. Mathematically this will always come out to O(1) but we know that we need to optimize our code for space and Mhz timing budget at a level that big O simply doesn't work. This is optimization is on the same order where the individual components matter due to their direct performance dependence on the system.
Big O's other failure is in its assumption that hardware differences do not matter. A CPU that has a MAC, MMU and/or a bit shift low latency math operations will outperform some tasks which may be falsely identified as higher order in the asymptotic notation. This is simply because of the limitation of the model itself.
Another common case where big O becomes absolutely irrelevant is where we falsely identify the nature of the problem to be solved and end up with a binary tree when in reality the solution is actually a state machine. The entire algorithm regimen often overlooks finite state machine problems. This is because a state machine complexity grows based on the number of states and not the number of inputs or data which in most cases are constant.
The other aspect here is the memory access itself which is an extension of the problem of being disconnected from hardware and execution environment. Many times the memory optimization gives performance optimization and vice-versa. They are not necessarily mutually exclusive. These relations cannot be easily modeled into simple polynomials. A theoretically bad algorithm running on heap (region of memory not algorithm heap) data will usually outperform a theoretically good algorithm running on data in stack. This is because there is a time and space complexity to memory access and storage efficiency that is not part of the mathematical model in most cases and even if attempted to model often get ignored as lower order terms that can have high impact. This is because these will show up as a long series of lower order terms which can have a much larger impact when there are sufficiently large number of lower order terms which are ignored by the model.
Imagine n3+86n2+5*106n2+109n
It's clear that the lower order terms that have high multiples will likely together have larger significance than the highest order term which the big O model tends to ignore. It would have us ignore everything other than n3. The term "sufficiently large n' is completely abused to imagine unrealistic scenarios to justify the algorithm. For this case, n has to be so large that you will run out of physical memory long before you have to worry about the algorithm itself. The algorithm doesn't matter if you can't even store the data. When memory access is modeled in; the lower order terms may end up looking like the above polynomial with over a 100 highly scaled lower order terms. However for all practical purposes these terms are never even part of the equation that the algorithm is trying to define.
Most scientific notations are generally the description of mathematical functions and used to model something. They are tools. As such the utility of the tool is constrained and only as good as the model itself. If the model cannot describe or is an ill fit to the problem at hand, then the model simply doesn't serve the purpose. This is when a different model needs to be used and when that doesn't work, a direct approach may serve your purpose well.
In addition many of the original algorithms were models of Turing machine that has a completely different working mechanism and all computing today are RASP models. Before you go into big O or any other model, ask yourself this question first "Am I choosing the right model for the task at hand and do I have the most practically accurate mathematical function ?". If the answer is 'No', then go with your experience, intuition and ignore the fancy stuff.

Resources