Runtime of Algorithms In Various Programming Languages - algorithm

How would the run time of algorithms differ in various programming languages? For example, assuming that an algorithm's run time is stated as Θ(n2) , would it differ in any way if it were ran in Haskell, Java, C, etc.?
EDIT: My question has been answered, but I would still like additional input if you guys have the time to spare. My professor is letting us do our own programming project for the quarter and this was an idea I came up with and I just want to make sure that there's enough for me to discuss. I am having second thoughts on this so if anyone else would like to suggest an idea(s) or modify/build upon this one, I'd greatly appreciate it.

The time complexity of an algorithm is heavily tied to the specific way in which it's implemented. For example, Dijkstra's algorithm will run in time O(m + n log n) if you use a Fibonacci heap, takes time O(m log n) if you use a binary heap, and takes time O(n2) if you use an array-based priority queue.
I mention this because if you say that a particular algorithm runs in time Θ(n2), you're saying that the algorithm, when implemented with specific data structures in a specific way, will run in time Θ(n2). Provided that you can faithfully implement the appropriate data structures and other basic operations in the programming language of your choice, the runtime will then be Θ(n2). The constant factors might vary greatly based on the particular language and the particular way in which that Θ(n2) algorithm is translated into the language, but there's no fundamental reason that an algorithm in C should run asymptotically faster than the same algorithm in Java, provided that it can also be represented in Java.
That said, certain programming languages might not support, or at least efficiently support, certain operations. Purely functional Haskell code, ignoring various monadic tricks, doesn't support variable assignment and mutation, so many algorithms that run very quickly in an imperative model won't work efficiently in Haskell. There's been a lot of research into purely functional algorithms and data structures, and there's a result (IIRC) that says that any imperative algorithm can be converted into a functional algorithm that has a slowdown of O(log n). In some cases, there's no slowdown at all (for example, in purely functional binomial heaps or red/black trees). If you look at Haskell and then allow for state using monadic tricks, then to the best of my knowledge there's no slowdown because the underlying implementation can optimize the code directly into an imperative equivalent.
As another example, single-threaded programming languages might not be able to implement parallel algorithms, so even if the parallel algorithm runs in, say, time Θ(f(n)), the best you might be able to do in a language without threads might be ω(f(n)). Languages that don't support bit manipulations (if they exist) might not be able to take advantage of certain bit-hacky tricks that shave O(log n) factors off of certain types of algorithms.
You sometimes do in practice see a slowdown when implementing algorithms in different programming languages because there are subtle differences between how those programming languages implement certain structures. For example, if you're not used to C++, it's easy to inadvertently have a lot of objects passed by value rather than by pointer or reference, adding in an extra cost due to the cost of copying objects. If you don't already know that the std::map implementation is usually some balanced BST, you might end up introducing an extra O(log n) factor due to lookup costs. That said, this is less about the specific language than it is about the specific implementation within that language.
Hope this helps!

Related

When should one implement a simple or advanced sorting algorithm?

Apart from the obvious "It's faster when there are many elements". When is it more appropriate to use a simple sorting algorithm (0(N^2)) compared to an advanced one (O(N log N))?
I've read quite a bit about for example insertion sort being preferred when you've got a small array that's nearly sorted because you get the best case N. Why is it not good to use quicksort for example, when you've got say 20 elements. Not just insertion or quick but rather when and why is a more simple algorithm useful compared to an advanced?
EDIT: If we're working with for example an array, does it matter which data input we have? Such as objects or primitive types (Integer).
The big-oh notation captures the runtime cost of the algorithm for large values of N. It is less effective at measuring the runtime of the algorithm for small values.
The actual transition from one algorithm to another is not a trivial thing. For large N, the effects of N really dominate. For small numbers, more complex effects become very important. For example, some algorithms have better cache coherency. Others are best when you know something about the data (like your example of insertion sort when the data is nearly sorted).
The balance also changes over time. In the past, CPU speeds and memory speeds were closer together. Cache coherency issues were less of an issue. In modern times, CPU speeds have generally left memory busses behind, so cache coherency is more important.
So there's no one clear cut and dry answer to when you should use one algorithm over another. The only reliable answer is to profile your code and see.
For amusement: I was looking at the dynamic disjoint forest problem a few years back. I came across a state-of-the-art paper that permitted some operations to be done in something silly like O(log log N / log^4N). They did some truly brilliant math to get there, but there was a catch. The operations were so expensive that, for my graphs of 50-100 nodes, it was far slower than the O(n log n) solution that I eventually used. The paper's solution was far more important for people operating on graphs of 500,000+ nodes.
When programming sorting algorithms, you have to take into account how much work would be put into implementing the actual algorithm vs the actual speed of it. For big O, the time to implement advanced algorithms would be outweighed by the decreased time taken to sort. For small O, such as 20-100 items, the difference is minimal, so taking a simpler route is much better.
First of all O-Notation gives you the sense of the worst case scenario. So in case the array is nearly sorted the execution time could be near to linear time so it would be better than quick sort for example.
In case the n is small enough, we do take into consideration other aspects. Algorithms such as Quick-sort can be slower because of all the recursions called. At that point it depends on how the OS handles the recursions which can end up being slower than the simple arithmetic operations required in the insertion-sort. And not to mention the additional memory space required for recursive algorithms.
Better than 99% of the time, you should not be implementing a sorting algorithm at all.
Instead use a standard sorting algorithm from your language's standard library. In one line of code you get to use a tested and optimized implementation which is O(n log(n)). It likely implements tricks you wouldn't have thought of.
For external sorts, I've used the Unix sort utility from time to time. Aside from the non-intuitive LC_ALL=C environment variable that I need to get it to behave, it is very useful.
Any other cases where you actually need to implement your own sorting algorithm, what you implement will be driven by your precise needs. I've had to deal with this exactly once for production code in two decades of programming. (That was because for a complex series of reasons, I needed to be sorting compressed data on a machine which literally did not have enough disk space to store said data uncompressed. I used a merge sort.)

Is (pure) functional programming antagonistic with "algorithm classics"?

The classic algorithm books (TAOCP, CLR) (and not so classic ones, such as the fxtbook)are full of imperative algorithms. This is most obvious with algorithms whose implementation is heavily based on arrays, such as combinatorial generation (where both array index and array value are used in the algorithm) or the union-find algorithm.
The worst-case complexity analysis of these algorithms depends on array accesses being O(1). If you replace arrays with array-ish persistent structures, such as Clojure does, the array accesses are no longer O(1), and the complexity analysis of those algorithms is no longer valid.
Which brings me to the following questions: is pure functional programming incompatible with the classical algorithms literature?
With respect to data structures, Chris Okasaki has done substantial research into adopting classic data structures into a purely functional setting, as many of the standard data structures no longer work when using destructive updates. His book "Purely Functional Data Structures" shows how some structures, like binomial heaps and red/black trees, can be implemented quite well in a functional setting, while other basic structures like arrays and queues must be implemented with more elaborate concepts.
If you're interested in pursuing this branch of the core algorithms, his book would be an excellent starting point.
The short answer is that, so long as the algorithm does not have effects that can be observed after it finishes (other than what it returns), then it is pure. This holds even when you do things like destructive array updates or mutation.
If you had an algorithm like say:
function zero(array):
ix <- 0
while(ix < length(array)):
array[ix] <- 0
ix <- ix+1
return array
Assuming our pseudocode above is lexically scoped, so long as the array parameter is first copied over and the returned array is a wholly new thing, this algorithm represents a pure function (in this case, the Haskell function fmap (const 0) would probably work). Most "imperative" algorithms shown in books are really pure functions, and it is perfectly fine to write them that way in a purely functional setting using something like ST.
I would recommend looking at Mercury or the Disciple Disciplined Compiler to see pure languages that still thrive on destruction.
You may be interested in this related question: Efficiency of purely functional programming.
is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?
It is not. But it is true that one can see in many book algorithms that look like they are only usable in imperative languages. The main reason is that pure functional programming was restrained to academic use for a long time. Then, the authors of these algorithms strongly relied on imperative features to be in the mainstream. Now, consider two widely spread algorithms: quick sort and merge sort. Quick sort is more "imperative" than merge sort; one of its advantage is to be in place. Merge sort is more "pure" than quick sort (in some way) since it needs to copy and keep its data persistent. Actually many algorithm can be implemented in pure functional programming without losing too much efficiency. This is true for many algorithms in the famous Dragon Book for example.

Complexity of algorithms of different programming paradigms

I know that most programming languages are Turing complete, but I wonder whether a problem can be resolved with an algorithm of the same complexity with any programming language (and in particular with any programming paradigm).
To make my answer more explicit with an example: is there any problem which can be resolved with an imperative algorithm of complexity x (say O(n)), but cannot be resolved by a functional algorithm with the same complexity (or vice versa)?
Edit: The algorithm itself can be different. The question is about the complexity of solving the problem -- using any approach available in the language.
In general, no, not all algorithms can be implemented with the same order of complexity in all languages. This can be trivially proven, for instance, with a hypothetical language that disallows O(1) access to an array. However, there aren't any algorithms (to my knowledge) that cannot be implemented with the optimal order of complexity in a functional language. The complexity analysis of an algorithm's pseudocode makes certain assumptions about what operations are legal, and what operations are O(1). If you break one of those assumptions, you can alter the complexity of the algorithm's implementation even though the language is Turing complete. Turing-completeness makes no guarantees regarding the complexity of any operation.
An algorithm has a measured runtime such as O(n) like you said, implementations of an algorithm must adhere to that same runtime or they do not implement the algorithm. The language or implementation does not by definition change the algorithm and thus does not change the asymptotic runtime.
That said certain languages and technologies might make expressing the algorithm easier and offer constant speedups (or slowdowns) due to how the language gets compiled or executed.
I think your first paragraph is wrong. And I think your edit doesn't change that.
Assuming you are requiring that the observed behaviour of an implementation conforms to the time complexity of the algorithm, then...
When calculating the complexity of an algorithm assumptions are made about what operations are constant time. These assumptions are where you're going to find your clues.
Some of the more common assumptions are things like constant time array access, function calls, and arithmetic operations.
If you cannot provide those operations in a language in constant time you cannot reproduce the algorithm in a way that preserves the time complexity.
Reasonable languages can break those assumptions, and sometimes have to if they want to deal with, say, immutable data structures with shared state, concurrency, etc.
For example, Clojure uses trees to represent Vectors. This means that access is not constant time (I think it's log32 of the size of the array, but that's not constant even though it might as well be).
You can easily imagine a language having to do complicated stuff at runtime when calling a function. For example, deciding which one was meant.
Once upon a time floating point and multi-word integer multiplication and division were sadly not constant time (they were implemented in software). There was a period during which languages transitioned to using hardware when very reasonable language implementations behaved very differently.
I'm also pretty sure you can come up with algorithms that fare very poorly in the world of immutable data structures. I've seen some optimisation algorithms that would be horribly difficult, maybe impossible or effectively so, to implement while dealing immutability without breaking the time complexity.
For what it's worth, there are algorithms out there that assume set union and intersection are constant time... good luck implementing those algorithms in constant time. There are also algorithms that use an 'oracle' that can answer questions in constant time... good luck with those too.
I think that a language can have different basilar operations that cost O(1), for example mathematical operations (+, -, *, /), or variable/array access (a[i]), function call and everything you can think.
If a language do not have one of this O(1) operations (as brain bending that do not have O(1) array access) it can not do everything C can do with same complexity, but if a language have more O(1) operations (for example a language with O(1) array search) it can do more than C.
I think that all "serious" language have the same basilar O(1) operations, so they can resolve problem with same complexity.
If you consider Brainfuck or the Turing machine itself, there is one fundamental operation, that takes O(n) time there, although in most other languages it can be done in O(1) – indexing an array.
I'm not completely sure about this, but I think you can't have true array in functional programming either (having O(1) “get element at position” and O(1) “set element at position”). Because of immutability, you can either have a structure that can change quickly, but accessing it takes time or you will have to copy large parts of the structure on every change to get fast access. But I guess you could cheat around that using monads.
Looking at things like functional versus imperative, I doubt you'll find any real differences.
Looking at individual languages and implementations is a different story though. Ignoring, for the moment, the examples from Brainfuck and such, there are still some pretty decent examples to find.
I still remember one example from many years ago, writing APL (on a mainframe). The task was to find (and eliminate) duplicates in a sorted array of numbers. At the time, most of the programming I'd done was in Fortran, with a few bits and pieces in Pascal (still the latest and greatest thing at the time) or BASIC. I did what seemed obvious: wrote a loop that stepped through the array, comparing array[i] to array[i+1], and keeping track of a couple of indexes, copying each unique element back the appropriate number of places, depending on how many elements had already been eliminated.
While this would have worked quite well in the languages to which I was accustomed, it was barely short of a disaster in APL. The solution that worked a lot better was based more on what was easy in APL than computational complexity. Specifically, what you did was compare the first element of the array with the first element of the array after it had been "rotated" by one element. Then, you either kept the array as it was, or eliminated the last element. Repeat that until you'd gone through the whole array (as I recall, detected when the first element was smaller than the first element in the rotated array).
The difference was fairly simple: like most APL implementations (at least at the time), this one was a pure interpreter. A single operation (even one that was pretty complex) was generally pretty fast, but interpreting the input file took quite a bit of time. The improved version was much shorter and faster to interpret (e.g., APL provides the "rotate the array" thing as a single, primitive operation so that was only a character or two to interpret instead of a loop).

Algorithms: explanation about complexity and optimization

I'm trying to find a source or two on the web that explain these in simple terms. Also, can this notion be used in a practical fashion to improve an algorithm? If so, how? Below is a brief explanation I got from a contact.
I dont know where you can find simple
explanation. But i try to explain you.
Algorithm complexity is a term, that
explain dependence between input
information and resourses that need to
process it. For example, if you need
to find max in array, you should
enumerate all elements and compare it
with your variable(e.g. max). Suppose
that there are N elements in array.
So, complexity of this algorithm is
O(N), because we enumerate all
elements for one time. If we enumerate
all elements for 2 times, complexity
will be O(N*N). Binary search has
complexity O(log2(N)), because its
decrease area of search by a half
during every step. Also, you can
figure out a space complexity of your
algorithm - dependence between space,
required by program, and amount of
input data.
It's not easy to say all things about complexity, but I think wiki has a good explanation on it and for startup is good, see:
Big O notation for introducing
this aspect (Also you can look at
teta and omega notations too).
Analysis of algorithm, to know
about complexity more.
And Computational Complexity,
which is a big theory in computer
science.
and about optimization you can look at web and wiki to find it, but with five line your friends give a good sample for introduction, but these are not one night effort for understanding their usage, calculation, and thousand of theory.
In all you can be familiar with them as needed, reading wiki, more advance reading books like Gary and Johnson or read Computation Complexity, a modern approach, but do not expect you know everything about them after that. Also you can see this lecture notes: http://www.cs.rutgers.edu/~allender/lecture.notes/.
As your friend hinted, this isn't a simple topic. But it is worth investing some time to learn. Check out this book, commonly used as a textbook in CS courses on algorithms.
The course reader used in Stanford's introductory programming classes has a great chapter on algorithmic analysis by legendary CS educator Eric Roberts. The whole text is online at this link, and Chapter 8 might be worth looking at.
You can watch Structure and Interpretation of computer programs. It's a nice MIT course.
Also, can this notion be used in a practical fashion to improve an algorithm? If so, how?
It's not so much used for improving an algorithm but evaluating the performance of algorithms and deciding on which algorithm you choose to use. For any given problem, you really want to avoid algorithms that has O(N!) or O(N^x) since they slow down dramatically when the size of N (your input) increases. What you want is O(N) or O(log(N)) or even better O(1).
O(1) is constant time which means the algorithm takes the same amount of time to execute for a million inputs as it does for one. O(N) is of course linear which means the time it takes to execute the algorithm increases in proportion to its input.
There are even some problems where any algorithm developed to solve them end up being O(N!). Basically no fast algorithm can be developed to solve the problem completely (this class of problems is known as NP-complete). Once you realize you're dealing with such a problem you can relax your requirements a bit and solve the problem imperfectly by "cheating". These cheats don't necessarily find the optimal solution but instead settle for good enough. My favorite cheats are genetic/evolutionary algorithms and rainbow tables.
Another example of how understanding algorithmic complexity changes how you think about programming is micro-optimizations. Or rather, not doing it. You often see newbies asking questions like is ++x faster than x++. Seasoned programmers mostly don't care and will usually reply the first rule of optimization is: don't.
The more useful answer should be that changing x++ to ++x does not in any way alter your algorithm complexity. The complexity of your algorithm has a much greater impact on the speed of your code than any form of micro-optimization. For example, it is much more productive for you to look at your code and reduce the number of deeply nested for loops than it is to worry about how your compiler turns your code to assembly.
Yet another example is how in games programming speeding up code counter-intuitively involve adding even more code instead of reducing code. The added code are in the form of filters (basically if..else statements) that decides which bit of data need further processing and which can be discarded. Form a micro-optimizer point of view adding code means more instructions for the CPU to execute. But in reality those filters reduce the problem space by discarding data and therefore run faster overall.
By all means, understand data structures, algorithms, and big-O.
Design code carefully and well, keeping it as simple as possible.
But that's not enough.
The key to optimizing is knowing how to find problems, after the code is written.
Here's an example.

How do you show that one algorithm is more efficient than another algorithm?

I'm no professional programmer and I don't study it. I'm an aerospace student and did a numeric method for my diploma thesis and also coded a program to prove that it works.
I did several methods and implemented several algorithms and tried to show the proofs why different situations needed their own algorithm to solve the task.
I did this proof with a mathematical approach, but some algorithm was so specific that I do know what they do and they do it right, but it was very hard to find a mathematical function or something to show how many iterations or loops it has to do until it finishes.
So, I would like to know how you do this comparison. Do you also present a mathematical function, or do you just do a speedtest of both algorithms, and if you do it mathematically, how do you do that? Do you learn this during your university studies, or how?
Thank you in advance, Andreas
The standard way of comparing different algorithms is by comparing their complexity using Big O notation. In practice you would of course also benchmark the algorithms.
As an example the sorting algorithms bubble sort and heap sort has complexity O(n2) and O(n log n) respective.
As a final note it's very hard to construct representative benchmarks, see this interesting post from Christer Ericsson on the subject.
While big-O notation can provide you with a way of distinguishing an awful algorithm from a reasonable algorithm, it only tells you about a particular definition of computational complexity. In the real world, this won't actually allow you to choose between two algorithms, since:
1) Two algorithms at the same order of complexity, let's call them f and g, both with O(N^2) complexity might differ in runtime by several orders of magnitude. Big-O notation does not measure the number of individual steps associated with each iteration, so f might take 100 steps while g takes 10.
In addition, different compilers or programming languages might generate more or less instructions for each iteration of the algorithm, and subtle choices in the description of the algorithm can make cache or CPU hardware perform 10s to 1000s of times worse, without changing either the big-O order, or the number of steps!
2) An O(N) algorithm might outperform an O(log(N)) algorithm
Big-O notation does not measure the number of individual steps associated with each iteration, so if O(N) takes 100 steps, but O(log(N)) takes 1000 steps for each iteration, then for data sets up to a certain size O(N) will be better.
The same issues apply to compilers as above.
The solution is to do an initial mathematical analysis of Big-O notation, followed by a benchmark-driven performance tuning cycle, using time and hardware performance counter data, as well as a good dollop of experience.
Firstly one would need to define what more efficient means, does it mean quicker, uses less system resources (such as memory) etc... (these factors are sometimes mutually exclusive)
In terms of standard definitions of efficiency one would often utilize Big-0 Notation, however in the "real world" outside academia normally one would profile/benchmark both equations and then compare the results
It's often difficult to make general assumptions about Big-0 notation as this is primarily concerned with looping and assumes a fixed cost for the code within a loop so benchmarking would be the better way to go
One caveat to watch out for is that sometimes the result can vary significantly based on the dataset size you're working with - for small N in a loop one will sometimes not find much difference
You might get off easy when there is a significant difference in the asymptotic Big-O complexity class for the worst case or for the expected case. Even then you'll need to show that the hidden constant factors don't make the "better" (from the asymptotic perspective) algorithm slower for reasonably sized inputs.
If difference isn't large, then given the complexity of todays computers, benchmarking with various datasets is the only correct way. You cannot even begin to take into account all of the convoluted interplay that comes from branch prediction accuracy, data and code cache hit rates, lock contention and so on.
Running speed tests is not going to provide you with as good quality an answer as mathematics will. I think your outline approach is correct -- but perhaps your experience and breadth of knowledge let you down when analysing on of your algorithms. I recommend the book 'Concrete Mathematics' by Knuth and others, but there are a lot of other good (and even more not good) books covering the topic of analysing algorithms. Yes, I learned this during my university studies.
Having written all that, most algoritmic complexity is analysed in terms of worst-case execution time (so called big-O) and it is possible that your data sets do not approach worst-cases, in which case the speed tests you run may illuminate your actual performance rather than the algorithm's theoretical performance. So tests are not without their value. I'd say, though, that the value is secondary to that of the mathematics, which shouldn't cause you any undue headaches.
That depends. At the university you do learn to compare algorithms by calculating the number of operations it executes depending on the size / value of its arguments. (Compare analysis of algorithms and big O notation). I would require of every decent programmer to at least understand the basics of that.
However in practice this is useful only for small algorithms, or small parts of larger algorithms. You will have trouble to calculate this for, say, a parsing algorithm for an XML Document. But knowing the basics often keeps you from making braindead errors - see, for instance, Joel Spolskys amusing blog-entry "Back to the Basics".
If you have a larger system you will usually either compare algorithms by educated guessing, making time measurements, or find the troublesome spots in your system by using a profiling tool. In my experience this is rarely that important - fighting to reduce the complexity of the system helps more.
To answer your question: " Do you also present a mathematical function, or do you just do a speedtest of both algorithms."
Yes to both - let's summarize.
The "Big O" method discussed above refers to the worst case performance as Mark mentioned above. The "speedtest" you mention would be a way to estimate "average case performance". In practice, there may be a BIG difference between worst case performance and average case performance. This is why your question is interesting and useful.
Worst case performance has been the classical way of defining and classifying algorithm performance. More recently, research has been more concerned with average case performance or more precisely performance bounds like: 99% of the problems will require less than N operations. You can imagine why the second case is far more practical for most problems.
Depending on the application, you might have very different requirements. One application may require response time to be less than 3 seconds 95% of the time - this would lead to defining performance bounds. Another might require performance to NEVER exceed 5 seconds - this would lead to analyzing worst case performance.
In both cases this is taught at the university or grad school level. Anyone developing new algorithms used in real-time applications should learn about the difference between average and worst case performance and should also be prepared to develop simulations and analysis of algorithm performance as part of an implementation process.
Hope this helps.
Big O notation give you the complexity of an algoritm in the worst case, and is mainly usefull to know how the algoritm will grow in execution time when the ammount of data that have to proccess grow up. For example (C-style syntax, this is not important):
List<int> ls = new List<int>(); (1) O(1)
for (int i = 0; i < ls.count; i++) (2) O(1)
foo(i); (3) O(log n) (efficient function)
Cost analysis:
(1) cost: O(1), constant cost
(2) cost: O(1), (int i = 0;)
O(1), (i < ls.count)
O(1), (i++)
---- total: O(1) (constant cost), but it repeats n times (ls.count)
(3) cost: O(log n) (assume it, just an example),
but it repeats n times as it is inside the loop
So, in asymptotic notation, it will have a cost of: O(n log n) (not as efficient) wich in this example is a reasonable result, but take this example:
List<int> ls = new List<int>(); (1) O(1)
for (int i = 0; i < ls.count; i++) (2) O(1)
if ( (i mod 2) == 0) ) (*) O(1) (constant cost)
foo(i); (3) O(log n)
Same algorithm but with a little new line with a condition. In this case asymptotic notation will chose the worst case and will conclude same results as above O(n log n), when is easily detectable that the (3) step will execute only half the times.
Data an so are only examples and may not be exact, just trying to illustrate the behaviour of the Big O notation. It mainly gives you the behaviour of your algoritm when data grow up (you algoritm will be linear, exponencial, logarithmical, ...), but this is not what everybody knows as "efficiency", or almost, this is not the only "efficiency" meaning.
However, this methot can detect "impossible of process" (sorry, don't know the exact english word) algoritms, this is, algoritms that will need a gigantic amount of time to be processed in its early steps (think in factorials, for example, or very big matix).
If you want a real world efficiency study, may be you prefere catching up some real world data and doing a real world benchmark of the beaviour of you algoritm with this data. It is not a mathematical style, but it will be more precise in the majority of cases (but not in the worst case! ;) ).
Hope this helps.
Assuming speed (not memory) is your primary concern, and assuming you want an empirical (not theoretical) way to compare algorithms, I would suggest you prepare several datasets differing in size by a wide margin, like 3 orders of magnitude. Then run each algorithm against every dataset, clock them, and plot the results. The shape of each algorithm's time vs. dataset size curve will give a good idea of its big-O performance.
Now, if the size of your datasets in practice are pretty well known, an algorithm with better big-O performance is not necessarily faster. To determine which algorithm is faster for a given dataset size, you need to tune each one's performance until it is "as fast as possible" and then see which one wins. Performance tuning requires profiling, or single-stepping at the instruction level, or my favorite technique, stackshots.
As others have pointed out rightfully a common way is to use the Big O-notation.
But, the Big O is only good as long as you consider processing performance of algorithms that are clearly defined and scoped (such as a bubble sort).
It's when other hardware resources or other running software running in parallell comes into play that the part called engineering kicks in. The hardware has its constraints. Memory and disk are limited resources. Disk performance even depend on the mechanics involved.
An operating system scheduler will for instance differentiate on I/O bound and CPU bound resources to improve the total performance for a given application. A DBMS will take into account disk reads and writes, memory and CPU usage and even networking in the case of clusters.
These things are hard to prove mathematically but are often easily benchmarked against a set of usage patterns.
So I guess the answer is that developers both use theoretical methods such as Big O and benchmarking to determine the speed of algorithms and its implementations.
This is usually expressed with big O notation. Basically you pick a simple function (like n2 where n is the number of elements) that dominates the actual number of iterations.

Resources