Preferred Sorting For People Based On Their Age - algorithm

Suppose we have 1 million entries of an object 'Person' with two fields 'Name', 'Age'. The problem was to sort the entries based on the 'Age' of the person.
I was asked this question in an interview. I answered that we could use an array to store the objects and use quicksort, as that would save us from using additional space, but the interviewer told me that memory was not a factor.
My question is what would be the factor that would decide which sort to use?
Also what would be the preferred way to store this?
In this scenario does any sorting algorithm have an advantage over another sorting algorithm and would result in a better complexity?

This Stackoverflow link may be useful to you.
The answers above are sufficient, but I would like to add some more information from the link above.
I am copying some information from the answers in the link above over here.
We should note that even if the fields in the object are very big (i.e. long names), you do not need to use a file-system sort; you can use an in-memory sort, because
(# elements) * 8 bytes ~= 762 MB (most modern systems have enough memory for that),
where the 8 bytes per element are the key (age) plus a pointer to the struct on a 32-bit system.
It is important to minimize the disk accesses, because disks are not random access, and disk accesses are MUCH slower than RAM accesses.
Now, use a sort of your choice on that - and avoid using disk for the sorting process.
Some possibilities of sorts (on RAM) for this case are:
Standard quicksort or merge-sort (Which you had already thought of)
Bucket sort can also be applied here, since the range is limited to [0, 150] (others have described this here under the name counting sort)
Radix sort (for the same reason: radix sort will need ceil(log_2(150)) ~= 8 iterations)
I wanted to point out the memory aspect in case you encounter the same question but need to answer it with memory constraints in mind. In fact your constraints are even smaller (10^6 entries compared to the 10^8 in the other question).
As for the matter of storing it -
The quickest way to sort it would be to allocate 151 linked lists/vectors (let's call them buckets, or whatever you prefer depending on the language) and put each person's data structure into the bucket corresponding to his/her age (all people's ages are between 0 and 150):
bucket[person->age].add(person)
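A minimal sketch of that idea in Python (the Person record type is just an illustrative stand-in; the only real requirement is an integer age field):

from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])   # hypothetical record type for illustration

def sort_by_age(people, max_age=150):
    buckets = [[] for _ in range(max_age + 1)]   # one bucket per possible age, 0..150
    for p in people:
        buckets[p.age].append(p)                 # bucket[person->age].add(person)
    return [p for bucket in buckets for p in bucket]   # concatenate the buckets in order

print(sort_by_age([Person("Ann", 34), Person("Bob", 12), Person("Cleo", 34)]))

Equal ages keep their input order, so this is also a stable sort.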
As others have pointed out Bucket Sort is going to be the better option for you.
In fact the beauty of bucket sort is that if you have to perform any operation on ranges of ages(like from 10-50 years of age) you can partition your bucket sizes according to your requirements(like have varied bucket range for each bucket).
I repeat, I have copied the information from the answers in the link given above, but I believe it may be useful to you.

If the array has n elements, then quicksort (or, actually, any comparison-based sort) is Ω(n log(n)).
Here, though, it looks like you have an alternative to comparison-based sorting, since you need to sort only on age. Suppose there are m distinct ages. In this case, counting sort will be Θ(m + n). For the specifics of your question, assuming that age is in years, m is much smaller than n, and you can do this in linear time.
The implementation is trivial. Simply create an array of, say, 200 entries (200 being an upper bound on the age). The array is of linked lists. Scan over the people, and place each person in the linked list in the appropriate entry. Now, just concatenate the lists according to the positions in the array.

Different sorting algorithms perform at different complexities, yes. Some use different amounts of space. And in practice, real performance with the same complexity varies too.
http://www.cprogramming.com/tutorial/computersciencetheory/sortcomp.html
There are different ways to set up quicksort's partition method that could make a difference for ages. Shell sort can use different gap sequences that perform better for certain types of input. But maybe your interviewer was more interested in you noticing that 1 million people will have a lot of duplicate ages, which might mean you want a 3-way quicksort or, as suggested in the comments, a counting sort.
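For the duplicate-heavy case, the key ingredient is a 3-way (Dutch national flag) partition; here is a rough sketch that sorts a bare list of ages in place (illustrative only, random pivot assumed):

import random

def quicksort3(ages, lo=0, hi=None):
    """3-way quicksort: all keys equal to the pivot are grouped in a single pass,
    which pays off when there are only ~150 distinct ages among a million entries."""
    if hi is None:
        hi = len(ages) - 1
    if lo >= hi:
        return
    pivot = ages[random.randint(lo, hi)]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if ages[i] < pivot:
            ages[lt], ages[i] = ages[i], ages[lt]
            lt, i = lt + 1, i + 1
        elif ages[i] > pivot:
            ages[i], ages[gt] = ages[gt], ages[i]
            gt -= 1
        else:
            i += 1
    quicksort3(ages, lo, lt - 1)
    quicksort3(ages, gt + 1, hi)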

This is an interview question, so I guess the interviewee's reasoning matters more than naming the one correct sorting algorithm. Your problem is sorting an array of objects on an integer age field. Age has some special properties:
integer: there are sorting algorithms specially designed for integers.
finite: you know the maximum age of people, right? Say it is 200.
I will list some sorting algorithms for this problem, with advantages and disadvantages, that are suitable for one interview session:
Quick sort: complexity is O(N log N) and it can be applied to any data set. Quicksort is typically the fastest sort that uses a compare operator between two elements. Its biggest disadvantage is that it isn't stable: two objects with equal ages may not keep their relative order after sorting.
Merge sort: complexity is O(N log N). A little slower than quicksort, but it is a stable sort, and it also applies to any data set.
Radix sort: complexity is O(w*n), where n is the size of your list and w is the maximum number of digits in your dataset. For example, 12 has 2 digits and 154 has 3. So if the maximum age is 99, the complexity is O(2*n). This algorithm only applies to integers or strings.
Counting sort: complexity is O(m+n), where n is the size of your list and m is the number of distinct ages. This algorithm only applies to integers.
Because we are sorting a million entries whose values are integers in the range 0..200, there are tons of duplicate values, so counting sort is the best fit, with complexity O(200 + N), where N ~= 1,000,000; the 200 is negligible.
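A hedged sketch of counting sort for this situation (a generic key function is assumed so it works for any record with an integer age in 0..200); note that it is stable, so people with equal ages keep their original order:

def counting_sort(items, key, max_key=200):
    counts = [0] * (max_key + 1)
    for it in items:
        counts[key(it)] += 1                      # histogram of key values
    starts, total = [0] * (max_key + 1), 0
    for k, c in enumerate(counts):                # prefix sums: first output slot per key
        starts[k], total = total, total + c
    out = [None] * len(items)
    for it in items:                              # place each item; equal keys keep input order
        out[starts[key(it)]] = it
        starts[key(it)] += 1
    return out

print(counting_sort([("Ann", 34), ("Bob", 12), ("Cleo", 34)], key=lambda person: person[1]))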

If you assume that you have a finite number of different values of age (usually people are not older than 100) then you could use
counting sort (https://en.wikipedia.org/wiki/Counting_sort). You would be able to sort in linear time.

Related

Number of comparisons for different lists in sorting algorithms

I've been studying sorting algorithms and had a question about the number of comparisons in each sorting algorithm.
Let's say we have a sorting algorithm (insertion sort, quicksort, anything). Then I want to count the number of comparisons using different files. These files have items that are randomized and not in order. For example, file 1 has 10 items, containing letters a to j. Then we have another file (again, 10 items) containing integers 1 to 10. Then we have another file (10 items), containing float numbers 1.1111111111 to 10.1111111111. If we want to sort these using any sorting algorithm (for the first one we sort in alphabetical order and others from smallest to largest number).
If we count the number of comparisons (in a quicksort algorithm, for example) for each file, would they be the same since we are comparing the same number of items, or does the length of the items change the number of comparisons (a vs 10.1111111)? If they are the same, is that the case for all sorting algorithms (at least the ones I mentioned) or just some? I don't think it's a hard question (sorry), but I'm over-thinking as usual. I'm guessing that they would be the same, but I'm not really sure. Would someone care to explain?
The number of comparisons depends on the initial state, the sorting algorithm, and the specific implementation.
For example:
The implementation could make a first pass to check if the set is already sorted up or down to avoid unnecessary work or even a worst case scenario. This has a small cost but can avoid a pathological case. The number of comparisons will be very different for the same set between an implementation that does and one that does not.
Some implementation choices such as which element to select as a pivot in qsort() will greatly impact the number of comparisons for identical sets.
Even worse: to avoid the quadratic worst case in qsort(), which can be triggered more or less easily as described in Kernighan's anti-qsort paper, one can implement qsort() to make non-deterministic choices of pivot values, using some source of randomness. For such an implementation, the number of comparisons may vary even when sorting the same set repeatedly. Note that this can produce a different order if some elements compare equal, due to qsort()'s instability.
Your teacher's question cannot be answered precisely unless you know both the initial state and the specific implementation of the sorting algorithm. Even best-case and worst-case numbers depend on the implementation details.
You are considering how an algorithm's performance varies with the input. To standardize this kind of problem, three types of performance are defined for every algorithm:
Best Case - Lower Bound on cost
Worst case - Upper Bound on cost
Average case - "Expected cost"
Now if you want the number of comparisons made for a particular input, you can build your own mathematical model; for standardization, though, think in terms of these three cases. Another thing: the number of comparisons does not vary with the type of the input values, but with the order the data is in. That means if you pass already-sorted input to insertion sort, it runs in O(N) with approximately N comparisons; if the input is in reverse order, you hit its worst case.
The standard best/average/worst-case analysis table for these sorts is omitted here; see the Princeton algorithms course for reference.
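To make the dependence on input order concrete, here is a small, purely illustrative experiment that counts insertion sort's comparisons on already-sorted versus reversed input (the counts would shift again with a different algorithm or implementation, e.g. a different pivot rule in quicksort):

def insertion_sort_comparisons(data):
    a, comps = list(data), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comps += 1                        # one comparison per inner-loop test
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return comps

n = 1000
print(insertion_sort_comparisons(range(n)))          # about N-1 on sorted input
print(insertion_sort_comparisons(range(n, 0, -1)))   # about N^2/2 on reversed input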

What sorting techniques can I use when comparing elements is expensive?

Problem
I have an application where I want to sort an array a of elements a_0, a_1, ..., a_(n-1). I have a comparison function cmp(i,j) that compares elements a_i and a_j, and a swap function swap(i,j) that swaps elements a_i and a_j of the array. In the application, execution of the cmp(i,j) function might be extremely expensive, to the point where one execution of cmp(i,j) takes longer than all the other steps in the sort (except for other cmp(i,j) calls, of course) together. You may think of cmp(i,j) as a rather lengthy IO operation.
Please assume for the sake of this question that there is no way to make cmp(i,j) faster. Assume all optimizations that could possibly make cmp(i,j) faster have already been done.
Questions
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
It is possible in my application to write a predicate expensive(i,j) that is true iff a call to cmp(i,j) would take a long time. expensive(i,j) is cheap, and expensive(i,j) ∧ expensive(j,k) → expensive(i,k) mostly holds in my current application. This is not guaranteed though.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I'd like pointers to further material on this topic.
Example
This is an example that is not entirely unlike the application I have.
Consider a set of possibly large files. In this application the goal is to find duplicate files among them. This essentially boils down to sorting the files by some arbitrary criterion and then traversing them in order, outputting sequences of equal files that were encountered.
Of course reading in large amounts of data is expensive, therefore one can, for instance, only read the first megabyte of each file and calculate a hash function on this data. If the files compare equal, so do the hashes, but the reverse may not hold: two large files could differ in only one byte near the end.
The implementation of expensive(i,j) in this case is simply a check whether the hashes are equal. If they are, an expensive deep comparison is necessary.
I'll try to answer each question as best as I can.
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Traditional sorting methods may vary a little, but in general there is a mathematical limit on the minimum number of comparisons necessary to sort a list, and most algorithms take advantage of it, since comparisons are usually not cheap. You could try sorting by something else, or try using a shortcut that approximates the real solution and may be faster.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I don't think you can get around the necessity of doing at least the minimum number of comparisons, but you may be able to change what you compare. If you can compare hashes or subsets of the data instead of the whole thing, that could certainly be helpful. Anything you can do to simplify the comparison operation will make a big difference, but without knowing specific details of the data, it's hard to suggest specific solutions.
I'd like pointers to further material on this topic.
Check these out:
Apparently Donald Knuth's The Art of Computer Programming, Volume 3 has a section on this topic, but I don't have a copy handy.
Wikipedia of course has some insight into the matter.
Sorting an array with minimal number of comparisons
How do I figure out the minimum number of swaps to sort a list in-place?
Limitations of comparison based sorting techniques
The theoretical minimum number of comparisons needed to sort an array of n elements on average is lg (n!), which is about n lg n - n. There's no way to do better than this on average if you're using comparisons to order the elements.
Of the standard O(n log n) comparison-based sorting algorithms, mergesort makes the lowest number of comparisons (just about n lg n, compared with about 1.44 n lg n for quicksort and about n lg n + 2n for heapsort), so it might be a good algorithm to use as a starting point. Typically mergesort is slower than heapsort and quicksort, but that's usually under the assumption that comparisons are fast.
If you do use mergesort, I'd recommend using an adaptive variant of mergesort like natural mergesort so that if the data is mostly sorted, the number of comparisons is closer to linear.
There are a few other options available. If you know for a fact that the data is already mostly sorted, you could use insertion sort or a standard variation of heapsort to try to speed up the sorting. Alternatively, you could use mergesort but use an optimal sorting network as a base case when n is small. This might shave off enough comparisons to give you a noticeable performance boost.
Hope this helps!
A technique called the Schwartzian transform can be used to reduce the cost of sorting: apply a function f once to each input item, where f(x) < f(y) if and only if x < y, and then sort on the precomputed f values.
(Python-oriented answer, when I thought the question was tagged [python])
If you can define a function f such that f(x) < f(y) if and only if x < y, then you can sort using
sorted(L, key=f)
Python guarantees that key is called at most once for each element of the iterable you are sorting. This provides support for the Schwartzian transform.
Python 3 does not support specifying a cmp function, only the key parameter. This page provides a way of easily converting any cmp function to a key function.
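A small illustrative sketch of both routes (the comparator and data here are made up): the key function is evaluated at most once per element, while a cmp-style comparator is invoked on every one of the O(n log n) comparisons.

from functools import cmp_to_key

calls = {"cmp": 0, "key": 0}

def slow_cmp(a, b):          # stand-in for the question's expensive cmp(i, j)
    calls["cmp"] += 1
    return (a > b) - (a < b)

def slow_key(x):             # an order-preserving f: f(x) < f(y) iff x < y
    calls["key"] += 1
    return x

data = [5, 3, 8, 1, 9, 2, 7]
sorted(data, key=cmp_to_key(slow_cmp))   # comparator runs once per comparison
sorted(data, key=slow_key)               # key runs exactly once per element
print(calls)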
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Edit: Ah, sorry. There are algorithms that minimize the number of comparisons (below), but not that I know of for specific elements.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
Not that I know of, but perhaps you'll find it in these papers below.
I'd like pointers to further material on this topic.
On Optimal and Efficient in Place Merging
Stable Minimum Storage Merging by Symmetric Comparisons
Optimal Stable Merging (this one seems to be O(n log^2 n), though)
Practical In-Place Mergesort
If you implement any of them, posting them here might be useful for others too! :)
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Merge insertion algorithm, described in D. Knuth's "The art of computer programming", Vol 3, chapter 5.3.1, uses less comparisons than other comparison-based algorithms. But still it needs O(N log N) comparisons.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I think some existing sorting algorithms can be modified to take the expensive(i,j) predicate into account. Let's take the simplest of them, insertion sort. One of its variants, named binary insertion sort in Wikipedia, uses only O(N log N) comparisons.
It employs a binary search to determine the correct location to insert new elements. We could apply the expensive(i,j) predicate after each binary search step to determine whether it is cheap to compare the inserted element with the "middle" element found in that step. If it is expensive, we could try the "middle" element's neighbors, then their neighbors, etc. If no cheap comparison can be found, we just return to the "middle" element and perform the expensive comparison.
There are several possible optimizations. If the predicate and/or the cheap comparisons are not so cheap, we could roll back to the "middle" element earlier, before all other possibilities are tried. Also, if move operations cannot be considered very cheap, we could use some order-statistics data structure (like an indexable skiplist) to reduce the insertion cost to O(N log N).
This modified insertion sort needs O(N log N) time for data movement, O(N^2) predicate computations and cheap comparisons, and O(N log N) expensive comparisons in the worst case. But more likely there would be only O(N log N) predicates and cheap comparisons and O(1) expensive comparisons.
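For reference, the unmodified binary insertion sort is tiny in Python (this sketch leaves out the expensive(i,j) detour described above; bisect performs the binary search):

import bisect

def binary_insertion_sort(seq):
    out = []                        # O(n log n) comparisons, but O(n^2) element moves worst case
    for x in seq:
        bisect.insort(out, x)       # binary search for the slot, then shift and insert
    return out

print(binary_insertion_sort([5, 3, 8, 1, 3]))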
Consider a set of possibly large files. In this application the goal is to find duplicate files among them.
If the only goal is to find duplicates, I think sorting (at least comparison sorting) is not necessary. You could just distribute the files between buckets depending on the hash value computed for the first megabyte of data from each file. If there is more than one file in some bucket, hash another 10, 100, 1000, ... megabytes. If there is still more than one file in some bucket, compare them byte by byte. Actually this procedure is similar to radix sort.
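A rough sketch of that bucketing idea (the 1 MB prefix and SHA-256 are arbitrary illustrative choices); any group that still shares a bucket afterwards would get the deeper, byte-by-byte check:

import hashlib
from collections import defaultdict

def candidate_duplicates(paths, prefix_bytes=1 << 20):
    buckets = defaultdict(list)               # hash of the first prefix_bytes -> file paths
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read(prefix_bytes)).hexdigest()
        buckets[digest].append(path)
    return [group for group in buckets.values() if len(group) > 1]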
Most sorting algorithms out there try to minimize the number of comparisons during sorting.
My advice:
Pick quicksort as a base algorithm and memoize the results of comparisons in case you happen to compare the same pair again. This should help you in quicksort's O(N^2) worst case. Bear in mind that this will make you use O(N^2) memory.
Now if you are really adventurous you could try the Dual-Pivot quick-sort.
Something to keep in mind is that if you are continuously sorting the list with new additions, and the comparison between two elements is guaranteed to never change, you can memoize the comparison operation which will lead to a performance increase. In most cases this won't be applicable, unfortunately.
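In Python that memoization can be a one-decorator sketch, assuming the elements are hashable and the comparison result never changes (and remembering the O(N^2) memory caveat above):

from functools import lru_cache, cmp_to_key

@lru_cache(maxsize=None)        # caches every pair ever compared: up to O(N^2) entries
def cached_cmp(a, b):
    return (a > b) - (a < b)    # stand-in for an expensive but deterministic comparison

data = [4, 1, 3, 1, 4, 2]
data.sort(key=cmp_to_key(cached_cmp))
print(data, cached_cmp.cache_info().misses, "comparisons actually computed")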
We can look at your problem from another direction. Since your problem seems to be IO-related, you can take advantage of parallel sorting algorithms: run many threads to perform the comparisons on the files, then sort them with one of the best-known parallel algorithms, such as sample sort.
Quicksort and mergesort are as fast as a sorting algorithm can be, asymptotically, unless you have some additional information about the elements you want to sort. They need O(n log(n)) comparisons, where n is the size of your array.
It is mathematically proven that no generic (comparison-based) sorting algorithm can be more efficient than that.
If you want to make the procedure faster, you might consider adding some metadata to accelerate the computation (can't be more precise unless you are, too).
If you know something stronger, such as the existence of a maximum and a minimum, you can use faster sorting algorithms, such as radix sort or bucket sort.
You can look for all the mentioned algorithms on wikipedia.
As far as I know, you can't benefit from the expensive relationship. Even if you know that, you still need to perform such comparisons. As I said, you'd better try and cache some results.
EDIT I took some time to think about it, and I came up with a slightly customized solution that I think will make the minimum possible number of expensive comparisons, but totally disregards the overall number of comparisons. It will make at most (n-m)*log(k) expensive comparisons, where
n is the size of the input vector
m is the number of distinct component which are easy to compare between each other
k is the maximum number of elements which are hard to compare and have consecutive ranks.
Here is the description of the algorithm. It is worth noting that it will perform much worse than a simple merge sort unless m is big and k is little. The total running time is O[n^4 + E(n-m)log(k)], where E is the cost of an expensive comparison (I assumed E >> n to prevent it from being wiped out of the asymptotic notation). That n^4 can probably be further reduced, at least in the mean case.
EDIT The file I posted contained some errors. While trying it, I also fixed them (I had overlooked the pseudocode for the insert_sorted function, but the idea was correct). I made a Java program that sorts a vector of integers, with delays added as you described. Even though I was skeptical, it actually does better than mergesort if the delay is significant (I used a 1 s delay against integer comparisons, which usually take nanoseconds to execute).

Best method for sorting when you can use numeric indices?

Most of the time, we use built-in libraries for sorting, which are generic. But much of the time, too, we are sorting based on numeric indices or other values that can be translated into indices. If I'm not mistaken, sorting numbers is O(n). So why aren't we ever using numeric sorting algorithms at all?
Is there really a need?
I'm not really sure (single) integers (or floating points for that matter, though most numeric sorts require / are efficient for integers) are what is being sorted 'most of the time', thus having some algorithm that only works on integers doesn't seem particularly useful. I say 'single' integers as opposed to (strings or) objects (or equivalent) that contain multiple integers, numbers, strings or whatever else.
Not to mention that (I believe) the bottleneck of any real-world program whose primary purpose is more than just sorting data (well, most of them) should not be sorting 'single' numbers using an O(n log n) sort. You're probably far better off changing the way your data is represented to remove the need for the sort than cutting down on the log n factor.
Numeric sorts
It's a common misperception, but no sorting algorithm (numeric or otherwise) is actually worst-case O(n). There's always some additional parameter that comes into play. For radix sort, the length of the numbers is the determining factor. For long numbers in short arrays, this length can easily be more than log n, resulting in worse performance than an O(n log n) sort (see the test below).
Numeric sorts are useful, and far better than any comparison-based sorting algorithm, provided your data conforms to specific constraints - which it does most (but not all) of the time. By looking at the complexity given in any decent reference, you can usually see what determines whether one will do well - e.g. O(kN) implies that long numbers may make it take a bit longer, while things like dealing well with duplicates are a bit more subtle.
So, why aren't they used?
Without extensive real-world experience / theoretical knowledge, you're unlikely to pick the most efficient algorithm; it's entirely possible that you'll find yourself with a problem where the chosen algorithm, which should be awesome in theory, severely under-performs a standard algorithm for your data because of some subtle factor.
So standard libraries don't put you in the position to pick an incorrect sort and possibly have terrible performance because your data doesn't conform to some constraints. Library sorts tend to be decent all-round, but aren't specialized to specific data sets. Though I'm sure there are also libraries that focus on sorting algorithms, allowing you to pick from an extensive range of algorithms, but your average Joe the programmer probably doesn't want to / shouldn't be exposed to this choice.
Also note, while they aren't commonly included in libraries, it should be easy enough to find / write an implementation of whichever (popular) sort you wish to use ... which you should then benchmark against library sorts on a sufficient sample of your data before committing to it.
A somewhat random test
This is by no means intended to be a conclusive, 100% correct test with the best implementations of radix sort and quick sort to ever see the light of day. It's more to show that what the data looks like plays a large role in the performance of any given algorithm.
This is the only decent benchmark including radix-sort I could find in a few minutes of searching.
I ran the code and found this: (number range 0-2147483646)
(the time unit is related to nano-seconds, which doesn't really translate to seconds)
ArraySize Radix Quick
10 1889 126
100 2871 2702
1000 18227 38075
10000 360623 484128
100000 2306284 6029230
Quick-sort is faster for a large range of numbers and arrays of less than size 100 (exactly what I was saying above). Interesting but nothing really amazing about it. I mean who cares about the performance of sorting less than 100 numbers?
However, look what happened when I changed the number range to 0-99:
ArraySize Radix Quick
10 1937 121
100 8932 2022
1000 29513 14824
10000 236669 125926
100000 2393641 1225715
Quick-sort is consistently around 2x faster than Radix-sort for reasonably-sized arrays (1000-100000 elements).
You must be thinking - "What in the world? I thought radix sort was supposed to be good at these. I mean ... there's only 2 digits. And why is Quick-sort so much faster than in the above case?" Exactly. That's where "extensive real-world experience / theoretical knowledge" comes in. I suspect it relates to how well each algorithm / implementation deals with duplicates. But half of that could be because I might not have optimized the radix sort implementation for the smaller range (didn't know we do that? Well, that's another reason against trying to have a generic radix sort in a library)
Now 0-99 is probably not your typical data set either, and, overall, radix-sort is probably still better, but what you need to take away from all of this:
There's about a gazillion sorting algorithms. They vary greatly in what they're good at. Don't expect a standard library to give you a function for each. Comparison-based sorts can sort any comparable data types (and are fast enough for most practical applications) as opposed to numeric sorts which can only sort numbers. Thus having a single (or 2, as Java has) comparison-based sort in your (as in you, the person who wrote it) library is preferred.
Basically, we use comparison-based sorting algorithms because it's easier. Being able to supply a comparison function and get your data sorted is a huge win from an engineering perspective, even if you pay for it with a speed hit.
Keep in mind that the O(n log n) comparison-based sorting bound counts comparisons, not total runtime. If you're sorting strings, for instance, comparison can take time linear in the lengths of the strings being compared.
A common misconception (that I see echoed in the other answer) is that comparison-based sorting winds up having faster asymptotic complexity when you're sorting a moderate number of long numbers; say they're k bytes each. This simply isn't true; you do about n log(n) number comparisons, each of which takes O(k) time, for an overall complexity of O(k n log n). This is worse than O(k n).
Engineering a fast radix sort is a little harder than the theory says. While theory dictates that you should choose as large a radix as possible, there is a tradeoff between the radix you choose and the locality you achieve when partitioning the input stream. A bigger radix means fewer passes but also less local use of memory.
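For completeness, a byte-at-a-time LSD radix sort sketch (radix 256, non-negative integers only), which makes the radix-versus-passes trade-off explicit:

def lsd_radix_sort(nums, radix_bits=8):
    if not nums:
        return []
    mask, shift, biggest = (1 << radix_bits) - 1, 0, max(nums)
    while biggest >> shift:                          # one pass per radix_bits-sized digit
        buckets = [[] for _ in range(1 << radix_bits)]
        for n in nums:
            buckets[(n >> shift) & mask].append(n)   # distribute by the current digit
        nums = [n for b in buckets for n in b]       # collect in order, keeping stability
        shift += radix_bits
    return nums

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))

A larger radix_bits means fewer passes but a bigger bucket table and worse locality, which is exactly the trade-off mentioned above.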

Sorting in O(n*log(n)) worst case

Is there a sort of an array that works in O(n*log(n)) worst case time complexity?
I saw on Wikipedia that there are sorts like that, but they are unstable; what does that mean? Is there a way to do it with low space complexity?
Is there a best sorting algorithm?
An algorithm that requires only O(1) extra memory (so modifying the input array is permitted) is generally described as "in-place", and that's the lowest space complexity there is.
A sort is described as "stable" or not, according to what happens when there are two elements in the input which compare as equal, but are somehow distinguishable. For example, suppose you have a bunch of records with an integer field and a string field, and you sort them on the integer field. The question is, if two records have the same integer value but different string values, then will the one that came first in the input, also come first in the output, or is it possible that they will be reversed? A stable sort is one that guarantees to preserve the order of elements that compare the same, but aren't identical.
It is difficult to make a comparison sort that is in-place, and stable, and achieves O(n log n) worst-case time complexity. I've a vague idea that it's unknown whether or not it's possible, but I don't keep up to date on it.
Last time someone asked about the subject, I found a couple of relevant papers, although that question wasn't identical to this question:
How to sort in-place using the merge sort algorithm?
As far as a "best" sort is concerned - some sorting strategies take advantage of the fact that on the whole, taken across a large number of applications, computers spend a lot of time sorting data that isn't randomly shuffled, it has some structure to it. Timsort is an algorithm to take advantage of commonly-encountered structure. It performs very well in a lot of practical applications. You can't describe it as a "best" sort, since it's a heuristic that appears to do well in practice, rather than being a strict improvement on previous algorithms. But it's the "best" known overall in the opinion of people who ship it as their default sort (Python, Java 7, Android). You probably wouldn't describe it as "low space complexity", though, it's no better than a standard merge sort.
You can choose between mergesort, quicksort, or heapsort, all nicely described here.
There is also radix sort, whose complexity is O(kN), but it relies on extra memory consumption.
You can also see that for smaller collections quicksort is faster, but then mergesort takes the lead; all of this is case-specific, so take your time to study all four algorithms.
For the question of the best algorithm, the simple answer is: it depends. It depends on the size of the data set you want to sort, and it depends on your requirements. Say bubble sort has worst-case and average complexity both O(n^2), where n is the number of items being sorted. There exist many sorting algorithms with substantially better worst-case or average complexity of O(n log n). Even other O(n^2) sorting algorithms, such as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.
Among simple average-case Θ(n^2) algorithms, selection sort almost always outperforms bubble sort, but is generally outperformed by insertion sort.
Selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays.
Likewise, you can yourself select the best sorting algorithm according to your requirements.
It is proven that Ω(n log n) is the lower bound for sorting generic items. It is also proven that Ω(n) is the lower bound for sorting integers (you need at least to read the input :) ).
The specific instance of the problem will determine the best algorithm for your needs, i.e. sorting 1M strings is different from sorting 2M 7-bit integers in 2MB of RAM.
Also consider that besides the asymptotic runtime complexity, the implementation is making a lot of difference, as well as the amount of available memory and caching policy.
I could implement quicksort in 1 line in Python, roughly keeping O(n log n) complexity (with some caveats about the pivot), but Big-Oh notation says nothing about the constant terms, which are relevant too (i.e. this is ~30x slower than Python's built-in sort, which is likely written in C, btw):
qsort = lambda a: [] if not a else qsort([x for x in a if x < a[len(a)//2]]) + [x for x in a if x == a[len(a)//2]] + qsort([x for x in a if x > a[len(a)//2]])
For a discussion about stable/unstable sorting, look here http://www.developerfusion.com/article/3824/a-guide-to-sorting/6/.
You may want to get yourself a good algorithms book (e.g. Cormen or Skiena).
Heapsort, maybe randomized quicksort
stable sort
As others already mentioned: no, there isn't. For example, you might want to parallelize your sorting algorithm, which leads to totally different sorting algorithms.
Regarding your question meaning stable, let's consider the following: We have a class of children associated with ages:
Phil, 10
Hans, 10
Eva, 9
Anna, 9
Emil, 8
Jonas, 10
Now, we want to sort the children in order of ascending age (and nothing else). Then we see that Phil, Hans and Jonas all have age 10, so it is not clear in which order to put them, since we sort just by age.
Now comes stability: if we sort stably, we keep Phil, Hans and Jonas in the order they were in before, i.e. we put Phil first, then Hans, and last Jonas (simply because they were in this order in the original sequence and we only consider age as the comparison criterion). Similarly, we have to put Eva before Anna (both have the same age, but in the original sequence Eva was before Anna).
So, the result is:
Emil, 8
Eva, 9
Anna, 9
Phil, 10 \
Hans, 10 | all aged 10, and left in original order.
Jonas, 10 /
To put it in a nutshell: Stability means that if two elements are equal (w.r.t. the chosen sorting criterion), the one coming first in the original sequence still comes first in the resulting sequence.
Note that you can easily transform any sorting algorithm into a stable sorting algorithm: If your original sequence holds n elements: e1, e2, e3, ..., en, you simply attach a counter to each one: (e1, 0), (e2, 1), (e3, 2), ..., (en, n-1). This means you store for each element its original position.
If two elements are now equal, you simply compare their counters and put the one with the lower counter value first. This increases runtime (and memory) by O(n), which is asymptotically no worse, since the best (comparison-based) sorting algorithm already needs O(n lg n).
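A tiny sketch of that counter trick, reusing the children from the example (decorate each record with its original position, sort, undecorate):

children = [("Phil", 10), ("Hans", 10), ("Eva", 9), ("Anna", 9), ("Emil", 8), ("Jonas", 10)]

# Attach the original index, sort by (age, index), then drop the index again;
# ties are broken by original position, so any sort becomes stable this way.
decorated = [(age, i, name) for i, (name, age) in enumerate(children)]
decorated.sort()
stable = [(name, age) for age, _, name in decorated]
print(stable)   # Emil 8, Eva 9, Anna 9, Phil 10, Hans 10, Jonas 10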

Limitations of comparison based sorting techniques

Comparison sorting is used in most scenarios where data needs to be ordered. Techniques like merge sort, quick sort, insertion sort and other comparison sorts can handle different data types, and their efficiency has a lower bound of O(n log(n)) comparisons.
My questions are
Are there any limitations of comparison based sorting techniques?
Are there any scenarios where non-comparison sorting techniques would be used?
cheers
You answered it more or less yourself. Comparison-based sorting techniques are limited by a lower bound of O(n log(n)). Non-comparison-based sorting techniques do not suffer from this limit. The general problem with non-comparison sorts is that the domain must be better known, and for that reason they aren't as versatile as comparison-based techniques.
Pigeonhole sort is a great and quite simple example which is pretty fast as long as the number of possible key values is close to the number of elements.
Obviously the limitation of comparison sorts is the time factor - some are better than others, but given a large enough data set, they'll all get too slow at some point. The trick is to choose the right one given the kind and mix of data you're sorting.
Non-comparison sorting is based on other properties of the data rather than comparisons; e.g. counting sort orders a collection by inspecting each element, not by comparing it with any other value in the collection. Counting sort is useful for ordering a collection based on integer keys: it takes all the elements with a value of 1 and puts them into the destination first, then all elements of value 2, etc. (OK, it uses a "sparse" array to quickly zoom through the collection and reorder the values, leaving gaps, but that's the basic principle.)
It is easy to see why a comparison sort needs about N log N comparisons. There are N! permutations, and as we know ln(N!) is approximately N ln N - N + O(ln N). In big-O notation we can neglect terms lower than N ln N, and because ln and log differ only by a constant factor, we get the final result O(N log N).
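As a quick numeric check of that bound (illustrative only, using the log-gamma function to get ln N!), for N = 1,000,000 the information-theoretic minimum is roughly 18.5 million comparisons, close to the familiar N lg N ≈ 19.9 million:

import math

n = 1_000_000
min_comparisons = math.lgamma(n + 1) / math.log(2)   # lg(N!) = ln(N!) / ln 2
print(round(min_comparisons))        # roughly 18.5 million
print(round(n * math.log2(n)))       # roughly 19.9 million, the N lg N estimate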
