When is each sorting algorithm used? [closed] - algorithm

What are the use cases when a particular sorting algorithm is preferred over others - merge sort vs QuickSort vs heapsort vs 'intro sort', etc?
Is there a recommended guide in using them based on the size, type of data structure, available memory and cache, and CPU performance?

First, a definition, since it's pretty important: A stable sort is one that's guaranteed not to reorder elements with identical keys.
Recommendations:
Quick sort: When you don't need a stable sort and average case performance matters more than worst case performance. A quick sort is O(N log N) on average, O(N^2) in the worst case. A good implementation uses O(log N) auxiliary storage in the form of stack space for recursion.
Merge sort: When you need a stable, O(N log N) sort, this is about your only option. The only downsides to it are that it uses O(N) auxiliary space and has a slightly larger constant than a quick sort. There are some in-place merge sorts, but AFAIK they are all either not stable or worse than O(N log N). Even the O(N log N) in place sorts have so much larger a constant than the plain old merge sort that they're more theoretical curiosities than useful algorithms.
Heap sort: When you don't need a stable sort and you care more about worst case performance than average case performance. It's guaranteed to be O(N log N), and uses O(1) auxiliary space, meaning that you won't unexpectedly run out of heap or stack space on very large inputs.
Introsort: This is a quick sort that switches to a heap sort after a certain recursion depth to get around quick sort's O(N^2) worst case. It's almost always better than a plain old quick sort, since you get the average case of a quick sort, with guaranteed O(N log N) performance. Probably the only reason to use a heap sort instead of this is in severely memory constrained systems where O(log N) stack space is practically significant. (A minimal sketch appears after this list.)
Insertion sort: When N is guaranteed to be small, including as the base case of a quick sort or merge sort. While this is O(N^2), it has a very small constant and is a stable sort.
Bubble sort, selection sort: When you're doing something quick and dirty and for some reason you can't just use the standard library's sorting algorithm. The only advantage these have over insertion sort is being slightly easier to implement.
Non-comparison sorts: Under some fairly limited conditions it's possible to break the O(N log N) barrier and sort in O(N). Here are some cases where that's worth a try:
Counting sort: When you are sorting integers with a limited range.
Radix sort: When log(N) is significantly larger than K, where K is the number of radix digits.
Bucket sort: When you can guarantee that your input is approximately uniformly distributed.
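To make the Introsort entry above concrete, here is a minimal sketch in Python (my own illustration under the assumptions above, not any library's implementation): random-pivot quicksort that switches to in-place heapsort once the recursion gets too deep, and to insertion sort on small slices.

import random

def introsort(a):
    # Minimal introsort sketch: quicksort with a random pivot, falling back
    # to in-place heapsort once the depth exceeds ~2*log2(n), and to
    # insertion sort on small slices. Sorts the list in place.
    if a:
        _introsort(a, 0, len(a) - 1, 2 * len(a).bit_length())

def _introsort(a, lo, hi, depth):
    if hi - lo < 16:
        _insertion_sort(a, lo, hi)
    elif depth == 0:
        _heapsort(a, lo, hi)          # guarantees O(N log N) on this slice
    else:
        p = _partition(a, lo, hi)
        _introsort(a, lo, p - 1, depth - 1)
        _introsort(a, p + 1, hi, depth - 1)

def _partition(a, lo, hi):
    # Lomuto partition around a randomly chosen pivot.
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def _insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def _heapsort(a, lo, hi):
    # In-place max-heap sort of a[lo..hi]; O(1) auxiliary space.
    n = hi - lo + 1
    def sift(root, end):
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child < end and a[lo + child] < a[lo + child + 1]:
                child += 1
            if a[lo + root] < a[lo + child]:
                a[lo + root], a[lo + child] = a[lo + child], a[lo + root]
                root = child
            else:
                return
    for start in range(n // 2 - 1, -1, -1):
        sift(start, n - 1)
    for end in range(n - 1, 0, -1):
        a[lo], a[lo + end] = a[lo + end], a[lo]
        sift(0, end - 1)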

Quicksort is usually the fastest on average, but it has some pretty nasty worst-case behaviors. So if you have to guarantee no bad data gives you O(N^2), you should avoid it.
Merge-sort uses extra memory, but is particularly suitable for external sorting (i.e. huge files that don't fit into memory).
Heap-sort can sort in-place and doesn't have the worst case quadratic behavior, but on average is slower than quicksort in most cases.
Where only integers in a restricted range are involved, you can use some kind of radix sort to make it very fast.
In 99% of the cases, you'll be fine with the library sorts, which are usually based on quicksort.
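For instance, in Python "using the library sort" just means calling the built-in stable sort (optionally with a key); the data below is purely illustrative:

people = [("bob", 35), ("alice", 30)]          # illustrative records
nums = [5, 2, 9, 1]
nums.sort()                                    # in-place, stable, O(n log n)
by_age = sorted(people, key=lambda p: p[1])    # sorted copy, keyed by one field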

The Wikipedia page on sorting algorithms has a great comparison chart.
http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms

What the provided links to comparisons/animations do not consider is when the amount of data exceeds available memory --- at which point the number of passes over the data, i.e. I/O costs, dominates the runtime. If you need to do that, read up on "external sorting", which usually covers variants of merge sort and heapsort.
http://corte.si/posts/code/visualisingsorting/index.html and http://corte.si/posts/code/timsort/index.html also have some cool images comparing various sorting algorithms.

#dsimcha wrote:
Counting sort: When you are sorting integers with a limited range
I would change that to:
Counting sort: When you sort positive integers (0 to Integer.MAX_VALUE - 2, due to the pigeonhole principle).
You can always find the max and min values in linear time as an efficiency heuristic as well.
Also, you need at least n extra space for the intermediate array, and it is obviously stable.
/**
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
(even though it will actually allow up to MAX_VALUE - 2)
see:
Do Java arrays have a maximum size?
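For illustration, here is a minimal counting sort sketch in Python (a hypothetical helper, not from any particular library) that uses the linear-time min/max heuristic mentioned above and keeps the sort stable. With the min offset it also handles negative integers, as long as the value range stays small enough to allocate.

def counting_sort(a):
    # Stable counting sort sketch: one counter per distinct value in [min, max].
    if not a:
        return []
    lo, hi = min(a), max(a)            # linear-time min/max heuristic
    counts = [0] * (hi - lo + 1)
    for x in a:
        counts[x - lo] += 1
    total = 0                          # prefix sums: first output slot per key
    for i, c in enumerate(counts):
        counts[i] = total
        total += c
    out = [0] * len(a)                 # the O(n) auxiliary output array
    for x in a:                        # scanning in input order keeps it stable
        out[counts[x - lo]] = x
        counts[x - lo] += 1
    return out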
Also I would explain that radix sort complexity is O(wn) for n keys which are integers of word size w. Sometimes w is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which all perform O(n log n) comparisons to sort n keys. However, in general w cannot be considered a constant: if all n keys are distinct, then w has to be at least log n for a random-access machine to be able to store them in memory, which gives at best a time complexity O(n log n). (from wikipedia)
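And a least-significant-digit radix sort sketch (base 256, purely illustrative) showing where the O(wn) cost comes from: one stable counting pass per digit of the largest key.

def radix_sort_lsd(a):
    # LSD radix sort sketch for non-negative integers, base 256.
    # Cost is O(w * n), where w is the number of 8-bit digits in max(a).
    if not a:
        return []
    out, shift = list(a), 0
    while (max(a) >> shift) > 0:
        counts = [0] * 256
        for x in out:                      # stable counting pass on this digit
            counts[(x >> shift) & 0xFF] += 1
        total = 0
        for d in range(256):
            counts[d], total = total, total + counts[d]
        nxt = [0] * len(out)
        for x in out:
            d = (x >> shift) & 0xFF
            nxt[counts[d]] = x
            counts[d] += 1
        out, shift = nxt, shift + 8
    return out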

Related

Why is quicksort considered the fastest sorting algorithm?

Quicksort has worst-case time complexity of O(n^2), while others like heapsort and merge sort have worst-case time complexity of O(n log n). Still, quicksort is considered faster. Why?
On a side note, if sorting an array of integers, then counting / radix sort is fastest.
In general, merge sort does more moves but fewer compares than quicksort. The typical implementation of merge sort uses a temp array the same size as the original array, or half the size (sort the 2nd half into the second half, sort the first half into the temp array, then merge the temp array and the 2nd half into the original array), so it needs more space than quicksort, which optimally needs only log2(n) levels of nesting. To avoid worst-case nesting, a nesting check may be used and quicksort switched to heapsort; this is called introsort. (A minimal merge sort sketch appears below.)
If the compare overhead is greater than the move overhead, then merge sort is faster. A common example where compares take longer than moves would be sorting an array of pointers to strings. Only the (4 or 8 byte) pointers are moved, while the strings may be significantly larger (and similar for a large number of strings).
If there is significant pre-ordering of the data to be sorted, then timsort (fixed sized runs) or a "natural" merge sort (variable sized runs) will be faster.
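As a reference point for the discussion above, a minimal top-down merge sort sketch with a full-size temp list (the half-size-temp variant described above follows the same idea but is fiddlier):

def merge_sort(a):
    # Stable, O(n log n), O(n) auxiliary space; sorts the list in place.
    _msort(a, [0] * len(a), 0, len(a))

def _msort(a, tmp, lo, hi):
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    _msort(a, tmp, lo, mid)
    _msort(a, tmp, mid, hi)
    i, j, k = lo, mid, lo
    while i < mid and j < hi:          # merge the two sorted halves into tmp
        if a[j] < a[i]:                # take from the left on ties => stable
            tmp[k] = a[j]; j += 1
        else:
            tmp[k] = a[i]; i += 1
        k += 1
    tmp[k:k + mid - i] = a[i:mid]      # copy whichever half has leftovers
    tmp[k + mid - i:hi] = a[j:hi]
    a[lo:hi] = tmp[lo:hi]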
While it is true that quicksort has worst case time complexity of O(n^2), as long as the quicksort implementation properly randomizes the input, its average case (expected) running time is O(n log n).
Additionally, the constant factors hidden by the asymptotic notation, which do matter in practice, are pretty small compared to other popular choices such as merge sort. Thus, in expectation, quicksort will outperform other O(n log n) comparison sorts despite the less savory worst-case bounds.
Not exactly. Quicksort is the best in most cases; however, its pessimistic (worst-case) time complexity can be O(n^2), which doesn't mean it always is. The issue lies in choosing the right pivot: if you choose it correctly, you have time complexity O(n log n).
In addition, quicksort is one of the cheapest/easiest to implement.
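A toy illustration of the point about pivot choice (not in place, so not how you'd really implement it, but it shows why a random pivot gives expected O(n log n) on any input):

import random

def quicksort(a):
    # Simple randomized quicksort; a uniformly random pivot makes the
    # worst case vanishingly unlikely regardless of the input order.
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))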

Quicksort vs Mergesort on big arrray with a high range of values

If I have a big array with a high range of values, which would be faster, Quick sort or Merge sort?
First I would say both take about the same time, because both have a best case of O(n log n), and neither sorting algorithm should be negatively affected by that array specification.
But because quicksort is very reliant on the pivot, you might want to argue that merge sort is better.
From the theoretical point of view, there should be no big difference:
quicksort has an expected complexity of O(n log n) for a random choice of the pivot element, but in the worst case it can be up to O(n²).
mergesort has a fixed complexity of O(n log n)
I know that a professor of mine and his work group sort big sets of data (> 100 GB) using mergesort. He says that mergesort can be modified to reduce the number of read/write operations on your hard disc. Since those are very slow compared to read/write operations on your RAM, they are what slows the algorithm down massively.
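A rough sketch of that external-merge idea in Python (the chunk size and the one-value-per-line file format are assumptions made purely for illustration): sort chunks that fit in RAM, write each sorted run to a temp file, then do a single k-way merge pass, so each element is read and written only a couple of times.

import heapq
import itertools
import tempfile

def external_sort(infile, outfile, chunk_lines=1_000_000):
    # Assumes a text file with one sortable value per (newline-terminated) line.
    runs = []
    with open(infile) as f:
        while True:
            chunk = list(itertools.islice(f, chunk_lines))  # "what fits in RAM"
            if not chunk:
                break
            chunk.sort()
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
    with open(outfile, "w") as out:
        out.writelines(heapq.merge(*runs))   # streams all runs in one merge pass
    for run in runs:
        run.close()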

Memory speed tradeoff of sorting algorithm

Consider only the bubble sort and merge sort. For bubble sort, the time complexity would range from O(n) in the best case to O(n^2) in the worst case, with space complexity O(1). For merge sort, the time complexity would be O(n log n) with space complexity O(n). Which sort would you choose if the size of the input is less than 1000, and why? What about more than 1000?
This is an interview question I had. Just want to know how you guys would answer it.
Consider only the bubble sort and merge sort.
By less than 1000, it might mean that RAM is enough for any sorting algorithm without external storage. It also implies that the theoretical bound for time complexity doesn't matter in this case. You can pick any sorting algorithm you like without incurring any real time penalty. For example, you can do bubble sort since it may be easy and intuitive to implement. Merge sort is just as good.
When the input size is bigger than 1000, it is probably assuming that time complexity matters and even that RAM may not be big enough without external storage. In this case, if you have to choose between the two, merge sort is the safe one to pick. This is because merge sort has better worst-case performance than bubble sort, and merge sort is a good candidate for external sorting (when the input size is bigger than RAM).
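For completeness, the O(n) best case quoted for bubble sort assumes the early-exit variant, roughly:

def bubble_sort(a):
    # Early-exit bubble sort: O(n) on already-sorted input, O(n^2) worst
    # case, O(1) extra space.
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break
    return a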

Why is Insertion sort better than Quick sort for small list of elements?

Isn't Insertion sort O(n^2) > Quicksort O(n log n)...so for a small n, won't the relation be the same?
Big-O Notation describes the limiting behavior when n is large, also known as asymptotic behavior. This is an approximation. (See http://en.wikipedia.org/wiki/Big_O_notation)
Insertion sort is faster for small n because Quick Sort has extra overhead from the recursive function calls. Insertion sort is also stable, unlike a typical Quick sort, and requires less memory.
This question describes some further benefits of insertion sort. ( Is there ever a good reason to use Insertion Sort? )
Define "small".
When benchmarking sorting algorithms, I found out that switching from quicksort to insertion sort - despite what everybody was saying - actually hurts performance (recursive quicksort in C) for arrays larger than 4 elements. And those arrays can be sorted with a size-dependent optimal sorting algorithm.
That being said, always keep in mind that O(n...) only counts the number of comparisons (in this specific case), not the speed of the algorithm. The speed depends on the implementation, e.g., whether or not your quicksort function is recursive and how quickly function calls are dealt with.
Last but not least, big-O notation is only an upper bound.
If algorithm A requires 10000 n log n comparisons and algorithm B requires 10 n^2, the first is O(n log n) and the second is O(n^2). Nevertheless, the second will (probably) be faster.
O()-notation is typically used to characterize performance for large problems, while deliberately ignoring constant factors and additive offsets to performance.
This is important because constant factors and overhead can vary greatly between processors and between implementations: the performance you get for a single-threaded Basic program on a 6502 machine will be very different from the same algorithm implemented as a C program running on an Intel i7-class processor. Note that implementation optimization is also a factor: attention to detail can often get you a major performance boost, even if all other factors are the same!
However, the constant factor and overhead are still important. If your application ensures that N never gets very large, the asymptotic behavior of O(N^2) vs. O(N log N) doesn't come into play.
Insertion sort is simple and, for small lists, it is generally faster than a comparably implemented quicksort or mergesort. That is why a practical sort implementation will generally fall back on something like insertion sort for the "base case", instead of recursing all the way down to single elements.
It's a matter of the constants attached to the running time that we ignore in big-O notation (because we are concerned with order of growth). For insertion sort, the running time is O(n^2), i.e. T(n) <= c(n^2), whereas for Quicksort it is T(n) <= k(n lg n). As c is quite small, for small n the running time of insertion sort is less than that of Quicksort.
Hope it helps.
A good real-world example of insertion sort used in conjunction with quicksort is the implementation of the qsort function from glibc.
The first thing to point out is that qsort implements the quicksort algorithm with an explicit stack (implemented through macro directives) because it consumes less memory than recursion.
Summary of current implementation from the source code (you'll find a lot of useful information through comments if you take a look at it):
Non-recursive
Chose the pivot element using a median-of-three decision tree
Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leaving
insertion sort to order the MAX_THRESH items within each partition.
This is a big win, since insertion sort is faster for small, mostly
sorted array segments.
The larger of the two sub-partitions is always pushed onto the
stack first
What does the MAX_THRESH value stand for? Well, it's just a small magic constant which was chosen to work best on a Sun 4/260.
How about binary insertion sort? You can absolutely find the insertion position by using binary search.
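A quick sketch of that idea using Python's bisect module (comparisons drop to O(n log n), though the element shifts are still O(n^2)):

import bisect

def binary_insertion_sort(a):
    # Insertion sort that locates each insertion point with binary search.
    for i in range(1, len(a)):
        key = a[i]
        pos = bisect.bisect_right(a, key, 0, i)   # bisect_right keeps it stable
        a[pos + 1:i + 1] = a[pos:i]               # shift the tail right by one
        a[pos] = key
    return a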

Why is quicksort used in practice? [closed]

Quicksort has a worst-case performance of O(n^2), but is still used widely in practice anyway. Why is this?
You shouldn't center only on worst case and only on time complexity. It's more about average than worst, and it's about time and space.
Quicksort:
has average time complexity of Θ(n log n);
can be implemented with space complexity of Θ(log n);
Also take into account that big O notation doesn't include constant factors, but in practice it does make a difference if an algorithm is a few times faster. Θ(n log n) means that the algorithm executes in K n log(n), where K is a constant. Quicksort is the comparison-sort algorithm with the lowest K.
The average asymptotic order of quicksort is O(n log n) and it's usually more efficient than heapsort due to smaller constants (tighter loops). In fact, there is a theoretical linear-time median selection algorithm that you could use to always find the best pivot, thus resulting in a worst case of O(n log n). However, the normal quicksort is usually faster than this theoretical one.
To make it more sensible, consider the probability that quicksort will finish in O(n^2). It's just 1/n!, which means it will almost never encounter that bad case.
Interestingly, quicksort performs more comparisons on average than mergesort - 1.44 n lg n (expected) for quicksort versus n lg n for mergesort. If all that mattered were comparisons, mergesort would be strongly preferable to quicksort.
The reason that quicksort is fast is that it has many other desirable properties that work extremely well on modern hardware. For example, quicksort requires no dynamic allocations. It can work in-place on the original array, using only O(log n) stack space (worst-case if implemented correctly) to store the stack frames necessary for recursion. Although mergesort can be made to do this, doing so usually comes at a huge performance penalty during the merge step. Other sorting algorithms like heapsort also have this property.
Additionally, quicksort has excellent locality of reference. The partitioning step, if done using Hoare's in-place partitioning algorithm, is essentially two linear scans performed inward from both ends of the array. This means that quicksort will have a very small number of cache misses, which on modern architectures is critical for performance. Heapsort, on the other hand, doesn't have very good locality (it jumps around all over an array), though most mergesort implementations have reasonably locality.
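For reference, a bare-bones sketch of Hoare-style partitioning (the two inward scans mentioned above) and a quicksort driver around it; this is an illustration, not any particular library's implementation:

def hoare_partition(a, lo, hi):
    # Two indices scan inward from both ends, swapping out-of-place pairs;
    # the access pattern is two sequential streams, which is cache-friendly.
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]

def quicksort_hoare(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort_hoare(a, lo, p)        # note: index p stays on the left side
        quicksort_hoare(a, p + 1, hi)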
Quicksort is also very parallelizable. Once the initial partitioning step has occurred to split the array into smaller and greater regions, those two parts can be sorted independently of one another. Many sorting algorithms can be parallelized, including mergesort, but the performance of parallel quicksort tends to be better than other parallel algorithms for the above reason. Heapsort, on the other hand, does not.
The only issue with quicksort is the possibility that it degrades to O(n^2), which on large data sets can be very serious. One way to avoid this is to have the algorithm introspect on itself and switch to one of the slower but more dependable algorithms in the case where it degenerates. This algorithm, called introsort, is a great hybrid sorting algorithm that gets many of the benefits of quicksort without the pathological case.
In summary:
Quicksort is in-place except for the stack frames used in the recursion, which take O(log n) space.
Quicksort has good locality of reference.
Quicksort is easily parallelized.
This accounts for why quicksort tends to outperform sorting algorithms that on paper might be better.
Hope this helps!
Because on average it's the fastest comparison sort (in terms of elapsed time).
Because, in the general case, it's one of the fastest sorting algorithms.
In addition to being the fastest, though, some of its bad-case scenarios can be avoided by shuffling the array before sorting it. As for its weakness with small data sets, that obviously isn't as big a problem, since the data sets are small and the sort time is probably small regardless.
As an example, I wrote a python function for QuickSort and bubble sorts. The bubble sort takes ~20 seconds to sort 10,000 records, 11 seconds for 7500, and 5 for 5000. The quicksort does all these sorts in around 0.15 seconds!
It might be worth pointing out that C does have the library function qsort(), but there's no requirement that it be implemented using an actual QuickSort, that is up to the compiler vendor.
Because this is one of the algorithms that works well on large data sets, with O(N log N) complexity. It is also an in-place algorithm that takes only O(log N) extra space for recursion. By selecting the pivot element wisely we can avoid the worst case of quicksort, so it performs in O(N log N) even on an already-sorted array.
