When is mergesort preferred over quicksort?

Quicksort is better than mergesort in many cases. But when might mergesort be better than quicksort?
For example, mergesort works better when the data cannot all be loaded into memory at once. Are there any other cases?
Answers to the suggested duplicate question list advantages of using quicksort over mergesort. I'm asking about the possible cases and applications where mergesort would be better than quicksort.

Both quicksort and mergesort can work just fine if you can't fit all data into memory at once. You can implement quicksort by choosing a pivot, then streaming elements in from disk into memory and writing elements into one of two different files based on how that element compares to the pivot. If you use a double-ended priority queue, you can actually do this even more efficiently by fitting the maximum number of possible elements into memory at once.
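To make the streaming idea concrete, here is a minimal Java sketch of one external-partition pass (the class and file layout are my own, purely illustrative): stream values in from a file and write each one to a "less-or-equal" or "greater" file depending on the pivot. Recursing on each output file until a partition fits in memory completes the external quicksort.

import java.io.*;

// Hypothetical sketch: one pass of external quicksort. Streams ints from
// an input file and partitions them into two temp files around a pivot.
public class ExternalPartition {
    static void partition(File in, File less, File greater, int pivot) throws IOException {
        try (DataInputStream src = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(in)));
             DataOutputStream lo = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(less)));
             DataOutputStream hi = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(greater)))) {
            while (true) {
                int value;
                try {
                    value = src.readInt();
                } catch (EOFException end) {
                    break;                               // input exhausted
                }
                if (value <= pivot) lo.writeInt(value);  // "<= pivot" file
                else                hi.writeInt(value);  // "> pivot" file
            }
        }
    }
}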
Mergesort is worst-case O(n log n). That said, you can easily modify quicksort to produce the introsort algorithm, a hybrid between quicksort, insertion sort, and heapsort, that's worst-case O(n log n) but retains the speed of quicksort in most cases.
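Since introsort comes up here, a hedged sketch of the idea (my own simplified code, not any particular library's implementation): quicksort until the recursion depth exceeds roughly 2*log2(n), then fall back to heapsort for that range, with insertion sort for tiny ranges.

// Introsort sketch: depth-limited quicksort, heapsort fallback,
// insertion sort for small ranges. Simplified for clarity.
public class IntroSort {
    public static void sort(int[] a) {
        int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        if (hi - lo < 16) { insertionSort(a, lo, hi); return; }   // small range
        if (depth == 0)   { heapSortRange(a, lo, hi); return; }   // too deep: bail out
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depth - 1);
        sort(a, p + 1, hi, depth - 1);
    }

    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;               // Lomuto partition, last element as pivot
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i], j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    private static void heapSortRange(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);   // build max heap
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);               // move current max to the end
            siftDown(a, lo, 0, end);
        }
    }

    // Sift a[base + i] down within the heap a[base .. base + n - 1].
    private static void siftDown(int[] a, int base, int i, int n) {
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;
            if (c + 1 < n && a[base + c + 1] > a[base + c]) c++;
            if (a[base + i] >= a[base + c]) break;
            swap(a, base + i, base + c);
            i = c;
        }
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}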
It might be helpful to see why quicksort is usually faster than mergesort, since if you understand the reasons you can pretty quickly find some cases where mergesort is a clear winner. Quicksort usually is better than mergesort for two reasons:
Quicksort has better locality of reference than mergesort, which means that the accesses performed in quicksort are usually faster than the corresponding accesses in mergesort.
Quicksort uses worst-case O(log n) memory (if implemented correctly), while mergesort requires O(n) memory due to the overhead of merging.
There's one scenario, though, where these advantages disappear. Suppose you want to sort a linked list of elements. The linked list elements are scattered throughout memory, so advantage (1) disappears (there's no locality of reference). Second, linked lists can be merged with only O(1) space overhead instead of O(n) space overhead, so advantage (2) disappears. Consequently, you usually will find that mergesort is a superior algorithm for sorting linked lists, since it makes fewer total comparisons and isn't susceptible to a poor pivot choice.
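The O(1)-space merge is the key step, so here is a small sketch (the node class and names are illustrative, not from any particular library) of merging two sorted singly linked lists by splicing nodes, with no auxiliary array:

// Merging two sorted singly linked lists with O(1) extra space: the step
// that lets mergesort on lists avoid the O(n) buffer an array merge needs.
class ListNode {
    int val;
    ListNode next;
    ListNode(int val) { this.val = val; }
}

class ListMerge {
    static ListNode merge(ListNode a, ListNode b) {
        ListNode dummy = new ListNode(0);    // placeholder head simplifies splicing
        ListNode tail = dummy;
        while (a != null && b != null) {
            if (a.val <= b.val) { tail.next = a; a = a.next; }   // <= keeps the merge stable
            else                { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;     // splice on whatever remains
        return dummy.next;
    }
}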

The single most important advantage of mergesort over quicksort is its stability: elements that compare equal retain their original order.

Mergesort is stable by design: equal elements keep their original order.
Mergesort is well suited to parallel implementation (multithreading); see the fork/join sketch below.
Mergesort uses fewer comparisons (about 30% fewer) than quicksort. This is an often-overlooked advantage, because a comparison can be quite expensive (e.g. when comparing several fields of database rows).
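On the parallelism point, here is a rough fork/join sketch (my own illustrative code, not the JDK's): the two halves of a mergesort are independent, so they can be sorted as separate tasks and merged afterwards. Java's built-in Arrays.parallelSort does essentially this, with a far more refined implementation.

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Parallel mergesort sketch: fork the two halves, then merge sequentially.
public class ParallelMergeSort extends RecursiveAction {
    private final int[] a;
    private final int lo, hi;                 // sorts a[lo..hi] inclusive

    ParallelMergeSort(int[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

    @Override
    protected void compute() {
        if (hi - lo < 8192) {                 // small range: sort sequentially
            Arrays.sort(a, lo, hi + 1);
            return;
        }
        int mid = (lo + hi) >>> 1;
        invokeAll(new ParallelMergeSort(a, lo, mid),      // sort halves in parallel
                  new ParallelMergeSort(a, mid + 1, hi));
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(a, lo, mid + 1);  // buffer for the left half
        int i = 0, j = mid + 1, k = lo;
        while (i < left.length && j <= hi)
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        while (i < left.length) a[k++] = left[i++];
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(1_000_000).toArray();
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(data, 0, data.length - 1));
        System.out.println("first element: " + data[0]);
    }
}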

Quicksort is average-case O(n log n), but has a worst case of O(n^2). Mergesort is always O(n log n). Besides the asymptotic worst case and the external-memory case already mentioned, I can't think of another reason.
Scenarios where quicksort is worse than mergesort (each of these drives a naive quicksort, e.g. one with a first/last-element pivot and a two-way partition, into its O(n^2) worst case):
The array is already sorted.
All elements in the array are the same.
The array is sorted in reverse order.
Take mergesort over quicksort if you don't know anything about the data.

Merge sort has a guaranteed upper bound of O(n log n). Quicksort has such a bound too, but it is much higher: it is O(n^2). When you need a guaranteed upper bound on the running time of your code, use merge sort over quicksort.
For example, if you write code for a real-time system that relies on sorting, merge sort would be the better choice.

Merge sort's worst-case complexity is O(n log n), whereas quicksort's worst case is O(n^2).
Merge sort is a stable sort, which means that equal elements in an array maintain their original positions with respect to each other.
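To see what stability buys you in practice, here is a tiny example (hypothetical Person type; needs Java 16+ for records). Java's list sort is a stable mergesort variant (TimSort), so after sorting by last name, equal last names keep their earlier relative order:

import java.util.*;

public class StableDemo {
    record Person(String first, String last) {}

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(List.of(
                new Person("Alice", "Smith"),
                new Person("Bob", "Jones"),
                new Person("Carol", "Smith")));
        people.sort(Comparator.comparing(Person::last));   // stable sort by last name
        // Alice Smith still precedes Carol Smith: ties were not reordered.
        people.forEach(p -> System.out.println(p.first() + " " + p.last()));
    }
}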

Related

Quick Sort vs Insertion Sort

When building a sorting algorithm to sort an array, at what number of elements n does quicksort become faster than insertion sort? I know that quicksort is good for larger inputs and that insertion sort is great for smaller ones. But I was wondering around what size quicksort becomes the far better option.
These algorithms depend on more than just the size of the input to determine their run time. For quicksort, the pivot the algorithm selects can have a significant effect on runtime: if the pivot is consistently the greatest or least element, quicksort takes O(n^2). Insertion sort is also influenced by factors besides array size: if the input is already in order, it runs in O(n) regardless of size, but on reverse-ordered input it takes O(n^2). Due to these factors, there is no size n for which one algorithm is guaranteed to perform better than the other; the crossover has to be measured, as in the rough sketch below. If you are concerned with the runtimes of sorting algorithms for large arrays, you should check out heapsort or mergesort: they are both O(n log n) and much faster for large inputs!
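Here is the rough measurement sketch referred to above (illustrative only; a serious benchmark needs warm-up, e.g. via JMH). It times a hand-written insertion sort against Arrays.sort, which is a dual-pivot quicksort for primitive arrays, over growing sizes to expose the crossover:

import java.util.Arrays;
import java.util.Random;

public class Crossover {
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        final int RUNS = 10_000;
        for (int n = 8; n <= 256; n *= 2) {
            long ins = 0, qs = 0;
            for (int run = 0; run < RUNS; run++) {
                int[] a = rnd.ints(n).toArray();   // fresh random array each run
                int[] b = a.clone();
                long t0 = System.nanoTime();
                insertionSort(a);
                long t1 = System.nanoTime();
                Arrays.sort(b);                    // dual-pivot quicksort for int[]
                long t2 = System.nanoTime();
                ins += t1 - t0;
                qs  += t2 - t1;
            }
            System.out.printf("n=%3d  insertion=%6d ns  quicksort=%6d ns%n",
                              n, ins / RUNS, qs / RUNS);
        }
    }
}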

Algorithm description - is it heapsort or quicksort?

I"m having some trouble telling whether this algorithm is heapsort or quicksort...
Lets say I have an algorithm that I don't have the source code for - it is unstable, performance is good on large datasets, and runs in similar time for ordered and unordered sets.
Without any more information, is it possible to tell whether this algorithm is heapsort or quicksort?
I would say that it is mostly impossible to tell which algorithm was used from the data you have.
Both quicksort and heapsort are unstable. Both also handle large inputs nicely (the constants are not that different). So these two facts tell us mostly nothing.
The last piece of knowledge is about sorted input. Quicksort (if randomized) is insensitive to input order, and the running time of heapsort is also Θ(n log n) for both directions of sort:
The running time of HEAPSORT on an array of length n that is already sorted in increasing order is Θ(n lg n), because even though it is already sorted, it will be transformed back into a heap and sorted.
The running time of HEAPSORT on an array of length n that is sorted in decreasing order will be Θ(n lg n). This occurs because even though the heap will be built in linear time, every time an element is removed and HEAPIFY is called, it could cover the full height of the tree.
The only way I would try to guess the algorithm is by exploiting the randomness of quicksort. By this I mean that I would run the same dataset many, many times and look for fluctuations in execution time (the worst case is O(n^2)). If I found no significant fluctuations, it is heapsort; otherwise, quicksort.
You may have more luck if you can analyze the memory it uses: heapsort requires O(1) extra space, whereas a good quicksort needs O(log n) additional memory and a naive one needs O(n). But you do not have this information at your disposal.
P.S. Thanks to Ixanezis and Mooingduck for pointing out that real-world quicksorts are not really randomized. I didn't know that, but it is true.
A correctly implemented quicksort runs in linear time on constant arrays (that is, arrays where all the elements are the same). That's because all elements will match the pivot, so after the pivoting step which separates the array into three parts: (< pivot)(= pivot)(> pivot) the left and right parts will be empty, and the quicksort will terminate immediately.
Heapsort doesn't have this property: it always runs in O(n log n).
So to distinguish the two, I'd try sorting constant arrays of increasing size, and hope to see a greater than linear slowdown in the heapsort implementation.
This approach can also distinguish heapsort from badly implemented quicksorts! If the quicksort separates the array into three parts (<= pivot)(pivot)(> pivot), then it will take O(n^2) time on a constant array, as the right-hand part will be empty and the left-hand part will have n-1 items in it. Sorting a 10,000,000-item array will distinguish this bad quicksort from heapsort: heapsort will take a few seconds on a modern machine, but the badly implemented quicksort will take many minutes.
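For reference, the "(< pivot)(= pivot)(> pivot)" pivoting step mentioned above is the classic three-way (Dutch national flag) partition; a minimal sketch:

// Three-way partition: a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
// On an all-equal array the left and right parts come back empty, so a
// quicksort built on this partition finishes constant arrays in linear time.
class ThreeWayPartition {
    static int[] partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int lt = lo, i = lo + 1, gt = hi;
        while (i <= gt) {
            if      (a[i] < pivot) swap(a, lt++, i++);
            else if (a[i] > pivot) swap(a, i, gt--);
            else                   i++;
        }
        return new int[] { lt, gt };      // boundaries of the "== pivot" block
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}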

Why quick sort is considered as fastest sorting algorithm?

Quicksort has a worst-case time complexity of O(n^2), while others like heapsort and mergesort have a worst-case time complexity of O(n log n). Still, quicksort is considered faster. Why?
On a side note, if sorting an array of integers, then counting / radix sort is fastest.
In general, merge sort does more moves but fewer compares than quicksort. The typical implementation of merge sort uses a temp array the same size as the original array, or half the size (sort the 2nd half into the 2nd half, sort the 1st half into the temp array, then merge the temp array and the 2nd half into the original array), so it needs more space than quicksort, which optimally needs only log2(n) levels of nesting. To avoid worst-case nesting, a depth check may be used and quicksort switched to heapsort; this is called introsort.
If the compare overhead is greater than the move overhead, then merge sort is faster. A common example where compares take longer than moves is sorting an array of pointers to strings: only the (4- or 8-byte) pointers are moved, while comparing two strings may touch far more bytes (and there may be a large number of strings).
If there is significant pre-ordering of the data to be sorted, then timsort (fixed sized runs) or a "natural" merge sort (variable sized runs) will be faster.
While it is true that quicksort has worst case time complexity of O(n^2), as long as the quicksort implementation properly randomizes the input, its average case (expected) running time is O(n log n).
Additionally, the constant factors hidden by the asymptotic notation, which do matter in practice, are pretty small compared to other popular choices such as merge sort. Thus, in expectation, quicksort will outperform other O(n log n) comparison sorts despite the less savory worst-case bounds.
Not exactly. Quicksort is the best in most cases; however, its pessimistic (worst-case) time complexity can be O(n^2), which doesn't mean it always is. The issue lies in choosing the right pivot: if you choose it correctly, you get O(n log n) time complexity.
In addition, quicksort is one of the cheapest/easiest sorts to implement; see the minimal version below.
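To back up the "easy to implement" claim, here is a minimal randomized quicksort (illustrative, about 20 lines); the random pivot is what defends against the O(n^2) worst case on adversarial or pre-sorted input:

import java.util.Random;

class QuickSort {
    private static final Random RND = new Random();

    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        swap(a, lo + RND.nextInt(hi - lo + 1), hi);   // random pivot, moved to the end
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)                 // Lomuto partition
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        sort(a, lo, i - 1);
        sort(a, i + 1, hi);
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}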

Differences in efficiency between merge quick and heap sort

All of these sorting algorithms have an average case of O(n log n), so I would just like to know how I would be able to differentiate between these three sorting algorithms if I could run tests but not know which sorting algorithm was being run.
Another difference between heapsort and mergesort you may want to consider is that heapsort is not a stable sort, but mergesort is.
Here is a table (link below) where you can find (almost) any information you want about comparison sorting algorithms:
https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
Heapsort is an in-place sorting algorithm: we don't need extra storage to sort the elements. Mergesort is not in-place: we need extra storage, in the merge procedure, to sort the elements. The worst-case running time of quicksort is O(n^2), which differentiates it from heapsort and mergesort.
There are cases in which the performance of these algorithms differs.
For example, if all input elements are the same:
heapsort will run in O(n) time,
quicksort will run in O(n^2) time (with a naive two-way partition and the last element as the pivot),
and mergesort will take O(n log n) time.

Comparison between timsort and quicksort

Why is it that I mostly hear about Quicksort being the fastest overall sorting algorithm when Timsort (according to wikipedia) seems to perform much better? Google didn't seem to turn up any kind of comparison.
TimSort is a highly optimized mergesort; it is stable and faster than the old mergesort.
Compared with quicksort, it has two advantages:
It is unbelievably fast for nearly sorted data sequences (including reverse-sorted data);
The worst case is still O(n log n).
To be honest, I don't think #1 is an advantage, but it did impress me.
Here are QuickSort's advantages:
QuickSort is very, very simple; even for a highly tuned implementation, its pseudocode fits within 20 lines;
QuickSort is fastest in most cases;
Its memory consumption is O(log n).
Currently, the Java 7 SDK implements TimSort and a new quicksort variant: dual-pivot quicksort.
If you need a stable sort, try timsort; otherwise, start with quicksort.
More or less, it has to do with the fact that Timsort is a hybrid sorting algorithm. This means that while the two underlying sorts it uses (Mergesort and Insertion sort) are both worse than Quicksort for many kinds of data, Timsort only uses them when it is advantageous to do so.
On a slightly deeper level, as Patrick87 states, quicksort is a worst-case O(n^2) algorithm. Choosing a good pivot isn't hard, but guaranteeing an O(n log n) quicksort comes at the cost of generally slower sorting on average.
For more detail on Timsort, see this answer, and the linked blog post. It basically assumes that most data is already partially sorted, and constructs "runs" of sorted data that allow for efficient merges using mergesort.
Generally speaking, quicksort is the best algorithm for primitive arrays. This is due to memory locality and caching.
JDK7 uses TimSort for Object arrays. An Object array only holds object references; the objects themselves are stored on the heap. Comparing objects means reading them from the heap: one object from one part of the heap, then another object from a random other part, which causes a lot of cache misses. I guess that for this reason memory locality matters less, and this may be why the JDK uses TimSort only for Object arrays instead of primitive arrays.
This is only my guess.
Here are benchmark numbers from my machine (i7-6700 CPU, 3.4GHz, Ubuntu 16.04, gcc 5.4.0, parameters: SIZE=100000 and RUNS=3):
$ ./demo
Running tests
stdlib qsort time: 12246.33 us per iteration
##quick sort time: 5822.00 us per iteration
merge sort time: 8244.33 us per iteration
...
##tim sort time: 7695.33 us per iteration
in-place merge sort time: 6788.00 us per iteration
sqrt sort time: 7289.33 us per iteration
...
grail sort dyn buffer sort time: 7856.67 us per iteration
The benchmark comes from Swenson's sort project, in which he has implemented several sorting algorithms in C. Presumably, his implementations are good enough to be representative, but I haven't investigated them.
So you really can't tell. Benchmark numbers only stay relevant for at most two years and then you have to repeat them. Possibly, timsort beat qsort waaay back in 2011 when the question was asked, but the times have changed. Or qsort was always the fastest, but timsort beat it on non-random data. Or Swenson's code isn't so good and a better programmer would turn the tide in timsort's favor. Or perhaps I suck and didn't use the right CFLAGS when compiling the code. Or... You get the point.
Tim Sort is great if you need an order-preserving sort, or if you are sorting a complex array (comparing heap-based objects) rather than a primitive array. As mentioned by others, quicksort benefits significantly from the locality of data and processor caching for primitive arrays.
The fact that the worst case of quicksort is O(n^2) was raised. Fortunately, you can achieve O(n log n) worst-case time with quicksort. The quicksort worst case occurs when the pivot is either the smallest or largest value, such as when the pivot is the first or last element of an already-sorted array.
We can achieve a worst-case O(n log n) quicksort by choosing the median as the pivot, since the median can be found in linear time, O(n). Each recursion level then does O(n) work for the median selection plus O(n) for the partitioning, and there are O(log n) levels because the median splits the array exactly in half, giving O(n log n) worst-case time complexity.
In practice, however, most implementations find that a random pivot is sufficient, so they do not search for the median value.
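For completeness, a hedged quickselect sketch for finding the median. Note it gives expected O(n), not worst-case O(n); the worst-case guarantee mentioned above needs the more involved median-of-medians pivot rule, which this sketch does not implement:

import java.util.Random;

class QuickSelect {
    private static final Random RND = new Random();

    // Returns the k-th smallest element (0-based); k = a.length / 2 gives a median.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + RND.nextInt(hi - lo + 1));
            if (k == p) return a[k];
            if (k < p) hi = p - 1;     // answer lies left of the pivot
            else       lo = p + 1;     // answer lies right of the pivot
        }
        return a[lo];
    }

    static int partition(int[] a, int lo, int hi, int pivotIndex) {
        swap(a, pivotIndex, hi);
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;                      // final position of the pivot
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}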
Timsort is a popular hybrid sorting algorithm designed in 2002 by Tim Peters. It is a combination of insertion sort and merge sort. It is developed to perform well on various kinds of real world data sets. It is a fast, stable and adaptive sorting technique with average and worst-case performance of O(n log n).
How Timsort works
First, the input array is split into sub-arrays/blocks known as runs.
A simple insertion sort is used to sort each run.
Merge sort is used to merge the sorted runs into a single array, as in the simplified sketch below.
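Those three steps can be sketched in a few dozen lines (a deliberately simplified toy, not real Timsort: the real algorithm finds natural runs, uses galloping mode, and keeps a merge stack with balance invariants):

import java.util.Arrays;

public class MiniTimSort {
    static final int RUN = 32;                         // fixed run length (real Timsort adapts)

    public static void sort(int[] a) {
        int n = a.length;
        for (int lo = 0; lo < n; lo += RUN)            // steps 1+2: insertion-sort each run
            insertionSort(a, lo, Math.min(lo + RUN - 1, n - 1));
        for (int width = RUN; width < n; width *= 2)   // step 3: merge runs pairwise
            for (int lo = 0; lo < n - width; lo += 2 * width)
                merge(a, lo, lo + width - 1, Math.min(lo + 2 * width - 1, n - 1));
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i], j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    static void merge(int[] a, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(a, lo, mid + 1);   // buffer only the left run
        int i = 0, j = mid + 1, k = lo;
        while (i < left.length && j <= hi)
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];   // <= keeps it stable
        while (i < left.length) a[k++] = left[i++];
    }
}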
Advantages of Timsort
It performs better on nearly ordered data.
It is well-suited to dealing with real-world data.
Quicksort is a highly useful and efficient sorting algorithm that divides a large array of data into smaller ones, based on the concept of divide and conquer. Tony Hoare designed this sorting algorithm in 1959; it has average performance of O(n log n).
How Quicksort works
Pick any element as the pivot.
Partition the array around the pivot.
Recursively apply quicksort to the left partition.
Recursively apply quicksort to the right partition.
Advantages of Quicksort
It performs better on random data as compared to Timsort.
It is useful when there is limited space availability.
It is better suited for large data sets.
