Memory vs. speed tradeoff of sorting algorithms - performance

Consider only bubble sort and merge sort. For bubble sort, time complexity ranges from O(n) in the best case to O(n^2) in the worst case, with O(1) space complexity. For merge sort, time complexity is O(n log n) with O(n) space complexity. Which sort would you choose if the input size is less than 1000, and why? What about more than 1000?
This is an interview question I had. Just want to know how you guys would answer it.

Consider only the bubble sort and merge sort.
By "less than 1000", it probably means that RAM is enough for any sorting algorithm without external storage. It also implies that the theoretical time-complexity bound doesn't matter in this case: you can pick any sorting algorithm you like without incurring a noticeable time penalty. For example, you could use bubble sort, since it is easy and intuitive to implement. Merge sort is just as good.
When the input size is bigger than 1000, the question is probably assuming that time complexity matters and even that RAM may not be big enough without external storage. In that case, if you have to choose between the two, merge sort is the safe one to pick: it has better worst-case performance than bubble sort, and it is a good candidate for external sorting (when the input size is bigger than RAM).
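To make the tradeoff concrete, here is a minimal bubble sort sketch with the early-exit flag that gives the O(n) best case mentioned in the question; the class and method names are mine, purely for illustration.

// Early-exit bubble sort: O(n) best case (already sorted), O(n^2) worst case, O(1) extra space.
final class BubbleSortSketch {
    static void sort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break; // no swaps in a full pass: the array is already sorted
        }
    }
}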

Related

Sorting technique - most efficient

What sorting technique would you use to sort 10,000 items using just 1000 available slots in your RAM?
Heap Sort
Quick Sort
Bubble Sort
Merge Sort
I am confused between quick sort and merge sort. Both have an average time complexity of O(n log n), but then heap sort also has the same complexity. Any input would be appreciated!
Time complexity won't help you here - what the question is looking for is space complexity. Just as a hint: n = 10000 and you have only 1000 available slots, so you need to pick an algorithm that has better than O(n) space complexity even in the worst case.
This seems like a homework question, so I'd prefer not to answer directly. In general, though, since your RAM is small and your list is big, you'll do best with something like a cache-oblivious algorithm.
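Without giving the answer away, it may help to see what O(1) auxiliary space looks like in practice. Of the listed options, heap sort sorts in place with O(1) extra space; the sketch below is mine, for illustration only, and is not necessarily the intended answer.

// Illustrative in-place heap sort: O(n log n) worst case, O(1) auxiliary space.
final class HeapSortSketch {
    static void sort(int[] a) {
        int n = a.length;
        // Build a max-heap bottom-up.
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);
        // Repeatedly move the current maximum to the end and shrink the heap.
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end);
        }
    }

    private static void siftDown(int[] a, int root, int n) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= n) return;
            if (child + 1 < n && a[child + 1] > a[child]) child++; // pick the larger child
            if (a[root] >= a[child]) return;                       // heap property restored
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }
}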

What kind of input data are the following sorting algorithms good/bad for?

What kinds of input data are the following sorting algorithms efficient or inefficient on? Quicksort, Mergesort, Heapsort, Insertion sort, etc.
I know there are at least 2 factors that affect the performance of a sorting algorithm: 1) The size of the input, and 2) whether or not the data is already mostly sorted. But I don't know exactly how these factors affect the efficiency of the algorithms.
I'd like to study this in detail, so if there are any sources/links that you can point me to, that'd be great.
Assuming quicksort uses the Hoare partition scheme with the middle element as the pivot, it won't degrade to its worst-case time complexity of O(n^2) on almost-sorted data.
https://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Mergesort always does about n ⌈log2(n)⌉ moves. If the data is already sorted, then the number of compares is about n ⌈log2(n)⌉ / 2.
Heapsort time complexity remains about the same (duplicates may reduce running time).
Insertion sort is the only sort in this list that is faster if the data is nearly sorted, but its time complexity is O(n^2). I'm thinking that for nearly sorted data, the time complexity would be roughly O(m n), where m is the number of elements out of place.
Variations of natural merge sort, which might use insertion sort on small runs while scanning and identifying already sorted runs, would have time complexity O(n) on already sorted data.
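To make the "fast on nearly sorted data" point concrete, here is a minimal insertion sort sketch (the class name is mine, for illustration): each element is shifted only past the elements it is actually out of order with, which is why nearly sorted input runs in close to linear time.

// Minimal insertion sort: O(n^2) worst case, but close to O(n) on nearly
// sorted input, since each element shifts only past the elements it is
// actually out of order with.
final class InsertionSortSketch {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j]; // shift larger elements one slot to the right
                j--;
            }
            a[j + 1] = key;
        }
    }
}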

Why is quicksort considered the fastest sorting algorithm?

Quicksort has a worst-case time complexity of O(n^2), while others like heap sort and merge sort have a worst-case time complexity of O(n log n). Still, quicksort is considered faster. Why?
On a side note, if sorting an array of integers, then a counting / radix sort is usually fastest.
In general, merge sort does more moves but fewer compares than quick sort. The typical implementation of merge sort uses a temp array the same size as the original array, or half that size (sort the 2nd half in place, sort the 1st half into the temp array, then merge the temp array and the 2nd half back into the original array), so it needs more space than quick sort, which optimally only needs log2(n) levels of recursion. To avoid worst-case recursion depth, a depth check may be used and quick sort switched to heap sort; this is called introsort.
If the compare overhead is greater than the move overhead, then merge sort is faster. A common example where compares take longer than moves is sorting an array of pointers to strings: only the (4 or 8 byte) pointers are moved, while comparing the strings themselves can be much more expensive (especially with a large number of strings).
If there is significant pre-ordering of the data to be sorted, then timsort (fixed sized runs) or a "natural" merge sort (variable sized runs) will be faster.
While it is true that quicksort has worst case time complexity of O(n^2), as long as the quicksort implementation properly randomizes the input, its average case (expected) running time is O(n log n).
Additionally, the constant factors hidden by the asymptotic notation, which do matter in practice, are pretty small compared to other popular choices such as merge sort. Thus, in expectation, quicksort will outperform other O(n log n) comparison sorts despite its less savory worst-case bound.
Not exactly like that. Quicksort is the best in most cases; however, its worst-case time complexity can be O(n^2), which doesn't mean it always is. The issue lies in choosing the right pivot: if you choose it well, you get O(n log n) time complexity.
In addition, quicksort is one of the simplest sorts to implement.
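To illustrate the pivot point above, here is a minimal quicksort sketch that picks a random pivot (Lomuto partition, for brevity); the class and method names are mine, and this is only one of many reasonable implementations.

import java.util.concurrent.ThreadLocalRandom;

// Illustrative quicksort with a random pivot (Lomuto partition). Expected
// O(n log n); randomization makes the O(n^2) worst case unlikely for any
// fixed input, though it is still possible.
final class QuickSortSketch {
    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    private static int partition(int[] a, int lo, int hi) {
        int r = ThreadLocalRandom.current().nextInt(lo, hi + 1);
        swap(a, r, hi);                 // move the randomly chosen pivot to the end
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);                 // place the pivot in its final position
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}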

Quicksort vs Mergesort on a big array with a high range of values

If I have a big array with a high range of values, which would be faster, Quick sort or Merge sort?
At first I would say both take about the same time, because both have a best case of O(n log n) and neither sorting algorithm should be negatively affected by the array's characteristics.
But because quick sort is very reliant on the pivot choice, you might want to argue that merge sort is better.
From the theoretical point of view, there should be no big difference:
quicksort has an expected complexity of O(n log n) for a random choice of the pivot element, but in the worst case it could be up to O(n²).
mergesort has a fixed complexity of O(n log n).
I know that a professor of mine and his research group sort big sets of data (> 100 GB) using mergesort. He says that mergesort can be modified to reduce the number of read/write operations on the hard disk. Since those are very slow compared to read/write operations in RAM, they are what slows the algorithm down massively.
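The post doesn't show how that modified mergesort works, but the usual idea behind an external merge sort is to sort chunks that fit in RAM, write each chunk out as a sorted run, and then k-way merge the runs with sequential reads. A minimal in-memory stand-in for the merge phase (lists instead of files; names are mine) might look like this:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the merge phase of an external merge sort: k sorted runs are
// merged with a min-heap, reading each run sequentially. With files instead
// of lists, each run is read and the output written exactly once per pass.
final class KWayMergeSketch {
    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        // Heap entries hold {current value, index of the run it came from}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        List<Iterator<Integer>> its = new ArrayList<>();
        for (int r = 0; r < sortedRuns.size(); r++) {
            Iterator<Integer> it = sortedRuns.get(r).iterator();
            its.add(it);
            if (it.hasNext()) heap.add(new int[] { it.next(), r });
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            Iterator<Integer> it = its.get(top[1]);
            if (it.hasNext()) heap.add(new int[] { it.next(), top[1] }); // refill from the same run
        }
        return out;
    }
}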

When is each sorting algorithm used? [closed]

What are the use cases when a particular sorting algorithm is preferred over others - merge sort vs QuickSort vs heapsort vs 'intro sort', etc?
Is there a recommended guide in using them based on the size, type of data structure, available memory and cache, and CPU performance?
First, a definition, since it's pretty important: A stable sort is one that's guaranteed not to reorder elements with identical keys.
Recommendations:
Quick sort: When you don't need a stable sort and average case performance matters more than worst case performance. A quick sort is O(N log N) on average, O(N^2) in the worst case. A good implementation uses O(log N) auxiliary storage in the form of stack space for recursion.
Merge sort: When you need a stable, O(N log N) sort, this is about your only option. The only downsides to it are that it uses O(N) auxiliary space and has a slightly larger constant than a quick sort. There are some in-place merge sorts, but AFAIK they are all either not stable or worse than O(N log N). Even the O(N log N) in place sorts have so much larger a constant than the plain old merge sort that they're more theoretical curiosities than useful algorithms.
Heap sort: When you don't need a stable sort and you care more about worst case performance than average case performance. It's guaranteed to be O(N log N), and uses O(1) auxiliary space, meaning that you won't unexpectedly run out of heap or stack space on very large inputs.
Introsort: This is a quick sort that switches to a heap sort after a certain recursion depth to get around quick sort's O(N^2) worst case. It's almost always better than a plain old quick sort, since you get the average case of a quick sort, with guaranteed O(N log N) performance. Probably the only reason to use a heap sort instead of this is in severely memory constrained systems where O(log N) stack space is practically significant.
Insertion sort: When N is guaranteed to be small, including as the base case of a quick sort or merge sort. While this is O(N^2), it has a very small constant and is a stable sort.
Bubble sort, selection sort: When you're doing something quick and dirty and for some reason you can't just use the standard library's sorting algorithm. The only advantage these have over insertion sort is being slightly easier to implement.
Non-comparison sorts: Under some fairly limited conditions it's possible to break the O(N log N) barrier and sort in O(N). Here are some cases where that's worth a try:
Counting sort: When you are sorting integers with a limited range.
Radix sort: When log(N) is significantly larger than K, where K is the number of radix digits.
Bucket sort: When you can guarantee that your input is approximately uniformly distributed.
Quicksort is usually the fastest on average, but it has some pretty nasty worst-case behaviors. So if you have to guarantee that no bad data gives you O(N^2), you should avoid it.
Merge-sort uses extra memory, but is particularly suitable for external sorting (i.e. huge files that don't fit into memory).
Heap-sort can sort in-place and doesn't have the worst case quadratic behavior, but on average is slower than quicksort in most cases.
Where only integers in a restricted range are involved, you can use some kind of radix sort to make it very fast.
In 99% of the cases, you'll be fine with the library sorts, which are usually based on quicksort.
The Wikipedia page on sorting algorithms has a great comparison chart.
http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
What the provided links to comparisons/animations do not consider is the case when the amount of data exceeds available memory --- at which point the number of passes over the data, i.e. the I/O cost, dominates the runtime. If you need to do that, read up on "external sorting", which usually covers variants of merge sort and heap sort.
http://corte.si/posts/code/visualisingsorting/index.html and http://corte.si/posts/code/timsort/index.html also have some cool images comparing various sorting algorithms.
@dsimcha wrote:
Counting sort: When you are sorting integers with a limited range
I would change that to:
Counting sort: When you sort positive integers (0 to Integer.MAX_VALUE - 2, due to the size limit of the pigeonhole (count) array).
You can always get the max and min values as an efficiency heuristic in linear time as well.
Also, you need at least n extra space for the intermediate array, and it is obviously stable.
/**
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
(even though it will actually allow arrays up to Integer.MAX_VALUE - 2)
see:
Do Java arrays have a maximum size?
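Putting the above together (scan for min and max first, then count), a minimal counting sort sketch along those lines might look like the following; the class name is mine, and the simple rewrite-in-place variant shown is not the stable variant discussed above (stability only matters when sorting records by a key, not plain ints).

// Illustrative counting sort. A first pass finds min and max, so negative
// values are handled and the count array is sized max - min + 1 (which must
// fit in an int and stay within the VM's array-size limit).
final class CountingSortSketch {
    static void sort(int[] a) {
        if (a.length == 0) return;
        int min = a[0], max = a[0];
        for (int v : a) { if (v < min) min = v; if (v > max) max = v; }
        int[] count = new int[max - min + 1]; // one pigeonhole per possible value
        for (int v : a) count[v - min]++;
        int i = 0;
        for (int c = 0; c < count.length; c++) {
            while (count[c]-- > 0) a[i++] = c + min; // rewrite the input in order
        }
    }
}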
Also I would explain that radix sort complexity is O(wn) for n keys which are integers of word size w. Sometimes w is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which all perform O(n log n) comparisons to sort n keys. However, in general w cannot be considered a constant: if all n keys are distinct, then w has to be at least log n for a random-access machine to be able to store them in memory, which gives at best a time complexity O(n log n). (from wikipedia)
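As an illustration of that O(wn) behaviour, here is a least-significant-digit radix sort over 32-bit ints using four byte-sized passes (w/8 = 4); the class name is mine, and this is only a sketch.

// LSD radix sort on 32-bit ints: 4 byte-sized digit passes (O(w·n) with w = 32),
// plus an O(n) output buffer. The sign bit of the most significant byte is
// flipped so negative numbers sort before positive ones.
final class RadixSortSketch {
    static void sort(int[] a) {
        int n = a.length;
        int[] src = a, dst = new int[n];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int v : src) {
                int digit = (v >>> shift) & 0xFF;
                if (shift == 24) digit ^= 0x80;          // flip sign bit on the final pass
                count[digit + 1]++;
            }
            for (int d = 0; d < 256; d++) count[d + 1] += count[d]; // prefix sums = start offsets
            for (int v : src) {
                int digit = (v >>> shift) & 0xFF;
                if (shift == 24) digit ^= 0x80;
                dst[count[digit]++] = v;                 // stable placement by current digit
            }
            int[] tmp = src; src = dst; dst = tmp;       // swap buffers for the next pass
        }
        // After an even number of passes (4), the sorted result ends up back in a.
    }
}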

Resources