Hybrid merge + insertion sorting algorithm

In the classical merge sort algorithm, one typically divides the input array until only single-element subarrays remain, and then merges the elements back together. But it's well known that you can modify merge sort so that it splits the array only until you have, say, k subarrays, each of size n/k (where n is the original length of the array). You can then use insertion sort to sort each of those k subarrays and combine them with the usual merge subroutine.
Intuitively, I think this should beat plain merge sort in some cases, because insertion sort is fast on small arrays. But I want to figure out precisely when the hybrid algorithm is better than regular merge sort. I don't think it would be better for small k, because as k approaches 1 we'd just be running insertion sort on the whole array. I think there is some optimal subarray size n/k, but I am not sure how to find it.
Any help is appreciated.
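For concreteness, here is a minimal sketch of the hybrid described above. The cutoff of 16 is only illustrative; the right value has to be found empirically on your machine.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place (hi exclusive) with insertion sort."""
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def merge(left, right):
    """Standard merge of two sorted lists into a new sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def hybrid_sort(a, cutoff=16):
    """Recurse as in merge sort, but insertion-sort subarrays of size <= cutoff."""
    if len(a) <= cutoff:
        insertion_sort(a, 0, len(a))
        return a
    mid = len(a) // 2
    return merge(hybrid_sort(a[:mid], cutoff), hybrid_sort(a[mid:], cutoff))
```

Setting cutoff=1 (or 0) recovers plain merge sort, so the question above amounts to asking which cutoff minimizes running time.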

Related

Mixed Quick/Merge Sort's performance on random data

A test asked me to implement a sorting algorithm that sorts an array with Merge sort when its size N > 1000, and otherwise with Quick sort using a randomly chosen pivot.
Then suppose the keys to compare consist of integers randomly distributed on [1, M].
What should M be for the above algorithm to run best?
I let Quick sort handle the recursive calls of Merge sort once the size is <= 1000. In my opinion, because the keys and pivots are random, and Hoare's partition scheme isn't slowed down by repeated elements when M is much smaller than N, Quick sort will run at its best; and Merge sort runs the same for a given array size regardless of the key distribution. So what is M used for here?
Quicksort must be implemented carefully to avoid pathological cases. Choosing the pivot at random is a good way to avoid quadratic time complexity on sorted arrays, but it is not sufficient for arrays with many duplicate elements.
If M is much smaller than N, you will have lots of duplicates. The original algorithm does not handle duplicates efficiently, and this causes quicksort's performance to degrade significantly, because Hoare's original scheme removes only one element per recursion level on an array of all-identical elements.
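One standard fix is a three-way ("Dutch national flag") partition, which groups all keys equal to the pivot in a single pass so they never enter the recursion again. A minimal sketch with a random pivot, as in the test's setup:

```python
import random

def quicksort3(a, lo=0, hi=None):
    """Quicksort with a random pivot and three-way partition:
    after the loop, a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1; i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort3(a, lo, lt - 1)   # recurse only on the strictly-smaller part
    quicksort3(a, gt + 1, hi)   # and the strictly-larger part
```

With this partition, an array of all-equal keys is finished after one pass, instead of one element per recursion level.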
See this question for a study of an actual implementation, its behavior on arrays with randomly distributed data in a small range and how to fix the quicksort implementation to avoid performance degradation: Benchmarking quicksort and mergesort yields that mergesort is faster

Does sorting time of n numbers depend on a permutation of the numbers?

Consider this problem:
A comparison-based sorting algorithm sorts an array of n items. For what fraction of the n! permutations can the number of comparisons be cn, where c is a constant?
I know the best time complexity for sorting an array with arbitrary items is O(n log n), and it doesn't depend on any order, right? So there is no fraction that leads to cn comparisons. Please guide me if I am wrong.
This depends on the sorting algorithm you use.
Optimized Bubble Sort, for example, compares all neighboring elements of the array and swaps them when the left element is larger than the right one. This is repeated until a full pass performs no swaps.
When you give Bubble Sort a sorted array, it performs no swaps in the first pass and thus finishes in O(n).
On the other hand, Heapsort will take O(n log n) independent of the order of the input.
Edit:
Answering your question for a given sorting algorithm might be non-trivial. Only one out of the n! permutations is sorted (assuming no duplicates, for simplicity). However, for the example of Bubble Sort you could, starting from the sorted array, swap each pair of neighboring elements. This input takes Bubble Sort two passes, which is also O(n).
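Both claims in this answer can be checked with a small sketch that counts passes (the pass counter is added purely for illustration):

```python
def bubble_sort(a):
    """Optimized bubble sort; returns the number of passes made.
    A sorted input finishes after a single swap-free pass."""
    passes = 0
    swapped = True
    n = len(a)
    while swapped:
        swapped = False
        passes += 1
        for i in range(n - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        n -= 1  # the largest remaining element has bubbled into place
    return passes
```

A sorted array takes 1 pass; the "swap each neighboring pair" input takes 2 passes, both O(n) comparisons in total.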

O(nlogn) in-place sorting algorithm

This question was in the preparation exam for my midterm in introduction to computer science.
There exists an algorithm which can find the kth element in a list in
O(n) time, and suppose that it is in place. Using this algorithm,
write an in place sorting algorithm that runs in worst case time
O(n*log(n)), and prove that it does. Given that this algorithm exists,
why is mergesort still used?
I assume I must write some variant of the quicksort algorithm, which has a worst case of O(n^2), since merge sort is not an in-place algorithm. What confuses me is the given algorithm for finding the kth element in a list. Isn't a simple loop iterating through the elements of an array already an O(n) algorithm?
How can the provided algorithm make any difference to the running time of the sorting algorithm if it does not change anything in the execution time? I don't see how, used with either quicksort, insertion sort or selection sort, it could lower the worst case to O(n log n). Any input is appreciated!
Check wiki, namely the "Selection by sorting" section:
Similarly, given a median-selection algorithm or general selection algorithm applied to find the median, one can use it as a pivot strategy in Quicksort, obtaining a sorting algorithm. If the selection algorithm is optimal, meaning O(n), then the resulting sorting algorithm is optimal, meaning O(n log n). The median is the best pivot for sorting, as it evenly divides the data, and thus guarantees optimal sorting, assuming the selection algorithm is optimal. A sorting analog to median of medians exists, using the pivot strategy (approximate median) in Quicksort, and similarly yields an optimal Quicksort.
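A sketch of that pivot strategy, assuming a linear-time selection routine exists. The simple quickselect below stands in for median-of-medians and is not worst-case O(n); also, this sketch is not in place, it only illustrates the pivot choice:

```python
def select(a, k):
    """Return the k-th smallest element of a (0-indexed).
    Stand-in for a guaranteed-O(n) selection such as median-of-medians."""
    pivot = a[len(a) // 2]
    lt = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    gt = [x for x in a if x > pivot]
    if k < len(lt):
        return select(lt, k)
    if k < len(lt) + len(eq):
        return pivot
    return select(gt, k - len(lt) - len(eq))

def median_quicksort(a):
    """Quicksort that always pivots on the exact median found by select(),
    so every partition splits the input evenly."""
    if len(a) <= 1:
        return a
    m = select(a, (len(a) - 1) // 2)
    lt = [x for x in a if x < m]
    eq = [x for x in a if x == m]
    gt = [x for x in a if x > m]
    return median_quicksort(lt) + eq + median_quicksort(gt)
```

Because the pivot is the true median, the recursion depth is log n and each level does O(n) work, giving the O(n log n) worst case the exercise asks for.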
The short answer to why mergesort is preferred over quicksort in some cases is that it is stable (while quicksort is not).
Reasons for merge sort: Merge sort is stable. Merge sort does more moves but fewer compares than quick sort. If the compare overhead is greater than the move overhead, then merge sort is faster. One situation where compare overhead may be greater is sorting an array of indices or pointers to objects, such as strings.
If sorting a linked list, then merge sort using an array of pointers to the first nodes of working lists is the fastest method I'm aware of. This is how HP / Microsoft std::list::sort() is implemented. In the array of pointers, array[i] is either NULL or points to a list of length pow(2,i) (except the last pointer points to a list of unlimited length).
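A sketch of that scheme, using Python lists as stand-ins for the linked lists. As in the std::list::sort description above, slot i holds either nothing or a sorted run of length 2^i (the 32-slot array caps the input size for this sketch):

```python
def merge(a, b):
    """Merge two sorted lists (stand-ins for linked lists)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def list_merge_sort(items):
    """Bottom-up merge sort via an array of runs: inserting each element
    works like binary addition, carrying merged runs into higher slots."""
    slots = [None] * 32
    for x in items:
        run = [x]
        i = 0
        while slots[i] is not None:   # carry: merge equal-length runs upward
            run = merge(slots[i], run)
            slots[i] = None
            i += 1
        slots[i] = run
    result = []
    for run in slots:                 # final sweep merges the leftover runs
        if run is not None:
            result = merge(run, result)
    return result
```

Each element is touched O(log n) times across the carries, giving O(n log n) overall without any recursion.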
I found the solution:
if(start>stop) 2 op.
pivot<-partition(A, start, stop) 2 op. + n
quickSort(A, start, pivot-1) 2 op. + T(n/2)
quickSort(A, pivot+1, stop) 2 op. + T(n/2)
T(n)=8+2T(n/2)+n k=1
=8+2(8+2T(n/4)+n/2)+n
=24+4T(n/4)+2n k=2
...
=(2^k-1)*8+2^k*T(n/2^k)+kn
Recursion finishes when n=2^k <=> k=log2(n), and with T(1)=2:
T(n)=(2^(log2(n))-1)*8+2^(log2(n))*2+log2(n)*n
=8n-8+2n+nlog2(n)
=10n+nlog2(n)-8
=n(10+log2(n))-8
which is O(nlogn)
Quick sort has worst case O(n^2), but that only occurs if you have bad luck when choosing the pivot. If you can select the kth element in O(n), you can choose a good pivot with O(n) extra steps, which yields a worst-case O(n log n) algorithm. There are a couple of reasons why mergesort is still used. First, this selection algorithm is more or less cumbersome to implement in place, and it also adds several extra operations to the regular quicksort, so the result is not as much faster than merge sort as one might expect.
Nevertheless, worst-case complexity is not the reason MergeSort is still used: HeapSort achieves the same worst-case bound and is also in place, yet it did not replace MergeSort (and HeapSort has other disadvantages against quicksort too). The main reason MergeSort survives is that it is the fastest stable sorting algorithm known so far. There are several applications in which a stable sorting algorithm is paramount, and that is the strength of MergeSort.
A stable sort is one in which equal items preserve their original relative order. For example, this is very useful when you have two keys: you can sort by the second key first and then stably sort by the first key, and the second-key order is preserved within each group of equal first keys.
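A quick illustration of that two-key trick, relying on the fact that Python's built-in sort is stable:

```python
records = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]
by_second = sorted(records, key=lambda r: r[1])  # secondary key first
by_both = sorted(by_second, key=lambda r: r[0])  # stable sort by primary key
# Equal primary keys keep their secondary-key order:
# [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
```

With an unstable sort, the second call could scramble the order within the "a" and "b" groups.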
The problem with HeapSort compared with quicksort is that it is cache-inefficient: it swaps and compares elements that are far apart in the array, while quicksort compares nearby elements, which are more likely to be in the cache at the same time.

What is the best sorting methods?

I need more Sorting algorithms instead of these :
Insertion
Selection
Bubble
Shell
Merge
Heap
Quick
Radix
can anyone help ?
Quick sort is one of the best sorting algorithms. QuickSort is a Divide and Conquer algorithm: it picks an element as the pivot, moves elements smaller than the pivot to its left, and moves larger elements to its right.
Like QuickSort, Merge Sort is a Divide and Conquer algorithm: the array is recursively divided into two halves until the size becomes 1, and the halves are then merged back together in sorted order.
Bubble Sort is the simplest sorting algorithm. It is not used in the real world, since it is not very efficient: elements bubble step by step toward their final positions.
Selection sort sorts an array by repeatedly finding the minimum (or maximum) element. The algorithm maintains two subarrays: one contains the selected elements, already sorted; the other contains the remaining items, not yet sorted.
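A minimal sketch of the selection sort just described, with the sorted prefix growing one element per pass:

```python
def selection_sort(a):
    """In-place selection sort: repeatedly swap the minimum of the
    unsorted suffix a[i:] into position i."""
    for i in range(len(a) - 1):
        m = i
        for j in range(i + 1, len(a)):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a
```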
Look at these links:
http://www.sorting-algorithms.com/
https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html

Linear vs Insertion vs Binary vs Merge Sort

So I know the O(N) for linear is n, insertion is n**2, binary is log(n) and merge is nlogn
So Merge Sort is the best search for large lists. Which of the above is the best for small lists i.e. how small? Thanks
You're mixing up sort and search algorithms. Linear search and binary search are algorithms for finding a value in an array, not sorting the array. Insertion sort and mergesort are sorting algorithms.
Insertion sort tends to run faster for small arrays. Many high-performance sorting routines, including Python's adaptive mergesort (Timsort), automatically switch to insertion sort for small input sizes. The best size for the switch to occur is generally determined by testing. Java uses insertion sort for <= 6 elements in the primitive-array versions of Arrays.sort; I'm not sure exactly how Python behaves.
You have got your facts wrong:
There is nothing called Linear Sort.
Insertion Sort is O(N^2)
There is nothing called Binary Sort
Though it could be Heap Sort which is O(NlogN)
MergeSort is O(NlogN)
QuickSort is O(NlogN)
It is better to switch to insertion sort from merge sort if number of elements is less than 7
It is better to switch to insertion sort from quick sort if number of elements is less than 13
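The quicksort cutoff above can be wired in like this (a sketch; the cutoff of 13 follows the rule of thumb just stated, and the exact best value should be found by benchmarking):

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place with insertion sort."""
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def quicksort_cutoff(a, lo=0, hi=None, cutoff=13):
    """Quicksort (Hoare-style partition, middle-element pivot) that hands
    ranges of <= cutoff elements to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort_cutoff(a, lo, j, cutoff)
    quicksort_cutoff(a, i, hi, cutoff)
```

The same cutoff structure works for merge sort: stop recursing once the subarray is small and insertion-sort it instead.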
