Combining insertion sort and merge sort - sorting

I was thinking of optimizing the current sorting algorithms. To make the process faster, I thought of using threads and dividing the array into two parts: sort both halves simultaneously with insertion sort, one thread each, and wait for both to complete. After that, use bubble sort to combine the two sorted halves. Do you think sorting will be faster with this algorithm?

No, that will not be faster, in the general case. Imagine, for example, that your initial array looks like this:
[9,5,7,6,8,3,2,0,4,1]
After sorting the two halves it looks like this:
[5,6,7,8,9,0,1,2,3,4]
Sorting that with bubble sort won't be appreciably faster than sorting the initial array with bubble sort. Total time elapsed with the insertion sorts plus the bubble sort almost certainly will be more than if you just sorted the initial array with a single thread.
Neither bubble sort nor insertion sort is particularly amenable to parallelization. You're better off implementing a parallel quicksort. Or, if you insist on using insertion sort in the threads, use a merge to combine the sorted sub-arrays. Of course, the merge would require O(n) additional memory.
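To make the merge-based alternative concrete, here is a minimal Python sketch: each half is sorted with insertion sort in its own thread, then a linear merge combines the two sorted halves. The names `insertion_sort`, `merge` and `two_thread_sort` are made up for this example. Note also that in CPython the global interpreter lock keeps the two threads from running their comparisons truly in parallel, so this shows the structure of the approach, not a measured speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def insertion_sort(items):
    """Return a new sorted list using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for key.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def merge(left, right):
    """Merge two sorted lists in O(n) time using O(n) extra memory."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def two_thread_sort(items):
    """Sort each half in its own thread, then merge the results."""
    mid = len(items) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(insertion_sort, items[:mid])
        right = pool.submit(insertion_sort, items[mid:])
        return merge(left.result(), right.result())


print(two_thread_sort([9, 5, 7, 6, 8, 3, 2, 0, 4, 1]))
```

The merge step touches each element once, which is why it beats running bubble sort over the two sorted halves.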

No. Your method would not be faster than known parallel sorting algorithms. Insertion sort is O(n^2), so using it would not give a better result unless the work is parallelised much further. Assuming you have only two threads to work with, the best option is merge sort with two threads, which keeps the worst case at O(n log n). I don't know why you want to learn two-thread parallelism in particular, but the following resource may help you get the knack of parallel sorting algorithms:
http://www.dcc.fc.up.pt/~fds/aulas/PPD/1112/sorting.pdf

Related

In what situations are slower sorting algorithms (bubble sort, selection sort, etc.) more useful than faster algorithms such as Quicksort?

I just wrote an essay about the efficiency and usefulness of different sorting algorithms. I concluded that merge sort and quicksort were far better when sorting completely randomized lists. I just wanted to ask in what situations would the slower sorting algorithms for this scenario (bubble sort and selection sort) be more useful or as useful as quicksort and merge sort.
Note that both merge sort and quicksort may require additional memory, whether that is stack space for the recursion or an actual copy of the buffer.
Bubble sort and selection sort do not require additional memory, so they can be used in situations where memory is strictly limited.
The recursion involved in quicksort or merge sort can be expensive when we are sorting only a few numbers (maybe fewer than 100).
In this case insertion sort performs much better.
Even the standard sorting implementations in Java's libraries use a mix of quicksort, merge sort and insertion sort.
HTH
There are stable sorting algorithms which keep items that compare equal arranged in the original order. If this is important to you, then you would use a stable sorting algorithm even if slower.
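A small Python sketch may make stability easier to see. It sorts (key, label) pairs by key only, once with insertion sort (stable) and once with selection sort (not stable in its usual swap-based form). The function names and sample records are invented for illustration.

```python
def insertion_sort_by_key(records):
    """Stable: equal keys are never moved past each other."""
    result = list(records)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Strict '>' stops at an equal key, which preserves input order.
        while j >= 0 and result[j][0] > current[0]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def selection_sort_by_key(records):
    """Not stable: the long-distance swap can jump over an equal key."""
    result = list(records)
    for i in range(len(result)):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j][0] < result[smallest][0]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


records = [(2, "a"), (2, "b"), (1, "c")]
print(insertion_sort_by_key(records))  # [(1, 'c'), (2, 'a'), (2, 'b')]
print(selection_sort_by_key(records))  # [(1, 'c'), (2, 'b'), (2, 'a')]
```

Both outputs are sorted by key, but only the first keeps "a" ahead of "b", which is the property that matters when you sort by one field after another.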
There are some cases where simpler algorithms are faster (because obviously a slower algorithm is never more useful if it doesn't have other advantages).
I have seen one situation where an array was sorted, and then each array element was modified by a tiny amount. So most items were in the right position, and only a few needed exchanging. In that case shakersort proved optimal.
If you know that the array was sorted, and then a small number of items was changed, there is a clever algorithm for that (which you find somewhere on cs.stackexchange.com): If k items have been changed, you can extract at most 2k items into a separate array, sort that (using Quicksort most likely) and merge the two arrays.
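One way to realise the extraction idea, as I understand it, is sketched below in Python. Whenever an element is smaller than the last element kept so far, both are moved to a side list, so what remains is always sorted. The side list is then sorted and merged back. The name `repair_sort` is hypothetical, and the built-in `list.sort` stands in for the Quicksort mentioned above.

```python
import heapq


def repair_sort(items):
    """Sort a list that is sorted except for a few changed items."""
    kept = []        # always non-decreasing
    extracted = []   # out-of-order pairs pulled out of the sequence
    for x in items:
        if kept and x < kept[-1]:
            # x and its predecessor disagree; at least one of them was
            # changed, so set both aside instead of guessing which.
            extracted.append(kept.pop())
            extracted.append(x)
        else:
            kept.append(x)
    extracted.sort()
    # Linear merge of two sorted sequences.
    return list(heapq.merge(kept, extracted))


print(repair_sort([1, 2, 3, 10, 5, 6, 0, 8, 9]))
# [0, 1, 2, 3, 5, 6, 8, 9, 10]
```

The scan and the merge are linear, so the total cost is O(n + k log k) when only about 2k items end up in the side list.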
If you use a library function, it is unlikely to be plain Quicksort. For example, Apple's implementation looks for a sorted range at the beginning and the end of the array, and if there are substantial numbers of items already sorted, takes advantage of this (for example sorting the concatenation of two sorted arrays runs in linear time).

Are Selection or Insertion sort useful outside of academic environments?

Do these sorting algorithms have any use in real world application?
Or is it just a basic example of sorting algorithm with n^2 complexity?
Can anyone give some example of its usage?
Insertion sort is one of the fastest sorting algorithms for very small arrays.
In practice, many quicksort / mergesort implementations stop recursing when the subarray to sort falls below a certain threshold, and insertion sort is then used for these small subarrays.
Selection sort is rarely used in practice.
Insertion sort is actually pretty fast for small input sizes, due to the small hidden constants in its complexity. Up to some size, insertion sort is faster than merge sort.
Thus, for many popular sorting algorithms, when the array size becomes very small, insertion sort is employed.
Bottom line: an O(N^2) algorithm may be faster in practice than an O(N log N) algorithm for sufficiently small inputs, owing to the hidden constants.
Yes, insertion sort is widely used in industrial applications. That's mainly due to the fact that several popular C++ standard libraries, such as libstdc++ and libc++, implement their sort routine as a combination of insertion sort and depth-limited quicksort.
The idea is that insertion sort works very fast on nearly-sorted arrays, while for a straightforward implementation of quicksort, sorted input leads to the worst-case behavior. Therefore the combined algorithm first applies a quicksort-like algorithm to partially sort the input, and then finishes off with a call to insertion sort.
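The two-phase idea can be sketched in Python as follows. This is not the libstdc++ or libc++ code, only an illustration of the pattern: quicksort stops early on short ranges, and one insertion sort pass over the whole array finishes the job. `THRESHOLD`, the random pivot choice and all function names are assumptions made for this example.

```python
import random

THRESHOLD = 16  # assumed cutoff; real libraries choose their own value


def partial_quicksort(a, lo, hi):
    """Partition recursively, leaving ranges of THRESHOLD or fewer items unsorted."""
    while hi - lo + 1 > THRESHOLD:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the left part, keep looping on the right part.
        partial_quicksort(a, lo, j)
        lo = i


def insertion_sort_in_place(a):
    """Plain insertion sort; cheap here because no element is far from home."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_sort(a):
    partial_quicksort(a, 0, len(a) - 1)
    insertion_sort_in_place(a)
    return a


data = [random.randint(0, 999) for _ in range(200)]
print(hybrid_sort(data) == sorted(data))
```

After the first phase every element sits inside a short block that is already in the right place relative to the other blocks, so the final insertion sort only ever shifts an element a few positions.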
In libc++ insertion sort is also used for sorting by default if the input size is small enough (but larger than five elements, as sizes <= 5 are handled as special cases).
HP / Microsoft std::sort() is introsort, quicksort with a depth parameter that switches to heapsort if the depth becomes too deep.
HP / Microsoft std::stable_sort() is a hybrid in the spirit of timsort: it uses insertion sort to create groups of 32 sorted elements, then uses bottom-up merge sort to merge the groups.
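The scheme described here (insertion sort on groups of 32, then bottom-up merging) can be sketched in Python. This illustrates the general technique and is not the library's actual code. The function names are made up, and `RUN = 32` simply mirrors the group size quoted above.

```python
RUN = 32  # group size quoted in the text above


def insertion_sort_range(a, lo, hi):
    """Sort a[lo:hi] in place with insertion sort."""
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def merge_runs(a, lo, mid, hi):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] back into a, stably."""
    left, right = a[lo:mid], a[mid:hi]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        # Take from the right run only when it is strictly smaller,
        # so equal items keep their original order.
        if right[j] < left[i]:
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1
        k += 1
    a[k:hi] = left[i:] + right[j:]


def bottom_up_hybrid_sort(a):
    n = len(a)
    # Phase 1: sort each group of RUN elements with insertion sort.
    for lo in range(0, n, RUN):
        insertion_sort_range(a, lo, min(lo + RUN, n))
    # Phase 2: merge neighbouring runs, doubling the run width each round.
    width = RUN
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            if mid < hi:
                merge_runs(a, lo, mid, hi)
        width *= 2
    return a
```

There is no recursion at all: the loop over `width` plays the role that the call stack plays in a top-down merge sort.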
On a side note, top down merge sort is not used in any common library that I'm aware of. Instead the common library versions for both internal (memory) and external (disk) merge sorts are all variations of bottom up merge sort (like timsort). Yet in a classroom environment or on web site articles, you see more examples of top down merge sort than bottom up merge sort.

A Way To Improve Sorting Algorithms?

For sorting algorithms, why can't you just cut the array in half, and just use selection sort or insertion sort on both, and put them back together, to significantly improve the speed?
You're saying that your algorithm is faster than existing sorts, for example, selection sort and insertion sort. But then, once you've split your array in half, you'd be better using your algorithm rather than selection/insertion sort to sort the halves (perhaps unless the halves are small).
This is exactly merge-sort.
You are right, and this approach is used in some sorting algorithms. Merge sort, for example, divides the array into two halves. If the halves are small, you can apply insertion sort to them directly; if they are large, that is not practical and you are better off dividing the halves again (see the details of merge sort). Insertion sort, selection sort and bubble sort perform better when the array is small or, in general, on nearly sorted data. If you are tackling large inputs, choose merge sort, quicksort or radix sort.

In what situations do I use these sorting algorithms?

I know the implementation of most of these algorithms, but I don't know for what size of data set (and what kind of data) I should use each of them:
Merge Sort
Bubble Sort (I know, not very often)
Quick Sort
Insertion Sort
Selection Sort
Radix Sort
First of all, you take all the sorting algorithms that have O(n^2) complexity and throw them away.
Then, you have to study several properties of your sorting algorithms and decide whether each one of them will be better suited for the problem you want to solve. The most important are:
Is the algorithm in-place? This means that the sorting algorithm doesn't use any extra memory (O(1), actually). This property is very important when you are running memory-critical applications.
Bubble-sort, Insertion-sort and Selection-sort use constant memory.
There is an in-place variant for Merge-sort too.
Is the algorithm stable? This means that if two elements x and y are equal given your comparison method, and in the input x is found before y, then in the output x will be found before y.
Merge-sort, Bubble-sort and Insertion-sort are stable.
Can the algorithm be parallelized? If the application you are building can make use of parallel computation, you might want to choose parallelizable sorting algorithms.
Use Bubble Sort only when the data to be sorted is stored on rotating drum memory. It's optimal for that purpose, but not for random-access memory. These days, that amounts to "don't use Bubble Sort".
Use Insertion Sort or Selection Sort up to some size that you determine by testing it against the other sorts you have available. This usually works out to be around 20-30 items, but YMMV. In particular, when implementing divide-and-conquer sorts like Merge Sort and Quick Sort, you should "break out" to an Insertion sort or a Selection sort when your current block of data is small enough.
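Since the cutoff is something you find by testing, a small timing harness is the natural tool. The Python sketch below times a plain insertion sort against a plain top-down merge sort on random data of several sizes. The function names, the sizes tried and the repeat count are arbitrary choices for illustration, and the crossover you see will depend on your language, data and machine.

```python
import random
import timeit


def insertion_sort(items):
    """Return a new sorted list using insertion sort."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    return merged + left[i:] + right[j:]


def compare(sizes, repeats=50):
    """Return {size: (insertion_seconds, merge_seconds)} on random data."""
    results = {}
    for n in sizes:
        data = [random.random() for _ in range(n)]
        t_ins = timeit.timeit(lambda: insertion_sort(data), number=repeats)
        t_mrg = timeit.timeit(lambda: merge_sort(data), number=repeats)
        results[n] = (t_ins, t_mrg)
    return results


for n, (t_ins, t_mrg) in compare([4, 16, 64, 256]).items():
    print(n, f"insertion={t_ins:.5f}s", f"merge={t_mrg:.5f}s")
```

The size at which the merge sort column starts winning is a reasonable first guess for the "break out" threshold in your own divide-and-conquer sort.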
Also use Insertion Sort on nearly-sorted data, for example if you somehow know that your data used to be sorted, and hasn't changed very much since.
Use Merge Sort when you need a stable sort (it's also good when sorting linked lists); beware that for arrays it uses significant additional memory.
Generally you don't use "plain" Quick Sort at all, because even with intelligent choice of pivots it still has Omega(n^2) worst case but unlike Insertion Sort it doesn't have any useful best cases. The "killer" cases can be constructed systematically, so if you're sorting "untrusted" data then some user could deliberately kill your performance, and anyway there might be some domain-specific reason why your data approximates to killer cases. If you choose random pivots then the probability of killer cases is negligible, so that's an option, but the usual approach is "IntroSort" - a QuickSort that detects bad cases and switches to HeapSort.
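A bare-bones Python sketch of the IntroSort idea follows: a quicksort that tracks its recursion depth and hands the current range to heapsort once the depth budget is used up. The budget of 2*log2(n), the middle-element pivot and the function names are assumptions for this example. The heapsort fallback here copies the range through `heapq` for brevity, whereas a real implementation would sort it in place.

```python
import heapq
import math


def heapsort_range(a, lo, hi):
    """Sort a[lo..hi] via a binary heap: O(n log n) even in the worst case."""
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    for k in range(lo, hi + 1):
        a[k] = heapq.heappop(heap)


def introsort(a, lo=0, hi=None, depth=None):
    """Quicksort that switches to heapsort when recursion gets too deep."""
    if hi is None:
        hi = len(a) - 1
    if depth is None:
        depth = 2 * max(1, int(math.log2(max(len(a), 2))))
    if lo >= hi:
        return a
    if depth == 0:
        # Too many bad partitions in a row: stop trusting quicksort.
        heapsort_range(a, lo, hi)
        return a
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    # Hoare-style partition around the pivot value.
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    introsort(a, lo, j, depth - 1)
    introsort(a, i, hi, depth - 1)
    return a


print(introsort([5, 2, 9, 1, 5, 6, 0, 3]))
# [0, 1, 2, 3, 5, 5, 6, 9]
```

Because the depth budget is logarithmic in n, a "killer" input can only waste O(n log n) work in quicksort before heapsort takes over.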
Radix Sort is a bit of an oddball. It's difficult to find common problems for which it is best, but it has good asymptotic limit for fixed-width data (O(n), where comparison sorts are Omega(n log n)). If your data is fixed-width, and the input is larger than the number of possible values (for example, more than 4 billion 32-bit integers) then there starts to be a chance that some variety of radix sort will perform well.
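For fixed-width data such as 32-bit unsigned integers, a least-significant-digit radix sort can be sketched in a few lines of Python. The function name and the choice of one byte per pass are assumptions for this example, and the input is assumed to contain only integers in the range 0 to 2**32 - 1.

```python
def radix_sort_u32(values):
    """LSD radix sort for integers in [0, 2**32), one byte per pass."""
    items = list(values)
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for x in items:
            buckets[(x >> shift) & 0xFF].append(x)
        # Concatenating in bucket order keeps each pass stable, which is
        # what lets later passes build on the earlier ones.
        items = [x for bucket in buckets for x in bucket]
    return items


print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The number of passes depends only on the key width (four here), not on n, which is where the O(n) bound for fixed-width data comes from.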
Merge Sort: when using extra space equal to the size of the array is not an issue
Bubble Sort: only on very small data sets
Quick Sort: when you want an in-place sort and a stable sort is not required
Insertion Sort: only on very small data sets, or if the array has a high probability of already being sorted
Selection Sort: only on very small data sets
Radix Sort: when the ratio of the range of values to the number of items is small (experimentation suggested)
Note that usually Merge or Quick sort implementations use Insertion sort for parts of the subroutine where the sub-array is very small.

Insertion sort better than Bubble sort?

I am doing my revision for the exam.
I would like to know under what conditions insertion sort performs better than bubble sort, given that both have the same average-case complexity of O(N^2).
I did find some related articles, but I can't understand them.
Would anyone mind explaining it in a simple way?
The advantage of bubble sort is the speed with which it detects an already sorted list:
Bubble sort best-case scenario: O(n)
However, even in this case insertion sort performs the same or better.
Bubble sort is, more or less, only good for understanding or teaching how a sorting algorithm works. It won't find proper use in programming these days, because its O(n²) complexity means that its efficiency decreases dramatically on lists of more than a small number of elements.
Following things came to my mind:
Bubble sort always needs one extra pass over the array to confirm that it is sorted. Insertion sort does not need this: once the last element has been inserted, the algorithm guarantees that the array is sorted.
Bubble sort does about n comparisons on every pass. Insertion sort does fewer: as soon as it finds the position where the current element belongs, it stops comparing and moves on to the next element.
Finally, a quote from the Wikipedia article:
Bubble sort also interacts poorly with modern CPU hardware. It
requires at least twice as many writes as insertion sort, twice as
many cache misses, and asymptotically more branch mispredictions.
Experiments by Astrachan sorting strings in Java show bubble sort to
be roughly 5 times slower than insertion sort and 40% slower than
selection sort
You can find a link to the original research paper there.
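To see the difference in comparison counts for yourself, here is a small Python sketch that instruments both algorithms. The bubble sort below already uses the usual optimisations (early exit and a shrinking range). The function names and the test input are invented for this example.

```python
def bubble_sort_comparisons(items):
    """Return (sorted list, comparison count) for bubble sort with early exit."""
    a = list(items)
    comparisons = 0
    end = len(a) - 1
    swapped = True
    while swapped and end > 0:
        swapped = False
        for i in range(end):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        end -= 1  # the largest remaining item is now in place
    return a, comparisons


def insertion_sort_comparisons(items):
    """Return (sorted list, comparison count) for insertion sort."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break  # found the insertion point, stop comparing
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, comparisons


nearly_sorted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(bubble_sort_comparisons(nearly_sorted)[1])     # 45
print(insertion_sort_comparisons(nearly_sorted)[1])  # 17
```

With one small element stuck at the end, bubble sort can only move it one position per pass, while insertion sort handles the first nine elements with a single comparison each.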
I guess the answer you're looking for is here:
Bubble sort may also be efficiently used on a list that is already
sorted except for a very small number of elements. For example, if
only one element is not in order, bubble sort will take only 2n time.
If two elements are not in order, bubble sort will take only at most
3n time...
and
Insertion sort is a simple sorting algorithm that is relatively
efficient for small lists and mostly sorted lists, and often is used
as part of more sophisticated algorithms
Could you provide links to the related articles you don't understand? I'm not sure what aspects they might be addressing. Other than that, there is a theoretical difference which might be that bubble sort is more suited for collections represented as arrays (than it is for those represented as linked lists), while insertion sort is suited for linked lists.
The reasoning would be that bubble sort always swaps two items at a time which is trivial on both, array and linked list (more efficient on arrays), while insertion sort inserts at a place in a given list which is trivial for linked lists but involves moving all subsequent elements in an array to the right.
That being said, take it with a grain of salt. First of all, sorting arrays is, in practice, almost always faster than sorting linked lists, simply because even a single scan of a linked list is already much slower than a single scan of an array. Apart from that, moving n elements of an array to the right is much faster than performing n (or even n/2) swaps. This is why other answers correctly claim insertion sort to be superior in general, and why I really wonder about the articles you read, because I fail to think of a simple way of saying this is better in cases A, and that is better in cases B.
In the worst case both tend to perform at O(n^2)
In the best case scenario, i.e., when the array is already sorted, Bubble sort can perform at O(n).

Resources