In what kind of test case does insertion sort perform better than selection sort? Clearly describe the test case.
Why does selection sort perform worse than insertion sort in that test case?
I answered the first question like this:
O(n²). When insertion sort is given a list, it takes the current element and inserts it at the appropriate position in the list, adjusting the list every time we insert. It is similar to arranging the cards in a card game.
And the second question:
Because Selection Sort always does n(n-1)/2 comparisons, but in the worst case it will only ever do n-1 swaps.
But I am not sure about my answers, any advice?
For a case where insertion sort is faster than selection sort, think about what happens if the input array is already sorted. In that case, insertion sort makes O(n) comparisons and no swaps. Selection sort always makes Θ(n²) comparisons and O(n) swaps, so given a reasonably large sorted input array insertion sort will outperform selection sort.
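To make the best case concrete, here is a minimal insertion sort sketch (my illustration, not the answerer's code; Python for brevity). On already-sorted input the inner loop's condition fails immediately for every element, so the whole run costs only n - 1 comparisons and no shifts:

    def insertion_sort(a):
        # In-place insertion sort. On already-sorted input the while
        # loop below never executes, so each element costs one comparison.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:   # shift larger elements right
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a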
For a case where selection sort outperforms insertion sort, we need to be a bit tricky. Insertion sort's worst case is when the input array is reverse sorted. In this case, it makes Θ(n²) comparisons and Θ(n²) swaps. Compare this to selection sort, which always makes Θ(n²) comparisons and O(n) swaps. If compares and swaps take roughly the same amount of time, then selection sort might be faster than insertion sort, but you'd have to actually profile it to find out. Really, it boils down to the implementation. A good implementation of insertion sort might actually beat a bad implementation of selection sort in this case.
As a tricky solution, try making a custom data type where compares are cheap but swaps take a really, really long time to complete. That should work for your second case.
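If you want to see those operation counts rather than take them on faith, a rough counting harness like the sketch below (the function names and the move/swap accounting are mine, for illustration) shows the claims directly: on reverse-sorted input both sorts make about n²/2 comparisons, but insertion sort also makes about n²/2 element moves while selection sort does at most n - 1 swaps.

    def insertion_ops(a):
        a = a[:]
        comparisons = moves = 0
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0:
                comparisons += 1
                if a[j] <= key:
                    break
                a[j + 1] = a[j]      # one element move per shift
                moves += 1
                j -= 1
            a[j + 1] = key
        return comparisons, moves

    def selection_ops(a):
        a = a[:]
        comparisons = swaps = 0
        for i in range(len(a) - 1):
            m = i
            for j in range(i + 1, len(a)):
                comparisons += 1
                if a[j] < a[m]:
                    m = j
            if m != i:
                a[i], a[m] = a[m], a[i]
                swaps += 1
        return comparisons, swaps

    data = list(range(1000, 0, -1))   # reverse-sorted worst case
    print(insertion_ops(data))        # ~n^2/2 comparisons and ~n^2/2 moves
    print(selection_ops(data))        # ~n^2/2 comparisons, at most n-1 swaps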
Quicksort is better than mergesort in many cases. But when might mergesort be better than quicksort?
For example, mergesort works better when all data cannot be loaded to memory at once. Are there any other cases?
Answers to the suggested duplicate question list advantages of using quicksort over mergesort. I'm asking about the possible cases and applications where mergesort would be better than quicksort.
Both quicksort and mergesort can work just fine if you can't fit all the data into memory at once. You can implement quicksort by choosing a pivot, then streaming elements in from disk into memory and writing each element into one of two different files based on how it compares to the pivot. If you use a double-ended priority queue, you can actually do this even more efficiently by fitting as many elements as possible into memory at once.
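A rough sketch of that partition-to-disk step (the file layout, one integer per line, is my assumption; a real external sort would buffer I/O and handle the recursion bookkeeping much more carefully):

    def partition_file(in_path, lo_path, hi_path, pivot):
        # One streaming pass of external quicksort: split the input file
        # into a "below or equal" file and an "above" file around the pivot,
        # holding only one element in memory at a time.
        with open(in_path) as src, \
             open(lo_path, "w") as lo, \
             open(hi_path, "w") as hi:
            for line in src:
                x = int(line)
                (lo if x <= pivot else hi).write(f"{x}\n")

You then recurse on each output file until a chunk fits in memory, at which point you sort it in place.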
Mergesort is worst-case O(n log n). That said, you can easily modify quicksort to produce the introsort algorithm, a hybrid between quicksort, insertion sort, and heapsort, that's worst-case O(n log n) but retains the speed of quicksort in most cases.
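A toy sketch of the introsort idea (this is not the actual implementation used by any standard library; it allocates lists for clarity, and a production version would also cut over to insertion sort on tiny ranges): quicksort until a depth budget runs out, then fall back to heapsort so the worst case stays O(n log n).

    import heapq

    def introsort(a, depth=None):
        if depth is None:
            depth = 2 * max(1, len(a)).bit_length()   # depth budget ~2 log2 n
        if len(a) <= 1:
            return list(a)
        if depth == 0:
            # Depth budget exhausted: heapsort guarantees O(n log n).
            heap = list(a)
            heapq.heapify(heap)
            return [heapq.heappop(heap) for _ in range(len(heap))]
        pivot = a[len(a) // 2]
        return (introsort([x for x in a if x < pivot], depth - 1)
                + [x for x in a if x == pivot]
                + introsort([x for x in a if x > pivot], depth - 1))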
It might be helpful to see why quicksort is usually faster than mergesort, since if you understand the reasons you can pretty quickly find some cases where mergesort is a clear winner. Quicksort usually is better than mergesort for two reasons:
Quicksort has better locality of reference than mergesort, which means that the accesses performed in quicksort are usually faster than the corresponding accesses in mergesort.
Quicksort uses worst-case O(log n) memory (if implemented correctly), while mergesort requires O(n) memory due to the overhead of merging.
There's one scenario, though, where these advantages disappear. Suppose you want to sort a linked list of elements. The linked list elements are scattered throughout memory, so advantage (1) disappears (there's no locality of reference). Second, linked lists can be merged with only O(1) space overhead instead of O(n) space overhead, so advantage (2) disappears. Consequently, you usually will find that mergesort is a superior algorithm for sorting linked lists, since it makes fewer total comparisons and isn't susceptible to a poor pivot choice.
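To see why advantage (2) disappears, here is a sketch of merging two already-sorted linked lists purely by relinking nodes (Python, with a minimal hypothetical Node class): no auxiliary array is allocated, so the merge step needs only O(1) extra space.

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def merge(a, b):
        # Merge two sorted lists by splicing nodes onto a tail pointer.
        # Using <= keeps the merge stable.
        dummy = tail = Node(None)
        while a and b:
            if a.value <= b.value:
                tail.next, a = a, a.next
            else:
                tail.next, b = b, b.next
            tail = tail.next
        tail.next = a or b
        return dummy.next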
The single most important advantage of merge sort over quicksort is its stability: elements that compare equal retain their original order.
MergeSort is stable by design, equal elements keep their original order.
MergeSort is well suited to parallel implementation (multithreading).
MergeSort makes about 30% fewer comparisons than QuickSort. This is an often overlooked advantage, because a comparison can be quite expensive (e.g. when comparing several fields of database rows).
Quicksort is average case O(n log n), but has a worst case of O(n^2). Mergesort is always O(n log n). Besides the asymptotic worst case and the memory-loading of mergesort, I can't think of another reason.
Scenarios when a naive quicksort (e.g. first-element pivot, plain two-way partitioning) is worse than mergesort:
Array is already sorted.
All elements in the array are the same.
Array is sorted in reverse order.
Take mergesort over quicksort if you don't know anything about the data.
Merge sort has a guaranteed upper limit of O(N log₂ N). Quicksort has such a limit too, but it is much higher: it is O(N²). When you need a guaranteed upper bound on the timing of your code, use merge sort over quicksort.
For example, if you write code for a real-time system that relies on sorting, merge sort would be a better choice.
Merge sort's worst-case complexity is O(n log n), whereas quicksort's worst case is O(n²).
Merge sort is a stable sort, which means that equal elements in an array maintain their original order with respect to each other.
Isn't insertion sort O(n²) and quicksort O(n log n)? So even for a small n, won't the relation be the same?
Big-O Notation describes the limiting behavior when n is large, also known as asymptotic behavior. This is an approximation. (See http://en.wikipedia.org/wiki/Big_O_notation)
Insertion sort is faster for small n because quicksort has extra overhead from the recursive function calls. Insertion sort is also stable, unlike typical quicksort implementations, and requires less memory.
This question describes some further benefits of insertion sort. (Is there ever a good reason to use Insertion Sort?)
Define "small".
When benchmarking sorting algorithms, I found out that switching from quicksort to insertion sort, despite what everybody was saying, actually hurts performance (recursive quicksort in C) for arrays larger than 4 elements. And arrays that small can be sorted with a size-dependent optimal sorting algorithm.
That being said, always keep in mind that O(n...) only counts the number of comparisons (in this specific case), not the speed of the algorithm. The speed depends on the implementation, e.g., whether or not your quicksort function is recursive and how quickly function calls are dealt with.
Last but not least, big-O notation is only an upper bound.
If algorithm A requires 10000·n·log n comparisons and algorithm B requires 10·n², the first is O(n log n) and the second is O(n²). Nevertheless, for moderate n the second will (probably) be faster.
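You can even compute where the crossover sits for those (made-up) constants; a quick sketch shows the quadratic algorithm wins for every n up to roughly 13,700:

    import math

    # Smallest n at which 10000*n*log2(n) finally beats 10*n^2.
    n = 2
    while 10 * n * n < 10000 * n * math.log2(n):
        n += 1
    print(n)   # ~13,747: below this, the O(n^2) algorithm is cheaper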
O()-notation is typically used to characterize performance for large problems, while deliberately ignoring constant factors and additive offsets to performance.
This is important because constant factors and overhead can vary greatly between processors and between implementations: the performance you get for a single-threaded Basic program on a 6502 machine will be very different from the same algorithm implemented as a C program running on an Intel i7-class processor. Note that implementation optimization is also a factor: attention to detail can often get you a major performance boost, even if all other factors are the same!
However, the constant factor and overhead are still important. If your application ensures that N never gets very large, the asymptotic behavior of O(N^2) vs. O(N log N) doesn't come into play.
Insertion sort is simple and, for small lists, it is generally faster than a comparably implemented quicksort or mergesort. That is why a practical sort implementation will generally fall back on something like insertion sort for the "base case", instead of recursing all the way down to single elements.
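A minimal sketch of that pattern (the cutoff value and names are illustrative; real libraries tune the threshold empirically):

    CUTOFF = 16   # assumed small-range threshold

    def hybrid_quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            # Base case: insertion sort the small range in place.
            for i in range(lo + 1, hi + 1):
                key, j = a[i], i - 1
                while j >= lo and a[j] > key:
                    a[j + 1] = a[j]
                    j -= 1
                a[j + 1] = key
            return
        # Lomuto partition around the last element (simplest choice).
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        hybrid_quicksort(a, lo, i - 1)
        hybrid_quicksort(a, i + 1, hi)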
It's a matter of the constants attached to the running time, which we ignore in big-O notation (because we are concerned with the order of growth). For insertion sort, the running time is O(n²), i.e. T(n) ≤ c·n², whereas for quicksort it is T(n) ≤ k·n·lg n. As c is quite small, for small n the running time of insertion sort is less than that of quicksort.
Hope it helps...
A good real-world example of insertion sort used in conjunction with quicksort is the implementation of the qsort function in glibc.
The first thing to point out is that qsort implements the quicksort algorithm with an explicit stack (built via macro directives) because that consumes less memory than recursion.
Summary of the current implementation from the source code (you'll find a lot of useful information in the comments if you take a look at it):
Non-recursive
Chose the pivot element using a median-of-three decision tree
Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leaving insertion sort to order the MAX_THRESH items within each partition. This is a big win, since insertion sort is faster for small, mostly sorted array segments.
The larger of the two sub-partitions is always pushed onto the stack first
What does the MAX_THRESH value stand for? Well, it is just a small magic constant that was chosen to work best on a Sun 4/260.
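For reference, the median-of-three decision tree it mentions boils down to something like this (a sketch of the idea, not the glibc code itself):

    def median_of_three(a, lo, mid, hi):
        # Return the index of the median of a[lo], a[mid], a[hi].
        # Using it as the pivot avoids the worst case on (reverse-)sorted input.
        x, y, z = a[lo], a[mid], a[hi]
        if x <= y:
            if y <= z:
                return mid                   # x <= y <= z
            return hi if x <= z else lo      # x <= z < y  or  z < x <= y
        if x <= z:
            return lo                        # y < x <= z
        return hi if y <= z else mid         # y <= z < x  or  z < y <= x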
How about binary insertion sort? You can absolutely find the position to insert at by using binary search.
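That works and cuts the comparison count to O(n log n), but note that the element shifts are still O(n²), so the asymptotic running time doesn't improve. A sketch using Python's bisect module (bisect_right keeps the sort stable):

    import bisect

    def binary_insertion_sort(a):
        for i in range(1, len(a)):
            key = a[i]
            pos = bisect.bisect_right(a, key, 0, i)  # binary search in a[0:i]
            a[pos + 1:i + 1] = a[pos:i]              # shift the tail right
            a[pos] = key
        return a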
Is there an array sort that works in O(n log n) worst-case time complexity?
I saw on Wikipedia that there are sorts like that, but they are unstable; what does that mean? And is there a way to do it with low space complexity?
Is there a best sorting algorithm?
An algorithm that requires only O(1) extra memory (so modifying the input array is permitted) is generally described as "in-place", and that's the lowest space complexity there is.
A sort is described as "stable" or not, according to what happens when there are two elements in the input which compare as equal, but are somehow distinguishable. For example, suppose you have a bunch of records with an integer field and a string field, and you sort them on the integer field. The question is, if two records have the same integer value but different string values, then will the one that came first in the input, also come first in the output, or is it possible that they will be reversed? A stable sort is one that guarantees to preserve the order of elements that compare the same, but aren't identical.
It is difficult to make a comparison sort that is in-place, and stable, and achieves O(n log n) worst-case time complexity. I've a vague idea that it's unknown whether or not it's possible, but I don't keep up to date on it.
Last time someone asked about the subject, I found a couple of relevant papers, although that question wasn't identical to this question:
How to sort in-place using the merge sort algorithm?
As far as a "best" sort is concerned - some sorting strategies take advantage of the fact that on the whole, taken across a large number of applications, computers spend a lot of time sorting data that isn't randomly shuffled, it has some structure to it. Timsort is an algorithm to take advantage of commonly-encountered structure. It performs very well in a lot of practical applications. You can't describe it as a "best" sort, since it's a heuristic that appears to do well in practice, rather than being a strict improvement on previous algorithms. But it's the "best" known overall in the opinion of people who ship it as their default sort (Python, Java 7, Android). You probably wouldn't describe it as "low space complexity", though, it's no better than a standard merge sort.
You can compare mergesort, quicksort, and heapsort, which are all nicely described here.
There is also radix sort, whose complexity is O(kN), but it takes full advantage of extra memory consumption.
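For illustration, a least-significant-digit radix sort over non-negative integers might look like this (one byte per pass; my sketch, assuming integer keys):

    def radix_sort(a):
        # O(k*n) for k-byte keys; trades O(n) extra memory for speed.
        shift = 0
        while a and max(a) >> shift:
            buckets = [[] for _ in range(256)]
            for x in a:
                buckets[(x >> shift) & 0xFF].append(x)
            a = [x for bucket in buckets for x in bucket]   # stable per pass
            shift += 8
        return a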
You can also see that for smaller collections quicksort is faster, but then mergesort takes the lead. All of this is case-specific, so take your time to study all four algorithms.
For the question of the best algorithm, the simple answer is: it depends. It depends on the size of the data set you want to sort and on your requirements. Say, bubble sort has worst-case and average complexity both O(n²), where n is the number of items being sorted. There exist many sorting algorithms with the substantially better worst-case or average complexity of O(n log n). Even other O(n²) sorting algorithms, such as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.
Among simple average-case Θ(n²) algorithms, selection sort almost always outperforms bubble sort, but is generally outperformed by insertion sort.
Selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays.
Likewise, you can yourself select the best sorting algorithm according to your requirements.
It is proven that Ω(n log n) is the lower bound for comparison-based sorting of generic items. It is also proven that Ω(n) is the lower bound for sorting integers (you need at least to read the input :) ).
The specific instance of the problem will determine the best algorithm for your needs; i.e., sorting 1M strings is different from sorting 2M 7-bit integers in 2MB of RAM.
Also consider that besides the asymptotic runtime complexity, the implementation is making a lot of difference, as well as the amount of available memory and caching policy.
I could implement quicksort in one line in Python, roughly keeping O(n log n) complexity (with some caveat about the pivot), but Big-O notation says nothing about the constant terms, which are relevant too (i.e., this is ~30x slower than Python's built-in sort, which is likely written in C, by the way):
qsort = lambda a: [] if not a else qsort([x for x in a if x < a[len(a)//2]]) + [x for x in a if x == a[len(a)//2]] + qsort([x for x in a if x > a[len(a)//2]])
For a discussion about stable/unstable sorting, look here http://www.developerfusion.com/article/3824/a-guide-to-sorting/6/.
You may want to get yourself a good algorithms book (e.g. Cormen or Skiena).
Heapsort, maybe randomized quicksort
stable sort
As others already mentioned: no, there isn't. For example, you might want to parallelize your sorting algorithm, which leads to totally different sorting algorithms.
Regarding the part of your question about what "stable" means, consider the following: we have a class of children associated with ages:
Phil, 10
Hans, 10
Eva, 9
Anna, 9
Emil, 8
Jonas, 10
Now, we want to sort the children in order of ascending age (and nothing else). Then we see that Phil, Hans and Jonas all have age 10, so it is not clear in which order we have to put them, since we sort just by age.
Now comes stability: if we sort stably, we keep Phil, Hans and Jonas in the order they were in before, i.e. we put Phil first, then Hans, and at last Jonas (simply because they were in this order in the original sequence and we only consider age as the comparison criterion). Similarly, we have to put Eva before Anna (both the same age, but in the original sequence Eva was before Anna).
So, the result is:
Emil, 8
Eva, 9
Anna, 9
Phil, 10 \
Hans, 10 | all aged 10, and left in original order.
Jonas, 10 /
To put it in a nutshell: Stability means that if two elements are equal (w.r.t. the chosen sorting criterion), the one coming first in the original sequence still comes first in the resulting sequence.
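You can check this with Python's built-in sorted, which is stable (it is Timsort, mentioned above); sorting the class only by age reproduces exactly the ordering shown:

    children = [("Phil", 10), ("Hans", 10), ("Eva", 9),
                ("Anna", 9), ("Emil", 8), ("Jonas", 10)]

    # Ties on age keep their original order because sorted() is stable.
    print(sorted(children, key=lambda child: child[1]))
    # [('Emil', 8), ('Eva', 9), ('Anna', 9), ('Phil', 10), ('Hans', 10), ('Jonas', 10)]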
Note that you can easily transform any sorting algorithm into a stable sorting algorithm: If your original sequence holds n elements: e1, e2, e3, ..., en, you simply attach a counter to each one: (e1, 0), (e2, 1), (e3, 2), ..., (en, n-1). This means you store for each element its original position.
If now two elements are equal, you simply compare their counters and put the one with the lower counter value first. This increases runtime (and memory) by O(n), which is asymptotically no worse, since the best comparison-based sorting algorithms already need O(n lg n).
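A sketch of that decorate-with-an-index trick in Python (the wrapper name and the sort parameter, any in-place comparison sort, are mine for illustration):

    def stabilize(sort, a, key=lambda x: x):
        # Pair each element with its original position; ties on the key
        # are then broken by that position, which makes the sort stable.
        decorated = [(key(x), i, x) for i, x in enumerate(a)]
        sort(decorated)               # tuples compare by (key, index) first
        return [x for _, _, x in decorated]

    # Works with any in-place comparison sort, e.g. the built-in list.sort:
    print(stabilize(list.sort, ["bb", "a", "dd", "cc"], key=len))
    # ['a', 'bb', 'dd', 'cc']  -- the length-2 strings keep their input order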
I'm led to believe that quicksort should be faster than insertion sort on a medium-size unordered int array. I've implemented both algorithms in Java and I notice quicksort is significantly slower than insertion sort.
I have a theory: quicksort is being slower because it's recursive and the call it makes to its own method is quite slow in the JVM, which is why my timer is giving much higher readings than I expected, whereas insertion sort isn't recursive and all the work is done within one method, so the JVM isn't having to do any extra grunt work? Am I right?
You may be interested in these Sorting Algorithm Animations.
Probably not, unless your recursive method is making big allocations. It's more likely that there's a quirk in your code or your data set is small.
The JVM shouldn't have any trouble with recursive calls.
Unless you've hit one of Quicksort's pathological cases (often, a list that is already sorted), Quicksort should be O(n log n) — substantially faster than insertion sort's O(n^2) as n increases.
You may want to use merge sort or heap sort instead; they don't have pathological cases. They are both O(n log n).
(When I did these long ago in C++, quicksort was faster than insertion sort with fairly small ns. Radix sort is notably faster with mid-size ns as well.)
Theoretically, quicksort should work faster than insertion sort for random data of medium to large size.
I guess the differences should be in the way QS is implemented:
Pivot selection for the given data (median-of-three is a better approach)?
Are you using the same swap mechanism for QS and insertion sort?
Is the input random enough? I.e., if you have clusters of ordered data, performance will suffer.
I did this exercise in C and results are in accordance with theory.
Actually, for small values of n, insertion sort is better than quicksort, because for small n the time depends more on the constant factors than on n² or n log n.
The fastest implementations of quicksort use looping instead of recursion. Recursion typically isn't very fast.
You have to be careful how you make the recursive calls, and because it's Java, you can't rely on tail calls being optimized, so you should probably manage your own stack for the recursion.
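A sketch of that explicit-stack approach (in Python for brevity, but the same structure carries over to Java): always push the larger sub-partition and process the smaller one next, so the stack never grows beyond O(log n) entries.

    def quicksort_iterative(a):
        stack = [(0, len(a) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot, i = a[hi], lo          # Lomuto partition of a[lo..hi]
            for j in range(lo, hi):
                if a[j] < pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            # Push the larger side; the smaller side is popped first.
            if i - lo > hi - i:
                stack.append((lo, i - 1))
                stack.append((i + 1, hi))
            else:
                stack.append((i + 1, hi))
                stack.append((lo, i - 1))
        return a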
Everything that is available to be known about quicksort vs insertion sort can be found in Bob Sedgewick's doctoral dissertation. The boiled-down version can be found in his algorithms textbooks.
I remember that in school, when we did sorting in Java, we would actually do a hybrid of the two. So for recursive algorithms like quicksort and mergesort, we would actually do insertion sort for segments that were very small, say 10 records or so.
Recursion is slow, so use it with care. And as was noted before, if you can figure a way to implement the same algorithm in an iterative fashion, then do that.
There are three things to consider here. First, insertion sort is much faster (O(n) vs O(n log n)) than quicksort IF the data set is already sorted, or nearly so. Second, if the data set is very small, the 'start-up time' to set up the quicksort, find a pivot point and so on, dominates the rest. And third, quicksort is a little subtle; you may want to re-read the code after a night's sleep.
How are you choosing your pivot in Quicksort?
This simple fact is the key to your question, and probably why Quicksort is running slower. In cases like this it's a good idea to post at least the important sections of your code if you're looking for some real help.