Is selection sort an efficient algorithm? - sorting

I know it's a quadratic-time algorithm, but how does it compare to other sorting algorithms, such as QuickSort or Bubble Sort?

Sorting algorithms generally vary based on the nature of data you have.
However, while bubble sort and selection sort are easy to understand (and implement), their running time is O(n^2), i.e. about the worst you will see from any commonly used sorting algorithm.
As far as quicksort is concerned, on average it takes O(n log n) time, so it is an excellent sort. However, it too can take O(n^2) time in certain cases.
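For illustration, here is a minimal selection sort sketch in Java (class and variable names are my own); it performs roughly n^2/2 comparisons no matter how the input is ordered:

    import java.util.Arrays;

    public class SelectionSortDemo {
        // Selection sort: repeatedly find the minimum of the unsorted suffix
        // and swap it into position i. The nested loops give O(n^2) time.
        static void selectionSort(int[] a) {
            for (int i = 0; i < a.length - 1; i++) {
                int min = i;
                for (int j = i + 1; j < a.length; j++) {
                    if (a[j] < a[min]) {
                        min = j;
                    }
                }
                int tmp = a[i];
                a[i] = a[min];
                a[min] = tmp;
            }
        }

        public static void main(String[] args) {
            int[] data = {5, 2, 9, 1, 7};
            selectionSort(data);
            System.out.println(Arrays.toString(data)); // [1, 2, 5, 7, 9]
        }
    }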

Quadratic-time algorithms, depending on the size of your data set, can be unbelievably slow.
Take n = 10^79 (about the number of atoms in the universe).
For a quadratic algorithm, that's n * 10^79 operations. For an n log(n) algorithm, like quicksort or mergesort, it's only about n * 262 operations. That's a huge difference.
But if your dataset is relatively small (< 1000 items, say), then the performance difference probably isn't going to be noticeable (unless, perhaps, the sort is being done repeatedly). In these cases it's usually best to use the simplest algorithm, and optimize later if it turns out to be too slow.
"Premature optimization is the root of all evil."
-Sir Tony Hoare, popularized by Donald Knuth

Wikipedia knows all.
Selection sort pretty much sucks.

If the data you have consists only of positive integers, you may want to look at Bucket Sort. The algorithm can have a linear running time, O(n), under the right conditions.
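A hedged sketch of that idea in Java, assuming the values are non-negative and bounded by a known, reasonably small maxValue (this is really a counting-sort flavour of bucket sort):

    // Counting-sort-style bucket sort for non-negative ints bounded by maxValue.
    // Runs in O(n + maxValue) time and uses O(maxValue) extra space, so it is
    // only worthwhile when maxValue is not much larger than n.
    static void bucketSort(int[] a, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int value : a) {
            counts[value]++;              // count how often each value occurs
        }
        int pos = 0;
        for (int value = 0; value <= maxValue; value++) {
            for (int c = 0; c < counts[value]; c++) {
                a[pos++] = value;         // write the values back in sorted order
            }
        }
    }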

Related

Why do we always use quicksort, or any specific sorting algorithm?

Why do we always use quicksort, or any specific sorting algorithm?
I tried an experiment on my PC using quicksort, merge sort, heap sort, and flash sort.
Results:
sorting algorithm : time in nanoseconds -> time in minutes
quick sort time : 135057597441 -> 2.25095995735
flash sort time : 137704213630 -> 2.29507022716667
merge sort time : 138317794813 -> 2.30529658021667
heap sort time  : 148662032992 -> 2.47770054986667
using Java's built-in function
long startTime = System.nanoTime();
The times given are in nanoseconds; converted to seconds, there is hardly any difference between them for 20,000,000 random integers (and the maximum array size in Java is 2147483647). Even with an in-place algorithm, the difference may only grow to 1 or 2 minutes at the maximum array size.
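For reference, a minimal timing harness along those lines might look like the sketch below; Arrays.sort and the array size are only illustrative choices, and a single run without JVM warm-up gives only a rough measurement:

    import java.util.Arrays;
    import java.util.Random;

    public class SortTiming {
        public static void main(String[] args) {
            int n = 20_000_000;                       // same order of magnitude as above
            int[] data = new Random().ints(n).toArray();

            long startTime = System.nanoTime();
            Arrays.sort(data);                        // dual-pivot quicksort for int[]
            long elapsedNs = System.nanoTime() - startTime;

            System.out.println("sorted " + n + " ints in " + elapsedNs / 1e9 + " s");
        }
    }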
If the difference is so small, why should we care?
All of the algorithms presented have similar average-case bounds of O(n lg n), which is the "best" a comparison sort can do.
Since they share the same average bounds, the expected performance of these algorithms over random data should be similar - which is what the findings show. However, the devil is in the details. Here is a very quick summary; follow the links for further details.
Quicksort is generally not stable (but there are stable variations). While quicksort has an average bound of O(n lg n), it has a worst-case bound of O(n^2), though there are ways to mitigate this. Quicksort, like heapsort, is done in-place.
Merge-sort is a stable sort. Mergesort has a worst-case bound of O(n lg n), which means it has predictable performance. Base merge-sort requires O(n) extra space, so it's generally not an in-place sort (although there is an in-place variant, and the memory for a linked list implementation is constant).
Heapsort is not stable; it also has a worst-case bound of O(n lg n), but has the benefit of constant space bounds and being in-place. It has worse cache and parallelism characteristics than merge-sort.
Exactly which one is "best" depends upon the use-case, data, and exact implementation/variant.
Merge-sort (or a hybrid such as Timsort) is the "default" sort implementation in many libraries/languages. A common Quicksort-based hybrid, Introsort, is used in several C++ standard library implementations. Vanilla/plain Quicksort implementations, should they be provided, are usually secondary implementations.
Merge-sort: a stable sort with consistent performance and acceptable memory bounds.
Quicksort/heapsort: trivially work in-place and [effectively] don't require additional memory.
We rarely need to sort integer data. One of the biggest overheads in a sort is the time it takes to make comparisons. Quicksort reduces the number of comparisons required compared with, say, a bubble sort. If you're sorting strings this is much more significant. As a real-world example, some years ago I wrote a sort/merge that took 40 minutes with a bubble sort and 17 with a quicksort. (It was on a Z80 CPU, a long time ago; I'd expect much better performance now.)
Your conclusion is correct: in most situations, most people who worry about this are wasting their time. Differences between these algorithms in terms of time and memory complexity become significant only in particular scenarios where:
you have a huge number of elements to sort
performance is really critical (for example: real-time systems)
resources are really limited (for example: embedded systems)
(please note the really)
Also, there is the concern of stability, which may matter more often. Most standard libraries provide stable sort algorithms (for example: OrderBy in C#, std::stable_sort in C++, sort in Python, and the object sort methods in Java).
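In Java specifically, the object overloads of Arrays.sort (and List.sort) are specified to be stable, so equal keys keep their original relative order; a small sketch (Person is just an illustrative type, and records need Java 16+):

    import java.util.Arrays;
    import java.util.Comparator;

    public class StableSortDemo {
        record Person(String name, int age) {}       // illustrative type; Java 16+

        public static void main(String[] args) {
            Person[] people = {
                new Person("Alice", 30),
                new Person("Bob", 25),
                new Person("Carol", 30)
            };
            // Arrays.sort on objects is a stable TimSort variant, so Alice is
            // guaranteed to stay ahead of Carol after sorting by age.
            Arrays.sort(people, Comparator.comparingInt(Person::age));
            System.out.println(Arrays.toString(people));
        }
    }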
Correctness. While switching between sort algorithms might offer speed-ups under some specific scenarios, the cost of proving that algorithms work can be quite high.
For instance, TimSort, a popular sorting algorithm used by Android, Java, and Python, had an implementation bug that went unnoticed for years. This bug could cause a crash and was easily induced by the user.
It took a dedicated team "looking for a challenge" to isolate and solve the issue.
For this reason, any time a standard implementation of a data structure or algorithm is available, I will use that standard implementation. The time saved by using a smarter implementation is rarely worth uncertainty about the implementation's security and correctness.

What sort algorithm provides the best worst-case performance?

What is the fastest known sort algorithm for absolute worst case? I don't care about best case and am assuming a gigantic data set if that even matters.
Make sure you have seen this:
Visualizing sort algorithms - it helped me decide which sorting algorithm to use.
It depends on the data. For example, for integers (or anything that can be expressed as an integer) the fastest is radix sort, which for fixed-length values has a worst-case complexity of O(n). The best general comparison sort algorithms have a complexity of O(n log n).
If you are using binary comparisons, the best possible sort algorithm takes O(N log N) comparisons to complete. If you're looking for something with good worst case performance, I'd look at MergeSort and HeapSort since they are O(N log N) algorithms in all cases.
HeapSort is nice if all your data fits in memory, while MergeSort allows you to do on-disk sorts better (but takes more space overall).
There are other less-well-known algorithms mentioned on the Wikipedia sorting algorithm page that all have O(n log n) worst case performance. (based on comment from mmyers)
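For reference, a hedged sketch of an in-place heapsort of the kind mentioned above (method names are mine): O(n log n) in every case, O(1) extra space, not stable.

    // Heapsort: build a max-heap in the array, then repeatedly swap the
    // maximum to the end of the shrinking heap.
    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n);                    // build the heap bottom-up
        }
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end);                  // restore the heap property
        }
    }

    static void siftDown(int[] a, int root, int size) {
        while (2 * root + 1 < size) {
            int child = 2 * root + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;                          // pick the larger child
            }
            if (a[root] >= a[child]) {
                return;                           // heap property already holds
            }
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }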
For the man with limitless budget
Facetious but correct:
Sorting networks trade space (in real hardware terms) for better than O(n log n) sorting!
Without resorting to such hardware (which is unlikely to be available), the best comparison sorts have a lower bound of O(n log n).
O(n log n) worst case performance (no particular order)
Binary Tree Sort
Merge Sort
Heap Sort
Smooth Sort
Intro Sort
Beating the n log n
If your data is amenable to it, you can beat the n log n restriction, but then you care about the number of bits in the input data as well.
Radix and Bucket are probably the best known examples of this. Without more information about your particular requirements it is not fruitful to consider these in more depth.
Quicksort is usually the fastest, but if you want good worst-case time, try Heapsort or Mergesort. These both have O(n log n) worst time performance.
If you have a gigantic data set (ie much larger than available memory) you likely have your data on disk/tape/something-with-expensive-random-access, so you need an external sort.
Merge sort works well in that case; unlike most other sorts it doesn't involve random reads/writes.
It is largely related to the size of your dataset and whether or not the set is already ordered (or what order it is currently in).
Entire books are written on search/sort algorithms. You aren't going to find an "absolute fastest" assuming a worst case scenario because different sorts have different worst-case situations.
If you have a sufficiently huge data set, you're probably looking at sorting individual bins of data, then using merge-sort to merge those bins. But at this point, we're talking data sets huge enough to be VASTLY larger than main memory.
I guess the most correct answer would be "it depends".
It depends both on the type of data and the type of resources. For example, there are parallel algorithms that beat Quicksort, but given how you asked the question it's unlikely you have access to them. There are times when the "worst case" for one algorithm is the "best case" for another (nearly sorted data is problematic with Quick and Merge, but fast with much simpler techniques).
It depends on the size of the input n, as expressed in big-O notation.
Here is a list of sorting algorithms' best and worst cases for you to compare.
My preference is the two-way merge sort.
Assuming randomly sorted data, quicksort.
O(n log n) average case, O(n^2) in the worst case, but that requires highly non-random data.
You might want to describe your data set characteristics.
See Quick Sort Vs Merge Sort for a comparison of Quicksort and Mergesort, which are two of the better algorithms in most cases.
It all depends on the data you're trying to sort. Different algorithms have different speeds for different data. An O(n) algorithm may be slower than an O(n^2) algorithm, depending on what kind of data you're working with.
I've always preferred merge sort, as it's stable (meaning that if two elements are equal from a sorting perspective, then their relative order is explicitly preserved), but quicksort is good as well.
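A hedged top-down merge sort sketch (helper names are mine), showing where the stability and the O(n) auxiliary buffer discussed in this thread come from:

    // Top-down merge sort: stable, O(n log n) in the worst case, O(n) extra
    // space for the auxiliary buffer.
    static void mergeSort(int[] a) {
        sort(a, new int[a.length], 0, a.length);
    }

    static void sort(int[] a, int[] buf, int lo, int hi) {   // sorts a[lo, hi)
        if (hi - lo < 2) return;
        int mid = (lo + hi) >>> 1;
        sort(a, buf, lo, mid);
        sort(a, buf, mid, hi);
        merge(a, buf, lo, mid, hi);
    }

    static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            // Taking from the left half on ties keeps equal elements in their
            // original order; this is exactly what makes the sort stable.
            buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) buf[k++] = a[i++];
        while (j < hi)  buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }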
On a standard sequential machine, the lowest worst-case upper bound is O(n log n), which merge sort achieves; quicksort might still be better on some data sets.
You can't go lower than O(n log n) with comparisons alone unless you're using special hardware (e.g. a hardware-supported bead sort) or a non-comparison sort.
On the importance of specifying your problem: radix sort might be the fastest, but it's only usable when your data has fixed-length keys that can be broken down into independent small pieces. That limits its usefulness in the general case, and explains why more people haven't heard of it.
http://en.wikipedia.org/wiki/Radix_sort
P.S. This is an O(k*n) algorithm, where k is the size of the key.
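A hedged LSD radix sort sketch in Java for non-negative ints, doing one counting pass per 8-bit digit (so k = 4 passes for 32-bit keys):

    // LSD radix sort: k stable counting passes, one per 8-bit digit,
    // for O(k * n) total time. Assumes all values are non-negative.
    static void radixSort(int[] a) {
        int[] out = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] counts = new int[257];
            for (int v : a) {
                counts[((v >>> shift) & 0xFF) + 1]++;          // histogram of digits
            }
            for (int d = 0; d < 256; d++) {
                counts[d + 1] += counts[d];                     // starting offsets
            }
            for (int v : a) {
                out[counts[(v >>> shift) & 0xFF]++] = v;        // stable scatter
            }
            System.arraycopy(out, 0, a, 0, a.length);
        }
    }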

Why is quicksort used in practice? [closed]

Quicksort has a worst-case performance of O(n^2), but it is still widely used in practice anyway. Why is this?
You shouldn't focus only on the worst case, and only on time complexity. It's more about average than worst, and it's about time and space.
Quicksort:
has average time complexity of Θ(n log n);
can be implemented with space complexity of Θ(log n);
Also take into account that big-O notation doesn't include any constants, but in practice it does make a difference if an algorithm is a few times faster. Θ(n log n) means that the algorithm executes in about K*n*log(n) steps, where K is a constant. Quicksort is the comparison-sort algorithm with the lowest K.
The average asymptotic order of QuickSort is O(n log n), and it's usually more efficient than heapsort due to smaller constants (tighter loops). In fact, there is a theoretical linear-time median selection algorithm that you could use to always find the best pivot, resulting in a worst case of O(n log n). However, normal QuickSort is usually faster than this theoretical variant.
To make it more concrete, consider the probability that QuickSort will finish in O(n^2). It's roughly 1/n!, which means it will almost never encounter that bad case.
Interestingly, quicksort performs more comparisons on average than mergesort - 1.44 n lg n (expected) for quicksort versus n lg n for mergesort. If all that mattered were comparisons, mergesort would be strongly preferable to quicksort.
The reason that quicksort is fast is that it has many other desirable properties that work extremely well on modern hardware. For example, quicksort requires no dynamic allocations. It can work in-place on the original array, using only O(log n) stack space (worst-case if implemented correctly) to store the stack frames necessary for recursion. Although mergesort can be made to do this, doing so usually comes at a huge performance penalty during the merge step. Other sorting algorithms like heapsort also have this property.
Additionally, quicksort has excellent locality of reference. The partitioning step, if done using Hoare's in-place partitioning algorithm, is essentially two linear scans performed inward from both ends of the array. This means that quicksort will have a very small number of cache misses, which on modern architectures is critical for performance. Heapsort, on the other hand, doesn't have very good locality (it jumps around all over an array), though most mergesort implementations have reasonably locality.
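A hedged sketch of quicksort with Hoare-style partitioning, to make the "two inward scans" concrete (the middle-element pivot choice here is just a placeholder):

    // Quicksort with Hoare's partition scheme: i and j scan inward from the
    // two ends of the range, which is where the cache-friendly access
    // pattern comes from. Sorts a[lo..hi] inclusive.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo + (hi - lo) / 2];    // placeholder pivot choice
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) break;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        quickSort(a, lo, j);                  // with Hoare's scheme, recurse on [lo, j]
        quickSort(a, j + 1, hi);              // and on [j + 1, hi]
    }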
Quicksort is also very parallelizable. Once the initial partitioning step has occurred to split the array into smaller and greater regions, those two parts can be sorted independently of one another. Many sorting algorithms can be parallelized, including mergesort, but the performance of parallel quicksort tends to be better than other parallel algorithms for the above reason. Heapsort, on the other hand, does not.
The only issue with quicksort is the possibility that it degrades to O(n2), which on large data sets can be very serious. One way to avoid this is to have the algorithm introspect on itself and switch to one of the slower but more dependable algorithms in the case where it degenerates. This algorithm, called introsort, is a great hybrid sorting algorithm that gets many of the benefits of quicksort without the pathological case.
In summary:
Quicksort is in-place except for the stack frames used in the recursion, which take O(log n) space.
Quicksort has good locality of reference.
Quicksort is easily parallelized.
This accounts for why quicksort tends to outperform sorting algorithms that on paper might be better.
Hope this helps!
Because on average it's the fastest comparison sort (in terms of elapsed time).
Because, in the general case, it's one of the fastest sorting algorithms.
In addition to being the fastest, though, some of its bad-case scenarios can be avoided by shuffling the array before sorting it. As for its weakness with small data sets, that obviously isn't as big a problem, since the data sets are small and the sort time is probably small regardless.
As an example, I wrote Python functions for quicksort and bubble sort. Bubble sort takes ~20 seconds to sort 10,000 records, 11 seconds for 7,500, and 5 for 5,000. Quicksort does all of these sorts in around 0.15 seconds!
It might be worth pointing out that C does have the library function qsort(), but there's no requirement that it be implemented using an actual QuickSort, that is up to the compiler vendor.
Because it is an algorithm that works well on large data sets, with O(n log n) complexity. It is also an in-place algorithm that takes constant extra space. By selecting the pivot element wisely we can avoid quicksort's worst case, and it will perform in O(n log n) even on an already-sorted array.

Efficiency of Sort Algorithms

I am studying up for a pretty important interview tomorrow and there is one thing that I have a great deal of trouble with: sorting algorithms and their big-O efficiencies.
What number is important to know? The best, worst, or average efficiency?
Worst, followed by average. Be aware of the real-world impact of the so-called "hidden constants" too - for instance, the classic quicksort algorithm is O(n^2) in the worst case and O(n log n) on average, whereas mergesort is O(n log n) in the worst case, yet quicksort will usually outperform mergesort in practice.
All of them are important to know, of course. You have to understand that the benefits of one sorting algorithm in the average case can become a terrible deficit in the worst case, or that the worst case isn't that bad but the best case isn't that good, or that it only works well on unsorted data, etc.
In short.
Sorting algorithm efficiency varies with the input data and the task.
The maximum sorting speed achievable by comparisons is n*log(n).
If the data contains sorted runs, the speed can be better than n*log(n).
If the data consists of many duplicates, sorting can be done in near-linear time.
Most sorting algorithms have their uses.
Most quicksort variants also have an average case of n*log(n), but they are usually faster than other algorithms that are not heavily optimized. Quicksort is fastest when it is not stable, but stable variants are only a fraction slower. The main problem is the worst case. The usual fix is introsort.
Most merge sort variants have best, average, and worst cases fixed at n*log(n). Merge sort is stable and relatively easy to scale up. BUT it needs auxiliary space (or an emulation of it) proportional to the total number of items. The main problem is memory. The usual fix is timsort.
Sorting algorithms also vary with the size of the input. I'll make a newbie claim that for input data over 10 TB, there is no match for the merge sort variants.
I recommend that you don't merely memorize these factoids. Learn why they are what they are. If I were interviewing you, I would make sure to ask questions that show me you understand how to analyze an algorithm, not just that you can spit back something you saw on a webpage or in a book. Additionally, the day before an interview is not the time to be doing this studying.
I wish you the best of luck!! Please report back in a comment how it went!
I just finished one round of interviews at my college...
Every algorithm has its benefits, otherwise it wouldn't exist.
So it's better to understand what is so good about the algorithm that you are studying. Where does it do well? How can it be improved?
I guess you'll automatically need to read the various efficiency notations when you do this. Mind the worst case, and pay attention to the average case; best cases are rare.
All the best for your interview.
You may also want to look into other types of sorting that can be used when certain conditions exist. For example, consider Radix sort. http://en.wikipedia.org/wiki/Radix_sort

Average time complexity of quicksort vs insertion sort

I'm led to believe that quicksort should be faster than insertion sort on a medium-sized unordered int array. I've implemented both algorithms in Java, and I notice quicksort is significantly slower than insertion sort.
I have a theory: quicksort is slower because it's recursive, and the calls it makes to its own method are quite slow in the JVM (which is why my timer is giving much higher readings than I expected), whereas insertion sort isn't recursive and all the work is done within one method, so the JVM isn't having to do any extra grunt work. Am I right?
You may be interested in these Sorting Algorithm Animations.
Probably not, unless your recursive methods are making any big allocations. It's more likely that there's a quirk in your code or that your data set is small.
The JVM shouldn't have any trouble with recursive calls.
Unless you've hit one of Quicksort's pathological cases (often, a list that is already sorted), Quicksort should be O(n log n) — substantially faster than insertion sort's O(n^2) as n increases.
You may want to use merge sort or heap sort instead; they don't have pathological cases. They are both O(n log n).
(When I did these long ago in C++, quicksort was faster than insertion sort even with fairly small n. Radix is notably faster with mid-size n as well.)
Theoretically, quicksort should be faster than insertion sort for random data of medium to large size.
I guess the differences come down to how quicksort is implemented:
pivot selection for the given data (median-of-three is a better approach)
using the same swap mechanism for quicksort and insertion sort
whether the input is random enough, i.e., if you have clusters of ordered data, performance will suffer.
I did this exercise in C and results are in accordance with theory.
Actually, for small values of n, insertion sort is better than quicksort, because for small n the running time depends more on the constant factors than on n^2 or n log n.
The fastest implementations of quicksort use looping instead of recursion. Recursion typically isn't very fast.
You have to be careful how you make the recursive calls, and because it's Java, you can't rely on tail calls being optimized, so you should probably manage your own stack for the recursion.
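One common way to keep the stack small (a sketch; partition() is assumed to be any correct scheme such as the Hoare partition shown earlier): recurse only into the smaller half and loop on the larger one, which caps the depth at O(log n) even without tail-call optimization.

    // Half-iterative quicksort: recurse into the smaller partition and loop on
    // the larger one, so the Java call stack never exceeds O(log n) frames.
    static void quickSort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);     // assumed: returns a Hoare-style split point
            if (p - lo < hi - (p + 1)) {
                quickSort(a, lo, p);          // smaller side: recurse
                lo = p + 1;                   // larger side: iterate
            } else {
                quickSort(a, p + 1, hi);
                hi = p;
            }
        }
    }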
Everything that is available to be known about quicksort vs insertion sort can be found in Bob Sedgewick's doctoral dissertation. The boiled-down version can be found in his algorithms textbooks.
I remember that in school, when we did sorting in Java, we would actually do a hybrid of the two. So for recursive algorithms like quicksort and mergesort, we would actually do insertion sort for segments that were very small, say 10 records or so.
Recursion is slow, so use it with care. And as was noted before, if you can figure out a way to implement the same algorithm in an iterative fashion, then do that.
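A sketch of that hybrid idea (the cutoff of 10 follows the comment above; partition() is again assumed to be any correct scheme):

    // Hybrid quicksort: below a small cutoff, fall back to insertion sort,
    // whose low constant factor beats the recursive overhead on tiny ranges.
    static final int CUTOFF = 10;

    static void hybridSort(int[] a, int lo, int hi) {        // sorts a[lo..hi] inclusive
        if (hi - lo < CUTOFF) {
            insertionSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);                         // assumed partition scheme
        hybridSort(a, lo, p);
        hybridSort(a, p + 1, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];                              // shift larger elements right
                j--;
            }
            a[j + 1] = key;
        }
    }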
There are three things to consider here. First, insertion sort is much faster (O(n) vs O(n log n)) than quicksort IF the data set is already sorted, or nearly so; second, if the data set is very small, the 'start-up time' to set up the quicksort, find a pivot point, and so on dominates the rest; and third, quicksort is a little subtle, so you may want to re-read the code after a night's sleep.
How are you choosing your pivot in Quicksort?
This simple fact is the key to your question, and probably why Quicksort is running slower. In cases like this it's a good idea to post at least the important sections of your code if you're looking for some real help.
