From reading this article from Wikipedia on sorting algorithms, it would seem that smoothsort is the best sorting algorithm there is. It has top performance in all categories: best, average, and worst. Nothing beats it in any category. It also has constant memory requirements. The only downside is that it isn't stable.
It beats timsort in memory, and it beats quicksort in both worst-case performance and memory.
But I never hear about smoothsort. Nobody ever mentions it, and most discussions seem to revolve around other sorting algorithms.
Why is that?
Big-O performance is great for publishing papers, but in the real world we have to look at the constants too. Quicksort has been the algorithm of choice for unstable, in-place, in-memory sorting for so long because we can implement its inner loop very efficiently and it is very cache-friendly. Even if you can implement smoothsort's inner loop as efficiently, or nearly as efficiently, as quicksort's, you will probably find that its cache miss rate makes it slower.
We mitigate quicksort's worst-case performance by spending a little more effort choosing good pivots (to reduce the number of pathological cases) and detecting pathological cases. Look up introsort. Introsort runs quicksort first, but switches to heapsort if it detects excessive recursion (which indicates a pathological case for quicksort).
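To make the pattern concrete, here is a minimal introsort-style sketch in C++. It is not the STL's implementation, just an illustration of the idea described above: quicksort with a median-of-three pivot plus a depth limit that triggers a switch to heapsort. The function names and the depth limit of 2·log2(n) are my own choices.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Minimal introsort-style sketch: quicksort with a depth limit that
// falls back to heapsort when the recursion looks pathological.
void introsort_impl(std::vector<int>& a, int first, int last, int depth_limit) {
    while (last - first > 1) {
        if (depth_limit-- == 0) {
            // Excessive recursion: finish this range with heapsort, O(n log n) guaranteed.
            std::make_heap(a.begin() + first, a.begin() + last);
            std::sort_heap(a.begin() + first, a.begin() + last);
            return;
        }
        // Median-of-three pivot makes bad splits (e.g. on sorted input) less likely.
        int mid = first + (last - first) / 2;
        int x = a[first], y = a[mid], z = a[last - 1];
        int pivot = std::max(std::min(x, y), std::min(std::max(x, y), z));

        // Split into [< pivot] [== pivot] [> pivot] using std::partition twice.
        auto lt = std::partition(a.begin() + first, a.begin() + last,
                                 [&](int v) { return v < pivot; });
        auto gt = std::partition(lt, a.begin() + last,
                                 [&](int v) { return v == pivot; });

        introsort_impl(a, first, int(lt - a.begin()), depth_limit);  // left part
        first = int(gt - a.begin());                                 // continue with right part
    }
}

void introsort(std::vector<int>& a) {
    if (a.size() > 1)
        introsort_impl(a, 0, int(a.size()), 2 * int(std::log2(a.size())));
}
```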
A better asymptotic bound doesn't imply better performance (though it usually turns out that way). The hidden constant may be several times bigger, making an algorithm slower than another one (with the same or even worse asymptotic complexity) on arrays of relatively small size, where a "relatively small" array may in fact be of arbitrary size, 10^100 for example. That's asymptotic analysis. But I don't know anything about smoothsort's hidden constants.
For example, there is an algorithm with O(n) worst-case time for finding the k-th order statistic, but it's so complex that the simpler O(n log n) worst-case version outperforms it in most cases.
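In C++, for instance, the practical tool for order statistics is std::nth_element, which is commonly implemented with an introselect-style algorithm (expected linear time) rather than the guaranteed-linear median of medians. A minimal usage example:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {9, 1, 8, 2, 7, 3, 6, 4, 5};
    std::size_t k = 3;  // the element that would sit at index 3 if v were sorted

    // After the call, v[k] holds the k-th smallest element (0-based);
    // everything before it is <= v[k], everything after is >= v[k].
    std::nth_element(v.begin(), v.begin() + k, v.end());

    std::cout << "4th smallest element: " << v[k] << "\n";  // prints 4
}
```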
Also, there is an interesting comparison:
…As you can see, both Timsort and Smoothsort didn't cut the mustard. Smoothsort is worse than STL sorts in all cases (even with std::bitset replaced with raw bit operations)…
Well, first I would say that it is not as though smoothsort is unknown. Whether to use it depends on the user's needs.
The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.
From the documentation:
The smoothsort algorithm needs to be able to hold in memory the sizes of all of the heaps in the string. Since all these values are distinct, this is usually done using a bit vector. Moreover, since there are at most O(log n) numbers in the sequence, these bits can be encoded in O(1) machine words, assuming a transdichotomous machine model.
Related
This question came up on a homework assignment, and I cannot fathom why. It seems like you would always want to choose the algorithm that produces the best run time.
Big O and Big Theta notation only describe how performance behaves as the input size grows arbitrarily large. For example, the function 99999999n is O(n) but the function (1/9999999999)n^2 is O(n^2). However, for any input of reasonable (not astronomically large) size, the O(n^2) function is clearly the faster of the two.
In other words, if you can make assumptions about your input data, there are some cases where a generally worse algorithm may perform better.
A real-world example of the above concept is sorting: some algorithms run in O(n) time if the array is already sorted (e.g. bubble sort with an early-exit check). If you know that many of your arrays are already sorted, you may choose bubble sort over merge sort for this reason.
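To illustrate, here is a small sketch of the early-exit variant of bubble sort; without the "no swaps, so stop" check, bubble sort is Θ(n^2) even on already-sorted input:

```cpp
#include <utility>
#include <vector>

// Bubble sort with an early-exit check: if a full pass makes no swaps,
// the array is already sorted. On sorted input this finishes after one
// pass, i.e. O(n); the worst case remains O(n^2).
void bubble_sort(std::vector<int>& a) {
    for (std::size_t n = a.size(); n > 1; --n) {
        bool swapped = false;
        for (std::size_t i = 1; i < n; ++i) {
            if (a[i - 1] > a[i]) {
                std::swap(a[i - 1], a[i]);
                swapped = true;
            }
        }
        if (!swapped) return;  // no swaps in this pass: already sorted
    }
}
```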
Another case where you might not want to use the most time-efficient algorithm is when space efficiency matters. Maybe you are programming an embedded device with very little RAM. You would rather use less memory and waste slightly more time than be as perfectly time-efficient as possible.
This is a question for people who program for a living:
I just proved (using the Master theorem) that if we use quicksort and pick the pivot to be the median of the subarray we are partitioning (using the median-of-medians algorithm with Θ(n) worst-case run time), then the worst-case run time of quicksort is Θ(n lg n), so basically this version of quicksort is as good as it can get.
My question now is - does anyone implement quicksort like this in practice? Or is it just one of those nice theoretical things that are actually not good in real life?
PS: I don't need proofs of what I'm stating; I just want to know whether this is widely known/useful.
This is known (see the wikipedia entry), but since in practice the worst case is relatively rare, the added overhead of an O(N) selection algorithm on the average case is generally considered unacceptable.
It really depends on where you're working.
So far, personally, I have never actually implemented it, but I really think it varies depending on the requirements of your workplace.
Once you have partitioned around some pivot, you already know the "quality" of that pivot (how evenly it divided the array). If it is below some threshold, you can try a smarter way to select the pivot. This keeps the time complexity at O(n log n) and keeps the constants low, because the expensive selection is done rarely.
If I'm not mistaken, the C++ STL uses something like this, but I don't have any links; that's from a conversation at work.
Update:
C++ STL (at least the one in Visual Studio) seems to do a different thing:
Perform partition
Unconditionally sort the smaller part by recursion (since it contains at most half the elements, the recursion depth stays at most log2(N), which keeps the whole thing O(n log n))
Handle the larger part in the same loop (without recursive call)
If the number of iterations exceeds approximately 1.5 log2(N), it switches to heap sort, which is O(n log n).
There is a variety of sorting algorithms available. A sorting algorithm with O(n^2) time complexity may be preferred over an O(n log n) one because it is in-place or stable. For example:
For somewhat sorted things insertion sort is good.
Applying quick sort on nearly sorted array is foolishness.
Heap sort is good with O(nlogn) but not stable.
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
I want to know which sorting algorithm is suitable in what conditions.
Which sorting algo is best for sorting names in alphabetical order?
Which sorting algo is best for sorting a small number of integers?
Which sorting algo is best for sorting a small number of integers that may span a large range (98767 – 6734784)?
Which sorting algo is best for sorting billions of integers?
Which sorting algo is best for sorting in embedded or real-time systems where both space and time are constrained?
Please suggest answers for these and other situations, and books or websites for this type of comparison.
Well, there is no silver bullet - but here are some rules of thumb:
Radix sort / counting sort is usually good when the range of element values (call it U) is relatively small compared to the number of elements (U << n); this might fit your cases 2 and 4. A counting-sort sketch follows after this list.
Insertion sort is good for small lists (say n < 30), empirically even faster than O(n log n) algorithms. In fact, you can optimize an O(n log n) top-down algorithm by switching to insertion sort when n < 30.
A variation of radix sort might also be a good choice for sorting strings alphabetically, since it is O(|S| * n), while a normal comparison-based algorithm is O(|S| * n log n), where |S| is the length of your strings (fits your case 1).
Where the input is very large, way too large to fit in memory, the way to do it is with an external sort, which is a variation of merge sort. It minimizes the number of disk reads/writes and makes sure these are done sequentially, which improves performance drastically (might fit case 4).
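As promised above, a minimal counting-sort sketch, assuming non-negative integer keys with a known, small maximum value (the function name and the max_value parameter are mine):

```cpp
#include <vector>

// Counting sort for non-negative integers in [0, max_value].
// Runs in O(n + U) time and O(U) extra space, where U = max_value + 1,
// so it only pays off when U is small compared to n.
std::vector<int> counting_sort(const std::vector<int>& a, int max_value) {
    std::vector<int> count(max_value + 1, 0);
    for (int x : a)
        ++count[x];                               // histogram of values

    std::vector<int> out;
    out.reserve(a.size());
    for (int v = 0; v <= max_value; ++v)
        out.insert(out.end(), count[v], v);       // emit each value count[v] times
    return out;
}
```

This version is enough for plain integers; sorting full records stably by an integer key needs the prefix-sum (key-indexed counting) variant instead.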
For general-case sorting, quicksort and timsort (used by Java) give good performance.
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
You may be interested in the stable_sort function from C++. It tries to allocate the extra space for a regular merge sort, but if that fails it does an in-place stable merge sort with inferior time complexity (n * ((log n)^2) instead of n * (log n)). If you can read C++ you can look at the implementation in your favourite standard library, otherwise I expect you can find the details explained somewhere in language-agnostic terms.
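A small usage example showing what std::stable_sort buys you over std::sort: elements that compare equal keep their original relative order (the record type and data here are made up for illustration):

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

int main() {
    std::vector<Person> people = {
        {"Alice", 30}, {"Bob", 25}, {"Carol", 30}, {"Dave", 25}};

    // Stable: people with equal ages keep their original relative order,
    // so Bob still comes before Dave, and Alice before Carol.
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) { return a.age < b.age; });

    for (const Person& p : people)
        std::cout << p.name << " (" << p.age << ")\n";
    // Bob (25), Dave (25), Alice (30), Carol (30)
}
```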
There's a body of academic literature about in-place stable sorting (and in particular in-place merging).
So in C++ the rule of thumb is easy, "use std::stable_sort if you need a stable sort, otherwise use std::sort". Python makes it even easier again, the rule of thumb is "use sorted".
In general, you will find that a lot of languages have fairly clever built-in sort algorithms, and you can use them most of the time. It's rare that you'll need to implement your own to beat the standard library. If you do need to implement your own, there isn't really any substitute for pulling out the textbooks, implementing a few algorithms with as many tricks as you can find, and testing them against each other for the specific case you're worried about for which you need to beat the library function.
Most of the "obvious" advice that you might be hoping for in response to this question is already incorporated into the built-in sort functions of one or more common programming languages. But to answer your specific questions:
Which sorting algo is best for sorting names in alphabetical order?
A radix sort might edge out standard comparison sorts like C++ sort, but that might not be possible if you're using "proper" collation rules for names. For example, "McAlister" used to be alphabetized the same as "MacAlister", and "St. John" as "Saint John". But then programmers came along and wanted to just sort by ASCII value rather than code a lot of special rules, so most computer systems don't use those rules any more. I find Friday afternoon is a good time for this kind of feature ;-) You can still use a radix sort if you do it on the letters of the "canonicalized" name rather than the actual name.
"Proper" collation rules in languages other than English are also entertaining. For example in German "Grüber" sorts like "Grueber", and therefore comes after "Gruber" but before "Gruhn". In English the name "Llewellyn" comes after "Lewis", but I believe in Welsh (using the exact same alphabet but different traditional collation rules) it comes before.
For that reason, it's easier to talk about optimizing string sorts than it is to actually do it. Sorting strings "properly" requires being able to plug in locale-specific collation rules, and if you move away from a comparison sort then you might have to re-write all your collation code.
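One common way to keep a fast comparison sort while still honouring such rules is to precompute a collation key for each name and sort on the keys. A hedged sketch follows; the canonicalization rules here are deliberately toy ones (lowercasing plus the "Mc" to "Mac" expansion mentioned above), and real collation would use a locale-aware library such as ICU:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Toy collation key: lowercase the name and expand a leading "Mc" to "Mac".
// Real collation (locale rules, umlauts, etc.) needs a proper library.
std::string collation_key(const std::string& name) {
    std::string key = name;
    for (char& c : key)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (key.rfind("mc", 0) == 0)          // starts with "mc"
        key = "mac" + key.substr(2);
    return key;
}

void sort_names(std::vector<std::string>& names) {
    // Precompute each key once, then sort (key, name) pairs: O(n) key
    // computations plus an ordinary comparison sort on the keys.
    std::vector<std::pair<std::string, std::string>> keyed;
    keyed.reserve(names.size());
    for (const std::string& n : names)
        keyed.emplace_back(collation_key(n), n);

    std::sort(keyed.begin(), keyed.end());   // compares keys first, then names

    for (std::size_t i = 0; i < names.size(); ++i)
        names[i] = keyed[i].second;
}
```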
Which sorting algo is best for sorting a small number of integers?
For a small number of small values maybe a counting sort, but Introsort with a switch to insertion sort when the data gets small enough (20-30 elements) is pretty good. Timsort is especially good when the data isn't random.
Which sorting algo is best for sorting a small number of integers that may span a large range (98767 – 6734784)?
The large range rules out counting sort, so for a small number of widely-ranged integers, Introsort/Timsort.
Which sorting algo is best for sorting billions of integers?
If by "billions" you mean "too many to fit in memory" then that changes the game a bit. Probably you want to divide the data into chunks that do fit in memory, Intro/Tim sort each one, then do an external merge. Of if you're on a 64 bit machine sorting 32 bit integers, you could consider counting sort.
Which sorting algo is best for sorting in embedded or real-time systems where both space and time are constrained?
Probably Introsort.
For somewhat sorted things insertion sort is good.
True, and Timsort takes advantage of the same situation.
Applying quick sort on nearly sorted array is foolishness.
False. Nobody uses the plain quicksort originally published by Hoare; you can make better choices of pivot that make the killer cases much less obvious than "sorted data". To deal with the bad cases thoroughly, there is introsort.
Heap sort is good with O(nlogn) but not stable.
True, but Introsort is better (and also not stable).
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
Handle this by allowing for somewhat slower in-place merging like std::stable_sort does.
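For reference, the standard library also exposes that merge step directly as std::inplace_merge, which uses a temporary buffer when it can get one and falls back to a slower, truly in-place merge when it can't. A minimal stable merge sort built on top of it, as a sketch:

```cpp
#include <algorithm>
#include <vector>

// Stable merge sort using std::inplace_merge. The merge uses extra memory
// when the allocation succeeds and degrades gracefully (slower, but still
// correct and stable) when it doesn't.
void merge_sort(std::vector<int>& a, std::size_t first, std::size_t last) {
    if (last - first <= 1) return;
    std::size_t mid = first + (last - first) / 2;
    merge_sort(a, first, mid);
    merge_sort(a, mid, last);
    std::inplace_merge(a.begin() + first, a.begin() + mid, a.begin() + last);
}

void merge_sort(std::vector<int>& a) { merge_sort(a, 0, a.size()); }
```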
Our algorithms professor gave us an assignment that requires us to choose a rare sorting algorithm (e.g. introsort, gnome sort) and do some research about it.
Wikipedia sure has plenty of information about this, but it is still not enough for me to do the research in depth.
So I would like to find a book that includes discussions of those rare sorting algorithms, since most textbooks (like CLRS, the one I am using) only discuss some basic sorting algorithms (e.g. bubble sort, merge sort, insertion sort).
Is there a book or website that contains a good amount of that information?
Thanks!
Well, a very interesting "rare" sorting algorithm is smoothsort, by Edsger Dijkstra. On paper it is almost a perfect sort:
O(n) best
O(n log n) average
O(n log n) worst
O(1) memory
n comparisons, 0 swaps when input is sorted
It is so rare because of its complex nature (which makes it hard to optimize).
You can read the paper written by Dijkstra himself here: http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF
And here is the wikipedia link and a very extensive article about smoothsort (by Keith Schwarz).
One sorting algorithm that you might call rare is timsort. It works great on arrays that already contain sorted runs: the best case is O(n), and the worst and average cases are O(n log n).
Another fast way of sorting is bitonic sort, which is the basis of many parallel sorting algorithms. You can find thousands of papers about it on the web; some books, like Quinn's book on parallel algorithms, contain an extended discussion of it and related variations.
The Art of Computer Programming, Volume 3, also has a good discussion of sorting strategies.
Bitonic sort is O(N log^2(N)) (slightly asymptotically slower than the likes of quicksort), but it is parallelizable, with a highly regular structure. This lets you use SIMD vector instruction sets like SSE, providing a constant-factor boost which makes it an interesting option for "bottom-level" sorts (instead of the more commonly used insertion sort).
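For the curious, here is a sketch of the sequential form of bitonic sort. It assumes the length is a power of two (the classic formulation); the regular, data-independent compare-exchange pattern is what makes it friendly to SIMD and parallel hardware:

```cpp
#include <utility>
#include <vector>

// Merge a bitonic sequence a[lo, lo+n) into ascending or descending order
// with a fixed compare-exchange network. n must be a power of two.
void bitonic_merge(std::vector<int>& a, std::size_t lo, std::size_t n, bool ascending) {
    if (n <= 1) return;
    std::size_t m = n / 2;
    for (std::size_t i = lo; i < lo + m; ++i)
        if ((a[i] > a[i + m]) == ascending)
            std::swap(a[i], a[i + m]);
    bitonic_merge(a, lo, m, ascending);
    bitonic_merge(a, lo + m, m, ascending);
}

// Sort a[lo, lo+n): build two bitonic halves (one ascending, one descending),
// then merge. Total work is O(n log^2 n) compare-exchanges.
void bitonic_sort(std::vector<int>& a, std::size_t lo, std::size_t n, bool ascending = true) {
    if (n <= 1) return;
    std::size_t m = n / 2;
    bitonic_sort(a, lo, m, true);
    bitonic_sort(a, lo + m, m, false);
    bitonic_merge(a, lo, n, ascending);
}
```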
Quicksort has a worst-case performance of O(n^2), but it is still widely used in practice anyway. Why is this?
You shouldn't focus only on the worst case and only on time complexity. It's more about the average case than the worst case, and it's about both time and space.
Quicksort:
has average time complexity of Θ(n log n);
can be implemented with space complexity of Θ(log n);
Also take into account that big-O notation doesn't include any constants, but in practice it does make a difference if an algorithm is a few times faster. Θ(n log n) means that the algorithm executes in roughly K·n·log(n) steps, where K is a constant. Quicksort is the comparison sort with the lowest K.
The average asymptotic order of quicksort is O(n log n), and it's usually more efficient than heapsort due to smaller constants (tighter loops). In fact, there is a theoretical linear-time median-selection algorithm that you can use to always find the best pivot, resulting in a worst case of O(n log n). However, the normal quicksort is usually faster than this theoretical one.
To make it more concrete, consider the probability that quicksort will finish in O(n^2). It's on the order of 1/n!, which means it will almost never encounter that bad case.
Interestingly, quicksort performs more comparisons on average than mergesort - 1.44 n lg n (expected) for quicksort versus n lg n for mergesort. If all that mattered were comparisons, mergesort would be strongly preferable to quicksort.
The reason that quicksort is fast is that it has many other desirable properties that work extremely well on modern hardware. For example, quicksort requires no dynamic allocations. It can work in-place on the original array, using only O(log n) stack space (worst-case if implemented correctly) to store the stack frames necessary for recursion. Although mergesort can be made to do this, doing so usually comes at a huge performance penalty during the merge step. Other sorting algorithms like heapsort also have this property.
Additionally, quicksort has excellent locality of reference. The partitioning step, if done using Hoare's in-place partitioning algorithm, is essentially two linear scans performed inward from both ends of the array. This means that quicksort will have a very small number of cache misses, which on modern architectures is critical for performance. Heapsort, on the other hand, doesn't have very good locality (it jumps around all over an array), though most mergesort implementations have reasonably good locality.
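To make those two points concrete, here is a hedged sketch of a quicksort that uses Hoare-style partitioning (two indices scanning inward from the ends) and recurses only into the smaller side, looping on the larger one; that is what keeps the stack at O(log n) frames even in the worst case:

```cpp
#include <utility>
#include <vector>

// Hoare partition: two indices scan inward from the ends and swap
// out-of-place elements; returns j such that [lo, j] and [j+1, hi]
// are both non-empty and can be sorted independently.
int hoare_partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo - 1, j = hi + 1;
    while (true) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j) return j;
        std::swap(a[i], a[j]);
    }
}

// Recurse into the smaller half, iterate on the larger half: the stack
// never holds more than O(log n) frames.
void quicksort(std::vector<int>& a, int lo, int hi) {
    while (lo < hi) {
        int p = hoare_partition(a, lo, hi);
        if (p - lo < hi - p) {
            quicksort(a, lo, p);
            lo = p + 1;
        } else {
            quicksort(a, p + 1, hi);
            hi = p;
        }
    }
}
```

Calling quicksort(a, 0, (int)a.size() - 1) sorts the whole vector.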
Quicksort is also very parallelizable. Once the initial partitioning step has occurred to split the array into smaller and greater regions, those two parts can be sorted independently of one another. Many sorting algorithms can be parallelized, including mergesort, but the performance of parallel quicksort tends to be better than other parallel algorithms for the above reason. Heapsort, on the other hand, does not.
The only issue with quicksort is the possibility that it degrades to O(n^2), which on large data sets can be very serious. One way to avoid this is to have the algorithm introspect on itself and switch to one of the slower but more dependable algorithms in the case where it degenerates. This algorithm, called introsort, is a great hybrid sorting algorithm that gets many of the benefits of quicksort without the pathological case.
In summary:
Quicksort is in-place except for the stack frames used in the recursion, which take O(log n) space.
Quicksort has good locality of reference.
Quicksort is easily parallelized.
This accounts for why quicksort tends to outperform sorting algorithms that on paper might be better.
Hope this helps!
Because on average it's the fastest comparison sort (in terms of elapsed time).
Because, in the general case, it's one of the fastest sorting algorithms.
In addition to being the fastest, though, some of its bad-case scenarios can be avoided by shuffling the array before sorting it. As for its weakness with small data sets, that obviously isn't as big a problem, since the data sets are small and the sort time is probably small regardless.
As an example, I wrote a Python function for quicksort and one for bubble sort. The bubble sort takes ~20 seconds to sort 10,000 records, 11 seconds for 7,500, and 5 for 5,000. The quicksort does all of these sorts in around 0.15 seconds!
It might be worth pointing out that C does have the library function qsort(), but there's no requirement that it be implemented using an actual quicksort; that is up to the compiler vendor.
Because this is one of the algorithms that works well on large data sets, with O(N log N) complexity. It is also an in-place algorithm that uses very little extra space. By selecting the pivot element wisely we can largely avoid quicksort's worst case, so it performs in O(N log N) even on an already-sorted array.