I am currently studying algorithms at college and I am curious what a seasoned developer uses in their code when they need to sort something.
C++ uses IntroSort which has an average of Θ(n log(n)) and worst of Θ(n^2).
C# uses QuickSort which has an average of Θ(n log(n)) and worst of Θ(n^2).
Java uses MergeSort which has an average of Θ(n log(n)) and worst of Θ(n log(n)).
JavaScript seems like it's doing Θ(n log(n)), and the algorithm depends on the browser.
And from a quick read, the majority of languages have a sorting method that has a time complexity of Θ(n log(n)).
Do programmers use the default sorting methods, or do they implement their own?
When do they use the default one, and when do they implement their own?
Is Θ(n log(n)) the best time a sorting algorithm can get?
There are a ton of sorting algorithms, as I am currently finding out at uni.
I am currently studying algorithms at college and I am curious what a seasoned developer uses in their code when they need to sort something.
Different sorting algorithms have different applications. You choose the best algorithm for the problem you're facing. For example, if you have a list of items in-memory then you can sort them in-place with QuickSort - if you want to sort items that are streamed-in (i.e. an online sort) then QuickSort wouldn't be appropriate.
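As an illustration of the online case, one option is to keep arriving items in a self-balancing ordered container rather than calling a sort at all. This is only a minimal sketch with made-up values, not a recommendation of a specific design:

#include <iostream>
#include <set>

int main() {
    const int arrivals[] = {5, 1, 4, 1, 3};   // pretend these arrive one at a time
    std::multiset<int> sorted;                // stays ordered as items are inserted

    for (int item : arrivals) {
        sorted.insert(item);                  // O(log n) per arrival
        for (int v : sorted) std::cout << v << ' ';
        std::cout << '\n';                    // the items seen so far, always in order
    }
}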
C++ uses IntroSort which has an average of Θ(n log(n)) and worst of Θ(n^2).
I think you mean that C++'s std::sort defaults to Introsort in most implementations (including the original SGI STL and GNU's), but I don't believe the C++ specification specifically requires sort to use Introsort - it only imposes complexity requirements, and std::sort is not even required to be stable (that is what std::stable_sort is for). C++ is just a language and does not have a sorting algorithm built into the language itself. Anyway, it's a library feature, not a language feature.
C# uses QuickSort which has an average of Θ(n log(n)) and worst of Θ(n^2).
Again, C# (the language) does not have any built-in sorting functionality. It's a .NET BCL (Base Class Library) feature that exposes methods that perform the sorting (such as Array.Sort, List<T>.Sort, Enumerable.OrderBy<T>, and so on). Unlike the C++ specification, the C# official documentation does state that the algorithm used by List<T>.Sort is Quicksort, but other methods like Enumerable.OrderBy<T> leave the actual sorting algorithm used to the backend provider (e.g. in Linq-to-SQL and Linq-to-Entities the sorting is performed by the remote database engine).
Do programmers use the default sorting methods, or do they implement their own?
Generally speaking, we use the defaults because they're good enough for 95%+ of all workloads and scenarios - or because the specification allows the toolchain and library we're using to pick the best algorithm for the runtime platform (e.g. C++'s sort could hypothetically make use of hardware sorting, which allows constrained values of n to be sorted in O(1) to O(n) worst-case time, instead of QuickSort's O(n^2) worst case - which is a problem when processing unsanitized user input).
But also, generally speaking, programmers should never reimplement their own sorting algorithms. Modern languages with support for templates and generics mean that an algorithm can be written once in general form, so we just need to provide the data to be sorted and either a comparator function or a sort-key selector, which avoids the common human errors made when reimplementing a stock algorithm.
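As a minimal sketch of that idea in C++ (the struct and values here are made up for illustration), the library supplies the algorithm and we only supply the ordering:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

int main() {
    std::vector<Person> people{{"Ada", 36}, {"Grace", 45}, {"Edsger", 28}};

    // std::sort provides the (unspecified, typically introsort-like) algorithm;
    // the lambda comparator is the only thing we have to write ourselves.
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.age < b.age; });

    for (const auto& p : people)
        std::cout << p.name << " (" << p.age << ")\n";   // Edsger, Ada, Grace
}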
As for the possibility of programmers inventing their own new, novel sorting algorithms... with few exceptions that really doesn't happen. As with cryptography, if you find yourself "inventing" a new sorting algorithm, I guarantee that not only are you not inventing a new algorithm, but that your algorithm will be flawed in some way or another. In short: don't - at least not until you've run your idea past your nearest computer science academic.
When do they use the default one, and when do they implement their own?
See above. You're also not considering the third option: using an existing non-default algorithm. As the other answers have said, it's based on the application, i.e. the problem you're trying to solve.
Is Θ(n log(n)) the best time a sorting algorithm can get?
You need to understand the difference between best-case, average-case, and worst-case time complexities. Just read the Wikipedia article section with the big table that shows the different runtime complexities: https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_sorts - for example, insertion sort has a best-case time complexity of O(n), which is much better than O(n log n) and directly contradicts your supposition.
There are a ton of sorting algorithms, as I am currently finding out at uni.
I think you would be better served by bringing your questions to your class TA/prof/reader as they know the course material you're using and know the context in which you're asking.
In practice, a sort is chosen based on what it is sorting and where it is sorting it:
whether the data needs to be sorted at all (can all or a subset of the data be inserted in order?)
how sorted the data is already (does it come in sorted chunks?)
whether the data needs to be sorted now, or how unsorted it can get before it must be sorted (when? cache compaction during off-peak hours?)
time complexity
space requirements for the sort
Distributed environments are also extremely relevant in modern software, and they cause states where not all of the nodes may be available.
This greatly changes how, or even whether, things are fully sorted (for example, data may be sliced up across different nodes, partially sorted, and then referenced by some sort of cuckoo hash).
The standard list sort in Haskell uses merge sort.
Divide the list into "runs"; sections where the input is already in ascending order, or in descending order. The minimum run length is 2, and for random data the average is 3 (I think). Runs in descending order are just reversed, so now you have a list of sorted lists.
Merge each pair of lists. Repeat until you have only one list left.
This is O(n log n) in the worst case and O(n) if the input is already sorted. Also it is stable.
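A rough C++ sketch of that run-based ("natural") merge sort may make the description concrete. This is only an illustration of the idea described above, not Haskell's actual implementation:

#include <algorithm>
#include <iostream>
#include <vector>

// Split the input into maximal runs that are already in order,
// reversing descending runs so that every run ends up ascending.
static std::vector<std::vector<int>> findRuns(const std::vector<int>& xs) {
    std::vector<std::vector<int>> runs;
    std::size_t i = 0;
    while (i < xs.size()) {
        std::size_t j = i + 1;
        bool descending = (j < xs.size() && xs[j] < xs[i]);
        if (descending)
            while (j < xs.size() && xs[j] < xs[j - 1]) ++j;
        else
            while (j < xs.size() && xs[j] >= xs[j - 1]) ++j;
        std::vector<int> run(xs.begin() + i, xs.begin() + j);
        if (descending) std::reverse(run.begin(), run.end());
        runs.push_back(std::move(run));
        i = j;
    }
    return runs;
}

// Standard two-way merge of two ascending runs.
static std::vector<int> mergeRuns(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), out.begin());
    return out;
}

std::vector<int> naturalMergeSort(const std::vector<int>& xs) {
    std::vector<std::vector<int>> runs = findRuns(xs);
    if (runs.empty()) return {};
    // Merge the runs pairwise until only one remains.
    while (runs.size() > 1) {
        std::vector<std::vector<int>> next;
        for (std::size_t k = 0; k + 1 < runs.size(); k += 2)
            next.push_back(mergeRuns(runs[k], runs[k + 1]));
        if (runs.size() % 2 == 1) next.push_back(runs.back());
        runs = std::move(next);
    }
    return runs.front();
}

int main() {
    std::vector<int> xs{5, 6, 7, 3, 2, 1, 4, 8, 9, 0};
    for (int v : naturalMergeSort(xs)) std::cout << v << ' ';
    std::cout << '\n';   // 0 1 2 3 4 5 6 7 8 9
}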
I think I qualify as a seasoned developer. If I just want to sort whatever, I will almost always call the library function. A lot of effort has been put into optimizing it.
Some of the situations in which I will write my own sort include:
When I need to do it incrementally. Insertion-sort each item as it comes in to maintain a sorted sequence, maybe, or use a heap as a priority queue.
When a counting sort or bucket sort will do (see the counting-sort sketch after this list). In this case it's easy to implement and has lower complexity.
When the keys are integers and speed is very important, I sometimes implement a radix sort.
When the stuff I need to sort doesn't fit in memory (external sorting)
When I need to build a suffix array or otherwise take advantage of special relationships between the keys.
When comparisons are extremely expensive, I will sometimes implement a merge sort to put a good upper bound on how many I have to do.
In a real-time context that is memory constrained, I will sometimes write a heap sort to get in-place sorting with a good upper bound on worst-case execution time.
If I can produce the required ordering as a side-effect of something else that is going on (and it makes design sense), then I might take advantage of that instead of doing a separate sort.
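For the counting-sort case mentioned in the list above, a minimal sketch (assuming non-negative integer keys with a known small upper bound k) looks like this:

#include <iostream>
#include <vector>

// Counting sort: O(n + k) for n values in the range [0, k).
// Only sensible when k is small relative to n.
std::vector<int> countingSort(const std::vector<int>& values, int k) {
    std::vector<int> counts(k, 0);
    for (int v : values) ++counts[v];                       // tally each key

    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (int key = 0; key < k; ++key)
        sorted.insert(sorted.end(), counts[key], key);      // emit each key counts[key] times
    return sorted;
}

int main() {
    std::vector<int> data{3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
    for (int v : countingSort(data, 10)) std::cout << v << ' ';
    std::cout << '\n';   // 1 1 2 3 3 4 5 5 6 9
}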
In the overwhelming majority of cases, only the default sorts of a language are used.
When that is not the case, it is mostly because the data has some special properties that can be used to reduce the sort time, and even then it is mostly just the ordering lambda that is changed.
Some cases, where you know there are only a few distinct values, have simple O(N) sorting algorithms that could be used.
In decreasing order of generality and increasing order of simplicity:
radix sort (see the sketch after this list)
bucket sorts
counting sort
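As an illustration of the radix-sort idea mentioned above, here is a least-significant-digit sketch over non-negative 32-bit integers, one byte per pass. It is a sketch of the technique, not a tuned implementation:

#include <array>
#include <iostream>
#include <vector>

// LSD radix sort on unsigned 32-bit integers, using 256 buckets per pass.
void radixSort(std::vector<unsigned>& values) {
    std::vector<unsigned> buffer(values.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> counts{};                   // counts[b + 1] = how many values have byte b
        for (unsigned v : values) ++counts[((v >> shift) & 0xFF) + 1];
        for (std::size_t i = 1; i < counts.size(); ++i)
            counts[i] += counts[i - 1];                          // prefix sums = bucket start offsets
        for (unsigned v : values)
            buffer[counts[(v >> shift) & 0xFF]++] = v;           // stable scatter into buckets
        values.swap(buffer);
    }
}

int main() {
    std::vector<unsigned> data{170, 45, 75, 90, 802, 24, 2, 66};
    radixSort(data);
    for (unsigned v : data) std::cout << v << ' ';
    std::cout << '\n';   // 2 24 45 66 75 90 170 802
}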
There are bubble, insertion, selection, and quick sorting algorithms.
Which one is the 'fastest' algorithm?
Code size is not important.
Bubble sort
Insertion sort
Quick sort
I tried to check the speed. When the data is already sorted, bubble sort's and insertion sort's big-O is n, but these algorithms are too slow on large lists.
Is it good to use only one algorithm?
Or is it faster to use a mix of different algorithms?
Quicksort is generally very good, only really falling down when the data is close to being ordered already, or when the data has a lot of similarity (lots of key repeats), in which case it is slower.
If you don't know anything about your data and you don't mind risking quicksort's slow case (if you think about it, you can probably determine for your case whether you're ever likely to hit it, e.g. from already-ordered data), then quicksort is never going to be a BAD choice.
If you decide your data is, or will sometimes (or often enough to be a problem) be, already sorted (or significantly partially sorted), or you otherwise decide you can't risk quicksort's worst case, then consider timsort.
As noted by the comments on your question though, if it's really important to have the ultimate performance, you should consider implementing several algorithms and trying them on good representative sample data.
HP / Microsoft std::sort is introsort (quick sort switching to heap sort if nesting reaches some limit), and std::stable_sort is a variation of bottom up mergesort.
For sorting an array or vector of mostly random integers, counting / radix sort would normally be fastest.
Most external sorts are some variation of a k-way bottom up merge sort (the initial internal sort phase could use any of the algorithms mentioned above).
For sorting a small (16 or less) fixed number of elements, a sorting network could be used. This seems to be one of the lesser known algorithms. It would mostly be useful if having to repeatedly sort small sets of elements, perhaps implemented in hardware.
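For a concrete picture of a sorting network, here is the standard five-comparator network for exactly four elements. The fixed compare-exchange sequence has no data-dependent control flow, which is why it maps well to hardware; this is just an illustrative sketch:

#include <iostream>
#include <utility>

// Compare-exchange: order a pair so that a <= b.
static inline void cmpSwap(int& a, int& b) {
    if (b < a) std::swap(a, b);
}

// Optimal 5-comparator sorting network for 4 inputs.
void sort4(int v[4]) {
    cmpSwap(v[0], v[1]);
    cmpSwap(v[2], v[3]);
    cmpSwap(v[0], v[2]);
    cmpSwap(v[1], v[3]);
    cmpSwap(v[1], v[2]);
}

int main() {
    int v[4] = {42, 7, 19, 3};
    sort4(v);
    for (int x : v) std::cout << x << ' ';   // 3 7 19 42
    std::cout << '\n';
}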
Why do we always use quick sort? Or any specific sorting algorithm?
I tried some experiments on my PC using quick, merge, heap, and flash sort.
Results (sorting algorithm : time in nanoseconds -> time in minutes):
quick sort time : 135057597441 -> 2.25095995735
Flash sort time : 137704213630 -> 2.29507022716667
merge sort time : 138317794813 -> 2.30529658021667
heap sort time : 148662032992 -> 2.47770054986667
using Java's built-in timing function:
long startTime = System.nanoTime(); // record the start; elapsed = System.nanoTime() - startTime after the sort
The given times are in nanoseconds; there is hardly any difference between them if we convert them into seconds, for 20000000 random integers (and the max array size in Java is 2147483647). If we are using an in-place algorithm then there may be a difference of 1 to 2 minutes up to the max array size.
If the difference is so small, why should we care?
All of the algorithms presented have similar average-case bounds of O(n lg n), which is the "best" a comparison sort can do.
Since they share the same average bounds, the expected performance of these algorithms over random data should be similar - which is what the findings show. However, the devil is in the details. Here is a very quick summary; follow the links for further details.
Quicksort is generally not stable (but there are stable variations). While quicksort has an average bounds of O(n lg n), Quicksort has a worst case bounds of O(n * n) but there are ways to mitigate this. Quicksort, like heapsort, is done in-place.
Merge-sort is a stable sort. Mergesort has a worst case bounds of O(n lg n) which means it has predictable performance. Base merge-sort requires O(n) extra space so it's generally not an in-place sort (although there is an in-place variant, and the memory for a linked list implementation is constant).
Heapsort is not stable; it also has the worst case bounds of O(n lg n), but has the benefit of a constant size bounds and being in-place. It has worse cache and parallelism aspects than merge-sort.
Exactly which one is "best" depends upon the use-case, data, and exact implementation/variant.
Merge-sort (or a hybrid such as Timsort) is the "default" sort implementation in many libraries/languages. A common Quicksort-based hybrid, Introsort, is used in several C++ implementations. Vanilla/plain Quicksort implementations, should they be provided, are usually secondary implementations.
Merge-sort: a stable sort with consistent performance and acceptable memory bounds.
Quicksort/heapsort: trivially work in-place and [effectively] don't require additional memory.
We rarely need to sort integer data. One of the biggest overheads in a sort is the time it takes to make comparisons. Quicksort reduces the number of comparisons required compared with, say, a bubble sort. If you're sorting strings this is much more significant. As a real world example, some years ago I wrote a sort/merge that took 40 minutes with a bubble sort, and 17 with a quick sort. (It was a Z80 CPU a long time ago. I'd expect much better performance now.)
Your conclusion is correct: most people that do care about this in most situations waste their time. Differences between these algorithms in terms of time and memory complexity become significant in particular scenarios where:
you have huge number of elements to sort
performance is really critical (for example: real-time systems)
resources are really limited (for example: embedded systems)
(please note the really)
Also, there is the concern of stability, which may matter more often. Most standard libraries provide stable sort algorithms (for example: OrderBy in C#, std::stable_sort in C++, sort in Python, sort methods in Java).
Correctness. While switching between sort algorithms might offer speed-ups under some specific scenarios, the cost of proving that algorithms work can be quite high.
For instance, TimSort, a popular sorting algorithm used by Android, Java, and Python, had an implementation bug that went unnoticed for years. This bug could cause a crash and was easily induced by the user.
It took a dedicated team "looking for a challenge" to isolate and solve the issue.
For this reason, any time a standard implementation of a data structure or algorithm is available, I will use that standard implementation. The time saved by using a smarter implementation is rarely worth uncertainty about the implementation's security and correctness.
There is a variety of sorting algorithms available. A sorting algorithm with O(n^2) time complexity may be preferred over an O(n log n) one because it is in-place or stable. For example:
For somewhat sorted things insertion sort is good.
Applying quick sort on nearly sorted array is foolishness.
Heap sort is good with O(nlogn) but not stable.
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
I want to know which sorting algorithm is suitable in what conditions.
Which sorting algo is best for sorting names in alphabetical order?
Which sorting algo is best for sorting a small number of integers?
Which sorting algo is best for sorting a small number of integers that may be large in range (98767 – 6734784)?
Which sorting algo is best for sorting billions of integers?
Which sorting algo is best for sorting in embedded systems or real time systems where space and time both are constraints?
Please suggest other such situations, and books or websites for these types of comparisons.
Well, there is no silver bullet - but here are some rules of thumb:
Radix sort / counting sort is usually good when the range of elements (call it U) is relatively small compared to the number of elements (U << n) (might fit your cases 2 and 4).
Insertion sort is good for small (say n < 30) lists, empirically even faster than O(n log n) algorithms. In fact, you can optimize an O(n log n) top-down algorithm by switching to insertion sort when n < 30 (see the sketch after this list).
A variation of radix sort might also be a good choice for sorting strings alphabetically, since it is O(|S|*n), while a normal comparison-based algorithm is O(|S|*n log n) [where |S| is the length of your strings] (fits your case 1).
Where the input is very large, way too large to fit in memory, the way to do it is with an external sort - a variation of merge sort that minimizes the number of disk reads/writes and makes sure these are done sequentially, because that improves performance drastically (might fit case 4).
For general-case sorting, quick sort and timsort (used by Java) give good performance.
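A sketch of that hybrid idea, a plain quicksort that hands small ranges off to insertion sort (the cutoff of 30 and the last-element pivot are illustrative choices, not tuned values):

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

constexpr int kCutoff = 30;   // below this size, insertion sort tends to win in practice

static void insertionSort(std::vector<int>& a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; ++i) {
        int key = a[i], j = i - 1;
        while (j >= lo && a[j] > key) { a[j + 1] = a[j]; --j; }
        a[j + 1] = key;
    }
}

static void quickSort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo + 1 <= kCutoff) {               // small range: hand off to insertion sort
        insertionSort(a, lo, hi);
        return;
    }
    int pivot = a[hi];                          // naive last-element pivot, for illustration only
    int i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    quickSort(a, lo, i - 1);
    quickSort(a, i + 1, hi);
}

int main() {
    std::vector<int> data(100);
    for (int i = 0; i < 100; ++i) data[i] = (i * 37) % 100;   // a scrambled permutation of 0..99
    quickSort(data, 0, static_cast<int>(data.size()) - 1);
    std::cout << (std::is_sorted(data.begin(), data.end()) ? "sorted" : "not sorted") << '\n';
}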
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
You may be interested in the stable_sort function from C++. It tries to allocate the extra space for a regular merge sort, but if that fails, it does an in-place stable merge sort with inferior time complexity (O(n (log n)^2) instead of O(n log n)). If you can read C++ you can look at the implementation in your favourite standard library; otherwise I expect you can find the details explained somewhere in language-agnostic terms.
There's a body of academic literature about in-place stable sorting (and in particular in-place merging).
So in C++ the rule of thumb is easy, "use std::stable_sort if you need a stable sort, otherwise use std::sort". Python makes it even easier again, the rule of thumb is "use sorted".
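For example, a minimal C++ sketch of that rule of thumb (the struct and names are made up for illustration); the stable sort keeps equal-keyed items in their original relative order:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Employee {
    std::string name;
    int department;
};

int main() {
    std::vector<Employee> staff{
        {"Ng", 2}, {"Smith", 1}, {"Ahmed", 2}, {"Kowalski", 1}};

    // Stable: employees within the same department keep their original order.
    std::stable_sort(staff.begin(), staff.end(),
                     [](const Employee& a, const Employee& b) {
                         return a.department < b.department;
                     });

    for (const auto& e : staff)
        std::cout << e.department << ' ' << e.name << '\n';
    // 1 Smith
    // 1 Kowalski
    // 2 Ng
    // 2 Ahmed
}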
In general, you will find that a lot of languages have fairly clever built-in sort algorithms, and you can use them most of the time. It's rare that you'll need to implement your own to beat the standard library. If you do need to implement your own, there isn't really any substitute for pulling out the textbooks, implementing a few algorithms with as many tricks as you can find, and testing them against each other for the specific case you're worried about for which you need to beat the library function.
Most of the "obvious" advice that you might be hoping for in response to this question is already incorporated into the built-in sort functions of one or more common programming languages. But to answer your specific questions:
Which sorting algo is best for sorting names in alphabetical order?
A radix sort might edge out standard comparison sorts like C++ sort, but that might not be possible if you're using "proper" collation rules for names. For example, "McAlister" used to be alphabetized the same as "MacAlister", and "St. John" as "Saint John". But then programmers came along and wanted to just sort by ASCII value rather than code a lot of special rules, so most computer systems don't use those rules any more. I find Friday afternoon is a good time for this kind of feature ;-) You can still use a radix sort if you do it on the letters of the "canonicalized" name rather than the actual name.
"Proper" collation rules in languages other than English are also entertaining. For example in German "Grüber" sorts like "Grueber", and therefore comes after "Gruber" but before "Gruhn". In English the name "Llewellyn" comes after "Lewis", but I believe in Welsh (using the exact same alphabet but different traditional collation rules) it comes before.
For that reason, it's easier to talk about optimizing string sorts than it is to actually do it. Sorting strings "properly" requires being able to plug in locale-specific collation rules, and if you move away from a comparison sort then you might have to re-write all your collation code.
Which sorting algo is best for sorting a small number of integers?
For a small number of small values maybe a counting sort, but Introsort with a switch to insertion sort when the data gets small enough (20-30 elements) is pretty good. Timsort is especially good when the data isn't random.
Which sorting algo is best for sorting a small number of integers that may be large in range (98767 – 6734784)?
The large range rules out counting sort, so for a small number of widely-ranged integers, Introsort/Timsort.
Which sorting algo is best for sorting billions of integers?
If by "billions" you mean "too many to fit in memory" then that changes the game a bit. Probably you want to divide the data into chunks that do fit in memory, Intro/Tim sort each one, then do an external merge. Of if you're on a 64 bit machine sorting 32 bit integers, you could consider counting sort.
Which sorting algo is best for sorting in embedded systems or real time systems where space and time both are constraints?
Probably Introsort.
For somewhat sorted things insertion sort is good.
True, and Timsort takes advantage of the same situation.
Applying quick sort on nearly sorted array is foolishness.
False. Nobody uses the plain QuickSort originally published by Hoare; you can make better choices of pivot that make the killer cases much less obvious than "sorted data". To deal with the bad cases thoroughly, there is Introsort.
Heap sort is good with O(nlogn) but not stable.
True, but Introsort is better (and also not stable).
Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.
Handle this by allowing for somewhat slower in-place merging like std::stable_sort does.
I am risking this question being closed before I get an answer, but I really do want to know the answer. So here goes.
I am currently trying to learn algorithms. I am beginning to understand them in the abstract, but I cannot relate them to the real world.
I understand time complexity and space complexity. I also understand some sorting algorithms based on their pseudocode, such as:
Bubble Sort
Insertion Sort
Selection Sort
Quicksort
Mergesort
Heapsort (somewhat)
I am also aware of best-case and worst-case scenarios (average case not so much).
Some relevant online references:
Nice place which shows all the above graphically.
This gave me a good understanding as well.
BUT my question is: can someone give me REAL WORLD EXAMPLES where these sorting algorithms are used?
As the number of elements increases, you will use more sophisticated sorting algorithms. The later sorting techniques have a higher initial overhead, so you need a lot of elements to sort to justify that cost. If you only have 10 elements, a bubble or insertion sort will be much faster than a merge sort or heapsort.
Space complexity is important to consider for smaller embedded devices like a TV remote or a cell phone. You don't have enough space to do something like a merge sort (with its O(n) auxiliary space) on those devices.
Databases use an external merge sort to sort sets of data that are too large to be loaded entirely into memory. The driving factor in this sort is the reduction in the number of disk I/Os.
Good bubble sort discussion; there are many other factors to consider that contribute to time and space complexity.
Sorting-Algorithms.com
One example is C++ STL sort. As the Wikipedia page says:
The GNU Standard C++ library, for example, uses a hybrid sorting algorithm: introsort is performed first, to a maximum depth given by 2×log2 n, where n is the number of elements, followed by an insertion sort on the result.[1] Whatever the implementation, the complexity should be O(n log n) comparisons on the average.[2]
Why might quick sort be better than merge sort?
See Quicksort on wikipedia:
Typically, quicksort is significantly faster in practice than other Θ(n log n) algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data, it is possible to make design choices which minimize the probability of requiring quadratic time.
Note that the very low memory requirement is a big plus as well.
Quick sort is typically faster than merge sort when the data is stored in memory. However, when the data set is huge and is stored on external devices such as a hard drive, merge sort is the clear winner in terms of speed. It minimizes the expensive reads of the external drive and also lends itself well to parallel computing.
For merge sort the worst case is O(n log n); for quick sort it is O(n^2). For the other cases (average, best) both are O(n log n). However, quick sort's extra space is constant, whereas merge sort's depends on the structure you're sorting.
See this comparison.
You can also see it visually.
While quicksort is often a better choice than merge sort, there are definitely times when merge sort is theoretically a better choice. The most obvious time is when it's extremely important that your algorithm run faster than O(n^2). Quicksort is usually faster than this, but given the theoretical worst possible input, it could run in O(n^2), which is worse than the worst possible merge sort.
Quicksort is also more complicated than mergesort, especially if you want to write a really solid implementation, and so if you're aiming for simplicity and maintainability, merge sort becomes a promising alternative with very little performance loss.
I personally wanted to test the difference between quick sort and merge sort myself and measured the running times for a sample of 1,000,000 elements.
Quick sort was able to do it in 156 milliseconds whereas
Merge sort did the same in 247 milliseconds
The quick sort data, however, was random, and quick sort performs well when the data is random, whereas that is not a factor for merge sort, i.e. merge sort performs the same irrespective of whether the data is sorted or not.
But merge sort requires a full extra array of space, while quick sort does not, as it is an in-place sort.
I have written comprehensive working programs for both, with illustrative pictures too.
Quicksort is in place. You just need to swap positions of data during the Partitioning function.
Mergesort requires a lot more data copying. You need another temporary storage (typically the same size as your original data array) for the Merge function.
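To make that contrast concrete, here is a minimal sketch of mergesort's merge step showing the temporary buffer it needs (illustrative code, not taken from any particular library):

#include <algorithm>
#include <iostream>
#include <vector>

// Merge step of mergesort: a[lo..mid] and a[mid+1..hi] are each already sorted.
// The temporary buffer is the extra storage mergesort needs, in contrast to
// quicksort's in-place partitioning, which only swaps elements.
void mergeHalves(std::vector<int>& a, std::size_t lo, std::size_t mid, std::size_t hi) {
    std::vector<int> tmp;
    tmp.reserve(hi - lo + 1);
    std::size_t i = lo, j = mid + 1;
    while (i <= mid && j <= hi)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i <= mid) tmp.push_back(a[i++]);
    while (j <= hi)  tmp.push_back(a[j++]);
    std::copy(tmp.begin(), tmp.end(), a.begin() + lo);   // copy back over the original range
}

int main() {
    std::vector<int> a{1, 4, 9, 2, 3, 8};   // two sorted halves: {1, 4, 9} and {2, 3, 8}
    mergeHalves(a, 0, 2, 5);
    for (int v : a) std::cout << v << ' ';  // 1 2 3 4 8 9
    std::cout << '\n';
}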
In addition to the others: merge sort is very efficient for immutable data structures like linked lists and is therefore a good choice for (purely) functional programming languages.
A poorly implemented quicksort can be a security risk: if an attacker can craft input that triggers the O(n^2) worst case, they can cause a denial of service.
It is not true that quicksort is better. Also, it depends on what you mean by better: memory consumption or speed.
In terms of memory consumption, quicksort's worst case is poor: when every partition splits off just one element (1 vs. n-1), the recursion depth, and hence the stack space, grows to O(n), whereas merge sort needs O(n) auxiliary space but only O(log n) recursion depth.
The same kind of worst-case comparison applies to speed.
Quicksort is named so for a reason.
Highlights:
Stability is mostly an implementation detail here (merge sort is naturally stable, plain quicksort is not), so let's move on to complexities.
It gets very confusing when big-O notation is spilled and "abused": both have an average-case complexity of O(n log n),
but merge sort is always O(n log n), whereas quicksort with bad partitions, i.e. heavily skewed partitions like 1 element vs. the rest (which can happen with a sorted or reverse-sorted list), can degrade to O(n^2).
And so we have randomized quicksort, where we pick the pivot randomly and avoid such skewed partitioning, thereby largely nullifying the whole n^2 scenario.
Anyway, even for moderately skewed partitioning like a 3:4 split, we get n·log_{7/4}(n);
ideally we want a 1:1 partition, which gives the base 2 of O(n·log_2(n)).
So it is O(n log n) almost always, and unlike merge sort, the constants hidden under the big-O notation are better for quicksort than for mergesort - and it doesn't use up extra space like merge sort does.
But getting quicksort to run perfectly requires tweaking - or, to rephrase, quicksort provides you opportunities to tweak.
The answer would tilt slightly towards quicksort with respect to the changes brought by DualPivotQuicksort for primitive values. It is used in Java 7 to sort in java.util.Arrays.
It is proved that for the Dual-Pivot Quicksort the average number of comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n), whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n) respectively. Full mathematical proof see in attached proof.txt and proof_add.txt files. Theoretical results are also confirmed by experimental counting of the operations.
You can find the Java 7 implementation here - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/Arrays.java
Further Awesome Reading on DualPivotQuickSort - http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628
Quicksort is in place; you need very little extra memory, which is extremely important.
A good choice of median makes it even more efficient, but even a bad choice of median guarantees Θ(n log n).