Why does Haskell use mergesort instead of quicksort? - performance

In Wikibooks' Haskell, there is the following claim:
Data.List offers a sort function for sorting lists. It does not use quicksort; rather, it uses an efficient implementation of an algorithm called mergesort.
What is the underlying reason in Haskell to use mergesort over quicksort? Quicksort usually has better practical performance, but maybe not in this case. I gather that the in-place benefits of quicksort are hard (impossible?) to do with Haskell lists.
There was a related question on softwareengineering.SE, but it wasn't really about why mergesort is used.
I implemented the two sorts myself for profiling. Mergesort was superior (around twice as fast for a list of 2^20 elements), but I'm not sure that my implementation of quicksort was optimal.
Edit: Here are my implementations of mergesort and quicksort:
mergesort :: Ord a => [a] -> [a]
mergesort [] = []
mergesort [x] = [x]
mergesort l = merge (mergesort left) (mergesort right)
  where size = div (length l) 2
        (left, right) = splitAt size l

merge :: Ord a => [a] -> [a] -> [a]
merge ls [] = ls
merge [] vs = vs
merge first@(l:ls) second@(v:vs)
  | l < v     = l : merge ls second
  | otherwise = v : merge first vs
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort [x] = [x]
quicksort l = quicksort less ++ pivot : quicksort greater
  where pivotIndex = div (length l) 2
        pivot = l !! pivotIndex
        [less, greater] = foldl addElem [[], []] $ enumerate l
        addElem [less, greater] (index, elem)
          | index == pivotIndex = [less, greater]
          | elem < pivot        = [elem:less, greater]
          | otherwise           = [less, elem:greater]

enumerate :: [a] -> [(Int, a)]
enumerate = zip [0..]
Edits 2 and 3: I was asked to provide timings for my implementations versus the sort in Data.List. Following @Will Ness's suggestions, I compiled this gist with the -O2 flag, changing the supplied sort in main each time, and executed it with +RTS -s. The list being sorted was a cheaply created, pseudorandom [Int] list with 2^20 elements. The results were as follows:
Data.List.sort: 0.171s
mergesort: 1.092s (~6x slower than Data.List.sort)
quicksort: 1.152s (~7x slower than Data.List.sort)

In imperative languages, Quicksort is performed in-place by mutating an array. As you demonstrate in your code sample, you can adapt Quicksort to a pure functional language like Haskell by building singly-linked lists instead, but this is not as fast.
On the other hand, Mergesort is not an in-place algorithm: a straightforward imperative implementation copies the merged data to a different allocation. This is a better fit for Haskell, which by its nature must copy the data anyway.
Let's step back a bit: Quicksort's performance edge is "lore" -- a reputation built up decades ago on machines much different from the ones we use today. Even if you use the same language, this kind of lore needs rechecking from time to time, as the facts on the ground can change. The last benchmarking paper I read on this topic had Quicksort still on top, but its lead over Mergesort was slim, even in C/C++.
Mergesort has other advantages: it doesn't need to be tweaked to avoid Quicksort's O(n^2) worst case, and it is naturally stable. So, if you lose the narrow performance difference due to other factors, Mergesort is an obvious choice.

I think @comingstorm's answer is pretty much on the nose, but here's some more info on the history of GHC's sort function.
In the source code for Data.OldList, you can find the implementation of sort and verify for yourself that it's a merge sort. Just below the definition in that file is the following comment:
Quicksort replaced by mergesort, 14/5/2002.
From: Ian Lynagh <igloo@earth.li>
I am curious as to why the List.sort implementation in GHC is a
quicksort algorithm rather than an algorithm that guarantees n log n
time in the worst case? I have attached a mergesort implementation along
with a few scripts to time it's performance...
So, originally a functional quicksort was used (and the function qsort is still there, but commented out). Ian's benchmarks showed that his mergesort was competitive with quicksort in the "random list" case and massively outperformed it in the case of already sorted data. Later, Ian's version was replaced by another implementation that was about twice as fast, according to additional comments in that file.
The main issue with the original qsort was that it didn't use a random pivot. Instead it pivoted on the first value in the list. This is obviously pretty bad, because it implies performance will be worst case (or close to it) for sorted (or nearly sorted) input. Unfortunately, there are a couple of challenges in switching from "pivot on first" to an alternative (either random, or -- as in your implementation -- somewhere in "the middle"). In a functional language without side effects, managing a source of pseudorandomness is a bit of a problem, but let's say you solve that (maybe by building a random number generator into your sort function). You still have the problem that, when sorting an immutable linked list, locating an arbitrary pivot and then partitioning based on it will involve multiple list traversals and sublist copies.
I think the only way to realize the supposed benefits of quicksort would be to write the list out to a vector, sort it in place (and sacrifice sort stability), and write it back out to a list. I don't see that that could ever be an overall win. On the other hand, if you already have data in a vector, then an in-place quicksort would definitely be a reasonable option.
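For what it's worth, that list-to-vector round trip is easy to sketch with the vector and vector-algorithms packages (this only illustrates the idea above, it is not anything Data.List does); Intro.sort is an in-place introsort, i.e. a quicksort variant with a heapsort fallback:
import qualified Data.Vector as V
import qualified Data.Vector.Algorithms.Intro as Intro  -- from vector-algorithms

-- Copy the list into a vector, sort the vector in place, copy back out.
-- Not stable, and the two copies plus boxing usually eat the gains.
sortViaVector :: Ord a => [a] -> [a]
sortViaVector = V.toList . V.modify Intro.sort . V.fromList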

On a singly-linked list, mergesort can be done in place. What's more, naive implementations scan over half the list in order to get the start of the second sublist, but the start of the second sublist falls out as a side effect of sorting the first sublist and does not need extra scanning. The one thing quicksort has going for it over mergesort is cache coherency: quicksort works with elements close to each other in memory. As soon as a level of indirection enters into it, as when you are sorting arrays of pointers instead of the data itself, that advantage becomes smaller.
Mergesort has hard guarantees for worst-case behavior, and it's easy to do stable sorting with it.
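To illustrate the "no extra scanning" point in list terms (a sketch of the splitting trick only, not GHC's implementation, and of course nothing is literally in place in Haskell): sort the first n elements and hand back the unsorted remainder, so the recursion never walks half the list just to find the split point.
sortTake :: Ord a => Int -> [a] -> ([a], [a])   -- (sorted first n, rest)
sortTake 0 xs     = ([], xs)
sortTake 1 (x:xs) = ([x], xs)
sortTake n xs     = (merge l r, rest')
  where
    half       = n `div` 2
    (l, rest)  = sortTake half xs          -- sorting the first half...
    (r, rest') = sortTake (n - half) rest  -- ...hands us the start of the second

merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xs@(x:xt) ys@(y:yt)
  | x <= y    = x : merge xt ys
  | otherwise = y : merge xs yt

mergesortNoScan :: Ord a => [a] -> [a]
mergesortNoScan xs = fst (sortTake (length xs) xs)  -- length is computed only once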

Short answer:
Quicksort is advantageous for arrays (in-place, fast, but not worst-case optimal). Mergesort for linked lists (fast, worst-case optimal, stable, simple).
Quicksort is slow for lists, Mergesort is not in-place for arrays.

Many arguments on why Quicksort is not used in Haskell seem plausible. However, at least Quicksort is not slower than Mergesort for the random case. Based on the implementation given in Richard Bird's book, Thinking Functionally in Haskell, I made a 3-way Quicksort:
tqsort [] = []
tqsort (x:xs) = sortp xs [] [x] []
  where
    sortp [] us ws vs = tqsort us ++ ws ++ tqsort vs
    sortp (y:ys) us ws vs =
      case compare y x of
        LT -> sortp ys (y:us) ws vs
        GT -> sortp ys us ws (y:vs)
        _  -> sortp ys us (y:ws) vs
I benchmarked a few cases, e.g., lists of size 10^4 containing Int values between 0 and 10^3 or 10^4, and so on. The result is that the 3-way Quicksort, and even Bird's version, are better than GHC's Mergesort, something like 1.x to 3.x times faster depending on the type of data (many repetitions? very sparse?). The following stats were generated by criterion:
benchmarking Data.List.sort/Diverse/10^5
time 223.0 ms (217.0 ms .. 228.8 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 226.4 ms (224.5 ms .. 228.3 ms)
std dev 2.591 ms (1.824 ms .. 3.354 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking 3-way Quicksort/Diverse/10^5
time 91.45 ms (86.13 ms .. 98.14 ms)
0.996 R² (0.993 R² .. 0.999 R²)
mean 96.65 ms (94.48 ms .. 98.91 ms)
std dev 3.665 ms (2.775 ms .. 4.554 ms)
However, there is another requirement of sort stated in Haskell 98/2010: it needs to be stable. The typical Quicksort implementation using Data.List.partition is stable, but the above one isn't.
Later addition: A stable 3-way Quicksort mentioned in the comment seems as fast as tqsort here.
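For completeness, here is one way such a stable 3-way quicksort can look (a sketch, not necessarily the version from that comment); it is stable because Data.List.partition preserves the relative order of the elements it keeps:
import Data.List (partition)

qsortStable :: Ord a => [a] -> [a]
qsortStable []     = []
qsortStable (x:xs) = qsortStable smaller ++ (x : equal) ++ qsortStable larger
  where
    (smaller, rest) = partition (< x) xs     -- strictly below the pivot, order kept
    (equal, larger) = partition (== x) rest  -- pivot duplicates keep their order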

I am not sure, but looking at the code I don't think Data.List.sort is mergesort as we know it. It makes a single pass, starting with the sequences function, in a beautiful triangular, mutually recursive fashion with the ascending and descending functions, producing a list of chunks that are already in ascending or descending order, arranged as required. Only then does it start merging.
It's a manifestation of poetry in coding. Unlike Quicksort, its worst case (totally random input) has O(n log n) time complexity, and its best case (input already sorted ascending or descending) is O(n).
I don't think any other sorting algorithm can beat it.
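A much simplified sketch of that structure (GHC's real sequences also recognises strictly descending runs and reverses them, and is far more carefully tuned than this): cut the input into maximal non-decreasing runs, then merge neighbouring runs pairwise until a single run remains.
sortByRuns :: Ord a => [a] -> [a]
sortByRuns = mergeAll . runs
  where
    -- split the input into maximal non-decreasing runs
    runs []     = []
    runs (x:xs) = (x : run) : runs rest
      where (run, rest) = ascendFrom x xs

    -- collect the rest of a run started by p
    ascendFrom p (y:ys)
      | p <= y  = let (run, rest) = ascendFrom y ys in (y : run, rest)
    ascendFrom _ ys = ([], ys)

    -- merge neighbouring runs until one remains
    mergeAll []  = []
    mergeAll [r] = r
    mergeAll rs  = mergeAll (mergePairs rs)

    mergePairs (a:b:rest) = merge a b : mergePairs rest
    mergePairs rs         = rs

    merge xs [] = xs
    merge [] ys = ys
    merge xs@(x:xt) ys@(y:yt)
      | x <= y    = x : merge xt ys
      | otherwise = y : merge xs yt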

Related

Compare complexity of two algorithms: Identify Basic Operation applicable for both algorithms?

For an assignment I have to theoretically analyze the complexity of two algorithms (sorting) to compare them. Then I will implement them and try to confirm the efficiency empirically.
Given that, I analyzed both algorithms and I know the efficiency classes but I have a problem identifying the basic operation. There was a hint that we should be careful in choosing a basic operation because it should be applicable for both algorithms. My problem is that I don't really know why I should take the same basic operation for both algorithms.
Pseudocode
Algo1:
    // sorts given array A[0..n-1]
    for i = 0 to n-2
        min <- i
        for j <- i+1 to n-1
            if A[j] < A[min] then min <- j
        swap A[i] and A[min]
Efficiency: Theta(n^2)
Algo2:
    // sorts given array with limited range (l, u)
    for j = 0 to u-l
        D[j] = 0
    for i = 0 to n-1
        D[A[i]-l] = D[A[i]-l] + 1
    for j = 1 to u-l
        D[j] = D[j-1] + D[j]
    for i = n-1 downto 0
        j = A[i] - l
        S[D[j]-1] = A[i]
        D[j] = D[j] - 1
    return S
Efficiency: Levitin -> Theta(n), Johnsonbaugh -> Theta(n+m), where m is the number of distinct integers in the array
So my understanding is that I choose the operation occurring the most as the basic operation, and I don't see why there is a difference when I choose different basic operations for each algorithm. In the end it doesn't matter, because it will lead to the same efficiency class anyway, but maybe it's important for the empirical analysis (comparing the number of basic operations needed for different input sizes)?
What I plan to do now is to choose assignment as the basic operation, which is performed 5 times in Algo1 and 6 times in Algo2 (dependent on the loops, of course). Is there a downside to this approach?
Typical choices for "basic operation" would be to look at number of comparisons, or swaps.
Consider a system with a memory hierarchy, where "hot" items are in cache and "cold" items result in an L2-miss followed by RAM reference, or result in a disk I/O. Then the cache hit cost might be essentially zero, and the basic operation boils down to cost of cache misses, leading to a new expression for time complexity.
Mostly-ordered lists get sorted more often than you might think. A stable sort may be more cache-friendly than an unstable sort. If it is easy to reason about how a sort's comparison order interacts with cache evictions, that can lead to a good big-O description of its expected running time.
EDIT: "Reading an element of A[]" seems a fair operation to talk about. Fancier analyses would look at how many "cache miss on A[]" operations happen.

How fast is Data.Sequence.Seq compared to []?

Clearly Seq asymptotically performs the same as or better than [] for all possible operations. But since its structure is more complicated than that of a list, for small sizes its constant overhead will probably make it slower. I'd like to know how much, in particular:
How much slower is <| compared to :?
How much slower is folding over/traversing Seq compared to folding over/traversing [] (excluding the cost of a folding/traversing function)?
What is the size (approximately) for which \xs x -> xs ++ [x] becomes slower than |>?
What is the size (approximately) for which ++ becomes slower than ><?
What's the cost of calling viewl and pattern matching on the result compared to pattern matching on a list?
How much memory does an n-element Seq occupy compared to an n-element list? (Not counting the memory occupied by the elements, only the structure.)
I know that it's difficult to measure, since with Seq we talk about amortized complexity, but I'd like to know at least some rough numbers.
This should be a start - http://www.haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists
A sequence uses between 5/6 and 4/3 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary. Heavy use of split and append will result in sequences using approximately the same space as lists. In detail:
a list of length n consists of n cons nodes, each occupying 3 words.
a sequence of length n has approximately n/(k-1) nodes, where k is the average arity of the internal nodes (each 2 or 3). There is a pointer, a size and overhead for each node, plus a pointer for each element, i.e. n(3/(k-1) + 1) words.
List is a non-trivial constant-factor faster for operations at the head (cons and head), making it a more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.
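For the ++ [x] versus |> question above, rough numbers are easy to obtain with a small criterion harness along these lines (a sketch only; buildList and buildSeq are made-up names and the sizes are arbitrary):
import Criterion.Main
import Data.List (foldl')
import qualified Data.Sequence as Seq

buildList :: Int -> [Int]
buildList n = foldl' (\xs x -> xs ++ [x]) [] [1 .. n]   -- O(n^2) overall

buildSeq :: Int -> Seq.Seq Int
buildSeq n = foldl' (Seq.|>) Seq.empty [1 .. n]         -- O(n) overall

main :: IO ()
main = defaultMain
  [ bgroup (show n)
      [ bench "list ++ [x]" (nf buildList n)
      , bench "Seq |>"      (nf buildSeq n)
      ]
  | n <- [10, 100, 1000]
  ]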
I have one more concrete result to add to above answer. I am solving a Langevin equation. I used List and Data.Sequence. A lot of insertions at back of list/sequence are going on in this solution.
To sum up, I did not see any improvement in speed, in fact performance deteriorated with Sequences. Moreover with Data.Sequence, I need to increase the memory available for Haskell RTS.
Since I am definitely not an authority on optimizing, I post both cases below. I'd be glad to know if this can be improved. Both were compiled with the -O2 flag.
Solution with List, takes approx 13.01 sec
Solution with Data.Sequence, takes approx 15.13 sec

Why is an even-odd split 'faster' for MergeSort?

MergeSort is a divide-and-conquer algorithm that divides the input into several parts and solves the parts recursively.
...There are several approaches for the split function. One way is to split down the middle. That approach has some nice properties, however, we'll focus on a method that's a little bit faster: even-odd split. The idea is to put every even-position element in one list, and every odd-position in another.
This is straight from my lecture notes. Why exactly is it the case that the even-odd split is faster than down the middle of the array?
I'm speculating it has something to do with the list being passed into MergeSort already being sorted, but I'm not entirely sure.
Could anyone shed some light on this?
Edit: I tried running the following in Python...
from timeit import Timer  # Timer needs to be imported for this to run

global K
K = []
for i in range(1, 100000):
    K.append(i)

def testMergeSort():
    """
    testMergeSort shows the proper functionality for the
    Merge Sort Algorithm implemented above.
    """
    t = Timer("mergeSort([K])", "from __main__ import *")
    print(t.timeit(1000000))
    p = Timer("mergeSort2([K])", "from __main__ import *")
    print(p.timeit(1000000))
(MergeSort is the even-odd MergeSort, MergeSort2 divides down the center)
And the result was:
0.771506746608
0.843161219237
I can see that it could be possible that it is better, because splitting into alternating elements means you don't have to know how long the input is to start with - you just take elements and put them in alternating lists until you run out.
Also, you could potentially start splitting the resulting lists before you have finished iterating through the first list, if you are careful, allowing for better parallel processing.
I should add that I'm no expert on these matters, they are just things that came to mind...
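In Haskell terms, the "deal the list like a deck of cards" version looks roughly like this (just a sketch; the point is that no length computation or splitAt traversal happens before recursing):
msortEvenOdd :: Ord a => [a] -> [a]
msortEvenOdd []  = []
msortEvenOdd [x] = [x]
msortEvenOdd xs  = merge (msortEvenOdd evens) (msortEvenOdd odds)
  where
    (evens, odds) = deal xs

    -- deal [a,b,c,d,e] == ([a,c,e], [b,d])
    deal (a:b:rest) = let (es, os) = deal rest in (a:es, b:os)
    deal rest       = (rest, [])

    merge as [] = as
    merge [] bs = bs
    merge as@(a:at) bs@(b:bt)
      | a <= b    = a : merge at bs
      | otherwise = b : merge as bt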
The closer the input list is to already being sorted, the lower the runtime (this is because the merge procedure doesn't have to move any of the values if everything is already in the correct order; it just performs O(n) comparisons). Since MergeSort recursively calls itself on each half of the split, one wants to choose a split function that increases the likelihood that the resulting halves of the list are in sorted order. If the list is mostly sorted, an even-odd split will do a better job of this than splitting down the middle. For example,
MergeSort([2, 1, 4, 3, 5, 7])
would result in
Merge(MergeSort([2, 1, 4]), MergeSort([3, 5, 7]))
if we split down the middle (note that both sub-lists have sorting errors), whereas if we did even-odd split we would get
Merge(MergeSort([2, 4, 5]), MergeSort([1, 3, 7]))
which results in two already-sorted lists (and best-case performance for the subsequent calls to MergeSort). Without knowing anything about the input lists, though, the choice of splitting function shouldn't affect runtime asymptotically.
I suspect there is noise in your experiment. :) Some of it may come from compare-and-swap not actually moving any elements in the list which avoids cache invalidation, etc.
Regardless, there is a chat about this here: https://cstheory.stackexchange.com/questions/6732/why-is-an-even-odd-split-faster-for-mergesort/6764#6764 (and yes, I did post a similar answer there (full disclosure))
The related Wikipedia articles point out that mergesort is O( n log(n) ) while Odd-Even Merge Sort is O( n log(n)^2 ). Odd-Even is certainly "slower", but the sorting network is static so you always know what operations you are going to perform and (looking at the graphic in the Wikipedia entry) notice how the algorithm stays parallel until the end.
Whereas merge sort finally merges 2 lists together, the last comparisons of the 8-element sorting network for Odd-Even merge sort are still independent.

Is this searching algorithm optimal?

I have two lists, L and M, each containing thousands of 64-bit unsigned integers. I need to find out whether the sum of any two members of L is itself a member of M.
Is it possible to improve upon the performance of the following algorithm?
Sort(M)
for i = 0 to Length(L)
    for j = i + 1 to Length(L)
        BinarySearch(M, L[i] + L[j])
(I'm assuming your goal is to find all pairs in L that sum to something in M)
Forget hashtables!
Sort both lists.
Then do the outer loop of your algorithm: walk over every element i in L, then every larger element j in L. As you go, form the sum and check to see if it's in M.
But don't look using a binary search: simply do a linear scan from the last place you looked. Let's say you're working on some value i, and you have some value j, followed by some value j'. When searching for (i+j), you would have got to the point in M where that value is found, or to the first larger value. You're now looking for (i+j'); since j' > j, you know that (i+j') > (i+j), and so it cannot be any earlier in M than the last place you got to. If L and M are both smoothly distributed, there is an excellent chance that the point in M where you would find (i+j') is only a little way off.
If the arrays are not smoothly distributed, then better than a linear scan might be some sort of jumping scan - look forward N elements at a time, halving N if the jump goes too far.
I believe this algorithm is O(n^2), which is as fast as any proposed hash algorithm (which have an O(1) primitive operation, but still have to do O(n^2) of them). It also means that you don't have to worry about the O(n log n) to sort. It has much better data locality than the hash algorithms - it basically consists of paired streamed reads over the arrays, repeated n times.
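A rough Haskell transcription of that scanning idea (a sketch with made-up names, not the code from the link below): for each fixed l1, the remaining elements of L and the sorted M are walked together, and the position in M only ever moves forward because the sums l1 + l2 never decrease.
import Data.List (sort, tails)

pairsSummingIntoM :: [Integer] -> [Integer] -> [(Integer, Integer)]
pairsSummingIntoM lRaw mRaw =
  [ pair | (l1:rest) <- tails l, pair <- scan l1 rest m ]
  where
    l = sort lRaw
    m = sort mRaw

    scan _  []         _  = []
    scan _  _          [] = []
    scan l1 ls@(l2:lt) ms@(m1:mt)
      | l1 + l2 > m1  = scan l1 ls mt             -- sum too big: advance in M
      | l1 + l2 == m1 = (l1, l2) : scan l1 lt ms  -- hit: record it, keep going
      | otherwise     = scan l1 lt ms             -- sum too small: advance in L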
EDIT: I have written implementations of Paul Baker's original algorithm, Nick Larsen's hashtable algorithm, and my algorithm, and a simple benchmarking framework. The implementations are simple (linear probing in the hashtable, no skipping in my linear search), and I had to make guesses at various sizing parameters. See http://urchin.earth.li/~twic/Code/SumTest/ for the code. I welcome corrections or suggestions about any of the implementations, the framework, and the parameters.
For L and M containing 3438 items each, with values ranging from 1 to 34380, and with Larsen's hashtable having a load factor of 0.75, the median times for a run are:
Baker (binary search): 423 716 646 ns
Larsen (hashtable): 733 479 121 ns
Anderson (linear search): 62 077 597 ns
The difference is much bigger than I had expected (and, I admit, not in the direction I had expected). I suspect I have made one or more major mistakes in the implementation. If anyone spots one, I really would like to hear about it!
One thing is that I have allocated Larsen's hashtable inside the timed method. It is thus paying the cost of allocation and (some) garbage collection. I think this is fair, because it's a temporary structure only needed by the algorithm. If you think it's something that could be reused, it would be simple enough to move it into an instance field and allocate it only once (and Arrays.fill it with zero inside the timed method), and see how that affects performance.
The complexity of the example code in the question is O(m log m + l^2 log m), where l = |L| and m = |M|, as it runs a binary search (O(log m)) for every pair of elements in L (O(l^2)), and M is sorted first.
Replacing the binary search with a hash table reduces the complexity to O(l^2), assuming that hash table insert and lookup are O(1) operations.
This is asymptotically optimal as long as you assume that you need to process every pair of numbers in the list L, as there are O(l^2) such pairs. If there are a couple of thousand numbers in L, and they are random 64-bit integers, then you definitely need to process all the pairs.
Instead of sorting M at a cost of n * log(n), you could create a hash set at the cost of n.
You could also store all sums in another hash set while iterating and add a check to make sure you don't perform the same search twice.
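A sketch of that hash-set variant (assuming the unordered-containers package; the deduplication of repeated sums mentioned above is left out):
import Data.Word (Word64)
import Data.List (tails)
import qualified Data.HashSet as HS

pairsViaHashSet :: [Word64] -> [Word64] -> [(Word64, Word64)]
pairsViaHashSet ls ms =
  [ (x, y) | (x:rest) <- tails ls, y <- rest, HS.member (x + y) mSet ]
  where
    mSet = HS.fromList ms  -- built once, expected O(1) lookups afterwards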
You can avoid the binary search by using a hashtable instead of the sorted M array.
Alternatively, add all of the members of L to a hashset lSet, then iterate over M, performing these steps for each m in M:
add m to hashset mSet - if m is already in mSet, skip this iteration; if m is in hashset dSet, also skip this iteration.
subtract each member l of L less than m from m to give d, and test whether d is also in lSet;
if so, add (l, d) to some collection rSet; add d to hashset dSet.
This will require fewer iterations, at the cost of more memory. You will want to pre-allocate the memory for the structures, if this is to give you a speed increase.

Are there any worse sorting algorithms than Bogosort (a.k.a Monkey Sort)? [closed]

My co-workers took me back in time to my University days with a discussion of sorting algorithms this morning. We reminisced about our favorites like StupidSort, and one of us was sure we had seen a sort algorithm that was O(n!). That got me started looking around for the "worst" sorting algorithms I could find.
We postulated that a completely random sort would be pretty bad (i.e. randomize the elements - is it in order? no? randomize again), and I looked around and found out that it's apparently called BogoSort, or Monkey Sort, or sometimes just Random Sort.
Monkey Sort appears to have a worst case performance of O(∞), a best case performance of O(n), and an average performance of O(n·n!).
What is the currently officially accepted sorting algorithm with the worst average sorting performance (and which is therefore worse than O(n·n!))?
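For reference, a minimal Haskell sketch of the Bogosort just described (assuming the random package for randomRIO; the shuffle here is a naive quadratic one):
import System.Random (randomRIO)

shuffle :: [a] -> IO [a]
shuffle [] = return []
shuffle xs = do
  i <- randomRIO (0, length xs - 1)        -- pick a random position...
  let (before, x : after) = splitAt i xs   -- ...pull that element out...
  rest <- shuffle (before ++ after)        -- ...and shuffle the remainder
  return (x : rest)

isSorted :: Ord a => [a] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

bogosort :: Ord a => [a] -> IO [a]
bogosort xs
  | isSorted xs = return xs
  | otherwise   = shuffle xs >>= bogosort  -- is it in order? no? randomize again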
From David Morgan-Mar's Esoteric Algorithms page: Intelligent Design Sort
Introduction
Intelligent design sort is a sorting algorithm based on the theory of
intelligent design.
Algorithm Description
The probability of the original input list being in the exact order
it's in is 1/(n!). There is such a small likelihood of this that it's
clearly absurd to say that this happened by chance, so it must have
been consciously put in that order by an intelligent Sorter. Therefore
it's safe to assume that it's already optimally Sorted in some way
that transcends our naïve mortal understanding of "ascending order".
Any attempt to change that order to conform to our own preconceptions
would actually make it less sorted.
Analysis
This algorithm is constant in time, and sorts the list in-place,
requiring no additional memory at all. In fact, it doesn't even
require any of that suspicious technological computer stuff. Praise
the Sorter!
Feedback
Gary Rogers writes:
Making the sort constant in time
denies the power of The Sorter. The
Sorter exists outside of time, thus
the sort is timeless. To require time
to validate the sort diminishes the role
of the Sorter. Thus... this particular
sort is flawed, and can not be
attributed to 'The Sorter'.
Heresy!
Many years ago, I invented (but never actually implemented) MiracleSort.
Start with an array in memory.
loop:
Check to see whether it's sorted.
Yes? We're done.
No? Wait a while and check again.
end loop
Eventually, alpha particles flipping bits in the memory chips should result in a successful sort.
For greater reliability, copy the array to a shielded location, and check potentially sorted arrays against the original.
So how do you check the potentially sorted array against the original? You just sort each array and check whether they match. MiracleSort is the obvious algorithm to use for this step.
EDIT: Strictly speaking, this is not an algorithm, since it's not guaranteed to terminate. Does "not an algorithm" qualify as "a worse algorithm"?
Quantum Bogosort
A sorting algorithm that assumes that the many-worlds interpretation of quantum mechanics is correct:
Check that the list is sorted. If not, destroy the universe.
At the conclusion of the algorithm, the list will be sorted in the only universe left standing.
This algorithm takes worst-case Θ(N) and average-case Θ(1) time. In fact, the average number of comparisons performed is 2: there's a 50% chance that the universe will be destroyed on the second element, a 25% chance that it'll be destroyed on the third, and so on.
Jingle Sort, as described here.
You give each value in your list to a different child on Christmas. Children, being awful human beings, will compare the value of their gifts and sort themselves accordingly.
I'm surprised no one has mentioned sleepsort yet... Or haven't I noticed it? Anyway:
#!/bin/bash
function f() {
    sleep "$1"
    echo "$1"
}
while [ -n "$1" ]
do
    f "$1" &
    shift
done
wait
example usage:
./sleepsort.sh 5 3 6 3 6 3 1 4 7
./sleepsort.sh 8864569 7
In terms of performance it is terrible (especially the second example). Waiting almost 3.5 months to sort 2 numbers is kinda bad.
I had a lecturer who once suggested generating a random array, checking if it was sorted and then checking if the data was the same as the array to be sorted.
Best case O(N) (first time baby!)
Worst case O(Never)
There is a sort that's called bogobogosort. First, it checks the first 2 elements and bogosorts them. Next it checks the first 3, bogosorts them, and so on.
Should the list be out of order at any time, it restarts by bogosorting the first 2 again. Regular bogosort has an average complexity of O(N!); this algorithm has an average complexity of O(N! · 1! · 2! · 3! · ... · N!).
Edit: To give you an idea of how large this number is, for 20 elements this algorithm takes an average of 3.930093 × 10^158 years, well above the proposed heat death of the universe (if it happens) of 10^100 years,
whereas merge sort takes around .0000004 seconds,
bubble sort .0000016 seconds,
and bogosort takes 308 years, 139 days, 19 hours, 35 minutes, 22.306 seconds, assuming a year is 365.242 days and a computer does 250,000,000 32 bit integer operations per second.
Edit 2: This algorithm is not as slow as the "algorithm" miracle sort, which, like this sort, will probably get the computer sucked into a black hole before it successfully sorts 20 elements; but if it did, I would estimate an average complexity of 2^(32·N) (32 being the number of bits in a 32-bit integer, N the number of elements) × (a number <= 10^40) years,
since gravity speeds up the chips' alpha-particle bit-flipping, and there are 2^(32·N) states, which is 2^640 × 10^40, or about 5.783 × 10^216.762162762 years; though if the list started out sorted, its complexity would only be O(N), faster than merge sort, which is only N log N even in the worst case.
Edit 3: This algorithm is actually slower than miracle sort as the size gets very big, say 1000, since my algorithm would have a run time of 2.83 × 10^1175546 years, while the miracle sort algorithm would have a run time of 1.156 × 10^9657 years.
If you keep the algorithm meaningful in any way, O(n!) is the worst upper bound you can achieve.
Since checking each possible permutation of a set for sortedness will take n! steps, you can't get any worse than that.
If you're doing more steps than that then the algorithm has no real useful purpose. Not to mention the following simple sorting algorithm with O(infinity):
list = someList
while (list not sorted):
    doNothing
Bogobogosort. Yes, it's a thing. To Bogobogosort, you Bogosort the first element. Check to see if that one element is sorted. Being one element, it will be. Then you add the second element, and Bogosort those two until it's sorted. Then you add one more element, then Bogosort. Continue adding elements and Bogosorting until you have finally done every element. This was designed never to succeed with any sizable list before the heat death of the universe.
You should do some research into the exciting field of Pessimal Algorithms and Simplexity Analysis. These authors work on the problem of developing a sort with a pessimal best-case (your bogosort's best case is Omega(n), while slowsort (see paper) has a non-polynomial best-case time complexity).
Here are 2 sorts my roommate and I came up with in college
1) Check the order
2) Maybe a miracle happened, go to 1
and
1) check if it is in order, if not
2) put each element into a packet and bounce it off a distant server back to yourself. Some of those packets will return in a different order, so go to 1
There's always the Bogobogosort (Bogoception!). It performs Bogosort on increasingly large subsets of the list, and then starts all over again if the list is ever not sorted.
for (int n = 1; n < sizeof(list); ++n) {
    while (!isInOrder(list, 0, n)) {
        shuffle(list, 0, n);
    }
    if (!isInOrder(list, 0, n+1)) { n = 0; }
}
1 Put your items to be sorted on index cards
2 Throw them into the air on a windy day, a mile from your house.
2 Throw them into a bonfire and confirm they are completely destroyed.
3 Check your kitchen floor for the correct ordering.
4 Repeat if it's not the correct order.
Best case scenario is O(∞)
Edit above based on astute observation by KennyTM.
The "what would you like it to be?" sort
Note the system time.
Sort using Quicksort (or anything else reasonably sensible), omitting the very last swap.
Note the system time.
Calculate the required time. Extended precision arithmetic is a requirement.
Wait the required time.
Perform the last swap.
Not only can it implement any conceivable O(x) value short of infinity, the time taken is provably correct (if you can wait that long).
Nothing can be worse than infinity.
Segments of π
Assume π contains all possible finite number combinations.
See math.stackexchange question
Determine the number of digits needed from the size of the array.
Use segments of π places as indexes to determine how to re-order the array. If a segment exceeds the size boundaries for this array, adjust the π decimal offset and start over.
Check if the re-ordered array is sorted. If it is, woot; else adjust the offset and start over.
Bozo sort is a related algorithm that checks if the list is sorted and, if not, swaps two items at random. It has the same best and worst case performances, but I would intuitively expect the average case to be longer than Bogosort. It's hard to find (or produce) any data on performance of this algorithm.
A worst case performance of O(∞) might not even make it an algorithm according to some.
An algorithm is just a series of steps, and you can always do worse by tweaking it a little bit to get the desired output in more steps than it was previously taking. One could purposely put knowledge of the number of steps taken into the algorithm and make it terminate and produce the correct output only after X steps have been done. That X could very well be of the order of O(n^2) or O(n·n!) or whatever the algorithm desired to do. That would effectively increase its best-case as well as average-case bounds.
But your worst-case scenario cannot be topped :)
My favorite slow sorting algorithm is the stooge sort:
void stooges(long *begin, long *end) {
    if ((end - begin) <= 1) return;
    if (begin[0] < end[-1]) swap(begin, end - 1);
    if ((end - begin) > 1) {
        int one_third = (end - begin) / 3;
        stooges(begin, end - one_third);
        stooges(begin + one_third, end);
        stooges(begin, end - one_third);
    }
}
The worst case complexity is O(n^(log(3) / log(1.5))) = O(n^2.7095...).
Another slow sorting algorithm is actually named slowsort!
void slow(long *start, long *end) {
    if ((end - start) <= 1) return;
    long *middle = start + (end - start) / 2;
    slow(start, middle);
    slow(middle, end);
    if (middle[-1] > end[-1]) swap(middle - 1, end - 1);
    slow(start, end - 1);
}
This one takes O(n ^ (log n)) in the best case... even slower than stoogesort.
Recursive Bogosort (probably still O(n!)):
    if (list not sorted)
        list1 = first half of list
        list2 = second half of list
        Recursive bogosort(list1)
        Recursive bogosort(list2)
        list = list1 + list2
    while (list not sorted)
        shuffle(list)
Double bogosort
Bogosort twice and compare the results (just to be sure it is sorted); if not, do it again.
This page is an interesting read on the topic: http://home.tiac.net/~cri_d/cri/2001/badsort.html
My personal favorite is Tom Duff's sillysort:
/*
* The time complexity of this thing is O(n^(a log n))
* for some constant a. This is a multiply and surrender
* algorithm: one that continues multiplying subproblems
* as long as possible until their solution can no longer
* be postponed.
*/
void sillysort(int a[], int i, int j) {
    int t, m;
    for (; i != j; --j) {
        m = (i + j) / 2;
        sillysort(a, i, m);
        sillysort(a, m+1, j);
        if (a[m] > a[j]) { t = a[m]; a[m] = a[j]; a[j] = t; }
    }
}
You could make any sort algorithm slower by running your "is it sorted" step randomly. Something like:
Create an array of booleans the same size as the array you're sorting. Set them all to false.
Run an iteration of bogosort
Pick two random elements.
If the two elements are sorted in relation to each other (i < j && array[i] < array[j]), set the entries for both indexes in the boolean array to true. Otherwise, start over.
Check if all of the booleans in the array are true. If not, go back to 3.
Done.
Yes, SimpleSort; in theory it runs in O(-1), however this is equivalent to O(...9999), which is in turn equivalent to O(∞ - 1), which, as it happens, is also equivalent to O(∞). Here is my sample implementation:
/* element sizes are unneeded, they are assumed */
void
simplesort (const void* begin, const void* end)
{
    for (;;);
}
One I was just working on involves picking two random points, and if they are in the wrong order, reversing the entire subrange between them. I found the algorithm on http://richardhartersworld.com/cri_d/cri/2001/badsort.html, which says that the average case is probably somewhere around O(n^3) or O(n^2 log n) (he's not really sure).
I think it might be possible to do it more efficiently, because I think it might be possible to do the reversal operation in O(1) time.
Actually, I just realized that doing that would undercut what I just said, because the data structure I had in mind would put accessing the random elements at O(log n) and determining whether the subrange needs reversing at O(n).
Randomsubsetsort.
Given an array of n elements, choose each element with probability 1/n, randomize these elements, and check if the array is sorted. Repeat until sorted.
Expected time is left as an exercise for the reader.
