Quicksort: Choosing the pivot - algorithm

When implementing Quicksort, one of the things you have to do is to choose a pivot. But when I look at pseudocode like the one below, it is not clear how I should choose the pivot. First element of list? Something else?
function quicksort(array)
var list less, greater
if length(array) ≤ 1
return array
select and remove a pivot value pivot from array
for each x in array
if x ≤ pivot then append x to less
else append x to greater
return concatenate(quicksort(less), pivot, quicksort(greater))
Can someone help me grasp the concept of choosing a pivot, and whether different scenarios call for different strategies?

Choosing a random pivot minimizes the chance that you will encounter worst-case O(n²) performance (always choosing first or last would cause worst-case performance for nearly-sorted or nearly-reverse-sorted data). Choosing the middle element would also be acceptable in the majority of cases.
Also, if you are implementing this yourself, there are versions of the algorithm that work in-place (i.e. without creating two new lists and then concatenating them).
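For concreteness, here is a minimal Python sketch of one such in-place variant (Lomuto-style partitioning around the middle element); the function name and structure are mine, not part of the pseudocode in the question.
# Minimal in-place quicksort sketch (Lomuto partition, middle element as pivot).
# Illustrative only; not tuned for production use.
def quicksort_inplace(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]      # move the chosen pivot to the end
    pivot = a[hi]
    store = lo                          # boundary of the "less than pivot" region
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]   # put the pivot in its final position
    quicksort_inplace(a, lo, store - 1)
    quicksort_inplace(a, store + 1, hi)
For example, data = [5, 2, 9, 1]; quicksort_inplace(data) leaves data as [1, 2, 5, 9], with no extra lists created.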

It depends on your requirements. Choosing a pivot at random makes it harder to create a data set that generates O(N^2) performance. 'Median-of-three' (first, last, middle) is also a way of avoiding problems. Beware of relative performance of comparisons, though; if your comparisons are costly, then Mo3 does more comparisons than choosing (a single pivot value) at random. Database records can be costly to compare.
Update: Pulling comments into answer.
mdkess asserted:
'Median of 3' is NOT first last middle. Choose three random indexes, and take the middle value of this. The whole point is to make sure that your choice of pivots is not deterministic - if it is, worst case data can be quite easily generated.
To which I responded:
Analysis Of Hoare's Find Algorithm With Median-Of-Three Partition (1997)
by P Kirschenhofer, H Prodinger, C Martínez supports your contention (that 'median-of-three' is three random items).
There's an article described at portal.acm.org that is about 'The Worst Case Permutation for Median-of-Three Quicksort' by Hannu Erkiö, published in The Computer Journal, Vol 27, No 3, 1984. [Update 2012-02-26: Got the text for the article. Section 2 'The Algorithm' begins: 'By using the median of the first, middle and last elements of A[L:R], efficient partitions into parts of fairly equal sizes can be achieved in most practical situations.' Thus, it is discussing the first-middle-last Mo3 approach.]
Another short article that is interesting is by M. D. McIlroy, "A Killer Adversary for Quicksort", published in Software-Practice and Experience, Vol. 29(0), 1–4 (0 1999). It explains how to make almost any Quicksort behave quadratically.
AT&T Bell Labs Tech Journal, Oct 1984 "Theory and Practice in the Construction of a Working Sort Routine" states "Hoare suggested partitioning around the median of several randomly selected lines. Sedgewick [...] recommended choosing the median of the first [...] last [...] and middle". This indicates that both techniques for 'median-of-three' are known in the literature. (Update 2014-11-23: The article appears to be available at IEEE Xplore or from Wiley — if you have membership or are prepared to pay a fee.)
'Engineering a Sort Function' by J L Bentley and M D McIlroy, published in Software Practice and Experience, Vol 23(11), November 1993, goes into an extensive discussion of the issues, and they chose an adaptive partitioning algorithm based in part on the size of the data set. There is a lot of discussion of trade-offs for various approaches.
A Google search for 'median-of-three' works pretty well for further tracking.
Thanks for the information; I had only encountered the deterministic 'median-of-three' before.
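To make the two flavours of 'median-of-three' discussed above concrete, here is a small Python sketch of both pivot selectors (the function names are mine); the first uses the deterministic first/middle/last elements, the second uses three randomly chosen indexes.
import random

# Two 'median-of-three' pivot selectors, sketched for illustration only.
def mo3_deterministic(a, lo, hi):
    # median of the first, middle and last elements of a[lo..hi]
    mid = (lo + hi) // 2
    candidates = [(a[lo], lo), (a[mid], mid), (a[hi], hi)]
    candidates.sort()
    return candidates[1][1]            # index of the median value

def mo3_randomized(a, lo, hi):
    # median of three elements chosen at random from a[lo..hi]
    # (for ranges shorter than 3, just use whatever indexes exist)
    idx = random.sample(range(lo, hi + 1), 3) if hi - lo + 1 >= 3 else list(range(lo, hi + 1))
    idx.sort(key=lambda i: a[i])
    return idx[len(idx) // 2]          # index of the (approximate) median value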

Heh, I just taught this class.
There are several options.
Simple: Pick the first or last element of the range. (bad on partially sorted input)
Better: Pick the item in the middle of the range. (better on partially sorted input)
However, picking any arbitrary element runs the risk of poorly partitioning the array of size n into two arrays of size 1 and n-1. If you do that often enough, your quicksort runs the risk of becoming O(n^2).
One improvement I've seen is pick median(first, last, mid);
In the worst case, it can still go to O(n^2), but probabilistically, this is a rare case.
For most data, picking the first or last is sufficient. But if you find that you're running into worst-case scenarios often (partially sorted input), the first option would be to pick the central value (which is statistically a good pivot for partially sorted data).
If you're still running into problems, then go the median route.

Never ever choose a fixed pivot - this can be attacked to exploit your algorithm's worst-case O(n²) runtime, which is just asking for trouble. Quicksort's worst-case runtime occurs when partitioning results in one array of 1 element and one array of n-1 elements. Suppose you choose the first element as your pivot. If someone feeds your algorithm an array that is in decreasing order, your first pivot will be the biggest, so everything else in the array will move to the left of it. Then when you recurse, the first element will be the biggest again, so once more you put everything to the left of it, and so on.
A better technique is the median-of-3 method, where you pick three elements at random, and choose the middle. You know that the element that you choose won't be the first or the last, but also, by the central limit theorem, the distribution of the middle element will be normal, which means that you will tend towards the middle (and hence, nlog(n) time).
If you absolutely want to guarantee O(nlog(n)) runtime for the algorithm, the median-of-medians method (groups of five) for finding the median of an array runs in O(n) time, which means that the recurrence equation for quicksort in the worst case will be:
T(n) = O(n) (find the median) + O(n) (partition) + 2T(n/2) (recurse left and right)
By the Master Theorem, this is O(nlog(n)). However, the constant factor will be huge, and if worst case performance is your primary concern, use a merge sort instead, which is only a little bit slower than quicksort on average, and guarantees O(nlog(n)) time (and will be much faster than this lame median quicksort).
Explanation of the Median of Medians Algorithm
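As a rough illustration of that linear-time selection idea (median of medians over groups of five), here is a hedged Python sketch; this is the textbook scheme written for this answer, not code taken from the link above.
# Median-of-medians selection sketch (groups of five); returns the k-th smallest
# element (0-based) of lst in worst-case linear time. Illustrative, not optimized.
def select(lst, k):
    if len(lst) <= 5:
        return sorted(lst)[k]
    # median of each group of five, then the median of those medians as pivot
    medians = [sorted(lst[i:i + 5])[len(lst[i:i + 5]) // 2]
               for i in range(0, len(lst), 5)]
    pivot = select(medians, len(medians) // 2)
    less = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    greater = [x for x in lst if x > pivot]
    if k < len(less):
        return select(less, k)
    if k < len(less) + len(equal):
        return pivot
    return select(greater, k - len(less) - len(equal))
Used as a pivot selector, select(partition_slice, len(partition_slice) // 2) gives the guaranteed-balanced split described above, at the cost of the large constant factor the answer warns about.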

Don't try to get too clever and combine pivoting strategies. If you combine median-of-3 with a random pivot by picking the median of the first, last and a random index in the middle, then you'll still be vulnerable to many of the distributions which send median-of-3 quadratic (so it's actually worse than a plain random pivot).
E.g. for a pipe-organ distribution (1,2,3,...,N/2,...,3,2,1), the first and last will both be 1 and the random index will be some number greater than 1; taking the median gives 1 (either first or last) and you get an extremely unbalanced partitioning.

It is easier to break the quicksort into three sections, like this:
Exchange or swap data element function
The partition function
Processing the partitions
It is only slightly less efficient than one long function but is a lot easier to understand.
Code follows:
#include <stdlib.h>   /* for rand() */

/* This selects what the data type in the array to be sorted is */
#define DATATYPE long

/* This is the swap function .. your job is to swap data in x & y .. how depends on the
   data type .. the example works for normal numerical data types .. like the long I chose
   above */
void swap(DATATYPE *x, DATATYPE *y)
{
    DATATYPE Temp;
    Temp = *x;                            // Hold current x value
    *x = *y;                              // Transfer y to x
    *y = Temp;                            // Set y to the held old x value
}

/* This is the partition code */
int partition(DATATYPE list[], int l, int h)
{
    int i;
    int p;                                // pivot element index
    int firsthigh;                        // divider position for the pivot element

    // A random pivot is shown; for the middle element, p = (l + h) / 2 would be used instead
    p = l + rand() % (h - l + 1);         // Random partition point
    swap(&list[p], &list[h]);             // Move the pivot value to the end
    firsthigh = l;                        // Start of the "high" (>= pivot) region
    for (i = l; i < h; i++)
        if (list[i] < list[h]) {          // Value at i is less than the pivot (now at h)
            swap(&list[i], &list[firsthigh]); // So swap it into the "low" region
            firsthigh++;                  // Increment first high
        }
    swap(&list[h], &list[firsthigh]);     // Put the pivot between the two regions
    return firsthigh;                     // Return the pivot's final index
}

/* Finally the body sort */
void quicksort(DATATYPE list[], int l, int h)
{
    int p;                                // index of partition
    if ((h - l) > 0) {
        p = partition(list, l, h);        // Partition list
        quicksort(list, l, p - 1);        // Sort lower partition
        quicksort(list, p + 1, h);        // Sort upper partition
    }
}

It is entirely dependent on how your data is sorted to begin with. If you think it will be pseudo-random then your best bet is to either pick a random selection or choose the middle.

If you are sorting a random-accessible collection (like an array), it's generally best to pick the physical middle item. With this, if the array is already sorted (or nearly sorted), the two partitions will be close to even, and you'll get the best speed.
If you are sorting something with only linear access (like a linked-list), then it's best to choose the first item, because it's the fastest item to access. Here, however, if the list is already sorted, you're screwed -- one partition will always be null, and the other will have everything, producing the worst time.
However, for a linked-list, picking anything besides the first will just make matters worse. To pick the middle item in a linked-list, you'd have to step through it on each partition step -- adding an O(N/2) operation which is done log N times, making the total time O(1.5 N * log N). And that's if we know how long the list is before we start -- usually we don't, so we'd have to step all the way through to count them, then step half-way through to find the middle, then step through a third time to do the actual partition: O(2.5 N * log N).

Ideally the pivot should be the middle value in the entire array.
This will reduce the chances of getting worst case performance.

In a truly optimized implementation, the method for choosing pivot should depend on the array size - for a large array, it pays off to spend more time choosing a good pivot. Without doing a full analysis, I would guess "middle of O(log(n)) elements" is a good start, and this has the added bonus of not requiring any extra memory: Using tail-call on the larger partition and in-place partitioning, we use the same O(log(n)) extra memory at almost every stage of the algorithm.
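One possible reading of "middle of O(log(n)) elements", sketched below in Python (my interpretation, not the answerer's code): take a random sample of roughly log2(n) elements and use the sample median as the pivot.
import math
import random

# Pivot = median of a random sample of about log2(n) elements (sketch only).
def log_sample_pivot(a, lo, hi):
    n = hi - lo + 1
    k = max(1, min(n, int(math.log2(n)) | 1))   # odd sample size, at most n
    idx = random.sample(range(lo, hi + 1), k)
    idx.sort(key=lambda i: a[i])
    return idx[k // 2]                           # index of the sample median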

Quicksort's complexity varies greatly with the selection of the pivot value. For example, if you always choose the first element as the pivot, the algorithm's complexity becomes as bad as O(n^2). Here is a smart method to choose the pivot element:
1. Choose the first, middle and last elements of the array.
2. Compare these three numbers and find the one which is greater than one and smaller than the other, i.e. the median.
3. Make this element the pivot element.
Choosing the pivot by this method splits the array into two nearly equal halves, and hence the complexity typically reduces to O(nlog(n)).

On the average, Median of 3 is good for small n. Median of 5 is a bit better for larger n. The ninther, which is the "median of three medians of three" is even better for very large n.
The higher you go with sampling, the better you get as n increases, but the improvement slows down dramatically as you increase the number of samples. And you incur the overhead of gathering and sorting the samples.

I recommend using the middle index, as it can be calculated easily.
You can calculate it as (array.length / 2), rounded down.
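For instance, assuming lo and hi are the bounds of the current partition (names of my choosing), the middle index is usually computed as below; the lo + (hi - lo) // 2 form matters in fixed-width integer languages, where lo + hi can overflow.
# Middle-index pivot for the sub-array a[lo..hi] (sketch).
def middle_pivot_index(lo, hi):
    return lo + (hi - lo) // 2   # same as (lo + hi) // 2, but overflow-safe in C/Java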

If you choose the first or the last element in the array, then there is a high chance (for example, on already-sorted input) that the pivot is the smallest or the largest element of the array, and that is bad.
Why?
Because in that case the number of elements smaller / larger than the pivot element is 0, and this will repeat as follows:
Consider an array of size n. Then,
(n) + (n - 1) + (n - 2) + ... + 1 = O(n^2)
Hence, the time complexity increases to O(n^2) from O(nlogn). So, I highly recommend using the median / a random element of the array as the pivot.

Related

Big Shot IT company interview puzzle

This past week I attended a couple of interviews at a few big IT companies. One question left me a bit puzzled; below is an exact description of the problem (from one of the interview-questions websites).
Given the data set,
A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,E,A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,F
which can be reduced to
(A; 16); (B; 8); (C; 4); (D; 2); (E; 1); (F; 1):
using the (value, frequency) format.
for a total of m of these tuples, stored in no specific order. Devise an O(m) algorithm that returns the kth order statistic of the data set. Here m is the number of tuples, as opposed to n, which is the total number of elements in the data set.
You can use Quick-Select to solve this problem.
Naively:
Pick an element (called the pivot) from the array
Put things less than or equal to the pivot on the left of the array, those greater on the right.
If the pivot is in position k, then you're done. If it's greater than k, then repeat the algorithm on the left side of the array. If it's less than k, then repeat the algorithm on the right side of the array.
There's a couple of details:
You need to either pick the pivot randomly (if you're happy with expected O(m) as the cost), or use a deterministic median algorithm.
You need to be careful to not take O(m^2) time if there's lots of values equal to the pivot. One simple way to do this is to do a second pass to split the array into 3 parts rather than 2: those less than the pivot, those equal to the pivot, and those greater than the pivot.
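Here is a hedged Python sketch of that weighted quick-select idea (the function and variable names are mine): partition the m tuples around a random pivot value and recurse into the side whose accumulated frequency contains the k-th element, giving expected O(m) work.
import random

# Expected-O(m) k-th order statistic over (value, frequency) tuples (1-based k).
# Sketch only; assumes k is within the total frequency of the data set.
def kth_from_tuples(tuples, k):
    pivot = random.choice(tuples)[0]
    less = [(v, f) for v, f in tuples if v < pivot]
    equal_count = sum(f for v, f in tuples if v == pivot)
    greater = [(v, f) for v, f in tuples if v > pivot]
    less_count = sum(f for v, f in less)
    if k <= less_count:
        return kth_from_tuples(less, k)
    if k <= less_count + equal_count:
        return pivot
    return kth_from_tuples(greater, k - less_count - equal_count)
For the example data above, kth_from_tuples([('A', 16), ('B', 8), ('C', 4), ('D', 2), ('E', 1), ('F', 1)], 20) would return 'B'.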

searching through a vast collection of potential solutions

I have a quite difficult problem (perhaps even an NP-hard problem ^^) that involves looking for a solution in a massive collection of results. Perhaps there is an algorithm for it.
The exercise below is artificial, but it is a perfect example to illustrate my issue.
There is a big array of integers. Let's say it has 100,000 elements.
int numbers[] = {-123,32,4,-234564,23,5,....}
I want to check, in a relatively quick way, whether the sum of any 2 numbers from this array is equal to 0. In other words, if the array has "-123", I want to find out whether there is also a "123" in it.
The easiest solution would be brute force - check everything against everything. That gives 100,000 x 100,000 - a big number ;-) Obviously the brute force method can be optimised: order the numbers and check negatives against positives only. My question is - is there something better than optimised brute force to find a solution?
First, sort the array by magnitude of the value.
Then, if the data contains a pair which satisfies the conditions you're after, it contains such a pair adjacent in the array. So just sweep through looking for adjacent pairs whose sum is 0.
Overall time complexity is O(n log n) for the sort, could be O(n) if you use "cheating" sorts not based solely on comparisons. Clearly it can't be done in less than linear time, because in the worst case you can't do it without looking at all the elements. I think n log n is probably optimal in the decision tree model of computing, but only because it "feels a bit like" the element uniqueness problem.
Alternative approach:
Add the elements one at a time to a hash-based or tree-based container. Before adding each element, check whether its negative is present. If so, stop.
This is likely to be faster in the case where there are lots of suitable pairs, because you save the cost of sorting the whole data. That said, you could write a modified sort that exits early by checking for adjacent pairs as soon as any subset of the data is in its final order, but that's effort.
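A minimal Python sketch of that hash-based idea (my code, not the answerer's): a single pass that checks, for each element, whether its negation has already been seen.
# One-pass check for a pair summing to zero using a hash set (sketch only).
# Note: a single 0 only counts here if 0 appears at least twice in the input.
def has_zero_sum_pair(numbers):
    seen = set()
    for x in numbers:
        if -x in seen:
            return True
        seen.add(x)
    return False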
Brute force would be an O(n^2) solution. You can certainly do better.
Off the top of my head, first sort it. Heap sort will have a complexity of O(nlogn).
Now, for the first element, say a, you know you need to find an element b, such that a+b = 0. This can be found using binary search (since your array is now sorted). Binary search has a complexity of O(logn).
This gives you an overall solution of O(nlogn) complexity.
The example you provided can be brute-force solved in O(n^2) time.
You can start by ordering the numbers (O(n·logn)) from smallest to biggest. If you place one pointer at the beginning (the "most negative" number) and the other at the end (the "most positive"), you can check whether such a pair of numbers exists in an additional O(n) steps by following this procedure:
If the numbers at both pointers have the same absolute value, you have the solution
If not, move the pointer of the number with the bigger absolute value towards "zero" (that is, increase it if it is the pointer on the negative side, decrease it if it is the positive-side one)
Repeat until you find a solution, or the pointers cross.
Total complexity is O(n·logn) + O(n) = O(n·logn).
Sort your array using Quicksort. Once that is done, use two indexes; let's call them negative and positive.
negative <- 0                  // starts at the smallest (most negative) value
positive <- size - 1           // starts at the largest (most positive) value
while (negative < positive) do
    delta <- array[negative] + array[positive]
    if (delta = 0) then
        return true
    else if (delta < 0) then
        negative <- negative + 1
    else
        positive <- positive - 1
    end if
end while
return (array[negative] = 0)   // a lone 0 counts as a pair summing to 0
You didn't say what the algorithm should do if 0 is part of the array; I've supposed that in this case true should be returned.

Is there a Sorting Algorithm that sorts in O(∞) permutations?

After reading this question and through the various Phone Book sorting scenarios put forth in the answer, I found the concept of the BOGO sort to be quite interesting. Certainly there is no use for this type of sorting algorithm, but it did raise an interesting question in my mind -- could there be a sorting algorithm that is infinitely impossible to complete?
In other words, is there a process where one could attempt to compare and re-order a fixed set of data and can yet never achieve an actual sorted list?
This is much more of a theoretical/philosophical question than a practical one and if I was more of a mathematician I'd probably be able to prove/disprove such a possibility. Has anyone asked this question before and if so, what can be said about it?
[edit:] No deterministic process with a finite amount of state takes "O(infinity)", since the slowest it can be is to progress through all possible states. This includes sorting.
[earlier, more specific answer:]
No. For a list of size n you only have a state space of size n! in which to store progress (assuming that the entire state of the sort is stored in the ordering of the elements and it really is "doing something", deterministically).
So the worst possible behaviour would cycle through all available states before terminating, and take time proportional to n!. (At the risk of confusing matters: there must be a single path through the states - since that is "all the state", you cannot have a process move from state X to Y, and then later from state X to Z, since that would require additional state, or be non-deterministic.)
Idea 1:
function sort( int[] arr ) {
    int[] sorted = quicksort( arr ); // compare and reorder data
    while(true);                     // where'd this come from???
    return sorted;                   // return answer
}
Idea 2
How do you define O(infinity)? The formal definition of Big-O merely states that f(x)=O(g(x)) implies that M*g(x) is an upper bound of f(x) given sufficiently large x and some constant M.
Typically when you are talking about "infinity", you are talking about some sort of unbounded limit. So in this case, the only reasonable definition is saying that O(infinity) is O(a function that's larger than every function). Obviously a function that's larger than every function is an upper bound. Thus technically everything is "O(infinity)".
Idea 3
Assuming you mean theta notation (tight bound)...
If you impose the additional restriction that the algorithm is smart (returns when it finds a sorted permutation) and every permutation of the list must be visited in a finite amount of time, then the answer is no. There are only N! permutations of a list. The upper bound for such a sorting algorithm is then a finite sum of finite quantities, which is finite.
Your question doesn't really have much to do with sorting. An algorithm which is guaranteed never to complete would be pretty dull. Indeed, even an algorithm which would might or might not ever complete would be pretty dull. Much more interesting would be an algorithm which would be guaranteed to complete, eventually, but whose worst-case computation time with respect to the size of the input would not be expressible as O(F(N)) for any function F that could itself be computed in bounded time. My hunch would be that such an algorithm could be devised, but I'm not sure how.
How about this one:
Start at the first item.
Flip a coin.
If it's heads, switch it with the next item.
If it's tails, don't switch them.
If list is sorted, stop.
If not, move onto the next pair ...
It's a sorting algorithm -- the kind a monkey might do. Is there any guarantee that you'll arrive at a sorted list? I don't think so!
Yes -
SortNumbers(collectionOfNumbers)
{
If IsSorted(collectionOfNumbers){
reverse(collectionOfNumbers(1:end/2))
}
return SortNumbers(collectionOfNumbers)
}
Input: A[1..n] : n unique integers in arbitrary order
Output: A'[1..n] : reordering of the elements of A
such that A'[i] R(A') A'[j] if i < j.
Comparator: a R(A') b iff A'[i] = a, A'[j] = b and i > j
More generally, make the comparator something that's either (a) impossible to reconcile with the output specification, so that no solution can exist, or (b) uncomputable (e.g., sort these (input, turing machine) pairs in order of the number of steps needed for the machine to halt on the input).
Even more generally, if you have a procedure that fails to halt on a valid input, the procedure is not an algorithm which solves the problem on that input/output domain... which means you don't have an algorithm at all, or that what you have is only an algorithm if you appropriately restrict the domain.
Let's suppose that you have a random coin flipper, infinite arithmetic, and infinite rationals. Then the answer is yes. You can write a sorting algorithm which has 100% chance of successfully sorting your data (so it really is a sorting function), but which on average will take infinite time to do so.
Here is an emulation of this in Python.
# We'll pretend that these are true random numbers.
import random
import fractions

def flip():
    return 0.5 < random.random()

# This tests whether a number is less than an infinite precision number in the range
# [0, 1]. It has a 100% probability of returning an answer.
def number_less_than_rand(x):
    high = fractions.Fraction(1, 1)
    low = fractions.Fraction(0, 1)
    while low < x and x < high:
        if flip():
            low = (low + high) / 2
        else:
            high = (low + high) / 2
    return high < x

def slow_sort(some_array):
    n = fractions.Fraction(100, 1)
    # This loop has a 100% chance of finishing, but its average time to complete
    # is also infinite. If you haven't studied infinite series and products, you'll
    # just have to take this on faith. Otherwise proving that is a fun exercise.
    while not number_less_than_rand(1 / n):
        n += 1
        print(n)
    some_array.sort()

Where can I use a technique from Majority Vote algorithm

As seen in the answers to Linear time majority algorithm?, it is possible to compute the majority of an array of elements in linear time and log(n) space.
It was shown that everyone who sees this algorithm believes that it is a cool technique. But does the idea generalize to new algorithms?
It seems the hidden power of this algorithm is in keeping a counter that plays a complex role -- such as "(count of majority element so far) - (count of second majority so far)". Are there other algorithms based on the same idea?
Umh, let's first start by understanding why the algorithm works, in order to "isolate" the ideas in it.
The point of the algorithm is that if you have a majority element, then you can match each occurrence of it with "another" element and still have some "spare" occurrences left over.
So, we just keep a counter which counts the number of "spare" occurrences of our guessed answer.
If it reaches 0, then the guess isn't a majority element for the subsequence that runs from when we "elected" the "current" element as the guessed majority element up to the "current" position.
Also, since our "guessed" element matches every other element occurrence in the considered subsequence, there is no majority element in the considered subsequence.
Now, since:
our algorithm gives a correct answer only if there is a majority element, and
if there is a majority element, then it will still be one if we ignore the "current" subsequence whenever the counter goes to zero,
it is easy to see by contradiction that, if a majority element exists, then there is a suffix of the whole sequence on which the counter never gets to zero.
Now: what's the idea that can be exploited in new, O(1) size O(n) time algorithms?
To me, you can apply this technique whenever you have to compute a property P on a sequence of elements which:
can be extended from seq[n, m] to seq[n, m+1] in O(1) time if Q(seq[n, m+1]) doesn't hold
P(seq[n, m]) can be computed in O(1) time and space from P(seq[n, j]) and P(seq[j, m]) if Q(seq[n, j]) holds
In our case, P is the "spare" occurrences of our "elected" major element and Q is "P is zero".
If you see things in that way, longest common subsequence exploits the same idea (dunno about its "coolness factor" ;))
Jayadev Misra and David Gries have a paper called Finding Repeated Elements (ACM page) which generalizes it to an element repeating more than n/k times (k=2 is the majority problem).
Of course, this is probably very similar to the original problem, and you are probably looking for 'different' algorithms.
Here is an example which is possibly different.
Give an algorithm which will detect if a string of parentheses ( '(' and ')') is well formed.
I believe the standard solution is to maintain a counter.
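For completeness, the counter solution alluded to here, sketched in Python (my code): increment on '(', decrement on ')', fail if the counter ever goes negative, and require it to end at zero.
# Well-formedness check for a string of '(' and ')' using a single counter.
def well_formed(s):
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:        # a ')' with no matching '(' so far
                return False
    return depth == 0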
Side note:
As to answers which claim it cannot be constant space etc., ask them for the model of computation. In the WORD RAM model, for instance, you assume the integers/array indices etc. are O(1).
A lot of folks incorrectly mix and match models. For instance, they will happily have the input array of n integers be O(n) space and an array index be O(1) space, but consider a counter to be Omega(log n) etc., which is nonsense. If they want to count sizes in bits, then the input itself is Omega(n log n) bits etc.
For people who want to understand what this algorithm does and why it works: look at my detailed answer.
Here I will describe a natural extension of this algorithm (or a generalization). So in the standard majority voting algorithm you have to find an element which appears at least n/2 times in the stream, where n is the size of the stream. You can do this in O(n) time (with a tiny constant) and in O(log(n)) space (worst case, and highly unlikely).
The generalized algorithm allows you to find the k most frequent items, where each one appeared at least n/(k+1) times in the original stream. Note that if k=1, you end up with the original problem.
The solution to this problem is really similar to the original one, except that instead of one counter and one possible element, you maintain k counters and k possible elements. The logic goes in a similar way: you iterate through the array and, if the element is among the possible elements, you increase its counter; if one of the counters is zero, substitute that counter's element with the new element; otherwise just decrease the values.
As with the original majority voting algorithm, you need a guarantee that these k majority elements exist; otherwise you have to do another pass over the array to verify that the possible elements you found are correct. Here is my Python attempt (I have not done thorough testing).
from collections import defaultdict

def majority_element_general(arr, k=1):
    counter, i = defaultdict(int), 0
    # Seed the counters with the first k distinct elements.
    while len(counter) < k and i < len(arr):
        counter[arr[i]] += 1
        i += 1
    for el in arr[i:]:
        if el in counter:
            counter[el] += 1
        elif len(counter) < k:
            counter[el] = 1
        else:
            # Decrement every counter; drop candidates whose counter hits zero.
            fields_to_remove = []
            for key in counter:
                if counter[key] > 1:
                    counter[key] -= 1
                else:
                    fields_to_remove.append(key)
            for key in fields_to_remove:
                del counter[key]
    potential_elements = list(counter.keys())
    # might want to check that they are really frequent.
    return potential_elements
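As the comment in the code notes, when the frequency guarantee is absent a second verification pass is needed; here is a minimal sketch of that check (my code, reusing the candidates returned above):
# Optional second pass: keep only candidates that really occur > n/(k+1) times.
# Usage: verify_candidates(arr, majority_element_general(arr, k), k)
def verify_candidates(arr, candidates, k):
    threshold = len(arr) / (k + 1)
    return [c for c in candidates if sum(1 for x in arr if x == c) > threshold]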

Are there any worse sorting algorithms than Bogosort (a.k.a Monkey Sort)? [closed]

My co-workers took me back in time to my University days with a discussion of sorting algorithms this morning. We reminisced about our favorites like StupidSort, and one of us was sure we had seen a sort algorithm that was O(n!). That got me started looking around for the "worst" sorting algorithms I could find.
We postulated that a completely random sort would be pretty bad (i.e. randomize the elements - is it in order? no? randomize again), and I looked around and found out that it's apparently called BogoSort, or Monkey Sort, or sometimes just Random Sort.
Monkey Sort appears to have a worst case performance of O(∞), a best case performance of O(n), and an average performance of O(n·n!).
What is the currently officially accepted sorting algorithm with the worst average sorting performance (and therefore being worse than O(n·n!))?
From David Morgan-Mar's Esoteric Algorithms page: Intelligent Design Sort
Introduction
Intelligent design sort is a sorting algorithm based on the theory of
intelligent design.
Algorithm Description
The probability of the original input list being in the exact order
it's in is 1/(n!). There is such a small likelihood of this that it's
clearly absurd to say that this happened by chance, so it must have
been consciously put in that order by an intelligent Sorter. Therefore
it's safe to assume that it's already optimally Sorted in some way
that transcends our naïve mortal understanding of "ascending order".
Any attempt to change that order to conform to our own preconceptions
would actually make it less sorted.
Analysis
This algorithm is constant in time, and sorts the list in-place,
requiring no additional memory at all. In fact, it doesn't even
require any of that suspicious technological computer stuff. Praise
the Sorter!
Feedback
Gary Rogers writes:
Making the sort constant in time
denies the power of The Sorter. The
Sorter exists outside of time, thus
the sort is timeless. To require time
to validate the sort diminishes the role
of the Sorter. Thus... this particular
sort is flawed, and can not be
attributed to 'The Sorter'.
Heresy!
Many years ago, I invented (but never actually implemented) MiracleSort.
Start with an array in memory.
loop:
Check to see whether it's sorted.
Yes? We're done.
No? Wait a while and check again.
end loop
Eventually, alpha particles flipping bits in the memory chips should result in a successful sort.
For greater reliability, copy the array to a shielded location, and check potentially sorted arrays against the original.
So how do you check the potentially sorted array against the original? You just sort each array and check whether they match. MiracleSort is the obvious algorithm to use for this step.
EDIT: Strictly speaking, this is not an algorithm, since it's not guaranteed to terminate. Does "not an algorithm" qualify as "a worse algorithm"?
Quantum Bogosort
A sorting algorithm that assumes that the many-worlds interpretation of quantum mechanics is correct:
Check that the list is sorted. If not, destroy the universe.
At the conclusion of the algorithm, the list will be sorted in the only universe left standing.
This algorithm takes worst-case Θ(N) and average-case Θ(1) time. In fact, the average number of comparisons performed is 2: there's a 50% chance that the universe will be destroyed on the second element, a 25% chance that it'll be destroyed on the third, and so on.
Jingle Sort, as described here.
You give each value in your list to a different child on Christmas. Children, being awful human beings, will compare the value of their gifts and sort themselves accordingly.
I'm surprised no one has mentioned sleepsort yet... Or haven't I noticed it? Anyway:
#!/bin/bash
function f() {
    sleep "$1"
    echo "$1"
}
while [ -n "$1" ]
do
    f "$1" &
    shift
done
wait
example usage:
./sleepsort.sh 5 3 6 3 6 3 1 4 7
./sleepsort.sh 8864569 7
In terms of performance it is terrible (especially the second example). Waiting almost 3.5 months to sort 2 numbers is kinda bad.
I had a lecturer who once suggested generating a random array, checking if it was sorted and then checking if the data was the same as the array to be sorted.
Best case O(N) (first time baby!)
Worst case O(Never)
There is a sort that's called bogobogosort. First, it checks the first 2 elements, and bogosorts them. Next it checks the first 3, bogosorts them, and so on.
Should the list be out of order at any time, it restarts by bogosorting the first 2 again. Regular bogosort has an average complexity of O(N!); this algorithm has an average complexity of O(N!·1!·2!·3!·...·N!)
Edit: To give you an idea of how large this number is: for 20 elements, this algorithm takes an average of 3.930093*10^158 years, well above the proposed heat death of the universe (if it happens) at 10^100 years,
whereas merge sort takes around 0.0000004 seconds,
bubble sort 0.0000016 seconds,
and bogosort takes 308 years, 139 days, 19 hours, 35 minutes, 22.306 seconds, assuming a year is 365.242 days and a computer does 250,000,000 32-bit integer operations per second.
Edit 2: This algorithm is not as slow as the "algorithm" miracle sort, which will probably, like this sort, get the computer sucked into a black hole before it successfully sorts 20 elements; but if it did, I would estimate an average complexity of 2^(32 (the number of bits in a 32-bit integer) * N (the number of elements)) * (a number <= 10^40) years,
since gravity speeds up the chips' alpha-particle bit flipping, and there are 2^N states, which is 2^640*10^40, or about 5.783*10^216.762162762 years, though if the list started out sorted, its complexity would only be O(N), faster than merge sort, which is only N log N even in the worst case.
Edit 3: This algorithm is actually slower than miracle sort as the size gets very big, say 1000, since my algorithm would then have a run time of 2.83*10^1175546 years, while the miracle sort algorithm would have a run time of 1.156*10^9657 years.
If you keep the algorithm meaningful in any way, O(n!) is the worst upper bound you can achieve.
Since checking each possible permutation of a set for being sorted will take n! steps, you can't get any worse than that.
If you're doing more steps than that then the algorithm has no real useful purpose. Not to mention the following simple sorting algorithm with O(infinity):
list = someList
while (list not sorted):
doNothing
Bogobogosort. Yes, it's a thing. To Bogobogosort, you Bogosort the first element. Check to see if that one element is sorted. Being one element, it will be. Then you add the second element, and Bogosort those two until it's sorted. Then you add one more element, then Bogosort. Continue adding elements and Bogosorting until you have finally done every element. This was designed never to succeed with any sizable list before the heat death of the universe.
You should do some research into the exciting field of Pessimal Algorithms and Simplexity Analysis. These authors work on the problem of developing a sort with a pessimal best-case (your bogosort's best case is Omega(n), while slowsort (see paper) has a non-polynomial best-case time complexity).
Here are 2 sorts my roommate and I came up with in college
1) Check the order
2) Maybe a miracle happened, go to 1
and
1) check if it is in order, if not
2) put each element into a packet and bounce it off a distant server back to yourself. Some of those packets will return in a different order, so go to 1
There's always the Bogobogosort (Bogoception!). It performs Bogosort on increasingly large subsets of the list, and then starts all over again if the list is ever not sorted.
for (int n=1; n<sizeof(list); ++n) {
    while (!isInOrder(list, 0, n)) {
        shuffle(list, 0, n);
    }
    if (!isInOrder(list, 0, n+1)) { n=0; }
}
1. Put your items to be sorted on index cards.
2. Throw them into a bonfire and confirm they are completely destroyed. (This step originally read "throw them into the air on a windy day, a mile from your house".)
3. Check your kitchen floor for the correct ordering.
4. Repeat if it's not the correct order.
Best case scenario is O(∞).
Edited the above based on an astute observation by KennyTM.
The "what would you like it to be?" sort
Note the system time.
Sort using Quicksort (or anything else reasonably sensible), omitting the very last swap.
Note the system time.
Calculate the required time. Extended precision arithmetic is a requirement.
Wait the required time.
Perform the last swap.
Not only can it implement any conceivable O(x) value short of infinity, the time taken is provably correct (if you can wait that long).
Nothing can be worse than infinity.
Segments of π
Assume π contains all possible finite number combinations.
See math.stackexchange question
Determine the number of digits needed from the size of the array.
Use segments of π places as indexes to determine how to re-order the array. If a segment exceeds the size boundaries for this array, adjust the π decimal offset and start over.
Check if the re-ordered array is sorted. If it is, woot; else adjust the offset and start over.
Bozo sort is a related algorithm that checks if the list is sorted and, if not, swaps two items at random. It has the same best and worst case performances, but I would intuitively expect the average case to be longer than Bogosort. It's hard to find (or produce) any data on performance of this algorithm.
A worst case performance of O(∞) might not even make it an algorithm according to some.
An algorithm is just a series of steps, and you can always do worse by tweaking it a little bit to get the desired output in more steps than it was previously taking. One could purposely put knowledge of the number of steps taken into the algorithm and make it terminate and produce the correct output only after X number of steps have been done. That X could very well be of the order of O(n^2) or O(n^(n!)) or whatever the algorithm desired to do. That would effectively increase its best-case as well as average-case bounds.
But your worst-case scenario cannot be topped :)
My favorite slow sorting algorithm is the stooge sort:
/* swap helper assumed by this snippet and the one below */
static void swap(long *a, long *b) { long t = *a; *a = *b; *b = t; }

void stooges(long *begin, long *end) {
    if( (end-begin) <= 1 ) return;
    if( begin[0] > end[-1] ) swap(begin, end-1);   /* put the smaller of the two ends first */
    if( (end-begin) > 1 ) {
        int one_third = (end-begin)/3;
        stooges(begin, end-one_third);
        stooges(begin+one_third, end);
        stooges(begin, end-one_third);
    }
}
The worst case complexity is O(n^(log(3) / log(1.5))) = O(n^2.7095...).
Another slow sorting algorithm is actually named slowsort!
void slow(long *start, long *end) {
    if( (end-start) <= 1 ) return;
    long *middle = start + (end-start)/2;
    slow(start, middle);
    slow(middle, end);
    if( middle[-1] > end[-1] ) swap(middle-1, end-1);
    slow(start, end-1);
}
This one takes O(n ^ (log n)) in the best case... even slower than stoogesort.
RecursiveBogosort(list) {   // probably still O(n!)
    if (list not sorted)
        list1 = first half of list
        list2 = second half of list
        RecursiveBogosort(list1)
        RecursiveBogosort(list2)
        list = list1 + list2
    while (list not sorted)
        shuffle(list)
}
Double bogosort
Bogosort twice and compare the results (just to be sure it is sorted); if they don't match, do it again.
This page is an interesting read on the topic: http://home.tiac.net/~cri_d/cri/2001/badsort.html
My personal favorite is Tom Duff's sillysort:
/*
 * The time complexity of this thing is O(n^(a log n))
 * for some constant a. This is a multiply and surrender
 * algorithm: one that continues multiplying subproblems
 * as long as possible until their solution can no longer
 * be postponed.
 */
void sillysort(int a[], int i, int j){
    int t, m;
    for(;i!=j;--j){
        m=(i+j)/2;
        sillysort(a, i, m);
        sillysort(a, m+1, j);
        if(a[m]>a[j]){ t=a[m]; a[m]=a[j]; a[j]=t; }
    }
}
You could make any sort algorithm slower by running your "is it sorted" step randomly. Something like:
1. Create an array of booleans the same size as the array you're sorting. Set them all to false.
2. Run an iteration of bogosort.
3. Pick two random elements.
4. If the two elements are sorted in relation to each other (i < j && array[i] < array[j]), mark the indexes of both in the boolean array as true. Otherwise, start over.
5. Check if all of the booleans in the array are true. If not, go back to 3.
6. Done.
Yes: SimpleSort. In theory it runs in O(-1); however, this is equivalent to O(...9999), which is in turn equivalent to O(∞ - 1), which, as it happens, is also equivalent to O(∞). Here is my sample implementation:
/* element sizes are unneeded, they are assumed */
void
simplesort (const void* begin, const void* end)
{
    for (;;);
}
One I was just working on involves picking two random points, and if they are in the wrong order, reversing the entire subrange between them. I found the algorithm on http://richardhartersworld.com/cri_d/cri/2001/badsort.html, which says that the average case is probably somewhere around O(n^3) or O(n^2 log n) (he's not really sure).
I think it might be possible to do it more efficiently, because I think it might be possible to do the reversal operation in O(1) time.
Actually, I just realized that doing that would undermine what I just said, because the data structure I had in mind would put accessing the random elements at O(log n) and determining whether a subrange needs reversing at O(n).
Randomsubsetsort.
Given an array of n elements, choose each element with probability 1/n, randomize these elements, and check if the array is sorted. Repeat until sorted.
Expected time is left as an exercise for the reader.
