Lately, I have been learning about various sorting methods, and a lot of them are unstable, e.g. selection sort, quick sort, and heap sort.
My question is: What are the general factors that make sorting unstable?
Most of the efficient sorting algorithms are efficient precisely because they move data over long distances, i.e. each move takes an element much closer to its final position. This long-range movement is what costs them stability.
For example, in a simple sort like bubble sort, you compare and swap neighboring elements, so it is easy to avoid moving elements that are already in the correct relative order. But in quick sort, the partitioning step may choose moves that minimize the number of swaps. For example, if you partition the list below around the number 2, the most efficient way is to swap the 1st element with the 4th element and the 2nd element with the 5th element:
2 3 1 1 1 4
1 1 1 2 3 4
If you notice, we have now changed the relative order of the 1's in the list, which makes the sort unstable.
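To make the reordering concrete, here is a small Python sketch. The letter labels are added only so we can track each element's identity; a real sort would compare the numeric values alone:

```python
# Each element carries a label so we can track identity; only the
# numeric value would be compared by a real sort.
a = [(2, 'a'), (3, 'a'), (1, 'a'), (1, 'b'), (1, 'c'), (4, 'a')]
a[0], a[3] = a[3], a[0]   # swap 1st and 4th elements
a[1], a[4] = a[4], a[1]   # swap 2nd and 5th elements
print(a)  # values are sorted, but the 1's now come out in order b, c, a
```

Two swaps sorted the values, but the three 1's ended up in the order b, c, a instead of a, b, c.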
So to sum it up, some algorithms are very suitable for stable sorting (like bubble-sort), whereas some others like quick sort can be made stable by carefully selecting a partitioning algorithm, albeit at the cost of efficiency or complexity or both.
We usually classify the algorithm to be stable or not based on the most "natural" implementation of it.
A sorting algorithm is stable when it uses the original order of elements to break ties in the new ordering. For example, let's say you have records of (name, age) and you want to sort them by age.
If you use a stable sort on (Matt, 50), (Bob, 20), (Alice, 50), then you will get (Bob, 20), (Matt, 50), (Alice, 50). The Matt and Alice records have equal ages, so they are equal according to the sorting criteria. The stable sort preserves their original relative order -- Matt came before Alice in the original list, so Matt comes before Alice in the output.
If you use an unstable sort on the same list, you might get (Bob, 20), (Matt, 50), (Alice, 50) or you might get (Bob, 20), (Alice, 50), (Matt, 50). Elements that compare equal will be grouped together but can come out in any order.
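In Python, for instance, the built-in sorted is guaranteed stable, so the first outcome is the one you always get:

```python
records = [("Matt", 50), ("Bob", 20), ("Alice", 50)]
by_age = sorted(records, key=lambda r: r[1])  # Python's sort is stable
print(by_age)  # → [('Bob', 20), ('Matt', 50), ('Alice', 50)]
```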
It's often handy to have a stable sort, but a stable sort implementation has to remember information about the original order of the elements while it's reordering them.
In-place array sorting algorithms are designed not to use any extra space to store this kind of information, and they destroy the original ordering while they work. The fast ones like quicksort aren't usually stable, because reordering the array in ways that preserve the original order to break ties is slow. Slow array sorting algorithms like insertion sort or selection sort can usually be written to be stable without difficulty.
Sorting algorithms that copy data from one place to another, or work with other data structures like linked lists, can be both fast and stable. Merge sort is the most common.
If you have an example input of
1 5 3 7 1
For the sort to be stable, the last 1 must never end up before the first 1.
More generally, elements with the same value in the input array must not change their relative positions once sorted.
Then the sorted output would look like:
1(f) 1(l) 3 5 7
f: first, l: last (or second, if there are more than 2).
For example, QuickSort uses swaps, and because comparisons are done with greater-or-equal (>=) or less-than (<), equally valued elements can be swapped past each other while sorting. As a result, their order changes in the output.
This is something I don't understand: if the array is unsorted, the position of an element shouldn't matter when sorting; only the key by which you sort should be considered. If two keys have the same value, why does a stable sort go back to the original array to check which one came first? Is it like that just because that is the only possibility it has?
I assume your question is "why do you want stability in a sorting algorithm?"; otherwise check Sopel's comment.
Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc. Stability is a property in sorting algorithms that is required for some applications.
Consider the following example: you want to sort people by last_name and, if the last_name is identical, sort them by first_name (like in a telephone book). One way to achieve this is to sort the elements by first_name and afterwards use a stable sorting algorithm to sort by last_name. This gives the correct result only if the second sort is stable, ensuring that people with the same last_name are still sorted by their first_name from the initial pass.
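Sketched in Python (where list.sort is stable), the two-pass idea looks like this; the names are made up for the example:

```python
people = [("Ann", "Smith"), ("Bob", "Jones"), ("Amy", "Smith")]
people.sort(key=lambda p: p[0])  # first pass: sort by first_name
people.sort(key=lambda p: p[1])  # second pass: stable sort by last_name
# The Smiths keep their first_name order from the first pass:
print(people)  # → [('Bob', 'Jones'), ('Amy', 'Smith'), ('Ann', 'Smith')]
```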
Why does a stable algorithm determine the position of duplicate values in the sorted array not by value, but by the position they had in the unsorted array?
You are assuming that a sorting algorithm would have to do this in order to be stable. But that is not the case.
Consider bubble sort, for example. Bubble sort only ever swaps adjacent elements, and it only swaps them if they are currently out of order. Now imagine we have two elements in the array with equal keys:
[..., 8♦, ..., ..., ..., 8♥, ...]
Bubble sort will swap 8♦ to the right and/or it will swap 8♥ to the left. But it only ever swaps them one position at a time. So after some swaps they end up being adjacent:
[..., ..., ..., 8♦, 8♥, ..., ...]
And then they will stay adjacent (except while temporarily swapping a value with a larger key past both of them), and they will never be swapped with each other, because they are not out of order relative to each other. So, bubble sort will leave them in their original relative order.
Note that this works without comparing the original indices; if 8♦ is to the left of 8♥, then it stays to the left of 8♥ simply because the algorithm won't swap them with each other. This means that bubble sort is naturally stable without keeping track of the original array indices or looking them up.
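A minimal bubble sort sketch shows this: the strict > comparison is all it takes, because equal keys are never swapped past each other.

```python
def bubble_sort(items, key=lambda x: x):
    """Stable because adjacent elements swap only when strictly out of order."""
    a = list(items)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if key(a[i]) > key(a[i + 1]):  # never swaps equal keys
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

cards = [(8, 'diamond'), (3, ''), (8, 'heart'), (1, '')]
print(bubble_sort(cards, key=lambda c: c[0]))
# → [(1, ''), (3, ''), (8, 'diamond'), (8, 'heart')]
```

The two 8's come out in their original relative order without the algorithm ever consulting their original indices.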
In practice, we call a sorting algorithm "stable" if it's naturally stable, without having to use the original indices as a tie-breaker. The technique of keeping track of the original array indices is used when you want to do a stable sort using an unstable sorting algorithm; an algorithm is stable if you don't need to use that technique.
For example, take an array of size 7 which contains all 3's.
<3a, 3b, 3c, 3d, 3e, 3f, 3g>
The letters are used to distinguish the "identity" of each 3 for the purposes of demonstration; they are not actually part of the data.
It depends on the implementation.
It is called stable if it leaves identical elements in the order they appeared, and not stable if they might come back in another order.
Of course, you wouldn't see the difference - unless you sort data rows with other data in other columns, and only the sorted column(s) are identical. There it makes a difference.
Quick sort and just about any sort that swaps non-adjacent elements is not stable (you could get lucky and end up with a stable result), meaning the order of identical elements is not preserved.
Merge sort and most sorts that only swap adjacent elements are stable.
I would like to sort an array of tuples by all elements (like if they were in a trie). If the input is (1,2,5), (1,2,3), (1,1,4), (2,8,9), the corresponding output would be (1,1,4), (1,2,3), (1,2,5),(2,8,9). The corresponding trie would be:
root
/ \
1 2
/ \ |
1 2 8
| /\ |
4 3 5 9
I was thinking about using a search tree for each position in the tuples. There is also the obvious naive way (sort by first position, then sort by second position, etc.). Does anybody see a better way?
The trie-based approach that you have outlined above is extremely similar to doing a most-significant digit radix sort on the tuples. You essentially are distributing them into buckets based on their first digit, then recursively subdividing the buckets into smaller groups based on the remaining digits. You might want to consider explicitly performing the MSD radix sort rather than building the trie, since tries can be memory-inefficient when the data are sparse, while MSD radix sort has reasonably good memory usage (especially if you implement everything implicitly).
In the example you gave above, all of the numbers in the tuples were single digits. If this is the case, you can have at most 10 × 10 × 10 = 1000 possible distinct tuples, which isn't very large. In that case, you might want to consider just using a standard sorting algorithm with a custom comparator, since the benefits of a more optimized sort probably won't be all that apparent at that scale. On the other hand, if your tuples have many more entries in them, then it might be worth investing in a more clever sort, like MSD radix sort.
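For the example input, the standard-sort route is a one-liner in Python, since tuples already compare lexicographically:

```python
tuples = [(1, 2, 5), (1, 2, 3), (1, 1, 4), (2, 8, 9)]
tuples.sort()  # tuples compare element by element, left to right
print(tuples)  # → [(1, 1, 4), (1, 2, 3), (1, 2, 5), (2, 8, 9)]
```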
Hope this helps!
Radix sort is to a trie as merge sort is to a binary tree.
How about keeping things simple: treat each tuple's elements as the digits of a number in some base, say 10. Then (1,2,5) becomes 125, and you can sort the resulting numbers with any simple comparison sort, such as heap sort. (This only works if every element is smaller than the base.)
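As a sketch (assuming, as noted, that every element is smaller than the base), the tuple-to-number encoding could look like this; `tuple_key` is a made-up helper name:

```python
def tuple_key(t, base=10):
    # Interpret the tuple's elements as digits in the given base.
    # Assumes each element is smaller than `base`.
    n = 0
    for digit in t:
        n = n * base + digit
    return n

print(tuple_key((1, 2, 5)))  # → 125
print(sorted([(1, 2, 5), (1, 2, 3), (2, 8, 9)], key=tuple_key))
```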
Quicksort is not stable, since it exchanges nonadjacent elements.
Please help me understanding this statement.
I know how partitioning works, and what stability is. But I can't figure out why the above is the reason quicksort is not stable.
Then I believe the same could be said of merge sort, yet it is quoted as a stable algorithm.
Consider what happens during the partition for the following array of pairs, where the comparator uses the integer (only). The string is just there so that we have two elements that compare as if equal, but actually are distinguishable.
(4, "first"), (2, ""), (3, ""), (4, "second"), (1, "")
By definition a sort is stable if, after the sort, the two elements that compare as if equal (the two 4s) appear in the same order afterwards as they did before.
Suppose we choose 3 as the pivot. The two 4 elements will end up after it and the 1 and the 2 before it (there's a bit more to it than that, I've ignored moving the pivot since it's already in the correct position, but you say you understand partitioning).
Quicksorts in general don't give any particular guarantee about where the two 4s will be after the partition, and I think most implementations would reverse them. For instance, if we use Hoare's classic partitioning algorithm, the array is partitioned as follows:
(1, ""), (2, ""), (3, ""), (4, "second"), (4, "first")
which violates the stability of sorting.
Since each partition isn't stable, the overall sort isn't likely to be.
As Steve314 points out in a comment, merge sort is stable provided that when merging, if you encounter equal elements you always output first the one that came from the "lower down" of the two halves that you're merging together. That is, each merge has to look like this, where the "left" is the side that comes from lower down in the original array.
while (left not empty and right not empty):
    if first_item_on_left <= first_item_on_right:
        move one item from left to output
    else:
        move one item from right to output
move everything from left to output
move everything from right to output
If the <= were < then the merge wouldn't be stable.
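A runnable Python version of that merge, with the tie-breaking <= on the left side (labels added only to show which half each element came from):

```python
def merge(left, right, key=lambda x: x):
    # Stable merge: on ties (<=) we take from `left`, the half that
    # came from lower down in the original array.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # move everything remaining from left
    out.extend(right[j:])  # then everything remaining from right
    return out

left = [(1, 'L'), (5, 'L')]
right = [(1, 'R'), (3, 'R')]
print(merge(left, right, key=lambda p: p[0]))
# → [(1, 'L'), (1, 'R'), (3, 'R'), (5, 'L')]
```

The tied 1's come out left-half first, preserving the original order.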
Consider the following array of pairs:
{(4,'first');(2,'');(1,'');(4,'second');(3,'')}
Consider 3 as the pivot. During a run of Quick sort, the array undergoes the following changes:
{(4,'first');(2,'');(1,'');(4,'second');(3,'')}
{(2,'');(4,'first');(1,'');(4,'second');(3,'')}
{(2,'');(1,'');(4,'first');(4,'second');(3,'')}
{(2,'');(1,'');(3,'');(4,'second');(4,'first')}
{(1,'');(2,'');(3,'');(4,'second');(4,'first')}
Clearly from the above, the relative order is changed. This is why quick sort is said to 'not ensure stability'.
It is like this: a user has a sorted array and sorts it by another column. Does the sort algorithm always preserve the relative order of elements that differ in the previous sort key but have the same value in the new sort key?
A sort algorithm which always preserves the order of elements that do not differ in the new sort key is called a "stable sort".
A sort is said to be stable if the order of equivalent items is preserved.
The stability of quicksort depends on partitioning strategy. "Quicksort is not stable since it exchanges nonadjacent elements." This statement relies on the prerequisite that Hoare partitioning is used.
Here is a Hoare partitioning demo from Berkeley CS61B: Hoare Partitioning.
From it you can see what "it exchanges nonadjacent elements" means.
Does the following Quicksort partitioning algorithm result in a stable sort (i.e. does it maintain the relative position of elements with equal values):
partition(A, p, r)
{
    x = A[r];
    i = p - 1;
    for j = p to r - 1
        if (A[j] <= x)
            i++;
            exchange(A[i], A[j])
    exchange(A[i+1], A[r]);
    return i + 1;
}
There is one case in which your partitioning algorithm will make a swap that changes the order of equal values. Here is how your in-place partitioning algorithm works:
We march through each value with the j index, and if the value we see is <= the partition value, we append it to the light-gray subarray (the region containing all the elements that are <= the partition value) by swapping it with the element immediately to the right of that subarray. Now suppose that partway through, three 9's sit at the beginning of the white zone, followed by a 1, and the partition value is 4. We look at the first 9 and see that it is not <= 4, so we leave it in place and march j forward. The same happens for the second and third 9's. Now we look at the 1, see that it is <= 4, and swap it with the first 9. Then, to finish the algorithm, we swap the partition value with the value at i+1, which is the second 9. The partition is now complete, and the 9 that was originally third is now first.
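Here is a runnable Python version of the partition above, reproducing the three-9s case (the letter labels are only there to track identity):

```python
def partition(a, p, r, key=lambda x: x):
    # Lomuto partition, as in the pseudocode above.
    x = key(a[r])
    i = p - 1
    for j in range(p, r):
        if key(a[j]) <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

a = [(9, 'a'), (9, 'b'), (9, 'c'), (1, ''), (4, 'pivot')]
partition(a, 0, len(a) - 1, key=lambda t: t[0])
print(a)  # → [(1, ''), (4, 'pivot'), (9, 'c'), (9, 'a'), (9, 'b')]
```

The 9's come out in the order c, a, b -- the relative order of equal elements is destroyed.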
Any sort can be converted to a stable sort if you're willing to add a second key. The second key should be something that indicates the original order, such as a sequence number. In your comparison function, if the first keys are equal, use the second key.
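In Python, that decoration trick looks like this (it works regardless of whether the underlying sort is stable, since ties can no longer occur):

```python
data = [5, 3, 5, 1]
decorated = [(value, index) for index, value in enumerate(data)]
decorated.sort()  # equal values are tie-broken by their original index
result = [value for value, _ in decorated]
print(result)  # → [1, 3, 5, 5]
```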
A sort is stable when the original order of equal elements doesn't change. Your algorithm isn't stable, since it swaps equal elements.
If it didn't, then it still wouldn't be stable:
( 1, 5, 2, 5, 3 )
You have two elements with the sort key 5. If, for some reason, you compare element #2 (the first 5) and element #5 (the 3), the 5 gets swapped with the 3 and moves past the other 5, thereby violating the contract of a stable sort. This means that carefully choosing the pivot element doesn't help; you must also make sure that the movement of elements between the partitions never changes the original order.
Your code looks suspiciously similar to the sample partition function given on wikipedia which isn't stable, so your function probably isn't stable. At the very least you should make sure your pivot point r points to the last position in the array of values equal to A[r].
You can make quicksort stable (I disagree with Matthew Jones there), but not in its default and quickest (heh) form.
Martin (see the comments) is correct that a quicksort on a linked list can be stable: take the first element as the pivot and append values to the ends of the lower and upper sublists as you go through the list. However, quicksort is supposed to work on a plain array rather than a linked list. One of the advantages of quicksort is its low memory footprint (because everything happens in place). With a linked list you're already incurring a memory overhead for all the next-pointers, and you're swapping those rather than the values.
If you need a stable O(n*log(n)) sort, use mergesort. (The best way to make quicksort stable, by the way, is to choose a median of random values as the pivot. This is not stable when all elements are equivalent, however.)
Quick sort is not stable. Here is a case where it's not stable:
5 5 4 8
Taking the 1st 5 as the pivot, we have the following after the 1st pass:
4 5 5 8
As you can see, the order of the 5's has changed. If we continue sorting, the 5's will keep this changed order in the sorted array.
From Wikipedia:
Quicksort is a comparison sort and, in efficient implementations, is not a stable sort.
One way to solve this problem is to not take the last element of the array as the key. Quick sort is a randomized algorithm; its performance depends highly on the selection of the key. Although the algorithm definition says to take the last or first element as the key, in reality we can select any element.
So I tried the median-of-3 approach: take the first, middle, and last elements of the array, sort them, and then use the middle one as the key.
For example, if my array is {9,6,3,10,15}, sorting the first, middle, and last elements gives {3,6,9,10,15}. Now use 9 as the key. Moving the key to the end gives {3,6,15,10,9}.
All we need to take care of is what happens if 9 occurs more than once, i.e. the key itself occurs more than once.
In that case, after selecting the key at the middle index, scan the elements between the key and the right end; if any element equal to the key is found (i.e. another 9 between the middle position and the end), make that 9 the key.
Then, in the region of elements greater than 9 (the j loop), if any 9 is found, swap it into the region of elements less than the key (the i region). Your array will then be stably sorted.