Quicksort - conditions that make it stable - sorting

A sorting algorithm is stable if it preserves the relative order of any two elements with equal keys. Under which conditions is quicksort stable?
Quicksort is stable when no item is moved past another item unless it has a smaller key.
What other conditions make it stable?

Well, it is quite easy to make a stable quicksort that uses O(N) space rather than the O(log N) that an in-place, unstable implementation uses. Of course, a quicksort that uses O(N) space doesn't have to be stable, but it can be made to be so.
I've read that it is possible to make an in-place, stable quicksort that uses O(log N) memory, but it ends up being significantly slower (and the details of the implementation are kind of beastly).
Of course, you can always just go through the array being sorted and add to each element an extra key that records its place in the original array. Then the quicksort will be stable, and you just go through and remove the extra key at the end.

The condition is easy to figure out: just keep the original order of equal elements. There is no other condition that differs essentially from this one. The point is how to achieve it.
Here is a good example.
1. Use the middle element as the pivot.
2. Create two lists, one for smaller, the other for larger.
3. Iterate from the first element to the last and distribute the elements into the two lists: append an element that is smaller than the pivot, or equal to it and ahead of it, to the smaller list; append an element that is larger than the pivot, or equal to it and behind it, to the larger list. Then recurse on the smaller list and the larger list.
https://www.geeksforgeeks.org/stable-quicksort/
The latter two points are both necessary. Using the middle element as the pivot is optional. If you choose the last element as the pivot, just append all equal elements to the smaller list one by one from the beginning, and they will naturally keep their original order.
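As a rough illustration of those three steps (not taken from the linked article; for real records you would compare sort keys rather than whole elements, and the function name is made up), a Python sketch might look like this:

    def stable_quicksort(items):
        # Minimal sketch: middle element as pivot, two auxiliary lists,
        # and equal elements routed by their position relative to the
        # pivot so the original order is preserved.
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        pivot = items[mid]
        smaller, larger = [], []
        for i, x in enumerate(items):
            if i == mid:
                continue
            # Equal elements that appeared before the pivot go to the
            # smaller list; equal elements after it go to the larger list.
            if x < pivot or (x == pivot and i < mid):
                smaller.append(x)
            else:
                larger.append(x)
        return stable_quicksort(smaller) + [pivot] + stable_quicksort(larger)

This uses O(N) extra space for the two lists, which is exactly the trade-off discussed above.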

You can add these to a list, if that's your approach:
When the elements have absolute ordering
When the implementation takes O(N) time to note the relative orderings and restore them after the sort
When the pivot chosen is ensured to be of a unique key, or the first occurrence of its key in the current sublist.

Related

Quicksort to already sorted array

In this question: https://www.quora.com/What-is-randomized-quicksort
Alejo Hausner said, regarding the cost of quicksort in the worst case, that
Ironically, if you apply quicksort to an array that is already sorted, you will probably get this costly behavior
I cannot get it. Can someone explain it to me?
https://www.quora.com/What-will-be-the-complexity-of-quick-sort-if-array-is-already-sorted may be an answer to this, but it did not give me a complete explanation.
The Quicksort algorithm is this:
select a pivot
move elements smaller than the pivot to the beginning, and elements larger than pivot to the end
now the array looks like [<=p, <=p, <=p, p, >p, >p, >p]
recursively sort the first and second "halves" of the array
Quicksort will be efficient, with a running time close to n log n, if the pivot always ends up close to the middle of the array. This works perfectly if the pivot is the median value. But selecting the actual median would be costly in itself. If the pivot happens, out of bad luck, to be the smallest or largest element in the array, you'll get an array like this: [p, >p, >p, >p, >p, >p, >p]. If this happens too often, your "quicksort" effectively behaves like selection sort. In that case, since the size of the subarray to be recursively sorted only shrinks by 1 at every iteration, there will be n levels of recursion, each one costing n operations, so the overall complexity will be n^2.
Now, since we're not willing to use costly operations to find a good pivot, we might as well pick an element at random. And since we also don't really care about any kind of true randomness, we can just pick an arbitrary element from the array, for instance the first one.
If the array was shuffled uniformly at random, then picking the first element is great. You can reasonably hope it will regularly give you an "average" element. But if the array was already sorted... Then by definition the first element is the smallest. So we're in the bad case where the complexity is n^2.
A simple way to avoid "bad lists" is to pick a true random element instead of an arbitrary element. Or if you have reasons to believe that quicksort will often be called on lists that are almost sorted, you could pick the element in position n/2 instead of the one in position 1.
There are also several research papers about different ways to select the pivot, with precise calculations on the impact on complexity. For instance, you could pick three random elements, rank them from smallest to largest and keep the middle one. But the conclusion usually is: if you try to write a better pivot-selection, then it will also be more costly, and the overall complexity of the algorithm won't be improved that much.
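For reference, the "three random elements, keep the middle one" idea might be sketched like this (illustrative only; the helper name is made up, and lo and hi are inclusive bounds of the current subarray):

    import random

    def median_of_three_pivot(arr, lo, hi):
        # Fall back to the first element when the range is too small to
        # sample three distinct positions.
        if hi - lo < 2:
            return arr[lo]
        # Sample three distinct positions and return the median value.
        i, j, k = random.sample(range(lo, hi + 1), 3)
        return sorted([arr[i], arr[j], arr[k]])[1]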
Depending on the implementation there are several 'common' ways to choose the pivot.
In general, for 'unsorted' input there is no good or bad way to choose it.
So some implementations just take the first element as pivot.
In the case of an already sorted input this results in the worst pivot possible, because one of the two intervals will always be empty.
-> recursion depth = O(n) instead of the desired O(log n).
This leads to O(n²) complexity, which is very bad for sorting.
Choosing the pivot at random avoids this behavior. It is extremely unlikely that a randomly chosen pivot will have the same bad characteristics in every recursion step as described above.
Also, a deliberately bad input cannot be constructed, because you cannot predict the choices of the random generator (if it is a good one).
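A minimal sketch of that randomization, assuming a partition scheme that otherwise uses the first element of the current range as the pivot (the helper name is made up):

    import random

    def choose_random_pivot(arr, lo, hi):
        # Swap a uniformly random element into the first position before
        # partitioning, so an already-sorted input is no longer a
        # guaranteed worst case for a first-element-pivot quicksort.
        r = random.randint(lo, hi)
        arr[lo], arr[r] = arr[r], arr[lo]
        return arr[lo]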

introduction to algorithms exercise 8.3-2 understanding stability

When I read the answer at
http://clrs.skanev.com/08/03/02.html for exercise 8.3-2,
I could not understand how the index is specifically used to solve it.
Could someone please show it step by step, or explain why the extra cost is Θ(n)?
and here is the question and answer:
Which of the following sorting algorithms are stable: insertion sort, merge sort, heapsort, and quicksort? Give a simple scheme that makes any sorting algorithm stable. How much additional time and space does your scheme entail?
Stable: Insertion sort, merge sort
Not stable: Heapsort, quicksort
We can make any algorithm stable by mapping the array to an array of pairs, where the first element in each pair is the original element and the second is its index. Then we sort lexicographically. This scheme takes additional Θ(n) space.
In the context of sorting, "stable" means that when a collection containing some elements with equivalent value is sorted, those elements stay in the same order with respect to each other.
So a sorting algorithm can be made stable by storing the original index of each element, and using that index as a secondary way of sorting elements with equal primary value.
To implement this, the comparison function (for example <) would be implemented so that A < B returns true if A.PrimarySortValue < B.PrimarySortValue, returns (A.OriginalIndex < B.OriginalIndex) when A.PrimarySortValue == B.PrimarySortValue, and otherwise (when A.PrimarySortValue > B.PrimarySortValue) returns false.
This requires one additional OriginalIndex value to be stored per element. There are n elements hence Θ(n) extra space is required.
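A small Python sketch of that scheme (function names are made up; Python's built-in sort happens to be stable already, so this is purely to illustrate the comparison rule and the Θ(n) of extra index storage):

    from functools import cmp_to_key

    def stable_compare(a, b):
        # a and b are (value, original_index) pairs; the original index
        # breaks ties between equal primary values, exactly as described
        # above with PrimarySortValue and OriginalIndex.
        (a_val, a_idx), (b_val, b_idx) = a, b
        if a_val < b_val:
            return -1
        if a_val > b_val:
            return 1
        return -1 if a_idx < b_idx else (1 if a_idx > b_idx else 0)

    def stable_sort_with_index(arr):
        # Decorate each element with its original index (Theta(n) extra
        # space), sort with any comparison sort, then strip the index.
        decorated = [(value, index) for index, value in enumerate(arr)]
        decorated.sort(key=cmp_to_key(stable_compare))
        return [value for value, _ in decorated]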

Finding time and space complexity

I have been learning about sorting.
Most sorting algorithms (merge, quick, etc.) operate on arrays.
I was wondering what would happen if I did not sort the array in place.
An algorithm I thought of is
Iterate through each element in array - O(n).
For each element, compare the element with starting and ending element of a doubly linked list.
Add the element to correct position in linked list. (Start iterating from start/end of list based on which one would be faster).
When all elements in the original array are sorted, create a background thread which copies them into an array. Until the copy is done, return the element at a given index by iterating over the list.
When the copy is done, return elements through the array index.
Now, what would be time complexity of this and how do I calculate it?
Let's go through everything one step at a time.
Iterate through each element in array - O(n).
Yep!
For each element, compare the element with starting and ending element of a doubly linked list.
Add the element to correct position in linked list. (Start iterating from start/end of list based on which one would be faster).
Let's suppose that the doubly-linked list currently has k elements in it. Unfortunately, just by looking at the front and back element of the list, you won't be able to tell where in the list the element is likely to go. It's quite possible that your element is closer in value to the front element of the list than the back, but would actually belong just before the back element. You also don't have random access in a linked list, so in the worst case you may have to scan all k elements of the linked list trying to find the spot where this element belongs. That means that the work done is in the worst case going to be O(k). Now, each iteration of the algorithm increases k (the number of elements in the list) by one, so the work done is in the worst case 1 + 2 + 3 + ... + n = Θ(n^2).
When all elements in the original array are sorted, create a background thread which copies them into an array. Until the copy is done, return the element at a given index by iterating over the list.
When the copy is done, return elements through the array index.
This is an interesting idea and it's hard to measure the complexity. If the background thread gets starved out or is really slow, then the cost of looking up any element will be O(n) in the worst case because you may have to scan over half the elements in the list to find the one you're looking for.
In total, your algorithm runs in time O(n^2) and uses Θ(n) memory. It's essentially a variant of insertion sort (as @Yu Hao pointed out) and, in practice, I'd expect that this would be substantially slower than just using a standard O(n log n) sorting algorithm, or even an in-place insertion sort, due to the extra memory overhead and poor locality of reference afforded by linked lists.
The algorithm you describe is basically a variant of insertion sort.
The major reason for using a linked list here is to avoid the extra swaps of elements in arrays. Comparing elements with both the head and the tail of the doubly linked list provides a minor performance improvement, if any.
The time complexity is still O(N^2) for random input.
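For concreteness, a Python sketch of the algorithm described in the question (simplified to scan from the head only; the full scheme would also walk backward from the tail when the element is closer to the current maximum; names are illustrative):

    class Node:
        def __init__(self, value):
            self.value = value
            self.prev = None
            self.next = None

    def linked_list_insertion_sort(arr):
        # Insert each array element into a sorted doubly linked list,
        # then copy the list back out. Worst case is still O(n^2).
        head = tail = None
        for x in arr:
            node = Node(x)
            if head is None:
                head = tail = node
            elif x <= head.value:            # new minimum: prepend
                node.next, head.prev = head, node
                head = node
            elif x >= tail.value:            # new maximum: append
                node.prev, tail.next = tail, node
                tail = node
            else:
                cur = head
                while cur.value < x:         # find the insertion point
                    cur = cur.next
                node.prev, node.next = cur.prev, cur
                cur.prev.next = node
                cur.prev = node
        result, cur = [], head
        while cur is not None:               # copy the list back out
            result.append(cur.value)
            cur = cur.next
        return result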

QuickSort for sorting part of Mergesort?

Ques: Mergesort divides a list of numbers into two halves and calls itself recursively on both of them. Instead can you perform quicksort on the left half and mergesort on the right half? If yes, show how it will sort the following list of numbers by showing every step. If no, explain why you cannot.
I am supposed to sort a list of numbers using mergesort, where the left half is to be sorted using quicksort.
I figured it out.
Ans: Yes, we can.
Sort the right half of the array using mergesort.
Sort the left half using quicksort.
Merge the two using the merge function of mergesort.
Yes, you can do this. The basic idea behind mergesort is the following:
Split the array into two (or more) pieces.
Sort each piece independently.
Apply a merge step to combine the sorted pieces into one overall sorted list.
From the perspective of correctness, it doesn't actually matter how you sort the lists generated in part (2). All that matters is that those lists get sorted. A typical implementation of mergesort does step (2) by recursively applying itself to the left and right halves, but there's no fundamental reason you have to do this. (In fact, in some optimized versions of mergesort, you specifically don't do this and instead switch to an algorithm like insertion sort when the arrays get sufficiently small).
In your case, you are correct that using quicksort on the left and mergesort on the right would still produce a sorted sequence. However, the way in which it would work would look quite different from what you're describing. What would end up happening is something like this: the first half of the array would get quicksorted (because you quicksort the left half), then you'd recursively sort the right half. The first half of that would get quicksorted, then you'd recursively sort the right half. The first half of that would get quicksorted, etc. Overall this would look something like this:
You quicksort the first half of the array, then the first half of what's left, then the first half of what's left, etc. until there are no elements left.
Then, working from left to right, you'd merge the last two elements together, then the last four, then the last eight, etc.
This would be a pretty cool-looking sort, but doing it by hand would be a total pain. You might be better off writing a program that just does this and showing all the intermediate steps. :-)
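If it helps to see the pattern in code, here is a minimal sketch (function names are made up, and a throwaway out-of-place quicksort stands in for whatever variant you would actually use):

    def merge(left, right):
        # Standard merge step from mergesort.
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        return result + left[i:] + right[j:]

    def quicksort(arr):
        # Simple out-of-place quicksort used on the left half.
        if len(arr) <= 1:
            return arr
        pivot = arr[len(arr) // 2]
        return (quicksort([x for x in arr if x < pivot])
                + [x for x in arr if x == pivot]
                + quicksort([x for x in arr if x > pivot]))

    def hybrid_sort(arr):
        # Quicksort the left half, recursively hybrid-sort the right
        # half, then merge the two sorted halves.
        if len(arr) <= 1:
            return arr
        mid = len(arr) // 2
        return merge(quicksort(arr[:mid]), hybrid_sort(arr[mid:]))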
No, you cannot do it. At least if you still want to call it "merge sort". The most fundamental difference between merge sort and quick sort is that the first is a stable algorithm, i.e. equally ordered elements keep their relative positions unaltered after sorting. This is important in many scenarios.
If you sort one of the halves using quicksort, the relative position of equal elements can (and very likely will) change. The result will not preserve stability, so it can't still be considered merge sort.
By the way, the previous answer is correct regarding insertion sort being used inside merge sort: most efficient merge sort implementations will switch to something like insertion sort when the number of elements is small. Insertion sort is also stable, which is why it can be used without breaking merge sort's stability.

sorting algorithm suitable for a sorted list

I have a sorted list at hand. Now I add a new element to the end of the list. Which sorting algorithm is suitable for such a scenario?
Quicksort has a worst-case time complexity of O(n^2) when the list is already sorted. Does this mean the time complexity, if quicksort is used in the above case, will be close to O(n^2)?
If you are adding just one element, find the position where it should be inserted and put it there. For an array, you can do a binary search in O(log N) time and insert in O(N). For a linked list, you'll have to do a linear search, which will take O(N) time, but then insertion is O(1).
As for your question on quicksort: if you choose the first value as your pivot, then yes, it will be O(N^2) in your case. Choose a random pivot and your case will still be O(N log N) on average. However, the method I suggest above is both easier to implement and faster in your specific case.
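In Python, for instance, the standard bisect module does exactly this on a plain list (the values below are just for illustration):

    import bisect

    sorted_list = [1, 2, 3, 4, 7, 8, 9]
    # Binary search finds the insertion point in O(log N); the insert
    # itself costs O(N) for an array-backed list.
    bisect.insort(sorted_list, 5)
    print(sorted_list)   # [1, 2, 3, 4, 5, 7, 8, 9]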
It depends on the implementation of the underlying list.
It seems to me that insertion sort will fit your needs, except in the case where the list is implemented as an array list. In that case too many moves will be required.
Rather than appending to the end of the list, you should do an insert operation.
That is, when adding 5 to [1,2,3,4,7,8,9] you'd want to do the "insert" by putting it where it belongs in the sorted list, instead of adding it at the end and then re-sorting the whole list.
You can quickly find the position to insert the item by using a binary search.
This is basically how insertion sort works, except it operates on the entire list. This method will have better performance than even the best sorting algorithm, for a single item. It may also be faster than appending at the end of the list, depending on your implementation.
I'm assuming you're using an array, since you talk about quicksort, so just adding an element would involve finding the place to insert it (O(log n)) and then actually inserting it (O(n)) for a total cost of O(n). Just appending it to the end and then resorting the entire list is definitely the wrong way to go.
However, if this is to be a frequent operation (i.e. if you have to keep adding elements while maintaining the sorted property) you'll incur an O(n^2) cost of adding another n elements to the list. If you change your representation to a balanced binary tree, that drops to O(n log n) for another n inserts, but finding an element by index will become O(n). If you never need to do this, but just iterate over the elements in order, the tree is definitely the way to go.
Of possible interest is the indexable skiplist which, for a slight storage cost, has O(log n) inserts, deletes, searches and lookups-by-index. Give it a look, it might be just what you're looking for here.
What exactly do you mean by "list" ? Do you mean specifically a linked list, or just some linear (sequential) data structure like an array?
If it's linked list, you'll need a linear search for the correct position. The insertion itself can be done in constant time.
If it's something like an array, you can add to the end and sort, as you mentioned. A sorted collection is only bad for quicksort if the quicksort is really badly implemented. If you select your pivot with the typical median-of-3 algorithm, a sorted list will give optimal performance.
