Clarify the swap criteria in a Quicksort?

In a quicksort, the idea is that you keep selecting a pivot, and you swap a value you find on the left that is greater than the pivot with a value you find on the right that is less than the pivot (see: ref).
Just want to be 100% sure what happens in the following cases:
No value on left greater than pivot, value on right less than pivot
Value on left greater than pivot, no value on right less than pivot
No value on left greater than pivot, no value on right less than pivot

While the choice of pivot value matters for performance, it does not matter for correctness: any value taken from the array leads to a correctly sorted result.
Once you've chosen a pivot, you move all values smaller than or equal to it to its left, and all values greater than it end up to its right.
After all these moves, the pivot value is in its final position.
Then you recursively repeat the above procedure for the sub-array to the left of the pivot and for the sub-array to the right of it. Of course, if a sub-array has 0 or 1 elements, there is nothing to sort.
So in this way you end up choosing a bunch of pivot values which get into their final positions after all the moves. Between those pivot values are empty or single-element sub-arrays that don't need sorting as described previously.
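A minimal sketch of the procedure just described. It builds new lists instead of moving values in place, and it always takes the first element as the pivot; both are simplifications chosen for readability, not part of the answer.

```python
def quicksort(values):
    """Return a sorted copy of `values` (not in place, for clarity)."""
    if len(values) <= 1:            # 0 or 1 elements: nothing to sort
        return list(values)
    pivot = values[0]               # any element gives a correct result
    rest = values[1:]
    smaller_or_equal = [v for v in rest if v <= pivot]
    greater = [v for v in rest if v > pivot]
    # The pivot sits between the two parts: this is its final position.
    return quicksort(smaller_or_equal) + [pivot] + quicksort(greater)


print(quicksort([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]
```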

The swap criteria depend on the implementation. What happens in the three cases you mention depends on the partitioning scheme. There are many implementations of Quicksort, but the two best-known ones (in my opinion) are:
Hoare's partition: The first element is the pivot, and two index variables (i and j) walk the array (a[]) toward the center as long as the elements they encounter are less than / greater than the pivot, respectively. Then a[i] and a[j] are swapped. Note that in this implementation the swap also happens for elements that are equal to the pivot. This matters when the array contains many identical entries, because it keeps the two partitions balanced. After i and j cross, a[0] is swapped with a[j], so the pivot ends up between the smaller-or-equal partition and the larger-or-equal partition.
Lomuto's partition: This is the one implemented in pseudocode in the current Wikipedia quicksort entry under "In-place version". Here the pivot can be any element (say a median, or a median of three); it is first swapped with the last element of a. Only one index, j, walks toward the end of the array: whenever a[j] <= pivot, i is incremented and a[i] is swapped with a[j], so a[lo..i] always holds the elements seen so far that are <= the pivot. At the end, the pivot is swapped with a[i+1].
(See here, for instance, for an illustration.)
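A sketch of both schemes, assuming inclusive bounds lo..hi and the pivot positions described above (a[lo] for Hoare, a[hi] for Lomuto, i.e. after any swap to the end). The function names are our own. For this Hoare variant it also settles the three cases in the question: if either scan finds no qualifying element, the indices meet or cross, the loop ends without a swap, and the final exchange moves the pivot to index j.

```python
def partition_hoare(a, lo, hi):
    """Hoare-style partition with pivot a[lo]; returns the pivot's final index.

    Both scans stop on elements EQUAL to the pivot, so equal keys get swapped.
    """
    pivot = a[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while a[i] < pivot and i < hi:  # left scan: stop at an element >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:             # right scan: stop at an element <= pivot
            j -= 1                      # (a[lo] == pivot guarantees a stop)
        if i >= j:                      # scans met or crossed: no swap
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]           # place the pivot between the two parts
    return j


def partition_lomuto(a, lo, hi):
    """Lomuto partition with pivot a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo - 1                          # a[lo..i] holds elements <= pivot
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1


def quicksort_inplace(a, partition, lo=0, hi=None):
    """Sort a[lo..hi] in place; the pivot index is excluded from recursion."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort_inplace(a, partition, lo, p - 1)
        quicksort_inplace(a, partition, p + 1, hi)


for part in (partition_hoare, partition_lomuto):
    data = [3, 7, 1, 3, 9, 3, 2]
    quicksort_inplace(data, part)
    print(part.__name__, data)          # both print [1, 2, 3, 3, 3, 7, 9]
```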
Robert Sedgewick champions a three-way partitioning scheme, in which the array is divided into three partitions: less than, equal to, and greater than the pivot. The claim is that it performs better on arrays with lots of duplicate values. It is implemented differently again (see the link above).
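A sketch of three-way partitioning in the style Sedgewick describes (often credited to Dijkstra's Dutch national flag problem), assuming a[lo] as the pivot; the name quicksort_3way is ours. The block of keys equal to the pivot is never touched again, which is why many duplicates help rather than hurt.

```python
def quicksort_3way(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using three-way partitioning around a[lo]."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot.
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # The equal block a[lt..gt] is in its final place; recurse on the rest.
    quicksort_3way(a, lo, lt - 1)
    quicksort_3way(a, gt + 1, hi)


data = [2, 5, 2, 1, 5, 2, 9, 0]
quicksort_3way(data)
print(data)  # [0, 1, 2, 2, 2, 5, 5, 9]
```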

Related

If the array has an even number of elements, which value becomes the "perfect" pivot for Quicksort?

Let's say we have an array consisting of -5, 1, -9, -3. I assume that the perfect pivot is the one that partitions the array into two equal subarrays, but that's only possible for arrays with an odd number of elements. As I understand it, the perfect pivot in this case would be either -5 or -3, since they are not the elements with the lowest or highest value?
In many cases, multiple pivots will be equally good. Further, if one uses a three-way subdivision where all items equal to the pivot get grouped in the middle, the optimal choice of pivot would depend upon more than just the number of elements above and below the pivot, but also their value distribution. For example, given [1,2,2,2,3,4,5,6,7] the median value is 3, but pivoting around 3 would require one pivoting operation on the left and two on the right, while pivoting around 4 would result in only one additional operation being needed on each side.

Why does quicksort exclude the middle element but mergesort includes it?

I was going over the implementation of quicksort (from CLRS 3rd Edition). I found that the recursive divide of the array goes from the low index to middle-1 and then again from middle+1 to high.
QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q - 1)
        QUICKSORT(A, q + 1, r)
And the implementation of the merge sort is given as follows:
MERGE-SORT(A, p, r)
    if p < r
        q = floor((p + r) / 2)
        MERGE-SORT(A, p, q)
        MERGE-SORT(A, q + 1, r)
        MERGE(A, p, q, r)
Since both of them use the same divide strategy, why does quicksort exclude the middle element (the ranges p to q-1 and q+1 to r do not include q) while merge sort includes it?
Quicksort puts all the elements smaller than the pivot on one side and all elements bigger on the other side. After this step we know the final position of the pivot will be between those two, and that's where we put it, so we don't need to look at it again.
Thus we can exclude the pivot element in the recursive calls.
Mergesort just picks the middle position and doesn't do anything with that element until later. There's no guarantee that the element in that position will already be in the right place, thus we need to look at that element again later on.
Thus we must include the middle element in the recursive calls.
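A Python transcription of the two CLRS procedures, assuming 0-based inclusive bounds and the CLRS Lomuto-style PARTITION with A[r] as the pivot (the question does not show PARTITION, so that part is an assumption). The comments mark where q is excluded and where it is included.

```python
def partition(A, p, r):
    """CLRS-style partition: pivot A[r]; returns the pivot's final index q."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)   # A[q] is now in its final position...
        quicksort(A, p, q - 1)   # ...so q is excluded from both calls
        quicksort(A, q + 1, r)


def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r]."""
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1


def merge_sort(A, p, r):
    if p < r:
        q = (p + r) // 2         # A[q] is NOT yet in its final position...
        merge_sort(A, p, q)      # ...so q is included in the left call
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)


A = [5, 2, 9, 1, 7, 3]
print(partition(A, 0, 5), A)  # 2 [2, 1, 3, 5, 7, 9]
```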
Both methods use a divide-and-conquer strategy, but in different ways.
Mergesort (in its most common implementation) recursively divides the array into parts of equal size (where possible); the middle indexes are fixed positions for a given array length. The recursive calls process the left part and the right part of the array in full.
Quicksort's partition subroutine places the pivot element in its final position (in most cases the pivot index is not the middle). There is no need to process this element further, so the recursive calls handle the pieces before and after it.

Quicksort: pivot position after one partition

I am reading about quicksort, looking at different implementations and I am trying to wrap my head around something.
In this implementation (which of course works), the pivot is chosen as the middle element, and then the left and right pointers move rightward and leftward respectively, swapping elements to partition around the pivot.
I was trying the array [4, 3, 2, 6, 8, 1, 0].
On the first partition, the pivot is 6 and all the elements to its left are already smaller than 6, so the left pointer stops at the pivot. On the right side, we swap 0 with 6, and then 1 with 8, so at the end of the first iteration the array looks like:
[4, 3, 2, 0, 1, 8, 6].
However, I was under the impression that after each iteration of quicksort, the pivot ends up in its rightful place, so here it should end up at position 5 of the array (counting from 0).
So, is it possible (and OK) for the pivot not to end up in its correct position during its iteration, or is there something obvious I am missing?
There are many possible variations of the quicksort algorithm. In this one, it is OK for the pivot not to be in its final place after its iteration.
The defining feature of every variation of the quicksort algorithm is that after the partition step we have a part at the beginning of the array where all elements are less than or equal to the pivot, and a non-overlapping part at the end of the array where all elements are greater than or equal to the pivot. There may also be a part between them where every element is equal to the pivot. This layout ensures that after we sort the left and right parts with recursive calls and leave the middle part intact, the whole array is sorted.
Note that, in general, elements equal to the pivot may go to any part of the array. A good implementation of quicksort, one that avoids quadratic time in the most obvious bad case (all elements equal), must spread the elements equal to the pivot sensibly between the parts.
Possible variants include:
The middle part includes only 1 element: the pivot. In that case, the pivot takes its final place in the array after the partition and is not used in the recursive calls. That's what you meant by the pivot taking its place in its iteration. With this approach, a good implementation must move about half of the elements equal to the pivot to the left part and the other half to the right part; otherwise an array of all-equal elements would take quadratic time.
There is no middle part. The pivot and all elements equal to it are spread between the left and right parts. That's what the implementation you linked does. Again, about half of the elements equal to the pivot should go to the left part and the other half to the right part. This can also be mixed with the first variant, depending on whether we are sorting an array with an odd or an even number of elements.
Every element equal to the pivot goes to the middle part; there are no elements equal to the pivot in either the left or the right part. This is quite efficient, and it is the example Wikipedia gives for solving the all-elements-equal problem: an array whose elements are all equal is sorted in linear time.
Thus, a correct and efficient implementation of quicksort is quite tricky (there is also the problem of choosing a good pivot, for which several approaches with different tradeoffs exist, and the optimization of switching to a different, non-recursive sorting algorithm for small sub-arrays).
Also, it seems that the implementation you linked to may make recursive calls on overlapping subarrays:
if (i <= j) {
exchange(i, j);
i++;
j--;
}
For example, when i equals j, that element is swapped with itself, and i becomes greater than j by 2. After that, 3 elements overlap between the ranges of the subsequent recursive calls. The code still seems to work correctly, though.
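A sketch of the middle-pivot partition traced in the question, built around the quoted `if (i <= j)` step. The recursion bounds used here (lo..j and i..hi) are an assumption, since the linked code is not reproduced in this thread, and the function names are ours. It reproduces the array [4, 3, 2, 0, 1, 8, 6] after the first partition, with the pivot 6 not yet at its final index 5, and the full sort still comes out right.

```python
def partition_middle(a, lo, hi):
    """Partition a[lo..hi] around the value of its middle element.

    Returns (i, j) with i > j; there is no separate middle part, so the
    pivot value may end up on either side.
    """
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:   # find an element >= pivot on the left
            i += 1
        while a[j] > pivot:   # find an element <= pivot on the right
            j -= 1
        if i <= j:            # the quoted step: swap, then move both pointers
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i, j


def quicksort_middle(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    i, j = partition_middle(a, lo, hi)
    if lo < j:
        quicksort_middle(a, lo, j)
    if i < hi:
        quicksort_middle(a, i, hi)


arr = [4, 3, 2, 6, 8, 1, 0]
print(partition_middle(arr, 0, 6), arr)  # (5, 4) [4, 3, 2, 0, 1, 8, 6]
```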

Big Shot IT company interview puzzle

This past week I attended interviews at a few big IT companies. One question left me a bit puzzled. Below is an exact description of the problem (from one of the interview-question websites).
Given the data set,
A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,E,A,B,A,C,A,B,A,D,A,B,A,C,A,B,A,F
which can be reduced to
(A; 16); (B; 8); (C; 4); (D; 2); (E; 1); (F; 1):
using the (value, frequency) format.
Given a total of m such tuples, stored in no particular order, devise an O(m) algorithm that returns the kth order statistic of the data set. Here m is the number of tuples, as opposed to n, the total number of elements in the data set.
You can use Quick-Select to solve this problem.
Naively:
Pick an element (called the pivot) from the array
Put things less than or equal to the pivot on the left of the array, those greater on the right.
If the pivot is in position k, then you're done. If it's greater than k, then repeat the algorithm on the left side of the array. If it's less than k, then repeat the algorithm on the right side of the array.
There are a couple of details:
You need to either pick the pivot randomly (if you're happy with an expected O(m) cost) or use a deterministic median-finding algorithm such as median of medians.
You need to be careful not to take O(m^2) time if there are lots of values equal to the pivot. One simple way to handle this is to do a second pass that splits the array into 3 parts rather than 2: those less than the pivot, those equal to the pivot, and those greater than the pivot.
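A sketch of the steps above adapted to (value, frequency) tuples, assuming a random pivot and a 1-based k. Positions are measured in elements (sums of frequencies) rather than tuple indices, and the three-way split takes care of repeated values. The function name kth_from_counts is hypothetical.

```python
import random


def kth_from_counts(pairs, k):
    """Return the k-th smallest element (k is 1-based) of the multiset
    described by (value, frequency) pairs, without expanding it.

    Randomized quickselect over the pairs: expected O(m) for m pairs.
    """
    pairs = list(pairs)
    total = sum(f for _, f in pairs)
    if not 1 <= k <= total:
        raise ValueError("k out of range")
    while True:
        pivot = random.choice(pairs)[0]
        # Three-way split by value, so equal values are handled in one pass.
        less = [(v, f) for v, f in pairs if v < pivot]
        greater = [(v, f) for v, f in pairs if v > pivot]
        less_count = sum(f for _, f in less)
        equal_count = sum(f for v, f in pairs if v == pivot)
        if k <= less_count:
            pairs = less                    # answer is among the smaller values
        elif k <= less_count + equal_count:
            return pivot                    # k falls inside the pivot's block
        else:
            k -= less_count + equal_count   # skip smaller and equal elements
            pairs = greater


data = [("C", 4), ("A", 16), ("F", 1), ("B", 8), ("E", 1), ("D", 2)]
print(kth_from_counts(data, 17))  # B
```

Each round scans only the surviving tuples, and as in ordinary quickselect the expected number of survivors shrinks geometrically, which is where the expected O(m) bound comes from.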

How to modify Lomuto partition scheme?

Lomuto partition is a simple partition algorithm used in quicksort. The Lomuto algorithm partitions the subarray A[left] ... A[right] and assumes A[left] to be the pivot. How can this algorithm be modified to partition A[left] ... A[right] using a given pivot P (which differs from A[left])?
Lomuto's partitioning algorithm depends on the pivot being the leftmost element of the subarray being partitioned. It can also be modified to use the rightmost element as the pivot instead; see, for instance, Chapter 7 of CLRS.
Using an arbitrary value as the pivot (say, something not in the subarray) would break a quicksort implementation, because there would be no guarantee that the partition makes the problem any smaller. Say you pivoted on zero but all N array entries were positive. Then your partition would give a zero-length array of elements <= 0 and an array of length N containing the elements >= 0 (which is all of them). You'd get an infinite loop trying to do quicksort in that case. The same goes for finding the median of the array with that modified form of Lomuto's partition. The partition depends critically on choosing an element from the array to pivot on; otherwise you lose the postcondition, which Lomuto's partition guarantees, that an element (the pivot) is fixed in its final place after the partition.
Lomuto's algorithm also depends critically on pivoting on an element in the first or last position of the subarray being partitioned. If you pivot on an element located at neither the very front nor the very end, maintaining the loop invariant that is the core of why Lomuto's partition works would be a nightmare.
You can pivot on a different element of the array by swapping it with the first element (or the last, if you implement it that way) as the first step. Check MIT's video lecture on quicksort for course 6.046J, where they discuss Lomuto's partitioning algorithm in depth (though they just call it Partition) along with a vanilla quicksort built on it, not to mention some great probability in the analysis of the expected running time of randomized quicksort:
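A sketch of the swap-first approach, assuming the pivot is given as an index p into A[left..right] and that the underlying Lomuto partition uses A[left] as the pivot, as in the question. The helper names partition_at and randomized_quicksort are hypothetical; the latter picks p uniformly at random with random.randint, which is inclusive on both ends.

```python
import random


def lomuto_partition(a, left, right):
    """Lomuto partition with pivot a[left]; returns the pivot's final index."""
    pivot = a[left]
    m = left                                # a[left+1..m] holds elements < pivot
    for i in range(left + 1, right + 1):
        if a[i] < pivot:
            m += 1
            a[m], a[i] = a[i], a[m]
    a[left], a[m] = a[m], a[left]           # pivot lands in its final place
    return m


def partition_at(a, left, right, p):
    """Partition around the element at index p by first swapping it to a[left]."""
    a[left], a[p] = a[p], a[left]
    return lomuto_partition(a, left, right)


def randomized_quicksort(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left < right:
        q = partition_at(a, left, right, random.randint(left, right))
        randomized_quicksort(a, left, q - 1)
        randomized_quicksort(a, q + 1, right)


arr = [7, 2, 9, 4, 1, 8]
print(partition_at(arr, 0, 5, 3), arr)  # 2 [1, 2, 4, 7, 9, 8]
```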
http://www.youtube.com/watch?v=vK_q-C-kXhs
CLRS and Programming Pearls both have great sections on quicksort if perhaps you're stuck using an inferior book for an algorithms class or something.
It depends on how you define P: is P an index or a particular value?
If it is an index, it is easy; you modify your two passes:
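...
pivot = a[p]          // save the value: swaps may move the element at index p
i = left
j = right
while (i <= j) {
    while (a[i] < pivot) i++
    while (a[j] > pivot) j--
    if (i <= j) {
        swap(a, i, j)
        i++
        j--
    }
}
qsort(a, left, j)
qsort(a, i, right)
...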
If P is not an index but a particular value, you would need to search for it first, and only then do the above with the resulting index. Because the array is not sorted yet, you can only search linearly. You could also come up with a cleverer scheme (a hashtable) for finding your pivot P, but I don't see why you would need to do such a thing.
