Quicksort: pivot position after one partition - algorithm

I am reading about quicksort, looking at different implementations and I am trying to wrap my head around something.
In this implementation (which of course works), the pivot is chosen as the middle element and then the left and right pointer move to the right and left accordingly, swapping elements to partition around the pivot.
I was trying the array [4, 3, 2, 6, 8, 1, 0].
On the first partition, pivot is 6 and all the left elements are already smaller than 6, so the left pointer will stop at the pivot. On the right side, we will swap 0 with 6, and then 1 and 8, so at the end of the first iteration, the array will look like:
[4, 3, 2, 0, 1, 8, 6].
However, I was under the impression that after each iteration in quicksort, the pivot ends up in its rightful place, so here it should end up in position 5 of the array.
So, it is possible (and ok) that the pivot doesn't end up in its correct iteration or is it something obvious I am missing?

There are many possible variations of the quicksort algorithm. In this one it is OK for the pivot to be not in its correct place in its iteration.
The defining feature of every variation of the quicksort algorithm is that after the partition step, we have a part in the beginning of the array, where all the elements are less or equal to pivot, and a non-overlapping part in the end of the array where all the elements are greater or equal to pivot. There may also be a part between them, where every element is equal to pivot. This layout ensures, that after we sort the left part and the right part with recursive calls, and leave the middle part intact, the whole array will be sorted.
Notice, that in general elements equal to pivot may go to any part of the array. A good implementation of quicksort, that avoids quadratic time for the most obvious case, i.e. all equal elements, must spread elements equal to pivot between parts rationally.
Possible variants include:
The middle part includes only 1 element: the pivot. In that case pivot takes its final place in the array after the partition and won't be used in the recursive calls. That's what you meant by pivot taking its place in its iteration. For this approach the good implementation must move about half the elements equal to pivot to the left part and the other half to the right part, otherwise we would have quadratic time for an array with all equal elements.
There is no middle part. Pivot and all elements equal to it are spread between the left and the right part. That's what the implementation you linked does. Once again, in this approach about half of the elements equal to pivot should go to the left part, and the other half to the right part. This can also be mixed with the first variation, depending on whether we are sorting an array with an odd or an even number of elements.
Every element equal to pivot goes to the middle part. There are no elements equal to pivot in either left or right part. That's quite efficient and that's the example Wikipedia gives for solving the all-elements-equal problem. Arrays with all elements equal to each other are sorted in linear time in that case.
Thus, the correct and efficient implementation of quicksort is quite tricky (there is also a problem of choosing a good pivot, for which several approaches with different tradeoffs exist as well; or an optimisation of switching to another non-recursive sorting algorithm for smaller sub-array sizes).
Also, it seems that the implementation you linked to, may do recursive calls on overlapping subarrays:
if (i <= j) {
exchange(i, j);
i++;
j--;
}
For example, when i is equal to j, those elements will be swapped, and i will become greater than j by 2. After that 3 elements will overlap between the ranges of the following recursive calls. The code still seems to work correctly though.

Related

Why does quicksort exclude the middle element but mergesort includes it?

I was going over the implementation of quicksort (from CLRS 3rd Edition). I found that the recursive divide of the array goes from the low index to middle-1 and then again from middle+1 to high.
QUICKSORT(A,p,r)
1 if(p < r)
2 q = PARTITION(A,p,r)
3 QUICKSORT(A,p,q-1)
4 QUICKSORT(A,q+1,r)
And the implementation of the merge sort is given as follows:
MERGE-SORT(A,p,r)
1 if(p < r)
2 q = (p+r)/2 (floor)
3 MERGE-SORT(A,p,q)
4 MERGE-SORT(A,q+1,r)
5 MERGE(A,p,q,r)
As both of them use the divide strategy to be the same, why does quicksort ignore the middle elements as going from 0 to q-1 and q+1 to r does not have q included in it while the mergesort has?
Quicksort puts all the elements smaller than the pivot on one side and all elements bigger on the other side. After this step we know the final position of the pivot will be between those two, and that's where we put it, so we don't need to look at it again.
Thus we can exclude the pivot element in the recursive calls.
Mergesort just picks the middle position and doesn't do anything with that element until later. There's no guarantee that the element in that position will already be in the right place, thus we need to look at that element again later on.
Thus we must include the middle element in the recursive calls.
Both methods exploit divide strategy but in different ways
Mergesort (the most common implementation) divides array recursively into equal (if possible) size parts, middle indexes are fixed positions (for given array length). Recurive calls treat left part and right part of array completely.
Quicksort partition subroutine places pivot element in the needed (final) position (in most cases pivot index is not middle). There is no need to treat this eleьent further, and recursive calls treat pieces before and after that element.

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution of using the XOR operator on the binary representation of the integers as explained Here, and at the end, the binary string will represent the element which appears only once because duplicates will cancel out. But it takes O(n) time. How can we do better than that?
using Haroon S' comment this is the solution which I think is correct, given the constraints for time.
class Solution:
def singleNonDuplicate(self, nums: List[int]) -> int:
low = 0
high = len(nums)-1
while(low<high):
mid = (low+high)//2
if(mid%2==0):
mid+=1
if(nums[mid]==nums[mid+1]):
# answer in second half
high = mid-1
elif(nums[mid]==nums[mid-1]):
# answer in first half
low = mid+1
return nums[low]
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that, "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half". (i.e., with the exception of the last index on the left half-array being even; but you can overcome that case with various O(1) time operations)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For the demonstration of the index notion I mentioned above, see your own example. In the left of the unique element(i.e. 3) all index pairs (2i, 2i+1) have the same values. And all subarrays starting from index 0 and ending with an index that is to the right of the unique element, all index pairs (2i, 2i+1) have a correspond to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. Which means, the missing elements is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j, representing sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k] x[k:l], x[l:j] where all elements in the first part are smaller than the pivot (whatever it is) all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(you might be able to use a version that only partitions in two, or explicitly count the number of pivots, but with this version is easier to work with here)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
This solution will find the element in the array that appeared only once but there should not be more than one element of that type and the array should be sorted. This is Binary Search and will return the element in O(log n) time.
var singleNonDuplicate = function(nums) {
let s=0,e= nums.length-1
while(s < e){
let mid = Math.trunc(s+(e-s)/2)
if((mid%2 == 0&& nums[mid] ==nums[mid+1])||(mid%2==1 && nums[mid] == nums[mid-1]) ){
s= mid+1
}
else{
e = mid
}
}
return nums[s] // can return nums[e] also
};
I don't believe there is a O(log n) solution for that. The reason is that in order to find which element is appearing only once, you at least need to iterate over the elements of that array once.

Quick sorting algorithm states using middle element as pivot

I need help understanding exactly how the quick sort algorithm works. I've been watching teaching videos and still fail to really grasp it completely.
I have an unsorted list: 1, 2, 9, 5, 6, 4, 7, 8, 3
And I have to quick sort it using 6 as the pivot.
I need to see the state of the list after each partition procedure.
My main problem is understanding what the order of the elements are before and after the pivot. So in this case if we made 6 the pivot, I know the numbers 1 - 5 will be before 6 and 7 - 9 will go after that. But what will the order of the numbers 1 - 5 be and 7 - 9 be in the first partition given my list above?
Here is the partition algorithm that I want to use (bear in my I'm using the middle element as my initial pivot):
Determine the pivot, and swap the pivot with the first element of the list.
Suppose that the index smallIndex points to the last element smaller than the pivot. The index smallIndex is initialized to the first element of the list.
For the remaining elements in the list (starting at the second element)
If the current element is smaller than the pivot
a. Increment smallIndex
b. Swap the current element with the array element pointed to by smallIndex.
Swap the first element, that is the pivot, with the array element pointed to by smallIndex.
It would be amazing if anyone could show the list after each single little change that occurs to the list in the algorithm.
It doesn't matter.
All that matters - all that the partitioning process asserts - is that, after it has been run, there are no values on the left-hand side of the center point that emerges that are greater than the pivot and that there are no values on the right-hand side that are less than the pivot value.
The internal order of the two partitions is then handled in the subsequent recursive calls for each half.

Clarify the swap criteria in a Quicksort?

In a quicksort, the idea is you keep selecting a pivot. And you swap the a value you find on the left that is greater than the pivot with a value you find on the right which is less than the pivot. see: ref
Just want to be 100% sure what happens in the following cases:
No value on left greater than pivot, value on right less than pivot
Value on left greater than pivot, no value on right less than pivot
No value on left greater than pivot, no value on right less than pivot
While the choice of the pivot value is important for performance, it is unimportant for sorting.
Once you've chosen some value as the pivot, you then move all values smaller than or equal to the pivot to the left of the pivot and to the right of it you end up with all values greater than the pivot.
After all these moves, the pivot value is in its final position.
Then you recursively repeat the above procedure for the sub-array to the left of the pivot value and also for the sub-array to the right of it. Or course, if the sub-arrays have 0 or 1 elements in them, there's nothing to do with them, nothing to sort.
So in this way you end up choosing a bunch of pivot values which get into their final positions after all the moves. Between those pivot values are empty or single-element sub-arrays that don't need sorting as described previously.
The swap criteria depend on the implementation. What happens in the three cases you mention depends on the partitioning scheme. There are many implementations of Quicksort, but the main two best known ones (in my opinion) are:
Hoare's Partition: The first element is the pivot, and two index variables (i and j) walk the array (a[])towards the center while the elements they encounters are less than / greater than the pivot. Then a[j] and a[i] are swapped. Note that in this implementation the swap happens for elements that are equal to the pivot. This is believed to be important when your array contains many identical entries. After i and j cross, a[0] is swapped with a[j], so the pivot goes between the smaller-or-equal-to partition and larger-or-equal-to partition.
Lomuto's partition. This is the one implemented in pseudo-code in the current Wiki quicksort entry under "In-place version". Here pivot could be anything (say a median, or a median of three), and is swapped with the last element of a. Here only i "walks" toward the end of the array: whenever a[i]>=pivot it is swapped with a[j] and j is decremented. At the end, the pivot is swapped with a[i+1]
(See here for instance for an illustration).
Robert Sedgewick chamions a three way partitioning scheme, where the array is divided into three partitions: less than, equal to, and greater than the pivot: the claim is that it has better performance on arrays with lots of dupes, or identical values. It is implemented yet differently (see the link above).

Stability of quicksort partitioning approach

Does the following Quicksort partitioning algorithm result in a stable sort (i.e. does it maintain the relative position of elements with equal values):
partition(A,p,r)
{
x=A[r];
i=p-1;
for j=p to r-1
if(A[j]<=x)
i++;
exchange(A[i],A[j])
exchange(A[i+1],A[r]);
return i+1;
}
There is one case in which your partitioning algorithm will make a swap that will change the order of equal values. Here's an image that helps demonstrate how your in-place partitioning algorithm works:
We march through each value with the j index, and if the value we see is less than the partition value, we append it to the light-gray subarray by swapping it with the element that is immediately to the right of the light-gray subarray. The light-gray subarray contains all the elements that are <= the partition value. Now let's look at, say, stage (c) and consider the case in which three 9's are in the beginning of the white zone, followed by a 1. That is, we are about to check whether the 9's are <= the partition value. We look at the first 9 and see that it is not <= 4, so we leave it in place, and march j forward. We look at the next 9 and see that it is not <= 4, so we also leave it in place, and march j forward. We also leave the third 9 in place. Now we look at the 1 and see that it is less than the partition, so we swap it with the first 9. Then to finish the algorithm, we swap the partition value with the value at i+1, which is the second 9. Now we have completed the partition algorithm, and the 9 that was originally third is now first.
Any sort can be converted to a stable sort if you're willing to add a second key. The second key should be something that indicates the original order, such as a sequence number. In your comparison function, if the first keys are equal, use the second key.
A sort is stable when the original order of similar elements doesn't change. Your algorithm isn't stable since it swaps equal elements.
If it didn't, then it still wouldn't be stable:
( 1, 5, 2, 5, 3 )
You have two elements with the sort key "5". If you compare element #2 (5) and #5 (3) for some reason, then the 5 would be swapped with 3, thereby violating the contract of a stable sort. This means that carefully choosing the pivot element doesn't help, you must also make sure that the copying of elements between the partitions never swaps the original order.
Your code looks suspiciously similar to the sample partition function given on wikipedia which isn't stable, so your function probably isn't stable. At the very least you should make sure your pivot point r points to the last position in the array of values equal to A[r].
You can make quicksort stable (I disagree with Matthew Jones there) but not in it's default and quickest (heh) form.
Martin (see the comments) is correct that a quicksort on a linked list where you start with the first element as pivot and append values at the end of the lower and upper sublists as you go through the array. However, quicksort is supposed to work on a simple array rather than a linked list. One of the advantages of quicksort is it's low memory footprint (because everything happens in place). If you're using a linked list you're already incurring a memory overhead for all the pointers to next values etc, and you're swapping those rather than the values.
If you need a stable O(n*log(n)) sort, use mergesort. (The best way to make quicksort stable by the way is to chose a median of random values as the pivot. This is not stable for all elements equivalent, however.)
Quick sort is not stable. Here is the case when its not stable.
5 5 4 8
taking 1st 5 as pivot, we will have following after 1st pass-
4 5 5 8
As you can see order of 5's have been changed. Now if we continue doing sorting it will change the order of 5's in sorted array.
From Wikipedia:
Quicksort is a comparison sort and, in
efficient implementations, is not a
stable sort.
One way to solve this problem is by not taking Last Element of array as Key. Quick sort is randomized algorithm.
Its performance highly depends upon selection of Key. Although algorithm def says we should take last or first element as key, in reality we can select any element as key.
So I tried Median of 3 approach, which says take first ,middle and last element of array. Sorts them and then use middle position as a Key.
So for example my array is {9,6,3,10,15}. So by sorting first, middle and last element it will be {3,6,9,10,15}. Now use 9 as key. So moving key to the end it will be {3,6,15,10,9}.
All we need to take care is what happens if 9 comes more than once. That is key it self comes more than once.
In such cases after selecting key as middle index we need to go through elements between Key to Right end and if any element is found same key i.e. if 9 is found between middle position to the end make that 9 as key.
Now in the region of elements greater than 9 i.e. loop of j if any 9 is found swap it with region of elements less than that is region of i. Your array will be stable sorted.

Resources