Stability of quicksort partitioning approach

Does the following Quicksort partitioning algorithm result in a stable sort (i.e. does it maintain the relative position of elements with equal values):
partition(A, p, r)
{
    x = A[r];
    i = p - 1;
    for j = p to r-1
        if (A[j] <= x) {
            i++;
            exchange(A[i], A[j]);
        }
    exchange(A[i+1], A[r]);
    return i+1;
}

There is one case in which your partitioning algorithm will make a swap that changes the order of equal values. Here's an image that helps demonstrate how your in-place partitioning algorithm works:
We march through each value with the j index, and if the value we see is less than or equal to the partition value, we append it to the light-gray subarray by swapping it with the element immediately to the right of the light-gray subarray. The light-gray subarray contains all the elements that are <= the partition value.
Now let's look at, say, stage (c) and consider the case in which three 9's are at the beginning of the white zone, followed by a 1. That is, we are about to check whether the 9's are <= the partition value. We look at the first 9 and see that it is not <= 4, so we leave it in place and march j forward. The same happens with the second and the third 9. Now we look at the 1 and see that it is <= the partition value, so we swap it with the first 9.
To finish the algorithm, we swap the partition value with the value at i+1, which is the second 9. The partition is now complete, and the 9 that was originally third is now first.
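The scenario above can be reproduced with a small Python sketch. It transcribes the partition routine from the question and tags each value with a letter so that the reordering of the equal 9's becomes visible. The tags and the `key` parameter are assumptions added purely for illustration.

```python
def partition(a, p, r, key=lambda item: item):
    """Partition a[p..r] around the last element, as in the question."""
    x = key(a[r])
    i = p - 1
    for j in range(p, r):
        if key(a[j]) <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


# Three equal 9's (tagged a, b, c), then a 1, then the partition value 4.
items = [(9, "a"), (9, "b"), (9, "c"), (1, "d"), (4, "e")]
pivot_index = partition(items, 0, len(items) - 1, key=lambda item: item[0])
nines = [tag for value, tag in items if value == 9]
print(items)
print(nines)
```

After the call, the 9 tagged "c" sits in front of the ones tagged "a" and "b", which is exactly the loss of stability described above.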

Any sort can be converted to a stable sort if you're willing to add a second key. The second key should be something that indicates the original order, such as a sequence number. In your comparison function, if the first keys are equal, use the second key.
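A minimal sketch of the second-key idea, assuming the same last-element partition as in the question. Each item is decorated with its original position, and the comparison uses the pair (key, position), so ties on the first key are broken by the original order. The names `stable_quicksort`, `_partition` and `_quicksort` are hypothetical helpers, not part of any library.

```python
def _partition(a, p, r):
    # Compare on (key, original position); positions are unique, so no ties remain.
    pivot = a[r][:2]
    i = p - 1
    for j in range(p, r):
        if a[j][:2] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def _quicksort(a, p, r):
    if p < r:
        q = _partition(a, p, r)
        _quicksort(a, p, q - 1)
        _quicksort(a, q + 1, r)


def stable_quicksort(items, key=lambda item: item):
    """Return a stably sorted copy by adding the original index as a second key."""
    decorated = [(key(value), position, value) for position, value in enumerate(items)]
    _quicksort(decorated, 0, len(decorated) - 1)
    return [value for _, _, value in decorated]


records = [("Matt", 50), ("Bob", 20), ("Alice", 50)]
print(stable_quicksort(records, key=lambda record: record[1]))
```

The cost is one extra integer per element, so the sort is no longer strictly in place.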

A sort is stable when the original order of equal elements doesn't change. Your algorithm isn't stable since it swaps equal elements.
Even if it didn't, it still wouldn't be stable:
( 1, 5, 2, 5, 3 )
You have two elements with the sort key "5". If you compare element #2 (5) and element #5 (3) for some reason, then the 5 would be swapped with the 3, which moves the first 5 behind the second 5 and thereby violates the contract of a stable sort. This means that carefully choosing the pivot element doesn't help on its own; you must also make sure that the copying of elements between the partitions never changes the original order of equal elements.

Your code looks suspiciously similar to the sample partition function given on Wikipedia, which isn't stable, so your function probably isn't stable either. At the very least you should make sure your pivot point r points to the last position in the array of values equal to A[r].
You can make quicksort stable (I disagree with Matthew Jones there), but not in its default and quickest (heh) form.
Martin (see the comments) is correct that a quicksort on a linked list is stable if you start with the first element as pivot and append values at the end of the lower and upper sublists as you go through the list. However, quicksort is supposed to work on a simple array rather than a linked list. One of the advantages of quicksort is its low memory footprint (because everything happens in place). If you're using a linked list, you're already incurring a memory overhead for all the pointers to next values etc., and you're swapping those rather than the values.

If you need a stable O(n*log(n)) sort, use mergesort. (By the way, the best way to make quicksort stable is to choose a median of random values as the pivot. This is not stable when all elements are equivalent, however.)

Quicksort is not stable. Here is a case in which it is not stable:
5 5 4 8
Taking the first 5 as the pivot, we will have the following after the first pass:
4 5 5 8
The order of the 5's has changed. If we continue sorting, the 5's will end up in a different relative order in the sorted array.

From Wikipedia:
Quicksort is a comparison sort and, in efficient implementations, is not a stable sort.

One way to solve this problem is to not take the last element of the array as the key. Quicksort can be a randomized algorithm.
Its performance depends heavily on the selection of the key. Although the textbook definition says we should take the last or first element as the key, in reality we can select any element.
So I tried the median-of-3 approach: take the first, middle and last elements of the array, sort them, and then use the one in the middle position as the key.
For example, say my array is {9,6,3,10,15}. Sorting the first, middle and last elements gives {3,6,9,10,15}. Now use 9 as the key. Moving the key to the end gives {3,6,15,10,9}.
All we need to take care of is what happens if 9 occurs more than once, that is, if the key itself occurs more than once.
In such cases, after selecting the key at the middle index, we need to go through the elements between the key and the right end, and if any element equal to the key is found (i.e. if another 9 is found between the middle position and the end), make that 9 the key.
Now, in the loop over j, if any 9 is found in the region of elements greater than 9, swap it into the region of elements less than the key, that is, the region tracked by i. Your array will then be stably sorted.

Related

Find element of an array that appears only once in O(logn) time

Given an array A in which all elements appear twice except one element that appears only once, how do we find the element that appears only once in O(log n) time? Let's discuss two cases.
1. The array is always sorted and the elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6]; we want to find 3 in O(log n) time because it appears only once.
2. The array is not sorted and the elements are not in sequential order.
I can only come up with a solution that uses the XOR operator on the binary representation of the integers, as explained here; at the end, the binary string will represent the element which appears only once, because duplicates cancel out. But it takes O(n) time. How can we do better than that?

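Using Haroon S' comment, this is the solution which I think is correct, given the time constraints.
from typing import List
class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        # nums must be sorted; low and high always stay on even indices
        low = 0
        high = len(nums) - 1
        while low < high:
            mid = (low + high) // 2
            if mid % 2 == 0:
                mid += 1  # make mid odd so it always lands inside a pair
            if nums[mid] == nums[mid + 1]:
                # a pair starts at an odd index: answer is in the first half
                high = mid - 1
            elif nums[mid] == nums[mid - 1]:
                # pairs up to mid are intact: answer is in the second half
                low = mid + 1
        return nums[low]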

If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint of a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will hold the same value (because the array is sorted). However, as we go towards the right endpoint of the array, as soon as we consider a subarray that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" breaks.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half" (except when the last index of the left half-array is even, but you can handle that case with various O(1) time operations).
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For a demonstration of the index notion I mentioned above, see your own example. To the left of the unique element (i.e. 3), all index pairs (2i, 2i+1) have the same values. From the unique element onwards, every index pair (2i, 2i+1) corresponds to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.

In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. This means the unique element is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...

This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i, j representing the sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k], x[k:l] and x[l:j], where all elements in the first segment are smaller than the pivot (whatever it is), all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(You might be able to use a version that only partitions in two, or explicitly count the number of pivots, but this version is easier to work with here.)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
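A rough sketch of this idea in Python. For brevity it builds new lists rather than partitioning in place, and it picks the pivot at random; both choices are assumptions made here for illustration, as is the function name `find_singleton`. It assumes the input really does contain every value twice except for one.

```python
import random


def find_singleton(items):
    """Expected O(n): every value occurs twice except one, which is returned."""
    remaining = list(items)
    while True:
        pivot = random.choice(remaining)
        smaller = [x for x in remaining if x < pivot]
        larger = [x for x in remaining if x > pivot]
        pivot_count = len(remaining) - len(smaller) - len(larger)
        if pivot_count == 1:
            # The middle segment has length one, so the pivot is the singleton.
            return pivot
        # The pivot occurs twice, so exactly one side has odd length.
        remaining = smaller if len(smaller) % 2 == 1 else larger


print(find_singleton([4, 1, 6, 1, 4, 3, 6]))
print(find_singleton(["pear", "fig", "pear"]))
```

Only comparisons are used, so the same function works for strings or any other orderable values.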

This solution finds the element that appears only once, provided there is no more than one such element and the array is sorted. It is a binary search and returns the element in O(log n) time.
var singleNonDuplicate = function(nums) {
    let s = 0, e = nums.length - 1;
    while (s < e) {
        let mid = Math.trunc(s + (e - s) / 2);
        // to the left of the single element, every pair starts at an even index
        if ((mid % 2 == 0 && nums[mid] == nums[mid + 1]) || (mid % 2 == 1 && nums[mid] == nums[mid - 1])) {
            s = mid + 1;
        } else {
            e = mid;
        }
    }
    return nums[s]; // can return nums[e] also
};

I don't believe there is an O(log n) solution for that. The reason is that in order to find which element appears only once, you need to iterate over the elements of that array at least once.

What are the stability "factors" of sorting?

Lately, I have been learning about various methods of sorting, and a lot of them are unstable, e.g. selection sort, quick sort, heap sort.
My question is: What are the general factors that make sorting unstable?

Most of the efficient sorting algorithms are efficient because they move data over longer distances, i.e. much closer to the final position with every move. This efficiency causes the loss of stability in sorting.
For example, when you do a simple sort like bubble sort, you compare and swap neighboring elements. In this case, it is easy to not move the elements if they are already in the correct order. But in the case of quicksort, say, the partitioning process might choose its moves so that the number of swaps is minimal. For example, if you partition the list below on the number 2, the most efficient way would be to swap the 1st element with the 4th element and the 2nd element with the 5th element:
2 3 1 1 1 4
1 1 1 2 3 4
If you notice, now we have changed the sequence of 1's in the list causing it to be unstable.
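To make the reordering visible, here is a small Python sketch that applies the two swaps from the example to a copy of the list in which the three 1's carry the tags a, b and c. The tags and the helper `apply_swaps` are assumptions added for illustration.

```python
def apply_swaps(items, swaps):
    """Return a copy of items with the given index pairs swapped in order."""
    result = list(items)
    for left, right in swaps:
        result[left], result[right] = result[right], result[left]
    return result


row = [(2, ""), (3, ""), (1, "a"), (1, "b"), (1, "c"), (4, "")]
# Swap the 1st with the 4th element and the 2nd with the 5th (0-based: 0<->3, 1<->4).
partitioned = apply_swaps(row, [(0, 3), (1, 4)])
ones_order = [tag for value, tag in partitioned if value == 1]
print([value for value, _ in partitioned])
print(ones_order)
```

The values come out as 1 1 1 2 3 4, but the tags of the 1's are no longer in the order a, b, c.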
So to sum it up, some algorithms are very suitable for stable sorting (like bubble-sort), whereas some others like quick sort can be made stable by carefully selecting a partitioning algorithm, albeit at the cost of efficiency or complexity or both.
We usually classify the algorithm to be stable or not based on the most "natural" implementation of it.

A sorting algorithm is stable when it uses the original order of elements to break ties in the new ordering. For example, let's say you have records of (name, age) and you want to sort them by age.
If you use a stable sort on (Matt, 50), (Bob, 20), (Alice, 50), then you will get (Bob, 20), (Matt, 50), (Alice, 50). The Matt and Alice records have equal ages, so they are equal according to the sorting criteria. The stable sort preserves their original relative order -- Matt came before Alice in the original list, so it comes before Alice in the output.
If you use an unstable sort on the same list, you might get (Bob, 20), (Matt, 50), (Alice, 50) or you might get (Bob, 20), (Alice, 50), (Matt, 50). Elements that compare equal will be grouped together but can come out in any order.
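As an illustration, Python's built-in `sorted` is documented to be stable, so it can be used to reproduce the first outcome. The record layout below is the (name, age) example from the text.

```python
records = [("Matt", 50), ("Bob", 20), ("Alice", 50)]

# Sort by age only; ties keep their original relative order.
by_age = sorted(records, key=lambda record: record[1])
print(by_age)
```

Matt stays ahead of Alice because both have the same age and Matt came first in the input.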
It's often handy to have a stable sort, but a stable sort implementation has to remember information about the original order of the elements while it's reordering them.
In-place array sorting algorithms are designed not to use any extra space to store this kind of information, and they destroy the original ordering while they work. The fast ones like quicksort aren't usually stable, because reordering the array in ways that preserve the original order to break ties is slow. Slow array sorting algorithms like insertion sort or selection sort can usually be written to be stable without difficulty.
Sorting algorithms that copy data from one place to another, or work with other data structures like linked lists, can be both fast and stable. Merge sort is the most common.

If you have an example input of
1 5 3 7 1
then for the sort to be stable, the last 1 must never go before the first 1.
Generally, elements with the same value in the input array must not have changed their relative positions once sorted.
The sorted output would then look like:
1(f) 1(l) 3 5 7
f: first, l: last (or second, if there are more than 2).
For example, quicksort uses swaps, and because the comparisons are done with greater-than-or-equal (>=) or less-than, equally valued elements can be swapped while sorting and, as a result, appear in a different relative order in the output.

Quicksort algorithm states using the middle element as pivot

I need help understanding exactly how the quick sort algorithm works. I've been watching teaching videos and still fail to really grasp it completely.
I have an unsorted list: 1, 2, 9, 5, 6, 4, 7, 8, 3
And I have to quick sort it using 6 as the pivot.
I need to see the state of the list after each partition procedure.
My main problem is understanding what the order of the elements is before and after the pivot. So in this case, if we made 6 the pivot, I know the numbers 1 - 5 will be before 6 and 7 - 9 will go after it. But what will the order of the numbers 1 - 5 and 7 - 9 be after the first partition, given my list above?
Here is the partition algorithm that I want to use (bear in mind I'm using the middle element as my initial pivot):
1. Determine the pivot, and swap the pivot with the first element of the list.
2. Suppose that the index smallIndex points to the last element smaller than the pivot. The index smallIndex is initialized to the first element of the list.
3. For the remaining elements in the list (starting at the second element):
If the current element is smaller than the pivot
a. Increment smallIndex
b. Swap the current element with the array element pointed to by smallIndex.
4. Swap the first element, that is the pivot, with the array element pointed to by smallIndex.
It would be amazing if anyone could show the list after each single little change that occurs to the list in the algorithm.

It doesn't matter.
All that matters - all that the partitioning process asserts - is that, after it has been run, there are no values on the left-hand side of the center point that emerges that are greater than the pivot and that there are no values on the right-hand side that are less than the pivot value.
The internal order of the two partitions is then handled in the subsequent recursive calls for each half.
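For anyone who still wants to see the intermediate states, the partition procedure from the question can be sketched in Python as follows. The function name `partition_with_trace` and the recorded `trace` list are assumptions added for illustration; the steps themselves follow the question's description.

```python
def partition_with_trace(items, first, last):
    """Partition items[first..last] around the middle element, recording each state."""
    trace = [list(items)]
    mid = (first + last) // 2
    # Step 1: swap the pivot with the first element.
    items[first], items[mid] = items[mid], items[first]
    trace.append(list(items))
    pivot = items[first]
    # Step 2: small_index marks the last element known to be smaller than the pivot.
    small_index = first
    # Step 3: scan the remaining elements.
    for index in range(first + 1, last + 1):
        if items[index] < pivot:
            small_index += 1
            items[small_index], items[index] = items[index], items[small_index]
            trace.append(list(items))
    # Step 4: move the pivot between the two parts.
    items[first], items[small_index] = items[small_index], items[first]
    trace.append(list(items))
    return small_index, trace


data = [1, 2, 9, 5, 6, 4, 7, 8, 3]
pivot_position, states = partition_with_trace(data, 0, len(data) - 1)
for state in states:
    print(state)
```

For the list in the question the first partition ends with 3, 2, 5, 1, 4, 6, 7, 8, 9, with the pivot 6 at index 5. The order inside each side is just a by-product of the swaps.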

Quicksort: pivot position after one partition

I am reading about quicksort, looking at different implementations and I am trying to wrap my head around something.
In this implementation (which of course works), the pivot is chosen as the middle element and then the left and right pointer move to the right and left accordingly, swapping elements to partition around the pivot.
I was trying the array [4, 3, 2, 6, 8, 1, 0].
On the first partition, pivot is 6 and all the left elements are already smaller than 6, so the left pointer will stop at the pivot. On the right side, we will swap 0 with 6, and then 1 and 8, so at the end of the first iteration, the array will look like:
[4, 3, 2, 0, 1, 8, 6].
However, I was under the impression that after each iteration in quicksort, the pivot ends up in its rightful place, so here it should end up in position 5 of the array.
So, is it possible (and OK) that the pivot doesn't end up in its correct position after an iteration, or is there something obvious I am missing?

There are many possible variations of the quicksort algorithm. In this one it is OK for the pivot not to be in its correct place after its iteration.
The defining feature of every variation of the quicksort algorithm is that after the partition step, we have a part at the beginning of the array where all the elements are less than or equal to the pivot, and a non-overlapping part at the end of the array where all the elements are greater than or equal to the pivot. There may also be a part between them, where every element is equal to the pivot. This layout ensures that after we sort the left part and the right part with recursive calls, and leave the middle part intact, the whole array will be sorted.
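A sketch of a middle-pivot, two-pointer partition in Python, run on the array from the question. This is an assumed reconstruction of the general scheme, not the linked implementation itself, and the name `two_pointer_partition` is made up for the example.

```python
def two_pointer_partition(a, lo, hi):
    """Partition a[lo..hi] around the middle value; return the final (i, j)."""
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i, j


values = [4, 3, 2, 6, 8, 1, 0]
i, j = two_pointer_partition(values, 0, len(values) - 1)
print(values, i, j)
```

Everything in values[0..j] is <= 6 and everything in values[i..] is >= 6, yet the pivot 6 sits at the last index rather than at index 5, its position in the sorted array.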
Notice that in general, elements equal to the pivot may go to any part of the array. A good implementation of quicksort that avoids quadratic time for the most obvious case, i.e. all equal elements, must spread elements equal to the pivot between the parts sensibly.
Possible variants include:
1. The middle part includes only 1 element: the pivot. In that case the pivot takes its final place in the array after the partition and won't be used in the recursive calls. That's what you meant by the pivot taking its place in its iteration. For this approach, a good implementation must move about half the elements equal to the pivot to the left part and the other half to the right part, otherwise we would have quadratic time for an array with all equal elements.
2. There is no middle part. The pivot and all elements equal to it are spread between the left and the right part. That's what the implementation you linked does. Once again, in this approach about half of the elements equal to the pivot should go to the left part, and the other half to the right part. This can also be mixed with the first variation, depending on whether we are sorting an array with an odd or an even number of elements.
3. Every element equal to the pivot goes to the middle part. There are no elements equal to the pivot in either the left or the right part. That's quite efficient, and that's the example Wikipedia gives for solving the all-elements-equal problem. Arrays with all elements equal to each other are sorted in linear time in that case.
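The last variant can be sketched in Python as a three-way partition, assuming the middle element is taken as the pivot. The function name `quicksort_three_way` and the pointer names `lt` and `gt` are choices made for this example only.

```python
def quicksort_three_way(a, lo=0, hi=None):
    """Sort a[lo..hi] in place; elements equal to the pivot form the middle part."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    # Invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot.
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # The middle part a[lt..gt] is already in its final place.
    quicksort_three_way(a, lo, lt - 1)
    quicksort_three_way(a, gt + 1, hi)


values = [4, 3, 2, 6, 8, 1, 0]
quicksort_three_way(values)
print(values)
```

With an all-equal array the loop runs once over the elements and both recursive calls receive empty ranges, which is the linear-time behaviour mentioned above.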
Thus, the correct and efficient implementation of quicksort is quite tricky (there is also a problem of choosing a good pivot, for which several approaches with different tradeoffs exist as well; or an optimisation of switching to another non-recursive sorting algorithm for smaller sub-array sizes).
Also, it seems that the implementation you linked to may do recursive calls on overlapping subarrays:
if (i <= j) {
    exchange(i, j);
    i++;
    j--;
}
For example, when i is equal to j, those elements will be swapped, and i will become greater than j by 2. After that 3 elements will overlap between the ranges of the following recursive calls. The code still seems to work correctly though.

Sequence of Some Algorithms on Sorting

I see in some midterm and final exams at MIT that the following question is repeated again and again in the same manner.
We are shown an array at several steps of one sorting algorithm:
5,3,1,9,8,2,4,7
2,3,1,4,5,8,9,7
1,2,3,4,5,8,9,7
1,2,3,4,5,8,7,9
1,2,3,4,5,7,8,9
Which of Insertion Sort / Quick Sort / Merge Sort / Exchange Sort is used?
How do I find the solution to questions like this?
Edit: I think this is quick sort, because at each level some elements are lower than the pivot and some elements are greater than the pivot.

In such cases you can either a) find some pattern if you think there is one or b) go with simple elimination. Let's try elimination:
1) It cannot be insertion sort, as insertion sort starts from the beginning and treats the range [0,k] as a sorted subarray of already checked values. It then continues one element at a time, so we would first treat [5] as a sorted subarray of size 1 and insert 3 before 5, as 3 is the next value in the whole array, and so on.
2) Merge sort would sort neighbors first, as it would first recursively split the whole array into single-element arrays and then go back up the recursion tree and merge neighbors, so it would look more like this:
[3,5],[1,9],[2,8],[4,7]
[1,3,5,9],[2,4,7,8]
[1,2,3,4,5,7,8,9]
[] shows which parts were sorted at each step.
This means that after one pass neighbors will be sorted.
3) Exchange sort would also produce a different ordering: the second line would have to start with 3, as you would swap 5 and 3, then 5 and 1, etc. in the first pass. So after one pass we would go from 5,3,1,9,8,2,4,7 to 3,1,5,8,2,4,7,9, if my bubble sort serves me right. We compare each pair and swap if the element at i is greater than the one at i+1. This way the last element will be the largest.
4) As you rightly pointed out, this is quick sort, as in each step we can clearly see the array being pivoted around a certain value: first 4, then the left half around 2 and the right half around 5, etc.
These are the patterns I was talking about; now that you know them, you can easily check which one it is :-)
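The states claimed for exchange sort and merge sort in the elimination above can be checked with a short Python sketch. The helper names are made up for this example, and `sorted` on each block merely stands in for the merge step (merging two sorted halves gives the same block), so this is a shortcut rather than a real merge sort.

```python
def bubble_pass(values):
    """Return the list after one left-to-right pass of neighbor exchanges."""
    a = list(values)
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
    return a


def merge_levels(values):
    """Return the list after each bottom-up merge level (blocks of 2, 4, 8, ...)."""
    a = list(values)
    levels = []
    width = 1
    while width < len(a):
        for lo in range(0, len(a), 2 * width):
            # Shortcut: sorting the block equals merging its two sorted halves.
            a[lo:lo + 2 * width] = sorted(a[lo:lo + 2 * width])
        levels.append(list(a))
        width *= 2
    return levels


start = [5, 3, 1, 9, 8, 2, 4, 7]
print(bubble_pass(start))
for level in merge_levels(start):
    print(level)
```

Neither output matches the second line of the exam question, 2,3,1,4,5,8,9,7, which is consistent with ruling out both algorithms.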

It should be quick sort, not only because of the evidence of partitioning, but also because of this interesting fact: at some levels, only one part of the array changed.
Now let's discuss each algorithm:
Insertion sort will give you a pattern in which the first few elements must be sorted, but obviously we don't have this pattern;
Bubble sort (exchange sort) keeps exchanging neighbors if the former element is bigger than the latter, and thus the last k elements will be sorted after k iterations. Based on these two facts, we shouldn't see a pair of neighbors (a, b) with b < a left over after each iteration. However, the sequence doesn't follow this: the pair (3, 1) in the first sequence still exists in the second sequence. In addition, the second sequence ends in 7 rather than in the largest element, 9.
Merge sort first splits the array into 2 + 2 + 2 + 2 subarrays, then merges them into 4 + 4 and finally into a sorted array of 8 elements, so it should take 3 steps in total. We have 4 steps here, so it won't be merge sort.
