Pivot element selection in Quick Select - algorithm

Is there any significance to selecting a random pivot over the last element for quickselect?
I can still find the required element by always selecting the last element as the pivot. Will it affect the runtime?

The choice of pivot has a huge influence on the runtime of quickselect on a given array.
In deterministic quickselect, if you always choose the last element as your pivot, imagine what happens if you try to select out of an already-sorted list. Your pivot will always be the worst possible pivot and will only eliminate one number from the array, leading to a Θ(n²) runtime.
In randomized quickselect, it's still technically possible that the runtime will be Θ(n²) if the algorithm always makes the worst possible choice on each recursive call, but this is extraordinarily unlikely. The runtime will be O(n) with high probability, and there is no "killer" input.
In other words, quickselect with a deterministically-chosen pivot will always have at least one "killer" input that forces it to run in Θ(n²) time, while quickselect with a random pivot has no killer inputs and has excellent average-case guarantees. The analyses are totally different as a result.
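For concreteness, here is a minimal sketch of randomized quickselect in Python (the function name and the in-place Lomuto-style partition are my own choices, not something from the question):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-based) of the list a.
    Expected O(n) time thanks to the random pivot; mutates a."""
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # Random pivot: no fixed input can force the Theta(n^2) behaviour.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]          # move the pivot to the end
        pivot, store = a[hi], lo
        for i in range(lo, hi):            # Lomuto partition
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final index
        if k == store:                     # found the k-th smallest
            return a[store]
        elif k < store:                    # discard the right part
            hi = store - 1
        else:                              # discard the left part
            lo = store + 1
```

For example, quickselect([3, 1, 4, 1, 5], 2) returns 3. Replacing random.randint(lo, hi) with hi gives the deterministic last-element variant discussed above.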
Hope this helps!

Related

Worst case runtime of random quicksort if you keep randomly choosing a pivot and partitioning until you find a good pivot

If you changed the randomized quicksort algorithm to repeatedly pick a random pivot and run partition until it finds a "good" pivot, what would be the worst-case cost of the algorithm? Assume we keep track of the pivots used so far so that we never use the same one twice for the same array.
In the worst case, the random pivot always is the maximum or minimum possible value, in which case the runtime is O(n^2).
For a more detailed analysis, look here.
The expected value of the runtime is still O(n * log n) though.
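For illustration, a hedged Python sketch of the variant the question describes (the "middle half" test for a good pivot and the function name are my own assumptions, and the bookkeeping that avoids reusing a pivot is omitted):

```python
import random

def partition_until_good(a, lo, hi):
    """Partition a[lo..hi] around random pivots until the pivot's final
    index falls in the middle half of the range; return that index.
    Sketch only; assumes distinct elements (with many duplicates a
    three-way partition would be needed to guarantee termination)."""
    n = hi - lo + 1
    while True:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        # Accept only a "good" pivot: both sides keep at least ~n/4 elements.
        if n // 4 <= store - lo <= (3 * n) // 4:
            return store
```

Each attempt costs O(n) and succeeds with probability roughly 1/2, so the expected cost per partition stays O(n) and the expected total remains O(n log n), in line with the answer above.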

In quicksort, if an array is randomized, does using the median of 3 for pivot selection matter?

I've been comparing the run times of various pivot selection algorithms. Surprisingly, the simplest one, where the first element is always chosen, is the fastest. This may be because I'm filling the array with random data.
If the array has been randomized (shuffled), does it matter? For example, picking the median of 3 as the pivot is always(?) better than picking the first element as the pivot. But this isn't what I've noticed. Is it because if the array is already randomized there would be no reason to assume sortedness, and using the median assumes there is some degree of sortedness?
The worst-case runtime of quicksort is O(n²). Quicksort is only fast in the average case.
To reach an average runtime of O(n log n) you have to choose a random pivot element.
But instead of choosing a random pivot element, you can shuffle the list and choose the first element.
To see why this holds, look at it this way: say all elements start in some specific order. Shuffling applies a random permutation to the list, so a random element ends up at the first position, and likewise at every other position. You can also see it by shuffling the list as follows: randomly choose one of all the elements for the first position, then randomly choose one of the remaining (not yet chosen) elements for the second position, and so on.
If your list is already randomly generated, you can directly choose the first element as the pivot, without shuffling again.
So choosing the first element is the fastest because of the randomly generated input, but choosing the third or the last element will be just as fast as choosing the first.
All other ways to choose a pivot element have to compute something (a median, a random number, or similar), but they have no advantage over a random choice.
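A small Python sketch of that argument (the names are mine): shuffle once up front and then always take the first element of each range as the pivot, which behaves like choosing a random pivot on every call.

```python
import random

def quicksort_shuffled(a):
    """Shuffle once, then quicksort always using the first element
    of each range as the pivot. Sketch of the argument above."""
    random.shuffle(a)                    # one O(n) shuffle up front
    _sort(a, 0, len(a) - 1)

def _sort(a, lo, hi):
    if lo >= hi:
        return
    pivot = a[lo]                        # first element: random after the shuffle
    i, j = lo + 1, hi
    while True:
        while i <= j and a[i] <= pivot:
            i += 1
        while i <= j and a[j] > pivot:
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]            # pivot moves to its final position j
    _sort(a, lo, j - 1)
    _sort(a, j + 1, hi)
```

If the input is already random, as in the question, the random.shuffle(a) call can simply be dropped.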
A substantially late response, but I believe it will add some additional info.
Surprisingly, the simplest one, where the first element is always chosen, is the fastest.
This is actually not surprising at all, since you mentioned that you test the algorithm with random data. In reality, the percentage of almost-sorted and sorted data is much greater than would statistically be expected. Take chronological data, for example: when you collect it into a log file, some elements can be out of order, but most of them are already sorted. Unfortunately, the quicksort implementation that takes the first (or last) element as a pivot is vulnerable to such input, and it degenerates into O(n^2) complexity, because in the partition step you divide your array into two parts of size 1 and n-1, and therefore you get n partitions instead of log n on average.
That's why people decided to add some randomization to make the probability of getting the problematic input as low as possible. There are three well-known approaches:
shuffle the input - to quote Robert Sedgewick, "the probability of getting O(n^2) performance with such approach is lower than the probability that you will be hit by a thunderstrike" :)
choose the pivot element randomly - Wikipedia says that on average, the expected number of comparisons in this case is 1.386 n log n
choose the pivot element as a median of three - Wikipedia says that on average, the expected number of comparisons in this case is 1.188 n log n
However, randomization costs something. If you shuffle the input array, that is O(n), which is dominated by O(n log n), but you need to take into account the cost of invoking the random(..) method n times. With your simple approach, that cost is avoided and it is thus faster.
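For illustration, one common way to implement the median-of-three choice (first, middle, and last element); the helper name is my own:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi],
    i.e. the 'median of three' pivot choice mentioned above."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi
```

The returned index is typically swapped to one end of the range before the usual partition step runs.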
See also:
Worst case for Quicksort - when can it occur?

Is Quicksort "adaptive" and "online"?

That is to say, does Quicksort perform BETTER when given an already sorted list? I don't see why this would be the case, but perhaps I don't understand the algorithm exactly.
Also, can quicksort "keep going" whilst we add new data to the list WHILE SORTING? It seems to me the algorithm needs the full data set at the beginning to "work".
does Quicksort perform BETTER when given an already sorted list?
No; in fact, the way it's usually taught (using the first element as the pivot), an already sorted (or nearly sorted) list is the worst case. Using the middle or a random element as the pivot can mitigate this, however.
can quicksort "keep going" whilst we add new data to the list WHILE SORTING?
No, your intuition is correct, you need the entire data set from the start.
Does Quicksort perform BETTER when given an already sorted list
I think the performance of quicksort depends mostly on the choice of the pivot element at every step. It is worst when the selected pivot is likely to be either the smallest or the largest element in the list.
quicksort "keep going" whilst we add new data to the list WHILE
SORTING?
No, quicksort is not adaptive; that's a property of quicksort.
Quicksort, when its choice of pivots is random, has an expected runtime of O(n lg n), where n is the size of the array. If its choice of pivots is in sorted order, its runtime degrades to O(n^2). Whether you choose the pivot from the left side, right side, middle, or randomly, it doesn't matter, since it is possible, if not likely, to select pivots in sorted order.
The only way to avoid this is to guarantee the pivots aren't in order by using a technique such as the "Median of Three."
According to Robert Sedgewick, Algorithms, Addison-Wesley Publishing Company, 1988, page 124, if you use the Median of Three technique to choose the pivot and stop the recursion for small partitions (anywhere from 5 to 25 in size; this leaves the array unsorted but you can finish it up quickly with an insertion sort) then quicksort will always be O(n lg n) and, furthermore, run 20% faster than ordinary quicksort.
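A hedged Python sketch of that optimization (the cutoff value, the middle-element pivot, and the function names are my own simplifications; Sedgewick's version uses a median-of-three pivot): small partitions are left unsorted during the recursion, and one insertion-sort pass finishes the job.

```python
CUTOFF = 10   # any value in the 5..25 range mentioned above

def quicksort_with_cutoff(a):
    _qsort(a, 0, len(a) - 1)
    _insertion_sort(a)                   # one cheap pass cleans up the small runs

def _qsort(a, lo, hi):
    if hi - lo + 1 <= CUTOFF:            # leave small partitions for insertion sort
        return
    mid = (lo + hi) // 2
    a[lo], a[mid] = a[mid], a[lo]        # crude pivot choice: middle element
    pivot, store = a[lo], lo
    for i in range(lo + 1, hi + 1):      # Lomuto partition around a[lo]
        if a[i] < pivot:
            store += 1
            a[i], a[store] = a[store], a[i]
    a[lo], a[store] = a[store], a[lo]
    _qsort(a, lo, store - 1)
    _qsort(a, store + 1, hi)

def _insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
```

After _qsort returns, every element is within CUTOFF positions of its final place, so the final insertion sort runs in O(n · CUTOFF) time.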

What's the difference between good pivot VS "bad pivot" in Select algorithm?

I'm trying to understand the Select algorithm, and I came across the notions of a good pivot vs. a bad pivot. I can see that the algorithm uses the Partition algorithm to put the bigger elements to the right of the pivot and the smaller elements to the left of the pivot.
But what does a bad pivot mean?
How can a bad pivot push the total running time up to O(n^2)?
Thanks
The selection algorithm will be fast if it can discard huge chunks of the array at each step. A good pivot is one that, for some definition of "a lot", causes the algorithm to discard "a lot" of the array elements. A bad pivot is one for which the algorithm discards very little of the array.
In the worst case, the pivot might be the largest or smallest element of the array. If this happens, then the algorithm will partition the elements in a way where one group of values is empty, since there will be either no elements less than the pivot or no elements greater than the pivot. This partitioning step takes time O(n), and it will have to be run O(n) times, since each iteration decreases the size of the array by only one. This degrades the algorithm's runtime to O(n²). Interestingly, this is also the way to get quicksort to degenerate to O(n²) time.
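A small Python sketch of the partition step being described (Lomuto-style, with the last element as the pivot; the function name is mine). The pivot's final index tells you how much of the array the next step can discard:

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around a[hi] and return the pivot's final index.
    An index near the middle means a good pivot (much can be discarded);
    an index at lo or hi means a bad pivot (almost nothing is discarded)."""
    pivot, store = a[hi], lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store

# partition([1, 2, 3, 4, 5], 0, 4) -> 4  (maximum value: a bad pivot)
# partition([4, 1, 5, 2, 3], 0, 4) -> 2  (median value: a good pivot)
```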
Hope this helps!

Quick sort Worst case

I'm working on the program and just need the following to understand it better.
What is the worst-case running time for quicksort, and what may cause this worst-case performance? How can we modify the quicksort program to mitigate this problem?
I know that it has a worst case of O(n^2), and I know it occurs when the pivot is the unique minimum or maximum element. My question is how I can modify the program to mitigate this problem.
Quicksort's performance is dependent on your pivot selection algorithm. The most naive pivot selection algorithm is to just choose the first element as your pivot. It's easy to see that this results in worst case behavior if your data is already sorted (the first element will always be the min).
There are two common algorithms to solve this problem: randomly choose a pivot, or choose the median of three. Random is obvious so I won't go into detail. Median of three involves selecting three elements (usually the first, middle and last) and choosing the median of those as the pivot.
Since random number generators are typically pseudo-random (therefore deterministic) and a non-random median of three algorithm is deterministic, it's possible to construct data that results in worst case behavior, however it's rare for it to come up in normal usage.
You also need to consider the performance impact. The running time of your random number generator will affect the running time of your quicksort. With median of three, you are increasing the number of comparisons.
Worst performance condition:
When the pivot chosen each time is the greatest or the smallest element, and this pattern repeats.
So for 1 3 5 4 2,
if pivots are chosen in the order 1, 2, 3, 4, 5 or 5, 4, 3, 2, 1,
then the worst-case running time is O(n²).
How to avoid the worst case:
(1) Divide the array into five sets. So if the array is 1..100, the sets are (1..20), (21..40), (41..60), (61..80), (81..100).
(2) Choose the median of the first five elements in each set, so (3), (23), (43), (63), (83).
(3) Now choose the median among those as the pivot, so here it's (43).
An easy modification is to choose the pivot randomly. This gives good results with high probability.
It's been a while, but I think the worst case for quicksort is when the data is already sorted. A quick check to see whether the data is already sorted could help alleviate this problem.
The worst case running time depends on the partition method within quick-sort. That has two aspects:
selecting the pivot
how to partition around the pivot
Good strategies to select the pivot have been outlined in previous posts (median of medians, median of three, or randomization). But even if the pivot is wisely selected, in the extreme case of an array whose elements are all equal, you will still hit the worst-case runtime if only two partitions are built, because one of them will carry all the equal elements, that is, all the elements:
this causes partition to be called n times, each call taking n/2 on average, leading to O(n²)
this is not good, because it is not just a theoretical worst-case scenario but a quite common one
note that it is not solved by detecting the empty partition, because the pivot could have the highest or lowest element value (e.g. the median is 5, which is also the highest element value, but there might still be a few misplaced values < 5)
A way around this problem is to partition into three partitions: a lower one (elements < pivot), an equal one (elements = pivot), and an upper one (elements > pivot). The "= pivot" elements are already in their final position. The lower and upper partitions still need to be sorted if they are not empty (a sketch follows below).
Together with randomization, median of medians, or some combination of these to select a pivot, a worst-case scenario is quite rare but not impossible, which leaves the algorithm with a worst-case upper bound of O(n²).
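A sketch of the three-way ("fat pivot" / Dutch national flag) partition described above, in Python; the names are mine:

```python
import random

def partition3(a, lo, hi, pivot):
    """Rearrange a[lo..hi] into  < pivot | == pivot | > pivot  and
    return (lt, gt), the bounds of the equal block, which is final."""
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt

def quicksort3(a, lo=0, hi=None):
    """Quicksort with three-way partitioning: an array of all-equal
    elements is handled in a single O(n) pass instead of O(n^2)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    lt, gt = partition3(a, lo, hi, a[random.randint(lo, hi)])
    quicksort3(a, lo, lt - 1)
    quicksort3(a, gt + 1, hi)
```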
This question comes up frequently. As far as I have researched, there are two key causes of the worst case (both are illustrated by the sketch below):
The array is already sorted, whether ascending or descending, and the pivot is chosen as the minimum (smallest) or maximum (greatest) element of the list, e.g. [2,3,4] or [4,3,2].
All elements are the same, e.g. [2,2,2].
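To make those two cases concrete, a tiny Python sketch (a first-element-pivot quicksort with a recursion-depth counter; the names and setup are mine):

```python
def depth(a, lo=0, hi=None, d=0):
    """Recursion depth of quicksort with a first-element pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return d
    pivot, store = a[lo], lo
    for i in range(lo + 1, hi + 1):      # Lomuto partition around a[lo]
        if a[i] < pivot:
            store += 1
            a[i], a[store] = a[store], a[i]
    a[lo], a[store] = a[store], a[lo]
    return max(depth(a, lo, store - 1, d + 1),
               depth(a, store + 1, hi, d + 1))

print(depth(list(range(100))))   # already sorted: prints 99, i.e. linear depth
print(depth([2] * 100))          # all equal:      prints 99 as well
```

Linear recursion depth with linear work per level is exactly the O(n^2) behaviour described above.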

Resources