Median of medians is not the real median. Correct?

Personally, I think the median of medians is not the real median. Is that correct? If so, why does using the median of medians as the pivot when partitioning the array to find the k-th smallest element give a worst-case time complexity of O(n), where n is the number of elements?

Median of medians is indeed only an approximation, not necessarily the actual median.
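It is used as an optimization: it computes the pivot for partitioning the array in algorithms like Quicksort or Quickselect, so that the O(n^2) worst case is avoided. Although it is not the exact median, it is guaranteed to land close enough to the middle that every partition step discards a constant fraction of the elements, and that guarantee is what keeps the worst case of selection at O(n).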
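Why an approximate pivot is still enough can be sketched with the standard recurrence. With groups of 5, at least half of the group medians are no larger than the median of medians, and each of those groups contributes 3 elements no larger than it, so roughly 3n/10 elements lie on each side of the pivot and at most about 7n/10 survive a partition step. Ignoring floors and small additive constants:

```latex
% T(n): worst-case cost of selection on n elements (groups of 5, floors ignored)
%   cn       : forming groups, sorting each 5-element group, partitioning
%   T(n/5)   : recursive selection of the median of the n/5 group medians
%   T(7n/10) : recursion into the one side that can still contain the answer
T(n) \le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + cn

% Assume T(m) \le 10cm for every m < n and substitute:
T(n) \le 10c \cdot \tfrac{n}{5} + 10c \cdot \tfrac{7n}{10} + cn = 2cn + 7cn + cn = 10cn = O(n)
```

The key fact is 1/5 + 7/10 = 9/10 < 1: the two recursive calls together shrink the problem by a constant factor, so the work forms a geometric series. The exact median is not required for this.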
The Wikipedia article on median of medians notes:
"Although this approach optimizes quite well, it is typically outperformed in practice by instead choosing random pivots, which has average linear time for selection and average linearithmic time for sorting, and avoids the overhead of computing the pivot."
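A minimal Python sketch of median-of-medians selection may make both points concrete: the pivot is only an approximate median, yet it is good enough to bound the worst case. The function names `median_of_medians` and `select` are illustrative, not from any library, and the list-based three-way partition favors clarity over the in-place version you would use in practice.

```python
def median_of_medians(items):
    """Return the median of the group-of-5 medians: an approximate median of items."""
    if len(items) <= 5:
        return sorted(items)[len(items) // 2]
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    # Sorting a group of at most 5 elements is O(1).
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Recursively *select* the median of the medians instead of sorting them.
    return select(medians, len(medians) // 2)


def select(items, k):
    """Return the k-th smallest element of items (k is 0-based) in worst-case O(n)."""
    if len(items) <= 5:
        return sorted(items)[k]
    pivot = median_of_medians(items)
    # Three-way partition so that duplicates of the pivot cannot stall progress.
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    num_equal = len(items) - len(lows) - len(highs)
    # Recurse only into the side that can contain the answer.
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + num_equal:
        return pivot
    return select(highs, k - len(lows) - num_equal)


if __name__ == "__main__":
    data = [1, 2, 3, 100, 101, 4, 5, 6, 102, 103, 7, 8, 9, 104, 105]
    print(median_of_medians(data))       # 6: approximate median used as the pivot
    print(select(data, len(data) // 2))  # 8: the true median
```

On the sample list the group medians are 3, 6 and 9, so the pivot is 6 even though the true median is 8; the selection still returns the right answer because the pivot only has to discard a guaranteed fraction of the input.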

Related

Quick select with random pick index or with median of medians?

To avoid the O(n^2) worst-case scenario for quickselect, I am aware of two options:
1. Randomly choose a pivot index.
2. Use median of medians (MoM) to select an approximate median and pivot around that.
When using MoM with quickselect, we can guarantee worst-case O(n). When using (1), we can't guarantee worst-case O(n), but the probability of the algorithm degrading to O(n^2) should be extremely small. The overhead of (2) is much higher than that of (1), which adds little to no extra work.
So when should we use one over the other?
As you've noted, the median-of-medians approach is slower than quickselect, but has a better worst-case runtime. Assuming quickselect is truly using a random choice of pivot at each step, you can prove that not only is the expected runtime O(n), but that the probability that its runtime exceeds Θ(n log n) is very, very small (at most 1/n^k for any choice of constant k). So in that sense, if you have the ability to select pivots at random, quickselect will likely be faster.
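For comparison, option (1) from the question, quickselect with a fresh random pivot at every step, fits in a few lines. This is an illustrative sketch (the name `quickselect` and the list-based partition are assumptions for readability), not a tuned implementation:

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (0-based) using random pivots.

    Expected O(n) time; quadratic time requires a long run of unlucky pivots,
    which becomes vanishingly unlikely as n grows.
    """
    while len(items) > 1:
        pivot = random.choice(items)   # a fresh random pivot at every step
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        num_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                  # answer is among the smaller elements
        elif k < len(lows) + num_equal:
            return pivot                  # answer equals the pivot
        else:
            k -= len(lows) + num_equal    # skip the smaller and equal elements
            items = highs
    return items[0]
```

Because the pivot is drawn at random each time, no fixed input can reliably trigger the bad case; that is exactly the property deterministic pivot rules lose.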
However, not all implementations of quickselect use true randomness for the pivots, and some use deterministic pivot selection algorithms. This, unfortunately, can lead to pathological inputs that trigger the Θ(n^2) worst-case runtime, which is a problem if you have adversarially-chosen inputs.
One nice compromise between the two is introselect. The basic idea behind introselect is to use quickselect with its usual cheap pivot selection (random or deterministic). As the algorithm runs, it keeps track of how many times it has picked a pivot without throwing away at least 30% of the input array. If that number exceeds some threshold, it stops using the cheap pivot choice and switches to the median-of-medians approach to select a good pivot, forcing a size reduction of roughly 30%. This means that in the common case, when quickselect rapidly reduces the input size, introselect is basically identical to quickselect with a tiny bookkeeping overhead. However, in cases where quickselect would degrade to quadratic time, introselect stops and switches to the worst-case efficient median-of-medians approach, ensuring the worst-case runtime is O(n). This gives you, essentially, the best of both worlds: it's fast on average, and its worst case is never worse than O(n).
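The introselect idea described above can be sketched as follows. This is a simplified illustration, not any library's implementation: it assumes random pivots for the fast phase and an arbitrary threshold of 3 "bad" pivots (the parameter `max_bad_pivots` is hypothetical), and it finds the median of the group medians by calling itself recursively.

```python
import random


def partition3(items, pivot):
    """Split items into (elements < pivot, count of elements == pivot, elements > pivot)."""
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    return lows, len(items) - len(lows) - len(highs), highs


def mom_pivot(items):
    """Median-of-medians pivot: the median of the medians of groups of 5."""
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Selecting (rather than sorting) the medians avoids an O(n log n) step.
    return introselect(medians, len(medians) // 2)


def introselect(items, k, max_bad_pivots=3):
    """Return the k-th smallest element (0-based).

    Uses random pivots until more than max_bad_pivots of them have failed to
    discard at least 30% of the current input, then uses median-of-medians pivots.
    """
    bad_pivots = 0  # counted over the whole run, never reset
    while len(items) > 5:
        if bad_pivots > max_bad_pivots:
            pivot = mom_pivot(items)       # guaranteed-good but expensive pivot
        else:
            pivot = random.choice(items)   # cheap pivot, usually good enough
        lows, num_equal, highs = partition3(items, pivot)
        if k < len(lows):
            kept = lows
        elif k < len(lows) + num_equal:
            return pivot
        else:
            k -= len(lows) + num_equal
            kept = highs
        if len(kept) > 0.7 * len(items):   # kept more than 70%: a "bad" pivot
            bad_pivots += 1
        items = kept
    return sorted(items)[k]
```

Passing `max_bad_pivots=-1` forces median-of-medians pivots from the first step, which is a convenient way to exercise the fallback path.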

Find approximate median in unsorted list

I want to find an approximate median of an unsorted list. I know two algorithms:
1. Quickselect
2. Median of medians
I can't use quickselect in my project because it takes O(n^2) in the worst case.
I have heard about median of medians, but my colleagues say that although it takes O(n), it has a constant factor: its time complexity is Cn, and the constant C is large compared to quickselect's. What is the constant factor associated with median of medians? And why doesn't median of medians use the pseudo-median of 9 elements?
Or is there any other algorithm to find an approximate median in linear time, O(n)?
I wouldn't be quick to discard quickselect, since its worst-case performance is highly improbable with properly chosen pivots.
That said, perhaps introselect is what you need:
Introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance.
Introselect works by optimistically starting out with quickselect and only switching to the worst-case linear algorithm if it recurses too many times without making sufficient progress. The switching strategy is the main technical content of the algorithm. Simply limiting the recursion to constant depth is not good enough, since this would make the algorithm switch on all sufficiently large lists. Musser discusses a couple of simple approaches:
1. Keep track of the list of sizes of the subpartitions processed so far. If at any point k recursive calls have been made without halving the list size, for some small positive k, switch to the worst-case linear algorithm.
2. Sum the size of all partitions generated so far. If this exceeds the list size times some small positive constant k, switch to the worst-case linear algorithm. This sum is easy to track in a single scalar variable.
Both approaches limit the recursion depth to k ⌈log n⌉ = O(log n) and the total running time to O(n).
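The second strategy, summing the sizes of all partitions generated so far, needs only one counter. The sketch below assumes random pivots, a hypothetical `budget_factor` of 3 standing in for the constant the quote calls k (in the code, `k` is instead the 0-based rank being selected), and a plain median-of-medians routine, `mom_select`, as the worst-case linear fallback. All names are illustrative.

```python
import random


def mom_select(items, k):
    """Worst-case linear selection (median of medians), used as the fallback."""
    if len(items) <= 5:
        return sorted(items)[k]
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = mom_select(medians, len(medians) // 2)
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    num_equal = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return mom_select(lows, k)
    if k < len(lows) + num_equal:
        return pivot
    return mom_select(highs, k - len(lows) - num_equal)


def introselect(items, k, budget_factor=3):
    """Quickselect that sums the sizes of the partitions it keeps and hands the
    remaining subproblem to mom_select once that sum exceeds budget_factor * n."""
    budget = budget_factor * len(items)
    work = 0  # sum of the sizes of all partitions generated so far
    while len(items) > 1:
        if work > budget:
            return mom_select(items, k)   # too little progress: switch algorithms
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        num_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + num_equal:
            return pivot
        else:
            k -= len(lows) + num_equal
            items = highs
        work += len(items)
    return items[0]
```

With good pivots the partition sizes shrink geometrically, so their sum stays below a small multiple of n and the fallback is never reached; the counter only trips on runs of poor pivots.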

Quicksort vs Median asymptotic behavior

Quicksort and median finding use the same method (divide and conquer), so why do they have different asymptotic behavior?
Is it because quicksort may not choose the proper pivot?
When you use Quicksort's partition method to find the median, it returns the index of the pivot element, which is now in its correct (final sorted) position. Based on that position, you only need to keep searching the part that contains the median.
For example, if the array length is 5, the median is the 3rd element. If the partition method returns position 2, you only need to check the upper part of the array, positions 3 to 5, not the whole array as Quicksort does.
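The partition-then-narrow idea can be sketched in place. This assumes Lomuto partitioning with the last element of the range as the pivot, a common textbook choice; the function names are illustrative.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def select_in_place(a, k):
    """Return the k-th smallest element (0-based), rearranging a in place.

    Unlike quicksort, after each partition only the side containing index k is kept.
    """
    lo, hi = 0, len(a) - 1
    while True:
        p = partition(a, lo, hi)
        if p == k:
            return a[p]
        if k < p:
            hi = p - 1   # answer lies left of the pivot
        else:
            lo = p + 1   # answer lies right of the pivot
```

For `[4, 5, 1, 3, 2]` and `k = 2` (the 3rd smallest element, i.e. the median), the first partition places the pivot 2 at index 1, so the loop continues only on indices 2 to 4 and eventually returns 3. With this fixed pivot rule, already-sorted input makes every partition lopsided, which is the poor worst case discussed below.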
If you use Hoare's original select algorithm, you can get the same sort of poor worst-case performance that you can from Quicksort.
If you use the median of medians, then you limit the worst case, at the expense of being slower in most typical cases.
You could use the median of medians to find a pivot for Quicksort, which would have roughly the same effect: it limits the worst case at the expense of being slower in most cases.
Of course, for the sort (in general), each level of partitioning does O(N) work in total, and you expect about log(N) levels, so you get approximately O(N log N) overall complexity.
With median finding, you also expect approximately O(log N) steps, but at each step you only consider the partition from the previous step that can contain the median (or quartile, etc., that you care about). You expect the size of that partition to be (approximately) halved at every step, rather than always having to partition the entire input, so you end up with approximately O(N) complexity instead of O(N log N) overall.
[Note that throughout this, I'm sort of abusing big-O notation to represent expected complexity whereas big-O is really supposed to represent the upper-bound (i.e., worst-case) complexity.]

Finding the median of medians of quicksort

I am working on quicksort with the median-of-medians algorithm. I normally use selection sort to get the median of each subarray of 5 elements. However, if there are thousands of subarrays, I have to find the median of thousands of medians, and I don't think I can use selection sort for that because it is not efficient.
Question:
Can anyone suggest a better way to find that median?
Thanks in advance.
The median-of-medians algorithm doesn't work by finding the median of each block of size 5 and then running a sorting algorithm on them to find the median. Instead, you typically would sort each block, take the median of each, then recursively invoke the median-of-medians algorithm on these medians to get a good pivot. It's very uncommon to see the median-of-medians algorithm used in quicksort, since the constant factor in the O(n) runtime of the median-of-medians algorithm is so large that it tends to noticeably degrade performance.
There are several possible improvements you can try over this original approach. The simplest way to get a good pivot is just to pick a random element - this leads to Θ(n log n) runtime with very high probability. If you're not comfortable using randomness, you can try using the introselect algorithm, which is a modification of the median-of-medians algorithm that tries to lower the constant factor by guessing an element that might be a good pivot and cutting off the recursion early if one is found. You could also try writing introsort, which uses quicksort and switches to a different algorithm (usually heapsort) if it appears that the algorithm is degenerating.
Hope this helps!

Using median selection in quicksort?

I have a small question about Quicksort. When the minimum or maximum value of the array is selected as the pivot, the partition is very inefficient, because the array size decreases by only one.
However, if I add code that selects the median of the array as the pivot, I think it will be more efficient. Since the partition algorithm is already O(N), this should give an O(N log N) algorithm.
Can this be done?
You absolutely can use a linear-time median selection algorithm to compute the pivot in quicksort. This gives you a worst-case O(n log n) sorting algorithm.
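A sketch of that idea, assuming group-of-5 median-of-medians selection for the exact median (the names `select` and `quicksort_median_pivot` are illustrative). Taking the exact median as the pivot means each recursive call gets at most half of the elements, so the recursion depth is O(log n) and every level does O(n) work:

```python
def select(items, k):
    """Return the k-th smallest element (0-based) using median of medians, worst-case O(n)."""
    if len(items) <= 5:
        return sorted(items)[k]
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    pivot = select([sorted(g)[len(g) // 2] for g in groups], len(groups) // 2)
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    num_equal = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + num_equal:
        return pivot
    return select(highs, k - len(lows) - num_equal)


def quicksort_median_pivot(items):
    """Quicksort that always pivots on the exact median: worst-case O(n log n)."""
    if len(items) <= 1:
        return list(items)
    pivot = select(items, len(items) // 2)
    lows = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    highs = [x for x in items if x > pivot]
    return quicksort_median_pivot(lows) + equal + quicksort_median_pivot(highs)
```

The guarantee is real, but every level pays for a full median-of-medians pass on top of the partitioning, which is where the large constant factor comes from.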
However, the constant factor on linear-time selection tends to be so high that the resulting algorithm will, in practice, be much, much slower than a quicksort that just randomly chooses the pivot on each iteration. Therefore, it's not common to see such an implementation.
A completely different approach to avoiding the O(n^2) worst case is to use an approach like the one in introsort. This algorithm monitors the recursive depth of the quicksort. If it appears that the algorithm is starting to degenerate, it switches to a different sorting algorithm (usually, heapsort) with a guaranteed worst-case O(n log n). This makes the overall algorithm O(n log n) without noticeably decreasing performance.
Hope this helps!

Resources