I want to find an approximate median of an unsorted list. I know two algorithms:
algorithm 1: quickselect
algorithm 2: median of medians
I can't use quickselect in my project because it takes O(n^2) time in the worst case.
I have heard about median of medians, but my colleagues tell me that although it runs in O(n), i.e. Cn time, the constant factor C is large compared to quickselect. I want to know: what is the constant factor associated with median of medians? And why does median of medians not use a pseudo-median of 9 elements?
Or is there any other algorithm that finds an approximate median in linear time, O(n)?
Although I wouldn't be quick to discard quickselect, since its worst-case behavior is highly improbable with properly chosen pivots...
Perhaps introselect:
Introselect (short for "introspective selection") is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance.
Introselect works by optimistically starting out with quickselect and only switching to the worst-time linear algorithm if it recurses too many times without making sufficient progress. The switching strategy is the main technical content of the algorithm. Simply limiting the recursion to constant depth is not good enough, since this would make the algorithm switch on all sufficiently large lists. Musser discusses a couple of simple approaches:
Keep track of the list of sizes of the subpartitions processed so far. If at any point k recursive calls have been made without halving the list size, for some small positive k, switch to the worst-case linear algorithm.
Sum the size of all partitions generated so far. If this exceeds the list size times some small positive constant k, switch to the worst-case linear algorithm. This sum is easy to track in a single scalar variable.
Both approaches limit the recursion depth to k ⌈log n⌉ = O(log n) and the total running time to O(n).
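For concreteness, here is a minimal Python sketch of the second strategy: sum the sizes of the partitions processed so far and switch to the worst-case linear algorithm once that sum exceeds a small constant times n. The budget constant 4 is an illustrative assumption, and the sorted() call is only a stand-in so the sketch runs; a real introselect would call a median-of-medians select at the marked line.

    import random

    def partition(a, lo, hi, p):
        """Lomuto partition of a[lo..hi] around a[p]; returns the pivot's final index."""
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def introselect(a, k):
        """Return the k-th smallest element of a (0-indexed)."""
        lo, hi = 0, len(a) - 1
        budget = 4 * len(a)      # "list size times some small positive constant"
        work = 0
        while lo < hi:
            work += hi - lo + 1  # size of the partition about to be scanned
            if work > budget:
                # Insufficient progress: switch to the worst-case linear
                # algorithm (median of medians; sorted() is a stand-in here).
                return sorted(a[lo:hi + 1])[k - lo]
            p = partition(a, lo, hi, random.randint(lo, hi))
            if k < p:
                hi = p - 1
            elif k > p:
                lo = p + 1
            else:
                return a[k]
        return a[lo]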
I am prepping for interview LeetCode-type problems and I came across the k closest problem, given a sorted array. This problem requires finding the k elements closest by value to an input value from the array. The answer was fairly straightforward, and I had no issues determining a linear-time algorithm to solve it.
However, working on this problem got me thinking: is it possible to solve it given an unsorted array in linear time? My first thought was to use a heap, which would give an O(n log k) solution, but I am trying to determine whether an O(n) solution is possible. I was thinking about using something like quickselect, but the issue is that it has an expected time of O(n), not a worst-case time of O(n).
Is this even possible?
The median-of-medians algorithm makes Quickselect take O(n) time in the worst case.
It is used to select a pivot:
Divide the array into groups of 5 (O(n))
Find the median of each group (O(n))
Recursively use this same selection algorithm to find the median of the n/5 medians (T(n/5))
The resulting pivot is guaranteed to be greater than at least 30% of the elements and less than at least another 30%, so it guarantees linear-time Quickselect.
After selecting the pivot, of course, you have to continue on with the rest of Quickselect, which includes a recursive call like the one we made to select the pivot.
The worst case total time is T(n) = O(n) + T(0.7n) + T(n/5), which is still linear. Compared to the expected time of normal Quickselect, though, it's pretty slow, which is why we don't often use this in practice.
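To make the structure concrete, here is an illustrative Python sketch of the three steps and the Quickselect wrapped around them. It uses list comprehensions instead of in-place partitioning for readability, which makes the already large constant factor even worse; treat it as a sketch of the technique, not a tuned implementation.

    def median_of_medians_pivot(a):
        """Steps 1-3 above: a pivot beating at least ~30% of a on each side."""
        # Split into groups of 5 and take each group's median -- O(n).
        groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursively select the median of the n/5 medians -- T(n/5).
        return select(medians, len(medians) // 2)

    def select(a, k):
        """k-th smallest of a (0-indexed) in worst-case O(n)."""
        if len(a) <= 5:
            return sorted(a)[k]
        pivot = median_of_medians_pivot(a)
        lows  = [x for x in a if x < pivot]   # at worst 0.7n elements -- T(0.7n)
        highs = [x for x in a if x > pivot]
        ties  = len(a) - len(lows) - len(highs)
        if k < len(lows):
            return select(lows, k)
        if k < len(lows) + ties:
            return pivot
        return select(highs, k - len(lows) - ties)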
Your heap solution would be very welcome at an interview, I'm sure.
If you really want to get rid of the log k, which in practical applications should seldom be a problem, then yes, using Quickselect would be another option. Something like this:
Partition your array into values smaller and larger than x. <- O(n).
For the lower half, run Quickselect to find the kth largest number, then take the right-side partition, which contains your k largest numbers. <- O(n)
Repeat step 2 for the higher half, but for the k smallest numbers. <- O(n)
Merge your k smallest and k largest numbers and extract the k closest numbers. <- O(k)
This gives you a total time complexity of O(n), as you said.
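Sketched in Python, those four steps might look like the following. The select here is a plain randomized quickselect, so the overall bound is expected O(n); substituting a worst-case linear (median-of-medians) select would give the guarantee. The tie padding in top_k/bottom_k is my own detail for handling duplicate values, not part of the outline above.

    import random

    def select(a, k):
        """k-th smallest value of a (0-indexed); randomized quickselect."""
        pivot = random.choice(a)
        lows  = [v for v in a if v < pivot]
        highs = [v for v in a if v > pivot]
        if k < len(lows):
            return select(lows, k)
        if k >= len(a) - len(highs):
            return select(highs, k - (len(a) - len(highs)))
        return pivot

    def top_k(vals, k):
        """The k largest values of vals, in O(len(vals))."""
        if k <= 0:
            return []
        if len(vals) <= k:
            return vals
        cut = select(vals, len(vals) - k)           # the k-th largest value
        bigger = [v for v in vals if v > cut]
        return bigger + [cut] * (k - len(bigger))   # pad with ties on the cut

    def bottom_k(vals, k):
        """The k smallest values of vals, in O(len(vals))."""
        if k <= 0:
            return []
        if len(vals) <= k:
            return vals
        cut = select(vals, k - 1)                   # the k-th smallest value
        smaller = [v for v in vals if v < cut]
        return smaller + [cut] * (k - len(smaller))

    def k_closest(a, x, k):
        """The k values in the unsorted list a closest to x."""
        lower  = [v for v in a if v <= x]           # step 1: partition, O(n)
        higher = [v for v in a if v > x]
        # Steps 2-3: the k candidates just below and just above x -- O(n).
        candidates = top_k(lower, k) + bottom_k(higher, k)
        # Step 4: keep the k nearest of the <= 2k candidates.  Sorting them
        # costs O(k log k); selecting on |v - x| instead would keep it O(k).
        return sorted(candidates, key=lambda v: abs(v - x))[:k]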
However, a few points about your worry about expected time vs worst-case time. I understand that if an interview question explicitly insists on worst-case O(n), then this solution might not be accepted, but otherwise, this can well be considered O(n) in practice.
The key point is that for randomized quickselect on random or well-behaved input, the probability that the running time exceeds O(n) decreases exponentially as the input grows. Already at moderately large inputs, that probability is as small as the chance of guessing a specific atom in the known universe. The assumption of well-behaved input means input that is somewhat random in nature and not adversarial. See this discussion on a similar (not identical) problem.
Personally, I think the median of medians is not the real median. Correct? So, if that statement is true, why does using the median of medians as the pivot to partition the array when finding the k-th smallest element give a worst-case time complexity of O(n)? (Here n is the number of elements.)
Median of medians is indeed only an approximation, not necessarily the actual median.
It is used as an optimization, to calculate the pivot for a partition of the array in algorithms like Quicksort or Quickselect, such that the worst case complexity of O(n^2) is avoided.
The Wikipedia article about it says:
Although this approach optimizes quite well, it is typically outperformed in practice by instead choosing random pivots, which has average linear time for selection and average linearithmic time for sorting, and avoids the overhead of computing the pivot.
Quicksort and median finding (Quickselect) use the same method, divide and conquer. Why is it, then, that they have different asymptotic behavior?
Is it that Quicksort may not use the proper pivot?
When you use the partition method from Quicksort (see the method in the link) to find the median, it returns the index of the one element that now sits in its correct sorted position. Based on that index, you only need to recurse into the single part that can contain the median, not both parts as Quicksort does.
For example, if the array length is 5, the median is the 3rd element (1-indexed). If the partition method returns index 2, you only need to search the upper part of the array, from 2 to 5, not the whole array as Quicksort would.
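Here is a side-by-side Python sketch of that difference, using a fixed last-element pivot for brevity (a production version would pick the pivot randomly or via median of medians):

    def partition(a, lo, hi):
        """Lomuto partition of a[lo..hi] around a[hi]; returns the index of
        the one element that now sits in its correct sorted position."""
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def quicksort(a, lo, hi):
        """Recurses into BOTH parts: ~log n levels of O(n) work -> O(n log n)."""
        if lo < hi:
            p = partition(a, lo, hi)
            quicksort(a, lo, p - 1)
            quicksort(a, p + 1, hi)

    def quickselect(a, k, lo, hi):
        """Recurses into the ONE part that can contain index k, so the work
        shrinks geometrically: n + n/2 + n/4 + ... -> expected O(n)."""
        p = partition(a, lo, hi)
        if k < p:
            return quickselect(a, k, lo, p - 1)
        if k > p:
            return quickselect(a, k, p + 1, hi)
        return a[k]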
If you use Hoare's original select algorithm, you can get the same sort of poor worst case performance that you can from Quicksort.
If you use the median of medians, then you limit the worst case, at the expense of being slower in most typical cases.
You could use the median of medians to find a pivot for Quicksort, which would have roughly the same effect--limit the worst case, at the expense of being slower in most cases.
Of course, for the sort (in general) each partition operation is O(N), and you expect to do about log(N) partition operations, so you get approximately O(N log N) overall complexity.
With median finding, you also expect to do approximately O(log N) steps, but at each step you only consider the one partition from the previous step that can contain the median (or quartile, etc., that you care about). You expect the sizes of those partitions to halve (approximately) at every step, rather than always partitioning the entire input, so the total work is roughly N + N/2 + N/4 + ... < 2N, which gives approximately O(N) overall complexity instead of O(N log N).
[Note that throughout this, I'm sort of abusing big-O notation to represent expected complexity whereas big-O is really supposed to represent the upper-bound (i.e., worst-case) complexity.]
I am working on quicksort with the median-of-medians algorithm. I normally use selection sort to get the median of each subarray of 5 elements. However, if there are thousands of subarrays, that means I have to find the median of a thousand medians. I don't think I can use selection sort to find that median, because it would not be optimal.
Question:
Can anyone suggest a better way to find that median?
Thanks in advance.
The median-of-medians algorithm doesn't work by finding the median of each block of size 5 and then running a sorting algorithm on them to find the median. Instead, you typically would sort each block, take the median of each, then recursively invoke the median-of-medians algorithm on these medians to get a good pivot. It's very uncommon to see the median-of-medians algorithm used in quicksort, since the constant factor in the O(n) runtime of the median-of-medians algorithm is so large that it tends to noticeably degrade performance.
There are several possible improvements you can try over this original approach. The simplest way to get a good pivot is just to pick a random element - this leads to Θ(n log n) runtime with very high probability. If you're not comfortable using randomness, you can try using the introselect algorithm, which is a modification of the median-of-medians algorithm that tries to lower the constant factor by guessing an element that might be a good pivot and cutting off the recursion early if one is found. You could also try writing introsort, which uses quicksort and switches to a different algorithm (usually heapsort) if it appears that the algorithm is degenerating.
Hope this helps!
I have a slight question about Quicksort. When the minimum or maximum value of the array is selected as the pivot, the partition is very inefficient, since the array size decreases by only 1.
However, if I add code to select the median of the array as the pivot, I think it will be more efficient. Since the partition algorithm is already O(N), this would give an O(N log N) algorithm.
Can this be done?
You absolutely can use a linear-time median selection algorithm to compute the pivot in quicksort. This gives you a worst-case O(n log n) sorting algorithm.
However, the constant factor on linear-time selection tends to be so high that the resulting algorithm will, in practice, be much, much slower than a quicksort that just randomly chooses the pivot on each iteration. Therefore, it's not common to see such an implementation.
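For illustration only, such a quicksort could be sketched in Python as below. The select function shown is a sorted()-based stand-in so the sketch runs; the point of the construction is to replace it with a genuine worst-case O(n) selection such as median of medians. Note also that the simple two-way partition here degrades when a range is dominated by duplicate keys.

    def select(vals, k):
        """Stand-in for a worst-case O(n) selection (e.g. median of medians);
        sorted() keeps the sketch runnable but is O(n log n)."""
        return sorted(vals)[k]

    def quicksort_median_pivot(a, lo=0, hi=None):
        """Quicksort pivoting on the true median of each range, so every
        partition is balanced and the worst case is O(n log n)."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        median = select(a[lo:hi + 1], (hi - lo) // 2)
        p = a.index(median, lo, hi + 1)   # move the median into pivot position
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quicksort_median_pivot(a, lo, i - 1)
        quicksort_median_pivot(a, i + 1, hi)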
A completely different approach to avoiding the O(n^2) worst case is to use the approach taken in introsort. This algorithm monitors the recursion depth of the quicksort. If it appears that the algorithm is starting to degenerate, it switches to a different sorting algorithm (usually heapsort) with a guaranteed worst case of O(n log n). This makes the overall algorithm worst-case O(n log n) without noticeably decreasing real-world performance.
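A toy Python version of that idea: the depth cap of roughly 2*log2(n) mirrors common introsort implementations, and the heapsort fallback is done through heapq for brevity (real introsort also switches to insertion sort on very small ranges).

    import heapq
    import random

    def heap_sort(vals):
        """Guaranteed O(n log n) fallback."""
        heapq.heapify(vals)
        return [heapq.heappop(vals) for _ in range(len(vals))]

    def introsort(a, lo=0, hi=None, depth=None):
        """Randomized quicksort with a recursion-depth budget; a range that
        exhausts the budget is finished with heapsort instead."""
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        if depth is None:
            depth = 2 * (hi - lo + 1).bit_length()   # ~2 * log2(n)
        if depth == 0:
            a[lo:hi + 1] = heap_sort(a[lo:hi + 1])   # degenerating: switch
            return
        p = random.randint(lo, hi)                   # random pivot
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        introsort(a, lo, i - 1, depth - 1)
        introsort(a, i + 1, hi, depth - 1)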
Hope this helps!