I want to use a Divide-and-Conquer procedure for the computation of the i-th greatest element at a row of integers and analyze the asymptotic time complexity of the algorithm.
Algorithm ith(A,low,high){
q=partition(A,low,high);
if (high-i+1==q) return A[q];
else if (high-i+1<q) ith(A,low,q-1);
else ith(A,q+1,high);
}
Is it right? If so, how could we find its time complexity?
The time complexity is described by the following recurrence relation:
T(n)=T(n-q)+T(q-1)+ฮ(n)
But how can we solve this recurrence relation, without knowing the value of q?
Or is there an algorithm with less time complexity that computes the i-th greatest element at a row of integers?
This is a variation of the quick select algorithm (which finds the i-th smallest element rather than i-thgreatest element). It has a running time of O(n^2)in the worst case, and O(n) in the average case.
To see the worst case, assume you are searching for the nth largest element, and it happens that the algorithm always picks q to be the largest element in the remaining range. so you will be calling the ith function n times. In addition the partition subroutine takes O(n) so the total running time is O(n^2).
To understand the average case analysis , check the explanation given by professor Tim Roughgarden here.
Related
I am prepping for interview leet-code type problems and I came across the k closest problem, but given a sorted array. This problem requires finding the k closest elements by value to an input value from the array. The answer to this problem was fairly straight forward and I did not have any issues determining a linear-time algorithm to solve it.
However, working on this problem got me thinking. Is it possible to solve this problem given an unsorted array in linear time? My first thought was to use a heap and that would give an O(nlogk) time complexity solution, but I am trying to determine if its possible to come up with an O(n) solution? I was thinking about possibly using something like quickselect, but the issue is that this has an expected time of O(n), not a worst case time of O(n).
Is this even possible?
The median-of-medians algorithm makes Quickselect take O(n) time in the worst case.
It is used to select a pivot:
Divide the array into groups of 5 (O(n))
Find the median of each group (O(n))
Use Quickselect to find the median of the n/5 medians (O(n))
The resulting pivot is guaranteed to be greater and less than 30% of the elements, so it guarantees linear time Quickselect.
After selecting the pivot, of course, you have to continue on with the rest of Quickselect, which includes a recursive call like the one we made to select the pivot.
The worst case total time is T(n) = O(n) + T(0.7n) + T(n/5), which is still linear. Compared to the expected time of normal Quickselect, though, it's pretty slow, which is why we don't often use this in practice.
Your heap solution would be very welcome at an interview, I'm sure.
If you really want to get rid of the logk, which in practical applications should seldom be a problem, then yes, using Quickselect would be another option. Something like this:
Partition your array in values smaller and larger than x. <- O(n).
For the lower half, run Quickselect to find the kth largest number, then take the right-side partition which are your k largest numbers. <- O(n)
Repeat step 2 for the higher half, but for the k smallest numbers. <- O(n)
Merge your k smallest and k largest numbers and extract the k closest numbers. <- O(k)
This gives you a total time complexity of O(n), as you said.
However, a few points about your worry about expected time vs worst-case time. I understand that if an interview question explicitly insists on worst-case O(n), then this solution might not be accepted, but otherwise, this can well be considered O(n) in practice.
The key here being that for randomized quickselect and random or well-behaved input, the probability that the time complexity goes beyond O(n) decreases exponentially as the input grows. Meaning that already at largeish inputs, the probability is as small as guessing at a specific atom in the known universe. The assumption on well-behaved input concerns being somewhat random in nature and not adversarial. See this discussion on a similar (not identical) problem.
This is my question I have got somewhere.
Given a list of numbers in random order write a linear time algorithm to find the ๐th smallest number in the list. Explain why your algorithm is linear.
I have searched almost half the web and what I got to know is a linear-time algorithm is whose time complexity must be O(n). (I may be wrong somewhere)
We can solve the above question by different algorithms eg.
Sort the array and select k-1 element [O(n log n)]
Using min-heap [O(n + klog n)]
etc.
Now the problem is I couldn't find any algorithm which has O(n) time complexity and satisfies that algorithm is linear.
What can be the solution for this problem?
This is std::nth_element
From cppreference:
Notes
The algorithm used is typically introselect although other selection algorithms with suitable average-case complexity are allowed.
Given a list of numbers
although it is not compatible with std::list, only std::vector, std::deque and std::array, as it requires RandomAccessIterator.
linear search remembering k smallest values is O(n*k) but if k is considered constant then its O(n) time.
However if k is not considered as constant then Using histogram leads to O(n+m.log(m)) time and O(m) space complexity where m is number of possible distinct values/range in your input data. The algo is like this:
create histogram counters for each possible value and set it to zero O(m)
process all data and count the values O(m)
sort the histogram O(m.log(m))
pick k-th element from histogram O(1)
in case we are talking about unsigned integers from 0 to m-1 then histogram is computed like this:
int data[n]={your data},cnt[m],i;
for (i=0;i<m;i++) cnt[i]=0;
for (i=0;i<n;i++) cnt[data[i]]++;
However if your input data values does not comply above condition you need to change the range by interpolation or hashing. However if m is huge (or contains huge gaps) is this a no go as such histogram is either using buckets (which is not usable for your problem) or need list of values which lead to no longer linear complexity.
So when put all this together is your problem solvable with linear complexity when:
n >= m.log(m)
Given an array of integers. I want to design a comparison-based algorithm
that pairs the largest element with the smallest one, the second largest one with the second smallest one and so on. Obviously, this is easy if I sort the array, but I want to do it in O(n) time. How can I possibly solve this problem?
Well i can prove that it does not exists.
Let`s proof by contradiction: suppose there was such algorithm
When we could get an array of kth min and kth max pairs.
We could when get sorted array by taking all mins in order then all max in order,
so we could get original array sorted in O(n) steps.
So we could get a comparision based sorting algorithm that sorts in O(n)
Yet it can be proven that comparision based sorting algorithm must take atleast n
log n steps. (many proofs online. i.e. https://www.geeksforgeeks.org/lower-bound-on-comparison-based-sorting-algorithms/)
Hence we have a contradiction so such algortihm does not
exist.
Assuming a random process X of n elements: X={x1,...,xn}.
For a given probability p, the corresponding quantile is determined through the quantile function Q defined as:
Q(p)={x|Pr(X<=x)=p}
What is the time complexity of finding the quantile of a given probability p?
Quickselect can use median of medians pivot selection to get O(n) worst-case runtime to select the kth order statistic. This appears to be more or less what you are after unless I am misinterpreting (you want the (n*p)'th smallest element).
You cannot possibly improve upon this in the general case: you must at least look at the whole array or else the element you didn't look at might be the answer you need.
Therefore, the Quickselect is theoretically optimal in the worst case. Note: this pivot selection strategy has bad constants and is not used in practice. In practice, using random pivot selection gives good expected performance but O(n^2) worst-case.
For my algorithm design class homework came this brain teaser:
Given a list of N distinct positive integers, partition the list into two
sublists of n/2 size such that the difference between sums of the sublists
is maximized.
Assume that n is even and determine the time complexity.
At first glance, the solution seems to be
sort the list via mergesort
select the n/2 location
for all elements greater than, add to high array
for all elements lower than, add to low array
This would have a time complexity of O((n log n)+ n)
Are there any better algorithm choices for this problem?
Since you can calculate median in O(n) time you can also solve this problem in O(n) time. Calculate median, and using it as threshold, create high array and low array.
See http://en.wikipedia.org/wiki/Median_search on calculating median in O(n) time.
Try
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
What you're effectively doing is finding the median. The trick is, once you've found the values, you wouldn't have needed to sort the first n/2 and the last n/2.