The time complexity of the quickselect algorithm

I read that the time complexity of quick select is:
T(n) = T(n/5) + T(7n/10) + O(n)
I read the above as "time taken to quickselect from n elements = (time taken to quickselect from 7n/10 elements) + (time taken to quickselect from n/5 elements) + (some constant times n)".
So I understand that once we find a decent pivot, only 7n/10 elements are left, and doing one round of arranging the pivot takes time n.
But the n/5 part confuses me. I know it has to do with median of medians, but I don't quite get it.
Median of medians, from what I understood, is recursively splitting into groups of 5 and finding the medians, until you get 1.
I found that the time taken to do that is about n,
so T_mom(n) = n.
How do you get from that to T_quickselect(n) containing a term like T_mom(n)/5?
In other words, this is what I think the equation should read:
T(n) = O(n) + n + T(7n/10)
where
O(n) -> for finding the median,
n -> for getting the pivot into its position,
T(7n/10) -> doing the same thing for the other 7n/10 elements (worst case).
Can someone tell me where I'm going wrong?

In this setup, T(n) refers to the number of steps required to run the selection algorithm (quickselect with the median-of-medians pivot rule) on an array of n elements. Let's go through the algorithm one step at a time and see what happens.
First, we break the input into blocks of size 5, sort each block, form a new array of the medians of those blocks, and recursively call MoM to get the median of that new array. Let's see how long each of those steps takes:
1. Break the input into blocks of size 5: this can be done in time O(1) by just implicitly partitioning the array into blocks without moving anything.
2. Sort each block: sorting an array of any constant size takes time O(1). There are O(n) such blocks (specifically, ⌈n / 5⌉), so this takes time O(n).
3. Get the median of each block and form a new array from those medians: the median element of each block can be found in time O(1) by just looking at the center element. There are O(n) blocks, so this step takes time O(n).
4. Recursively call MoM on that new array: this takes time T(⌈n/5⌉), since we're making a recursive call on the array of that size we formed in the previous step.
So this means that the logic to get the actual median of medians takes time O(n) + T(⌈n/5⌉).
So where does the T(7n/10) part come from? Well, the next step in the algorithm is to use the median of medians we found in step (4) as a partition element to split the elements into elements less than that pivot and elements greater than that pivot. From there, we can determine whether we've found the element we're looking for (if it's at the right spot in the array) or whether we need to recurse on the left or right regions of the array. The advantage of picking the median of the block medians as the splitting point is that it guarantees a worst-case 70/30 split in this step between the smaller and larger elements, so if we do have to recursively continue the algorithm, in the worst case we do so with roughly 7n/10 elements.
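To make the recurrence concrete, here is a minimal Python sketch of quickselect with the median-of-medians pivot rule (the function and variable names are mine, not from the question); the comments mark which part contributes each term of T(n) = T(n/5) + T(7n/10) + O(n).

```python
# A minimal sketch of quickselect with the median-of-medians pivot rule.
# The function and variable names are mine, not from the question.

def mom_select(arr, k):
    """Return the k-th smallest element (0-indexed) of arr."""
    if len(arr) <= 5:
        return sorted(arr)[k]

    # Split into blocks of 5, sort each, take each block's median -- the O(n) part.
    blocks = [arr[i:i + 5] for i in range(0, len(arr), 5)]
    medians = [sorted(b)[len(b) // 2] for b in blocks]

    # Recursively select the median of those ~n/5 medians -- the T(n/5) term.
    pivot = mom_select(medians, len(medians) // 2)

    # Partition around the pivot -- another O(n) pass.
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]

    # Recurse into one side, which holds at most ~7n/10 elements -- the T(7n/10) term.
    if k < len(less):
        return mom_select(less, k)
    if k < len(less) + len(equal):
        return pivot
    return mom_select(greater, k - len(less) - len(equal))

print(mom_select([9, 1, 7, 3, 5, 8, 2], 3))  # -> 5, the 4th smallest element
```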

In the median-of-medians part, we do the following:
Take the median of sublists that each have at most 5 elements. Each of these lists needs O(1) operations, and there are n/5 such lists, so finding the median of each of them takes O(n) in total.
Then we take the median of those n/5 medians (the median of medians). This needs T(n/5), because there are only n/5 elements we have to consider.
So the median-of-medians part is actually T(n/5) + O(n). By the way, the T(7n/10) part is not exactly what you said.

Related

quick sort time and space complexity?

Quicksort is an in-place algorithm which does not use any auxiliary array. So why is its memory complexity O(n log(n))?
Similarly, I understand that its worst-case time complexity is O(n^2), but I don't get why the average-case time complexity is O(n log(n)). Basically, I am not sure what we mean when we say average-case complexity.
To your second point, an excerpt from Wikipedia:
The most unbalanced partition occurs when the partitioning routine returns one of the sublists with size n − 1. This may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.
If this happens repeatedly in every partition, then each recursive call processes a list of size one less than the previous list. Consequently, we can make n − 1 nested calls before we reach a list of size 1. This means that the call tree is a linear chain of n − 1 nested calls. The ith call does O(n − i) work to do the partition, and \sum_{i=0}^{n}(n - i) = O(n^2), so in that case quicksort takes O(n^2) time.
Because you usually don't know exactly which numbers you have to sort, and you don't know which pivot element you will end up choosing, there is a good chance that your pivot element isn't the smallest or biggest number in the array you sort. If you have an array of n numbers without duplicates, the chance that you don't hit a worst-case pivot is (n − 2) / n.
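As an illustration of the worst case described in the excerpt, here is a minimal sketch (my own code, not from the Wikipedia article) of quicksort with the Lomuto partition scheme and the last element as pivot:

```python
# A minimal quicksort sketch using the Lomuto partition scheme (last element as
# pivot), only to illustrate the unbalanced worst case; the names are mine.

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        # On already-sorted input the pivot is the maximum, so the left call gets
        # all remaining elements and the right call gets none: T(n) = T(n-1) + O(n).
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

def lomuto_partition(a, lo, hi):
    pivot = a[hi]                       # last element as pivot
    i = lo
    for j in range(lo, hi):             # O(hi - lo) comparisons per call
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # put the pivot into its final position
    return i

data = [3, 5, 1, 4, 2]
quicksort(data)
print(data)  # -> [1, 2, 3, 4, 5]
```

On sorted or reverse-sorted input the recursion above becomes a chain of n − 1 nested calls, which is also why the worst-case stack usage is O(n), while a balanced split keeps the recursion depth (and hence the stack) at O(log n).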

Can't define runtime of this algorithm

I have an algorithm here; see the linked algorithm image.
What it does is traverse an array, find the 3 largest values, and return their sum.
For example, the array [1,2,3,4,5] will return 12 (3+4+5=12).
The algorithm in the image is said to be O(n log k), but that is what I cannot understand.
Here is my reasoning about the first for loop in the image:
The heap methods insert() and deleteMin() both take O(log n). So one iteration of the first for loop costs O(2 * log n), which is simply O(log n). Since the first for loop iterates over every element in the array, its total runtime is O(n log n).
And here is my reasoning about the second while loop in the image:
In the previous for loop we deleted some of the minimum values whenever h.size() > k, so the heap currently holds k values. "sum = sum + h.min()" takes O(log n), because searching for the minimum value in a heap takes O(log n) if I understand correctly, and "h.deleteMin()" also takes O(log n) because it has to search again and delete. Adding them gives O(2 * log n), which is simply O(log n). Since this while loop runs only k times (there are k elements left in the heap), it results in O(k * log n).
So we have O(n log n) from the first for loop and O(k log n) from the second while loop. Clearly O(n log n) dominates O(k log n), since k is a constant, so the algorithm ends up being O(n log n).
But the answer says it is O(n log k) instead of O(n log n).
Can you explain the reason?
Operations on a heap take O(log(size_of_heap)). In the case of this algorithm, the heap size is k (excluding the first several iterations).
So we get O(total_number_of_elements * log(size_of_heap)) = O(n * log(k)).
Your assumption that insert() and deleteMin() take O(log n) is not correct. The 'n' in O(log n) represents the number of elements in the heap, which in this case is k.
Hence, the first loop costs O(2 * log k) per element, for a total of O(n log k), and the second loop costs O(k log k).
Together, the total complexity can be defined as O(n * log k).
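Here is a minimal Python sketch of the same idea using heapq (my own reconstruction of the pictured algorithm, since the image itself isn't reproduced here): the heap never grows beyond k elements, so every push and pop costs O(log k), not O(log n).

```python
import heapq

# A sketch reconstructing the pictured top-k-sum algorithm: keep a min-heap of
# at most k elements while scanning, then sum and empty the heap.

def sum_of_k_largest(values, k=3):
    heap = []                          # min-heap; its size never exceeds k
    for v in values:                   # n iterations
        heapq.heappush(heap, v)        # O(log k): the heap holds at most k+1 items
        if len(heap) > k:
            heapq.heappop(heap)        # O(log k): drop the smallest element

    total = 0
    while heap:                        # runs k times
        total += heap[0]               # smallest remaining element
        heapq.heappop(heap)            # O(log k)
    return total

print(sum_of_k_largest([1, 2, 3, 4, 5]))  # -> 12  (3 + 4 + 5)
```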

Number of comparisons in quick sort variation

Will the number of comparisons differ between taking the last element as the pivot in quicksort and taking the first element as the pivot?
No, it will not. In quicksort, we choose a pivot element (say x) and then divide the list into two parts: elements larger than x and elements less than x.
The number of comparisons therefore changes roughly in proportion to the recursion depth: the deeper the recursion goes, the more comparisons are made to split the lists into two parts.
The recursion depth itself differs: the more evenly x divides the list into similar-length parts, the smaller the recursion depth.
So the conclusion is that it doesn't matter whether you choose the first or the last element as the pivot; what matters is whether that value can divide the list into two lists of similar length.
Edit
The closer the pivot is to the median, the lower the complexity (down to O(n log n)). The closer the pivot is to the maximum or minimum of the list, the higher the complexity (up to O(n^2)).
When the first or last element is chosen as the pivot, the number of comparisons stays the same, but the worst case occurs when the array is already sorted or reverse sorted.
In that case, each step divides the numbers according to the recurrence
T(n) = T(n-1) + O(n), and solving this relation gives a complexity of Θ(n^2).
When you choose the median element as the pivot, the recurrence relationship is
T(n) = 2T(n/2) + Θ(n), which is the best case, as it gives a complexity of O(n log n).
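If you want to check this empirically, here is a small sketch (my own code, not from the answers above) that counts element-vs-pivot comparisons, so you can compare the first-element and last-element pivot choices on random and on already-sorted input yourself.

```python
import random

# A small experiment (my own sketch): count how many element-vs-pivot comparisons
# quicksort makes with the first element or the last element as the pivot.

def quicksort_comparisons(values, use_first_pivot):
    """Sort a copy of `values` and return the number of comparisons against pivots."""
    comparisons = 0

    def sort(lst):
        nonlocal comparisons
        if len(lst) <= 1:
            return lst
        pivot = lst[0] if use_first_pivot else lst[-1]
        rest = lst[1:] if use_first_pivot else lst[:-1]
        comparisons += len(rest)          # each remaining element is compared to the pivot once
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return sort(smaller) + [pivot] + sort(larger)

    sort(list(values))
    return comparisons

data = [random.randrange(10_000) for _ in range(500)]
print(quicksort_comparisons(data, use_first_pivot=True))
print(quicksort_comparisons(data, use_first_pivot=False))
print(quicksort_comparisons(sorted(data), use_first_pivot=True))  # n(n-1)/2 on sorted input
```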

Finding a specific ratio in an unsorted array. Time complexity

This is a homework assignment.
The goal is to present an algorithm in pseudocode that searches an array of numbers (it doesn't specify whether they are integers or > 0) and checks whether the ratio of any two numbers equals a given x. The time complexity must be within O(n log n).
My idea was to mergesort the array (O(n log n) time) and then, if |x| > 1, check every number in descending order (using binary search). Each check should take O(log n) time; with a worst case of n checks, that gives a total of O(n log n). If I am not missing anything, this should give a worst case of O(n log n) + O(n log n) = O(n log n), within the parameters of the assignment.
I realize that it doesn't really matter where I start checking the ratios after sorting, but the time cost is amortized by 1/2.
Is my logic correct? Is there a faster algorithm?
An example in case it isn't clear:
Given an array { 4, 9, 2, 1, 8, 6 }
If we want to search for a ratio of 2:
1. Mergesort: { 9, 8, 6, 4, 2, 1 }
2. Since the given ratio is > 1, we search from left to right.
2a. First number is 9. Check 9/4 > 2; check 9/6 < 2. Next number.
2b. Second number is 8. Check 8/4 = 2. DONE
The analysis you have presented is correct and is a perfectly good way to solve this problem. Sorting does work in time O(n log n), and 2n binary searches also take O(n log n) time. That said, I don't think you want to use the term "amortized" here, since that refers to a different type of analysis.
As a hint for how to speed up your solution a bit, the general idea of your solution is to make it possible to efficiently query, for any number, whether that number exists in the array. That way, you can just loop over all numbers and look for anything that would make the ratio work. However, if you use an auxiliary data structure outside the array that supports fast access, you can possibly whittle down your runtime at the cost of increasing the memory usage. Try thinking about what data structures support very fast access (say, O(1) lookups) and see if you can use any of them here.
Hope this helps!
To solve this problem, O(n lg n) is enough.
Step 1: sort the array. That costs O(n lg n).
Step 2: check whether the ratio exists. This step only needs O(n).
You just need two pointers: one points to the first element (the smallest one), the other points to the last element (the biggest one).
Calculate the ratio.
If the ratio is bigger than the specified one, move the second pointer to its previous element.
If the ratio is smaller than the specified one, move the first pointer to its next element.
Repeat the above steps until:
you find the exact ratio, or
the first pointer reaches the end, or the second pointer reaches the beginning.
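For reference, here is a sketch of a linear two-pointer pass after sorting (my own code; note that it walks both pointers forward from the left rather than inward from the two ends, so that the ratio arr[j] / arr[i] changes monotonically as each pointer moves, and it assumes positive numbers and a target ratio x >= 1):

```python
# A sketch of a sort-then-two-pointer check (my code). Assumes the numbers are
# positive and the target ratio x is >= 1; both pointers only move forward.

def has_ratio_two_pointers(numbers, x):
    arr = sorted(numbers)                  # O(n log n)
    i, j = 0, 1
    while j < len(arr):                    # each pointer moves at most n times: O(n)
        ratio = arr[j] / arr[i]
        if ratio == x:
            return True
        if ratio < x:
            j += 1                         # need a bigger numerator
        else:
            i += 1                         # need a bigger denominator
            if i == j:
                j += 1
    return False

print(has_ratio_two_pointers([4, 9, 2, 1, 8, 6], 2))  # True: 8 / 4 == 2
print(has_ratio_two_pointers([4, 9, 2, 1, 8, 6], 7))  # False
```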
The complexity of your algorithm is O(n²), because after sorting the array, you iterate over each element (up to n times) and in each iteration you execute up to n - 1 divisions.
Instead, after sorting the array, iterate over each element, and in each iteration divide the element by the ratio, then see if the result is contained in the array:
division: O(1)
search in sorted list: O(log n)
repeat for each element: n times
Results in time complexity O(n log n)
In your example:
9/2 = 4.5 (not found)
8/2 = 4 (found)
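A minimal sketch of that sort-plus-binary-search approach (my own code, using Python's bisect for the O(log n) lookup):

```python
from bisect import bisect_left

# Sketch of the approach described above: sort once, then for each element
# divide it by the ratio and binary-search for the quotient in the sorted array.

def has_ratio(numbers, x):
    arr = sorted(numbers)                      # O(n log n)
    for i, value in enumerate(arr):            # n iterations
        target = value / x                     # O(1) division (beware float rounding
                                               # if the inputs aren't exact integers)
        j = bisect_left(arr, target)           # O(log n) search in the sorted list
        if j < len(arr) and j != i and arr[j] == target:
            return True
    return False

print(has_ratio([4, 9, 2, 1, 8, 6], 2))  # True: 8 / 2 = 4, and 4 is in the array
print(has_ratio([4, 9, 2, 1, 8, 6], 7))  # False
```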
(1) Build a hash map of this array. Time cost: O(n).
(2) For every element a[i], search for a[i] * x in the hash map. Time cost: O(n).
Total cost: O(n).
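A sketch of that hash-based idea (my own code): the lookups are O(1) on average, so the whole pass is expected O(n); note that a[i] * x can be a float when x isn't an integer, which deserves care for exact comparisons.

```python
from collections import Counter

# Sketch of the hash-based approach: expected O(n) time, O(n) extra space.

def has_ratio_hashed(numbers, x):
    counts = Counter(numbers)                  # O(n) to build the hash table
    for a in numbers:
        partner = a * x                        # the value b with b / a == x
        if partner != a and partner in counts:
            return True
        if partner == a and counts[a] > 1:     # e.g. x == 1 needs a duplicate of a
            return True
    return False

print(has_ratio_hashed([4, 9, 2, 1, 8, 6], 2))   # True: 8 == 4 * 2
print(has_ratio_hashed([4, 9, 2, 1, 8, 6], 10))  # False
```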

Algorithm to solve variation of k-partition

For my algorithm design class homework came this brain teaser:
Given a list of N distinct positive integers, partition the list into two
sublists of n/2 size such that the difference between sums of the sublists
is maximized.
Assume that n is even and determine the time complexity.
At first glance, the solution seems to be
sort the list via mergesort
select the element at the n/2 location
add all elements greater than it to the high array
add all elements lower than it to the low array
This would have a time complexity of O((n log n) + n) = O(n log n).
Are there any better algorithm choices for this problem?
Since you can calculate the median in O(n) time, you can also solve this problem in O(n) time: calculate the median and, using it as a threshold, build the high array and the low array.
See http://en.wikipedia.org/wiki/Median_search on calculating the median in O(n) time.
Try
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
What you're effectively doing is finding the median. The trick is that once you've found it, you never needed to sort the lower n/2 and the upper n/2 at all.
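A small sketch of that O(n) median-threshold idea (my own code; for brevity it uses a simple randomized quickselect, which is expected O(n), instead of the worst-case-linear median-of-medians algorithm from the links above):

```python
import random

# Sketch of the median-threshold partition for the maximum-difference split.
# Assumes distinct values and an even n, as in the problem statement.

def quickselect(items, k):
    """Return the k-th smallest element (0-indexed); expected O(len(items))."""
    pivot = random.choice(items)
    less = [x for x in items if x < pivot]
    greater = [x for x in items if x > pivot]
    if k < len(less):
        return quickselect(less, k)
    if k == len(less):
        return pivot
    return quickselect(greater, k - len(less) - 1)

def max_difference_partition(values):
    n = len(values)
    threshold = quickselect(values, n // 2 - 1)     # largest element of the low half
    low = [x for x in values if x <= threshold]     # the n/2 smallest values
    high = [x for x in values if x > threshold]     # the n/2 largest values
    return low, high

print(max_difference_partition([7, 1, 9, 4, 3, 8]))  # -> ([1, 4, 3], [7, 9, 8])
```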

Resources