I would like to come up with a recurrence for this given problem:
Consider a variation of the randomized quicksort algorithm where the pivot is picked randomly until the array is partitioned in such a way that both the lower subarray L and the greater subarray G contain at most 3/4 of the elements of the array. For instance, if the randomly chosen pivot partitions the array in such a way that L contains 1/10 of the elements (so G contains roughly 9/10, more than 3/4), then another pivot is randomly chosen. Analyze the expected running time of this algorithm.
At first I treated this question as if it were a regular quicksort question and came up with this recurrence:
T(n) = T(3n/4) + T(n/4) + Θ(n) (where the Θ(n) comes from partitioning)
It would make sense if we had an algorithm where the split is always 1/4 : 3/4. But we are using random pivoting here, and the pivot changes every time the condition on the partition is not satisfied. I know that the worst-case running time for randomized quicksort is still O(n^2), but I think under these circumstances the worst case is different now (something worse than O(n^2)). Am I on the right track so far?
The time complexity of quicksort will never go beyond O(n^2) unless the logic you use to choose the pivot itself takes more than O(n) time.
A good way to choose the pivot is a random element, or simply the first or last element.
There are n/2 bad pivots. Assuming you never select the same pivot twice (if you can, the worst case is selecting a bad pivot every time, i.e. the algorithm never terminates), in the worst case you would repeat the partitioning n/2 times, at Θ(n) per attempt, which makes the partitioning phase cost Θ(n^2). The recurrence becomes
T(n) = T(n/4) + T(3n/4) + Θ(n^2)
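In expectation, though, a random pivot satisfies the condition with probability about 1/2, so only about two attempts are needed per partition and the expected running time stays O(n log n). For intuition, here is a minimal sketch of the re-pivoting scheme in Python (my own illustration, not code from the question; it assumes distinct keys, since with many duplicate keys a split this balanced may not exist):

```python
import random

def partition(a, lo, hi, p):
    """Lomuto partition around a[p]; returns the pivot's final index."""
    a[p], a[hi] = a[hi], a[p]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def repivot_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    n = hi - lo + 1
    while True:
        p = partition(a, lo, hi, random.randint(lo, hi))
        # L has p-lo elements, G has hi-p elements; retry on a bad split.
        # Each attempt costs Theta(n); a pivot is "good" with probability
        # about 1/2, so the expected number of attempts is about 2.
        if p - lo <= 3 * n / 4 and hi - p <= 3 * n / 4:
            break
    repivot_quicksort(a, lo, p - 1)
    repivot_quicksort(a, p + 1, hi)
```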
Related
I have been given this algorithm that computes the median of an array and partitions the other items around it.
It puts all the elements smaller than the median in a set A1, all those equal to it in A2, and all those bigger in A3. If A1 has more than one element, it recurses into it, and the same happens for A3. It terminates after copying the concatenation of A1, A2 and A3 back into A.
I know it’s very similar to Quickselect, but I don’t know how to proceed in order to figure out the time complexity in the worst case.
What I know is that in Quicksort the time complexity is T(n) = n - 1 + T(a) + T(n-a-1), where n - 1 is for the partition, T(a) is the recursive call on the first part and T(n-a-1) is the recursive call on the last part. In that case the worst scenario happened when the pivot was always the biggest or the smallest item in the array.
But now, since we have the median as the pivot, what could the worst case be?
You can use the median-of-medians algorithm (the "Big 5" approach, which takes medians of groups of 5), which will give you an approximate median. If you use this as your pivot in quicksort, the worst-case complexity becomes O(n log n) instead of O(n^2), since you make roughly equal divisions each time instead of the worst case where one bucket gets one element and the other gets n - 1 elements.
That O(n^2) worst case is very unlikely with random pivots, on the other hand. There is a decent amount of overhead attached to finding the pivot with the median-of-medians algorithm, so in practice it is outperformed by choosing random pivots. But if you did find the (approximate) median every time, the worst case would be O(n log n).
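As a rough illustration (my own sketch, not the asker's code), the A1/A2/A3 scheme from the question looks like this; the median lookup shown is a naive O(n log n) stand-in, and replacing it with the median-of-medians routine would make each pivot step O(n) and the whole sort worst-case O(n log n):

```python
def median_pivot_sort(a):
    if len(a) <= 1:
        return a
    # naive stand-in: a real implementation would use median of medians
    pivot = sorted(a)[len(a) // 2]
    a1 = [x for x in a if x < pivot]   # smaller than the median
    a2 = [x for x in a if x == pivot]  # equal to the median
    a3 = [x for x in a if x > pivot]   # bigger than the median
    # recurse only into parts with more than one element, then concatenate
    return median_pivot_sort(a1) + a2 + median_pivot_sort(a3)
```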
Time complexity of Quicksort when the pivot is always the 2nd smallest element in a sublist.
Is it still O(N log N)?
If I solve the recurrence equation
F(N) = F(N-2) + N
= F(N - 2(2)) + 2N - 2
= F(N - 3(2)) + 3N - (2+1)(2)
= F(N - 4(2)) + 4N - (3+2+1)(2)
...
= F(N - k(2)) + kN - k(k-1) (after k steps)
then with k = N/2 this gives O(N^2). But I doubt my answer somehow; can someone help me with the clarification, please?
To start with, the quicksort algorithm has an average-case time complexity of O(N log N), but its worst-case complexity is actually O(N^2).
The generic complexity analysis of quicksort depends not just on devising the recurrence relation, but also on the value of K in the F(N-K) term of your recurrence. Depending on whether you are calculating the best, average, or worst case, that value is estimated from the probability of having the best, average, or worst element as the pivot, respectively.
If, for instance, you want to compute the best case, you may assume that your pivot always divides the array in two (i.e. K = N/2). If computing the worst case, you may assume that your pivot is either the largest or the smallest element (i.e. K = 1). For the average case, based on the probability distribution of the indices of the elements, K = N/4 is used. Basically, for the average case, your recurrence relation becomes F(N) = F(N/4) + F(3N/4) + N, which yields O(N log N).
Now, the value you assumed for K, namely 2, is just one shy of the worst-case scenario. That is why you do not observe the average-case performance of O(N log N) here, and instead get O(N^2).
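A quick numeric check of the two recurrences (a toy sketch of my own; the integer divisions make the average-case recurrence approximate):

```python
from functools import lru_cache

def f_fixed_offset(n):
    # F(N) = F(N-2) + N: pivot is always the 2nd smallest element
    total = 0
    while n > 1:
        total += n
        n -= 2
    return total

@lru_cache(maxsize=None)
def f_balanced(n):
    # F(N) = F(N/4) + F(3N/4) + N: average-case style split
    if n <= 1:
        return 0
    return f_balanced(n // 4) + f_balanced(3 * n // 4) + n

for n in (1000, 2000, 4000):
    print(n, f_fixed_offset(n), f_balanced(n))
# f_fixed_offset roughly quadruples each time n doubles (Theta(N^2)),
# while f_balanced grows only slightly faster than n (Theta(N log N)).
```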
Will the number of comparisons differ when we take the last element as the pivot in quicksort versus when we take the first element as the pivot?
No, it will not. In quicksort, we choose a pivot element (say x), then divide the list into two parts: elements larger than x and elements smaller than x.
The number of comparisons is therefore governed by the recursion depth: the deeper the recursion goes, the more comparisons must be made to divide the lists into two parts.
The recursion depth differs depending on the pivot's value, not its position: the more evenly x divides the list into similar-length parts, the smaller the recursion depth.
Therefore, the conclusion is that it does not matter whether you choose the first or the last element as the pivot; what matters is whether that value divides the list into two lists of similar length. A small instrumented sketch after the edit below makes this concrete.
Edit
The closer the pivot's value is to the median, the lower the complexity (approaching O(n log n)). The closer the pivot is to the max or min of the list, the more the complexity increases (up to O(n^2)).
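Here is the sketch (my own illustration; `pick_pivot` is a name I made up). It counts comparisons for first-element versus last-element pivots:

```python
import random

def quicksort_count(a, pick_pivot):
    """Return the number of element comparisons made while sorting a copy
    of a; pick_pivot(lo, hi) chooses the pivot index."""
    a = list(a)
    comparisons = 0

    def sort(lo, hi):
        nonlocal comparisons
        if lo >= hi:
            return
        # move the chosen pivot to the end, then Lomuto partition
        p = pick_pivot(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return comparisons

data = [random.random() for _ in range(1000)]
print(quicksort_count(data, lambda lo, hi: lo))  # first element as pivot
print(quicksort_count(data, lambda lo, hi: hi))  # last element as pivot
# On random data both counts come out similar; on an already sorted input
# either choice degrades toward the ~n^2/2 worst case.
```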
When the first or last element is chosen as the pivot, the number of comparisons stays the same, but the worst case occurs when the array is already sorted or reverse sorted.
In that worst case, every step divides the numbers as per the following recurrence:
T(n) = T(n-1) + O(n), and solving this relation gives a complexity of Θ(n^2).
And when you choose the median element as the pivot, it gives the recurrence relationship
T(n) = 2T(n/2) + Θ(n), which is the best case, as it gives a complexity of O(n log n).
I know the worst case of the algorithm, which is when the elements are already sorted or when all the elements are the same, but I want to know the point at which the algorithm moves from a complexity of n log n to n^2.
It depends on how we choose the pivot.
One view says the worst case occurs when all the elements are already sorted. Well, that is not 100% right. In this condition, the complexity becomes N^2 only if we choose the first element as the pivot.
Since we then have
T(N) = T(N-1) + cN (N > 1), unrolling gives
T(N) = cN + c(N-1) + ... + 2c + T(1), which is O(N^2).
As mentioned above, it depends on how we choose the pivot. Although some textbooks mainly choose the first element as the pivot, that is not recommended.
One popular method is median-of-three partitioning: choose the median value of a[left], a[right] and a[(left+right)/2].
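A sketch of that rule (the helper name is mine):

```python
def median_of_three(a, left, right):
    """Index of the median of a[left], a[mid], a[right] (the rule above)."""
    mid = (left + right) // 2
    # sort the three candidate indices by the values they point at;
    # the middle one is the median of the three
    return sorted((left, mid, right), key=lambda i: a[i])[1]

# usage: p = median_of_three(a, lo, hi), then partition around a[p]
```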
Quicksort will perform worst, i.e. O(n^2), in the following cases:
If the list is already sorted and the pivot is the first element.
If the list is sorted in reverse order and the pivot is the last element.
If all elements in the list are the same. In this case the pivot selection does not matter.
Note: "already sorted" cannot be the worst case if the pivot is selected as the median.
The worst case time for quick sort occurs when the chosen pivot does not divide the array. For example, if we choose the first element as the pivot every time and the array is already sorted, then the array is not divided at all. Hence the complexity is O(n^2).
To avoid this we randomize the index for the pivot. Assuming that the pivot splits the array in two equal sized parts we have a complexity of O(n log n).
For an exact analysis, see the Formal analysis section at https://en.wikipedia.org/wiki/Quicksort
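A minimal sketch of that randomized pivot choice (my own illustration, using a Lomuto-style partition):

```python
import random

def randomized_partition(a, lo, hi):
    """Swap a uniformly random element into the pivot slot, then partition."""
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i  # a sorted input is no longer a systematic worst case
```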
Given the list of numbers:
2 5 1 8 4 10 6 3 7 9 0
I understand the actual implementation of quicksort, but a question on my homework that I didn't understand was:
What is the optimal choice of pivot, and why?
I had assumed when reading this that the obvious choice for a pivot would be the 5 or 6, since they are in the middle of the list. I figured quicksort would work either way, though, since we choose a new pivot every time, which makes the follow-up question make a little more sense. But does anyone have a formal definition?
Why is an optimal pivot not practical?
The optimal pivot is the median of the set you're currently working on, because it will split the set into two equal-sized subsets, which guarantees O(n log n) performance. The reason it's not practical is the cost of finding the actual median. You essentially have to sort the data to find the median, so it's like the book Catch-22: "How do I sort the data?" "Find the median." "How do I find the median?" "Sort the data."
The optimal pivot is in the middle, because when you move it to the left or to the right (or take the biggest or smallest item), you increase the depth of recursion. In the worst case you get O(n^2) instead of the O(n log2(n)) you get when taking the middle.
The optimal pivot must be the median of the numbers, because then the subproblem sizes are exactly half of the original. The time complexity is then:
T(N) = 2T(N/2) + O(N)
which evaluates to
T(N) = O(N log N)
Whereas if the pivot ends up at one end of the array after partitioning (i.e. it is the smallest or largest element), then:
T(N) = T(N-1) + O(N)
T(N) = O(N^2)
which is as bad as bubble sort.
The reason that always using the median as the pivot is not practical is that the algorithms that find it in O(N) are very complex; you can always find it in O(N log N), but that means sorting again, which is the very problem we are solving. Here is an example of an algorithm that evaluates the median in O(N):
Median of Medians
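A compact sketch of that algorithm (groups of five; my own illustration, not tuned code):

```python
def median_of_medians(a, k=None):
    """Deterministic selection (median of medians, groups of 5).
    Returns the k-th smallest element of a in O(n) worst case;
    by default k picks the (lower) median."""
    if k is None:
        k = (len(a) - 1) // 2
    if len(a) <= 5:
        return sorted(a)[k]
    # 1. take the median of each group of 5 elements
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    # 2. pivot = median of those medians (recursive call)
    pivot = median_of_medians(medians)
    # 3. partition around the pivot
    lower = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    higher = [x for x in a if x > pivot]
    # 4. recurse only into the side that contains the k-th smallest
    if k < len(lower):
        return median_of_medians(lower, k)
    if k < len(lower) + len(equal):
        return pivot
    return median_of_medians(higher, k - len(lower) - len(equal))
```

On the list 2 5 1 8 4 10 6 3 7 9 0 from the question above, it returns 5, the true median.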