Number of comparisons in quick sort variation - algorithm

Will the number of comparisons differ between taking the last element as the pivot and taking the first element as the pivot in quicksort?

No, it will not. In quicksort, we choose a pivot element (say x) and then divide the list into two parts: elements larger than x and elements less than x.
The number of comparisons is therefore roughly proportional to the recursion depth: the deeper the recursion goes, the more comparisons are needed to split the sublists into two parts.
What differs is the recursion depth: the more evenly x splits the list into parts of similar length, the smaller the recursion depth.
So the conclusion is that it doesn't matter whether you choose the first or the last element as the pivot; what matters is whether that value divides the list into two parts of similar length.
Edit
The closer the pivot is to the median, the lower the complexity (O(n log n)). The closer the pivot is to the maximum or minimum of the list, the higher the complexity (up to O(n^2)).

When the first or last element is chosen as the pivot, the number of comparisons stays the same, but this becomes the worst case when the array is already sorted or reverse sorted.
In every step, the elements are divided as per the following recurrence:
T(n) = T(n-1) + O(n), and if you solve this relation it gives a complexity of Θ(n^2).
And when you choose the median element as the pivot, it gives the recurrence
T(n) = 2T(n/2) + Θ(n), which is the best case, as it gives a complexity of `n log n`.
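To see the two recurrences in practice, here is a minimal sketch (illustrative, not from the answer) that counts comparisons for a first-element pivot versus a middle-element pivot on an already sorted input, assuming a simple Lomuto-style partition:

```python
# Sketch: count quicksort comparisons for two pivot rules on a sorted input.
# Names and the Lomuto-style partition are illustrative assumptions.

def quicksort_count(a, choose_pivot):
    """Sort a copy of `a`, returning the number of element comparisons made."""
    a = list(a)
    comparisons = 0

    def sort(lo, hi):                      # sort a[lo..hi] in place
        nonlocal comparisons
        if lo >= hi:
            return
        p = choose_pivot(a, lo, hi)        # index of the chosen pivot
        a[p], a[hi] = a[hi], a[p]          # move pivot to the end (Lomuto)
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return comparisons

first = lambda a, lo, hi: lo               # first element as pivot
middle = lambda a, lo, hi: (lo + hi) // 2  # midpoint, which is the median on sorted input

data = list(range(200))                    # already sorted: worst case for the first-element pivot
print(quicksort_count(data, first))        # grows like n^2/2, matching T(n) = T(n-1) + O(n)
print(quicksort_count(data, middle))       # grows like n log n, matching T(n) = 2T(n/2) + Θ(n)
```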

Related

quick sort time and space complexity?

Quicksort is an in-place algorithm that does not use any auxiliary array, so why is its memory complexity O(n log(n))?
Similarly, I understand that its worst-case time complexity is O(n^2), but I don't get why the average-case time complexity is O(n log(n)). Basically, I am not sure what we mean when we say average-case complexity.
To your second point, an excerpt from Wikipedia:
The most unbalanced partition occurs when the partitioning routine returns one of the sublists of size n − 1. This may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.
If this happens repeatedly in every partition, then each recursive call processes a list of size one less than the previous list. Consequently, we can make n − 1 nested calls before we reach a list of size 1. This means that the call tree is a linear chain of n − 1 nested calls. The ith call does O(n − i) work to do the partition, and ∑_{i=0}^{n} (n − i) = O(n²), so in that case quicksort takes O(n²) time.
Because you usually don't know exactly which numbers you have to sort, and you don't know which pivot element you will end up choosing, there is a good chance that your pivot element isn't the smallest or largest number in the array. If you have an array of n distinct numbers, the chance that you don't hit the worst case is (n - 2) / n.
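A quick Monte Carlo sketch of that probability claim (illustrative names and parameters, not from the answer):

```python
# For n distinct values, a uniformly random pivot is neither the minimum nor
# the maximum with probability (n - 2) / n. Parameters below are arbitrary.
import random

n = 20
values = random.sample(range(1000), n)     # n distinct numbers
trials = 100_000
good = sum(
    1 for _ in range(trials)
    if min(values) < random.choice(values) < max(values)
)
print(good / trials)        # empirically close to (n - 2) / n
print((n - 2) / n)          # = 0.9 for n = 20
```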

Binary search with Random element

I know that binary search has a time complexity of O(log n) for searching an element in a sorted array. But let's say that instead of selecting the middle element we select a random element: how would that impact the time complexity? Will it still be O(log n), or will it be something else?
For example :
A traditional binary search in an array of size 18 will go down like 18 -> 9 -> 4 ...
My modified binary search pings a random element and decides to remove the right part or left part based on the value.
My attempt:
Let C(N) be the average number of comparisons required by a search among N elements. For simplicity, we assume that the algorithm only terminates when there is a single element left (no early termination on strict equality with the key).
As the pivot value is chosen at random, the probabilities of the remaining sizes are uniform and we can write the recurrence
C(N) = 1 + (1/N) * Σ_{1 <= i <= N} C(i)
Then
N*C(N) - (N-1)*C(N-1) = 1 + C(N)
and
C(N) - C(N-1) = 1 / (N-1)
The solution of this recurrence is the Harmonic series, hence the behavior is indeed logarithmic.
C(N) ~ ln(N-1) + γ
Note that this is the natural logarithm, which is better than the base-2 logarithm by a factor of 1.44!
My bet is that adding the early-termination test would further improve the log base (and keep the logarithmic behavior), but at the same time double the number of comparisons, so that globally it would be worse in terms of comparisons.
Let us assume we have an array of size 18 and the number I am looking for is in the 1st spot. In the worst case, I always randomly pick the highest number (18 -> 17 -> 16 ...), effectively eliminating only one element in every iteration. So it becomes a linear search: O(n) time.
The recursion in the answer of @Yves Daoust relies on the assumption that the target element is located either at the beginning or at the end of the array. In general, the position of the element within the array changes after each recursive call, which makes the recursion difficult to write and solve. Here is another solution that proves an O(log n) bound on the expected number of recursive calls.
Let T be the (random) number of elements checked by the randomized version of binary search. We can write T = Σ_i I{element i is checked}, where the sum runs over i from 1 to n and I{element i is checked} is an indicator variable. Our goal is to asymptotically bound E[T] = Σ_i Pr{element i is checked}.
For the algorithm to check element i, that element must be selected uniformly at random from an array of size at least |j - i| + 1, where j is the index of the element we are searching for. This is because smaller arrays simply cannot contain the element at index i, while the element at index j is always contained in the array during each recursive call. Thus, the probability that the algorithm checks the element at index i is at most 1/(|j - i| + 1). In fact, with a bit more effort one can show that this probability is exactly 1/(|j - i| + 1). Thus, we have
E[T] = Σ_i Pr{element i is checked} <= Σ_i 1/(|j - i| + 1) = O(log n),
where the last equality follows from the bound on the partial sums of the harmonic series.
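For concreteness, here is a small sketch of the randomized variant (illustrative names, not from either answer). Note that it does terminate early on equality, so it matches the indicator-variable analysis above rather than the no-early-termination recurrence:

```python
import random

def random_binary_search(a, key):
    """Search sorted list `a` for `key`; return (index, number of elements checked)."""
    lo, hi = 0, len(a) - 1
    checked = 0
    while lo <= hi:
        mid = random.randint(lo, hi)     # random probe instead of (lo + hi) // 2
        checked += 1
        if a[mid] == key:
            return mid, checked
        elif a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, checked

# Empirical average for a middle target, to compare with the bound
# Σ_i 1/(|j - i| + 1), which works out to roughly 12-13 for n = 1000, j = 500.
a = list(range(1000))
j = 500
runs = [random_binary_search(a, j)[1] for _ in range(10_000)]
print(sum(runs) / len(runs))
```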

What is the cut point where quicksort transforms from n log n to n^2?

I know the worst case of the algorithm, which is when the elements are already sorted or when all the elements are the same, but I want to know the point at which the algorithm moves from a complexity of n log n to n^2.
It depends on how we choose the pivot.
One view says it is when all the elements are already sorted. Well, that is not 100% right. In that condition, if we choose the first element as the pivot, the complexity becomes N^2.
Since we have,
T(N) = T(N-1) + cN (N > 1), and if you are good at basic math, then:
T(N) = O(N^2)
As mentioned above, it depends on how we choose the pivot. Although some textbooks mainly choose the first element as the pivot, that is not recommended.
One popular method is median-of-three partitioning: choose the median of a[left], a[right], and a[(left+right)/2] as the pivot.
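A tiny sketch of what median-of-three pivot selection might look like (illustrative names, not from the answer):

```python
def median_of_three(a, left, right):
    """Return the index of the median of a[left], a[mid] and a[right]."""
    mid = (left + right) // 2
    candidates = [(a[left], left), (a[mid], mid), (a[right], right)]
    candidates.sort()                 # three elements, constant work
    return candidates[1][1]           # index of the middle value

a = [9, 1, 8, 2, 7, 3, 6]
print(median_of_three(a, 0, len(a) - 1))   # prints 6: the median of a[0]=9, a[3]=2, a[6]=6 is 6
```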
It will perform worst, i.e. O(n^2), in the following cases:
If the list is already sorted and the pivot is the first element.
If the list is sorted in reverse order and the pivot is the last element.
If all elements in the list are the same. In this case pivot selection does not matter.
Note: an already sorted list cannot be the worst case if the pivot is selected as the median.
The worst case for quicksort occurs when the chosen pivot does not divide the array. For example, if we choose the first element as the pivot every time and the array is already sorted, then the array is not divided at all, and the complexity is O(n^2).
To avoid this, we randomize the index of the pivot. Assuming the pivot splits the array into two equal-sized parts, we get a complexity of O(n log n).
For an exact analysis, see the Formal analysis section at https://en.wikipedia.org/wiki/Quicksort
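As a concrete illustration of the randomized-pivot idea, here is a sketch assuming a Lomuto-style partition (the names are illustrative, not taken from the answer):

```python
# Pick a random index and swap it into the position the partition routine
# expects (here, the end of the range).
import random

def partition_random_pivot(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a randomly chosen pivot; returns its final index."""
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]          # pivot now sits at a[hi]
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition_random_pivot(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

data = list(range(100, 0, -1))         # reverse sorted: bad for a fixed first/last pivot
quicksort(data)
print(data[:5])                        # [1, 2, 3, 4, 5]
```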

Find minimum of n Elements with log(n) comparisons per element

I need to write an algorithm that finds the minimum of n elements with:
n-1 comparisons.
Performing only log(n) comparisons per element.
I thought of the selection search algorithm, but I think it compares each element more than log(n) times.
Any ideas?
Thanks!
You can think of a selection process as a tournament:
The first element is compared to the second, the third to the fourth and so on.
The winner of a comparison is the smaller element.
All the winners participate in the next round, in the same manner, until one element remains. The remaining element is the smallest of all.
Pseudocode
I'll give the recursive solution, but you can implement it iteratively also.
smallestElement(A[1...n]):
    if size(A) == 1:
        return A[1]
    else:
        return min(smallestElement(A[1...n/2]), smallestElement(A[n/2 + 1...n]))
The recursion has depth log n because at every level we divide the size of the input by 2, so the winner of the tournament participates in log n comparisons, and no element participates in more comparisons than that.
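A runnable version of this tournament (illustrative names, not from the answer) that also counts how many comparisons each element takes part in:

```python
# Tournament minimum with a per-element comparison counter, to check the
# claims: n - 1 comparisons in total, at most ceil(log2(n)) per element.
def tournament_min(a):
    counts = {i: 0 for i in range(len(a))}          # comparisons per original index
    round_ = [(a[i], i) for i in range(len(a))]     # (value, original index) pairs
    while len(round_) > 1:
        nxt = []
        for k in range(0, len(round_) - 1, 2):
            (v1, i1), (v2, i2) = round_[k], round_[k + 1]
            counts[i1] += 1
            counts[i2] += 1
            nxt.append((v1, i1) if v1 < v2 else (v2, i2))   # smaller value wins
        if len(round_) % 2 == 1:
            nxt.append(round_[-1])                  # odd one out advances for free
        round_ = nxt
    return round_[0][0], counts

a = [7, 3, 9, 1, 4, 8, 2, 5]
minimum, counts = tournament_min(a)
print(minimum)                        # 1
print(max(counts.values()))           # 3 = ceil(log2(8)): the winner's comparison count
print(sum(counts.values()) // 2)      # 7 = n - 1 total comparisons
```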

Prove that the running time of quick sort after modification = O(Nk)

This is a homework question, and I'm not that good at finding complexity, but I'm trying my best!
Three-way partitioning is a modification of quicksort that partitions elements into groups smaller than, equal to, and larger than the pivot. Only the groups of smaller and larger elements need to be recursively sorted. Show that if there are N items but only k unique values (in other words there are many duplicates), then the running time of this modification to quicksort is O(Nk).
my try:
For the average case:
the three subranges will be at these indices:
I assume that the subrange holding the duplicated items has (n-k) elements
first: from 0 to (i-1)
second: from i to (i+(n-k-1))
third: from (i+n-k) to (n-1)
number of comparisons = (n-k) - 1
So,
T(n) = (n-k) - 1 + Σ_{i=0}^{n-k-1} [ T(i) + T(i-k) ]
then I'm not sure how I'm gonna continue :S
It might be a very bad start though :$
Hope to find some help.
First of all, you shouldn't look at the average case since the upper bound of O(nk) can be proved for the worst case, which is a stronger statement.
You should look at the maximum possible depth of recursion. In normal quicksort, the maximum depth is n. For each level, the total number of operations done is O(n), which gives O(n^2) total in the worst case.
Here, it's not hard to prove that the maximum possible depth is k (since one unique value will be removed at each level), which leads to O(nk) total.
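A hedged sketch of what such a three-way partition might look like, using a Dutch-national-flag style scheme (the names and the exact partitioning details are illustrative, not taken from the question):

```python
# Three-way quicksort: partition into <, ==, > groups and recurse only on the
# < and > groups. With k distinct values, each recursion path removes at least
# one distinct value, so the depth is at most k and the total work is O(Nk).
def quicksort3(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo, hi       # a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort3(a, lo, lt - 1)    # elements < pivot
    quicksort3(a, gt + 1, hi)    # elements > pivot; the == block is never touched again

data = [3, 1, 2, 3, 1, 2, 3, 1, 2] * 10   # N = 90 items, only k = 3 distinct values
quicksort3(data)
print(data == sorted(data))               # True
```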
I don't have a formal education in complexity, but if you think about it as a mathematical problem, you can reason it out like a mathematical proof.
For all sorting algorithms, the best case is always at least O(n) for n elements, because to sort n elements you have to consider each one at least once. Now, for this particular optimisation of quicksort, the problem has been simplified because you are effectively only sorting the unique values: all the values equal to the pivot are already considered sorted, and by its nature quicksort guarantees that every unique value will feature as the pivot at some point in the operation, so duplicates are eliminated.
This means that for a list of size N, quicksort must perform some operation N times (once for every position in the list), and because it is trying to sort the list, that operation is finding the position of the value relative to the others. But because you are effectively dealing with just the unique values, and there are k of those, the algorithm performs at most k comparisons for each element. So it performs Nk operations for an N-sized list with k unique elements.
To summarise:
This algorithm eliminates checking against duplicate values.
But every sorting algorithm must look at every value in the list at least once: N operations.
For every value in the list, the operation is to find its position relative to the other values in the list.
Because duplicates are effectively removed, this leaves only k values to check against.
Hence O(Nk).
