Time Complexity to Sort a K Sorted Array Using Quicksort Algorithm

Problem:
I have to analyze the time complexity of sorting (using Quick sort) a list of integer values which is almost sorted.
What I have done?
I have read SO Q1, SO Q2, SO Q3 and this one.
However, I have not found anything that explicitly states the time complexity of sorting a k-sorted array using Quick sort.
Since the time complexity of Quick sort depends on the pivot-selection strategy, and almost-sorted data makes the worst case likely with a naive pivot, I have used the median of three values (first, middle, last) as the pivot to avoid the worst case, as described here.
What do I think?
Since the average-case time complexity of Quick sort is O(n log(n)), and as mentioned here, "For any non trivial value of n, a divide and conquer algorithm will need many O(n) passes, even if the array be almost completely sorted",
I think the time complexity of sorting a k-sorted array using Quick sort is O(n log(n)), provided the worst case does not occur.
My Question:
Am I right that the time complexity of sorting a k-sorted array using Quick sort is O(n log(n)), provided I avoid the worst case by selecting a proper pivot and the worst case does in fact not occur?

When you say "the time complexity of Quick Sort", the worst case is assumed by default, and that is O(n^2). If you use another pivot strategy, such as randomized Quick Sort, the worst-case time complexity is still O(n^2); what improves is the expected time complexity, which becomes O(n log(n)), because the worst case is then highly unlikely to occur. So unless you can somehow prove that the worst case is 100% guaranteed not to happen, you cannot claim a bound better than O(n^2): by default, the worst case is what is quoted, no matter how unlikely.
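As an illustration of the strategy the question describes, here is a minimal median-of-three Quicksort sketch in Python (function name and structure are mine; Lomuto-style partition for brevity). It avoids the degenerate splits that a first- or last-element pivot produces on nearly sorted input, but it does not by itself guarantee that the worst case cannot occur.

```python
def quicksort_mo3(a, lo=0, hi=None):
    """Quicksort with a median-of-three (first, middle, last) pivot.
    On nearly sorted input this avoids the degenerate splits that a
    first- or last-element pivot produces, but it does not guarantee
    O(n log(n)): worst-case inputs for this pivot rule still exist."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    # Sort a[lo], a[mid], a[hi] in place so the median sits at a[mid].
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    # Stash the median next to the end, then Lomuto-partition on it.
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot, i = a[hi - 1], lo
    for j in range(lo, hi - 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    quicksort_mo3(a, lo, i - 1)
    quicksort_mo3(a, i + 1, hi)
```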

Related

Time complexity while choosing various pivots in quick sort for sorted, reverse sorted, and repeated elements array

I have to solve the following question in an assignment:
Calculate the time and space complexities of the Quick Sort for following input. Also, discuss the method of calculating the complexity.
(a) When input array is already sorted.
(b) When input array is reverse sorted.
(c) When all the elements in the input array are the same.
I am having trouble calculating the time complexities of the different cases. The following table shows pivot choice vs. input case, with the cells in bold being the ones where I have doubts.
TIME        First       Middle       Last
Sorted      O(n^2)      O(n log n)   O(n^2)
Same        O(n^2)      O(n^2)       O(n^2)
Reverse     O(n^2)      O(n log n)   O(n^2)
Are these right? If not, what am I doing wrong?
When all elements are the same, the classic Lomuto partition scheme has worst-case O(n^2) complexity, while the Hoare partition scheme has best-case O(n log(n)) complexity with the middle value as the pivot.
https://en.wikipedia.org/wiki/Quicksort#Repeated_elements
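To make that concrete, here is a minimal Python sketch of the two partition schemes (function names are mine). On all-equal input, Lomuto's `a[j] < pivot` test never fires, so the pivot always lands at one end and every split has sizes 0 and n-1; Hoare's two scanning pointers both stop on equal keys and meet near the middle, giving balanced splits.

```python
def lomuto_partition(a, lo, hi):
    """Lomuto scheme: pivot is a[hi]; returns the pivot's final index."""
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:            # never true when all elements are equal
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # on equal keys the pivot lands at lo: 0 / n-1 split
    return i

def hoare_partition(a, lo, hi):
    """Hoare scheme: pivot is the middle value; returns a split point j
    such that a[lo..j] and a[j+1..hi] are recursed on separately."""
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:                  # pointers meet near the middle on equal keys
            return j
        a[i], a[j] = a[j], a[i]

def quicksort_hoare(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort_hoare(a, lo, p)
        quicksort_hoare(a, p + 1, hi)
```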

Sorting algorithm with quadratic and linear runtime

Is it possible to design a sorting algorithm whose worst-case running time is quadratic, O(n^2), but whose running time is linear, O(n), in most cases (that is, on more than half of the size-n inputs)?
I was thinking about Radix Sort and just making the worst case worse, but I do not know if that is possible.
Yes: Bucket Sort.
https://www.geeksforgeeks.org/bucket-sort-2/?ref=lbp
You can read about the algorithm at the link:
In the worst case, we could have a bucket which contains all n values of the array. Since insertion sort has worst case running time O(n^2), so does Bucket sort. We can avoid this by using merge sort to sort each bucket instead, which has worst case running time O(n lg n).
Yes, it is possible.
Bucket sort analysis does reveal such behaviour (with a reasonable number of buckets).
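For illustration, a minimal bucket sort sketch in Python (my naming; it assumes the keys are floats roughly uniform in [0, 1), the standard setting for the linear expected-time analysis):

```python
def bucket_sort(a, num_buckets=None):
    """Bucket sort for floats assumed roughly uniform in [0, 1).
    With n buckets and uniform input the expected total cost of sorting
    the buckets is O(n); if every value lands in one bucket, the
    per-bucket insertion sort degrades to O(n^2)."""
    n = len(a)
    if n == 0:
        return a
    k = num_buckets or n
    buckets = [[] for _ in range(k)]
    for x in a:
        buckets[min(int(x * k), k - 1)].append(x)  # scatter by key range
    out = []
    for b in buckets:
        # Insertion sort each (expected tiny) bucket.
        for i in range(1, len(b)):
            key, j = b[i], i - 1
            while j >= 0 and b[j] > key:
                b[j + 1] = b[j]
                j -= 1
            b[j + 1] = key
        out.extend(b)
    return out
```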

What is the appropriate data structure for insertion sort?

I revisited insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, since upon each insertion one has to shift all subsequent elements, which I make out to be O(n^2 log(n)). However, a linked list is also not good here, since we would prefer to find the right position using binary search, which isn't possible in a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what is a data structure on which this sorting algorithm delivers its premise of O(n log(n)) complexity?
From where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience: the insertion sort includes components that are O(n) for each of the n elements.
Also, I believe your claim of O(n^2 log n) is incorrect. The binary search is log n, and the ensuing "move sideways" is n, but these two steps happen in succession, not nested, so the per-element cost is n + log n, not a product. The result is the expected O(n^2).
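To see why the two steps add rather than multiply, here is a minimal binary insertion sort sketch in Python (`bisect` is the standard-library binary search):

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort on an array, using binary search to find each
    insertion point. The search is O(log i) per element, but shifting
    the tail to make room is still O(i); the steps are sequential,
    so the total is O(n^2), not O(n^2 log n)."""
    for i in range(1, len(a)):
        key = a[i]
        pos = bisect.bisect_right(a, key, 0, i)  # O(log i) search
        a[pos + 1:i + 1] = a[pos:i]              # O(i) shift
        a[pos] = key
    return a
```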
If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However this is not as efficient as a wide variety of other sorts that are widely implemented. So this knowledge is only of theoretical interest.
Insertion sort is defined over an array or a list; if you use some other data structure, it becomes a different algorithm.
Of course, if you use a BST, insertion and search would be O(log(n)) and your overall complexity would be O(n log(n)) on average (keep in mind that it is O(n^2) in the worst case), but this is no longer an insertion sort but a tree sort. If you use an AVL tree, then you get the O(n log(n)) worst-case complexity.
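A minimal sketch of that tree-sort idea (my own Python; unbalanced BST, so the worst case is O(n^2) as noted, unless a balanced tree such as AVL is used instead):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def tree_sort(a):
    """Insert every element into an (unbalanced) BST, then read it back
    with an in-order traversal. Average O(n log n); degenerates to
    O(n^2) on already-sorted input because the tree becomes a chain."""
    root = None
    for x in a:
        root = _insert(root, x)
    out = []
    _inorder(root, out)
    return out

def _insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = _insert(node.left, key)
    else:
        node.right = _insert(node.right, key)
    return node

def _inorder(node, out):
    if node is not None:
        _inorder(node.left, out)
        out.append(node.key)
        _inorder(node.right, out)
```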
In insertion sort the best case is when the sequence is already sorted, which takes linear time; the worst case takes O(n^2) time. I do not know how you got the logarithmic part in the complexity.

Using median selection in quicksort?

I have a slight question about Quicksort. In the case where the minimum or maximum value of the array is selected as the pivot, the partition is very inefficient, as the subproblem size decreases by only 1.
However, if I add code to select the median of the array as the pivot, I think it will be more efficient: since the partition step is already O(N), this would give an O(N log N) algorithm.
Can this be done?
You absolutely can use a linear-time median selection algorithm to compute the pivot in quicksort. This gives you a worst-case O(n log n) sorting algorithm.
However, the constant factor on linear-time selection tends to be so high that the resulting algorithm will, in practice, be much, much slower than a quicksort that just randomly chooses the pivot on each iteration. Therefore, it's not common to see such an implementation.
A completely different approach to avoiding the O(n^2) worst case is to use an approach like the one in introsort. This algorithm monitors the recursion depth of the quicksort. If it appears that the algorithm is starting to degenerate, it switches to a different sorting algorithm (usually heapsort) with a guaranteed worst-case O(n log n). This makes the overall algorithm O(n log n) without noticeably decreasing performance.
Hope this helps!
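A minimal sketch of the introsort idea described above (my own Python, not in place for brevity; real implementations partition in place):

```python
import heapq
from math import floor, log2

def introsort(a):
    """Quicksort with a recursion-depth budget of ~2*log2(n); once the
    budget is exhausted (a sign of degenerate pivots), the current
    subproblem is handed to heapsort, capping the worst case at
    O(n log n)."""
    def heapsort(xs):
        h = list(xs)
        heapq.heapify(h)
        return [heapq.heappop(h) for _ in range(len(h))]

    def sort(xs, depth):
        if len(xs) <= 1:
            return xs
        if depth == 0:
            return heapsort(xs)          # degenerating: switch algorithms
        pivot = xs[len(xs) // 2]
        less = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        greater = [x for x in xs if x > pivot]
        return sort(less, depth - 1) + equal + sort(greater, depth - 1)

    return sort(a, 2 * floor(log2(len(a))) if a else 0)
```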

On the efficiency of tries and radix sort

Radix sort's time complexity is O(kn), where n is the number of keys to be sorted and k is the key length. Similarly, the time complexity of the insert, delete, and lookup operations in a trie is O(k). However, assuming all elements are distinct, isn't k >= log(n)? If so, that would mean radix sort's asymptotic time complexity is O(n log n), equal to that of quicksort, and trie operations have a time complexity of O(log n), equal to that of a balanced binary search tree. Of course, the constant factors may differ significantly, but the asymptotic time complexities won't. Is this true, and if so, do radix sort and tries have other advantages over other algorithms and data structures?
Edit:
Quicksort and its competitors perform O(n log n) comparisons; in the worst case each comparison takes O(k) time (keys differ only at the last digit checked). Therefore, those algorithms take O(kn log n) time. By the same logic, balanced binary search tree operations take O(k log n) time.
Big-O notation is not used that way. Even if k >= log n for radix sorting, O(kn) means that your processing time will double if n doubles, and so on; that is how you should read big-O notation.
One advantage of radix sort is that its worst case is O(kn) (versus quicksort's O(n^2)), so radix sort is in a sense more resistant to malicious input than quicksort. It can also be really fast in terms of real performance if you use bitwise operations, a power of 2 as the base, and an in-place MSD radix sort with insertion sort for smaller arrays.
The same argument is valid for tries: they are resistant to malicious input in the sense that insertion/search is O(k) in the worst case. Hash tables perform insertion/search in O(1), but with O(k) hashing, and in the worst case O(N) insertion/search. Also, tries can store strings more efficiently.
Check Algorithmic Complexity Attacks
The asymptotic time complexity of radix sort is O(N log N), which is also the time complexity of Quick sort. The advantage of radix sort is that its best, average, and worst case performance are the same, whereas the worst case performance of Quick sort is O(N^2). But it takes twice the space required by Quick sort. So, if space complexity is not a problem, then radix sort is the better option.
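For reference, a minimal LSD radix sort sketch in Python (my naming; non-negative integer keys, one stable counting-sort pass per base-10 digit), illustrating why every input costs the same number of passes:

```python
def radix_sort(a, base=10):
    """LSD radix sort. Each counting-sort pass is O(n + base); with k
    passes (k = key length in base-`base` digits) the total is
    O(k * (n + base)) -- the O(kn) discussed above. Every input takes
    the same number of passes, so best, average, and worst case match."""
    if not a:
        return a
    a = list(a)
    max_val, exp = max(a), 1
    while max_val // exp > 0:
        # Stable counting sort on the digit (x // exp) % base.
        counts = [0] * base
        for x in a:
            counts[(x // exp) % base] += 1
        for d in range(1, base):
            counts[d] += counts[d - 1]       # prefix sums -> end positions
        out = [0] * len(a)
        for x in reversed(a):                # reverse scan keeps it stable
            counts[(x // exp) % base] -= 1
            out[counts[(x // exp) % base]] = x
        a = out
        exp *= base
    return a
```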
