What is the appropriate data structure for insertion sort? - algorithm

I revisited the insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, as upon each insertion one has to shift all subsequent elements, giving O(n^2 log(n)). However, a linked list is also not good here, since we would preferably find the right position using binary search, which isn't possible for a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what is a data structure on which this sorting algorithm provides its premise of O(nlog(n)) complexity?

From where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience. The premises of the insertion sort include components that are O(n) for each of the n elements.
Also, I believe that your claim of O(n^2 log n) is incorrect. The binary search is log n, and the ensuing "move sideways" is n, but these two steps are in succession, not nested. The result is n + log n, not a multiplication. The result is the expected O(n^2).

If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However this is not as efficient as a wide variety of other sorts that are widely implemented. So this knowledge is only of theoretical interest.

Insertion sort is defined over an array or a list; if you use some other data structure, it becomes another algorithm.
Of course, if you use a BST, insertion and search are O(log(n)) and the overall complexity is O(n log(n)) on average (bear in mind that it is O(n^2) in the worst case), but then it is no longer insertion sort but tree sort. If you use an AVL tree, you get O(n log(n)) worst-case complexity.
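For illustration, here is a minimal tree-sort sketch in Python (an unbalanced BST, so the worst case is still O(n^2); the class and function names are just for this example):
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def bst_insert(root, key):
        # Walk down the tree to the insertion point: O(log n) on average.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = bst_insert(root.left, key)
        else:
            root.right = bst_insert(root.right, key)
        return root

    def tree_sort(values):
        root = None
        for v in values:              # n insertions
            root = bst_insert(root, v)
        out = []
        def inorder(node):            # in-order traversal yields the keys in sorted order
            if node is not None:
                inorder(node.left)
                out.append(node.key)
                inorder(node.right)
        inorder(root)
        return out

    print(tree_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]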

In insertion sort the best case is when the sequence is already sorted, which takes linear time, and the worst case takes O(n^2) time. I do not know how you got the logarithmic part in the complexity.

Related

Best runtime for n-1 comparisons?

If an algorithm must make n-1 comparisons to find a certain element, then can we assume that the best possible runtime of the algorithm is O(n)?
I know that the lower bound for sorting algorithms is n log n, but since we only return the one element that was found, I figured it would be possible to do better in terms of run time?
Thanks!
To find a certain element in an unsorted list you need O(n).
But if you sort the array (takes O(n log n) in general) you can find a certain element in O(log n).
So if you often need to find elements in the same list, it is most likely worth sorting the list first so that you can then find elements much more efficiently.
If your array is unsorted and you search for some element in it, then in the worst case a linear search makes n-1 comparisons and the time complexity is O(n).
But if you want to reduce the time complexity, first sort your array and use binary search, which takes O(log n) in the worst case.
So binary search is more efficient than linear search.
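To make the comparison concrete, here is a small Python sketch, assuming the standard bisect module for the binary search (the function names are illustrative):
    import bisect

    def linear_search(items, target):
        # Worst case: every element is inspected -> O(n), but works on unsorted data.
        for i, x in enumerate(items):
            if x == target:
                return i
        return -1

    def binary_search(sorted_items, target):
        # O(log n) probes, but the list must already be sorted.
        i = bisect.bisect_left(sorted_items, target)
        if i < len(sorted_items) and sorted_items[i] == target:
            return i
        return -1

    data = [7, 3, 9, 1, 5]
    print(linear_search(data, 9))            # O(n) on the unsorted list
    print(binary_search(sorted(data), 9))    # sort once in O(n log n), then O(log n) lookups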
For unsorted elements, worst case is when you have to go over all the elements, i.e., O(N). If you need many look-ups then you have several pre-processing alternatives that speed up all future accesses.
Option 1: put the elements in a standard hash table. Creating the hash table costs O(N), on average, and later pay O(1) on average for each lookup. This assumes that a reasonable hash-function can be created for this type of elements.
Most languages/libraries implement bucket-based hash-tables, which in pathological cases can put all elements in one bucket, costing O(N) per lookup.
Option 2: there are other hash-table implementations that don't suffer from pathological O(N) cases. The Robin Hood hashing (Wikipedia) (more at Programming.Guide) guarantees O(log N) lookup in the worst case, with average of O(1).
Option 3: another option is to sort elements in O(N log N) once, and then use binary-search to lookup in O(log N). Usually this is slower than Robin Hood hashing (Option 2).
Option 4: If the values are simple integers with limited range, with max-min around N, then it is possible to put the values in an array (list), such that array[value-min] will contain a count of how many times the value appears in the input. It costs O(N) to construct, and O(1) to lookup. Better, the constants for both preprocessing and lookup are significantly lower than in any other method.
Note: I didn't mention the O(N) counting-sort as an alternative to the general case of O(N log N) sorting (option 3), since if max(value)-min(value) is small enough for counting-sort, then option 4 is relevant and is simpler and faster.
If applicable, choose option 4; otherwise, if you are willing to invest time and code, choose option 2. If 4 isn't applicable and 2 is not worth the effort in your case, choose option 1 if you don't mind the pathological worst case (never choose option 1 when an adversary may want to harm you with a DoS attack).
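A minimal sketch of option 4 in Python, assuming integer values whose range (max - min) is around N; the names are illustrative:
    def build_count_table(values):
        lo, hi = min(values), max(values)
        counts = [0] * (hi - lo + 1)       # O(max - min) memory
        for v in values:                   # O(N) preprocessing
            counts[v - lo] += 1
        return counts, lo

    def lookup(counts, lo, value):
        # O(1): how many times does `value` appear in the input?
        idx = value - lo
        return counts[idx] if 0 <= idx < len(counts) else 0

    counts, lo = build_count_table([4, 7, 4, 9, 6])
    print(lookup(counts, lo, 4))   # 2
    print(lookup(counts, lo, 8))   # 0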
Your question has nothing to do with sorting, let alone linear search.
If you claim that n-1 comparisons are mandated, then your problem has certainly complexity Ω(n). But with that information alone, you can't guarantee O(n) because it is not said that these n-1 comparisons are sufficient, nor that the algorithm does not perform extra operations, for instance to decide which comparisons to perform. It could turn out that your algorithm is O(n³) with no chance to do better, but we can't tell.
Best case complexity: Ω(n).
Worst case complexity: unknown.

Time Complexity to Sort a K Sorted Array Using Quicksort Algorithm

Problem:
I have to analyze the time complexity to sort (using Quick sort) a list of integer values which are almost sorted.
What I have done?
I have read SO Q1, SO Q2, SO Q3 and this one.
However, I have not found anything which mentioned explicitly the time complexity to sort a k sorted array using Quick sort.
Since the time complexity of Quick sort depends on the strategy for choosing the pivot, and almost sorted data makes the worst case likely, I have used the median of three values (first, middle, last) as the pivot to avoid the worst case, as referred here.
What do I think?
Since in average case, the time complexity of Quick sort algorithm is O(n log(n)) and as mentioned here, "For any non trivial value of n, a divide and conquer algorithm will need many O(n) passes, even if the array be almost completely sorted",
I think the time complexity to sort a k sorted array using Quick sort algorithm is O(n log(n)), if the worst case does not occur.
My Question:
Am I right that the time complexity to sort a k sorted array using Quick sort is O(n log(n)), provided I try to avoid the worst case by selecting a proper pivot and the worst case does not occur?
When you say time complexity of Quick Sort, it is O(n^2), because the worst case is assumed by default. However, if you use another strategy to choose pivot, like Randomized Quick Sort, for example, your time complexity is still going to be O(n^2) by default. But the expected time complexity is O(n log(n)), since the occurrence of the worst case is highly unlikely. So if you can prove somehow that the worst case is 100% guaranteed not to happen, then you can say time complexity is less than O(n^2), otherwise, by default, the worst case is considered, no matter how unlikely.
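For reference, a minimal Python sketch of quicksort with a median-of-three pivot (first, middle, last), as described in the question; it avoids the classic already-sorted worst case, but it does not rule out O(n^2) in general:
    def quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        # Median-of-three: order a[lo], a[mid], a[hi], then use the median as the pivot.
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:
            a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]:
            a[mid], a[hi] = a[hi], a[mid]
        a[mid], a[hi] = a[hi], a[mid]      # move the median to the end
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):            # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)

    data = [3, 1, 2, 4, 6, 5, 8, 7, 9]     # an "almost sorted" input
    quicksort(data)
    print(data)                            # [1, 2, 3, 4, 5, 6, 7, 8, 9]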

Insertion Sort of O(n^2) complexity and using Binary Search on previous values to improve complexity

How would the complexity of insertion sort (O(n^2)) change if you used a binary search instead of scanning the previous values to find where to insert the current value? Also, when would this be useful?
Your new complexity is still quadratic, since you still need to shift the elements of the sorted part rightward to make room. Therefore, using binary search is only marginally better.
I would recommend a fast sorting algorithm (O(n log n) time) for large arrays; the quadratic insertion sort is suited only for small arrays.
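A minimal Python sketch of that variant, assuming the standard bisect module; the binary search trims the number of comparisons, but the shift keeps the overall cost quadratic:
    import bisect

    def binary_insertion_sort(a):
        for i in range(1, len(a)):
            key = a[i]
            # O(log i) comparisons to find where key belongs in the sorted prefix a[:i]...
            pos = bisect.bisect_right(a, key, 0, i)
            # ...but O(i) element moves to make room for it, so the total stays O(n^2).
            a[pos + 1:i + 1] = a[pos:i]
            a[pos] = key
        return a

    print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]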

O(nlogn) in-place sorting algorithm

This question was in the preparation exam for my midterm in introduction to computer science.
There exists an algorithm which can find the kth element in a list in
O(n) time, and suppose that it is in place. Using this algorithm,
write an in place sorting algorithm that runs in worst case time
O(n*log(n)), and prove that it does. Given that this algorithm exists,
why is mergesort still used?
I assume I must write some alternate form of the quicksort algorithm, which has a worst case of O(n^2), since merge sort is not an in-place algorithm. What confuses me is the given algorithm to find the kth element in a list. Isn't a simple loop iteration through the elements of an array already an O(n) algorithm?
How can the provided algorithm make any difference in the running time of the sorting algorithm if it does not change anything in the execution time? I don't see how used with either quicksort, insertion sort or selection sort, it could lower the worst case to O(nlogn). Any input is appreciated!
Check wiki, namely the "Selection by sorting" section:
Similarly, given a median-selection algorithm or general selection algorithm applied to find the median, one can use it as a pivot strategy in Quicksort, obtaining a sorting algorithm. If the selection algorithm is optimal, meaning O(n), then the resulting sorting algorithm is optimal, meaning O(n log n). The median is the best pivot for sorting, as it evenly divides the data, and thus guarantees optimal sorting, assuming the selection algorithm is optimal. A sorting analog to median of medians exists, using the pivot strategy (approximate median) in Quicksort, and similarly yields an optimal Quicksort.
The short answer why mergesort is preferred over quicksort in some cases is that it is stable (while quicksort is not).
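A deliberately simple (and, unlike the exam question, not in-place) Python sketch of the quoted idea: use deterministic median-of-medians selection to pick the exact median as the pivot, so every partition splits evenly and the recursion depth is O(log n):
    def median_of_medians(values):
        # Deterministic O(n) selection of the median (not in place; for illustration only).
        def select(vals, k):
            if len(vals) <= 5:
                return sorted(vals)[k]
            # Medians of groups of five, then the median of those medians as a pivot.
            groups = [vals[i:i + 5] for i in range(0, len(vals), 5)]
            medians = [sorted(g)[len(g) // 2] for g in groups]
            pivot = select(medians, len(medians) // 2)
            lows   = [v for v in vals if v < pivot]
            pivots = [v for v in vals if v == pivot]
            highs  = [v for v in vals if v > pivot]
            if k < len(lows):
                return select(lows, k)
            if k < len(lows) + len(pivots):
                return pivot
            return select(highs, k - len(lows) - len(pivots))
        return select(values, len(values) // 2)

    def median_pivot_quicksort(a):
        # The pivot is always the exact median, so both halves hold about n/2 elements
        # and the whole sort is O(n log n) even in the worst case.
        if len(a) <= 1:
            return a
        pivot = median_of_medians(a)
        lows  = [v for v in a if v < pivot]
        equal = [v for v in a if v == pivot]
        highs = [v for v in a if v > pivot]
        return median_pivot_quicksort(lows) + equal + median_pivot_quicksort(highs)

    print(median_pivot_quicksort([9, 3, 7, 1, 8, 2, 6, 5, 4]))   # [1, 2, ..., 9]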
Reasons for merge sort. Merge Sort is stable. Merge sort does more moves but fewer compares than quick sort. If the compare overhead is greater than move overhead, then merge sort is faster. One situation where compare overhead may be greater is sorting an array of indices or pointers to objects, like strings.
If sorting a linked list, then merge sort using an array of pointers to the first nodes of working lists is the fastest method I'm aware of. This is how HP / Microsoft std::list::sort() is implemented. In the array of pointers, array[i] is either NULL or points to a list of length pow(2,i) (except the last pointer points to a list of unlimited length).
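A rough Python sketch of that idea (this mirrors the description above, not the actual std::list::sort source; here bins[i] holds a sorted sublist of length 2^i and the bin array simply grows, instead of capping the array and letting the last slot hold an unbounded list):
    class ListNode:
        def __init__(self, val, nxt=None):
            self.val = val
            self.next = nxt

    def merge(a, b):
        # Standard stable merge of two sorted singly linked lists.
        dummy = tail = ListNode(None)
        while a and b:
            if b.val < a.val:
                tail.next, b = b, b.next
            else:
                tail.next, a = a, a.next
            tail = tail.next
        tail.next = a or b
        return dummy.next

    def list_merge_sort(head):
        bins = []                             # bins[i] is None or a sorted list of length 2**i
        while head:
            node, head = head, head.next
            node.next = None
            carry = node                      # a sorted list of length 1
            i = 0
            while i < len(bins) and bins[i] is not None:
                carry = merge(bins[i], carry) # combine two lists of equal length
                bins[i] = None
                i += 1
            if i == len(bins):
                bins.append(None)
            bins[i] = carry
        result = None
        for lst in bins:                      # final pass: merge whatever is left in the bins
            if lst is not None:
                result = lst if result is None else merge(lst, result)
        return result

    head = None
    for v in reversed([4, 1, 3, 2]):
        head = ListNode(v, head)              # builds the list 4 -> 1 -> 3 -> 2
    node = list_merge_sort(head)
    while node:
        print(node.val, end=" ")              # prints: 1 2 3 4
        node = node.next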
I found the solution. Counting operations, with the selection algorithm guaranteeing that the pivot is the median (so both recursive calls get n/2 elements):
    if (start > stop)                       2 op.
    pivot <- partition(A, start, stop)      2 op. + n
    quickSort(A, start, pivot-1)            2 op. + T(n/2)
    quickSort(A, pivot+1, stop)             2 op. + T(n/2)
This gives the recurrence
    T(n) = 8 + 2T(n/2) + n                              k = 1
         = 8 + 2(8 + 2T(n/4) + n/2) + n
         = 24 + 4T(n/4) + 2n                            k = 2
         ...
         = (2^k - 1)*8 + 2^k*T(n/2^k) + k*n
The recursion finishes when n = 2^k <=> k = log2(n), so with T(1) = 2:
    T(n) = (2^log2(n) - 1)*8 + 2^log2(n)*2 + log2(n)*n
         = 8n - 8 + 2n + n*log2(n)
         = 10n + n*log2(n) - 8
         = n*(10 + log2(n)) - 8
which is O(n log n).
Quick sort has a worst case of O(n^2), but that only occurs if you are unlucky when choosing the pivot. If you can select the kth element in O(n), that means you can choose a good pivot (the median) by doing O(n) extra steps, which yields a worst-case O(n log n) algorithm. There are a couple of reasons why mergesort is still used. First, this selection algorithm is more or less cumbersome to implement in place, and it also adds several extra operations to the regular quicksort, so in practice it is not as fast as one might expect.
Nevertheless, mergesort does not survive because of its worst-case time complexity; heapsort achieves the same worst-case bound and is also in place, yet it did not replace mergesort (though heapsort has other disadvantages compared to quicksort too). The main reason mergesort survives is that it is the fastest stable sorting algorithm known so far. There are several applications in which it is paramount to have a stable sorting algorithm, and that is the strength of mergesort.
A stable sort is one in which equal items preserve their original relative order. For example, this is very useful when you have two keys and you sort by the first key first and then by the second key: the first-key order is preserved among items with equal second keys.
The problem with heapsort compared to quicksort is that it is cache-inefficient: it swaps/compares elements that are far apart in the array, while quicksort compares consecutive elements, which are more likely to be in the cache at the same time.
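To see why stability matters, a tiny Python example with two keys, relying on the fact that Python's built-in sorted() (Timsort) is stable:
    records = [("b", 1), ("a", 2), ("a", 1), ("c", 1)]
    # First sort by the first key...
    step1 = sorted(records, key=lambda r: r[0])   # [("a",2), ("a",1), ("b",1), ("c",1)]
    # ...then stable-sort by the second key: records with equal second keys
    # keep the first-key order established in step1.
    step2 = sorted(step1, key=lambda r: r[1])     # [("a",1), ("b",1), ("c",1), ("a",2)]
    print(step2)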

Differences in efficiency between merge quick and heap sort

All of these sorting algorithms have an average case of O(n log n), so I would just like to know how I would be able to differentiate between these three sorting algorithms if I could run tests but not know which sorting algorithm was being run.
Another difference between heapsort and mergesort you may want to consider is that heapsort is not a stable sort, but mergesort is.
Here is a table (link below) where you can find (almost) any information about comparison sort algorithms you want:
https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
Heapsort is an in-place sorting algorithm: we don't need extra storage to sort the elements. Mergesort is not in place: it requires extra storage in the merge step. The worst-case running time of quicksort is O(n^2), which differentiates it from heapsort and mergesort.
There are many cases in which the performance of these algorithms differs.
For example, if all input elements are the same:
heapsort will run in O(n) time,
quicksort will run in O(n^2) time (if the last element is the pivot), and
mergesort will take O(n log n) time.
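One way to pick quicksort out in such a black-box test is to feed it an all-equal array and count comparisons; here is a sketch (the last-element-pivot Lomuto quicksort below is only an assumption about the implementation under test):
    import random

    def quicksort_last_pivot(a, lo, hi, stats):
        # Lomuto partition with the last element as the pivot.
        if lo >= hi:
            return
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            stats["comparisons"] += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        quicksort_last_pivot(a, lo, i - 1, stats)
        quicksort_last_pivot(a, i + 1, hi, stats)

    def count_comparisons(values):
        stats = {"comparisons": 0}
        a = list(values)
        quicksort_last_pivot(a, 0, len(a) - 1, stats)
        return stats["comparisons"]

    n = 500
    print("random   :", count_comparisons(random.sample(range(10 * n), n)))  # roughly n log n
    print("all equal:", count_comparisons([7] * n))                          # roughly n^2 / 2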
