Insertion sort is O(n^2): does using binary search on previous values improve the complexity?

How would insertion sort's O(n^2) complexity change if you used a binary search instead of scanning backward through previous values to find where to insert the current value? Also, when would this be useful?

The new complexity is still quadratic: binary search reduces the number of comparisons to O(n log n), but you still have to shift the sorted portion rightward to make room for each insertion, which costs O(n) per element. So using binary search is only marginally better; it helps most when comparisons are expensive relative to element moves.
I would recommend a fast sorting algorithm (in O(n log n) time) for large arrays; the quadratic insertion sort is suited only for small arrays.
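For illustration, here is a minimal sketch of the binary-search variant (often called binary insertion sort) in Python, using the standard bisect module; the comment on the shifting step marks where the quadratic cost survives.

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort that finds each insertion point with binary search.

    Comparisons drop to O(n log n) in total, but the slice shift below
    is still O(n) per element, so the running time remains O(n^2).
    """
    for i in range(1, len(a)):
        key = a[i]
        # O(log i) comparisons to find where key belongs in a[:i].
        # bisect_right keeps the sort stable for equal keys.
        pos = bisect.bisect_right(a, key, 0, i)
        # O(i) element moves: this shift dominates the running time.
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = key
    return a

print(binary_insertion_sort([5, 2, 8, 2, 9, 1]))  # [1, 2, 2, 5, 8, 9]
```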

Related

What kind of input data are the following sorting algorithms good/bad for?

What kinds of input data are the following sorting algorithms efficient or inefficient on: quicksort, mergesort, heapsort, insertion sort, etc.?
I know there are at least two factors that affect the performance of a sorting algorithm: 1) the size of the input, and 2) whether or not the data is already mostly sorted. But I don't know exactly how these factors affect the efficiency of the algorithms.
I'd like to study this in detail, so if there are any sources/links that you can point me to, that'd be great.
Assuming quicksort uses the Hoare partition scheme with the middle value as pivot, it won't degrade to its worst-case O(n^2) time complexity on almost-sorted data.
https://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Mergesort always does n ⌈log2(n)⌉ moves. If the data is already sorted, then the number of compares is about (n ⌈log2(n)⌉)/2.
Heapsort time complexity remains about the same (duplicates may reduce running time).
Insertion sort is the only sort in this list that is faster if the data is nearly sorted, but its time complexity is O(n^2). I'd expect that for nearly sorted data, the running time is about O(m n), where m is the number of elements out of place.
Variations of natural merge sort, which might use insertion sort on small runs while scanning and identifying already sorted runs, would have time complexity O(n) on already sorted data.
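As a rough illustration of that O(m n) estimate (my sketch, not from the original answer), the following counts element moves performed by plain insertion sort on sorted, nearly sorted, and reversed inputs:

```python
import random

def insertion_sort_moves(a):
    """Insertion sort that returns the number of element moves performed."""
    a = list(a)
    moves = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            moves += 1
        a[j + 1] = key
    return moves

n = 1000
sorted_input = list(range(n))
nearly_sorted = list(range(n))
for _ in range(10):                     # displace a handful of elements
    i, j = random.randrange(n), random.randrange(n)
    nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]
reversed_input = list(range(n, 0, -1))

print(insertion_sort_moves(sorted_input))    # 0 moves: best case, O(n)
print(insertion_sort_moves(nearly_sorted))   # far fewer than the reversed case
print(insertion_sort_moves(reversed_input))  # n(n-1)/2 moves: worst case, O(n^2)
```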

Quick Sort vs Insertion Sort

When building a sorting algorithm to sort an array, at how many elements n is quicksort faster than insertion sort? I know that quicksort is good for larger inputs and that insertion sort is great for small ones. But I was wondering around what size quicksort becomes a far better option than insertion sort.
These algorithms depend on more than just the size of the array to determine their running time. For quicksort, the pivot your algorithm selects can have a significant effect: if the pivot is consistently the greatest or least element, quicksort takes O(n^2).
Insertion sort is also influenced by factors besides array size. If the elements arrive already in order, it can run in O(n) regardless of array size; if they arrive in reverse order, it takes O(n^2).
Due to these factors, there is no size n at which one algorithm is guaranteed to outperform the other. If you are concerned with running times for large arrays, check out heapsort or mergesort; they are both O(n log n) and much faster. In practice the two approaches are combined, as in the hybrid sketch below.
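This is how standard libraries typically answer the "what size" question: recurse with quicksort down to a small cutoff, then let insertion sort finish. A sketch under the assumption of a median-of-three pivot and a cutoff of 16, both typical ballparks rather than universal constants:

```python
import random

CUTOFF = 16  # typical ballpark; the best value is machine- and data-dependent

def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place with insertion sort."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort with a median-of-three pivot, switching to insertion
    sort on small subarrays where its low overhead wins."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo < CUTOFF:
            insertion_sort_range(a, lo, hi)
            return
        mid = (lo + hi) // 2
        # Median-of-three reduces the chance of a degenerate pivot.
        if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:  a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
        pivot = a[mid]
        i, j = lo, hi
        while i <= j:                      # Hoare-style partition
            while a[i] < pivot: i += 1
            while a[j] > pivot: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        hybrid_quicksort(a, lo, j)         # recurse on the left part
        lo = i                             # loop on the right part

data = random.sample(range(10000), 1000)
expected = sorted(data)
hybrid_quicksort(data)
assert data == expected
```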

What is the appropriate data structure for insertion sort?

I revisited the insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, since upon insertion one has to shift all subsequent elements, giving O(n^2 log(n)). However, a linked list is also no good here, since we'd prefer to find the right position using binary search, which isn't possible on a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what data structure lets this sorting algorithm deliver its premise of O(n log(n)) complexity?
From where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience. The premises of the insertion sort include components that are O(n) for each of the n elements.
Also, I believe that your claim of O(n^2 log n) is incorrect. The binary search is log n and the ensuing "move sideways" is n, but these two steps happen in succession, not nested: the cost per element is n + log n, not their product. The result is the expected O(n^2).
If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However this is not as efficient as a wide variety of other sorts that are widely implemented. So this knowledge is only of theoretical interest.
Insertion sort is defined over an array or a list; if you use some other data structure, it becomes a different algorithm.
Of course, if you use a BST, insertion and search are O(log(n)) on average and the overall complexity is O(n log(n)) on average (note that it is still O(n^2) in the worst case), but then it is no longer insertion sort: it is tree sort. If you use an AVL tree, you get O(n log(n)) worst-case complexity.
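A minimal (unbalanced) tree sort sketch along the lines this answer describes; bst_insert and tree_sort are illustrative names, not a standard API. Average case O(n log n), degrading to O(n^2) when the input is already sorted, which is what an AVL or red-black tree would fix:

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key into the BST; O(log n) on average, O(n) worst case."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def tree_sort(items):
    """Tree sort: n inserts followed by an in-order traversal."""
    root = None
    for x in items:
        root = bst_insert(root, x)
    out = []
    def inorder(node):
        if node is not None:
            inorder(node.left)
            out.append(node.key)
            inorder(node.right)
    inorder(root)
    return out

print(tree_sort([5, 2, 8, 2, 9, 1]))  # [1, 2, 2, 5, 8, 9]
```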
In insertion sort, the best case is an already-sorted sequence, which takes linear time; the worst case takes O(n^2) time. I do not know how you got the logarithmic part in the complexity.

For faster searching, shouldn't one apply merge sort on the data before doing binary search, or just jump straight to linear search?

I'm learning about algorithms and have doubts about applying them in certain situations. There is divide-and-conquer merge sort, and there is binary search; both grow more slowly than linear algorithms.
Say I want to search for some value in a large list of data, and I don't know whether the data is sorted. Instead of doing a linear search, why not first run merge sort and then do a binary search? Would that be faster, or would sorting first slow things down to worse than a linear search? Why? Would it depend on the size of the data?
There's a flaw in the premise of your question. Mergesort has O(n log n) complexity, which is the best any comparison-based sorting algorithm can achieve, but that's still a lot slower than a single linear scan. Note that log2(1000) ≈ 10. (Obviously, the constant factors matter a lot, especially for smallish problem sizes. Linear search of an array is one of the most efficient things a CPU can do. Copying data around for mergesort is not bad, because the loads and stores hit sequential addresses, so caches and prefetching are effective, but it's still considerably more work than 10 plain reads through the array.)
If you need to support a mix of insert/delete and query operations, all with good time complexity, pick the right data structure for the task. A binary search tree is probably appropriate (or a Red-Black tree or some other variant that does some kind of rebalancing to prevent O(n) worst-case behaviour). That'll give you O(log n) query, and O(log n) insert/delete.
Sorted array: O(n) insert/delete (you have to shuffle the remaining elements over to make or close gaps), but O(log n) query, with lower time and space overhead than a tree (see the bisect sketch below).
Unsorted array: O(n) query (linear search), O(1) insert (append to the end), O(n) delete (an O(n) query, then shuffling elements to close the gap). Deleting elements near the end is efficient.
Linked list, sorted or unsorted: few advantages other than simplicity.
Hash table: O(1) average (amortized) insert/delete; O(1) query for present/not-present. But a query for which two elements a non-present value falls between takes an O(n) linear scan, tracking the smallest element greater than x and the largest element less than x.
If your inserts/deletes happen in large chunks, then sorting the new batch and merging it with the existing sorted array is much more efficient than adding elements one at a time to a sorted array (i.e. insertion sort). Adding the chunk at the end and quicksorting the whole thing is also an option, and might modify less memory.
So the best choice depends on the access pattern you're optimizing for.
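To make the sorted-array trade-off above concrete, here is a small sketch using Python's standard bisect module (my illustration, not code from the original answer): queries cost O(log n), but every insert or delete pays an O(n) shift.

```python
import bisect

a = []
for x in [42, 7, 19, 3, 25]:
    bisect.insort(a, x)       # O(log n) search + O(n) shift per insert
print(a)                      # [3, 7, 19, 25, 42]

# O(log n) query: is 19 present?
i = bisect.bisect_left(a, 19)
print(i < len(a) and a[i] == 19)   # True

# Which two elements does a non-present value fall between?
j = bisect.bisect_left(a, 20)
print(a[j - 1], a[j])              # 19 25

# O(n) delete: find the element, then shift the tail left to close the gap.
a.pop(bisect.bisect_left(a, 7))
print(a)                           # [3, 19, 25, 42]
```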
If the list is of size n, then:
TimeOfMergeSort(list) + TimeOfBinarySearch(list) = O(n log n) + O(log n) = O(n log n)
TimeOfLinearSearch(list) = O(n)
Since O(n) < O(n log n), it follows that:
TimeOfLinearSearch(list) < TimeOfMergeSort(list) + TimeOfBinarySearch(list)
Of course, as mentioned in the comments, the frequency of sorting and the frequency of searching play a huge role in the amortized cost: for k searches, sorting once costs O(n log n) + k·O(log n), while repeated linear search costs k·O(n), so sorting pays off once k is on the order of log n or more.
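To make the amortization concrete, here is a small sketch (assuming the same list is queried many times): the sort is paid once, after which each query is O(log n), versus O(n) per linear scan.

```python
import bisect

def linear_search(a, x):
    """O(n) per query, no preprocessing required."""
    for i, v in enumerate(a):
        if v == x:
            return i
    return -1

def make_binary_searcher(a):
    """Pay O(n log n) once; each subsequent query is O(log n)."""
    s = sorted(a)
    def search(x):
        i = bisect.bisect_left(s, x)
        return i if i < len(s) and s[i] == x else -1
    return search

# For k queries: linear search costs k*O(n); sort-then-binary costs
# O(n log n) + k*O(log n). Sorting wins once k is roughly log n or more.
data = [31, 4, 15, 9, 26, 5]
search = make_binary_searcher(data)
print(linear_search(data, 15), search(15) != -1)   # 2 True
```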

Differences in efficiency between merge, quick, and heap sort

All of these sorting algorithms have an average case of O(n log n), so I would just like to know how I would be able to differentiate between these three sorting algorithms if I could run tests but not know which sorting algorithm was being run.
Another difference between heapsort and mergesort you may want to consider is stability: heapsort is not a stable sort, but mergesort is.
Here is a table (link below) where you can find (almost) any information you want about comparison-based sorting algorithms:
https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
Heapsort is an in-place sorting algorithm; it needs no extra storage to sort the elements. Mergesort is not in-place: its merge procedure requires extra storage. The worst-case running time of quicksort is O(n^2), which differentiates it from heapsort and mergesort.
There are many inputs on which the performance of these algorithms differs.
For example, if all input elements are equal:
heapsort will run in O(n) time;
quicksort will run in O(n^2) time (if the last element is the pivot, with a naive Lomuto-style partition);
mergesort will take O(n log n) time.
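The all-equal case is easy to check empirically by counting comparisons. A sketch (my code, not from the original answer): the quicksort below uses a Lomuto partition with the last element as pivot, as the answer assumes, while mergesort stays at its usual O(n log n).

```python
def quicksort_lomuto_comparisons(a):
    """Count comparisons for quicksort with a Lomuto partition and the
    last element as pivot (iterative to avoid deep recursion)."""
    a = list(a)
    count = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot, i = a[hi], lo - 1
        for j in range(lo, hi):
            count += 1
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]
        stack.append((lo, i))          # left part
        stack.append((i + 2, hi))      # right part
    return count

def mergesort_comparisons(a):
    """Count comparisons for a plain top-down mergesort."""
    count = 0
    def sort(xs):
        nonlocal count
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = sort(xs[:mid]), sort(xs[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            count += 1
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged
    sort(list(a))
    return count

same = [7] * 512
print(quicksort_lomuto_comparisons(same))  # n(n-1)/2 = 130816: quadratic
print(mergesort_comparisons(same))         # about n*log2(n)/2 = 2304
```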
