Sorting partially sorted array - sorting

Suppose you are given a sorted list of n elements followed by f(n) randomly ordered elements. How would you sort the list if (i) f(n) = O(logn). I feel best algo would be merge sort but I am not sure of the resulting time complexity.

You should first sort the f(n) elements with any sort method and then use merge sort for the final phase. The time complexity would be O(n) as O(log(n)2) is negligible compared to the linear scan of the sorted portion.
If by list you mean an array, you could reduce the number of comparisons to O(log(n)2) by looking for the insertion point into the left portion using binary search. It would still take O(n) copying operations, so depending on the relative costs of copying vs: comparing, the time might stay sub-linear even for moderately large values of n.

Related

Merge k sorted arrays of size n in less then O(nklogk) time complexity

The question:
Merge k sorted arrays each with n elements into a single array of size nk in minimum time complexity. The algorithm should be a comparison-based algorithm. No assumption on the input should be made.
So I know about an algorithm that solves the problem in nklogk time complexity as mentioned here: https://www.geeksforgeeks.org/merge-k-sorted-arrays/.
Though, my question is can we sort in less than nklogk, meaning, the runtime is o(nklogk).
So I searched through the internet and found this answer:
Merge k sorted arrays of size n in O(nk) time complexity
Which claims to divide an array of size K into singletons and merge them into a single array. But this is incorrect since one can claim that he found an algorithm that solves the problem in sqrt(n)klogk which is o(nklogk) but n=1 so we sort the array in KlogK time which doesn't contradict the lower bound on sorting an array.
So how can I contradict the lower bound on sorting an array? meaning, for an array of size N which doesn't have any assumptions on the input, sorting will take at least NlogN operations.
The lower bound of n log n only applies to comparison-based sorting algorithms (heap sort, merge sort, etc.). There are, of course, sorting algorithms that have better time complexities (such as counting sort), however they are not comparison-based.

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity log(n)??
example [8,2,7,5,0,1]
sort given array with time complexity log(n)
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to, say, sort arbitrary objects like strings in less than n*log(n).
That said, there may be times when the list is not arbitrary; ex. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, methods like binary search tree can let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (ie. performing n inserts) is n*log(n) time.
As #dominicm00 also mentioned the answer is no.
In general when you see an algorithm with time complexity of Log N with base 2 that means that, you are dividing the input list into 2 sets, and getting rid of one of them repeatedly. In sorting algorithm we need to put all the elements in their appropriate place, if we get rid of half of the list in each iteration, that does not correlate with sorting functionality.
The most efficient sorting algorithms have the time complexity of O(n), but with some limitations. Three most famous algorithm with complexity of O(n) are :
Counting sort with time complexity of O(n+k), while k is the maximum number in given list. Assuming n>>k, you can consider its time complexity as O(n)
Radix sort with time complexity of O(d*(n+k)), where k is maximum number of input list and d is maximum number of digits you may have in input list. Similar to counting sort assuming n>>k && n>>d => time complexity will be O(n)
Bucket sort with time complexity of O(n)
But in general due to limitation of each of these algorithms most implementation relies on O(n* log n) algorithms, such as merge sort, quick sort, and heap sort.
Also there are some sorting algorithms with time complexity of O(n^2) which are recommended for list with smaller sizes such as insertion sort, selection sort, and bubble sort.
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massive parallel computation would be able to do this, general purpose CPU's would not do here unless they implement it in silicon as part of their SIMD.

Does sorting time of n numbers depend on a permutation of the numbers?

Consider this problem:
A comparison-based sorting algorithm sorts an array with n items. For which fraction of n! permutations, the number of comparisons may be cn where c is a constant?
I know the best time complexity for sorting an array with arbitrary items is O(nlogn) and it doesn't depend on any order, right? So, there is no fraction that leads to cn comparisons. Please guide me if I am wrong.
This depends on the sorting algorithm you use.
Optimized Bubble Sort for example, compares all neighboring elements of an array and swaps them when the left element is larger then right one. This is repeated until no swaps where performed.
When you give Bubble Sort a sorted array it won't perform any swaps in the first iteration and thus sorts in O(n).
On the other hand, Heapsort will take O(n log n) independent of the order of the input.
Edit:
To answer your question for a given sorting algorithm, might be non-trivial. Only one out of n! permutations is sorted (assuming no duplicates for simplicity). However, for the example of bubblesort you could (starting for the sorted array) swap each pair of neighboring elements. This input will take Bubblesort two iterations which is also O(n).

For faser searching, shouldn't one apply merge sort on the data before doing binary search or just jump straight to linear search?

I'm learning about algorithms and have doubts about their application in certain situations. There is the divide and conquer merge sort, and the binary search. Both faster than linear growth algos.
Let's say I want to search for some value in a large list of data. I don't know whether the data is sorted or not. How about instead of doing a linear search, why not first do merge sort and then do binary search. Would that be faster? Or the process of applying merge sort and then binary search combined would slow it down even more than linear search? Why? Would it depend on the size of the data?
There's a flaw in the premise of your question. Merge Sort has O(N logN) complexity, which is the best any comparison-based sorting algorithm can be, but that's still a lot slower than a single linear scan. Note that log2(1000) ~= 10. (Obviously, the constant-factors matter a lot, esp. for smallish problem sizes. Linear search of an array is one of the most efficient things a CPU can do. Copying stuff around for MergeSort is not bad, because the loads and stores are from sequential addresses (so caches and prefetching are effective), but it's still a ton more work than 10 reads through the array.)
If you need to support a mix of insert/delete and query operations, all with good time complexity, pick the right data structure for the task. A binary search tree is probably appropriate (or a Red-Black tree or some other variant that does some kind of rebalancing to prevent O(n) worst-case behaviour). That'll give you O(log n) query, and O(log n) insert/delete.
sorted array gives you O(n) insert/delete (because you have to shuffle the remaining elements over to make or close gaps), but O(log n) query (with lower time and space overhead than a tree).
unsorted array: O(n) query (linear search), O(1) insert (append to the end), O(n) delete (O(n) query, then shuffle elements to close the gap). Efficient deletion of elements near the end.
linked list, sorted or unsorted: few advantages other than simplicity.
hash table: insert/delete: O(1) average (amortized). query for present/not-present: O(1). Query for which two elements a non-present value is between: O(n) linear scan keeping track of the min element greater than x, and max element less than x.
If your inserts/deletes happen in large chunks, then sorting the new batch and doing a merge-sort is much more efficient than adding elements one at a time to a sorted array. (i.e. InsertionSort). Adding a chunk at the end and doing QuickSort is also an option, and might modify less memory.
So the best choice depends on the access pattern you're optimizing for.
If the list is of size n, then
TimeOfMergeSort(list) + TimeOfBinarySearch(list) = O(n log n) + O(log n) = O(n log n)
TimeOfLinearSearch(list) = O(n)
O(n) < O(n log n)
Implies
TimeOfLinearSearch(list) < TimeOfMergeSort(list) + TimeOfBinarySearch(list)
Of course, as mentioned in the comments frequency of sorting and frequency of searching play a huge role in amortized cost.

Algorithm comparison in unsorted array

If I have a unsorted array A[1.....n]
using linear search to search number x
using bubble sorting to sort the array A in ascending order, then use binary search to search number x in sorted array
Which way will be more efficient — 1 or 2?
How to justify it?
If you need to search for a single number, nothing can beat a linear search: sorting cannot proceed faster than O(n), and even that is achievable only in special cases. Moreover, bubble sort is extremely inefficient, taking O(n2) time. Binary search is faster than that, so the overall timing is going to be dominated by O(n2).
Hence you are comparing O(n) to O(n2); obviously, O(n) wins.
The picture would be different if you needed to search for k different numbers, where k is larger than n2. The outcome of this comparison may very well be negative.

Resources