Given n numbers that are all identical, what would be the running time of merge sort?
Will it be linear time O(n), or
the best case O(n log n)?
For a pure merge sort, the number of moves is always the same, O(n log(n)). If all elements are identical, or already in order, or in reverse order, the number of compares is about half the number of compares in the worst case.
A natural merge sort, which creates runs based on the existing ordering of the data, would take O(n) time for all-identical, already-sorted, or reverse-sorted input. A variation of this idea is Timsort, a hybrid of insertion sort and merge sort.
https://en.wikipedia.org/wiki/Timsort
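To make the natural-run idea concrete, here is a minimal sketch (my own helper names; it only detects ascending runs, whereas Timsort also reverses descending runs, which is what additionally makes reverse-ordered input linear): the input is split into already-sorted runs in one pass, then the runs are merged pairwise. An all-identical or already-sorted array is a single run, so it comes back after one O(n) scan.

```python
def find_runs(a):
    """Split a into maximal non-decreasing runs (one pass, O(n))."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:          # run ends where the order breaks
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs

def merge(left, right):
    """Standard two-way merge, O(len(left) + len(right))."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def natural_merge_sort(a):
    runs = find_runs(a)
    while len(runs) > 1:             # merge runs pairwise until one remains
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(natural_merge_sort([5, 5, 5, 5]))   # single run -> one O(n) pass, no merging
print(natural_merge_sort([3, 1, 2, 1]))   # [1, 1, 2, 3]
```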
You need to recheck the recurrence that you have for merge sort:
T(n) = 2T(n/2) + Θ(n)
Now, when all values are identical, let's see what changes in this formulation. The Θ(n) term is for merging the two subarrays. Since merging two subarrays sweeps both of them regardless of whether their elements are identical, that term is the same in your case.
Therefore, the recurrence is unchanged for the specified case; hence the time complexity will be Θ(n log n). That can be considered one of the shortcomings of merge sort.
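To see this empirically, here is a rough sketch (the counter plumbing is mine) that instruments a plain top-down merge sort and counts merge comparisons. For all-identical input every merge still sweeps a whole half, so the count grows like (n/2)·log2(n), i.e. Θ(n log n).

```python
def merge_sort(a, counter):
    """Plain top-down merge sort; counter[0] accumulates merge comparisons."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid], counter)
    right = merge_sort(a[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                       # one comparison per merge step
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

for n in (1024, 2048, 4096):
    c = [0]
    merge_sort([7] * n, c)                    # all elements identical
    print(n, c[0])                            # grows roughly like (n/2) * log2(n)
```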
Is there any sorting algorithm with an average time complexity of log(n)?
For example, given the array [8,2,7,5,0,1],
can it be sorted with time complexity log(n)?
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we have no way to sort arbitrary objects like strings, where all we can do is compare them, in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, structures like a binary search tree can let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) is n*log(n) time.
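For the single out-of-place element case, here is a small sketch with Python's standard bisect module (the variable names are mine): binary search finds the insertion point in O(log n) comparisons, though inserting into a Python list still shifts elements, which is why a balanced tree is what actually gives an O(log n) insert.

```python
import bisect

sorted_part = [1, 3, 4, 8, 12, 20]
x = 7

pos = bisect.bisect_left(sorted_part, x)   # O(log n) comparisons to find the slot
sorted_part.insert(pos, x)                 # the list insert itself shifts elements (O(n))
print(sorted_part)                         # [1, 3, 4, 7, 8, 12, 20]
```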
As #dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N (base 2), it means the algorithm is repeatedly dividing the input list into two halves and getting rid of one of them. A sorting algorithm needs to put every element in its appropriate place, so getting rid of half of the list in each iteration does not fit with what sorting requires.
The most efficient sorting algorithms run in O(n) time, but only under certain restrictions. The three best-known algorithms with O(n) complexity are:
Counting sort, with time complexity O(n + k), where k is the maximum value in the given list. Assuming n >> k, you can consider its time complexity to be O(n) (a sketch follows below).
Radix sort, with time complexity O(d*(n + k)), where d is the number of digits in the largest value and k is the range of a single digit (the base). Similar to counting sort, assuming n >> k and n >> d, the time complexity is effectively O(n).
Bucket sort, with average time complexity O(n), assuming the input is spread reasonably evenly across the buckets.
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n log n) algorithms such as merge sort, quick sort, and heap sort.
There are also sorting algorithms with O(n^2) time complexity, such as insertion sort, selection sort, and bubble sort, which are recommended for smaller lists.
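As an illustration of the counting sort mentioned above, here is a minimal sketch, assuming non-negative integers whose maximum value k is small relative to n:

```python
def counting_sort(a):
    """Counting sort for non-negative integers; O(n + k) time, k = max value."""
    if not a:
        return []
    k = max(a)
    counts = [0] * (k + 1)
    for x in a:                              # O(n): tally each value
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):       # O(k): emit each value c times
        out.extend([value] * c)
    return out

print(counting_sort([4, 2, 2, 8, 3, 3, 1]))  # [1, 2, 2, 3, 3, 4, 8]
```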
Using a PLA, it might be possible to implement counting sort for a few elements with a low range of values:
count the occurrences of each value in parallel and sum them using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel computation would be able to do this; general-purpose CPUs would not do here unless they implement it in silicon as part of their SIMD units.
Consider the problem:
Although merge sort runs in Θ(n lg n) worst-case time and insertion sort runs in Θ(n^2) worst-case time, the constant factors in insertion sort make it faster for small n. Thus, it makes sense to use insertion sort within merge sort when subproblems become sufficiently small.
Consider a modification of merge sort in which n/k sublists of length k are sorted using insertion sort and then merged using the standard merging mechanism, where k is a value to be determined.
Question: Show that the sublists can be merged in Θ(n lg(n/k)) worst-case time:
My solution:
Merging n/k sublists into n/(2k) sublists takes Θ(n) time.
Merging n/(2k) sublists into n/(4k) sublists takes Θ(n) time.
...
Merging 2 sublists into 1 takes Θ(n) time.
Then I was struggling with further steps and I had a look at the solution:
We have lg(n/k) such merges, so merging n/k sublists into one list takes Θ(n lg(n/k)) worst-case time.
I have two questions:
1) How do they end up with lg(n/k) merges? Please clarify the calculation.
2) Why is the final result Θ(n lg(n/k))?
You seem to be pretty close to the actual answer. I believe the phrasing of the answer you looked up is what makes it harder for you to understand, because I do not think that the total number of individual merges required is lg(n/k). What I believe the answer refers to is the number of merging steps required until we end up with the sorted list.
Instead of that answer, however, let's continue building on your own reasoning. Merging two lists of length k has O(k) time complexity. To merge n/k such lists into n/(2k) lists, we will do n/(2k) merges with complexity O(k) each, resulting in an overall O(n) complexity, as you mentioned.
You may extend this logic to the next step, where n/(2k) lists are merged into n/(4k), and state that the second step has O(n) complexity as well. In fact, each merging step will take O(n) time.
The next thing to do is estimate how many of these merging steps we have. We started with n/k lists, and after the first step we obtained n/(2k) lists. After that, at each step, the number of lists is halved, until there is only 1 list left, which will be our result (i.e. the sorted list). Well, how many times do we have to divide n/k by 2 until we end up with 1? That is exactly what log(n/k) means, isn't it? So there will be log(n/k) such merging steps, each of which takes O(n).
Consequently, the entire procedure will have a time complexity of O(n log(n/k)).
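Here is a small sketch of that counting argument (the setup is mine; sorted() stands in for the insertion sort on each length-k block): start with n/k sorted runs, merge them pairwise, and count the passes. The pass count comes out to lg(n/k), and every pass touches all n elements.

```python
import math

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

n, k = 64, 4
data = list(range(n))[::-1]                               # any input works
runs = [sorted(data[i:i + k]) for i in range(0, n, k)]    # n/k sorted runs of length k

passes = 0
while len(runs) > 1:
    runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
            for i in range(0, len(runs), 2)]
    passes += 1                                           # each pass does O(n) total work

print(passes, math.ceil(math.log2(n // k)))               # both print 4: lg(n/k) merging passes
```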
I am trying to determine the running time in Big O of merge sort for:
(A) sorted input
(B) reverse-ordered input
(C) random input
My answer is that it would take O(n lg n) in all three scenarios, since regardless of the initial order of the input, merge sort will always divide the input down to the smallest unit of 1 element. Then it compares each element with the elements of the adjacent list to sort and merge the two adjacent lists. It continues to do this until finally all the elements are sorted and merged.
That said, all we really need to find then is the Big O complexity of merge sort in general, since the worst, average, and best cases will all take the same time.
My question is, can somebody tell me if my answers are correct, and if so, explain why the Big O complexity of merge sort ends up being O(n lgn)?
The answer to this question depends on your implementation of Merge Sort.
When naively implemented, merge sort indeed uses O(n * log n) time, as it will always divide the input down to the smallest unit. However, there's a specific implementation called Natural Merge Sort that keeps numbers in their correct order if they're already ordered in the input array: it essentially looks at the given input first and decides which parts need to be ordered, that is, divided and later merged again.
Natural Merge Sort will only take O(n) time for an ordered input and in general be faster for a random input than for a reverse-ordered input. In the latter two cases, runtime will be O(n * log n).
To answer your last question, I'll look at the "normal" Mergesort; the explanation is easier that way.
Note that Mergesort can be visualized as a binary tree: in the root we have the whole input, on the next layer the two halves you get from dividing the input once, on the third layer we have four quarters, and so on... On the last layer we finally have individual numbers.
Then note that the whole tree is O(log n) deep (this can also be proved mathematically). On each layer we have to make some comparisons and moves on n numbers in total - this is because the total amount of numbers on a layer doesn't decrease as we go down the tree. With an input of 8 numbers, for example, we need to do comparisons and moves on 8 numbers on each layer: up to 8 comparisons and up to 8 moves per layer. If we have an input of length n instead of 8, we'll need up to n comparisons and up to n moves per layer (this is O(n)). We have O(log n) layers, so the whole runtime will be O(n * log n).
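A quick way to see that layer-by-layer picture (a sketch; the depth bookkeeping is mine): record how many elements get merged at each recursion depth. Every depth handles all n elements, and there are about log2(n) depths.

```python
from collections import defaultdict

work_per_depth = defaultdict(int)

def merge_sort(a, depth=0):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid], depth + 1)
    right = merge_sort(a[mid:], depth + 1)
    work_per_depth[depth] += len(a)      # merging at this node touches len(a) elements
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

merge_sort(list(range(8))[::-1])         # n = 8
print(dict(work_per_depth))              # {2: 8, 1: 8, 0: 8} -> 8 elements per layer, 3 layers
```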
If I have to sort one list and merge it with another, already sorted one, what will the running time be if I use merge sort and insertion sort?
Merge sort is: n log n
Insertion sort is: n^2
But together they are?
EDIT: Oh, so what I actually meant was that I had to sort one of the lists and merge them together.
I have made the pseudocode for the insertion sort, but I don't know what the running time of the two algorithms will be.
http://gyazo.com/0010f053f0fe64a82dad1dd383740a3f
The complexity of merging two sorted lists with lengths n1 and n2 is O(n1 + n2); that should be enough to work out the big-Oh of the entire algorithm.
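For reference, a minimal sketch of that merge step on plain Python lists: a single pass with two indices, so the cost is proportional to n1 + n2.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists in O(len(a) + len(b)) time."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]             # at most one of these tails is non-empty

# sort the unsorted list first (O(n1 log n1)), then merge with the sorted one (O(n1 + n2))
unsorted, already_sorted = [9, 1, 5], [2, 4, 6, 8]
print(merge_sorted(sorted(unsorted), already_sorted))   # [1, 2, 4, 5, 6, 8, 9]
```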
Suppose we have an array of size n with all the elements identical. What will the running time of merge sort be? Will it be linear?
This depends on how the algorithm is implemented.
With a standard "vanilla" implementation of mergesort, the time required to sort an array will always be Θ(n log n) because the merges required at each step each take linear time.
However, with the appropriate optimizations, it's possible to get this to run in time O(n). In many mergesort implementations, the input array is continuously modified so that larger and larger ranges are sorted, and when a merge step occurs, the algorithm uses an external buffer to merge two adjacent sorted ranges. In that case, there's a nifty optimization you can do: before doing the merge, check if the last element of the first range is less than or equal to the first element of the second range. If so, the two ranges taken together are already sorted, so no merging needs to be done.
Suppose you perform this optimization and try sorting an array where all elements are already sorted. What happens? Well, each call to mergesort will fire off two more recursive calls. After those return, it can check the endpoints of the sorted ranges and will notice that they're already in sorted order, so there's no more work left to be done. Overall, this does O(1) work per call, so we have this recurrence relation for the time complexity of the algorithm:
T(n) = 2T(n/2) + O(1)
This solves to O(n), so only linear work is done.
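Here is a rough sketch of that optimization, written top-down on list slices rather than in place with an external buffer, so it only illustrates the saved comparisons (Python's slicing and concatenation still copy, which the in-place version described above avoids): before doing a merge, check whether the last element of the left half is <= the first element of the right half, and if so just concatenate.

```python
def merge_sort_opt(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort_opt(a[:mid])
    right = merge_sort_opt(a[mid:])
    if left[-1] <= right[0]:          # ranges already in order: skip the merge entirely
        return left + right           # O(1) comparisons at this node
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort_opt([3] * 8))        # identical elements: every merge is skipped
print(merge_sort_opt([4, 1, 3, 2]))   # still sorts arbitrary input correctly
```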