In the Mergesort algorithm, instead of splitting the array into two equal halves, suppose we split it at a random point in each call. I want to calculate the average running time of this algorithm.
Our notes analyze it the same way as normal merge sort. Any formal idea?
Here is a proof that its time complexity is O(n log n) (it's not very formal).
Let's call a split "good" if the size of the larger part is at most 3/4 of the subarray being split (for an array with 8 elements the candidate split points look like this: bad bad good good good good bad bad). The probability that a split is good is 1/2, which means that among any two splits we expect one to be good.
Let's draw a tree of recursive merge sort calls:
[a_1, a_2, a_3, ..., a_n]                        --- level 1
         /             \
[a_1, ..., a_k]    [a_{k+1}, ..., a_n]           --- level 2
    /    \              /    \
   ...    ...          ...    ...                --- level 3
...
                                                 --- level m
It is clear that there are at most n elements at each level, so the work per level is O(n) and the total time complexity is O(n * m), where m is the number of levels.
But the "good split" observation above implies that the expected number of levels is about 2 * log(n, 4/3), where log(a, b) is the logarithm of a base b: along any root-to-leaf path every second split is expected to be good, and each good split shrinks the subarray by a factor of at least 4/3, so the path length is O(log n).
Thus, the time complexity is O(n * log n).
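For concreteness, here is a minimal sketch of the variant being analyzed; the function names and the use of Python's random.randint are my own illustration, not part of the question.

    import random

    def random_split_merge_sort(a):
        # Merge sort that splits at a uniformly random index instead of the midpoint.
        if len(a) <= 1:
            return a
        k = random.randint(1, len(a) - 1)   # both parts are non-empty
        return merge(random_split_merge_sort(a[:k]), random_split_merge_sort(a[k:]))

    def merge(left, right):
        # Standard two-way merge: O(len(left) + len(right)) comparisons and writes.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

Note that the merge step still costs Θ(n) per level regardless of where the split lands, which is why the analysis above only has to bound the number of levels.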
I assume you're talking about recursive merge sort.
In standard merge sort, you split the array at the midpoint, so you end up with (mostly) same-sized subarrays at each level. But if you split somewhere else then, except in pathological cases, you still end up with nearly the same number of subarrays.
Look at it this way: the divide and conquer approach of standard merge sort results in log n "levels" of sorting, with each level containing all n items. You do n comparisons at each level to sort the subarrays. That's where the n log n comes from.
If you split your array randomly, then you're bound to have more levels, but not all items appear at every level: the smaller subarrays bottom out at single-item arrays before the longer ones do. So not all items are compared at every level of the algorithm, meaning that some items are compared more often than others, but on average each item is compared about log n times.
So what you're really asking is, given a total number of items N split into k sorted arrays, is it faster to merge if each of the k arrays is the same length, rather than the k arrays being of varying lengths.
The answer is no. Merging N items from k sorted arrays takes the same amount of time regardless of the lengths of the individual arrays. See How to sort K sorted arrays, with MERGE SORT for an example.
So the answer to your question is that the average case (and the best case) of doing a recursive merge sort with a random split is O(n log n), with O(log n) expected stack space. The worst case, which would occur only if your random split always put a single item in one subarray and the remainder in the other, would require O(n) stack space and Θ(n^2) time, since there would then be about n levels, each doing work proportional to the size of the remaining subarray.
Note that if you use an iterative merge sort, there is no asymptotic difference in time or space usage.
Help me to understand the runtime of the modified MergeSort algorithm below.
In the classic MergeSort, where the input array is divided into two halves and each half is sorted recursively, the execution time is n log n.
What will be the execution time of the MergeSort algorithm if we
divide the input array into three parts (instead of halves), recursively sort each third, and finally merge the results using a three-argument Merge subprogram?
n
n log n
n (log n)^2
n^2 log n
In the classic Merge Sort algorithm, there are approximately n * log2(n) comparisons and as many element copy operations, hence the time complexity of O(n.log(n)) because multiplicative constants are implicit.
If instead of splitting the array into 2 parts, you split into 3 parts, perform the same sort recursively on the parts and merge the 3 sorted slices into one, the number of comparisons increases to approximately 2 * n * log3(n) and the number of element copies is reduced to n * log3(n), but both are still proportional to n * log(n). Factoring out multiplicative constants, you still get a time complexity of O(n.log(n)).
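A minimal sketch of such a 3-way merge sort, assuming numeric elements (the function names are illustrative, not from the question):

    def merge_sort_3way(a):
        # Merge sort that splits into three parts instead of two.
        if len(a) < 3:
            return sorted(a)
        t = len(a) // 3
        return merge3(merge_sort_3way(a[:t]),
                      merge_sort_3way(a[t:2 * t]),
                      merge_sort_3way(a[2 * t:]))

    def merge3(x, y, z):
        # Merge three sorted lists: at most 2 comparisons per element written out.
        out, i, j, k = [], 0, 0, 0
        while i < len(x) or j < len(y) or k < len(z):
            # Treat exhausted lists as holding +infinity (hence the numeric-element assumption).
            xi = x[i] if i < len(x) else float('inf')
            yj = y[j] if j < len(y) else float('inf')
            zk = z[k] if k < len(z) else float('inf')
            if xi <= yj and xi <= zk:
                out.append(xi); i += 1
            elif yj <= zk:
                out.append(yj); j += 1
            else:
                out.append(zk); k += 1
        return out

Each output element costs at most two comparisons and one copy, and the recursion depth is about log3(n), which is where the 2 * n * log3(n) and n * log3(n) counts above come from.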
I am preparing for software development interviews, and I have always had trouble distinguishing between O(log n) and O(n log n). Can anyone explain the difference with some examples, or share some resources with me? I don't have any code to show. I understand O(log n), but I haven't understood O(n log n).
Think of it as O(n*log(n)), i.e. "doing log(n) work n times". For example, searching for an element in a sorted list of length n is O(log(n)). Searching for the element in n different sorted lists, each of length n is O(n*log(n)).
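As a concrete illustration, a small sketch using Python's bisect module (the list sizes and the value searched for are made up):

    from bisect import bisect_left

    n = 1000
    one_list = list(range(n))                 # a single sorted list of length n

    # One binary search in one sorted list: O(log n) comparisons.
    pos = bisect_left(one_list, 42)

    # Searching n different sorted lists, each of length n: n * O(log n) = O(n log n).
    many_lists = [list(range(n)) for _ in range(n)]
    positions = [bisect_left(lst, 42) for lst in many_lists]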
Remember that O(n) is defined relative to some real quantity n. This might be the size of a list, or the number of different elements in a collection. Therefore, every variable that appears inside O(...) represents something interacting to increase the runtime. O(n*m) could be written O(n_1 + n_2 + ... + n_m) and represent the same thing: "doing n, m times".
Let's take a concrete example of this, mergesort. For n input elements: On the very last iteration of our sort, we have two halves of the input, each half size n/2, and each half is sorted. All we have to do is merge them together, which takes n operations. On the next-to-last iteration, we have twice as many pieces (4) each of size n/4. For each of our two pairs of size n/4, we merge the pair together, which takes n/2 operations for a pair (one for each element in the pair, just like before), i.e. n operations for the two pairs.
From here, we can extrapolate that every level of our mergesort takes n operations to merge. The big-O complexity is therefore n times the number of levels. On the last level, the size of the chunks we're merging is n/2. Before that, it's n/4, before that n/8, etc. all the way to size 1. How many times must you divide n by 2 to get 1? log(n). So we have log(n) levels. Therefore, our total runtime is O(n (work per level) * log(n) (number of levels)), n work log(n) times.
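To make the arithmetic concrete with n = 8: the levels merge chunks of size 1 into 2, 2 into 4, and 4 into 8. That is log2(8) = 3 levels, each touching all 8 elements, for roughly 8 * 3 = 24 element operations in total.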
I am working on a modified merge sort algorithm that uses a procedure similar to the one for merging two sorted arrays, but instead merges √n sorted arrays of size √n each. It starts with an array of size n, which is then recursively divided into √n subproblems as stated above. The following algorithm is used:
1.) Divide array of n elements into √n pieces
2.) Pass elements back into method for recursion
3.) Compare pieces from step 1
4.) Merge components together to form sorted array
I am fairly certain this is the proper algorithm, but I am unsure how to find the Big O run time. Any guidance in the proper direction is greatly appreciated!
The key part is to find the complexity of the merging step. Assuming that an analogous method to that of the 2-way case is used:
Finding the minimum element out of all √n arrays is O(√n).
This needs to be done for all n elements to be merged; possible edge cases when some of the arrays are depleted only contribute a subtracted O(√n) in complexity.
Hence the complexity of merging is O(n√n), and the recurrence for the whole sort is

    T(n) = √n * T(√n) + c * n * √n = n^(1/2) * T(n^(1/2)) + c * n^(3/2)

Expanding the recurrence, where (*) marks an expansion of the T() terms:

    T(n) = n^(1/2) * T(n^(1/2)) + c * n^(3/2)
         = n^(1/2) * [n^(1/4) * T(n^(1/4)) + c * n^(3/4)] + c * n^(3/2)    (*)
         = n^(3/4) * T(n^(1/4)) + c * n^(5/4) + c * n^(3/2)

Spotting the pattern for the m-th expansion:
Coefficient of the T-term is n to the power of the sum of the powers of 1/2 up to m, i.e. n^(1/2 + 1/4 + ... + 1/2^m).
Argument of the T-term is n to the power 1/2^m.
Accumulated terms are the sum of n to the power of 1 plus the powers of 1/2 up to m, i.e. n^(1 + 1/2) + n^(1 + 1/4) + ... + n^(1 + 1/2^m).
Writing the above rules as a compact series:

    T(n) = n^(1/2 + 1/4 + ... + 1/2^m) * T(n^(1/2^m)) + c * (n^(1 + 1/2) + ... + n^(1 + 1/2^m))
         = n^(1 - 1/2^m) * T(n^(1/2^m)) + O(n^(3/2))    (*)(**)

(*) used the standard formula for a geometric series: 1/2 + 1/4 + ... + 1/2^m = 1 - 1/2^m.
(**) notes that for a summation of powers of n, the highest power dominates (here 1 + 1/2). Assume the stopping condition to be some small constant size, say n^(1/2^m) = 2, i.e. m = log2(log2(n)):

    T(n) = n^(1 - 1/log2(n)) * T(2) + O(n^(3/2)) = (n / 2) * T(2) + O(n^(3/2))

The first term is therefore bounded from above by O(n), which is overshadowed by the second term.
The time complexity of √n-way merge-sort is therefore O(n^1.5), which is worse than the O(n log n) complexity of 2-way merge-sort.
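A rough sketch of the procedure being analyzed, with the merge step doing a linear scan for the minimum across the heads of the √n sub-arrays; the function names and splitting details are my own assumptions, not from the question:

    import math

    def sqrt_merge_sort(a):
        # Merge sort that splits into about sqrt(n) chunks of about sqrt(n) elements each.
        n = len(a)
        if n <= 2:
            return sorted(a)
        size = math.isqrt(n)
        chunks = [sqrt_merge_sort(a[i:i + size]) for i in range(0, n, size)]
        return merge_many(chunks)

    def merge_many(chunks):
        # Merge k sorted lists by scanning all k heads for the minimum: O(k) work per element written.
        indices = [0] * len(chunks)
        out = []
        for _ in range(sum(len(c) for c in chunks)):
            best = None
            for idx, c in enumerate(chunks):
                i = indices[idx]
                if i < len(c) and (best is None or c[i] < chunks[best][indices[best]]):
                    best = idx
            out.append(chunks[best][indices[best]])
            indices[best] += 1
        return out

With k ≈ √n chunks, the scan costs O(√n) per element written, i.e. O(n√n) per merge, which is exactly the merging cost used in the recurrence above.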
How do you find the complexity of quicksort when the ratio between the partition sizes is 5 : n-5, or something like 1 : 19? I do not really understand how to calculate the complexity of the algorithm in these situations.
In general, keep the following in mind:
If you split an array into two pieces defined by some fixed ratio of a:b at each point, after O(log n) splits, the subarrays will be down to size 0.
If you split an array into two pieces where one size is a constant k, it will take Θ(n / k) splits to get the subarray sizes to drop to 0.
Now, think about the work that quicksort does at each level of the recursion. At each layer, it needs to do work proportional to the number of elements in the layer. If you use the first approach and have something like a 1/20 : 19/20 split, then there will be at most n elements per layer but only O(log n) layers, so the total work done will be O(n log n), which is great.
On the other hand, suppose that you always pull off five elements. Then the larger array at each step will have sizes n, n - 5, n - 10, n - 15, ..., 10, 5, 0. If you work out the math and sum this up, it works out to Θ(n^2) total work, which is not very efficient.
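To see where the quadratic comes from: that sum is n + (n - 5) + (n - 10) + ... + 5, an arithmetic series with about n/5 terms averaging about n/2 each, so it is roughly n^2 / 10 = Θ(n^2).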
Generally speaking, try to avoid splitting off a fixed number of elements at a time in quicksort. That gives you the degenerate case that you need to worry about.
A Merge algorithm merges two sorted input arrays into a sorted output array, by repeatedly comparing the smallest elements of the two input arrays, and moving the smaller one of the two to the output.
Now we need to merge three sorted input arrays (A1, A2, and A3) of the same length into a (sorted) output array, and there are two methods:
Using the above Merge algorithm to merge A1 and A2 into A4, then using the same algorithm to merge A4 and A3 into the output array.
Revising the above Merge algorithm, by repeatedly comparing the smallest elements of the three input arrays, and moving the smallest one of the three to the output array.
Which of the above two algorithms is more efficient, if only considering the worst case of array element movements (i.e., assignments)?
Which of the above two algorithms is more efficient, if only considering the worst case of array element comparisons?
Between these two algorithms, which one has a higher overall efficiency in worst case?
If all that you care about are the number of array writes, the second version (three-way merge) is faster than the first algorithm (two instances of two-way merge). The three-way merge algorithm will do exactly 3n writes (where n is the length of any of the sequences), since it merges all three ranges in one pass. The first approach will merge two of the ranges together, doing 2n writes, and will then merge that sequence with the third sequence, doing 3n writes for a net total of 5n writes.
More generally, suppose that you have k ranges of elements, all of length n. If you merge those ranges pairwise, then merge the results pairwise again, etc., then you will do roughly k/2 merge steps merging ranges of length n, then k/4 merges of ranges of length 2n, then k/8 merges of ranges of length 4n, etc. Each round writes all kn elements once, giving the sum
kn + kn + ... + kn (lg k times)
for a net number of array writes that is O(kn lg k). If, on the other hand, you use a k-way comparison at each step, you do exactly kn writes, which is much smaller.
Now, let's think about how many comparisons you do in each setup. In the three-way merge, each element written into the output sequence requires finding the minimum of three values. This requires two comparisons - one to compare the first values of the first two sequences, and one to compare the minimum of those two values to the first value of the third array. Thus for each value written out to the resulting sequence, we use two comparisons, and since there are 3n values written we need to do a total of at most 6n comparisons.
A much better way to do this would be to store the sequences in a min-heap, where sequences are compared by their first element. On each step, we dequeue the sequence with the smallest first value from the heap, write that value to the result, then enqueue the rest of the sequence back into the heap. With k sequences, this means that each element written out requires at most O(lg k) comparisons, since heap insertion runs in O(lg k). This gives a net runtime of O(kn lg k), since each of the kn elements written out requires O(lg k) processing time.
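A minimal sketch of that heap-based k-way merge, using Python's heapq (the standard-library heapq.merge does essentially the same thing):

    import heapq

    def k_way_merge(sequences):
        # Merge k sorted sequences using a min-heap keyed on each sequence's current head.
        heap = []
        for seq_id, seq in enumerate(sequences):
            if seq:
                heap.append((seq[0], seq_id, 0))   # (head value, which sequence, index of head)
        heapq.heapify(heap)
        out = []
        while heap:
            value, seq_id, i = heapq.heappop(heap)                                  # O(lg k)
            out.append(value)
            if i + 1 < len(sequences[seq_id]):
                heapq.heappush(heap, (sequences[seq_id][i + 1], seq_id, i + 1))     # O(lg k)
        return out

    # Example: merging three sorted arrays of the same length.
    print(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]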
In the other version, we begin by doing a standard two-way merge, which requires one comparison per element written out, for a net total of 2n comparisons. In the second pass of the merge, in the worst case we do a total of 3n comparisons, since there are 3n elements being merged. This gives a net total of 5n comparisons. If we use the generalized construction for pairwise merging that's described above, we will need O(kn lg k) comparisons, since each element written requires one comparison and we do O(kn lg k) writes.
In short, for the specific case of k=3, we have that the three-way merge does 3n writes and 6n comparisons for a net of 9n memory reads and writes. The iterated two-way merge does 5n writes and 5n comparisons for a net total of 10n memory reads and writes, and so the three-way-merge version is better.
If we consider the generalized constructions, the k-way merge does O(nk) writes and O(nk lg k) comparisons, for a total of O(nk lg k) memory operations. The iterated two-way merge does O(nk lg k) writes and O(nk lg k) comparisons, also O(nk lg k) memory operations in total. Asymptotically the two are comparable overall, but the k-way merge does a factor of about lg k fewer writes, which makes it preferable when element movements are expensive.
Hope this helps!