TriMerge vs Merge Sort - sorting

Can someone tell me which of the two algorithms, TriMergeSort or MergeSort, is better?
The time complexity of MergeSort is n log n, with the logarithm in base 2.
The time complexity of TriMergeSort is n log n, with the logarithm in base 3.
Since TriMergeSort uses base 3 and MergeSort uses base 2, I assume TriMergeSort is faster than MergeSort.
Please correct me if I am wrong.

While you are right that the number of levels in the recursive structure is log_2 n for regular mergesort and log_3 n for three-way mergesort, it's important to remember that the work done per level increases as the number of levels decreases. Specifically, in your merge step, you need to switch from a normal 2-way merge to a special 3-way merge. At each step of the merge, you need to determine which of the lists has the smallest unused element. In a two-way merge, you just compare the front elements of the two lists against one another. In a three-way merge, more comparisons are required because you have to find the smallest of three elements.
Generalizing this to a k-way mergesort, the number of levels will be log_k n, but the work per merge grows with k. Using a binary heap, you can do a k-way merge of n total elements in O(n log k) time, so more work per level is required as k increases.
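As a rough illustration of the heap-based merge mentioned above, here is a minimal Python sketch (the name k_way_merge is illustrative, not from the original answer). The heap never holds more than one entry per input list, so each push or pop costs O(log k), and each of the n elements passes through the heap exactly once. Python's standard library also provides heapq.merge, which does the same job lazily.

```python
import heapq


def k_way_merge(lists):
    """Merge k sorted lists into one sorted list using a binary min-heap.

    The heap holds at most k entries, so every push/pop is O(log k);
    each of the n elements is pushed and popped once: O(n log k) total.
    """
    heap = []
    # Seed the heap with the front element of each non-empty list.
    # The list index i breaks ties, so list contents are never compared.
    for i, lst in enumerate(lists):
        if lst:
            heap.append((lst[0], i, 0))
    heapq.heapify(heap)

    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        # Refill the heap with the next element from the same list, if any.
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out


print(k_way_merge([[1, 4, 7], [2, 5, 8], [0, 3, 6, 9]]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```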
Interestingly, if we look at the total amount of work required, we need to do O(n log k) work across log_k n levels. This gives a total runtime of O(n log k * log_k n). Using the change-of-base formula for logarithms, log_k n = log_2 n / log_2 k, the runtime is
O(n log k * log_k n)
= O(n log k * (log n / log k))
= O(n log n)
In other words, there isn't an asymptotic difference between the algorithms when you choose different values of k. The drop in levels due to a higher splitting factor is offset by an increased amount of work per level.
To figure out which algorithm is fastest in practice, your best option is to implement them all and measure. Due to caching effects and locality of reference, I suspect that the answer may depend to some degree on the particular architecture you're using.
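One rough way to start such an experiment is to count comparisons instead of wall-clock time. The sketch below is a generic k-way merge sort (kway_merge_sort and scan_merge are illustrative names); unlike the heap version, its merge finds the smallest of the k front elements with a simple linear scan, which is the "more comparisons per merge step" cost described above. A comparison count does not capture caching or locality, so it complements rather than replaces real timing.

```python
import random


def scan_merge(parts, counter):
    """Merge sorted lists by scanning the front element of each one.

    Picking the smallest of k fronts costs k - 1 comparisons per output
    element. counter[0] accumulates the number of comparisons made.
    """
    heads = [0] * len(parts)
    total = sum(len(p) for p in parts)
    out = []
    while len(out) < total:
        best = None
        for i, p in enumerate(parts):
            if heads[i] == len(p):
                continue  # this list is exhausted
            if best is None:
                best = i
            else:
                counter[0] += 1
                if p[heads[i]] < parts[best][heads[best]]:
                    best = i
        out.append(parts[best][heads[best]])
        heads[best] += 1
    return out


def kway_merge_sort(items, k, counter):
    """k-way merge sort: split into at most k chunks, sort each, merge."""
    if len(items) <= 1:
        return list(items)
    step = -(-len(items) // k)  # ceiling division gives the chunk size
    parts = [kway_merge_sort(items[i:i + step], k, counter)
             for i in range(0, len(items), step)]
    return scan_merge(parts, counter)


random.seed(0)
data = [random.random() for _ in range(3000)]
for k in (2, 3):
    counter = [0]
    assert kway_merge_sort(data, k, counter) == sorted(data)
    print(f"k={k}: {counter[0]} comparisons")
```

With a linear-scan merge, each element costs up to k - 1 comparisons per level over log_k n levels, i.e. roughly (k - 1) / log_2 k times n log_2 n comparisons, so the 3-way version should do about a quarter more comparisons than the 2-way one. A heap-based merge, or a cost model dominated by memory traffic rather than comparisons, can shift that balance.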

As far as Big-O complexity, it doesn't matter.
Regular merge sort is O(n * log_2(n)), which is equivalent to O(n * (log(n) / log(2))). Since log(2) is a constant, merge sort is simply O(n * log(n)).
Tri-merge sort is O(n * log_3(n)), which, by the same logic, is also simply O(n * log(n)).
Given that both reduce to O(n * log(n)), big-O analysis alone can't tell you which is better.
An alternate way to demonstrate why you can't just assume tri-merge to be better:
Assume a 3-way merge is better than a 2-way merge.
In general, assume an (N+1)-way merge is better than an N-way merge.
If this were true, it would be best to use an N-way merge where N is the number of elements you're sorting. However, with a linear scan, the merge step must choose the least element from N sources, which takes O(N) time for each of the N output elements.
This means that the N-way merge sort runs in O(N^2) time, which effectively makes it selection sort.
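A small sketch of that extreme case, assuming the minimum is found with a linear scan (n_way_merge_sort is a hypothetical name): splitting N elements into N one-element runs and merging them all at once reduces to repeatedly picking the smallest remaining element, which is exactly selection sort.

```python
def n_way_merge_sort(items):
    """Split into len(items) singleton runs and perform a single N-way merge.

    Each output step scans every remaining run for the smallest front
    element: O(N) per step, O(N^2) overall, i.e. selection sort.
    """
    runs = [[x] for x in items]  # N runs of length 1, trivially sorted
    out = []
    while runs:
        # Linear scan for the run whose front element is smallest.
        best = min(range(len(runs)), key=lambda i: runs[i][0])
        out.append(runs.pop(best)[0])
    return out


print(n_way_merge_sort([4, 1, 3, 2]))  # [1, 2, 3, 4]
```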

Related

Measuring the Time Complexity of Insertion And Merge Sort Hybrid

I have a very basic hybrid of merge sort and insertion sort that uses a threshold: sub-arrays at or below the threshold size are sorted with insertion sort instead of being split further. The merge and insertion sort routines themselves are the most basic, widely available versions:
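    def hybrid_sort(array: list, threshold: int = 10):
        # merge() and insertion_sort() are the standard textbook helpers (not shown)
        if len(array) > 1:
            mid = len(array) // 2
            left = array[:mid]
            right = array[mid:]
            if len(array) > threshold:
                # pass threshold down so the recursive calls use the same cutoff
                hybrid_sort(left, threshold)
                hybrid_sort(right, threshold)
                merge(array, left, right)
            else:
                insertion_sort(array)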
Unless I am completely misunderstanding, this means the recurrence relation for this code is:
T(n) = 2T(n/2) + O(n^2)
The first term comes from the merge sort part and the second from the insertion sort operations.
By the master theorem, n raised to log_b(a) equals n in this case, because n raised to log_2(2) = 1 gives n^1 = n.
Then f(n) = n^2, which is 'larger' than n, so by case 3 of the master theorem my algorithm above would be Θ(f(n)) = O(n^2), because f(n) is bounded from below by n.
This doesn't seem right to me considering we know merge sort is O(nlog(n)), and I'm having a hard time wrapping my head around this. I think it's because I've not yet analyzed such an algorithm that has a conditional 'if' check.
Can anyone illuminate this for me?
Unless the threshold itself depends on n, the insertion sort part does not matter at all. This has the same complexity as a normal merge sort.
Keep in mind that the time complexity of an algorithm that takes an input of size n is a function of n that is generally difficult to compute exactly, and so we focus on the asymptotic behavior of that function instead. This is where the big O notation comes into play.
In your case, as long as threshold is a constant, it becomes insignificant as n grows. Insertion sort only ever runs on sub-arrays of at most threshold elements, so each call costs O(threshold^2), a constant, and there are O(n / threshold) such calls, for O(n * threshold) = O(n) in total. For larger sub-arrays the recurrence is really T(n) = 2T(n/2) + O(n), so the overall complexity simplifies to O(n log n), the complexity of merge sort.
Here's a different perspective that might help give some visibility into what's happening.
Let's suppose that once the array size reaches k, you switch from merge sort to insertion sort. We want to work out the time complexity of this new approach. To do so, we'll imagine the "difference" between the old algorithm and the new algorithm. Specifically, if we didn't make any changes to the algorithm, merge sort would take time Θ(n log n) to complete. However, once we get to arrays of size k, we stop running mergesort and instead use insertion sort. Therefore, we'll make some observations:
There are Θ(n / k) subarrays of the original array of size k.
We are skipping calling mergesort on all these arrays. Therefore, we're avoiding doing Θ(k log k) work for each of Θ(n / k) subarrays, so we're avoiding doing Θ(n log k) work.
Instead, we're insertion-sorting each of those subarrays. Insertion sort, in the worst case, takes time O(k^2) when run on an array of size k. There are Θ(n / k) of those arrays, so we're adding O(nk) total work.
Overall, this means that the work we're doing in this new variant is O(n log n) - O(n log k) + O(nk). Dialing k up or down will change the total amount of work done. If k is a fixed constant (that is, k = O(1)), this simplifies to
O(n log n) - O(n log k) + O(nk)
= O(n log n) - O(n) + O(n)
= O(n log n)
and the asymptotic runtime is the same as that of regular merge sort.
It's worth noting that as k gets larger, the O(nk) term eventually dominates the O(n log k) term, so there's some crossover point beyond which increasing k starts increasing the runtime. You'd have to experiment to fine-tune when to make the switch, but empirically, setting k to some modest value does give a big performance boost.
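To explore the crossover experimentally, one option is to count comparisons for several values of k. The sketch below is a self-contained hybrid (the name hybrid_sort and the counter argument are illustrative, not the asker's exact code); it returns a new sorted list rather than sorting in place. Comparison counts ignore constant-factor effects such as call overhead and cache behavior, which are much of why a small k helps in practice, so wall-clock timing is still worth doing.

```python
import random


def hybrid_sort(a, k, counter):
    """Merge sort that switches to insertion sort on subarrays of size <= k.

    counter[0] accumulates element comparisons so different k can be compared.
    """
    if len(a) <= max(k, 1):  # max() guards against infinite recursion when k == 0
        # Insertion sort: O(k^2) comparisons in the worst case.
        a = list(a)
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0:
                counter[0] += 1
                if a[j] <= x:
                    break
                a[j + 1] = a[j]  # shift the larger element right
                j -= 1
            a[j + 1] = x
        return a
    mid = len(a) // 2
    left = hybrid_sort(a[:mid], k, counter)
    right = hybrid_sort(a[mid:], k, counter)
    # Standard two-way merge: at most len(a) - 1 comparisons.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


random.seed(1)
data = [random.random() for _ in range(10_000)]
for k in (1, 8, 32, 256):
    counter = [0]
    assert hybrid_sort(data, k, counter) == sorted(data)
    print(f"k={k}: {counter[0]} comparisons")
```

For large k the count should grow roughly in proportion to k, reflecting the O(nk) term; k = 1 corresponds to plain merge sort.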

What would be the running time of an algorithm that combines mergeSort and heapsort?

I have been given this problem that asks to compute the worst case running time of an algorithm that's exactly like mergeSort, but one of the two recursive calls is substituted by Heapsort.
So, I know that dividing in mergesort takes constant time and that merging is O(n). Heapsort takes O(nlogn).
This is what I came up with: T(n) = 2T(n/2) + O((n/2) log n) + O(n).
I have some doubts about the O((n/2) log n) part. Is it n or n/2? I wrote n/2 because I'm doing heapsort on only half of the array, but I'm not sure that's correct.
The question asks about running time, but should it be asking about time complexity?
Since recursion is mentioned, this is a question about top down merge sort (as opposed to bottom up merge sort).
With the code written as described, since heap sort is not recursive, recursion only occurs on one of each of the split sub-arrays. Heap sort will be called to sort sub-arrays of size n/2, n/4, n/8, n/16, ... , and no merging takes place until two sub-arrays of size 1 are the result of the recursive splitting. In the simple case where array size is a power of 2, then "merge sort" is only used for a single element, the rest of the sub-arrays of size {1, 2, 4, 8, ..., n/8, n/4, n/2} are sorted by heap sort and then merged.
Since heap sort is slower than merge sort, the running time will be longer, but the time complexity remains O(n log(n)), since constant factors and lower-order terms are ignored in time complexity.
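A minimal sketch of the algorithm as described, assuming the left half is the one sorted recursively and the right half is the one handed to heapsort (the question does not say which). Heapsort here is built on Python's heapq module, and all function names are illustrative:

```python
import heapq


def heapsort(a):
    """Heapsort via heapq: O(n) heapify plus n pops of O(log n) each."""
    heap = list(a)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def merge(left, right):
    """Standard linear-time merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def merge_heap_sort(a):
    """Merge sort with the second recursive call replaced by heapsort.

    Recurrence: T(n) = T(n/2) + O(n log n) [heapsort] + O(n) [merge].
    """
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_heap_sort(a[:mid])  # the one remaining recursive call
    right = heapsort(a[mid:])        # replaces the second recursive call
    return merge(left, right)


print(merge_heap_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```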
Let’s work out what the recurrence relation should be in this case. Here, we’re
splitting the array in half,
recursively sorting one half (T(n / 2)),
heapsorting one half (O(n log n)), and then
merging the two halves together (O(n)).
That gives us this recurrence relation:
T(n) = T(n / 2) + O(n log n).
Why is this O(n log n) and not, say, O((n / 2) log (n / 2))? The reason is that big-O notation munches up constant factors, so O(n log n) expresses the same asymptotic growth rate as O((n / 2) log (n / 2)). And why isn’t there a coefficient of 2 on the T(n / 2)? It’s because we’re only making one recursive call; remember that the other call was replaced by heapsort.
All that’s left to do now is to solve this recurrence. It does indeed work out to O(n log n), and I’ll leave it to you to decide how you want to show this. The iteration method is a great option here.
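One way to carry out the iteration method, assuming n is a power of 2 and that c is a constant such that the non-recursive work (heapsort plus merge) at size n is at most c n log n; each step uses log(n/2^i) <= log n:

```latex
\begin{aligned}
T(n) &\le c\,n\log n + T(n/2) \\
     &\le c\,n\log n + c\,\tfrac{n}{2}\log\tfrac{n}{2} + c\,\tfrac{n}{4}\log\tfrac{n}{4} + \dots + T(1) \\
     &\le c\,n\log n\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \dots\right) + T(1) \\
     &\le 2c\,n\log n + T(1) = O(n\log n).
\end{aligned}
```

Because the top-level heapsort alone already costs on the order of n log n in the worst case, the bound is tight, so T(n) = Θ(n log n).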

Best sorting algorithm for a partly sorted sequence?

I have to answer the following question:
What sorting algorithm is recommended if the first n-m part is already sorted and the remaining part m is unsorted? Are there any algorithms that take O(n log m) comparisons? What about O(m log n) comparisons?
I just can't find the solution.
My first idea was insertion sort, because it is O(n) for an almost sorted sequence. But since we don't know the size of m, the runtime is very likely to be O(n^2) even though the sequence is already half sorted, isn't it?
Then I thought perhaps it's quicksort, because it takes (Sum from k=1 to n) Cavg (1-m) + Cavg (n-m) comparisons. But after ignoring the n-m part of the sequence, the remaining sequence is 1-m in quicksort and not m.
Merge Sort and heap sort should have a runtime of O(m log m) for the remaining sequence m I would say.
Does anyone have an idea or can give me some advice?
Greetings
Have you tried sorting the remaining part m separately in O(m log(m)) time (with any algorithm you like: MergeSort, HeapSort, QuickSort, ...) and then merging it with the sorted part as MergeSort does? You won't even need to fully implement MergeSort, just a single pass of its inner merge loop to combine two sorted sequences.
That would result in O(m*log(m) + n + m) = O(m*log(m) + n) complexity. I don't believe a better asymptotic complexity is possible on a single-core CPU, although this approach requires O(n+m) additional memory for the merged result array.
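A minimal sketch of that approach, assuming 0 <= m <= len(a) (sort_partly_sorted is a hypothetical name). It uses Python's built-in sorted() for the unsorted tail and the standard library's heapq.merge as a stand-in for the single merge pass:

```python
import heapq


def sort_partly_sorted(a, m):
    """Sort a list whose first len(a) - m elements are already sorted.

    Sorting the unsorted tail costs O(m log m); one linear merge pass
    costs O(n + m), for O(m log m + n) overall with O(n + m) extra memory.
    """
    split = len(a) - m
    prefix = a[:split]        # already sorted, left as-is
    tail = sorted(a[split:])  # any O(m log m) sort works here
    return list(heapq.merge(prefix, tail))


print(sort_partly_sorted([1, 4, 6, 9, 5, 2, 8], 3))  # [1, 2, 4, 5, 6, 8, 9]
```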
Which sort algorithm works best on mostly sorted data?
Sounds like insertion sort and bubble sort are good choices. You are free to implement as many as you want and then test which is faster or needs fewer operations by supplying them partially sorted data.
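One way to see why insertion sort does well on mostly sorted data is to count the element shifts it performs. In the sketch below (the function name and return value are illustrative), the number of shifts equals the number of inversions in the input, so the total work is O(n + inversions): close to linear when only a few elements are out of place.

```python
def insertion_sort(a):
    """Sort a in place and return the number of element shifts performed.

    Each shift removes exactly one inversion, so total work is
    O(n + number of inversions).
    """
    shifts = 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # move the larger element one slot right
            shifts += 1
            j -= 1
        a[j + 1] = x
    return shifts


nearly = [1, 2, 3, 5, 4, 6]
print(insertion_sort(nearly), nearly)  # 1 [1, 2, 3, 4, 5, 6]
```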

Radix Sort & O(N log N) Efficiency

I have been learning about Radix sort recently and one of the sources I have used is the Wikipedia page. At the moment there is the following paragraph there regarding the efficiency of the algorithm:
The topic of the efficiency of radix sort compared to other sorting
algorithms is somewhat tricky and subject to quite a lot of
misunderstandings. Whether radix sort is equally efficient, less
efficient or more efficient than the best comparison-based algorithms
depends on the details of the assumptions made. Radix sort complexity
is O(wn) for n keys which are integers of word size w. Sometimes w is
presented as a constant, which would make radix sort better (for
sufficiently large n) than the best comparison-based sorting
algorithms, which all perform O(n log n) comparisons to sort n keys.
However, in general w cannot be considered a constant: if all n
keys are distinct, then w has to be at least log n for a random-access
machine to be able to store them in memory, which gives at best a time
complexity O(n log n). That would seem to make radix sort at most
equally efficient as the best comparison-based sorts (and worse if
keys are much longer than log n).
The part in bold has regrettably become a bit of a block that I am unable to get past. I understand that in general Radix sort is O(wn), and through other sources have seen how O(n) can be achieved, but cannot quite understand why n distinct keys requires O(n log n) time for storage in a random-access machine. I'm fairly certain it comes down to some simple mathematics, but unfortunately a solid understanding remains just beyond my grasp.
My closest attempt is as follows:
Given a base 'B' and a number 'N' in that base, the maximum number of digits 'N' can have is:
floor(log_B(N)) + 1.
If each number in a given list, L, is unique, then we have up to:
L * (floor(log_B(N)) + 1) possibilities
At which point I'm unsure how to progress.
Is anyone able to please expand on the above section in bold and break down why n distinct keys requires a minimum of log n for random-access storage?
Assuming MSB radix sort with constant m bins:
For an arbitrarily large data type which must accommodate at least n distinct values, the number of bits required is N = ceiling(log2(n))
Thus the amount of memory required to store each value is also O(log n); assuming sequential memory access, the time complexity of reading / writing a value is O(N) = O(log n), although we can move pointers to the values instead
The number of digits is O(N / log2(m)) = O(log n)
Importantly, m must be a power of 2, so that each digit is a fixed group of bits that can be extracted with shifts and masks; assume m is small enough for the HW platform, e.g. 4-bit digits = 16 bins
During sorting:
For each radix pass, of which there are O(log n):
Count each bucket: get the value of the current digit using bit operations, which is O(1) for each of the n values. Note that each counter must also be N bits, although increments by 1 are (amortized) O(1). If we had used non-power-of-2 digits, this would in general be O(log n log log n) (source)
Make the bucket count array cumulative: must perform m - 1 additions, each of which is O(N) = O(log n) (unlike the increment special case)
Write the output array: loop through n values, determine the bin again, and write the pointer with the correct offset
Thus the total complexity is O(log n) * [ n * O(1) + m * O(log n) + n * O(1) ] = O(n log n).
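The three passes above (count, make cumulative, write) can be sketched as follows for non-negative integers with power-of-2 digits. Note this sketch swaps in LSD (least significant digit first) order for brevity rather than the MSB order described in the answer, and the function name and digit_bits parameter are illustrative:

```python
def radix_sort(keys, digit_bits=4):
    """LSD radix sort for non-negative ints using power-of-2 digits.

    With m = 2**digit_bits bins, each digit is a shift plus a mask. The
    number of passes is ceil(w / digit_bits), where w is the bit width of
    the largest key; with n distinct keys, w >= log2(n).
    """
    if not keys:
        return []
    m = 1 << digit_bits
    mask = m - 1
    w = max(keys).bit_length()
    out = list(keys)
    for shift in range(0, w, digit_bits):
        # 1. Count how many keys fall into each bin for this digit.
        counts = [0] * m
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Make the counts cumulative (m - 1 additions).
        for b in range(1, m):
            counts[b] += counts[b - 1]
        # 3. Write each key to its bin, scanning backwards to stay stable.
        result = [0] * len(out)
        for k in reversed(out):
            b = (k >> shift) & mask
            counts[b] -= 1
            result[counts[b]] = k
        out = result
    return out


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

For n distinct non-negative keys, the largest key is at least n - 1, so w is at least ceil(log2(n)) and there are at least ceil(log2(n) / digit_bits) passes of O(n + m) work each.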

How to calculate order (big O) for more complex algorithms (eg quicksort)

I know there are quite a bunch of questions about big O notation, I have already checked:
Plain english explanation of Big O
Big O, how do you calculate/approximate it?
Big O Notation Homework--Code Fragment Algorithm Analysis?
to name a few.
I know by "intuition" how to calculate it for n, n^2, n! and so on, but I am completely lost on how to calculate it for algorithms that are log n, n log n, n log log n and so on.
What I mean is, I know that Quick Sort is n log n (on average), but why? Same thing for merge sort, comb sort, etc.
Could anybody explain to me, in a not-too-mathy way, how to calculate this?
The main reason is that I'm about to have a big interview and I'm pretty sure they'll ask about this kind of stuff. I have researched for a few days now, and everybody seems to have either an explanation of why bubble sort is n^2 or an explanation on Wikipedia that I find unreadable.
The logarithm is the inverse operation of exponentiation. An example of exponentiation is when you double the number of items at each step. Thus, a logarithmic algorithm often halves the number of items at each step. For example, binary search falls into this category.
Many algorithms require a logarithmic number of big steps, but each big step requires O(n) units of work. Mergesort falls into this category.
Usually you can identify these kinds of problems by visualizing them as a balanced binary tree. For example, here's merge sort:
6 2 0 4 1 3 7 5
2 6 0 4 1 3 5 7
0 2 4 6 1 3 5 7
0 1 2 3 4 5 6 7
At the top is the input, as leaves of the tree. The algorithm creates a new node by sorting the two nodes above it. We know the height of a balanced binary tree is O(log n) so there are O(log n) big steps. However, creating each new row takes O(n) work. O(log n) big steps of O(n) work each means that mergesort is O(n log n) overall.
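The tree above can be reproduced with a bottom-up merge sort that records each row. In this sketch (the names merge and bottom_up_rows are illustrative), every row after the input costs O(n) to build, and there are log2(n) of them:

```python
def merge(left, right):
    """Linear-time merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def bottom_up_rows(a):
    """Bottom-up merge sort that records every level of the merge tree."""
    rows = [list(a)]
    width = 1
    while width < len(a):
        prev = rows[-1]
        row = []
        # Merge neighbouring runs of length `width`: O(n) work per row.
        for lo in range(0, len(a), 2 * width):
            row += merge(prev[lo:lo + width], prev[lo + width:lo + 2 * width])
        rows.append(row)
        width *= 2
    return rows


for row in bottom_up_rows([6, 2, 0, 4, 1, 3, 7, 5]):
    print(*row)
# 6 2 0 4 1 3 7 5
# 2 6 0 4 1 3 5 7
# 0 2 4 6 1 3 5 7
# 0 1 2 3 4 5 6 7
```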
Generally, O(log n) algorithms look like the function below. They get to discard half of the data at each step.
    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        if some_condition():
            return function(data[:n // 2], n // 2)  # Recurse on first half of data
        else:
            return function(data[n // 2:], n - n // 2)  # Recurse on second half of data
While O(n log n) algorithms look like the function below. They also split the data in half, but they need to consider both halves.
    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        part1 = function(data[:n // 2], n // 2)  # Recurse on first half of data
        part2 = function(data[n // 2:], n - n // 2)  # Recurse on second half of data
        return combine(part1, part2)
Where do_simple_case() takes O(1) time and combine() takes no more than O(n) time.
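Two concrete instances of those templates, as a sketch (contains, merge_sort and combine are illustrative names). The binary search passes index bounds instead of slicing, because copying a slice at every step would itself cost O(n) and hide the O(log n) behavior; the merge sort can afford slicing, since each level already does O(n) work.

```python
def contains(data, target, lo=0, hi=None):
    """O(log n) instance of the first template: binary search on sorted data.

    Only one half [lo, mid) or [mid, hi) is recursed into at each step.
    """
    if hi is None:
        hi = len(data)
    n = hi - lo
    if n <= 1:  # simple case: O(1)
        return n == 1 and data[lo] == target
    mid = lo + n // 2
    if target < data[mid]:  # plays the role of some_condition()
        return contains(data, target, lo, mid)  # first half
    return contains(data, target, mid, hi)      # second half


def combine(a, b):
    """O(n) combine step: merge two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def merge_sort(data):
    """O(n log n) instance of the second template: recurse on both halves."""
    n = len(data)
    if n <= 1:
        return list(data)
    part1 = merge_sort(data[:n // 2])
    part2 = merge_sort(data[n // 2:])
    return combine(part1, part2)


print(contains([1, 3, 5, 7], 5), merge_sort([5, 2, 9, 1]))  # True [1, 2, 5, 9]
```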
The algorithms don't need to split the data exactly in half. They could split it into one-third and two-thirds, and that would be fine. For average-case performance, splitting it in half on average is sufficient (like QuickSort). As long as the recursion is done on pieces of size (n/something) and (n - n/something), it's okay. If it's breaking it into pieces of size k and n-k for a constant k, then the height of the tree will be O(n) and not O(log n).
You can usually claim log n for algorithms where it halves the space/time each time it runs. A good example of this is any binary algorithm (e.g., binary search). You pick either left or right, which then axes the space you're searching in half. The pattern of repeatedly doing half is log n.
For some algorithms, getting a tight bound for the running time through intuition is close to impossible (I don't think I'll ever be able to intuit a O(n log log n) running time, for instance, and I doubt anyone will ever expect you to). If you can get your hands on the CLRS Introduction to Algorithms text, you'll find a pretty thorough treatment of asymptotic notation which is appropriately rigorous without being completely opaque.
If the algorithm is recursive, one simple way to derive a bound is to write out a recurrence and then set out to solve it, either iteratively or using the Master Theorem or some other way. For instance, if you're not looking to be super rigorous about it, the easiest way to get QuickSort's running time is through the Master Theorem -- QuickSort entails partitioning the array into two relatively equal subarrays (it should be fairly intuitive to see that this is O(n)), and then calling QuickSort recursively on those two subarrays. Then if we let T(n) denote the running time, we have T(n) = 2T(n/2) + O(n), which by the Master Method is O(n log n).
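A quick way to sanity-check such a recurrence is to evaluate it directly. This sketch assumes the partition cost is exactly n, both subarrays have exactly half the size, n is a power of 2, and T(1) = 0 (the idealized balanced case, not QuickSort's worst case, which is O(n^2)):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """Cost model for T(n) = 2T(n/2) + n with T(1) = 0 (n a power of 2)."""
    if n <= 1:
        return 0
    return 2 * T(n // 2) + n


for p in range(1, 6):
    n = 2 ** p
    print(n, T(n), n * p)  # the last two columns are equal: T(n) = n log2(n)
```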
Check out the "phone book" example given here: What is a plain English explanation of "Big O" notation?
Remember that Big-O is all about scale: how many more operations will this algorithm require as the data set grows?
O(log n) generally means you can cut the dataset in half with each iteration (e.g. binary search)
O(n log n) means you're performing an O(log n) operation for each item in your dataset
O(n log log n) is a legitimate complexity class and does not simplify to O(n log n): log log n grows much more slowly than log n, so O(n log log n) sits strictly between O(n) and O(n log n).
I'll attempt to do an intuitive analysis of why Mergesort is n log n and if you can give me an example of an n log log n algorithm, I can work through it as well.
Mergesort is a sorting algorithm that works by splitting a list of elements repeatedly until only single elements remain and then merging these lists together. The primary operation in each merge is comparison, and each merge requires at most n comparisons, where n is the combined length of the two lists. From this you can derive the recurrence and easily solve it, but we'll avoid that method.
Instead, consider how Mergesort behaves: we take a list and split it, then split those halves again, until we have n partitions of length 1. I hope it's easy to see that this recursion only goes log(n) deep before we have split the list into our n partitions.
Each of these n partitions then needs to be merged, and once those are merged, the next level needs to be merged, and so on until we have a list of length n again. Refer to Wikipedia's graphic for a simple example of this process: http://en.wikipedia.org/wiki/File:Merge_sort_algorithm_diagram.svg
Now consider how long this process takes: we have log(n) levels, and at each level we have to merge all of the lists. As it turns out, each level takes n time to merge, because we merge a total of n elements each time. From there it's easy to see that mergesort takes n log(n) time to sort an array, if you take the comparison operation to be the most important operation.
If anything is unclear or I skipped somewhere please let me know and I can try to be more verbose.
Edit Second Explanation:
Let me think if I can explain this better.
The problem is broken into a bunch of smaller lists and then the smaller lists are sorted and merged until you return to the original list which is now sorted.
When you break up the problem, you get lists of different sizes at different levels: first two lists of size n/2, n/2; at the next level four lists of size n/4, n/4, n/4, n/4; at the next level eight lists of size n/8; and this continues until n/2^k is equal to 1. (Each subdivision is the length divided by a power of 2; not all lengths divide evenly, so it won't be quite this pretty.) This is repeated division by two, and it can continue at most log_2(n) times, because 2^(log_2(n)) = n, so any further division by 2 would yield lists of less than one element.
Now the important thing to note is that at every level we have n elements so for each level the merge will take n time, because merge is a linear operation. If there are log(n) levels of the recursion then we will perform this linear operation log(n) times, therefore our running time will be n log(n).
Sorry if that isn't helpful either.
When applying a divide-and-conquer algorithm where you partition the problem into sub-problems until it is so simple that it is trivial, if the partitioning goes well, the size of each sub-problem is n/2 or thereabout. This is often the origin of the log(n) that crops up in big-O complexity: O(log(n)) is the number of recursive calls needed when partitioning goes well.
