Optimizing Mergesort - algorithm

Merge sort is a fairly common sorting algorithm, and I have written a working implementation. Now I want to optimize it. The first step was to convert it from a recursive version to an iterative one, which I did. After that I couldn't see what else could be optimized. After poring over lots of articles on the internet, I found two techniques: multi-merge sort and tiled merge sort. However, none of the documents provided any pseudo-code, or explained much about how to implement them, or why they offer the advantages their authors claim, such as being cache-friendly and improving locality of reference.
Can anyone elaborate on this, and if possible provide some pseudo-code? Specifically, I want to know how to make merge sort cache-friendly. I have absolutely no idea what these techniques are, otherwise I would have tried them myself.

One common and relatively straightforward optimization you can make is to switch from mergesort to another algorithm like insertion sort when the subarray sizes get below a certain threshold. Although mergesort runs in time O(n log n), that talks about its long-term growth rate and doesn't say anything about how well the algorithm will perform on small inputs. Insertion sort, for example, runs pretty fast on small input sizes even though it's worse in the long run. Consequently, consider changing the base case of your mergesort so that if the array to sort is below a certain size threshold (say, 50-100), you use insertion sort rather than continuing onward in the recursion. From experience, this can markedly improve the performance of the algorithm.
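Here is a minimal sketch of that hybrid in Python, assuming a top-down recursive merge sort; the threshold of 64 and the helper names are illustrative choices, not fixed values:

INSERTION_SORT_THRESHOLD = 64  # illustrative; tune empirically

def insertion_sort(a, lo, hi):
    # Sort a[lo:hi] in place by inserting each element into the
    # already-sorted prefix to its left.
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def merge(a, lo, mid, hi):
    # Merge the sorted halves a[lo:mid] and a[mid:hi] back into a[lo:hi].
    merged = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(a[j]); j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged

def hybrid_merge_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    if hi - lo <= INSERTION_SORT_THRESHOLD:
        insertion_sort(a, lo, hi)      # small subarray: insertion sort wins
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    merge(a, lo, mid, hi)

data = [5, 2, 9, 1, 7] * 40
hybrid_merge_sort(data)
assert data == sorted(data)

The best threshold is usually found by benchmarking; values somewhere in the 16-100 range are typical.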

Related

How do I calculate the running time for a hybrid algorithm?

I have a project to program a hybrid algorithm and calculate its running time.
I programmed the hybrid algorithm, which combines insertion sort and merge sort: after the user enters the unsorted array, the program calls the more appropriate algorithm (either insertion sort or merge sort) based on a threshold I specified. My question is: how do I calculate the running time for this hybrid algorithm?
Because the way I see it, the program can only apply one algorithm at a time. Has this been done before? Please tell me if you know a name for this so I can search for it.
(P.S. What I have is the most basic form of insertion sort and merge sort, where the merge sort splits into exactly two halves, so they have the typical time complexity.)
Any help will be appreciated.
Introsort does something very similar. If you craft your threshold carefully, you are likely to end up with O(N log N) run-time complexity, which is the best you can achieve for comparison-based sorting.
In theory, insertion sort has the better best-case complexity of O(N), so you might be able to reach that, too, for a best-case scenario. However, if your algorithm is a simple
if input_size < threshold:
    insertion_sort(data)
else:
    merge_sort(data)
then you wouldn't achieve this, since for large inputs, you would always use merge-sort.
Usually, this whole thing is done to achieve a practical, constant-factor speed-up on a real computer, by using algorithms that the actual hardware handles faster in some cases. If we're just looking at this from a time-complexity point of view, it wouldn't make much sense to switch from merge sort to insertion sort, except for the case where the input data is already (mostly) sorted. Also note that complexity theory does not consider absolute run times, but rather the relationship between run times and input size.
How much you actually increase the performance (and whether you increase it at all) is very hard to say, and depends on a lot of factors. Usually, you would want to properly benchmark your algorithm and simply measure how much faster it is.
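A rough sketch of what such a measurement might look like, using Python's timeit module; my_hybrid_sort is a hypothetical stand-in for whichever implementation you want to compare against a baseline:

import random
import timeit

def my_hybrid_sort(a):
    # Placeholder for the hybrid implementation being measured.
    return sorted(a)

for n in (100, 10_000, 1_000_000):
    data = [random.random() for _ in range(n)]
    for name, fn in (("built-in sorted", sorted), ("hybrid", my_hybrid_sort)):
        # Copy the input on every call so each run sorts unsorted data.
        t = timeit.timeit(lambda: fn(list(data)), number=5)
        print(f"n={n:>9}  {name:>15}: {t / 5:.4f} s per call")

Run it over several input sizes and distributions (random, sorted, reversed, many duplicates), since a threshold that wins on one distribution may lose on another.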

Algorithmic complexity vs real life situations?

My question is about the theory-versus-practice thing.
Let's say, for example, that I want to sort a list of numbers. Mergesort has a complexity of O(n*log n) while bubblesort has a complexity of O(n^2).
This means that mergesort is quicker. But the complexity doesn't take into account everything that happens on a computer. What I mean by that is that mergesort, for example, is a divide-and-conquer algorithm and it needs more space than bubblesort.
So isn't it possible that the creation of this additional space and the use of extra resources (time to transfer the data, to fetch the code instructions, etc.) takes more time than bubblesort, which doesn't use any additional space?
Wouldn't it be possible for an algorithm with worse ("bigger") complexity to be more efficient than another for certain input lengths (maybe small ones)?
The answer is a clear yes.
A classic example is that insertion sort is O(n^2). However, efficient sorting implementations often switch to insertion sort when something like 100 elements are left, because insertion sort makes really good use of the cache and avoids pipeline stalls in the CPU. No, insertion sort won't scale, but on those small subarrays it outperforms.
The way that I put it is that scalability is like a Mack Truck. You want it for a big load, but it might not be the best thing to take for a shopping trip at the local grocery store.
Algorithmic complexity only tells you how two algorithms will compare as their input grows larger, i.e. approaches infinity. It tells you nothing about how they will compare on smaller inputs. The only way to know that for sure is to benchmark on data and equipment that represents a typical situation.

When should one implement a simple or advanced sorting algorithm?

Apart from the obvious "it's faster when there are many elements", when is it more appropriate to use a simple sorting algorithm (O(N^2)) compared to an advanced one (O(N log N))?
I've read quite a bit about, for example, insertion sort being preferred when you've got a small array that's nearly sorted, because you get the best case N. Why is it not good to use quicksort, for example, when you've got say 20 elements? Not just insertion or quicksort specifically, but when and why is a simpler algorithm useful compared to an advanced one?
EDIT: If we're working with, for example, an array, does it matter what kind of data it holds, such as objects or primitive types (Integer)?
The big-oh notation captures the runtime cost of the algorithm for large values of N. It is less effective at measuring the runtime of the algorithm for small values.
The actual transition from one algorithm to another is not a trivial thing. For large N, the effects of N really dominate. For small numbers, more complex effects become very important. For example, some algorithms have better cache coherency. Others are best when you know something about the data (like your example of insertion sort when the data is nearly sorted).
The balance also changes over time. In the past, CPU speeds and memory speeds were closer together, so cache coherency was less of an issue. In modern times, CPU speeds have generally left memory buses behind, so cache coherency matters more.
So there's no one clear cut and dry answer to when you should use one algorithm over another. The only reliable answer is to profile your code and see.
For amusement: I was looking at the dynamic disjoint forest problem a few years back. I came across a state-of-the-art paper that permitted some operations to be done in something silly like O(log log N / log^4N). They did some truly brilliant math to get there, but there was a catch. The operations were so expensive that, for my graphs of 50-100 nodes, it was far slower than the O(n log n) solution that I eventually used. The paper's solution was far more important for people operating on graphs of 500,000+ nodes.
When choosing a sorting algorithm, you have to weigh how much work it takes to implement the algorithm against its actual speed. For large N, the time spent implementing an advanced algorithm is outweighed by the decrease in time taken to sort. For small N, such as 20-100 items, the difference is minimal, so taking the simpler route is much better.
First of all, the quoted O-notation usually describes the worst-case scenario, so if the array is nearly sorted, the actual execution time of insertion sort can be near linear, which would beat quicksort, for example.
When n is small enough, we also take other factors into consideration. Algorithms such as quicksort can be slower because of all the recursive calls; the overhead of those calls can end up costing more than the simple arithmetic operations required by insertion sort. And that is without mentioning the additional stack space required by recursive algorithms.
Better than 99% of the time, you should not be implementing a sorting algorithm at all.
Instead use a standard sorting algorithm from your language's standard library. In one line of code you get to use a tested and optimized implementation which is O(n log(n)). It likely implements tricks you wouldn't have thought of.
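In Python, for instance, that one line looks like this (the record data is invented purely for illustration); CPython's built-in sort is a Timsort-derived merge sort that is stable and already tuned for runs of pre-sorted data:

records = [("carol", 35), ("alice", 30), ("bob", 30)]   # made-up sample data

records.sort(key=lambda r: (r[1], r[0]))                 # in place: by age, then name
by_age_desc = sorted(records, key=lambda r: r[1], reverse=True)   # returns a new list

print(records)
print(by_age_desc)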
For external sorts, I've used the Unix sort utility from time to time. Aside from the non-intuitive LC_ALL=C environment variable that I need to get it to behave, it is very useful.
In the rare cases where you actually do need to implement your own sorting algorithm, what you implement will be driven by your precise needs. I've had to deal with this exactly once for production code in two decades of programming. (That was because, for a complex series of reasons, I needed to sort compressed data on a machine which literally did not have enough disk space to store said data uncompressed. I used a merge sort.)

What is the fastest sorting algorithm?

There are the bubble, insertion, selection, and quick sorting algorithms.
Which one is the 'fastest' algorithm?
Code size is not important.
Bubble sort
insertion sort
quick sort
I tried to check speed. When the data is already sorted, bubble sort and insertion sort run in O(n), but those algorithms are too slow on large lists.
Is it good to use only one algorithm?
Or is it faster to use a mix of different ones?
Quicksort is generally very good, only really falling down when the data is close to being ordered already, or when the data has a lot of similarity (lots of key repeats), in which case it is slower.
If you don't know anything about your data and you don't mind risking quicksort's slow case (and if you think about it, you can probably work out whether you're ever likely to hit it, e.g. from already-ordered data), then quicksort is never going to be a BAD choice.
If you decide your data is, or will sometimes (or often enough to be a problem) be, already sorted (or significantly partially sorted), or if for one reason or another you decide you can't risk quicksort's worst case, then consider timsort.
As noted by the comments on your question though, if it's really important to have the ultimate performance, you should consider implementing several algorithms and trying them on good representative sample data.
HP / Microsoft std::sort is introsort (quick sort switching to heap sort if nesting reaches some limit), and std::stable_sort is a variation of bottom up mergesort.
For sorting an array or vector of mostly random integers, counting / radix sort would normally be fastest.
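For example, here is a counting sort sketch for small non-negative integer keys; radix sort applies the same idea one digit at a time (the function name and the key range are illustrative assumptions):

def counting_sort(keys, key_range):
    # O(n + k) for n keys drawn from range(key_range); no comparisons at all.
    counts = [0] * key_range
    for k in keys:
        counts[k] += 1                 # tally each key
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)    # emit each key as often as it occurred
    return out

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], key_range=10))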
Most external sorts are some variation of a k-way bottom up merge sort (the initial internal sort phase could use any of the algorithms mentioned above).
For sorting a small (16 or less) fixed number of elements, a sorting network could be used. This seems to be one of the lesser known algorithms. It would mostly be useful if having to repeatedly sort small sets of elements, perhaps implemented in hardware.
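As an illustration of that last point, a 4-input sorting network can be written as a fixed list of compare-exchange pairs; this small Python sketch just shows the idea, while the real benefit appears when the comparisons are unrolled in hardware or branch-free code:

def sort4(a):
    # The same five compare-exchange steps run regardless of the input values.
    for i, j in ((0, 1), (2, 3), (0, 2), (1, 3), (1, 2)):
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a

print(sort4([4, 1, 3, 2]))   # -> [1, 2, 3, 4]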

Efficiency of Sort Algorithms

I am studying up for a pretty important interview tomorrow and there is one thing that I have a great deal of trouble with: Sorting algorithms and BigO efficiencies.
What number is important to know? The best, worst, or average efficiency?
Worst, followed by average. Be aware of the real-world impact of the so-called "hidden constants" too - for instance, the classic quicksort algorithm is O(n^2) in the worst case and O(n log n) on average, whereas mergesort is O(n log n) even in the worst case, yet quicksort will usually outperform mergesort in practice.
All of them are important to know, of course. You have to understand that the benefits of one sorting algorithm in the average case can become a terrible deficit in the worst case, or that the worst case isn't that bad but the best case isn't that good, or that it only works well on unsorted data, and so on.
In short:
Sorting-algorithm efficiency varies with the input data and the task.
The maximum speed achievable for a comparison sort is n*log(n).
If the data contains already-sorted runs, the speed can be better than n*log(n).
If the data consists largely of duplicates, sorting can be done in near-linear time.
Most sorting algorithms have their uses.
Most quicksort variants also have an average case of n*log(n), but they are usually faster than other, not heavily optimized algorithms. Quicksort is fastest when it is not stable, but stable variants are only a fraction slower. Its main problem is the worst case; the usual fix is introsort.
Most mergesort variants have their best, average, and worst cases fixed at n*log(n). Mergesort is stable and relatively easy to scale up, BUT it needs auxiliary storage roughly proportional to the total number of items. Its main problem is memory; the usual fix is timsort.
Sorting algorithms also vary with the size of the input. I can make a newbie claim that, for inputs above 10 TB, there is no match for mergesort variants.
I recommend that you don't merely memorize these factoids. Learn why they are what they are. If I were interviewing you, I would make sure to ask questions that show me you understand how to analyze an algorithm, not just that you can spit back something you saw on a webpage or in a book. Additionally, the day before an interview is not the time to be doing this studying.
I wish you the best of luck!! Please report back in a comment how it went!
I just finished a set of interviews at my college...
Every algorithm has its benefits, otherwise it wouldn't exist.
So it's better to understand what is so good about the algorithm you are studying. Where does it do well? How can it be improved?
I guess you'll automatically end up reading about the various efficiency notations as you do this. Mind the worst case and pay attention to the average case; best cases are rare.
All the best for your interview.
You may also want to look into other types of sorting that can be used when certain conditions exist. For example, consider Radix sort. http://en.wikipedia.org/wiki/Radix_sort

Resources