Sorting technique - most efficient

What sorting technique would you use to sort 10,000 items using just 1000 available slots in your RAM?
Heap Sort
Quick Sort
Bubble Sort
Merge Sort
I am confused between quicksort and merge sort. Both have an average time complexity of O(n log n), but then again heap sort has the same complexity too. Any input would be appreciated!

Time complexity won't help you here - what the question is looking for is space complexity. Just as a hint, n = 10,000 and you have only 1,000 available slots, so you need to pick an algorithm with better than O(n) auxiliary space complexity, even in the worst case.

This seems like a homework question, so I'd prefer not to answer directly. In general, though, since your RAM is small and your list is big, you'll do best with something like a cache-oblivious algorithm.
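For a concrete picture of the "small RAM, big list" situation, here is a minimal external merge sort sketch in Java. It is only an illustration under assumptions that are not in the thread: the 10,000 numbers live one per line in a text file (input.txt and sorted.txt are made-up names), and BUFFER = 1000 stands in for the available slots. Chunks of at most 1,000 items are sorted in memory and written out as runs, then the runs are merged while holding only one value per run in RAM.

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    // External merge sort sketch: sort more items than fit in memory at once.
    public class ExternalMergeSort {
        static final int BUFFER = 1000; // stand-in for the 1000 available RAM slots

        // Phase 1: read 1000-item chunks, sort each in memory, write sorted "runs".
        static List<Path> makeRuns(Path input) throws IOException {
            List<Path> runs = new ArrayList<>();
            List<Integer> chunk = new ArrayList<>(BUFFER);
            try (BufferedReader in = Files.newBufferedReader(input)) {
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(Integer.parseInt(line.trim()));
                    if (chunk.size() == BUFFER) runs.add(writeRun(chunk));
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
            return runs;
        }

        static Path writeRun(List<Integer> chunk) throws IOException {
            Collections.sort(chunk); // the only full sort ever done fits in the buffer
            Path run = Files.createTempFile("run", ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(run)) {
                for (int v : chunk) { out.write(Integer.toString(v)); out.newLine(); }
            }
            chunk.clear(); // reuse the buffer for the next run
            return run;
        }

        // Phase 2: k-way merge; only one value per run is held in memory at a time.
        static void mergeRuns(List<Path> runs, Path output) throws IOException {
            PriorityQueue<int[]> heap = // entries are {value, index of source run}
                    new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
            List<BufferedReader> readers = new ArrayList<>();
            for (int i = 0; i < runs.size(); i++) {
                readers.add(Files.newBufferedReader(runs.get(i)));
                String line = readers.get(i).readLine();
                if (line != null) heap.add(new int[]{Integer.parseInt(line), i});
            }
            try (BufferedWriter out = Files.newBufferedWriter(output)) {
                while (!heap.isEmpty()) {
                    int[] top = heap.poll(); // smallest head among all runs
                    out.write(Integer.toString(top[0])); out.newLine();
                    String line = readers.get(top[1]).readLine();
                    if (line != null) heap.add(new int[]{Integer.parseInt(line), top[1]});
                }
            }
            for (BufferedReader r : readers) r.close();
        }

        public static void main(String[] args) throws IOException {
            mergeRuns(makeRuns(Paths.get("input.txt")), Paths.get("sorted.txt"));
        }
    }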

Related

Algorithmic complexity vs real life situations?

My question is about the theory vs. practice distinction.
Let's say, for example, that I want to sort a list of numbers. Merge sort has a complexity of O(n log n) while bubble sort has a complexity of O(n^2).
This means that merge sort is quicker. But the complexity doesn't take into account everything that happens on a computer. What I mean by that is that merge sort, for example, is a divide-and-conquer algorithm and it needs more space than bubble sort.
So isn't it possible that creating this additional space and using these resources (time to transfer the data, to load the code instructions, etc.) takes more time than bubble sort, which doesn't use any additional space?
Wouldn't it be possible for an algorithm with worse ("bigger") complexity to be more efficient than another for inputs of a certain length (maybe small ones)?
The answer is a clear yes.
A classic example is that insertion sort is O(n^2). However, efficient sorting implementations often switch to insertion sort when something like 100 elements are left, because insertion sort makes really good use of the cache and avoids pipeline stalls in the CPU. No, insertion sort won't scale, but below that threshold it outperforms the asymptotically faster algorithms (see the sketch below).
The way that I put it is that scalability is like a Mack Truck. You want it for a big load, but it might not be the best thing to take for a shopping trip at the local grocery store.
Algorithmic complexity only tells you how two algorithms will compare as their input grows larger, i.e. approaches infinity. It tells you nothing about how they will compare on smaller inputs. The only way to know that for sure is to benchmark on data and equipment that represents a typical situation.
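To make the switch-over trick concrete, here is a minimal hybrid sketch in Java: a plain Lomuto-partition quicksort that hands any subarray at or below a cutoff to insertion sort. The class and method names and the cutoff of 100 (echoing the figure above) are illustrative choices; in practice you would tune the cutoff by benchmarking.

    import java.util.Arrays;

    // Hybrid sort sketch: quicksort for large ranges, insertion sort below a cutoff.
    public class HybridSort {
        static final int CUTOFF = 100; // the "something like 100 elements" from above

        static void sort(int[] a, int lo, int hi) {
            if (hi - lo + 1 <= CUTOFF) { // small range: insertion sort is cache-friendly
                insertionSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);
            sort(a, lo, p - 1);
            sort(a, p + 1, hi);
        }

        static void insertionSort(int[] a, int lo, int hi) {
            for (int i = lo + 1; i <= hi; i++) {
                int key = a[i], j = i - 1;
                while (j >= lo && a[j] > key) a[j + 1] = a[j--]; // shift larger items right
                a[j + 1] = key;
            }
        }

        // Lomuto partition with the last element as pivot (simplest, not best, choice).
        static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            return i;
        }

        public static void main(String[] args) {
            int[] a = new java.util.Random(7).ints(500, 0, 1000).toArray();
            sort(a, 0, a.length - 1);
            System.out.println(Arrays.toString(Arrays.copyOf(a, 10))); // first 10, sorted
        }
    }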

Insertion Sort is a good choice for "small" data sets. What is "small"?

I have seen many places that talk about how insertion sort is good for small data sets, but I can't find a number for what "small" is. My guess is that there is no absolute answer and that it depends on the type of machine the code is being run on.
However, what factors go into deciding what is the threshold for when Insertion Sort is a good idea? And what are some ballpark figures for "small"? 5? 10? 50? 100?
Thanks!
Site saying Insertion Sort is good for small data sets:
https://www.toptal.com/developers/sorting-algorithms/insertion-sort
Yes, your guess is right - there is no absolute answer; you have to measure where the threshold between insertion sort and other methods lies.
For example, typical cutoff values for switching to insertion sort (and getting some gain from it, of course) on the small subarrays inside a hybrid merge sort or quicksort are about 32-100 (but they can vary depending on the data and implementation details).
An attempt at an answer, provided we're talking about the general sorting problem: insertion sort is on average O(n^2); efficient sorting algorithms are on average O(n log n). So, vaguely speaking (ignoring the log factor), sorting n elements with insertion sort costs roughly the square of what an efficient sort costs.
So if arrays of size n > K are too slow for your liking with an efficient sort, arrays of size n > K^0.5 will (roughly) be too slow for you with insertion sort.
Practically speaking, let's say you're happy to sort arrays of size 10^8 with something efficient; then you might be happy to sort arrays of size 10^4 with insertion sort.
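Since every answer here amounts to "measure it", here is a rough Java harness for eyeballing the crossover point on a particular machine. It is only a sketch: naive System.nanoTime() timing is distorted by JIT warm-up and garbage collection (a serious measurement would use a harness like JMH), and the sizes and repetition count are arbitrary.

    import java.util.Arrays;
    import java.util.Random;

    // Rough harness for finding the insertion-sort crossover point empirically.
    public class ThresholdBenchmark {
        static void insertionSort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int key = a[i], j = i - 1;
                while (j >= 0 && a[j] > key) a[j + 1] = a[j--];
                a[j + 1] = key;
            }
        }

        static long time(int n, boolean useInsertion, Random rnd) {
            int[] a = rnd.ints(n).toArray();
            long start = System.nanoTime();
            if (useInsertion) insertionSort(a);
            else Arrays.sort(a); // the JDK's tuned dual-pivot quicksort for primitives
            return System.nanoTime() - start;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            int reps = 10_000; // repeat and average to smooth out timing noise
            for (int n : new int[]{8, 16, 32, 64, 128, 256}) {
                long ins = 0, lib = 0;
                for (int r = 0; r < reps; r++) {
                    ins += time(n, true, rnd);
                    lib += time(n, false, rnd);
                }
                System.out.printf("n=%3d  insertion=%6d ns  Arrays.sort=%6d ns%n",
                        n, ins / reps, lib / reps);
            }
        }
    }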

Memory speed tradeoff of sorting algorithm

Consider only the bubble sort and merge sort. For bubble sort, time complexity ranges from O(n) in the best case to O(n^2) in the worst case, with O(1) space complexity. For merge sort, time complexity is O(n log n) with O(n) space complexity. Which sort would you choose if the size of the input is less than 1000, and why? What about more than 1000?
This is an interview question I had. Just want to know how you guys would answer it.
Consider only the bubble sort and merge sort.
"Less than 1000" probably means that RAM is enough for any sorting algorithm without external storage. It also implies that the theoretical bound on time complexity doesn't matter much in this case: you can pick whichever sorting algorithm you like without incurring a noticeable time penalty. For example, you could do bubble sort, since it is easy and intuitive to implement. Merge sort is just as good.
When the input size is bigger than 1000, the question is probably assuming that time complexity matters, and perhaps even that RAM may not be big enough without external storage. In that case, if you have to choose between the two, merge sort is the safe pick. This is because merge sort has better worst-case performance than bubble sort, and merge sort is a good candidate for external sorting (when the input is bigger than RAM).

Real world examples to decide which sorting algorithm works best

I am risking this question being closed before I get an answer, but I really do want to know. So here goes.
I am currently trying to learn algorithms, and I am beginning to understand them in the abstract, but I cannot relate them to anything concrete.
I understand time complexity and space complexity. I also understand some sorting algorithms based on their pseudocode.
Sorting algorithms like
Bubble Sort
Insertion Sort
Selection Sort
Quicksort
Mergesort
Heapsort (somewhat)
I am also aware of best-case and worst-case scenarios (average case not so much).
Some relevant online references:
A nice place which shows all of the above graphically.
This gave me a good understanding as well.
BUT my question is - can someone give me REAL WORLD EXAMPLES of where these sorting algorithms are used?
As the number of elements increases, you will use more sophisticated sorting algorithms. The later sorting techniques have a higher initial overhead, so you need a lot of elements to sort to justify that cost. If you only have 10 elements, a bubble or insertion sort will be much faster than a merge sort or heapsort.
Space complexity is important to consider for smaller embedded devices like a TV remote or a cell phone. You don't have enough space to do something like a merge sort, with its O(n) auxiliary space, on those devices (heapsort, by contrast, sorts in place).
Databases use an external merge sort to sort sets of data that are too large to be loaded entirely into memory. The driving factor in this sort is the reduction in the number of disk I/Os.
There is a good bubble sort discussion at the site below; there are many other factors to consider that contribute to time and space complexity.
Sorting-Algorithms.com
One example is the C++ STL sort. As the Wikipedia page says:
The GNU Standard C++ library, for example, uses a hybrid sorting algorithm: introsort is performed first, to a maximum depth given by 2×log2 n, where n is the number of elements, followed by an insertion sort on the result.[1] Whatever the implementation, the complexity should be O(n log n) comparisons on the average.[2]
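To see how those pieces fit together, here is a hedged sketch of the introsort scheme the quote describes, written in Java rather than C++ (so it illustrates the idea, not the libstdc++ source): quicksort down to a depth limit of 2×log2 n, heapsort for any range that blows past the limit, and one insertion-sort pass at the end for the small runs left behind. The names and the small-range cutoff of 16 are choices of this sketch.

    // Sketch of the introsort scheme from the quote above (not the libstdc++ code).
    public class IntroSort {
        public static void sort(int[] a) {
            // 2 x log2(n), the depth limit mentioned in the quote
            int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
            introSort(a, 0, a.length - 1, depthLimit);
            insertionSort(a); // final pass: array is almost sorted, so this is near-linear
        }

        static void introSort(int[] a, int lo, int hi, int depth) {
            if (hi - lo < 16) return; // leave small ranges to the final insertion pass
            if (depth == 0) {         // recursion too deep: quicksort is degenerating,
                heapSort(a, lo, hi);  // so fall back to heapsort's guaranteed O(n log n)
                return;
            }
            int p = partition(a, lo, hi);
            introSort(a, lo, p - 1, depth - 1);
            introSort(a, p + 1, hi, depth - 1);
        }

        static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) swap(a, i++, j);
            swap(a, i, hi);
            return i;
        }

        // In-place heapsort of the subarray a[lo..hi].
        static void heapSort(int[] a, int lo, int hi) {
            int n = hi - lo + 1;
            for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
            for (int end = n - 1; end > 0; end--) {
                swap(a, lo, lo + end);   // move the current maximum past the heap
                siftDown(a, lo, 0, end);
            }
        }

        static void siftDown(int[] a, int lo, int root, int size) {
            while (2 * root + 1 < size) {
                int child = 2 * root + 1;
                if (child + 1 < size && a[lo + child + 1] > a[lo + child]) child++;
                if (a[lo + root] >= a[lo + child]) return;
                swap(a, lo + root, lo + child);
                root = child;
            }
        }

        static void insertionSort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int key = a[i], j = i - 1;
                while (j >= 0 && a[j] > key) a[j + 1] = a[j--];
                a[j + 1] = key;
            }
        }

        static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

        public static void main(String[] args) {
            int[] a = new java.util.Random(1).ints(1000, 0, 10_000).toArray();
            sort(a);
            boolean ok = true;
            for (int i = 1; i < a.length; i++) ok &= a[i - 1] <= a[i];
            System.out.println("sorted: " + ok);
        }
    }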

Average time complexity of quicksort vs insertion sort

I'm led to believe that quicksort should be faster than insertion sort on a medium-sized unordered int array. I've implemented both algorithms in Java and I notice quicksort is significantly slower than insertion sort.
I have a theory: quicksort is slower because it's recursive, and the calls it makes to its own method are quite slow in the JVM, which is why my timer is giving much higher readings than I expected, whereas insertion sort isn't recursive and all the work is done within one method, so the JVM doesn't have to do any extra grunt work. Am I right?
You may be interested in these Sorting Algorithm Animations.
Probably not, unless your recursive methods are making big allocations. It's more likely that there's a quirk in your code or that your data set is small.
The JVM shouldn't have any trouble with recursive calls.
Unless you've hit one of Quicksort's pathological cases (often, a list that is already sorted), Quicksort should be O(n log n) — substantially faster than insertion sort's O(n^2) as n increases.
You may want to use merge sort or heap sort instead; they don't have pathological cases. They are both O(n log n).
(When I did these long ago in C++, quicksort was faster than insertion sort even with fairly small n. Radix sort is notably faster with mid-size n as well.)
Theoretically, quicksort should work faster than insertion sort for random data of medium to large size.
I guess the differences come down to how the quicksort is implemented:
pivot selection for the given data? (median-of-three is a better approach)
are you using the same swap mechanism for quicksort and insertion sort?
is the input random enough? I.e., if you have clusters of ordered data, performance will suffer.
I did this exercise in C and the results are in accordance with theory.
Actually, for small values of n, insertion sort is better than quicksort, because for small n the running time depends more on the constant factors than on the n^2 or n log n terms.
The fastest implementations of quicksort use looping instead of recursion. Recursion typically isn't very fast.
You have to be careful how you make the recursive calls, and because it's Java, you can't rely on tail calls being optimized, so you should probably manage your own stack for the recursion.
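As a concrete illustration of "manage your own stack", here is a hedged Java sketch of an iterative quicksort: an explicit ArrayDeque of {lo, hi} ranges replaces the recursion. Pushing the larger half first, so the smaller half is always handled next, bounds the stack depth at O(log n). The class name and structure are choices of this sketch.

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Deque;

    // Quicksort with an explicit stack instead of recursion, as suggested above.
    public class IterativeQuicksort {
        public static void sort(int[] a) {
            Deque<int[]> stack = new ArrayDeque<>(); // holds {lo, hi} ranges to sort
            stack.push(new int[]{0, a.length - 1});
            while (!stack.isEmpty()) {
                int[] range = stack.pop();
                int lo = range[0], hi = range[1];
                if (lo >= hi) continue; // 0- or 1-element range is already sorted
                int p = partition(a, lo, hi);
                // Push the larger half first so the smaller half is processed next;
                // this keeps the stack depth O(log n) even with unlucky pivots.
                if (p - lo > hi - p) {
                    stack.push(new int[]{lo, p - 1});
                    stack.push(new int[]{p + 1, hi});
                } else {
                    stack.push(new int[]{p + 1, hi});
                    stack.push(new int[]{lo, p - 1});
                }
            }
        }

        static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            return i;
        }

        public static void main(String[] args) {
            int[] a = {9, 4, 7, 1, 8, 2, 6};
            sort(a);
            System.out.println(Arrays.toString(a)); // [1, 2, 4, 6, 7, 8, 9]
        }
    }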
Everything that is available to be known about quicksort vs insertion sort can be found in Bob Sedgewick's doctoral dissertation. The boiled-down version can be found in his algorithms textbooks.
I remember that in school, when we did sorting in Java, we would actually do a hybrid of the two. So for recursive algorithms like quicksort and merge sort, we would actually do insertion sort for segments that were very small, say 10 records or so.
Recursion is slow, so use it with care. And as was noted before, if you can figure a way to implement the same algorithm in an iterative fashion, then do that.
There are three things to consider here. First, insertion sort is much faster (O(n) vs. O(n log n)) than quicksort IF the data set is already sorted, or nearly so. Second, if the data set is very small, the 'start-up time' to set up the quicksort, find a pivot point, and so on dominates the rest. And third, quicksort is a little subtle; you may want to re-read the code after a night's sleep.
How are you choosing your pivot in Quicksort?
This simple fact is the key to your question, and probably why Quicksort is running slower. In cases like this it's a good idea to post at least the important sections of your code if you're looking for some real help.
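For reference, here is a small Java sketch of the median-of-three pivot choice mentioned earlier in this thread. It orders three sampled elements and moves their median into the last slot so a Lomuto-style partition can use it as the pivot; an already-sorted input then no longer triggers quicksort's worst case. The names are illustrative.

    import java.util.Arrays;

    // Median-of-three pivot selection, the fix hinted at in this thread.
    public class MedianOfThree {
        // Order a[lo], a[mid], a[hi], then move the median into a[hi]
        // so a Lomuto-style partition can use it as the pivot.
        static void medianToHi(int[] a, int lo, int hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < a[lo]) swap(a, mid, lo);
            if (a[hi] < a[lo]) swap(a, hi, lo);
            if (a[hi] < a[mid]) swap(a, hi, mid); // now a[lo] <= a[mid] <= a[hi]
            swap(a, mid, hi);                     // median becomes the pivot value
        }

        static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

        public static void main(String[] args) {
            int[] a = {1, 2, 3, 4, 5, 6, 7}; // sorted input: worst case for a[hi] pivot
            medianToHi(a, 0, a.length - 1);
            System.out.println(Arrays.toString(a)); // pivot slot now holds the median, 4
        }
    }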
