About bubble sort vs merge sort - algorithm

This is an interview question that I recently found on the Internet:
If you are going to implement a function which takes an integer array as input and returns the maximum, would you use bubble sort or merge sort to implement this function? What if the array size is less than 1000? What if it is greater than 1000?
This is how I think about it:
First, it is really weird to use sorting to implement this function at all. You can just go through the array once and pick out the maximum.
Second, if I have to make a choice between the two, then bubble sort is better: you don't have to run the whole bubble sort procedure, only its first pass, which carries the maximum to the end. That beats merge sort in both time and space.
Are there any mistakes in my answer? Did I miss anything?

It's a trick question. If you just want the maximum (or indeed the kth value for any k, which includes finding the median), there's a perfectly good O(n) algorithm. Sorting is a waste of time. That's what they want to hear.
As you say, the algorithm for the maximum is really trivial. To ace a question like this, you should have the quickselect algorithm ready, and also be able to suggest a heap data structure in case you need to mutate the list of values and still produce the maximum rapidly.
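For reference, here's a minimal sketch of quickselect in Python (the function name, pivot strategy, and sample data are my own choices, not a prescribed implementation):

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (0-indexed) in expected O(n).

    A sketch only: a production version would pick pivots more carefully
    (e.g. median-of-medians for a worst-case O(n) guarantee).
    """
    lst = list(values)  # work on a copy
    while True:
        pivot = random.choice(lst)
        lows = [x for x in lst if x < pivot]
        pivots = [x for x in lst if x == pivot]
        if k < len(lows):
            lst = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            lst = [x for x in lst if x > pivot]

# The maximum is just the (n-1)-th smallest, though a plain linear scan
# (max(values) in Python) is the simpler O(n) answer:
data = [3, 1, 4, 1, 5, 9, 2, 6]
assert quickselect(data, len(data) - 1) == max(data) == 9
```

For the mutable case the answer mentions, Python's built-in heapq module maintains a min-heap with O(log n) updates; pushing negated values turns it into a max-heap.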

I just googled the algorithms. Bubble sort wins in both situations, thanks to the decisive advantage of only having to run through the array once: a single pass carries the largest value to the end. Merge sort cannot take any such shortcut when you only need the largest number: it always splits the list at the middle and recursively sorts and merges both halves, rather than forming pairs that knock out candidates. Every remaining element keeps participating in comparisons, and each number is compared more than once, so even the smallest values are often eliminated only after two or more comparisons. In many situations you spend two comparisons just to rule out one number. Bubble sort would dominate.

First, I agree with everything you have said, but perhaps the question is about knowing the time complexities of the algorithms and how the input size is a big factor in which will be fastest.
Bubble sort is O(n²) and merge sort is O(n log n). So on a small set it won't be that different, but on a lot of data bubble sort will be much slower.

Barring the maximum part, bubble sort is slower asymptotically, but it has a big advantage for small n in that it doesn't require the merging/creation of new arrays. In some implementations, this might make it faster in real time.

Only one pass is needed: even in the worst case, to find the maximum you just have to traverse the whole array once, so bubble sort would be better.
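To make the "first pass only" idea concrete, here's a sketch (the function name is mine): a single bubble pass does exactly n-1 comparisons, the same work as a plain linear scan, and leaves the maximum at the end.

```python
def max_via_one_bubble_pass(values):
    """Run a single bubble-sort pass; the largest element ends up last."""
    a = list(values)  # copy so the caller's order is preserved
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]  # bubble the larger value right
    return a[-1]

assert max_via_one_bubble_pass([5, 3, 8, 1]) == 8
```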

Merge sort is easy for a computer to execute, and it sorts faster than bubble sort. With merge sort, both the best case and the worst case are O(n log n). With bubble sort, the best case is O(n) and the worst case is O(n²).

Related

Insertion Sort is a good choice for "small" data sets. What is "small"?

I have seen many places that talk about how Insertion Sort is good for small data sets, but I can't find a number for what "small" is. My guess is that there is no absolute answer and that it depends on the type of machine the code is being run on.
However, what factors go into deciding what is the threshold for when Insertion Sort is a good idea? And what are some ballpark figures for "small"? 5? 10? 50? 100?
Thanks!
Site saying Insertion Sort is good for small data sets:
https://www.toptal.com/developers/sorting-algorithms/insertion-sort
Yes, your guess is right: there is no absolute answer; one has to measure where the threshold between insertion sort and other methods lies.
For example, typical cutoff values for switching to insertion sort (and actually getting some gain from it, of course) on small subarrays inside a hybrid merge sort or quicksort are about 32-100, but they can vary depending on the data and implementation details.
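As an illustration, here's roughly how such a cutoff is wired into a quicksort; the threshold of 32 is just one plausible value from the range above, and the helper names are mine:

```python
import random

INSERTION_CUTOFF = 32  # a plausible value from the 32-100 range above

def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] in place; cheap for short runs."""
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that hands small subarrays to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo + 1 <= INSERTION_CUTOFF:
            insertion_sort_range(a, lo, hi)
            return
        pivot, p = a[hi], lo  # Lomuto partition, pivot = last element
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[p] = a[p], a[i]
                p += 1
        a[p], a[hi] = a[hi], a[p]
        # Recurse into the smaller side, iterate on the larger one.
        if p - lo < hi - p:
            hybrid_quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            hybrid_quicksort(a, p + 1, hi)
            hi = p - 1

data = [random.randrange(1000) for _ in range(500)]
hybrid_quicksort(data)
assert data == sorted(data)
```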
An attempt at an answer, provided we're talking about the general sorting problem. Insertion sort is on average O(n²); efficient sorting algorithms are on average O(n log n). So, vaguely speaking, if sorting some input efficiently takes K steps, insertion sort will take on the order of K² steps (ignoring log factors).
So if n > K is too slow for your liking with an efficient sort, then n > K^0.5 will (roughly) be too slow for you with insertion sort.
Practically speaking: say you're happy to sort arrays of size 10^8 with something efficient; then you might be happy to sort arrays of size 10^4 with insertion sort, since (10^4)² = 10^8.

Sorting method for not so scrambled data

I want to sort a million numbers. I already have them in memory (let's assume they fit), and I know for a fact that it is very, very likely that any given number starts out in a position quite close to its final position after sorting (i.e. the 1000th number in the original data will very likely end up between positions 900 and 1100 after sorting).
Which sorting method would perform best in this case? And most importantly, why would it perform better than the others? Assuming memory is big enough for any common method.
Plain old insertion sort runs in time O(nk) if you run it on n elements that are all at most k spots from their final positions. Many sorting algorithms, notably introsort, use this fact by having the sorting algorithm stop sorting once the elements are close enough and then switching to insertion sort. Since insertion sort has very low constant factors hidden in the big-O, I'd suspect it would work quite well here.
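A sketch of why this works, assuming (as in the question) that each element is at most k ≈ 100 places from home: insertion sort's inner loop shifts each element at most k times, so the total work is about n·k instead of n².

```python
def insertion_sort(a):
    """In-place insertion sort: O(n*k) when every element starts
    at most k positions away from its final sorted position."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:  # at most ~k iterations per element
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

# A million numbers, each displaced by at most ~100 positions, costs
# roughly 100 * 10^6 shifts -- far from the 10^12 of the general O(n^2) case.
print(insertion_sort([2, 1, 4, 3, 6, 5]))  # -> [1, 2, 3, 4, 5, 6]
```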

Is my analysis for identifying the best sorting algorithm to solve this task correct?

This was an interview question and I am wondering if my analysis was correct:
A 'magic select' function basically produces the m-th smallest value in an array of size n. The task was to sort the m smallest elements in ascending order using an efficient algorithm. My analysis was to first use the 'magic select' function to get the m-th smallest value. I then used a partition function, with that value as the pivot, to get all smaller elements on the left. From that point, I felt that a bucket sort should accomplish the task of sorting that left part efficiently.
I was just wondering if this was the best way to sort the m smallest elements. I can see quicksort being used here too. However, I thought that avoiding a comparison-based sort could lead to O(n). Could radix sort or heap sort (O(n log n)) be used for this too? If I didn't do it in the best possible way, what would be the best way to accomplish this? An array was the data structure I was allowed to use.
Many thanks!
I'm pretty sure you can't do any better than the standard algorithms for selecting the k lowest elements out of an array in sorted order. The time complexity of your 'magic select' function is O(n), which is the same time complexity you'd get from a standard selection algorithm like the median-of-medians algorithm or quickselect.
Consequently, your approaches seem very reasonable. I doubt you can do any better asymptotically.
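For what it's worth, here's a sketch of the pipeline described in the question (all names hypothetical; quickselect stands in for the O(n) 'magic select'):

```python
import random

def quickselect(lst, k):
    """Expected O(n) k-th smallest (0-indexed), standing in for 'magic select'."""
    while True:
        pivot = random.choice(lst)
        lows = [x for x in lst if x < pivot]
        pivots = [x for x in lst if x == pivot]
        if k < len(lows):
            lst = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            lst = [x for x in lst if x > pivot]

def sort_m_smallest(arr, m):
    """Select the m-th smallest in O(n), partition in O(n), then
    comparison-sort just those m elements in O(m log m)."""
    mth = quickselect(list(arr), m - 1)
    left = [x for x in arr if x < mth]                # partition step
    left += [x for x in arr if x == mth][:m - len(left)]
    left.sort()                                       # any O(m log m) sort
    return left

print(sort_m_smallest([9, 4, 7, 1, 3, 8], 3))         # -> [1, 3, 4]
```

The final left.sort() could be swapped for a bucket or radix sort when the keys allow it, which is where a sub-O(m log m) finish might come from.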
Hope this helps!

Insertion sort better than Bubble sort?

I am doing my revision for the exam.
I would like to know under what conditions insertion sort performs better than bubble sort, given that both have the same average-case complexity of O(n²).
I did find some related articles, but I can't understand them.
Would anyone mind explaining it in a simple way?
The advantage of bubble sort is the speed of detecting an already sorted list:
Bubble sort best case scenario: O(n)
However, even in this case insertion sort achieves the same or better performance.
Bubble sort is, more or less, only good for understanding and/or teaching the mechanism of sorting algorithms; it won't find proper usage in programming these days, because its O(n²) complexity means that its efficiency decreases dramatically on lists of more than a small number of elements.
A few things come to mind (sketched in code below):
Bubble sort always takes one more pass over the array to determine that it's sorted. Insertion sort, on the other hand, doesn't need this: once the last element is inserted, the algorithm guarantees that the array is sorted.
Bubble sort does n comparisons on every pass. Insertion sort does fewer than n comparisons: once the algorithm finds the position where the current element should be inserted, it stops making comparisons and moves on to the next element.
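To see both effects, here's a quick comparison-counting sketch (function names are mine): bubble sort pays for full-length passes plus a final verification pass, while insertion sort stops each inner loop as soon as the insertion point is found.

```python
import random

def bubble_compares(a):
    """Comparisons made by bubble sort (with the early-exit flag)."""
    a, n, count, swapped = list(a), len(a), 0, True
    while swapped:
        swapped = False
        for i in range(n - 1):
            count += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        n -= 1  # the tail is already in place after each pass
    return count

def insertion_compares(a):
    """Comparisons made by insertion sort."""
    a, count = list(a), 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1
            if a[j] <= key:  # insertion point found: stop comparing
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return count

data = [random.random() for _ in range(1000)]
print(bubble_compares(data), insertion_compares(data))
# On random data, bubble sort typically reports roughly twice as many.
```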
Finally, a quote from the Wikipedia article:
Bubble sort also interacts poorly with modern CPU hardware. It requires at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions. Experiments by Astrachan sorting strings in Java show bubble sort to be roughly 5 times slower than insertion sort and 40% slower than selection sort.
You can find a link to the original research paper there.
I guess the answer you're looking for is here:
Bubble sort may also be efficiently used on a list that is already sorted except for a very small number of elements. For example, if only one element is not in order, bubble sort will take only 2n time. If two elements are not in order, bubble sort will take at most 3n time...
and
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and often is used as part of more sophisticated algorithms.
Could you provide links to the related articles you don't understand? I'm not sure what aspects they might be addressing. Other than that, there is a theoretical difference: bubble sort is more suited to collections represented as arrays (than to those represented as linked lists), while insertion sort is suited to linked lists.
The reasoning would be that bubble sort always swaps two adjacent items at a time, which is trivial on both an array and a linked list (more efficient on arrays), while insertion sort inserts at a position in the list, which is trivial for linked lists but involves moving all subsequent elements in an array to the right.
That being said, take it with a grain of salt. First of all, sorting arrays is, in practice, almost always faster than sorting linked lists, if only because a single scan of a linked list already shows an enormous difference. Apart from that, moving n elements of an array to the right is much faster than performing n (or even n/2) swaps. This is why other answers correctly claim insertion sort to be superior in general, and why I really wonder about the articles you read: I can't think of a simple way to say this one is better in cases A and that one in cases B.
In the worst case both perform at O(n²).
In the best case scenario, i.e. when the array is already sorted, bubble sort can perform at O(n) (as can insertion sort).

Determining the order of a list of numbers (possibly without sorting)

I have an array of unique integers (e.g. val[i]), in arbitrary order, and I would like to populate another array (ord[i]) with the sorted indexes of the integers. In other words, val[ord[i]] is in sorted order for increasing i.
Right now, I just fill ord with 0, ..., N, then sort it based on the value array, but I am wondering if we can be more efficient about it, since ord is not populated to begin with. This is more a question out of curiosity; I don't really care about the extra overhead of having to prepopulate a list and then sort it (it's small, and I use insertion sort). This may be a silly question with an obvious answer, but I couldn't find anything online.
In terms of time complexity, there can't be a faster method than sorting. If there was, then you could use it to sort faster: Generate the indices and then use them to reorder the original array to be in sorted order. This reordering would take linear time, so overall you would have a faster sort algorithm, creating a contradiction.
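For concreteness, the approach from the question is essentially an argsort, which in Python is a one-liner (ord_ is used here to avoid shadowing the built-in ord):

```python
vals = [30, 10, 20]

# Fill ord with 0..N-1, then sort it using the values as the key.
ord_ = sorted(range(len(vals)), key=lambda i: vals[i])
assert [vals[i] for i in ord_] == sorted(vals)  # val[ord[i]] is ascending

# The answer's argument: given ord_, reordering vals is a linear-time pass,
# so any faster way to build ord_ would imply a faster general sort.
```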
Insertion sort has, IIRC, pretty low constant factors, so it works fine on small lists (it's also good for almost-sorted lists). However, are you sure your list is small enough that a sort with better worst-case complexity wouldn't be better? A non-in-place merge sort or a quicksort would seem to fit the bill well (a quicksort may well fall back to another sort for very small lists anyway, as an optimization).
Ultimately, to know which is quicker you will have to profile; big O only tells you how the cost grows as n -> infinity.
