Sorting algorithm problem

Here's a brain teaser that's been on my mind for a few days.
We have a sequence S of n elements. Each element is an integer in the range [0, n^2-1]. Describe a simple method for sorting S in O(n) time.
Probably something obvious that I am just missing, but I'd appreciate any insight.

Bucket Sort!
Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of radix sort in the most to least significant digit flavour. Bucket sort is a generalization of pigeonhole sort. Since bucket sort is not a comparison sort, the Ω(n log n) lower bound is inapplicable. The computational complexity estimates involve the number of buckets.

Radix Sort! (which is just a special case of bucket sort.)

Write the numbers in base n and do a bucket sort, using a counting sort for each digit (the buckets correspond to digit values in base n). Since every value is below n^2, each number has at most two base-n digits, so two stable counting-sort passes suffice.
O(n) time, O(n) space.
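A minimal sketch of that approach in Python (names are illustrative, not from the thread): since every value is below n^2, two stable counting-sort passes over the base-n digits sort the whole sequence.

    def sort_small_range(s):
        """Sort integers in [0, n^2 - 1] in O(n) time via two base-n counting-sort passes."""
        n = len(s)
        if n <= 1:
            return list(s)

        def counting_pass(items, digit):
            # Stable counting sort on one base-n digit (digit is 0 or 1).
            counts = [0] * n
            for x in items:
                counts[(x // n**digit) % n] += 1
            total = 0
            for d in range(n):                  # turn counts into start positions
                counts[d], total = total, total + counts[d]
            out = [0] * len(items)
            for x in items:
                d = (x // n**digit) % n
                out[counts[d]] = x
                counts[d] += 1
            return out

        return counting_pass(counting_pass(s, 0), 1)   # least significant digit first

    print(sort_small_range([8, 2, 15, 0, 3, 11]))      # n = 6, all values < 36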

Quicksort, the standard "good algorithm" way to sort a list, is O(n log n), so there has to be some kind of "trick" to get down to O(n) time.
The only special property of this data is that it ranges from 0 to n^2-1... but I can't think of how that could be used to sort in O(n) time....
P.S. Sounds like homework, not something you would "puzzle on for the sake of knowledge"
P.P.S. I fail for not thinking of bucket sort.

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity of log(n)?
Example: [8,2,7,5,0,1]
Sort the given array with time complexity log(n).
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to, say, sort arbitrary objects like strings in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in place. In that case, a binary search tree can let you insert in log(n), but this is only possible because we are operating on a single element. Building up the tree (i.e. performing n inserts) is n*log(n) time.
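As a small illustration of that single-element case, in Python: bisect finds the insertion point in O(log n) comparisons (note that inserting into an array still shifts elements in O(n); a balanced BST is what makes the entire insert O(log n)).

    import bisect

    def insert_into_sorted(sorted_list, x):
        pos = bisect.bisect_left(sorted_list, x)   # binary search: O(log n) comparisons
        sorted_list.insert(pos, x)                 # array shift: O(n); a BST avoids this
        return sorted_list

    print(insert_into_sorted([0, 1, 2, 5, 7, 8], 6))   # [0, 1, 2, 5, 6, 7, 8]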
As @dominicm00 also mentioned, the answer is no.
In general, an algorithm with time complexity log(N) (base 2) repeatedly divides the input list into two halves and discards one of them. A sorting algorithm has to put every element in its proper place; discarding half of the list at each iteration is incompatible with that.
The most efficient sorting algorithms run in O(n) time, but only under certain limitations. The three best-known O(n) algorithms are:
Counting sort, with time complexity O(n+k), where k is the maximum value in the given list. Assuming n >> k, you can consider its time complexity to be O(n) (a sketch follows below).
Radix sort, with time complexity O(d*(n+k)), where k is the number of possible digit values (the base) and d is the maximum number of digits in the input. Similarly, assuming n >> k and n >> d, the time complexity is O(n).
Bucket sort, with average time complexity O(n) (assuming uniformly distributed input).
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n log n) algorithms such as merge sort, quick sort, and heap sort.
There are also sorting algorithms with time complexity O(n^2), such as insertion sort, selection sort, and bubble sort, which are recommended for smaller lists.
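As a sketch of the counting sort mentioned above (using the example array from the question; k is the largest key):

    def counting_sort(values, k):
        # O(n + k): count occurrences of each key in [0, k], then replay them in order.
        counts = [0] * (k + 1)
        for v in values:
            counts[v] += 1
        out = []
        for v, c in enumerate(counts):
            out.extend([v] * c)
        return out

    print(counting_sort([8, 2, 7, 5, 0, 1], k=8))   # [0, 1, 2, 5, 7, 8]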
Using a PLA (programmable logic array) it might be possible to implement counting sort for a few elements with a small range of values:
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel computation would be able to do this; general-purpose CPUs would not, unless they implement it in silicon as part of their SIMD units.

Quick Sort vs Insertion Sort

When building a sorting algorithm to sort an array, at roughly how many elements n does quicksort become faster than insertion sort? I know that quicksort is good for larger inputs and that insertion sort is great for smaller sizes, but I was wondering around what size quicksort becomes the far better option.
These algorithms depend on more than just the size of the array to determine their run time. For quicksort, the pivot your algorithm selects can have a significant effect on runtime: if the pivot is consistently the greatest or least element, quicksort degrades to O(n^2). Insertion sort is also influenced by factors besides array size: if the elements are already in order it runs in O(n) regardless of array size, but on reverse-ordered input it takes O(n^2). Due to these factors, there is no size n for which one algorithm is guaranteed to perform better than the other. If you are concerned with the runtimes of sorting algorithms for large arrays, check out heapsort or mergesort; they are both O(n log n) in the worst case and much faster in such cases.
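In practice the two are usually combined rather than chosen between. A rough sketch of that idea, with the cutoff value purely illustrative (real libraries tune it empirically):

    CUTOFF = 16  # illustrative threshold below which insertion sort takes over

    def hybrid_sort(a, lo=0, hi=None):
        """Quicksort that falls back to insertion sort on small subarrays."""
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            insertion_sort_range(a, lo, hi)
            return
        p = partition(a, lo, hi)
        hybrid_sort(a, lo, p - 1)
        hybrid_sort(a, p + 1, hi)

    def insertion_sort_range(a, lo, hi):
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def partition(a, lo, hi):
        # Lomuto partition using the last element as the pivot.
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i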

O(nlogn) in-place sorting algorithm

This question was in the preparation exam for my midterm in introduction to computer science.
There exists an algorithm which can find the kth element in a list in O(n) time, and suppose that it is in place. Using this algorithm, write an in-place sorting algorithm that runs in worst-case time O(n*log(n)), and prove that it does. Given that this algorithm exists, why is mergesort still used?
I assume I must write some alternate form of the quicksort algorithm, which has a worst case of O(n^2), since merge sort is not an in-place algorithm. What confuses me is the given algorithm to find the kth element in a list. Isn't a simple loop iteration through the elements of an array already an O(n) algorithm?
How can the provided algorithm make any difference in the running time of the sorting algorithm if it does not change anything in the execution time? I don't see how, used with quicksort, insertion sort, or selection sort, it could lower the worst case to O(n log n). Any input is appreciated!
Check wiki, namely the "Selection by sorting" section:
Similarly, given a median-selection algorithm or general selection algorithm applied to find the median, one can use it as a pivot strategy in Quicksort, obtaining a sorting algorithm. If the selection algorithm is optimal, meaning O(n), then the resulting sorting algorithm is optimal, meaning O(n log n). The median is the best pivot for sorting, as it evenly divides the data, and thus guarantees optimal sorting, assuming the selection algorithm is optimal. A sorting analog to median of medians exists, using the pivot strategy (approximate median) in Quicksort, and similarly yields an optimal Quicksort.
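A minimal sketch of that construction, assuming an O(n) selection routine select(a, lo, hi, k) (returning the value of rank k within a[lo..hi], e.g. median of medians) is available; it is not implemented here, and with many duplicate keys a three-way partition would be needed to keep the balance guarantee.

    def quicksort_median_pivot(a, lo=0, hi=None):
        # Worst-case O(n log n): the pivot is the exact median, so each
        # partition splits the range roughly in half.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        k = (hi - lo) // 2                 # rank of the median inside a[lo..hi]
        median = select(a, lo, hi, k)      # assumed O(n) selection algorithm
        p = partition_around(a, lo, hi, median)
        quicksort_median_pivot(a, lo, p - 1)
        quicksort_median_pivot(a, p + 1, hi)

    def partition_around(a, lo, hi, pivot_value):
        # Move the pivot value to a[hi], then partition as usual.
        i = a.index(pivot_value, lo, hi + 1)
        a[i], a[hi] = a[hi], a[i]
        store = lo
        for j in range(lo, hi):
            if a[j] < pivot_value:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store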
The short answer why mergesort is preferred over quicksort in some cases is that it is stable (while quicksort is not).
Reasons for merge sort: merge sort is stable, and it does more moves but fewer compares than quick sort. If the compare overhead is greater than the move overhead, then merge sort is faster. One situation where compare overhead may be greater is sorting an array of indices or pointers to objects, like strings.
If sorting a linked list, then merge sort using an array of pointers to the first nodes of working lists is the fastest method I'm aware of. This is how HP / Microsoft std::list::sort() is implemented. In the array of pointers, array[i] is either NULL or points to a list of length pow(2,i) (except the last pointer points to a list of unlimited length).
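A rough sketch of that array-of-bins scheme, with ordinary Python lists standing in for the linked working lists (so it shows the structure, not the pointer handling): bins[i] is either None or a sorted run of 2**i elements, and each new element is carry-merged upward like binary addition.

    def merge(a, b):
        # Stable merge of two sorted runs; ties go to a, which holds earlier elements.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    def binned_merge_sort(items):
        bins = []                           # bins[i]: None or a run of 2**i elements
        for x in items:
            run, i = [x], 0
            while i < len(bins) and bins[i] is not None:
                run = merge(bins[i], run)   # carry-merge upward
                bins[i] = None
                i += 1
            if i == len(bins):
                bins.append(run)
            else:
                bins[i] = run
        result = []
        for run in bins:                    # final pass: merge all remaining runs
            if run is not None:
                result = merge(run, result)
        return result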
I found the solution:

    if (start > stop)                     2 op.
    pivot <- partition(A, start, stop)    2 op. + n
    quickSort(A, start, pivot-1)          2 op. + T(n/2)
    quickSort(A, pivot+1, stop)           2 op. + T(n/2)

T(n) = 8 + 2T(n/2) + n                       k=1
     = 8 + 2(8 + 2T(n/4) + n/2) + n
     = 24 + 4T(n/4) + 2n                     k=2
     ...
     = (2^k - 1)*8 + 2^k*T(n/2^k) + k*n

The recursion finishes when n = 2^k <=> k = log2(n); with T(1) = 2:

T(n) = (2^(log2(n)) - 1)*8 + 2^(log2(n))*2 + log2(n)*n
     = 8n - 8 + 2n + n*log2(n)
     = 10n + n*log2(n) - 8
     = n*(10 + log2(n)) - 8

which is O(n log n).
Quicksort has a worst case of O(n^2), but that only occurs if you have bad luck when choosing the pivot. If you can select the kth element in O(n), that means you can choose a good pivot with O(n) extra steps, which yields a worst-case O(n log n) algorithm. There are a couple of reasons why mergesort is still used. First, this selection algorithm is more or less cumbersome to implement in place, and it also adds several extra operations to the regular quicksort, so it is not as fast as one might expect compared to merge sort.
Nevertheless, MergeSort is not still used merely because of its worst-case time complexity; in fact HeapSort achieves the same worst-case bound and is also in place, yet it didn't replace MergeSort (though it has other disadvantages compared to quicksort). The main reason MergeSort survives is that it is the fastest stable sorting algorithm known so far. There are several applications in which it is paramount to have a stable sorting algorithm, and that is the strength of MergeSort.
A stable sort is one in which equal items preserve their original relative order. For example, this is very useful when you have two keys and you want the result ordered by the first key and, among equal first keys, by the second key.
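A small illustration of that multi-key use in Python, relying on the fact that sorted() is stable: sorting by the second key and then re-sorting by the first key leaves equal first keys ordered by the second key.

    records = [("b", 2), ("a", 3), ("a", 1), ("b", 1)]
    by_second = sorted(records, key=lambda r: r[1])
    by_first_then_second = sorted(by_second, key=lambda r: r[0])   # stable, keeps second-key order
    print(by_first_then_second)   # [('a', 1), ('a', 3), ('b', 1), ('b', 2)]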
The problem with HeapSort compared to quicksort is that it is cache-inefficient: it swaps and compares elements that are far apart in the array, while quicksort compares nearby elements, which are more likely to be in the cache at the same time.

appropriate sorting algorithm

I am a little unsure of my answer to the question below. Please help:
Suppose you are given a list of N integers. All but one of the integers are sorted in numerical order. Identify a sorting algorithm which will sort this special case in O(N) time and explain why this sorting algorithm achieves O(N) runtime in this case.
I think it is insertion sort but am not sure why that is the case.
Thanks!!
Insertion sort is adaptive and is efficient for substantially sorted data sets. It can sort almost-sorted data in O(n+d), where d is the number of inversions; in your case, with a single element out of place, d is at most n-1, so the run time is still O(n).
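A minimal insertion sort sketch: on an almost-sorted input each element moves only a short distance, so the total work stays close to linear (each inner-loop step removes exactly one inversion).

    def insertion_sort(a):
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:   # one iteration per inversion involving a[i]
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    print(insertion_sort([1, 2, 3, 9, 4, 5, 6]))   # one out-of-place element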

What is the worst case complexity for bucket sort?

I just read the Wikipedia page about bucket sort. In this article they say that the worst-case complexity is O(n²). But I thought the worst-case complexity was O(n + k), where k is the number of buckets. This is how I calculate this complexity:
Adding an element to a bucket: using a linked list this is O(1).
Going through the list and putting each element in the correct bucket: O(n).
Merging the buckets: O(k).
O(1) * O(n) + O(k) = O(n + k)
Am I missing something?
In order to merge the buckets, they first need to be sorted. Consider the pseudocode given in the Wikipedia article:
function bucketSort(array, n) is
    buckets ← new array of n empty lists
    for i = 0 to (length(array)-1) do
        insert array[i] into buckets[msbits(array[i], k)]
    for i = 0 to n - 1 do
        nextSort(buckets[i])
    return the concatenation of buckets[0], ..., buckets[n-1]
The nextSort(buckets[i]) call sorts each of the individual buckets. Generally, a different sort is used to sort the buckets (e.g. insertion sort), since once you get down in size, different, non-recursive sorts often give you better performance.
Now, consider the case where all n elements end up in the same bucket. If we use insertion sort to sort individual buckets, this could lead to the worst case performance of O(n^2). I think the answer must be dependent on the sort you choose to sort the individual buckets.
What if the algorithm decides that every element belongs in the same bucket? In that case, the linked list in that bucket needs to be traversed every time an element is added. That takes 1 step, then 2, then 3, 4, 5, ... n. Thus the time is the sum of all of the numbers from 1 to n, which is (n^2 + n)/2, i.e. O(n^2).
Of course, this is "worst case" (all the elements in one bucket) - the algorithm to calculate which bucket to place an element is generally designed to avoid this behavior.
If you can guarantee that each bucket holds only items with a single key value (all equal items), then the worst-case time complexity would be O(n + k), as you pointed out.
Bucket sort assumes that the input is drawn from a uniform distribution. This implies that only a few items fall in each bucket, which in turn leads to a nice average running time of O(n). Indeed, if the n elements are distributed so that O(1) elements fall in each bucket (insertion requires O(1) per item), then sorting a bucket using insertion sort requires, on average, O(1) as well (this is proved in almost all textbooks on algorithms). Since you must sort n buckets, the average complexity is O(n).
Now, assume that the input is not drawn from a uniform distribution. As already pointed out by @mfrankli, this may lead in the worst case to a situation in which all of the items fall into the first bucket, for example. In this case, insertion sort will require O(n^2) in the worst case.
Note that you may use the following trick to maintain the same average O(n) complexity while providing an O(n log n) complexity in the worst case: instead of using insertion sort, simply sort each bucket with an algorithm whose worst case is O(n log n), either merge sort or heap sort (but not quick sort, which achieves O(n log n) only on average).
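A sketch of that trick, assuming keys are floats in [0, 1) as in the textbook presentation: each bucket is sorted with Python's sorted() (Timsort, O(m log m) worst case), so even if every item lands in one bucket the total stays O(n log n).

    def bucket_sort(values):
        # Average O(n) for uniformly distributed input; worst case O(n log n).
        n = len(values)
        if n == 0:
            return []
        buckets = [[] for _ in range(n)]
        for v in values:
            buckets[int(v * n)].append(v)   # assumes 0 <= v < 1
        out = []
        for b in buckets:
            out.extend(sorted(b))           # worst-case O(m log m) per bucket
        return out

    print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21]))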
This is an add-on answer to @perreal; I tried to post it as a comment but it's too long. @perreal correctly points out when bucket sort makes the most sense. The different answers are making different assumptions about what data is being sorted. For example, if the keys to be sorted are strings, then the range of possible keys will be too large (larger than the bucket array), and we would have to use only the first character of the string for the bucket positions, or some other strategy. The individual buckets would then have to be sorted, because they hold items with different keys, leading to O(n^2).
But if we are sorting data where the keys are integers in a known range, then the buckets are always already sorted because the keys in the bucket are equal, which leads to the linear time sort. Not only are the buckets sorted, but the sort is stable because we can pull items out of the bucket array in the order they were added.
The thing that I wanted to add is that if you are facing O(n^2) because of the nature of the keys to be sorted, bucket sort might not be the right approach. When you have a range of possible keys that is proportional to the size of the input, then you can take advantage of the linear time bucket sort by having each bucket hold only 1 value of a key.
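A sketch of that one-key-value-per-bucket case, assuming integer keys in a known range [0, k): no per-bucket sorting is needed, and reading the buckets back in insertion order keeps the sort stable and O(n + k).

    def pigeonhole_sort(items, key, k):
        buckets = [[] for _ in range(k)]
        for item in items:
            buckets[key(item)].append(item)   # all items in a bucket share one key
        out = []
        for b in buckets:
            out.extend(b)                     # insertion order preserved => stable
        return out

    data = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]
    print(pigeonhole_sort(data, key=lambda r: r[1], k=3))
    # [('alice', 1), ('dave', 1), ('carol', 2), ('bob', 2)]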
