Definition of non-comparison sort? - algorithm

I'm going through sorting algorithms. Radix sort is stated as a non-comparison sort, but it compares digits in a number and sorts them. Can someone please tell me what "non-comparison sort" actually means?

To my understanding, the difference between comparison and non-comparison sort algorithms is not whether the algorithm contains comparisons at all, but whether it exploits the internal structure of the items being sorted.
A comparison sort algorithm orders items by comparing values against each other. It can be applied to any sorting problem, and its best possible complexity is O(n*log(n)), which can be proved mathematically.
A non-comparison sort algorithm uses the internal structure of the values being sorted. It can only be applied to particular cases and requires particular kinds of values, but depending on the case its complexity can be better, for example O(n).
Every input that can be sorted with a non-comparison sort algorithm can also be sorted with a comparison sort algorithm, but not vice versa.
Radix sort benefits from the fact that the items being sorted are numbers that can be broken into digits: it cares about what the sorted items are, whereas a comparison sort algorithm only needs an ordering on the items.

A comparison sort algorithm compares pairs of the items being sorted, and the output of each comparison is binary (i.e. smaller than, or not smaller than). Radix sort considers the digits of the numbers in sequence and, instead of comparing them, groups the numbers into buckets according to the value of the digit (in a stable manner). Note that the digit is not compared to anything - it is simply used to pick the bucket corresponding to its value.
It is important to know why we care about comparison/non-comparison sort algorithms. If we use a comparison sort algorithm, then each comparison splits the set of possible outcomes roughly in half (because the output is binary), so the best complexity we can possibly achieve is O(log(n!)) = O(n*log(n)). This restriction does not hold for non-comparison sorts.
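To make the bucketing idea concrete, here is a minimal Python sketch of an LSD radix sort for non-negative integers (the function name, the base parameter and the example values are mine, purely for illustration). Each digit is only ever used as a bucket index; it is never compared with another digit.

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers -- a non-comparison sort (sketch)."""
    if not nums:
        return []
    max_val = max(nums)   # only used to decide how many digit passes are needed
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(base)]
        for x in nums:
            # The digit selects a bucket; it is not compared with other digits.
            buckets[(x // exp) % base].append(x)
        nums = [x for bucket in buckets for x in bucket]  # stable concatenation
        exp *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```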

Related

Modified version of merge sort which uses insertion sort

After I have divided an array using merge sort, once a subarray has length k, I'm supposed to use insertion sort on that k-length subarray and then continue with the merging. What should be the optimal value of k?
Also, I found these questions similar to mine, but they didn't give a definite answer:
Choosing minimum length k of array for merge sort where use of insertion sort to sort the subarrays is more optimal than standard merge sort
Modification to merge sort to implement merge sort with insertion sort Java
Just measure.
The best threshold value depends on your programming language, data type, data set value distribution, computer hardware, merge sort and insertion sort implementation details, and so on.
Usually this value is in the range 10-200, and the gain from picking the best value is not very significant.
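As a starting point for such measurements, here is a minimal Python sketch of the hybrid, assuming a simple top-down merge sort; the function name and the default threshold k=32 are arbitrary choices of mine, not a recommendation.

```python
def hybrid_merge_sort(a, k=32):
    """Merge sort that falls back to insertion sort on runs of length <= k (sketch)."""
    if len(a) <= k:
        # Insertion sort: cheap on tiny runs thanks to low constant factors.
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid], k)
    right = hybrid_merge_sort(a[mid:], k)
    # Standard merge of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

Timing this against a plain merge sort for several values of k on your own data is the only reliable way to find the optimum.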
This, I feel, was a more complete answer to my question: http://atekihcan.github.io/CLRS/P02-01/. Quoting it here:
For the modified algorithm to have the same asymptotic running time as standard merge sort,
Θ(nk + n lg(n/k)) = Θ(nk + n lg n − n lg k)
must be the same as
Θ(n lg n).
To satisfy this condition, k cannot grow faster than lg n asymptotically (if k grows faster than lg n, the nk term makes the algorithm run in asymptotically worse time than Θ(n lg n)). But this argument alone is not enough: we still have to check whether the requirement holds for k = Θ(lg n).
If we assume k = Θ(lg n), then
Θ(nk + n lg(n/k)) = Θ(nk + n lg n − n lg k) = Θ(n lg n + n lg n − n lg(lg n)) = Θ(2n lg n − n lg(lg n))† = Θ(n lg n).
† lg(lg n) is very small compared to lg n for sufficiently large values of n.

Sorting n strings of size n?

I want to sort n strings, each of length n, in O(n^2). Is there any other solution besides radix-based or trie-based sorting?
Let's suppose you try using a comparison-based sort. Comparing two strings of length n takes, in the worst case, O(n) time. Consequently, to finish in O(n^2) time in the worst case, you could afford to make only O(n) comparisons. However, that's impossible, since comparison-based sorting algorithms require Ω(n log n) comparisons on average. This worst case can be realized: starting with the values x1, x2, ..., xn, you can form the strings a^(n-1)x1, a^(n-1)x2, ..., a^(n-1)xn (each value prefixed with n-1 copies of the same character), and each comparison will then take Θ(n) time.
That rules out comparison sorts, which leaves behind approaches that harness actual properties of strings. The approaches you listed - trie-based approaches and radix sort - form the basis for most of the algorithms along these lines (in fact, to the best of my knowledge, all string sorting algorithms are variations on these themes). There are nicely-optimized implementations of these algorithms. For example, burstsort is an optimized radix sort that's designed to be cache-friendly, and therefore has better performance than the naive algorithm.
One detail that you need to keep in mind is that the size of the alphabet from which your strings are drawn does have an impact on the runtime. Radix sort on n strings of length n more properly takes time O(n^2 + n|Σ|), where |Σ| is the number of different characters that the strings can be made from. Consequently, you can't necessarily use radix sort to sort n strings of length n in time O(n^2) if there are way more than n characters in your alphabet.
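For the fixed-length case in the question, here is a minimal Python sketch of an LSD radix sort over strings (it assumes all strings have equal length and are ASCII; the function name is mine). Each pass buckets the strings by one character position, stably, without ever comparing two strings.

```python
def radix_sort_strings(strings):
    """LSD radix sort for equal-length ASCII strings (sketch).

    Runs in O(L * (n + sigma)) time for n strings of length L over an
    alphabet of size sigma; for n strings of length n this is O(n^2 + n*sigma).
    """
    if not strings:
        return []
    length = len(strings[0])
    sigma = 256                               # extended ASCII alphabet
    for pos in range(length - 1, -1, -1):     # last character position first
        buckets = [[] for _ in range(sigma)]
        for s in strings:                     # stable bucketing, no comparisons
            buckets[ord(s[pos])].append(s)
        strings = [s for bucket in buckets for s in bucket]
    return strings

print(radix_sort_strings(["cab", "abc", "bca"]))  # ['abc', 'bca', 'cab']
```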

Algorithm comparison in unsorted array

If I have an unsorted array A[1...n]:
1. using linear search to search for a number x
2. using bubble sort to sort the array A in ascending order, then using binary search to search for the number x in the sorted array
Which way will be more efficient: 1 or 2?
How can I justify it?
If you need to search for a single number, nothing can beat a linear search: sorting cannot be done faster than O(n), and even that is achievable only in special cases. Moreover, bubble sort is extremely inefficient, taking O(n^2) time. Binary search is faster than that, so the overall time of option 2 is dominated by the O(n^2) sorting step.
Hence you are comparing O(n) with O(n^2); obviously, O(n) wins.
The picture would be different if you needed to search for k different numbers with k large (say, on the order of n^2): then you would be comparing O(k*n) for repeated linear search against O(n^2 + k*log n) for sorting once and doing k binary searches, and that comparison may very well come out in favor of sorting.
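A minimal Python sketch of the two options (function names are mine; bisect provides the binary search). For a single lookup, option 1's single O(n) scan clearly beats paying O(n^2) for bubble sort up front.

```python
import bisect

def linear_search(arr, x):
    """Option 1: scan the unsorted array once -- O(n)."""
    for i, v in enumerate(arr):
        if v == x:
            return i
    return -1

def sort_then_binary_search(arr, x):
    """Option 2: bubble sort, O(n^2), then binary search, O(log n)."""
    a = arr[:]                           # sort a copy
    n = len(a)
    for i in range(n):                   # bubble sort
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    i = bisect.bisect_left(a, x)         # binary search in the sorted copy
    return i if i < n and a[i] == x else -1

data = [7, 3, 9, 1, 5]
print(linear_search(data, 9), sort_then_binary_search(data, 9))  # 2 4
```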

Appropriate sorting algorithm

I am a little unsure of my answer to the question below. Please help:
Suppose you are given a list of N integers. All but one of the integers are sorted in numerical order. Identify a sorting algorithm which will sort this special case in O(N) time and explain why this sorting algorithm achieves O(N) runtime in this case.
I think it is insertion sort, but I am not sure why that is the case.
Thanks!!
Insertion sort is adaptive and is efficient for substantially sorted data sets. It sorts almost-sorted data in O(n + d) time, where d is the number of inversions; in your case, with only one element out of order, d is at most n - 1, so the sort runs in O(n).
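A minimal Python sketch of insertion sort, with an almost-sorted example input (my own choice) to illustrate the O(n + d) behaviour: the total number of shifts performed by the inner loop equals the number of inversions d.

```python
def insertion_sort(a):
    """Plain insertion sort -- adaptive, O(n + d) with d inversions (sketch)."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:   # each shift removes exactly one inversion
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

# Almost sorted: only 25 is out of place, so the whole pass costs O(n).
print(insertion_sort([1, 3, 25, 5, 7, 9, 11]))  # [1, 3, 5, 7, 9, 11, 25]
```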

Limitations of comparison-based sorting techniques

Comparison sorts are chosen in most scenarios where data needs to be ordered. Techniques like merge sort, quick sort, insertion sort and other comparison sorts can handle arbitrary data types, but their efficiency is subject to a lower bound of Ω(n log(n)) comparisons.
My questions are
Are there any limitations of comparison based sorting techniques?
Any sort of scenarios where non-comparison sorting techniques would be used?
cheers
You answered it more or less yourself. Comparison-based sorting techniques are limited by the Ω(n log(n)) lower bound. Non-comparison-based sorting techniques do not suffer from this limit. Their general problem is that the domain must be known much better, and for that reason they aren't as versatile as comparison-based techniques.
Pigeonhole sort is a great and quite simple example which is pretty fast as long as the number of possible key values is close to the number of elements.
Obviously the main limitation of comparison sorts is the time factor - some are better than others, but given a large enough data set, they'll all get too slow at some point. The trick is to choose the right one given the kind and mix of data you're sorting.
Non-comparison sorting is based on other properties of the data rather than on pairwise comparisons. For example, counting sort orders a collection by inspecting each element on its own, never comparing it with any other value in the collection. It is useful for ordering a collection of small integer keys: it takes all the elements with the value 1 and puts them into the destination first, then all the elements with the value 2, and so on (internally it uses a "sparse" count array to quickly tally the values and then rebuilds the collection in order, but that's the basic principle).
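A minimal Python sketch of counting sort along those lines (the function name and the max_key parameter are mine). Note that no element is ever compared with another element.

```python
def counting_sort(nums, max_key):
    """Counting sort for integers in the range 0..max_key -- O(n + max_key) (sketch)."""
    counts = [0] * (max_key + 1)
    for x in nums:
        counts[x] += 1                  # tally the element; never compare it
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)  # emit each value as many times as it was seen
    return result

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], max_key=9))
# [1, 1, 2, 3, 4, 5, 6, 9]
```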
It is easy to see why a comparison sort needs about N log N comparisons. There are N! possible input permutations, and each comparison has only two outcomes, so c comparisons can distinguish at most 2^c of them; hence c must be at least log(N!). As we know, ln(N!) is approximately N ln N − N + O(ln N); in big-O notation we can neglect the terms smaller than N ln N, and because ln and log differ only by a constant factor, we get the final bound of about N log N comparisons.
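For completeness, here is the same argument written out as a worked derivation in the standard decision-tree formulation:

```latex
% A comparison sort can be modeled as a binary decision tree: h comparisons
% give at most 2^h distinct outcomes, and the tree must be able to
% distinguish all N! input permutations, so
\[
  2^{h} \ge N! \quad\Longrightarrow\quad h \ge \log_2(N!).
\]
% Stirling's approximation, ln(N!) = N ln N - N + O(ln N), then gives
\[
  \log_2(N!) = \frac{N \ln N - N + O(\ln N)}{\ln 2} = \Theta(N \log N),
\]
% so every comparison sort needs Omega(N log N) comparisons in the worst case.
```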

Resources