I heard that if we are sorting n numbers, and the numbers being sorted are converted to base n, then radix sort can be performed in O(n) time.
Have I got this right?
If so, how exactly is this achieved? If we are dealing with 5 numbers and we convert them all to base 5, we can separate the digits into 5 buckets (0, 1, 2, 3, 4).
Even if the numbers we were dealing with had only 7 digits max, wouldn't you still have to cycle through at least 7 * 5 times? This doesn't seem right, however.
Sorry, kind of confused about this.
Thanks for your help.
Radix sort works by picking some number base b, writing all the numbers in the input in base b, then sorting the numbers one digit at a time. In this answer, I'll focus on least-significant digit radix sort, in which we sort everything by the least-significant digit, then the second-least-significant digit, etc.
Each time we sort the numbers by some digit, we have to do O(n) work to distribute the elements across all the buckets, then O(n + b) work to iterate across the buckets and obtain the elements in sorted order. Therefore, the runtime for one round of radix sort is O(n + b).
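That per-round cost can be sketched in a few lines of Python (a minimal illustration; the function name radix_pass and the exp parameter that selects the digit are my own, not from the answer):

```python
def radix_pass(nums, base, exp):
    # Distribute: O(n) - place each number in the bucket for its digit at
    # position exp (exp is a power of the base: 1, base, base**2, ...).
    buckets = [[] for _ in range(base)]
    for x in nums:
        buckets[(x // exp) % base].append(x)
    # Collect: O(n + base) - walk every bucket and every element once.
    out = []
    for bucket in buckets:
        out.extend(bucket)
    return out
```

One call sorts by a single digit; repeating it for each digit position, least significant first, gives the full LSD radix sort.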
The number of rounds of radix sort depends on the number of digits in each of the numbers. A number M written in base b has O(logb M) base-b digits. If we let M denote the maximum number in the input array, the number of rounds of radix sort will then be O(logb M). Therefore, the asymptotic runtime of radix sort is
O(n + b) · O(logb M) = O((n + b) logb M).
In a typical binary radix sort, you'd pick b = 2 and get a runtime of O(n log M). However, you can choose b to be any value you'd like. If you pick a larger value of b, then there will be fewer base-b digits in each of the numbers (as an example, write a number in base-10 and then in base-16; you'll usually need fewer digits in base-16). In your original question, you asked
Even if the numbers we were dealing with had only 7 digits max, wouldn't you still have to cycle through at least 7 * 5 times?
The answer is "not necessarily." If you do a base-10 radix sort with 7-digit numbers, then yes, you'd have to cycle 7 times. However, if you used a base-100 radix sort, you'd only need to cycle 4 times.
Your other question was about using base-n for the radix sort. If we choose the base we use to be the number n, then we get that the runtime is
O((n + n) logn M) = O(n logn M) = O(n log M / log n)
(This uses the change-of-base formula for logarithms to rewrite logn M = log M / log n.)
This is not O(n), nor should you expect it to be. Think of it this way - radix sort's runtime depends on the length of the strings being sorted. If you sort a small number of extremely long numbers, the runtime is bound to be greater than the time to sort a small number of small numbers simply because you have to actually read the digits of the large number. The trick of using base n is simply a technique for speeding up the algorithm asymptotically.
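Putting the pieces together, a base-n LSD radix sort might look like this in Python (a sketch assuming nonnegative integers; radix_sort_base_n is my own name for it):

```python
def radix_sort_base_n(nums):
    # LSD radix sort with b = n (the array length) as the base.
    # Each pass costs O(n + n) = O(n); the number of passes is the
    # number of base-n digits in the maximum, i.e. O(log M / log n).
    n = len(nums)
    if n < 2:
        return list(nums)
    base = n
    out = list(nums)
    exp = 1
    m = max(nums)
    while m // exp > 0:            # one pass per base-n digit of the max
        buckets = [[] for _ in range(base)]
        for x in out:
            buckets[(x // exp) % base].append(x)
        out = [x for bucket in buckets for x in bucket]
        exp *= base
    return out
```

Note how a larger input gives a larger base, and hence fewer passes for the same maximum value, which is exactly the speedup described above.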
Hope this helps!
So I came upon this question:
We have to sort n numbers between 0 and n^3, and the answer for the time complexity is O(n). The author solved it this way:
First we convert the base of these numbers to n in O(n); now we have numbers with at most 3 digits (because of n^3).
Now we use radix sort, and therefore the time is O(n).
So I have three questions:
1. Is this correct? And is it the best time possible?
2. How is it possible to convert the base of n numbers in O(n), i.e., O(1) for each number? Some previous topics on this website said it's O(M(n) log(n))?!
3. And if this is true, does it mean we can sort any n numbers from 0 to n^m in O(n)?!
(I searched about converting the base of n numbers; some said it's O(log n) for each number and some said it's O(n) for n numbers, so I got confused about this too.)
1) Yes, it's correct. It is the best complexity possible, because any sort has to at least look at the numbers, and that is O(n).
2) Yes, each number is converted to base n in O(1). Simple ways to do this take O(m^2) in the number of digits m, under the usual assumption that arithmetic operations on numbers up to O(n) take O(1) time. Since m is constant, O(m^2) is O(1). But really, this step just ensures that the radix you use in the radix sort is O(n). If you implemented this for real, you'd use the smallest power of 2 >= n, so you wouldn't need these conversions at all.
3) Yes, if m is constant. The simplest way takes m passes in an LSB-first radix sort with a radix of around n. Each pass takes O(n) time, and the algorithm requires O(n) extra memory (measured in words that can hold n).
So the author is correct. In practice, though, this is usually approached from the other direction. If you're going to write a function that sorts machine integers, then at some large input size it's going to be faster if you switch to a radix sort. If W is the maximum integer size, then this tradeoff point will be when n >= 2^(W/m) for some constant m. This says the same thing as your constraint, but makes it clear that we're thinking about large-sized inputs only.
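As a concrete illustration of point (2): converting one number to a fixed number of base-n digits takes a constant number of arithmetic operations when the digit count m is constant (a small sketch; to_base is a hypothetical helper name):

```python
def to_base(x, base, width):
    # Produce exactly `width` base-`base` digits of x, most significant
    # first: `width` divisions and remainders, so O(m) = O(1) for constant m.
    digits = []
    for _ in range(width):
        digits.append(x % base)
        x //= base
    return digits[::-1]
```

For example, with m = 3 digits and base n = 5, each conversion is three divmod operations, regardless of how many numbers there are.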
There is a wrong assumption here: radix sort is not O(n) in general.
As described, e.g., on Wikipedia:
if all n keys are distinct, then w has to be at least log n for a
random-access machine to be able to store them in memory, which gives
at best a time complexity O(n log n).
So the answer is no: the "author implementation" is (at best) O(n log n). Also, converting these numbers can probably take more than O(n).
is this correct?
Yes, it's correct. If n is used as the base, then it will take 3 radix sort passes, where 3 is a constant, and since time complexity ignores constant factors, it's O(n).
and the best time possible?
Not always. Depending on the maximum value of n, a larger base could be used so that the sort is done in 2 radix sort passes or 1 counting sort pass.
how is it possible to convert the base of n numbers in O(n)? like O(1) for each number?
O(1) just means constant time complexity, i.e., a fixed number of operations per number. It doesn't matter that the method chosen isn't the fastest one if only time complexity is being considered. For example, using a, b, c to represent the most through least significant digits and x as the number, then using integer math: a = x/(n^2), b = (x-(a*n^2))/n, c = x%n (this assumes x >= 0). (Side note: if n is a constant, then an optimizing compiler may convert the divisions into a multiply-and-shift sequence.)
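That integer math can be written out directly (a small sketch of the formulas in the answer; digits_base_n is my own name):

```python
def digits_base_n(x, n):
    # Most to least significant base-n digits of 0 <= x < n**3,
    # using only integer division and remainder: a fixed number of
    # operations per number, hence O(1) each.
    a = x // (n ** 2)
    b = (x - a * n ** 2) // n
    c = x % n
    return a, b, c
```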
and if this is true, then it means we can sort any n numbers from 0 to n^m in O(n) ?!
Only if m is considered a constant. Otherwise it's O(m n).
I have been learning about Radix sort recently and one of the sources I have used is the Wikipedia page. At the moment there is the following paragraph there regarding the efficiency of the algorithm:
The topic of the efficiency of radix sort compared to other sorting
algorithms is somewhat tricky and subject to quite a lot of
misunderstandings. Whether radix sort is equally efficient, less
efficient or more efficient than the best comparison-based algorithms
depends on the details of the assumptions made. Radix sort complexity
is O(wn) for n keys which are integers of word size w. Sometimes w is
presented as a constant, which would make radix sort better (for
sufficiently large n) than the best comparison-based sorting
algorithms, which all perform O(n log n) comparisons to sort n keys.
However, in general w cannot be considered a constant: if all n
keys are distinct, then w has to be at least log n for a random-access
machine to be able to store them in memory, which gives at best a time
complexity O(n log n). That would seem to make radix sort at most
equally efficient as the best comparison-based sorts (and worse if
keys are much longer than log n).
The part in bold has regrettably become a bit of a stumbling block that I am unable to get past. I understand that in general radix sort is O(wn), and through other sources have seen how O(n) can be achieved, but I cannot quite understand why n distinct keys requires O(n log n) time for storage in a random-access machine. I'm fairly certain it comes down to some simple mathematics, but a solid understanding unfortunately remains just beyond my grasp.
My closest attempt is as follows:
Given a base B and a number N written in that base, the maximum number of digits N can have is:
(logB of N) + 1.
If each number in a given list L is unique, then we have up to:
|L| * ((logB of N) + 1) possibilities
At which point I'm unsure how to progress.
Is anyone able to please expand on the above section in bold and break down why n distinct keys requires a minimum of log n for random-access storage?
Assuming MSB radix sort with constant m bins:
For an arbitrarily large data type which must accommodate at least n distinct values, the number of bits required is N = ceiling(log2(n)).
Thus the amount of memory required to store each value is also O(log n); assuming sequential memory access, the time complexity of reading/writing a value is O(N) = O(log n), although one can use pointers instead.
The number of digits is O(N / log2(m)) = O(log n).
Importantly, each digit must span a whole number of bits, i.e. m must be a power of 2; assume this is small enough for the hardware platform, e.g. 4-bit digits = 16 bins.
During sorting:
For each radix pass, of which there are O(log n):
Count each bucket: get the value of the current digit using bit operations, O(1) per value, so O(n) over all n values. Note that each counter must also be N bits, although increments by 1 are (amortized) O(1). If we had used non-power-of-2 digits, this would in general be O(log n log log n).
Make the bucket count array cumulative: must perform m - 1 additions, each of which is O(N) = O(log n) (unlike the increment special case)
Write the output array: loop through n values, determine the bin again, and write the pointer with the correct offset
Thus the total complexity is O(log n) * [ n * O(1) + m * O(log n) + n * O(1) ] = O(n log n).
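The bit-operation digit extraction assumed in the counting step can be sketched as follows (a hypothetical helper; 4-bit digits give the 16 bins mentioned above):

```python
def digit_bits(x, pass_index, bits_per_digit=4):
    # Shift and mask: O(1) per value for machine-word-sized x
    # (for arbitrarily large integers, the shift itself costs O(N) bits).
    mask = (1 << bits_per_digit) - 1
    return (x >> (pass_index * bits_per_digit)) & mask
```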
I learned about radix sort, but still can't figure something out.
Let's say that my max number is n^c (c is a constant). Can I always change the base of the numbers to n, so that the worst-case complexity will be O(n)?
And if so, isn't the best way to sort an array to find the max value (O(n)) and then use radix sort?
There are two independent questions here:
If the maximum value in the array is n^c for a fixed constant c, then does radix sort in base n always take time O(n)?
What is the complexity of finding the largest value in an array and then using a base-n radix sort?
For question (1), you are correct that the runtime will be O(n). The cost of doing a radix sort is O(n logb U), where b is the base of the radix sort and U is the maximum value in the array (this is because that number has Θ(logb U) base-b digits). In this case, the runtime is therefore O(n logn n^c) = O(cn) = O(n), assuming that c is a fixed constant.
Notice that the preceding analysis assumes that c is a fixed constant known in advance. If you're given an array of arbitrary integer values and use a base-n radix sort, then the runtime will be O(n log U / log n), which is only O(n) if you are guaranteed in advance that the maximum value is at most nc for a fixed constant c. Since this isn't in general a true statement, you can't say that radix sort always runs in time O(n).
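For case (1), where the maximum value is known to be below n^3, the sort takes exactly c = 3 passes (a Python sketch assuming nonnegative integers; the function name is mine):

```python
def sort_up_to_n_cubed(nums):
    # Base-n LSD radix sort for values in [0, n**3): exactly 3 passes,
    # each O(n) distribute + O(n) collect, so O(n) overall.
    n = len(nums)
    if n < 2:
        return list(nums)
    out = list(nums)
    for p in range(3):                     # c = 3 passes cover U < n**3
        buckets = [[] for _ in range(n)]
        for x in out:
            buckets[(x // (n ** p)) % n].append(x)
        out = [x for bucket in buckets for x in bucket]
    return out
```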
Based on this radix sort article http://www.geeksforgeeks.org/radix-sort/ I'm struggling to understand what is being explained in terms of the time complexity of certain methods in the sort.
From the link:
Let there be d digits in input integers. Radix Sort takes O(d*(n+b)) time where b is the base for representing numbers, for example, for the decimal system, b is 10. What is the value of d? If k is the maximum possible value, then d would be O(logb(k)). So overall time complexity is O((n+b)*logb(k)), which looks like more than the time complexity of comparison-based sorting algorithms for a large k. Let us first limit k. Let k ≤ n^c where c is a constant. In that case, the complexity becomes O(n logb(n)).
So I do understand that the sort takes O(d*n) since there are d digits therefore d passes, and you have to process all n elements, but I lost it from there. A simple explanation would be really helpful.
Assuming we use bucket sort for the sorting on each digit: for each digit (d), we process all numbers (n), placing them in buckets for all possible values a digit may have (b).
We then need to process all the buckets, recreating the original list. Placing all items in the buckets takes O(n) time, recreating the list from all the buckets takes O(n + b) time (we have to iterate over all buckets and all elements inside them), and we do this for all digits, giving a running time of O(d * (n + b)).
This is only linear if d is a constant and b is not asymptotically larger than n. So indeed, if you have numbers of log n bits, it will take O(n log n) time.
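The O(d * (n + b)) structure described above can be written out directly, with the d passes explicit (a minimal Python sketch; the names are my own):

```python
def radix_sort(nums, base, d):
    # d passes; each pass distributes in O(n) and collects in O(n + base),
    # for O(d * (n + base)) total.
    out = list(nums)
    exp = 1
    for _ in range(d):
        buckets = [[] for _ in range(base)]
        for x in out:
            buckets[(x // exp) % base].append(x)
        out = [x for bucket in buckets for x in bucket]
        exp *= base
    return out
```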
I was reading wikipedia article on Radix sort and while describing its efficiency, it says
Radix sort efficiency is O(d·n) for n keys which have d or fewer
digits. Sometimes d is presented as a constant, which would make
radix sort better (for sufficiently large n) than the best
comparison-based sorting algorithms, which are all O(n·log(n))
number of comparisons needed. However, in general d cannot be
considered a constant. In particular, under the common (but sometimes
implicit) assumption that all keys are distinct, then d must be at
least of the order of log(n), which gives at best (with densely
packed keys) a time complexity O(n·log(n)).
Now what I don't understand is the line - "assumption that all keys are distinct, then d must be at least of the order of log(n)"
What exactly is it trying to say?
If the keys are distinct, then we have n distinct keys. Now let the biggest key be k. Because all the numbers are distinct, k must be at least n - 1 (or n, if we consider positive integers only). So k has about log(k) digits, which is at least of the order of log(n), and therefore d is at least of the order of log(n).
EDIT:
To be clearer about logs in base 10 vs. base 2 and Big-O, read this post.
If all keys are distinct, then you can order them, and the biggest is at least n (considering positive integers only).
Then the number of digits of n is about log10(n), which is why d is at least of the order of log(n).
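The pigeonhole argument in both answers can be checked numerically (a small sketch; min_digits_for_distinct is a hypothetical name):

```python
def min_digits_for_distinct(n, base=10):
    # n distinct nonnegative keys force the largest key to be >= n - 1,
    # so its base-`base` representation needs at least this many digits.
    largest = max(n - 1, 1)   # smallest possible maximum key (at least 1 digit)
    d = 1
    while largest >= base:    # count digits by repeated division
        largest //= base
        d += 1
    return d
```

Avoiding floating-point log here sidesteps rounding errors near exact powers of the base.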