Explanation of radix sort n x (k/d) - algorithm

I have looked at the best, average, and worst-case times for the radix sort algorithm.
The average is N × K/D.
I understand that N is the number of elements in the algorithm.
I understand that K is the number of keys/buckets.
Does anyone know what D represents?
I am going by the table on Wikipedia, thanks.
Reference - http://en.wikipedia.org/wiki/Sorting_algorithm#Radix_sort

D is the number of digits in base K.
For example, if you have K = 16 and the largest number is 255, then D = 2 (16^2 = 256). If you change K to 4, then D becomes 4 (4^4 = 256).
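To make the relationship concrete, here is a small sketch (the function name `num_digits` is mine) that computes D for a given base K and maximum value:

```python
def num_digits(max_value, k):
    """Smallest D such that k**D > max_value, i.e. the number of base-k digits."""
    d = 1
    while k ** d <= max_value:
        d += 1
    return d

print(num_digits(255, 16))  # -> 2, since 16^2 = 256 covers 0..255
print(num_digits(255, 4))   # -> 4, since 4^4 = 256 covers 0..255
```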

The running time of Radix Sort is commonly simplified to O(n), but there are several factors that can cause this running time to increase dramatically.
Assigning variables:
n = the number of elements to be sorted
L = the length of the elements aka the number of digits in each element
k = the range of the digits in each element (digits range from 1 to k)
The radix sort algorithm performs a bucket sort for each digit in every element.
Therefore, L sorts must be done and each sort takes O(n+k) time because it is a bucket sort of n elements into k buckets. Therefore, the more accurate running time of Radix Sort is O(L(n+k)).
When the range of digits, k, is a small constant, as in decimal numbers, then the running time of Radix Sort can be simplified to O(Ln).
Due to these factors impacting the running time of the Radix Sort algorithm, there are certain considerations that need to be made about using the algorithm to perform a sort in order for it to be efficient. The data needs to:
have a fixed length (can choose to pad elements in order to create a uniform length for all elements)
have length of the elements, L, be linear in n
have the digit range, k, be linear in n
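The L passes of bucket sort described above can be sketched as an LSD (least-significant-digit-first) radix sort. This is a minimal illustration, assuming non-negative integers and k = 10 decimal digit values by default:

```python
def radix_sort(arr, k=10):
    """LSD radix sort: one stable bucket pass per digit -> O(L * (n + k))."""
    if not arr:
        return arr
    exp = 1
    while max(arr) // exp > 0:                 # L passes, one per digit
        buckets = [[] for _ in range(k)]       # k buckets
        for x in arr:                          # distribute: O(n)
            buckets[(x // exp) % k].append(x)
        arr = [x for b in buckets for x in b]  # collect in order: O(n + k)
        exp *= k
    return arr

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Each pass costs O(n + k) and there are L passes, matching the O(L(n + k)) bound above.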

Related

Sort an array in linear time

I'm stuck on this question:
Given an array of n integers within the range of [0, 1, ... , n^5 - 1], how can you sort them in linear runtime? (O(n)) And more generally, if the range is [0, 1, ... , n^c - 1] (c is a natural constant bigger than 1) how would you do it? Describe a proper algorithm and explain.
My first thought was to convert the numbers to base n in both cases and then use radix sort (which uses counting sort as the per-digit sorting algorithm), but I've been told that I can't count on the conversion from decimal base to base n being O(1).
So basically I'm pretty stuck, as I have no idea how I can do it...
Would be glad for help.
Hint: Bucket sort is a stable sort. So if you sort on condition A, then resort on condition B, you wind up sorted on B and then A.
Let's use // for integer division (dropping the remainder) and % for remainders. And now if you sort on x % m and then sort the output on (x // m) % m, you wind up with a list sorted on the last 2 digits in base m.
Is that enough to get you going?
The term "integers" implies the values are stored as binary numbers, not decimal. Since they're binary numbers, use a base that is a power of 2, such as 256, which is common. Use shift and AND instead of divide and modulo so each digit extraction takes fixed time.
For linear time complexity O(n), the code should sort based on the number of bits in an integer, typically 32 bits, so it would take 1 scan pass and 4 radix-sort passes. Since the 1 and 4 are constants, time complexity is O(n). If the sort is optimized to reduce the number of radix-sort passes based on range, although it is faster, it will have time complexity O(n log(range)).
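Putting the hints together: write each value as c digits in base n and make one stable pass per digit, least significant first. A sketch under those assumptions (the name `sort_linear` is mine; c is treated as a known constant):

```python
def sort_linear(arr, c):
    """Sort n integers in [0, n**c - 1] in O(c*n) = O(n) time for constant c."""
    n = len(arr)
    if n == 0:
        return arr
    for digit in range(c):                    # c passes, c is a constant
        buckets = [[] for _ in range(n)]      # one bucket per base-n digit value
        for x in arr:                         # stable distribution on this digit
            buckets[(x // n ** digit) % n].append(x)
        arr = [x for b in buckets for x in b]
    return arr

data = [23, 5, 19, 0, 24, 7]                  # n = 6, all values < 6**2 = 36
print(sort_linear(data, 2))
```

Each pass is a bucket/counting sort costing O(n + n) = O(n), and there are c = O(1) passes, so the whole sort is O(n); extracting a digit with // and % (or shifts and masks for a power-of-two base) avoids any per-value base conversion.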

number of comparisons needed to sort n values?

I am working on a revised selection sort algorithm so that on each pass it finds both the largest and smallest values in the unsorted portion of the array. The sort then moves each of these values into its correct location by swapping array entries.
My question is: how many comparisons are necessary to sort n values?
In normal selection sort it is O(n) comparisons, so I am not sure what it will be in this case.
Normal selection sort requires O(n^2) comparisons.
At every pass it makes K comparisons, where K is n-1, n-2, n-3, ..., 1, and the sum of this arithmetic progression is n*(n-1)/2.
Your approach (if you are using the optimized min/max choice scheme) uses 3/2*K comparisons per pass, where the pass length K is n, n-2, n-4, ..., 1.
The sum of the arithmetic progression with a(1)=1, a(n/2)=n, d=2, together with the 3/2 multiplier, is
3/2 * 1/2 * (n+1) * n/2 = 3/8 * n*(n+1) = O(n^2)
So the complexity remains quadratic (and the constant factor is very close to the standard version's).
In your version of selection sort, you first have to choose two elements as the minimum and maximum, and in the worst case every remaining element in the unsorted array gets compared with both of them.
Say k elements remain in the unsorted array. Assuming you pick the first two elements and assign them to minimum and maximum accordingly (1 comparison), then iterate over the remaining k-2 elements, each of which can require 2 comparisons, the total for this pass is 1 + 2*(k-2) = 2*k - 3 comparisons.
Here k takes the values n, n-2, n-4, ..., since in every pass two elements reach their correct positions. The summation results in approximately O(n^2) comparisons.
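A sketch of the double-ended selection sort, instrumented to count comparisons with the per-element scheme analyzed above (at most 2 comparisons per remaining element; the function name and the counter are mine):

```python
def minmax_selection_sort(arr):
    """Selection sort locating both min and max each pass; returns (arr, comparisons)."""
    comparisons = 0
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        mn = mx = lo
        for i in range(lo + 1, hi + 1):
            if arr[i] < arr[mn]:          # 1st comparison settles it
                mn = i
                comparisons += 1
            else:
                comparisons += 2          # 2nd comparison against the current max
                if arr[i] > arr[mx]:
                    mx = i
        arr[lo], arr[mn] = arr[mn], arr[lo]
        if mx == lo:                      # the max may have been moved by the swap above
            mx = mn
        arr[hi], arr[mx] = arr[mx], arr[hi]
        lo += 1
        hi -= 1
    return arr, comparisons

print(minmax_selection_sort([64, 25, 12, 22, 11]))  # -> ([11, 12, 22, 25, 64], 8)
```

Note the `mx == lo` guard: swapping the minimum into place can displace the maximum, a classic pitfall in this variant.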

What is k in counting sort O(n+k) time complexity?

Counting sort's worst, best, and average time complexity is O(n+k), where n is the number of elements to sort. What is k exactly? I see different definitions: the maximum element, the difference between the max element and the min element, and so on.
Given array arr1 [1, 3, 5, 9, 12, 7] and arr2 [1,2,3,2,1,2,4,1,3,2],
what is k for arr1 and arr2?
Is it true that it is stupid to sort arr1 with counting sort because n < k (the element values are from a range which is wider than the number of elements to sort)?
k is the maximum possible value in the array. Assume you have an array of length 5 in which each number is an integer between 0 and 9; in this example k equals 9.
k is the range of the keys, i.e. the number of array slots it takes to cover all possible values. Thus in case of numbers, Max-Min+1. Of course this assumes that you don't waste space by assigning Min the first slot and Max the last.
It is appropriate to use counting sort when k does not exceed a small multiple of n, say k <= c*n, since in that case O(n + k) can beat O(n log n).
First an array of k counts is zeroed. Then the n elements in the array are read, and the elements of the k counts are incremented depending on the values of the n elements. On the output pass of a counting sort, the array of k counts is read, and array of n elements is written. So there are k writes (to zero the counts), n reads, then k reads and n writes for a total of 2n + 2k operations, but big O ignores the constant 2, so the time complexity is O(n + k).
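A minimal counting sort sketch using k = max − min + 1 slots, applied to the two arrays from the question (the name `counting_sort` is my choice):

```python
def counting_sort(arr):
    """Counting sort: O(n + k), where k = max - min + 1 is the key range."""
    if not arr:
        return []
    lo, hi = min(arr), max(arr)
    count = [0] * (hi - lo + 1)      # k writes to zero the counts
    for x in arr:                    # n reads
        count[x - lo] += 1
    out = []
    for v, c in enumerate(count):    # k reads, n writes
        out.extend([v + lo] * c)
    return out

arr1 = [1, 3, 5, 9, 12, 7]               # k = 12 - 1 + 1 = 12, n = 6 -> k > n
arr2 = [1, 2, 3, 2, 1, 2, 4, 1, 3, 2]    # k = 4 - 1 + 1 = 4,  n = 10 -> k < n
print(counting_sort(arr1))
print(counting_sort(arr2))
```

For arr1 the range (k = 12) exceeds n = 6, so counting sort does extra bucket work, though at this scale it is hardly "stupid"; for arr2 (k = 4, n = 10) it is a clear win.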

Runtime of triple nested for loops

I'm currently going through "Cracking the coding interview" textbook and I'm reviewing Big-O and runtime. One of the examples were as such:
Print all positive integer solutions to the equation a^3 + b^3 = c^3 + d^3 where a, b, c, d are integers between 1 and 1000.
The pseudocode solution provided is:
n = 1000;
for c from 1 to n
    for d from 1 to n
        result = c^3 + d^3
        append (c, d) to list at value map[result]
for each result, list in map
    for each pair1 in list
        for each pair2 in list
            print pair1, pair2
The runtime is O(N^2)
I'm not sure how O(N^2) is obtained, and after extensive googling and trying to figure out why, I still have no idea. My rationale is as follows:
Top half is O(N^2) because the outer loop goes to n and inner loop executes n times each.
The bottom half I'm not sure how to calculate, but I got O(size of map) * O(size of list) * O(size of list) = O(size of map) * O(size of list^2).
O(N^2) + O(size of map) * O(size of list^2)
The 2 for loops adding the pairs to the list of the map = O(N) * O(N) b/c it's 2 for loops running N times.
The outer for loop for iterating through the map = O(2N-1) = O(N) b/c the size of the map is 2N - 1 which is essentially N.
The 2 for loops for iterating through the pairs of each list = O(N) * O(N) b/c each list is <= N
Total runtime: O(N^2) + O(N) * O(N^2) = O(N^3)
Not sure what I'm missing here
Could someone help me figure out how O(N^2) is obtained or why my solution is incorrect. Sorry if my explanation is a bit confusing. Thanks
Based on the first part of the solution, sum(size of lists) == N. This means that the second part (nested loop) cannot be more complex than O(N^2). As you said, the complexity is O(size of map)*O(size of list^2), but it should rather be:
O(size of map)*(O(size of list1^2) + O(size of list2^2) + ... )
This means, that in the worst-case scenario we will get a map of size 1, and one list of size N, and the resulting complexity of O(1)*O((N-1)^2) <==> O(N^2)
In other scenarios the complexity will be lower. For instance if we have map of 2 elements, then we will get 2 lists with the total size of N. So the result will then be:
O(2)*( O(size of list1^2) + O(size of list2^2)), where (size of list1)+(size of list2) == N
and we know from basic maths that X^2 + Y^2 <= (X+Y)^2 for positive numbers.
The complexity of the second part is O(sum of (length of list)^2 over the map). Since the lengths of the lists vary, work instead with what we do know: the sum of the lengths of the lists in the map is n^2, because exactly n^2 pairs are added in the first part of the code. Since T(program) = O(n^2) + O(sum of lengths of lists) * O(sum of lengths of lists / size of map) = O(n^2) * O(sum of lengths of lists / size of map), it remains to show that (sum of lengths of lists) / (size of map) is O(1). Doing this requires quite a bit of number theory, and unfortunately I can't help you there. But do check out these links for more info on how you would go about it: https://en.wikipedia.org/wiki/Taxicab_number
https://math.stackexchange.com/questions/1274816/numbers-that-can-be-expressed-as-the-sum-of-two-cubes-in-exactly-two-different-w
http://oeis.org/A001235
This is a very interesting question! cdo256 made some good points, I will try to explain a bit more and complete the picture.
It is more or less obvious that the key questions are: how many integers exist that can be expressed as a sum of two positive cubes in k different ways (where k >= 2), and what is the possible size of k? This number determines the sizes of the lists which are the values of map, which determine the total complexity of the program. Our "search space" is from 2 to 2 * 10^9, because c and d both iterate from 1 to 1000, so the sum of their cubes is at most 2 * 10^9. If none of the numbers in the range [2, 2 * 10^9] could be expressed as a sum of two cubes in more than one way, then the complexity of our program would be O(n^2). Why? Well, the first part is obviously O(n^2), and the second part depends on the sizes of the lists which are the values of map. But in this case all lists have size 1, and there are n^2 keys in map, which gives O(n^2).
However, that is not the case: there is the famous example of the "taxicab number" 1729, so let us return to our main question - the number of different ways to express an integer as a sum of two cubes of positive integers. This is an active field of research in number theory, and a great summary is given in Joseph H. Silverman's article Taxicabs and Sums of Two Cubes. I recommend reading it thoroughly. Current records are given here. Some interesting facts:
smallest integer that can be expressed as a sum of two cubes of positive integers in three different ways is 87,539,319
smallest integer that can be expressed as a sum of two cubes of positive integers in four different ways is 6,963,472,309,248 (> 2*10^9)
smallest integer that can be expressed as a sum of two cubes of positive integers in six different ways is 24,153,319,581,254,312,065,344 (> 2*10^9)
As you can easily see e.g. here, there are only 2184 integers in range [2, 2 * 10^9] that are expressible as a sum of two positive cubes in two or three different ways, and for k = 4,5,.. these numbers are out of our range. Therefore, the number of keys in map is very close to n^2, and sizes of the value lists are at most 3, which implies that the complexity of the code
for each pair1 in list
    for each pair2 in list
        print pair1, pair2
is constant, so the total complexity is again O(n^2).
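The first half of the pseudocode translates to a short Python sketch (the name `cube_pairs` is mine); the map values are lists of ordered (c, d) pairs, which stay short in this range for the reasons given above:

```python
from collections import defaultdict

def cube_pairs(n):
    """Group all (c, d) pairs, 1 <= c, d <= n, by c**3 + d**3."""
    groups = defaultdict(list)
    for c in range(1, n + 1):        # O(n^2) to build the map
        for d in range(1, n + 1):
            groups[c ** 3 + d ** 3].append((c, d))
    return groups

# 1729 is the smallest taxicab number: 1^3 + 12^3 = 9^3 + 10^3
groups = cube_pairs(12)
print(groups[1729])   # -> [(1, 12), (9, 10), (10, 9), (12, 1)]
```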

How to justify the correctness and runtime of an algorithm

How do you go about justifying the correctness and runtime of an algorithm?
For example, say I'm asked to justify the correctness and runtime of an algorithm that is essentially counting sort. I know it runs in worst-case O(n), but I don't know how to justify the correctness or prove that the runtime is O(n).
Question: Describe an algorithm to sort n integers, each in the range [0..n^4 − 1], in O(n) time. Justify the correctness and the running time of your algorithm.
Work:
Algorithm to sort n integers in the range [0..(n^4)-1] in O(n):
Represent each int x in the list as its 4 digits in base n
Use countSort with respect to the least significant digit
countSort with respect to the next least significant digit, and so on
let k = (n^4)-1, the max value in the range
Since values range from 0..k, create k+1 buckets
Iterate through the list and increment a counter each time a value appears
Fill the input list with the data from the buckets, where each key is a value in the list
From smallest to largest key, add the bucket index to the input array
Variables:
array: list of ints to be sorted
result: output array (indexes from 0..n-1)
n: length of the input
k: value such that all keys are in range 0..k-1
count: array of ints with indexes 0..k-1 (starts with all = 0)
x: single input value
total/oCount: control variables
total = 0
for x in array
    count[key of x]++
for i from 0 to k-1
    oCount = count[i]
    count[i] = total
    total += oCount
for x in array
    result[count[key of x]] = x
    count[key of x]++
return result
The algorithm uses simple loops without recursion. Initializing the count array and the middle for loop that calculates prefix sums on the count array iterate at most k+1 times. The 1 is constant, so this takes O(k). Looping to initialize the result array and fill the input array takes O(n) time. These total to O(n+k) time. k is considered a constant, so the final running time is O(n).
I need some help to point me in the correct direction. Thanks!
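One possible direction, sketched under the assumption that each counting-sort pass uses only k = n buckets (one per base-n digit value) rather than n^4 buckets; the function names are mine, and the inner function mirrors the pseudocode in the question. Correctness follows by induction from stability: after pass i, the array is sorted on its low i+1 base-n digits.

```python
def counting_sort_by_digit(arr, n, digit):
    """Stable counting sort on base-n digit `digit` (mirrors the pseudocode above)."""
    key = lambda x: (x // n ** digit) % n
    count = [0] * n                  # one bucket per possible digit value: k = n
    for x in arr:
        count[key(x)] += 1
    total = 0
    for i in range(n):               # prefix sums: first output index per digit value
        count[i], total = total, total + count[i]
    result = [0] * len(arr)
    for x in arr:                    # stable placement
        result[count[key(x)]] = x
        count[key(x)] += 1
    return result

def sort_range_n4(arr):
    """Sort n integers in [0, n**4 - 1] with 4 O(n + n) passes -> O(n) total."""
    n = len(arr)
    for digit in range(4):
        arr = counting_sort_by_digit(arr, n, digit)
    return arr

print(sort_range_n4([623, 0, 100, 5, 1295, 77]))   # n = 6, values < 6^4 = 1296
```

With k = n per pass, each pass is O(n + n) = O(n) and there are exactly 4 passes, giving O(n) overall; with k = n^4 buckets, a single pass would already cost O(n^4).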
