Maximum number of permutations - algorithm

Some sorting algorithms, like Insertion Sort, have a Θ(n) asymptotic runtime for some subset of the n! possible permutations of n elements, which means that for those permutations, the number of comparisons that Insertion Sort does is kn for some constant k. For a given constant k, what is the maximum number of permutations for which any given comparison sort could terminate within kn comparisons?

The number of operations in insertion sort depends on the number of inversions. So we need to count the permutations of n values (1..n for simplicity) that contain exactly k inversions.
We can see that Inv(n, 0) = 1 - the sorted array is the only permutation with no inversions
Also Inv(0, 0) = 1 and Inv(0, k) = 0 for k > 0 - the empty array has no inversions
We can get an array with n elements and k inversions by:
- adding value n to the end of an array with n-1 items and k inversions (the number of inversions stays the same)
- inserting value n just before the last element of an array with n-1 items and k-1 inversions (adding one inversion)
- inserting value n before the last two elements of an array with n-1 items and k-2 inversions (adding two inversions)
- and so on, up to inserting value n at the front, which adds n-1 inversions
Using this approach, we can just fill a table Inv[n][k] row-by-row and cell-by-cell:
Inv[n][k] = Sum(Inv[n-1][j]) for j = max(0, k-n+1)..k
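The table-filling recurrence above can be sketched in Python (a minimal version; the function name is my own):

```python
def count_permutations_with_inversions(n, k):
    """Count permutations of 1..n with exactly k inversions by
    filling the table Inv[i][j] row by row."""
    # Inv[i][j]: number of permutations of i elements with exactly j inversions
    Inv = [[0] * (k + 1) for _ in range(n + 1)]
    Inv[0][0] = 1  # the empty array: one permutation, zero inversions
    for i in range(1, n + 1):
        for j in range(k + 1):
            # Inserting value i into a permutation of i-1 elements can add
            # anywhere from 0 to i-1 inversions:
            Inv[i][j] = sum(Inv[i - 1][j - d] for d in range(min(j, i - 1) + 1))
    return Inv[n][k]
```

For example, `count_permutations_with_inversions(3, 1)` gives 2, matching the two permutations 1 3 2 and 2 1 3.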

Every comparison at most doubles the number of input permutations you can distinguish. Thus, with kn comparisons you can sort at most 2^(kn) permutations.
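As a quick sanity check of this decision-tree bound (the values n = 10, k = 2 are chosen arbitrarily): kn = 20 comparisons can distinguish at most 2^20 permutations, far fewer than 10!, so no comparison sort can handle all permutations within 2n comparisons for n = 10.

```python
from math import factorial

# Hypothetical n and k, just to illustrate the bound.
n, k = 10, 2
bound = 2 ** (k * n)          # at most 2^(kn) distinguishable outcomes
print(bound, factorial(n))    # 1048576 vs 3628800
assert bound < factorial(n)   # so kn comparisons cannot sort every permutation
```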

Related

Efficient way to find sub-array dividing by n

I have an N-size array A that contains natural numbers.
I need an efficient algorithm that finds a pair of indexes (i, j) such that the sum of the sub-array elements A[i..j] is divisible by N without a remainder.
Any ideas?
The key observation is:
sum(A[i..j]) = sum(A[1..j]) − sum(A[1..(i−1)])
so N divides sum(A[i..j]) if and only if sum(A[1..(i−1)]) and sum(A[1..j]) are congruent modulo N
that is, if sum(A[1..(i−1)]) and sum(A[1..j]) have the same remainder when you divide both by N.
So if you just iterate over the array tallying the "sum so far", and keep track of the remainders you've already seen and the indexes where you saw them, then you can do this in O(N) time and O(N) extra space.
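A sketch of that approach (1-based indices as in the question; note that by the pigeonhole principle the N+1 prefix sums must repeat a remainder mod N, so a valid pair always exists):

```python
def subarray_divisible_by_n(A):
    """Return 1-based indices (i, j) such that sum(A[i..j]) is divisible
    by N = len(A).  O(N) time, O(N) extra space."""
    N = len(A)
    seen = {0: 0}            # remainder of the empty prefix -> position 0
    total = 0
    for j, x in enumerate(A, start=1):
        total = (total + x) % N
        if total in seen:
            # Prefix sums at positions seen[total] and j share a remainder,
            # so A[seen[total]+1 .. j] sums to a multiple of N.
            return seen[total] + 1, j
        seen[total] = j
    # Unreachable: N+1 prefix sums, only N possible remainders.
```

For example, `subarray_divisible_by_n([3, 5, 2, 7])` returns `(1, 2)`, since 3 + 5 = 8 is divisible by N = 4.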

number of comparisons needed to sort n values?

I am working on a revised selection sort algorithm that on each pass finds both the largest and smallest values in the unsorted portion of the array. The sort then moves each of these values into its correct location by swapping array entries.
My question is - How many comparisons are necessary to sort n values?
In normal selection sort it is O(n) comparisons, so I am not sure what it will be in this case.
Normal selection sort requires O(n^2) comparisons.
At every pass it makes K comparisons, where K is n-1, n-2, n-3, ..., 1, and the sum of this arithmetic progression is n*(n-1)/2.
Your approach (if you are using an optimized min/max choice scheme) uses 3/2*K comparisons per pass, where the pass length K is n, n-2, n-4, ..., 1.
The sum of the arithmetic progression with a(1)=1, a(n/2)=n, d=2, together with the 3/2 multiplier, is
3/2 * 1/2 * (n+1) * n/2 = 3/8 * n*(n+1) = O(n^2)
So the complexity remains quadratic (and the constant factor is very close to the standard one).
In your version of selection sort, first you would have to choose two elements as the minimum and maximum, and all of the remaining elements in the unsorted array can get compared with both of them in the worst case.
Let's say k elements remain in the unsorted array. Assuming you pick the first two elements and assign them to minimum and maximum accordingly (1 comparison), then iterate over the remaining k-2 elements, each of which can cost 2 comparisons in the worst case, the total for this pass is 1 + 2*(k-2) = 2*k - 3 comparisons.
Here k takes the values n, n-2, n-4, ..., since in every pass two elements reach their correct positions. The summation again results in approximately O(n^2) comparisons.
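The scheme described above can be sketched with an explicit comparison counter (my own arrangement of the min/max bookkeeping):

```python
def minmax_selection_sort(a):
    """Selection sort placing both the minimum and maximum of the unsorted
    portion on each pass.  Returns the number of comparisons made."""
    comparisons = 0
    lo, hi = 0, len(a) - 1
    while lo < hi:
        # Pick the first two elements as initial min/max: 1 comparison.
        if a[lo] <= a[lo + 1]:
            mn, mx = lo, lo + 1
        else:
            mn, mx = lo + 1, lo
        comparisons += 1
        # Each remaining element costs at most 2 comparisons.
        for i in range(lo + 2, hi + 1):
            if a[i] < a[mn]:
                comparisons += 1
                mn = i
            else:
                comparisons += 2
                if a[i] > a[mx]:
                    mx = i
        # Swap the minimum into place first; if the maximum was sitting at
        # position lo, it has just moved to position mn.
        a[lo], a[mn] = a[mn], a[lo]
        if mx == lo:
            mx = mn
        a[hi], a[mx] = a[mx], a[hi]
        lo += 1
        hi -= 1
    return comparisons
```

Running it on a small array both sorts in place and reports the 2k − 3 per-pass cost discussed above.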

What is k in counting sort O(n+k) time complexity?

Counting sort's worst, best and average time complexity is O(n+k), where n is the number of elements to sort. What is k exactly? I see different definitions: the maximum element, the difference between the max and min elements, and so on.
Given array arr1 = [1, 3, 5, 9, 12, 7] and arr2 = [1, 2, 3, 2, 1, 2, 4, 1, 3, 2],
what is k for arr1 and arr2?
Is it true that it is stupid to sort arr1 with counting sort because
n < k (the element values come from a range wider than the number of
elements to sort)?
k is the maximum possible value in the array. Assume you have an array of length 5 in which each number is an integer between 0 and 9; in this example k equals 9.
k is the range of the keys, i.e. the number of array slots it takes to cover all possible values. Thus in case of numbers, Max-Min+1. Of course this assumes that you don't waste space by assigning Min the first slot and Max the last.
It is appropriate to use counting sort when k does not exceed a small multiple of n, say k ≤ c·n for a small constant c, since in that case O(n+k) can beat O(n log n).
First an array of k counts is zeroed. Then the n elements in the array are read, and the elements of the k counts are incremented depending on the values of the n elements. On the output pass of a counting sort, the array of k counts is read, and array of n elements is written. So there are k writes (to zero the counts), n reads, then k reads and n writes for a total of 2n + 2k operations, but big O ignores the constant 2, so the time complexity is O(n + k).
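A minimal counting sort sketch using k = Max − Min + 1 slots, as described above (for arr1 that range is 1..12, giving k = 12; for arr2 it is 1..4, giving k = 4):

```python
def counting_sort(arr):
    """Counting sort over the key range [min(arr), max(arr)],
    i.e. k = max - min + 1 slots.  O(n + k) time."""
    if not arr:
        return []
    lo, hi = min(arr), max(arr)
    counts = [0] * (hi - lo + 1)                  # zero k counts: k writes
    for x in arr:                                 # n reads
        counts[x - lo] += 1
    out = []
    for value, c in enumerate(counts, start=lo):  # k reads
        out.extend([value] * c)                   # n writes in total
    return out
```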

Finding the kth smallest element in a sequence where duplicates are compressed?

I've been asked to write a program to find the kth order statistic of a data set consisting of characters and their occurrences. For example, I have a data set consisting of
B,A,C,A,B,C,A,D
Here I have A with 3 occurrences, B with 2 occurrences, C with 2 occurrences and D with one occurrence. They can be grouped into pairs (character, number of occurrences), so, for example, we could represent the above sequence as
(A,3), (B,2), (C,2) and (D,1).
Assuming that k is the number of these pairs, I am asked to find the kth smallest element of the data set in O(n), where n is the number of pairs.
I thought I could sort the elements based on their number of occurrences and then find the kth smallest element, but that won't fit the time bound. Can I please have some help with an algorithm for this problem?
Assuming that you have access to a linear-time selection algorithm, here's a simple divide-and-conquer algorithm for solving the problem. I'm going to let k denote the total number of pairs and m be the index you're looking for.
If there's just one pair, return the key in that pair.
Otherwise:
Using a linear-time selection algorithm, find the median element. Let medFreq be its frequency.
Sum up the frequencies of the elements less than the median. Call this less. Note that the number of elements less than or equal to the median is less + medFreq.
If less < m ≤ less + medFreq, return the key in the median element.
Otherwise, if m ≤ less, recursively search for the mth element in the first half of the array.
Otherwise (m > less + medFreq), recursively search for the (m - less - medFreq)th element in the second half of the array.
The key insight here is that each iteration of this algorithm tosses out half of the pairs, so each recursive call is on an array half as large as the original array. This gives us the following recurrence relation:
T(k) = T(k / 2) + O(k)
Using the Master Theorem, this solves to O(k).
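Here's a sketch of the recursion. The median step below uses sorting as a stand-in, so substitute a true linear-time selection (e.g. median of medians) to obtain the stated O(k) bound:

```python
def kth_smallest(pairs, m):
    """Find the m-th smallest element (1-based) of the multiset described
    by (key, frequency) pairs.  Median selection uses sorting here as a
    stand-in for a linear-time selection algorithm."""
    if len(pairs) == 1:
        return pairs[0][0]
    # Median pair by key (replace with linear-time selection for O(k)).
    med_key, med_freq = sorted(pairs)[len(pairs) // 2]
    lesser = [p for p in pairs if p[0] < med_key]
    greater = [p for p in pairs if p[0] > med_key]
    less = sum(f for _, f in lesser)   # elements strictly below the median
    if less < m <= less + med_freq:
        return med_key                 # m falls inside the median's run
    if m <= less:
        return kth_smallest(lesser, m)
    return kth_smallest(greater, m - less - med_freq)
```

On the example from the question, `kth_smallest([('A', 3), ('B', 2), ('C', 2), ('D', 1)], 4)` returns `'B'`, since the expanded sequence is A, A, A, B, B, C, C, D.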

Finding sub-array sum in an integer array

Given an array of N positive integers. It has n*(n+1)/2 sub-arrays, including single-element sub-arrays. Each sub-array has a sum S. Finding S for all sub-arrays is obviously O(n^2), since the number of sub-arrays is O(n^2). Many sums may be repeated as well. Is there any way to find the count of all distinct sums (not the exact values of the sums, only the count) in O(n log n)?
I tried an approach but got stuck along the way. I iterate the array from index 1 to n.
Say a[i] is the given array. For each index i, a[i] adds to all the sums in which a[i-1] is involved, and is also included as an individual element. But a duplicate emerges if, among the sums ending at a[i-1], the difference of two sums is a[i]: say sums Sp and Sq both end at a[i-1] and their difference is a[i]; then Sp + a[i] equals Sq, making Sq a duplicate.
Say C[i] is the count of the distinct sums that end at a[i].
So C[i] = C[i-1] + 1 - (number of pairs of sums ending at a[i-1] whose difference is a[i]).
But the problem is finding that number of pairs in O(log n). Please give me a hint about this, or, if I am on the wrong track and a completely different approach is required, please point that out.
When S is not too large, we can count the distinct sums with one (fast) polynomial multiplication. When S is larger, N is hopefully small enough to use a quadratic algorithm.
Let x_1, x_2, ..., x_n be the array elements. Let y_0 = 0 and y_i = x_1 + x_2 + ... + x_i. Let P(z) = z^{y_0} + z^{y_1} + ... + z^{y_n}. Compute the product of polynomials P(z) * P(z^{-1}); the coefficient of z^k with k > 0 is nonzero if and only if k is a sub-array sum, so we just have to read off the number of nonzero coefficients of positive powers. The powers of z, moreover, range from -S to S, so the multiplication takes time on the order of S log S.
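A sketch of the construction. The convolution here is done directly in O(S^2) for clarity; an FFT-based polynomial multiplication in its place gives the stated O(S log S):

```python
def count_distinct_subarray_sums(x):
    """Count distinct sub-array sums of positive integers x via the
    product P(z) * P(z^{-1}), where P(z) = sum of z^{y_i} over prefix
    sums y_i.  Direct O(S^2) convolution; use an FFT for O(S log S)."""
    # Prefix sums y_0 = 0, y_i = x_1 + ... + x_i.
    y = [0]
    for v in x:
        y.append(y[-1] + v)
    S = y[-1]
    # Coefficient vector of P(z): coeff[s] = 1 iff s is a prefix sum.
    coeff = [0] * (S + 1)
    for s in y:
        coeff[s] += 1
    # Product P(z) * P(z^{-1}): the coefficient of z^k (k >= 0) counts
    # pairs of prefix sums differing by exactly k.
    prod = [0] * (S + 1)
    for a in range(S + 1):
        for b in range(a + 1):
            prod[a - b] += coeff[a] * coeff[b]
    # Positive powers with nonzero coefficients are exactly the sub-array sums.
    return sum(1 for k in range(1, S + 1) if prod[k] > 0)
```

For example, `count_distinct_subarray_sums([1, 2, 1])` returns 4: the sub-array sums are {1, 2, 3, 4}.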
You can look at the sub-arrays as a kind of tree, in the sense that sub-array [0,3] can be divided into [0,1] and [2,3].
So build up a tree whose nodes are defined by the length of the sub-array and its starting offset in the original array, and whenever you compute a sub-array sum, store the result in this tree.
When computing a sub-array, you can check this tree for existing pre-computed values.
Also, when dividing, parts of the array can be computed on different CPU cores, if that matters.
This solution assumes that you don't need all values at once, but rather ad hoc; for the former, there may be a smarter solution.
Also, I assume we're talking about element counts in the 10000s or more. Otherwise, this is a nice exercise but has not much practical value.
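One way to sketch the cached tree (nodes keyed by starting offset and length, as suggested above; the function names are my own):

```python
from functools import lru_cache

def subarray_sums_memoized(a):
    """Compute every sub-array sum, caching each (offset, length) node so
    that a sub-array's sum is assembled from its two cached halves."""
    a = tuple(a)

    @lru_cache(maxsize=None)
    def total(offset, length):
        if length == 1:
            return a[offset]
        half = length // 2
        # Sum of [offset, offset+length) = left half + right half.
        return total(offset, half) + total(offset + half, length - half)

    n = len(a)
    # Map (i, j) inclusive index pairs to the sum of a[i..j].
    return {(i, j): total(i, j - i + 1) for i in range(n) for j in range(i, n)}
```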