Algorithm for the largest subarray of distinct values in linear time - algorithm

I'm trying to come up with a fast algorithm for, given any array of length n, obtaining the largest subarray of distinct values.
For example, the largest subarray of distinct values of
[1, 4, 3, 2, 4, 2, 8, 1, 9]
would be
[4, 2, 8, 1, 9]
This is my current solution, I think it runs in O(n^2). This is because check_dups runs in linear time, and it is called every time j or i increments.
arr = [0,...,n]
i = 0
j = 1
i_best = i
j_best = j
while i < n-1 and j < n:
if check_dups(arr, i j): //determines if there's duplicates in the subarray i,j in linear time
i += 1
else:
if j - i > j_best - i_best:
i_best = i
j_best = j
j += 1
return subarray(arr, i_best, j_best)
Does anyone have a better solution, in linear time?
Please note this is pseudocode and I'm not looking for an answer that relies on specific existing functions of a defined language (such as arr.contains()).
Thanks!

Consider the problem of finding the largest distinct-valued subarray ending at a particular index j. Conceptually this is straightforward: starting at arr[j], you go backwards and include all elements until you find a duplicate.
Let's use this intuition to solve this problem for all j from 0 up to length(arr). We need to know, at any point in the iteration, how far back we can go before we find a duplicate. That is, we need to know the least i such that subarray(arr, i, j) contains distinct values. (I'm assuming subarray treats the indices as inclusive.)
If we knew i at some point in the iteration (say, when j = k), can we quickly update i when j = k+1? Indeed, if we knew when was the last occurrence of arr[k+1], then we can update i := max(i, lastOccurrence(arr[k+1]) + 1). We can compute lastOccurrence in O(1) time with a HashMap.
Pseudocode:
arr = ... (from input)
map = empty HashMap
i = 0
i_best = 0
j_best = 0
for j from 0 to length(arr) - 1 inclusive:
if map contains-key arr[j]:
i = max(i, map[arr[j]] + 1)
map[arr[j]] = j
if j - i > j_best - i_best:
i_best = i
j_best = j
return subarray(arr, i_best, j_best)

We can adapt pkpnd's algorithm to use an array rather than hash map for an O(n log n) solution or potentially O(n) if your data allows for an O(n) stable sort, but you'd need to implement a stable sorting function that also provides the original indexes of the elements.
1 4 3 2 4 2 8 1 9
0 1 2 3 4 5 6 7 8 (indexes)
Sorted:
1 1 2 2 3 4 4 8 9
0 7 3 5 2 1 4 6 8 (indexes)
--- --- ---
Now, instead of a hash map, build a new array by iterating over the sorted array and inserting the last occurrence of each element according to the duplicate index arrangements. The final array would look like:
1 4 3 2 4 2 8 1 9
-1 -1 -1 -1 1 3 -1 0 -1 (previous occurrence)
We're now ready to run pkpnd's algorithm with a slight modification:
arr = ... (from input)
map = previous occurrence array
i = 0
i_best = 0
j_best = 0
for j from 0 to length(arr) - 1 inclusive:
if map[j] >= 0:
i = max(i, map[j] + 1)
if j - i > j_best - i_best:
i_best = i
j_best = j
return subarray(arr, i_best, j_best)
JavaScript code:
function f(arr, map){
let i = 0
let i_best = 0
let j_best = 0
for (j=0; j<arr.length; j++){
if (map[j] >= 0)
i = Math.max(i, map[j] + 1)
if (j - i > j_best - i_best){
i_best = i
j_best = j
}
}
return [i_best, j_best]
}
let arr = [ 1, 4, 3, 2, 4, 2, 8, 1, 9]
let map = [-1,-1,-1,-1, 1, 3,-1, 0,-1]
console.log(f(arr, map))
arr = [ 1, 2, 2, 2, 2, 2, 1]
map = [-1,-1, 1, 2, 3, 4, 0]
console.log(f(arr, map))

We can use Hashtable(Dictionary in c#)
public int[] FindSubarrayWithDistinctEntities(int[] arr)
{
Dictionary<int, int> dic = new Dictionary<int, int>();
Result r = new Result(); //struct containing start and end index for subarray
int result = 0;
r.st = 1;
r.end = 1;
for (int i = 0; i < arr.Length; i++)
{
if (dic.ContainsKey(arr[i]))
{
int diff = i - (dic[arr[i]] + 1);
if(result<diff)
{
result = diff;
r.st = Math.Min(r.st, (dic[arr[i]] + 1));
r.end = i-1;
}
dic.Remove(arr[i]);
}
dic.Add(arr[i], i);
}
return arr.Skip(r.st).Take(r.end).ToArray();
}

Add every number to Hashset if it isn't already in it. Hashset's insert and search are both O(1). So final result will be O(n).

Related

Min Abs Sum task from codility

There is already a topic about this task, but I'd like to ask about my specific approach.
The task is:
Let A be a non-empty array consisting of N integers.
The abs sum of two for a pair of indices (P, Q) is the absolute value
|A[P] + A[Q]|, for 0 ≤ P ≤ Q < N.
For example, the following array A:
A[0] = 1 A1 = 4 A[2] = -3 has pairs of indices (0, 0), (0,
1), (0, 2), (1, 1), (1, 2), (2, 2). The abs sum of two for the pair
(0, 0) is A[0] + A[0] = |1 + 1| = 2. The abs sum of two for the pair
(0, 1) is A[0] + A1 = |1 + 4| = 5. The abs sum of two for the pair
(0, 2) is A[0] + A[2] = |1 + (−3)| = 2. The abs sum of two for the
pair (1, 1) is A1 + A1 = |4 + 4| = 8. The abs sum of two for the
pair (1, 2) is A1 + A[2] = |4 + (−3)| = 1. The abs sum of two for
the pair (2, 2) is A[2] + A[2] = |(−3) + (−3)| = 6. Write a function:
def solution(A)
that, given a non-empty array A consisting of N integers, returns the
minimal abs sum of two for any pair of indices in this array.
For example, given the following array A:
A[0] = 1 A1 = 4 A[2] = -3 the function should return 1, as
explained above.
Given array A:
A[0] = -8 A1 = 4 A[2] = 5 A[3] =-10 A[4] = 3 the
function should return |(−8) + 5| = 3.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [1..100,000]; each element of array A
is an integer within the range [−1,000,000,000..1,000,000,000].
The official solution is O(N*M^2), but I think it could be solved in O(N).
My approach is to first get rid of duplicates and sort the array. Then we check both ends and sompare the abs sum moving the ends by one towards each other. We try to move the left end, the right one or both. If this doesn't improve the result, our sum is the lowest. My code is:
def solution(A):
A = list(set(A))
n = len(A)
A.sort()
beg = 0
end = n - 1
min_sum = abs(A[beg] + A[end])
while True:
min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
min_both = abs(A[beg+1] + A[end-1]) if beg+1 < n and end-1 >= 0 else float('inf')
min_all = min([min_left, min_right, min_both])
if min_sum <= min_all:
return min_sum
if min_left == min_all:
beg += 1
min_sum = min_left
elif min_right == min_all:
end -= 1
min_sum = min_right
else:
beg += 1
end -= 1
min_sum = min_both
It passes almost all of the tests, but not all. Is there some bug in my code or the approach is wrong?
EDIT:
After the aka.nice answer I was able to fix the code. It scores 100% now.
def solution(A):
A = list(set(A))
n = len(A)
A.sort()
beg = 0
end = n - 1
min_sum = abs(A[beg] + A[end])
while beg <= end:
min_left = abs(A[beg+1] + A[end]) if beg+1 < n else float('inf')
min_right = abs(A[beg] + A[end-1]) if end-1 >= 0 else float('inf')
min_all = min(min_left, min_right)
if min_all < min_sum:
min_sum = min_all
if min_left <= min_all:
beg += 1
else:
end -= 1
return min_sum
Just take this example for array A
-11 -5 -2 5 6 8 12
and execute your algorithm step by step, you get a premature return:
beg=0
end=6
min_sum=1
min_left=7
min_right=3
min_both=3
min_all=3
return min_sum
though there is a better solution abs(5-5)=0.
Hint: you should check the sign of A[beg] and A[end] to decide whether to continue or exit the loop. What to do if both >= 0, if both <= 0, else ?
Note that A.sort() has a non neglectable cost, likely O(N*log(N)), it will dominate the cost of the solution you exhibit.
By the way, what is M in the official cost O(N*M^2)?
And the link you provide is another problem (sum all the elements of A or their opposite).

Bounded square sum algorithm

The problem goes as follows:
You are given two arrays of integers a and b, and two integers lower and upper.
Your task is to find the number of pairs (i, j) such that lower ≤ a[i] * a[i] + b[j] * b[j] ≤ upper.
Example:
For a = [3, -1, 9], b = [100, 5, -2], lower = 7, and upper = 99, the output should be boundedSquareSum(a, b, lower, upper) = 4.
There are only four pairs that satisfy the requirement:
If i = 0 and j = 1, then a[0] = 3, b[1] = 5, and 7 ≤ 3 * 3 + 5 * 5 = 9 + 25 = 36 ≤ 99.
If i = 0 and j = 2, then a[0] = 3, b[2] = -2, and 7 ≤ 3 * 3 + (-2) * (-2) = 9 + 4 = 13 ≤ 99.
If i = 1 and j = 1, then a[1] = -1, b[1] = 5, and 7 ≤ (-1) * (-1) + 5 * 5 = 1 + 25 = 26 ≤ 99.
If i = 2 and j = 2, then a[2] = 9, b[2] = -2, and 7 ≤ 9 * 9 + (-2) * (-2) = 81 + 4 = 85 ≤ 99.
For a = [1, 2, 3, -1, -2, -3], b = [10], lower = 0, and upper = 100, the output should be boundedSquareSum(a, b, lower, upper) = 0.
Since the array b contains only one element 10 and the array a does not contain 0, it is not possible to satisfy 0 ≤ a[i] * a[i] + 10 * 10 ≤ 100.
Now, I know there is a brute force way to solve this, but what would be the optimal solution for this problem?
Sort the smaller array using the absolute value of the elements, then for each element in the unsorted array, binary search the interval on the sorted one.
You can break loop when calculation goes higher than upper limit.
I will reduce execution time.
function boundedSquareSum(a, b, lower, upper) {
let result = 0;
a = a.sort((i,j) => Math.abs(i) - Math.abs(j));
b = b.sort((i,j) => Math.abs(i) - Math.abs(j))
for(let i = 0; i < a.length; i++) {
let aValue = a[i] ** 2;
if(aValue > upper) {
break; // Don't need to check further
}
for(let j = 0; j < b.length; j++) {
let bValue = b[j] ** 2;
let total = aValue + bValue;
if(total > upper) {
break; // Don't need to check further
}
if((total >= lower && total <= upper) ) {
result++;
}
}
}
return result;
}

How to solve sequential split dynamically

There is a number of weight in array arr.
arr= [1,5,3,2,4], each of the value in arr contains weight.
n = 2, must have 2 blocks while split the weight and order cannot break for split
Combination 1:
block 0: [1] max: 1
block 1: [5,3,2,4] max: 5
----------------------------
sum of max from block 0 and 1 is 6
Combination 2:
block 0: [1,5] max: 5
block 1: [3,2,4] max: 4
----------------------------
sum of max from block 0 and 1 is 9
Combination 3:
block 0: [1,5,3] max: 5
block 1: [2,4] max: 4
----------------------------
sum of max from block 0 and 1 is 9
Combination 4:
block 0: [1,5,3, 2] max: 5
block 1: [4] max: 4
----------------------------
sum of max from block 0 and 1 is 9
So here answer is 6 from Combination 1
The hardest part of some problems is just stating them clearly. If you can do that, the code practically writes itself.
I think the problem statement is this: Find the minimum value of a function (f) applied at every index of an array (f(array, index)), where f is the sum of the max values of two subarrays formed by splitting the input array at the given index.
function f(array, index) {
let left = array.slice(0, index)
let right = array.slice(index)
return Math.max(...left) + Math.max(...right)
}
let array = [1, 5, 3, 2, 4]
let smallestMax = Infinity
for (let i=1; i<array.length; i++) {
let max = f(array, i)
smallestMax = max < smallestMax ? max : smallestMax
}
console.log(smallestMax)
#Danh has a straightforward O(N^2) solution, but linear time is also possible without too much more work, and if you have enough data, it'll make a huge difference. Dan did his in JS it looks like, so I'll try to do the same. A bit rusty on these style for loops, so they may be off by one, but quick test in the JS console gave me 6 as expected (after fixing a copy/paste error).
Idea is to pass left to right finding the max at each index (left side). Then right to left finding the max at each index (right side). Then we look at left side plus the right side to get the value. Basically dynamic programming of a sort.
let array = [1, 5, 3, 2, 4]
let maxLeft = {}
let maxRight = {}
let max,newMax;
for (let i=0; i<array.length; i++) {
if (i === 0) {
maxLeft[i] = array[i]
} else {
maxLeft[i] = array[i] < maxLeft[i-1] ? maxLeft[i-1] : array[i]
}
}
for (let i=array.length - 1; i >= 0; i--) {
if (i === array.length - 1) {
maxRight[i] = array[i]
} else {
maxRight[i] = array[i] < maxRight[i+1] ? maxRight[i+1] : array[i]
}
}
for (let i=0; i<array.length; i++) {
newMax = maxLeft[i] + maxRight[i + 1]
if (i === 0) {
max = newMax
} else {
maxLeft[i] = newMax < maxLeft[i-1] ? max : newMax
}
}
console.log(max)

Median of two sorted arrays of different length

I am trying to understand the algorithm that solves this problem in O(log(n+m)) where n and m are the lengths of the arrays. I have taken the liberty to post the link to the explanation of this algorithm:
https://www.geeksforgeeks.org/median-of-two-sorted-arrays-of-different-sizes/
It's so hard for me to digest completely the idea behind this algorithm. I can see that the idea is to reduce the length of one of the arrays to either 1 or 2 and then apply the base cases. The base cases make sense, but I wonder if one can omit the base case for n = 2 and just work on n = 1. I also don't understand the remaining cases part. It looks so weird to me that we have to cut the array B[] from the start to idx. It's weird because idx can be equal to the length of B[], so we would be ignoring the whole array.
TL;DR:
The main idea is that you may delete N elements that are surely smaller than (or equal to) the median from your number set, as long as you delete the same amount that are surely greater or equal.
Let's explain it with an example:
A=[1 2 3 9 10], B=[3 4 5 6 7 8 9]
The middle elements marked:
A=[1 2 3 9 10], B=[3 4 5 6 7 8 9]
The overall median will be between 3 and 6, inclusive. So, if we delete two elements smaller than 3, and two elements greater than 6, we'll still have the same median. The smaller elements we delete from A, and the greater ones from B:
A=[3 9 10], B=[3 4 5 6 7]
Now we delete one element greater than 9 (from A) and one smaller than 5 (from B):
A=[3 9], B=[4 5 6 7]
We reached Case 4 (smaller array has 2 elements): the algorithm calls for the median of
B[M/2], B[M/2 – 1], max(A[0], B[M/2 – 2]), min(A[1], B[M/2 + 1])
being B[2], B[1], max(A[0], B[0]), min(A[1], B[3])
being 6, 5, max(3,4), min(9,7)
being [6 5 4 7]
The median of that array is 5.5. And that's the correct result.
def findmedian(A,B):
if len(A) > len(B):
return findmedian(B,A)# always ensuring that we do the binsearch on the shorter arr
x = len(A)
y = len(B)
start = 0# start and end of shorter arr
end = x
while (start <= end):
partition_x = (start + end)//2# the mid of smaller arr, partition_x is an index
partition_y = (x+y+1)//2 - partition_x# the suitable partition of larger arr to divide the arrs into equal halves
if partition_x == 0:# if there is nothing on the left
left_x = None
if partition_x == x:# if there is nothing on the right
right_x = sys.maxint# +inf
if partition_y == 0:
left_y = None# this is -inf similar to the case for smaller arr
if partition_y == y:
right_y = sys.maxint
if (left_x <= right_y) and (left_y <= right_x):# all the elems on left are smaller than all the elems on right is ensured by
#checking on the extremes only since arrs sorted. Also, the partition always makes equal halves, so found the right spot.
if (x+y) % 2 == 0:
return (max(left_x,left_y) + min(right_x,right_y))/2.0
else:
return max(left_x,left_y)# if the num of elems is odd
elif left_x > right_y:# if we have come more towards right of smaller arr, then move left on smaller arr
end = partition_x -1
else:# if we have come more to the left
start = partition_x + 1
class Solution(object):
def findMedianSortedArrays(self, nums1, nums2):
merged_array = (nums1 + nums2)
merged_array.sort()
l_m_a = len(merged_array)
count = int(l_m_a / 2)
if l_m_a % 2 == 1:
median = merged_array[count]
return median
else:
median_in_even = (merged_array[count] + merged_array[count - 1]) / 2
return median_in_even
class Solution:
def findMedianSortedArrays(self, nums1, nums2):
nums1.extend(nums2)
newArray = sorted(nums1)
if len(newArray)%2==0:
index = len(newArray)//2
median = (newArray[index] + newArray[index-1])/2
return float(median)
else:
index = len(newArray)//2
median = newArray[index]
return float(median)
if __name__ == '__main__':
obj = Solution()
print(obj.findMedianSortedArrays([1,3],[2]))
Median of two sorted Arrays | Same length | Different length
1st we need to merge both arrays in sorted order. And then we can find
the median. The Median will be the central element of the sorted array.
var findMedianSortedArrays = function(nums1, nums2) {
let array = [], leftIndex = 0, rightIndex = 0;
while (leftIndex < nums1.length && rightIndex < nums2.length) {
if (nums1[leftIndex] < nums2[rightIndex]) {
array.push(nums1[leftIndex]);
leftIndex++;
} else {
array.push(nums2[rightIndex]);
rightIndex++;
}
}
// add uninitiated remaining element from either array if any remains.
array = array.concat(nums1.slice(leftIndex)).concat(nums2.slice(rightIndex));
if (array.length % 2 == 0) {
return (array[(array.length / 2) - 1] + array[array.length / 2]) / 2;
} else {
return array[Math.floor(array.length / 2)]
}
};
findMedianSortedArrays([1 2 3 9 10], [3 4 5 6 7 8 9]);
/**
* #param {number[]} nums1
* #param {number[]} nums2
* #return {number}
*/
var findMedianSortedArrays = function (nums1, nums2) {
let newArray = [];
let median;
if (nums1.length > 0 && nums2.length > 0) {
newArray = [...nums1, ...nums2]
newArray.sort(function (a, b) {
return a - b;
})
} else if (nums1.length === 0) {
newArray = nums2
newArray.sort(function (a, b) {
return a - b;
})
} else if (nums2.length === 0) {
newArray = nums1
newArray.sort(function (a, b) {
return a - b;
})
}
if (newArray.length === 1) {
return newArray[0]
}
if (newArray.length === 3) {
return newArray[1]
}
if (newArray.length % 2 === 0) {
const findIndex = Math.floor(newArray.length / 2)
console.log("findIndexeven", findIndex)
const addValue = Math.max((newArray[findIndex - 1] +
newArray[findIndex]), 0)
median = addValue / 2
} else {
const findIndex = Math.floor(newArray.length / 2) + 1
console.log("findIndexodd", findIndex)
median = newArray[findIndex - 1]
}
console.log("medianValue",median)
return median
};
findMedianSortedArrays([1, 2], [3, 4])
For me, it's just several minutes of several lines of python codes and it passed the leetcode check with a runtime beating 62% of Python3 online submissions. My code is here:
class Solution:
def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
n = len(nums1) + len(nums2)
nums1.extend(nums2) # merge two lists
nums1.sort() # sort it
if n % 2 == 0:
return (nums1[n//2-1] + nums1[n//2])/2 # return the median for even n
else:
return nums1[n//2] # return the median for odd n

How to calculate the index (lexicographical order) when the combination is given

I know that there is an algorithm that permits, given a combination of number (no repetitions, no order), calculates the index of the lexicographic order.
It would be very useful for my application to speedup things...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
es: index( 2, 5, 7, 8, 10 ) --> index
EDIT: actually I'm using a java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is to speedup combination searching.
However I tried some (not all) of your solutions and the algorithms that you proposed are slower that a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>
int main()
{
const int n = 10;
const int k = 5;
int combination[k] = {2, 5, 7, 8, 10};
int index = 0;
int j = 0;
for (int i = 0; i != k; ++i)
{
for (++j; j != combination[i]; ++j)
{
index += c(n - j, k - i - 1);
}
}
std::cout << index + 1 << std::endl;
return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
if t[i - 1] + 1 <= t[i] - 1
for j from t[i - 1] to t[i] - 1
r = r + choose(n - j, k - i)
return r
The array t holds your values, for example [5 7 8 9 10]. The function choose(n, k) calculates the number "n choose k". The result value r will be the index, 251 for the example. Other inputs are n and k, for the example they would be 10 and 5.
zero-base,
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
def index_of_combination(n,k,v):
idx = 0
for p in range(k-1):
if p == 0: arrg = range(1,v[p]+1)
else: arrg = range(v[p-1]+2, v[p]+1)
for a in arrg:
idx += combi[n-a, k-1-p]
idx += v[k-1] - v[k-2] - 1
return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of positive integers 0<=x_1 < x_2< ... < x_k , then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j,j)
The beauty of the squashed order is that it works independent of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order in the set of k-subsets of {1,...,n) is by taking
1 <= x1 < ... < x_k <=n
compute
0 <= n-x_k < n-x_(k-1) ... < n-x_1
Then compute the squashed order index of (n-x_k,...,n-k_1)
Then subtract the squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a
See Anderson, Combinatorics on Finite Sets, pp 112-119
I needed also the same for a project of mine and the fastest solution I found was (Python):
import math
def nCr(n,r):
f = math.factorial
return f(n) / f(r) / f(n-r)
def index(comb,n,k):
r=nCr(n,k)
for i in range(k):
if n-comb[i]<k-i:continue
r=r-nCr(n-comb[i],k-i)
return r
My input "comb" contained elements in increasing order You can test the code with for example:
import itertools
k=3
t=[1,2,3,4,5]
for x in itertools.combinations(t, k):
print x,index(x,len(t),k)
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each comb is represented by it's index starting from zero. Then, when you need to find an index, and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program, it works with Israel lottery system so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
ByVal gAr() As Byte) As UInt32
Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
Dim mdPntr As UInt32 = mxPntr \ 2
Dim eqCntr As Byte
Dim rdAr() As Byte
modBinary.OpenFile(WHL.WHL_SYS_00, _
FileMode.Open, FileAccess.Read)
Do
modBinary.ReadBlock(mdPntr, rdAr)
RP: If eqCntr = 7 Then GoTo EX
If gAr(eqCntr) = rdAr(eqCntr) Then
eqCntr += 1
GoTo RP
ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mxPntr = mdPntr
mdPntr \= 2
ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mdPntr += (mxPntr - mdPntr) \ 2
End If
Loop Until eqCntr = 7
EX: modBinary.CloseFile()
Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
int key = 0;
key |= 1<<a;
key |= 1<<b;
key |= 1<<c;
//repeat for all arguments
return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by you storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
EDIT: Never mind. This is completely wrong.
Let your combination be (a1, a2, ..., ak-1, ak) where a1 < a2 < ... < ak. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(ak-1, k) + choose(ak-1-1, k-1) + choose(ak-2-1, k-2) + ... + choose (a2-1, 2) + choose (a1-1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than ak. The second term counts the number of (k-1)-element combinations such that the largest element is less than ak-1. And, so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
Sample solution:
class Program
{
static void Main(string[] args)
{
// The input
var n = 5;
var t = new[] { 2, 4, 5 };
// Helping transformations
ComputeDistances(t);
CorrectDistances(t);
// The algorithm
var r = CalculateRank(t, n);
Console.WriteLine("n = 5");
Console.WriteLine("t = {2, 4, 5}");
Console.WriteLine("r = {0}", r);
Console.ReadKey();
}
static void ComputeDistances(int[] t)
{
var k = t.Length;
while (--k >= 0)
t[k] -= (k + 1);
}
static void CorrectDistances(int[] t)
{
var k = t.Length;
while (--k > 0)
t[k] -= t[k - 1];
}
static int CalculateRank(int[] t, int n)
{
int k = t.Length - 1, r = 0;
for (var i = 0; i < t.Length; i++)
{
if (t[i] == 0)
{
n--;
k--;
continue;
}
for (var j = 0; j < t[i]; j++)
{
n--;
r += CalculateBinomialCoefficient(n, k);
}
n--;
k--;
}
return r;
}
static int CalculateBinomialCoefficient(int n, int k)
{
int i, l = 1, m, x, y;
if (n - k < k)
{
x = k;
y = n - k;
}
else
{
x = n - k;
y = k;
}
for (i = x + 1; i <= n; i++)
l *= i;
m = CalculateFactorial(y);
return l/m;
}
static int CalculateFactorial(int n)
{
int i, w = 1;
for (i = 1; i <= n; i++)
w *= i;
return w;
}
}
The idea behind the scenes is to associate a k-subset with an operation of drawing k-elements from the n-size set. It is a combination, so the overall count of possible items will be (n k). It is a clue that we could seek the solution in Pascal Triangle. After a while of comparing manually written examples with the appropriate numbers from the Pascal Triangle, we will find the pattern and hence the algorithm.
I used user515430's answer and converted to python3. Also this supports non-continuous values so you could pass in [1,3,5,7,9] as your pool instead of range(1,11)
from itertools import combinations
from scipy.special import comb
from pandas import Index
debugcombinations = False
class IndexedCombination:
def __init__(self, _setsize, _poolvalues):
self.setsize = _setsize
self.poolvals = Index(_poolvalues)
self.poolsize = len(self.poolvals)
self.totalcombinations = 1
fast_k = min(self.setsize, self.poolsize - self.setsize)
for i in range(1, fast_k + 1):
self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
#fill the nCr cache
self.choose_cache = {}
n = self.poolsize
k = self.setsize
for i in range(k + 1):
for j in range(n + 1):
if n - j >= k - i:
self.choose_cache[n - j,k - i] = comb(n - j,k - i, exact=True)
if debugcombinations:
print('testnth = ' + str(self.testnth()))
def get_nth_combination(self,index):
n = self.poolsize
r = self.setsize
c = self.totalcombinations
#if index < 0 or index >= c:
# raise IndexError
result = []
while r:
c, n, r = c*r//n, n-1, r-1
while index >= c:
index -= c
c, n = c*(n-r)//n, n-1
result.append(self.poolvals[-1 - n])
return tuple(result)
def get_n_from_combination(self,someset):
n = self.poolsize
k = self.setsize
index = 0
j = 0
for i in range(k):
setidx = self.poolvals.get_loc(someset[i])
for j in range(j + 1, setidx + 1):
index += self.choose_cache[n - j, k - i - 1]
j += 1
return index
#just used to test whether nth_combination from the internet actually works
def testnth(self):
n = 0
_setsize = self.setsize
mainset = self.poolvals
for someset in combinations(mainset, _setsize):
nthset = self.get_nth_combination(n)
n2 = self.get_n_from_combination(nthset)
if debugcombinations:
print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
if n != n2:
return False
for x in range(_setsize):
if someset[x] != nthset[x]:
return False
n += 1
return True
setcombination = IndexedCombination(5, list(range(1,10+1)))
print( str(setcombination.get_n_from_combination([2,5,7,8,10])))
returns 188

Resources