Sum of different elements in tuples: non-exponential algorithm

I was working on something, and I was able to reduce a problem to a particular form: given n tuples, each of k integers, say (a1,a2,a3,a4), (b1,b2,b3,b4), (c1,c2,c3,c4), (d1,d2,d3,d4), I wish to choose any number of tuples that, when added together, give a tuple with no positive elements. If I choose tuples a and b, I get the tuple (a1+b1,a2+b2,a3+b3,a4+b4). So, if a = (1,-2,2,0) and b = (-1,1,-3,0), then a+b = (0,-1,-1,0), which includes no positive numbers and hence is a solution to the problem.
Is there a way to obtain a solution (or verify its nonexistence) using a method other than checking the sum of all subset tuples, which takes 2^n steps?
Since this question is from my head, and not a particular textbook, I do not know the proper way to express it, and my research to find an answer has been completely futile. Most of my searches directed me to the subset sum problem, where we choose k elements from a list that sum to a particular target value. My problem could be said to be a complication of that: we choose a group of tuples from a list, and we want the componentwise sums of the chosen tuples to all be <= 0.
Edit: Thanks to the link provided, and given the comments indicating that a less-than-exponential solution is difficult, solving the question for tuples whose elements range over -1, 0, and 1 will be enough for me. Furthermore, each tuple will contain from 10,000 to 20,000 integers, and there will be no more than 1000 tuples. Each tuple has at most ten 1's and ten -1's; the rest are zeroes.
If anyone could also prove that it is NP-hard (or otherwise classify its complexity), that would be great.
I failed to come up with a DP solution, and sorting doesn't seem useful.

This can be solved in pseudo polynomial time with the given constraints using dynamic programming.
Explanation
This is similar to the pseudo polynomial time dynamic programming solution for the subset sum problem. It is only extended to multiple dimensions (4).
Time complexity
O(n * sum^4), or in this case, since the sums are bounded by n,
O(n^5)
Solution
Here is a top-down dynamic programming solution with memoization in C++.
#include <iostream>
#include <unordered_map>
using namespace std;

const int N = 50;
int a[50][4] = {{ 0,  1, -1,  0},
                { 1, -1,  0,  0},
                {-1, -1,  0, -1}};
unordered_map<int, bool> dp[N];

int hashsum(int sum1, int sum2, int sum3, int sum4); // defined below

bool subset(int n, int sum1, int sum2, int sum3, int sum4)
{
    // Base case: no tuple selected and all sums are zero
    if (n == -1 && !sum1 && !sum2 && !sum3 && !sum4)
        return true;
    // Base case: no tuple selected with a non-zero sum
    else if (n == -1)
        return false;
    else if (dp[n].find(hashsum(sum1, sum2, sum3, sum4)) != dp[n].end())
        return dp[n][hashsum(sum1, sum2, sum3, sum4)];
    // Include the current tuple
    bool include = subset(n - 1,
                          sum1 - a[n][0],
                          sum2 - a[n][1],
                          sum3 - a[n][2],
                          sum4 - a[n][3]);
    // Exclude the current tuple
    bool exclude = subset(n - 1, sum1, sum2, sum3, sum4);
    return dp[n][hashsum(sum1, sum2, sum3, sum4)] = include || exclude;
}
For memoization, the hashsum is calculated as follows:
int hashsum(int sum1, int sum2, int sum3, int sum4) {
    int offset = N;
    int base = 2 * N;
    int hashSum = 0;
    hashSum += (sum1 + offset) * 1;
    hashSum += (sum2 + offset) * base;
    hashSum += (sum3 + offset) * base * base;
    hashSum += (sum4 + offset) * base * base * base;
    return hashSum;
}
The driver code can then search for any non-positive sum as follows:
int main()
{
    int n = 3;
    bool flag = false;
    int sum1, sum2, sum3, sum4;
    for (sum1 = -n; sum1 <= 0; sum1++) {
        for (sum2 = -n; sum2 <= 0; sum2++) {
            for (sum3 = -n; sum3 <= 0; sum3++) {
                for (sum4 = -n; sum4 <= 0; sum4++) {
                    if (subset(n - 1, sum1, sum2, sum3, sum4)) {
                        flag = true;
                        goto done;
                    }
                }
            }
        }
    }
done:
    if (flag && (sum1 || sum2 || sum3 || sum4))
        cout << "Solution found. " << sum1 << ' ' << sum2 << ' ' << sum3 << ' ' << sum4 << std::endl;
    else
        cout << "No solution found.\n";
    return 0;
}
Note that a trivial solution with sums (0, 0, 0, 0), where no element is ever selected, always exists, and it is therefore excluded in the driver code.

Related

Algorithm to sum a triple?

We have an array A with m positive integers. What's an algorithm that will
return true if there's a triple (x, y, z) in A
such that A[x] + A[y] + A[z] = 200,
and return false otherwise? The numbers in the array are distinct and the running time must be O(n).
I came up with O(n^3). Any ideas on how to achieve this with O(n)?
Since the elements are unique, this boils down to preprocessing the array in O(n) to filter out redundant elements - those larger than 200 (none of them can be in the triplet).
Then you have an array whose size is no larger than 200.
Checking all triplets in this array is O(200^3) = O(1) (it can be done more efficiently in terms of constants, though).
So, this will be O(n) + O(200^3) = O(n).
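For illustration, a sketch of that idea in C++ (the function name and example values are mine, not from the answer) could look like:
#include <iostream>
#include <vector>
using namespace std;

// Returns true if three distinct positions of A hold values summing to target.
bool has_triple(const vector<int>& A, int target) {
    // O(n) filter: values larger than the target can never be part of the triple.
    vector<int> small;
    for (int x : A)
        if (x <= target)
            small.push_back(x);
    // 'small' holds at most 'target' distinct values, so this is O(target^3) = O(1).
    for (size_t i = 0; i < small.size(); ++i)
        for (size_t j = i + 1; j < small.size(); ++j)
            for (size_t k = j + 1; k < small.size(); ++k)
                if (small[i] + small[j] + small[k] == target)
                    return true;
    return false;
}

int main() {
    cout << has_triple({50, 120, 30, 7, 9}, 200) << '\n'; // 1 (50 + 120 + 30)
}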
I think you can solve this problem with bit operations, such as bitset in the C++ STL.
Use three bitsets: the first caches all sums you can reach by adding 1 number, the second all sums you can reach by adding 2 numbers, and the third all sums you can reach by adding 3 numbers. Then, when a new number arrives, you can maintain the bitsets with simple bit operations.
Here is sample C++ code:
#include <bitset>
#include <iostream>
using namespace std;

int main()
{
    bitset<256> bs[4];
    for (int i = 0; i < 4; ++i)
        bs[i].reset();
    int N, number;
    cin >> N;
    while (N--)
    {
        cin >> number;
        bs[3] |= (bs[2] << number);
        bs[2] |= (bs[1] << number);
        if (number <= 200)
            bs[1].set(number);
        //cout << "1: " << bs[1] << endl;
        //cout << "2: " << bs[2] << endl;
        //cout << "3: " << bs[3] << endl;
    }
    cout << bs[3][200] << endl;
    return 0;
}
The algorithm complexity is O(n). Because bit operations are fast and each 64-bit long can cache 64 numbers, if you don't want to use bitset you can use four 64-bit integers (64 * 4 = 256) to replace it.
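As a rough illustration of that remark (my own sketch, not part of the answer), the 256-bit value can be kept in four uint64_t words and the left shift done by hand:
#include <cstdint>
#include <cstdio>

// 256-bit value stored as four 64-bit words; word 0 holds bits 0..63.
typedef uint64_t Bits256[4];

// dst |= (src << s); bits shifted past bit 255 are discarded.
void shl_or(Bits256 dst, const Bits256 src, unsigned s) {
    const int w = s / 64, b = s % 64;
    for (int j = 3; j >= w; --j) {
        uint64_t v = src[j - w] << b;
        if (b != 0 && j - w - 1 >= 0)
            v |= src[j - w - 1] >> (64 - b);
        dst[j] |= v;
    }
}

void set_bit(Bits256 v, unsigned pos) { v[pos / 64] |= 1ULL << (pos % 64); }
int  get_bit(const Bits256 v, unsigned pos) { return (v[pos / 64] >> (pos % 64)) & 1; }

int main() {
    // Same scheme as the bitset version: bs1/bs2/bs3 hold sums of 1, 2 and 3 numbers.
    Bits256 bs1 = {0}, bs2 = {0}, bs3 = {0};
    int numbers[] = {50, 120, 30, 7, 9};
    for (int x : numbers) {
        shl_or(bs3, bs2, x);
        shl_or(bs2, bs1, x);
        if (x <= 200)
            set_bit(bs1, x);
    }
    printf("%d\n", get_bit(bs3, 200)); // 1: 50 + 120 + 30 = 200
}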
I agree with amit's solution, but there is a question: how can we make it better - in our case, faster?
Here is my solution. It is almost based on amit's idea, but the asymptotic complexity is O(n + sum*(sum+1)/2), where n is the length of the input array.
Firstly, we need n steps to filter the input array and put each value that is less than the sum into a new array, where the index of the value equals the value. At the end of this step we have an array whose size is equal to sum, and we are able to access any value in O(1).
Finally, to find x, y, z we only need sum*(sum+1)/2 steps.
#include <stdio.h>
#include <string.h>

typedef struct SumATripleResult
{
    unsigned int x;
    unsigned int y;
    unsigned int z;
} SumATripleResult;

SumATripleResult sumATriple(unsigned int totalSum, unsigned int *inputArray, unsigned int n)
{
    SumATripleResult result = {0, 0, 0};
    unsigned int array[totalSum];
    memset(array, 0, sizeof(array));
    // Filter the input array and put each value into 'array' where array[value] = value
    for (size_t i = 0; i < n; i++)
    {
        unsigned int value = inputArray[i];
        if (value < totalSum)
        {
            array[value] = value;
        }
    }
    unsigned int x;
    unsigned int y;
    unsigned int z;
    for (size_t i = 0; i < totalSum; i++)
    {
        x = array[i];
        for (size_t j = i + 1; x > 0 && j < totalSum; j++)
        {
            y = array[j];
            if (y == 0 || x + y >= totalSum) continue;
            unsigned int zIdx = totalSum - (x + y);
            if (zIdx == x || zIdx == y) continue;
            z = array[zIdx];
            if (z != 0)
            {
                result.x = x;
                result.y = y;
                result.z = z;
                return result;
            }
        }
    }
    // nothing found
    return result;
}
// Test
unsigned int array[] = {1, 21, 30, 12, 15, 10, 3, 5, 6, 11, 17, 31};
SumATripleResult r = sumATriple(52, array, 12);
printf("result = %u %u %u\n", r.x, r.y, r.z);
r = sumATriple(49, array, 12);
printf("result = %u %u %u\n", r.x, r.y, r.z);
r = sumATriple(32, array, 12);
printf("result = %u %u %u\n", r.x, r.y, r.z);
This is known as the 3SUM problem and has no linear-time solution yet. Here is pseudocode running in O(n^2) that sorts the array and then uses a two-pointer scan:
sumTriple(A[1...n]: array of integers, sum: integer): bool
    sort(A)
    for i ← 1 to n-2
        j ← i+1
        k ← n
        while k > j
            if A[i]+A[j]+A[k] = sum
                print i,j,k
                return true
            else if A[i]+A[j]+A[k] > sum
                k ← k-1
            else // A[i]+A[j]+A[k] < sum
                j ← j+1
    return false
More information and further details for the problem you can find here.
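For reference, a runnable C++ version of the same scan (0-based indexing; the details are my own) might be:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// Returns true and prints one matching triple of values if A[i] + A[j] + A[k] == sum
// for some i < j < k.
bool sumTriple(vector<int> A, int sum) {
    sort(A.begin(), A.end());
    for (int i = 0; i + 2 < (int)A.size(); ++i) {
        int j = i + 1, k = (int)A.size() - 1;
        while (j < k) {
            int s = A[i] + A[j] + A[k];
            if (s == sum) {
                cout << A[i] << ' ' << A[j] << ' ' << A[k] << '\n';
                return true;
            } else if (s > sum) {
                --k;
            } else { // s < sum
                ++j;
            }
        }
    }
    return false;
}

int main() {
    cout << sumTriple({7, 9, 30, 50, 120}, 200) << '\n'; // prints "30 50 120", then 1
}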

SUM exactly using K elements solution

Problem: On a given array with N numbers, find a subset of size M (exactly M elements) that sums to SUM.
I am looking for a Dynamic Programming (DP) solution for this problem, basically looking to understand the matrix-filling approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>

#define SIZE(a) sizeof(a)/sizeof(a[0])

int binary[100];
int a[] = {1, 2, 5, 5, 100};

void show(int *p, int size) {
    int j;
    for (j = 0; j < size; j++)
        if (p[j])
            printf("%d\n", a[j]);
}

void subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        show(binary, size);
    } else if (sum < target && i < size) {
        binary[i] = 1;
        subset_sum(target, i + 1, sum + a[i], a, size, K - 1);
        binary[i] = 0;
        subset_sum(target, i + 1, sum, a, size, K);
    }
}

int main() {
    int target = 10;
    int K = 2;
    subset_sum(target, 0, 0, a, SIZE(a), K);
}
Does the recurrence below make sense?
Let DP[i][j][k] mean that we can reach sum i using exactly k elements picked from the elements 0 to j.
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = DP[0][0][k] = 1
DP[i][0][0] = DP[i][j][0] = 0
It means we either consider the current element ( DP[i-a[j]][j-1][k-1] ) or we don't consider the current element ( DP[i][j-1][k] ). If we consider the current element, k is reduced by 1, which reduces the number of elements that still need to be chosen; when the current element is not considered, k is not reduced.
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K]; // initialize all entries to -1

bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        memo[i][sum][K] = true;
        return memo[i][sum][K];
    } else if (sum < target && i < size) {
        if (memo[i][sum][K] != -1)
            return memo[i][sum][K];
        memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K - 1) ||
                          subset_sum(target, i + 1, sum, a, size, K);
        return memo[i][sum][K];
    }
    return false;
}
Then, look at memo[all indexes][target][K]. If any of these entries is true, there exists at least one solution. You can store additional information to recover the actual solution, or you can iterate with an i from found_index - 1 down to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
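For the matrix-filling (bottom-up) view the question asks about, here is a hedged C++ sketch of the three-dimensional table; the index order and the names are mine, not from the post:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> a = {1, 2, 5, 5, 100};
    const int target = 10, K = 2;
    int n = (int)a.size();
    // dp[j][s][c] == true iff some subset of the first j elements
    // uses exactly c of them and sums to s.
    vector<vector<vector<bool>>> dp(n + 1,
        vector<vector<bool>>(target + 1, vector<bool>(K + 1, false)));
    for (int j = 0; j <= n; ++j)
        dp[j][0][0] = true;                         // the empty subset
    for (int j = 1; j <= n; ++j)
        for (int s = 0; s <= target; ++s)
            for (int c = 0; c <= K; ++c) {
                dp[j][s][c] = dp[j - 1][s][c];      // skip a[j-1]
                if (c > 0 && s >= a[j - 1] && dp[j - 1][s - a[j - 1]][c - 1])
                    dp[j][s][c] = true;             // take a[j-1]
            }
    cout << dp[n][target][K] << '\n';               // 1: {5, 5} has 2 elements and sums to 10
}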
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sums up to exactly j, for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or from auxiliary information saved during evaluation.
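A hedged C++ sketch of that two-dimensional table (checking feasibility of the sum only, without the element-count constraint, exactly as described above; the names are mine) might be:
#include <iostream>
#include <vector>
using namespace std;

// feasible[i][j] == true iff some subset of a[0..i-1] sums to exactly j.
bool subsetSumFeasible(const vector<int>& a, int target) {
    int n = (int)a.size();
    vector<vector<bool>> feasible(n + 1, vector<bool>(target + 1, false));
    for (int i = 0; i <= n; ++i)
        feasible[i][0] = true;                      // the empty subset sums to 0
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= target; ++j) {
            feasible[i][j] = feasible[i - 1][j];    // do not select the i-th item
            if (j >= a[i - 1] && feasible[i - 1][j - a[i - 1]])
                feasible[i][j] = true;              // select the i-th item
        }
    return feasible[n][target];
}

int main() {
    vector<int> a = {1, 2, 5, 5, 100};
    cout << subsetSumFeasible(a, 10) << '\n';       // 1 (5 + 5)
    cout << subsetSumFeasible(a, 4) << '\n';        // 0
}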

Dividing an array into K subsets such that sum of all subsets is same using bitmasks+DP

I don't have any clue how to solve this problem. The problem statement is:
Given a set S of N integers, the task is to decide if it is possible to
divide them into K non-empty subsets such that the sum of elements in
each of the K subsets is equal.
N can be at max 20. K can be at max 8
The problem is to be solved specifically using DP+Bitmasks!
I cannot understand where to start! As there are K sets to be maintained, I cannot take K states, each representing one of them!
If I try taking the whole set as one state and K as the other, I have issues creating a recurrence relation!
Can you help?
Link to the original problem: Problem
You can solve the problem in O(N * 2^N), so the K is meaningless for the complexity.
First let me warn you about the corner case N < K with all the numbers being zero, in which the answer is "no".
The idea of my algorithm is the following. Assume we have computed the sum of each of the masks (that can be done in O(2^N)). We know that for each of the groups, the sum should be the total sum divided by K.
We can do a DP with masks in which the state is just a binary mask telling which numbers have been used. The key idea in removing the K from the algorithm complexity is noticing that if we know which numbers have been used, we know the sum so far, so we also know which group we are filling now (current sum / group sum). Then just try to select the next number for the group: it will be valid if we do not exceed the group expected sum.
You can check my C++ code:
#include <iostream>
#include <vector>
#include <cstring>
using namespace std;

typedef long long ll;

ll v[21 + 5];
ll sum[(1 << 21) + 5];
ll group_sum;
int n, k;

void compute_sums(int position, ll current_sum, int mask)
{
    if (position == -1)
    {
        sum[mask] = current_sum;
        return;
    }
    compute_sums(position - 1, current_sum, mask << 1);
    compute_sums(position - 1, current_sum + v[position], (mask << 1) + 1);
}

void solve_case()
{
    cin >> n >> k;
    for (int i = 0; i < n; ++i)
        cin >> v[i];
    memset(sum, 0, sizeof(sum));
    compute_sums(n - 1, 0, 0);
    group_sum = sum[(1 << n) - 1];
    if (group_sum % k != 0)
    {
        cout << "no" << endl;
        return;
    }
    if (group_sum == 0)
    {
        if (n >= k)
            cout << "yes" << endl;
        else
            cout << "no" << endl;
        return;
    }
    group_sum /= k;
    vector<int> M(1 << n, 0);
    M[0] = 1;
    for (int mask = 0; mask < (1 << n); ++mask)
    {
        if (M[mask])
        {
            int current_group = sum[mask] / group_sum;
            for (int i = 0; i < n; ++i)
            {
                if ((mask >> i) & 1)
                    continue;
                if (sum[mask | (1 << i)] <= group_sum * (current_group + 1))
                    M[mask | (1 << i)] = 1;
            }
        }
    }
    if (M[(1 << n) - 1])
        cout << "yes" << endl;
    else
        cout << "no" << endl;
}

int main()
{
    int cases;
    cin >> cases;
    for (int z = 1; z <= cases; ++z)
        solve_case();
}
Here's the working O(K*2^N*N) implementation in JavaScript. From the pseudo code https://discuss.codechef.com/questions/58420/sanskar-editorial
http://jsfiddle.net/d7q4o0nj/
function equality(set, size, count) {
    if (size < count) { return false; }
    var total = set.reduce(function(p, c) { return p + c; }, 0);
    if ((total % count) !== 0) { return false; }
    var subsetTotal = total / count;
    var search = {0: true};
    var nextSearch = {};
    for (var i = 0; i < count; i++) {
        for (var bits = 0; bits < (1 << size); bits++) {
            if (search[bits] !== true) { continue; }
            var sum = 0;
            for (var j = 0; j < size; j++) {
                if ((bits & (1 << j)) !== 0) { sum += set[j]; }
            }
            sum -= i * subsetTotal;
            for (var j = 0; j < size; j++) {
                if ((bits & (1 << j)) !== 0) { continue; }
                var testBits = bits | (1 << j);
                var tmpTotal = sum + set[j];
                if (tmpTotal == subsetTotal) { nextSearch[testBits] = true; }
                else if (tmpTotal < subsetTotal) { search[testBits] = true; }
            }
        }
        search = nextSearch;
        nextSearch = {};
    }
    if (search[(1 << size) - 1] === true) {
        return true;
    }
    return false;
}

console.log(true, equality([1,2,3,1,2,3], 6, 2));
console.log(true, equality([1, 2, 4, 5, 6], 5, 3));
console.log(true, equality([10,20,10,20,10,20,10,20,10,20], 10, 5));
console.log(false, equality([1,2,4,5,7], 5, 3));
EDIT: The algorithm finds all of the bitmasks (each bitmask representing a subset of the elements) that meet the criteria (having a sum tmpTotal less than or equal to the ideal subset sum subsetTotal). Repeating this process once per required subset (count times), you either end with a bitmask in which all size bits are set, which means success, or the test fails.
EXAMPLE
set = [1, 2, 1, 2]
size = 4
count = 2, we want to try to partition the set into 2 subsets
subsetTotal = (1+2+1+2) / 2 = 3
Iteration 1:
search = {0b: true, 1b: true, 10b: true, 100b: true, 1000b: true, 101b: true}
nextSearch = {11b: true, 1100b: true, 110b: true, 1001b: true }
Iteration 2:
search = {11b: true, 1100b: true, 110b: true, 1001b: true, 111b: true, 1101b: true }
nextSearch = {1111b: true}
Final Check
(1 << size) == 10000b, (1 << size) - 1 == 1111b
Since nextSearch[ 1111b ] exists we return success.
UPD: I confused N and K with each other; my idea is correct but not efficient. An efficient idea is added at the end.
Assume that so far you've created k-1 subsets, and now you want to create the k-th subset. For creating the k-th subset, you need to be able to answer these two questions:
1- What should be the sum of elements of k-th subset?
2- Which elements have been used so far ?
Answering the first question is easy: the sum should be equal to the sum of all elements divided by K; let's name it subSum.
For the second question, we need the state of each element: used or not. Here we need to use the bitmask idea.
Here's the dp recurrence:
dp[i][mask] means: is it possible to create i subsets, each with sum equal to subSum, using the elements that are 1 (not used) in mask (in its bit representation)? So dp[i][mask] is a boolean.
dp[i][mask] = OR(dp[i-1][mask2]) over all possible mask2 states. mask2 is produced by converting some 1's of mask to 0's, i.e. those 1's that we want to be the elements of the i-th subset.
For checking all possible mask2, you need to check all 2^n possible subsets of the available 1 bits. Therefore, in total, the time complexity will be O(N*(2^n)*(2^n)). In your problem this is 20*2^8*2^8 = 10*2^17 < 10^7, which can pass the time limit.
Obviously, for the base case you have to handle dp[0][mask] on your own, without using the recurrence. The final answer is whether dp[K][2^N-1] is true or not.
UPD: For better performance, before getting into the DP you could preprocess all subsets whose sum equals subSum. Then, for calculating mask2, you just need to iterate over the preprocessed list and check whether its AND with mask results in a subset that is in the list.
UPD2:
For an efficient solution, instead of finding a proper mask2, we can use the fact that at each step we know the sum of the elements used so far. So we can add elements one by one into the mask, and whenever the sum so far is divisible by K we can go to the next step and start creating the next subset:
if (sum of used elements of mask is divisible by K)
    dp[i][mask] = dp[i+1][mask];
else
    dp[i][mask] |= dp[i][mask ^ (1<<i)], provided that the i-th item is not used and adding it does not make the current sum exceed i*subSum.

Finding minimal absolute sum of a subarray

There's an array A containing (positive and negative) integers. Find a (contiguous) subarray whose elements' absolute sum is minimal, e.g.:
A = [2, -4, 6, -3, 9]
|(−4) + 6 + (−3)| = 1 <- minimal absolute sum
I've started by implementing a brute-force algorithm which was O(N^2) or O(N^3), though it produced correct results. But the task specifies:
complexity:
- expected worst-case time complexity is O(N*log(N))
- expected worst-case space complexity is O(N)
After some searching I thought that maybe Kadane's algorithm can be modified to fit this problem but I failed to do it.
My question is - is Kadane's algorithm the right way to go? If not, could you point me in the right direction (or name an algorithm that could help me here)? I don't want a ready-made code, I just need help in finding the right algorithm.
If you compute the partial sums
such as
2, 2 +(-4), 2 + (-4) + 6, 2 + (-4) + 6 + (-3)...
Then the sum of any contiguous subarray is the difference of two of the partial sums. So to find the contiguous subarray whose absolute value is minimal, I suggest that you sort the partial sums and then find the two values which are closest together, and use the positions of these two partial sums in the original sequence to find the start and end of the sub-array with smallest absolute value.
The expensive bit here is the sort, so I think this runs in time O(n * log(n)).
This is a C++ implementation of Saksow's algorithm.
int solution(vector<int> &A) {
    vector<int> P;
    int min = 20000;
    int dif = 0;
    P.resize(A.size() + 1);
    P[0] = 0;
    for (int i = 1; i < P.size(); i++)
    {
        P[i] = P[i-1] + A[i-1];
    }
    sort(P.begin(), P.end());
    for (int i = 1; i < P.size(); i++)
    {
        dif = P[i] - P[i-1];
        if (dif < min)
        {
            min = dif;
        }
    }
    return min;
}
I was doing this test on Codility and I found mcdowella's answer quite helpful, but not enough, I have to say: so here is a 2015 answer, guys!
We need to build the prefix sums of array A (called P here) like: P[0] = 0, P[1] = P[0] + A[0], P[2] = P[1] + A[1], ..., P[N] = P[N-1] + A[N-1]
The "min abs sum" of A will be the minimum absolute difference between 2 elements in P. So we just have to .sort() P and loop through it taking every time 2 successive elements. This way we have O(N + Nlog(N) + N) which equals to O(Nlog(N)).
That's it!
The answer is yes, Kadane's algorithm is definitely the way to go for solving your problem.
http://en.wikipedia.org/wiki/Maximum_subarray_problem
Source - I've worked closely with a PhD student whose entire PhD thesis was devoted to the maximum subarray problem.
def min_abs_subarray(a):
    s = [a[0]]
    for e in a[1:]:
        s.append(s[-1] + e)
    s = sorted(s)
    min = abs(s[0])
    t = s[0]
    for x in s[1:]:
        cur = abs(x)
        min = cur if cur < min else min
        cur = abs(t - x)
        min = cur if cur < min else min
        t = x
    return min
You can run Kadane's algorithm twice (or do it in one go) to find the minimum and maximum sum, where finding the minimum works the same way as finding the maximum with reversed signs, and then calculate the new maximum by comparing their absolute values.
Source: someone's comment (I don't remember whose) on this site.
Here is an iterative solution in Python. It's 100% correct.
def solution(A):
    memo = []
    if not len(A):
        return 0
    for ind, val in enumerate(A):
        if ind == 0:
            memo.append([val, -1*val])
        else:
            newElem = []
            for i in memo[ind - 1]:
                newElem.append(i+val)
                newElem.append(i-val)
            memo.append(newElem)
    return min(abs(n) for n in memo.pop())
Short, sweet, and works like a charm. A JavaScript / NodeJS solution:
function solution(A, i = 0, sum = 0) {
    // Edge case: the array is empty
    if (A.length == 0) return 0;
    // Base case: for the last array element, add it to and subtract it from the sum
    // and find the min of their absolute values
    if (A.length - 1 === i) {
        return Math.min(Math.abs(sum + A[i]), Math.abs(sum - A[i]));
    }
    // Absolute value obtained by adding the element to the sum,
    // recursively moving to the next element
    let plus = Math.abs(solution(A, i + 1, sum + A[i]));
    // Absolute value obtained by subtracting the element from the sum
    let minus = Math.abs(solution(A, i + 1, sum - A[i]));
    return Math.min(plus, minus);
}
console.log(solution([-100, 3, 2, 4]))
Here is a C solution based on Kadane's algorithm.
Hopefully it's helpful.
#include <stdio.h>

int min(int a, int b)
{
    return (a >= b) ? b : a;
}

int min_slice(int A[], int N) {
    if (N == 0 || N > 1000000)
        return 0;
    int minTillHere = A[0];
    int minSoFar = A[0];
    int i;
    for (i = 1; i < N; i++) {
        minTillHere = min(A[i], minTillHere + A[i]);
        minSoFar = min(minSoFar, minTillHere);
    }
    return minSoFar;
}

int main() {
    int A[] = {3, 2, -6, 4, 0}, N = 5;
    //int A[] = {3, 2, 6, 4, 0}, N = 5;
    //int A[] = {-4, -8, -3, -2, -4, -10}, N = 6;
    printf("Minimum slice = %d \n", min_slice(A, N));
    return 0;
}
public static int solution(int[] A) {
    int minTillHere = A[0];
    int absMinTillHere = A[0];
    int minSoFar = A[0];
    int i;
    for (i = 1; i < A.length; i++) {
        absMinTillHere = Math.min(Math.abs(A[i]), Math.abs(minTillHere + A[i]));
        minTillHere = Math.min(A[i], minTillHere + A[i]);
        minSoFar = Math.min(Math.abs(minSoFar), absMinTillHere);
    }
    return minSoFar;
}
#include <iostream>
#include <vector>
#include <climits>
#include <cstdlib>
using namespace std;

int main()
{
    int n; cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];
    long long local_min = 0, global_min = LLONG_MAX;
    for (int i = 0; i < n; i++)
    {
        if (abs(local_min + a[i]) > abs(a[i]))
        {
            local_min = a[i];
        }
        else local_min += a[i];
        global_min = min(global_min, abs(local_min));
    }
    cout << global_min << endl;
}

Perfect minimal hash for mathematical combinations

First, define two integers N and K, where N >= K, both known at compile time. For example: N = 8 and K = 3.
Next, define a set of integers [0, N) (or [1, N] if that makes the answer simpler) and call it S. For example: {0, 1, 2, 3, 4, 5, 6, 7}
The number of subsets of S with K elements is given by the formula C(N, K).
My problem is this: Create a perfect minimal hash for those subsets. The size of the example hash table will be C(8, 3) or 56.
I don't care about ordering, only that there be 56 entries in the hash table, and that I can determine the hash quickly from a set of K integers. I also don't care about reversibility.
Example hash: hash({5, 2, 3}) = 42. (The number 42 isn't important, at least not here)
Is there a generic algorithm for this that will work with any values of N and K? I wasn't able to find one by searching Google, or my own naive efforts.
There is an algorithm to encode and decode a combination into its number in the lexicographical order of all combinations with a given fixed K. The algorithm is linear in N for both encoding and decoding of the combination. What language are you interested in?
EDIT: here is example code in C++ (it finds the lexicographical number of a combination in the sequence of all combinations of n elements, as opposed to only those with k elements, but it is a really good starting point):
typedef long long ll;

// Returns the number in the lexicographical order of all combinations of n numbers
// of the provided combination.
ll code(vector<int> a, int n)
{
    sort(a.begin(), a.end());
    int cur = 0;
    int m = a.size();
    ll res = 0;
    for (int i = 0; i < a.size(); i++)
    {
        if (a[i] == cur + 1)
        {
            res++;
            cur = a[i];
            continue;
        }
        else
        {
            res++;
            int number_of_greater_nums = n - a[i];
            for (int j = a[i] - 1, increment = 1; j > cur; j--, increment++)
                res += 1LL << (number_of_greater_nums + increment);
            cur = a[i];
        }
    }
    return res;
}

// Takes the lexicographical code of a combination of n numbers and returns the
// combination
vector<int> decode(ll kod, int n)
{
    vector<int> res;
    int cur = 0;
    int left = n; // Out of how many numbers are we left to choose.
    while (kod)
    {
        ll all = 1LL << left; // how many are the total combinations
        for (int i = n; i >= 0; i--)
        {
            if (all - (1LL << (n - i + 1)) + 1 <= kod)
            {
                res.push_back(i);
                left = n - i;
                kod -= all - (1LL << (n - i + 1)) + 1;
                break;
            }
        }
    }
    return res;
}
I am sorry I don't have an algorithm for the exact problem you are asking about right now, but I believe it will be a good exercise to try to understand what I do above. Truth is, this is one of the algorithms I teach in the course "Design and analysis of algorithms", and that is why I had it pre-written.
This is what you (and I) need:
hash() maps k-tuples from [1..n] onto the set {1, ..., C(n,k)} ⊂ ℕ.
The effort is k subtractions (and O(k) is a lower bound anyway, see Strandjev's remark above):
// bino[n][k] is (n "over" k) = C(n,k) = {n \choose k}
// these are assumed to be precomputed globals
int hash(V a, int n, int k) { // V is assumed to be ordered, a_k < ... < a_1
    // hash(a_k,...,a_2,a_1) = C(n,k) - sum_{i=1}^{k} C(n - a_i, i)
    // ii is "inverse i", runs from left to right
    int res = bino[n][k];
    int i;
    for (unsigned int ii = 0; ii < a.size(); ++ii) {
        i = a.size() - ii;
        res = res - bino[n - a[ii]][i];
    }
    return res;
}
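As a usage sketch (my own, assuming V is std::vector<int> sorted ascending with values drawn from 1..n, which is how the formula appears to be intended), the bino table can be precomputed with Pascal's rule and the hash called like this:
#include <iostream>
#include <vector>
using namespace std;

const int MAXN = 64;
long long bino[MAXN][MAXN];                  // bino[n][k] = C(n, k)

void precompute_binomials() {
    for (int n = 0; n < MAXN; ++n) {
        bino[n][0] = 1;
        for (int k = 1; k < MAXN; ++k)
            bino[n][k] = (n == 0) ? 0 : bino[n - 1][k - 1] + bino[n - 1][k];
    }
}

// Same formula as above: C(n,k) - sum_i C(n - a[i], k - i), with a sorted ascending.
long long combination_rank(const vector<int>& a, int n, int k) {
    long long res = bino[n][k];
    for (int ii = 0; ii < (int)a.size(); ++ii)
        res -= bino[n - a[ii]][k - ii];
    return res;
}

int main() {
    precompute_binomials();
    // N = 8, K = 3: the ranks cover 1..C(8,3) = 56.
    cout << combination_rank({1, 2, 3}, 8, 3) << '\n';  // 1
    cout << combination_rank({6, 7, 8}, 8, 3) << '\n';  // 56
}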
