Find the number of possible combinations of three permuted lists - algorithm

I'm looking for a solution for this task:
There are three permuted integer lists:
index
0 1 2 3
[2,4,3,1]
[3,4,1,2]
[1,2,4,3]
I'd like to know how many combinations of three tuples across the lists there are. For example, after rotating the second list by one to the right and the third list by one to the left:
0 1 2 3
[2,4,3,1]
3 0 1 2
[2,3,4,1]
1 2 3 0
[2,4,3,1]
would result in two combinations (2,2,2) and (1,1,1). I'm only interested in the number of combinations, not the actual combinations themselves.
The lists always have the same length N. From my understanding, there is at least one combination and at most N.
I've written an imperative solution, using three nested for loops, but for larger problem sizes (e.g. N > 1000) this quickly becomes unbearably slow.
Is there a more efficient approach than brute force (trying all combinations)? Maybe some clever algorithm or a mathematical trick?
Edit:
I'm rephrasing the question to make it (hopefully) more clear:
I have 3 permutations of a list [1..N].
The lists can be individually rotated left or right, until the elements for some indexes line up. In the above example that would be:
Right rotate list 2 by 1
Left rotate list 3 by 1
Now the columns are aligned for 2 and 1.
I've also added the indexes to the example above. Please tell me if it's still unclear.
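The rotation example can be checked mechanically. The following is only a small sketch of the two rotations described above; the helper names `rotate_right` and `aligned_columns` are made up for illustration and are not part of the original code.

```python
def rotate_right(values, steps):
    """Rotate a list to the right by `steps` (negative steps rotate left)."""
    n = len(values)
    steps %= n
    return values[n - steps:] + values[:n - steps]


def aligned_columns(a, b, c):
    """Count the indexes at which all three lists hold the same value."""
    return sum(1 for x, y, z in zip(a, b, c) if x == y == z)


first = [2, 4, 3, 1]
second = rotate_right([3, 4, 1, 2], 1)    # right rotate list 2 by 1
third = rotate_right([1, 2, 4, 3], -1)    # left rotate list 3 by 1
print(second, third)                      # [2, 3, 4, 1] [2, 4, 3, 1]
print(aligned_columns(first, second, third))  # 2, the columns holding 2 and 1
```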
My code so far:
#include <iostream>
int
solve(int n, int * a, int * b, int * c)
{
int max = 0;
for (int i = 0; i < n; ++i) {
int m = 0;
for (int j = 0; j < n; ++j) {
if (a[i] == b[j]) {
for (int k = 0; k < n; ++k) {
if (a[i] == c[k]) {
for (int l = 0; l < n; ++l) {
if (a[l] == b[(l + j - i + n) % n] && a[l] == c[(l + k - i + n) % n]) {
++m;
}
}
}
}
}
}
if (m > max) {
max = m;
}
}
return max;
}
int
main(int argc, char ** argv)
{
int n = 5;
int a[] = { 1, 5, 4, 3, 2 };
int b[] = { 1, 3, 2, 4, 5 };
int c[] = { 2, 1, 5, 4, 3 };
std::cout << solve(n, a, b, c) << std::endl;
return 0;
}

Here is an efficient solution:
Let's assume that we have picked a fixed element from the first list and want to match it with the elements of the same value in the second and third lists. This uniquely determines the rotation of the second and the third list (we can assume that the first list is never rotated). It gives us a pair of integers: (the position of this element in the first list minus its position in the second list, modulo N; the same difference for the first and the third list).
Now we can iterate over all elements of the first list and generate these pairs.
The answer is the number of occurrences of the most frequent pair.
The time complexity is O(N * log N) if we use standard sort to find the most frequent pair or O(N) if we use radix sort or a hash table.
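A minimal sketch of this idea, using the hash-table variant, could look as follows. The function name `max_aligned` is an assumption made for illustration, and the sketch assumes all three lists are non-empty permutations of the same values.

```python
from collections import Counter


def max_aligned(a, b, c):
    """Largest number of columns that can be aligned by rotating b and c."""
    n = len(a)
    pos_b = {value: index for index, value in enumerate(b)}
    pos_c = {value: index for index, value in enumerate(c)}
    # For every value, record the pair of rotations that would line it up
    # with its copies in b and c (the first list is never rotated).
    shifts = Counter(
        ((i - pos_b[value]) % n, (i - pos_c[value]) % n)
        for i, value in enumerate(a)
    )
    # Values sharing the same pair are aligned by the same rotations.
    return max(shifts.values())


print(max_aligned([2, 4, 3, 1], [3, 4, 1, 2], [1, 2, 4, 3]))        # 2
print(max_aligned([1, 5, 4, 3, 2], [1, 3, 2, 4, 5], [2, 1, 5, 4, 3]))  # 3
```

Building the two position maps and the counter each take one pass, so the whole sketch runs in O(N) expected time.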

You can do it by generating all combinations, like:
0,0,0
0,0,1
0,1,0
0,1,1
1,0,0
1,0,1
1,1,0
1,1,1
Each 0 / 1 can stand for one of your arrays.
This code can help you create the list above:
private static ArrayList<String> getBinaryArray(ArrayList<Integer> array){
//calculating the possible combinations we can get
int possibleCombinations = (int) Math.pow(2, array.size());
//creating an array with all the possible combinations in binary
String binary = "";
ArrayList<String> binaryArray = new ArrayList<String>();
for (int k = 0; k <possibleCombinations; k++) {
binary = Integer.toBinaryString(k);
//adding '0' as much as we need
int len = (array.size() - binary.length());
for (int w = 1; w<=len; w++) {
binary = "0" + binary;
}
binaryArray.add(binary);
}
return binaryArray;
}
It can also be done with the digits 0/1/2, where each digit stands for one of the lists you have.
If this is not clear, please tell me.

Related

How will I solve this using DP?

Question link: http://codeforces.com/contest/2/problem/B
There is a square matrix n × n, consisting of non-negative integer numbers. You should find such a way on it that
starts in the upper left cell of the matrix;
each following cell is to the right or down from the current cell;
the way ends in the bottom right cell.
Moreover, if we multiply together all the numbers along the way, the result should be the least "round". In other words, it should end in the least possible number of zeros.
Input
The first line contains an integer number n (2 ≤ n ≤ 1000), n is the size of the matrix. Then follow n lines containing the matrix elements (non-negative integer numbers not exceeding 10^9).
Output
In the first line print the least number of trailing zeros. In the second line print the correspondent way itself.
I thought of the following: In the end, whatever the answer will be, it should contain minimum powers of 2's and 5's. Therefore, what I did was, for each entry in the input matrix, I calculated the powers of 2's and 5's and stored them in separate matrices.
for (i = 0; i < n; i++)
{
for ( j = 0; j < n; j++)
{
cin>>foo;
matrix[i][j] = foo;
int n1 = calctwo(foo); // calculates the number of 2's in factorisation of that number
int n2 = calcfive(foo); // calculates number of 5's
two[i][j] = n1;
five[i][j] = n2;
}
}
After that, I did this:
for (i = 0; i < n; i++)
{
for ( j = 0; j < n; j++ )
{
dp[i][j] = min(two[i][j],five[i][j]); // Here, dp[i][j] will store minimum number of 2's and 5's.
}
}
But the above doesn't give a valid answer, and I don't know why. Have I implemented the approach correctly? Is this even the right way to solve this problem?
Edit: Here are my functions of calculating the number of two's and number of five's in a number.
int calctwo (int foo)
{
int counter = 0;
while (foo%2 == 0)
{
if (foo%2 == 0)
{
counter++;
foo = foo/2;
}
else
break;
}
return counter;
}
int calcfive (int foo)
{
int counter = 0;
while (foo%5 == 0)
{
if (foo%5 == 0)
{
counter++;
foo = foo/5;
}
else
break;
}
return counter;
}
Edit2: I/O Example as given in the link:
Input:
3
1 2 3
4 5 6
7 8 9
Output:
0
DDRR
Since you are interested only in the number of trailing zeroes you need only to consider the powers of 2, 5 which you could keep in two separate nxn arrays. So for the array
1 2 3
4 5 6
7 8 9
you just keep the arrays
the powers of 2    the powers of 5
0 1 0              0 0 0
2 0 1              0 1 0
0 3 0              0 0 0
The insight for the problem is the following. Notice that if you find a path which minimizes the sum of the powers of 2 and a path which minimizes the sum of the powers of 5, then the answer is the one of those two paths with the lower value. So you reduce your problem to two applications of the following classical DP problem: find a path, starting from the top-left corner and ending at the bottom-right, such that the sum of its elements is minimum. Again, following the example, we have:
minimal path for the
powers of 2        value
* * -              2
- * *
- - *
minimal path for the
powers of 5        value
* - -              0
* - -
* * *
so your answer is
* - -
* - -
* * *
with value 0
Note 1
It might seem that taking the minimum of the two optimal paths gives only an upper bound, so a question that may arise is: is this bound actually achieved? The answer is yes. For convenience, let the number of 2's along the 2's optimal path be a and the number of 5's along the 5's optimal path be b. Without loss of generality, assume that the minimum of the two optimal values is the one for the powers of 2 (that is, a <= b). Let the number of 5's along the 2's optimal path be c. Now the question is: are there at least as many 5's as 2's along this path (i.e. is c >= a)? Assume that the answer is no. That means that there are fewer 5's than 2's along this path (that is, c < a). Since the optimal value over all paths for the 5's is b, every path has at least b 5's in it. This must also be true for the 2's optimal path, so c >= b. Together with c < a this gives b <= c < a, so a > b, but the initial assumption was that a <= b. Contradiction.
Note 2
You might also want to consider the case in which there is an element 0 in your matrix. I'd assume that the number of trailing zeroes is 1 when the product is 0. In this case, if the algorithm has produced a result with a value greater than 1, you should output 1 and print a path that goes through the element 0.
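The two-pass idea, including the zero handling from Note 2, can be sketched as below. This is an illustration under stated assumptions rather than a tuned contest solution: the names `factor_count`, `min_sum_path` and `least_round_path` are invented here, and a 0 entry is treated like 10 while counting factors so that the counting loop terminates.

```python
def factor_count(x, p):
    """Number of times the prime p divides the positive integer x."""
    count = 0
    while x % p == 0:
        x //= p
        count += 1
    return count


def min_sum_path(weights):
    """Cheapest top-left to bottom-right path using only D and R moves."""
    n = len(weights)
    cost = [[0] * n for _ in range(n)]
    step = [[""] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                cost[i][j] = weights[0][0]
            elif j == 0 or (i > 0 and cost[i - 1][j] <= cost[i][j - 1]):
                cost[i][j] = cost[i - 1][j] + weights[i][j]
                step[i][j] = "D"   # best predecessor is the cell above
            else:
                cost[i][j] = cost[i][j - 1] + weights[i][j]
                step[i][j] = "R"   # best predecessor is the cell to the left
    # Walk back from the destination to recover the moves.
    moves = []
    i = j = n - 1
    while i > 0 or j > 0:
        moves.append(step[i][j])
        if step[i][j] == "D":
            i -= 1
        else:
            j -= 1
    return cost[n - 1][n - 1], "".join(reversed(moves))


def least_round_path(grid):
    """Return (number of trailing zeros, path) for the least round product."""
    n = len(grid)
    zero = next(((i, j) for i in range(n) for j in range(n)
                 if grid[i][j] == 0), None)
    # Assumption: count a 0 entry as 10 (one factor 2 and one factor 5).
    safe = [[v if v != 0 else 10 for v in row] for row in grid]
    twos = min_sum_path([[factor_count(v, 2) for v in row] for row in safe])
    fives = min_sum_path([[factor_count(v, 5) for v in row] for row in safe])
    best = twos if twos[0] <= fives[0] else fives
    if zero is not None and best[0] > 1:
        zi, zj = zero
        # Any path through the zero yields the product 0: one trailing zero.
        return 1, "D" * zi + "R" * zj + "D" * (n - 1 - zi) + "R" * (n - 1 - zj)
    return best


print(least_round_path([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (0, 'RRDD')
```

For the sample grid this sketch prints the path RRDD instead of DDRR; both products end in no zeros, so the difference comes only from how ties are broken.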
Here is the code. I've used pair<int,int> to store factor of 2 and 5 in the matrix.
#include<vector>
#include<iostream>
using namespace std;
#define pii pair<int,int>
#define F first
#define S second
#define MP make_pair
int calc2(int a){
int c=0;
while(a%2==0){
c++;
a/=2;
}
return c;
}
int calc5(int a){
int c=0;
while(a%5==0){
c++;
a/=5;
}
return c;
}
int mini(int a,int b){
return a<b?a:b;
}
pii min(pii a, pii b){
if(mini(a.F,a.S) < mini(b.F,b.S))
return a;
return b;
}
int main(){
int n;
cin>>n;
vector<vector<pii > > v;
vector<vector<int> > path;
int i,j;
for(i=0;i<n;i++){
vector<pii > x;
vector<int> q(n,0);
for(j=0;j<n;j++){
int y;cin>>y;
x.push_back(MP(calc2(y),calc5(y))); //I store factors of 2,5 in the vector to calculate
}
x.push_back(MP(100000,100000)); //padding each row to n+1 elements (to handle overflow in code)
v.push_back(x);
path.push_back(q); //initialize path matrix to 0
}
vector<pii > x(n+1,MP(100000,100000));
v.push_back(x); //pad 1 more row to handle index overflow
for(i=n-1;i>=0;i--){
for(j=n-1;j>=0;j--){ //move from destination to source grid
if(i==n-1 && j==n-1)
continue;
//here, the LHS of condition in if block is the condition which determines minimum number of trailing 0's. This is the same condition that is used to manipulate "v" for getting the same result.
if(min(MP(v[i][j].F+v[i+1][j].F,v[i][j].S+v[i+1][j].S), MP(v[i][j].F+v[i][j+1].F,v[i][j].S+v[i][j+1].S)) == MP(v[i][j].F+v[i+1][j].F,v[i][j].S+v[i+1][j].S))
path[i][j] = 1; //go down
else
path[i][j] = 2; //go right
v[i][j] = min(MP(v[i][j].F+v[i+1][j].F,v[i][j].S+v[i+1][j].S), MP(v[i][j].F+v[i][j+1].F,v[i][j].S+v[i][j+1].S));
}
}
cout<<mini(v[0][0].F, v[0][0].S)<<endl; //print result
for(i=0,j=0;i<=n-1 && j<=n-1;){ //print path (I don't know o/p format)
cout<<"("<<i<<","<<j<<") -> ";
if(path[i][j]==1)
i++;
else
j++;
}
return 0;
}
This code gives fine results as far as the test cases I checked. If you have any doubts regarding this code, ask in comments.
EDIT:
The basic thought process.
From any cell there are only 2 options for reaching the destination: go down or go right. I started from the destination to avoid having to look ahead along the path: if both options have the same minimum value, we can choose either of them, and since the path from the next cell to the destination has already been calculated, it does not matter which one we take.
The min function checks which pair is more suitable: the pair whose smaller count of 2's or 5's is lower will produce fewer 0's.
Here is a solution proposal using Javascript and functional programming.
It relies on several functions:
the core function is smallest_trailer, which recursively goes through the grid. I have chosen to go in 4 possible directions: left "L", right "R", down "D" and up "U". It is not possible to pass through the same cell twice. The direction that is chosen is the one with the smallest number of trailing zeros. The counting of trailing zeros is delegated to another function.
the function zero_trailer(p,n,nbz) assumes that you arrive on a cell with a value p while you already have an accumulator n and met nbz zeros on your way. The function returns an array with two elements, the new number of zeros and the new accumulator. The accumulator will be a power of 2 or 5. The function uses the auxiliary function pow_2_5(n) that returns the powers of 2 and 5 inside n.
The other functions are incidental: deepCopy(arr) makes a standard deep copy of the array arr, out_bound(i,j,n) returns true if the cell (i,j) is out of bounds of the grid of size n, and myMinIndex(arr) returns the index of the minimum of an array of 2-element arrays (each subarray contains the number of trailing zeros and the path as a string). The minimum is taken only over the first element of the subarrays.
MAX_SAFE_INTEGER is a (large) constant for the maximal number of trailing zeros when the path is wrong (goes out of bound for example).
Here is the code, which works on the example given in the comments above and in the original link.
var MAX_SAFE_INTEGER = 9007199254740991;
function pow_2_5(n) {
// returns the power of 2 and 5 inside n
function pow_not_2_5(k) {
if (k%2===0) {
return pow_not_2_5(k/2);
}
else if (k%5===0) {
return pow_not_2_5(k/5);
}
else {
return k;
}
}
return n/pow_not_2_5(n);
}
function zero_trailer(p,n,nbz) {
// takes an input two numbers p and n that should be multiplied and a given initial number of zeros (nbz = nb of zeros)
// n is the accumulator of previous multiplications (a power of 5 or 2)
// returns an array [kbz, k] where kbz is the total new number of zeros (nbz + the trailing zeros from the multiplication of p and n)
// and k is the new accumulator (typically a power of 5 or 2)
function zero_aux(k,kbz) {
if (k===0) {
return [1,0];
}
else if (k%10===0) {
return zero_aux(k/10,kbz+1);
}
else {
return [kbz,k];
}
}
return zero_aux(pow_2_5(p)*n,nbz);
}
function out_bound(i,j,n) {
return !((i>=0)&&(i<n)&&(j>=0)&&(j<n));
}
function deepCopy(arr){
var toR = new Array(arr.length);
for(var i=0;i<arr.length;i++){
var toRi = new Array(arr[i].length);
for(var j=0;j<arr[i].length;j++){
toRi[j] = arr[i][j];
}
toR[i] = toRi;
}
return toR;
}
function myMinIndex(arr) {
var min = arr[0][0];
var minIndex = 0;
for (var i = 1; i < arr.length; i++) {
if (arr[i][0] < min) {
minIndex = i;
min = arr[i][0];
}
}
return minIndex;
}
function smallest_trailer(grid) {
var n = grid.length;
function st_aux(i,j,grid_aux, acc_mult, nb_z, path) {
if ((i===n-1)&&(j===n-1)) {
var tmp_acc_nbz_f = zero_trailer(grid_aux[i][j],acc_mult,nb_z);
return [tmp_acc_nbz_f[0], path];
}
else if (out_bound(i,j,n)) {
return [MAX_SAFE_INTEGER,[]];
}
else if (grid_aux[i][j]<0) {
return [MAX_SAFE_INTEGER,[]];
}
else {
var tmp_acc_nbz = zero_trailer(grid_aux[i][j],acc_mult,nb_z) ;
grid_aux[i][j]=-1;
var res = [st_aux(i+1,j,deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path+"D"),
st_aux(i-1,j,deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path+"U"),
st_aux(i,j+1,deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path+"R"),
st_aux(i,j-1,deepCopy(grid_aux), tmp_acc_nbz[1], tmp_acc_nbz[0], path+"L")];
return res[myMinIndex(res)];
}
}
return st_aux(0,0,grid, 1, 0, "");
}
myGrid = [[1, 25, 100],[2, 1, 25],[100, 5, 1]];
console.log(smallest_trailer(myGrid)); //[0,"RDDR"]
myGrid = [[1, 2, 100],[25, 1, 5],[100, 25, 1]];
console.log(smallest_trailer(myGrid)); //[0,"DRDR"]
myGrid = [[1, 10, 1, 1, 1],[1, 1, 1, 10, 1],[10, 10, 10, 10, 1],[10, 10, 10, 10, 1],[10, 10, 10, 10, 1]];
console.log(smallest_trailer(myGrid)); //[0,"DRRURRDDDD"]
This is my Dynamic Programming solution.
https://app.codility.com/demo/results/trainingAXFQ5B-SZQ/
For better understanding we can simplify the task and assume that there are no zeros in the matrix (i.e. matrix contains only positive integers), then the Java solution will be the following:
class Solution {
public int solution(int[][] a) {
int minPws[][] = new int[a.length][a[0].length];
int minPws2 = getMinPws(a, minPws, 2);
int minPws5 = getMinPws(a, minPws, 5);
return min(minPws2, minPws5);
}
private int getMinPws(int[][] a, int[][] minPws, int p) {
minPws[0][0] = pws(a[0][0], p);
//Fill the first row
for (int j = 1; j < a[0].length; j++) {
minPws[0][j] = minPws[0][j-1] + pws(a[0][j], p);
}
//Fill the first column
for (int i = 1; i < a.length; i++) {
minPws[i][0] = minPws[i-1][0] + pws(a[i][0], p);
}
//Fill the rest of the matrix
for (int i = 1; i < a.length; i++) {
for (int j = 1; j < a[0].length; j++) {
minPws[i][j] = min(minPws[i-1][j], minPws[i][j-1]) + pws(a[i][j], p);
}
}
return minPws[a.length-1][a[0].length-1];
}
private int pws(int n, int p) {
//Only when n > 0
int pws = 0;
while (n % p == 0) {
pws++;
n /= p;
}
return pws;
}
private int min(int a, int b) {
return (a < b) ? a : b;
}
}

SUM exactly using K elements solution

Problem: Given an array of N numbers, find a subset of size M (exactly M elements) whose sum equals SUM.
I am looking for a Dynamic Programming (DP) solution for this problem. Basically I want to understand the table-filling approach. I wrote the program below but didn't add memoization, as I am still wondering how to do that.
#include <stdio.h>
#define SIZE(a) sizeof(a)/sizeof(a[0])
int binary[100];
int a[] = {1, 2, 5, 5, 100};
void show(int* p, int size) {
int j;
for (j = 0; j < size; j++)
if (p[j])
printf("%d\n", a[j]);
}
void subset_sum(int target, int i, int sum, int *a, int size, int K) {
if (sum == target && !K) {
show(binary, size);
} else if (sum < target && i < size) {
binary[i] = 1;
subset_sum(target, i + 1, sum + a[i], a, size, K-1);
binary[i] = 0;
subset_sum(target, i + 1, sum, a, size, K);
}
}
int main() {
int target = 10;
int K = 2;
subset_sum(target, 0, 0, a, SIZE(a), K);
}
Does the recurrence below make sense?
Let DP[i][j][k] be true if the sum i can be reached with exactly k elements picked from the elements a[0..j].
DP[i][j][k] = DP[i][j-1][k] || DP[i-a[j]][j-1][k-1] { input array a[0....j] }
Base cases are:
DP[0][0][0] = DP[0][j][0] = DP[0][0][k] = 1
DP[i][0][0] = DP[i][j][0] = 0
It means we either take the current element (DP[i-a[j]][j-1][k-1]) or skip it (DP[i][j-1][k]). If we take the current element, k is reduced by 1, which reduces the number of elements that still need to be picked; if we skip it, k stays the same.
Your solution looks right to me.
Right now, you're basically backtracking over all possibilities and printing each solution. If you only want one solution, you could add a flag that you set when one solution was found and check before continuing with recursive calls.
For memoization, you should first get rid of the binary array, after which you can do something like this:
// Fill memo with -1 before the first call. The dimensions must cover
// i <= size, sum <= target and every value of K down to 0.
int memo[NUM_ELEMENTS][MAX_SUM][MAX_K];
bool subset_sum(int target, int i, int sum, int *a, int size, int K) {
    if (sum == target && !K) {
        memo[i][sum][K] = true;
        return memo[i][sum][K];
    } else if (K > 0 && sum < target && i < size) {
        if (memo[i][sum][K] != -1)
            return memo[i][sum][K];
        memo[i][sum][K] = subset_sum(target, i + 1, sum + a[i], a, size, K - 1) ||
                          subset_sum(target, i + 1, sum, a, size, K);
        return memo[i][sum][K];
    }
    return false;
}
Then, look at memo[_all indexes_][target][K]. If any of these is true, there exists at least one solution. You can store additional information to recover that solution, or you can iterate with an i from found_index - 1 down to 0 and check for which i you have memo[i][sum - a[i]][K - 1] == true. Then recurse on that, and so on. This will allow you to reconstruct the solution using just the memo array.
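The memoization and the reconstruction step can also be sketched compactly. The version below is a hedged illustration, not the code from this answer: it memoizes on (index, remaining sum, elements still to pick) with `functools.lru_cache`, assumes the array holds non-negative integers, and the name `subset_with_k_elements` is made up for the example.

```python
from functools import lru_cache


def subset_with_k_elements(a, target, k):
    """Return one list of exactly k elements of a summing to target, or None."""

    @lru_cache(maxsize=None)
    def feasible(i, remaining, left):
        # Can a[i:] supply exactly `left` elements that sum to `remaining`?
        if left == 0:
            return remaining == 0
        if i == len(a) or remaining < 0:  # assumes non-negative elements
            return False
        return (feasible(i + 1, remaining - a[i], left - 1)
                or feasible(i + 1, remaining, left))

    if not feasible(0, target, k):
        return None
    # Reconstruct one solution by replaying the memoized decisions.
    chosen = []
    remaining, left = target, k
    for i, value in enumerate(a):
        if left > 0 and feasible(i + 1, remaining - value, left - 1):
            chosen.append(value)
            remaining -= value
            left -= 1
    return chosen


print(subset_with_k_elements([1, 2, 5, 5, 100], 10, 2))  # [5, 5]
print(subset_with_k_elements([1, 2, 5, 5, 100], 10, 3))  # None
```

The number of distinct states is bounded by N * (SUM + 1) * (K + 1), which matches the size of the three-dimensional table in the question.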
To my understanding, if only the feasibility of the input has to be checked, the problem can be solved with a two-dimensional state space
bool[][] IsFeasible = new bool[n][k]
where IsFeasible[i][j] is true if and only if there is a subset of the elements 1 to i which sum up to exactly j for every
1 <= i <= n
1 <= j <= k
and for this state space, the recurrence relation
IsFeasible[i][j] = IsFeasible[i-1][j-a[i]] || IsFeasible[i-1][j]
can be used, where the left-hand side of the or-operator || corresponds to selecting the i-th item and the right-hand side corresponds to not selecting the i-th item. The actual choice of items could be obtained by backtracking or auxiliary information saved during evaluation.
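A sketch of this feasibility check is shown below. Note that, like the two-dimensional state space above, it only answers whether some subset reaches the sum and ignores the requirement of exactly M elements. It keeps a single row of the table and updates it in place, assumes positive integers, and uses the invented name `subset_sum_feasible`.

```python
def subset_sum_feasible(a, target):
    """True if some subset of the positive integers in a sums to target."""
    # reachable[s] plays the role of IsFeasible[i][s] for the current i.
    reachable = [True] + [False] * target
    for value in a:
        # Go downwards so that each item is used at most once.
        for s in range(target, value - 1, -1):
            if reachable[s - value]:
                reachable[s] = True
    return reachable[target]


print(subset_sum_feasible([1, 2, 5, 5, 100], 10))  # True
print(subset_sum_feasible([1, 2, 5, 5, 100], 4))   # False
```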

Implementing quickselect

I'm trying to implement the quickselect algorithm. Although I have understood the theory behind it very well, I'm finding it difficult to convert it into a well-functioning program.
Here is how I'm implementing it step by step, and where I am facing a problem:
Problem: Find the 4th smallest element in A[] = {2,1,3,7,5,4,6}
k = 4.
index:0|1|2|3|4|5|6
Corresponding values: 2|1|3|7|5|4|6
initially, l = 0 and r = 6
Step 1) Taking pivot as the leftmost element (pivot will always be the leftmost in this problem)-
pivot_index = 0
pivot_value = 2
Step 2) Applying the partition algo; putting the pivot at the right place ([<p][p][>p])-
We get the following array: 1|2|3|7|5|4|6
where, pivot_index = i-1 = 1
and therefore, pivot_value = 2
Step 3) Compare pivot_index with k-
k=3, pivot_index = 1; k>pivot_index
Hence, Our k-th smallest number lies in the right part of the array.
Right array = i to r and we do not bother with the left part (l to i-1) anymore.
Step 4) We modify the value of k as k - (pivot_index) => 4-1 = 3; k = 3.
Here is the problem: Should not the value of k be 2? Because we have two values on the left part of the array: 1|2? Should we calculate k as k - (pivot_index+1)?
Let's assume k = 3 is correct.
Step 5) "New" array to work on: 3|7|5|4|6 with corresponding indexes: 2|3|4|5|6
Now, pivot_index = 2 and pivot_value = 3
Step 6) Applying partition algo on the above array-
3|7|5|4|6 (array remains unchanged as pivot itself is the lowest value).
i = 3
pivot_index = i-1 = 2
pivot_value = 3
Step 7) Compare pivot_index with k
k=3 and pivot_index=2
k > pivot_index
and so on....
Is this approach correct?
Here is my code which is not working. I have used a random number generator to select a random pivot, the pivot is then swapped with the first element in the array.
#include<stdio.h>
#include<stdlib.h>
void print_array(int arr[], int array_length){
int i;
for(i=0; i<array_length; ++i) {
printf("%d ", arr[i]);
}
}
int random_no(min, max){
int diff = max-min;
return (int) (((double)(diff+1)/RAND_MAX) * rand() + min);
}
void swap(int *a, int *b){
int temp;
temp = *a;
*a = *b;
*b = temp;
}
int get_kth_small(int arr[], int k, int l, int r){
if((r-l) >= 1){
k = k + (l-1);
int pivot_index = random_no(l, r);
int i, j;
swap(&arr[pivot_index], &arr[l]); //Switch the pivot with the first element in the array. Now, the pivit is in arr[l]
i=l+1;
for(j=l+1; j<=r; ++j){
if(arr[j]<arr[l]){
swap(&arr[j], &arr[i]);
++i;
}
}
swap(&arr[l], &arr[i-1]); //Switch the pivot to the correct place; <p, p, >p
printf("value of i-1: %d\n", i-1);
printf("Value of k: %d\n", k);
if(k == (i-1)){
printf("Found: %d\n", arr[i]);
return 0;
}
if(k>(i-1)){
k=k-(i-1);
get_kth_small(arr, k, i, r);
} else {
get_kth_small(arr, k, l, r-1);
}
//get_kth_small(arr, k, i, r);
//get_kth_small(arr, k, l, i-1);
}
}
void main(){
srand(time(NULL));
int arr[] = {2,1,3,7,5,4,6};
int arr_size = sizeof(arr)/sizeof(arr[0]);
int k = 3, l = 0;
int r = arr_size - 1;
//printf("Enter the value of k: ");
//scanf("%d", &k);
get_kth_small(arr, k, l, r);
print_array(arr, arr_size);
printf("\n");
}
What you describe is a valid way to implement quick select. There are numerous other ways to select the pivot, and most of them will give a better expected complexity, but in essence the algorithm is the same.
"Step 2: putting the pivot at the right place": don't do that. In fact you can't put the pivot at the right place, as you don't know what it is. The partitioning rule is to put all elements smaller than or equal to the pivot before those larger. Just leave the pivot where it is!
Quick select goes as follows: to find the Kth among N elements, 1) choose a pivot value, 2) move all elements smaller than or equal to the pivot before the others, forming two zones of length Nle and Ngt, 3) recurse on the relevant zone with (K, Nle) or (K-Nle, Ngt), until N=1.
Actually, any value can be taken for the pivot, even one not present in the array; but the partition must be such that Nle and Ngt are nonzero.
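As a rough illustration of the scheme, here is a short sketch with a 1-based K. It deliberately swaps in a three-way split (smaller, equal, larger) instead of the two zones described above, because that guarantees progress even when the pivot is the maximum or the array contains duplicates. The function name `quickselect` and the use of new lists rather than in-place swaps are choices made only for this example.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element of values (k is 1-based)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if k <= len(smaller):
            items = smaller                      # answer lies left of the pivot
        elif k <= len(smaller) + equal_count:
            return pivot                         # answer equals the pivot
        else:
            k -= len(smaller) + equal_count      # skip everything <= pivot
            items = larger


print(quickselect([2, 1, 3, 7, 5, 4, 6], 4))  # 4
```

The adjustment of k in the last branch is the point raised in the question: k shrinks by the number of elements discarded, not by the pivot's index.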

Perfect minimal hash for mathematical combinations

First, define two integers N and K, where N >= K, both known at compile time. For example: N = 8 and K = 3.
Next, define a set of integers [0, N) (or [1, N] if that makes the answer simpler) and call it S. For example: {0, 1, 2, 3, 4, 5, 6, 7}
The number of subsets of S with K elements is given by the formula C(N, K).
My problem is this: Create a perfect minimal hash for those subsets. The size of the example hash table will be C(8, 3) or 56.
I don't care about ordering, only that there be 56 entries in the hash table, and that I can determine the hash quickly from a set of K integers. I also don't care about reversibility.
Example hash: hash({5, 2, 3}) = 42. (The number 42 isn't important, at least not here)
Is there a generic algorithm for this that will work with any values of N and K? I wasn't able to find one by searching Google, or my own naive efforts.
There is an algorithm to code and decode a combination into its number in the lexicographical order of all combinations with a given fixed K. The algorithm is linear in N for both coding and decoding the combination. What language are you interested in?
EDIT: here is example code in C++ (it finds the lexicographical number of a combination in the sequence of all combinations of n elements, as opposed to only the ones with k elements, but it is a really good starting point):
typedef long long ll;
// Returns the number in the lexicographical order of all combinations of n numbers
// of the provided combination.
ll code(vector<int> a,int n)
{
sort(a.begin(),a.end());
int cur = 0;
int m = a.size();
ll res =0;
for(int i=0;i<a.size();i++)
{
if(a[i] == cur+1)
{
res++;
cur = a[i];
continue;
}
else
{
res++;
int number_of_greater_nums = n - a[i];
for(int j = a[i]-1,increment=1;j>cur;j--,increment++)
res += 1LL << (number_of_greater_nums+increment);
cur = a[i];
}
}
return res;
}
// Takes the lexicographical code of a combination of n numbers and returns the
// combination
vector<int> decode(ll kod, int n)
{
vector<int> res;
int cur = 0;
int left = n; // Out of how many numbers are we left to choose.
while(kod)
{
ll all = 1LL << left;// how many are the total combinations
for(int i=n;i>=0;i--)
{
if(all - (1LL << (n-i+1)) +1 <= kod)
{
res.push_back(i);
left = n-i;
kod -= all - (1LL << (n-i+1)) +1;
break;
}
}
}
return res;
}
I am sorry I don't have an algorithm for the exact problem you are asking about right now, but I believe it will be a good exercise to try to understand what I do above. The truth is this is one of the algorithms I teach in the course "Design and analysis of algorithms", and that is why I had it pre-written.
This is what you (and I) need:
hash() maps k-tuples from [1..n] onto the set 1..C(n,k)\subset N.
The effort is k subtractions (and O(k) is a lower bound anyway, see Strandjev's remark above):
// bino[n][k] is (n "over" k) = C(n,k) = {n \choose k}
// these are assumed to be precomputed globals
int hash(V a,int n, int k) {// V is assumed to be ordered, a_k<...<a_1
// hash(a_k,..,a_2,a_1) = (n k) - sum_(i=1)^k (n-a_i i)
// ii is "inverse i", runs from left to right
int res = bino[n][k];
int i;
for(unsigned int ii = 0; ii < a.size(); ++ii) {
i = a.size() - ii;
res = res - bino[n-a[ii]][i];
}
return res;
}
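The same formula can be tried out quickly with a sketch like the one below. It assumes elements from [1..n], sorts the subset itself instead of requiring ordered input, and computes the binomials with `math.comb` rather than a precomputed table; the name `combination_hash` is invented for the example.

```python
from itertools import combinations
from math import comb


def combination_hash(subset, n):
    """Map a k-subset of {1, ..., n} to a distinct integer in 1..C(n, k)."""
    items = sorted(subset)          # smallest element first
    k = len(items)
    result = comb(n, k)
    for idx, value in enumerate(items):
        # The smallest element is paired with k, the largest with 1.
        result -= comb(n - value, k - idx)
    return result


print(combination_hash({5, 2, 3}, 8))  # 23
all_hashes = sorted(combination_hash(c, 8) for c in combinations(range(1, 9), 3))
print(all_hashes == list(range(1, 57)))  # True: perfect and minimal for N=8, K=3
```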

Dynamic programming exercise for string cutting

I have been working on the following problem from this book.
A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.
I need a dynamic programming algorithm that given m cuts, finds the minimum cost of cutting a string into m+1 pieces.
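To make the cost model concrete, the following sketch computes the total cost of performing the cuts in a given order. It is only an illustration of the example from the exercise; the function name `cost_of_cut_order` is hypothetical, and cut positions are assumed to lie strictly inside the string.

```python
def cost_of_cut_order(n, order):
    """Total cost of cutting a string of length n at the positions in order."""
    pieces = [(0, n)]               # current pieces as (start, end) intervals
    total = 0
    for cut in order:
        for idx, (start, end) in enumerate(pieces):
            if start < cut < end:
                total += end - start            # pay the length of the piece
                pieces[idx:idx + 1] = [(start, cut), (cut, end)]
                break
    return total


print(cost_of_cut_order(20, [3, 10]))  # 37 = 20 + 17
print(cost_of_cut_order(20, [10, 3]))  # 30 = 20 + 10
```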
The divide and conquer approach seems to me the best one for this kind of problem. Here is a Java implementation of the algorithm:
Note: the array m should be sorted in ascending order (use Arrays.sort(m);)
public int findMinCutCost(int[] m, int n) {
int cost = n * m.length;
for (int i=0; i<m.length; i++) {
cost = Math.min(findMinCutCostImpl(m, n, i), cost);
}
return cost;
}
private int findMinCutCostImpl(int[] m, int n, int i) {
if (m.length == 1) return n;
int cl = 0, cr = 0;
if (i > 0) {
cl = Integer.MAX_VALUE;
int[] ml = Arrays.copyOfRange(m, 0, i);
int nl = m[i];
for (int j=0; j<ml.length; j++) {
cl = Math.min(findMinCutCostImpl(ml, nl, j), cl);
}
}
if (i < m.length - 1) {
cr = Integer.MAX_VALUE;
int[] mr = Arrays.copyOfRange(m, i + 1, m.length);
int nr = n - m[i];
for (int j=0; j<mr.length; j++) {
mr[j] = mr[j] - m[i];
}
for (int j=0; j<mr.length; j++) {
cr = Math.min(findMinCutCostImpl(mr, nr, j), cr);
}
}
return n + cl + cr;
}
For example :
int n = 20;
int[] m = new int[] { 10, 3 };
System.out.println(findMinCutCost(m, n));
Will print 30
** Edit **
I have implemented two other methods to answer the problem in the question.
1. Median cut approximation
This method always recursively makes the cut closest to the middle of the current chunk. The result is not always the best solution, but it offers a considerable speed gain (on the order of +100000% in my tests) for a negligible loss relative to the best cost.
public int findMinCutCost2(int[] m, int n) {
if (m.length == 0) return 0;
if (m.length == 1) return n;
float half = n/2f;
int bestIndex = 0;
for (int i=1; i<m.length; i++) {
if (Math.abs(half - m[bestIndex]) > Math.abs(half - m[i])) {
bestIndex = i;
}
}
int cl = 0, cr = 0;
if (bestIndex > 0) {
int[] ml = Arrays.copyOfRange(m, 0, bestIndex);
int nl = m[bestIndex];
cl = findMinCutCost2(ml, nl);
}
if (bestIndex < m.length - 1) {
int[] mr = Arrays.copyOfRange(m, bestIndex + 1, m.length);
int nr = n - m[bestIndex];
for (int j=0; j<mr.length; j++) {
mr[j] = mr[j] - m[bestIndex];
}
cr = findMinCutCost2(mr, nr);
}
return n + cl + cr;
}
2. A constant time multi-cut
Instead of calculating the minimal cost, just use different indices and buffers. Since this method executes in a constant time, it always returns n. Plus, the method actually splits the string into substrings.
public int findMinCutCost3(int[] m, int n) {
char[][] charArr = new char[m.length+1][];
charArr[0] = new char[m[0]];
for (int i=0, j=0, k=0; j<n; j++) {
//charArr[i][k++] = string[j]; // string is the actual string to split
if (i < m.length && j == m[i]) {
if (++i >= m.length) {
charArr[i] = new char[n - m[i-1]];
} else {
charArr[i] = new char[m[i] - m[i-1]];
}
k=0;
}
}
return n;
}
Note: that this last method could easily be modified to accept a String str argument instead of n and set n = str.length(), and return a String[] array from charArr[][].
For dynamic programming, I claim that all you really need to know is what the state space should be - how to represent partial problems.
Here we are dividing a string up into m+1 pieces by creating new breaks. I claim that a good state space is a set of (a, b) pairs, where a is the location of the start of a substring and b is the location of the end of the same substring, counted as number of breaks in the final broken down string. The cost associated with each pair is the minimum cost of breaking it up. If b <= a + 1, then the cost is 0, because there are no more breaks to put in. If b is larger, then the possible locations for the next break in that substring are the points a+1, a+2,... b-1. The next break is going to cost b-a regardless of where we put it, but if we put it at position k the minimum cost of later breaks is (a, k) + (k, b).
So to solve this with dynamic programming, build up a table (a, b) of minimum costs, where you can work out the cost of breaks on strings with k sections by considering k - 1 possible breaks and then looking up the costs of strings with at most k - 1 sections.
One way to expand on this would be to start by creating a table T[a, b] and setting all entries in that table to infinity. Then go over the table again and where b <= a+1 put T[a,b] = 0. This fills in entries representing sections of the original string which need no further cuts. Now scan through the table and for each T[a,b] with b > a + 1 consider every possible k such that a < k < b and if min_k ((length between breaks a and b) + T[a,k] + T[k,b]) < T[a,b] set T[a,b] to that minimum value. This recognizes where you now know a way to chop up the substrings represented by T[a,k] and T[k,b] cheaply, so this gives you a better way to chop up T[a,b]. If you now repeat this m times you are done - use a standard dynamic programming backtrack to work out the solution. It might help if you save the best value of k for each T[a,b] in a separate table.
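A bottom-up version of this table can be sketched as follows. Instead of repeating the scan m times, the sketch fills the table in order of increasing number of sections, which reaches the same values. It assumes distinct cut positions strictly between 0 and n, and the name `min_cut_cost` is made up for the example.

```python
def min_cut_cost(n, cuts):
    """Minimum total cost of making all the given cuts in a string of length n."""
    # Break points including both ends of the string; T[a][b] is the cheapest
    # way to fully split the substring between break a and break b.
    points = [0] + sorted(cuts) + [n]
    m = len(points)
    T = [[0] * m for _ in range(m)]          # adjacent breaks cost nothing
    for span in range(2, m):                 # number of sections in between
        for a in range(m - span):
            b = a + span
            length = points[b] - points[a]   # cost of the next break here
            T[a][b] = length + min(T[a][k] + T[k][b] for k in range(a + 1, b))
    return T[0][m - 1]


print(min_cut_cost(20, [3, 10]))      # 30
print(min_cut_cost(20, [3, 10, 16]))  # 40
```

With m cuts the table has O(m^2) entries and each takes O(m) work, so the sketch runs in O(m^3) time regardless of the string length.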
python code:
mincost(n, cut_list) = min { n + mincost(k, left_cut_list) + mincost(n-k, right_cut_list) }
import sys
def splitstr(n,cut_list):
    if len(cut_list) == 0:
        return [0,[]]
    min_positions = []
    min_cost = sys.maxint
    for k in cut_list:
        left_split = [ x for x in cut_list if x < k]
        right_split = [ x-k for x in cut_list if x > k]
        #print n,k, left_split, right_split
        lcost = splitstr(k,left_split)
        rcost = splitstr(n-k,right_split)
        cost = n+lcost[0] + rcost[0]
        positions = [k] + lcost[1]+ [x+k for x in rcost[1]]
        #print "cost:", cost, " min: ", positions
        if cost < min_cost:
            min_cost = cost
            min_positions = positions
    return ( min_cost, min_positions)
print splitstr(20,[3,10,16]) # (40, [10, 3, 16])
print splitstr(20,[3,10]) # (30, [10, 3])
print splitstr(5,[1,2,3,4,5]) # (13, [2, 1, 3, 4, 5])
print splitstr(1,[1]) # (1, [1]) # m cuts m+1 substrings
Here is a C++ implementation. It's an O(n^3) implementation using DP. It assumes that the cut array is sorted. If it is not, sorting it takes O(k log k) time for k cuts, so the asymptotic time complexity remains the same.
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <limits.h>
using namespace std;
int main(){
int i,j,gap,k,l,m,n;
while(scanf("%d%d",&n,&k)!=EOF){
int a[n+1][n+1];
int cut[k];
memset(a,0,sizeof(a));
for(i=0;i<k;i++)
cin >> cut[i];
for(gap=1;gap<=n;gap++){
for(i=0,j=i+gap;j<=n;j++,i++){
if(gap==1)
a[i][j]=0;
else{
int min = INT_MAX;
for(m=0;m<k;m++){
if(cut[m]<j and cut[m] >i){
int cost=(j-i)+a[i][cut[m]]+a[cut[m]][j];
if(cost<min)
min=cost;
}
}
if(min>=INT_MAX)
a[i][j]=0;
else
a[i][j]=min;
}
}
}
cout << a[0][n] << endl;
}
return 0;
}

Resources