How to propose a "Subset operations" algorithm? - algorithm

I am working on this problem:
Given a set (or multiset) of positive numbers, find all the numbers which are combination of some elements from the set. Combination means sum, subtraction or product.
For example, if A = {3, 4, 7}, we have to find 3, 4, 7, 3+4, 3*4, |3-4|, 3+7, 3*7, |3-7|, 4+7, 4*7, |4-7|, 3+4+7, 3+4*7, 3+|4-7|, |3+7-4|, |3*7-4|...
Fortunately, our set is not bigger than 10 numbers, but I am not able to find an algorithm to find all the solutions. You can see this problem as the "subset sum problem" (given a set A and an integer k, say if A contains a subset whose elements sum k) but instead of sum, it is a combination of operators, and we want to find all the possible k-values.
I tried but too many possible solutions are missing. It is not correct but I only want to show the essence of my idea: (C++ code)
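#include <cstdlib>
#include <vector>
using namespace std;

vector<int> analyze(const vector<int>& v) {
    if (v.size() == 1) return {v[0]};
    vector<int> rest(v.begin() + 1, v.end());
    vector<int> u = analyze(rest); // u = analyze(v[1], ..., v[n-1])
    vector<int> result;
    for (size_t i = 0; i < u.size(); i++) {
        result.push_back(v[0] + u[i]);
        result.push_back(v[0] * u[i]);
        result.push_back(abs(v[0] - u[i]));
    }
    result.push_back(v[0]);
    result.insert(result.end(), u.begin(), u.end()); // union with u
    return result;
}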
If A = {a, b, c, d} this function will not return:
a*(c+b*d)
(ab)+(cd)
|a-d|+b
Does anyone know how to approach this problem, or of any bibliography that would help?

I'd suggest generating all parse trees (there is no need to do it explicitly).
Let's assume that we have a subset of the initial set. If there is one number, we just return it. Otherwise, we iterate over all operations. For a fixed operation, we iterate over all ways to partition the set into two subsets. We can compute all possible expressions for the subsets recursively and then combine them using this operation.
To take into account the fact that we are allowed not to use some of the numbers, we can run this algorithm for all subsets of the given set.
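A minimal Python sketch of this idea, under a few assumptions of mine: subsets are encoded as bitmasks, results are memoized per subset, and since sum, product and absolute difference are all symmetric, each split into two parts is tried only once. The names `all_values` and `values` are made up for the illustration, and the result sets can grow very quickly as the input approaches 10 numbers.

```python
from functools import lru_cache


def all_values(numbers):
    """Return every value obtainable by combining some of the numbers."""
    nums = tuple(numbers)
    n = len(nums)

    @lru_cache(maxsize=None)
    def values(mask):
        # All values that use exactly the elements selected by `mask`.
        selected = [i for i in range(n) if mask >> i & 1]
        if len(selected) == 1:
            return frozenset([nums[selected[0]]])
        out = set()
        sub = (mask - 1) & mask  # largest proper submask of `mask`
        while sub:
            other = mask ^ sub
            if sub < other:  # visit each unordered split only once
                for x in values(sub):
                    for y in values(other):
                        out.update((x + y, x * y, abs(x - y)))
            sub = (sub - 1) & mask
        return frozenset(out)

    result = set()
    for mask in range(1, 1 << n):  # every non-empty subset of the input
        result |= values(mask)
    return result
```

For A = {3, 4, 7}, `all_values([3, 4, 7])` contains, among others, 31 (3+4*7), 17 (|3*7-4|) and 6 (3+|4-7|).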

Related

Need help understanding the solution for the Jewelry Topcoder solution

I am fairly new to dynamic programming and don't yet understand most of the types of problems it can solve. Hence I am having trouble understanding the solution to the Jewelry TopCoder problem.
Can someone at least give me some hints as to what the code is doing ?
Most importantly is this problem a variant of the subset-sum problem ? Because that's what I am studying to make sense of this problem.
What are these two functions actually counting ? Why are we using actually two DP tables ?
void cnk() {
    nk[0][0]=1;
    FOR(k,1,MAXN) {
        nk[0][k]=0;
    }
    FOR(n,1,MAXN) {
        nk[n][0]=1;
        FOR(k,1,MAXN)
            nk[n][k] = nk[n-1][k-1]+nk[n-1][k];
    }
}
void calc(LL T[MAXN+1][MAX+1]) {
    T[0][0] = 1;
    FOR(x,1,MAX) T[0][x]=0;
    FOR(ile,1,n) {
        int a = v[ile-1];
        FOR(x,0,MAX) {
            T[ile][x] = T[ile-1][x];
            if(x>=a) T[ile][x] +=T[ile-1][x-a];
        }
    }
}
How is the original solution constructed by using the following logic ?
FOR(u,1,c) {
int uu = u * v[done];
FOR(x,uu,MAX)
res += B[done][x-uu] * F[n-done-u][x] * nk[c][u];
}
done=p;
}
Any help would be greatly appreciated.
Let's consider the following task first:
"Given a vector V of N positive integers less than K, find the number of subsets whose sum equals S".
This can be solved in pseudo-polynomial time (polynomial in N and S) with dynamic programming, using some extra memory.
The dynamic programming approach goes like this:
instead of solving the problem for N and S, we will solve all the problems of the following form:
"Find the number of ways to write sum s (with s ≤ S) using only the first n ≤ N of the numbers".
This is a common characteristic of the dynamic programming solutions: instead of only solving the original problem, you solve an entire family of related problems. The key idea is that solutions for more difficult problem settings (i.e. higher n and s) can efficiently be built up from the solutions of the easier settings.
Solving the problem for n = 0 is trivial (sum s = 0 can be expressed in one way, using the empty set, while all other sums can't be expressed in any way).
Now consider that we have solved the problem for all values up to a certain n and that we have these solutions in a matrix A (i.e. A[n][s] is the number of ways to write sum s using the first n elements).
Then, we can find the solutions for n+1, using the following formula:
A[n+1][s] = A[n][s - V[n+1]] + A[n][s].
Indeed, when we write the sum s using the first n+1 numbers we can either include or not V[n+1] (the n+1th term).
This is what the calc function computes. (the cnk function uses Pascal's rule to compute binomial coefficients)
Note: in general, if in the end we are only interested in answering the initial problem (i.e. for N and S), then the array A can be one-dimensional (with length S + 1). This is because when constructing the solutions for n + 1 we only need the solutions for n, and not those for smaller values.
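The one-dimensional variant mentioned in the note can be sketched in Python as follows. This is an illustration rather than the TopCoder code; the names `count_subsets`, `values` and `target` are made up for the example. The inner loop runs downward so that, while processing a value, the entries it reads still describe the state before that value was added, which keeps each value from being used twice.

```python
def count_subsets(values, target):
    """Count the subsets of `values` (positive integers) whose sum is `target`."""
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset is the only way to reach sum 0
    for v in values:
        # Go from high sums to low so ways[s - v] does not yet include v.
        for s in range(target, v - 1, -1):
            ways[s] += ways[s - v]
    return ways[target]
```

For example, `count_subsets([5, 6, 7, 11, 15], 11)` counts the two subsets {5, 6} and {11}.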
This problem (the one initially stated in this answer) is indeed related to the subset sum problem (finding a subset of elements with sum zero).
A similar type of dynamic programming approach can be applied if we have a reasonable limit on the absolute values of the integers used (we need to allocate an auxiliary array to represent all possible reachable sums).
In the zero-sum problem we are not actually interested in the count, thus the A array can be an array of booleans (indicating whether a sum is reachable or not).
In addition, another auxiliary array, B can be used to allow reconstructing the solution if one exists.
The recurrence would now look like this:
if (!A[s] && A[s - V[n+1]]) {
A[s] = true;
// the index of the last value used to reach sum _s_,
// allows going backwards to reproduce the entire solution
B[s] = n + 1;
}
Note: the actual implementation requires some additional care for handling the negative sums, which can not directly represent indices in the array (the indices can be shifted by taking into account the minimum reachable sum, or, if working in C/C++, a trick like the one described in this answer can be applied: https://stackoverflow.com/a/3473686/6184684).
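As a rough sketch of the boolean variant with reconstruction, here is a Python version that assumes all values are positive and the target is non-negative, so the negative-index issue from the note does not arise. The names `find_subset`, `reachable` and `last` are mine; `last` plays the role of the auxiliary array B.

```python
def find_subset(values, target):
    """Return one subset of `values` summing to `target`, or None if none exists."""
    reachable = [False] * (target + 1)
    last = [None] * (target + 1)  # index of the last value used to reach each sum
    reachable[0] = True
    for idx, v in enumerate(values):
        for s in range(target, v - 1, -1):
            if not reachable[s] and reachable[s - v]:
                reachable[s] = True
                last[s] = idx
    if not reachable[target]:
        return None
    # Walk backwards through `last` to recover the chosen values.
    subset = []
    s = target
    while s > 0:
        subset.append(values[last[s]])
        s -= values[last[s]]
    return subset[::-1]
```

Each sum records only the first value that made it reachable, so the walk backwards always moves to a sum that was reachable using strictly earlier values, and no value is reused.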
I'll detail how the above ideas apply in the TopCoder problem and its solution linked in the question.
The B and F matrices.
First, note the meaning of the B and F matrices in the solution:
B[i][s] represents the number of ways to reach sum s using only the smallest i items
F[i][s] represents the number of ways to reach sum s using only the largest i items
Indeed, both matrices are computed using the calc function, after sorting the array of jewelry values in ascending order (for B) and descending order (for F).
Solution for the case with no duplicates.
Consider first the case with no duplicate jewelry values, using this example: [5, 6, 7, 11, 15].
For the remainder of the answer I will assume that the array was sorted in ascending order (thus "first i items" will refer to the smallest i ones).
Each item given to Bob has a value less than or equal to that of each item given to Frank, thus in every good solution there will be a separation point such that Bob receives only items before that separation point, and Frank receives only items after that point.
To count all solutions we would need to sum over all possible separation points.
When, for example, the separation point is between the 3rd and 4th item, Bob would pick items only from the [5, 6, 7] sub-array (smallest 3 items), and Frank would pick items from the remaining [11, 15] sub-array (largest 2 items). In this case there is a single sum (s = 11) that can be obtained by both of them. Each time a sum can be obtained by both, we need to multiply the number of ways that each of them can reach the respective sum (e.g. if Bob could reach a sum s in 4 ways and Frank could reach the same sum s in 5 ways, then we could get 20 = 4 * 5 valid solutions with that sum, because each combination is a valid solution).
Thus we would get the following code by considering all separation points and all possible sums:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = 0; s <= maxS; s++) {
        res += B[i][s] * F[n-i][s];
    }
}
However, there is a subtle issue here. This would often count the same combination multiple times (for various separation points). In the example provided above, the same solution with sum 11 would be counted both for the separation [5, 6] - [7, 11, 15], as well as for the separation [5, 6, 7] - [11, 15].
To alleviate this problem we can partition the solutions by "the largest value of an item picked by Bob" (or, equivalently, by always forcing Bob to include in his selection the largest valued item from the first sub-array under the current separation).
In order to count the number of ways to reach sum s when Bob's largest valued item is the ith one (sorted in ascending order), we can use B[i][s - v[i]]. This holds because using the v[i] valued item implies requiring the sum s - v[i] to be expressed using subsets from the first i items (indices 0, 1, ... i - 1).
This would be implemented as follows:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = v[i]; s <= maxS; s++) {
        res += B[i][s - v[i]] * F[n - 1 - i][s];
    }
}
This is getting closer to the solution on TopCoder (in that solution, done corresponds to the i above, and uu = v[i]).
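To make the no-duplicates counting concrete, here is a small self-contained Python sketch. The helper names `subset_counts` and `count_distinct` are made up for this illustration; the tables B and F are built the way the answer describes, from the ascending and the descending order respectively.

```python
def subset_counts(values, max_sum):
    """table[i][s] = number of subsets of values[:i] whose sum is s."""
    table = [[0] * (max_sum + 1) for _ in range(len(values) + 1)]
    table[0][0] = 1
    for i, v in enumerate(values):
        for s in range(max_sum + 1):
            table[i + 1][s] = table[i][s] + (table[i][s - v] if s >= v else 0)
    return table


def count_distinct(values):
    """Count valid Bob/Frank selections, assuming all values are distinct."""
    v = sorted(values)
    n = len(v)
    max_sum = sum(v)
    B = subset_counts(v, max_sum)        # smallest i items
    F = subset_counts(v[::-1], max_sum)  # largest i items
    res = 0
    for i in range(n):                   # v[i] is Bob's largest item
        for s in range(v[i], max_sum + 1):
            res += B[i][s - v[i]] * F[n - 1 - i][s]
    return res
```

For the example [5, 6, 7, 11, 15] this counts a single solution: Bob takes {5, 6} and Frank takes {11}.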
Extension for the case when duplicates are allowed.
When duplicate values can appear in the array, it's no longer easy to directly count the number of solutions when Bob's most valuable item is v[i]. We need to also consider the number of such items picked by Bob.
If there are c items that have the same value as v[i], i.e. v[i] = v[i+1] = ... v[i + c - 1], and Bob picks u such items, then the number of ways for him to reach a certain sum s is equal to:
comb(c, u) * B[i][s - u * v[i]] (1)
Indeed, this holds because the u items can be picked from the total of c which have the same value in comb(c, u) ways. For each such choice of the u items, the remaining sum is s - u * v[i], and this should be expressed using a subset from the first i items (indices 0, 1, ... i - 1), thus it can be done in B[i][s - u * v[i]] ways.
For Frank, if Bob used u of the v[i] items, the number of ways to express sum s will be equal to:
F[n - i - u][s] (2)
Indeed, since Bob uses the smallest i + u values, Frank can use any of the largest n - i - u values to reach the sum s.
By combining relations (1) and (2) from above, we obtain that the number of solutions where both Frank and Bob have sum s, when Bob's most valued item is v[i] and he picks u such items is equal to:
comb(c, u) * B[i][s - u * v[i]] * F[n - i - u][s].
This is precisely what the given solution implements.
Indeed, the variable done corresponds to variable i above, variable x corresponds to sums s, the index p is used to determine the c items with same value as v[done], and the loop over u is used in order to consider all possible numbers of such items picked by Bob.
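The combined formula can be sketched in Python as below. This is my own transcription under the definitions above, not the TopCoder code: `how_many` and `subset_counts` are hypothetical names, and `math.comb` stands in for the precomputed binomial table.

```python
from math import comb


def subset_counts(values, max_sum):
    """table[i][s] = number of subsets of values[:i] whose sum is s."""
    table = [[0] * (max_sum + 1) for _ in range(len(values) + 1)]
    table[0][0] = 1
    for i, v in enumerate(values):
        for s in range(max_sum + 1):
            table[i + 1][s] = table[i][s] + (table[i][s - v] if s >= v else 0)
    return table


def how_many(values):
    """Count valid Bob/Frank selections when duplicate values are allowed."""
    v = sorted(values)
    n = len(v)
    max_sum = sum(v)
    B = subset_counts(v, max_sum)        # smallest i items
    F = subset_counts(v[::-1], max_sum)  # largest i items
    total = 0
    i = 0
    while i < n:
        c = v.count(v[i])                # size of the group of equal values
        for u in range(1, c + 1):        # Bob takes u items from this group
            uu = u * v[i]
            for s in range(uu, max_sum + 1):
                total += comb(c, u) * B[i][s - uu] * F[n - i - u][s]
        i += c                           # jump to the next distinct value
    return total
```

With distinct values every group has size 1, so this reduces to the simpler loop for the case without duplicates.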
Here's some Java code for this that references the original solution. It also incorporates qwertyman's fantastic explanations (to the extent feasible). I've added some of my comments along the way.
import java.util.*;
public class Jewelry {
    int MAX_SUM=30005;
    int MAX_N=30;
    long[][] C;
    // Generate all possible sums
    // ret[i][sum] = number of ways to compute sum using the first i numbers from val[]
    public long[][] genDP(int[] val) {
        int i, sum, n=val.length;
        long[][] ret = new long[MAX_N+1][MAX_SUM];
        ret[0][0] = 1;
        for(i=0; i+1<=n; i++) {
            for(sum=0; sum<MAX_SUM; sum++) {
                // Carry over the sum from i to i+1 for each sum
                // Problem definition allows excluding numbers from calculating sums
                // So we are essentially excluding the last number for this calculation
                ret[i+1][sum] = ret[i][sum];
                // DP: (Number of ways to generate sum using i+1 numbers =
                // Number of ways to generate sum-val[i] using i numbers)
                if(sum>=val[i])
                    ret[i+1][sum] += ret[i][sum-val[i]];
            }
        }
        return ret;
    }
    // C(n, r) - all possible combinations of choosing r numbers from n numbers
    // Leverage Pascal's polynomial co-efficients for an n-degree polynomial
    // Leverage Dynamic Programming to build this upfront
    public void nCr() {
        C = new long[MAX_N+1][MAX_N+1];
        int n, r;
        C[0][0] = 1;
        for(n=1; n<=MAX_N; n++) {
            C[n][0] = 1;
            for(r=1; r<=MAX_N; r++)
                C[n][r] = C[n-1][r-1] + C[n-1][r];
        }
    }
    /*
    General Concept:
    - Sort array
    - Incrementally divide array into two partitions
      + Accomplished by using two different arrays - L for left, R for right
    - Take all possible sums on the left side and match with all possible sums
      on the right side (multiply these numbers to get totals for each sum)
    - Adjust for common sums so as to not overcount
    - Adjust for duplicate numbers
    */
    public long howMany(int[] values) {
        int i, j, sum, n=values.length;
        // Pre-compute C(n,r) and store in C[][]
        nCr();
        /*
        Incrementally split the array and calculate sums on either side
        For eg. if val={2, 3, 4, 5, 9}, we would partition this as
        {2 | 3, 4, 5, 9} then {2, 3 | 4, 5, 9}, etc.
        First, sort it ascendingly and generate its sum matrix L
        Then, sort it descendingly, and generate another sum matrix R
        In later calculations, manipulate indexes to simulate the partitions
        So at any point L[i] would correspond to R[n-i-1]. eg. L[1] = R[5-1-1]=R[3]
        */
        // Sort ascendingly
        Arrays.sort(values);
        // Generate all sums for the "Left" partition using the sorted array
        long[][] L = genDP(values);
        // Sort descendingly by reversing the existing array.
        // Arrays.sort cannot take a Comparator for a primitive int[] array,
        // so the ascending array is reversed manually instead.
        for(i=0; i<n/2; i++) {
            int tmp = values[i];
            values[i] = values[n-i-1];
            values[n-i-1] = tmp;
        }
        // Generate all sums for the "Right" partition using the re-sorted array
        long[][] R = genDP(values);
        // Re-sort in ascending order as we will be using values[] as reference later
        Arrays.sort(values);
        long tot = 0;
        for(i=0; i<n; i++) {
            int dup=0;
            // How many duplicates of values[i] do we have?
            for(j=0; j<n; j++)
                if(values[j] == values[i])
                    dup++;
            /*
            Calculate total by iterating through each sum and multiplying counts on
            both partitions for that sum
            However, there may be count of sums that get duplicated
            For instance, if val={2, 3, 4, 5, 9}, you'd get:
            {2, 3 | 4, 5, 9} and {2, 3, 4 | 5, 9} (on two different iterations)
            In this case, the subset {2, 3 | 5} is counted twice
            To account for this, exclude the current largest number, val[i], from L's
            sum and exclude it from R's i index
            There is another issue of duplicate numbers
            Eg. If values={2, 3, 3, 3, 4}, how do you know which 3 went to L?
            To solve this, group the same numbers
            Applying to {2, 3, 3, 3, 4} :
            - Exclude 3, 6 (3+3) and 9 (3+3+3) from L's sum calculation
            - Exclude 1, 2 and 3 from R's index count
            We're essentially saying that we will exclude the sum contribution of these
            elements to L and ignore their count contribution to R
            */
            for(j=1; j<=dup; j++) {
                int dup_sum = j*values[i];
                for(sum=dup_sum; sum<MAX_SUM; sum++) {
                    // (ways to pick j numbers from dup) * (ways to get sum-dup_sum from i numbers) * (ways to get sum from n-i-j numbers)
                    if(n-i-j>=0)
                        tot += C[dup][j] * L[i][sum-dup_sum] * R[n-i-j][sum];
                }
            }
            // Skip past the duplicates of values[i] that we've now accounted for
            i += dup-1;
        }
        return tot;
    }
}

Incorrect Recursive approach to finding combinations of coins to produce given change

I was recently doing a project euler problem (namely #31) which was basically finding out how many ways we can sum to 200 using elements of the set {1,2,5,10,20,50,100,200}.
The idea that I used was this: the number of ways to sum to N is equal to
(the number of ways to sum N-k) * (number of ways to sum k), summed over all possible values of k.
I realized that this approach is WRONG, namely because it counts many combinations more than once. I have tried to adjust the formula to avoid duplicates, but to no avail. I am seeking the wisdom of stack overflowers regarding:
whether my recursive approach is concerned with the correct subproblem to solve
If there exists one, what would be an effective way to eliminate duplicates
how should we approach recursive problems such that we are concerned with the correct subproblem? what are some indicators that we've chosen a correct (or incorrect) subproblem?
When trying to avoid duplicate permutations, a straightforward strategy that works in most cases is to only create rising or falling sequences.
In your example, if you pick a value and then recurse with the whole set, you will get duplicate sequences like 50,50,100 and 50,100,50 and 100,50,50. However, if you recurse with the rule that the next value should be equal to or smaller than the currently selected value, out of those three you will only get the sequence 100,50,50.
So an algorithm that counts only unique combinations would be e.g.:
function uniqueCombinations(set, target, previous) {
    for all values in set not greater than previous {
        if value equals target {
            increment count
        }
        if value is smaller than target {
            uniqueCombinations(set, target - value, value)
        }
    }
}
uniqueCombinations([1,2,5,10,20,50,100,200], 200, 200)
Alternatively, you can create a copy of the set before every recursion, and remove the elements from it that you don't want repeated.
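A runnable Python rendering of the falling-sequence pseudocode might look like this. Two things are my additions and not part of the pseudocode: the count is returned instead of incremented globally, and `functools.lru_cache` memoizes repeated (remaining, previous) calls to keep the run fast.

```python
from functools import lru_cache


def unique_combinations(coins, target):
    """Count multisets of coins summing to target via non-increasing sequences."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def count(remaining, previous):
        total = 0
        for value in coins:
            if value > previous:
                continue  # only allow values not greater than the previous pick
            if value == remaining:
                total += 1
            elif value < remaining:
                total += count(remaining - value, value)
        return total

    # Start with the largest coin as "previous", so every coin is allowed.
    return count(target, max(coins))
```

Calling `unique_combinations([1, 2, 5, 10, 20, 50, 100, 200], 200)` gives 73682.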
The rising/falling sequence method also works with iterations. Let's say you want to find all unique combinations of three letters. This algorithm will print results like a,c,e, but not a,e,c or e,a,c:
for letter1 is 'a' to 'x' {
    for letter2 is first letter after letter1 to 'y' {
        for letter3 is first letter after letter2 to 'z' {
            print [letter1,letter2,letter3]
        }
    }
}
m69 gives a nice strategy that often works, but I think it's worthwhile to better understand why it works. When trying to count items (of any kind), the general principle is:
Think of a rule that classifies any given item into exactly one of several non-overlapping categories. That is, come up with a list of concrete categories A, B, ..., Z that will make the following sentence true: An item is either in category A, or in category B, or ..., or in category Z.
Once you have done this, you can safely count the number of items in each category and add these counts together, comfortable in the knowledge that (a) any item that is counted in one category is not counted again in any other category, and (b) any item that you want to count is in some category (i.e., none are missed).
How could we form categories for your specific problem here? One way to do it is to notice that every item (i.e., every multiset of coin values that sums to the desired total N) either contains the 50-coin exactly zero times, or it contains it exactly once, or it contains it exactly twice, or ..., or it contains it exactly RoundDown(N / 50) times. These categories don't overlap: if a solution uses exactly 5 50-coins, it pretty clearly can't also use exactly 7 50-coins, for example. Also, every solution is clearly in some category (notice that we include a category for the case in which no 50-coins are used). So if we had a way to count, for any given k, the number of solutions that use coins from the set {1,2,5,10,20,50,100,200} to produce a sum of N and use exactly k 50-coins, then we could sum over all k from 0 to N/50 and get an accurate count.
How to do this efficiently? This is where the recursion comes in. The number of solutions that use coins from the set {1,2,5,10,20,50,100,200} to produce a sum of N and use exactly k 50-coins is equal to the number of solutions that sum to N-50k and do not use any 50-coins, i.e. use coins only from the set {1,2,5,10,20,100,200}. This of course works for any particular coin denomination that we could have chosen, so these subproblems have the same shape as the original problem: we can solve each one by simply choosing another coin arbitrarily (e.g. the 10-coin), forming a new set of categories based on this new coin, counting the number of items in each category and summing them up. The subproblems become smaller until we reach some simple base case that we process directly (e.g. no allowed coins left: then there is 1 item if N=0, and 0 items otherwise).
I started with the 50-coin (instead of, say, the largest or the smallest coin) to emphasise that the particular choice used to form the set of non-overlapping categories doesn't matter for the correctness of the algorithm. But in practice, passing explicit representations of sets of coins around is unnecessarily expensive. Since we don't actually care about the particular sequence of coins to use for forming categories, we're free to choose a more efficient representation. Here (and in many problems), it's convenient to represent the set of allowed coins implicitly as simply a single integer, maxCoin, which we interpret to mean that the first maxCoin coins in the original ordered list of coins are the allowed ones. This limits the possible sets we can represent, but here that's OK: If we always choose the last allowed coin to form categories on, we can communicate the new, more-restricted "set" of allowed coins to subproblems very succinctly by simply passing the argument maxCoin-1 to it. This is the essence of m69's answer.
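The category argument translates fairly directly into code. Below is a hedged Python sketch in which `max_coin` means "only the first `max_coin` coins are allowed", as described above; the function names are invented for the example, and the memoization via `lru_cache` is an optimization I added on top of the argument.

```python
from functools import lru_cache


def count_ways(coins, total):
    """Count multisets of coins that sum to total."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def ways(n, max_coin):
        if max_coin == 0:
            # Base case: no allowed coins left.
            return 1 if n == 0 else 0
        coin = coins[max_coin - 1]
        # One category per k: solutions that use exactly k copies of the last allowed coin.
        return sum(ways(n - coin * k, max_coin - 1) for k in range(n // coin + 1))

    return ways(total, len(coins))
```

Because the categories are disjoint and exhaustive, the per-category counts can simply be added.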
There's some good guidance here. Another way to think about this is as a dynamic program. For this, we must pose the problem as a simple decision among options that leaves us with a smaller version of the same problem. It boils down to a certain kind of recursive expression.
Put the coin values c0, c1, ... c_(n-1) in any order you like. Then define W(i,v) as the number of ways you can make change for value v using coins ci, c_(i+1), ... c_(n-1). The answer we want is W(0,200). All that's left is to define W:
W(i,v) = sum_[k = 0..floor(v/ci)] W(i+1, v-ci*k)
In words: the number of ways we can make change with coins ci onward is to sum up all the ways we can make change after a decision to use some feasible number k of coins ci, removing that much value from the problem.
Of course we need base cases for the recursion. This happens when i=n-1: the last coin value. At this point there's a way to make change if and only if the value we need is an exact multiple of c_(n-1).
W(n-1,v) = 1 if v % c_(n-1) == 0 and 0 otherwise.
We generally don't want to implement this as a simple recursive function. The same argument values occur repeatedly, which leads to an exponential (in n and v) amount of wasted computation. There are simple ways to avoid this. Tabular evaluation and memoization are two.
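As an illustration of tabular evaluation, here is a Python sketch that fills a table for W(i, v) from the last coin backwards, with k ranging over the feasible counts 0..floor(v/ci). The function name `ways_table` is hypothetical, and the coin list is assumed to be non-empty.

```python
def ways_table(coins, total):
    """Bottom-up evaluation of W(i, v); returns W(0, total)."""
    n = len(coins)
    W = [[0] * (total + 1) for _ in range(n)]
    # Base case: with only the last coin, v must be an exact multiple of it.
    for v in range(total + 1):
        W[n - 1][v] = 1 if v % coins[n - 1] == 0 else 0
    # Fill the earlier rows from the rows after them.
    for i in range(n - 2, -1, -1):
        for v in range(total + 1):
            W[i][v] = sum(W[i + 1][v - coins[i] * k]
                          for k in range(v // coins[i] + 1))
    return W[0][total]
```

Each table entry is computed once, so the repeated work of the plain recursion disappears regardless of the coin order.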
Another point is that it is more efficient to have the values in descending order. By taking big chunks of value early, the total number of recursive evaluations is minimized. Additionally, since c_(n-1) is now 1, the base case is just W(n-1,v) = 1. Now it becomes fairly obvious that we can add a second base case as an optimization: W(n-2,v) = floor(v/c_(n-2)) + 1. That's how many times the for loop will add W(n-1, ·) = 1, once for each k from 0 to floor(v/c_(n-2)).
But this is gilding the lily. The problem is so small that exponential behavior doesn't signify. Here is a little implementation to show that order really doesn't matter:
#include <stdio.h>
#define n 8
int cv[][n] = {
    {200,100,50,20,10,5,2,1},
    {1,2,5,10,20,50,100,200},
    {1,10,100,2,20,200,5,50},
};
int *c;
int w(int i, int v) {
    if (i == n - 1) return v % c[n - 1] == 0;
    int sum = 0;
    for (int k = 0; k <= v / c[i]; ++k)
        sum += w(i + 1, v - c[i] * k);
    return sum;
}
int main(int argc, char *argv[]) {
    unsigned p;
    if (argc != 2 || sscanf(argv[1], "%u", &p) != 1 || p > 2) p = 0;
    c = cv[p];
    printf("Ways(%u) = %d\n", p, w(0, 200));
    return 0;
}
Drumroll, please...
$ ./foo 0
Ways(0) = 73682
$ ./foo 1
Ways(1) = 73682
$ ./foo 2
Ways(2) = 73682

Dynamic Programming Coin Change Problems

I am having issues with understanding dynamic programming solutions to various problems, specifically the coin change problem:
"Given a value N, if we want to make change for N cents, and we have infinite supply of each of S = { S1, S2, .. , Sm} valued coins, how many ways can we make the change? The order of coins doesn’t matter.
For example, for N = 4 and S = {1,2,3}, there are four solutions: {1,1,1,1},{1,1,2},{2,2},{1,3}. So output should be 4. For N = 10 and S = {2, 5, 3, 6}, there are five solutions: {2,2,2,2,2}, {2,2,3,3}, {2,2,6}, {2,3,5} and {5,5}. So the output should be 5."
There is another variation of this problem where the solution is the minimum number of coins to satisfy the amount.
These problems appear very similar, but the solutions are very different.
Number of possible ways to make change: the optimal substructure for this is DP(m,n) = DP(m-1, n) + DP(m, n-Sm) where DP is the number of solutions for all coins up to the mth coin and amount=n.
Minimum amount of coins: the optimal substructure for this is
DP[i] = Min{ DP[i-d1], DP[i-d2],...DP[i-dn] } + 1 where i is the total amount and d1..dn represent each coin denomination.
Why is it that the first one required a 2-D array and the second a 1-D array? Why is the optimal substructure for the number of ways to make change not "DP[i] = DP[i-d1]+DP[i-d2]+...DP[i-dn]" where DP[i] is the number of ways i amount can be obtained by the coins. It sounds logical to me, but it produces an incorrect answer. Why is that second dimension for the coins needed in this problem, but not needed in the minimum amount problem?
LINKS TO PROBLEMS:
http://comproguide.blogspot.com/2013/12/minimum-coin-change-problem.html
http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/
Thanks in advance. Every website I go to only explains how the solution works, not why other solutions do not work.
Let's first talk about the number of ways, DP(m,n) = DP(m-1, n) + DP(m, n-Sm). This is indeed correct, because you can either use the mth denomination or avoid it. Now you ask why we don't write it as DP[i] = DP[i-d1]+DP[i-d2]+...DP[i-dn]. This leads to overcounting. Take an example where n=4, m=2 and S={1,3}. According to your recurrence, dp[4]=dp[1]+dp[3], with the base cases dp[0]=1 and dp[1]=1. Now dp[3]=dp[2]+dp[0], and applying the same rule again, dp[2]=dp[1]=1, so dp[3]=2. In total you get 3 when the answer is supposed to be just 2 ((1,3) and (1,1,1,1)). That is because your second method treats (1,3) and (3,1) as two different solutions. It can be applied to the case where order matters, which is also a standard problem.
Now to your second question. You say that the minimum number of coins can be found by DP[i] = Min{ DP[i-d1], DP[i-d2],...DP[i-dn] } + 1. This is correct because, when finding the minimum number of coins, order does not matter. As for why this is a linear (1-D) DP: although the DP array is 1-D, each state depends on up to m states, unlike your first solution, where the array is 2-D but each state depends on at most 2 states. So in both cases the running time, which is (number of states * number of states each state depends on), is the same: O(nm). Both are correct; the 1-D solution just saves memory. You can find the minimum either with the 1-D array method or with a 2-D table using the recurrence
dp(m,n)=min(dp(m-1,n),1+dp(m,n-Sm)). (Just use min in your first recurrence.)
Hope I cleared the doubts; do post if something is still unclear.
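The difference between the recurrences is easy to see side by side. The following Python sketch uses function names of my own choosing: the first function implements the 1-D sum from the question (which counts ordered sequences), the second counts unordered combinations by processing one coin at a time, and the third is the 1-D minimum recurrence.

```python
def count_ordered(amount, coins):
    """DP[i] = DP[i-d1] + ... + DP[i-dn]: counts sequences, so order matters."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for i in range(1, amount + 1):
        dp[i] = sum(dp[i - d] for d in coins if d <= i)
    return dp[amount]


def count_unordered(amount, coins):
    """Counts combinations: coins are introduced one at a time, in a fixed order."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for d in coins:
        for i in range(d, amount + 1):
            dp[i] += dp[i - d]
    return dp[amount]


def min_coins(amount, coins):
    """DP[i] = min(DP[i-d]) + 1; returns infinity if the amount is unreachable."""
    dp = [0] + [float("inf")] * amount
    for i in range(1, amount + 1):
        for d in coins:
            if d <= i and dp[i - d] + 1 < dp[i]:
                dp[i] = dp[i - d] + 1
    return dp[amount]
```

For n=4 and S={1,3}, `count_ordered` gives 3 while `count_unordered` gives 2, which is exactly the overcounting described above.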
This is a very good explanation of the coin change problem using Dynamic Programming.
The code is as follows:
public static int change(int amount, int[] coins){
    int[] combinations = new int[amount + 1];
    combinations[0] = 1;
    for(int coin : coins){
        for(int i = 1; i < combinations.length; i++){
            if(i >= coin){
                combinations[i] += combinations[i - coin];
                //printAmount(combinations);
            }
        }
        //System.out.println();
    }
    return combinations[amount];
}

Generating Balls in Boxes

Given two sorted vectors a and b, find all vectors which are sums of a and some permutation of b, and which are unique once sorted.
You can create one of the sought vectors in the following way:
Take vector a and a permutation of vector b.
Sum them together so c[i]=a[i]+b[i].
Sort c.
I'm interested in finding the set of b-permutations that yield the entire set of unique c vectors.
Example 0: a='ccdd' and b='xxyy'
Gives the summed vectors: 'cycydxdx', 'cxcxdydy', 'cxcydxdy'.
Notice that the permutations of b: 'xyxy' and 'yxyx' are equal, because in both cases the "box c" and the "box d" both get exactly one 'x' and one 'y'.
I guess this is similar to putting M balls in M boxes (one in each) with some groups of balls and boxes being identical.
Update: Given a string a='aabbbcdddd' and b='xxyyzzttqq' your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pair wise indistinguishable.
Example 1: Given strings are a='xyy' and b='kkd'.
Possible solution: 'kkd', 'dkk'.
Reason: We see that all unique permutations of b are 'kkd', 'kdk' and 'dkk'. However, with our constraints, the first two permutations are considered equal, as the indices at which they differ map to the same char 'y' in string a.
Example 2: Given strings are a='xyy' and b='khd'.
Possible solution: 'khd', 'dkh', 'hkd'.
Example 3: Given strings are a='xxxx' and b='khhd'.
Possible solution: 'khhd'.
I can solve the problem of generating unique candidate b permutations using Narayana Pandita's algorithm as described on Wikipedia/Permutation.
The second part seems harder. My best shot is to join the two strings pairwise into a list, sort it and use it as a key in a lookup set. ('xx'+'hd' join→'xh','xd' sort→'xd','xh').
As my M is often very big, and as similarities in the strings are common, I currently generate way more b permutations than actually goes through the set filter. I would love to have an algorithm generating the correct ones directly. Any improvement is welcome.
To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).
For a recursive solution you can try the following:
Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.
Then you need to find all possible tuples (y1 y2 ... ym) such that
0 <= yi <= xi
and Sum yi = k.
Here yi is the number of times character i appears.
The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.
pseudocode:
List Generate (int [] x /* array index starting at 1*/,
               int k /* size of set */) {
    list = List.Empty;
    if (Sum(x) < k) return list;
    for (int i = 0; i <= x[1]; i++) {
        // Remove first element and generate subsets of size k-i.
        remaining = x.Remove(1);
        list_i = Generate(remaining, k-i);
        if (list_i.NotEmpty()) {
            list = list + list_i;
        } else {
            return list;
        }
    }
    return list;
}
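The pseudocode leaves the base case and the bookkeeping of the chosen counts implicit. A small Python sketch of the same recursion, with those details filled in the way I would assume they were intended, could look like this (`bounded_compositions` is a made-up name):

```python
def bounded_compositions(x, k):
    """Yield tuples y with 0 <= y[i] <= x[i] for every i and sum(y) == k."""
    if not x:
        if k == 0:
            yield ()  # nothing left to choose and nothing left to place
        return
    if sum(x) < k:
        return  # not enough elements remain to reach size k
    # Fix how often the first character is used, then recurse on the rest.
    for first in range(min(x[0], k) + 1):
        for rest in bounded_compositions(x[1:], k - first):
            yield (first,) + rest
```

For the multiset 'xxyy' (counts [2, 2]) and k = 2 this yields (0, 2), (1, 1) and (2, 0), i.e. 'yy', 'xy' and 'xx'.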
PRIOR TO EDITS:
If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.
I remember posting C# code for that here: How to find permutation of k in a given length?
I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.
Eg in case of a=xyy, b=add
distinguishing point is x
So you select permutations of 'add' of length 1. That gives you a and d.
Thus add and dad (or dda) are the ones you need.
For a=xyyz b=good
distinguishing points are x and z
So you generate permutations of b of length 2 giving
go
og
oo
od
do
gd
dg
giving you 7 unique permutations.
Does that help? Is my understanding correct?
Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.
We need two functions, combinations(s,k) and runvector(v). combinations(s,k) generates the unique combinations of length k of a multiset s. For s='xxyy' and k=2 these would be ['xx','xy','yy']. runvector(v) transforms a multiset represented as a sorted vector into a simpler structure, the runvector. runvector('cddeee')=[1,2,3].
To solve the problem, we will use recursive generators. We run through all the combinations that fit in box 1 and then recurse on the rest of the boxes, banning the values we already chose. To accomplish the banning, combinations will maintain a bitarray across calls.
In python the approach looks like this:
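def fillrest(banned,out,rv,b,i):
    # All boxes are filled: signal one complete assignment.
    if i == len(rv):
        yield None
        return
    # Try every unique combination for box i, then recurse on the rest.
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None

def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    # out is mutated in place, so yield a copy of each solution.
    for _ in fillrest(banned,out,rv,b,0):
        yield out[:]

>>> print(list(balls('abbccc','xyyzzz')))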
[['x', 'yy', 'zzz'],
['x', 'yz', 'yzz'],
['x', 'zz', 'yyz'],
['y', 'xy', 'zzz'],
['y', 'xz', 'yzz'],
['y', 'yz', 'xzz'],
['y', 'zz', 'xyz'],
['z', 'xy', 'yzz'],
['z', 'xz', 'yyz'],
['z', 'yy', 'xzz'],
['z', 'yz', 'xyz'],
['z', 'zz', 'xyy']]
The output is in 'box' format, but can easily be merged back to simple strings: 'xyyzzz', 'xyzyzz'...
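The answer does not show combinations or runvector. The following is one possible self-contained Python 3 sketch of the whole approach. The bodies of runvector and combinations, and the extra start parameter, are assumptions made for this illustration and not the answerer's code. Here combinations marks the positions it has chosen in banned while a result is being consumed and unmarks them afterwards.

```python
from itertools import groupby


def runvector(a):
    """Lengths of the runs of equal symbols, e.g. 'cddeee' -> [1, 2, 3]."""
    return [len(list(group)) for _, group in groupby(sorted(a))]


def combinations(b, k, banned, start=0):
    """Yield unique length-k combinations of the non-banned positions of sorted b."""
    if k == 0:
        yield ''
        return
    previous = None
    for j in range(start, len(b)):
        # Skip used positions and repeats of a symbol already tried at this level.
        if banned[j] or b[j] == previous:
            continue
        previous = b[j]
        banned[j] = True  # stays banned while callers consume the result
        for tail in combinations(b, k - 1, banned, j + 1):
            yield b[j] + tail
        banned[j] = False


def fillrest(banned, out, rv, b, i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b, rv[i], banned):
        out[i] = comb
        for _ in fillrest(banned, out, rv, b, i + 1):
            yield None


def balls(a, b):
    b = sorted(b)
    rv = runvector(a)
    banned = [False] * len(b)
    out = [None] * len(rv)
    for _ in fillrest(banned, out, rv, b, 0):
        yield out[:]


for boxes in balls('abbccc', 'xyyzzz'):
    print(boxes)
```

For 'abbccc' and 'xyyzzz' this prints 12 box assignments, starting with ['x', 'yy', 'zzz'].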

Counting combinations of pairs of items from multiple lists without repetition

Given a scenario where we have multiple lists of pairs of items, for example:
{12,13,14,23,24}
{14,15,25}
{16,17,25,26,36}
where 12 is a pair of items '1' and '2' (and thus 21 is equivalent to 12), we want to count the number of ways that we can choose pairs of items from each of the lists such that no single item is repeated. You must select one, and only one pair, from each list. The number of items in each list and the number of lists is arbitrary, but you can assume there are at least two lists with at least one pair of items per list. And the pairs are made from symbols from a finite alphabet, assume digits [1-9]. Also, a list can neither contain duplicate pairs {12,12} or {12,21} nor can it contain symmetric pairs {11}.
More specifically, in the example above, if we choose the pair of items 14 from the first list, then the only choice we have for the second list is 25 because 14 and 15 contain a '1'. And consequently, the only choice from the third list is 36 because 16 and 17 contain a '1', and 25 and 26 contain a '2'.
Does anyone know of an efficient way to count the total combinations of pairs of items without going through every permutation of choices and asking "is this a valid selection?", as the lists can each contain hundreds of pairs of items?
UPDATE
After spending some time with this, I realized that it is trivial to count the number of combinations when none of the lists share a distinct pair. However, as soon as a distinct pair is shared between two or more lists, the combinatorial formula does not apply.
As of now, I've been trying to figure out if there is a way (using combinatorial math and not brute force) to count the number of combinations in which every list has the same pairs of items. For example:
{12,23,34,45,67}
{12,23,34,45,67}
{12,23,34,45,67}
The problem is #P-complete. This is even HARDER than NP-complete. It is as hard as finding the number of satisfying assignments to an instance of SAT.
The reduction is from counting perfect matchings. Suppose you have the graph G = (V, E) where E, the set of edges, is a list of pairs of vertices (those pairs that are connected by an edge). Then encode an instance of "pairs of items" by having |V|/2 copies of E. In other words, have a number of copies of E equal to half of the number of vertices. Now, a "hit" in your case would correspond to |V|/2 edges with no repeated vertices, implying that all |V| vertices were covered. This is the definition of a perfect matching. Conversely, every perfect matching is a hit. Because the copies of E are ordered, each perfect matching is hit once per ordering of its edges, that is (|V|/2)! times, so dividing the number of hits by (|V|/2)! gives the number of perfect matchings.
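The encoding can be tried on a tiny graph. The sketch below is a brute-force illustration only (the function names are made up): it builds |V|/2 copies of the edge list, counts the valid selections, and divides by (|V|/2)! because the lists are ordered and every matching is therefore found once per ordering of its edges.

```python
from itertools import product
from math import factorial


def count_selections(lists):
    """Count ways to pick one pair per list so that no symbol repeats."""
    hits = 0
    for choice in product(*lists):
        symbols = [s for pair in choice for s in pair]
        if len(symbols) == len(set(symbols)):
            hits += 1
    return hits


def count_perfect_matchings(vertices, edges):
    half = len(vertices) // 2
    # |V|/2 copies of E; each matching is counted once per ordering.
    return count_selections([edges] * half) // factorial(half)


# A 4-cycle has two perfect matchings: {12, 34} and {23, 41}.
cycle_vertices = [1, 2, 3, 4]
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(count_selections([cycle_edges] * 2))                   # 4
print(count_perfect_matchings(cycle_vertices, cycle_edges))  # 2
```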
Let's say that every element in the lists is a node in a graph. There is an edge between two nodes if they can be selected together (they have no common symbol). There is no edge between two nodes of the same list. If we have n lists, the problem is to find the number of cliques of size n in this graph. There is no clique which is bigger than n elements. Given that finding out whether at least one such clique exists is NP-complete, I think this problem is NP-complete. See: http://en.wikipedia.org/wiki/Clique_problem
As pointed out, to show that this is NP-complete we have to prove that solving this problem can solve the Clique problem. If we can count the number of required sets, i.e. the number of cliques of size n, then we know whether there is at least one clique of size n. Unfortunately, if there is no clique of size n then we don't know whether there are cliques of size k < n.
Another question is whether we can represent any graph in this problem. I guess yes but I am not sure about it.
I still feel this is NP-Complete
While the problem looks quite simple, it could be related to the NP-complete Set Cover Problem. So it could be possible that there is no efficient way to detect valid combinations, hence no efficient way to count them.
UPDATE
I thought about the list items being pairs because it seems to make the problem harder to attack - you have to check two properties for one item. So I looked for a way to reduce the pair to a scalar item and found a way.
Map the set of the n symbols to the set of the first n primes - I will call this function M. In the case of the symbols 0 to 9 we obtain the following mapping and M(4) = 11 for example.
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} => {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
Now we can map a pair (n, m) using the mapping X to the product of the mappings of n and m. This will turn the pair (2, 5) into X((2, 5)) = M(2) * M(5) = 5 * 13 = 65.
X((n, m)) = M(n) * M(m)
Why all this? If we have two pairs (a, b) and (c, d) from two lists, map them using the mapping X to x and y and multiply them, we obtain the product x * y = M(a) * M(b) * M(c) * M(d) - a product of four primes. We can extend the product by more factors by selecting a pair from each list and obtain a product of 2w primes if we have w lists. The final question is what does this product tell us about the pairs we selected and multiplied? If the selected pairs form a valid selection, we never choose one symbol twice, hence the product contains no prime twice and is square free. If the selection is invalid the product contains at least one prime twice and is not square free. And here a final example.
X((2, 5)) = 5 * 13 = 65
X((3, 6)) = 7 * 17 = 119
X((3, 4)) = 7 * 11 = 77
Selecting 25 and 36 yields 65 * 119 = 7735 = 5 * 7 * 13 * 17 and is square free, hence valid. Selecting 36 and 34 yields 119 * 77 = 9163 = 7² * 11 * 17 and is not square free, hence not valid.
Also note how nicely this preserves the symmetry - X((m, n)) = X((n, m)) - and prohibits symmetric pairs because X((m, m)) = M(m) * M(m) is not square free.
I don't know if this will be any help, but now you know it and can think about it...^^
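The prime mapping is easy to try out. The following Python sketch assumes the symbols are the digits 0 to 9 as in the mapping above; the function names are made up for the illustration. Since the product only ever contains the ten listed primes, checking divisibility by their squares is enough to test square-freeness.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # M(d) = PRIMES[d] for digits 0..9


def encode_pair(pair):
    """X((n, m)) = M(n) * M(m)."""
    n, m = pair
    return PRIMES[n] * PRIMES[m]


def is_square_free(value):
    # Only the primes in PRIMES can occur, so only their squares need checking.
    return all(value % (p * p) != 0 for p in PRIMES)


def is_valid_selection(pairs):
    product = 1
    for pair in pairs:
        product *= encode_pair(pair)
    return is_square_free(product)


print(encode_pair((2, 5)))                   # 65
print(is_valid_selection([(2, 5), (3, 6)]))  # True:  7735 = 5 * 7 * 13 * 17
print(is_valid_selection([(3, 6), (3, 4)]))  # False: 9163 = 7^2 * 11 * 17
```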
This is the first part of a reduction of a 3-SAT problem to this problem. The 3-SAT problem is the following.
(!A | B | C) & (B | !C | !D) & (A | !B)
And here is the reduction as far as I got.
m-n represents a pair
a line represents a list
an asterisk represents an arbitrary unique symbol
A1-A1' !A1-!A1' => Select A true or false
B1-B1' !B1-!B1' => Select B true or false
C1-C1' !C1-!C1' => Select C true or false
D1-D1' !D1-!D1' => Select D true or false
A1-* !B1-* !C1-* => !A | B | C
A2-!A1' !A2-A1' => Create a copy of A
B2-!B1' !B2-B1' => Create a copy of B
C2-!C1' !C2-C1' => Create a copy of C
D2-!D1' !D2-D1' => Create a copy of D
!B2-* C2-* D2-* => B | !C | !D
(How to perform a second copy of the four variables???)
!A3-* B3-*
If I (or somebody else) can complete this reduction and show how to do it in the general case, this will prove the problem NP-complete. I am just stuck with copying the variables a second time.
I am going to say there is no calculation that you can do other than brute force, because there is a function that has to be evaluated to decide whether an item from set B can be used given the item chosen in set A. Simple combinatorial math won't work.
You can speed up the calculation by one to two orders of magnitude using memoization and hashing.
Memoization is remembering previous results of similar brute force paths. If you are at list n and you have already consumed symbols x,y,z and previously you have encountered this situation, then you will be adding in the same number of possible combinations from the remaining lists. It does not matter how you got to list n using x,y,z. So, use a cached result if there is one, or continue the calc to the next list and check there. If you make a brute force recursive algorithm to calculate the result, but cache results, this works great.
The key to the saved result is: the current list, and the symbols that have been used. Sort the symbols to make your key. I think a dictionary or an array of dictionaries makes sense here.
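A minimal Python sketch of the memoized brute force follows. It is one possible reading of the description above, not the answerer's code: the cache key is the current list index plus the set of used symbols, and the set is stored as a bitmask (an assumption that replaces the sorted symbol list), with functools.lru_cache acting as the dictionary.

```python
from functools import lru_cache


def count_selections(lists):
    """Count ways to pick one pair per list with no symbol used twice."""
    # Each pair becomes a bitmask with one bit per symbol.
    masks = [tuple((1 << a) | (1 << b) for a, b in pairs) for pairs in lists]

    @lru_cache(maxsize=None)
    def count_from(index, used):
        # Key: current list and consumed symbols; how we got here is irrelevant.
        if index == len(masks):
            return 1
        total = 0
        for mask in masks[index]:
            if mask & used == 0:
                total += count_from(index + 1, used | mask)
        return total

    return count_from(0, 0)


example1 = [
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)],
    [(1, 4), (1, 5), (2, 5)],
    [(1, 6), (1, 7), (2, 5), (2, 6), (3, 6)],
]
example2 = [[(1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]] * 3
print(count_selections(example1))  # 2
print(count_selections(example2))  # 18
```

The second example gives 18 because there are three sets of three mutually disjoint pairs, and each can be distributed over the three identical lists in 3! ways.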
Use hashing to reduce the number of pairs that need to be searched in each list. For each list, make a hash of the pairs that would be available given that a certain number of symbols are already consumed. Choose the number of consumed symbols you want to use in your hash based on how much memory you want to use and the time you want to spend pre-calculating. I think using 1-2 symbols would be good. Sort these hashes by the number of items in them, ascending, and then keep the top n. I say throw out the rest, because if the hash only reduces your work a small amount, it's probably not worth keeping (it will take longer to find the hash if there are more of them). So as you are going through the lists, you can do a quick scan of the list's hashes to see if you have used a symbol in the hash. If you have, then use the first hash that comes up to scan the list. The first hash would contain the fewest pairs to scan. If you are really handy, you might be able to build these hashes as you go and not waste time up front to do it.
You might be able to toss the hash and use a tree, but my guess is that filling the tree will take a long time.
Constraint programming is a nice approach if you want to generate all the combinations. Just to try it out, I wrote a model using Gecode (version 3.2.2) to solve your problem. The two examples given are very easy to solve, but other instances might be harder. It should be better than generate and test in any case.
/*
* Main authors:
* Mikael Zayenz Lagerkvist <lagerkvist#gecode.org>
*
* Copyright:
* Mikael Zayenz Lagerkvist, 2009
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
#include <gecode/driver.hh>
#include <gecode/int.hh>
#include <gecode/minimodel.hh>
using namespace Gecode;
namespace {
  /// List of specifications
  extern const int* specs[];
  /// Number of specifications
  extern const unsigned int n_specs;
}
/**
* \brief Selecting pairs
*
* Given a set of lists of pairs of values, select a pair from each
* list so that no value is selected more than once.
*
*/
class SelectPairs : public Script {
protected:
  /// Specification
  const int* spec;
  /// The values from all selected pairs
  IntVarArray s;
public:
  /// The actual problem
  SelectPairs(const SizeOptions& opt)
    : spec(specs[opt.size()]),
      s(*this,spec[0] * 2,Int::Limits::min, Int::Limits::max) {
    int pos = 1; // Position read from spec
    // For all lists
    for (int i = 0; i < spec[0]; ++i) {
      int npairs = spec[pos++];
      // Construct TupleSet for pairs from list i
      TupleSet ts;
      for (int p = 0; p < npairs; ++p) {
        IntArgs tuple(2);
        tuple[0] = spec[pos++];
        tuple[1] = spec[pos++];
        ts.add(tuple);
      }
      ts.finalize();
      // <s[2i],s[2i+1]> must be from list i
      IntVarArgs pair(2);
      pair[0] = s[2*i]; pair[1] = s[2*i + 1];
      extensional(*this, pair, ts);
    }
    // All values must be pairwise distinct
    distinct(*this, s, opt.icl());
    // Select values for the variables
    branch(*this, s, INT_VAR_SIZE_MIN, INT_VAL_MIN);
  }
  /// Constructor for cloning \a s
  SelectPairs(bool share, SelectPairs& sp)
    : Script(share,sp), spec(sp.spec) {
    s.update(*this, share, sp.s);
  }
  /// Perform copying during cloning
  virtual Space*
  copy(bool share) {
    return new SelectPairs(share,*this);
  }
  /// Print solution
  virtual void
  print(std::ostream& os) const {
    os << "\t";
    for (int i = 0; i < spec[0]; ++i) {
      os << "(" << s[2*i] << "," << s[2*i+1] << ") ";
      if ((i+1) % 10 == 0)
        os << std::endl << "\t";
    }
    if (spec[0] % 10)
      os << std::endl;
  }
};
/** \brief Main-function
* \relates SelectPairs
*/
int
main(int argc, char* argv[]) {
  SizeOptions opt("SelectPairs");
  opt.iterations(500);
  opt.size(0);
  opt.parse(argc,argv);
  if (opt.size() >= n_specs) {
    std::cerr << "Error: size must be between 0 and "
              << n_specs-1 << std::endl;
    return 1;
  }
  Script::run<SelectPairs,DFS,SizeOptions>(opt);
  return 0;
}
namespace {
  const int s0[] = {
    // Number of lists
    3,
    // Lists (number of pairs, pair0, pair1, ...)
    5, 1,2, 1,3, 1,4, 2,3, 2,4,
    3, 1,4, 1,5, 2,5,
    5, 1,6, 1,7, 2,5, 2,6, 3,6
  };
  const int s1[] = {
    // Number of lists
    3,
    // Lists (number of pairs, pair0, pair1, ...)
    5, 1,2, 2,3, 3,4, 4,5, 6,7,
    5, 1,2, 2,3, 3,4, 4,5, 6,7,
    5, 1,2, 2,3, 3,4, 4,5, 6,7
  };
  const int *specs[] = {s0, s1};
  const unsigned n_specs = sizeof(specs)/sizeof(int*);
}
First try.. Here is an algorithm with a lower average complexity than brute force. Essentially you create strings with increasing lengths in each iteration. This may not be the best solution but we will wait for the best one to come by... :)
Start with list 1. All entries in that list are valid solutions of length 2 (#=5)
Next, when you introduce list 2, keep a record of all valid solutions of length 4, which end up being {1325, 1425, 2314, 2315, 2415} (#=5).
When you add the third list to the mix, repeat the process. You will end up with {142536, 241536} (#=2).
The complexity reduction comes from throwing away bad strings in each iteration. The worst case is still the same as brute force: it occurs when no two pairs share a symbol, so nothing is ever thrown away.
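The list-by-list extension can be sketched in Python as follows. This is an illustrative sketch with a made-up function name; pairs are written as two-character strings to match the notation used above.

```python
def extend_solutions(lists):
    """Grow valid partial selections one list at a time, dropping dead ends."""
    partial = ['']  # one empty partial solution
    sizes = []      # number of surviving partial solutions after each list
    for pairs in lists:
        extended = []
        for chosen in partial:
            used = set(chosen)
            for pair in pairs:
                # Keep the pair only if neither of its symbols is already used.
                if used.isdisjoint(pair):
                    extended.append(chosen + pair)
        partial = extended
        sizes.append(len(partial))
    return partial, sizes


lists = [
    ['12', '13', '14', '23', '24'],
    ['14', '15', '25'],
    ['16', '17', '25', '26', '36'],
]
solutions, sizes = extend_solutions(lists)
print(solutions)  # ['142536', '241536']
```

Storing every partial solution costs memory. If only the count is needed, partial solutions that use the same set of symbols can be merged into a single counter.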
This feels like a good problem to which to apply a constraint programming approach. To the list of packages provided by Wikipedia I'll add that I've had good experience using Gecode in the past; the Gecode examples also provide a basic tutorial to constraint programming. Constraint Processing is a good book on the subject if you want to dig deeper into how the algorithms work.

Resources