Algorithm to calculate the maximum descent within a series?

Given a series x(i), i from 1 to N, with, say, N = 10,000.
For any i < j,
D(i,j) = x(i) - x(j), if x(i) > x(j); or,
D(i,j) = 0, if x(i) <= x(j).
Define
Dmax(im, jm) := max D(i,j), for all 1 <= i < j <= N.
What's the best algorithm to calculate Dmax, im, and jm?
I tried to use dynamic programming, but the problem doesn't seem to be divisible into subproblems, and now I'm a bit lost. Could you please suggest an approach? Is backtracking the way out?

Iterate over the series, keeping track of the following values:
The maximum element so far
The maximum descent so far
For each element, there are two possible values for the new maximum descent:
It remains the same
It equals maximumElementSoFar - newElement
So pick the one which gives the higher value. The maximum descent at the end of iteration will be your result. This will work in linear time and constant additional space.
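A minimal Python sketch of the single pass described above, in case it helps. The function name max_descent and the choice to return 1-based indices (to match the question's im and jm) are assumptions for illustration, not part of the original answer.

```python
def max_descent(x):
    """Return (dmax, im, jm) using 1-based indices as in the question.

    If the series never descends, dmax is 0 and both indices are None.
    """
    best = 0
    best_i = best_j = None
    max_idx = 0  # position of the largest element seen so far
    for j in range(1, len(x)):
        # Candidate descent: largest earlier element minus the current one.
        descent = x[max_idx] - x[j]
        if descent > best:
            best = descent
            best_i, best_j = max_idx + 1, j + 1
        # Update the running maximum after using it for this element.
        if x[j] > x[max_idx]:
            max_idx = j
    return best, best_i, best_j


print(max_descent([5, 4, 1, 2, 1]))  # (4, 1, 3)
```

Tracking the index of the running maximum, rather than only its value, is what lets the same pass report im and jm.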

If I understand you correctly you have an array of numbers, and want to find the largest positive difference between two neighbouring elements of the array ?
Since you're going to have to go through the array at least once, to compute the differences, I don't see why you can't just keep a record, as you go, of the largest difference found so far (and of its location), updating as that changes.
If your problem is as simple as I understand it, I'm not sure why you need to think about dynamic programming. I expect I've misunderstood the question.

Dmax(im, jm) := max D(i,j) = max(x(i) -x(j),0) = max(max(x(i) -x(j)),0)
You just need to compute x(i) -x(j) for all values , which is O(n^2), and then get the max. No need for dynamic programming.

You can divide the series x(i) into sub-series where each sub-series is a descending sub-list of x(i) (e.g. if x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1). Then, for each sub-list, compute first_in_sub_series - last_in_sub_series, compare all the results, and take the maximum; this is the answer.
If I understood the problem correctly, this should give you a basic linear algorithm to solve it.
e.g:
x = 5, 4, 1, 2, 1 then x1 = 5, 4, 1 and x2 = 2, 1
rx1 = 4
rx2 = 1
dmax = 4 and im = 1 and jm = 3 because we are talking about x1 which is the first 3 items of x.

Related

Daily Coding Problem 316 : Coin Change Problem - determination of denomination?

I'm going through the Daily Coding Problems and am currently stuck in one of the problems. It goes by:
You are given an array of length N, where each element i represents
the number of ways we can produce i units of change. For example, [1,
0, 1, 1, 2] would indicate that there is only one way to make 0, 2, or
3 units, and two ways of making 4 units.
Given such an array, determine the denominations that must be in use.
In the case above, for example, there must be coins with values 2, 3,
and 4.
I'm unable to figure out how to determine the denomination from the total number of ways array. Can you work it out?
Somebody already worked out this problem here, but it's devoid of any explanation.
From what I could gather is that he collects all the elements whose value(number of ways == 1) and appends it to his answer, but I think it doesn't consider the fact that the same number can be formed from a combination of lower denominations for which still the number of ways would come out to be 1 irrespective of the denomination's presence.
For example, in the case of arr = [1, 1, a, b, c, 1]. We know that denomination 1 exists since arr[1] = 1. Now we can also see that arr[5] = 1, this should not necessarily mean that denomination 5 is available since 5 can be formed using coins of denomination 1, i.e. (1 + 1 + 1 + 1 + 1).
Thanks in advance!
If you're solving the coin change problem, the best technique is to maintain an array of ways of making change with a partial set of the available denominations, and add in a new denomination d by updating the array like this:
for i = d upto N
    a[i] += a[i-d]
Your actual problem is the reverse of this: finding denominations based on the total number of ways. Note that if you know one d, you can remove it from the ways array by reversing the above procedure:
for i = N downto d
    a[i] -= a[i-d]
You can find the lowest denomination available by looking for the first 1 in the array (other than the value at index 0, which is always 1). Then, once you've found the lowest denomination, you can remove its effect on the ways array, and repeat until the array is zeroed (except for the first value).
Here's a full solution in Python:
def rways(A):
    dens = []
    for i in range(1, len(A)):
        if not A[i]: continue
        dens.append(i)
        for j in range(len(A)-1, i-1, -1):
            A[j] -= A[j-i]
    return dens
print(rways([1, 0, 1, 1, 2]))
You might want to add error-checking: if you find a non-zero value that's not 1 when searching for the next denomination, then the original array isn't valid.
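One way that error check might look, as a rough sketch. The function name denominations, the copy of the input, and the choice of ValueError are assumptions made here for illustration.

```python
def denominations(ways):
    """Recover coin denominations from a ways-to-make-change array.

    Raises ValueError if the array cannot come from any set of denominations.
    """
    a = list(ways)  # work on a copy so the caller's list is left untouched
    if not a or a[0] != 1:
        raise ValueError("ways[0] must be 1 (one way to make 0 units)")
    dens = []
    for d in range(1, len(a)):
        if a[d] == 0:
            continue
        if a[d] != 1:
            # After removing all smaller denominations, the first remaining
            # non-zero entry must be exactly 1 (the coin d on its own).
            raise ValueError(f"invalid ways array at index {d}")
        dens.append(d)
        # Undo the effect of denomination d, from the top of the array down.
        for j in range(len(a) - 1, d - 1, -1):
            a[j] -= a[j - d]
    return dens


print(denominations([1, 0, 1, 1, 2]))  # [2, 3, 4]
```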
For reference and comparison, here's some code for computing the ways of making change from a set of denominations:
def ways(dens, N):
    A = [1] + [0] * N
    for d in dens:
        for i in range(d, N+1):
            A[i] += A[i-d]
    return A
print(ways([2, 3, 4], 4))

Need help understanding the solution for the Jewelry Topcoder solution

I am fairly new to dynamic programming and don't yet understand most of the types of problems it can solve. Hence I am having trouble understanding the solution to the TopCoder Jewelry problem.
Can someone at least give me some hints as to what the code is doing ?
Most importantly is this problem a variant of the subset-sum problem ? Because that's what I am studying to make sense of this problem.
What are these two functions actually counting ? Why are we using actually two DP tables ?
void cnk() {
    nk[0][0]=1;
    FOR(k,1,MAXN) {
        nk[0][k]=0;
    }
    FOR(n,1,MAXN) {
        nk[n][0]=1;
        FOR(k,1,MAXN)
            nk[n][k] = nk[n-1][k-1]+nk[n-1][k];
    }
}
void calc(LL T[MAXN+1][MAX+1]) {
    T[0][0] = 1;
    FOR(x,1,MAX) T[0][x]=0;
    FOR(ile,1,n) {
        int a = v[ile-1];
        FOR(x,0,MAX) {
            T[ile][x] = T[ile-1][x];
            if(x>=a) T[ile][x] +=T[ile-1][x-a];
        }
    }
}
How is the original solution constructed by using the following logic ?
FOR(u,1,c) {
int uu = u * v[done];
FOR(x,uu,MAX)
res += B[done][x-uu] * F[n-done-u][x] * nk[c][u];
}
done=p;
}
Any help would be greatly appreciated.
Let's consider the following task first:
"Given a vector V of N positive integers less than K, find the number of subsets whose sum equals S".
This can be solved in polynomial time with dynamic programming using some extra-memory.
The dynamic programming approach goes like this:
instead of solving the problem for N and S, we will solve all the problems of the following form:
"Find the number of ways to write sum s (with s ≤ S) using only the first n ≤ N of the numbers".
This is a common characteristic of the dynamic programming solutions: instead of only solving the original problem, you solve an entire family of related problems. The key idea is that solutions for more difficult problem settings (i.e. higher n and s) can efficiently be built up from the solutions of the easier settings.
Solving the problem for n = 0 is trivial (sum s = 0 can be expressed in one way -- using the empty set, while all other sums can't be expressed in any ways).
Now consider that we have solved the problem for all values up to a certain n and that we have these solutions in a matrix A (i.e. A[n][s] is the number of ways to write sum s using the first n elements).
Then, we can find the solutions for n+1, using the following formula:
A[n+1][s] = A[n][s - V[n+1]] + A[n][s].
Indeed, when we write the sum s using the first n+1 numbers we can either include or not V[n+1] (the n+1th term).
This is what the calc function computes. (the cnk function uses Pascal's rule to compute binomial coefficients)
Note: in general, if in the end we are only interested in answering the initial problem (i.e. for N and S), then the array A can be uni-dimensional (with length S) -- this is because whenever trying to construct solutions for n + 1 we only need the solutions for n, and not for smaller values).
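As a small illustration of that one-dimensional variant, here is a Python sketch. The name count_subsets is made up for this example, and it assumes all values are positive integers. Note that the array needs S + 1 entries so that index S exists.

```python
def count_subsets(values, target):
    """Count subsets of positive integers `values` whose sum equals `target`."""
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset is the only way to reach sum 0
    for v in values:
        # Go from high sums to low so each value is used at most once:
        # ways[s - v] still refers to the table before v was considered.
        for s in range(target, v - 1, -1):
            ways[s] += ways[s - v]
    return ways[target]


print(count_subsets([1, 2, 3, 4, 5], 5))  # 3: {5}, {1, 4}, {2, 3}
```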
This problem (the one initially stated in this answer) is indeed related to the subset sum problem (finding a subset of elements with sum zero).
A similar type of dynamic programming approach can be applied if we have a reasonable limit on the absolute values of the integers used (we need to allocate an auxiliary array to represent all possible reachable sums).
In the zero-sum problem we are not actually interested in the count, thus the A array can be an array of booleans (indicating whether a sum is reachable or not).
In addition, another auxiliary array, B can be used to allow reconstructing the solution if one exists.
The recurrence would now look like this:
if (!A[s] && A[s - V[n+1]]) {
    A[s] = true;
    // the index of the last value used to reach sum _s_,
    // allows going backwards to reproduce the entire solution
    B[s] = n + 1;
}
Note: the actual implementation requires some additional care for handling the negative sums, which can not directly represent indices in the array (the indices can be shifted by taking into account the minimum reachable sum, or, if working in C/C++, a trick like the one described in this answer can be applied: https://stackoverflow.com/a/3473686/6184684).
I'll detail how the above ideas apply in the TopCoder problem and its solution linked in the question.
The B and F matrices.
First, note the meaning of the B and F matrices in the solution:
B[i][s] represents the number of ways to reach sum s using only the smallest i items
F[i][s] represents the number of ways to reach sum s using only the largest i items
Indeed, both matrices are computed using the calc function, after sorting the array of jewelry values in ascending order (for B) and descending order (for F).
Solution for the case with no duplicates.
Consider first the case with no duplicate jewelry values, using this example: [5, 6, 7, 11, 15].
For the remainder of the answer I will assume that the array was sorted in ascending order (thus "first i items" will refer to the smallest i ones).
Each item given to Bob has value less (or equal) to each item given to Frank, thus in every good solution there will be a separation point such that Bob receives only items before that separation point, and Frank receives only items after that point.
To count all solutions we would need to sum over all possible separation points.
When, for example, the separation point is between the 3rd and 4th item, Bob would pick items only from the [5, 6, 7] sub-array (smallest 3 items), and Frank would pick items from the remaining [11, 15] sub-array (largest 2 items). In this case there is a single sum (s = 11) that can be obtained by both of them. Each time a sum can be obtained by both, we need to multiply the number of ways that each of them can reach the respective sum (e.g. if Bob could reach a sum s in 4 ways and Frank could reach the same sum s in 5 ways, then we could get 20 = 4 * 5 valid solutions with that sum, because each combination is a valid solution).
Thus we would get the following code by considering all separation points and all possible sums:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = 0; s <= maxS; s++) {
        res += B[i][s] * F[n-i][s];
    }
}
However, there is a subtle issue here. This would often count the same combination multiple times (for various separation points). In the example provided above, the same solution with sum 11 would be counted both for the separation [5, 6] - [7, 11, 15], as well as for the separation [5, 6, 7] - [11, 15].
To alleviate this problem we can partition the solutions by "the largest value of an item picked by Bob" (or, equivalently, by always forcing Bob to include in his selection the largest valued item from the first sub-array under the current separation).
In order to count the number of ways to reach sum s when Bob's largest valued item is the ith one (sorted in ascending order), we can use B[i][s - v[i]]. This holds because using the v[i] valued item implies requiring the sum s - v[i] to be expressed using subsets from the first i items (indices 0, 1, ... i - 1).
This would be implemented as follows:
res = 0;
for (int i = 0; i < n; i++) {
    for (int s = v[i]; s <= maxS; s++) {
        res += B[i][s - v[i]] * F[n - 1 - i][s];
    }
}
This is getting closer to the solution on TopCoder (in that solution, done corresponds to the i above, and uu = v[i]).
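To make the no-duplicates case concrete, here is a self-contained Python sketch of the loop above together with the B and F tables. The names count_equal_splits_distinct and subset_counts are hypothetical, and the sketch assumes distinct positive values; it is not the TopCoder solution itself.

```python
def count_equal_splits_distinct(values):
    """Count (Bob, Frank) selections with equal sums, Bob's items all smaller.

    Sketch for distinct positive values only; duplicates need extra handling.
    """
    v = sorted(values)
    n = len(v)
    max_s = sum(v)

    def subset_counts(items):
        # table[i][s] = number of subsets of the first i items with sum s
        table = [[0] * (max_s + 1) for _ in range(len(items) + 1)]
        table[0][0] = 1
        for i, a in enumerate(items, start=1):
            for s in range(max_s + 1):
                table[i][s] = table[i - 1][s]
                if s >= a:
                    table[i][s] += table[i - 1][s - a]
        return table

    B = subset_counts(v)        # smallest items first
    F = subset_counts(v[::-1])  # largest items first

    res = 0
    for i in range(n):
        # Bob's largest item is v[i]; Frank picks among the n - 1 - i larger ones.
        for s in range(v[i], max_s + 1):
            res += B[i][s - v[i]] * F[n - 1 - i][s]
    return res


print(count_equal_splits_distinct([5, 6, 7, 11, 15]))  # 1: Bob {5, 6}, Frank {11}
```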
Extension for the case when duplicates are allowed.
When duplicate values can appear in the array, it's no longer easy to directly count the number of solutions when Bob's most valuable item is v[i]. We need to also consider the number of such items picked by Bob.
If there are c items that have the same value as v[i], i.e. v[i] = v[i+1] = ... v[i + c - 1], and Bob picks u such items, then the number of ways for him to reach a certain sum s is equal to:
comb(c, u) * B[i][s - u * v[i]] (1)
Indeed, this holds because the u items can be picked from the total of c which have the same value in comb(c, u) ways. For each such choice of the u items, the remaining sum is s - u * v[i], and this should be expressed using a subset from the first i items (indices 0, 1, ... i - 1), thus it can be done in B[i][s - u * v[i]] ways.
For Frank, if Bob used u of the v[i] items, the number of ways to express sum s will be equal to:
F[n - i - u][s] (2)
Indeed, since Bob uses the smallest i + u values, Frank can use any of the largest n - i - u values to reach the sum s.
By combining relations (1) and (2) from above, we obtain that the number of solutions where both Frank and Bob have sum s, when Bob's most valued item is v[i] and he picks u such items is equal to:
comb(c, u) * B[i][s - u * v[i]] * F[n - i - u][s].
This is precisely what the given solution implements.
Indeed, the variable done corresponds to variable i above, variable x corresponds to sums s, the index p is used to determine the c items with same value as v[done], and the loop over u is used in order to consider all possible numbers of such items picked by Bob.
Here's some Java code for this that references the original solution. It also incorporates qwertyman's fantastic explanations (to the extent feasible). I've added some of my comments along the way.
import java.util.*;
public class Jewelry {
    int MAX_SUM=30005;
    int MAX_N=30;
    long[][] C;
    // Generate all possible sums
    // ret[i][sum] = number of ways to compute sum using the first i numbers from val[]
    public long[][] genDP(int[] val) {
        int i, sum, n=val.length;
        long[][] ret = new long[MAX_N+1][MAX_SUM];
        ret[0][0] = 1;
        for(i=0; i+1<=n; i++) {
            for(sum=0; sum<MAX_SUM; sum++) {
                // Carry over the sum from i to i+1 for each sum
                // Problem definition allows excluding numbers from calculating sums
                // So we are essentially excluding the last number for this calculation
                ret[i+1][sum] = ret[i][sum];
                // DP: (Number of ways to generate sum using i+1 numbers =
                // Number of ways to generate sum-val[i] using i numbers)
                if(sum>=val[i])
                    ret[i+1][sum] += ret[i][sum-val[i]];
            }
        }
        return ret;
    }
    // C(n, r) - all possible combinations of choosing r numbers from n numbers
    // Leverage Pascal's polynomial co-efficients for an n-degree polynomial
    // Leverage Dynamic Programming to build this upfront
    public void nCr() {
        C = new long[MAX_N+1][MAX_N+1];
        int n, r;
        C[0][0] = 1;
        for(n=1; n<=MAX_N; n++) {
            C[n][0] = 1;
            for(r=1; r<=MAX_N; r++)
                C[n][r] = C[n-1][r-1] + C[n-1][r];
        }
    }
    /*
    General Concept:
    - Sort array
    - Incrementally divide array into two partitions
      + Accomplished by using two different arrays - L for left, R for right
    - Take all possible sums on the left side and match with all possible sums
      on the right side (multiply these numbers to get totals for each sum)
    - Adjust for common sums so as to not overcount
    - Adjust for duplicate numbers
    */
    public long howMany(int[] values) {
        int i, j, sum, n=values.length;
        // Pre-compute C(n,r) and store in C[][]
        nCr();
        /*
        Incrementally split the array and calculate sums on either side
        For eg. if val={2, 3, 4, 5, 9}, we would partition this as
        {2 | 3, 4, 5, 9} then {2, 3 | 4, 5, 9}, etc.
        First, sort it ascendingly and generate its sum matrix L
        Then, sort it descendingly, and generate another sum matrix R
        In later calculations, manipulate indexes to simulate the partitions
        So at any point L[i] would correspond to R[n-i-1]. eg. L[1] = R[5-1-1]=R[3]
        */
        // Sort ascendingly
        Arrays.sort(values);
        // Generate all sums for the "Left" partition using the sorted array
        long[][] L = genDP(values);
        // Sort descendingly by reversing the existing array.
        // Arrays.sort cannot take a Comparator for primitive int arrays,
        // so reverse the ascending array manually instead.
        for(i=0; i<n/2; i++) {
            int tmp = values[i];
            values[i] = values[n-i-1];
            values[n-i-1] = tmp;
        }
        // Generate all sums for the "Right" partition using the re-sorted array
        long[][] R = genDP(values);
        // Re-sort in ascending order as we will be using values[] as reference later
        Arrays.sort(values);
        long tot = 0;
        for(i=0; i<n; i++) {
            int dup=0;
            // How many duplicates of values[i] do we have?
            for(j=0; j<n; j++)
                if(values[j] == values[i])
                    dup++;
            /*
            Calculate total by iterating through each sum and multiplying counts on
            both partitions for that sum
            However, there may be count of sums that get duplicated
            For instance, if val={2, 3, 4, 5, 9}, you'd get:
            {2, 3 | 4, 5, 9} and {2, 3, 4 | 5, 9} (on two different iterations)
            In this case, the subset {2, 3 | 5} is counted twice
            To account for this, exclude the current largest number, val[i], from L's
            sum and exclude it from R's i index
            There is another issue of duplicate numbers
            Eg. If values={2, 3, 3, 3, 4}, how do you know which 3 went to L?
            To solve this, group the same numbers
            Applying to {2, 3, 3, 3, 4} :
            - Exclude 3, 6 (3+3) and 9 (3+3+3) from L's sum calculation
            - Exclude 1, 2 and 3 from R's index count
            We're essentially saying that we will exclude the sum contribution of these
            elements to L and ignore their count contribution to R
            */
            for(j=1; j<=dup; j++) {
                int dup_sum = j*values[i];
                for(sum=dup_sum; sum<MAX_SUM; sum++) {
                    // (ways to pick j numbers from dup) * (ways to get sum-dup_sum from i numbers) * (ways to get sum from n-i-j numbers)
                    if(n-i-j>=0)
                        tot += C[dup][j] * L[i][sum-dup_sum] * R[n-i-j][sum];
                }
            }
            // Skip past the duplicates of values[i] that we've now accounted for
            i += dup-1;
        }
        return tot;
    }
}

Best way to distribute a given resource (eg. budget) for optimal output

I am trying to find a solution in which a given resource (eg. budget) will be best distributed to different options which yields different results on the resource provided.
Let's say I have N = 1200 and some functions. (a, b, c, d are some unknown variables)
f1(x) = a * x
f2(x) = b * x^c
f3(x) = a*x + b*x^2 + c*x^3
f4(x) = d^x
f5(x) = log x^d
...
Also, let's say there are n of these functions, each yielding a different result based on its input x, where x = 0 or x >= m, and m is a constant.
Although I am not able to find exact formula for the given functions, I am able to find the output. This means that I can do:
X = f1(N1) + f2(N2) + f3(N3) + ... + fn(Nn) where (N1 + ... Nn) = N as many times as there are ways of distributing N into n numbers, and find a specific case where X is the greatest.
How would I actually go about finding the best distribution of N with the least computation power, using whatever libraries currently available?
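If you are happy with allocations constrained to be whole numbers, then there is a dynamic programming solution whose tables hold O(Nn) entries (filling them takes O(n*N^2) time, because each entry considers every way of splitting its budget). You can increase accuracy by scaling the units, but this will increase CPU time.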
For each i=1 to n maintain an array where element j gives the maximum yield using only the first i functions giving them a total allowance of j.
For i=1 this is simply the result of f1().
For i=k+1, when working out the result for j, consider each possible way of splitting j units between f_{k+1}() and the table that tells you the best return from a distribution among the first k functions. This way you can calculate the table for i=k+1 using the table created for k.
At the end you get the best possible return for n functions and N resources. It is easier to recover the allocation that achieves it if you maintain a set of arrays recording the best way to distribute k units among the first i functions, for all possible values of i and k. Then (taking n = 100 as an example) you can look up the best allocation for f100(), subtract the amount allocated to f100() from N, look up the best allocation for f99() given the remaining resources, and carry on like this until you have worked out the best allocations for all f().
As an example suppose f1(x) = 2x, f2(x) = x^2 and f3(x) = 3 if x>0 and 0 otherwise. Suppose we have 3 units of resource.
The first table is just f1(x) which is 0, 2, 4, 6 for 0,1,2,3 units.
The second table is the best you can do using f1(x) and f2(x) for 0,1,2,3 units and is 0, 2, 4, 9, switching from f1 to f2 at x=2.
The third table is 0, 3, 5, 9. I can get 3 and 5 by using 1 unit for f3() and the rest for the best solution in the second table. 9 is simply the best solution in the second table: there is no better solution using 3 resources that gives any of them to f3().
So 9 is the best answer here. One way to work out how to get there is to keep the tables around and trace that answer back. 9 comes from f3(0) + 9 from the second table, so all 3 units are available to f2() + f1(). The second table's 9 comes from f2(3), so there are no units left for f1() and we get f1(0) + f2(3) + f3(0).
When you are working out the resources to use at stage i=k+1, you have a table from i=k that tells you exactly the result to expect from the resources you have left over after you have decided to use some at stage i=k+1. The best distribution does not become incorrect, because at stage i=k you have worked out the result for the best distribution given every possible number of remaining resources.
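The tables and the trace-back described above could be sketched in Python roughly as follows. The names best_allocation, best and choice are assumptions for illustration. The sketch requires every unit to be handed out (N1 + ... + Nn = N, as in the question), and a function can signal a forbidden input (for example 0 < x < m) by returning float("-inf").

```python
def best_allocation(funcs, total):
    """Split `total` whole units among funcs to maximise the summed yield.

    Returns (best_value, allocation), where allocation[i] goes to funcs[i].
    """
    n = len(funcs)
    neg_inf = float("-inf")
    # best[i][j]: best yield using the first i functions and exactly j units.
    best = [[neg_inf] * (total + 1) for _ in range(n + 1)]
    best[0][0] = 0
    # choice[i][j]: units given to function i (1-based) in that optimum.
    choice = [[0] * (total + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f = funcs[i - 1]
        for j in range(total + 1):
            for k in range(j + 1):  # k units to function i, j - k to the rest
                val = f(k) + best[i - 1][j - k]
                if val > best[i][j]:
                    best[i][j] = val
                    choice[i][j] = k
    # Walk the tables backwards to recover the allocation.
    alloc = [0] * n
    j = total
    for i in range(n, 0, -1):
        alloc[i - 1] = choice[i][j]
        j -= alloc[i - 1]
    return best[n][total], alloc


# The worked example from above: f1(x) = 2x, f2(x) = x^2, f3(x) = 3 if x > 0.
funcs = [lambda x: 2 * x, lambda x: x ** 2, lambda x: 3 if x > 0 else 0]
print(best_allocation(funcs, 3))  # (9, [0, 3, 0])
```

The three nested loops make the cost proportional to n * N^2 function evaluations, so for N = 1200 and a handful of functions this stays small.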

Choose three numbers from a sequence such that their sum is less than a value

Let's say I have a sequence of numbers:
1, 2, 3, 4, 5, 2, 4, 1
I am looking for an algorithm that can tell me
how many possible ways of choosing 3 numbers from the sequence above exist, such that their sum doesn't exceed 7.
I was asked to write a program to solve the problem. Are there any programming techniques I can use?
I would appreciate your answers!
To get the lowest 3-sum possible, you will simply need to choose the lowest 3 numbers. If this number is lower than the given number - you are done. Otherwise you can answer - there is no such solution, since every other sum you get is bigger than the one you just found, which by its own is bigger than the desired number.
If you wish to find out "How many different summations there are to a number smaller than the given number", that's a different problem, that can be solved using Dynamic Programming in O(n*number*3) = O(n*number):
f(x,i,3) = (x <= 0 ? 0 : 1)
f(_,n,_) = 0 // out of bounds
f(x,i,used) = f(x-arr[i], i+1, used+1) + f(x, i+1, used)
Invoke with f(number,0,0)
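A memoised Python sketch of this recurrence, for what it's worth. The name count_choices is hypothetical. Unlike the strict bound in the recurrence, the base case here accepts a remaining budget of exactly zero, to match the question's "doesn't exceed"; change the comparison if you need a strict bound.

```python
from functools import lru_cache


def count_choices(numbers, limit, use=3):
    """Count ways to pick `use` positions whose values sum to at most `limit`."""
    arr = tuple(numbers)
    n = len(arr)

    @lru_cache(maxsize=None)
    def f(x, i, used):
        # x is the budget still available after the picks made so far.
        if used == use:
            return 1 if x >= 0 else 0
        if i == n:
            return 0  # ran out of elements before picking enough
        # Either take arr[i] or skip it.
        return f(x - arr[i], i + 1, used + 1) + f(x, i + 1, used)

    return f(limit, 0, 0)


print(count_choices([1, 2, 3, 4, 5, 2, 4, 1], 7))
```

Picks are counted by position, so two equal values at different positions count as different choices.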
The following program written in Python 3.4.1 gives one solution that may help you with the problem.
NUMBERS = 1, 2, 3, 4, 5, 2, 4, 1
TARGET = 7
USING = 3
def main():
    candidates = sorted(NUMBERS)[:USING]
    if sum(candidates) <= TARGET:
        print('Your numbers are', candidates)
    else:
        print('Your goal is not possible.')
if __name__ == '__main__':
    main()
Edit:
Based on your comment that you want all possible solutions, the following provides this information along with the number of unique solutions. A solution is considered to be the same as another if both have the same numbers in them (regardless of order).
import itertools
NUMBERS = 1, 2, 3, 4, 5, 2, 4, 1
TARGET = 7
USING = 3
def main():
    # Find all possible solutions.
    solutions = []
    for candidates in itertools.combinations(NUMBERS, USING):
        if sum(candidates) <= TARGET:
            print('Solution:', candidates)
            solutions.append(candidates)
    print('There are', len(solutions), 'solutions to your problem.')
    # Find all unique solutions.
    unique = {tuple(sorted(answer)) for answer in solutions}
    print('However, only', len(unique), 'answers are unique.')
    for answer in sorted(unique):
        print('Unique:', answer)
if __name__ == '__main__':
    main()
It is possible to obtain O(n^2) time complexity using two pointers technique:
Sort the numbers.
Let's fix the middle number. Let's assume that its index is mid.
For a fixed mid, you can maintain two indices: low and high. They correspond to the smallest and the biggest number in a sum. Initially, low = mid - 1 and high = mid + 1.
Now you can increment high by one in a loop and decrement low as long as the sum of the 3 numbers is greater than S. For a fixed high and mid, low shows how many numbers can be added to a[mid] and a[high] so that their sum is <= S. Note that for a fixed mid, high can be incremented only O(n) times and low can be decremented only O(n) times. Thus, the time complexity is O(n^2).
This algorithm requires only O(1) additional space (for the low, mid and high indices), provided the numbers are sorted in place.
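Here is how those steps might look in Python, as a sketch rather than a tuned implementation. The function name is made up for illustration, and sorted() makes a copy of the input, so this version uses O(n) extra space instead of sorting in place.

```python
def count_triples_two_pointers(numbers, limit):
    """Count position triples whose values sum to at most `limit`, in O(n^2)."""
    a = sorted(numbers)
    n = len(a)
    total = 0
    for mid in range(1, n - 1):
        low = mid - 1
        for high in range(mid + 1, n):
            # a[high] only grows as high advances, so low only ever moves left.
            while low >= 0 and a[low] + a[mid] + a[high] > limit:
                low -= 1
            if low < 0:
                break  # no smaller element fits for this or any larger high
            # Every index 0..low works as the smallest element of the triple.
            total += low + 1
    return total


print(count_triples_two_pointers([1, 2, 3, 4, 5, 2, 4, 1], 7))
```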
Use recursion. A C++ solution:
#include <vector>
void count(const std::vector<int>& arr, int totalTaken, std::size_t index,
           int currentSum, int expectedSum, int* totalSolutions) {
    // Three numbers chosen: count this choice if the sum is small enough.
    if (totalTaken == 3) {
        if (currentSum <= expectedSum)
            (*totalSolutions)++;
        return;
    }
    if (index == arr.size()) return;
    // Take arr[index].
    count(arr, totalTaken + 1, index + 1, currentSum + arr[index], expectedSum, totalSolutions);
    // Skip arr[index].
    count(arr, totalTaken, index + 1, currentSum, expectedSum, totalSolutions);
}
Initialise *ptr2int to 0 and call count(your_vector, 0, 0, 0, expectedSum, ptr2int). After the function has executed, the result is stored in *ptr2int.

Determine whether a symbol is part of the ith combination nCr

UPDATE:
Combinatorics and unranking was eventually what I needed.
The links below helped a lot:
http://msdn.microsoft.com/en-us/library/aa289166(v=vs.71).aspx
http://www.codeproject.com/Articles/21335/Combinations-in-C-Part-2
The Problem
Given a list of N symbols say {0,1,2,3,4...}
And NCr combinations of these
eg. NC3 will generate:
0 1 2
0 1 3
0 1 4
...
...
1 2 3
1 2 4
etc...
For the ith combination (i = [1 .. NCr]) I want to determine Whether a symbol (s) is part of it.
Func(N, r, i, s) = True/False or 0/1
eg. Continuing from above
The 1st combination contains 0 1 2 but not 3
F(N,3,1,"0") = TRUE
F(N,3,1,"1") = TRUE
F(N,3,1,"2") = TRUE
F(N,3,1,"3") = FALSE
Current approaches and tidbits that might help or be related.
Relation to matrices
For r = 2 eg. 4C2 the combinations are the upper (or lower) half of a 2D matrix
1,2 1,3 1,4
----2,3 2,4
--------3,4
For r = 3 its the corner of a 3D matrix or cube
for r = 4 Its the "corner" of a 4D matrix and so on.
Another relation
Ideally the solution would be of a form something like the answer to this:
Calculate Combination based on position
For the nth combination in the list of combinations of length r (with repetition allowed), the ith symbol can be calculated
Using integer division and remainder:
n/r^i % r = (0 for 0th symbol, 1 for 1st symbol....etc)
eg for the 6th comb of 3 symbols the 0th 1st and 2nd symbols are:
i = 0 => 6 / 3^0 % 3 = 0
i = 1 => 6 / 3^1 % 3 = 2
i = 2 => 6 / 3^2 % 3 = 0
The 6th comb would then be 0 2 0
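In Python, that digit formula could be sketched like this (the function name is just for illustration, and `length` is the number of symbols to extract):

```python
def combination_with_repetition(n, r, length):
    """Return the symbols of the nth combination, i.e. n's base-r digits.

    Symbol i is (n // r**i) % r, listed from the 0th symbol upwards.
    """
    return [(n // r ** i) % r for i in range(length)]


print(combination_with_repetition(6, 3, 3))  # [0, 2, 0]
```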
I need something similar but with repetition not allowed.
Thank you for following this question this far :]
Kevin.
I believe your problem is that of unranking combinations or subsets.
I will give you an implementation in Mathematica, from the package Combinatorica, but the Google link above is probably a better place to start, unless you are familiar with the semantics.
UnrankKSubset::usage = "UnrankKSubset[m, k, l] gives the mth k-subset of set l, listed in lexicographic order."
UnrankKSubset[m_Integer, 1, s_List] := {s[[m + 1]]}
UnrankKSubset[0, k_Integer, s_List] := Take[s, k]
UnrankKSubset[m_Integer, k_Integer, s_List] :=
Block[{i = 1, n = Length[s], x1, u, $RecursionLimit = Infinity},
u = Binomial[n, k];
While[Binomial[i, k] < u - m, i++];
x1 = n - (i - 1);
Prepend[UnrankKSubset[m - u + Binomial[i, k], k-1, Drop[s, x1]], s[[x1]]]
]
Usage is like:
UnrankKSubset[5, 3, {0, 1, 2, 3, 4}]
{0, 3, 4}
Yielding the 6th (indexing from 0) length-3 combination of set {0, 1, 2, 3, 4}.
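For readers without Mathematica, the same lexicographic unranking can be sketched in Python. This is not a port of the Combinatorica code: the names unrank_lex and contains_symbol are hypothetical, and math.comb needs Python 3.8 or later.

```python
from math import comb


def unrank_lex(m, k, items):
    """Return the m-th (0-based) k-subset of `items` in lexicographic order."""
    n = len(items)
    result = []
    start = 0
    for remaining in range(k, 0, -1):
        for idx in range(start, n):
            # Number of subsets that continue with items[idx] at this position.
            block = comb(n - idx - 1, remaining - 1)
            if m < block:
                result.append(items[idx])
                start = idx + 1
                break
            m -= block  # skip all subsets that use items[idx] here
        else:
            raise IndexError("rank out of range")
    return result


def contains_symbol(n, r, i, s):
    """The question's Func(N, r, i, s), with i counted from 1."""
    return s in unrank_lex(i - 1, r, list(range(n)))


print(unrank_lex(5, 3, [0, 1, 2, 3, 4]))  # [0, 3, 4]
```

With this, contains_symbol(5, 3, 1, 0) is True and contains_symbol(5, 3, 1, 3) is False, matching the first combination 0 1 2 from the question.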
There's a very efficient algorithm for this problem, which is also contained in the recently published Knuth, The Art of Computer Programming, Volume 4A (section 7.2.1.3).
Since you don't care about the order in which the combinations are generated, let's use the lexicographic order of the combinations where each combination is listed in descending order. Thus for r=3, the first 11 combinations of 3 symbols would be: 210, 310, 320, 321, 410, 420, 421, 430, 431, 432, 510. The advantage of this ordering is that the enumeration is independent of n; indeed it is an enumeration over all combinations of 3 symbols from {0, 1, 2, …}.
There is a standard method to directly generate the ith combination given i, so to test whether a symbol s is part of the ith combination, you can simply generate it and check.
Method
How many combinations of r symbols start with a particular symbol s? Well, the remaining r-1 positions must come from the s symbols 0, 1, 2, …, s-1, so it's (s choose r-1), where (s choose r-1) or C(s,r-1) is the binomial coefficient denoting the number of ways of choosing r-1 objects from s objects. As this is true for all s, the first symbol of the ith combination is the smallest s such that
$\sum_{k=0}^{s} \binom{k}{r-1} \ge i$.
Once you know the first symbol, the problem reduces to finding the $\left(i - \sum_{k=0}^{s-1} \binom{k}{r-1}\right)$-th combination of r-1 symbols, where we've subtracted those combinations that start with a symbol less than s.
Code
Python code (you can write C(n,r) more efficiently, but this is fast enough for us):
#!/usr/bin/env python3
tC = {}  # memo table for binomial coefficients
def C(n, r):
    if (n, r) in tC: return tC[(n, r)]
    if r > n-r: r = n-r
    if r < 0: return 0
    if r == 0: return 1
    tC[(n, r)] = C(n-1, r) + C(n-1, r-1)
    return tC[(n, r)]
def combination(r, k):
    '''Finds the kth combination of r letters.'''
    if r == 0: return []
    total = 0
    s = 0
    while True:
        if total + C(s, r-1) < k:
            total += C(s, r-1)
            s += 1
        else:
            return [s] + combination(r-1, k-total)
def Func(N, r, i, s): return s in combination(r, i)
for i in range(1, 20): print(combination(3, i))
print(combination(500, 10000000000000000000000000000000000000000000000000000000000000000))
Note how fast this is: it finds the 10000000000000000000000000000000000000000000000000000000000000000th combination of 500 letters (it starts with 542) in less than 0.5 seconds.
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
This class can easily be applied to your problem. If you have the rank (or index) to the binomial coefficient table, then simply call the class method that returns the K-indexes in an array. Then, loop through that returned array to see if any of the K-index values match the value you have. Pretty straight forward...

Resources