Could you please suggest any algorithm ideas for solving the following problem:
Given positive integers S, K ≤ N. Find distinct positive integers x1, x2, ..., xK such that
x12 + x22 + ... + xK2 = S
xi ≤ N and xi < xi+1 for all i = 1...K.
So far I know that I should start searching for the largest x below the square root of S.
Here's a slightly optimised brute force. As I linked to in the comments (and close vote), it seems to me we could modify the standard subset sum problem to work with squares.
JavaScript code (ellipsis at the end of an output means add the rest of the numbers down to 1):
function f(S, K, N){
function g(s, k, n, c){
const minS = k*(k+1)*(2*k+1)/6;
const maxN = Math.min(
n,
~~Math.sqrt(s - (k-1)*k*(2*k-1)/6)
);
if (s < minS || maxN < 1)
return [];
else if (s == minS)
return [c.concat([k, '...'])];
if (k == 0)
return [];
return g(s - maxN*maxN, k - 1, maxN - 1, c.slice().concat(maxN))
.concat(
g(s, k, maxN - 1, c)
);
}
return g(S, K, N, []);
}
for (let j=1; j<6; j++)
console.log(JSON.stringify([178, j, f(178, j, 100)]));
console.log(JSON.stringify([178, 5, f(178, 5, 10)]));
Since you ask for an algorithm idea, and not a precise answer, so here is my 2 cents.
You can transform your problem to get rid of the squares by finding all perfect squares ( e.g. 1, 4, 9, 16, 25...) from 1 to S and put them into a set called PerfectSquareSetLessThanS.
Then, if you can solve the following (more general) problem, you can easily solve your original problem.
General problem:
Given an arbitary set V, find a subset W of V that has exactly K numbers such that they sum to S.
While I don't yet have a precise solution, your general problem looks a lot like a well known problem called the 3SUM problem.
The 3SUM problem asks if given a set of n numbers, are there a set of 3 numbers whose sum is 0. The wikipedia page outlines an algorithm to search for such tupple.
Note that the 3SUM problems have different variants, 1 of which is the k-SUM (substitute 3 by k in the above paragraph) , and another one is the non-zero sum variant. Maybe you could find a combination of these two variants to build your generalized solution.
Related
Note: This may involve a good deal of number theory, but the formula I found online is only an approximation, so I believe an exact solution requires some sort of iterative calculation by a computer.
My goal is to find an efficient algorithm (in terms of time complexity) to solve the following problem for large values of n:
Let R(a,b) be the amount of steps that the Euclidean algorithm takes to find the GCD of nonnegative integers a and b. That is, R(a,b) = 1 + R(b,a%b), and R(a,0) = 0. Given a natural number n, find the sum of R(a,b) for all 1 <= a,b <= n.
For example, if n = 2, then the solution is R(1,1) + R(1,2) + R(2,1) + R(2,2) = 1 + 2 + 1 + 1 = 5.
Since there are n^2 pairs corresponding to the numbers to be added together, simply computing R(a,b) for every pair can do no better than O(n^2), regardless of the efficiency of R. Thus, to improve the efficiency of the algorithm, a faster method must somehow calculate the sum of R(a,b) over many values at once. There are a few properties that I suspect might be useful:
If a = b, then R(a,b) = 1
If a < b, then R(a,b) = 1 + R(b,a)
R(a,b) = R(ka,kb) where k is some natural number
If b <= a, then R(a,b) = R(a+b,b)
If b <= a < 2b, then R(a,b) = R(2a-b,a)
Because of the first two properties, it is only necessary to find the sum of R(a,b) over pairs where a > b. I tried using this in addition to the third property in a method that computes R(a,b) only for pairs where a and b are also coprime in addition to a being greater than b. The total sum is then n plus the sum of (n / a) * ((2 * R(a,b)) + 1) over all such pairs (using integer division for n / a). This algorithm still had time complexity O(n^2), I discovered, due to Euler's totient function being roughly linear.
I don't need any specific code solution, I just need to figure out the procedure for a more efficient algorithm. But if the programming language matters, my attempts to solve this problem have used C++.
Side note: I have found that a formula has been discovered that nearly solves this problem, but it is only an approximation. Note that the formula calculates the average rather than the sum, so it would just need to be multiplied by n^2. If the formula could be expanded to reduce the error, it might work, but from what I can tell, I'm not sure if this is possible.
Using Stern-Brocot, due to symmetry, we can look at just one of the four subtrees rooted at 1/3, 2/3, 3/2 or 3/1. The time complexity is still O(n^2) but obviously performs less calculations. The version below uses the subtree rooted at 2/3 (or at least that's the one I looked at to think through :). Also note, we only care about the denominators there since the numerators are lower. Also note the code relies on rules 2 and 3 as well.
C++ code (takes about a tenth of a second for n = 10,000):
#include <iostream>
using namespace std;
long g(int n, int l, int mid, int r, int fromL, int turns){
long right = 0;
long left = 0;
if (mid + r <= n)
right = g(n, mid, mid + r, r, 1, turns + (1^fromL));
if (mid + l <= n)
left = g(n, l, mid + l, mid, 0, turns + fromL);
// Multiples
int k = n / mid;
// This subtree is rooted at 2/3
return 4 * k * turns + left + right;
}
long f(int n) {
// 1/1, 2/2, 3/3 etc.
long total = n;
// 1/2, 2/4, 3/6 etc.
if (n > 1)
total += 3 * (n >> 1);
if (n > 2)
// Technically 3 turns for 2/3 but
// we can avoid a subtraction
// per call by starting with 2. (I
// guess that means it could be
// another subtree, but I haven't
// thought it through.)
total += g(n, 2, 3, 1, 1, 2);
return total;
}
int main() {
cout << f(10000);
return 0;
}
I think this is a hard problem. We can avoid division and reduce the space usage to linear at least via the Stern--Brocot tree.
def f(n, a, b, r):
return r if a + b > n else r + f(n, a + b, b, r) + f(n, a + b, a, r + 1)
def R_sum(n):
return sum(f(n, d, d, 1) for d in range(1, n + 1))
def R(a, b):
return 1 + R(b, a % b) if b else 0
def test(n):
print(R_sum(n))
print(sum(R(a, b) for a in range(1, n + 1) for b in range(1, n + 1)))
test(100)
Suppose n =3
Using the numbers 1 and 2, the combinations to get the sum n would be [1,2], [2,1], [1,1,1]. 3 combinations.
If n= 2
then the combinations to get the sum n would be [1,1] and [2]. 2 combinations.
How do I write an algorithm to compute the number of combinations of 1 and 2 to get the sum to n?
What you are describing seems to be the Coin-Change Problem in the dynamic-programming category. You can check out these links below to gain a better understanding of them -
Coin Change Problem - GeeksForGeeks
Coin Change Problem - Algorithmist
Since just providing a link is not acceptable, i am posting some of the link's content here to give you an idea -
Given a value N, if we want to make change for N cents, and we have infinite supply of each of
S = { S1, S2, .. , Sm} valued coins,
how many ways can we make the change?
The order of coins doesn’t matter.
For example, for N = 4 and S = {1,2,3},
there are four solutions: {1,1,1,1},{1,1,2},{2,2},{1,3}.
So output should be 4.
For N = 10 and S = {2, 5, 3, 6},
there are five solutions: {2,2,2,2,2}, {2,2,3,3}, {2,2,6}, {2,3,5} and {5,5}.
So the output should be 5.
Optimal Substructure
To count total number solutions, we can divide all set solutions in two sets.
1) Solutions that do not contain mth coin (or Sm).
2) Solutions that contain at least one Sm.
Let count(S[], m, n) be the function to count the number of solutions,
then it can be written as sum of count(S[], m-1, n) and count(S[], m, n-Sm).
Therefore, the problem has optimal substructure property as the problem can be solved using solutions to subproblems.
Overlapping Subproblems
Following is a simple recursive implementation of the Coin Change problem. The implementation simply follows the recursive structure mentioned above.
// Returns the count of ways we can sum S[0...m-1] coins to get sum n
int count( int S[], int m, int n )
{
// If n is 0 then there is 1 solution (do not include any coin)
if (n == 0)
return 1;
// If n is less than 0 then no solution exists
if (n < 0)
return 0;
// If there are no coins and n is greater than 0, then no solution exist
if (m <=0 && n >= 1)
return 0;
// count is sum of solutions (i) including S[m-1] (ii) excluding S[m-1]
return count( S, m - 1, n ) + count( S, m, n-S[m-1] );
}
If you think and dry run the above code you should be able to grasp the basic idea. I hope this solves your problem.
The above explanation :
Courtesy - GeeksforGeeks
This problem can be solved in Mathematics way. The equation is summation of (n-r)Cr where r = 0,1,2,...,n/2 for all n>0
Note that this equation can be further generalize for n>0 and set {1,x} only. Other than this, you should look at the idea provided by #RahulNori
n=4
[1,1,1,1] - 1 combination (4C0)
[2,1,1], [1,2,1], [1,1,2] - 3 combination (3C1)
[2,2] - 1 combination (2C2)
n=5
[1,1,1,1,1] - 1 combination (5C0)
[2,1,1,1], [1,2,1,1], [1,1,2,1], [1,1,1,2] - 4 combination (4C1)
[2,2,1], [2,1,2], [1,2,2] - 3 combination (3C2)
n=6
[1,1,1,1,1,1] - 1 combination (6C0)
[2,1,1,1,1], [1,2,1,1,1], [1,1,2,1,1], [1,1,1,2,1], [1,1,1,1,2] - 5 combination (5C1)
[2,2,1,1], [2,1,2,1], [1,2,2,1], [2,1,1,2], [1,2,1,2], [1,1,2,2] - 6 combination (4C2)
[2,2,2] - 1 combination (3C3)
I'm looking to construct an algorithm which gives the arrangements with repetition of n sequences of a given step S (which can be a positive real number), under the constraint that the sum of all combinations is k, with k a positive integer.
My problem is thus to find the solutions to the equation:
x 1 + x 2 + ⋯ + x n = k
where
0 ≤ x i ≤ b i
and S (the step) a real number with finite decimal.
For instance, if 0≤xi≤50, and S=2.5 then xi = {0, 2.5 , 5,..., 47.5, 50}.
The point here is to look only through the combinations having a sum=k because if n is big it is not possible to generate all the arrangements, so I would like to bypass this to generate only the combinations that match the constraint.
I was thinking to start with n=2 for instance, and find all linear combinations that match the constraint.
ex: if xi = {0, 2.5 , 5,..., 47.5, 50} and k=100, then we only have one combination={50,50}
For n=3, we have the combination for n=2 times 3, i.e. {50,50,0},{50,0,50} and {0,50,50} plus the combinations {50,47.5,2.5} * 3! etc...
If xi = {0, 2.5 , 5,..., 37.5, 40} and k=100, then we have 0 combinations for n=2 because 2*40<100, and we have {40,40,20} times 3 for n=3... (if I'm not mistaken)
I'm a bit lost as I can't seem to find a proper way to start the algorithm, knowing that I should have the step S and b as inputs.
Do you have any suggestions?
Thanks
You can transform your problem into an integer problem by dividing everything by S: We want to find all integer sequences y1, ..., yn with:
(1) 0 ≤ yi ≤ ⌊b / S⌋
(2) y1 + ... + yn = k / S
We can see that there is no solution if k is not a multiple of S. Once we have reduced the problem, I would suggest using a pseudopolynomial dynamic programming algorithm to solve the subset sum problem and then reconstruct the solution from it. Let f(i, j) be the number of ways to make sum j with i elements. We have the following recurrence:
f(0,0) = 1
f(0,j) = 0 forall j > 0
f(i,j) = sum_{m = 0}^{min(floor(b / S), j)} f(i - 1, j - m)
We can solve f in O(n * k / S) time by filling it row by row. Now we want to reconstruct the solution. I'm using Python-style pseudocode to illustrate the concept:
def reconstruct(i, j):
if f(i,j) == 0:
return
if i == 0:
yield []
return
for m := 0 to min(floor(b / S), j):
for rest in reconstruct(i - 1, j - m):
yield [m] + rest
result = reconstruct(n, k / S)
result will be a list of all possible combinations.
What you are describing sounds like a special case of the subset sum problem. Once you put it in those terms, you'll find that Pisinger apparently has a linear time algorithm for solving a more general version of your problem, since your weights are bounded. If you're interested in designing your own algorithm, you might start by reading Pisinger's thesis to get some ideas.
Since you are looking for all possible solutions and not just a single solution, the dynamic programming approach is probably your best bet.
Given:
array of integers
value K,M
Question:
Find the maximum sum which we can obtain from all K element subsets of given array such that sum is less than value M?
is there a non dynamic programming solution available to this problem?
or if it is only dp[i][j][k] can only solve this type of problem!
can you please explain the algorithm.
Many people have commented correctly that the answer below from years ago, which uses dynamic programming, incorrectly encodes solutions allowing an element of the array to appear in a "subset" multiple times. Luckily there is still hope for a DP based approach.
Let dp[i][j][k] = true if there exists a size k subset of the first i elements of the input array summing up to j
Our base case is dp[0][0][0] = true
Now, either the size k subset of the first i elements uses a[i + 1], or it does not, giving the recurrence
dp[i + 1][j][k] = dp[i][j - a[i + 1]][k - 1] OR dp[i][j][k]
Put everything together:
given A[1...N]
initialize dp[0...N][0...M][0...K] to false
dp[0][0][0] = true
for i = 0 to N - 1:
for j = 0 to M:
for k = 0 to K:
if dp[i][j][k]:
dp[i + 1][j][k] = true
if j >= A[i] and k >= 1 and dp[i][j - A[i + 1]][k - 1]:
dp[i + 1][j][k] = true
max_sum = 0
for j = 0 to M:
if dp[N][j][K]:
max_sum = j
return max_sum
giving O(NMK) time and space complexity.
Stepping back, we've made one assumption here implicitly which is that A[1...i] are all non-negative. With negative numbers, initializing the second dimension 0...M is not correct. Consider a size K subset made up of a size K - 1 subset with sum exceeding M and one other sufficiently negative element of A[] such that overall sum no longer exceeds M. Similarly, our size K - 1 subset could sum to some extremely negative number and then with a sufficiently positive element of A[] sum to M. In order for our algorithm to still work in both cases we would need to increase the second dimension from M to the difference between the sum of all positive elements in A[] and the sum of all negative elements (the sum of the absolute values of all elements in A[]).
As for whether a non dynamic programming solution exists, certainly there is the naive exponential time brute force solution and variations that optimize the constant factor in the exponent.
Beyond that? Well your problem is closely related to subset sum and the literature for the big name NP complete problems is rather extensive. And as a general principle algorithms can come in all shapes and sizes -- it's not impossible for me to imagine doing say, randomization, approximation, (just choose the error parameter to be sufficiently small!) plain old reductions to other NP complete problems (convert your problem into a giant boolean circuit and run a SAT solver). Yes these are different algorithms. Are they faster than a dynamic programming solution? Some of them, probably. Are they as simple to understand or implement, without say training beyond standard introduction to algorithms material? Probably not.
This is a variant of the Knapsack or subset-problem, where in terms of time (at the cost of exponential growing space requirements as the input size grows), dynamic programming is the most efficient method that CORRECTLY solves this problem. See Is this variant of the subset sum problem easier to solve? for a similar question to yours.
However, since your problem is not exactly the same, I'll provide an explanation anyways. Let dp[i][j] = true, if there is a subset of length i that sums to j and false if there isn't. The idea is that dp[][] will encode the sums of all possible subsets for every possible length. We can then simply find the largest j <= M such that dp[K][j] is true. Our base case dp[0][0] = true because we can always make a subset that sums to 0 by picking one of size 0.
The recurrence is also fairly straightforward. Suppose we've calculated the values of dp[][] using the first n values of the array. To find all possible subsets of the first n+1 values of the array, we can simply take the n+1_th value and add it to all the subsets we've seen before. More concretely, we have the following code:
initialize dp[0..K][0..M] to false
dp[0][0] = true
for i = 0 to N:
for s = 0 to K - 1:
for j = M to 0:
if dp[s][j] && A[i] + j < M:
dp[s + 1][j + A[i]] = true
for j = M to 0:
if dp[K][j]:
print j
break
We're looking for a subset of K elements for which the sum of the elements is a maximum, but less than M.
We can place bounds [X, Y] on the largest element in the subset as follows.
First we sort the (N) integers, values[0] ... values[N-1], with the element values[0] is the smallest.
The lower bound X is the largest integer for which
values[X] + values[X-1] + .... + values[X-(K-1)] < M.
(If X is N-1, then we've found the answer.)
The upper bound Y is the largest integer less than N for which
values[0] + values[1] + ... + values[K-2] + values[Y] < M.
With this observation, we can now bound the second-highest term for each value of the highest term Z, where
X <= Z <= Y.
We can use exactly the same method, since the form of the problem is exactly the same. The reduced problem is finding a subset of K-1 elements, taken from values[0] ... values[Z-1], for which the sum of the elements is a maximum, but less than M - values[Z].
Once we've bound that value in the same way, we can put bounds on the third-largest value for each pair of the two highest values. And so on.
This gives us a tree structure to search, hopefully with much fewer combinations to search than N choose K.
Felix is correct that this is a special case of the knapsack problem. His dynamic programming algorithm takes O(K*M) size and O(K*K*M) amount of time. I believe his use of the variable N really should be K.
There are two books devoted to the knapsack problem. The latest one, by Kellerer, Pferschy and Pisinger [2004, Springer-Verlag, ISBN 3-540-40286-1] gives an improved dynamic programming algorithm on their page 76, Figure 4.2 that takes O(K+M) space and O(KM) time, which is huge reduction compared to the dynamic programming algorithm given by Felix. Note that there is a typo on the book's last line of the algorithm where it should be c-bar := c-bar - w_(r(c-bar)).
My C# implementation is below. I cannot say that I have extensively tested it, and I welcome feedback on this. I used BitArray to implement the concept of the sets given in the algorithm in the book. In my code, c is the capacity (which in the original post was called M), and I used w instead of A as the array that holds the weights.
An example of its use is:
int[] optimal_indexes_for_ssp = new SubsetSumProblem(12, new List<int> { 1, 3, 5, 6 }).SolveSubsetSumProblem();
where the array optimal_indexes_for_ssp contains [0,2,3] corresponding to the elements 1, 5, 6.
using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
public class SubsetSumProblem
{
private int[] w;
private int c;
public SubsetSumProblem(int c, IEnumerable<int> w)
{
if (c < 0) throw new ArgumentOutOfRangeException("Capacity for subset sum problem must be at least 0, but input was: " + c.ToString());
int n = w.Count();
this.w = new int[n];
this.c = c;
IEnumerator<int> pwi = w.GetEnumerator();
pwi.MoveNext();
for (int i = 0; i < n; i++, pwi.MoveNext())
this.w[i] = pwi.Current;
}
public int[] SolveSubsetSumProblem()
{
int n = w.Length;
int[] r = new int[c+1];
BitArray R = new BitArray(c+1);
R[0] = true;
BitArray Rp = new BitArray(c+1);
for (int d =0; d<=c ; d++) r[d] = 0;
for (int j = 0; j < n; j++)
{
Rp.SetAll(false);
for (int k = 0; k <= c; k++)
if (R[k] && k + w[j] <= c) Rp[k + w[j]] = true;
for (int k = w[j]; k <= c; k++) // since Rp[k]=false for k<w[j]
if (Rp[k])
{
if (!R[k]) r[k] = j;
R[k] = true;
}
}
int capacity_used= 0;
for(int d=c; d>=0; d--)
if (R[d])
{
capacity_used = d;
break;
}
List<int> result = new List<int>();
while (capacity_used > 0)
{
result.Add(r[capacity_used]);
capacity_used -= w[r[capacity_used]];
} ;
if (capacity_used < 0) throw new Exception("Subset sum program has an internal logic error");
return result.ToArray();
}
}
I was wondering how to solve such a problem using DP.
Given n balls and m bins, each bin having max. capacity c1, c2,...cm. What is the total number of ways of distributing these n balls into these m bins.
The problem I face is
How to find a recurrence relation (I could when the capacities were all a single constant c).
There will be several test cases, each having its own set of c1,c2....cm. So how would the DP actually apply for all these test cases because the answer explicitly depends on present c1,c2....cm, and I can't store (or pre-compute) the answer for all combinations of c1,c2....cm.
Also, I could not come up with any direct combinatoric formula for this problem too, and I don't think one exists too.
You can define your function assuming the limits c[0], c[1], ... c[m-1] as fixed and then writing the recursive formula that returns the number of ways you can distribute n balls into bins starting at index k. With this approach a basic formula is simply
def solutions(n, k):
if n == 0:
return 1 # Out of balls, there's only one solution (0, 0, 0, 0 ... 0)
if k == m:
return 0 # Out of bins... no solutions
total = 0
for h in xrange(0, min(n, c[k])+1): # from 0 to c[k] (included) into bin k
total += solutions(n - h, k + 1)
return total
then you need to add memoization (this will be equivalent to a DP approach) and some other optimizations like e.g. that if n > c[k] + c[k+1] + c[k+2] + ... then you know there are no solutions without the need to search (and you can precompute the partial sums).
There exists a combinatoric formula for this problem. The problem of finding the solutions to your problem is equivalent to finding the number of solutions of the equation
x1 + x2 + x3 + ... + xm = n
where xi < ci
Which is equivalent to finding the cofficient of x^n in the following equation
(1+x+..x^c1)(1+x+..+x^c2)...(1+x+...+x^cm)
The recursion for this equation is pretty simple
M(i,j) = summation(M(i-1, j-k)) where 0<= k <= cj
M(i,j) = 0 j <= 0
M(i,1) = i given for every 1= 1
M(i,j) is the number of ways of distributing the j balls in first i bins.
For the Dynamic Programming part Solve this recursion by Memoization, You will get your DP Solution automatically.