The cell-sum puzzle is defined as follows:
Given two sets of non-negative integers X = {x1, x2,...,xm} and Y = {y1, y2,...,yn}, fill each cell in a grid of m rows and n columns with a single non-negative integer such that xi is the sum of the cells in the ith row for every i ≤ m and such that yj is the sum of the cells in the jth column for every j ≤ n.
For example, if X = {7, 13} and Y = {8, 9, 3}, then your goal would be to replace the question marks in the following grid:
? + ? + ? = 7
+ + +
? + ? + ? = 13
= = =
8 9 3
and a valid solution would be:
3 + 1 + 3 = 7
+ + +
5 + 8 + 0 = 13
= = =
8 9 3
How do you solve this puzzle for arbitrarily large m and n? Also, for your method of choice, do you know the time complexity, and can you tell whether it is the most efficient algorithm possible?

Here's a linear-time algorithm (O(m + n) assuming we can output a sparse matrix, which is asymptotically optimal because we have to read the whole input; otherwise O(m n), which is optimal because we have to write the whole output).
Fill in the upper-left question mark with the min of the first row sum and the first column sum. If the first row sum equals the min, put zeros in the rest of the row. If the first column sum equals the min, put zeros in the rest of the column. Extract the subproblem by subtracting the new value from the first row/column if they remain and recurse.
On your example:
? + ? + ? = 7
+ + +
? + ? + ? = 13
= = =
8 9 3
Min of 7 and 8 is 7.
7 + 0 + 0 = 7
+ + +
? + ? + ? = 13
= = =
8 9 3
Extract the subproblem.
? + ? + ? = 13
= = =
1 9 3
Min of 13 and 1 is 1.
1 + ? + ? = 13
= = =
1 9 3
Extract the subproblem.
? + ? = 12
= =
9 3
Keep going until we get the final solution.
7 + 0 + 0 = 7
+ + +
1 + 9 + 3 = 13
= = =
8 9 3
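The greedy fill described above can be sketched in Python as follows. This is a minimal illustration rather than the answerer's own code: the function name `solve_cell_sum` is made up here, it returns a dense grid (so it runs in O(m n) overall), and it assumes the inputs are non-negative integers.

```python
def solve_cell_sum(row_sums, col_sums):
    """Greedy corner fill; returns a dense grid, or None if the totals disagree."""
    if sum(row_sums) != sum(col_sums):
        return None  # no grid can satisfy both sets of sums
    rows, cols = list(row_sums), list(col_sums)
    grid = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        value = min(rows[i], cols[j])
        grid[i][j] = value
        rows[i] -= value
        cols[j] -= value
        # Drop whichever of the current row/column is used up;
        # the cells skipped this way keep their initial zeros.
        if rows[i] == 0:
            i += 1
        else:
            j += 1
    return grid


print(solve_cell_sum([7, 13], [8, 9, 3]))
```

Each loop iteration retires one row or one column, so the loop itself runs at most m + n times; only allocating the dense grid costs O(m n).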

Edit: the problem is not NP-hard. The algorithm in David Eisenstat's answer is provably correct for finding a solution. However, I'll leave this answer here since it gives a way to find all solutions, which might be of interest to some.
For what it's worth, my "method of choice" is constraint programming; it's easy to model this as a constraint satisfaction problem, and then a wide range of well-developed algorithms can be applied. The code below is in Python, using the python-constraint library.
from constraint import Problem, ExactSumConstraint

x_sums = [7, 13]
y_sums = [8, 9, 3]

problem = Problem()
x_n, y_n = len(x_sums), len(y_sums)
max_num = max(x_sums + y_sums)

# One variable per cell; cell (i, j) has index i + x_n * j.
problem.addVariables(range(x_n * y_n), range(max_num + 1))

# Row constraints
for i, x in enumerate(x_sums):
    v = [ i + x_n * j for j in range(y_n) ]
    problem.addConstraint(ExactSumConstraint(x), v)

# Column constraints
for j, y in enumerate(y_sums):
    v = [ i + x_n * j for i in range(x_n) ]
    problem.addConstraint(ExactSumConstraint(y), v)

solution = problem.getSolution()

for i in range(x_n):
    print(*( solution[i + x_n * j] for j in range(y_n) ))
Output: it finds a different solution to yours. Alternatively, you could search for all solutions; there are 26 of them.
4 0 3
4 9 0
The time complexity of this is hard to pin down exactly; as a very weak upper bound we can say it's definitely at most O(max_num ** (x_n * y_n)) since that's the size of the search space. In practice it is much better than that, but the algorithm this library uses is rather complicated and difficult to analyse precisely. It's a backtracking search, but with some clever ways of using the constraints to eliminate the vast majority of branches from the search tree.
For some idea of how deep this rabbit hole goes, the Handbook of Constraint Programming gives a lot of details about techniques that constraint-solving algorithms can use to improve efficiency.
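The figure of 26 solutions quoted above can be cross-checked without any library. The sketch below is a plain recursive enumeration, not what python-constraint does internally; `count_solutions` and its helpers are names invented for this illustration, and it is only practical for small grids.

```python
def count_solutions(row_sums, col_sums):
    """Count grids of non-negative integers with the given row and column sums."""
    if sum(row_sums) != sum(col_sums):
        return 0

    def rows_with_sum(total, caps):
        # Yield every tuple with entries in 0..caps[j] that adds up to total.
        if not caps:
            if total == 0:
                yield ()
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in rows_with_sum(total - first, caps[1:]):
                yield (first,) + rest

    def count(i, remaining):
        # remaining[j] is what column j still needs from rows i, i+1, ...
        if i == len(row_sums):
            return 1 if all(r == 0 for r in remaining) else 0
        total = 0
        for row in rows_with_sum(row_sums[i], remaining):
            total += count(i + 1, tuple(r - v for r, v in zip(remaining, row)))
        return total

    return count(0, tuple(col_sums))


print(count_solutions([7, 13], [8, 9, 3]))  # 26
```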


How to solve M times prefix sum with better time complexity

The problem is to compute the prefix sums of an array of length N, repeating the process M times, e.g.
Example N=3
array = 1 2 3
output = 1 6 21
Step 1 prefix Sum = 1 3 6
Step 2 prefix sum = 1 4 10
Step 3 prefix sum = 1 5 15
Step 4(M) prefix sum = 1 6 21
Example 2:
array = 1 2 3 4 5
output = 1 5 15 35 70
I was not able to solve the problem and kept getting time limit exceeded. I used dynamic programming to solve it in O(NM) time. I looked around and found the following general mathematical solution, but I am still not able to solve it because my math isn't good enough to understand it. Can someone solve it with a better time complexity?
Hint: 3, 4, 5 and 6, 10, 15 are sections of diagonals on Pascal's Triangle.
JavaScript code:
// Assumes the input array is 1, 2, ..., n, as in both examples.
function f(n, m) {
  const result = [1];
  for (let i = 1; i < n; i++)
    result.push(result[i-1] * (m + i + 1) / i);
  return result;
}

console.log(JSON.stringify(f(3, 4))); // [1,6,21]
console.log(JSON.stringify(f(5, 3))); // [1,5,15,35,70]
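For an arbitrary input array (not just 1, 2, ..., N), the same Pascal's Triangle observation still applies: after M rounds, element j contributes to output position i with weight C(M - 1 + i - j, i - j). The sketch below is a hedged illustration of that idea; the function name `repeated_prefix_sums` is made up here, and it runs in O(N^2) regardless of M, which only beats O(NM) when M is larger than N.

```python
def repeated_prefix_sums(values, m):
    """Apply the prefix-sum operation m times using binomial weights."""
    n = len(values)
    # coeff[d] = C(m - 1 + d, d): weight of values[j] in output[j + d].
    coeff = [1] * n
    for d in range(1, n):
        coeff[d] = coeff[d - 1] * (m - 1 + d) // d  # exact integer division
    return [
        sum(coeff[i - j] * values[j] for j in range(i + 1))
        for i in range(n)
    ]


print(repeated_prefix_sums([1, 2, 3], 4))        # [1, 6, 21]
print(repeated_prefix_sums([1, 2, 3, 4, 5], 3))  # [1, 5, 15, 35, 70]
```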

Image Quantization with quantums Algorithm question

I came across a question and am unable to find a feasible solution.
Image Quantization
Given a grayscale image, where each pixel's color ranges from 0 to 255, compress the range of values to a given number of quantum values.
The goal is to do that with the minimum sum of costs needed, the cost of a pixel is defined as the absolute difference between its color and the closest quantum value for it.
There are 3 rows and 3 columns, image = [[7,2,8], [8,2,3], [9,8,255]], and quantums = 3 is the number of quantum values. The optimal quantum values are (2,8,255), leading to the minimum sum of costs |7-8| + |2-2| + |8-8| + |8-8| + |2-2| + |3-2| + |9-8| + |8-8| + |255-255| = 1+0+0+0+0+1+1+0+0 = 3
Function description
Complete the solve function provided in the editor. This function takes the following 4 parameters and returns the minimum sum of costs.
n Represents the number of rows in the image
m Represents the number of columns in the image
image Represents the image
quantums Represents the number of quantum values.
Print a single integer: the minimum sum of costs.
Sample Input 1
7 2 8
8 2 3
9 8 255
Sample output 1
The optimum quantum values are {0,1,2,3,4,5,7,8,9,255} Leading the minimum sum of costs |7-7| + |2-2| + |8-8| + |8-8| + |2-2| + |3-3| + |9-9| + |8-8| + |255-255| = 0+0+0+0+0+0+0+0+0 = 0
Can anyone help me reach the solution?
Clearly if we have as many or more quantums available than distinct pixels, we can return 0 as we set at least enough quantums to each equal one distinct pixel. Now consider setting the quantum at the lowest number of the sorted, grouped list.
M = [
    [7, 2, 8],
    [8, 2, 3],
    [9, 8, 255]
]
[(2, 2), (3, 1), (7, 1), (8, 3), (9, 1), (255, 1)]
We record the required sum of differences:
0 + 0 + 1 + 5 + 6 + 6 + 6 + 7 + 253 = 284
Now to update by incrementing the quantum by 1, we observe that we have a movement of 1 per element so all we need is the count of affected elements.
Increment 2 to 3
1 + 1 + 0 + 4 + 5 + 5 + 5 + 6 + 252 = 279
284 + 2 * 1 - 7 * 1
= 284 + 2 - 7
= 279
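The update rule just shown (add the number of pixels at or below the old quantum, subtract the number above it) can be sketched as follows. This is only an illustration of the single-quantum case; `single_quantum_costs` and its `lo`/`hi` parameters are names assumed for this example.

```python
from bisect import bisect_right


def single_quantum_costs(pixels, lo=0, hi=255):
    """Cost of one quantum placed at each value lo..hi, updated incrementally."""
    ordered = sorted(pixels)
    cost = sum(abs(p - lo) for p in ordered)
    costs = {lo: cost}
    for q in range(lo, hi):
        at_or_below = bisect_right(ordered, q)  # these move 1 further away
        above = len(ordered) - at_or_below      # these get 1 closer
        cost += at_or_below - above
        costs[q + 1] = cost
    return costs


costs = single_quantum_costs([7, 2, 8, 8, 2, 3, 9, 8, 255])
print(costs[2], costs[3])  # 284 279
```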
Consider traversing from the left with a single quantum, calculating only the effect on pixels in the sorted, grouped list that are on the left side of the quantum value.
To only update the left side when adding a quantum, we have:
left[k][q] = min(left[k-1][p] + effect(A, p, q))
where effect is the effect on the elements in A (the sorted, grouped list) as we reduce p incrementally and update the effect on the pixels in the range, [p, q] according to whether they are closer to p or q. As we increase q for each round of k, we can keep the relevant place in the sorted, grouped pixel list with a pointer that moves incrementally.
If we have a solution for left[k][q], where it is the best for pixels on the left side of q when including k quantums with the rightmost quantum set as the number q, then the complete candidate solution would be given by:
left[k][q] + effect(A, q, list_end)
where there is no quantum between q and list_end
Time complexity would be O(n + k * q * q), where n is the number of elements in the input matrix, k is the number of quantums and q ranges over the possible colour values (at most 256 here).
Python code:
def f(M, quantums):
    pixel_freq = [0] * 256

    for row in M:
        for colour in row:
            pixel_freq[colour] += 1

    # dp[k][q] stores the best solution up
    # to the qth quantum value, with
    # considering the effect left of
    # k quantums with the rightmost as q
    dp = [[0] * 256 for _ in range(quantums + 1)]

    pixel_count = pixel_freq[0]

    for q in range(1, 256):
        dp[1][q] = dp[1][q-1] + pixel_count
        pixel_count += pixel_freq[q]

    predecessor = [[None] * 256 for _ in range(quantums + 1)]

    # Main iteration, where the full
    # candidate includes both right and
    # left effects while incrementing the
    # number of quantums.
    for k in range(2, quantums + 1):
        for q in range(k - 1, 256):
            # Adding a quantum to the right
            # of the rightmost doesn't change
            # the left cost already calculated
            # for the rightmost.
            best_left = dp[k-1][q-1]
            predecessor[k][q] = q - 1
            q_effect = 0
            p_effect = 0
            p_count = 0

            for p in range(q - 2, k - 3, -1):
                r_idx = p + (q - p) // 2

                # When the distance between p
                # and q is even, we reassign
                # one pixel frequency to q
                if (q - p - 1) % 2 == 0:
                    r_freq = pixel_freq[r_idx + 1]
                    q_effect += (q - r_idx - 1) * r_freq
                    p_count -= r_freq
                    p_effect -= r_freq * (r_idx - p)

                # Either way, we add one pixel frequency
                # to p_count and recalculate
                p_count += pixel_freq[p + 1]
                p_effect += p_count

                effect = dp[k-1][p] + p_effect + q_effect

                if effect < best_left:
                    best_left = effect
                    predecessor[k][q] = p

            dp[k][q] = best_left

    # Records the cost only on the right
    # of the rightmost quantum
    # for candidate solutions.
    right_side_effect = 0
    pixel_count = pixel_freq[255]
    best = dp[quantums][255]
    best_quantum = 255

    for q in range(254, quantums-1, -1):
        right_side_effect += pixel_count
        pixel_count += pixel_freq[q]
        candidate = dp[quantums][q] + right_side_effect

        if candidate < best:
            best = candidate
            best_quantum = q

    # Walk the predecessor links back from
    # the rightmost quantum to recover the
    # chosen quantum values.
    quantum_list = [best_quantum]
    prev_quantum = best_quantum

    for i in range(quantums, 1, -1):
        prev_quantum = predecessor[i][prev_quantum]
        quantum_list.append(prev_quantum)

    return best, list(reversed(quantum_list))
M = [
    [7, 2, 8],
    [8, 2, 3],
    [9, 8, 255]
]
k = 3
print(f(M, k)) # (3, [2, 8, 255])

M = [
    [7, 2, 8],
    [8, 2, 3],
    [9, 8, 255]
]
k = 10
print(f(M, k)) # (0, [2, 3, 7, 8, 9, 251, 252, 253, 254, 255])
I would propose the following:
step 0
Input is:
image = 7 2 8
8 2 3
9 8 255
quantums = 3
step 1
Then you can calculate histogram from the input image. Since your image is grayscale, it can contain only values from 0-255.
It means that your histogram array has length equal to 256.
hist = int[256]                  // init the histogram array
for each pixel color in image    // iterate over image
    hist[color]++                // and increment histogram values
value 0 0 2 1 0 0 0 1 3 1 0  . . . 1
color 0 1 2 3 4 5 6 7 8 9 10 . . . 255
How to read the histogram:
color 3 has 1 occurrence
color 8 has 3 occurrences
With this approach, we have reduced our problem from N (amount of pixels) to 256 (histogram size).
Time complexity of this step is O(N)
step 2
Once we have the histogram in place, we can find its local maximums, as many as there are quantums. In our case, we need 3 local maximums.
For the sake of simplicity, I will not write the pseudo code; there are numerous examples on the internet. Just google 'find local maximum/extrema in array'.
It is important that you end up with 3 biggest local maximums. In our case it is:
value 0 0 2 1 0 0 0 1 3 1 0  . . . 1
color 0 1 2 3 4 5 6 7 8 9 10 . . . 255
          ^           ^            ^
These values (2, 8, 255) are your tops of the mountains.
Time complexity of this step is O(quantums)
I could explain why it is not O(1) or O(256), since you can find local maximums in a single pass. If needed I will add a comment.
step 3
Once you have your tops of the mountains, you want to isolate each mountain in a way that it has the maximum possible surface.
So, you will do that by finding the minimum value between two tops
In our case it is:
value 0 0 2 1 0 0 0 1 3 1 0 . . . 1
color 0 1 2 3 4 5 6 7 8 9 10 . . . 255
^ ^
| \ / \
- - _ _ _ _ . . . _ ^
So our goal is to find between index values:
from 0 to 2 (not needed, first mountain start from beginning)
from 2 to 8 (to see where first mountain ends, and second one starts)
from 8 to 255 (to see where second one ends, and third starts)
from 255 to end (just noted, also not needed, last mountain always reaches the end)
There are multiple candidates (multiple zeros), and it is not important which one you choose for minimum. Final surface of the mountain is always the same.
Let's say that our algorithm returns two minimums. We will use them in the next step.
min_1_2 = 6
min_2_3 = 254
Time complexity of this step is O(256). You need just a single pass over the histogram to calculate all minimums (actually you will do multiple smaller iterations, but in total you visit each element only once).
Someone could consider this as O(1)
Step 4
Calculate the median of each mountain.
This can be the tricky one. Why? Because we want to calculate the median using the original values (colors) and not counters (occurrences).
There is also the formula that can give us good estimate, and this one can be performed quite fast (looking only at histogram values) (https://medium.com/analytics-vidhya/descriptive-statistics-iii-c36ecb06a9ae)
If that is not precise enough, then the only option is to "unwrap" the calculated values. Then, we could sort these "raw" pixels and easily find the median.
In our case, those medians are 2, 8, 255
Time complexity of this step is O(n log n) if we have to sort the whole original image. If the approximation works fine, then the time complexity of this step is almost constant.
step 5
This is final step.
You now know the start and end of the "mountain".
You also know the median that belongs to that "mountain"
Again, you can iterate over each mountain and calculate the DIFF.
diff = 0
median_1 = 2
median_2 = 8
median_3 = 255
// for first mountain  -> START = 0,   END = 6
// for second mountain -> START = 6,   END = 254
// for third mountain  -> START = 254, END = 255
for each hist value (color, count) between START and END
    diff = diff + |color - median_X| * count
Time complexity of this step is again O(256), and it can be considered as constant time O(1)
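To check any proposed set of quantum values against the problem statement, the cost can be computed directly from the histogram. The sketch below is not part of either answer; `quantization_cost` and `brute_force_minimum` are names invented here. The brute force relies on the fact that an optimal quantum for a group of pixels can always be taken at a median, so it only tries existing pixel values, and it is only meant for tiny inputs like the sample.

```python
from itertools import combinations


def quantization_cost(image, quantum_values):
    """Sum over all pixels of the distance to the nearest quantum value."""
    hist = [0] * 256
    for row in image:
        for colour in row:
            hist[colour] += 1
    return sum(
        count * min(abs(colour - q) for q in quantum_values)
        for colour, count in enumerate(hist)
        if count
    )


def brute_force_minimum(image, quantums):
    """Try every choice of `quantums` distinct pixel values (small inputs only)."""
    distinct = sorted({colour for row in image for colour in row})
    if quantums >= len(distinct):
        return 0
    return min(
        quantization_cost(image, combo)
        for combo in combinations(distinct, quantums)
    )


image = [[7, 2, 8], [8, 2, 3], [9, 8, 255]]
print(quantization_cost(image, [2, 8, 255]))  # 3
print(brute_force_minimum(image, 3))          # 3
```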

Unique combinations of numbers that add up to a sum

I was asked this in an interview recently and got completely stumped. I know there are questions like this asked on here before but none handled the little twist thrown onto this one.
Given a number, find all possible ways you can add up to it using only the numbers 1,2,3. So for an input of 3, the output would be 4 because the combinations would be 1,1,1 and 1,2 and 2,1 and 3. I know about the coin change algorithm but it doesn't give me that permutation of 1,2 and 2,1. So I just ended up implementing the coin change algorithm and couldn't get the permutation part. Does anybody have any ideas?
It's a recursive problem:
take for example the possible options for 5
1 X X X X
2 X X X
3 X X
f(5)=f(4) + f(3) + f(2)
So the generic solution is
f(N)= f(N-1) + f(N-2) + f(N-3) for N > 3
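The recurrence can be evaluated bottom-up in a few lines. This is a minimal sketch, with `count_ordered_sums` and its `parts` parameter being names chosen for this illustration; it uses f(0) = 1 (the empty sum) as the base case, which reproduces f(1) = 1, f(2) = 2 and f(3) = 4.

```python
def count_ordered_sums(n, parts=(1, 2, 3)):
    """Number of ordered ways to write n as a sum of the given parts."""
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty sum
    for total in range(1, n + 1):
        # The first term of the sum is some part p; the rest sums to total - p.
        ways[total] = sum(ways[total - p] for p in parts if p <= total)
    return ways[n]


print(count_ordered_sums(3))  # 4
print(count_ordered_sums(5))  # 13
```

Passing a different tuple as `parts` handles other step sets with the same loop.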
To answer your question about the classification of the problem: it looks like a dynamic programming problem to me. See the following question taken from stanford.edu
1-dimensional DP Example
◮ Problem: given n, find the number of different ways to write n as the sum of 1, 3, 4
◮ Example: for n = 5, the answer is 6
    5 = 1 + 1 + 1 + 1 + 1
      = 1 + 1 + 3
      = 1 + 3 + 1
      = 3 + 1 + 1
      = 1 + 4
      = 4 + 1
And here is the solution to similar problem

Number of ways of distributing n identical balls into groups such that each group has atleast k balls?

I am trying to do this using recursion with memoization. I have identified the following base cases:
I) when n == k there is only one group with all the balls.
II) when k > n then no group can have at least k balls, hence zero.
I am unable to move forward from here. How can this be done?
As an illustration, when n = 6, k = 2 the possible groupings are (2,2,2), (4,2), (3,3) and (6).
That is, 4 different groupings can be formed.
This can be represented by the two dimensional recursive formula described below:
T(0, k) = 1
T(n, k) = 0                          if n < k and n != 0
T(n, k) = T(n-k, k) + T(n, k + 1)    otherwise
          ^           ^
          |           No box with k balls, advance to next k
          There is a box with k balls, put them
In the above, T(n,k) is the number of distributions of n balls such that each box gets at least k.
And the trick is to think of k as the lowest possible number of balls, and separate the problem into two scenarios: is there a box with exactly k balls (if so, place them and recurse with n-k balls), or not (and then recurse with a minimal value of k+1 and the same number of balls).
Example, to calculate your example: T(6,2) (6 balls, minimum 2 per box):
T(6,2) = T(4,2) + T(6,3)
T(4,2) = T(2,2) + T(4,3) = T(0,2) + T(2,3) + T(1,3) + T(4,4) =
= T(0,2) + T(2,3) + T(1,3) + T(0,4) + T(4,5) =
= 1 + 0 + 0 + 1 + 0
= 2
T(6,3) = T(3,3) + T(6,4) = T(0,3) + T(3,4) + T(2,4) + T(6,5)
= T(0,3) + T(3,4) + T(2,4) + T(1,5) + T(6,6) =
= T(0,3) + T(3,4) + T(2,4) + T(1,5) + T(0,6) + T(6,7) =
= 1 + 0 + 0 + 0 + 1 + 0
= 2
T(6,2) = T(4,2) + T(6,3) = 2 + 2 = 4
Using Dynamic Programming, it can be calculated in O(n^2) time.
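A direct memoized transcription of the recurrence might look like the sketch below. It keeps the name T from the formula; the use of `functools.lru_cache` is just one convenient way to memoize, and for large n the recursion depth would need attention (for example by filling a table iteratively instead).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n, k):
    """Ways to split n identical balls into boxes holding at least k balls each."""
    if n == 0:
        return 1  # nothing left to place: one (empty) way to finish
    if n < k:
        return 0  # leftover balls cannot fill a box of size >= k
    # Either some box holds exactly k balls, or every box holds more than k.
    return T(n - k, k) + T(n, k + 1)


print(T(6, 2))  # 4
```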
This case can be solved pretty simply:
Number of buckets
The maximum-number of buckets b can be determined as follows:
b = roundDown(n / k)
Each valid distribution can use at most b buckets.
Number of distributions with x buckets
For a given number of buckets, the number of distributions can be found pretty simply:
Distribute k balls to each bucket. Find the number of ways to distribute the remaining balls (r = n - k * x) to x buckets:
total_distributions(x) = bincoefficient(x , n - k * x)
EDIT: this will only work if order matters. Since it doesn't for the question, we can use a few tricks here:
Each distribution can be mapped to a sequence of numbers, e.g. d = {d1 , d2 , ... , dx}. We can easily generate all of these sequences starting with the "first" sequence {r , 0 , ... , 0} and subsequently moving 1s from the left to the right. So the next sequence would look like this: {r - 1 , 1 , ... , 0}. If only sequences matching d1 >= d2 >= ... >= dx are generated, no duplicates will be generated. This constraint can easily be used to optimize this search a bit: we can only move a 1 from da to db (with a = b - 1) if da - 1 >= db + 1 holds, since otherwise the constraint that the array is sorted is violated. The 1s to move are always the rightmost that can be moved. Another way to think of this would be to view r as a unary number and simply split that string into groups such that each group is at least as long as its successor.
countSequences(x)
    sequence = int[x]
    sequence[0] = r
    sequenceCount = 1
    while true
        int i = findRightmostMoveable(sequence)
        if i == -1
            return sequenceCount
        sequence[i] -= 1
        sequence[i + 1] += 1
        sequenceCount += 1

findRightmostMoveable(sequence)
    for i in [length(sequence) - 1 , 0)
        if sequence[i - 1] > sequence[i] + 1
            return i - 1
    return -1
Actually findRightmostMoveable could be optimized a bit, if we look at the structure-transitions of the sequence (to be more precise the difference between two elements of the sequence). But to be honest I'm by far too lazy to optimize this further.
Putting the pieces together
range(1 , roundDown(n / k)).map(b -> countSequences(b)).sum()

Can brute force algorithms scale?

I have a math problem that I solve by trial and error (I think this is called brute force), and the program works fine when there are a few options, but as I add more variables/data it takes longer and longer to run.
My problem is that although the prototype works, it is only useful with thousands of variables and large data sets; so, I'm wondering if it is possible to scale brute force algorithms. How can I approach scaling it?
I was starting to learn and play around with Hadoop (and HBase); although it looks promising, I wanted to verify that what I'm trying to do isn't impossible.
If it helps, I wrote the program in Java (and can use it if possible), but ended up porting it to Python, because I feel more comfortable with it.
Update: To provide more insight, I think I'll add a simplified version of the code to get the idea. Basically if I know the sum is 100, I am trying to find all combinations of the variables that could equal it. This is simple; in my version I may use larger numbers and many more variables. It's a Diophantine equation, and I believe there is no algorithm that exists to solve it without brute force.
int sum = 100;
int a1 = 20;
int a2 = 5;
int a3 = 10;
for (int i = 0; i * a1 <= sum; i++) {
    for (int j = 0; i * a1 + j * a2 <= sum; j++) {
        for (int k = 0; i * a1 + j * a2 + k * a3 <= sum; k++) {
            if (i * a1 + j * a2 + k * a3 == sum) {
                System.out.println(i + "," + j + "," + k);
            }
        }
    }
}
I am new to programming, and I am sorry if I'm not framing this question correctly. This is more of a general question.
Typically, you can quantify how well an algorithm will scale by using big-O notation to analyze its growth rate. When you say that your algorithm works by "brute force," it's unclear to what extent it will scale. If your "brute force" solution works by listing all possible subsets or combinations of a set of data, then it almost certainly will not scale (it will have asymptotic complexity O(2^n) or O(n!), respectively). If your brute force solution works by finding all pairs of elements and checking each, it may scale reasonably well (O(n^2)). Without more information about how your algorithm works, though, it's difficult to say.
You may want to look at this excellent post about big-O as a starting point for how to reason about the long-term scalability of your program. Typically speaking, anything that has growth rate O(n log n), O(n), O(log n), or O(1) scales extremely well, anything with growth rate O(n^2) or O(n^3) will scale up to a point, and anything with growth rate O(2^n) or higher will not scale at all.
Another option would be to look up the problem you're trying to solve to see how well-studied it is. Some problems are known to have great solutions, and if yours is one of them it might be worth seeing what others have come up with. Perhaps there is a very clean, non-brute-force solution that scales really well! Some other problems are conjectured to have no scalable algorithms at all (the so-called NP-hard problems). If that's the case, then you should be pretty confident that there's no way to get a scalable approach.
And finally, you can always ask a new question here at Stack Overflow describing what you're trying to do and asking for input. Maybe the community can help you solve your problem more efficiently than you initially expected!
EDIT: Given the description of the problem that you are trying to solve, right now you are doing one for loop per variable from 0 up to the number you're trying to target. The complexity of this algorithm is O(U^k), where k is the number of variables and U is the sum. This approach will not scale very well at all. Introducing each new variable in the above case will make the algorithm run 100 times slower, which definitely will not scale very well if you want 100 variables!
However, I think that there is a fairly good algorithm whose runtime is O(U^2 k) that uses O(Uk) memory to solve the problem. The intuition is as follows: Suppose that we want to sum up 1, 2, and 4 to get 10. There are many ways to do this:
2 * 4 + 1 * 2 + 0 * 1
2 * 4 + 0 * 2 + 2 * 1
1 * 4 + 3 * 2 + 0 * 1
1 * 4 + 2 * 2 + 2 * 1
1 * 4 + 1 * 2 + 4 * 1
1 * 4 + 0 * 2 + 6 * 1
0 * 4 + 5 * 2 + 0 * 1
0 * 4 + 4 * 2 + 2 * 1
0 * 4 + 3 * 2 + 4 * 1
0 * 4 + 2 * 2 + 6 * 1
0 * 4 + 1 * 2 + 8 * 1
0 * 4 + 0 * 2 + 10 * 1
The key observation is that we can write all of these out as sums, but more importantly, as sums where each term in the sum is no greater than the previous term:
2 * 4 + 1 * 2 + 0 * 1 = 4 + 4 + 2
2 * 4 + 0 * 2 + 2 * 1 = 4 + 4 + 1 + 1
1 * 4 + 3 * 2 + 0 * 1 = 4 + 2 + 2 + 2
1 * 4 + 2 * 2 + 2 * 1 = 4 + 2 + 2 + 1 + 1
1 * 4 + 1 * 2 + 4 * 1 = 4 + 2 + 1 + 1 + 1 + 1
1 * 4 + 0 * 2 + 6 * 1 = 4 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 5 * 2 + 0 * 1 = 2 + 2 + 2 + 2 + 2
0 * 4 + 4 * 2 + 2 * 1 = 2 + 2 + 2 + 2 + 1 + 1
0 * 4 + 3 * 2 + 4 * 1 = 2 + 2 + 2 + 1 + 1 + 1 + 1
0 * 4 + 2 * 2 + 6 * 1 = 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 1 * 2 + 8 * 1 = 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 + 0 * 2 + 10 * 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
So this gives an interesting idea about how to generate all possible ways to sum up to the target. The idea is to fix the first coefficient, then generate all possible ways to make the rest of the sum work out. In other words, we can think about the problem recursively. If we list the variables in order as x1, x2, ..., xn, then we can try fixing some particular coefficient for x1, then solving the problem of summing up sum - c_1 x_1 using just x2, ..., xn.
So far this doesn't seem all that fancy - in fact, it's precisely what you're doing above - but there is one trick we can use. As long as we're going to be thinking about this problem recursively, let's think about the problem in the opposite manner. Rather than starting with sum and trying to break it down, what if instead we started with 0 and tried to build up everything that we could?
Here's the idea. Suppose that we already know in advance all the numbers we can make using just sums of x1. Then for every number k between 0 and sum, inclusive, we can make k out of x1 and x2 whenever there is some coefficient c2 for which k - c2 x2 is something that can be made out of combinations of x1. But since we've precomputed this, we can just iterate up over all possible legal values of c2, compute k - c2 x2, and see if we know how to make it. Assuming we store a giant (U + 1) x (k + 1) table of boolean values such that table entry [x, y] stores "can we combine the first y variables in a way that sums up to precisely x?," we can fill in the table efficiently. This is called dynamic programming and is a powerful algorithmic tool.
More concretely, here's how this might work. Given k variables, create a (U + 1) x (k + 1) table T of values. Then, set T[0][0] = true and T[x][0] = false for all x > 0. The rationale here is that T[0][0] means "can we get the sum zero using a linear combination of the first zero variables?" and the answer is definitely yes (the empty sum is zero!), but any other sum definitely cannot be made from a linear combination of no variables.
Now, for i = 1 .. k, we'll try to fill in the values of T[x][i]. Remember that T[x][i] means "can we make x as a linear combination of the first i variables?" Well, we know that we can do this if there is some coefficient c such that x - c xi can be made using a linear combination of x1, x2, ..., xi - 1. But for any c, that's just whether T[x - c xi][i - 1] is true. Thus we can say
for i = 1 to k
    for z = 0 to sum:
        for c = 0 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true
Inspecting the loops, we see that the outer loop runs k times, the inner loop runs sum times per iteration, and the innermost loop runs also at most sum times per iteration. Their product is (using our notation from above) O(U^2 k), which is way better than the O(U^k) algorithm that you had originally.
But how do you use this information to list off all of the possible ways to sum up to the target? The trick here is to realize that you can use the table to avoid wasting a huge amount of effort searching over every possible combination when many of them aren't going to work.
Let's see an example. Suppose that we have this table completely computed and want to list off all solutions. One idea is to think about listing all solutions where the coefficient of the last variable is zero, then when the last variable is one, etc. The issue with the approach you had before is that for some coefficients there might not be any solutions at all. But with the table we have constructed above, we can prune out those branches. For example, suppose that we want to see if there are any solutions that start with xk having coefficient 0. This means that we're asking if there are any ways to sum up a linear combination of the first k - 1 variables so that the sum of those values is sum. This is possible if and only if T[sum][k - 1] is true. If it is true, then we can recursively try assigning coefficients to the rest of the values in a way that sums up to sum. If not, then we skip this coefficient and go on to the next.
Recursively, this looks something like this:
function RecursivelyListAllThatWork(k, sum) // Using first k variables, make sum
    /* Base case: If we've assigned all the variables correctly, list this
     * solution.
     */
    if k == 0:
        print what we have so far
        return

    /* Recursive step: Try all coefficients, but only if they work. */
    for c = 0 to sum / x_k:
        if T[sum - c * x_k][k - 1] is true:
            mark the coefficient of x_k to be c
            call RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            unmark the coefficient of x_k
This recursively will list all the solutions that work, using the values in the table we just constructed to skip a huge amount of wasted effort. Once you've built this table, you could divvy this work up by farming out the task to multiple computers, having them each list a subset of the total solutions, and processing them all in parallel.
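Put together, the table and the pruned listing could look roughly like this in Python. Treat it as a sketch under a few assumptions: the names `list_solutions`, `reachable` and `walk` are invented here, all values must be positive integers, and the table is filled with the common unbounded-knapsack shortcut (reusing the current row) instead of looping over every coefficient, which is a swapped-in simplification of the loop shown earlier.

```python
def list_solutions(values, target):
    """Return every coefficient tuple c with sum(c[i] * values[i]) == target."""
    k = len(values)
    # reachable[i][s]: can s be written using only the first i values?
    reachable = [[False] * (target + 1) for _ in range(k + 1)]
    reachable[0][0] = True
    for i in range(1, k + 1):
        v = values[i - 1]
        for s in range(target + 1):
            # Either value i is unused, or it is used at least once more.
            reachable[i][s] = reachable[i - 1][s] or (
                s >= v and reachable[i][s - v]
            )

    solutions = []
    coeffs = [0] * k

    def walk(i, remaining):
        if i == 0:
            if remaining == 0:
                solutions.append(tuple(coeffs))
            return
        v = values[i - 1]
        for c in range(remaining // v + 1):
            # Only descend when the rest can still be completed.
            if reachable[i - 1][remaining - c * v]:
                coeffs[i - 1] = c
                walk(i - 1, remaining - c * v)
        coeffs[i - 1] = 0

    walk(k, target)
    return solutions


print(len(list_solutions([4, 2, 1], 10)))  # 12
```

Because `walk` only recurses into branches the table marks as completable, every recursive call leads to at least one printed solution.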
Hope this helps!
By definition, brute force algorithms are stupid. You'd be much better off with a more clever algorithm (if you have one). A better algorithm will reduce the work that has to be done, hopefully to a degree that you can do it without needing to "scale out" to multiple machines.
Regardless of algorithm, there comes a point when the amount of data or computation power required is so big that you will need to use something like Hadoop. But usually, we are really talking Big Data here. You can already do a lot with a single PC these days.
The algorithm to solve this issue is close to the process we learn for manual long division, or for converting from decimal to another base like octal or hexadecimal, except that those two examples only look for a single canonical solution.
To be sure the recursion ends, it is important to order the data array. To be efficient and limit the number of recursions, it is also important to start with higher data values.
Concretely, here is a Java recursive implementation for this problem - with a copy of the result vector coeff for each recursion as expected in theory.
import java.util.Arrays;

public class Solver {

    public static void main(String[] args) {
        int target_sum = 100;
        // pre-requisite: sorted values !!
        int[] data = new int[] { 5, 10, 20, 25, 40, 50 };
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);
        partialSum(data.length - 1, target_sum, coeff, data);
    }

    private static void printResult(int[] coeff, int[] data) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + " ");
            }
        }
        System.out.println(); // one solution per line
    }

    private static void partialSum(int k, int sum, int[] coeff, int[] data) {
        int x_k = data[k];
        for (int c = sum / x_k; c >= 0; c--) {
            coeff[k] = c;
            if (c * x_k == sum) {
                printResult(coeff, data);
            } else if (k > 0) {
                // contextual result in parameters, local to method scope
                int[] newcoeff = Arrays.copyOf(coeff, coeff.length);
                partialSum(k - 1, sum - c * x_k, newcoeff, data);
                // for loop on "c" goes on with previous coeff content
            }
        }
    }
}
But that code now relies on a special case: the last value tested for each coeff is 0, so the copy is not necessary.
As a complexity estimation, we can use the maximum depth of recursive calls as data.length * min({ data }). For sure, it will not scale well, and the limiting factor is the stack memory (-Xss JVM option). The code may fail with a stack overflow error for a large data set.
To avoid these drawbacks, the "derecursion" process is useful. It consists in replacing the method call stack with a programmatic stack that stores an execution context to process later. Here is the code for that:
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class NonRecursive {
    // pre-requisite: sorted values !!
    private static final int[] data = new int[] { 5, 10, 20, 25, 40, 50 };

    // Context to store intermediate computation or a solution
    static class Context {
        int k;
        int sum;
        int[] coeff;

        Context(int k, int sum, int[] coeff) {
            this.k = k;
            this.sum = sum;
            this.coeff = coeff;
        }
    }

    private static void printResult(int[] coeff) {
        for (int i = coeff.length - 1; i >= 0; i--) {
            if (coeff[i] > 0) {
                System.out.print(data[i] + " * " + coeff[i] + " ");
            }
        }
        System.out.println(); // one solution per line
    }

    public static void main(String[] args) {
        int target_sum = 100;
        // result vector, init to 0
        int[] coeff = new int[data.length];
        Arrays.fill(coeff, 0);
        // queue with contexts to process
        Queue<Context> contexts = new ArrayDeque<Context>();
        // initial context
        contexts.add(new Context(data.length - 1, target_sum, coeff));
        while (!contexts.isEmpty()) {
            Context current = contexts.poll();
            int x_k = data[current.k];
            for (int c = current.sum / x_k; c >= 0; c--) {
                current.coeff[current.k] = c;
                int[] newcoeff = Arrays.copyOf(current.coeff, current.coeff.length);
                if (c * x_k == current.sum) {
                    // complete solution: nothing left to distribute
                    printResult(newcoeff);
                } else if (current.k > 0) {
                    contexts.add(new Context(current.k - 1, current.sum - c * x_k, newcoeff));
                }
            }
        }
    }
}
From my point of view, it is difficult to be more efficient in a single thread execution - the stack mechanism now requires coeff array copies.
