I need to solve the following problem: given an integer sequence x of size N, and a subset size k, find all the possible subset sums. A subset sum is the sum of elements in the subset.
If elements in x are allowed to appear many times (up to k of course) in a subset (sub-multiset), this problem has a pseudo polynomial time solution via FFT. Here is an example:
x = [0, 1, 2, 3, 6]
k = 4
xFrequency = [1, 1, 1, 1, 0, 0, 1] # On the support of [0, 1, 2, 3, 4, 5, 6]
sumFrequency = selfConvolve(xFrequency, times = 4) # A fast approach is to simply raise the power of the Fourier series.
sumFrequency > 0 # Gives a boolean vector indicating all possible size-k subset sums.
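The with-repetition example above can be sketched in plain Python. This is only an illustration: it assumes non-negative integers, and it uses a direct convolution where a real implementation would use an FFT. The names convolve and sums_with_repetition are made up for this sketch.

```python
def convolve(a, b):
    """Direct O(len(a) * len(b)) convolution; an FFT would replace this step."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
    return out


def sums_with_repetition(x, k):
    """Sums of k elements of x when an element may be reused (non-negative ints)."""
    # Frequency vector on the support 0..max(x), as in xFrequency above.
    freq = [0] * (max(x) + 1)
    for value in x:
        freq[value] += 1
    # Convolving the frequency vector with itself k times.
    result = [1]
    for _ in range(k):
        result = convolve(result, freq)
    return [s for s, count in enumerate(result) if count > 0]


print(sums_with_repetition([0, 1, 2, 3, 6], 4))
```

For x = [0, 1, 2, 3, 6] and k = 4 this reports every sum from 0 to 21 plus 24; 22 and 23 cannot be formed.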
But what can be done if an element cannot show up multiple times in a subset?
I came up with the following method but am unsure of its correctness. The idea is to first find the frequencies of sums that are produced by adding at least 2 identical elements:
y = [0, 2, 4, 6, 12] # = [0, 1, 2, 3, 6] + [0, 1, 2, 3, 6]
yFrequency = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
sumFrequencyWithRedundancy = convolve(yFrequency, xFrequency, xFrequency)
My reasoning is that since y represents all possible sums of 2 identical elements, then every sum in y + x + x is guaranteed to have been produced by adding at least 2 identical elements. Finally
sumFrequencyNoRedundancy = sumFrequency - sumFrequencyWithRedundancy
sumFrequencyNoRedundancy > 0
Any mistake or any other established method for solving the problem?
Thanks!
Edits:
After some tests, it does not work. There turn out to be many more combinations that should be excluded from sumFrequency besides sumFrequencyWithRedundancy, and the combinatorial analysis seems to escalate rapidly with k, eventually making it less efficient than brute-force summation.
My motivation was to find all possible sample sums given sampling without replacement and the sample size. Then I came across the idea of solving the standard subset sum problem via FFT, where the subset size is free and the qualifying subsets themselves are not needed. The reference materials can easily be found online; it is basically a divide-and-conquer approach:
Divide the superset into 2 sets, left and right.
Compute all possible subset sums in the left and right sets. The sums are represented by 2 boolean vectors.
Convolve the 2 boolean vectors.
Find if the target sum is indicated in the final boolean vector.
You can see why the algorithm works for the standard subset sum problem.
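A minimal sketch of the divide-and-conquer steps above, assuming non-negative integers. The boolean convolution is written out directly here; the FFT would only replace convolve_bool. Both function names are hypothetical.

```python
def convolve_bool(a, b):
    """out[s] is True when s = i + j for some True a[i] and True b[j]."""
    out = [False] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    out[i + j] = True
    return out


def subset_sum_vector(items):
    """Boolean vector whose index s is True when some subset of items sums to s."""
    if not items:
        return [True]                      # only the empty sum 0
    if len(items) == 1:
        vec = [False] * (items[0] + 1)
        vec[0] = True                      # leave the item out
        vec[items[0]] = True               # take the item
        return vec
    mid = len(items) // 2                  # split into left and right halves
    return convolve_bool(subset_sum_vector(items[:mid]),
                         subset_sum_vector(items[mid:]))


vector = subset_sum_vector([1, 2, 6])
print([s for s, ok in enumerate(vector) if ok])
```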
If anyone can let me know some work on how to find all possible size-k subset sums, I would really appreciate it!
Given k and the n-element array x, it suffices to evaluate the degree-k coefficient in z of the polynomial
$$\prod_{i=1}^{n} \left(1 + y^{x[i]} z\right).$$
This coefficient is a polynomial in y where the exponents with nonzero coefficients indicate the sums that can be formed using exactly k distinct terms.
One strategy is to split x with reasonably balanced sums, evaluate each half mod z^(k+1), and then multiply using the school algorithm for the outer multiplications and FFT (or whatever) for the inner. This should end up costing roughly O(k^2 S log^2 S).
The idea for evaluating elementary symmetric polynomials efficiently is due to Ben-Or.
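As a rough illustration of the product above, here is a sketch that keeps only whether each coefficient is nonzero. It assumes non-negative integers and stores, for each power z^j, the set of reachable exponents of y as the bits of a Python int. It multiplies in one factor at a time rather than using the balanced split with FFT, so it is not the O(k^2 S log^2 S) strategy; the function name is made up.

```python
def size_k_subset_sums(x, k):
    """Sums reachable with exactly k distinct positions of x (non-negative ints)."""
    # dp[j] is a bitset: bit s is set when some j distinct elements sum to s.
    # It plays the role of the z^j coefficient, truncated mod z^(k+1).
    dp = [0] * (k + 1)
    dp[0] = 1                              # the empty subset has sum 0
    for value in x:
        # Multiply by (1 + y^value * z); descending j uses each element once.
        for j in range(k, 0, -1):
            dp[j] |= dp[j - 1] << value
    return [s for s in range(dp[k].bit_length()) if (dp[k] >> s) & 1]


print(size_k_subset_sums([0, 1, 2, 3, 6], 4))
```

For x = [0, 1, 2, 3, 6] and k = 4 the five subsets give the sums 6, 9, 10, 11 and 12.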
We have N triples of pairs, like
1. { (4; 0.1), (5; 0.3), (7; 0.6) }
2. { (7; 0.2), (8 ; 0.4), (1 ; 0.4) }
...
N. { (6; 0.3), (1; 0.2), (9 ; 0.5) }
and need to choose exactly one pair from each triple so that the sum of the first members of the chosen pairs is minimal, subject to the condition that the sum of the second members is not less than a given number P.
We can solve this by sorting all possible combinations of pairs by the sum of their first members (3^N combinations) and choosing the first one in that sorted list that also satisfies the second condition.
Could you please suggest a better, non-trivial solution for this problem?
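For reference, the brute-force baseline from the question can be sketched as follows. It tracks the minimum directly instead of sorting all 3^N combinations, and it assumes the second members have already been scaled to integers (as a floating-point sum compared against P could be off by rounding). The name brute_force and the A/B layout are assumptions made for this sketch.

```python
from itertools import product


def brute_force(A, B, P):
    """Try all 3^N choices; A[i][j], B[i][j] are the members of pair j in triple i."""
    best = None
    for choice in product(range(3), repeat=len(A)):
        second_sum = sum(B[i][j] for i, j in enumerate(choice))
        if second_sum >= P:
            first_sum = sum(A[i][j] for i, j in enumerate(choice))
            if best is None or first_sum < best:
                best = first_sum
    return best                            # None when no choice reaches P


A = [[4, 5, 7], [7, 8, 1], [6, 1, 9]]
B = [[1, 3, 6], [2, 4, 4], [3, 2, 5]]      # second members multiplied by 10
print(brute_force(A, B, 8))
```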
If there are no constraints on the values inside your triplets, then we are facing a fairly general integer programming problem, more specifically a 0-1 linear programming problem, since it can be written as a system of linear constraints over variables that are each 0 or 1. You can find possible approaches on the Wikipedia page, but there is no fast and easy solution for this problem in general.
Alternatively, if the second numbers of each pair (the ones that need to sum to >= P) come from a small enough range, we can view this as a dynamic programming problem similar to the knapsack problem. "Small enough" is a bit hard to define here because the original data contains non-integer numbers. If they were integers, the complexity of the solution I will describe would be O(P * N). Non-integer numbers first need to be converted to integers by multiplying all of them, as well as P, by a large enough factor. In your example, each number has one digit after the decimal point, so multiplying by 10 is enough. Hence, the actual complexity is O(M * P * N), where M is the factor everything was multiplied by to obtain integers.
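The scaling step might look like the sketch below. It assumes every value really has no more decimal digits than the factor M covers, so that rounding only removes floating-point noise; scale_to_integers is a hypothetical helper name.

```python
def scale_to_integers(B, P, M):
    """Multiply every second member and P by M and round to the nearest integer."""
    scaled_B = [[round(value * M) for value in row] for row in B]
    scaled_P = round(P * M)
    return scaled_B, scaled_P


B = [[0.1, 0.3, 0.6], [0.2, 0.4, 0.4], [0.3, 0.2, 0.5]]
print(scale_to_integers(B, 0.7, 10))
```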
After this, we are essentially solving a modified Knapsack problem: instead of constraining the weight from above, we are constraining it from below, and on each step we are choosing a pair from a triplet, as opposed to deciding whether to put an item into the knapsack or not.
Let's define a function minimum_sum[i][s] which at values i, s represents the minimum possible sum (of first numbers in each pair we took) we can achieve if the sum of the second numbers in pairs taken so far is equal to s and we already considered the first i triplets. One exception to this definition is that minimum_sum[i][P] has the minimum for all sums exceeding P as well. If we can compute all values of this function, then minimum_sum[N][P] is the answer. The function values can be computed with something like this:
minimum_sum[0][0] = 0, all other values are set to infinity
for i = 0..N-1:
    for s = 0..P:
        for j = 0..2:
            t = min(P, s + B[i][j])
            minimum_sum[i+1][t] = min(minimum_sum[i+1][t], minimum_sum[i][s] + A[i][j])
A[i][j] here denotes the first number in the i-th triplet's j-th pair, and B[i][j] denotes the second number of the same pair.
This solution is viable if N is large, but P is small and precision on Bs isn't too high. For instance, if N=50, there is little hope to compute 3^N possibilities, but with M*P=1000000 this approach would work extremely fast.
Python implementation of the idea above:
def compute(A, B, P):
    n = len(A)
    # note that I use 1,000,000 as "infinity" here, which might need to be increased depending on input data
    best = [[1000000 for i in range(P + 1)] for j in range(n + 1)]
    best[0][0] = 0
    for i in range(n):
        for s in range(P + 1):
            for j in range(3):
                best[i+1][min(P, s+B[i][j])] = min(best[i+1][min(P, s+B[i][j])], best[i][s]+A[i][j])
    return best[n][P]
Testing:
A=[[4, 5, 7], [7, 8, 1], [6, 1, 9]]
# second numbers in each pair after scaling them up to be integers
B=[[1, 3, 6], [2, 4, 4], [3, 2, 5]]
In [7]: compute(A, B, 0)
Out[7]: 6
In [14]: compute(A, B, 7)
Out[14]: 6
In [15]: compute(A, B, 8)
Out[15]: 7
In [20]: compute(A, B, 13)
Out[20]: 14
I'm looking for an algorithm that can take in a set of natural numbers, for example:
S = {1, 3, 4, 2, 9, 34, 432, 43}
Then divide them into piles that are as equal as possible. The number of piles is predefined as n.
The goal is to minimize the sum of the differences between each pile and the lowest pile.
Here comes an example.
Let's say you have:
S = { 1, 2, 2, 3, 1, 2, 3 }
n = 3
Then a solution could be
N1 = { 1, 2 }
N2 = { 2, 3 }
N3 = { 1, 2, 3 }
The sum of these piles would be 3, 5 and 6. The error would be: (5 - 3) + (6 - 3) = 5.
The algorithm needs to find the solution with the lowest error.
Any help is appreciated. Please comment if something is unclear.
I would argue that there is no known efficient way to solve this problem, because it is NP-hard.
Proof:
Let's denote the problem you proposed as P*.
We can reduce the partition problem (known to be NP-hard) to P* as follows.
Given an arbitrary partition problem P1, we ask the black box that solves P* to solve P1 with n=2 (i.e., divide the set into 2 piles that minimize the difference).
If the difference returned by the black box is zero, there is a solution for P1.
If the difference returned by the black box is non-zero, there is no solution for P1.
Therefore, P* is NP-hard.
This sounds like a variation of the bin packing problem (https://en.m.wikipedia.org/wiki/Bin_packing_problem). However, the size of the bins is not given, so it is at least as hard as bin packing. Thus the problem is NP-hard.
For an approximate solution you could for example calculate the average bin size and perform an adaptation of first-fit or best-fit in order to allow small overpacking.
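As one possible approximate approach, here is a sketch of a simpler greedy than the first-fit adaptation described above: sort the numbers in decreasing order and always add the next number to the pile with the smallest current sum. It gives no optimality guarantee, and the function names are made up for illustration.

```python
def greedy_piles(numbers, n):
    """Put each number, largest first, onto the currently lightest of n piles."""
    piles = [[] for _ in range(n)]
    loads = [0] * n
    for value in sorted(numbers, reverse=True):
        lightest = min(range(n), key=loads.__getitem__)
        piles[lightest].append(value)
        loads[lightest] += value
    return piles


def pile_error(piles):
    """Sum of the differences between each pile's sum and the lowest pile's sum."""
    sums = [sum(pile) for pile in piles]
    return sum(s - min(sums) for s in sums)


piles = greedy_piles([1, 2, 2, 3, 1, 2, 3], 3)
print(piles, pile_error(piles))
```

On the example from the question this yields piles with sums 5, 5 and 4, i.e. an error of 2.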
Suppose I have a multiset of 10 digits, for example S = { 1, 1, 2, 2, 2, 3, 3, 3, 8, 9 }. Is there any method other than brute force to find the number of distinct permutations of the elements of S such that when a permutation is regarded as a ten digit integer, it is divisible by a particular number n ? n will be in the range 1 to 10000.
For example:
if S = { 1, 2, 3, 4, 6, 1, 2, 3, 4, 6 } and n = 10, the result is 0 (since no permutation of those 10 digits will ever give a number divisible by 10)
if S = { 1, 1, 3, 3, 5, 5, 7, 7, 9, 2} and n = 2, the result is 9! / 2^4 (since we must have the 2 at the end, there are 9! ways to permute the other elements, but there are four pairs of identical elements)
You could prune the search like so: find the prime factorization of n. To be divisible by n, a permutation needs to be divisible by all of n's prime factors. Hence you can use simple divisibility rules to avoid generating many invalid candidates.
I have some thoughts but it's not organized into an actual algorithm.
For n=2, we simply see how many even digits we can put at the end of our permutations and calculate the number that way.
For n=3, we know the sum of the digits has to be divisible by 3. The digit sum is the same for every permutation of S, so either all distinct permutations are divisible by 3 or none are.
For n=4, we can do something similar to n=2, looking at the last two digits.
I think we can come up with cases like this for up to n=10 (n=7 might be tricky). Then, we might be able to handle any n > 10 by factoring it. For example, if n=18, all permutations that are divisible by n are also divisible by 2 and 9. Of course, if n is a prime number we might be in trouble.
My idea: sort the digits of S in increasing and in decreasing order. Now you have the min and max numbers that can be generated from S. Now take all multiples of n in the interval [min, max] and see which of them are formed by exactly the digits in S.
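The min/max idea can be sketched like this. It is only practical when (max - min) / n is small enough to loop over, and it treats permutations with leading zeros as valid numbers, which is an assumption of this sketch; the function name is hypothetical.

```python
def count_divisible_permutations(digits, n):
    """Count distinct permutations of digits that form a multiple of n."""
    target = sorted(digits)
    length = len(digits)
    lowest = int("".join(map(str, target)))             # digits ascending
    highest = int("".join(map(str, reversed(target))))  # digits descending
    first = ((lowest + n - 1) // n) * n                 # first multiple >= lowest
    count = 0
    for multiple in range(first, highest + 1, n):
        # Pad with zeros so permutations starting with 0 are compared fairly.
        if sorted(int(ch) for ch in str(multiple).zfill(length)) == target:
            count += 1
    return count


print(count_divisible_permutations([1, 1, 2], 2))
```

Each multiple is a distinct integer, so identical digits are never counted twice.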
I have a number n, and I want to find three numbers whose product is n but are as close to each other as possible. That is, if n = 12 then I'd like to get 3, 2, 2 as a result, as opposed to 6, 1, 2.
Another way to think of it is that if n is the volume of a cuboid then I want to find the lengths of the sides so as to make the cuboid as much like a cube as possible (that is, the lengths as similar as possible). These numbers must be integers.
I know there is unlikely to be a perfect solution to this, and I'm happy to use something which gives a good answer most of the time, but I just can't think where to go with coming up with this algorithm. Any ideas?
Here's my first algorithm sketch, granted that n is relatively small:
Compute the prime factors of n.
Pick out the three largest and assign them to f1, f2, f3. If there are fewer than three factors, assign 1 to the rest.
Loop over remaining factors in decreasing order, multiply them into the currently smallest partition.
Edit
Let's take n=60.
Its prime factors are 5 3 2 2.
Set f1=5, f2=3 and f3=2.
The remaining 2 is multiplied to f3, because it is the smallest.
We end up with 5 * 4 * 3 = 60.
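A small Python sketch of this first heuristic, using trial division for the prime factors (fine for small n). The helper names are made up for illustration.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def greedy_three_factors(n):
    """Seed with the three largest primes, then feed the rest to the smallest part."""
    primes = sorted(prime_factors(n), reverse=True)
    parts = primes[:3] + [1] * (3 - len(primes[:3]))
    for p in primes[3:]:
        smallest = parts.index(min(parts))
        parts[smallest] *= p
    return sorted(parts)


print(greedy_three_factors(60))
```

For n = 60 this gives 3, 4, 5, matching the walk-through above.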
Edit
This algorithm will not always find the optimum; note btilly's comment:
Consider 17550 = 2 * 3 * 3 * 3 * 5 * 5 * 13. Your algorithm would give 15, 30, 39 when the best is 25, 26, 27.
Edit
Ok, here's my second algorithm sketch with a slightly better heuristic:
Set the list L to the prime factors of n.
Set r to the cube root of n.
Create the set of three factors F, initially set to 1.
Iterate over the prime factors in descending order:
Try to multiply the current factor L[i] with each of the factors in descending order.
If the result is less than r, perform the multiplication and move on to the next prime factor.
If not, try the next F. If out of Fs, multiply with the smallest one.
This will work for the case of 17550:
n=17550
L=13,5,5,3,3,3,2
r=25.98
F = { 1, 1, 1 }
Iteration 1:
F[0] * 13 is less than r, set F to {13,1,1}.
Iteration 2:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 5 is less than r, set F to {13,5,1}.
Iteration 3:
F[0] * 5 = 65 is greater than r.
F[1] * 5 = 25 is less than r, set F to {13,25,1}.
Iteration 4:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 3 is less than r, set F to {13,25,3}.
Iteration 5:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 9 is less than r, set F to {13,25,9}.
Iteration 6:
F[0] * 3 = 39 is greater than r.
F[1] * 3 = 75 is greater than r.
F[2] * 3 = 27 is greater than r, but F[2] is the smallest factor, so we multiply it anyway. Set F to {13,25,27}.
Iteration 7:
F[0] * 2 = 26 is greater than r, and so are F[1] * 2 and F[2] * 2, but F[0] is the smallest factor, so we multiply it anyway. Set F to {26,25,27}.
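The second heuristic, as traced above, could be written roughly like this. It is a sketch with made-up helper names, and it relies on a floating-point cube root, so results for n at or very near a perfect cube may depend on rounding.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def cube_root_heuristic(n):
    """Place each prime, largest first, in the first factor that stays below r."""
    r = n ** (1.0 / 3.0)
    F = [1, 1, 1]
    for p in sorted(prime_factors(n), reverse=True):
        for i in range(3):
            if F[i] * p < r:
                F[i] *= p
                break
        else:
            # No factor stays below r: multiply the smallest one instead.
            F[F.index(min(F))] *= p
    return F


print(cube_root_heuristic(17550))
```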
Here's a purely math based approach, that returns the optimal solution and does not involve any kind of sorting. Hell, it doesn't even need the prime factors.
Background:
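1) Recall that for a monic polynomial
$$p(x) = x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$$
the sum and product of the roots are given by
$$\sum_{i=1}^{n} x_i = -a_{n-1}, \qquad \prod_{i=1}^{n} x_i = (-1)^n a_0,$$
where x_i are the roots.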
2) Recall another elementary result from optimization theory:
$$(\tilde{x}, \tilde{y}) = \arg\min_{x, y > 0,\ xy = k} (x + y) \implies \tilde{x} = \tilde{y},$$
i.e., given two positive variables whose product is a constant, the sum is minimum when the two variables are equal to each other. The tilde variables denote the optimal values.
A corollary of this would be that if the sum of two variables whose product is constant, is a minimum, then the two variables are equal to each other.
Reformulate the original problem:
Your question above can now be reformulated as a polynomial root-finding exercise. We'll construct a polynomial that satisfies your conditions, and the roots of that polynomial will be your answer. If you need k numbers that are optimal, you'll have a polynomial of degree k. In this case, we can talk in terms of a cubic equation
$$p(x) = x^3 + a x^2 + b x + c = 0.$$
We know that:
1. c is the negative of the input number (assumed positive).
2. a is an integer and negative (since the factors are all positive).
3. b is an integer (the sum of the products of the roots taken two at a time) and is positive.
4. The roots of p must be real (and positive, but that has already been addressed).
To solve the problem, we simply need to maximize a subject to the above set of conditions. The only part not explicitly known right now, is condition 4, which we can easily enforce using the discriminant of the polynomial.
For a cubic polynomial p(x) = x^3 + a x^2 + b x + c, the discriminant is
$$\Delta = 18abc - 4a^3 c + a^2 b^2 - 4b^3 - 27c^2,$$
and p has real and distinct roots if ∆>0 and real and coincident roots (either two or all three) if ∆=0. So, constraint 4 now reads ∆>=0. This is now simple and easy to program.
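To make the discriminant condition concrete, here is a small Python check. It uses the standard discriminant formula for the monic cubic x^3 + a x^2 + b x + c and builds a, b, c from known roots via Vieta's formulas; both function names are made up for this sketch.

```python
def cubic_discriminant(a, b, c):
    """Discriminant of x^3 + a*x^2 + b*x + c."""
    return 18*a*b*c - 4*a**3*c + a**2*b**2 - 4*b**3 - 27*c**2


def coefficients_from_roots(r1, r2, r3):
    """Vieta's formulas for the monic cubic with roots r1, r2, r3."""
    a = -(r1 + r2 + r3)
    b = r1*r2 + r1*r3 + r2*r3
    c = -(r1 * r2 * r3)
    return a, b, c


a, b, c = coefficients_from_roots(25, 26, 27)   # the best triplet for 17550
print(a, b, c, cubic_discriminant(a, b, c))
```

For the roots 25, 26, 27 the coefficients satisfy all four conditions (c = -17550, a < 0, b > 0, ∆ = 4 >= 0).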
Solution in Mathematica
Here's a solution in Mathematica that implements this.
And here's a test on some of the numbers used in other answers/comments.
The column on the left is the list and the corresponding row in the column on the right gives the optimal solution.
NOTE:
I just noticed that OP never mentioned that the 3 numbers needed to be integers although everyone (including myself until now) assumed that they were (probably because of his first example). Re-reading the question, and going by the cube example, it doesn't seem like OP was fixated on integers.
This is an important point which will decide which class of algorithms to pursue and needs to be defined. If they need not be integers, there are several polynomial based solutions that can be provided, one of which is mine (after relaxing the integer constraint). If they should be integers, then perhaps an approach using branch-n-bound/branch-n-cut/cutting plane might be more appropriate.
The following was written assuming the OP meant the three numbers to be integers.
The way I've implemented it right now, it can give a non-integer solution in certain cases.
The reason this gives non-integer solutions for x is because I had only maximized a, when actually, b also needs to be minimum (not only that, but also because I haven't placed a constraint on the x_i being integers. It is possible to use the integer root theorem, which would involve finding the prime factors, but makes things more complicated.)
Mathematica code in text
Clear[poly, disc, f]
poly = x^3 + a x^2 + b x + c;
disc = Discriminant[poly, x];
f[n_Integer] :=
 Module[{p, \[CapitalDelta] = disc /. c -> -n},
  p = poly /.
     Maximize[{a, \[CapitalDelta] >= 0,
        b > 0 && a < 0 && {a, b} \[Element] Integers}, {a, b}][[2]] /. c -> -n;
  Solve[p == 0]
 ]
There may be a clever way to find the tightest triplet, as Anders Lindahl is pursuing, but I will focus on a more basic approach.
If I generate all triplets, then I can filter them afterward however I want, so I will start there. The best way I know to generate these uses recursion:
f[n_, 1] := {{n}}
f[n_, k_] := Join @@
  Table[
    {q, ##} & @@@ Select[f[n/q, k - 1], #[[1]] >= q &],
    {q, #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n}
  ]
This function f takes two integer arguments, the number to factor n, and the number of factors to produce k.
The section #[[2 ;; ⌈ Length@#/k ⌉ ]] & @ Divisors @ n uses Divisors to produce a list of all divisors of n (including 1), and then takes from these from the second (to drop the 1) to the Ceiling of the number of divisors divided by k.
For example, for {n = 240, k = 3} the output is {2, 3, 4, 5, 6, 8}
The Table command iterates over this list while accumulating results, assigning each element to q.
The body of the Table is Select[f[n/q, k - 1], #[[1]] >= q &]. This calls f recursively, and then selects from the result all lists that begin with a number >= q.
{q, ##} & @@@ (also in the body) then "prepends" q to each of these selected lists.
Finally, Join @@ merges the lists of these selected lists that are produced by each loop of Table.
The result is all factorizations of n into k integer factors, in lexicographical order. Example:
In[]:= f[240, 3]
Out[]= {{2, 2, 60}, {2, 3, 40}, {2, 4, 30}, {2, 5, 24}, {2, 6, 20},
{2, 8, 15}, {2, 10, 12}, {3, 4, 20}, {3, 5, 16}, {3, 8, 10},
{4, 4, 15}, {4, 5, 12}, {4, 6, 10}, {5, 6, 8}}
With the output of the function/algorithm given above, one can then test triplets for quality however desired.
Notice that because of the ordering the last triplet in the output is the one with the greatest minimum factor. This will usually be the most "cubic" of the results, but occasionally it is not.
If the true optimum must be found, it makes sense to test starting from the right side of the list, abandoning the search if a better result is not found quickly, as the quality of the results decreases as you move left.
Obviously this method relies upon a fast Divisors function, but I presume that this is either a standard library function, or you can find a good implementation here on StackOverflow. With that in place, this should be quite fast. The code above finds all triplets for n from 1 to 10,000 in 1.26 seconds on my machine.
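For readers without Mathematica, a rough Python analogue of the recursion is sketched below. It is not a line-by-line port: instead of slicing the divisor list it tries every q with q^k <= n, which produces the same non-decreasing factorizations when started with smallest=2. The function name and the smallest parameter are assumptions of this sketch.

```python
def factorizations(n, k, smallest=2):
    """All ways to write n as k non-decreasing integer factors >= smallest."""
    if k == 1:
        return [[n]] if n >= smallest else []
    result = []
    q = smallest
    while q ** k <= n:                     # the first factor cannot exceed n^(1/k)
        if n % q == 0:
            for rest in factorizations(n // q, k - 1, q):
                result.append([q] + rest)
        q += 1
    return result


triplets = factorizations(240, 3)
print(len(triplets), triplets[-1])
```

As in the output above, the last entry is the one with the greatest minimum factor.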
Instead of reinventing the wheel, one should recognize this as a variation of a well known NP-complete problem.
Compute the prime factors of n.
Compute the logarithms of these factors
The problem translates as partitioning these logs into three sums that are as close as possible.
This problem is known as a variation of the Bin Packing problem, known as Multiprocessor scheduling
Given the fact that the Multiprocessor scheduling problem is NP-complete, it's no wonder that it's hard to find an algorithm that does not search the whole problem space and finds the optimum solution.
But I guess there are already several algorithms that deal with either bin packing or multiprocessor scheduling and find near-optimal solutions efficiently.
Another related problem (a generalization) is job shop scheduling. See the Wikipedia description with many links to known algorithms.
What Wikipedia describes as the often-used LPT algorithm (Longest Processing Time) is exactly what Anders Lindahl came up with first.
EDIT
Here's a shorter explanation using more efficient code, KSetPartitions simplifies things considerably. So did some suggestions from Mr.W. The overall logic remains the same.
Assuming there are at least 3 prime factors of n,
Find the list of triplet KSetPartitions for the prime factors of n.
Multiply each of the elements (prime factors) within each subset to produce all possible combinations for three divisors of n (when multiplied they yield n). You can think of the divisors as the length, width and height of an orthogonal parallelepiped.
The parallelepiped closest to a cube will have the shortest space diagonal. Sum the squares of the three divisors for each case and pick the smallest.
Here's the code in Mathematica:
Needs["Combinatorica`"]
g[n_] := Module[{factors = Join @@ ConstantArray @@@ FactorInteger[n]},
  Sort[Union[Sort /@ Apply[Times, Union[Sort /@
          KSetPartitions[factors, 3]], {2}]]
     /. {a_Integer, b_Integer, c_Integer} :>
       {Total[Power[{a, b, c}, 2]], {a, b, c}}][[1, 2]]]
It can handle fairly large numbers, but slows down considerably as the number of factors of n grows. The examples below show timings for 240, 2400, ...24000000.
This could be sped up in principle by taking into account cases where a prime factor appears more than once in a divisor. I don't have the know-how to do it yet.
In[28]:= g[240]
Out[28]= {5, 6, 8}
In[27]:= t = Table[Timing[g[24*10^n]][[1]], {n, 6}]
Out[27]= {0.001868, 0.012734, 0.102968, 1.02469, 10.4816, 105.444}
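The same idea can be sketched in Python without Combinatorica by assigning each prime factor to one of three groups and keeping the triplet with the smallest sum of squares. Unlike KSetPartitions, this sketch also allows empty groups (a factor of 1), and it is exponential in the number of prime factors; the function names are made up.

```python
from itertools import product


def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def most_cubic_triplet(n):
    """Try every assignment of prime factors to three groups (3^m cases)."""
    primes = prime_factors(n)
    best_key, best_parts = None, None
    for assignment in product(range(3), repeat=len(primes)):
        parts = [1, 1, 1]
        for p, group in zip(primes, assignment):
            parts[group] *= p
        key = sum(v * v for v in parts)    # squared length of the space diagonal
        if best_key is None or key < best_key:
            best_key, best_parts = key, sorted(parts)
    return best_parts


print(most_cubic_triplet(240))
```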