Algorithm to approximate an optimal solution for an integer allocation problem

I have the following problem:
Given a set of sums of variables like { a + b, b + c, c + d, a + a + d, b }, find positive integer values for the variables such that all sums are distinct and the highest sum is as small as possible.
Is there an algorithm to find or approximate a solution to this kind of problem?

I have created a possible solution and an implementation in C#. I hope it is what you need. It would be nice if someone proved it correct or incorrect, but it works and the results look correct. The details of the theory are below. Its complexity is roughly O(N! * M^3 * log2(N)), where N is the number of variables and M is the total number of summands across all sums.
BTW, for your example it gives this result:
c=3, a=2, d=2, b=1
{a+b=3; b+c=4; c+d=5; a+a+d=6; b=1}
max=6
UPDATE
Theory of the algorithm.
Assume the variables are ordered, e.g. a >= b >= c >= ....
Let's say a set of sums is a Bag if all sums in it are distinct.
All sums in a Bag can be divided into two groups: sums that do not contain variable a and sums that do. Let's call the first group the Head and the second the Tail.
Note that both are Bags because they contain distinct sums.
We can subtract a from each sum in the Tail so that all sums remain distinct (i.e. the Tail is still a Bag). This way we get two Bags, both without variable a.
In the same way we exclude variable b from the two Bags and get four Bags.
We repeat this operation for each variable until we get sums containing only the last variable (let's say it is d). The smallest value of d is 1.
After that we can return to the previous step and include variable c in the sums from the Tails. Remember that we have many Head-Tail pairs and need to join them back. To do that we add c to each sum in each Tail so that the sums from the Tail become distinct from those in the Head.
How do we calculate c? The idea is to calculate its invalid values and then take the smallest value that is not invalid and is also greater than or equal to d. Calculating the invalid values is trivial, using the condition HeadSum != TailSum + c => c != HeadSum - TailSum. Taking each combination of a head sum and a tail sum gives all the invalid values.
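A minimal Python sketch of just this step (the function name and the example numbers are mine, not taken from my C# implementation): given the Head sums, the Tail sums with the variable already removed, and the lower bound carried up from the previous level, collect the forbidden differences and take the smallest allowed value.

def smallest_valid_value(head_sums, tail_sums, lower_bound):
    # smallest c >= lower_bound with tail + c != head for every head/tail pair
    forbidden = {h - t for h in head_sums for t in tail_sums}
    c = lower_bound
    while c in forbidden:
        c += 1
    return c

# e.g. Head sums {1, 4}, Tail sums {0, 2}, lower bound 1:
# forbidden differences are {1, -1, 4, 2}, so the smallest valid value is 3
print(smallest_valid_value({1, 4}, {0, 2}, 1))  # 3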
Collapsing all the Head-Tail pairs back and calculating each variable, we get the solution for the case a >= b >= c >= d.
Then we calculate a solution for each permutation of a >= b >= c >= d and find the smallest one.
PS It would be great if someone provided a better solution, because I think my algorithm is only an approximation (though a good one), or, even better, proved it correct.
The worst part is the N! factor from the permutations, and I still have no idea how to get rid of it.

Related

Karp reduction from PARTITION to SUBSET SUM

PARTITION: Given a set of positive integers A={a_1,...,a_n}, does there exist a subset of A with sum equal to the sum of its complement?
SUBSET SUM: Given a set of positive integers A={a_1,...,a_n} and another positive integer B, does there exist a subset of A whose sum is equal to B?
I was trying to prove that if PARTITION is NP-complete then SUBSET SUM is also NP-complete, by reducing PART to SSUM.
My solution was: let A={a_1,...,a_n} be a set of positive integers. Then if A, when fed into PART, gives the solution I={k_1,...,k_m} (where the k_i are the indices of the members of the solution subset), we construct A'={a_1,...,a_n,S}, where S is the sum of {a_k1, a_k2, ..., a_km}. A' is a solution to SSUM.
My problem with this is that it only goes one way, meaning that we can't show that, given A', A is a solution to PART. Is this a problem? And how could I modify the proof to cover it?
Partition to SubsetSum is actually easier than what you've done here.
If Partition is satisfied, that means there are subsets P1 and P2 (partitioning A) such that sum(P1) = sum(P2), correct? Because sum(P1) + sum(P2) = sum(A), this means that sum(P1) = sum(P2) = (1/2)sum(A).
We don't even need to construct an A' for subset sum. Just set A' = A and the target sum = (1/2)sum(A). It should be clear that this is the exact same problem as Partition with almost no abstraction.
In other words, Partition is always just Subset Sum with target sum = (1/2)sum(A).
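A minimal sketch of this reduction in Python (the brute-force subset_sum here is just a stand-in for whatever SUBSET SUM solver or oracle you have):

from itertools import combinations

def subset_sum(numbers, target):
    # stand-in SUBSET SUM oracle: is there a subset summing to target?
    return any(sum(c) == target
               for r in range(len(numbers) + 1)
               for c in combinations(numbers, r))

def partition(numbers):
    # PARTITION via SUBSET SUM with target sum = (1/2)sum(A)
    total = sum(numbers)
    if total % 2:  # an odd total can never be split into two equal halves
        return False
    return subset_sum(numbers, total // 2)

print(partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}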

Select K unique random numbers from range with sum equal to S

I have a range
R = {0, ..., N}
and I would like to get K elements which have a sum equal to S, but the elements should be selected randomly.
So an easy brute-force method would be to determine all combinations of K numbers summing to S and to pick one of the combinations at random.
I am trying to think of a recursive solution where a random number K0 is selected and then the problem reduces to finding (K-1) random numbers with sum equal to (S - K0), but this need not yield a solution.
Is there a better approach?
A sample would be:
R = {0,1,2,3,4,5}, S = 5, K = 2
Solutions: randomly pick one of {{1,4}; {2,3}; {0,5}}
In general, if K is big (and then N is too) and S is not too small, this is hard to predict, because there are too many combinations.
Brute force: try every combination. You are sure to find a solution if one exists, but if there are more than, say, a billion of them, it is almost impossible to list them all.
Your algorithm:
To choose at random, your algorithm is OK: take one number at random, then another, ...
But you make an assumption: that there exists a solution with the numbers you pick, and you don't know that.
So what? If statistically there exist many solutions, you could find one like that, perhaps, or perhaps not.
Some leads:
1. Use S/K.
If every number is < S/K, it is impossible.
If every number is > S/K, it is impossible.
So let's assume that there are numbers < S/K and others > S/K.
2. Keep only numbers < S; this is very helpful if S is small.
3. Idea: if S is big and the numbers are small, there is a good chance that many combinations exist.
Idea of an algorithm:
1. Take one number N1 at random.
2. If N1 < S/K, take another one N2 > S/K.
3. Calculate N1 + N2: if it is < 2*S/K, take another one N3 > S/K; if not, take one < S/K.
4. Iterate: at each step, if the sum of the n numbers picked so far is < n*S/K, take another one > S/K; if not, take one < S/K.
5. You can get better precision by replacing S/K with (S - sum(N1, N2, ...))/(K - n).
If at some step you cannot find any number, backtrack.
Hope it helps.
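A rough Python sketch of that pick-and-backtrack idea (not exactly the steps above, just the general shape: pick candidates in random order, prune when the remaining numbers cannot reach the remaining sum, and backtrack otherwise):

import random

def random_k_sum(n, k, s, chosen=()):
    # pick k distinct numbers from 0..n summing to s, choosing in random order
    if k == 0:
        return list(chosen) if s == 0 else None
    candidates = [x for x in range(n + 1) if x not in chosen]
    random.shuffle(candidates)
    for x in candidates:
        rest = s - x
        # crude pruning: k-1 distinct numbers <= n can sum to at most n + (n-1) + ...
        if 0 <= rest <= sum(range(n, n - (k - 1), -1)):
            result = random_k_sum(n, k - 1, rest, chosen + (x,))
            if result is not None:
                return result
    return None  # dead end: backtrack

print(random_k_sum(5, 2, 5))  # e.g. [1, 4], [2, 3] or [0, 5]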
I would start with the Dirichlet distribution (https://en.wikipedia.org/wiki/Dirichlet_distribution). Using it, you could sample random numbers X_i distributed in (0..1) such that sum_i X_i = 1.
For S <= N, it is easy to see that sampling beyond S is useless and should be rejected outright.
So, combining with acceptance/rejection, something along these lines:
1. Divide the interval [0...1] into S (or S+1 if 0 is allowed) equal bins.
2. Sample K numbers from the Dirichlet distribution.
3. Map each sampled number to its bin index, so you now have sampled integers which are all less than or equal to S and have sum equal to S.
4. If all integers are distinct, accept the sampling; otherwise reject it and go to step 2.
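A small sketch of this in Python using numpy's Dirichlet sampler (the bin-mapping step below uses largest-remainder rounding so the integers sum to exactly S; that is one way to realize step 3, not the only one):

import numpy as np

def sample_k_summing_to_s(k, s, max_tries=10000):
    # sample k distinct integers in 0..s with sum s, via Dirichlet + rejection
    rng = np.random.default_rng()
    for _ in range(max_tries):
        x = rng.dirichlet(np.ones(k))          # k positive reals summing to 1
        scaled = x * s
        ints = np.floor(scaled).astype(int)    # integer part of each share
        # hand the leftover units to the largest fractional parts, so the sum is exactly s
        for i in np.argsort(scaled - ints)[::-1][: s - ints.sum()]:
            ints[i] += 1
        if len(set(ints)) == k:                # accept only if all values are distinct
            return sorted(int(v) for v in ints)
    return None                                # rejection never succeeded

print(sample_k_summing_to_s(2, 5))  # e.g. [0, 5], [1, 4] or [2, 3]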

Algorithm for generating a set of Subset-Distinct-Sum integers

I'm attempting to create a scoring system for a card game which would preclude ties in scoring, by setting the point value of each card such that no two combinations of cards could add up to the same score. (For this particular case, I need a set of 17 integers, since there are 17 scorable cards.)
I've tried several heuristic approaches (various winnowing procedures along the lines of taking an array of integers, iteratively generating random subsets, and discarding those which appear in subsets sharing a common sum), then exhaustively validating the results (by enumerating their subsets).
From what I've seen, the theoretical limit to the size of such a set is near log2(n), where n is the number of members of the superset from which the subset-distinct-sum subset is drawn. However, while I've been able to approach this, I've not been able to match it. My best result so far is a set of 13 integers, drawn from the 250,000 integers between 10,000 and 25,000,000, counting by hundreds (the latter is immaterial to the algorithm, but is a domain constraint of my use case):
[332600,708900,2130500,2435900,5322500,7564200,10594500,12776200,17326700,17925700,22004400,23334700,24764900]
I've hunted around, and most of the SDS generators are sequence generators that make no pretense of creating dense sets, but instead have the ability to be continued indefinitely to larger and larger numbers (e.g. the Conway-Guy Sequence). I have no such constraint, and would prefer a denser set without requiring a sequence relationship with each other.
(I did consider using the Conway-Guy Sequence n=2..18 * 10,000, but the resulting set has a broader range than I would like. I'd also really like a more general algorithm.)
Edit: For clarity, I'm looking for a way (non-deterministic or dynamic-programming methods are fine) to generate an SDS set denser than those provided by simply enumerating exponents or using a sequence like Conway-Guy. I hope, by discarding the "sequence generator" constraint, I can find numbers much closer together than such sequences provide.
For any value of N, it is readily possible to generate up to Floor(Log2(N))-1 numbers (which we'll call the set "S") such that:
All members of S are less than or equal to N, and
No two distinct subsets of S have the same sum, and
All members of S are within a factor of two of each other.
Your suspicions were correct in that S would not be in any sense extensible (you could not add more members to it).
Method:
For N, find T = 2^P , where T is the highest power of two that is less than or equal to N. That is:
P = Floor( Log2(N) ), and
T = 2^P
Then the members of S can be generated as:
for( i=0 to P-2 ): S(i) = 2^i + 2^(P-1)
Or, to put it another way, S(i) = 2^(P-1) + 2^i, for 0 <= i < P-1
This makes for a total of P-1 (or Floor(Log2(N))-1) members. Can two distinct subsets of S ever sum to the same number? No:
Proof
Let's consider any two distinct subsets of S: U and V. Since removing any members they share from both does not change whether their sums are equal, we may as well assume that U and V have no members in common. Then the sum of U is:
Sum(U) = O(U)*(T/2) + Sum(2^i| S(i):U)
Where
O(U) is the Order of the set U (how many elements it has),
"S(i):U" means "S(i) is an element of U", and
"|" is the conditioning operator (means "given that.." or "where.."),
So, putting the last two together, Sum(2^i| S(i):U) just means "the sum of 2^i over all S(i) that are elements of U" (remembering that S(i) = 2^(P-1) + 2^i).
And likewise, the sum of V is:
Sum(V) = O(V)*(2^(P-1)) + Sum(2^i| S(i):V)
Now, because U and V are distinct, Sum(2^i| S(i):U) and Sum(2^i| S(i):V) can never be equal, because no two distinct sets of powers of two have the same sum.
Also, because Sum(2^i; 0 <= i < P-1) = 2^(P-1) - 1, these sums of powers of two are always less than 2^(P-1). This means that the sums of U and V could only be equal if:
O(U)*(2^(P-1)) = O(V)*(2^(P-1))
or
O(U) = O(V)
That is, if U and V have the same number of elements, so that the first terms will be equal (because the second terms can never be as large as any differences in the first terms).
In such a case (O(U) = O(V)) the first terms are equal, so Sum(U) would equal Sum(V) iff their second terms (the binary sums) are also equal. However, we already know those can never be equal; therefore, it can never be true that Sum(U) = Sum(V).
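A short Python sketch of the construction, with a brute-force check of the distinct-subset-sum property for a modest N (the helper names are mine):

from itertools import combinations
from math import floor, log2

def build_set(n):
    # S(i) = 2^(P-1) + 2^i for 0 <= i < P-1, where P = floor(log2(n))
    p = floor(log2(n))
    return [2 ** (p - 1) + 2 ** i for i in range(p - 1)]

def all_subset_sums_distinct(s):
    sums = [sum(c) for r in range(len(s) + 1) for c in combinations(s, r)]
    return len(sums) == len(set(sums))

s = build_set(1000)
print(s)                            # [257, 258, 260, 264, 272, 288, 320, 384]  (P = 9, so 8 members)
print(all_subset_sums_distinct(s))  # True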
It seems like another way of phrasing the problem is to make sure that the previous terms always sum to less than the current term. If that holds, you'll never have two subsets that add up to the same sum.
Ex: 2, 3, 6, 12, 24, 48, 96, ...
Summing to any single element {i} takes 1 more than the sum of all the previous terms, and summing to any multi-element set {i,j} takes more than the sum of the elements before i plus the sum of the elements before j.
More mathematically: (i-1), i, 2i, 4i, 8i, ..., 2^n * i should work for any i and n.
The only way this doesn't work is if you're allowed to choose the same number twice in your subset (if that's the case, you should specify it in the problem). But that brings up the issue that Sum{i} = Sum{i} for any number, so that seems like an issue.
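A quick Python check of that condition (each term strictly larger than the sum of everything before it) on the example sequence above:

def exceeds_previous_sums(seq):
    # every term must be larger than the sum of all terms before it
    total = 0
    for term in seq:
        if term <= total:
            return False
        total += term
    return True

i, n = 3, 5
seq = [i - 1] + [i * 2 ** j for j in range(n + 1)]
print(seq)                         # [2, 3, 6, 12, 24, 48, 96]
print(exceeds_previous_sums(seq))  # True, so no two distinct subsets share a sum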

Sum Combination List

I need an algorithm for this problem:
Given a set of n natural numbers x1, x2, ..., xn, a number S, and a number k, form sums of k numbers picked from the set (a number can be picked many times) that equal S.
Stated differently: list every possible combination for S. Bounds: n <= 256, x <= 1000, k <= 32.
E.g.
problem instance: {1,2,5,9,11,12,14,15}, S=30, k=3
There are 4 possible combinations:
S = 1+14+15 = 2+14+14 = 5+11+14 = 9+9+12.
With these bounds, it is infeasible to use brute force, but I think dynamic programming is a good approach.
The scheme is: a table t, with t[m,v] = the number of combinations with sum v formed by m numbers.
1. Initialize t[1, x(i)] for every i.
2. Then use the formula t[m,v] = Sum(t[m-1, v-x(i)]) over every i satisfying v - x(i) > 0, for 2 <= m <= k.
3. After obtaining t[k,S], I can trace back to find all the combinations.
The dilemma is that t[m,v] gets inflated by duplicate commutative combinations, e.g. t[2,16] = 2 because 16 = 15+1 and 1+15. Furthermore, the final result t[3,30] is large, due to 1+14+15, 1+15+14, ..., 2+14+14, 14+2+14, ...
How to get rid of symmetric permutations? Thanks in advance.
You can get rid of permutations by imposing an ordering on the way you pick elements of x. Make your table a triple t[m, v, n] = number of combinations of sum v formed by m numbers from x1..xn. Now observe t[m, v, n] = t[m, v, n-1] + t[m-1, v-x_n, n]. This solves the permutation problem by only generating summands in reverse order from their appearance in x. So for instance it'll generate 15+14+1 and 14+14+2 but never 14+15+1.
(You probably don't need to fill out the whole table, so you should probably compute lazily; in fact, a memoized recursive function is probably what you want here.)
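A memoized Python version of that recurrence (counting only; listing the combinations is the same recursion with the chosen summand appended at each step):

from functools import lru_cache

x = [1, 2, 5, 9, 11, 12, 14, 15]

@lru_cache(maxsize=None)
def t(m, v, n):
    # number of multisets of m numbers from x[0:n] (repetition allowed) summing to v
    if m == 0:
        return 1 if v == 0 else 0
    if n == 0 or v < 0:
        return 0
    # t[m, v, n] = t[m, v, n-1] + t[m-1, v - x_n, n]
    return t(m, v, n - 1) + t(m - 1, v - x[n - 1], n)

print(t(3, 30, len(x)))  # 4: 1+14+15, 2+14+14, 5+11+14, 9+9+12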

Finding even numbers in an array without using feedback

I saw this post: Finding even numbers in an array and I was thinking about how you could do it without feedback. Here's what I mean.
Given an array of length n containing at most e even numbers and a
function isEven that returns true if the input is even and false
otherwise, write a function that prints all the even numbers in the
array using the fewest number of calls to isEven.
The answer on the post was to use a binary search, which is neat since it doesn't require the array to be in order. The number of times you have to check whether a number is even is e log n instead of n, because you do a binary search (log n) to find one even number each time (e times).
But that idea means that you divide the array in half, test for evenness, then decide which half to keep based on the result.
My question is whether or not you can beat n calls on a fixed testing scheme where you check all the numbers you want for evenness without knowing the outcome, and then figure out where the even numbers are after you've done all the tests based on the results. So I guess it's no-feedback or blind or some term like that.
I was thinking about this for a while and couldn't come up with anything. The binary search idea doesn't work at all with this constraint, but maybe something else does? Even getting down to n/2 calls instead of n (yes, I know they are the same big-O) would be good.
The technical term for "no-feedback or blind" is "non-adaptive". O(e log n) calls still suffice, but the algorithm is rather more involved.
Instead of testing the evenness of products, we're going to test the evenness of sums. Let E ≠ F be distinct subsets of {1, …, n}. If we have one array x1, …, xn with even numbers at positions E and another array y1, …, yn with even numbers at positions F, how many subsets J of {1, …, n} satisfy
(∑_{i in J} x_i) mod 2 ≠ (∑_{i in J} y_i) mod 2?
The answer is 2^(n-1). Let i be an index such that x_i mod 2 ≠ y_i mod 2. Let S be a subset of {1, …, i - 1, i + 1, …, n}. Either J = S is a solution or J = S ∪ {i} is a solution, but not both.
For every possible outcome E, we need to make calls that eliminate every other possible outcome F. Suppose we make 2e log n calls at random. For each pair E ≠ F, the probability that we still cannot distinguish E from F is (2^(n-1)/2^n)^(2e log n) = n^(-2e), because there are 2^n possible calls and only 2^(n-1) fail to distinguish. There are at most n^e + 1 choices of E and thus at most (n^e + 1)n^e/2 pairs. By a union bound, the probability that there exists some indistinguishable pair is at most n^(-2e)(n^e + 1)n^e/2 < 1 (assuming we're looking at an interesting case where e ≥ 1 and n ≥ 2), so there exists a sequence of 2e log n calls that does the job.
Note that, while I've used randomness to show that a good sequence of calls exists, the resulting algorithm is deterministic (and, of course, non-adaptive, because we chose that sequence without knowledge of the outcomes).
You can use the Chinese Remainder Theorem to do this. I'm going to change your notation a bit.
Suppose you have N numbers of which at most E are even. Choose a sequence of distinct prime powers q1,q2,...,qk such that their product is at least N^E, i.e.
qi = pi^ei
where pi is prime and ei > 0 is an integer and
q1 * q2 * ... * qk >= N^E
Now make a bunch of 0-1 matrices. Let Mi be the qi x N matrix where the entry in row r and column c has a 1 if c = r mod qi and a 0 otherwise. For example, if qi = 3^2, then row 2 has ones in columns 2, 11, 20, ... 2 + 9j and 0 elsewhere.
Now stack these matrices vertically to get a Q x N matrix M, where Q = q1 + q2 + ... + qk. The rows of M tell you which numbers to multiply together (the nonzero positions). This gives a total of Q products that you need to test for evenness. Call each row a "trial", and say that a "trial involves j" if the jth column of that row is nonempty. The theorem you need is the following:
THEOREM: The number in position j is even if and only if all trials involving j are even.
So you do a total of Q trials and then look at the results. If you choose the prime powers intelligently, then Q should be significantly smaller than N. There are asymptotic results that show you can always get Q on the order of
(2E log N)^2 / 2log(2E log N)
This theorem is actually a corollary of the Chinese Remainder Theorem. The only place that I've seen this used is in Combinatorial Group Testing. Apparently the problem originally arose when testing soldiers coming back from WWII for syphilis.
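An illustrative Python sketch of the construction (the prime powers here are picked by hand so that their product exceeds N^E; for an N this small the number of trials Q is of course larger than N, and the scheme only pays off for large N):

from math import prod

def is_even(x):
    return x % 2 == 0  # the oracle we are charged for

def find_evens(a, prime_powers):
    # one trial per residue class mod each prime power; a trial tests the parity
    # of the product of its members (even iff it contains at least one even number)
    n = len(a)
    trial_even = {}
    for q in prime_powers:
        for r in range(q):
            members = [a[j] for j in range(n) if j % q == r]
            if members:
                trial_even[(q, r)] = is_even(prod(members))
    # a[j] is even iff every trial involving position j came back even
    return [a[j] for j in range(n)
            if all(trial_even[(q, j % q)] for q in prime_powers)]

a = [7, 12, 9, 3, 8, 5, 15, 11, 4, 13]   # N = 10, at most E = 3 evens
print(find_evens(a, [4, 9, 5, 7]))        # [12, 8, 4]; 4*9*5*7 = 1260 >= 10^3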
The problem you are facing is a form of group testing: a type of problem with the objective of reducing the cost of identifying certain elements of a set (up to d elements of a set of N elements).
As you've already stated, there are two basic principles via which the testing may be carried out:
Non-adaptive Group Testing, where all the tests to be performed are decided a priori.
Adaptive Group Testing, where we perform several tests, basing each test on the outcome of previous tests. Obviously, adaptive testing has a potential to reduce the cost, compared to non-adaptive testing.
Theoretical bounds for both principles have been studied, and are available in this Wiki article, or this paper.
For adaptive testing, the upper bound is O(d*log(N)) (as already described in this answer).
For non-adaptive testing, it can be shown that the upper bound is O(d*d/log(d)*log(N)), which is obviously larger than the upper bound for adaptive testing by a factor of d/log(d).
This upper bound for non-adaptive testing comes from an algorithm which uses disjunct matrices: matrices of dimension T x N ("number of tests" x "number of elements"), where each entry is either true (if an element was included in a test) or false (if it wasn't), with the property that any subset of d columns must differ from every other column in at least a single row (test inclusion). This allows linear-time decoding (there are also "d-separable" matrices where fewer tests are needed, but the time complexity of their decoding is exponential and not computationally feasible).
Conclusion:
My question is whether or not you can beat n calls on a fixed testing scheme [...]
For such a scheme and a sufficiently large value of N, a disjunct matrix can be constructed which would have less than K * [d*d/log(d)*log(N)] rows. So, for large values of N, yes, you can beat it.
The underlying question (challenge) is kind of silly. If the binary search answer is acceptable (where it sums sub-arrays and sends them to IsEven), then I can think of a way to do it with E or fewer calls to IsEven (assuming the numbers are integers, of course).
JavaScript to demonstrate
// sort the array by only the first bit of the number
A.sort(function(x,y) { return (x & 1) - (y & 1); });
// all of the evens will be at the beginning
for(var i=0; i < E && i < A.length; i++) {
    if(IsEven(A[i]))
        Print(A[i]);
    else
        break;
}
Not exactly a solution, just a few thoughts.
It is easy to see that if a solution exists for array length n that takes fewer than n tests, then for any array length m > n there is always a solution with fewer than m tests. So, if you have a solution for n = 2 or 3 or 4, then the problem is solved.
You can split the array into pairs of numbers and, for each pair: if the sum is odd, then exactly one of them is even; otherwise, if one of the numbers is even, then both of them are even. This way each pair takes either one or two tests. Best case: n/2 tests, worst case: n tests; if even and odd numbers occur with equal probability: 3n/4 tests.
My hunch is there is no solution with fewer than n tests. Not sure how to prove it.
UPDATE: The second solution can be extended in the following way.
Check if the sum of two numbers is even. If it is odd, then exactly one of them is even. Otherwise, label the pair as a "homogeneous set of size 2". Take two homogeneous sets of the same size n. Pick one number from each set and check if their sum is even. If it is even, combine the two sets into a "homogeneous set of size 2n". Otherwise, one of those sets consists purely of even numbers and the other purely of odd numbers.
Best case: n/2 tests. Average case: about 3n/4. Worst case is still n; the worst case occurs only when all the numbers are even or all the numbers are odd.
If we can add and multiply array elements, then we can compute every Boolean function (up to complementation) on the low-order bits. Simulate a circuit that encodes the positions of the even numbers as a number from 0 to nC0 + nC1 + ... + nCe - 1 represented in binary and use calls to isEven to read off the bits.
Number of calls used: within 1 of the information-theoretic optimum.
See also fully homomorphic encryption.
