Subset sum problem where each number can be added or subtracted - algorithm

Given a set A containing n positive integers, how can I find the smallest integer >= 0 that can be obtained using all the elements in the set? Each element can be either added to or subtracted from the total.
A few examples to make this clear:
A = [ 2, 1, 3]
Result = 0 (2 + 1 - 3)
A = [1, 2, 0]
Result = 1 (-1 + 2 + 0)
A = [1, 2, 1, 7, 6]
Result = 1 (1 + 2 - 1 - 7 + 6)

You can solve it by using Boolean Integer Programming. There are several algorithms (e.g. Gomory or branch and bound) and free libraries (e.g. LP-Solve) available.
Calculate the sum of the list and call it s. Double the numbers in the list. Say the doubled numbers are a,b,c. Then you have the following equation system:
Boolean x,y,z
a*x+b*y+c*z >= s
Minimize ax+by+cz!
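The boolean variables indicate if the corresponding number should be added (when true) or subtracted (when false). Since every signed sum equals a*x+b*y+c*z - s, the answer to the original problem is the minimized value minus s.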
[Edit]
I should mention that the transformed problem can be seen as a "knapsack problem" as well:
Boolean x,y,z
-a*x-b*y-c*z <= -s
Maximize -a*x-b*y-c*z!
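For small inputs the result can be cross-checked without an ILP solver. The following is only a minimal sketch, and it is not the Boolean programming formulation from the answer: it swaps in a plain subset-sum dynamic program, using a Python integer as a bitset. The name `min_signed_sum` is made up for illustration. It relies on the fact that if the subtracted elements sum to k, the signed total is sum(A) - 2k.

```python
def min_signed_sum(values):
    """Smallest total >= 0 reachable by adding or subtracting every value."""
    total = sum(values)
    reachable = 1  # bit k is set when some subset of the values sums to k
    for v in values:
        reachable |= reachable << v
    # Subtracting a subset that sums to k gives total - 2*k, so look for
    # the largest reachable k that does not exceed total // 2.
    for k in range(total // 2, -1, -1):
        if (reachable >> k) & 1:
            return total - 2 * k


print(min_signed_sum([2, 1, 3]))
print(min_signed_sum([1, 2, 0]))
print(min_signed_sum([1, 2, 1, 7, 6]))
```

The running time grows with sum(A), so this is only practical when the values are modest in size.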

Related

Rearrange list to satisfy a condition

I was asked this during a coding interview but wasn't able to solve this. Any pointers would be very helpful.
I was given an integer list (think of it as a number line) which needs to be rearranged so that the difference between elements is equal to M (an integer which is given). The list needs to be rearranged in such a way that the value of the max absolute difference between the elements' new positions and the original positions needs to be minimized. Eventually, this value multiplied by 2 is returned.
Test cases:
//1.
original_list = [1, 2, 3, 4]
M = 2
rearranged_list = [-0.5, 1.5, 3.5, 5.5]
// difference in values of original and rearranged lists
diff = [1.5, 0.5, 0.5, 1.5]
max_of_diff = 1.5 // list is rearranged in such a way so that this value is minimized
return_val = 1.5 * 2 = 3
//2.
original_list = [1, 2, 4, 3]
M = 2
rearranged_list = [-1, 1, 3, 5]
// difference in values of original and rearranged lists
diff = [2, 1, 1, 2]
max_of_diff = 2 // list is rearranged in such a way so that this value is minimized
return_val = 2 * 2 = 4
Constraints:
1 <= list_length <= 10^5
1 <= M <= 10^4
-10^9 <= list[i] <= 10^9
There's a question on leetcode which is very similar to this: https://leetcode.com/problems/minimize-deviation-in-array/ but there, the operations that are performed on the array are mentioned while that's not been mentioned here. I'm really stumped.
Here is how you can think of it:
The "rearranged" list is like a straight line that has a slope that corresponds to M.
Here is a visualisation for the first example:
The black dots are the input values [1, 2, 3, 4] where the index of the array is the X-coordinate, and the actual value at that index, the Y-coordinate.
The green line is determined by M. Initially this line runs through the origin at (0, 0). The red line segments represent the differences that must be taken into account.
Now the green line has to move vertically to its optimal position. We can see that we only need to look at the difference it makes with the first and with the last point. The other two inputs will never contribute to an extreme. This is generally true: there are only two input elements that need to be taken into account. They are the points that make the greatest (signed -- not absolute) difference and the least difference.
We can see that we need to move the green line in such a way that the signed differences with these two extremes are each other's opposite: i.e. their absolute values become the same, but their signs are opposite.
Twice this absolute difference is what we need to return, and it is actually the difference between the greatest (signed) difference and the least (signed) difference.
So, in conclusion, we must generate the values on the green line, find the least and greatest (signed) difference with the data points (Y-coordinates) and return the difference between those two.
Here is an implementation in JavaScript running the two examples you provided:
function solve(y, slope) {
    let low = Infinity;
    let high = -Infinity;
    for (let x = 0; x < y.length; x++) {
        let dy = y[x] - x * slope;
        low = Math.min(low, dy);
        high = Math.max(high, dy);
    }
    return high - low;
}
console.log(solve([1, 2, 3, 4], 2)); // 3
console.log(solve([1, 2, 4, 3], 2)); // 4
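If the rearranged list itself is wanted, the same idea extends naturally. The sketch below is an assumption-laden illustration rather than part of the original answer: the name `best_line` is hypothetical, and it places the line's intercept halfway between the least and greatest signed differences, which is the position where the two extremes become each other's opposite.

```python
def best_line(values, slope):
    """Return (answer, rearranged list) for the slope-M line closest to values."""
    # Signed difference between each point and the line through the origin.
    residuals = [y - x * slope for x, y in enumerate(values)]
    low, high = min(residuals), max(residuals)
    # Shifting the line by the midpoint balances the two extreme differences.
    offset = (low + high) / 2
    rearranged = [offset + x * slope for x in range(len(values))]
    return high - low, rearranged


print(best_line([1, 2, 3, 4], 2))
print(best_line([1, 2, 4, 3], 2))
```

For the two test cases in the question this reproduces the rearranged lists [-0.5, 1.5, 3.5, 5.5] and [-1, 1, 3, 5].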

Minimum Delete operations to empty the vector

My friend was asked this question in an interview:
We have a vector of integers consisting only of 0s and 1s. A delete consists of selecting consecutive equal numbers and removing them. The remaining parts are then attached to each other. For example, if the vector is [0,1,1,0], then after removing [1,1] we get [0,0]. If an element has no equal neighbours, removing it still takes one delete.
We need to write a function that returns the minimum number of deletes to make the vector empty.
Example 1:
Input: [0,1,1,0]
Output: 2
Explanation: [0,1,1,0] -> [0,0] -> []
Example 2:
Input: [1,0,1,0]
Output: 3
Explanation: [1,0,1,0] -> [0,1,0] -> [0,0] -> [].
Example 3:
Input: [1,1,1]
Output: 1
Explanation: [1,1,1] -> []
I am unsure of how to solve this question. I feel that we can use a greedy approach:
Remove all consecutive equal elements and increment the delete counter for each;
Remove elements of the form <a, b, c> where a==c and a!=b; b must be a single element, because if we had multiple consecutive bs, they would have been deleted in step (1) above. Increment the delete counter once as we delete one b.
Repeat steps (1) and (2) as long as we can.
Increment delete counter once for each of the remaining elements in the vector.
But I am not sure if this would work. Could someone please confirm if this is the right approach? If not, how do we solve this?
Hint
You can simplify this problem greatly by noticing the following fact: a chain of consecutive zeros or ones can be shortened or lengthened without changing the final solution. For example, these two vectors have the same solution:
[1, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
With that in mind, the solution becomes simpler. So I encourage you to pause and try to figure it out!
Solution
With the previous remark, we can reduce the problem to vectors of alternating zeros and ones. In fact, since zero and one have no special meaning here, it suffices to solve for all such vectors which start with, say, a one.
[] # number of steps: 0
[1] # number of steps: 1
[1, 0] # number of steps: 2
[1, 0, 1] # number of steps: 2
[1, 0, 1, 0] # number of steps: 3
[1, 0, 1, 0, 1] # number of steps: 3
[1, 0, 1, 0, 1, 0] # number of steps: 4
[1, 0, 1, 0, 1, 0, 1] # number of steps: 4
We notice a pattern, the solution seems to be floor(n / 2) + 1 for n > 1 where n is the length of those sequences. But can we prove it..?
Proof
We will proceed by induction. Suppose you have a solution for a vector of length n - 2, then any move you do (except for deleting the two characters on the edges of the vector) will have the following result.
[..., 0, 1, 0, 1, 0, ...]
            ^------------ delete this one
Result:
[..., 0, 1, 1, 0, ...]
But we already mentioned that a chain of consecutive zeros or ones can be shortened or lengthened without changing the final solution. So the result of the deletion is in fact equivalent to now having to solve for:
[..., 0, 1, 0, ...]
What we did is one deletion in n elements, arriving at a case which is equivalent to having to solve for n - 2 elements. So the solution for a vector of size n is...
Solution(n) = Solution(n - 2) + 1
            = [floor((n - 2) / 2) + 1] + 1
            = floor(n / 2) + 1
Keeping in mind that the solutions for [1] and [1, 0] are respectively 1 and 2, this concludes our proof. Notice here, that [] turns out to be an edge case.
Interestingly enough, this proof also shows us that the optimal sequence of deletions for a given vector is highly non-unique. You can simply delete any block of ones or zeros, except for the first and last ones, and you will end up with an optimal solution.
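The closed form can be sanity-checked on small inputs with an exhaustive search. This is a rough sketch under one stated assumption: it relies on the hint above (run lengths do not matter), so every state is stored with its runs collapsed to a single element. The name `brute_force` is made up for illustration.

```python
from itertools import groupby


def collapse(seq):
    """Replace every run of equal values by a single value."""
    return tuple(key for key, _ in groupby(seq))


def brute_force(vector):
    """Minimum number of deletes, found by breadth-first search over run deletions."""
    frontier, steps = {collapse(vector)}, 0
    while () not in frontier:
        following = set()
        for state in frontier:
            for i in range(len(state)):
                # Deleting run i lets its two neighbours merge into one run.
                following.add(collapse(state[:i] + state[i + 1:]))
        frontier, steps = following, steps + 1
    return steps


for n in range(9):
    alternating = [(i + 1) % 2 for i in range(n)]  # 1, 0, 1, 0, ...
    print(n, brute_force(alternating))
```

The search is exponential in the number of runs, so it is only meant for checking the pattern on short vectors.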
Conclusion
In conclusion, given an arbitrary vector of ones and zeros, the smallest number of deletions you will need can be computed by counting the number n of groups of consecutive ones or zeros. The answer is then floor(n / 2) + 1 for n > 1, and n itself otherwise.
Just for fun, here is a Python implementation to solve this problem.
from itertools import groupby

def solution(vector):
    n = 0
    for group in groupby(vector):
        n += 1
    return n // 2 + 1 if n > 1 else n
Intuition: if we first remove every subsegment of one integer, all the remaining integers are of the other type, and removing them takes only one more operation.
Choosing the integer that does not start the vector as the one whose subsegments are removed leads to the optimal result.
Solution:
Take the integer other than the one that is starting as a flag.
Count the number of contiguous segments of the flag in a vector.
The answer will be the above count + 1 (one operation for removing the merged segment of the starting integer).
So, the answer is:
answer = Count of contiguous segments of flag + 1
Example 1:
[0,1,1,0]
flag = 1
Count of subsegments with flag = 1
So, answer = 1 + 1 = 2
Example 2:
[1,0,1,0]
flag = 0
Count of subsegments with flag = 2
So, answer = 2 + 1 = 3
Example 3:
[1,1,1]
flag = 0
Count of subsegments with flag = 0
So, answer = 0 + 1 = 1
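The flag-counting rule above might be written as follows. This is a minimal sketch assuming the vector contains only 0s and 1s; the function name `min_deletes_by_flag` is hypothetical, and the empty vector is treated as needing zero deletes.

```python
from itertools import groupby


def min_deletes_by_flag(vector):
    """Count runs of the non-starting value, then add one for the rest."""
    if not vector:
        return 0
    flag = 1 - vector[0]  # the value that does not start the vector
    flag_runs = sum(1 for value, _ in groupby(vector) if value == flag)
    return flag_runs + 1


print(min_deletes_by_flag([0, 1, 1, 0]))
print(min_deletes_by_flag([1, 0, 1, 0]))
print(min_deletes_by_flag([1, 1, 1]))
```

Because the runs alternate and the first run is never a flag run, the number of flag runs is floor(n / 2) for n runs in total, so this agrees with the floor(n / 2) + 1 formula.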

Minimum common remainder of division

I have n pairs of numbers: ( p[1], s[1] ), ( p[2], s[2] ), ... , ( p[n], s[n] )
Where p[i] is integer greater than 1; s[i] is integer : 0 <= s[i] < p[i]
Is there any way to determine minimum positive integer a , such that for each pair :
( s[i] + a ) mod p[i] != 0
Anything better than brute force ?
It is possible to do better than brute force. Brute force would be O(A·n), where A is the minimum valid value for a that we are looking for.
The approach described below uses a min-heap and achieves O(n·log(n) + A·log(n)) time complexity.
First, notice that replacing a with a value of the form (p[i] - s[i]) + k * p[i] leads to a remainder equal to zero in the ith pair, for any non-negative integer k. Thus, the numbers of that form are invalid a values (the solution that we are looking for is different from all of them).
The proposed algorithm is an efficient way to generate the numbers of that form (for all i and k), i.e. the invalid values for a, in increasing order. As soon as the current value differs from the previous one by more than 1, it means that there was a valid a in-between.
The pseudocode below details this approach.
1. construct a min-heap from all the following pairs (p[i] - s[i], p[i]),
   where the heap comparator is based on the first element of the pairs.
2. a0 = 0 (a must be positive, so the search starts just above 0); maxA = lcm(p[1], p[2], ..., p[n])
3. Repeat
   3a. Retrieve and remove the root of the heap, (a, p[i]).
   3b. If a - a0 > 1 then the result is a0 + 1. Exit.
   3c. If a is at least maxA, then no solution exists. Exit.
   3d. Insert into the heap the value (a + p[i], p[i]).
   3e. a0 = a
Remark: it is possible for such an a to not exist. If no valid a is found in the range 1 .. LCM(p[1], p[2], ... p[n]), then it is guaranteed that no valid a exists.
I'll show below an example of how this algorithm works.
Consider the following (p, s) pairs: { (2, 1), (5, 3) }.
The first pair indicates that a should avoid values like 1, 3, 5, 7, ..., whereas the second pair indicates that we should avoid values like 2, 7, 12, 17, ... .
The min-heap initially contains the first element of each sequence (step 1 of the pseudocode), marked with brackets below:
[1], 3, 5, 7, ...
[2], 7, 12, 17, ...
We retrieve and remove the head of the heap, i.e., the minimum of the two bracketed values, which is 1. We add the next element from that sequence into the heap, so the heap now contains the elements 2 and 3:
1, [3], 5, 7, ...
[2], 7, 12, 17, ...
We again retrieve the head of the heap, which this time is the value 2, and add the next element of that sequence into the heap:
1, [3], 5, 7, ...
2, [7], 12, 17, ...
The algorithm continues: we next retrieve the value 3 and add 5 into the heap:
1, 3, [5], 7, ...
2, [7], 12, 17, ...
Finally, now we retrieve value 5. At this point we realize that the value 4 is not among the invalid values for a, thus that is the solution that we are looking for.
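A possible Python rendering of the heap approach, as a sketch only. It assumes Python 3.9+ (for `math.lcm` with several arguments), takes the input as a list of (p, s) tuples, and starts the previous-value marker at 0 so that only positive results are returned. The name `min_valid_offset` and the choice to return None when no solution exists are illustrative assumptions.

```python
import heapq
from math import lcm


def min_valid_offset(pairs):
    """pairs: list of (p, s) with p > 1 and 0 <= s < p."""
    # Each heap entry is (next invalid value for this pair, its period p).
    heap = [(p - s, p) for p, s in pairs]
    heapq.heapify(heap)
    limit = lcm(*(p for p, _ in pairs))
    previous = 0
    while heap:
        a, p = heapq.heappop(heap)
        if a - previous > 1:
            return previous + 1  # a gap between consecutive invalid values
        if a >= limit:
            return None  # a full period was scanned without finding a gap
        heapq.heappush(heap, (a + p, p))
        previous = a
    return 1  # no pairs, so nothing is forbidden


print(min_valid_offset([(2, 1), (5, 3)]))
```

For the worked example { (2, 1), (5, 3) } this prints 4, matching the walkthrough.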
I can think of two different solutions. First:
p_max = lcm (p[0],p[1],...,p[n]) - 1;
for a = 0 to p_max:
    zero_found = false;
    for i = 0 to n:
        if ( s[i] + a ) mod p[i] == 0:
            zero_found = true;
            break;
    if !zero_found:
        return a;
return -1;
I suppose this is the one you call "brute force". Notice that p_max represents Least Common Multiple of p[i]s - 1 (solution is either in the closed interval [0, p_max], or it does not exist). Complexity of this solution is O(n * p_max) in the worst case (plus the running time for calculating lcm!). There is a better solution regarding the time complexity, but it uses an additional binary array - classical time-space tradeoff. Its idea is similar to the Sieve of Eratosthenes, but for remainders instead of primes :)
p_max = lcm (p[0],p[1],...,p[n]) - 1;
int remainders[p_max + 1] = {0};
for i = 0 to n:
    // smallest a >= 0 with (s[i] + a) mod p[i] == 0, stored negated
    int rem = (s[i] == 0) ? 0 : s[i] - p[i];
    while rem >= -p_max:
        remainders[-rem] = 1;
        rem -= p[i];
for i = 0 to p_max:
    if !remainders[i]:
        return i;
return -1;
Explanation of the algorithm: first, we create an array remainders that indicates whether a certain negative remainder exists in the whole set. What is a negative remainder? It's simple: notice that 6 = 2 mod 4 is equivalent to 6 = -2 mod 4. If remainders[i] == 1, it means that if we add i to one of the s[j], we get a multiple of p[j] (which is 0 mod p[j], and that is what we want to avoid). The array is populated with all possible negative remainders, down to -p_max. Now all we have to do is search for the first i such that remainders[i] == 0 and return it, if it exists; notice that a solution does not have to exist. In the problem text, you have indicated that you are searching for the minimum positive integer; I don't see why zero would not fit (if all s[i] are positive). However, if that is a strong requirement, just change the final for loop to start from 1 instead of 0, and increment p_max.
The complexity of this algorithm is n + sum (p_max / p[i]) = n + p_max * sum (1 / p[i]), where i goes from 0 to n. Since all p[i]s are at least 2, that is asymptotically better than the brute force solution.
An example for better understanding: suppose that the input is (5,4), (5,1), (2,0). p_max is lcm(5,5,2) - 1 = 10 - 1 = 9, so we create array with 10 elements, initially filled with zeros. Now let's proceed pair by pair:
from the first pair, we have remainders[1] = 1 and remainders[6] = 1
second pair gives remainders[4] = 1 and remainders[9] = 1
last pair gives remainders[0] = 1, remainders[2] = 1, remainders[4] = 1, remainders[6] = 1 and remainders[8] = 1.
Therefore, first index with zero value in the array is 3, which is a desired solution.
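The sieve idea can be sketched in Python as follows. The assumptions here: Python 3.9+ for `math.lcm`, input given as a list of (p, s) tuples, and a hypothetical `start` parameter that selects whether 0 is an acceptable answer. The table covers one full period, 0 .. lcm inclusive, so that starting from 1 still sees a complete cycle.

```python
from math import lcm


def first_valid_offset(pairs, start=0):
    """Smallest a >= start with (s + a) mod p != 0 for every (p, s), else None."""
    period = lcm(*(p for p, _ in pairs))
    blocked = [False] * (period + 1)
    for p, s in pairs:
        # (p - s) % p is the smallest a >= 0 that makes (s + a) divisible by p.
        for a in range((p - s) % p, period + 1, p):
            blocked[a] = True
    for a in range(start, period + 1):
        if not blocked[a]:
            return a
    return None


print(first_valid_offset([(5, 4), (5, 1), (2, 0)]))
```

On the example input (5,4), (5,1), (2,0) the first unblocked index is 3, as in the walkthrough above. Memory use is proportional to the LCM, which is the time-space tradeoff the answer mentions.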

Maximum continuous achievable number

The problem
Definitions
Let's define a natural number N as a writable number (WN) for number set U in the M numeral system if it can be written in this numeral system from members of U, using each member no more than once. More strict definition of 'written': - here CONCAT means concatenation.
Let's define a natural number N as a continuous achievable number (CAN) for number set U in the M numeral system if it is a WN-number for U and M and N-1 is also a CAN-number for U and M (an equivalent definition: N is CAN for U and M if all numbers 0 .. N are WN for U and M). More strict:
Issue
Suppose we have a set U of S natural numbers (we are treating zero as a natural number) and a natural number M, M>1. The problem is to find the maximum CAN (MCAN) for the given U and M. The set U may contain duplicates, but each duplicate can be used no more than once, of course (i.e. if U contains {x, y, y, z}, then each y can be used 0 or 1 times, so y can be used 0..2 times in total). U is also expected to be valid in the M-numeral system (i.e. no member can contain the symbols 8 or 9 if M=8). And, of course, members of U are numbers, not symbols for M (so 11 is valid for M=10); otherwise the problem would be trivial.
My approach
I have in mind a simple algorithm now, which is simply checking if current number is CAN via:
Check if 0 is WN for given U and M? Go to 2: We're done, MCAN is null
Check if 1 is WN for given U and M? Go to 3: We're done, MCAN is 0
...
So, this algorithm tries to build the whole sequence. I doubt this part can be improved, but maybe it can? Now, how do we check whether a number is a WN? This is also a kind of 'substitution brute force'. I have an implementation of that for M=10 (in fact, since we're dealing with strings, any other M is not a problem) as a PHP function:
//$mNumber is our N, $rgNumbers is our U
function isWriteable($mNumber, $rgNumbers)
{
    if(in_array((string)$mNumber, $rgNumbers=array_map('strval', $rgNumbers), true))
    {
        return true;
    }
    for($i=1; $i<=strlen((string)$mNumber); $i++)
    {
        foreach($rgKeys = array_keys(array_filter($rgNumbers, function($sX) use ($mNumber, $i)
        {
            return $sX==substr((string)$mNumber, 0, $i);
        })) as $iKey)
        {
            $rgTemp = $rgNumbers;
            unset($rgTemp[$iKey]);
            if(isWriteable(substr((string)$mNumber, $i), $rgTemp))
            {
                return true;
            }
        }
    }
    return false;
}
So we try one piece and then check recursively whether the rest can be written. If it cannot be written, we try the next member of U. I think this is a point which can be improved.
Specifics
As you see, the algorithm tries to build all numbers before N and check whether they are WN. But the only goal is to find the MCAN, so the questions are:
Maybe a constructive algorithm is excessive here? And, if yes, what other options could be used?
Is there a quicker way to determine if a number is WN for given U and M? (This point may be moot if the previous point has a positive answer and we do not build and check all numbers before N.)
Samples
U = {4, 1, 5, 2, 0}
M = 10
then MCAN = 2 (3 couldn't be reached)
U = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11}
M = 10
then MCAN = 21 (all before could be reached, for 22 there are no two 2 symbols total).
Hash the digit count for digits from 0 to m-1. Hash the numbers greater than m that are composed of one repeated digit.
MCAN is bound by the smallest digit for which all combinations of that digit for a given digit count cannot be constructed (e.g., X000,X00X,X0XX,XX0X,XXX0,XXXX), or (digit count - 1) in the case of zero (for example, for all combinations of four digits, combinations are needed for only three zeros; for a zero count of zero, MCAN is null). Digit counts are evaluated in ascending order.
Examples:
1. MCAN (10, {4, 1, 5, 2, 0})
3 is the smallest digit for which a digit-count of one cannot be constructed.
MCAN = 2
2. MCAN (10, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11})
2 is the smallest digit for which a digit-count of two cannot be constructed.
MCAN = 21
3. (from Alma Do Mundo's comment below) MCAN (2, {0,0,0,1,1,1})
1 is the smallest digit for which all combinations for a digit-count of four
cannot be constructed.
MCAN = 1110
4. (example from No One in Particular's answer)
MCAN (2, {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1111,11111111})
1 is the smallest digit for which all combinations for a digit-count of five
cannot be constructed.
MCAN = 10101
The recursion steps I've made are:
If the digit string is available in your alphabet, mark it used and return immediately
If the digit string is of length 1, return failure
Split the string in two and try each part
This is my code:
$u = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11];
echo ncan($u), "\n"; // 21
// the functions
// satisfy() returns the remaining counts on success, or false on failure
function satisfy($n, array $u)
{
    $n = (string)$n;
    if (!empty($u[$n])) { // step 1
        --$u[$n];
        return $u;
    } elseif (strlen($n) == 1) { // step 2
        return false;
    }
    // step 3
    for ($i = 1; $i < strlen($n); ++$i) {
        $u2 = satisfy(substr($n, 0, $i), $u);
        if ($u2 === false) {
            continue;
        }
        // pass the counts left over from the first part on to the second
        $u3 = satisfy(substr($n, $i), $u2);
        if ($u3 !== false) {
            return $u3;
        }
    }
    return false;
}
function is_can($n, $u)
{
    return satisfy($n, $u) !== false;
}
function ncan($u)
{
    // count how many times each member occurs in $u
    $umap = array_reduce($u, function($result, $item) {
        $result[$item] = ($result[$item] ?? 0) + 1;
        return $result;
    }, []);
    $i = -1;
    while (is_can($i + 1, $umap)) {
        ++$i;
    }
    return $i;
}
Here is another approach:
1) Order the set U with regards to the usual numerical ordering for base M.
2) If there is a symbol between 0 and (M-1) which is missing, then that is the first number which is NOT MCAN.
3) Find the first symbol which has the least number of entries in the set U. From this we have an upper bound on the first number which is NOT MCAN. That number would be {xxxx} N times. For example, if M = 4 and U = { 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3}, then the number 333 is not MCAN. This gives us our upper bound.
4) So, if the first element of the set U which has the smallest number of occurrences is x and it has C occurrences, then we can clearly represent any number with C digits. (Since every element has at least C entries).
5) Now we ask if there is any number less than (C+1)x which can't be MCAN? Well, any (C+1) digit number can have either (C+1) of the same symbol or only at most (C) of the same symbol. Since x is minimal from step 3, (C+1)y for y < x can be done and (C)a + b can be done for any distinct a, b since they have (C) copies at least.
The above method works for set elements of only 1 symbol. However, we now see that it becomes more complex if multi-symbol elements are allowed. Consider the following case:
U = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1111,11111111}
Define c(A,B) = the number of 'A' symbols of 'B' length.
So for our example, c(0,1) = 15, c(0,2) = 0, c(0,3) = 0, c(0,4) = 0, ...
c(1,1) = 3, c(1,2) = 0, c(1,3) = 0, c(1,4) = 1, c(1,5) = 0, ..., c(1,8) = 1
The shortest 0 string we can't do has length 16. The shortest 1 string we can't do also has length 16.
1 = 1
11 = 1+1
111 = 1+1+1
1111 = 1111
11111 = 1+1111
111111 = 1+1+1111
1111111 = 1+1+1+1111
11111111 = 11111111
111111111 = 1+11111111
1111111111 = 1+1+11111111
11111111111 = 1+1+1+11111111
111111111111 = 1111+11111111
1111111111111 = 1+1111+11111111
11111111111111 = 1+1+1111+11111111
111111111111111 = 1+1+1+1111+11111111
But can we make the string 11111101111? We can't because the last 1 string (1111) needs the only set of 1's with the 4 in a row. Once we take that, we can't make the first 1 string (111111) because we only have an 8 (which is too big) or 3 1-lengths which are too small.
So for multi-symbols, we need another approach.
We know from sorting and ordering our strings what is the minimum length we can't do for a given symbol. (In the example above, it would be 16 zeros or 16 ones.) So this is our upper bound for an answer.
What we have to do now is start at 1 and count up in base M. For each number we write it in base M and then determine if we can make it from our set U. We do this by using the same approach used in the coin change problem: dynamic programming. (See for example http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/ for the algorithm.) The only difference is that in our case we only have a finite number of each element, not an infinite supply.
Instead of subtracting the amount we are using like in the coin change problem, we strip the matching symbol off of the front of the string we are trying to match. (This is the opposite of our addition - concatenation.)
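The prefix-stripping idea can be sketched in Python. This is an illustrative sketch rather than the dynamic program the answer refers to: it uses plain backtracking over a multiset, which is exponential in the worst case. The names `to_base`, `is_writable` and `mcan` are hypothetical, members of U are assumed to be given as non-empty digit strings already written in base M, and None stands in for the "MCAN is null" case.

```python
from collections import Counter

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n, base):
    """Render a non-negative integer as a digit string in the given base."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))


def is_writable(target, pool):
    """True if target is a concatenation of members of pool, each used at most count times."""
    if not target:
        return True
    for piece in list(pool):
        if pool[piece] > 0 and target.startswith(piece):
            pool[piece] -= 1  # use this member for the front of the string
            found = is_writable(target[len(piece):], pool)
            pool[piece] += 1  # put it back before trying anything else
            if found:
                return True
    return False


def mcan(members, base=10):
    pool = Counter(members)
    n = 0
    while is_writable(to_base(n, base), pool):
        n += 1
    return n - 1 if n > 0 else None


print(mcan(["4", "1", "5", "2", "0"]))
print(mcan(["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "11"]))
```

On the two samples from the question this yields 2 and 21.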

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of values whose sum is 20 so that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly 20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?
There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that, comparing every value in B with its ideal value in parentheses, the difference is never more than 1.
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.
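A minimal Python sketch of the steps above. The function name `apportion` is made up, the inputs are assumed to be non-negative integers, and rounding is done half-up in integer arithmetic to avoid floating-point and round-half-to-even surprises.

```python
def apportion(values, target):
    """Proportional integer shares of target, rebalancing after each element."""
    remaining_total = sum(values)   # T in the description above
    remaining_target = target       # S in the description above
    result = []
    for v in values:
        if remaining_total == 0:
            share = 0  # only zeros are left, nothing more to hand out
        else:
            # round(v / T * S), rounding halves up, using integers only
            share = (2 * v * remaining_target + remaining_total) // (2 * remaining_total)
        result.append(share)
        remaining_total -= v
        remaining_target -= share
    return result


print(apportion([400, 400, 0, 0, 100, 50, 50], 20))
print(apportion([3, 3, 3, 3, 3, 3, 18], 20))
```

The second call reproduces the walkthrough: [2, 2, 2, 2, 2, 1, 9].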
Your problem is similar to proportional representation, where you want to share N seats (in your case 20) among parties proportionally to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18].
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats,votes):
    """assign n seats proportionally to votes using Hagenbach-Bischoff quota
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota=sum(votes)/(1.+nseats) #force float
    frac=[vote/quota for vote in votes]
    res=[int(f) for f in frac]
    n=nseats-sum(res) #number of seats remaining to allocate
    if n==0: return res #done
    if n<0: return [min(x,nseats) for x in res] # see siamii's comment
    #give the remaining seats to the n parties with the largest remainder
    remainders=[ai-bi for ai,bi in zip(frac,res)]
    limit=sorted(remainders,reverse=True)[n-1]
    #n parties with remainder larger than limit get an extra seat
    for i,r in enumerate(remainders):
        if r>=limit:
            res[i]+=1
            n-=1 # attempt to handle perfect equality
            if n==0: return res #done
    raise RuntimeError("seats left unallocated") #should never happen
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.
A naïve solution that doesn't perform well, but will provide the right result...
Write a function that, given an array of integers (candidate) of the same length as the input array, outputs the index of the element that is farthest away from being proportional to the others (pseudocode):
function next_index(candidate, input)
    n = length(input)
    // Calculate weights (treat w[i] as infinity when input[i] is 0)
    for i in 1 .. n
        w[i] = candidate[i] / input[i]
    end for
    // find the smallest weight
    min = infinity
    min_index = 1
    for i in 1 .. n
        if w[i] < min then
            min = w[i]
            min_index = i
        end if
    end for
    return min_index
end function
Then just do this
result = array of length(input) zeros
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
    result[next_index(result, input)]++
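The round-down-then-fill variant might look like this in Python. It is a sketch under stated assumptions: the name `fill_by_smallest_weight` is hypothetical, the input must contain at least one positive value, zero entries never receive anything, and ties go to the earliest index (so the result skews towards the beginning of the array, as noted above). `Fraction` keeps the weight comparison exact.

```python
from fractions import Fraction


def fill_by_smallest_weight(values, target):
    """Round every share down, then hand out the shortfall one unit at a time."""
    total = sum(values)
    result = [v * target // total for v in values]
    live = [i for i, v in enumerate(values) if v > 0]  # skip zero inputs
    while sum(result) < target:
        # The element with the smallest result/input ratio is the most under-served.
        i = min(live, key=lambda k: Fraction(result[k], values[k]))
        result[i] += 1
    return result


print(fill_by_smallest_weight([400, 400, 0, 0, 100, 50, 50], 20))
print(fill_by_smallest_weight([3, 3, 3, 3, 3, 3, 18], 20))
```

For [3, 3, 3, 3, 3, 3, 18] the rounded-down start is [1, 1, 1, 1, 1, 1, 10], and the four missing units go to the first four elements.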
So the answers and comments above were helpful... particularly the decreasing sum comment from @Frederik.
The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I multiply by 20 and divide by the sum. I keep the quotient, and accumulate the remainder. Whenever the accumulator reaches sum(v), I add one to the value. That way I'm guaranteed that all the remainders get rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
    # set up by getting the sum of the values and starting
    # with an empty result list and accumulator
    sum_values = sum(values)
    new_values = []
    acc = 0
    for v in values:
        # for each value, find quotient and remainder
        q, r = divmod(v * total, sum_values)
        if acc + r < sum_values:
            # if the accumulator plus remainder is too small, just add and move on
            acc += r
        else:
            # we've accumulated enough to reach sum(values), so add 1 to result
            if acc > r:
                # add to previous
                new_values[-1] += 1
            else:
                # add to current
                q += 1
            acc -= sum_values - r
        # save the new value
        new_values.append(q)
    # accumulator is guaranteed to be zero at the end
    print(new_values, sum_values, acc)
    return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

Resources