Rearrange list to satisfy a condition

Rearrange list to satisfy a condition - algorithm

I was asked this during a coding interview but wasn't able to solve this. Any pointers would be very helpful.
I was given an integer list (think of it as a number line) which needs to be rearranged so that the difference between elements is equal to M (an integer which is given). The list needs to be rearranged in such a way that the value of the max absolute difference between the elements' new positions and the original positions needs to be minimized. Eventually, this value multiplied by 2 is returned.
Test cases:
//1.
original_list = [1, 2, 3, 4]
M = 2
rearranged_list = [-0.5, 1.5, 3.5, 5.5]
// difference in values of original and rearranged lists
diff = [1.5, 0.5, 0.5, 1.5]
max_of_diff = 1.5 // list is rearranged in such a way so that this value is minimized
return_val = 1.5 * 2 = 3
//2.
original_list = [1, 2, 4, 3]
M = 2
rearranged_list = [-1, 1, 3, 5]
// difference in values of original and rearranged lists
diff = [2, 1, 1, 2]
max_of_diff = 2 // list is rearranged in such a way so that this value is minimized
return_val = 2 * 2 = 4
Constraints:
1 <= list_length <= 10^5
1 <= M <= 10^4
-10^9 <= list[i] <= 10^9
There's a question on leetcode which is very similar to this: https://leetcode.com/problems/minimize-deviation-in-array/ but there, the operations that are performed on the array are mentioned while that's not been mentioned here. I'm really stumped.

Here is how you can think of it:
The "rearanged" list is like a straight line that has a slope that corresponds to M.
Here is a visualisation for the first example:
The black dots are the input values [1, 2, 3, 4] where the index of the array is the X-coordinate, and the actual value at that index, the Y-coordinate.
The green line is determined by M. Initially this line runs through the origin at (0, 0). The red line segments represent the differences that must be taken into account.
Now the green line has to move vertically to its optimal position. We can see that we only need to look at the difference it makes with the first and with the last point. The other two inputs will never contribute to an extreme. This is generally true: there are only two input elements that need to be taken into account. They are the points that make the greatest (signed -- not absolute) difference and the least difference.
We can see that we need to move the green line in such a way that the signed differences with these two extremes are each others opposite: i.e. their absolute difference becomes the same, but the sign will be opposite.
Twice this absolute difference is what we need to return, and it is actually the difference between the greatest (signed) difference and the least (signed) difference.
So, in conclusion, we must generate the values on the green line, find the least and greatest (signed) difference with the data points (Y-coordinates) and return the difference between those two.
Here is an implementation in JavaScript running the two examples you provided:
function solve(y, slope) {
let low = Infinity;
let high = -Infinity;
for (let x = 0; x < y.length; x++) {
let dy = y[x] - x * slope;
low = Math.min(low, dy);
high = Math.max(high, dy);
}
return high - low;
}
console.log(solve([1, 2, 3, 4], 2)); // 3
console.log(solve([1, 2, 4, 3], 2)); // 4

Related

Need an algorithm to "evenly" iterate over all possible combinations of a set of values

sorry for the horrible title, I am really struggling to find the right words for what I am looking for.
I think what I want to do is actually quite simple, but I still can't really wrap my head around creating algorithms. I bet I could have easily found a solution on the web if I wasn't lacking basic knowledge of algorithm terminology.
Let's assume I want to iterate over all combinations of an array of five integers, where each integer is a number between zero and nine. Naturally, I could just increment from 0 to 99999. [0, 0, 0, 0, 1], [0, 0, 0, 0, 2], ... [9, 9, 9, 9, 9].
However, I need to "evenly" (don't really know how to call it) increment the individual elements. Ideally, the sequence of arrays that is produced by the algorithm should look something like this:
[0,0,0,0,0] [1,0,0,0,0] [0,1,0,0,0] [0,0,1,0,0]
[0,0,0,1,0] [0,0,0,0,1] [1,1,0,0,0] [1,0,1,0,0]
[1,0,0,1,0] [1,0,0,0,1] [1,1,0,1,0] [1,1,0,0,1]
[1,1,1,0,0] [1,1,1,1,0] [1,1,1,0,1] [1,1,1,1,1]
[2,0,0,0,0] [2,1,0,0,0] [2,0,1,0,0] [2,0,0,1,0]
[2,0,0,0,1] [2,1,1,0,0] [2,1,0,1,0] .....
I probably made a few mistake in the sequence above, but maybe you can guess what I am trying to approach. Don't introduce a number higher than 1 unless every possible combination of 0s and 1s has been determined, don't introduce a number higher than 2 unless every possible combination of 0s, 1s and 2s has been determined, and so on..
I would really appreciate someone pointing me in the right direction! Thanks a lot

You've already said that you can get the combinations you are looking for by enumerating all nk possible sequences, except that you don't get them in the desired order.
You could generate the sequences in the right order if you used an odometer-style enumerator. At first, all digits must be 0 or 1. When the odometer would wrap (after 1111...), you increment the set of the digits to [0, 1, 2]. Reset the sequence to 2000... and keep iterating, but only emit sequences that have at least one 2 in them, because you've already generated all sequences of 0's and 1's. Repeat until after wrapping you go beyond the maximum threshold.
Filtering out the duplicates that don't have the current top digit in them can be done by keeping track of the count of top numbers.
Here's an implementation in C with hard-enumed limits:
enum {
SIZE = 3,
TOP = 4
};
typedef struct Generator Generator;
struct Generator {
unsigned top; // current threshold
unsigned val[SIZE]; // sequence array
unsigned tops; // count of "top" values
};
/*
* "raw" generator backend which produces all sequences
* and keeps track of how many top numbers there are
*/
int gen_next_raw(Generator *gen)
{
int i = 0;
do {
if (gen->val[i] == gen->top) gen->tops--;
gen->val[i]++;
if (gen->val[i] == gen->top) gen->tops++;
if (gen->val[i] <= gen->top) return 1;
gen->val[i++] = 0;
} while (i < SIZE);
return 0;
}
/*
* actual generator, which filters out duplicates
* and increases the threshold if needed
*/
int gen_next(Generator *gen)
{
while (gen_next_raw(gen)) {
if (gen->tops) return 1;
}
gen->top++;
if (gen->top > TOP) return 0;
memset(gen->val, 0, sizeof(gen->val));
gen->val[0] = gen->top;
gen->tops = 1;
return 1;
}
The gen_next_raw function is the base implementation of the odometer with the addition of keeping a count of current top digits. The gen_next function uses it as backend. It filters out the duplicates and increases the threshold as needed. (All that can probably be done more efficiently.)
Generate the sequence with:
Generator gen = {0};
while (gen_next(&gen)) {
if (is_good(gen.val)) {
puts("Bingo!");
break;
}
}

You could break this down into two subproblems:
get all combinations with replacement of 0, 1, 2, ... for the given number of digits
get all (unique) permutations of those combinations
Your desired ordering is still different than the order those are typically generated in (e.g. (0,1,1) before (0,0,2), and (0,0,1) before (1,0,0)), but you can just collect all the combinations and all the permutations individually and sort them, at least requiring much less memory than for generating, collecting and sorting all those combinations.
Example in Python, using implementations of those functions from the itertools library; key=lambda c: c[::-1] sorts the lists in-order, but reversing the order of the individual elements to get your desired order:
from itertools import combinations_with_replacement, permutations
places = 3
max_digit = 3
all_combs = list(combinations_with_replacement(range(0, max_digit+1), r=places))
for comb in sorted(all_combs, key=lambda c: c[::-1]):
all_perms = set(permutations(comb))
for perm in sorted(all_perms, key=lambda c: c[::-1]):
print(perm)
And some selected output (64 elements in total)
(0, 0, 0)
(1, 0, 0)
(0, 1, 0)
...
(0, 1, 1)
(1, 1, 1)
(2, 0, 0)
(0, 2, 0)
...
(0, 1, 2)
(2, 1, 1)
...
(2, 2, 2)
(3, 0, 0)
(0, 3, 0)
...
(2, 3, 3)
(3, 3, 3)
For 27 places with values up to 27 that would still be too many combinations-with-replacement to generate and sort, so this part should be replaced with a custom algorithm.
keep track of how often each digit appears; start with all zeros
find the smallest digit that has a non-zero count, increment the count of the digit after that, and redistribute the remaining smaller counts back to the smallest digit (i.e. zero)
In Python:
def generate_combinations(places, max_digit):
# initially [places, 0, 0, ..., 0]
counts = [places] + [0] * max_digit
yield [i for i, c in enumerate(counts) for _ in range(c)]
while True:
# find lowest digit with a smaller digit with non-zero count
k = next(i for i, c in enumerate(counts) if c > 0) + 1
if k == max_digit + 1:
break
# add one more to that digit, and reset all below to start
counts[k] += 1
counts[0] = places - sum(counts[k:])
for i in range(1, k):
counts[i] = 0
yield [i for i, c in enumerate(counts) for _ in range(c)]
For the second part, we can still use a standard permutations generator, although for 27! that would be too many to collect in a set, but if you expect the result in the first few hundred combinations, you might just keep track of already seen permutations and skip those, and hope that you find the result before that set grows too large...
from itertools import permutations
for comb in generate_combinations(places=3, max_digit=3):
for p in set(permutations(comb)):
print(p)
print()

Number of ways to pick the elements of an array?

How to formulate this problem in code?
Problem Statement:
UPDATED:
Find the number of ways to pick the element from the array which are
not visited.
We starting from 1,2,.....,n with some (1<= x <= n) number of elements already picked/visited randomly which is given in the input.
Now, we need to find the number of ways we can pick rest of the (n - x) number of elements present in the array, and the way we pick an element is defined as:
On every turn, we can only pick the element which is adjacent(either left or right) to some visited element i.e
in an array of elements:
1,2,3,4,5,6 let's say we have visited 3 & 6 then we can now pick
2 or 4 or 5, as they are unvisited and adjacent to visited nodes, now say we pick 2, so now we can pick 1 or 4 or 5 and continues.
example:
input: N = 6(number of elements: 1, 2, 3, 4, 5, 6)
M = 2(number of visited elements)
visited elements are = 1, 5
Output: 16(number of ways we can pick the unvisited elements)
ways: 4, 6, 2, 3
4, 6, 3, 2
4, 2, 3, 6
4, 2, 6, 3
4, 3, 2, 6
4, 3, 6, 2
6, 4, 2, 3
6, 4, 2, 3
6, 2, 3, 4
6, 2, 4, 3
2, 6, 4, 3
2, 6, 3, 4
2, 4, 6, 3
2, 4, 3, 6
2, 3, 4, 6
2, 3, 6, 4.

Some analysis of the problem:
The actual values in the input array are assumed to be 1...n, but these values do not really play a role. These values just represent indexes that are referenced by the other input array, which lists the visited indexes (1-based)
The list of visited indexes actually cuts the main array into subarrays with smaller sizes. So for example, when n=6 and visited=[1,5], then the original array [1,2,3,4,5,6] is cut into [2,3,4] and [6]. So it cuts it into sizes 3 and 1. At this point the index numbering loses its purpose, so the problem really is fully described with those two sizes: 3 and 1. To illustrate, the solution for (n=6, visited=[1,5]) is necessarily the same as for (n=7, visited[1,2,6]): the sizes into which the original array is cut, are the same in both cases (in a different order, but that doesn't influence the result).
Algorithm, based on a list of sizes of subarrays (see above):
The number of ways that one such subarray can be visited, is not that difficult: if the subarray's size is 1, there is just one way. If it is greater, then at each pick, there are two possibilities: either you pick from the left side or from the right side. So you get like 2*2*..*2*1 possibilities to pick. This is 2size-1 possibilities.
The two outer subarrays are an exception to this, as you can only pick items from the inside-out, so for those the number of ways to visit such a subarray is just 1.
The number of ways that you can pick items from two subarrays can be determined as follows: count the number of ways to pick from just one of those subarrays, and the number of ways to pick from the other one. Then consider that you can alternate when to pick from one sub array or from the other. This comes down to interweaving the two sub arrays. Let's say the larger of the two sub arrays has j elements, and the smaller k, then consider there are j+1 positions where an element from the smaller sub array can be injected (merged) into the larger array. There are "k multichoose j+1" ways ways to inject all elements from the smaller sub array.
When you have counted the number of ways to merge two subarrays, you actually have an array with a size that is the sum of those two sizes. The above logic can then be applied with this array and the next subarray in the problem specification. The number of ways just multiplies as you merge more subarrays into this growing array. Of course, you don't really deal with the arrays, just with sizes.
Here is an implementation in JavaScript, which applies the above algorithm:
function getSubArraySizes(n, visited) {
// Translate the problem into a set of sizes (of subarrays)
let j = 0;
let sizes = [];
for (let i of visited) {
let size = i - j - 1;
if (size > 0) sizes.push(size);
j = i;
}
let size = n - j;
if (size > 0) sizes.push(size);
return sizes;
}
function Combi(n, k) {
// Count combinations: "from n, take k"
// See Wikipedia on "Combination"
let c = 1;
let end = Math.min(k, n - k);
for (let i = 0; i < end; i++) {
c = c * (n-i) / (end-i); // This is floating point
}
return c; // ... but result is integer
}
function getPickCount(sizes) {
// Main function, based on a list of sizes of subarrays
let count = 0;
let result = 1;
for (let i = 0; i < sizes.length; i++) {
let size = sizes[i];
// Number of ways to take items from this chunk:
// - when items can only be taken from one side: 1
// - otherwise: every time we have a choice between 2, except for the last remaining item
let pickCount = i == 0 || i == sizes.length-1 ? 1 : 2 ** (size-1);
// Number of ways to merge/weave two arrays, where relative order of elements is not changed
// = a "k multichoice from n". See
// https://en.wikipedia.org/wiki/Combination#Number_of_combinations_with_repetition
let weaveCount = count == 0 ? 1 // First time only
: Combi(size+count, Math.min(count, size));
// Number of possibilities:
result *= pickCount * weaveCount;
// Update the size to be the size of the merged/woven array
count += size;
}
return result;
}
// Demo with the example input (n = 6, visited = 1 and 5)
let result = getPickCount(getSubArraySizes(6, [1, 5]));
console.log(result);

Minimum common remainder of division

I have n pairs of numbers: ( p[1], s[1] ), ( p[2], s[2] ), ... , ( p[n], s[n] )
Where p[i] is integer greater than 1; s[i] is integer : 0 <= s[i] < p[i]
Is there any way to determine minimum positive integer a , such that for each pair :
( s[i] + a ) mod p[i] != 0
Anything better than brute force ?

It is possible to do better than brute force. Brute force would be O(A·n), where A is the minimum valid value for a that we are looking for.
The approach described below uses a min-heap and achieves O(n·log(n) + A·log(n)) time complexity.
First, notice that replacing a with a value of the form (p[i] - s[i]) + k * p[i] leads to a reminder equal to zero in the ith pair, for any positive integer k. Thus, the numbers of that form are invalid a values (the solution that we are looking for is different from all of them).
The proposed algorithm is an efficient way to generate the numbers of that form (for all i and k), i.e. the invalid values for a, in increasing order. As soon as the current value differs from the previous one by more than 1, it means that there was a valid a in-between.
The pseudocode below details this approach.
1. construct a min-heap from all the following pairs (p[i] - s[i], p[i]),
where the heap comparator is based on the first element of the pairs.
2. a0 = -1; maxA = lcm(p[i])
3. Repeat
3a. Retrieve and remove the root of the heap, (a, p[i]).
3b. If a - a0 > 1 then the result is a0 + 1. Exit.
3c. if a is at least maxA, then no solution exists. Exit.
3d. Insert into the heap the value (a + p[i], p[i]).
3e. a0 = a
Remark: it is possible for such an a to not exist. If a valid a is not found below LCM(p[1], p[2], ... p[n]), then it is guaranteed that no valid a exists.
I'll show below an example of how this algorithm works.
Consider the following (p, s) pairs: { (2, 1), (5, 3) }.
The first pair indicates that a should avoid values like 1, 3, 5, 7, ..., whereas the second pair indicates that we should avoid values like 2, 7, 12, 17, ... .
The min-heap initially contains the first element of each sequence (step 1 of the pseudocode) -- shown in bold below:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
We retrieve and remove the head of the heap, i.e., the minimum value among the two bold ones, and this is 1. We add into the heap the next element from that sequence, thus the heap now contains the elements 2 and 3:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
We again retrieve the head of the heap, this time it contains the value 2, and add the next element of that sequence into the heap:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
The algorithm continues, we will next retrieve value 3, and add 5 into the heap:
1, 3, 5, 7, ...
2, 7, 12, 17, ...
Finally, now we retrieve value 5. At this point we realize that the value 4 is not among the invalid values for a, thus that is the solution that we are looking for.

I can think of two different solutions. First:
p_max = lcm (p[0],p[1],...,p[n]) - 1;
for a = 0 to p_max:
zero_found = false;
for i = 0 to n:
if ( s[i] + a ) mod p[i] == 0:
zero_found = true;
break;
if !zero_found:
return a;
return -1;
I suppose this is the one you call "brute force". Notice that p_max represents Least Common Multiple of p[i]s - 1 (solution is either in the closed interval [0, p_max], or it does not exist). Complexity of this solution is O(n * p_max) in the worst case (plus the running time for calculating lcm!). There is a better solution regarding the time complexity, but it uses an additional binary array - classical time-space tradeoff. Its idea is similar to the Sieve of Eratosthenes, but for remainders instead of primes :)
p_max = lcm (p[0],p[1],...,p[n]) - 1;
int remainders[p_max + 1] = {0};
for i = 0 to n:
int rem = s[i] - p[i];
while rem >= -p_max:
remainders[-rem] = 1;
rem -= p[i];
for i = 0 to n:
if !remainders[i]:
return i;
return -1;
Explanation of the algorithm: first, we create an array remainders that will indicate whether certain negative remainder exists in the whole set. What is a negative remainder? It's simple, notice that 6 = 2 mod 4 is equivalent to 6 = -2 mod 4. If remainders[i] == 1, it means that if we add i to one of the s[j], we will get p[j] (which is 0, and that is what we want to avoid). Array is populated with all possible negative remainders, up to -p_max. Now all we have to do is search for the first i, such that remainder[i] == 0 and return it, if it exists - notice that the solution does not have to exists. In the problem text, you have indicated that you are searching for the minimum positive integer, I don't see why zero would not fit (if all s[i] are positive). However, if that is a strong requirement, just change the for loop to start from 1 instead of 0, and increment p_max.
The complexity of this algorithm is n + sum (p_max / p[i]) = n + p_max * sum (1 / p[i]), where i goes from to 0 to n. Since all p[i]s are at least 2, that is asymptotically better than the brute force solution.
An example for better understanding: suppose that the input is (5,4), (5,1), (2,0). p_max is lcm(5,5,2) - 1 = 10 - 1 = 9, so we create array with 10 elements, initially filled with zeros. Now let's proceed pair by pair:
from the first pair, we have remainders[1] = 1 and remainders[6] = 1
second pair gives remainders[4] = 1 and remainders[9] = 1
last pair gives remainders[0] = 1, remainders[2] = 1, remainders[4] = 1, remainders[6] = 1 and remainders[8] = 1.
Therefore, first index with zero value in the array is 3, which is a desired solution.

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of values who's sum is 20 so that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly
20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?

There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that comparing every value in B with it's ideal value in parentheses, the difference is never more than 1.
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.

Your problem is similar to a proportional representation where you want to share N seats (in your case 20) among parties proportionnaly to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18]
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats,votes):
"""assign n seats proportionaly to votes using Hagenbach-Bischoff quota
:param nseats: int number of seats to assign
:param votes: iterable of int or float weighting each party
:result: list of ints seats allocated to each party
"""
quota=sum(votes)/(1.+nseats) #force float
frac=[vote/quota for vote in votes]
res=[int(f) for f in frac]
n=nseats-sum(res) #number of seats remaining to allocate
if n==0: return res #done
if n<0: return [min(x,nseats) for x in res] # see siamii's comment
#give the remaining seats to the n parties with the largest remainder
remainders=[ai-bi for ai,bi in zip(frac,res)]
limit=sorted(remainders,reverse=True)[n-1]
#n parties with remainter larger than limit get an extra seat
for i,r in enumerate(remainders):
if r>=limit:
res[i]+=1
n-=1 # attempt to handle perfect equality
if n==0: return res #done
raise #should never happen
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]

You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.

A naïve solution that doesn't perform well, but will provide the right result...
Write an iterator that given an array with eight integers (candidate) and the input array, output the index of the element that is farthest away from being proportional to the others (pseudocode):
function next_index(candidate, input)
// Calculate weights
for i in 1 .. 8
w[i] = candidate[i] / input[i]
end for
// find the smallest weight
min = 0
min_index = 0
for i in 1 .. 8
if w[i] < min then
min = w[i]
min_index = i
end if
end for
return min_index
end function
Then just do this
result = [0, 0, 0, 0, 0, 0, 0, 0]
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
result[next_index(result, input)]++

So the answers and comments above were helpful... particularly the decreasing sum comment from #Frederik.
The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I mulitply by 20 and divide by the sum. I keep the quotient, and accumulate the remainder. Whenever the accumulator is greater than sum(v), I add one to the value. That way I'm guaranteed that all the remainders get rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
# set up by getting the sum of the values and starting
# with an empty result list and accumulator
sum_values = sum(values)
new_values = []
acc = 0
for v in values:
# for each value, find quotient and remainder
q, r = divmod(v * total, sum_values)
if acc + r < sum_values:
# if the accumlator plus remainder is too small, just add and move on
acc += r
else:
# we've accumulated enough to go over sum(values), so add 1 to result
if acc > r:
# add to previous
new_values[-1] += 1
else:
# add to current
q += 1
acc -= sum_values - r
# save the new value
new_values.append(q)
# accumulator is guaranteed to be zero at the end
print new_values, sum_values, acc
return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

How to round floats to integers while preserving their sum?

Let's say I have an array of floating point numbers, in sorted (let's say ascending) order, whose sum is known to be an integer N. I want to "round" these numbers to integers while leaving their sum unchanged. In other words, I'm looking for an algorithm that converts the array of floating-point numbers (call it fn) to an array of integers (call it in) such that:
the two arrays have the same length
the sum of the array of integers is N
the difference between each floating-point number fn[i] and its corresponding integer in[i] is less than 1 (or equal to 1 if you really must)
given that the floats are in sorted order (fn[i] <= fn[i+1]), the integers will also be in sorted order (in[i] <= in[i+1])
Given that those four conditions are satisfied, an algorithm that minimizes the rounding variance (sum((in[i] - fn[i])^2)) is preferable, but it's not a big deal.
Examples:
[0.02, 0.03, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14]
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0.1, 0.3, 0.4, 0.4, 0.8]
=> [0, 0, 0, 1, 1]
[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0.4, 0.4, 0.4, 0.4, 9.2, 9.2]
=> [0, 0, 1, 1, 9, 9] is preferable
=> [0, 0, 0, 0, 10, 10] is acceptable
[0.5, 0.5, 11]
=> [0, 1, 11] is fine
=> [0, 0, 12] is technically not allowed but I'd take it in a pinch
To answer some excellent questions raised in the comments:
Repeated elements are allowed in both arrays (although I would also be interested to hear about algorithms that work only if the array of floats does not include repeats)
There is no single correct answer - for a given input array of floats, there are generally multiple arrays of ints that satisfy the four conditions.
The application I had in mind was - and this is kind of odd - distributing points to the top finishers in a game of MarioKart ;-) Never actually played the game myself, but while watching someone else I noticed that there were 24 points distributed among the top 4 finishers, and I wondered how it might be possible to distribute the points according to finishing time (so if someone finishes with a large lead they get a larger share of the points). The game tracks point totals as integers, hence the need for this kind of rounding.
For the curious, here is the test script I used to identify which algorithms worked.

One option you could try is "cascade rounding".
For this algorithm you keep track of two running totals: one of floating point numbers so far, and one of the integers so far.
To get the next integer you add the next fp number to your running total, round the running total, then subtract the integer running total from the rounded running total:-
number running total integer integer running total
1.3 1.3 1 1
1.7 3.0 2 3
1.9 4.9 2 5
2.2 8.1 3 8
2.8 10.9 3 11
3.1 14.0 3 14

Here is one algorithm which should accomplish the task. The main difference to other algorithms is that this one rounds the numbers in correct order always. Minimizing roundoff error.
The language is some pseudo language which probably derived from JavaScript or Lua. Should explain the point. Note the one based indexing (which is nicer with x to y for loops. :p)
// Temp array with same length as fn.
tempArr = Array(fn.length)
// Calculate the expected sum.
arraySum = sum(fn)
lowerSum = 0
-- Populate temp array.
for i = 1 to fn.lengthf
tempArr[i] = { result: floor(fn[i]), // Lower bound
difference: fn[i] - floor(fn[i]), // Roundoff error
index: i } // Original index
// Calculate the lower sum
lowerSum = lowerSum + tempArr[i].result
end for
// Sort the temp array on the roundoff error
sort(tempArr, "difference")
// Now arraySum - lowerSum gives us the difference between sums of these
// arrays. tempArr is ordered in such a way that the numbers closest to the
// next one are at the top.
difference = arraySum - lowerSum
// Add 1 to those most likely to round up to the next number so that
// the difference is nullified.
for i = (tempArr.length - difference + 1) to tempArr.length
tempArr.result = tempArr.result + 1
end for
// Optionally sort the array based on the original index.
array(sort, "index")

One really easy way is to take all the fractional parts and sum them up. That number by the definition of your problem must be a whole number. Distribute that whole number evenly starting with the largest of your numbers. Then give one to the second largest number... etc. until you run out of things to distribute.
Note this is pseudocode... and may be off by one in an index... its late and I am sleepy.
float accumulator = 0;
for (i = 0; i < num_elements; i++) /* assumes 0 based array */
{
accumulator += (fn[i] - floor(fn[i]));
fn[i] = (fn[i] - floor(fn[i]);
}
i = num_elements;
while ((accumulator > 0) && (i>=0))
{
fn[i-1] += 1; /* assumes 0 based array */
accumulator -= 1;
i--;
}
Update: There are other methods of distributing the accumulated values based on how much truncation was performed on each value. This would require keeping a seperate list called loss[i] = fn[i] - floor(fn[i]). You can then repeat over the fn[i] list and give 1 to the greatest loss item repeatedly (setting the loss[i] to 0 afterwards). Its complicated but I guess it works.

How about:
a) start: array is [0.1, 0.2, 0.4, 0.5, 0.8], N=3, presuming it's sorted
b) round them all the usual way: array is [0 0 0 1 1]
c) get the sum of the new array and subtract it from N to get the remainder.
d) while remainder>0, iterate through elements, going from the last one
- check if the new value would break rule 3.
- if not, add 1
e) in case that remainder<0, iterate from first one to the last one
- check if the new value would break rule 3.
- if not, subtract 1

Essentially what you'd do is distribute the leftovers after rounding to the most likely candidates.
Round the floats as you normally would, but keep track of the delta from rounding and associated index into fn and in.
Sort the second array by delta.
While sum(in) < N, work forwards from the largest negative delta, incrementing the rounded value (making sure you still satisfy rule #3).
Or, while sum(in) > N, work backwards from the largest positive delta, decrementing the rounded value (making sure you still satisfy rule #3).
Example:
[0.02, 0.03, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14] N=1
1. [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] sum=0
and [[-0.02, 0], [-0.03, 1], [-0.05, 2], [-0.06, 3], [-0.07, 4], [-0.08, 5],
[-0.09, 6], [-0.1, 7], [-0.11, 8], [-0.12, 9], [-0.13, 10], [-0.14, 11]]
2. sorting will reverse the array
3. working from the largest negative remainder, you get [-0.14, 11].
Increment `in[11]` and you get [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1] sum=1
Done.

Can you try something like this?
in [i] = fn [i] - int (fn [i]);
fn_res [i] = fn [i] - in [i];
fn_res → is the resultant fraction.
(I thought this was basic ...), Are we missing something?

Well, 4 is the pain point. Otherwise you could do things like "usually round down and accumulate leftover; round up when accumulator >= 1". (edit: actually, that might still be OK as long as you swapped their position?)
There might be a way to do it with linear programming? (that's maths "programming", not computer programming - you'd need some maths to find the feasible solution, although you could probably skip the usual "optimisation" part).
As an example of the linear programming - with the example [1.3, 1.7, 1.9, 2.2, 2.8, 3.1] you could have the rules:
1 <= i < 2
1 <= j < 2
1 <= k < 2
2 <= l < 3
3 <= m < 4
i <= j <= k <= l <= m
i + j + k + l + m = 13
Then apply some linear/matrix algebra ;-p Hint: there are products to do the above based on things like the "Simplex" algorithm. Common university fodder, too (I wrote one at uni for my final project).

The problem, as I see it, is that the sorting algorithm is not specified. Or more like - whether it's a stable sort or not.
Consider the following array of floats:
[ 0.2 0.2 0.2 0.2 0.2 ]
The sum is 1. The integer array then should be:
[ 0 0 0 0 1 ]
However, if the sorting algorithm isn't stable, it could sort the "1" somewhere else in the array...

Make the summed diffs are to be under 1, and check to be sorted.
some like,
while(i < sizeof(fn) / sizeof(float)) {
res += fn[i] - floor(fn[i]);
if (res >= 1) {
res--;
in[i] = ceil(fn[i]);
}
else
in[i] = floor(fn[i]);
if (in[i-1] > in[i])
swap(in[i-1], in[i++]);
}
(it's paper code, so i didn't check the validity.)

Below a python and numpy implementation of #mikko-rantanen 's code. It took me a bit to put this together, so this may be helpful to future Googlers despite the age of the topic.
import numpy as np
from math import floor
original_array = np.array([1.2, 1.5, 1.4, 1.3, 1.7, 1.9])
# Calculate length of original array
# Need to substract 1, as indecies start at 0, but product of dimensions
# results in a count starting at 1
array_len = original_array.size - 1 # Index starts at 0, but product at 1
# Calculate expected sum of original values (must be integer)
expected_sum = np.sum(original_array)
# Collect values for temporary array population
array_list = []
lower_sum = 0
for i, j in enumerate(np.nditer(original_array)):
array_list.append([i, floor(j), j - floor(j)]) # Original index, lower bound, roundoff error
# Calculate the lower sum of values
lower_sum += floor(j)
# Populate temporary array
temp_array = np.array(array_list)
# Sort temporary array based on roundoff error
temp_array = temp_array[temp_array[:,2].argsort()]
# Calculate difference between expected sum and the lower sum
# This is the number of integers that need to be rounded up from the lower sum
# The sort order (roundoff error) ensures that the value closest to be
# rounded up is at the bottom of the array
difference = int(expected_sum - lower_sum)
# Add one to the number most likely to round up to eliminate the difference
temp_array_len, _ = temp_array.shape
for i in xrange(temp_array_len - difference, temp_array_len):
temp_array[i,1] += 1
# Re-sort the array based on original index
temp_array = temp_array[temp_array[:,0].argsort()]
# Return array to one-dimensional format of original array
array_list = []
for i in xrange(temp_array_len):
array_list.append(int(temp_array[i,1]))
new_array = np.array(array_list)

Calculate sum of floor and sum of numbers.
Round sum of numbers, and subtract with sum of floor, the difference is how many ceiling we need to patch(how many +1 we need).
Sorting the array with its difference of ceiling to number, from small to large.
For diff times(diff is how many ceiling we need to patch), we set result as ceiling of number. Others set result as floor of numbers.
public class Float_Ceil_or_Floor {
public static int[] getNearlyArrayWithSameSum(double[] numbers) {
NumWithDiff[] numWithDiffs = new NumWithDiff[numbers.length];
double sum = 0.0;
int floorSum = 0;
for (int i = 0; i < numbers.length; i++) {
int floor = (int)numbers[i];
int ceil = floor;
if (floor < numbers[i]) ceil++; // check if a number like 4.0 has same floor and ceiling
floorSum += floor;
sum += numbers[i];
numWithDiffs[i] = new NumWithDiff(ceil,floor, ceil - numbers[i]);
}
// sort array by its diffWithCeil
Arrays.sort(numWithDiffs, (a,b)->{
if(a.diffWithCeil < b.diffWithCeil) return -1;
else return 1;
});
int roundSum = (int) Math.round(sum);
int diff = roundSum - floorSum;
int[] res = new int[numbers.length];
for (int i = 0; i < numWithDiffs.length; i++) {
if(diff > 0 && numWithDiffs[i].floor != numWithDiffs[i].ceil){
res[i] = numWithDiffs[i].ceil;
diff--;
} else {
res[i] = numWithDiffs[i].floor;
}
}
return res;
}
public static void main(String[] args) {
double[] arr = { 1.2, 3.7, 100, 4.8 };
int[] res = getNearlyArrayWithSameSum(arr);
for (int i : res) System.out.print(i + " ");
}
}
class NumWithDiff {
int ceil;
int floor;
double diffWithCeil;
public NumWithDiff(int c, int f, double d) {
this.ceil = c;
this.floor = f;
this.diffWithCeil = d;
}
}

Without minimizing the variance, here's a trivial one:
Sort values from left to right.
Round all down to the next integer.
Let the sum of those integers be K. Increase the N-K rightmost values by 1.
Restore original order.
This obviously satisfies your conditions 1.-4. Alternatively, you could round to the closest integer, and increase N-K of the ones you had rounded down. You can do this greedily by the difference between the original and rounded value, but each run of rounded-down values must only be increased from right to left, to maintain sorted order.

If you can accept a small change in the total while improving the variance this will probabilistically preserve totals in python:
import math
import random
integer_list = [int(x) + int(random.random() <= math.modf(x)[0]) for x in my_list]
to explain it rounds all numbers down and adds one with a probability equal to the fractional part i.e. one in ten 0.1 will become 1 and the rest 0
this works for statistical data where you are converting a large numbers of fractional persons into either 1 person or 0 persons

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Rearrange list to satisfy a condition - algorithm

Related

Need an algorithm to "evenly" iterate over all possible combinations of a set of values

Number of ways to pick the elements of an array?

Minimum common remainder of division

Allocate an array of integers proportionally compensating for rounding errors

How to round floats to integers while preserving their sum?

Categories

Resources