Consider the four percentages below, represented as float numbers:
13.626332%
47.989636%
9.596008%
28.788024%
-----------
100.000000%
I need to represent these percentages as whole numbers. If I simply use Math.round(), I end up with a total of 101%.
14 + 48 + 10 + 29 = 101
If I use parseInt(), I end up with a total of 97%.
13 + 47 + 9 + 28 = 97
What's a good algorithm to represent any number of percentages as whole numbers while still maintaining a total of 100%?
Edit: After reading some of the comments and answers, there are clearly many ways to go about solving this.
In my mind, to remain true to the numbers, the "right" result is the one that minimizes the overall error, defined by how much error rounding would introduce relative to the actual value:
value      rounded  error  decision
----------------------------------------------------
13.626332  14       2.7%   round up (14)
47.989636  48       0.0%   round up (48)
 9.596008  10       4.0%   don't round up (9)
28.788024  29       0.7%   round up (29)
In case of a tie (3.33, 3.33, 3.33) an arbitrary decision can be made (e.g. 3, 4, 3).
There are many ways to do just this, provided you are not concerned about reliance on the original decimal data.
The first and perhaps most popular method is the Largest Remainder Method, which is basically:
1. Round everything down.
2. Get the difference between the sum and 100.
3. Distribute the difference by adding 1 to items in decreasing order of their decimal parts.
In your case, it would go like this:
13.626332%
47.989636%
9.596008%
28.788024%
If you take the integer parts, you get
13
47
9
28
which adds up to 97, and you want to add three more. Now, you look at the decimal parts, which are
.626332%
.989636%
.596008%
.788024%
and take the largest ones until the total reaches 100. So you would get:
14
48
9
29
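The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `largest_remainder` and the `target` parameter are made up for this example, and it assumes non-negative inputs whose sum is close to the target.

```python
import math

def largest_remainder(percentages, target=100):
    """Round to integers that sum to `target` (hypothetical helper name)."""
    # Step 1: round everything down.
    floors = [math.floor(p) for p in percentages]
    # Step 2: how many units are missing from the target.
    shortfall = target - sum(floors)
    # Step 3: indices ordered by decimal part, largest first.
    # sorted() is stable, so ties keep their input order.
    order = sorted(range(len(percentages)),
                   key=lambda i: percentages[i] - floors[i],
                   reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

print(largest_remainder([13.626332, 47.989636, 9.596008, 28.788024]))
# prints [14, 48, 9, 29]
```

Sorting indices rather than values keeps the result in the original order, which matters when the percentages are tied to labels.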
Alternatively, you can simply choose to show one decimal place instead of integer values. The numbers would then be 13.6, 48.0, 9.6 and 28.8, which greatly reduces how far the total can drift from 100.
Probably the "best" way to do this (quoted since "best" is a subjective term) is to keep a running (non-integral) tally of where you are, and round that value.
Then use that along with the history to work out what value should be used. For example, using the values you gave:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
13.626332   13.626332            14             0    14 ( 14 -  0)
47.989636   61.615968            62            14    48 ( 62 - 14)
 9.596008   71.211976            71            62     9 ( 71 - 62)
28.788024  100.000000           100            71    29 (100 - 71)
                                                    ---
                                                    100
At each stage, you don't round the number itself. Instead, you round the accumulated value and work out the best integer that reaches that value from the previous baseline - that baseline is the cumulative value (rounded) of the previous row.
This works because you're not losing information at each stage but rather using the information more intelligently. The 'correct' rounded values are in the final column and you can see that they sum to 100.
You can see the difference between this and blindly rounding each value, in the third value above. While 9.596008 would normally round up to 10, the accumulated 71.211976 correctly rounds down to 71 - this means that only 9 is needed to add to the previous baseline of 62.
This also works for "problematic" sequences like three roughly-1/3 values, where one of them should be rounded up:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
33.333333   33.333333            33             0    33 ( 33 -  0)
33.333333   66.666666            67            33    34 ( 67 - 33)
33.333333   99.999999           100            67    33 (100 - 67)
                                                    ---
                                                    100
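The running-tally idea in the tables above can be sketched as follows. This is an illustrative Python version under stated assumptions: the name `cumulative_round` is invented here, and Python's built-in `round()` resolves exact .5 ties to the nearest even integer, which may differ from other languages.

```python
def cumulative_round(values):
    """Round each value so that the running totals stay rounded (hypothetical name)."""
    result = []
    running = 0.0   # exact cumulative value (CumulValue column)
    baseline = 0    # rounded cumulative value of the previous row (PrevBaseline)
    for v in values:
        running += v
        rounded_total = round(running)           # CumulRounded column
        result.append(rounded_total - baseline)  # Need column
        baseline = rounded_total
    return result

print(cumulative_round([13.626332, 47.989636, 9.596008, 28.788024]))
# prints [14, 48, 9, 29]
print(cumulative_round([33.333333, 33.333333, 33.333333]))
# prints [33, 34, 33]
```

Note that the output depends on the order of the input, since each value absorbs the rounding error accumulated before it.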
Since none of the answers here seem to solve it properly, here's my semi-obfuscated version using underscorejs:
function foo(l, target) {
    var off = target - _.reduce(l, function(acc, x) { return acc + Math.round(x) }, 0);
    return _.chain(l).
        sortBy(function(x) { return Math.round(x) - x }).
        map(function(x, i) { return Math.round(x) + (off > i) - (i >= (l.length + off)) }).
        value();
}
foo([13.626332, 47.989636, 9.596008, 28.788024], 100) // => [48, 29, 14, 9]
foo([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100) // => [17, 17, 17, 17, 16, 16]
foo([33.333, 33.333, 33.333], 100) // => [34, 33, 33]
foo([33.3, 33.3, 33.3, 0.1], 100) // => [34, 33, 33, 0]
The goal of rounding is to generate the least amount of error. When you're rounding a single value, that process is simple and straightforward and most people understand it easily. When you're rounding multiple numbers at the same time, the process gets trickier - you must define how the errors are going to combine, i.e. what must be minimized.
The well-voted answer by Varun Vohra minimizes the sum of the absolute errors, and it's very simple to implement. However there are edge cases it does not handle - what should be the result of rounding 24.25, 23.25, 27.25, 25.25? One of those needs to be rounded up instead of down. You would probably just arbitrarily pick the first or last one in the list.
Perhaps it's better to use the relative error instead of the absolute error. Rounding 23.25 up to 24 changes it by 3.2% while rounding 27.25 up to 28 only changes it by 2.8%. Now there's a clear winner.
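To make the comparison concrete, here is a small Python sketch that computes the relative error of rounding each of the four values up. The helper name `relative_error` is an assumption for this example; it simply divides the absolute change by the original value.

```python
def relative_error(actual, rounded):
    """Change introduced by rounding, as a fraction of the original value."""
    return abs(rounded - actual) / actual

values = [24.25, 23.25, 27.25, 25.25]

# Relative error of rounding each candidate up to the next integer.
for v in values:
    up = int(v) + 1
    print(f"{v} -> {up}: {relative_error(v, up):.1%}")

# The value that is cheapest to round up, in relative terms.
best = min(values, key=lambda v: relative_error(v, int(v) + 1))
print(best)  # prints 27.25
```

Because every value here has the same fractional part, the absolute errors are identical and only the relative measure breaks the tie.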
It's possible to tweak this even further. One common technique is to square each error, so that large errors count disproportionately more than small ones. I'd also use a non-linear divisor to get the relative error - it doesn't seem right that an error at 1% is 99 times more important than an error at 99%. In the code below I've used the square root.
The complete algorithm is as follows:
1. Sum the percentages after rounding them all down, and subtract from 100. This tells you how many of those percentages must be rounded up instead.
2. Generate two error scores for each percentage, one when rounded down and one when rounded up. Take the difference between the two.
3. Sort the error differences produced above.
4. For the number of percentages that need to be rounded up, take an item from the sorted list and increment the rounded-down percentage by 1.
You may still have more than one combination with the same error sum, for example 33.3333333, 33.3333333, 33.3333333. This is unavoidable, and the result will be completely arbitrary. The code I give below prefers to round up the values on the left.
Putting it all together in Python looks like this.
from math import isclose, sqrt

def error_gen(actual, rounded):
    divisor = sqrt(1.0 if actual < 1.0 else actual)
    return abs(rounded - actual) ** 2 / divisor

def round_to_100(percents):
    if not isclose(sum(percents), 100):
        raise ValueError
    n = len(percents)
    rounded = [int(x) for x in percents]
    up_count = 100 - sum(rounded)
    errors = [(error_gen(percents[i], rounded[i] + 1) - error_gen(percents[i], rounded[i]), i) for i in range(n)]
    rank = sorted(errors)
    for i in range(up_count):
        rounded[rank[i][1]] += 1
    return rounded
>>> round_to_100([13.626332, 47.989636, 9.596008, 28.788024])
[14, 48, 9, 29]
>>> round_to_100([33.3333333, 33.3333333, 33.3333333])
[34, 33, 33]
>>> round_to_100([24.25, 23.25, 27.25, 25.25])
[24, 23, 28, 25]
>>> round_to_100([1.25, 2.25, 3.25, 4.25, 89.0])
[1, 2, 3, 4, 90]
As you can see with that last example, this algorithm is still capable of delivering non-intuitive results. Even though 89.0 needs no rounding whatsoever, one of the values in that list needed to be rounded up; the lowest relative error results from rounding up that large value rather than the much smaller alternatives.
This answer originally advocated going through every possible combination of round up/round down, but as pointed out in the comments a simpler method works better. The algorithm and code reflect that simplification.
I wrote a C# rounding helper. The algorithm is the same as in Varun Vohra's answer; hope it helps.
public static List<decimal> GetPerfectRounding(List<decimal> original,
    decimal forceSum, int decimals)
{
    var rounded = original.Select(x => Math.Round(x, decimals)).ToList();
    Debug.Assert(Math.Round(forceSum, decimals) == forceSum);
    var delta = forceSum - rounded.Sum();
    if (delta == 0) return rounded;
    var deltaUnit = Convert.ToDecimal(Math.Pow(0.1, decimals)) * Math.Sign(delta);

    List<int> applyDeltaSequence;
    if (delta < 0)
    {
        applyDeltaSequence = original
            .Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
            .OrderBy(a => original[a.index] - rounded[a.index])
            .ThenByDescending(a => a.index)
            .Select(a => a.index).ToList();
    }
    else
    {
        applyDeltaSequence = original
            .Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
            .OrderByDescending(a => original[a.index] - rounded[a.index])
            .Select(a => a.index).ToList();
    }

    // Cycle through the sequence, applying one deltaUnit per step until delta is used up.
    // ToList() is needed because ForEach is defined on List<T>, not on IEnumerable<T>.
    Enumerable.Repeat(applyDeltaSequence, int.MaxValue)
        .SelectMany(x => x)
        .Take(Convert.ToInt32(delta/deltaUnit))
        .ToList()
        .ForEach(index => rounded[index] += deltaUnit);

    return rounded;
}
It passes the following unit test:
[TestMethod]
public void TestPerfectRounding()
{
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 2),
new List<decimal> {3.33m, 3.34m, 3.33m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.33m, 3.34m, 3.33m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 13.626332m, 47.989636m, 9.596008m, 28.788024m }, 100, 0),
new List<decimal> {14, 48, 9, 29});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 16.666m, 16.666m, 16.666m, 16.666m, 16.666m, 16.666m }, 100, 0),
new List<decimal> { 17, 17, 17, 17, 16, 16 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.333m, 33.333m, 33.333m }, 100, 0),
new List<decimal> { 34, 33, 33 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.3m, 33.3m, 33.3m, 0.1m }, 100, 0),
new List<decimal> { 34, 33, 33, 0 });
}
DO NOT sum the rounded numbers. You're going to have inaccurate results. The total could be off significantly depending on the number of terms and the distribution of fractional parts.
Display the rounded numbers but sum the actual values. Depending on how you're presenting the numbers, the actual way to do that would vary. That way you get
14
48
10
29
__
100
Any way you go you're going to have discrepancy. There's no way in your example to show numbers that add up to 100 without "rounding" one value the wrong way (least error would be changing 9.596 to 9)
EDIT
You need to choose one of the following:
1. Accuracy of the items
2. Accuracy of the sum (if you're summing rounded values)
3. Consistency between the rounded items and the rounded sum
Most of the time when dealing with percentages #3 is the best option because it's more obvious when the total equals 101% than when the individual items don't total to 100, and you keep the individual items accurate. "Rounding" 9.596 to 9 is inaccurate in my opinion.
To explain this I sometimes add a footnote that explains that the individual values are rounded and may not total 100% - anyone that understands rounding should be able to understand that explanation.
You could try keeping track of your error due to rounding, and then rounding against the grain if the accumulated error is greater than the fractional portion of the current number.
13.62 -> 14 (+.38)
47.98 -> 48 (+.02 (+.40 total))
9.59 -> 10 (+.41 (+.81 total))
28.78 -> 28 (round down because .81 > .78)
------------
100
Not sure if this would work in general, but it seems to work similarly if the order is reversed:
28.78 -> 29 (+.22)
9.59 -> 9 (-.37; rounded down because .59 > .22)
47.98 -> 48 (-.35)
13.62 -> 14 (+.03)
------------
100
I'm sure there are edge cases where this might break down, but any approach is going to be at least somewhat arbitrary since you're basically modifying your input data.
I'm not sure what level of accuracy you need, but what I would do is simply add 1 to the first n numbers, n being the ceil of the total sum of decimals. In this case that is 3, so I would add 1 to the first 3 items and floor the rest. Of course this is not super accurate, since some numbers might be rounded up or down when they shouldn't be, but it works okay and will always result in 100%.
So [ 13.626332, 47.989636, 9.596008, 28.788024 ] would be [14, 48, 10, 28] because Math.ceil(.626332+.989636+.596008+.788024) == 3
function evenRound( arr ) {
    var decimal = -~arr.map(function( a ){ return a % 1 })
        .reduce(function( a,b ){ return a + b }); // Ceil of total sum of decimals
    for ( var i = 0; i < decimal; ++i ) {
        arr[ i ] = ++arr[ i ]; // compensate error by adding 1 to the first n items
    }
    return arr.map(function( a ){ return ~~a }); // floor all other numbers
}
var nums = evenRound( [ 13.626332, 47.989636, 9.596008, 28.788024 ] );
var total = nums.reduce(function( a,b ){ return a + b }); //=> 100
You can always inform users that the numbers are rounded and may not be super-accurate...
I once wrote an unround tool, to find the minimal perturbation to a set of numbers to match a goal. It was a different problem, but one could in theory use a similar idea here. In this case, we have a set of choices.
Thus for the first element, we can either round it up to 14, or down to 13. The cost (in a binary integer programming sense) of doing so is less for the round up than the round down, because the round down requires we move that value a larger distance. Similarly, we can round each number up or down, so there are a total of 16 choices we must choose from.
13.626332
47.989636
9.596008
+ 28.788024
-----------
100.000000
I'd normally solve the general problem in MATLAB, here using bintprog, a binary integer programming tool, but there are only a few choices to be tested, so it is easy enough with simple loops to test out each of the 16 alternatives. For example, suppose we were to round this set as:
Original Rounded Absolute error
13.626 13 0.62633
47.99 48 0.01036
9.596 10 0.40399
+ 28.788 29 0.21198
---------------------------------------
100.000 100 1.25266
The total absolute error made is 1.25266. It can be reduced slightly by the following alternative rounding:
Original Rounded Absolute error
13.626 14 0.37367
47.99 48 0.01036
9.596 9 0.59601
+ 28.788 29 0.21198
---------------------------------------
100.000 100 1.19202
In fact, this will be the optimal solution in terms of the absolute error. Of course, if there were 20 terms, the search space will be of size 2^20 = 1048576. For 30 or 40 terms, that space will be of significant size. In that case, you would need to use a tool that can efficiently search the space, perhaps using a branch and bound scheme.
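For a handful of values the exhaustive search described above is easy to write with simple loops. The following Python sketch is an illustration only (it is not the MATLAB bintprog approach mentioned in the answer, and the name `best_rounding` is invented here): it tries every round-down/round-up combination and keeps the one with the smallest total absolute error among those that hit the target sum.

```python
from itertools import product
import math

def best_rounding(values, target=100):
    """Brute-force search over all 2^n floor/ceil choices (hypothetical helper)."""
    best, best_err = None, math.inf
    for choice in product((0, 1), repeat=len(values)):
        # choice[i] == 1 means value i is rounded up, 0 means rounded down.
        candidate = [math.floor(v) + up for v, up in zip(values, choice)]
        if sum(candidate) != target:
            continue
        err = sum(abs(c - v) for c, v in zip(candidate, values))
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err

rounded, total_error = best_rounding([13.626332, 47.989636, 9.596008, 28.788024])
print(rounded)                # prints [14, 48, 9, 29]
print(round(total_error, 5))  # prints 1.19202
```

The loop visits 2^n combinations, so it is only practical for small n, which is exactly the scaling concern raised above.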
I think the following will achieve what you are after
function func( orig, target ) {
var i = orig.length, j = 0, total = 0, change, newVals = [], next, factor1, factor2, len = orig.length, marginOfErrors = [];
// map original values to new array
while( i-- ) {
total += newVals[i] = Math.round( orig[i] );
}
change = total < target ? 1 : -1;
while( total !== target ) {
// Iterate through values and select the one that once changed will introduce
// the least margin of error in terms of itself. e.g. Incrementing 10 by 1
// would mean an error of 10% in relation to the value itself.
for( i = 0; i < len; i++ ) {
next = i === len - 1 ? 0 : i + 1;
factor2 = errorFactor( orig[next], newVals[next] + change );
factor1 = errorFactor( orig[i], newVals[i] + change );
if( factor1 > factor2 ) {
j = next;
}
}
newVals[j] += change;
total += change;
}
for( i = 0; i < len; i++ ) { marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i]; }
// Math.round() causes some problems as it is difficult to know at the beginning
// whether numbers should have been rounded up or down to reduce total margin of error.
// This section of code increments and decrements values by 1 to find the number
// combination with least margin of error.
for( i = 0; i < len; i++ ) {
for( j = 0; j < len; j++ ) {
if( j === i ) continue;
var roundUpFactor = errorFactor( orig[i], newVals[i] + 1) + errorFactor( orig[j], newVals[j] - 1 );
var roundDownFactor = errorFactor( orig[i], newVals[i] - 1) + errorFactor( orig[j], newVals[j] + 1 );
var sumMargin = marginOfErrors[i] + marginOfErrors[j];
if( roundUpFactor < sumMargin) {
newVals[i] = newVals[i] + 1;
newVals[j] = newVals[j] - 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
if( roundDownFactor < sumMargin ) {
newVals[i] = newVals[i] - 1;
newVals[j] = newVals[j] + 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
}
}
function errorFactor( oldNum, newNum ) {
return Math.abs( oldNum - newNum ) / oldNum;
}
return newVals;
}
func([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100); // => [16, 16, 17, 17, 17, 17]
func([33.333, 33.333, 33.333], 100); // => [34, 33, 33]
func([33.3, 33.3, 33.3, 0.1], 100); // => [34, 33, 33, 0]
func([13.25, 47.25, 11.25, 28.25], 100 ); // => [13, 48, 11, 28]
func( [25.5, 25.5, 25.5, 23.5], 100 ); // => [25, 25, 26, 24]
One last thing, I ran the function using the numbers originally given in the question to compare to the desired output
func([13.626332, 47.989636, 9.596008, 28.788024], 100); // => [48, 29, 13, 10]
This was different to what the question wanted => [ 48, 29, 14, 9]. I couldn't understand this until I looked at the total margin of error
-------------------------------------------------
| original | question | % diff | mine | % diff |
-------------------------------------------------
| 13.626332 | 14 | 2.74% | 13 | 4.5% |
| 47.989636 | 48 | 0.02% | 48 | 0.02% |
| 9.596008 | 9 | 6.2% | 10 | 4.2% |
| 28.788024 | 29 | 0.7% | 29 | 0.7% |
-------------------------------------------------
| Totals | 100 | 9.66% | 100 | 9.43% |
-------------------------------------------------
Essentially, the result from my function actually introduces the least amount of error.
Note: the selected answer changes the array order, which is not ideal. Here I provide a few variations that achieve the same result while keeping the array in order.
Discussion
Given [98.88, .56, .56], how do you want to round it? You have four options:
1- Round everything up and subtract what was added from the rest of the numbers, so the result becomes [98, 1, 1].
This could be a good answer, but what if we have [97.5, .5, .5, .5, .5, .5]? Then you need to round it to [95, 1, 1, 1, 1, 1].
Do you see where this goes? The more near-zero numbers you add, the more value the rest of your numbers lose. This becomes very troublesome when you have a big array of near-zero numbers like [40, .5, .5 , ... , .5]: rounding up could leave you with an array of ones: [1, 1, .... , 1].
So rounding up isn't a good option.
2- Round the numbers down, so [98.88, .56, .56] becomes [98, 0, 0], which is 2 less than 100. Ignore anything that is already 0, then add the difference to the biggest numbers, so bigger numbers get more.
3- Same as the previous option: round the numbers down, but sort descending by decimal part and hand out the difference in that order, so the biggest decimals get the difference.
4- Round each number, but carry the amount you added or removed over to the next number. Like a wave, the rounding error is pushed toward the end of your array, so [98.88, .56, .56] becomes [99, 0, 1].
None of these is ideal, so be mindful that your data is going to lose its shape.
Here is code for cases 2, 3 and 4 (case 1 is not practical when you have a lot of near-zero numbers). It's modern JS with a few TypeScript type annotations and doesn't need any library.
2nd case
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [ 14, 48, 9, 29 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [ 17, 17, 17, 17, 16, 16 ]
const v3 = [33.333, 33.333, 33.333] // => [ 34, 33, 33 ]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [ 34, 33, 33, 0 ]
const v5 = [98.88, .56, .56] // =>[ 100, 0, 0 ]
const v6 = [97.5, .5, .5, .5, .5, .5] // => [ 100, 0, 0, 0, 0, 0 ]
const normalizePercentageByNumber = (input) => {
    const rounded: number[] = input.map(x => Math.floor(x));
    const afterRoundSum = rounded.reduce((pre, curr) => pre + curr, 0);
    const countMutableItems = rounded.filter(x => x >=1).length;
    const errorRate = 100 - afterRoundSum;
    const deductPortion = Math.ceil(errorRate / countMutableItems);
    const biggest = [...rounded].sort((a, b) => b - a).slice(0, Math.min(Math.abs(errorRate), countMutableItems));
    const result = rounded.map(x => {
        const indexOfX = biggest.indexOf(x);
        if (indexOfX >= 0) {
            x += deductPortion;
            biggest.splice(indexOfX, 1);
            return x;
        }
        return x;
    });
    return result;
}
3rd case
const normalizePercentageByDecimal = (input: number[]) => {
const rounded= input.map((x, i) => ({number: Math.floor(x), decimal: x%1, index: i }));
const decimalSorted= [...rounded].sort((a,b)=> b.decimal-a.decimal);
const sum = rounded.reduce((pre, curr)=> pre + curr.number, 0) ;
const error= 100-sum;
for (let i = 0; i < error; i++) {
const element = decimalSorted[i];
element.number++;
}
const result= [...decimalSorted].sort((a,b)=> a.index-b.index);
return result.map(x=> x.number);
}
4th case
You just need to calculate how much was added or removed by each rounding step, and add or subtract that amount in the next item.
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [14, 48, 10, 28 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [17, 16, 17, 16, 17, 17]
const v3 = [33.333, 33.333, 33.333] // => [33, 34, 33]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [33, 34, 33, 0]
const normalizePercentageByWave= v4.reduce((pre, curr, i, arr) => {
let number = Math.round(curr + pre.decimal);
let total = pre.total + number;
const decimal = curr - number;
if (i == arr.length - 1 && total < 100) {
const diff = 100 - total;
total += diff;
number += diff;
}
return { total, numbers: [...pre.numbers, number], decimal };
}, { total: 0, numbers: [], decimal: 0 });
If you have just two options you are fine with Math.round(). The only problematic pairs of values are of the form X.5 (e.g. 37.5 and 62.5): both values are rounded up and you end up with 101%, as you can try here:
https://jsfiddle.net/f8np1t0k/2/
Since you always need to show 100%, you simply subtract one percent from one of them, for example from the first one (assuming a is the raw value and aRounded is Math.round(a)):
const correctedARounded = Number.isInteger(a - 0.5) ? aRounded - 1 : aRounded
Or you can favor the option with more % votes.
The error of 1% diff happens 114 times for 10k cases of divisions between pairs of 1-100 values.
My JS implementation for the well-voted answer by Varun Vohra
const set1 = [13.626332, 47.989636, 9.596008, 28.788024];
// const set2 = [24.25, 23.25, 27.25, 25.25];
const values = set1;
console.log('Total: ', values.reduce((accum, each) => accum + each));
console.log('Incorrectly Rounded: ',
values.reduce((accum, each) => accum + Math.round(each), 0));
const adjustValues = (values) => {
// 1. Separate integer and decimal part
// 2. Store both in a new array of objects sorted by decimal part descending
// 3. Add in original position to "put back" at the end
const flooredAndSortedByDecimal = values.map((value, position) => (
{
floored: Math.floor(value),
decimal: value - Number.parseInt(value),
position
}
)).sort(({decimal}, {decimal: otherDecimal}) => otherDecimal - decimal);
const roundedTotal = values.reduce((total, value) => total + Math.floor(value), 0);
let availableForDistribution = 100 - roundedTotal;
// Add 1 to each value from what's available
const adjustedValues = flooredAndSortedByDecimal.map(value => {
const { floored, ...rest } = value;
let finalPercentage = floored;
if(availableForDistribution > 0){
finalPercentage = floored + 1;
availableForDistribution--;
}
return {
finalPercentage,
...rest
}
});
// Put back and return the new values
return adjustedValues
.sort(({position}, {position: otherPosition}) => position - otherPosition)
.map(({finalPercentage}) => finalPercentage);
}
const finalPercentages = adjustValues(values);
console.log({finalPercentages})
// { finalPercentages: [14, 48, 9, 29] }
Or something like this for brevity, where you just accumulate the error...
const p = [13.626332, 47.989636, 9.596008, 28.788024];
// r is declared locally so the function also works in strict mode and in modules
const round = (a, e = 0) => a.map(x => { const r = Math.round(x + e); e += x - r; return r; });
console.log(round(p));
Result: [14, 48, 9, 29]
If you are rounding, there is no good way to get the total exactly right in all cases.
You can take the decimal parts of the N percentages you have (in the example you gave, N is 4).
Add the decimal parts. In your example the fractional parts total 3.
Ceil the 3 numbers with the highest fractions and floor the rest.
If you really must round them, there are already very good suggestions here (largest remainder, least relative error, and so on).
There is also already one good reason not to round (you'll get at least one number that "looks better" but is "wrong"), and how to solve that (warn your readers) and that is what I do.
Let me add on the "wrong" number part.
Suppose you have three events/entities/... with some percentages that you approximate as:
DAY 1
who | real | app
----|-------|------
A | 33.34 | 34
B | 33.33 | 33
C | 33.33 | 33
Later on the values change slightly, to
DAY 2
who | real | app
----|-------|------
A | 33.35 | 33
B | 33.36 | 34
C | 33.29 | 33
The first table has the already mentioned problem of having a "wrong" number: 33.34 is closer to 33 than to 34.
But now you have a bigger error. Comparing day 2 to day 1, the real percentage value for A increased, by 0.01%, but the approximation shows a decrease by 1%.
That is a qualitative error, probably considerably worse than the initial quantitative error.
One could devise an approximation for the whole set, but you may have to publish data on day one, when you do not yet know about day two. So, unless you really, really must approximate, you are probably better off not doing so.
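The two tables can be reproduced with a short Python sketch. It assumes the approximations were produced by a largest-remainder style rounding (the answer does not say which method was used), and the helper name `round_to_total` is invented for this example.

```python
import math

def round_to_total(values, total=100):
    """Largest-remainder rounding, used here only to reproduce the tables (assumption)."""
    floors = [math.floor(v) for v in values]
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i],
                   reverse=True)
    for i in order[: total - sum(floors)]:
        floors[i] += 1
    return floors

day1 = {"A": 33.34, "B": 33.33, "C": 33.33}
day2 = {"A": 33.35, "B": 33.36, "C": 33.29}

app1 = dict(zip(day1, round_to_total(list(day1.values()))))
app2 = dict(zip(day2, round_to_total(list(day2.values()))))

print(app1)  # prints {'A': 34, 'B': 33, 'C': 33}
print(app2)  # prints {'A': 33, 'B': 34, 'C': 33}

# The real value of A went up, yet its displayed value went down.
print(day2["A"] > day1["A"], app2["A"] < app1["A"])  # prints True True
```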
Here's a simpler Python implementation of Varun Vohra's answer:
import itertools
import math
import operator

def apportion_pcts(pcts, total):
    proportions = [total * (pct / 100) for pct in pcts]
    apportions = [math.floor(p) for p in proportions]
    remainder = total - sum(apportions)
    remainders = [(i, p - math.floor(p)) for (i, p) in enumerate(proportions)]
    remainders.sort(key=operator.itemgetter(1), reverse=True)
    for (i, _) in itertools.cycle(remainders):
        if remainder == 0:
            break
        else:
            apportions[i] += 1
            remainder -= 1
    return apportions
It needs math, itertools and operator from the standard library.
Please check whether this is valid; it works for my test cases.
Let's say the number is k.
Sort the percentages in descending order.
Iterate over the percentages in descending order.
Calculate the first percentage of k and take Math.Ceil of the output.
next k = k-1
Iterate until all percentages are consumed.
I have implemented the method from Varun Vohra's answer here for both lists and dicts.
import math
import numbers
import operator
import itertools

def round_list_percentages(number_list):
    """
    Takes a list where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    if not all(isinstance(i, numbers.Number) for i in number_list):
        raise ValueError('All values of the list must be a number')
    # Generate a key for each value
    key_generator = itertools.count()
    value_dict = {next(key_generator): value for value in number_list}
    # Wrap in list() so Python 3 returns a list rather than a dict view
    return list(round_dictionary_percentages(value_dict).values())

def round_dictionary_percentages(dictionary):
    """
    Takes a dictionary where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    # Only allow numbers
    if not all(isinstance(i, numbers.Number) for i in dictionary.values()):
        raise ValueError('All values of the dictionary must be a number')
    # Make sure the sum is close enough to 100
    # Round value_sum to 2 decimals to avoid floating point representation errors
    value_sum = round(sum(dictionary.values()), 2)
    if not value_sum == 100:
        raise ValueError('The sum of the values must be 100')
    # Initial floored results
    # Does not add up to 100, so we need to add something
    result = {key: int(math.floor(value)) for key, value in dictionary.items()}
    # Remainders for each key
    result_remainders = {key: value % 1 for key, value in dictionary.items()}
    # Keys sorted by remainder (biggest first)
    sorted_keys = [key for key, value in sorted(result_remainders.items(), key=operator.itemgetter(1), reverse=True)]
    # Otherwise add missing values up to 100
    # One cycle is enough, since flooring removes a max value of < 1 per item,
    # i.e. this loop should always break before going through the whole list
    for key in sorted_keys:
        if sum(result.values()) == 100:
            break
        result[key] += 1
    # Return
    return result
For those having the percentages in a pandas Series, here is my implementation of the Largest Remainder Method (as in Varun Vohra's answer), where you can even select the number of decimals to which you want to round.
import numpy as np

def largestRemainderMethod(pd_series, decimals=1):
    # np.int was removed in NumPy 1.24; the built-in int is used instead
    floor_series = ((10**decimals * pd_series).astype(int)).apply(np.floor)
    diff = 100 * (10**decimals) - int(floor_series.sum())
    series_decimals = pd_series - floor_series / (10**decimals)
    series_sorted_by_decimals = series_decimals.sort_values(ascending=False)
    for i in range(0, len(series_sorted_by_decimals)):
        if i < diff:
            series_sorted_by_decimals.iloc[[i]] = 1
        else:
            series_sorted_by_decimals.iloc[[i]] = 0
    out_series = ((floor_series + series_sorted_by_decimals) / (10**decimals)).sort_values(ascending=False)
    return out_series
Here's a Ruby gem that implements the Largest Remainder method:
https://github.com/jethroo/lare_round
To use:
a = Array.new(3){ BigDecimal('0.3334') }
# => [#<BigDecimal:887b6c8,'0.3334E0',9(18)>, #<BigDecimal:887b600,'0.3334E0',9(18)>, #<BigDecimal:887b4c0,'0.3334E0',9(18)>]
a = LareRound.round(a,2)
# => [#<BigDecimal:8867330,'0.34E0',9(36)>, #<BigDecimal:8867290,'0.33E0',9(36)>, #<BigDecimal:88671f0,'0.33E0',9(36)>]
a.reduce(:+).to_f
# => 1.0
I wrote a function in Javascript that takes an array of percentages and outputs an array with rounded percentages using the Largest Remainder Method. It doesn't use any libraries.
Input: [21.6, 46.7, 31, 0.5, 0.2]
Output: [22, 47, 31, 0, 0]
const values = [21.6, 46.7, 31, 0.5, 0.2];
console.log(roundPercentages(values));
function roundPercentages(values) {
const flooredValues = values.map(e => Math.floor(e));
const remainders = values.map(e => e - Math.floor(e));
const totalRemainder = 100 - flooredValues.reduce((a, b) => a + b);
    // Sort indices (not the remainders themselves) so that equal remainders
    // still map to distinct elements; indexOf would return the same index twice
    remainders
        .map((_, i) => i)
        // Sort from highest to lowest remainder
        .sort((a, b) => remainders[b] - remainders[a])
        // Get the n largest remainders, where n = totalRemainder
        .slice(0, totalRemainder)
        // Add 1 to the floored percentages with the highest remainder (divide the total remainder)
        .forEach(i => flooredValues[i] += 1);
return flooredValues;
}
This is a case for banker's rounding, aka 'round half-even'. It is supported by BigDecimal. Its purpose is to ensure that rounding balances out, i.e. doesn't favour either the bank or the customer.
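As a rough illustration in Python (using the standard `decimal` module rather than Java's BigDecimal, and with helper names invented for this example), half-even rounding resolves a tied pair such as 37.5 and 62.5 so that the total stays at 100. It only changes how exact .5 ties are resolved, so it does not by itself guarantee a total of 100 for values like the ones in the question.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_half_even(value):
    """Round a decimal string to an integer, ties going to the even neighbour."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

def round_half_up(value):
    """Round a decimal string to an integer, ties going away from zero."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

pair = ["37.5", "62.5"]
print([round_half_up(v) for v in pair])    # prints [38, 63]  (total 101)
print([round_half_even(v) for v in pair])  # prints [38, 62]  (total 100)

# No ties here, so half-even behaves like ordinary rounding.
question = ["13.626332", "47.989636", "9.596008", "28.788024"]
print(sum(round_half_even(v) for v in question))  # prints 101
```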
I feel that this should be something very simple and obvious, but I have been stuck on it for the last half hour and can't move on.
All I need is to split an array of elements into N groups based on element index.
For example we have an array of 30 elements [e1,e2,...e30], that has to be divided into N=3 groups like this:
group1: [e1, ..., e10]
group2: [e11, ..., e20]
group3: [e21, ..., e30]
I came up with a nasty mess like this for N=3 (pseudo-language; I left the multiplications by 0 and 1 in just for clarity):
for(i=0;i<array_size;i++) {
if(i>=0*(array_size/3) && i<1*(array_size/3)) {
print "group1";
} else if(i>=1*(array_size/3) && i<2*(array_size/3)) {
print "group2";
} else if(i>=2*(array_size/3) && i<3*(array_size/3)) {
print "group3";
}
}
But what would be the proper general solution?
Thanks.
What about something like this?
for(i=0;i<array_size;i++) {
print "group" + (Math.floor(i/(array_size/N)) + 1)
}
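A runnable take on the same one-liner, as a sketch. groupNumber is a hypothetical helper name, and I multiply before dividing so that no fractional group size is formed along the way. When the array size is not a multiple of N, the groups come out with sizes that differ by at most one.

```javascript
// groupNumber (hypothetical name) maps a 0-based index to a 1-based
// group number, for `size` elements split into `n` groups.
function groupNumber(i, size, n) {
  // Multiply first, then divide, then floor
  return Math.floor((i * n) / size) + 1;
}

const demoSize = 10;
const demoGroups = 3;
const demoLabels = [];
for (let i = 0; i < demoSize; i++) {
  demoLabels.push(groupNumber(i, demoSize, demoGroups));
}
console.log(demoLabels.join(" ")); // 1 1 1 1 2 2 2 3 3 3
```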
Here's a little function which will do what you want - it presumes you know the number of groups you want to make:
function arrayToGroups(source, groups) {
//This is the array of groups to return:
var grouped = [];
//work out the size of the group
var groupSize = Math.ceil(source.length/groups);
//clone the source array so we can safely splice it (splicing modifies the array)
var queue = source.slice(0);
for (var r=0;r<groups;r++) {
//Grab the next groupful from the queue, and append it to the array of groups
grouped.push(queue.splice(0, groupSize));
}
return grouped;
}
And you use it like:
var herbs = ['basil', 'marjoram', 'aniseed', 'parsley', 'chives', 'sage', 'fennel', 'oregano', 'thyme', 'tarragon', 'rosemary'];
var herbGroups = arrayToGroups(herbs, 3);
which returns:
herbGroups[0] = ['basil', 'marjoram', 'aniseed', 'parsley']
herbGroups[1] = ['chives', 'sage', 'fennel', 'oregano']
herbGroups[2] = ['thyme', 'tarragon', 'rosemary']
It doesn't do any sanity checking to make sure you pass in an array and a number, but you could add that easily enough. You could probably add it to JavaScript's Array.prototype, too, which would give you a handy 'toGroups' method on Arrays.
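One thing to watch with the function above: because the group size is rounded up, the trailing groups can come out short or even empty (9 items into 4 groups gives sizes 3, 3, 3, 0). If you would rather have the sizes differ by at most one, something along these lines should work. This is only a sketch, and splitBalanced is a made-up name:

```javascript
// splitBalanced (hypothetical name): split `source` into exactly `groups`
// arrays whose lengths differ by at most one. The source is not modified.
function splitBalanced(source, groups) {
  const base = Math.floor(source.length / groups); // minimum size of every group
  const extra = source.length % groups;            // this many groups get one more
  const result = [];
  let start = 0;
  for (let g = 0; g < groups; g++) {
    const size = base + (g < extra ? 1 : 0);
    result.push(source.slice(start, start + size));
    start += size;
  }
  return result;
}

const balancedSizes = splitBalanced([1, 2, 3, 4, 5, 6, 7, 8, 9], 4).map(g => g.length);
console.log(balancedSizes.join(", ")); // 3, 2, 2, 2
```

For the 11 herbs split into 3 groups this gives the same 4, 4, 3 split as before.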
Using a vector language makes this task simple, right tool and all that. Just thought I'd throw this out there to let folks check out an alternative methodology.
The explained version in K (an APL descendant):
split:{[values;n] / define function split with two parameters
enum:!n / ! does enumerate from 0 through n exclusive, : is assign
floor:_(#values)%n / 10 for this sample, % is divide, _ floor, # count
cut:floor*enum / 0 10 20 for this sample data, * multiplies atom * vector
:cut _ values / cut the values at the given cutpoints, yielding #cut lists
}
values:1+!30 / generate values 1 through 30
n:3 / how many groups to split into
groups:split[values;n] / set the groups
yields the expected output:
(1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30)
The short version in K:
split:{((_(#x)%y)*!y)_ x}
groups:split[1+!30;3]
yields the same output:
(1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30)
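For readers who don't know K, here is a rough JavaScript rendering of the same idea: compute the cut points, then slice between them. The function name is my own, and I'm assuming the last group simply takes whatever is left over when the length isn't a multiple of n, so treat it as a sketch rather than an exact translation.

```javascript
// splitAtCutPoints (hypothetical name): cut `values` at multiples of
// floor(length / n); the last group runs to the end of the array.
function splitAtCutPoints(values, n) {
  const size = Math.floor(values.length / n);
  // Cut points 0, size, 2*size, ... (n of them)
  const cuts = Array.from({ length: n }, (_, k) => k * size);
  return cuts.map((start, k) =>
    values.slice(start, k + 1 < n ? cuts[k + 1] : values.length));
}

const oneToThirty = Array.from({ length: 30 }, (_, k) => k + 1);
const cutGroups = splitAtCutPoints(oneToThirty, 3);
console.log(cutGroups.map(g => g.join(" ")).join("\n"));
```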
I modified Beejamin's function above and just wanted to share it.
function arrayToGroups($source, $pergroup) {
$grouped = array();
$groupCount = ceil(count($source)/$pergroup);
$queue = $source;
for ($r=0; $r<$groupCount; $r++) {
array_push($grouped, array_splice($queue, 0, $pergroup));
}
return $grouped;
}
This version (in PHP) takes the number of items per group instead of the total number of groups.
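The same per-group-size idea can be sketched in JavaScript with slice, which leaves the source array untouched. chunkBySize is a hypothetical name, and the guard against a non-positive size is my own addition. (PHP itself also ships array_chunk, which splits an array into chunks of a given size.)

```javascript
// chunkBySize (hypothetical name): split `source` into consecutive groups
// of `perGroup` items; the last group may be shorter.
function chunkBySize(source, perGroup) {
  if (!Number.isInteger(perGroup) || perGroup < 1) {
    throw new RangeError("perGroup must be a positive integer");
  }
  const grouped = [];
  for (let start = 0; start < source.length; start += perGroup) {
    grouped.push(source.slice(start, start + perGroup));
  }
  return grouped;
}

const chunked = chunkBySize([1, 2, 3, 4, 5], 2);
console.log(JSON.stringify(chunked)); // [[1,2],[3,4],[5]]
```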
const int g = 3; // number of groups
const int n = (array_size + g - 1)/g; // elements per group, rounded up
int i, j;
for (i=0,j=1; i<array_size; ++i) {
if (i >= j*n) // index i has passed the end of group j
++j;
printf("Group %d\n", j);
}
int group[3][10];
int groupIndex = 0;
int itemIndex = 0;
for(i = 0; i < array_size; i++)
{
group[groupIndex][itemIndex] = big_array[i];
itemIndex++;
if (itemIndex == 10)
{
itemIndex = 0;
groupIndex++;
}
}
There's probably an infinite number of ways to do this.
I'd suggest: for each group, create a base pointer and count.
struct group { foo * ptr; size_t count; };
group * pgroups = new group [ngroups];
// round up so that no trailing elements are dropped
size_t objects_per_group = (array_size + ngroups - 1) / ngroups;
for (unsigned u = 0; u < ngroups; ++u ) {
group & g = pgroups[u];
// clamp so that empty trailing groups do not index past the end
size_t index = min (u * objects_per_group, array_size);
g.ptr = & array [index];
g.count = min (objects_per_group, array_size - index); // last group may have less!
}
...
for (unsigned u = 0; u < ngroups; ++u) {
// group "g" is an array at pgroups[u].ptr, dimension pgroups[u].count
group & g = pgroups[u];
// enumerate the group:
for (unsigned v = 0; v < g.count; ++v) {
fprintf (stdout, "group %u, item %u, %s\n",
(unsigned) u, (unsigned) v, (const char *) g.ptr[v]->somestring);
}
}
delete[] pgroups;
I think the problem is a little more complicated; if you only look at grouping as a one-dimensional problem, you're going to get a very odd view of what groups actually are.
Firstly, the dimension of the problem depends on the number of group primes and group combinations you are dealing with. In mathematics, the number of ways to order n elements is n! (n factorial), which is at most n^n (n to the power of n).
If I have 5 groups arrayed as (1, 2, 3, 4, 5) and I want to represent them as certain groups or combinations of groups according to a factorial expression, then the combinations get bigger:
Group 1x1 = 1,2,3,4,5
Group 2x1 = 12, 13, 14, 15, 21, 23, 24, 25, 31, 32, 34, 35, 41, 42, 43, 45, 51, 52, 53, 54
so the strategy creates a systematic branch (easy enough):
12, 13, 14, 15
21, 23, 24, 25
31, 32, 34, 35
41, 42, 43, 45
51, 52, 53, 54
Group 1 + 2x2x1 = (1, 23, 45), (2, 13, 45), (3, 12, 45), (4, 12, 35), (1, 24, 35), (1, 25, 34), (1, 32, 45), (1, 34, 25), (1, 35, 24), ... etc
As you can see, when you begin to add factorial sets, it becomes hard to create a mathematical reference that expresses the combinations. It gets worse once the base set has more than 3 or 4 elements.
If I am understanding your question: you want to express, in generic terms, an algorithm that allows you to create grouping strategies programmatically?
This is a complicated problem, and it is best expressed in terms of set theory. Otherwise, all you're doing is handling a two-dimensional array:
the first array expresses the grouping strategy;
the second array expresses the grouping elements.
I don't think this is what you're being asked to do, because the term "group" has a very specific meaning in mathematics. You should not use the term group; rather, express it as sets (set1, set2), if that is what you are doing.
Set1 contains elements of set2, and therefore this is handled with the same mathematics used for sets and unions. Look up "Venn diagrams" and "union"; avoid using the term group unless you are representing the factor of a set.
http://en.wikipedia.org/wiki/Group_(mathematics)
I think what you are trying to express is the groups within a known set or table; this is example D2 on wikipedia.org.
In which case you have to look at the problem like a Rubik's cube, and it gets complicated.
I'm working on the same problem in JavaScript; when I am done I might publish it ;). It's very complicated.