RxJS: Group items by multiple conditions and process each group

I have a set of emitted values (stock market transactions) with time, price and volume, like this:
TIME PRICE VOL
13:45:01 12 1
13:45:01 12 1
13:45:01 12 10
13:45:01 13 1
13:45:01 13 3
13:45:02 13 1
I want to merge values that share the same second and the same price, accumulating VOL, so that the resulting observable emits the following values for the source above:
TIME PRICE ACCUMULATED VOL
13:45:01 12 12
13:45:01 13 4
13:45:02 13 1
This is some kind of grouping and then reducing. I'm reading documentation but can't figure out which operators to use...
Can somebody help?

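Assuming stockSource is the real-time stream that pushes buy/sell items, you can use scan to accumulate your data and compute the accumulated volume there.
stockSource.pipe(scan((acc,curr)=>{
// match on both time and price, as the question asks
const foundObj=acc.find(obj=>obj.TIME===curr.TIME && obj.PRICE===curr.PRICE);
if(!foundObj){ return [...acc,{...curr, ACCUMULATED: curr.VOL}]}
// add the volume of this transaction rather than counting transactions
foundObj.ACCUMULATED+=curr.VOL;
return acc
},[])).subscribe()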

A Custom Operator
I think there must be a better way to do this, but it didn't come to me, so I made a custom operator.
This might be overkill? No idea. You can try it out and test it yourself if you're so inclined.
Note that for many operators, returning from(thing) is the same as returning thing. So when I return an array from concatMap, that array is turned into a stream for me.
import { concat, defer, of } from 'rxjs';
import { concatMap } from 'rxjs/operators';
function customOperator(){
return s => defer(() => {
let currentTime = "";
let buffer = {};
return s.pipe(
concatMap(({time, price, vol}) => {
let ret = [];
if(currentTime != time){
ret = Object.values(buffer);
buffer = {}
}
let accumVol = buffer[price]?.accumulatedVol;
accumVol = accumVol != null? accumVol : 0;
buffer[price] = {
time,
price,
accumulatedVol: accumVol + vol
};
currentTime = time;
return ret;
}),
s => concat(s, defer(() => Object.values(buffer)))
)
})
}
function makeTransaction(time, price, vol){
return ({time, price, vol});
}
of(
makeTransaction("13:45:01", 12, 1),
makeTransaction("13:45:01", 12, 1),
makeTransaction("13:45:01", 12, 10),
makeTransaction("13:45:01", 13, 1),
makeTransaction("13:45:01", 13, 3),
makeTransaction("13:45:02", 13, 1)
).pipe(
customOperator()
).subscribe(console.log);

Unless I misunderstood your requirements, this looks like a job for groupBy:
Generate a key to group items by time and price
There will be one stream per set
Reduce each stream into a single reduced emission to accumulate all volumes
Merge back each stream into the output
const transactions$ =
of( {time: '13:45:01', price: 12, volume: 1}
, {time: '13:45:01', price: 12, volume: 1}
, {time: '13:45:01', price: 12, volume: 10}
, {time: '13:45:01', price: 13, volume: 1}
, {time: '13:45:01', price: 13, volume: 3}
, {time: '13:45:02', price: 13, volume: 1});
const groups$ =
transactions$.pipe( groupBy(t => `${t.price}#${t.time}`)
, mergeMap(group$ =>
group$.pipe(reduce((tt, {time, price, volume}) =>
({time, price, volume: volume + tt.volume}),
{volume: 0}))));
groups$.subscribe(t => console.log(JSON.stringify(t)))
<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/7.3.0/rxjs.umd.min.js" integrity="sha512-y3JTS47nnpKORJX8Jn1Rlm+QgRIIZHtu3hWxal0e81avPrqUH48yk+aCi+gprT0RMAcpYa0WCkapxe+bpBHD6g==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<script>
const {of} = rxjs;
const {groupBy, mergeMap, reduce} = rxjs.operators;
</script>
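For a finite batch of transactions, the same idea (build a composite key from time and price, then sum the volumes per key) can be sketched in plain JavaScript without RxJS. The helper name groupTransactions is made up for this illustration; it is not an RxJS API and is only meant to show the grouping logic in isolation.

```javascript
// Hypothetical helper (not part of RxJS): groups a finite array of
// transactions by (time, price) and sums the volume of each group.
function groupTransactions(transactions) {
  const groups = new Map();
  for (const { time, price, volume } of transactions) {
    const key = `${price}#${time}`; // same composite key as the groupBy example
    const current = groups.get(key);
    if (current) {
      current.volume += volume;
    } else {
      groups.set(key, { time, price, volume });
    }
  }
  // A Map keeps insertion order, so groups come out in first-seen order
  return [...groups.values()];
}

const grouped = groupTransactions([
  { time: '13:45:01', price: 12, volume: 1 },
  { time: '13:45:01', price: 12, volume: 1 },
  { time: '13:45:01', price: 12, volume: 10 },
  { time: '13:45:01', price: 13, volume: 1 },
  { time: '13:45:01', price: 13, volume: 3 },
  { time: '13:45:02', price: 13, volume: 1 },
]);
console.log(grouped);
```

Keep in mind that the RxJS version above relies on reduce, which only emits once its group completes, so on a never-ending stream the groups would need some way to close.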

Related

How can I rewrite this countdown timer using RxPY?

Here's RxJS code I'm trying to reproduce with RxPY.
const counter$ = interval(1000);
counter$
.pipe(
mapTo(-1),
scan((accumulator, current) => {
return accumulator + current;
}, 10),
takeWhile(value => value >= 0)
)
.subscribe(console.log);
9
8
7
6
5
4
3
2
1
0
-1
And here's what I thought was equivalent, but is not:
counter = rx.interval(1)
composed = counter.pipe(
ops.map(lambda value: value - 1),
ops.scan(lambda acc, curr: acc + curr, 10),
ops.take_while(lambda value: value >= 0),
)
composed.subscribe(lambda value: print(value))
9
9
10
12
15
19
Could someone help me to understand what I'm missing here?
I don't know python at all, but I do notice one difference in your map between your js and python:
mapTo(-1) // always emits -1
-- vs --
ops.map(lambda value: value - 1) # emits interval index - 1
I think the solution is simple, just remove the "value":
ops.map(lambda value: -1)
However, if your "current value" is always -1 you can simplify by not using map at all and put -1 in your scan() function. Here's what it looks like in rxjs:
const countDown$ = interval(1000).pipe(
scan(accumulator => accumulator - 1, 10),
takeWhile(value => value >= 0)
);
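To see why the two versions diverge, here is a small library-free sketch that mimics scan over the first few interval values. The helper scanArray is hypothetical and written only for this illustration; it is not an RxJS or RxPY function.

```javascript
// Hypothetical helper mimicking scan on a plain array:
// it returns every intermediate accumulator value.
function scanArray(values, reducer, seed) {
  const out = [];
  let acc = seed;
  for (const v of values) {
    acc = reducer(acc, v);
    out.push(acc);
  }
  return out;
}

const ticks = [0, 1, 2, 3, 4, 5]; // the first values an interval emits

// map(value => value - 1): the mapped values grow with the interval index
const buggy = scanArray(ticks.map(v => v - 1), (acc, cur) => acc + cur, 10);

// mapTo(-1): every mapped value is -1
const fixed = scanArray(ticks.map(() => -1), (acc, cur) => acc + cur, 10);

console.log(buggy); // [9, 9, 10, 12, 15, 19]
console.log(fixed); // [9, 8, 7, 6, 5, 4]
```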

Group sorted array according a mean difference threshold

Problem statement: I have an array of N sorted integers and a threshold value K. I would like to group them in such a way that for each element, the difference between the group mean and the element is <= K. What is the best algorithm to use?
I've looked into Jenks' natural breaks and k-means clustering, but both of those seem better suited to a situation where you have a desired number of clusters, whereas I have a desired maximum variance per-cluster.
// example
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167]
const threshold = 10
// desired output:
// cluster(distances) =>
// [
// [8, 8, 9, 5, 16, 20]
// [29, 42]
// [56, 57, 57, 58]
// [103, 104]
// [150, 167]
// ]
Here's my progress so far: https://gist.github.com/qrohlf/785c667735171b7353702cc74c10857d
I'm probably going to try some kind of divide-and-conquer approach for correcting the 'ballpark' results I get from the implementation I currently have, but I don't really see a great, clean way to do this right now.
I searched and found this: the Unweighted Pair Group Method with Arithmetic Mean (UPGMA).
Here is an article with an example: link. I think it will help you; it looks easy to adapt to your purpose.
The UPGMA algorithm produces rooted dendrograms and requires a constant-rate assumption - that is, it assumes an ultrametric tree in which the distances from the root to every branch tip are equal.
For anyone else bumping into this, here's my (unoptimized) implementation of the UPGMA algorithm described above:
const head = array => array[0]
const tail = array => array.slice(1)
const last = array => array[array.length - 1]
const sum = array => array.reduce((a, b) => a + b)
const avg = array => sum(array) / array.length
const minIndex = array => array.reduce((iMin, x, i) => x < array[iMin] ? i : iMin, 0)
const range = length => Array.apply(null, Array(length)).map((_, i) => i)
const isArray = Array.isArray
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167, 800]
// cluster an array of numeric values such that the mean difference of each
// point within each cluster is within a threshold value
const cluster = (points, threshold = 10) => {
return _cluster(points, range(points.length).map(i => [i]), threshold).map(c =>
isArray(c) ? c.map(i => points[i]) : [points[c]])
}
// recursive call
const _cluster = (points, clusters, threshold) => {
const matrix = getDistanceMatrix(points, clusters)
// get the minimum col index for each row in the matrix
const rowMinimums = matrix.map(minIndex)
// get the index for the column containing the smallest distance
const bestRow = minIndex(rowMinimums.map((col, row) => matrix[row][col]))
const bestCol = rowMinimums[bestRow]
const isValid = isValidCluster(points, mergeClusters(clusters[bestRow], clusters[bestCol]), threshold)
if (!isValid) {
return clusters
}
return _cluster(points, merge(clusters, bestRow, bestCol), threshold)
}
const isValidCluster = (points, cluster, threshold) => {
// at this point, cluster is guaranteed to be an array, not a single point
const distances = cluster.map(i => points[i])
const mean = avg(distances)
return distances.every(d => Math.abs(mean - d) <= threshold)
}
// immutable merge of indices a and b in clusters
const merge = (clusters, a, b) => {
// merge two clusters by index
const clusterA = clusters[a]
const clusterB = clusters[b]
// optimization opportunity: this filter is causing *another* iteration
// of clusters.
const withoutPoints = clusters.filter(c => c !== clusterA && c !== clusterB)
return [mergeClusters(clusterA, clusterB)].concat(withoutPoints)
}
const mergeClusters = (clusterA, clusterB) => clusterA.concat(clusterB)
// optimization opportunity: this currently does 2x the work needed, since the
// distance from a->b is the same as the distance from b->a
const getDistanceMatrix = (points, clusters) => {
// reduce clusters to distance/average distance
const reduced = clusters.map(c => Array.isArray(c) ? avg(c.map(i => points[i])) : points[c])
return reduced.map((i, row) => reduced.map((j, col) => (row === col) ? Infinity : Math.abs(j - i)))
}
const log2DArray = rows => console.log('[\n' + rows.map(row => ' [' + row.join(', ') + ']').join('\n') + '\n]')
console.log('clustered points:')
log2DArray(cluster(distances))

Is there an RX method which combines map and filter?

I'm new to RxJS. I know I could just .filter and .map an observable to get the change I'm looking for. But, is there any method which combines the two into one function?
Yes there is.
FlatMap.
Suppose you have an Observable of numbers (1, 2, 3, 4, 5, ...) and you want to filter for even numbers and map them to x*10.
var tenTimesEvenNumbers = numbers.flatMap(function (x) {
if (x % 2 === 0) {
return Rx.Observable.just(x * 10);
} else {
return Rx.Observable.empty();
}
});
As of rxjs v6.6.7, the solution looks like this:
import { EMPTY, of } from 'rxjs';
import { mergeMap } from 'rxjs/operators';
// Initialise observable with some numbers
const numbers = of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// Pipe the observable using mergeMap
const tenTimesEvenNumbers = numbers.pipe(
mergeMap((x: number) => {
// If the number is even, return an observable containing the number multiplied by ten
// Otherwise return an empty observable
return x % 2 === 0 ? of(x * 10) : EMPTY;
})
);
// Subscribe to the observable and print the values
tenTimesEvenNumbers.subscribe((value: number) =>
console.log('Value:', value)
);
The above will print:
Value: 20
Value: 40
Value: 60
Value: 80
Value: 100
Here is a working stackblitz as well.
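The same trick exists for plain arrays, which may make the idea easier to see: returning a one-element array keeps (and maps) a value, returning an empty array drops it. This is only an array analogy using the standard Array.prototype.flatMap, not an RxJS example.

```javascript
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Even numbers map to a one-element array, odd numbers to an empty one,
// so flatMap filters and maps in a single pass.
const tenTimesEvenNumbers = numbers.flatMap(x => (x % 2 === 0 ? [x * 10] : []));

console.log(tenTimesEvenNumbers); // [20, 40, 60, 80, 100]
```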

How to make rounded percentages add up to 100%

Consider the four percentages below, represented as float numbers:
13.626332%
47.989636%
9.596008%
28.788024%
-----------
100.000000%
I need to represent these percentages as whole numbers. If I simply use Math.round(), I end up with a total of 101%.
14 + 48 + 10 + 29 = 101
If I use parseInt(), I end up with a total of 97%.
13 + 47 + 9 + 28 = 97
What's a good algorithm to represent any number of percentages as whole numbers while still maintaining a total of 100%?
Edit: After reading some of the comments and answers, there are clearly many ways to go about solving this.
In my mind, to remain true to the numbers, the "right" result is the one that minimizes the overall error, defined by how much error rounding would introduce relative to the actual value:
value rounded error decision
----------------------------------------------------
13.626332 14 2.7% round up (14)
47.989636 48 0.0% round up (48)
9.596008 10 4.0% don't round up (9)
28.788024 29 2.7% round up (29)
In case of a tie (3.33, 3.33, 3.33) an arbitrary decision can be made (e.g. 3, 4, 3).
There are many ways to do just this, provided you are not concerned about reliance on the original decimal data.
The first and perhaps most popular method would be the Largest Remainder Method
Which is basically:
Rounding everything down
Getting the difference in sum and 100
Distributing the difference by adding 1 to items in decreasing order of their decimal parts
In your case, it would go like this:
13.626332%
47.989636%
9.596008%
28.788024%
If you take the integer parts, you get
13
47
9
28
which adds up to 97, and you want to add three more. Now, you look at the decimal parts, which are
.626332%
.989636%
.596008%
.788024%
and take the largest ones until the total reaches 100. So you would get:
14
48
9
29
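The steps above can be sketched in JavaScript as follows. This is a minimal sketch that assumes the inputs are non-negative and already sum to roughly the target; the function name largestRemainder and the target parameter are choices made for this example.

```javascript
// Minimal sketch of the Largest Remainder Method described above.
function largestRemainder(values, target = 100) {
  // 1. Round everything down
  const result = values.map(v => Math.floor(v));
  // 2. Work out how many units are missing from the target
  let remaining = target - result.reduce((a, b) => a + b, 0);
  // 3. Hand out the missing units in decreasing order of the decimal parts
  const order = values
    .map((v, i) => ({ i, frac: v - Math.floor(v) }))
    .sort((a, b) => b.frac - a.frac);
  for (const { i } of order) {
    if (remaining <= 0) break;
    result[i] += 1;
    remaining -= 1;
  }
  return result;
}

const lrmResult = largestRemainder([13.626332, 47.989636, 9.596008, 28.788024]);
console.log(lrmResult); // [14, 48, 9, 29]
```

The result stays in the original order because only the indices are sorted, not the values themselves.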
Alternatively, you can simply choose to show one decimal place instead of integer values. So the numbers would be 48.3 and 23.9 etc. This would drop the variance from 100 by a lot.
Probably the "best" way to do this (quoted since "best" is a subjective term) is to keep a running (non-integral) tally of where you are, and round that value.
Then use that along with the history to work out what value should be used. For example, using the values you gave:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
13.626332   13.626332            14             0    14 ( 14 -  0)
47.989636   61.615968            62            14    48 ( 62 - 14)
 9.596008   71.211976            71            62     9 ( 71 - 62)
28.788024  100.000000           100            71    29 (100 - 71)
                                                    ---
                                                    100
At each stage, you don't round the number itself. Instead, you round the accumulated value and work out the best integer that reaches that value from the previous baseline - that baseline is the cumulative value (rounded) of the previous row.
This works because you're not losing information at each stage but rather using the information more intelligently. The 'correct' rounded values are in the final column and you can see that they sum to 100.
You can see the difference between this and blindly rounding each value, in the third value above. While 9.596008 would normally round up to 10, the accumulated 71.211976 correctly rounds down to 71 - this means that only 9 is needed to add to the previous baseline of 62.
This also works for "problematic" sequence like three roughly-1/3 values, where one of them should be rounded up:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
33.333333   33.333333            33             0    33 ( 33 -  0)
33.333333   66.666666            67            33    34 ( 67 - 33)
33.333333   99.999999           100            67    33 (100 - 67)
                                                    ---
                                                    100
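The running-tally approach from the tables can be sketched in a few lines of JavaScript. The function name cumulativeRound is made up for this example; the sketch assumes the inputs sum to (approximately) 100.

```javascript
// Sketch of the cumulative rounding described above: round the running total,
// then emit the difference from the previously rounded total (the baseline).
function cumulativeRound(values) {
  let cumulative = 0; // exact running sum
  let baseline = 0;   // rounded running sum of the previous row
  return values.map(v => {
    cumulative += v;
    const rounded = Math.round(cumulative);
    const need = rounded - baseline;
    baseline = rounded;
    return need;
  });
}

const cumulA = cumulativeRound([13.626332, 47.989636, 9.596008, 28.788024]);
const cumulB = cumulativeRound([33.333333, 33.333333, 33.333333]);
console.log(cumulA); // [14, 48, 9, 29]
console.log(cumulB); // [33, 34, 33]
```

Note that the output depends on the order of the inputs, since each value inherits the rounding error accumulated before it.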
Since none of the answers here seem to solve it properly, here's my semi-obfuscated version using underscorejs:
function foo(l, target) {
var off = target - _.reduce(l, function(acc, x) { return acc + Math.round(x) }, 0);
return _.chain(l).
sortBy(function(x) { return Math.round(x) - x }).
map(function(x, i) { return Math.round(x) + (off > i) - (i >= (l.length + off)) }).
value();
}
foo([13.626332, 47.989636, 9.596008, 28.788024], 100) // => [48, 29, 14, 9]
foo([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100) // => [17, 17, 17, 17, 16, 16]
foo([33.333, 33.333, 33.333], 100) // => [34, 33, 33]
foo([33.3, 33.3, 33.3, 0.1], 100) // => [34, 33, 33, 0]
The goal of rounding is to generate the least amount of error. When you're rounding a single value, that process is simple and straightforward and most people understand it easily. When you're rounding multiple numbers at the same time, the process gets trickier - you must define how the errors are going to combine, i.e. what must be minimized.
The well-voted answer by Varun Vohra minimizes the sum of the absolute errors, and it's very simple to implement. However there are edge cases it does not handle - what should be the result of rounding 24.25, 23.25, 27.25, 25.25? One of those needs to be rounded up instead of down. You would probably just arbitrarily pick the first or last one in the list.
Perhaps it's better to use the relative error instead of the absolute error. Rounding 23.25 up to 24 changes it by 3.2% while rounding 27.25 up to 28 only changes it by 2.8%. Now there's a clear winner.
It's possible to tweak this even further. One common technique is to square each error, so that large errors count disproportionately more than small ones. I'd also use a non-linear divisor to get the relative error - it doesn't seem right that an error at 1% is 99 times more important than an error at 99%. In the code below I've used the square root.
The complete algorithm is as follows:
Sum the percentages after rounding them all down, and subtract from 100. This tells you how many of those percentages must be rounded up instead.
Generate two error scores for each percentage, one when rounded down and one when rounded up. Take the difference between the two.
Sort the error differences produced above.
For the number of percentages that need to be rounded up, take an item from the sorted list and increment the rounded down percentage by 1.
You may still have more than one combination with the same error sum, for example 33.3333333, 33.3333333, 33.3333333. This is unavoidable, and the result will be completely arbitrary. The code I give below prefers to round up the values on the left.
Putting it all together in Python looks like this.
from math import isclose, sqrt
def error_gen(actual, rounded):
    divisor = sqrt(1.0 if actual < 1.0 else actual)
    return abs(rounded - actual) ** 2 / divisor
def round_to_100(percents):
    if not isclose(sum(percents), 100):
        raise ValueError
    n = len(percents)
    rounded = [int(x) for x in percents]
    up_count = 100 - sum(rounded)
    errors = [(error_gen(percents[i], rounded[i] + 1) - error_gen(percents[i], rounded[i]), i) for i in range(n)]
    rank = sorted(errors)
    for i in range(up_count):
        rounded[rank[i][1]] += 1
    return rounded
>>> round_to_100([13.626332, 47.989636, 9.596008, 28.788024])
[14, 48, 9, 29]
>>> round_to_100([33.3333333, 33.3333333, 33.3333333])
[34, 33, 33]
>>> round_to_100([24.25, 23.25, 27.25, 25.25])
[24, 23, 28, 25]
>>> round_to_100([1.25, 2.25, 3.25, 4.25, 89.0])
[1, 2, 3, 4, 90]
As you can see with that last example, this algorithm is still capable of delivering non-intuitive results. Even though 89.0 needs no rounding whatsoever, one of the values in that list needed to be rounded up; the lowest relative error results from rounding up that large value rather than the much smaller alternatives.
This answer originally advocated going through every possible combination of round up/round down, but as pointed out in the comments a simpler method works better. The algorithm and code reflect that simplification.
I wrote a C# rounding helper. The algorithm is the same as in Varun Vohra's answer; hope it helps.
public static List<decimal> GetPerfectRounding(List<decimal> original,
decimal forceSum, int decimals)
{
var rounded = original.Select(x => Math.Round(x, decimals)).ToList();
Debug.Assert(Math.Round(forceSum, decimals) == forceSum);
var delta = forceSum - rounded.Sum();
if (delta == 0) return rounded;
var deltaUnit = Convert.ToDecimal(Math.Pow(0.1, decimals)) * Math.Sign(delta);
List<int> applyDeltaSequence;
if (delta < 0)
{
applyDeltaSequence = original
.Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
.OrderBy(a => original[a.index] - rounded[a.index])
.ThenByDescending(a => a.index)
.Select(a => a.index).ToList();
}
else
{
applyDeltaSequence = original
.Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
.OrderByDescending(a => original[a.index] - rounded[a.index])
.Select(a => a.index).ToList();
}
Enumerable.Repeat(applyDeltaSequence, int.MaxValue)
.SelectMany(x => x)
.Take(Convert.ToInt32(delta/deltaUnit))
.ToList() // ForEach is defined on List<T>, not on IEnumerable<T>
.ForEach(index => rounded[index] += deltaUnit);
return rounded;
}
It passes the following unit test:
[TestMethod]
public void TestPerfectRounding()
{
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 2),
new List<decimal> {3.33m, 3.34m, 3.33m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.33m, 3.34m, 3.33m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 13.626332m, 47.989636m, 9.596008m, 28.788024m }, 100, 0),
new List<decimal> {14, 48, 9, 29});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 16.666m, 16.666m, 16.666m, 16.666m, 16.666m, 16.666m }, 100, 0),
new List<decimal> { 17, 17, 17, 17, 16, 16 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.333m, 33.333m, 33.333m }, 100, 0),
new List<decimal> { 34, 33, 33 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.3m, 33.3m, 33.3m, 0.1m }, 100, 0),
new List<decimal> { 34, 33, 33, 0 });
}
DO NOT sum the rounded numbers. You're going to have inaccurate results. The total could be off significantly depending on the number of terms and the distribution of fractional parts.
Display the rounded numbers but sum the actual values. Depending on how you're presenting the numbers, the actual way to do that would vary. That way you get
14
48
10
29
__
100
Any way you go you're going to have discrepancy. There's no way in your example to show numbers that add up to 100 without "rounding" one value the wrong way (least error would be changing 9.596 to 9)
EDIT
You need to choose between one of the following:
1. Accuracy of the items
2. Accuracy of the sum (if you're summing rounded values)
3. Consistency between the rounded items and the rounded sum
Most of the time when dealing with percentages #3 is the best option because it's more obvious when the total equals 101% than when the individual items don't total to 100, and you keep the individual items accurate. "Rounding" 9.596 to 9 is inaccurate in my opinion.
To explain this I sometimes add a footnote that explains that the individual values are rounded and may not total 100% - anyone that understands rounding should be able to understand that explanation.
You could try keeping track of your error due to rounding, and then rounding against the grain if the accumulated error is greater than the fractional portion of the current number.
13.62 -> 14 (+.38)
47.98 -> 48 (+.02 (+.40 total))
9.59 -> 10 (+.41 (+.81 total))
28.78 -> 28 (round down because .81 > .78)
------------
100
Not sure if this would work in general, but it seems to work similar if the order is reversed:
28.78 -> 29 (+.22)
9.59 -> 9 (-.37; rounded down because .59 > .22)
47.98 -> 48 (-.35)
13.62 -> 14 (+.03)
------------
100
I'm sure there are edge cases where this might break down, but any approach is going to be at least somewhat arbitrary since you're basically modifying your input data.
I'm not sure what level of accuracy you need, but what I would do is simply add 1 to the first n numbers, n being the ceil of the total sum of decimals. In this case that is 3, so I would add 1 to the first 3 items and floor the rest. Of course this is not super accurate, and some numbers might be rounded up or down when they shouldn't be, but it works okay and will always result in 100%.
So [ 13.626332, 47.989636, 9.596008, 28.788024 ] would be [14, 48, 10, 28] because Math.ceil(.626332+.989636+.596008+.788024) == 3
function evenRound( arr ) {
var sum = arr.map(function( a ){ return a % 1 })
.reduce(function( a,b ){ return a + b });
// Ceil of total sum of decimals; the small epsilon keeps floating-point
// noise (e.g. 3.0000000000000004) from bumping the count to 4
var decimal = Math.ceil( sum - 1e-9 );
for ( var i = 0; i < decimal; ++i ) {
arr[ i ] = ++arr[ i ]; // compensate error by adding 1 to the first n items
}
return arr.map(function( a ){ return ~~a }); // floor all numbers
}
var nums = evenRound( [ 13.626332, 47.989636, 9.596008, 28.788024 ] );
var total = nums.reduce(function( a,b ){ return a + b }); //=> 100
You can always inform users that the numbers are rounded and may not be super-accurate...
I once wrote an unround tool, to find the minimal perturbation to a set of numbers to match a goal. It was a different problem, but one could in theory use a similar idea here. In this case, we have a set of choices.
Thus for the first element, we can either round it up to 14, or down to 13. The cost (in a binary integer programming sense) of doing so is less for the round up than the round down, because the round down requires we move that value a larger distance. Similarly, we can round each number up or down, so there are a total of 16 choices we must choose from.
13.626332
47.989636
9.596008
+ 28.788024
-----------
100.000000
I'd normally solve the general problem in MATLAB, here using bintprog, a binary integer programming tool, but there are only a few choices to be tested, so it is easy enough with simple loops to test out each of the 16 alternatives. For example, suppose we were to round this set as:
Original Rounded Absolute error
13.626 13 0.62633
47.99 48 0.01036
9.596 10 0.40399
+ 28.788 29 0.21198
---------------------------------------
100.000 100 1.25266
The total absolute error made is 1.25266. It can be reduced slightly by the following alternative rounding:
Original Rounded Absolute error
13.626 14 0.37367
47.99 48 0.01036
9.596 9 0.59601
+ 28.788 29 0.21198
---------------------------------------
100.000 100 1.19202
In fact, this will be the optimal solution in terms of the absolute error. Of course, if there were 20 terms, the search space will be of size 2^20 = 1048576. For 30 or 40 terms, that space will be of significant size. In that case, you would need to use a tool that can efficiently search the space, perhaps using a branch and bound scheme.
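For small inputs, the exhaustive search over all round-up/round-down choices is easy to write down. The following is only a sketch in JavaScript (the answer itself mentions MATLAB's bintprog); the function name bestRoundingBruteForce is made up for this example, and the bitmask loop limits it to roughly 30 values at most.

```javascript
// Brute-force sketch: try every combination of rounding each value up or down,
// keep only combinations that hit the target sum, and pick the one with the
// smallest total absolute error.
function bestRoundingBruteForce(values, target = 100) {
  const n = values.length;
  let best = null;
  for (let mask = 0; mask < (1 << n); mask++) {
    // bit i set => round value i up, otherwise round it down
    const candidate = values.map((v, i) =>
      (mask & (1 << i)) ? Math.ceil(v) : Math.floor(v));
    const sum = candidate.reduce((a, b) => a + b, 0);
    if (sum !== target) continue;
    const error = candidate.reduce(
      (acc, r, i) => acc + Math.abs(r - values[i]), 0);
    if (best === null || error < best.error) {
      best = { rounded: candidate, error };
    }
  }
  return best; // null if no combination reaches the target
}

const bruteForceBest = bestRoundingBruteForce(
  [13.626332, 47.989636, 9.596008, 28.788024]);
console.log(bruteForceBest.rounded); // [14, 48, 9, 29]
console.log(bruteForceBest.error.toFixed(5)); // "1.19202"
```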
I think the following will achieve what you are after
function func( orig, target ) {
var i = orig.length, j = 0, total = 0, change, newVals = [], next, factor1, factor2, len = orig.length, marginOfErrors = [];
// map original values to new array
while( i-- ) {
total += newVals[i] = Math.round( orig[i] );
}
change = total < target ? 1 : -1;
while( total !== target ) {
// Iterate through values and select the one that once changed will introduce
// the least margin of error in terms of itself. e.g. Incrementing 10 by 1
// would mean an error of 10% in relation to the value itself.
for( i = 0; i < len; i++ ) {
next = i === len - 1 ? 0 : i + 1;
factor2 = errorFactor( orig[next], newVals[next] + change );
factor1 = errorFactor( orig[i], newVals[i] + change );
if( factor1 > factor2 ) {
j = next;
}
}
newVals[j] += change;
total += change;
}
for( i = 0; i < len; i++ ) { marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i]; }
// Math.round() causes some problems as it is difficult to know at the beginning
// whether numbers should have been rounded up or down to reduce total margin of error.
// This section of code increments and decrements values by 1 to find the number
// combination with least margin of error.
for( i = 0; i < len; i++ ) {
for( j = 0; j < len; j++ ) {
if( j === i ) continue;
var roundUpFactor = errorFactor( orig[i], newVals[i] + 1) + errorFactor( orig[j], newVals[j] - 1 );
var roundDownFactor = errorFactor( orig[i], newVals[i] - 1) + errorFactor( orig[j], newVals[j] + 1 );
var sumMargin = marginOfErrors[i] + marginOfErrors[j];
if( roundUpFactor < sumMargin) {
newVals[i] = newVals[i] + 1;
newVals[j] = newVals[j] - 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
if( roundDownFactor < sumMargin ) {
newVals[i] = newVals[i] - 1;
newVals[j] = newVals[j] + 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
}
}
function errorFactor( oldNum, newNum ) {
return Math.abs( oldNum - newNum ) / oldNum;
}
return newVals;
}
func([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100); // => [16, 16, 17, 17, 17, 17]
func([33.333, 33.333, 33.333], 100); // => [34, 33, 33]
func([33.3, 33.3, 33.3, 0.1], 100); // => [34, 33, 33, 0]
func([13.25, 47.25, 11.25, 28.25], 100 ); // => [13, 48, 11, 28]
func( [25.5, 25.5, 25.5, 23.5], 100 ); // => [25, 25, 26, 24]
One last thing, I ran the function using the numbers originally given in the question to compare to the desired output
func([13.626332, 47.989636, 9.596008, 28.788024], 100); // => [48, 29, 13, 10]
This was different to what the question wanted => [ 48, 29, 14, 9]. I couldn't understand this until I looked at the total margin of error
-------------------------------------------------
| original | question | % diff | mine | % diff |
-------------------------------------------------
| 13.626332 | 14 | 2.74% | 13 | 4.5% |
| 47.989636 | 48 | 0.02% | 48 | 0.02% |
| 9.596008 | 9 | 6.2% | 10 | 4.2% |
| 28.788024 | 29 | 0.7% | 29 | 0.7% |
-------------------------------------------------
| Totals | 100 | 9.66% | 100 | 9.43% |
-------------------------------------------------
Essentially, the result from my function actually introduces the least amount of error.
Fiddle here
Note: the selected answer changes the array order, which is not ideal. Here I provide a few variations that achieve the same result while keeping the array in order.
Discussion
Given [98.88, .56, .56], how do you want to round it? You have four options:
1- round things up and subtract what is added from the rest of the numbers, so the result becomes [98, 1, 1]
this could be a good answer, but what if we have [97.5, .5, .5, .5, .5, .5]? then you need to round it up to [95, 1, 1, 1, 1, 1]
do you see how it goes? if you add more 0-like numbers, you will lose more value from the rest of your numbers. this could be very troublesome when you have a big array of zero-like number like [40, .5, .5 , ... , .5]. when you round up this, you could end up with an array of ones: [1, 1, .... , 1]
so round-up isn't a good option.
2- you round down the numbers. so [98.88, .56, .56] becomes [98, 0, 0], then you are 2 less than 100. you ignore anything that is already 0, then add up the difference to the biggest numbers. so bigger numbers will get more.
3- same as previous, round down numbers, but you sort descending based on the decimals, divide up the diff based on the decimal, so biggest decimal will get the diff.
4- you round up, but you add what you added to the next number. so like a wave what you have added will be redirected to the end of your array. so [98.88, .56, .56] becomes [99, 0, 1]
None of these are ideal, so be mindful that your data is going to lose its shape.
Here I provide code for cases 2, 3 and 4 (case 1 is not practical when you have a lot of near-zero numbers). It's modern JS (with a few TypeScript type annotations) and doesn't need any library.
2nd case
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [ 14, 48, 9, 29 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [ 17, 17, 17, 17, 16, 16 ] 
const v3 = [33.333, 33.333, 33.333] // => [ 34, 33, 33 ]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [ 34, 33, 33, 0 ]
const v5 = [98.88, .56, .56] // =>[ 100, 0, 0 ]
const v6 = [97.5, .5, .5, .5, .5, .5] // => [ 100, 0, 0, 0, 0, 0 ]
const normalizePercentageByNumber = (input) => {
const rounded: number[] = input.map(x => Math.floor(x));
const afterRoundSum = rounded.reduce((pre, curr) => pre + curr, 0);
const countMutableItems = rounded.filter(x => x >=1).length;
const errorRate = 100 - afterRoundSum;
const deductPortion = Math.ceil(errorRate / countMutableItems);
const biggest = [...rounded].sort((a, b) => b - a).slice(0, Math.min(Math.abs(errorRate), countMutableItems));
const result = rounded.map(x => {
const indexOfX = biggest.indexOf(x);
if (indexOfX >= 0) {
x += deductPortion;
biggest.splice(indexOfX, 1);
return x;
}
return x;
});
return result;
}
3rd case
const normalizePercentageByDecimal = (input: number[]) => {
const rounded= input.map((x, i) => ({number: Math.floor(x), decimal: x%1, index: i }));
const decimalSorted= [...rounded].sort((a,b)=> b.decimal-a.decimal);
const sum = rounded.reduce((pre, curr)=> pre + curr.number, 0) ;
const error= 100-sum;
for (let i = 0; i < error; i++) {
const element = decimalSorted[i];
element.number++;
}
const result= [...decimalSorted].sort((a,b)=> a.index-b.index);
return result.map(x=> x.number);
}
4th case
You just need to calculate how much each rounding added or removed, and then add or subtract that amount again in the next item.
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [14, 48, 10, 28 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [17, 16, 17, 16, 17, 17]
const v3 = [33.333, 33.333, 33.333] // => [33, 34, 33]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [33, 34, 33, 0]
const normalizePercentageByWave= v4.reduce((pre, curr, i, arr) => {
let number = Math.round(curr + pre.decimal);
let total = pre.total + number;
const decimal = curr - number;
if (i == arr.length - 1 && total < 100) {
const diff = 100 - total;
total += diff;
number += diff;
}
return { total, numbers: [...pre.numbers, number], decimal };
}, { total: 0, numbers: [], decimal: 0 });
If you have just two options, you are fine using Math.round(). The only problematic pairs are values ending in .5 (e.g. 37.5 and 62.5): both values get rounded up and you end up with 101%, as you can try here:
https://jsfiddle.net/f8np1t0k/2/
Since you always need to show 100%, you simply remove one percentage point from one of them, for example from the first one:
// assumes a is the raw percentage and aRounded = Math.round(a)
const correctedARounded = Number.isInteger(a - 0.5) ? aRounded - 1 : aRounded
Or you can favor the option with more % votes.
The 1% error occurs 114 times in 10k cases of divisions between pairs of values from 1 to 100.
My JS implementation for the well-voted answer by Varun Vohra
const set1 = [13.626332, 47.989636, 9.596008, 28.788024];
// const set2 = [24.25, 23.25, 27.25, 25.25];
const values = set1;
console.log('Total: ', values.reduce((accum, each) => accum + each));
console.log('Incorrectly Rounded: ',
values.reduce((accum, each) => accum + Math.round(each), 0));
const adjustValues = (values) => {
// 1. Separate integer and decimal part
// 2. Store both in a new array of objects sorted by decimal part descending
// 3. Add in original position to "put back" at the end
const flooredAndSortedByDecimal = values.map((value, position) => (
{
floored: Math.floor(value),
decimal: value - Math.floor(value),
position
}
)).sort(({decimal}, {decimal: otherDecimal}) => otherDecimal - decimal);
const roundedTotal = values.reduce((total, value) => total + Math.floor(value), 0);
let availableForDistribution = 100 - roundedTotal;
// Add 1 to each value from what's available
const adjustedValues = flooredAndSortedByDecimal.map(value => {
const { floored, ...rest } = value;
let finalPercentage = floored;
if(availableForDistribution > 0){
finalPercentage = floored + 1;
availableForDistribution--;
}
return {
finalPercentage,
...rest
}
});
// Put back and return the new values
return adjustedValues
.sort(({position}, {position: otherPosition}) => position - otherPosition)
.map(({finalPercentage}) => finalPercentage);
}
const finalPercentages = adjustValues(values);
console.log({finalPercentages})
// { finalPercentages: [14, 48, 9, 29] }
Or something like this for brevity, where you just accumulate the error...
const p = [13.626332, 47.989636, 9.596008, 28.788024];
const round = (a, e = 0) => a.map(x => { const r = Math.round(x + e); e += x - r; return r; });
console.log(round(p));
Result: [14, 48, 9, 29]
If you are rounding, there is no good way to get exactly the same total in all cases.
You can take the decimal parts of the N percentages you have (in the example you gave, N is 4).
Add up the decimal parts. In your example the fractional parts total 3.
Ceil the 3 numbers with the highest fractions and floor the rest.
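The steps above can be sketched in Python roughly as follows. This is only an illustration: the function name round_by_largest_fractions is made up for this example, and it assumes the inputs are non-negative and sum to 100, so that the fractional parts add up to a whole number (round() is used only to absorb floating-point noise).

```python
import math

def round_by_largest_fractions(percentages):
    """Floor every value, then bump the k values with the largest fractional
    parts by one, where k is the sum of all fractional parts."""
    floors = [math.floor(p) for p in percentages]
    fractions = [p - f for p, f in zip(percentages, floors)]
    # Assumption: the input sums to 100, so this sum is (almost) an integer.
    k = round(sum(fractions))
    # Indices ordered by fractional part, largest first.
    order = sorted(range(len(percentages)), key=lambda i: fractions[i], reverse=True)
    result = floors[:]
    for i in order[:k]:
        result[i] += 1  # "ceil" the k values with the highest fractions
    return result

print(round_by_largest_fractions([13.626332, 47.989636, 9.596008, 28.788024]))
# [14, 48, 9, 29]
```

Sorting indices rather than the fractions themselves keeps equal fractions from being mapped to the same position.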
If you really must round them, there are already very good suggestions here (largest remainder, least relative error, and so on).
One good reason not to round has also already been given (you will get at least one number that "looks better" but is "wrong"), along with a way to handle that (warn your readers), which is what I do.
Let me add to the "wrong" number part.
Suppose you have three events/entities/... with some percentages that you approximate as:
DAY 1
who | real | app
----|-------|------
A | 33.34 | 34
B | 33.33 | 33
C | 33.33 | 33
Later on the values change slightly, to
DAY 2
who | real | app
----|-------|------
A | 33.35 | 33
B | 33.36 | 34
C | 33.29 | 33
The first table has the already mentioned problem of having a "wrong" number: 33.34 is closer to 33 than to 34.
But now you have a bigger error. Comparing day 2 to day 1, the real percentage value for A increased by 0.01%, but the approximation shows a decrease of 1%.
That is a qualitative error, probably much worse than the initial quantitative error.
One could devise an approximation for the whole set, but you may have to publish data on day one, when you do not yet know about day two. So unless you really, really must approximate, you are probably better off not doing so.
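To make the day 1 / day 2 effect concrete, here is a small Python sketch that applies a largest-remainder rounding to both tables. The helper largest_remainder is a hypothetical illustration written for this example, not code from any answer here, and it assumes each day's values sum to 100.

```python
import math

def largest_remainder(percentages, total=100):
    """Floor all values, then give the missing units to the largest fractions."""
    floors = [math.floor(p) for p in percentages]
    shortfall = total - sum(floors)
    order = sorted(range(len(percentages)),
                   key=lambda i: percentages[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

day1 = {"A": 33.34, "B": 33.33, "C": 33.33}
day2 = {"A": 33.35, "B": 33.36, "C": 33.29}
app1 = dict(zip(day1, largest_remainder(list(day1.values()))))
app2 = dict(zip(day2, largest_remainder(list(day2.values()))))

for who in day1:
    real_change = round(day2[who] - day1[who], 2)
    shown_change = app2[who] - app1[who]
    print(who, real_change, shown_change)
```

For A the real value goes up by 0.01 while the displayed value drops from 34 to 33, which is the sign flip described above.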
Here's a simpler Python implementation of @varun-vohra's answer:
import itertools
import math
import operator

def apportion_pcts(pcts, total):
    # Exact (fractional) share of the total for each percentage
    proportions = [total * (pct / 100) for pct in pcts]
    apportions = [math.floor(p) for p in proportions]
    # Units still missing after flooring
    remainder = total - sum(apportions)
    remainders = [(i, p - math.floor(p)) for (i, p) in enumerate(proportions)]
    remainders.sort(key=operator.itemgetter(1), reverse=True)
    # Hand out the missing units, largest fractional part first
    for (i, _) in itertools.cycle(remainders):
        if remainder == 0:
            break
        else:
            apportions[i] += 1
            remainder -= 1
    return apportions
You need math, itertools, operator.
Check whether this is valid; as far as my test cases go, I am able to get this working.
Let's say the number is k.
Sort the percentages in descending order.
Iterate over each percentage in descending order.
Calculate that percentage of k for the first percentage and take Math.ceil of the output.
Next, k = k - 1.
Iterate until all percentages are consumed.
I have implemented the method from Varun Vohra's answer here for both lists and dicts.
import math
import numbers
import operator
import itertools
def round_list_percentages(number_list):
    """
    Takes a list where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    if not all(isinstance(i, numbers.Number) for i in number_list):
        raise ValueError('All values of the list must be a number')
    # Generate a key for each value
    key_generator = itertools.count()
    value_dict = {next(key_generator): value for value in number_list}
    return round_dictionary_percentages(value_dict).values()

def round_dictionary_percentages(dictionary):
    """
    Takes a dictionary where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    # Only allow numbers
    if not all(isinstance(i, numbers.Number) for i in dictionary.values()):
        raise ValueError('All values of the dictionary must be a number')
    # Make sure the sum is close enough to 100
    # Round value_sum to 2 decimals to avoid floating point representation errors
    value_sum = round(sum(dictionary.values()), 2)
    if not value_sum == 100:
        raise ValueError('The sum of the values must be 100')
    # Initial floored results
    # Does not add up to 100, so we need to add something
    result = {key: int(math.floor(value)) for key, value in dictionary.items()}
    # Remainders for each key
    result_remainders = {key: value % 1 for key, value in dictionary.items()}
    # Keys sorted by remainder (biggest first)
    sorted_keys = [key for key, value in sorted(result_remainders.items(), key=operator.itemgetter(1), reverse=True)]
    # Otherwise add missing values up to 100
    # One cycle is enough, since flooring removes a max value of < 1 per item,
    # i.e. this loop should always break before going through the whole list
    for key in sorted_keys:
        if sum(result.values()) == 100:
            break
        result[key] += 1
    # Return
    return result
For those having the percentages in a pandas Series, here is my implementation of the Largest Remainder method (as in Varun Vohra's answer), where you can even select the number of decimals to which you want to round.
import numpy as np

def largestRemainderMethod(pd_series, decimals=1):
    # np.int was removed in NumPy 1.24; the built-in int is the drop-in replacement
    floor_series = ((10**decimals * pd_series).astype(int)).apply(np.floor)
    diff = 100 * (10**decimals) - floor_series.sum().astype(int)
    series_decimals = pd_series - floor_series / (10**decimals)
    series_sorted_by_decimals = series_decimals.sort_values(ascending=False)
    # Mark the `diff` entries with the largest remainders with 1, the rest with 0
    for i in range(0, len(series_sorted_by_decimals)):
        if i < diff:
            series_sorted_by_decimals.iloc[[i]] = 1
        else:
            series_sorted_by_decimals.iloc[[i]] = 0
    out_series = ((floor_series + series_sorted_by_decimals) / (10**decimals)).sort_values(ascending=False)
    return out_series
Here's a Ruby gem that implements the Largest Remainder method:
https://github.com/jethroo/lare_round
To use:
a = Array.new(3){ BigDecimal('0.3334') }
# => [#<BigDecimal:887b6c8,'0.3334E0',9(18)>, #<BigDecimal:887b600,'0.3334E0',9(18)>, #<BigDecimal:887b4c0,'0.3334E0',9(18)>]
a = LareRound.round(a,2)
# => [#<BigDecimal:8867330,'0.34E0',9(36)>, #<BigDecimal:8867290,'0.33E0',9(36)>, #<BigDecimal:88671f0,'0.33E0',9(36)>]
a.reduce(:+).to_f
# => 1.0
I wrote a function in Javascript that takes an array of percentages and outputs an array with rounded percentages using the Largest Remainder Method. It doesn't use any libraries.
Input: [21.6, 46.7, 31, 0.5, 0.2]
Output: [22, 47, 31, 0, 0]
const values = [21.6, 46.7, 31, 0.5, 0.2];
console.log(roundPercentages(values));
function roundPercentages(values) {
const flooredValues = values.map(e => Math.floor(e));
const remainders = values.map(e => e - Math.floor(e));
const totalRemainder = 100 - flooredValues.reduce((a, b) => a + b);
// Pair each remainder with its index so that equal remainders map to distinct positions
remainders
.map((remainder, index) => ({ remainder, index }))
// Sort from highest to lowest remainder
.sort((a, b) => b.remainder - a.remainder)
// Get the n largest remainder values, where n = totalRemainder
.slice(0, totalRemainder)
// Add 1 to the floored percentages with the highest remainder (divide the total remainder)
.forEach(({ index }) => flooredValues[index] += 1);
return flooredValues;
}
This is a case for banker's rounding, aka 'round half-even'. It is supported by BigDecimal. Its purpose is to ensure that rounding balances out, i.e. doesn't favour either the bank or the customer.
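As a rough illustration of round half-even, here is a Python sketch using the standard decimal module as a stand-in for BigDecimal (that substitution, and the helper name round_half_even, are assumptions made for this example). Half-even only changes how exact ties are resolved, so in this sketch it fixes a pair such as 37.5 and 62.5, but on its own it does not force an arbitrary list to sum to 100.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(values):
    """Round each value to an integer, sending exact .5 ties to the even neighbour."""
    return [int(Decimal(str(v)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
            for v in values]

pair = round_half_even([37.5, 62.5])              # one tie goes up, the other down
thirds = round_half_even([33.333, 33.333, 33.333])  # no ties involved here

print(pair, sum(pair))
print(thirds, sum(thirds))
```

In the second case every value rounds down, so a remainder-distribution step is still needed if the total must be exactly 100.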

Partition a collection into "k" close-to-equal pieces (Scala, but language agnostic)

Defined before this block of code:
dataset can be a Vector or List
numberOfSlices is an Int denoting how many "times" to slice dataset
I want to split the dataset into numberOfSlices slices, distributed as evenly as possible. By "split" I guess I mean "partition" (intersection of all should be empty, union of all should be the original) to use the set theory term, though this is not necessarily a set, just an arbitrary collection.
e.g.
dataset = List(1, 2, 3, 4, 5, 6, 7)
numberOfSlices = 3
slices == ListBuffer(Vector(1, 2), Vector(3, 4), Vector(5, 6, 7))
Is there a better way to do it than what I have below? (which I'm not even sure is optimal...)
Or perhaps this is not an algorithmically feasible endeavor, in which case any known good heuristics?
val slices = new ListBuffer[Vector[Int]]
val stepSize = dataset.length / numberOfSlices
var currentStep = 0
var looper = 0
while (looper != numberOfSlices) {
if (looper != numberOfSlices - 1) {
slices += dataset.slice(currentStep, currentStep + stepSize)
currentStep += stepSize
} else {
slices += dataset.slice(currentStep, dataset.length)
}
looper += 1
}
If the behavior of xs.grouped(xs.size / n) doesn't work for you, it's pretty easy to define exactly what you want. The quotient is the size of the smaller pieces, and the remainder is the number of the bigger pieces:
def cut[A](xs: Seq[A], n: Int) = {
val (quot, rem) = (xs.size / n, xs.size % n)
val (smaller, bigger) = xs.splitAt(xs.size - rem * (quot + 1))
smaller.grouped(quot) ++ bigger.grouped(quot + 1)
}
The typical "optimal" partition calculates an exact fractional length after cutting and then rounds to find the actual number to take:
def cut[A](xs: Seq[A], n: Int):Vector[Seq[A]] = {
val m = xs.length
val targets = (0 to n).map{x => math.round((x.toDouble*m)/n).toInt}
def snip(xs: Seq[A], ns: Seq[Int], got: Vector[Seq[A]]): Vector[Seq[A]] = {
if (ns.length<2) got
else {
val (i,j) = (ns.head, ns.tail.head)
snip(xs.drop(j-i), ns.tail, got :+ xs.take(j-i))
}
}
snip(xs, targets, Vector.empty)
}
This way your longer and shorter blocks will be interspersed, which is often more desirable for evenness:
scala> cut(List(1,2,3,4,5,6,7,8,9,10),4)
res5: Vector[Seq[Int]] =
Vector(List(1, 2, 3), List(4, 5), List(6, 7, 8), List(9, 10))
You can even cut more times than you have elements:
scala> cut(List(1,2,3),5)
res6: Vector[Seq[Int]] =
Vector(List(1), List(), List(2), List(), List(3))
Here's a one-liner that does the job for me, using the familiar Scala trick of a recursive function that returns a Stream. Notice the use of (x+k/2)/k to round the chunk sizes, intercalating the smaller and larger chunks in the final list, all with sizes with at most one element of difference. If you round up instead, with (x+k-1)/k, you move the smaller blocks to the end, and x/k moves them to the beginning.
def k_folds(k: Int, vv: Seq[Int]): Stream[Seq[Int]] =
if (k > 1)
vv.take((vv.size+k/2)/k) +: k_folds(k-1, vv.drop((vv.size+k/2)/k))
else
Stream(vv)
Demo:
scala> val indices = scala.util.Random.shuffle(1 to 39)
scala> for (ff <- k_folds(7, indices)) println(ff)
Vector(29, 8, 24, 14, 22, 2)
Vector(28, 36, 27, 7, 25, 4)
Vector(6, 26, 17, 13, 23)
Vector(3, 35, 34, 9, 37, 32)
Vector(33, 20, 31, 11, 16)
Vector(19, 30, 21, 39, 5, 15)
Vector(1, 38, 18, 10, 12)
scala> for (ff <- k_folds(7, indices)) println(ff.size)
6
6
5
6
5
6
5
scala> for (ff <- indices.grouped((indices.size+7-1)/7)) println(ff)
Vector(29, 8, 24, 14, 22, 2)
Vector(28, 36, 27, 7, 25, 4)
Vector(6, 26, 17, 13, 23, 3)
Vector(35, 34, 9, 37, 32, 33)
Vector(20, 31, 11, 16, 19, 30)
Vector(21, 39, 5, 15, 1, 38)
Vector(18, 10, 12)
scala> for (ff <- indices.grouped((indices.size+7-1)/7)) println(ff.size)
6
6
6
6
6
6
3
Notice how grouped does not try to even out the size of all the sub-lists.
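The effect of the three rounding choices mentioned above, (x+k/2)/k, (x+k-1)/k and x/k, can be checked with a small Python sketch that only computes the chunk sizes. The function fold_sizes and its offset parameter are hypothetical names introduced for this illustration; it mirrors the recursion of k_folds using integer division.

```python
def fold_sizes(n, k, offset):
    """Sizes of k consecutive chunks of n items.

    At each step the next chunk takes (remaining + offset(k)) // k items,
    where k is the number of chunks still to be produced.
    """
    sizes = []
    while k > 1:
        take = (n + offset(k)) // k
        sizes.append(take)
        n -= take
        k -= 1
    sizes.append(n)  # the last chunk gets whatever is left
    return sizes

nearest = fold_sizes(39, 7, lambda k: k // 2)   # (x + k/2) / k
round_up = fold_sizes(39, 7, lambda k: k - 1)   # (x + k - 1) / k
round_down = fold_sizes(39, 7, lambda k: 0)     # x / k

print(nearest)     # [6, 6, 5, 6, 5, 6, 5]
print(round_up)    # [6, 6, 6, 6, 5, 5, 5]
print(round_down)  # [5, 5, 5, 6, 6, 6, 6]
```

All three variants keep the sizes within one element of each other; only the position of the smaller chunks changes.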
Here is my take on the problem:
def partition[T](items: Seq[T], partitionsCount: Int): List[Seq[T]] = {
val minPartitionSize = items.size / partitionsCount
val extraItemsCount = items.size % partitionsCount
def loop(unpartitioned: Seq[T], acc: List[Seq[T]], extra: Int): List[Seq[T]] =
if (unpartitioned.nonEmpty) {
val (splitIndex, newExtra) = if (extra > 0) (minPartitionSize + 1, extra - 1) else (minPartitionSize, extra)
val (newPartition, remaining) = unpartitioned.splitAt(splitIndex)
loop(remaining, newPartition :: acc, newExtra)
} else acc
loop(items, List.empty, extraItemsCount).reverse
}
It's more verbose than some of the other solutions but hopefully more clear as well. reverse is only necessary if you want the order to be preserved.
As Kaito mentions, grouped is exactly what you are looking for. But if you just want to know how to implement such a method, there are many ways ;-). You could, for example, do it like this:
def grouped[A](xs: List[A], size: Int) = {
def grouped[A](xs: List[A], size: Int, result: List[List[A]]): List[List[A]] = {
if(xs.isEmpty) {
result
} else {
val (slice, rest) = xs.splitAt(size)
grouped(rest, size, result :+ slice)
}
}
grouped(xs, size, Nil)
}
I'd approach it this way: given n elements and m partitions (n > m), either n mod m == 0, in which case each partition will have n/m elements, or n mod m == y, in which case each partition starts with n/m elements (integer division) and you have to distribute the y leftover elements over some of the m partitions.
You'll have y slots with n/m+1 elements and (m-y) slots with n/m. How you distribute them is your choice.
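A minimal Python sketch of this counting argument, assuming the extra elements go to the first y partitions (that placement is just one possible choice, and the function name partition is made up for this example):

```python
def partition(items, m):
    """Split items into m consecutive parts whose sizes differ by at most one."""
    q, y = divmod(len(items), m)   # q = n / m (integer division), y = n mod m
    parts, start = [], 0
    for i in range(m):
        size = q + 1 if i < y else q   # the first y slots get one extra element
        parts.append(items[start:start + size])
        start += size
    return parts

print(partition([1, 2, 3, 4, 5, 6, 7], 3))
# [[1, 2, 3], [4, 5], [6, 7]]
```

Changing the condition i < y to i >= m - y would put the larger slots at the end instead.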

Resources