d3 stack on data without header


My CSV is dynamically generated and doesn't have any headers because the number of columns and rows varies with each run. An example is below:
A, 30, 40, 35, 25
B, 25, 35, 45, 35
With headers, it would look like this:
Age1, Age2, Age1, Age2
A, 30, 40, 35, 25
B, 25, 35, 45, 35
For each row the data comes in pairs, i.e. columns 1 and 2 need to be stacked together, and columns 3 and 4 need to be stacked together. The goal is a clustered stacked bar chart with A and B on the x axis and two stacked bars for each of them, one per pair.
I was trying to follow the example at https://bl.ocks.org/SpaceActuary/6233700e7f443b719855a227f4749ee5 but I can't work out how to use the stack function in the absence of headers/keys.

You can use d3.text to load the CSV data, and then iterate over the text to create an array of objects with named values, which could then be stacked or whatever you would normally do in D3 with your data
// d3 v4 callback style; in d3 v5+ d3.text returns a promise, so use d3.text("data.csv").then(function(text) { ... })
d3.text("data.csv", function(text) {
  var data = []
  d3.csvParseRows(text).forEach(function(row) {
    let obj = {}
    row.forEach(function(value, i) {
      // assume the first value is the index or name for the row, e.g. A, B, etc.
      if (i == 0) {
        obj.index = value
        return
      }
      // columns 1-2 form pair 0, columns 3-4 form pair 1, and so on
      let pairIndex = Math.floor((i - 1) / 2)
      // odd columns are "age1", even columns are "age2"
      let key = (i % 2 == 0 ? "age2-" : "age1-") + pairIndex
      // +value converts the parsed string (e.g. " 30") to a number
      obj[key] = +value
    })
    data.push(obj)
  })
  console.log(data)
  // continue with your code
})
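Once each row is an object with generated keys, the keys can be grouped per pair and each pair stacked on its own. The sketch below does that by hand in plain JavaScript so it runs without d3; `pairKeys` and `stackPairs` are made-up helper names, and the `{ key, y0, y1 }` segments only loosely mirror what d3.stack produces. With d3 itself you would presumably call d3.stack().keys(...) once per pair of keys instead.

```javascript
// Rows shaped like the objects built from the headerless CSV (values already numeric).
const data = [
  { index: "A", "age1-0": 30, "age2-0": 40, "age1-1": 35, "age2-1": 25 },
  { index: "B", "age1-0": 25, "age2-0": 35, "age1-1": 45, "age2-1": 35 },
];

// Hypothetical helper: group the generated keys by pair number,
// e.g. [["age1-0", "age2-0"], ["age1-1", "age2-1"]].
const pairKeys = (row) => {
  const pairs = [];
  Object.keys(row).forEach((key) => {
    if (key === "index") return;
    const pairIndex = Number(key.split("-")[1]);
    (pairs[pairIndex] = pairs[pairIndex] || []).push(key);
  });
  return pairs;
};

// Hypothetical helper: for every row and every pair, build segments stacked from zero.
const stackPairs = (rows) =>
  rows.map((row) => ({
    index: row.index,
    bars: pairKeys(row).map((keys) => {
      let y0 = 0;
      return keys.map((key) => {
        const segment = { key, y0, y1: y0 + row[key] };
        y0 = segment.y1;
        return segment;
      });
    }),
  }));

const stacked = stackPairs(data);
console.log(JSON.stringify(stacked[0].bars[0]));
```

Each entry in `bars` is one stacked bar, so every row (A, B) ends up with two bars that can be drawn side by side as a cluster.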


Group sorted array according a mean difference threshold

Problem statement: I have an array of N sorted integers and a threshold value K. I would like to group them in such a way that for each element, the difference between the group mean and the element is <= K. What is the best algorithm to use?
I've looked into Jenks' natural breaks and k-means clustering, but both of those seem better suited to a situation where you have a desired number of clusters, whereas I have a desired maximum variance per-cluster.
// example
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167]
const threshold = 10
// desired output:
// cluster(distances) =>
// [
// [8, 8, 9, 5, 16, 20]
// [29, 42]
// [56, 57, 57, 58]
// [103, 104]
// [150, 167]
// ]
Here's my progress so far: https://gist.github.com/qrohlf/785c667735171b7353702cc74c10857d
I'm probably going to try some kind of divide-and-conquer approach for correcting the 'ballpark' results I get from the implementation I currently have, but I don't really see a great, clean way to do this right now.

I searched and I found this: Unweighted Pair Group Method with Arithmetic Mean.
Here is an article with an example: link. I think it will help you; it looks like a good fit for your purpose.
The UPGMA algorithm produces rooted dendrograms and requires a constant-rate assumption - that is, it assumes an ultrametric tree in which the distances from the root to every branch tip are equal.

For anyone else bumping into this, here's my (unoptimized) implementation of the UPGMA algorithm described above:
const head = array => array[0]
const tail = array => array.slice(1)
const last = array => array[array.length - 1]
const sum = array => array.reduce((a, b) => a + b)
const avg = array => sum(array) / array.length
const minIndex = array => array.reduce((iMin, x, i) => x < array[iMin] ? i : iMin, 0)
const range = length => Array.apply(null, Array(length)).map((_, i) => i)
const isArray = Array.isArray
const distances = [5, 8, 8, 9, 16, 20, 29, 42, 56, 57, 57, 58, 103, 104, 150, 167, 800]
// cluster an array of numeric values such that the mean difference of each
// point within each cluster is within a threshold value
const cluster = (points, threshold = 10) => {
return _cluster(points, range(points.length).map(i => [i]), threshold).map(c =>
isArray(c) ? c.map(i => points[i]) : [points[c]])
}
// recursive call
const _cluster = (points, clusters, threshold) => {
const matrix = getDistanceMatrix(points, clusters)
// get the minimum col index for each row in the matrix
const rowMinimums = matrix.map(minIndex)
// get the index for the column containing the smallest distance
const bestRow = minIndex(rowMinimums.map((col, row) => matrix[row][col]))
const bestCol = rowMinimums[bestRow]
const isValid = isValidCluster(points, mergeClusters(clusters[bestRow], clusters[bestCol]), threshold)
if (!isValid) {
return clusters
}
return _cluster(points, merge(clusters, bestRow, bestCol), threshold)
}
const isValidCluster = (points, cluster, threshold) => {
// at this point, cluster is guaranteed to be an array, not a single point
const distances = cluster.map(i => points[i])
const mean = avg(distances)
return distances.every(d => Math.abs(mean - d) <= threshold)
}
// immutable merge of indices a and b in clusters
const merge = (clusters, a, b) => {
// merge two clusters by index
const clusterA = clusters[a]
const clusterB = clusters[b]
// optimization opportunity: this filter is causing *another* iteration
// of clusters.
const withoutPoints = clusters.filter(c => c !== clusterA && c !== clusterB)
return [mergeClusters(clusterA, clusterB)].concat(withoutPoints)
}
const mergeClusters = (clusterA, clusterB) => clusterA.concat(clusterB)
// optimization opportunity: this currently does 2x the work needed, since the
// distance from a->b is the same as the distance from b->a
const getDistanceMatrix = (points, clusters) => {
// reduce clusters to distance/average distance
const reduced = clusters.map(c => Array.isArray(c) ? avg(c.map(i => points[i])) : points[c])
return reduced.map((i, row) => reduced.map((j, col) => (row === col) ? Infinity : Math.abs(j - i)))
}
const log2DArray = rows => console.log('[\n' + rows.map(row => ' [' + row.join(', ') + ']').join('\n') + '\n]')
console.log('clustered points:')
log2DArray(cluster(distances))

Returning A Sorted List's Index in Lua

I access object properties with an index number
object = {}
object.y = {60,20,40}
object.g = {box1,box2,box3} -- graphic
object.c = {false,false,false} -- collision
-- object.y[2] is 20 and its graphic is box2
-- sorted by y location, index should be, object.sort = {2,3,1}
I know table.sort sorts a list, but how can I sort the y list so that it returns the indices, so I can draw each object in front or behind depending on its y location?
Maybe the quicksort function can be edited, but I don't understand it.
http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Lua
https://github.com/mirven/lua_snippets/blob/master/lua/quicksort.lua
Is this possible?

Do not store the data as you're currently doing. Use something like:
object = {
{
y = 60,
g = box1,
c = false,
},
{
y = 20,
g = box2,
c = false,
},
{
y = 40,
g = box3,
c = false,
},
}
and then use the following callback function in table.sort:
function CustomSort(L, R)
return L.y > R.y
end
as shown below:
table.sort(object, CustomSort)

This should work:
local temp = {}
local values = object.y
-- filling temp with all indexes
for i=1,#values do
temp[i] = i
end
-- sorting the indexes, using object.y as comparison
table.sort(temp,function(a,b)
return values[a] < values[b]
end)
-- sorting is done here, have fun with it
object.sort = temp
temp will be {2,3,1} when using this code combined with yours.

@EinsteinK @hjpotter92: Thank you
RESULT: This is the final version of the answers I received. My question is solved.
Use sortIndex(object) to get the sorted index list in object.sort. Update the sort after objects move.
box1 = love.graphics.newImage("tile1.png")
box2 = love.graphics.newImage("tile2.png")
box3 = love.graphics.newImage("tile3.png")
hero = love.graphics.newImage("hero.png")
object = {
{ x = 200, y = 50, g = box1 },
{ x = 50, y = 100, g = box2 },
{ x = 150, y = 200, g = box3 },
{ x = 0, y = 0, g = hero }
}
function sortIndex(item)
-- Sort id, using item values
local function sortY(a,b)
return item[a].y < item[b].y
end
--------------------------------
local i
local id = {} -- id list
for i = 1, #item do -- Fill id list
id[i] = i
end
-- print( unpack(id) ) -- Check before
table.sort(id,sortY)-- Sort list
-- print( unpack(id) ) -- Check after
item.sort = id -- List added to object.sort
end
sortIndex(object) -- print( unpack(object.sort) ) -- Check sorted id's
function drawObject()
local i,v, g,x,y
for i = 1, #object do
v = object.sort[i] -- Draw in order
x = object[v].x
y = object[v].y
g = object[v].g
love.graphics.draw(g,x,y)
end
end

How to make rounded percentages add up to 100%

Consider the four percentages below, represented as float numbers:
13.626332%
47.989636%
9.596008%
28.788024%
-----------
100.000000%
I need to represent these percentages as whole numbers. If I simply use Math.round(), I end up with a total of 101%.
14 + 48 + 10 + 29 = 101
If I use parseInt(), I end up with a total of 97%.
13 + 47 + 9 + 28 = 97
What's a good algorithm to represent any number of percentages as whole numbers while still maintaining a total of 100%?
Edit: After reading some of the comments and answers, there are clearly many ways to go about solving this.
In my mind, to remain true to the numbers, the "right" result is the one that minimizes the overall error, defined by how much error rounding would introduce relative to the actual value:
value        rounded   error   decision
----------------------------------------------------
13.626332    14        2.7%    round up (14)
47.989636    48        0.0%    round up (48)
 9.596008    10        4.2%    don't round up (9)
28.788024    29        0.7%    round up (29)
In case of a tie (3.33, 3.33, 3.33) an arbitrary decision can be made (e.g. 3, 4, 3).

There are many ways to do just this, provided you are not concerned about reliance on the original decimal data.
The first and perhaps most popular method is the Largest Remainder Method, which is basically:
1. Round everything down.
2. Get the difference between the sum and 100.
3. Distribute the difference by adding 1 to items in decreasing order of their decimal parts.
In your case, it would go like this:
13.626332%
47.989636%
9.596008%
28.788024%
If you take the integer parts, you get
13
47
9
28
which adds up to 97, and you want to add three more. Now, you look at the decimal parts, which are
.626332%
.989636%
.596008%
.788024%
and take the largest ones until the total reaches 100. So you would get:
14
48
9
29
Alternatively, you can simply choose to show one decimal place instead of integer values, so the numbers would be 13.6, 48.0, 9.6 and 28.8. This greatly reduces how far the displayed total can drift from 100.
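The largest remainder steps can be sketched in JavaScript roughly as follows. This is a minimal sketch, assuming non-negative inputs that sum to about the target; `largestRemainder` and its `target` parameter are names chosen here for illustration.

```javascript
// Largest remainder method: floor everything, then hand out the missing units
// to the entries with the largest fractional parts.
const largestRemainder = (values, target = 100) => {
  const floored = values.map((v) => Math.floor(v));
  const missing = target - floored.reduce((a, b) => a + b, 0);
  // Indices ordered by fractional part, largest first.
  // Array.prototype.sort is stable, so ties keep their input order.
  const order = values
    .map((v, i) => ({ i, frac: v - Math.floor(v) }))
    .sort((a, b) => b.frac - a.frac)
    .map((entry) => entry.i);
  const result = floored.slice();
  for (let k = 0; k < missing; k++) {
    result[order[k % order.length]] += 1;
  }
  return result;
};

console.log(largestRemainder([13.626332, 47.989636, 9.596008, 28.788024])); // [14, 48, 9, 29]
```

The result stays in the original order because only the indices are sorted, not the values themselves.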

Probably the "best" way to do this (quoted since "best" is a subjective term) is to keep a running (non-integral) tally of where you are, and round that value.
Then use that along with the history to work out what value should be used. For example, using the values you gave:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
13.626332   13.626332            14             0    14 ( 14 -  0)
47.989636   61.615968            62            14    48 ( 62 - 14)
 9.596008   71.211976            71            62     9 ( 71 - 62)
28.788024  100.000000           100            71    29 (100 - 71)
                                                    ---
                                                    100
At each stage, you don't round the number itself. Instead, you round the accumulated value and work out the best integer that reaches that value from the previous baseline - that baseline is the cumulative value (rounded) of the previous row.
This works because you're not losing information at each stage but rather using the information more intelligently. The 'correct' rounded values are in the final column and you can see that they sum to 100.
You can see the difference between this and blindly rounding each value, in the third value above. While 9.596008 would normally round up to 10, the accumulated 71.211976 correctly rounds down to 71 - this means that only 9 is needed to add to the previous baseline of 62.
This also works for "problematic" sequences like three roughly-1/3 values, where one of them should be rounded up:
Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
33.333333   33.333333            33             0    33 ( 33 -  0)
33.333333   66.666666            67            33    34 ( 67 - 33)
33.333333   99.999999           100            67    33 (100 - 67)
                                                    ---
                                                    100
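The running-tally idea can be written in a few lines of JavaScript. This is only a sketch of the table logic above; `cumulativeRound` is a name picked for illustration.

```javascript
// Round the running total instead of each value, and emit the difference
// between consecutive rounded totals.
const cumulativeRound = (values) => {
  let cumulative = 0; // unrounded running total (CumulValue)
  let baseline = 0;   // rounded running total of the previous row (PrevBaseline)
  return values.map((v) => {
    cumulative += v;
    const rounded = Math.round(cumulative); // CumulRounded
    const need = rounded - baseline;        // Need
    baseline = rounded;
    return need;
  });
};

console.log(cumulativeRound([13.626332, 47.989636, 9.596008, 28.788024])); // [14, 48, 9, 29]
console.log(cumulativeRound([33.333333, 33.333333, 33.333333]));           // [33, 34, 33]
```

Note that the output depends on the order of the inputs, since the error is carried from left to right.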

Since none of the answers here seem to solve it properly, here's my semi-obfuscated version using underscorejs:
function foo(l, target) {
var off = target - _.reduce(l, function(acc, x) { return acc + Math.round(x) }, 0);
return _.chain(l).
sortBy(function(x) { return Math.round(x) - x }).
map(function(x, i) { return Math.round(x) + (off > i) - (i >= (l.length + off)) }).
value();
}
foo([13.626332, 47.989636, 9.596008, 28.788024], 100) // => [48, 29, 14, 9]
foo([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100) // => [17, 17, 17, 17, 16, 16]
foo([33.333, 33.333, 33.333], 100) // => [34, 33, 33]
foo([33.3, 33.3, 33.3, 0.1], 100) // => [34, 33, 33, 0]

The goal of rounding is to generate the least amount of error. When you're rounding a single value, that process is simple and straightforward and most people understand it easily. When you're rounding multiple numbers at the same time, the process gets trickier - you must define how the errors are going to combine, i.e. what must be minimized.
The well-voted answer by Varun Vohra minimizes the sum of the absolute errors, and it's very simple to implement. However there are edge cases it does not handle - what should be the result of rounding 24.25, 23.25, 27.25, 25.25? One of those needs to be rounded up instead of down. You would probably just arbitrarily pick the first or last one in the list.
Perhaps it's better to use the relative error instead of the absolute error. Rounding 23.25 up to 24 changes it by 3.2% while rounding 27.25 up to 28 only changes it by 2.8%. Now there's a clear winner.
It's possible to tweak this even further. One common technique is to square each error, so that large errors count disproportionately more than small ones. I'd also use a non-linear divisor to get the relative error - it doesn't seem right that an error at 1% is 99 times more important than an error at 99%. In the code below I've used the square root.
The complete algorithm is as follows:
1. Sum the percentages after rounding them all down, and subtract from 100. This tells you how many of those percentages must be rounded up instead.
2. Generate two error scores for each percentage, one when rounded down and one when rounded up. Take the difference between the two.
3. Sort the error differences produced above.
4. For the number of percentages that need to be rounded up, take an item from the sorted list and increment the rounded down percentage by 1.
You may still have more than one combination with the same error sum, for example 33.3333333, 33.3333333, 33.3333333. This is unavoidable, and the result will be completely arbitrary. The code I give below prefers to round up the values on the left.
Putting it all together in Python looks like this.
from math import isclose, sqrt
def error_gen(actual, rounded):
    divisor = sqrt(1.0 if actual < 1.0 else actual)
    return abs(rounded - actual) ** 2 / divisor
def round_to_100(percents):
    if not isclose(sum(percents), 100):
        raise ValueError
    n = len(percents)
    rounded = [int(x) for x in percents]
    up_count = 100 - sum(rounded)
    errors = [(error_gen(percents[i], rounded[i] + 1) - error_gen(percents[i], rounded[i]), i) for i in range(n)]
    rank = sorted(errors)
    for i in range(up_count):
        rounded[rank[i][1]] += 1
    return rounded
>>> round_to_100([13.626332, 47.989636, 9.596008, 28.788024])
[14, 48, 9, 29]
>>> round_to_100([33.3333333, 33.3333333, 33.3333333])
[34, 33, 33]
>>> round_to_100([24.25, 23.25, 27.25, 25.25])
[24, 23, 28, 25]
>>> round_to_100([1.25, 2.25, 3.25, 4.25, 89.0])
[1, 2, 3, 4, 90]
As you can see with that last example, this algorithm is still capable of delivering non-intuitive results. Even though 89.0 needs no rounding whatsoever, one of the values in that list needed to be rounded up; the lowest relative error results from rounding up that large value rather than the much smaller alternatives.
This answer originally advocated going through every possible combination of round up/round down, but as pointed out in the comments a simpler method works better. The algorithm and code reflect that simplification.

I wrote a C# rounding helper; the algorithm is the same as in Varun Vohra's answer. Hope it helps.
public static List<decimal> GetPerfectRounding(List<decimal> original,
decimal forceSum, int decimals)
{
var rounded = original.Select(x => Math.Round(x, decimals)).ToList();
Debug.Assert(Math.Round(forceSum, decimals) == forceSum);
var delta = forceSum - rounded.Sum();
if (delta == 0) return rounded;
var deltaUnit = Convert.ToDecimal(Math.Pow(0.1, decimals)) * Math.Sign(delta);
List<int> applyDeltaSequence;
if (delta < 0)
{
applyDeltaSequence = original
.Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
.OrderBy(a => original[a.index] - rounded[a.index])
.ThenByDescending(a => a.index)
.Select(a => a.index).ToList();
}
else
{
applyDeltaSequence = original
.Zip(Enumerable.Range(0, int.MaxValue), (x, index) => new { x, index })
.OrderByDescending(a => original[a.index] - rounded[a.index])
.Select(a => a.index).ToList();
}
Enumerable.Repeat(applyDeltaSequence, int.MaxValue)
.SelectMany(x => x)
.Take(Convert.ToInt32(delta/deltaUnit))
.ToList() // ForEach is a List<T> method; LINQ has no ForEach on IEnumerable<T>
.ForEach(index => rounded[index] += deltaUnit);
return rounded;
}
It passes the following unit test:
[TestMethod]
public void TestPerfectRounding()
{
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 2),
new List<decimal> {3.33m, 3.34m, 3.33m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.33m, 3.34m, 3.33m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> {3.333m, 3.334m, 3.333m}, 10, 1),
new List<decimal> {3.3m, 3.4m, 3.3m});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 13.626332m, 47.989636m, 9.596008m, 28.788024m }, 100, 0),
new List<decimal> {14, 48, 9, 29});
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 16.666m, 16.666m, 16.666m, 16.666m, 16.666m, 16.666m }, 100, 0),
new List<decimal> { 17, 17, 17, 17, 16, 16 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.333m, 33.333m, 33.333m }, 100, 0),
new List<decimal> { 34, 33, 33 });
CollectionAssert.AreEqual(Utils.GetPerfectRounding(
new List<decimal> { 33.3m, 33.3m, 33.3m, 0.1m }, 100, 0),
new List<decimal> { 34, 33, 33, 0 });
}

DO NOT sum the rounded numbers. You're going to have inaccurate results. The total could be off significantly depending on the number of terms and the distribution of fractional parts.
Display the rounded numbers but sum the actual values. Depending on how you're presenting the numbers, the actual way to do that would vary. That way you get
14
48
10
29
__
100
Any way you go you're going to have discrepancy. There's no way in your example to show numbers that add up to 100 without "rounding" one value the wrong way (least error would be changing 9.596 to 9)
EDIT
You need to choose one of the following:
1. Accuracy of the items
2. Accuracy of the sum (if you're summing rounded values)
3. Consistency between the rounded items and the rounded sum
Most of the time when dealing with percentages #3 is the best option because it's more obvious when the total equals 101% than when the individual items don't total to 100, and you keep the individual items accurate. "Rounding" 9.596 to 9 is inaccurate in my opinion.
To explain this I sometimes add a footnote that explains that the individual values are rounded and may not total 100% - anyone that understands rounding should be able to understand that explanation.
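As a small illustration of "display the rounded numbers but sum the actual values", here is a sketch using the percentages from the question; the variable names are made up for the example.

```javascript
// Example data from the question; names are illustrative only.
const rawPercents = [13.626332, 47.989636, 9.596008, 28.788024];

// Each item is rounded on its own for display.
const displayedItems = rawPercents.map((p) => Math.round(p));

// The displayed total comes from the unrounded values, not from displayedItems.
const displayedTotal = Math.round(rawPercents.reduce((a, b) => a + b, 0));

// For comparison: what you would get by summing the rounded items instead.
const sumOfDisplayed = displayedItems.reduce((a, b) => a + b, 0);

console.log(displayedItems, displayedTotal, sumOfDisplayed);
```

The items read 14, 48, 10 and 29 while the total reads 100, which is the discrepancy the footnote is meant to explain.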

You could try keeping track of your error due to rounding, and then rounding against the grain if the accumulated error is greater than the fractional portion of the current number.
13.62 -> 14 (+.38)
47.98 -> 48 (+.02 (+.40 total))
9.59 -> 10 (+.41 (+.81 total))
28.78 -> 28 (round down because .81 > .78)
------------
100
Not sure if this would work in general, but it seems to work similarly if the order is reversed:
28.78 -> 29 (+.22)
9.59 -> 9 (-.37; rounded down because .59 > .22)
47.98 -> 48 (-.35)
13.62 -> 14 (+.03)
------------
100
I'm sure there are edge cases where this might break down, but any approach is going to be at least somewhat arbitrary since you're basically modifying your input data.

I'm not sure what level of accuracy you need, but what I would do is simply add 1 to the first n numbers, n being the ceiling of the total sum of the decimal parts. In this case that is 3, so I would add 1 to the first 3 items and floor the rest. Of course this is not super accurate, and some numbers might be rounded up or down when they shouldn't be, but it works okay and will always result in 100%.
So [ 13.626332, 47.989636, 9.596008, 28.788024 ] would be [14, 48, 10, 28] because Math.ceil(.626332+.989636+.596008+.788024) == 3
function evenRound( arr ) {
  // Sum of the decimal parts. It is a whole number when the inputs total 100;
  // Math.round absorbs floating-point noise that could push a ceiling one too high.
  var decimal = Math.round( arr.map(function( a ){ return a % 1 })
    .reduce(function( a,b ){ return a + b }) );
  return arr.map(function( a, i ){
    // compensate the error by adding 1 to the first n items, then floor everything
    return Math.floor( i < decimal ? a + 1 : a );
  });
}
var nums = evenRound( [ 13.626332, 47.989636, 9.596008, 28.788024 ] );
var total = nums.reduce(function( a,b ){ return a + b }); //=> 100
You can always inform users that the numbers are rounded and may not be super-accurate...

I once wrote an unround tool, to find the minimal perturbation to a set of numbers to match a goal. It was a different problem, but one could in theory use a similar idea here. In this case, we have a set of choices.
Thus for the first element, we can either round it up to 14, or down to 13. The cost (in a binary integer programming sense) of doing so is less for the round up than the round down, because the round down requires we move that value a larger distance. Similarly, we can round each number up or down, so there are a total of 16 choices we must choose from.
13.626332
47.989636
9.596008
+ 28.788024
-----------
100.000000
I'd normally solve the general problem in MATLAB, here using bintprog, a binary integer programming tool, but there are only a few choices to be tested, so it is easy enough with simple loops to test out each of the 16 alternatives. For example, suppose we were to round this set as:
  Original   Rounded   Absolute error
    13.626        13          0.62633
    47.99         48          0.01036
     9.596        10          0.40399
+   28.788        29          0.21198
---------------------------------------
   100.000       100          1.25266
The total absolute error made is 1.25266. It can be reduced slightly by the following alternative rounding:
  Original   Rounded   Absolute error
    13.626        14          0.37367
    47.99         48          0.01036
     9.596         9          0.59601
+   28.788        29          0.21198
---------------------------------------
   100.000       100          1.19202
In fact, this will be the optimal solution in terms of the absolute error. Of course, if there were 20 terms, the search space will be of size 2^20 = 1048576. For 30 or 40 terms, that space will be of significant size. In that case, you would need to use a tool that can efficiently search the space, perhaps using a branch and bound scheme.
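For a handful of terms, the exhaustive search over all round-up/round-down choices is easy to write with a simple loop. The following JavaScript sketch is not the MATLAB/bintprog approach mentioned above, just the brute-force loop; `bruteForceRound` is an illustrative name, and the bitmask limits it to fewer than about 30 values.

```javascript
// Try every combination of rounding each value up or down and keep the one
// that hits the target sum with the smallest total absolute error.
const bruteForceRound = (values, target = 100) => {
  const n = values.length;
  let best = null;
  for (let mask = 0; mask < (1 << n); mask++) {
    // Bit i of mask set means "round values[i] up", otherwise round it down.
    const candidate = values.map((v, i) =>
      mask & (1 << i) ? Math.ceil(v) : Math.floor(v)
    );
    const sum = candidate.reduce((a, b) => a + b, 0);
    if (sum !== target) continue;
    const error = candidate.reduce((acc, r, i) => acc + Math.abs(r - values[i]), 0);
    if (best === null || error < best.error) {
      best = { rounded: candidate, error };
    }
  }
  return best; // null if no combination reaches the target
};

const bestRounding = bruteForceRound([13.626332, 47.989636, 9.596008, 28.788024]);
console.log(bestRounding.rounded, bestRounding.error.toFixed(5));
```

With the four example values this checks 16 candidates and settles on the second table above, with a total absolute error of about 1.19202.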

I think the following will achieve what you are after
function func( orig, target ) {
var i = orig.length, j = 0, total = 0, change, newVals = [], next, factor1, factor2, len = orig.length, marginOfErrors = [];
// map original values to new array
while( i-- ) {
total += newVals[i] = Math.round( orig[i] );
}
change = total < target ? 1 : -1;
while( total !== target ) {
// Iterate through values and select the one that once changed will introduce
// the least margin of error in terms of itself. e.g. Incrementing 10 by 1
// would mean an error of 10% in relation to the value itself.
for( i = 0; i < len; i++ ) {
next = i === len - 1 ? 0 : i + 1;
factor2 = errorFactor( orig[next], newVals[next] + change );
factor1 = errorFactor( orig[i], newVals[i] + change );
if( factor1 > factor2 ) {
j = next;
}
}
newVals[j] += change;
total += change;
}
for( i = 0; i < len; i++ ) { marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i]; }
// Math.round() causes some problems as it is difficult to know at the beginning
// whether numbers should have been rounded up or down to reduce total margin of error.
// This section of code increments and decrements values by 1 to find the number
// combination with least margin of error.
for( i = 0; i < len; i++ ) {
for( j = 0; j < len; j++ ) {
if( j === i ) continue;
var roundUpFactor = errorFactor( orig[i], newVals[i] + 1) + errorFactor( orig[j], newVals[j] - 1 );
var roundDownFactor = errorFactor( orig[i], newVals[i] - 1) + errorFactor( orig[j], newVals[j] + 1 );
var sumMargin = marginOfErrors[i] + marginOfErrors[j];
if( roundUpFactor < sumMargin) {
newVals[i] = newVals[i] + 1;
newVals[j] = newVals[j] - 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
if( roundDownFactor < sumMargin ) {
newVals[i] = newVals[i] - 1;
newVals[j] = newVals[j] + 1;
marginOfErrors[i] = newVals[i] && Math.abs( orig[i] - newVals[i] ) / orig[i];
marginOfErrors[j] = newVals[j] && Math.abs( orig[j] - newVals[j] ) / orig[j];
}
}
}
function errorFactor( oldNum, newNum ) {
return Math.abs( oldNum - newNum ) / oldNum;
}
return newVals;
}
func([16.666, 16.666, 16.666, 16.666, 16.666, 16.666], 100); // => [16, 16, 17, 17, 17, 17]
func([33.333, 33.333, 33.333], 100); // => [34, 33, 33]
func([33.3, 33.3, 33.3, 0.1], 100); // => [34, 33, 33, 0]
func([13.25, 47.25, 11.25, 28.25], 100 ); // => [13, 48, 11, 28]
func( [25.5, 25.5, 25.5, 23.5], 100 ); // => [25, 25, 26, 24]
One last thing, I ran the function using the numbers originally given in the question to compare to the desired output
func([13.626332, 47.989636, 9.596008, 28.788024], 100); // => [48, 29, 13, 10]
This was different to what the question wanted => [ 48, 29, 14, 9]. I couldn't understand this until I looked at the total margin of error
-------------------------------------------------
| original | question | % diff | mine | % diff |
-------------------------------------------------
| 13.626332 | 14 | 2.74% | 13 | 4.5% |
| 47.989636 | 48 | 0.02% | 48 | 0.02% |
| 9.596008 | 9 | 6.2% | 10 | 4.2% |
| 28.788024 | 29 | 0.7% | 29 | 0.7% |
-------------------------------------------------
| Totals | 100 | 9.66% | 100 | 9.43% |
-------------------------------------------------
Essentially, the result from my function actually introduces the least amount of error.
Fiddle here

Note: the selected answer changes the array order, which is not ideal. Here I provide several variations that achieve the same result while keeping the array in order.
Discussion
Given [98.88, .56, .56], how do you want to round it? You have four options:
1- Round things up and subtract what was added from the rest of the numbers, so the result becomes [98, 1, 1].
This could be a good answer, but what if we have [97.5, .5, .5, .5, .5, .5]? Then you need to round it to [95, 1, 1, 1, 1, 1].
Do you see how it goes? The more near-zero numbers you add, the more value you take from the rest of your numbers. This can be very troublesome when you have a big array of near-zero numbers like [40, .5, .5 , ... , .5]. When you round this up, you could end up with an array of ones: [1, 1, .... , 1].
So rounding up isn't a good option.
2- Round the numbers down, so [98.88, .56, .56] becomes [98, 0, 0], and you are 2 short of 100. Ignore anything that is already 0, then add the difference to the biggest numbers, so bigger numbers get more.
3- Same as the previous one: round the numbers down, but sort descending by the decimal parts and distribute the difference based on them, so the biggest decimal parts get the difference.
4- Round each number, but carry what you added over to the next number, so that, like a wave, the rounding error is pushed toward the end of your array. So [98.88, .56, .56] becomes [99, 0, 1].
None of these are ideal, so be mindful that your data is going to lose its shape.
Here I provide code for cases 2, 3 and 4 (case 1 is not practical when you have a lot of near-zero numbers). It's modern JS and doesn't need any library.
2nd case
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [ 14, 48, 9, 29 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [ 17, 17, 17, 17, 16, 16 ] 
const v3 = [33.333, 33.333, 33.333] // => [ 34, 33, 33 ]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [ 34, 33, 33, 0 ]
const v5 = [98.88, .56, .56] // =>[ 100, 0, 0 ]
const v6 = [97.5, .5, .5, .5, .5, .5] // => [ 100, 0, 0, 0, 0, 0 ]
const normalizePercentageByNumber = (input) => {
  const rounded = input.map(x => Math.floor(x));
  const afterRoundSum = rounded.reduce((pre, curr) => pre + curr, 0);
  // only entries that are at least 1 after flooring may receive the difference
  const countMutableItems = rounded.filter(x => x >= 1).length;
  const errorRate = 100 - afterRoundSum;
  const deductPortion = Math.ceil(errorRate / countMutableItems);
  // the largest floored values, one per missing unit (capped at the number of mutable items)
  const biggest = [...rounded].sort((a, b) => b - a).slice(0, Math.min(Math.abs(errorRate), countMutableItems));
  const result = rounded.map(x => {
    const indexOfX = biggest.indexOf(x);
    if (indexOfX >= 0) {
      x += deductPortion;
      biggest.splice(indexOfX, 1);
      return x;
    }
    return x;
  });
  return result;
}
3rd case
const normalizePercentageByDecimal = (input) => {
  const rounded = input.map((x, i) => ({ number: Math.floor(x), decimal: x % 1, index: i }));
  const decimalSorted = [...rounded].sort((a, b) => b.decimal - a.decimal);
  const sum = rounded.reduce((pre, curr) => pre + curr.number, 0);
  const error = 100 - sum;
  // give one extra unit to the entries with the largest decimal parts
  for (let i = 0; i < error; i++) {
    const element = decimalSorted[i];
    element.number++;
  }
  // restore the original order
  const result = [...decimalSorted].sort((a, b) => a.index - b.index);
  return result.map(x => x.number);
}
4th case
You just need to calculate how much extra was added or deducted on each rounding, and add or subtract it again in the next item.
const v1 = [13.626332, 47.989636, 9.596008, 28.788024];// => [14, 48, 10, 28 ]
const v2 = [16.666, 16.666, 16.666, 16.666, 16.666, 16.666] // => [17, 16, 17, 16, 17, 17]
const v3 = [33.333, 33.333, 33.333] // => [33, 34, 33]
const v4 = [33.3, 33.3, 33.3, 0.1] // => [33, 34, 33, 0]
const normalizePercentageByWave= v4.reduce((pre, curr, i, arr) => {
let number = Math.round(curr + pre.decimal);
let total = pre.total + number;
const decimal = curr - number;
if (i == arr.length - 1 && total < 100) {
const diff = 100 - total;
total += diff;
number += diff;
}
return { total, numbers: [...pre.numbers, number], decimal };
}, { total: 0, numbers: [], decimal: 0 });

If you have just two options, Math.round() is good enough. The only problematic pairs are values ending in .5 (e.g. 37.5 and 62.5): both values get rounded up and you end up with 101%, as you can try here:
https://jsfiddle.net/f8np1t0k/2/
Since you always need to show 100%, you simply subtract one percentage point from one of them, for example from the first one:
const correctedARounded = Number.isInteger(aRounded-0.5) ? a - 1 : a
Or you can favor the option with more % votes.
The error of 1% diff happens 114 times for 10k cases of divisions between pairs of 1-100 values.

My JS implementation for the well-voted answer by Varun Vohra
const set1 = [13.626332, 47.989636, 9.596008, 28.788024];
// const set2 = [24.25, 23.25, 27.25, 25.25];
const values = set1;
console.log('Total: ', values.reduce((accum, each) => accum + each));
console.log('Incorrectly Rounded: ',
values.reduce((accum, each) => accum + Math.round(each), 0));
const adjustValues = (values) => {
// 1. Separate integer and decimal part
// 2. Store both in a new array of objects sorted by decimal part descending
// 3. Add in original position to "put back" at the end
const flooredAndSortedByDecimal = values.map((value, position) => (
{
floored: Math.floor(value),
decimal: value - Number.parseInt(value),
position
}
)).sort(({decimal}, {decimal: otherDecimal}) => otherDecimal - decimal);
const roundedTotal = values.reduce((total, value) => total + Math.floor(value), 0);
let availableForDistribution = 100 - roundedTotal;
// Add 1 to each value from what's available
const adjustedValues = flooredAndSortedByDecimal.map(value => {
const { floored, ...rest } = value;
let finalPercentage = floored;
if(availableForDistribution > 0){
finalPercentage = floored + 1;
availableForDistribution--;
}
return {
finalPercentage,
...rest
}
});
// Put back and return the new values
return adjustedValues
.sort(({position}, {position: otherPosition}) => position - otherPosition)
.map(({finalPercentage}) => finalPercentage);
}
const finalPercentages = adjustValues(values);
console.log({finalPercentages})
// { finalPercentages: [14, 48, 9, 29] }

Or something like this for brevity, where you just accumulate the rounding error and carry it into the next value:
const p = [13.626332, 47.989636, 9.596008, 28.788024];
const round = (a, e = 0) => a.map(x => { const r = Math.round(x + e); e += x - r; return r; });
console.log(round(p));
Result: [14, 48, 9, 29]

If you round, there is no good way to keep the total exactly the same in every case.
You can take the fractional parts of the N percentages you have (N is 4 in the example you gave).
Add up the fractional parts. In your example the fractional parts total 3.
Ceil the 3 numbers with the highest fractions and floor the rest.
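A minimal Python sketch of these steps, assuming the inputs are non-negative and sum to 100. The function name ceil_largest_fractions is invented for this example, and the sum of the fractional parts is passed through round() only to absorb floating-point noise.

```python
import math


def ceil_largest_fractions(percentages):
    """Ceil the values with the largest fractional parts, floor the rest."""
    fractions = [p - math.floor(p) for p in percentages]
    # Total of the fractional parts, e.g. 3 for the question's example
    to_ceil = round(sum(fractions))
    # Indices ordered from largest to smallest fractional part
    order = sorted(range(len(percentages)),
                   key=lambda i: fractions[i], reverse=True)
    result = [math.floor(p) for p in percentages]
    for i in order[:to_ceil]:
        result[i] += 1
    return result


values = [13.626332, 47.989636, 9.596008, 28.788024]
print(ceil_largest_fractions(values))  # [14, 48, 9, 29]
```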

If you really must round them, there are already very good suggestions here (largest remainder, least relative error, and so on).
There is also already one good reason not to round (you'll get at least one number that "looks better" but is "wrong") and a way to deal with that (warn your readers), which is what I do.
Let me add to the "wrong" number part.
Suppose you have three events/entities/... with some percentages that you approximate as:
DAY 1
who | real | app
----|-------|------
A | 33.34 | 34
B | 33.33 | 33
C | 33.33 | 33
Later on the values change slightly, to
DAY 2
who | real | app
----|-------|------
A | 33.35 | 33
B | 33.36 | 34
C | 33.29 | 33
The first table has the already mentioned problem of having a "wrong" number: 33.34 is closer to 33 than to 34.
But now you have a bigger error. Comparing day 2 to day 1, the real percentage value for A increased by 0.01%, but the approximation shows a decrease of 1%.
That is a qualitative error, probably much worse than the initial quantitative error.
One could devise an approximation for the whole set, but you may have to publish data on day one, when you do not yet know about day two. So, unless you really, really must approximate, you are probably better off not doing so.
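To make the two tables reproducible, here is a small Python sketch that applies a largest-remainder rounding to both days. The function name largest_remainder and the target parameter are choices made for this illustration only.

```python
import math


def largest_remainder(percentages, target=100):
    """Floor everything, then hand the missing points to the largest fractions."""
    floors = [math.floor(p) for p in percentages]
    extras = target - sum(floors)
    by_fraction = sorted(range(len(percentages)),
                         key=lambda i: percentages[i] - floors[i],
                         reverse=True)
    for i in by_fraction[:extras]:
        floors[i] += 1
    return floors


day1 = [33.34, 33.33, 33.33]  # A, B, C
day2 = [33.35, 33.36, 33.29]

print(largest_remainder(day1))  # [34, 33, 33]
print(largest_remainder(day2))  # [33, 34, 33]
```

A's real value went up between the two days, yet its displayed value drops from 34 to 33, which is the qualitative error described above.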

Here's a simpler Python implementation of @varun-vohra's answer:
import itertools
import math
import operator

def apportion_pcts(pcts, total):
    # Exact (unrounded) share of the total for each percentage
    proportions = [total * (pct / 100) for pct in pcts]
    apportions = [math.floor(p) for p in proportions]
    # Units still to hand out after flooring
    remainder = total - sum(apportions)
    remainders = [(i, p - math.floor(p)) for (i, p) in enumerate(proportions)]
    remainders.sort(key=operator.itemgetter(1), reverse=True)
    # Give one unit at a time to the largest remainders
    for (i, _) in itertools.cycle(remainders):
        if remainder == 0:
            break
        else:
            apportions[i] += 1
            remainder -= 1
    return apportions
It needs math, itertools and operator, imported above.

Check whether this is valid; it works for the test cases I have tried so far.
Let's say the number is k.
Sort the percentages in descending order.
Iterate over the percentages in that descending order.
Calculate the first percentage of k and take Math.Ceil of the output.
Next, k = k - 1.
Keep iterating until all percentages are consumed.

I have implemented the method from Varun Vohra's answer here for both lists and dicts.
import math
import numbers
import operator
import itertools

def round_list_percentages(number_list):
    """
    Takes a list where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    if not all(isinstance(i, numbers.Number) for i in number_list):
        raise ValueError('All values of the list must be a number')
    # Generate a key for each value
    key_generator = itertools.count()
    value_dict = {next(key_generator): value for value in number_list}
    # Wrap in list() so that a list (not a dict view) is returned on Python 3
    return list(round_dictionary_percentages(value_dict).values())

def round_dictionary_percentages(dictionary):
    """
    Takes a dictionary where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.
    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    # Only allow numbers
    if not all(isinstance(i, numbers.Number) for i in dictionary.values()):
        raise ValueError('All values of the dictionary must be a number')
    # Make sure the sum is close enough to 100
    # Round value_sum to 2 decimals to avoid floating point representation errors
    value_sum = round(sum(dictionary.values()), 2)
    if not value_sum == 100:
        raise ValueError('The sum of the values must be 100')
    # Initial floored results
    # Does not add up to 100, so we need to add something
    result = {key: int(math.floor(value)) for key, value in dictionary.items()}
    # Remainders for each key
    result_remainders = {key: value % 1 for key, value in dictionary.items()}
    # Keys sorted by remainder (biggest first)
    sorted_keys = [key for key, value in sorted(result_remainders.items(), key=operator.itemgetter(1), reverse=True)]
    # Otherwise add missing values up to 100
    # One cycle is enough, since flooring removes a max value of < 1 per item,
    # i.e. this loop should always break before going through the whole list
    for key in sorted_keys:
        if sum(result.values()) == 100:
            break
        result[key] += 1
    # Return
    return result

For those who have the percentages in a pandas Series, here is my implementation of the largest remainder method (as in Varun Vohra's answer), where you can even select the number of decimals to round to.
import numpy as np

def largestRemainderMethod(pd_series, decimals=1):
    # np.int was removed from recent NumPy releases, so use the built-in int
    floor_series = np.floor(10**decimals * pd_series).astype(int)
    diff = 100 * (10**decimals) - int(floor_series.sum())
    series_decimals = pd_series - floor_series / (10**decimals)
    series_sorted_by_decimals = series_decimals.sort_values(ascending=False)
    # Mark the diff largest remainders with 1 and all others with 0
    for i in range(0, len(series_sorted_by_decimals)):
        if i < diff:
            series_sorted_by_decimals.iloc[[i]] = 1
        else:
            series_sorted_by_decimals.iloc[[i]] = 0
    out_series = ((floor_series + series_sorted_by_decimals) / (10**decimals)).sort_values(ascending=False)
    return out_series

Here's a Ruby gem that implements the Largest Remainder method:
https://github.com/jethroo/lare_round
To use:
a = Array.new(3){ BigDecimal('0.3334') }
# => [#<BigDecimal:887b6c8,'0.3334E0',9(18)>, #<BigDecimal:887b600,'0.3334E0',9(18)>, #<BigDecimal:887b4c0,'0.3334E0',9(18)>]
a = LareRound.round(a,2)
# => [#<BigDecimal:8867330,'0.34E0',9(36)>, #<BigDecimal:8867290,'0.33E0',9(36)>, #<BigDecimal:88671f0,'0.33E0',9(36)>]
a.reduce(:+).to_f
# => 1.0

I wrote a function in Javascript that takes an array of percentages and outputs an array with rounded percentages using the Largest Remainder Method. It doesn't use any libraries.
Input: [21.6, 46.7, 31, 0.5, 0.2]
Output: [22, 47, 31, 0, 0]
const values = [21.6, 46.7, 31, 0.5, 0.2];
console.log(roundPercentages(values));
function roundPercentages(values) {
    const flooredValues = values.map(e => Math.floor(e));
    const remainders = values.map(e => e - Math.floor(e));
    const totalRemainder = 100 - flooredValues.reduce((a, b) => a + b, 0);
    // Sort the indices (not the values) from highest to lowest remainder,
    // so that equal remainders still map to distinct positions
    remainders
        .map((_, index) => index)
        .sort((a, b) => remainders[b] - remainders[a])
        // Get the n indices with the largest remainders, where n = totalRemainder
        .slice(0, totalRemainder)
        // Add 1 to the floored percentages with the highest remainder (divide the total remainder)
        .forEach(index => flooredValues[index] += 1);
    return flooredValues;
}

This is a case for banker's rounding, aka 'round half-even'. It is supported by BigDecimal. Its purpose is to ensure that rounding balances out, i.e. doesn't favour either the bank or the customer.
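As a rough illustration of round half-even, here is a Python sketch that uses the standard decimal module in place of BigDecimal. The helper name round_half_even is made up for this example, and the second data set is an assumption added to show that this rounding mode alone does not guarantee a total of 100.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value):
    """Round to the nearest integer, sending exact .5 ties to the even neighbour."""
    # str() keeps the decimal digits as written, e.g. "37.5"
    return int(Decimal(str(value)).quantize(Decimal("1"),
                                            rounding=ROUND_HALF_EVEN))


pair = [37.5, 62.5]
print([round_half_even(p) for p in pair])       # [38, 62], which sums to 100

thirds = [33.3, 33.3, 33.4]
print(sum(round_half_even(p) for p in thirds))  # 99, no ties involved
```

So half-even helps with the .5 pair problem, but values without ties can still add up to something other than 100.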

Ranking in Linq

There's a generic list of n numbers, say
{980, 850, 700, 680}
I want to compare the list above with a decimal number, say 690. The output I need is the ranking of the number I input (692), i.e. the desired output is Ranking = 4.
How can I get this output for the scenario above?

Following on from Alex's post I think you are looking for
var numbers = new List<int>() { 980, 850, 700, 680 };
var dec = new Decimal(692.0);
var temp = numbers.Count(x => x > dec) + 1;
This returns the position you are looking for: one more than the number of list entries that are greater than the input.
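The counting idea does not depend on LINQ. Purely as an illustration, here is the same logic as a Python sketch; the function name rank_of is invented for this example.

```python
def rank_of(value, numbers):
    """1-based rank of value: one more than the count of larger entries."""
    return sum(1 for n in numbers if n > value) + 1


numbers = [980, 850, 700, 680]
print(rank_of(692, numbers))   # 4
print(rank_of(1000, numbers))  # 1
print(rank_of(600, numbers))   # 5
```

Because it only counts larger entries, the list does not need to be sorted beforehand.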

If you want to look for an exact match of a decimal input to an int in the list, you can use FindIndex.
var numbers = new List<int>() { 980, 850, 700, 680 };
var dec = new Decimal(680.0);
var res = numbers.FindIndex(x => x == dec);
It returns the 0-based position of the first match, or -1 if there is no match.

Your question is not clear; I'm not sure what role 690 is playing.
I'm assuming that the user can enter a number and you want to find the rank (index) that the number would have in the list if it were inserted. I'm also assuming that your list should be sorted descending, since you want the position of the new int according to its value:
var input = 692;
var numbers = new List<int>() { 980, 850, 700, 680 };
var firstLower = numbers.OrderByDescending(i => i)
.Select((i, index) => new { Value = i, Index = index })
.FirstOrDefault(x => x.Value < input);
var rank = firstLower == null ? numbers.Count + 1 : firstLower.Index + 1;
Note that the OrderByDescending might be redundant if your list is already sorted, but I assume that your sample data is only sorted by accident.

Dynamic linq - Group by interval (DateTime, Numeric)

I searched everywhere and didn't find an answer to this question. I want to group by intervals (DateTime, numeric) in Dynamic LINQ (the data will be created dynamically, so I must use Dynamic LINQ).
Let's assume that we have such data:
ID|Date|Price
1|2010-11-01|100
2|2010-11-01|120
3|2010-11-02|50
4|2010-12-01|30
5|2010-12-01|220
6|2011-01-01|400
How can I get this data grouped like this:
-(Group by Day) following groups
->2010-11-01 = 2 elements
->2010-11-02 = 1 elements
->2010-12-01 = 2 elements
->2011-01-01 = 1 elements
-(Group by Month) following groups
->2010-11 = 3 elements
->2010-12 = 2 elements
->2011-01 = 1 elements
-(Group by Quarter) following groups
->2010 q.04 = 5 elements
->2011 q.01 = 1 elements
-(Group by Year) following groups
->2010 = 5 elements
->2011 = 1 element
-(Group by Price (From 0, Each 50)) following groups
-> <0-50) = 1 elements
-> <50-100) = 1 elements
-> <100-150) = 2 elements
-> <200-250) = 1 elements
-> <400-450) = 1 elements
-(ideally it would be Group by Price (From 0-50,From 50-150, From 150-500)) following groups
-> <0-50) = 1 elements
-> <50-150) = 3 elements
-> <150-500) = 2 elements
Any ideas? I stress again: it must be Dynamic LINQ, or possibly some sophisticated lambda expression. I should be able to group by a column name given as a string, e.g.
GroupBy("Date"), GroupBy("Price");

Here's how to do it. For instance, to group by month:
public Item[] data =
{
new Item { Date = new DateTime(2011, 11, 6), Price = 103, Name = "a" },
new Item { Date = new DateTime(2011, 11, 16), Price = 110, Name = "b" },
new Item { Date = new DateTime(2011, 12, 4), Price = 200, Name = "c" },
new Item { Date = new DateTime(2011, 12, 4), Price = 230, Name = "d" },
new Item { Date = new DateTime(2012, 1, 15), Price = 117, Name = "e" }
};
var groups = data.AsQueryable().GroupBy("Date.Month", "it").Cast<IGrouping<int, Item>>();
You can use "Date.Day" and "Date.Year" in the same way. For something like a price range, you could use an expression that maps everything in the range onto the same value, e.g. integer division: "(it.Price / 50)".
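The key-mapping idea is independent of Dynamic LINQ. Below is a rough Python sketch on the question's sample data, shown only to illustrate which key each grouping needs. All function names and the edges tuple are assumptions made for this example, and range_bucket expects prices below the last edge.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# (ID, Date, Price) rows from the question
rows = [
    (1, date(2010, 11, 1), 100),
    (2, date(2010, 11, 1), 120),
    (3, date(2010, 11, 2), 50),
    (4, date(2010, 12, 1), 30),
    (5, date(2010, 12, 1), 220),
    (6, date(2011, 1, 1), 400),
]


def month_key(d):
    return (d.year, d.month)


def quarter_key(d):
    # Months 1-3 -> quarter 1, ..., months 10-12 -> quarter 4
    return (d.year, (d.month - 1) // 3 + 1)


def fixed_bucket(price, width=50):
    # Integer division maps every price in [k*width, (k+1)*width) to k*width
    return price // width * width


def range_bucket(price, edges=(0, 50, 150, 500)):
    # Custom half-open ranges [edges[i], edges[i+1])
    i = bisect_right(edges, price) - 1
    return (edges[i], edges[i + 1])


by_month = Counter(month_key(d) for _, d, _ in rows)
by_quarter = Counter(quarter_key(d) for _, d, _ in rows)
by_fixed = Counter(fixed_bucket(p) for _, _, p in rows)
by_range = Counter(range_bucket(p) for _, _, p in rows)

print(by_month)    # 3 rows in 2010-11, 2 in 2010-12, 1 in 2011-01
print(by_quarter)  # 5 rows in 2010 Q4, 1 in 2011 Q1
print(by_fixed)    # 2 rows in the bucket starting at 100, 1 in each other bucket
print(by_range)    # 1 row in 0-50, 3 in 50-150, 2 in 150-500
```

Grouping by the (year, month) pair keeps the same month in different years apart, which a bare month number would not.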
