D3: use d3.max to get containing object? - d3.js

I have the following data, an array of objects:
var data = [
{ x: 0, y0: 0, y: 100 },
{ x: 1, y0: 0, y: 150 },
{ x: 2, y0: 50, y: 100 },
{ x: 3, y0: 50, y: 150 }
]
I'd like to find the object with the biggest discrepancy between y and y0, using D3.
I can do this to get the biggest difference:
var max_val = d3.max(data, function(d) { return d.y - d.y0;} );
It returns 150. But what I don't know how to do is get the containing object, and learn that the corresponding value of x is 1.
Any ideas?

Your question asks how to use d3.max to find this object, but an alternative is to let the JavaScript Array's sort function do this for you:
>>> data.sort(function(a, b){ return (b.y - b.y0) - (a.y - a.y0); } )[0]
Object {x: 1, y0: 0, y: 150}
Here I am sorting data using a function that, given two objects a and b, compares them by the difference of their y and y0 properties. Subtracting a's difference from b's puts the objects in descending order, so the first element is the one with the biggest discrepancy. Note that sort reorders data in place; call data.slice() first if the original order matters.

I believe there currently isn't a good way of doing this purely with d3. The d3 docs state:
Returns the maximum value in the given array using natural order. If
the array is empty, returns undefined. An optional accessor function
may be specified, which is equivalent to calling array.map(accessor)
before computing the maximum value.
I went into the source code to look at how d3.max is calculated:
d3.max = function(array, f) {
var i = -1, n = array.length, a, b;
if (arguments.length === 1) {
while (++i < n && !((a = array[i]) != null && a <= a)) a = undefined;
while (++i < n) if ((b = array[i]) != null && b > a) a = b;
} else {
while (++i < n && !((a = f.call(array, array[i], i)) != null && a <= a)) a = undefined;
while (++i < n) if ((b = f.call(array, array[i], i)) != null && b > a) a = b;
}
return a;
}
This is partly due to the fact that your accessor function returns a value itself. You can probably customize it to make it return an object, but d3.max specifically handles numbers.
As other people have stated, there are other ways of handling this with pure JavaScript.
Related questions:
Why is domain not using d3.max(data) in D3?
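A minimal sketch of the "customize it to return an object" idea, in plain JavaScript. The helper name maxBy is hypothetical and not part of D3; it follows the same skip-null/NaN loop as the d3.max source quoted above, but remembers the element rather than the accessor value.

```javascript
// Hypothetical helper (not part of D3): like d3.max with an accessor,
// but it returns the element that produced the maximum, not the value.
function maxBy(array, accessor) {
  let best;      // element with the largest accessor value so far
  let bestValue; // that largest value
  for (let i = 0; i < array.length; i++) {
    const value = accessor(array[i], i);
    // Skip null, undefined and NaN, as the quoted d3.max loop does.
    if (value == null || Number.isNaN(value)) continue;
    if (bestValue === undefined || value > bestValue) {
      best = array[i];
      bestValue = value;
    }
  }
  return best; // undefined for an empty array
}

const data = [
  { x: 0, y0: 0, y: 100 },
  { x: 1, y0: 0, y: 150 },
  { x: 2, y0: 50, y: 100 },
  { x: 3, y0: 50, y: 150 }
];

const widest = maxBy(data, d => d.y - d.y0);
console.log(widest.x); // 1
```

On ties the first element wins, because the comparison is a strict greater-than.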

You can also use the standard array.reduce function to find this without using d3 :
var highestDiscrepencyObject = data.reduce(function(memo, val){
var dis = val.y - val.y0,
memoDis = memo.y - memo.y0;
return (dis > memoDis || memo.y === undefined) ? val : memo;
}, {});
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce


Minimum number of steps using only multiply A by 2, or divide A by 2 or increment A by one to go from number A to B

Given two numbers A and B, what is the minimum number of steps to transform number A to become number B.
A step can either be A *= 2, A++ or A /= 2 if and only if A is an even number.
What is the most efficient algorithm to achieve this?
Suppose A and B can be really large numbers.
Here's my take, done in C#.
var a = 2;
var b = 15;
var found = new HashSet<int>() { a };
var operations = new (string operation, Func<int, bool> condition, Func<int, int> projection)[]
{
("/2", x => x % 2 == 0, x => x / 2),
("*2", x => x <= int.MaxValue / 2, x => x *2),
("+1", x => true, x => x + 1),
};
IEnumerable<(int count, string operations, int value)> Project((int count, string operations, int value) current)
{
foreach (var operation in operations)
{
if (operation.condition(current.value))
{
var value = operation.projection(current.value);
if (!found.Contains(value))
{
found.Add(value);
yield return (current.count + 1, $"{current.operations}, {operation.operation}", value);
}
}
}
}
var candidates = new[] { (count: 0, operations: $"{a}", value: a) };
while (!found.Contains(b))
{
candidates =
candidates
.SelectMany(c => Project(c))
.ToArray();
}
var result = candidates.Where(x => x.value == b).First();
Console.WriteLine($"{result.count} operations: {result.operations} = {result.value}");
That outputs:
5 operations: 2, +1, *2, +1, *2, +1 = 15
Basically, this is starting with a at the zeroth step. It then takes this generation and produces all possible values from the operations to create the next generation. If it produces a value that it has already seen it discards the value as there is an equal or faster operation to produce the value. It keeps repeating until b is found.
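The same generation-by-generation search can be sketched in JavaScript as a plain breadth-first search. The function name minSteps is an assumption for illustration, and the sketch assumes small positive integers; it does not address the "really large numbers" part of the question.

```javascript
// Breadth-first search over the values reachable from a.
// Assumes a and b are positive integers small enough to enumerate.
function minSteps(a, b) {
  if (a === b) return 0;
  const seen = new Set([a]); // values already reached in fewer or equal steps
  let frontier = [a];        // current generation
  for (let steps = 1; frontier.length > 0; steps++) {
    const next = [];
    for (const value of frontier) {
      const candidates = [value * 2, value + 1];
      if (value % 2 === 0) candidates.push(value / 2); // halving only if even
      for (const c of candidates) {
        if (c === b) return steps;
        if (!seen.has(c)) {
          seen.add(c);
          next.push(c);
        }
      }
    }
    frontier = next;
  }
  return -1; // not reached
}

console.log(minSteps(2, 15)); // 5
```

Because every generation is expanded in full before the next one starts, the first time b appears is at the minimum step count.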

How to divide a set into two sets such that the difference of the average is minimum?

As I understand, it is related to the partition problem.
But I would like to ask about a slightly different problem, in which I care about the average rather than the sum. In this case, two quantities (the sum and the number of items) have to be optimized at the same time. It seems to be a harder problem, and I cannot find any solutions online.
Are there any solutions for this variant? Or how does it relate to the partition problem?
Example:
input X = [1,1,1,1,1,6]
output based on sum: A = [1,1,1,1,1], B=[6]
output based on average: A = [1], B=[1,1,1,1,6]
On some inputs, a modification of the dynamic program for the usual partition problem will give a speedup. We have to classify each partial solution by its count and sum instead of just sum, which slows things down a bit. Python 3 below (note that the use of dictionaries implicitly collapses functionally identical partial solutions):
def children(ab, x):
    a, b = ab
    yield a + [x], b
    yield a, b + [x]


def proper(ab):
    a, b = ab
    return a and b


def avg(lst):
    return sum(lst) / len(lst)


def abs_diff_avg(ab):
    a, b = ab
    return abs(avg(a) - avg(b))


def min_abs_diff_avg(lst):
    solutions = {(0, 0): ([], [])}
    for x in lst:
        solutions = {
            (sum(a), len(a)): (a, b)
            for ab in solutions.values()
            for (a, b) in children(ab, x)
        }
    return min(filter(proper, solutions.values()), key=abs_diff_avg)


print(min_abs_diff_avg([1, 1, 1, 1, 1, 6]))
Let S_i be the sum of a subset of v of size i.
Let S be the total sum of v and n the length of v.
The error to minimize is
err_i = |avg(S_i) - avg(S-S_i)|
err_i = |S_i/i - (S-S_i)/(n-i)|
err_i = |(nS_i - iS)/(i(n-i))|
algorithm below does:
for all tuple sizes (1,...,n/2) as i
- for all tuples of size i-1 as t_{i-1}
- generate all possible tuple of size i from t_{i-1} by adjoining one elem from v
- track best tuple in regard of err_i
The only pruning rule I found is:
for two tuples of size i having the same sum, keep the one whose last element's index is the smallest
e.g given tuples A, B (where X is some taken element from v)
A: [X,....,X....]
B: [.,X,.....,X..]
keep A because its right-most element has the minimal index
(idea being that at size 3, A will offer the same candidates as B plus some more)
function generateTuples (v, tuples) {
const nextTuples = new Map()
for (const [, t] of tuples) {
for (let l = t.l + 1; l < v.length; ++l) {
const s = t.s + v[l]
if (!nextTuples.has(s) || nextTuples.get(s).l > l) {
const nextTuple = { v: t.v.concat(l), s, l }
nextTuples.set(s, nextTuple)
}
}
}
return nextTuples
}
function processV (v) {
const fErr = (() => {
const n = v.length
const S = v.reduce((s, x) => s + x, 0)
return ({ s: S_i, v }) => {
const i = v.length
return Math.abs((n * S_i - i * S) / (i * (n - i)))
}
})()
let tuples = new Map([[0, { v: [], s: 0, l: -1 }]])
let best = null
let err = Infinity
for (let i = 0; i < Math.ceil(v.length / 2); ++i) {
const nextTuples = generateTuples(v, tuples)
for (const [, t] of nextTuples) {
if (fErr(t) <= err) {
best = t
err = fErr(t)
}
}
tuples = nextTuples
}
const s1Indices = new Set(best.v)
return {
sol: v.reduce(([v1, v2], x, i) => {
(s1Indices.has(i) ? v1 : v2).push(x)
return [v1, v2]
}, [[], []]),
err
}
}
console.log('best: ', processV([1, 1, 1, 1, 1, 6]))
console.log('best: ', processV([1, 2, 3, 4, 5]))
console.log('best: ', processV([1, 3, 5, 7, 7, 8]))
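For small inputs, results can be cross-checked against an exhaustive search. The sketch below is not part of either answer: the name bruteForceMinAvgDiff is hypothetical, it enumerates every non-empty proper subset with a bitmask, and it scores each one with the same err_i formula derived above. It assumes at least two elements and is only practical for short arrays.

```javascript
// Exhaustive reference: try every non-empty proper subset of v.
// Exponential in v.length, so only useful as a check on small inputs.
function bruteForceMinAvgDiff(v) {
  const n = v.length;
  const S = v.reduce((s, x) => s + x, 0);
  let best = { err: Infinity, a: [], b: [] };
  // mask bit idx set means v[idx] goes into subset a
  for (let mask = 1; mask < (1 << n) - 1; mask++) {
    const a = [];
    const b = [];
    let sumA = 0;
    v.forEach((x, idx) => {
      if (mask & (1 << idx)) {
        a.push(x);
        sumA += x;
      } else {
        b.push(x);
      }
    });
    const i = a.length;
    // err_i = |(n * S_i - i * S) / (i * (n - i))|
    const err = Math.abs((n * sumA - i * S) / (i * (n - i)));
    if (err < best.err) best = { err, a, b };
  }
  return best;
}

console.log(bruteForceMinAvgDiff([1, 1, 1, 1, 1, 6]).err); // 1
console.log(bruteForceMinAvgDiff([1, 2, 3, 4, 5]).err);    // 0
```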

Function to return value from "chopped" input range

I'm looking for a function or algorithm that, for a value in a specified range, will return a value in the same range, but based on the chop (subdivision) the value falls in. It is difficult to explain, so here is an empty shell to illustrate what I expect:
function choppedRange(value, min, max, chops) {
// value - a value in range min to max
// chops - integer defining how many "subranges" or "chops" to return values from
...
}
// Map (linear conversion) input value in range oldMin -> oldMax
// to a value in range newMin -> newMax
function remap(oldValue, oldMin, oldMax, newMin, newMax) {
return (((oldValue - oldMin) * (newMax - newMin)) / (oldMax - oldMin)) + newMin;
}
What the illustration is trying to show is the following:
Determine which chop the input value (blue dot) is in (let's say its value is 0.35).
Remap (function above) the value (blue) 0.35 from the range 0.25 to 0.5
to the range of the min and max arguments, e.g. 0 to 1: remap(0.35, 0.25, 0.5, 0, 1)
The output (green dot) with the example values should be 0.3999... (i.e. 0.4 up to floating-point rounding).
Hope you can see what I mean.
As you can see I have the remapping done - but I'm having trouble deciding the "chop" values for the input to remap - like this:
remap(input_value, chop_min, chop_max, min, max)
I want to find chop_min and chop_max based on the chops argument in choppedRange
function choppedRange(value, min, max, chops) {
// Figure out chop_min and chop_max
...
return remap(value, chop_min, chop_max, min, max)
}
I ended up with this implementation
function choppedRange(value, min, max, chops) {
// value - a value in range min to max
// chops - integer defining how many "subranges" or "chops" to return values from
// chop width is measured from min, so ranges that do not start at 0 also work
var chopMin = min, chopMax = max, chopValue = (max - min) / chops, i, c
for (i = 1; i <= chops; i++) {
c = min + chopValue * i
if (c < value)
chopMin = c
if (c >= value) {
chopMax = c
break;
}
}
return (((value - chopMin) * (max - min)) / (chopMax - chopMin)) + min
}
var r = choppedRange(0, 0, 1, 4)
console.log('Result',r)
r = choppedRange(0.35, 0, 1, 4)
console.log('Result',r)
r = choppedRange(0.5, 0, 1, 4)
console.log('Result',r)
r = choppedRange(0.6, 0, 1, 4)
console.log('Result',r)
r = choppedRange(1, 0, 1, 4)
console.log('Result',r)
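As an alternative to the loop, chop_min and chop_max can be computed directly from the chop index. This is a sketch under one assumption that differs from the loop above: a value lying exactly on an interior boundary is assigned to the upper chop (so 0.5 with 4 chops maps to 0, not 1). The name choppedRangeClosedForm is hypothetical; remap is the function from the question.

```javascript
// Linear conversion from [oldMin, oldMax] to [newMin, newMax] (from the question).
function remap(oldValue, oldMin, oldMax, newMin, newMax) {
  return (((oldValue - oldMin) * (newMax - newMin)) / (oldMax - oldMin)) + newMin;
}

// Hypothetical closed-form variant: find the chop by integer division.
function choppedRangeClosedForm(value, min, max, chops) {
  const width = (max - min) / chops; // size of one chop
  // Index of the chop containing value; clamp so value === max stays in the last chop.
  const index = Math.min(chops - 1, Math.floor((value - min) / width));
  const chopMin = min + index * width;
  const chopMax = chopMin + width;
  return remap(value, chopMin, chopMax, min, max);
}

console.log(choppedRangeClosedForm(0.35, 0, 1, 4)); // approximately 0.4
console.log(choppedRangeClosedForm(12.5, 10, 20, 2)); // 15
```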

how to get localizable or customizable si codes with d3.format

When drawing graphs, using SI codes is pretty much what we want. Our y-axis values tend to be large currency values, e.g. $10,411,504,201.20
Abbreviating this, at least in a US locale, this should translate to $10.4B.
But using d3.format's 's' type for SI codes this would display as $10.4G. This might be great for some locales and good when dealing with computer-based values (eg: processor speed, memory...), but not so with currency or other non-computer types of values.
Is there a way to get locale-specific functionality similar to SI-codes that would convert billions to B instead of G, etc...?
(I realize this is mostly an SI-codes thing and not specific to D3, but since I'm using D3 this seems the most appropriate tag.)
I prefer overriding d3.formatPrefix. Then you can just forget about replacing strings within your viz code. Simply execute the following code immediately after loading D3.js.
// Change D3's SI prefix to more business friendly units
// K = thousands
// M = millions
// B = billions
// T = trillion
// P = quadrillion
// E = quintillion
// small decimals are handled with e-n formatting.
var d3_formatPrefixes = ["e-24","e-21","e-18","e-15","e-12","e-9","e-6","e-3","","K","M","B","T","P","E","Z","Y"].map(d3_formatPrefix);
// Override d3's formatPrefix function
d3.formatPrefix = function(value, precision) {
var i = 0;
if (value) {
if (value < 0) {
value *= -1;
}
if (precision) {
value = d3.round(value, d3_format_precision(value, precision));
}
i = 1 + Math.floor(1e-12 + Math.log(value) / Math.LN10);
i = Math.max(-24, Math.min(24, Math.floor((i - 1) / 3) * 3));
}
return d3_formatPrefixes[8 + i / 3];
};
function d3_formatPrefix(d, i) {
var k = Math.pow(10, Math.abs(8 - i) * 3);
return {
scale: i > 8 ? function(d) { return d / k; } : function(d) { return d * k; },
symbol: d
};
}
function d3_format_precision(x, p) {
return p - (x ? Math.ceil(Math.log(x) / Math.LN10) : 1);
}
After running this code, try formatting a number with SI prefix:
d3.format(".3s")(1234567890) // 1.23B
You could augment this code pretty simply to support different locales by including locale-specific d3_formatPrefixes values in an object and then select the proper one that matches a locale you need.
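The "object of locale-specific prefixes" idea could look roughly like the sketch below. It is deliberately independent of D3's internals: suffixesByLocale and abbreviate are hypothetical names, the suffix tables are illustrative values rather than an authoritative locale database, and the decimal separator is not localized.

```javascript
// Illustrative suffix tables, one entry per power of 1000 (assumed values).
const suffixesByLocale = {
  "en-US": ["", "K", "M", "B", "T"],
  "de-DE": ["", " Tsd.", " Mio.", " Mrd.", " Bio."]
};

// Hypothetical helper: scale value by a power of 1000 and append the
// suffix for the chosen locale (falls back to en-US for unknown locales).
function abbreviate(value, locale, digits = 1) {
  const suffixes = suffixesByLocale[locale] || suffixesByLocale["en-US"];
  if (value === 0) return (0).toFixed(digits);
  const magnitude = Math.floor(Math.log10(Math.abs(value)) / 3);
  // Clamp to the table so very small or very large values still format.
  const index = Math.max(0, Math.min(suffixes.length - 1, magnitude));
  const scaled = value / Math.pow(1000, index);
  return scaled.toFixed(digits) + suffixes[index];
}

console.log("$" + abbreviate(10411504201.2, "en-US")); // $10.4B
```

Values that round up across a boundary (for example 999,999 with one digit) are not re-bucketed here; a fuller version would round before choosing the suffix, as the formatPrefix override above does.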
I like the answer by @nross83.
I'm just going to paste a variation that I think might be more robust.
Example:
import { formatLocale, formatSpecifier } from "d3";
const baseLocale = {
decimal: ".",
thousands: ",",
grouping: [3],
currency: ["$", ""],
};
// You can define your own si prefix abbr. here
const d3SiPrefixMap = {
y: "e-24",
z: "e-21",
a: "e-18",
f: "e-15",
p: "e-12",
n: "e-9",
µ: "e-6",
m: "e-3",
"": "",
k: "K",
M: "M",
G: "B",
T: "T",
P: "P",
E: "E",
Z: "Z",
Y: "Y",
};
const d3Format = (specifier: string) => {
const locale = formatLocale({ ...baseLocale });
const formattedSpecifier = formatSpecifier(specifier);
const valueFormatter = locale.format(specifier);
return (value: number) => {
const result = valueFormatter(value);
if (formattedSpecifier.type === "s") {
// modify the return value when using si-prefix.
const lastChar = result[result.length - 1];
if (Object.keys(d3SiPrefixMap).includes(lastChar)) {
return result.slice(0, -1) + d3SiPrefixMap[lastChar];
}
}
// return the default result from d3 format in case the format type is not set to `s` (si suffix)
return result;
};
}
And use it like the following:
const value = 1000000000;
const formattedValue = d3Format("~s")(value);
console.log({formattedValue}); // Outputs: {formattedValue: "1B"}
We used the formatSpecifier function from d3-format to check if the format type is s, i.e. si suffix, and only modify the return value in this case.
In the example above, I have not modified the actual d3 function. You can change the code accordingly if you want to do that for the viz stuff.
I hope this answer is helpful. Thank you :)

Algorithm to generate a sequence proportional to specified percentage

Given a Map of objects and designated proportions (let's say they add up to 100 to make it easy):
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
How can I generate a sequence such that for a subset of size n there are ~42% "A"s, ~32% "B"s and ~26% "C"s? (Obviously, small n will have larger errors).
(Work language is Scala, but I'm just asking for the algorithm.)
UPDATE: I resisted a random approach since, for instance, there's a ~16% chance that the sequence would start with AA and a ~11% chance it would start with BB, and there would be very low odds that, for n precisely == (sum of proportions), the distribution would be perfect. So, following @MvG's answer, I implemented it as follows:
/**
Returns the key whose achieved proportions are most below desired proportions
*/
def next[T](proportions : Map[T, Double], achievedToDate : Map[T,Double]) : T = {
val proportionsSum = proportions.values.sum
val desiredPercentages = proportions.mapValues(v => v / proportionsSum)
//Initially no achieved percentages, so avoid / 0
val toDateTotal = if(achievedToDate.values.sum == 0.0){
1
}else{
achievedToDate.values.sum
}
val achievedPercentages = achievedToDate.mapValues(v => v / toDateTotal)
val gaps = achievedPercentages.map{ case (k, v) =>
val gap = desiredPercentages(k) - v
(k -> gap)
}
val maxUnder = gaps.values.toList.sortWith(_ > _).head
//println("Max gap is " + maxUnder)
val gapsForMaxUnder = gaps.mapValues{v => Math.abs(v - maxUnder) < Double.Epsilon }
val keysByHasMaxUnder = gapsForMaxUnder.map(_.swap)
keysByHasMaxUnder(true)
}
/**
Stream of most-fair next element
*/
def proportionalStream[T](proportions : Map[T, Double], toDate : Map[T, Double]) : Stream[T] = {
val nextS = next(proportions, toDate)
val tailToDate = toDate + (nextS -> (toDate(nextS) + 1.0))
Stream.cons(
nextS,
proportionalStream(proportions, tailToDate)
)
}
When used like this, for example:
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
val none : Map[String,Double] = ss.mapValues(_ => 0.0)
val mySequence = (proportionalStream(ss, none) take 100).toList
println("Desired : " + ss)
println("Achieved : " + mySequence.groupBy(identity).mapValues(_.size))
mySequence.map(s => print(s))
println
produces :
Desired : Map(A -> 42.0, B -> 32.0, C -> 26.0)
Achieved : Map(C -> 26, A -> 42, B -> 32)
ABCABCABACBACABACBABACABCABACBACABABCABACABCABACBA
CABABCABACBACABACBABACABCABACBACABABCABACABCABACBA
For a deterministic approach, the most obvious solution would probably be this:
Keep track of the number of occurrences of each item in the sequence so far.
For the next item, choose that item for which the difference between intended and actual count (or proportion, if you prefer that) is maximal, but only if the intended count (resp. proportion) is greater than the actual one.
If there is a tie, break it in an arbitrary but deterministic way, e.g. choosing the alphabetically lowest item.
This approach would ensure an optimal adherence to the prescribed ratio for every prefix of the infinite sequence generated in this way.
Quick & dirty python proof of concept (don't expect any of the variable “names” to make any sense):
import sys

p = [0.42, 0.32, 0.26]
c = [0, 0, 0]
a = ['A', 'B', 'C']
n = 0
while n < 70*5:
    n += 1
    x = 0
    s = n*p[0] - c[0]
    for i in [1, 2]:
        si = n*p[i] - c[i]
        if si > s:
            x = i
            s = si
    sys.stdout.write(a[x])
    if n % 70 == 0:
        sys.stdout.write('\n')
    c[x] += 1
Generates
ABCABCABACABACBABCAABCABACBACABACBABCABACABACBACBAABCABCABACABACBABCAB
ACABACBACABACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACABACBABCABA
CABACBACBAABCABCABACABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABAC
ABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABACABACBABCABACABACBACB
AACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACBAACBABCABACABACBACBA
For every item of the sequence, compute a (pseudo-)random number r equidistributed between 0 (inclusive) and 100 (exclusive).
If 0 ≤ r < 42, take A
If 42 ≤ r < (42+32), take B
If (42+32) ≤ r < (42+32+26)=100, take C
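The threshold rule above can be sketched as a walk over cumulative weights. The names pickByThreshold and randomSequence are hypothetical, and the random source is passed in as a parameter only so the behaviour can be checked deterministically.

```javascript
const proportions = [["A", 42], ["B", 32], ["C", 26]];

// r must satisfy 0 <= r < 100 (the sum of the weights).
function pickByThreshold(r, table = proportions) {
  let upper = 0; // running upper bound of the current key's interval
  for (const [key, weight] of table) {
    upper += weight;
    if (r < upper) return key;
  }
  return table[table.length - 1][0]; // guard against rounding at the top end
}

// Draw n items; random() must return a number in [0, 1).
function randomSequence(n, random = Math.random) {
  const out = [];
  for (let i = 0; i < n; i++) out.push(pickByThreshold(random() * 100));
  return out.join("");
}

console.log(pickByThreshold(41.9)); // A
console.log(pickByThreshold(42));   // B
console.log(pickByThreshold(74));   // C
```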
The number of each entry in your subset is going to be the same as in your map, but with a scaling factor applied.
The scaling factor is n/100.
So if n was 50, you would have { Ax21, Bx16, Cx13 }.
Randomize the order to your liking.
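A sketch of the scaling step follows. When n times a proportion is not a whole number, some rounding rule is needed so that the counts still add up to n; the answer does not specify one, so this sketch assumes a largest-remainder rule (floor everything, then hand the leftover items to the largest fractional parts). The name scaledCounts is hypothetical.

```javascript
// Scale proportions to integer counts that sum to exactly n.
// Rounding rule (an assumption, not from the answer): largest remainder.
function scaledCounts(proportions, n) {
  const keys = Object.keys(proportions);
  const total = keys.reduce((s, k) => s + proportions[k], 0);
  const exact = keys.map(k => (proportions[k] * n) / total);
  const counts = exact.map(e => Math.floor(e));
  const missing = n - counts.reduce((s, c) => s + c, 0);
  // Indices ordered by descending fractional part.
  const order = exact
    .map((e, i) => [e - Math.floor(e), i])
    .sort((p, q) => q[0] - p[0]);
  for (let j = 0; j < missing; j++) counts[order[j][1]] += 1;
  const result = {};
  keys.forEach((k, i) => { result[k] = counts[i]; });
  return result;
}

console.log(scaledCounts({ A: 42, B: 32, C: 26 }, 50)); // { A: 21, B: 16, C: 13 }
console.log(scaledCounts({ A: 42, B: 32, C: 26 }, 10)); // { A: 4, B: 3, C: 3 }
```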
The simplest "deterministic" [in terms of #elements of each category] solution [IMO] will be: add elements in predefined order, and then shuffle the resulting list.
First, add map(x)/100 * n copies of each element x [choose how you handle integer arithmetic to avoid being off by one element], and then shuffle the resulting list.
Shuffling a list is simple with the Fisher-Yates shuffle, which is implemented in most languages: for example, Java has Collections.shuffle(), and C++ has random_shuffle()
In java, it will be as simple as:
int N = 107;
List<String> res = new ArrayList<String>();
for (Entry<String,Integer> e : map.entrySet()) { //map is predefined Map<String,Integer> for frequencies
for (int i = 0; i < Math.round(e.getValue()/100.0 * N); i++) {
res.add(e.getKey());
}
}
Collections.shuffle(res);
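For languages without a built-in shuffle, the Fisher-Yates shuffle mentioned above is short to write by hand. The sketch below is in JavaScript; the function name shuffle and the injectable random parameter are assumptions for illustration, and the input array is copied rather than modified.

```javascript
// Fisher-Yates shuffle: walk from the end, swapping each position with a
// uniformly chosen position at or before it. random() must return [0, 1).
function shuffle(items, random = Math.random) {
  const result = items.slice(); // leave the caller's array untouched
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const ordered = ["A", "A", "A", "B", "B", "C"];
console.log(shuffle(ordered).join(""));
```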
This is nondeterministic, but gives a distribution of values close to MvG's. It suffers from the problem that it could give AAA right at the start. I post it here for completeness' sake given how it proves my dissent with MvG was misplaced (and I don't expect any upvotes).
Now, if someone has an idea for an expand function that is deterministic and won't just duplicate MvG's method (rendering the calc function useless), I'm all ears!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>ErikE's answer</title>
</head>
<body>
<div id="output"></div>
<script type="text/javascript">
if (!Array.prototype.each) {
Array.prototype.each = function(callback) {
var i, l = this.length;
for (i = 0; i < l; i += 1) {
callback(i, this[i]);
}
};
}
if (!Array.prototype.sum) {
Array.prototype.sum = function() {
var sum = 0;
this.each(function(i, val) {
sum += val;
});
return sum;
};
}
function expand(counts) {
var
result = "",
charlist = [],
l,
index;
counts.each(function(i, val) {
var char = String.fromCharCode(i + 65);
for ( ; val > 0; val -= 1) {
charlist.push(char);
}
});
l = charlist.length;
for ( ; l > 0; l -= 1) {
index = Math.floor(Math.random() * l);
result += charlist[index];
charlist.splice(index, 1);
}
return result;
}
function calc(n, proportions) {
var percents = [],
counts = [],
errors = [],
fnmap = [],
errorSum,
worstIndex;
fnmap[1] = "min";
fnmap[-1] = "max";
proportions.each(function(i, val) {
percents[i] = val / proportions.sum() * n;
counts[i] = Math.round(percents[i]);
errors[i] = counts[i] - percents[i];
});
errorSum = counts.sum() - n;
while (errorSum != 0) {
var adjust = errorSum < 0 ? 1 : -1;
worstIndex = errors.indexOf(Math[fnmap[adjust]].apply(0, errors));
counts[worstIndex] += adjust;
errors[worstIndex] = counts[worstIndex] - percents[worstIndex];
errorSum += adjust;
}
return expand(counts);
}
document.body.onload = function() {
document.getElementById('output').innerHTML = calc(99, [25.1, 24.9, 25.9, 24.1]);
};
</script>
</body>
</html>
