Display non-uniform data with a Gaussian curve (a bit like kernel density estimation) - d3.js

I've got this kind of non-uniform data:
[{'time':0,'sum':0},{'time':600,'sum':2},{'time':700,'sum':4},{'time':1200,'sum':1},{'time':1300,'sum':3},{'time':1600,'sum':1},{'time':2000,'sum':0}];
"time" is on the x axis and "sum" is on the y axis. If I draw an area chart, I get these shapes (curved in red, not curved in white):
https://codepen.io/kilden/pen/podadRW
But this shape is misleading. I have to interpret the "missing" data. It is a bit like the "kernel density estimation" charts (example here: https://bl.ocks.org/mbostock/4341954), where values are zero where there is no data, but there is a "fall-off" around each point with data (a Gaussian curve).
It's hard to explain in words (and English is not my mother tongue), so I made a second CodePen to show the idea of the shape. The area in red is the shape I want (the white one is the reference from the first CodePen):
https://codepen.io/kilden/pen/VwrQrbo
I wonder if there is a way to make this kind of cumulative Gaussian curve with a (hidden?) d3 function or a trick function?

A. You're cheating yourself when you use the Epanechnikov kernel, evaluate it on a rather coarse grid, and then apply a smooth line interpolation so that it looks Gaussian. Just take a Gaussian kernel to start with.
B. You're comparing apples and oranges. A kernel density estimate is an estimate of a probability density that cannot be compared to the count of observations. The integral of the kernel density estimate is always equal to 1. You can scale the estimate by the total count of observations, but even then your curve would not "stick" to the point, since the kernel spreads the observation away from the point.
C. Something close to what you want to achieve is implemented below. Use a Gaussian curve that is 1 at 0, i.e. without the normalizing factor and the rescaling by the bandwidth. The bandwidth now scales only the width of the curve, not its height. Then take your original data array and add up all these curves, each weighted by the sum value from your data array.
This will match your data points when there are no clustered observations. Naturally, when two observations are close to each other, their individual gaussian curves can add up to something bigger than each observation.
DISCLAIMER: As I already pointed out in the comments, this just produces a pretty chart and is mathematical nonsense. I strongly recommend working out the mathematics behind what you really want to achieve. Only then should you make a chart of your data.
const WIDTH = 600;
const HEIGHT = 150;
const BANDWIDTH = 25;
let data = [
{time: 0, sum: 0},
{time: 200, sum: 4},
{time: 250, sum: 2},
{time: 500, sum: 1},
{time: 600, sum: 2},
{time: 1500, sum: 5},
{time: 1600, sum: 4},
{time: 1800, sum: 3},
{time: 2000, sum: 0},
];
// svg
const svg = d3.select("body")
.append("svg")
.attr("width", WIDTH)
.attr("height", HEIGHT)
.style("background-color", "grey");
// scales
const x_scale = d3.scaleLinear()
.domain([0, 2000])
.range([0, WIDTH]);
const y_scale = d3.scaleLinear()
.range([HEIGHT, 0]);
// curve interpolator
const line = d3.line()
.x(d => x_scale(d.time))
.y(d => y_scale(d.sum))
.curve(d3.curveMonotoneX);
const grid = [...Array(2001).keys()];
svg.append("path")
.style("fill", "rgba(255,255,255,0.4)");
// gaussian "kernel"
const gaussian = k => x => Math.exp(-0.5 * x / k * x / k);
// similar to kernel density estimate
function estimate(kernel, grid) {
return obs => grid.map(x => ({time: x, sum: d3.sum(obs, d => d.sum * kernel(x - d.time))}));
}
function render(data) {
data = data.sort((a, b) => a.time - b.time);
// make curve estimate with these kernels
const curve_estimate = estimate(gaussian(BANDWIDTH), grid)(data);
// set endpoints to zero for area plot
curve_estimate[0].sum = 0;
curve_estimate[curve_estimate.length-1].sum = 0;
y_scale.domain([0, 1.5 * Math.max(d3.max(data, d => d.sum), d3.max(curve_estimate, d => d.sum))]);
svg.select("path")
.attr("d", line(curve_estimate));
const circles = svg.selectAll("circle")
.data(data, d => d.time)
.join(
enter => enter.append("circle")
.attr("fill", "red"),
update => update.attr("fill", "white")
)
.attr("r", 2)
.attr("cx", d => x_scale(d.time))
.attr("cy", d => y_scale(d.sum));
}
render(data);
function randomData() {
data = [];
for (let i = 0; i < 10; i++) {
data.push({
time: Math.round(2000 * Math.random()),
sum: Math.round(10 * Math.random()) + 1,
});
}
render(data);
}
function addData() {
data.push({
time: Math.round(2000 * Math.random()),
sum: Math.round(10 * Math.random()) + 1,
});
render(data);
}
d3.select("#random_data").on("click", randomData);
d3.select("#add_data").on("click", addData);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/7.3.0/d3.min.js"></script>
<button id="random_data">
Random Data
</button>
<button id="add_data">
Add data point
</button>
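The claim that the summed curve matches isolated observations but overshoots clustered ones can be checked numerically without drawing anything. The following is a minimal sketch in plain JavaScript; the helper name curveAt and the two small sample arrays are assumptions made for illustration and are not part of the answer's snippet.

```javascript
// Unnormalized Gaussian: equals 1 at x = 0; k only controls the width.
const gaussian = k => x => Math.exp(-0.5 * (x / k) * (x / k));

// Hypothetical helper: weighted sum of one Gaussian bump per observation,
// evaluated at position x.
const curveAt = (obs, kernel) => x =>
  obs.reduce((acc, d) => acc + d.sum * kernel(x - d.time), 0);

const kernel = gaussian(25); // same bandwidth as BANDWIDTH in the snippet

// Two observations far apart compared with the bandwidth.
const isolated = [{time: 500, sum: 3}, {time: 1500, sum: 2}];
// Two observations only 10 time units apart.
const clustered = [{time: 500, sum: 3}, {time: 510, sum: 2}];

const isolatedAt500 = curveAt(isolated, kernel)(500);
const clusteredAt500 = curveAt(clustered, kernel)(500);

// The isolated curve stays at the observed value; the clustered one exceeds it.
console.log(isolatedAt500, clusteredAt500);
```

With isolated points the neighbouring bump contributes essentially nothing, so the curve passes through the observation. With clustered points the bumps overlap and the curve rises above each individual value.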

Related

How to match up scaleBand with scaleLinear in D3.js

I have two scales, one linear and the other band. How can I make them match up if there are gaps in the data?
Take a look at the example if necessary.
Mouse over and you see the boxes are not matching with the breaks of line.
If you want your scaleBand to be scaled (widened) where data is missing, I don't think scaleBand is the proper tool, although it is unclear whether that is what you want. Band scales are intended to provide equal spacing for each data value and assume that all values are present. A band scale is an ordinal scale.
Assuming you only want the band scale to be aligned with your data where it is present:
If you log the domains of each of your x scales (scaleBand and scaleLinear) we find that the scaleBand has a domain of:
[ "1", "2", "8", "9", "13", "14", "20", "22" ] // 8 elements
And the scaleLinear has a domain of:
[ 1, 22 ] // a span of 22 'elements'
The scaleBand will need an equivalent domain to the scaleLinear. You can do this statically ( which I show mostly to demonstrate how d3.range will work):
let xBand = d3.scaleBand()
.domain(d3.range(1,23))
.rangeRound([0, width]);
This actually produces a domain that has 22 elements from 1 through 22.
or dynamically:
let xBand = d3.scaleBand()
.domain(d3.range(d3.min(testData1, d => d[0]),
d3.max(testData1, d => d[0]+1)))
.rangeRound([0, width]);
You could do this other ways, but the d3.range() function is nice and easy.
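To see what the contiguous domain changes, here is a rough sketch in plain JavaScript. The helper integerRange is a hypothetical stand-in for d3.range with integer arguments, and the band arithmetic is a simplified model (no padding, no rounding), not d3's implementation.

```javascript
// Hypothetical stand-in for d3.range(start, stop) with integer arguments:
// integers from start up to, but not including, stop.
const integerRange = (start, stop) =>
  Array.from({length: Math.max(0, stop - start)}, (_, i) => start + i);

const testData1 = [
  [1, 10], [2, 30], [8, 34], [9, 26],
  [13, 37], [14, 12], [20, 23], [22, 16],
];
const xs = testData1.map(d => d[0]);

// Sparse domain: one band per data point (8 bands).
const sparseDomain = xs;
// Contiguous domain: one band per integer from min to max (22 bands).
const bandDomain = integerRange(Math.min(...xs), Math.max(...xs) + 1);

// Simplified band model: equal-width bands, no padding or rounding.
const width = 1000;
const bandStart = (domain, v) => domain.indexOf(v) * (width / domain.length);

// With the sparse domain the value 8 sits in the third band;
// with the contiguous domain it sits in the eighth of 22 bands.
console.log(bandStart(sparseDomain, 8), bandStart(bandDomain, 8));
```

In the contiguous case, the bands for the missing values 3 through 7 still take up space, which is what keeps the rectangles in step with the linear axis.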
However, there is still one issue that remains, this is aligning the ticks between the two scales. For the linear scale, the tick for the first value (1) is on the y axis, but the band gap scale starts (and is not centered) on the y axis and fills the gap between 1 and 2. In other words, the center point of the band does not align vertically with the vertices of the line graph.
This can be addressed by widening the linear scale's domain by 0.5 on each side, that is, subtracting 0.5 from the lower bound and adding 0.5 to the upper bound:
let xDomain = [
d3.min(testData1, d => d[0]-0.5),
d3.max(testData1, d => d[0]+0.5)
];
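The effect of the half-unit adjustment can be verified with a little arithmetic. This is a sketch in plain JavaScript under simplifying assumptions: the band scale has 22 equal bands with no padding or rounding, and the helper names bandCenter and linear are invented for the example.

```javascript
const width = 1000;

// Contiguous band domain 1..22, modelled as equal-width bands.
const bandDomain = Array.from({length: 22}, (_, i) => i + 1);
const step = width / bandDomain.length;
const bandCenter = v => bandDomain.indexOf(v) * step + step / 2;

// Minimal linear mapping from [lo, hi] to [0, width].
const linear = (lo, hi) => v => (v - lo) / (hi - lo) * width;

const unshifted = linear(1, 22);     // original domain
const shifted = linear(0.5, 22.5);   // domain widened by 0.5 on each side

// Offset between the band centre and the line vertex, before and after.
const offsets = [1, 8, 22].map(v => ({
  v,
  before: bandCenter(v) - unshifted(v),
  after: bandCenter(v) - shifted(v),
}));
console.log(offsets);
```

With the widened domain, one unit on the linear scale spans exactly one band, so each band centre coincides with the corresponding vertex.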
I've updated your codepen with the relevant changes: codepen.
And in the event that that disappears, here is a snippet (the mouse over does not work for me for some reason in the snippet, it does in the codepen )
let width = 1000;
let height = 300;
let svg = d3.select(".wrapper-area-simple").append("svg")
.attr("width", width + 80)
.attr("height", height + 80)
.append('svg:g')
.attr('transform', 'translate(40, 30)');
let testData1 = [
[ 1, 10],
[ 2, 30],
[ 8, 34],
[ 9, 26],
[13, 37],
[14, 12],
[20, 23],
[22, 16],
];
let xDomain = [
d3.min(testData1, d => d[0]-0.5),
d3.max(testData1, d => d[0]+0.5)
];
let x = d3.scaleLinear()
.rangeRound([0, width])
.domain(xDomain);
let y = d3.scaleLinear()
.range([height, 0])
.domain(d3.extent(testData1, d => d[1]));
let line = d3.line()
.x(d => x(d[0]))
.y(d => y(d[1]));
svg.append('svg:g')
.datum(testData1)
.append('svg:path')
.attr('d', line)
.attr('fill', 'none')
.attr('stroke', '#000');
let xAxis = d3.axisBottom(x)
.ticks(testData1.length);
svg.append('svg:g')
.call(xAxis)
.attr('transform', `translate(0, 300)`);
let xBand = d3.scaleBand()
.domain(d3.range(d3.min(testData1, d => d[0]),
d3.max(testData1, d => d[0]+1)
))
.rangeRound([0, width]);
svg.append('svg:g')
.selectAll('rect')
.data(testData1)
.enter()
.append('svg:rect')
.attr('x', d => xBand(d[0]))
.attr('width', xBand.bandwidth())
.attr('height', height)
.attr('fill', '#000')
.on('mouseover', function() {
d3.select(this).classed('over', true);
})
.on('mouseout', function() {
d3.select(this).classed('over', false);
});
svg {
border: 1px solid red;
}
rect {
opacity: .1;
}
rect.over {
opacity: .2;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/4.5.0/d3.min.js"> </script>
<div class="wrapper-area-simple"></div>
Well, bad news for you: they will never match up (in your case). Let's see why.
This is your data:
let testData1 = [
[1, 10],
[2, 30],
[8, 34],
[9, 26],
[13, 37],
[14, 12],
[20, 23],
[22, 16],
];
As you can see, regarding the x coordinate, the line jumps from 1 to 2, but then from 2 to 8, from 8 to 9, and then from 9 to 13... That is, the x range intervals are not regular, evenly spaced. So far, so good.
However, when you pass the same data to the band scale, this is what it does: it divides the range ([0, width], which is basically the width) by testData1.length, that is, it divides the range by 8, and creates 8 equal intervals. Those are your bands, and that's the expected behaviour of the band scale. From the documentation:
Discrete output values are automatically computed by the scale by dividing the continuous range into uniform bands. (emphasis mine)
One solution here is simply using another linear scale:
let xBand = d3.scaleLinear()
.domain(xDomain)
.rangeRound([0, width]);
And use this math for the width of the rectangles:
.attr('width', (d,i) => testData1[i+1] ? xBand(testData1[i+1][0]) - xBand(d[0]) : 0)
Here is your updated Codepen: http://codepen.io/anon/pen/MJdGyY?editors=0010
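To make the width calculation concrete, here is a small sketch in plain JavaScript. It assumes the linear domain is [1, 22], as logged in the other answer, and replaces the d3 scale with a hand-written linear mapping without rounding, so the numbers are illustrative only.

```javascript
const testData1 = [
  [1, 10], [2, 30], [8, 34], [9, 26],
  [13, 37], [14, 12], [20, 23], [22, 16],
];
const width = 1000;

// Plain linear mapping from the assumed domain [1, 22] to [0, width].
const xBand = v => (v - 1) / (22 - 1) * width;

// Each rectangle starts at its own x value and extends to the next
// data point; the last point has no successor, so its width is 0.
const rects = testData1.map((d, i) => ({
  x: xBand(d[0]),
  width: testData1[i + 1] ? xBand(testData1[i + 1][0]) - xBand(d[0]) : 0,
}));
console.log(rects);
```

Each rectangle ends exactly where the next one begins, so the boxes follow the breaks of the line even though the intervals are irregular.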

d3.v4: How to set ticks every Math.PI/2

In the d3.v4 documentation the following is stated:
To generate ticks every fifteen minutes with a time scale, say:
axis.tickArguments([d3.timeMinute.every(15)]);
Is there a similar approach that can be used with values other than time? I am plotting sine and cosine curves, so I'd like the ticks to begin at -2*Math.PI, end at 2*Math.PI, and between these values I'd like a tick to occur every Math.PI/2. I could, of course, explicitly compute the tick values and supply them to the tickValue method; however, if there is a simpler way to accomplish this, as in the time-related example quoted above, I'd prefer to use that.
Setting the end ticks and specifying the precise space of the ticks in a linear scale is a pain in the neck. The reason is that D3 axis generator was created in such a way that the ticks are automatically generated and spaced. So, what is handy for someone who doesn't care too much for customisation can be a nuisance for those that want a precise customisation.
My solution here is a hack: create two scales, one linear scale that you'll use to plot your data, and a second scale, that you'll use only to make the axis and whose values you can set at your will. Here, I choose a scalePoint() for the ordinal scale.
Something like this:
var realScale = d3.scaleLinear()
.range([10,width-10])
.domain([-2*Math.PI, 2*Math.PI]);
var axisScale = d3.scalePoint()
.range([10,width-10])
.domain(["-2 \u03c0", "-1.5 \u03c0", "-\u03c0", "-0.5 \u03c0", "0",
"0.5 \u03c0", "\u03c0", "1.5 \u03c0", "2 \u03c0"]);
Don't mind the \u03c0, that's just π (pi) in Unicode.
Check this demo, hover over the circles to see their positions:
var width = 500,
height = 150;
var data = [-2, -1, 0, 0.5, 1.5];
var realScale = d3.scaleLinear()
.range([10, width - 10])
.domain([-2 * Math.PI, 2 * Math.PI]);
var axisScale = d3.scalePoint()
.range([10, width - 10])
.domain(["-2 \u03c0", "-1.5 \u03c0", "-\u03c0", "-0.5 \u03c0", "0", "0.5 \u03c0", "\u03c0", "1.5 \u03c0", "2 \u03c0"]);
var svg = d3.select("body").append("svg")
.attr("width", width)
.attr("height", height);
var circles = svg.selectAll("circle").data(data)
.enter()
.append("circle")
.attr("r", 8)
.attr("fill", "teal")
.attr("cy", 50)
.attr("cx", function(d) {
return realScale(d * Math.PI)
})
.append("title")
.text(function(d) {
return "this circle is at " + d + " \u03c0"
});
var axis = d3.axisBottom(axisScale);
var gX = svg.append("g")
.attr("transform", "translate(0,100)")
.call(axis);
<script src="https://d3js.org/d3.v4.min.js"></script>
I was able to implement an x axis in units of PI/2, under program control (not manually laid out), by targeting the D3 tickValues and tickFormat methods. The call to tickValues sets the ticks at intervals of PI/2. The call to tickFormat generates appropriate tick labels. You can view the complete code on GitHub:
https://github.com/quantbo/sine_cosine
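The linked repository is not reproduced here, so the following is only a sketch of how such tick values and labels could be computed in plain JavaScript. The names tickValues, formatPi, and labels are assumptions for illustration; the resulting array and function would be passed to the axis's tickValues and tickFormat methods.

```javascript
const step = Math.PI / 2;

// Integer multipliers -4..4 avoid accumulating floating-point error
// that repeated addition of PI/2 could introduce.
const multipliers = Array.from({length: 9}, (_, i) => i - 4);
const tickValues = multipliers.map(m => m * step);

// Hypothetical formatter: express a tick value as a multiple of pi.
const formatPi = v => {
  const halves = Math.round(v / step); // number of PI/2 units
  if (halves === 0) return "0";
  const coeff = halves / 2;
  if (coeff === 1) return "\u03c0";
  if (coeff === -1) return "-\u03c0";
  return coeff + "\u03c0";
};

const labels = tickValues.map(formatPi);
console.log(labels.join("  "));
```

Generating the values from integer multipliers keeps the end ticks at the domain bounds, so no tick is lost to rounding at the edges.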
My solution is to customise tickValues and tickFormat. Only one scale is needed, and I delegate to the d3.ticks function to generate new tick values that are nice multiples of Math.PI.
const piChar = String.fromCharCode(960);
const tickFormat = val => {
const piVal = val / Math.PI;
return piVal + piChar;
};
const convertSIToTrig = siDomain => {
const trigMin = siDomain[0] / Math.PI;
const trigMax = siDomain[1] / Math.PI;
return d3.ticks(trigMin, trigMax, 10).map(v => v * Math.PI);
};
const xScale = d3.scaleLinear().domain([-Math.PI * 2, Math.PI * 2]).range([0, 600]);
const xAxis = d3.axisBottom(xScale)
.tickValues(convertSIToTrig(xScale.domain()))
.tickFormat(tickFormat);
This way, if your xScale's domain changes via zoom or pan, the new tick values are regenerated with a smaller or larger interval.

sine wave not going to the amplitude height in d3.js

Here is a jsbin of what I have so far.
My sine wave is not reaching the y value of 1 or -1, i.e. the amplitude.
My yScale is defined like this:
const yScaleAxis = d3.scale.linear()
.domain([-1, 1])
.range([radius, -radius]);
And I am creating the values like this:
const xValues = [0, 1.57, 3.14, 4.71, 6.28]; // 0 to 2PI
const sineData = xValues.map((x) => {
console.log(Math.sin(x));
return {x: x, y: Math.sin(x)};
});
The values for y are logged as:
0
0.9999996829318346
0.0015926529164868282
-0.999997146387718
-0.0031853017931379904
I then use the scale to set the values:
const sine = d3.svg.line()
.interpolate('basis')
.x( (d) => {return xScaleAxis(d.x);})
.y( (d) => {return yScaleAxis(d.y);});
circleGroup.append('path')
.datum(sineData)
.attr('class', 'sine-curve')
.attr('d', sine);
But as you can see in the jsbin the amplitude of the sine wave is not reaching 1 or -1 and I am not sure why.
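Change the line interpolation method to monotone. The basis method produces a B-spline, which is not guaranteed to pass through the data points, so the curve falls short of the peaks.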
More info about the interpolation options provided by d3
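How far the curve falls short can be estimated by hand. The sketch below, in plain JavaScript, assumes the interior of the basis interpolation behaves like a uniform cubic B-spline, whose value nearest a control point is a weighted average of that point and its two neighbours with weights 1/6, 4/6, and 1/6. The helper name bsplineAtKnot is invented for this example.

```javascript
// Same sample points as in the question.
const xValues = [0, 1.57, 3.14, 4.71, 6.28];
const ys = xValues.map(Math.sin);

// Value of a uniform cubic B-spline at the junction nearest control point i:
// a weighted average of the point and its two neighbours.
const bsplineAtKnot = (p, i) => (p[i - 1] + 4 * p[i] + p[i + 1]) / 6;

const peak = bsplineAtKnot(ys, 1);   // near x = PI/2, where sin is about 1
const trough = bsplineAtKnot(ys, 3); // near x = 3*PI/2, where sin is about -1

// Both come out at roughly two thirds of the true amplitude.
console.log(peak, trough);
```

Because the neighbours of each extreme are close to zero, the average pulls the curve down to about two thirds of the amplitude. An interpolation that passes through the points, such as monotone, does not have this effect.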

how do you draw linear line in scatter plot with d3.js

I am looking to implement ggplot2-style graphs using the d3.js library for interactivity. I love ggplot2, but my users are interested in interactive graphs. I've been exploring d3.js and it seems to offer many kinds of graphs, but I did not see any statistical graphs such as a linear fit line, a forecast, etc. Given a scatter plot, is it possible to also add a linear line to the graph?
I have this sample script that draws scatter plot. How would I add linear line to this graph in d3.js?
// data that you want to plot, I've used separate arrays for x and y values
var xdata = [5, 10, 15, 20],
ydata = [3, 17, 4, 6];
// size and margins for the chart
var margin = {top: 20, right: 15, bottom: 60, left: 60}
, width = 960 - margin.left - margin.right
, height = 500 - margin.top - margin.bottom;
// x and y scales, I've used linear here but there are other options
// the scales translate data values to pixel values for you
var x = d3.scale.linear()
.domain([0, d3.max(xdata)]) // the range of the values to plot
.range([ 0, width ]); // the pixel range of the x-axis
var y = d3.scale.linear()
.domain([0, d3.max(ydata)])
.range([ height, 0 ]);
// the chart object, includes all margins
var chart = d3.select('body')
.append('svg:svg')
.attr('width', width + margin.right + margin.left)
.attr('height', height + margin.top + margin.bottom)
.attr('class', 'chart')
// the main object where the chart and axis will be drawn
var main = chart.append('g')
.attr('transform', 'translate(' + margin.left + ',' + margin.top + ')')
.attr('width', width)
.attr('height', height)
.attr('class', 'main')
// draw the x axis
var xAxis = d3.svg.axis()
.scale(x)
.orient('bottom');
main.append('g')
.attr('transform', 'translate(0,' + height + ')')
.attr('class', 'main axis date')
.call(xAxis);
// draw the y axis
var yAxis = d3.svg.axis()
.scale(y)
.orient('left');
main.append('g')
.attr('transform', 'translate(0,0)')
.attr('class', 'main axis date')
.call(yAxis);
// draw the graph object
var g = main.append("svg:g");
g.selectAll("scatter-dots")
.data(ydata) // using the values in the ydata array
.enter().append("svg:circle") // create a new circle for each value
.attr("cy", function (d) { return y(d); } ) // translate y value to a pixel
.attr("cx", function (d,i) { return x(xdata[i]); } ) // translate x value
.attr("r", 10) // radius of circle
.style("opacity", 0.6); // opacity of circle
To add a line to your plot, all that you need to do is to append some line SVGs to your main SVG (chart) or to the group that contains your SVG elements (main).
Your code would look something like the following:
chart.append('line')
.attr('x1',x(10))
.attr('x2',x(20))
.attr('y1',y(5))
.attr('y2',y(10))
This would draw a line from (10,5) to (20,10). You could similarly create a data set for your lines and append a whole bunch of them.
One thing you might be interested in is the SVG path element. This is more common for lines than drawing one straight segment at a time. The documentation is here.
On another note you may find it easier to work with data in d3 if you create it all as one object. For example, if your data was in the following form:
data = [{x: 5, y:3}, {x: 10, y:17}, {x: 15, y:4}, {x: 20, y:6}]
You could use:
g.selectAll("scatter-dots")
.data(data) // using the combined array of {x, y} objects
.enter().append("svg:circle") // create a new circle for each value
.attr("cy", function (d) { return y(d.y); } ) //set y
.attr("cx", function (d,i) { return x(d.x); } ) //set x
This would eliminate potentially messy indexing if your data gets more complex.
UPDATE: Here is the relevant block: https://bl.ocks.org/HarryStevens/be559bed98d662f69e68fc8a7e0ad097
I wrote this function to calculate a linear regression from data, formatted as JSON.
It takes 5 parameters:
1) Your data
2) The column name of the data plotted on your x-axis
3) The column name of the data plotted on your y-axis
4) The minimum value of your x-axis
5) The minimum value of your y-axis
I got the formula for calculating a linear regression from http://classroom.synonym.com/calculate-trendline-2709.html
function calcLinear(data, x, y, minX, minY){
/////////
//SLOPE//
/////////
// Let n = the number of data points
var n = data.length;
var pts = [];
data.forEach(function(d,i){
var obj = {};
obj.x = d[x];
obj.y = d[y];
obj.mult = obj.x*obj.y;
pts.push(obj);
});
// Let a equal n times the summation of all x-values multiplied by their corresponding y-values
// Let b equal the sum of all x-values times the sum of all y-values
// Let c equal n times the sum of all squared x-values
// Let d equal the squared sum of all x-values
var sum = 0;
var xSum = 0;
var ySum = 0;
var sumSq = 0;
pts.forEach(function(pt){
sum = sum + pt.mult;
xSum = xSum + pt.x;
ySum = ySum + pt.y;
sumSq = sumSq + (pt.x * pt.x);
});
var a = sum * n;
var b = xSum * ySum;
var c = sumSq * n;
var d = xSum * xSum;
// Plug the values that you calculated for a, b, c, and d into the following equation to calculate the slope
// m = (a - b) / (c - d)
var m = (a - b) / (c - d);
/////////////
//INTERCEPT//
/////////////
// Let e equal the sum of all y-values
var e = ySum;
// Let f equal the slope times the sum of all x-values
var f = m * xSum;
// Plug the values you have calculated for e and f into the following equation for the y-intercept
// y-intercept = b = (e - f) / n = (14.5 - 10.5) / 3 = 1.3
var b = (e - f) / n;
// return an object of two points
// each point is an object with an x and y coordinate
return {
ptA : {
x: minX,
y: m * minX + b
},
ptB : {
y: minY,
x: (minY - b) / m
}
}
}
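As a compact cross-check of the same slope and intercept formulas, here is a sketch in plain JavaScript applied to the xdata and ydata arrays from the question. The function name leastSquares and its return shape are assumptions for this example and differ from calcLinear above.

```javascript
// Hypothetical helper: ordinary least-squares fit y = m * x + b.
function leastSquares(xs, ys) {
  const n = xs.length;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (let i = 0; i < n; i++) {
    sumX += xs[i];
    sumY += ys[i];
    sumXY += xs[i] * ys[i];
    sumXX += xs[i] * xs[i];
  }
  // Slope: (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
  const m = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  // Intercept: (sum(y) - m * sum(x)) / n
  const b = (sumY - m * sumX) / n;
  return {m, b};
}

const xdata = [5, 10, 15, 20];
const ydata = [3, 17, 4, 6];
const fit = leastSquares(xdata, ydata);

// Endpoints of the fitted line across the x domain [0, max(xdata)].
const x1 = 0, x2 = Math.max(...xdata);
const endpoints = [
  {x: x1, y: fit.m * x1 + fit.b},
  {x: x2, y: fit.m * x2 + fit.b},
];
console.log(fit, endpoints);
```

The two endpoints can then be passed through the x and y scales and used as the x1, y1, x2, y2 attributes of an SVG line element appended to the chart group.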

d3.js ticks function giving more elements than needed

I have this simple linear scale:
var x = d3.scale.linear().domain([0, 250]);
x.ticks(6), as expected, returns:
[0, 50, 100, 150, 200, 250]
However, x.ticks(11) returns:
[0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240]
When what I want is:
[0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
How do I fix this?
I had a similar issue with ordinal scales, I simply wrote some code to pick evenly spaced intervals in my data. Since I wanted it to always choose the first and last data element on the axis, I calculate the middle part only. Since some things do not divide evenly, rather than having the residual in one or two bins, I spread it out across the bins as I go; until there is no more residual.
There is probably a simpler way to accomplish this but here's what I did:
function getTickValues(data, numValues, accessor)
{
var interval, residual, tickIndices, last, i;
if (numValues <= 0)
{
tickIndices = [];
}
else if (numValues == 1)
{
tickIndices = [ Math.floor(numValues/2) ];
}
else
{
// We have at least 2 ticks to display.
// Calculate the rough interval between ticks.
interval = Math.floor(data.length / (numValues-1));
// If it's not perfect, record it in the residual.
residual = Math.floor(data.length % (numValues-1));
// Always label our first datapoint.
tickIndices = [0];
// Set stop point on the interior ticks.
last = data.length-interval;
// Figure out the interior ticks, gently drift to accommodate
// the residual.
for (i=interval; i<last; i+=interval)
{
if (residual > 0)
{
i += 1;
residual -= 1;
}
tickIndices.push(i);
}
// Always graph the last tick.
tickIndices.push(data.length-1);
}
if (accessor)
{
return tickIndices.map(function(i) { return accessor(data[i]); });
}
return tickIndices.map(function(i) { return data[i]; });
}
You call the function via:
getTickValues(yourData, numValues, [optionalAccessor]);
Where yourData is your array of data and numValues is the number of ticks you want. If your array contains a complex data structure, the optional accessor comes in handy.
Lastly, you then feed this into your axis. Instead of ticks(numTicks) which is only a hint to d3 apparently, you call tickValues() instead.
I learned the hard way that your tickValues have to match your data exactly for ordinal scales. This may or may not be as helpful for linear scales, but I thought I'd share it anyways.
Hope this helps.
Pat
You can fix this by replacing the x.ticks(11) with your desired array.
So if your code looks like this and x is your linear scale:
chart.selectAll("line")
.data(x.ticks(11))
.enter()
.append("line")
.attr("x1", x)
.attr("x2", x)
.attr("y1", 0)
.attr("y2",120)
.style("stroke", "#CCC");
You can replace x.ticks(11) with your array:
var desiredArray = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
chart.selectAll("line")
.data(desiredArray)
.enter()
.append("line")
.attr("x1", x)
.attr("x2", x)
.attr("y1", 0)
.attr("y2",120)
.style("stroke", "#CCC");
The linear scale will automatically place the lines based on your input. The reason ticks() isn't giving you your desired separation is that d3 treats the count passed to ticks() only as a suggestion.
axis.tickValues((function(last, values) {
var myArray = [0];
for(var i = 1; i < values; i++) {
myArray.push(last*i/(values-1))
}
return myArray;
})(250, 11));
This should give you an evenly spaced out array for specifying the number of tick values you want in a particular range.
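For background on why x.ticks(11) returns 13 values, here is a simplified model of "nice" step selection in plain JavaScript. It is an assumption-laden sketch, not d3's source code: the functions niceStep and niceTicks are invented for the example and only mimic the general idea of choosing a step of 1, 2, or 5 times a power of ten.

```javascript
// Simplified model (an assumption, not d3's source):
// pick a step of 1, 2, 5, or 10 times a power of ten near the raw step.
function niceStep(start, stop, count) {
  const rawStep = Math.abs(stop - start) / Math.max(1, count);
  const power = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const ratio = rawStep / power;
  if (ratio >= Math.sqrt(50)) return 10 * power;
  if (ratio >= Math.sqrt(10)) return 5 * power;
  if (ratio >= Math.sqrt(2)) return 2 * power;
  return power;
}

// Ticks are the multiples of the nice step that fall inside [start, stop].
function niceTicks(start, stop, count) {
  const step = niceStep(start, stop, count);
  const first = Math.ceil(start / step);
  const last = Math.floor(stop / step);
  const ticks = [];
  for (let i = first; i <= last; i++) ticks.push(i * step);
  return ticks;
}

const sixTicks = niceTicks(0, 250, 6);     // raw step about 41.7
const elevenTicks = niceTicks(0, 250, 11); // raw step about 22.7
console.log(sixTicks, elevenTicks);
```

A step of 25 is not of the form 1, 2, or 5 times a power of ten, which is why a requested count of 11 snaps to a step of 20 in this model. To get exactly 0, 25, ..., 250, the values have to be supplied explicitly.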
