dc.js lineChart - fill missing dates and show zero where no data - dc.js

I have a dc.js lineChart that shows the number of events per hour. Rather than joining the line between two known values, I would like the value to be shown as zero for any hour that has no data.
So for the data below I would like to have the line drop to zero for 10AM
{datetime: "2018-05-01 09:10:00", event: 1}
{datetime: "2018-05-01 11:30:00", event: 1}
{datetime: "2018-05-01 11:45:00", event: 1}
{datetime: "2018-05-01 12:15:00", event: 1}
var eventsByDay = facts.dimension(function(d) { return d3.time.hour(d.datetime);});
var eventsByDayGroup = eventsByDay.group().reduceCount(function(d) { return d.datetime; });
I've had a look at defined, but I don't think that is right. I think I need to add a zero value to the data for each hour that has no data. However, I'm not sure how to go about it, and I can't find an example of what I'm trying to do with dc.js.
This other question does answer this but for d3.js and I'm unsure how to translate that - d3 linechart - Show 0 on the y-axis without passing in all points?
Can anyone point me in the right direction?
Thanks!

You are on the right track with ensure_group_bins but instead of knowing the required set of bins beforehand, in this case we need to calculate them.
Luckily d3 provides interval.range which returns an array of dates for every interval boundary between two dates.
Then we need to merge-sort that set with the bins from the original group. Perhaps I have over-engineered this slightly, but here is a function to do that:
function fill_intervals(group, interval) {
return {
all: function() {
var orig = group.all().map(kv => ({key: new Date(kv.key), value: kv.value}));
var target = interval.range(orig[0].key, orig[orig.length-1].key);
var result = [];
for(var oi = 0, ti = 0; oi < orig.length && ti < target.length;) {
if(orig[oi].key <= target[ti]) {
result.push(orig[oi]);
if(orig[oi++].key.valueOf() === target[ti].valueOf())
++ti;
} else {
result.push({key: target[ti], value: 0});
++ti;
}
}
if(oi<orig.length)
Array.prototype.push.apply(result, orig.slice(oi));
if(ti<target.length)
Array.prototype.push.apply(result, target.slice(ti).map(t => ({key: t, value: 0})));
return result;
}
}
}
Basically we iterate over both the original bins and the target bins, and take whichever is lower. If they are the same, then we increment both counters; otherwise we just increment the lower one.
Finally, when either array has run out, we append all remaining results from the other array.
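To check the expected output without a chart, here is a minimal sketch that fills the gaps for the sample data. It is not the merge above but a simpler lookup-based variant, and it makes two assumptions: hourRange is a hand-written stand-in for d3.timeHour.range, and hourlyGroup is a plain object standing in for the crossfilter group (both names are hypothetical).

```javascript
// Hypothetical stand-in for d3.timeHour.range(start, stop): every hour
// boundary in [start, stop). Assumes start is already on an hour boundary.
function hourRange(start, stop) {
  const out = [];
  for (let t = start.getTime(); t < stop.getTime(); t += 60 * 60 * 1000) {
    out.push(new Date(t));
  }
  return out;
}

// Lookup-based variant of the idea (not the merge above): build every hourly
// bin between the first and last key, then fill in counts where they exist.
function fillHours(group) {
  return {
    all: function() {
      const orig = group.all();
      if (orig.length === 0) return []; // nothing to fill
      const counts = new Map(orig.map(kv => [new Date(kv.key).getTime(), kv.value]));
      const first = new Date(orig[0].key);
      const last = new Date(orig[orig.length - 1].key);
      // stop is exclusive, so go 1 ms past the last key to include it
      const bins = hourRange(first, new Date(last.getTime() + 1));
      return bins.map(t => ({key: t, value: counts.get(t.getTime()) || 0}));
    }
  };
}

// Fake group holding the hourly counts for the four sample events (as UTC)
const hourlyGroup = {
  all: () => [
    {key: new Date(Date.UTC(2018, 4, 1, 9)), value: 1},
    {key: new Date(Date.UTC(2018, 4, 1, 11)), value: 2},
    {key: new Date(Date.UTC(2018, 4, 1, 12)), value: 1}
  ]
};

const filled = fillHours(hourlyGroup).all();
// Four bins come back; the 10:00 bin is present with value 0
console.log(filled.map(kv => kv.key.getUTCHours() + ':00 -> ' + kv.value));
```

The lookup version only works when every group key falls exactly on an interval boundary; the merge handles keys that do not.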
Here is an example fiddle based on your code.
It's written in D3v4 but you should only have to change d3.timeHour in two places to d3.time.hour to use it with D3v3.
I'll add this function to the FAQ!

Related

Histogram based on "reduceSummed" groups

I have CSV data with the following pattern:
Quarter,productCategory,unitsSold
2018-01-01,A,21766
2018-01-01,B,10076
2018-01-01,C,4060
2018-04-01,A,27014
2018-04-01,B,12219
2018-04-01,C,4740
2018-07-01,A,29503
2018-07-01,B,13020
2018-07-01,C,5549
2018-10-01,A,3796
2018-10-01,B,15110
2018-10-01,C,6137
2019-01-01,A,25008
2019-01-01,B,11655
2019-01-01,C,4630
2019-04-01,A,31633
2019-04-01,B,14837
2019-04-01,C,5863
2019-07-01,A,33813
2019-07-01,B,15442
2019-07-01,C,6293
2019-10-01,A,35732
2019-10-01,B,19482
2019-10-01,C,6841
As you can see, there are 3 product categories sold every quarter. I can make a histogram and count how many rows fall into each bin of unitsSold. The problem is that every row (one per Quarter and category) is counted separately. What I would like is a histogram where unitsSold is first summed per Quarter with reduceSum, and the bins are built from those sums.
This would result in something like this:
Quarter, unitsSold
2018-01-01,35902
2018-04-01,43973
2018-07-01,48072
2018-10-01,25043
2019-01-01,41293
2019-04-01,52333
2019-07-01,55548
2019-10-01,62055
A number of Quarters would then fall into each bin of unitsSold. For example, a bin of 50.000 - 70.000 would count 3 Quarters (2019-04-01, 2019-07-01 and 2019-10-01).
Normally I would do something like this:
const histogramChart = new dc.BarChart('#histogram');
const histogramDim = ndx.dimension(d => Math.round(d.unitsSold / binSize) * binSize);
const histogramGroup = histogramDim.group().reduceCount();
But in the desired situation the histogram is built on something that has already been "reduceSummed", ending up in a bar chart histogram like this (data does not match this example):
How can this be done with dc.js/crossfilter.js?
Regrouping the data by value
I think the major difference between your question and this previous question is that you want to bin the data when you "regroup" it. (Sometimes this is called a "double reduce"... no clear names for this stuff.)
Here's one way to do that, using an offset and width:
function regroup(group, width, offset = 0) {
return {
all: function() {
const bins = {};
group.all().forEach(({key, value}) => {
const bin = Math.floor((value - offset) / width);
bins[bin] = (bins[bin] || 0) + 1;
});
return Object.entries(bins).map(
([bin, count]) => ({key: bin*width + offset, value: count}));
}
}
}
What we do here is loop through the original group and
1. map each value to its bin number
2. increment the count for that bin number, or start at 1
3. map the bins back to original numbers, with counts
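As a quick check of those steps, here is a sketch that runs regroup on the per-quarter totals from the question. The summedByQuarter object is a hypothetical stand-in for the reduceSum group, so no crossfilter is needed to run it.

```javascript
// regroup() as defined above, repeated so the sketch runs on its own
function regroup(group, width, offset = 0) {
  return {
    all: function() {
      const bins = {};
      group.all().forEach(({key, value}) => {
        const bin = Math.floor((value - offset) / width);
        bins[bin] = (bins[bin] || 0) + 1;
      });
      return Object.entries(bins).map(
        ([bin, count]) => ({key: bin * width + offset, value: count}));
    }
  };
}

// Plain object standing in for the reduceSum group
// (per-quarter totals taken from the question)
const summedByQuarter = {
  all: () => [
    {key: '2018-01-01', value: 35902},
    {key: '2018-04-01', value: 43973},
    {key: '2018-07-01', value: 48072},
    {key: '2018-10-01', value: 25043},
    {key: '2019-01-01', value: 41293},
    {key: '2019-04-01', value: 52333},
    {key: '2019-07-01', value: 55548},
    {key: '2019-10-01', value: 62055}
  ]
};

// Bins of width 10000 starting at 0
const histogram = regroup(summedByQuarter, 10000).all();
console.log(histogram);

// Bins of width 20000 starting at 10000, matching the 50.000 - 70.000 example
const wide = regroup(summedByQuarter, 20000, 10000).all();
console.log(wide);
```

With width 20000 and offset 10000, the bin keyed 50000 counts the three 2019 quarters named in the question.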
Testing it out
I displayed your original data with the following chart (too lazy to figure out quarters, although I think it's not hard with recent D3):
const quarterDim = cf.dimension(({Quarter}) => Quarter),
unitsGroup = quarterDim.group().reduceSum(({unitsSold}) => unitsSold);
quarterChart.width(300)
.height(200)
.margins({left: 50, top: 0, right: 0, bottom: 20})
.dimension(quarterDim)
.group(unitsGroup)
.x(d3.scaleTime().domain([d3.min(data, d => d.Quarter), d3.timeMonth.offset(d3.max(data, d => d.Quarter), 3)]))
.elasticY(true)
.xUnits(d3.timeMonths);
and the new chart with
const rg = regroup(unitsGroup, 10000);
countQuartersChart.width(500)
.height(200)
.dimension({})
.group(rg)
.x(d3.scaleLinear())
.xUnits(dc.units.fp.precision(10000))
.elasticX(true)
.elasticY(true);
(Note the empty dimension, which disables filtering. Filtering may be possible but you have to map back to the original dimension keys so I’m skipping that for now.)
Here are the charts I get, which look correct at a glance:
Demo fiddle.
Adding filtering to the chart
To implement filtering on this "number of quarters by values" histogram, first let's enable filtering between the by-values chart and the quarters chart by putting the by-values chart on its own dimension:
const quarterDim2 = cf.dimension(({Quarter}) => Quarter),
unitsGroup2 = quarterDim2.group().reduceSum(({unitsSold}) => unitsSold);
const byvaluesGroup = regroup(unitsGroup2, 10000);
countQuartersChart.width(500)
.height(200)
.dimension(quarterDim2)
.group(byvaluesGroup)
.x(d3.scaleLinear())
.xUnits(dc.units.fp.precision(10000))
.elasticX(true)
.elasticY(true);
Then, we implement filtering with
countQuartersChart.filterHandler((dimension, filters) => {
if(filters.length === 0)
dimension.filter(null);
else {
console.assert(filters.length === 1 && filters[0].filterType === 'RangedFilter');
const range = filters[0];
const included_quarters = unitsGroup2.all()
.filter(({value}) => range[0] <= value && value < range[1])
.map(({key}) => key.getTime());
dimension.filterFunction(k => included_quarters.includes(k.getTime()));
}
return filters;
});
This finds all quarters in unitsGroup2 that have a value which falls in the range. Then it sets the dimension's filter to accept only the dates of those quarters.
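The range-to-quarters step can be tried on its own. This is a sketch under the assumption that group keys are Date objects, as in the handler; quarterTotals and quartersInRange are hypothetical names, and the totals are the per-quarter sums from the question.

```javascript
// Per-quarter totals, shaped like the output of unitsGroup2.all()
const quarterTotals = [
  ['2018-01-01', 35902], ['2018-04-01', 43973],
  ['2018-07-01', 48072], ['2018-10-01', 25043],
  ['2019-01-01', 41293], ['2019-04-01', 52333],
  ['2019-07-01', 55548], ['2019-10-01', 62055]
].map(([quarter, total]) => ({key: new Date(quarter), value: total}));

// Hypothetical helper: timestamps of the quarters whose total lies in [lo, hi)
function quartersInRange(bins, [lo, hi]) {
  return bins
    .filter(({value}) => lo <= value && value < hi)
    .map(({key}) => key.getTime());
}

// A brush covering 50000 up to (not including) 70000
const included = quartersInRange(quarterTotals, [50000, 70000]);

// Predicate of the kind passed to dimension.filterFunction
const keep = k => included.includes(k.getTime());

console.log(included.length);                // number of quarters selected
console.log(keep(new Date('2019-07-01')));   // inside the brushed range
console.log(keep(new Date('2018-10-01')));   // outside the brushed range
```

Comparing timestamps rather than Date objects matters here, because two Date objects for the same instant are never equal with includes.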
Odds and ends
Quarters
D3 supports quarters with interval.every:
const quarterInterval = d3.timeMonth.every(3);
chart.xUnits(quarterInterval.range);
Eliminating the zeroth bin
As discussed in the comments, when other charts have filters active, there may end up being many quarters with fewer than 10000 units sold, resulting in a very tall zero bar which distorts the chart.
The zeroth bin can be removed with
delete bins[0];
before the return in regroup()
Rounding the by-values brush
If snapping to the bars is desired, you can enable it with
.round(x => Math.round(x/10000)*10000)
Otherwise, the filtered range can start or end inside of a bar, and the way the bars are colored when brushed is somewhat inaccurate as seen below.
Here's the new fiddle.
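For reference, the rounding function on its own behaves like this. A small sketch, assuming the chart applies the function to each end of the brush; binWidth and snap are hypothetical names.

```javascript
// Snap a value to the nearest multiple of the bin width (10000 in this chart)
const binWidth = 10000;
const snap = x => Math.round(x / binWidth) * binWidth;

// A brush dragged from 23456 to 47000 lands on whole-bar boundaries
const snapped = [23456, 47000].map(snap);
console.log(snapped); // [ 20000, 50000 ]
```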

PieChart with all values joined

I'm a newbie and I'm working on a dashboard. I want a pie chart that shows the total value of one dimension (100% when all the records are selected, changing as the other filters are applied). I tried groupAll() but it didn't work. The code below works, but it shows the groups separately. How can I do this? Thanks a lot!
CSV
CausaRaiz,probabilidad,costeReparacion,costePerdidaProduccion,impacto,noDetectabilidad,criticidad,codigo,coste,duracion,recursosRequeridos
PR.CR01,2,1.3,1,1,1,2,AM.PR.01,1,2,Operarios
PR.CR02,4,2.3,3,2.5,2,20,AM.PR.02,2,3,Ingenieria
PR.CR03,4,3.3,4,3.5,4,25,AM.PR.03,3,4,Externos
PR.CR04,2,2.7,2,2,2,8,AM.PR.04,3,4,Externos
FR.CR01,3,2.9,3,2.5,3,22,AM.FR.01,4,5,Ingenieria
FR.CR02,2,2.1,2,2,2,8,AM.FR.02,4,3,Operarios
FR.CR03,1,1.7,1,1,1,1,AM.FR.03,3,5,Operarios
RF.CR01,1,1.9,2,2,3,6,AM.RF.01,3,5,Externos
RF.CR02,3,3.5,4,3.5,4,20,AM.RF.02,4,4,Ingenieria
RF.CR03,4,3.9,4,3.5,4,25,AM.RF.03,4,5,Operarios
Code working
var pieCri = dc.pieChart("#criPie")
var criDimension = ndx.dimension(function(d) { return +d.criticidad; });
var criGroup =criDimension.group().reduceCount();
pieCri
.width(270)
.height(270)
.innerRadius(20)
.dimension(criDimension)
.group(criGroup)
.on('pretransition', function(chart) {
chart.selectAll('text.pie-slice').text(function(d) {
return d.data.key + ' ' + dc.utils.printSingleValue((d.endAngle - d.startAngle) / (2*Math.PI) * 100) + '%';
})
});
pieCri.render();
I can show the total percentage with a number:
var critTotal = ndx.groupAll().reduceSum(function(d) { return +d.criticidad; });
var numbCriPerc = dc.numberDisplay("#criPerc");
numbCriPerc
.group(critTotal)
.formatNumber(d3.format(".3s"))
.valueAccessor( function(d) { return d/critTotalValue*100; } );
But I would prefer a pie chart that shows the difference between all the records and the selection.
If I understand your question correctly, you want to show a pie chart with exactly two slices: the count of items included, and the count of items excluded.
You're on the right track with using groupAll, which is great for taking a count of rows (or sum of a field) based on the current filters. There are just two parts missing:
1. finding the full total with no filters applied
2. putting the data in the right format for the pie chart to read it
This kind of preprocessing is really easy to do with a fake group, which will adapt as the filters change.
Here is one way to do it:
// takes a groupAll and produces a fake group with two key/value pairs:
// included: the total value currently filtered
// excluded: the total value currently excluded from the filter
// "includeKey" and "excludeKey" are the key names to give to the two pairs
// note: this must be constructed before any filters are applied!
function portion_group(groupAll, includeKey, excludeKey) {
includeKey = includeKey || "included";
excludeKey = excludeKey || "excluded";
var total = groupAll.value();
return {
all: function() {
var current = groupAll.value();
return [
{
key: includeKey,
value: current
},
{
key: excludeKey,
value: total - current
}
]
}
}
}
You'll construct a groupAll to find the total under the current filters:
var criGroupAll = criDimension.groupAll().reduceCount();
And you can construct the fake group when passing it to the chart:
.group(portion_group(criGroupAll))
Note: you must have no filters active when constructing the fake group this way, since it will grab the unfiltered total at that point.
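To see what the pie chart would receive, here is a sketch that drives portion_group with a stand-in for the groupAll. The fakeGroupAll object and the currentFilter variable are hypothetical; the recursos array is the recursosRequeridos column from the CSV in the question.

```javascript
// portion_group() as defined above, repeated so the sketch runs on its own
function portion_group(groupAll, includeKey, excludeKey) {
  includeKey = includeKey || "included";
  excludeKey = excludeKey || "excluded";
  var total = groupAll.value(); // captured once, with no filters active
  return {
    all: function() {
      var current = groupAll.value();
      return [
        {key: includeKey, value: current},
        {key: excludeKey, value: total - current}
      ];
    }
  };
}

// recursosRequeridos column of the ten CSV rows, in file order
var recursos = ['Operarios', 'Ingenieria', 'Externos', 'Externos', 'Ingenieria',
                'Operarios', 'Operarios', 'Externos', 'Ingenieria', 'Operarios'];

// Hypothetical stand-in for groupAll().reduceCount():
// value() counts the rows that pass whatever filter is currently set
var currentFilter = null;
var fakeGroupAll = {
  value: function() {
    return recursos.filter(function(r) {
      return currentFilter === null || r === currentFilter;
    }).length;
  }
};

var portions = portion_group(fakeGroupAll); // built before any filter is set
var unfiltered = portions.all();            // everything is included

currentFilter = 'Operarios';                // simulate a filter from another chart
var filtered = portions.all();              // 4 rows included, 6 excluded

console.log(unfiltered, filtered);
```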
Finally, I noticed that the way you were customizing pie chart labels, they would be shown even if the slice is empty. That looked especially bad in this example, so I fixed it like this:
.on('pretransition', function(chart) {
chart.selectAll('text.pie-slice').text(function(d) {
return d3.select(this).text() && (d.data.key + ' ' + dc.utils.printSingleValue((d.endAngle - d.startAngle) / (2*Math.PI) * 100) + '%');
})
});
This detects whether the label text is empty because of minAngleForLabel, and doesn't try to replace it in that case.
Example fiddle based on your code.

dc.js rowChart to Filter by max key

I have a dashboard where I'm showing Headcount over time. One is a line Graph that shows headcount over time period, the other is a rowChart that is split by HCLevel1 - that is simply there to allow users to filter.
I would like the rowChart to show Heads for the latest period within the date filter (rather than showing the full sum of heads for the full period which would be wrong).
I can do this by combining two fields into a dimension, but the problem with this is that when I use the rowChart to filter by business, I only see one month in the line chart - whereas I'd like to see the full period that's filtered. I can't work out how I could do this with a fake group, because the rowChart's dimension/key is HCLevel1.
My data is formatted like this:
var data = [
{
"HCLevel1": "Commercial",
"HCLevel2": "Portfolio TH",
"Period": 201407,
"Heads": 720
},
I've tried to use this custom reduce (picked up from another SO question) but it doesn't work correctly (minus values, incorrect values etc).
function reduceAddAvgPeriods(p, v) {
if (v.Period in p.periodsArray) {
p.periodsArray[v.Period] += v.Heads;
} else {
p.periodsArray[v.Period] = 0;
p.periodCount++;
}
p.heads += v.Heads;
return p;
}
Currently, my jsfiddle example is combining 2 fields for the dimension, but as you can see, I can't then filter using the rowChart to show me the full period on the line chart.
I can use reductio to give me the average, but I'd like to provide actual Heads value for most recent date filtered.
https://jsfiddle.net/kevinelphick/4ybekqey/3/
I hope this is possible, any help would be much appreciated, thanks!
I glanced at this a few days ago, but it took me a little while to figure out. Tricky!
We can restrict the design by considering these two facts:
1. We want to filter the row chart by "Level". That's simply
var dimLevel = cf.dimension(function (d) { return d.HCLevel1 || ''; });
2. A group does not observe its own dimension's filters. So we probably want to use the dimension from #1 to produce the data (the group) for the row chart.
Given these two restrictions, maybe we can dimension and group by level, but inside the bins of the group, keep track of the periods that contribute to that bin?
This is a common pattern often used for stacked charts:
var levelPeriodGroup = dimLevel.group().reduce(
function(p, v) {
p[v.Period] = (p[v.Period] || 0) + v.Heads;
return p;
},
function(p, v) {
p[v.Period] -= v.Heads;
return p;
},
function() {
return {};
}
);
Here, we'll just 'peel off' the top stack, dropping any zeros:
function last_period(group, maxPeriod) {
return {
all: function() {
var max = maxPeriod();
return group.all().map(function(kv) {
return {key: kv.key, value: kv.value[max]};
}).filter(function(kv) {
return kv.value > 0;
});
}
};
}
To keep last_period somewhat general, maxPeriod is now a function, which we'll define like this:
function max_period() {
return dimPeriod.top(1)[0].Period;
}
Bringing it all together and supplying it to the row chart:
rowChart
.group(last_period(levelPeriodGroup, max_period))
.dimension(dimLevel)
.elasticX(true);
Since the period is no longer part of the labels of the chart, we can put it in a headline:
<h4>Last Period: <span id="last-period"></span></h4>
and update it whenever the row chart is drawn:
rowChart.on('pretransition', function(chart) {
d3.select('#last-period').text(max_period());
});
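The reduce-then-peel idea can be checked without crossfilter. In this sketch, groupByLevel and maxPeriodOf are hypothetical stand-ins for the crossfilter group and for max_period, and only the first row's values come from the question; the other rows are made up for illustration.

```javascript
// Rows in the shape of the question's data (all but the first are made up)
var rows = [
  {HCLevel1: 'Commercial', Period: 201407, Heads: 720},
  {HCLevel1: 'Commercial', Period: 201408, Heads: 700},
  {HCLevel1: 'Operations', Period: 201407, Heads: 300},
  {HCLevel1: 'Operations', Period: 201408, Heads: 310}
];

// Same "add" reducer as above: heads per period inside each level's bin
function addRow(p, v) {
  p[v.Period] = (p[v.Period] || 0) + v.Heads;
  return p;
}

// Stand-in for dimLevel.group().reduce(...): build the bins by hand
function groupByLevel(data) {
  var bins = {};
  data.forEach(function(v) {
    bins[v.HCLevel1] = addRow(bins[v.HCLevel1] || {}, v);
  });
  return {
    all: function() {
      return Object.keys(bins).sort().map(function(k) {
        return {key: k, value: bins[k]};
      });
    }
  };
}

// last_period() as defined above
function last_period(group, maxPeriod) {
  return {
    all: function() {
      var max = maxPeriod();
      return group.all().map(function(kv) {
        return {key: kv.key, value: kv.value[max]};
      }).filter(function(kv) {
        return kv.value > 0;
      });
    }
  };
}

// Stand-in for max_period(): the largest Period present in the given rows
function maxPeriodOf(data) {
  return function() {
    return Math.max.apply(null, data.map(function(v) { return v.Period; }));
  };
}

var latest = last_period(groupByLevel(rows), maxPeriodOf(rows)).all();
console.log(latest); // heads per level for period 201408 only
```

Each level reports only its 201408 headcount, while the per-period totals remain available inside the bins for other uses.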

For NVD3 lineChart Remove Missing Values (to be able to interpolate)

I am using NVD3 to visualise data on economic inequality. The chart for the US is here: http://www.chartbookofeconomicinequality.com/inequality-by-country/USA/
These are two lineCharts on top of each other. The problem I have is that there are quite a lot of missing values and this causes two problems:
If I would not make sure that the missing values are not visualised the line Chart would connect all shown values with the missing values. Therefore I used the following to not have the missing values included in the line chart:
chart = nv.models.lineChart()
.x(function(d) { return d[0] })
.y(function(d) { return d[1]== 0 ? null : d[1]; })
But still if you hover over the x-axis you see that the missing values are shown in the tooltip on mouseover. Can I get rid of them altogether? Possibly using remove in NVD3?
The second problem is directly related to that. Now the line only connects values of the same series when there is no missing values in between. That means there are many gaps in the lines. Is it possible to connect the dots of one series even if there are missing values in between?
Thank you for your help!
As Lars showed, getting the graph to look the way you want is just a matter of removing the missing values from your data arrays.
However, you wouldn't normally want to do that by hand, deleting all the rows with missing values. You need to use an array filter function to remove the missing values from your data arrays.
Once you have the complete data array as an array of series objects, each with an array of values, this code should work:
//to remove the missing values, so that the graph
//will just connect the valid points,
//filter each data array:
data.forEach(function(series) {
series.values = series.values.filter(
function(d){return d.y||(d.y === 0);}
);
//the filter function returns true if the
//data has a valid y value
//(either a "true" value or the number zero,
// but not null or NaN)
});
Updated fiddle here: http://jsfiddle.net/xammamax/8Kk8v/
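To make the filter's behaviour concrete, here is a small sketch on a made-up series (the key and values are hypothetical). It shows that zero is kept while null and NaN are dropped.

```javascript
// Keeps points with a usable y: any truthy number, or exactly zero
function hasValidY(d) {
  return d.y || (d.y === 0);
}

// Hypothetical series with two missing values and one genuine zero
var series = {
  key: 'example',
  values: [
    {x: 1990, y: 8.1},
    {x: 1991, y: null},
    {x: 1992, y: 0},
    {x: 1993, y: NaN},
    {x: 1994, y: 9.3}
  ]
};

series.values = series.values.filter(hasValidY);
var keptYears = series.values.map(function(d) { return d.x; });
console.log(keptYears); // [ 1990, 1992, 1994 ]
```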
Of course, when you are constructing the data array from a csv where each series is a separate column, you can do the filtering at the same time as you create the array:
var chartdata = [];//initialize as empty array
d3.csv("top_1_L-shaped.csv", function(error, csv) {
if (error)
return console.log("there was an error loading the csv: " + error);
var columndata = ["Germany", "Switzerland", "Portugal",
"Japan", "Italy", "Spain", "France",
"Finland", "Sweden", "Denmark", "Netherlands"];
for (var i = 0; i < columndata.length; i++) {
//create the series object first; chartdata starts out empty
chartdata[i] = {key: columndata[i]};
chartdata[i].values = csv.map(function(d) {
return [+d["year"], +d[ columndata[i] ] ];
})
.filter(function(d){
return d[1]||(d[1] === 0);
});
//the filter is applied to the mapped array,
//and the results are assigned to the values array.
}
});
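One thing to watch when converting CSV cells: unary plus turns an empty string into 0, which this filter keeps. The sketch below assumes missing values are empty cells and maps them to null before filtering; csvRows, columns and toNumberOrNull are hypothetical names, and the numbers are made up.

```javascript
// Rows shaped like parsed CSV output: every cell is a string
var csvRows = [
  {year: '1990', Germany: '10.5', Japan: ''},
  {year: '1991', Germany: '', Japan: '8.2'},
  {year: '1992', Germany: '11.0', Japan: '8.4'}
];
var columns = ['Germany', 'Japan'];

// Assumption: a missing value is an empty (or absent) cell.
// +'' evaluates to 0, so convert empty cells to null explicitly.
function toNumberOrNull(cell) {
  return cell === undefined || cell.trim() === '' ? null : +cell;
}

// One series object per column, with missing points filtered out
var chartdata = columns.map(function(name) {
  return {
    key: name,
    values: csvRows
      .map(function(d) { return [+d.year, toNumberOrNull(d[name])]; })
      .filter(function(d) { return d[1] || (d[1] === 0); })
  };
});

console.log(JSON.stringify(chartdata));
```

If the real file marks missing values some other way (for example with a text placeholder), the conversion step would need to match that instead.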
