display the number of distinct items with data count widget - dc.js

I have a list of items, some of them have several rows for the same item (different variants of the same item).
I want to count how many items exist in total and how many are selected. However, the data count widget counts rows by default, and I can't figure out how to change that behaviour.
var data=[{"item":"Category A","color":"blue"},
{"item":"Category B","color":"blue"},
{"item":"Category A","color":"pink"}];
var ndx = crossfilter(data);
var dimension = ndx.dimension(function(d) {return d.item;});
var all = ndx.groupAll();
dc.dataCount('.dc-data-count')
.dimension(ndx)
.group(all)
.html ({some: "%filter-count selected out of %total-count items <a class='btn btn-default btn-xs' href='javascript:dc.filterAll(); dc.renderAll();' role='button'>Reset filters</a>", all: "%total-count items. Click on the graphs to filter them"})
According to the dc.js documentation, the only dimension possible is the entire data set and the only group is the one returned by dimension.groupAll().
Is there a workaround to count something other than the number of records (e.g. the distinct keys of ndx.dimension(function(d) {return d.item;}))?

Well, it's pretty ugly but you could easily hack it to produce other behavior.
If you look at the code, it's only calling .size() on the dimension, and only .value() on the group. That's why the documentation says that only a groupAll is appropriate; regular groups don't have .value().
https://github.com/dc-js/dc.js/blob/develop/src/data-count.js#L93-94
But it calls no other methods on the dimension or group, so you can fake this pretty easily:
chart.dimension({size: function() {
return 42; // calculate something here
} });
It might be somewhat more appropriate to use two number display widgets instead, but this should work.
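As a rough sketch of what "calculate something here" could look like for the question's data: an ordinary crossfilter group on the item dimension reports its number of distinct keys through .size(), and counting the bins whose value is above zero gives the number of items still selected. The helper name distinctCounters is hypothetical, and the sketch assumes the widget really does call nothing but .size() and .value(), as described above.

```javascript
// Hypothetical helper: builds stand-ins for the data count widget's
// "dimension" and "group" from an ordinary crossfilter group.
// Assumes the widget only calls .size() on one and .value() on the other.
function distinctCounters(group) {
  return {
    // total number of distinct keys, independent of filters
    dimension: { size: function () { return group.size(); } },
    // number of keys that still have at least one record after filtering
    group: {
      value: function () {
        return group.all().filter(function (kv) { return kv.value > 0; }).length;
      }
    }
  };
}

// Usage with the question's variables (not run here):
// var counters = distinctCounters(dimension.group());
// dc.dataCount('.dc-data-count')
//   .dimension(counters.dimension)
//   .group(counters.group);
```

Note that a crossfilter group does not observe filters applied to its own dimension, so the selected count only changes when charts filter on other dimensions.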

Related

Using DC.js datatable to create a table with aggregated data by year

I've been trying to use dc.js and crossfilter to both build charts and tables from a certain dataset.
So far building charts works fine, but I want to use the datatable functionality to build a small html table to summarize the data as follows:
|Year|TotalEmployees|
|2015|555|
|2016|666|
|2017|777|
My dataset has around 20 000 rows, here's a sample of the data:
var data = [
{"Year":"2015","Category":"1","NbEmployee":"51"},
{"Year":"2015","Category":"2","NbEmployee":"31"},
{"Year":"2015","Category":"3","NbEmployee":"14"},
{"Year":"2016","Category":"1","NbEmployee":"51"},
{"Year":"2016","Category":"2","NbEmployee":"55"},
{"Year":"2016","Category":"3","NbEmployee":"65"},
{"Year":"2017","Category":"1","NbEmployee":"76"},
{"Year":"2017","Category":"2","NbEmployee":"98"},
];
So far this piece of code returns one row of result per row of data, and although it feels like it should be a simple manipulation, I can't figure out the right syntax to build a summarized table with one row per year:
var ndx = crossfilter(data);
var tableDim = ndx.dimension(function(d) {
return d.Year;
});
var datatable = dc.dataTable("#dc-data-table");
datatable
.dimension(tableDim)
.group(function(d) {
d.NbEmployee += d.NbEmployee;
return d.Year;
})
.columns([
function(d) {return d.Year;},
function(d) {return d.NbEmployee;},
]);
I've tried countless times to apply .group().reduceSum() to the dimension, store the result in a variable, and pass that to .group(), but I always end up with a compilation error. I'm pretty clueless right now.
The SQL translation of what I'm looking for is this:
SELECT
Year,
NbEmp = SUM(NbEmployee)
FROM DB
GROUP BY
Year
ORDER BY
Year
Thanks in advance for your help!
The dataTable's group is not a group - yes, pretty confusing to use this method to mean something completely different from what it means in all the other charts. Here, it's a function, everywhere else it's a crossfilter object.
The dataTable is unique out of the dc.js charts in that it reads its data from the .dimension() object. This is because it displays the raw rows of data, rather than aggregated data, by default.
However, it can be used to display a group instead. This works because the only method it actually calls on the dimension is .top(), if you choose to display in descending order.
If you want to display in ascending order, you can use a fake group to produce an object which supports the .bottom() method.
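A minimal sketch of such a fake group, assuming the table only ever calls .top() or .bottom() on whatever it is given as a dimension. The wrapper name reversibleGroup is hypothetical, and it relies on group.top(Infinity) returning every bin sorted by descending value.

```javascript
// Hypothetical wrapper: makes a crossfilter group look like a dimension
// to the data table by exposing top() and bottom().
// Assumes group.top(Infinity) returns all bins sorted by descending value.
function reversibleGroup(group) {
  return {
    top: function (n) { return group.top(n); },
    bottom: function (n) {
      var all = group.top(Infinity);
      // take the last n bins (the smallest values) and flip them to ascending order
      return all.slice(Math.max(all.length - n, 0)).reverse();
    }
  };
}

// Usage with the question's data (not run here):
// var yearGroup = tableDim.group().reduceSum(function (d) { return +d.NbEmployee; });
// datatable
//   .dimension(reversibleGroup(yearGroup))
//   .group(function (d) { return ''; })        // one unnamed section
//   .sortBy(function (d) { return d.key; })
//   .order(d3.ascending)
//   .columns([
//     function (d) { return d.key; },          // Year
//     function (d) { return d.value; }         // summed NbEmployee
//   ]);
```

The rows handed to the column functions are then the group's key/value pairs rather than raw records, which is why the columns read d.key and d.value.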

click in datatable to filter other charts (dc.js)

I need to filter other charts when I click a row in the datatable.
I did
my_table.on('pretransition', function (table) {
table.selectAll('td.dc-table-column')
.on('click',function(d){
table.filter(d.key)
dc.redrawAll();
})
});
but nothing happens in the other charts.
Can you help me, please?
If the table dimension is a dimension...
The data that ordinarily populates a data table is the raw rows from the original data set, not key/value pairs.
So it is likely that d.key is undefined.
I'd advise you first to stick console.log(d) into your click handler to see what your data looks like, to make sure d.key is valid.
Second, remember that a chart filters through its dimension. So you will need to pass a value to table.filter() that is a valid key for your dimension, and then it will filter out all rows for which the key is different. This may not be just the one row that you chose.
Typically a table dimension is chosen for the way it orders the values for the rows. You might actually want to filter some other dimension. But hopefully this is enough to get you started.
But what if the table dimension is a group?
The above technique will only work if your table takes a crossfilter dimension as its dimension. If, as in the fiddle you linked in the comments, you're using a group as your dimension, that object has no .filter() method, so the table.filter() method won't do anything.
If you only need to filter the one item that was clicked, you could just do
foodim.filter(d.key)
This has an effect but it's not that useful.
If you need the toggle functionality used in dc's ordinal charts, you'll need to simulate it. It's not all that complicated:
// global
var filterKeys = [];
// inside click event
if(filterKeys.indexOf(d.key)===-1)
filterKeys.push(d.key);
else
filterKeys = filterKeys.filter(k => k != d.key);
if(filterKeys.length === 0)
foodim.filter(null);
else
foodim.filterFunction(function(d) {
return filterKeys.indexOf(d) !== -1;
})
Example fiddle: https://jsfiddle.net/gordonwoodhull/kfmfkLj0/9/
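The toggle logic can also be packaged without a global variable. The following is only a sketch under that assumption: makeKeyToggler is a hypothetical helper that remembers the selected keys and returns either null (no filter) or a predicate suitable for dimension.filterFunction().

```javascript
// Hypothetical helper: keeps the list of selected keys and produces
// the predicate to hand to dimension.filterFunction().
function makeKeyToggler() {
  var selected = [];
  return {
    // add the key if absent, remove it if present; returns a copy of the selection
    toggle: function (key) {
      var i = selected.indexOf(key);
      if (i === -1) selected.push(key);
      else selected.splice(i, 1);
      return selected.slice();
    },
    // null means "no filter"; otherwise a predicate over dimension keys
    predicate: function () {
      if (selected.length === 0) return null;
      var keys = selected.slice();
      return function (k) { return keys.indexOf(k) !== -1; };
    }
  };
}

// Usage inside the click handler (not run here; foodim is the dimension used above):
// var toggler = makeKeyToggler();          // created once, outside the handler
// toggler.toggle(d.key);
// var p = toggler.predicate();
// if (p) foodim.filterFunction(p); else foodim.filter(null);
// dc.redrawAll();
```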

Get only non-filtered data from dc.js chart (dimension / group)

So this is a question regarding a rather specific problem. As I know from Gordon, main contributor of dc.js, there is no support for elasticY(true) function for logarithmic scales.
So, after knowing this, I tried to implement my own solution, by building a workaround, inside dc.js's renderlet event. This event is always triggered by a click of the user onto the barchart. What I wanted to do is this:
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let maximum = group.top(1)[0].value;
let minimum = group.top(groupSize)[groupSize-1].value;
console.log(minimum, maximum);
chart.y(d3.scale.log().domain([minimum, maximum])
.range(this.height, 0)
.nice()
.clamp(true));
I thought, that at this point the "fakeGroup" (which is just group.top(50)) contains only the data points that are NOT filtered out after the user clicked somewhere. However, this group always contains all data points that are in the top 50 and doesn't change on filter events.
What I really wanted is get all data points that are NOT filtered out, to get a new maximum and minimum for the yScale and rescale the yAxis accordingly by calling chart.y(...) again.
Is there any way to get only the data rows that are still in the chart and not filtered out? I also tried using remove_empty_bins(group) but didn't have any luck with that: either all() or top() always turns out to be missing, even after giving remove_empty_bins both functions.
This is how i solved it:
I made a function called rescale(), which looks like this:
rescale(chart, group, fakeGroup) {
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let minTop = group.top(groupSize)[groupSize-1].value;
// a log scale cannot include zero, so fall back to a small positive minimum
let minimum = minTop > 0 ? minTop : 0.0001;
let maximum = group.top(1)[0].value;
chart.y(d3.scale.log().domain([minimum, maximum])
.range([this.height, 0])
.nice()
.clamp(true));
}
I think the parameters are pretty self-explanatory, I just get my chart, the whole group as set by dimension.group.reduceSum and a fake group I created, which contains the top 50 elements, to reduce bar count of my chart.
The rescale() method is called in the event listener
chart.on('preRedraw', (chart) => {
this.rescale(chart, group, fakeGroup);
});
So what I do is redefine the chart's y axis (resetting the min and max values for the filtered data) every time the chart gets redrawn, which also happens every time one of my other charts is filtered. Now the scale always fits the data that remains in the chart after another chart is filtered.
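The min/max lookup could also be done by scanning the group's bins directly. This is a sketch, not the code used above: logDomain is a hypothetical helper, it assumes bins that are filtered out show up with a value of 0 (as they do for count and sum reductions of positive numbers), and the [floor, 1] fallback for an empty selection is an arbitrary choice.

```javascript
// Hypothetical helper: derives a log-scale domain from key/value bins,
// ignoring bins whose value is zero or negative (a log scale cannot show them).
function logDomain(bins, floor) {
  floor = floor || 0.0001; // same small positive minimum as in rescale()
  var positive = bins
    .map(function (kv) { return kv.value; })
    .filter(function (v) { return v > 0; });
  // assumption: with nothing left to show, fall back to an arbitrary [floor, 1]
  if (positive.length === 0) return [floor, 1];
  return [Math.min.apply(null, positive), Math.max.apply(null, positive)];
}

// Usage inside the preRedraw listener (not run here):
// chart.y(d3.scale.log().domain(logDomain(group.all())).nice().clamp(true));
```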

dataCount graph filtered by a dimension

I have a list of participants to various events as the data source
eventid,participant_name
42,xavier
42,gordon
11,john
...
By default, dataCount will say there are 3 participants, but I need to display the number of events (so 2).
I tried creating a dimension
var event = ndx.dimension(function(d) {
return d.eventid;
})
but can't manage to use it in dataCount
dc.dataCount(".dc-data-count")
//.dimension(ndx) //working, but counts participants
.dimension(event) // not working
How do I do that?
It sounds to me like you are trying to use the data count widget to count group bins rather than rows.
The data count widget is only designed to count records, not keys or groups or anything else. But you could fake out the objects, since the widget is calling just .size() on the dimension, and just .value() on the group.
But what to put there? The value is actually sort of easy, since it's the count of groups with non-zero value:
var eventGroup = event.group();
widget.group({value: function() {
return eventGroup.all().filter(function(kv) { return kv.value>0; }).length;
}});
But what is the size? Well, according to the crossfilter documentation, group.size actually returns what we want, "the number of distinct values in the group, independent of any filters; the cardinality."
So oddly, it seems like
widget.dimension(eventGroup)
should work. Of course, I haven't tested any of this, so please comment if this doesn't work!
(Sigh, what I wouldn't do for a real data model in dc.js. It is rather confusing that the "dimension" for this widget is actually the crossfilter object. Another place where there is kind of a weird economy of methods, like the dimension's group, which is just a function.)

How to show "missing" rows in a rowChart using crossfilter and dc.js?

I'm using code similar to that in the dc.js annotated example:
var ndx = crossfilter(data);
...
var dayName=["0.Sun","1.Mon","2.Tue","3.Wed","4.Thu","5.Fri","6.Sat"];
var dayOfWeek = ndx.dimension(function (d) {
var day = d.dd.getDay();
return dayName[day];
});
var dayOfWeekGroup = dayOfWeek.group();
var dayOfWeekChart = dc.rowChart("#day-of-week-chart");
dayOfWeekChart.width(180)
.height(180)
.group(dayOfWeekGroup)
.label(function(d){return d.key.substr(2);})
.dimension(dayOfWeek);
The issue I've got is that only days of the week present in the data are displayed in my rowChart, and there's no guarantee every day will be represented in all of my data sets.
This is desirable behaviour for many types of categories, but it's a bit disconcerting to omit them for short and well-known lists like day and month names and I'd rather an empty row was included instead.
For a barChart, I can use .xUnits(dc.units.ordinal) and something like .x(d3.scale.ordinal().domain(dayName)).
Is there some way to do the same thing for a rowChart so that all days of the week are displayed, whether present in data or not?
From my understanding of the crossfilter library, I need to do this at the chart level, and the dimension is OK as is. I've been digging around in the dc.js 1.6.0 api reference, and the d3 scales documentation but haven't had any luck finding what I'm looking for.
Solution
Based on @Gordon's answer, I've added the following function:
function ordinal_groups(keys, group) {
return {
all: function () {
var values = {};
group.all().forEach(function(d, i) {
values[d.key] = d.value;
});
var g = [];
keys.forEach(function(key) {
g.push({key: key,
value: values[key] || 0});
});
return g;
}
};
}
Calling this as follows will fill in any missing rows with 0s:
.group(ordinal_groups(dayName, dayOfWeekGroup))
Actually, I think you are better off making sure that the groups exist before passing them off to dc.js.
One way to do this is the "fake group" pattern described here:
https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted
This way you can make sure the extra entries are created every time the data changes.
Are you saying that you tried adding the extra entries to the ordinal domain and they still weren't represented in the row chart, whereas this did work for bar charts? That sounds like a bug to me. Specifically, it looks like support for ordinal domains needs to be added to the row chart.

Resources