dataCount graph filtered by a dimension - dc.js

I have a list of participants in various events as the data source:
eventid,participant_name
42,xavier
42,gordon
11,john
...
By default, dataCount says there are 3 participants, but I need to display the number of events (2 here).
I tried creating a dimension
var event = ndx.dimension(function(d) {
return d.eventid;
})
but I can't manage to use it in dataCount:
dc.dataCount(".dc-data-count")
//.dimension(ndx) //working, but counts participants
.dimension(event) // not working
How do I do that?

It sounds to me like you are trying to use the data count widget to count group bins rather than rows.
The data count widget is only designed to count records, not keys or groups or anything else. But you could fake out the objects, since the widget is calling just .size() on the dimension, and just .value() on the group.
But what to put there? The value is actually sort of easy, since it's the count of groups with non-zero value:
var eventGroup = event.group();
widget.group({value: function() {
return eventGroup.all().filter(function(kv) { return kv.value>0; }).length;
}});
But what is the size? Well, according to the crossfilter documentation, group.size actually returns what we want, "the number of distinct values in the group, independent of any filters; the cardinality."
So oddly, it seems like
widget.dimension(eventGroup)
should work. Of course, I haven't tested any of this, so please comment if this doesn't work!
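Since the answer is untested, here is a minimal sketch of the two stand-in objects, assuming only what the answer states: dataCount calls `.size()` on its dimension and `.value()` on its group. To keep it runnable without crossfilter, `makeCountGroup` is a hypothetical stand-in for `event.group()` (its `setFilter` method simulates filtering from another chart), and `distinctKeyCount` is a hypothetical helper name.

```javascript
// Hypothetical stand-in for event.group(): bins rows by key and counts only
// the rows accepted by the current filter predicate.
function makeCountGroup(rows, keyOf) {
  let accept = () => true; // no filter by default
  return {
    setFilter(pred) { accept = pred; },
    all() {
      const counts = new Map();
      for (const row of rows) {
        const k = keyOf(row);
        if (!counts.has(k)) counts.set(k, 0); // every key keeps a bin, even when filtered out
        if (accept(row)) counts.set(k, counts.get(k) + 1);
      }
      return [...counts].map(([key, value]) => ({ key, value }));
    },
    // Like crossfilter's group.size(): distinct keys, independent of filters.
    size() { return new Set(rows.map(keyOf)).size; }
  };
}

// The two objects handed to dataCount: only size() and value() are needed.
function distinctKeyCount(group) {
  return {
    dimension: { size: () => group.size() },
    group: { value: () => group.all().filter(kv => kv.value > 0).length }
  };
}
```

With a real chart, the two properties would be passed as `.dimension(counts.dimension)` and `.group(counts.group)`. For the sample data, the total is 2 events, and filtering down to event 42 drops the selected count to 1.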
(Sigh, what I wouldn't do for a real data model in dc.js. It is rather confusing that the "dimension" for this widget is actually the crossfilter object. Another place where there is kind of a weird economy of methods, like the dimension's group, which is just a function.)

Related

dc.js - avoid data points animation when adding data to scatter plot

I'm trying to implement a live data visualization (i.e. with new data arriving periodically) using dc.js. The problem I'm having is the following - when new data is added to the plot, already existing points often start to "dance around", even though they were not changed. Can this be avoided?
The following fiddle illustrates this.
My guess is that crossfilter sorts data internally, which results in points moving on the chart for data items that changed their position (index) in the internal storage. Data is added in the following way:
var data = [];
var ndx = crossfilter(data)
setInterval(function() {
var value = ndx.size() + 1;
if (value > 50) {
return;
}
var newElement = {
x: myRandom(),
y: myRandom()
};
ndx.add([newElement]);
dc.redrawAll();
}, 1000);
Any ideas?
I stand by my comments above. dc.js should be fixed to bind the data using a key function. Until then, probably the best way to deal with the problem is simply to disable transitions on the scatter plot using .transitionDuration(0).
However, I was curious if it was possible to work around the current problems by keeping the group in a set order using a fake group. And it is indeed, at least for this example where there is no aggregation and we just want to display the original data points.
First, we add a third field, index, to the data. It must record the order in which the data arrives. As noted in the discussion above, the scatter plot currently binds data by its position in the array, so we need to keep the points in a fixed order: new points may only be appended, never inserted.
var newElement = {
index: value,
x: myRandom(),
y: myRandom()
};
Next, we have to preserve this index through the binning and aggregation. We could keep it either in the key or in the value, but keeping it in the key seems more fitting:
var xyiDimension = ndx.dimension(function(d) {
return [+d.x, +d.y, d.index];
}),
xyiGroup = xyiDimension.group();
The original reduction didn't make sense to me, so I dropped it. We'll just use the default behavior, which counts the number of rows which fall into each bin. The counts should be 1 if included, or 0 if filtered out. Including the index in the key also ensures uniqueness, which the original keys were not guaranteed to have.
Now we can create a fake group that keeps everything sorted by index:
var xyiGroupSorted = {
all: function() {
var ret = xyiGroup.all().slice().sort((a,b) => a.key[2] - b.key[2]);
return ret;
}
}
This will fetch the original data whenever it's requested by the chart, create a copy of the array (because the original is owned by crossfilter), and sort it to return it to the correct order.
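A small worked example of why the sort matters, under stated assumptions: it simulates binding by position (the i-th element gets the i-th datum) and uses x-order as a stand-in for a key-ordered group; crossfilter's actual ordering of array keys may differ. `countRebound`, `byX` and `byIndex` are hypothetical helpers.

```javascript
// Simulates binding by position (no key function): the i-th SVG element
// receives the i-th datum. Counts how many existing elements would be handed
// a different datum after an update, i.e. how many points "dance".
function countRebound(before, after) {
  let moved = 0;
  for (let i = 0; i < before.length; i++) {
    if (before[i].key[2] !== after[i].key[2]) moved++; // compare by index field
  }
  return moved;
}

// Bins shaped like xyiGroup.all(), with key = [x, y, index].
// byX stands in for a key-ordered group; byIndex mimics xyiGroupSorted.
const byX = (a, b) => a.key[0] - b.key[0];
const byIndex = (a, b) => a.key[2] - b.key[2];
```

With points at x = 5 and x = 9 already drawn, adding a point at x = 1 rebinds both existing elements under x-order, but none under index order, because the new point is appended at the end.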
And voila, we have a scatter plot that behaves the way it should, even though the data has gone through crossfilter.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/mj81m42v/13/
[After all this, maybe we shouldn't have given the data to crossfilter in the first place! We could have just created a fake group which exposes the original data. But maybe there's some use to this technique. At least it proves that there's almost always a way to work around any problems in dc.js & crossfilter.]
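The bracketed alternative could look roughly like this: a sketch of a fake group built straight from the raw array, assuming points carry the `index`, `x` and `y` fields used above. `rawPointsGroup` is a hypothetical name, and because it bypasses crossfilter, filters on other charts would not affect it.

```javascript
// Hypothetical fake group that skips crossfilter entirely: it exposes the raw
// points in arrival order, each with value 1 so every point is drawn.
// It reads `points` on every call, so newly pushed points show up on redraw.
function rawPointsGroup(points) {
  return {
    all() {
      return points.map(p => ({ key: [p.x, p.y, p.index], value: 1 }));
    }
  };
}
```

Since the array is only ever appended to, positional binding stays stable without any sorting.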

click in datatable to filter other charts (dc.js)

I need to filter other charts when I click a row in the datatable.
I did
my_table.on('pretransition', function (table) {
table.selectAll('td.dc-table-column')
.on('click',function(d){
table.filter(d.key)
dc.redrawAll();
})
});
but nothing happens in the other charts.
Can you help me, please?
If the table dimension is a dimension...
The data that ordinarily populates a data table is the raw rows from the original data set, not key/value pairs.
So it is likely that d.key is undefined.
I'd advise you first to stick
console.log(d)
into your click handler to see what your data looks like, to make sure d.key is valid.
Second, remember that a chart filters through its dimension. So you will need to pass a value to table.filter() that is a valid key for your dimension, and then it will filter out all rows for which the key is different. This may not be just the one row that you chose.
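A sketch of that point, assuming the table's rows are keyed by a `category` field (the question does not show its dimension, so `keyOf` is a hypothetical accessor standing in for whatever was passed to `ndx.dimension()`). Because `d` is a raw row, the value for `table.filter()` can be computed with that same accessor, and an exact-match filter on that key keeps every row sharing it.

```javascript
// Hypothetical accessor: the same function used to build the table's dimension.
const keyOf = d => d.category;

// Mimics an exact-match filter on the dimension: every row whose key equals
// the clicked row's key survives, not only the row that was clicked.
function rowsKeptByFilter(rows, clickedRow) {
  const key = keyOf(clickedRow); // the value you would pass to table.filter()
  return rows.filter(r => keyOf(r) === key);
}
```

In the click handler, that would mean calling table.filter(keyOf(d)) instead of table.filter(d.key), followed by dc.redrawAll().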
Typically a table dimension is chosen for the way it orders the values for the rows. You might actually want to filter some other dimension. But hopefully this is enough to get you started.
But what if the table dimension is a group?
The above technique will only work if your table takes a crossfilter dimension as its dimension. If, as in the fiddle you linked in the comments, you're using a group as your dimension, that object has no .filter() method, so the table.filter() method won't do anything.
If you only need to filter the one item that was clicked, you could just do
foodim.filter(d.key)
This has an effect but it's not that useful.
If you need the toggle functionality used in dc's ordinal charts, you'll need to simulate it. It's not all that complicated:
// global
var filterKeys = [];
// inside click event
if(filterKeys.indexOf(d.key)===-1)
filterKeys.push(d.key);
else
filterKeys = filterKeys.filter(k => k != d.key);
if(filterKeys.length === 0)
foodim.filter(null);
else
foodim.filterFunction(function(k) {
return filterKeys.indexOf(k) !== -1;
});
dc.redrawAll(); // redraw so the other charts pick up the new filter
Example fiddle: https://jsfiddle.net/gordonwoodhull/kfmfkLj0/9/
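The toggle logic can also be pulled out into two small pure functions, which makes it easy to check outside the browser. This is a sketch; `toggleKey` and `filterFor` are hypothetical names, not part of dc.js or crossfilter.

```javascript
// Returns a new key list with `key` added if absent, or removed if present,
// mirroring the click-to-toggle behavior of dc's ordinal charts.
function toggleKey(keys, key) {
  return keys.indexOf(key) === -1 ? keys.concat([key]) : keys.filter(k => k !== key);
}

// Returns null when nothing is selected (meaning: clear the filter), otherwise
// a predicate that accepts any selected key, suitable for filterFunction().
function filterFor(keys) {
  if (keys.length === 0) return null;
  return k => keys.indexOf(k) !== -1;
}
```

In the click handler, filterKeys = toggleKey(filterKeys, d.key) replaces the if/else, and the result of filterFor(filterKeys) goes to foodim.filterFunction(), or foodim.filter(null) is called when it is null.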

Get only non-filtered data from dc.js chart (dimension / group)

So this is a question regarding a rather specific problem. As I know from Gordon, the main contributor to dc.js, there is no support for elasticY(true) with logarithmic scales.
So, knowing this, I tried to implement my own workaround inside dc.js's renderlet event. This event is always triggered when the user clicks on the bar chart. What I wanted to do is this:
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let maximum = group.top(1)[0].value;
let minimum = group.top(groupSize)[groupSize-1].value;
console.log(minimum, maximum);
chart.y(d3.scale.log().domain([minimum, maximum])
.range([this.height, 0])
.nice()
.clamp(true));
I thought that at this point the "fakeGroup" (which is just group.top(50)) would contain only the data points that are NOT filtered out after the user clicked somewhere. However, this group always contains all the data points in the top 50 and doesn't change on filter events.
What I really want is to get all data points that are NOT filtered out, so that I can compute a new maximum and minimum for the y scale and rescale the y axis accordingly by calling chart.y(...) again.
Is there any way to get only the data rows that are still in the chart and not filtered out? I also tried using remove_empty_bins(group) but didn't have any luck with that. Either all() or top() is always missing somewhere, even after giving remove_empty_bins both functions.
This is how I solved it:
I made a function called rescale(), which looks like this:
rescale(chart, group, fakeGroup) {
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let minTop = group.top(groupSize)[groupSize-1].value;
let minimum = minTop > 0 ? minTop : 0.0001; // a log scale cannot include 0
let maximum = group.top(1)[0].value;
chart.y(d3.scale.log().domain([minimum, maximum])
.range([this.height, 0])
.nice()
.clamp(true));
}
I think the parameters are pretty self-explanatory: I pass in my chart, the whole group (as created by dimension.group().reduceSum()), and a fake group I created that contains the top 50 elements, to reduce the number of bars in my chart.
The rescale() method is called in the event listener:
chart.on('preRedraw', (chart) => {
this.rescale(chart, group, fakeGroup);
});
So what I do is redefine the chart's y axis (resetting the min and max values based on the filtered data) every time the chart is redrawn, which also happens every time one of my charts is filtered. Now the scale always fits the data the chart contains after another chart is filtered.
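For the remove_empty_bins part of the question, here is a sketch of a fake group in the style of the dc.js FAQ's remove_empty_bins, extended (as an assumption, not the FAQ's exact code) to provide both all() and top(n), plus a helper that derives a log domain from the bins that survive filtering. `removeEmptyBins` and `logDomain` are hypothetical names; the 0.0001 fallback is the one the answer uses.

```javascript
// Fake group that drops zero-valued bins from both all() and top(n), so the
// chart only sees bins that still have data after filtering.
function removeEmptyBins(sourceGroup) {
  const nonEmpty = () => sourceGroup.all().filter(d => d.value !== 0);
  return {
    all: nonEmpty,
    top(n) {
      // filter() already returned a fresh array, so sorting in place is safe.
      return nonEmpty().sort((a, b) => b.value - a.value).slice(0, n);
    }
  };
}

// Log-scale domain from the currently displayed bins. Non-positive values
// cannot sit on a log scale, so an empty result falls back to `floor`.
function logDomain(bins, floor = 0.0001) {
  const positive = bins.map(d => d.value).filter(v => v > 0);
  if (positive.length === 0) return [floor, floor];
  return [Math.min(...positive), Math.max(...positive)];
}
```

The pair returned by logDomain(fakeGroup.all()) could then replace the [minimum, maximum] passed to .domain() in rescale().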

Using multiple datasets in dc.js

I have more than 5 normalized tables with one common dimension (primary key).
I don't want to combine them into a single dataset to plot graphs.
I have created individual crossfilter objects to load the data.
When any chart belonging to one of the crossfilter objects is filtered,
I retrieve the filtered primary keys in the following way:
rowchart.on("filtered",function(){
var filter=dimension.group().all().filter(function(d){return d.value>0}).map(function(d){return d.key});
});
Then this filter is applied to the common dimension of each of the other crossfilter objects.
This implementation works fine for any two objects.
But when a chart belonging to another crossfilter object is filtered, it resets all the dimensions of all the objects.
Is there a better way to implement this use case?
One way to do this, if you're able to look up rows by the primary key, is to tell crossfilter that your data is just a set of keys, and then define your dimension and group functions to actually do the table lookup.
E.g. for the simple example where you have arrays A and B of data, of the same length, and the primary key is the index, do
var ndx = crossfilter(d3.range(0, A.length));
var dateDim = ndx.dimension(function(i) { return A[i].date; });
var nameDim = ndx.dimension(function(i) { return B[i].name; });
Similarly, for the group reductions, refer to the data in the same way, since the reduce functions take a "row" from the main crossfilter. Say you're reducing on sum of salaries in B:
var salaryGroup = nameDim.group().reduceSum(function(i) { return B[i].salary; });
I do this in a situation where my data is column-major instead of row-major (R dataframes), and it works great.
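When the tables are not aligned by array index, the same idea works with a lookup per table. A sketch, assuming each table is an array of objects sharing a primary-key field (`id` in the usage comment is an assumption); `makeKeyedTables` is a hypothetical helper. The fallback argument exists because crossfilter dimensions are not designed for undefined values, so missing rows are given an explicit default.

```javascript
function makeKeyedTables(tables, keyField) {
  // One Map per table: primary key -> row.
  const indexes = {};
  for (const [name, rows] of Object.entries(tables)) {
    indexes[name] = new Map(rows.map(r => [r[keyField], r]));
  }
  // Union of primary keys across all tables; this is what crossfilter would hold.
  const keys = [...new Set(Object.values(indexes).flatMap(m => [...m.keys()]))];
  // Accessor factory for ndx.dimension(...) or group().reduceSum(...).
  // Keys missing from a table yield `fallback`.
  const field = (table, column, fallback) => key => {
    const row = indexes[table].get(key);
    return row === undefined ? fallback : row[column];
  };
  return { keys, field };
}

// Hypothetical usage with crossfilter loaded:
// var t = makeKeyedTables({A: A, B: B}, 'id');
// var ndx = crossfilter(t.keys);
// var nameDim = ndx.dimension(t.field('B', 'name', 'unknown'));
// var salaryGroup = nameDim.group().reduceSum(t.field('B', 'salary', 0));
```

Because every chart then filters the same crossfilter of keys, filters from all tables combine instead of resetting each other.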

display the number of distinct items with data count widget

I have a list of items, some of which have several rows for the same item (different variants of the same item).
I want to count how many items exist in total, and how many are selected. However, the data count widget only counts the number of rows by default, and I can't figure out how to change that behaviour.
var data=[{"item":"Category A","color":"blue"},
{"item":"Category B","color":"blue"},
{"item":"Category A","color":"pink"}];
var ndx = crossfilter(data);
var dimension = ndx.dimension(function(d) {return d.item;});
var all = ndx.groupAll();
dc.dataCount('.dc-data-count')
.dimension(ndx)
.group(all)
.html ({some: "%filter-count selected out of %total-count items <a class='btn btn-default btn-xs' href='javascript:dc.filterAll(); dc.renderAll();' role='button'>Reset filters</a>", all: "%total-count items. Click on the graphs to filter them"})
According to the dc.js docs, the only dimension possible is the entire data set, and the only group is the one returned by groupAll().
Is there a workaround to count something other than the number of records (e.g. ndx.dimension(function(d) {return d.item;}))?
Well, it's pretty ugly but you could easily hack it to produce other behavior.
If you look at the code, it's only calling .size() on the dimension, and only .value() on the group. That's why the documents say that only a groupAll is appropriate; regular groups don't have .value().
https://github.com/dc-js/dc.js/blob/develop/src/data-count.js#L93-94
But it calls no other methods on the dimension or group, so you can fake this pretty easily:
chart.dimension({size: function() {
return 42; // calculate something here
} });
It might be somewhat more appropriate to use two number display widgets instead, but this should work.
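One way to fill in the "calculate something here" placeholder for this question, as a sketch: the total is the number of distinct item values in the raw data, and the selected count is the number of non-empty bins in a group on the item dimension. `distinctItemCount` is a hypothetical helper, the group it receives would be something like dimension.group() (which the question does not create), and it assumes the data array is not changed after crossfilter is built.

```javascript
// Builds the two stand-in objects dataCount needs. `data` is the raw array
// given to crossfilter; `group` is anything whose all() returns {key, value}
// pairs, e.g. a crossfilter group on the item dimension.
function distinctItemCount(data, keyOf, group) {
  return {
    // Total distinct items, ignoring filters (shown as %total-count).
    dimension: { size: () => new Set(data.map(keyOf)).size },
    // Items with at least one row passing the filters (shown as %filter-count).
    group: { value: () => group.all().filter(kv => kv.value > 0).length }
  };
}
```

The two properties would then be passed to .dimension() and .group() in place of ndx and all.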
