Using multiple datasets in dc.js - dc.js

I have more than 5 normalized tables with one common dimension (primary key).
I don't want to combine them into single dataset and plot graphs.
I have created individual crossfilter objects to load data.
When a chart belonging to one crossfilter object is filtered,
I retrieve the matching primary keys in the following way:
rowchart.on("filtered", function() {
var filter = dimension.group().all().filter(function(d) { return d.value > 0; }).map(function(d) { return d.key; });
});
This list of keys is then applied as a filter on the common dimension of every other crossfilter object.
This implementation works fine for any two objects.
But when a chart belonging to another crossfilter object is filtered, it resets all the dimensions of all objects.
Is there any better way to implement this use case?

One way to do this, if you're able to look up rows by the primary key, is to tell crossfilter that your data is just a set of keys, and then define your dimension and group functions to actually do the table lookup.
E.g. for the simple example where you have arrays A and B of data, of the same length, and the primary key is the index, do
var ndx = crossfilter(d3.range(0, A.length));
var dateDim = ndx.dimension(function(i) { return A[i].date; });
var nameDim = ndx.dimension(function(i) { return B[i].name; });
Similarly, for the group reductions, refer to the data in the same way, since the reduce functions take a "row" from the main crossfilter. Say you're reducing on sum of salaries in B:
var salaryGroup = nameDim.group().reduceSum(function(i) { return B[i].salary; });
I do this in a situation where my data is column-major instead of row-major (R dataframes), and it works great.

Related

dc.js - avoid data points animation when adding data to scatter plot

I'm trying to implement a live data visualization (i.e. with new data arriving periodically) using dc.js. The problem I'm having is the following - when new data is added to the plot, already existing points often start to "dance around", even though they were not changed. Can this be avoided?
The following fiddle illustrates this.
My guess is that crossfilter sorts data internally, which results in points moving on the chart for data items that changed their position (index) in the internal storage. Data is added in the following way:
var data = [];
var ndx = crossfilter(data);
setInterval(function() {
var value = ndx.size() + 1;
if (value > 50) {
return;
}
var newElement = {
x: myRandom(),
y: myRandom()
};
ndx.add([newElement]);
dc.redrawAll();
}, 1000);
Any ideas?
I stand by my comments above. dc.js should be fixed by binding the data using a key function, and probably the best way to deal with the problem is just to disable transitions on the scatterplot using .transitionDuration(0).
However, I was curious if it was possible to work around the current problems by keeping the group in a set order using a fake group. And it is indeed, at least for this example where there is no aggregation and we just want to display the original data points.
First, we add a third field, index, to the data. This has to order the data in the same order in which it comes in. As noted in the discussion above, the scatter plot is currently binding data by its index, so we need to keep the points in a set order; nothing should be inserted.
var newElement = {
index: value,
x: myRandom(),
y: myRandom()
};
Next, we have to preserve this index through the binning and aggregation. We could keep it either in the key or in the value, but keeping it in the key seems more fitting:
xyiDimension = ndx.dimension(function(d) {
return [+d.x, +d.y, d.index];
}),
xyiGroup = xyiDimension.group();
The original reduction didn't make sense to me, so I dropped it. We'll just use the default behavior, which counts the number of rows which fall into each bin. The counts should be 1 if included, or 0 if filtered out. Including the index in the key also ensures uniqueness, which the original keys were not guaranteed to have.
Now we can create a fake group that keeps everything sorted by index:
var xyiGroupSorted = {
all: function() {
var ret = xyiGroup.all().slice().sort((a,b) => a.key[2] - b.key[2]);
return ret;
}
}
This will fetch the original data whenever it's requested by the chart, create a copy of the array (because the original is owned by crossfilter), and sort it to return it to the correct order.
And voila, we have a scatter plot that behaves the way it should, even though the data has gone through crossfilter.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/mj81m42v/13/
[After all this, maybe we shouldn't have given the data to crossfilter in the first place! We could have just created a fake group which exposes the original data. But maybe there's some use to this technique. At least it proves that there's almost always a way to work around any problems in dc.js & crossfilter.]
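For what it's worth, the "just expose the original data" idea from the note above could look roughly like the sketch below. rawDataGroup is a made-up helper name, not part of dc.js or crossfilter, and it assumes the scatter plot only needs an object with an all() method returning key/value pairs in a stable order. Nothing is filtered here, since crossfilter is not involved.

```javascript
// Hypothetical helper: exposes a plain array as a group-like object.
// Each point becomes one bin with key [x, y] and value 1, in arrival order.
function rawDataGroup(points) {
  return {
    all: function () {
      return points.map(function (d) {
        return { key: [+d.x, +d.y], value: 1 };
      });
    }
  };
}

// New points are appended to the same array, so existing bins keep their index.
var points = [];
var pointsGroup = rawDataGroup(points);
points.push({ x: 3, y: 7 });
points.push({ x: 1, y: 2 });
```

Because the array is only ever appended to, a chart that binds data by index would see old points stay where they were.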

Using DC.js datatable to create a table with aggregated data by year

I've been trying to use dc.js and crossfilter to both build charts and tables from a certain dataset.
So far building charts works fine, but I want to use the datatable functionality to build a small html table to summarize the data as follows:
|Year|TotalEmployees|
|2015|555|
|2016|666|
|2017|777|
My dataset has around 20 000 rows, here's a sample of the data:
var data = [
{"Year":"2015","Category":"1","NbEmployee":"51"},
{"Year":"2015","Category":"2","NbEmployee":"31"},
{"Year":"2015","Category":"3","NbEmployee":"14"},
{"Year":"2016","Category":"1","NbEmployee":"51"},
{"Year":"2016","Category":"2","NbEmployee":"55"},
{"Year":"2016","Category":"3","NbEmployee":"65"},
{"Year":"2017","Category":"1","NbEmployee":"76"},
{"Year":"2017","Category":"2","NbEmployee":"98"},
];
So far this piece of code returns one row of result per row of data, and although it feels like it should be a simple manipulation, I can't figure out the right syntax to build a summarized table with one row per year:
var ndx = crossfilter(data);
var tableDim = ndx.dimension(function(d) {
return d.Year;
});
var datatable = dc.dataTable("#dc-data-table");
datatable
.dimension(tableDim)
.group(function(d) {
d.NbEmployee += d.NbEmployee;
return d.Year;
})
.columns([
function(d) {return d.Year;},
function(d) {return d.NbEmployee;},
]);
I've tried countless times to apply the
.group().reduceSum()
functions to the dimension, storing the result in a variable and then passing it to .group(), but I always end up with a compilation error. I'm pretty clueless right now.
The SQL translation of what I'm looking for is this:
SELECT
Year,
NbEmp = SUM(NbEmployee)
FROM DB
GROUP BY
Year
ORDER BY
Year
Thanks in advance for your help!
The dataTable's group is not a group - yes, pretty confusing to use this method to mean something completely different from what it means in all the other charts. Here, it's a function, everywhere else it's a crossfilter object.
The dataTable is unique out of the dc.js charts in that it reads its data from the .dimension() object. This is because it displays the raw rows of data, rather than aggregated data, by default.
However, it can be used to display a group instead. This works because the only method it actually calls on the dimension is .top(), if you choose to display in descending order.
If you want to display in ascending order, you can use a fake group to produce an object which supports the .bottom() method.
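A minimal sketch of such a fake object, assuming the table only ever calls .top(n) or .bottom(n) on whatever is passed to .dimension(). groupAsDimension is a made-up name, and yearGroup below is a hand-written stand-in for a real crossfilter group (something like tableDim.group().reduceSum(...)) so that the snippet runs on its own:

```javascript
// Hypothetical helper (not part of dc.js or crossfilter): wraps a group so a
// dataTable can read aggregated bins through .top() / .bottom().
function groupAsDimension(group) {
  // Copy the bins (crossfilter owns the array returned by all()) and sort by key.
  function sortedBins() {
    return group.all().slice().sort(function (a, b) {
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
    });
  }
  return {
    // Largest keys first, limited to n bins.
    top: function (n) {
      return sortedBins().reverse().slice(0, n);
    },
    // Smallest keys first, limited to n bins.
    bottom: function (n) {
      return sortedBins().slice(0, n);
    }
  };
}

// Stand-in for a real group: per-year sums of the sample rows in the question.
var yearGroup = {
  all: function () {
    return [
      { key: '2015', value: 96 },
      { key: '2016', value: 171 },
      { key: '2017', value: 174 }
    ];
  }
};
var yearRows = groupAsDimension(yearGroup);
```

With the real objects you would pass groupAsDimension(yearGroup) to .dimension(), and the column functions would read d.key and d.value instead of d.Year and d.NbEmployee, since the table now receives bins rather than raw rows.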

click in datatable to filter other charts (dc.js)

I need to filter other charts when I click a row in the datatable.
I did
my_table.on('pretransition', function (table) {
table.selectAll('td.dc-table-column')
.on('click',function(d){
table.filter(d.key)
dc.redrawAll();
})
});
but nothing happens in the other charts.
Can you help me, please?
If the table dimension is a dimension...
The data that ordinarily populates a data table is the raw rows from the original data set, not key/value pairs.
So it is likely that d.key is undefined.
I'd advise you first to stick
console.log(d)
into your click handler to see what your data looks like, to make sure d.key is valid.
Second, remember that a chart filters through its dimension. So you will need to pass a value to table.filter() that is a valid key for your dimension, and then it will filter out all rows for which the key is different. This may not be just the one row that you chose.
Typically a table dimension is chosen for the way it orders the values for the rows. You might actually want to filter some other dimension. But hopefully this is enough to get you started.
But what if the table dimension is a group?
The above technique will only work if your table takes a crossfilter dimension as its dimension. If, as in the fiddle you linked in the comments, you're using a group as your dimension, that object has no .filter() method, so the table.filter() method won't do anything.
If you only need to filter the one item that was clicked, you could just do
foodim.filter(d.key)
This has an effect but it's not that useful.
If you need the toggle functionality used in dc's ordinal charts, you'll need to simulate it. It's not all that complicated:
// global
var filterKeys = [];
// inside click event
if(filterKeys.indexOf(d.key)===-1)
filterKeys.push(d.key);
else
filterKeys = filterKeys.filter(k => k != d.key);
if(filterKeys.length === 0)
foodim.filter(null);
else
foodim.filterFunction(function(d) {
return filterKeys.indexOf(d) !== -1;
})
Example fiddle: https://jsfiddle.net/gordonwoodhull/kfmfkLj0/9/

dataCount graph filtered by a dimension

I have a list of participants to various events as the data source
eventid,participant_name
42,xavier
42,gordon
11,john
...
By default, dataCount will say there are 3 participants; I need to display the number of events (so 2).
I tried creating a dimension
var event = ndx.dimension(function(d) {
return d.eventid;
})
but can't manage to use it in dataCount
dc.dataCount(".dc-data-count")
//.dimension(ndx) //working, but counts participants
.dimension(event) // not working
How do I do that?
It sounds to me like you are trying to use the data count widget to count group bins rather than rows.
The data count widget is only designed to count records, not keys or groups or anything else. But you could fake out the objects, since the widget is calling just .size() on the dimension, and just .value() on the group.
But what to put there? The value is actually sort of easy, since it's the count of groups with non-zero value:
var eventGroup = event.group();
widget.group({value: function() {
return eventGroup.all().filter(function(kv) { return kv.value>0; }).length;
}});
But what is the size? Well, according to the crossfilter documentation, group.size actually returns what we want, "the number of distinct values in the group, independent of any filters; the cardinality."
So oddly, it seems like
widget.dimension(eventGroup)
should work. Of course, I haven't tested any of this, so please comment if this doesn't work!
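Putting the two pieces together, here is a rough sketch that can at least be run outside of dc.js. binCounters is a made-up helper name, and eventGroupStub is a hand-written stand-in for event.group() with one empty bin, so this only checks the counting logic, not the widget itself:

```javascript
// Hypothetical helper: builds the two objects the data count widget reads from.
function binCounters(group) {
  return {
    // Plays the "dimension" role: total number of bins, ignoring filters.
    total: {
      size: function () { return group.size(); }
    },
    // Plays the "group" role: number of bins that still contain records.
    selected: {
      value: function () {
        return group.all().filter(function (kv) { return kv.value > 0; }).length;
      }
    }
  };
}

// Stand-in for event.group(): event 11 has been filtered down to zero rows.
var eventBins = [{ key: 11, value: 0 }, { key: 42, value: 2 }];
var eventGroupStub = {
  all: function () { return eventBins; },
  size: function () { return eventBins.length; }
};
var counters = binCounters(eventGroupStub);
```

The intent would then be widget.dimension(counters.total).group(counters.selected), with the same caveat as above that this is untested against the widget.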
(Sigh, what I wouldn't do for a real data model in dc.js. It is rather confusing that the "dimension" for this widget is actually the crossfilter object. Another place where there is kind of a weird economy of methods, like the dimension's group, which is just a function.)

Apply Filter from one Crossfilter dataset to another Crossfilter

I have two datasets that have similar columns/dimensions but are grouped differently by row and contain different measures.
Ex:
Dataset 1
Year Category SubCategory Value01 Value02
2000 Cars Sport 10 11
2000 Cars Family 15 16
2000 Boats Sport 20 21
2000 Boats Family 25 26
...
Dataset 2
Year Category ValueA ValueB
2000 Cars 100 101
2000 Boats 200 201
...
Dataset 1 has its own crossfilter object, Dataset 2 has a separate crossfilter object. I have multiple dc.js charts, some tied to the dataset 1, some to dataset 2.
When a dc.js chart filters dataset 1 on a column/dimension that also exists in dataset 2, I want to apply that same filter to dataset 2. How can this be achieved?
I don't think there is any automatic way to do this in crossfilter or dc.js. But if you're willing to roll your own dimension wrapper, you could supply that instead of the original dimension objects and have that forward to all the underlying dimensions.
EDIT: based on @Aravind's fiddle below, here is a "dimension mirror" that works, at least for this simple example:
function mirror_dimension() {
var dims = Array.prototype.slice.call(arguments, 0);
function mirror(fname) {
return function(v) {
dims.forEach(function(dim) {
dim[fname](v);
});
};
}
return {
filter: mirror('filter'),
filterExact: mirror('filterExact'),
filterRange: mirror('filterRange'),
filterFunction: mirror('filterFunction')
};
}
It's a bit messy using this. For each dimension you want to mirror from crossfilter A to crossfilter B, you'll need to create a mirror dimension on crossfilter B, and vice versa:
// Creating the dimensions
subject_DA = CFA.dimension(function(d){ return d.Subject; });
name_DA = CFA.dimension(function(d){ return d.Name; });
// mirror dimensions to receive events from crossfilter B
mirror_subject_DA = CFA.dimension(function(d){ return d.Subject; });
mirror_name_DA = CFA.dimension(function(d){ return d.Name; });
subject_DB = CFB.dimension(function(d){ return d.Subject; });
name_DB = CFB.dimension(function(d){ return d.Name; });
// mirror dimensions to receive events from crossfilter A
mirror_subject_DB = CFB.dimension(function(d){ return d.Subject; });
mirror_name_DB = CFB.dimension(function(d){ return d.Name; });
Now you tie them together when passing them off to the charts:
// subject Chart
subjectAChart
.dimension(mirror_dimension(subject_DA, mirror_subject_DB))
// ...
// subject Chart
subjectBChart
.dimension(mirror_dimension(subject_DB, mirror_subject_DA))
// ...
nameAChart
.dimension(mirror_dimension(name_DA, mirror_name_DB))
// ...
nameBChart
.dimension(mirror_dimension(name_DB, mirror_name_DA))
// ...
Since all the charts are implicitly on the same chart group, the redraw events will automatically get propagated between them when they are filtered. And each filter action on one crossfilter will get applied to the mirror dimension on the other crossfilter.
Maybe not something I'd recommend doing, but as usual, it can be made to work.
Here's the fiddle: https://jsfiddle.net/gordonwoodhull/7dwn4y87/8/
@Gordon's suggestion is a good one.
I usually approach this differently, by combining the 2 tables into a single table (add ValueA and ValueB to each row of Dataset 1) and then using custom groupings to only aggregate ValueA and ValueB once for each unique Year/Category combination. Each group would need to keep a map of keys it has seen before and the count for each of those keys, only aggregating the value of ValueA or ValueB if it is a new combination of keys. This does result in complicated grouping logic, but allows you to avoid needing to coordinate between 2 Crossfilter objects.
Personally, I just find complex custom groupings easier to test and maintain than coordination logic, but that's not the case for everyone.
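As an illustration of the "map of keys seen" idea, here is a minimal sketch of such reducers. onceByKeyReducers is a made-up name, and the rows are the sample data from the question merged into one table by hand. The three functions have the shape that crossfilter's group.reduce(add, remove, initial) expects, but they are exercised here on a plain array:

```javascript
// Hypothetical reducer factory: sums `valueField` once per distinct combination key.
function onceByKeyReducers(keyOf, valueField) {
  return {
    add: function (p, row) {
      var k = keyOf(row);
      // First row seen for this combination: count its value once.
      if (!p.seen[k]) {
        p.seen[k] = 0;
        p.total += row[valueField];
      }
      p.seen[k] += 1;
      return p;
    },
    remove: function (p, row) {
      var k = keyOf(row);
      p.seen[k] -= 1;
      // Last row for this combination is gone: take its value back out.
      if (p.seen[k] === 0) {
        delete p.seen[k];
        p.total -= row[valueField];
      }
      return p;
    },
    initial: function () {
      return { seen: {}, total: 0 };
    }
  };
}

// Dataset 1 rows with ValueA copied in from Dataset 2.
var rows = [
  { Year: 2000, Category: 'Cars', SubCategory: 'Sport', Value01: 10, ValueA: 100 },
  { Year: 2000, Category: 'Cars', SubCategory: 'Family', Value01: 15, ValueA: 100 },
  { Year: 2000, Category: 'Boats', SubCategory: 'Sport', Value01: 20, ValueA: 200 },
  { Year: 2000, Category: 'Boats', SubCategory: 'Family', Value01: 25, ValueA: 200 }
];
var reducers = onceByKeyReducers(function (d) { return d.Year + '|' + d.Category; }, 'ValueA');
var state = rows.reduce(reducers.add, reducers.initial());
```

With a real group this would be wired up as someDim.group().reduce(reducers.add, reducers.remove, reducers.initial), and the chart's value accessor would read d.value.total.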
