Get only non-filtered data from dc.js chart (dimension / group) - d3.js

So this is a question about a rather specific problem. As I learned from Gordon, the main contributor to dc.js, elasticY(true) is not supported for logarithmic scales.
So I tried to implement my own workaround inside dc.js's renderlet event, which is triggered whenever the user clicks on the bar chart. What I wanted to do is this:
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let maximum = group.top(1)[0].value;
let minimum = group.top(groupSize)[groupSize-1].value;
console.log(minimum, maximum);
chart.y(d3.scale.log().domain([minimum, maximum])
.range([this.height, 0])
.nice()
.clamp(true));
I thought that at this point the "fakeGroup" (which is just group.top(50)) would contain only the data points that are NOT filtered out after the user clicked somewhere. However, it always contains all of the data points in the top 50 and doesn't change on filter events.
What I really want is to get all data points that are NOT filtered out, compute a new maximum and minimum for the y scale, and rescale the y axis accordingly by calling chart.y(...) again.
Is there any way to get only the data rows that are still in the chart and not filtered out? I also tried using remove_empty_bins(group) but didn't have any luck with that: either all() or top() is always missing somewhere, even after giving remove_empty_bins both functions.

This is how I solved it:
I made a function called rescale(), which looks like this:
rescale(chart, group, fakeGroup) {
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let minTop = group.top(groupSize)[groupSize-1].value;
let minimum = minTop > 0 ? minTop : 0.0001; // log(0) is undefined, so fall back to a small positive floor
let maximum = group.top(1)[0].value;
chart.y(d3.scale.log().domain([minimum, maximum])
.range([this.height, 0])
.nice()
.clamp(true));}
I think the parameters are pretty self-explanatory: I pass in my chart, the whole group as created by dimension.group().reduceSum(...), and a fake group I created, which contains the top 50 elements to reduce the number of bars in my chart.
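Why the original fakeGroup never changed: group.top(50) was evaluated once and the resulting array stored, so later filters could not affect it. The usual dc.js fake-group pattern is an object whose all() method calls the real group each time the chart asks for data. A sketch of that idea follows; top_n_group is a hypothetical name, and makeStubGroup is a made-up stand-in for a crossfilter group that implements only top().

```javascript
// Fake group: re-evaluates sourceGroup.top(n) every time the chart calls all(),
// so the bins reflect whatever filters are active at that moment.
function top_n_group(sourceGroup, n) {
  return {
    all: function () {
      return sourceGroup.top(n);
    },
    top: function (k) {
      // Delegate to the source, never returning more than n bins.
      return sourceGroup.top(Math.min(k, n));
    }
  };
}

// Hypothetical stub standing in for a crossfilter group:
// top(k) returns the k largest bins, sorted by value descending.
function makeStubGroup(bins) {
  return {
    bins: bins,
    top: function (k) {
      return this.bins.slice()
        .sort(function (a, b) { return b.value - a.value; })
        .slice(0, k);
    }
  };
}

var stub = makeStubGroup([
  { key: 'a', value: 5 },
  { key: 'b', value: 9 },
  { key: 'c', value: 2 }
]);
var topTwo = top_n_group(stub, 2);
console.log(topTwo.all().map(function (kv) { return kv.key; })); // keys: b, a

// Simulate a filter elsewhere changing the reduced values.
stub.bins[1].value = 1;
console.log(topTwo.all().map(function (kv) { return kv.key; })); // keys: a, c
```

Because nothing is cached, each redraw sees the current top N rather than the top N at creation time.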
The rescale() method is called in the preRedraw event listener:
chart.on('preRedraw', (chart) => {
this.rescale(chart, group, fakeGroup);
});
So what I do is redefine the chart's y axis (resetting the min and max values based on the filtered data) every time the chart is redrawn, which also happens every time one of my charts is filtered. Now the scale always fits the filtered data the chart contains after filtering another chart.
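A variation on the same idea, sketched under the assumption that filtered-out bins reduce to 0: read the current bins from group.all() on each redraw, skip non-positive values (the logarithm of 0 is undefined), and take the smallest remaining value as the lower bound. The function name logSafeDomain, the default floor, and the fallback domains are hypothetical choices; the result would be passed to d3.scale.log().domain(...) in place of [minimum, maximum] in rescale().

```javascript
// Compute a [min, max] domain suitable for a log scale from the current bins.
// Bins with value <= 0 (e.g. rows filtered out by another chart) are skipped,
// because log(0) is undefined. `floor` is only used when nothing positive remains.
function logSafeDomain(bins, floor) {
  floor = floor || 0.0001;
  var positive = bins
    .map(function (kv) { return kv.value; })
    .filter(function (v) { return v > 0; });
  if (positive.length === 0) {
    return [floor, 1]; // arbitrary fallback so the scale stays valid
  }
  var min = Math.min.apply(null, positive);
  var max = Math.max.apply(null, positive);
  if (min === max) {
    // A zero-width domain would break the scale; widen it by one decade.
    return [min / 10, max];
  }
  return [min, max];
}

var bins = [
  { key: 'a', value: 0 },
  { key: 'b', value: 40 },
  { key: 'c', value: 3 }
];
console.log(logSafeDomain(bins)); // [3, 40]
```

Skipping zero bins gives a tighter lower bound than substituting 0.0001, at the cost of needing clamp(true) so the zero-height bars are pinned to the bottom of the range.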

Related

How to show only limited number of records in box plot dc.js

I want to show the most recent 10 bins for box plot.
If a filter is applied to the bar chart or line chart, the box plot should show the most recent 10 records according to those filters.
I made a dimension by date (ordinal), but I am unable to get the result.
I don't understand how to do it with a fake group. I am new to dc.js.
A picture of the scenario is attached. Let me know if anyone needs more detail to help me.
In the image, I tried a solution using a time scale.
You can do this with two fake groups, one to remove the empty box plots, and one to take the last N elements of the resulting data.
Removing empty box plots:
function remove_empty_array_bins(group) {
return {
all: function() {
return group.all().filter(d => d.value.length);
}
};
}
This just filters the bins, removing any where the .value array is of length 0.
Taking the last N elements:
function cap_group(group, N) {
return {
all: function() {
var all = group.all();
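return all.slice(Math.max(0, all.length - N)); // never start before index 0
}
};
}
This is essentially what the cap mixin does, except without creating a bin for "others" (which is somewhat tricky).
We fetch the data from the original group, see how long it is, and then slice that array from all.length - N to the end. The Math.max guard matters when there are fewer than N bins: a negative start index would make slice() count back from the end and drop bins.
Chain these fake groups together when passing them to the chart: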
chart
.group(cap_group(remove_empty_array_bins(closeGroup), 5))
I'm using 5 instead of 10 because I have a smaller data set to work with.
Demo fiddle.
This example uses a "real" time scale rather than ordinal dates. There are a few ways to do ordinal dates, but if your group is still sorted from low to high dates, this should still work.
If not, you'll have to edit your question to include an example of the code you are using to generate the ordinal date group.
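To check the chain without a chart, the sketch below runs both fake groups over a stub group with hard-coded bins; the stub, its keys, and its values are made up for illustration, and the stub's bins are assumed to be sorted from earliest to latest date. cap_group here includes a Math.max guard so that a group with fewer than N bins is returned whole instead of losing elements to a negative slice() index.

```javascript
// Fake group 1: drop box-plot bins whose value array is empty.
function remove_empty_array_bins(group) {
  return {
    all: function () {
      return group.all().filter(function (d) { return d.value.length; });
    }
  };
}

// Fake group 2: keep only the last N bins (guarding against fewer than N bins).
function cap_group(group, N) {
  return {
    all: function () {
      var all = group.all();
      return all.slice(Math.max(0, all.length - N));
    }
  };
}

// Hypothetical stub standing in for a crossfilter group keyed by date, sorted ascending.
var stubCloseGroup = {
  all: function () {
    return [
      { key: '2020-01-01', value: [3, 4] },
      { key: '2020-01-02', value: [] },     // emptied by a filter elsewhere
      { key: '2020-01-03', value: [5] },
      { key: '2020-01-04', value: [1, 2, 6] }
    ];
  }
};

var lastTwo = cap_group(remove_empty_array_bins(stubCloseGroup), 2);
console.log(lastTwo.all().map(function (d) { return d.key; }));
// keys: 2020-01-03, 2020-01-04
```

The order of wrapping matters: removing empty bins first means the cap counts only boxes that will actually be drawn.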

dc.js - avoid data points animation when adding data to scatter plot

I'm trying to implement a live data visualization (i.e. with new data arriving periodically) using dc.js. The problem I'm having is the following - when new data is added to the plot, already existing points often start to "dance around", even though they were not changed. Can this be avoided?
The following fiddle illustrates this.
My guess is that crossfilter sorts data internally, which results in points moving on the chart for data items that changed their position (index) in the internal storage. Data is added in the following way:
var data = [];
var ndx = crossfilter(data);
setInterval(function() {
var value = ndx.size() + 1;
if (value > 50) {
return;
}
var newElement = {
x: myRandom(),
y: myRandom()
};
ndx.add([newElement]);
dc.redrawAll();
}, 1000);
Any ideas?
I stand by my comments above. dc.js should be fixed to bind the data using a key function; until then, probably the best way to deal with the problem is just to disable transitions on the scatter plot using .transitionDuration(0).
However, I was curious if it was possible to work around the current problems by keeping the group in a set order using a fake group. And it is indeed, at least for this example where there is no aggregation and we just want to display the original data points.
First, we add a third field, index, to the data. This has to order the data in the same order in which it comes in. As noted in the discussion above, the scatter plot is currently binding data by its index, so we need to keep the points in a set order; nothing should be inserted.
var newElement = {
index: value,
x: myRandom(),
y: myRandom()
};
Next, we have to preserve this index through the binning and aggregation. We could keep it either in the key or in the value, but keeping it in the key seems more fitting:
var xyiDimension = ndx.dimension(function(d) {
return [+d.x, +d.y, d.index];
});
var xyiGroup = xyiDimension.group();
The original reduction didn't make sense to me, so I dropped it. We'll just use the default behavior, which counts the number of rows which fall into each bin. The counts should be 1 if included, or 0 if filtered out. Including the index in the key also ensures uniqueness, which the original keys were not guaranteed to have.
Now we can create a fake group that keeps everything sorted by index:
var xyiGroupSorted = {
all: function() {
var ret = xyiGroup.all().slice().sort((a,b) => a.key[2] - b.key[2]);
return ret;
}
}
This will fetch the original data whenever it's requested by the chart, create a copy of the array (because the original is owned by crossfilter), and sort it to return it to the correct order.
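The sorting fake group can be exercised on its own. In the sketch below, sortedByIndex is a hypothetical wrapper around the same comparator, and the stub owns its bins array the way crossfilter owns its group array, which is why the copy via slice() matters: sorting in place would reorder the source.

```javascript
// Fake group that returns a copy of the source bins sorted by the index
// stored in the third element of each composite key [x, y, index].
function sortedByIndex(sourceGroup) {
  return {
    all: function () {
      return sourceGroup.all().slice().sort(function (a, b) {
        return a.key[2] - b.key[2];
      });
    }
  };
}

// Hypothetical stub: bins are in key order (x first), not arrival order.
// Like crossfilter, the stub owns this array, so it must not be sorted in place.
var storedBins = [
  { key: [0.1, 0.9, 3], value: 1 },
  { key: [0.5, 0.2, 1], value: 1 },
  { key: [0.7, 0.4, 2], value: 0 } // filtered out, but keeps its slot
];
var stubXyiGroup = { all: function () { return storedBins; } };

var ordered = sortedByIndex(stubXyiGroup).all();
console.log(ordered.map(function (kv) { return kv.key[2]; })); // indices: 1, 2, 3
```

Filtered-out points keep their position in the sorted array (with a count of 0), so index-based data binding never shifts the remaining points.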
And voila, we have a scatter plot that behaves the way it should, even though the data has gone through crossfilter.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/mj81m42v/13/
[After all this, maybe we shouldn't have given the data to crossfilter in the first place! We could have just created a fake group which exposes the original data. But maybe there's some use to this technique. At least it proves that there's almost always a way to work around any problems in dc.js & crossfilter.]

Clicking on rowchart (dc.js) changes the percentage

I need to solve a problem with dc and crossfilter. I have two row charts in which I show the calculated percentage of each row as:
(d.value/ndx.groupAll().reduceCount().value()*100).toFixed(1)
When you click on a row in the first chart, its label changes to 100% instead of keeping the old percentage value, and the percentages of the other rows in the same chart change as well.
Is it possible to keep the original percentage when I click, while still affecting the other charts where I did not click?
Regards,
thank you very much
First off, you probably don't want to call ndx.groupAll() inside the percentage calculation: the label function is called many times, and each call creates a new object which will then get updated every time a filter changes.
Now, there are three ways to interpret your specific question. I think the first case is the most likely, but the other two are also legitimate, so I'll address all three.
Percentages affected by other charts
Clearly you don't want the percentage affected by filtering the current chart. You almost never want that. But it often makes sense to have the percentage label affected by filtering on other charts, so that all the bars in the row chart add up to 100%.
The subtle difference between dimension.groupAll() and crossfilter.groupAll() is that the former does not observe that dimension's own filters, whereas the latter observes all filters. If we use the row chart dimension's groupAll, it will observe the other charts' filters but not filters on this chart:
var totalGroup = rowDim.groupAll().reduceCount();
rowChart.label(function(kv) {
return kv.key + ' (' + (kv.value/totalGroup.value()*100).toFixed(1) + '%)';
});
That's probably what you want, but reading your question literally suggests two other possible answers. So read on if that's not what you were looking for.
Percentages out of the constant total, but affected by other filters
Crossfilter doesn't have any particular way to calculate unfiltered totals, but if we want to use the unfiltered total, we can capture the value before any filters are applied.
So:
var total = rowDim.groupAll().reduceCount().value(); // call value() now to capture the number
rowChart.label(function(kv) {
return kv.key + ' (' + (kv.value/total*100).toFixed(1) + '%)';
});
In this case, the percentages will always show the portion out of the full, unfiltered, total denominator, but the numerators will reflect filters on other charts.
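One detail is easy to miss when capturing the total: groupAll().value is a method, so it has to be called. Assigning the method itself would leave total holding a function, and every label would read NaN. The stub below is a hypothetical stand-in for a crossfilter groupAll object whose count changes when filters change; makeStubGroupAll and label are illustrative names only.

```javascript
// Hypothetical stand-in for rowDim.groupAll().reduceCount():
// value() returns the current count, which changes as filters change.
function makeStubGroupAll(initialCount) {
  var count = initialCount;
  return {
    value: function () { return count; },
    setCount: function (c) { count = c; } // simulates a filter change
  };
}

var totalGroup = makeStubGroupAll(200);
var total = totalGroup.value();   // a number, captured before filtering
var notTotal = totalGroup.value;  // a function, not a number

totalGroup.setCount(50);          // a filter is applied somewhere

function label(kv, denominator) {
  return kv.key + ' (' + (kv.value / denominator * 100).toFixed(1) + '%)';
}

console.log(label({ key: 'A', value: 25 }, total));    // A (12.5%)
console.log(label({ key: 'A', value: 25 }, notTotal)); // A (NaN%)
```

The captured number stays at 200 even after the stub's count drops to 50, which is exactly the "constant denominator" behaviour described above.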
Percentages not affected by filtering at all
If you really want to just completely freeze the percentages and show unfiltered percentages, not affected by any filtering, we'll have to do a little extra work to capture those values.
(This is similar to what you need to do if you want to show a "shadow" of the unfiltered bars behind them.)
We'll copy all the group data into a map we can use to look up the values:
var rowUnfilteredAll = rowGroup.all().reduce(function(p, kv) {
p[kv.key] = kv.value;
return p;
}, {});
Now the label code is similar to before, but we look up values instead of reading them from the bound data:
var total = rowDim.groupAll().reduceCount().value(); // call value() now to capture the number
rowChart.label(function(kv) {
return kv.key + ' (' + (rowUnfilteredAll[kv.key]/total*100).toFixed(1) + '%)';
});
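The snapshot technique can be tried outside dc.js with a stub group. snapshotGroup and frozenLabel are hypothetical helpers packaging the two snippets above, and the stub's counts are made up. The total here is summed from the snapshot, which equals the unfiltered record count only when each record falls into exactly one bin (as with a plain reduceCount group).

```javascript
// Snapshot the unfiltered bins once into a key -> value lookup table.
function snapshotGroup(group) {
  return group.all().reduce(function (p, kv) {
    p[kv.key] = kv.value;
    return p;
  }, {});
}

// Label function that ignores the bound (filtered) value and uses the snapshot.
function frozenLabel(snapshot, total) {
  return function (kv) {
    return kv.key + ' (' + (snapshot[kv.key] / total * 100).toFixed(1) + '%)';
  };
}

// Hypothetical stub group with hard-coded, unfiltered counts.
var stubRowGroup = {
  all: function () {
    return [{ key: 'red', value: 30 }, { key: 'blue', value: 10 }];
  }
};

var rowUnfilteredAll = snapshotGroup(stubRowGroup);
var unfilteredTotal = Object.keys(rowUnfilteredAll).reduce(function (s, k) {
  return s + rowUnfilteredAll[k];
}, 0);
var labelFn = frozenLabel(rowUnfilteredAll, unfilteredTotal);

// After filtering, the bound value for 'red' might drop to 5,
// but the label still shows the unfiltered share.
console.log(labelFn({ key: 'red', value: 5 })); // red (75.0%)
```

In a chart this would be used as rowChart.label(labelFn), with the snapshot taken before any filters are applied.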
(There might be a simpler way to just freeze the labels, but this is what came to mind.)

dc.js Composite Graph - Plot New Line for Each Person

Good Evening Everyone,
I'm trying to take the data from a database full of hour reports (name, timestamp, hours worked, etc.) and create a plot using dc.js to visualize the data. I would like the timestamp to be on the x-axis, the sum of hours for the particular timestamp on the y-axis, and a new bar graph for each unique name all on the same chart.
It appears based on my objectives that using crossfilter.js the timestamp should be my 'dimension' and then the sum of hours should be my 'group'.
Question 1: how would I then use the dimension and group to further split the data based on the person's name, and then create a bar graph to add to my composite graph? I would like the crossfilter.js functionality to remain intact, so that if I add a date range tool or some other user-controllable filter, everything updates accordingly.
Question 2: my timestamps are in MySQL datetime format (YYYY-mm-dd HH:MM:SS), so how would I go about dropping precision? For instance, I might want to combine all entries from the same day into one entry (day precision), or combine all entries in one month into a single entry (month precision).
Thanks in advance!
---- Added on 2017/01/28 16:06
To further clarify, I'm referencing the Crossfilter and DC APIs alongside the DC NASDAQ and Composite examples. The Composite example has shown me how to place multiple line/bar charts on a single graph. On the composite chart I've created, I've given each bar chart a dimension based on the timestamps in the dataset. Now I'm trying to figure out how to define the groups for each. I want each bar chart to represent the total time worked per timestamp.
For example, I have five people in my database, so I want there to be five bar charts within the single composite chart. Today all five submitted reports saying they worked 8 hours, so now all five bar charts should show a mark at 01/28/2017 on the x-axis and 8 hours on the y-axis.
var parseDate = d3.time.format('%Y-%m-%d %H:%M:%S').parse;
data.forEach(function(d) {
d.timestamp = parseDate(d.timestamp);
});
var ndx = crossfilter(data);
var writtenDimension = ndx.dimension(function(d) {
return d.timestamp;
});
var hoursSumGroup = writtenDimension.group().reduceSum(function(d) {
return d.time_total;
});
var minDate = parseDate('2017-01-01 00:00:00');
var maxDate = parseDate('2017-01-31 23:59:59');
var mybarChart = dc.compositeChart("#my_chart");
mybarChart
.width(window.innerWidth)
.height(480)
.x(d3.time.scale().domain([minDate,maxDate]))
.brushOn(false)
.clipPadding(10)
.yAxisLabel("This is the Y Axis!")
.compose([
dc.barChart(mybarChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup, "Top Line")
]);
So based on what I have right now and the example I've provided, the compose section should contain 5 charts because there are 5 people (obviously this needs to be dynamic in the end), and each of those charts should only show the timestamp: time_total data for that person.
At this point I don't know how to further break up hoursSumGroup by person; this is where my Question #1 comes in and where I need help.
Regarding Question #2 above, I want to make sure that the code is dynamic: more people can be handled without code changes; when minDate and maxDate are later tied to user input fields, the charts update automatically (I assume by adjusting the dimension in some way); and if I add a names filter, unselecting a name removes that person's data from the chart.
A Question #3 that I'm now realizing I'll want to figure out is how to get the person's name to show up in the pointer tooltip (the title) along with the timestamp and time_total values.
There are a number of ways to go about this, but I think the easiest thing to do is to create a custom reduction which reduces each person into a sub-bin.
First off, addressing question #2, you'll want to set up your dimension based on the time interval you're interested in. For instance, if you're looking at days:
var writtenDimension = ndx.dimension(function(d) {
return d3.time.day(d.timestamp);
});
chart.xUnits(d3.time.days);
This will cause each timestamp to be rounded down to the start of its day, and tell the chart to calculate the bar width accordingly. For month precision, use d3.time.month and d3.time.months instead.
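To make the rounding concrete, the plain-JavaScript sketch below approximates what d3 v3 time intervals such as d3.time.day and d3.time.month do to a Date in local time: the dimension key becomes the start of the interval, so all records from the same day (or month) land in one bin. floorToDay and floorToMonth are hypothetical helpers for illustration; in the chart code the d3 intervals are the natural choice.

```javascript
// Round a Date down to local midnight (similar to d3.time.day in d3 v3).
function floorToDay(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}

// Round a Date down to the first day of its month (similar to d3.time.month).
function floorToMonth(date) {
  return new Date(date.getFullYear(), date.getMonth(), 1);
}

var t = new Date(2017, 0, 28, 16, 6, 0); // 2017-01-28 16:06:00 local time
console.log(floorToDay(t).getTime() === new Date(2017, 0, 28).getTime());  // true
console.log(floorToMonth(t).getTime() === new Date(2017, 0, 1).getTime()); // true
```

Two reports submitted at 09:00 and 16:06 on the same day would therefore share the key for 2017-01-28 and be summed into one bar.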
Next, here's a custom reduction (from the FAQ) which will create an object for each reduced value, with values for each person's name:
var hoursSumGroup = writtenDimension.group().reduce(
function(p, v) { // add
p[v.name] = (p[v.name] || 0) + v.time_total; // v is the row being added
return p;
},
function(p, v) { // remove
p[v.name] -= v.time_total; // v is the row being removed
return p;
},
function() { // init
return {};
});
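The reducers can be exercised without crossfilter. The sketch below pulls the three functions out under hypothetical names and replays add and remove calls by hand on made-up rows, roughly as crossfilter would for a single time bin. Each function reads the row through its v parameter.

```javascript
// Reducer functions for group.reduce(add, remove, init), keyed by person name.
function reduceAddByName(p, v) {
  p[v.name] = (p[v.name] || 0) + v.time_total;
  return p;
}
function reduceRemoveByName(p, v) {
  p[v.name] -= v.time_total;
  return p;
}
function reduceInitByName() {
  return {};
}

// Made-up rows that all fall into the same time bin.
var rows = [
  { name: 'alice', time_total: 8 },
  { name: 'bob', time_total: 6 },
  { name: 'alice', time_total: 2 }
];

// Add every row, as crossfilter does when the group is first computed.
var bin = rows.reduce(reduceAddByName, reduceInitByName());
console.log(bin); // alice: 10, bob: 6

// Remove bob's row, as crossfilter does when a filter excludes it.
bin = reduceRemoveByName(bin, rows[1]);
console.log(bin); // alice: 10, bob: 0
```

Note that a removed person keeps a key with value 0 rather than disappearing, which is why the line for a filtered-out name drops to zero instead of vanishing.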
I did not go with the series example I mentioned in the comments, because I think composite keys can be difficult to deal with. That's another option, and I'll expand my answer if that's necessary.
Next, we can feed the composite line charts with value accessors that can fetch the value by name.
Assume we have an array names.
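One way to build that array, assuming data is the parsed array of rows with a name field, is to collect the distinct names once up front, which keeps the number of child charts dynamic. uniqueNames is a hypothetical helper; note that names appearing in data added later would not get their own line unless the composite chart is recomposed.

```javascript
// Collect the distinct person names from the raw rows,
// sorted so the child charts come out in a stable order.
function uniqueNames(rows) {
  return Array.from(new Set(rows.map(function (d) { return d.name; }))).sort();
}

var names = uniqueNames([
  { name: 'carol', time_total: 8 },
  { name: 'alice', time_total: 4 },
  { name: 'carol', time_total: 1 }
]);
console.log(names); // alice, carol
```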
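var nameColor = d3.scale.category10(); // one color per person, so the lines can be told apart
compositeChart.shareTitle(false);
compositeChart.compose(
names.map(function(name) {
return dc.lineChart(compositeChart)
.dimension(writtenDimension)
.colors(nameColor(name))
.group(hoursSumGroup, name)
.valueAccessor(function(kv) {
return kv.value[name] || 0; // 0 if this person has no hours in the bin
})
.title(function(kv) {
return name + ' ' + kv.key + ': ' + (kv.value[name] || 0);
});
}));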
Again, it wouldn't make sense to use bar charts here, because they would obscure each other.
If you filter a name elsewhere, it will cause the line for the name to drop to zero. Having the line disappear entirely would probably not be so simple.
The above shareTitle(false) ensures that the child charts will draw their own titles; the title functions just add the current name to those titles (which would usually just be key:value).

dataCount graph filtered by a dimension

I have a list of participants to various events as the data source
eventid,participant_name
42,xavier
42,gordon
11,john
...
By default, dataCount says there are 3 participants; I need to display the number of events (so 2).
I tried creating a dimension
var event = ndx.dimension(function(d) {
return d.eventid;
})
but I can't manage to use it in dataCount:
dc.dataCount(".dc-data-count")
//.dimension(ndx) //working, but counts participants
.dimension(event) // not working
How do I do that?
It sounds to me like you are trying to use the data count widget to count group bins rather than rows.
The data count widget is only designed to count records, not keys or groups or anything else. But you could fake out the objects, since the widget only calls .size() on the dimension and .value() on the group.
But what to put there? The value is actually sort of easy, since it's the count of groups with non-zero value:
var eventGroup = event.group();
widget.group({value: function() {
return eventGroup.all().filter(function(kv) { return kv.value>0; }).length;
}});
But what is the size? Well, according to the crossfilter documentation, group.size actually returns what we want, "the number of distinct values in the group, independent of any filters; the cardinality."
So oddly, it seems like
widget.dimension(eventGroup)
should work. Of course, I haven't tested any of this, so please comment if this doesn't work!
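Both fake objects can be checked against a stub before trying them in the widget. In the sketch below, eventCountObjects is a hypothetical wrapper that builds an explicit "dimension" exposing size() and a "group" exposing value(), both backed by the event group; the stub's bins (event 11 filtered away, event 42 with two participants) are made up for illustration.

```javascript
// Build the two objects the data count widget reads: a "dimension" exposing
// size() and a "group" exposing value(). Both wrap a group keyed by eventid.
function eventCountObjects(eventGroup) {
  return {
    dimension: {
      // Number of distinct events, independent of filters (group.size()).
      size: function () { return eventGroup.size(); }
    },
    group: {
      // Number of events that still have at least one participant after filtering.
      value: function () {
        return eventGroup.all().filter(function (kv) { return kv.value > 0; }).length;
      }
    }
  };
}

// Hypothetical stub for event.group() after a filter removed event 11's only row.
var stubEventGroup = {
  all: function () {
    return [{ key: 11, value: 0 }, { key: 42, value: 2 }];
  },
  size: function () { return 2; }
};

var objs = eventCountObjects(stubEventGroup);
console.log(objs.group.value() + ' selected out of ' + objs.dimension.size() + ' events');
// 1 selected out of 2 events
```

If it behaves as hoped, it would be wired up as dc.dataCount('.dc-data-count').dimension(objs.dimension).group(objs.group), but like the answer above, this is untested against the real widget.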
(Sigh, what I wouldn't do for a real data model in dc.js. It is rather confusing that the "dimension" for this widget is actually the crossfilter object. Another place where there is kind of a weird economy of methods, like the dimension's group, which is just a function.)

Resources