I need to display the top20 alarms out of a large dataset.
The standard dc.js works great and using .rowsCap(20) in a rowChart gives me the top20.
I am now trying to remove the zero rows when the data is filtered below 20 entries. Several similar questions pointed to remove_empty_bins() from https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted which works correctly if I remove the .rowsCap(20) but fails when I combine the two.
Using dc.js-2.0.0-beta.1 it fails on line 3415 for the group.top(_cap) call because the .top attribute is not available for the fake group generated by remove_empty_bins().
Same error when trying to add .top(20) when defining the fake group.
Is there an easy way to combine remove_empty_bins(original_group) with .top() or .rowsCap() for a rowChart?
--Nico
It is just a little bit more complicated to add .top(n) to the fake group:
function remove_empty_bins(source_group) {
    function non_zero_pred(d) {
        return d.value != 0;
    }
    return {
        all: function () {
            return source_group.all().filter(non_zero_pred);
        },
        top: function(n) {
            return source_group.top(Infinity)
                .filter(non_zero_pred)
                .slice(0, n);
        }
    };
}
The efficiency is not perfect, because this fetches all the groups in sorted order and then throws out all but the first n, while crossfilter is able to only pull the first n using a heap. But this shouldn't matter unless the number of groups is huge.
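To check the wrapper's behaviour without loading crossfilter, here is a minimal sketch that runs it against a hand-written stand-in group. The stubGroup object and its alarm data are hypothetical; it only mimics the all() and top(n) methods that a real crossfilter group provides.

```javascript
// Same wrapper as above: hides zero-valued bins from both all() and top(n).
function remove_empty_bins(source_group) {
    function non_zero_pred(d) {
        return d.value != 0;
    }
    return {
        all: function () {
            return source_group.all().filter(non_zero_pred);
        },
        top: function (n) {
            return source_group.top(Infinity)
                .filter(non_zero_pred)
                .slice(0, n);
        }
    };
}

// Hypothetical stand-in for a crossfilter group (not the real library).
var bins = [
    {key: 'disk', value: 7},
    {key: 'fan', value: 0},
    {key: 'net', value: 12},
    {key: 'cpu', value: 3},
    {key: 'psu', value: 0}
];
var stubGroup = {
    // all() returns the bins ordered by key
    all: function () {
        return bins.slice().sort(function (a, b) {
            return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
        });
    },
    // top(n) returns the n largest bins by value, descending
    top: function (n) {
        return bins.slice().sort(function (a, b) {
            return b.value - a.value;
        }).slice(0, n);
    }
};

var filtered = remove_empty_bins(stubGroup);
var top2 = filtered.top(2).map(function (d) { return d.key; });
var allKeys = filtered.all().map(function (d) { return d.key; });
console.log(top2, allKeys);
```

Asking for more rows than there are non-zero bins, as in filtered.top(20), simply returns the three non-zero bins, which is the behaviour the row chart needs.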
Working fork of your fiddle: http://jsfiddle.net/gordonwoodhull/za8ksj45/3/
EDIT: note that the need to provide group.top() is being eliminated in dc.js 2.1.2, since the functionality overlaps with chart.ordering() and leads to confusing bugs. (As well as making these data preprocessors difficult to write.)
I want to show the most recent 10 bins for box plot.
If a filter is applied to the bar chart or line chart, the box plot should show the most recent 10 records according to those filters.
I made a dimension by date (ordinal), but I am unable to get the result.
I don't understand how to do it with a fake group; I am new to dc.js.
A picture of the scenario is attached. Let me know if anyone needs more detail to help me.
In the image, I tried a solution using a time scale.
You can do this with two fake groups, one to remove the empty box plots, and one to take the last N elements of the resulting data.
Removing empty box plots:
function remove_empty_array_bins(group) {
    return {
        all: function() {
            return group.all().filter(d => d.value.length);
        }
    };
}
This just filters the bins, removing any where the .value array is of length 0.
Taking the last N elements:
function cap_group(group, N) {
    return {
        all: function() {
            var all = group.all();
            // clamp at 0: a negative start would make slice() count from the end
            return all.slice(Math.max(0, all.length - N));
        }
    };
}
This is essentially what the cap mixin does, except without creating a bin for "others" (which is somewhat tricky).
We fetch the data from the original group, see how long it is, and then slice that array from all.length - N to the end.
Chain these fake groups together when passing them to the chart:
chart
    .group(cap_group(remove_empty_array_bins(closeGroup), 5))
I'm using 5 instead of 10 because I have a smaller data set to work with.
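As a rough check of how the two fake groups compose, here is a sketch that runs them against a hand-written stand-in group. The stubGroup object and its dates are hypothetical, and this version of cap_group clamps the start index at 0, since a negative start index makes slice() count from the end of the array and would drop bins when N is larger than the number of bins.

```javascript
// Drop bins whose value array is empty (box plots with no data).
function remove_empty_array_bins(group) {
    return {
        all: function () {
            return group.all().filter(function (d) { return d.value.length; });
        }
    };
}

// Keep only the last N bins, clamping the start index at 0.
function cap_group(group, N) {
    return {
        all: function () {
            var all = group.all();
            return all.slice(Math.max(0, all.length - N));
        }
    };
}

// Hypothetical stand-in for a crossfilter group, sorted by date key.
var stubGroup = {
    all: function () {
        return [
            {key: '2020-01-01', value: [1, 2]},
            {key: '2020-01-02', value: []},
            {key: '2020-01-03', value: [5]},
            {key: '2020-01-04', value: [3, 4, 8]},
            {key: '2020-01-05', value: []},
            {key: '2020-01-06', value: [9]}
        ];
    }
};

var lastTwo = cap_group(remove_empty_array_bins(stubGroup), 2).all();
var lastTen = cap_group(remove_empty_array_bins(stubGroup), 10).all();
console.log(lastTwo.map(function (d) { return d.key; }));
```

The order of wrapping matters: empty bins are removed first, so the cap counts only bins that will actually be drawn.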
Demo fiddle.
This example uses a "real" time scale rather than ordinal dates. There are a few ways to do ordinal dates, but if your group is still sorted from low to high dates, this should still work.
If not, you'll have to edit your question to include an example of the code you are using to generate the ordinal date group.
I'm trying to implement a live data visualization (i.e. with new data arriving periodically) using dc.js. The problem I'm having is the following - when new data is added to the plot, already existing points often start to "dance around", even though they were not changed. Can this be avoided?
The following fiddle illustrates this.
My guess is that crossfilter sorts data internally, which results in points moving on the chart for data items that changed their position (index) in the internal storage. Data is added in the following way:
var data = [];
var ndx = crossfilter(data);
setInterval(function() {
    var value = ndx.size() + 1;
    if (value > 50) {
        return;
    }
    var newElement = {
        x: myRandom(),
        y: myRandom()
    };
    ndx.add([newElement]);
    dc.redrawAll();
}, 1000);
Any ideas?
I stand by my comments above. dc.js should be fixed by binding the data using a key function, and probably the best way to deal with the problem is just to disable transitions on the scatter plot using .transitionDuration(0).
However, I was curious if it was possible to work around the current problems by keeping the group in a set order using a fake group. And it is indeed, at least for this example where there is no aggregation and we just want to display the original data points.
First, we add a third field, index, to the data. This has to order the data in the same order in which it comes in. As noted in the discussion above, the scatter plot is currently binding data by its index, so we need to keep the points in a set order; nothing should be inserted.
var newElement = {
    index: value,
    x: myRandom(),
    y: myRandom()
};
Next, we have to preserve this index through the binning and aggregation. We could keep it either in the key or in the value, but keeping it in the key seems more fitting:
xyiDimension = ndx.dimension(function(d) {
    return [+d.x, +d.y, d.index];
}),
xyiGroup = xyiDimension.group();
The original reduction didn't make sense to me, so I dropped it. We'll just use the default behavior, which counts the number of rows which fall into each bin. The counts should be 1 if included, or 0 if filtered out. Including the index in the key also ensures uniqueness, which the original keys were not guaranteed to have.
Now we can create a fake group that keeps everything sorted by index:
var xyiGroupSorted = {
    all: function() {
        var ret = xyiGroup.all().slice().sort((a,b) => a.key[2] - b.key[2]);
        return ret;
    }
};
This will fetch the original data whenever it's requested by the chart, create a copy of the array (because the original is owned by crossfilter), and sort it to return it to the correct order.
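To illustrate the copy-then-sort step in isolation, here is a small sketch using a hand-written stand-in for xyiGroup. The stub and its three points are hypothetical; the real group would come from crossfilter.

```javascript
// Hypothetical stand-in for xyiGroup: keys are [x, y, index] and
// all() returns the bins in the group's own order.
var xyiGroup = {
    bins: [
        {key: [0.2, 0.9, 3], value: 1},
        {key: [0.5, 0.1, 1], value: 1},
        {key: [0.7, 0.4, 2], value: 0}
    ],
    all: function () { return this.bins; }
};

var xyiGroupSorted = {
    all: function () {
        // copy first so the group's own array is never reordered
        return xyiGroup.all().slice().sort(function (a, b) {
            return a.key[2] - b.key[2];
        });
    }
};

var order = xyiGroupSorted.all().map(function (d) { return d.key[2]; });
var untouched = xyiGroup.all().map(function (d) { return d.key[2]; });
console.log(order, untouched);
```

The filtered-out point (value 0) stays in the array, so every remaining point keeps its position when the chart binds by index.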
And voila, we have a scatter plot that behaves the way it should, even though the data has gone through crossfilter.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/mj81m42v/13/
[After all this, maybe we shouldn't have given the data to crossfilter in the first place! We could have just created a fake group which exposes the original data. But maybe there's some use to this technique. At least it proves that there's almost always a way to work around any problems in dc.js & crossfilter.]
I'm using the grouped bar PR of dc.js and the corresponding grouped bar chart example as a baseline.
For some reason, I have to use numbers in my data as opposed to strings (converting "male" and "female" to 1/0). I'm guessing it has to do with the reduce functions I'm using. This also affects my x-axis labels, of course. I'd rather they show the text variations.
ndx = crossfilter(eData),
groupDim = ndx.dimension(function(d) {return d.service;}),
qtySumGroup = groupDim.group().reduce(
    function(p,v) { p[v.component] = (p[v.component] || 0) + v.qty; return p; },
    function(p,v) { p[v.component] = (p[v.component] || 0) - v.qty; return p; },
    function() { return {}; });
I'm also noticing that it doesn't seem to crossfilter the data. When I click one of the bars in a group, it doesn't filter my other charts on the page. What am I missing?
Here's the first part of the answer. In order to use string components/genders for grouping, you'll need to adjust the way data is selected for "stacking" (actually grouping when this version of dc.js is used).
So, you can grab the component names by first walking the data and grabbing the components:
var components = Object.keys(etsData.reduce(function(p, v) {
    p[v.component] = 1;
    return p;
}, {}));
This builds an object where the keys are the component names, and then pulls just the keys as an array.
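For instance, with a few made-up rows (the rows array below is hypothetical sample data, not the asker's dataset), the same reduce yields the distinct component names in the order they first appear:

```javascript
// Hypothetical sample rows; only the `component` field matters here.
var rows = [
    {service: 'A', component: 'male', qty: 3},
    {service: 'A', component: 'female', qty: 5},
    {service: 'B', component: 'male', qty: 2},
    {service: 'B', component: 'female', qty: 4}
];

// Collect each distinct component name as an object key, then list the keys.
var components = Object.keys(rows.reduce(function (p, v) {
    p[v.component] = 1;
    return p;
}, {}));

console.log(components);
```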
Then we use components to select the categories like so:
grpChart
    .group(qtySumGroup, components[0], sel_stack(components[0]));
for(var i=1; i<components.length; ++i)
    grpChart.stack(qtySumGroup, components[i], sel_stack(components[i]));
This is just the same as the original
grpChart
    .group(qtySumGroup, "1", sel_stack('1'));
for(var i=2; i<6; ++i)
    grpChart.stack(qtySumGroup, ''+i, sel_stack(i));
except that it is indexing by string instead of integer.
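sel_stack is not defined anywhere in this thread. The sketch below assumes it is the usual value accessor that picks one named entry out of the reduced object, and applies the question's reduce functions by hand to hypothetical rows to show that string keys work the same way as numeric ones.

```javascript
// Assumed definition: returns an accessor that reads one named
// entry out of a bin's reduced value object.
function sel_stack(name) {
    return function (d) {
        return d.value[name];
    };
}

// The add/remove reducers from the question, given names for reuse.
function reduceAdd(p, v) { p[v.component] = (p[v.component] || 0) + v.qty; return p; }
function reduceRemove(p, v) { p[v.component] = (p[v.component] || 0) - v.qty; return p; }

// Hypothetical rows that all fall into a single service bin.
var rows = [
    {component: 'male', qty: 3},
    {component: 'female', qty: 4},
    {component: 'male', qty: 2}
];
var bin = {key: 'A', value: rows.reduce(reduceAdd, {})};

var maleQty = sel_stack('male')(bin);
var femaleQty = sel_stack('female')(bin);

// Filtering out the first row subtracts its quantity again.
reduceRemove(bin.value, rows[0]);
var maleAfterRemove = sel_stack('male')(bin);
console.log(maleQty, femaleQty, maleAfterRemove);
```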
I realize this is not the important part of your question, but unfortunately filtering by stack segments is not currently supported in dc.js. I'll try to return to that part later today if I have time - it should be possible to hack it in using a dimension with compound keys (or using two dimensions) and a custom click event, but I haven't seen anyone try this yet.
It would no doubt be a helpful feature to add to dc.js, even if just as an external customization.
EDIT: I've added an example of filtering the segments of a stack, which should apply equally well for grouped bars (although I haven't tried it with your code). The technique is explained in the relevant dc.js issue.
I have a list of participants to various events as the data source
eventid,participant_name
42,xavier
42,gordon
11,john
...
By default, dataCount will say there are 3 participants, but I need to display the number of events (so 2).
I tried creating a dimension
var event = ndx.dimension(function(d) {
    return d.eventid;
})
but can't manage to use it in dataCount
dc.dataCount(".dc-data-count")
    //.dimension(ndx) //working, but counts participants
    .dimension(event) // not working
How do I do that?
It sounds to me like you are trying to use the data count widget to count group bins rather than rows.
The data count widget is only designed to count records, not keys or groups or anything else. But you could fake out the objects, since the widget is calling just .size() on the dimension, and just .value() on the group.
But what to put there? The value is actually sort of easy, since it's the count of groups with non-zero value:
var eventGroup = event.group();
widget.group({value: function() {
    return eventGroup.all().filter(function(kv) { return kv.value>0; }).length;
}});
But what is the size? Well, according to the crossfilter documentation, group.size actually returns what we want, "the number of distinct values in the group, independent of any filters; the cardinality."
So oddly, it seems like
widget.dimension(eventGroup)
should work. Of course, I haven't tested any of this, so please comment if this doesn't work!
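Here is a sketch of the two numbers the widget would end up with, using a hand-written stand-in for event.group(). The stub and its bins are hypothetical; it only mimics the all() and size() methods of a crossfilter group, and no dc.js widget is involved.

```javascript
// Hypothetical stand-in for event.group(): bins as they might look
// after another chart has filtered out every row of event 11.
var eventGroup = {
    all: function () {
        return [
            {key: 11, value: 0},
            {key: 42, value: 2}
        ];
    },
    // like crossfilter's group.size(): number of distinct keys, ignoring filters
    size: function () { return this.all().length; }
};

// Fake "group" for the widget: value() counts bins that still have rows.
var countGroup = {
    value: function () {
        return eventGroup.all().filter(function (kv) {
            return kv.value > 0;
        }).length;
    }
};

var selected = countGroup.value();
var total = eventGroup.size();
console.log(selected + ' of ' + total + ' events');
```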
(Sigh, what I wouldn't do for a real data model in dc.js. It is rather confusing that the "dimension" for this widget is actually the crossfilter object. Another place where there is kind of a weird economy of methods, like the dimension's group, which is just a function.)
I have a line chart. Its purpose is to show the amount of transactions per user over a given time period.
To do this I'm getting the dates of all users' transactions. I'm working off this example: http://bl.ocks.org/mbostock/3884955 and have the line chart rendering fine.
My x-axis is time and the y-axis is number of transactions. The problem I have is to do with displaying dates when there is no activity.
Say I have 4 transactions on Tuesday and 5 transactions on Thursday. I need to show that there have been 0 transactions on Wednesday. Since no data in my database explicitly states that a user made no transactions on Wednesday, do I need to pass in the Wednesday time (and all other times, depending on the timeframe) with a 0 value? Or can I do it with d3? I can't seem to find any examples that fit my problem.
This seems like a pretty common issue, so I worked up an example implementation here: http://jsfiddle.net/nrabinowitz/dhW2F/2/
Relevant code:
// get the min/max dates
var extent = d3.extent(data, function(d) { return d.date; }),
    // hash the existing days for easy lookup
    dateHash = data.reduce(function(agg, d) {
        agg[d.date] = true;
        return agg;
    }, {}),
    // note that this leverages some "get all headers but date" processing
    // already present in the example
    headers = color.domain();
// make even intervals
d3.time.days(extent[0], extent[1])
    // drop the existing ones
    .filter(function(date) {
        return !dateHash[date];
    })
    // and push them into the array
    .forEach(function(date) {
        var emptyRow = { date: date };
        headers.forEach(function(header) {
            emptyRow[header] = null;
        });
        data.push(emptyRow);
    });
// re-sort the data
data.sort(function(a, b) { return d3.ascending(a.date, b.date); });
As you can see, it's a bit convoluted, but it seems to work well: you make an array of evenly spaced dates using the handy d3.time.days method (an alias for d3.time.day.range), filter out the dates already present in your data, and use the remaining ones to push empty rows. One downside is that performance could be slow for a big dataset. This also assumes that full rows are empty, rather than different dates being empty in different series.
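The same gap-filling idea can be sketched without d3, which may help if d3.time.days (part of the d3 v3 API) is not available. This version swaps in plain Date arithmetic and is only a sketch under stated assumptions: every row's date is a Date at UTC midnight, and the function name fillMissingDays, the sample data and the header name are all hypothetical.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// Return a new, date-sorted array in which every missing UTC day between
// the first and last row is present, with each header set to null.
function fillMissingDays(data, headers) {
    var times = data.map(function (d) { return d.date.getTime(); });
    var min = Math.min.apply(null, times);
    var max = Math.max.apply(null, times);
    // hash the existing days for easy lookup
    var seen = {};
    times.forEach(function (t) { seen[t] = true; });

    var filled = data.slice();
    for (var t = min + DAY_MS; t < max; t += DAY_MS) {
        if (seen[t]) continue;
        var emptyRow = {date: new Date(t)};
        headers.forEach(function (header) { emptyRow[header] = null; });
        filled.push(emptyRow);
    }
    // re-sort by date
    return filled.sort(function (a, b) { return a.date - b.date; });
}

// Hypothetical data: a Tuesday and a Thursday, at UTC midnight.
var data = [
    {date: new Date(Date.UTC(2013, 0, 1)), alice: 4},
    {date: new Date(Date.UTC(2013, 0, 3)), alice: 5}
];
var filled = fillMissingDays(data, ['alice']);
console.log(filled.map(function (d) { return d.date.toISOString(); }));
```

Unlike the d3 version, this returns a new array instead of pushing into data. To draw zero points rather than gaps, set each header to 0 instead of null.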
An alternate representation, with gaps (using line.defined) instead of zero points, is here: http://jsfiddle.net/nrabinowitz/dhW2F/3/