dc.js Composite Graph - Plot New Line for Each Person - d3.js

Good Evening Everyone,
I'm trying to take the data from a database full of hour reports (name, timestamp, hours worked, etc.) and create a plot using dc.js to visualize the data. I would like the timestamp to be on the x-axis, the sum of hours for the particular timestamp on the y-axis, and a new bar graph for each unique name all on the same chart.
It appears based on my objectives that using crossfilter.js the timestamp should be my 'dimension' and then the sum of hours should be my 'group'.
Question 1, how would I then use the dimension and group to further split the data based on the person's name and then create a bar graph to add to my composite graph? I would like for the crossfilter.js functionality to remain intact so that if I add a date range tool or some other user controllable filter, everything updates accordingly.
Question 2, my timestamps are in MySQL datetime format: YYYY-mm-dd HH:MM:SS so how would I go about dropping precision? For instance, if I want to combine all entries from the same day into one entry (day precision) or combine all entries in one month into a single entry (month precision).
Thanks in advance!
---- Added on 2017/01/28 16:06
To further clarify, I'm referencing the Crossfilter & DC APIs alongside the DC NASDAQ and Composite examples. The Composite example has shown me how to place multiple line/bar charts on a single graph. In the composite chart I've created, I've given each bar chart a dimension based on the timestamps in the data set. Now I'm trying to figure out how to define the group for each one. I want each bar chart to represent the total time worked per timestamp.
For example, I have five people in my database, so I want there to be five bar charts within the single composite chart. Today all five submitted reports saying they worked 8 hours, so now all five bar charts should show a mark at 01/28/2017 on the x-axis and 8 hours on the y-axis.
var parseDate = d3.time.format('%Y-%m-%d %H:%M:%S').parse;
data.forEach(function(d) {
d.timestamp = parseDate(d.timestamp);
});
var ndx = crossfilter(data);
var writtenDimension = ndx.dimension(function(d) {
return d.timestamp;
});
var hoursSumGroup = writtenDimension.group().reduceSum(function(d) {
return d.time_total;
});
var minDate = parseDate('2017-01-01 00:00:00');
var maxDate = parseDate('2017-01-31 23:59:59');
var mybarChart = dc.compositeChart("#my_chart");
mybarChart
.width(window.innerWidth)
.height(480)
.x(d3.time.scale().domain([minDate,maxDate]))
.brushOn(false)
.clipPadding(10)
.yAxisLabel("This is the Y Axis!")
.compose([
dc.barChart(mybarChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup, "Top Line")
]);
So based on what I have right now and the example I've provided, in the compose section I should have 5 charts because there are 5 people (obviously this needs to be dynamic in the end) and each of those charts should only show the timestamp: total_time data for that person.
At this point I don't know how to further break up the group hoursSumGroup by person. This is where my Question #1 comes in, and where I need help.
Question #2 above also covers making sure that the code is dynamic: more people can be handled without a code change; when minDate and maxDate are later tied to user input fields, the charts update automatically (I assume by adjusting the dimension variable in some way); and if I add a names filter and unselect a name, the chart updates by removing the data for that person.
A Question #3 that I'm now realizing I'll want to figure out is how to get the person's name to show up in the pointer tooltip (the title) along with timestamp and total_time values.

There are a number of ways to go about this, but I think the easiest thing to do is to create a custom reduction which reduces each person into a sub-bin.
First off, addressing question #2, you'll want to set up your dimension based on the time interval you're interested in. For instance, if you're looking at days:
var writtenDimension = ndx.dimension(function(d) {
return d3.time.day(d.timestamp);
});
chart.xUnits(d3.time.days);
This will cause each timestamp to be rounded down to the start of its day, and tell the chart to calculate the bar width accordingly. For month precision, use d3.time.month and d3.time.months instead.
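As a rough illustration of what dropping precision means, here is a plain-JavaScript sketch that does not depend on d3. The helper names parseMysqlDatetime, floorToDay and floorToMonth are hypothetical stand-ins for the d3 parser and interval functions, and the sketch assumes the timestamps should be read as local time.

```javascript
// Hypothetical helpers for illustration only (not part of d3 or dc.js).

// Parse a MySQL datetime string 'YYYY-mm-dd HH:MM:SS' as local time.
function parseMysqlDatetime(s) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(s);
  if (!m) { return null; }
  return new Date(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]);
}

// Round down to local midnight (day precision).
function floorToDay(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}

// Round down to the first day of the month (month precision).
function floorToMonth(date) {
  return new Date(date.getFullYear(), date.getMonth(), 1);
}

var morning = parseMysqlDatetime('2017-01-28 09:15:00');
var evening = parseMysqlDatetime('2017-01-28 16:06:00');

// Both reports fall into the same day bin, so a group keyed on the
// floored value would sum their hours together.
var sameDay = floorToDay(morning).getTime() === floorToDay(evening).getTime();
```

Whatever function is used in the dimension, the idea is the same: every record whose timestamp floors to the same value ends up in the same bin.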
Next, here's a custom reduction (from the FAQ) which will create an object for each reduced value, with values for each person's name:
var hoursSumGroup = writtenDimension.group().reduce(
function(p, v) { // add
p[v.name] = (p[v.name] || 0) + v.time_total;
return p;
},
function(p, v) { // remove
p[v.name] -= v.time_total;
return p;
},
function() { // init
return {};
});
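To make the shape of the reduced values concrete, here is a small sketch that runs the three reduce callbacks by hand, without crossfilter. The sample records and the day field are made up for illustration; in the real chart the key would come from the dimension.

```javascript
// Made-up sample records; field names follow the question.
var records = [
  { name: 'Ann', day: '2017-01-28', time_total: 8 },
  { name: 'Bob', day: '2017-01-28', time_total: 6 },
  { name: 'Ann', day: '2017-01-28', time_total: 2 },
  { name: 'Bob', day: '2017-01-29', time_total: 7 }
];

function reduceAdd(p, v) {
  p[v.name] = (p[v.name] || 0) + v.time_total;
  return p;
}

function reduceRemove(p, v) {
  p[v.name] -= v.time_total;
  return p;
}

function reduceInitial() {
  return {};
}

// Stand-in for what crossfilter does internally: one bin per key,
// with every matching record folded in through reduceAdd.
var bins = {};
records.forEach(function (v) {
  bins[v.day] = reduceAdd(bins[v.day] || reduceInitial(), v);
});
// bins['2017-01-28'] is now { Ann: 10, Bob: 6 }

// When a filter excludes a record, crossfilter calls the remove function.
reduceRemove(bins['2017-01-28'], records[1]);
// Bob's entry for that day drops to 0 rather than being deleted.
```

The last step shows why a filtered-out person falls to zero instead of disappearing: the remove callback subtracts but never deletes the property.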
I did not go with the series example I mentioned in the comments, because I think composite keys can be difficult to deal with. That's another option, and I'll expand my answer if that's necessary.
Next, we can feed the composite line charts with value accessors that can fetch the value by name.
Assume we have an array names.
compositeChart.shareTitle(false);
compositeChart.compose(
names.map(function(name) {
return dc.lineChart(compositeChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup)
.valueAccessor(function(kv) {
return kv.value[name];
})
.title(function(kv) {
return name + ' ' + kv.key + ': ' + kv.value[name];
});
}));
Again, it wouldn't make sense to use bar charts here, because they would obscure each other.
If you filter a name elsewhere, it will cause the line for the name to drop to zero. Having the line disappear entirely would probably not be so simple.
The above shareTitle(false) ensures that the child charts will draw their own titles; the title functions just add the current name to those titles (which would usually just be key:value).
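The answer assumes a names array already exists. One possible way to build it from the raw rows, sketched in plain JavaScript with made-up sample data, is shown here together with a hypothetical accessor factory (valueFor) that falls back to 0 for days on which a person has no entry.

```javascript
// Made-up sample rows; only the name field matters here.
var data = [
  { name: 'Bob', time_total: 8 },
  { name: 'Ann', time_total: 8 },
  { name: 'Bob', time_total: 4 }
];

// Unique names in sorted order, so chart order is stable between loads.
var names = Array.from(new Set(data.map(function (d) { return d.name; }))).sort();

// Hypothetical accessor factory: one value accessor per person,
// returning 0 when that person has no entry in the bin.
function valueFor(name) {
  return function (kv) {
    return kv.value[name] || 0;
  };
}

// A bin shaped like the output of the custom reduction.
var sampleBin = { key: new Date(2017, 0, 28), value: { Ann: 8 } };
var heights = names.map(function (name) {
  return valueFor(name)(sampleBin);
});
```

Because the list is computed once from the loaded data, it would presumably need to be rebuilt, and the composite chart recomposed, if people are added after the chart is created.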

Related

Is there a way to filter a dimension based on the value of another field?

I'm building a data dashboard for a project and I want to be able to compare data from two distinct groups in the same dataset.
My dataset looks like this:
Number,Name,Gender,Race,Height,Publisher,Alignment,Weight,Superpower,Strength,Costume Colour
1,A-Bomb,Male,Metahuman,203,Marvel Comics,Good,441,Superhuman Strength,10,None
2,Abin Sur,Male,Alien,185,DC Comics,Good,90,Cosmic Power,40,Green
3,Abomination,Male,Metahuman,203,Marvel Comics,Bad,441,Superhuman Strength,10,None
4,Abraxas,Male,Cosmic Entity,1000,Marvel Comics,Bad,1000,Reality Warping,40,Green
5,Absorbing Man,Male,Metahuman,193,Marvel Comics,Bad,122,Matter Duplication,5,None
6,Adam Strange,Male,Human,185,DC Comics,Good,88,None,0,Red
I want to create two separate selectMenus which list the character names, but with each of the two menus filtered on Publisher name.
So one drop down will have all the characters associated with Marvel Comics, and the other will have all the characters associated with DC Comics.
Once this is set up, the idea is that the dashboard can then show a set of graphs which work as a comparison between the two characters that have been selected - so I don't want the entire dataset to be split out, I just want the selection to be filtered by Publisher.
I've been through dozens of different Stack Overflow threads about similar stuff but still struggling. I've created the dimensions for character name and character publisher but I'm getting really lost trying to use one to filter the other.
This is what I've got so far (ignore the costume color stuff, that's for something further down the line) - the data in 'heroes-information.csv' is in the same format I shared above.
// Bring in data from csv files
Promise.all([d3.csv("../data/heroes-information.csv"), d3.csv("../data/costume-colors.csv")])
.then(function(data) {
// Tidy data before use
data.forEach(function(d) {
d.Height = +d.Height;
d.Weight = +d.Weight;
d.Strength = +d.Strength;
});
// Bring in Heroes data
var ndx = crossfilter(data[0]);
// Bring in costume color data
var ndxcol = crossfilter(data[1]);
// Create colorScale to dynamically color pie chart slices
var colorScale = d3.scaleOrdinal()
.domain(data[1].map(row => row.Name))
.range(data[1].map(row => row.RGB));
// Define chart type
var dccomicsSelector = dc.selectMenu('#dccomics-selector');
// Define chart dimension
var character = ndx.dimension(dc.pluck('Name'));
var characterPublisher = ndx.dimension(dc.pluck('Publisher'));
var dccomicsCharacters = character.group();
var dccomicsPublisher = characterPublisher.group();
dccomicsSelector
.dimension(character)
.group(dccomicsCharacters);
dc.renderAll();
});
I'm probably missing something really obvious but I'm fairly new to DC.js and Crossfilter so a bit lost in the weeds with this one, any help would be much appreciated!

Using DC.js datatable to create a table with aggregated data by year

I've been trying to use dc.js and crossfilter to both build charts and tables from a certain dataset.
So far building charts works fine, but I want to use the datatable functionality to build a small html table to summarize the data as follows:
|Year|TotalEmployees|
|2015|555|
|2016|666|
|2017|777|
My dataset has around 20 000 rows, here's a sample of the data:
var data = [
{"Year":"2015","Category":"1","NbEmployee":"51"},
{"Year":"2015","Category":"2","NbEmployee":"31"},
{"Year":"2015","Category":"3","NbEmployee":"14"},
{"Year":"2016","Category":"1","NbEmployee":"51"},
{"Year":"2016","Category":"2","NbEmployee":"55"},
{"Year":"2016","Category":"3","NbEmployee":"65"},
{"Year":"2017","Category":"1","NbEmployee":"76"},
{"Year":"2017","Category":"2","NbEmployee":"98"},
];
So far this piece of code returns one row of result per row of data, and although it feels like it should be a simple manipulation, I can't figure out the right syntax to build a summarized table with one row per year:
var ndx = crossfilter(data);
var tableDim = ndx.dimension(function(d) {
return d.Year;
});
var datatable = dc.dataTable("#dc-data-table");
datatable
.dimension(tableDim)
.group(function(d) {
d.NbEmployee += d.NbEmployee;
return d.Year;
})
.columns([
function(d) {return d.Year;},
function(d) {return d.NbEmployee;},
]);
I've tried countless times to apply the
.group().reduceSum()
functions to the dimension into a variable and then passing it to the .group() parameter, but I always end up with a compilation error, I'm pretty clueless right now.
The SQL translation of what I'm looking for is this:
SELECT
Year,
NbEmp = SUM(NbEmployee)
FROM DB
GROUP BY
Year
ORDER BY
Year
Thanks in advance for your help!
The dataTable's group is not a group - yes, pretty confusing to use this method to mean something completely different from what it means in all the other charts. Here, it's a function, everywhere else it's a crossfilter object.
The dataTable is unique out of the dc.js charts in that it reads its data from the .dimension() object. This is because it displays the raw rows of data, rather than aggregated data, by default.
However, it can be used to display a group instead. This works because the only method it actually calls on the dimension is .top(), if you choose to display in descending order.
If you want to display in ascending order, you can use a fake group to produce an object which supports the .bottom() method.
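A minimal sketch of such a wrapper follows, using a stub object in place of a real crossfilter group so that it runs on its own. The name sortByKey is hypothetical, and the sample rows are a shortened, shuffled version of the data in the question.

```javascript
var data = [
  { Year: '2016', Category: '1', NbEmployee: '51' },
  { Year: '2015', Category: '1', NbEmployee: '51' },
  { Year: '2015', Category: '2', NbEmployee: '31' },
  { Year: '2017', Category: '1', NbEmployee: '76' }
];

// Stub standing in for a crossfilter group that sums NbEmployee per year:
// all() returns an array of {key, value} pairs.
var yearGroup = {
  all: function () {
    var totals = {};
    data.forEach(function (d) {
      totals[d.Year] = (totals[d.Year] || 0) + Number(d.NbEmployee);
    });
    return Object.keys(totals).map(function (year) {
      return { key: year, value: totals[year] };
    });
  }
};

// Hypothetical wrapper: gives a group the top()/bottom() methods that the
// data table calls on whatever it receives as its "dimension".
function sortByKey(group) {
  function sorted() {
    return group.all().slice().sort(function (a, b) {
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
    });
  }
  return {
    top: function (n) { return sorted().reverse().slice(0, n); },
    bottom: function (n) { return sorted().slice(0, n); }
  };
}

var rows = sortByKey(yearGroup).bottom(Infinity);
```

With a wrapper like this passed to .dimension(), the column functions would read d.key and d.value instead of the raw row fields.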

crossfilter "double grouping" where key is the value of another reduction

Here is my data about mac address. It is recorded per minute. For each minute, I have many unique Mac addresses.
mac_add,created_time
18:59:36:12:23:33,2016-12-07 00:00:00.000
1c:e1:92:34:d7:46,2016-12-07 00:00:00.000
2c:f0:ee:86:bd:51,2016-12-07 00:00:00.000
5c:cf:7f:d3:2e:ce,2016-12-07 00:00:00.000
...
18:59:36:12:23:33,2016-12-07 00:01:00.000
1c:cd:e5:1e:99:78,2016-12-07 00:01:00.000
1c:e1:92:34:d7:46,2016-12-07 00:01:00.000
5c:cf:7f:22:01:df,2016-12-07 00:01:00.000
5c:cf:7f:d3:2e:ce,2016-12-07 00:01:00.000
...
I would like to create 2 bar charts using dc.js and crossfilter. Please refer to the image for the charts.
The first bar chart is easy enough to create. It is brushable. I created the "created_time" dimension, and created a group and reduceCount by "mac_add", such as below:
var moveTime = ndx.dimension(function (d) {
return d.dd; //# this is the created_time
});
var timeGroup = moveTime.group().reduceCount(function (d) {
return d.mac_add;
});
var visitorChart = dc.barChart('#visitor-no-bar');
visitorChart.width(990)
.height(350)
.margins({ top: 0, right: 50, bottom: 20, left: 40 })
.dimension(moveTime)
.group(timeGroup)
.centerBar(true)
.gap(1)
.elasticY(true)
.x(d3.time.scale().domain([new Date(2016, 11, 7), new Date(2016, 11, 13)]))
.round(d3.time.minute.round)
.xUnits(d3.time.minute);
visitorChart.render();
The problem is the second bar chart. Since one row of data equals one minute, I can count the rows for each mac address to get how long it was present, by creating another dimension on "mac_add" and doing a reduceCount on it. The goal is then to group these time lengths into 30-minute bins, so we can see how many mac addresses have a time length of 30 minutes or less, how many have between 30 minutes and 1 hour, how many have between 1 hour and 1.5 hours, and so on.
Please correct me if I am wrong. Logically, I was thinking the dimension of the second bar chart should be the group of time length (such as <30, <1hr, < 1.5hr, etc). But the time length group themselves are not fix. It depends on the brush selection of the first chart. Maybe it only contains 30 min, maybe it only contains 1.5 hours, maybe it contains 1.5 hours and 2 hours, etc...
So I am really confused what parameters to put into the second bar chart. And method to get the required parameters (how to group a grouped data). Please help me to explain the solution.
Regards,
Marvin
I think we've called this a "double grouping" in the past, but I can't find the previous questions.
Setting up the groups
I'd start with a regular crossfilter group for the mac addresses, and then produce a fake group to aggregate by count of minutes.
var minutesPerMacDim = ndx.dimension(function(d) { return d.mac_add; }),
minutesPerMacGroup = minutesPerMacDim.group();
function bin_keys_by_value(group, bin_value) {
var _bins;
return {
all: function() {
var bins = {};
group.all().forEach(function(kv) {
var valk = bin_value(kv.value);
bins[valk] = bins[valk] || [];
bins[valk].push(kv.key);
});
_bins = bins;
// note: Object.keys returning numerical order here might not
// work everywhere, but I couldn't find a browser where it didn't
return Object.keys(bins).map(function(bin) {
return {key: bin, value: bins[bin].length};
})
},
bins: function() {
return _bins;
}
};
}
function bin_30_mins(v) {
return 30 * Math.ceil(v/30);
}
var macsPerMinuteCount = bin_keys_by_value(minutesPerMacGroup, bin_30_mins);
This will retain the mac addresses for each time bin, which we'll need for filtering later. It's uncommon to add a non-standard method bins to a fake group, but I can't think of an efficient way to retain that information, given that the filtering interface will only give us access to the keys.
Since the function takes a binning function, we could even use a threshold scale if we wanted more complicated bins than just rounding up to the nearest 30 minutes. A quantize scale is a more general way to do the rounding shown above.
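To show how swapping the binning function changes the result, here is a plain-JavaScript sketch. thresholdBins is a hypothetical stand-in written in the spirit of a threshold scale (it is not the d3 API), and countPerBin is a simplified counter rather than the full fake group.

```javascript
// Round a minute count up to the nearest 30 minutes.
function bin30Mins(v) {
  return 30 * Math.ceil(v / 30);
}

// Hypothetical stand-in for a threshold scale: maps a value to the
// upper bound of the first bin it fits into.
function thresholdBins(upperBounds) {
  return function (v) {
    for (var i = 0; i < upperBounds.length; i++) {
      if (v <= upperBounds[i]) { return upperBounds[i]; }
    }
    return Infinity;
  };
}

// Made-up {key, value} pairs shaped like group.all() output:
// key is a mac address, value is its count of minutes.
var minutesPerMac = [
  { key: 'aa:01', value: 12 },
  { key: 'aa:02', value: 30 },
  { key: 'aa:03', value: 45 },
  { key: 'aa:04', value: 61 }
];

// Count how many mac addresses land in each bin.
function countPerBin(entries, binValue) {
  var counts = {};
  entries.forEach(function (kv) {
    var bin = binValue(kv.value);
    counts[bin] = (counts[bin] || 0) + 1;
  });
  return counts;
}

var even = countPerBin(minutesPerMac, bin30Mins);                       // {30: 2, 60: 1, 90: 1}
var uneven = countPerBin(minutesPerMac, thresholdBins([30, 120, 480])); // {30: 2, 120: 2}
```

Only the function passed in changes between the two calls; the counting logic stays the same, which is the point of taking the binning function as a parameter.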
Setting up the chart
Using this data to drive a chart is simple: we can use the dimension and fake group as usual.
chart
.dimension(minutesPerMacDim)
.group(macsPerMinuteCount)
Setting up the chart so that it can filter is a bit more complicated:
chart.filterHandler(function(dimension, filters) {
if(filters.length === 0)
dimension.filter(null);
else {
var bins = chart.group().bins(); // retrieve cached bins
var macs = filters.map(function(key) { return bins[key]; })
macs = Array.prototype.concat.apply([], macs);
var macset = d3.set(macs);
dimension.filterFunction(function(key) {
return macset.has(key);
})
}
})
Recall that we're using a dimension which is keyed on mac addresses; this is good because we want to filter on mac addresses. But the chart is receiving minute-counts for its keys, and the filters will contain those keys, like 30, 60, 90, etc. So we need to supply a filterHandler which takes minute-count keys and filters the dimension based on those.
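The key translation step can be sketched on its own, without dc.js. The helper name macPredicate and the sample bins are assumptions for illustration; the returned function is the kind of predicate one would hand to dimension.filterFunction.

```javascript
// Cached bins shaped like the fake group's bins() output:
// minute-count key -> mac addresses that fall in that bin.
var bins = {
  30: ['aa:01', 'aa:02'],
  60: ['aa:03'],
  90: ['aa:04']
};

// Hypothetical helper: turn the chart's selected bin keys into a
// predicate over mac addresses.
function macPredicate(bins, filters) {
  var selected = new Set();
  filters.forEach(function (key) {
    // Guard against a key with no cached bin (for example a stale filter).
    (bins[key] || []).forEach(function (mac) {
      selected.add(mac);
    });
  });
  return function (mac) {
    return selected.has(mac);
  };
}

// The user has selected the 30 and 90 minute bars.
var keep = macPredicate(bins, [30, 90]);
var kept = ['aa:01', 'aa:02', 'aa:03', 'aa:04'].filter(keep);
```

Here kept contains the three addresses from the selected bars, while 'aa:03' from the 60-minute bin is filtered out.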
Note 1: This is all untested, so if it doesn't work, please post an example as a fiddle or bl.ock - there are fiddles and blocks you can fork to get started on the main page.
Note 2: Strictly speaking, this is not measuring the length of connections: it's counting the total number of minutes connected. Not sure if this matters to you. If a user disconnects and then reconnects within the timeframe, the two sessions will be counted as one. I think you'd have to preprocess to get duration.
EDIT: Based on your fiddle (thank you!) the code above does seem to work. It's just a matter of setting up the x scale and xUnits properly.
chart2
.x(d3.scale.linear().domain([60,1440]))
.xUnits(function(start, end) {
return (end-start)/30;
})
A linear scale will do just fine here - I wouldn't try to quantize that scale, since the 30-minute divisions are already set up. We do need to set the xUnits so that dc.js knows how wide to make the bars.
I'm not sure why elasticX didn't work here, but the <30 bin completely dwarfed everything else, so I thought it was best to leave that out.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/2a8ow1ay/2/

How to show "missing" rows in a rowChart using crossfilter and dc.js?

I'm using code similar to that in the dc.js annotated example:
var ndx = crossfilter(data);
...
var dayName=["0.Sun","1.Mon","2.Tue","3.Wed","4.Thu","5.Fri","6.Sat"];
var dayOfWeek = ndx.dimension(function (d) {
var day = d.dd.getDay();
return dayName[day];
});
var dayOfWeekGroup = dayOfWeek.group();
var dayOfWeekChart = dc.rowChart("#day-of-week-chart");
dayOfWeekChart.width(180)
.height(180)
.group(dayOfWeekGroup)
.label(function(d){return d.key.substr(2);})
.dimension(dayOfWeek);
The issue I've got is that only days of the week present in the data are displayed in my rowChart, and there's no guarantee every day will be represented in all of my data sets.
This is desirable behaviour for many types of categories, but it's a bit disconcerting to omit them for short and well-known lists like day and month names and I'd rather an empty row was included instead.
For a barChart, I can use .xUnits(dc.units.ordinal) and something like .x(d3.scale.ordinal().domain(dayName)).
Is there some way to do the same thing for a rowChart so that all days of the week are displayed, whether present in data or not?
From my understanding of the crossfilter library, I need to do this at the chart level, and the dimension is OK as is. I've been digging around in the dc.js 1.6.0 api reference, and the d3 scales documentation but haven't had any luck finding what I'm looking for.
Solution
Based on @Gordon's answer, I've added the following function:
function ordinal_groups(keys, group) {
return {
all: function () {
var values = {};
group.all().forEach(function(d, i) {
values[d.key] = d.value;
});
var g = [];
keys.forEach(function(key) {
g.push({key: key,
value: values[key] || 0});
});
return g;
}
};
}
Calling this as follows will fill in any missing rows with 0s:
.group(ordinal_groups(dayName, dayOfWeekGroup))
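As a quick check of what the wrapper returns, the following sketch runs the same logic against a stub group in which only two days have data. The stub and its values are made up; a real crossfilter group exposes the same all() method.

```javascript
var dayName = ['0.Sun', '1.Mon', '2.Tue', '3.Wed', '4.Thu', '5.Fri', '6.Sat'];

// Same logic as the solution above, restated so the sketch runs on its own.
function ordinalGroups(keys, group) {
  return {
    all: function () {
      var values = {};
      group.all().forEach(function (d) {
        values[d.key] = d.value;
      });
      // Emit one entry per expected key, defaulting to 0.
      return keys.map(function (key) {
        return { key: key, value: values[key] || 0 };
      });
    }
  };
}

// Stub group: only Monday and Thursday are present in the data.
var sparseGroup = {
  all: function () {
    return [
      { key: '1.Mon', value: 4 },
      { key: '4.Thu', value: 5 }
    ];
  }
};

var filled = ordinalGroups(dayName, sparseGroup).all();
var values = filled.map(function (d) { return d.value; });
// values is [0, 4, 0, 0, 5, 0, 0]
```

Because all() is recomputed on every call, the zero rows are regenerated each time the chart redraws after a filter change.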
Actually, I think you are better off making sure that the groups exist before passing them off to dc.js.
One way to do this is the "fake group" pattern described here:
https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted
This way you can make sure the extra entries are created every time the data changes.
Are you saying that you tried adding the extra entries to the ordinal domain and they still weren't represented in the row chart, whereas this did work for bar charts? That sounds like a bug to me. Specifically, it looks like support for ordinal domains needs to be added to the row chart.

d3 linechart - Show 0 on the y-axis without passing in all points?

I have a line chart. Its purpose is to show the amount of transactions per user over a given time period.
To do this I'm getting the dates of all users' transactions. I'm working off this example: http://bl.ocks.org/mbostock/3884955 and have the line chart rendering fine.
My x-axis is time and the y-axis is number of transactions. The problem I have is to do with displaying dates when there is no activity.
Say I have 4 transactions on Tuesday and 5 transactions on Thursday. I need to show that there were 0 transactions on Wednesday. Since no data exists in my database explicitly stating that a user made no transactions on Wednesday, do I need to pass in the Wednesday time (and all other times, depending on the timeframe) with a 0 value, or can I do it with d3? I can't seem to find any examples that fit my problem.
This seems like a pretty common issue, so I worked up an example implementation here: http://jsfiddle.net/nrabinowitz/dhW2F/2/
Relevant code:
// get the min/max dates
var extent = d3.extent(data, function(d) { return d.date; }),
// hash the existing days for easy lookup
dateHash = data.reduce(function(agg, d) {
agg[d.date] = true;
return agg;
}, {}),
// note that this leverages some "get all headers but date" processing
// already present in the example
headers = color.domain();
// make even intervals
d3.time.days(extent[0], extent[1])
// drop the existing ones
.filter(function(date) {
return !dateHash[date];
})
// and push them into the array
.forEach(function(date) {
var emptyRow = { date: date };
headers.forEach(function(header) {
emptyRow[header] = null;
});
data.push(emptyRow);
});
// re-sort the data
data.sort(function(a, b) { return d3.ascending(a.date, b.date); });
As you can see, it's a bit convoluted, but seems to work well: you make an array of evenly spaced dates using the handy d3.time.days method (an alias for d3.time.day.range), filter out those dates already present in your data, and use the remaining ones to push empty rows. One downside is that performance could be slow for a big dataset, and this assumes full rows are empty, rather than different empty dates in different series.
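For comparison, the same gap-filling idea can be sketched without d3 for a single series. The helper fillMissingDays is hypothetical; it assumes every date sits at local midnight and, unlike the code above, fills with 0 (as the question asks) rather than null.

```javascript
// Hypothetical helper: returns a new array, sorted by date, with a
// zero-valued row inserted for every local day missing from the input.
function fillMissingDays(rows, field) {
  if (rows.length === 0) { return []; }
  var byTime = {};
  rows.forEach(function (d) {
    byTime[d.date.getTime()] = d;
  });
  var times = rows.map(function (d) { return d.date.getTime(); });
  var first = new Date(Math.min.apply(null, times));
  var last = new Date(Math.max.apply(null, times));

  var filled = [];
  // Step one calendar day at a time; setDate handles month rollover.
  for (var day = new Date(first); day <= last; day.setDate(day.getDate() + 1)) {
    var existing = byTime[day.getTime()];
    if (existing) {
      filled.push(existing);
    } else {
      var emptyRow = { date: new Date(day) };
      emptyRow[field] = 0;
      filled.push(emptyRow);
    }
  }
  return filled;
}

var data = [
  { date: new Date(2017, 0, 31), transactions: 4 }, // Tuesday
  { date: new Date(2017, 1, 2), transactions: 5 }   // Thursday
];

var filled = fillMissingDays(data, 'transactions');
var counts = filled.map(function (d) { return d.transactions; });
// counts is [4, 0, 5]
```

Building a lookup keyed by timestamp keeps this to a single pass over the date range, though the full range is still materialised, so the performance caveat for large datasets applies here as well.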
An alternate representation, with gaps (using line.defined) instead of zero points, is here: http://jsfiddle.net/nrabinowitz/dhW2F/3/

Resources