Summary
I have a large collection of data with various datetimes. Currently I have been able to group all my data to display properly in a local timezone; however, when trying to display this data in a different timezone the lines on a lineChart get choppy and the connection between them is not as smooth as when in a local timezone.
I found the link below, which details a possible solution, but sadly it won't work without duplicating my entire dataset: once with times offset from UTC and once as normal. The data I have is not only used in several charts to gain insights about trends and statistics; there is also a raw table used for displaying, editing, and reviewing specific data members. Translating one into UTC time and calculating the time change in UTC would throw the other off.
https://groups.google.com/forum/#!msg/d3-js/iWmP9Npv2Go/xyypdLjWu2QJ
My question is: is there a way to translate datetime data across timezones and have dc.js respect the timezone you would like to display in? I would like the adjusted graphs to look the same as the local graph, where the lines are not one-sided based on the timezone.
code and photos
fiddle: https://jsfiddle.net/spacarar/j0urt9sy/49/
this is the correctly displaying image for my local timezone. The lines are smooth transitions between dates.
This is the incorrectly displaying image for any other timezone (depending on which side of my local timezone the selected one falls, the lines lean left or lean right).
A simplified version of my data looks something like this:
var data = [
{
value: 42,
datetime: '2019-10-24T07:18:00.000000'
},
{
value: 10,
datetime: '2019-10-24T07:19:12.000000'
},
{
value: 12,
datetime: '2019-10-29T04:18:00.000000'
},
{
value: 8,
datetime: '2019-10-29T09:18:00.000000'
}
]
which I then fake group to fill in any dates that may not be present in the data and translate them using moment-timezone to end up with a data structure similar to this
{
value: 0,
datetime: moment timezone object with full datetime,
date: moment timezone object representing only date (0 hours, minutes, seconds, ms)
}
this fake grouped/fixed data is then used to create the chart with the following code
var ndx = dc.crossfilter(fakeGroupedData)
var dateDim = ndx.dimension(dc.pluck('date'))
var top = dateDim.top(1)[0] ? dateDim.top(1)[0].date : null
var bottom = dateDim.bottom(1)[0] ? dateDim.bottom(1)[0].date : null
var chart = dc.lineChart('#date-chart')
chart.yAxis().tickFormat(dc.d3.format(',.0f'))
chart.xAxis().ticks(10).tickFormat(d => moment(d).format('M/D'))
chart.dimension(dateDim)
.group(dateDim.group().reduceSum(dc.pluck('value')))
.x(dc.d3.scaleTime().domain([bottom, top]).nice())
.elasticY(true)
.renderArea(true)
.render()
A couple of points about your date-filling.
This is not what's normally meant by a "fake group". You're filling in the source data and all of your crossfilter groups are completely "real" :)
There isn't any point in filling at a higher resolution than you intend to show. To simplify the problem, I changed your code to fill by days, and it worked exactly the same:
let start = moment.tz(startDate, tzSelection).startOf('day')
let end = moment.tz(endDate, tzSelection).endOf('day')
let days = end.diff(start, 'days')
for (let i = 0; i < days; i++) {
let fakeTime = moment(start).add(i, 'days')
let date = moment(fakeTime.format('YYYY-MM-DD'))
fakeGroupedData.push({
value: 0,
datetime: fakeTime,
date
})
}
It might be easier to use d3-time, since that integrates tighter with dc.js, but I didn't want to make big changes to your code.
However, you are essentially quantizing by day, so you can set up your dimension to quantize to the beginning of the day in the current timezone, and that will fix your chart:
var dateDim = ndx.dimension(d => d3.timeDay(dc.pluck('date')(d)))
If you do this, you don't need to modify your input dates:
el.date = el.datetime //.clone().startOf('day')
D3 will truncate to the current day, and then crossfilter will bin at that resolution.
https://jsfiddle.net/gordonwoodhull/Lxvcoq3h/19/
Note that it's binning both of the 10/29 entries into one.
In my timezone UTC-5, the moment startOf('day') rounding was causing the first of those entries to land on the 28th, which matches what you said you wanted:
https://jsfiddle.net/gordonwoodhull/Lxvcoq3h/21/
You'll have to decide which one is correct for your application. The main point is that if you're displaying your charts in the local timezone, the data should be quantized to local days.
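To make the "quantize to days in the display timezone" idea concrete, here is a minimal sketch in plain JavaScript. It does not use dc.js, d3 or moment; `dayKeyInZone` is a hypothetical helper built on the standard `Intl.DateTimeFormat` API, and it assumes the two 10/29 datetimes from the question are UTC instants.

```javascript
// Sketch only: dayKeyInZone is a hypothetical helper, not part of dc.js, d3 or moment.
// It returns the calendar day (YYYY-MM-DD) that an instant falls on in a given timezone.
function dayKeyInZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit'
  }).formatToParts(date);
  const get = type => parts.find(p => p.type === type).value;
  return get('year') + '-' + get('month') + '-' + get('day');
}

// The two 10/29 entries from the question, read as UTC instants (an assumption).
const first = new Date('2019-10-29T04:18:00Z');
const second = new Date('2019-10-29T09:18:00Z');

// In UTC both entries share a day; in a UTC-5 zone the first one falls on the 28th.
const utcKeys = [first, second].map(d => dayKeyInZone(d, 'UTC'));
const chicagoKeys = [first, second].map(d => dayKeyInZone(d, 'America/Chicago'));
console.log(utcKeys, chicagoKeys);
```

Whichever zone is used for the day key is the zone the bins will line up with, which is why the same data can produce one bin or two.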
Related
Good Evening Everyone,
I'm trying to take the data from a database full of hour reports (name, timestamp, hours worked, etc.) and create a plot using dc.js to visualize the data. I would like the timestamp to be on the x-axis, the sum of hours for the particular timestamp on the y-axis, and a new bar graph for each unique name all on the same chart.
It appears based on my objectives that using crossfilter.js the timestamp should be my 'dimension' and then the sum of hours should be my 'group'.
Question 1, how would I then use the dimension and group to further split the data based on the person's name and then create a bar graph to add to my composite graph? I would like for the crossfilter.js functionality to remain intact so that if I add a date range tool or some other user controllable filter, everything updates accordingly.
Question 2, my timestamps are in MySQL datetime format: YYYY-mm-dd HH:MM:SS so how would I go about dropping precision? For instance, if I want to combine all entries from the same day into one entry (day precision) or combine all entries in one month into a single entry (month precision).
Thanks in advance!
---- Added on 2017/01/28 16:06
To further clarify, I'm referencing the Crossfilter & DC APIs alongside the DC NASDAQ and Composite examples. The Composite example has shown me how to place multiple line/bar charts on a single graph. On the composite chart I've created, I've given each bar chart a dimension based on the timestamps in the dataset. Now I'm trying to figure out how to define the groups for each. I want each bar chart to represent the total time worked per timestamp.
For example, I have five people in my database, so I want there to be five bar charts within the single composite chart. Today all five submitted reports saying they worked 8 hours, so now all five bar charts should show a mark at 01/28/2017 on the x-axis and 8 hours on the y-axis.
var parseDate = d3.time.format('%Y-%m-%d %H:%M:%S').parse;
data.forEach(function(d) {
d.timestamp = parseDate(d.timestamp);
});
var ndx = crossfilter(data);
var writtenDimension = ndx.dimension(function(d) {
return d.timestamp;
});
var hoursSumGroup = writtenDimension.group().reduceSum(function(d) {
return d.time_total;
});
var minDate = parseDate('2017-01-01 00:00:00');
var maxDate = parseDate('2017-01-31 23:59:59');
var mybarChart = dc.compositeChart("#my_chart");
mybarChart
.width(window.innerWidth)
.height(480)
.x(d3.time.scale().domain([minDate,maxDate]))
.brushOn(false)
.clipPadding(10)
.yAxisLabel("This is the Y Axis!")
.compose([
dc.barChart(mybarChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup, "Top Line")
]);
So based on what I have right now and the example I've provided, in the compose section I should have 5 charts because there are 5 people (obviously this needs to be dynamic in the end) and each of those charts should only show the timestamp: total_time data for that person.
At this point I don't know how to further break up the group hoursSumGroup by person; this is where my Question #1 comes in, and where I need help.
Question #2 above is that I want to make sure that the code is both dynamic (more people can be handled without code change), when minDate and maxDate are later tied to user input fields, the charts update automatically (I assume through adjusting the dimension variable in some way), and if I add a names filter that if I unselect names that the chart will update by removing the data for that person.
A Question #3 that I'm now realizing I'll want to figure out is how to get the person's name to show up in the pointer tooltip (the title) along with timestamp and total_time values.
There are a number of ways to go about this, but I think the easiest thing to do is to create a custom reduction which reduces each person into a sub-bin.
First off, addressing question #2, you'll want to set up your dimension based on the time interval you're interested in. For instance, if you're looking at hours:
var writtenDimension = ndx.dimension(function(d) {
return d3.time.hour(d.timestamp);
});
chart.xUnits(d3.time.hours);
This will cause each timestamp to be rounded down to the nearest hour, and tell the chart to calculate the bar width accordingly.
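As a rough illustration of what "rounding down" means here, the sketch below floors a timestamp to the hour or to the day in local time using only the built-in `Date` API. `floorToHour` and `floorToDay` are hypothetical stand-ins for the d3 interval functions, not the d3 implementation itself.

```javascript
// Plain-JS stand-ins (hypothetical names) for flooring a timestamp in local time.
function floorToHour(date) {
  const d = new Date(date.getTime()); // copy, so the input is not mutated
  d.setMinutes(0, 0, 0);              // zero minutes, seconds and milliseconds
  return d;
}

function floorToDay(date) {
  const d = new Date(date.getTime());
  d.setHours(0, 0, 0, 0);             // local midnight
  return d;
}

const stamp = new Date(2017, 0, 28, 16, 6, 30); // 2017-01-28 16:06:30 local time
const hourBin = floorToHour(stamp);
const dayBin = floorToDay(stamp);
console.log(hourBin.toString(), dayBin.toString());
```

Every record whose timestamp floors to the same value ends up in the same bin, so switching from hour to day precision is only a matter of which floor function the dimension uses.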
Next, here's a custom reduction (from the FAQ) which will create an object for each reduced value, with values for each person's name:
var hoursSumGroup = writtenDimension.group().reduce(
function(p, v) { // add
p[v.name] = (p[v.name] || 0) + v.time_total;
return p;
},
function(p, v) { // remove
p[v.name] -= v.time_total;
return p;
},
function() { // init
return {};
});
I did not go with the series example I mentioned in the comments, because I think composite keys can be difficult to deal with. That's another option, and I'll expand my answer if that's necessary.
Next, we can feed the composite line charts with value accessors that can fetch the value by name.
Assume we have an array names.
compositeChart.shareTitle(false);
compositeChart.compose(
names.map(function(name) {
return dc.lineChart(compositeChart)
.dimension(writtenDimension)
.colors('red')
.group(hoursSumGroup)
.valueAccessor(function(kv) {
return kv.value[name];
})
.title(function(kv) {
return name + ' ' + kv.key + ': ' + kv.value[name];
});
}));
Again, it wouldn't make sense to use bar charts here, because they would obscure each other.
If you filter a name elsewhere, it will cause the line for the name to drop to zero. Having the line disappear entirely would probably not be so simple.
The above shareTitle(false) ensures that the child charts will draw their own titles; the title functions just add the current name to those titles (which would usually just be key:value).
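The custom reduction and the per-name value accessor can be tried out without crossfilter or dc.js. The sketch below runs the add/remove/init functions by hand on one bin; the rows, the names in them and the `accessorFor` helper are made up for illustration.

```javascript
// Reduce functions in the add / remove / init shape used by the custom reduction.
function reduceAdd(p, v) {
  p[v.name] = (p[v.name] || 0) + v.time_total;
  return p;
}
function reduceRemove(p, v) {
  p[v.name] -= v.time_total;
  return p;
}
function reduceInitial() {
  return {};
}

// Hypothetical rows that all fall into the same time bin.
const rows = [
  { name: 'alice', time_total: 8 },
  { name: 'bob', time_total: 6 },
  { name: 'alice', time_total: 1 }
];

// Stand-in for one bin: start from init and add every row.
let bin = reduceInitial();
rows.forEach(r => { bin = reduceAdd(bin, r); });

// A group entry has the shape {key, value}; the accessor picks one person's total.
const entry = { key: '2017-01-28', value: bin };
const accessorFor = name => e => e.value[name] || 0;

const aliceValue = accessorFor('alice')(entry); // 8 + 1
const carolValue = accessorFor('carol')(entry); // no rows for carol in this bin

// Simulate a filter elsewhere removing bob's row from the bin.
reduceRemove(bin, rows[1]);
const bobAfterFilter = accessorFor('bob')(entry);
console.log(aliceValue, carolValue, bobAfterFilter);
```

The `|| 0` in the accessor is a choice made for this sketch so that a person with no rows in a bin plots as zero rather than `undefined`.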
Here is my data about mac address. It is recorded per minute. For each minute, I have many unique Mac addresses.
mac_add,created_time
18:59:36:12:23:33,2016-12-07 00:00:00.000
1c:e1:92:34:d7:46,2016-12-07 00:00:00.000
2c:f0:ee:86:bd:51,2016-12-07 00:00:00.000
5c:cf:7f:d3:2e:ce,2016-12-07 00:00:00.000
...
18:59:36:12:23:33,2016-12-07 00:01:00.000
1c:cd:e5:1e:99:78,2016-12-07 00:01:00.000
1c:e1:92:34:d7:46,2016-12-07 00:01:00.000
5c:cf:7f:22:01:df,2016-12-07 00:01:00.000
5c:cf:7f:d3:2e:ce,2016-12-07 00:01:00.000
...
I would like to create 2 bar charts using dc.js and crossfilter. Please refer to the image for the charts.
The first bar chart is easy enough to create. It is brushable. I created the "created_time" dimension, and created a group and reduceCount by "mac_add", such as below:
var moveTime = ndx.dimension(function (d) {
return d.dd; //# this is the created_time
});
var timeGroup = moveTime.group().reduceCount(function (d) {
return d.mac_add;
});
var visitorChart = dc.barChart('#visitor-no-bar');
visitorChart.width(990)
.height(350)
.margins({ top: 0, right: 50, bottom: 20, left: 40 })
.dimension(moveTime)
.group(timeGroup)
.centerBar(true)
.gap(1)
.elasticY(true)
.x(d3.time.scale().domain([new Date(2016, 11, 7), new Date(2016, 11, 13)]))
.round(d3.time.minute.round)
.xUnits(d3.time.minute);
visitorChart.render();
The problem is with the second bar chart. The idea is that one row of data equals 1 minute, so I can sum all the minutes for each mac address to get its time length, by creating another dimension on "mac_add" and doing a reduceCount on "mac_add". The goal is then to group the time lengths into 30-minute bins, so we can see how many mac addresses have a time length of 30 minutes or less, how many have a time length between 30 minutes and 1 hour, how many between 1 hour and 1.5 hours, and so on.
Please correct me if I am wrong. Logically, I was thinking the dimension of the second bar chart should be the time-length group (such as <30, <1hr, <1.5hr, etc.). But the time-length groups themselves are not fixed. They depend on the brush selection in the first chart. Maybe the selection only contains 30 min, maybe it only contains 1.5 hours, maybe it contains 1.5 hours and 2 hours, etc.
So I am really confused about what parameters to pass to the second bar chart, and how to obtain them (how to group already-grouped data). Please help by explaining the solution.
Regards,
Marvin
I think we've called this a "double grouping" in the past, but I can't find the previous questions.
Setting up the groups
I'd start with a regular crossfilter group for the mac addresses, and then produce a fake group to aggregate by count of minutes.
var minutesPerMacDim = ndx.dimension(function(d) { return d.mac_add; }),
minutesPerMacGroup = minutesPerMacDim.group();
function bin_keys_by_value(group, bin_value) {
var _bins;
return {
all: function() {
var bins = {};
group.all().forEach(function(kv) {
var valk = bin_value(kv.value);
bins[valk] = bins[valk] || [];
bins[valk].push(kv.key);
});
_bins = bins;
// note: Object.keys returning numerical order here might not
// work everywhere, but I couldn't find a browser where it didn't
return Object.keys(bins).map(function(bin) {
return {key: bin, value: bins[bin].length};
})
},
bins: function() {
return _bins;
}
};
}
function bin_30_mins(v) {
return 30 * Math.ceil(v/30);
}
var macsPerMinuteCount = bin_keys_by_value(minutesPerMacGroup, bin_30_mins);
This will retain the mac addresses for each time bin, which we'll need for filtering later. It's uncommon to add a non-standard method bins to a fake group, but I can't think of an efficient way to retain that information, given that the filtering interface will only give us access to the keys.
Since the function takes a binning function, we could even use a threshold scale if we wanted more complicated bins than just rounding up to the nearest 30 minutes. A quantize scale is a more general way to do the rounding shown above.
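To see what the "double grouping" produces, here is a self-contained sketch that swaps the crossfilter group for a stub object with the same `all()` shape. The mac addresses and minute counts are invented, and `binKeysByValue` is a simplified variant of the function above that sorts bins explicitly instead of relying on object key order.

```javascript
// Stub with the same all() shape as a crossfilter group: [{key, value}, ...].
// The addresses and minute counts are made up for illustration.
const minutesPerMac = {
  all: () => [
    { key: 'aa:aa', value: 12 },
    { key: 'bb:bb', value: 30 },
    { key: 'cc:cc', value: 31 },
    { key: 'dd:dd', value: 95 }
  ]
};

// Round a minute count up to the next multiple of 30.
const roundUpTo30 = minutes => 30 * Math.ceil(minutes / 30);

// Group the original keys by their binned value and count the keys per bin.
function binKeysByValue(group, binValue) {
  const bins = new Map();
  group.all().forEach(({ key, value }) => {
    const bin = binValue(value);
    if (!bins.has(bin)) bins.set(bin, []);
    bins.get(bin).push(key);
  });
  return [...bins.entries()]
    .sort((a, b) => a[0] - b[0]) // explicit numeric order of the bins
    .map(([bin, keys]) => ({ key: bin, value: keys.length, macs: keys }));
}

const binned = binKeysByValue(minutesPerMac, roundUpTo30);
console.log(binned);
```

Here a count of exactly 30 stays in the 30 bin while 31 moves to 60, which is the "30 min and less" boundary the question describes.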
Setting up the chart
Using this data to drive a chart is simple: we can use the dimension and fake group as usual.
chart
.dimension(minutesPerMacDim)
.group(macsPerMinuteCount)
Setting up the chart so that it can filter is a bit more complicated:
chart.filterHandler(function(dimension, filters) {
if(filters.length === 0)
dimension.filter(null);
else {
var bins = chart.group().bins(); // retrieve cached bins
var macs = filters.map(function(key) { return bins[key]; })
macs = Array.prototype.concat.apply([], macs);
var macset = d3.set(macs);
dimension.filterFunction(function(key) {
return macset.has(key);
})
}
return filters; // dc.js expects the handler to return the filters it applied
})
Recall that we're using a dimension which is keyed on mac addresses; this is good because we want to filter on mac addresses. But the chart is receiving minute-counts for its keys, and the filters will contain those keys, like 30, 60, 90, etc. So we need to supply a filterHandler which takes minute-count keys and filters the dimension based on those.
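The key-translation step in that filter handler can be sketched on its own: turn the selected minute-count keys back into a set of mac addresses, then test each address against the set. `macPredicateFor` and the sample bins are hypothetical; in the real chart the resulting predicate would be what gets passed to the dimension's filter function.

```javascript
// Hypothetical cached bins: minute-count key -> mac addresses in that bin.
const binsByMinutes = {
  30: ['aa:aa', 'bb:bb'],
  60: ['cc:cc'],
  120: ['dd:dd']
};

// Build a predicate on mac addresses from the selected minute-count keys.
function macPredicateFor(filters, bins) {
  if (filters.length === 0) return () => true; // no selection: keep everything
  const macs = new Set(filters.flatMap(key => bins[key] || []));
  return mac => macs.has(mac);
}

const allMacs = ['aa:aa', 'bb:bb', 'cc:cc', 'dd:dd'];
const kept = allMacs.filter(macPredicateFor([30, 120], binsByMinutes));
const keptWithoutSelection = allMacs.filter(macPredicateFor([], binsByMinutes));
console.log(kept, keptWithoutSelection);
```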
Note 1: This is all untested, so if it doesn't work, please post an example as a fiddle or bl.ock - there are fiddles and blocks you can fork to get started on the main page.
Note 2: Strictly speaking, this is not measuring the length of connections: it's counting the total number of minutes connected. Not sure if this matters to you. If a user disconnects and then reconnects within the timeframe, the two sessions will be counted as one. I think you'd have to preprocess to get duration.
EDIT: Based on your fiddle (thank you!) the code above does seem to work. It's just a matter of setting up the x scale and xUnits properly.
chart2
.x(d3.scale.linear().domain([60,1440]))
.xUnits(function(start, end) {
return (end-start)/30;
})
A linear scale will do just fine here - I wouldn't try to quantize that scale, since the 30-minute divisions are already set up. We do need to set the xUnits so that dc.js knows how wide to make the bars.
I'm not sure why elasticX didn't work here, but the <30 bin completely dwarfed everything else, so I thought it was best to leave that out.
Fork of your fiddle: https://jsfiddle.net/gordonwoodhull/2a8ow1ay/2/
I have a barChart with a d3.time.scale x-axis. I am displaying some data per hour, but the first and last data point bars are always cut in half when using centerBar(true).
(When using centerBar(false) the last bar disappears completely.)
The time window is based upon the data itself and is calculated as follows:
var minDate = dateDim.bottom(1)[0]["timestamp"];
var maxDate = dateDim.top(1)[0]["timestamp"];
.x(d3.time.scale().domain([minDate, maxDate]));
The last line sets the time scale domain to use min and maxDate.
This is how it looks:
I have increased the bar width slightly using .xUnits(function(){return 60;}) as the default is so thin that the first bar disappears within the y-axis.
I also tried changing the domain by subtracting/adding one hour to min/maxDate, but this results in unexpected behaviour of the first bar.
I used the following to calculate the offset:
minDate.setHours(minDate.getHours() - 1);
maxDate.setHours(maxDate.getHours() + 1);
Is there a fix or workaround for this to add padding before the first and after the last bar?
Subtract an hour from the minDate and add an hour to the maxDate to get an hour worth of padding on each side of your min and max data.
The trick here is to use d3.time.hour.offset and play with offsets until it looks nice.
.x(d3.time.scale().domain([d3.time.hour.offset(minDate, -1), d3.time.hour.offset(maxDate, 2)]));
See this fiddle http://jsfiddle.net/austinlyons/ujdxhd27/3/
The mistake was not realising JavaScript's passing-by-reference when using objects such as Date objects.
In addition to Austin's answer, which did solve the problem by using d3 functionality, I investigated why my initial attempt by modifying the minDate and maxDate variables failed.
The problem is that when creating the variables
var minDate = dateDim.bottom(1)[0]["timestamp"];
var maxDate = dateDim.top(1)[0]["timestamp"];
I created pointers to the actual data instead of creating new objects with the same value as the minDate and maxDate objects. The line
minDate.setHours(minDate.getHours() - 1);
therefore then manipulated the actual underlying data within the date dimension dateDim, which then led to the peculiar behaviour.
The obvious solution would have been to create new Date() objects like this:
var minDate = new Date(dateDim.bottom(1)[0]["timestamp"]);
var maxDate = new Date(dateDim.top(1)[0]["timestamp"]);
and then do the desired manipulations:
minDate.setHours(minDate.getHours() - 1);
maxDate.setHours(maxDate.getHours() + 1);
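The difference between holding a reference to a record's Date and holding a copy can be shown in a few lines. The `record` object here is a made-up stand-in for one row of the dataset.

```javascript
// A made-up record standing in for one row of the dataset.
const record = { timestamp: new Date(2017, 0, 28, 9, 0, 0) };

const alias = record.timestamp;          // same Date object as the record's
const copy = new Date(record.timestamp); // new object holding the same instant

// Mutating the copy leaves the record alone.
copy.setHours(copy.getHours() - 1);
const hourAfterCopyEdit = record.timestamp.getHours();

// Mutating the alias changes the record itself.
alias.setHours(alias.getHours() - 1);
const hourAfterAliasEdit = record.timestamp.getHours();

console.log(hourAfterCopyEdit, hourAfterAliasEdit);
```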
I'm able to zoom as in block 4015254. But as per my other questions I'm trying to zoom into the data, not the vector diagram.
I think I'm most of the way there but need a way to know what the new x-axis endpoints are.
For example, block 4015254 starts with an x-axis ranging from January 1999 to January 2003. After zooming in a bit, the x-axis now ranges from about February 2000 to about November 2000. Those are the endpoints I'm looking for.
I'm assuming I'll need to pass my chart's endpoints into a scaling function or d3.time.scale or d3.scale.linear.
The idea is to grab those endpoints and, upon zoomend recalculate the bar chart's granularity based upon the new range. Here's some pseudocode:
function zoomEndFunction(scope) {
return function() {
var that = scope;
var startDate = xAxis.scale()(0); // <---- this doesn't work
var endDate = xAxis.scale()(that.chartWidth); // <---- nor this
that.redrawChart(startDate, endDate);
}
}
I have a line chart. Its purpose is to show the amount of transactions per user over a given time period.
To do this I'm getting the dates of all users' transactions. I'm working off this example: http://bl.ocks.org/mbostock/3884955 and have the line chart rendering fine.
My x-axis is time and the y-axis is number of transactions. The problem I have is to do with displaying dates when there is no activity.
Say I have 4 transactions on Tuesday and 5 transactions on Thursday. I need to show that there have been 0 transactions on Wednesday. As no data exists in my database explicitly stating that a user has made no transactions on Wednesday, do I need to pass in the Wednesday time (and all other times, depending on the timeframe) with a 0 value? Or can I do it with d3? I can't seem to find any examples that fit my problem.
This seems like a pretty common issue, so I worked up an example implementation here: http://jsfiddle.net/nrabinowitz/dhW2F/2/
Relevant code:
// get the min/max dates
var extent = d3.extent(data, function(d) { return d.date; }),
// hash the existing days for easy lookup
dateHash = data.reduce(function(agg, d) {
agg[d.date] = true;
return agg;
}, {}),
// note that this leverages some "get all headers but date" processing
// already present in the example
headers = color.domain();
// make even intervals
d3.time.days(extent[0], extent[1])
// drop the existing ones
.filter(function(date) {
return !dateHash[date];
})
// and push them into the array
.forEach(function(date) {
var emptyRow = { date: date };
headers.forEach(function(header) {
emptyRow[header] = null;
});
data.push(emptyRow);
});
// re-sort the data
data.sort(function(a, b) { return d3.ascending(a.date, b.date); });
As you can see, it's a bit convoluted, but seems to work well - you make an array of evenly spaced dates using the handy d3.interval.range method, filter out those dates already present in your data, and use the remaining ones to push empty rows. One downside is that performance could be slow for a big dataset - and this assumes full rows are empty, rather than different empty dates in different series.
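For comparison, the same fill-the-gaps idea can be sketched without d3. This version assumes every row's date sits exactly at UTC midnight and uses a single hypothetical `count` field, filled with 0 as the question asks rather than the `null` used above.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Add a zero-count row for every day between the first and last date that has no row.
// Assumes each row's date is exactly UTC midnight (an assumption of this sketch).
function fillMissingDays(rows) {
  const times = rows.map(r => r.date.getTime());
  const start = Math.min(...times);
  const end = Math.max(...times);
  const seen = new Set(times);
  const filled = rows.slice(); // leave the input array untouched
  for (let t = start + DAY_MS; t < end; t += DAY_MS) {
    if (!seen.has(t)) filled.push({ date: new Date(t), count: 0 });
  }
  return filled.sort((a, b) => a.date - b.date);
}

const sparse = [
  { date: new Date(Date.UTC(2019, 9, 22)), count: 4 }, // a Tuesday
  { date: new Date(Date.UTC(2019, 9, 24)), count: 5 }  // the following Thursday
];
const dense = fillMissingDays(sparse);
console.log(dense.map(r => r.date.toISOString().slice(0, 10) + ': ' + r.count));
```

Stepping by a fixed 24 hours is only safe because the dates are in UTC; with local-time dates a calendar-aware interval is the safer choice.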
An alternate representation, with gaps (using line.defined) instead of zero points, is here: http://jsfiddle.net/nrabinowitz/dhW2F/3/