Group by multiple dimensions recursively in dc.js? - d3.js

dc.js has been great, and now I'm trying to understand how to use it for data with multiple dimensions.
I have time series data (csv), which contains the number of people that fit a certain attribute on a given day - e.g. the number of brown-haired people age 65+. A simplified version of it looks like this (There are 5 options for hair color, 5 for age group, and about 200 dates):
Date, Hair Color, 0-18, 19-39, 40-64, 65+
1/1/21, Brown, 5, 3, 10, 2
1/1/21, Blonde, 15, 2, 4, 1
1/2/21, Brown, 2, 8, 0, 2
1/2/21, Blonde, 11, 6, 7, 4
...
I'd like to be able to plot the cumulative counts over time for each sub-population. The complication is that I'd like to show:
A plot aggregated by hair color (so summing over all age groups), which can then be toggled (ideally by clicking on one of the lines) to show:
A plot for a given hair color, disaggregated by age group.
(Note that in the mockups, I'm normalizing counts to show it as a cumulative percentage. I've been doing that calculation straightforwardly with valueAccessors.)
My question is: how do I create the dimensions and groups to create these plots?
I'd prefer not to create individual variables for each age group (I'd like it to be generic enough to expand to finer categories). But I'm having trouble understanding how to use reduce and filters to achieve my desired outcome.
Also, should I be doing it all as linecharts in a compositeChart, or in a series chart? There is the added wrinkle that I plan to then annotate the chart with extra trendlines added in from d3.
Thanks!

The series chart is a convenience class that generates a composite chart underneath.
It allows you to specify your data using a 2D key, where one component is the key to be used for the X values in the chart, and one component is another key to be used for splitting the data into multiple layers - lines, in your case. You also give it the "prototype" of the layer chart, in the form of a function that returns a partially-initialized chart.
It sounds like you are on the right track, so I won't attempt to give a complete answer, just a few hints. Please feel free to follow up in the comments, and I will edit this answer to fill in details.
Flattening the data
You will probably want to flatten your data so that there is only one value per row, i.e. structure it with an Age column and a Value column. This is a general best practice for working with crossfilter.
It's possible to work with the data as you have it, but
you won't be able to filter by age, since filtering in crossfilter is by row
aggregating across ages will be more complicated, requiring custom reductions
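As a rough illustration of the flattening step, here is a minimal sketch in plain JavaScript. The helper name flattenRows and the output column names Age and Value are assumptions made for this example, not part of crossfilter or dc.js.

```javascript
// Hypothetical helper: turn one wide row per (date, hair color) into
// one row per (date, hair color, age group).
function flattenRows(rows, idColumns) {
  const flat = [];
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      if (idColumns.includes(column)) continue; // skip Date / Hair Color
      const out = {};
      for (const id of idColumns) out[id] = row[id];
      out.Age = column;      // the former column header becomes a value
      out.Value = +value;    // CSV fields arrive as strings, so coerce
      flat.push(out);
    }
  }
  return flat;
}

// Two rows from the sample data in the question.
const wideRows = [
  {Date: '1/1/21', 'Hair Color': 'Brown', '0-18': '5', '19-39': '3', '40-64': '10', '65+': '2'},
  {Date: '1/1/21', 'Hair Color': 'Blonde', '0-18': '15', '19-39': '2', '40-64': '4', '65+': '1'},
];
const flatRows = flattenRows(wideRows, ['Date', 'Hair Color']);
```

The flattened rows can then be handed to crossfilter, and Age becomes an ordinary column that a dimension can filter on.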
Using multikeys and series chart
Following the series chart example, you might define your dimension as
const colorDateDimension = cf.dimension(d => [d['Hair Color'], d.Date]);
Now any group on this dimension will aggregate by both hair color and date.
Now if you're using the series chart, you can extract the components with
chart
.seriesAccessor(({key}) => key[0])
.keyAccessor(({key}) => key[1])
You could use the third parameter of the series chart's chart function to determine the color or dash style of the layer, e.g.:
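// Dash patterns keyed by series key (here assumed to be the age group)
const dashStyles = {
  '0-18': [3,1],
  '19-39': [4,1,1,1],
  // ...
};
chart
  .chart(function(c, _, subkey) {
    // subkey is the value returned by seriesAccessor for this layer
    return new dc.LineChart(c).dashStyle(dashStyles[subkey]);
  })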
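The question also asks for cumulative counts. One way this is often handled in dc.js is with a "fake group": an object exposing all() that post-processes the real group's output. The sketch below is only an illustration under assumptions: accumulateBySeries is a hypothetical name, the stub stands in for a real group on the [hair color, date] dimension, and the input is assumed to already be ordered by date within each series.

```javascript
// Fake group: wraps anything with .all() returning {key: [series, x], value}
// and produces running totals within each series (key[0]).
function accumulateBySeries(sourceGroup) {
  return {
    all() {
      const running = new Map(); // series name -> running total
      return sourceGroup.all().map(({key, value}) => {
        const total = (running.get(key[0]) || 0) + value;
        running.set(key[0], total);
        return {key, value: total};
      });
    },
  };
}

// Stub standing in for a real crossfilter group; the values are the
// per-day sums over all age groups from the sample data in the question.
const stubGroup = {
  all: () => [
    {key: ['Blonde', new Date(2021, 0, 1)], value: 22},
    {key: ['Blonde', new Date(2021, 0, 2)], value: 28},
    {key: ['Brown', new Date(2021, 0, 1)], value: 20},
    {key: ['Brown', new Date(2021, 0, 2)], value: 12},
  ],
};
const cumulative = accumulateBySeries(stubGroup).all();
```

The wrapped object would be passed to the chart in place of the original group, so the totals are recomputed whenever filters change.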
Interaction
dc.js does not natively support the kind of drill-down you are describing. It would be easier to have one chart which is by hair color and another chart which is by age. Then when no hair color is selected, the age chart will show all hair colors, and when no age is selected, the hair color chart will show all ages.
If you want drill-down as you describe, you will have to write custom code to apply the filter and swap the chart definition when a hair color is clicked. It's not terribly complicated but please ask a follow-up question if you can't figure it out - it's better to keep SO questions on a single topic.
Annotating with D3
This part is pretty simple no matter how you implement the charts.
You will implement a pretransition handler and use chart.selectAll to add the content you need. There are many examples here on SO, so I won't go into it here.
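For a trendline specifically, a sketch might look like the following. It is not taken from any particular example: trendlineEndpoints and addTrendline are hypothetical names, the points are assumed to have numeric x values (e.g. timestamps), the chart is assumed to be a dc.js coordinate-grid chart, and selection.join assumes a reasonably recent d3 (5.8 or later).

```javascript
// Least-squares fit through [{x, y}] points; returns the two endpoints
// of the fitted line across the observed x-range.
function trendlineEndpoints(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let sxy = 0, sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) ** 2;
  }
  const slope = sxx === 0 ? 0 : sxy / sxx;
  const intercept = meanY - slope * meanX;
  const xs = points.map(p => p.x);
  const x0 = Math.min(...xs), x1 = Math.max(...xs);
  return [
    {x: x0, y: intercept + slope * x0},
    {x: x1, y: intercept + slope * x1},
  ];
}

// Assumption: `chart` is a dc.js coordinate-grid chart and `d3` is the d3 module.
function addTrendline(chart, d3, points) {
  chart.on('pretransition', c => {
    // Map data coordinates to pixels through the chart's own scales.
    const line = d3.line()
      .x(p => c.x()(p.x))
      .y(p => c.y()(p.y));
    c.select('g.chart-body')
      .selectAll('path.trendline')
      .data([trendlineEndpoints(points)])
      .join('path')
      .attr('class', 'trendline')
      .attr('fill', 'none')
      .attr('stroke', 'black')
      .attr('d', line);
  });
}
```

Because the handler runs on every render and redraw, the data join keeps a single path element and updates it in place rather than appending a new one each time.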
Conclusion
I hope this gets you started. I've answered your specific question and given some hints about other assumptions or implicit questions within your question. It will be some work to get the results you want, but it is definitely possible.

Related

create a double donut chart in a single view with two measure values in tableau

I need to create this visual in a single view in Tableau. The chart contains two values: one is YTD and the other is LYTD. Both are measures (made by a calculated field). I need help achieving this visual.
Let's just look at how to show two pie charts on one sheet, which isn't obvious!
In Tableau Public take the Superstore and make dual pie charts for two variables, sales and profits, including all data in each chart.
The trick is to use a new calculated variable MIN(1) ( yes, minimum of one. ) and put that up on the rows shelf. ( To be honest I have no idea at all why this works. )
Here's btProvider's youtube video that suggested this idea.
https://www.youtube.com/watch?v=1rqkjkUsUj4
and here's a polished version of what I got relatively easily
Here's a view of what the sheet looks like with Min(1) up on the rows shelf twice which produces two pie charts that can be separately defined on the Marks card.
I put my whole workbook up on Tableau Public so you can see what I did.
https://public.tableau.com/app/profile/wade.schuette/viz/dualpiechartsdemo/Dashboard1?publish=yes

Kibana Visualization Separating X-Axis Values I Want Grouped

I have data being written to Elasticsearch that I wanted to visualize in Kibana, but I'm having problems with the visualization.
I have a process writing when it starts {ProcessStartTime} and when it stops {ProcessStopTime}
I'm trying to create what I thought was a simple visualization:
A vertical bar chart with Count as the Y-Axis and {ProcessStartTime} and {ProcessStopTime} as bars on the X-Axis.
The problem is that, instead of showing a count of 480 for {ProcessStartTime} as one vertical bar and a count of 389 for {ProcessStopTime} as another, it separates out all unique {ProcessStartTime} entries, so I have a thousand vertical bars, each with a count of 1. Moreover, it appears I cannot add more than one term, just subcategories, so {ProcessStopTime} isn't on the bar chart at all. So I decided to try the Filter aggregation, which allowed me to get a count of all entries with "ProcessStartTime" in the body. However, I cannot add "ProcessStopTime" as another filter, as those don't coexist.
My current solution is to have two charts, using the Filter aggregation, then compare the charts side-by-side to compare the counts. For obvious reasons, I'd like those combined, but I just don't see how to have two X-Axis buckets, or to group the data as it needs to be.
Am I missing something obvious?
I might be misunderstanding what you are trying to do, and I can't comment on your question to ask for details, but here are a few things that you can do:
Get all entries regardless of their content (empty search query). Keep the Y-axis metrics for Aggregation-Count.
After that you can set a bucket for the X-axis with Filters aggregation, and use 2 filters.
Filter 1: ProcessStartTime: *
Filter 2: ProcessStopTime: *
This setup should give you 2 bars with the count of records that have the given attributes.
The other option is to make a new attribute, for example 'event', and give this attribute the values 'ProcessStartTime' and 'ProcessStopTime', and make a Terms aggregation bucket setup on event.keyword.
I hope this helps.

D3JS Change TSV Data Column Ordering

I've just drawn a stacked area chart with D3.js.
This is my reference implementation
I also need to dynamically swap the ordering of the layers.
I think that there isn't a way to do it dynamically without redrawing (or is there? :D )
Currently I'm trying to map the data to a new header column, but this implies redrawing.
Let me show you an example:
Here is the TSV header ['date', 'columnA', 'columnB', 'columnC']
Every column except 'date' represents the % of area for that sample.
I would like to dynamically rearrange the area layers, but I'm pretty sure that I also need to parse the data again with a new header,
e.g.:
['date', 'columnA', 'columnB', 'columnC']
-map to-
['date','columnB', 'columnC', 'columnA']
and then draw the result.
Am I doing it right? Thanks for your support, cheers.
This is the line that defines the array that will be passed to the stack() function:
var keys = data.columns.slice(1);
Right now, this is the array:
["Google Chrome","Internet Explorer","Firefox","Safari","Microsoft Edge","Opera","Mozilla","Other/Unknown"]
But you can sort it any way you want. For instance, sorting in alphabetical order:
keys.sort();
Which gives us:
["Firefox","Google Chrome","Internet Explorer","Microsoft Edge","Mozilla","Opera","Other/Unknown","Safari"]
Here is the result: https://bl.ocks.org/anonymous/6a339ed0731a70bb234af150ee6b4a99
Here is another one, with a random permutation (refresh the page to see different orders): https://bl.ocks.org/anonymous/662f99901219b8907030ec3c84363f3a
Pay attention to this: the order in the stacked area chart is now different, but the colours don't stay the same for each browser (that is, each stacked area). That's because d3.scaleOrdinal(d3.schemeCategory10) assigns the colours on a first-come, first-served basis.
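If stable colours matter, one option is to fix the key-to-colour mapping from the original column order before reordering anything. The sketch below shows the idea in plain JavaScript; buildColorMap is a hypothetical helper, and the palette is just the first four entries of d3.schemeCategory10 written out by hand.

```javascript
// First four colours of d3.schemeCategory10, hard-coded for this sketch.
const palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'];

// Assign each key a colour based on its position in the ORIGINAL order.
function buildColorMap(originalKeys, colors) {
  return new Map(originalKeys.map((k, i) => [k, colors[i % colors.length]]));
}

const originalKeys = ['Google Chrome', 'Internet Explorer', 'Firefox', 'Safari'];
const colorOf = buildColorMap(originalKeys, palette);

// Reorder a copy freely; each key keeps the colour it was given above.
const sortedKeys = [...originalKeys].sort();
const sortedColors = sortedKeys.map(k => colorOf.get(k));
```

With D3 itself, the same effect should be achievable by giving the ordinal scale an explicit domain built from the unsorted columns, e.g. d3.scaleOrdinal(d3.schemeCategory10).domain(data.columns.slice(1)), and only then sorting keys.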

dc.js heatmap expanding data

I am trying to show machine states over time. Part of this is to reproduce/automate a report that used to be done by hand. It consists of coloring 2-minute 'time slices' in Excel based on what the machine is doing.
(Sorry, not enough reputation to post a picture, but it is a classic heatmap where the state drives the color. Some non DC-JS fiddle: http://jsfiddle.net/ww6Lbnc5/4/)
I was able to generate most of what I want in the following jsfiddle:
http://jsfiddle.net/hwhfxz2t/14/
See fiddle for code.
The total state duration (for selected time frame) is shown in the pieChart, followed by the individual state lines and then the heatmap that people are used to. (the ZOOM and date selection buttons do not work in the fiddle but are there to select specific data ranges or zoom in if you like).
The line charts use the original representation of the states, which consists of a time the state is entered and a duration.
In order to make the heat map work, I had to (I think) take the original data and convert it into individual minute chunks and mark them with a state. So for instance the original data specifying:
RUN state starting 14:30 for 300 seconds
becomes:
14:30=RUN, 14:31=RUN, 14:32=RUN, 14:33=RUN and 14:34=RUN
The code in lines 233-297 loops through the original data and generates a new one that does this. In cases where there is more than one state within a given minute, the last state survives.
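The fiddle's code is not reproduced here, but the expansion described above can be sketched roughly as follows. The record shape ({state, start, seconds}) and the name expandToMinutes are assumptions for illustration only.

```javascript
// Expand {state, start, seconds} records into per-minute slices.
// `start` is a Date; when two states touch the same minute, the later
// record in the input overwrites the earlier one.
function expandToMinutes(records) {
  const MINUTE = 60 * 1000;
  const slices = new Map(); // minute timestamp (ms) -> state
  for (const {state, start, seconds} of records) {
    const first = Math.floor(start.getTime() / MINUTE) * MINUTE;
    const end = start.getTime() + seconds * 1000;
    for (let t = first; t < end; t += MINUTE) {
      slices.set(t, state);
    }
  }
  return [...slices.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([t, state]) => ({minute: new Date(t), state}));
}

// RUN starting 14:30 for 300 seconds, then IDLE from 14:34:30 for 60 seconds.
const minuteSlices = expandToMinutes([
  {state: 'RUN', start: new Date(Date.UTC(2021, 0, 1, 14, 30, 0)), seconds: 300},
  {state: 'IDLE', start: new Date(Date.UTC(2021, 0, 1, 14, 34, 30)), seconds: 60},
]);
```

Here the RUN record alone covers 14:30 through 14:34, and the IDLE record then takes over the 14:34 slice and adds 14:35, matching the "last state survives" rule.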
This works okay but it seems that this code is exactly what is normally done in group().reduce(add,remove,init). But in this case I need to add multiple timeslots depending on the duration of a state.
Also, because it is now using a different crossfilter, maps do not update each other.
Here are my questions related to this:
Can I display a heatmap without supplying information for all individual 'cells'? (i.e. straddle cells based on a value, similar to rowspan in a table)
Can I add multiple values at once inside group().reduce()?
Is there an easy way to invert the yAxis so 0 is at the top?
When I click a row in the heatmap, it selects a column, and vice versa. Is that expected?
I'm not sure if this should be in the crossfilter group. If so please ignore my rambling. If someone knows how to keep the charts linked by grouping better, please let me know.
--Nico
Concerning Question 3:
dc.js heatmaps currently do not support custom ordering functions on their axes, but there is a pull request that has been merged into the development branch and should be accessible to the public soon.
You could manually edit the dc.js file to set the sorting in heatmaps to a custom function. In the latest (2.0.0-beta10) version it is the following line:
rowValues.sort(d3.ascending);
and accordingly
colValues.sort(d3.ascending);

How to create a bar chart in D3?

I need help creating a bar chart, just a normal bar chart.
I have 5 objects, each with their own 10 fields. I want to create a side-by-side bar layout where the x-axis labels are the 10 fields. Above the ith field should be a rectangle for each of the 5 objects, with each height corresponding to that object's data for that field. Each object has its own distinctive color.
I've spent 4+ hours and I've got nothing. Google reveals nothing that helps. Note: the rectangles of the bar chart should be vertical.
Edit: I'm not getting a lot of help. The answers have been unhelpful, though no fault of the authors. So here's my situation. I'm reading a JSON. The format of the JSON
[
...
{
"RowID":85,
"Name":"Tormore",
"B":2,
"S":2,
"S2":1,
"M":0,
"T":"0",
"H":1,
"S":0,
"W":1,
"N":2,
"Malty":1,
"Fruity":0,
"Floral":0
}
...
]
There are a total of 86 JSON pieces of data. That is, there are 86 RowIDs. I need to work with only 5. There need to be 5 colors, or however many RowIDs I want to work with. The x-axis labels are the "B", "S" ... "Floral" from the JSON. So there should be "clumps" of 5 bars (colored corresponding to "Name") for each "B", ... "Floral" piece of data. The end goal is to make it easy for the user to visually compare 5 things according to their attributes.
I cannot make this work with the given examples. I've been working on this for at least 6 hours. I'm at my wits' end.
It is tough to interpret questions at times...I thought the user might want a grouped bar chart :-) Anyways, I borrowed this graph from Mike and cooked up this UPDATED fiddle with appropriate variable names. If that is what you want, you can add more fields and objects and, of course, change the values appropriately. (The color scale may have to be touched as well.)
Some of the fake data:
var data = [
{"Field":"Field1","Object1":"2704659","Object2":"4499890"},
{"Field":"Field2","Object1":"2027307","Object2":"3277946"},
...
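The fake data above has one record per field, with one property per object. To get from the JSON in the question (one record per object) to that shape, a reshaping step along these lines might help. This is only a sketch: toGroupedBarData is a hypothetical helper, and the second sample object is invented purely to have something to compare against.

```javascript
// Reshape [{Name, fieldA, fieldB, ...}] into one record per field,
// with one property per object name, as a grouped bar chart expects.
function toGroupedBarData(objects, nameKey, fields) {
  return fields.map(field => {
    const record = {Field: field};
    for (const obj of objects) {
      record[obj[nameKey]] = +obj[field]; // coerce values like "0" to numbers
    }
    return record;
  });
}

const selected = [
  {RowID: 85, Name: 'Tormore', Malty: 1, Fruity: 0, Floral: 0},
  // Invented entry for illustration only; not from the question's data.
  {RowID: -1, Name: 'Example', Malty: 2, Fruity: '3', Floral: 1},
];
const grouped = toGroupedBarData(selected, 'Name', ['Malty', 'Fruity', 'Floral']);
```

Each resulting record corresponds to one "clump" on the x-axis, and the property names (other than Field) give the domain for the inner scale and the color scale.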
