crossfilter.js multidimensional "dimension" - d3.js

I'm not sure how to do this. I have the following data:
Date, Country, QuantityA, QuantityB.
I want to make a timeline chart showing the ratio between QuantityA and QuantityB. I also want to create a bar chart by Country, which will show the ratio in every country.
The problem is that the ratios are not additive, so if I do this:
var timeDim = ndx.dimension(function(d) {return d.Date;});
var ratioAB = timeDim.group().reduceSum(function(d) {return d.QuantityA/d.QuantityB;});
This will return the ratios for every country separately and will add them up. What I want is to add up QuantityA and QuantityB and then do the ratio.
Thus, the timeline chart will only show the right ratio if I filter in one of the countries.
Is there a way to add both the country and the date as a dimension?

You can create a custom grouping that calculates the sum of QuantityA, the sum of QuantityB, and the ratio between the two. Or you could just create two sum groups, one summing QuantityA and the other QuantityB, and then calculate the ratio when you build the visualization.
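A minimal sketch of the custom grouping idea, written as plain functions so it runs without crossfilter. The field names QuantityA and QuantityB come from the question; the function names, the `sumA`/`sumB` properties and the sample rows are made up for illustration.

```javascript
// Reducer callbacks shaped for crossfilter's group.reduce(add, remove, initial).
function reduceAdd(p, v) {
  p.sumA += v.QuantityA;
  p.sumB += v.QuantityB;
  return p;
}

function reduceRemove(p, v) {
  p.sumA -= v.QuantityA;
  p.sumB -= v.QuantityB;
  return p;
}

function reduceInit() {
  return { sumA: 0, sumB: 0 };
}

// Compute the ratio only when it is read, so it always reflects the current sums.
function ratio(p) {
  return p.sumB !== 0 ? p.sumA / p.sumB : 0;
}

// Hypothetical rows for one date and two countries.
const rows = [
  { Date: "2015-01-01", Country: "FR", QuantityA: 10, QuantityB: 40 },
  { Date: "2015-01-01", Country: "DE", QuantityA: 30, QuantityB: 40 },
];

// Simulate what crossfilter does for one bin: fold every row through reduceAdd.
const ratioBin = rows.reduce(reduceAdd, reduceInit());
console.log(ratio(ratioBin)); // ratio of the sums: (10 + 30) / (40 + 40)
```

With crossfilter these would be passed as timeDim.group().reduce(reduceAdd, reduceRemove, reduceInit), and a dc.js chart could read the ratio through chart.valueAccessor(function(kv) { return ratio(kv.value); }).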

Related

Seaborn grouped Bar plot

I am trying to create visualizations for a recent Commonwealth medal tally dataset.
I would like to create a grouped bar chart of the top ten countries by total number of medals won.
Y axis = total
X axis = country name
How can I split the totals into three bars showing the number of gold, silver, and bronze medals won by each country?
I created one using Excel, but I don't know how to do it using seaborn.
P.S. I have already tried using a list of columns for hue.
df_10 = df.head(10)
sns.barplot(data = df_10, x = 'team' , y = 'total' , hue = df_10[["gold" ,
"silver","bronze"]].apply(tuple , axis = 1) )
Here is the chart that I created using Excel:
[image: grouped bar chart made in Excel]
To plot the graph, you need to reshape the dataframe into a format that is easy to plot. One way to do this is with DataFrame.melt() (pd.melt). Passing a tuple of columns as hue, as you did, may not work. Once the data is in long format, which seaborn understands easily, plotting becomes simple. As you have not provided the format of df_10, I have assumed the data has 4 columns: Country, Gold, Silver and Bronze. Below is the code.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## Use melt with Country as the ID and Gold, Silver, Bronze as the value columns
df_10 = pd.melt(df_10, id_vars=['Country'], value_vars=['Gold', 'Silver', 'Bronze'])
df_10.rename(columns={'value':'Count', 'variable':'Medals'}, inplace=True) ## Rename so the plot has informative labels
fig, ax = plt.subplots(figsize=(12, 7)) ## Set figure size
sns.barplot(data=df_10, x='Country', y='Count', hue='Medals', ax=ax) ## Plot the graph on the axes created above

Plotting unique values from a single column in a dc.js box plot

What I am trying to do is plot data from a single column as a box plot in dc.js.
My aim here is outlier detection. When I plot the unique numerical values from a column, I should be able to spot the outliers on the box plot.
I am having a tough time achieving this. I was able to implement outlier detection for two columns using the scatter plot, but the same way of defining groups and dimensions does not seem to work with the box plot, so I will have to write a map-reduce function to do that.
The box plot should take the numerical dimension as the chart dimension, but I am not sure how to calculate the chart group for the unique counts.
var groupReduce = dim1.group().reduce(
    function(p,v) {
        p.push(v.num_donors);
        return p;
    },
    function(p,v) {
        p.splice(p.indexOf(v.num_donors), 1);
        return p;
    },
    function() {
        return [];
    }
);
I tried reductio as well, but I am not sure whether exception aggregation is the feature I am looking for. I just want to plot the unique values on a box plot.
Here is the JSFiddle of what I have so far:
https://jsfiddle.net/anmolkoul/w5fq73ys/1/
And this is what I am hoping to build for a single numerical column (in this case num_donors):

dc.js Composite Graph - Plot New Line for Each Person

Good Evening Everyone,
I'm trying to take the data from a database full of hour reports (name, timestamp, hours worked, etc.) and create a plot using dc.js to visualize the data. I would like the timestamp to be on the x-axis, the sum of hours for the particular timestamp on the y-axis, and a new bar graph for each unique name all on the same chart.
It appears based on my objectives that using crossfilter.js the timestamp should be my 'dimension' and then the sum of hours should be my 'group'.
Question 1, how would I then use the dimension and group to further split the data based on the person's name and then create a bar graph to add to my composite graph? I would like for the crossfilter.js functionality to remain intact so that if I add a date range tool or some other user controllable filter, everything updates accordingly.
Question 2, my timestamps are in MySQL datetime format: YYYY-mm-dd HH:MM:SS so how would I go about dropping precision? For instance, if I want to combine all entries from the same day into one entry (day precision) or combine all entries in one month into a single entry (month precision).
Thanks in advance!
---- Added on 2017/01/28 16:06
To further clarify, I'm referencing the Crossfilter and DC APIs alongside the DC NASDAQ and Composite examples. The Composite example has shown me how to place multiple line/bar charts on a single graph. On the composite chart I've created, I've given each of the bar charts a dimension based on the timestamps in the data set. Now I'm trying to figure out how to define the group for each one. I want each bar chart to represent the total time worked per timestamp.
For example, I have five people in my database, so I want there to be five bar charts within the single composite chart. Today all five submitted reports saying they worked 8 hours, so now all five bar charts should show a mark at 01/28/2017 on the x-axis and 8 hours on the y-axis.
var parseDate = d3.time.format('%Y-%m-%d %H:%M:%S').parse;
data.forEach(function(d) {
    d.timestamp = parseDate(d.timestamp);
});
var ndx = crossfilter(data);
var writtenDimension = ndx.dimension(function(d) {
    return d.timestamp;
});
var hoursSumGroup = writtenDimension.group().reduceSum(function(d) {
    return d.time_total;
});
var minDate = parseDate('2017-01-01 00:00:00');
var maxDate = parseDate('2017-01-31 23:59:59');
var mybarChart = dc.compositeChart("#my_chart");
mybarChart
    .width(window.innerWidth)
    .height(480)
    .x(d3.time.scale().domain([minDate,maxDate]))
    .brushOn(false)
    .clipPadding(10)
    .yAxisLabel("This is the Y Axis!")
    .compose([
        dc.barChart(mybarChart)
            .dimension(writtenDimension)
            .colors('red')
            .group(hoursSumGroup, "Top Line")
    ]);
So based on what I have right now and the example I've provided, in the compose section I should have 5 charts because there are 5 people (obviously this needs to be dynamic in the end) and each of those charts should only show the timestamp: total_time data for that person.
At this point I don't know how to further break up the group hoursSumGroup by person; this is where my Question #1 comes in and where I need help.
Along with Question #2 above, I want to make sure that the code is dynamic (more people can be handled without a code change), that the charts update automatically when minDate and maxDate are later tied to user input fields (I assume by adjusting the dimension variable in some way), and that if I add a names filter and unselect a name, the chart updates by removing the data for that person.
Question #3, which I'm now realizing I'll also need to figure out, is how to get the person's name to show up in the pointer tooltip (the title) along with the timestamp and total_time values.
There are a number of ways to go about this, but I think the easiest thing to do is to create a custom reduction which reduces each person into a sub-bin.
First off, addressing question #2, you'll want to set up your dimension based on the time interval you're interested in. For instance, if you're looking at hours:
var writtenDimension = ndx.dimension(function(d) {
    return d3.time.hour(d.timestamp);
});
chart.xUnits(d3.time.hours);
This will cause each timestamp to be rounded down to the nearest hour, and tell the chart to calculate the bar width accordingly.
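The same rounding idea applies to the day and month precision asked about in the question. In d3 v3 (the d3.time namespace used above) the matching interval functions are d3.time.day and d3.time.month, with d3.time.days and d3.time.months for xUnits. The sketch below uses plain Date arithmetic instead, only to show what the rounding does; the helper names floorToDay and floorToMonth and the sample timestamp are made up for this illustration.

```javascript
// Plain-JavaScript stand-ins for interval rounding, in local time.
function floorToDay(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}

function floorToMonth(date) {
  return new Date(date.getFullYear(), date.getMonth(), 1);
}

// Hypothetical timestamp, built from components so no string parsing is involved.
const stamp = new Date(2017, 0, 28, 16, 6, 30); // 2017-01-28 16:06:30 local time

const dayBin = floorToDay(stamp);     // every report from that day shares this key
const monthBin = floorToMonth(stamp); // every report from that month shares this key

console.log(dayBin.toString());
console.log(monthBin.toString());
```

Because every row from the same day (or month) maps to the same Date value, crossfilter puts them in one bin and reduceSum adds their hours together.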
Next, here's a custom reduction (from the FAQ) which will create an object for each reduced value, with values for each person's name:
var hoursSumGroup = writtenDimension.group().reduce(
    function(p, v) { // add: v is the row entering the bin
        p[v.name] = (p[v.name] || 0) + v.time_total;
        return p;
    },
    function(p, v) { // remove: v is the row leaving the bin
        p[v.name] -= v.time_total;
        return p;
    },
    function() { // init
        return {};
    });
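To see what one bin of this reduction ends up holding, here is a small sketch that runs the same three callbacks over a plain array, without crossfilter. The function names and the sample rows are hypothetical; only name and time_total come from the question.

```javascript
// The three reducer callbacks; v is the row being added or removed.
function addHours(p, v) {
  p[v.name] = (p[v.name] || 0) + v.time_total;
  return p;
}

function removeHours(p, v) {
  p[v.name] -= v.time_total;
  return p;
}

function initHours() {
  return {};
}

// Hypothetical rows that fall into the same time bin.
const reports = [
  { name: "Ann", time_total: 8 },
  { name: "Bob", time_total: 6 },
  { name: "Ann", time_total: 2 },
];

// Adding every row gives one running total per name.
let hoursBin = reports.reduce(addHours, initHours());
console.log(hoursBin);

// Simulate a filter that excludes Bob's row: his total drops to 0, the key stays.
hoursBin = removeHours(hoursBin, reports[1]);
console.log(hoursBin.Bob);
```

Note that a removed name keeps its key with a value of 0, which is why a filtered-out person shows up as a flat line rather than disappearing.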
I did not go with the series example I mentioned in the comments, because I think composite keys can be difficult to deal with. That's another option, and I'll expand my answer if that's necessary.
Next, we can feed the composite line charts with value accessors that can fetch the value by name.
Assume we have an array names.
compositeChart.shareTitle(false);
compositeChart.compose(
    names.map(function(name) {
        return dc.lineChart(compositeChart)
            .dimension(writtenDimension)
            .colors('red')
            .group(hoursSumGroup)
            .valueAccessor(function(kv) {
                return kv.value[name];
            })
            .title(function(kv) {
                // kv.value is the whole per-name object, so pick out this person's total
                return name + ' ' + kv.key + ': ' + kv.value[name];
            });
    }));
Again, it wouldn't make sense to use bar charts here, because they would obscure each other.
If you filter a name elsewhere, it will cause the line for the name to drop to zero. Having the line disappear entirely would probably not be so simple.
The above shareTitle(false) ensures that the child charts will draw their own titles; the title functions just add the current name to those titles (which would usually just be key:value).
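As a rough illustration of how the per-name accessors and titles behave, the sketch below applies them to one hand-written entry shaped like an element of group.all(). The factory names makeValueAccessor and makeTitle, the fallback to 0 for a missing name, and the sample entry are assumptions added here, not part of the answer's code.

```javascript
// Build a value accessor bound to one person's name.
function makeValueAccessor(name) {
  return function (kv) {
    return kv.value[name] || 0; // assumption: a missing name counts as zero hours
  };
}

// Build a title function bound to the same name.
function makeTitle(name) {
  return function (kv) {
    return name + ' ' + kv.key + ': ' + (kv.value[name] || 0);
  };
}

// Hypothetical entry shaped like one element of group.all().
const entry = { key: '2017-01-28', value: { Ann: 8, Bob: 6 } };

// One way to derive the names array: collect the keys seen in the bins.
const names = Object.keys(entry.value);

names.forEach(function (name) {
  console.log(makeTitle(name)(entry), makeValueAccessor(name)(entry));
});
```

In the composite chart these closures would be handed to .valueAccessor() and .title() on each child chart, one pair per name.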

Get only non-filtered data from dc.js chart (dimension / group)

So this is a question regarding a rather specific problem. As I know from Gordon, main contributor of dc.js, there is no support for elasticY(true) function for logarithmic scales.
So, after knowing this, I tried to implement my own solution, by building a workaround, inside dc.js's renderlet event. This event is always triggered by a click of the user onto the barchart. What I wanted to do is this:
let groupSize = this.getGroupSize(fakeGroup, this.yValue);
let maximum = group.top(1)[0].value;
let minimum = group.top(groupSize)[groupSize-1].value;
console.log(minimum, maximum);
chart.y(d3.scale.log().domain([minimum, maximum])
    .range([this.height, 0]) // range() expects an array
    .nice()
    .clamp(true));
I thought, that at this point the "fakeGroup" (which is just group.top(50)) contains only the data points that are NOT filtered out after the user clicked somewhere. However, this group always contains all data points that are in the top 50 and doesn't change on filter events.
What I really wanted is get all data points that are NOT filtered out, to get a new maximum and minimum for the yScale and rescale the yAxis accordingly by calling chart.y(...) again.
Is there any way to get only the data rows that are still in the chart and not filtered out? I also tried using remove_empty_bins(group) but didn't have any luck with that. Either all() or top() is always missing somewhere, even after giving remove_empty_bins both functions.
This is how i solved it:
I made a function called rescale(), which looks like this:
rescale(chart, group, fakeGroup) {
    let groupSize = this.getGroupSize(fakeGroup, this.yValue);
    let minTop = group.top(groupSize)[groupSize-1].value;
    // a log scale cannot include 0, so fall back to a small positive minimum
    let minimum = minTop > 0 ? minTop : 0.0001;
    let maximum = group.top(1)[0].value;
    chart.y(d3.scale.log().domain([minimum, maximum])
        .range([this.height, 0]) // range() expects an array
        .nice()
        .clamp(true));
}
I think the parameters are pretty self-explanatory, I just get my chart, the whole group as set by dimension.group.reduceSum and a fake group I created, which contains the top 50 elements, to reduce bar count of my chart.
The rescale() method is called in the event listener
chart.on('preRedraw', (chart) => {
    this.rescale(chart, group, fakeGroup);
});
So what I do is redefine the chart's y-axis (resetting the min and max values based on the filtered data) every time the chart gets redrawn, which also happens every time one of my charts is filtered. The scale now always fits the data the chart contains after another chart is filtered.
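The min/max part of rescale() could also be pulled into a small pure function that is easy to check on its own. Note that this is a variation, not the code above: it skips zero or negative bins instead of substituting 0.0001 for them, and only falls back to the floor when nothing positive is left. The names logDomain and floor and the sample entries are hypothetical.

```javascript
// Derive a log-scale domain from group entries ({key, value} pairs).
// floor is a small positive lower bound, since a log scale cannot include 0.
function logDomain(entries, floor) {
  const positives = entries
    .map(function (kv) { return kv.value; })
    .filter(function (v) { return v > 0; });
  if (positives.length === 0) {
    return [floor, 1]; // arbitrary fallback when no positive value remains
  }
  return [
    Math.max(floor, Math.min.apply(null, positives)),
    Math.max.apply(null, positives),
  ];
}

// Hypothetical entries after another chart has been filtered.
const filteredEntries = [
  { key: 'a', value: 1200 },
  { key: 'b', value: 0 }, // a bin emptied by the filter
  { key: 'c', value: 35 },
];

const domain = logDomain(filteredEntries, 0.0001);
console.log(domain[0], domain[1]); // smallest and largest positive values
```

The result could then be passed to d3.scale.log().domain(...) inside the preRedraw handler, using whichever entries the chart actually displays.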

Prevent a graph from recalculating its own percentages

I have three Row Charts and my code calculates and updates the percentages for each chart whenever a user first lands on the page or clicks a rectangle bar of a chart.  This is how it calculates the percentages
posChart:
% Position= unique StoreNumber counts per Position / unique StoreNumber counts for all POSITIONs
deptChart:
% Departments= POSITION counts per DEPARTMENT/POSITION counts for all DEPARTMENTs
stateChart:
% States= unique StoreNumber counts per STATE / unique StoreNumber counts for all STATEs
What I want is that when a user clicks a rectangle bar of a rowChart such as “COUNTS BY STATE”, it should NOT update/recalculate the percentages for that chart (it should not affect its own percentages); however, the percentages should be recalculated for the other two charts, i.e. “COUNTS BY DEPARTMENT” and “COUNTS BY POSITION”. The same scenario holds for the other charts as well. This is what I want:
If a user clicks a
“COUNTS BY DEPARTMENT” chart --> recalculate percentages for “COUNTS BY POSITION” and “COUNTS BY STATE” charts
“COUNTS BY POSITION” chart --> recalculate percentages for “COUNTS BY DISTRIBUTOR” and “COUNTS BY STATE” charts
Please Help!!
link:http://jsfiddle.net/mfi_login/z860sz69/
Thanks for the reply.
There is a problem with the solution you provided. I am looking for the global total for all filters but I don’t want those totals to be changed when user clicks on a current chart's rectangular bar.
e.g.
if there are two different POSITIONS (Supervisor, Account Manager) with the same StoreNumber (3), then I want StoreNumber to be counted as 1 not 2
If we take an example of Account Manager % calculation (COUNTS BY POSITION chart)
total unique StoreNumbers=3
Total Account Manager POSITIONs=2
% = 2/3=66%
Is there a way to redraw the other two charts without touching the current one?
It seems to me that what you really want is to use the total of the chart's groups, not the overall total. If you use the overall total then all filters will be observed, but if you use the total for the current group, it will not observe any filters on the current chart.
This will have the effect you want - it's not about preventing any calculations, but about making sure each chart is affected only by the filters on the other charts.
So, instead of bin_counter, let's define sum_group and sum_group_xep:
function sum_group(group, acc) {
    acc = acc || function(kv) { return kv.value; };
    return d3.sum(group.all().filter(function(kv) {
        return acc(kv) > 0;
    }), acc);
}
function sum_group_xep(group) {
    return sum_group(group, function(kv) {
        return kv.value.exceptionCount;
    });
}
And we'll use these for each chart, so e.g.:
posChart
    .valueAccessor(function (d) {
        // total of unique store numbers in this chart's own group (ignores this chart's filter)
        var value = sum_group_xep(posGrp);
        var percent = value > 0 ? (d.value.exceptionCount / value) * 100 : 0;
        // this returns the x-axis percentages
        return percent;
    })
deptChart
    .valueAccessor(function (d) {
        var total = sum_group(deptGrp); // declared with var to avoid an implicit global
        var percent = d.value ? (d.value / total) * 100 : 0;
        return percent;
    })
stateChart
    .valueAccessor(function (d) {
        var value = sum_group_xep(stateGrp);
        return value > 0 ? (d.value.exceptionCount / value) * 100 : 0;
    })
... along with the other 6 places these are used. There's probably a better way to organize this without so much duplication of code, but I didn't really think about it!
Fork of your fiddle: http://jsfiddle.net/gordonwoodhull/yggohcpv/8/
EDIT: Reductio might have better shortcuts for this, but I think the principle of dividing by the total of the values in the current chart's group, rather than using a groupAll which observes all filters, is the right start.
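To make the principle concrete without dc.js or d3, here is a sketch of the same idea over a stand-in object that only provides all(). A crossfilter group does not observe filters on its own dimension, so a total taken from group.all() stays put when the chart's own bars are clicked. The name sumGroup, the stub object and its numbers are hypothetical.

```javascript
// sum_group re-expressed without d3: total the positive values in group.all().
function sumGroup(group, acc) {
  acc = acc || function (kv) { return kv.value; };
  return group.all()
    .map(function (kv) { return acc(kv); })
    .filter(function (v) { return v > 0; })
    .reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical stand-in for a crossfilter group: only all() is needed here.
const stateGrpStub = {
  all: function () {
    return [
      { key: 'CA', value: 6 },
      { key: 'NY', value: 3 },
      { key: 'TX', value: 1 },
    ];
  },
};

// Each bar's percentage is its own value divided by the chart's own group total.
const stateTotal = sumGroup(stateGrpStub);
stateGrpStub.all().forEach(function (kv) {
  console.log(kv.key, (kv.value * 100) / stateTotal);
});
```

Passing a custom accessor, for example function(kv) { return kv.value.exceptionCount; }, gives the equivalent of sum_group_xep.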
