Reduce function in dc.js - dc.js

I am new to the dc.js library and want to perform the crossfilter calculation below in the group of my geochoropleth chart. I am fairly sure there is a function I could pass to the group's reduce method.
I have the following data in DATA.csv (the first row contains column names):
BUDGET,GID,MDIS,USPRO,TYPE,FILEURL,RVID,VERDATE,VERSTAT,SCORE
10428,ALI-G-006,Aliabad,Kunduz,Hard,/uploadedfiles/reports/SIKA North/136-SIKA-North-ALI-G-006.pdf,0,19-08-2014,2,0
24853,ALI-G-008,Aliabad,Kunduz,Hard,/uploadedfiles/reports/SIKA North/561-SIKA-North-ALI-G-008.pdf,0,19-08-2014,0
24831,ALI-G-019,Aliabad,Kunduz,Hard,/uploadedfiles/reports/SIKA North/987-SIKA-North-ALI-G-019.pdf,0,18-08-2014,2,0
24771,IMA-G-017,Imam Sahib,Kunduz,Hard,/uploadedfiles/reports/SIKA North/557-SIKA-North- IMA-G-017.pdf,0,28-08-2014,2,1
21818,IMA-G-019,Imam Sahib,Kunduz,Hard,/uploadedfiles/reports/SIKA North/992-SIKA-North-IMA-G-019.pdf,0,27-08-2014,2,0
12266,KHA-G-007,Khanabad,Kunduz,Hard,/uploadedfiles/reports/SIKA North/583-SIKA-North - KHA-G-007.pdf,0,7/9/2014,1,0
23148,KUN-G-002,Kunduz,Kunduz,Hard,/uploadedfiles/reports/SIKA North/909-SIKA-North - KUN-G-002.pdf,0,1/9/2014,2,0
54584,KUN-G-004,Kunduz,Kunduz,Hard,/uploadedfiles/reports/SIKA North/702-SIKA-North - KUN-G-004 20140709.pdf,0,9/7/2014,1,0
24544,PUL-G-001,Pul-e Khumri,Baghlan,Hard,/uploadedfiles/reports/SIKA North/599-SIKA-North - PUL-G-001 - 20140623.pdf,0,6/7/2014,2,1
40149,SSKDAG046,Arghandab (1),Kandahar,Hard,/uploadedfiles/reports/SIKA South/239-SIKA-South-SSKDAG046.pdf,0,12/9/2014,0,0.625
39452,0003 LGR MAG,Muhammad Aghah,Logar,Hard,/uploadedfiles/reports/SIKA East/792-SIKA-East - 0003 LGR MAG - 20140610.pdf,0,10/6/2014,2,0.7
58298,0013 LGR MAG,Muhammad Aghah,Logar,Hard,/uploadedfiles/reports/SIKA East/591-SIKA-East - 0013 LGR MAG 20140601.pdf,0,1/6/2014,2,0]
Below is the dimension and group for my chart:
var facts = crossfilter(data);
var scoref = facts.dimension(function (d) { return d.district;});
var scoreg = scoref.group().reduceSum(function(d){return d.score;});
The d.score field's value is calculated using the code below with PHP:
$tempsql = $dbase->query('select "VERMDIS", COUNT(*) AS TOTAL, SUM("VERSTAT") AS SAM FROM mt_fver GROUP BY "VERMDIS"');
while ($r = pg_fetch_array($tempsql)) {
$dist = $r['VERMDIS'];
$score = $r[2] / (2 * $r[1]);
$disxx[$dist] = $score;
}
What I would like to achieve is to do the same calculation using group().reduce(function (p,v) { /* ... */ }) from the dc.js library while grouping the values by district names.

What results or errors are you getting? It looks like pretty much the right idea, but take the following into account:
If you're using d3.csv(), it will return the values as strings, so you'll need either to preprocess your data or to use +d.score to convert the values while reading them.
The field may come out as d.SCORE, depending on how you are reading the CSV in.
You may need to adapt the reduce function to suit your calculation.
If you put a line break in the function you pass to reduce, you can set a breakpoint there using the browser's debugger and experiment on the console to figure out what expression works for what you need.
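As a minimal sketch of such a reduce function, assuming the rows keep the CSV column names (so the status field is d.VERSTAT and arrives as a string), the PHP formula SUM / (2 * COUNT) could be reproduced by tracking the sum and the count per group and dividing in the chart's value accessor. The function names and the sample rows below are hypothetical.

```javascript
// Reducer components: keep a running count and sum per district,
// mirroring COUNT(*) and SUM("VERSTAT") in the PHP query.
function scoreAdd(p, v) {
  p.count += 1;
  p.sum += +v.VERSTAT; // unary + converts the CSV string to a number
  return p;
}
function scoreRemove(p, v) {
  p.count -= 1;
  p.sum -= +v.VERSTAT;
  return p;
}
function scoreInit() {
  return { count: 0, sum: 0 };
}
// Same formula as the PHP loop, guarding against empty groups.
function scoreValue(p) {
  return p.count > 0 ? p.sum / (2 * p.count) : 0;
}

// Assumed wiring (not executed here):
//   var scoreg = scoref.group().reduce(scoreAdd, scoreRemove, scoreInit);
//   chart.valueAccessor(function (d) { return scoreValue(d.value); });

// Stand-alone check with hypothetical rows for one district.
var rows = [
  { MDIS: "Aliabad", VERSTAT: "2" },
  { MDIS: "Aliabad", VERSTAT: "1" },
  { MDIS: "Aliabad", VERSTAT: "2" }
];
var acc = rows.reduce(scoreAdd, scoreInit());
console.log(acc.sum, acc.count, scoreValue(acc));
```

Keeping only the sum and count in the reducer means the remove function is a plain mirror of the add function, and the division is done once per rendered value.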

Related

How to add filter result to select menu

I'm stuck with my first dashboard project with d3, dc and crossfilter. Cannot find a solution.
"ETCBTC","BUY","0.002325","1.04","0.00241800","0.00104","ETC"
"ETCBTC","SELL","0.002358","1.04","0.00245232","0.00000245","BTC"
"LTCETH","SELL","0.30239","0.006","0.00181434","0.00000181","ETH"
"LTCETH","SELL","0.30239","0.149","0.04505611","0.00004506","ETH"
The first column contains different trading pairs, and I only need the last currency of each pair (BTC and ETH in this example).
I found a filter that helps me do that.
The problem is that I need BTC and ETH to appear as the options in my select menu, which should then apply the filter.
function show_market_selector(ndx) {
var marketDim = ndx.dimension(dc.pluck("Market"));
var selectorMenu = marketDim.group();
function filterItems(query) {
return ndx.dimension(dc.pluck("Market")).filter(function(el) {
return el.toLowerCase().indexOf(query.toLowerCase()) > 0;
});
}
filterItems("BTC");
var select = dc.selectMenu("#market-selector")
.dimension(marketDim)
.group(selectorMenu);
select.title(function (d){
return "BTC";
});
}
Now I get all the pairs as groups in this menu, but my goal is to have just BTC and ETH in the select menu.
I hope someone can give me advice. Thank you.
I think it would be easier just to use the currency as your dimension key:
var currencyDim = ndx.dimension(d => d.Market.slice(3)),
currencyGroup = currencyDim.group();
var select = dc.selectMenu("#market-selector")
.dimension(currencyDim)
.group(currencyGroup);
You don't really want to create a new dimension every time filterItems is called - dimensions are heavy-weight indices which are intended to be kept around.
The name of dimension.filter() is confusing - it's nothing like JavaScript's Array.prototype.filter(), which returns the matching rows. Instead, it's an imperative function which sets the current filter for that dimension (and changes what all the other dimensions see).
If you need a "from currency" dimension, that would be
var fromCurrencyDim = ndx.dimension(d => d.Market.slice(0,3))
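To illustrate what those two key functions produce, here is a small stand-alone sketch. It assumes every symbol is exactly three characters long and that the pair is stored in a field called Market, as in the question's dc.pluck("Market") call. The tally at the end only imitates what a default group() would report and does not use crossfilter itself.

```javascript
// Split a six-character pair such as "ETCBTC" into its two halves.
// Assumes every currency symbol is exactly three characters long.
function fromCurrency(d) { return d.Market.slice(0, 3); }
function toCurrency(d) { return d.Market.slice(3); }

var trades = [
  { Market: "ETCBTC" },
  { Market: "ETCBTC" },
  { Market: "LTCETH" },
  { Market: "LTCETH" }
];

// Count rows per second currency, which is roughly what a default
// group() on ndx.dimension(toCurrency) would expose as key/value pairs.
var counts = {};
trades.forEach(function (d) {
  var key = toCurrency(d);
  counts[key] = (counts[key] || 0) + 1;
});
console.log(counts);
```

With keys like these the select menu lists one option per currency, and choosing an option filters on the dimension that is already there.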

How to get the total sum of a column in jqgrid

I have two columns in jqGrid, ShopID and NetSales, and I would like to add a Contribution column, which will be a calculated column. The formula is NetSales divided by Total. Please see the image for an example.
I know how to get the Total using getCol like this: var sumtotal = grid.jqGrid('getCol', 'NetSales', false, 'sum');, but I don't know how to use it further for the division. I have tried, but it didn't work. Please help.
Generally, there are two ways to solve the problem:
As mentioned in the note, you can calculate the sum of the column before putting the data into the grid. Once you have this value, you can use a custom formatter to calculate the percentage. In this case the calculated sum must be defined in a scope the formatter can see (for example, globally).
Calculate the value directly, without using any jqGrid method (see below).
Suppose you have local data like this:
mydata = [
{ShopID: 1, NetSales: 150000},
{ShopID: 2, NetSales: 200000},
...
]
You can easily do the following (no error checks, it is just the idea):
var sum = 0;
// First pass: total of the NetSales column
$.each(mydata, function (i, row) {
    sum += parseFloat(row.NetSales);
});
// Second pass: each row's share of the total, in percent
$.each(mydata, function (i, row) {
    if (sum > 0) {
        row.Contribution = parseFloat(row.NetSales) / sum * 100;
    }
});
Then put mydata into jqGrid directly, without doing any further calculation, and use a custom formatter to display the percentage.
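The same idea can be sketched without jQuery, together with a formatter that only handles display. The third row and the name percentFormatter are made up for illustration. The (cellvalue, options, rowObject) parameter list is the one jqGrid passes to custom formatters, but wiring it into colModel is left out here.

```javascript
// Hypothetical local rows; field names follow the question (ShopID, NetSales).
var mydata = [
  { ShopID: 1, NetSales: 150000 },
  { ShopID: 2, NetSales: 200000 },
  { ShopID: 3, NetSales: 50000 }
];

// First pass: total of the NetSales column.
var total = mydata.reduce(function (sum, row) {
  return sum + parseFloat(row.NetSales);
}, 0);

// Second pass: each row's share of the total, in percent.
mydata.forEach(function (row) {
  row.Contribution = total > 0 ? parseFloat(row.NetSales) / total * 100 : 0;
});

// Formatter with jqGrid's (cellvalue, options, rowObject) parameters;
// it only formats the number that was computed beforehand.
function percentFormatter(cellvalue, options, rowObject) {
  return Number(cellvalue).toFixed(2) + " %";
}

console.log(mydata.map(function (r) {
  return percentFormatter(r.Contribution);
}));
```

Computing Contribution before the data reaches the grid keeps the formatter free of any dependency on a shared total.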

Avoid multiple sums in custom crossfilter reduce functions

This question arises from some difficulties in creating a crossfilter dataset, in particular how to group the different dimensions and compute derived values. The final aim is to have a number of dc.js graphs using the dimensions and groups.
(Fiddle example https://jsfiddle.net/raino01r/0vjtqsjL/)
Question
Before going on with the explanation of the setting, the key question is the following:
How to create custom add, remove, init, functions to pass in .reduce so that the first two do not sum multiple times the same feature?
Data
Let's say I want to monitor the failure rate of a number of machines (just an example). I do this using different dimensions: month, machine location, and type of failure.
For example I have the data in the following form:
| month | room | failureType | failCount | machineCount |
|---------|------|-------------|-----------|--------------|
| 2015-01 | 1 | A | 10 | 5 |
| 2015-01 | 1 | B | 2 | 5 |
| 2015-01 | 2 | A | 0 | 3 |
| 2015-01 | 2 | B | 1 | 3 |
| 2015-02 | . | . | . | . |
Expected
For the three given dimensions, I should have:
month_1_rate = $\frac{10+2+0+1}{5+3}$;
room_1_rate = $\frac{10+2}{5}$;
type_A_rate = $\frac{10+0}{5+3}$.
Idea
Essentially, what counts in this setting is the pair (month, room). That is, given a month and a room, there should be a rate attached to them (and crossfilter should then take the other filters into account).
Therefore, a way to go could be to store the couples that have already been used and do not sum machineCount for them - however we still want to update the failCount value.
Attempt (failing)
My attempt was to create custom reduce functions that do not sum a machineCount that has already been taken into account.
However, there is some unexpected behaviour. I'm sure this is not the way to go, so I hope to get some suggestions on this.
// A dimension is one of:
// ndx = crossfilter(data);
// ndx.dimension(function(d){return d.month;})
// ndx.dimension(function(d){return d.room;})
// ndx.dimension(function(d){return d.failureType;})
// Goal: have a general way to get the group given the dimension:
function get_group(dim){
return dim.group().reduce(add_rate, remove_rate, initial_rate);
}
// month is given as datetime object
var monthNameFormat = d3.time.format("%Y-%m");
//
function check_done(p, v){
return p.done.indexOf(v.room+'_'+monthNameFormat(v.month))==-1;
}
// The three functions needed for the custom `.reduce` block.
function add_rate(p, v){
var index = check_done(p, v);
if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
var count_to_sum = (index)? v.machineCount:0;
p.mach_count += count_to_sum;
p.fail_count += v.failCount;
p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
return p;
}
function remove_rate(p, v){
var index = check_done(p, v);
var count_to_subtract = (index)? v.machineCount:0;
if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
p.mach_count -= count_to_subtract;
p.fail_count -= v.failCount;
p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
return p;
}
function initial_rate(){
return {rate: 0, mach_count:0, fail_count:0, done: new Array()};
}
Connection with dc.js
As mentioned, the previous code is needed to create the dimensions and groups to be passed to three different bar graphs using dc.js.
Each graph will have .valueAccessor(function(d){return d.value.rate;}).
See the jsfiddle (https://jsfiddle.net/raino01r/0vjtqsjL/) for an implementation. The numbers are different, but the data structure is the same. Notice that in the fiddle you would expect a machine count of 18 (in both months), but you always get double that (because of the 2 different locations).
Edit
Reductio + dc.js
Following Ethan Jewett's answer, I used Reductio to take care of the grouping. The updated fiddle is here: https://jsfiddle.net/raino01r/dpa3vv69/
My reducer object needs two exceptions (month, room) when summing the machineCount values. Hence it is built as follows:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room; })
.exception(function(d) { return d.month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This seems to fix the numbers when the graphs are rendered.
However, I do have a strange behaviour when filtering one single month and looking at the numbers in the type graph.
Possible solution
Rather than creating two exceptions, I could merge the two fields when processing the data. That is, as soon as the data is defined I could do:
data.forEach(function(x){
x['room_month'] = x['room'] + '_' + x['month'];
})
Then the Reductio code above becomes:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room_month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This solution seems to work. However, I am not sure whether this is a sensible thing to do: if the dataset is large, adding a new field could slow things down quite a lot!
A few things:
Don't calculate rates in your Crossfilter reducers. Calculate the components of the rates instead; this keeps the reducers both simpler and faster. Do the actual division in your value accessor.
You've basically got the right idea. I see two problems immediately:
In your remove_rate you are not removing the key from the p.done array. You should be doing something like if (index) p.done.splice(p.done.indexOf(v.room+'_'+monthNameFormat(v.month)), 1); to remove it.
In your reduce functions, index is a boolean. (index == -1) will never evaluate to true, IIRC. So your added machine count will always be 0. Use var count_to_sum = index ? v.machineCount:0; instead.
If you want to put together a working example, I or someone else will be happy to get it going for you, I'm sure.
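In the meantime, here is a rough sketch of a reducer along those lines. It is a variation rather than the asker's exact code: instead of the done array it keeps a per-pair counter (seen), so that machineCount is added when the first row of a (room, month) pair enters the group and subtracted only when the last one leaves. The helper keyOf and the assumption that month is already a string such as "2015-01" are mine.

```javascript
// Sum failCount for every row, but count machineCount only once per
// distinct (room, month) pair currently present in the group.
// keyOf is a hypothetical helper; month is assumed to be a string.
function keyOf(v) { return v.room + "_" + v.month; }

function addRate(p, v) {
  var k = keyOf(v);
  p.seen[k] = (p.seen[k] || 0) + 1;
  if (p.seen[k] === 1) p.mach_count += v.machineCount; // first row of this pair
  p.fail_count += v.failCount;
  return p;
}
function removeRate(p, v) {
  var k = keyOf(v);
  p.seen[k] -= 1;
  if (p.seen[k] === 0) { // last row of this pair has left the group
    p.mach_count -= v.machineCount;
    delete p.seen[k];
  }
  p.fail_count -= v.failCount;
  return p;
}
function initialRate() {
  return { mach_count: 0, fail_count: 0, seen: {} };
}
// The division happens here, in the value accessor, not in the reducer.
function rate(p) {
  return p.mach_count === 0 ? 0 : p.fail_count / p.mach_count;
}

// Stand-alone check with the 2015-01 rows from the question's table.
var rows = [
  { month: "2015-01", room: 1, failureType: "A", failCount: 10, machineCount: 5 },
  { month: "2015-01", room: 1, failureType: "B", failCount: 2, machineCount: 5 },
  { month: "2015-01", room: 2, failureType: "A", failCount: 0, machineCount: 3 },
  { month: "2015-01", room: 2, failureType: "B", failCount: 1, machineCount: 3 }
];
var month = rows.reduce(addRate, initialRate());
console.log(month.fail_count, month.mach_count); // 13 8
```

With crossfilter this would presumably be passed as dim.group().reduce(addRate, removeRate, initialRate), with rate(d.value) used in the chart's valueAccessor.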
You may also want to try Reductio. Crossfilter reducers are difficult to do right and efficiently, so it may make sense to use a library to help. With Reductio, creating a group that calculates your machine count and failure count looks like this:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
var dim = ndx.dimension(...)
var grp = dim.group()
reducer(grp)

crossfilter: obtain the count of values falling into the product of two columns

I have a data set like
{"parent":"/home","inside":"/files","filename":"type.jar",
"extension":"jar","type":"modified","archive"}
There are many rows like this in the JSON array. I am using crossfilter to read the data and plot graphs and data tables. The type field in the data set has the values "added", "modified" and "deleted".
I want to create a data table like
Extension | Added | Modified | Deleted
where added, modified and deleted will hold the count of the files with the specific extension. Can anyone suggest me a way to do so?
So far I have created a dimension like this:
var extensionType = facts.dimension(function(d) {
return d.extension; });
var extensionTypeGroup=extensionType.group();
and I get a grouped output like this,
{"key":"class","value":424},
{"key":"js","value":176},
{"key":"properties","value":26},
{"key":"jar","value":10},
{"key":"css","value":8},
{"key":"txt","value":6},
{"key":"war","value":4},
{"key":"png","value":4},
{"key":"handlebars","value":4},
{"key":"jar_local","value":2},
{"key":"aar","value":2}
How do I get the separate count of added deleted and modified?
Probably the easiest way to do this is to reduce to an object rather than a single value.
This is covered in the FAQ: How do I reduce multiple values at once? What if rows contain a single value but a different value per row? You probably just needed the right search terms to find it.
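Actually, it looks like the code from the FAQ will work for you almost unmodified. Since you want a count of files rather than the sum of a field, add and subtract 1 for each row:
var extensionTypeGroup = extensionType.group().reduce(
    function(p, v) { // add: one more file of this type
        p[v.type] = (p[v.type] || 0) + 1;
        return p;
    },
    function(p, v) { // remove: one fewer file of this type
        p[v.type] -= 1;
        return p;
    },
    function() { // initial: no types seen yet
        return {};
    });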
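To get from such a group to the Extension | Added | Modified | Deleted rows, each value object has to be read with a default of 0, because a type that never occurred for an extension is simply absent from the object. The sketch below assumes each row should count as 1. The sample rows and the names addType, initType and toRow are hypothetical, and the loop only imitates group.all() without using crossfilter.

```javascript
// Count reducer keyed by the row's type; each row contributes 1.
function addType(p, v) { p[v.type] = (p[v.type] || 0) + 1; return p; }
function initType() { return {}; }

// Hypothetical rows shaped like the question's records.
var files = [
  { extension: "jar", type: "modified" },
  { extension: "jar", type: "added" },
  { extension: "js", type: "deleted" },
  { extension: "jar", type: "modified" }
];

// Imitate group.all(): one {key, value} entry per extension.
var byExt = {};
files.forEach(function (v) {
  byExt[v.extension] = addType(byExt[v.extension] || initType(), v);
});
var all = Object.keys(byExt).sort().map(function (k) {
  return { key: k, value: byExt[k] };
});

// One table row per extension; missing types default to 0.
function toRow(d) {
  return {
    extension: d.key,
    added: d.value.added || 0,
    modified: d.value.modified || 0,
    deleted: d.value.deleted || 0
  };
}
var tableRows = all.map(toRow);
console.log(tableRows);
```

In a dc.js data table the same defaults could be applied in the column accessor functions, for example function (d) { return d.value.added || 0; }.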

group.all() call required for data to populate correctly

So I've encountered a weird issue when dealing with making Groups based on a variable when the crossfilter is using an array, instead of a literal number.
I currently have an output array of a date, then 4 values, that I then map into a composite graph. The problem is that the 4 values can fluctuate depending on the input given to the page. What I mean is that based on what it receives, I can have 3 values, or 10, and there's no way to know in advance. They're placed into an array which is then given to a crossfilter. When in testing, I was accessing using
dimension.group().reduceSum(function(d) { return d[0]; });
Where 0 was changed to whatever I needed. But I've finished testing, for the most part, and began to adapt it into a dynamic system where it can change, but there's always at least the first two. To do this I created an integer that keeps track of what index I'm at, and then increases it after the group has been created. The following code is being used:
var range = crossfilter(results);
var dLen = 0;
var curIndex = 0;
var dateDimension = range.dimension(function(d) { dLen = d.length; return d[curIndex]; });
curIndex++;
var aGroup = dateDimension.group().reduceSum(function(d) { return d[curIndex]; });
curIndex++;
var bGroup = dateDimension.group().reduceSum(function(d) { return d[curIndex]; });
curIndex++;
var otherGroups = [];
for(var h = 0; h < dLen-3; h++) {
otherGroups[h] = dateDimension.group().reduceSum(function(d) { return d[curIndex]; });
curIndex++;
}
var charts = [];
for(var x = 0; x < dLen - 3; x++) {
charts[x] = dc.barChart(dataGraph)
.group(otherGroups[x], "Extra Group " + (x+1))
.hidableStacks(true)
}
charts[charts.length] = dc.lineChart(dataGraph)
.group(aGroup, "Group A")
.hidableStacks(true)
charts[charts.length] = dc.lineChart(dataGraph)
.group(bGroup, "Group B")
.hidableStacks(true)
The issue is this:
The graph gets built empty. I checked the curIndex variable multiple times and it was always correct. I finally decided to instead check the actual group's resulting data using the .all() method.
The weird thing is that AFTER I used .all(), now the data works. Without a .all() call, the graph cannot determine the data and outputs absolutely nothing, however if I call .all() immediately after the group has been created, it populates correctly.
Each Group needs to call .all(), or only the ones that do will work. For example, when I first was debugging, I used .all() only on aGroup, and only aGroup populated into the graph. When I added it to bGroup, then both aGroup and bGroup populated. So in the current build, every group has .all() called directly after it is created.
Technically there's no issue, but I'm really confused on why this is required. I have absolutely no idea what the cause of this is, and I was wondering if there was any insight into it. When I was using literals, there was no issue, it only happens when I'm using a variable to create the groups. I tried to get output later, and when I do I received NaN for all the values. I'm not really sure why .all() is changing values into what they should be especially when it only occurs if I do it immediately after the group has been created.
Below is a screenshot of the graph. The top is when everything has a .all() call after being created, while the bottom is when the Extra Groups (the ones defined in the for loop) do not have the .all() call anymore. The data is just not there at all, I'm not really sure why. Any thoughts would be great.
http://i.stack.imgur.com/0j1ey.jpg
It looks like you may have run into the classic "generating lambdas from loops" JavaScript problem.
You are creating a whole bunch of functions that reference curIndex but unless you call those functions immediately, they will refer to the same instance of curIndex in the global environment. So if you call them after initialization, they will probably all try to use a value which is past the end.
Instead, you might create a function which generates your lambdas, like so:
function accessor(curIndex) {
return function(d) { return d[curIndex]; };
}
And then each time call .reduceSum(accessor(curIndex))
This will cause the value of curIndex to get copied each time you call the accessor function (or you can think of each generated function as having its own environment with its own curIndex).
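The difference can be seen in a small stand-alone sketch with no crossfilter involved. The sample rows are made up, and the inner parameter is named index here only to keep it apart from the shared curIndex variable. Reading past the end of a row yields undefined, which would be consistent with the NaN sums reported in the question.

```javascript
// Rows shaped like the question's arrays: [date, a, b, extra...].
var results = [["2016-01-01", 1, 2, 3], ["2016-01-02", 10, 20, 30]];

// Shared variable: every accessor reads curIndex when it is CALLED,
// not when it was created.
var curIndex = 1;
var shared = [];
for (var i = 0; i < 3; i++) {
  shared.push(function (d) { return d[curIndex]; });
  curIndex++;
}
// curIndex is now 4, which is past the end of each row.
var sharedValues = shared.map(function (f) { return f(results[0]); });
console.log(sharedValues); // [ undefined, undefined, undefined ]

// Factory: each call gets its own copy of the index.
function accessor(index) {
  return function (d) { return d[index]; };
}
var fixed = [];
for (var j = 1; j <= 3; j++) {
  fixed.push(accessor(j));
}
var fixedValues = fixed.map(function (f) { return f(results[0]); });
console.log(fixedValues); // [ 1, 2, 3 ]
```

This would also explain why calling .all() right after creating a group appeared to help: the accessor then runs while curIndex still holds the intended value.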

Resources