Year over Year Stats from a Crossfilter Dataset - dc.js

Summary
I want to compute Year over Year (YoY) stats in a dashboard driven by Crossfilter and dc.js.
Year over Year (YoY) Definition
2017 YoY is the total units in 2017 divided by the total units in 2016.
Details
I'm using dc.js (and therefore D3.js and Crossfilter) to create an interactive dashboard whose charts can also be used to filter the data it's rendering.
My data is wider (it's sales data with about 6 other attributes besides date and quantity, such as size and color), but it boils down to objects like:
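[ // dates are Date objects; months are 0-based in new Date()
  { date: new Date(2017, 11, 7), quantity: 56, color: 'blue' /* ... */ },
  { date: new Date(2017, 1, 17), quantity: 104, color: 'red' /* ... */ },
  { date: new Date(2016, 11, 7), quantity: 60, color: 'red' /* ... */ },
  { date: new Date(2016, 3, 15), quantity: 6, color: 'blue' /* ... */ },
  { date: new Date(2017, 1, 17), quantity: 10, color: 'green' /* ... */ },
  { date: new Date(2016, 11, 7), quantity: 12, color: 'green' /* ... */ }
  // ...
]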
I'm displaying one row chart per attribute so that you can see the totals by color, size, etc. People use these charts to see the totals by each attribute and to drill into the data by filtering on just a color, a color and a size, just a size, etc. This setup is (relatively) straightforward and is exactly what dc.js is made for.
However, I'd now like to add some YoY stats: a bar chart with the years on the x-axis and the YoY values on the y-axis (e.g. YoY-2019 = Units-2019 / Units-2018). I'd also like to do the same by quarter and by month, so that I could see YoY Mar-2019 = Units-Mar-2019 / Units-Mar-2018 (and likewise for quarters).
I have a year dimension and a group that sums the quantity:
var yearDim = crossfilterObject.dimension(_ => _.date.getFullYear());
var quantityGroup = yearDim.group().reduceSum(_ => _.quantity);
I can't figure out how to do the Year over Year calc though in the nice, beautiful DC.js-way.
Attempted Solutions
Year+1
Add another dimension that's year + 1. I didn't really get any further, though, because all I got out of it were two dimensions whose year groups I want to divide ... and I'm not sure how.
var yearPlusOneDim = crossfilterObject.dimension(_ => _.date.getFullYear() + 1);
Visually, I can graph the two separately, and I know conceptually what I want to do: divide the 2017 number in yearDim by the 2017 number in yearPlusOneDim (which is really the 2016 number). But "as a concept" is as far as I got on this one.
Abandon DC Graphing
I could always use the yearDim's quantity group to get the array of values, which I could then feed into a normal D3.js graph.
var annualValues = quantityGroup.all();
console.log(annualValues);
// output = [{key: 2016, value: 78}, {key: 2017, value: 170}]
// example data from the limited rows listed above
But this feels like a hacky solution that's bound to break and wouldn't benefit from dc.js's fast, dynamic updating.

I'd use a fake group to solve this in one pass.
As @Ethan says, you could also use a value accessor, but then you'd have to look up the previous year each time a value is accessed, so you'd probably have to keep an extra table around. With a fake group, you only need this table inside the body of your .all() function.
Here's a quick sketch of what the fake group might look like:
function yoy_group(group) {
  return {
    all: function() {
      // index all values by date
      var bydate = group.all().reduce(function(p, kv) {
        p[kv.key.getTime()] = kv.value;
        return p;
      }, {});
      // for any key/value pair which had a value one year earlier,
      // produce a new pair with the ratio between this year and last
      return group.all().reduce(function(p, kv) {
        var date = d3.timeYear.offset(kv.key, -1);
        if(bydate[date.getTime()])
          p.push({key: kv.key, value: kv.value / bydate[date.getTime()]});
        return p;
      }, []);
    }
  };
}
The idea is simple: first index all the values by date. Then when producing the array of key/value pairs, look each one up to see if it had a value one year earlier. If so, push a pair to the result (otherwise drop it).
This should work for any date-keyed group where the dates have been rounded.
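The question's yearDim is keyed by plain year numbers (from getFullYear()), not Date objects, so the d3.timeYear.offset() and getTime() calls above don't apply to it directly. A minimal variant, assuming integer year keys, simply looks up key - 1. The function name yoyByYearNumber and the stub group are hypothetical; the stub only stands in for a crossfilter group and returns the totals from the question's sample rows.

```javascript
// Sketch: YoY ratio fake group for a group keyed by integer years (e.g. 2016, 2017).
// Assumes group.all() returns [{key: year, value: total}, ...] like a crossfilter group.
function yoyByYearNumber(group) {
  return {
    all: function() {
      // Index the current totals by year
      var byYear = group.all().reduce(function(p, kv) {
        p[kv.key] = kv.value;
        return p;
      }, {});
      // Keep only years whose previous year has a non-zero total
      return group.all().reduce(function(p, kv) {
        var prev = byYear[kv.key - 1];
        if (prev) {
          p.push({key: kv.key, value: kv.value / prev});
        }
        return p;
      }, []);
    }
  };
}

// Hypothetical stand-in for yearDim.group().reduceSum(d => d.quantity)
var stubGroup = {
  all: function() {
    return [{key: 2016, value: 78}, {key: 2017, value: 170}];
  }
};

console.log(yoyByYearNumber(stubGroup).all());
```

In the dashboard you could then pass yoyByYearNumber(quantityGroup) to a bar chart's .group(), with yearDim as its dimension. Because .all() is recomputed each time dc.js redraws, the ratios follow the current filters.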
Note the use of Array.reduce in a couple of places. This is the spiritual ancestor of crossfilter's group.reduce - it takes a function which has the same signature as the reduce-add function, and an initial value (not a function) and produces a single value. Instead of reacting to changes like the crossfilter one does, it just loops over the array once. It's useful when you want to produce an object from an array, or produce an array of different size from the original.
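A tiny standalone illustration of those two uses of Array.reduce, with made-up key/value pairs (plain JavaScript, no crossfilter needed):

```javascript
// Array.prototype.reduce(callback, initialValue): the callback receives the
// accumulator and the current element, much like a crossfilter reduce-add function.
var pairs = [{key: 'a', value: 1}, {key: 'b', value: 0}, {key: 'c', value: 3}];

// 1. Produce an object from an array: a key -> value lookup table.
var lookup = pairs.reduce(function(p, kv) {
  p[kv.key] = kv.value;
  return p;
}, {});

// 2. Produce an array of a different size: drop pairs whose value is zero.
var nonZero = pairs.reduce(function(p, kv) {
  if (kv.value) p.push(kv);
  return p;
}, []);

console.log(lookup);         // { a: 1, b: 0, c: 3 }
console.log(nonZero.length); // 2
```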
Also, when indexing an object by a date, I use Date.getTime() to fetch the numeric representation of the date. Otherwise the date coerces to a string representation which may not be exact. Probably for this application it would be okay to skip .getTime() but I'm in the habit of always comparing dates exactly.
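A quick standalone check of why getTime() is the safer key: the default string form of a Date has no milliseconds, so two distinct dates can collide as object keys.

```javascript
// Two Date objects one millisecond apart
var d1 = new Date(0);
var d2 = new Date(1);

var byString = {};
byString[d1] = 'first';  // the key is String(d1), which omits milliseconds
byString[d2] = 'second'; // same string, so this overwrites 'first'

var byTime = {};
byTime[d1.getTime()] = 'first';  // key "0"
byTime[d2.getTime()] = 'second'; // key "1"

console.log(Object.keys(byString).length); // 1
console.log(Object.keys(byTime).length);   // 2
```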
Demo fiddle of YoY trade volume in the data set used by the stock example on the main dc.js page.

I've rewritten @Gordon's code below. All the credit for the solution (answered above) goes to him; I've just written down my own, much more verbose version of the code and of the explanation (likely only useful for beginners like me) to retrace how I got from a near-nothing starting point to @Gordon's really clever answer.
var yoyGroup = function(group) {
  return { all: function() {
    // For every key-value pair in the group, index its value by its time-value
    var valuesByDate = group.all().reduce(function(lookup, thisKeyValuePair) {
      lookup[thisKeyValuePair.key.getTime()] = thisKeyValuePair.value;
      return lookup;
    }, {}); // a plain object used as a lookup table
    return group.all().reduce(function(newAllArray, thisKeyValuePair) {
      var dateLastYear = d3.timeYear.offset(thisKeyValuePair.key, -1);
      if (valuesByDate[dateLastYear.getTime()]) {
        newAllArray.push({
          key: thisKeyValuePair.key,
          value: thisKeyValuePair.value / valuesByDate[dateLastYear.getTime()] - 1
        });
      }
      return newAllArray;
    }, []); // closing reduce() and a function(...)
  }}; // closing the return object & a function
};
¿Why are we overriding the all() function?
When dc.js draws a chart from a group, the only group method most charts (including the bar chart here) call is all(). So if we want to customize a group for a dc.js chart, we only have to provide that one function: all().
¿What does the all() function need to return?
A group's all function must return an array of objects and each object must have two properties: key & value.
¿So what exactly are we doing here?
We're starting with an existing group that shows some values over time (Important Assumption: keys are Date objects) and then creating a wrapper around it, so that we can take advantage of the work Crossfilter has already done to aggregate at a certain level (e.g. year, month).
We start by using reduce to turn the array of objects into a simple lookup table (valuesByDate) in which each key, as a timestamp, maps directly to its value. This makes it easy to look up values by key.
before / output structure of group.all()
[ {key: k1, value: v1},
  {key: k2, value: v2},
  {key: k3, value: v3}
]
after / valuesByDate (each key is the date's getTime() value)
{ k1: v1,
  k2: v2,
  k3: v3
}
Then we move on to creating the correct all() structure again: an array of objects, each of which has a key and a value property. We start with the existing group's all() array (once again), but this time we have our valuesByDate lookup table, which makes it easy to look up other dates.
So we iterate (via reduce) over the original group.all() output and, for each date, check whether valuesByDate has an entry from one year earlier (valuesByDate[dateLastYear.getTime()]). (We use getTime() so that we index by simple integers rather than by Date objects.) If there is a value from one year earlier (a missing or zero value fails the if check, which also avoids dividing by zero), we add a key-value object to our soon-to-be-returned array. Its key is the current key (date), and its value is the "now" value (thisKeyValuePair.value) divided by the value from one year earlier (valuesByDate[dateLastYear.getTime()]).
Lastly, we subtract 1 so that the result is YoY growth (the most common definition of YoY) rather than the plain ratio. For example, if this year = 110 and last year = 100, then YoY = 110/100 - 1 = +10%.
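The question also asks for month-level YoY (e.g. Mar-2019 vs Mar-2018). The d3.timeYear.offset() version already handles that if the group is keyed by month-rounded dates. As a dependency-free sketch, assuming keys are local-time Date objects rounded to the first of the month, the one-year shift can also be done with the native Date constructor. The function name yoyGrowthByMonth and the stub group are hypothetical, and the values are made up.

```javascript
// Sketch: YoY growth (this year / last year - 1) for a month-keyed group.
// Assumes group.all() returns [{key: Date (1st of month, local time), value: number}].
function yoyGrowthByMonth(group) {
  return {
    all: function() {
      var all = group.all();
      // Lookup table: timestamp -> value
      var valuesByDate = all.reduce(function(p, kv) {
        p[kv.key.getTime()] = kv.value;
        return p;
      }, {});
      return all.reduce(function(out, kv) {
        // Same month, one year earlier (native stand-in for d3.timeYear.offset)
        var lastYear = new Date(kv.key.getFullYear() - 1, kv.key.getMonth(), kv.key.getDate());
        var prev = valuesByDate[lastYear.getTime()];
        if (prev) { // skips missing or zero values, avoiding division by zero
          out.push({key: kv.key, value: kv.value / prev - 1});
        }
        return out;
      }, []);
    }
  };
}

// Hypothetical stand-in for a month-keyed crossfilter group
var monthGroup = {
  all: function() {
    return [
      {key: new Date(2018, 2, 1), value: 100}, // Mar 2018
      {key: new Date(2019, 2, 1), value: 110}  // Mar 2019
    ];
  }
};

console.log(yoyGrowthByMonth(monthGroup).all());
```

For Mar 2019 this gives roughly 0.1 (+10%), the same arithmetic as the example above. Because both dates are built with the same local-time constructor, their getTime() values line up exactly.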

Related

I need to find a faster solution to iterate rows in Google Apps Script

I'm trying to save the row values of multiple columns on multiple tabs in GAS, but it's taking a lot of time and I'd like to find a faster way of doing this, if there is one.
Each project (e.g. 'Project1', used as a key) maps to a value: the column where that project is stored. Each tab has 600+ rows to iterate over.
This script first opens a tab called 'Person1' and goes through all the rows of the column that corresponds to each project in the 'projects' dictionary (every tab has the same format, but more projects will be added in the future).
Right now I'm iterating through the 'members' dictionary (length m), then through the 'projects' dictionary (length p), and finally through the rows (length r); meanwhile, the script accesses the other spreadsheet where I want to save all those rows.
This means that the current time complexity of my algorithm is O(mpr), and it's WAY too slow.
For 15 people with 6 projects each, that's at least 15 × 6 × 600 = 54,000 iterations (and more people, projects and rows will be added).
Is there any way to make my algorithm faster?
const members = {'Person1':'P1', 'Person2':'P2'};
const projects = {'Project1':'L','Project2':'R'};
function saveRowValue() {
  let sourceSpreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  let targetSpreadsheet = SpreadsheetApp.openById('-SPREADSHEET-');
  let targetSheet = targetSpreadsheet.getSheetByName('Tracking time');
  let rowsToWrite = [];
  rowsToWrite.push(['Project', 'Initials', 'Date', 'Tracking time']);
  var rowsToSave = 1;
  for (const m in members) {
    Logger.log(m + ' initials:' + members[m]);
    let sourceSheet = sourceSpreadsheet.getSheetByName(m);
    for (const p in projects) {
      let values = sourceSheet.getRange(projects[p] + "1:" + projects[p]).getValues();
      Logger.log(values);
      let list = [null, 0, ''];
      for (var i = 0; i < values.length; i++) {
        try {
          // values[i] is sheet row i + 1 (sheet rows are 1-based)
          let date = sourceSheet.getRange('B' + (i + 1)).getValue();
          let val = sourceSheet.getRange(projects[p] + (i + 1));
          val = Utilities.formatDate(val.getValue(), "GMT", val.getNumberFormat());
          Logger.log(val);
          if (!(list.includes(val)) && date instanceof Date) {
            //rowsToWrite.push();
            rowsToSave++;
            targetSheet.getRange(rowsToSave, 1, 1, 4).setValues([[p, members[m], date, val]]);
          }
        } catch (e) {
          Logger.log(e);
        }
      }
    }
  }
  Logger.log(rowsToWrite);
}
(Screenshot: how long it takes to iterate 600 rows for a single project and a single member, after making the changes Yuri Khristich suggested: https://i.stack.imgur.com/CnRZY.png)
The first step is to get rid of getValue() and setValue()/setValues() calls inside loops. All the data should be read at once as 2D arrays in one step, and written to the sheet in one step as well: no single-cell or single-row operations.
The next trick depends on your workflow. It's unlikely that all 54,000+ cells need to be checked every time; there are probably ranges that haven't changed. If you can find some way to mark the changes, you can process only the changed ranges. The marking could probably be done with an onChange() trigger: for example, you could add * to the names of the sheets and columns where changes have occurred, and remove the * whenever you run your script.
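A minimal sketch of the batch idea, assuming the layout from the question (dates in column B, one column per project). The helper names columnIndex and collectRows, the variable allRows and the sample arrays are hypothetical. The only Apps Script calls it assumes are the standard batch ones: read each member's sheet once with getDataRange().getValues(), and write all rows with one getRange(...).setValues(...) call. The processing itself is plain JavaScript on 2D arrays; it keeps raw cell values instead of reproducing the original's Utilities.formatDate() step.

```javascript
// Convert a column letter such as "B" or "AA" to a 0-based array index.
function columnIndex(letter) {
  var n = 0;
  for (var i = 0; i < letter.length; i++) {
    n = n * 26 + (letter.charCodeAt(i) - 64); // 'A' is char code 65
  }
  return n - 1;
}

// Hypothetical helper: turn one member's sheet values (a 2D array, as returned
// by getDataRange().getValues()) into output rows [project, initials, date, value].
function collectRows(values, initials, projects) {
  var rows = [];
  var dateCol = columnIndex('B');
  var skip = [null, 0, '']; // the same "empty" markers as the original script
  for (var project in projects) {
    var col = columnIndex(projects[project]);
    values.forEach(function(row) {
      var date = row[dateCol];
      var val = row[col];
      if (date instanceof Date && skip.indexOf(val) === -1) {
        rows.push([project, initials, date, val]);
      }
    });
  }
  return rows;
}

// Made-up sample: column A = name, B = date, C = Project1.
var sample = [
  ['Name', 'Date', 'Project1'],           // header: column B is not a Date, skipped
  ['', new Date(2021, 0, 4), '2:30'],
  ['', new Date(2021, 0, 5), ''],         // empty value, skipped
  ['', new Date(2021, 0, 6), '1:00']
];
var out = collectRows(sample, 'P1', {'Project1': 'C'});
console.log(out.length); // 2

// In Apps Script, rows from all members would be concatenated into allRows
// and, if non-empty, written once:
//   targetSheet.getRange(2, 1, allRows.length, 4).setValues(allRows);
```

This replaces roughly m·p·r sheet calls with one read per member and a single write.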
Reference:
Use batch operations

Google Apps Script: Activate and sort rows when cells in given column are not blank

I'm extremely new to Apps Script and am trying to make my first thing: a shopping list.
I want to create a function that will activate and then sort (by column 1, 'Aisle #') all rows that have a value in another given column (column 3, 'Qty'). The idea is to sort the items on that week's list (i.e., those with a value filled in for Qty) by aisle, so that I know the order in which to look for things. I do not want to sort items that are in the spreadsheet but have no value for Qty.
Here is what I've got so far:
var sheet = ss.getActiveSheet()
var range = sheet.getDataRange();
var rangeVals = range.getValues()
function orderList2(){
  if(rangeVals[3] != ""){
    sheet.activate().sort(1, ascending=true);
  };
};
I'm trying to use "if" to define which rows to activate before doing the sort (I don't want to sort the entire sheet, only the items I will be buying that week, i.e., the items with a value in column 3). The script runs but ends up sorting the entire sheet.
The closest thing I could find was an iteration, but when I did it, it ended up only activating the top-left cell.
Any help you can provide would be greatly appreciated!
Cheers,
Nick
Answer:
Use Range.sort() instead of Sheet.sort() if you don't want to sort the entire sheet.
Explanation:
You want to sort the data according to the value in column A (Aisle #), if the corresponding value in C (Qty) is not empty.
If my assumption is correct, the rows where Qty is empty should go below the rest of the data, and they should not be sorted according to their Aisle #.
In this case, I'd suggest the following:
Sort the full range of data (headers excluded) according to Qty, so that the rows without a Qty are placed at the bottom, using Range.sort() (if you don't need to exclude the headers, you can use Sheet.sort() instead).
Use SpreadsheetApp.flush() to apply the sort to the spreadsheet.
Use getValues(), filter() and length to know how many rows in the initial range have their column C populated (variable QtyElements in the sample below).
Using QtyElements, retrieve the range of rows with a non-empty column C, and sort it according to column 1, using Range.sort().
Code sample:
function orderList2() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var firstRow = 2; // Range starts at row 2, header row excluded
  var fullRange = sheet.getRange(firstRow, 1, sheet.getLastRow() - firstRow + 1, sheet.getLastColumn());
  fullRange.sort(3); // Sort full range according to Qty
  SpreadsheetApp.flush(); // Refresh spreadsheet
  var QtyElements = fullRange.getValues().filter(row => row[2] !== "").length;
  if (QtyElements === 0) return; // getRange() needs at least one row
  sheet.getRange(firstRow, 1, QtyElements, sheet.getLastColumn())
    .sort(1); // If not specified, default ascending: true
    //.sort({column: 1, ascending: false}); // Use instead of .sort(1) for a descending sort
}
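If you'd rather do the reordering in memory (a different technique from the Range.sort() approach above), the same two-step logic can be sketched on the 2D array that getValues() returns: rows with a Qty come first, sorted by Aisle #, and rows without a Qty keep their relative order below them. The function name orderShoppingRows and the sample list are hypothetical, and aisle numbers are assumed to be numeric.

```javascript
// Hypothetical helper: order data rows (header excluded) so that rows with a Qty
// (column C, index 2) come first, sorted by Aisle # (column A, index 0),
// followed by rows without a Qty in their original order.
function orderShoppingRows(rows) {
  var withQty = rows.filter(function(row) { return row[2] !== ''; });
  var withoutQty = rows.filter(function(row) { return row[2] === ''; });
  // filter() returns new arrays, so sorting withQty does not mutate the input
  withQty.sort(function(a, b) { return a[0] - b[0]; });
  return withQty.concat(withoutQty);
}

var list = [
  [5, 'Milk', 2],
  [1, 'Bread', ''],
  [2, 'Apples', 6],
  [3, 'Rice', '']
];
console.log(orderShoppingRows(list).map(function(r) { return r[1]; }));
// [ 'Apples', 'Milk', 'Bread', 'Rice' ]
```

In Apps Script you would read the data range (header excluded) with getValues() and write the result back with a single setValues() on the same range. Note that this writes plain values, so any formulas in that range would be replaced.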
Reference:
Range.sort(sortSpecObj)

How to get dynamic field count in dc.js numberDisplay?

I'm currently trying to figure out how to display a count of unique records using dc.js and D3.js.
The data set looks like this:
id,name,artists,genre,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
6DCZcSspjsKoFjzjrWoCd,God's Plan,Drake,Hip-Hop/Rap,0.754,0.449,7,-9.211,1,0.109,0.0332,8.29E-05,0.552,0.357,77.169,198973,4
3ee8Jmje8o58CHK66QrVC,SAD!,XXXTENTACION,Hip-Hop/Rap,0.74,0.613,8,-4.88,1,0.145,0.258,0.00372,0.123,0.473,75.023,166606,4
There are 100 records in the data set, and I would expect the count to display 70 for the count of unique artists.
var ndx = crossfilter(spotifyData);
totalArtists(ndx);
....
function totalArtists(ndx) {
  // Select the artists
  var totalArtistsND = dc.numberDisplay("#unique-artists");
  // Count them
  var dim = ndx.dimension(dc.pluck("artists"));
  var uniqueArtist = dim.groupAll();
  totalArtistsND.group(uniqueArtist).valueAccessor(x => x);
  totalArtistsND.render();
}
I am only getting 100 as a result when I should be getting 70.
Thanks a million, any help would be appreciated
You are on the right track - a groupAll object is usually the right kind of object to use with dc.numberDisplay.
However, dimension.groupAll doesn't use the dimension's key function. Like any groupAll, it looks at all the records and returns one value; the only difference between dimension.groupAll() and crossfilter.groupAll() is that the former does not observe the dimension's filters while the latter observes all filters.
If you were going to use dimension.groupAll, you'd have to write reduce functions that watch the rows as they are added and removed, and keep a count of how many unique artists have been seen. That sounds tedious and possibly buggy.
Instead, we can write a "fake groupAll", an object whose .value() method returns a value dynamically computed according to the current filters.
The ordinary group object already has a unique count: the number of bins. So we can create a fake groupAll which wraps an ordinary group and returns the length of the array returned by group.all():
function unique_count_groupall(group) {
  return {
    value: function() {
      return group.all().filter(kv => kv.value).length;
    }
  };
}
Note that we also have to filter out any bins of value zero before counting.
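A standalone check of the zero-bin filtering, using a hypothetical stub in place of dim.group(). When filters remove all of an artist's rows, a crossfilter group keeps that artist's bin in all(), but its value drops to 0, which is why the count must skip it. The artist values below are made up.

```javascript
// Fake groupAll: .value() returns the number of non-empty bins in a group.
function unique_count_groupall(group) {
  return {
    value: function() {
      return group.all().filter(kv => kv.value).length;
    }
  };
}

// Hypothetical stand-in for dim.group() after some filtering
var stubArtistGroup = {
  all: () => [
    {key: 'Drake', value: 2},
    {key: 'XXXTENTACION', value: 0}, // filtered out: bin kept, value 0
    {key: 'Other Artist', value: 1}
  ]
};

console.log(unique_count_groupall(stubArtistGroup).value()); // 2
```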
Use the fake groupAll like this:
var uniqueArtist = unique_count_groupall(dim.group());
Demo fiddle.
I just added this to the FAQ.

Plotting aggregated data with sub-columns in dc.js

I have data in the form:
data = [..., {id:X,..., turnover:[[2015,2017,2018],[2000000,3000000,2800000]]}, ...];
My goal is to plot the year in the x-axis, against the average turnover for all companies currently selected via crossfilter in the y-axis.
The years recorded per company are inconsistent, but there should always be three years.
If it would help, I can reorganise the data to be in the form:
data = [..., {id:X,..., turnover:{2015:2000000, 2017:3000000, 2018:2800000}}, ...];
Had I been able to reorganise the data further to look like:
[...{id:X, ..., year:2015, turnover:2000000},{id:X,...,year:2017,turnover:3000000},{id:X,...,year:2018,turnover:2800000}];
Then this question would provide a solution.
But splitting the companies into separate rows doesn't make sense with everything else I'm doing.
Unless I'm mistaken, you have what I call a "tag dimension", aka a dimension with array keys.
You want each row to be recorded once for each year it contains, but you only want it to affect this dimension. You don't want to observe the row multiple times in the other dimensions, which is why you don't want to flatten.
With your original data format, your dimension definition would look something like:
var turnoverYearsDim = cf.dimension(d => d.turnover[0], true);
The key function for a tag dimension should return an array, here of years.
This feature is still fairly new, as crossfilter goes, and a couple of minor bugs were found this year. These bugs should be easy to avoid. The feature has gotten a lot of use and no major bugs have been found.
Always beware with tag dimensions, since any aggregations will add up to more than 100% - in your case 300%. But if you are doing averages across companies for a year, this should not be a problem.
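To see why tag-dimension aggregations add up to more than 100%, here is a plain-JavaScript simulation (not crossfilter's implementation) of what a count group on a tag dimension keyed by d => d.turnover[0] produces: each row is counted once per tag in its key array. The two company rows are made up, in the question's original format.

```javascript
// Made-up rows in the question's original format: turnover = [years, values]
var companies = [
  {id: 'A', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]},
  {id: 'B', turnover: [[2016, 2017, 2018], [500000, 700000, 900000]]}
];

// Simulated count group for a tag dimension: every row contributes
// one count to each of its tags (years).
var countsByYear = {};
companies.forEach(function(d) {
  d.turnover[0].forEach(function(year) {
    countsByYear[year] = (countsByYear[year] || 0) + 1;
  });
});

var totalCount = Object.values(countsByYear).reduce((a, b) => a + b, 0);
console.log(countsByYear);                  // { '2015': 1, '2016': 1, '2017': 2, '2018': 2 }
console.log(totalCount / companies.length); // 3, i.e. 300%
```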
pairs of tags and values
What's unique about your problem is that you not only have multiple keys per row, you also have multiple values associated with those keys.
Although the crossfilter tag dimension feature is handy, it gives you no way to know which tag you are looking at when you reduce. Further, the most powerful and general group reduction method, group.reduce(), doesn't tell you which key you are reducing.
But there is one even more powerful way to reduce across the entire crossfilter at once: dimension.groupAll()
A groupAll object behaves like a group, except that it is fed all of the rows, and it returns only one bin. If you use dimension.groupAll() you get a groupAll object that observes all filters except those on that dimension. You can also use crossfilter.groupAll if you want a groupAll that observes all filters.
Here is a solution (using ES6 syntax for brevity) of reduction functions for groupAll.reduce() that reduces all of the rows into an object of year => {count, total}.
function avg_paired_tag_reduction(idTag, valTag) {
  return {
    add(p, v) {
      v[idTag].forEach((id, i) => {
        p[id] = p[id] || {count: 0, total: 0};
        ++p[id].count;
        p[id].total += v[valTag][i];
      });
      return p;
    },
    remove(p, v) {
      v[idTag].forEach((id, i) => {
        console.assert(p[id]);
        --p[id].count;
        p[id].total -= v[valTag][i];
      });
      return p;
    },
    init() {
      return {};
    }
  };
}
It will be fed every row and it will loop over the keys and values in the row, producing a count and total for every key. It assumes that the length of the key array and the value array are the same.
Then we can use a "fake group" to turn the object on demand into the array of {key,value} pairs that dc.js charts expect:
function groupall_map_to_group(groupAll) {
  return {
    all() {
      return Object.entries(groupAll.value())
        .map(([key, value]) => ({key, value}));
    }
  };
}
}
Use these functions like this:
const red = avg_paired_tag_reduction('id', 'val');
const avgPairedTagGroup = turnoverYearsDim.groupAll().reduce(
red.add, red.remove, red.init
);
console.log(groupall_map_to_group(avgPairedTagGroup).all());
Although it's possible to compute a running average, it's more efficient to instead calculate a count and total, as above, and then tell the chart how to compute the average in the value accessor:
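The reduction above reads the years and values as v[idTag] and v[valTag], i.e. two parallel arrays stored directly on each row. With the question's original format, where both arrays sit inside turnover, one option is a variant that takes accessor functions instead of field names. The sketch below (the name avg_paired_accessor_reduction is hypothetical, the rows are made up) feeds rows through add/remove by hand, the way crossfilter would when filters change, then applies the same {key, value} mapping and the total / count average. Skipping zero-count years is an addition of this sketch: once every company for a year is filtered out, total / count would be NaN.

```javascript
// Variant of avg_paired_tag_reduction taking accessors instead of field names.
function avg_paired_accessor_reduction(keysOf, valuesOf) {
  return {
    add(p, v) {
      keysOf(v).forEach((id, i) => {
        p[id] = p[id] || {count: 0, total: 0};
        ++p[id].count;
        p[id].total += valuesOf(v)[i];
      });
      return p;
    },
    remove(p, v) {
      keysOf(v).forEach((id, i) => {
        --p[id].count;
        p[id].total -= valuesOf(v)[i];
      });
      return p;
    },
    init() {
      return {};
    }
  };
}

const red = avg_paired_accessor_reduction(d => d.turnover[0], d => d.turnover[1]);
const rows = [
  {id: 'A', turnover: [[2015, 2017, 2018], [2000000, 3000000, 2800000]]},
  {id: 'B', turnover: [[2015, 2017, 2018], [1000000, 1000000, 1200000]]},
  {id: 'C', turnover: [[2016, 2017, 2018], [400000, 500000, 600000]]}
];

// Simulate crossfilter: add every row, then remove C as a filter would.
let state = red.init();
rows.forEach(r => { state = red.add(state, r); });
state = red.remove(state, rows[2]);

// Same mapping as groupall_map_to_group, then the average from the value accessor.
const averages = Object.entries(state)
  .map(([key, value]) => ({key, value}))
  .filter(kv => kv.value.count > 0) // skip years with no remaining companies
  .map(kv => ({key: kv.key, avg: kv.value.total / kv.value.count}));
console.log(averages);
```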
chart.dimension(turnoverYearsDim)
  .group(groupall_map_to_group(avgPairedTagGroup))
  .valueAccessor(kv => kv.value.total / kv.value.count);
Demo fiddle.

How to create x-axis range groups with Crossfilter and dc.js?

Here is a fiddle to help show what I'd like to do:
http://jsfiddle.net/m4x7o5of/
I have a set of records, each with a float value. For example:
var records = [{name: 'record1', value: 1.34563}, ..., {name: 'record5000', value: 0.62974}];
I'd like to create a barchart in dc.js that plots the records on the x-axis in range buckets, e.g x-number of records with value between 0 and .5, y-number of records between .5 and 1, z-number of records between 1 and 1.5, and so on.
I'm using an ordinal scale so that I can divide the set of records into 5ths, but I can't figure out how to get the records grouped together in the ranges like I described. In the linked fiddle, only the records with a value that matches the plotted ordinals will get displayed right now.
Is it even possible to group the records like this? Any help would be appreciated.
dimension.group takes a function that you can use to derive the group key from each dimension value. So, with a dimension keyed on the value, dimension.group(function(d) { return Math.floor(d); }); will give you group keys of 0, 1, 2, 3, 4, 5, 6, 7, and 8 for your data set. You'll just need to construct a function that returns the keys you want based on the values in your data set. Is that what you're looking to do?
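For the 0.5-wide ranges from the question, such a key function could round each value down to the start of its bucket. A minimal sketch, assuming the dimension is keyed on each record's value and that ndx is the crossfilter instance (both names are assumptions); the helper bucketStart is hypothetical:

```javascript
// Round a value down to the start of its bucket, e.g. width 0.5:
// 0.62974 -> 0.5, 1.34563 -> 1.
function bucketStart(value, width) {
  return Math.floor(value / width) * width;
}

// With crossfilter this would be used as the group key function, e.g.
//   var valueDim = ndx.dimension(d => d.value);
//   var bucketGroup = valueDim.group(v => bucketStart(v, 0.5));
console.log([0.2, 0.62974, 1.34563, 1.5].map(v => bucketStart(v, 0.5)));
// [ 0, 0.5, 1, 1.5 ]
```

A width of 0.5 multiplies out exactly in floating point; with widths such as 0.1 the bucket keys may carry small rounding noise, so rounding the result (or using integer bucket indices as keys) can help.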
