Dimensional Charting with Non-Exclusive Attributes - dc.js

The following is a schematic, simplified table showing HTTP transactions. I'd like to build an analysis of it using dc.js, but some of the columns don't map well to crossfilter.
In the setting of this question, all HTTP transactions have the fields time, host, requestHeaders, responseHeaders, and numBytes. However, different transactions have different specific HTTP request and response headers. In the table above, 0 and 1 represent the absence and presence, respectively, of a specific header in a specific transaction. The sub-columns of requestHeaders and responseHeaders represent the union of the headers present across the transactions. Different HTTP transaction datasets will almost surely generate different sub-columns.
For this question, a row in this table is represented in code like this:
{
"time": 0,
"host": "a.com",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
"numBytes": 12
}
The time, host, and numBytes fields all translate easily into crossfilter, so it's possible to build charts answering questions like "what was the total number of bytes seen in transactions between times 2 and 4 for host a.com?" For example:
var ndx = crossfilter(data);
...
var hostDim = ndx.dimension(function(d) {
return d.host;
});
var hostBytes = hostDim.group().reduceSum(function(d) {
return d.numBytes;
});
The problem is that, for all slices of time and host, I'd like to show (capped) bar charts of the (leading) request and response headers by bytes. E.g. (see the first row), for time 0 and host a.com, the request headers bar chart should show that bar and baz each have 12.
There are two problems, a minor one and a major one.
Minor Problem
This doesn't fit naturally into dc, because it's one-directional: these bar charts should be updated when the other charts are filtered, but they can't be used for filtering themselves. For example, you shouldn't be able to select bar, deselect baz, and look at the resulting breakdown of hosts by bytes, because what would that mean? Hosts in the transactions that have bar but don't have baz? Hosts in the transactions that have bar and may or may not have baz? It's too unintuitive.
How can I make some dc charts one-directional? Is it through some hack such as disabling mouse input?
Major Problem
As opposed to host, foo and bar are non-exclusive. Each transaction's host is either something or the other, but a transaction's headers might include any combination of foo and bar.
How, then, can I define crossfilter dimensions for requestHeaders, and how can I use dc? That is:
var ndx = crossfilter(data);
...
var requestHeadersDim = ndx.dimension(function(d) {
// What should go here?
});

The way I usually deal with the major problem you describe is to transform the data so that there is a separate record for each header (all other fields in these duplicate records are the same). Then I use custom group aggregations to avoid double-counting. These custom aggregations are a bit hard to manage, so I built Reductio to help with this, using its 'exception' function: github.com/esjewett/reductio
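A minimal sketch of that approach in plain JavaScript, without crossfilter or Reductio. The helper names `explodeByHeader` and `exceptionSum` are hypothetical, and the reducer only imitates the idea behind exception aggregation (count each transaction once per bin); it is not Reductio's API. The three functions returned by `exceptionSum()` have the add/remove/initial shape that crossfilter's `group.reduce` expects.

```javascript
// Hypothetical helper: one record per (transaction, present header) pair.
// All other fields are copied; `id` remembers which transaction a row came from.
// Note: a transaction with no header set to 1 produces no rows at all.
function explodeByHeader(transactions, field) {
  const rows = [];
  transactions.forEach((t, id) => {
    Object.keys(t[field]).forEach((header) => {
      if (t[field][header]) {
        rows.push({ ...t, id, header });
      }
    });
  });
  return rows;
}

// Exception-style sum (an imitation of the idea, not Reductio's API):
// numBytes is added once per distinct transaction id in the bin, however
// many of that transaction's exploded rows are currently included.
function exceptionSum() {
  return {
    add(p, v) {
      const n = p.seen.get(v.id) || 0;
      if (n === 0) p.total += v.numBytes;
      p.seen.set(v.id, n + 1);
      return p;
    },
    remove(p, v) {
      const n = p.seen.get(v.id);
      if (n === 1) {
        p.seen.delete(v.id);
        p.total -= v.numBytes;
      } else {
        p.seen.set(v.id, n - 1);
      }
      return p;
    },
    init() {
      return { seen: new Map(), total: 0 };
    },
  };
}

// Usage with the two example transactions from this page.
const transactions = [
  { time: 0, host: "a.com", requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 12 },
  { time: 1, host: "b.org", requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 3 },
];
const rows = explodeByHeader(transactions, "requestHeaders");
const naiveTotal = rows.reduce((s, row) => s + row.numBytes, 0); // double-counts
const r = exceptionSum();
let p = r.init();
rows.forEach((row) => { p = r.add(p, row); });
console.log(rows.length, naiveTotal, p.total); // 4 30 15
```

In a crossfilter setup, the exploded rows would be the data: a header chart could group on `d.header` with a plain `reduceSum` on numBytes, while charts on other dimensions (such as host) would use `group().reduce(r.add, r.remove, r.init)` and read `.total`.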

I hacked it (efficiently, but very inelegantly) after looking at dc's source code. It's possible to distort the meaning of crossfilter to achieve the desired effect.
The final result is in this fiddle. It is slightly more limited than the question, as the fields of requestHeaders are hardcoded to foo, bar, and baz. Removing this restriction is a matter of plain JavaScript.
Minor Problem
Using a simple CSS hack, I defined
.avoid-clicks {
pointer-events: none;
}
and gave the chart's div this class. Inelegant but effective.
Major Problem
The major problem is solved by distorting the meaning of crossfilter concepts, and "fooling" dc.
Let's say the data looks like this:
var transactions = [
{
"time": 0,
"host": "a.com",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
"numBytes": 12
},
{
"time": 1,
"host": "b.org",
"requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
"responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 1},
"numBytes": 3
},
...
];
We can define a "dummy" dimension, which ignores the data:
var transactionsNdx = crossfilter(transactions);
var dummyDim = transactionsNdx
.dimension(function(d) {
return 0;
});
Using this dimension, we can define a group that sums the foo, bar, and baz bytes of the filtered rows:
var requestHeadersGroup = dummyDim
.group()
.reduce(
/* callback for when data is added to the current filter results */
function (p, v) {
return {
"foo": p.foo + v.requestHeaders.foo * v.numBytes,
"bar": p.bar + v.requestHeaders.bar * v.numBytes,
"baz": p.baz + v.requestHeaders.baz * v.numBytes,
}
},
/* callback for when data is removed from the current filter results */
function (p, v) {
return {
"foo": p.foo - v.requestHeaders.foo * v.numBytes,
"bar": p.bar - v.requestHeaders.bar * v.numBytes,
"baz": p.baz - v.requestHeaders.baz * v.numBytes,
}
},
/* initialize p */
function () {
return {
"foo": 0,
"bar": 0,
"baz": 0
}
}
);
Note that this isn't a proper crossfilter group at all. It does not map keys (the header names) to their values. Rather, it maps the single key 0 to an object which itself maps header names to byte totals (ugly!). We therefore need to transform this group into something that actually looks like a crossfilter group:
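var getSortedFromGroup = function() {
  // the dummy group has a single bin (key 0) whose value holds the per-header totals
  var totals = requestHeadersGroup.all()[0].value;
  var rows = [
    { "key": "foo", "value": totals.foo },
    { "key": "bar", "value": totals.bar },
    { "key": "baz", "value": totals.baz }
  ];
  // largest totals first, the same order crossfilter's group.top() uses
  return rows.sort(function(lhs, rhs) {
    return rhs.value - lhs.value;
  });
};
var requestHeadersDisplayGroup = {
  // capped charts ask for the k leading headers
  "top": function(k) {
    return getSortedFromGroup().slice(0, k);
  },
  "all": function() {
    return getSortedFromGroup();
  }
};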
We can now create a regular dc chart and pass it the adaptor group requestHeadersDisplayGroup. From this point on, it works normally.
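Both the reducer and the adaptor can be written without hardcoding foo, bar, and baz. This is a plain-JavaScript sketch under stated assumptions: `makeHeaderReducer` and `makeDisplayGroup` are hypothetical names, and `stubGroup` is a stand-in for `requestHeadersGroup` that mimics only the single-bin `.all()` output, so the snippet runs without crossfilter. With crossfilter, the reducer would plug in as `dummyDim.group().reduce(hr.add, hr.remove, hr.init)` and the adaptor as `makeDisplayGroup(requestHeadersGroup)`.

```javascript
// Hypothetical generalization of the hardcoded reducer:
// header names are read from each record instead of being listed by hand.
function makeHeaderReducer(field) {
  return {
    // record enters the filter results: add its bytes to each header it carries
    add(p, v) {
      Object.keys(v[field]).forEach((h) => {
        p[h] = (p[h] || 0) + v[field][h] * v.numBytes;
      });
      return p;
    },
    // record leaves the filter results: subtract the same amounts
    remove(p, v) {
      Object.keys(v[field]).forEach((h) => {
        p[h] = (p[h] || 0) - v[field][h] * v.numBytes;
      });
      return p;
    },
    // empty totals; header keys are created as records are added
    init() {
      return {};
    },
  };
}

// Hypothetical generic adaptor: turns the single bin { key: 0, value: { header: bytes } }
// into { key: header, value: bytes } rows, largest first.
function makeDisplayGroup(sourceGroup) {
  const sorted = () => {
    const totals = sourceGroup.all()[0].value;
    return Object.keys(totals)
      .map((h) => ({ key: h, value: totals[h] }))
      .sort((lhs, rhs) => rhs.value - lhs.value);
  };
  return { all: sorted, top: (k) => sorted().slice(0, k) };
}

// The two example transactions from this answer (request headers only).
const transactions = [
  { requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 12 },
  { requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 3 },
];
const hr = makeHeaderReducer("requestHeaders");
let totals = hr.init();
transactions.forEach((t) => { totals = hr.add(totals, t); });

// Stand-in for requestHeadersGroup: exposes only .all() with one bin.
const stubGroup = { all: () => [{ key: 0, value: totals }] };
const display = makeDisplayGroup(stubGroup);
console.log(display.all().map((kv) => kv.key + "=" + kv.value)); // [ 'bar=15', 'baz=15', 'foo=0' ]

// Simulate the second transaction being filtered out.
totals = hr.remove(totals, transactions[1]);
console.log(display.top(1).map((kv) => kv.key + "=" + kv.value)); // [ 'bar=12' ]
```

The reducers mutate and return the same totals object, which crossfilter's `reduce` permits; the original answer's version returns fresh objects instead, and either style works.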

Related

Buffer elements based on its contents with comparer function

In RxJS, how can I buffer values so that the buffer is flushed when the next element differs from the previous one? If elements are the same according to some comparator, they should be buffered until the next change is detected.
Suppose I have these elements:
{ t: 10, price:12 },
{ t: 10, price:13 },
{ t: 10, price:14 },
{ t: 11, price:12 },
{ t: 11, price:13 },
{ t: 10, price:14 },
{ t: 10, price:15 },
Elements are the same if their t value equals the previous element's t value, so as output I want these buffers:
[ { t: 10, price:12 }, { t: 10, price:13}, { t: 10, price:14} ],
[ { t: 11, price:12}, { t: 11, price:13} ],
[ { t: 10, price:14 }, { t: 10, price:15 } ]
So the result should be three emitted buffers, each containing consecutive objects with the same t.
I tried bufferWhen and plain buffer, but I don't know how to specify the closingNotifier in this case, because it needs to depend on the incoming elements. Can anyone help?
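TL;DR
import { from } from 'rxjs';
import { bufferWhen, delay, distinctUntilKeyChanged, share, skip } from 'rxjs/operators';
const items = [
  { t: 10, price: 12 },
  { t: 10, price: 13 },
  { t: 10, price: 14 },
  { t: 11, price: 12 },
  { t: 11, price: 13 },
  { t: 10, price: 14 },
  { t: 10, price: 15 }
];
// share() lets src$ act both as the buffered stream and as the notifier's source
const src$ = from(items).pipe(
  delay(0),
  share()
);
// emits whenever t changes, except for the very first item
// (the config argument of share requires RxJS 7)
const closingNotifier$ = src$.pipe(
  distinctUntilKeyChanged('t'),
  skip(1),
  share({ resetOnRefCountZero: false })
);
src$.pipe(bufferWhen(() => closingNotifier$)).subscribe(console.log);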
StackBlitz demo.
Detailed explanation
The tricky part was to determine the closingNotifier because, as you said, it depends on the values that come from the stream. My first thought was that src$ has to play 2 different roles: 1) the stream which emits values and 2) the closingNotifier for a buffer operator. This is why the share() operator is used:
const src$ = from(items).pipe(
delay(0),
share()
);
delay(0) is also used because the source's items are emitted synchronously. Since the source is subscribed twice (once as the stream itself and once through the closingNotifier), it's important that both subscribers receive the values. If delay(0) were omitted, only the first subscriber would receive the items; the second one would receive nothing, because it is registered after all of the source's items have already been emitted. With delay(0), we ensure that both subscribers (the first one from the subscribe call, the second one being the inner subscriber of the closingNotifier) are registered before the source emits any value.
Onto closingNotifier:
const closingNotifier$ = src$.pipe(
distinctUntilKeyChanged('t'),
skip(1),
share({ resetOnRefCountZero: false })
);
distinctUntilKeyChanged('t') is used because the signal that the buffer should emit its accumulated items is the arrival of an item with a different t value.
skip(1) is used because the very first value from the stream always passes distinctUntilKeyChanged. Without skip(1), that first value, arriving right after the first subscription to the closingNotifier, would make the buffer emit immediately, which is not what we want, because it belongs to the first batch of items.
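To see which items make the notifier fire, here is a plain-array simulation of `distinctUntilKeyChanged('t')` followed by `skip(1)`, with no RxJS involved. `closingIndices` is a hypothetical helper name; it only reports the positions of the items that would make closingNotifier$ emit, that is, the first item of every t-run after the first one.

```javascript
// Plain-array simulation of src$.pipe(distinctUntilKeyChanged('t'), skip(1)):
// returns the indices of the items that would make closingNotifier$ emit.
function closingIndices(items, key) {
  const changes = [];
  items.forEach((item, i) => {
    // distinctUntilKeyChanged: an item passes only if its key differs from the previous one
    if (i === 0 || item[key] !== items[i - 1][key]) changes.push(i);
  });
  // skip(1): the very first item always passes the distinct check, so drop it
  return changes.slice(1);
}

const items = [
  { t: 10, price: 12 },
  { t: 10, price: 13 },
  { t: 10, price: 14 },
  { t: 11, price: 12 },
  { t: 11, price: 13 },
  { t: 10, price: 14 },
  { t: 10, price: 15 },
];
console.log(closingIndices(items, "t")); // [ 3, 5 ]
```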
share({ resetOnRefCountZero: false }) is the interesting part. As you've seen, we're using bufferWhen(() => closingNotifier$) instead of buffer(closingNotifier$). That is because buffer first subscribes to the source and then to the notifier, which complicates the situation a bit, so I decided to go with bufferWhen, which subscribes to the notifier first and then to the source. The problem with bufferWhen is that it resubscribes to the closingNotifier each time the notifier emits. That is why we need share: we don't want to repeat the logic for the first batch of items (the skip operator) once some items have already gone through. However, plain share() (without the resetOnRefCountZero option) would still resubscribe to its source after each emission, because that is its default behavior when its inner Subject is left without subscribers. Setting resetOnRefCountZero: false solves this: when a new subscriber arrives after the inner Subject had been left without subscribers, share does not resubscribe to the source.
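For comparison, the same consecutive-run grouping on a plain, already complete array needs no notifier at all. `bufferByKeyChange` is a hypothetical helper; it is a synchronous sketch for checking the expected output, not a replacement for the streaming RxJS version.

```javascript
// Group consecutive items whose `key` values are equal; a new group starts
// whenever the key differs from the previous item's key.
function bufferByKeyChange(items, key) {
  const buffers = [];
  let current = [];
  for (const item of items) {
    if (current.length > 0 && current[current.length - 1][key] !== item[key]) {
      buffers.push(current); // key changed: flush the finished run
      current = [];
    }
    current.push(item);
  }
  // flush the last run, as bufferWhen does when the source completes
  if (current.length > 0) buffers.push(current);
  return buffers;
}

const items = [
  { t: 10, price: 12 },
  { t: 10, price: 13 },
  { t: 10, price: 14 },
  { t: 11, price: 12 },
  { t: 11, price: 13 },
  { t: 10, price: 14 },
  { t: 10, price: 15 },
];
console.log(bufferByKeyChange(items, "t").map((b) => b.length)); // [ 3, 2, 2 ]
```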

How to normalize Recharts ComposedChart with one dataKey significantly greater than other

I'm wondering if it's possible to "normalize" dataset values for multiple Recharts components (let's say Area and Bar).
The problem is that my dataset is a monthly stats chart of order income and order amount, and one value is significantly greater than the other.
Example data looks like this:
{
income: 5050,
amount: 3,
},
{
income: 8600,
amount: 5,
},
Component (simplified)
<ComposedChart data={daysData}>
<Area dataKey="income" />
<Bar dataKey="amount" />
</ComposedChart>
As a result, the bars are barely visible on the chart.
I would like the bars to take up, for example, half of the container height, even though the original values are low.
I could manually multiply amount by 1000 and somehow transform the tooltip values, but this is not a stable solution, because amounts could be in the hundreds of thousands or even millions.
As a temporary solution, I wrote simple normalizing functions. It works but doesn't seem right.
Is there anything similar in functionality I could do with native Recharts props?
const daysDataNormalized = useMemo(() => {
const greatestIncome = daysData.reduce((acc, x) => (acc.income > x.income ? acc : x)).income;
const greatestAmount = daysData.reduce((acc, x) => (acc.amount > x.amount ? acc : x)).amount;
const multiplier = greatestIncome / greatestAmount / 2;
return daysData.map((stat) => ({
name: stat.name,
income: stat.income,
amount: stat.amount * multiplier,
amountTooltip: stat.amount,
}));
}, [daysData]);
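const TooltipFormater = (value, name, props) => {
  // "amount" was rescaled for display, so show the raw value kept in amountTooltip
  if (name === 'amount') {
    return props.payload.amountTooltip;
  }
  // "income" is not rescaled, so its value can be shown as-is
  if (name === 'income') {
    return value;
  }
  return value;
};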
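The normalization step can be pulled out of `useMemo` into a pure function, which makes it easy to check in isolation. A sketch, assuming the goal stated above (the tallest bar at a fixed fraction of the largest income); `normalizeAmounts` and its `fraction` parameter are hypothetical names, `Math.max` replaces the two `reduce` calls, and the original's division by 2 corresponds to `fraction = 0.5`.

```javascript
// Scale `amount` so that its maximum lands at `fraction` of the maximum `income`,
// keeping the raw value in `amountTooltip` for display in the tooltip.
function normalizeAmounts(daysData, fraction = 0.5) {
  const maxIncome = Math.max(...daysData.map((d) => d.income));
  const maxAmount = Math.max(...daysData.map((d) => d.amount));
  // guard against division by zero when every amount is 0
  const multiplier = maxAmount === 0 ? 0 : (maxIncome / maxAmount) * fraction;
  return daysData.map((d) => ({
    ...d,
    amount: d.amount * multiplier,
    amountTooltip: d.amount,
  }));
}

// The example data from the question.
const daysData = [
  { income: 5050, amount: 3 },
  { income: 8600, amount: 5 },
];
console.log(normalizeAmounts(daysData).map((d) => d.amount)); // [ 2580, 4300 ]
```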

How to show scatter plot on specific condition which I set using dc.js

I want a scatter plot composed with a line chart, but I only want the scatter plot to show when the value is not zero.
I have data like the following; val1 ranges from 0 to 100, and val2 is always -1, 0, or 1:
[
{
val1: 10,
val2: 0
},
{
val1: 20,
val2: 1
},
{
val1: 30,
val2: -1
},
{
val1: 40,
val2: -1
},
{
val1: 50,
val2: 1
},
{
val1: 60,
val2: 0
},
{
val1: 70,
val2: 0
},
{
val1: 80,
val2: 1
},
{
val1: 90,
val2: 1
},
{
val1: 100,
val2: 1
}
]
I want to show the line chart of val1 every tick and I want to put a scatter plot on top of this line when val2 is -1 or 1, not 0. The scatter plot should be colored by the value.
How can I do it?
This is another of those places where a "fake group" can come in handy, because we're both transforming a group (by coloring the dots), and omitting some points.
(Despite the ugly name for this pattern, it's quite powerful to do live transformations of the data after it's been aggregated, and this technique will probably shape future versions of dc.js.)
Crossfilter on indices
First though, we have to use another unusual technique in order to deal with this data, which has no field which corresponds to the X axis. This may or may not come up in your actual data.
We'll define the crossfilter data as the range of indices within the data, and the dimension key as the index:
var ndx = crossfilter(d3.range(experiments.length)),
    dim = ndx.dimension(function(i) { return i; });
Now whenever we read data, we'll need to use the index to look up the original array. So the first group (for the line chart) can be defined like this:
var group1 = dim.group().reduceSum(function(i) { return experiments[i].val1; });
Transforming and filtering
Now we get to the heart of the question: how to produce another group which has colored dots for the non-zero val2 values.
Following the "fake group" pattern, we'll create a function which, given a group, produces a new object with a .all() method. The method pulls the data from the first group and transforms it.
function keep_nonzeros(group, field2) {
return {
all: function() {
return group.all().map(function(kv) {
return {
key: kv.key,
value: {
y: kv.value,
color: experiments[kv.key][field2]
}
}
}).filter(function(kv) {
return kv.value.color != 0
})
}
}
}
I chose to first transform the data by adding the color field to the value with .map(), and then filter out the zeros with .filter(). Most "fake groups" use one or both of these handy Array methods.
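The fake group can be exercised without dc.js or crossfilter. In this sketch the first five rows of the question's data stand in for `experiments`, and `stubGroup1` is a hypothetical stand-in that mimics only `group1.all()` (key = index, value = summed val1); `keep_nonzeros` itself is the function from this answer.

```javascript
// First five rows of the question's data, standing in for `experiments`.
const experiments = [
  { val1: 10, val2: 0 },
  { val1: 20, val2: 1 },
  { val1: 30, val2: -1 },
  { val1: 40, val2: -1 },
  { val1: 50, val2: 1 },
];

// Stand-in for group1 (assumption: mimics only .all(), one bin per index).
const stubGroup1 = {
  all: () => experiments.map((e, i) => ({ key: i, value: e.val1 })),
};

// keep_nonzeros as defined in the answer
function keep_nonzeros(group, field2) {
  return {
    all: function() {
      return group.all().map(function(kv) {
        return {
          key: kv.key,
          value: { y: kv.value, color: experiments[kv.key][field2] }
        };
      }).filter(function(kv) {
        return kv.value.color != 0;
      });
    }
  };
}

console.log(keep_nonzeros(stubGroup1, "val2").all().map((kv) => kv.key)); // [ 1, 2, 3, 4 ]
```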
Building the composite
Now we can build a composite chart using a line chart and a scatter plot:
chart
.width(600)
.height(400)
.x(d3.scale.linear())
.xAxisPadding(0.25).yAxisPadding(5)
.elasticX(true)
.compose([
dc.lineChart(chart).group(group1),
dc.scatterPlot(chart).group(keep_nonzeros(group1, 'val2'))
// https://github.com/dc-js/dc.js/issues/870
.keyAccessor(function(kv) { return kv.key; })
.valueAccessor(function(kv) { return kv.value.y; })
.colorAccessor(function(kv) { return kv.value.color; })
.colors(d3.scale.ordinal().domain([-1,1]).range(['red', 'black']))
]);
Most of this is boilerplate stuff at this point, but note that we have to set both the key and value accessors for the scatterPlot, because it makes unusual assumptions about the key structure which only matter if you want to do rectangular brushing.
Fiddle: https://jsfiddle.net/gordonwoodhull/6cm8bpym/17/

D3 Zoomable Treemap changing the children accessor

I am trying to use Mike Bostock's zoomable treemap http://bost.ocks.org/mike/treemap/ with one modification. Instead of using nested JSON data, I have a simple mapping from parents to lists of children. I built a function, getChildren(root), that simply returns root's children, or null if root does not have any children.
I have tried replacing all instances of d.children with getChildren(d) in the treemap JavaScript file, but it does not seem to work properly.
The resulting page shows the orange bar as normal up top, but nothing else displays correctly (i.e. there are no rectangles underneath the orange bar, just empty gray space). All the text from the children is mashed up in the top left corner of the empty gray space, so it might be that coordinates are not being assigned correctly.
Any ideas??
Thanks!
It looks like there were a few issues here:
Your data structure doesn't seem to be referencing the child nodes:
var nMap = {};
nMap.papa = {};
nMap.papa["children"] = [];
nMap.papa["children"].push({
"name": "c1"
});
// snip
nMap.c1 = {
size: 5
};
Unless I'm missing something, your getChildren function gets the { name: "c1" } object but never looks up nMap.c1. I'm not exactly certain what your alternative data structure is trying to achieve, but it seems like the most obvious option is to use a flat map of nodes, with children referenced by id, like this:
var nMap = {};
nMap.c1 = {
name: "c1",
value: 5
};
nMap.c2 = {
name: "c2",
value: 5
};
nMap.c3 = {
name: "c3",
value: 5
};
nMap.papa = {
name: "papa",
children: ['c1', 'c2', 'c3']
};
With a structure like this, you can map to the real children in the getChildren function:
function getChildren(par){
var parName = par.name,
childNames = parName in nMap && nMap[parName].children;
if (childNames) {
// look up real nodes
return childNames.map(function(name) { return nMap[name]; });
}
}
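A quick check of the flat-map structure and `getChildren` outside d3. The node map and lookup are the ones from this answer (with `nMap` written as a single literal); the only change is an explicit `return null` for leaves, matching the question's description of `getChildren`.

```javascript
// Flat map of nodes; children are referenced by name.
const nMap = {
  c1: { name: "c1", value: 5 },
  c2: { name: "c2", value: 5 },
  c3: { name: "c3", value: 5 },
  papa: { name: "papa", children: ["c1", "c2", "c3"] },
};

function getChildren(par) {
  const childNames = par.name in nMap && nMap[par.name].children;
  if (childNames) {
    // look up the real node objects for each child name
    return childNames.map((name) => nMap[name]);
  }
  return null; // leaf or unknown node: no children
}

console.log(getChildren(nMap.papa).map((c) => c.value)); // [ 5, 5, 5 ]
console.log(getChildren(nMap.c1)); // null
```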
Your children were using size instead of value to indicate weight, and the rest of the code expected value (so they all had weight 0).
Because you're using the "zoomable" treemap approach, which uses a specialized version of the treemap layout, you don't need to specify the .children accessor of the treemap layout. Instead, use your custom accessor in the custom layout helper:
function layout(d) {
// get the children with your accessor
var children = getChildren(d);
if (children && children.length > 0) {
treemap.nodes({ children: children });
children.forEach(function(c) {
c.x = d.x + c.x * d.dx;
c.y = d.y + c.y * d.dy;
c.dx *= d.dx;
c.dy *= d.dy;
c.parent = d;
layout(c);
});
}
}
Working fiddle here: http://jsfiddle.net/nrabinowitz/WpQCy/

d3 accessor of hierarchical array of objects

I'm feeding data to d3 via json in a format that looks like this:
[
{
"outcome_id":22,
"history":[
{
"time":"2013-05-06T16:38:55+03:00",
"balance_of_power":0.2
},
{
"time":"2013-05-07T00:38:55+03:00",
"balance_of_power":0.2222222222222222
},
{
"time":"2013-05-07T08:38:55+03:00",
"balance_of_power":0.36363636363636365
}
],
"winner":true,
"name":"Pauline"
},
{
"outcome_id":23,
"history":[
{
"time":"2013-05-06T16:38:55+03:00",
"balance_of_power":0.2
},
{
"time":"2013-05-07T00:38:55+03:00",
"balance_of_power":0.1111111111111111
},
{
"time":"2013-05-07T08:38:55+03:00",
"balance_of_power":0.09090909090909091
}
],
"winner":false,
"name":"Romain"
}
]
I use this data to draw both a multiple series line chart (to show the evolution of "balance_of_power" through time) and a donut chart to represent the latest value of "balance_of_power" for all series.
So each top-level array element is an object that has several attributes, one of them being "history", which is itself an array of objects (that have the time and balance_of_power attributes).
A working example can be found here.
To produce the data for the donut chart, I use a function that takes the latest element from each history array (the data is sorted by time) and generates another attribute called "last_balance".
For example the first element becomes:
{
"outcome_id":22,
"history":[...],
"winner":true,
"name":"Pauline",
"last_balance":0.36363636363636365
}
And then I specify the right accessor for the pie layout's value:
pie = d3.layout.pie().value(function(d) { return d.last_balance; })
Now I'd like to get rid of the extra step and change the accessor function so that it reads the value directly from the initial data structure, and can also access the balance_of_power for any time given as an argument.
Is there a way to do that by only modifying the pie layout's value accessor?
EDIT
I changed the .value function to this:
.value(function(d) {
var h = d.history[0];
d.history.forEach(function(elt, i, a) {
console.log("======"); // start debug
console.log("search for:"+selected_time.toString());
console.log("current value:"+d.history[i].time.toString());
console.log("test:"+(d.history[i].time == selected_time));
console.log("======"); // end debug
if(d.history[i].time == selected_time) {
h = d.history[i];
}
});
return h.balance_of_power;
})
But the comparison always fails, even when the values seem to be identical, so the previous code always returns the initial value.
Here's what the javascript console shows for the last iteration:
====== final_balance_donut_chart.js?body=1:11
search for:Thu Jun 06 2013 16:06:00 GMT+0200 (CEST) final_balance_donut_chart.js?body=1:12
current value:Thu Jun 06 2013 16:06:00 GMT+0200 (CEST) final_balance_donut_chart.js?body=1:13
test:false final_balance_donut_chart.js?body=1:14
======
EDIT 2
For some reason I had to convert both times to strings to make this work.
Here is the final code for .value:
.value(function(d) {
var h = d.history[0];
d.history.forEach(function(elt) {
if(elt.time.toString() == selected_time.toString()) {
h = elt;
}
});
return h.balance_of_power;
})
Yes, your code would look something like this.
time = "...";
pie = d3.layout.pie()
.value(function(d) {
var h = d.history[0];
for(var i = 0; i < d.history.length; i++) {
if(d.history[i].time == time) {
h = d.history[i];
break;
}
}
return h.balance_of_power;
});
You will need to handle the case when the time is not in the history though.
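The `==` comparison in the first EDIT most likely fails because `selected_time` and `elt.time` are both Date objects (the console prints both in Date format), and `==` between two objects compares identity, not the instant they represent. Converting to strings works; comparing `getTime()` values is a more direct alternative. The sketch below, a hypothetical `balanceAt` helper, uses that and returns `null` when the requested time is missing from the history; how a chart should treat that case (skip, zero, or fall back to the latest value) is left open.

```javascript
// Hypothetical accessor helper: return the balance_of_power recorded at `time`,
// or null if that time is not in the history.
function balanceAt(d, time) {
  const target = time.getTime(); // milliseconds since the epoch: a plain number
  const match = d.history.find((elt) => elt.time.getTime() === target);
  return match ? match.balance_of_power : null;
}

// Part of the question's data, with times parsed into Date objects.
const d = {
  name: "Pauline",
  history: [
    { time: new Date("2013-05-06T16:38:55+03:00"), balance_of_power: 0.2 },
    { time: new Date("2013-05-07T00:38:55+03:00"), balance_of_power: 0.2222222222222222 },
  ],
};
const t1 = new Date("2013-05-07T00:38:55+03:00");
console.log(t1 == d.history[1].time); // false: two distinct Date objects
console.log(balanceAt(d, t1)); // 0.2222222222222222
console.log(balanceAt(d, new Date("2013-01-01T00:00:00Z"))); // null
```

Inside the pie layout this would be used as `.value(function(d) { return balanceAt(d, selected_time); })`.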
