Complex d3.nest() manipulation - d3.js

I have an array of arrays that looks like this:
var arrays = [[1,2,3,4,5],
[1,2,6,4,5],
[1,3,6,4,5],
[1,2,3,6,5],
[1,7,5],
[1,7,3,5]]
I want to use d3.nest() or even just standard javascript to convert this data into a nested data structure that I can use with d3.partition.
Specifically, I want to create this flare.json data format.
The levels of the JSON object I want to create with d3.nest() correspond to the index positions in the array. Notice that 1 is in the first position of all the subarrays in the example data above; therefore, it is at the root of the tree. At the next position in the arrays there are three values, 2, 3, and 7; therefore, the root value 1 has 3 children. At this point the tree looks like this:
    1
  / | \
 2  3  7
At the third position in the subarrays there are three distinct values, 3, 5, and 6. These children would be placed into the tree as follows:
        1
   _____|_____
  /     |     \
  2     3     7
 / \    |    / \
3   6   6   3   5
How can I produce this data structure using d3.nest()? The full data structure with the example data I showed above should look like this:
{"label": 1,
 "children": [
  {"label": 2, "children": [
    {"label": 3, "children": [
      {"label": 4, "children": [
        {"label": 5}
      ]},
      {"label": 6, "children": [
        {"label": 5}
      ]}
    ]},
    {"label": 6, "children": [
      {"label": 4, "children": [
        {"label": 5}
      ]}
    ]}
  ]},
  {"label": 3, "children": [
    {"label": 6, "children": [
      {"label": 4, "children": [
        {"label": 5}
      ]}
    ]}
  ]},
  {"label": 7, "children": [
    {"label": 3, "children": [
      {"label": 5}
    ]},
    {"label": 5}
  ]}
]}
I'm trying to convert my array data structure above using something like this (very wrong):
var data = d3.nest()
    .key(function(d, i) { return d.i; })
    .rollup(function(d) { return d.length; })
I've been banging my head for a week to try and understand how I can produce this hierarchical data structure from an array of arrays. I'd be very grateful if someone could help me out.
@meetamit's answer in the comments is good, but in my case my tree is too deep to repeatedly apply .key() to the data, so I cannot manually write a function like this.

Here's a more straightforward function that just uses nested for-loops to cycle through all the path instructions in each of your arrays.
To make it easier to find the child element with a given label, I have implemented children as an object/associative array instead of a numbered array. If you want to be really robust, you could use a d3.map for the reasons described at that link, but if your labels are actually integers then that's not going to be a problem. Either way, it just means that when you need to access the children as an array (e.g., for the d3 layout functions), you have to specify a function to make an array out of the values of the object -- the d3.values(object) utility function does it for you.
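For example (a sketch, using plain Object.values, which behaves the same as d3.values for this purpose):

```javascript
// The children of a node, stored as an object keyed by label:
var children = {
  2: { label: 2 },
  3: { label: 3 },
  7: { label: 7 }
};
// d3.values(children) returns the property values as an array;
// plain JavaScript's Object.values does the same thing:
var childArray = Object.values(children);
// childArray can now be handed to a d3 layout's children accessor
```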
The key code:
var root = {},
    path, node, next, i, j, N, M;
for (i = 0, N = arrays.length; i < N; i++) {
    // for each path in the data array
    path = arrays[i];
    node = root; // start the path from the root
    for (j = 0, M = path.length; j < M; j++) {
        // follow the path through the tree,
        // creating new nodes as necessary
        if (!node.children) {
            // undefined, so create it:
            node.children = {};
            // children is defined as an object
            // (not array) to allow named keys
        }
        next = node.children[path[j]];
        // find the child node whose key matches
        // the label of this step in the path
        if (!next) {
            // undefined, so create it
            next = node.children[path[j]] = { label: path[j] };
        }
        node = next;
        // step down the tree before analyzing the
        // next step in the path
    }
}
Implemented with your sample data array and a basic cluster dendrogram charting method:
http://fiddle.jshell.net/KWc73/
Edited to add:
As mentioned in the comments, to get the output looking exactly as requested:
Access the data's root object from the default root object's children array.
Use a recursive function to cycle through the tree, replacing the children objects with children arrays.
Like this:
root = d3.values(root.children)[0];
// this is the root from the original data,
// assuming all paths start from one root, like in the example data

// recurse through the tree, turning the child
// objects into arrays
function childrenToArray(n) {
    if (n.children) {
        // this node has children:
        // convert them to an array
        n.children = d3.values(n.children);
        // recurse down the tree
        n.children.forEach(childrenToArray);
    }
}
childrenToArray(root);
Updated fiddle:
http://fiddle.jshell.net/KWc73/1/
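Putting both steps together as one self-contained sketch (using plain Object.values, which behaves like d3.values here):

```javascript
var arrays = [[1,2,3,4,5], [1,2,6,4,5], [1,3,6,4,5],
              [1,2,3,6,5], [1,7,5], [1,7,3,5]];

// build the tree, with children as objects keyed by label
var root = {};
arrays.forEach(function (path) {
  var node = root;
  path.forEach(function (label) {
    node.children = node.children || {};
    node = node.children[label] = node.children[label] || { label: label };
  });
});

// unwrap the single shared root (label 1), then
// convert every children object into an array
root = Object.values(root.children)[0];
(function toArrays(n) {
  if (n.children) {
    n.children = Object.values(n.children);
    n.children.forEach(toArrays);
  }
})(root);
// root.label is 1, and its children have labels 2, 3, and 7
```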

If you extend the Array prototype, it's not actually that complex. The basic idea is to build up the tree level by level, taking one array element at a time and comparing it to the previous ones. This is the code (minus the extensions):
function process(prevs, i) {
    var vals = arrays.filter(function(d) { return prevs === null || d.slice(0, i).compare(prevs); })
        .map(function(d) { return d[i]; }).getUnique();
    return vals.map(function(d) {
        var ret = { label: d };
        if (i < arrays.map(function(d) { return d.length; }).max() - 1) {
            var tmp = process(prevs === null ? [d] : prevs.concat([d]), i + 1);
            if (tmp.filter(function(d) { return d.label != undefined; }).length > 0)
                ret.children = tmp;
        }
        return ret;
    });
}
No guarantees that it won't break for edge cases, but it seems to work fine with your data.
Complete jsfiddle here.
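The extensions themselves are omitted above; minimal versions consistent with how they are called might look like this (the names compare, getUnique, and max come from the code above, but these bodies are my guesses, not the answer's originals):

```javascript
// Hypothetical Array extensions matching the calls in process():
Array.prototype.compare = function (other) {
  // element-wise equality with another array
  return this.length === other.length &&
    this.every(function (d, i) { return d === other[i]; });
};
Array.prototype.getUnique = function () {
  // drop duplicates, keeping the first occurrence of each value
  return this.filter(function (d, i, arr) { return arr.indexOf(d) === i; });
};
Array.prototype.max = function () {
  return Math.max.apply(null, this);
};
```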
Some more detailed explanations:
First, we get the arrays that are relevant for the current path. This is done by filtering out those that are not the same as prevs, which is our current (partial) path. At the start, prevs is null and nothing is filtered.
For these arrays, we get the values that correspond to the current level in the tree (the ith element). Duplicates are filtered out. This is done by the .map() and .getUnique().
For each of the values we got this way, there will be a return value. So we iterate over them (vals.map()). For each, we set the label attribute. The rest of the code determines whether there are children and gets them through a recursive call. To do this, we first check whether there are elements left in the arrays, i.e. whether we are at the deepest level of the tree. If not, we make the recursive call, passing in the new prevs that includes the element we are currently processing, and the next level (i+1). Finally, we check the result of this recursive call for empty elements -- if there are only empty children, we don't save them. This is necessary because not all of the arrays (i.e. not all of the paths) have the same length.
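Concretely, the first two levels of the recursion reduce to this (a plain-JS equivalent of the filter/map/getUnique chain, using the question's data):

```javascript
const arrays = [[1,2,3,4,5], [1,2,6,4,5], [1,3,6,4,5],
                [1,2,3,6,5], [1,7,5], [1,7,3,5]];
// level 0, prevs === null: nothing is filtered; take element 0 and dedupe
const vals0 = [...new Set(arrays.map(d => d[0]))]; // [1]
// level 1, prevs === [1]: keep arrays whose prefix matches, take element 1
const vals1 = [...new Set(arrays.filter(d => d[0] === 1).map(d => d[1]))]; // [2, 3, 7]
```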

Since d3-collection has been deprecated in favor of d3-array, we can use d3.groups to achieve what used to work with d3.nest:
var input = [
  [1, 2, 3, 4, 5],
  [1, 2, 6, 4, 5],
  [1, 3, 6, 4, 5],
  [1, 2, 3, 6, 5],
  [1, 7, 5],
  [1, 7, 3, 5]
];
function process(arrays, depth) {
  return d3.groups(arrays, d => d[depth]).map(x => {
    if (
      x[1].length > 1 || // if there is more than 1 child
      (x[1].length == 1 && x[1][0][depth + 1]) // if there is 1 child and it has an element at the next depth
    )
      return ({
        "label": x[0],
        "children": process(x[1], depth + 1)
      });
    return ({ "label": x[0] }); // if there is no child
  });
};
console.log(process(input, 0));
<script src="https://d3js.org/d3-array.v2.min.js"></script>
This:
Works as a recursion on the arrays' depths.
Each recursion step groups (d3.groups) its arrays on the array element whose index is equal to the depth.
Depending on whether there are children or not, the recursion stops.
Here is the intermediate result produced by d3.groups within a recursion step (grouping arrays on their 3rd element):
var input = [
  [1, 2, 3, 4, 5],
  [1, 2, 6, 4, 5],
  [1, 2, 3, 6, 5]
];
console.log(d3.groups(input, d => d[2]));
<script src="https://d3js.org/d3-array.v2.min.js"></script>
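For readers without d3 at hand, d3.groups can be approximated in a few lines of plain JavaScript (a sketch; d3's real implementation differs, but the output shape -- an array of [key, members] pairs in first-seen order -- is the same):

```javascript
// Plain-JS stand-in for d3.groups(data, keyFn):
function groups(data, key) {
  const m = new Map();
  for (const d of data) {
    const k = key(d);
    if (!m.has(k)) m.set(k, []);
    m.get(k).push(d);
  }
  return Array.from(m); // entries as [key, members] pairs
}

const input = [[1, 2, 3], [1, 2, 6], [1, 3, 6]];
const grouped = groups(input, d => d[1]); // group on the 2nd element
// grouped: [[2, [[1,2,3], [1,2,6]]], [3, [[1,3,6]]]]
```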

Edit - fixed
Here is my solution.
Pro: it is all in one go (doesn't need the children objects converting to arrays like the answers above).
Pro: it keeps the size/value count.
Pro: the output is EXACTLY the same as a d3 flare with children.
Con: it is uglier, and likely less efficient.
Big thanks to the previous comments for helping me work it out.
var data = [[1,2,3,4,5],
[1,2,6,4,5],
[1,3,6,4,5],
[1,2,3,6,5],
[1,7,5],
[1,7,3,5]]
var root = {"name": "flare", "children": []}; // the output
var node; // pointer thingy
var row;
// loop through the array
for (var i = 0; i < data.length; i++) {
    row = data[i];
    node = root;
    // loop through each field
    for (var j = 0; j < row.length; j++) {
        // set undefined to "null"
        if (typeof row[j] !== 'undefined' && row[j] !== null) {
            attribute = row[j];
        } else {
            attribute = "null";
        }
        // using underscore.js: does this field already exist?
        // (_.where returns an empty array when nothing matches,
        // and [] == false is true, so this branch means "not found")
        if (_.where(node.children, {name: attribute}) == false) {
            if (j < row.length - 1) {
                // this is not the deepest field, so create a child with children
                var oobj = {"name": attribute, "children": []};
                node.children.push(oobj);
                node = node.children[node.children.length - 1];
            } else {
                // this is the deepest we go, so set a starting size/value of 1
                node.children.push({"name": attribute, "size": 1});
            }
        } else {
            // the field exists, but we need to find where
            found = false;
            pos = 0;
            for (var k = 0; k < node.children.length; k++) {
                if (node.children[k]['name'] == attribute) {
                    pos = k;
                    found = true;
                    break;
                }
            }
            if (!node.children[pos]['children']) {
                // no key called children, so we are at the deepest layer: increment
                node.children[pos]['size'] = parseInt(node.children[pos]['size']) + 1;
            } else {
                // not the deepest, so move the pointer "node" and let the loop continue
                node = node.children[pos];
            }
        }
    }
}
// object here
console.log(root)
// stringified version to page
document.getElementById('output').innerHTML = JSON.stringify(root, null, 1);
Working examples
https://jsfiddle.net/7qaz062u/
Output
{ "name": "flare", "children": [ { "name": 1, "children": [ { "name": 2, "children": [ { "name": 3, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] }, { "name": 6, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] } ] }, { "name": 3, "children": [ { "name": 6, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] }, { "name": 3, "children": [ { "name": 6, "children": [ { "name": 5, "size": 1 } ] } ] } ] }, { "name": 7, "children": [ { "name": 5, "size": 1 }, { "name": 3, "children": [ { "name": 5, "size": 1 } ] } ] } ] } ] }


Graph Search - Finding all the nodes that need to be delinked in order to disconnect two nodes

So I was solving a code challenge where the input was given as an array of connections:
[
[2, 9],
[7, 2],
[7, 9],
[9, 5],
]
where the format of the array is [fromNode, toNode]. I tried to solve the problem with a time complexity of O(N^2); however, it seemed to fail some unknown test cases that I couldn't think of.
Here's what I did:
const buildConnections = ({ toID, fromID, allConnections }) => {
  const directConnectionNodes = {}; // nodes which are directly connected to `toNode`
  const mutualConnections = {}; // nodes and their respective connections (direct or indirect)
  allConnections.forEach(([from, to]) => {
    // O(N)
    if (to == toID) {
      directConnectionNodes[from] = true; // adding the node to direct connections
    } else {
      const keys = Object.keys(mutualConnections);
      // This step updates existing nodes when their connections change.
      // Example: the first pair [3,5] results in {5:{3}}.
      // The second pair [2,3] results in {3:{2}} and {5:{...allPairsOf3}} = {5:{3,2}},
      // since 3 gained a new connection with 2.
      keys.forEach((key) => {
        // O(N*N)
        const value = mutualConnections[key];
        if (value[to] !== undefined) {
          mutualConnections[key] = {
            // O(N*N*N)
            ...mutualConnections[key],
            [from]: true,
          };
        }
      });
      if (mutualConnections[from]) {
        mutualConnections[to] = {
          ...mutualConnections[from],
          [from]: true,
        };
      } else mutualConnections[to] = { [from]: true };
    }
  });
  const directConnectionNodeKeys = Object.keys(directConnectionNodes);
  const results = [];
  // Here I check each node that could possibly connect the two nodes together
  directConnectionNodeKeys.forEach((key) => {
    if (
      key == fromID ||
      (mutualConnections[key] && mutualConnections[key][fromID])
    )
      results.push(key);
  });
  console.log(results.join(' '));
};
I had two concerns here:
First, the algorithm I wrote, assuming there's a big graph with lots of edges and vertices, could run in O(N*N*N), i.e. O(N^3), which is pretty bad. Can someone help me figure out a more optimized way to do this?
Second, what possible edge cases could I be missing?
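For reference, one way to get this down to a single O(V + E) traversal is to build an adjacency list, run one BFS from fromID, and keep the direct in-neighbours of toID that are reachable. This is a sketch of that idea (nodesToDelink is a made-up name, and it assumes the goal is exactly what the posted code computes):

```javascript
function nodesToDelink(edges, fromID, toID) {
  // build an adjacency list and collect the direct in-neighbours of toID
  const adj = new Map();
  const inNeighbours = new Set();
  for (const [from, to] of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from).push(to);
    if (to === toID) inNeighbours.add(from);
  }
  // one BFS from fromID over the directed edges: O(V + E)
  const seen = new Set([fromID]);
  const queue = [fromID];
  while (queue.length) {
    for (const next of adj.get(queue.shift()) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  // keep the in-neighbours of toID that fromID can reach (or is)
  return [...inNeighbours].filter(n => n === fromID || seen.has(n));
}

// with the sample edges, disconnecting 7 from 5:
const result = nodesToDelink([[2, 9], [7, 2], [7, 9], [9, 5]], 7, 5);
```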

Buffer elements based on its contents with comparer function

In RxJS, how can I buffer values so the buffer is flushed when the next element is different from the previous one? If elements are the same by some comparator, they should be buffered until the next change is detected...
Suppose I have such elements...
{ t: 10, price:12 },
{ t: 10, price:13 },
{ t: 10, price:14 },
{ t: 11, price:12 },
{ t: 11, price:13 },
{ t: 10, price:14 },
{ t: 10, price:15 },
The elements are the same if the t property value is the same as the previous element's t value, so at the output I just want these buffers...
[ { t: 10, price:12 }, { t: 10, price:13}, { t: 10, price:14} ],
[ { t: 11, price:12}, { t: 11, price:13} ],
[ { t: 10, price:14 }, { t: 10, price:15 } ]
So in the result I have three buffers emitted, each containing the consecutive objects with the same t value.
I was trying to use bufferWhen or just buffer, but I don't know how to specify the closingNotifier in this case, because it needs to depend on the elements that are arriving. Can anyone help?
TLDR;
import { from, bufferWhen, delay, distinctUntilKeyChanged, share, skip } from 'rxjs'; // RxJS 7

const items = [
  { t: 10, price: 12 },
  { t: 10, price: 13 },
  { t: 10, price: 14 },
  { t: 11, price: 12 },
  { t: 11, price: 13 },
  { t: 10, price: 14 },
  { t: 10, price: 15 }
];
const src$ = from(items).pipe(
  delay(0),
  share()
);
const closingNotifier$ = src$.pipe(
  distinctUntilKeyChanged('t'),
  skip(1),
  share({ resetOnRefCountZero: false })
);
src$.pipe(bufferWhen(() => closingNotifier$)).subscribe(console.log);
StackBlitz demo.
Detailed explanation
The tricky part was to determine the closingNotifier because, as you said, it depends on the values that come from the stream. My first thought was that src$ has to play 2 different roles: 1) the stream which emits values and 2) the closingNotifier for a buffer operator. This is why the share() operator is used:
const src$ = from(items).pipe(
  delay(0),
  share()
);
delay(0) is also used because the source's items are emitted synchronously. And since the source is subscribed to twice (because it is both the stream and the closingNotifier), it's important that both subscribers receive values. If delay(0) was omitted, only the first subscriber would receive the items, and the second one would receive nothing, because it was registered after all the source's items had been emitted. With delay(0) we just ensure that both subscribers (the one from the subscribe callback and the inner subscriber of closingNotifier) are registered before the source emits values.
Onto closingNotifier:
const closingNotifier$ = src$.pipe(
  distinctUntilKeyChanged('t'),
  skip(1),
  share({ resetOnRefCountZero: false })
);
distinctUntilKeyChanged('t'), is used because the signal that the buffer should emit the accumulated items is when an item with a different t value comes from the stream.
skip(1) is used because when the very first value comes from the stream, after the first subscription to the closingNotifier, it will cause the buffered items to be sent immediately, which is not what we want, because it is the first batch of items.
share({ resetOnRefCountZero: false }) - this is the interesting part. As you've seen, we're using bufferWhen(() => closingNotifier$) instead of buffer(closingNotifier$). That is because buffer first subscribes to the source, and then to the notifier; this complicates the situation a bit, so I decided to go with bufferWhen, which subscribes to the notifier first and then to the source. The problem with bufferWhen is that it resubscribes to the closingNotifier each time after it emits, so we needed share, because we wouldn't like to repeat the logic for the first batch of items (the skip operator) when there have already been some items. The problem with share() (without the resetOnRefCountZero option) is that it will still resubscribe each time after it emits, because that's the default behavior when the inner Subject used by share is left without subscribers. This is solved by resetOnRefCountZero: false, which won't resubscribe to the source when the first subscriber is registered after the inner Subject had previously been left without subscribers.
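If a full observable pipeline isn't required, the same consecutive grouping can be sketched with a plain reduce (not RxJS; just to show the buffering rule by itself):

```javascript
const items = [
  { t: 10, price: 12 },
  { t: 10, price: 13 },
  { t: 10, price: 14 },
  { t: 11, price: 12 },
  { t: 11, price: 13 },
  { t: 10, price: 14 },
  { t: 10, price: 15 }
];
const buffers = items.reduce((acc, item) => {
  const last = acc[acc.length - 1];
  if (last && last[0].t === item.t) {
    last.push(item); // same t as the open buffer: keep accumulating
  } else {
    acc.push([item]); // t changed: start a new buffer
  }
  return acc;
}, []);
// buffers holds three arrays, split wherever t changes
```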

Is it possible to get indices from x values in d3?

I have a data set containing dates (x values). Later in the code I need to get the index of the array element which contains a given date. Is it possible to get a particular index of the array according to a date input?
Data set:
[{"date": "2006-12-01", "POPYFR": "6.32296e+07", "status": {}}, {"date": "2007-12-01", "POPYFR": "6.36451e+07", "status": {}}]
So if I have a date 2006-12-01 the function should return 0 etc.
Since you are using d3, try bisector or bisect:
https://observablehq.com/#d3/d3-bisect
const data = [
  {"date": "2006-12-01", "POPYFR": "6.32296e+07", "status": {}},
  {"date": "2007-12-01", "POPYFR": "6.36451e+07", "status": {}}
];
const bisect = d3.bisector(d => moment(d.date, "YYYY-MM-DD").toDate()).left;
const index = bisect(data, moment("2006-12-01", "YYYY-MM-DD").toDate());
Is there a reason you can't just iterate over the data? Something like
function indexOfDate(date) {
  for (var i = 0; i < dataSet.length; i++) {
    if (dataSet[i].date === date) return i;
  }
  return -1;
}
?
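For completeness, modern JavaScript can do the same scan in one call with Array.prototype.findIndex (a sketch with the question's two rows inlined):

```javascript
const dataSet = [
  { date: "2006-12-01", POPYFR: "6.32296e+07" },
  { date: "2007-12-01", POPYFR: "6.36451e+07" }
];
const i = dataSet.findIndex(d => d.date === "2006-12-01"); // 0
const missing = dataSet.findIndex(d => d.date === "1999-01-01"); // -1 when not present
```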

Jsplumb - Connectors

I am trying to draw a flowchart. I create divs dynamically, set a unique 'id' property for each div, and connect them using jsPlumb connectors.
I get the source and destination ids from the database (note that the 'id' property of a dynamically created div is its ID from the database) and store them in a 'connectors' JSON. Its format is
Eg:
{[from:A,to:B], [from:A,to:C], [from:B,to:C]}
angular.forEach(connectors, function (connect) {
  $scope.connection(connect.from, connect.to);
});
The jsplumb code is as follows
$scope.connection = function (s, t) {
  var stateMachineConnector1 = {
    connector: ["Flowchart", { stub: 25, midpoint: 0.001 }],
    maxConnections: -1,
    paintStyle: { lineWidth: 3, stroke: "#421111" },
    endpoint: "Blank",
    anchor: "Continuous",
    anchors: [strt, end],
    overlays: [["PlainArrow", { location: 1, width: 15, length: 12 }]]
  };
  var firstInstance = jsPlumb.getInstance();
  firstInstance.connect({ source: s.toString(), target: t.toString() }, stateMachineConnector1);
};
THE PROBLEM:
What I have now is:
Here the connector from B to C overlaps the existing A to C connector.
What I need is to separate the two connections, like below:
I could not find a solution for this anywhere. Any help? Thanks!
Using the Perimeter anchor calculates an appropriate position for the endpoints.
jsfiddle demo for perimeter
jsPlumb.connect({
  source: $('#item1'),
  target: $("#item2"),
  endpoint: "Dot",
  connector: ["Flowchart", { stub: 25, midpoint: 0.001 }],
  anchors: [
    ["Perimeter", { shape: "Square" }],
    ["Perimeter", { shape: "Square" }]
  ]
});
Jsplumb anchors
What I suggest you do, to exactly replicate your schema, is to set 2 endpoints on each box, on A, B and C.
A Endpoints should be [0.25, 1, 0, 0, 0, 0] and [0.75, 1, 0, 0, 0, 0]
B and C Endpoints should be [0.25, 0, 0, 0, 0, 0] and [0.75, 0, 0, 0, 0, 0]
It basically works like this (I might be wrong about the last four values, it's been a while, but you only need to worry about the x and y):
[x,y,offsetx, offsety, angle, angle]
For the x 0 is the extreme left and 1 extreme right
Same goes for y (0 is top and 1 is bottom).
Take care

Dimensional Charting with Non-Exclusive Attributes

The following is a schematic, simplified table showing HTTP transactions. I'd like to build a DC analysis for it using dc, but some of the columns don't map well to crossfilter.
In the settings of this question, all HTTP transactions have the fields time, host, requestHeaders, responseHeaders, and numBytes. However, different transactions have different specific HTTP request and response headers. In the table above, 0 and 1 represent the absence and presence, respectively, of a specific header in a specific transaction. The sub-columns of requestHeaders and responseHeaders represent the unions of the headers present in transactions. Different HTTP transaction datasets will almost surely generate different sub-columns.
For this question, a row in this chart is represented in code like this:
{
  "time": 0,
  "host": "a.com",
  "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
  "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
  "numBytes": 12
}
The time, host, and numBytes all translate easily into crossfilter, and so it's possible to build charts answering things like what was the total number of bytes seen for transactions between 2 and 4 for host a.com. E.g.,
var ndx = crossfilter(data);
...
var hostDim = ndx.dimension(function(d) {
  return d.host;
});
var hostBytes = hostDim.group().reduceSum(function(d) {
  return d.numBytes;
});
The problem is that, for all slices of time and host, I'd like to show (capped) bar charts of the (leading) request and response headers by bytes. E.g. (see the first row), for time 0 and host a.com, the request headers bar chart should show that bar and baz each have 12.
There are two problems, a minor one and a major one.
Minor Problem
This doesn't fit quite naturally into dc, as it's one-directional. These bar charts should be updated for the other slices, but they can't be used for slicing themselves. E.g., you shouldn't be able to select bar and deselect baz and look for a resulting breakdown of hosts by bytes, because what would this mean: hosts in the transactions that have bar but don't have baz? hosts in the transactions that have bar and either do or don't have baz? It's too unintuitive.
How can I make some dc charts one-directional? Is it through some hack of disabling mouse inputs?
Major Problem
As opposed to host, foo and bar are non-exclusive. Each transaction's host is either something or the other, but a transaction's headers might include any combination of foo and bar.
How can I define crossfilter dimensions for requestHeaders, then, and how can I use dc? That is
var ndx = crossfilter(data);
...
var requestHeadersDim = ndx.dimension(function(d) {
  // What should go here?
});
The way I usually deal with the major problem you state is to transform my data so that there is a separate record for each header (all other fields in these duplicate records are the same). Then I use custom group aggregations to avoid double-counting. These custom aggregations are a bit hard to manage so I built Reductio to help with this using the 'exception' function - github.com/esjewett/reductio
Hacked it (efficiently, but very inelegantly) by looking at the source code of dc. It's possible to distort the meaning of crossfilter to achieve the desired effect.
The final result is in this fiddle. It is slightly more limited than the question, as the fields of requestHeaders are hardcoded to foo, bar, and baz. Removing this restriction is more in the domain of simple Javascript.
Minor Problem
Using a simple css hack, I simply defined
.avoid-clicks {
pointer-events: none;
}
and gave the div this class. Inelegant but effective.
Major Problem
The major problem is solved by distorting the meaning of crossfilter concepts, and "fooling" dc.
Let's say the data looks like this:
var transactions = [
  {
    "time": 0,
    "host": "a.com",
    "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
    "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 0},
    "numBytes": 12
  },
  {
    "time": 1,
    "host": "b.org",
    "requestHeaders": {"foo": 0, "bar": 1, "baz": 1},
    "responseHeaders": {"shmip": 0, "shmap": 1, "shmoop": 1},
    "numBytes": 3
  },
  ...
];
We can define a "dummy" dimension, which ignores the data:
var transactionsNdx = crossfilter(transactions);
var dummyDim = transactionsNdx
  .dimension(function(d) {
    return 0;
  });
Using this dimension, we can define a group that counts the total foo, bar, and baz bytes of the filtered rows:
var requestHeadersGroup = dummyDim
  .group()
  .reduce(
    /* callback for when data is added to the current filter results */
    function (p, v) {
      return {
        "foo": p.foo + v.requestHeaders.foo * v.numBytes,
        "bar": p.bar + v.requestHeaders.bar * v.numBytes,
        "baz": p.baz + v.requestHeaders.baz * v.numBytes
      };
    },
    /* callback for when data is removed from the current filter results */
    function (p, v) {
      return {
        "foo": p.foo - v.requestHeaders.foo * v.numBytes,
        "bar": p.bar - v.requestHeaders.bar * v.numBytes,
        "baz": p.baz - v.requestHeaders.baz * v.numBytes
      };
    },
    /* initialize p */
    function () {
      return {
        "foo": 0,
        "bar": 0,
        "baz": 0
      };
    }
  );
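To see what the add callback accumulates, here is the same arithmetic run over the two sample transactions in plain JavaScript (outside crossfilter; just the reduce logic):

```javascript
const transactions = [
  { requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 12 },
  { requestHeaders: { foo: 0, bar: 1, baz: 1 }, numBytes: 3 }
];
const totals = transactions.reduce(
  (p, v) => ({
    foo: p.foo + v.requestHeaders.foo * v.numBytes, // absent header: adds 0
    bar: p.bar + v.requestHeaders.bar * v.numBytes,
    baz: p.baz + v.requestHeaders.baz * v.numBytes
  }),
  { foo: 0, bar: 0, baz: 0 } // same shape as the group's initial p
);
// totals: { foo: 0, bar: 15, baz: 15 }
```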
Note that this isn't a proper crossfilter group at all. It will not map the dimensions to their values. Rather, it maps 0 to a value which itself maps the dimensions to their values (ugly!). We therefore need to transform this group into something that actually looks like a crossfilter group:
var getSortedFromGroup = function() {
  var all = requestHeadersGroup.all()[0].value;
  all = [
    {
      "key": "foo",
      "value": all.foo
    },
    {
      "key": "bar",
      "value": all.bar
    },
    {
      "key": "baz",
      "value": all.baz
    }];
  return all.sort(function(lhs, rhs) {
    return lhs.value - rhs.value;
  });
};
var requestHeadersDisplayGroup = {
  "top": function(k) {
    return getSortedFromGroup();
  },
  "all": function() {
    return getSortedFromGroup();
  }
};
We can now create a regular dc chart and pass the adaptor group requestHeadersDisplayGroup to it. It works normally from this point on.
