Dendrogram from hierarchical clustering in D3 - d3.js

I'm quite new to D3, so I apologize if this is a very basic question. I wish to implement a dendrogram that shows the result of a hierarchical clustering algorithm. This layout differs in one major point from the examples I have been able to find: except for the leaves of the tree, the nodes do not have any identity; they merely join subtrees at specific heights determined by the likeness of those subtrees.
As an example look at:
http://r.789695.n4.nabble.com/file/n2293207/Dendrogram.jpeg
Compared to http://bl.ocks.org/mbostock/4063570 this dendrogram does not have the 'n-partite nature' (defined layers for each level of nodes).
The question is thus how to define a dendrogram with arbitrary join positions of subtrees?
Thanks
Thomas
edit:
It turned out to be less difficult than anticipated and did not require developing a new layout. In my input data I included an additional parameter holding the calculated height of each join. An example JSON file would be something like this:
{
"height": "1",
"children": [
{
"height": "0.8",
"children": [
{
"name": "leaf 1",
"height": "0"
},
{
"height": "0.35",
"children": [
{
"name": "leaf 2",
"height": "0"
},
{
"name": "leaf 3",
"height": "0"
}
]
}
]
},
{
"name": "leaf 4",
"height": "0"
}
]
}
and then, when calculating the node object, transform the y value using map:
var nodes = cluster.nodes(root).map(function(d) {d.y = scale(d.height); return d});
where cluster is your cluster layout object and scale is a suitable scale for the dendrogram height.
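The height mapping described above can be sketched without the layout itself. This is only an illustration under assumptions: `makeScale` and `assignHeights` are hypothetical helper names, the 400-pixel extent is arbitrary, and the hand-rolled scale stands in for whatever D3 scale you would actually use.

```javascript
// Minimal linear scale: maps a join height in [0, maxHeight] to a pixel offset.
// (Hypothetical stand-in for a D3 linear scale.)
function makeScale(maxHeight, pixelExtent) {
  return function (h) {
    // Leaves (height 0) land at pixelExtent; the root (maxHeight) lands at 0.
    return pixelExtent * (1 - h / maxHeight);
  };
}

// Walk the tree and set y on every node from its join height.
function assignHeights(node, scale) {
  node.y = scale(+node.height); // "+" coerces the string heights used in the JSON
  (node.children || []).forEach(function (child) {
    assignHeights(child, scale);
  });
  return node;
}

var root = {
  height: "1",
  children: [
    {
      height: "0.8",
      children: [
        { name: "leaf 1", height: "0" },
        {
          height: "0.35",
          children: [
            { name: "leaf 2", height: "0" },
            { name: "leaf 3", height: "0" }
          ]
        }
      ]
    },
    { name: "leaf 4", height: "0" }
  ]
};

assignHeights(root, makeScale(1, 400));
```

All leaves end up on the same baseline, while each internal node sits at a position proportional to its join height, which is what removes the fixed layers of the standard cluster example.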

Related

Choosing proper data type in OpenAPI

Could someone recommend which schema to use in the YAML file for an API if the expected response is the following:
"items": [{
"garden": [{
"tree": "pine",
"height": "33"
}, {
"height": "33",
"age": "200"
}
]
}
],
I was thinking about
list (items)
object (garden)
list (list of objects)
object (pairs - height+age)
I am confused because the height property may appear many times in the list, each time alongside different parameters.

D3 - Group Rows of Swimlane Data

I am developing a swimlane diagram using D3 v4. The diagram is a planning aid depicting Tasks which are carried out over time. The Tasks are up the Y axis and time is along the X axis.
Here is some example data to help understand my problem:
Tasks
[
{
"id": "2a606884-d6b9-4ad1-a5ff-5c816c43fef6",
"description": "Task 01",
"start": "2017-11-07T02:00:00.000Z",
"finish": "2017-11-07T08:00:00.000Z",
"label": "Task 01",
"taskTypeId": "0b936e39-49b9-4cc8-b5c5-b1f1338e9faf",
"taskTypeDescription": "Walk the dog"
},
{
"id": "6713025e-63e2-4ff3-8202-43e17c13431d",
"description": "Task 02",
"start": "2017-11-07T08:00:00.000Z",
"finish": "2017-11-07T12:00:00.000Z",
"label": "Task 02 02",
"taskTypeId": "9af060ba-5abf-4627-8462-7c21281ab487",
"taskTypeDescription": "Wash the car"
},
{
"id": "ff071aa5-e14b-4b32-bd51-7cf079f4c876",
"description": "Task 03",
"start": "2017-11-07T12:00:00.000Z",
"finish": "2017-11-07T14:00:00.000Z",
"label": "Task 03",
"taskTypeId": "8e6a8b11-0e23-4473-8795-ac74fc1efe07",
"taskTypeDescription": "Make the beds"
},
{
"id": "a84219e2-5da9-4119-915b-d84f35fda9d0",
"description": "Task 04",
"start": "2017-11-07T12:00:00.000Z",
"finish": "2017-11-07T14:00:00.000Z",
"label": "Task 04",
"taskTypeId": "a065dfe2-2c68-4467-84a5-1fce7c34513b",
"taskTypeDescription": "Wash up dishes"
}
]
New TaskTypes array nested by Area:
[
{
"key": "Outdoor",
"values": [
{
"id": "a97ad203-37e4-4fb8-8168-c3fdc1980d3d",
"description": "Walk the dog",
"areaId": "19952c5a-b762-4937-a613-6151c8cd9332",
"areaDescription": "Outdoor"
},
{
"id": "0b936e39-49b9-4cc8-b5c5-b1f1338e9faf",
"description": "Wash the car",
"areaId": "19952c5a-b762-4937-a613-6151c8cd9332",
"areaDescription": "Outdoor"
}
]
},
{
"key": "Indoor",
"values": [
{
"id": "8632bd18-8968-4185-95f0-f093f7fc9a02",
"description": "Make the beds",
"areaId": "87d8f755-ef60-4cfa-9a4a-c94cff9f8a22",
"areaDescription": "Indoor"
},
{
"id": "8e6a8b11-0e23-4473-8795-ac74fc1efe07",
"description": "Wash the dishes",
"areaId": "87d8f755-ef60-4cfa-9a4a-c94cff9f8a22",
"areaDescription": "Indoor"
}
]
}
]
So far my diagram lists all the TaskTypes up the Y axis using a simple 1 dimensional array of TaskTypes. Each Task is then positioned, within its row for the Task's TaskType, along the X axis according to the start/finish of the Task. All good.
My yScale is currently like this:
this.yScale = d3
.scaleBand()
.domain(this.taskTypeDescriptions)
.rangeRound([0, this.chartHeight])
.padding(padding);
...where this.taskTypeDescriptions is a simple 1 dimensional array of TaskTypes.
Now I have grouped TaskTypes by Area. In my example data above there are 2 Areas: Outdoor and Indoor. I want to visually group the rows of TaskType by their parent Area.
This appears to be like a grouped bar chart except that all the examples I have seen of these have the same second tier of data repeated for each primary group of data. In my scenario I have many discrete TaskTypes, none of them repeated, but I still want them grouped by their Area. Is this possible at all?
Any suggestions or thoughts very welcome. Thanks.
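One possible direction, sketched under assumptions rather than offered as a tested solution: keep a single band scale, but build its domain from the nested array so that each Area's rows are adjacent, and record which rows each Area spans. `buildGroupedRows` is a hypothetical helper name, and the input is trimmed to the fields it uses.

```javascript
// nestedTaskTypes has the shape shown above: [{ key, values: [{ description, ... }] }]
function buildGroupedRows(nestedTaskTypes) {
  var domain = []; // task type descriptions, ordered so each Area's rows are adjacent
  var groups = []; // one entry per Area with the first and last row it spans
  nestedTaskTypes.forEach(function (area) {
    var names = area.values.map(function (t) { return t.description; });
    if (names.length === 0) { return; } // skip Areas without task types
    groups.push({ area: area.key, first: names[0], last: names[names.length - 1] });
    domain = domain.concat(names);
  });
  return { domain: domain, groups: groups };
}

var rows = buildGroupedRows([
  { key: "Outdoor", values: [{ description: "Walk the dog" }, { description: "Wash the car" }] },
  { key: "Indoor", values: [{ description: "Make the beds" }, { description: "Wash the dishes" }] }
]);
```

`rows.domain` could then be passed to `.domain(...)` on the existing band scale, and each Area's label or background could run from `yScale(group.first)` to `yScale(group.last) + yScale.bandwidth()`.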

D3 Tree Graph Modified Data Model

I'm hoping to get opinions on a good data model for drawing something like a D3 tree graph. The tree graph won't work for me as is, because I have scenarios where a child node could be linked to two parent nodes, sometimes parent nodes from different levels of the hierarchy. I'm planning to modify the D3 tree graph to use a different data model, and this is where I'd really appreciate expert opinions. Below is a simple representation of what I think the data model could be: one version is hierarchical and the other is flat. Has anyone actually modified the D3 tree data model? Any help or opinions are greatly appreciated. Thanks in advance!
var hierarchicalData = [
{
"id": "n1",
"children": [
{
"id": "n1-a",
"children":[
{
"id": "n1-a-1"
}
]
},
{
"id": "n1-b",
"children":[
{
"id": "n1-b-1"
}
]
}
]
},
{
"id": "n2",
"children": [
{
"id": "n2-a",
"children":[
{
"id": "n2-a-1"
}
]
}
]
}
];
The following is a flat representation of the exact same hierarchical model but contains "level" that represents hierarchy.
{
"n1":{
"level": 0,
"children": ["n1-a", "n1-b"]
},
"n1-a":{
"level": 1,
"children":["n1-a-1"]
},
"n1-a-1":{
"level": 2,
"children":[]
},
"n1-b":{
"level": 1,
"children":["n1-b-1"]
},
"n1-b-1":{
"level": 2,
"children":[]
},
"n2":{
"level": 0,
"children": ["n2-a"]
},
"n2-a":{
"level": 1,
"children": ["n2-a-1"]
},
"n2-a-1":{
"level": 2,
"children":[]
}
}
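To make the relationship between the two representations concrete, here is a small sketch that derives the flat form from the hierarchical one. `flatten` is a hypothetical helper, not part of D3.

```javascript
// Convert an array of nested root nodes into an id-keyed map with "level".
function flatten(roots) {
  var flat = {};
  function visit(node, level) {
    var children = node.children || [];
    flat[node.id] = {
      level: level,
      children: children.map(function (c) { return c.id; })
    };
    children.forEach(function (c) { visit(c, level + 1); });
  }
  roots.forEach(function (r) { visit(r, 0); });
  return flat;
}

var flat = flatten([
  {
    id: "n1",
    children: [
      { id: "n1-a", children: [{ id: "n1-a-1" }] },
      { id: "n1-b", children: [{ id: "n1-b-1" }] }
    ]
  },
  { id: "n2", children: [{ id: "n2-a", children: [{ id: "n2-a-1" }] }] }
]);
```

Note that in the flat form a node id may appear in the `children` list of more than one parent, which the nested form cannot express without duplicating the node. In this sketch a node reached through two parents would keep the level of the last visit.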
If a child node can have more than one parent, then it is not a tree by definition.
There are several UI approaches you can take if you still want a tree, but it really depends on what you're trying to accomplish.
I worked with the D3 tree layout to present a company org chart.
Several companies have employees with both a direct manager and a secondary manager.
What we did was show the connection only to the direct manager,
and present the link to the other manager on mouse-over of the employee node.
This is more of a UI solution than a data model solution, and there are many other possibilities.
Another option is to do what MyHeritage did with its family tree: both parents of each node are shown, but only one of them is connected to the rest of the tree presented.
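The org-chart approach above could be modelled roughly as follows. This is a sketch with hypothetical names and data: each node lists its parents with the primary one first, and the links are split into those that form the tree and those drawn only on demand.

```javascript
// Hypothetical input: "parents" lists the direct manager first, then any others.
var employees = [
  { id: "ceo", parents: [] },
  { id: "m1", parents: ["ceo"] },
  { id: "m2", parents: ["ceo"] },
  { id: "e1", parents: ["m1", "m2"] }
];

// Split links into the ones that build the tree and the secondary ones.
function splitLinks(nodes) {
  var treeLinks = [];
  var extraLinks = [];
  nodes.forEach(function (n) {
    n.parents.forEach(function (p, i) {
      (i === 0 ? treeLinks : extraLinks).push({ source: p, target: n.id });
    });
  });
  return { treeLinks: treeLinks, extraLinks: extraLinks };
}

var links = splitLinks(employees);
```

`links.treeLinks` is a proper tree and can feed a tree layout, while `links.extraLinks` could be rendered separately, for example only on mouse-over.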

Can D3 or CrossFilter be used to aggregate totals from multidimensional datasets?

Question: When provided a multidimensional JSON structure that lacks key:values for some nodes, is it possible to reduce the structure in D3 or CrossFilter to aggregate totals across each node that has a unique identifier?
Using plain object notation, I'm able to filter on just one subset of the data: data.map.exe.Program_Files. I would like to calculate and group by Class and Type, located 3 nodes down from the map notation, regardless of the parent node value. The Donut Chart would be used to filter on type (ps1, exe, dll, etc.).
Example Script:
Plunker Example
Note: I have no control over the dataset and am required to maintain the structure of the current data, which is consumed by other D3 controls. Flattening it into a new object array would therefore lose the binding with those controls.
Sample Subset of dataset, see data.json on plunker:
{
"map": {
"ps1": {
"User": [],
"Program_Files": [],
"System_Files": [
{
"class": "dir",
"type": "ps1",
"total": 10,
"handled": 5,
"nothandled": 5,
"percentHandled": 0.0,
"id": 1,
"directory": "c:\\windows\\system32\\"
}
],
"Temp": [],
"Public": []
},
"dll": {
"User": [],
"Program_Files": [],
"System_Files": [
{
"class": "dir",
"type": "dll",
"total": 1000,
"handled": 685,
"nothandled": 315,
"percentHandled": 0.0,
"id": 1,
"directory": "c:\\windows\\system32\\"
}
],
"Temp": [],
"Public": []
}....
After further research and guidance, the final solution was to flatten the structure into a single object array. Multidimensional JSON structures can still be processed in place when each node carries a key:value element, but that was not the case for my data, so the format forced me to flatten the structure.
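A minimal sketch of that flattening step, assuming the two-level `map[type][location]` shape from the sample above. `flattenMap` and `totalsBy` are hypothetical helpers, and the records are trimmed to the fields they use.

```javascript
var data = {
  map: {
    ps1: {
      User: [],
      System_Files: [{ class: "dir", type: "ps1", total: 10, handled: 5, nothandled: 5 }],
      Temp: []
    },
    dll: {
      User: [],
      System_Files: [{ class: "dir", type: "dll", total: 1000, handled: 685, nothandled: 315 }],
      Temp: []
    }
  }
};

// Collect every record found two levels below "map" into one array.
function flattenMap(map) {
  var rows = [];
  Object.keys(map).forEach(function (type) {
    Object.keys(map[type]).forEach(function (location) {
      map[type][location].forEach(function (record) {
        rows.push(record);
      });
    });
  });
  return rows;
}

// Sum the "total" field per distinct value of the given key.
function totalsBy(rows, key) {
  var totals = {};
  rows.forEach(function (r) {
    totals[r[key]] = (totals[r[key]] || 0) + r.total;
  });
  return totals;
}

var flatRows = flattenMap(data.map);
var totalsByType = totalsBy(flatRows, "type");
var totalsByClass = totalsBy(flatRows, "class");
```

The flattened rows are plain objects, so they could also be handed to a grouping library instead of the hand-written `totalsBy`.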

from Neo4j to GraphJSON with Ruby

I'm trying to get visualizations using d3.js or alchemy.js--but alchemy, in particular, requires the datasource to be in GraphJSON.
I've been playing around with the tutorials and examples of Max De Marzi (using neography), Michael Hunger (cy2neo, js), Neo4j, and Neo4j.rb -- but I cannot seem to get all the way there. Mostly because I don't know what I'm doing--but this is how I'm trying to learn.
What I'm trying to achieve would be along the lines of:
https://bl.ocks.org/mbostock/3750558
or the default visualization here: http://graphalchemist.github.io/Alchemy/#/docs
And you can see what GraphJSON formatting should look like by finding it on this page also: http://graphalchemist.github.io/Alchemy/#/docs
If I run the following...
get '/followers' do
Neo4j::Session.open(:server_db, "http://localhost:7474")
query = Neo4j::Session.query('MATCH (a)--(b)--(c) RETURN a,b,c LIMIT 30')
puts "--------------"
puts query_to_graph_json(query)
query_to_graph_json(query)
end
# This is supposed to grab nodes and edges, but it never gets edges.
# It's originally from a conversation at the neo4j.rb site
def query_to_graph_json(query)
nodes = {}
edges = {}
add_datum = Proc.new do |datum|
case datum
when Neo4j::ActiveNode, Neo4j::Server::CypherNode
nodes[datum.neo_id] = {
id: datum.neo_id,
properties: datum.props #was attributes, but kept saying that wasn't a method
}
when Neo4j::ActiveRel, Neo4j::Server::CypherRelationship
edges[[datum.start_node.neo_id, datum.end_node.neo_id]] = {
source: datum.start_node.neo_id,
target: datum.end_node.neo_id,
type: datum.rel_type,
properties: datum.props
}
else
raise "Invalid value found: #{datum.inspect}"
end
end
query.each do |row|
row.to_a.each do |datum|
if datum.is_a?(Array)
datum.each {|d| add_datum.call(d) }
else
add_datum.call(datum)
end
end
end
{
nodes: nodes.values,
edges: edges.values
}.to_json
end
I'll get...
{
"nodes": [
{
"id": 597,
"properties": {
"name": "John",
"type": "Person"
}
},
{
"id": 127,
"properties": {
"name": "Chris",
"type": "Person"
}
},
{
"id": 129,
"properties": {
"name": "Suzie",
"type": "Person"
}
},
],
"edges": [
]
}
The problem being that I need the edges.
If I run...
get '/followers' do
content_type :json
neo = Neography::Rest.new("http://localhost:7474")
cypher = "MATCH (a)--(b)--(c) RETURN ID(a),a.name,ID(b),b.name,ID(c),c.name LIMIT 30"
puts neo.execute_query(cypher).to_json
end
I'll get a table of paths. But it's not formatted in the way I need--and I have no idea how it might get from this format to the GraphJSON format.
{
"columns": [
"ID(a)",
"a.name",
"ID(b)",
"b.name",
"ID(c)",
"c.name"
],
"data": [
[
597,
"John",
127,
"Chris",
129,
"Suzie"
],
[
597,
"John",
6,
"Adam",
595,
"Pee-Wee"
]
]
}
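For illustration, a table of paths like the one above could be reshaped on the client into a nodes/edges structure. This is only a sketch: `pathsToGraph` is a hypothetical helper, it assumes every row alternates id and name along one path a--b--c, and the `caption` field name is an assumption about what the visualization expects.

```javascript
function pathsToGraph(result) {
  var nodes = {};
  var edges = {};
  result.data.forEach(function (row) {
    // Each row is [idA, nameA, idB, nameB, idC, nameC], matching the columns above.
    var ids = [];
    for (var i = 0; i + 1 < row.length; i += 2) {
      nodes[row[i]] = { id: row[i], caption: row[i + 1] };
      ids.push(row[i]);
    }
    // Consecutive nodes on the path are connected: a-b and b-c.
    for (var j = 0; j + 1 < ids.length; j++) {
      // Key on the ordered pair so a-b and b-a collapse into one edge.
      var lo = Math.min(ids[j], ids[j + 1]);
      var hi = Math.max(ids[j], ids[j + 1]);
      edges[lo + "-" + hi] = { source: ids[j], target: ids[j + 1] };
    }
  });
  function values(obj) {
    return Object.keys(obj).map(function (k) { return obj[k]; });
  }
  return { nodes: values(nodes), edges: values(edges) };
}

var graph = pathsToGraph({
  columns: ["ID(a)", "a.name", "ID(b)", "b.name", "ID(c)", "c.name"],
  data: [
    [597, "John", 127, "Chris", 129, "Suzie"],
    [597, "John", 6, "Adam", 595, "Pee-Wee"]
  ]
});
```

Because the query matches without a direction, the source/target order here only reflects the order in the row, not the direction stored in the database.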
I think that one problem that you're having is that, instead of matching two nodes and one relationship, you're matching three nodes and two relationships. Here's your MATCH:
MATCH (a)--(b)--(c)
It should be like:
MATCH (a)-[b]-(c)
In a MATCH clause the [] can be excluded and you can just do a raw -- (or --> or <--) which represents the relationship.
You probably want to be querying for one specific direction though. If you query bidirectionally you'll get the same relationship twice with the start and end nodes switched.
Using neo4j-core (which I'm biased towards as one of the maintainers ;)
nodes = []
rels = []
session.query.match('(source)-[rel]->(target)').pluck(:source, :rel, :target).each do |source, rel, target|
nodes << source
nodes << target
rels << rel
end
{
nodes: nodes,
edges: rels
}.to_json
Also note that if you don't specify any labels your query might be slow, depending on the number of nodes. It depends on what you need ;)
This Cypher query should return the edges array as per the example format:
MATCH (a)-[r]-(b)
WITH collect(
{
source: id(a),
target: id(b),
caption: type(r)
}
) AS edges
RETURN edges
Running this against some sample data, the results look like this:
[
{
"source": 9456,
"target": 9454,
"caption": "LIKES"
},
{
"source": 9456,
"target": 9454,
"caption": "LIKES"
},
{
"source": 9456,
"target": 9455,
"caption": "LIKES"
},
{
"source": 9454,
"target": 9456,
"caption": "LIKES"
}
]
