I have a simple animated node diagram showing hospital infections over time. It's modeled on Mike Bostock's Wealth of Nations (wonderful work).
I'm trying to understand why I'm seeing nodes 'out ahead' of the nodes being transitioned in the current month. It's a weird thing to try and explain and easier to look at in the diagram itself.
The charting code doesn't seem complicated (however, I suspect I'm doing something wrong with the grouping).
And the CSV data itself looks pretty clean.
So what's happening to cause the transition() of nodes ahead of the current month?
And more importantly, how can I clean this up to prevent it?
Maybe this is not the place for this question, but perhaps someone here is an experienced user of D3.js.
I would like to create a dendrogram where I initially show nodes from different levels (precomputed) and the nodes are colored differently. The nodes have different tooltips for the colored part and for the grey part.
I would also like to place a heatmap beside it.
Do you think combining those things is possible in D3?
Since the work to do that is quite big I would like to know if it is reasonable to even start.
Part of the result I'm aiming for is here:
The short answer to your question is yes.
I'm looking into the same sort of problem/challenge and found a very nice example that almost exactly does what you describe: https://github.com/MaayanLab/clustergrammer
Since the solution involves 10k+ lines of code and this is not a simple 'use this to do this' answer, I'm not providing code excerpts (for details see their GitHub). In short, it uses D3 libraries plus JavaScript code for dynamic plotting, zooming and sorting of the heatmap and a collapsed dendrogram. It loads (meta)data from a pre-computed JSON file that contains the information on clusters and some metadata.
As I understand your question, you would prefer not to use a pre-computed input. This is also the case for the application that I am building. I'm looking into generalising the generation of the JSON file from an SQL query, which can then hook up to the clustergrammer.js code. I will update this thread if I find out more or have a different, working solution that does everything on the fly.
I want to get my hands dirty with some machine learning, and I finally have a problem which seems like a good beginner project. However, despite reading a lot about the subject I am unsure how to get started, and what my basic approach should be.
I have a dataset which should look like this.
A real dataset looks more like this:
I want to identify the points in the red circles (on the first image), and be robust against occasional artifacts like the one in the blue circle.
It sounds like a really easy task. However, there is quite a lot of noise in the raw data. My current implementation is pretty traditional: it blurs the data and compares the first and second derivatives to some estimated threshold values. This approach works, but it identifies the points with "only" ~99.7% accuracy, and since I do around 100,000 measurements a day I would love to increase this number.
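For anyone who wants something concrete, the traditional approach described above could look roughly like the following Python sketch. The moving-average blur, the finite-difference derivatives, the function names and the default thresholds are all assumptions for illustration; the actual implementation and its threshold values are not shown in this post.

```python
def moving_average(values, window=5):
    """Blur the signal with a centred moving average (shorter window at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def derivative(values):
    """Central finite difference, one-sided at the two ends."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    out = []
    for i in range(n):
        lo = max(0, i - 1)
        hi = min(n - 1, i + 1)
        out.append((values[hi] - values[lo]) / (hi - lo))
    return out


def find_kinks(values, window=5, slope_min=0.1, curvature_min=0.2):
    """Indices where both the slope and the curvature of the blurred signal
    exceed their (hypothetical) thresholds."""
    smoothed = moving_average(values, window)
    d1 = derivative(smoothed)
    d2 = derivative(d1)
    return [
        i
        for i in range(len(values))
        if abs(d1[i]) >= slope_min and abs(d2[i]) >= curvature_min
    ]
```

A sketch like this has three hand-tuned numbers (window, slope_min, curvature_min), which is what a learned model would replace.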
So, this is what I have:
All the datasets I want/need
A pretty good model of how the data should look.
A pretty good training set, using my existing algorithm (the outlines can be fixed manually)
However, I do not have even a basic idea of what approach I should use. It feels like none of the material I've read on machine learning fits this problem.
Can someone help me with the super high level approach to solve this problem?
I am getting the gps position and time of a voluntary person which moves around. I am acquiring the position every second with Matlab and save it in a matrix.
Now I would like to be able to say whether the person is moving normally or not. For example, running in circles is not normal for a person who usually only walks around.
I am not looking for a complete solution because I would like to learn through my project and understand every aspect. I would be very grateful if you could point me in the right direction. Good literature, tutorials and simple catchwords would also be very helpful, because at the moment I don't know how to approach my problem.
Thank you very much in advance!
Kind regards,
Tom
What you're looking for is anomaly detection. The primary commercial application of this technology is fraud detection. As for resources, any book that covers data mining should have a section on anomaly detection.
One thing to forewarn you about: it sounds from your description like you will be working with time series data, which is its own branch of data mining.
Catchwords: Anomaly Detection, and Time Series Data.
Books: ISBN-13 978-0321321367 Introduction to Data Mining (This is a good starting point if you don't have a lot of background in the subject)
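To make the catchwords a bit more tangible, here is a minimal Python sketch of one very simple form of anomaly detection: learn the mean and spread of the walking speed from a recording known to be normal, then flag steps whose speed lies far outside that range. Everything here is an assumption for illustration: the (t, x, y) sample format with x and y already converted from GPS coordinates to metres, the function names, and the z-score threshold of 3. The same idea carries over to Matlab.

```python
import math
from statistics import mean, stdev


def speeds(track):
    """Speed between consecutive (t, x, y) samples, in distance units per second."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        out.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return out


def fit_baseline(normal_track):
    """Mean and standard deviation of speed in a recording assumed to be normal."""
    s = speeds(normal_track)
    sigma = stdev(s)
    if sigma == 0:
        raise ValueError("baseline speed has no variation; z-scores are undefined")
    return mean(s), sigma


def anomalous_steps(track, baseline, z_max=3.0):
    """Indices of steps whose speed is more than z_max deviations from the baseline."""
    mu, sigma = baseline
    return [i for i, s in enumerate(speeds(track)) if abs(s - mu) / sigma > z_max]
```

Speed alone will not catch "running in circles" at walking pace; adding the change in heading between steps as a second feature would be a natural next step.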
This question is in regards to Mike Bostock's very exciting d3.js library in general, and more specifically the treemap plot. Note: treemap seems to have two versions, the "talk version" and the "example version". My question relates to the "talk version," which has the zoom feature.
My question is more of a wish: How difficult would it be to extend treemap to accommodate and show multiple internal nodes, with multiple levels of zoom? For example, click to go down one level and option-click to go up one level. Perhaps to keep things tidy, only nodes one level deeper are painted -- as you zoom in, deeper levels are resolved.
This is my pie-in-the-sky wish -- I am not familiar with javascript and can't take this on right now -- but it seems do-able on a visual/UI level. I did notice that mbostock commented here that treemap only shows leaf nodes, but I don't know if this is a design constraint or just a SMOP.
Anyone with any interest in doing this? Possibly for a commission? Thanks.
It appears the author posted a nearly exact answer to my question on his website the day after I posted it. Whether or not this question prompted the adaptation, I am excited to try it out!
He is calling it "Zoomable Treemap". He also points out a couple other examples on the net.
Thanks, mbostock!
I'm a member of a small but fairly sociable online forum, and just for fun we've been plotting a chart of who's met who in real life. Here's what it looked like fairly recently.
(The colour is the "distance" from the currently-selected user, e.g., yellow is someone who's met someone who's met them. And no, I'm not Zak.) Apologies for the faded lines, they don't seem to have weathered the SO upload process very well.
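For reference, the "distance" used for the colouring is the number of links on the shortest chain of meetings between two members, which a breadth-first search gives directly. Below is a minimal Python sketch; the list-of-pairs input format and the function name are assumptions for illustration, not how the chart's JSON is actually structured.

```python
from collections import deque


def hop_distances(met, start):
    """Breadth-first search: number of 'met' links from start to each reachable member."""
    adjacency = {}
    for a, b in met:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in dist:  # first visit is the shortest path
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist
```

Members who are not connected to the selected user at all simply do not appear in the result.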
It's generated as SVG, with a big block of JSON defining who's met who. The position (x,y) of each member on the chart is hard-coded into that JSON. Until now, it's been fairly easy to cope when someone meets someone else - at worst, maybe two or three people need to be shuffled around - but it does involve editing the co-ordinates manually. And now that the European and North American contingents are meeting up, and a few on the periphery are showing up at meets, all hell is breaking loose...
We can put some effort into making all the nodes draggable, which would make the job of re-arranging a bit less tiresome. But it seems more sensible to let the computer take care of positioning them, especially as the problem will only get harder with more members.
So, does anyone know of an algorithm for positioning these nodes on the chart, based on which other nodes they're linked with?
Ideally, it would
minimise or avoid long links
avoid having lines run underneath unrelated nodes
take account of the fact that well-connected nodes are bigger
do its best to show the wider "all these guys met each other" relationships (the big circle at the bottom is largely the result of one meet, for example, though the chart has no idea of when any two people met)
but if it gets us close enough to tweak it, that's progress.
And, what's the real name for these charts? I believe they're called "link charts", but I'm not getting good results from Google using that name or anything else I can think of.
We'll likely be implementing this in PHP or Javascript, but right now it's how to begin approaching the problem that's the bigger question.
Edit: Some great answers coming already. I would be very interested in the actual algorithm(s) used, though, as well as tools that do the job.
What you are looking for are, for example, force-based algorithms. There are quite a few libraries, and some have been named already, like prefuse and yWorks. Here are a few more: jung, gvf, jGraph.
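Since the question asks about the actual algorithm, here is a minimal Python sketch of one common force-based scheme, in the style of Fruchterman and Reingold: every pair of nodes repels, linked nodes attract, and the allowed movement per step shrinks over time. It is not taken from any of the libraries above; the function name, the constants (starting temperature, cooling factor, iteration count) and the unit-square frame are assumptions for illustration, and it expects a non-empty node list.

```python
import math
import random


def force_layout(nodes, edges, width=1.0, height=1.0, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple force-directed layout."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # preferred distance between nodes
    temperature = width / 10  # largest move allowed in one step

    def add_force(disp, a, b, magnitude):
        """Push a away from b (and b away from a); a negative result pulls them together."""
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        dist = math.hypot(dx, dy) or 1e-9
        f = magnitude(dist)
        disp[a][0] += dx / dist * f
        disp[a][1] += dy / dist * f
        disp[b][0] -= dx / dist * f
        disp[b][1] -= dy / dist * f

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes, stronger when they are close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                add_force(disp, a, b, lambda d: k * k / d)

        # Attraction along each link, stronger when the link is long.
        for a, b in edges:
            add_force(disp, a, b, lambda d: -d * d / k)

        # Move each node, capped by the temperature and kept inside the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))

        temperature *= 0.97  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}
```

The pairwise repulsion loop makes each iteration quadratic in the number of nodes, which is fine for a forum-sized chart. Larger node sizes for well-connected members are not modelled here.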
The real name for it is "graph". To generate a graph with a good layout algorithm, it is best to use software that does the job for you.
I advise you to use Gephi.
This software can do everything you want.
Have a look at the yWorks tools.
You can google for graph visualization. There are more libraries for this, including GraphViz, but probably not all your requirements will be met.
If you can deal with Java, take a look at prefuse.
Have a look at NodeXL
Also, this book may be relevant.