Performance and layouts of Cytoscape.js

I'm testing the rendering performance of Cytoscape.js.
My graph contains about 5000 nodes and 5000 edges without x, y positions, so it relies on the automatic layout of Cytoscape.js. With the Euler layout extension, the layout takes more than 15 seconds after all nodes and edges are rendered, and the browser tab freezes for a while or responds slowly to the next operation. As has been said, Cytoscape.js is limited by browser performance. We load JSON data from a Java server, add the elements in a for loop, and then call layout.run() to run the automatic layout. How can we improve performance with large data?
Would data with precomputed x, y positions improve performance? But we don't know how to circulate the x,y positions in Java. Can you show me? Is there a Java plugin for the Cytoscape.js layouts?

Cytoscape.js cannot be used with truly big data (i.e. terabytes or more) because it runs in the browser. Even for medium sizes like your 5000 nodes and 5000 edges, 15 seconds sounds normal for Cytoscape.js.
The problem is that JavaScript is slow at tasks like graph layout. Modern CPUs gain performance mainly by adding more cores, but JavaScript's parallelism mechanism (web workers) has too much overhead for algorithms with many short iteration steps, where the results of all threads have to be merged after every step. Also, as far as I know, GPU computing is harder to do in JavaScript.
Both of those problems may be addressed in the future and the developer of Cytoscape.js, Max Franz, seems to be extremely active and supportive, so if JavaScript ever gets better parallelization and GPU computing support, I am positive this will find its way into Cytoscape.js shortly.
For now you could try some workarounds:
Is the graph always the same? Then you could precalculate the layout and load it as a preset layout.
Does the graph at least change infrequently? Then you can cache the layout and only recalculate it if necessary.
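A minimal sketch of the caching idea, assuming the graph arrives as plain arrays of node ids and edge pairs. The names graphKey, getCachedLayout and the computeLayout callback are hypothetical; in a real page computeLayout would run the expensive Cytoscape.js layout and read back the node positions, and the cache could live in localStorage or on the server instead of in memory.

```javascript
// Builds a stable key from the graph structure: any change to the node set
// or the edge set yields a different key. (For very large graphs a hash of
// this string would be a cheaper key.)
function graphKey(nodeIds, edges) {
  const nodes = [...nodeIds].sort().join("|");
  const edgeList = edges.map(([s, t]) => `${s}->${t}`).sort().join("|");
  return `${nodes}#${edgeList}`;
}

// In-memory cache: key -> { nodeId: { x, y } }.
const layoutCache = new Map();

// Returns cached positions when the structure is unchanged; otherwise runs
// the (expensive, hypothetical) computeLayout callback once and stores it.
function getCachedLayout(nodeIds, edges, computeLayout) {
  const key = graphKey(nodeIds, edges);
  if (!layoutCache.has(key)) {
    layoutCache.set(key, computeLayout(nodeIds, edges));
  }
  return layoutCache.get(key);
}
```

The layout is recomputed only when the node or edge set actually changes; reordering the input arrays does not invalidate the cache because the key is built from sorted lists.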
I don't know what you mean by "circulate the x,y positions in Java", do you mean "load a preset layout in the JavaScript(!) library Cytoscape.js"? If so, that is explained here: http://js.cytoscape.org/#layouts/preset. Specifically, you define the x and y coordinates like:
let options = {
  name: 'preset',
  positions: undefined, // map of (node id) => (position obj); or function(node){ return somePos; }
  // ... other preset layout options
};
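A sketch of feeding precomputed coordinates into the preset layout, assuming the Java server sends a JSON array shaped like [{"id": "n1", "x": 12.5, "y": 40}, ...]. That payload shape and the toPresetOptions helper are assumptions, not part of Cytoscape.js. The helper only builds the options object with a positions map keyed by node id; the result would then be passed to cy.layout(options).run().

```javascript
// Converts a server payload of saved coordinates into preset layout options.
// Assumed payload shape: [{ id: "n1", x: 12.5, y: 40 }, ...]
function toPresetOptions(savedNodes) {
  const positions = {};
  for (const { id, x, y } of savedNodes) {
    // Skip entries without usable numeric coordinates.
    if (Number.isFinite(x) && Number.isFinite(y)) {
      positions[id] = { x, y }; // (node id) => (position obj)
    }
  }
  return { name: "preset", positions };
}

// In the page, with a Cytoscape.js instance `cy`:
// cy.layout(toPresetOptions(payload)).run();
```

With this approach the browser does no force-directed computation at all; the expensive part moves to wherever the coordinates were produced and saved.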
There are also graph visualization libraries that have far fewer features than Cytoscape.js but are faster, so if you don't need those features and just want to visualize a simple graph, you may try ngraph; see a demo at http://www.yasiv.com/graphs#Bai/rw5151.

Related

d3js force large number of nodes

Please help me with this beginner question. I want to show a network with a large number of nodes (70,000) and 2.1 million links in a force layout, and I am looking for a good, scalable way to do this.
How can we practically show that many nodes? Can we do some kind of approximation and show a semantically equivalent network (e.g. http://www.visualcomplexity.com/vc/project.cfm?id=76)?
How do we actually reduce such data on the back end (say, using KDE)? We cannot afford to use science.js on the front end because the volume is large.
The initial view could be the network with predetermined locations of the nodes or clusters. How do we predetermine the locations on the back end, before sending the data to d3.js? Do we have to use TopoJSON?
Are any such examples available using d3.js (and a back end, say Java, Python, etc.)?
Sorry about the question, but do you really need to show all that information in one shot?
If you really need it, first take a look at the data in Gephi to see what it looks like, then move on to the next step.
If you find that you can start from specific nodes or patterns and then let the user explore the chart from there, that is probably the best solution from a performance point of view.
In case the discovery approach works but you still have trouble with many items on the screen, just control the force layout with a time-based threshold. It's not perfect, but it will work for a few hundred nodes.
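The time-based threshold can be sketched independently of d3: run force-directed steps repeatedly, stop when either a time budget or an iteration cap is reached, and accept whatever layout exists at that point. The force model below is a deliberately naive Fruchterman-Reingold-style step with O(n²) repulsion, and forceStep and runWithTimeBudget are hypothetical helpers, not d3 APIs; with d3-force you would get a similar effect by stopping the simulation after a timeout.

```javascript
const IDEAL_EDGE_LENGTH = 30; // assumed target distance between linked nodes

// One naive force-directed step. nodes: [{x, y}], links: [[i, j]] (indices).
function forceStep(nodes, links, temperature) {
  const k = IDEAL_EDGE_LENGTH;
  const disp = nodes.map(() => ({ x: 0, y: 0 }));
  // Repulsion between every pair of nodes: O(n^2), only OK for small graphs.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[i].x - nodes[j].x, dy = nodes[i].y - nodes[j].y;
      const d = Math.hypot(dx, dy) || 0.01;
      const f = (k * k) / d;
      disp[i].x += (dx / d) * f; disp[i].y += (dy / d) * f;
      disp[j].x -= (dx / d) * f; disp[j].y -= (dy / d) * f;
    }
  }
  // Spring-like attraction along each link.
  for (const [s, t] of links) {
    const dx = nodes[s].x - nodes[t].x, dy = nodes[s].y - nodes[t].y;
    const d = Math.hypot(dx, dy) || 0.01;
    const f = (d * d) / k;
    disp[s].x -= (dx / d) * f; disp[s].y -= (dy / d) * f;
    disp[t].x += (dx / d) * f; disp[t].y += (dy / d) * f;
  }
  // Move each node, but never further than the current temperature.
  nodes.forEach((n, i) => {
    const len = Math.hypot(disp[i].x, disp[i].y);
    if (len > 0) {
      const step = Math.min(len, temperature);
      n.x += (disp[i].x / len) * step;
      n.y += (disp[i].y / len) * step;
    }
  });
}

// Runs steps until the time budget (ms) or the iteration cap is exhausted.
// Returns the number of steps actually performed.
function runWithTimeBudget(nodes, links, budgetMs, maxIterations = 500) {
  const start = Date.now();
  let temperature = 10;
  let iterations = 0;
  while (iterations < maxIterations && Date.now() - start < budgetMs) {
    forceStep(nodes, links, temperature);
    temperature = Math.max(0.5, temperature * 0.98); // gradual cooling
    iterations++;
  }
  return iterations;
}
```

This loop blocks the thread for up to budgetMs, so in a browser you would keep the budget small or spread the work over several animation frames.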
Next step
If you decide to go down this path anyway, I would recommend the following:
Aggregate: that's probably the most useful thing you can do here: let the user interact with the data and dig in it to see more in detail. That is the best solution if you have to serve many clients.
Do not run the force directed layout on the front end with the entire network as is: it will eat all the browser resources for at least tens of minutes in any case.
Compute the layout on the back end - e.g. using JUNG or Gephi core itself in Java or NetworkX in Python - and then just display the result.
Cache the result of the previous point as well: these computations are expensive even for the server if you have many clients, so cache them.
When the user drags the network, hide the links: it should speed up rendering (sigma.js uses this trick).
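The aggregation idea can be sketched as collapsing every cluster into a single meta-node and merging the links between clusters into weighted meta-links. This assumes each node already carries a cluster attribute computed on the back end (for example by a community-detection step); aggregateByCluster and the input shapes are hypothetical. The front end then renders only the meta-graph and expands a cluster on demand.

```javascript
// nodes: [{ id, cluster }], links: [{ source, target }] (node ids).
// Returns one meta-node per cluster and one weighted meta-link per
// pair of connected clusters.
function aggregateByCluster(nodes, links) {
  const clusterOf = new Map();
  const sizes = new Map();
  for (const n of nodes) {
    clusterOf.set(n.id, n.cluster);
    sizes.set(n.cluster, (sizes.get(n.cluster) || 0) + 1);
  }
  const weights = new Map();
  for (const { source, target } of links) {
    const a = clusterOf.get(source), b = clusterOf.get(target);
    // Skip links to unknown nodes and links inside a single cluster.
    if (a === undefined || b === undefined || a === b) continue;
    // Treat links as undirected: order the pair so a-b and b-a merge.
    const [u, v] = String(a) < String(b) ? [a, b] : [b, a];
    const key = JSON.stringify([u, v]);
    weights.set(key, (weights.get(key) || 0) + 1);
  }
  return {
    metaNodes: [...sizes].map(([id, size]) => ({ id, size })),
    metaLinks: [...weights].map(([key, weight]) => {
      const [source, target] = JSON.parse(key);
      return { source, target, weight };
    }),
  };
}
```

The meta-node size and meta-link weight can drive node radius and edge thickness, so the reduced view still hints at the density hidden inside each cluster.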

DrawPrimitives performance

I want to draw single faces instead of XNA models because the models are too slow.
But I don't know what the difference is between
DrawPrimitives
DrawUserPrimitives
DrawIndexedPrimitives
DrawUserIndexedPrimitives
Which one is the fastest method? And what are the indices good for?
The simple answer to your question is that the "User" versions are a fair bit slower on the CPU because they have to transfer vertex data to the GPU (via the driver and the bus) each time they are called.
The non-User versions use vertex and index buffers that already exist on the GPU (you put them there at load time). They have considerably less data to transfer, so they are faster.
The "User" and "Indexed" versions will also each have a performance impact on the GPU. This impact is relatively tiny. Generally speaking you don't need to worry about it.
The User versions exist because they are faster when your data changes each frame. There is also DynamicVertexBuffer which can be used with the non-User version of the draw functions. I believe it is slightly faster than the User methods in cases where you can pre-allocate the buffer at the desired size.
The Indexed versions allow you to select vertices out of your vertex buffer using an index buffer (so triangles that you draw can choose vertices at any position in the vertex buffer). The alternative is that your vertex buffer is simply interpreted as a sequential list of triangle vertices (based on PrimitiveType). The main reason for the existence of index buffers is to remove the need for duplicate vertices in your vertex buffer (which would require additional memory and processing on the GPU).
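To make the duplicate-vertex point concrete, here is a language-neutral sketch in JavaScript (not XNA code) that turns a non-indexed triangle list into a vertex array plus an index array. A quad drawn as two triangles needs 6 vertices without indices, but only 4 vertices plus 6 indices with them. The buildIndexedMesh name is hypothetical, and vertices here are position-only and matched by exact equality; real vertices usually also carry normals and texture coordinates, which would have to match too.

```javascript
// soup: flat [x, y, z, x, y, z, ...]; every 3 vertices form one triangle.
// Returns unique vertices plus indices that reproduce the same triangles.
function buildIndexedMesh(soup) {
  const vertices = [];
  const indices = [];
  const seen = new Map(); // "x,y,z" -> index of the vertex in `vertices`
  for (let i = 0; i < soup.length; i += 3) {
    const key = `${soup[i]},${soup[i + 1]},${soup[i + 2]}`;
    let index = seen.get(key);
    if (index === undefined) {
      index = vertices.length / 3;
      vertices.push(soup[i], soup[i + 1], soup[i + 2]);
      seen.set(key, index);
    }
    indices.push(index);
  }
  return { vertices, indices };
}

// A unit quad as two triangles sharing an edge:
// buildIndexedMesh([0,0,0, 1,0,0, 1,1,0,  0,0,0, 1,1,0, 0,1,0])
// gives 4 unique vertices and indices [0, 1, 2, 0, 2, 3].
```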
BUT...
XNA's Model class internally uses DrawIndexedPrimitives. Not only that, but it uses it correctly (i.e. it doesn't draw single faces, but as many as it can at once) for the best performance. So if you find that it is slow, your problem lies elsewhere.
I suggest trying to diagnose the reason why your game is performing poorly, before trying to select a "solution". Maybe ask for help doing that in a question here (or on https://gamedev.stackexchange.com/).
Draw everything at once if you can. Instanced drawing will always be better, but it requires you to supply all the textures at once. In my case, for example, I like to draw instanced objects with only one texture: all the trees, all the ground, all the buildings, etc.

Clustering geo-data for heatmap

I have a list of tweets with their geo locations.
They are going to be displayed in a heatmap image transparently placed over Google Map.
The trick is to find groups of locations residing next to each other and display them as a single heatmap circle/figure of a certain heat/color, based on cluster size.
Is there a ready-made library for grouping locations on a map into clusters?
Or should I decide on my clustering parameters and build a custom algorithm?
I don't know whether there is a ready-made library for grouping map locations into clusters; maybe there is, maybe there isn't. Anyway, I don't recommend building your own custom clustering algorithm, since many libraries already implement this.
@recursive sent you a link to PHP code for k-means (one clustering algorithm). There is also a huge Java library (Java-ML) with other techniques, including k-means, hierarchical clustering, k-means++ (for selecting the initial centroids), etc.
Finally, note that clustering is an unsupervised technique: it will give you a set of clusters with data in them, but at first glance you don't know on what basis the algorithm grouped your data. It may cluster by location, as you want, but it may also cluster by some other characteristic you don't care about, so it's all about playing with the algorithm's parameters and tuning your solution.
I'm interested in the final solution you could find to this problem :) Maybe you can share it in a comment when you end this project!
K-means clustering is a technique often used for such problems.
The basic idea is this:
Given an initial set of k means m1,…,mk, the
algorithm proceeds by alternating between two steps:
Assignment step: Assign each observation to the cluster with the closest mean
Update step: Calculate the new means to be the centroid of the observations in the cluster.
Here is some sample code for PHP.
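The two alternating steps above can be sketched in JavaScript as follows. This is a minimal version of the standard (Lloyd) k-means loop, assuming the caller supplies the initial means and that treating [lat, lng] as flat 2D coordinates is acceptable, which is roughly true only for small regions away from the poles. The kMeans name and signature are hypothetical.

```javascript
// points: [[lat, lng], ...]; initialMeans: k starting centroids.
// Alternates the assignment and update steps until nothing changes.
function kMeans(points, initialMeans, maxIterations = 100) {
  let means = initialMeans.map((m) => [...m]);
  const assignment = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: each point goes to the cluster with the closest mean.
    let changed = false;
    points.forEach((p, i) => {
      let best = 0, bestDist = Infinity;
      means.forEach((m, k) => {
        const d = (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2;
        if (d < bestDist) { bestDist = d; best = k; }
      });
      if (assignment[i] !== best) { assignment[i] = best; changed = true; }
    });
    if (!changed) break; // converged
    // Update step: each mean becomes the centroid of its members.
    means = means.map((m, k) => {
      const members = points.filter((_, i) => assignment[i] === k);
      if (members.length === 0) return m; // keep an empty cluster's mean
      const sum = members.reduce((acc, p) => [acc[0] + p[0], acc[1] + p[1]], [0, 0]);
      return [sum[0] / members.length, sum[1] / members.length];
    });
  }
  return { means, assignment };
}
```

For a heatmap, the size of each cluster (how many points share an assignment) would then map to the circle's heat/color.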
heatmap.js is an HTML5 library for rendering heatmaps, and has a sample for doing it on top of the Google Maps API. It's pretty robust, but only works in browsers that support canvas:
The heatmap.js library is currently supported in Firefox 3.6+, Chrome 10, Safari 5, Opera 11 and IE 9+.
You can try my PHP Hilbert curve class at phpclasses.org. A Hilbert curve is a space-filling ("monster") curve that reduces 2D complexity to 1D complexity. I use a quadkey to address a coordinate, with 21 zoom levels like Google Maps.
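The answer does not show how its quadkey is built, so the sketch below uses the common Web Mercator tile scheme (the one popularized by Bing Maps): project the coordinate to tile X/Y at a zoom level, then interleave their bits into one base-4 digit per level. Note that this interleaving is Z-order (Morton) addressing, not a Hilbert curve; nearby points usually, but not always, share a long key prefix, which makes prefix grouping a cheap way to bucket points. The latLngToQuadKey name is hypothetical.

```javascript
// Returns a quadkey string with one digit (0-3) per zoom level.
function latLngToQuadKey(lat, lng, level) {
  // Web Mercator is undefined at the poles, so clamp the latitude.
  const clampedLat = Math.max(-85.05112878, Math.min(85.05112878, lat));
  const sinLat = Math.sin((clampedLat * Math.PI) / 180);
  // Normalized map coordinates in [0, 1].
  const x = (lng + 180) / 360;
  const y = 0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI);
  const n = 2 ** level; // tiles per side at this zoom level
  const tileX = Math.min(n - 1, Math.max(0, Math.floor(x * n)));
  const tileY = Math.min(n - 1, Math.max(0, Math.floor(y * n)));
  // Interleave bits from the most significant level down.
  let key = "";
  for (let i = level; i > 0; i--) {
    const mask = 1 << (i - 1);
    let digit = 0;
    if (tileX & mask) digit += 1;
    if (tileY & mask) digit += 2;
    key += digit;
  }
  return key;
}

// latLngToQuadKey(0, 0, 2) returns "30".
```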
This isn't really a clustering problem. Heatmaps don't work by creating clusters. Instead, they convolve the data with a Gaussian kernel. If you're not familiar with image processing, think of it as taking a normal (Gaussian) "stamp" and stamping it over each point. Since overlapping stamps add up, areas of high density end up with higher values.
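The "stamp" description translates almost directly into code. In this minimal sketch each point adds exp(-d²/(2σ²)) to every grid cell within 3σ of it, and the sums form the density image that is later mapped to colors. It assumes the points have already been projected to pixel coordinates of the overlay; densityGrid and its parameters are hypothetical.

```javascript
// points: [[px, py], ...] in pixel coordinates; returns grid[y][x] densities.
function densityGrid(points, width, height, sigma) {
  const grid = Array.from({ length: height }, () => new Array(width).fill(0));
  const radius = Math.ceil(3 * sigma); // the Gaussian tail beyond 3 sigma is negligible
  for (const [px, py] of points) {
    const y0 = Math.max(0, Math.floor(py - radius));
    const y1 = Math.min(height - 1, Math.ceil(py + radius));
    const x0 = Math.max(0, Math.floor(px - radius));
    const x1 = Math.min(width - 1, Math.ceil(px + radius));
    for (let y = y0; y <= y1; y++) {
      for (let x = x0; x <= x1; x++) {
        const d2 = (x - px) ** 2 + (y - py) ** 2;
        // Overlapping stamps simply add up.
        grid[y][x] += Math.exp(-d2 / (2 * sigma * sigma));
      }
    }
  }
  return grid;
}
```

A larger sigma gives smoother, blobbier heat; it plays the role that the cluster radius would play in a clustering approach.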
One simple alternative for heatmaps is to just round the lat/long to some decimals and group by that.
See this explanation about lat/long decimal accuracy.
1 decimal - 11km
2 decimals - 1.1km
3 decimals - 110m
etc.
For a low zoom level heatmap with lots of data, rounding to 1 or 2 decimals and grouping the results by that should do the trick.
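A sketch of the round-and-group approach, assuming points arrive as { lat, lng } objects; groupByRoundedLatLng is a hypothetical helper. Each bucket's count can drive the heat of a single heatmap point placed at the rounded coordinate. Math.round is used rather than toFixed so that values just below zero land in the same bucket as those just above it.

```javascript
// Groups points whose coordinates are equal after rounding to `decimals`.
function groupByRoundedLatLng(points, decimals) {
  const factor = 10 ** decimals;
  const buckets = new Map();
  for (const { lat, lng } of points) {
    const rLat = Math.round(lat * factor) / factor;
    const rLng = Math.round(lng * factor) / factor;
    const key = `${rLat},${rLng}`;
    const bucket = buckets.get(key);
    if (bucket) bucket.count++;
    else buckets.set(key, { lat: rLat, lng: rLng, count: 1 });
  }
  return [...buckets.values()];
}
```

With 2 decimals, two points about 500 m apart in the same city usually fall into one bucket, while points in different cities never do.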

Performance problem with rendering 1000 cubes in XNA 4.0 [duplicate]

I'm aware that the following is a vague question, but I'm hitting performance problems that I did not anticipate in XNA.
I have a low poly model (It has 18 faces and 14 vertices) that I'm trying to draw to the screen a (high!) number of times. I get over 60 FPS (on a decent machine) until I draw this model 5000+ times. Am I asking too much here? I'd very much like to double or triple that number (10-15k) at least.
My code for actually drawing the models is given below. I have tried to eliminate as much computation from the draw cycle as possible. Is there more I can squeeze from it, or are there better alternatives altogether?
Note: tile.Offset is computed once during initialisation, not every cycle.
foreach (var tile in Tiles)
{
    var myModel = tile.Model;
    Matrix[] transforms = new Matrix[myModel.Bones.Count];
    myModel.CopyAbsoluteBoneTransformsTo(transforms);
    foreach (ModelMesh mesh in myModel.Meshes)
    {
        foreach (BasicEffect effect in mesh.Effects)
        {
            // effect.EnableDefaultLighting();
            effect.World = transforms[mesh.ParentBone.Index]
                * Matrix.CreateTranslation(tile.Offset);
            effect.View = CameraManager.ViewMatrix;
            effect.Projection = CameraManager.ProjectionMatrix;
        }
        mesh.Draw();
    }
}
You're quite clearly hitting the batch limit. See this presentation and this answer and this answer for details. Put simply: there is a limit to how many draw calls you can submit to the GPU each second.
The batch limit is a CPU-based limit, so you'll probably see that your CPU gets pegged once you get to your 5000+ models. Worse still, when your game is doing other calculations, it will reduce the CPU time available to submit those batches.
(And it's important to note that, conversely, you are almost certainly not hitting GPU limits. No need to worry about mesh complexity yet.)
There are a number of ways to reduce your batch count. Frustum culling is one. Probably the best one to pursue in your case is geometry instancing, which lets you draw multiple models in a single batch. Here is an XNA sample that does this.
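Frustum culling reduces the batch count by never issuing draw calls for tiles that are off-screen. The core test can be sketched language-neutrally in JavaScript (XNA itself provides BoundingFrustum and BoundingSphere types for this): a bounding sphere is culled if it lies entirely behind any one of the frustum's planes. The plane representation (inward-pointing normal plus offset d) and the helper names are assumptions of this sketch.

```javascript
// planes: [{ nx, ny, nz, d }] with normals pointing into the frustum,
// so a point p is inside a plane's half-space when n·p + d >= 0.
function isSphereVisible(center, radius, planes) {
  for (const p of planes) {
    const dist = p.nx * center.x + p.ny * center.y + p.nz * center.z + p.d;
    if (dist < -radius) return false; // completely outside this plane
  }
  return true; // inside or intersecting the frustum
}

// Keeps only tiles whose bounding sphere may be visible;
// only these would get a draw call.
function cullTiles(tiles, planes) {
  return tiles.filter((t) => isSphereVisible(t.center, t.radius, planes));
}
```

The test is conservative: a sphere near a frustum corner may pass even though the mesh is not visible, which only costs an unnecessary draw call, never a missing object.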
Better still, if it's static geometry, can you simply bake it all into one or a few big meshes?
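Baking static geometry means copying every tile's vertices, already translated by its offset, into one shared vertex array and shifting each tile's indices by the number of vertices added before it. A language-neutral sketch in JavaScript follows; the bakeStaticMeshes name and the { positions, indices } mesh shape are hypothetical. With 16-bit indices one buffer can address at most 65,536 vertices, so the asker's 14-vertex model fits about 4,681 tiles per buffer, and 10,000 tiles would need at least 3 baked meshes: one reason to aim for "a few" big meshes rather than exactly one.

```javascript
// instances: [{ mesh: { positions: [x, y, z, ...], indices: [...] },
//               offset: { x, y, z } }]
// Returns one combined mesh that can be drawn with a single call.
function bakeStaticMeshes(instances) {
  const positions = [];
  const indices = [];
  for (const { mesh, offset } of instances) {
    const baseVertex = positions.length / 3; // vertices already emitted
    for (let i = 0; i < mesh.positions.length; i += 3) {
      positions.push(
        mesh.positions[i] + offset.x,
        mesh.positions[i + 1] + offset.y,
        mesh.positions[i + 2] + offset.z
      );
    }
    // Re-point this instance's indices at its copied vertices.
    for (const idx of mesh.indices) indices.push(idx + baseVertex);
  }
  return { positions, indices };
}
```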
As with any performance problem, there are limits to where a particular approach works. You need to measure and see where the problems are. The best option is to use a profiler, but even basic measurements like looking at CPU load may show what bottlenecks you have.
As a first investigation step, I'd recommend removing all computations (like matrix multiplications) and seeing whether you get improvements; if you do, the CPU is still doing more work than the GPU.
Make sure you are not measuring a debug build; it can make the application significantly slower if it is CPU bound.
Side note: a GPU works best when you send large operations relatively infrequently. Your code does more or less the opposite: it sends a huge number of very small drawing requests. You should be able to batch your primitives and get better performance. There are samples around showing how to render large numbers of simple objects (including ones in the DirectX SDK); searching for "gpu rendering crowds" can give you a starting point.

Datastructure for googlemap like application?

I am writing a map routing application. Several people have suggested that I use a data structure where I split the map into a grid. In theory it sounds really good, but I am not too sure because of the bad performance I get when I implement it.
In the worst case you have to draw every road. If you divide the map into a grid, the total number of road entries across all grid cells will be much larger than the number of roads in a single list, because a road that passes through several cells has to be stored in each of them.
When zooming in, I can see the advantage of a grid, but with a list I can simply reduce the number of roads each time I zoom in.
As it is now (using the list) it is not really fast, so I am all for making it faster. But in practice, dividing the map into a grid makes it slower for me.
Any suggestions for what data structure I should be using and/or what I might be doing wrong?
See this question for related information:
What algorithms compute directions from point A to point B on a map?
Somebody who writes this kind of software for a living has answered it.
Also for rendering see:
What is the best way to read, represent and render map data?
I'm not quite sure whether you're trying to make routing or rendering fast!
If you want routing to be fast, you might be better off organizing your roads into major and minor roads.
Use the list of minor roads to find a route to the nearest major road.
Use the major roads to get you near the destination.
Then go back to the minor roads to complete the route.
Without a split like this, there are a heck of a lot of roads to search, most of which are quite slow routes.
Google does not draw each road every time the screen is refreshed. They use pre-drawn tiles of the map, which they can redraw as needed, e.g. when there is a map update. They even use transparent overlays: stacks of tiles that add and remove layers of detail.
Very clever, but very simple.
You may want to look at the OpenLayers JavaScript library. It is free and can do just about anything you need to do with a map.
Mapstraction JS is also available; it's not as complete as OpenLayers.
A quadtree may be better than a grid as your spatial data structure, because it recursively subdivides the map into quadrants, so the depth of the structure grows only logarithmically with the level of detail. From studying the source, my guess is that Google uses a quadtree or a similar data structure.
As for getting directions, you might want to look into hierarchical path finding to approximate the direction first and speed up the process; generic path finding algorithms tend to be quite slow at that level of complexity.
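A minimal quadtree sketch for roads, assuming each road is represented by its bounding box. It also addresses the duplication concern from the question: each road is stored exactly once, in the deepest node whose bounds fully contain it, so a road is never copied into several cells as in a grid. Roads that straddle a quadrant boundary stay in a higher node (a "loose" quadtree is a common refinement for that). The QuadTree class, maxDepth value and box shape are hypothetical.

```javascript
// Boxes are { minX, minY, maxX, maxY }.
function contains(outer, inner) {
  return inner.minX >= outer.minX && inner.maxX <= outer.maxX &&
         inner.minY >= outer.minY && inner.maxY <= outer.maxY;
}

function intersects(a, b) {
  return a.minX <= b.maxX && a.maxX >= b.minX &&
         a.minY <= b.maxY && a.maxY >= b.minY;
}

class QuadTree {
  constructor(bounds, maxDepth = 8, depth = 0) {
    this.bounds = bounds;
    this.maxDepth = maxDepth;
    this.depth = depth;
    this.items = [];      // items that do not fit inside a single child
    this.children = null; // created lazily on first insert
  }

  split() {
    const { minX, minY, maxX, maxY } = this.bounds;
    const midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
    this.children = [
      { minX, minY, maxX: midX, maxY: midY },
      { minX: midX, minY, maxX, maxY: midY },
      { minX, minY: midY, maxX: midX, maxY },
      { minX: midX, minY: midY, maxX, maxY },
    ].map((b) => new QuadTree(b, this.maxDepth, this.depth + 1));
  }

  // item: { id, box }. Pushed down to the deepest node that fully contains it.
  insert(item) {
    if (this.depth < this.maxDepth) {
      if (!this.children) this.split();
      for (const child of this.children) {
        if (contains(child.bounds, item.box)) return child.insert(item);
      }
    }
    this.items.push(item);
  }

  // Collects items whose box intersects rect, e.g. the visible viewport;
  // whole subtrees outside rect are skipped.
  query(rect, out = []) {
    if (!intersects(this.bounds, rect)) return out;
    for (const item of this.items) {
      if (intersects(item.box, rect)) out.push(item);
    }
    if (this.children) {
      for (const child of this.children) child.query(rect, out);
    }
    return out;
  }
}
```

Rendering a zoomed-in view then only touches the nodes that overlap the viewport, instead of scanning the whole road list.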
