How to improve Dagre js calculation performance - dagre-d3

I'm using Dagre to generate graph coordinates on the frontend for a graph of around 700 nodes & 700 edges, and it currently takes around 1.5 to 2 seconds (this is before rendering). How should I go about optimising this? Are there any known ways to speed it up?
For example, I already know the graph is directed, acyclic & topologically sorted (validated in the API), so could I somehow skip that part of the algorithm?
Another approach could be to try to reduce the size of the graph in the first place, by 'clustering' closed groups as per the diagram below (which could then be expanded on click in the UI). Are there any known algorithms to achieve this?
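One known knob worth trying is dagre's `ranker` graph option: the default network-simplex ranker is the most expensive of the three rankers dagre documents, and 'longest-path' or 'tight-tree' is usually noticeably faster on large DAGs, at some cost in rank quality. This is only a minimal sketch; the `nodes` and `edges` arrays stand in for your own data, and the option names assume dagre's documented graph config:

```js
// Hedged sketch: dagre's documented graph config accepts a `ranker` option.
// 'longest-path' and 'tight-tree' are usually faster than the default
// 'network-simplex', at the cost of some layout quality.
const dagre = require("dagre");

const g = new dagre.graphlib.Graph();
g.setGraph({ rankdir: "TB", ranker: "longest-path", ranksep: 30, nodesep: 20 });
g.setDefaultEdgeLabel(() => ({}));

for (const n of nodes) g.setNode(n.id, { width: n.width, height: n.height }); // your data
for (const e of edges) g.setEdge(e.from, e.to);

console.time("dagre layout");
dagre.layout(g);
console.timeEnd("dagre layout");
```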

Related

How to optimize a large graph for pathfinding algorithms?

I'm developing a small program where a user can create simple diagrams with abstract blocks connected by lines, for example flow charts or structural diagrams. One of the clauses in the statement of work is that lines have to route around other blocks/lines and must not intersect them while being moved.
Illustration
I tried pathfinding algorithms like A* or Lee's algorithm for this, treating the workspace (a window with diagram elements) as a graph where one pixel is one graph node. However, moving blocks/lines causes a significant delay (for example, pathfinding in a 500x500 workspace takes about 320-360 ms). It seems the graph is too big for those algorithms.
Could you please tell me how to reduce the number of nodes in this case? Or maybe there is a way to speed up those algorithms, or something else I could use instead?
Don't think of this as a graph theory problem, think of it as a physics problem.
Visualize it as follows. Each block has a specified force pulling it towards the last place that it was put. Line segments, blocks, and the edge of the graph repel each other with an inverse square law (except that the end of the line you are drawing doesn't repel blocks in front of it). Under sufficient stress, a line segment can be broken into smaller line segments that have a pull towards returning to being a straight line.
The dynamics are complicated, but the number of entities is the number of objects you see on screen, not the number of pixels it is drawn on. Therefore you'll be able to do updates relatively quickly.
You'll need to play with the dynamics a bit to get a good experience, but this should be a more tractable approach.
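As a rough illustration of the dynamics described above, here is a toy per-frame update in JavaScript; the constants, field names, and the anchor-based "pull" are all made up for the sketch and would need tuning:

```js
// Toy sketch of the physics framing above (names and constants are invented).
// Each block is pulled back toward its anchor (the last place it was put) and
// pushed away from every other block with an inverse-square force.
// Positions are updated in place, which is good enough for a toy simulation.
function step(blocks, { pull = 0.05, repel = 2000 } = {}) {
  for (const b of blocks) {
    let fx = (b.anchorX - b.x) * pull;   // spring back toward the anchor
    let fy = (b.anchorY - b.y) * pull;
    for (const other of blocks) {
      if (other === b) continue;
      const dx = b.x - other.x, dy = b.y - other.y;
      const d2 = dx * dx + dy * dy + 1e-6;   // avoid division by zero
      const f = repel / d2;                  // inverse-square repulsion
      fx += (dx / Math.sqrt(d2)) * f;
      fy += (dy / Math.sqrt(d2)) * f;
    }
    b.x += fx;
    b.y += fy;
  }
}
```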

Reduce number of nodes in 3D A* pathfinding using (part of a) uniform grid representation

I am calculating pathfinding inside a mesh which I have built a uniform grid around. The nodes (cells in the 3D grid) close to what I deem a "standable" surface are marked as accessible and used in my pathfinding. To get a lot of detail (like being able to pathfind up small staircases), the number of accessible cells in my grid has grown quite large, several thousand in larger buildings (every grid cell is 0.5x0.5x0.5 m and the meshes are rooms with real-world dimensions). Even though I only use a fraction of the actual cells in my grid for pathfinding, the huge number slows the algorithm down. Other than that it works fine and finds the correct path through the mesh, using a weighted Manhattan distance heuristic.
Imagine my grid looks like that, with the mesh inside it (there can be more or fewer cubes, but it is always cubical); however, the pathfinding is not calculated on all the small cubes, just the few marked as accessible (usually at the bottom of the grid, but that can depend on how many floors the mesh has).
I am looking to reduce the search space for the pathfinding... I have looked at clustering like HPA* does it, and at other clustering algorithms like Markov clustering, but they all seem best suited to node graphs rather than grids. One obvious solution would be to just increase the size of the small cubes that build the grid, but then I would lose a lot of detail in the pathfinding and it would not be as robust. How could I cluster these small cubes? This is how a typical search space looks when I do my pathfinding (blue are accessible, green is the path):
and as you can see there are a lot of cubes to search through because the distance between them is quite small!
Never mind for now that the grid is a suboptimal representation for pathfinding.
Does anyone have an idea on how to reduce the number of cubes in the grid I have to search through, and how I would access the neighbors after I reduce the space? :) Right now it only looks at the closest neighbors while expanding the search space.
A couple possibilities come to mind.
Higher-level Pathfinding
The first is that your A* search may be searching the entire problem space. For example, you live in Austin, Texas, and want to get into a particular building somewhere in Alberta, Canada. A simple A* algorithm would search a lot of Mexico and the USA before finally searching Canada for the building.
Consider creating a second layer of A* to solve this problem. You'd first find out which states to travel between to get to Canada, then which provinces to reach Alberta, then Calgary, and then the Calgary Zoo, for example. In a sense, you start with an overview, then fill it in with more detailed paths.
If you have enormous levels, such as Skyrim's, you may need to add pathfinding layers between towns (multiple buildings), regions (multiple towns), and even countries (multiple regions). If you were making a GPS system, you might even need continents. If we ever become interstellar, our spaceships might contain pathfinding layers for planets, sectors, and even galaxies.
By using layers, you help to narrow down your search area significantly, especially if different areas don't use the same co-ordinate system! (It's fairly hard to estimate distance for one A* pathfinder if one of the regions needs latitude-longitude, another 3d-cartesian, and the next requires pathfinding through a time dimension.)
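As a rough sketch of the layering idea (not from any particular engine), the coarse level can be searched first and the detailed search then restricted to regions on that coarse route; `regionOf`, `regionNeighbors`, and `fineSearch` are assumed helpers you would supply:

```js
// Hedged sketch of two-level pathfinding: BFS over a coarse "region" graph
// first, then run the detailed search only inside regions on that coarse path.
function hierarchicalPath(start, goal, { regionOf, regionNeighbors, fineSearch }) {
  // 1. Coarse search: which regions do we pass through?
  const startR = regionOf(start), goalR = regionOf(goal);
  const cameFrom = new Map([[startR, null]]);
  const queue = [startR];
  while (queue.length) {
    const r = queue.shift();
    if (r === goalR) break;
    for (const next of regionNeighbors(r)) {
      if (!cameFrom.has(next)) {
        cameFrom.set(next, r);
        queue.push(next);
      }
    }
  }
  if (!cameFrom.has(goalR)) return null;   // no coarse route at all
  const corridor = new Set();
  for (let r = goalR; r !== null; r = cameFrom.get(r)) corridor.add(r);
  // 2. Fine search, restricted to cells whose region lies on the coarse route.
  return fineSearch(start, goal, (cell) => corridor.has(regionOf(cell)));
}
```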
More efficient algorithms
Finding efficient algorithms becomes more important in 3 dimensions because there are more nodes to expand while searching. A Dijkstra search that expands on the order of x^2 nodes in two dimensions would expand on the order of x^3 in three, with x being the distance between the start and goal. A 4D game would require yet more efficiency in pathfinding.
One of the benefits of grid-based pathfinding is that you can exploit topographical properties like path symmetry. If two paths consist of the same movements in a different order, you don't need to find both of them. This is where a very efficient algorithm called Jump Point Search comes into play.
Here is a side-by-side comparison of A* (left) and JPS (right). Expanded/searched nodes are shown in red with walls in black:
Notice that they both find the same path, but JPS easily searched less than a tenth of what A* did.
As of now, I haven't seen an official 3-dimensional implementation, but I've helped another user generalize the algorithm to multiple dimensions.
Simplified Meshes (Graphs)
Another way to get rid of nodes during the search is to remove them before the search. For example, do you really need nodes in wide-open areas where you can trust a much more stupid AI to find its way? If you are building levels that don't change, create a script that parses them into the simplest grid which only contains important nodes.
This is actually called 'offline pathfinding'; basically finding ways to calculate paths before you need to find them. If your level will remain the same, running the script for a few minutes each time you update the level will easily cut 90% of the time you pathfind. After all, you've done most of the work before it became urgent. It's like trying to find your way around a new city compared to one you grew up in; knowing the landmarks means you don't really need a map.
Similar approaches to the 'symmetry-breaking' that Jump Point Search uses were introduced by Daniel Harabor, the creator of the algorithm. They are mentioned in one of his lectures, and allow you to preprocess the level to store only jump-points in your pathfinding mesh.
Clever Heuristics
Many academic papers state that A*'s cost function is f(x) = g(x) + h(x), which doesn't make it obvious that you may use other functions, multiply the weight of the cost functions, and even implement heatmaps of territory or recent deaths as functions. These may create sub-optimal paths, but they greatly improve the intelligence of your search. Who cares about the shortest path when your opponent has a choke point on it and has been easily dispatching anybody travelling through it? Better to be certain the AI can reach the goal safely than to let it be stupid.
For example, you may want to prevent the algorithm from letting enemies access secret areas, so that they avoid revealing them to the player and the AI seems to be unaware of them. All you need to achieve this is a uniformly high cost for any point within those 'off-limits' regions. In a game like this, enemies would simply give up on hunting the player once the path grew too costly. Another cool option is to 'scent' regions the player has visited recently (by temporarily increasing the cost of unvisited locations, because many algorithms dislike negative costs).
If you know what places you won't need to search, but can't implement in your algorithm's logic, a simple increase to their cost will prevent unnecessary searching. There's a lot of ways to take advantage of heuristics to simplify and inform your pathfinding, but your biggest gains will come from Jump Point Search.
EDIT: Jump Point Search implicitly selects pathfinding direction using the same heuristics as A*, so you may be able to implement heuristics to a small degree, but their cost function won't be the cost of a node, but rather, the cost of traveling between the two nodes. (A* generally searches adjacent nodes, so the distinction between a node's cost and the cost of traveling to it tends to break down.)
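To make the "other cost functions" point concrete, here is a hedged sketch of a weighted A* cost with a danger-heatmap term; the weights, the `heat` map, and the node fields are all invented for illustration:

```js
// Hedged sketch of a tweaked A* cost: weight the heuristic and add a "danger"
// term from a heatmap (e.g. recent deaths). f = g + w*h + penalty, which may
// give sub-optimal but smarter-looking paths.
function cost(node, goal, g, heat, { w = 1.2, danger = 5 } = {}) {
  const h = Math.abs(node.x - goal.x) + Math.abs(node.y - goal.y); // Manhattan distance
  const threat = heat.get(node.id) || 0;   // 0 if nothing bad has happened here
  return g + w * h + danger * threat;
}
```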
Summary
Although octrees/quad-trees/B-trees can be useful in collision detection, they aren't as applicable to searches because they section a graph based on its coordinates, not on its connections. Layering your graph (mesh, in your vocabulary) into super-graphs (regions) is a more effective solution.
Hopefully I've covered anything you'll find useful.
Good luck!

Fastest Algorithm For Graph Planarization

I'm using Processing to develop a navigation system for complex data and processes. As part of that I have gotten into graph layout pretty deeply. It's all fun and my opinions on layout algorithms are : force-directed is for sissies (just look at it scale...haha), eigenvector projection is cool, Sugiyama layers look good but fail fast on graphy graphs, and although I have relied on eigenvectors thus far, I need to minimize edge crossings to really get to the point of the data. I know, I know NP-complete etc.
I should add that I have some good success from applying xy boxing and using Sugiyama-like permutation to reduce edge crossings across rows and columns. Viz: graph (|V|=90,avg degree log|V|) can go from 11000 crossings -> 1500 (by boxed eigenvectors) -> 300 by alternating row and column permutations to reduce crossings.
But the local minima... whatever it is, sticks around this mark, and the result is not as clear as it could be. My reading of the literature suggests that I really want to use a planarization algorithm like those used for VLSI:
1. Use BFS or something to find a maximal planar subgraph
1.a. Lay out the planar subgraph nicely
2. Cleverly add the outstanding edges to recover the original graph
Please reply with your thoughts on the fastest planarization algorithm, you are welcome to go into some depth about any specific optimizations you have had familiarity with.
Thanks so much!
Given that not all graphs are planar (which you know), it might be better to go for a "next-best" approach rather than an "always provides the best answer" approach.
I say this because my roommate in grad school had the same problem you do. He was trying to convert a graph to planar form, and all the algorithms that guarantee minimum edge crossings were too slow. What he ended up doing was just implementing a randomized algorithm. Basically, lay out the graph, then jigger nodes that have edges with lots of crossings, and eventually you'll handle the worst clumps of edges.
The advantages were: you can cut it off after a specific amount of time, so if you're time-limited it always comes in under a certain budget; if you get a crappy graph you can just run it again (over the already laid-out graph) to get something better; and it's relatively easy to code up.
The disadvantage is that you don't always get the global minimum in crossings, and if the graph gets stuck in high-crossing areas, his algorithm would sometimes shoot a node way off into the distance to try to resolve it, which sometimes produced really weird-looking graphs.
You already know a lot about graph layout, but I believe your question is still, unfortunately, underspecified.
You can planarize any graph really quickly by doing the following: (1) randomly lay out the graph on a plane, (2) calculate where edges cross, (3) create pseudo-vertices at the crossings (this is what you do anyway when you use planarity-based layout for non-planar graphs), (4) the graph extended with the new vertices and split edges is automatically planar.
The first difficulty is finding an algorithm that computes a combinatorial embedding that minimizes the number of edge crossings. The second difficulty is using that combinatorial embedding to do a layout in the Euclidean plane that is visually appealing; e.g. for orthogonal graph layout you might want to minimize the number of bends, maximize the size of faces, minimize the area of the graph as a whole, etc., and these goals can conflict with each other.
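Here is a hedged sketch of that quick planarization in JavaScript: a standard parametric segment-intersection test, and a single pass that splits crossing edges at pseudo-vertices. A real implementation would also re-check the newly created sub-edges; the edge representation ({a, b} point pairs) is an assumption of the sketch.

```js
// Returns the interior crossing point of segments p->q and r->s, or null.
function segIntersect(p, q, r, s) {
  const d = (q.x - p.x) * (s.y - r.y) - (q.y - p.y) * (s.x - r.x);
  if (d === 0) return null;                                   // parallel
  const t = ((r.x - p.x) * (s.y - r.y) - (r.y - p.y) * (s.x - r.x)) / d;
  const u = ((r.x - p.x) * (q.y - p.y) - (r.y - p.y) * (q.x - p.x)) / d;
  if (t <= 0 || t >= 1 || u <= 0 || u >= 1) return null;      // no interior crossing
  return { x: p.x + t * (q.x - p.x), y: p.y + t * (q.y - p.y) };
}

// Single pass: replace each found crossing with a pseudo-vertex by splitting
// both edges there. The result has no crossings among the checked pairs.
function planarize(edges) {
  const out = [...edges];
  for (let i = 0; i < out.length; i++) {
    for (let j = i + 1; j < out.length; j++) {
      const x = segIntersect(out[i].a, out[i].b, out[j].a, out[j].b);
      if (!x) continue;
      out.push({ a: x, b: out[i].b }, { a: x, b: out[j].b });
      out[i] = { a: out[i].a, b: x };
      out[j] = { a: out[j].a, b: x };
    }
  }
  return out;
}
```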

Layout a graph for printing on paper

Our application displays potentially big graphs with lots of nodes and edges. We use tools like dot, of course, to lay out the graph, and the result looks good on screen. However, users would like to print them on paper. Technically we can do it: we split the graph into little images and the users can print those. But there is no guarantee that by cutting at the page size we won't cut through nodes, end up with loads of edges between different pages, etc. I'm looking for an algorithm that would alter the layout of the graph so that it is more usable for printing:
- ensure no node will fall on page boundaries
- try to minimize edges between pages
This looks like finding "clusters" of densely related nodes to put on the same page, with few edges crossing to clusters on other pages. Can anybody point me to relevant literature/tools for this kind of thing?
Thanks
A cluster analysis is a good start.
I would suggest the following approach:
First, define a cost function:
Given a page partition: Cost(Partition) = f(nodes near boundaries, cross-page edges)
The objective will be to minimize this function.
Then choose a cluster analysis algorithm:
define a clustering algorithm (you can have a look at my post on DBSCAN: DBSCAN code in C# or VB.net, for cluster analysis)
and add a "page" constraint (the size of the page) to the clustering.
Depending on the order in which you go through your points, the outcome will differ because of the "page constraint".
Choose the result with the smallest cost, or stop when you find an acceptable cost.
The complexity of this algorithm would be O(n!) ... kind of big for big graphs.
Therefore you need to think of some additional clustering constraints, or just make a partial search (test the first n^2 candidates to keep it in O(n^2)) to get a good approximation.
Depending on the acceptable cost, you may or may not get a result in a reasonable time.
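As an illustration of the cost function above (nothing more than a sketch: `pageOf` and `distToBoundary` are assumed helpers, and the penalty weights are arbitrary):

```js
// Hedged sketch: score an assignment of nodes to pages by penalizing edges
// that span two pages and nodes that sit close to a page boundary.
function partitionCost(nodes, edges, pageOf, distToBoundary,
                       { edgePenalty = 10, boundaryPenalty = 1, margin = 20 } = {}) {
  let cost = 0;
  for (const e of edges) {
    if (pageOf(e.from) !== pageOf(e.to)) cost += edgePenalty;  // cross-page edge
  }
  for (const n of nodes) {
    if (distToBoundary(n) < margin) cost += boundaryPenalty;   // node near a page edge
  }
  return cost;
}
```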

data structure to support google/bing maps [closed]

I was wondering what the data structure is in an application like Google/Bing Maps. How is it that the results are returned so quickly when searching for directions?
What kind of algorithms are being used to determine this information?
Thanks
There are two parts to your question:
What kind of data structure is used to store the map information.
What kind of algorithm is used to "navigate" from source to destination.
To this, I would add another question:
How is Google/Bing able to "stream in" the data? For example, you are able to zoom in from miles up to ground level seamlessly, all the while maintaining the coordinate system.
I will attempt to address each question in order. Do note that I do not work for the Google Maps or Bing teams, so quite obviously this information might not be completely accurate. I am basing this off the knowledge gained from a good CS course about data structures and algorithms.
Ans 1) The map is stored in an Edge Weighted Directed Graph. Locations on the map are Vertices and the path from one location to another (from one vertex to another) are the Edges.
Quite obviously, since there can be millions of vertices and an order of magnitude more edges, the really interesting thing would be the representation of this Edge Weighted Digraph.
I would say that this would be represented by some kind of adjacency list, and the reason I say so is because, if you imagine a map, it is essentially a sparse graph. There are only a few ways to get from one location to another. Think about your house! How many roads (edges in our case) lead to it? Adjacency lists are good for representing sparse graphs, while an adjacency matrix is good for representing dense graphs.
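For concreteness, a minimal adjacency-list sketch for such a sparse, weighted digraph might look like this (the location ids and weights are invented):

```js
// Minimal sketch of an adjacency-list representation for a sparse, weighted,
// directed road graph (nothing Google-specific, just the textbook structure).
const adj = new Map();   // vertexId -> array of { to, weight }

function addEdge(from, to, weight) {
  if (!adj.has(from)) adj.set(from, []);
  adj.get(from).push({ to, weight });
}

addEdge("myHouse", "mainStreet", 0.3);    // only the few roads that actually
addEdge("mainStreet", "highway101", 2.1); // touch each location are stored
```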
Of course, even though we are able to efficiently represent sparse graphs in memory, given the sheer number of Vertices and Edges, it would be impossible to store everything in memory at once. Hence, I would imagine some kind of a streaming library underneath.
To create an analogy for this, if you have ever played an open-world game like World of Warcraft / Skyrim / GTA, you will observe that for the most part there is no loading screen. But quite obviously, it is impossible to fit everything into memory at once. Thus, using a combination of quad-trees and frustum-culling algorithms, these games are able to dynamically load resources (terrain, sprites, meshes etc.).
I would imagine something similar, but for graphs. I have not put a lot of thought into this particular aspect, but to cook up a very basic system, one can imagine an in-memory database that is queried to add/remove vertices and edges from the graph at run-time as needed. This brings us to another interesting point. Since vertices and edges need to be removed and added at run-time, the classic implementation of an adjacency list will not cut it.
In a classic implementation, we simply store a List (a Vector in Java) in each element of an array: Adj[]. I would imagine a linked list in place of the Adj[] array and a binary search tree in place of the list of edges. The binary search tree would facilitate O(log N) insertion and removal of nodes. This is extremely desirable since, in the List implementation, addition is O(1) but removal is O(N), and when you are dealing with millions of edges, this is prohibitive.
A final point to note here is that until you actually start the navigation, there is "no" graph. Since there can be millions of users, it doesn't make sense to maintain one giant graph for everybody (this would be impossible due to the memory requirement alone). I would imagine that as you start the navigation process, a graph is created for you. Quite obviously, since you start from location A and go to location B (and possibly other locations after that), the graph created just for you should not take up a very large amount of memory (provided the streaming architecture is in place).
Ans 2) This is a very interesting question. The most basic algorithm for solving this problem would be Dijkstra's pathfinding algorithm. Faster variations such as A* exist. I would imagine Dijkstra to be fast enough if it works properly with the streaming architecture discussed above. Dijkstra uses space proportional to V and time proportional to E lg V, which are very good figures, especially for sparse graphs. Do keep in mind that if the streaming architecture is not nailed down, V and E will explode and the space and run-time requirements of Dijkstra will make it prohibitive.
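A compact sketch of Dijkstra over an adjacency list like the one above; a sorted array stands in for a priority queue purely for brevity, and a real implementation would use a binary heap:

```js
// Hedged sketch of Dijkstra's algorithm over `adj` (vertexId -> [{to, weight}]).
function dijkstra(adj, source, target) {
  const dist = new Map([[source, 0]]);
  const prev = new Map();
  const frontier = [{ v: source, d: 0 }];
  while (frontier.length) {
    frontier.sort((a, b) => a.d - b.d);              // toy priority queue
    const { v, d } = frontier.shift();
    if (d > (dist.get(v) ?? Infinity)) continue;     // stale entry, skip it
    if (v === target) break;
    for (const { to, weight } of adj.get(v) || []) {
      const nd = d + weight;
      if (nd < (dist.get(to) ?? Infinity)) {
        dist.set(to, nd);
        prev.set(to, v);
        frontier.push({ v: to, d: nd });
      }
    }
  }
  if (!dist.has(target)) return null;                // unreachable
  const path = [];
  for (let v = target; v !== undefined; v = prev.get(v)) path.unshift(v);
  return { distance: dist.get(target), path };
}
```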
Ans 3) The streaming question: do not confuse this with the streaming architecture discussed above. This is basically asking how the seamless zoom is achieved.
A good structure for achieving this is the quad tree (you can generalize this to an n-tree). You store coarser images higher up in the tree and higher-resolution images as you traverse down the tree. This is actually what KML (Keyhole) did with its mapping algorithm. Keyhole was a company that partnered with NVIDIA many years back to produce one of the first "Google Earth"-like pieces of software.
The inspiration for Quad Tree culling, comes from modern 3D games, where it is used to quickly cull away parts of the scene which is not in the view frustum.
To further clarify this, imagine that you are looking at the map of USA from really high up. At this level, you basically split the map into 4 sections and make each section a child of the Quad Tree.
Now, as you zoom in, you zoom in on one of the sections (quite obviously you can zoom right in the center so that your zoom actually touches all 4 sections, but for simplicity's sake, let's say you zoom in on one of the sections). When you zoom in on one section, you traverse its 4 children. These 4 children contain higher-resolution data than their parent. You can then continue to zoom down until you hit a set of leaves, which contain the highest-resolution data. To make the jump from one resolution to the next "seamless", a combination of blur and fading effects can be used.
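A toy sketch of that descent, assuming map coordinates normalized to the unit square (the quadrant encoding and the normalization are assumptions of the sketch):

```js
// Toy sketch of the quad-tree idea: each zoom level quarters the parent tile,
// so the tile covering a point is found by descending one child per level.
// x and y are normalized to [0, 1) over the whole map.
function tilePath(x, y, zoom) {
  const path = [];
  for (let level = 0; level < zoom; level++) {
    const quadrant = (x < 0.5 ? 0 : 1) + (y < 0.5 ? 0 : 2);  // NW/NE/SW/SE child
    path.push(quadrant);
    // Re-normalize the point into the chosen child tile for the next level.
    x = (x % 0.5) * 2;
    y = (y % 0.5) * 2;
  }
  return path;   // e.g. [2, 0, 3] identifies one high-resolution tile
}
```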
As a follow-up to this post, I will try to add links to many of the concepts I put in here.
For this sort of application, you would want some sort of database to represent map features and the connections between them, and would then need:
spatial indexing of the map feature database, so that it can be efficiently queried by 2D coordinates; and
a good way to search the connections to find a least-cost route, for some measure of cost (e.g. distance).
For 1, an example would be the R-tree data structure.
For 2, you need a graph search algorithm, such as A*.
Look up the paper about Highway Dimension by Google authors. The idea is to precompute the shortest paths between important nodes and then route everything through those. You are not going to use residential streets to go from LA to Chicago, save for getting on and off the freeway at both ends.
I'm not sure of the internal data structure, but it may be some kind of 2D coordinate based tree structure that only displays a certain number of levels. The levels would correspond to zoom factors, so you could ignore as insignificant things below, say, 5 levels below the current level, and things above the current level.
Regardless of how it's structured, here's how you can use it:
http://code.google.com/apis/maps/documentation/reference.html
I would think of it as a computational geometry problem. When you click on a particular coordinate in the map, that information can be used to get the latitude and longitude of that location. Based on the latitude and longitude and the level of zoom, the place can be identified.
Once you have identified the two places, the only remaining problem is to identify the nearest route. This is the problem of finding the shortest path between two points with polygonal obstacles between them (which correspond to the places that contain no roads), where the only possible connections are roads. This is a known problem and efficient algorithms exist to solve it.
I am not sure if this is what google is doing, but I hope they do something on these lines.
I am taking computational geometry this semester. Here is the course link: http://www.ams.sunysb.edu/~jsbm/courses/545/ams545.html. Check them if you are interested.
I was wondering what the data structure is in an application like google/bing maps.
To the user: XHTML/CSS/Javascript. Like any website.
On the server: who knows? Any Google devs around here? It certainly isn't PHP or ASP.net...
How is it that the results are returned so quickly when searching for directions?
Because Google spent years, manpower and millions of dollars on building up the architecture to get the fastest server reaction time possible?
What kind of algorithms are being used to determine this information?
A journey planner algorithm.
