One-pass force-directed graph drawing algorithm

I'm looking for a one-pass algorithm (or ideas for how to write one myself) that can calculate two- or three-dimensional coordinates for a directed, unweighted graph.
The only metadata the vertices have are title and category.
I need to implement this algorithm in such a way that vertices can be added or removed without recalculating the layout of the entire graph.
This algorithm has to be applied to a large (5 GB) dataset which is constantly changing.
My Google skills have led me to n-pass algorithms, which are not what I am looking for.

I guess your question might still be an open issue. I know of a research project called Tulip (http://tulip.labri.fr/TulipDrupal/), which is a (large-scale) graph viewer. A paper on the method is available at http://dept-info.labri.fr/~auber/documents/publi/auberChapterTulipGDSBook.pdf, and surely you can find more algorithms by browsing the personal web pages of D. Auber and his colleagues.

There is a related question here:
https://cstheory.stackexchange.com/questions/11889/an-algorithm-to-efficiently-draw-a-extremely-large-graph-in-real-time
The top answer has a number of papers that might be of interest. I think one of the keys to the problem is to try to recompute the locations of only a reduced number of nodes in your graph.
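One way to act on that idea, sketched below under assumptions of my own (plain Python, 2-D positions and adjacency kept in dicts, untuned spring constants): when a vertex arrives, seed it at the centroid of its neighbours and relax only a small k-hop patch around it with a few force-directed iterations, leaving the rest of the layout untouched.

```python
import random

def add_vertex(pos, adj, v, neighbors_of_v, hops=2, iters=50,
               k_attract=0.05, k_repel=0.5):
    """Insert vertex v and relax only its local neighbourhood."""
    adj[v] = set(neighbors_of_v)
    for u in neighbors_of_v:
        adj[u].add(v)
    # Seed v at the centroid of its neighbours (or randomly if isolated).
    if neighbors_of_v:
        pos[v] = tuple(sum(pos[u][i] for u in neighbors_of_v) / len(neighbors_of_v)
                       for i in range(2))
    else:
        pos[v] = (random.random(), random.random())
    # Collect the k-hop neighbourhood around v; only these nodes may move.
    movable, frontier = {v}, {v}
    for _ in range(hops):
        frontier = {w for u in frontier for w in adj[u]} - movable
        movable |= frontier
    for _ in range(iters):
        for u in movable:
            fx = fy = 0.0
            for w in adj[u]:                  # spring attraction along edges
                fx += k_attract * (pos[w][0] - pos[u][0])
                fy += k_attract * (pos[w][1] - pos[u][1])
            for w in movable:                 # repulsion within the local patch only
                if w == u:
                    continue
                dx = pos[u][0] - pos[w][0]
                dy = pos[u][1] - pos[w][1]
                d2 = dx * dx + dy * dy + 1e-9
                fx += k_repel * dx / d2
                fy += k_repel * dy / d2
            pos[u] = (pos[u][0] + fx, pos[u][1] + fy)
    return pos
```

Removal works the same way: delete the vertex, then relax the patch its neighbours form. Each update touches O(patch size) nodes rather than the whole 5 GB graph, at the cost of layout quality drifting over time.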

Related

Is there an algorithm to link points, that minimises Manhattan length?

I'm trying to link points in the plane, i.e., draw a graph, but using only axis-aligned lines. I found the KDTree algorithm to be quite promising and close to what I need, but it does not make sure the segments are as short as possible. The result I'm looking for is closer to a minimal rectilinear tree.
I have also read up on
https://en.wikipedia.org/wiki/Delaunay_triangulation
because initially I thought that would be it, but it turns out it's way off:
- based on circles and triangles
- traces a perimeter
- nodes have multiple connections (>=3)
Can you point me towards an algorithm that already exists, or can you help me with drafting a new one?
PS: Only 1000-1100 points, so efficiency is not super important.
In terms of Goals and Costs, reaching all nodes is the Goal
and the length of all segments is the Cost.
Thanks to MBo, I now know that this is known as 'The Steiner Tree Problem'. It is the subject of a 1992 book (of the same name), which demonstrates that the problem is NP-hard.
Mine is the rectilinear variant of it. There are a few approximation or heuristic algorithms known to (help) solve it.
(HAN, HAN4, LBH, HWAB, HWAD, and BEA are listed in https://www.sciencedirect.com/science/article/pii/0166218X9390010L.)
I haven't found anything yet that a "practitioner" might be able to actually use. Still looking.
It seems like a 'good' way forward is:
1. Compute edges using Delaunay triangulation.
2. Label each edge with its length or rectilinear distance.
3. Find the minimum spanning tree (with algorithms such as Borůvka's, Prim's, or Kruskal's).
4. Add Steiner points restricted to the Hanan grid.
5. Move edges to the grid.
Still unclear about that last step. How?
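For the first three steps, meanwhile, here is a minimal sketch assuming SciPy is available and the points sit in an (n, 2) NumPy array; the Hanan-grid steps are not covered:

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def rectilinear_mst(points):
    """Delaunay edges labelled with Manhattan distance, then an MST over them."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:        # each simplex is a triangle (3 vertex ids)
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    rows, cols, weights = [], [], []
    for a, b in edges:
        rows.append(a)
        cols.append(b)
        weights.append(np.abs(points[a] - points[b]).sum())  # rectilinear length
    n = len(points)
    graph = coo_matrix((weights, (rows, cols)), shape=(n, n))
    mst = minimum_spanning_tree(graph)
    return list(zip(*mst.nonzero()))     # list of (a, b) index pairs

points = np.random.rand(1000, 2)
tree_edges = rectilinear_mst(points)
```

The Delaunay step keeps the candidate edge set near-linear in the number of points, which is what makes the MST cheap even at 1000-1100 points.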

How to identify loosely-connected components of a graph

Imagine that a graph has two relatively densely connected components that are only connected to each other by relatively few edges. How can I identify the components? I don't know the proper terminology, but the intuition is that a relatively densely connected subgraph is hanging onto another subgraph by a few threads. I want to identify these clumps that are only loosely connected to the rest of the graph.
If your graph represents a real-world system, this task is called community detection. You'll find many articles about that, starting with Fortunato's review (2010). He describes, amongst others, the min-cut based methods mentioned in the earlier answers.
There are also many posts on SO, such as:
Are there implementations of algorithms for community detection in graphs?
What are the differences between community detection algorithms in igraph?
Community detection in Networkx
People on Cross Validated also talk about community detection:
How to do community detection in a weighted social network/graph?
How to find communities?
Finally, there's a proposal in Area 51 for a new Network Science site, which would be more directly related to this problem.
You probably want sparsest cut instead of min cut -- unless you can identify several nodes in a component, min cuts have a tendency to be very unbalanced when the degrees are small. One of the common approaches to sparsest cut is to compute an eigenvector of the graph's Laplacian and threshold it.
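A minimal sketch of that spectral approach, assuming NetworkX is acceptable (its fiedler_vector returns the eigenvector for the second-smallest Laplacian eigenvalue; the example graph is hypothetical):

```python
import networkx as nx

def spectral_bisection(G):
    """Split a connected graph in two by thresholding the Fiedler vector at zero."""
    nodes = list(G)
    fiedler = nx.fiedler_vector(G)   # entries follow the order of list(G)
    part_a = {n for n, x in zip(nodes, fiedler) if x >= 0}
    part_b = set(nodes) - part_a
    return part_a, part_b

# Example: two 10-cliques joined by a single edge come apart cleanly.
G = nx.union(nx.complete_graph(10), nx.complete_graph(range(10, 20)))
G.add_edge(0, 10)
print(spectral_bisection(G))
```

The sign of an eigenvector is arbitrary, so which clump lands in which part can vary from run to run.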
The answer might be somewhat general, but you could try to model your problem as a flow problem and generate a minimum cut; see here. The edges could be bidirectional with capacity 1, and a resulting cut would maybe yield the desired partition?
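A sketch of that flow formulation with NetworkX, assuming you can name two nodes s and t that you expect to land in different clumps (the names are placeholders):

```python
import networkx as nx

def loose_cut(G, s, t):
    """Min s-t cut on a bidirected, unit-capacity copy of an undirected graph."""
    D = nx.DiGraph()
    for u, v in G.edges():
        D.add_edge(u, v, capacity=1)
        D.add_edge(v, u, capacity=1)
    cut_value, (part_s, part_t) = nx.minimum_cut(D, s, t)
    return cut_value, part_s, part_t
```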

Pathfinding through four dimensional data

The problem is finding an optimal route for a plane through four-dimensional winds (winds at differing heights that change as you travel: a predictive wind model).
I've used the traditional A* search algorithm and hacked it to get it to work in three dimensions and with wind vectors.
It works in a lot of the cases but is extremely slow (I'm dealing with huge numbers of data nodes) and doesn't work for some edge cases.
I feel like I've got it working "well", but it feels very hacked together.
Is there a better, more efficient way of pathfinding through data like this (maybe a genetic algorithm or neural network), or something I haven't even considered? Maybe fluid dynamics? I don't know.
Edit: further details.
Data is wind vectors (direction, magnitude).
Data points are spaced 15 x 15 km, at 25 different elevation levels.
By "doesn't always work" I mean it will pick a stupid path for an aircraft because that path's weight is the same as another path's. It's fine for pathfinding but sub-optimal for a plane.
I take many things into account for each node change:
- Cost of climbing versus descending.
- Wind resistance.
- Ignoring nodes with too high a resistance.
- Cost of diagonal travel vs. straight, etc.
I use Euclidean distance as my heuristic, or H value.
I use various factors for my weight, or G value (the list above).
Thanks!
You can always trade off time against optimality by using a weighted A*.
Weighted A* [or A*-epsilon] is expected to find a path faster than A*, but the path won't be optimal [however, it gives you a bound on its optimality, as a parameter of your epsilon/weight].
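For concreteness, here is a minimal weighted-A* sketch in Python; the neighbors, cost, and heuristic callables are hypothetical placeholders for your node expansion, G-value factors, and H value. With epsilon = 1 it degenerates to plain A*; larger values expand fewer nodes at the price of a path at most epsilon times longer than optimal.

```python
import heapq
import itertools

def weighted_a_star(start, goal, neighbors, cost, heuristic, epsilon=1.5):
    """A* with an inflated heuristic: f = g + epsilon * h."""
    counter = itertools.count()      # tie-breaker so the heap never compares nodes
    open_heap = [(epsilon * heuristic(start), next(counter), 0.0, start, None)]
    came_from = {}                   # node -> parent; doubles as the closed set
    g_score = {start: 0.0}
    while open_heap:
        _, _, g, node, parent = heapq.heappop(open_heap)
        if node in came_from:
            continue                 # already expanded via a cheaper route
        came_from[node] = parent
        if node == goal:             # walk the parent pointers back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            g_next = g + cost(node, nxt)
            if g_next < g_score.get(nxt, float("inf")):
                g_score[nxt] = g_next
                heapq.heappush(open_heap,
                               (g_next + epsilon * heuristic(nxt),
                                next(counter), g_next, nxt, node))
    return None                      # goal unreachable
```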
A* isn't advertised to be the fastest search algorithm; it does guarantee that the first solution it finds will be the best (assuming you provide an admissible heuristic). If yours isn't working for some cases, then something is wrong with some aspect of your implementation (maybe w/ the mechanics of A*, maybe the domain-specific stuff; given you haven't provided any details, can't say more than that).
If it is too slow, you might want to reconsider the heuristic you are using.
If you don't need an optimal solution, then some other technique might be more appropriate. Again, given how little you have provided about the problem, hard to say more than that.
Are you planning offline or online?
Typically for these problems you don't know what the winds are until you're actually flying through them. If this really is an online problem, you may want to consider trying to construct a near-optimal policy. There is quite a lot of research in this area already; one of the best papers is "Autonomous Control of Soaring Aircraft by Reinforcement Learning" by John Wharington.

Fastest Algorithm For Graph Planarization

I'm using Processing to develop a navigation system for complex data and processes. As part of that I have gotten into graph layout pretty deeply. It's all fun, and my opinions on layout algorithms are: force-directed is for sissies (just look at it scale... haha), eigenvector projection is cool, Sugiyama layers look good but fail fast on graphy graphs, and although I have relied on eigenvectors thus far, I need to minimize edge crossings to really get to the point of the data. I know, I know: NP-complete, etc.
I should add that I have had some good success from applying xy boxing and using Sugiyama-like permutation to reduce edge crossings across rows and columns. Viz: a graph (|V| = 90, avg degree log |V|) can go from 11000 crossings -> 1500 (by boxed eigenvectors) -> 300 (by alternating row and column permutations to reduce crossings).
But the local minimum, or whatever it is, sticks around this mark, and the result is not as clear as it could be. My research into the literature suggests that I really want to use a planarization algorithm like those used for VLSI:
1. Use BFS or something to extract a maximal planar subgraph (see the greedy sketch below).
1.a. Lay out the planar subgraph nicely.
2. Cleverly add the outstanding edges to recover the original graph.
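For step 1, a greedy sketch under my own assumptions (NetworkX for the planarity test; greedy insertion yields a maximal, not maximum, planar subgraph):

```python
import networkx as nx

def maximal_planar_subgraph(G):
    """Add edges one at a time, keeping only those that preserve planarity."""
    P = nx.Graph()
    P.add_nodes_from(G.nodes())
    deferred = []                    # edges to reinsert "cleverly" in step 2
    for u, v in G.edges():
        P.add_edge(u, v)
        is_planar, _ = nx.check_planarity(P)
        if not is_planar:
            P.remove_edge(u, v)
            deferred.append((u, v))
    return P, deferred
```

Since adding edges never restores planarity, every deferred edge still fails against the final subgraph, so the result is maximal with respect to the insertion order.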
Please reply with your thoughts on the fastest planarization algorithm; you are welcome to go into some depth about any specific optimizations you have had familiarity with.
Thanks so much!
Given that not all graphs are planar (which you know), it might be better to go for a "next-best" approach rather than an "always provides the best answer" approach.
I say this because my roommate in grad school had the same problem you did. He was trying to convert a graph to planar form, and all the algorithms that guarantee minimum edge crossings were too slow. What he ended up doing was just implementing a randomized algorithm. Basically, lay out the graph, then jigger nodes that have edges with lots of crossings, and eventually you'll handle the worst clumps of edges.
The advantages were: you can cut it off after a specific amount of time, so if you're time-limited this always comes in under a certain amount of time; if you get a crappy graph, you can just run it again (over the already laid-out graph) to get something better; and it's relatively easy to code up.
The disadvantage is that you don't always get the global minimum in crossings, and if the graph got stuck in high-crossing areas, his algorithm would sometimes shoot a node way off into the distance to try to resolve it, which sometimes produced really weird-looking graphs.
You already know a lot about graph layout, but I believe your question is still, unfortunately, underspecified.
You can planarize any graph really quickly by doing the following: (1) randomly lay out the graph on a plane, (2) calculate where edges cross, (3) create pseudo-vertices at the crossings (this is what you do anyway when you use planarity-based layout for non-planar graphs), and (4) the graph extended with the new vertices and split edges is automatically planar.
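A minimal sketch of that construction, under my own assumptions: straight-line edges, points in general position, and a brute-force crossing scan that is quadratic in the number of edges (fine as an illustration, not for large graphs):

```python
import itertools

def segment_intersection(p1, p2, p3, p4):
    """Proper intersection point of segments p1-p2 and p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if den == 0:
        return None                      # parallel or collinear
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / den
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / den
    if 0 < t < 1 and 0 < u < 1:          # strict: shared endpoints don't count
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def planarize(pos, edges):
    """Split crossing edges at pseudo-vertices; the result is planar by construction."""
    pos, edges = dict(pos), list(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in itertools.combinations(edges, 2):
            pt = segment_intersection(pos[a], pos[b], pos[c], pos[d])
            if pt is not None:
                v = ("dummy", len(pos))  # fresh pseudo-vertex id
                pos[v] = pt
                edges.remove((a, b))
                edges.remove((c, d))
                edges += [(a, v), (v, b), (c, v), (v, d)]
                changed = True
                break                    # edge list changed; rescan
    return pos, edges
```

Each split removes one crossing, so the loop terminates; the catch, as noted above, is that the number of pseudo-vertices depends entirely on how good the initial layout was.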
The first difficulty comes from having an algorithm that calculates a combinatorial embedding that minimizes the number of edge crossings. The second difficulty is using that combinatorial embedding to produce a layout in the Euclidean plane that is visually appealing; e.g., for orthogonal graph layout you might want to minimize the number of bends, maximize the size of faces, minimize the area of the graph as a whole, etc., and these goals might conflict with each other.

Explain 0-extension algorithm

I'm trying to implement the 0-extension algorithm.
It is used to colour a graph with a number of colours where some nodes already have a colour assigned and where every edge has a distance. The algorithm calculates an assignment of colours so that neighbouring nodes with the same colour have as much distance between them as possible.
I found this paper explaining the algorithm: http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=1FBA2D22588CABDAA8ECF73B41BD3D72?doi=10.1.1.100.8049&rep=rep1&type=pdf
but I don't see how to implement it.
I already asked this question on the "theoretical computer science" site, but halfway through the discussion we went beyond the site's scope:
https://cstheory.stackexchange.com/questions/6163/explain-0-extension-algorithm
Can anyone explain this algorithm in layman's terms?
I'm planning to make the final code open source in the jgrapht package.
The objective of 0-extension is to minimize the total weighted cost of edges with different color endpoints rather than to maximize it, so 0-extension is really a clustering problem rather than a coloring problem. I'm generally skeptical that using a clustering algorithm to color would have good results. If you want something with a theoretical guarantee, you could look into approximations to the MAXCUT problem (really a generalization if there are more than two colors), but I suspect that a local-search algorithm would work better in practice.
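Along the lines of that local-search suggestion, a minimal sketch, assuming NetworkX, terminal colours given in a dict, and edge weights stored in a 'weight' attribute (a heuristic with no approximation guarantee):

```python
import networkx as nx

def zero_extension_local_search(G, terminal_color, max_rounds=100):
    """Greedily move each free node to the colour minimizing its disagreeing-edge weight."""
    colors = dict(terminal_color)
    default = next(iter(terminal_color.values()))
    for v in G:
        colors.setdefault(v, default)    # free nodes start at some terminal colour
    palette = set(terminal_color.values())
    for _ in range(max_rounds):
        changed = False
        for v in G:
            if v in terminal_color:
                continue                 # terminals keep their colour
            def disagreement(c):
                return sum(d.get("weight", 1.0)
                           for _, u, d in G.edges(v, data=True)
                           if colors[u] != c)
            best = min(palette, key=disagreement)
            if best != colors[v]:
                colors[v] = best
                changed = True
        if not changed:
            break                        # local optimum reached
    return colors
```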
