Edges output by graphviz dot bunch up

How can I prevent the lines from bunching up and making it unclear what leads to what, as shown in this image:

The simplest way would be to increase ranksep - the minimum distance between two ranks. This should leave more space for the edges, at least if you're using dot.
If the edges bunch up because a lot of nodes share the same rank, you may consider using the unflatten utility. Including this step in the generation of the graph distributes nodes from one rank across several ranks, making the graph narrower (but longer) and creating some space between the nodes. The edges to the individual nodes should then be easier to distinguish.
A complete example of how to use the unflatten utility (with picture) can be found in this answer.
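For illustration, a minimal sketch of that pipeline driven from Python (my own example, not from the answer; it assumes the Graphviz binaries dot and unflatten are on the PATH, and the file name out.png is made up):

    import subprocess

    # a graph whose single wide rank tends to bunch the edges together
    dot_source = """
    digraph g {
        ranksep=1.5;              // extra vertical space between ranks
        a -> { b c d e f g h };
    }
    """

    # unflatten staggers nodes that share a rank, then dot does the layout
    unflat = subprocess.run(["unflatten", "-l", "3"], input=dot_source,
                            capture_output=True, text=True, check=True)
    subprocess.run(["dot", "-Tpng", "-o", "out.png"],
                   input=unflat.stdout, text=True, check=True)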

Related

Compute all paths in graph that has multiple inputs and one output

I want to compute all the paths in a directed acyclic graph from multiple inputs (x1, .., xn) to one output. Every path through the graph has the same depth d, and the inputs enter the graph at the same time (the shape is like an artificial neural network with many inputs and one output). Could you please tell me whether there are algorithms that can compute such paths?
Regards,
1) Run a depth-first search, starting at the output and traversing each edge in the reverse direction, to find all nodes from which you can get to the output.
2) Delete all nodes from which you cannot get to the output.
3) Run a recursive search on the modified graph, starting at each input node in turn, to find all paths to the output.
Because you have removed all the dead ends, this should produce all the paths as fast as you can output them, but be warned that there may be a very large number of different paths, even for small graphs - a ladder-shaped graph with n rungs has about 2^n paths: at each rung you choose whether to go up the left or the right side of the ladder, so there are 2^(number of rungs) different paths.
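The three steps translate directly into code. A sketch in Python (my own illustration; the graph is a dict mapping each node to its list of successors):

    def all_paths(adj, inputs, output):
        """Enumerate every input-to-output path in a DAG."""
        # step 1: depth-first search over reversed edges, starting at the
        # output, to find every node from which the output is reachable
        radj = {}
        for v, succs in adj.items():
            for u in succs:
                radj.setdefault(u, []).append(v)
        reaches = set()
        stack = [output]
        while stack:
            v = stack.pop()
            if v not in reaches:
                reaches.add(v)
                stack.extend(radj.get(v, []))
        # step 2: "delete" dead ends by never stepping outside `reaches`
        # step 3: recursive search from each input; with no dead ends left,
        # every recursive call makes progress towards emitting a path
        def walk(v, path):
            if v == output:
                yield path
                return
            for u in adj.get(v, []):
                if u in reaches:
                    yield from walk(u, path + [u])
        for x in inputs:
            if x in reaches:
                yield from walk(x, [x])

    # e.g. adj = {"x1": ["h"], "x2": ["h"], "h": ["y"]}
    # list(all_paths(adj, ["x1", "x2"], "y"))
    #   -> [["x1", "h", "y"], ["x2", "h", "y"]]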

Quickest way to determine if a rectangular graph would be partitioned if a node is removed

I am working on an atmospheric simulation for a video game, and a problem I have stumbled into is that I need a cheap (in processing time) way to determine if a graph of nodes in a rectangular grid (each node is connected to up to four neighbours, NSEW) would become partitioned if I removed a particular node.
I have tried searching for ways of detecting if a graph is partitioned but so far I have not found anything that suits my problem. I have not taken advanced math courses and only have basic knowledge of graph theory so it is possible that I just have not been searching with the right terms.
If possible, it would be very very desirable to avoid having to search through the whole graph.
You can find articulation points using a modified depth-first search - see http://en.wikipedia.org/wiki/Biconnected_component. An articulation point of a graph is a node that, if removed, disconnects the graph. Every graph can be split up at its articulation points into biconnected components. If you are lucky, you just need to know whether a point is an articulation point. If not, perhaps splitting the graph up into a tree of biconnected components and analysing that will help.
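For reference, a compact version of that modified depth-first search (the standard articulation-point algorithm; my own sketch, written recursively, so for very large grids you would want an iterative rewrite):

    import sys

    def articulation_points(adj):
        """Return the set of cut vertices of an undirected graph given
        as {vertex: iterable of neighbours}."""
        sys.setrecursionlimit(max(10000, 2 * len(adj)))
        disc, low, points = {}, {}, set()
        timer = [0]

        def dfs(v, parent):
            disc[v] = low[v] = timer[0]
            timer[0] += 1
            children = 0
            for u in adj[v]:
                if u == parent:
                    continue
                if u in disc:                      # back edge
                    low[v] = min(low[v], disc[u])
                else:                              # tree edge
                    dfs(u, v)
                    low[v] = min(low[v], low[u])
                    children += 1
                    # no descendant of u climbs above v: v is a cut vertex
                    if parent is not None and low[u] >= disc[v]:
                        points.add(v)
            if parent is None and children >= 2:   # special rule for the root
                points.add(v)

        for v in adj:
            if v not in disc:
                dfs(v, None)
        return points

Removing a node partitions its component exactly when the node appears in the returned set.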

Venn Diagram Drawing Algorithms

Someone asked about overlapping subclusters in GraphViz and got the following response:
Sorry, no. General subgraphs can share nodes without implying subset
containment but not clusters. The problem is in the drawing.
If clusters can overlap arbitrarily, drawing them becomes the problem
of drawing Venn diagrams, for which there are no good algorithms.
What is a formal definition or example of the "problem of drawing Venn diagrams", and why is it hard (I assume NP-complete/hard)? (Extra points: sketch a reduction to a well-known NP-complete problem.)
You have N points and a binary relation R on them, and you need to represent the relation graphically: every node is drawn as a circle in the Euclidean plane, and two circles overlap if and only if the corresponding nodes n and n' satisfy n R n'.
Instead of a Venn diagram as such, we can in many cases use GraphViz for the same purpose via the dual graph, which is the Boolean lattice of the intersections of the sets. Each node represents a unique choice of sets to include and sets to exclude. Nodes that differ only by the inclusion/exclusion of a single set are connected.
For increasing numbers of sets there are in general many, many nodes and edges. But in many practical settings there will be many sets that do not intersect at all, so those intersection nodes, and any edges from them to other nodes, may be omitted. This can reduce the number of nodes and edges to something practical.
When laying out the resulting graph it may be best to select the GraphViz algorithm "neato" and to ask it to avoid overlapping the nodes. One way to make those settings is to write, inside the opening curly brace of the graph, layout=neato; overlap=prism; .
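To make the construction concrete, here is a small sketch (my own illustration, not from the answer) that enumerates the non-empty intersection regions of a few sets, connects regions differing in exactly one set, and prints DOT source with the settings above:

    from itertools import product

    sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
    names = sorted(sets)
    universe = set().union(*sets.values())

    def region(mask):
        """Elements lying inside exactly the sets selected by mask."""
        r = set(universe)
        for name, inside in zip(names, mask):
            r &= sets[name] if inside else universe - sets[name]
        return r

    # keep only intersection nodes that are actually non-empty
    masks = [m for m in product([0, 1], repeat=len(names)) if region(m)]

    print("graph venn {")
    print("  layout=neato; overlap=prism;")
    for m in masks:
        label = "".join(n for n, i in zip(names, m) if i)
        print(f'  "{m}" [label="{label}"];')
    for a in masks:
        for b in masks:
            # connect regions that differ by a single include/exclude flag
            if a < b and sum(x != y for x, y in zip(a, b)) == 1:
                print(f'  "{a}" -- "{b}";')
    print("}")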

Looking for an algorithm by possibly Dijkstra

I am looking for an algorithm that distributes nodes on a plane such that the edges are all the same length. I think it is by Dijkstra, but I cannot remember.
Anyone heard of this algorithm?
In general this will be impossible. Effectively you want something similar to the finite pictures in tilings of the plane.
There are some simple cases - regular polygons and a few graphs made of joined polygons - but even something as simple as the complete graph on 4 points (the tetrahedron) is impossible.
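A quick way to see the $K_4$ case (my own sketch, not from the answer): three mutually adjacent vertices at unit distance must form an equilateral triangle, say $A=(0,0)$, $B=(1,0)$, $C=(\tfrac12,\tfrac{\sqrt3}{2})$. Any fourth point $D$ with $|DA|=|DB|=1$ lies on the bisector $x=\tfrac12$, so $D=(\tfrac12,\pm\tfrac{\sqrt3}{2})$: either $C$ itself or its reflection, which is at distance $\sqrt3\neq1$ from $C$. Hence $K_4$ has no drawing in the plane with all edges of equal length.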
If you want something that tries to balance these impossible constraints, try graphviz and its neato program.
Well, if you just want to create some graph with this property, then there are a number of graphs that work, for instance a line, a ring, a tree, etc. - but there you are the one who decides which edges to include or exclude.
If you are given a particular graph and want all of its edges to have the same length, then this is impossible in general - for example for a complete graph on more than 3 nodes, or a star topology with one master and more than 6 slaves in which slaves that are directly close to each other are also neighbors. [I believe the cases in the other posts tell you more.]
A related special case: given a graph $G(V,E)$, draw $G$ such that the length of each edge $e \in E$ is at most one unit. This is an NP-hard problem. [Equivalently, it is NP-hard to decide whether an arbitrary graph $G$ is a unit disk graph.]

Storing very large graphs on disk/streaming graph partitioning algorithms?

Suppose that I have a very large undirected, unweighted graph (starting at hundreds of millions of vertices, ~10 edges per vertex), non-distributed and processed by a single thread only, and that I want to do breadth-first searches on it. I expect them to be I/O-bound, thus I need a good-for-BFS disk page layout; disk space is not an issue. The searches can start at any vertex with equal probability. Intuitively that means minimizing the number of edges between vertices on different disk pages, which is a graph partitioning problem.
The graph itself looks like spaghetti: think of a random set of points randomly interconnected, with some bias towards shorter edges.
The problem is, how does one partition a graph this large? The available graph partitioners I have found work only with graphs that fit into memory. I could not find any descriptions or implementations of streaming graph partitioning algorithms.
Or maybe there is an alternative to partitioning the graph for getting a disk layout that works well with BFS?
Right now, as an approximation, I use the fact that the vertices have spatial coordinates attached to them and put the vertices on disk in Hilbert sort order. This way spatially close vertices land on the same page, but the presence or absence of an edge between them is completely ignored. Can I do better?
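For reference, the standard Hilbert-index computation such a sort would use (the well-known iterative algorithm, my own transcription; the grid side n must be a power of two):

    def hilbert_index(n, x, y):
        """Map grid coordinates (x, y), 0 <= x, y < n, to the point's
        position along the Hilbert curve of order log2(n)."""
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if x & s else 0
            ry = 1 if y & s else 0
            d += s * s * ((3 * rx) ^ ry)
            # rotate the quadrant so the curve orientation matches
            if ry == 0:
                if rx == 1:
                    x = s - 1 - x
                    y = s - 1 - y
                x, y = y, x
            s //= 2
        return d

    # sort vertices into page order by their Hilbert index, e.g.
    # vertices.sort(key=lambda v: hilbert_index(1 << 16, v.x, v.y))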
As an alternative, I could split the graph into pieces using the Hilbert sort order of the vertices, partition the subgraphs separately, stitch them back together, and accept poor partitioning at the seams.
Some things I have looked into already:
How to store a large directed unweighted graph with billions of nodes and vertices
http://neo4j.org/ - I found zero information on how it does graph layout on disk
Partitioning implementations (unless I'm mistaken, all of them need the graph to fit into memory):
http://glaros.dtc.umn.edu/gkhome/views/metis
http://www.sandia.gov/~bahendr/chaco.html
http://staffweb.cms.gre.ac.uk/~c.walshaw/jostle/
http://www.cerfacs.fr/algor/Softs/MESHPART/
EDIT: added info on what the graph looks like and that BFS can start anywhere.
EDIT: added the idea of partitioning into subgraphs.
No algorithm truly needs to "fit into memory"--you can always page things in and out as needed. But you do want to avoid having the computation take unreasonably long--and global graph partitioning in the generic case is an NP-complete problem, which is "unreasonably long" for most problems that do not even fit in memory.
Fortunately, you want to do breadth-first searches, which means that you want a format where breadth-first is the easy computation. I don't know of any algorithms offhand that do this, but you can construct your own breadth-first layout if you're willing to allow a bit of extra disk space.
If the edges are not biased towards local interactions, then disentangling the graph will be difficult. If they are biased towards local interactions, then I suggest an algorithm like the following:
Pick a random set of vertices as starting points from throughout the entire data set.
For each vertex, collect all neighboring vertices (takes one sweep through the data set).
For each set of neighboring vertices, collect the set of neighbors-of-neighbors and rank them according to how many edges connect to them. If you don't have space in a page to store them all, keep the most-connected vertices. If you do have space to save them all, you may wish to throw away the least useful ones (e.g. if the (fraction of edges kept within a page) / (fraction of vertices needing storage) ratio drops "too low"--where "too low" will depend on how much breadth your searches really need, whether you can do any pruning, and so on--then don't include those in the neighborhood).
Repeat the process of collecting and ranking neighbors until your neighborhood is full (e.g. fills some page size that suits you). Then check for repeats among the randomly chosen starts. If a small number of vertices appear in both, remove them from one neighborhood or the other, whichever breaks fewer edges. If a large number of vertices appear in both, keep the neighborhood with the better (vertices in neighborhood / broken edges) ratio and throw the other away.
Now you have some local neighborhoods that are approximately locally optimal in that breadth-first searches tend to fall inside. If your breadth-first search prunes off unproductive branches pretty effectively, then this is probably good enough. If not, you probably want adjacent neighborhoods to cluster.
If you don't need adjacent neighborhoods to cluster too much, you set aside the vertices you've grouped into neighborhoods, and repeat the process on the remaining data until all vertices are accounted for. You change each vertex identifier to (vertex,neighborhood), and you're done: when following edges, you know exactly which page to grab, and most of them will be close by given the construction.
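A sketch of this basic procedure in Python (my own illustration under simplifying assumptions: an in-memory adjacency dict stands in for the on-disk edge list, and overlaps between seeds are avoided by never adding an already-assigned vertex rather than by the repair step described above):

    import random

    def grow_neighborhoods(adj, page_size, n_seeds, rng=random.Random(0)):
        """Greedily grow neighborhoods of at most page_size vertices
        around random seeds until every vertex is assigned to a page."""
        assignment = {}                    # vertex -> neighborhood id
        next_id = 0
        unassigned = set(adj)
        while unassigned:
            seeds = rng.sample(sorted(unassigned),
                               min(n_seeds, len(unassigned)))
            for seed in seeds:
                if seed in assignment:     # grabbed by an earlier seed
                    continue
                hood = {seed}
                while len(hood) < page_size:
                    # rank candidate vertices by how many edges they have
                    # into the current neighborhood; keep the best one
                    candidates = {}
                    for v in hood:
                        for u in adj[v]:
                            if u not in hood and u not in assignment:
                                candidates[u] = candidates.get(u, 0) + 1
                    if not candidates:
                        break              # component exhausted
                    hood.add(max(candidates, key=candidates.get))
                for v in hood:
                    assignment[v] = next_id
                unassigned -= hood
                next_id += 1
        return assignment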
If you do need adjacent neighborhoods, then you'll need to keep track of your growing neighborhoods. You repeat the previous process (pick at random, grow neighborhoods), but now rank neighbors both by how many edges they satisfy within the neighborhood and by what fraction of their edges leaving the neighborhood land in an existing group. You might need weighting factors, but something like
score = (# edges within) - (# neighborhoods outside) - (# neighborhoodless edges outside)
would probably do the trick.
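Under one reading of that formula (the answer leaves the exact weighting open; here assignment maps already-grouped vertices to their neighborhood id):

    def score(v, hood, assignment, adj):
        """Edges inside the neighborhood, minus distinct neighborhoods
        reached outside it, minus edges into still-ungrouped territory."""
        inside, ungrouped = 0, 0
        outside_hoods = set()
        for u in adj[v]:
            if u in hood:
                inside += 1
            elif u in assignment:
                outside_hoods.add(assignment[u])
            else:
                ungrouped += 1
        return inside - len(outside_hoods) - ungrouped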
Now, this is not globally or even locally optimal, but this or something very much like it should give a nicely locally-connected structure, and should let you produce a covering set of neighborhoods that have relatively high interconnectivity.
Again, it depends on whether your breadth-first search prunes branches or not. If it does, the inexpensive thing to do is to maximize local interconnectivity. If it doesn't, the thing to do is to minimize external connectivity--and in that case, I'd suggest just collecting breadth-first sets up to some size and saving those (with duplication at the edges of the sets--you're not badly limited by hard drive space, are you?).
You might want to look at HDF5. Despite the H standing for Hierarchical, it can store graphs - check the documentation under the keyword 'Groups' - and it is designed for very large datasets. If I understand correctly, HDF5 'files' can be spread across multiple OS files. Now, HDF5 is only a data structure, plus a set of libraries for low- and high-level manipulation of the data structure. Off the top of my head I haven't a clue about streaming graph-partitioning algorithms, but I stick to the notion that if you get the data structure right, algorithms become easier to implement.
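For concreteness, a minimal sketch of that kind of layout using the h5py bindings (my own illustration; the group and dataset names are made up):

    import h5py
    import numpy as np

    with h5py.File("graph.h5", "w") as f:
        # one group per partition / disk page
        part = f.create_group("partition_0000")
        # the partition's internal edge list as an n x 2 array of vertex ids
        internal = np.array([[0, 1], [0, 2], [1, 2]], dtype=np.int64)
        part.create_dataset("edges", data=internal,
                            chunks=True, compression="gzip")
        # edges crossing into other partitions, kept separately
        crossing = np.array([[2, 47]], dtype=np.int64)
        part.create_dataset("cut_edges", data=crossing)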
What do you already know about the mega-graph? Does it naturally partition into dense subgraphs which are themselves only sparsely interconnected? Would a topological sort of the graph be a better basis for storage on disk than the existing spatial sort?
Failing crisp answers to such questions, maybe you just have to bite the bullet and read the graph multiple times to build the partitions, in which case you just want the fastest I/O you can manage, and a sophisticated layout of partitions on disk is nice but not as important. If you can partition the graph into sub-graphs which have only single edges to the other sub-graphs, you may be able to make the problem more tractable.
You want a good-for-BFS layout, but BFS is usually applied to trees. Does your graph have a unique root from which to start all BFSes? If not, then a layout optimized for BFS from one vertex will be suboptimal for BFS from another vertex.
