Data structure for a directed graph with complex nodes - data-structures

I have a directed graph in which each node is a complex data type.
Does anyone know how to build a data structure for this graph?
For instance, like the one in the picture here:
Thanks in advance.

Looking at the image, it just looks like a directed graph. I assume that by "complex data type" you mean that each vertex holds some sort of complex information, such as a hash table.
What I recommend is to create a dedicated Vertex class that holds the relevant information, then create a Graph class using either an adjacency matrix or an adjacency list implementation, depending on how dense or sparse and how big or small the graph will be.
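As a rough illustration of that recommendation, here is a minimal Python sketch of a Vertex class plus an adjacency-list graph. The class names, the `data` payload field and the method names are assumptions made up for this example; nothing in the question specifies them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, List


@dataclass
class Vertex:
    """A vertex carrying an arbitrary payload (field names are hypothetical)."""
    key: Hashable
    data: Dict[str, Any] = field(default_factory=dict)


class DirectedGraph:
    """Directed graph stored as an adjacency list keyed by vertex key."""

    def __init__(self) -> None:
        self.vertices: Dict[Hashable, Vertex] = {}
        self.successors: Dict[Hashable, List[Hashable]] = {}

    def add_vertex(self, key: Hashable, **data: Any) -> Vertex:
        vertex = Vertex(key, dict(data))
        self.vertices[key] = vertex
        self.successors.setdefault(key, [])
        return vertex

    def add_edge(self, source: Hashable, target: Hashable) -> None:
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.successors[source].append(target)

    def neighbors(self, key: Hashable) -> List[Vertex]:
        # Vertices reachable from `key` by following one outgoing edge.
        return [self.vertices[k] for k in self.successors[key]]


graph = DirectedGraph()
graph.add_vertex("a", label="start", attrs={"weight": 3})
graph.add_vertex("b", label="end")
graph.add_edge("a", "b")
```

An adjacency list like this suits sparse graphs; for a small, dense graph the `successors` dictionary could be swapped for a boolean matrix without changing the Vertex class.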

Related

Which graph visualization library is suitable for static node and edges with text?

I would like to implement a graph in Java. I defined a Node class and an Edge class; the graph is represented as a list of Node objects, and each node has a field holding the list of edges that start from it. Now I would like to use a Java library to visualize the graph.
Currently I have tried graphviz-java. The result meets my requirements, but the edges and nodes are not placed well. See the graph below:
I created the graph through the API it provides, such as node(xxx) and addLink(), by traversing the node list of my graph instead of generating the raw dot file text.
The key code for creating the graph is as follows:
for (LiteralNode litNode : literalNodeArrayList) {
    MutableNode mutableNode = mutNode(litNode.getNodeElement());
    for (Edge e : litNode.getEdgeList()) {
        MutableNode endNode = mutNode(e.endNode.getNodeElement());
        mutableNode.addLink(to(endNode.port(Compass.NORTH)).add(Arrow.VEE));
    }
    mutableNodes.add(mutableNode);
    g.add(mutableNode);
}
It is basically a simple traversal with loops. The nodes and edges change dynamically from case to case.
I would like the nodes and edges to adjust their positions so that the graph displays better. I have also tried using a JSON file and d3js to visualize the graph, but I found that those libraries focus too much on displaying huge numbers of nodes and their distributions, which I don't need. I like the layered structure that graphviz provides, so I would like some recommendations.
I mentioned the JSON file because, if no Java library is available, I might as well try a JavaScript library such as d3js and use Java only as a backend tool to generate the JSON input file. So the recommendation is not limited to Java.
All graphviz attributes (https://graphviz.gitlab.io/_pages/doc/info/attrs.html) are supported.
Node distances are set with "ranksep" and "nodesep".
To force nodes on the same level, add a subgraph with "rank=same" and the nodes that should be aligned.
Here's a simple example:
Graph g = graph()
    .graphAttr().with(attr("ranksep", .1), attr("nodesep", .1))
    .with(graph().graphAttr().with(Rank.SAME).with(node("a"), node("c")))
    .with(node("a").link("b"), node("b").link("c"));
The graph looks good to me. What do you mean by "I would like the node and edges adjust the location to make the graph display better."?
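For readers who generate DOT text themselves rather than going through graphviz-java, the same three attributes ("ranksep", "nodesep" and a "rank=same" subgraph) can be written out directly. The Python sketch below only builds the DOT string; the function name `to_dot` and its parameters are made up for this illustration, and rendering the text is left to whichever Graphviz front end is in use.

```python
def to_dot(edges, same_rank=(), ranksep=0.1, nodesep=0.1):
    """Return DOT text for a directed graph (hypothetical helper).

    edges:     iterable of (source, target) node names
    same_rank: node names that should be placed on the same level
    """
    lines = ["digraph G {", f"  ranksep={ranksep};", f"  nodesep={nodesep};"]
    if same_rank:
        members = "; ".join(f'"{name}"' for name in same_rank)
        # An anonymous subgraph with rank=same aligns its member nodes.
        lines.append(f"  {{ rank=same; {members}; }}")
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("a", "b"), ("b", "c")], same_rank=("a", "c"))
print(dot_text)
```

This mirrors the Java example above: nodes a and c are forced onto one level while b is linked between them.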

Simple way to represent force/flow directed, weighted graph?

I have a set of weighted pairwise relations between nodes which are all of the same type, like this:
A-[1]->B
A-[2]->C
B-[3]->C
B-[2]->D
E-[1]->A
I'd like to lay out this graph in a way that makes the precedence order of the nodes relatively clear (i.e. that "flow" goes roughly from E to A to B/C/D).
I think what I need is similar to a force layout, but with the added notions of edge weight and directionality.
I've looked into using neo4j's builtin viz view and d3 but they don't seem to offer what I need out of the box. Is there a standard approach to this kind of problem?
Even with Neo4j's built-in viz you should be able to do:
MATCH path = (a:LabelA {id:"A"})-[:FOO*..10]->()
RETURN path
which should show you the tree starting from A.
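Outside Neo4j, one possible way to make the precedence order explicit is to assign each node a layer by longest path from the sources, assuming the relations contain no cycle. This is not taken from the answer above; it is a small Python sketch under that assumption, the function name `layer_nodes` is hypothetical, and the edge weights are carried in the tuples but ignored by the layering itself.

```python
from collections import defaultdict, deque


def layer_nodes(edges):
    """Map each node to a layer index using longest-path layering.

    edges: iterable of (source, target, weight); weight is not used here.
    Raises ValueError if the relations contain a cycle.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _weight in edges:
        successors[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))

    layer = {node: 0 for node in nodes}
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for nxt in successors[node]:
            # A node sits one layer below its deepest predecessor.
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if processed != len(nodes):
        raise ValueError("graph contains a cycle")
    return layer


relations = [("A", "B", 1), ("A", "C", 2), ("B", "C", 3), ("B", "D", 2), ("E", "A", 1)]
layers = layer_nodes(relations)
```

For the relations in the question this places E first, then A, then B, with C and D last, which matches the "flow" described. The layer index could then drive the vertical position in whatever drawing tool is used.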

BGL requires a model that it does not provide itself?

I'd like to use the parallel MST algorithm dense_boruvka_minimum_spanning_tree from boost.
One required parameter of that algorithm's interface is a graph which "must be a model of Vertex List Graph and Distributed Edge List Graph". I found that the only Boost model that incorporates the Distributed Edge List Graph concept is the Distributed Adjacency List. However, the "Graph Concepts" section of that model explicitly says that
"[...] the distributed adjacency list does not model the Vertex List Graph or Edge List Graph concepts [...]"
(Emphasis mine.)
At this point I am confused. I'm supposed to pass a data structure to the interface of a boost algorithm which is not provided by the framework? Did I misunderstand something?
NB: I'm pretty new in the boost world.
Boost graph provides generic algorithms around concepts, and has historically included very few models of graph concepts. People will generally have their graphs already in some existing datastructure that they can adapt.
In this light, your question
"At this point I am confused. I'm supposed to pass a data structure to the interface of a boost algorithm which is not provided by the framework?"
is not even so strange.
The distributed adjacency list only models Distributed Vertex List Graph, whereas the algorithm requires Vertex List Graph.
The key difference is highlighted under DVLG:
A Distributed Vertex List Graph is a graph whose vertices are distributed across multiple processes or address spaces. The vertices and num_vertices functions retain the same signatures as in the Vertex List Graph concept, but return only the local set (and size of the local set) of vertices.
In other words: a DVLG is really just a VLG already, just distributed.
What you will want to do is "undistribute" the DVLG using the VertexListAdaptor:
The vertex list graph adaptor adapts any model of Distributed Vertex List Graph in a Vertex List Graph. In the former type of graph, the set of vertices is distributed across the process group, so no process has access to all vertices. In the latter type of graph, however, every process has access to every vertex in the graph. This is required by some distributed algorithms, such as the implementations of Minimum spanning tree algorithms.
The solution seems to be the "VertexListAdaptor": http://www.boost.org/doc/libs/1_47_0/libs/graph_parallel/doc/html/vertex_list_adaptor.html

Directed Graph Versus Associative Array

I have been reading up on directed graphs. I have managed to get an abstract graph data type working in my application but I don't find it particularly intuitive and am considering replacing it with an ordinary multi-dimensional array.
My graph is sparse and acyclic. Each vertex is reachable from one particular 'master' vertex. If it were a tree, this master vertex would be the 'root'. If it were a social network, this master vertex would be 'me'.
Although my graph may have hundreds of thousands of vertices it has a finite depth: the greatest distance between any two nodes is 3 edges.
The underlying data representation is an adjacency list. A small example would look like this:
Head | Tails
--------------
1 | 2, 3, 4
2 | 5
3 | 5
4 | 5
5 | 6
If I was using an ordinary multi-dim array instead of my graph data type, it would look something like this:
$me[1][2][5][6]
$me[1][3][5][6]
$me[1][4][5][6]
Now, the main things that I want to be able to do with this graph are:
Navigate it as a hierarchy. I realise that some child vertices will feature in more than one category (e.g. #5), but that is what I want for this particular use case. I can't see any real difference between an array and a graph for this point.
Lay it out as a list (alphabetical, according to vertex name), with no duplicates. I would probably do a DFS, flagging visited vertices as I go, to avoid exploring them more than once. But as far as I can see this is achievable using either the graph or the array, and at the same cost.
Do an 'all paths' analysis for any given pair of points. Because I want 'all paths' (i.e. I'm not simply checking for reachability), it seems to me that I have to traverse the entire graph, and again I can see no advantage in a graph over an array.
I get the feeling that I am missing something, but I can't put my finger on it. Can you? Any ideas, suggestions, insights or advice gratefully accepted. (By the way, I'm using PHP, and the data source is a relational DB. I don't think this makes any real difference, though.)
Thanks!
One thing you need to understand is that a directed graph (or digraph) is a concept, whereas an associative array is a data structure.
An instance of the digraph concept can be stored in many different data structures, the most common of which you can find on this Wikipedia page.
I'm not sure what you are doing with your multidimensional array... storing all paths? You will end up with an N³ space complexity, and trouble building it. A tree-based structure would be more efficient at the very least.
Now to the things you want to do with your graph:
Navigate as a hierarchy. The basic digraph concept doesn't allow going up in the hierarchy, but you can easily store the reverse graph as well (especially with matrix-based representations: just use 3 values instead of 2, namely forward, backward and nothing).
Lay it out as a list, according to name. You have to store the name somewhere (either in a side map or in the vertex object), but it shouldn't be any harder than sorting anything else according to name.
Do an 'all paths' analysis. You can probably get away with linear complexity (in the number of paths) through DP and a shared representation of paths.
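To make the dynamic-programming remark a little more concrete, the Python sketch below counts the paths between two vertices of the question's example graph, caching the result for each vertex so that shared sub-paths (everything after vertex 5, for instance) are evaluated only once. It counts paths rather than building a shared representation of them, and it assumes the graph is acyclic, as the question states; the names are made up for the example.

```python
from functools import lru_cache

# Adjacency list from the question: head -> tails.
ADJACENCY = {1: (2, 3, 4), 2: (5,), 3: (5,), 4: (5,), 5: (6,), 6: ()}


def count_paths(adjacency, source, target):
    """Count distinct directed paths from source to target in a DAG."""

    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == target:
            return 1
        # Each vertex is expanded once; later visits reuse the cached count.
        return sum(paths_from(nxt) for nxt in adjacency.get(node, ()))

    return paths_from(source)


print(count_paths(ADJACENCY, 1, 6))  # 3: via 2, via 3 and via 4
```

The same memoised recursion can return the set of path suffixes per vertex instead of a count if the paths themselves are needed.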
It looks like your data structure is too complicated. If you represent a directed graph as a multidimensional array, it is almost always of dimension two, so that
$array[$x][$y]
is a boolean value that is TRUE if and only if there is an edge from node $x to node $y in the graph. In your example it would be e.g.
$array[1][2] = TRUE
$array[1][5] = FALSE
But for sparse graphs, this boolean matrix representation is usually not a good choice. Typically you would have a one-dimensional array that maps every node to the set of nodes to which it has an edge, e.g.
$array[1] = { 2, 3, 4 }
where { ... } means some sort of an unordered collection data structure, which can be e.g. a binary search tree or a hash set (hash table).
This data structure enables you to quickly find the nodes to which there is an arc from a given node, which is a key feature for graph algorithms.
Sometimes you want to be able to traverse your graph backwards also; in that case you would have another array that maps nodes to the list of their predecessors.
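A compact way to see both ideas together, sketched in Python rather than PHP: a dictionary maps each node to a hash set of successors, a second dictionary derived from it maps each node to its predecessors, and a depth-first search with a visited set yields the duplicate-free sorted list the question asks for. The function names are hypothetical and the data is the question's example graph.

```python
def build_predecessors(successors):
    """Derive the reverse mapping (node -> set of predecessors)."""
    predecessors = {node: set() for node in successors}
    for source, targets in successors.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)
    return predecessors


def reachable(successors, start):
    """Sorted list of every node reachable from start, without duplicates."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already explored via another parent
        seen.add(node)
        stack.extend(successors.get(node, ()))
    return sorted(seen)


successors = {1: {2, 3, 4}, 2: {5}, 3: {5}, 4: {5}, 5: {6}, 6: set()}
predecessors = build_predecessors(successors)
```

Here `2 in successors[1]` plays the role of `$array[1][2] = TRUE`, and `predecessors[5]` gives the three parents of vertex 5 for backward traversal.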

What are good ways of organizing directed graph data?

Here's my situation. I have a graph that has different sets of data added at different times. For example, set1 might have a few thousand nodes; then set2 comes in later and we apply business logic to create edges from set1 to set2 (and discard any vertices from set1 that do not have edges to set2). At a later point we get set3, set4, and so on, and the same process applies between each set and its previous set.
My question: what's the best way to organize this? What I did before was name the nodes set1-xx, set2-xx, etc. The problem I faced was that, when running analytics between the current set and the previous set, I had to loop through the entire graph looking for all the nodes whose names started with 'setx'. That took a long time as the graph grew, so I thought of another solution: create a node called 'set1' and connect it to all nodes of that particular set. I am testing it, but I was wondering whether there is a more efficient or built-in way of handling data structures like this. Is there a way to segment data like this?
I think a general solution would be applicable, but if it helps, I'm using neo4j (so any solution specific to that database would be good as well).
You have a very special type of a directed graph, called a layered graph.
The choice of the data structure depends primarily on the expected graph density (how many nodes from a previous set/layer are typically connected to a node in the current set/layer) and on the operations that you need to perform on it most of the time. It is definitely a good idea to have each layer directly represented by a numeric index (that is, the outermost structure will be an array of sets/layers), and presumably you can also use one array of vertices per layer. However, the list of edges per vertex (out only, or in and out sets of edges depending on whether you ever traverse the layers backward) may be any of the following:
Linked list of vertex identifiers; this is good if the graph is very sparse and edges are often added/removed.
Sorted array of vertex identifiers; this is good if the graph is quite sparse and immutable.
Array of booleans, indexed by vertex identifiers, determining whether a given vertex is or is not linked by an edge from the current vertex; this is good if the graph is dense.
The "vertex identifier" can take many forms. For example, it can be an index into the array of vertices on the next layer.
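As one possible reading of this layout, the Python sketch below keeps an outer array of layers, one array of vertices per layer, and, for each vertex, a sorted array of indices into the next layer (the second option in the list above). The class and method names are assumptions made for the illustration, not part of any library.

```python
from bisect import bisect_left


class LayeredGraph:
    """Layers indexed by number; edges only go from layer i to layer i + 1."""

    def __init__(self):
        self.layers = []     # layers[i] is the list of vertex payloads in layer i
        self.out_edges = []  # out_edges[i][v] is a sorted list of indices into layer i + 1

    def add_layer(self, vertices):
        self.layers.append(list(vertices))
        self.out_edges.append([[] for _ in self.layers[-1]])
        return len(self.layers) - 1

    def add_edge(self, layer, source, target):
        if layer + 1 >= len(self.layers):
            raise IndexError("target layer does not exist yet")
        if not 0 <= target < len(self.layers[layer + 1]):
            raise IndexError("target index is outside the next layer")
        targets = self.out_edges[layer][source]
        pos = bisect_left(targets, target)
        if pos == len(targets) or targets[pos] != target:
            targets.insert(pos, target)  # keep the list sorted, no duplicates

    def has_edge(self, layer, source, target):
        targets = self.out_edges[layer][source]
        pos = bisect_left(targets, target)
        return pos < len(targets) and targets[pos] == target

    def connected_sources(self, layer):
        # Vertices of `layer` that have at least one edge into the next layer.
        return [v for v, targets in enumerate(self.out_edges[layer]) if targets]


g = LayeredGraph()
set1 = g.add_layer(["s1-a", "s1-b", "s1-c"])
set2 = g.add_layer(["s2-x", "s2-y"])
g.add_edge(set1, 0, 1)
g.add_edge(set1, 2, 0)
g.add_edge(set1, 0, 0)
```

With this layout, analytics between the current set and the previous one only touch two layers instead of scanning the whole graph, and `connected_sources` identifies which vertices of the earlier set survive the "discard vertices without edges" rule from the question.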
Your second solution is what I would do: create a setX node and connect all nodes belonging to that set to setX. That way your data is partitioned and it is easier to query.
