Aligning Clusters within Graphviz

I am using Graphviz to automatically create an architecture diagram. I am having the following two problems and was hoping to get some assistance.
I am using UUIDs to uniquely identify components (example: "a5320de8-a320-11ea-bb37-0242ac130002" [label="Component A"]). When mapping A -> B, I get "Component A" -> b0c5e47c, which is strange. The only way I've been able to map UUID to UUID is to put quotes around them. Any suggestions?
I want to align clusters in a specific manner and direction. I've tried {rank=same; cluster_B, cluster_C, cluster_D}; and "9653369c-a322-11ea-bb37-0242ac130002" -> "aa31adb9-9621-40c2-855c-621832dd8c61" [style=invis], but neither works.
I have three sections within my DOT file:
Components (in this section, I list all 100+ components and color-code them based on a specific rule).
Clusters (in this section, I cluster the components into specific 'groupings').
Diagram or mapping (in this section, I map the different components and clusters).
Here is a sample of my DOT file.
digraph architecture {
#graph [rankdir=LR]
compound=true;
#Compliant
node[fillcolor="#013220" style="filled" shape=square fontcolor="white"];
"a5320de8-a320-11ea-bb37-0242ac130002" [label="Component A"]
"b0c5e47c-a320-11ea-bb37-0242ac130002" [label="Component B"]
#Clusters
#Customer-facing client application cluster
subgraph cluster_A{
label="Client Apps";
"f7b3915d-6b3d-4d4c-bef0-bdabda915c03";
"9912de2b-739a-4c5c-834e-e0c3d09d70d1";
"16bb2066-9293-470e-99ec-c59d8426c0ab";
"641a6601-f4f6-4c06-baa6-e5e232f8abed";
"c5e92b09-a470-4fb6-af5c-e5f7dbeff919";
}
#Diagrams
"f7b3915d-6b3d-4d4c-bef0-bdabda915c03" -> {"35305026-d285-458c-85ad-7eae4e785e84", "76e0e679-42a6-47f0-9164-abc223da07fe"};
"76e0e679-42a6-47f0-9164-abc223da07fe" -> "35305026-d285-458c-85ad-7eae4e785108";
}
I get something like:
However, I want to arrange the clusters in a specific way, like:

As you found out, hyphens are not legal characters in a node ID unless the string is quoted. If you want more info: https://www.graphviz.org/doc/info/lang.html
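If the DOT file is produced by a script, the quoting can be applied automatically. The following is a minimal sketch, not part of Graphviz: the helper names dot_id and edge are hypothetical, and the pattern is deliberately conservative (it quotes anything that is not a plain ASCII identifier or numeral, and also quotes DOT keywords).

```python
import re

# An unquoted DOT ID is an identifier (letters, digits, underscores, not
# starting with a digit) or a numeral. Anything else is quoted here.
_BARE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|-?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)")
# DOT keywords are case-insensitive and must be quoted when used as IDs.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(name: str) -> str:
    """Return name as a DOT ID, adding double quotes when they are needed."""
    if _BARE_ID.fullmatch(name) and name.lower() not in _KEYWORDS:
        return name
    # Inside a quoted DOT string, only the double quote needs escaping.
    return '"' + name.replace('"', '\\"') + '"'


def edge(src: str, dst: str) -> str:
    """Return one DOT edge statement between two (possibly quoted) IDs."""
    return f"{dot_id(src)} -> {dot_id(dst)};"


print(edge("a5320de8-a320-11ea-bb37-0242ac130002",
           "b0c5e47c-a320-11ea-bb37-0242ac130002"))
```

Quoting every ID unconditionally would also work and is simpler; the check above only keeps plain names such as cluster_A readable.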
There is no straightforward way to align clusters. Sometimes you can force the desired alignment by embedding multiple clusters within another cluster to "shrink-wrap" them. For example, embed B3 and B4 within cluster B34. But no guarantees.
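The shrink-wrap idea can be sketched as a small generator that nests clusters inside a wrapper cluster. This is only an illustration: the function name wrap_clusters and the node names are hypothetical, and hiding the wrapper's border with peripheries=0 is an optional choice made here, not something the approach requires.

```python
def wrap_clusters(wrapper, children, hide_border=True):
    """Emit DOT for several clusters nested inside one outer cluster.

    wrapper  -- suffix of the outer cluster name, e.g. "B34"
    children -- mapping of inner cluster suffix to its list of node IDs
    """
    lines = [f"subgraph cluster_{wrapper} {{"]
    if hide_border:
        # Optional: do not draw the wrapper's own boundary.
        lines.append("  peripheries=0;")
    for name, nodes in children.items():
        lines.append(f"  subgraph cluster_{name} {{")
        lines.append(f'    label="{name}";')
        for node in nodes:
            lines.append(f'    "{node}";')  # quoted, so UUIDs are safe
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


body = wrap_clusters("B34", {"B3": ["b3-node-1", "b3-node-2"],
                             "B4": ["b4-node-1"]})
dot = "digraph architecture {\n" + body + "\n}"
print(dot)
```

Whether the wrapper actually pulls B3 and B4 together still depends on the edges in the rest of the graph, so the result needs to be checked visually.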
You can use gvpr to reposition clusters (and their contents) but that can get pretty complex.

Related

How to add this type of node description by Mermaid?

This is a flowchart pattern that I really like to use and I currently use drawio to draw it:
Notice that there are two kinds of descriptions in the flow chart
description1:How does A get to B
description2:Some properties of B
I know Mermaid can implement the description1 by:
graph TB
A --->|"description1:<br>How does A get to B"| B
But description2 is also very important to me, is there any way to achieve it?
The current workaround:
I use the heading of subgraph instead of description2:
graph TB
A --->|"description1:<br>How does A get to B"| B
subgraph description2:<br>Some properties of B
B
end
But I have to say it's a very ugly temporary solution, so I am asking here.
While some types of Mermaid diagrams explicitly support notes (e.g. sequence diagrams), flowcharts do not.
I believe the closest you're going to get is to connect B to itself with an invisible link (~~~):
graph TB
A --->|"description1:<br>How does A get to B"| B
B ~~~|"description2:<br>Some properties of B"| B

evenly distribute pygraphviz nodes

I have a graphviz code like this:
import pygraphviz as pgv
A = pgv.AGraph(strict=False, directed=True,
               overlap=False, sep="+10,10")
# S is defined elsewhere; iterating over it yields (key, value) pairs
for k, _ in S:
    A.add_node(k)  # adding all nodes
A.add_edge(S.created, S.packaged_unassigned)
A.add_edge(S.packaged_unassigned, S.packaged_assigned)
A.add_edge(S.packaged_assigned, S.packaged_unassigned, style="dotted")
A.add_edge(S.packaged_assigned, S.shipped_to_distributor)
A.add_edge(S.shipped_to_distributor, S.on_distributor_side_out)
A.add_edge(S.on_distributor_side_out, S.shipped_to_deployer)
A.add_edge(S.shipped_to_deployer, S.on_distributor_side_in)
A.add_edge(S.on_distributor_side_in, S.shipped_to_lab)
A.add_edge(S.shipped_to_lab, S.on_lab_side)
A.add_edge(S.on_lab_side, S.analysis_completed)
A.add_edge(S.analysis_completed, S.completed)
A.layout()
A.draw("status_chart.png")
which produces this output:
https://i.ibb.co/7pJQ8rd/Screenshot-2020-12-20-at-22-23-10.png
My concern here is that the nodes seem to not utilize the available space properly. Instead they just span the diagonal of the image.
How can I make Graphviz utilize the space better to create a smaller image while keeping the constraint of no overlaps?
One constraint is that you have specified a chained graph, so the layout is somewhat constrained by that.
See the documentation for some possible alternative options to specify:
https://pygraphviz.github.io/documentation/latest/reference/agraph.html
For example, you can try specifying a landscape view
AGraph(landscape='true'...)
You can also try experimenting with different layout directives:
Optional prog=['neato'|'dot'|'twopi'|'circo'|'fdp'|'nop'] will use specified graphviz layout method.
Also see:
unflatten(args='')
To adjust directed graphs to improve layout aspect ratio.
Another technique you could use is to break up the graph into parts, so that it might look like this (adding a text label to the nodes, with the numbering scheme):
(1) -> (2) ->(3)
(4) -> (5) ->(6)
...
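One way to get such rows, sketched here in plain Python that only emits DOT text, is to put each row in a rank=same group and link the end of one row to the start of the next. The function name chain_in_rows and the row width are hypothetical choices, the dotted back edge from the question is left out, and dot is not guaranteed to keep every row in exactly this order.

```python
def chain_in_rows(states, per_row):
    """Emit DOT that draws a linear chain as several left-to-right rows."""
    rows = [states[i:i + per_row] for i in range(0, len(states), per_row)]
    lines = ["digraph status {", "  rankdir=TB;"]
    for row in rows:
        # Nodes in one rank=same group are placed on the same row; the
        # edges between them run along that row.
        chain = " -> ".join(f'"{s}"' for s in row)
        lines.append(f"  {{ rank=same; {chain}; }}")
    for upper, lower in zip(rows, rows[1:]):
        # Connect the last node of a row to the first node of the next row.
        lines.append(f'  "{upper[-1]}" -> "{lower[0]}";')
    lines.append("}")
    return "\n".join(lines)


STATES = [
    "created", "packaged_unassigned", "packaged_assigned",
    "shipped_to_distributor", "on_distributor_side_out",
    "shipped_to_deployer", "on_distributor_side_in", "shipped_to_lab",
    "on_lab_side", "analysis_completed", "completed",
]
print(chain_in_rows(STATES, per_row=4))
```

The printed text can be saved to a file and rendered with dot, or loaded into pygraphviz and laid out with prog='dot'.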
NOTE: The Python documentation for that library specifically suggests that you also refer to the Graphviz documentation for additional options.
http://www.graphviz.org/doc/info/lang.html
http://www.graphviz.org/doc/info/attrs.html
For example:
http://www.graphviz.org/doc/info/attrs.html#d:ratio
http://www.graphviz.org/doc/info/attrs.html#a:layout
http://www.graphviz.org/doc/info/attrs.html#d:scale
The Gallery also has some great examples, illustrating the use of different parameters
http://www.graphviz.org/gallery/

Multiple graphs inside Graphviz DOT file

I have this Graphviz DOT graph:
digraph unit_test {
label="Unit test"
edge [fillcolor="#a6cee3" color="#1f78b4"]
node[shape="ellipse" style="filled" fillcolor="#1f77b4"]
start
end
node[shape="box" style="filled" fillcolor="#ff7f0e"]
process
subgraph cluster_process {
label = "Major logic"
process
}
start -> process
process -> end
}
The above renders as:
I have this second graph:
digraph details {
label = "Process details"
edge [fillcolor="#a6cee3" color="#1f78b4"]
node[shape="ellipse" style="filled" fillcolor="#1f77b4"]
start
end
node[shape="box" style="filled" fillcolor="#ff7f0e"]
details
subgraph cluster_details {
label = "Details"
details
}
start -> details
details -> end
}
Which renders to:
Problem
When I put the above two graphs inside the same DOT file named supporting.dot and run the command dot -Tpng -o supporting.png supporting.dot, the terminal prints out some gibberish and the output image file doesn't contain both graphs; it contains only the first one. Is it possible to use multiple graphs inside a single DOT file? If so, what am I missing?
Question is unclear about what is to be accomplished, but maybe the following is a starting point
digraph G{
subgraph unit_test {
label="Unit test"
edge [fillcolor="#a6cee3" color="#1f78b4"]
node[shape="ellipse" style="filled" fillcolor="#1f77b4"]
start
end
node[shape="box" style="filled" fillcolor="#ff7f0e"]
process
subgraph cluster_process {
label = "Major logic"
process
}
start -> process
process -> end
}
subgraph details {
label = "Process details"
edge [fillcolor="#a6cee3" color="#1f78b4"]
node[shape="ellipse" style="filled" fillcolor="#1f77b4"]
start1 [label="start"]
end1 [label="end"]
node[shape="box" style="filled" fillcolor="#ff7f0e"]
details
subgraph cluster_details {
label = "Details"
details
}
start1 -> details
details -> end1
}
}
Note the naming / labels in the second subgraph.
Dot can't render two graphs into a single file; the output you see is probably the content of one of the graphs as a PNG.
In order to prevent that, you may run your graphs first through gvpack - something similar to:
gvpack -u supporting.dot | dot -Tpng -o supporting.png
This combines all graphs in supporting.dot into a single graph, which then is rendered with dot.
The layout of the graphs can be influenced by some more options of gvpack.
It is legal to have multiple graphs defined in one input file. You can then produce multiple output files using the -O option, like this:
dot -Tpng -O multi.gv
This will produce multi.gv.png and multi.gv.2.png
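When the rendering is driven from a script, the same command can be wrapped as below. This is a sketch under assumptions: the function names are hypothetical, dot must be on the PATH for the rendering step to run, and the naming of a third and later output file is extrapolated from the two names given above rather than confirmed.

```python
import shutil
import subprocess
from pathlib import Path


def expected_outputs(dot_file, fmt, graph_count):
    """File names -O is expected to create, following the pattern above."""
    names = [f"{dot_file}.{fmt}"]
    # Assumption: later graphs continue the ".2", ".3", ... numbering.
    names += [f"{dot_file}.{n}.{fmt}" for n in range(2, graph_count + 1)]
    return names


def render_all(dot_file, fmt="png"):
    """Run `dot -T<fmt> -O <dot_file>`, producing one output per graph."""
    subprocess.run(["dot", f"-T{fmt}", "-O", dot_file], check=True)


if __name__ == "__main__":
    source = "multi.gv"
    # Only render when Graphviz and the input file are actually available.
    if shutil.which("dot") and Path(source).exists():
        render_all(source)
    print(expected_outputs(source, "png", 2))
```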
I got a better answer
http://www.bound-t.com/manuals/ref-manual.pdf
The -dot_dir option and the names of drawing files
The -dot option creates a single file that contains all drawings from one Bound-T run. If you
then use the dot tool to create a PostScript file, each drawing will go on its own page in the
PostScript file. However, dot can also generate graphical formats that do not have a concept of
"page" and then it may happen that only the first drawing is visible. If you want to use such
non-paged graphical formats it is better to create a directory (folder) to hold the drawing files
and use the Bound-T option -dot_dir instead of the option -dot. The -dot_dir option creates a
separate file for each drawing, named as follows:
• The call-graph of a root subprogram is put in a file called cg_R_nnn.dot, where R is the
link-name of the root subprogram, edited to replace most non-alphanumeric characters
with underscores '_', and nnn is a sequential number to distinguish root subprograms that
have the same name after this editing.
• If the call-graph of some root subprogram is recursive, Bound-T draws the joint call-graph of
all roots and puts it in a file called jcg_all_roots_001.dot.
• The flow-graph of a subprogram is put in a file called fg_S_nnn.dot, where S is the link-name of the subprogram, edited as above, and nnn is a sequential number to distinguish
subprograms that have the same name after this editing and also to distinguish drawings
that show different flow-graphs (execution bounds) for the same subprogram.
The sequential numbers nnn start from 1 and increment by 1 for each drawing file; the same
number sequence is shared by all types of drawings and all subprograms. For example, if we
analyse the root subprogram main?func that calls the two subprograms start$sense and
start$actuate, with the -dot_dir option and -draw options that ask for one flow-graph drawing
of each subprogram, the following drawing files are created:
• cg_main_func_001.dot for the call-graph of main?func
• fg_main_func_002.dot for the flow-graph of main?func
• fg_start_sense_003.dot for the flow-graph of start$sense
• fg_start_actuate_004.dot for the flow-graph of start$actuate.

Cryptic dot error message for graph with big (sub)clusters

I've written a little tool dumping (in dot format) the dependency graph of a project where all the files living in the same directory are gathered in a cluster. When I try to generate a pdf containing the corresponding graph, dot starts to cry:
The command dot -Tpdf trimmedgraph.dot -o graph.pdf produces the cryptic error message Error: install_in_rank clusterReals virtual rank 21 i = 0 an = 0 which does not yield any result on google.
I've tried to edit trimmedgraph.dot manually: turning the subgraph clusterReals into Reals yields a file that can be compiled but all the content of my Reals/ directory is obviously not gathered anymore.
Is there a way to generate only dot-valid files (I was planning to send my patch upstream eventually but if I cannot guarantee that everything will be okay...)?
I've put the two versions of trimmedgraph.dot online but they are rather big and given that I have no idea where the problem is, I cannot really come up with a minimal file recreating the problem.
I think (but couldn't find any documentation about this) cluster names have to be unique within the entire dot file. In your file however, there are two subgraphs called clusterReals.
The solution is to make sure all cluster names are unique - since the name doesn't appear anywhere in the output, you may just use numbers when generating dot files.
A quick test shows that strange things happen when reusing the same cluster name:
digraph dependencies {
    subgraph cluster0 {
        label="First cluster 0";
        Node1;
        subgraph cluster0 {
            label="Second cluster 0";
            Node2;
        }
    }
    subgraph cluster0 {
        label="Third cluster 0";
        Node3;
    }
    subgraph cluster1 {
        label="Cluster 1";
        Node4;
        subgraph cluster0 {
            label="Fourth cluster 0";
            Node5;
        }
    }
}
All the cluster0 subgraphs seem to be merged together (nodes, label), unless that is not possible because they are nested inside other clusters. At least that's what it looks like... Since the consequences are unpredictable (an error in your case), I'd try to always use unique cluster names.
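A tool that generates DOT files could check for reused cluster names before calling dot. The following is a rough sketch, not a DOT parser: the function name duplicate_clusters is hypothetical, and the regular expression ignores comments and assumes the word subgraph never appears inside a quoted label.

```python
import re
from collections import Counter

# Matches `subgraph NAME`, where NAME is a bare or double-quoted ID.
_SUBGRAPH = re.compile(r'\bsubgraph\s+("[^"]*"|[A-Za-z0-9_]+)')


def duplicate_clusters(dot_source):
    """Return {cluster name: count} for cluster names used more than once."""
    names = [m.group(1).strip('"') for m in _SUBGRAPH.finditer(dot_source)]
    counts = Counter(n for n in names if n.startswith("cluster"))
    return {name: count for name, count in counts.items() if count > 1}


SAMPLE = """
digraph dependencies {
    subgraph cluster0 { Node1; subgraph cluster0 { Node2; } }
    subgraph cluster0 { Node3; }
    subgraph cluster1 { Node4; subgraph cluster0 { Node5; } }
}
"""
print(duplicate_clusters(SAMPLE))  # prints {'cluster0': 4}
```

Numbering the clusters from a single counter while generating the file, as suggested above, avoids the problem without needing a check.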

What's the best way to inherit properties in a tree-based structure?

I have a simple CMS system, that has a simple tree hierarchy:
We have pages A through E that have the following hierarchy:
A -> B -> C -> D -> E
All the pages are the same class, and have a parent-child relationship.
Now, let's say I have a property I want inherited among the pages. Let's say A is red:
A (red) -> B -> C -> D -> E
In this case, B through E would inherit "red".
Or a more complex scenario:
A (red) -> B -> C (blue) -> D -> E
B would inherit red, and D/E would both be blue.
What would be the best way to solve something like this? I have a tree structure with over 6,000 leaves, and about 100 of those leaves have inheritable properties. Those 100 or so leaves have their properties saved in the database. For leaves without explicit properties, I look up the ancestors and use memcached to save the properties. Then there are overly complex algorithms to handle expiring those caches. It's terribly convoluted, and I'd like to refactor to a cleaner solution / data structure.
Does anybody have any ideas?
Thanks!
There is a data model that allows you to express this kind of information perfectly: RDF/RDFS. RDF is a W3C standard for modelling data based on triples (subject, predicate, object) and URIs, and RDFS, among other things, allows you to describe class hierarchies and property hierarchies. The good thing is that there are many libraries out there that help you create and query this type of data.
For instance, if I want to say that a specific document lion is of class mamal and programmer is of class geek, I could say:
doc:lion rdf:type class:mamal .
doc:programmer rdf:type class:geek .
Now I could declare a hierarchy of classes, and say that every mamal is an animal and every animal is a living thing.
class:mamal rdfs:subClassOf class:animal .
class:animal rdfs:subClassOf class:LivingThing .
And that every geek is a human and every human is a living thing:
class:geek rdfs:subClassOf class:human .
class:human rdfs:subClassOf class:LivingThing .
There is a language similar to SQL, called SPARQL, for querying this kind of data. For instance, if I issue the query:
SELECT * WHERE {
?doc rdf:type class:LivingThing .
}
Here ?doc is a variable that binds to things of type class:LivingThing. The result of this query would be doc:lion and doc:programmer, because the database follows the semantics of RDFS: by computing the closure of the class hierarchy, it knows that doc:lion and doc:programmer are of type class:LivingThing.
In the same way the query:
SELECT * WHERE {
doc:lion rdf:type ?class .
}
This will tell me that doc:lion has rdf:type class:mamal, class:animal and class:LivingThing.
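To make the closure step concrete, here is a plain-Python sketch of what an RDFS-aware store computes for the two queries above. It does not use an RDF library; the dictionaries and function names are hypothetical, and it assumes each class has at most one direct superclass, which RDFS itself does not require.

```python
# Direct rdfs:subClassOf statements from the example (one parent per class).
SUBCLASS_OF = {
    "class:mamal": "class:animal",
    "class:animal": "class:LivingThing",
    "class:geek": "class:human",
    "class:human": "class:LivingThing",
}
# Direct rdf:type statements from the example.
TYPE_OF = {"doc:lion": "class:mamal", "doc:programmer": "class:geek"}


def all_types(doc):
    """Return the declared class of doc followed by all its superclasses."""
    types = []
    cls = TYPE_OF.get(doc)
    while cls is not None and cls not in types:  # the check guards against cycles
        types.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return types


def instances_of(target):
    """Return every document whose type closure contains target."""
    return sorted(doc for doc in TYPE_OF if target in all_types(doc))


print(instances_of("class:LivingThing"))  # first query
print(all_types("doc:lion"))              # second query
```

For the page tree in the question, the same upward walk over a parent table yields the nearest ancestor that defines a property.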
In the same way, with RDFS you can create hierarchies of properties and say:
doc:programmer doc:studies doc:computerscience .
doc:lion doc:instint doc:hunting .
And we can say that both properties doc:studies and doc:instint are sub-properties of doc:knows:
doc:studies rdfs:subPropertyOf doc:knows .
doc:instint rdfs:subPropertyOf doc:knows .
With the query:
SELECT * WHERE {
?s doc:knows ?o .
}
We will get that a lion knows how to hunt and programmers know computer science.
Most RDF/RDFS databases can easily deal with the number of elements you mentioned in your question, and there are many choices to start with. If you are a Java person you could have a look at Jena; there are also frameworks for .NET, and RDFLib for Python.
But most importantly, have a look at the documentation of your CMS, because there may be plugins to export metadata as RDF. Drupal, for instance, is quite advanced in this respect (see http://drupal.org/project/rdf).
If your problem is performance-related...
I assume you'd want to save on memory of all these inheritable properties (or perhaps you have a lot of them), otherwise this can be trivially solved with virtual properties.
If you need sparse inheritable properties, say if you are modelling how HTML DOM properties or CSS properties propagate, you'll need to:
Keep a pointer to the parent node (for walking upwards)
Use a hash dictionary to store the properties inside each class (or each instance, depending on your needs), keyed by name
If the properties don't vary by instance, use a class-static dictionary
If the properties can be overridden instance-by-instance, add an instance dictionary on top
When accessing a property, start at the leaf: look in the instance dictionary first, then the class-static dictionary, then walk up the tree
Of course you can add more functionalities on top of this. This is similar to how Windows Presentation Foundation solves this problem via DependencyProperty.
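The lookup steps listed above can be sketched as follows. The class name Page, the attribute names and the lookup method are hypothetical; the order of checks (instance dictionary, class-static dictionary, then the parent) follows the list.

```python
class Page:
    # Class-static dictionary: properties shared by every instance.
    class_props = {}

    def __init__(self, name, parent=None, **props):
        self.name = name
        self.parent = parent      # pointer to the parent node (None at the root)
        self.props = dict(props)  # instance dictionary, keyed by property name

    def lookup(self, key, default=None):
        """Find key on this node or on the nearest ancestor that defines it."""
        node = self
        while node is not None:
            if key in node.props:
                return node.props[key]
            if key in type(node).class_props:
                return type(node).class_props[key]
            node = node.parent    # walk up the tree
        return default


# A (red) -> B -> C (blue) -> D -> E, as in the question.
a = Page("A", color="red")
b = Page("B", parent=a)
c = Page("C", parent=b, color="blue")
d = Page("D", parent=c)
e = Page("E", parent=d)
print(b.lookup("color"), d.lookup("color"), e.lookup("color"))  # red blue blue
```

Because only the roughly 100 nodes with explicit properties store anything, the per-node memory cost stays small; the price is a walk of at most the tree depth per lookup.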
If your problem is database-related...
If instead your problem is to avoid reading the database to walk up the tree (i.e. loading the parents to find inherited properties), you'll need to do some sort of caching for the parent values. Or alternatively, when you load a leaf from the database, you can load all its parents and create a master merged properties dictionary in memory.
If you want to avoid multiple database lookups to find each parent, one trick is to encode the path to each node into a text field, e.g. "1.2.1.3.4" for a leaf on the 6th level. Then load only the nodes whose paths are prefixes of that path. You can then get the entire chain of parents in one SQL query.
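A minimal sketch of the path trick, using an in-memory SQLite database: the table name pages, its columns and the helper names are hypothetical. Instead of a substring match, it lists the ancestor paths explicitly and uses IN, which avoids "1.2" accidentally matching "1.20".

```python
import sqlite3


def ancestor_paths(path):
    """Return the paths of all ancestors, e.g. "1.2.1" -> ["1", "1.2"]."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def inherited_color(conn, path):
    """Return the color set on the node itself or on its nearest ancestor."""
    paths = ancestor_paths(path) + [path]
    marks = ",".join("?" * len(paths))  # one placeholder per candidate path
    row = conn.execute(
        f"SELECT color FROM pages WHERE path IN ({marks}) "
        "AND color IS NOT NULL ORDER BY length(path) DESC LIMIT 1",
        paths,
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, color TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("1", "red"), ("1.2", None), ("1.2.1", "blue"),
    ("1.2.1.3", None), ("1.2.1.3.4", None), ("1.20", "green"),
])
print(inherited_color(conn, "1.2.1.3.4"))  # blue
print(inherited_color(conn, "1.2"))        # red
```

Every candidate path is a prefix of the requested one, so ordering by length puts the deepest (nearest) definition first.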

Resources