How to allow edges to overlap nodes? - graphviz

I'm looking for a way to disable edge routing during rendering of dot graphs.
Ideally this would be a per-edge option, but disabling routing altogether would be helpful as well.
The graphs I'm generating represent syntax trees with additional edges from the usage of an identifier to its declaration as shown below.
Simple DAST
Now, this is still mostly readable, but with larger graphs the blue edges get very confusing very quickly, since dot seems to love routing them all over the place.
Complex DAST
I would prefer if they just went in a straight (or curved) line between the two nodes, ignoring all overlap with nodes and other edges.
Unfortunately I've been unable to find a way to achieve this effect and honestly I doubt it is even possible.
A similar question has been asked before, but I decided to open a new one anyway due to the following reasons:
I don't require the nodes to stay in a fixed position
Apart from the blue edges, my graph is always a tree (no edge overlap to worry about)
Running dot in multiple passes is unfortunately not an option for me
The other question is over 6 years old, so maybe a feature was added since then
My attempts so far:
Added "overlap=true" to the graph settings
Added "overlap=true" to individual edges
Neither of these seems to have any effect whatsoever.
The file layout is pretty simple (excerpt):
digraph {
node [shape=Mrecord];
graph [ordering=out, overlap=true, nodesep=0.3, ranksep=1];
...
# ReferenceLiteral rec
node0 -> node40 [style=dashed, color=blue, constraint=false]
node0 [shape=box style=filled label="rec" fillcolor="#cccccc"]
...
# PortNode Record
node28:p0:s -> node40:n
node28:p1_0:s -> node7:n
node28 [label="{Record|{<p0>Name|{Elements|{<p1_0>1}}}}"]
...
# DeclarationLiteral rec
node40 [shape=box style=filled label="rec" fillcolor="#cccccc"]
...
}
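For what it's worth, the closest built-in knob I know of is the graph-level splines attribute: per the Graphviz docs, splines=line (or splines=false) draws every edge as a straight segment between its endpoints, overlap with nodes included. It applies to the whole graph rather than per-edge, so the tree edges go straight too. A minimal sketch based on the excerpt above:

```
digraph {
// splines=line: draw ALL edges as straight line segments,
// allowing them to pass over nodes and other edges.
graph [splines=line, ordering=out, nodesep=0.3, ranksep=1];
node [shape=Mrecord];
node0 [shape=box style=filled label="rec" fillcolor="#cccccc"]
node40 [shape=box style=filled label="rec" fillcolor="#cccccc"]
node0 -> node40 [style=dashed, color=blue, constraint=false]
}
```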

Related

What's the difference between effect and control edges of V8's TurboFan?

I've read many blog posts, articles, presentations and videos, even inspected V8's source code (the bytecode generator, the sea-of-nodes graph generator and the optimization phases), and still couldn't find an answer.
V8's optimizing compiler, TurboFan, uses a "sea of nodes" IR. All of the academic articles I found about it say that it's basically a CFG combined with a data-flow graph, and as such it has two types of edges connecting nodes: data edges and control edges. Basically, if you take only the data edges you get a data-flow graph, while if you take only the control edges you get a control-flow graph.
However, TurboFan has one more edge type: "effect edges" (and effect phis). I suppose that this is what this slide means when it says that this is not a "sea" of nodes but a "soup" of nodes, because I couldn't find the term anywhere else. From what I understood, effect edges help the compiler keep the order of statements/expressions that would have a visible side effect if reordered. The example everyone uses is o.f = o.f + 1: the load has to come before the store (or we'll read the new value), and the addition has to come before the store, too (or otherwise we'll store the old value and uselessly increment the result).
But I cannot understand: isn't this the goal of control edges? From searching through the code I can see that almost every node has an effect edge and a control edge, but their uses aren't obvious. From what I understand, in sea of nodes you use control edges to constrain evaluation order. So why do we need both effect and control edges? Probably I'm missing something fundamental, but I can't find it.
TL;DR: What's the use of effect edges and EffectPhi nodes, and how are they different from control edges?
Great thanks.
The idea of a sea-of-nodes compiler is that IR nodes have maximum freedom to move around. That's why in a snippet like o.f = o.f + 1, the sequence load-add-store is not considered "control flow". Only conditions and branches are. So if we slightly extend the example:
if (condition) {
o.f = o.f + 1;
} else {
// something else
}
then, as you describe in the question, effect dependencies ensure that load-add-store are scheduled in this order, and control dependencies ensure that all three operations are only performed if condition is true (technically, if the true-branch of the if-condition is taken). Note that this is important even for the load; for instance, o might be an invalid object if condition is false, and attempting to load its f property might cause a segfault in that case.
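The split can be sketched with a toy node structure (the class and operation names here are invented for illustration and do not match V8's actual IR classes): the load and store share the same control input, their relative order is imposed by the effect edge alone, and the pure add carries only data edges:

```python
# Hypothetical sea-of-nodes sketch; pure nodes (constants, arithmetic)
# carry only data inputs, while memory operations are also chained
# through effect inputs and anchored to a controlling branch.
class Node:
    def __init__(self, op, inputs=(), effect=None, control=None):
        self.op = op
        self.inputs = list(inputs)  # data edges (value dependencies)
        self.effect = effect        # effect edge: preceding side effect
        self.control = control      # control edge: governing branch

# if (condition) { o.f = o.f + 1; }
branch_true = Node("IfTrue")                       # control for the body
load = Node("LoadField o.f", control=branch_true)  # may fault: needs control
one = Node("Int32Constant 1")                      # pure: no effect/control
add = Node("Int32Add", inputs=[load, one])         # pure: data edges only
store = Node("StoreField o.f", inputs=[add],
             effect=load, control=branch_true)     # after load, inside branch

# load and store hang off the SAME control node; only the effect edge
# (store.effect is load) orders them relative to each other.
```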

How do I guarantee a graphviz cluster is ALWAYS at the end?

I looked at many other questions here, but none of them do what I want; some rely on tweaking lots of settings totally unrelated to the cluster itself and hoping it ends up where you want.
Basically I want a cluster to be the last one, no matter what, like the "rank=sink" option would do to a node, but for a cluster.
How do I do that without resorting to minimum-length edges and other ugly hacks?
EDIT: by "end" I mean the end of the graph: the bottommost item in the default direction, or the rightmost item with rankdir=LR.
I am unaware of anything as nice as rank=... However, this works pretty well:
Embed "everything else" within a (new) cluster; add peripheries=0 if you like
Create one or more invisible edges from nodes near the bottom of the "everything else" cluster to nodes near the top of the "end" cluster
digraph {
subgraph cluster0{
peripheries=0
a->b->c->d->e->f
}
subgraph cluster1{
x1->x2
}
f->x1 [style=dashed ]
}

Graphviz: how to use neato with very large graphs with subgraph clusters?

I have a large, but not really huge(?) graph, with 13 subgraph clusters containing about 100 nodes and 3,147 edges.
Dot crashes on Windows and seg faults on Linux.
This question suggests that the solution is to use neato, rather than dot.
But, this page says
Please note there are some quirks here ... only the DOT and FDP layout methods seem to support subgraphs
My output is a huge black ball of spaghetti, no matter how far I zoom in. So I removed all of the messages but one, and that showed that the subgraphs appear to be drawn nested in each other.
They are absolutely not nested in the source file; here's a sample, with commercially sensitive names changed:
digraph G {
labelloc="t"; // place the label at the top (b seems to be default)
label="XXX message passing";
rankdir = "LR"
newrank = "true"
subgraph cluster_AAA {
label="AAA"
rank="same"
AAA_1
}
subgraph cluster_BBB {
label="BBB"
rank="same"
BBB_1
BBB_2
}
subgraph cluster_CCC {
label="CCC"
rank="same"
CCC_1
CCC_2
CCC_3
}
That certainly seems to be syntactically correct (the edges follow after).
So, it seems like that linked page was correct:
only the DOT and FDP layout methods seem to support subgraphs
BUT, it also seems like I need neato for a large graph.
What are my options?
[Update] I ran fdp and got the following error message:
Error: node "xxx" is contained in two non-comparable clusters "AAA" and "BBB"
That seems to give a clue. Is it really the case that a node name may not be used in two clusters?
If so, the solution would seem to be to precede the node names with the cluster name ...
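A sketch of that prefixing idea (node names taken from the error message above, purely illustrative): give each cluster its own copy of the shared node and connect the copies, so no single node belongs to two clusters:

```
digraph G {
subgraph cluster_AAA {
label="AAA"
AAA_xxx [label="xxx"]   // cluster-prefixed copy of "xxx"
}
subgraph cluster_BBB {
label="BBB"
BBB_xxx [label="xxx"]
}
AAA_xxx -> BBB_xxx          // the shared identity becomes an edge
}
```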
I do not have a general solution to your problem.
But have you had a look at "mars"?
It's a command-line tool designed specifically for using Graphviz with very large graphs.
You can find it here: https://github.com/marckhoury/mars

Generating a directed acyclic graph from predefined elements with connection requirements

I am working on a system that, given a bank of different types of elements, will create a directed acyclic graph connecting some or all of the elements. Each element has some input A and an output B. When building the graph, the system needs to make sure the output of the previous node matches the input of the current one.
The inputs and outputs of the nodes ensure that only certain types of elements are connected.
The elements would look like this:
ElementName : Input -> Output
possibly with multiple inputs/outputs, or with no outputs (see below).
One : X -> Y
Two : Y -> Z,F
Three : Y, Z -> W
Four : Z -> F
Five : F -> NULL
Note:
We are talking about a lot of different elements, 30 or so now, but the plan is to add more as time goes on.
This is part of a project to do a procedurally generated narrative. The nodes are individual quests. The inputs are what you need to start the quest. The outputs are how the story state is affected.
Problem:
I have seen several different approaches to generating a random DAG, but none for making a DAG from some preset elements with connection requirements (and rules on connecting them).
I also want some way of limiting the complexity of the graph, i.e. limiting the number of branches nodes can have.
Idea of what I want:
You have a bunch of different types of legos in a bin, say 30. You have rules on connecting the Legos.
Blue -> Red
Blue -> White
Red -> Yellow
Yellow -> Green/Brown
Brown -> Blue
As you all know, in addition to a color each Lego has a shape, so two blue Legos may not be the same type of Lego. The goal is to build a large structure that fits our rules. Even with our rules, we can still connect the Legos into a bunch of different structures.
P.S. I am hoping this is not too general a question. If it is, please make a note and I will try to make it more specific.
It sounds like an L-system (aka Lindenmayer system) approach would work:
Your collection of Legos is analogous to an alphabet of symbols
Your connection rules correspond to a collection of production rules that expand each symbol into some larger string of symbols
Your starting Lego represents the initial "axiom" string from which to begin construction
The resulting geometric structure is your DAG
The simplest approach would be something like: given a Lego, randomly select a valid connection rule & add a new Lego to the DAG. From there you could add in more complexity as needed. If you need to skew the random selection to favor certain rules, you're essentially building a stochastic grammar. If the selection of a rule depends on previously generated parts of the DAG it's a type of context sensitive grammar.
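A minimal sketch of that simplest approach (the element table is the one from the question; the growth policy, frontier handling and max_nodes cap are my own invention):

```python
import random

# Element bank from the question: name -> (inputs, outputs).
ELEMENTS = {
    "One":   ({"X"}, {"Y"}),
    "Two":   ({"Y"}, {"Z", "F"}),
    "Three": ({"Y", "Z"}, {"W"}),
    "Four":  ({"Z"}, {"F"}),
    "Five":  ({"F"}, set()),           # no outputs: a terminal element
}

def grow_dag(start="One", max_nodes=10, seed=0):
    """Grow (producer, consumer) edges by repeatedly attaching an element
    whose inputs are covered by a frontier element's outputs. The
    max_nodes cap limits the overall complexity of the graph."""
    rng = random.Random(seed)
    edges, frontier, count = [], [start], 1
    while frontier and count < max_nodes:
        producer = frontier.pop()
        outputs = ELEMENTS[producer][1]
        # Simplification: only attach elements fed entirely by one producer;
        # multi-input elements like Three would need a shared output pool.
        candidates = [name for name, (ins, _) in ELEMENTS.items()
                      if ins <= outputs]
        if not candidates:
            continue
        consumer = rng.choice(candidates)
        edges.append((producer, consumer))
        count += 1
        if ELEMENTS[consumer][1]:      # terminal elements end their branch
            frontier.append(consumer)
    return edges
```

Skewing rng.choice with weights would give the stochastic-grammar variant mentioned above.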
Graph rewriting, algorithmically creating a new graph out of a base graph, might be a more literal solution, but I personally find L-systems easier to internalize, and researching them yields results that are not overly academic/theoretical in nature.
L-systems themselves are a category of formal grammars. It might be worth checking into some of those related ideas, but it's pretty easy (for me at least) to get sidetracked by theoretical stuff at the expense of core development.

Good data structure or database to represent objects and transitions between objects?

I'm having trouble choosing a data structure to use to help identify resources and transitions between resources. After the graph is defined, I'd like to run analysis on the transformation between resources to determine what inputs could yield what outputs.
For example, we could take traditional currency for example:
Dollar -> 3:2 -> Euros
Euros -> 2:3 -> Dollar
Euros -> 1:100 -> Yen
Yen -> 95:1 -> Euro
Yen -> 50:1 -> T-shirt
Dollar -> 2:1 -> Candy Bar
The typical use case would be to take some starting resources, such as 5 US Dollars and 100 Japanese Yen, and determine what they could be transformed into: how many candy bars? How many t-shirts? The graph would be much more complicated though, with hundreds of resources, each with potentially dozens of transitions to other resources.
Thanks for your ideas!
This sounds like a problem for a standard graph.
Let each resource be a node and connect two nodes if there's a transition between them, with the edge weight being the transition ratio.
These edges will probably need to be directed and, if the transitions are inverses, have two edges, one in each direction. Alternatively, if the transitions are inverses, you can have an undirected graph and define the edge weight as the transition ratio from the 'smallest' node to the 'largest' (you'll need some, possibly arbitrary, ordering of nodes). By "transitions are inverses" I mean that if you go from any resource to any other resource, you can also go back again, and if you do so, you get back the same amount you originally started with (although this doesn't appear to hold in the example).
Then you'll probably have to use breadth-first search (or similar) to determine how to get from one resource to another.
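A sketch of that search (my reading of "Dollar -> 3:2 -> Euros" is 3 Dollars trading for 2 Euros; names slightly normalized): a weighted digraph of rates plus a depth-first search over simple paths, which sidesteps the profitable cycle Euro -> Yen -> Euro hiding in the example rates:

```python
from collections import defaultdict

# Edge weight = units of destination obtained per unit of source,
# reading "Dollar -> 3:2 -> Euros" as 3 Dollars trading for 2 Euros.
RATES = defaultdict(dict)
for src, a, b, dst in [
    ("Dollar", 3, 2, "Euro"),   ("Euro", 2, 3, "Dollar"),
    ("Euro", 1, 100, "Yen"),    ("Yen", 95, 1, "Euro"),
    ("Yen", 50, 1, "T-shirt"),  ("Dollar", 2, 1, "CandyBar"),
]:
    RATES[src][dst] = b / a

def best_yield(start, target, amount, visited=frozenset()):
    """Maximum amount of `target` obtainable from `amount` of `start`.
    Only simple paths are considered: these example rates contain a
    profitable cycle (Euro -> Yen -> Euro), so an unrestricted search
    would 'earn' unboundedly by looping."""
    if start == target:
        return amount
    return max((best_yield(nxt, target, amount * rate, visited | {start})
                for nxt, rate in RATES[start].items() if nxt not in visited),
               default=0)
```

With hundreds of resources, the usual scalable formulation of this problem is a shortest-path search (e.g. Bellman-Ford) over negated log-rates, where profitable cycles show up as negative cycles.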
In terms of SQL, a possible structure is as follows:
Resource
ID, ...
Transition
ResourceID1, ResourceID2, Cost
