I'm trying to parse a file that contains lines in a hierarchical structure. For example, the file:
a b c
a b d
a B C
A B C
indicates that a contains b and B, that b contains c and d, and that B contains C. A contains a different B, which contains its own C.
This is much like a list of files.
I want to format this in a hierarchical bracketed way like:
a {
    b {
        c
        d
    }
    B {
        C
    }
}
A {
    B {
        C
    }
}
I couldn't come up with a decent way to do this. I thought AWK would be my best bet, but came up short on how to actually implement it.
Context
My input is actually a list of files. I can, of course, separate the fields with spaces if needed, or keep them separated by /. The files are unordered and are generated from a code base at compile time via inspection. My desired output is a Graphviz DOT file with each file placed inside the subgraph of its directory.
Thus for the input:
a/b/c
a/b/d
a/B/C
A/B/C
the output would be
digraph {
    subgraph cluster_a {
        label = a
        subgraph cluster_b {
            label = b
            node_1 [label=c]
            node_2 [label=d]
        }
        subgraph cluster_B {
            label = B
            node_3 [label=C]
        }
    }
    subgraph cluster_A {
        label = A
        subgraph cluster_B {
            label = B
            node_4 [label=C]
        }
    }
}
Does anybody know how I could get this processing done? I'm open to other tools as well, not just AWK.
NOTE: Depth is not fixed, though I could pre-compute the maximum depth if necessary. Not all leaves will be at the same depth either.
I offer this Python solution:
import sys

INDENT = '    '
NODE_COUNT = 1


def build(node, parts):
    # Insert one path (a list of names) into the nested dict `node`.
    x = parts[0]
    if x not in node:
        node[x] = {}
    if len(parts) > 1:
        build(node[x], parts[1:])


def indent(s, depth):
    print('%s%s' % (INDENT * depth, s))


def print_node(label, value, depth):
    global NODE_COUNT
    if value:
        # Inner node: emit a cluster and recurse into its children.
        indent('subgraph cluster_%s {' % label, depth)
        indent('label = %s' % label, depth + 1)
        for child in value:
            print_node(child, value[child], depth + 1)
        indent('}', depth)
    else:
        # Leaf: emit a uniquely numbered node.
        indent('node_%d [label=%s]' % (NODE_COUNT, label), depth)
        NODE_COUNT += 1


def main():
    d = {}
    for line in sys.stdin:
        parts = line.split()
        if parts:  # skip blank lines
            build(d, parts)
    print('digraph {')
    for k in d:
        print_node(k, d[k], 1)
    print('}')


if __name__ == '__main__':
    main()
Result:
$ cat rels.txt
a b c
a b d
a B C
A B C
$ cat rels.txt | python3 make_rels.py
digraph {
    subgraph cluster_a {
        label = a
        subgraph cluster_b {
            label = b
            node_1 [label=c]
            node_2 [label=d]
        }
        subgraph cluster_B {
            label = B
            node_3 [label=C]
        }
    }
    subgraph cluster_A {
        label = A
        subgraph cluster_B {
            label = B
            node_4 [label=C]
        }
    }
}
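The script above splits each line on whitespace, whereas the real input described in the question is "/"-separated file paths, and names such as main.c are not valid unquoted DOT IDs. Below is a minimal sketch of the same recursive idea, assuming one path per line: it splits on "/", quotes every label, and numbers the clusters (cluster_1, cluster_2, ...) instead of deriving their IDs from directory names, since a name like B can repeat under different parents. The function names build_tree, quote and paths_to_dot are hypothetical, and backslashes inside names are not escaped.

```python
from itertools import count


def build_tree(paths):
    """Nest '/'-separated paths into dicts; a leaf maps to an empty dict."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip().split('/'):
            if part:  # ignore empty segments, e.g. from '//' or a blank line
                node = node.setdefault(part, {})
    return tree


def quote(name):
    """Wrap a name in double quotes so it is a valid DOT ID (quotes escaped)."""
    return '"' + name.replace('"', '\\"') + '"'


def paths_to_dot(paths):
    """Return DOT text with one cluster per directory and one node per file."""
    cluster_ids, node_ids = count(1), count(1)
    lines = ['digraph {']

    def emit(name, children, depth):
        pad = '    ' * depth
        if children:
            # Numbered cluster IDs stay unique even when names repeat.
            lines.append('%ssubgraph cluster_%d {' % (pad, next(cluster_ids)))
            lines.append('%s    label = %s' % (pad, quote(name)))
            for child, grandchildren in children.items():
                emit(child, grandchildren, depth + 1)
            lines.append(pad + '}')
        else:
            lines.append('%snode_%d [label=%s]' % (pad, next(node_ids), quote(name)))

    for name, children in build_tree(paths).items():
        emit(name, children, 1)
    lines.append('}')
    return '\n'.join(lines)
```

For example, piping the file list into a script that calls print(paths_to_dot(sys.stdin)) (with sys imported) prints the whole digraph; since Python dicts keep insertion order, clusters appear in the order their names are first seen.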
If the depth is fixed at 3 levels (the arrays of arrays below need GNU awk 4.0 or later):
gawk -F/ '
{ f[$1][$2][$3] = 1 }
END {
    n = 0
    print "digraph {"
    for (a in f) {
        print "    subgraph cluster_" a " {"
        print "        label = " a
        for (b in f[a]) {
            print "        subgraph cluster_" b " {"
            print "            label = " b
            for (c in f[a][b]) {
                printf "            node_%d [label=%s]\n", ++n, c
            }
            print "        }"
        }
        print "    }"
    }
    print "}"
}
' file
digraph {
    subgraph cluster_A {
        label = A
        subgraph cluster_B {
            label = B
            node_1 [label=C]
        }
    }
    subgraph cluster_a {
        label = a
        subgraph cluster_B {
            label = B
            node_2 [label=C]
        }
        subgraph cluster_b {
            label = b
            node_3 [label=c]
            node_4 [label=d]
        }
    }
}
If the depth is arbitrary, things get complicated.
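A different way to handle arbitrary depth, which avoids recursion and maps more directly onto a streaming AWK script, is to sort the paths and compare each one with the previous path: close the groups it no longer shares, open the ones it enters, then print the leaf. A minimal Python sketch (the function name bracketed is hypothetical) that prints the bracketed format from the question, assuming "/"-separated input and that no path is a prefix of another (a name is never both a leaf and a directory):

```python
def bracketed(lines):
    """Render '/'-separated paths as nested 'name {' ... '}' blocks."""
    # Split into component tuples, drop duplicates and blank lines, then sort.
    rows = sorted({tuple(p for p in line.strip().split('/') if p)
                   for line in lines} - {()})
    out = []
    prev = ()  # directories that are currently open, outermost first
    for row in rows:
        dirs, leaf = row[:-1], row[-1]
        # How many leading directories this path shares with the open ones.
        common = 0
        while common < min(len(prev), len(dirs)) and prev[common] == dirs[common]:
            common += 1
        # Close the groups this path has left, innermost first.
        for depth in reversed(range(common, len(prev))):
            out.append('    ' * depth + '}')
        # Open the groups this path enters.
        for depth in range(common, len(dirs)):
            out.append('    ' * depth + dirs[depth] + ' {')
        out.append('    ' * len(dirs) + leaf)
        prev = dirs
    # Close whatever is still open at the end.
    for depth in reversed(range(len(prev))):
        out.append('    ' * depth + '}')
    return '\n'.join(out)
```

Sorting keeps every path that shares a directory prefix contiguous, so each group is opened and closed exactly once; the trade-off is that the output follows sort order (A before a) rather than input order.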
How can I use gvpr to drop all nodes except those with outdegree=="0" and their parent nodes?
So given
A > B
B > C
B > D
D > E
only A should be dropped.
Create an array of all nodes and mark each one as to be deleted or not.
Do not delete a node if
its outdegree == 0
or
it is the tail of an edge whose head has outdegree == 0
BEGIN{
int DELETE[];
}
BEG_G{
$tvtype=TV_ne // nodes first
}
N{
if ($.outdegree==0){
print ("// KEEP (sink): ", $.name);
DELETE[$]=0;
}else{
DELETE[$]=1;
}
}
E{
print ("// head: ", $.head.name);
if ($.head.outdegree==0){
print ("// KEEP (parent of a sink): ", $.tail.name);
DELETE[$.tail]=0;
}
}
END_G{
node_t aNode;
for (DELETE[aNode]){
if (DELETE[aNode]==1){
delete($G, aNode);
}
}
}
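The rule above can be checked outside gvpr before it is wired into the script. A small Python sketch, assuming the graph is available as a list of (tail, head) pairs (a hypothetical input format chosen for illustration), returns the nodes that the rule deletes:

```python
def nodes_to_delete(edges):
    """Return the nodes that are neither sinks nor parents of a sink."""
    nodes = {n for edge in edges for n in edge}
    outdegree = {n: 0 for n in nodes}
    for tail, _head in edges:
        outdegree[tail] += 1
    # Keep every sink (outdegree 0) ...
    keep = {n for n in nodes if outdegree[n] == 0}
    # ... and every node with an edge pointing directly at a sink.
    keep |= {tail for tail, head in edges if outdegree[head] == 0}
    return nodes - keep
```

For the example graph, nodes_to_delete([('A', 'B'), ('B', 'C'), ('B', 'D'), ('D', 'E')]) gives {'A'}. Isolated nodes cannot appear in an edge list; in the gvpr script they have outdegree 0 and are kept.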
I'm trying to draw a tree but have a problem with the following approach:
Use 'invisible' nodes to connect the levels of the tree;
use 'rank = same' to draw the nodes of each level on the same rank.
Using this code, I get the following result:
graph G{
edge [arrowhead = none];
splines = ortho;
rankdir = LR;
node [ shape="box" fixedsize = true width = 4 height = 1];
{ rank = same; "C" }
{ rank = same;
"B"
"A"
}
{ rank = same;
"F"
"D"
"E"
}
node [ shape="cricle" width = 0 height = 0 style=invis];
{ rank = same;
"B_Inv_Parent_1"
"C_Inv_Even_Children_0"
"A_Inv_Parent_1"
}
{ rank = same;
"F_Inv_Parent_2"
"D_Inv_Parent_2"
"A_Inv_Even_Children_1"
"E_Inv_Parent_2"
}
"C" -- "C_Inv_Even_Children_0";
"B_Inv_Parent_1" -- "C_Inv_Even_Children_0" -- "A_Inv_Parent_1";
"B_Inv_Parent_1" -- "B";
"A_Inv_Parent_1" -- "A";
"B" -- "F_Inv_Parent_2";
"F_Inv_Parent_2" -- "F";
"A" -- "A_Inv_Even_Children_1";
"D_Inv_Parent_2" -- "A_Inv_Even_Children_1" -- "E_Inv_Parent_2";
"D_Inv_Parent_2" -- "D";
"E_Inv_Parent_2" -- "E";
}
I have a problem with the 3rd level: D is drawn at the top of the picture, so its connection to E is not ideal.
I would like the same result as with C, B and A.
I think the problem is the order of the node definitions; however, I can't get it working no matter which order I define them in.
Can anyone spot another problem with my code and suggest a fix?
I have cleaned up your code and rearranged a few lines. In the end, I think that introducing
F_Inv_Parent_2 -- D_Inv_Parent_2 -- A_Inv_Even_Children_1 -- E_Inv_Parent_2;
was the key. You don't need to set arrowheads, since your graph is undirected, and there is a typo in shape="cricle" (it should be "circle").
Here is my edited version:
graph G
{
splines = ortho;
rankdir = LR;
// node definitions
node [ shape="box" fixedsize = true width = 4 height = 1];
C
{ rank = same; B A }
{ rank = same; F D E }
node [ shape="point" width = 0 height = 0 ];
{ rank = same;
B_Inv_Parent_1
C_Inv_Even_Children_0
A_Inv_Parent_1 }
{ rank = same;
F_Inv_Parent_2
D_Inv_Parent_2
A_Inv_Even_Children_1
E_Inv_Parent_2 }
// edges
C -- C_Inv_Even_Children_0;
B_Inv_Parent_1 -- C_Inv_Even_Children_0 -- A_Inv_Parent_1;
B_Inv_Parent_1 -- B -- F_Inv_Parent_2;
A_Inv_Parent_1 -- A -- A_Inv_Even_Children_1;
F_Inv_Parent_2 -- D_Inv_Parent_2 -- A_Inv_Even_Children_1 -- E_Inv_Parent_2;
F_Inv_Parent_2 -- F;
D_Inv_Parent_2 -- D;
E_Inv_Parent_2 -- E;
}
and the result:
EDIT: I may have misunderstood how you want to connect the third level. If so, replace
F_Inv_Parent_2 -- D_Inv_Parent_2 -- A_Inv_Even_Children_1 -- E_Inv_Parent_2;
with
F_Inv_Parent_2 -- D_Inv_Parent_2[ style = invis ];
D_Inv_Parent_2 -- A_Inv_Even_Children_1 -- E_Inv_Parent_2;
which gives you
EDIT No. 2, in response to your comment:
Adding weight to the edge helps straighten it. I give the full code, even though only two lines have changed (plus comments), for easier copy and paste:
graph G
{
splines = ortho;
rankdir = LR;
// node definitions
node [ shape="box" fixedsize = true width = 4 height = 1];
C
{ rank = same; B A }
{ rank = same; F D E }
node [ shape="point" width = 0 height = 0 ];
{ rank = same;
B_Inv_Parent_1
C_Inv_Even_Children_0
A_Inv_Parent_1 }
{ rank = same;
F_Inv_Parent_2
D_Inv_Parent_2
A_Inv_Even_Children_1
E_Inv_Parent_2 }
// edges
C -- C_Inv_Even_Children_0;
B_Inv_Parent_1 -- C_Inv_Even_Children_0 -- A_Inv_Parent_1;
// add extra weight to the continuous connection across four levels:
B_Inv_Parent_1 -- B -- F_Inv_Parent_2 -- F[ weight = 10 ];
// no weight here:
A_Inv_Parent_1 -- A -- A_Inv_Even_Children_1;
F_Inv_Parent_2 -- D_Inv_Parent_2[ style = invis ];
D_Inv_Parent_2 -- A_Inv_Even_Children_1 -- E_Inv_Parent_2;
// F_Inv_Parent_2 -- F; ### moved
D_Inv_Parent_2 -- D;
E_Inv_Parent_2 -- E;
}
This gives you the desired straight line from B via F_Inv_Parent_2 to F, which is actually the grandchild:
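The layout in this answer follows a repeatable pattern: each box is joined to an invisible point on the next rank, and the points belonging to one family are chained together with rank = same. A Python sketch that generates this pattern for an arbitrary tree, assuming the tree is given as a dict from a parent's name to its ordered children and that names are unique plain identifiers. The function tree_to_dot and the junction names (name_j) are hypothetical; the parent's point is placed in the middle of its children's points, and neighbouring families are linked with invisible edges (as in the first EDIT) rather than with the weighted edge from EDIT No. 2.

```python
def tree_to_dot(root, children):
    """Return an undirected DOT tree drawn left to right with point junctions.

    children maps a node name to its ordered list of children; names are
    assumed to be unique plain identifiers. Junction names '<name>_j' are
    made up for this sketch.
    """
    lines = ['graph G {', '    splines = ortho;', '    rankdir = LR;',
             '    node [shape=box fixedsize=true width=4 height=1];']
    edges = []
    level = [root]
    while level:
        # All boxes of one tree level share a rank.
        lines.append('    { rank = same; %s }' % ' '.join(level))
        parents = [p for p in level if children.get(p)]
        if not parents:
            break
        junctions = []  # junction points of the next rank, in order
        for p in parents:
            kids = children[p]
            row = ['%s_j' % k for k in kids]
            row.insert(len(kids) // 2, '%s_j' % p)  # parent point in the middle
            edges.append('%s -- %s_j;' % (p, p))
            edges.append(' -- '.join(row) + ';')  # visible bar joining the family
            edges.extend('%s_j -- %s;' % (k, k) for k in kids)
            if junctions:
                # Keep neighbouring families in order without drawing a link.
                edges.append('%s -- %s [style=invis];' % (junctions[-1], row[0]))
            junctions.extend(row)
        # Junctions are first created here, so the point defaults apply only to them.
        lines.append('    { rank = same; node [shape=point width=0 height=0]; %s }'
                     % ' '.join(junctions))
        level = [k for p in parents for k in children[p]]
    lines.extend('    ' + e for e in edges)
    lines.append('}')
    return '\n'.join(lines)
```

For the question's tree, tree_to_dot('C', {'C': ['B', 'A'], 'B': ['F'], 'A': ['D', 'E']}) produces a graph with the same structure of box ranks and junction ranks; the exact node names and the single-child case differ from the hand-written version above.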
I've tried to make something like this using Graphviz:
x y z
| | |
# | |
a#__\| |
# /#b |
# #__\|
# # /#c
# d#/__#
# #\ x
# # |
e#/__# |
#\ # |
But ranking doesn't seem to be working as I expect. I want e to be below all of the other nodes.
digraph x
{
rankdir = tb;
size = "7.5, 7.5";
rank = source;
a -> b -> c -> d -> e;
subgraph "cluster x"
{
style=filled;
color=lightgrey;
label="x";
a -> e [style=invis];
}
subgraph "cluster y"
{
label="y";
b -> d [style=invis];
}
subgraph "cluster z"
{
label="z";
c;
}
}
I've tried clusterrank = global, which sort of works, but then the subgraphs are no longer separated into distinct columns and the columns overlap. It also doesn't extend to the right the way I want. The following image highlights one of the overlaps in red, but as you can see there are four.
digraph x
{
rankdir = tb;
rankstep=equally;
clusterrank = global;
size = "7.5, 7.5";
a -> b -> c -> d -> e;
subgraph "cluster x"
{
style=filled;
color=lightgrey;
label="x";
a -> e [style=invis];
}
subgraph "cluster y"
{
label="y";
b -> d [style=invis];
}
subgraph "cluster z"
{
label="z";
c;
}
}
I've also tried making a separate chain of nodes with a guaranteed top-to-bottom ranking and then ranking the appropriate nodes together with it, but it behaves like the previous attempt: it removes the boxes seen in the first attempt and causes unwanted overlapping.
digraph x
{
rankdir = tb;
1 -> 2 -> 3 -> 4 -> 5;
a -> b -> c -> d -> e;
{ rank=same; 1; a; }
{ rank=same; 2; b; }
{ rank=same; 3; c; }
{ rank=same; 4; d; }
{ rank=same; 5; e; }
subgraph "cluster x"
{
style=filled;
color=lightgrey;
label="x";
a -> e [style=invis];
}
subgraph "cluster y"
{
label="y";
b -> d [style=invis];
}
subgraph "cluster z"
{
label="z";
c;
}
}
Does anyone have any ideas on how to get the layout I want?
As a side note, I tried to log in to the Graphviz forum about this, but logging in doesn't seem to work: I keep getting a long timeout. I checked my email account and nothing is there. I tried creating a new account with the same email, and it says the account is already in use. I then tried to reset my password and got another timeout.
Does anyone know whom I can contact to fix this login problem? Or could someone who is already logged in post this for me?
Run dot with -Gnewrank. That will get you what you want based on your sketch. If more tweaks are needed, please specify what you are after.
Your last attempt will work with some minor tuning:
Use newrank=true to avoid "unboxing" the clusters.
Play with splines=... to adjust the arrows.
Define the labels as separate nodes.
digraph x
{
rankdir = tb;
newrank=true;
splines=ortho;
0 -> 1 -> 2 -> 3 -> 4 -> 5;
X; Y; Z;
a -> b -> c -> d -> e;
{ rank=same; 0 X Y Z}
{ rank=same; 1; a; }
{ rank=same; 2; b; }
{ rank=same; 3; c; }
{ rank=same; 4; d; }
{ rank=same; 5; e; }
subgraph "cluster x"
{
style=filled;
color=lightgrey;
a -> e [style=invis];
}
subgraph "cluster y"
{
b -> d [style=invis];
}
subgraph "cluster z"
{
c;
}
}
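The same tuning can be generated for a longer sequence. A sketch in Python, assuming the diagram is a single chain given top to bottom plus a mapping from each node to its column. The guide nodes g0, g1, ... and the function column_layout are hypothetical; unlike the answer, the guide chain is made invisible, and cluster labels are used instead of the separate label nodes X, Y, Z.

```python
def column_layout(sequence, column_of):
    """Emit DOT placing each node on its own rank inside a per-column cluster.

    sequence:  node names in top-to-bottom order (plain identifiers assumed)
    column_of: maps each node name to its column label
    """
    lines = ['digraph x {', '    newrank = true;']
    guides = ['g%d' % i for i in range(len(sequence))]
    # Hidden guide chain: one node per rank, top to bottom.
    lines.append('    { node [style=invis]; edge [style=invis]; %s; }'
                 % ' -> '.join(guides))
    # The visible chain of the diagram itself.
    lines.append('    %s;' % ' -> '.join(sequence))
    # Pin each diagram node to the rank of its guide node.
    for guide, name in zip(guides, sequence):
        lines.append('    { rank = same; %s; %s; }' % (guide, name))
    # One cluster per column, in order of first appearance.
    columns = list(dict.fromkeys(column_of[n] for n in sequence))
    for col in columns:
        members = ' '.join(n for n in sequence if column_of[n] == col)
        lines.append('    subgraph "cluster_%s" { label = "%s"; %s; }'
                     % (col, col, members))
    lines.append('}')
    return '\n'.join(lines)
```

For the question's graph, column_layout(['a', 'b', 'c', 'd', 'e'], {'a': 'x', 'b': 'y', 'c': 'z', 'd': 'y', 'e': 'x'}) produces three clusters, with each of a to e on its own rank.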
My structure has two main chains with side nodes in subgraphs. Everything looks fine, but when I join the two chains, all the boxes in the subgraphs jump to the right side.
If you remove "I"->"J" at the end of my code, you can best see what I mean.
I'm not a native English speaker, and I'm new to Graphviz.
digraph G {
size ="6,6";
node [color=black fontsize=12, shape=box, fontname=Helvetica];
subgraph {
rank = same;
"b"->"B"[arrowhead=none];
}
subgraph {
rank=same;
"c"->"C"[arrowhead=none];
}
subgraph {
rank=same;
"e"->"E" [arrowhead=none];
}
subgraph {
rank = same;
"f"->"F"[arrowhead=none];
}
subgraph {
rank = same;
"g"->"G"[arrowhead=none];
}
"0" -> "A" -> "B" -> "C"->"D" -> "E" -> "F" -> "G" -> "H"->"I";
"0" -> "K"->"L"->"M"->"N"->"O" ->"P"->"1";
subgraph {
rank = same;
"L"->"l"[arrowhead=none];
}
subgraph {
rank=same;
"M"->"m"[arrowhead=none];
}
subgraph {
rank=same;
"N"->"n" [arrowhead=none];
}
subgraph {
rank = same;
"O"->"o"[arrowhead=none];
}
subgraph {
rank = same;
"P"->"p"[arrowhead=none];
}
"1"->"J";
"I"->"J";
}
And with "I"->"J" removed:
This is how I'd go about it: Create a cluster for each main chain with its side nodes:
digraph G {
size ="6,6";
node [color=black fontsize=12, shape=box, fontname=Helvetica];
subgraph[style=invis];
subgraph cluster0 {
A -> B -> C -> D -> E -> F -> G -> H -> I;
edge[arrowhead=none];
{rank = same; b->B;}
{rank = same; c->C;}
{rank = same; e->E;}
{rank = same; f->F;}
{rank = same; g->G;}
}
subgraph cluster1 {
K -> L -> M -> N -> O -> P -> 1 -> J;
edge[arrowhead=none];
{rank = same; L->l;}
{rank = same; M->m;}
{rank = same; N->n;}
{rank = same; O->o;}
{rank = same; P->p;}
}
0 -> A;
0 -> K;
I -> J;
}
Resulting in:
I'm trying to have an edge between clusters in Graphviz where the edge does not affect the ranking.
This looks fine:
digraph {
subgraph clusterX {
A
B
}
subgraph clusterY {
C
D
}
A -> B
B -> C [constraint=false]
C -> D
}
However, when I add a label to the C -> D edge, the B -> C edge is routed around that label (which looks ugly).
digraph {
subgraph clusterX {
A
B
}
subgraph clusterY {
C
D
}
A -> B
B -> C [constraint=false]
C -> D [label=yadda]
}
Any idea how I can keep the edge from B to C straight?
The easiest way to achieve this is to add splines=false to the dot file; this forces the edges to be rendered as straight lines:
digraph {
splines=false;
subgraph clusterX {
A;
B;
}
subgraph clusterY {
C;
D;
}
A -> B;
B -> C [constraint=false];
C -> D [label=yadda];
}
Output:
You can use this version:
digraph G {
subgraph cluster_X {
A [ pos = "0,1!" ];
B [ pos = "0,0!" ];
}
subgraph cluster_Y {
C [ pos = "1,1!" ];
D [ pos = "1,0!" ];
}
A -> B;
B -> C;
C -> D [label="yadda"];
}
Then render it with neato (not dot):
neato -Tpng -oyadda.png yadda.dot
And the result is:
Instead of the label attribute, you can use the xlabel, headlabel or taillabel attribute.
Result with xlabel:
Script:
digraph {
subgraph clusterX { A B }
subgraph clusterY { C D }
A -> B
B -> C [constraint=false]
C -> D [xlabel=yadda]
}
Result with headlabel:
Script:
digraph {
subgraph clusterX { A B }
subgraph clusterY { C D }
A -> B
B -> C [constraint=false]
C -> D [headlabel=yadda]
}
Result with taillabel:
Script:
digraph {
subgraph clusterX { A B }
subgraph clusterY { C D }
A -> B
B -> C [constraint=false]
C -> D [taillabel=yadda]
}