graphviz nodes of different colors - graphviz

I am trying to create a directed graph with nodes of different filled colors. I have created a gv file like this:
digraph mentions {
"A" -> "B"
"A" -> "C"
"B" -> "C"
"B" -> "A"
"A" [shape=circle, style=filled, fillcolor=red]
"B" [shape=circle, style=filled, fillcolor=green]
"C" [shape=circle, style=filled, fillcolor=purple]
}
And my command line argument is:
ccomps -zX#0-1000 testGraphCalls.gv | \
grep "-" | cat <(echo "digraph mentions {") - <(echo "}") | \
sfdp -Gbgcolor=white -Ecolor=blue \
-Nwidth=1 -Nheight=1 -Nfixedsize=true \
-Nlabel='' -Earrowsize=0.4 -Gsize=75 -Gratio=fill \
-Tpng > test.png
However, my nodes are rendered as white circles outlined in black. Any ideas on how I can get the nodes to fill properly?

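What is all the stuff in your command? When you grep for "-" in your .gv file, grep keeps only the edge lines; it won't print the three node attribute lines, because they contain no "-". The fill colors are therefore lost before sfdp ever sees the graph.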
Do you have the program dot, which you could use for a quick test of your graph file?
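To make the point about grep concrete, here is a minimal sketch that runs only the filtering step of the pipeline on the graph from the question. The variable names graph and filtered are made up for this illustration, and the ccomps and sfdp stages are left out on purpose.

```shell
# The question's graph, inlined so the sketch is self-contained.
graph='digraph mentions {
"A" -> "B"
"A" -> "C"
"B" -> "C"
"B" -> "A"
"A" [shape=circle, style=filled, fillcolor=red]
"B" [shape=circle, style=filled, fillcolor=green]
"C" [shape=circle, style=filled, fillcolor=purple]
}'

# Keep only the lines containing "-", as the question's pipeline does.
# (-e makes sure the pattern is not mistaken for an option.)
filtered=$(printf '%s\n' "$graph" | grep -e '-')

# Show what survives: only the four edge lines.
printf '%s\n' "$filtered"

# Count the attribute lines that survived the filter; this prints 0.
printf '%s\n' "$filtered" | grep -c 'fillcolor' || true
```

If that matches what happens on your machine, rendering the unfiltered file directly (for example with dot -Tpng testGraphCalls.gv > test.png) should be a quick way to confirm that the colors themselves are fine.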

Instead of a .gv file, write your dependency code in a .dot file.
Use tools like graphviz, qcachegrind, etc., which automatically convert the .dot file to a call tree.

Related

Parallelize a awk script with multiple input files and changing the name of the output file

I have a series of text files in a folder sub.yr_by_yr which I pass to a for loop to subset a Beagle file from the header. I want to parallelize this script to subset the Beagle file from the header values (which is done using my subbeagle.awk script). I use the title of the text files to export the subset to a new file name using the base pattern matching in bash (file11=${file1%.subbeagle.txt}) to get the desired output (MM.beagle.${file11}.gz)
for file1 in $(ls sub.yr_by_yr)
do
echo -e "Doing sub-samples \n $file1"
file11=${file1%.subbeagle.txt}
awk -f subbeagle.awk \
./sub.yr_by_yr/$file1 <(zcat ../MajorMinor.beagle.gz) | gzip > sub.yr_by_yr_beagle.files/MM.beagle.${file11}.gz
done
The for loop works, but takes forever, hence the need for parallelization. The folder sub.yr_by_yr contains >10 files named
something like this: sp.yrseries.site1.1.subbeagle.txt, sp.yrseries.site1.2.subbeagle.txt, sp.yrseries.site1.3.subbeagle.txt...
I've tried
parallel "file11=${{}%.subbeagle.txt}; awk -f $SUBBEAGLEAWKSCRIPT ./sub.yr_by_yr/{} <(zcat ../MajorMinor.beagle.gz) | gzip > sub.yr_by_yr_beagle.files/MM.beagle.${file11}.gz" ::: sub.yr_by_yr/*.subbeagle.txt
But it gives me 'bad substitution'
How could I use the awk script in parallel and rename the files accordingly?
Content of subbeagle.awk:
# Source: https://stackoverflow.com/questions/74451358/select-columns-based-on-their-names-from-a-file-using-awk
BEGIN { FS=OFS="\t" } # uncomment if input/output fields are tab delimited
FNR==NR { headers[$1]; next }
{ sep=""
for (i=1; i<=NF; i++) {
if (FNR==1 && ($i in headers)) {
fldids[i]
}
if (i in fldids) {
printf "%s%s",sep,$i
sep=OFS # if not set elsewhere (eg, in a BEGIN{}block) then default OFS == <space>
}
}
print ""
}
Content of MajorMinor.beagle.gz
marker allele1 allele2 FINCH_WB_ID1_splitMerged FINCH_WB_ID1_splitMerged FINCH_WB_ID1_splitMerged FINCH_WB_ID2_splitMerged FINCH_WB_ID2_splitMerged
chr1_34273 G C 0.79924 0.20076 3.18183e-09 0.940649 0.0593509
chr1_34285 G A 0.79924 0.20076 3.18183e-09 0.969347 0.0306534
chr1_34291 G C 0.666111 0.333847 4.20288e-05 0.969347 0.0306534
chr1_34299 C G 0.000251063 0.999498 0.000251063 0.996035 0.00396529
UPDATE:
I was able to get this from this source:
parallel "awk -f subbeagle.awk {} <(zcat ../MajorMinor.beagle.gz) | gzip > 'sub.yr_by_yr_beagle.files/MM.beagle.{/.}_test.gz'" ::: sub.yr_by_yr/*.subbeagle.txt
The only remaining thing to remove is the .subbeagle part of the input file name...
So the parallel tutorial helped me here:
parallel --rpl '{mymy} s:.*/::; s:\.[^.]+$::;s:\.[^.]+$::;' "awk -f subbeagle.awk {} <(zcat ../MajorMinor.beagle.gz) | gzip > 'sub.yr_by_yr_beagle.files/MM.beagle.{mymy}.gz'" ::: sub.yr_by_yr/*.subbeagle.txt
Let's break this:
--rpl '{mymy} s:.*/::; s:\.[^.]+$::;s:\.[^.]+$::;'
--rpl will "define a shorthand replacement string" (see parallel tutorial and another example here)
{mymy} is my 'new' replacement string, which will execute what is after it.
s:.*/::; is the definition of {/}, i.e. it strips the directory part (see parallel tutorial, search for "Perl expression replacement string", the last part of that section shows the definition of 7 'default' replacement strings)
s:\.[^.]+$::;s:\.[^.]+$::; removes 2 extensions (so .subbeagle.txt where .txt is the first extension and .subbeagle is the second)
"awk -f subbeagle.awk {} <(zcat ../MajorMinor.beagle.gz) | gzip > 'sub.yr_by_yr_beagle.files/MM.beagle.{mymy}.gz'"
is the subsetting and compressing part of the script. Note that {mymy} is where the replacement will take place, while {} is the unmodified input string. The rest is unchanged!
::: sub.yr_by_yr/*.subbeagle.txt will pass all the files to parallel as input.
The loop took ~2 hours to get through ~5 files, but using 22 cores I could process all the files in a fraction of the time (~20 minutes)!
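For comparison, the same renaming can be sketched in plain POSIX shell parameter expansion. The function name strip_two_ext is made up for this illustration; it is meant to mirror what the {mymy} replacement string does, not to replace it. As far as I can tell, the original attempt failed because the shell tried to expand ${{}%.subbeagle.txt} inside the double quotes before parallel ever ran, and ${{} is not a valid expansion, hence 'bad substitution'.

```shell
# Hypothetical helper mirroring {mymy}: drop the directory, then two extensions.
strip_two_ext() {
  name=${1##*/}   # like s:.*/::       (remove everything up to the last /)
  name=${name%.*} # like s:\.[^.]+$::  (remove the last extension, .txt)
  name=${name%.*} # same again         (remove the next one, .subbeagle)
  printf '%s\n' "$name"
}

# One of the file names from the question.
strip_two_ext sub.yr_by_yr/sp.yrseries.site1.1.subbeagle.txt
# prints: sp.yrseries.site1.1
```

Letting parallel do the rewriting with --rpl avoids having to quote any shell expansion inside the command string, which is why the --rpl version is easier to get right.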

Leveraging graphviz to create a network weathermap configuration

Given a generated list of nodes and links, is there a way I can use dot or some other tool from the graphviz package to create coordinates for those nodes such that I in turn can use that information to generate a configuration file for network weathermap?
The answer is simple: calling dot or one of the other layout tools without an output format argument prints the information I wanted (the graph annotated with node positions) to stdout.
I wrote this shell script to make a graph from an mrtg config file, but decided not to pursue the weathermap part because the results were too cluttered:
grep -P '^SetEnv.*MRTG_INT_IP="..*" MRTG_INT_DESCR=".*"' $1 | grep -v 'MRTG_INT_IP="127.' | grep -v 'MRTG_INT_IP="10.255.' |\
sed \
-e 's/SetEnv\[\(.*\.switch\.hapro\.no_.*\)]: MRTG_INT_IP="\(.*\)" MRTG_INT_DESCR="\(.*\)"/\1 \2 \3/' \
-e 's/\//_/g' |\
sort -t/ -k 1 -n -k 2 -n -k 3 -n -k 4 |\
gawk '
BEGIN { print "graph '$2' {"; print "graph [overlap=false];" }
{
v = "'$2'"
print v " -- " $3
}
END { print "}" }'
Thought I would share this in case someone else found it useful in the future.
I used the script like ./mkconf ../switch/mrtg.1c.conf 1c | dot -Tpng > test.png
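For anyone who does want to continue to the weathermap step, here is a minimal sketch of pulling node coordinates out of the layout. It assumes the plain output format (dot -Tplain) rather than the default output described above, because plain puts each node on one line as "node name x y width height ...". The function name node_coords is made up, and the sample text is a hand-written illustration of that format, not captured dot output.

```shell
# Hypothetical helper: read "dot -Tplain" output on stdin and
# print "name x y" for every node line.
node_coords() {
  awk '$1 == "node" { print $2, $3, $4 }'
}

# Hand-written sample in plain format (illustrative values only).
sample_plain='graph 1 0.75 2.5
node a 0.375 2.25 0.75 0.5 a solid ellipse black lightgrey
node b 0.375 0.25 0.75 0.5 b solid ellipse black lightgrey
edge a b 2 0.375 2.0 0.375 0.5 solid black
stop'

printf '%s\n' "$sample_plain" | node_coords

# Intended real use (needs Graphviz installed):
#   ./mkconf ../switch/mrtg.1c.conf 1c | dot -Tplain | node_coords
```

Two caveats: node names containing spaces are quoted in plain output, which this simple field split does not handle, and the coordinates would still need scaling to whatever pixel grid the weathermap config expects.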

Caesar Cypher Code Not Working

I am meant to create a Caesar Cypher that takes in a parameter and shifts the text based on that parameter, but my code mixes up upper case and lower case.
So, it's meant to be like:
$ echo "I came, I saw, I conquered." | ./caesar.sh
V pnzr, V fnj, V pbadhrerq.
but I get:
V pnzr, V FnJ, V pBADHERrq.
My code is:
#!/bin/sh
if [ -z "$#" ];then
rotation=13;
else
rotation=$((# % 16));
fi
tr $(printf %${rotation}s | tr ' ' '.')\a-zA-Z a-zA-Z
How can I fix this?
You are rotating across the entire double-alphabet, 'a-zA-Z', so 's' maps to 'F':
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
                  |------------^
You apparently want to preserve case, so I would recommend that you apply two separate mappings: first, map 'a-z' to 'n-za-m' (or whatever, as appropriate for your input parameter). Then in the second pass, map capitals, 'A-Z' -> 'N-ZA-M'.
A basic adaptation of your scheme that works is:
rotation=$((${1:-13} % 26))
padding=$(printf "%${rotation}s" "" | tr ' ' '\001')
tr "${padding}a-z" "a-za-z" |
tr "${padding}A-Z" "A-ZA-Z"
This uses parameter expansion and arithmetic to determine the rotation.
It uses your basic mechanism for setting the padding, but uses Control-A instead of . as the padding character; you seldom have Control-A in your text.
The actual rotation commands deal with lower case separately from upper case.
With the script contained in a file script.sh, I got:
$ bash script.sh
I came, I saw, I conquered
Can you say SYZYGY after midnight?
V pnzr, V fnj, V pbadhrerq
Pna lbh fnl FLMLTL nsgre zvqavtug?
$ bash script.sh 3
I came, I saw, I conquered, and O, was it ever worthwhile!
Can you say SYZYGY after midnight? ABC...XYZ abc...xyz
L fdph, L vdz, L frqtxhuhg, dqg R, zdv lw hyhu zruwkzkloh!
Fdq brx vdb VBCBJB diwhu plgqljkw? DEF...ABC def...abc
$
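Because the two tr commands are connected by a pipe, the first line of input was not pushed through to the second tr command when I pressed Enter; that is why both output lines appear only after all the input has been typed.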
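An alternative sketch of the same idea, in case the padding trick feels opaque: build the rotated lower-case and upper-case alphabets explicitly and hand both to a single tr call. The function name caesar and the doubled-alphabet cut are my own choices for illustration, and the sketch assumes a non-negative shift.

```shell
# Hypothetical helper: rotate letters by $1 places (default 13), preserving case.
# Assumes the shift is a non-negative integer.
caesar() {
  n=$(( ${1:-13} % 26 ))
  lower=abcdefghijklmnopqrstuvwxyz
  upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
  # Take 26 characters starting at offset n from the doubled alphabet,
  # which yields the alphabet rotated left by n.
  rot_lower=$(printf '%s%s' "$lower" "$lower" | cut -c"$((n + 1))-$((n + 26))")
  rot_upper=$(printf '%s%s' "$upper" "$upper" | cut -c"$((n + 1))-$((n + 26))")
  # Lower case maps only to lower case, upper case only to upper case.
  tr "${lower}${upper}" "${rot_lower}${rot_upper}"
}

echo 'I came, I saw, I conquered.' | caesar
# prints: V pnzr, V fnj, V pbadhrerq.
```

Since only one tr process is involved, there is no pipe between two tr commands, so the buffering effect seen in the interactive session above does not come from an intermediate stage.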

How to copy lines between two strings to a new text file matching first string fully and second string partially

This is what my file input.txt looks like:
$ cat input.txt
*NODE_SET
a
b
c
d
*NODE
e
f
g
h
*XYZ
Now I want to output the lines from *NODE up to and including the next line containing "*" to a new file "output.txt".
i.e. output.txt should contain:
*NODE
e
f
g
h
*XYZ
The problem here is that *XYZ can be any unknown string, i.e. it can be *ELEMENT, *LOAD, etc., but the star is always at the beginning of the line.
This sed should work:
sed -n '/^\*NODE$/,/^*/p' input.txt > output.txt
OUTPUT:
*NODE
e
f
g
h
*XYZ
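Here is a self-contained sketch of why that range works, using the input from the question. The function name extract_node_block is made up, and the end pattern is written as /^\*/ with the star escaped, which should match the same lines. The start address /^\*NODE$/ requires the whole line to be *NODE, so *NODE_SET is skipped; the end address is only tested on lines after the start line, so *NODE does not end its own range.

```shell
# Hypothetical wrapper around the sed range from the answer.
extract_node_block() {
  sed -n '/^\*NODE$/,/^\*/p'
}

# The sample input from the question, inlined.
input='*NODE_SET
a
b
c
d
*NODE
e
f
g
h
*XYZ'

printf '%s\n' "$input" | extract_node_block
```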

How to draw multiple graphs with dot?

I have a print_dot() function that outputs dot on stdout.
That way I can do:
$ ./myprogram < input | dot -T x11
It works great when I try to print one graph.
Now when I print several graphs, nothing shows up. The dot window is blank, X11 and dot take all the CPU. Nothing is printed on stderr.
$ echo -e "graph { a -- b }" | dot -T x11 # works
$ echo -e "graph { a -- b } \n graph { c --d }" | dot -T x11 # doesn't work
# it seems to be interpreted nonetheless
$ echo -e "graph { a -- b } \n graph { c -- d } " | dot -T xdot
graph {
...
}
graph {
...
}
Also, when I remove the \n between the 2 graphs, only the first graph is interpreted (what a nice feature...):
$ echo -e "graph { a -- b } graph { c -- d } " | dot -T xdot
graph {
...
}
Piping the xdot output to dot again doesn't fix the problem.
So, how does one render multiple graphs with graphviz?
One calls dot multiple times. Or one puts everything into a single graph, taking care to avoid duplication of names.
Use gvpack
$ echo -e "graph { a -- b }\ngraph { c -- d }" | gvpack -u | dot -Tpng > graphs.png
A simple script that reads graphs from stdin and opens one dot instance per graph:
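#!/usr/bin/perl
use strict;
use warnings;

my $o = '';   # text of the graph currently being read
my @l;        # completed graphs
while (<>) {
    # A line starting with "graph" or "digraph" begins a new graph.
    if (/^\s*(di)?graph/) {
        push @l, $o if $o =~ /graph/;   # skip the empty buffer before the first graph
        $o = '';
    }
    $o .= $_;
}
push @l, $o if $o =~ /graph/;

# Fork one child per graph; each child pipes its graph to its own dot window.
for my $g (@l) {
    if (fork() == 0) {
        open my $p, '|-', 'dot -T x11' or die $!;
        print $p $g;
        close $p;
        exit 0;
    }
}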
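If separate files are more convenient than separate windows, the same splitting idea can be sketched in shell with awk. The function name split_graphs and the graph_N.gv naming are made up for this illustration. Like the Perl script, it assumes every graph starts on its own line with graph or digraph, so a line such as graph [overlap=false]; inside a graph body would be mistaken for a new graph.

```shell
# Hypothetical helper: split a stream of graphs on stdin into
# PREFIX_1.gv, PREFIX_2.gv, ... (PREFIX defaults to "graph").
split_graphs() {
  prefix=${1:-graph}
  awk -v prefix="$prefix" '
    # A line starting with "graph" or "digraph" opens the next output file.
    /^[[:space:]]*(di)?graph/ {
      if (out != "") close(out)
      n++
      out = prefix "_" n ".gv"
    }
    # Copy every line that belongs to a graph into the current file.
    n > 0 { print > out }
  '
}

# Intended use (needs Graphviz installed for the rendering step):
#   ./myprogram < input | split_graphs g
#   for f in g_*.gv; do dot -Tpng "$f" > "${f%.gv}.png"; done
```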
