Creating large webgraphs on the command line - graphviz

Am I going about this the right way?
I have a file called input.txt that has data like this:
digraph G {
rankdir=LR
node [shape=box style=filled]
A->B
A->C
A->D
B->D
C->E
E->F
}
I then input that into dot.exe which creates the data.dot file:
C:\Graphviz\bin\dot.exe < input.txt > data.dot
Then I input the dot file back to create my svg file:
C:\Graphviz\bin\dot.exe -Tsvg data.dot > data.svg
It seems to work for a small amount of data in the input.txt file, but it seems to fail with larger data sets.
I replaced dot.exe with sfdp.exe but it creates nodes that overlap each other.
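As a hedged sketch (not a tested fix for the large-graph failure), the two dot.exe calls above can be collapsed into one, since dot can read the graph and emit SVG in a single run. For sfdp, the overlap graph attribute is the usual knob for overlapping nodes. The helper name layout_command is hypothetical, the file names are taken from the question, and whether overlap=prism is available depends on how your Graphviz was built.

```python
import os
import shutil
import subprocess


def layout_command(engine, src, dst, overlap=None):
    """Build a Graphviz command line that renders src straight to SVG."""
    cmd = [engine, "-Tsvg"]
    if overlap is not None:
        # -Gname=value sets a graph attribute from the command line
        cmd.append(f"-Goverlap={overlap}")
    cmd += [src, "-o", dst]
    return cmd


# One step instead of input.txt -> data.dot -> data.svg
dot_cmd = layout_command("dot", "input.txt", "data.svg")
# sfdp scales better to large graphs; ask it to remove node overlaps
sfdp_cmd = layout_command("sfdp", "input.txt", "data.svg", overlap="prism")

# Only run when the tool and the input file are actually present
if shutil.which("sfdp") and os.path.exists("input.txt"):
    subprocess.run(sfdp_cmd, check=True)
```

If prism is not supported by your installation, overlap=false or overlap=scale are alternatives worth trying.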

Related

gnuplot : variable paths to data file in a for loop

I would like to plot multiple curves on the same graph using a for loop. Each data file (named stat_coupe) is located in a different folder (fwal055wal055/rep16/ and fwal055wal055_c2/rep20/). fwal055wal055 and fwal055wal055_c2 are simulation names. First, I need to get a previous result, a single number (Utau), from other files (named file_fwal055wal055 and file_fwal055wal055_c2). This is done successfully with awk. The result depends on the file: Utaufwal055wal055=10.5 and Utaufwal055wal055_c2=12.2.
Then I need to divide the 1st column of the file stat_coupe corresponding to the path fwal055wal055/rep16/ by the value of Utaufwal055wal055 and do the same thing for the file stat_coupe corresponding to the path fwal055wal055_c2/rep20/ with the value of Utaufwal055wal055_c2. Moreover, each plot should have a specific format which depends on the type of simulation run (fwal055wal055 or fwal055wal055_c2).
The problem presented here is reduced to 2 simulations (fwal055wal055 and fwal055wal055_c2) and 1 plot, but I have about 20 simulations and 15 different graphs to plot, which is why I would like to use a for loop.
To summarize, at each iteration I have:
a specific format,
a specific path,
a specific value of Utau
I want to specify the right format, path and value of Utau at each iteration of the for loop. The solution I propose below successfully obtains the value of Utau for each simulation, but the code #path_.i and #format_.i does not work.
#!/bin/bash
for elem in fwal055wal055 fwal055wal055_c2;
do
Utau[${elem}]=$(awk 'FNR==5{print $1}' file_$elem)
done
gnuplot -persist <<-EOFMarker
format_fwal055wal055='pt 1 ps 1.0 lc 0 title "WALE"'
format_fwal055wal055_c2='pt 2 ps 1.0 lc 0 title "WALE c2"'
path_fwal055wal055='"fwal055wal055/rep16/stat_coupe"'
path_fwal055wal055_c2='"fwal055wal055_c2/rep20/stat_coupe"'
list="fwal055wal055 fwal055wal055_c2"
plot for [i in list] #path_.i u 1:(\$2/${Utau[${i}]}) #format_.i
EOFMarker
I would like to obtain something equivalent to:
plot #path_fwal055wal055 u 1:(\$2/${Utau[${i}]}) #format_fwal055wal055,\
#path_fwal055wal055_c2 u 1:(\$2/${Utau[${i}]}) #format_fwal055wal055_c2
Can someone help me to solve this issue ?
Thank you very much,
Martin
Check help sprintf, help words and help word.
I would create two strings with the same number of items and then combine them with sprintf(). From gnuplot 5.2 on you could also do it with arrays.
# Version 1
PATHS = '"fwal055wal055/rep16/stat_coupe" "fwal055wal055_c2/rep20/stat_coupe"'
FILES = "fwal055wal055 fwal055wal055_c2"
plot for [i=1:words(FILES)] sprintf("%s_%s",word(PATHS,i),word(FILES,i)) u 1:2
or you could define a function for your filenames to keep the plot command short and readable.
# Version 2
PATHS = '"rep16/stat_coupe" "rep20/stat_coupe"'
FILES = "fwal055wal055 fwal055wal055_c2"
myFilename(i) = sprintf("%s/%s_%s",word(FILES,i),word(PATHS,i),word(FILES,i))
plot for [i=1:words(FILES)] myFilename(i) u 1:2
Addition (after some clarifications...)
If I understand your question now correctly, the following code should do the job.
To extract the UTAUS, run a separate loop before plotting and store the extracted values in a string. During plotting you get these values back via word(UTAUS,i). Since you perform the mathematical operation column(2)/word(UTAUS,i), gnuplot will interpret them as numbers. Check help words, help word, help sprintf, help every.
Code:
### extract and normalize in a loop with individual files and directories
reset session
FILES = 'fwal055wal055 fwal055wal055_c2'
DIRS = 'rep16 rep20'
TITLES = '"WALE" "WALE c2"' # if you have spaces you need to put it into double quotes
UTAUS = ''
# define functions for better readability
myExtractionFile(i) = sprintf("file_%s",word(FILES,i))
myDataFile(i) = sprintf("%s/%s/stat_coupe",word(FILES,i),word(DIRS,i))
myTitle(i) = word(TITLES,i)
# define point or line appearance. Add more if you have more files
set style line 1 pt 1 ps 1.0 lc 0
set style line 2 pt 2 ps 1.0 lc 1
# extract the UTAUs
do for [i=1:words(FILES)] {
set table $Dummy
plot myExtractionFile(i) u (utau=$1) every ::4::4 w table # extract value row 5, column 1 (not counting header lines)
unset table
UTAUS = UTAUS.sprintf(" %g",utau) # append the extracted value as string
}
plot for [i=1:words(FILES)] myDataFile(i) u 1:(column(2)/word(UTAUS,i)) ls i title myTitle(i)
### end of code
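For readers who want to check the numbers outside gnuplot, here is a minimal sketch of the same two steps: read Utau from row 5, column 1 of the extraction file, then divide column 2 of the data by it. The function names read_utau and normalized are hypothetical, and the sketch assumes whitespace-separated columns and "#" comment lines.

```python
def read_utau(lines, row=5, col=1):
    """Return the value at the given 1-based row and column, like awk 'FNR==5{print $1}'."""
    for number, line in enumerate(lines, start=1):
        if number == row:
            return float(line.split()[col - 1])
    raise ValueError("input has fewer than %d lines" % row)


def normalized(lines, utau):
    """Yield (column 1, column 2 / utau) pairs, skipping comments and short lines."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pairs.append((float(fields[0]), float(fields[1]) / utau))
    return pairs
```

Both functions accept any iterable of lines, so an open file object works, for example read_utau(open("file_fwal055wal055")) followed by normalized(open("fwal055wal055/rep16/stat_coupe"), utau).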

Is there a way to write single band raster from multiple raster stacks

I have 4 subfolders, each containing 5 rasters with continuous values. I built a for loop to:
list these raster files
stack these files per folder, i.e. 4 RasterStack objects (each containing 5 rasters)
apply a threshold to transform the continuous rasters into binary rasters
write the binary rasters using the writeRaster function.
My issue is in step 4. Even though I use the argument "bylayer = T" in the writeRaster function,
the saved output is a RasterStack with the 5 binary rasters. I want to write one file per raster (per band).
I would be really grateful for any insights.
setwd("Vole_raw_mean_Present/")
sub <- list.dirs(full.names=FALSE, recursive=FALSE)
for(j in 1:length(sub)) {
print(sub[j])
h <- list.files(path=sub[j], recursive=TRUE, full.names=TRUE, pattern='.tif')
print(h)
stack_present <- stack(h)
print(stack_present)
binary_0.2 <- stack_present >=0.2
writeRaster(binary_0.2, filename=paste0(sub[j], bylayer = T, suffix = "_bin.tif"), overwrite=TRUE)
}
This is wrong because the arguments "bylayer" and "suffix" are lost: they are passed to paste0 and so become part of the filename.
writeRaster(binary_0.2, filename=paste0(sub[j], bylayer = T, suffix = "_bin.tif"), overwrite=TRUE)
It should be something like this (and it helps to do it in two steps)
f <- paste0(sub[j], "bin.tif")
writeRaster(binary_0.2, filename=f, bylayer=TRUE, overwrite=TRUE)
Illustrated here
library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
dir.create("test")
setwd("test")
writeRaster(b, filename="abc.tif", bylayer=T)
list.files()
#[1] "abc_1.tif" "abc_2.tif" "abc_3.tif"
writeRaster(b, filename="bin.tif", bylayer=T, suffix = paste0("f", 1:3))
list.files(pattern="bin")
#[1] "bin_f1.tif" "bin_f2.tif" "bin_f3.tif"
Alternatively, you can loop over the files within each folder

Only show unique edges in graphviz

I have an input file with about ~5000 lines and 1 to 9 nodes per line.
Many edges are not unique and I would like to only show the unique ones.
A simpler example:
graph {
a -- b
a -- b
a -- b
}
Yields
Is there a way to make the above graph yield something like
I know I could change the sample input to
graph {
a -- b
}
But it would not be easy to do that for my real input.
There actually is a way: Use the strict keyword:
strict graph G {
a -- b [label="First"];
a -- b [label="Second"];
a -- b [label="Third"];
}
Result:
Without strict, all three edges would be shown. Note that it only takes the first edge's attributes, contrary to what the documentation suggests.
Try strict:
strict graph {
a -- b
a -- b
a -- b
}
This yields
and should work for any size of graph.
In case you want to get a clean file, which doesn't contain any of the duplicate edges, you can use the graph processing tool gvpr.
Here is a snippet, which does just that:
BEG_G { graph_t g = graph($G.name,"U") }
E {
node_t h = clone(g,$.head);
node_t t = clone(g,$.tail);
if(isEdge(t,h,"")==NULL){
edge_t e = clone(g,$);
}
}
END_G { $O = g; }
Save this as something like gvpr_rm_dupl_edges and run $ gvpr -f gvpr_rm_dupl_edges input.dot -o output.dot. gvpr comes preinstalled with graphviz.
In case of directed graphs, change the "U" in the beginning of the code snippet to "D"
I wrote this snippet for a simple graph, without sub-graphs. It might not work on something more sophisticated.
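If gvpr is not an option, the same clean-up can be sketched as a small text filter. This is only an illustration under strong assumptions: edges carry no attributes, node names contain no edge operator, and there are no subgraphs. The function name unique_edges is hypothetical. Chains such as a -- b -- c are split into pairwise edges.

```python
def unique_edges(dot_text, directed=False):
    """Drop repeated edges from a very simple DOT file, keeping first occurrences."""
    op = "->" if directed else "--"
    seen = set()
    kept = []
    for line in dot_text.splitlines():
        stripped = line.strip().rstrip(";")
        if op not in stripped:
            kept.append(line)  # graph header, closing brace, node statements
            continue
        nodes = [name.strip() for name in stripped.split(op)]
        for tail, head in zip(nodes, nodes[1:]):
            # In an undirected graph a -- b and b -- a are the same edge
            key = (tail, head) if directed else tuple(sorted((tail, head)))
            if key not in seen:
                seen.add(key)
                kept.append("%s %s %s" % (tail, op, head))
    return "\n".join(kept)
```

Unlike strict, this rewrites the file itself, so the output stays small even when the input repeats edges thousands of times.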

Merging of two part files with header as only first line Hadoop

How can I merge two or more part files in Hadoop into a single file so that the merged output contains all the data but only one header, on the first line of the output?
File 1
column1|column2|column3
20000|newyork|john
30000|sydney|joseph
File n
column1|column2|column3
60000|delhi|mike
30000|sydney|joseph
Merged output should be
column1|column2|column3
20000|newyork|john
30000|sydney|joseph
60000|delhi|mike
30000|sydney|joseph
Is there an easy way to do this with the hadoop fs -cat command, or by any other method?
Method 1:
Leaving the headers on is fairly complicated without creating an index or rank, since in Pig a collection of tuples is unsorted. Here's what a Pig job looks like, using rank and order by to place the header on top.
header_ranked.pig
HEADER = LOAD 'header.txt' USING PigStorage('|') AS (b0:int,b1:chararray,b2:chararray,b3:chararray);
H1 = LOAD 'header_test' USING PigStorage('|') AS (c1:chararray,c2:chararray,c3:chararray);
F_H1 = FILTER H1 BY NOT (c1 MATCHES 'column1' AND c2 MATCHES 'column2' AND c3 MATCHES 'column3');
R_H1 = RANK F_H1 by c1 DESC DENSE;
U = UNION R_H1, HEADER;
O = ORDER U by rank_F_H1;
F = FOREACH O GENERATE c1,c2,c3;
dump F;
The two sample files, each containing 2 records and a header line, were placed in a directory called header_test. Additionally, in order for this program to work, I had to create a header file in the following format:
header.txt
0|column1|column2|column3
Walking through the code, the file containing the headers (slightly modified to include an additional column, which is the rank value of 0) is loaded into the HEADER alias.
Next the actual data is loaded into the H1 alias, as it grabs all files under the header_test directory.
F_H1 filters out all headers from the data. If you had 20 files that were loaded into H1 from the header_test directory, those 20 headers would now be filtered out of the data.
R_H1 creates a rank on the filtered data, in descending order and without skipping any numbers.
U effectively concatenates the ranked filtered data with the 0|column1|column2|column3 header line.
O orders the data by the rank, so that the header (which has a rank of 0), appears on top.
And finally, F gets rid of the ranking, leaving the clean tuples.
Results
(column1,column2,column3)
(60000,delhi,mike)
(30000,sydney,joseph)
(30000,sydney,joseph)
(20000,newyork,john)
Method 2:
Basically, leave the headers on one file, strip them from the rest, and then mash them together. Not sure it'll stay sorted, though, haven't tested it thoroughly.
H1 = LOAD 'header_test/header1.txt' USING PigStorage('|') AS (c1:chararray,c2:chararray,c3:chararray);
H2 = LOAD 'header_test/header2.txt' USING PigStorage('|') AS (d1:chararray,d2:chararray,d3:chararray);
F_H2 = FILTER H2 BY NOT (d1 MATCHES 'column1' AND d2 MATCHES 'column2' AND d3 MATCHES 'column3');
U = UNION H1, F_H2;
dump U;
Results
(column1,column2,column3)
(20000,newyork,john)
(30000,sydney,joseph)
(60000,delhi,mike)
(30000,sydney,joseph)
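The "keep one header, strip the rest" idea of Method 2 can also be sketched outside Pig. The following is a minimal illustration, not a Hadoop-native solution: it assumes the part files are available as iterables of lines (for example after copying them locally), and the function name merge_parts is hypothetical.

```python
def merge_parts(parts):
    """Concatenate parts, emitting the header line only once, at the top."""
    header = None
    for part in parts:
        for index, raw in enumerate(part):
            line = raw.rstrip("\n")
            if index == 0:
                if header is None:
                    header = line  # first part: remember and emit the header
                    yield line
                elif line != header:
                    yield line  # first line is data, not a repeated header
                continue
            yield line


file_1 = ["column1|column2|column3\n", "20000|newyork|john\n", "30000|sydney|joseph\n"]
file_n = ["column1|column2|column3\n", "60000|delhi|mike\n", "30000|sydney|joseph\n"]
merged = list(merge_parts([file_1, file_n]))
```

Because the parts are read in the order given, the rows keep their original order, which the Pig UNION does not guarantee.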

Programmatically specifying nodes of the same rank within networkx's wrapper for pygraphviz/dot

Is it possible to alter the following code to put Child_4 at the same horizontal level as Grandchild_0 (thereby pushing Grandchild_4 to its own level)?
import networkx as nx
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_node("ROOT")
for i in xrange(5):
    G.add_node("Child_%i" % i)
    G.add_node("Grandchild_%i" % i)
    G.add_edge("ROOT", "Child_%i" % i)
    G.add_edge("Child_%i" % i, "Grandchild_%i" % i)
pos=nx.graphviz_layout(G,prog='dot')
nx.draw(G,pos,arrows=False)
plt.show()
The above code produces the following layout, which I'd like to alter by shifting a child down one level to be horizontally aligned with the grandchildren:
Within the Python network library networkx, I'm using graphviz's dot engine to render a tree (following this recommendation). I would like to control the y-position of the nodes by specifying which nodes should have the same height. The nodes might be at different depths in the tree.
I know I could control the node height if I wrote my own graphviz code through using the rank=same command (e.g., {rank=same; n4 -> p2;} [ex.]). However, I am relying on networkx.graphviz_layout() [doc | source] to generate the node positions, and graphviz_layout can send only command line arguments to pygraphviz. My attempts to use variants of nx.graphviz_layout(G, prog='dot', args="-Grank=same; n4 -> p2;") have failed. Is it possible to describe the desired node heights within the NetworkX wrapper for pygraphviz, or do I need to write my own wrapper around pygraphviz? Edit: The answer provides a new wrapper around pygraphviz. It would significantly simplify things to send the rank information within the existing NetworkX wrapper for pygraphviz. I'll change my accepted answer if someone can tell me how that might be possible.
I can't find a way to achieve this through the original networkx wrapper.
Instead, I've written a new wrapper for pygraphviz, with most lines copied from the source code. It adds a parameter sameRank = [] for a list of nodes-of-the-same-rank lists and a for loop around an invocation of pygraphviz.add_subgraph(listOfNodes,rank="same").
def graphviz_layout_with_rank(G, prog = "neato", root = None, sameRank = [], args = ""):
    ## See original import of pygraphviz in try-except block
    ## See original identification of root through command line
    A = nx.to_agraph(G)
    for sameNodeHeight in sameRank:
        if type(sameNodeHeight) == str:
            print("node \"%s\" has no peers in its rank group" %sameNodeHeight)
        A.add_subgraph(sameNodeHeight, rank="same")
    A.layout(prog=prog, args=args)
    ## See original saving of each node location to node_pos
    return node_pos
In the question example, Child_4 can be pushed to the same horizontal level as Grandchild_0 through the line:
pos=graphviz_layout_with_rank(G, prog='dot',sameRank=[["Child_4","Grandchild_0"]])
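To see what the rank="same" subgraph amounts to in DOT terms, here is a small sketch that writes the DOT source by hand for the tree in the question. It does not use networkx or pygraphviz. The function name to_dot is hypothetical, and the output would still have to be passed to dot to get positions.

```python
def to_dot(edges, same_rank=()):
    """Return DOT source with one {rank=same; ...} group per list in same_rank."""
    lines = ["digraph G {"]
    for tail, head in edges:
        lines.append('  "%s" -> "%s";' % (tail, head))
    for group in same_rank:
        members = " ".join('"%s";' % name for name in group)
        lines.append("  {rank=same; %s}" % members)
    lines.append("}")
    return "\n".join(lines)


# Same tree as in the question: ROOT -> Child_i -> Grandchild_i
edges = []
for i in range(5):
    edges.append(("ROOT", "Child_%i" % i))
    edges.append(("Child_%i" % i, "Grandchild_%i" % i))

dot_source = to_dot(edges, same_rank=[["Child_4", "Grandchild_0"]])
```

Each inner list plays the role of one entry in sameRank above: every node in it is constrained to the same rank, which here moves Child_4 down to the grandchildren's level.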

Resources