Below is the code meant to output successive images for a gif file:
for i in {1..600}
do
python Phy_asg.py $i
gnuplot <<- EOF
unset tics;unset key;unset border
set xrange [-15:15]
set yrange [-15:15]
set arrow 1 from 0.012*$i,cos(0.012*$i)-pi to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set arrow 2 from sin(0.024*$i)+pi,0.012*$i to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
plot "< seq -9 .2 -3.1" u (cos(2*$1)):($1) with lines
replot "< seq -9 .2 -3.1" u ($1):(cos(2*$1)) with lines
replot "data_asg.txt" with lines lt 22 lw 2
set terminal png size 512,512
set output "Phy_gif_$i.png"
replot
EOF
done
Here Phy_asg.py is a Python script that writes its data to a text file named data_asg.txt. The shell reports an error at line 10 of the script (the plot command). It says:
gnuplot> plot "< seq -9 .2 -3.1" u (cos(2*)):() with lines
^
line 0: invalid expression
I am not able to figure out the problem. Is it with the seq command, or is it a formatting error?
The $1 is interpreted as a shell positional parameter instead of a gnuplot data column. Either escape the dollar sign (\$1) or use column(1); I prefer the latter:
for i in {1..600}
do
python Phy_asg.py $i
gnuplot <<- EOF
set terminal png size 512,512
set output "Phy_gif_$i.png"
unset tics;unset key;unset border
set xrange [-15:15]
set yrange [-15:15]
set arrow 1 from 0.012*$i,cos(0.012*$i)-pi to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set arrow 2 from sin(0.024*$i)+pi,0.012*$i to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set style data lines
plot "< seq -9 .2 -3.1" u (cos(2*column(1) )):1, \
"< seq -9 .2 -3.1" u 1:(cos(2*column(1))), \
"data_asg.txt" lt 22 lw 2
EOF
done
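To see what gnuplot actually receives, the here-document can be printed instead of being piped into gnuplot. The sketch below is illustrative only: it uses a fixed i=7 in place of the loop counter, and the variable names (broken, escaped, with_column) are made up for the demonstration.

```shell
#!/bin/bash
# Illustration only: capture what an unquoted here-document hands to gnuplot.
i=7      # stands in for the loop counter
set --   # no positional parameters, so an unescaped $1 expands to nothing

# Unescaped $1: the shell removes it before gnuplot ever sees the line.
broken=$(cat <<EOF
plot "< seq -9 .2 -3.1" u (cos(2*$1)):($1) with lines
EOF
)

# Escaped \$1: $i is still expanded, but a literal $1 reaches gnuplot.
escaped=$(cat <<EOF
set output "Phy_gif_$i.png"
plot "< seq -9 .2 -3.1" u (cos(2*\$1)):(\$1) with lines
EOF
)

# column(1): contains no dollar sign, so there is nothing to escape.
with_column=$(cat <<EOF
plot "< seq -9 .2 -3.1" u (cos(2*column(1))):1 with lines
EOF
)

printf '%s\n' "broken:" "$broken" "escaped:" "$escaped" "column():" "$with_column"
```

The first variant reproduces the truncated expression shown in the error message; the other two leave the column reference intact.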
I am analysing multi-column data organized in the following format:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O2 GLU_166#H GLU_166#N 708 0.7548 2.8489 160.3990
lig_608#O3 THR_26#H THR_26#N 532 0.5672 2.8699 161.9043
THR_26#O lig_608#H15 lig_608#N6 414 0.4414 2.8509 153.3394
lig_608#N2 HIE_163#HE2 HIE_163#NE2 199 0.2122 2.9167 156.3248
GLN_189#OE1 lig_608#H2 lig_608#N4 32 0.0341 2.8899 156.4308
THR_25#OG1 lig_608#H14 lig_608#N5 26 0.0277 2.8906 160.9933
lig_608#O4 GLY_143#H GLY_143#N 25 0.0267 2.8647 146.5977
lig_608#O3 THR_25#HG1 THR_25#OG1 16 0.0171 2.7618 152.3421
lig_608#O2 GLN_189#HE21 GLN_189#NE2 15 0.0160 2.8947 154.3567
lig_608#N7 ASN_142#HD22 ASN_142#ND2 10 0.0107 2.9196 147.8856
lig_608#O4 ASN_142#HD21 ASN_142#ND2 9 0.0096 2.8462 148.4038
HIE_41#O lig_608#H14 lig_608#N5 9 0.0096 2.8693 148.4560
GLN_189#NE2 lig_608#H2 lig_608#N4 7 0.0075 2.9562 153.6447
lig_608#O4 ASN_142#HD22 ASN_142#ND2 4 0.0043 2.8954 158.0293
THR_26#O lig_608#H14 lig_608#N5 2 0.0021 2.8259 156.4279
lig_608#O4 ASN_119#HD21 ASN_119#ND2 1 0.0011 2.8786 144.1573
lig_608#N2 GLU_166#H GLU_166#N 1 0.0011 2.9295 149.3281
My gnuplot script, embedded in a Bash script, filters the data and keeps only two columns according to these conditions: 1) the label comes from the 1st or the 3rd column, whichever does not start with "lig"; 2) the value from the 5th column must be > 0.05.
#!/bin/bash
output=$(pwd)
# beginning pattern of each processed file
target='HBavg'
# loop each file and create a bar graph
for file in "${output}"/${target}*.log ; do
file_name3=$(basename "$file")
file_name2="${file_name3/.log/}"
file_name="${file_name2/${target}_/}"
echo "visualization with Gnuplot!"
cat <<EOS | gnuplot > ${output}/${file_name2}.png
set term pngcairo size 800,600
### conditional xtic labels
reset session
set termoption noenhanced
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,10"
FILE = "$file"
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set yrange [0:1]
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
#set xrange[-1:5]
set table \$Filtered
myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)
plot FILE u ((y0=column(5))>0.05 ? sprintf("%g %s",y0,myTic(1,3)) : '') w table
unset table
plot \$Filtered u 0:1:xtic(2) w boxes, '' u 0:1:1 w labels offset 0,1
### end of script
EOS
done
Eventually it writes the filtered data into a new table and produces a multi-bar plot.
In the resulting plot the bars are pre-sorted by their Y values (the values from the 5th column of the initial data). How could I instead sort the bars alphabetically by the names displayed on X, thereby changing the order of the bars in the graph?
Since the original data is always sorted by the 5th column (Frac), would it be possible to re-sort it directly while feeding it to gnuplot?
One idea is to pipe it directly into the gnuplot script with awk and sort, e.g.:
plot "<awk -v OFS='\t' 'NR > 1 && \$5 > 0.05' $file | sort -k1,1" using 0:5:xtic(3) with boxes
How could I do the same in my script, where the data is filtered by gnuplot itself and I only need to sort the bars produced by:
plot \$Filtered u 0:1:xtic(2) w boxes, '' u 0:1:1 w labels offset 0,1
edit: added color alternation
I would stick to external tools for processing the data and then call gnuplot:
#!/bin/bash
{
echo '$data << EOD'
awk 'NR > 1 && $5 > 0.05 {print ($1 ~ /^lig/ ? $2 : $1 ), $5}' file.log |
sort -t ' ' -k1,1 |
awk -v colors='0x4472c4 0xed7d31' '
BEGIN { nc = split(colors,clrArr) }
{ print $0, clrArr[NR % nc + 1] }
'
echo 'EOD'
cat << 'EOF'
set term pngcairo size 800,600
set title "file.log" font "Century,22" textcolor "#b8860b"
set xtics noenhanced font "Helvetica,10"
set xlabel "H-bond donor, residue"
set ylabel "Fraction, %"
set yrange [0:1]
set key off
set boxwidth 0.9
set style fill solid 1.0
plot $data using 0:2:3:xtic(1) with boxes lc rgb var, \
'' using 0:2:2 with labels offset 0,1
EOF
} | gnuplot > file.png
remarks:
The problem with printing the values on top of the bars in Gnuplot is that you can't do it directly from a stream, you need a file or a variable. Here I saved the input data into the $data variable.
You'll be able to expand shell variables in the HEREDOC if you unquote it (<< 'EOF' => << EOF), but you have to make sure that you escape the $ of $data
The simplest way to add colors is to add a "color" field in the output of awk, but the sorting would mess it up; that's why I add the color in another awk after the sort.
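As a rough check of the filter, sort, and colour steps, the pipeline can be run on a few rows and printed rather than sent to gnuplot. This is only a sketch: the sample rows are copied from the question, the variable names are made up, and LC_ALL=C is an addition here to make the sort order reproducible.

```shell
#!/bin/bash
# Sketch: run the filter -> sort -> colour steps on a few sample rows and
# print the result instead of feeding it to gnuplot.
sample='#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O2 GLU_166#H GLU_166#N 708 0.7548 2.8489 160.3990
lig_608#O3 THR_26#H THR_26#N 532 0.5672 2.8699 161.9043
THR_26#O lig_608#H15 lig_608#N6 414 0.4414 2.8509 153.3394
lig_608#N2 HIE_163#HE2 HIE_163#NE2 199 0.2122 2.9167 156.3248
GLN_189#OE1 lig_608#H2 lig_608#N4 32 0.0341 2.8899 156.4308'

rows=$(printf '%s\n' "$sample" |
  awk 'NR > 1 && $5 > 0.05 {print ($1 ~ /^lig/ ? $2 : $1), $5}' |
  LC_ALL=C sort -t ' ' -k1,1 |   # LC_ALL=C is an assumption, for a stable order
  awk -v colors='0x4472c4 0xed7d31' '
    BEGIN { nc = split(colors, clrArr) }
    { print $0, clrArr[NR % nc + 1] }')

printf '%s\n' "$rows"
```

Because the colour is attached after the sort, the two colours alternate along the sorted bars regardless of the original row order.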
I have been using a script I created some time ago to monitor the convergence of some numerical calculations. It extracts some data with awk, writes it to a few files, and then uses gnuplot to plot the data in a dumb terminal. It works OK, but lately I have been wondering whether I am writing too much to the disk for such a task, and I am curious whether gnuplot can plot the result of awk without first writing the result to a file.
Here is the script I wrote:
#!/bin/bash
#
input=$1
#
timing=~/tmp/time.dat
nriter=~/tmp/nriter.dat
totenconv=~/tmp/totenconv.dat
#
test=false
while ! $test; do
clear
awk '/total cpu time/ {print $9-p;p=$9}' $input | tail -n 60 > $timing
awk '/ total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a," ",$5}' $input | tail -n 60 > $nriter
awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a," ",$5}' $input > $totenconv
gnuplot <<__EOF
set term dumb feed 160, 40
set multiplot layout 2, 2
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iterations"
plot '${nriter}' using 0:1 with lines title 'TotEn' axes x1y1
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iteration"
plot '${nriter}' using 0:2 with lines title 'Accuracy' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
#set format y "%-4.7f"
set xlabel "nr. iteration"
plot '${totenconv}' using 1 with lines title 'TotEnConv' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
set format y "%-4.0f"
set xlabel "nr. iteration"
plot '${timing}' with lines title 'Timing (s)' axes x1y1
#plot '${totenconv}' using 2 with lines title 'AccuracyConv' axes x1y1
__EOF
# tail -n 5 $input
# echo -e "\n"
date
iter=$(grep " total energy" $input | wc -l)
conviter=$(awk '/!/' $input | wc -l)
echo "number of iterations = " $iter " converged iterations = " $conviter
sleep 10s
if grep -q "JOB DONE" $input ; then
grep '!' $input
echo -e "\n"
echo "Job finished"
rm $nriter
rm $totenconv
rm $timing
date
test=true
else
test=false
fi
done
This produces a nice grid of four plots when the data is available, but it would be great if I could avoid writing to disk all the time. I don't need this data once the calculation is finished; it is only for monitoring.
Also, is there a better way to do this? Or is gnuplot the only option?
Edit: I am detailing what the awk bits are doing in the script as requested by @theozh:
awk '/total cpu time/ {print $9-p;p=$9}' $input - this searches for the pattern total cpu time, which appears many times in the file $input, and reads column 9 of each matching line. That column holds a time in seconds. The command prints the difference between this number and the one found on the previous matching line.
awk '/ total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a," ",$5}' $input - this searches for the pattern total energy (there are 5 spaces before the word total) on lines that do not contain !, takes the number in column 4, and then takes the number in column 5 of the line immediately below (NR+1).
awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a," ",$5}' $input - this searches for the pattern !, takes the number in column 5 of that line, and then takes the number in column 5 two lines below.
awk works on lines, and each line is divided into columns. For example, the line below:
This is an example
has 4 columns separated by the space character.
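A minimal sketch of these two ideas, using made-up input (the log lines below are hypothetical and only mimic the "total cpu time" lines described above):

```shell
#!/bin/bash
# NF is the number of whitespace-separated columns; $4 is the fourth one.
fields=$(echo "This is an example" | awk '{print NF, $4}')
echo "$fields"

# Difference between successive values in column 9 of the matching lines.
# p is unset on the first match and is treated as 0 in the subtraction.
log='1 2 total cpu time 6 7 8 9.1
something else
1 2 total cpu time 6 7 8 9.2
1 2 total cpu time 6 7 8 9.4'
steps=$(printf '%s\n' "$log" | awk '/total cpu time/ {print $9-p; p=$9}')
echo "$steps"
```

The first printed difference is simply the first value itself, since there is no earlier match to subtract.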
Thank you for the awk explanations; once again I learned something useful.
I don't want to say that the gnuplot-only solution will be straightforward, efficient and easy to understand, but it can be done.
The assumption is that the columns or items are separated by spaces.
The ingredients are the following:
since gnuplot 5.0 you have datablocks (e.g. $Data), and since gnuplot 5.2.0 you can address their lines via index, e.g. $Data[i]. Check help datablocks. Datablocks are not files on disk but data in memory.
you can write data to a datablock via with table, check help table.
to check whether a string is contained within another string you can use strstrt(), check help strstrt.
use the ternary operator (check help ternary) to create a filter
to get the nth item in a string (separated by spaces) check help word.
! is the negation (check help unary)
there is a line counter $0 in gnuplot (check help pseudocolumns), but it is reset to 0 after a double empty line. That's why I would use my own counter, e.g. via n=0 and n=n+1.
As far as I know, if you're using your gnuplot script in bash, you have to escape the gnuplot $ with \$, e.g. \$Data.
In order to mimic tail -n 60, i.e. only plot the last 60 datapoints of a datablock, you can use, e.g.
plot $myNrIter u ($0>|$myNrIter|-60 ? $0 : NaN):1 w lp pt 7 ti "TotEn"
Again, this may not be easy to follow, and the code below can probably still be optimized.
The following might serve as a starting point and I hope you can adapt it to your needs.
Code:
### mimic an awk script using gnuplot
reset session
# if you have a file you would first need to load it 1:1 into a datablock
# see here: https://stackoverflow.com/a/65316744/7295599
$Data <<EOD
# some header of some minimal example data
1 2 3 4 5 6 7 8 9
1 2 total cpu time 6 7 8 9.1
something else
1 2 total cpu time 6 7 8 9.2
1 total energy 4.1 5 6 7 8 9
1 2 3 4 5.1 6 7 8 9
! 2 3 4 5.01 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.11 exclamation mark
1 2 total cpu time 6 7 8 9.4
1 total energy 4.2 5 6 7 8 9
1 2 3 4 5.2 6 7 8 9
1 2 total cpu time 6 7 8 9.5
# again something else
! 2 3 4 5.02 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.22 exclamation mark
1 2 total cpu time 6 7 8 9.9
1 total energy 4.3 5 6 7 8 9
1 2 3 4 5.3 6 7 8 9
! 2 3 4 5.03 6 7 8 9
1 one line below exclamation mark
1 2nd line below 5.33 exclamation mark
EOD
set datafile missing NaN # missing data NaN
set datafile commentschar '' # no comment lines
found(n,s) = strstrt($Data[n],s)>0 # returns true or 1 if string s is found in line n of datablock
item(n,col) = word($Data[n],col) # returns column col of line n of datablock
set table $myTiming
myFilter(n,col) = found(n,'total cpu time') ? (p0=p1,p1=item(n,col),p1-p0) : NaN
plot n=(p1=NaN,0) $Data u (n=n+1, myFilter(n,9)) w table
set table $myNrIter
myFilter(n,col1,col2) = found(n,' total energy') && !found(n,'!') ? \
sprintf("%s %s",item(n,col1),item(n+1,col2)) : NaN
plot n=0 $Data u (n=n+1, myFilter(n,4,5)) w table
set table $myTotenconv
myFilter(n,col1,col2) = found(n,'!') ? sprintf("%s %s",item(n,col1),item(n+2,col2)) : NaN
plot n=0 $Data u (n=n+1, myFilter(n,5,5)) w table
unset table
print $myTiming
print $myNrIter
print $myTotenconv
set multiplot layout 2,2
plot $myNrIter u 0:1 w lp pt 7 ti "TotEn"
plot $myNrIter u 0:2 w lp pt 7 ti "Accuracy"
plot $myTotenconv u 0:1 w lp pt 7 ti "TotEnConv"
plot $myTiming u 0:1 w lp pt 7 ti "Timing (s)"
unset multiplot
### end of code
Result: (printout and plot)
0.1
0.2
0.1
0.4
4.1 5.1
4.2 5.2
4.3 5.3
5.01 5.11
5.02 5.22
5.03 5.33
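For comparison, the disk writes can also be avoided on the shell side by sending the awk output to gnuplot as an inline datablock. This is a different technique from the gnuplot-only code above, shown only as a sketch: the log content is a made-up stand-in, make_script is a hypothetical helper name, and the script is printed here rather than piped into gnuplot.

```shell
#!/bin/bash
# Sketch: build the gnuplot input in memory. The awk output becomes an
# inline datablock, so no temporary data file is written.
log='1 2 total cpu time 6 7 8 9.1
1 2 total cpu time 6 7 8 9.2
1 2 total cpu time 6 7 8 9.4'    # made-up stand-in for the real log file

make_script() {                  # hypothetical helper name
  echo '$timing << EOD'
  printf '%s\n' "$log" | awk '/total cpu time/ {print $9-p; p=$9}' | tail -n 60
  echo 'EOD'
  # Quoted delimiter: the shell leaves $timing alone, so no escaping is needed.
  cat <<'EOF'
set term dumb
plot $timing using 0:1 with lines title 'Timing (s)'
EOF
}

script=$(make_script)
printf '%s\n' "$script"          # in real use: make_script | gnuplot
```

In the monitoring loop, the printf over the stand-in log would be replaced by awk reading the real input file.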
I need to extract the path from a string, for example:
title="set key invert ; set bmargin 0 ; set multiplot ; set size 1.0 , 0.33 ; set origin 0.0 , 0.67 ; set format x "" ; set xtics offset 15.75 "1970-01-01 00:00:00" , 31556736 ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle with linespoints ls 2'"
The expected output is:
/usr/local/lucid/www/tmp/20171003101438149255.dat
using awk or grep
sed approach:
title='set key invert ; set bmargin 0 ; set multiplot ; set size 1.0 , 0.33 ; set origin 0.0 , 0.67 ; set format x "" ; set xtics offset 15.75 "1970-01-01 00:00:00" , 31556736 ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle with linespoints ls 2'
sed 's/.* plot "\([^"]\+\).*/\1/' <<<$title
/usr/local/lucid/www/tmp/20171003101438149255.dat
A grep solution:
grep -oP '"\K/[^"]*(?=")' <<< $title
An awk solution (the three-argument match() needs GNU awk):
awk '{match($0,/\/[^"]*/,a);print a[0]}' <<< $title
Shorter regex with grep:
grep -oP 'plot "\K[^"]+' <<< $title
/usr/local/lucid/www/tmp/20171003101438149255.dat
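If no external tool is wanted at all, the same extraction can be sketched with bash parameter expansion. This is an additional variant, not one of the answers above; the variable names rest and path are made up, and it assumes the path is the first double-quoted string that follows the word plot.

```shell
#!/bin/bash
# Sketch: extract the quoted path after "plot" with parameter expansion only.
title='set key invert ; set bmargin 0 ; set multiplot ; set size 1.0 , 0.33 ; set origin 0.0 , 0.67 ; set format x "" ; set xtics offset 15.75 "1970-01-01 00:00:00" , 31556736 ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle with linespoints ls 2'

rest=${title#*plot \"}   # drop everything up to and including: plot "
path=${rest%%\"*}        # drop everything from the next double quote onwards
echo "$path"
```

The word multiplot does not trigger a false match because it is not followed by a space and a double quote.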
I have a database file that looks like:
aaa bb ccc 2 3.34534 kkk 3 4.5099 34%
rr wie fff 4 4.59050 asd 6 5.0983 1.345%
I need to plot a range starting at the 'y' value in the 5th column (i.e. 3.34534) and extending up to the value in the 8th column. In other words, for the first line, either a line at y=3.34534 with a width of 4.5099-3.34534, or some sort of filled curve between y=3.34534 and y=4.5099. The same has to be done for every line: a filled curve between the value in the 5th column and the value in the 8th column. The question is how to access those values and feed them to gnuplot. A shell script, maybe? So far I have managed to save the values in two arrays, x() and y(): the 5th-column value of the first line is ${x[0]} and the 8th-column value is ${y[0]}. What remains is how to pass the array values into the gnuplot commands via a here-document (<<EOF). Any help appreciated.
If you want to have everything together in the bash script, you can first define a variable, which contains all gnuplot code (see e.g. BASH: Keeping formatting but substituting variables):
read -r -d '' GNUPLOT_SCRIPT <<EOF
set xrange [0:1];
plot x
EOF
Note that with this construct every line of the gnuplot code must be terminated with a ;.
For the plotting I would use the boxxyerrorbars plotting style, which plots boxes at a point with a given width and height. In the gnuplot using statement, the first and second value are the x and y values of the box center, the third and fourth values give the half box width and height.
You didn't say anything about the x-values, so I chose the xrange to be from 0 to 1.
Assuming that your "database" is in a string, the bash script looks like the following:
#!/bin/bash
database="aaa bb ccc 2 3.34534 kkk 3 4.5099 34%
rr wie fff 4 4.59050 asd 6 5.0983 1.345%"
read -r -d '' GNUPLOT_SCRIPT <<EOF
set xrange[0:1];
set style fill solid 1.0;
set style data boxxyerrorbars;
unset key;
plot '-' using (0.5):(0.5*(column(5)+column(8))):(0.5):(abs(0.5*(column(5) - column(8))))
EOF
echo "$database" | gnuplot -persist -e "$GNUPLOT_SCRIPT"
If you want to save the plot in a file, you don't need the -persist option.
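To make the using expression more concrete, the numbers it derives from each row can be computed with awk. This is only an illustrative cross-check of the arithmetic (box centre and half height), not a replacement for the gnuplot call; the variable name boxes is made up.

```shell
#!/bin/bash
# Sketch: the values the boxxyerrorbars "using" spec derives from each row:
# y centre = (col5 + col8)/2, half height = |col5 - col8|/2.
database="aaa bb ccc 2 3.34534 kkk 3 4.5099 34%
rr wie fff 4 4.59050 asd 6 5.0983 1.345%"

boxes=$(echo "$database" | awk '{
  centre = 0.5 * ($5 + $8)
  half   = 0.5 * ($5 - $8); if (half < 0) half = -half
  printf "%.5f %.5f\n", centre, half
}')
echo "$boxes"
```

Each output line gives the vertical centre of a box followed by its half height, so the box spans exactly from the 5th-column value to the 8th-column value.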
To answer the other part of my own question: I figured out how to do it from a file. Assuming the database is not in a string but in file.dat, and the result should be written to a .png image:
gnuplot << EOF
set terminal png
set output "niceplot.png"
plot "file.dat" using (0.5):(0.5*(column(5)+column(8))):(0.5):(abs(0.5*(column(5) - column(8)))) with boxxy fs solid 1 noborder lc rgb "red" title "Range"
EOF
Here fs solid 1 noborder lc rgb "red" title "Range" is just some styling for gnuplot.
Thanks Christoph for suggesting the error box.
So I'm looking for some quick-and-dirty solution.
The problem:
I am trying to plot a specific section of a data file with gnuplot. This is fine. The basic line goes something like
plot "<(sed -n '1,100p' pointsandstuff.dat)" u 1:log($4**2+$5**2) notitle
This works just fine. The next step I want is to include in my title another part of the data, namely the data entry $3 (which is identical for all the points listed, so I can parse it from anywhere). I run into a problem because, while plot seems fine, I can't seem to feed the sed result into 'title'. An example of something that doesn't work:
plot "<(sed -n '1,100p' pointsandstuff.dat)" u 1:log($4**2+$5**2) title "<(sed -n '1,1p' pointsandstuff.dat)"
(This would spit out a whole data line, in theory, though in practice I just get the title "<(sed...")
I tried attacking this with a bash script, but the '$'s that I use throw the bash script into a tizzy:
#!/bin/bash
STRING=$(echo|sed -n '25001,25001p' pointsandstuff.dat)
echo $STRING
gnuplot -persist << EOF
set xrange[:] noreverse nowriteback
set yrange[:] noreverse nowriteback
eval "plot "<(sed -n '25001,30000p' pointsandstuff.dat)" u 1:log($4**2+$5**2) title $STRING
EOF
Bash won't know what to do with '$4' and '$5'.
You seem to be attempting process substitution, but the double quotes stop it working in the first case and you need a command substitution in the second case.
You have:
plot "<(sed -n '1,100p' pointsandstuff.dat)" u 1:log($4**2+$5**2) \
title "<(sed -n '1,1p' pointsandstuff.dat)"
You need:
plot <(sed -n '1,100p' pointsandstuff.dat) u 1:log($4**2+$5**2) \
title "$(sed -n '1,1p' pointsandstuff.dat)"
The double quotes in the second case might not be strictly necessary, but you won't go wrong with them present.
Process substitution generates a file name and feeds the output of the nested command into that file; the command thinks it is reading a file (because it is reading a file).
Command substitution captures the output of the nested command in a string and passes that string to the command (when it is used as an argument to a command, as here).
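The difference between the two substitutions can be seen in a small bash sketch. The data and variable names below are made up for illustration, and process substitution needs bash rather than a plain POSIX sh.

```shell
#!/bin/bash
# Sketch with made-up data; requires bash (process substitution is not POSIX sh).
data='1 2 PartA
2 2 PartA
3 1 PartA'

# Inside double quotes, <(...) is plain text: nothing is executed.
quoted="<(echo hello)"

# Process substitution: <(...) becomes a file name that cat can open.
first_two=$(cat <(printf '%s\n' "$data" | sed -n '1,2p'))

# Command substitution: $(...) becomes the command's output as a string.
title="$(printf '%s\n' "$data" | sed -n '1,1p')"

printf '%s\n' "$quoted" "$first_two" "$title"
```

The first printed line is the literal text <(echo hello), which is the same effect as the title that showed up as "<(sed..." in the question.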
My understanding of the question is a little hazy, but it looks like you want to plot the first 100 lines -- This is quite easy to do:
plot '< head -100 datafile.dat' u ....
Of course, you can use sed if you wish (or awk or ...). A gnuplot only solution might look like this:
plot 'datafile.dat' u ($0 > 100? 1/0:$1):(log($4**2+$5**2))
Or like this (which is simpler for regular selections):
plot 'datafile.dat' every ::25001::30000 u 1:(log($4**2+$5**2))
and explained in more detail in another answer.
Now, if you want the title to come from the datafile, you can parse it out using gnuplot's backtick substitution:
plot ... title "`head -1 datafile.dat | awk '{print $3}'`"
which is essentially the same as gnuplot's system command:
plot ... title system("head -1 datafile.dat | awk '{print $3}'")
but in this case, you might be able to use the columnhead function:
plot ... title columnhead(3)
Aha, thanks all. Have come up with a few solutions by now, the simplest being just escaping those $s from before (which I mistakenly thought gnuplot disliked...). To wit:
STRING=$(echo|sed -n '1,1p' pointsandstuff.dat)
echo $STRING
gnuplot -persist << EOF
set xrange[:] noreverse nowriteback
set yrange[:] noreverse nowriteback
eval "plot "<(sed -n '1,100p' pointsandstuff.dat)" u 1:(log(\$4**2+\$5**2)) title '$STRING'
!gv diag_spec.eps &
EOF
Thanks all, though--it's been a good excuse to play with this stuff...here's hoping that, if any poor soul sees this script later, it might be a bit easier on them.
Just for the record, instead of $4 or $5 you could write column(4) and column(5), and then you don't have to think about whether to escape the $ or not.
Furthermore, there is no need for sed, awk, head or other system calls which make the script platform dependent.
Here is a platform-independent gnuplot-only solution:
OP does not show example data, but what I understood from the description is that the part of the data which should be plotted contains an identical text in column 3.
Note that row index is zero-based in gnuplot. The construct '+' u ... every ::0::0 is just one way of many ways to plot a single data point, and here it is just used for the title which has been extracted in the preceding plot command.
By the way, if OP wanted to plot only data which is defined by the text in column 3 (which is not clear from the question), e.g. in the example below PartB data, there would be another way, where you don't even have to specify the row indices M and N because gnuplot could simply filter the PartB data.
Data: SO12223772.dat
1 2 PartA 11 12
2 2 PartA 21 22
3 1 PartA 31 32
4 2 PartA 41 42
5 2 PartA 51 52
6 2 PartA 61 62
7 2 PartB 71 72
8 2 PartB 81 82
9 2 PartB 91 92
10 2 PartB 101 102
11 2 PartB 111 112
12 2 PartC 121 122
13 2 PartC 131 132
14 2 PartC 141 142
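As a cross-check of the zero-based row indices for this sample data, the first and last PartB rows can be located with a short awk call. This is purely an illustrative aid outside the gnuplot-only solution; the variable names data, key, and range are made up.

```shell
#!/bin/bash
# Sketch: find the 0-based first and last row whose third column is PartB.
data='1 2 PartA 11 12
2 2 PartA 21 22
3 1 PartA 31 32
4 2 PartA 41 42
5 2 PartA 51 52
6 2 PartA 61 62
7 2 PartB 71 72
8 2 PartB 81 82
9 2 PartB 91 92
10 2 PartB 101 102
11 2 PartB 111 112
12 2 PartC 121 122
13 2 PartC 131 132
14 2 PartC 141 142'

# NR is 1-based in awk, so NR - 1 gives gnuplot's 0-based row index.
range=$(printf '%s\n' "$data" | awk -v key=PartB '
  $3 == key { if (first == "") first = NR - 1; last = NR - 1 }
  END { print first, last }')
echo "$range"
```

The two numbers printed are the row indices that bound the PartB block.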
Script: (works with gnuplot>=4.4.0, March 2010)
### plotting only a part of data and extracting title from a column
reset
FILE = "SO12223772.dat"
M = 6 # row index 0-based
N = 10
set style line 1 pt 7 lc rgb "blue"
set key top left
f(col1,col2) = log(column(col1)**2+column(col2)**2)
plot FILE u 1:(f(4,5)) w lp ls 1 lc rgb "grey" ti "full data", \
'' u 1:(myTitle=strcol(3),f(4,5)) every ::M::N w lp ls 1 notitle, \
'+' u 1:(1/0) every ::0::0 w lp ls 1 ti myTitle
### end of script
Result: