Histogram of occurrences from different datafiles - shell

My simulations produce several datafiles. In each file, the first column indicates success (=0) or error (=1), and the second column is the simulation time in seconds.
An example of these two columns is:
1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635
1 238.5124609287994
0 246.4538392581228
. .
. .
. .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778
1 330.52804695995303
0 332.0673690346546
0 358.3001385706268
0 359.82271742496414
1 400.8162129871805
0 404.88783391725985
1 411.27012219170393
I can make a frequency plot (histogram) of the errors (1's) by binning the data:
set encoding iso_8859_1
set key left top
set ylabel "P_{error}"
set xlabel "Time [s]"
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3
`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`
stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"
The result
I want to improve the gnuplot script above to include the rest of the datafiles lookup-.....-.txt and their error samples, and combine them in the same frequency plot.
I would also like to avoid intermediate files like t7.dat.
In addition, I would like to plot a horizontal line at the mean of the error probability.
How could I plot all the sample data in the same plot?
Regards

If I understand you correctly, you want to build the histogram over several files, so you basically have to concatenate several datafiles.
Of course, you can do this with external programs like awk or with shell commands.
Below is a possible solution using gnuplot and a system command, with no need for a temporary file. The system command is for Windows, but you can probably translate it to Linux easily. You may also need to check that the "NaN" values do not mess up your binning and histogram results.
### start code
reset session
# create some dummy data files
do for [i=1:5] {
set table sprintf("lookup-blahblah_%d.txt", i)
set samples 50
plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
unset table
}
# end creating dummy data files
FILELIST = system("dir /B lookup*.txt") # this is for Windows
print FILELIST
undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table
print $AllDataWithError
# ... do your binning and plotting
### end of code
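On Linux, the file listing and the filtering step could also be done in the shell. The following is only a minimal sketch: the function name error_rows and the demo file names are made up for illustration, and only the lookup*.txt pattern comes from the code above.

```shell
#!/bin/sh
# error_rows FILE...
# Print only the rows whose first column is 1 (error) from all files
# given as arguments, concatenated in argument order.
error_rows() {
    awk '$1 == 1' "$@"
}

# Demo with two small made-up files in a scratch directory.
demo_dir=$(mktemp -d)
printf '1 185.48\n0 219.62\n' > "$demo_dir/lookup-a.txt"
printf '0 222.26\n1 238.51\n' > "$demo_dir/lookup-b.txt"

# Counterpart of "dir /B lookup*.txt": bare file names, one per line.
(cd "$demo_dir" && ls lookup*.txt)

# All error rows from all matching files.
error_rows "$demo_dir"/lookup*.txt
rm -r "$demo_dir"
```

Inside gnuplot, the same listing would presumably be obtained with system("ls lookup*.txt") in place of the Windows command.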
Edit:
Apparently, NaN and/or empty lines seem to mess up smooth freq and/or binning?!
So, we need to extract only the lines with errors (=1).
From the above code you can merge several files into one datablock.
The code below already starts with one datablock similar to your data.
### start of code
reset session
# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
set samples 1000
plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data
stats $Data nooutput
Datapoints = STATS_records
# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
plot $Dummy u 1:2 with table
unset table
bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records
set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
$Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"
unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"
binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"
unset multiplot
### end of code
which results in something like:

You can pipe the data and the plot directives to gnuplot without a temporary file,
for example:
$ awk 'BEGIN{print "plot \"-\" using ($1):($2)";
while(i++<20) print i,rand()*20; print "e"}' | gnuplot -p
will create a random plot. You can print the directive in the BEGIN block as I did and the main awk statement can filter the data.
For your plot, something like this
$ awk 'BEGIN{print "...." }
$1==1{print $2}
END {print "e"}' lookup-*.txt | gnuplot -p
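If the binning itself should also happen outside gnuplot, awk can compute it over all files at once. This is a rough sketch, not the plot directive elided above: the function name error_histogram and the demo files are hypothetical, and the normalization (errors in the bin divided by the total number of errors) follows the 1.0/STATS_records weighting from the question.

```shell
#!/bin/sh
# error_histogram WIDTH FILE...
# Prints "bin_start fraction" for the error rows (first column == 1),
# where fraction = errors in that bin / total number of errors.
error_histogram() {
    width=$1
    shift
    awk -v w="$width" '
        $1 == 1 {
            b = w * int($2 / w)   # bin start; equals width*floor(x/width) for x >= 0
            count[b]++
            n++
        }
        END {
            for (b in count) printf "%d %.4f\n", b, count[b] / n
        }' "$@" | sort -n
}

# Demo with made-up sample files in a scratch directory.
demo_dir=$(mktemp -d)
printf '1 185.4\n1 199.4\n0 219.6\n1 238.5\n' > "$demo_dir/lookup-a.txt"
printf '1 207.3\n0 222.2\n1 307.4\n' > "$demo_dir/lookup-b.txt"
error_histogram 100 "$demo_dir"/lookup-*.txt
rm -r "$demo_dir"
```

The two output columns are the bin start and the bin height, so they could be plotted with boxes directly, without smooth freq.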

Related

Get line number where first occurrence of a value appears?

I have a CSV file like below:
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
I'm trying to find the row number where the final column first drops below a certain value, for example 0.6.
Using the above CSV file, I want to return 3, because E = 3 is the first row where Mean <= 0.60. If no value is below 0.6, I want to return 0. In effect, I am returning the value in the first column based on the final column.
I plan to initialize this number as a constant in gnuplot. How can this be done? I've tagged awk because I think it's related.
In case you want a gnuplot-only version: if you use a file, remove the datablock and replace $Data by your filename in " ".
Edit: You can do it without a dummy table; it is shorter with stats (check help stats). It is even shorter than the accepted solution (well, this is not code golf), and additionally platform-independent because it is gnuplot-only.
Furthermore, in case E could be any number, including 0, it might be better to first assign E = NaN and then compare E to NaN (see here: gnuplot: How to compare to NaN?).
Script:
### conditional extraction into a variable
reset session
$Data <<EOD
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
EOD
E = NaN
stats $Data u ($8<=0.6 && E!=E? E=$1 : 0) nooutput
print E
### end of script
Result:
3.0
Actually, OP wants to return E=0 if the condition was not met. Then the script would be like this:
E=0
stats $Data u ($8<=0.6 && E==0? E=$1 : 0) nooutput
Another awk. You could initialize the default return value in the variable ret in BEGIN, but since it is 0 there is really no point, as an empty variable plus 0 produces the same result. If the threshold of 0.6 is not met before END is reached, that default is returned. If it is met, exit jumps to END and ret is output:
$ awk '
NR>1 && $NF<0.6 { # final column has a value below a certain range
ret=$1 # I want to return 3 because E = 3
exit
}
END {
print ret+0
}' file
Output:
3
Something like this should do the trick:
awk 'NR>1 && $8<.6 {print $1;fnd=1;exit}END{if(!fnd){print 0}}' yourfile
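Either awk answer can be wrapped in a small shell function so that the threshold is a parameter and the no-match case is easy to check. This is a sketch only: the function name first_below and the demo file are hypothetical, and the comparison is strictly "below", as in the awk answers.

```shell
#!/bin/sh
# first_below THRESHOLD FILE
# Prints the first-column value of the first data row (header skipped)
# whose last column is below THRESHOLD, or 0 if no row qualifies.
first_below() {
    awk -v t="$1" 'NR > 1 && $NF < t { ret = $1; exit } END { print ret + 0 }' "$2"
}

# Demo with the first rows of the table from the question.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOD'
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
EOD
first_below 0.6 "$demo_file"   # a row qualifies
first_below 0.4 "$demo_file"   # no row qualifies
rm "$demo_file"
```

The printed value can then be captured from gnuplot with system().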

Gnuplot: data normalization of multiple dataset in one file

Imagine one file with 250 datasets of varying length (2000 ± 500 lines) and 11 columns. Here is a small example:
file.sum:
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09


0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
I want to display every 10th dataset and normalize each one to its first element. The first normalization value is 9.51287e-09 and the second would be 13.28378e-09. Of course, with such a massive dataset I cannot do this manually or even split the file.
So far I can plot every tenth dataset, but I have problems with the normalization.
#!/usr/bin/gnuplot
reset
set xrange [0:0.1]
plot for [val=1:250:10] 'file.sum' i val u 1:11 w l
A working version of this example:
plot.gp:
#!/usr/bin/gnuplot
reset
set xrange [0:0.01]
plot for [val=1:2:1] 'file.sum' i val u 1:2 w l
Some hints I found in:
Gnuplot: data normalization
I guess you could write an awk script to handle this, but there may be a more gnuplot-friendly way. Any suggestions are appreciated.
Assuming you have one file with data sections each separated by two or more empty lines, you can use the script below.
In the gnuplot console, check help pseudocolumns. column(-2) tells you which block you are in, and column(0) tells you which line of this block you are on (counting starts from 0).
Define a function Normalized(n) which does the following: if you are in the first line of a subblock, put the value of column(n) into the variable y0. All values of this block will then be divided by y0. Also check help ternary.
In case you want a legend for the blocks you can plot a dummy plot, actually plotting NaN (i.e. nothing) but place an entry for the key.
Code:
### normalize each block by its first value
reset session
set colorsequence classic
$Data <<EOD
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09


0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
EOD
Normalized(n) = column(n)/(column(0)==0 ? y0=column(n) : y0)
plot $Data u 1:(Normalized(2)):(myBlocks=column(-2)+1) w lp pt 7 lc var notitle, \
for [i=0:myBlocks-1] '' u 1:(NaN) w lp pt 7 lc i+1 ti sprintf("Block %d",i)
### end of code
Result:
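Since the question mentions awk as an alternative, the same per-block normalization could be sketched outside gnuplot as follows. The function name normalize_blocks is hypothetical. The sketch assumes that blocks are separated by blank lines, that the value to normalize is in column 2, and that the first value of each block is nonzero.

```shell
#!/bin/sh
# normalize_blocks FILE
# Divides column 2 of every row by the column-2 value of the first row
# of its block. Blank lines are passed through unchanged.
normalize_blocks() {
    awk '
        BEGIN { newblock = 1 }
        NF == 0 { newblock = 1; print ""; next }   # blank line: the next data row starts a new block
        newblock { y0 = $2; newblock = 0 }         # remember the first value of the block
        { printf "%s %.6f\n", $1, $2 / y0 }' "$1"
}

# Demo with two short made-up blocks separated by two blank lines.
demo_file=$(mktemp)
printf '0.0 2.0\n1.0 1.0\n\n\n0.0 4.0\n1.0 1.0\n' > "$demo_file"
normalize_blocks "$demo_file"
rm "$demo_file"
```

Because the blank lines are kept, the output still works with gnuplot's index keyword.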

How to make animated gif in Gnuplot 5

Basically, I have solved the heat equation for (x,y,t) and I want to show the variation of the temperature function with time. The program was written in Fortran 90 and the solution data was stored in the file diffeqn3D_file.txt.
This is the program:
Program diffeqn3D
Implicit none
Integer:: b,c,d,l,i,j,k,x,y,t
Real:: a,r,s,h,t1,k1,u,v,tt,p
Real,Dimension(0:500,0:500,0:500):: f1 !f=f(x,t)
!t1=time step and h=position step along x and
!k=position step along y and a=conductivity
open(7, file='diffeqn3D_file.txt', status='unknown')
a=0.024
t1=0.1
h=0.1
k1=0.1
r=(h**2)/(k1**2)
s=(h**2)/(a*t1)
l=10
tt=80.5
b=100
c=100
d=100
!The temperature is TT at x=0 and 0 at x=l.
!The rod is heated along the line x=0.
!Initial conditions to be changed as per problem..
Do x=0,b
Do y=0,c
Do t=0,d
If(x==0) Then
f1(x,y,t)=tt
Else If((x.ne.0).and.t==0) Then
f1(x,y,t)=0
End If
End Do
End Do
End Do
print *,f1(9,7,5)
print *,r
print *,a,h,t1,h**2,a*t1,(h**2)/(a*t1)
print *,f1(0,1,1)
print *,f1(3,1,1)
!num_soln_of_eqnwrite(7,*)
Do t=1,d
Do y=1,c-1
Do x=1,b-1
p=f1(x-1,y,t-1)+f1(x+1,y,t-1)+r*f1(x,y-1,t-1)+r*f1(x,y+1,t-1)-(2+2*r-s)*f1(x,y,t-1)
f1(x,y,t)=p/s
!f1(x,t)=0.5*(f1(x-1,t-1)+f1(x+1,t-1))
!print *,f1(x,t),b
End Do
End Do
End Do
Do i=0,d
Do k=0,b
Do j=0,c
u=k*h
v=j*k1
write(7,*) u,v,f1(k,j,i)
End Do
End Do
write(7,*) " "
write(7,*) " "
End Do
close(7)
End Program diffeqn3D
After compiling and running it, I enter the following code in gnuplot, but it does not work: it either hangs or creates a static GIF picture, not an animation.
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
do for [i=1:int(STATS_blocks)] {
splot 'diffeqn3D_file.txt'
}
Sometimes it also puts up a warning message, citing no z-values for autoscale range.
What is wrong with my code and how should I proceed?
First, try to add some print commands for "debug" information:
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
print int(STATS_blocks)
do for [i=1:int(STATS_blocks)] {
print i
splot 'diffeqn3D_file.txt'
}
Second, what happens?
The splot command does not have an index specifier, try to use:
splot 'diffeqn3D_file.txt' index i
Without the index i, gnuplot always plots the whole file, which has two consequences:
The data file is quite large, so plotting takes a long time and gnuplot seems to hang.
Gnuplot always plots the same data, so no changes show up in the animation.
Now gnuplot runs much faster and we will fix the autoscale error. Again, there are two points:
The index specifies a data set within the data file. The stats command counts those sets which "are separated by pairs of blank records" (from gnuplot documentation). Your data file ends with a pair of blank records - this starts a new data set in gnuplot. But this data set is empty which finally leads to the error. There are only STATS_blocks-1 data sets.
The index is zero based. The loop should start with 0 and end at STATS_blocks-2.
So we arrive at this plot command:
do for [i=0:int(STATS_blocks)-2] {
print i
splot 'diffeqn3D_file.txt' index i
}
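To check the loop bounds independently of gnuplot, the non-empty data sets can be counted with awk. This is a sketch that only approximates gnuplot's rule (a new data set starts after two or more consecutive blank lines); the function name count_datasets is hypothetical. Whitespace-only lines, like the " " records written by the Fortran program, count as blank.

```shell
#!/bin/sh
# count_datasets FILE
# Counts the non-empty data sets in FILE, where data sets are separated
# by two or more consecutive blank (or whitespace-only) lines.
count_datasets() {
    awk '
        NF == 0 { blanks++; next }
        {
            if (n == 0 || blanks >= 2) n++   # first data row, or a row after two or more blank lines
            blanks = 0
        }
        END { print n + 0 }' "$1"
}

# Demo: two made-up data sets, each followed by a pair of " " records.
demo_file=$(mktemp)
printf '0.0 0.0 80.5\n0.0 0.1 80.5\n \n \n0.0 0.0 80.5\n0.0 0.1 80.5\n \n \n' > "$demo_file"
count_datasets "$demo_file"
rm "$demo_file"
```

With N non-empty data sets, the valid index values run from 0 to N-1, which corresponds to the loop from 0 to STATS_blocks-2 above.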

gnuplot: set columnheader as label

Is there a chance to set the header of the data file columns as label (not as key)?
I have data files with 5 or 6 columns and a header above each column. Now I would like to use the columnheader with the set label command. Is this possible?
On a Unix-like system, the head command helps:
header = system("head -n 1 ".filename)
label1 = word(header,1)
label2 = word(header,2)
...
set label 1 label1 at 0.5,0.5
set label 2 ....
MS Windows does not have the head command; you might use 'findstr /B \"#\"' instead, if the header line begins with a "#". Alternatively, use Cygwin to get a full GNU + POSIX environment under Windows.
The word() function should split your header string at the same positions as columnhead(). Unless of course you have a different separator (not space or tab):
separator =","
p1 = strstrt(header,separator)
p2 = strstrt(header[p1+1:],separator)
...
label1=header[1:p1-1]
...
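With a non-whitespace separator, the splitting can also be left to the shell, so that gnuplot only receives the finished label text. This is a sketch under assumptions: the function name header_field and the demo file are hypothetical, and the separator is assumed to be a single character (comma by default).

```shell
#!/bin/sh
# header_field FILE N [SEP]
# Prints the N-th field of the first line of FILE, split at SEP
# (a single character, comma if omitted).
header_field() {
    head -n 1 "$1" | awk -F"${3:-,}" -v n="$2" '{ print $n }'
}

# Demo with a made-up comma-separated file.
demo_file=$(mktemp)
printf 'time,voltage,current\n0,1.5,0.2\n' > "$demo_file"
header_field "$demo_file" 2
rm "$demo_file"
```

The same pipeline could be passed to gnuplot's system() to fill label1, label2 and so on, one call per column.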

Fit many pieces of data in Gnuplot's for loop

Data
Model Decreasing-err Constant-err Increasing-err
2025 73-78 80-85 87-92
2035 63-68 80-85 97-107
2050 42-57 75-90 104.5-119.5
whose data structure (the use of -err) is described here.
To plot the points, I run
set terminal qt size 560,270;
set grid; set offset 1,1,0,0;
set datafile separator " -";
set key autotitle columnhead;
plot for [i=2:6:2] "data.dat" using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i))) with yerror;
and get
However, I would like to add line fits to these points, which you cannot do with yerrorlines alone because of the kinks.
My pseudocode for fitting the increasing and decreasing lines
inc(x) = k1*x + k2;
con(x) = n1*x + n2;
dec(x) = m1*x + m2;
fit inc(x), con(x) dec(x) for [i=2:6:2] "data.dat"
using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i)))
via k1,k2,n1,n2,m1,m2;
where the problem is in using the function fit with for loop.
How can you use Gnuplot's fit in a for loop?
I would like to fit many lines at the same time to the data.
I would use do in conjunction with eval to do this:
# Define your functions (you can also use do + eval to do this)
f1(x) = a1*x+b1
f2(x) = a2*x+b2
f3(x) = a3*x+b3
# Loop
do for [i=1:3] {
eval sprintf("fit f%g(x) 'data.dat' u 0:%g via a%g, b%g", i, i, i, i)
}
You can adapt the above to your own purposes.
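To see which commands the loop hands to eval, the same strings can be generated in the shell. This sketch only prints the fit commands and does not run gnuplot; the function name fit_commands is hypothetical, and the function names f1, f2, ... and parameters a1, b1, ... follow the answer above.

```shell
#!/bin/sh
# fit_commands N DATAFILE
# Prints one gnuplot fit command per function f1..fN, in the same form
# as the sprintf() call in the loop above.
fit_commands() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf "fit f%d(x) '%s' u 0:%d via a%d, b%d\n" "$i" "$2" "$i" "$i" "$i"
        i=$((i + 1))
    done
}

fit_commands 3 data.dat
```

Printing the commands first is a cheap way to check the column numbers and parameter names before letting eval execute them.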

Resources