Gnuplot: data normalization of multiple datasets in one file - bash

Imagine one file with 250 datasets of varying length (2000 ± 500 lines each) and 11 columns. Here is a small, easy-to-follow example:
file.sum:
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09
0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
I want to display every tenth dataset and normalize each one to its first element. For the example above, the first dataset would be normalized by 9.51287e-09 and the second by 13.28378e-09. With a dataset this large, I cannot do this manually or split the file by hand.
So far I can plot every tenth dataset, but I am stuck on the normalization.
#!/usr/bin/gnuplot
reset
set xrange [0:0.1]
plot for [val=1:250:10] 'file.sum' i val u 1:11 w l
A working script for this small example:
plot.gp:
#!/usr/bin/gnuplot
reset
set xrange [0:0.01]
plot for [val=1:2:1] 'file.sum' i val u 1:2 w l
Some hints I found in:
Gnuplot: data normalization
I guess you could write an awk script to handle this, but there may be a more gnuplot-friendly way. Any suggestions are appreciated.

Assuming you have one file with data sections, each separated by two or more empty lines, you can use the script below.
In the gnuplot console, check help pseudocolumns. column(-2) tells you which block you are in, and column(0) tells you which line of that block you are on (counting starts from 0).
Define a function Normalized(n) which does the following: if you are on the first line of a block, store the value of column(n) in the variable y0. All values of this block are then divided by y0. Also check help ternary.
If you want a legend for the blocks, you can add a dummy plot that actually plots NaN (i.e. nothing) but places an entry in the key.
Code:
### normalize each block by its first value
reset session
set colorsequence classic
$Data <<EOD
0.00000e+00 9.51287e-09
1.15418e-04 8.51287e-09
4.16445e-04 7.51287e-09
8.53721e-04 6.51287e-09
1.42697e-03 5.51287e-09
1.70302e-03 4.51287e-09
2.27189e-03 3.51287e-09
2.54732e-03 1.51287e-09
3.11304e-03 0.51287e-09


0.00000e+00 13.28378e-09
1.15418e-04 12.28378e-09
3.19663e-04 11.28378e-09
5.78178e-04 10.28378e-09
8.67479e-04 09.28378e-09
1.20883e-03 08.28378e-09
1.58817e-03 07.28378e-09
1.75840e-03 06.28378e-09
2.21069e-03 05.28378e-09
EOD
Normalized(n) = column(n)/(column(0)==0 ? y0=column(n) : y0)
plot $Data u 1:(Normalized(2)):(myBlocks=column(-2)+1) w lp pt 7 lc var notitle, \
for [i=0:myBlocks-1] '' u 1:(NaN) w lp pt 7 lc i+1 ti sprintf("Block %d",i)
### end of code
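For readers who want to check the numbers outside gnuplot, the per-block normalization described above can be sketched in Python. This is only an illustration: the function name normalize_blocks and the shortened sample are assumptions, not part of the original answer, and the sketch treats any run of blank lines as a block separator, whereas gnuplot's column(-2) only advances after two blank lines.

```python
def normalize_blocks(text, column=1):
    """Split text into blocks separated by blank lines and divide the chosen
    column of every row by that block's first value in the same column."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append([float(v) for v in line.split()])
        elif current:              # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    result = []
    for block in blocks:
        y0 = block[0][column]      # first value of the block, like y0 in Normalized(n)
        result.append([(row[0], row[column] / y0) for row in block])
    return result


# Shortened version of the example data (two blocks, two rows each).
sample = """0.0 9.51287e-09
1.15418e-04 8.51287e-09


0.0 13.28378e-09
1.15418e-04 12.28378e-09
"""

for number, block in enumerate(normalize_blocks(sample)):
    print("Block", number, block)
```

Each block starts at 1.0 after normalization, which is the same effect the ternary expression in Normalized(n) achieves inside gnuplot.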

Related

Histogram of occurrences from different datafiles

The results of my program simulations are several datafiles whose first column indicates success (=0) or error (=1) and whose second column is the simulation time in seconds.
An example of these two columns is:
1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635
1 238.5124609287994
0 246.4538392581228
. .
. .
. .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778
1 330.52804695995303
0 332.0673690346546
0 358.3001385706268
0 359.82271742496414
1 400.8162129871805
0 404.88783391725985
1 411.27012219170393
I can make a frequency plot (histogram) of the errors (1's) binning the data.
set encoding iso_8859_1
set key left top
set ylabel "P_{error}"
set xlabel "Time [s]"
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3
`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`
stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"
I want to improve the gnuplot script above so that it includes the rest of the datafiles lookup-.....-.txt and their error samples, and joins them in the same frequency plot.
I would also like to avoid intermediate files like t7.dat.
Besides, I would like to plot a horizontal line at the mean of the error probability.
How can I plot all the sample data in the same plot?
Regards
If I understand you correctly, you want to build the histogram over several files. So, you basically have to concatenate several datafiles.
Of course, you can do this with external programs like awk or with shell commands.
Below is a possible solution using gnuplot and one system command, with no need for a temporary file. The system command is for Windows, but you can probably translate it to Linux easily. You may also need to check whether the "NaN" values mess up your binning and histogram results.
### start code
reset session
# create some dummy data files
do for [i=1:5] {
set table sprintf("lookup-blahblah_%d.txt", i)
set samples 50
plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
unset table
}
# end creating dummy data files
FILELIST = system("dir /B lookup*.txt") # this is for Windows
print FILELIST
undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table
print $AllDataWithError
# ... do your binning and plotting
### end of code
Edit:
Apparently, NaN and/or empty lines seem to mess up smooth freq and/or binning?!
So, we need to extract only the lines with errors (=1).
From the above code you can merge several files into one datablock.
The code below already starts with one datablock similar to your data.
### start of code
reset session
# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
set samples 1000
plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data
stats $Data nooutput
Datapoints = STATS_records
# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
plot $Dummy u 1:2 with table
unset table
bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records
set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
$Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"
unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"
binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"
unset multiplot
### end of code
You can pipe the data and the plot directives to gnuplot without a temp file,
for example:
$ awk 'BEGIN{print "plot \"-\" using ($1):($2)";
while(i++<20) print i,rand()*20; print "e"}' | gnuplot -p
This creates a random plot. You can print the directive in the BEGIN block, as I did, and let the main awk statement filter the data.
For your plot, something like this:
$ awk 'BEGIN{print "...." }
$1==1{print $2}
END {print "e"}' lookup-*.txt | gnuplot -p
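The filtering and binning steps used in the answers above can also be checked with a small Python sketch. It mirrors the gnuplot function bin(x) = binwidth*floor(x/binwidth) and the 1.0/STATS_records weighting. The names bin_left_edge and error_histogram are hypothetical, and the rows are a few lines taken from the question's sample; lines read from several lookup files could be passed in the same way.

```python
import math
from collections import Counter


def bin_left_edge(x, width):
    """Same idea as the gnuplot function bin(x) = binwidth*floor(x/binwidth)."""
    return width * math.floor(x / width)


def error_histogram(lines, width):
    """Return {left bin edge: relative frequency} for rows whose first column is 1."""
    error_times = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "1":   # keep only error rows
            error_times.append(float(fields[1]))
    if not error_times:
        return {}
    counts = Counter(bin_left_edge(t, width) for t in error_times)
    total = len(error_times)
    return {edge: n / total for edge, n in sorted(counts.items())}


rows = [
    "1 185.48736852299064",
    "1 199.44533672989186",
    "0 219.62444310777695",
    "1 238.5124609287994",
    "0 246.4538392581228",
]

# Two of the three error rows fall into the bin starting at 150, one into the bin at 200.
print(error_histogram(rows, 50))
```

Because success rows are dropped before binning, no NaN or empty entries reach the histogram, which is the issue the edited gnuplot answer works around.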

How to make animated gif in Gnuplot 5

Basically, I have solved the heat equation for (x,y,t) and I want to show how the temperature function varies with time. The program was written in Fortran 90 and the solution data was stored in the file diffeqn3D_file.txt.
This is the program:
Program diffeqn3D
Implicit none
Integer:: b,c,d,l,i,j,k,x,y,t
Real:: a,r,s,h,t1,k1,u,v,tt,p
Real,Dimension(0:500,0:500,0:500):: f1 !f=f(x,t)
!t1=time step and h=position step along x and
!k=position step along y and a=conductivity
open(7, file='diffeqn3D_file.txt', status='unknown')
a=0.024
t1=0.1
h=0.1
k1=0.1
r=(h**2)/(k1**2)
s=(h**2)/(a*t1)
l=10
tt=80.5
b=100
c=100
d=100
!The temperature is TT at x=0 and 0 at x=l.
!The rod is heated along the line x=0.
!Initial conditions to be changed as per problem..
Do x=0,b
Do y=0,c
Do t=0,d
If(x==0) Then
f1(x,y,t)=tt
Else If((x.ne.0).and.t==0) Then
f1(x,y,t)=0
End If
End Do
End Do
End Do
print *,f1(9,7,5)
print *,r
print *,a,h,t1,h**2,a*t1,(h**2)/(a*t1)
print *,f1(0,1,1)
print *,f1(3,1,1)
!num_soln_of_eqnwrite(7,*)
Do t=1,d
Do y=1,c-1
Do x=1,b-1
p=f1(x-1,y,t-1)+f1(x+1,y,t-1)+r*f1(x,y-1,t-1)+r*f1(x,y+1,t-1)-(2+2*r-s)*f1(x,y,t-1)
f1(x,y,t)=p/s
!f1(x,t)=0.5*(f1(x-1,t-1)+f1(x+1,t-1))
!print *,f1(x,t),b
End Do
End Do
End Do
Do i=0,d
Do k=0,b
Do j=0,c
u=k*h
v=j*k1
write(7,*) u,v,f1(k,j,i)
End Do
End Do
write(7,*) " "
write(7,*) " "
End Do
close(7)
End Program diffeqn3D
After compiling and running it, I enter the following code in gnuplot, but it does not work: either it hangs, or it creates a single GIF picture rather than an animation.
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
do for [i=1:int(STATS_blocks)] {
splot 'diffeqn3D_file.txt'
}
Sometimes it also puts up a warning message, citing no z-values for autoscale range.
What is wrong with my code and how should I proceed?
First, try to add some print commands for "debug" information:
set terminal gif animate delay 1
set output 'diffeqn3D.gif'
stats 'diffeqn3D_file.txt' nooutput
print int(STATS_blocks)
do for [i=1:int(STATS_blocks)] {
print i
splot 'diffeqn3D_file.txt'
}
Second, what happens?
The splot command does not have an index specifier, try to use:
splot 'diffeqn3D_file.txt' index i
Without the index i, gnuplot always plots the whole file, which has two consequences:
The data file is quite large, so plotting takes a long time and gnuplot seems to hang.
Gnuplot always plots the same data, so nothing changes from frame to frame in the animation.
Now gnuplot runs much faster and we will fix the autoscale error. Again, there are two points:
The index specifies a data set within the data file. The stats command counts those sets which "are separated by pairs of blank records" (from gnuplot documentation). Your data file ends with a pair of blank records - this starts a new data set in gnuplot. But this data set is empty which finally leads to the error. There are only STATS_blocks-1 data sets.
The index is zero based. The loop should start with 0 and end at STATS_blocks-2.
So we arrive at this plot command:
do for [i=0:int(STATS_blocks)-2] {
print i
splot 'diffeqn3D_file.txt' index i
}
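The indexing argument above can be illustrated with a short Python sketch. It is not gnuplot's parser: split_index_blocks is a hypothetical helper that splits text into data sets separated by two or more blank lines and drops empty ones, so the valid zero-based indices are easy to see. Whitespace-only lines count as blank here because the Fortran program writes " ".

```python
def split_index_blocks(text):
    """Split text into data sets separated by two or more blank lines,
    dropping sets that contain no data lines."""
    blocks, current, blank_run = [], [], 0
    for raw in text.splitlines():
        line = raw.strip()            # whitespace-only lines count as blank
        if line:
            if blank_run >= 2 and current:
                blocks.append(current)   # two or more blanks close the previous set
                current = []
            current.append(line)
            blank_run = 0
        else:
            blank_run += 1
    if current:
        blocks.append(current)
    return blocks


# Two time steps, each followed by a pair of blank records, as the Fortran loop writes them.
text = "0 0 80.5\n0 0.1 80.5\n \n \n0 0 80.5\n0 0.1 80.5\n \n \n"

blocks = split_index_blocks(text)
print("non-empty data sets:", len(blocks))
print("valid indices:", list(range(len(blocks))))
```

The trailing pair of blank records adds no usable data set, so the loop should cover indices 0 up to the number of non-empty sets minus one.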

How to calculate number of missing values summed over time dimension in a netcdf file in bash

I have a netcdf file with data as a function of lon,lat and time. I would like to calculate the total number of missing entries in each grid cell summed over the time dimension, preferably with CDO or NCO so I do not need to invoke R, python etc.
I know how to get the total number of missing values
ncap2 -s "nmiss=var.number_miss()" in.nc out.nc
as I answered to this related question:
count number of missing values in netcdf file - R
and CDO can tell me the total summed over space with
cdo info in.nc
but I can't work out how to sum over time. Is there a way for example of specifying the dimension to sum over with number_miss in ncap2?
We added the missing() function to ncap2 to solve this problem elegantly as of NCO 4.6.7 (May, 2017). To count missing values through time:
ncap2 -s 'mss_val=three_dmn_var_dbl.missing().ttl($time)' in.nc out.nc
Here ncap2 chains two methods together, missing(), followed by a total over the time dimension. The 2D variable mss_val is in out.nc. The response below does the same but averages over space and reports through time (because I misinterpreted the OP).
Old/obsolete answer:
There are two ways to do this with NCO/ncap2, though neither is as elegant as I would like. Either assemble the answer one record at a time by calling number_miss() on each record, or (my preference) use the boolean comparison function followed by the total operator along the axes of choice:
zender@aerosol:~$ ncap2 -O -s 'tmp=three_dmn_var_dbl;mss_val=tmp.get_miss();tmp.delete_miss();tmp_bool=(tmp==mss_val);tmp_bool_ttl=tmp_bool.ttl($lon,$lat);print(tmp_bool_ttl);' ~/nco/data/in.nc ~/foo.nc
tmp_bool_ttl[0]=0
tmp_bool_ttl[1]=0
tmp_bool_ttl[2]=0
tmp_bool_ttl[3]=8
tmp_bool_ttl[4]=0
tmp_bool_ttl[5]=0
tmp_bool_ttl[6]=0
tmp_bool_ttl[7]=1
tmp_bool_ttl[8]=0
tmp_bool_ttl[9]=2
or
zender@aerosol:~$ ncap2 -O -s 'for(rec=0;rec<time.size();rec++){nmiss=three_dmn_var_int(rec,:,:).number_miss();print(nmiss);}' ~/nco/data/in.nc ~/foo.nc
nmiss = 0
nmiss = 0
nmiss = 8
nmiss = 0
nmiss = 0
nmiss = 1
nmiss = 0
nmiss = 2
nmiss = 1
nmiss = 2
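The difference between totalling over time (one count per grid cell, as the question asks) and totalling over space (one count per time step, as in the older answer) can be seen in a tiny pure-Python sketch. The fill value -999.0 and the 3 x 2 x 2 array are made up for illustration and do not come from in.nc.

```python
FILL = -999.0   # hypothetical missing-value marker for this sketch

# data[time][lat][lon]: 3 time steps on a 2 x 2 grid
data = [
    [[1.0, FILL], [3.0, 4.0]],
    [[FILL, FILL], [3.0, 4.0]],
    [[1.0, FILL], [FILL, 4.0]],
]

n_time, n_lat, n_lon = len(data), len(data[0]), len(data[0][0])

# Total over time: one count per grid cell (what the question asks for).
missing_per_cell = [
    [sum(data[t][j][i] == FILL for t in range(n_time)) for i in range(n_lon)]
    for j in range(n_lat)
]

# Total over space: one count per time step (what the older answer reports).
missing_per_time = [
    sum(value == FILL for row in data[t] for value in row) for t in range(n_time)
]

print(missing_per_cell)   # [[1, 3], [1, 0]]
print(missing_per_time)   # [1, 2, 2]
```

Both results add up to the same grand total of five missing entries; only the axis that is collapsed differs.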
Even though you are asking for another solution, I would like to show you that it takes only one very short line to find the answer with the help of Python. The variable m_data has exactly the same shape as a variable with missing values read using the netCDF4 package. With the execution of only one np.sum command with the correct axis specified, you have your answer.
import numpy as np
import matplotlib.pyplot as plt
import netCDF4 as nc4
# Generate random data for this experiment.
data = np.random.rand(365, 64, 128)
# Masked data, this is how the data is read from NetCDF by the netCDF4 package.
# For this example, I mask all values less than 0.1.
m_data = np.ma.masked_array(data, mask=data<0.1)
# It only takes one operation to find the answer.
n_values_missing = np.sum(m_data.mask, axis=0)
# Just a plot of the result.
plt.figure()
plt.pcolormesh(n_values_missing)
plt.colorbar()
plt.xlabel('lon')
plt.ylabel('lat')
plt.show()
# Save a netCDF file of the results.
f = nc4.Dataset('test.nc', 'w', format='NETCDF4')
f.createDimension('lon', 128)
f.createDimension('lat', 64 )
n_values_missing_nc = f.createVariable('n_values_missing', 'i4', ('lat', 'lon'))
n_values_missing_nc[:,:] = n_values_missing[:,:]
f.close()

Fit many pieces of data in Gnuplot's for loop

Data
Model Decreasing-err Constant-err Increasing-err
2025 73-78 80-85 87-92
2035 63-68 80-85 97-107
2050 42-57 75-90 104.5-119.5
whose data structure (use of -err) is described here.
To plot the points, I run
set terminal qt size 560,270;
set grid; set offset 1,1,0,0;
set datafile separator " -";
set key autotitle columnhead;
plot for [i=2:6:2] "data.dat" using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i))) with yerror;
However, I would like to add line fits to these points, which cannot be done with yerrorlines alone because of the kinks.
My pseudocode for fitting the increasing and decreasing lines
inc(x) = k1*x + k2;
con(x) = n1*x + n2;
dec(x) = m1*x + m2;
fit inc(x), con(x) dec(x) for [i=2:6:2] "data.dat"
using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i)))
via k1,k2,n1,n2,m1,m2;
where the problem is in using the function fit with for loop.
How can you use Gnuplot's fit in a for loop?
I would like to fit many lines at the same time to the data.
I would use do in conjunction with eval to do this:
# Define your functions (you can also use do + eval to do this)
f1(x) = a1*x+b1
f2(x) = a2*x+b2
f3(x) = a3*x+b3
# Loop
do for [i=1:3] {
eval sprintf("fit f%g(x) 'data.dat' u 0:%g via a%g, b%g", i, i, i, i)
}
You can adapt the above to your own purposes.
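To see what the do/eval loop actually hands to gnuplot, the same string formatting can be reproduced in Python. This is only an illustration of the command generation, not a fit: fit_commands is a hypothetical helper, and the file name data.dat is taken from the answer above.

```python
def fit_commands(count, datafile="data.dat"):
    """Build the strings that the gnuplot loop passes to eval, one per function."""
    return [
        f"fit f{i}(x) '{datafile}' u 0:{i} via a{i}, b{i}"
        for i in range(1, count + 1)
    ]


for command in fit_commands(3):
    print(command)
```

Printing the generated commands first is a cheap way to check the column numbers and parameter names before letting eval run them in gnuplot.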

Plot 3d scattered data using gnuplot

Hi, I have some 3D scattered data (the file is named data.txt) which looks like the following:
0 0 0
-1.08051e-16 -1.73991e-16 -1.79157e-16
-1.02169e-15 -1.19283e-15 5.92632e-16
3.41114e-16 -1.02211e-15 3.19436e-15
-4.51742e-15 -5.18861e-15 -4.60754e-15
-2.00685e-15 -4.67813e-15 -4.86101e-15
-9.82727e-16 -2.24413e-15 -5.87927e-16
-7.74439e-16 -9.73515e-16 -1.69707e-15
4.32668e-16 2.15869e-15 -2.25004e-15
-3.74495e-15 -2.20596e-15 -7.33201e-16
-4.97941e-16 -5.45749e-16 -2.93136e-15
-2.40174e-15 -4.31022e-15 7.13531e-15
-4.58812e-15 -4.38568e-15 -9.99635e-16
-7.00716e-15 7.53852e-15 -8.484e-15
4.50028e-15 2.2255e-15 2.32808e-15
-8.57887e-15 3.09127e-15 -3.49207e-15
-2.0608e-16 -6.06078e-15 -6.07822e-16
-7.76829e-15 -1.47001e-14 -1.08924e-14
1.04016e-15 6.33122e-16 -2.11985e-15
2.33557e-15 -7.92667e-15 2.52748e-15
6.94335e-15 3.70286e-15 -1.44815e-15
.........
The 1st, 2nd and 3rd columns represent the x, y and z axes, respectively.
I'd like to use the splot command to plot this data. Can anyone kindly give some suggestions? Thanks.
Since your data is nicely formatted, you could start with
splot 'data.txt'
If you want to get fancy, you can add some options to change how it is plotted:
splot 'data.txt' with points pointtype 7
What kind of suggestions are you looking for?
