To sort a file by column, Linux users have the sort utility.
Windows users have to install something like CoreUtils from GnuWin to get the same (or similar) functionality.
So the minimal code for sorting the file first by column 1 and then by column 2, and then plotting it, would be something like this:
plot '<sort -k 1,2 "myFile.dat"' u 1:2
Now, however, I have a datablock $Data:
$Data <<EOD
1 6
4 8
3 7
2 5
1 4
2 3
EOD
The commands I have tried so far, all of which end in error messages:
plot '<sort -k 1,2' $Data u 1:2
#--> Bad data on line 1 of file <sort -k 1,2
plot '<sort -k 1,2 $Data' u 1:2
plot '<sort -k 1,2 <$Data' u 1:2
#--> warning: Skipping data file with no valid points
#--> x range is invalid
plot '<sort -k 1,2 '<$Data u 1:2
#--> Column number or datablock line expected
I don't want to write the datablock to a file first and then read it back from the file. I currently don't see how to redirect the content of $Data to the standard input of sort. Is there a solution that works on Windows as well as on Linux?
Update:
When using @Ethan's suggested code, I get the following result. Note the lines 2 5 and 2 3, which I expected in the opposite order (and which Ethan gets in that order).
# Curve 0 of 1, 6 points
# Curve title: "$Data_1 using 1:2:1"
# x y type
1 4 i
1 6 i
2 5 i
2 3 i
3 7 i
4 8 i
Any idea why this is? I'm running gnuplot 5.4.1 on Win10.
It wouldn't be gnuplot if there weren't a workaround. It's a bit cumbersome, but it seems to work.
Apparently, smooth zsort sorts each sub-block separately. Hence, after the first sort you "simply" need to split your data into sub-blocks wherever the value in the first column changes:
1. sort by the first column
2. insert an empty line wherever the value in the first column changes
3. sort by the second column
4. plot it into a table to remove the empty lines again
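The four steps above can be sketched in Python to show why the empty lines matter: each sub-block only contains rows with the same first-column value, so sorting each sub-block by the second column cannot disturb the order by the first column, whether or not the sort routine is stable. This illustrates the logic only and is not gnuplot code; the function name `two_key_sort_via_blocks` is hypothetical.

```python
from itertools import groupby


def two_key_sort_via_blocks(rows):
    """Sort (x, y) rows by x, then y, using the sub-block workaround."""
    # Step 1: sort by the first column only.
    by_first = sorted(rows, key=lambda r: r[0])
    # Step 2: split into sub-blocks wherever the first column changes
    # (in gnuplot: the empty line printed between blocks).
    blocks = [list(g) for _, g in groupby(by_first, key=lambda r: r[0])]
    # Step 3: sort each sub-block by the second column independently.
    blocks = [sorted(b, key=lambda r: r[1]) for b in blocks]
    # Step 4: join the blocks again (gnuplot: "plot ... w table" drops the empty lines).
    return [r for b in blocks for r in b]


data = [(1, 6), (4, 8), (3, 7), (2, 5), (1, 4), (2, 3)]
print(two_key_sort_via_blocks(data))
# [(1, 4), (1, 6), (2, 3), (2, 5), (3, 7), (4, 8)]
```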
Code (edit: a graphical representation makes it easier to illustrate the undesired behaviour of zsort, which occurs under Windows only):
### sorting datablock, "bug": Windows zsort does not preserve order
reset session
# create some random test data
set print $Data
do for [i=1:100] {
print sprintf("%g %g", int(rand(0)*10), int(rand(0)*10))
}
set print
# order not preserving (only under Windows)
set table $Data1
plot $Data u 1:2:2 smooth zsort
set table $Data2
plot $Data1 u 1:2:1 smooth zsort
unset table
# order preserving (even under Windows, but cumbersome)
set table $Data3
plot $Data u 1:2:1 smooth zsort
unset table
set print $Data4
do for [i=1:|$Data3|] {
print $Data3[i]
if (i<|$Data3|) { if (word($Data3[i],1) ne word($Data3[i+1],1)) { print "" } }
}
set print
set table $Data5
plot $Data4 u 1:2:2 smooth zsort
set table $Data6
plot $Data5 u 1:2 w table
unset table
set key out
set rmargin 20
set multiplot layout 3,1
plot $Data w lp pt 7 lc "black" ti "Random"
plot $Data2 w lp pt 7 lc "red" ti "zsort"
plot $Data6 w lp pt 7 lc "web-green" ti "Workaround"
unset multiplot
### end of code
Result:
I am going to back up and suggest that you reconsider your initial restriction against using a temporary file. The most straightforward solution is this:
set print "| sort -k 1,2 > sorted.dat"
print $Data
unset print
plot 'sorted.dat'
If you explain why you don't want to use a temp file, maybe there is an answer to that question independent of the sorting issue.
If the concern is the name of the temp file, then perhaps something like this:
tempfile = system("mktemp")
set print "| sort -k 1,2 > ".tempfile
print $Data
unset print
plot tempfile with points
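The syntax for sending $Data to the stdin of the sort utility is set print "| sort"; print $Data. But that won't do what you want, because the sorted output goes to sort's stdout rather than back into gnuplot. Instead, let's perform the double sort inside gnuplot: first sort by the secondary key (column 2), then by the primary key (column 1).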
$Data <<EOD
1 6
4 8
3 7
2 5
1 4
2 3
EOD
set table $Data_1
plot $Data using 1:2:2 smooth zsort with points
set table $Data_2
plot $Data_1 using 1:2:1 smooth zsort with points
unset table
print $Data_2
# Curve 0 of 1, 6 points
# Curve title: "$Data_1 using 1:2:1"
# x y type
1 4 i
1 6 i
2 3 i
2 5 i
3 7 i
4 8 i
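The double sort relies on the second pass being stable: rows that tie on column 1 must keep the order produced by the first pass (sorted by column 2). Python's `sorted()` is guaranteed stable, so the sketch below reproduces the expected result. If the routine behind `smooth zsort` is not stable on a given platform (the C library's `qsort`, for instance, makes no stability guarantee), ties may come out in any order, which would be consistent with the Windows output shown in the update above. This is a hedged explanation, not a confirmed diagnosis of gnuplot's internals; `double_sort` is a hypothetical name.

```python
def double_sort(rows):
    """Sort by column 1, then column 2, using two stable single-key passes."""
    # Pass 1: sort by the secondary key (column 2), like "using 1:2:2 smooth zsort".
    pass1 = sorted(rows, key=lambda r: r[1])
    # Pass 2: sort by the primary key (column 1), like "using 1:2:1 smooth zsort".
    # Because sorted() is stable, rows with equal column 1 keep their pass-1 order.
    return sorted(pass1, key=lambda r: r[0])


data = [(1, 6), (4, 8), (3, 7), (2, 5), (1, 4), (2, 3)]
# Same result as a single sort on the composite key (column 1, column 2).
assert double_sort(data) == sorted(data, key=lambda r: (r[0], r[1]))
print(double_sort(data))
# [(1, 4), (1, 6), (2, 3), (2, 5), (3, 7), (4, 8)]
```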
Hi!
My automated script runs through many months. I'd like the monthly plots to always show the window [1:31], i.e. YYYY-MM-01 to YYYY-MM-31, for every month my script handles. (I don't mind that some months have no data at the end, e.g. February.)
Unfortunately, I don't know how to provide the range to gnuplot correctly. Since the data are times, something like [1:31] of course doesn't work. But I can't state the full date explicitly in my script either, since it will be different again next month.
Any ideas?
Thanks!
EDIT: I edited the original post to include some more information. Sorry for not including it right away.
DATA: My data basically looks like this (with many more lines in between). I have a separate data file per month, starting on the 1st at 00:00 and running to the end of the 31st:
2022-07-01,00:00:16,27.3,3,28.0,9.0,995.6
2022-07-01,00:05:16,27.3,3,28.0,9.0,995.5
2022-07-01,00:10:16,27.3,3,28.0,9.0,995.4
2022-07-01,00:15:16,27.3,3,28.0,9.0,995.3
2022-07-01,00:20:16,27.3,3,28.0,9.0,995.3
2022-07-01,00:25:16,27.3,3,28.0,9.0,995.3
...
2022-07-31,23:54:16,27.1,3,27.9,12.1,994.9
2022-07-31,23:55:16,27.1,3,27.9,12.1,994.9
2022-07-31,23:56:16,27.1,3,27.9,12.1,995.1
2022-07-31,23:57:16,27.1,3,27.9,11.9,995.0
2022-07-31,23:58:16,27.1,3,27.9,11.9,995.0
EXAMPLE IMAGE: An example (for the month of July) is attached. As I'm not allowed to have embedded images, see the link:
CODE EXAMPLE: The following code is used to modify the x-axis. A very basic attempt to set the range between 1 and 31 (set xrange ["01":"31"]) is included but commented out. Since the x data are not integers, this attempt of course fails.
set xdata time
set timefmt '%Y-%m-%d,%H:%M:%S'
set format x "%d"
#set xrange ["01":"31"]
set xtics 21600*4*7
set mxtics 7
set grid xtics
set grid mxtics
Of course, the xdata changes from month to month. But I'd like to have the day of the month only, not the full date. And, like I said, fixed to 1-31 for easier comparison between months.
It's not fully clear to me how you want to specify the month you want to plot. Maybe the following example gets us a step further.
Check help tm_year, help tm_mon, help strftime, help strptime and help time_specifiers. Note that tm_mon() returns a month index from 0 to 11, not from 1 to 12.
The script sets the range from the first of a given month to the first of the next month.
If you give more information about your actual task, we could also put the plots into a loop over several months.
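The idea behind `t0(m)` and `t1(m)` can be sketched in Python: the range starts at the first of the given month and ends at the first of the following month, both expressed as seconds since the epoch (gnuplot 5 also counts time in seconds since 1970-01-01). The sketch handles the December-to-January rollover explicitly and assumes UTC; the function name `month_range` is hypothetical.

```python
from datetime import datetime, timezone


def month_range(m):
    """Return (start, end) in epoch seconds for a month given as "YYYY-MM"."""
    year, month = map(int, m.split("-"))
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the next month; December rolls over to January of the next year.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start.timestamp(), end.timestamp()


for m in ("2022-01", "2022-02", "2022-12"):
    t0, t1 = month_range(m)
    print(m, (t1 - t0) / 86400, "days")
# 2022-01 31.0 days
# 2022-02 28.0 days
# 2022-12 31.0 days
```

The two returned values play the same role as t0(myMonth) and t1(myMonth) in set xrange[t0(myMonth):t1(myMonth)].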
Script:
### select only a month from time data
reset session
myTimeFmt = "%Y-%m-%d"
# create some random test data
set table $Data
set samples 365
d0 = strptime(myTimeFmt,"2022-01-01")
plot '+' u (strftime("%Y-%m-%d",d0+$0*24*3600)):(rand(0)*10) w table
unset table
t0(m) = strptime("%Y-%m",m)
t1(m) = (t=strptime("%Y-%m",m),strptime("%Y-%m",sprintf("%04d-%02d",tm_year(t),tm_mon(t)+2)))
monthName(m) = strftime("%B %Y",strptime("%Y-%m",m))
set format x "%1d" timedate
set grid x,y
set xtic 24*3600
set key noautotitle
set multiplot layout 3,1
myMonth = "2022-01"
set xrange[t0(myMonth):t1(myMonth)]
set title monthName(myMonth)
plot $Data u (timecolumn(1,myTimeFmt)):2 w l
myMonth = "2022-02"
set xrange[t0(myMonth):t1(myMonth)]
set title monthName(myMonth)
plot $Data u (timecolumn(1,myTimeFmt)):2 w l
myMonth = "2022-09"
set xrange[t0(myMonth):t1(myMonth)]
set title monthName(myMonth)
plot $Data u (timecolumn(1,myTimeFmt)):2 w l
unset multiplot
### end of script
Result:
Addition:
Now that you've shown your data, I notice that your date and time are in different columns. That's why it's important to always include sample data or describe the format in detail.
Here is a suggestion in which the graphs always span 31 days, but the xtics and labels for days that don't exist are suppressed for shorter months. Admittedly, it's not straightforward and maybe not so easy to understand, so take it as a chance to improve your gnuplot skills.
Script:
### select only a month from time data
reset session
myMonth = "2022-02"
# create some random test data
set table $Data separator comma
t0(m) = strptime("%Y-%m",m)
t1(m) = (t=strptime("%Y-%m",m),strptime("%Y-%m",sprintf("%04d-%02d",tm_year(t),tm_mon(t)+2)))
set xrange[t0(myMonth):t1(myMonth)-1]
set samples 100
plot '+' u (strftime("%Y-%m-%d",$1)):(strftime("%H:%M:%S",$1)):(rand(0)+18):(rand(0)+20):(rand(0)+22) w table
unset table
myDay(col) = tm_mday(timecolumn(col,"%Y-%m-%d"))
myTime(col) = timecolumn(col,"%H:%M:%S")/24./3600.
monthName(m) = strftime("%B %Y",strptime("%Y-%m",m))
set datafile separator comma
set ytic 1
set grid x,y
set key noautotitle
set title monthName(myMonth)
set xrange[1:32]
plot for [col=3:5] $Data u (d0=myDay(1), d0+myTime(2)):col w l lc col-2, \
'+' u (t0=$0+1):(0):xtic(t0==d0+1?'':sprintf("%d",t0)) every ::::d0 w p ps 0 noautoscale
### end of script
Result: (for January, February and September)
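The key step in this script is the x coordinate myDay(1)+myTime(2): the day of the month plus the fraction of the day elapsed, so every month lands on the same axis from 1 to 32 regardless of its length. A Python sketch of that mapping, assuming the date and time columns are formatted as in the sample data (%Y-%m-%d and %H:%M:%S); `day_coordinate` is a hypothetical name.

```python
from datetime import datetime


def day_coordinate(date_str, time_str):
    """Map "YYYY-MM-DD", "HH:MM:SS" to day-of-month + fraction of day (1 <= x < 32)."""
    d = datetime.strptime(date_str, "%Y-%m-%d")
    t = datetime.strptime(time_str, "%H:%M:%S")
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    # Same idea as tm_mday(timecolumn(1,...)) + timecolumn(2,"%H:%M:%S")/24./3600.
    return d.day + seconds / 86400.0


print(day_coordinate("2022-07-01", "00:00:16"))  # just after 1.0
print(day_coordinate("2022-07-31", "12:00:00"))  # 31.5
```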
@theozh
I had a look at your code, and actually the part concerning my original question was not as complicated as I thought. I am using the following code:
t0(m) = strptime("%Y-%m",m)
#t1(m) = (t=strptime("%Y-%m",m),strptime("%Y-%m",sprintf("%04d-%02d",tm_year(t),tm_mon(t)+2)))
t1(m) = t0(m)+31*24*3600
set xrange[t0(myMonth):t1(myMonth)]
The results are as follows for July and September:
The only thing that "remains" would be to set all months to the range 1-31, but that's cosmetic and not really necessary. It works pretty well as it is, and it fits my script, so the automation works as well!
NICE!
I would like to plot multiple curves on the same graph using a for loop. Each data file (named stat_coupe) is located in a different folder (fwal055wal055/rep16/ and fwal055wal055_c2/rep20/). fwal055wal055 and fwal055wal055_c2 are the names of the simulations. First, I need to get a previous result, a single number (Utau), from other files (named file_fwal055wal055 and file_fwal055wal055_c2). This works with awk. The result depends on the file: Utaufwal055wal055=10.5 and Utaufwal055wal055_c2=12.2.
Then I need to divide the 1st column of the file stat_coupe in fwal055wal055/rep16/ by the value of Utaufwal055wal055, and do the same for the file stat_coupe in fwal055wal055_c2/rep20/ with the value of Utaufwal055wal055_c2. Moreover, each plot should have a specific format, which depends on the type of simulation run (fwal055wal055 or fwal055wal055_c2).
This example is reduced to 2 simulations (fwal055wal055 and fwal055wal055_c2) and 1 plot, but I have about 20 simulations and 15 different graphs to plot, which is why I would like to use a for loop.
To summarize, at each iteration I have:
a specific format,
a specific path,
a specific value of Utau
I want to specify the right format, path and value of Utau at each iteration of the for loop. The solution I propose below successfully obtains the value of Utau for each simulation, but the macros @path_.i and @format_.i do not work.
#!/bin/bash
for elem in fwal055wal055 fwal055wal055_c2;
do
Utau[${elem}]=$(awk 'FNR==5{print $1}' file_$elem)
done
gnuplot -persist <<-EOFMarker
format_fwal055wal055='pt 1 ps 1.0 lc 0 title "WALE"'
format_fwal055wal055_c2='pt 2 ps 1.0 lc 0 title "WALE c2"'
path_fwal055wal055='"fwal055wal055/rep16/stat_coupe"'
path_fwal055wal055_c2='"fwal055wal055_c2/rep20/stat_coupe"'
list="fwal055wal055 fwal055wal055_c2"
plot for [i in list] @path_.i u 1:(\$2/${Utau[${i}]}) @format_.i
EOFMarker
I would like to obtain something equivalent to:
plot @path_fwal055wal055 u 1:(\$2/${Utau[${i}]}) @format_fwal055wal055,\
@path_fwal055wal055_c2 u 1:(\$2/${Utau[${i}]}) @format_fwal055wal055_c2
Can someone help me solve this issue?
Thank you very much,
Martin
Check help sprintf, help words and help word.
I would create two strings with the same number of items and then combine them with sprintf(). From gnuplot 5.2 on you could also do it with arrays.
# Version 1
PATHS = '"fwal055wal055/rep16/stat_coupe" "fwal055wal055_c2/rep20/stat_coupe"'
FILES = "fwal055wal055 fwal055wal055_c2"
plot for [i=1:words(FILES)] sprintf("%s_%s",word(PATHS,i),word(FILES,i)) u 1:2
or you could define a function for your filenames to keep the plot command short and readable.
# Version 2
PATHS = '"rep16/stat_coupe" "rep20/stat_coupe"'
FILES = "fwal055wal055 fwal055wal055_c2"
myFilename(i) = sprintf("%s/%s_%s",word(FILES,i),word(PATHS,i),word(FILES,i))
plot for [i=1:words(FILES)] myFilename(i) u 1:2
Addition (after some clarifications...)
If I now understand your question correctly, the following code should do the job.
To extract the UTAUs, run a separate loop before plotting and store the extracted values in a string. During plotting, you get these values back via word(UTAUS,i). Since you use them in the arithmetic expression column(2)/word(UTAUS,i), gnuplot will interpret them as numbers. Check help words, help word, help sprintf, help every.
Code:
### extract and normalize in a loop with individual files and directories
reset session
FILES = 'fwal055wal055 fwal055wal055_c2'
DIRS = 'rep16 rep20'
TITLES = '"WALE" "WALE c2"' # if you have spaces you need to put it into double quotes
UTAUS = ''
# define functions for better readability
myExtractionFile(i) = sprintf("file_%s",word(FILES,i))
myDataFile(i) = sprintf("%s/%s/stat_coupe",word(FILES,i),word(DIRS,i))
myTitle(i) = word(TITLES,i)
# define point or line appearance. Add more if you have more files
set style line 1 pt 1 ps 1.0 lc 0
set style line 2 pt 2 ps 1.0 lc 1
# extract the UTAUs
do for [i=1:words(FILES)] {
set table $Dummy
plot myExtractionFile(i) u (utau=$1) every ::4::4 w table # extract value row 5, column 1 (not counting header lines)
unset table
UTAUS = UTAUS.sprintf(" %g",utau) # append the extracted value as string
}
plot for [i=1:words(FILES)] myDataFile(i) u 1:(column(2)/word(UTAUS,i)) ls i title myTitle(i)
### end of code
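The structure of this script (parallel lists indexed by the same i, a first loop that extracts one value per file, then a plot loop that divides column 2 by that value) can be sketched in Python for comparison. The folder layout follows the question; the helper names are hypothetical, and `extract_utau` mimics awk 'FNR==5{print $1}', which counts every line of the file, including comments.

```python
import os
import tempfile

# Parallel lists, mirroring FILES, DIRS and TITLES in the gnuplot script.
FILES = ["fwal055wal055", "fwal055wal055_c2"]
DIRS = ["rep16", "rep20"]
TITLES = ["WALE", "WALE c2"]


def data_file(i):
    """Path of the i-th data file (0-based here), like myDataFile(i)."""
    return f"{FILES[i]}/{DIRS[i]}/stat_coupe"


def extract_utau(path, line_no=5, col=1):
    """Return field `col` of line `line_no` (both 1-based) as a float."""
    with open(path) as fh:
        for n, line in enumerate(fh, start=1):
            if n == line_no:
                return float(line.split()[col - 1])
    raise ValueError(f"{path} has fewer than {line_no} lines")


def normalize(rows, utau):
    """Keep column 1 and divide column 2 by utau, like using 1:(column(2)/utau)."""
    return [(x, y / utau) for x, y in rows]


# Usage with a throwaway extraction file standing in for file_fwal055wal055.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_" + FILES[0])
    with open(path, "w") as fh:
        fh.write("# header\n1\n2\n3\n10.5 extra\n")
    utau = extract_utau(path)

print(utau, data_file(0), normalize([(0.5, 21.0)], utau))
# 10.5 fwal055wal055/rep16/stat_coupe [(0.5, 2.0)]
```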
I have a number of files (with 10 columns each), named as follows:
file_001.txt, file_002.txt, file_003.txt,
file_021.txt, file_022.txt, file_023.txt,
file_041.txt, file_042.txt, file_043.txt,
file_061.txt, file_062.txt, file_063.txt,
file_081.txt, file_082.txt, file_083.txt,
I would like to plot each file with different lines, e.g. using 1:2, using 1:3, using 1:5 and using 1:8. I am not able to make a loop over different columns. My following script does not work for the k loop:
plot for [k=2, 3, 5, 8] for [j=0:8:2] for [i=1:3] 'file_0'.j.i.'.txt' u 1:k;
Use for [k in "2 3 5 8"] if you have a list rather than a range.
If j can be > 9, you should set up a function
fname(j,i) = sprintf("name%02.f%.f",j,i)
to get proper file names.
The format string "%02.f" means: float (f), no digits after the decimal point (.), a minimum width of two positions (2), and empty positions padded with zeros (0).
print fname(2,3)
name023
print fname(13,3)
name133
print fname(113,3)
name1133
These are C library (printf-style) format strings. They are not documented in the gnuplot docs, but there are many sources on the web.
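Python's % operator follows the same printf-style rules, so it is a quick way to check a format string before using it in gnuplot's sprintf(). The sketch below reproduces fname() and builds the file list from the question, pairing every file with the columns 2, 3, 5 and 8; the file_ prefix and .txt suffix come from the question, and the function names are hypothetical.

```python
def fname(j, i):
    """Same format as the gnuplot fname(): j zero-padded to at least 2 digits, then i."""
    return "name%02.f%.f" % (j, i)


def file_name(j, i):
    """File names from the question: file_001.txt, file_021.txt, ..., file_083.txt."""
    return "file_%02.f%.f.txt" % (j, i)


print(fname(2, 3), fname(13, 3), fname(113, 3))
# name023 name133 name1133

# Every (file, column) combination the nested plot loop should produce.
columns = [2, 3, 5, 8]
pairs = [(file_name(j, i), k) for k in columns for j in range(0, 9, 2) for i in range(1, 4)]
print(len(pairs), pairs[0], pairs[-1])
# 60 ('file_001.txt', 2) ('file_083.txt', 8)
```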
I have the following code, which plots 4 lines:
plot for [i=1:4] \
path_to_file using 1:(column(i)) , \
I also want to plot 8 horizontal lines on this graph, the values of which come from mydata.txt.
From the answer to "Gnuplot: How to load and display single numeric value from data file", I have seen that I can use the stats command to access the constant values I am interested in. I think I can access the cell (row, col) as follows:
stats 'mydata.txt' every ::row::row using col nooutput
value = int(STATS_min)
But their location is a function of i. So, inside the plot command, I want to add something like:
for [i=1:4] \
stats 'mydata.txt' every ::(1+i*10)::(1+i*10) using 1 nooutput
mean = int(STATS_min)
stats 'mydata.txt' every ::(1+i*10)::(1+i*10) using 2 nooutput
SE = int(STATS_min)
upper = mean + 2 * SE
lower = mean - 2 * SE
and then plot upper and lower, as horizontal lines on the graph, above.
I think I can plot them separately by typing plot upper, lower, but how do I plot them on the graph above, for all i?
Thank you.
You can create an array and store the values in it; then, inside a loop, you can access each value using an index that refers to its position in the array.
You can create the array as follows:
array=""
do for [i=1:4] {
val = i / 9.
array = sprintf("%s %g",array,val)
}
where I have stored 4 values: 1/9, 2/9, 3/9 and 4/9. In your case you would run stats and store your upper and/or lower variables. You can check what the array looks like in this way:
gnuplot> print array
0.111111 0.222222 0.333333 0.444444
For plotting, you can access the different elements in the array using word(array,i), where i refers to the position. Since the array is a string, you need to convert the elements to floats, which can be done by multiplying by 1.:
plot for [i=1:4] 1.*word(array,i)
If you have values stored in a data file, you can process it with awk or even with gnuplot:
array = ""
plot for [i=1:4] "data" every ::i::i u (array=sprintf("%s %g",array,$1), 1/0), \
for [i=1:4] 1.*word(array,i)
The first plot instance creates the array from the first-column data entries without plotting the points (returning 1/0, an undefined value, tells gnuplot to skip them, so expect warning messages), and the second plot instance uses the values stored in array as constants (hence horizontal lines in this case). Note that every counts the first entry as 0, so [i=1:4] runs from the second through the fifth lines of the file.
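Since the "array" here is just a whitespace-separated string, the pattern is easy to mirror in Python. The sketch below packs values with %g as the gnuplot loop does and reads the i-th word back as a number. For the question's actual goal, it computes the horizontal-line levels mean ± 2·SE from rows 1+i*10 (0-based, as every ::r::r counts). The assumption that each row holds the mean in column 1 and the SE in column 2 comes from the question's stats calls; unlike the question, the sketch keeps the values as floats instead of applying int(). The function names and the test data are hypothetical.

```python
def pack(values):
    """Mimic the loop: array = sprintf("%s %g", array, val)."""
    s = ""
    for v in values:
        s = "%s %g" % (s, v)
    return s


def word(s, i):
    """1-based word lookup like gnuplot's word(), converted to float like 1.*word(...)."""
    return float(s.split()[i - 1])


def band_levels(rows, n=4, step=10):
    """For i = 1..n, read (mean, SE) from 0-based row 1 + i*step; return (upper, lower)."""
    levels = []
    for i in range(1, n + 1):
        mean, se = rows[1 + i * step]
        levels.append((mean + 2 * se, mean - 2 * se))
    return levels


array = pack([i / 9 for i in range(1, 5)])
print(array.strip())   # 0.111111 0.222222 0.333333 0.444444
print(word(array, 2))  # 0.222222

# Hypothetical data: row r holds mean = r and SE = 0.5.
rows = [(float(r), 0.5) for r in range(42)]
print(band_levels(rows))
# [(12.0, 10.0), (22.0, 20.0), (32.0, 30.0), (42.0, 40.0)]
```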
I have two different variables which are stored as cell arrays. I open a text file and try to store these variables as a two-column array. Below is my code. I used \t to separate the x and y data, but in the output file all the x data are written first, followed by the y data. How can I obtain a two-column array in the text file?
for j=1:size(data1,2)
file1=['dir\' file(j,1).name];
f1{j}=fopen(file1,'a+')
fprintf(f1{j},'%7.3f\t%20.10f\n',x{1,j}',y{1,j});
fclose(f1{j});
end
Thanks in advance!
You can use dlmwrite as well to accomplish this for numeric data:
x = [1;2;3]; y = [4;5;6]; % two column vectors
dlmwrite('foo.dat',{x,y},'Delimiter','\t')
This produces the output:
1 4
2 5
3 6
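The original code writes all x values before all y values because MATLAB's fprintf applies the format to the elements of its arguments in order, exhausting the first array before moving on to the next. Any fix has to interleave the two columns row by row; dlmwrite and writetable do that for you, and in MATLAB itself passing [x{1,j}(:) y{1,j}(:)]' as a single argument to fprintf is a common way to do it. The same row-wise interleaving in Python, as a language-neutral illustration (`two_column_text` is a hypothetical name):

```python
def two_column_text(x, y, fmt="%7.3f\t%20.10f"):
    """Write x and y side by side: one 'x<TAB>y' line per row."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # zip() pairs x[k] with y[k], which is the row-wise interleaving fprintf needs.
    return "".join(fmt % (a, b) + "\n" for a, b in zip(x, y))


# Prints one tab-separated pair per line: 1 4, 2 5, 3 6
print(two_column_text([1, 2, 3], [4, 5, 6], fmt="%g\t%g"), end="")
```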
Use a MATLAB table if you have R2013b or beyond:
data1 = {'a','b','c'}'
data2 = {1, 2, 3}'
t = table(data1, data2)
writetable(t, 'data.csv')
More info here.