Multiple plots from a single text file (gnuplot) - terminal

Currently, I have a text file and I'm interested in plotting two different curves from it (the x-axis values are the same for both, column 1; the y-axis values are in columns 3 and 4). The plot should go to STDOUT since I'm working over ssh. The file that I am working with looks like this (filename: tmp):
%Iter duration train_objective valid_objective difference
0 6.0 0.0195735 0.0610958 0.0415223
1 5.0 0.180216 0.191344 0.011128
2 5.0 0.223318 0.241081 0.017763
3 6.0 0.245895 0.262197 0.016302
4 6.0 0.25796 0.28056 0.0226
5 6.0 0.269223 0.291769 0.022546
6 5.0 0.281187 0.298474 0.017287
7 5.0 0.283891 0.305579 0.021688
8 5.0 0.296456 0.307381 0.010925
9 5.0 0.296856 0.315487 0.018631
10 5.0 0.295805 0.321391 0.025586
Total training time is 0:06:27
So far, I can only plot the values corresponding to the 3rd column using the following line:
cat tmp | gnuplot -e "set terminal dumb size 120, 30; set autoscale; plot '-' u 1:3 with lines notitle"
Could someone tell me how I could include the 4th column in the same plot? Is that possible?
Thanks!

There is nothing in your description that rules out the trivial answer:
gnuplot -e "plot 'tmp' u 1:3 with lines, '' u 1:4 with lines"
The terminal choice is not relevant (you used 'set term dumb', but it could just as easily be any other output terminal; connecting via ssh does not prevent that). If you have additional constraints that require a more complicated solution, please add them to the question.
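For the ASCII output the question asks for, the same idea can be combined with the dumb terminal. The sketch below is only an illustration: it recreates a shortened copy of the sample data in a temporary directory, and the awk pre-filter (dropping the header and the "Total training time" line) plus the curve titles are my own additions, not part of the answer above. The file is named twice because stdin ('-') can only be read once; the empty string '' tells gnuplot to reuse the previous file name.

```shell
#!/bin/sh
# Sketch: plot columns 3 and 4 against column 1 as ASCII art.
# Assumption: a shortened copy of the question's data, written to a temp dir.
workdir=$(mktemp -d)
cat > "$workdir/tmp" <<'EOF'
%Iter duration train_objective valid_objective difference
0 6.0 0.0195735 0.0610958 0.0415223
1 5.0 0.180216 0.191344 0.011128
2 5.0 0.223318 0.241081 0.017763
Total training time is 0:06:27
EOF

# Optional clean-up step (an addition, not from the answer):
# keep only rows whose first field is an integer.
awk '$1 ~ /^[0-9]+$/' "$workdir/tmp" > "$workdir/clean"

if command -v gnuplot >/dev/null 2>&1; then
  # '' reuses the previous file name, so both curves come from one file
  gnuplot -e "set terminal dumb size 120, 30; plot '$workdir/clean' u 1:3 w lines title 'train', '' u 1:4 w lines title 'valid'"
else
  echo "gnuplot not found; cleaned data are in $workdir/clean" >&2
fi
```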

Gnuplot set term display window bug with qt on MacOS

Problem statement:
I am using the qt terminal in gnuplot to draw a few graphs, but I get the following bug: the display window comes up for a split second and disappears, while the terminal prints the following: 'qt.qpa.fonts: Populating font family aliases took 260 ms. Replace uses of missing font family "Sans" with one that exists to avoid this cost.' -- Option 1.0 in code below
What I've tried:
I tried multiple fixes involving a change of font, i.e. Option 1.1 in the code below. After changing to Helvetica or Verdana, the error disappears, but there is still no display window.
Any ideas on how to fix it?
So far, I can get the graphs saved using the png terminal -- Option 1.2; all other terminals seem to produce the same error as qt. The desired solution is a functional display window, so that I don't have to open saved.png.
P.S. I am using Mac OS, version 10.15.4 Catalina. The same code used to work on an older OS and an older version of gnuplot with x11/aquaterm support, which is not supported by the current OS and brew install.
Thank you all in advance!!!
Some code (gnuplot zsh script attached below):
gnuplot << EOF
# Option 1.0
set terminal qt
# prints an error in the command prompt: qt.qpa.fonts: Populating font family aliases took 252 ms. Replace uses of missing font family "Sans" with one that exists to avoid this cost.
# Option 1.1
#set terminal qt font "Helvetica" # no error in the command prompt, but no window displayed
# Option 1.2
#set terminal png
#set output 'saved.png' # saves .png but no window generated
# PARKER WIND
set xr [0.5:2.0]
set yr [0.0:2.5]
set xlabel "r/r_0"
set ylabel "Psi"
set style line 1 lt 1 lc rgb "blue" lw 1 pt 11
set style line 2 lt 1 lc rgb "black" lw 1 pt 11
set style line 3 lt 1 lc rgb "black" lw 1 pt 7 ps 2
set style line 4 lt 1 lc rgb "blue" lw 1 pt 7 ps 2
set style line 5 lt 1 lc rgb "black" lw 3 pt 7 ps 2
set xzeroaxis
# MULTIPLE GRAPHS
plot 'outputdata/parker_0.500.dat' u 1:2 with lines ls 1 title "psi0=0.500" ,\
'outputdata/parker_0.550.dat' u 1:2 with lines ls 1 title "psi0=0.550" ,\
'outputdata/parker_0.600.dat' u 1:2 with lines ls 1 title "psi0=0.600" ,\
'outputdata/parker_0.650.dat' u 1:2 with lines ls 1 title "psi0=0.650" ,\
'outputdata/parker_0.700.dat' u 1:2 with lines ls 1 title "psi0=0.700" ,\
'outputdata/parker_0.750.dat' u 1:2 with lines ls 1 title "psi0=0.750" ,\
'outputdata/parker_0.800.dat' u 1:2 with lines ls 1 title "psi0=0.800" ,\
'outputdata/parker_0.850.dat' u 1:2 with lines ls 1 title "psi0=0.850"
EOF
You need to tell gnuplot to "persist" your plot with the -p option:
gnuplot -p <<EOF
...
...
EOF
You might also consider adding the following to your login script to always select qt output:
export GNUTERM=qt

search through file for data and create new txt file with just that data

I have a txt file that is an output from a machine with a bunch of writing/data/paragraphs which are not used for graphing purposes, but somewhere in the middle of the file I have the actual data that I need to graph. I need to search the file for the data and then print the data to a txt file so I can graph it later.
The data in the middle of the file looks like this (with each data file potentially having different amounts of rows/columns and numbers are separated by spaces):
<> 1 2 3 4 5 6 etc.
A 1.2 1.3 1.4 etc.
B 0.2 0.3 0.4 etc.
C 2.2 2.3 2.4 etc.
etc.
My thinking so far was to grep for '<>' to find the first line (grep '^<>' file), but I'm not sure how I would account for the variable number of rows/columns when trying to find them. Also, I am using awk to loop over all .txt files in the directory and print to a new outfile so I can do multiple files at once (so maybe I can do this search/printing in awk as well?).
Edit:
--input/expected output file--
input file
This is the data
Here are some paragraphs
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
more paragraphs
more paragraphs
output file:
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
Using awk to do this to multiple txt files in a directory.
Here's one in awk. It prints every record that contains <> or a decimal number ([0-9]+\.[0-9]+). If that also matches lines you don't want, you could tighten the decimal test to require three numbers in a row, something like: /( [0-9]+\.[0-9]+){3}/
$ awk '/<>/||/[0-9]+\.[0-9]+/' foo
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
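If matching decimal numbers also catches lines in the surrounding paragraphs, another option is to switch printing on at the line that starts with <> and off again at the first later line that stops looking like a data row. The following is a minimal sketch under two assumptions of mine: a data row is a label followed only by numeric fields, and each result is written to a hypothetical <name>.out file next to its input. The sample file run1.txt is recreated from the question's example.

```shell
#!/bin/sh
# Sketch: extract the "<>" table from every .txt file in a directory.
workdir=$(mktemp -d)
cat > "$workdir/run1.txt" <<'EOF'
This is the data
Here are some paragraphs
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
more paragraphs
more paragraphs
EOF

for f in "$workdir"/*.txt; do
  awk '
    # header row: print it and start collecting
    $1 == "<>" { inblock = 1; print; next }
    inblock {
      # assumed row shape: a label, then one or more purely numeric fields
      ok = (NF >= 2)
      for (i = 2; i <= NF && ok; i++)
        if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) ok = 0
      if (!ok) exit        # first non-data line ends the block
      print
    }
  ' "$f" > "${f%.txt}.out"
done

cat "$workdir/run1.out"
```

Because the block ends at the first non-matching line, the number of rows and columns does not need to be known in advance.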

gnuplot not recognizing plot for syntax

I am trying to use the for syntax for multiple columns.
I have a data file colhead.dat:
Id a1 a2 a3
1 1 2 3
2 2 3 4
3 2 3 4
Following the answer https://stackoverflow.com/a/17525615/429850, I do
gnuplot> plot for [i=2:5] 'colhead.dat' u 1:i w lp title columnheader(i)
^
':' expected
How do I write the for loop? Here's the gnuplot version header:
Version 4.2 patchlevel 6
last modified Sep 2009
System: Linux 2.6.32-71.el6.x86_64
Iteration in plot commands (plot for [...]) is not available in gnuplot 4.2, and there was nothing like loops in versions that old. So you have to update your version!
Edit: As Christoph mentioned, the first for functionality (iteration inside plot) was introduced in 4.4, while do for loops followed in 4.6. Either way, 4.2 is too old.

Sampling without replacement using awk

I have a lot of text files that look like this:
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
Is there a way to do sampling without replacement using awk?
For example, I have these 8 lines, and I only want to randomly sample 4 of them into a new file, without replacement.
The output should look something like this:
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
Thanks in advance
How about this for a random sampling of 10% of your lines?
awk 'rand()>0.9' yourfile1 yourfile2 anotherfile
I am not sure what you mean by "replacement"... there is no replacement occurring here, just random selection.
Basically, it looks at each line of each file precisely once and generates a random number on the interval 0 to 1. If the random number is greater than 0.9, the line is output. So it is rolling a 10-sided die for each line and only printing the line if the die comes up as 10, which means the number of lines selected is only approximately 10%. There is no chance of a line being printed twice, unless it occurs twice in your files, of course.
Without a seed, many awk implementations (gawk, for example) produce the same sequence of random numbers on every run, so you can add an srand() at the start, as suggested by @klashxx:
awk 'BEGIN{srand()} rand()>0.9' yourfile(s)
Yes, but I wouldn't. I would use shuf or sort -R (neither POSIX) to randomize the file and then select the first n lines using head.
If you really want to use awk for this, you would need to use the rand function, as Mark Setchell points out.
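If an exact sample size is needed from awk itself, one option (my suggestion, not something proposed in the answers here) is reservoir sampling: keep the first n lines, then let each later line replace a random kept line with probability n/NR. The sketch below assumes the sequences live in a hypothetical file seqs.txt and that n=4 lines are wanted; each input line can be chosen at most once, which is what "without replacement" requires.

```shell
#!/bin/sh
# Sketch: draw exactly n lines without replacement using awk.
workdir=$(mktemp -d)
cat > "$workdir/seqs.txt" <<'EOF'
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCCT
>DLGKAHOLAGGATACCATAGATGGCACGCCCT
>ELGKAHOLAGGATACCATAGATGGCACGCCCT
>FLGKAHOLAGGATACCATAGATGGCACGCCCT
>JGGKAHOLAGGATACCATAGATGGCACGCCCT
>POGKAHOLAGGATACCATAGATGGCACGCCCT
EOF

awk -v n=4 '
  BEGIN { srand() }                      # seed from the clock
  NR <= n { keep[NR] = $0; next }        # fill the reservoir first
  {
    j = int(rand() * NR) + 1             # uniform integer in 1..NR
    if (j <= n) keep[j] = $0             # replace a slot with probability n/NR
  }
  END {
    m = (NR < n) ? NR : n                # short files: print what there is
    for (i = 1; i <= m; i++) print keep[i]
  }
' "$workdir/seqs.txt" > "$workdir/sample.txt"

cat "$workdir/sample.txt"
```

Where shuf is available, shuf -n 4 on the same file gives an equivalent result with less code.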
To obtain random samples from a text file, without replacement, means that once a line has been randomly selected (sampled) it cannot be selected again. Thus, if 10 lines of 100 are to be selected, the ten random line numbers need to be unique.
Here is a script to produce NUM random (without replacement) samples from a text FILE:
#!/usr/bin/env bash
# random-samples.sh NUM FILE
# extract NUM random (without replacement) lines from FILE

num=$(( 10#${1:?'Missing sample size'} ))
file="${2:?'Missing file to sample'}"
lines=$(( $(wc -l < "$file") ))   # number of lines in the file

# more samples than lines can never be satisfied; stop instead of looping forever
if (( num > lines )); then
  echo "Cannot draw $num unique lines from a file with $lines lines" >&2
  exit 1
fi

# get_sample MAX
#
# get a random number between 1 .. MAX
# (see the bash man page on RANDOM: it yields 0..32767, so files
# longer than 32768 lines cannot be sampled uniformly this way)
get_sample() {
  local max="$1"
  local rand=$(( ((max * RANDOM) / 32768) + 1 ))   # 32768, so the result never exceeds MAX
  echo "$rand"
}

# select_line LINE FILE
#
# select line LINE from FILE
select_line() {
  head -n "$1" "$2" | tail -n 1
}

declare -A samples                            # keep track of samples
for ((i=1; i<=num; i++)) ; do
  sample=
  while [[ -z "$sample" ]]; do
    sample=$(get_sample "$lines")             # get a new sample
    if [[ -n "${samples[$sample]}" ]]; then   # already used?
      sample=                                 # yes, go again
    else
      samples[$sample]=1                      # new sample, track it
    fi
  done
  line=$(select_line "$sample" "$file")       # fetch the sampled line
  printf "%2d: %s\n" "$i" "$line"
done
exit
Here is the output of a few invocations:
./random-samples.sh 10 poetry-samples.txt
1: 11. Because I could not stop for death/He kindly stopped for me 2,360,000 Emily Dickinson
2: 25. Hope springs eternal in the human breast 1,080,000 Alexander Pope
3: 43. The moving finger writes; and, having writ,/Moves on 571,000 Edward Fitzgerald
4: 5. And miles to go before I sleep 5,350,000 Robert Frost
5: 6. Not with a bang but a whimper 5,280,000 T.S. Eliot
6: 40. In Xanadu did Kubla Khan 594,000 Coleridge
7: 41. The quality of mercy is not strained 589,000 Shakespeare
8: 7. Tread softly because you tread on my dreams 4,860,000 W.B. Yeats
9: 42. They also serve who only stand and wait 584,000 Milton
10: 48. If you can keep your head when all about you 447,000 Kipling
./random-samples.sh 10 poetry-samples.txt
1: 38. Shall I compare thee to a summers day 638,000 Shakespeare
2: 34. Busy old fool, unruly sun 675,000 John Donne
3: 14. Candy/Is dandy/But liquor/Is quicker 2,150,000 Ogden Nash
4: 45. We few, we happy few, we band of brothers 521,000 Shakespeare
5: 9. Look on my works, ye mighty, and despair 3,080,000 Shelley
6: 11. Because I could not stop for death/He kindly stopped for me 2,360,000 Emily Dickinson
7: 46. If music be the food of love, play on 507,000 Shakespeare
8: 44. What is this life if, full of care,/We have no time to stand and stare 528,000 W.H. Davies
9: 35. Do not go gentle into that good night 665,000 Dylan Thomas
10: 15. But at my back I always hear 2,010,000 Marvell
./random-samples.sh 10 poetry-samples.txt
1: 26. I think that I shall never see/A poem lovely as a tree. 1,080,000 Joyce Kilmer
2: 32. Human kind/Cannot bear very much reality 891,000 T.S. Eliot
3: 14. Candy/Is dandy/But liquor/Is quicker 2,150,000 Ogden Nash
4: 13. My mistress’ eyes are nothing like the sun 2,230,000 Shakespeare
5: 42. They also serve who only stand and wait 584,000 Milton
6: 24. When in disgrace with fortune and men's eyes 1,100,000 Shakespeare
7: 21. A narrow fellow in the grass 1,310,000 Emily Dickinson
8: 9. Look on my works, ye mighty, and despair 3,080,000 Shelley
9: 10. Tis better to have loved and lost/Than never to have loved at all 2,400,000 Tennyson
10: 31. O Romeo, Romeo; wherefore art thou Romeo 912,000 Shakespeare
Maybe it's better to sample the file using a fixed scheme, like taking one record every 10 lines. You can do that using this awk one-liner:
awk '0==NR%10' filename
If you want to sample a percentage of the total, then you can program a way to calculate the number of rows the awk one-liner should use so the number of records printed matches that quantity/percentage.
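One way to do that calculation, sketched here under my own assumptions (a hypothetical input file numbers.txt with 20 lines and a target of 25%), is to read the file twice: the first pass counts the records, the second prints every step-th record until the target count is reached. Note that this is systematic rather than random sampling, so the same lines come out on every run.

```shell
#!/bin/sh
# Sketch: print roughly pct percent of the records at a fixed stride.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20; i++) print i }' > "$workdir/numbers.txt"

awk -v pct=25 '
  NR == FNR { total++; next }            # first pass: count the records
  FNR == 1 {                             # start of second pass: derive the stride
    want = int(total * pct / 100)        # how many records to print
    step = (want > 0) ? int(total / want) : 0
  }
  # second pass: every step-th record, capped at the target count
  step > 0 && FNR % step == 0 && printed < want { print; printed++ }
' "$workdir/numbers.txt" "$workdir/numbers.txt" > "$workdir/sampled.txt"

cat "$workdir/sampled.txt"
```

The cap on the printed count matters when the stride rounds down, since a smaller stride would otherwise select more records than requested.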
I hope this helps!

julian dates as.date

I am trying to run a simple regression in R (mac OSX), to see if the level of an environmental certification has improved over time - among other things. The data I downloaded offers a level from 1-4, and the dates in 1-Mar-12 format. I can't seem to get R to convert the dates, and I keep getting the same error message. The variables are the same length.
$ certification_date: chr "1-Aug-11" "1-Aug-11" "1-Aug-11" "1-Jul-11" ...
jday<-as.Date('certification_date',format='%d-%b-%y',"%j")
mod <- lm(Level_number ~ jday, data=data)
Error in model.frame.default(formula = Level_number ~ jday, data = data, :
variable lengths differ (found for 'jday')
summary(jday)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
NA NA NA NA NA NA "1"
Can anyone spot where I've gone wrong?
You should remove the quotes around certification_date, as mentioned in the comment. However, %b is the abbreviated month name in the current locale, so you may run into another problem depending on your locale. Here is a locale-independent solution (the extra "%j" argument is not needed for parsing and is dropped here):
## remember the current time locale
loc <- Sys.getlocale('LC_TIME')
## switch to the C locale, whose month abbreviations are English, since %b is locale-dependent
## (the name 'ENGLISH' is only understood on Windows)
Sys.setlocale('LC_TIME','C')
jday <- as.Date(data$certification_date, format='%d-%b-%y')
## restore the original locale
Sys.setlocale('LC_TIME',loc)
The result is:
jday
[1] "2011-08-01" "2011-08-01" "2011-08-01" "2011-07-01"
