My question is pretty basic. I am plotting several functions at once using gnuplot, and I want to print out (in either a file or on the graph itself) the maximum y-values of every function. Any idea how I could do that?
I looked into STATS and GPVAL_DATA_Y_MAX but I can't really figure out how to make them work with several functions at the same time.
Without going into too much detail, suppose my script looks like this:
plot 'file1.dat' us 1:2 title "file1" w lines,\
'file2.dat' us 1:2 title "file2" w lines,\
'file3.dat' us 1:2 title "file3" w lines
You can use the name option of the stats command to save the maximum of every file in a separate set of variables:
stats 'file1.dat' using 2 nooutput name 'file1'
stats 'file2.dat' using 2 nooutput name 'file2'
stats 'file3.dat' using 2 nooutput name 'file3'
Now you can print the values to an external file:
set print 'max.dat'
print file1_max
print file2_max
print file3_max
If you want to place a label near each maximum in your graph, you also need the x-value at which the data has its maximum. The first stats command does not provide this directly; it only gives the index of the maximum within the data file. So you need an additional call to stats, restricted to that index, to get the x-value where the maximum y-value occurred:
stats 'file1.dat' using 1 every ::file1_index_max::file1_index_max name 'file1_x'
...
And then you can use
set label center at first file1_x_max,first file1_max sprintf('y = %.2f', file1_max) offset char 0,1
Unfortunately, most of the commands cannot be iterated properly with changing variable names.
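One possible workaround, sketched here in Python rather than gnuplot, is to generate the repetitive commands with a small script. The helper name gnuplot_max_commands and the assumption that each prefix p refers to a file p.dat are illustrative only and not part of the answer above; the emitted lines simply repeat the stats, label and print pattern already shown, once per file.

```python
def gnuplot_max_commands(prefixes, out_file="max.dat"):
    """Build gnuplot commands that record and label the maximum of each data file.

    Hypothetical helper: each prefix 'p' is assumed to refer to a file 'p.dat'
    with x in column 1 and y in column 2, as in the question.
    """
    lines = []
    for p in prefixes:
        # First pass: y statistics, stored in variables such as p_max and p_index_max.
        lines.append("stats '%s.dat' using 2 nooutput name '%s'" % (p, p))
        # Second pass: read only the record at the maximum to get its x-value.
        lines.append(
            "stats '%s.dat' using 1 every ::%s_index_max::%s_index_max "
            "nooutput name '%s_x'" % (p, p, p, p)
        )
        # Label placed just above the maximum (%% yields a literal % for gnuplot).
        lines.append(
            "set label center at first %s_x_max,first %s_max "
            "sprintf('y = %%.2f', %s_max) offset char 0,1" % (p, p, p)
        )
    lines.append("set print '%s'" % out_file)
    for p in prefixes:
        lines.append("print %s_max" % p)
    lines.append("set print")  # restore printing to the default stream
    return "\n".join(lines)


script = gnuplot_max_commands(["file1", "file2", "file3"])
print(script)
```

The printed text can be saved to a file and loaded from the main gnuplot script before the plot command.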
I am trying to compare two data files that show the rise in memory consumption over time. The data come from two tests that I ran at different times, and I want to make the difference (or equivalence) obvious by shifting the second plot along the x and y axes.
Here is my test.plt script (the plot line is wrapped for this question):
reset
set title "Memory consumption of process"
set style data fsteps
set xlabel 'Time (HH:MM:\nSS.ddd)'
set timefmt '"%H:%M:%S"'
set xdata time
set xrange ['"10:00:00"' : '"13:00:00"']
set yrange [12000 : 20000]
set ylabel "Memory\nin KBytes"
set format x "%H:%M:\n%.3S"
set grid
set key left
plot 'mem-2015-11-26-1229.dat' using 1:2 with lines, \
'mem-2015-11-26.dat' using 1:($2-1500) with lines
There is no problem with y, since it is a normal scalar value. But I'm having difficulty doing the same with the x axis. This is how I tried to shift x (without success):
plot 'mem-2015-11-26-1229.dat' using 1:2 with lines, \
'mem-2015-11-26.dat' using ($1-3600.0):($2-1500) with lines
Not even using the alias ($1) seems to work on a time axis:
plot 'mem-2015-11-26-1229.dat' using 1:2 with lines, \
'mem-2015-11-26.dat' using ($1):($2-1500) with lines
Both give the following warning (and no output for the second file):
"test.plt", line 15: warning: Skipping data file with no valid points
What am I doing wrong? Is it even possible to calculate time values at all? If yes, how?
I have a file with many columns that I'd like to plot as follows:
plot for [i=1:30] 'test' using 1:i w lp
This gives the plot I want, but when I do set key, the key labels every line as 1:i.
How can I make this output more meaningful by actually displaying the value of i?
If you don't set an explicit title, gnuplot generates an automatic title from the literal text of the plot command. If you want a meaningful title, you must give it explicitly, like
plot for [i=1:30] 'test' using 1:i w lp title sprintf("column %d", i)
I have created syntax in SPSS that gives me 90 separate iterations of a general linear model, each with slightly different variations of fixed factors and covariates. In the output file, they are all just named "General Linear Model." I then have to rename each analysis in the output manually, and I want to find syntax that will add a more specific name to each result to help me identify it among the other 89 results (e.g. "General Linear Model - Males Only: Mean by Gender w/ Weight covariate").
This is an example of one analysis from the syntax:
USE ALL.
COMPUTE filter_$=(Muscle = "BICEPS" & Subj = "S1" & SMU = 1 ).
VARIABLE LABELS filter_$ 'Muscle = "BICEPS" & Subj = "S1" & SMU = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
GLM Frequency_Wk6 Frequency_Wk9
Frequency_Wk12 Frequency_Wk16
Frequency_Wk20
/WSFACTOR=Time 5 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Time)
/EMMEANS=TABLES(Time)
/CRITERIA=ALPHA(.05)
/WSDESIGN=Time.
I am looking for syntax to add to this that will name this analysis "S1, SMU1 BICEPS, GLM". I don't want to name the whole output file, but each analysis within the output, so that I don't have to do it one by one. I sometimes have over 200 iterations in a single output file, and renaming them individually within the output file takes too much time.
I am assuming that you are exporting the models to Excel (please clarify if not).
There is an undocumented command (OUTPUT COMMENT TEXT) that you can use here. There is also a custom extension, TEXT, designed to achieve the same thing, but it needs to be explicitly downloaded via:
Utilities --> Extension Bundles --> Download And Install Extension Bundles --> TEXT
You can use OUTPUT COMMENT TEXT to assign a title/descriptive text just before the output of the GLM model (in the example below I have used FREQUENCIES as an example).
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
oms /select all /if commands=['output comment' 'frequencies'] subtypes=['comment' 'frequencies']
/destination format=xlsx outfile='C:\Temp\ExportOutput.xlsx' /tag='ExportOutput'.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - jobcat".
freq jobcat.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - gender".
freq gender.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - minority".
freq minority.
omsend tag=['ExportOutput'].
You could also use the TITLE command here, but it is limited to 60 characters.
You would have to change the OMS tags appropriately if using TITLE or TEXT.
Edit:
Given the OP wants to actually add a title to the left hand pane in the output viewer, a solution for this is as follows (credit to Albert-Jan Roskam for the Python code):
First, save the Python file "editTitles.py" to a valid Python search path (for me, for example: "C:\ProgramData\IBM\SPSS\Statistics\23\extensions")
#editTitles.py
import tempfile, os, sys
import SpssClient

def _titleToPane():
    """See titleToPane(). This function does the actual job"""
    outputDoc = SpssClient.GetDesignatedOutputDoc()
    outputItemList = outputDoc.GetOutputItems()
    textFormat = SpssClient.DocExportFormat.SpssFormatText
    filename = tempfile.mktemp() + ".txt"
    for index in range(outputItemList.Size()):
        outputItem = outputItemList.GetItemAt(index)
        # Only items created by the TITLE command are relabelled
        if outputItem.GetDescription() == u"Page Title":
            outputItem.ExportToDocument(filename, textFormat)
            with open(filename) as f:
                outputItem.SetDescription(f.read().rstrip())
            os.remove(filename)
    return outputDoc

def titleToPane(spv=None):
    """Copy the contents of the TITLE command of the designated output document
    to the left output viewer pane"""
    try:
        outputDoc = None
        SpssClient.StartClient()
        if spv:
            SpssClient.OpenOutputDoc(spv)
        outputDoc = _titleToPane()
        if spv and outputDoc:
            outputDoc.SaveAs(spv)
    except:
        # Parenthesised form works as a statement in Python 2 and a call in Python 3
        print("Error filling TITLE in Output Viewer [%s]" % sys.exc_info()[1])
    finally:
        SpssClient.StopClient()
Restart SPSS Statistics and run the following as a test:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
title="##Model##: jobcat".
freq jobcat.
title="##Model##: gender".
freq gender.
title="##Model##: minority".
freq minority.
begin program.
import editTitles
editTitles.titleToPane()
end program.
The TITLE command initially adds a title to the main output viewer (right-hand side); the Python code then transfers that text to the output tree in the left-hand pane. As already mentioned, TITLE is capped at 60 characters, and a warning is triggered to highlight this.
This editTitles.py approach is the closest you are going to get to including a descriptive title that identifies each model. Replacing the actual title "General Linear Model" with a custom title would require scripting knowledge and a lot more code, so this is a simpler alternative. Python integration is required for it to work.
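Because TITLE is capped at 60 characters, it can help to build and check the titles before pasting them into the syntax. The sketch below is plain Python and does not use SpssClient; the function name make_title, the example combinations, and the "Subj, SMU Muscle, GLM" layout are assumptions based on the example title in the question.

```python
MAX_TITLE_LEN = 60  # limit stated above for the TITLE command


def make_title(subj, smu, muscle, analysis="GLM"):
    """Return a TITLE command such as: TITLE "S1, SMU1 BICEPS, GLM"."""
    text = "%s, SMU%d %s, %s" % (subj, smu, muscle, analysis)
    if len(text) > MAX_TITLE_LEN:
        raise ValueError("title exceeds %d characters: %r" % (MAX_TITLE_LEN, text))
    return 'TITLE "%s".' % text


# Hypothetical filter combinations; replace with the ones actually used.
combos = [("S1", 1, "BICEPS"), ("S1", 2, "BICEPS"), ("S2", 1, "TRICEPS")]
titles = [make_title(s, n, m) for (s, n, m) in combos]
for line in titles:
    print(line)
```

Each generated line would go immediately before the corresponding GLM command, so that titleToPane() can later copy it into the left-hand pane.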
Also consider using:
SPLIT FILE SEPARATE BY <list of filter variables>.
This will automatically produce filter labels in the left hand pane.
This is easy to use for mutually exclusive filters. If your filters overlap, you can re-run the analysis several times with different filters applied to get as close as possible to your desired set of results.
For example:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
sort cases by jobcat minority.
split file separate by jobcat minority.
freq educ.
split file off.
I was wondering if it was possible to create graphs for multiple variables in a single syntax command in SPSS:
GRAPH
/HISTOGRAM(NORMAL)=
As it is, I'm creating multiple graphs as such:
GRAPH
/HISTOGRAM(NORMAL)=CO.
GRAPH
/HISTOGRAM(NORMAL)=Min_last.
GRAPH
/HISTOGRAM(NORMAL)=Day_abs.
etc etc.
If I would do something along the lines of:
GRAPH
/HISTOGRAM(NORMAL)=CO Min_last Day_abs
and it would generate a graph for each variable, I'd be pretty happy.
Anyways, let me know if you think it's possible or if I need to provide more info. Thanks for reading!
If you just want to save typing and want an independent set of graphs, you can define a macro like this:
define !H (!positional !cmdend)
!do !i !in (!1)
graph /histogram(normal)=!i.
!doend
!enddefine.
and invoke it with a list of variables.
!H salary salbegin.
The way I like to do it is to reshape the data with VARSTOCASES so that all three variables are stacked in a single column, and then either panel the charts in small multiples (if you want the axes to be the same) or use SPLIT FILE to produce separate charts. An example of the split file approach is below:
*Making fake data.
INPUT PROGRAM.
LOOP #i = 1 TO 100.
COMPUTE CO = RV.NORMAL(0,1).
COMPUTE Min_last = RV.UNIFORM(0,1).
COMPUTE Days_abs = RV.POISSON(5).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
*Reshaping to long.
VARSTOCASES /MAKE V FROM CO Min_last Days_abs /INDEX VLab (V).
*Split file and build separate charts.
SORT CASES BY VLab.
SPLIT FILE BY VLab.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=V
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: V=col(source(s), name("V"))
GUIDE: axis(dim(1), label("Value"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(V))), shape.interior(shape.square))
END GPL.
SPLIT FILE OFF.
I am plotting the creation times of a large batch of files in gnuplot to see if they are created linearly in time (they are not).
Here is my code:
#!/bin/bash
stat -c %Y img2/*png > timedata
echo "set terminal postscript enhanced colour
set output 'file_creation_time.eps'
plot 'timedata'" | gnuplot
The problem I have is that the y data are the creation times in seconds since the Unix epoch, so the plot just has 1.333...e+09 on the y-axis. I would like to have the creation time of the first file shifted to zero so that the relative creation times are readable.
I encounter this problem in a number of data-plotting contexts, so I would like to be able to do this within gnuplot rather than resorting to awk or some utility to preprocess the data.
I know the first time will be the smallest since the files are named serially, so is there a way to access the first element in a file, something like
`plot 'data' using ($1-$1[firstelement])`
?
I think you can do something like this (the following is untested, but I think it should work). Basically, you have to plot the file twice: on the first pass gnuplot picks up statistics about the dataset, and on the second pass you use what was found on the first to plot what you actually want.
set terminal unknown
plot 'datafile' using 1:2
set terminal post enh eps color
set output 'myfile.eps'
YMIN=GPVAL_Y_MIN
plot '' u 1:($2-YMIN)
If you have gnuplot 4.6, you can do the same thing with the stats command.
http://www.gnuplot.info/demo/stats.html
EDIT: It appears you want the first point to provide the offset (sorry, I misread the question).
If you want the first point to provide the offset, you may be able to do something like (again, untested -- requires gnuplot >= 4.3):
first=0;
offset=0;
func(x)=(offset=(first==0)?x:offset,first=1,x-offset)
plot 'datafile' using (func($1))
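The same first-point logic, written out in Python purely as an illustration of what func is meant to do (this is not gnuplot code, and the name make_shift and the timestamps are made up): the first value seen is remembered as the offset, and every value, including the first, is returned relative to it.

```python
def make_shift():
    """Return a function that subtracts the first value it is ever called with."""
    state = {"offset": None}

    def shift(x):
        if state["offset"] is None:  # first call: remember the offset
            state["offset"] = x
        return x - state["offset"]

    return shift


# Example timestamps (seconds since the epoch); made-up values.
times = [1333000000, 1333000004, 1333000009]
shift = make_shift()
relative = [shift(t) for t in times]
print(relative)  # [0, 4, 9]
```

Like the gnuplot version, this relies on the values being processed in file order, so that the first call really sees the first data point.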
Gnuplot can read data from the output of a Unix command, so you can say something like
gnuplot> plot "< tail -3 test.dat" using 1:2 with lines
in order to plot just the last three lines. You can use something like this for your purpose. Similarly, to plot only lines 1000 to 2000:
plot "<(sed -n '1000,2000p' filename.txt)" using 1:2 with lines
You can check this website, for more examples.
I found a related Stack Overflow question here and adapted the awk script from one of the answers:
#!/bin/bash
stat -c %Y img2/*png > timedata
echo "set terminal postscript enhanced colour
set output 'file_creation_time.eps'
unset key
set xlabel 'file number'
set ylabel 'file creation time (after first)'
plot \"<awk '{if(NR==1) {shift = \$1} print (\$1 - shift)}' timedata\"" | gnuplot
The output looks like this (these are not the data I was talking about in my question, but similar):
So, gnuplot can do what I want but it does depend on the UNIX environment...
I also tried mgilson's method:
plot 'timedata'
YMIN=GPVAL_Y_MIN
plot '' u ($1-YMIN)
but gnuplot (my version is 4.4.2) did not find the minimum correctly. It came close; it looks like it plotted such that the minimum of the y range is 0: