Is there a way to use the headers of the data file columns as labels (not in the key)?
I have data files with 5 or 6 columns and a header above each column. Now I would like to use the column headers with the set label command. Is this possible?
On a Unix-like system, the head command helps:
header = system("head -n 1 ".filename)
label1 = word(header,1)
label2 = word(header,2)
...
set label 1 label1 at 0.5,0.5
set label 2 ....
MS Windows does not have the head command; you might use 'findstr /B \"#\"' instead if the header line begins with a "#". Or use Cygwin to get a full GNU + POSIX environment under Windows.
The word() function should split your header string at the same positions as columnhead(), unless of course you have a different separator (not space or tab):
separator =","
p1 = strstrt(header,separator)
p2 = strstrt(header[p1+1:],separator)   # p2 is counted from position p1+1, not from the start of header
...
label1=header[1:p1-1]
...
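If the 1-based, end-inclusive substrings are confusing, here is a Python sketch of the same index arithmetic, for illustration only. strstrt, gp_slice and split_header are hypothetical helpers that imitate gnuplot's strstrt() and s[i:j] rules, and the header string is made up; in gnuplot itself you keep using strstrt() and substrings as shown.

```python
from typing import List, Optional


def strstrt(s: str, sub: str) -> int:
    """Like gnuplot's strstrt(): 1-based position of sub in s, 0 if absent."""
    return s.find(sub) + 1


def gp_slice(s: str, begin: int, end: Optional[int] = None) -> str:
    """Like gnuplot's s[begin:end]: 1-based indices, end inclusive."""
    return s[begin - 1:] if end is None else s[begin - 1:end]


def split_header(header: str, separator: str = ",") -> List[str]:
    """Split a header by repeating the strstrt()/substring steps."""
    labels = []
    rest = header
    p = strstrt(rest, separator)
    while p > 0:
        labels.append(gp_slice(rest, 1, p - 1))  # text before the separator
        rest = gp_slice(rest, p + 1)             # continue after the separator
        p = strstrt(rest, separator)
    labels.append(rest)                          # last field has no separator after it
    return labels


header = "time,value1,value2"
p1 = strstrt(header, ",")
p2 = strstrt(gp_slice(header, p1 + 1), ",")     # relative to position p1+1
label1 = gp_slice(header, 1, p1 - 1)
label2 = gp_slice(header, p1 + 1, p1 + p2 - 1)
print(label1, label2)          # time value1
print(split_header(header))    # ['time', 'value1', 'value2']
```

The second field needs p1+p2-1 as its end index precisely because p2 was measured inside the substring that starts at p1+1.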
I'm trying to construct a syntax that generates another syntax in SPSS, but I'm having some issues.
I have an Excel file with metadata and I would like to use it to make a syntax that extracts information from it. That way, if I have a huge database, I just need to keep the Excel file updated (add/delete variables, etc.) and then run a syntax to extract the information needed for a new syntax.
I also noticed that the produced syntax is always around 15 MB, which is a lot (applied to more than 500 lines)!
I don't use Python because the syntax has to run on different computers and/or configurations.
Any ideas? Can anyone please help me?
Thank you in advance.
Example:
(test.xlsx – sheet 1)
Var  Code  Label    List  Var_label (concatenate Var+Label)
V1   3     Sex      1     V1 “Sex”
V2   1     Work     2     V2 “Work”
V3   3     Country  3     V3 “Country”
V4   1     Married  2     V4 “Married”
V5   1     Kids     2     V5 “Kids”
V6   2     Satisf1  4     V6 “Satisf1”
V7   2     Satisf2  4     V7 “Satisf2”
(information from other file)
List = 1
1 “Male”
2 “Female”
List = 2
1 “Yes”
2 “No”
List = 3
1 “Europe”
2 “America”
3 “Asia”
4 “Africa”
5 “Oceania”
List = 4
1 “Very unsatisfied”
10 “Very satisfied”
I want to make a Syntax that generates a new syntax to apply “VARIABLE LABELS” and “VALUE LABELS”. So, I thought about something like this:
GET DATA
/TYPE=XLSX
/FILE="test.xlsx"
/SHEET=name 'sheet 1'
/CELLRANGE=FULL
/READNAMES=ON
/DATATYPEMIN PERCENTAGE=95.0.
EXECUTE.
STRING vlb (A15) labels (A150) value (A12) lab (A1500) point (A2) separate (A50) space (A2) list1 (A100) list2 (A100).
SELECT IF (Code=1).
COMPUTE vlb = "VARIABLE LABELS".
COMPUTE labels = CONCAT (RTRIM(Var_label)," ").
COMPUTE point = ".".
COMPUTE value = "VALUE LABELS".
COMPUTE lab = CONCAT (RTRIM(Var)," ").
COMPUTE list1 = '1 " Yes "'.
COMPUTE list2 = '2 "No".'.
COMPUTE space = " ".
COMPUTE separate="************************************************.".
WRITE OUTFILE = "list_01.sps" / vlb.
WRITE OUTFILE = "list_01.sps" /labels.
WRITE OUTFILE = "list_01.sps" /point.
WRITE OUTFILE = "list_01.sps" /value.
WRITE OUTFILE = "list_01.sps" /lab.
WRITE OUTFILE = "list_01.sps" /list1.
WRITE OUTFILE = "list_01.sps" /list2.
WRITE OUTFILE = "list_01.sps" /space.
WRITE OUTFILE = "list_01.sps" /separate.
WRITE OUTFILE = "list_01.sps" /space.
If only one variable uses a given list (e.g. V1), it works fine. However, if more than one variable uses the same list, the commands are repeated once for each variable (e.g. V2, V4 and V5).
What I get for V2, V4 and V5 after running the code above:
VARIABLE LABELS
V2 "Work"
.
VALUE LABELS
V2
1 " Yes "
2 " No "
************************************************.
VARIABLE LABELS
V4 "Married"
.
VALUE LABELS
V4
1 " Yes "
2 " No "
************************************************.
VARIABLE LABELS
V5 "Kids"
.
VALUE LABELS
V5
1 " Yes "
2 " No "
************************************************.
What I would like to have:
VARIABLE LABELS
V2 "Work"
V4 "Married"
V5 "Kids"
.
VALUE LABELS
V2 V4 V5
1 " Yes "
2 " No "
I think there are probably ways to automate the whole process better, including the use of your second data source. But for the scope of this question I will suggest a way to get what you asked for specifically.
The key is to build the command with special conditions for first and last lines:
string cmd1 cmd2 (a200).
sort cases by code.
match files /file=* /by code /first=first /last=last. /* marking first and last lines.
do if first.
compute cmd1="VARIABLE LABELS".
compute cmd2="VALUE LABELS".
end if.
if not first cmd1=concat(rtrim(cmd1), " /"). /* "/" only appears from the second varname.
compute cmd1=concat(rtrim(cmd1), " ", Var_label).
compute cmd2=concat(rtrim(cmd2), " ", Var).
do if last.
compute cmd1=concat(rtrim(cmd1), " .").
compute cmd2=concat(rtrim(cmd2), " ", ' 1 " Yes " 2 "No". ').
end if.
exe.
The commands are now ready, but we don't want to get them mixed up so we'll stack them one under the other, and only then write them out:
add files /file=* /rename cmd1=cmd /file=* /rename cmd2=cmd.
exe.
WRITE OUTFILE = "var definitions.sps" / cmd .
exe.
EDIT:
Note that the code above assumes you've already run SELECT IF (Code = ...) and that all the remaining lines share a single code.
Note also that I added an exe. command at the end: WRITE is only carried out when the data are passed, so without it the new syntax file will be empty.
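The question rules out Python for the production workflow, so treat this only as a language-neutral illustration of the grouping idea behind the first/last trick: group the selected variables by their value-label list and write each list's labels once. The rows and value lists are copied from the example; ROWS, VALUE_LISTS and build_syntax are hypothetical names. Unlike the SPSS answer, which relies on a single Code remaining after SELECT IF, this sketch groups by the List column.

```python
from typing import Dict, List, Tuple

# (Var, Code, Label, List) rows copied from the example sheet in the question.
ROWS: List[Tuple[str, int, str, int]] = [
    ("V1", 3, "Sex", 1),
    ("V2", 1, "Work", 2),
    ("V3", 3, "Country", 3),
    ("V4", 1, "Married", 2),
    ("V5", 1, "Kids", 2),
    ("V6", 2, "Satisf1", 4),
    ("V7", 2, "Satisf2", 4),
]

# Value-label lists from the second source in the question.
VALUE_LISTS: Dict[int, List[Tuple[int, str]]] = {
    1: [(1, "Male"), (2, "Female")],
    2: [(1, "Yes"), (2, "No")],
    3: [(1, "Europe"), (2, "America"), (3, "Asia"), (4, "Africa"), (5, "Oceania")],
    4: [(1, "Very unsatisfied"), (10, "Very satisfied")],
}


def build_syntax(code: int) -> str:
    """One VARIABLE LABELS command, then one VALUE LABELS command per list."""
    selected = [row for row in ROWS if row[1] == code]  # like SELECT IF (Code = code)
    lines = ["VARIABLE LABELS"]
    lines += [f'{var} "{label}"' for var, _, label, _ in selected]
    lines.append(".")
    # Group variable names by value-label list, keeping first-seen order.
    groups: Dict[int, List[str]] = {}
    for var, _, _, list_id in selected:
        groups.setdefault(list_id, []).append(var)
    for list_id, names in groups.items():
        lines.append("VALUE LABELS")
        lines.append(" ".join(names))  # all variables sharing the list, once
        lines += [f'{value} "{text}"' for value, text in VALUE_LISTS[list_id]]
        lines.append(".")
    return "\n".join(lines)


print(build_syntax(1))
```

For Code = 1 this prints V2, V4 and V5 under a single VARIABLE LABELS command and a single VALUE LABELS command, which is the layout the question asks for.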
The results of my program simulations are several data files whose first column indicates success (=0) or error (=1) and whose second column is the simulation time in seconds.
An example of these two columns is:
1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635
1 238.5124609287994
0 246.4538392581228
. .
. .
. .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778
1 330.52804695995303
0 332.0673690346546
0 358.3001385706268
0 359.82271742496414
1 400.8162129871805
0 404.88783391725985
1 411.27012219170393
I can make a frequency plot (histogram) of the errors (1's) by binning the data:
set encoding iso_8859_1
set key left top
set ylabel "P_{error}"
set xlabel "Time [s]"
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3
`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`
stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"
I want to improve the gnuplot script above to include the rest of the data files lookup-.....-.txt and their error samples, and join them in the same frequency plot.
I would also like to avoid intermediate files like t7.dat.
In addition, I would like to plot a horizontal line at the mean error probability.
How could I plot all the sample data in the same plot?
Regards
If I understand you correctly, you want to do the histogram over several files. So, you basically have to concatenate several datafiles.
Of course, you can do this with some external programs like awk, etc. or shell commands.
Below is a possible solution using gnuplot plus one system command, with no need for a temporary file. The system command is for Windows, but you can probably translate it to Linux easily (e.g. ls lookup*.txt instead of dir /B lookup*.txt). You may also need to check whether the NaN values mess up your binning and histogram results.
### start code
reset session
# create some dummy data files
do for [i=1:5] {
set table sprintf("lookup-blahblah_%d.txt", i)
set samples 50
plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
unset table
}
# end creating dummy data files
FILELIST = system("dir /B lookup*.txt") # this is for Windows
print FILELIST
undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table
print $AllDataWithError
# ... do your binning and plotting
### end of code
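If you prefer to preprocess outside gnuplot, here is a Python sketch of the same concatenate-and-filter step. It also computes the mean error probability (error rows divided by all rows across the files), which is the value the requested horizontal line would show. collect_errors and the two demo files are made up for illustration; the filter mirrors awk's $1==1.

```python
import glob
import os
import tempfile
from typing import Iterable, List, Tuple


def collect_errors(paths: Iterable[str]) -> Tuple[List[float], float]:
    """Concatenate several result files; return error times and error probability."""
    error_times: List[float] = []
    total = 0
    for path in paths:
        with open(path) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                total += 1
                if fields[0] == "1":  # error rows, like awk '$1==1'
                    error_times.append(float(fields[1]))
    p_error = len(error_times) / total if total else float("nan")
    return error_times, p_error


# Demo with two small made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "lookup-a.txt": "1 185.49\n0 219.62\n1 238.51\n",
        "lookup-b.txt": "0 246.45\n1 307.48\n0 332.07\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    files = sorted(glob.glob(os.path.join(tmp, "lookup-*.txt")))
    errors, p_error = collect_errors(files)
    print(errors)   # [185.49, 238.51, 307.48]
    print(p_error)  # 0.5
```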
Edit:
Apparently, NaN values and/or empty lines mess up smooth freq and/or the binning.
So we need to extract only the lines with errors (=1).
With the code above you can merge several files into one datablock.
The code below starts directly from a single datablock similar to your data.
### start of code
reset session
# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
set samples 1000
plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data
stats $Data nooutput
Datapoints = STATS_records
# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
plot $Dummy u 1:2 with table
unset table
bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records
set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
$Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"
unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"
binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"
unset multiplot
### end of code
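which results in a three-panel plot (image not included): the success/error impulses with the counts, the error histogram with binwidth 1000, and the error histogram with binwidth 100.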
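The bin() definition used in both scripts, width*floor(x/width), maps each value to the left edge of its bin, and the 1.0/STATS_records weight makes the bar heights sum to 1. Here is a Python sketch of that arithmetic, assuming the error times below (rounded from the sample rows in the question, elided rows not included); bin_start and normalized_histogram are hypothetical names. For these sample rows, the question's binwidth=2000 puts every error (185 to 411 s) into the single bin starting at 0.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable


def bin_start(x: float, width: float) -> float:
    """Same formula as gnuplot's bin(x,width) = width*floor(x/width)."""
    return width * floor(x / width)


def normalized_histogram(values: Iterable[float], width: float) -> Dict[float, float]:
    """Fraction of values per bin, like (1.0/STATS_records) with smooth freq."""
    values = list(values)
    counts = Counter(bin_start(v, width) for v in values)
    n = len(values)
    return {start: c / n for start, c in sorted(counts.items())}


# Error times (first column == 1) from the sample rows in the question.
error_times = [185.49, 199.45, 207.36, 213.52, 215.51, 238.51,
               307.48, 329.16, 330.53, 400.82, 411.27]
WIDTH = 100
hist = normalized_histogram(error_times, WIDTH)
for start, frac in hist.items():
    print(f"[{start:g}, {start + WIDTH:g}): {frac:.3f}")
```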
You can pipe the data and the plot directives to gnuplot without a temp file,
for example:
$ awk 'BEGIN{print "plot \"-\" using ($1):($2)";
while(i++<20) print i,rand()*20; print "e"}' | gnuplot -p
will create a random plot. You can print the directive in the BEGIN block as I did and the main awk statement can filter the data.
For your plot, something like this
$ awk 'BEGIN{print "...." }
$1==1{print $2}
END {print "e"}' lookup-*.txt | gnuplot -p
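The same piping idea works from any program that writes to stdout. As a sketch, this Python function emits gnuplot commands followed by the filtered error times as inline data for plot "-", terminated by a line containing only e. build_script, the default binwidth of 100, and the normalization by the error count are illustrative choices, not part of the awk answer; you would write the returned string to stdout and pipe it to gnuplot -p.

```python
from typing import Iterable, List


def build_script(lines: Iterable[str], binwidth: float = 100) -> str:
    """Return gnuplot commands followed by inline data for plot "-"."""
    # Keep column 2 of rows whose column 1 is 1 (like awk '$1==1{print $2}').
    times: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "1":
            times.append(fields[1])
    n = max(len(times), 1)  # avoid 1.0/0 when there are no errors
    commands = [
        f"binwidth = {binwidth}",
        "bin(x,width) = width*floor(x/width)",
        f'plot "-" using (bin($1,binwidth)):(1.0/{n}) smooth freq with boxes notitle',
    ]
    # Inline data ends with a line containing only "e".
    return "\n".join(commands + times + ["e"]) + "\n"


sample = ["1 185.49", "0 219.62", "1 238.51", "0 246.45"]
print(build_script(sample), end="")
```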
I am using the printf command to log some values to a file as follows:
printf "Parameter = $parameter v9_value = $v9_val v9_line = $V9_Line_Count v16_val = $v16_val v16_line = $V16_Line_Count"
But the output I get is:
v16_line = 8elayServerPort v9_value = 41 v9_line = 8 v16_val = 4571
It seems the line wraps around: the last values appear at the start of the line.
Expected Output:
Parameter = RelayServerPort v9_value = 41 v9_line = 8 v16_val = 4571 v16_line = 8
Instead, v16_line = 8 overwrites Parameter = R at the start of the line.
printf doesn't add a newline at the end. You need to add \n to the end of your printf format string.
Not seeing the rest of your program, or where you get your variable values, it's hard to say what else could be the issue.
One thing you can do is redirect your output to a file and look at that file, either with a good programming editor or with cat -v, which shows non-printing characters in visible form.
See whether ^M appears in your output. If it does, you probably have carriage returns (^M) in your variables.
Also remove $v16_val from your printf (temporarily) and see if your output looks better. If so, that $v16_val might have a CR (^M) in it.
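To see why a stray carriage return produces exactly this output, here is a small Python sketch. render_line is a hypothetical, simplified model of a terminal (a \r moves the cursor back to column 0 and later characters overwrite earlier ones), and the trailing \r in v16_val is an assumption, typical of values read from a file with Windows line endings. The garbled line matches the reported output apart from a leading space; in the shell, stripping the CR (for example with tr -d '\r') would be the corresponding fix.

```python
from typing import List


def render_line(text: str) -> str:
    """Simplified terminal: '\\r' returns to column 0, later chars overwrite."""
    screen: List[str] = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0
            continue
        if col < len(screen):
            screen[col] = ch
        else:
            screen.append(ch)
        col += 1
    return "".join(screen)


v16_val = "4571\r"  # assumption: value read from a file with CRLF line endings
line = ("Parameter = RelayServerPort v9_value = 41 v9_line = 8 "
        f"v16_val = {v16_val} v16_line = 8")
print(render_line(line))                    # garbled, as in the question
print(render_line(line.replace("\r", "")))  # CR stripped: expected output
```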
I want to copy the contents of a text file into an Excel cell using VBA. This works for most text files, but for certain files the code copies only part of the data into the Excel file.
This is the code I used for copying:
FileName = folderpath & sFile
Set mytextfile = Workbooks.Open(FileName)
mytextfile.Sheets(1).Cells.CurrentRegion.Copy ThisWorkbook.Sheets("RawData").Range("A" & inputRow)
'mytextfile.Sheets(1).Range("A1").CurrentRegion.Copy ThisWorkbook.Sheets("RawData").Range("A" & inputRow)
mytextfile.Close (False)
I already understand what the problem is: when certain text files are opened as Excel files, some of the contents end up in cell A1 and the rest in cell A2.
I don't know why they are opened that way. Here are the contents of the two text files:
1) Text file whose contents are split across different cells when opened in Excel:
fwi!3F5A!041!g1ksIpqub7J MCMILLAN J. PIIKKILA RAYMONDBERRY#WEBTV.NET
+001 061 477 130 F g3ktHqrwc9 CLE!g1ksIpqub7 CLEHS04C |P.O. BOX 171 SEARSPORT,ME Nashville 68800 AZ| |5150 CTY RD 525 Raleigh 64292|
18000000 0412CL0 1 N 2
I got the following output when I used the above text file.
fwi!3F5A!041!g1ksIpqub7J MCMILLAN J. PIIKKILA RAYMONDBERRY#WEBTV.NET +001 061 477 130 F g3ktHqrwc9 CLE!g1ksIpqub7 CLEHS04C |P.O. BOX 171 SEARSPORT
2) Text file whose contents end up in a single cell, A1:
fSj!3U68!071!gQloo3d5OGG Presley Y. TART JR PULPACTION82#HOIMAIL.COM
+001 047 475273 M gQmqq6d8ME CVE!gQloo3d5OG CVEGF07C |10001 SW 125TH CT RD Reno 88595 TN| |10849 DEBORAH DRIVE Glendale 70958| 97400000
0712CV0 0 N 0
I got the following output when I used the above text file.
fSj!3U68!071!gQloo3d5OGG Presley Y. TART JR PULPACTION82#HOIMAIL.COM +001 047 475273 M gQmqq6d8ME CVE!gQloo3d5OG CVEGF07C |10001 SW 125TH CT RD Reno 88595 TN| |10849 DEBORAH DRIVE Glendale 70958| 97400000 0712CV0 0 N 0
There is no apparent difference between the contents of these two files.
I also tried this code but without any success:
FileName = folderpath & sFile
Set mytextfile = Workbooks.Open(FileName)
mytextfile.Sheets(1).Range("A1").CurrentRegion.Copy ThisWorkbook.Sheets("RawData").Range("A" & inputRow)
mytextfile.Close (False)
Problem:
You are not using the right approach to read a *.txt file in VBA. Opening the file with Workbooks.Open() treats it like a *.csv. Therefore, when Excel reads the text and encounters a comma, it treats it as a delimiter and puts the remaining part (after the comma) into the next cell. Note that the first sample contains a comma (SEARSPORT,ME) and the copied output stops exactly there, while the second sample contains no comma. As stated on MSDN (Workbooks.Open Method):
expression .Open(FileName ... )
expression A variable that represents a Workbooks object.
In other words, the method opens a Workbook, not a plain text file.
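Assuming Excel applies comma-delimited parsing when it opens the file, Python's csv module reproduces the split on the two samples: the second line of the first file breaks into two fields exactly where the copied output stops, while the second file's line stays whole. comma_fields is a hypothetical helper used only for this demonstration.

```python
import csv
import io
from typing import List


def comma_fields(text: str) -> List[str]:
    """Split one line on commas, as a comma-delimited import would."""
    return next(csv.reader(io.StringIO(text)))


# Second lines of the two sample files from the question.
file1_line = ("+001 061 477 130 F g3ktHqrwc9 CLE!g1ksIpqub7 CLEHS04C "
              "|P.O. BOX 171 SEARSPORT,ME Nashville 68800 AZ| |5150 CTY RD 525 Raleigh 64292|")
file2_line = ("+001 047 475273 M gQmqq6d8ME CVE!gQloo3d5OG CVEGF07C "
              "|10001 SW 125TH CT RD Reno 88595 TN| |10849 DEBORAH DRIVE Glendale 70958| 97400000")

for line in (file1_line, file2_line):
    fields = comma_fields(line)
    print(len(fields), "field(s); first ends with:", repr(fields[0][-9:]))
```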
Solution:
The right approach to read content of a *.txt file is to use the FileSystemObject and TextStream objects from the Microsoft Scripting Runtime Library.
I wrote a simple Sub for you that reads the entire content of a *.txt file. To make it work, you have to add a reference to your project:
In the VBE window, click Tools » References, then scroll down, find, and tick Microsoft Scripting Runtime.
Now, look through the code below and modify the path to your *.txt file (or pass the path as a parameter); the entire content of your *.txt file will be placed in cell A1 of the first sheet, Sheets(1).
Sub ReadTxtFile()
Dim oFSO As New FileSystemObject
Dim oFS As TextStream
Dim fileName As String
' make sure to update your path or
' pass it to the sub through parameter
fileName = "C:\Users\fooboo\Desktop\text.txt"
Set oFS = oFSO.OpenTextFile(fileName)
Dim content As String
content = oFS.ReadAll
With Sheets(1).Range("A1")
.ClearContents
.NumberFormat = "@"   ' text format, so Excel stores the content as-is
.Value = content
End With
oFS.Close
Set oFS = Nothing
Set oFSO = Nothing
End Sub
I am trying to set gnuplot options with a set for loop.
I am using version 4.6, and according to the gnuplot documentation (page 70) the syntax is as follows:
for [intvar = start:end{:increment}]
for [stringvar in "A B C D"]
Examples:
set for [i = 1:10] style line i lc rgb "blue"
But I get this error:
gnuplot> set for [var in gpvars] replace(var,'#_#',' ')
^
line 0: Unrecognized option. See 'help set'.
My script:
#!/bin/bash
OUTDIRNAME="out"
TIMEFORMAT='%d.%m.%y'
GPPARS=( "xlabel "Time"" "ylabel "value1"" "y2label "value2"" "format x "%H:%M"")
GPPARS_MOD=()
for (( i=0; i < ${#GPPARS[@]}; i++)); do
FILE=${GPPARS[${i}]}
echo "arg=${FILE}"
GPPARS_MOD+=( "`echo "${FILE}" | sed -e 's/ /#_#/g'`" )
done
gnuplot << EOF
reset
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] ,C,R) : S
set terminal png
set output "${OUTDIRNAME}/graph.png"
set timefmt "${TIMEFORMAT}"
set xdata time
gpvars="${GPPARS_MOD[@]}"
set for [var in gpvars] {
replace(var,'#_#',' ')
}
...
EOF
...
exit 0
I am also using the function replace because of the spaces (gnuplot ignores escape sequences). The function works flawlessly in a plot for loop.
I have tried with and without function and with variables without spaces, but the result is same.
As a side note, I'm not sure your bash array groups things the way you want it to; for me, your quotation marks get stripped. Try:
GPPARS=( "xlabel 'Time'" "ylabel 'value1'" "y2label 'value2'" "format x '%H:%M'")
instead. (interior double quotes replaced with single quotes)
This is a tricky one. It's a good thing you're using gnuplot 4.6; otherwise I'm not sure how to go about solving it. (EDIT: using gnuplot 4.4, you could use a combination of word, words, if, reread, exists and macros, but it's quite a messy solution.)
Note that what you have doesn't work because it is akin to:
MYLABEL='xlabel "foo"'
set MYLABEL
Gnuplot doesn't expand MYLABEL before executing the set command; that is what lets you do things like:
MYLABEL="totally cool X label here!"
set xlabel MYLABEL
What you want could be accomplished using macros (but alas, not with iteration):
set macro
MYLABEL='xlabel "foo"'
set @MYLABEL
But that doesn't quite work here either, because macro expansion happens before anything else (e.g. function evaluation). What you need here is gnuplot's more general iteration, introduced in 4.6, combined with eval:
do for [ var in gpvars ] {
eval( 'set '.replace(var,'#_#',' ') )
}
EDIT -- gnuplot 4.2+ solution
#top of script -- Nothing should go here.
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] ,C,R) : S
if( ! exists("N") ) N=1
TODO="${GPPARS_MOD[@]}"
set macro
do_set=replace(word(TODO,N),'#_#',' ')
set @do_set
N=N+1
if( N <= words(TODO) ) reread
#rest of script here ...
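To check the string round trip outside gnuplot, here is a Python sketch. replace() is a port of the recursive gnuplot replace(S,C,R) from the question (translating its 1-based, end-inclusive slices), and the rest mimics what the bash script does before gnuplot's loop runs. It assumes gnuplot splits gpvars on whitespace the way str.split() does, and the gppars values use the single-quoted form suggested above. It does not run gnuplot; there, each resulting string would go to eval (or to set @do_set in the 4.2 version).

```python
from typing import List


def replace(s: str, c: str, r: str) -> str:
    """Python port of the recursive gnuplot replace(S,C,R).

    gnuplot strings are 1-based and S[a:b] includes both ends, so
    S[:p-1] becomes s[:p-1] and S[p+strlen(C):] becomes s[p-1+len(c):].
    Like the original, it recurses forever if r itself contains c.
    """
    p = s.find(c) + 1  # strstrt(S,C): 1-based position, 0 if not found
    if p:
        return replace(s[:p - 1] + r + s[p - 1 + len(c):], c, r)
    return s


# Parameters in the single-quoted form suggested in the answer.
gppars = ["xlabel 'Time'", "ylabel 'value1'", "y2label 'value2'", "format x '%H:%M'"]
encoded = [p.replace(" ", "#_#") for p in gppars]  # what sed 's/ /#_#/g' does
gpvars = " ".join(encoded)                         # one space-separated string
commands: List[str] = ["set " + replace(w, "#_#", " ") for w in gpvars.split()]
for cmd in commands:
    print(cmd)
```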