Syntax to generate a Syntax in SPSS - syntax

I’m trying to construct a Syntax to generate a Syntax in SPSS, but I’m having some issues…
I have an excel file with metadata and I would like to use it in order to make a syntax to extract information from it (like this, if I have a huge database, I just need to keep the excel updated – add/delete variables, etc. - and then run a syntax to extract the needed information for a new syntax).
I also noticed the produced syntax has always around 15Mb, which is a lot (applied to more than 500 lines)!
I don’t use Python due to run syntax in different computers and/or configurations.
Any ideas? Can anyone please help me?
Thank you in advance.
Example:
(test.xlsx – sheet 1)
Var Code Label List Var_label (concatenate Var+Label)
V1 3 Sex 1 V1 “Sex”
V2 1 Work 2 V2 “Work”
V3 3 Country 3 V3 “Country”
V4 1 Married 2 V4 “Married”
V5 1 Kids 2 V5 “Kids”
V6 2 Satisf1 4 V6 “Satisf1”
V7 2 Satisf2 4 V7 “Satisf2”
(information from other file)
List = 1
1 “Male”
2 “Female”
List = 2
1 “Yes”
2 “No”
List = 3
1 “Europe”
2 “America”
3 “Asia”
4 “Africa”
5 “Oceania”
List = 4
1 “Very unsatisfied”
10 “Very satisfied”
I want to make a Syntax that generates a new syntax to apply “VARIABLE LABELS” and “VALUE LABELS”. So, I thought about something like this:
GET DATA
/TYPE=XLSX
/FILE="test.xlsx"
/SHEET=name 'sheet 1'
/CELLRANGE=FULL
/READNAMES=ON
/DATATYPEMIN PERCENTAGE=95.0.
EXECUTE.
STRING vlb (A15) labels (A150) value (A12) lab (A1500) point (A2) separate (A50) space (A2) list1 (A100) list2 (A100).
SELECT IF (Code=1).
COMPUTE vlb = "VARIABLE LABELS".
COMPUTE labels = CONCAT (RTRIM(Var_label)," ").
COMPUTE point = ".".
COMPUTE value = "VALUE LABELS".
COMPUTE lab = CONCAT (RTRIM(Var)," ").
COMPUTE list1 = '1 " Yes "'.
COMPUTE list2 = '2 "No".'.
COMPUTE space = " ".
COMPUTE separate="************************************************.".
WRITE OUTFILE = "list_01.sps" / vlb.
WRITE OUTFILE = "list_01.sps" /labels.
WRITE OUTFILE = "list_01.sps" /point.
WRITE OUTFILE = "list_01.sps" /value.
WRITE OUTFILE = "list_01.sps" /lab.
WRITE OUTFILE = "list_01.sps" /list1.
WRITE OUTFILE = "list_01.sps" /list2.
WRITE OUTFILE = "list_01.sps" /space.
WRITE OUTFILE = "list_01.sps" /separate.
WRITE OUTFILE = "list_01.sps" /space.
If there is only one variable with same list (ex: V1), it works ok. However, if there is more than one variable having the same list, it reproduces the codes as much times as number of variables (Ex: V2, V4 and V5).
What I have (Ex: V2, V4 and V5), after running code above:
VARIABLE LABELS
V2 "Work"
.
VALUE LABELS
V2
1 " Yes "
2 " No "
************************************************.
VARIABLE LABELS
V4 "Married"
.
VALUE LABELS
V4
1 " Yes "
2 " No "
************************************************.
VARIABLE LABELS
V5 "Kids"
.
VALUE LABELS
V5
1 " Yes "
2 " No "
************************************************.
What I would like to have:
VARIABLE LABELS
V2 "Work"
V4 "Married"
V5 "Kids"
.
VALUE LABELS
V2 V4 V5
1 " Yes "
2 " No "

I think there are probably ways to automate the whole process better, including the use of your second data source. But for the scope of this question I will suggest a way to get what you asked for specifically.
The key is to build the command with special conditions for first and last lines:
string cmd1 cmd2 (a200).
sort cases by code.
match files /file=* /first=first /last=last /by code. /* marking first and last lines.
do if first.
compute cmd1="VARIABLE LABELS".
compute cmd2="VALUE LABELS".
end if.
if not first cmd1=concat(rtrim(cmd1), " /"). /* "/" only appears from the second varname.
compute cmd1=concat(rtrim(cmd1), " ", Var_label).
compute cmd2=concat(rtrim(cmd2), " ", Var).
do if last.
compute cmd1=concat(rtrim(cmd1), " .").
compute cmd2=concat(rtrim(cmd2), " ", ' 1 " Yes " 2 "No". ').
end if.
exe.
The commands are now ready, but we don't want to get them mixed up so we'll stack them one under the other, and only then write them out:
add files /file=* /rename cmd1=cmd /file=* /rename cmd2=cmd.
exe.
WRITE OUTFILE = "var definitions.sps" / cmd .
exe.
EDIT:
Note that the code above assumes you've already run a select cases if code = ... and that there is a single code in all the remaining lines.
Note also I added an exe. command at the end - without running that the new syntax will appear empty.

Related

Get line number where first occurrence of a value appears?

I have a CSV file like below:
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
I'm trying to use find the row number where the final column has a value below a certain range. For example, below 0.6.
Using the above CSV file, I want to return 3 because E = 3 is the first row where Mean <= 0.60. If there is no value below 0.6 I want to return 0. I am in effect returning the value in the first column based on the final column.
I plan to initialize this number as a constant in gnuplot. How can this be done? I've tagged awk because I think it's related.
In case you want a gnuplot-only version... if you use a file remove the datablock and replace $Data by your filename in " ".
Edit: You can do it without a dummy table, it can be done shorter with stats (check help stats). Even shorter than the accepted solution (well, we are not at code golf here), but additionally platform-independent because it's gnuplot-only.
Furthermore, in case E could be any number, i.e. 0 as well, then it might be better
to first assign E = NaN and then compare E to NaN (see here: gnuplot: How to compare to NaN?).
Script:
### conditional extraction into a variable
reset session
$Data <<EOD
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
EOD
E = NaN
stats $Data u ($8<=0.6 && E!=E? E=$1 : 0) nooutput
print E
### end of script
Result:
3.0
Actually, OP wants to return E=0 if the condition was not met. Then the script would be like this:
E=0
stats $Data u ($8<=0.6 && E==0? E=$1 : 0) nooutput
Another awk. You could initialize the default return value to var ret in BEGIN but since it's 0 there is really no point as empty var+0 produces the same effect. If the threshold value of 0.6 is not met before the ENDis reached, that is returned. If it is met, exit invokes the END and ret is output:
$ awk '
NR>1 && $NF<0.6 { # final column has a value below a certain range
ret=$1 # I want to return 3 because E = 3
exit
}
END {
print ret+0
}' file
Output:
3
Something like this should do the trick:
awk 'NR>1 && $8<.6 {print $1;fnd=1;exit}END{if(!fnd){print 0}}' yourfile

SPSS Syntax- new variable based on 3 variables

I need to create a new variable based on 3 variables.
If someone is coded as 1 against any 1 of 3 variables, they are coded as 1 in the new variable
If they don’t code 1 on any variable, but code as 2 against any 1 of 3 variables they are coded as 2 in the new variable
Everything else is coded as 99
In syntax, I’ve written this as:
IF (Keep_Any=1 OR Find_Any=1 OR Improve_Any=1) Keep_Find_Improve=1.
IF ((Keep_Find_Improve~= 1) & (Keep_Any=2 | Find_Any=2 | Improve_Any=2)) Keep_Find_Improve=2.
IF (Keep_Find_Improve~=1 & Keep_Find_Improve~=2) Keep_Find_Improve=99.
EXECUTE.
However, the first part correctly identifies cases coded as 1, but the rest of the syntax doesn't work. This is in despite of using some syntax that uses the exact same logic on other variables:
COMPUTE Keep_Any= Q9a_recoded = 1 | Q9b_recoded = 1 | Q9c_recoded = 1.
EXECUTE.
IF (Q9A_recoded=1 OR Q9B_recoded=1 OR Q9C_recoded=1) Keep_Any=1.
IF ((Keep_Any~=1) & (Q9A_recoded= 2 OR Q9B_recoded=2 OR Q9C_recoded=2)) Keep_Any=2.
IF (Keep_Any~=1 & Keep_Any~=2) Keep_Any=99.
EXECUTE.
Does anyone have any ideas of why this is happening and how to fix it?
Try:
COMPUTE Target1=99.
COMPUTE Target1=ANY(2, V1, V2, V3).
COMPUTE Target1=ANY(1, V1, V2, V3).
Or better still (more efficient, even if more lines of code)
DO IF ANY(1, V1, V2, V3)=1.
COMPUTE Target2= 1.
ELSE IF ANY(2, V1, V2, V3)=1.
COMPUTE Target2 = 2.
ELSE.
COMPUTE Target2=99.
END IF.
The reason your syntax isn't working is the second condition you are using:
IF ((Keep_Find_Improve~= 1) & ...
When the first condition wasn't met, Keep_Find_Improve is still missing, and therefore doesn't meet the condition of ~=1.
The same thing goes later for IF (Keep_Any~=1 & Keep_Any~=2).
So you shouldn't be comparing Keep_Find_Improve to 1 or 2, you should be checking if it has received a value or if it's still missing:
IF (Keep_Any=1 OR Find_Any=1 OR Improve_Any=1) Keep_Find_Improve=1.
IF (missing(Keep_Find_Improve) & (Keep_Any=2 | Find_Any=2 | Improve_Any=2)) Keep_Find_Improve=2.
IF missing(Keep_Find_Improve) Keep_Find_Improve=99.
* alternatively: recode Keep_Find_Improve(miss=99).
EXECUTE.
All this being said, I recommend you use the more advanced code suggested by #JigneshSutar.

combine pipe delimited lines of text in multiple lines with vbscript?

I've got a text file of output that looks essentially like this:
SMITHERSON, SMITH|00012345|15-Jan-1999|000885340
619649339|29-Sep-2015 00:09:30|Black|JOHNERSON, JOHN
00067890|02-Dec-1996|000490365|620094551
29-Sep-2015 23:06:01|Green|DAVISON, DAVE|00086543|06-Jun-2001|000938585
226438332|28-Sep-2015 00:12:12|Yellow
Seven pieces of data, they are always in the correct order but unfortunately they run together and onto different lines. There are carriage return + line feeds at the end of each line and there aren't pipe delimiters. The individual pieces of data are never split over multiple lines - I'm having a hard time explaining so here's another example:
DATA 1|DATA 2|DATA 3
DATA 4
DATA 5|DATA 6|DATA 7
DATA 1|DATA 2|DATA 3|DATA 4
DATA 5|DATA 6|DATA 7
etc...
They will have spaces between them but each piece of data will always stay on it's own line.
And I'm trying to turn it into this:
SMITHERSON, SMITH|00012345|15-Jan-1999|000885340|619649339|29-Sep-2015 00:09:30|Black
JOHNERSON, JOHN|00067890|02-Dec-1996|000490365|620094551|29-Sep-2015 23:06:01|Green
DAVISON, DAVE|00086543|06-Jun-2001|000938585|226438332|28-Sep-2015 00:12:12|Yellow
DATA 1|DATA 2|DATA 3|DATA 4|DATA 5|DATA 6|DATA 7
DATA 1|DATA 2|DATA 3|DATA 4|DATA 5|DATA 6|DATA 7
etc.
Seven pieces of data each on their own line, but still seperated by the '|' for another piece of software to read correctly.
I am spending about one hour every day correcting the text files by hand, so I've been trying to find an example I can work from to do this for a while but have not had any luck wrapping my head around this.
This code is ok. I only tested your sample text, not big files.
It will replace line feeds with the delimiter, then convert the entire file into one big array:
Set fso = CreateObject("Scripting.FileSystemObject")
Set input = fso.OpenTextFile("input.txt", 1)
Set output = fso.OpenTextFile("output.txt", 2, True)
Dim data: data = input.ReadAll
input.Close()
data = Replace(data, vbCrlf, "|")
data = Split(data, "|")
For i=0 To UBound(data) Step 7
output.WriteLine data(i) & "|" & data(i+1) & "|" & data(i+2) & "|" & data(i+3) & "|" & data(i+4) & "|" & data(i+5) & "|" & data(i+6)
Next
output.Close()
Untested, but something like this might do it. (Essentially it copies input to output as a stream, but newlines in the input are converted to pipe characters and every seventh pipe in the output is converted to a newline)
Set fs = CreateObject("Scripting.FileSystemObject")
Set f = fs.OpenTextFile("D:\data\thefile.txt", 1)
Set o = fs.OpenTextFile("D:\data\combined.txt", 2, True)
pipecount = 0
Do While f.AtEndOfFile <> True
If f.AtEndOfLine = True Then
c = f.Read(2) ' Skip the CR+LF
c = "|" ' and pretend we got a pipe character
Else
c = f.Read(1)
End If
If c = "|" Then
pipecount = pipecount + 1
If pipecount = 7 Then
pipecount = 0
o.WriteLine()
Else
o.Write("|")
End If
Else
o.Write(c)
End If
End While
o.Close()

Fit many pieces of data in Gnuplot's for loop

Data
Model Decreasing-err Constant-err Increasing-err
2025 73-78 80-85 87-92
2035 63-68 80-85 97-107
2050 42-57 75-90 104.5-119.5
which data-structure (use of -err) described here.
To plot the points, I run
set terminal qt size 560,270;
set grid; set offset 1,1,0,0;
set datafile separator " -";
set key autotitle columnhead;
plot for [i=2:6:2] "data.dat" using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i))) with yerror;
and get
However, I would like to add a line fits to these points which you cannot do just with with yerrorlines because of kinks.
My pseudocode for fitting the increasing and decreasing lines
inc(x) = k1*x + k2;
con(x) = n1*x + n2;
dec(x) = m1*x + m2;
fit inc(x), con(x) dec(x) for [i=2:6:2] "data.dat"
using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i)))
via k1,k2,n1,n2,m1,m2;
where the problem is in using the function fit with for loop.
How can you use Gnuplot's fit in a for loop?
I would like to fit many lines at the same time to the data.
I would use do in conjunction with eval to do this:
# Define your functions (you can also use do + eval to do this)
f1(x) = a1*x+b1
f2(x) = a2*x+b2
f3(x) = a3*x+b3
# Loop
do for [i=1:3] {
eval sprintf("fit f%g(x) 'data.dat' u 0:%g via a%g, b%g", i, i, i, i)
}
You can adapt the above to your own purposes.

Running R Code from Command Line (Windows)

I have some R code inside a file called analyse.r. I would like to be able to, from the command line (CMD), run the code in that file without having to pass through the R terminal and I would also like to be able to pass parameters and use those parameters in my code, something like the following pseudocode:
C:\>(execute r script) analyse.r C:\file.txt
and this would execute the script and pass "C:\file.txt" as a parameter to the script and then it could use it to do some further processing on it.
How do I accomplish this?
You want Rscript.exe.
You can control the output from within the script -- see sink() and its documentation.
You can access command-arguments via commandArgs().
You can control command-line arguments more finely via the getopt and optparse packages.
If everything else fails, consider reading the manuals or contributed documentation
Identify where R is install. For window 7 the path could be
1.C:\Program Files\R\R-3.2.2\bin\x64>
2.Call the R code
3.C:\Program Files\R\R-3.2.2\bin\x64>\Rscript Rcode.r
There are two ways to run a R script from command line (windows or linux shell.)
1) R CMD way
R CMD BATCH followed by R script name. The output from this can also be piped to other files as needed.
This way however is a bit old and using Rscript is getting more popular.
2) Rscript way
(This is supported in all platforms. The following example however is tested only for Linux)
This example involves passing path of csv file, the function name and the attribute(row or column) index of the csv file on which this function should work.
Contents of test.csv file
x1,x2
1,2
3,4
5,6
7,8
Compose an R file “a.R” whose contents are
#!/usr/bin/env Rscript
cols <- function(y){
cat("This function will print sum of the column whose index is passed from commandline\n")
cat("processing...column sums\n")
su<-sum(data[,y])
cat(su)
cat("\n")
}
rows <- function(y){
cat("This function will print sum of the row whose index is passed from commandline\n")
cat("processing...row sums\n")
su<-sum(data[y,])
cat(su)
cat("\n")
}
#calling a function based on its name from commandline … y is the row or column index
FUN <- function(run_func,y){
switch(run_func,
rows=rows(as.numeric(y)),
cols=cols(as.numeric(y)),
stop("Enter something that switches me!")
)
}
args <- commandArgs(TRUE)
cat("you passed the following at the command line\n")
cat(args);cat("\n")
filename<-args[1]
func_name<-args[2]
attr_index<-args[3]
data<-read.csv(filename,header=T)
cat("Matrix is:\n")
print(data)
cat("Dimensions of the matrix are\n")
cat(dim(data))
cat("\n")
FUN(func_name,attr_index)
Runing the following on the linux shell
Rscript a.R /home/impadmin/test.csv cols 1
gives
you passed the following at the command line
/home/impadmin/test.csv cols 1
Matrix is:
x1 x2
1 1 2
2 3 4
3 5 6
4 7 8
Dimensions of the matrix are
4 2
This function will print sum of the column whose index is passed from commandline
processing...column sums
16
Runing the following on the linux shell
Rscript a.R /home/impadmin/test.csv rows 2
gives
you passed the following at the command line
/home/impadmin/test.csv rows 2
Matrix is:
x1 x2
1 1 2
2 3 4
3 5 6
4 7 8
Dimensions of the matrix are
4 2
This function will print sum of the row whose index is passed from commandline
processing...row sums
7
We can also make the R script executable as follows (on linux)
chmod a+x a.R
and run the second example again as
./a.R /home/impadmin/test.csv rows 2
This should also work for windows command prompt..
save the following in a text file
f1 <- function(x,y){
print (x)
print (y)
}
args = commandArgs(trailingOnly=TRUE)
f1(args[1], args[2])
No run the following command in windows cmd
Rscript.exe path_to_file "hello" "world"
This will print the following
[1] "hello"
[1] "world"

Resources