Force an error in SPSS Syntax when condition is met - syntax

I'm doing a check in my syntax to see if all of the string fields have values. This looks something like this:
IF(STRING ~= "").
Now, instead of filtering or computing a variable, I would like to force an error if there are fields with no values. I'm running a lot of these scripts and I don't want to keep checking the datasets manually. Instead, I just want to receive an error and stop execution.
Is this possible in SPSS Syntax?

First, you can efficiently count the blanks in your string variables with COUNT:
COUNT blanks = A B C(" ").
where you list the string variables to be checked. So if the sum of blanks is > 0, you want the error to be raised. First aggregate to a new dataset and activate it:
DATASET DECLARE sum.
AGGREGATE /OUTFILE=sum /count=sum(blanks).
The hard part is getting the job to stop when the blank count is greater than 0. You can put the job in a syntax file and run it using INSERT with ERROR=STOP, or you can run it as a production job via Utilities or via the -production flag on the command line of an spj (production facility) job.
Here is how to generate an error on a condition.
DATASET ACTIVATE sum.
begin program.
import spssdata
curs = spssdata.Spssdata()
count = curs.fetchone()
curs.CClose()
if count[0] > 0:
spss.Submit("blank values exist")
end program.
DATASET CLOSE sum.
This code reads the aggregate file and if the blank count is positive issues an invalid command causing an error, which stops the job.

Related

How to skip run time error in tibco?

I am new to tibco and I am working on tibco BW 5.X versions.
I have a scenario where I am working on multiple records coming in from a schema and I have to write a text file with only specific values out of those records.
Ex :
if this is the input:
<param>1</param>
<param>2</param>
<param>1</param>
<param>1</param>
I only have to write the param having values 1 and have to generate error for param having values 2 but after generating error the iteration that is currently going on should continue and must not stop.
I would be grateful if someone can help
I assume in case of the value "2" you want to invoke the "Generate Error" Activity to throw an error up to a calling process or client that some entry was not correct, right?
So if you want to make sure to process the whole list you should not throw the error in the loop group on the list as it will exit.
You can either:
Use 2 seperated lists
map the entries with value "1" into a good list that enter the loop and entries with value "2" into a bad list, that will if filled let you then invoke the "Generate Error" activity after the loop processing.
Append the entries with value "2" in your loop
Thereby after processing the loop you have these entries and if the list contains entries invoke the "Generate Error" activity.
Hope that helps
Cheers Seb
P.s.: if you upload your process it would be more clear to show ;)
You could create a output schema which contains only param1 values and use a mapper activity to perform corresponding transformation and xpath functions for filtering. If you try to implement this solution you can eliminate the chance of param2 values creeping into your output.

DSSetParam -4 error when filling a parameter with a routine

I'm building a Sequence job that contains a UserVariables activity (ParamLoading) and a Job activity (ExtractJob), in that order. ParamLoading creates 4 user variables and invokes a routine to fill each one with the correspondng value, then invokes ExtractJob pasiing it the parametes previously loaded.
ParamLoading invokes a server routine (GetParams) which simply executes a shell script (ShellQuery) and captures the result; that shell script executes an SQL query against an Oracle database and prints the result on screen.
As far as tests go, ShellQuery works as expected and GetParams returns the expected value. But when GetParams is invoked from the sequence job (no matter if it's in ParamLoading or ExtractJob) the job fails with the following error:
Test2..JobControl (#JOB033_TBK_026_EXT_PTLF): Controller problem: Error calling DSSetParam(RUTA_ORIGEN), code=-4
[ParamValue/Limitvalue is not appropriate]
I've checked data types, parameter names, all, without success or even a message saying what might be happening.
Code of ShellQuery:
value=$(sqlplus -s $1/$2#$3/$4 <<!
set heading off
set feedback off
set pages 0
select param_value from cfg_params where filter='$5' and param_name='$6';
!)
echo $value
Code of GetParams:
Call DSExecute("UNIX", Ruta_Programas:"getparams.sh ":Username:" ":Password:" ":Server:" ":ServiceId:" ":Filtro:" ":Parametro, Output, SystemReturnCode)
Ans = Output
Return(Ans)
What are you returning as values from GetParams?
Calling a function from a sequence expects an integer value back and any non-zero digit returned is evaluated as an error.
As a test, try changing the return value from the routines to values 0-4.
Solved. For those struggling with a similar problem, GetParams was returning the captured value from ShellQuery adding a special character called "field delimiter", and given that the character is a 254 in ASCII, any job receiving the parameter would complain of an invalid value, which was the error.
Changing the routine to the following solved it:
Call DSExecute("UNIX", Ruta_Programas:"getparams.sh ":Username:" ":Password:" ":Server:" ":ServiceId:" ":Filtro:" ":Parametro, Output, SystemReturnCode)
Ans = EReplace(Output, #FM, "")
Return(Ans)
Thanks to Matt Calderon for providing a clue for solving.

Hide Labels with No Data in SPSS

I just started using SPSS, there is a option of Select cases that I was trying in SPSS, and later on finding frequency based on that filter.
For Eg:
Suppose Q1 has 12 parts, Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6 Q1_7 Q1_8 Q1_9 Q1_10 Q1_11 Q1_12
I want to see data in these variables based on a condition that I used in select cases. Now when I try to see frequencies of these variables based on the filter, only 4 out of 12 satisfy has data.
Now my question is can I hide rest 8 and show only 4 with data on my output window.
It's not entirely clear what you are trying to describe however reading between the lines, I'm guessing you are trying to delete tables generated from FREQUENCIES which may happen to be empty (likely due to a filter applied but perhaps not necessarily either)
You could do this with SPSS Scripting but avoiding that, you may want to explore using CTABLES, which though may not be in the exact same format as FREQUENCY table output it will still none the less retrieve the same information.
Solution below. Assumes Python Integration with SPSS SELECT VARIABLES installed and of course the CTABLE add-on module.
/****** Simulate example data ******/.
input program.
loop #j = 1 to 100.
compute ID=#j.
vector Q(12).
loop #i = 1 to 12.
do if #j<51 and #i<9.
compute Q(#i) = $sysmis.
else.
compute Q(#i) = trunc(rv.uniform(1,5)).
end if.
end loop.
end case.
end loop.
end file.
end input program.
execute.
/************************************/.
/* frequencies without filtering applied */.
freq q1 to q12.
/* frequencies WITH filtering applied */.
/* Empty table here shoult be removed */.
temp.
select if (ID<51).
freq q1 to q12.
spssinc select variables macroname="!Qp" /properties pattern = "^Q\d+$"/options separator="+" order=file.
spssinc select variables macroname="!Qs" /properties pattern = "^Q\d+$"/options separator=" " order=file.
temp.
select if (ID<51).
ctables /table (!Qp)[c][count colpct]
/categories variables=!Qs empty=exclude.
Note if you had assess empty variables at a total level then there is a function in spssaux2 (spssaux2.FindEmptyVars) which could help you find the empty variables and then you could build the syntax to exclude these and so contain the variables with only valid responses and then run FREQUENCIES. But I don't think spssaux2.FindEmptyVars will honor any filtering.

Format statement with unknown columns

I am attempting to use fortran to write out a comma-delimited file for import into another commercial package. The issue is that I have an unknown number of data columns. My output needs to look like this:
a_string,a_float,a_different_float,float_array_elem1,float_array_elem2,...,float_array_elemn
which would result in something that might look like this:
L1080,546876.23,4325678.21,300.2,150.125,...,0.125
L1090,563245.1,2356345.21,27.1245,...,0.00983
I have three issues. One, I would prefer the elements to be tightly grouped (variable column width), two, I do not know how to define a variable number of array elements in the format statement, and three, the array elements can span a large range--maybe 12 orders of magnitude. The following code conceptually does what I want, but the variable 'n' and the lack of column-width definition throws an error (of course):
WRITE(50,900) linenames(ii),loc(ii,1:2),recon(ii,1:n)
900 FORMAT(A,',',F,',',F,n(',',F))
(I should note that n is fixed at run-time.) The write statement does what I want it to when I do WRITE(50,*), except that it's width-delimited.
I think this thread almost answered my question, but I got quite confused: SO. Right now I have a shell script with awk fixing the issue, but that solution is...inelegant. I could do some manipulation to make the output a string, and then just write it, but I would rather like to avoid that option if at all possible.
I'm doing this in Fortran 90 but I like to try to keep my code as backwards-compatible as possible.
the format close to what you want is f0.3, this will give no spaces and a fixed number of decimal places. I think if you want to also lop off trailing zeros you'll need to do a good bit of work.
The 'n' in your write statement can be larger than the number of data values, so one (old school) approach is to put a big number there, eg 100000. Modern fortran does have some syntax to specify indefinite repeat, i'm sure someone will offer that up.
----edit
the unlimited repeat is as you might guess an asterisk..and is evideltly "brand new" in f2008
In order to make sure that no space occurs between the entries in your line, you can write them separately in character variables and then print them out using theadjustl() function in fortran:
program csv
implicit none
integer, parameter :: dp = kind(1.0d0)
integer, parameter :: nn = 3
real(dp), parameter :: floatarray(nn) = [ -1.0_dp, -2.0_dp, -3.0_dp ]
integer :: ii
character(30) :: buffer(nn+2), myformat
! Create format string with appropriate number of fields.
write(myformat, "(A,I0,A)") "(A,", nn + 2, "(',',A))"
! You should execute the following lines in a loop for every line you want to output
write(buffer(1), "(F20.2)") 1.0_dp ! a_float
write(buffer(2), "(F20.2)") 2.0_dp ! a_different_float
do ii = 1, nn
write(buffer(2+ii), "(F20.3)") floatarray(ii)
end do
write(*, myformat) "a_string", (trim(adjustl(buffer(ii))), ii = 1, nn + 2)
end program csv
The demonstration above is only for one output line, but you can easily write a loop around the appropriate block to execute it for all your output lines. Also, you can choose different numerical format for the different entries, if you wish.

Hive - map not sending parameters to custom map script?

I'm trying to use the map clause with Hive but I'm tripping over syntax and not finding many examples of my use case around. I used the map clause before when I had to process one of the columns of a table using an external script.
I had a python script called, say, run, that took one command line parameter and spit out three space separated values. So I just did:
FROM(MAP
tablename.columnName
USING
'run' AS
result1, result2, result3
FROM
tablename
) map_output
INSERT OVERWRITE TABLE results SELECT *;
Now I have a python script that receives a lot more parameters and tried a few things that didn't worked and couldn't find examples on this. I did the obvious thing:
FROM
(MAP
numAgents, alpha, beta, burnin, nsteps, thin
USING
'runAuthorityMCMC' AS numAgents, alpha, beta, energy, avgDegree, maxDegree, accept
FROM
parameters
) map_output
INSERT OVERWRITE TABLE results SELECT *;
But I got an error A user-supplied transfrom script has exited with error code 2 instead of 0. When I run runAuthorityMCMC, with 6 command line parameters sampled from that table, it works perfectly well.
It seems to me it's trying to run the script without passing the parameters at all. In one of the error messages I got exactly the output I expected if this was the case. What is the correct syntax to do what I'm trying to do?
EDIT:
Confirming - this was part of the error message:
usage: runAuthorityMCMC [-h]
numAgents normalizedBrainCapacity ecologicalPressure
burnInSteps monteCarloSteps thiningRatio
runAuthorityMCMC: error: too few arguments
Which is exactly the output I'd expect with too few arguments. The script should take six arguments.
Ok, perhaps there is a difference of vocabulary here but hive doesn't send the values as "arguments" to the script. They are read in through standard input (which is different than passing something as argument). Also, you can try sending the data to /bin/cat so see what's actually being sent to the hive. If my memory serves me right, the values are sent tab separated and result emitted out from the script is also expected to be tab separated.
Trying printing stuff from stdout (or stderr) in your script, you will see the result in your jobtracker logs. That will help you debug.

Resources