SPSS Syntax Concatenate Case Values From Single Column - syntax

I am trying to build a string of values to be inserted into an SQL IN list. For example -
SELECT * FROM TABLE WHERE field IN ('AAA', 'BBB', 'CCC', 'DDD')
The list that I want needs to be constructed from values within a single column of my dataset but I'm struggling to find a way to concatenate those values.
My first thought was to use CASESTOVARS to put each of the values into columns prior to concat. This is simple but the number of cases is variable.
Is there a way to concat all fields without specifying?
Or is there a better way to go about this?
Unfortunately Python is not an option for me in this instance.
A simple sample dataset would be -
CasestoConcat
AAA
BBB
CCC
DDD

You can use the lag function for this.
First creating a bit of sample data to demonstrate on:
data list free/grp (F1) txt (a5).
begin data
1 "aaa" 1 "bb" 1 "cccc" 2 "d" 2 "ee" 2 "fff" 2 "ggggg" 3 "hh" 3 "iii"
end data.
Now the following code makes sure that rows that belong together are consecutive. You can also sort by any other relevant variable to keep the combined text in a specific order.
sort cases by grp.
string merged (A1000).
compute merged=txt.
if $casenum>1 and grp=lag(grp) merged=concat(rtrim(lag(merged)), " ", rtrim(merged)).
exe.
At this point if you want to just keep the line that has all the concatenated texts, you can use this:
add files /file=* /by grp /last=lst.
select if lst=1.
exe.
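For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same idea: walk the sorted rows, carry the accumulated text forward while the group stays the same, and keep the value from the last row of each group. It is only an illustration, not a replacement for the syntax above. The function name `concat_by_group` is hypothetical, and the last two lines show one way the question's quoted SQL IN list could be assembled from a list of values.

```python
def concat_by_group(rows, sep=" "):
    """Return {group: joined text} for (group, text) rows already sorted by group."""
    merged = {}
    prev_grp, running = None, ""
    for grp, txt in rows:
        if grp == prev_grp:
            running = running + sep + txt   # same group: extend the running text
        else:
            running = txt                   # new group: start over
        merged[grp] = running               # overwritten until the group's last row
        prev_grp = grp
    return merged

rows = [(1, "aaa"), (1, "bb"), (1, "cccc"),
        (2, "d"), (2, "ee"), (2, "fff"), (2, "ggggg"),
        (3, "hh"), (3, "iii")]
result = concat_by_group(rows)

# Building a quoted IN list from the question's sample values.
values = ["AAA", "BBB", "CCC", "DDD"]
in_list = "(" + ", ".join("'" + v + "'" for v in values) + ")"
```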


How to use two different excel files in same syntax procedure?

I have an excel file with information about variables (excel1) and another one with information about lists (excel2).
To create syntax that generates new syntax for VARIABLE LABELS and VALUE LABELS, I used the solution proposed by @eli.k.
With this solution, however, I need a dataset that contains the lists so that I can use it instead of writing them “by hand” (copy/paste). One problem arises with L2, which has 195 entries: the newly created variable would need to hold more than 20,000 characters (is this possible in SPSS?), all on one line.
What I want to know is if it’s possible to use excel2 automatically in code, line by line.
Using the following code:
GET DATA
/TYPE=XLSX
/FILE="D:\excel1.xlsx"
/SHEET=name 'Folha1'
/CELLRANGE=FULL
/READNAMES=ON
/DATATYPEMIN PERCENTAGE=95.0.
STRING cmd1 cmd2 (a200).
SORT CASES by List.
MATCH FILES /FILE=* /FIRST=first /LAST=last /BY List. /* marking first and last lines.
DO IF first.
COMPUTE cmd1="VARIABLE LABELS".
COMPUTE cmd2="VALUE LABELS".
END IF.
IF not first cmd1=concat(rtrim(cmd1), " /"). /* "/" only appears from the second varname.
COMPUTE cmd1=concat(rtrim(cmd1), " ", Var_label).
COMPUTE cmd2=concat(rtrim(cmd2), " ", Var).
DO IF last.
COMPUTE cmd1=concat(rtrim(cmd1), ".").
COMPUTE cmd2=concat(rtrim(cmd2), " ",' 1 "Afghanistan" 2 "Albania" (…) 195 "Zimbabwe".').
END IF.
EXECUTE.
SELECT IF (List = 'L2').
ADD FILES /file=* /rename cmd1=cmd /file=* /rename cmd2=cmd.
EXECUTE.
I would like to know if there is a way to replace ' 1 "Afghanistan" 2 "Albania" (…) 195 "Zimbabwe".' with some function or procedure that grabs the information for L2 from excel2 and shows it line by line:
(…)
VARIABLE LABELS V2 "Country"
/ V3 "Country Mother"
/ V4 "Country Father".
VALUE LABELS V2
V3
V4
1 "Afghanistan"
2 "Albania"
(…)
195 "Zimbabwe".
Thanks for helping me!
This issue is pretty complex and would usually be beyond the scope of Stack-Overflow Q&A but here's my answer anyway:
First I recreate the parts of your example data concerning the value labels only:
data list list/var list (2a5).
begin data
"v1" "L1"
"v2" "L2"
"v3" "L2"
"v4" "L2"
end data.
dataset name xl1.
data list list/list (a5) nb (f5) nb_txt (a20).
begin data
"L1" 1 "Female"
"L1" 2 "Male"
"L2" 1 "Afghanistan"
"L2" 2 "Albania"
"L2" 43 "Israel"
"L2" 195 "Zimbabwe"
end data.
dataset name xl2.
data list list/v1 v2 v3 v4 (4f3).
begin data
1 1 2 3
2 2 2 43
1 2 1 195
end data.
dataset name gen.
Now to work:
The first part is to create a macro for each list of value labels. Since some of the lists are long, I use ADD VALUE LABELS separately for each value.
dataset activate xl2.
string cmd (a200) cmdFin (a20).
sort cases by list nb.
match files /file=* /by list /first=first /last=last.
compute cmd=concat("add value labels !1 ", string(nb,f6), " '", rtrim(nb_txt), "' .").
if first cmd=concat("define dolist_", list, " (!pos=!cmdend) ", rtrim(cmd)).
if last cmdFin=" !enddefine .".
write outfile="path\create value label macros.sps"/cmd/cmdfin.
exe.
insert file="path\create value label macros.sps".
After inserting the generated syntax a macro has been defined for each of the value lists. Now we create an additional syntax that will run the related macro for each of the variable names in the list:
dataset activate xl1.
string cmd (a200).
compute cmd=concat("dolist_", list, " ", var, " .").
write outfile="path\run value label macros.sps"/cmd.
exe.
Now we can actually try out the generated macros on our original data:
dataset activate gen.
insert file="path\run value label macros.sps".
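To make it easier to picture what the first generated file contains, here is a rough Python sketch that builds the same kind of lines from the sample list data above. It is an illustration only: the function name `build_macro_lines` is hypothetical, and the exact spacing and blank lines that SPSS WRITE produces will differ from this output.

```python
from itertools import groupby

def build_macro_lines(rows):
    """rows: (list_name, value, label) tuples; returns the generated syntax lines."""
    lines = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # like: sort cases by list nb
    for list_name, group in groupby(ordered, key=lambda r: r[0]):
        for i, (_, nb, label) in enumerate(group):
            cmd = f"add value labels !1 {nb} '{label}' ."
            if i == 0:
                # The first row of each list opens the macro definition.
                cmd = f"define dolist_{list_name} (!pos=!cmdend) {cmd}"
            lines.append(cmd)
        lines.append("!enddefine .")  # written after the last row of each list
    return lines

rows = [("L1", 1, "Female"), ("L1", 2, "Male"),
        ("L2", 1, "Afghanistan"), ("L2", 2, "Albania"),
        ("L2", 43, "Israel"), ("L2", 195, "Zimbabwe")]
lines = build_macro_lines(rows)
```

Each list yields one macro whose single argument is the variable name, so a generated call applies every label in that list to the named variable.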

SSRS [Sort Alphanumerically]: How to sort a specific column in a report to be [A-Z] & [ASC]

I have a field set that contains bill numbers and I want to sort them first alphabetically then numerically.
For instance I have a column "Bills" that has the following sequence of bills.
- HB200
- SB60
- HB67
Desired outcome is below
- HB67
- HB200
- SB60
How can I use sorting in the SSRS Group Properties so that the field sorts from [A-Z] and then [1 - 1000....]?
This should be doable by adding just 2 separate Sort options in the group properties. To test this, I created a simple dataset using your examples.
CREATE TABLE #temp (Bills VARCHAR(20))
INSERT INTO #temp(Bills)
VALUES ('HB200'),('SB60'),('HB67')
SELECT * FROM #temp
Next, I added a matrix with a single row and a single column for my Bills field with a row group.
In the group properties, my sorting options are set up like this:
So to get this working, my theory was that you needed to isolate the numeric characters from the non-numeric characters and use each in their own sort option. To do this, I used the relatively unknown Regex Replace function in SSRS.
This expression gets only the non-numeric characters and is used in the top sorting option:
=System.Text.RegularExpressions.Regex.Replace(Fields!Bills.Value, "[0-9]", "")
While this expression isolates the numeric characters:
=System.Text.RegularExpressions.Regex.Replace(Fields!Bills.Value, "[^0-9]", "")
With these sorting options, my results match what you expect to happen.
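The two-key idea can be sketched in Python with the same two regular expressions. This is an illustration, not SSRS code: the function name `bill_sort_key` is hypothetical, and the digits are converted to an integer here so that 67 sorts before 200. Whether the SSRS expression needs a similar numeric conversion is not stated in the answer.

```python
import re

def bill_sort_key(bill):
    """Sort key: letters first (alphabetical), then the number (numeric)."""
    letters = re.sub(r"[0-9]", "", bill)    # strip digits, keep the prefix
    digits = re.sub(r"[^0-9]", "", bill)    # strip non-digits, keep the number
    return (letters, int(digits) if digits else 0)

bills = ["HB200", "SB60", "HB67"]
sorted_bills = sorted(bills, key=bill_sort_key)
```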
In the sort expression for your tablix/table which is displaying the dataset, set the sort to something like:
=IIF(Fields!Bills.Value = "HB67", 1, IIF(Fields!Bills.Value = "HB200", 2, IIF(Fields!Bills.Value = "SB60", 3, 4)))
Then when you sort A-Z, it'll sort by the number given to it in the sort expression.
This is only practical if you don't have hundreds of values, since the expression becomes tedious to write when there are hundreds of possible conditions.

How to do double for loop to generate keep list in SAS?

I have a very large dataset with over 1000 columns, with column names formatted like this:
WORLDDATA.table2_usa_2017_population
WORLDDATA.table2_japan_2017_gnp
I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:
%let list1 = usa canada uk japan southafrica;
%let list2 = population crimerate gnp;
How do I do a double for loop like so:
param_list = []
for (i in list1) {
for (j in list2) {
param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
param_list.append(param_name)
}
}
such that I can use param_list in
data final_dataset;
set WORLDDATA.table2;
keep {param_list};
run;
Thank you!
Your original data set has the data items country and topic encoded in the column names (metadata). You will probably need to transpose the data for use in SAS procedure steps that rely on statements such as where, by and class.
Proc TRANSPOSE can pivot data from wide to tall, and the output will have a column named _NAME_ that can be used in a where=(where-statement) option on the output data set. The where-statement would be a regular expression with your lists specified as alternation (|) items in a group (such as (item-1|...|item-N)). The regex engine then performs the implicit cross join (Cartesian product) that the nested loop in the question's pseudocode does. The regex pattern uses the /ix modifiers so that it can be formatted for human readability and also ignores case.
In order to have Proc TRANSPOSE pivot each row of a data set, the data set needs to have a row key (a variable or variables in combination) that is distinct from row to row.
Untested example:
proc transpose data=have_wide out=want_subset_categorical (where=(
prxmatch("/
table2_
(%sysfunc(translate(&LIST1.,|,%str( )))) (?# list 1 spaces converted to | ors )
_2017_
(%sysfunc(translate(&LIST2.,|,%str( )))) (?# list 2 spaces converted to | ors )
/ix",_name_)
));
by <row-key>;
run;
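The equivalence between the nested loop and a grouped alternation can be checked with a short Python sketch. It is not SAS and only illustrates the matching logic: the two lists come from the question, while the sample column names (including the `france` one) are made up for the demonstration.

```python
import re
from itertools import product

list1 = "usa canada uk japan southafrica".split()
list2 = "population crimerate gnp".split()

# Nested-loop version: every country paired with every topic.
keep_names = [f"table2_{c}_2017_{t}" for c, t in product(list1, list2)]

# Regex version: one alternation group per list, case-insensitive.
pattern = re.compile(
    r"^table2_(" + "|".join(map(re.escape, list1)) + r")_2017_("
    + "|".join(map(re.escape, list2)) + r")$",
    re.IGNORECASE,
)

# Hypothetical column names, used only to exercise the pattern.
columns = ["table2_usa_2017_population", "table2_japan_2017_gnp",
           "table2_france_2017_gnp", "TABLE2_UK_2017_CRIMERATE"]
kept = [c for c in columns if pattern.match(c)]
```

Every name produced by the nested loop matches the pattern, so the two approaches select the same columns apart from letter case.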

How do I trim a csv in Matlab, like I could in bash, in order to load the csv with Matlab's readtable?

The csv that needs to be analysed contains a useless label in the first row.
The headers are located in the second row.
From line 102 on there is other useless information, 147 lines in total, with a different number of columns than the 100 rows above it.
The relevant rows contain numeric values as well as the occasional NaN.
When the csv is opened, it would resemble:
unnecessarily labeled
columnA columnB columnC columnD columnE
1 2 3 4 5
4 5 6 NaN 8
[...]
301 302 303 304 305
data that really belongs in a separate csv
csv sample
unnecessarily labeled,,,,,,,,
columnA,columnB,columnC,columnD,columnE,,,,
1,2,3,4,5,,,,
4,5,6,NaN,8,,,,
301,302,303,304,305,,,,
data,that,really,belongs,in,a,separate,csv,
If I were to pre-process the file in bash I would:
sed -e1,1d $f > $processedFilename #remove the top line
head -n -147 $processedFilename > tmp && mv tmp $processedFilename #remove the last 147 lines
Can I do a similar pre-processing in Matlab? Can this be done more directly with readtable ? In other words, how can I load this csv data into a table preferably with the benefits of the headers populating automagically and with only the relevant rows and columns? In other other words, is there a parallel to
T = readtable('patients.xls',...
'Range','C2:E6',...
for csv data?
There's no direct method in Matlab to skip lines at the end of a file, so really the most you can do is read all of the data, then delete the extraneous rows/columns.
We can specify the number of lines to skip at the beginning of the file by passing 'HeaderLines', though.
I imagine you want a method that works in the general case with files of this format; however, if you know at least how many columns of data there will be, and how many extraneous lines there will be at the end, then this should work:
x = readtable('file', 'HeaderLines', 1);
x = x(:, 1:num_columns);
headers = table2cell(x(1, :));
x.Properties.VariableNames = headers;
x = x(2:size(x, 1) - num_extraneous_rows, :);
First, we manually pick the number of columns to include.
Then, we set the headers of the table.
Finally, we exclude the extraneous rows (and the first row containing the headers).
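The same read-everything-then-trim approach can be sketched in Python with only the standard library. This is an illustration rather than a Matlab solution: the function name `load_trimmed` is hypothetical, the sample text is the shortened csv from the question, and the number of columns and trailing rows to drop are assumed to be known in advance, as in the answer above.

```python
import csv
import io

sample = """unnecessarily labeled,,,,,,,,
columnA,columnB,columnC,columnD,columnE,,,,
1,2,3,4,5,,,,
4,5,6,NaN,8,,,,
301,302,303,304,305,,,,
data,that,really,belongs,in,a,separate,csv,
"""

def load_trimmed(text, num_columns, num_extraneous_rows):
    """Drop the label line and trailing rows, then split headers from data."""
    rows = list(csv.reader(io.StringIO(text)))
    rows = rows[1:]                              # skip the useless first line
    if num_extraneous_rows:
        rows = rows[:-num_extraneous_rows]       # drop the trailing junk rows
    headers = rows[0][:num_columns]              # second line holds the headers
    data = [[float(v) for v in r[:num_columns]] for r in rows[1:]]
    return headers, data

headers, data = load_trimmed(sample, num_columns=5, num_extraneous_rows=1)
```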

How to convert a number to a letter in Smarty?

I want to display letters instead of numbers. For example, if I have the following statement:
{for $nod=1 to $nr_nods}
{$nod}<br>
{/for}
where {$nr_nods}=3, it shows
1
2
3
but I want to display
A
B
C
How can I do this?
In PHP, assign an array with the equivalences to the template:
$smarty->assign('nums', array(1=>'A',2=>'B',3=>'C'));
and then just output the values by key:
{$nums.$nod}
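If the list is long, the lookup table can be generated instead of typed by hand. The following Python sketch shows only the number-to-letter mapping, not Smarty or PHP code. The function name `number_to_letter` is hypothetical, and it assumes that only the values 1 to 26 need a letter.

```python
import string

def number_to_letter(n):
    """Map 1 -> 'A', 2 -> 'B', ..., 26 -> 'Z'."""
    if not 1 <= n <= 26:
        raise ValueError("only 1..26 map to a single letter")
    return string.ascii_uppercase[n - 1]

# Same table as the hand-written array, for the values 1 to 3.
nums = {n: number_to_letter(n) for n in range(1, 4)}
```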
