Rearrange character column using "PROC FORMAT" in SAS - format

I want to take the following data variable:
"Nebraska-Iowa"
"Washington-Arkansas"
"Illinois-Utah"
and transform it so that it orders the character groups around the hyphen to be in alphabetical order:
"Iowa-Nebraska"
"Arkansas-Washington"
"Illinois-Utah"
Is there an easy way to do this? I need to split the string around the hyphen, rearrange if necessary, and then paste back together.
UPDATE
After playing with Matthew's answer, I have decided to generalize this for any number of states with the following dataset:
Nebraska-Iowa
Washington-Arkansas-Texas
Illinois-Utah
Colorado
Here is the code I am trying to build. What I am struggling with is building an array that I can loop through to pull out each word, sort the words, and then paste them back together. Please help!
/*Example dataset*/
data have;
format text $50.;
input text;
datalines;
Nebraska-Iowa
Washington-Arkansas-Texas
Illinois-Utah
Colorado
run;
/*Rearrange strings in dataset*/
data arrangestrings;
set have;
length result $50;
howmanyb = countc(text,'-');
howmany = howmanyb + 1;
array state[howmany] _character_;
do i=1 to howmany;
state[i] = scan(text, i, '-');
end;
call sortc(of state(*));
result = catx("-", state[*]);
keep result;
run;

I don't think you need to go to the trouble of defining a user-defined format for a task like this. The built-in scan function is your friend here:
data have;
format text $50.;
input text;
datalines;
Nebraska-Iowa
Washington-Arkansas
Illinois-Utah
run;
data want;
set have;
length word1 word2 result $50;
word1 = scan(text, 1, '-');
word2 = scan(text, 2, '-');
result = ifc(word1 <= word2, text, catx('-', word2, word1));
run;
proc print data=want;
run;
Check out the documentation on the built-in functions that I used (scan, ifc, catx) if you're not familiar with them:
http://support.sas.com/documentation/cdl/en/allprodslang/67244/HTML/default/viewer.htm#syntaxByType-function.htm
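The update in the question asks about any number of states, which the two-word answer above does not cover. As a rough sketch of that logic only (split on the hyphen, sort the pieces, join them back), here is an illustration in Python rather than SAS. The function name `sort_hyphenated` is hypothetical, and the sort is a plain case-sensitive string sort.

```python
def sort_hyphenated(text: str, delimiter: str = "-") -> str:
    """Split on the delimiter, sort the pieces, and join them back together."""
    parts = [part.strip() for part in text.split(delimiter)]
    return delimiter.join(sorted(parts))


# The generalized example dataset from the question.
rows = ["Nebraska-Iowa", "Washington-Arkansas-Texas", "Illinois-Utah", "Colorado"]
for row in rows:
    print(sort_hyphenated(row))
```

A value with no hyphen, such as Colorado, comes back unchanged because the split yields a single piece.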


Adding leading Zeros into Day and Month Value

I have a simple table that has a column with a date in this format:
MM/DD/YYYY.
Unfortunately, there are some folks who are working without leading zeros.
Therefore I would like to add a leading zero into the Month and Day element using Power Query to have a common format.
But how? Does someone have any function to share?
Again, I'm not sure why you want to do this, but assuming all of the entries are text that looks like dates, you can use the following M code:
Split the string on the delimiter
Change each entry in the list to a number
Add 2000 to the last number (the year; this step assumes the years are entered with two digits)
Change the numbers back to text with a "00" format
Recombine with the delimiter
let
Source = Excel.CurrentWorkbook(){[Name="Table29"]}[Content],
//set type = Text
#"Changed Type" = Table.TransformColumnTypes(Source,{{"TextDate", type text}}),
xform = Table.TransformColumns(#"Changed Type",
{"TextDate", each
let
x = Text.Split(_,"/"),
y = List.Transform(x,each Number.From(_)),
z = List.ReplaceRange(y,2,1, {2000+y{2}}),
a= List.Transform(z,each Number.ToText(_,"00")),
b = Text.Combine(a,"/")
in b})
in
xform
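The same split, convert, pad and recombine idea can be sketched outside Power Query. The Python below is only an illustration of the logic, not a replacement for the M code. It assumes four-digit years as described in the question, so it skips the add-2000 step, and the function name `pad_date` is hypothetical.

```python
def pad_date(text_date: str, delimiter: str = "/") -> str:
    """Zero-pad the month and day of an M/D/YYYY text date."""
    # Split on the delimiter and convert each piece to a number.
    month, day, year = (int(piece) for piece in text_date.split(delimiter))
    # Convert back to text with fixed widths and recombine.
    return delimiter.join([f"{month:02d}", f"{day:02d}", f"{year:04d}"])


for entry in ["1/5/2021", "12/25/2020", "3/14/2019"]:
    print(entry, "->", pad_date(entry))
```

Entries that are not three numeric pieces raise a ValueError, which can help surface bad rows instead of hiding them.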
I think a better solution might be to set up your data entry method so that all dates are entered as dates rather than text.

How to do double for loop to generate keep list in SAS?

I have a very large dataset with over 1000 columns, with column names formatted like this:
WORLDDATA.table2_usa_2017_population
WORLDDATA.table2_japan_2017_gnp
I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:
%let list1 = usa canada uk japan southafrica;
%let list2 = population crimerate gnp;
How do I do a double for loop like so:
param_list = []
for (i in list1) {
for (j in list2) {
param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
param_list.append(param_name)
}
}
such that I can use param_list in
data final_dataset;
set WORLDDATA.table2;
keep {param_list};
run;
Thank you!
Your original data set has data items (country and topic) encoded in the column names, which is metadata. You will probably need to transpose the data anyway for use in SAS procedure steps that rely on statements such as where, by and class.
Proc TRANSPOSE can pivot data from wide to tall, and the output will have a column named _NAME_ that can be tested in a where= data set option on the output data set. The where expression would be a regular expression with your lists specified as alternation (|) items in a group, such as (item-1|...|item-N). The regex engine then performs the implicit cross join that the nested loop in the question's pseudocode does. The pattern would use the /ix modifiers so that it can be laid out for human readability and ignore case.
For Proc TRANSPOSE to pivot each row of a data set, the data set needs a row key (a variable, or variables in combination) that is distinct from row to row.
Untested example:
proc transpose data=have_wide out=want_subset_categorical (where=(
prxmatch("/
table2_
(%sysfunc(translate(&LIST1.,|,%str( )))) (?# list 1 spaces converted to | ors )
_2017_
(%sysfunc(translate(&LIST2.,|,%str( )))) (?# list 2 spaces converted to | ors )
/ix",_name_)
));
by <row-key>;
run;
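To make the equivalence between the nested loop and the regex alternation concrete, here is a small sketch in Python (not SAS). It builds the name list with a cross product, as the question's pseudocode does, and then checks that a single pattern with two alternation groups accepts exactly those names. The names omit the WORLDDATA. libref as an assumption, since a column name would not normally carry it.

```python
import re
from itertools import product

list1 = "usa canada uk japan southafrica".split()
list2 = "population crimerate gnp".split()

# Nested-loop equivalent: every country paired with every topic.
param_list = [
    f"table2_{country}_2017_{topic}" for country, topic in product(list1, list2)
]

# One pattern with two alternation groups covers the same set of names.
pattern = re.compile(
    rf"table2_({'|'.join(list1)})_2017_({'|'.join(list2)})", re.IGNORECASE
)

print(len(param_list))
print(all(pattern.fullmatch(name) for name in param_list))
```

The pattern never has to enumerate the combinations, which is why the regex approach needs no explicit loop.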

replace multiple words in string with specific words from list

How can I, using M-language, replace specific words in a string with other specific words that are specified in a table?
See my example data:
Source code:
let
someTable = Table.FromColumns({{"aa &bb &cc dd","&ee ff &gg hh &ii"}, {Table.FromColumns({{"&bb","&cc"}, {"ReplacementForbb", "ccReplacement"}},{"StringToFind", "ReplaceWith"}), Table.FromColumns({{"&ee", "&gg","&ii"}, {"OtherReplacementForee", "SomeReplacementForgg", "Replacingii"}},{"StringToFind", "ReplaceWith"})}, {"aa ReplacementForbb ccReplacement dd","OtherReplacementForee ff SomeReplacementForgg hh Replacingii"}},{"OriginalString", "Replacements", "WantedResult"})
in
someTable
This is a neat question. You can do this with some table and list M functions as a custom column like this:
= Text.Combine(
List.ReplaceMatchingItems(
Text.Split([OriginalString], " "),
List.Transform(Table.ToList([Replacements]),
each Text.Split(_,",")
)
),
" ")
I'll walk through how this works using the first row as an example.
The [OriginalString] is "aa &bb &cc dd" and we use Text.Split to convert it to a list.
"aa &bb &cc dd" --Text.Split--> {"aa", "&bb", "&cc", "dd"}
Now we need to work on the [Replacements] table and convert it into a list of lists. It starts out:
StringToFind    ReplaceWith
------------------------------
&bb             ReplacementForbb
&cc             ccReplacement
Using Table.ToList this becomes a two-element list (since the table had two rows).
{"&bb,ReplacementForbb","&cc,ccReplacement"}
Using Text.Split on the comma, we can transform each element into a list to get
{{"&bb","ReplacementForbb"},{"&cc","ccReplacement"}}
which is the form we need for the List.ReplaceMatchingItems function.
List.ReplaceMatchingItems(
{"aa", "&bb", "&cc", "dd"},
{{"&bb","ReplacementForbb"},{"&cc","ccReplacement"}}
)
This does the replacement and returns the list
{"aa","ReplacementForbb","ccReplacement","dd"}
Finally, we use Text.Combine to concatenate the list above into a single string.
"aa ReplacementForbb ccReplacement dd"
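For readers who want to check the walkthrough outside Power Query, the same three steps (split into words, swap the matching items, join again) can be sketched in Python. This only illustrates the logic. The function name `replace_words` is hypothetical, and a plain dictionary stands in for the [Replacements] table.

```python
def replace_words(original: str, replacements: dict) -> str:
    """Replace whole words that appear as keys in the replacements mapping."""
    words = original.split(" ")
    # Words without a match are passed through unchanged.
    return " ".join(replacements.get(word, word) for word in words)


first_row = replace_words(
    "aa &bb &cc dd",
    {"&bb": "ReplacementForbb", "&cc": "ccReplacement"},
)
print(first_row)
```

As in the M version, only whole space-separated words are matched, so a token embedded in a longer word is left alone.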

Check for length of a character string in SAS Proc Format

I would like to write a PROC FORMAT to check for errors in a variable that serves as a unique identifier. The variable is a character string of length 16, and it usually has a number of leading zeros, like so:
0000001234567890
I would like the PROC to output an error to the log if, for example, the variable is null or if the length of the string is different from 16. Can this be done in the same proc, without having to go through functions such as length()?
What I would like to obtain is something like:
proc format;
value $ id_error
' ' = _ERROR_
*length ne 16 = _ERROR_;
*other errors* = _ERROR_;
other = 'OK';
run;
Is something equivalent to the above possible to do with a single proc format?
Reeza's suggestion to use PROC FCMP is along the right path I think. You can't really check for length in a format without it.
This is covered in the documentation here. The basic structure is: write an FCMP function that takes a character value as input (for a character format) and returns a character value, then call that function with no arguments in the format. The input value for the format is provided automatically.
In 9.3+:
data have;
length cardno $32;
input cardno;
datalines;
1234567890123456
0000153456789152
0000000000000000
1111111111111111
9999999999999999
0123456
01234567897456
0123154654564897987445
;;;;
run;
proc fcmp outlib=work.funcs.fmts;
function check16fmt(charval $) $;
length retval $16;
if length(charval) = 16 then retval='VALID VALUE';
else retval='_ERROR_';
return(retval);
endsub;
run;
options cmplib=work.funcs;
proc format;
value $chk16f
low-high = [check16fmt()];
quit;
data want;
set have;
format cardno $chk16f.;
run;
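The rule that the FCMP function applies is small enough to sketch on its own. The Python below only illustrates that rule and is not SAS. It assumes trailing blanks should not count toward the length, and the function name `check16` is hypothetical.

```python
def check16(value: str) -> str:
    """Return a label saying whether the identifier is exactly 16 characters."""
    trimmed = value.rstrip(" ")  # assumption: ignore trailing blanks
    return "VALID VALUE" if len(trimmed) == 16 else "_ERROR_"


for card in ["1234567890123456", "0123456", "0123154654564897987445", ""]:
    print(repr(card), "->", check16(card))
```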

Reading in dates using informats in SAS when raw data is messy

I am essentially trying to read messy data into SAS using informats and having problems. I have a column of data of the following form in a raw txt file, say:
RegDate
0
0
16/10/2002
20/11/2003
0
For RegDate, 0 = missing, otherwise the date is present. I would like to read this data into SAS, giving 'NA' for the zeros and the date for the date, and output into a dataset.
If all dates were present, I could use the code
data test;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile "&pathlocation" delimiter='09'x
MISSOVER DSD firstobs=2 ;
informat RegDate ddmmyy10. ;
format RegDate ddmmyy10. ;
input
RegDate;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
However I cannot read the above text file doing this as it does not take into account the zeros, as the informat is set to read in dates.
If using a proc import statement
proc import datafile="&pathlocation" out=test dbms=tab replace;
run;
it tries to use a best32. informat, as there is a zero in the first row. The dates cannot then be read in.
So I need to create a custom format of some sort. I can do this for a numeric informat alone, a character informat alone, or a picture informat (which is needed for the dates?), but I cannot figure out how to combine multiple formats for one variable. I'm sure the solution is very simple, but I cannot find it online, so I apologise if this is obvious. Is there a way to either a) put some IF-THEN logic into the format so that it does different things depending on the input, or b) read the data in purely as text so that the formats need to be used?
NA's are text and not valid in SAS - they're used in R. To indicate that the value is missing for a numeric variable SAS uses a period (.). Reading the data in with your code assigns the 0 to missing which would be an appropriate read of the data.
If you want NA you'll need to read or convert the data to text, but then your dates will be text and you'll be limited in what you can do with them, for example no date calculations.
If you really want you could display it that way using a nested format.
proc format;
value na_date_fmt
low-high = [ddmmyy10.]
. = "NA";
run;
data have;
infile cards dsd;
informat regDate ddmmyy10.;
format regDate ddmmyy10.;
format newDate na_date_fmt.;
input regdate;
newDate=regdate;
cards;
0
0
16/10/2002
20/11/2003
0
;
run;
proc print data=have;
run;
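The idea of storing a real date (or a missing value) and only displaying NA can be sketched in a few lines. The Python below illustrates that separation between stored value and display. It is not SAS, and the function names `read_reg_date` and `show_reg_date` are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def read_reg_date(raw: str) -> Optional[date]:
    """Parse a dd/mm/yyyy string, treating '0' or blank as missing."""
    raw = raw.strip()
    if raw in ("0", ""):
        return None
    return datetime.strptime(raw, "%d/%m/%Y").date()


def show_reg_date(value: Optional[date]) -> str:
    """Display a date as dd/mm/yyyy, or NA when it is missing."""
    return "NA" if value is None else value.strftime("%d/%m/%Y")


for raw in ["0", "0", "16/10/2002", "20/11/2003", "0"]:
    print(show_reg_date(read_reg_date(raw)))
```

Because the stored values stay as dates, date arithmetic still works on the non-missing rows, which is the advantage over reading everything in as text.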
You can add an IF statement to the DATA step, like this:
data test;
infile "&pathlocation" delimiter='09'x
MISSOVER DSD firstobs=2 ;
informat RegDate ddmmyy10. ;
format RegDate ddmmyy10. ;
input
RegDate;
if RegDate = 0 then RegDate = .;
run;
The output is
RegDate
.
.
16/10/2002
20/11/2003
.
