SAS Macros in Datalines - syntax

I have a two part question about creating datasets in SAS that calls upon macro variables
Part 1
I'm trying to create a dataset that has one character variable called variable with a length of 100, and 3 observations.
%let first_value=10;
%let second_value=20;
%let third_value=30;
data temp;
infile cards truncover;
input variable $100.;
cards;
First Value: &first_value
Second Value: &second_value
Third Value: &third_value
;
run;
My output dataset doesn't show the macro variables, just the exact text I entered in the datalines. I would love help on syntax of how to concatenate character input with a macro variable. Also I'm curious why sometimes you need a separate length statement for character variables before the input statement when other times you can just specify the length in the input statement like above.
Part 2
Next, I'm trying to create a dataset that has one observation with 4 variables, 3 of which are macro variables.
data temp2;
infile cards dlm=" "
input variable $ first_var second_var third_var
cards;
Observation 1 Filler &first_value &second_value &third_value
;
run;
The 4 spaces in the delimiter statement and between variables in the datalines are actually tabs in my code.
Thanks!

Your examples do not seem to be worth using macro variables.
But if you really need to resolve macro expressions in variable values then use the RESOLVE() function. The RESOLVE() will evaluate all macro code in the text, not just the macro variable references in your example. So any macro function calls and calls to actual macros will be resolved and the generated text returned as the result of the function.
newvar=resolve(oldvar);
So your examples become:
data temp;
infile cards truncover;
input variable $100.;
variable = resolve(variable);
cards;
First Value: &first_value
Second Value: &second_value
Third Value: &third_value
;
data temp2;
infile cards dlm="|" ;
input #;
_infile_=resolve(_infile_);
input variable :$100. first_var second_var third_var ;
cards;
Observation 1 Filler|&first_value|&second_value|&third_value
;
But on the second one be careful as the _INFILE_ variable for CARDS images are fixed multiples of 80 bytes so if the resolved macro expressions make the string longer than the next 80 byte boundary you will lose the extra text.
511 %let xx=%sysfunc(repeat(----+----0,8));
512
513 data test;
514 infile cards truncover;
515 input #;
516 _infile_=resolve(_infile_);
517 input variable $100. ;
518 length=lengthn(variable);
519 put length= variable=;
520 cards;
length=5 variable=short
length=80 variable=long ----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+
NOTE: The data set WORK.TEST has 2 observations and 2 variables.
So use input from an actual file instead. That way the limit is instead the 32,767 byte limit for a character variable.
%let xx=%sysfunc(repeat(----+----0,8));
options parmcards=text;
filename text temp;
parmcards;
short
long &xx
;
531
532
533 data test;
534 infile text truncover;
535 input #;
536 _infile_=resolve(_infile_);
537 input variable $100. ;
538 length=lengthn(variable);
539 put length= variable=;
540 run;
NOTE: The infile TEXT is:
Filename=C:\...\#LN00053,
RECFM=V,LRECL=32767,File Size (bytes)=17,
Last Modified=08Jul2022:23:42:10,
Create Time=08Jul2022:23:42:10
length=5 variable=short
length=95 variable=long ----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+----0
NOTE: 2 records were read from the infile TEXT.
The minimum record length was 5.
The maximum record length was 8.
NOTE: The data set WORK.TEST has 2 observations and 2 variables.

Related

PROC FREQ in SAS gives too wide columns

I did a simple proc freq in SAS:
PROC FREQ DATA=test;
a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From ERROR: Insufficient page size to print frequency table in SAS PROC FREQ I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange with super high white spaces in column b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT B $7.;
a * b;
RUN;
now my result looks like this and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewilderd, because nowhere in the code did I apply a format to b before, just a lenght statement. Other variables that had their lengths fixed did not have this problem. I did switch from an excel sourcefile to oracle-exadata as source. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an EXCEL file. So if a character variable has a storage length of 200 then SAS will also attach the $200. format to the variable.
When you combine two dataets that both contain the same variable the length will be set by the first version of the variable seen. But the format attached will be set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the format $200. attached and the result will a variable with an actual length of 10 but the $200. format attached.
You can use the format statement where you list variable names but no format specification to remove them. You could do it in the PROC step.
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a data step or use PROC DATASETS to modify the formats attached to the variable in an existing dataset.

Replace HEX value with another HEX value in variable-length dataset using SORT/ICEMAN

I have a VB file were can a HEX value ‘0D25’ come in any position from 1 to 20 (values from 21 position should not be changed). This need to be replaced with HEX value ‘4040’.
Input:
----+----1----+----2----+----3----+----4----+
0000/12345678 566 #(#)#0000/12345678 566
FFFF6FFFFFFFF02FFF44B475BFFFF6FFFFFFFF02FFF02
0000112345678D5566005DBD50000112345678D5566D5
Expected output:
----+----1----+----2----+----3----+----4----+
0000/12345678 566 #(#)#0000/12345678 566
FFFF6FFFFFFFF44FFF44B475BFFFF6FFFFFFFF02FFF02
000011234567800566005DBD50000112345678D5566D5
I was using SORT with below control card.
SORT FIELDS=COPY
OUTREC FIELDS=(1,4,5,20,CHANGE=(20,X'0D25',X'4040'),
NOMATCH=(5,20),
21)
CHANGE= does not work the way you think it does. It does a lookup at only the specified position. It then replaces with either the replacement character(s), or with the NOMATCH= character(s) in exactly the length specified as the first sub-parm of CHANGE= (20 in your case).
FINDREP= searches for the specified character(s) in each position, and replaces with the replacement character(s). You limit to the part of the record to be inspected with the STARTPOS=, and ENDPOS= keywords, resp.
In your case the following statement should do what you want:
OUTREC FINDREP=(INOUT=(X'0D25',X'4040'),STARTPOS=5,ENDPOS=24)

Check for length of a character string in SAS Proc Format

I would like to write a PROC FORMAT to check for errors in a variable that serves as a unique identifier. The variable is a character string of length 16, and it usually has a number of trailing zeros, like so:
0000001234567890
I would like the PROC to output an error to the log if, for example, the variable is null or if the length of the sting is different from 16. Can this be done in the same proc, without having to go through functions such as length()?
what I would like to obtain is something like:
proc format;
value $ id_error
' ' = _ERROR_
*length ne 16 = _ERROR_;
*other errors* = _ERROR_;
other = 'OK';
run;
Is something equivalent to the above possible to do with a single proc format?
Reeza's suggestion to use PROC FCMP is along the right path I think. You can't really check for length in a format without it.
This is covered in the documentation here. The basic structure is, write a fcmp function that takes a character value as input (for a character format) and returns a character value, and then call that fcmp function with no arguments in the format; the input value for the format will be provided automatically.
In 9.3+:
data have;
length cardno $32;
input cardno;
datalines;
1234567890123456
0000153456789152
0000000000000000
1111111111111111
9999999999999999
0123456
01234567897456
0123154654564897987445
;;;;
run;
proc fcmp outlib=work.funcs.fmts;
function check16fmt(charval $) $;
length retval $16;
if length(charval) = 16 then retval='VALID VALUE';
else retval='_ERROR_';
return(retval);
endsub;
run;
options cmplib=work.funcs;
proc format;
value $chk16f
low-high = [check16fmt()];
quit;
data want;
set have;
format cardno $chk16f.;
run;

Reading in dates using informats in SAS when raw data is messy

I am essentially trying to read messy data into SAS using informats and having problems. I have column of data of the following form in a raw txt file, say:
RegDate
0
0
16/10/2002
20/11/2003
0
For RegDate, 0 = missing, otherwise the date is present. I would like to read this data into SAS, giving 'NA' for the zeros and the date for the date, and output into a dataset.
If all dates were present, I could use the code
data test;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile "&pathlocation" delimiter='09'x
MISSOVER DSD firstobs=2 ;
informat RegDate ddmmyy10. ;
format RegDate ddmmyy10. ;
input
RegDate;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
However I cannot read the above text file doing this as it does not take into account the zeros, as the informat is set to read in dates.
If using a proc import statement
proc import datafile="&pathlocation" out=test dbms=tab replace;
run;
it tries to use a best32. informat, as there is a zero in the first row. The dates cannot then be read in.
So I need to create a custom format of some sort. I can do this for a numeric informat alone or a character informat alone, or a picture informat (which is needed for the dates?). I cannot figure out how to combine multiple formats for one variable. I'm sure the solution is very simple however I cannot find it online so I apologise if this is obvious. Is there either a way to a) put some IF-THEN statement into the format so that it does different things depending on the input b) read the data in purely as text so that the formats need to be used.
NA's are text and not valid in SAS - they're used in R. To indicate that the value is missing for a numeric variable SAS uses a period (.). Reading the data in with your code assigns the 0 to missing which would be an appropriate read of the data.
If you want NA you'll need to read or convert the data to text, but then your dates will be text and you'll be limited in what you can do with them, for example no date calculations.
If you really want you could display it that way using a nested format.
proc format;
value na_date_fmt
low-high = [ddmmyy10.]
. = "NA";
run;
data have;
infile cards dsd;
informat regDate ddmmyy10.;
format regDate ddmmyy10.;
format newDate na_date_fmt.;
input regdate;
newDate=regdate;
cards;
0
0
16/10/2002
20/11/2003
0
;
run;
proc print data=have;
run;
You can add an IF statement to the DATA step, like this:
data test;
infile "&pathlocation" delimiter='09'x
MISSOVER DSD firstobs=2 ;
informat RegDate ddmmyy10. ;
format RegDate ddmmyy10. ;
input
RegDate;
if RegDate = 0 then RegDate = .;
run;
The output is
RegDate
.
.
16/10/2012
20/11/2003
.

Code Golf: Validate Sudoku Grid

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Introduction
A valid Sudoku grid is filled with numbers 1 to 9, with no number occurring more than once in each sub-block of 9, row or column. Read this article for further details if you're unfamiliar with this popular puzzle.
Challenge
The challenge is to write the shortest program that validates a Sudoku grid that might not be full.
Input will be a string of 9 lines of 9 characters each, representing the grid. An empty cell will be represented by a .. Your output should be Valid if the grid is valid, otherwise output Invalid.
Example
Input
123...789
...456...
456...123
789...456
...123...
564...897
...231...
897...564
...564...
Output
Valid
Input
123456789
987654321
123456789
123456789
987654321
123456789
123456789
987654321
123456789
Output
Invalid
Code Golf Rules
Please post your shortest code in any language that solves this problem. Input and output may be handled via stdin and stdout or by other files of your choice.
Winner will be the shortest solution (by byte count) in a language with an implementation existing prior to the posting of this question. So while you are free to use a language you've just made up in order to submit a 0-byte solution, it won't count, and you'll probably get downvotes.
Golfscript: 56
n%{zip''+9/.{'.'-..&=}%$0=\}:|2*{3/}%|;**"InvV"3/="alid"
C: 165 162 161 160 159
int v[1566],x,y=9,c,b;main(){while(y--)for(x=9;x--+1;)if((c
=getchar()*27)>1242)b|=v[x+c]++|v[y+9+c]++|v[x-x%3+y/3+18+c]
++;puts(b?"Invalid":"Valid");return 0;}
The two newlines are not needed. One char saved by josefx :-) ...
Haskell: 207 230 218 195 172
import List
t=take 3
h=[t,t.drop 3,drop 6]
v[]="V"
v _="Inv"
f s=v[1|v<-[s,transpose s,[g=<<f s|f<-h,g<-h]],g<-map(filter(/='.'))v,g/=nub g]++"alid\n"
main=interact$f.lines
Perl: 168 128
$_=join'',<>;#a=/.../g;print+(/(\d)([^\n]{0,8}|(.{10})*.{9})\1/s
+map"#a[$_,$_+3,$_+6]"=~/(\d).*\1/,0..2,9..11,18..20)?Inv:V,alid
The first regex checks for duplicates that are in the same row and column; the second regex handles duplicates in the "same box".
Further improvement is possible by replacing the \n in the first regex with a literal newline (1 char), or with >= Perl 5.12, replacing [^\n] with \N (3 char)
Earlier, 168 char solution:
Input is from stdin, output is to stderr because it makes things so easy. Linebreaks are optional and not counted.
$_=join'',<>;$m=alid.$/;$n=Inv.$m;/(\d)(\N{0,8}|(.{10})*.{9})\1/s&&
die$n;#a=/.../g;for$i(0,8,17){for$j($i..$i+2){
$_=$a[$j].$a[$j+3].$a[$j+6];/(\d).*\1/&&die$n}}die"V$m"
Python: 230 221 200 185
First the readable version at len=199:
import sys
r=range(9)
g=[raw_input()for _ in r]
s=[[]for _ in r*3]
for i in r:
for j in r:
n=g[i][j]
for x in i,9+j,18+i/3*3+j/3:
<T>if n in s[x]:sys.exit('Invalid')
<T>if n>'.':s[x]+=n
print'Valid'
Since SO doesn't display tab characters, I've used <T> to represent a single tab character.
PS. the same approach minEvilized down to 185 chars:
r=range(9)
g=[raw_input()for _ in r]
s=['']*27
for i in r:
for j in r:
for x in i,9+j,18+i/3*3+j/3:n=g[i][j];s[x]+=n[:n>'.']
print['V','Inv'][any(len(e)>len(set(e))for e in s)]+'alid'
Perl, 153 char
#B contains the 81 elements of the board.
&E tests whether a subset of #B contains any duplicate digits
main loop validates each column, "block", and row of the puzzle
sub E{$V+="#B[#_]"=~/(\d).*\1/}
#B=map/\S/g,<>;
for$d(#b=0..80){
E grep$d==$_%9,#b;
E grep$d==int(($_%9)/3)+3*int$_/27,#b;
E$d*9..$d*9+8}
print$V?Inv:V,alid,$/
Python: 159 158
v=[0]*244
for y in range(9):
for x,c in enumerate(raw_input()):
if c>".":
<T>for k in x,y+9,x-x%3+y//3+18:v[k*9+int(c)]+=1
print["Inv","V"][max(v)<2]+"alid"
<T> is a single tab character
Common Lisp: 266 252
(princ(let((v(make-hash-table))(r "Valid"))(dotimes(y 9)(dotimes(x
10)(let((c(read-char)))(when(>(char-code c)46)(dolist(k(list x(+ 9
y)(+ 18(floor(/ y 3))(- x(mod x 3)))))(when(>(incf(gethash(+(* k
9)(char-code c)-49)v 0))1)(setf r "Invalid")))))))r))
Perl: 186
Input is from stdin, output to stdout, linebreaks in input optional.
#y=map/\S/g,<>;
sub c{(join'',map$y[$_],#$h)=~/(\d).*\1/|c(#_)if$h=pop}
print(('V','Inv')[c map{$x=$_;[$_*9..$_*9+8],[grep$_%9==$x,0..80],[map$_+3*$b[$x],#b=grep$_%9<3,0..20]}0..8],'alid')
(Linebreaks added for "clarity".)
c() is a function that checks the input in #y against a list of lists of position numbers passed as an argument. It returns 0 if all position lists are valid (contain no number more than once) and 1 otherwise, using recursion to check each list. The bottom line builds this list of lists, passes it to c() and uses the result to select the right prefix to output.
One thing that I quite like is that this solution takes advantage of "self-similarity" in the "block" position list in #b (which is redundantly rebuilt many times to avoid having #b=... in a separate statement): the top-left position of the ith block within the entire puzzle can be found by multiplying the ith element in #b by 3.
More spread out:
# Grab input into an array of individual characters, discarding whitespace
#y = map /\S/g, <>;
# Takes a list of position lists.
# Returns 0 if all position lists are valid, 1 otherwise.
sub c {
# Pop the last list into $h, extract the characters at these positions with
# map, and check the result for multiple occurences of
# any digit using a regex. Note | behaves like || here but is shorter ;)
# If the match fails, try again with the remaining list of position lists.
# Because Perl returns the last expression evaluated, if we are at the
# end of the list, the pop will return undef, and this will be passed back
# which is what we want as it evaluates to false.
(join '', map $y[$_], #$h) =~ /(\d).*\1/ | c(#_) if $h = pop
}
# Make a list of position lists with map and pass it to c().
print(('V','Inv')[c map {
$x=$_; # Save the outer "loop" variable
[$_*9..$_*9+8], # Columns
[grep$_%9==$x,0..80], # Rows
[map$_+3*$b[$x],#b=grep$_%9<3,0..20] # Blocks
} 0..8], # Generates 1 column, row and block each time
'alid')
Perl: 202
I'm reading Modern Perl and felt like coding something... (quite a cool book by the way:)
while(<>){$i++;$j=0;for$s(split//){$j++;$l{$i}{$s}++;$c{$j}{$s}++;
$q{(int(($i+2)/3)-1)*3+int(($j+2)/3)}{$s}++}}
$e=V;for$i(1..9){for(1..9){$e=Inv if$l{$i}{$_}>1or$c{$i}{$_}>1or$q{$i}{$_}>1}}
print $e.alid
Count is excluding unnecessary newlines.
This may require Perl 5.12.2.
A bit more readable:
#use feature qw(say);
#use JSON;
#$json = JSON->new->allow_nonref;
while(<>)
{
$i++;
$j=0;
for $s (split //)
{
$j++;
$l{$i}{$s}++;
$c{$j}{$s}++;
$q{(int(($i+2)/3)-1)*3+int(($j+2)/3)}{$s}++;
}
}
#say "lines: ", $json->pretty->encode( \%l );
#say "columns: ", $json->pretty->encode( \%c );
#say "squares: ", $json->pretty->encode( \%q );
$e = V;
for $i (1..9)
{
for (1..9)
{
#say "checking {$i}{$_}: " . $l{$i}{$_} . " / " . $c{$i}{$_} . " / " . $q{$i}{$_};
$e = Inv if $l{$i}{$_} > 1 or $c{$i}{$_} > 1 or $q{$i}{$_} > 1;
}
}
print $e.alid;
Ruby — 176
f=->x{x.any?{|i|(i-[?.]).uniq!}}
a=[*$<].map{|i|i.scan /./}
puts f[a]||f[a.transpose]||f[a.each_slice(3).flat_map{|b|b.transpose.each_slice(3).map &:flatten}]?'Invalid':'Valid'
Lua, 341 bytes
Although I know that Lua isn't the best golfing language, however, considering it's size, I think it's worth posting it ;).
Non-golfed, commented and error-printing version, for extra fun :)
i=io.read("*a"):gsub("\n","") -- Get input, and strip newlines
a={{},{},{}} -- checking array, 1=row, 2=columns, 3=squares
for k=1,3 do for l=1,9 do a[k][l]={0,0,0,0,0,0,0,0,0}end end -- fillup array with 0's (just to have non-nils)
for k=1,81 do -- loop over all numbers
n=tonumber(i:sub(k,k):match'%d') -- get current character, check if it's a digit, and convert to a number
if n then
r={math.floor((k-1)/9)+1,(k-1)%9+1} -- Get row and column number
r[3]=math.floor((r[1]-1)/3)+3*math.floor((r[2]-1)/3)+1 -- Get square number
for l=1,3 do v=a[l][r[l]] -- 1 = row, 2 = column, 3 = square
if v[n] then -- not yet eliminated in this row/column/square
v[n]=nil
else
print("Double "..n.." in "..({"row","column","square"}) [l].." "..r[l]) --error reporting, just for the extra credit :)
q=1 -- Flag indicating invalidity
end
end
end
end
io.write(q and"In"or"","Valid\n")
Golfed version, 341 bytes
f=math.floor p=io.write i=io.read("*a"):gsub("\n","")a={{},{},{}}for k=1,3 do for l=1,9 do a[k][l]={0,0,0,0,0,0,0,0,0}end end for k=1,81 do n=tonumber(i:sub(k,k):match'%d')if n then r={f((k-1)/9)+1,(k-1)%9+1}r[3]=f((r[1]-1)/3)+1+3*f((r[2]-1)/3)for l=1,3 do v=a[l][r[l]]if v[n]then v[n]=nil else q=1 end end end end p(q and"In"or"","Valid\n")
Python: 140
v=[(k,c) for y in range(9) for x,c in enumerate(raw_input()) for k in x,y+9,(x/3,y/3) if c>'.']
print["V","Inv"][len(v)>len(set(v))]+"alid"
ASL: 108
args1["\n"x2I3*x;{;{:=T(T'{:i~{^0}?})}}
{;{;{{,0:e}:;{0:^},u eq}}/`/=}:-C
dc C#;{:|}C&{"Valid"}{"Invalid"}?P
ASL is a Golfscript inspired scripting language I made.

Resources