Unexpected Proc Summary results - summary

I am trying to run a PROC SUMMARY step in SAS Enterprise Guide (EG). Below is my code:
proc summary data = SC_Rx_claims;
var PLAN_SCRIPT_COUNT AMOUNT_PAID;
output out = SC_Rx_Sum (drop=_type_ _freq_) SUM=;
run;
PLAN_SCRIPT_COUNT is a field that contains the numeric value 1 for each entry. But when I run the summary, I get an unexpected result of ** for PLAN_SCRIPT_COUNT. I don't know what this means or what would cause it. Does anyone have any insight into how to resolve this or what ** means?

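Most likely you have a format attached to PLAN_SCRIPT_COUNT in the source data, and it is being carried forward into the output dataset. SAS prints asterisks when a value is too wide for its format, so a format wide enough to show 1 can be too narrow for the sum of many 1s. Add a FORMAT statement that lists the variables without any format to the PROC SUMMARY step to remove the format, and see if that helps.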
proc summary data = SC_Rx_claims;
var PLAN_SCRIPT_COUNT AMOUNT_PAID;
format PLAN_SCRIPT_COUNT AMOUNT_PAID;
output out = SC_Rx_Sum (drop=_type_ _freq_) SUM=;
run;
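To see why a sum of 1s can turn into asterisks, here is a simplified Python model of a fixed-width numeric format. This is an assumption for illustration only: real SAS formats have more rules (for example, some switch to scientific notation before giving up), and the width of 2 below is only a guess suggested by the two asterisks in the question.

```python
def format_fixed(value: float, width: int) -> str:
    """Simplified model of a fixed-width numeric format (not SAS itself).

    Right-aligns the rounded value in a field of `width` characters and
    fills the whole field with asterisks when the digits do not fit.
    """
    text = str(round(value))
    if len(text) > width:
        return "*" * width
    return text.rjust(width)


# Each row holds 1, which fits easily; the sum over many rows does not.
print(repr(format_fixed(1, 2)))     # ' 1'
print(repr(format_fixed(5000, 2)))  # '**'
```

Removing the format, as the answer suggests, lets SAS fall back to its default numeric display, which is wide enough for the summed value.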


How to format time in SAS

I have a dataset with three columns: Start, Stop and Date.
The values in Start and Stop are SAS time values.
I have the following two values in my Start and Stop columns:
24:49:00 and 25:16:00
Both are more than 24 hours.
I would like to convert those two values to the following:
24:49:00 to 00:49:00
and
25:16:00 to 01:16:00
How can I do this in both a DATA step and PROC SQL?
Thank you!
Do you need to convert them? Use the TIMEPART() function.
start_day=datepart(start);
start_time=timepart(start);
format start_time tod8.;
Or do you just want to display them that way?
format start stop tod8.;
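Or subtract 24:00:00 from Start/Stop, like this (note that this only works for values between 24:00:00 and 47:59:59; smaller values become negative):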
data _null_;
start='25:16:14't;
point='24:00:00't;
_start=start-point;
put _start;
format _start time8.;
run;
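The subtraction approach above is plain arithmetic on seconds. A minimal Python sketch of the same calculation follows; the helper names hms_to_seconds and seconds_to_hms are hypothetical and stand in for SAS's time literal and TIME8. format. It also shows why the approach only suits values of at least 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400, i.e. '24:00:00't in SAS


def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' (hours may exceed 23) to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a (possibly negative) number of seconds as [-]HH:MM:SS."""
    sign = "-" if total < 0 else ""
    total = abs(total)
    return f"{sign}{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Subtracting one day works for a value between 24 and 48 hours...
print(seconds_to_hms(hms_to_seconds("25:16:14") - SECONDS_PER_DAY))  # 01:16:14
# ...but a value already below 24 hours goes negative.
print(seconds_to_hms(hms_to_seconds("10:00:00") - SECONDS_PER_DAY))  # -14:00:00
```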
SAS Time and DateTime values use seconds as their fundamental unit.
Thus you can use either modulus arithmetic or the TIMEPART function to extract the part under 24 hours from a time value greater than 24 hours.
data have;
start = '24:49:00't;
stop = '25:16:00't;
start_remainder = mod(start, '24:00't); * modulus arithmetic;
stop_remainder = mod(stop, '24:00't);
start_timepart = timepart(start); * TIMEPART function;
stop_timepart = timepart(stop);
format start: stop: time10.;
run;
After the computation, do not expect start_remainder < stop_remainder to always be true. For example, a span from 23:50:00 to 24:10:00 becomes 23:50:00 to 00:10:00.
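Because time values are counted in seconds, the modulus approach amounts to taking the value modulo 86400. A small Python sketch of the same arithmetic (helper names are hypothetical), using the question's values and the 23:50/24:10 example of the ordering caveat. It assumes non-negative times, since Python's % and SAS's MOD can differ for negative arguments.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds, i.e. '24:00't


def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def time_of_day(seconds: int) -> int:
    """Keep only the part below 24 hours (like mod(x, '24:00't))."""
    return seconds % SECONDS_PER_DAY


for value in ("24:49:00", "25:16:00"):
    print(value, "->", seconds_to_hms(time_of_day(hms_to_seconds(value))))
# 24:49:00 -> 00:49:00
# 25:16:00 -> 01:16:00

# The ordering caveat: a start before midnight and a stop after it swap order.
start = time_of_day(hms_to_seconds("23:50:00"))
stop = time_of_day(hms_to_seconds("24:10:00"))
print(start < stop)  # False
```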

How Can I Round All Time Values Using SAS?

I have a small problem and would appreciate it if anyone could help me.
What I'm trying to do is round the time part to the nearest 30 minutes.
My question is how to do this rounding in SAS.
This is my code:
DATA sampledata;
INFORMAT TRD_EVENT_TM time10.;
FORMAT TRD_EVENT_TM TRD_TMR time14.;
INPUT TRD_EVENT_TM;
TRD_TMR = round(TRD_EVENT_TM, 1800);
DATALINES;
00:14:12
00:16:12
09:01:23
09:46:32
15:59:45
;
PROC PRINT; RUN;
But I want to round all time values, not just these five. My real data is very large.
Thanks for your attention.
Assuming you are asking how to do this rounding on other data, not just the datalines in the example above, I suggest you separate the two tasks into two different DATA steps.
First, create your sample data (you can swap this for your main data later):
DATA sampledata;
infile datalines;
INPUT TRD_EVENT_TM hhmmss8.;
datalines;
00:14:12
00:16:12
09:01:23
09:46:32
15:59:45
;
RUN;
Then round the time variable in a second step that reads the data set:
data test;
set sampledata;
format TRD_EVENT_TM TRD_TMR time.;
TRD_TMR = round(TRD_EVENT_TM, 1800);
run;
Hope this is the answer to the question you had.
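ROUND(TRD_EVENT_TM, 1800) works because a time value is a count of seconds and 30 minutes is 1800 seconds. A minimal Python sketch of that arithmetic on the question's five sample times (helper names are hypothetical). It uses floor(x/unit + 0.5) rather than Python's built-in round(), which rounds exact halves to the nearest even number; my understanding is that SAS ROUND rounds exact halves away from zero, which for non-negative times matches this helper.

```python
import math

HALF_HOUR = 1800  # seconds in 30 minutes


def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def round_to_unit(seconds: int, unit: int = HALF_HOUR) -> int:
    """Round to the nearest multiple of `unit`, sending exact halves up."""
    return math.floor(seconds / unit + 0.5) * unit


for t in ("00:14:12", "00:16:12", "09:01:23", "09:46:32", "15:59:45"):
    print(t, "->", seconds_to_hms(round_to_unit(hms_to_seconds(t))))
# 00:14:12 -> 00:00:00
# 00:16:12 -> 00:30:00
# 09:01:23 -> 09:00:00
# 09:46:32 -> 10:00:00
# 15:59:45 -> 16:00:00
```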
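Another option is INTNX, which moves each time to the start of the next 30-minute interval. Note that this always rounds up rather than to the nearest half hour, and that Sampledata04, TRD_PR and TRD_TUROVR appear to come from the poster's own data rather than the sample above:
data Sampledata_RT;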
set Sampledata04;
TRD_EVENT_ROUNDED = intnx('minute30',TRD_EVENT_TM,1,'b');
TRD_EVENT_ROUFOR = put(TRD_EVENT_ROUNDED,hhmm.);
CountedVOLUME = TRD_PR*TRD_TUROVR;
run;
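To contrast INTNX('minute30', TRD_EVENT_TM, 1, 'b') with ROUND, here is a Python sketch of what I understand that call to compute: the beginning of the 30-minute interval after the one containing the time. It assumes the intervals are aligned to :00 and :30, which I believe is INTNX's default; the helper names are hypothetical.

```python
HALF_HOUR = 1800  # seconds in 30 minutes


def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def next_boundary(seconds: int, unit: int = HALF_HOUR) -> int:
    """Start of the interval after the one containing `seconds`."""
    return (seconds // unit + 1) * unit


for t in ("00:14:12", "09:01:23", "09:30:00"):
    print(t, "->", seconds_to_hms(next_boundary(hms_to_seconds(t))))
# 00:14:12 -> 00:30:00
# 09:01:23 -> 09:30:00  (ROUND would give 09:00:00)
# 09:30:00 -> 10:00:00  (a time already on a boundary still moves up)
```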

Where is the syntax error within this SAS view code?

data work.temp work.error / view = work.temp;
infile rawdata;
input Xa Xb Xc;
if Xa=. then output work.errors;
else output work.temp;
run;
It says there's a syntax error in the DATA statement, but I can't find where ...
The error is a typo in the OUTPUT statement: you are trying to write observations to ERRORS, but the DATA statement only defines ERROR.
It is a strange construct and not something I would recommend, but once the typo is fixed it does work. When you read from the view TEMP, it also creates the dataset ERROR, as this log shows:
67 data x; set temp; run;
NOTE: The infile RAWDATA is:
Filename=...
NOTE: 2 records were read from the infile RAWDATA.
The minimum record length was 5.
The maximum record length was 5.
NOTE: View WORK.TEMP.VIEW used (Total process time):
real time 0.32 seconds
cpu time 0.01 seconds
NOTE: The data set WORK.ERROR has 1 observations and 3 variables.
NOTE: There were 1 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.X has 1 observations and 3 variables.

String matching: Gets into infinite loop when using Wait(job, 'finished')

I am working on parallelizing a string-matching algorithm using the MATLAB Parallel Computing Toolbox (PCT). I am using createJob and several tasks, to which I pass the text to be searched, the pattern and other parameters. I get the following error. Any ideas? The boyer_horsepool function that the tasks target looks fine.
Error using parallel.Job/fetchOutputs (line 677)
An error occurred during execution of Task with ID 1.
Error in stringmatch (line 42)
matches = fetchOutputs(job1);
Caused by:
Error using feval
Undefined function handle.
Code
% create the job
parallel.defaultClusterProfile('local');
cluster = parcluster();
job1 = createJob(cluster);
% create the tasks
for index = 1: num_tasks
ret = createTask(job1, @boyer_horsepool, 1, {haystack, needle, nlength, startValues(index), endValues(index)});
fprintf('For index %d the createTask value is ?\n',index);
disp(class(ret));
%disp(ret);
end
% Submit and wait for the results
submit(job1);
wait(job1);
% Report the number of matches
matches = fetchOutputs(job1);
delete(job1);
Hm, I could be wrong, but it looks like your syntax is fine...
I think the issue is that it's not recognizing boyer_horsepool as a function. It's hard to do anything further without a bit more context. Try moving that function into the same .m file, and double check the spelling and argument count.
Also, try getAllOutputArguments(job1). It's a long shot, but it might work.
Good luck!
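The question does not show boyer_horsepool, but its name and the startValues/endValues arguments suggest each task runs a Boyer-Moore-Horspool search over one slice of the haystack. The Python sketch below is a hypothetical illustration of that split, not the poster's code: each task counts matches whose start position falls in its own [start, end) range, while still reading past `end` to finish a match, so occurrences that straddle a chunk boundary are neither missed nor double-counted.

```python
def horspool_count(haystack: str, needle: str, start: int, end: int) -> int:
    """Count occurrences of needle that begin at an index in [start, end)."""
    m = len(needle)
    if m == 0 or m > len(haystack):
        return 0
    # Bad-character table: for each character in needle[:-1], the distance
    # from its last occurrence to the end of the needle; others shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    count = 0
    pos = max(start, 0)
    last = min(end, len(haystack) - m + 1)
    while pos < last:
        if haystack[pos:pos + m] == needle:
            count += 1
        # Shift by the character aligned with the needle's last position.
        pos += shift.get(haystack[pos + m - 1], m)
    return count


def split_ranges(num_positions: int, num_tasks: int) -> list:
    """Split start positions 0..num_positions-1 into contiguous ranges."""
    step = max(1, -(-num_positions // num_tasks))  # ceiling division
    return [(s, min(s + step, num_positions))
            for s in range(0, num_positions, step)]


haystack = "abracadabra abracadabra"
needle = "abra"
ranges = split_ranges(len(haystack) - len(needle) + 1, 3)
# Each (start, end) pair plays the role of one task; summing the per-task
# counts mirrors combining the outputs of fetchOutputs.
total = sum(horspool_count(haystack, needle, s, e) for s, e in ranges)
print(ranges, total)
```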

Generate a different number of columns based on an input number

Suppose I have some XML data that has an unknown number of sub-nodes. Is there a method that allows me to pass the number of sub-nodes into the program as a parameter and have it process them? My current code is something like this:
SourceXML = LOAD '$input' using org.apache.pig.piggybank.storage.XMLLoader('$TopNode') as test:chararray;
test2 = LIMIT SourceXML 3;
test3 = FOREACH test2 GENERATE REGEX_EXTRACT(test,'<$tag1>(.*)</$tag1>',1),
REGEX_EXTRACT(test,'<$tag2>(.*)</$tag2>',1);
dump test3;
However, I may not know in advance how many simple elements there are in the target data (how many $tag# there are). I am hoping to use a .txt file of parameters that looks something like this:
input=/inputpath/lowerlevelsofpath
numberSimpleElements=3
tag1=tag1name
tag2=tag2name
tag3=tag3name
Then a REGEX_EXTRACT would be done for each tag listed in the parameter file.
Any ideas on how to accomplish this?
You could do the following:
Split the text with a regex so that each row holds a single element.
Generate a (tag, value) pair for each row.
Join the (tag, value) rows with the list of tags.
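The three steps above can be sketched in Python rather than Pig, as a stand-in to show the data flow. It assumes the parameter file format from the question and simple leaf elements with no attributes or nested tags; the regex and helper names are illustrative, not part of Pig or piggybank.

```python
import re

# Matches a simple leaf element <name>value</name> (no attributes, no
# nested tags; \w also excludes hyphens and namespace prefixes).
ELEMENT = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_params(text: str) -> dict:
    """Parse key=value lines like the question's parameter file."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def wanted_tags(params: dict) -> list:
    """Read tag1..tagN, where N is numberSimpleElements."""
    n = int(params["numberSimpleElements"])
    return [params[f"tag{i}"] for i in range(1, n + 1)]


def extract(record: str, tags: list) -> list:
    # Steps 1-2: split the record into one (tag, value) pair per element.
    pairs = ELEMENT.findall(record)
    # Step 3: an inner join with the tag list, i.e. keep only wanted tags.
    tag_set = set(tags)
    return [(tag, value) for tag, value in pairs if tag in tag_set]


params_text = """input=/inputpath/lowerlevelsofpath
numberSimpleElements=3
tag1=tag1name
tag2=tag2name
tag3=tag3name"""
record = ("<row><tag1name>a</tag1name><other>x</other>"
          "<tag2name>b</tag2name><tag3name>c</tag3name></row>")

tags = wanted_tags(parse_params(params_text))
print(extract(record, tags))
# [('tag1name', 'a'), ('tag2name', 'b'), ('tag3name', 'c')]
```

Because the tag list is data rather than code, the number of extracted columns follows numberSimpleElements without changing the script.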
