I did a simple proc freq in SAS:
PROC FREQ DATA=test;
TABLES a * b;
RUN;
This raised the error: insufficient page size to print frequency table
From the question "ERROR: Insufficient page size to print frequency table in SAS PROC FREQ" I learned that the error is fixed by enlarging the page size:
option pagesize=max;
But then my table still looked strange, with a very tall block of white space in the column headers for b:
Frequency |
Percent |
Row Pct | value 1 | value 2 |
Col Pct | | |
| | |
...etc... ...etc...
| | |
----------+----------+----------+
a | 12 | 3 |
What solved my problem was adding a format to the proc freq that truncated variable b.
PROC FREQ DATA=test;
FORMAT B $7.;
TABLES a * b;
RUN;
Now my result looks like this, and I'm happy enough:
Frequency |
Percent |
Row Pct |
Col Pct | value 1 | value 2 |
----------+----------+----------+
a | 12 | 3 |
I'm left a bit bewildered, because nowhere in the code did I apply a format to b before, just a length statement. Other variables that had their lengths fixed did not have this problem. I did switch from an Excel source file to Oracle Exadata as the source. Is it possible that Oracle pushes variable formats to SAS?
SAS has a nasty habit of attaching formats to character variables pulled from external databases, including PROC IMPORT from an EXCEL file. So if a character variable has a storage length of 200 then SAS will also attach the $200. format to the variable.
When you combine two datasets that both contain the same variable, the length is set by the first version of the variable seen, but the format attached is set by the first non-empty format seen. So you could combine a dataset where A has length $10 and no format attached with another dataset where A has the $200. format attached, and the result will be a variable with an actual length of 10 but the $200. format attached.
To remove attached formats, use a FORMAT statement that lists the variable names but no format specification. You can do that in the PROC step:
PROC FREQ DATA=test;
tables a * b;
format _character_ ;
RUN;
Or do it in a data step or use PROC DATASETS to modify the formats attached to the variable in an existing dataset.
I have a two part question about creating datasets in SAS that calls upon macro variables
Part 1
I'm trying to create a dataset that has one character variable called variable with a length of 100, and 3 observations.
%let first_value=10;
%let second_value=20;
%let third_value=30;
data temp;
infile cards truncover;
input variable $100.;
cards;
First Value: &first_value
Second Value: &second_value
Third Value: &third_value
;
run;
My output dataset doesn't show the macro variables, just the exact text I entered in the datalines. I would love help on syntax of how to concatenate character input with a macro variable. Also I'm curious why sometimes you need a separate length statement for character variables before the input statement when other times you can just specify the length in the input statement like above.
Part 2
Next, I'm trying to create a dataset that has one observation with 4 variables, 3 of which are macro variables.
data temp2;
infile cards dlm=" ";
input variable $ first_var second_var third_var;
cards;
Observation 1 Filler &first_value &second_value &third_value
;
run;
The 4 spaces in the delimiter statement and between variables in the datalines are actually tabs in my code.
Thanks!
Your examples do not really seem to need macro variables.
But if you do need to resolve macro expressions in variable values, use the RESOLVE() function. RESOLVE() evaluates all macro code in the text, not just macro variable references like those in your example. So any macro function calls and calls to actual macros will be resolved, and the generated text is returned as the result of the function.
newvar=resolve(oldvar);
So your examples become:
data temp;
infile cards truncover;
input variable $100.;
variable = resolve(variable);
cards;
First Value: &first_value
Second Value: &second_value
Third Value: &third_value
;
data temp2;
infile cards dlm="|" ;
input @;
_infile_=resolve(_infile_);
input variable :$100. first_var second_var third_var ;
cards;
Observation 1 Filler|&first_value|&second_value|&third_value
;
But be careful with the second one: the _INFILE_ variable for CARDS images is a fixed multiple of 80 bytes long, so if the resolved macro expressions make the string longer than the next 80-byte boundary you will lose the extra text.
511 %let xx=%sysfunc(repeat(----+----0,8));
512
513 data test;
514 infile cards truncover;
515 input @;
516 _infile_=resolve(_infile_);
517 input variable $100. ;
518 length=lengthn(variable);
519 put length= variable=;
520 cards;
length=5 variable=short
length=80 variable=long ----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+
NOTE: The data set WORK.TEST has 2 observations and 2 variables.
So use input from an actual file instead. That way the limit is instead the 32,767 byte limit for a character variable.
%let xx=%sysfunc(repeat(----+----0,8));
options parmcards=text;
filename text temp;
parmcards;
short
long &xx
;
531
532
533 data test;
534 infile text truncover;
535 input #;
536 _infile_=resolve(_infile_);
537 input variable $100. ;
538 length=lengthn(variable);
539 put length= variable=;
540 run;
NOTE: The infile TEXT is:
Filename=C:\...\#LN00053,
RECFM=V,LRECL=32767,File Size (bytes)=17,
Last Modified=08Jul2022:23:42:10,
Create Time=08Jul2022:23:42:10
length=5 variable=short
length=95 variable=long ----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+----0----+----0
NOTE: 2 records were read from the infile TEXT.
The minimum record length was 5.
The maximum record length was 8.
NOTE: The data set WORK.TEST has 2 observations and 2 variables.
Background Story
I have an Excel table of values with . as the thousands separator and , as the decimal separator.
If a number is lower than 1000, it contains only the , separator. In UiPath, I'm using a Read Range activity and storing the data in a data table. Somehow, UiPath replaces the , with a . because it interprets the value as a float. But this only happens to values lower than 1000. Larger numbers are interpreted as strings and all the separators stay the same.
Example:
+───────────+───────────────+─────────+
| Input | UiPath Value | Type |
+───────────+───────────────+─────────+
| 4.381,14 | 4.381,14 | String |
| 5.677,50 | 5.677,50 | String |
| 605,27 | 605.27 | Double |
+───────────+───────────────+─────────+
Problem
I want to loop through the data table and apply some logic to each value. Because of the different data types, I assign the value to a GenericValue variable. It is a huge problem that the , is automatically replaced by a ., because in my context this is a completely different value. Therefore I somehow need to check the data type, so I can replace the separator again.
Attempt
I'm trying to get the type by GetType().ToString(), but it only delivers me: UiPath.Core.GenericValue
I tried to replicate it, and I successfully converted the value to a double. I took one value and applied the steps below.
strValue = dt.Rows(0)(0).ToString.Replace(".","$")
strValue = strValue.Replace(",",".")
strValue = strValue.Replace("$",",")
dblValue = CDbl(strValue)
In UiPath, when we read data from Excel, the cell values are treated as generic objects, so we explicitly convert them to String.
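The logic of checking the type first and only then swapping separators can be sketched outside UiPath. The following is a minimal Python illustration, not UiPath or VB.NET code. The function name `to_number` and the sample list are hypothetical, and the sketch drops the thousands separator instead of using the `$` placeholder swap shown above. It assumes a string cell always uses . for thousands and , for decimals, and that a numeric cell was already parsed correctly.

```python
def to_number(cell):
    """Return a float for a cell that is either already numeric or a
    string such as '4.381,14' (dot = thousands, comma = decimal)."""
    if isinstance(cell, (int, float)):
        # The reader already parsed this cell as a number: do not swap anything.
        return float(cell)
    text = str(cell).strip()
    # Remove thousands separators, then turn the decimal comma into a point.
    return float(text.replace(".", "").replace(",", "."))


# Sample values mirroring the table in the question (two strings, one double).
values = ["4.381,14", "5.677,50", 605.27]
converted = [to_number(v) for v in values]
print(converted)
```

The type check matters because running the separator swap on a value that already uses a decimal point would treat that point as a thousands separator.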
I have a very large dataset with over 1000 columns, with column names formatted like this:
WORLDDATA.table2_usa_2017_population
WORLDDATA.table2_japan_2017_gnp
I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:
%let list1 = usa canada uk japan southafrica;
%let list2 = population crimerate gnp;
How do I do a double for loop like so:
param_list = []
for (i in list1) {
for (j in list2) {
param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
param_list.append(param_name)
}
}
such that I can use param_list in
data final_dataset;
set WORLDDATA.table2;
keep {param_list};
run;
Thank you!
Your original data set has the data items country and topic encoded into the column names (metadata). You will probably need to transpose the data for use in SAS procedure steps that use statements such as where, by and class.
Proc TRANSPOSE can pivot data from wide to tall, and the output will have a column named _NAME_ which can be used in a where=(where-statement) option on the output data set. The where-statement would be a regular expression with your lists specified as alternation (|) items in a group (such as (item-1|...|item-N)). The regex engine then performs the implicit outer join that the nested loop in the question's pseudocode does. The regex pattern would use the /ix modifiers so that it can be formatted for human readability and also ignores case.
In order to have Proc TRANSPOSE pivot each row of a data set, the data set needs a row key (a variable, or several variables in combination) that is distinct from row to row.
Untested example:
proc transpose data=have_wide out=want_subset_categorical (where=(
    prxmatch("/
      table2_
      (%sysfunc(translate(&LIST1.,|,%str( ))))  (?# list 1 spaces converted to | ors )
      _2017_
      (%sysfunc(translate(&LIST2.,|,%str( ))))  (?# list 2 spaces converted to | ors )
    /ix",_name_)
  ));
  by <row-key>;
run;
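To see why one alternation group per list selects the same names as the nested loop, here is a small Python sketch. It is an illustration of the idea only, not SAS code. The `columns` list and the `table2_france_2017_gnp` entry are made up for the example.

```python
import re
from itertools import product

list1 = "usa canada uk japan southafrica".split()
list2 = "population crimerate gnp".split()

# Explicit double loop, as in the question's pseudocode.
param_list = [f"table2_{c}_2017_{t}" for c, t in product(list1, list2)]

# Equivalent regex: each list becomes one alternation group, case ignored.
pattern = re.compile(
    rf"table2_({'|'.join(list1)})_2017_({'|'.join(list2)})$",
    re.IGNORECASE,
)

# Hypothetical column names to filter.
columns = [
    "table2_usa_2017_population",
    "table2_japan_2017_gnp",
    "table2_france_2017_gnp",
    "table2_UK_2017_crimerate",
]
kept = [name for name in columns if pattern.match(name)]
print(len(param_list), kept)
```

The regex never materializes the 15 combinations; it simply accepts any name whose country part and topic part each appear in their list.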
I want to write a query that gives the sum of the values for which the string contains 'SP11' as one unbroken substring.
For example, in the table below I want to add the values of the 3rd, 6th and 7th rows:
String | Value
________________|_______
A/B/SP1/ADDS | 12
ss/B/SP2/A | 2
A/C/D/SP11/C | 66
Ass/C/ASD | 46
ACD/SP1/C/V/C | 45
F/D/SP11/C | 85
F/D/SP11/C/12/D | 21
Which would result in something like SP11 = 172 which was derived by adding up the values of
Value of 3rd row(A/C/D/SP11/C)+
Value of 6th row(F/D/SP11/C)+Value of 7th row(F/D/SP11/C/12/D)
= 66+85+21=172
This is the query I tried in order to get the required value, but it doesn't work:
CALCULATE(Sum(Query1[Value]), FIND("*SP11*",Query1[Value])>0)
The correct measure is this:
Measure:=CALCULATE(sum([value]),filter(Table1,FIND("SP11",Table1[string],1,0)>0))
Try this:
CALCULATE(SUM(TABLE[VALUE]), SEARCH("SP11",Table[String],1,0)>0)
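As a quick check of the expected result, the same filter-then-sum logic can be sketched in Python. This is an illustration only, not DAX. The `rows` list copies the sample table from the question. The `in` test is case-sensitive, like FIND, whereas SEARCH ignores case.

```python
# Sample data copied from the question's table: (String, Value).
rows = [
    ("A/B/SP1/ADDS", 12),
    ("ss/B/SP2/A", 2),
    ("A/C/D/SP11/C", 66),
    ("Ass/C/ASD", 46),
    ("ACD/SP1/C/V/C", 45),
    ("F/D/SP11/C", 85),
    ("F/D/SP11/C/12/D", 21),
]

# Keep only rows whose string contains the unbroken substring "SP11", then sum.
total = sum(value for text, value in rows if "SP11" in text)
print(total)
```

Rows containing only SP1 are excluded because "SP11" must appear as a whole substring.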
How does one convert a string/varchar to a timestamp in MonetDB?
Like this, but with sub-second precision (to six decimal places, ideally):
sql>select str_to_date('2008-09-19-18.40.09.812000', '%Y-%m-%d-%H.%M.%6S');
+--------------------------+
| str_to_date_single_value |
+==========================+
| 2008-09-19 |
+--------------------------+
1 tuple (0.312ms)
I'm not sure whether str_to_date is built in or whether I created it ages ago and forgot.
create function str_to_date(s string, format string) returns date
external name mtime."str_to_date";
Edit: expected output something like
+---------------------------------+
| str_to_timestamp_single_value |
+=================================+
| 2008-09-19 18:40:09.812000 |
+---------------------------------+
MonetDB time conversion functions are listed in:
[Monetdb installation folder]\MonetDB5\lib\monetdb5\createdb\13_date.sql.
Besides the str_to_date function, there is a str_to_timestamp function.
The syntax of the format string follows the MySQL one.
Example:
select sys.str_to_timestamp('2016-02-04 15:30:29', '%Y-%m-%d %H:%M:%S');
The date/time specifiers might need to be changed:
select str_to_date('2008-09-19-18.40.09.812000','%Y-%m-%d-%H.%i.%s.%f')
output:
2008-09-19 18:40:09.812000
Note: MonetDB could be different; the specifiers used here (%i, %s, %f) are the MySQL date format specifiers.
You could also use date_format in addition to str_to_date:
select date_format(str_to_date('SEP 19 2008 06:40:09:812PM','%M %D %Y %h:%i:%s:%f%p'),'%Y-%m-%d-%H.%i.%s.%f');
output:
2008-09-19-18.40.09.812000
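Format specifiers differ between systems, so it can help to test a format string against the sample value outside the database. The following Python sketch is an illustration only and says nothing about what MonetDB accepts. It uses Python's strptime directives, where %M means minutes, %S seconds and %f fractional seconds (in MySQL, %i is minutes).

```python
from datetime import datetime

raw = "2008-09-19-18.40.09.812000"

# Python directives: %M = minutes, %S = seconds, %f = up to six fractional digits.
ts = datetime.strptime(raw, "%Y-%m-%d-%H.%M.%S.%f")

print(ts)              # timestamp with microsecond precision
print(ts.microsecond)  # fractional part as an integer number of microseconds
```

The parsed value keeps all six fractional digits, which is the precision the question asks for.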