I have an Excel file, imported into SAS, that contains 3 variables and 3 observations.
All values are numbers.
24 12 47
99 30 14
50 5 41
Is there a way I can code so that each row is sorted in ascending order?
Result would be:
12 24 47
14 30 99
5 41 50
I need to do this for several Excel files that contain a huge number of variables and observations.
Thank you.
The simplest way is to use CALL SORTN, which sorts the values of its arguments in ascending order, i.e., across the variables of each row:
data have;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
data have;
modify have;
call sortn(of _numeric_);
run;
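Outside SAS, the same idea is simply "sort each row as its own small array". The following is a minimal C sketch, assuming the data sit in a row-major `double` matrix; the names `sort_rows` and `cmp_double` are hypothetical and not part of the answer. Unlike CALL SORTN, it makes no provision for SAS missing values.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles: negative, zero or positive, like strcmp. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort each row of a row-major nrows x ncols matrix independently, which is
   the effect CALL SORTN has on the variables of each observation. */
void sort_rows(double *m, size_t nrows, size_t ncols)
{
    for (size_t r = 0; r < nrows; r++)
        qsort(m + r * ncols, ncols, sizeof *m, cmp_double);
}
```

With the question's data stored as `{24,12,47, 99,30,14, 50,5,41}`, `sort_rows(m, 3, 3)` leaves `{12,24,47, 14,30,99, 5,41,50}`.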
I would use an FCMP sort routine. FCMP functions and subroutines only allow temporary arrays to be passed to them for modification, so you have to assign the values to a temporary array, sort it, and then assign the sorted values back to the permanent variables.
Modify the code below for your number of columns and column names.
options cmplib=work.cmp;
proc fcmp outlib=work.cmp.fns;
subroutine qsort(arr[*],lo,hi);
outargs arr;
i = lo;
j = hi;
do while (i < hi);
pivot = arr[floor((lo+hi)/2)];
do while (i<=j);
do while (arr[i] < pivot);
i = i + 1;
end;
do while (arr[j] > pivot);
j = j - 1;
end;
if (i<=j) then do;
t = arr[i];
arr[i] = arr[j];
arr[j] = t;
i = i + 1;
j = j - 1;
end;
end;
if (lo < j) then
call qsort(arr,lo,j);
lo = i;
j = hi;
end;
endsub;
run;
quit;
data test;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
%let ncol=3;
%let cols = a b c;
data sorted;
set test;
array vars[&ncol] &cols;
/*Only temporary arrays can be passed to FCMP functions*/
array tmp[&ncol] _temporary_;
/*Assign to tmp*/
do i=1 to &ncol;
tmp[i] = vars[i];
end;
/*Sort*/
call qsort(tmp,1,&ncol);
/*Put back sorted values*/
do i=1 to &ncol;
vars[i] = tmp[i];
end;
drop i;
run;
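To see the algorithm in the `qsort` subroutine outside FCMP, here is a C translation sketch: the same middle-element pivot, the same two inward scans, recursion on the left partition and a loop (instead of recursion) on the right one. The name `quick_sort`, the 0-based inclusive bounds and the `double` element type are assumptions of this sketch.

```c
/* C translation sketch of the FCMP qsort subroutine: sorts arr[lo..hi]
   (inclusive, 0-based) in ascending order. Like the FCMP version it recurses
   on the left partition and loops, instead of recursing, on the right one. */
void quick_sort(double arr[], long lo, long hi)
{
    long i = lo;
    long j = hi;
    while (i < hi) {
        double pivot = arr[lo + (hi - lo) / 2];   /* middle element as pivot */
        while (i <= j) {
            while (arr[i] < pivot) i++;           /* scan right for an item >= pivot */
            while (arr[j] > pivot) j--;           /* scan left for an item <= pivot */
            if (i <= j) {                         /* swap the out-of-place pair */
                double t = arr[i];
                arr[i] = arr[j];
                arr[j] = t;
                i++;
                j--;
            }
        }
        if (lo < j)
            quick_sort(arr, lo, j);               /* sort the left partition recursively */
        lo = i;                                   /* then continue with the right partition */
        j = hi;
    }
}
```

Calling `quick_sort(row, 0, ncol - 1)` corresponds to the 1-based `call qsort(tmp,1,&ncol)` in the DATA step above. Signed `long` indices are used because `j` can step to one below `lo`.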
Although SAS/IML is designed specifically for matrix manipulation (and, I believe, this task would be trivial there), it can still be done in Base SAS using a couple of PROCs wrapped in a macro loop.
data raw;
input a b c;
datalines;
24 12 47
99 30 14
50 5 41
;
run;
proc transpose data=raw out=raw_t(drop=_:); run;
proc sql noprint;
select name into :vars separated by ' '
from sashelp.vcolumn
where libname='WORK' and memname='RAW_T';
quit;
%macro sort_rows;
%do i=1 %to %sysfunc(countw(&vars));
proc sort data=raw_t(keep=%scan(&vars,&i)) out=column;
by %scan(&vars,&i);
run;
data sortedrows;
%if &i>1 %then set sortedrows;;
set column;
run;
%end;
%mend sort_rows;
%sort_rows
proc transpose data=sortedrows out=sortedrows(drop=_:); run;
First, transpose the original dataset.
Then iterate through the columns (which were originally rows) one at a time, sorting each one and combining it side by side with the columns already sorted (a one-to-one read using two SET statements).
Finally, transpose everything back.
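The same three steps can be sketched in C for a row-major matrix. In C a direct row sort would be simpler; this version only mirrors the PROC TRANSPOSE / PROC SORT / PROC TRANSPOSE pipeline. The names `transpose`, `sort_columns` and `sort_rows_via_transpose` are hypothetical, and the caller is assumed to supply a scratch buffer `tmp` of `nrows * ncols` elements.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* dst (ncols x nrows) receives the transpose of src (nrows x ncols). */
static void transpose(const double *src, double *dst, size_t nrows, size_t ncols)
{
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < ncols; c++)
            dst[c * nrows + r] = src[r * ncols + c];
}

/* Sort every column of an nrows x ncols matrix, one column at a time. */
static int sort_columns(double *m, size_t nrows, size_t ncols)
{
    double *col = malloc(nrows * sizeof *col);
    if (!col) return -1;
    for (size_t c = 0; c < ncols; c++) {
        for (size_t r = 0; r < nrows; r++) col[r] = m[r * ncols + c];
        qsort(col, nrows, sizeof *col, cmp_double);
        for (size_t r = 0; r < nrows; r++) m[r * ncols + c] = col[r];
    }
    free(col);
    return 0;
}

/* Transpose, sort the columns (the original rows), transpose back. */
int sort_rows_via_transpose(double *m, double *tmp, size_t nrows, size_t ncols)
{
    transpose(m, tmp, nrows, ncols);              /* tmp is ncols x nrows */
    if (sort_columns(tmp, ncols, nrows) != 0) return -1;
    transpose(tmp, m, ncols, nrows);              /* back to nrows x ncols */
    return 0;
}
```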
I need to use the SAS random number function RAND() and a DO...END loop to create 100 observations of a variable named X. Then I want to use another DO loop of 500 rounds to generate a total of 500 samples, each with 100 observations. Each sample is drawn from a standard normal distribution.
I tried the following code but it does not give me what I need:
data A;
call streaminit(123); /* set random number seed */
do i = 1 to 100;
X = rand("Normal"); /* random number generator */
output;
end;
do r = 1 to 500 ;
if i then X = rand("Normal");
output;
end;
run;
Any input will be greatly appreciated.
Perfect time to use PROC IML:
proc iml;
call randseed(123); /* set seed for RANDGEN */
x = j(500, 100); /* allocate 500 by 100 matrix */
call randgen(x, "Normal"); /* fill matrix with N(0,1) random draws */
create mydata from x; /* move matrix to a dataset in the WORK library */
append from x;
close mydata;
quit;
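Here is a DATA step solution:
data want;
call streaminit(123); /* set the seed so the draws are reproducible */
do I=1 to 500; /* I numbers the 500 samples */
do _iorc_=1 to 100; /* 100 observations per sample; _IORC_ is an automatic variable, so this counter is not kept in WANT */
X=rand("normal"); /* draw from N(0,1) */
output;
end;
end;
run;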
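Both answers rest on an outer loop over samples and an inner loop over observations. A C sketch of that structure follows, with one important substitution: standard C has no normal generator, so the hypothetical `approx_std_normal` uses the Irwin-Hall approximation (sum of 12 uniforms minus 6). That has mean 0 and variance 1 but is only approximately normal; in SAS you would keep RAND("Normal"). The names `generate_samples`, `N_SAMPLES` and `N_OBS` are also hypothetical.

```c
#include <stdlib.h>

#define N_SAMPLES 500
#define N_OBS     100

/* Approximate N(0,1) draw (Irwin-Hall): the sum of 12 independent U(0,1)
   values has mean 6 and variance 1, so subtracting 6 centres it at 0.
   This is a stand-in for SAS's RAND("Normal"), not an exact normal. */
static double approx_std_normal(void)
{
    double s = 0.0;
    for (int k = 0; k < 12; k++)
        s += (double)rand() / ((double)RAND_MAX + 1.0);
    return s - 6.0;
}

/* Outer loop: one pass per sample; inner loop: that sample's observations.
   x[s][i] plays the role of X for sample s, observation i. */
void generate_samples(double x[N_SAMPLES][N_OBS], unsigned seed)
{
    srand(seed);                       /* fix the sequence, like CALL STREAMINIT */
    for (int s = 0; s < N_SAMPLES; s++)
        for (int i = 0; i < N_OBS; i++)
            x[s][i] = approx_std_normal();
}
```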
I hope you guys are well.
DATA: The input data is unsorted and hence I am using hash tables to take the input data, do some iterations, sort and then output. Sorting the original table prior to any iterations (using proc sort) would be a time-consuming effort. If there is no other option, then I will need to sit down for the gruesome sorting approach.
What I want: I am trying to populate a variable "answer" with binary values (0/1): when filter = "Y", the flag should cover the next 6 months of observations for the same client. In some instances, the client is missing from some monthly observations, e.g., client FG5151 is missing from September and October 2006. In short, if filter is "Y", then this observation and the next 6 months of observations for the same client should get answer = 1, else 0.
data have;
input client $ dates date9. filter $;
datalines ;
Fg5151 28.Feb.06 N
Fg5151 31.Mar.06 N
Fg5151 30.Apr.06 N
Fg5151 31.May.06 Y
Fg5151 30.Jun.06 N
Fg5151 31.Jul.06 Y
Fg5151 31.Aug.06 N
Fg5151 30.Nov.06 N
Fg5151 31.Dec.06 N
Fg5151 01.Jan.07 N
A101 28.Feb.06 N
A101 31.Mar.06 N
A101 30.Apr.06 Y
A101 31.May.06 N
A101 30.Jun.06 N
A101 31.Jul.06 N
ABC123 31.Mar.06 N
;
data want;
input client $ dates date9. filter $ answer;
datalines ;
A101 28.Feb.06 N 0
A101 31.Mar.06 N 0
A101 30.Apr.06 Y 1
A101 31.May.06 N 1
A101 30.Jun.06 N 1
A101 31.Jul.06 N 1
ABC123 31.Mar.06 N 0
Fg5151 28.Feb.06 N 0
Fg5151 31.Mar.06 N 0
Fg5151 30.Apr.06 N 0
Fg5151 31.May.06 Y 1
Fg5151 30.Jun.06 N 1
Fg5151 31.Jul.06 Y 1
Fg5151 31.Aug.06 N 1
Fg5151 30.Nov.06 N 1
Fg5151 31.Dec.06 N 1
Fg5151 01.Jan.07 N 0
;
I have written both a DATA step approach and a hash approach, but I don't know how to solve the problem:
/* data step approach */
data want;
set have;
retain answer c;
if _n_=1 or lag(client) ne client then do;
answer=0;
c=0;
end;
if filter="Y" then do;
call symput('xdate',dates);
answer=1;
c=1;
end;
else if answer=1 then c=c+1;
if (intnx("month",dates,6,"same")) then do;
answer=0;
c=0;
end;
run;
/* hash method approach */
data _null_;
set have end=last;
if _n_ = 1 then do;
length newdate 8 answer 8 c 8;
format newdate ddmmyy10.;
declare hash hs(ordered: "a",hashexp: 9);
hs.defineKey("client","dates");
hs.defineData("client","dates","filter","answer","c");
hs.defineDone();
end;
rc = hs.find();
by client dates notsorted;
if rc ne 0 then do;
retain answer c;
if _n_=1 or lag(client) ne client then do;
answer=0;
c=0;
end;
if filter="Y" then do;
answer=1;
c=1;
hs.add();
end;
else if answer=1 then c=c+1;
if (intnx("month",dates,6,"same")) then do;
answer=0;
c=0;
hs.replace();
end;
hs.replace();
end;
if last eq 1 then do;
hs.output(dataset:
"not_working");
end;
run;
Any help would be greatly appreciated.
thank you.
regards,
S
One option is PROC FORMAT. This approach does include a sort, but only of the filter='Y' rows (plus one row per client), so hopefully that is cheap. The sort is actually unnecessary if you are confident your data is grouped (but not sorted) by client (i.e., you can skip it; it will not delete anything), and since the M (multilabel) option is used anyway, to avoid worrying about overlapping ranges, you can probably skip it regardless.
This is not necessarily super-fast, because it uses the PUTN function (which resolves the format at run time) instead of the PUT function. You will have to see how it performs on larger datasets.
The idea is to construct one format per client that maps the date range covered by each 'Y' record to 'Y', and uses the hlo='o' (OTHER) option to map the rest of the range to 'N'.
data for_fmt;
set have;
by client notsorted;
if filter='Y' then do;
start = dates;
end = intnx('Month',dates,5,'s');
hlo=' m';
fmtname=cats(client,'F');
label='Y';
output;
end;
if last.client then do;
fmtname=cats(client,'F');
call missing(of start end);
hlo='om';
label='N';
output;
end;
run;
proc sort nodupkey data=for_fmt;
by fmtname start;
run;
proc format cntlin=for_fmt;
quit;
data want;
set have;
answer = putn(dates,cats(client,'F'));
run;
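The rule the format encodes can also be stated directly: a record is flagged if some 'Y' record of the same client starts a window that covers it. Below is a minimal C sketch, assuming one observation per client per month, dates reduced to a month index (year * 12 + month - 1), and a window of the 'Y' month plus the next 5 months, which is what `intnx('Month',dates,5,'s')` above and the question's expected output imply. It ignores the day-of-month detail that the 'S' alignment adds. The struct and function names are hypothetical. Comparing every pair is O(n^2), so this illustrates the rule rather than a fast method for large tables; it does, however, need no sorting.

```c
#include <stddef.h>
#include <string.h>

/* One monthly observation; month_index = year * 12 + (month - 1). */
struct obs {
    const char *client;
    int month_index;
    char filter;    /* 'Y' or 'N' */
    int answer;     /* output: 1 if inside a window started by a 'Y' */
};

/* Brute-force sketch: rows[i].answer = 1 if some row j of the same client
   has filter 'Y' and row i lies in months [j, j + 5]. Works on unsorted
   input because every pair of rows is compared. */
void flag_windows(struct obs *rows, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        rows[i].answer = 0;
        for (size_t j = 0; j < n; j++) {
            if (rows[j].filter != 'Y' || strcmp(rows[i].client, rows[j].client) != 0)
                continue;
            int d = rows[i].month_index - rows[j].month_index;
            if (d >= 0 && d <= 5) {
                rows[i].answer = 1;
                break;
            }
        }
    }
}
```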
Is there a way to control the proportion of a variable's values when drawing a random sample in SAS?
Let's say that I have a table of 1000 people (500 male and 500 female).
If I want a random sample of 100 stratified by gender, I will get 50 males and 50 females in my output.
I want to know whether there is a way to get a desired proportion of gender values.
Can I have a random sample of 100 with 70 males and 30 females?
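PROC SURVEYSELECT is the way to do this: pass a dataset that gives the sample size for each stratum (the _NSIZE_ variable below) to N= (or a dataset of rates to SAMPRATE=) instead of a single number.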
data strata_to_Sample;
length sex $1;
input sex $ _NSIZE_;
datalines;
M 70
F 30
;;;;
run;
proc sort data=strata_To_sample;
by sex;
run;
data to_sample;
set sashelp.class;
do _i = 1 to 1e5;
output;
end;
run;
proc sort data=to_Sample;
by sex;
run;
proc surveyselect data=to_sample n=strata_to_sample out=sample;
strata sex;
run;
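Fixed per-stratum sample sizes amount to simple random sampling without replacement inside each stratum. The C sketch below assumes each record carries a one-character sex code. It uses a partial Fisher-Yates shuffle, a stand-in chosen for illustration and not a claim about how PROC SURVEYSELECT is implemented. The names `rand_below`, `choose_k` and `stratified_sample` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Uniform integer in [0, n) from rand(), with rejection sampling to avoid
   modulo bias; assumes 0 < n <= RAND_MAX. */
static size_t rand_below(size_t n)
{
    size_t limit = ((size_t)RAND_MAX + 1) / n * n;
    size_t r;
    do r = (size_t)rand(); while (r >= limit);
    return r % n;
}

/* Partial Fisher-Yates: move a random k-subset of idx[0..n-1] to idx[0..k-1]. */
static void choose_k(size_t *idx, size_t n, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        size_t j = i + rand_below(n - i);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
}

/* For each stratum label s, draw sizes[s] records whose sex[] equals
   labels[s]. Selected record numbers go to out[]; returns how many were
   written (fewer than requested if a stratum is too small). */
size_t stratified_sample(const char *sex, size_t nrec, const char *labels,
                         const size_t *sizes, size_t nstrata, size_t *out)
{
    size_t *idx = malloc(nrec * sizeof *idx);
    size_t nout = 0;
    if (!idx) return 0;
    for (size_t s = 0; s < nstrata; s++) {
        size_t m = 0;
        for (size_t r = 0; r < nrec; r++)       /* collect this stratum's records */
            if (sex[r] == labels[s]) idx[m++] = r;
        size_t k = sizes[s] < m ? sizes[s] : m;
        choose_k(idx, m, k);
        for (size_t i = 0; i < k; i++) out[nout++] = idx[i];
    }
    free(idx);
    return nout;
}
```

For the question's case, `stratified_sample(sex, 1000, "MF", (size_t[]){70, 30}, 2, out)` would return 100 distinct record numbers, 70 of them male.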
Generally, that is what PROC SURVEYSELECT is for.
But here is a quick-and-dirty DATA step solution:
data in_data;
do i= 1 to 500;
sex = 'M'; output;
sex = 'F'; output;
end;
run;
data in_data;
set in_data;
rannum = ranuni(12345);
run;
proc sort data= in_data; by rannum; run;
data sample_data;
set in_data;
retain count_m count_f 0;
if sex = 'M' and count_m lt 70 then do; count_m + 1; output; end;
else if sex = 'F' and count_f lt 30 then do; count_f + 1; output; end;
run;
proc freq data= sample_data;
table sex;
run;
How can we print the following in O(n) time,
perhaps using a single for loop?
1
2 3
4 5 6
up to n rows
All I can do is use nested for loops.
A nested for loop doesn't necessarily mean the algorithm is no longer O(n). If the body of the inner loop is executed O(n) times in total, then the nested loop is perfectly fine:
cur_num <- 1
cur_step <- 1
while cur_num <= n
    for i <- 1 to cur_step
        if cur_num > n
            break
        print cur_num++
    cur_step++
    print '\n'
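A C sketch of the nested-loop version, using a hypothetical function `triangle` that writes into a caller-supplied buffer so the result can be inspected. The inner loop body runs exactly n times in total, which is why the nesting does not break O(n).

```c
#include <stddef.h>
#include <stdio.h>

/* Writes rows 1, 2 3, 4 5 6, ... up to n into buf (capacity cap).
   Returns 0 on success, -1 if buf is too small. */
int triangle(int n, char *buf, size_t cap)
{
    size_t len = 0;
    int cur_num = 1;
    if (cap == 0) return -1;
    buf[0] = '\0';
    for (int cur_step = 1; cur_num <= n; cur_step++) {
        /* inner loop: at most cur_step numbers, stopping early at n */
        for (int i = 0; i < cur_step && cur_num <= n; i++) {
            int w = snprintf(buf + len, cap - len, "%s%d", i ? " " : "", cur_num++);
            if (w < 0 || (size_t)w >= cap - len) return -1;   /* out of space */
            len += (size_t)w;
        }
        if (len + 1 >= cap) return -1;
        buf[len++] = '\n';                                    /* end of row */
        buf[len] = '\0';
    }
    return 0;
}
```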
With a single for loop, it's doable, but slightly less pleasant:
cur_num <- 1
cur_step <- 1
cur_step_consumed <- 0
for i <- 1 to n
    print cur_num++
    cur_step_consumed++
    if cur_step_consumed == cur_step
        cur_step_consumed <- 0
        cur_step++
        print '\n'
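In C++:
#include <cstddef>
#include <iostream>
int main() {
    const std::size_t n = 29;
    std::size_t amount = 1;  // how many numbers belong on the current row
    std::size_t count = 0;   // how many have been printed on it so far
    for (std::size_t i = 1; i <= n; ++i) {
        std::cout << i << " ";
        ++count;
        if (count == amount) {  // row complete: start the next, longer one
            std::cout << std::endl;
            count = 0;
            ++amount;
        }
    }
    if (count != 0)  // finish a partially filled last row
        std::cout << std::endl;
}
Output for n = 29: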
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
29
The idea is to track the number of elements to print in the current row, and track the number of elements printed in the current row. When the number of elements we've printed for the current row is the same as the total number of elements to print for that row, reset the count and increment the number of elements to print for the next row. You can mess with the formatting to get prettier output, but this is the gist of how it can be done in O(n) time and O(1) space.
I know the basic concept of the merge sort algorithm but when it comes to implementing it via recursion I am having trouble grasping how it works. From what I understand, the merge sort function splits our current array into two halves and using recursion we keep doing this until we are left with 1 element for each side.
If our array is {38, 27, 43, 3, 9, 82, 10}, then our recursion starts by calling itself on the subarray (the left side of the original array) and repeats the process each time, halving the array and keeping the leftmost part until we reach 1 element:
38 27 43 3 9 82 10
38 27 43 3 <-split
<---first subroutine/recursion
38 27 <-split
<---second subroutine/recursion
38 <---only 1 element left so we return the value back to the first subroutine that called
Then, in our second subroutine, we move on to the next line, right = merge_sort(right), which again calls itself to split the subarray, keeping the rightmost part:
38 27 <-split
<---second subroutine/recursion
27
<---only 1 element left so we return the value back to the first subroutine that called
Then in our second subroutine we move on to the next line: result = merge(left, right) which calls the merge function to sort our left and right arrays that are just 38 and 27. The merge function sorts our two values based on which is smaller and then it adds the first one to an array although I'm not sure which array. (I need specification on this; shouldn't we have a new array every time we merge two previous arrays?) Then the merge function returns the "result" to another result variable in our merge sort function from having called the merge function. I am assuming this result is the new array that has 38 and 27 sorted in order. Then it looks like we are returning that result again to whatever called the merge sort function but I am confused because wouldn't that end everything? What about the first subroutine that paused for the left side recursion? I'm not sure what happens to:
38 27 43 3
43 3
43
and
43 3
3
Pseudo-code:
function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result
After writing the merge_sort function, you need to merge the left and right lists created above. There are several variants of the merge() function; one possibility is this:
function merge(left,right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) ≤ first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append first(left) to result
            left = rest(left)
        else if length(right) > 0
            append first(right) to result
            right = rest(right)
    end while
    return result
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Merge_sort.html
I'm not sure whether it is what you're looking for, but you can simplify your merge loop by replacing or with and in the main condition:
while length(left) > 0 and length(right) > 0
    if first(left) ≤ first(right)
        append first(left) to result
        left = rest(left)
    else
        append first(right) to result
        right = rest(right)
end while
# You know that one of left and right is empty
# Copy the rest of the data from the other
while length(left) > 0
    append first(left) to result
    left = rest(left)
end while
while length(right) > 0
    append first(right) to result
    right = rest(right)
end while
Yes, there are three loops, but only one of the last two is ever executed.
Working C99 code based closely on pseudo-code
This code uses C99 variable-length arrays (an optional feature in C11). If compiled with -DDEBUG, you get extensive tracing while the program runs; if compiled without it, only the input (unsorted) and output (sorted) arrays are printed. I needed the tracing to diagnose a stupid typo (an r_pos where an l_pos was clearly required). Note the general techniques:
Document entry and exit from functions
Create a diagnostic print function (here dump_array()) that takes a 'tag' as one argument (to identify which call is being traced) and the data structure to be printed as the other arguments.
Call the diagnostic print function at suitable points.
Make it easy to enable or disable diagnostics.
For production quality code, my diagnostic print functions also take a FILE *fp argument and write to the given file; I cheated and used stdout here. The extra generality means the function can be used to write to stderr or a log file as well as, or instead of, stdout.
Space management
The merge_sort() code copies the complete input array into two smaller arrays (left and right), sorts the smaller arrays (recursion), and merges the sorted smaller arrays back into the input array. This happens at each of the log N levels of recursion. Some empirical testing shows that the space used is approximately 2N items, i.e., O(N) space usage; that is consistent with the live left/right arrays along the current recursion path totalling roughly N + N/2 + N/4 + ... ≈ 2N.
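If the ~2N space matters, a common variant (not the code in this answer) allocates a single N-element scratch buffer once and merges through it at every level, so the extra space is N items plus the O(log N) recursion stack. The sketch below uses hypothetical names `merge_sort_buf`, `sort_range` and `merge_halves`. Its comments also show the point the question asks about: the right-half call only starts after the left-half call has returned.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted halves a[lo..mid) and a[mid..hi) via scratch tmp, then copy back. */
static void merge_halves(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];   /* ties take left: stable */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

static void sort_range(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo <= 1) return;
    size_t mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);    /* left half is fully sorted first...      */
    sort_range(a, tmp, mid, hi);    /* ...then this call resumes with the right */
    merge_halves(a, tmp, lo, mid, hi);
}

/* Top-down merge sort with one scratch buffer allocated up front. */
int merge_sort_buf(int *a, size_t n)
{
    if (n <= 1) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;
    sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```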
Shouldn't we have a new array every time we merge two previous arrays?
In a functional programming language, you would have new arrays. In C, you use the input array as the output array too. The code copies the original input array into separate smaller arrays, sorts those smaller arrays, and merges the sorted smaller arrays into the original array.
My other question is what procedure in the code allows us to go back to before the recursion where we split the left side of our array, so we can work on the right side to get 43 and 3 in order to merge them as well.
The splitting process creates a copy of the input array (so the information in the original data is temporarily superfluous). The merging process copies the (now sorted) split arrays back into the original array. (Largely repeating myself.)
Source
#include <stddef.h>
extern void merge_sort(int *array, size_t arrlen);
/* Debug */
#ifdef DEBUG
static void dump_array(const char *tag, int *array, size_t len);
static void enter_func(const char *func);
static void exit_func(const char *func);
#else
#define dump_array(t, a, l) ((void)0)
#define enter_func(f) ((void)0)
#define exit_func(f) ((void)0)
#endif
/*
function merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) ≤ first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    end while
    # You know that one of left and right is empty
    # Copy the rest of the data from the other
    while length(left) > 0
        append first(left) to result
        left = rest(left)
    end while
    while length(right) > 0
        append first(right) to result
        right = rest(right)
    end while
    return result
end function
*/
static void merge(int *left, size_t l_len, int *right, size_t r_len, int *output)
{
    size_t r_pos = 0;
    size_t l_pos = 0;
    size_t o_pos = 0;
    enter_func(__func__);
    dump_array("Left:", left, l_len);
    dump_array("Right:", right, r_len);
    while (r_pos < r_len && l_pos < l_len)
    {
        if (right[r_pos] < left[l_pos])
            output[o_pos++] = right[r_pos++];
        else
            output[o_pos++] = left[l_pos++];
    }
    while (r_pos < r_len)
        output[o_pos++] = right[r_pos++];
    while (l_pos < l_len)
        output[o_pos++] = left[l_pos++];
    dump_array("Output:", output, r_len + l_len);
    exit_func(__func__);
}
/*
function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result
*/
void merge_sort(int *array, size_t len)
{
    if (len <= 1)
        return;
    int left[(len+1)/2];
    size_t l_pos = 0;
    int right[(len+1)/2];
    size_t r_pos = 0;
    size_t mid = len / 2;
    enter_func(__func__);
    dump_array("Input:", array, len);
    for (size_t i = 0; i < mid; i++)
        left[l_pos++] = array[i];
    for (size_t i = mid; i < len; i++)
        right[r_pos++] = array[i];
    dump_array("Left:", left, l_pos);
    dump_array("Right:", right, r_pos);
    merge_sort(left, l_pos);
    merge_sort(right, r_pos);
    merge(left, l_pos, right, r_pos, array);
    dump_array("Result:", array, len);
    exit_func(__func__);
}
/* Test code */
#include <stdio.h>
#ifdef DEBUG
static void enter_func(const char *func)
{
printf("-->> %s\n", func);
}
static void exit_func(const char *func)
{
printf("<<-- %s\n", func);
}
#endif
/* dump_array is always used */
#undef dump_array
static void dump_array(const char *tag, int *array, size_t len)
{
printf("%-8s", tag);
for (size_t i = 0; i < len; i++)
printf(" %2d", array[i]);
putchar('\n');
}
int main(void)
{
int array[] = { 38, 27, 43, 3, 9, 82, 10 };
size_t arrlen = sizeof(array) / sizeof(array[0]);
dump_array("Before:", array, arrlen);
merge_sort(array, arrlen);
dump_array("After:", array, arrlen);
return 0;
}
Sample outputs
Non-debugging
Before: 38 27 43 3 9 82 10
After: 3 9 10 27 38 43 82
Debugging
Before: 38 27 43 3 9 82 10
-->> merge_sort
Input: 38 27 43 3 9 82 10
Left: 38 27 43
Right: 3 9 82 10
-->> merge_sort
Input: 38 27 43
Left: 38
Right: 27 43
-->> merge_sort
Input: 27 43
Left: 27
Right: 43
-->> merge
Left: 27
Right: 43
Output: 27 43
<<-- merge
Result: 27 43
<<-- merge_sort
-->> merge
Left: 38
Right: 27 43
Output: 27 38 43
<<-- merge
Result: 27 38 43
<<-- merge_sort
-->> merge_sort
Input: 3 9 82 10
Left: 3 9
Right: 82 10
-->> merge_sort
Input: 3 9
Left: 3
Right: 9
-->> merge
Left: 3
Right: 9
Output: 3 9
<<-- merge
Result: 3 9
<<-- merge_sort
-->> merge_sort
Input: 82 10
Left: 82
Right: 10
-->> merge
Left: 82
Right: 10
Output: 10 82
<<-- merge
Result: 10 82
<<-- merge_sort
-->> merge
Left: 3 9
Right: 10 82
Output: 3 9 10 82
<<-- merge
Result: 3 9 10 82
<<-- merge_sort
-->> merge
Left: 27 38 43
Right: 3 9 10 82
Output: 3 9 10 27 38 43 82
<<-- merge
Result: 3 9 10 27 38 43 82
<<-- merge_sort
After: 3 9 10 27 38 43 82