SAS Base - Association rules matrix

I'm using SAS Base / SAS Enterprise Guide and I'm stuck.
I need to create the association rules matrix so that I can calculate support, lift, confidence and other association metrics.
My table looks like this, with one row per client and one 0/1 column per product:
data test;
input item1 item2 item3 item4 ;
datalines ;
1 0 1 0
1 1 1 0
1 0 1 0
1 0 1 1
;
run;
For the market basket analysis I want something like this:
      item1 item2 item3 item4
item1     4     1     4     1
item2     1     1     1     0
item3     4     1     4     1
item4     1     0     4     4
Any idea?
Thank you for your help.

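I think your example WANT is incorrect: item4 appears in only one basket, so its row should be 1 0 1 1. PROC CORR with the SSCP option gives the cross-product (co-occurrence) matrix directly.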
data test;
input item1-item4;
datalines;
1 0 1 0
1 1 1 0
1 0 1 0
1 0 1 1
;;;;
proc print;
run;
/* SSCP = sums of squares and cross-products; drop the intercept row and column */
proc corr noprint sscp out=sscp(drop=int: where=(_type_ eq 'SSCP' and _NAME_ ne: 'Int'));
  var item:;
run;
proc print;
run;
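For readers who want to check the numbers outside SAS, here is a minimal Python sketch of the same idea. It assumes the baskets are stored as rows of 0/1 flags; the function names `cooccurrence` and `rule_metrics` are made up for this illustration and are not part of any SAS or Python library. The co-occurrence matrix is the cross-product of the 0/1 table with itself, and support, confidence and lift for a rule A -> B are simple ratios of its entries.

```python
def cooccurrence(baskets):
    """Cross-product X'X of a 0/1 basket table: entry [i][j] counts baskets with both items."""
    n_items = len(baskets[0])
    return [
        [sum(row[i] * row[j] for row in baskets) for j in range(n_items)]
        for i in range(n_items)
    ]


def rule_metrics(baskets, a, b):
    """Support, confidence and lift for the rule item a -> item b (0-based column indexes).

    Assumes both items occur in at least one basket, otherwise a division by zero occurs.
    """
    n = len(baskets)
    counts = cooccurrence(baskets)
    support = counts[a][b] / n                # share of baskets containing both items
    confidence = counts[a][b] / counts[a][a]  # share of baskets with a that also contain b
    lift = confidence / (counts[b][b] / n)    # confidence relative to the base rate of b
    return support, confidence, lift


# Same four baskets as the test dataset in the question
baskets = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]
for row in cooccurrence(baskets):
    print(row)
```

The diagonal holds the per-item counts, which is why the item4 row ends in 1 rather than 4.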

Related

DAX formula for repeat customers

I recently worked on a task where I needed to identify new clients.
I found something similar on Google and ended up with the measure below, which I don't understand,
so maybe you can help me understand the logic behind it. I assumed, apparently wrongly, that the second filter should be >=MIN(Sheet1[Data])
and not <MIN(Sheet1[Data]).
I put together some sample data along with the formula.
new_cust =
CALCULATE(
    DISTINCTCOUNT(Sheet1[Cust_id]),
    FILTER(
        ALL(Sheet1[Data]),
        Sheet1[Data] <= MAX(Sheet1[Data])
    )
)
-
CALCULATE(
    DISTINCTCOUNT(Sheet1[Cust_id]),
    FILTER(
        ALL(Sheet1[Data]),
        Sheet1[Data] < MIN(Sheet1[Data])
    )
)
Cust_id Data New_Cust
1 1/1/2023 1
1 1/2/2023 0
2 1/3/2023 1
2 1/4/2023 0
2 1/5/2023 0
3 1/6/2023 1
3 1/7/2023 0
1 2/1/2023 0
1 2/2/2023 0
3 2/3/2023 0
3 2/4/2023 0
3 2/5/2023 0
4 2/6/2023 1
4 2/7/2023 0
4 2/8/2023 0
1 3/1/2023 0
1 3/2/2023 0
2 3/3/2023 0
2 3/4/2023 0
3 3/5/2023 0
3 3/6/2023 0
4 3/7/2023 0
4 3/8/2023 0
6 3/9/2023 1
6 3/10/2023 0
Thank you in advance for your understanding and help
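One way to read the measure, sketched in Python rather than DAX: the first CALCULATE counts distinct customers seen on any date up to the latest date in the current context, and the second counts distinct customers seen strictly before the earliest date in the current context. The difference is the number of customers whose first purchase falls inside the period. The function name `new_customers` and the `rows` list (a subset of the sample data, read as month/day/year) are assumptions made for this illustration only.

```python
from datetime import date

# A few (Cust_id, Data) rows taken from the sample table, dates read as m/d/yyyy
rows = [
    (1, date(2023, 1, 1)),
    (1, date(2023, 1, 2)),
    (2, date(2023, 1, 3)),
    (2, date(2023, 1, 4)),
    (3, date(2023, 1, 6)),
    (1, date(2023, 2, 1)),
    (4, date(2023, 2, 6)),
    (6, date(2023, 3, 9)),
]


def new_customers(rows, start, end):
    """Distinct customers seen up to `end` minus distinct customers seen before `start`."""
    seen_up_to_end = {cust for cust, day in rows if day <= end}    # Data <= MAX(Data)
    seen_before_start = {cust for cust, day in rows if day < start}  # Data < MIN(Data)
    return len(seen_up_to_end) - len(seen_before_start)


print(new_customers(rows, date(2023, 1, 1), date(2023, 1, 31)))  # January
print(new_customers(rows, date(2023, 2, 1), date(2023, 2, 28)))  # February
```

With >= instead of <, the second set would be the customers active from the start of the period onward rather than the customers already known before it, so the subtraction would no longer isolate first-time customers.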

How to make SORTKEY for irregular observations

I'd like to create a "SORTKEY" variable like the one below. The groups do not all have the same number of observations.
Basically, each group has 3 obs, but if flg=1 then that observation is also included in the current "SORTKEY".
In this example, that means SORTKEY = 2 has 4 obs and every SORTKEY ^= 2 has 3 obs.
Is there a way to create the SORTKEY manually? If you have a good idea, please give me some advice.
I want the following dataset, built from the "test" dataset.
/*
SORTKEY NO FLG
1    1  0
1    2  0
1    3  0
2    4  0
2    5  0
2    6  0
2    7  1
3    8  0
3    9  0
3    10 0
*/
data test;
input no flg;
cards;
1 0
2 0
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
;
run;
Use a sequence counter to track the 3-rows-per-sortkey requirement.
Example:
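data want;
  set test;            /* input dataset from the question */
  retain sortkey 1;
  seq + 1;             /* position within the current group */
  /* start a new group after 3 rows, unless this row is flagged */
  if seq > 3 and flg ne 1 then do;
    seq = 1;
    sortkey + 1;
  end;
  drop seq;
run;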
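The same counter logic can be traced in a few lines of Python, as a sketch only. The function name `assign_sortkeys` and the `group_size` parameter are invented for this illustration; the input is just the flg column from the question.

```python
def assign_sortkeys(flags, group_size=3):
    """Assign a sort key per row: a new key starts after `group_size` rows,
    except that a flagged row (flg == 1) stays in the current group."""
    sortkey = 1
    seq = 0
    keys = []
    for flg in flags:
        seq += 1
        if seq > group_size and flg != 1:
            seq = 1          # this row opens a new group
            sortkey += 1
        keys.append(sortkey)
    return keys


# flg column of the test dataset (NO = 1..10)
flags = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(assign_sortkeys(flags))
```

Row 7 is flagged, so it stays in group 2 and the counter only resets on row 8.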

MATLAB - Combine two binary image by comparing 3 x 3 patch (sub-matrix)

Hello, I want to combine two binary images of the same size (111x111). First I want to divide each image into 3 x 3 patches (37 x 37 sub-matrices), and then apply two conditions:
1. If the values of the 3 x 3 patch from image 2 are all white (1), then the result patch = the image 1 patch. Example:
image 1 patch:   image 2 patch:   result:
1 1 0            1 1 1            1 1 0
1 0 1            1 1 1            1 0 1
1 1 1            1 1 1            1 1 1
2. Otherwise, I want to keep the center value of the 3 x 3 patch (index (2,2)) from image 1 and take the other values from image 2:
image 1 patch:   image 2 patch:   result:
0 0 0            1 0 1            1 0 1
0 0 0            1 1 0            1 0 0
0 0 0            1 0 1            1 0 1
Then do this for the whole image and combine all the 3 x 3 patches into the result image (111x111 again).
My code so far (using mat2cell):
clear;
clc;
I1 = imread('image1.bmp');
I2 = imread('image2.bmp');
TI1 = im2bw(I1); % Thresholding I1
TI2 = im2bw(I2); % Thresholding I2
% Split both images into 3x3 patches (37x37 cells for a 111x111 image)
cellTI1 = mat2cell(TI1, 3*ones(size(TI1,1)/3,1), 3*ones(size(TI1,2)/3,1));
cellTI2 = mat2cell(TI2, 3*ones(size(TI2,1)/3,1), 3*ones(size(TI2,2)/3,1));
% Loop over the patches and apply the two conditions
resultCell = cell(size(cellTI1));
for m = 1:size(cellTI1,1)
    for n = 1:size(cellTI1,2)
        if all(cellTI2{m,n}(:))
            % Image 2 patch is all white: take the image 1 patch
            resultCell{m,n} = cellTI1{m,n};
        else
            % Take the image 2 patch but keep the center value from image 1
            patch = cellTI2{m,n};
            patch(2,2) = cellTI1{m,n}(2,2);
            resultCell{m,n} = patch;
        end
    end
end
% Stitch the patches back into a 111x111 image
result1 = cell2mat(resultCell);
Sorry for my bad English,
Thanks
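The two conditions can also be checked against the example patches with a small language-neutral sketch. It is written in plain Python with nested lists standing in for the binary images; the function name `combine_patches` and the parameter `k` are made up for this illustration, and it assumes both images have the same size with dimensions divisible by `k`.

```python
def combine_patches(img1, img2, k=3):
    """Combine two binary images patch by patch (k x k patches).

    If an image 2 patch is all ones, the image 1 patch is used;
    otherwise the image 2 patch is used with its center value taken from image 1.
    """
    rows, cols = len(img1), len(img1[0])
    result = [row[:] for row in img2]  # start from a copy of image 2
    center = k // 2
    for r0 in range(0, rows, k):
        for c0 in range(0, cols, k):
            all_white = all(
                img2[r][c] == 1
                for r in range(r0, r0 + k)
                for c in range(c0, c0 + k)
            )
            if all_white:
                for r in range(r0, r0 + k):
                    result[r][c0:c0 + k] = img1[r][c0:c0 + k]
            else:
                result[r0 + center][c0 + center] = img1[r0 + center][c0 + center]
    return result


# The two example patches from the question, placed side by side as a 3x6 image
img1 = [[1, 1, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0]]
img2 = [[1, 1, 1, 1, 0, 1],
        [1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 0, 1]]
for row in combine_patches(img1, img2):
    print(row)
```

The left patch follows condition 1 and the right patch follows condition 2, so the output reproduces both "result" patches from the question.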

Oracle 11g - Adding a Total Column to a Pivot Table

I've created a pivot table with data from multiple tables (using JOINs). How can I add another column to the table that adds up the columns in each row?
Example:
Category | A | B | C
ABC      | 1 | 1 | 1
A        | 1 | 0 | 0
B        | 0 | 1 | 0
C        | 0 | 0 | 1
Desired result:
Category | A | B | C | TOTAL
ABC      | 1 | 1 | 1 | 3
A        | 1 | 0 | 0 | 1
B        | 0 | 1 | 0 | 1
C        | 0 | 0 | 1 | 1
SCOTT#research 15-APR-15> select * from testing ;
CATEG          A          B          C
----- ---------- ---------- ----------
ABC            1          1          1
A              1          0          0
B              0          1          0
C              0          0          1
SCOTT#research 15-APR-15> select category,a,b,c, sum(a+b+c) as "total" from testing group by category,a,b,c order by category;
CATEG          A          B          C      total
----- ---------- ---------- ---------- ----------
A              1          0          0          1
ABC            1          1          1          3
B              0          1          0          1
C              0          0          1          1
If you want to store the total as an actual column, you can add one and use a procedure to populate it:
alter table testing add total int;
Then use this procedure to update the values:
create or replace procedure add_test
is
  total1 int;
begin
  for i in (select * from testing) loop
    select sum(a+b+c) into total1 from testing where category = i.category;
    update testing set total = total1 where category = i.category;
  end loop;
  commit;
end;
/
exec add_test;
SCOTT#research 15-APR-15> select * from testing;
CATEG          A          B          C      TOTAL
----- ---------- ---------- ---------- ----------
ABC            1          1          1          3
A              1          0          0          1
B              0          1          0          1
C              0          0          1          1
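When each category appears on a single row, as in the sample data, the total is just a per-row expression, so a plain a + b + c without GROUP BY should give the same result as the select above. The sketch below illustrates that using Python's built-in sqlite3 module with an in-memory table. Note the assumptions: it runs on SQLite rather than Oracle 11g, and the function name `totals` is invented for this example.

```python
import sqlite3


def totals(rows):
    """Load (category, a, b, c) rows into an in-memory table and return them with a row total."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testing (category TEXT, a INTEGER, b INTEGER, c INTEGER)")
    con.executemany("INSERT INTO testing VALUES (?, ?, ?, ?)", rows)
    # Per-row arithmetic: no aggregate function or GROUP BY is needed
    result = con.execute(
        "SELECT category, a, b, c, a + b + c AS total FROM testing ORDER BY category"
    ).fetchall()
    con.close()
    return result


rows = [("ABC", 1, 1, 1), ("A", 1, 0, 0), ("B", 0, 1, 0), ("C", 0, 0, 1)]
for row in totals(rows):
    print(row)
```

The expression a + b + c is standard SQL, so the same select list should carry over to the Oracle query, including one built on top of a pivot.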

R - Making loops faster

This little code snippet is supposed to loop through a sorted data frame. It keeps a count of how many successive rows have the same information in columns aIndex and cIndex and also bIndex and dIndex. If these are the same, it deposits the count and increments it for the next time around, and if they differ, it deposits the count and resets it to 1 for the next time around.
for (i in 1:nrow(myFrame)) {
  if (myFrame[i, aIndex] == myFrame[i, cIndex] &
      myFrame[i, bIndex] == myFrame[i, dIndex]) {
    myFrame[i, eIndex] <- count
    count <- (count + 1)
  } else {
    myFrame[i, eIndex] <- count
    count <- 1
  }
}
It's been running for a long time now. I understand that I'm supposed to vectorize whenever possible, but I'm not really seeing it here. What am I supposed to do to make this faster?
Here's what an example few rows should look like after running:
aIndex bIndex cIndex dIndex eIndex
     1      2      1      2      1
     1      2      1      2      2
     1      2      4      8      3
     4      8      1      4      1
     1      4      1      4      1
I think this will do what you want; the tricky part is that the count resets after the difference, which effectively puts a shift on the eIndex.
There (hopefully) is an easier way to do this, but this is what I came up with.
tmprle <- rle(((myFrame$aIndex == myFrame$cIndex) &
               (myFrame$bIndex == myFrame$dIndex)))
myFrame$eIndex <- c(1,
                    unlist(ifelse(tmprle$values,
                                  Vectorize(seq.default)(from = 2,
                                                         length = tmprle$lengths),
                                  lapply(tmprle$lengths,
                                         function(x) {rep(1, each = x)})))
                    )[-(nrow(myFrame)+1)]
which gives
> myFrame
  aIndex bIndex cIndex dIndex eIndex
1      1      2      1      2      1
2      1      2      1      2      2
3      1      2      4      8      3
4      4      8      1      4      1
5      1      4      1      4      1
Maybe this will work. I have reworked the rle and sequence bits.
dat <- read.table(text="aIndex bIndex cIndex dIndex
1 2 1 2
1 2 1 2
1 2 4 8
4 8 1 4
1 4 1 4", header=TRUE, as.is=TRUE,sep = " ")
dat$eIndex <-NA
#identify rows where a=c and b=d, multiply by 1 to get a numeric vector
dat$id<-(dat$aIndex==dat$cIndex & dat$bIndex==dat$dIndex)*1
#identify sequence
runs <- rle(dat$id)
#create sequence, multiply by id to keep only identicals, +1 at the end
count <-sequence(runs$lengths)*dat$id+1
#shift sequence down one notch, start with 1
dat$eIndex <-c(1,count[-length(count)])
dat
  aIndex bIndex cIndex dIndex eIndex id
1      1      2      1      2      1  1
2      1      2      1      2      2  1
3      1      2      4      8      3  0
4      4      8      1      4      1  0
5      1      4      1      4      1  1
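The rle / sequence / shift steps of the second answer can be followed one at a time in a short Python sketch. It uses itertools.groupby as a stand-in for rle; the function names `run_sequence` and `e_index` are invented for this illustration, and the input is the id column (1 where a=c and b=d, else 0).

```python
from itertools import groupby


def run_sequence(flags):
    """Position of each element within its run of equal values (like sequence(rle(x)$lengths))."""
    seq = []
    for _, run in groupby(flags):
        seq.extend(range(1, len(list(run)) + 1))
    return seq


def e_index(flags):
    """Count that each row receives: 1 plus the length of the matching streak just before it."""
    # Keep the within-run position only for matching rows, then add 1
    count = [s * f + 1 for s, f in zip(run_sequence(flags), flags)]
    # Shift down one row, starting with 1, and keep the original length
    return ([1] + count[:-1])[:len(flags)]


# id column from the example data
flags = [1, 1, 0, 0, 1]
print(run_sequence(flags))
print(e_index(flags))
```

The shift by one row is what reproduces the loop's behavior of writing the count first and updating it afterwards.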
