I have this data
y x1 x2 pre
1 16 1 1 14
2 15 1 1 13
3 14 1 2 14
4 13 1 2 13
5 12 2 1 12
6 11 2 1 12
7 11 2 2 13
8 13 2 2 13
9 10 3 1 10
10 11 3 1 11
11 11 3 2 11
12 9 3 2 10
And I fitted the following model
lm(y ~ pre + x1 + x2 + x1*x2)
My design matrix is
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 14 1 0 1 1 0
[2,] 1 13 1 0 1 1 0
[3,] 1 14 1 0 0 0 0
[4,] 1 13 1 0 0 0 0
[5,] 1 12 0 1 1 0 1
[6,] 1 12 0 1 1 0 1
[7,] 1 13 0 1 0 0 0
[8,] 1 13 0 1 0 0 0
[9,] 1 10 0 0 1 0 0
[10,] 1 11 0 0 1 0 0
[11,] 1 11 0 0 0 0 0
[12,] 1 10 0 0 0 0 0
I'm trying to use this design to reproduce the following table:
Source DF Sum of Squares Mean Square F Value Pr > F
Model 6 44.79166667 7.46527778 12.98 0.0064
Error 5 2.87500000 0.57500000
Corrected Total 11 47.66666667
Source DF Type III SS Mean Square F Value Pr > F
pre 1 3.12500000 3.12500000 5.43 0.0671
x1 2 4.58064516 2.29032258 3.98 0.0923
x2 1 3.01785714 3.01785714 5.25 0.0706
x1*x2 2 1.25000000 0.62500000 1.09 0.4055
The first part is fine
XtX <- t(x) %*% x
XtXinv <- solve(XtX)
betahat <- XtXinv %*% t(x) %*% y
H <- x %*% XtXinv %*% t(x)
IH <- (diag(1,12) - H)
yhat <- H %*% y
e <- IH %*% y
ybar <- mean(y)
MSS <- t(betahat) %*% t(x) %*% y - length(y)*(ybar^2)
ESS <- t(e) %*% e
TSS <- MSS + ESS
dfM <- sum(diag(H)) - 1
dfE <- sum(diag(IH))
dfT <- dfM + dfE
MSM <- MSS/dfM
MSE <- ESS/dfE
Ftest <- MSM / MSE
pr <- 1 - pf(Ftest, dfM, dfE)
The contrast coefficient matrix for 'pre' seems correct.
L <- matrix(c(0,1,0,0,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSpre <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSpre <- SSpre / 1
Fpre <- MSpre / MSE
PRpre <- 1 - pf(Fpre, 1, 12-7)
But I can't work out how to define the contrast coefficient matrix for x1, x2, and x1*x2. What's the problem with the rest of my code? Below is an example of how I think I should calculate it for x1:
L <- matrix(c(0,0,1,1,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSX1 <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSX1 <- SSX1 / 1
FX1 <- MSX1 / MSE
PRX1 <- 1 - pf(FX1, 1, 12-7)
Thanks!
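A note on the x1 attempt above, offered as a sketch rather than a verified answer: an effect with q degrees of freedom needs a contrast matrix L with q linearly independent rows, so a single-row L can only ever give a 1-df sum of squares. The general form below uses the same quantities as the code (betahat, XtXinv, MSE). The specific rows shown for x1, x2 and x1*x2 are assumptions: they follow the column order of the design matrix above (columns 3-4 for x1, column 5 for x2, columns 6-7 for the interaction) and average each main effect over the levels of the other factor. They have not been checked against the SAS output.

```latex
% General linear hypothesis H0: L\beta = 0, with q = rank(L)
SS(L) = (L\hat\beta)^{\top}\,\bigl[\,L\,(X^{\top}X)^{-1}L^{\top}\bigr]^{-1}(L\hat\beta),
\qquad
F = \frac{SS(L)/q}{MSE}, \qquad p = 1 - F_{q,\;n-7}(F)

% Assumed contrast matrices for this design (columns: 1, pre, x1_1, x1_2, x2_1, x1_1 x2_1, x1_2 x2_1)
L_{x1} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & \tfrac12 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & \tfrac12
\end{pmatrix},
\quad
L_{x2} = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & \tfrac13 & \tfrac13
\end{pmatrix},
\quad
L_{x1 \ast x2} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
```

With a two-row L, the mean square is SS divided by 2 and the F test uses 2 and 12-7 degrees of freedom. Rounding L %*% XtXinv %*% t(L) to 4 digits before inverting it can also shift the result, so it is probably safer to drop the round() call.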
I'm learning to use AMPL to solve some linear programming problems, but I have a syntax error in part of my code and I don't know how to solve it.
#Mod file
#Sets
set T; #Set of periods
set I; #Set of plants
set J; #Set of customers
set M; #Set of raw materials
#Parameters
param D{j in J, t in T};
param CAM{i in I, m in M};
param CFP{i in I, t in T};
param CVP{i in I, t in T};
param CFI{i in I, t in T};
param CVI{i in I, t in T};
param QP{i in I};
param QI{i in I};
param CT{i in I, j in J};
param R{i in I, m in M};
param L; #Big M
#Decision variables
var X{m in M, i in I, t in T}>=0 integer;
var Y{i in I, t in T}>=0 integer;
var H{i in I, t in T}>=0 integer; #Renamed the variable I so that it is not confused with the set
var Z{i in I, t in T} binary;
var CI{i in I, t in T} binary;
var W{i in I, j in J, t in T}>=0 integer;
var TR{i in I, j in J, t in T} binary; #Whether or not there is transport
#Objective function
minimize FO: sum{i in I,t in T}CFP[i,t]*Z[i,t]+sum{i in I,t in T}CFI[i,t]*CI[i,t]+sum{i in I, m in M, t in T}CAM[i,m]*X[m,i,t]+sum{i in I, j in J, t in T}TR[i,j,t]*CT[i,J]+sum{i in I,t in T}CvI[i,t]*H[i,t]+sum{i in I,t in T}CVP[i,t]*Y[i,t];
#Constraints
s.t. R1{i in I, t in T}: Y[i,t]<=QP[i];
s.t. R2{i in I, t in T}: H[i,t]<=QI[i];
s.t. R3{i in I, t in T}: Y[i,t]<=M*Z[i,t];
s.t. R4{i in I, t in T}: H[i,t]<=M*CI[i,t];
s.t. R5{i in I}: H[i,0]=0;
s.t. R6{j in J,t in T}: sum{i in I}W[i,j,t]=D[j,t];
s.t. R7{i in I, t in {1,2,3,4,5}}:H[i,t-1]+Y[i,t]=H[i,t]+sum{j in J}W[i,j,t];
s.t. R8{i in I}:H[i,5]+Y[i,6]=H[i,6]+sum{j in J}W[i,j,6];
s.t. R9{j in J, t in T}:D[j,t]=sum{i in I}W[i,j,t];
s.t. R10{i in I, m in M, t in T}: R[i,m]*Y{i,t]=X[m,i,t];
With the following data file:
#DAT file
#Set definitions (elements can be separated with commas or spaces)
set T:=1,2,3,4,5,6; #Set of periods
set I:= P1, P2, P3; #Set of plants
set J:= C1, C2, C3, C4; #Set of customers
set M:= M1, M2, M3; #Set of raw materials
#Parameters
param D:
1 2 3 4 5 6:=
C1 300 350 330 320 360 350
C2 500 600 550 400 450 500
C3 1000 800 850 900 950 850
C4 450 600 500 550 400 490
;
param CAM:
M1 M2 M3:=
P1 50 70 20
P2 30 100 20
P3 30 50 20
;
param R:
M1 M2 M3:=
P1 2 1 3
P2 3 1 5
P3 2 1 1
;
param QP:=
P1 1000
P2 800
P3 800
;
param QI:=
P1 300
P2 400
P3 350
;
param CFP:
1 2 3 4 5 6:=
P1 10 10 12 15 15 13
P2 12 12 15 17 17 13
P3 25 25 20 30 30 25
;
param CVP:
1 2 3 4 5 6:=
P1 5 5 10 8 8 7
P2 6 6 12 8 9 8
P3 13 13 10 15 15 15
;
param CFI:
1 2 3 4 5 6:=
P1 2 3 2 2 3 2
P2 2 3 2 2 3 2
P3 2 5 2 7 9 7
;
param CVI:
1 2 3 4 5 6:=
P1 1 1 1 2 1 2
P2 2 2 3 2 2 2
P3 2 2 1 2 1 3
;
param CT:
C1 C2 C3 C4:=
P1 100 80 30 100
P2 120 30 30 120
P3 90 70 30 150
;
param L:=10000000
But when I run the mod file, I get this error:
Taller1.mod, line 36 (offset 981):
syntax error
context: minimize FO: sum{i in I,t in T}CFP[i,t]*Z[i,t]+sum{i in I,t in T}CFI[i,t]*CI[i,t]+sum{i in I, m in M, t in T}CAM[i,m]*X[m,i,t]+sum{i in I, j in J, t in >>> T}TR[i,j,t]*CT[i,J] <<< +sum{i in I,t in T}CvI[i,t]*H[i,t]+sum{i in I,t in T}CVP[i,t]*Y[i,t];
I checked my code, but I don't understand what the error is. Please! Help me.
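The error is in CT[i,J]: J is the set of customers, while the index of the sum is the lowercase j, so the term should be CT[i,j]. AMPL is case-sensitive, so CvI[i,t] on the same line also has to be CVI[i,t]. Further down, Y{i,t] in R10 should be Y[i,t], and R3 and R4 multiply by M, which is the set of raw materials; the big-M parameter is declared as L.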
I want to align the memory of a 5x5 matrix represented as a one-dimensional array.
The original array looks like this:
let mut a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
After resizing the memory to aligned bounds (the next power of 2 in each dimension), the array will look like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The length of a is now 64 elements,
so it becomes an 8x8 matrix.
The goal is to have the following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The goal is to have memory aligned to a power of two, so that calculations can be partially done in parallel (for OpenCL float4, or whatever vector sizes are available). I also do not want to allocate a new array and simply insert the old elements at the correct positions, because I want to keep memory consumption low.
At first, I thought about swapping the elements in the positions where a zero should go with elements at the end of the array, keeping a pointer to them and simulating a queue, but the elements would pile up towards the end, and I didn't come up with a working solution.
My language of choice is Rust. Is there a smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Now taking all this information we can come up with the formula to get the new index Y for the old index X:
Y = (X / N) * M + (X % N)
So you can just loop from N^2 - 1 down to N, copy each element to the new position calculated with the formula, and set its original position to 0. The first row (indices 0 to N - 1) is already in place. (Everything here is 0-based, as Rust indexing is.)
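The backward copy described above can be sketched in Rust as follows. This is a minimal illustration under assumptions that are not in the original post: the function name repack_square, the i32 element type, and 0 as the padding value are hypothetical choices, and the matrix is assumed to be square and stored row by row.

```rust
/// Re-lays an n x n row-major matrix in place so that it becomes the
/// top-left block of an m x m row-major matrix padded with zeros.
/// (Hypothetical helper; name and element type are assumptions.)
fn repack_square(data: &mut Vec<i32>, n: usize, m: usize) {
    assert!(m >= n && data.len() == n * n);
    // Grow the buffer first; the new tail is zero-filled.
    data.resize(m * m, 0);
    if m == n {
        return; // nothing to move
    }
    // Walk backwards so an element is never overwritten before it is moved.
    // Indices 0..n (the first row) are already where they belong.
    for x in (n..n * n).rev() {
        let y = (x / n) * m + (x % n); // Y = (X / N) * M + (X % N)
        data[y] = data[x];
        data[x] = 0;
    }
}

fn main() {
    let mut a: Vec<i32> = (1..=25).collect();
    repack_square(&mut a, 5, 8);
    // Print the 8 x 8 result one row per line.
    for row in a.chunks(8) {
        println!("{:?}", row);
    }
}
```

Because the new index Y is always larger than X for every element outside the first row, the backward walk needs no temporary storage beyond the resized vector itself.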
According to maraca's solution, the code would look like this:
fn zeropad<T: Copy>(
    fill: T,
    data: &mut Vec<T>,
    dims: (usize, usize),
) -> (usize, usize) {
    // Round both dimensions up to the next power of two.
    let r = dims.0.next_power_of_two();
    let c = dims.1.next_power_of_two();
    if (r, c) == dims {
        return (r, c);
    }
    let old_len = data.len();
    let old_col = dims.1;
    // resize; the new tail is filled with `fill`
    data.resize(r * c, fill);
    // If the column count is unchanged, every element is already in place.
    if c == old_col {
        return (r, c);
    }
    // Walk backwards so no element is overwritten before it is moved.
    // The first row (indices 0..old_col) never moves.
    for i in (old_col..old_len).rev() {
        // position of element i in the bigger matrix
        let pos_new = (i / old_col) * c + (i % old_col);
        data[pos_new] = data[i];
        data[i] = fill;
    }
    (r, c)
}
With the following pieces of information, I can easily create an array of matrices
b0=data.frame(b0_1=c(11.41,11.36),b0_2=c(8.767,6.950))
b1=data.frame(b1_1=c(0.8539,0.9565),b1_2=c(-0.03179,0.06752))
b2=data.frame(b2_1=c(-0.013020 ,-0.016540),b2_2=c(-0.0002822,-0.0026720))
T.val=data.frame(T1=c(1,1),T2=c(1,2),T3=c(2,1))
dt_data=cbind(b0,b1,b2,T.val)
fu.time=seq(0,50,by=0.8)
pat=ncol(T.val) #number of T's
nit=2 #no of rows
pt.array1=array(NA, dim=c(nit,length(fu.time),pat))
for ( it.er in 1:nit){
for ( ti in 1:length(fu.time)){
for (pt in 1:pat){
pt.array1[it.er,ti,pt]=b0[it.er,T.val[it.er,pt]]+b1[it.er,T.val[it.er,pt]]*fu.time[ti]+b2[it.er,T.val[it.er,pt]]*fu.time[ti]^2
}
}
}
pt.array_mean=apply(pt.array1, c(3,2), mean)
pt.array_LCL=apply(pt.array1, c(3,2), quantile, prob=0.25)
pt.array_UCL=apply(pt.array1, c(3,2), quantile, prob=0.975)
Now with these additional data, I can create three plots as follows
mydata
pt.ID time IPSS
1 1 0.000000 10
2 1 1.117808 8
3 1 4.504110 5
4 1 6.410959 14
5 1 13.808220 10
6 1 19.890410 4
7 1 28.865750 15
8 1 35.112330 7
9 2 0.000000 6
10 2 1.117808 7
11 2 4.109589 8
12 2 10.093151 7
13 2 16.273973 11
14 2 18.345205 18
15 2 21.567120 14
16 2 25.808220 12
17 2 56.087670 5
18 3 0.000000 8
19 3 1.413699 3
20 3 4.405479 3
21 3 10.389041 8
pdf("plots.pdf")
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],xlim=c(0,57),ylim=c(0,35),type="l",col="black",
xlab="f/u time", ylab= "",main = paste("patient", pt.no),data=mydata)
points(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],data=mydata)
lines(pt.array_mean[pt.no,]~fu.time, col="blue")
lines(pt.array_LCL[pt.no,]~fu.time, col="green")
lines(pt.array_UCL[pt.no,]~fu.time, col="green")
}
dev.off()
The problem arises when the number of rows in each matrix is much bigger, say 10000. It takes too much computation time to create pt.array1 for a large number of rows in b0, b1 and b2.
Is there an alternative way to do this quickly using a built-in function?
Can I avoid allocating storage for pt.array1, since I am not using it further? I just need pt.array_mean, pt.array_UCL and pt.array_LCL for my plots.
Any help is appreciated.
There are a couple of other approaches you can employ.
First, you largely have a model of b0 + b1*fu + b2*fu^2. Therefore, you could make the coefficients and apply the fu after the fact:
ind <- expand.grid(nits = seq_len(nit), pats = seq_len(pat))
mat_ind <- cbind(ind[, 'nits'], T.val[as.matrix(ind)])
b_mat <- matrix(c(b0[mat_ind], b1[mat_ind], b2[mat_ind]), ncol = 3)
b_mat
[,1] [,2] [,3]
[1,] 11.410 0.85390 -0.0130200
[2,] 11.360 0.95650 -0.0165400
[3,] 11.410 0.85390 -0.0130200
[4,] 6.950 0.06752 -0.0026720
[5,] 8.767 -0.03179 -0.0002822
[6,] 11.360 0.95650 -0.0165400
Now if we apply the model to each row, we will get all of your raw results. The only problem is that we don't match your original output - each column slice of your array is equivalent of a row slice of my matrix output.
pt_array <- apply(b_mat, 1, function(x) x[1] + x[2] * fu.time + x[3] * fu.time^2)
pt_array[1,]
[1] 11.410 11.360 11.410 6.950 8.767 11.360
pt.array1[, 1, ]
[,1] [,2] [,3]
[1,] 11.41 11.41 8.767
[2,] 11.36 6.95 11.360
That's OK because we can fix the shape as we compute the summary statistics - we just need to take the column means and column quantiles of each row converted to a 2 x 3 matrix:
library(matrixStats)
pt_summary = array(t(apply(pt_array,
1,
function(row) {
M <- matrix(row, ncol = pat)
c(colMeans2(M),colQuantiles(M, probs = c(0.25, 0.975))
)
}
)),
dim = c(length(fu.time), pat, 3),
dimnames = list(NULL, paste0('pat', seq_len(pat)), c('mean', 'LCL', 'UCL'))
)
pt_summary[1, ,] #slice at time = 1
mean LCL UCL
pat1 11.3850 11.37250 11.40875
pat2 9.1800 8.06500 11.29850
pat3 10.0635 9.41525 11.29518
# rm(pt.array1)
Then to do your final graphing, I simplified it - the data argument can be a subset(mydata, pt.ID == pt.no). Additionally, since the summary statistics are now in an array format, matlines allows everything to be done at once:
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS~time, data=subset(mydata, pt.ID == pt.no),
xlim=c(0,57), ylim=c(0,35),
type="l",col="black", xlab="f/u time", ylab= "",
main = paste("patient", pt.no)
)
points(IPSS~time, data=subset(mydata, pt.ID == pt.no))
matlines(y = pt_summary[,pt.no ,], x = fu.time, col=c("blue", 'green', 'green'))
}
Consider we have N points on a circle. To each point an index is assigned i = (1,2,...,N). Now, for a randomly selected point, I want to have a vector including the indices of 5 points, [two left neighbors, the point itself, two right neighbors].
See the figure below.
Some examples are as follows:
N = 18;
selectedPointIdx = 4;
sequence = [2 3 4 5 6];
selectedPointIdx = 1
sequence = [17 18 1 2 3]
selectedPointIdx = 17
sequence = [15 16 17 18 1];
The conventional way to code this is considering the exceptions as if-else statements, as I did:
if ii == 1
lseq = [N-1 N ii ii+1 ii+2];
elseif ii == 2
lseq = [N ii-1 ii ii+1 ii+2];
elseif ii == N-1
lseq=[ii-2 ii-1 ii N 1];
elseif ii == N
lseq=[ii-2 ii-1 ii 1 2];
else
lseq=[ii-2 ii-1 ii ii+1 ii+2];
end
where ii is selectedPointIdx.
This does not scale well if I want, for instance, 7 points instead of 5. What is a more efficient way?
How about this -
off = -2:2
out = mod((off + selectedPointIdx) + 17,18) + 1
For a window size of 7, edit off to -3:3. For a different number of points N, replace 17 and 18 with N-1 and N.
It uses the strategy of subtracting 1, taking the modulus, and adding 1 back.
Sample run -
>> off = -2:2;
for selectedPointIdx = 1:18
disp(['For selectedPointIdx =',num2str(selectedPointIdx),' :'])
disp(mod((off + selectedPointIdx) + 17,18) + 1)
end
For selectedPointIdx =1 :
17 18 1 2 3
For selectedPointIdx =2 :
18 1 2 3 4
For selectedPointIdx =3 :
1 2 3 4 5
For selectedPointIdx =4 :
2 3 4 5 6
For selectedPointIdx =5 :
3 4 5 6 7
For selectedPointIdx =6 :
4 5 6 7 8
....
For selectedPointIdx =11 :
9 10 11 12 13
For selectedPointIdx =12 :
10 11 12 13 14
For selectedPointIdx =13 :
11 12 13 14 15
For selectedPointIdx =14 :
12 13 14 15 16
For selectedPointIdx =15 :
13 14 15 16 17
For selectedPointIdx =16 :
14 15 16 17 18
For selectedPointIdx =17 :
15 16 17 18 1
For selectedPointIdx =18 :
16 17 18 1 2
You can use modular arithmetic instead: Let p be the point among N points numbered 1 to N. Say you want m neighbors on each side, you can get them as follows:
(p - m - 1) mod N + 1
...
(p - 4) mod N + 1
(p - 3) mod N + 1
(p - 2) mod N + 1
p
p mod N + 1
(p + 1) mod N + 1
(p + 2) mod N + 1
(p + 3) mod N + 1
...
(p + m - 1) mod N + 1
Code:
N = 18;
p = 2;
m = 3;
for i = p - m : p + m
nb = mod((i - 1) , N) + 1;
disp(nb);
end
Note that avoiding an if statement will not necessarily improve performance. A benchmark might be necessary to find out. However, the difference will only be significant if you are processing tens of thousands of numbers.
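The "subtract 1, take the modulus, add 1 back" idea used in both answers can be sketched in Rust as follows. The function name circular_window and its parameters are hypothetical, chosen only for illustration; the indices are 1-based as in the question, and rem_euclid is used so that negative intermediate values wrap to a non-negative remainder.

```rust
/// Returns the 1-based indices of point `p` and its `m` neighbours on each
/// side, for `n` points arranged on a circle.
/// (Hypothetical helper; not part of the original answers.)
fn circular_window(p: usize, m: usize, n: usize) -> Vec<usize> {
    assert!(n > 0 && p >= 1 && p <= n);
    // Work in signed integers so that p - m may go below 1.
    let (p, m, n) = (p as i64, m as i64, n as i64);
    (p - m..=p + m)
        // shift to 0-based, wrap around the circle, shift back to 1-based
        .map(|i| ((i - 1).rem_euclid(n) + 1) as usize)
        .collect()
}

fn main() {
    println!("{:?}", circular_window(4, 2, 18)); // [2, 3, 4, 5, 6]
    println!("{:?}", circular_window(1, 2, 18)); // [17, 18, 1, 2, 3]
    println!("{:?}", circular_window(17, 2, 18)); // [15, 16, 17, 18, 1]
}
```

Changing the window from 5 to 7 points only means passing m = 3 instead of m = 2; no extra branches are needed.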
The original script is
Y = [1 2 3 3 2 1 1 2 3]';
n = length(Y);
Ym = zeros(n, n);
for i=1:n
index = find(Y==Y(i));
Ym(i, index') = 1;
end
then, the Ym is
Ym =
1 0 0 0 0 1 1 0 0
0 1 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1
0 0 1 1 0 0 0 0 1
0 1 0 0 1 0 0 1 0
1 0 0 0 0 1 1 0 0
1 0 0 0 0 1 1 0 0
0 1 0 0 1 0 0 1 0
0 0 1 1 0 0 0 0 1
Yes! With bsxfun -
Ym = bsxfun(@eq,Y,Y.')
I find that logical indexing works faster than bsxfun on my computer. Here are the sample times for different methods:
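For readers unfamiliar with this kind of broadcasting, the operation compares every element of Y with every other element and records 1 where they match. A minimal Rust sketch of the same pairwise comparison is given below; the function name equality_matrix and the i32/u8 types are assumptions made for illustration, not part of the original answer.

```rust
/// Builds the n x n matrix whose (i, j) entry is 1 when y[i] == y[j]
/// and 0 otherwise. (Hypothetical helper for illustration.)
fn equality_matrix(y: &[i32]) -> Vec<Vec<u8>> {
    y.iter()
        // one output row per element a, comparing it with every element b
        .map(|a| y.iter().map(|b| u8::from(a == b)).collect())
        .collect()
}

fn main() {
    let y = [1, 2, 3, 3, 2, 1, 1, 2, 3];
    // Print the matrix one row per line.
    for row in equality_matrix(&y) {
        println!("{:?}", row);
    }
}
```

The result is symmetric with ones on the diagonal, which matches the Ym shown in the question.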
tic;
for j=1:10000
Y = [1 2 3 3 2 1 1 2 3]';
n = length(Y);
Ym = zeros(n, n);
for i=1:n
index = find(Y==Y(i));
Ym(i, index') = 1;
end
end
disp('Method 1:');
toc;
tic;
for j=1:10000
Y = [1 2 3 3 2 1 1 2 3]';
n = length(Y);
Ym = zeros(n, n);
for i=1:n
Ym(i, Y==Y(i)') = 1;
end
end
disp('Method 2:');
toc;
tic;
for j=1:10000
Y = [1 2 3 3 2 1 1 2 3]';
n = length(Y);
Ym = zeros(n, n);
a=repmat(Y,1,n);
b=repmat(Y',n,1);
Ym(a==b)=1;
end
disp('Method 3:');
toc;
tic;
for j=1:10000
Y = [1 2 3 3 2 1 1 2 3]';
Ym = bsxfun(@eq,Y,Y.');
end
disp('Method 4');
toc
OUTPUT:
Method 1:
Elapsed time is 0.111412 seconds.
Method 2:
Elapsed time is 0.069617 seconds.
Method 3:
Elapsed time is 0.246780 seconds.
Method 4
Elapsed time is 0.103120 seconds.