Storage problem in R: alternative to nested loop for creating an array of matrices and then multiple plots

With the following pieces of information, I can easily create an array of matrices
b0=data.frame(b0_1=c(11.41,11.36),b0_2=c(8.767,6.950))
b1=data.frame(b1_1=c(0.8539,0.9565),b1_2=c(-0.03179,0.06752))
b2=data.frame(b2_1=c(-0.013020 ,-0.016540),b2_2=c(-0.0002822,-0.0026720))
T.val=data.frame(T1=c(1,1),T2=c(1,2),T3=c(2,1))
dt_data=cbind(b0,b1,b2,T.val)
fu.time=seq(0,50,by=0.8)
pat=ncol(T.val) #number of T's
nit=2 #no of rows
pt.array1=array(NA, dim=c(nit,length(fu.time),pat))
for ( it.er in 1:nit){
for ( ti in 1:length(fu.time)){
for (pt in 1:pat){
pt.array1[it.er,ti,pt]=b0[it.er,T.val[it.er,pt]]+b1[it.er,T.val[it.er,pt]]*fu.time[ti]+b2[it.er,T.val[it.er,pt]]*fu.time[ti]^2
}
}
}
pt.array_mean=apply(pt.array1, c(3,2), mean)
pt.array_LCL=apply(pt.array1, c(3,2), quantile, prob=0.25)
pt.array_UCL=apply(pt.array1, c(3,2), quantile, prob=0.975)
Now with these additional data, I can create three plots as follows
mydata
pt.ID time IPSS
1 1 0.000000 10
2 1 1.117808 8
3 1 4.504110 5
4 1 6.410959 14
5 1 13.808220 10
6 1 19.890410 4
7 1 28.865750 15
8 1 35.112330 7
9 2 0.000000 6
10 2 1.117808 7
11 2 4.109589 8
12 2 10.093151 7
13 2 16.273973 11
14 2 18.345205 18
15 2 21.567120 14
16 2 25.808220 12
17 2 56.087670 5
18 3 0.000000 8
19 3 1.413699 3
20 3 4.405479 3
21 3 10.389041 8
pdf("plots.pdf")
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],xlim=c(0,57),ylim=c(0,35),type="l",col="black",
xlab="f/u time", ylab= "",main = paste("patient", pt.no),data=mydata)
points(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],data=mydata)
lines(pt.array_mean[pt.no,]~fu.time, col="blue")
lines(pt.array_LCL[pt.no,]~fu.time, col="green")
lines(pt.array_UCL[pt.no,]~fu.time, col="green")
}
dev.off()
The problem arises when the number of rows in each matrix is much bigger, say 10000. It takes too much computation time to create pt.array1 for a large number of rows in b0, b1 and b2.
Is there an alternative way to do this quickly using a built-in function?
Can I avoid the storage allocation for pt.array1, as I am not using it further? I just need pt.array_mean, pt.array_UCL and pt.array_LCL for my plots.
Any help is appreciated.

There are a couple of other approaches you can employ.
First, you largely have a model of b0 + b1*fu + b2*fu^2. Therefore, you could make the coefficients and apply the fu after the fact:
ind <- expand.grid(nits = seq_len(nit), pats = seq_len(pat))
mat_ind <- cbind(ind[, 'nits'], T.val[as.matrix(ind)])
b_mat <- matrix(c(b0[mat_ind], b1[mat_ind], b2[mat_ind]), ncol = 3)
b_mat
[,1] [,2] [,3]
[1,] 11.410 0.85390 -0.0130200
[2,] 11.360 0.95650 -0.0165400
[3,] 11.410 0.85390 -0.0130200
[4,] 6.950 0.06752 -0.0026720
[5,] 8.767 -0.03179 -0.0002822
[6,] 11.360 0.95650 -0.0165400
Now if we apply the model to each row, we get all of your raw results. The only problem is that the shape does not match your original output: each column slice of your array is equivalent to a row slice of my matrix output.
pt_array <- apply(b_mat, 1, function(x) x[1] + x[2] * fu.time + x[3] * fu.time^2)
pt_array[1,]
[1] 11.410 11.360 11.410 6.950 8.767 11.360
pt.array1[, 1, ]
[,1] [,2] [,3]
[1,] 11.41 11.41 8.767
[2,] 11.36 6.95 11.360
That's OK because we can fix the shape as we compute the summary statistics: we just need to take the colMeans2 and colQuantiles of each row converted to a 2 x 3 matrix:
library(matrixStats)
pt_summary = array(t(apply(pt_array,
1,
function(row) {
M <- matrix(row, ncol = pat)
c(colMeans2(M),colQuantiles(M, probs = c(0.25, 0.975))
)
}
)),
dim = c(length(fu.time), pat, 3),
dimnames = list(NULL, paste0('pat', seq_len(pat)), c('mean', 'LCL', 'UCL'))
)
pt_summary[1, ,] # slice at the first time point (fu.time = 0)
mean LCL UCL
pat1 11.3850 11.37250 11.40875
pat2 9.1800 8.06500 11.29850
pat3 10.0635 9.41525 11.29518
# rm(pt.array1)
Then to do your final graphing, I simplified it - the data argument can be a subset(mydata, pt.ID == pt.no). Additionally, since the summary statistics are now in an array format, matlines allows everything to be done at once:
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS~time, data=subset(mydata, pt.ID == pt.no),
xlim=c(0,57), ylim=c(0,35),
type="l",col="black", xlab="f/u time", ylab= "",
main = paste("patient", pt.no)
)
points(IPSS~time, data=subset(mydata, pt.ID == pt.no))
matlines(y = pt_summary[,pt.no ,], x = fu.time, col=c("blue", 'green', 'green'))
}


Is there a function to generate a specific n Multichoose r combination, given the index number?

For example, 3 multichoose 2 has the following combinations:
i combo
0 = [0,0]
1 = [0,1]
2 = [0,2]
3 = [1,1]
4 = [1,2]
5 = [2,2]
Could a function be written whose arguments are n,r,i and returns the combination in question, without iterating through every combination before it?
Yes. We have to do a little counting to get at the heart of this problem. To better illustrate how this can be broken down into very simple smaller problems, we will look at a larger example. Consider all combinations of 5 chosen 3 at a time with no repeats (we will say from here on out 5 choose 3).
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 3 4
[8,] 2 3 5
[9,] 2 4 5
[10,] 3 4 5
Notice the first 6 rows. If we remove the first column of these 6 rows and subtract 1 from every element, we obtain:
[,1] [,2] [,1] [,2]
[1,] 2 3 [1,] 1 2
[2,] 2 4 subtract 1 [2,] 1 3
[3,] 2 5 --->>>> [3,] 1 4
[4,] 3 4 [4,] 2 3
[5,] 3 5 [5,] 2 4
[6,] 4 5 [6,] 3 4
The matrix on the right is precisely all of the combinations of 4 choose 2. Continuing on, we see that the "second" group (i.e. rows 7 through 9 of the original matrix) also looks to have order:
[,1] [,2] [,1] [,2]
[1,] 3 4 [1,] 1 2
[2,] 3 5 subtract 2 [2,] 1 3
[3,] 4 5 --->>>> [3,] 2 3
This is simply 3 choose 2. We are starting to see a pattern unfold. Namely, that all combinations of smaller n and r are contained in our parent combinations. This pattern continues as we move to the right. All that is left is to keep up with which combination we are after.
Below is the above algorithm written out in C++ (N.B. there isn't any data validation):
#include <cmath>   // std::round
#include <vector>  // std::vector
template <typename T>
double nChooseK(T n, T k) {
// Returns number of k-combinations from n elements.
// Mathematically speaking, we have: n!/(k!*(n-k)!)
if (k == n || k == 0)
return 1;
else if (k > n || n < 0)
return 0;
double nCk;
double temp = 1;
for (int i = 1; i <= k; i++)
temp *= (double) (n - k + i) / i;
nCk = std::round(temp);
return nCk;
}
std::vector<int> nthCombination(int n, int r, double i) {
int j = 0, n1 = n - 1, r1 = r - 1;
double temp, index1 = i, index2 = i;
std::vector<int> res(r);
for (int k = 0; k < r; k++) {
temp = nChooseK(n1, r1);
while (temp <= index1) {
index2 -= nChooseK(n1, r1);
n1--;
j++;
temp += nChooseK(n1, r1);
}
res[k] = j;
n1--;
r1--;
j++;
index1 = index2;
}
return res;
}
Calling it on our example above with 5 choose 3 we obtain:
nthCombination(5, 3, 0) -->> 0 1 2
nthCombination(5, 3, 1) -->> 0 1 3
nthCombination(5, 3, 2) -->> 0 1 4
nthCombination(5, 3, 3) -->> 0 2 3
nthCombination(5, 3, 4) -->> 0 2 4
nthCombination(5, 3, 5) -->> 0 3 4
nthCombination(5, 3, 6) -->> 1 2 3
nthCombination(5, 3, 7) -->> 1 2 4
nthCombination(5, 3, 8) -->> 1 3 4
nthCombination(5, 3, 9) -->> 2 3 4
This approach is very efficient as well. Below, we get the billionth combination of 40 choose 20 (which generates more than 100 billion combinations) instantly:
// N.B. base zero so we need to subtract 1
nthCombination(40, 20, 1000000000 - 1) -->>
0 1 2 3 4 5 8 9 14 16 18 20 22 23 31 33 34 35 38 39
Edit
As the OP points out in the comments, they gave an example with repeats. The solution is very similar and it breaks down to counting. We first need a counting function similar to nChooseK but that considers repeats. The function below does just that:
#include <numeric> // std::accumulate
double combsWithReps(int n, int r) {
// For combinations where repetition is allowed, this
// function returns the number of combinations for
// a given n and r. The resulting vector, "triangleVec"
// resembles triangle numbers. In fact, this vector
// is obtained in a very similar method as generating
// triangle numbers, albeit in a repeating fashion.
if (r == 0)
return 1;
int i, k;
std::vector<double> triangleVec(n);
std::vector<double> temp(n);
for (i = 0; i < n; i++)
triangleVec[i] = i+1;
for (i = 1; i < r; i++) {
for (k = 1; k <= n; k++)
temp[k-1] = std::accumulate(triangleVec.begin(), triangleVec.begin() + k, 0.0);
triangleVec = temp;
}
return triangleVec[n-1];
}
And here is the function that generates the ith combination with repeats.
std::vector<int> nthCombWithRep(int n, int r, double i) {
int j = 0, n1 = n, r1 = r - 1;
double temp, index1 = i, index2 = i;
std::vector<int> res(r);
for (int k = 0; k < r; k++) {
temp = combsWithReps(n1, r1);
while (temp <= index1) {
index2 -= combsWithReps(n1, r1);
n1--;
j++;
temp += combsWithReps(n1, r1);
}
res[k] = j;
r1--;
index1 = index2;
}
return res;
}
It is very similar to the first function above. You will notice that n1-- and j++ are removed from the end of the for loop body, and that n1 is initialized to n instead of n - 1.
Here is the above example:
nthCombWithRep(40, 20, 1000000000 - 1) -->>
0 0 0 0 0 0 0 0 0 0 0 4 5 6 8 9 12 18 18 31
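As a cross-check against the small example in the question, here is a compact variant of the same unranking idea. It is a sketch, not the code above: it assumes the standard identity that n multichoose r equals the binomial coefficient C(n + r - 1, r), uses 64-bit integers instead of doubles (so it overflows for very large counts), and the names multichoose and nthMultiset are made up for this illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Number of size-r multisets from n items: C(n + r - 1, r).
// After step i the running value is C(n - 1 + i, i), so each division is exact.
std::uint64_t multichoose(int n, int r) {
    if (r == 0) return 1;
    if (n <= 0) return 0;
    std::uint64_t result = 1;
    for (int i = 1; i <= r; ++i)
        result = result * static_cast<std::uint64_t>(n - 1 + i) / i;
    return result;
}

// Zero-based index -> the index-th multiset in lexicographic order.
std::vector<int> nthMultiset(int n, int r, std::uint64_t index) {
    if (index >= multichoose(n, r))
        throw std::out_of_range("index is not smaller than n multichoose r");
    std::vector<int> res(r);
    int value = 0;  // smallest value still allowed at the current slot
    for (int k = 0; k < r; ++k) {
        const int remaining = r - k - 1;  // slots to the right of slot k
        // How many multisets put `value` at slot k.
        std::uint64_t block = multichoose(n - value, remaining);
        while (index >= block) {  // skip whole blocks until the index fits
            index -= block;
            ++value;
            block = multichoose(n - value, remaining);
        }
        res[k] = value;  // value is not advanced: repeats are allowed
    }
    return res;
}
```

Calling nthMultiset(3, 2, i) for i = 0 to 5 reproduces the table from the question: [0,0], [0,1], [0,2], [1,1], [1,2], [2,2].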

Count the frequency of matrix values including 0

I have a vector
A = [ 1 1 1 2 2 3 6 8 9 9 ]
I would like to write a loop that counts the frequencies of values in my vector within a range I choose, this would include values that have 0 frequencies
For example, if I chose the range of 1:9 my results would be
3 2 1 0 0 1 0 1 2
If I picked 1:11 the result would be
3 2 1 0 0 1 0 1 2 0 0
Is this possible? Also, ideally I would have to do this for giant matrices and vectors, so the fastest way to calculate this would be appreciated.
Here's an alternative suggestion to histcounts, which appears to be ~8x faster on Matlab 2015b:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11;
N = accumarray(A(:), 1, [maxRange,1])';
N =
3 2 1 0 0 1 0 1 2 0 0
Comparing the speed:
K>> tic; for i = 1:100000, N1 = accumarray(A(:), 1, [maxRange,1])'; end; toc;
Elapsed time is 0.537597 seconds.
K>> tic; for i = 1:100000, N2 = histcounts(A,1:maxRange+1); end; toc;
Elapsed time is 4.333394 seconds.
K>> isequal(N1, N2)
ans =
1
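Conceptually, the accumarray call is a single counting pass over the data into a pre-sized vector of zeros. A minimal C++ sketch of that pass follows; countFrequencies is a name invented here, and ignoring values outside 1..maxRange is a choice of this sketch rather than a statement about how accumarray treats them.

```cpp
#include <cstddef>
#include <vector>

// Count how often each value 1..maxRange occurs in `values`.
// Slot k-1 of the result holds the count for value k; values that never
// occur keep their initial count of 0.
std::vector<int> countFrequencies(const std::vector<int>& values, int maxRange) {
    std::vector<int> counts(maxRange > 0 ? static_cast<std::size_t>(maxRange) : 0, 0);
    for (int v : values) {
        if (v >= 1 && v <= maxRange)  // out-of-range values are skipped in this sketch
            ++counts[static_cast<std::size_t>(v - 1)];
    }
    return counts;
}
```

For the vector from the question and maxRange = 11 this gives 3 2 1 0 0 1 0 1 2 0 0, the same as N above.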
As per the loop request, here's a looped version, which should not be too slow since the latest engine overhaul:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11; %// your range
output = zeros(1,maxRange); %// initialise output
for ii = 1:maxRange
tmp = A==ii; %// temporary storage
output(ii) = sum(tmp(:)); %// find the number of occurrences
end
which would result in
output =
3 2 1 0 0 1 0 1 2 0 0
Faster and non-looping would be @beaker's suggestion to use histcounts:
[N,edges] = histcounts(A,1:maxRange+1);
N =
3 2 1 0 0 1 0 1 2 0 0
where the +1 makes sure the last entry is included as well.
Assuming the input A to be a sorted array and the range starts from 1 and goes until some value greater than or equal to the largest element in A, here's an approach using diff and find -
%// Inputs
A = [2 4 4 4 8 9 11 11 11 12]; %// Modified for variety
maxN = 13;
idx = [0 find(diff(A)>0) numel(A)]+1;
out = zeros(1,maxN); %// OR for better performance : out(maxN) = 0;
out(A(idx(1:end-1))) = diff(idx);
Output -
out =
0 1 0 3 0 0 0 1 1 0 3 1 0
This can be done very easily with bsxfun.
Let the data be
A = [ 1 1 1 2 2 3 6 8 9 9 ]; %// data
B = 1:9; %// possible values
Then
result = sum(bsxfun(@eq, A(:), B(:).'), 1);
gives
result =
3 2 1 0 0 1 0 1 2

Selecting neighbours on a circle

Suppose we have N points on a circle. Each point is assigned an index i = 1, 2, ..., N. Now, for a randomly selected point, I want a vector containing the indices of 5 points: [two left neighbors, the point itself, two right neighbors].
See the figure below.
Some examples are as follows:
N = 18;
selectedPointIdx = 4;
sequence = [2 3 4 5 6];
selectedPointIdx = 1
sequence = [17 18 1 2 3]
selectedPointIdx = 17
sequence = [15 16 17 18 1];
The conventional way to code this is considering the exceptions as if-else statements, as I did:
if ii == 1
lseq = [N-1 N ii ii+1 ii+2];
elseif ii == 2
lseq = [N ii-1 ii ii+1 ii+2];
elseif ii == N-1
lseq=[ii-2 ii-1 ii N 1];
elseif ii == N
lseq=[ii-2 ii-1 ii 1 2];
else
lseq=[ii-2 ii-1 ii ii+1 ii+2];
end
where ii is selectedPointIdx.
This does not scale well if I want, for instance, 7 points instead of 5. What is a more efficient way?
How about this -
off = -2:2
out = mod((off + selectedPointIdx) + 17,18) + 1
For a window size of 7, edit off to -3:3. The constants 17 and 18 are N-1 and N for N = 18.
It uses the strategy of subtracting 1, taking the modulus, and adding 1 back, so the result stays in the 1-based range 1 to N.
Sample run -
>> off = -2:2;
for selectedPointIdx = 1:18
disp(['For selectedPointIdx =',num2str(selectedPointIdx),' :'])
disp(mod((off + selectedPointIdx) + 17,18) + 1)
end
For selectedPointIdx =1 :
17 18 1 2 3
For selectedPointIdx =2 :
18 1 2 3 4
For selectedPointIdx =3 :
1 2 3 4 5
For selectedPointIdx =4 :
2 3 4 5 6
For selectedPointIdx =5 :
3 4 5 6 7
For selectedPointIdx =6 :
4 5 6 7 8
....
For selectedPointIdx =11 :
9 10 11 12 13
For selectedPointIdx =12 :
10 11 12 13 14
For selectedPointIdx =13 :
11 12 13 14 15
For selectedPointIdx =14 :
12 13 14 15 16
For selectedPointIdx =15 :
13 14 15 16 17
For selectedPointIdx =16 :
14 15 16 17 18
For selectedPointIdx =17 :
15 16 17 18 1
For selectedPointIdx =18 :
16 17 18 1 2
You can use modular arithmetic instead: Let p be the point among N points numbered 1 to N. Say you want m neighbors on each side, you can get them as follows:
(p - m - 1) mod N + 1
...
(p - 4) mod N + 1
(p - 3) mod N + 1
(p - 2) mod N + 1
p
p mod N + 1
(p + 1) mod N + 1
(p + 2) mod N + 1
(p + 3) mod N + 1
...
(p + m - 1) mod N + 1
Code:
N = 18;
p = 2;
m = 3;
for i = p - m : p + m
nb = mod((i - 1) , N) + 1;
disp(nb);
end
Note that you will not necessarily improve performance by avoiding an if statement. A benchmark might be necessary to figure this out. However, this will only be significant if you are processing tens of thousands of numbers.
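One detail matters when porting the modular trick to a language such as C++: there the % operator can return a negative remainder for a negative left operand, unlike mod in MATLAB. A minimal sketch under that caveat follows; the function name circularNeighbours is made up here, and it assumes N > 0 and m >= 0 without checking.

```cpp
#include <vector>

// Indices (1-based) of the m left neighbours, the point p itself, and the
// m right neighbours on a circle of N points numbered 1..N.
std::vector<int> circularNeighbours(int p, int m, int N) {
    std::vector<int> out;
    out.reserve(2 * m + 1);
    for (int offset = -m; offset <= m; ++offset) {
        int zeroBased = (p - 1 + offset) % N;  // lies in (-N, N) in C++
        if (zeroBased < 0)
            zeroBased += N;                    // shift negative remainders into [0, N)
        out.push_back(zeroBased + 1);          // back to 1-based numbering
    }
    return out;
}
```

For example, circularNeighbours(1, 2, 18) gives 17 18 1 2 3 and circularNeighbours(17, 2, 18) gives 15 16 17 18 1, as in the question.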

Matrix manipulation in Octave

I want to map an m x 1 matrix X into an m x p matrix Y, where each row in the new matrix is as follows:
Y = [ X X.^2 X.^3 ..... X.^p]
I tried to use the following code:
Y = zeros(m, p);
for i=1:m
Y(i,:) = X(i);
for c=2:p
Y(i,:) = [Y(i,:) X(i).^p];
end
end
What you want to do is called broadcasting. If you are using Octave 3.8 or later, the following will work fine:
octave> X = (1:5)'
X =
1
2
3
4
5
octave> P = (1:5)
P =
1 2 3 4 5
octave> X .^ P
ans =
1 1 1 1 1
2 4 8 16 32
3 9 27 81 243
4 16 64 256 1024
5 25 125 625 3125
The important thing to note is that X and P are a column and a row vector, respectively. See the Octave manual on the topic.
For older versions of Octave (without automatic broadcasting), the same can be accomplished with bsxfun (@power, X, P)
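For readers outside Octave, the result of X .^ P can be spelled out as a plain double loop: entry (i, j) is X(i) raised to the power j. The C++ sketch below builds each row by repeated multiplication instead of calling a power function; powerColumns is a name invented for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Row i of the result is {x[i], x[i]^2, ..., x[i]^p}.
std::vector<std::vector<double>> powerColumns(const std::vector<double>& x, int p) {
    std::vector<std::vector<double>> y;
    y.reserve(x.size());
    for (double xi : x) {
        std::vector<double> row;
        row.reserve(p > 0 ? static_cast<std::size_t>(p) : 0);
        double power = 1.0;
        for (int j = 0; j < p; ++j) {
            power *= xi;           // x^(j+1) from x^j
            row.push_back(power);
        }
        y.push_back(row);
    }
    return y;
}
```

With x = {1, 2, 3, 4, 5} and p = 5, the third row is 3 9 27 81 243, matching the Octave output above.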

Efficient way to create a circulant matrix in R

I want to create a circulant matrix from a vector in R. A circulant matrix is a matrix with the following form.
1 2 3 4
4 1 2 3
3 4 1 2
2 3 4 1
The second row is the same as the first row except the last element is at the beginning, and so on.
Now I have the vector, say, (1, 2, 3, 4) and I want to find an efficient (fast) way to create this matrix. In practice, the numbers are not integers and can be any numbers.
Here is what I am doing now.
x <- 1:4
n <- length(x)
mat <- matrix(NA, n, n)
for (i in 1:n) {
mat[i, ] <- c(x[-(1:(n+1-i))], x[1:(n+1-i)])
}
Is there a faster way to do this? I need to generate this kind of matrix over and over, so a small improvement in one step will make a big difference. Thank you.
This makes use of vector recycling (it throws a warning):
circ<-function(x) {
n<-length(x)
matrix(x[matrix(1:n,n+1,n+1,byrow=T)[c(1,n:2),1:n]],n,n)
}
circ(letters[1:4])
# [,1] [,2] [,3] [,4]
#[1,] "a" "b" "c" "d"
#[2,] "d" "a" "b" "c"
#[3,] "c" "d" "a" "b"
#[4,] "b" "c" "d" "a"
Here are some benchmarks of suggested solutions.
ndoogan takes the lead!
Benchmark
x <- 1:100
microbenchmark(
OP.Circulant(x),
Josh.Circulant(x),
Dwin.Circulant(x) ,
Matt.Circulant(x),
Matt.Circulant2(x),
Ndoogan.Circulant(x),
times=100
)
# Unit: microseconds
# expr min lq median uq max
# 1 Dwin.Circulant(x) 1232.775 1288.1590 1358.999 1504.4490 2900.430
# 2 Josh.Circulant(x) 1081.080 1086.3470 1097.863 1125.8745 2526.237
# 3 Matt.Circulant(x) 61924.920 64579.3735 65948.152 129359.7895 137371.570
# 4 Matt.Circulant2(x) 12746.096 13499.0580 13832.939 14346.8570 16308.040
# 5 Ndoogan.Circulant(x) 469.502 487.2285 528.591 585.8275 1522.363
# 6 OP.Circulant(x) 1291.352 1363.8395 1421.509 1513.4950 2714.707
Code used for benchmark
OP.Circulant <- function(x) {
n <- length(x)
mat <- matrix(NA, n, n)
for (i in 1:n) {
mat[i, ] <- c(x[-(1:(n + 1 - i))], x[1:(n + 1 - i)])
}
return(mat)
}
rotn <- function(x, n) rep(x, 2)[n:(n + length(x) - 1)]
Dwin.Circulant <- function(x) {
n <- length(x)
return(t(sapply(x[c(1L, n:2)], rotn, x = x)))
}
Josh.Circulant <- function(x, nrow = length(x)) {
m <- length(x)
return(matrix(x[(1:m - rep(1:nrow, each = m))%%m + 1L],
ncol = m, byrow = TRUE))
}
Matt.Circulant <- function(x) {
n <- length(x)
mat <- matrix(, n, n)
for (i in seq(-n + 1, n - 1)) {
mat[row(mat) == col(mat) - i] = x[i%%n + 1]
}
return(mat)
}
Matt.Circulant2 <- function(x) {
n <- length(x)
return(rbind(x[], do.call(rbind, lapply(seq(n - 1),
function(i) c(tail(x, i), head(x, -i))))))
}
Ndoogan.Circulant <-function(x) {
n <- length(x)
suppressWarnings(
matrix(x[matrix(1:n,n+1,n+1,byrow=T)[c(1,n:2),1:n]],n,n))
}
# check for identical results (all TRUE)
check <- OP.Circulant(x)
identical(check, OP.Circulant(x))
identical(check, Dwin.Circulant(x))
identical(check, Josh.Circulant(x))
identical(check, Matt.Circulant(x))
identical(check, Matt.Circulant2(x))
identical(check, Ndoogan.Circulant(x))
circulant <- function(x, nrow = length(x)) {
n <- length(x)
matrix(x[(1:n - rep(1:nrow, each=n)) %% n + 1L], ncol=n, byrow=TRUE)
}
circulant(1:4)
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 4 1 2 3
# [3,] 3 4 1 2
# [4,] 2 3 4 1
circulant(7:9, nrow=5)
# [,1] [,2] [,3]
# [1,] 7 8 9
# [2,] 9 7 8
# [3,] 8 9 7
# [4,] 7 8 9
# [5,] 9 7 8
circulant(10:1, nrow=2)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 10 9 8 7 6 5 4 3 2 1
# [2,] 1 10 9 8 7 6 5 4 3 2
rotn <- function(x,n) rep(x,2)[n:(n+length(x)-1)]
sapply(c(1,4:2), rotn, x=1:4)
[,1] [,2] [,3] [,4]
[1,] 1 4 3 2
[2,] 2 1 4 3
[3,] 3 2 1 4
[4,] 4 3 2 1
Might be faster inside a function if you constructed the double-length vector outside the sapply loop.
Here is a solution using Rcpp:
library(Rcpp)
cppFunction("
IntegerMatrix myCirculant(const int n) {
IntegerMatrix res(n);
int val = 1;
int dval = 2;
for (int i = 0; i < n*n; i++) {
res[i] = val;
if (val > 1) {
if (val != dval) {
val--;
} else {
if (dval == n) {
dval = 1;
} else {
dval++;
}
}
} else {
val = n;
}
}
return res;
}")
myCirculant(100)
This works only for integers (it builds the circulant matrix of 1:n from n alone) and takes about 1/10 of the time that Ndoogan.Circulant(1:100) takes on my machine.
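For an arbitrary numeric vector, the same matrix follows from one index formula: entry (i, j), counted from zero, is x[(j - i) mod n]. Below is a minimal plain C++ sketch of that formula, without Rcpp types and not benchmarked against the functions above; the name circulant and the nested-vector return type are choices made for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Row i is the input rotated right by i positions.
std::vector<std::vector<double>> circulant(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<std::vector<double>> mat(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            mat[i][j] = x[(j + n - i) % n];  // adding n keeps the unsigned index from wrapping
    return mat;
}
```

For x = (1, 2, 3, 4) the second row is 4 1 2 3 and the last row is 2 3 4 1, as in the question.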
