Efficient way to create a circulant matrix in R - performance

I want to create a circulant matrix from a vector in R. A circulant matrix is a matrix with the following form.
1 2 3 4
4 1 2 3
3 4 1 2
2 3 4 1
The second row is the same as the first row except the last element is at the beginning, and so on.
Now I have a vector, say (1, 2, 3, 4), and I want to find an efficient (fast) way to create this matrix. In practice, the entries are not integers and can be any numbers.
Here is what I am doing now.
x <- 1:4
n <- length(x)
mat <- matrix(NA, n, n)
for (i in 1:n) {
  mat[i, ] <- c(x[-(1:(n+1-i))], x[1:(n+1-i)])
}
Is there a faster way to do this? I need to generate these matrices over and over, so even a small improvement in this one step will make a big difference. Thank you.

This makes use of vector recycling (it throws a warning):
circ <- function(x) {
  n <- length(x)
  matrix(x[matrix(1:n, n + 1, n + 1, byrow = TRUE)[c(1, n:2), 1:n]], n, n)
}
circ(letters[1:4])
# [,1] [,2] [,3] [,4]
#[1,] "a" "b" "c" "d"
#[2,] "d" "a" "b" "c"
#[3,] "c" "d" "a" "b"
#[4,] "b" "c" "d" "a"
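My own illustration (not part of the original answer) of why the recycling trick works: the inner matrix() call recycles 1:n into an (n+1) x (n+1) matrix whose rows are successive left-rotations of 1:n, and picking rows c(1, n:2) reorders them into the circulant pattern.
n <- 4
m <- matrix(1:n, n + 1, n + 1, byrow = TRUE)  # this is the call that throws the recycling warning
m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    1
# [2,]    2    3    4    1    2
# [3,]    3    4    1    2    3
# [4,]    4    1    2    3    4
# [5,]    1    2    3    4    1
m[c(1, n:2), 1:n]
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    4    1    2    3
# [3,]    3    4    1    2
# [4,]    2    3    4    1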

Here are some benchmarks of suggested solutions.
ndoogan takes the lead!
Benchmark
library(microbenchmark)
x <- 1:100
microbenchmark(
  OP.Circulant(x),
  Josh.Circulant(x),
  Dwin.Circulant(x),
  Matt.Circulant(x),
  Matt.Circulant2(x),
  Ndoogan.Circulant(x),
  times = 100
)
# Unit: microseconds
#                   expr       min         lq     median          uq        max
# 1    Dwin.Circulant(x)  1232.775  1288.1590   1358.999   1504.4490   2900.430
# 2    Josh.Circulant(x)  1081.080  1086.3470   1097.863   1125.8745   2526.237
# 3    Matt.Circulant(x) 61924.920 64579.3735  65948.152 129359.7895 137371.570
# 4   Matt.Circulant2(x) 12746.096 13499.0580  13832.939  14346.8570  16308.040
# 5 Ndoogan.Circulant(x)   469.502   487.2285    528.591    585.8275   1522.363
# 6      OP.Circulant(x)  1291.352  1363.8395   1421.509   1513.4950   2714.707
Code used for benchmark
OP.Circulant <- function(x) {
  n <- length(x)
  mat <- matrix(NA, n, n)
  for (i in 1:n) {
    mat[i, ] <- c(x[-(1:(n + 1 - i))], x[1:(n + 1 - i)])
  }
  return(mat)
}

rotn <- function(x, n) rep(x, 2)[n:(n + length(x) - 1)]
Dwin.Circulant <- function(x) {
  n <- length(x)
  return(t(sapply(x[c(1L, n:2)], rotn, x = x)))
}

Josh.Circulant <- function(x, nrow = length(x)) {
  m <- length(x)
  return(matrix(x[(1:m - rep(1:nrow, each = m)) %% m + 1L],
                ncol = m, byrow = TRUE))
}

Matt.Circulant <- function(x) {
  n <- length(x)
  mat <- matrix(, n, n)
  for (i in seq(-n + 1, n - 1)) {
    mat[row(mat) == col(mat) - i] <- x[i %% n + 1]
  }
  return(mat)
}

Matt.Circulant2 <- function(x) {
  n <- length(x)
  return(rbind(x[], do.call(rbind, lapply(seq(n - 1),
    function(i) c(tail(x, i), head(x, -i))))))
}

Ndoogan.Circulant <- function(x) {
  n <- length(x)
  suppressWarnings(
    matrix(x[matrix(1:n, n + 1, n + 1, byrow = TRUE)[c(1, n:2), 1:n]], n, n))
}
# check for identical results (all TRUE)
check <- OP.Circulant(x)
identical(check, OP.Circulant(x))
identical(check, Dwin.Circulant(x))
identical(check, Josh.Circulant(x))
identical(check, Matt.Circulant(x))
identical(check, Matt.Circulant2(x))
identical(check, Ndoogan.Circulant(x))

circulant <- function(x, nrow = length(x)) {
  n <- length(x)
  matrix(x[(1:n - rep(1:nrow, each = n)) %% n + 1L], ncol = n, byrow = TRUE)
}
circulant(1:4)
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 4 1 2 3
# [3,] 3 4 1 2
# [4,] 2 3 4 1
circulant(7:9, nrow=5)
# [,1] [,2] [,3]
# [1,] 7 8 9
# [2,] 9 7 8
# [3,] 8 9 7
# [4,] 7 8 9
# [5,] 9 7 8
circulant(10:1, nrow=2)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 10 9 8 7 6 5 4 3 2 1
# [2,] 1 10 9 8 7 6 5 4 3 2

rotn <- function(x,n) rep(x,2)[n:(n+length(x)-1)]
sapply(c(1,4:2), rotn, x=1:4)
[,1] [,2] [,3] [,4]
[1,] 1 4 3 2
[2,] 2 1 4 3
[3,] 3 2 1 4
[4,] 4 3 2 1
This might be faster inside a function if you constructed the double-length vector once, outside the sapply loop; see the sketch below.
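A rough sketch of that suggestion (the function name below is mine for illustration, not one of the benchmarked solutions): build rep(x, 2) once and slice it for every row.
Dwin.Circulant2 <- function(x) {
  n <- length(x)
  x2 <- rep(x, 2)   # double-length vector, built only once
  t(sapply(c(1L, n:2), function(k) x2[k:(k + n - 1)]))
}
Dwin.Circulant2(1:4)   # same circulant matrix as above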

Here is a solution using Rcpp:
library(Rcpp)
cppFunction("
IntegerMatrix myCirculant(const int n) {
  IntegerMatrix res(n);
  int val = 1;
  int dval = 2;
  for (int i = 0; i < n*n; i++) {
    res[i] = val;
    if (val > 1) {
      if (val != dval) {
        val--;
      } else {
        if (dval == n) {
          dval = 1;
        } else {
          dval++;
        }
      }
    } else {
      val = n;
    }
  }
  return res;
}")
myCirculant(100)
This works only for integers and takes 1/10 of the time that Ndoogan.Circulant(1:100) takes on my machine.
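Since the question mentions non-integer values, one hedged way to reuse this (my own addition, not part of the original answer) is to treat the integer result as an index matrix into an arbitrary numeric vector:
x <- c(1.5, 2.7, 3.1, 4.9)
n <- length(x)
# myCirculant(n) is the circulant of 1:n, so its entries can index into x
matrix(x[myCirculant(n)], n, n)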

Related

Storage problem in R: alternative to nested loops for creating an array of matrices and then multiple plots

With the following pieces of information, I can easily create an array of matrices
b0 <- data.frame(b0_1 = c(11.41, 11.36), b0_2 = c(8.767, 6.950))
b1 <- data.frame(b1_1 = c(0.8539, 0.9565), b1_2 = c(-0.03179, 0.06752))
b2 <- data.frame(b2_1 = c(-0.013020, -0.016540), b2_2 = c(-0.0002822, -0.0026720))
T.val <- data.frame(T1 = c(1, 1), T2 = c(1, 2), T3 = c(2, 1))
dt_data <- cbind(b0, b1, b2, T.val)
fu.time <- seq(0, 50, by = 0.8)
pat <- ncol(T.val)  # number of T's
nit <- 2            # number of rows
pt.array1 <- array(NA, dim = c(nit, length(fu.time), pat))
for (it.er in 1:nit) {
  for (ti in 1:length(fu.time)) {
    for (pt in 1:pat) {
      pt.array1[it.er, ti, pt] <- b0[it.er, T.val[it.er, pt]] +
        b1[it.er, T.val[it.er, pt]] * fu.time[ti] +
        b2[it.er, T.val[it.er, pt]] * fu.time[ti]^2
    }
  }
}
pt.array_mean <- apply(pt.array1, c(3, 2), mean)
pt.array_LCL  <- apply(pt.array1, c(3, 2), quantile, prob = 0.25)
pt.array_UCL  <- apply(pt.array1, c(3, 2), quantile, prob = 0.975)
Now with these additional data, I can create three plots as follows
mydata
pt.ID time IPSS
1 1 0.000000 10
2 1 1.117808 8
3 1 4.504110 5
4 1 6.410959 14
5 1 13.808220 10
6 1 19.890410 4
7 1 28.865750 15
8 1 35.112330 7
9 2 0.000000 6
10 2 1.117808 7
11 2 4.109589 8
12 2 10.093151 7
13 2 16.273973 11
14 2 18.345205 18
15 2 21.567120 14
16 2 25.808220 12
17 2 56.087670 5
18 3 0.000000 8
19 3 1.413699 3
20 3 4.405479 3
21 3 10.389041 8
pdf("plots.pdf")
par(mfrow = c(3, 2))
for (pt.no in 1:pat) {
  plot(IPSS[ID == pt.no] ~ time[ID == pt.no], xlim = c(0, 57), ylim = c(0, 35),
       type = "l", col = "black", xlab = "f/u time", ylab = "",
       main = paste("patient", pt.no), data = mydata)
  points(IPSS[ID == pt.no] ~ time[ID == pt.no], data = mydata)
  lines(pt.array_mean[pt.no, ] ~ fu.time, col = "blue")
  lines(pt.array_LCL[pt.no, ] ~ fu.time, col = "green")
  lines(pt.array_UCL[pt.no, ] ~ fu.time, col = "green")
}
dev.off()
The problem arises when the number of rows in each matrix is much bigger, say 10000. It takes too much computation time to create pt.array1 for a large number of rows in b0, b1, and b2.
Is there any alternative way to do this quickly using a built-in function?
Can I avoid the storage allocation for pt.array1, since I am not using it further? I just need pt.array_mean, pt.array_UCL, and pt.array_LCL for my plots.
Any help is appreciated.
There are a couple of other approaches you can employ.
First, you largely have a model of b0 + b1*fu + b2*fu^2. Therefore, you could build a matrix of coefficients and apply fu.time after the fact:
ind <- expand.grid(nits = seq_len(nit), pats = seq_len(pat))
mat_ind <- cbind(ind[, 'nits'], T.val[as.matrix(ind)])
b_mat <- matrix(c(b0[mat_ind], b1[mat_ind], b2[mat_ind]), ncol = 3)
b_mat
[,1] [,2] [,3]
[1,] 11.410 0.85390 -0.0130200
[2,] 11.360 0.95650 -0.0165400
[3,] 11.410 0.85390 -0.0130200
[4,] 6.950 0.06752 -0.0026720
[5,] 8.767 -0.03179 -0.0002822
[6,] 11.360 0.95650 -0.0165400
Now if we apply the model to each row, we will get all of your raw results. The only problem is that this doesn't match your original output: each column slice of your array is equivalent to a row slice of my matrix output.
pt_array <- apply(b_mat, 1, function(x) x[1] + x[2] * fu.time + x[3] * fu.time^2)
pt_array[1,]
[1] 11.410 11.360 11.410 6.950 8.767 11.360
pt.array1[, 1, ]
[,1] [,2] [,3]
[1,] 11.41 11.41 8.767
[2,] 11.36 6.95 11.360
That's OK, because we can fix the shape while computing the summary statistics: we just need to take the colMeans2 and colQuantiles of each row reshaped into a 2 x 3 matrix:
library(matrixStats)
pt_summary <- array(
  t(apply(pt_array, 1, function(row) {
    M <- matrix(row, ncol = pat)
    c(colMeans2(M), colQuantiles(M, probs = c(0.25, 0.975)))
  })),
  dim = c(length(fu.time), pat, 3),
  dimnames = list(NULL, paste0('pat', seq_len(pat)), c('mean', 'LCL', 'UCL'))
)
pt_summary[1, ,] #slice at time = 1
mean LCL UCL
pat1 11.3850 11.37250 11.40875
pat2 9.1800 8.06500 11.29850
pat3 10.0635 9.41525 11.29518
# rm(pt.array1)
Then to do your final graphing, I simplified it - the data argument can be a subset(mydata, pt.ID == pt.no). Additionally, since the summary statistics are now in an array format, matlines allows everything to be done at once:
par(mfrow = c(3, 2))
for (pt.no in 1:pat) {
  plot(IPSS ~ time, data = subset(mydata, pt.ID == pt.no),
       xlim = c(0, 57), ylim = c(0, 35),
       type = "l", col = "black", xlab = "f/u time", ylab = "",
       main = paste("patient", pt.no))
  points(IPSS ~ time, data = subset(mydata, pt.ID == pt.no))
  matlines(y = pt_summary[, pt.no, ], x = fu.time, col = c("blue", "green", "green"))
}
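One further simplification, offered as a hedged sketch rather than as part of the original answer: because each fitted curve is just a quadratic polynomial in fu.time, pt_array can also be produced with a single matrix multiplication against a polynomial basis, which may help when the number of rows grows to 10000.
basis <- cbind(1, fu.time, fu.time^2)   # length(fu.time) x 3 polynomial basis
pt_array_mm <- basis %*% t(b_mat)       # length(fu.time) x (nit * pat), same layout as pt_array
# all.equal(pt_array, pt_array_mm, check.attributes = FALSE)  # worth verifying before relying on it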

Is there a function to generate a specific n Multichoose r combination, given the index number?

For example, 3 multichoose 2 has the following combinations:
i combo
0 = [0,0]
1 = [0,1]
2 = [0,2]
3 = [1,1]
4 = [1,2]
5 = [2,2]
Could a function be written whose arguments are n,r,i and returns the combination in question, without iterating through every combination before it?
Could a function be written whose arguments are n,r,i and returns the combination in question, without iterating through every combination before it?
Yes. We have to do a little counting to get at the heart of this problem. To better illustrate how this can be broken down into very simple smaller problems, we will look at a larger example. Consider all combinations of 5 chosen 3 at a time with no repeats (we will say from here on out 5 choose 3).
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 3 4
[8,] 2 3 5
[9,] 2 4 5
[10,] 3 4 5
Notice the first 6 rows. If we remove the first column of these 6 rows and subtract 1 from every element, we obtain:
     [,1] [,2]                   [,1] [,2]
[1,]    2    3              [1,]    1    2
[2,]    2    4   subtract 1 [2,]    1    3
[3,]    2    5    --->>>>   [3,]    1    4
[4,]    3    4              [4,]    2    3
[5,]    3    5              [5,]    2    4
[6,]    4    5              [6,]    3    4
The matrix on the right is precisely all of the combinations of 4 choose 2. Continuing on, we see that the "second" group (i.e. rows 7 through 9 of the original matrix) also looks to have order:
     [,1] [,2]                   [,1] [,2]
[1,]    3    4   subtract 2 [1,]    1    2
[2,]    3    5    --->>>>   [2,]    1    3
[3,]    4    5              [3,]    2    3
This is simply 3 choose 2. We are starting to see a pattern unfold. Namely, that all combinations of smaller n and r are contained in our parent combinations. This pattern continues as we move to the right. All that is left is to keep up with which combination we are after.
Below is the above algorithm written out in C++ (N.B. there isn't any data validation):
template <typename T>
double nChooseK(T n, T k) {
    // Returns number of k-combinations from n elements.
    // Mathematically speaking, we have: n!/(k!*(n-k)!)
    if (k == n || k == 0)
        return 1;
    else if (k > n || n < 0)
        return 0;
    double nCk;
    double temp = 1;
    for (int i = 1; i <= k; i++)
        temp *= (double) (n - k + i) / i;
    nCk = std::round(temp);
    return nCk;
}

std::vector<int> nthCombination(int n, int r, double i) {
    int j = 0, n1 = n - 1, r1 = r - 1;
    double temp, index1 = i, index2 = i;
    std::vector<int> res(r);
    for (int k = 0; k < r; k++) {
        temp = nChooseK(n1, r1);
        while (temp <= index1) {
            index2 -= nChooseK(n1, r1);
            n1--;
            j++;
            temp += nChooseK(n1, r1);
        }
        res[k] = j;
        n1--;
        r1--;
        j++;
        index1 = index2;
    }
    return res;
}
Calling it on our example above with 5 choose 3 we obtain:
nthCombination(5, 3, 0) -->> 0 1 2
nthCombination(5, 3, 1) -->> 0 1 3
nthCombination(5, 3, 2) -->> 0 1 4
nthCombination(5, 3, 3) -->> 0 2 3
nthCombination(5, 3, 4) -->> 0 2 4
nthCombination(5, 3, 5) -->> 0 3 4
nthCombination(5, 3, 6) -->> 1 2 3
nthCombination(5, 3, 7) -->> 1 2 4
nthCombination(5, 3, 8) -->> 1 3 4
nthCombination(5, 3, 9) -->> 2 3 4
This approach is very efficient as well. Below, we get the billionth combination of 40 choose 20 (which generates more than 100 billion combinations) instantly:
// N.B. base zero so we need to subtract 1
nthCombination(40, 20, 1000000000 - 1) -->>
0 1 2 3 4 5 8 9 14 16 18 20 22 23 31 33 34 35 38 39
Edit
As the OP points out in the comments, they gave an example with repeats. The solution is very similar and it breaks down to counting. We first need a counting function similar to nChooseK but that considers repeats. The function below does just that:
double combsWithReps(int n, int r) {
    // For combinations where repetition is allowed, this
    // function returns the number of combinations for
    // a given n and r. The resulting vector, "triangleVec",
    // resembles triangle numbers. In fact, this vector
    // is obtained in a very similar method as generating
    // triangle numbers, albeit in a repeating fashion.
    if (r == 0)
        return 1;
    int i, k;
    std::vector<double> triangleVec(n);
    std::vector<double> temp(n);
    for (i = 0; i < n; i++)
        triangleVec[i] = i + 1;
    for (i = 1; i < r; i++) {
        for (k = 1; k <= n; k++)
            temp[k-1] = std::accumulate(triangleVec.begin(), triangleVec.begin() + k, 0.0);
        triangleVec = temp;
    }
    return triangleVec[n-1];
}
And here is the function that generates the ith combination with repeats.
std::vector<int> nthCombWithRep(int n, int r, double i) {
    int j = 0, n1 = n, r1 = r - 1;
    double temp, index1 = i, index2 = i;
    std::vector<int> res(r);
    for (int k = 0; k < r; k++) {
        temp = combsWithReps(n1, r1);
        while (temp <= index1) {
            index2 -= combsWithReps(n1, r1);
            n1--;
            j++;
            temp += combsWithReps(n1, r1);
        }
        res[k] = j;
        r1--;
        index1 = index2;
    }
    return res;
}
It is very similar to the first function above. You will notice that n1-- and j++ are removed from the end of the function and also that n1 is initialized to n instead of n - 1.
Here is the above example:
nthCombWithRep(40, 20, 1000000000 - 1) -->>
0 0 0 0 0 0 0 0 0 0 0 4 5 6 8 9 12 18 18 31
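For anyone who wants to stay in R, here is a rough translation of nthCombWithRep (my own sketch, not from the original answer). It relies on the standard identity that the number of r-multisets from n items is choose(n + r - 1, r).
combs_with_reps <- function(n, r) choose(n + r - 1, r)  # n multichoose r

nth_comb_with_rep <- function(n, r, i) {
  # i is the zero-based index into the lexicographic list of r-multisets of 0:(n-1)
  j <- 0; n1 <- n; r1 <- r - 1
  index1 <- i; index2 <- i
  res <- integer(r)
  for (k in seq_len(r)) {
    temp <- combs_with_reps(n1, r1)
    while (temp <= index1) {
      index2 <- index2 - combs_with_reps(n1, r1)
      n1 <- n1 - 1
      j <- j + 1
      temp <- temp + combs_with_reps(n1, r1)
    }
    res[k] <- j
    r1 <- r1 - 1
    index1 <- index2
  }
  res
}

# Reproduces the 3 multichoose 2 example from the question (indices 0..5):
t(sapply(0:5, function(i) nth_comb_with_rep(3, 2, i)))
# rows: 0 0, 0 1, 0 2, 1 1, 1 2, 2 2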

How to define contrast coefficient matrix?

I have this data
y x1 x2 pre
1 16 1 1 14
2 15 1 1 13
3 14 1 2 14
4 13 1 2 13
5 12 2 1 12
6 11 2 1 12
7 11 2 2 13
8 13 2 2 13
9 10 3 1 10
10 11 3 1 11
11 11 3 2 11
12 9 3 2 10
And I fitted the following model
lm(y ~ x1 + x2 + x1*x2)
My design matrix is
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 14 1 0 1 1 0
[2,] 1 13 1 0 1 1 0
[3,] 1 14 1 0 0 0 0
[4,] 1 13 1 0 0 0 0
[5,] 1 12 0 1 1 0 1
[6,] 1 12 0 1 1 0 1
[7,] 1 13 0 1 0 0 0
[8,] 1 13 0 1 0 0 0
[9,] 1 10 0 0 1 0 0
[10,] 1 11 0 0 1 0 0
[11,] 1 11 0 0 0 0 0
[12,] 1 10 0 0 0 0 0
I'm trying to use this design to reproduce the following table:
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              6      44.79166667    7.46527778     12.98   0.0064
Error              5       2.87500000    0.57500000
Corrected Total   11      47.66666667

Source    DF   Type III SS   Mean Square   F Value   Pr > F
pre        1    3.12500000    3.12500000      5.43   0.0671
x1         2    4.58064516    2.29032258      3.98   0.0923
x2         1    3.01785714    3.01785714      5.25   0.0706
x1*x2      2    1.25000000    0.62500000      1.09   0.4055
The first part is fine
XtX <- t(x) %*% x
XtXinv <- solve(XtX)
betahat <- XtXinv %*% t(x) %*% y
H <- x %*% XtXinv %*% t(x)
IH <- (diag(1,12) - H)
yhat <- H %*% y
e <- IH %*% y
ybar <- mean(y)
MSS <- t(betahat) %*% t(x) %*% y - length(y)*(ybar^2)
ESS <- t(e) %*% e
TSS <- MSS + ESS
dfM <- sum(diag(H)) - 1
dfE <- sum(diag(IH))
dfT <- dfM + dfE
MSM <- MSS/dfM
MSE <- ESS/dfE
Ftest <- MSM / MSE
pr <- 1 - pf(Ftest, dfM, dfE)
The contrast coefficient matrix for 'pre' seems correct.
L <- matrix(c(0,1,0,0,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSpre <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSpre <- SSpre / 1
Fpre <- MSpre / MSE
PRpre <- 1 - pf(Fpre, 1, 12-7)
But I can't understand how to define the contrast coefficient matrix for x1, x2, and x1*x2. What's the problem with the rest of my code? Below is an example of how I think I should calculate it for x1.
L <- matrix(c(0,0,1,1,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSX1 <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSX1 <- SSX1 / 1
FX1 <- MSX1 / MSE
PRX1 <- 1 - pf(FX1, 1, 12-7)
Thanks!
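One hedged sketch of how a multi-degree-of-freedom contrast could look (this is an illustration against the design matrix above, not a verified reproduction of the SAS Type III output): for a 2-df effect such as x1, L needs one row per degree of freedom, and under the usual Type III convention the interaction columns are averaged over the levels of x2.
## Sketch only -- the 0.5 weights are an assumption following the Type III
## convention of averaging over the two x2 levels; check them against
## PROC GLM's estimable functions before trusting the numbers.
L <- rbind(c(0, 0, 1, 0, 0, 0.5, 0),    # x1 level 1 vs level 3
           c(0, 0, 0, 1, 0, 0, 0.5))    # x1 level 2 vs level 3
Lb <- L %*% betahat
SSX1 <- t(Lb) %*% solve(L %*% XtXinv %*% t(L)) %*% Lb
MSX1 <- SSX1 / nrow(L)                  # 2 numerator degrees of freedom
FX1  <- MSX1 / MSE
PRX1 <- 1 - pf(FX1, nrow(L), dfE)       # dfE is 5 for this model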

Applying a function to a distance matrix in R

This question came today in the manipulatr mailing list.
http://groups.google.com/group/manipulatr/browse_thread/thread/fbab76945f7cba3f
I am rephrasing.
Given a distance matrix (calculated with dist) apply a function to the rows of the distance matrix.
Code:
library(plyr)
N <- 100
a <- data.frame(b=1:N,c=runif(N))
d <- dist(a,diag=T,upper=T)
sumd <- adply(as.matrix(d),1,sum)
The problem is that to apply the function by row you have to store the whole matrix (instead of just the lower triangular part), so it uses too much memory for large matrices. It fails on my computer for matrices of dimension ~10000.
Any ideas?
First of all, for anyone who hasn't seen this yet, I strongly recommend reading this article on the r-wiki about code optimization.
Here's another version without using ifelse (that's a relatively slow function):
noeq.2 <- function(i, j, N) {
  i <- i - 1
  j <- j - 1
  x  <- i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i
  x2 <- j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j
  idx <- i < j
  x[!idx] <- x2[!idx]
  x[i == j] <- 0
  x
}
And timings on my laptop:
> N <- 1000
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
user system elapsed
51.31 0.10 52.06
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
user system elapsed
2.47 0.02 2.67
> system.time(sapply(1:N, function(j) noeq.2(1:N, j, N)))
user system elapsed
0.88 0.01 1.12
And lapply is faster than sapply:
> system.time(do.call("rbind",lapply(1:N, function(j) noeq.2(1:N, j, N))))
user system elapsed
0.67 0.00 0.67
This is a vectorized version of the function noeq (either argument i or j):
noeq.1 <- function(i, j, N) {
  i <- i - 1
  j <- j - 1
  ifelse(i < j,
         i*(N-1) - ((i-1)*i)/2 + j - i,
         j*(N-1) - ((j-1)*j)/2 + i - j) * ifelse(i == j, 0, 1)
}
> N <- 4
> sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N)))
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 1 0 4 5
[3,] 2 4 0 6
[4,] 3 5 6 0
> sapply(1:N, function(i) noeq.1(i, 1:N, N))
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 1 0 4 5
[3,] 2 4 0 6
[4,] 3 5 6 0
Timings are done on a 2.4 GHz Intel Core 2 Duo (Mac OS 10.6.1):
> N <- 1000
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
user system elapsed
0.676 0.061 0.738
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
user system elapsed
14.359 0.032 14.410
My solution is to get the indexes of the distance vector, given a row and the size of the matrix. I got this from codeguru
int Trag_noeq(int row, int col, int N)
{
    // assert(row != col);  // You can add this in if you like
    if (row < col)
        return row*(N-1) - (row-1)*((row-1) + 1)/2 + col - row - 1;
    else if (col < row)
        return col*(N-1) - (col-1)*((col-1) + 1)/2 + row - col - 1;
    else
        return -1;
}
After translating this to R, assuming indexes start at 1 and assuming a lower-triangular instead of an upper-triangular matrix, I got the following.
EDIT: Using the vectorized version contributed by rcs
noeq.1 <- function(i, j, N) {
  i <- i - 1
  j <- j - 1
  ix <- ifelse(i < j,
               i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i,
               j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j) * ifelse(i == j, 0, 1)
  ix
}

## To get the indexes of a row, the following one-liner works:
getrow <- function(z, N) noeq.1(z, 1:N, N)

## To get the row sums:
getsum <- function(d, f = sum) {
  N <- attr(d, "Size")
  sapply(1:N, function(i) {
    if (i %% 100 == 0) print(i)
    f(d[getrow(i, N)])
  })
}
So, with the example:
sumd2 <- getsum(d)
Before vectorizing, this was much slower than the as.matrix approach for small matrices, but after vectorizing it is only about 3x as slow. On an Intel Core 2 Duo 2 GHz, applying sum by row to the size-10000 matrix took just over 100 s, whereas the as.matrix method fails. Thanks rcs!
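Since getsum takes the summary function as an argument, the same index trick covers other row-wise statistics too (a small usage sketch; note the diagonal entries are skipped because getrow maps them to index 0):
maxd <- getsum(d, f = max)   # largest distance in each row
mind <- getsum(d, f = min)   # nearest-neighbour distance for each point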

Code-golf: generate pascal's triangle

Generate a list of lists (or print it, I don't mind) of Pascal's Triangle of size N with the least lines of code possible!
Here goes my attempt (118 characters in python 2.6 using a trick):
c,z,k=locals,[0],'_[1]'
p=lambda n:[len(c()[k])and map(sum,zip(z+c()[k][-1],c()[k][-1]+z))or[1]for _ in range(n)]
Explanation:
the first element of the list comprehension (when the length is 0) is [1]
the next elements are obtained the following way:
take the previous list and make two lists, one padded with a 0 at the beginning and the other at the end.
e.g. for the 2nd step, we take [1] and make [0,1] and [1,0]
sum the two new lists element by element
e.g. we make a new list [(0,1),(1,0)] and map with sum.
repeat n times and that's all.
usage (with pretty printing, actually out of the code-golf xD):
result = p(10)
lines = [" ".join(map(str, x)) for x in result]
for i in lines:
    print i.center(max(map(len, lines)))
output:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
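The pad-and-add idea above translates directly to R as well; this is just an illustrative sketch in the language used elsewhere in this document, not one of the original golf entries.
# Each new row is the previous row padded with 0 on the left plus the
# previous row padded with 0 on the right -- the same zip/sum trick as above.
pascal <- function(n) Reduce(function(row, i) c(0, row) + c(row, 0),
                             seq_len(n - 1), init = 1, accumulate = TRUE)
pascal(5)
# rows: 1; 1 1; 1 2 1; 1 3 3 1; 1 4 6 4 1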
K (Wikipedia), 15 characters:
p:{x{+':x,0}\1}
Example output:
p 10
(1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1)
It's also easily explained:
p:{x {+':x,0} \ 1}
   ^ ^------^ ^ ^
   A    B     C D
p is a function taking an implicit parameter x.
p unfolds (C) an anonymous function (B) x times (A) starting at 1 (D).
The anonymous function simply takes a list x, appends 0 and returns a result by adding (+) each adjacent pair (':) of values: so e.g. starting with (1 2 1), it'll produce (1 2 1 0), add pairs (1 1+2 2+1 1+0), giving (1 3 3 1).
Update: Adapted to K4, which shaves off another two characters. For reference, here's the original K3 version:
p:{x{+':0,x,0}\1}
J, another language in the APL family, 9 characters:
p=:!/~#i.
This uses J's builtin "combinations" verb.
Output:
p 10
1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9
0 0 1 3 6 10 15 21 28 36
0 0 0 1 4 10 20 35 56 84
0 0 0 0 1 5 15 35 70 126
0 0 0 0 0 1 6 21 56 126
0 0 0 0 0 0 1 7 28 84
0 0 0 0 0 0 0 1 8 36
0 0 0 0 0 0 0 0 1 9
0 0 0 0 0 0 0 0 0 1
Haskell, 58 characters:
r 0=[1]
r(n+1)=zipWith(+)(0:r n)$r n++[0]
p n=map r[0..n]
Output:
*Main> p 5
[[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1],[1,5,10,10,5,1]]
More readable:
-- # row 0 is just [1]
row 0 = [1]
-- # row (n+1) is calculated from the previous row
row (n+1) = zipWith (+) ([0] ++ row n) (row n ++ [0])
-- # use that for a list of the first n+1 rows
pascal n = map row [0..n]
69C in C:
f(int*t){int*l=t+*t,*p=t,r=*t,j=0;for(*t=1;l<t+r*r;j=*p++)*l++=j+*p;}
Use it like so:
int main()
{
#define N 10
    int i, j;
    int t[N*N] = {N};
    f(t);
    for (i = 0; i < N; i++)
    {
        for (j = 0; j <= i; j++)
            printf("%d ", t[i*N + j]);
        putchar('\n');
    }
    return 0;
}
F#: 81 chars
let f=bigint.Factorial
let p x=[for n in 0I..x->[for k in 0I..n->f n/f k/f(n-k)]]
Explanation: I'm too lazy to be as clever as the Haskell and K programmers, so I took the straightforward route: each element in Pascal's triangle can be uniquely identified by a row n and column k, where the value of each element is n!/(k!(n-k)!).
Python: 75 characters
def G(n):R=[[1]];exec"R+=[map(sum,zip(R[-1]+[0],[0]+R[-1]))];"*~-n;return R
Shorter prolog version (112 instead of 164):
n([X],[X]).
n([H,I|T],[A|B]):-n([I|T],B),A is H+I.
p(0,[[1]]):-!.
p(N,[R,S|T]):-O is N-1,p(O,[S|T]),n([0|S],R).
another stab (python):
def pascals_triangle(n):
    x=[[1]]
    for i in range(n-1):
        x.append(list(map(sum,zip([0]+x[-1],x[-1]+[0]))))
    return x
Haskell, 164C with formatting:
i l=zipWith(+)(0:l)$l++[0]
fp=map (concatMap$(' ':).show)f$iterate i[1]
c n l=if(length l<n)then c n$' ':l++" "else l
cl l=map(c(length$last l))l
pt n=cl$take n fp
Without formatting, 52C:
i l=zipWith(+)(0:l)$l++[0]
pt n=take n$iterate i[1]
A more readable form of it:
iterateStep row = zipWith (+) (0:row) (row++[0])
pascalsTriangle n = take n $ iterate iterateStep [1]
-- For the formatted version, we reduce the number of rows at the final step:
formatRow r = concatMap (\l -> ' ':(show l)) r
formattedLines = map formatRow $ iterate iterateStep [1]
centerTo width line =
if length line < width
then centerTo width (" " ++ line ++ " ")
else line
centerLines lines = map (centerTo (length $ last lines)) lines
pascalsTriangle n = centerLines $ take n formattedLines
And perl, 111C, no centering:
$n=<>;$p=' 1 ';for(1..$n){print"$p\n";$x=" ";while($p=~s/^(?= ?\d)(\d* ?)(\d* ?)/$2/){$x.=($1+$2)." ";}$p=$x;}
Scheme — compressed version of 100 characters
(define(P h)(define(l i r)(if(> i h)'()(cons r(l(1+ i)(map +(cons 0 r)(append r '(0))))))(l 1 '(1)))
This is it in a more readable form (269 characters):
(define (pascal height)
  (define (next-row row)
    (map +
         (cons 0 row)
         (append row '(0))))
  (define (iter i row)
    (if (> i height)
        '()
        (cons row
              (iter (1+ i)
                    (next-row row)))))
  (iter 1 '(1)))
VBA/VB6 (392 chars w/ formatting)
Public Function PascalsTriangle(ByVal pRows As Integer)
    Dim iRow As Integer
    Dim iCol As Integer
    Dim lValue As Long
    Dim sLine As String
    For iRow = 1 To pRows
        sLine = ""
        For iCol = 1 To iRow
            If iCol = 1 Then
                lValue = 1
            Else
                lValue = lValue * (iRow - iCol + 1) / (iCol - 1)
            End If
            sLine = sLine & " " & lValue
        Next
        Debug.Print sLine
    Next
End Function
PHP 100 characters
$v[]=1;while($a<34){echo join(" ",$v)."\n";$a++;for($k=0;$k<=$a;$k++)$t[$k]=$v[$k-1]+$v[$k];$v=$t;}
Ruby, 83c:
def p(n);n>0?(m=p(n-1);k=m.last;m+[([0]+k).zip(k+[0]).map{|x|x[0]+x[1]}]):[[1]];end
test:
irb(main):001:0> def p(n);n>0?(m=p(n-1);k=m.last;m+[([0]+k).zip(k+[0]).map{|x|x[0]+x[1]}]):[[1]];end
=> nil
irb(main):002:0> p(5)
=> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1]]
irb(main):003:0>
Another python solution, that could be much shorter if the builtin functions had shorter names... 106 characters.
from itertools import*
r=range
p=lambda n:[[len(list(combinations(r(i),j)))for j in r(i+1)]for i in r(n)]
Another try, in prolog (I'm practising xD), not too short, just 164c:
s([],[],[]).
s([H|T],[J|U],[K|V]):-s(T,U,V),K is H+J.
l([1],0).
l(P,N):-M is N-1,l(A,M),append(A,[0],B),s(B,[0|A],P).
p([],-1).
p([H|T],N):-M is N-1,l(H,N),p(T,M).
explanation:
s = sum lists element by element
l = the Nth row of the triangle
p = the whole triangle of size N
VBA, 122 chars:
Sub p(n)
    For r = 1 To n
        l = "1"
        v = 1
        For c = 1 To r - 1
            v = v / c * (r - c)
            l = l & " " & v
        Next
        Debug.Print l
    Next
End Sub
I wrote this C++ version a few years ago:
#include <iostream>
int main(int,char**a){for(int b=0,c=0,d=0,e=0,f=0,g=0,h=0,i=0;b<atoi(a[1]);(d|f|h)>1?e*=d>1?--d:1,g*=f>1?--f:1,i*=h>1?--h:1:((std::cout<<(i*g?e/(i*g):1)<<" "?d=b+=c++==b?c=0,std::cout<<std::endl?1:0:0,h=d-(f=c):0),e=d,g=f,i=h));}
The following is just a Scala function returning a List[List[Int]]. No pretty printing or anything. Any suggested improvements? (I know it's inefficient, but that's not the main challenge now, is it?). 145 C.
def p(n: Int)={def h(n:Int):List[Int]=n match{case 1=>1::Nil;case _=>(0::h(n-1) zipAll(h(n-1),0,0)).map{n=>n._1+n._2}};(1 to n).toList.map(h(_))}
Or perhaps:
def pascal(n: Int) = {
  def helper(n: Int): List[Int] = n match {
    case 1 => 1 :: List()
    case _ => (0 :: helper(n-1) zipAll (helper(n-1), 0, 0)).map{ n => n._1 + n._2 }
  }
  (1 to n).toList.map(helper(_))
}
(I'm a Scala noob, so please be nice to me :D )
a Perl version (139 chars w/o shebang)
@p = (1,1);
while ($#p < 20) {
    @q = ();
    $z = 0;
    push @p, 0;
    foreach (@p) {
        push @q, $_ + $z;
        $z = $_
    }
    @p = @q;
    print "@p\n";
}
output starts from 1 2 1
PHP, 115 chars
$t[][]=1;
for($i=1;$i<$n;++$i){
$t[$i][0]=1;
for($j=1;$j<$i;++$j)$t[$i][$j]=$t[$i-1][$j-1]+$t[$i-1][$j];
$t[$i][$i]=1;}
If you don't care whether print_r() displays the output array in the correct order, you can shave it to 113 chars like
$t[][]=1;
for($i=1;$i<$n;++$i){
$t[$i][0]=$t[$i][$i]=1;
for($j=1;$j<$i;++$j)$t[$i][$j]=$t[$i-1][$j-1]+$t[$i-1][$j];}
Perl, 63 characters:
for(0..9){push@z,1;say"@z";@z=(1,map{$z[$_-1]+$z[$_]}(1..$#z))}
My attempt in C++ (378c). Not anywhere near as good as the rest of the posts.. but I'm proud of myself for coming up with a solution on my own =)
int* pt(int n)
{
    int s = n*n;   // full n*n buffer so the t[i*n+j] indexing below stays in bounds
    int* t = new int[s];
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j)
            t[i*n+j] = (!j || j==i) ? 1 : t[(i-1)*n+(j-1)] + t[(i-1)*n+j];
    return t;
}

int main()
{
    int n, *t;
    std::cin >> n;
    t = pt(n);
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j <= i; j++)
            std::cout << t[i*n+j] << ' ';
        std::cout << "\n";
    }
}
Old thread, but I wrote this in response to a challenge on another forum today:
def pascals_triangle(n):
    x=[[1]]
    for i in range(n-1):
        x.append([sum(i) for i in zip([0]+x[-1],x[-1]+[0])])
    return x

for x in pascals_triangle(5):
    print('{0:^16}'.format(x))
[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
