Most efficient way of subsetting vectors - performance

I need to calculate the mean and variance of a subset of a vector. Let x be the vector and y be an indicator for whether the observation is in the subset. Which is more efficient:
sub.mean <- mean(x[y])
sub.var <- var(x[y])
or
sub <- x[y]
sub.mean <- mean(sub)
sub.var <- var(sub)
sub <- NULL
The first approach doesn't create a new object explicitly; but do the calls to mean and var do that implicitly? Or do they work on the original vector as stored?
Is the second faster because it doesn't have to do the subsetting twice?
I'm concerned with speed and with memory management for large data sets.

Benchmarking on a vector of length 10M indicates that (on my machine) the latter approach is faster:
f1 = function(x, y) {
  sub.mean <- mean(x[y])
  sub.var <- var(x[y])
}
f2 = function(x, y) {
  sub <- x[y]
  sub.mean <- mean(sub)
  sub.var <- var(sub)
  sub <- NULL
}
x = rnorm(10000000)
# rbinom() returns a numeric 0/1 indicator; convert it to logical so that x[y]
# selects the intended subset rather than repeatedly picking x[1]
y = rbinom(10000000, 1, .5) == 1
print(system.time(f1(x, y)))
# user system elapsed
# 0.403 0.037 0.440
print(system.time(f2(x, y)))
# user system elapsed
# 0.233 0.002 0.235
This isn't surprising: mean(x[y]) still has to create a new object for mean to operate on, even though that object is never bound to a name in the local environment. f1 is therefore slower because it performs the subsetting twice (as you surmised).
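A further check, not part of the original answer: a single system.time() run can be noisy, so if the microbenchmark package is available the comparison can be repeated for a more stable picture. A minimal sketch, using x, y, f1 and f2 as defined above:

# Assumes the microbenchmark package is installed
library(microbenchmark)
microbenchmark(
  twice = f1(x, y),  # subsets x twice
  once  = f2(x, y),  # subsets x once and reuses the result
  times = 10
)

What matters here is the relative gap between the two medians rather than the absolute numbers, which depend on the machine.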

Related

How to set my inequality constraint into my R function?

I am working on a project consisting of the analysis of different portfolio constructions in a universe of various assets. I work with 22 assets and recalibrate my portfolio every 90 days. This is why a weight-penalty constraint (see code) is applied, as the allocation changes every period.
I am currently implementing a construction based on independent components. My objective is to minimize the modified Value at Risk computed from those components (see code below).
My function runs correctly and everything seems to be OK; my functions "MVaR.IC.port" and "MVaR.cm" work well. However, I can only implement this model when short selling is allowed. I would now like to operate long only, i.e. my weight vector w should contain only elements >= 0. Concretely, I want every element of the expression "w <- t(w.IC)%*%a$A" in my code to be >= 0.
Do you know how to help me? Thank you in advance.
The rows of w.out.MVaR.IC.cm.22 below are the results that must be positive. I also require that the weights sum to 1 (the investor allocates 100% of his wealth).
Thomas
PS: train and test represent my rolling windows. In fact, I calibrate my models on 'train' (in sample) and apply them on 'test' (out of sample) in order to analyse their performance.
########################################
######### MVaR on IC with CM #########
########################################
lower <- rep(-5, k)
upper <- rep(5, k)

# Set up objective function and constraints
MVaR.IC.cm.port <- function(S, weights, alpha, MixingMatrix)
{
  obj <- MVaR(S, weights, alpha)
  w.ICA <- t(weights) %*% MixingMatrix
  weight.penalty      <- abs(1000 * (1 - sum(w.ICA)))
  down.weight.penalty <- 1000 * sum(w.ICA[w.ICA > 1])
  up.weight.penalty   <- 1000 * abs(sum(w.ICA[w.ICA < -1]))
  return(obj + weight.penalty + down.weight.penalty + up.weight.penalty)
}

# Out-of-sample portfolio return computation
ret.out.MVaR.IC.cm.22 <- c()
w.out.MVaR.IC.cm.22 <- matrix(ncol = n, nrow = 10)
for (i in 0:9) {
  train <- as.matrix(portfolioReturns.new[(1 + i*90):(8*90 + i*90), ])
  test  <- as.matrix(portfolioReturns.new[(1 + 8*90 + i*90):(9*90 + i*90), ])
  a <- myfastICA(train, k, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                 method = "R", row.norm = FALSE, maxit = 2000,
                 tol = 0.0000000001, verbose = TRUE)
  x <- DEoptim(MVaR.IC.cm.port, lower, upper,
               control = list(NP = (10*k), F = 0.8, CR = 0.9, trace = 50),
               S = a$S, alpha = alpha, MixingMatrix = a$A)
  w.IC <- matrix(x$optim$bestmem, ncol = 1)
  w <- t(w.IC) %*% a$A
  for (j in 1:ncol(train)) {
    w.out.MVaR.IC.cm.22[(i+1), j] <- w[j]
  }
  ret.out.MVaR.IC.cm.22 <- rbind(ret.out.MVaR.IC.cm.22, test %*% t(w))
}
w.out.MVaR.IC.cm.22
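One possible route, not quoted from an answer: since the objective already enforces the budget constraint through a penalty term, a long-only version can be built the same way, by penalising any negative element of the implied asset weights so that DEoptim is pushed toward solutions with w >= 0. A sketch reusing the names from the code above (MVaR and the rest of the setup are assumed unchanged):

# Sketch of a long-only variant of the objective: in addition to the
# sum-to-one penalty, every negative element of the implied weights w.ICA is penalised.
MVaR.IC.cm.port.long <- function(S, weights, alpha, MixingMatrix)
{
  obj <- MVaR(S, weights, alpha)
  w.ICA <- t(weights) %*% MixingMatrix
  weight.penalty    <- abs(1000 * (1 - sum(w.ICA)))       # weights must sum to 1
  long.only.penalty <- 1000 * sum(abs(w.ICA[w.ICA < 0]))  # no short positions
  return(obj + weight.penalty + long.only.penalty)
}

Passing MVaR.IC.cm.port.long to DEoptim in place of MVaR.IC.cm.port leaves the rest of the loop unchanged; the penalty weight of 1000 mirrors the existing code and may need tuning relative to the scale of MVaR.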

Why do Extra Function Calls Speed Up a Program in Python?

If I extract a computation and place it in another function, shouldn't the code be slower? Evidently not. Below, I can't believe fun2 is slower than fun1, because fun1 clearly does more computation. What is going on? (Maybe I can have functions call functions call functions and REALLY speed up my code.)
Python code:
MAX = 10000000

def fun1():  # 4.26 seconds.
    def multiply(X, Y):  # multiply two 2x2 matrices
        a, b, c, d = X
        e, f, g, h = Y
        return a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h
    X = [1, 2, 3, 4]
    Y = [5, 6, 7, 8]
    for n in range(MAX):
        Z = multiply(X, Y)  # Make the call
    return Z

# -------------------------------------------------

def fun2():  # 6.56 seconds.
    X = [1, 2, 3, 4]
    Y = [5, 6, 7, 8]
    for n in range(MAX):
        Z = X[0]*Y[0] + X[1]*Y[2], \
            X[0]*Y[1] + X[1]*Y[3], \
            X[2]*Y[0] + X[3]*Y[2], \
            X[2]*Y[1] + X[3]*Y[3]  # Don't make the call.
    return Z
I'm not sure, but I think it might be that
a, b, c, d = X
and then referencing a, b, c and d directly is faster than referencing X[0] (and so on).
Every index into the list is another lookup, while a, b, c, d = X unpacks the list in a single step (I think).
I finally figured out my own question. The function dispensed with the square brackets, and that is where the speed increase came from, not from the function call itself. A Python list is an array of references to its elements, so every X[i] is a separate indexing operation: the interpreter executes an indexing bytecode, bounds-checks the index and follows the reference to the value. fun2 performs sixteen of these lookups on every pass through the loop. In fun1, the unpacking a, b, c, d = X does the element access once per call, after which a, b, c and d are fast local-variable reads, and that saving more than pays for the overhead of the function call.

Efficient partial permutation sort in Julia

I am dealing with a problem that requires a partial permutation sort by magnitude in Julia. If x is a vector of dimension p, then what I need are the first k indices corresponding to the k components of x that would appear first in a partial sort by absolute value of x.
Refer to Julia's sorting functions here. Basically, I want a cross between sortperm and select!. When Julia 0.4 is released, I will be able to obtain the same answer by applying sortperm! (this function) to the vector of indices and choosing the first k of them. However, using sortperm! is not ideal here because it will sort the remaining p-k indices of x, which I do not need.
What would be the most memory-efficient way to do the partial permutation sort? I hacked a solution by looking at the sortperm source code. However, since I am not versed in the ordering modules that Julia uses there, I am not sure if my approach is intelligent.
One important detail: I can ignore repeats or ambiguities here. In other words, I do not care about the ordering by abs() of indices for two components 2 and -2. My actual code uses floating point values, so exact equality never occurs for practical purposes.
# initialize a vector for testing
x = [-3,-2,4,1,0,-1]
x2 = copy(x)
k = 3 # num components desired in partial sort
p = 6 # num components in x, x2
# what are the indices that sort x by magnitude?
indices = sortperm(x, by = abs, rev = true)
# now perform partial sort on x2
select!(x2, k, by = abs, rev = true)
# check if first k components are sorted here
# should evaluate to "true"
isequal(x2[1:k], x[indices[1:k]])
# now try my partial permutation sort
# I only need indices2[1:k] at end of day!
indices2 = [1:p]
select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))
# same result? should evaluate to "true"
isequal(indices2[1:k], indices[1:k])
EDIT: With the suggested code, we can briefly compare performance on much larger vectors:
p = 10000; k = 100; # asking for largest 1% of components
x = randn(p); x2 = copy(x);
# run following code twice for proper timing results
@time {indices = sortperm(x, by = abs, rev = true); indices[1:k]};
@time {indices2 = [1:p]; select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))};
@time selectperm(x, k);
My output:
elapsed time: 0.048876901 seconds (19792096 bytes allocated)
elapsed time: 0.007016534 seconds (2203688 bytes allocated)
elapsed time: 0.004471847 seconds (1657808 bytes allocated)
The following version appears to be relatively space-efficient because it uses only an integer array of the same length as the input array:
function selectperm(x, k)
    # when k == 1, pass a single index rather than a 1:1 range to avoid a BoundsError in select!
    kk = k > 1 ? (1:k) : 1
    z = collect(1:length(x))
    return select!(z, kk, by = i -> abs(x[i]), rev = true)
end
x = [-3,-2,4,1,0,-1]
k = 3 # num components desired in partial sort
print (selectperm(x,k))
The output is:
[3,1,2]
... as expected.
I'm not sure if it uses less memory than the originally-proposed solution (though I suspect the memory usage is similar) but the code may be clearer and it does produce only the first k indices whereas the original solution produced all p indices.
(Edit)
selectperm() has been edited to deal with the BoundsError that occurs if k=1 in the call to select!().

Slow nested loop in R

I'm new to R and having trouble vectorizing a nested loop that is particularly slow. The loop goes through a set of cluster centers (the rows of clusterCenters below) and computes the distance between each center and each row of the matrix clusterMembers. I know this needs to be vectorized for speed, but I cannot figure out the appropriate functions, or the right use of apply, to do so.
clusterCenters <- matrix(runif(10000), nrow=100)
clusterMembers <- matrix(runif(400000), nrow=4000)
features <- matrix(0, dim(clusterMembers)[1], dim(clusterCenters)[1])
for (c in 1:dim(clusterCenters)[1]) {
  center <- clusterCenters[c, ]
  for (v in 1:dim(clusterMembers)[1]) {
    vector <- clusterMembers[v, ]
    features[v, c] <- sqrt(sum((center - vector)^2))
  }
}
Thanks for any help.
You can take advantage of R's recycling rules to make this a bit faster. But you have to know and account for the fact that R stores matrices in column-major order. You do that by transposing clusterMembers and then the center vector will be recycled along the columns of t(clusterMembers).
set.seed(21)
clusterCenters <- matrix(runif(10000), nrow=100)
clusterMembers <- matrix(runif(400000), nrow=4000)

# your original code in function form
seven <- function() {
  features <- matrix(0, dim(clusterMembers)[1], dim(clusterCenters)[1])
  for (c in 1:dim(clusterCenters)[1]) {
    center <- clusterCenters[c, ]
    for (v in 1:dim(clusterMembers)[1]) {
      vector <- clusterMembers[v, ]
      features[v, c] <- sqrt(sum((center - vector)^2))
    }
  }
  features
}

# my fancy function
josh <- function() {
  tcm <- t(clusterMembers)
  Features <- matrix(0, ncol(tcm), nrow(clusterCenters))
  for (i in 1:nrow(clusterCenters)) {
    # clusterCenters[i,] returns a vector because drop=TRUE by default
    Features[, i] <- colSums((clusterCenters[i, ] - tcm)^2)
  }
  Features <- sqrt(Features)  # outside the loop to avoid function calls
}
system.time(seven())
# user system elapsed
# 2.7 0.0 2.7
system.time(josh())
# user system elapsed
# 0.28 0.11 0.39
identical(seven(),josh())
# [1] TRUE
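Going one step further (a sketch that is not part of the original answer), the loop over centers can be removed entirely by expanding the squared Euclidean distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2*(a . b) and computing the cross term with a single matrix product:

# Sketch: fully vectorized pairwise distances between the rows of clusterMembers
# and the rows of clusterCenters, using the expansion above.
dist2 <- outer(rowSums(clusterMembers^2), rowSums(clusterCenters^2), "+") -
  2 * clusterMembers %*% t(clusterCenters)
featuresVec <- sqrt(pmax(dist2, 0))  # clamp tiny negatives caused by round-off
all.equal(featuresVec, josh())       # should be TRUE within floating-point tolerance

This trades the explicit loop for one 4000 x 100 matrix product, which will typically be faster again at the cost of a little extra memory for the intermediate matrices.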

r: for loop operation with nested indices runs super slow

I have an operation I'd like to run for each row of a data frame, changing one column. I'm an apply/ddply/sqldf man, but I'll use loops when they make sense, and I think this is one of those times. This case is tricky because the column to change depends on information that varies by row: depending on the value in one cell, I should change only one of ten other cells in that row. With 75 columns and 20000 rows, the operation takes 10 minutes, when every other operation in my script takes 0-5 seconds, ten seconds max. I've stripped my problem down to the very simple test case below.
n <- 20000
t.df <- data.frame(matrix(1:5000, ncol=10, nrow=n))
system.time(
  for (i in 1:nrow(t.df)) {
    t.df[i, (t.df[i,1] %% 10 + 1)] <- 99
  }
)
This takes 70 seconds with ten columns, and 360 when ncol=50. That's crazy. Are loops the wrong approach? Is there a better, more efficient way to do this?
I already tried initializing the nested term (t.df[i,1] %% 10 + 1) as a list outside the for loop. It saves about 30 seconds (out of 10 minutes) but makes the example code above more complicated, so it helps, but it's not the solution.
My current best idea came while preparing this test case. For me, only about ten of the 75 columns are relevant and the rest are irrelevant. Since the run times depend so much on the number of columns, I can just run the above operation on a data frame that excludes the irrelevant columns. That will get me down to just over a minute. But is "for loop with nested indices" even the best way to think about my problem?
It seems the real bottleneck is having the data in the form of a data.frame. I assume that in your real problem you have a compelling reason to use a data.frame. Any way to convert your data in such a way that it can remain in a matrix?
By the way, great question and a very good example.
Here's an illustration of how much faster loops are on matrices than on data.frames:
> n <- 20000
> t.df <- (matrix(1:5000, ncol=10, nrow=n) )
> system.time(
+ for (i in 1:nrow(t.df)) {
+ t.df[i,(t.df[i,1]%%10 + 1)] <- 99
+ }
+ )
user system elapsed
0.084 0.001 0.084
>
> n <- 20000
> t.df <- data.frame(matrix(1:5000, ncol=10, nrow=n) )
> system.time(
+ for (i in 1:nrow(t.df)) {
+ t.df[i,(t.df[i,1]%%10 + 1)] <- 99
+ }
+ )
user system elapsed
31.543 57.664 89.224
Using row and col seems less complicated to me:
t.df[col(t.df) == (row(t.df) %% 10) + 1] <- 99
I think Tommy's is still faster, but using row and col might be easier to understand.
@JD Long is right that if t.df can be represented as a matrix, things will be much faster.
...And then you can actually vectorize the whole thing so that it is lightning fast:
n <- 20000
t.df <- data.frame(matrix(1:5000, ncol=10, nrow=n))
system.time({
  m <- as.matrix(t.df)
  m[cbind(seq_len(nrow(m)), m[,1] %% 10L + 1L)] <- 99
  t2.df <- as.data.frame(m)
})  # 0.00 secs
Unfortunately, the matrix indexing I use here does not seem to work on a data.frame.
EDIT
A variant where I create a logical matrix to index works on data.frame, and is almost as fast:
n <- 20000
t.df <- data.frame(matrix(1:5000, ncol=10, nrow=n))
system.time({
  t2.df <- t.df
  # Create a logical matrix with TRUE wherever the replacement should happen
  m <- array(FALSE, dim=dim(t2.df))
  m[cbind(seq_len(nrow(t2.df)), t2.df[,1] %% 10L + 1L)] <- TRUE
  t2.df[m] <- 99
})  # 0.01 secs
UPDATE: Added the matrix version of Tommy's solution to the benchmarking exercise.
You can vectorize it. Here is my solution and a comparison with the loop
n <- 20000
t.df <- matrix(1:5000, ncol=10, nrow=n)

f_ramnath <- function(x) {
  idx <- x[,1] %% 10 + 1
  x[cbind(1:NROW(x), idx)] <- 99
  return(x)
}

f_long <- function(t.df) {
  for (i in 1:nrow(t.df)) {
    t.df[i, (t.df[i,1] %% 10 + 1)] <- 99
  }
  return(t.df)
}

f_joran <- function(t.df) {
  t.df[col(t.df) == (row(t.df) %% 10) + 1] <- 99
  return(t.df)
}

f_tommy <- function(t.df) {
  t2.df <- t.df
  # Create a logical matrix with TRUE wherever the replacement should happen
  m <- array(FALSE, dim=dim(t2.df))
  m[cbind(seq_len(nrow(t2.df)), t2.df[,1] %% 10L + 1L)] <- TRUE
  t2.df[m] <- 99
  return(t2.df)
}

f_tommy_mat <- function(m) {
  m[cbind(seq_len(nrow(m)), m[,1] %% 10L + 1L)] <- 99
}
To compare the performance of the different approaches, we can use rbenchmark.
library(rbenchmark)
benchmark(f_long(t.df), f_ramnath(t.df), f_joran(t.df), f_tommy(t.df),
          f_tommy_mat(t.df), replications = 20, order = 'relative',
          columns = c('test', 'elapsed', 'relative'))
test elapsed relative
5 f_tommy_mat(t.df) 0.135 1.000000
2 f_ramnath(t.df) 0.172 1.274074
4 f_tommy(t.df) 0.311 2.303704
3 f_joran(t.df) 0.705 5.222222
1 f_long(t.df) 2.411 17.859259
Another option for when you do need mixed column types (and so you can't use matrix) is := in data.table. Example from ?":=" :
require(data.table)
m = matrix(1,nrow=100000,ncol=100)
DF = as.data.frame(m)
DT = as.data.table(m)
system.time(for (i in 1:1000) DF[i,1] <- i)
# 591 seconds
system.time(for (i in 1:1000) DT[i,V1:=i])
# 1.16 seconds ( 509 times faster )
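A related option, not part of the quoted example: when this kind of by-reference assignment sits inside a tight loop, data.table also provides set(), a low-overhead form of := that skips the cost of the [.data.table call on every iteration. A minimal sketch on a fresh table of the same shape:

# Sketch: set(DT, i, j, value) assigns by reference with less per-call overhead than DT[i, V1 := i]
library(data.table)
DT2 <- as.data.table(matrix(1, nrow=100000, ncol=100))
system.time(for (i in 1:1000) set(DT2, i, 1L, i))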
