PortfolioAnalytics R - I find an optimal portfolio with DEoptim, but when plotting it the portfolio does not look optimal. Why?

I create an optimal portfolio using optimize.portfolio with the DEoptim solver (a maxDrawdown risk objective with target -0.1 and a maximum-return objective). But when I plot the optimized portfolio object, the portfolio does not look optimal: there are portfolios with the same drawdown and better returns. Could someone explain?
library('PortfolioAnalytics')
library('PerformanceAnalytics')
library('DEoptim')

data(edhec)
ret <- edhec[, 1:10]

init.portf <- portfolio.spec(assets=colnames(ret))
init.portf <- add.constraint(portfolio=init.portf, type="full_investment")
init.portf <- add.constraint(portfolio=init.portf, type="long_only")

group_list <- list(group1=c(3),
                   group2=c(1, 2),
                   group3=c(5, 7, 8))

init.portf <- add.constraint(portfolio=init.portf,
                             type="group",
                             groups=group_list,
                             group_min=c(0.03, 0, 0),
                             group_max=c(0.032, 0.2, 0.3))

ret.obj.portf <- add.objective(portfolio=init.portf, type="return",
                               name="mean")
ret.obj.portf <- add.objective(portfolio=ret.obj.portf,
                               type='risk',
                               name='maxDrawdown',
                               arguments=list(inverse=TRUE),
                               target=-0.1)

ret.obj.portf$constraints[[1]]$min_sum <- 0.99
ret.obj.portf$constraints[[1]]$max_sum <- 1.01
ret.obj.portf

set.seed(123)
opt.obj.no1.1 <- optimize.portfolio(R=ret, portfolio=ret.obj.portf,
                                    optimize_method="DEoptim", search_size=2000, trace=TRUE)
opt.obj.no1.1

chart.RiskReward(opt.obj.no1.1,
                 main='Optimized Portfolio: max return and 10% maxDD, all investments allowed',
                 return.col="mean", risk.col='maxDrawdown')
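One hedged suggestion for digging into this (not a full answer): since trace=TRUE was set, PortfolioAnalytics should keep the portfolios that DEoptim evaluated, and extractStats() lets you pull out their weights and objective measures, which is essentially what chart.RiskReward plots. A minimal sketch, assuming the object above:

# Sketch: inspect the traced portfolios behind the risk/reward chart.
# Assumes opt.obj.no1.1 from above with trace=TRUE; exact column names
# depend on the objectives (here they should include "mean" and "maxDrawdown").
xt <- extractStats(opt.obj.no1.1)
head(xt)
# Traced portfolios with maxDrawdown near the one reported but a higher mean
# would explain why the chart looks "non-optimal".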

Related

ifelse and return in the for loop

I need help with my syntax.
library(e1071)   # skewness(), kurtosis()
library(ltm)     # rmvlogis()

priori <- function(I, N, M) {
  a <- as.matrix(runif(I, min = 0.65, max = 1.70))
  b <- as.matrix(runif(I, min = -2.80, max = 2.80))
  c <- as.matrix(runif(I, min = 0.00, max = 0.35))
  k <- c(rnorm(N*20/100, 0, 1), rnorm(N*80/100, 0, 0.01))
  M <- cbind(b, a, c)
  data <- as.data.frame(rmvlogis(N, M, IRT = FALSE, link = "logit", z.vals = k))
  print(data)
}
This is my syntax, which generates the data.
priori.list <- vector("list", 3)
names(priori.list) <- paste0("L", seq_along(priori.list))
priori.sum.list <- vector("list", 3)

for (i in 1:3) {
  for (j in 1:100) {
    priori.list$L1[[j]] <- priori(10, 100, M="2PL")
    priori.list$L2[[j]] <- priori(20, 500, M="2PL")
    priori.list$L3[[j]] <- priori(40, 1000, M="3PL")
    priori.sum.list[[i]][[j]] <- rowSums(priori.list[[i]][[j]])
    print(kurtosis(priori.sum.list[[i]][[j]]))
    if (skewness(priori.sum.list[[i]][[j]]) >= -1 | skewness(priori.sum.list[[i]][[j]] >= 1)
        & kurtosis(priori.sum.list[[i]][[j]]) >= -1 | kurtosis(priori.sum.list[[i]][[j]] >= 1)) {
      NA
    } else {
      return(j = j - 1)
    }
  }
}
Then I build a list of data sets from that syntax. I want to control the loop using the skewness and kurtosis coefficients, but I couldn't get it to work. My goal: if the skewness and kurtosis coefficients are outside -1 and 1, save the data set to the list; if not, regenerate it. Can you help me with the correct syntax? Thank you.
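Here is one way this could be written, as a hedged sketch of my reading of the goal (regenerate until both coefficients fall outside the interval [-1, 1], then store); the settings list below is just an illustration of the three cases in your loop:

# Sketch: regenerate each data set until the skewness and kurtosis of its row
# sums are both outside [-1, 1], then keep it. Uses priori(), skewness() and
# kurtosis() from above; the 'settings' list mirrors the three cases.
settings <- list(L1 = list(I = 10, N = 100,  M = "2PL"),
                 L2 = list(I = 20, N = 500,  M = "2PL"),
                 L3 = list(I = 40, N = 1000, M = "3PL"))

priori.list <- lapply(settings, function(s) {
  lapply(1:100, function(j) {
    repeat {
      d  <- priori(s$I, s$N, s$M)
      rs <- rowSums(d)
      if (abs(skewness(rs)) > 1 && abs(kurtosis(rs)) > 1) return(d)
    }
  })
})
priori.sum.list <- lapply(priori.list, function(lst) lapply(lst, rowSums))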

Algorithm for market clearing

I've got discrete step functions for supply and demand. I'm searching for an algorithm to find the equilibrium price. The data are below in R, but a solution in any language (or pseudo-code) is acceptable.
demand = data.frame(volume = c(8,2,3,1,1), price=c(1,2,3,4,5))
supply = data.frame(volume = c(3,2,4,2,3), price=c(5,4,3,2,1))
demand$volume <- cumsum(demand$volume)
supply$volume <- cumsum(supply$volume)
plot(demand, type="s")
lines(supply, type="s", col=3)
You need to take cumulative volumes from opposite ends of the price range:
demand_cum = (15, 7, 5, 2, 1)
supply_cum = ( 3, 5, 9, 11, 14)
This shows you the total cumulative demand and supply at each price (1 through 5).
Now can you spot the equilibrium?
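In R, those cumulative vectors can be computed directly from your data (a small sketch; variable names as in your question):

# Cumulative demand accumulated from the highest price down, cumulative supply
# accumulated from the lowest price up, both reported for prices 1 through 5.
demand <- data.frame(volume = c(8, 2, 3, 1, 1), price = 1:5)
supply <- data.frame(volume = c(3, 2, 4, 2, 3), price = 5:1)

demand_cum <- rev(cumsum(rev(demand$volume)))              # 15  7  5  2  1
supply_cum <- cumsum(supply$volume[order(supply$price)])   #  3  5  9 11 14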
I was looking into a similar problem and found this great description: https://www.youtube.com/watch?v=FYfbM56L-mE&ab_channel=31761-Renewablesinelectricitymarkets
You can apply a similar analysis to your problem. Consider an LP formulation: given the dual solution, you can find the market-clearing price as follows:
demand = data.frame(Type = "demand",Q = c(8,2,3,1,1), P=c(1,2,3,4,5))
supply = data.frame(Type = "supply",Q = c(3,2,4,2,3), P=c(5,4,3,2,1))
ds <- rbind(supply,demand)
Representing the problem as an LP, do the following:
P_s <- ds[ds$Type == "supply","P"]   # supply offer prices
P_d <- ds[ds$Type == "demand","P"]   # demand bid prices
Q_s <- ds[ds$Type == "supply","Q"]   # supply offer volumes
Q_d <- ds[ds$Type == "demand","Q"]   # demand bid volumes

# Objective: minimize supply cost minus demand value (i.e. maximize welfare)
c_vec <- c(P_s, -P_d)
# Each accepted quantity is bounded by its offered/bid volume
A_mat <- diag(length(c_vec))
b_vec <- c(Q_s, Q_d)
dir_1 <- rep("<=", length(b_vec))
# Balance constraint: total accepted supply equals total accepted demand
A2_mat <- c(rep(1, length(Q_s)), rep(-1, length(Q_d)))
b2_vec <- 0
A_mat <- rbind(A_mat, A2_mat)
b_vec <- c(b_vec, b2_vec)
dir_1 <- c(dir_1, "=")

library(lpSolve)
sol <- lp("min", c_vec, A_mat, dir_1, b_vec, compute.sens=TRUE)
price_mc <- sol$duals[nrow(ds) + 1]  # dual of the balance constraint = market-clearing price
In your example, the market-clearing price is $2.

How can I improve the performance of my data cleaning code that currently uses ddply by using data.table?

I am trying to clean data using ddply but it is running very slowly on 1.3M rows.
Sample code:
# Create sample data frame
num_rows <- 10000
df <- data.frame(id=sample(1:20, num_rows, replace=T),
                 Consumption=sample(-20:20, num_rows, replace=T),
                 StartDate=as.Date(sample(15000:15020, num_rows, replace=T), origin = "1970-01-01"))
df$EndDate <- df$StartDate + 90
#df <- df[order(df$id, df$StartDate, df$Consumption),]

# Are values negative?
# Needed for subsetting in ddply rows with same positive and negative values
df$Neg <- ifelse(df$Consumption < 0, -1, 1)
df$Consumption <- abs(df$Consumption)
I have written a function to remove rows where a consumption value in one row is equal in magnitude but opposite in sign to a consumption value in another row (for the same id).
# Remove rows from a data frame where there is an equal but opposite consumption value.
# Should ensure only one negative value is removed for each positive one.
clean_negatives <- function(x3) {
  copies <- abs(sum(x3$Neg))
  sgn <- ifelse(sum(x3$Neg) < 0, -1, 1)
  x3 <- x3[0:copies, ]
  x3$Consumption <- sgn * x3$Consumption
  x3$Neg <- NULL
  x3
}
I then use ddply to apply that function and remove these erroneous rows from the data:
ptm <- proc.time()
df_cleaned <- ddply(df, .(id, StartDate, EndDate, Consumption),
                    function(x) clean_negatives(x))
proc.time() - ptm
I was hoping I could use data.table to make this go faster, but I couldn't work out how to employ it. With 1.3M rows it has so far taken my desktop all day to compute and still hasn't finished.
Your question asks about a data.table implementation, so I've shown one here. Your function can be drastically simplified as well: first get the sign by summing Neg, then filter the table, then multiply Consumption by the sign (as shown below).
require(data.table)

# get the data.table in dt
dt <- data.table(df, key = c("id", "StartDate", "EndDate", "Consumption"))

# first obtain the sign directly
dt <- dt[, sign := sign(sum(Neg)), by = c("id", "StartDate", "EndDate", "Consumption")]

# then filter by abs(sum(Neg))
dt.fil <- dt[, .SD[seq_len(abs(sum(Neg)))], by = c("id", "StartDate", "EndDate", "Consumption")]

# modify for final output (the next line was commented out after Statquant's comment)
# dt.fil$Consumption <- dt.fil$Consumption * dt.fil$sign
dt.fil[, Consumption := (Consumption * sign)]
dt.fil <- subset(dt.fil, select = -c(Neg, sign))
Benchmarking
The data with a million rows:
# Create sample data frame
num_rows <- 1e6
df <- data.frame(id=sample(1:20, num_rows, replace=T),
                 Consumption=sample(-20:20, num_rows, replace=T),
                 StartDate=as.Date(sample(15000:15020, num_rows, replace=T), origin = "1970-01-01"))
df$EndDate <- df$StartDate + 90
df$Neg <- ifelse(df$Consumption < 0, -1, 1)
df$Consumption <- abs(df$Consumption)
The data.table function:
FUN.DT <- function() {
  require(data.table)
  dt <- data.table(df, key = c("id", "StartDate", "EndDate", "Consumption"))
  dt <- dt[, sign := sign(sum(Neg)),
           by = c("id", "StartDate", "EndDate", "Consumption")]
  dt.fil <- dt[, .SD[seq_len(abs(sum(Neg)))],
               by = c("id", "StartDate", "EndDate", "Consumption")]
  dt.fil[, Consumption := (Consumption * sign)]
  dt.fil <- subset(dt.fil, select = -c(Neg, sign))
}
Your function with ddply:
FUN.PLYR <- function() {
  require(plyr)
  clean_negatives <- function(x3) {
    copies <- abs(sum(x3$Neg))
    sgn <- ifelse(sum(x3$Neg) < 0, -1, 1)
    x3 <- x3[0:copies, ]
    x3$Consumption <- sgn * x3$Consumption
    x3$Neg <- NULL
    x3
  }
  df_cleaned <- ddply(df, .(id, StartDate, EndDate, Consumption),
                      function(x) clean_negatives(x))
}
Benchmarking with rbenchmark (with 1 replication only):
require(rbenchmark)
benchmark(FUN.DT(), FUN.PLYR(), replications = 1, order = "elapsed")

        test replications elapsed relative user.self sys.self user.child sys.child
1   FUN.DT()            1   6.137    1.000     5.926    0.211          0         0
2 FUN.PLYR()            1 242.268   39.477   152.855   82.881          0         0
My data.table implementation is about 39 times faster than your current plyr implementation (I compare mine to your implementation because the functions are different).
Note: I loaded the packages within each function in order to capture the complete time needed to obtain the result; for the same reason I also converted the data.frame to a keyed data.table inside the benchmarked function. The reported speed-up is therefore a minimum.
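(A follow-up sketch of my own, not part of the original answer: if you convert and key the data.table once outside the timed function, the benchmark isolates the grouping and filtering work itself.)

# Sketch: convert and key once, then benchmark only the grouping work.
# copy() is used so := does not modify dt0 across replications.
dt0 <- data.table(df, key = c("id", "StartDate", "EndDate", "Consumption"))
FUN.DT.KEYED <- function() {
  dt <- copy(dt0)
  dt[, sign := sign(sum(Neg)), by = key(dt)]
  dt.fil <- dt[, .SD[seq_len(abs(sum(Neg)))], by = key(dt)]
  dt.fil[, Consumption := Consumption * sign]
  dt.fil[, c("Neg", "sign") := NULL]
}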

Make this loop faster in R

How can I speed up the following (noob) code:
#"mymatrix" is the matrix of word counts (docs X terms)
#"tfidfmatrix" is the transformed matrix
tfidfmatrix = Matrix(mymatrix, nrow=num_of_docs, ncol=num_of_words, sparse=T)
#Apply a transformation on each row of the matrix
for(i in 1:dim(mymatrix)[[1]]){
r = mymatrix[i,]
s = sapply(r, function(x) ifelse(x==0, 0, (1+log(x))*log((1+ndocs)/(1+x)) ) )
tfmat[i,] = s/sqrt(sum(s^2))
}
return (tfidfmatrix)
The problem is that the matrices I am working on are fairly large (~40k x 100k), and this code is very slow.
The reason I am not using apply (instead of the for loop and sapply) is that apply would give me the transpose of the matrix I want: I want num_of_docs x num_of_words, but apply returns the transpose, and I would then have to spend more time computing the transpose and re-allocating it.
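(For illustration, a small hedged sketch of that transpose behaviour: apply over rows collects each row's result as a column, so a t() is needed to get back to docs x terms.)

m <- matrix(1:6, nrow = 2)             # 2 "docs" x 3 "terms"
out <- apply(m, 1, function(r) r * 2)  # result is 3 x 2: rows come back as columns
dim(out)                               # 3 2
dim(t(out))                            # 2 3, the orientation wanted here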
Any thoughts on making this faster?
Thanks much.
Edit: I have found that the suggestions below greatly speed up my code (besides making me feel stupid). Any suggestions on where I can learn to write "optimized" R code?
Edit 2: OK, so something is not right. Once I do s.vec[!is.finite(s.vec)] <- 0, every element of s.vec is set to 0. Just to reiterate, my original matrix is a sparse matrix containing integers, and this seems to be due to some quirk of the Matrix package I am using. When I do s.vec[which(s.vec==-Inf)] <- 0 instead, things work as expected. Thoughts?
As per my comment,
# Slightly larger example data
mymatrix <- matrix(runif(10000), nrow=10)
mymatrix[sample(10000, 100)] <- 0
tfmat <- matrix(nrow=10, ncol=1000)
ndocs <- 1

justin <- function() {
  s.vec <- ifelse(mymatrix==0, 0, (1 + log(mymatrix)) * log((1 + ndocs)/(1 + mymatrix)))
  tfmat.vec <- s.vec/sqrt(rowSums(s.vec^2))
}

joran <- function() {
  s.vec <- (1 + log(mymatrix)) * log((1 + ndocs)/(1 + mymatrix))
  s.vec[!is.finite(s.vec)] <- 0
  tfmat.vec <- s.vec/sqrt(rowSums(s.vec^2))
}

require(rbenchmark)
benchmark(justin(), joran(), replications = 1000)
      test replications elapsed relative user.self sys.self user.child sys.child
2  joran()         1000   0.940  1.00000     0.842    0.105          0         0
1 justin()         1000   2.786  2.96383     2.617    0.187          0         0
So it's around 3x faster.
I'm not sure what ndocs is, but ifelse is already vectorized, so you should be able to use the ifelse statement without walking through the matrix row by row and sapply-ing along each row. The same can be said for the final calculation.
However, you haven't given a complete example to replicate, so here is one:
mymatrix <- matrix(runif(100), nrow=10)
tfmat <- matrix(nrow=10, ncol=10)
ndocs <- 1

s.vec <- ifelse(mymatrix==0, 0, 1 + log(mymatrix)) * log((1 + ndocs)/(1 + mymatrix))

for (i in 1:dim(mymatrix)[[1]]) {
  r = mymatrix[i,]
  s = sapply(r, function(x) ifelse(x==0, 0, (1+log(x))*log((1+ndocs)/(1+x))))
  tfmat[i,] <- s
}
all.equal(s.vec, tfmat)
So the only piece missing is the rowSums in your final calculation:
tfmat.vec <- s.vec/sqrt(rowSums(s.vec^2))

for (i in 1:dim(mymatrix)[[1]]) {
  r = mymatrix[i,]
  s = sapply(r, function(x) ifelse(x==0, 0, (1+log(x))*log((1+ndocs)/(1+x))))
  tfmat[i,] = s/sqrt(sum(s^2))
}
all.equal(tfmat, tfmat.vec)
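Putting the pieces from both answers together, a hedged sketch of a fully vectorized version for a dense matrix (note the question's Edit 2: with a sparse Matrix object, masking on s == -Inf may be needed instead of !is.finite(s)):

# Sketch: vectorized tf-idf style transform with row-wise L2 normalization.
tfidf_dense <- function(m, ndocs) {
  s <- (1 + log(m)) * log((1 + ndocs) / (1 + m))
  s[!is.finite(s)] <- 0            # zero counts give -Inf; zero them out
  s / sqrt(rowSums(s^2))           # normalize each document (row)
}
tfmat.vec2 <- tfidf_dense(mymatrix, ndocs)
all.equal(tfmat.vec2, tfmat)       # should match the loop result above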

Applying nlminb to subsets of data (by index or label) and storing what it returns as a new data frame

I was wondering if anyone could kindly help me with this seemingly easy task. I'm using nlminb to conduct optimization and compute some statistics by index. Here's an example from the nlminb help page.
> x <- rnbinom(100, mu = 10, size = 10)
> hdev <- function(par) {
+ -sum(dnbinom(x, mu = par[1], size = par[2], log = TRUE))
+ }
> nlminb(c(9, 12), hdev)
$par
[1] 9.730000 5.954936
$objective
[1] 297.2074
$convergence
[1] 0
$message
[1] "relative convergence (4)"
$iterations
[1] 10
$evaluations
function gradient
      12       27
Suppose I generate random variables x, y, and z where z acts as an index (from 1 to 3).
> x <- rnbinom(100, mu = 10, size = 10)
> y <- rnbinom(100, mu = 10, size = 10)
> z <- rep(1:3, length=100)
> A <- cbind(x,y,z)
> hdev <- function(par) {
+ -sum(dnbinom(x+y, mu = par[1], size = par[2], log = TRUE))}
How can I apply nlminb(c(9, 12), hdev) to the data set by index z? In other words, I would like to compute nlminb(c(9, 12), hdev) for z=1, z=2, and z=3 separately. I tried by(A, z, function(A) nlminb(c(9,12), hdev)) and sparseby(A, z, function(A) nlminb(c(9,12), hdev)), but they return exactly the same values for each value of z.
I would like to turn each output into a new data frame so that it will become a 3X2 matrix.
[1] Z1_ANSWER_1 Z1_ANSWER_2
[2] Z2_ANSWER_1 Z2_ANSWER_2
[3] Z3_ANSWER_1 Z3_ANSWER_2
Since nlminb returns a summary of statistics, I needed to use something like CASEZ1 <- nlminb(c(9, 12), hdev)$par for each value of z and then use cbind to combine them. However, I would like to automate this process, as the real data I'm working on has many more categories than the z presented here.
If I'm not making myself clear, please let me know. I'll see if I can replicate the actual data set and functions I'm working on (I just don't have them on this computer).
Thank you very much in advance.
Let me try an approach
x <- rnbinom(100, mu = 10, size = 10)
y <- rnbinom(100, mu = 10, size = 10)
z <- rep(1:3, length=100)
A <- as.data.frame(cbind(x,y,z))
First, load the plyr library:
library(plyr)
The following code returns the results for each z
dlply(A, .(z), function(x) {
  hdev <- function(par, mydata) {
    -sum(dnbinom(mydata, mu = par[1], size = par[2], log = TRUE))
  }
  nlminb(c(9, 12), hdev, mydata = t(as.vector(x[1] + as.vector(x[2]))))
})
Now, with this one you will get a 3x2 data frame with the $par results:
ddply(A, .(z), function(x) {
  hdev <- function(par, mydata) {
    -sum(dnbinom(mydata, mu = par[1], size = par[2], log = TRUE))
  }
  res <- nlminb(c(9, 12), hdev, mydata = t(as.vector(x[1] + as.vector(x[2]))))
  return(res$par)
})
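(A hedged side note and sketch of my own: the earlier by(A, z, ...) attempt returned identical results for every z because hdev read the global x and y rather than the subset passed in; once the data are passed explicitly, as above, a base-R equivalent producing the 3x2 matrix of $par values could look like this.)

# Sketch: split A by z and fit each subset, collecting the $par vectors.
hdev2 <- function(par, mydata) -sum(dnbinom(mydata, mu = par[1], size = par[2], log = TRUE))
par.mat <- t(sapply(split(A, A$z),
                    function(d) nlminb(c(9, 12), hdev2, mydata = d$x + d$y)$par))
par.mat  # 3 x 2 matrix: one row per value of z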
