How to change the font family of a heatmap created by the pheatmap package

I would like to change the font family of the following graph to Times New Roman,
but couldn't figure out how. Any help will be greatly appreciated!
A reproducible example:
library(pheatmap)
d <- data.frame(c(runif(5, min = 0, max = 5)),
                c(runif(5, min = 0, max = 5)),
                c(runif(5, min = 0, max = 5)),
                row.names = c("gene1", "gene2", "gene3", "gene4", "gene5"))
colnames(d) <- c("Day 1", "Day 2", "Day 3")
pheatmap(d)
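A minimal sketch, assuming the pheatmap documentation's description of `...` ("graphical parameters for the text used in plot", passed to grid.text via gpar) holds for your installed version: pass fontfamily straight to pheatmap(). "serif" is the device's generic serif family, usually Times or a Times-like font. Asking for "Times New Roman" by name only works if the graphics device can find that font (on Windows, for example, after registering it with windowsFonts()). The seed and the matrix-based construction of d are just a compact stand-in for the question's data.

```r
library(pheatmap)

# Same shape of data as the question: 5 genes x 3 days of random values
set.seed(1)
d <- data.frame(matrix(runif(15, min = 0, max = 5), nrow = 5),
                row.names = paste0("gene", 1:5))
colnames(d) <- c("Day 1", "Day 2", "Day 3")

# Arguments not matched by pheatmap() travel through `...` to grid's gpar(),
# so fontfamily is applied to the text grobs (row and column labels etc.)
p <- pheatmap(d, fontfamily = "serif")
```

The returned object keeps the drawn gtable in p$gtable, which can be redrawn with grid::grid.draw() or saved via the filename argument of pheatmap().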

Related

Line search fails when training linear SVM with caret

I am trying to train a linear SVM for binary text classification while tuning its parameters with 10-fold CV.
None of the solutions suggested in other threads work. I have already removed all NAs, NaNs and Infs and balanced the dataset by downsampling, but the model still returns NAs and fails in the line search. I am stuck and would appreciate the community's help.
The data has 2,099 observations of 926 variables, whose values are mostly 0, 1, 2 or 3.
dat_SetimentAnalysis <- c(
This is my code:
library(caret)
set.seed(335)
trainIndex <- createDataPartition(dat_SentimentAnalysis$Usefulness, p = .75,
                                  list = FALSE,
                                  times = 1)
train <- dat_SentimentAnalysis[ trainIndex,]
test  <- dat_SentimentAnalysis[-trainIndex,]
# check the class distribution
table(train$Usefulness)
# downsample the training set
train <- downSample(train, as.factor(train$Usefulness))
# check the distribution again
table(train$Usefulness)
train <- na.omit(train) # no NA values detected
# separate features and outcome
x_train <- train[2:926]
y_train <- as.factor(train$Usefulness)
x_test  <- test[2:926]
y_test  <- as.factor(test$Usefulness)
sum(is.na(x_train))
sum(is.na(y_train))
# tune hyperparameters for the SVM
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 3,
                           search = "grid",
                           classProbs = TRUE,
                           savePredictions = TRUE)
model <- caret::train(x = x_train,
                      y = y_train,
                      method = "svmLinear",
                      trControl = fitControl,
                      tunegrid=data.frame(C=c(0.25, 0.5, 1,5,8,12,100)))
Does anybody have an idea what could be wrong? When I skip tuning I do get an SVM, albeit a very poorly performing one (around 52% accuracy). So maybe something in the tuning call is wrong?
Thank you very much for your help!
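A sketch of the tuning call with two details checked, under stated assumptions. First, caret's argument is spelled tuneGrid, and R argument names are case-sensitive, so the question's `tunegrid` is not used as the tuning grid: it falls through `...` to the underlying fitting function while caret uses its default grid. Second, with classProbs = TRUE caret requires class levels that are valid R variable names, so if Usefulness is coded 0/1 the levels need renaming (make.names() gives "X0" and "X1"). Whether either point causes the line search failure is not established here. The iris-based data is a hypothetical stand-in for the question's data, and kernlab must be installed for method = "svmLinear".

```r
library(caret)

# Hypothetical stand-in data: two iris species relabelled 0/1 to mimic a
# numeric Usefulness column
two     <- droplevels(subset(iris, Species != "setosa"))
x_train <- two[, 1:4]
y_raw   <- ifelse(two$Species == "virginica", 1, 0)

# classProbs = TRUE needs class levels that are valid R names;
# "0" and "1" are not, so make.names() turns them into "X0" and "X1"
y_train <- factor(make.names(y_raw))

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                           search = "grid", classProbs = TRUE,
                           savePredictions = TRUE)

# Capital G: tuneGrid is the argument caret reads as the tuning grid
set.seed(335)
model <- caret::train(x = x_train, y = y_train, method = "svmLinear",
                      trControl = fitControl,
                      tuneGrid = data.frame(C = c(0.25, 0.5, 1, 5, 8, 12, 100)))
model$results[, c("C", "Accuracy")]
```

If model$results lists only a single C value, the grid was not picked up.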

How to add an inequality constraint to my R function?

I am working on a project that compares different portfolio constructions over a universe of assets. I work with 22 assets and recalibrate the portfolio every 90 days. Because the allocation changes every period, the weight constraints are imposed as penalty terms in the objective (see code).
I am currently implementing a construction based on independent components. My objective is to minimize the modified value at risk (MVaR) computed on these components (see code below).
My code runs correctly, and my functions "MVaR.IC.port" and "MVaR.cm" work well. However, I can only implement this model when short selling is allowed. I would now like a long-only version, i.e. one where the weight vector w contains only elements >= 0. Concretely, I want every element of w, computed as "w <- t(w.IC)%*%a$A" in my code, to be >= 0.
Does anyone know how to do this? Thank you in advance.
The weights stored in w.out.MVaR.IC.22 are the results that must be non-negative. I also constrain the weights to sum to 1 (the investor allocates 100% of his wealth).
Thomas
PS: train and test are my rolling windows: I calibrate my models on 'train' (in sample) and apply them to 'test' (out of sample) to analyse their performance.
########################################
######### MVar on IC with CM #########
########################################
library(DEoptim)
lower = rep(-5,k)
upper = rep(5,k)
#Set up objective function and constraint
MVaR.IC.cm.port <- function(S, weights, alpha, MixingMatrix)
{
  obj <- MVaR(S, weights, alpha)
  w.ICA <- t(weights)%*%MixingMatrix
  weight.penalty = abs(1000*(1-sum(w.ICA)))
  down.weight.penalty = 1000*sum(w.ICA[w.ICA > 1])
  up.weight.penalty = 1000*abs(sum(w.ICA[w.ICA < -1]))
  return(obj + weight.penalty + down.weight.penalty + up.weight.penalty)
}
#Out of sample return portfolio computation
ret.out.MVaR.IC.cm.22 <- c()
w.out.MVaR.IC.cm.22 <- matrix(ncol = n, nrow = 10)
for (i in 0:9) {
  train <- as.matrix(portfolioReturns.new[((1+i*90):(8*90+i*90)),])
  test <- as.matrix(portfolioReturns.new[(1+8*90+i*90):(9*90+i*90),])
  a <- myfastICA(train, k, alg.typ = "parallel", fun = "logcosh", alpha = 1,
                 method = "R", row.norm = FALSE, maxit = 2000,
                 tol = 0.0000000001, verbose = TRUE)
  x <- DEoptim(MVaR.IC.cm.port,lower,upper,
               control=list(NP=(10*k),F=0.8,CR=0.9, trace=50),
               S=a$S, alpha = alpha, MixingMatrix = a$A)
  w.IC <- matrix(x$optim$bestmem, ncol=1)
  w <- t(w.IC)%*%a$A
  for (j in 1:ncol(train)){
    w.out.MVaR.IC.cm.22[(i+1),j] <- w[j]
  }
  ret.out.MVaR.IC.cm.22 <- rbind(ret.out.MVaR.IC.cm.22, test %*% t(w))
}
w.out.MVaR.IC.cm.22
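DEoptim searches over the IC weights within the ±5 bounds, and those bounds do not constrain the asset weights w = t(w.IC) %*% a$A directly. One option, sketched here with the same soft-penalty approach the code already uses, is to penalise negative asset weights inside the objective. The names long.only.penalty and MVaR.IC.cm.port.long are hypothetical, the factor 1000 simply mirrors the existing penalties, and MVaR() is the question's own function, which is not defined here.

```r
# Hypothetical helper: penalty for breaking the long-only and full-investment
# constraints, measured on the asset weights w = t(weights) %*% MixingMatrix
long.only.penalty <- function(weights, MixingMatrix, scale = 1000) {
  w <- as.vector(t(weights) %*% MixingMatrix)  # IC weights -> asset weights
  budget <- abs(1 - sum(w))                    # weights should sum to 1
  short  <- sum(-w[w < 0])                     # total size of any short positions
  scale * (budget + short)
}

# Sketch of the objective with the long-only term; MVaR() is the user's
# own function and is only called when DEoptim evaluates this objective
MVaR.IC.cm.port.long <- function(S, weights, alpha, MixingMatrix) {
  obj <- MVaR(S, weights, alpha)
  obj + long.only.penalty(weights, MixingMatrix)
}
```

MVaR.IC.cm.port.long would be passed to DEoptim in place of MVaR.IC.cm.port. Because the penalty is soft, tiny negative weights can survive; clipping with pmax(w, 0) and rescaling with w / sum(w) after optimisation is a common clean-up. With non-negative weights that sum to 1, no weight can exceed 1, so the existing "> 1" penalty becomes redundant.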

How to request a value from a column by filtering a range

Column C contains country names and the range E:N contains population figures for different years.
I am trying to find the country with the highest population ever recorded (i.e. the maximum over E2:N43).
I tried the queries below without success:
=QUERY(A1:N43,"select C WHERE '"&MAX(E2:N43)&"' IN '"&E2:N43)
=QUERY(A1:N43,"select C WHERE '"&MAX(E2:N43)&"' = '"&E2:N43&"' ")
What's wrong?
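To get the country with the minimum value:
=FLATTEN(INDEX(SORT(SPLIT(FLATTEN(FILTER(E2:N&"×"&C2:C, C2:C<>"")), "×"), 1, 1), 1))
To get the country with the maximum value:
=FLATTEN(INDEX(SORT(SPLIT(FLATTEN(FILTER(E2:N&"×"&C2:C, C2:C<>"")), "×"), 1, 0), 1))
Each formula joins every value in E2:N with its country name (skipping rows where C is empty), flattens the pairs into one column, splits them back into value and name, sorts by value (ascending for the minimum, descending for the maximum) and returns the first row as a column: the extreme value followed by its country.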
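The formula's logic (pair each value with its country, flatten into one column, sort by value, take the first row) can be illustrated in R, the language used elsewhere on this page. The data frame pop is hypothetical stand-in data for column C and the year columns E:N.

```r
# Hypothetical data: one "country" column plus one column per year
pop <- data.frame(
  country = c("A", "B", "C"),
  y2000   = c(10, 50, 30),
  y2010   = c(20, 40, 60)
)
vals <- as.matrix(pop[, -1])

# Pair every cell with its country and flatten to one column
long <- data.frame(
  value   = as.vector(vals),                      # column-major order
  country = rep(pop$country, times = ncol(vals))  # matches that order
)

# Sort by value and keep the first row, as SORT(..., 1, 0/1) + INDEX(..., 1) do
top    <- long[order(long$value, decreasing = TRUE), ][1, ]
bottom <- long[order(long$value), ][1, ]
as.character(top$country)     # "C" (value 60)
as.character(bottom$country)  # "A" (value 10)
```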

R xgboost xgb.cv pred values: best iteration or final iteration?

I am using the xgb.cv function to grid-search the best hyperparameters in the R implementation of xgboost. When prediction is set to TRUE, it returns the predictions for the out-of-fold observations. Presuming you are using early stopping, do the predictions correspond to the best iteration or to the final iteration?
The CV predictions correspond to the best iteration. You can see this by using a 'strict' early_stopping_rounds value and then comparing the CV predictions with those of models trained with the 'best' number of iterations and with the 'final' number of iterations, e.g.:
# Load minimum reproducible example
library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
test <- agaricus.test
dtest <- xgb.DMatrix(test$data, label=test$label)
# Perform cross validation with a 'strict' early_stopping
cv <- xgb.cv(data = train$data, label = train$label, nfold = 5, max_depth = 2,
eta = 1, nthread = 4, nrounds = 10, objective = "binary:logistic",
prediction = TRUE, early_stopping_rounds = 1)
# Check which round was the best iteration (the one that initiated the early stopping)
print(cv$best_iteration)
[1] 3
# Get the predictions
head(cv$pred)
[1] 0.84574515 0.15447612 0.15390711 0.84502697 0.09661318 0.15447612
# Train a model using 3 rounds (corresponds to best iteration)
trained_model <- xgb.train(data = dtrain, max_depth = 2,
eta = 1, nthread = 4, nrounds = 3,
watchlist = list(train = dtrain, eval = dtrain),
objective = "binary:logistic")
# Get predictions
head(predict(trained_model, dtrain))
[1] 0.84625006 0.15353635 0.15353635 0.84625006 0.09530514 0.15353635
# Train a model using 10 rounds (corresponds to final iteration)
trained_model <- xgb.train(data = dtrain, max_depth = 2,
eta = 1, nthread = 4, nrounds = 10,
watchlist = list(train = dtrain, eval = dtrain),
objective = "binary:logistic")
head(predict(trained_model, dtrain))
[1] 0.9884467125 0.0123147098 0.0050151693 0.9884467125 0.0008781737 0.0123147098
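So the CV predictions are approximately the same as the predictions made with the 'best' number of iterations, not the 'final' one. They are not identical because each CV prediction comes from a fold model trained on the other 4/5 of the data, whereas the comparison models are trained on all of dtrain.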
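To compare over all rows rather than eyeballing head(), one can compute the mean absolute difference between cv$pred and each full-data model's predictions. This sketch assumes the xgboost 1.x R interface used in the example above (newer major versions changed some xgb.cv/xgb.train arguments and where the best iteration is stored); the seed and nthread = 2 are arbitrary choices.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 2)

# Cross-validation with out-of-fold predictions and strict early stopping
set.seed(1)
cv <- xgb.cv(params = params, data = dtrain, nrounds = 10, nfold = 5,
             prediction = TRUE, early_stopping_rounds = 1, verbose = 0)

# Full-data models with the best and with the final number of rounds
best  <- xgb.train(params = params, data = dtrain, nrounds = cv$best_iteration)
final <- xgb.train(params = params, data = dtrain, nrounds = 10)

# Mean absolute gap between the CV predictions and each model's predictions
gap_best  <- mean(abs(cv$pred - predict(best,  dtrain)))
gap_final <- mean(abs(cv$pred - predict(final, dtrain)))
c(best = gap_best, final = gap_final)
```

If the CV predictions do come from the best iteration, gap_best should be noticeably smaller than gap_final whenever early stopping triggers before round 10.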

Title in plot_grid in RStudio

I am using the following code to set up 9 plots in one figure and to save it:
library(cowplot)
plotpm2.5 <- plot_grid(p1, p2, p3, p4, p5, p6, p7, p8, p9,
                       ncol = 3, nrow = 3, align = "h")
save_plot("plotpm2.5.png", plotpm2.5,
ncol = 3,
nrow = 3,
base_aspect_ratio = 1.5)
How can I add a main title to this?
I'm not entirely sure about this, but here is something that might help. For example:
title <- ggdraw() + draw_label("label", fontface = 'bold')
plot_grid(title, plotpm2.5, ncol = 1, rel_heights = c(0.1, 1))
rel_heights sets the relative heights of the stacked rows: with c(0.1, 1), the title row is one tenth as tall as the plot grid. Play with the arguments inside plot_grid until you get a title that suits you.
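A self-contained version of that approach, as a sketch: the nine mtcars scatter plots are hypothetical stand-ins for p1 to p9, and the title text is made up. ggdraw() provides an empty canvas, draw_label() writes the title on it, and a one-column plot_grid() stacks the title above the 3 x 3 grid.

```r
library(ggplot2)
library(cowplot)

# Nine placeholder plots standing in for p1..p9 (hypothetical data: mtcars)
plots <- lapply(1:9, function(i) {
  ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle(paste("Panel", i))
})
grid9 <- plot_grid(plotlist = plots, ncol = 3, nrow = 3, align = "h")

# Title row: empty canvas with a bold label in the middle
title <- ggdraw() + draw_label("PM2.5 overview", fontface = "bold")

# Stack title above the grid; the title row gets 0.1 of the grid's height
final <- plot_grid(title, grid9, ncol = 1, rel_heights = c(0.1, 1))
```

To save the titled figure, pass final instead of plotpm2.5 to save_plot(). In cowplot 1.0 and later the aspect-ratio argument is named base_asp; base_aspect_ratio is the older name.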
