Errors in ARIMA models - arima

I want to fetch the MAPE of an ARIMA model after fitting it. Below is the summary of the fitted model:
Series: train
ARIMA(1,1,1)

Coefficients:
         ar1     ma1
      0.4472  -0.925
s.e.  0.0310   0.014

sigma^2 estimated as 211188552:  log likelihood=-14820.68
AIC=29647.36   AICc=29647.38   BIC=29662.98

Training set error measures:
                   ME     RMSE      MAE       MPE    MAPE      MASE         ACF1
Training set 413.1383 14516.15 9886.802 -17.77737 27.93304 0.9202813 -0.008861643

 num [1, 1:7] 413.1 14516.1 9886.8 -17.8 27.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr "Training set"
  ..$ : chr [1:7] "ME" "RMSE" "MAE" "MPE" ...

Use this code:
mape_error <- accuracy(fit)
mape_error <- data.frame(mape_error)
mape <- mape_error$MAPE
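Since accuracy() returns a named matrix (that is what the str() output above shows), you can also pull the value by column name directly, with no data.frame conversion:

# equivalent one-liner: index the accuracy matrix by column name
mape <- accuracy(fit)[, "MAPE"]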

library(forecast)
# This gets your point predictions; decide on a horizon length h
pred <- forecast(fit, h = 3)$mean
# pred and actual (same length as h) have to be the same length
MAPE <- accuracy(pred, actual)[5]
RMSE <- accuracy(pred, actual)[2]
# etc.
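For reference, MAPE is just the mean absolute percentage deviation, so with the pred and actual vectors above (hypothetical names from this answer) you can also compute it from the definition:

# mean absolute percentage error, in percent
MAPE <- mean(abs((actual - pred) / actual)) * 100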

Related

Stargazer blank fields for ergm fields

I get blank fields in my stargazer summary of my ERGM model. I tried setting the summary option to both TRUE and FALSE; neither worked.
# HERE IS MY ERGM SUMMARY OUTPUT
> summary(ergm68)
Call:
ergm(formula = net_68 ~ edges + gwb1degree(fixed = T, decay = 0.4) +
    b1factor("org", base = 1, levels = "L") + b1factor("org", levels = "P"),
    bipartite = T)

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -1.2388     1.2296      0  -1.007    0.314
gwb1deg.fixed.0.4   1.0210     1.3756      0   0.742    0.458
b1factor.org.L      1.6768     1.0500      0   1.597    0.110
b1factor.org.P      0.3234     1.0186      0   0.317    0.751

     Null Deviance: 95.65  on 69  degrees of freedom
 Residual Deviance: 88.26  on 65  degrees of freedom

AIC: 96.26  BIC: 105.2  (Smaller is better. MC Std. Err. = 0.03973)
> stargazer(ergm68, type="text", summary=T, digits=1)
# Stargazer summary set to TRUE
> stargazer(ergm79, type="text", summary=T, digits=1)
===============================================
Dependent variable:
---------------------------
net_79
-----------------------------------------------
edges
gwb1deg.fixed.0.4
b1factor.org.L
b1factor.org.P
-----------------------------------------------
Akaike Inf. Crit. 399.0
Bayesian Inf. Crit. 413.8
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
>
# Stargazer summary set to FALSE
> stargazer(ergm79, type="text", summary=F, digits=1)
===============================================
Dependent variable:
---------------------------
net_79
-----------------------------------------------
edges
gwb1deg.fixed.0.4
b1factor.org.L
b1factor.org.P
-----------------------------------------------
Akaike Inf. Crit. 399.0
Bayesian Inf. Crit. 413.8
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
>
No luck getting stargazer to generate a complete table from my ERGM model output. I'm not sure if I've missed a setting; it seemed to work before without any trouble.
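Two workarounds may be worth trying here. The texreg package has a native ergm method, and stargazer's coef/se/p arguments accept user-supplied vectors; whether summary(ergm68)$coefficients is the right accessor depends on your ergm version, so treat the sketch below as an assumption based on the summary output shown above.

# Workaround 1: texreg renders ergm objects directly
library(texreg)
screenreg(ergm68, digits = 1)

# Workaround 2: hand stargazer the numbers explicitly via coef/se/p
s <- summary(ergm68)$coefficients   # assumed accessor; matches the table above
stargazer(ergm68, type = "text", digits = 1,
          coef = list(s[, "Estimate"]),
          se   = list(s[, "Std. Error"]),
          p    = list(s[, "Pr(>|z|)"]))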

Line search fails when training linear SVM with caret

I am trying to train a linear SVM while tuning its parameters with 10-fold CV for binary text classification.
None of the solutions provided in other threads work: I already removed all NAs, NaNs, and Infs and balanced my dataset by downsampling, but the model still returns NAs and fails in the line search, so I need the community's help as I am kind of stuck.
The data has 2099 observations of 926 variables, mostly 0s, 1s, 2s, or 3s.
This is my code:
library(caret)

set.seed(335)
trainIndex <- createDataPartition(dat_SentimentAnalysis$Usefulness, p = .75,
                                  list = FALSE,
                                  times = 1)
train <- dat_SentimentAnalysis[ trainIndex, ]
test  <- dat_SentimentAnalysis[-trainIndex, ]

# check the class distribution
table(train$Usefulness)
# downsample the training set
train <- downSample(train, as.factor(train$Usefulness))
# check the distribution again
table(train$Usefulness)
train <- na.omit(train)  # no NA values detected

# separate features and response
x_train <- train[2:926]
y_train <- as.factor(train$Usefulness)
x_test  <- test[2:926]
y_test  <- as.factor(test$Usefulness)
sum(is.na(x_train))
sum(is.na(y_train))

# tune hyperparameters for the SVM
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 3,
                           search = "grid",
                           classProbs = TRUE,
                           savePredictions = TRUE)
model <- caret::train(x = x_train,
                      y = y_train,
                      method = "svmLinear",
                      trControl = fitControl,
                      # the argument is tuneGrid; a lower-case "tunegrid" is
                      # silently swallowed by ... and ignored
                      tuneGrid = data.frame(C = c(0.25, 0.5, 1, 5, 8, 12, 100)))
Does anybody have an idea what could be wrong? When I do not perform tuning I get a very poorly performing SVM with around 52% accuracy, but at least I get one. So maybe something in the tuning setup is wrong?
Thank you very much for your help!
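One caret pitfall worth checking, since the class labels aren't shown in the question: with classProbs = TRUE, the outcome's factor levels must be valid R variable names, so numeric labels such as "0"/"1" make the class-probability machinery fail. A minimal sketch of that fix, assuming y_train and y_test as defined above:

# rename numeric class levels to valid R names before calling train()
levels(y_train) <- make.names(levels(y_train))  # e.g. "0"/"1" become "X0"/"X1"
levels(y_test)  <- make.names(levels(y_test))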

Microarray DEG analysis scatterplot

I have found that my selected gene, probe ID 201667_at, is differentially expressed between WDLPS and DDLPS tumour tissue samples after performing microarray DEG analysis.
Instead of just a p-value in a table format:
Probe I.D   201667_at
logFC       10.8205874181535
AveExpr     10.6925705768407
t           82.8808890739766
P.Value     3.10189446528995e-88
adj.P.Val   3.10189446528995e-88
B           191.589248589131
I have decided to present the data as a scatter plot / MDS plot (with error bars), using the expression values of this specific gene in the two tumour types (40 vs 52 samples) to show that it is differentially expressed, so 92 points in total.
Does anyone know how I might do this, given that I used these commands for the microarray differential expression analysis?
library("arrayQualityMetrics")
library(GEOquery)
library(oligo)
library(Biobase)
library(affy)
library("splitstackshape")
library("tidyr")
library("dplyr")
celFiles <- list.celfiles()
affyRaw <- oligo::rma(affyraw)
eset <- oligo::rma(affyRaw)
library(limma)
pData(eset)
Groups <- c("DDLPS", "DDLPS", "WDLPS", "WDLPS")
design <- model.matrix(~factor(Groups))
colnames(design) <- c("DDLPS", "DDLPSvsWDLPS")
fit <- lmFit(eset, design)
fit <- eBayes(fit)
option (digits =2)
res <- topTable (fit, number = Inf, adjust.method = "none", coef = 1)
write.table(res, "diff_exp.txt", sep= "\t)
require(hgu133a.db)
annotLookup <- select(hgu133a.db, keys = probes,
columns = c('PROBEID', 'ENSEMBL', 'SYMBOL'))
Thank you
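One way to sketch that plot, assuming eset now holds the normalized expression values for all 92 arrays and groups is the matching WDLPS/DDLPS factor (both names are assumptions; the code above only shows four samples):

library(Biobase)
library(ggplot2)

# expression values for the probe of interest, one per sample
df <- data.frame(expression = exprs(eset)["201667_at", ],
                 group = groups)  # assumed factor: "WDLPS"/"DDLPS" per sample

# one jittered point per sample, with the group mean +/- standard error
ggplot(df, aes(x = group, y = expression)) +
  geom_jitter(width = 0.1) +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               colour = "red", width = 0.2) +
  labs(x = "Tumour type", y = "log2 expression (201667_at)")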

Rcpp instead of mapply in model validation for many subsets

Let's say that we have specified N train datasets (an 80:20 split) and we want to retrieve, for each train dataset, a two-element list with the p-values and coefficients from a glm model. The code reproducing this example is as follows:
library(parallel)
library(caret)

# prepare dataset
data(iris)
iris <- iris[!iris$Species == "setosa", ]

# create validation folds
set.seed(12345)
folds <- createDataPartition(y = iris$Species, times = 100, p = 0.8, list = FALSE)

# glm model expression
model.expr.tr <- expression(glm(formula = Species ~ Sepal.Length,
                                data = dtr,
                                family = binomial(link = "logit")))

# glm elements that will be validated
val_part <- list(coefs = expression(summary(eval(model.expr.tr))$coefficients[, 1]),
                 pvals = expression(summary(eval(model.expr.tr))$coefficients[, 4]))

# lapply with mapply for validation results
val_results <- lapply(val_part, function(x) {
  mapply(function(i) {
    # folds holds row positions, so index rows directly rather than matching
    # the (shifted) rownames of the subsetted iris
    dtr <- iris[folds[, i], ]
    eval(x)
  },
  i = 1:100)
})
As you are aware, the longest part is running the model summary over all of the train datasets, especially if we choose more than 100 of them. In your opinion, is there any way to speed up this process? Of course I am aware of the parLapply / mcmapply options, but what about some kind of Rcpp speed-up in this case? Any suggestions?
Thanks.
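One observation before reaching for Rcpp: as written, eval(model.expr.tr) re-fits the glm once per element of val_part, so every fold is fitted twice. Fitting once per fold and pulling both quantities from a single summary() halves the work before any parallelism, and since glm.fit already runs compiled C code, Rcpp is unlikely to gain much on top. A sketch of that restructuring:

library(parallel)

# fit each glm once per fold; take coefficients and p-values from one summary()
# (mclapply forks, so set mc.cores = 1 on Windows)
val_results <- mclapply(seq_len(ncol(folds)), function(i) {
  dtr <- iris[folds[, i], ]
  s <- summary(glm(Species ~ Sepal.Length,
                   data = dtr,
                   family = binomial(link = "logit")))
  list(coefs = s$coefficients[, 1],
       pvals = s$coefficients[, 4])
}, mc.cores = max(1L, detectCores() - 1L))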

Minimization of Nonlinear equations

I am relatively new to R, so I apologize if my question isn't expressed well, or if there is excessive detail. What I'm doing here is taking naturally occurring gas isotopes C12 and C13 that are produced at a linear rate (P) at respective fractions (F12 and F13) that sum to 1. The two isotopic gases are then consumed at rates k12 for C12 and k13 for C13. I then want to solve for P and k12 using a minimization function.
The equations are:
Eqn 1: conc.12 = (F12*P)/k12 - ((F12*P)/k12 - c12zero)*exp(-k12*(t-t0))
Eqn 2: conc.13 = (F13*P)/k13 - ((F13*P)/k13 - c13zero)*exp(-k13*(t-t0))
Eqn 3: sum of squared errors = sum(((conc.12-c12meas)/0.07)^2) + sum(((conc.13-c13meas)/0.07)^2)
conc.12 and conc.13 are the estimated concentrations of two isotopes at time t
c12meas and c13meas are the measured concentrations of two isotopes at time t
t0 is the initial time point
F12 and F13 are fractions that sum to 1
k12 and k13 are exponential decay coefficients for the two isotopes, with k13 = k12/1.06
P is a linear production rate of both 12CH4 and 13CH4
The data for a toy data set with known approximate parameters follow:
Time c12meas c13meas
1 109.7000 19.35660
2 118.9150 18.74356
3 127.6693 18.15943
4 135.9858 17.60285
5 143.8865 17.07253
6 151.3922 16.56722
7 158.5226 16.08575
8 165.2964 15.62698
9 171.7316 15.18986
10 177.8450 14.77336
11 183.6528 14.37650
12 189.1701 13.99837
13 194.4116 13.63807
14 199.3911 13.29476
15 204.1215 12.96765
16 208.6154 12.65597
17 212.8847 12.35899
18 216.9404 12.07602
19 220.7934 11.80639
20 224.4537 11.54949
I first tried to solve these equations with optim with the following code:
error.func <- function(k12, P) {
  t  <- Time
  t0 <- Time[1]
  c12zero <- c12meas[1]
  c13zero <- c13meas[1]
  k13 <- k12 / 1.06
  F12 <- 0.98
  F13 <- 1 - F12
  ratio.12 <- (F12 * P) / k12
  exp.12   <- exp(-k12 * (t - t0))
  conc.12  <- ratio.12 - ((ratio.12 - c12zero) * exp.12)
  ratio.13 <- (F13 * P) / k13
  exp.13   <- exp(-k13 * (t - t0))
  conc.13  <- ratio.13 - ((ratio.13 - c13zero) * exp.13)
  # keep the "+" at the end of the line; otherwise the second sum is parsed
  # as a separate expression and silently drops out of the error
  error <- sum(((conc.12 - c12meas) / 0.07)^2) +
    sum(((conc.13 - c13meas) / 0.07)^2)
  return(error)
}
fit.model <- optim(k12 = .05, P = 15, error.func)
This is the error code in R:
"Error in optim(k12 = 0.05, P = 15, error.func) :
cannot coerce type 'closure' to vector of type 'double'
In addition: Warning message:
In optim(k12 = 0.05, P = 15, error.func) :
one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly"
My interpretation of this is that the optim function can't solve multiple equations at the same time, so I then tried the solnp function.
isotopes2 <- function(x) {
  t  <- Time
  t0 <- Time[1]
  c12zero <- c12meas[1]
  c13zero <- c13meas[1]
  k13 <- x[1] / 1.06
  F12 <- 0.98
  F13 <- 1 - F12
  ratio.12 <- (F12 * x[2]) / x[1]
  exp.12   <- exp(-x[1] * (t - t0))
  conc.12  <- ratio.12 - ((ratio.12 - c12zero) * exp.12)
  ratio.13 <- (F13 * x[2]) / k13
  exp.13   <- exp(-k13 * (t - t0))
  conc.13  <- ratio.13 - ((ratio.13 - c13zero) * exp.13)
}
error.func <- function(x) {
  t  <- Time
  t0 <- Time[1]
  c12zero <- c12meas[1]
  c13zero <- c13meas[1]
  k13 <- x[1] / 1.06
  F12 <- 0.98
  F13 <- 1 - F12
  ratio.12 <- (F12 * x[2]) / x[1]
  exp.12   <- exp(-x[1] * (t - t0))
  conc.12  <- ratio.12 - ((ratio.12 - c12zero) * exp.12)
  ratio.13 <- (F13 * x[2]) / k13
  exp.13   <- exp(-k13 * (t - t0))
  conc.13  <- ratio.13 - ((ratio.13 - c13zero) * exp.13)
  error <- sum(((conc.12 - c12meas) / 0.07)^2) +
    sum(((conc.13 - c13meas) / 0.07)^2)
  return(error)
}
x0 <- c(0.05, 15)
constraint <- c(0)
fit <- solnp(x0, fun = isotopes2, eqfun = error.func, eqB = 0)
I received the following error message:
"Error:
solnp-->error: objective function returns value of length greater than 1!
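For what it's worth, here is a sketch of the calling convention optim() actually expects: a single numeric parameter vector par and an objective function of that vector (a rewrite of error.func under those rules, assuming Time, c12meas, and c13meas from the toy data above):

error.func <- function(par) {
  k12 <- par[1]
  P   <- par[2]
  k13 <- k12 / 1.06
  F12 <- 0.98
  F13 <- 1 - F12
  t0  <- Time[1]
  c12zero <- c12meas[1]
  c13zero <- c13meas[1]
  conc.12 <- (F12 * P) / k12 - ((F12 * P) / k12 - c12zero) * exp(-k12 * (Time - t0))
  conc.13 <- (F13 * P) / k13 - ((F13 * P) / k13 - c13zero) * exp(-k13 * (Time - t0))
  sum(((conc.12 - c12meas) / 0.07)^2) + sum(((conc.13 - c13meas) / 0.07)^2)
}
fit.model <- optim(par = c(k12 = 0.05, P = 15), fn = error.func)
fit.model$par  # fitted k12 and P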
