I am trying to set up RcppArmadillo in my windows system with Rstudio. I have successfully installed RcppArmadillo with the command
install.packages("RcppArmadillo")
in R console.
But when I try to compile a c++ code with RcppArmadillo dependency I get a error like
g++ -m64 -I"C:/PROGRA~1/R/R-30~1.3/include" -DNDEBUG -I"C:/PROGRA~1/R/R-30~1.3/library/Rcpp/include" -I"d:/RCompile/CRANpkg/extralibs64/local/include" -O2 -Wall -mtune=core2 -c colrowStat.cpp -o colrowStat.o colrowStat.cpp:5:26: fatal error: RcppArmadillo.h: No such file or directory compilation terminated. make: *** [colrowStat.o] Error 1 Warning message: running command 'make -f "C:/PROGRA~1/R/R-30~1.3/etc/x64/Makeconf" -f "C:/PROGRA~1/R/R-30~1.3/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="sourceCpp_38187.dll" WIN=64 TCLBIN=64 OBJECTS="colrowStat.o"' had status 2
But the header files are available in path_to_my_documents/R/win-libraries/3.0/RcppArmadillo/Include
I think the include path for compilation dose not have this path. I don't how to add this folder to the path. I greatly appreciate any help with this problem.
You are doing it wrong. There are many ways to do it, and we documented several of them. What you do here is not one of them.
Try this instead and go from there:
R> library(Rcpp)
R> cppFunction("arma::mat op(arma::vec x) { return(x*x.t()); }",
+ depends="RcppArmadillo")
R> op(1:2)
[,1] [,2]
[1,] 1 2
[2,] 2 4
R>
This is one of the basic examples: take a vector, multiply it by its transpose and return the result outer product matrix.
What you ultimately want is a package, and for that you could do much worse than starting by RcppArmadillo.package.skeleton().
Your question is short on details, but if you are on a Windows machine and are using RStudio, then here is a fully reproducible example on how to use RcppArmadillo without using the inline package, which is not ideal except for very short functions.
As Dirk has pointed out, this advice is available elsewhere -- the Rcpp* ecosystem is bizarrely well-documented, but this might help a novice.
0. Preliminaries:
You should have the following installed:
Rtools
R package devtools
R package Rcpp
R package RcppArmadillo
1. C++ code:
The example is a simple one of computing the OLS estimator for a linear regression model. Here is what the C++ file, with one function (fnLinRegRcpp) that takes the design matrices as inputs and produces the OLS coefficient estimates and the model residuals as an Rcpp List looks like:
// LinearRegression.cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma; // use the Armadillo library for matrix computations
using namespace Rcpp;
// [[Rcpp::export]]
List fnLinRegRcpp(vec vY, mat mX) {
// compute the OLS estimator & model residuals
vec vBeta = solve(mX.t()*mX, mX.t()*vY);
vec vResid = vY - mX * vBeta;
// construct the return object
List ret;
ret["beta"] = vBeta;
ret["resid"] = vResid;
return ret;
}
// END
Note the use of Rcpp attributes:
// [[Rcpp::depends(RcppArmadillo)]]
to indicate library dependencies on the Armadillo library.
2. R code
Here is an example of the compilation of the C++ code using the sourceCpp function, together with an example of the use of the function, and a comparison of the output to the built-in lm.fit function.
# LinearRegression.R
library(devtools)
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("code/LinearRegression.cpp",
showOutput = TRUE,
rebuild = FALSE)
# generate some sample data
iK = 4
iN = 100
mX = cbind(1, matrix(rnorm(iK*iN), iN, iK))
vBeta0 = c(2, 3.5, 0.11, 6.33, 23)
vY = rnorm(iN, mean = mX %*% vBeta0)
# test the function
linReg1 = fnLinRegRcpp(vY, mX)
linReg1$beta # coefficient estimates
# compare the results to the built-in lm.fit function
lm.fit(y = vY, x = mX)$coef # coefficient estimates
# END
Related
I'm trying to replicate values from pine script cci() function in golang. I've found this lib https://github.com/markcheno/go-talib/blob/master/talib.go#L1821
but it gives totally different values than cci function does
pseudo code how do I use the lib
cci := talib.Cci(latest14CandlesHighArray, latest14CandlesLowArray, latest14CandlesCloseArray, 14)
The lib gives me the following data
Timestamp: 2021-05-22 18:59:27.675, Symbol: BTCUSDT, Interval: 5m, Open: 38193.78000000, Close: 38122.16000000, High: 38283.55000000, Low: 38067.92000000, StartTime: 2021-05-22 18:55:00.000, EndTime: 2021-05-22 18:59:59.999, Sma: 38091.41020000, Cci0: -16.63898084, Cci1: -53.92565811,
While current cci values on TradingView are: cci0 - -136, cci1 - -49
could anyone guide what do I miss?
Thank you
P.S. cci0 - current candle cci, cci1 - previous candle cci
PineScript has really great reference when looking for functions, usually even supplying the pine code to recreate it.
https://www.tradingview.com/pine-script-reference/v4/#fun_cci
The code wasn't provided for cci, but a step-by-step explanation was.
Here is how I managed to recreate the cci function using Pine, following the steps in the reference:
// This source code is subject to the terms of the Mozilla Public License 2.0 at https://mozilla.org/MPL/2.0/
// © bajaco
//#version=4
study("CCI Breakdown", overlay=false, precision=16)
cci_breakdown(src, p) =>
// The CCI (commodity channel index) is calculated as the
// 1. difference between the typical price of a commodity and its simple moving average,
// divided by the
// 2. mean absolute deviation of the typical price.
// 3. The index is scaled by an inverse factor of 0.015
// to provide more readable numbers
// 1. diff
ma = sma(src,p)
diff = src - ma
// 2. mad
s = 0.0
for i = 0 to p - 1
s := s + abs(src[i] - ma)
mad = s / p
// 3. Scaling
mcci = diff/mad / 0.015
mcci
plot(cci(close, 100))
plot(cci_breakdown(close,100))
I didn't know what mean absolute deviation meant, but at least in their implementation it appears to be taking the difference from the mean for each value in the range, but NOT changing the mean value as you go back.
I don't know Go but that's the logic.
I have read a lot about the pain of replicate the easy robust option from STATA to R to use robust standard errors. I replicated following approaches: StackExchange and Economic Theory Blog. They work but the problem I face is, if I want to print my results using the stargazer function (this prints the .tex code for Latex files).
Here is the illustration to my problem:
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
stargazer(reg1)
This prints the R output as .tex code (non-robust SE) If i want to use robust SE, i can do it with the sandwich package as follow:
vcov <- vcovHC(reg1, "HC1")
if I now use stargazer(vcov) only the output of the vcovHC function is printed and not the regression output itself.
With the package lmtest() it is possible to print at least the estimator, but not the observations, R2, adj. R2, Residual, Residual St.Error and the F-Statistics.
lmtest::coeftest(reg1, vcov. = sandwich::vcovHC(reg1, type = 'HC1'))
This gives the following output:
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.54923 6.85521 -0.3719 0.710611
id 0.39634 0.12376 3.2026 0.001722 **
source 1.48164 4.20183 0.3526 0.724960
country -4.00398 4.00256 -1.0004 0.319041
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I add or get an output with the following parameters as well?
Residual standard error: 17.43 on 127 degrees of freedom
Multiple R-squared: 0.09676, Adjusted R-squared: 0.07543
F-statistic: 4.535 on 3 and 127 DF, p-value: 0.00469
Did anybody face the same problem and can help me out?
How can I use robust standard errors in the lm function and apply the stargazer function?
You already calculated robust standard errors, and there's an easy way to include it in the stargazeroutput:
library("sandwich")
library("plm")
library("stargazer")
data("Produc", package = "plm")
# Regression
model <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc,
index = c("state","year"),
method="pooling")
# Adjust standard errors
cov1 <- vcovHC(model, type = "HC1")
robust_se <- sqrt(diag(cov1))
# Stargazer output (with and without RSE)
stargazer(model, model, type = "text",
se = list(NULL, robust_se))
Solution found here: https://www.jakeruss.com/cheatsheets/stargazer/#robust-standard-errors-replicating-statas-robust-option
Update I'm not so much into F-Tests. People are discussing those issues, e.g. https://stats.stackexchange.com/questions/93787/f-test-formula-under-robust-standard-error
When you follow http://www3.grips.ac.jp/~yamanota/Lecture_Note_9_Heteroskedasticity
"A heteroskedasticity-robust t statistic can be obtained by dividing an OSL estimator by its robust standard error (for zero null hypotheses). The usual F-statistic, however, is invalid. Instead, we need to use the heteroskedasticity-robust Wald statistic."
and use a Wald statistic here?
This is a fairly simple solution using coeftest:
reg1 <-lm(rev~id + source + listed + country , data=data2_rev)
cl_robust <- coeftest(reg1, vcov = vcovCL, type = "HC1", cluster = ~
country)
se_robust <- cl_robust[, 2]
stargazer(reg1, reg1, cl_robust, se = list(NULL, se_robust, NULL))
Note that I only included cl_robust in the output as a verification that the results are identical.
I have an error for the following code in Julia for solving NLP problem.
using JuMP
using Ipopt
using DataFrames,ConditionalJuMP
m = Model(solver=IpoptSolver())
#importing data
xi=[-1.016713795 0.804447648 0.932137661 1.064136698 -0.963217531
-1.048396778 1.076371484 1.099027177 1.061556926 -0.95185481
-0.980261302 0.271253807 0.184946729 1.062838197 -0.958794505
-0.980703191 0.278820569 0.231132677 1.062967459 -0.959302488
-0.953074503 -0.00768112 0.128808175 1.067743779 -0.978524489
-1.014866259 0.815325963 1.065956208 1.067059122 -0.974682291
-0.995088119 0.550359991 0.845087207 1.066556784 -0.973154887]
xj=xi
pii=[-300
-259.6530828
-284.3708955
-291.3387625
-342.4479859
-356.5031603
-351.0154738]
sample_size=7
bus_num=5
tempsum=0
#define variables
#variable(m,aij[1:2*bus_num,1:2*bus_num]) # define matrix by
2bus_num*2bus_num
for n=1:sample_size
for i=1:bus_num
for j=1:bus_num
tempsum=(aij[i,j]*xi[n,i]*xj[n,j]-pii[n,2])^2+tempsum
end
end
end
#define constraints
#constraint(m, [i=1:2*bus_num,j=2*bus_num],aij[i,j]==aij[j,i])
#constraint(m, [i=1:2*bus_num,j=2*bus_num],aij[i,j]*aij[i,j]<=aij
[i,i]*aij[j,j])
#constraint(m, [i=1:2*bus_num,j=2*bus_num],aij[i,i]*aij[j,j]<=0.25*
(aij[i,i]+aij[j,j])*(aij[i,i]+aij[j,j]))
#define NLP objective function
#NLobjective(m, Min, tempsum)
solve(m)
println("m = ",getobjectivevalue(m))
println("Aij = ", getvalue(aij))
but after operation, an error was shown as below.
MethodError: no method matching parseNLExpr_runtime(::JuMP.Model,
::JuMP.GenericQuadExpr{Float64,JuMP.Variable},
::Array{ReverseDiffSparse.NodeData,1}, ::Int64, ::Array{Float64,1})
The error occurs on #NLobjective(m, Min, tempsum), but I dont know how to modify the code. May you please anyone help me ?
Besides, I am also confused by how to add positive definite contraints here for the solution Matrix aij, as I want to get a positive definite matrix aij.
#NLobjective(m, Min, tempsum) the tempsum in this code must be a function which is registered by Jump.register(***).
I built a GLM model using H2O (ver 3.14) in R. Please note that the training data contains integers, and also many NA, which I use MeanImputation to handle them.
glm <- h2o.glm(
training_frame = train.truth,
x=getColNames(train.truth),
y="isFemale",
family = "binomial",
missing_values_handling = "MeanImputation",
seed = 1000000)
I then use a validation data set to look at the perf, and the Precision looks good to me:
h2o.performance(glm, newdata=valid.truth)%>% h2o.confusionMatrix()
Confusion Matrix (vertical: actual; across: predicted) for max f1 # threshold = 0.529384526696015:
0 1 Error Rate
0 41962 300 0.007099 =300/42262
1 863 13460 0.060253 =863/14323
Totals 42825 13760 0.020553 =1163/56585
I then saved the model as a MOJO:
h2o.download_mojo(glm, path="models/mojo", get_genmodel_jar=TRUE)
I exported the validation DF to a CSV file:
dt.valid <- data.table(as.data.frame(valid.truth))
write.table(dt.valid, row.names = F, na="", file="models/test.csv")
I tried to use the saved mojo to do the same prediction by running this on my Linux shell:
java -cp h2o-genmodel.jar hex.genmodel.tools.PredictCsv \
--mojo GLM_model_R_1511161743608_15 \
--decimal --mojo GLM_model_R_1511161743608_15.zip \
--input ../test.csv --output output.csv
However, the result is terrible. All the records were predicted as 0, which is very different from what I got when I ran the model in R.
I have been stuck in this for a day but I couldn't figure out what went wrong. Anyone can shed some light on this?
I'm trying to train a dataset with 357 features using Isolation Forest sklearn implementation. I can successfully train and get results when the max features variable is set to 1.0 (the default value).
However when max features is set to 2, it gives the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 2 and input n_features is 357
It also gives the same error when the feature count is 1 (int) and not 1.0 (float).
How I understood was that when the feature count is 2 (int), two features should be considered in creating each tree. Is this wrong? How can I change the max features parameter?
The code is as follows:
from sklearn.ensemble.iforest import IsolationForest
def isolation_forest_imp(dataset):
estimators = 10
samples = 100
features = 2
contamination = 0.1
bootstrap = False
random_state = None
verbosity = 0
estimator = IsolationForest(n_estimators=estimators, max_samples=samples, contamination=contamination,
max_features=features,
bootstrap=boostrap, random_state=random_state, verbose=verbosity)
model = estimator.fit(dataset)
In the documentation it states:
max_features : int or float, optional (default=1.0)
The number of features to draw from X to train each base estimator.
- If int, then draw `max_features` features.
- If float, then draw `max_features * X.shape[1]` features.
So, 2 should mean take two features and 1.0 should mean take all of the features, 0.5 take half and so on, from what I understand.
I think this could be a bug, since, taking a look in IsolationForest's fit:
# Isolation Forest inherits from BaseBagging
# and when _fit is called, BaseBagging takes care of the features correctly
super(IsolationForest, self)._fit(X, y, max_samples,
max_depth=max_depth,
sample_weight=sample_weight)
# however, when after _fit the decision_function is called using X - the whole sample - not taking into account the max_features
self.threshold_ = -sp.stats.scoreatpercentile(
-self.decision_function(X), 100. * (1. - self.contamination))
then:
# when the decision function _validate_X_predict is called, with X unmodified,
# it calls the base estimator's (dt) _validate_X_predict with the whole X
X = self.estimators_[0]._validate_X_predict(X, check_input=True)
...
# from tree.py:
def _validate_X_predict(self, X, check_input):
"""Validate X whenever one tries to predict, apply, predict_proba"""
if self.tree_ is None:
raise NotFittedError("Estimator not fitted, "
"call `fit` before exploiting the model.")
if check_input:
X = check_array(X, dtype=DTYPE, accept_sparse="csr")
if issparse(X) and (X.indices.dtype != np.intc or
X.indptr.dtype != np.intc):
raise ValueError("No support for np.int64 index based "
"sparse matrices")
# so, this check fails because X is the original X, not with the max_features applied
n_features = X.shape[1]
if self.n_features_ != n_features:
raise ValueError("Number of features of the model must "
"match the input. Model n_features is %s and "
"input n_features is %s "
% (self.n_features_, n_features))
return X
So, I am not sure on how you can handle this. Maybe figure out the percentage that leads to just the two features you need - even though I am not sure it'll work as expected.
Note: I am using scikit-learn v.0.18
Edit: as #Vivek Kumar commented this is an issue and upgrading to 0.20 should do the trick.