Creating simulations with ARIMA and ARIMA-EGARCH models

I have a projection of annual returns on an asset over 20 years.
I want to simulate 10,000 stochastic scenarios using ARIMA and ARIMA-EGARCH models.
Is there any code readily available for this, or a good place to look for examples? What packages or functions would work here?
I only have very basic R skills so far.
Thanks
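As a rough starting point, the idea can be sketched without any packages: draw standard normal shocks, run the EGARCH(1,1) log-variance recursion, and feed the resulting innovations through an AR(1) mean equation. The sketch below is in Python rather than R, and every parameter value (mu, phi, omega, alpha, gamma, beta) is a made-up placeholder rather than an estimate; in practice they would come from a model fitted to the return series. In R, the rugarch package is one place to look for fitting and simulating ARMA-eGARCH specifications.

```python
import math
import random


def simulate_ar1_egarch(n_paths=10_000, horizon=20, mu=0.05, phi=0.2,
                        omega=-0.4, alpha=0.1, gamma=-0.05, beta=0.9, seed=42):
    """Simulate return paths with an AR(1) mean and EGARCH(1,1) volatility.

    All default parameter values are hypothetical placeholders.
    """
    rng = random.Random(seed)
    e_abs_z = math.sqrt(2.0 / math.pi)   # E|z| for a standard normal z
    log_var0 = omega / (1.0 - beta)      # long-run mean of the log-variance
    paths = []
    for _ in range(n_paths):
        log_var, prev_ret, z_prev = log_var0, mu, 0.0
        path = []
        for t in range(horizon):
            if t > 0:
                # EGARCH(1,1): log-variance reacts to the size and sign of the last shock
                log_var = (omega + beta * log_var
                           + alpha * (abs(z_prev) - e_abs_z) + gamma * z_prev)
            z = rng.gauss(0.0, 1.0)
            eps = math.exp(0.5 * log_var) * z          # innovation = sigma_t * z_t
            ret = mu + phi * (prev_ret - mu) + eps     # AR(1) mean equation
            path.append(ret)
            prev_ret, z_prev = ret, z
        paths.append(path)
    return paths


scenarios = simulate_ar1_egarch()
# Sum of the 20 annual returns per scenario, sorted to read off percentiles
totals = sorted(sum(p) for p in scenarios)
print("5th / 95th percentile of summed returns:", totals[500], totals[9500])
```

Fixing the seed makes the scenarios reproducible; a higher-order ARIMA mean would replace the single AR(1) line.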


DistilBert for self-supervision - switch heads for pre-training: MaskedLM and SequenceClassification

Say I want to train a model for sequence classification. And so I define my model to be:
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
My question is: what would be the optimal way to pre-train this model with a masked language modeling task? After pre-training, I would like the model to train on the downstream task of sequence classification.
My understanding is that I can somehow switch the heads of my model and a DistilBertForMaskedLM for pre-training, and then switch back to the original downstream task. However, I haven't figured out whether this is indeed optimal or how to write it.
Does Hugging Face offer any built-in function that accepts the input ids and a percentage of tokens to mask (excluding pad tokens), and simply trains the model?
Thank you in advance
I've tried to implement this myself, and while it does seem to work it is extremely slow. I figured there could already be implemented solutions instead of trying to optimize my code.
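For reference, the masking step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hugging Face code: the function name mask_tokens is hypothetical, and the token ids (0 for [PAD], 103 for [MASK]) are assumed from the bert-base-uncased vocabulary and should be read from the tokenizer in real use. The transformers library also ships DataCollatorForLanguageModeling, which takes an mlm_probability argument and does the masking on batched tensors.

```python
import random

PAD_ID = 0      # assumed [PAD] id; read tokenizer.pad_token_id in practice
MASK_ID = 103   # assumed [MASK] id; read tokenizer.mask_token_id in practice
IGNORE = -100   # label value that the cross-entropy loss conventionally ignores


def mask_tokens(input_ids, mask_fraction=0.15, rng=None):
    """Mask a fraction of the non-pad tokens in each sequence.

    Returns (masked_ids, labels). Labels hold the original token at masked
    positions and IGNORE everywhere else. Only padding is excluded here;
    other special tokens would need the same treatment in real use.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for seq in input_ids:
        candidates = [i for i, tok in enumerate(seq) if tok != PAD_ID]
        n_mask = max(1, round(mask_fraction * len(candidates))) if candidates else 0
        chosen = set(rng.sample(candidates, n_mask))
        masked.append([MASK_ID if i in chosen else tok for i, tok in enumerate(seq)])
        labels.append([tok if i in chosen else IGNORE for i, tok in enumerate(seq)])
    return masked, labels


batch = [[101, 2023, 2003, 1037, 7099, 102, 0, 0]]
masked_ids, mlm_labels = mask_tokens(batch, mask_fraction=0.5, rng=random.Random(0))
print(masked_ids)
print(mlm_labels)
```

The returned labels can be passed as the labels argument of a masked-LM model so that only the masked positions contribute to the loss.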

Clarifying statsmodels AutoReg(), ARMA() and SARIMAX() for time-series forecasting

I am building my first time-series prediction model with scikit-learn's LinearRegression(). I also came across statsmodels AutoReg(), ARMA() and SARIMAX(). Unfortunately, I could not figure out from the literature how to think about them. Are they alternatives to LinearRegression()? Are they ML? Are they fundamentally different?
I'd appreciate a hint, where to look further. Thanks.
All three fit variants of Seasonal Autoregressive Integrated Moving Average with eXogenous Variables (SARIMAX) models.
AutoReg
AutoReg is limited to only Autoregressive Models and so does not include Seasonal or Moving Average components. It does support exogenous regressors. It also supports complex deterministic processes such as Fourier series to model multiple seasonalities. Parameters are estimated using OLS which is equivalent to conditional maximum likelihood. Since parameters are estimated using OLS, estimation is very fast and completely deterministic.
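To illustrate the OLS point, here is a minimal sketch in plain Python (it does not use statsmodels, and the function name fit_ar1_ols is hypothetical). It fits an AR(1) model by regressing each observation on a constant and the previous observation, using the closed-form simple-regression formulas. The simulated series and its true coefficients (intercept 1.0, slope 0.6) are assumptions chosen for the example.

```python
import random


def fit_ar1_ols(y):
    """Regress y[t] on a constant and y[t-1]; return (intercept, slope)."""
    x, target = y[:-1], y[1:]
    n = len(x)
    mean_x, mean_t = sum(x) / n, sum(target) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_t) for a, b in zip(x, target))
    slope = sxy / sxx
    return mean_t - slope * mean_x, slope


# Simulate y[t] = 1.0 + 0.6 * y[t-1] + noise (illustrative values)
rng = random.Random(0)
y = [0.0]
for _ in range(2000):
    y.append(1.0 + 0.6 * y[-1] + rng.gauss(0.0, 1.0))

intercept, slope = fit_ar1_ols(y)
print("intercept:", intercept, "slope:", slope)
```

Because the estimate is a closed-form expression with no iterative optimizer or random starting values, repeated calls on the same data return identical results, which is the sense in which estimation is deterministic.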
ARIMA
ARIMA is a restricted version of SARIMAX that does not include Seasonal components or Exogenous regressors. Because it excludes these two types of terms, it can offer additional fitting options that are not available when fitting a full SARIMAX model. These have different statistical properties than the Maximum Likelihood method that is the only method available in SARIMAX (ARIMA also supports Maximum Likelihood). Many of these alternative parameter estimation methods are also faster than ML.
SARIMAX
SARIMAX supports all features of ARIMA plus the two additional components. It can only be estimated using Maximum Likelihood. ML uses numerical methods to maximize the function and so estimation of some series/models may encounter difficulties converging.
The examples page is the best place to look to see the detailed use of these models. Many of the notebooks include both code examples and LaTeX markup that explains the underlying math.

How to implement Breusch-Godfrey test for a regression with ARIMA errors in R

I’m fitting a regression with ARIMA errors with the fable package and, as mentioned in my previous question, the Breusch-Godfrey test is not available there.
The regression part of the model has two pairs of Fourier terms to account for yearly seasonality and several exogenous regressors. The residuals are modeled with a seasonal ARIMA(2,0,0)(1,0,0)[7] model. My goal is to check for autocorrelation in residuals.
I can use the Ljung-Box test but according to this thread and textbook sources there it will not be valid in presence of lags of the dependent variable.
I’m also afraid I will lose my model specification if I switch to different packages/libraries. An alternative might be to use Arima from the forecast package, which would retain the model specification, and then use bgtest from the lmtest package. But I can’t figure out how to do this.
According to this R forum, the Breusch-Godfrey test for an ARIMA model can be done by fitting a simple regression of the residuals from the fitted model on a constant and then performing a bgtest. But that example only concerns a simple AR(1) model with no exogenous regressors.
Is this the right way to do it? I’m concerned that the BG test requires an auxiliary regression of the residuals on the regressors and the lagged residuals up to order p. How does bgtest know the X variables in this case, since they are not stored in the residuals object, which should be a simple vector?
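To make the concern concrete: the auxiliary regression does need the original regressors, so they have to be supplied alongside the residuals. Below is a minimal sketch in plain Python (not R, and not the lmtest implementation) of the LM form of the statistic, n times the R-squared of the auxiliary regression. It assumes an ordinary OLS regression with a single hypothetical regressor, simulated AR(1) errors, and pre-sample lagged residuals set to zero. Whether the same statistic is valid for residuals from a regression with ARIMA errors is exactly the open question and is not settled by this sketch.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_residuals(X, y):
    """Residuals from least squares of y on the columns of X (one row per observation)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]


def breusch_godfrey_lm(X, resid, order):
    """LM statistic: n * R^2 from regressing resid on X and its own lags 1..order."""
    n = len(resid)
    aux = [list(X[t]) + [resid[t - j] if t - j >= 0 else 0.0
                         for j in range(1, order + 1)] for t in range(n)]
    u = ols_residuals(aux, resid)
    mean_e = sum(resid) / n
    tss = sum((e - mean_e) ** 2 for e in resid)
    rss = sum(v * v for v in u)
    return n * (1.0 - rss / tss)


# Simulated example: y = 2 + 1.5 x + AR(1) error with coefficient 0.8 (illustrative values)
rng = random.Random(1)
n = 300
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
err, e = [], 0.0
for _ in range(n):
    e = 0.8 * e + rng.gauss(0.0, 1.0)
    err.append(e)
y = [2.0 + 1.5 * xi + ei for xi, ei in zip(x, err)]

X = [[1.0, xi] for xi in x]           # constant plus the regressor
resid = ols_residuals(X, y)
lm = breusch_godfrey_lm(X, resid, order=2)
print("BG LM statistic:", lm)         # compare with a chi-square(2) critical value
```

Under the null of no serial correlation the statistic is compared with a chi-square distribution whose degrees of freedom equal the lag order.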

linearmodels or statsmodels - what are the main differences?

Can anyone explain the difference between statsmodels and linearmodels? They are very similar in many respects, but I assume they must also differ.
Does anyone have any insights to share?
linearmodels mostly has models that are not (yet) available in statsmodels, especially models for panel data, multivariate or system models, and some instrumental variable models.
There is some overlap in functionality. For example, both offer the generalized method of moments (GMM): GMM in linearmodels is for specific linear models, while GMM in statsmodels is designed for general nonlinear GMM with some linear models as special cases.
The author of linearmodels is also one of the main maintainers of statsmodels.
There are some smaller differences in design and style that came from different preferences by the authors of the two packages or because statsmodels handles a much larger and heterogeneous set of models and classes.

PyMC: Hidden Markov Models

How suitable is PyMC in its currently available versions for modelling continuous emission HMMs?
I am interested in having a framework where I can easily explore model variations, without having to update E- and M-step, and dynamic programming recursions for every change I make to the model.
More specific questions are:
When modelling an HMM in PyMC can I answer the 'typical' tasks that one would like to solve -- i.e., besides parameter estimation also infer the most likely sequence (as usually done with the Viterbi algorithm), or solve a smoothing problem?
As compared to an implementation with Expectation Maximization, I would expect a sampling based approach to be slower. If that gives me more flexibility on the model building side, that is fine. I would imagine using PyMC for prototyping models. I am wondering though, if I can expect PyMC to handle inference for models with > 10k observations to finish in any reasonable amount of time.
Would you recommend starting out with PyMC2 or PyMC3 for model building? I know that the inference engine changed between the versions, so I especially wonder what type of sampler might be better suited.
If you think PyMC is not a good choice for my use case, that definitely helps as an answer as well.
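For context, this is the kind of dynamic programming recursion I mean: a minimal Viterbi sketch in plain Python for a continuous-emission HMM, assuming Gaussian emissions. It is independent of PyMC, and the two-state parameters in the example are made up for illustration.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, log_init, log_trans, means, sds):
    """Most likely hidden state sequence for an HMM with Gaussian emissions."""
    k = len(log_init)
    score = [log_init[s] + gaussian_logpdf(obs[0], means[s], sds[s]) for s in range(k)]
    back = []
    for x in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(k):
            # Best predecessor state for landing in state s at this step
            best = max(range(k), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + gaussian_logpdf(x, means[s], sds[s]))
        back.append(pointers)
    # Backtrack from the best final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model: state 0 emits around 0, state 1 around 5
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
observations = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2, 0.1]
decoded = viterbi(observations, log_init, log_trans, means=[0.0, 5.0], sds=[1.0, 1.0])
print(decoded)
```

Changing the emission distribution only requires swapping gaussian_logpdf, but the recursion itself still has to be revisited whenever the state structure changes, which is the maintenance burden I would like to avoid.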
