linearmodels or statsmodels - what are the main differences? - statsmodels

Can anyone explain the different between statsmodels and linearmodels. They are both very similar with respect to many things, but I assume they must also differ?
Does anyone have any insights to share?

linearmodels has mostly models that are not (yet) available in statsmodels especially models for panel data, multivariate or system models and some instrumental variable models.
There is some overlap in functionality, for example generalized method of moments, GMM in linearmodels is for specific linear models, while GMM in statsmodels is designed for general nonlinear GMM with some linear models as special cases.
The author of linearmodels is also one of the main maintainers of statsmodels.
There are some smaller differences in design and style that came from different preferences by the authors of the two packages or because statsmodels handles a much larger and heterogeneous set of models and classes.

Related

Clarifying statsmodels AutoReg(), ARMA() and SARIMAX() for time-series forecasting

I am buidling my first time-series prediction model with scikit-learn's LinearRegression(). I also came across statsmodels AutoReg(), ARMA() and SARIMAX(). Unfortunately out of the literature I could not figure out to consider them. Are they alternatives to LinearRegression()? Are they ML? Are they fundamental different?
I'd appreciate a hint, where to look further. Thanks.
All three fit variants of Seasonal Autoregressive Integrated Moving Average with eXogenous Variables (SARIMAX) models.
AutoReg
AutoReg is limited to only Autoregressive Models and so does not include Seasonal or Moving Average components. It does support exogenous regressors. It also supports complex deterministic processes such as Fourier series to model multiple seasonalities. Parameters are estimated using OLS which is equivalent to conditional maximum likelihood. Since parameters are estimated using OLS, estimation is very fast and completely deterministic.
ARIMA
ARIMA is a restricted version of SARIMAX that does not include Seasonal components or Exogenous regressors. Because it excludes these two types of terms, it can offer additional fitting options that are not available when fitting a full SARIMAX model. These have different statistical properties than the Maximum Likelihood method that is the only method available in SARIMAX (ARIMA also supports Maximum Likelihood). Many of these alternative parameter estimation methods are also faster than ML.
SARIMAX
SARIMAX supports all features of ARIMA plus the two additional components. It can only be estimated using Maximum Likelihood. ML uses numerical methods to maximize the function and so estimation of some series/models may encounter difficulties converging.
The examples page is the best place to look to see the detailed use of these models. Many of the notebooks include both code examples and LaTeX markup that explains the underlying math.

How to train on very small data set?

We are trying to understand the underlying model of Rasa - the forums there still didnt get us an answer - on two main questions:
we understand that Rasa model is a transformer-based architecture. Was it
pre-trained on any data set? (eg wikipedia, etc)
then, if we
understand correctly, the intent classification is a fine tuning task
on top of that transformer. How come it works with such small
training sets?
appreciate any insights!
thanks
Lior
the transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformer which is not as data hungry as deeper stacks of transformers used in large pre-trained language models.
Having said that, there isn't an exact number of data points that will be sufficient for training your assistant as it varies by the domain and your problem. Usually a good estimate is 30-40 examples per intent.

GEKKO - General - custom reusable flowsheet object - chemical process flowsheet modelling

No problems to speak of and nor am I currently a user. I am seeing advice on the best implementation practice for flowsheet models. Is there a framework to create custom flowsheet objects in GEKKO/chemical? Is the flowsheet module a mature and equal feature of GEKKO?
I am dealing with a number of applications which would benefit from the ability to inherit flowsheet objects from a yet to be developed custom library, if possible. One such item could be a tubular reactor as described here where it is solved in COMSOL (http://umich.edu/~elements/5e/web_mod/radialeffects/unsteady/index1.htm). Scenarios could involve several unit operations connected in series with recycle streams such as mixer settlers in solvent extraction which also has multiple liquid phases (organic and aqueous). It is worth nothing that all of the models would be of the unsteady state type.
I appreciate the thoughts of the user group in this respect.
Gekko doesn't currently allow black-box models where the equations are not available for requesting information such as first and second derivatives in sparse form. For that reason, a model in COMSOL wouldn't be a good fit for Gekko. If you would like to try to model the same PDE in Gekko, that is a possibility. Here are some PDE applications that may help give you inspiration:
Solid Oxide Fuel Cell
Parabolic and Hyperbolic PDEs Solved with Gekko
The Chemicals library is somewhat limited but it does have some thermodynamic data and basic reactor types. You could put many lumped parameter reactors in series to emulate a Plug Flow Reactor but it may be better to just write out the PDE equations. You may want to write out your own equations instead of relying on the Chemicals library.

PyMC: Hidden Markov Models

How suitable is PyMC in its currently available versions for modelling continuous emission HMMs?
I am interested in having a framework where I can easily explore model variations, without having to update E- and M-step, and dynamic programming recursions for every change I make to the model.
More specific questions are:
When modelling an HMM in PyMC can I answer the 'typical' tasks that one would like to solve -- i.e., besides parameter estimation also infer the most likely sequence (as usually done with the Viterbi algorithm), or solve a smoothing problem?
As compared to an implementation with Expectation Maximization, I would expect a sampling based approach to be slower. If that gives me more flexibility on the model building side, that is fine. I would imagine using PyMC for prototyping models. I am wondering though, if I can expect PyMC to handle inference for models with > 10k observations to finish in any reasonable amount of time.
Would you recommend starting out with PyMC2 or PyMC3 for model building. I know that the inference engine changed between the version, so I would especially wonder what type of sampler might be more suited.
If you'ld think PyMC is not a good choice for my use case, that definitely helps as an answer as well.

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm?

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not?
Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another.
No, they're not quite the same. Statistical classification is intended to separate items into categories as cleanly as possible -- to make a clean decision about whether item X is more like the items in group A or group B, for example.
LSI is intended to show the degree to which items are similar or different and, primarily, find items that show a degree of similarity to an specified item. While this is similar, it's not quite the same.
LSI/LSA is eventually a technique for dimensionality reduction, and usually is coupled with a nearest neighbor algorithm to make it a into classification system. Hence in itself, its only a way of "indexing" the data in lower dimension using SVD.
Have you read about LSI on Wikipedia ? It says it uses matrix factorization (SVD), which in turn is sometimes used in classification.
The primary distinction in machine learning is between "supervised" and "unsupervised" modeling.
Usually the words "statistical classification" refer to supervised models, but not always.
With supervised methods the training set contains a "ground-truth" label that you build a model to predict. When you evaluate the model, the goal is to predict the best guess at (or probability distribution of) the true label, which you will not have at time of evaluation. Often there's a performance metric and it's quite clear what the right vs wrong answer is.
Unsupervised classification methods attempt to cluster a large number of data points which may appear to vary in complicated ways into a smaller number of "similar" categories. Data in each category ought to be similar in some kind of 'interesting' or 'deep' way. Since there is no "ground truth" you can't evaluate 'right or wrong', but 'more' vs 'less' interesting or useful.
Similarly evaluation time you can place new examples into potentially one of the clusters (crisp classification) or give some kind of weighting quantifying how similar or different looks like the "archetype" of the cluster.
So in some ways supervised and unsupervised models can yield something which is a "prediction", prediction of class/cluster label, but they are intrinsically different.
Often the goal of an unsupervised model is to provide more intelligent and powerfully compact inputs for a subsequent supervised model.

Resources