Package for Multivariate Multinomial Logit - multinomial

I would like to jointly estimate 3 variables. Two of them are categorical and the other one is binary, so I thought about a "multivariate multinomial logit model". I found a lot of theory about it (for example, Agresti 2007 Ch. 9 or Beel and Paap 2014), but I cannot find a package for R. Is there a built-in function or package I can use? I can switch to a bivariate multinomial logit if needed.
Thank you very much for your help in this matter!

There are several packages that might interest you for a multinomial logit model. They are mlogit, mnlogit, antitrust, and nnet.
mlogit: This is the most direct multinomial logit package currently available. It provides sample data, tools to estimate a multinomial logit model, and additional functions such as mlogit.optim, the optimization routine used to maximize the likelihood of multinomial logit models.
mnlogit: This package is similar to mlogit, but it does not provide as many additional functions. It may be faster for the actual estimation process though.
antitrust: This package can estimate merger effects under logit (or nested logit) demand. It does not directly provide the multinomial logit coefficients, but it is very good at solving for the bottom-line HHI and price effects of a merger.
nnet: This package fits general multinomial log-linear models through its multinom function, which can also be used to estimate multinomial logit models.
Hope one of these packages helps for your purposes!

Related

Machine learning algorithm for correlation between indicators

I have a dataset with several indicators related to some geographical entities. I want to study the factors that influence an indicator A (among the other indicators), and I need to determine which indicators affect it the most (correlation).
Which ML algorithm should I use?
I want to have a kind of scoring function for my indicator A to allow its prediction.
What you are looking for are correlation coefficients. You have multiple choices for that; the most common are:
Pearson's coefficient, which measures only the linear relationship between two variables (see SciPy's implementation)
Spearman's coefficient, which is rank-based and can therefore capture monotonic but non-linear relationships (see SciPy's implementation)
You can also normalize your data using z-normalization and then fit a simple linear regression. The regression coefficients can give you an idea of the influence of each variable on the outcome. However, this method is highly sensitive to multicollinearity, which might be present, especially if your variables are geographical.
Could you provide an example of the dataset? Discrete or continuous variables? Which software are you using?
Anyway, an easy way to test correlation (without going into ML algorithms in the strict sense) is to compute Pearson's or Spearman's correlation coefficient on selected features, or on the whole dataset by building a correlation matrix of the data. You can do that in Python with NumPy (see this) or in R (see this).
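As a rough illustration of the difference between the two coefficients, here is a minimal sketch that uses only the Python standard library. The indicator names and values are made up for the example, and the helper functions pearson, ranks and spearman are hypothetical; in practice you would more likely call scipy.stats.pearsonr and scipy.stats.spearmanr.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: indicator A is the cube of B (monotonic, not linear).
indicators = {
    "A": [1.0, 8.0, 27.0, 64.0, 125.0],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}

# For every other indicator, store (Pearson, Spearman) against A.
scores = {
    name: (pearson(indicators["A"], values), spearman(indicators["A"], values))
    for name, values in indicators.items()
    if name != "A"
}

for name, (r_pearson, r_spearman) in scores.items():
    print(f"A vs {name}: pearson={r_pearson:.3f} spearman={r_spearman:.3f}")
```

Because A is a monotonic function of B, the Spearman coefficient for that pair is 1 while the Pearson coefficient stays below 1. Sorting the indicators by the absolute value of either coefficient gives a first, purely descriptive ranking of how strongly each one is associated with A.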
You can also use simple linear regression or logistic/multinomial logistic regression (depending on the nature of your data) to quantify the influence of the other features on your target variable. Just keep in mind that "correlation is not causation". Look here to see some models.
Then it depends on the goal of your analysis whether to aggregate all the features of all the geographical points or to create covariance matrices for each "subset" of observations related to the geographical points.

How to correct standard errors in a multinomial logit using IV

I am trying to estimate a multinomial logit model using an instrumental variable. I didn't find any preexisting package, so I tried to estimate using a two-stage approach.
First, I estimated the first stage by OLS with the IV:
tsls1 <- lm(d ~ x + z)
Then I took the fitted values:
d.hat <- fitted.values(tsls1)
With those, I used the multinom function from the nnet package:
tsls2 <- multinom(y ~ x + d.hat)
The problem is that the standard errors are wrong. I was wondering how I could correct them, or whether there is an easier way.

Clarifying statsmodels AutoReg(), ARMA() and SARIMAX() for time-series forecasting

I am building my first time-series prediction model with scikit-learn's LinearRegression(). I also came across statsmodels AutoReg(), ARMA() and SARIMAX(). Unfortunately, from the literature I could not figure out how to think about them. Are they alternatives to LinearRegression()? Are they ML? Are they fundamentally different?
I'd appreciate a hint, where to look further. Thanks.
All three fit variants of Seasonal Autoregressive Integrated Moving Average with eXogenous Variables (SARIMAX) models.
AutoReg
AutoReg is limited to only Autoregressive Models and so does not include Seasonal or Moving Average components. It does support exogenous regressors. It also supports complex deterministic processes such as Fourier series to model multiple seasonalities. Parameters are estimated using OLS which is equivalent to conditional maximum likelihood. Since parameters are estimated using OLS, estimation is very fast and completely deterministic.
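To make the OLS point concrete, here is a minimal sketch, using only the standard library, of fitting an AR(1) model by regressing each value on a constant and its predecessor. This is not how statsmodels implements AutoReg; the function fit_ar1_ols and the noise-free toy series are assumptions made for illustration.

```python
def fit_ar1_ols(series):
    """Regress y[t] on a constant and y[t-1]; return (intercept, slope)."""
    lagged = series[:-1]   # y[t-1]
    current = series[1:]   # y[t]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((a - mean_x) ** 2 for a in lagged)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lagged, current))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy series generated without noise from y[t] = 1.0 + 0.5 * y[t-1].
series = [0.0]
for _ in range(20):
    series.append(1.0 + 0.5 * series[-1])

intercept, slope = fit_ar1_ols(series)

# One-step-ahead forecast from the last observed value.
forecast = intercept + slope * series[-1]
print(intercept, slope, forecast)
```

The estimate comes from a closed-form formula, so there is no iterative optimizer and the same data always give the same result. That is the sense in which OLS estimation is fast and deterministic, and it is also why an autoregression can be seen as a linear regression whose features are lags of the target.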
ARIMA
ARIMA is a restricted version of SARIMAX that does not include Seasonal components or Exogenous regressors. Because it excludes these two types of terms, it can offer additional fitting options that are not available when fitting a full SARIMAX model. These have different statistical properties than the Maximum Likelihood method that is the only method available in SARIMAX (ARIMA also supports Maximum Likelihood). Many of these alternative parameter estimation methods are also faster than ML.
SARIMAX
SARIMAX supports all features of ARIMA plus the two additional components. It can only be estimated using Maximum Likelihood. ML uses numerical methods to maximize the function and so estimation of some series/models may encounter difficulties converging.
The examples page is the best place to look to see the detailed use of these models. Many of the notebooks include both code examples and LaTeX markup that explains the underlying math.

How to implement Breusch-Godfrey test for a regression with ARIMA errors in R

I’m fitting a regression with ARIMA errors with the fable package and, as mentioned in my previous question, the Breusch-Godfrey test is not available there.
The regression part of the model has two pairs of Fourier terms to account for yearly seasonality and several exogenous regressors. The residuals are modeled with a seasonal ARIMA(2,0,0)(1,0,0)[7] model. My goal is to check for autocorrelation in residuals.
I can use the Ljung-Box test, but according to this thread and the textbook sources cited there, it will not be valid in the presence of lags of the dependent variable.
I’m also afraid I will lose my model specification if I use different packages/libraries. An alternative might be to use Arima from the forecast package, which would retain the model specification, and then use bgtest from the lmtest package. But I can’t figure out how to do this.
According to this R forum, the Breusch-Godfrey test for an ARIMA model can be done by fitting a simple regression of the residuals from the fitted model on a constant and then performing a bgtest. But that example only concerns a simple AR(1) model with no exogenous regressors.
Is this the right way to do it? I’m concerned that the BG test requires an auxiliary regression on the regressors and on lagged residuals up to order p. How does bgtest know the X variables in this case, since they are not stored in the residuals object, which should be a simple vector?

linearmodels or statsmodels - what are the main differences?

Can anyone explain the difference between statsmodels and linearmodels? They are very similar in many respects, but I assume they must also differ.
Does anyone have any insights to share?
linearmodels mostly has models that are not (yet) available in statsmodels, especially models for panel data, multivariate or system models, and some instrumental variable models.
There is some overlap in functionality, for example in the generalized method of moments (GMM): GMM in linearmodels is for specific linear models, while GMM in statsmodels is designed for general nonlinear GMM, with some linear models as special cases.
The author of linearmodels is also one of the main maintainers of statsmodels.
There are some smaller differences in design and style that came from different preferences by the authors of the two packages or because statsmodels handles a much larger and heterogeneous set of models and classes.
