I am interested in using a mixed logit model for prediction. Is there a function or package in R that can do it for me? If not, how is it approached mathematically? I understand that the coefficients are random, so a very naive approach would be to draw from the distribution of the betas and take the mean for the out-of-sample Xs.
This is probably what you're looking for:
mlogit is a package for R which enables the estimation of random utility models with individual and/or alternative specific variables. The main extensions of the basic multinomial model (heteroscedastic, nested and random parameter models) are implemented.
Some useful references:
Kenneth Train's exercises for mlogit
Estimation of Random Utility Models in R by Yves Croissant
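You can use the predict function directly. Note that this example fits a multinomial probit (probit = TRUE) rather than a mixed logit; for a mixed logit you would specify the random coefficients through mlogit's rpar argument instead.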
library("mlogit")
data("Mode", package="mlogit")
Mo <- mlogit.data(Mode, choice = 'choice', shape = 'wide',
varying = c(2:9))
# starting values of the beta coefficients
strt <- c(1.83086600, -1.28168186, 0.30935104, -0.41344010, -0.04665517,
1,0.25997237,0.73648694, 1.30789474, -0.79818416, 0.43013035)
p1 <- mlogit(choice ~ cost + time, Mo, seed = 20,
R = 100, probit = TRUE, start = strt)
Mo2 <- Mo
Mo2[Mo2$alt == 'car', 'cost'] <- Mo2[Mo2$alt == 'car', 'cost'] * 2
newShares <- apply(predict(p1, newdata = Mo2), 2, mean)
hope this answers your question
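On the mathematical side of the question, here is a minimal sketch of simulated mixed logit probabilities. It averages the logit probabilities over draws of the coefficients, rather than plugging averaged coefficients into a single logit. The function names, the independent normal coefficients and all numbers are illustrative assumptions, not output from mlogit.

```python
import math
import random

def softmax(utilities):
    """Logit choice probabilities for one vector of utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_logit_probs(x_alts, beta_mean, beta_sd, n_draws=1000, seed=0):
    """Simulated choice probabilities for one choice situation.

    x_alts: one attribute vector per alternative.
    beta_mean, beta_sd: hypothetical estimates of the mean and standard
    deviation of independent normal random coefficients.
    """
    rng = random.Random(seed)
    avg = [0.0] * len(x_alts)
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(beta_mean, beta_sd)]
        utilities = [sum(b * x for b, x in zip(beta, xs)) for xs in x_alts]
        # Average the probabilities themselves over the draws.
        for j, p in enumerate(softmax(utilities)):
            avg[j] += p / n_draws
    return avg

# Made-up out-of-sample data: three alternatives described by (cost, time).
x_new = [[1.0, 3.0], [2.0, 1.5], [1.5, 2.0]]
probs = mixed_logit_probs(x_new, beta_mean=[-1.0, -0.5], beta_sd=[0.5, 0.2])
print(probs)
```

Averaging these per-observation probabilities over a new data set gives predicted shares, which is the role of apply(predict(...), 2, mean) in the R code above.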
I have defined three different models obtained from the diabetes dataset in the lars library. The first model (M1) is the one that minimizes the BIC out of all possible regression models obtained by combining the explanatory variables (there are p=10 of them, so 2^10 possible models). The other two are obtained through glmnet and are Lasso regressions with lambda.min (M2) and lambda.1se (M3) respectively, where lambda.min and lambda.1se are obtained through cv.glmnet. Now I should perform 5-fold cross-validation using the RMSE (Root Mean Square Error) to check which of the three models M1, M2 and M3 has the best predictive performance. In order to find the errors in the models obtained from Lasso I have to use the ordinary least squares estimates.
This is my code as for now:
library(lars)
library(glmnet)
data(diabetes)
y<-diabetes$y
x<-diabetes$x
x2<-diabetes$x2
X = as.data.frame(cbind(x))
Y = as.data.frame(y)
p=10
n=442
best_score = Inf
M1 = NA
for (i in 1:(2^p-1)){
model = lm(y ~ ., data = subset(X, select = c(which(as.integer(intToBits(i)) == 1))))
if (BIC(model) < best_score){
M1 = model
best_score = BIC ( model )
}
}
W<-as.matrix(X)
Y<-as.matrix(Y)
lasso<-glmnet(W, Y)
x11()
plot(lasso, label=T)
x11()
plot(lasso, xvar = 'lambda', label=T)
lasso$df
lasso$lambda
cvfit<-cv.glmnet(W,Y)
cvfit
coef(cvfit, s="lambda.min")
coef(cvfit, s="lambda.1se")
M2<-glmnet(W,Y,lambda = cvfit$lambda.min)
M3<-glmnet(W,Y,lambda = cvfit$lambda.1se)
I really don't know where to start now. Should I first split the original dataset into 5 folds and then refit the models on the different training and test sets? How do I compute the final RMSE for each model? And what does it mean that I should use ordinary least squares estimates for the models obtained through Lasso?
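A minimal sketch of the k-fold procedure asked about, written in Python with a one-predictor least squares fit as a stand-in for M1, M2 and M3. The helper names and the synthetic data are assumptions for illustration only. One common reading of "ordinary least squares estimates" for the Lasso models is to refit by OLS on the variables the Lasso keeps; that reading is also an assumption and is not implemented here.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_rmse(xs, ys, k=5, seed=1):
    """Cross-validated RMSE: refit on k-1 folds, score on the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    sq_err, count = 0.0, 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            sq_err += (ys[i] - (a + b * xs[i])) ** 2
            count += 1
    # Pool the squared errors of all held-out predictions.
    return math.sqrt(sq_err / count)

# Synthetic data, only to make the sketch runnable.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]
print(kfold_rmse(xs, ys))
```

Running the same loop once per candidate model, with the same folds, gives one RMSE per model to compare.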
For context, I have a small project in MATLAB where I try to replicate an algorithm involving some optimisation with the Newton algorithm. Although my issue is mainly with MATLAB, it may be my lack of deeper background knowledge that is keeping me from finding a solution, so feel free to redirect me to the appropriate StackExchange site if needed.
The function I need to calculate the gradient vector and Hessian matrix for the optimization is :
function [zi] = Zi(lambda,j)
zi = m(j)*exp(-(lambda*v_tilde(j,:).'));
end
function [z] = Z(lambda)
res = arrayfun(@(x) Zi(lambda,x),1:length(omega));
z = sum(res);
end
function [f] = F(lambda)
f = log(Z(lambda));
end
where omega and v_tilde are matrices of n d-dimensional vectors and lambda is the d-dimensional argument to the function. (Right now, the m(j) are just selectors (1 or 0), but the algorithm allows these to be refined, so they shouldn't be removed.)
I use the Derivest Suite to calculate the gradient and Hessian numerically, and, although logically slow for high dimensions, the algorithm as a whole works.
I implemented the same solution using the sym package, so that I could compute the gradient and Hessian in advance for some fixed n and d and then evaluate them quickly when needed. This would be the symbolic version:
V_TILDE = sym('v_tilde',[d,n])
syms n k
lambda = sym('lambda',[d,1]);
F = log(M*exp(-(transpose(V_TILDE)*lambda)));
matlabFunction( grad_F, 'File', sprintf('Grad_%d_dim_%d_n.m',d,n_max), 'vars',{a,lambda,V_TILDE});
matlabFunction( hesse_F, 'File', sprintf('Hesse_%d_dim_%d_n.m',d,n_max), 'vars',{a,lambda,V_TILDE});
As n is fixed, there is no need to iterate over omega. The gradient and Hessian of this can be obtained through the corresponding functions of sym and then stored as matlabFunctions.
However, when I test both implementations against some values, they don't match. Surprisingly, the values of the Hessian matrix match while the values of the gradient don't (the numerical calculation being the correct one), and the Newton algorithm iterates until the values are just NaN. These are some example values for d=2 and n=8:
Omega:
12.6987 91.3376
95.7507 96.4889
15.7613 97.0593
95.7167 48.5376
70.6046 3.1833
27.6923 4.6171
9.7132 82.3458
95.0222 3.4446
v:
61.2324
52.2271
gNum =
    8.3624
   -1.1496
HNum = 1.0e+03 *
    1.4066   -0.5653
   -0.5653    1.6826
gSym =
  -52.8700
  -53.3768
HSym = 1.0e+03 *
    1.4066   -0.5653
   -0.5653    1.6826
As you can see, the values of HNum and HSym match, but the gradients don't.
I'm happy to give any more context information, code snippets, or anything that helps. Thank you in advance!
Edit: As requested, here is a minimal test. The problem is basically that the values of gNum and gSym don't match (longer explanation above):
omega = [[12.6987, 91.3376];[95.7507, 96.4889];[15.7613, 97.0593];
[95.7167, 48.5376];[70.6046, 3.1833];[27.6923, 4.6171];[9.7132, 82.3458];
[95.0222, 3.4446]];
v = [61.2324;52.2271];
gradStr = sprintf('Grad_%d_dim_%d_n',length(omega(1,:)),length(omega));
hesseStr = sprintf('Hesse_%d_dim_%d_n',length(omega(1,:)),length(omega));
g = str2func(gradStr);
H = str2func(hesseStr);
selector = ones(1,length(omega)); %this will change, when n_max>n
vtilde = zeros(length(omega),length(omega(1,:)));
for i = 1:length(omega)
vtilde(i,:) = omega(i,:)-v;
end
lambda = zeros(1,length(omega(1,:))); % start of the optimization
[gNum,~,~] = gradest(@F,lambda)
[HNum,~] = hessian(@F,lambda)
gSym = g(selector,lambda.',omega.')
HSym = H(selector,lambda.',omega.')
Note: The DerivestSuite is a small library (~6 source files) that can be obtained under https://de.mathworks.com/matlabcentral/fileexchange/13490-adaptive-robust-numerical-differentiation
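One thing that may be worth checking, sketched here in Python rather than MATLAB: the test script passes omega.' to g and H, whereas F is defined on v_tilde = omega - v. For F(lambda) = log(sum_j m_j exp(-lambda . v_j)), the gradient is minus the weighted mean of the v_j and the Hessian is their weighted covariance. Shifting every v_j by the same vector changes the mean but not the covariance, which would produce exactly the reported pattern. The function names below are hypothetical, and all m_j are assumed to be 1.

```python
import math

omega = [[12.6987, 91.3376], [95.7507, 96.4889], [15.7613, 97.0593],
         [95.7167, 48.5376], [70.6046, 3.1833], [27.6923, 4.6171],
         [9.7132, 82.3458], [95.0222, 3.4446]]
v = [61.2324, 52.2271]

def weights(lam, vectors):
    """Normalised weights exp(-lam . v_j) / Z, with all selectors m_j = 1."""
    w = [math.exp(-sum(l * x for l, x in zip(lam, vec))) for vec in vectors]
    z = sum(w)
    return [wj / z for wj in w]

def grad_F(lam, vectors):
    """Analytic gradient of log Z: minus the weighted mean of the vectors."""
    w = weights(lam, vectors)
    return [-sum(wj * vec[k] for wj, vec in zip(w, vectors))
            for k in range(len(lam))]

def hess_F(lam, vectors):
    """Analytic Hessian of log Z: the weighted covariance of the vectors."""
    w = weights(lam, vectors)
    d = len(lam)
    mean = [sum(wj * vec[k] for wj, vec in zip(w, vectors)) for k in range(d)]
    return [[sum(wj * vec[a] * vec[b] for wj, vec in zip(w, vectors))
             - mean[a] * mean[b] for b in range(d)] for a in range(d)]

v_tilde = [[row[k] - v[k] for k in range(2)] for row in omega]
lam0 = [0.0, 0.0]

g_shifted = grad_F(lam0, v_tilde)  # about 8.36 and -1.15, like gNum
g_raw = grad_F(lam0, omega)        # about -52.87 and -53.38, like gSym
H_shifted = hess_F(lam0, v_tilde)
H_raw = hess_F(lam0, omega)        # same as H_shifted up to rounding
print(g_shifted, g_raw)
```

If this is indeed the cause, calling g(selector, lambda.', vtilde.') instead should bring gSym in line with gNum.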
I implemented a probabilistic matrix factorization model (R = U'V) following the example in Edward's repo:
# data
U_true = np.random.randn(D, N)
V_true = np.random.randn(D, M)
R_true = np.dot(np.transpose(U_true), V_true) + np.random.normal(0, 0.1, size=(N, M))
# model
I = tf.placeholder(tf.float32, [N, M])
U = Normal(loc=tf.zeros([D, N]), scale=tf.ones([D, N]))
V = Normal(loc=tf.zeros([D, M]), scale=tf.ones([D, M]))
R = Normal(loc=tf.matmul(tf.transpose(U), V), scale=tf.ones([N, M]))
I get a good performance when predicting the data in matrix R. However, when I evaluate the inferred traits in U and V, the error varies a lot and can get very high.
I tried with a latent space of small dimension (e.g. 2) and checked if latent traits weren't simply permuted. They sometimes get permuted but even after realigning them the error is still significant.
To throw some numbers: for a synthetic R matrix generated from U and V both normally distributed (mean 0 and variance 1), I can achieve a mean absolute error of 0.003 on R, but on U and V it's usually around 0.5.
I know this model is symmetric, but I am not sure about the implications. I would like to ask:
Is it actually possible to guarantee the recovery of the original latent traits in some way?
If so, how could it be achieved, preferably using Edward?
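The symmetry mentioned above can be made concrete with a small sketch in plain Python (no Edward; the helper names are illustrative). For any orthogonal matrix Q, (QU)'(QV) = U'Q'QV = U'V, and a standard normal prior is also unchanged by rotation. The data therefore cannot distinguish (U, V) from (QU, QV), which suggests that exact recovery of the latent traits needs extra constraints beyond the model as written.

```python
import math
import random

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

rng = random.Random(0)
D, N, M = 2, 4, 3
U = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(D)]
V = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(D)]

# An arbitrary rotation of the 2-dimensional latent space.
theta = 0.7
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

QU, QV = matmul(Q, U), matmul(Q, V)
R = matmul(transpose(U), V)
R_rot = matmul(transpose(QU), QV)

max_diff_R = max(abs(R[i][j] - R_rot[i][j]) for i in range(N) for j in range(M))
max_diff_U = max(abs(U[i][j] - QU[i][j]) for i in range(D) for j in range(N))
print(max_diff_R, max_diff_U)  # R is unchanged while U is not
```

Permutations and sign flips of the latent dimensions are special cases of such a Q, so realigning permutations alone would not be expected to remove the error on U and V.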
I have two functions
m1 = f1(w, s)
m2 = f2(w, s)
f1() and f2() are both black boxes. Given w and s, I can get m1 and m2.
Now, I need to design or find a function g, such that
m2' = g(m1)
Also, the difference between m2 and m2' must be minimized.
Both w and s are stochastic processes.
How can I find such a function g()? What knowledge domain does this belong to?
Assuming you can invoke f1,f2 as many times as you want - this can be solved using regression.
Set a training set: (w_1,s_1,m2_1),...,(w_n,s_n,m2_n).
'Convert' the set to the parameters of g:
(m1_1,m2_1),...,(m1_n,m2_n).
Create your 'basis functions'. For example, for basis functions of
polynomials up to degree 3 the 'modified' training set will be
(1,m1_1,m1_1^2,m1_1^3,m2_1), ... It is easy to generalize this to any
polynomial degree or any other set of basis functions.
Now you have yourself a problem which can be solved by linear
regression using ordinary least squares (OLS)
However, note that for some functions it might be impossible to find a model that fits well, since you lose information when you reduce the dimensionality from 2 (w,s) to 1 (m1).
Matlab code snippet (poor choice of functions):
%example functions
f = @(w,s) w.^2 + s.^3 -1;
g = @(w,s) s.^2 - w + 2;
%random points for sampling
w = rand(1,100);
s = rand(1,100);
%the data
m1 = f(w,s)';
m2 = g(w,s)';
%changing dimension:
d = 5;
points = size(m1,1);
A = ones(points,d);
for jj=1:d
A(:,jj) = (m1.^(jj-1))';
end
%OLS:
theta = pinv(A'*A)*A'*m2;
%new point:
w = rand(1,1);
s = rand(1,1);
m1 = f(w,s);
%estimate the new point:
A = ones(1,d);
for jj=1:d
A(:,jj) = (m1.^(jj-1))';
end
%the estimation:
estimated = A*theta
%the real value:
g(w,s)
This kind of problem is studied in fields such as statistics or inverse problems. Here's one way to approach the problem theoretically (from the point of view of inverse problems):
First of all, it is quite clear that in the general case the function g might not exist. However, what you can (try to) compute, given that you (assume to) know something about the statistics of w and s, is the posterior probability density p(m2|m1), which can then be used to compute estimators for m2 given m1, for instance a maximum a posteriori estimate.
The posterior density can be computed using Bayes' formula:
p(m2|m1) = (\int p(m1,m2|w,s)p(w,s) dw ds) / (\int p(m1|w,s)p(w,s) dw ds)
which, in this case, might be (theoretically) nasty to apply since some of the involved marginal probability densities are singular. The best way to proceed numerically depends on the additional assumptions you can make on the statistics of w and s (e.g., Gaussian) and the functions f1, f2 (e.g., smooth). There is no silver bullet.
amit's OLS solution is probably a good starting point. Just be sure to sample from the correct distributions for w and s.
How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution? What if I want a mean and standard deviation of my choosing?
There are plenty of methods:
Do not use Box Muller. Especially if you draw many gaussian numbers. Box Muller yields a result which is clamped between -6 and 6 (assuming double precision. Things worsen with floats.). And it is really less efficient than other available methods.
Ziggurat is fine, but needs a table lookup (and some platform-specific tweaking due to cache size issues)
Ratio-of-uniforms is my favorite: only a few additions/multiplications, and a log 1/50th of the time (e.g. look there).
Inverting the CDF is efficient (and overlooked, why?); you can find fast implementations of it if you search Google. It is mandatory for quasi-random numbers.
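As an illustration of the ratio-of-uniforms item in the list above, here is a minimal Python sketch of the Kinderman-Monahan variant for the standard normal. The function name and the rng parameter are illustrative choices.

```python
import math
import random

def normal_ratio_of_uniforms(rng=random):
    """One standard normal draw by the ratio-of-uniforms method."""
    bound = math.sqrt(2.0 / math.e)  # largest |v| in the acceptance region
    while True:
        u = 1.0 - rng.random()                   # in (0, 1], avoids log(0)
        v = (2.0 * rng.random() - 1.0) * bound   # in [-bound, bound)
        x = v / u
        # Accept when (u, v) satisfies u**2 <= exp(-x**2 / 2).
        if x * x <= -4.0 * math.log(u):
            return x

rng = random.Random(42)
samples = [normal_ratio_of_uniforms(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

The logarithm is evaluated on every pass in this plain version; faster variants add cheap pre-tests so that the log is needed only rarely.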
The Ziggurat algorithm is pretty efficient for this, although the Box-Muller transform is easier to implement from scratch (and not crazy slow).
Changing a uniform distribution into another distribution involves using the inverse of the cumulative distribution function you want.
In other words, if you aim for a specific probability density p(x), you get the cumulative distribution by integrating it, d(x) = integral(p(x)), and then use its inverse, Inv(d). Now take the uniform random function and pass its result through Inv(d). You get random values distributed according to the function you chose.
This is the generic math approach: with it you can use any probability or distribution function you have, as long as it has an inverse or a good approximation of its inverse.
Hope this helped, and thanks for the small remark about using the distribution and not the probability itself.
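A short sketch of this inversion approach in Python, assuming the standard library's statistics.NormalDist for the normal inverse CDF and the closed-form inverse for the exponential distribution. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def exponential_by_inversion(rate, rng=random):
    """Exponential draw: solve u = 1 - exp(-rate * x) for x."""
    u = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - u) / rate    # 1 - u lies in (0, 1]

def normal_by_inversion(mean=0.0, std_dev=1.0, rng=random):
    """Normal draw: pass a uniform number through the inverse CDF."""
    u = rng.random()
    while u == 0.0:                     # inv_cdf requires 0 < u < 1
        u = rng.random()
    return NormalDist(mean, std_dev).inv_cdf(u)

rng = random.Random(7)
normal_samples = [normal_by_inversion(50.0, 3.0, rng) for _ in range(20000)]
expo_samples = [exponential_by_inversion(2.0, rng) for _ in range(20000)]
normal_mean = sum(normal_samples) / len(normal_samples)
expo_mean = sum(expo_samples) / len(expo_samples)
print(normal_mean, expo_mean)
```

Because each output is a monotone function of a single uniform input, the same code works unchanged when the uniform numbers come from a quasi-random sequence.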
Here is a javascript implementation using the polar form of the Box-Muller transformation.
/*
* Returns member of set with a given mean and standard deviation
* mean: mean
* standard deviation: std_dev
*/
function createMemberInNormalDistribution(mean,std_dev){
return mean + (gaussRandom()*std_dev);
}
/*
* Returns random number in normal distribution centering on 0.
* ~95% of numbers returned should fall between -2 and 2
* ie within two standard deviations
*/
function gaussRandom() {
var u = 2*Math.random()-1;
var v = 2*Math.random()-1;
var r = u*u + v*v;
/*if outside interval [0,1] start over*/
if(r == 0 || r >= 1) return gaussRandom();
var c = Math.sqrt(-2*Math.log(r)/r);
return u*c;
/* todo: optimize this algorithm by caching (v*c)
* and returning next time gaussRandom() is called.
* left out for simplicity */
}
Where R1, R2 are random uniform numbers:
NORMAL DISTRIBUTION, with SD of 1:
sqrt(-2*log(R1))*cos(2*pi*R2)
This is exact... no need to do all those slow loops!
Reference: dspguide.com/ch2/6.htm
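The formula above written out as a small Python sketch. The sine term gives a second, independent value from the same two uniforms; the function name is illustrative, and log is the natural logarithm.

```python
import math
import random

def box_muller(rng=random):
    """Return two independent standard normal values from two uniforms."""
    r1 = 1.0 - rng.random()   # in (0, 1], so log(r1) is always defined
    r2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(r1))
    angle = 2.0 * math.pi * r2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(3)
values = [z for _ in range(10000) for z in box_muller(rng)]
mean = sum(values) / len(values)
var = sum((z - mean) ** 2 for z in values) / len(values)
print(mean, var)
```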
Use the central limit theorem (see the Wikipedia and MathWorld entries) to your advantage.
Generate n uniformly distributed numbers, sum them and subtract n*0.5; the result is approximately normally distributed with mean 0 and variance n/12, since each uniform number on [0, 1] has variance 1/12 (see Wikipedia on uniform distributions for that last one). Divide by sqrt(n/12) if you want unit variance.
n=10 gives you something half decent, fast. If you want something more than half decent, go for Tyler's solution (as noted in the Wikipedia entry on normal distributions).
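A minimal Python sketch of this central-limit approach. It uses n = 12 by default, an assumption chosen because the sum then already has variance 1; the function name is illustrative.

```python
import math
import random

def approx_normal_clt(n=12, rng=random):
    """Approximate standard normal draw from the sum of n uniforms."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12; centre and rescale it.
    return (total - n * 0.5) / math.sqrt(n / 12.0)

rng = random.Random(11)
samples = [approx_normal_clt(rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var, max(abs(s) for s in samples))
```

With n = 12 the output can never leave [-6, 6], so the tails are cut off; that is the price of the approximation.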
I would use Box-Muller. Two things about this:
You end up with two values per iteration
Typically, you cache one value and return the other. On the next call for a sample, you return the cached value.
Box-Muller gives a Z-score
You have to then scale the Z-score by the standard deviation and add the mean to get the full value in the normal distribution.
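Both points can be sketched together in Python. The class below uses the polar form of Box-Muller, caches the second value of each pair, and scales the Z-score by the standard deviation before adding the mean. The class and method names are hypothetical.

```python
import math
import random

class GaussianSource:
    """Polar Box-Muller sampler that caches the spare value of each pair."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._spare = None

    def standard(self):
        """Return one Z-score (mean 0, standard deviation 1)."""
        if self._spare is not None:
            z, self._spare = self._spare, None
            return z
        while True:
            u = 2.0 * self._rng.random() - 1.0
            v = 2.0 * self._rng.random() - 1.0
            s = u * u + v * v
            if 0.0 < s < 1.0:   # keep points strictly inside the unit circle
                break
        c = math.sqrt(-2.0 * math.log(s) / s)
        self._spare = v * c     # saved for the next call
        return u * c

    def sample(self, mean, std_dev):
        """Scale the Z-score by std_dev and shift it by mean."""
        return mean + std_dev * self.standard()

source = GaussianSource(seed=5)
samples = [source.sample(10.0, 2.0) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)
```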
It seems incredible that I could add something to this after eight years, but for the case of Java I would like to point readers to the Random.nextGaussian() method, which generates a Gaussian distribution with mean 0.0 and standard deviation 1.0 for you.
A simple addition and/or multiplication will change the mean and standard deviation to your needs.
The standard Python library module random has what you want:
normalvariate(mu, sigma)
Normal distribution. mu is the mean, and sigma is the standard deviation.
For the algorithm itself, take a look at the function in random.py in the Python library.
The manual entry is here
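A short usage sketch of normalvariate; the seed and the parameter values are arbitrary choices for illustration.

```python
import random

rng = random.Random(2024)   # seeded only to make the run repeatable
samples = [rng.normalvariate(50.0, 3.0) for _ in range(10000)]

mean = sum(samples) / len(samples)
std_dev = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(mean, std_dev)        # should be close to 50 and 3
```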
This is a Matlab implementation using the polar form of the Box-Muller transformation:
Function randn_box_muller.m:
function [values] = randn_box_muller(n, mean, std_dev)
if nargin == 1
mean = 0;
std_dev = 1;
end
r = gaussRandomN(n);
values = r.*std_dev + mean;
end
function [values] = gaussRandomN(n)
[u, v, r] = gaussRandomNValid(n);
c = sqrt(-2*log(r)./r);
values = u.*c;
end
function [u, v, r] = gaussRandomNValid(n)
r = zeros(n, 1);
u = zeros(n, 1);
v = zeros(n, 1);
filter = r==0 | r>=1;
% if outside interval [0,1] start over
while n ~= 0
u(filter) = 2*rand(n, 1)-1;
v(filter) = 2*rand(n, 1)-1;
r(filter) = u(filter).*u(filter) + v(filter).*v(filter);
filter = r==0 | r>=1;
n = size(r(filter),1);
end
end
And invoking histfit(randn_box_muller(10000000),100); this is the result:
Obviously it is really inefficient compared with the Matlab built-in randn.
This is my JavaScript implementation of Algorithm P (Polar method for normal deviates) from Section 3.4.1 of Donald Knuth's book The Art of Computer Programming:
function normal_random(mean,stddev)
{
var V1
var V2
var S
do{
var U1 = Math.random() // return uniform distributed in [0,1[
var U2 = Math.random()
V1 = 2*U1-1
V2 = 2*U2-1
S = V1*V1+V2*V2
}while(S >= 1)
if(S===0) return mean // the deviate is 0 in this case, so the result is the mean
return mean+stddev*(V1*Math.sqrt(-2*Math.log(S)/S))
}
I think you should try this in Excel: =norminv(rand();0;1). This will produce random numbers that are normally distributed with zero mean and unit variance. The "0" can be replaced with any value to get the desired mean, and the "1" is the standard deviation, so the variance will equal the square of your input.
For example: =norminv(rand();50;3) will yield normally distributed numbers with MEAN = 50 and VARIANCE = 9.
Q How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution?
For a software implementation, I know a couple of random generators that give you a pseudo-random uniform sequence in [0,1] (Mersenne Twister, linear congruential generator). Let's call it U(x).
There is an area of mathematics called probability theory.
First: if you want to model a random variable with cumulative distribution function F, you can simply evaluate F^-1(U(x)). Probability theory proves that such a random variable has cumulative distribution function F.
This step can be used to generate a random variable distributed as F without any numerical methods whenever F^-1 can be derived analytically without problems (e.g. the exponential distribution).
To model the normal distribution you can calculate y2*cos(y1), where y1 is uniform on [0, 2pi] and y2 has a Rayleigh distribution.
Q: What if I want a mean and standard deviation of my choosing?
You can calculate sigma*N(0,1)+m.
It can be shown that such shifting and scaling leads to a normal distribution with mean m and standard deviation sigma.
I have the following code which maybe could help:
set.seed(123)
n <- 1000
u <- runif(n) #creates U
x <- -log(u)
y <- runif(n, max=u*sqrt((2*exp(1))/pi)) #create Y
z <- ifelse (y < dnorm(x)/2, -x, NA)
z <- ifelse ((y > dnorm(x)/2) & (y < dnorm(x)), x, z)
z <- z[!is.na(z)]
It is also easier to use the built-in function rnorm(), since it is faster than writing your own random number generator for the normal distribution. See the following code as proof:
n <- length(z)
t0 <- Sys.time()
z <- rnorm(n)
t1 <- Sys.time()
t1-t0
function distRandom(){
do{
x=random(DISTRIBUTION_DOMAIN);
}while(random(DISTRIBUTION_RANGE)>=distributionFunction(x));
return x;
}
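The pseudocode above is rejection sampling under a flat envelope. Here is a runnable Python sketch of the same loop, assuming a density that is bounded by density_max on a finite interval [low, high]; the function and parameter names are illustrative. The example target is the unnormalised normal density exp(-x^2/2) truncated to [-4, 4].

```python
import math
import random

def rejection_sample(density, low, high, density_max, rng=random):
    """Draw one value with probability proportional to density on [low, high)."""
    while True:
        x = low + (high - low) * rng.random()   # candidate from the domain
        y = density_max * rng.random()          # uniform height under the box
        if y < density(x):                      # keep points under the curve
            return x

def bell(x):
    """Unnormalised standard normal density; its maximum is 1 at x = 0."""
    return math.exp(-0.5 * x * x)

rng = random.Random(9)
samples = [rejection_sample(bell, -4.0, 4.0, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

The density does not need to be normalised, but a wide interval wastes many candidates: here only about a third of them are accepted.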