Ridge, Lasso, and Elastic Net in the statsmodels package

I am using the statsmodels package for Ridge, Lasso, and Elastic Net, following the documentation.
My Ridge regression hyper-parameters:
L1_wt = 0 and method = 'sqrt_lasso'
My Lasso regression hyper-parameters:
L1_wt = 1 and method = 'sqrt_lasso'
But I get almost the same results (not identical, but very close).
Then, trying the following hyper-parameters produces different results:
L1_wt = 0.02 and method = 'elastic_net'
My question: I think I am missing something and using the wrong hyper-parameters for ridge regression (the first example). The documentation does not state it explicitly; my guess is that to get ridge results I need to use the following hyper-parameters:
L1_wt = 0 and method = 'elastic_net'
Can anyone confirm this?
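As I read the statsmodels `fit_regularized` docstring, the 'elastic_net' method minimizes 0.5*RSS/n + alpha*((1 - L1_wt)*|params|_2^2/2 + L1_wt*|params|_1), so L1_wt = 0 leaves a pure L2 (ridge) penalty and L1_wt = 1 a pure L1 (lasso) penalty. As far as I can tell, 'sqrt_lasso' is a separate, L1-only method that does not use L1_wt at all, which would explain why the two 'sqrt_lasso' runs look almost the same. The pure-Python sketch below only evaluates that documented objective on toy data (the data and the helper name `elastic_net_objective` are illustrative assumptions) to show how L1_wt switches between the two penalties.

```python
def elastic_net_objective(params, X, y, alpha, L1_wt):
    """0.5*RSS/n + alpha*((1 - L1_wt)*|params|_2^2/2 + L1_wt*|params|_1)."""
    n = len(y)
    # Residuals y - X @ params, written out without NumPy.
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, params))
        for row, yi in zip(X, y)
    ]
    rss = sum(r * r for r in residuals)
    l2_sq = sum(b * b for b in params)   # squared L2 norm (ridge part)
    l1 = sum(abs(b) for b in params)     # L1 norm (lasso part)
    return 0.5 * rss / n + alpha * ((1 - L1_wt) * l2_sq / 2 + L1_wt * l1)


# Toy data with a perfect fit (RSS = 0), so only the penalty term remains.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -2.0]
params = [1.0, -2.0]

ridge = elastic_net_objective(params, X, y, alpha=1.0, L1_wt=0.0)  # pure L2
lasso = elastic_net_objective(params, X, y, alpha=1.0, L1_wt=1.0)  # pure L1
print(ridge, lasso)

# With statsmodels itself, a ridge fit would look like this (not run here):
# import statsmodels.api as sm
# res = sm.OLS(y, X).fit_regularized(method="elastic_net", alpha=1.0, L1_wt=0.0)
```

With RSS = 0 the ridge value is 1*(1^2 + 2^2)/2 = 2.5 and the lasso value is |1| + |-2| = 3.0, which makes it easy to see which norm each setting uses.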

Related

SARIMAX model in PyMC3

I would like to write the following SARIMAX (2,0,0)(2,0,0,12) model in PyMC3 to perform Bayesian estimation of its coefficients, but I cannot figure out how to start with the seasonal part.
Has anyone tried something like this?
import arviz as az
import pymc3 as pm

# data: the observed time series (defined elsewhere)
with pm.Model() as ar2:
    theta = pm.Normal("theta", 0.0, 1.0, shape=2)
    sigma = pm.HalfNormal("sigma", 3)
    likelihood = pm.AR("y", theta, sigma=sigma, observed=data)
    trace = pm.sample(
        1000,
        tune=2000,
        random_seed=13,
    )
    idata = az.from_pymc3(trace)
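One way to start on the seasonal part: a purely autoregressive SARIMAX (2,0,0)(2,0,0,12) is an AR model whose lag polynomial is the product (1 - phi1 L - phi2 L^2)(1 - Phi1 L^12 - Phi2 L^24). Expanding it gives a sparse AR(26) whose coefficients are functions of just four parameters. The pure-Python sketch below (helper names are hypothetical) does that expansion; how you would feed the resulting constrained coefficients into a PyMC3 likelihood is left open here, and I have not checked whether `pm.AR` accepts such a vector directly.

```python
def poly_mul(p, q):
    """Multiply two lag polynomials given as coefficient lists (index = lag)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def sarima_ar_coefficients(phi, Phi, s=12):
    """Expand (1 - sum phi_i L^i)(1 - sum Phi_m L^(m*s)) and return the
    non-zero a_k of y_t = sum_k a_k * y_{t-k} + e_t as {lag: a_k}."""
    nonseasonal = [1.0] + [-c for c in phi]
    seasonal = [0.0] * (s * len(Phi) + 1)
    seasonal[0] = 1.0
    for m, c in enumerate(Phi, start=1):
        seasonal[m * s] = -c
    product = poly_mul(nonseasonal, seasonal)
    # The polynomial is 1 - sum_k a_k L^k, so a_k = -product[k].
    return {k: -c for k, c in enumerate(product) if k > 0 and c != 0.0}


# Illustrative parameter values only.
coeffs = sarima_ar_coefficients(phi=(0.5, 0.2), Phi=(0.3, 0.1))
print(sorted(coeffs.items()))
```

The non-zero lags come out as 1, 2, 12, 13, 14, 24, 25 and 26; the cross terms such as a_13 = -phi1*Phi1 are what distinguishes the multiplicative seasonal model from simply adding lags 12 and 24.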
Ideally you would get an answer that uses PyMC3 exclusively (e.g. for the best performance), but in case that does not exist yet, there is an alternative that combines the SARIMAX model in Statsmodels with PyMC3.
There are too many details to repeat a full answer here, but basically you wrap the log-likelihood and gradient methods associated with a Statsmodels SARIMAX model. Here is a link to an example Jupyter notebook that shows how to do this:
https://www.statsmodels.org/stable/examples/notebooks/generated/statespace_sarimax_pymc3.html
I'm not sure if you still need it; however, expanding on cfulton's answer, here is how to fix the error in the statsmodels example (https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_pymc3.html, cell 8):
import pymc3 as pm
import theano.tensor as tt

# loglike, ndraws and nburn are defined in earlier cells of the notebook
with pm.Model():
    # Priors
    arL1 = pm.Uniform('ar.L1', -0.99, 0.99)
    maL1 = pm.Uniform('ma.L1', -0.99, 0.99)
    sigma2 = pm.InverseGamma('sigma2', 2, 4)

    # convert variables to tensor vectors
    # # this is wrong:
    # theta = tt.as_tensor_variable([arL1, maL1, sigma2])
    # # this is correct:
    theta = tt.as_tensor_variable([arL1, maL1, sigma2], 'v')

    # use a DensityDist (use a lambda function to "call" the Op)
    # # this is wrong:
    # pm.DensityDist('likelihood', lambda v: loglike(v), observed={'v': theta})
    # # this is correct:
    pm.DensityDist('likelihood', lambda v: loglike(v), observed=theta)

    # Draw samples
    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, cores=4)
I'm no pymc3/theano expert, but I think the error means that Theano failed to associate the tensor's name with its values. If you give the tensor its name at the moment you create it from the values, it works.
I know it's not a direct answer to your question. Nevertheless, I hope it helps.

Copying embeddings for gensim word2vec

I wanted to see if I can simply set new weights for gensim's Word2Vec without training. I took the 20 Newsgroups data set from scikit-learn (from sklearn.datasets import fetch_20newsgroups) and trained an instance of Word2Vec on it:
model_w2v = models.Word2Vec(sg = 1, size=300)
model_w2v.build_vocab(all_tokens)
model_w2v.train(all_tokens, total_examples=model_w2v.corpus_count, epochs = 30)
Here all_tokens is the tokenized data set.
Then I created a new instance of Word2Vec without training:
model_w2v_new = models.Word2Vec(sg = 1, size=300)
model_w2v_new.build_vocab(all_tokens)
and set the embeddings of the new Word2Vec equal to those of the first one:
model_w2v_new.wv.vectors = model_w2v.wv.vectors
Most of the functions work as expected, e.g.
model_w2v.wv.similarity( w1='religion', w2 = 'religions')
> 0.4796233
model_w2v_new.wv.similarity( w1='religion', w2 = 'religions')
> 0.4796233
and
model_w2v.wv.words_closer_than(w1='religion', w2 = 'judaism')
> ['religions']
model_w2v_new.wv.words_closer_than(w1='religion', w2 = 'judaism')
> ['religions']
and
entities_list = list(model_w2v.wv.vocab.keys())
entities_list.remove('religion')  # list.remove() works in place and returns None
model_w2v.wv.most_similar_to_given(entity1='religion',entities_list = entities_list)
> 'religions'
model_w2v_new.wv.most_similar_to_given(entity1='religion',entities_list = entities_list)
> 'religions'
However, most_similar doesn't work:
model_w2v.wv.most_similar(positive=['religion'], topn=3)
>[('religions', 0.4796232581138611),
> ('judaism', 0.4426296651363373),
> ('theists', 0.43141329288482666)]
model_w2v_new.wv.most_similar(positive=['religion'], topn=3)
>[('roderick', 0.22643062472343445),
> ('nci', 0.21744996309280396),
> ('soviet', 0.20012077689170837)]
What am I missing?
Disclaimer: I posted this question on datascience.stackexchange but got no response, so I'm hoping for better luck here.
Generally, your approach should work.
It's likely that the specific problem you're encountering was caused by an extra probing step you took that isn't shown in your code (because you had no reason to think it significant): some sort of most_similar()-like operation on model_w2v_new after its build_vocab() call but before the later, malfunctioning operations.
Traditionally, most_similar() calculations operate on a version of the vectors that has been normalized to unit length. The first time these unit-normed vectors are needed, they're calculated and then cached inside the model. So, if you then replace the raw vectors with other values but don't discard those cached values, you'll see results like the ones you're reporting: essentially random, reflecting the randomly initialized but never trained starting vector values.
If this is what happened, just discarding the cached values should cause the next most_similar() to refresh them properly, and then you should get the results you expect:
model_w2v_new.wv.vectors_norm = None
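To make the caching explanation concrete, here is a toy model of it. The class below is hypothetical (it is not gensim's KeyedVectors); it only mimics a lazily filled `vectors_norm` cache to show why swapping the raw vectors leaves stale similarity results until the cache is cleared.

```python
import math


class TinyKeyedVectors:
    """Hypothetical stand-in that caches unit-normed vectors on first use."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.vectors_norm = None  # cache, filled by the first similarity query

    @staticmethod
    def _unit(v):
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    def _normed(self):
        # Computed once, then reused even if self.vectors changes later.
        if self.vectors_norm is None:
            self.vectors_norm = [self._unit(v) for v in self.vectors]
        return self.vectors_norm

    def cosine(self, i, j):
        normed = self._normed()
        return sum(x * y for x, y in zip(normed[i], normed[j]))


kv = TinyKeyedVectors([[1.0, 0.0], [0.0, 1.0]])
before = kv.cosine(0, 1)                    # fills the cache from the old vectors
kv.vectors = [[1.0, 0.0], [2.0, 0.0]]       # replace the raw vectors
stale = kv.cosine(0, 1)                     # still answered from the old cache
kv.vectors_norm = None                      # discard the cache, as suggested above
fresh = kv.cosine(0, 1)                     # recomputed from the new vectors
print(before, stale, fresh)
```

The middle query reflects the old vectors even though `kv.vectors` was replaced, which mirrors the "random-looking" most_similar() output in the question.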

Can I perform Keras training in a deterministic manner?

I'm using a Keras Sequential model where the inputs and labels are exactly the same each run. Keras is using a Tensorflow backend.
I've set the layer initializers to 'zeros' and disabled batch shuffling during training.
model = Sequential()
model.add(Dense(128,
activation='relu',
kernel_initializer='zeros',
bias_initializer='zeros'))
...
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.fit(x_train, y_train,
batch_size = 128, verbose = 1, epochs = 200,
validation_data=(x_validation, y_validation),
shuffle=False)
I've also tried seeding NumPy's random number generator:
np.random.seed(7) # fix random seed for reproducibility
With the above in place I still receive different accuracy and loss values after training.
Am I missing something or is there no way to fully remove the variance between trainings?
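Seeding NumPy alone leaves Python's own `random` module and TensorFlow's seed untouched. Below is a minimal sketch of seeding all of them, assuming a TensorFlow backend; the helper name `seed_everything` is hypothetical, NumPy and TensorFlow are seeded only if they can be imported, and `tf.config.experimental.enable_op_determinism` is called only on TensorFlow versions that provide it. Even with every seed fixed, some GPU kernels may still not be bit-for-bit reproducible.

```python
import os
import random


def seed_everything(seed):
    """Seed the RNGs a Keras/TensorFlow training run may draw from (sketch)."""
    # Only affects Python processes started after this point; to cover the
    # current interpreter it has to be set before Python starts.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global RNG (what the question seeds)
    except ImportError:
        pass
    try:
        import tensorflow as tf
    except ImportError:
        return
    set_seed = getattr(getattr(tf, "random", None), "set_seed", None)
    if set_seed is None:  # TensorFlow 1.x spelling
        set_seed = tf.set_random_seed
    set_seed(seed)
    # Newer TensorFlow releases can also force deterministic kernels.
    experimental = getattr(tf.config, "experimental", None)
    if experimental is not None and hasattr(experimental, "enable_op_determinism"):
        experimental.enable_op_determinism()


seed_everything(7)
first = [random.random() for _ in range(3)]
seed_everything(7)
second = [random.random() for _ in range(3)]
print(first == second)
```

The seed has to be set again before building each model you want to compare, since every draw advances the generators.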
Since this seems to be a real issue, as commented before, maybe you could go for manually initializing your weights (instead of trusting the 'zeros' parameter passed in the layer constructor):
import numpy as np

# where you see layers[0], it's possible that the correct layer is layers[1]; I can't test this at the moment.
weights = model.layers[0].get_weights()
ws = np.zeros(weights[0].shape)
bs = np.zeros(weights[1].shape)
model.layers[0].set_weights([ws, bs])
It seems the problem occurs in training, not initialization. You can check this by first initializing two models, model1 and model2, and running the following code:
import numpy as np

w1 = model1.get_weights()
w2 = model2.get_weights()
for i in range(len(w1)):
    w1i = w1[i]
    w2i = w2[i]
    assert np.allclose(w1i, w2i), (w1i, w2i)
    print("Weight array %i was equal." % i)
print("All initial weights were equal.")
Even though all assertions passed, training model1 and model2 with shuffle=False yielded different models: if I run the same assertions on the weights of model1 and model2 after training, they all fail. This suggests that the problem lies in randomness during training.
As of this post I have not managed to figure out how to circumvent this.
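One plausible source of the remaining variance (an assumption; the posts above don't pin it down) is that some GPU kernels accumulate sums in a non-deterministic order, and floating-point addition is not associative. The pure-Python sketch below, with a hypothetical helper `sums_in_every_order`, shows that merely reordering the same three additions can change the result, so identical initial weights can still drift apart over many training steps.

```python
import itertools


def sums_in_every_order(values):
    """Return the distinct float results of adding `values` in every order."""
    results = set()
    for order in itertools.permutations(values):
        total = 0.0
        for v in order:
            total += v  # left-to-right accumulation in this particular order
        results.add(total)
    return results


# The same three numbers, summed in different orders.
print(sums_in_every_order([0.1, 0.2, 0.3]))
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

Differences this small are harmless once, but gradient updates feed them back into the weights, so they can compound into visibly different accuracy and loss curves.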

pymc3 how to code multi-state discrete Bayes net CPT?

I'm trying to build a simple Bayesian network where rain and sprinkler are the parents of wetgrass, but rain and sprinkler each have three states (fuzzy-logic style, rather than the usual two boolean states), and wetgrass has two states (true/false). I can't find anywhere in the pymc3 docs what syntax to use to describe the CPTs for this. I'm trying the following based on 2-state examples, but it doesn't generalize to three states the way I thought it would. Can anyone show the correct way to do this? (And also for the more general case where wetgrass has three states too.)
rain = mc.Categorical('rain', p = np.array([0.5, 0. ,0.5]))
sprinker = mc.Categorical('sprinkler', p=np.array([0.33,0.33,0.34]))
wetgrass = mc.Categorical('wetgrass',
mc.math.switch(rain,
mc.math.switch(sprinker, 10, 1, -4),
mc.math.switch(sprinker, -20, 1, 3),
mc.math.switch(sprinker, -5, 1, -0.5)))
[This gives an error at the wetgrass definition:
Wrong number of inputs for Switch.make_node (got 4, expected 3)
]
As I understand it, switch is a Theano function similar to (cond ? a : b) in C, which only makes a two-way choice. It may be possible to set up the CPT using a whole load of binary switches like this, but I really want to just give a 3D matrix CPT as the input, as in BNT and other Bayes net libraries. Is this currently possible?
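You can code a three-way switch using two nested switches:
import theano.tensor as tt

# In Theano, == on a tensor variable is not an elementwise comparison; use tt.eq
tt.switch(tt.eq(sprinker, 0),
          10,
          tt.switch(tt.eq(sprinker, 1), 1, -4))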
But in general it is probably better to index into a table:
table = tt.constant(np.array([[...], [...]]))
value = table[rain, sprinker]
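To illustrate what "index into a table" means for this network, here is a pure-Python sketch: the CPT is a 3x3x2 nested list P(wetgrass | rain, sprinkler), and the marginal P(wetgrass) is computed by enumerating the parents. The rain and sprinkler priors are taken from the question; the CPT numbers and helper names are hypothetical. For a three-state wetgrass the last axis would simply have three entries. In PyMC3 the same array could presumably be wrapped with tt.constant, indexed as table[rain, sprinker], and passed as p to pm.Categorical, but that part is untested here.

```python
# Priors from the question.
P_RAIN = [0.5, 0.0, 0.5]
P_SPRINKLER = [0.33, 0.33, 0.34]

# Hypothetical CPT: CPT[rain][sprinkler] = [P(wet=false), P(wet=true)].
CPT = [
    [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]],    # rain state 0
    [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]],    # rain state 1
    [[0.4, 0.6], [0.2, 0.8], [0.05, 0.95]],  # rain state 2
]


def check_cpt(cpt, tol=1e-9):
    """Raise ValueError unless every conditional distribution sums to 1."""
    for r, row in enumerate(cpt):
        for s, dist in enumerate(row):
            if abs(sum(dist) - 1.0) > tol:
                raise ValueError(f"CPT[{r}][{s}] sums to {sum(dist)}")


def marginal_wetgrass(p_rain, p_sprinkler, cpt):
    """P(wetgrass) = sum over rain, sprinkler of P(r) P(s) P(w | r, s)."""
    out = [0.0] * len(cpt[0][0])
    for r, pr in enumerate(p_rain):
        for s, ps in enumerate(p_sprinkler):
            for w, pw in enumerate(cpt[r][s]):  # table lookup, no switches
                out[w] += pr * ps * pw
    return out


check_cpt(CPT)
p_wet = marginal_wetgrass(P_RAIN, P_SPRINKLER, CPT)
print(p_wet)
```

With these made-up numbers, P(wet=true) = 0.5*(0.033 + 0.132 + 0.238) + 0.5*(0.198 + 0.264 + 0.323) = 0.594.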

Train neural network with sine function

I want to train a neural network with the sine() function.
Currently I use this code with the cerebrum gem:
require 'cerebrum'

input = Array.new
300.times do |i|
  inputH = Hash.new
  inputH[:input] = [i]
  sinus = Math::sin(i)
  inputH[:output] = [sinus]
  input.push(inputH)
end

network = Cerebrum.new
network.train(input, {
  error_threshold: 0.00005,
  iterations: 40000,
  log: true,
  log_period: 1000,
  learning_rate: 0.3
})

res = Array.new
300.times do |i|
  result = network.run([i])
  res.push(result[0])
end
puts "#{res}"
But it does not work: when I run the trained network I get some weird output values (instead of a part of the sine curve).
So, what am I doing wrong?
Cerebrum is a very basic and slow NN implementation. There are better options in Ruby, such as ruby-fann gem.
Most likely your problem is that the network is too simple. You have not specified any hidden layers; it looks like the code assigns a default hidden layer with 3 neurons in it for your case.
Try something like:
network = Cerebrum.new({
  learning_rate: 0.01,
  momentum: 0.9,
  hidden_layers: [100]
})
and expect it to take forever to train, plus still not be very good.
Also, sampling the sine at 300 integer steps is too sparse: to the network the data will look mostly like noise, and it won't interpolate well between points. A neural network does not somehow figure out "oh, that must be a sine wave" and match to it. Instead it interpolates between the points; the clever bit happens when it does so in multiple dimensions at once, perhaps finding structure that you could not spot so easily by manual inspection. To give it a reasonable chance of learning something, I suggest you give it much denser points, e.g. where you currently have sinus = Math::sin(i), instead use:
sinus = Math::sin(i.to_f/10)
That still covers almost 5 periods of the sine wave, which should hopefully be enough to show that the network can learn an arbitrary function.
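A quick check of the answer's point about sampling density, written in Python rather than Ruby (the helper names are hypothetical): it compares the largest jump between neighbouring targets for sin(i) and for sin(i/10) over the same 300 inputs, and counts how many sine periods the denser version covers.

```python
import math


def training_pairs(n_points, divisor):
    """(input, target) pairs with target = sin(i / divisor), like the Ruby loop."""
    return [(i, math.sin(i / divisor)) for i in range(n_points)]


def largest_jump(pairs):
    """Largest change in the target between neighbouring inputs."""
    return max(abs(b[1] - a[1]) for a, b in zip(pairs, pairs[1:]))


sparse = training_pairs(300, 1)    # the question: Math::sin(i)
dense = training_pairs(300, 10)    # the answer: Math::sin(i.to_f/10)
periods = (300 / 10) / (2 * math.pi)  # sine periods covered by the dense data
print(largest_jump(sparse), largest_jump(dense), periods)
```

With unit steps, neighbouring targets can differ by almost 1 (close to the full amplitude), whereas with steps of 0.1 they never differ by more than 2*sin(0.05), just under 0.1, so the curve the network sees is smooth; the dense data spans about 4.8 periods.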
