PyMC for Model Averaging - pymc

I am interested in applying PyMC to model averaging. My goal is to estimate many linear models and average the estimates across them, weighting by their posterior model probabilities. I am currently using the Bayesian information criterion (BIC) to approximate the marginal likelihood of my data (therefore, my analysis is not fully Bayesian). I have successfully simulated a Markov chain of models with a script of my own, but I want to use PyMC because it seems like a great tool.
In my attempts thus far, I have not been forming the Markov chain correctly: I am not visiting models with higher posterior weights more often than others. The example code is below; please also see the IPython notebook on GitHub for the math markup and code together.
import numpy as np
from pymc import stochastic, DiscreteMetropolis, MCMC
import statsmodels.api as sm
import pandas as pd
import random
def pack(alist, rank):
    binary = [str(1) if i in alist else str(0) for i in xrange(0, rank)]
    string = '0b1' + ''.join(binary)
    return int(string, 2)

def unpack(integer):
    string = bin(integer)[3:]
    return [int(i) for i in xrange(len(string)) if string[i] == '1']
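# A quick sanity check of the helpers (hypothetical example, not part of the model):
# columns [3, 4, 5] of a rank-20 design encode to one integer and decode back intact.
assert unpack(pack([3, 4, 5], 20)) == [3, 4, 5]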
def make_bma():
    # Simulating data
    size = 100
    rank = 20
    X = 10*np.random.randn(size, rank)
    error = 30*np.random.randn(size, 1)
    coefficients = np.array([10, 2, 2, 2, 2, 2]).reshape((6, 1))
    y = np.dot(sm.add_constant(X[:, :5], prepend=True), coefficients) + error

    # Number of allowable regressors
    predictors = [3, 4, 5, 6, 7]

    @stochastic(dtype=int)
    def regression_model():
        def logp(value):
            columns = unpack(value)
            x = sm.add_constant(X[:, columns], prepend=True)
            corr = np.corrcoef(x[:, 1:], rowvar=0)
            prior = np.linalg.det(corr)
            ols = sm.OLS(y, x).fit()
            posterior = np.exp(-0.5*ols.bic)*prior
            return np.log(posterior)

        def random():
            k = np.random.choice(predictors)
            columns = sorted(np.random.choice(range(0, rank), size=k, replace=False))
            return pack(columns, rank)

    class ModelMetropolis(DiscreteMetropolis):
        def __init__(self, stochastic):
            DiscreteMetropolis.__init__(self, stochastic)

        def propose(self):
            '''Considers a neighborhood around the previous model,
            defined as having one regressor removed or added, provided
            the total number of regressors coincides with predictors.
            '''
            # Building the set of neighboring models
            last = unpack(self.stochastic.value)
            last_indicator = np.zeros(rank)
            last_indicator[last] = 1
            last_indicator = last_indicator.reshape((-1, 1))
            neighbors = abs(np.diag(np.ones(rank)) - last_indicator)
            neighbors = neighbors[:, np.any([neighbors.sum(axis=0) == i
                                             for i in predictors], axis=0)]
            neighbors = pd.DataFrame(neighbors)
            # Drawing one model at random from the neighborhood
            draw = random.choice(xrange(neighbors.shape[1]))
            self.stochastic.value = pack(list(neighbors[draw][neighbors[draw] == 1].index), rank)

        # def step(self):
        #     logp_p = self.stochastic.logp
        #     self.propose()
        #     logp = self.stochastic.logp
        #     if np.log(random.random()) > logp_p - logp:
        #         self.reject()

    return locals()
if __name__ == '__main__':
    model = make_bma()
    M = MCMC(model)
    M.use_step_method(model['ModelMetropolis'], model['regression_model'])
    M.sample(iter=5000, burn=1000, thin=1)
    model_chain = M.trace("regression_model")[:]

    from collections import Counter
    counts = Counter(model_chain).items()
    counts.sort(reverse=True, key=lambda x: x[1])

    for f in counts[:10]:
        columns = unpack(f[0])
        print('Visits:', f[1])
        print(np.array([1. if i in columns else 0 for i in range(0, model['rank'])]))
        print(model['coefficients'].flatten())
        X = sm.add_constant(model['X'][:, columns], prepend=True)
        corr = np.corrcoef(X[:, 1:], rowvar=0)
        prior = np.linalg.det(corr)
        fit = sm.OLS(model['y'], X).fit()
        posterior = np.exp(-0.5*fit.bic)*prior
        print(fit.params)
        print('R-squared:', fit.rsquared)
        print('BIC', fit.bic)
        print('Prior', prior)
        print('Posterior', posterior)
        print(" ")

It sounds like you are trying to do something akin to reversible jump MCMC (rjMCMC), where you sample from the model space in addition to the parameter space(s). PyMC does not currently do rjMCMC, though it probably ought to. The trick is to account for the change in dimension when moving among models. If you do have a modest number of models, you can use an indicator variable to select among them, with all models fit simultaneously, as in the sketch below.
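For concreteness, here is a minimal sketch of that indicator approach, in PyMC 2 syntax with toy data and hypothetical names (ind, mu0, mu1): both candidate parameters are sampled jointly, and the trace of the Categorical indicator gives the posterior model probabilities.
import numpy as np
import pymc as pm

y_obs = np.random.randn(50) + 1.0            # toy data

ind = pm.Categorical('ind', p=[0.5, 0.5])    # model indicator (0 or 1)
mu0 = pm.Normal('mu0', mu=0, tau=1e-2)       # parameter of model 0
mu1 = pm.Normal('mu1', mu=5, tau=1e-2)       # parameter of model 1

@pm.deterministic
def mu(ind=ind, mu0=mu0, mu1=mu1):
    return [mu0, mu1][int(ind)]              # the indicator picks the active model

y = pm.Normal('y', mu=mu, tau=1.0, value=y_obs, observed=True)

M = pm.MCMC([ind, mu0, mu1, mu, y])
M.sample(5000, burn=1000)
# Approximate posterior model probabilities from the indicator's trace:
print(np.bincount(M.trace('ind')[:].astype(int), minlength=2) / 4000.0)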


Why do the trace values have periods of (unwanted) stability?

I have a fairly simple test data set I am trying to fit with pymc3.
The result generated by traceplot looks something like this.
Essentially, the traces of all parameters look like a standard 'caterpillar' for 100 iterations, followed by a flat line for 750 iterations, followed by the caterpillar again.
The initial 100 iterations happen after 25,000 ADVI iterations and 10,000 tuning iterations. If I change these amounts, these periods of unwanted stability randomly will or won't appear.
I'm wondering if anyone has any advice about how I can stop this from happening - and what is causing it?
Thanks.
The full code is below. In brief, I generate a set of 'phases' (-pi -> pi) with a corresponding set of values y = a(j)*sin(phase) + b(j)*cos(phase). a and b are drawn at random for each subject j, but are related to each other.
I then essentially try to fit this same model.
Edit: Here is a similar example, running for 25,000 iterations. Something goes wrong around iteration 20,000.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
%matplotlib inline
np.random.seed(0)
n_draw = 2000
n_tune = 10000
n_init = 25000
init_string = 'advi'
target_accept = 0.95
##
# Generate some test data
# Just generates:
# x a vector of phases
# y a vector corresponding to some sinusoidal function of x
# subject_idx a vector corresponding to which subject x is
#9 Subjects
N_j = 9
#Each with 276 measurements
N_i = 276
sigma_y = 1.0
mean = [0.1, 0.1]
cov = [[0.01, 0], [0, 0.01]] # diagonal covariance
x_sub = np.zeros((N_j,N_i))
y_sub = np.zeros((N_j,N_i))
y_true_sub = np.zeros((N_j,N_i))
ab_sub = np.zeros((N_j,2))
tuning_sub = np.zeros((N_j,1))
sub_ix_sub = np.zeros((N_j,N_i))
for j in range(0, N_j):
    aj, bj = np.random.multivariate_normal(mean, cov)
    #aj = np.abs(aj)
    #bj = np.abs(bj)
    xj = np.random.uniform(-1, 1, size=(N_i, 1))*np.pi
    xj = np.sort(xj)  # for convenience
    yj_true = aj*np.sin(xj) + bj*np.cos(xj)
    yj = yj_true + np.random.normal(scale=sigma_y, size=(N_i, 1))
    x_sub[j, :] = xj.ravel()
    y_sub[j, :] = yj.ravel()
    y_true_sub[j, :] = yj_true.ravel()
    ab_sub[j, :] = [aj, bj]
    tuning_sub[j, :] = np.sqrt((aj**2) + (bj**2))
    sub_ix_sub[j, :] = [j]*N_i
x = np.ravel(x_sub)
y = np.ravel(y_sub)
subject_idx = np.ravel(sub_ix_sub)
subject_idx = np.asarray(subject_idx, dtype=int)
##
# Fit model
hb1_model = pm.Model()
with hb1_model:
    # Hyperpriors
    hb1_mu_a = pm.Normal('hb1_mu_a', mu=0., sd=100)
    hb1_sigma_a = pm.HalfCauchy('hb1_sigma_a', 4)
    hb1_mu_b = pm.Normal('hb1_mu_b', mu=0., sd=100)
    hb1_sigma_b = pm.HalfCauchy('hb1_sigma_b', 4)
    # We fit a mixture of a sine and a cosine with these two coefficients
    # allowed to be different for each subject
    hb1_aj = pm.Normal('hb1_aj', mu=hb1_mu_a, sd=hb1_sigma_a, shape=N_j)
    hb1_bj = pm.Normal('hb1_bj', mu=hb1_mu_b, sd=hb1_sigma_b, shape=N_j)
    # Model error
    hb1_eps = pm.HalfCauchy('hb1_eps', 5)
    hb1_linear = hb1_aj[subject_idx]*pm.math.sin(x) + hb1_bj[subject_idx]*pm.math.cos(x)
    hb1_linear_like = pm.Normal('y', mu=hb1_linear, sd=hb1_eps, observed=y)

with hb1_model:
    hb1_trace = pm.sample(draws=n_draw, tune=n_tune,
                          init=init_string, n_init=n_init,
                          target_accept=target_accept)

pm.traceplot(hb1_trace)
To partially answer my own question: after playing with this for a while, it looks like the problem might be due to the hyperprior standard deviation going to 0. I am not sure why the algorithm should get stuck there, though (testing a small standard deviation can't be that uncommon...).
In any case, two solutions that seem to alleviate the problem (although they don't remove it entirely) are:
1) Add an offset to the definition of the standard deviation, e.g.:
offset = 1e-2
hb1_sigma_a = offset + pm.HalfCauchy('hb1_sigma_a', 4)
2) Instead of using a HalfCauchy or HalfNormal prior for the SD, use a log-normal distribution set so that 0 is unlikely.
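A minimal sketch of both workarounds (assumed pymc3 names; the offset can also be added inline, as above, since pymc3 variables support arithmetic):
import pymc3 as pm

with pm.Model():
    # 1) HalfCauchy plus a small offset; pm.Deterministic just records the
    #    shifted value in the trace
    raw_sigma_a = pm.HalfCauchy('raw_sigma_a', 4)
    hb1_sigma_a = pm.Deterministic('hb1_sigma_a', 1e-2 + raw_sigma_a)
    # 2) Log-normal prior, which places negligible mass near zero
    hb1_sigma_b = pm.Lognormal('hb1_sigma_b', mu=-1.0, sd=1.0)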
I'd look at the divergences, as explained in notes and literature on Hamiltonian Monte Carlo; see, e.g., here and here.
with hb1_model:
    np.savetxt('diverging.csv', hb1_trace['diverging'])
As a quick-and-dirty fix, you can try increasing target_accept.
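For example, a sketch reusing the sampling call from the question's code:
with hb1_model:
    hb1_trace = pm.sample(draws=n_draw, tune=n_tune, init=init_string,
                          n_init=n_init, target_accept=0.99)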
Good luck!

What does it mean when train and validation loss diverge from epoch 1?

I was recently working on a deep learning model in Keras and it gave me very perplexing results. The model is capable of mastering the training data over time, but it consistently gets worse results on the validation data.
I know that if the validation accuracy goes up for a while and then starts to decrease, you are over-fitting to the training data, but in this case the validation accuracy only ever decreases. I am really confused why this happens. Does anyone have any intuition as to what could cause this, or suggestions on things to test to potentially fix it?
Edit to add more info and code
Ok. So I am making a model that tries to do some basic stock prediction. By looking at the open, high, low, close, and volume of the last 40 days, the model tries to predict whether the price will go up two average true ranges without going down one average true range. As input, I took CSVs from Yahoo Finance that include this information for the last 30 years for all of the stocks in the Dow Jones Industrial Average. The model trains on 70% of the stocks and validates on the other 30%. This leads to about 150,000 training samples. I am currently using a 1D convolutional neural network, but I have also tried other, smaller models (logistic regression and a small feed-forward NN), and I always get either the same diverging train and validation loss or nothing learned at all because the model is too simple.
Here is the code:
import numpy as np
from sklearn import preprocessing
from sklearn.metrics import auc, roc_curve, roc_auc_score
from keras.layers import Input, Dense, Flatten, Conv1D, Activation, MaxPooling1D, Dropout, Concatenate
from keras.models import Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, Callback
from keras import backend as K
import matplotlib.pyplot as plt
from random import seed, shuffle
from os import listdir
class roc_auc(Callback):
    def on_train_begin(self, logs={}):
        self.aucs = []

    def on_train_end(self, logs={}):
        return

    def on_epoch_begin(self, epoch, logs={}):
        return

    def on_epoch_end(self, epoch, logs={}):
        y_pred = self.model.predict(self.validation_data[0])
        self.aucs.append(roc_auc_score(self.validation_data[1], y_pred))
        if max(self.aucs) == self.aucs[-1]:
            self.model.save_weights("weights.roc_auc.hdf5")
        print(" - auc: %0.4f" % self.aucs[-1])
        return

    def on_batch_begin(self, batch, logs={}):
        return

    def on_batch_end(self, batch, logs={}):
        return
rrr = 2
epochs = 200
batch_size = 64
days_input = 40
seed(42)
X_train = []
X_test = []
y_train = []
y_test = []
files = listdir("Stocks")
total_stocks = len(files)
shuffle(files)
for x, file in enumerate(files):
    test = False
    if (x + 1.0)/total_stocks > 0.7:
        test = True
    if test:
        print("Test -> Stocks/%s" % file)
    else:
        print("Train -> Stocks/%s" % file)
    stock = np.loadtxt(open("Stocks/" + file, "r"), delimiter=",", skiprows=1, usecols=(1, 2, 3, 5, 6))
    atr = []
    last = None
    for day in stock:
        if last is None:
            tr = abs(day[1] - day[2])
            atr.append(tr)
        else:
            tr = max(day[1] - day[2], abs(last[3] - day[1]), abs(last[3] - day[2]))
            atr.append((13*atr[-1] + tr)/14)
        last = day.copy()
    stock = np.insert(stock, 5, atr, axis=1)
    for i in range(days_input, stock.shape[0] - 1):
        input = stock[i - days_input:i, 0:5].copy()
        for j, day in enumerate(input):
            input[j][1] = (day[1] - day[0])/day[0]
            input[j][2] = (day[2] - day[0])/day[0]
            input[j][3] = (day[3] - day[0])/day[0]
        input[:, 0] = input[:, 0] / np.linalg.norm(input[:, 0])
        input[:, 1] = input[:, 1] / np.linalg.norm(input[:, 1])
        input[:, 2] = input[:, 2] / np.linalg.norm(input[:, 2])
        input[:, 3] = input[:, 3] / np.linalg.norm(input[:, 3])
        input[:, 4] = input[:, 4] / np.linalg.norm(input[:, 4])
        preprocessing.scale(input, copy=False)
        output = -1
        buy = stock[i][1]
        stoploss = buy - stock[i][5]
        target = buy + rrr*stock[i][5]
        for j in range(i + 1, stock.shape[0]):
            if stock[j][0] < stoploss or stock[j][2] < stoploss:
                output = 0
                break
            elif stock[j][1] > target:
                output = 1
                break
        if output != -1:
            if test:
                X_test.append(input)
                y_test.append(output)
            else:
                X_train.append(input)
                y_train.append(output)

shape = list(X_train[0].shape)
shape[:0] = [len(X_train)]
X_train = np.concatenate(X_train).reshape(shape)
y_train = np.array(y_train)
shape = list(X_test[0].shape)
shape[:0] = [len(X_test)]
X_test = np.concatenate(X_test).reshape(shape)
y_test = np.array(y_test)
print("Train class split is %0.2f" % (100*np.average(y_train)))
print("Test class split is %0.2f" % (100*np.average(y_test)))
inputs = Input(shape=(days_input,5))
x = Conv1D(32, 5, padding='same')(inputs)
x = Activation('relu')(x)
x = MaxPooling1D()(x)
x = Conv1D(64, 5, padding='same')(x)
x = Activation('relu')(x)
x = MaxPooling1D()(x)
x = Conv1D(128, 5, padding='same')(x)
x = Activation('relu')(x)
x = MaxPooling1D()(x)
x = Flatten()(x)
x = Dense(128, activation="relu")(x)
x = Dense(64, activation="relu")(x)
output = Dense(1, activation="sigmoid")(x)
model = Model(inputs=inputs,outputs=output)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=0, save_best_only=True, mode='max')
auc_hist = roc_auc()
callbacks_list = [checkpoint, auc_hist]
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=epochs, callbacks=callbacks_list, batch_size=batch_size,
                    class_weight='balanced').history  # note: Keras expects a dict of class weights here, not the sklearn-style 'balanced' string
model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
model.save_weights("weights.latest.hdf5")
model.load_weights("weights.roc_auc.hdf5")
plt.plot(history['acc'])
plt.plot(history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
plt.plot(history['loss'])
plt.plot(history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
plt.plot(auc_hist.aucs)
plt.title('model ROC AUC')
plt.ylabel('AUC')
plt.xlabel('epoch')
plt.show()
y_pred = model.predict(X_train)
fpr, tpr, _ = roc_curve(y_train, y_pred)
roc_auc = auc(fpr, tpr)
plt.subplot(1, 2, 1)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy',linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Train ROC')
plt.legend(loc="lower right")
y_pred = model.predict(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_pred)
roc_auc = auc(fpr, tpr)
plt.subplot(1, 2, 2)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy',linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Test ROC')
plt.legend(loc="lower right")
plt.show()
with open('roc.csv', 'w+') as file:
    for i in range(len(thresholds)):
        file.write("%f,%f,%f\n" % (fpr[i], tpr[i], thresholds[i]))
Results by 100 batches instead of by epoch
I listened to suggestions and made a few updates. The classes are now balanced 50/50 instead of 25/75. Also, the validation data is now randomly selected instead of being a specific set of stocks. By graphing the loss and accuracy at a finer resolution (100 batches vs. 1 epoch), the over-fitting can clearly be seen. The model actually does start to learn at the very beginning before it starts to diverge. I am surprised at how fast it starts to over-fit, but now that I can see the issue, hopefully I can debug it.
Possible explanations
Coding error
Overfitting due to differences in the training / validation data
Skewed classes (and differences in the training / validation data)
Things I would try
Swapping the training and the validation set. Does the problem still occur?
Plot the curves in more detail for the first ~10 epochs (e.g. directly after initialization, every few training iterations, not only per epoch). Do you still start at > 75%? Then your classes might be skewed, and you might also want to check whether your training-validation split is stratified (see the callback sketch below).
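For the finer-grained plots, a sketch of a per-batch loss logger (hypothetical helper, using the same Keras Callback API as the roc_auc class above):
from keras.callbacks import Callback

class BatchLoss(Callback):
    # Records the training loss after every batch instead of every epoch
    def on_train_begin(self, logs={}):
        self.losses = []
    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

Pass an instance to model.fit via callbacks=[...] and plot its losses attribute.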
Code
This is useless: np.concatenate(X_train)
Make your code as readable as possible when you post it here. This includes removing lines which are commented out.
This looks suspicious for a coding error to me:
if test:
    X_test.append(input)
    y_test.append(output)
else:
    #if((output == 0 and np.average(y_train) > 0.5) or output == 1):
    X_train.append(input)
    y_train.append(output)
Use sklearn.model_selection.train_test_split instead: do all transformations to the data first, then make the split with this method, for example:
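A sketch of the suggested split (assuming X and y are the fully transformed arrays; stratify keeps the class proportions equal across the two sets):
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)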
Looks like the batch size is much too small for the number of training samples you have. Try batching 20% and see if that makes a difference.

Tensorflow tf.nn.in_top_k Error targets[0] is out of range

I have a tensorflow program with four output labels. I trained the model and am now evaluating separate data with it.
The issue is that when I run the code below, I get the error targets[0] is out of range.
import tensorflow as tf
import main
import Process
import Input

eval_dir = "/Users/Zanhuang/Desktop/NNP/model.ckpt-30"
checkpoint_dir = "/Users/Zanhuang/Desktop/NNP/checkpoint"

def evaluate():
    with tf.Graph().as_default() as g:
        images, labels = Process.eval_inputs()
        forward_propgation_results = Process.forward_propagation(images)
        init_op = tf.initialize_all_variables()
        saver = tf.train.Saver()
        top_k_op = tf.nn.in_top_k(forward_propgation_results, labels, 1)
        with tf.Session(graph=g) as sess:
            sess.run(init_op)
            saver.restore(sess, eval_dir)
            tf.train.start_queue_runners(sess=sess)
            print(sess.run(top_k_op))

def main(argv=None):
    evaluate()

if __name__ == '__main__':
    tf.app.run()
In total, I only have one class.
My code for the error rate, where I turn the labels into a one-hot matrix, is here:
def error(forward_propagation_results, labels):
    labels = tf.one_hot(labels, 4)
    tf.transpose(labels)
    labels = tf.cast(labels, tf.float32)
    mean_squared_error = tf.square(tf.sub(labels, forward_propagation_results))
    cost = tf.reduce_mean(mean_squared_error)
    train = tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(cost)
    tf.histogram_summary('accuracy', mean_squared_error)
    tf.add_to_collection('losses', cost)
    tf.scalar_summary('LOSS', cost)
    return train, cost
The problem is invalid data in your labels tensor. From your comment, the labels tensor is a vector containing a single value: [40]. The value 40 is larger than the number of columns in the forward_propgation_results tensor (which is 4).
The tf.nn.in_top_k(predictions, targets, k) op has the following behavior:
For each row predictions[i, :]:
result[i] is true if predictions[i, targets[i]] is one of the k largest elements in that row; otherwise it is false.
There is no value predictions[0, 40], because (as your comment shows) that argument is a 1 x 4 matrix, so TensorFlow gives you an out-of-range error. This suggests that either your evaluation data are wrong, or you should be using a different evaluation function.
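For contrast, a minimal sketch of the op with valid targets (TF 1.x-era API, matching the question's code): with four columns, every target must lie in [0, 4).
import tensorflow as tf

predictions = tf.constant([[0.1, 0.6, 0.2, 0.1]])  # shape (1, 4): one row, four classes
targets = tf.constant([1])                         # valid, since 1 < 4
top_k_op = tf.nn.in_top_k(predictions, targets, 1)

with tf.Session() as sess:
    print(sess.run(top_k_op))  # [ True]: column 1 holds the largest value in the row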

Dirichlet process in PyMC 3

I would like to implement the Dirichlet process example referenced in Implementing Dirichlet processes for Bayesian semi-parametric models (source: here) in PyMC 3.
In the example, the stick-breaking probabilities are computed using the pymc.deterministic decorator:
v = pymc.Beta('v', alpha=1, beta=alpha, size=N_dp)

@pymc.deterministic
def p(v=v):
    """Calculate Dirichlet probabilities"""
    # Probabilities from betas
    value = [u*np.prod(1 - v[:i]) for i, u in enumerate(v)]
    # Enforce sum-to-unity constraint
    value[-1] = 1 - sum(value[:-1])
    return value

z = pymc.Categorical('z', p, size=len(set(counties)))
How would you implement this in PyMC 3, which uses Theano for the gradient computation?
Edit:
I tried the following solution, using the theano.scan method:
with pm.Model() as mod:
    conc = Uniform('concentration', lower=0.5, upper=10)
    v = Beta('v', alpha=1, beta=conc, shape=n_dp)
    p, updates = theano.scan(fn=lambda stick, idx: stick * t.prod(1 - v[:idx]),
                             outputs_info=None,
                             sequences=[v, t.arange(n_dp)])
    p = t.set_subtensor(p[-1], 1 - t.sum(p[:-1]))  # set_subtensor returns a new tensor, so assign the result
    category = Categorical('category', p, shape=n_algs)
    sd = Uniform('precs', lower=0, upper=20, shape=n_dp)
    means = Normal('means', mu=0, sd=100, shape=n_dp)
    points = Normal('obs',
                    means[category],
                    sd=sd[category],
                    observed=data)
    step1 = pm.Slice([conc, v, sd, means])
    step3 = pm.ElemwiseCategoricalStep(var=category, values=range(n_dp))
    trace = pm.sample(2000, step=[step1, step3], progressbar=True)
Sadly, this is really slow and does not recover the original parameters of the synthetic data.
Is there a better solution, and is this even correct?
Not sure I have a good answer, but perhaps this could be sped up by instead using a Theano blackbox op, which allows you to write a distribution (or deterministic) in Python code. E.g.: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/disaster_model_arbitrary_deterministic.py
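For illustration, a minimal sketch of that idea under assumed names; note that as_op provides no gradient, so v would need a gradient-free step method such as Slice or Metropolis:
import numpy as np
import theano.tensor as t
from theano.compile.ops import as_op

@as_op(itypes=[t.dvector], otypes=[t.dvector])
def stick_breaking(v):
    # Plain-Python stick-breaking, wrapped so it can be used inside the model
    w = np.array([u*np.prod(1 - v[:i]) for i, u in enumerate(v)])
    w[-1] = 1 - w[:-1].sum()  # enforce the sum-to-unity constraint
    return w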

PyMC: sampling step by step?

I would like to know why the sampler is incredibly slow when sampling step by step.
For example, if I run:
mcmc = MCMC(model)
mcmc.sample(1000)
the sampling is fast. However, if I run:
mcmc = MCMC(model)
for i in arange(1000):
    mcmc.sample(1)
the sampling is slower (and the more it samples, the slower it is).
In case you are wondering why I am asking this: I need step-by-step sampling because I want to perform some operations on the values of the variables after each step of the sampler.
Is there a way to speed it up?
Thank you in advance!
Edit:
Here is the specific problem in more detail.
I have two models in competition, and they are part of a bigger model that has a categorical variable functioning as a 'switch' between the two.
In this toy example, I have the observed vector Y, which could be explained by either a Poisson or a Geometric distribution. The categorical variable switch_model selects the Geometric model when it is 0 and the Poisson model when it is 1.
After each sample, if switch_model selects the Geometric model, I want the variables of the Poisson model NOT to be updated, because they do not influence the likelihood and would therefore just drift away. The opposite holds when switch_model selects the Poisson model.
Basically, what I do at each step is 'reset' the value of the non-selected model by manually bringing it one step back.
I hope that my explanation and the commented code will be clear enough. Let me know if you need further details.
import numpy as np
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt

# OBSERVED VALUES
Y = np.array([0, 1, 2, 3, 8])

# PRIOR ON THE MODELS
pi = (0.5, 0.5)
switch_model = pm.Categorical("switch_model", p=pi)
# switch_model = 0 for Geometric, switch_model = 1 for Poisson

p = pm.Uniform('p', lower=0, upper=1)     # Prior on the parameter of the Geometric distribution
mu = pm.Uniform('mu', lower=0, upper=10)  # Prior on the parameter of the Poisson distribution

# LIKELIHOOD
@pm.observed
def Ylike(value=Y, mu=mu, p=p, M=switch_model):
    if M == 0:
        out = pm.geometric_like(value + 1, p)
    elif M == 1:
        out = pm.poisson_like(value, mu)
    return out

model = pm.Model([Ylike, p, mu, switch_model])
mcmc = pm.MCMC(model)

n_samples = 5000
traces = {}
for var in mcmc.stochastics:
    traces[str(var)] = np.zeros(n_samples)

bar = pm.progressbar.progress_bar(n_samples)
bar.update(0)

mcmc.sample(1, progress_bar=False)
for var in mcmc.stochastics:
    traces[str(var)][0] = mcmc.trace(var)[-1]

for i in np.arange(1, n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0:    # Geometric wins
        traces['mu'][i] = traces['mu'][i-1]    # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]
    elif mcmc.trace('switch_model')[-1] == 1:  # Poisson wins
        traces['p'][i] = traces['p'][i-1]      # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]

print '\n\n'

traces = pd.DataFrame(traces)
traces['mu'][traces['switch_model'] == 0] = np.nan
traces['p'][traces['switch_model'] == 1] = np.nan
print traces.describe()
traces.plot()
plt.show()
The reason this is so slow is that Python's for loops are pretty slow, especially compared with the Fortran loops that PyMC essentially runs under the hood. If you could show more detailed code, it might be easier to see what you are trying to do and to suggest faster alternative algorithms.
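A quick way to see the overhead (sketch, assuming the model object built in the question's code):
import time

mcmc = pm.MCMC(model)
t0 = time.time()
mcmc.sample(1000, progress_bar=False)
print 'one call of 1000 steps: %.2fs' % (time.time() - t0)

mcmc = pm.MCMC(model)
t0 = time.time()
for _ in xrange(1000):
    mcmc.sample(1, progress_bar=False)
print '1000 single-step calls: %.2fs' % (time.time() - t0)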
Actually, I found a 'crazy' solution, and I suspect I know why it works. I would still like to get an expert opinion on my trick.
Basically, if I modify the for loop in the following way, adding a reset of the MCMC object every 1000 iterations, the sampling speeds up again:
for i in np.arange(1, n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0:    # Geometric wins
        traces['mu'][i] = traces['mu'][i-1]    # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]
    elif mcmc.trace('switch_model')[-1] == 1:  # Poisson wins
        traces['p'][i] = traces['p'][i-1]      # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]
    if i % 1000 == 0:
        mcmc = pm.MCMC(model)  # reset the sampler (and its trace database)
In practice, this trick erases the sampler's traces and database every 1000 steps. It looks like the sampler does not like having a long database, although I do not really understand why. (Of course, 1000 steps is arbitrary: too few adds too much overhead, too many lets the traces and database grow too long.)
I find this hack a bit crazy and definitely not elegant... do any of the experts or developers have a comment on it? Thank you!
