change point detection with two transitions - pymc

this question was asked previously and not answered (5 June) but maybe putting it in context makes more sense.
I have done the change point tutorial with the two lambdas and extended with 2 change point so the modelling is now:
# the exp parameter expected is the inverse of the average from sampled series
alpha = 1.0 / count_data.mean()
# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=0, upper=n_count_data)
#pm.deterministic
def lambda_(tau1=tau1,tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2):
out = np.zeros(n_count_data)
out[:tau1] = lambda_1 # lambda before tau is lambda1
out[tau1:tau2] = lambda_2 # lambda between periods is lambda2
out[tau2:] = lambda_3 # lambda after (and including) tau2 is lambda3
return out
observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, tau1,tau2])
# markov monte carlo chain
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
The question is that in the deterministic variable how do I actually tell the model that I only need to consider when tau1 is less than tau2?
The problem is that when tau2 precedes tau1 there is a time symmetry which is computationally non necessary.
Any help is welcome.

I haven't tested it, but I think you could do something like this:
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=tau1, upper=n_count_data)
That way tau2 is constrained to be at least as large as tau1. You may have to think a little bit about whether tau1 and tau2 should be allowed to coincide.

The full model under the assumption of a deterministic gap between the taus follows:
# the exp parameter expected is the inverse of the average from sampled series
alpha = 1.0 / count_data.mean()
# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=tau1+1, upper=n_count_data)
#pm.deterministic
def lambda_(tau1=tau1,tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2,lambda_3=lambda_3):
out = np.zeros(n_count_data)
out[:tau1] = lambda_1 # lambda before tau is lambda1
out[tau1:tau2] = lambda_2 # lambda between periods is lambda2
out[tau2:] = lambda_3 # lambda after (and including) tau2 is lambda3
return out
observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2,lambda_3, tau1,tau2])
# markov monte carlo chain
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
lambda_1_samples = mcmc.trace('lambda_1')[:]
lambda_2_samples = mcmc.trace('lambda_2')[:]
lambda_3_samples = mcmc.trace('lambda_3')[:]
tau1_samples = mcmc.trace('tau1')[:]
tau2_samples = mcmc.trace('tau2')[:]
Will also try with the random gap and see how it goes.

If you are open to use R to solve the same inference problem, the mcp package provides a higher-level interface for change point problems. It has order-restricted change point parameters by default.
Here is a model for three intercepts (two change points)
model = list(
count ~ 1,
~ 1,
~ 1
)
library(mcp)
fit = mcp(model, data, family = poisson())
More info:
about Poisson models in mcp.
priors in mcp contains more about finer control of order-restriction.

Related

Faster way to compute distributions from Markov chain?

Suppose that I have a probability transition matrix, say a matrix of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of each probability distribution of the first 200 steps of the chain (the distribution of the first row at each step), then I've written the following
using Distributions, LinearAlgebra
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
s=0
i=0
while s <= 0.05
i += 1
s += M[i]
end
return i-1
end
# This function compute a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represent a session.
function stats(N::Int, n0::Int, m::Int)
A = tm(N,n0)
B = I # Initilizing B with the identity matrix
sup = 0:N # The support of each distribution
sup2 = [k^2 for k in sup]
stats = zeros(3,m)
for i in 1:m
C = B[1,:]
stats[1,i] = sum(C .* sup) # Mean
stats[2,i] = percentile5(C) # 5-percentile
stats[3,i] = sqrt(sum(C .* sup2) - stats[1,i]^2) # Standard deviation
B = A*B
end
return stats
end
data = stats(2000,50,200)
My question is, there is a more efficient (faster) way to do the same computation? I don't see a better way to do it but maybe there are some tricks that speed-up this computation.
This is what I have running so far:
using Distributions, LinearAlgebra, SparseArrays
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::AbstractVector)
s = zero(eltype(M))
res = length(M)
#inbounds for i = 1:length(M)
s += M[i]
if s > 0.05
res = i - 1
break
end
end
return res
end
# This function compute a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represent a session.
function stats(N::Int, n0::Int, m::Int)
A = sparse(transpose(tm(N, n0)))
C = zeros(size(A, 1))
C[1] = 1.0
sup = 0:N # The support of each distribution
sup2 = sup .^ 2
stats = zeros(3, m)
for i = 1:m
stats[1, i] = sum(C .* sup) # Mean
stats[2, i] = percentile5(C) # 5-percentile
stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2) # Standard deviation
C = A * C
end
return stats
end
It is around 4x faster (on smaller parameters - possibly much more speedup on large parameters). Basically uses the tips I've made in the comment:
using sparse arrays.
avoiding whole matrix multiply but using vector-matrix multiply instead.
Further improvement are possible (like simulation/ensemble method I've mentioned).

state_cov is being ignored when using the Kalman filter in statsmodels

I am trying to create an affine term structure model derived from statsmodels.tsa.statespace.MLEModel (code below) which is initialized using least squares.
'''
class affine_term_structure(sm.tsa.statespace.MLEModel):
def __init__(self, yields, tau, k_states=3, **kwargs):
# Initialize the statespace
super(affine_term_structure, self).__init__(yields, k_states=k_states, **kwargs)
self.initialize_known(np.zeros(self.k_states), np.eye(self.k_states) * 10000)
def update(self, params, **kwargs):
params = super(dynamic_nelson_siegel, self).update(params, **kwargs)
# Extract the parameters
Phi = np.reshape(params[:9], (3, 3))
k = np.array(params[9:12])
Sx = np.zeros((3, 3))
Sx[np.tril_indices(3)] = params[12:18]
lmbd = params[18]
sy = params[-1]
b = self.nss(self.tau, lmbd)
self['transition'] = Phi # transition matrix
self['state_intercept'] = k # transition offset
self['state_cov'] = Sx # Sx.T # transition covariance. 3x3 SPD matrix
self['design'] = b # design matrix
# self['obs_intercept'] = 0 # observation intercept
self['obs_cov'] = sy * sy * np.eye(self.k_endog) # observation covariance
'''
However, I noticed that on running the filter/smoother the states were being excessively smoothed. Digging through the filtering results it seems like the state_cov is not being used in the prediction step.
For example
self.predicted_state_cov[:,:,1]
matches
self.transition[:,:,0] # self.filtered_state_cov[:,:,0] # self.transition[:,:,0].T
Though I would have expected it to be equal to
self.transition[:,:,0] # self.filtered_state_cov[:,:,0] # self.transition[:,:,0].T + self.state_cov[:,:,0]
For good order, please note that all parameter matrices are time invariant.
Im not sure what Im missing here and any help would be much appreciated.
Thanks
In Statsmodels, the state equation is:
x(t+1) = T x(t) + R eta(t+1)
where eta(t+1) ~ N(0, Q)
When you set state_cov, you're setting Q, but you also need to set R, which is selection.
For example, if you want your state equation to be:
x(t+1) = T x(t) + eta(t+1)
Then you would do something like:
self['selection'] = np.eye(3)
It is not the case that R is the identity in every state space model, and it can't even always be initialized to the identity matrix, since the dimension of x(t) and the dimension of eta(t) can be different. That's why R is not automatically initialized to the identity matrix.

How to constrain model variables in GEKKO

I like to constrain the variable value u < 1 in y model. Added ub=1 to the variable definition u = m.Var(name='u', value=0, lb=-2, ub=1) but it resulted in "No soulution found" (EXIT: Converged to a point of local infeasibility. Problem may be infeasible.). I guess I have to reformulate the problem to avoid this, but I have not been able to find examples how this should be done. How do i write a proper model to avoid infeasible solutions when constraining variable values?
I hav tied to reformulate the problem by adding equation like m.Equation(u < 1) with no success.
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as pyplt
m = GEKKO(remote=False)
t = np.linspace(0, 1000, 101) # time
d = np.ones(t.shape)
d[0:10] = 0
y_delay=0
# Add data to model
m.time = t
K = m.Const(0.01, name='K')
r = m.Const(name='r', value=0) # Reference
d = m.Param(name='d', value=d) # Disturbance
y = m.Var(name='y', value=0, lb=-2, ub=2) # State variable
u = m.Var(name='u', value=0, lb=-2, ub=1) # Output
e = m.Var(name='e', value=0)
Tc = m.FV(name='Tc', value=1200, lb=60, ub=1200) # time constant
# Update variable status
Tc.STATUS = 1 # Optimizer can adjust value
Kp = m.Intermediate(1 / K * 1 / Tc, name='Kp')
Ti = m.Intermediate(4 * Tc, name='Ti')
# Model equations
m.Equations([y.dt() == K * (u-d),
e == r-y,
u.dt() == Kp*e.dt()+Kp/Ti*e])
# Model constraints
m.Equation(y < 0.5)
m.Equation(y > -0.5)
# Model objective
m.Obj(-Tc)
# options
m.options.IMODE = 6 # Problem type: 6 = Dynamic optimization
# solve
m.solve(disp=True, debug=True)
print('Tc: %6.2f [s]' % (Tc.value[-1], ))
fig1, (ax1, ax2, ax3) = pyplt.subplots(3, sharex='all')
ax1.plot(t, y.value)
ax1.set_ylabel("y", fontsize=8), ax1.grid(True, which='both')
ax2.plot(t, e.value)
ax2.set_ylabel("e", fontsize=8), ax2.grid(True, which='both')
ax3.plot(t, u.value)
ax3.plot(t, d.value)
ax3.set_ylabel("u and d", fontsize=8), ax3.grid(True, which='both')
pyplt.show()
EXIT: Converged to a point of local infeasibility. Problem may be infeasible.
An error occured.
The error code is 2
If I change the upper bound of u to 2, the optimization problem is solved as expected.
Hard constraints on variables can lead to an infeasible solution, as you observed. I recommend that you use soft constraints by specifying the variable y as a Controlled Variable and set an upper and lower set point range with SPHI and SPLO.
y = m.CV(name='y', value=0) # Controlled variable
y.STATUS = 1
y.TR_INIT = 0
y.SPHI = 0.5
y.SPLO = -0.5
I also removed the lb and ub from y and u to not give them hard bounds that can lead to the infeasibility. You also have an objective to maximize the value of Tc with m.Obj(-Tc). It goes to the maximum limit: 1200 when the solver is able to adjust the value. As you can see from the plot, the value of y exceeds the setpoint range. It may not be possible for the controller to keep it within that range. A soft constraint (objective based) approach to constraints penalizes deviations but does not lead to an infeasible solution. If you need to increase the penalty on violations of the SPHI or SPLO, the parameters WSPHI and WSPLO can be adjusted.
It appears that you have a first order dynamic model and you are trying to optimize PID parameters. If you need to model saturation of the controller output (actuator) then the if3, max3, min3 or corresponding if2, max2, min2 functions may be useful. There is more information on CV objectives and tuning in the Dynamic Optimization course.
Here is a feasible solution to your problem:
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as pyplt
m = GEKKO() # remote=False
t = np.linspace(0, 1000, 101) # time
d = np.ones(t.shape)
d[0:10] = 0
y_delay=0
# Add data to model
m.time = t
K = m.Const(0.01, name='K')
r = m.Const(name='r', value=0) # Reference
d = m.Param(name='d', value=d) # Disturbance
e = m.Var(name='e', value=0)
u = m.Var(name='u', value=0) # Output
Tc = m.FV(name='Tc', value=1200, lb=60, ub=1200) # time constant
y = m.CV(name='y', value=0) # Controlled variable
y.STATUS = 1
y.TR_INIT = 0
y.SPHI = 0.5
y.SPLO = -0.5
# Update variable status
Tc.STATUS = 1 # Optimizer can adjust value
Kp = m.Intermediate((1 / K) * (1 / Tc), name='Kp')
Ti = m.Intermediate(4 * Tc, name='Ti')
# Model equations
m.Equations([y.dt() == K * (u-d),
e == r-y,
u.dt() == Kp*e.dt()+(Kp/Ti)*e])
# Model constraints
#m.Equation(y < 0.5)
#m.Equation(y > -0.5)
# Model objective
m.Obj(-Tc)
# options
m.options.IMODE = 6 # Problem type: 6 = Dynamic optimization
m.options.SOLVER = 3
m.options.MAX_ITER = 1000
# solve
m.solve(disp=True, debug=True)
print('Tc: %6.2f [s]' % (Tc.value[-1], ))
fig1, (ax1, ax2, ax3) = pyplt.subplots(3, sharex='all')
ax1.plot(t, y.value)
ax1.plot([min(t),max(t)],[0.5,0.5],'k--')
ax1.plot([min(t),max(t)],[-0.5,-0.5],'k--')
ax1.set_ylabel("y", fontsize=8), ax1.grid(True, which='both')
ax2.plot(t, e.value)
ax2.set_ylabel("e", fontsize=8), ax2.grid(True, which='both')
ax3.plot(t, u.value)
ax3.plot(t, d.value)
ax3.set_ylabel("u and d", fontsize=8), ax3.grid(True, which='both')
pyplt.show()
Thanks for an extensive and useful answer to my question. I really appreciate it.
As you correctly observed I am trying to optimize tuning parameters for my simple control problem. I have executed your code with soft constraints, and it sure solves the feasibility issue. I also added the WSPHI/LO parameters and set their values high to have a solution within the constraints. Still, I like to have a model where the control output (“u”) is bounded [0,1]. Based on your answer I probably must add “if” or “max/min” statements in the model to avoid having a non-feasible set of equations when “u” hits the bound. Something like “if u<0, u.dt() = 0 else u.dt() = Kp*e ….”. Could it alternatively be possible to add a variable (a type slack variable) to ensure feasibility of the equation set? I will also investigate the material in the dynamic optimization course links to get a better understanding of dynamic modelling. Thanks again for guiding me in the right direction in this issue.

Why do the trace values have periods of (unwanted) stability?

I have a fairly simple test data set I am trying to fit with pymc3.
The result generated by traceplot looks something like this.
Essentially the trace of all parameter look like there is a standard 'caterpillar' for 100 iterations, followed by a flat line for 750 iterations, followed by the caterpillar again.
The initial 100 iterations happen after 25,000 ADVI iterations, and 10,000 tune iterations. If I change these amounts, I randomly will/won't have these periods of unwanted stability.
I'm wondering if anyone has any advice about how I can stop this from happening - and what is causing it?
Thanks.
The full code is below. In brief, I am generating a set of 'phases' (-pi -> pi) with a corresponding set of values y = a(j)*sin(phase) + b(j)*sin(phase). a and b are drawn for each subject j at random, but are related to each other.
I then essentially try to fit this same model.
Edit: Here is a similar example, running for 25,000 iterations. Something goes wrong around iteration 20,000.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
%matplotlib inline
np.random.seed(0)
n_draw = 2000
n_tune = 10000
n_init = 25000
init_string = 'advi'
target_accept = 0.95
##
# Generate some test data
# Just generates:
# x a vector of phases
# y a vector corresponding to some sinusoidal function of x
# subject_idx a vector corresponding to which subject x is
#9 Subjects
N_j = 9
#Each with 276 measurements
N_i = 276
sigma_y = 1.0
mean = [0.1, 0.1]
cov = [[0.01, 0], [0, 0.01]] # diagonal covariance
x_sub = np.zeros((N_j,N_i))
y_sub = np.zeros((N_j,N_i))
y_true_sub = np.zeros((N_j,N_i))
ab_sub = np.zeros((N_j,2))
tuning_sub = np.zeros((N_j,1))
sub_ix_sub = np.zeros((N_j,N_i))
for j in range(0,N_j):
aj,bj = np.random.multivariate_normal(mean, cov)
#aj = np.abs(aj)
#bj = np.abs(bj)
xj = np.random.uniform(-1,1,size = (N_i,1))*np.pi
xj = np.sort(xj)#for convenience
yj_true = aj*np.sin(xj) + bj*np.cos(xj)
yj = yj_true + np.random.normal(scale=sigma_y, size=(N_i,1))
x_sub[j,:] = xj.ravel()
y_sub[j,:] = yj.ravel()
y_true_sub[j,:] = yj_true.ravel()
ab_sub[j,:] = [aj,bj]
tuning_sub[j,:] = np.sqrt(((aj**2)+(bj**2)))
sub_ix_sub[j,:] = [j]*N_i
x = np.ravel(x_sub)
y = np.ravel(y_sub)
subject_idx = np.ravel(sub_ix_sub)
subject_idx = np.asarray(subject_idx, dtype=int)
##
# Fit model
hb1_model = pm.Model()
with hb1_model:
# Hyperpriors
hb1_mu_a = pm.Normal('hb1_mu_a', mu=0., sd=100)
hb1_sigma_a = pm.HalfCauchy('hb1_sigma_a', 4)
hb1_mu_b = pm.Normal('hb1_mu_b', mu=0., sd=100)
hb1_sigma_b = pm.HalfCauchy('hb1_sigma_b', 4)
# We fit a mixture of a sine and cosine with these two coeffieicents
# allowed to be different for each subject
hb1_aj = pm.Normal('hb1_aj', mu=hb1_mu_a, sd=hb1_sigma_a, shape = N_j)
hb1_bj = pm.Normal('hb1_bj', mu=hb1_mu_b, sd=hb1_sigma_b, shape = N_j)
# Model error
hb1_eps = pm.HalfCauchy('hb1_eps', 5)
hb1_linear = hb1_aj[subject_idx]*pm.math.sin(x) + hb1_bj[subject_idx]*pm.math.cos(x)
hb1_linear_like = pm.Normal('y', mu = hb1_linear, sd=hb1_eps, observed=y)
with hb1_model:
hb1_trace = pm.sample(draws=n_draw, tune = n_tune,
init = init_string, n_init = n_init,
target_accept = target_accept)
pm.traceplot(hb1_trace)
To partially answer my own question: After playing with this for a while, it looks like the problem might be due to the hyperprior standard deviation going to 0. I am not sure why the algorithm should get stuck there though (testing a small standard deviation can't be that uncommon...).
In any case, two solutions that seem to alleviate the problem (although they don't remove it entirely) are:
1) Add an offset to the definitions of the standard deviation. e.g.:
offset = 1e-2
hb1_sigma_a = offset + pm.HalfCauchy('hb1_sigma_a', 4)
2) Instead of using a HalfCauchy or HalfNormal for the SD prior, use a logNormal distribution set so that 0 is unlikely.
I'd look at the divergencies, as explained in notes and literature on Hamiltonian Monte Carlo, see, e.g., here and here.
with model:
np.savetxt('diverging.csv', hb1_trace['diverging'])
As a dirty solution, you can try to increase target_accept, perhaps.
Good luck!

PyMC: sampling step by step?

I would like to know why the sampler is incredibly slow when sampling step by step.
For example, if I run:
mcmc = MCMC(model)
mcmc.sample(1000)
the sampling is fast. However, if I run:
mcmc = MCMC(model)
for i in arange(1000):
mcmc.sample(1)
the sampling is slower (and the more it samples, the slower it is).
If you are wondering why I am asking this.. well, I need a step by step sampling because I want to perform some operations on the values of the variables after each step of the sampler.
Is there a way to speed it up?
Thank you in advance!
------------------ EDIT -------------------------------------------------------------
Here I present the specific problem in more details:
I have two models in competition and they are part of a bigger model that has a categorical variable functioning as a 'switch' between the two.
In this toy example, I have the observed vector 'Y', that could be explained by a Poisson or a Geometric distribution. The Categorical variable 'switch_model' selects the Geometric model when = 0 and the Poisson model when =1.
After each sample, if switch_model selects the Geometric model, I want the variables of the Poisson model NOT to be updated, because they are not influencing the likelihood and therefore they are just drifting away. The opposite is true if the switch_model selects the Poisson model.
Basically what I do at each step is to 'change' the value of the non-selected model by bringing it manually one step back.
I hope that my explanation and the commented code will be clear enough. Let me know if you need further details.
import numpy as np
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt
# OBSERVED VALUES
Y = np.array([0, 1, 2, 3, 8])
# PRIOR ON THE MODELS
pi = (0.5, 0.5)
switch_model = pm.Categorical("switch_model", p = pi)
# switch_model = 0 for Geometric, switch_model = 1 for Poisson
p = pm.Uniform('p', lower = 0, upper = 1) # Prior of the parameter of the geometric distribution
mu = pm.Uniform('mu', lower = 0, upper = 10) # Prior of the parameter of the Poisson distribution
# LIKELIHOOD
#pm.observed
def Ylike(value = Y, mu = mu, p = p, M = switch_model):
if M == 0:
out = pm.geometric_like(value+1, p)
elif M == 1:
out = pm.poisson_like(value, mu)
return out
model = pm.Model([Ylike, p, mu, switch_model])
mcmc = pm.MCMC(model)
n_samples = 5000
traces = {}
for var in mcmc.stochastics:
traces[str(var)] = np.zeros(n_samples)
bar = pm.progressbar.progress_bar(n_samples)
bar.update(0)
mcmc.sample(1, progress_bar=False)
for var in mcmc.stochastics:
traces[str(var)][0] = mcmc.trace(var)[-1]
for i in np.arange(1,n_samples):
mcmc.sample(1, progress_bar=False)
bar.update(i)
for var in mcmc.stochastics:
traces[str(var)][i] = mcmc.trace(var)[-1]
if mcmc.trace('switch_model')[-1] == 0: # Gemetric wins
traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
mu.value = traces['mu'][i-1]
elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
p.value = traces['p'][i-1]
print '\n\n'
traces=pd.DataFrame(traces)
traces['mu'][traces['switch_model'] == 0] = np.nan
traces['p'][traces['switch_model'] == 1] = np.nan
print traces.describe()
traces.plot()
plt.show()
The reason this is so slow is that Python's for loops are pretty slow, especially when they are compared to FORTRAN loops (Which is what PyMC is written in basically.) If you could show more detailed code, it might be easier to see what you are trying to do and to provide faster alternative algorithms.
Actually I found a 'crazy' solution, and I have the suspect to know why it works. I would still like to get an expert opinion on my trick.
Basically if I modify the for loop in the following way, adding a 'reset of the mcmc' every 1000 loops, the sampling fires up again:
for i in np.arange(1,n_samples):
mcmc.sample(1, progress_bar=False)
bar.update(i)
for var in mcmc.stochastics:
traces[str(var)][i] = mcmc.trace(var)[-1]
if mcmc.trace('switch_model')[-1] == 0: # Gemetric wins
traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
mu.value = traces['mu'][i-1]
elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
p.value = traces['p'][i-1]
if i%1000 == 0:
mcmc = pm.MCMC(model)
In practice this trick erases the traces and the database of the sampler every 1000 steps. It looks like the sampler does not like having a long database, although I do not really understand why. (of course 1000 steps is arbitrary, too short it adds too much overhead, too long it will cause the traces and database to be too long).
I find this hack a bit crazy and definitely not elegant.. does any of the experts or developers have a comment on it? Thank you!

Resources