What are the ways of deciding probabilities in hidden markov models? - algorithm

I am starting to learn hidden Markov models, and on the wiki page as well as on GitHub there are a lot of examples, but in most of them the probabilities are already given (70% chance of rain, 30% chance of changing state, etc.). The spell-checking or sentence examples seem to study books and then rank the probabilities of words.
So does the Markov model include a way of figuring out the probabilities, or are we supposed to use some other model to pre-calculate them?
Sorry if this question is off. I think it's straightforward how the hidden Markov model selects probable sequences, but the probability part is a bit grey to me (because it's often provided). Examples or any info would be great.
For those not familiar with Markov models, here's an example (from Wikipedia): http://en.wikipedia.org/wiki/Viterbi_algorithm and http://en.wikipedia.org/wiki/Hidden_Markov_model
#!/usr/bin/env python
states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

# application code
# Helps visualize the steps of Viterbi.
def print_dptable(V):
    print(" ", end=" ")
    for i in range(len(V)):
        print("%7s" % ("%d" % i), end=" ")
    print()
    for y in V[0].keys():
        print("%.5s: " % y, end=" ")
        for t in range(len(V)):
            print("%.7s" % ("%f" % V[t][y]), end=" ")
        print()

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{}]
    path = {}
    # Initialize base cases (t == 0)
    for y in states:
        V[0][y] = start_p[y] * emit_p[y][obs[0]]
        path[y] = [y]
    # Run Viterbi for t > 0
    for t in range(1, len(obs)):
        V.append({})
        newpath = {}
        for y in states:
            # Best predecessor y0 for ending in state y at time t
            (prob, state) = max([(V[t-1][y0] * trans_p[y0][y] * emit_p[y][obs[t]], y0) for y0 in states])
            V[t][y] = prob
            newpath[y] = path[state] + [y]
        # Don't need to remember the old paths
        path = newpath
    print_dptable(V)
    (prob, state) = max([(V[len(obs) - 1][y], y) for y in states])
    return (prob, path[state])

# start trigger
def example():
    return viterbi(observations,
                   states,
                   start_probability,
                   transition_probability,
                   emission_probability)

print(example())

You're looking for an EM (expectation maximization) algorithm to compute the unknown parameters from sets of observed sequences. Probably the most commonly used is the Baum-Welch algorithm, which uses the forward-backward algorithm.
For reference, here is a set of slides I've used previously to review HMMs. It has a nice overview of Forward-Backward, Viterbi, and Baum-Welch
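To make the answer above concrete, here is a minimal pure-Python sketch of Baum-Welch for a single discrete observation sequence. It is an illustration under assumptions, not a reference implementation: the function name baum_welch, the random initialisation, the fixed iteration count, and the integer encoding of the observations (walk=0, shop=1, clean=2) are all choices made for this example. The forward and backward passes are rescaled at every time step to avoid numerical underflow.

```python
import math
import random


def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Estimate HMM parameters (pi, A, B) from one sequence of symbol indices."""
    rng = random.Random(seed)

    def random_dist(n):
        # Random strictly positive probability vector (hypothetical starting point).
        w = [rng.random() + 0.1 for _ in range(n)]
        total = sum(w)
        return [x / total for x in w]

    pi = random_dist(n_states)                          # start probabilities
    A = [random_dist(n_states) for _ in range(n_states)]   # transitions
    B = [random_dist(n_symbols) for _ in range(n_states)]  # emissions
    S, T = range(n_states), len(obs)
    log_liks = []

    for _ in range(n_iter):
        # E-step, forward pass with per-step scaling factors c[t]
        alpha, c = [], []
        for t in range(T):
            if t == 0:
                row = [pi[i] * B[i][obs[0]] for i in S]
            else:
                row = [sum(alpha[t - 1][i] * A[i][j] for i in S) * B[j][obs[t]]
                       for j in S]
            c.append(sum(row))
            alpha.append([a / c[t] for a in row])
        # E-step, backward pass using the same scaling factors
        beta = [[1.0] * n_states for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in S)
                       / c[t + 1] for i in S]
        # Posterior state probabilities and expected transition counts
        gamma = [[alpha[t][i] * beta[t][i] for i in S] for t in range(T)]
        xi = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   / c[t + 1] for t in range(T - 1)) for j in S] for i in S]
        # Log-likelihood of the data under the parameters used in this E-step
        log_liks.append(sum(math.log(x) for x in c))
        # M-step: re-estimate the parameters from the expected counts
        pi = gamma[0]
        A = [[xi[i][j] / sum(xi[i]) for j in S] for i in S]
        B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(n_symbols)] for i in S]
    return pi, A, B, log_liks


# Example: walk=0, shop=1, clean=2 (made-up observation sequence)
obs = [0, 0, 1, 2, 2, 2, 1, 0, 0, 1, 2, 2, 0, 0, 0, 1, 2, 2, 1, 0]
pi, A, B, log_liks = baum_welch(obs, n_states=2, n_symbols=3)
print("transitions:", A)
print("emissions:", B)
```

With a single short sequence the estimates should not be expected to match the Wikipedia numbers. EM only guarantees that the log-likelihood does not decrease from one iteration to the next, and the result depends on the starting point, so in practice it is usually run on many sequences and from several initialisations.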

Related

Finding center point given distance matrix

I have a matrix (really a loaded image) in which every element is a L2 distance from some unknown center point.
Here is a trivial example
A = [1.4142 1.0000 1.4142 2.2361]
[1.0000 0.0000 1.0000 2.0000]
[1.4142 1.0000 1.4142 2.2361]
In this case, the center is obviously at coordinate (1,1) (index A[1,1] in a 0-indexed matrix or 2D array).
However, in the case where my centers are not constrained to be integer indices, it's no longer as obvious. For example, given this matrix B, where is my center coordinate?
B = [3.0292 1.9612 2.8932 5.8252]
[1.2292 0.1612 1.0932 4.0252]
[1.4292 0.3612 1.2932 4.2252]
How would you find that the answer in this case is at row 1.034 and column 1.4?
I am aware of the trilateration solution (having provided MATLAB code to visualize that in 3D previously), but is there a more efficient way (e.g. one without a matrix inversion)?
This question is sort of language agnostic, as I am looking more for algorithmic help. If you could stick to MATLAB, Python, or C++ though in a solution, that would be great ;-).
While I have no experience with similar tasks, I read up on the topic and also tried something.
If you are unfamiliar with this topic it seems hard to grasp, and all the resources I found are a bit chaotic.
What is still unclear to me with regard to the theory:
Is the problem as stated above a convex optimization problem (local minimum = global minimum, which would mean access to powerful solvers)?
There are many more resources about more generic problems (Sensor Network Localization), which are non-convex and for which extremely complex methods have been developed.
Is your trilateration approach able to exploit more than 3 points (trilateration vs. multilateration)? At least this code does not look like it can, which means bad performance with noise.
Here is some example code with two approaches:
A: Convex-optimization: SOCP-Relaxation
Follows SECOND-ORDER CONE PROGRAMMING RELAXATION OF SENSOR NETWORK LOCALIZATION
Not impressive performance, but should be powerful as approximation for big-data
Guaranteed global-optimum for this relaxation!
Implemented with cvxpy
B: Nonlinear-programming optimization
Implemented using scipy.optimize
Pretty much perfect in my synthetic experiments; even good results in noisy case; despite the fact we are using numerical-differentiation (automatic-diff hard to use here)
Some additional remark:
Your example B surely has some (pretty bad) noise or some other problem in my opinion, as my approaches are completely off; while especially approach B shines for my synthetic-data (at least that's my impression)
Code:
import numpy as np
import cvxpy as cvx
from scipy.spatial.distance import cdist
from scipy.optimize import minimize
np.random.seed(1)
""" Create noise-free (not anymore!) fake-problem """
real_x = np.random.random(size=2) * 3
M, N = 5, 10
NOISE_DISTS = 0.1
pos = np.array([(i,j) for i in range(M) for j in range(N)]) # ugly -> tile/repeat/stack
real_x_stacked = np.vstack([real_x for i in range(pos.shape[0])])
Y = cdist(pos, real_x[np.newaxis])
Y += np.random.normal(size=Y.shape)*NOISE_DISTS # Let's add some noise!
print('-----')
print('PROBLEM')
print('-------')
print('real x: ', real_x)
print('dist mat: ', np.round(Y,3).T)
""" Helper """
def cost(x, Y, pos):
    res = np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    return np.linalg.norm(res, 2)
print('cost with real_x (check vs. noisy): ', cost(real_x, Y, pos))
""" SOLVER SOCP """
def solve_socp_relax(pos, Y):
    x = cvx.Variable(2)
    y = cvx.Variable(pos.shape[0])
    fake_stack = [x for i in range(pos.shape[0])] # hacky
    objective = cvx.sum_entries(cvx.norm(y - Y))
    x_stacked = cvx.reshape(cvx.vstack(*fake_stack), pos.shape[0], 2) # hacky
    constraints = [cvx.norm(pos - x_stacked, 2, axis=1) <= y]
    problem = cvx.Problem(cvx.Minimize(objective), constraints)
    problem.solve(solver=cvx.ECOS, verbose=False)
    return x.value.T
""" SOLVER NLP """
def solve_nlp(pos, Y):
    sol = minimize(cost, np.zeros(pos.shape[1]), args=(Y, pos), method='BFGS')
    # print(sol)
    return sol.x
""" TEST """
print('-----')
print('SOLVE')
print('-----')
socp_relax_sol = solve_socp_relax(pos, Y)
print('SOCP RELAX SOL: ', socp_relax_sol)
nlp_sol = solve_nlp(pos, Y)
print('NLP SOL: ', nlp_sol)
Output:
-----
PROBLEM
-------
real x: [ 1.25106601 2.16097348]
dist mat: [[ 2.444 1.599 1.348 1.276 2.399 3.026 4.07 4.973 6.118 6.746
2.143 1.149 0.412 0.766 1.839 2.762 3.851 4.904 5.734 6.958
2.377 1.432 0.856 1.056 1.973 2.843 3.885 4.95 5.818 6.84
2.711 2.015 1.689 1.939 2.426 3.358 4.385 5.22 6.076 6.97
3.422 3.153 2.759 2.81 3.326 4.162 4.734 5.627 6.484 7.336]]
cost with real_x (check vs. noisy): 0.665125233772
-----
SOLVE
-----
SOCP RELAX SOL: [[ 1.95749275 2.00607253]]
NLP SOL: [ 1.23560791 2.16756168]
Edit: Further speedup can be achieved (especially in large-scale) in using nonlinear-least-squares instead of the more general NLP-approach! My results are still the same (as expected if the problem would be convex). Timings between NLP/NLS can look like 9 vs. 0.5 seconds!
This is my recommended method!
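from scipy.optimize import least_squares

def solve_nls(pos, Y):
    # Residual per grid point: predicted distance minus measured distance
    def res(x, Y, pos):
        return np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    sol = least_squares(res, np.zeros(pos.shape[1]), args=(Y, pos), method='lm')
    # print(sol)
    return sol.x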
Especially the second-approach (NLP) will also run for much bigger instances (cvxpy's overhead hurts; that's not a downside of the SOCP-solver which should scale much much better!).
Here is some output for M, N = 500, 1000 with some more noise:
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 24.706 23.573 23.693 ..., 1090.29 1091.216
1090.817]]
cost with real_x (check vs. noisy): 353.354267797
-----
SOLVE
-----
NLP SOL: [ 12.51082419 21.60911561]
used: 5.9552763315495625 # SECONDS
So in my experiments it works, but I won't give any global-convergence or reconstruction guarantees (I am still missing some theory).
At first I thought about using the global optimum of the relaxed SOCP problem as the initial point for the NLP solver, but I did not find any example where this is needed!
Some just-for-fun visuals using:
M, N = 20, 30
NOISE_DISTS = 0.2
...
import matplotlib.pyplot as plt
plt.imshow(Y.reshape(M, N), cmap='viridis', interpolation='none')
plt.colorbar()
plt.scatter(nlp_sol[1], nlp_sol[0], color='red', s=20)
plt.xlim((0, N))
plt.ylim((0, M))
plt.show()
And some super noisy case (nice performance!):
M, N = 50, 100
NOISE_DISTS = 5
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 22.329 18.745 27.588 ..., 94.967 80.034 91.206]]
cost with real_x (check vs. noisy): 354.527196716
-----
SOLVE
-----
NLP SOL: [ 12.44158986 21.50164637]
used: 0.01050068340320306
If I understand correctly, you have a matrix A, where A[i,j] holds the distance from (i,j) to some unknown point (y,x). You could find (y,x) like this:
Square each element of A, to make a matrix B say.
We then want to find (y,x) so
(y-i)*(y-i) + (x-j)*(x-j) = B[i,j]
Subtracting each equation from the 0,0 equation and rearranging:
2*i*y + 2*j*x = B[0,0] + i*i + j*j - B[i,j]
This can be solved by linear least squares. Note that since there are 2 unknowns, the matrix inversion (better, factorisation) involved will be on a 2x2 matrix and so not time consuming. You could indeed, given just the dimensions of A, work out the required matrix and its inverse analytically.
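The linearised equations above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name find_center is made up for the example, the 2x2 normal equations are solved in closed form with Cramer's rule, and the grid must have at least two rows and two columns so that the system is not singular. Every equation depends on the (0,0) entry, so noise in that single element affects the whole fit.

```python
import math


def find_center(dist):
    """Estimate (y, x) from a matrix of distances dist[i][j] to an unknown point."""
    sq = [[d * d for d in row] for row in dist]   # squared distances (B in the text)
    s_ii = s_ij = s_jj = s_ir = s_jr = 0.0
    for i, row in enumerate(sq):
        for j, b in enumerate(row):
            # One linear equation per cell: 2*i*y + 2*j*x = r
            r = sq[0][0] + i * i + j * j - b
            a_y, a_x = 2.0 * i, 2.0 * j
            # Accumulate the entries of the 2x2 normal equations
            s_ii += a_y * a_y
            s_ij += a_y * a_x
            s_jj += a_x * a_x
            s_ir += a_y * r
            s_jr += a_x * r
    det = s_ii * s_jj - s_ij * s_ij
    y = (s_jj * s_ir - s_ij * s_jr) / det
    x = (s_ii * s_jr - s_ij * s_ir) / det
    return y, x


# Noise-free synthetic check with a non-integer centre
true_y, true_x = 1.25, 2.5
grid = [[math.hypot(i - true_y, j - true_x) for j in range(4)] for i in range(3)]
print(find_center(grid))
```

For noise-free distances this recovers the centre up to rounding error. With noisy data it is a cheap estimate that could also serve as a starting point for a nonlinear least-squares refinement.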

Can someone explain this piece of code that recognises a digit from the Coursera Machine Learning course

This is a snippet from the predict function of exercise 4 of the Coursera machine learning course. What it does is it stores the predicted digit from a trained neural network in p. Can someone explain how it does this?
function p = predict(Theta1, Theta2, x)
p = 0;
h1 = sigmoid(double([1 x]) * Theta1');
h2 = sigmoid([1 h1] * Theta2');
[dummy, p] = max(h2, [], 2);
end
x = 1x784 matrix of pixel intensity values.
Theta1 = 100x785 matrix.
Theta2 = 10x101 matrix.
I have already trained the network and have gotten the optimum value of Theta1 and Theta2. What I want to know is how that last line of code takes the forward propagated values and stores 1/2/3/4/5/6/7/8/9/10 in p. Whichever digit is stored is the predicted digit.
Sigmoid function:
function g = sigmoid(z)
g = 1 ./ (1 + e.^-z);
end
The last line simply returns the index of the neuron with the highest value. In MATLAB/Octave,
[M, I] = max(A, [], dim)
stores in I the indices of the largest values of A along dimension dim. In your case, h2 holds the activations of the output neurons, and by construction of your neural network the classification is simply the index of the one with the highest value:
cl(x) = arg max_i f_i(x)
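The same forward pass and arg max can be sketched in plain Python to make the indexing explicit. This is a hedged translation, not the course's code: the helper name layer is made up here, weights are plain nested lists with the bias in column 0 (mirroring the [1 x] trick in the Octave snippet), and 1 is added to the 0-based Python index to mimic Octave's 1-based result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, inputs):
    """One fully connected layer; weights[k][0] is the bias of unit k."""
    with_bias = [1.0] + list(inputs)
    return [sigmoid(sum(w * a for w, a in zip(row, with_bias)))
            for row in weights]


def predict(theta1, theta2, x):
    h1 = layer(theta1, x)    # hidden activations
    h2 = layer(theta2, h1)   # one activation per output class
    # 0-based position of the largest activation (first one wins on ties)
    best = max(range(len(h2)), key=lambda k: h2[k])
    return best + 1          # 1-based label, like the second output of max


# Tiny made-up network: 2 inputs, 2 hidden units, 3 output classes
theta1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
theta2 = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(predict(theta1, theta2, [0.3, 0.7]))
```

In this toy network only the output biases differ, so the second unit has the largest activation and the predicted label is its 1-based position.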

How to implement Roulette Wheel Selection and Rank Selection in MATLAB code for the Traveling Salesman Problem?

I have an assignment coding a genetic algorithm for the traveling salesman problem. I've written some code giving correct results using Tournament Selection.
The problem is, I have to do Wheel and Rank and the results I get are incorrect.
Here is my code using Tournament Selection:
clc;
clear all;
close all;
nofCities = 30;
initialPopulationSize = nofCities*nofCities;
generations = nofCities*ceil(nofCities/10);
cities = floor(rand([nofCities 2])*100+1);
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(:,1), cities(:,2));
line(cities([1 end],1), cities([1 end],2));
axis([0 110 0 110]);
population = zeros(initialPopulationSize ,nofCities);
for i=1:initialPopulationSize
population(i,:) = randperm(nofCities);
end
distanceMatrix = zeros(nofCities);
for i=1:nofCities
for j=1:nofCities
if (i==j)
distanceMatrix(i,j)=0;
else
distanceMatrix(i,j) = sqrt((cities(i,1)-cities(j,1))^2+(cities(i,2)-cities(j,2))^2);
end
end
end
for u=1:generations
tourDistance = zeros(initialPopulationSize ,1);
for i=1:initialPopulationSize
for j=1:length(cities)-1
tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,j),population(i,j+1));
end
end
for i=1:initialPopulationSize
tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,end),population(i,1));
end
min(tourDistance)
newPopulation = zeros(initialPopulationSize,nofCities);
for k=1:initialPopulationSize
child = zeros(1,nofCities);
%tournament start
for i=1:5
tournamentParent1(i) = ceil(rand()*initialPopulationSize);
end
p1 = find(tourDistance == min(tourDistance([tournamentParent1])));
parent1 = population(p1(1), :);
for i=1:5
tournamentParent2(i) = ceil(rand()*initialPopulationSize);
end
p2 = find(tourDistance == min(tourDistance([tournamentParent2])));
parent2 = population(p2(1), :);
%tournament end
%crossover
startPos = ceil(rand()*(nofCities/2));
endPos = ceil(rand()*(nofCities/2)+10);
for i=1:nofCities
if (i>startPos && i<endPos)
child(i) = parent1(i);
end
end
for i=1:nofCities
if (isempty(find(child==parent2(i))))
for j=1:nofCities
if (child(j) == 0)
child(j) = parent2(i);
break;
end
end
end
end
newPopulation(k,:) = child;
end
%mutation
mutationRate = 0.015;
for i=1:initialPopulationSize
if (rand() < mutationRate)
pos1 = ceil(rand()*nofCities);
pos2 = ceil(rand()*nofCities);
mutation1 = newPopulation(i,pos1);
mutation2 = newPopulation(i,pos2);
newPopulation(i,pos1) = mutation2;
newPopulation(i,pos2) = mutation1;
end
end
population = newPopulation;
u
end
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(population(i,:),1), cities(population(i,:),2));
line(cities([population(i,1) population(i,end)],1), cities([population(i,1) population(i,end)],2));
axis([0 110 0 110]);
%close all;
What I want is to replace the tournament code with wheel and rank code.
Here is what I wrote for the Wheel Selection:
fitness = tourDistance./sum(tourDistance);
wheel = cumsum(fitness);
parent1 = population(find(wheel >= rand(),1),:);
parent2 = population(find(wheel >= rand(),1),:);
Here is a vectorized implementation of a roulette wheel selection in Matlab:
[~,W] = min(ones(popSize,1)*rand(1,2*popSize) > ((cumsum(fitness)*ones(1,2*popSize)/sum(fitness))),[],1);
This assumes that the fitness input into the selection scheme is a column vector of size popSize x 1 (one entry per population member).
popSize is the number of members in your population, and W holds the winners, i.e. the population members that are selected to become parents for crossover.
The output of the selection, W, is a row vector of size 2*popSize containing the indices of the members of the population that will be used in the crossover stage.
This row vector can then be input into a vectorized crossover scheme that could look something like this:
%% Single-Point Preservation Crossover
Pop2 = Pop(W(1:2:end),:); % Pop2 Winners 1
P2A = Pop(W(2:2:end),:); % Pop2 Winners 2
Lidx = sub2ind(size(Pop),[1:popSize]',round(rand(popSize,1)*(genome-1)+1));
vLidx = P2A(Lidx)*ones(1,genome);
[r,c]=find(Pop2==vLidx);
[~,Ord]=sort(r);
r = r(Ord); c = c(Ord);
Lidx2 = sub2ind(size(Pop),r,c);
Pop2(Lidx2) = Pop2(Lidx);
Pop2(Lidx) = P2A(Lidx);
This crossover assumes the W variable from the selection scheme as input. It also uses Pop, which holds the population members in a popSize-by-genome matrix (genome is the number of cities in one tour, which is also the length of the genome). Each genome is stored as an array of integers, with each integer being a city, and the tour is defined by the order of the values from the array's first index to its last.
While we are at it, we may as well include a nice vectorized mutation scheme for a permutation genetic algorithm (which this is).
%% Mutation (Permutation)
idx = rand(popSize,1)<mutRate;
Loc1 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2(idx == 0) = Loc1(idx == 0);
[Pop2(Loc1), Pop2(Loc2)] = deal(Pop2(Loc2), Pop2(Loc1));
This mutation randomly swaps the positions of 2 cities in our tour (genome).
Finally make sure to update your population after all of that work we did!
%% Update Population!
Pop = Pop2; % updates the population to include crossovers and mutation.
So I know this reply is probably way too late for your assignment, but hopefully it will help someone else with a similar problem.
I really recommend that anyone interested in vectorized genetic algorithms in MATLAB read this paper: UCL: Efficiently Vectorized Code for Population Based Optimization Algorithms
It is what I based all of the code in the examples on, and it will teach you why you are writing the code that way. It's a great resource and what got me started with GAs.
For wheel selection to work, you should start by designing a fitness measure in which fitter individuals have a bigger value, in contrast to the distance, where better individuals have a smaller value. Then your approach with the cumsum should work.
Where is the issue with ranking selection?
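As a sketch of both schemes for a minimisation problem such as the TSP, here is a small standard-library Python version (Python rather than MATLAB only to keep the example self-contained). The names roulette_select and rank_select, the use of 1/distance as fitness, and the linear rank weights are assumptions made for the example; other fitness transforms and rank weightings are equally valid.

```python
import bisect
import itertools
import random


def roulette_select(tour_distance, rng):
    """Return one index, chosen with probability proportional to 1/distance."""
    # Shorter tours must get a larger slice of the wheel (distances must be > 0).
    fitness = [1.0 / d for d in tour_distance]
    wheel = list(itertools.accumulate(fitness))        # cumulative sums
    r = rng.random() * wheel[-1]                       # spin the wheel
    # First slot whose cumulative value reaches r
    return min(bisect.bisect_left(wheel, r), len(wheel) - 1)


def rank_select(tour_distance, rng):
    """Return one index, chosen with probability proportional to its rank."""
    n = len(tour_distance)
    # Sort from longest to shortest tour: worst gets rank 1, best gets rank n.
    order = sorted(range(n), key=lambda k: tour_distance[k], reverse=True)
    weight = [0] * n
    for rank, k in enumerate(order, start=1):
        weight[k] = rank
    wheel = list(itertools.accumulate(weight))
    r = rng.random() * wheel[-1]
    return min(bisect.bisect_left(wheel, r), n - 1)


rng = random.Random(0)
distances = [10.0, 20.0, 40.0, 80.0]    # made-up tour lengths
parents = [(roulette_select(distances, rng), roulette_select(distances, rng))
           for _ in range(3)]
print(parents)
```

Rank selection only looks at the ordering of the tours, so it keeps some selection pressure even when all tour lengths are very close to each other, which is where fitness-proportional selection tends to become almost uniform.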

Dirichlet process in PyMC 3

I would like to implement the Dirichlet process example referenced in
Implementing Dirichlet processes for Bayesian semi-parametric models (source: here) in PyMC 3.
In the example the stick-breaking probabilities are computed using the pymc.deterministic decorator:
v = pymc.Beta('v', alpha=1, beta=alpha, size=N_dp)
@pymc.deterministic
def p(v=v):
    """ Calculate Dirichlet probabilities """
    # Probabilities from betas
    value = [u*np.prod(1-v[:i]) for i,u in enumerate(v)]
    # Enforce sum to unity constraint
    value[-1] = 1-sum(value[:-1])
    return value
z = pymc.Categorical('z', p, size=len(set(counties)))
How would you implement this in PyMC 3 which is using Theano for the gradient computation?
edit:
I tried the following solution using the theano.scan method:
with pm.Model() as mod:
    conc = Uniform('concentration', lower=0.5, upper=10)
    v = Beta('v', alpha=1, beta=conc, shape=n_dp)
    p, updates = theano.scan(fn=lambda stick, idx: stick * t.prod(1 - v[:idx]),
                             outputs_info=None,
                             sequences=[v, t.arange(n_dp)])
    t.set_subtensor(p[-1], 1 - t.sum(p[:-1]))
    category = Categorical('category', p, shape=n_algs)
    sd = Uniform('precs', lower=0, upper=20, shape=n_dp)
    means = Normal('means', mu=0, sd=100, shape=n_dp)
    points = Normal('obs',
                    means[category],
                    sd=sd[category],
                    observed=data)
    step1 = pm.Slice([conc, v, sd, means])
    step3 = pm.ElemwiseCategoricalStep(var=category, values=range(n_dp))
    trace = pm.sample(2000, step=[step1, step3], progressbar=True)
Which sadly is really slow and does not obtain the original parameters of the synthetic data.
Is there a better solution and is this even correct?
Not sure I have a good answer but perhaps this could be sped up by instead using a theano blackbox op which allows you to write a distribution (or deterministic) in python code. E.g.: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/disaster_model_arbitrary_deterministic.py
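For reference, the stick-breaking weights themselves can be computed with a single running product, which avoids recomputing prod(1 - v[:i]) for every i. The sketch below is plain Python and only illustrates the arithmetic; the function name stick_breaking is made up, and carrying the same recurrence over to Theano (for example with a cumulative product instead of a scan) is an untested assumption. One thing worth checking in the Theano attempt above: set_subtensor returns a new tensor rather than modifying p in place, so its result has to be assigned to take effect.

```python
def stick_breaking(v):
    """Turn break fractions v[0..n-1] into probabilities that sum to 1."""
    probs = []
    remaining = 1.0                    # length of the stick not yet assigned
    for u in v:
        probs.append(u * remaining)    # u * prod(1 - v[:i])
        remaining *= 1.0 - u
    # Enforce the sum-to-unity constraint, as in the PyMC 2 example
    probs[-1] = 1.0 - sum(probs[:-1])
    return probs


print(stick_breaking([0.5, 0.5, 0.5]))
```

After the correction the last weight is exactly the stick length left over by the first n-1 breaks, so the final entry of v has no influence on the result.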

PyMC: sampling step by step?

I would like to know why the sampler is incredibly slow when sampling step by step.
For example, if I run:
mcmc = MCMC(model)
mcmc.sample(1000)
the sampling is fast. However, if I run:
mcmc = MCMC(model)
for i in arange(1000):
    mcmc.sample(1)
the sampling is slower (and the more it samples, the slower it is).
If you are wondering why I am asking this.. well, I need a step by step sampling because I want to perform some operations on the values of the variables after each step of the sampler.
Is there a way to speed it up?
Thank you in advance!
------------------ EDIT -------------------------------------------------------------
Here I present the specific problem in more details:
I have two models in competition and they are part of a bigger model that has a categorical variable functioning as a 'switch' between the two.
In this toy example, I have the observed vector 'Y', that could be explained by a Poisson or a Geometric distribution. The Categorical variable 'switch_model' selects the Geometric model when = 0 and the Poisson model when =1.
After each sample, if switch_model selects the Geometric model, I want the variables of the Poisson model NOT to be updated, because they are not influencing the likelihood and therefore they are just drifting away. The opposite is true if the switch_model selects the Poisson model.
Basically what I do at each step is to 'change' the value of the non-selected model by bringing it manually one step back.
I hope that my explanation and the commented code will be clear enough. Let me know if you need further details.
import numpy as np
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt
# OBSERVED VALUES
Y = np.array([0, 1, 2, 3, 8])
# PRIOR ON THE MODELS
pi = (0.5, 0.5)
switch_model = pm.Categorical("switch_model", p = pi)
# switch_model = 0 for Geometric, switch_model = 1 for Poisson
p = pm.Uniform('p', lower = 0, upper = 1) # Prior of the parameter of the geometric distribution
mu = pm.Uniform('mu', lower = 0, upper = 10) # Prior of the parameter of the Poisson distribution
# LIKELIHOOD
@pm.observed
def Ylike(value = Y, mu = mu, p = p, M = switch_model):
    if M == 0:
        out = pm.geometric_like(value+1, p)
    elif M == 1:
        out = pm.poisson_like(value, mu)
    return out
model = pm.Model([Ylike, p, mu, switch_model])
mcmc = pm.MCMC(model)
n_samples = 5000
traces = {}
for var in mcmc.stochastics:
    traces[str(var)] = np.zeros(n_samples)
bar = pm.progressbar.progress_bar(n_samples)
bar.update(0)
mcmc.sample(1, progress_bar=False)
for var in mcmc.stochastics:
    traces[str(var)][0] = mcmc.trace(var)[-1]
for i in np.arange(1,n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0: # Geometric wins
        traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]
    elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
        traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]
print('\n\n')
traces=pd.DataFrame(traces)
traces['mu'][traces['switch_model'] == 0] = np.nan
traces['p'][traces['switch_model'] == 1] = np.nan
print(traces.describe())
traces.plot()
plt.show()
The reason this is so slow is that Python's for loops are pretty slow, especially compared to Fortran loops (which is basically what PyMC is written in). If you could show more detailed code, it might be easier to see what you are trying to do and to provide faster alternative algorithms.
Actually I found a 'crazy' solution, and I suspect I know why it works. I would still like to get an expert opinion on my trick.
Basically, if I modify the for loop in the following way, adding a 'reset of the mcmc' every 1000 iterations, the sampling speeds up again:
for i in np.arange(1,n_samples):
    mcmc.sample(1, progress_bar=False)
    bar.update(i)
    for var in mcmc.stochastics:
        traces[str(var)][i] = mcmc.trace(var)[-1]
    if mcmc.trace('switch_model')[-1] == 0: # Geometric wins
        traces['mu'][i] = traces['mu'][i-1] # One step back for the sampler of the Poisson parameter
        mu.value = traces['mu'][i-1]
    elif mcmc.trace('switch_model')[-1] == 1: # Poisson wins
        traces['p'][i] = traces['p'][i-1] # One step back for the sampler of the Geometric parameter
        p.value = traces['p'][i-1]
    if i%1000 == 0:
        mcmc = pm.MCMC(model)
In practice this trick erases the traces and the database of the sampler every 1000 steps. It looks like the sampler does not like having a long database, although I do not really understand why. (Of course 1000 steps is arbitrary: too short and it adds too much overhead, too long and the traces and database grow too large.)
I find this hack a bit crazy and definitely not elegant. Do any of the experts or developers have a comment on it? Thank you!
