Please help me figure out why my code is sooo slow and possible ways to speed it up. I have used vectorization, @inbounds, column-major indexing, @floop, and preallocation, so I would think it would be faster. I am at a loss...
The code simulates a stochastic wave of cells (as in biological cells) and mutant cells. I am using the Euler–Maruyama method to propagate the coupled (chemical Langevin) equations:
where W and M denote the numbers of wild-type and mutant cells respectively, K is the carrying capacity (maximum number of cells), i denotes the deme (location), and N(0,1) is a standard normal random variable.
Here is a graphic of the waves after some time propagating:
I have attached the important part of the code below, and tried to comment it as best as I could.
using Random, Distributions
using StatsBase
using Statistics
using FLoops
# CLE Parameters/Set-Up:
K = 100 # Carrying capacity (maximum number of cells)
M = 100 # Number of demes (locations)
T = 100_000_000 # number of time steps
dt = 1e-1 # time increment
g = Normal(0.0,sqrt(dt)) # normal distribution with mean = 0.0, std_dev = sqrt(dt)
r_w = 0.1 # Wild-type growth rate
r_m = 0.2 # Mutant growth rate
r_wm = [r_w, r_m]' # Growth rate vector (transposed)
N = 1 # number of independent processes (slow even when N = 1)
# initial wave (essentially a step-function of wild-types
# with 100 mutants at deme (location) 76)
state_init = Matrix(reshape(repeat([K, 0.0]',M+2),(M+2,2)))
state_init[M÷2+2:end,1] .= 0
state_init[76,2] = 100.0
state_init[1,:] .= [K,0]
state_init[end,:] .= [0,0]
state = deepcopy(state_init)
state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)
function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
hist_data::Array{Int64,3}, g::Normal{Float64})
@inbounds @floop for n in 1:N
state .= deepcopy(state_init) # initialize state
@inbounds for t in 1:T
state_plus .= circshift(state, -1) # make plus state
state_plus[1,:] .= [0,0] # fix boundary conditions
state_minus .= circshift(state, 1) # make minus state
state_minus[end,:] .= [0,0] # fix boundary conditions
state_shift = circshift(state, (0,1))
# make state where each deme has a vector
# (# mutants, # wild-types) instead of
# (# wild-types, # mutants)
######################################
# propagate state using the Euler–Maruyama method and
# restrict number of cells in a deme to be in the range [0,K]
# using clamp(). clamp() also prevents imaginary numbers from
# a negative number under the sqrt().
state .= clamp.(state .+
dt .* (r_wm .* state .* (K .- state .- state_shift) .+
K .* (state_plus .- 2.0 .* state .+ state_minus)) .+
sqrt.(clamp.(
6 .* state .* (K .- state) .+
(K .- 2.0 .* state) .* (state_plus .- 2.0 .* state .+
state_minus) .- r_wm .* state .* (K .- state .- state_shift),
0.0, 1.0*K*K)) .*
rand(g,M+2), 0.0, 1.0*K)
######################################
end
end
end
sim!(state_init, state, T, dt, N, M, K, hist_data, g)
... the rest of the code is analysis and not the reason the code is slow.
Even though you're pre-allocating state_plus and state_minus, memory is being allocated inside the for loop since you're using circshift instead of circshift!. circshift allocates new memory regardless of the fact that it's ultimately being assigned to an existing pre-allocated array. Doing that allocation 300 million times is bound to be costly!
Try
function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
hist_data::Array{Int64,3}, g::Normal{Float64})
@inbounds @floop for n in 1:N
state .= deepcopy(state_init) # initialize state
state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)
state_shift = zeros(size(state_init))
for t in 1:T
circshift!(state_plus, state, -1) # make plus state
state_plus[1,:] .= [0,0] # fix boundary conditions
circshift!(state_minus, state, 1) # make minus state
state_minus[end,:] .= [0,0] # fix boundary conditions
circshift!(state_shift, state, (0,1))
and
function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
hist_data::Array{Int64,3}, g::Normal{Float64})
state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)
state_shift = zeros(size(state_init))
@inbounds for n in 1:N
state .= deepcopy(state_init) # initialize state
for t in 1:T
circshift!(state_plus, state, -1) # make plus state
state_plus[1,:] .= [0,0] # fix boundary conditions
circshift!(state_minus, state, 1) # make minus state
state_minus[end,:] .= [0,0] # fix boundary conditions
circshift!(state_shift, state, (0,1))
The second version doesn't use @floop, but that allows it to not have to initialize state_plus and the others N times, so it may be the case that it's faster for your actual N. Best to try both and find out!
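If you want to confirm where the allocations come from, here is a minimal sketch comparing the two (the 102×2 size matches the (M+2)×2 state in the question):
using BenchmarkTools
state = rand(102, 2)   # (M+2) × 2 state, M = 100
buf = similar(state)   # preallocated destination
@btime circshift($state, 1)        # allocates a fresh 102×2 array on every call
@btime circshift!($buf, $state, 1) # reuses buf: zero allocations per call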
Related
Suppose that I have a probability transition matrix, say a matrix of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of each probability distribution of the first 200 steps of the chain (the distribution of the first row at each step). To that end, I've written the following:
using Distributions, LinearAlgebra
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
s=0
i=0
while s <= 0.05
i += 1
s += M[i]
end
return i-1
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
A = tm(N,n0)
B = I # Initializing B with the identity matrix
sup = 0:N # The support of each distribution
sup2 = [k^2 for k in sup]
stats = zeros(3,m)
for i in 1:m
C = B[1,:]
stats[1,i] = sum(C .* sup) # Mean
stats[2,i] = percentile5(C) # 5-percentile
stats[3,i] = sqrt(sum(C .* sup2) - stats[1,i]^2) # Standard deviation
B = A*B
end
return stats
end
data = stats(2000,50,200)
My question is: is there a more efficient (faster) way to do the same computation? I don't see a better way, but maybe there are some tricks that speed up this computation.
This is what I have running so far:
using Distributions, LinearAlgebra, SparseArrays
# This function defines our transition matrix:
function tm(N::Int, n0::Int)
[pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end
# This computes the 5-percentile of a probability vector
function percentile5(M::AbstractVector)
s = zero(eltype(M))
res = length(M)
@inbounds for i = 1:length(M)
s += M[i]
if s > 0.05
res = i - 1
break
end
end
return res
end
# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
A = sparse(transpose(tm(N, n0)))
C = zeros(size(A, 1))
C[1] = 1.0
sup = 0:N # The support of each distribution
sup2 = sup .^ 2
stats = zeros(3, m)
for i = 1:m
stats[1, i] = sum(C .* sup) # Mean
stats[2, i] = percentile5(C) # 5-percentile
stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2) # Standard deviation
C = A * C
end
return stats
end
It is around 4x faster (on smaller parameters; possibly a much greater speedup on large parameters). It basically uses the tips I made in the comment:
using sparse arrays.
avoiding whole matrix multiply but using vector-matrix multiply instead.
Further improvements are possible (like the simulation/ensemble method I've mentioned).
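One more allocation worth noting (a further suggestion on top of the above): sum(C .* sup) builds a temporary vector on every loop iteration, while dot from LinearAlgebra computes the same reduction without allocating. A minimal sketch:
using LinearAlgebra, BenchmarkTools
C = rand(2001); C ./= sum(C)  # a probability vector, sized like the N = 2000 case
sup = 0:2000                  # the support, as in stats()
@btime sum($C .* $sup)  # allocates a temporary length-2001 vector
@btime dot($C, $sup)    # same value, no allocation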
I'm making my first effort to move from Matlab to Julia and have found my code to improve by ~3x, but I still think there is more to come. I'm not using any global variables in the function and have preallocated all the arrays used (I think?). Any thoughts on how it could be sped up even further would be greatly appreciated; I'll fully convert even at the current improvement, I think!
function word_sim(tau::Int, omega::Int, mu::Float64)
# inserts a word in position (tau+1), at each point creates a new word with prob mu
# otherwise randomly chooses a previously used. Runs the program until time omega
words = zeros(Int32, 1, omega) # to store the words
tests = rand(1,omega) # will compare mu to these
words[1] = 1; # initialize the words
next_word = 2 # will be the next word used
words[tau+1] = omega + 1; # max possible word so insert that at time tau
innovates = mu .> tests; # when we'll make a new word
for i = 2:tau # simulate the process
if innovates[i] == 1 # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
# force the word we're interested in
for i = (tau+2):omega
if innovates[i] == 1 # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
result = sum(words .== (omega + 1)); # count how many times our word occurred
return result
end
and when I run it with these values it takes ~.26 seconds on my PC
using Statistics
@time begin
nsim = 10^3;
omega = 100;
seed = [0:1:(omega-1);];
mu = 0.01;
results = zeros(Float64, 1, length(seed));
pops = zeros(Int64, 1, nsim);
for tau in seed
for jj = 1:nsim
pops[jj] = word_sim(tau, omega, mu);
end
results[tau+1] = mean(pops);
end
end
Or perhaps I'd be better off writing the code in C++? Julia was my first reaction, as I've heard rave reviews about its syntax, which to be honest is fantastic!
Any comments greatly appreciated.
A 3x speedup is a nice start, but it turns out there are a few more things you can do to improve performance significantly!
As a starting point, using your example posted above in Julia 1.6.1, I get
0.301665 seconds (798.10 k allocations: 164.778 MiB, 12.70% gc time)
That's a lot of allocations, and a fair amount of garbage collector ("gc") time, so it seems we're producing a fair amount of garbage here. Some of the culprits are lines like
tests = rand(1,omega) # will compare mu to these
or
innovates = mu .> tests; # when we'll make a new word
In languages like Matlab or Python, pre-calculating these things a whole vector at a time can be good for performance, but in Julia it's generally not necessary and can even hurt, because each of these lines causes a brand-new array to be allocated. If we remove these and just generate our tests on the fly, we can avoid these allocations. One other line that allocates in here is
result = sum(words .== (omega + 1))
where you first build a whole new array before taking the sum of it. You could avoid this by writing it as a for loop (even though this may feel wrong coming from Matlab, it's quite fast in Julia). Or, to keep it as a one-liner, use either count or sum with a function that does the comparison as the first argument
result = count(x->(x == omega+1), words)
(in this example, just using an anonymous function x->(x == omega+1)).
Adding up these changes so far then
function word_sim(tau::Int, omega::Int, mu::Float64)
# inserts a word in position (tau+1), at each point creates a new word with prob mu
# otherwise randomly chooses a previously used. Runs the program until time omega
words = zeros(Int32, 1, omega) # to store the words
words[1] = 1; # initialize the words
next_word = 2 # will be the next word used
words[tau+1] = omega + 1; # max possible word so insert that at time tau
for i = 2:tau # simulate the process
if mu > rand() # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
# force the word we're interested in
for i = (tau+2):omega
if mu > rand() # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
result = count(x->(x == omega+1), words) # count how many times our word occurred
return result
end
Using the same timing code, this now brings us down to
0.177766 seconds (298.10 k allocations: 51.863 MiB, 13.01% gc time)
So about half the time and half the allocations. There's still more though!
First, let's move the allocation of the words array outside of the word_sim function and instead make an in-place version of that function. We can also speed things up by adding an @inbounds to the tight for loops.
function word_sim!(words::AbstractArray, tau::Int, omega::Int, mu::Float64)
# inserts a word in position (tau+1), at each point creates a new word with prob mu
# otherwise randomly chooses a previously used. Runs the program until time omega
fill!(words, 0) # Probably not necessary actually, but I haven't spent enough time looking at the code to be sure
words[1] = 1; # initialize the words
next_word = 2 # will be the next word used
words[tau+1] = omega + 1; # max possible word so insert that at time tau
@inbounds for i = 2:tau # simulate the process
if mu > rand() # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
# force the word we're interested in
@inbounds for i = (tau+2):omega
if mu > rand() # innovate
words[i] = next_word
next_word = next_word + 1
else # copy
words[i] = words[rand(1:(i-1))]
end
end
result = count(x->(x == omega+1), words) # count how many times our word occurred
return result
end
In-place functions that modify one of their input arguments are usually denoted by a ! at the end of their name by convention in Julia, hence the new function name.
Since we have to modify the timing code a bit to pre-allocate words now, let's also take the opportunity to put that timing code into a function to avoid any globals in the timing.
function run_word_sim()
nsim = 10^3
omega = 100
seed = [0:1:(omega-1);]
mu = 0.01
results = zeros(Float64, 1, length(seed))
pops = zeros(Int64, 1, nsim)
words = zeros(Int32, 1, omega) # to store the words
for tau in seed
for jj = 1:nsim
pops[jj] = word_sim!(words, tau, omega, mu)
end
results[tau+1] = mean(pops)
end
return results
end
Then, to get the most accurate timing results (and optionally some useful plots and statistics), we can use the BenchmarkTools package and its @btime or @benchmark macros:
julia> using BenchmarkTools
julia> @btime run_word_sim()
124.178 ms (4 allocations: 10.17 KiB)
So, almost another 3x speedup, and reduced allocations and memory usage (by four or five orders of magnitude) down to only the four arrays used in the timing code (seed, results, pops and words).
For the absolute maximum performance, you could possibly go even further with LoopVectorization.jl and its @turbo macro, though it would likely require a change in algorithm, since these loops depend on previous state and so don't appear to be compatible with loop re-ordering. You could turn the count into a for loop and @turbo that for a slight additional speedup, though.
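For instance, a sketch of what such a count loop could look like (untested, and assuming LoopVectorization.jl is installed; a reduction over a comparison like this is the kind of loop @turbo can vectorize):
using LoopVectorization
# count occurrences of `target` in `words`; the Bool from the comparison
# accumulates into the integer counter, which @turbo can vectorize
function count_equal(words::AbstractArray{<:Integer}, target::Integer)
    c = 0
    @turbo for i in eachindex(words)
        c += words[i] == target
    end
    return c
end
# usage inside word_sim!: result = count_equal(words, omega + 1)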
There are also other options for potentially faster random number generation, such as VectorizedRNG.jl as discussed in the discourse thread linked in the comments. While allocating a new vector of random numbers on each call of word_sim is likely not optimal, RNG is generally faster when you can generate a lot of random numbers at once, so passing a pre-allocated buffer of random numbers to word_sim! and filling that in-place with rand! as provided by either the Random stdlib or VectorizedRNG could yield a significant additional speedup.
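A sketch of that buffer idea, building on word_sim! from above (the extra tests argument and its size are my additions; whether it actually beats scalar rand() here is worth benchmarking):
using Random
function word_sim_buf!(words::AbstractArray, tests::AbstractArray, tau::Int, omega::Int, mu::Float64)
    rand!(tests) # refill the preallocated random buffer in-place: no allocation
    words[1] = 1
    next_word = 2
    words[tau+1] = omega + 1
    @inbounds for i = 2:tau
        if mu > tests[i] # compare against the pre-generated draw
            words[i] = next_word
            next_word += 1
        else
            words[i] = words[rand(1:(i-1))]
        end
    end
    @inbounds for i = (tau+2):omega
        if mu > tests[i]
            words[i] = next_word
            next_word += 1
        else
            words[i] = words[rand(1:(i-1))]
        end
    end
    return count(x -> x == omega + 1, words)
end
# in run_word_sim, preallocate once: tests = zeros(1, omega)
# then call: pops[jj] = word_sim_buf!(words, tests, tau, omega, mu)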
Some of the tricks and rules of thumb used in this answer are discussed more generally in https://github.com/brenhinkeller/JuliaAdviceForMatlabProgrammers, along with a few other general Matlab -> Julia tips.
I've seen multiple questions addressing memory allocation in Julia in general, however none of these examples helped me.
I provide a minimal example that shall illustrate my problem. I implemented a finite volume solver that computes the solution of an advection equation. Long story short, here is the (self-contained) code:
function dummyexample()
nx = 100
Δx = 1.0/nx
x = range(Δx/2.0, length=nx, step=Δx)
ρ = sin.(2π*x)
for i=1:floor(1.0/Δx / 0.5)
shu_osher_step!(ρ) # This part is executed several times
end
println(sum(Δx*abs.(ρ .- sin.(2π*x))))
end
function shu_osher_step!(ρ::AbstractArray)
ρ₁ = euler_step(ρ) # array allocation
ρ₂ = 3.0/4.0*ρ .+ 1.0/4.0*euler_step(ρ₁) # array allocation
ρ .= 1.0/3.0*ρ .+ 2.0/3.0*euler_step(ρ₂) # array allocation
end
function euler_step(ρ::AbstractArray)
return ρ .+ 0.5*rhs(ρ)
end
function rhs(ρ::AbstractArray)
ρₗ = circshift(ρ,+1) # array allocation
ρᵣ = circshift(ρ,-1) # array allocation
Δρₗ = ρ.-ρₗ # array allocation
Δρᵣ = ρᵣ .-ρ # array allocation
vᵣ = ρ .+ 1.0/2.0 .* H(Δρₗ,Δρᵣ) # array allocation
return -(vᵣ .- circshift(vᵣ,+1)) # array allocation
end
function H(Δρₗ::AbstractArray,Δρᵣ::AbstractArray)
σ = Δρₗ ./ Δρᵣ
σ̃ = max.(abs.(σ),1e-12) .* (2.0 .* (σ .>= 0.0) .- 1.0)
for i=1:100
if isnan(σ̃[i])
σ̃[i] = 1e-12
end
end
return Δρₗ .* (2.0/3.0*(1.0 ./ σ̃) .+ 1.0/3.0)
end
My problem is that, deep down in the call tree, the function rhs allocates several arrays in every iteration of the uppermost time loop. These arrays are temporary, and I do not like the fact that they have to be reallocated every iteration. Here is the output from @time:
julia> include("dummyexample.jl");
julia> @time dummyexample()
8.780349744014917e-5 # <- just to check that the error is almost zero
0.362833 seconds (627.38 k allocations: 39.275 MiB, 1.95% gc time)
Now, in the real code there is actually a struct p passed down the whole call tree that contains attributes which I have hardcoded here (basically, every one of the explicitly stated numbers would be referenced by p.n, etc.).
I could probably also pass down preallocated arrays, but that seems to get messy, and I would have to change it every time I want to do extra computations.
Global arrays are discouraged in the Julia documentation, but wouldn't they do the trick here? Are there any other obvious things I am missing? I am considering Julia 1.0.
Passing down preallocated arrays, as you say in the last paragraph, is exactly the right thing in this kind of situation. In addition to that, I would devectorize the code into a manual loop containing a stencil, with more indexing math instead of circshift.
Applying both ideas results in the following:
function dummyexample()
nx = 100
Δx = 1.0 / nx
steps = 2 ÷ Δx
x = range(Δx ÷ 2, length = nx, step = Δx)
ρ = sin.(2π .* x)
run!(ρ, steps)
println(sum(#. Δx * abs(ρ - sin(2π * x))))
end
function run!(ρ, steps)
ρ₁, ρ₂, v = similar(ρ), similar(ρ), similar(ρ)
for i = 1:steps
shu_osher_step!(ρ₁, ρ₂, v, ρ)
end
return ρ
end
function shu_osher_step!(ρ₁, ρ₂, v, ρ)
euler_step!(ρ₁, v, ρ)
ρ₂ .= 3.0/4.0 .* ρ .+ 1.0/4.0 .* euler_step!(ρ₂, v, ρ₁)
ρ .= 1.0/3.0 .* ρ .+ 2.0/3.0 .* euler_step!(ρ, v, ρ₂)
end
function euler_step!(ρₒ, v, ρ)
cycle(i) = mod(i - 1, length(ρ)) + 1
# two steps of calculating v fused into one -- could be replaced by
# an extra loop for v.
for I in 1:2:size(ρ, 1)
v[I] = rhs(ρ[cycle(I-1)], ρ[I], ρ[cycle(I+1)])
v[cycle(I+1)] = rhs(ρ[cycle(I)], ρ[I+1], ρ[cycle(I+2)])
ρₒ[I] += 0.5 * (v[cycle(I+1)] - v[I])
end
return ρₒ
end
function rhs(ρₗ, ρᵢ, ρᵣ)
Δρₗ = ρᵢ - ρₗ
Δρᵣ = ρᵣ - ρᵢ
return ρᵢ + 1/2 * H(Δρₗ, Δρᵣ)
end
function H(Δρₗ, Δρᵣ)
σ = Δρₗ / Δρᵣ
σ̃ = max(abs(σ), 1e-12) * (2.0 * (σ >= 0.0) - 1.0)
isnan(σ̃) && (σ̃ = 1e-12)
return Δρₗ * (2.0 / 3.0 * (1.0 / σ̃) + 1.0 / 3.0)
end
The above might still contain some logic errors due to my lack of domain knowledge (dummyexample() prints 0.02984422033942575), but you see the pattern. And it benchmarks well:
julia> @benchmark run!($ρ, $steps)
BenchmarkTools.Trial:
memory estimate: 699.13 KiB
allocs estimate: 799
--------------
minimum time: 3.024 ms (0.00% GC)
median time: 3.164 ms (0.00% GC)
mean time: 3.760 ms (1.69% GC)
maximum time: 57.105 ms (94.41% GC)
--------------
samples: 1327
evals/sample: 1
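For reference, the ρ and steps interpolated into that benchmark can be set up as in dummyexample (the values below are my reading of its parameters):
nx = 100
Δx = 1.0 / nx
x = range(Δx / 2, length = nx, step = Δx)
ρ = sin.(2π .* x)
steps = 200
using BenchmarkTools
# then: @benchmark run!($ρ, $steps)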
m, n = size(l.x)
for batch=1:m
l.ly = l.y[batch,:]
l.jacobian .= -l.ly .* l.ly'
l.jacobian[diagind(l.jacobian)] .= l.ly.*(1.0.-l.ly)
# # n x 1 = n x n * n x 1
l.dldx[batch,:] = l.jacobian * DLDY[batch,:]
end
return l.dldx
l.x is an m by n matrix. l.y is another matrix with the same size as l.x. My goal is to create another m by n matrix, l.dldx, in which each row is the result of the operation inside the for loop. Can anyone spot further optimization for this block of code? The code above is part of https://github.com/stevenygd/NN.jl.
The following should implement the same calculation and is more efficient:
l.dldx = l.y .* (DLDY .- sum(l.y .* DLDY, dims=2))
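To see why this is equivalent (assuming the Jacobian in the question is the usual softmax Jacobian, which its diagonal/off-diagonal structure suggests): for each row, the loop builds J = diagm(ly) - ly * ly', so J * d = ly .* d .- ly .* sum(ly .* d) = ly .* (d .- sum(ly .* d)). The broadcasted one-liner applies exactly this row by row, with sum(..., dims=2) supplying the per-row scalar sum(ly .* d).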
There might be a slight improvement available by refactoring the sum into a loop.
As the question does not have runnable code, or a test case, it is hard to give definite benchmarks, so feedback would be welcome.
UPDATE
Here is the code above with explicit loops:
function calc_dldx(y,DLDY)
tmp = zeros(eltype(y),size(y,1))
dldx = similar(y)
@inbounds for j=1:size(y,2)
for i=1:size(y,1)
tmp[i] += y[i,j]*DLDY[i,j]
end
end
@inbounds for j=1:size(y,2)
for i=1:size(y,1)
dldx[i,j] = y[i,j]*(DLDY[i,j]-tmp[i])
end
end
return dldx
end
The long version should run even faster. A good way to measure the performance of code is the BenchmarkTools package.
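For instance, a minimal benchmark sketch (the sizes and random inputs are placeholders, since the question provides no runnable test case):
using BenchmarkTools
m, n = 64, 10                         # assumed batch size and width
y = rand(m, n); y ./= sum(y, dims=2)  # rows normalized, like softmax outputs
DLDY = randn(m, n)
@btime calc_dldx($y, $DLDY)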
This question was asked previously and not answered (5 June), but maybe putting it in context makes more sense.
I have done the change point tutorial with the two lambdas and extended it with 2 change points, so the model is now:
# the exp parameter expected is the inverse of the average from sampled series
alpha = 1.0 / count_data.mean()
# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=0, upper=n_count_data)
@pm.deterministic
def lambda_(tau1=tau1, tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2, lambda_3=lambda_3):
out = np.zeros(n_count_data)
out[:tau1] = lambda_1 # lambda before tau is lambda1
out[tau1:tau2] = lambda_2 # lambda between periods is lambda2
out[tau2:] = lambda_3 # lambda after (and including) tau2 is lambda3
return out
observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, lambda_3, tau1, tau2])
# markov monte carlo chain
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
The question is: in the deterministic variable, how do I actually tell the model that I only need to consider the case where tau1 is less than tau2?
The problem is that when tau2 precedes tau1 there is a time symmetry which is computationally unnecessary.
Any help is welcome.
I haven't tested it, but I think you could do something like this:
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=tau1, upper=n_count_data)
That way tau2 is constrained to be at least as large as tau1. You may have to think a little bit about whether tau1 and tau2 should be allowed to coincide.
The full model under the assumption of a deterministic gap between the taus follows:
# the exp parameter expected is the inverse of the average from sampled series
alpha = 1.0 / count_data.mean()
# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=tau1+1, upper=n_count_data)
@pm.deterministic
def lambda_(tau1=tau1,tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2,lambda_3=lambda_3):
out = np.zeros(n_count_data)
out[:tau1] = lambda_1 # lambda before tau is lambda1
out[tau1:tau2] = lambda_2 # lambda between periods is lambda2
out[tau2:] = lambda_3 # lambda after (and including) tau2 is lambda3
return out
observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2,lambda_3, tau1,tau2])
# markov monte carlo chain
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
lambda_1_samples = mcmc.trace('lambda_1')[:]
lambda_2_samples = mcmc.trace('lambda_2')[:]
lambda_3_samples = mcmc.trace('lambda_3')[:]
tau1_samples = mcmc.trace('tau1')[:]
tau2_samples = mcmc.trace('tau2')[:]
Will also try with the random gap and see how it goes.
If you are open to using R to solve the same inference problem, the mcp package provides a higher-level interface for change point problems. It has order-restricted change point parameters by default.
Here is a model for three intercepts (two change points)
model = list(
count ~ 1,
~ 1,
~ 1
)
library(mcp)
fit = mcp(model, data, family = poisson())
More info:
Poisson models in mcp.
Priors in mcp, which covers finer control of the order restriction.