state_cov is being ignored when using the Kalman filter in statsmodels

I am trying to create an affine term structure model derived from statsmodels.tsa.statespace.MLEModel (code below) which is initialized using least squares.
class affine_term_structure(sm.tsa.statespace.MLEModel):
    def __init__(self, yields, tau, k_states=3, **kwargs):
        # Initialize the state space model
        super(affine_term_structure, self).__init__(yields, k_states=k_states, **kwargs)
        self.tau = tau
        self.initialize_known(np.zeros(self.k_states), np.eye(self.k_states) * 10000)

    def update(self, params, **kwargs):
        params = super(affine_term_structure, self).update(params, **kwargs)
        # Extract the parameters
        Phi = np.reshape(params[:9], (3, 3))
        k = np.array(params[9:12])
        Sx = np.zeros((3, 3))
        Sx[np.tril_indices(3)] = params[12:18]
        lmbd = params[18]
        sy = params[-1]
        b = self.nss(self.tau, lmbd)

        self['transition'] = Phi                            # transition matrix
        self['state_intercept'] = k                         # transition offset
        self['state_cov'] = Sx @ Sx.T                       # transition covariance, a 3x3 SPD matrix
        self['design'] = b                                  # design matrix
        # self['obs_intercept'] = 0                         # observation intercept
        self['obs_cov'] = sy * sy * np.eye(self.k_endog)    # observation covariance
However, I noticed on running the filter/smoother that the states were being excessively smoothed. Digging through the filtering results, it seems that state_cov is not being used in the prediction step.
For example,
self.predicted_state_cov[:,:,1]
matches
self.transition[:,:,0] @ self.filtered_state_cov[:,:,0] @ self.transition[:,:,0].T
though I would have expected it to equal
self.transition[:,:,0] @ self.filtered_state_cov[:,:,0] @ self.transition[:,:,0].T + self.state_cov[:,:,0]
For good order, please note that all parameter matrices are time invariant.
I'm not sure what I'm missing here, and any help would be much appreciated.
Thanks

In Statsmodels, the state equation is:
x(t+1) = T x(t) + R eta(t+1)
where eta(t+1) ~ N(0, Q)
When you set state_cov, you're setting Q, but you also need to set R, which is selection.
For example, if you want your state equation to be:
x(t+1) = T x(t) + eta(t+1)
Then you would do something like:
self['selection'] = np.eye(3)
R is not the identity in every state space model, and it cannot even always default to the identity matrix, since the dimension of x(t) and the dimension of eta(t) can differ. That is why Statsmodels does not initialize R to the identity matrix automatically.
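Since all the system matrices in the model above are time invariant, one minimal fix (a sketch based on the posted code, assuming eta(t) has the same dimension as the state) is to set the selection matrix once in __init__:

    def __init__(self, yields, tau, k_states=3, **kwargs):
        super(affine_term_structure, self).__init__(yields, k_states=k_states, **kwargs)
        self.tau = tau
        # Set R = I so that the full state_cov Q enters the prediction step
        self['selection'] = np.eye(self.k_states)
        self.initialize_known(np.zeros(self.k_states), np.eye(self.k_states) * 10000)

With R = I, the predicted state covariance becomes T P T' + R Q R' = T P T' + Q, which matches the expression you expected.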

Related

Faster way to compute distributions from Markov chain?

Suppose I have a probability transition matrix, say of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of the probability distribution at each of the first 200 steps of the chain (the distribution of the first row at each step). I've written the following:
using Distributions, LinearAlgebra

# This function defines our transition matrix:
function tm(N::Int, n0::Int)
    [pdf(Hypergeometric(N - l, l, n0), k - l) for l in 0:N, k in 0:N]
end

# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
    s = 0
    i = 0
    while s <= 0.05
        i += 1
        s += M[i]
    end
    return i - 1
end

# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
    A = tm(N, n0)
    B = I  # Initializing B with the identity matrix
    sup = 0:N  # The support of each distribution
    sup2 = [k^2 for k in sup]
    stats = zeros(3, m)
    for i in 1:m
        C = B[1, :]
        stats[1, i] = sum(C .* sup)  # Mean
        stats[2, i] = percentile5(C)  # 5-percentile
        stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2)  # Standard deviation
        B = A * B
    end
    return stats
end

data = stats(2000, 50, 200)
My question is: is there a more efficient (faster) way to do the same computation? I don't see a better way to do it, but maybe there are some tricks that speed up this computation.
This is what I have running so far:
using Distributions, LinearAlgebra, SparseArrays

# This function defines our transition matrix:
function tm(N::Int, n0::Int)
    [pdf(Hypergeometric(N - l, l, n0), k - l) for l in 0:N, k in 0:N]
end

# This computes the 5-percentile of a probability vector
function percentile5(M::AbstractVector)
    s = zero(eltype(M))
    res = length(M)
    @inbounds for i = 1:length(M)
        s += M[i]
        if s > 0.05
            res = i - 1
            break
        end
    end
    return res
end

# This function computes a matrix with three rows: means, 5-percentiles
# and standard deviations. Each column represents a session.
function stats(N::Int, n0::Int, m::Int)
    A = sparse(transpose(tm(N, n0)))
    C = zeros(size(A, 1))
    C[1] = 1.0
    sup = 0:N  # The support of each distribution
    sup2 = sup .^ 2
    stats = zeros(3, m)
    for i = 1:m
        stats[1, i] = sum(C .* sup)  # Mean
        stats[2, i] = percentile5(C)  # 5-percentile
        stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2)  # Standard deviation
        C = A * C
    end
    return stats
end
It is around 4x faster (on smaller parameters; possibly a much larger speedup on large parameters). Basically it uses the tips I made in the comment:
using sparse arrays.
avoiding a whole-matrix multiply by using a vector-matrix multiply instead.
Further improvements are possible (like the simulation/ensemble method I mentioned).

How to constrain model variables in GEKKO

I would like to constrain the variable value u < 1 in my model. I added ub=1 to the variable definition, u = m.Var(name='u', value=0, lb=-2, ub=1), but it resulted in "No solution found" (EXIT: Converged to a point of local infeasibility. Problem may be infeasible.). I guess I have to reformulate the problem to avoid this, but I have not been able to find examples of how this should be done. How do I write a proper model that avoids infeasible solutions when constraining variable values?
I have tried to reformulate the problem by adding an equation like m.Equation(u < 1), with no success.
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as pyplt
m = GEKKO(remote=False)
t = np.linspace(0, 1000, 101) # time
d = np.ones(t.shape)
d[0:10] = 0
y_delay=0
# Add data to model
m.time = t
K = m.Const(0.01, name='K')
r = m.Const(name='r', value=0) # Reference
d = m.Param(name='d', value=d) # Disturbance
y = m.Var(name='y', value=0, lb=-2, ub=2) # State variable
u = m.Var(name='u', value=0, lb=-2, ub=1) # Output
e = m.Var(name='e', value=0)
Tc = m.FV(name='Tc', value=1200, lb=60, ub=1200) # time constant
# Update variable status
Tc.STATUS = 1 # Optimizer can adjust value
Kp = m.Intermediate(1 / K * 1 / Tc, name='Kp')
Ti = m.Intermediate(4 * Tc, name='Ti')
# Model equations
m.Equations([y.dt() == K * (u - d),
             e == r - y,
             u.dt() == Kp * e.dt() + Kp / Ti * e])
# Model constraints
m.Equation(y < 0.5)
m.Equation(y > -0.5)
# Model objective
m.Obj(-Tc)
# options
m.options.IMODE = 6 # Problem type: 6 = Dynamic optimization
# solve
m.solve(disp=True, debug=True)
print('Tc: %6.2f [s]' % (Tc.value[-1], ))
fig1, (ax1, ax2, ax3) = pyplt.subplots(3, sharex='all')
ax1.plot(t, y.value)
ax1.set_ylabel("y", fontsize=8), ax1.grid(True, which='both')
ax2.plot(t, e.value)
ax2.set_ylabel("e", fontsize=8), ax2.grid(True, which='both')
ax3.plot(t, u.value)
ax3.plot(t, d.value)
ax3.set_ylabel("u and d", fontsize=8), ax3.grid(True, which='both')
pyplt.show()
EXIT: Converged to a point of local infeasibility. Problem may be infeasible.
An error occured.
The error code is 2
If I change the upper bound of u to 2, the optimization problem is solved as expected.
Hard constraints on variables can lead to an infeasible solution, as you observed. I recommend that you use soft constraints by specifying the variable y as a Controlled Variable and set an upper and lower set point range with SPHI and SPLO.
y = m.CV(name='y', value=0) # Controlled variable
y.STATUS = 1
y.TR_INIT = 0
y.SPHI = 0.5
y.SPLO = -0.5
I also removed the lb and ub from y and u to not give them hard bounds that can lead to the infeasibility. You also have an objective to maximize the value of Tc with m.Obj(-Tc). It goes to the maximum limit: 1200 when the solver is able to adjust the value. As you can see from the plot, the value of y exceeds the setpoint range. It may not be possible for the controller to keep it within that range. A soft constraint (objective based) approach to constraints penalizes deviations but does not lead to an infeasible solution. If you need to increase the penalty on violations of the SPHI or SPLO, the parameters WSPHI and WSPLO can be adjusted.
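For example (these weights are illustrative, not values from the original answer):

y.WSPHI = 1000   # penalty weight for exceeding SPHI
y.WSPLO = 1000   # penalty weight for dropping below SPLO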
It appears that you have a first order dynamic model and you are trying to optimize PID parameters. If you need to model saturation of the controller output (actuator) then the if3, max3, min3 or corresponding if2, max2, min2 functions may be useful. There is more information on CV objectives and tuning in the Dynamic Optimization course.
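As an illustration, a saturation of the controller output to [0, 1] could be sketched like this (u_sat is a hypothetical name and this is not a drop-in addition to the code above; max3/min3 add binary switching variables, so the APOPT mixed-integer solver is typically required):

m.options.SOLVER = 1                    # APOPT handles the binary switching variables
u_sat = m.min3(m.max3(u, 0), 1)         # clip the controller output to [0, 1]
m.Equation(y.dt() == K * (u_sat - d))   # use the saturated signal in place of u in the process equation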
Here is a feasible solution to your problem:
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as pyplt
m = GEKKO() # remote=False
t = np.linspace(0, 1000, 101) # time
d = np.ones(t.shape)
d[0:10] = 0
y_delay=0
# Add data to model
m.time = t
K = m.Const(0.01, name='K')
r = m.Const(name='r', value=0) # Reference
d = m.Param(name='d', value=d) # Disturbance
e = m.Var(name='e', value=0)
u = m.Var(name='u', value=0) # Output
Tc = m.FV(name='Tc', value=1200, lb=60, ub=1200) # time constant
y = m.CV(name='y', value=0) # Controlled variable
y.STATUS = 1
y.TR_INIT = 0
y.SPHI = 0.5
y.SPLO = -0.5
# Update variable status
Tc.STATUS = 1 # Optimizer can adjust value
Kp = m.Intermediate((1 / K) * (1 / Tc), name='Kp')
Ti = m.Intermediate(4 * Tc, name='Ti')
# Model equations
m.Equations([y.dt() == K * (u - d),
             e == r - y,
             u.dt() == Kp * e.dt() + (Kp / Ti) * e])
# Model constraints
#m.Equation(y < 0.5)
#m.Equation(y > -0.5)
# Model objective
m.Obj(-Tc)
# options
m.options.IMODE = 6 # Problem type: 6 = Dynamic optimization
m.options.SOLVER = 3
m.options.MAX_ITER = 1000
# solve
m.solve(disp=True, debug=True)
print('Tc: %6.2f [s]' % (Tc.value[-1], ))
fig1, (ax1, ax2, ax3) = pyplt.subplots(3, sharex='all')
ax1.plot(t, y.value)
ax1.plot([min(t),max(t)],[0.5,0.5],'k--')
ax1.plot([min(t),max(t)],[-0.5,-0.5],'k--')
ax1.set_ylabel("y", fontsize=8), ax1.grid(True, which='both')
ax2.plot(t, e.value)
ax2.set_ylabel("e", fontsize=8), ax2.grid(True, which='both')
ax3.plot(t, u.value)
ax3.plot(t, d.value)
ax3.set_ylabel("u and d", fontsize=8), ax3.grid(True, which='both')
pyplt.show()
Thanks for an extensive and useful answer to my question. I really appreciate it.
As you correctly observed, I am trying to optimize tuning parameters for my simple control problem. I have executed your code with soft constraints, and it certainly solves the feasibility issue. I also added the WSPHI/WSPLO parameters and set their values high to keep the solution within the constraints. Still, I would like to have a model where the control output ("u") is bounded [0, 1]. Based on your answer, I probably have to add "if" or "max/min" statements in the model to avoid an infeasible set of equations when "u" hits the bound, something like "if u < 0, u.dt() = 0 else u.dt() = Kp*e ...". Could it alternatively be possible to add a variable (a kind of slack variable) to ensure feasibility of the equation set? I will also investigate the material in the dynamic optimization course links to get a better understanding of dynamic modelling. Thanks again for guiding me in the right direction on this issue.

Enumerate through variable (porting PyMC to PyMC3)

I'm starting out with PyMC3 by translating this code from PyMC to PyMC3.
I'm not sure how to translate this segment:
v = pymc.Beta('v', alpha=1, beta=alpha, size=N_dp)

@pymc.deterministic
def p(v=v):
    """ Calculate Dirichlet probabilities """
    # Probabilities from betas
    # this line creates the error:
    value = [u * np.prod(1 - v[:i]) for i, u in enumerate(v)]
    # Enforce sum-to-unity constraint
    value[-1] = 1 - sum(value[:-1])
    return value

z = pymc.Categorical('z', p, size=len(set(counties)))
I assume I have to replace p in the last line with p(v) and remove the @pymc.deterministic decorator, but the problem seems to be that I cannot enumerate through v: ValueError: length not known: ViewOp [id A] 'v'.
Can someone show me how to do the translation or link me to the relevant bit in the documentation? Thanks.
The Dirichlet distribution is actually built into pymc3, so that whole code block can be replaced by:
with pm.Model():
    ...
    v = pm.Beta('v', alpha=1, beta=alpha, shape=N_dp)
    p = pm.Dirichlet('p', a=v, shape=N_dp)
    ...
    trace = pm.sample(20000)
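If you also need the categorical assignment z from the original snippet, a sketch of the PyMC3 equivalent (the shape and variable names come from the question's code, not from the original answer) would go inside the same model context before sampling:

    z = pm.Categorical('z', p=p, shape=len(set(counties)))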

Can someone explain this piece of code that recognises a digit from the Coursera Machine Learning course

This is a snippet from the predict function of exercise 4 of the Coursera machine learning course. What it does is it stores the predicted digit from a trained neural network in p. Can someone explain how it does this?
function p = predict(Theta1, Theta2, x)
    p = 0;
    h1 = sigmoid(double([1 x]) * Theta1');
    h2 = sigmoid([1 h1] * Theta2');
    [dummy, p] = max(h2, [], 2);
end
x = 1x784 matrix of pixel intensity values.
Theta1 = 100x785 matrix.
Theta2 = 10x101 matrix.
I have already trained the network and have gotten the optimum value of Theta1 and Theta2. What I want to know is how that last line of code takes the forward propagated values and stores 1/2/3/4/5/6/7/8/9/10 in p. Whichever digit is stored is the predicted digit.
Sigmoid function:
function g = sigmoid(z)
    g = 1 ./ (1 + e.^-z);
end
The last line simply returns the index of the neuron with the highest value. In MATLAB/Octave,
[M, I] = max(A, [], dim)
stores in I the indices of the elements of A with the highest values along dimension dim. In your case, h2 holds the activations of each output neuron, and by construction of your neural network the classification is simply the index of the neuron with the highest value:
cl(x) = arg max_i f_i(x)
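For comparison, a small NumPy sketch of the same argmax step (not part of the course code; the toy activations below are made up):

import numpy as np

# Toy activations: one row per example, one column per output neuron
h2 = np.array([[0.05, 0.80, 0.15],
               [0.60, 0.10, 0.30]])

# The predicted class is the column index of the largest activation;
# add 1 to match MATLAB/Octave's 1-based class labels.
p = np.argmax(h2, axis=1) + 1   # -> array([2, 1])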

change point detection with two transitions

This question was asked previously (5 June) and not answered, but maybe putting it in context makes more sense.
I have done the change point tutorial with the two lambdas and extended it to two change points, so the model is now:
# the exp parameter expected is the inverse of the average of the sampled series
alpha = 1.0 / count_data.mean()

# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)

# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=0, upper=n_count_data)

@pm.deterministic
def lambda_(tau1=tau1, tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2, lambda_3=lambda_3):
    out = np.zeros(n_count_data)
    out[:tau1] = lambda_1      # lambda before tau1 is lambda_1
    out[tau1:tau2] = lambda_2  # lambda between the change points is lambda_2
    out[tau2:] = lambda_3      # lambda after (and including) tau2 is lambda_3
    return out

observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, lambda_3, tau1, tau2])

# Markov chain Monte Carlo
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
The question is: in the deterministic variable, how do I actually tell the model that it only needs to consider cases where tau1 is less than tau2?
The problem is that when tau2 precedes tau1 there is a time symmetry which is computationally unnecessary.
Any help is welcome.
I haven't tested it, but I think you could do something like this:
# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# change point is somewhere in between with equal probabilities
tau2 = pm.DiscreteUniform("tau2", lower=tau1, upper=n_count_data)
That way tau2 is constrained to be at least as large as tau1. You may have to think a little bit about whether tau1 and tau2 should be allowed to coincide.
The full model, under the assumption of a fixed (deterministic) minimum gap of one between the taus, follows:
# the exp parameter expected is the inverse of the average of the sampled series
alpha = 1.0 / count_data.mean()

# regime 1 poisson
lambda_1 = pm.Exponential("lambda_1", alpha)
# regime 2 poisson
lambda_2 = pm.Exponential("lambda_2", alpha)
# regime 3 poisson
lambda_3 = pm.Exponential("lambda_3", alpha)

# change point is somewhere in between with equal probabilities
tau1 = pm.DiscreteUniform("tau1", lower=0, upper=n_count_data)
# tau2 is constrained to come strictly after tau1
tau2 = pm.DiscreteUniform("tau2", lower=tau1 + 1, upper=n_count_data)

@pm.deterministic
def lambda_(tau1=tau1, tau2=tau2, lambda_1=lambda_1, lambda_2=lambda_2, lambda_3=lambda_3):
    out = np.zeros(n_count_data)
    out[:tau1] = lambda_1      # lambda before tau1 is lambda_1
    out[tau1:tau2] = lambda_2  # lambda between the change points is lambda_2
    out[tau2:] = lambda_3      # lambda after (and including) tau2 is lambda_3
    return out

observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, lambda_3, tau1, tau2])

# Markov chain Monte Carlo
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)

lambda_1_samples = mcmc.trace('lambda_1')[:]
lambda_2_samples = mcmc.trace('lambda_2')[:]
lambda_3_samples = mcmc.trace('lambda_3')[:]
tau1_samples = mcmc.trace('tau1')[:]
tau2_samples = mcmc.trace('tau2')[:]
I will also try with a random gap and see how it goes.
If you are open to using R to solve the same inference problem, the mcp package provides a higher-level interface for change point problems. It has order-restricted change point parameters by default.
Here is a model with three intercepts (two change points):
model = list(
    count ~ 1,  # intercept-only segment 1
    ~ 1,        # segment 2 (after the first change point)
    ~ 1         # segment 3 (after the second change point)
)
library(mcp)
fit = mcp(model, data, family = poisson())
More info:
about Poisson models in mcp.
priors in mcp, which contains more about finer control of the order restriction.
