I am trying to port a model from Infer.NET, and I am struggling with one thing: how can I make a Deterministic variable observed in pymc3?
M,L ~ Bernoulli
# doesn't work ...
Deterministic("U %i" % i, switch(M[i], ~L[i], L[i]), observed=True)
It's not quite clear what you are trying to model (you are more likely to get replies with a complete description of the problem and an attempt at code), but in pymc3 you pass data via the 'observed' argument to specify the likelihood function. For example, if you want to estimate the probability of success for a Bernoulli-distributed random variable, the likelihood for the model would be
likelihood = pm.Bernoulli('likelihood', prior_p_success, observed=data)
where prior_p_success is the prior probability of success and data is a vector of your binary data.
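For illustration, here is a minimal self-contained sketch of that pattern; the Beta prior, the model context, and the toy data vector are my own assumptions, not something from the question:
import pymc3 as pm

# Hypothetical binary outcomes; replace with your own data vector
data = [0, 1, 1, 0, 1, 1, 1, 0]

with pm.Model() as model:
    # Prior on the probability of success
    prior_p_success = pm.Beta('prior_p_success', alpha=1.0, beta=1.0)
    # Observed data enters through the 'observed' argument of the likelihood
    likelihood = pm.Bernoulli('likelihood', p=prior_p_success, observed=data)
    trace = pm.sample(1000)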
I have noticed that my gensim Doc2Vec (DBOW) model is sensitive to document tags. My understanding was that these tags are cosmetic and so they should not influence the learned embeddings. Am I misunderstanding something? Here is a minimal example:
from gensim.test.utils import common_texts
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import numpy as np
import os
os.environ['PYTHONHASHSEED'] = '0'
reps = []
for a in [0,500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    reps.append(np.array([model.docvecs[k] for k in range(len(common_texts))]))
reps[0].sum() == reps[1].sum()
This last line returns False. I am working with gensim 3.8.3 and Python 3.5.2. More generally, is there any role that the values of the tags play (assuming they are unique)? I ask because I have found that using different tags for documents in a classification task leads to widely varying performance.
Thanks in advance.
First & foremost, your test isn't even comparing vectors corresponding to the same texts!
In run #1, the vector for the 1st text is in model.docvecs[0]. In run #2, the vector for the 1st text is in model.docvecs[500].
And, in run #2, the vector at model.docvecs[0] is just a randomly-initialized, but never-trained, vector - because none of the training texts had a document tag of (int) 0. (If using pure ints as the doc-tags, Doc2Vec uses them as literal indexes - potentially leaving any unused slots less than your highest tag allocated-and-initialized, but never-trained.)
Since common_texts only has 11 entries, in run #2 all 11 vectors you collect into reps (model.docvecs[0] through model.docvecs[10]) are untrained garbage, uncorrelated with any of your texts.
However, even after correcting that:
As explained in the Gensim FAQ answer #11, determinism in this algorithm shouldn't generally be expected, given many sources of potential randomness, and the fuzzy/approximate nature of the whole approach. If you're relying on it, or testing for it, you're probably making some unwarranted assumptions.
In general, tests of these algorithms should be evaluating "roughly equivalent usefulness in comparative uses" rather than "identical (or even similar) specific vectors". For example, a test of whether apple and orange are roughly at the same positions in each other's nearest-neighbor rankings makes more sense than checking their (somewhat arbitrary) exact vector positions or even cosine-similarity.
Additionally:
tiny toy datasets like common_texts won't show the algorithm's usual behavior/benefits
PYTHONHASHSEED is only consulted by the Python interpreter at startup; setting it from Python can't have any effect. But also, the kind of indeterminism it introduces only comes up with separate interpreter launches: a tight loop within a single interpreter run like this wouldn't be affected by that in any case.
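For example, a corrected version of the collection loop from the question (same parameters, just indexing by the actual tag) would look roughly like this:
reps = []
for a in [0, 500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    # Look up each text's vector by its actual tag (k + a), so run #2 reads
    # the trained slots 500..510 instead of the never-trained slots 0..10.
    reps.append(np.array([model.docvecs[k + a] for k in range(len(common_texts))]))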
Have you checked the magnitude of the differences?
Just running:
delta = reps[0].sum() - reps[1].sum()
gives an aggregate difference of -1.2598932e-05 when I run it.
Comparing dimension-wise:
diff = reps[0] - reps[1]
eps = 10**-4
over = (np.abs(diff) <= eps).all()
This returns True on the vast majority of runs, which means that you are getting quite reproducible results given the complexity of the calculations.
I would blame the numerical stability of the calculations or uncontrolled randomness. Even though you do try to control the random seed, NumPy has its own random seed and the random standard library has another, so you are not controlling all of the sources of randomness. This can also have an influence on the results, but I did not check the actual implementation in gensim and its dependencies.
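If you want to rule those out, a quick sketch is to seed both explicitly at the top of the script (whether gensim actually consults either of them is not something I have verified):
import random
import numpy as np

random.seed(0)      # standard-library RNG
np.random.seed(0)   # NumPy RNG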
Change
import os
os.environ['PYTHONHASHSEED'] = '0'
to
import os
import sys
hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
    # PYTHONHASHSEED is only read at interpreter startup, so set it and
    # re-exec the current script; the relaunched process sees the variable.
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)
So, I have a model of a tube with pressure loss, where the unknown is the mass flow rate. Normally, as in most models of this problem, the conservation equations are used to calculate the mass flow rate, but such models have lots of convergence issues (because of the blocked flow at the end of the tube, which results in an infinite pressure derivative at the end). See the figure below: on the left, a representation of the problem; on the right, a graph showing the infinite pressure derivative.
Because of that I'm using a model which is more robust, though it outputs not the mass flow rate but the tube length, which is known. Therefore an iterative loop is needed to determine the mass flow rate. So I coded a function length that, given the tube geometry, mass flow rate and boundary conditions, outputs the calculated tube length, and wrote the equations like so:
parameter Real L;
Real m_flow;
...
equation
L = length(geometry, boundary, m_flow);
It simulates fine, but it takes ages... And it shouldn't, because the mass flow rate is rather insensitive to the tube length, e.g. if L=3 I could say that m_flow has converged if the output of length is within L ± 0.1. On the other hand, the default convergence tolerance of DASSL in Dymola is 0.0001, which is fine for all the other variables but a major setback for my model here...
That being said, I'd like to know if there's a (hacky) way of setting a specific tolerance for L (from annotations or something). I was unable to find any solution online or in Dymola's user manual... So far I managed a workaround by making a second function which uses a Newton-Raphson method to determine the mass flow rate, something like:
function massflowrate
  input geometry, boundary, m_flow_start, tolerance;
  output m_flow;
protected
  Real error, L, dL, dLdm_flow, Delta_m_flow;
algorithm
  error := geometry.L;
  m_flow := m_flow_start;
  while error > tolerance loop
    L := length(geometry, boundary, m_flow);
    error := abs(geometry.L - L);
    // finite-difference estimate of the derivative of length w.r.t. m_flow
    dL := length(geometry, boundary, m_flow*1.001) - L;
    dLdm_flow := dL/(0.001*m_flow);
    // Newton-Raphson update of the mass flow rate
    Delta_m_flow := (geometry.L - L)/dLdm_flow;
    m_flow := m_flow + Delta_m_flow;
  end while;
end massflowrate;
And then I use it in the equations section:
parameter Real L;
Real m_flow;
...
equation
m_flow = massflowrate(geometry, boundary, delay(m_flow,10), tolerance);
Nevertheless, this solution is not without its problems: the real equations are very non-linear and, depending on the boundary conditions, the solver reaches a never-ending loop... =/
PS: I'm sorry for the long post and the lack of an MWE; the real equations are very long and full of thermodynamics which I believe would not be of any help. Be that as it may, if necessary, I'm able to provide the real model.
Is the length function smooth? To me, it being non-smooth seems like a likely cause of problems, and the suggestions by @Phil might also be good ideas.
However, it should also be possible to do what you want as follows:
Real m_flow(nominal=1e9);
Explanation: The equations are normally solved to a certain tolerance in unknowns - in this case m_flow.
The tolerance for each variable is a relative/absolute tolerance taking into account the nominal value, and Dymola does not allow you to set different tolerances for different variables.
Thus the simple way to compute m_flow less accurately is by setting a high nominal value for it, since the error tolerance will be tol*(abs(m_flow)+abs(nominal(m_flow))) or something like that.
The downside is that it may be too inaccurate, e.g. causing additional events, or that the error is so random that the solver is still slowed down.
I have an equation-of-motion function file which I feed into ode45. Necessarily, the output variable of the function file is ydot.
Within my equation of motion function file, I calculate many objects from the state vector, y, to prescribe forces.
After ode45 is finished, I would like access to these objects at every time step so that I can calculate an energy.
Instead of recalculating them over every time step, it would be faster to just pull them from the Runge-Kutta process when they are calculated as intermediate steps anyway.
Is it possible to do this?
There is no guarantee that the ODE function for the right side is even called at the output points as they are usually interpolated from the points computed by the adaptive step size algorithm.
One trick I have often seen, but would need to search for references, is to have the function return all the values you will need and cut the return list down to the derivative in the ode45 call. Modulo appropriate syntax:
function [ydot, extra] = odefunc(t,y,params)
and then use
sol = ode45(@(t,y) odefunc(t,y,params)(1), ...)
and then run odefunc on the points in sol to extract the extra information.
Perhaps that idea of selecting the output only works in Python. Then define an explicit wrapper
function ydot = odewrapper(t,y)
    [ydot,~] = odefunc(t,y,params);
end
that you then normally call in ode45.
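For what it's worth, the select-one-output idea does work directly in Python; here is a minimal sketch with scipy's solve_ivp (the function, parameter names, and the toy dynamics are illustrative, not from the question):
import numpy as np
from scipy.integrate import solve_ivp

def odefunc(t, y, k):
    force = -k * y            # an intermediate quantity worth keeping
    ydot = force
    return ydot, force

# Pass only the derivative to the solver ...
sol = solve_ivp(lambda t, y: odefunc(t, y, 2.0)[0], (0.0, 5.0), [1.0])

# ... then re-evaluate odefunc at the reported points to recover the extras
extras = [odefunc(t, y, 2.0)[1] for t, y in zip(sol.t, sol.y.T)]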
I am trying to use the following code to check the validation accuracy every 100 iterations; however, the validation accuracy is not changing (the network is fine).
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_:batch[1], keep_prob:1.0})
            print('step %d, training accuracy %g' %(i, train_accuracy))
            validation_accuracy = accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels, keep_prob:0.0})
            print('step %d, validation accuracy %g' %(i, validation_accuracy))
        train_step.run(feed_dict={x:batch[0], y_:batch[1], keep_prob:0.5})
Since you haven't added your network implementation, my answer will be an educated guess.
TL;DR: You should use keep_prob:1.0 instead of keep_prob:0.0 in your validation step.
By the appearance of keep_prob, I deduce that your network is using dropout. By using feed_dict={x:mnist.test.images, y_:mnist.test.labels, keep_prob:0.0}, you are feeding a 0.0 probability of keeping an activation, which is equivalent to a 1.0 probability of dropping it. The result is that when you are performing validation, you are basically ignoring the input to the network and all hidden layers. This has the effect that the last layer gives you the same output values for all classes of MNIST (this may be only approximately true, depending on the specific implementation), so the accuracy is constant.
Dropout is a method for regularization, which drops neurons during training steps, thus improving the generalization ability of the network. When you are not training (such as during a validation step), you want to keep all neurons. Thus, what you probably want to do is feed the value 1.0 instead.
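Concretely, with the variable names from your snippet, the validation step would become something like:
# Validation: keep all activations (no dropout) by feeding keep_prob = 1.0
validation_accuracy = accuracy.eval(feed_dict={
    x: mnist.test.images,
    y_: mnist.test.labels,
    keep_prob: 1.0})
print('step %d, validation accuracy %g' % (i, validation_accuracy))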
My model has three parameters, say theta_1, theta_2 and nu.
I want to sample theta_1, theta_2 from the posterior with nu marginalized out (which can be done analytically), i.e. from p(theta_1, theta_2 | D) instead of p(theta_1, theta_2, nu | D), where D is the data. After that, I want to resample nu based on the new values of theta_1 and theta_2. So one sampling scan would consist of the steps:
1. Draw theta_1 and theta_2 from p(theta_1, theta_2 | D) (with nu marginalized out)
2. Draw nu from p(nu | theta_1, theta_2, D)
In other words, a collapsed Gibbs sampler.
How would I go about that with PyMC3? I reckon I should implement an individual step function, but I'm not sure how to construct the likelihood here. How do I get access to the model specification when implementing a step function in PyMC3?
The notions of step methods and likelihoods are somewhat conflated in the question, but I see what you are driving at. Step methods are typically independent of the likelihood, which is passed to the step method as an argument. For example, check out the slice sampler step method in PyMC 3. Likelihoods are stochastic objects that return log-likelihood values conditional on the values of their parents in the directed acyclic graph.
If you are doing Gibbs sampling, you are not typically concerned with evaluating likelihoods because you are iteratively sampling directly from the conditionals of the model parameters. We do not currently have Gibbs in PyMC 3, but there is some rudimentary Gibbs support in PyMC 2. It's a little troublesome to implement generally because it involves recognizing conjugate associations in the model. Moreover, in PyMC 3 you have access to gradient-based samplers (Hamiltonian), which are much more efficient than Gibbs, so there are a few reasons you may not want to implement Gibbs.
That said, PyMC offers a tremendous amount of flexibility for implementing custom step methods and likelihoods. So long as the step (astep) function returns a new point, you can pretty much do what you like otherwise. There's no guarantee that it will be a good sampler, though.
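As a rough illustration of the custom-step route (not the author's code; draw_marginal is a hypothetical stand-in for your analytic marginal draw, and the exact ArrayStep plumbing may differ between PyMC 3 versions):
import numpy as np
import pymc3 as pm
from pymc3.step_methods.arraystep import ArrayStep

class CollapsedStep(ArrayStep):
    """Sketch of a collapsed step for (theta_1, theta_2).

    draw_marginal is a hypothetical user-supplied function returning a draw
    of (theta_1, theta_2) from p(theta_1, theta_2 | D) with nu integrated
    out, as a flat array matching the order of `vars`.
    """
    def __init__(self, vars, draw_marginal, model=None):
        model = pm.modelcontext(model)
        self.draw_marginal = draw_marginal
        # ArrayStep maps each function in this list and passes it to astep()
        super().__init__(vars, [model.fastlogp])

    def astep(self, q0, logp):
        # Ignore the current point and sample directly from the marginal
        return np.asarray(self.draw_marginal())
You would then alternate this step with an ordinary sampler for nu, e.g. pm.sample(step=[CollapsedStep([theta_1, theta_2], my_draw), pm.Metropolis([nu])]), where my_draw is your own analytic sampling routine.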