I have this problem: I have a cohort of individuals grouped into 5 age groups. Initially all of them are susceptible; they then develop disease and finally cancer. I have information about the age-group distribution of the susceptibles and of the cancer carriers. Between the susceptible state and cancer they pass through 7 stages, all with the same transition rate.
I'm trying to create a model that simulates each transition as a binomial draw and fits the data I have.
I tried something, but when I analyse the traces nothing works.
You can see the code below.
Where am I going wrong?
Thanks for any help.
from pylab import *
import numpy as np
import pymc
from pymc import *
from pymc.Matplot import plot as plt
#susceptible_data = array([647,1814,8838,9949,1920])
susceptible_data = array([130,398,1415,1303,206])
infected_data_100000 = array([0,197,302,776,927])
infected_data = array([0,7,38,90,17])
prior_values = np.zeros(len(infected_data))
for i in range(0, len(infected_data)):
    prior_values[i] = infected_data[i] / susceptible_data[i]
# stochastic priors
beta1 = Uniform('beta1', 0., 1.)
lambda_0_temp=susceptible_data[0]
lambda_0_1=pymc.Binomial("lambda_0_1",lambda_0_temp,pow(beta1,1))
lambda_0_2=pymc.Binomial("lambda_0_2",lambda_0_1.value,pow(beta1,1))
lambda_0_3=pymc.Binomial("lambda_0_3",lambda_0_2.value,pow(beta1,1))
lambda_0_4=pymc.Binomial("lambda_0_4",lambda_0_3.value,pow(beta1,1))
lambda_0_5=pymc.Binomial("lambda_0_5",lambda_0_4.value,pow(beta1,1))
lambda_0_6=pymc.Binomial("lambda_0_6",lambda_0_5.value,pow(beta1,1))
lambda_0_7=pymc.Binomial("lambda_0_7",n=lambda_0_6.value,p=pow(beta1,1),value=infected_data[0],observed=True)
lambda_1_temp=susceptible_data[1]
lambda_1_1=pymc.Binomial("lambda_1_1",lambda_1_temp,pow(beta1,1))
lambda_1_2=pymc.Binomial("lambda_1_2",lambda_1_1.value,pow(beta1,1))
lambda_1_3=pymc.Binomial("lambda_1_3",lambda_1_2.value,pow(beta1,1))
lambda_1_4=pymc.Binomial("lambda_1_4",lambda_1_3.value,pow(beta1,1))
lambda_1_5=pymc.Binomial("lambda_1_5",lambda_1_4.value,pow(beta1,1))
lambda_1_6=pymc.Binomial("lambda_1_6",lambda_1_5.value,pow(beta1,1))
lambda_1_7=pymc.Binomial("lambda_1_7",n=lambda_1_6.value,p=pow(beta1,1),value=infected_data[1],observed=True)
lambda_2_temp=susceptible_data[2]
lambda_2_1=pymc.Binomial("lambda_2_1",lambda_2_temp,pow(beta1,1))
lambda_2_2=pymc.Binomial("lambda_2_2",lambda_2_1.value,pow(beta1,1))
lambda_2_3=pymc.Binomial("lambda_2_3",lambda_2_2.value,pow(beta1,1))
lambda_2_4=pymc.Binomial("lambda_2_4",lambda_2_3.value,pow(beta1,1))
lambda_2_5=pymc.Binomial("lambda_2_5",lambda_2_4.value,pow(beta1,1))
lambda_2_6=pymc.Binomial("lambda_2_6",lambda_2_5.value,pow(beta1,1))
lambda_2_7=pymc.Binomial("lambda_2_7",n=lambda_2_6.value,p=pow(beta1,1),value=infected_data[2],observed=True)
lambda_3_temp=susceptible_data[3]
lambda_3_1=pymc.Binomial("lambda_3_1",lambda_3_temp,pow(beta1,1))
lambda_3_2=pymc.Binomial("lambda_3_2",lambda_3_1.value,pow(beta1,1))
lambda_3_3=pymc.Binomial("lambda_3_3",lambda_3_2.value,pow(beta1,1))
lambda_3_4=pymc.Binomial("lambda_3_4",lambda_3_3.value,pow(beta1,1))
lambda_3_5=pymc.Binomial("lambda_3_5",lambda_3_4.value,pow(beta1,1))
lambda_3_6=pymc.Binomial("lambda_3_6",lambda_3_5.value,pow(beta1,1))
lambda_3_7=pymc.Binomial("lambda_3_7",n=lambda_3_6.value,p=pow(beta1,1),value=infected_data[3],observed=True)
lambda_4_temp=susceptible_data[4]
lambda_4_1=pymc.Binomial("lambda_4_1",lambda_4_temp,pow(beta1,1))
lambda_4_2=pymc.Binomial("lambda_4_2",lambda_4_1.value,pow(beta1,1))
lambda_4_3=pymc.Binomial("lambda_4_3",lambda_4_2.value,pow(beta1,1))
lambda_4_4=pymc.Binomial("lambda_4_4",lambda_4_3.value,pow(beta1,1))
lambda_4_5=pymc.Binomial("lambda_4_5",lambda_4_4.value,pow(beta1,1))
lambda_4_6=pymc.Binomial("lambda_4_6",lambda_4_5.value,pow(beta1,1))
lambda_4_7=pymc.Binomial("lambda_4_7",n=lambda_4_6.value,p=pow(beta1,1),value=infected_data[4],observed=True)
model = pymc.Model([lambda_0_7, lambda_1_7, lambda_2_7, lambda_3_7, lambda_4_7, beta1])
mcmc = pymc.MCMC(model)
mcmc.sample(iter=100000, burn=50000, thin=10, verbose=1)
lambda_0_samples=mcmc.trace('lambda_0_7')[:]
lambda_1_samples=mcmc.trace('lambda_1_7')[:]
lambda_2_samples=mcmc.trace('lambda_2_7')[:]
lambda_3_samples=mcmc.trace('lambda_3_7')[:]
lambda_4_samples=mcmc.trace('lambda_4_7')[:]
beta1_samples=mcmc.trace('beta1')[:]
What you have implemented above only associates data with the 7th distribution in each set; the others are seemingly redundant hierarchies on the binomial probability. I would think you want the data informing each stage. I'm not sure there is information to inform what the values of p should be at each stage, based on what is provided.
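To make that concrete, here is a minimal sketch (PyMC2, using the data above; an illustration of the chaining, not necessarily the poster's intended model) in which each binomial's n is the upstream stochastic itself rather than its frozen .value, so the observed counts can actually inform beta1:
import numpy as np
import pymc

susceptible_data = np.array([130, 398, 1415, 1303, 206])
infected_data = np.array([0, 7, 38, 90, 17])

# start beta1 high to avoid a zero-probability initial state
# (a low beta1 can shrink the chain below the observed counts)
beta1 = pymc.Uniform('beta1', 0., 1., value=0.9)

observed_nodes = []
for g in range(len(susceptible_data)):
    # chain of 7 binomial thinnings sharing one transition probability
    stage = pymc.Binomial('lambda_%d_1' % g, n=int(susceptible_data[g]), p=beta1)
    for s in range(2, 7):
        stage = pymc.Binomial('lambda_%d_%d' % (g, s), n=stage, p=beta1)
    observed_nodes.append(pymc.Binomial('lambda_%d_7' % g, n=stage, p=beta1,
                                        value=int(infected_data[g]), observed=True))

mcmc = pymc.MCMC([beta1] + observed_nodes)
mcmc.sample(iter=100000, burn=50000, thin=10)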
I'm new to this. I'm trying to fine-tune a BERT MLM (bert-base-uncased) on a target domain. Unfortunately, the results are not good.
Before fine-tuning, the pre-trained model fills the mask of a sentence with words in line with human expectations.
E.g. Wikipedia is a free online [MASK], created and edited by volunteers around the world.
The most probable predictions are encyclopedia (score: 0.650) and resource (score: 0.087).
After fine-tuning, the predictions are completely wrong. Often stopwords are predicted as the result.
E.g. Wikipedia is a free online [MASK], created and edited by volunteers around the world.
The most probable predictions are the (score: 0.052) and be (score: 0.033).
I experimented with different numbers of epochs (from 1 to 10) and different datasets (from a few MB to a few GB) but I get the same issue. What am I doing wrong? I'm using the following code; I hope you can help me.
from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
config = AutoConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
model = AutoModelForMaskedLM.from_config(config) # BertForMaskedLM.from_pretrained(path)
from transformers import LineByLineTextDataset
dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="data/english/corpora.txt",
                                block_size=512)
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(output_dir="output/models/english",
                                  overwrite_output_dir=True,
                                  num_train_epochs=5,
                                  per_gpu_train_batch_size=8,
                                  save_steps=22222222,
                                  save_total_limit=2)
trainer = Trainer(model=model, args=training_args, data_collator=data_collator, train_dataset=dataset)
trainer.train()
trainer.save_model("output/models/english")
from transformers import pipeline
# Initialize MLM pipeline
mlm = pipeline('fill-mask', model="output/models/english", tokenizer="output/models/english")
# Get mask token
mask = mlm.tokenizer.mask_token
# Get result for particular masked phrase
phrase = f'Wikipedia is a free online {mask}, created and edited by volunteers around the world'
result = mlm(phrase)
# Print result
print(result)
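One hedged guess at the culprit, since no answer is included here: AutoModelForMaskedLM.from_config(config) builds a freshly initialized model with random weights, so the run above trains BERT from scratch rather than fine-tuning it. Loading the pre-trained weights instead would look like this sketch:
from transformers import AutoModelForMaskedLM

# from_pretrained() loads the released bert-base-uncased weights, so training
# continues from the pre-trained model instead of a random initialization
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')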
I am training my model to detect normal vs. pneumonia chest X-ray classes. My dataset is set up as listed below:
train_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=train_path, target_size=(224, 224), classes=['NORMAL', 'PNEUMONIA'],
                         batch_size=32, class_mode='categorical')
val_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=val_path, target_size=(224, 224), classes=['NORMAL', 'PNEUMONIA'],
                         batch_size=32, class_mode='categorical')
test_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=test_path, target_size=(224, 224), classes=['NORMAL', 'PNEUMONIA'],
                         batch_size=16, class_mode='categorical', shuffle=False)
Found 3616 images belonging to 2 classes. #training
Found 1616 images belonging to 2 classes. #validation
Found 624 images belonging to 2 classes. #test
My model consists of 5 CNN layers; the input images are (224, 224, 3), with 16 feature maps in the first layer and then 32, 64, 128, and 256. Batch normalization, max pooling, and dropout are added to every CNN layer. The last dense layer is as follows:
model.add(Dense(units=2 , activation='softmax'))
optim = Adam(lr=0.001)
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])
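For reference, the five-layer model described above might look like the following minimal sketch (the filter counts 16, 32, 64, 128, 256 come from the description; kernel size, pool size, and dropout rate are my assumptions):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
for i, filters in enumerate([16, 32, 64, 128, 256]):
    # the first layer declares the (224, 224, 3) input shape
    kwargs = {'input_shape': (224, 224, 3)} if i == 0 else {}
    model.add(Conv2D(filters, (3, 3), activation='relu', **kwargs))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))  # dropout rate is an assumption
model.add(Flatten())
model.add(Dense(units=2, activation='softmax'))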
history = model.fit_generator(train_batch,
                              steps_per_epoch=113,  # 3616/32 = 113
                              epochs=25,
                              validation_data=val_batch,
                              validation_steps=51  # 1616/32 = 51
                              #verbose=2
                              #callbacks=callbacks  # removed to check
                              )
As can be seen in the graph, my training and validation accuracy and loss look good, but when I plot the confusion matrix it does not seem good. Why?
prediction = model.predict_generator(test_batch, steps=39)  # 39 = 624 test images / batch size 16
prediction1 = np.argmax(prediction, axis=1)
cm = confusion_matrix(test_batch.classes, prediction1)
print(cm)
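One quick sanity check worth adding (a sketch, not from the original post): with shuffle=False the test generator yields images in directory order, so row i of prediction must correspond to test_batch.classes[i] for the confusion matrix to be meaningful.
# if the lengths differ, the steps argument did not cover the whole test set
# and the confusion matrix compares misaligned arrays
assert len(prediction1) == len(test_batch.classes), "predictions and labels misaligned"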
This is my confusion matrix, shown below.
And here is my graph, also shown below.
After that I fine-tuned my model with VGG16, replacing the last dense layer with my own dense layer with two outputs; here are the graph and confusion matrix:
I do not understand why my testing is not going well even with the VGG16 model, as you can see from the results, so please give me your valuable suggestions. Thanks!
I'm trying to learn more about pyhf, and my understanding of its goals might be limited. I would love to fit my HEP data outside of ROOT, but I could be imposing expectations on pyhf that are not what the authors intended for its use.
I'd like to write myself a hello-world example, but I might just not know what I'm doing. My misunderstanding could also be due to gaps in my statistical knowledge.
With that preface, let me explain what I'm trying to explore.
I have some observed set of events for which I calculate some observable and make a binned histogram of that data. I hypothesize that there are two contributing physics processes, which I call signal and background. I generate some Monte Carlo samples for these processes, and the theorized total number of events is close to, but not exactly, what I observe.
I would like to:
Fit the data to this two process hypothesis
Get from the fit the optimal values for the number of events for each process
Get the uncertainties on these fitted values
If appropriate, calculate an upper limit on the number of signal events.
My starter code is below; all I'm doing so far is an ML fit, and I'm not sure where to go from there. I know it's not set up to do what I want, but I'm getting lost in the examples I find on RTD. I'm sure it's me; this is not a criticism of the documentation.
import pyhf
import numpy as np
import matplotlib.pyplot as plt
nbins = 15
# Generate a background and signal MC sample
MC_signal_events = np.random.normal(5,1.0,200)
MC_background_events = 10*np.random.random(1000)
signal_data = np.histogram(MC_signal_events,bins=nbins)[0]
bkg_data = np.histogram(MC_background_events,bins=nbins)[0]
# Generate an observed dataset with a slightly different
# number of events
signal_events = np.random.normal(5,1.0,180)
background_events = 10*np.random.random(1050)
observed_events = np.array(signal_events.tolist() + background_events.tolist())
observed_sample = np.histogram(observed_events,bins=nbins)[0]
# Plot these samples, if you like
plt.figure(figsize=(12,4))
plt.subplot(1,3,1)
plt.hist(observed_events,bins=nbins,label='Observations')
plt.legend()
plt.subplot(1,3,2)
plt.hist(MC_signal_events,bins=nbins,label='MC signal')
plt.legend()
plt.subplot(1,3,3)
plt.hist(MC_background_events,bins=nbins,label='MC background')
plt.legend()
# Use a very naive estimate of the background
# uncertainties
bkg_uncerts = np.sqrt(bkg_data)
print("Defining the PDF.......")
pdf = pyhf.simplemodels.hepdata_like(signal_data=signal_data.tolist(),
                                     bkg_data=bkg_data.tolist(),
                                     bkg_uncerts=bkg_uncerts.tolist())
print("Fit.......")
data = pyhf.tensorlib.astensor(observed_sample.tolist() + pdf.config.auxdata)
bestfit_pars, twice_nll = pyhf.infer.mle.fit(data, pdf, return_fitted_val=True)
print(bestfit_pars)
print(twice_nll)
plt.show()
Note: this answer is based on pyhf v0.5.2.
Alright, so it looks like you've managed to figure out most of the big pieces for sure. However, there are two different ways to do this depending on how you prefer to set things up. In both cases, I assume you want an unconstrained fit and you want to...
fit your signal+background model to observed data
fit your background model to observed data
First, let's discuss uncertainties briefly. At the moment, we default to numpy for the tensor backend and scipy for the optimizer. See the documentation:
numpy backend
scipy optimizer
However, one unfortunate drawback right now with the scipy optimizer is that it cannot return the uncertainties. What you need to do anywhere in your code before the fit (although we generally recommend doing it as early as possible) is to switch to the minuit optimizer instead:
pyhf.set_backend('numpy', 'minuit')
This will get you the nice features of being able to get the correlation matrix, the uncertainties on the fitted parameters, and the Hessian, amongst other things. We're working to make this consistent for scipy as well, but it is not ready right now.
All optimizations go through our optimizer API which you can currently view through the mixin here in our documentation. Specifically, the signature is
minimize(
    objective,
    data,
    pdf,
    init_pars,
    par_bounds,
    fixed_vals=None,
    return_fitted_val=False,
    return_result_obj=False,
    do_grad=None,
    do_stitch=False,
    **kwargs,
)
There are a lot of options here. Let's just focus on the fact that one of the keyword arguments we can pass through is return_uncertainties, which will change the bestfit parameters by adding a column for the fitted parameter uncertainties, which is what you want.
1. Signal+Background
In this case, we want to just use the default model
result, twice_nll = pyhf.infer.mle.fit(
    data,
    pdf,
    return_uncertainties=True,
    return_fitted_val=True,
)
bestfit_pars, errors = result.T
2. Background-Only
In this case, we need to turn off the signal. The way we do this is by fixing the parameter of interest (POI) to 0.0. Then we can get the fitted parameters for the background-only model in a similar way, using fixed_poi_fit instead of an unconstrained fit:
result, twice_nll = pyhf.infer.mle.fixed_poi_fit(
    0.0,
    data,
    pdf,
    return_uncertainties=True,
    return_fitted_val=True,
)
bestfit_pars, errors = result.T
Note that this is quite simply a quick way of doing the following fit with the POI fixed by hand:
bkg_params = pdf.config.suggested_init()
fixed_params = pdf.config.suggested_fixed()
bkg_params[pdf.config.poi_index] = 0.0
fixed_params[pdf.config.poi_index] = True
result, twice_nll = pyhf.infer.mle.fit(
    data,
    pdf,
    init_pars=bkg_params,
    fixed_params=fixed_params,
    return_uncertainties=True,
    return_fitted_val=True,
)
bestfit_pars, errors = result.T
Hopefully that clears things up more!
Giordon's solution should answer all of your questions, but I thought I'd also write out the code to basically address everything we can.
I also take the liberty of changing some of your values a bit so that the signal isn't so strong and the observed CLs value isn't far off to the right of the Brazil band (the results aren't wrong otherwise, obviously, but it probably makes more sense to be talking about using the discovery test statistic at that point than setting limits. :))
Environment
For this example I'm going to set up a clean Python 3 virtual environment and then install the dependencies (here we're going to be using pyhf v0.5.2):
$ python3 -m venv "${HOME}/.venvs/question"
$ . "${HOME}/.venvs/question/bin/activate"
(question) $ cat requirements.txt
pyhf[minuit,contrib]~=0.5.2
black
(question) $ python -m pip install -r requirements.txt
Code
While we can't easily get the best fit values for both the number of signal events and the number of background events, we definitely can do inference to get the best fit value for the signal strength.
The following chunk of code (which is long only because of the visualization) should address all of the points of your question.
# answer.py
import numpy as np
import pyhf
import matplotlib.pyplot as plt
import pyhf.contrib.viz.brazil
# Goals:
# - Fit the model to the observed data
# - Infer the best fit signal strength given the model
# - Get the uncertainties on the best fit signal strength
# - Calculate a 95% CL upper limit on the signal strength
def plot_hist(ax, bins, data, bottom=0, color=None, label=None):
    bin_width = bins[1] - bins[0]
    bin_leftedges = bins[:-1]
    bin_centers = [edge + bin_width / 2.0 for edge in bin_leftedges]
    ax.bar(
        bin_centers, data, bin_width, bottom=bottom, alpha=0.5, color=color, label=label
    )


def plot_data(ax, bins, data, label="Data"):
    bin_width = bins[1] - bins[0]
    bin_leftedges = bins[:-1]
    bin_centers = [edge + bin_width / 2.0 for edge in bin_leftedges]
    ax.scatter(bin_centers, data, color="black", label=label)


def invert_interval(test_mus, hypo_tests, test_size=0.05):
    # This will be taken care of in v0.5.3
    cls_obs = np.array([test[0] for test in hypo_tests]).flatten()
    cls_exp = [
        np.array([test[1][idx] for test in hypo_tests]).flatten() for idx in range(5)
    ]
    crossing_test_stats = {"exp": [], "obs": None}
    for cls_exp_sigma in cls_exp:
        crossing_test_stats["exp"].append(
            np.interp(
                test_size, list(reversed(cls_exp_sigma)), list(reversed(test_mus))
            )
        )
    crossing_test_stats["obs"] = np.interp(
        test_size, list(reversed(cls_obs)), list(reversed(test_mus))
    )
    return crossing_test_stats


def main():
    np.random.seed(0)
    pyhf.set_backend("numpy", "minuit")

    observable_range = [0.0, 10.0]
    bin_width = 0.5
    _bins = np.arange(observable_range[0], observable_range[1] + bin_width, bin_width)

    n_bkg = 2000
    n_signal = int(np.sqrt(n_bkg))

    # Generate simulation
    bkg_simulation = 10 * np.random.random(n_bkg)
    signal_simulation = np.random.normal(5, 1.0, n_signal)

    bkg_sample, _ = np.histogram(bkg_simulation, bins=_bins)
    signal_sample, _ = np.histogram(signal_simulation, bins=_bins)

    # Generate observations
    signal_events = np.random.normal(5, 1.0, int(n_signal * 0.8))
    bkg_events = 10 * np.random.random(int(n_bkg + np.sqrt(n_bkg)))

    observed_events = np.array(signal_events.tolist() + bkg_events.tolist())
    observed_sample, _ = np.histogram(observed_events, bins=_bins)

    # Visualize the simulation and observations
    fig, ax = plt.subplots()
    fig.set_size_inches(7, 5)

    plot_hist(ax, _bins, bkg_sample, label="Background")
    plot_hist(ax, _bins, signal_sample, bottom=bkg_sample, label="Signal")
    plot_data(ax, _bins, observed_sample)
    ax.legend(loc="best")
    ax.set_ylim(top=np.max(observed_sample) * 1.4)
    ax.set_xlabel("Observable")
    ax.set_ylabel("Count")
    fig.savefig("components.png")

    # Build the model
    bkg_uncerts = np.sqrt(bkg_sample)
    model = pyhf.simplemodels.hepdata_like(
        signal_data=signal_sample.tolist(),
        bkg_data=bkg_sample.tolist(),
        bkg_uncerts=bkg_uncerts.tolist(),
    )
    data = pyhf.tensorlib.astensor(observed_sample.tolist() + model.config.auxdata)

    # Perform inference
    fit_result = pyhf.infer.mle.fit(data, model, return_uncertainties=True)
    bestfit_pars, par_uncerts = fit_result.T
    print(
        f"best fit parameters:\
\n * signal strength: {bestfit_pars[0]} +/- {par_uncerts[0]}\
\n * nuisance parameters: {bestfit_pars[1:]}\
\n * nuisance parameter uncertainties: {par_uncerts[1:]}"
    )

    # Perform hypothesis test scan
    _start = 0.0
    _stop = 5
    _step = 0.1
    poi_tests = np.arange(_start, _stop + _step, _step)

    print("\nPerforming hypothesis tests\n")
    hypo_tests = [
        pyhf.infer.hypotest(
            mu_test,
            data,
            model,
            return_expected_set=True,
            return_test_statistics=True,
            qtilde=True,
        )
        for mu_test in poi_tests
    ]

    # Upper limits on signal strength
    results = invert_interval(poi_tests, hypo_tests)

    print(f"Observed Limit on µ: {results['obs']:.2f}")
    print("-----")
    for idx, n_sigma in enumerate(np.arange(-2, 3)):
        print(
            "Expected {}Limit on µ: {:.3f}".format(
                " " if n_sigma == 0 else "({} σ) ".format(n_sigma),
                results["exp"][idx],
            )
        )

    # Visualize the "Brazil band"
    fig, ax = plt.subplots()
    fig.set_size_inches(7, 5)

    ax.set_title("Hypothesis Tests")
    ax.set_ylabel(r"$\mathrm{CL}_{s}$")
    ax.set_xlabel(r"$\mu$")

    pyhf.contrib.viz.brazil.plot_results(ax, poi_tests, hypo_tests)
    fig.savefig("brazil_band.png")


if __name__ == "__main__":
    main()
which when run gives
(question) $ python answer.py
best fit parameters:
* signal strength: 1.5884737977889158 +/- 0.7803435235862329
* nuisance parameters: [0.99020988 1.06040191 0.90488207 1.03531383 1.09093327 1.00942088
1.07789316 1.01125627 1.06202964 0.95780043 0.94990993 1.04893286
1.0560711 0.9758487 0.93692481 1.04683181 1.05785515 0.92381263
0.93812855 0.96751869]
* nuisance parameter uncertainties: [0.06966439 0.07632218 0.0611428 0.07230328 0.07872258 0.06899675
0.07472849 0.07403246 0.07613661 0.08606657 0.08002775 0.08655314
0.07564512 0.07308117 0.06743479 0.07383134 0.07460864 0.06632003
0.06683251 0.06270965]
Performing hypothesis tests
/home/stackoverflow/.venvs/question/lib/python3.7/site-packages/pyhf/infer/calculators.py:229: RuntimeWarning: invalid value encountered in double_scalars
teststat = (qmu - qmu_A) / (2 * self.sqrtqmuA_v)
Observed Limit on µ: 2.89
-----
Expected (-2 σ) Limit on µ: 0.829
Expected (-1 σ) Limit on µ: 1.110
Expected Limit on µ: 1.542
Expected (1 σ) Limit on µ: 2.147
Expected (2 σ) Limit on µ: 2.882
Let us know if you have any further questions!
I am generating random healpix maps from an input angular power spectrum Cl. If I use healpy.synalm, then healpy.alm2map, and finally test the map by running healpy.anafast on the generated map, the output and input power spectra do not agree, especially at high l, where the output power spectrum lies above the input (see the plot produced by the code below). If I directly use healpy.synfast, I get an output power spectrum that agrees with the input. The same applies if I take the alms from healpy.synfast and generate a map from them using healpy.alm2map. When I look into the source code of synfast, it seems to just call synalm and alm2map, so I don't understand why their results disagree. My test code looks like this:
import numpy as np
import matplotlib.pyplot as plt
import classy
import healpy as hp
NSIDE = 32
A_s=2.3e-9
n_s=0.9624
h=0.6711
omega_b=0.022068
omega_cdm=0.12029
params = {
    'output': 'dCl, mPk',
    'A_s': A_s,
    'n_s': n_s,
    'h': h,
    'omega_b': omega_b,
    'omega_cdm': omega_cdm,
    'selection_mean': '0.55',
    'selection_magnification_bias_analytic': 'yes',
    'selection_bias_analytic': 'yes',
    'selection_dNdz_evolution_analytic': 'yes'}
cosmo = classy.Class()
cosmo.set(params)
cosmo.compute()
theory_cl=cosmo.density_cl()['dd']
data_map,data_alm=hp.synfast(theory_cl[0],NSIDE,alm=True)
data_cl=hp.anafast(data_map)
plt.plot(np.arange(len(data_cl)),data_cl,label="synfast")
data_map=hp.alm2map(data_alm,NSIDE)
data_cl=hp.anafast(data_map)
plt.plot(np.arange(len(data_cl)),data_cl,label="synfast using alm")
data_alm=hp.synalm(theory_cl[0])
data_map=hp.alm2map(data_alm,NSIDE)
data_cl=hp.anafast(data_map)
plt.plot(np.arange(len(data_cl)),data_cl,label="synalm")
plt.plot(np.arange(min(len(data_cl),len(theory_cl[0]))),theory_cl[0][:len(data_cl)],label="Theory")
plt.xlabel(r'$\ell$')
plt.ylabel(r'$C_\ell$')
plt.legend()
plt.show()
The offset becomes larger for lower NSIDE.
Thank you very much for your help.
Sorry, I missed that synfast knows about NSIDE, so its lmax is by default based on NSIDE, whereas synalm does not know about it, so it takes the maximum l of the input spectrum as lmax. Setting lmax to 3 * NSIDE - 1 in synalm resolves the discrepancy.
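For reference, a minimal sketch of that fix applied to the test code above (healpy's synfast uses lmax = 3 * NSIDE - 1 by default):
lmax = 3 * NSIDE - 1
data_alm = hp.synalm(theory_cl[0], lmax=lmax)  # now synalm and synfast agree on lmax
data_map = hp.alm2map(data_alm, NSIDE)
data_cl = hp.anafast(data_map)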
I have set up a Bayes net with 3 states per node as below, and I can read logps for particular states from it (as in the code).
Next I would like to sample from it. In the code below, sampling runs, but I don't see distributions over the three states in the outputs; rather, I see a mean and a variance as if they were continuous nodes. How do I get the posterior over the three states?
import numpy as np
import pymc3 as mc
import pylab, math
model = mc.Model()

with model:
    rain = mc.Categorical('rain', p=np.array([0.5, 0., 0.5]))
    sprinkler = mc.Categorical('sprinkler', p=np.array([0.33, 0.33, 0.34]))
    CPT = mc.math.constant(np.array([[[.1, .2, .7], [.2, .2, .6], [.3, .3, .4]],
                                     [[.8, .1, .1], [.3, .4, .3], [.1, .1, .8]],
                                     [[.6, .2, .2], [.4, .4, .2], [.2, .2, .6]]]))
    p_wetgrass = CPT[rain, sprinkler]
    wetgrass = mc.Categorical('wetgrass', p_wetgrass)

    # brute force search (not working)
    for val_rain in range(0, 3):
        for val_sprinkler in range(0, 3):
            for val_wetgrass in range(0, 3):
                lik = model.logp(rain=val_rain, sprinkler=val_sprinkler, wetgrass=val_wetgrass)
                print([val_rain, val_sprinkler, val_wetgrass, lik])

    # sampling (runs but don't understand output)
    if 1:
        niter = 10000  # 10000
        tune = 5000  # 5000
        print("SAMPLING:")
        #trace = mc.sample(20000, step=[mc.BinaryGibbsMetropolis([rain, sprinkler])], tune=tune, random_seed=124)
        trace = mc.sample(20000, tune=tune, random_seed=124)
        print("trace summary")
        mc.summary(trace)
Answering my own question: the trace does contain the discrete values, but the mc.summary(trace) function is set up to compute continuous mean and variance stats. To make a histogram of the discrete states, use h = hist(trace.get_values(sprinkler)) :-)
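As a small sketch of that approach, the same trace values can also be turned into posterior state probabilities directly (assuming the trace from the sampling run above):
samples = trace.get_values('sprinkler')
# fraction of samples in each of the three states approximates P(sprinkler = k | data)
posterior = np.bincount(samples, minlength=3) / float(len(samples))
print(posterior)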