Upper limit for n_background == 0 with pyhf

I am trying to obtain the upper limit for n_observed == 0 and n_background == 0 events with pyhf; in this case I expect to obtain 2.3 (Table 39.3, PDG statistics PDF).
I tried to create a workspace with 1 bin where I specified the 'background' sample with 'data': [0] and 'observations' with 'data': [0], but I am struggling to find the correct way to obtain this 2.3.

Please see the Stack Overflow question "Fit convergence failure in pyhf for small signal model", where we cover that models whose bins have 0 for all entries are not valid.
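For reference, the 2.3 the question expects is the classical Poisson counting result for zero observed events: at 90% CL the upper limit mu_up solves exp(-mu_up) = 0.10. A quick pure-Python check of that number, independent of pyhf:

```python
import math

# For n_obs = 0 and no background, the Poisson probability of
# observing zero events is P(0 | mu) = exp(-mu).
# The 90% CL upper limit is the mu at which that probability
# drops to 10%, i.e. mu_up = -ln(0.10).
mu_up = -math.log(0.10)
print(round(mu_up, 2))  # ≈ 2.3
```

This is only a cross-check of the expected value, not a substitute for building a valid pyhf workspace (which, per the answer above, needs a nonzero background expectation in every bin).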

Related

H2O document question for stopping_tolerance, score_each_iteration, score_tree_interval, etc

I have the following questions that still confuse me after reading the H2O documentation. Can someone provide some explanation?
1. For stopping_tolerance = 0.001, let's use AUC as an example; the current AUC is 0.8. Does that mean the AUC needs to increase to 0.8 + 0.001, or to 0.8 * (1 + 0.1%)?
2. For score_each_iteration, the H2O documentation (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/score_each_iteration.html) just says "iteration". But what exactly is the definition of an "iteration": each tree, each grid search, each k-fold cross-validation, or something else?
3. Can I define score_tree_interval and set score_each_iteration = True at the same time, or can I only use one of them to make the grid search repeatable?
4. Is there any difference between putting 'stopping_metric', 'stopping_tolerance', 'stopping_rounds' in H2OGradientBoostingEstimator vs. in search_criteria of H2OGridSearch? I found that putting them in H2OGradientBoostingEstimator makes the code run much faster when I test it in a Spark environment.
0.001 is the same as 0.1%. For AUC, since bigger is better, you will want to see an increase of at least 0.001 after a specified number of scoring rounds.
You have linked to a portion of the documentation that is specific to the algorithms listed under Available in at the top of the page, so let's stick to answering this question with respect to individual models rather than grid search. If you want to see what is being scored at each iteration, take a look at your model results in Flow or use my_model.plot() (for the Python API). For GBM and DRF an iteration corresponds to ntrees, but since different algorithms iterate over different things, the more generic word "iteration" is used.
Did you test this out? What did you find when you did? Take a look at the scoring-history plot in Flow and notice what happens when you set both score_tree_interval and score_each_iteration = True versus when you only set score_tree_interval. (I would recommend trying to understand these parameters at the individual-model level before you use grid search.)
Yes: in one case you are specifying early stopping as you build an individual model; in the case of grid search you are indicating whether or not to build more models.
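To make the stopping_rounds/stopping_tolerance interaction concrete, here is a toy sketch of moving-window early stopping. This is my own simplification, not H2O's actual implementation, and the function name is hypothetical:

```python
def should_stop(history, stopping_rounds, stopping_tolerance):
    """Stop when the best metric in the last `stopping_rounds` scoring
    events fails to beat the best metric seen before them by at least
    `stopping_tolerance`. Assumes a bigger-is-better metric like AUC."""
    if len(history) <= stopping_rounds:
        return False  # not enough scoring events yet
    recent = history[-stopping_rounds:]
    earlier = history[:-stopping_rounds]
    return max(recent) - max(earlier) < stopping_tolerance
```

For example, with stopping_rounds = 2 and stopping_tolerance = 0.001, the history [0.7, 0.8, 0.8001, 0.8002] would trigger a stop, since the recent best (0.8002) improves on the earlier best (0.8) by only 0.0002.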

Sampling from discrete distribution without replacement where the probabilities change each draw

I have a sequence S = (s1, s2, ..., sk) with probability weights P = (p1, p2, ..., pk) for each site, where the sum of P = 1. The maximum length of S may be around 10^9.
During the simulation a site k is picked and modified after each draw, so the corresponding p_k also changes with each iteration. The expected number of site exchanges is about 50k - 100k per simulation.
Question 1: How would you suggest drawing a site?
I currently implemented the following logic, which seems fine in itself and in line with the literature (see e.g. here):
counter = 0
P_sum = P[0]
random_number = draw_random()  # uniform float in [0, 1)
while P_sum < random_number:
    counter += 1
    P_sum += P[counter]
return counter
When testing the simulation I observed a strong bias that seems to reproduce the random generator's own distribution (see here). Three different generators produce three different results, which is fair enough, but none of them is correct in all states.
Walker's and Knuth's methods with lookup tables seem too time-expensive for me, as the lookup tables would have to be recalculated after every change.
Question 2: How can I reduce the bias coming from the randomness? I currently have three different generators built in (only one is used per simulation), all of which are supposed to be uniformly distributed. I know this is a hard question without seeing a line of the simulation code.
Question 3: Is there a library for this?
As it's not too much code I have no problem writing it on my own, but is there another library for it, ideally not Boost? I ask because this question may be outdated... Not Boost, because I don't want to build in a fourth random generator and pull in such a large dependency.
Question 4: Is there a faster alternative?
I know that this topic has been answered maybe thousands of times before, but none of the answers satisfies me or offers a good alternative; e.g. here seems to have the same problem, but I don't understand which heap is built where and why. In addition, it seems very complicated for such an "easy" thing.
Thank you for your support!
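For reference, the faster alternative Question 4 asks about is usually a Fenwick (binary indexed) tree over the weights: both drawing a site and updating a single weight then cost O(log k) instead of the linear scan above. A minimal sketch (class and method names are my own):

```python
import random

class FenwickSampler:
    """Sample an index proportional to mutable weights in O(log k)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-based prefix-sum tree
        self.weights = list(weights)
        for i, w in enumerate(weights):
            self._add(i, w)

    def _add(self, i, delta):
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self):
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def update(self, i, new_weight):
        """Change the weight of site i after it has been modified."""
        self._add(i, new_weight - self.weights[i])
        self.weights[i] = new_weight

    def sample(self, rng=random):
        """Draw a site index with probability weight[i] / total()."""
        r = rng.random() * self.total()
        mask = 1
        while mask * 2 <= self.n:
            mask *= 2
        pos = 0
        while mask > 0:  # descend the implicit tree
            nxt = pos + mask
            if nxt <= self.n and self.tree[nxt] < r:
                r -= self.tree[nxt]
                pos = nxt
            mask >>= 1
        return pos  # 0-based index
```

Unlike Walker's alias method, nothing has to be rebuilt when one p_k changes; only the O(log k) `update` runs, which fits the 50k-100k exchanges per simulation described above.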

WEKA difference between output of J48 and ID3 algorithm

I have a data set which I am classifying in WEKA using the J48 and ID3 algorithms. The output of the J48 algorithm is:
Correctly Classified Instances 73 92.4051 %
Incorrectly Classified Instances 6 7.5949 %
Kappa statistic 0.8958
Mean absolute error 0.061
Root mean squared error 0.1746
Relative absolute error 16.7504 %
Root relative squared error 40.9571 %
Total Number of Instances 79
and the output using ID3 is:
Correctly Classified Instances 79 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 79
My question is, if J48 is an extension of ID3 and is newer compared to it, how come ID3 is giving a better result than J48?
J48 is based on C4.5, an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on. The result in this case only reflects the kind of data set you used. ID3 can be the right choice when you need a faster/simpler result without taking into account all the additional factors that J48 considers. Take a look at pruning decision trees and deriving rule sets HERE.
There are plenty of resources on the web comparing these results; what matters more is learning to identify in which cases to apply each classifier once we know how each one works.
Decision trees are prone to over-fitting, and in your case the ID3 algorithm is over-fitting the data. This is the inherent problem of decision trees: they keep splitting the data until the resulting sets are pure. This over-fitting problem is fixed in ID3's extension, J48, by pruning.
Another point to cover: you should use k-fold cross-validation to validate your model.
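For reference, the "split until the sets are pure" behaviour described above comes from ID3's information-gain criterion: a split is chosen to maximize the drop in entropy, and a pure set has entropy zero. A minimal pure-Python sketch of that criterion (not WEKA code; function names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction when `labels` is split into the given groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder
```

A perfectly separating split of [0, 0, 1, 1] into [0, 0] and [1, 1] has gain 1.0 bit; ID3 greedily chases such splits all the way down, which on a small data set like the 79 instances above easily yields 100% training accuracy without generalizing.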

Efficient storage of matrix from interview questions

I had several interviews and failed some of them.
There was a question actually asked by three different companies. Not exactly the same, but they share a common structure.
The question is: you have a matrix of 0s and 1s (a useful user profile is represented by "1" and a non-useful one by "0"; or an image with 1 and 0 values). You need to store the image in the system efficiently. What method should you use?
In my opinion they were expecting me to come up with an efficient encoding, so I told them to store the values together with their run lengths.
For example, 00000011100011111
can be stored as 06 13 03 15.
I know there is a similar encoding method in multimedia and information technology (run-length encoding), but I don't think this is what they wanted.
Any ideas?
Thanks!
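The "06 13 03 15" scheme sketched in the question is run-length encoding; a minimal sketch using only the standard library:

```python
from itertools import groupby

def rle(bits):
    """Run-length encode a string of '0'/'1' characters
    into (value, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(bits)]

# The question's example: 6 zeros, 3 ones, 3 zeros, 5 ones.
print(rle("00000011100011111"))
```

Run-length encoding only pays off when the matrix has long homogeneous runs; for dense, noisy bit patterns the bit-packing approach in the answer below the question is usually the better fit.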
It depends on what "efficient" means.
A decent compromise between space and speed is to select a native data type slightly bigger than the number of questions.
I.e., if you want to store 10 different binary values per item/person, use a short (that's what the 16-bit data type is called in Java).
If you store these in an array (short[]), you can very quickly find the item/person by its id, used as a position in the array, and then extract a particular value with bitwise operations: shift a 1 to the position where the interesting bit is stored and AND (&) it with the stored value. If the resulting short is != 0, you know that bit was set.
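The pack-into-a-native-word scheme described above looks like this in Python (using plain ints instead of Java shorts; function names are my own):

```python
def pack_bits(bools):
    """Pack a list of booleans into one integer: bit i holds bools[i]."""
    word = 0
    for i, b in enumerate(bools):
        if b:
            word |= 1 << i  # shift a 1 to position i and set it
    return word

def get_bit(word, i):
    """Test bit i, mirroring the (word & (1 << i)) != 0 idiom."""
    return (word >> i) & 1 == 1
```

With one word per row, looking up person `id` is just an array index plus one shift-and-mask, exactly as the answer describes for `short[]` in Java.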

Some details about adjusting cascaded AdaBoost stage threshold

I have implemented the AdaBoost algorithm and am currently trying to implement the so-called cascaded AdaBoost, based on P. Viola and M. Jones's original paper. Unfortunately I have some doubts about adjusting the threshold for one stage. In the original paper the procedure is described in literally one sentence:
Decrease threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i-1} (this also affects F_i)
I am mainly unsure about the following:
What is the threshold? Is it the value of the 0.5 * sum(alpha) expression, or only the 0.5 factor?
What should the initial value of the threshold be? (0.5?)
What does "decrease threshold" mean in detail? Do I need to iteratively select a new threshold, e.g. 0.5, 0.4, 0.3? What is the step size for decreasing it?
I have tried to search this info in Google, but unfortunately I could not find any useful information.
Thank you for your help.
I had the exact same doubt and have not found any authoritative source so far. However, this is my best guess:
1. The threshold is 0.5 * sum(alpha).
2. The initial value of the threshold is the expression above. Next, try to classify the samples using the intermediate strong classifier (what you currently have). You will get the score each sample attains, and depending on the current value of the threshold, some of the positive samples will be classified as negative, etc. So, depending on the detection rate desired for this stage (strong classifier), reduce the threshold so that that many positive samples get correctly classified,
eg:
say thresh. was 10, and these are the current classifier outputs for positive training samples:
9.5, 10.5, 10.2, 5.4, 6.7
and I want a detection rate of 80% => 80% of the above 5 samples classified correctly => 4 of the above => set the threshold to 6.7.
Clearly, by changing the threshold the FP rate also changes; so update it, and if the desired FP rate for the stage is not reached, add another classifier to that stage.
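The threshold-selection rule in the worked example above (lower the threshold just enough that the desired fraction of positive samples pass) can be sketched as follows; this is only my reading of the procedure, and the function name is hypothetical:

```python
import math

def adjust_threshold(pos_scores, detection_rate):
    """Return the largest threshold such that at least `detection_rate`
    of the positive-sample scores are classified as positive (>= threshold)."""
    need = math.ceil(detection_rate * len(pos_scores))
    # The need-th highest score is the loosest threshold that still
    # lets `need` positives through.
    return sorted(pos_scores, reverse=True)[need - 1]
```

On the example's scores [9.5, 10.5, 10.2, 5.4, 6.7] with an 80% target, this picks 6.7, matching the answer.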
I have not done a formal course on AdaBoost, but this is my observation based on some research papers I tried to implement. Please correct me if something is wrong. Thanks!
I found a Master's thesis on real-time face detection by Karim Ayachi (pdf) in which he describes the Viola-Jones face detection method.
As it is written in Section 5.2 (Creating the Cascade using AdaBoost), we can set the maximal threshold of the strong classifier to sum(alpha) and the minimal threshold to 0 and then find the optimal threshold using binary search (see Table 5.1 for pseudocode).
Hope this helps!
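The binary search between 0 and sum(alpha) mentioned in the answer above exploits the fact that the detection rate is monotone in the threshold; a sketch of that idea (my own wording of the approach, not the thesis pseudocode verbatim):

```python
def find_threshold(pos_scores, target_rate, lo=0.0, hi=None, iters=50):
    """Binary-search a strong-classifier threshold in [lo, hi]
    (hi defaults to the maximal attainable score) so that at least
    `target_rate` of the positive-sample scores pass it."""
    if hi is None:
        hi = max(pos_scores)  # stands in for sum(alpha)
    for _ in range(iters):
        mid = (lo + hi) / 2
        rate = sum(s >= mid for s in pos_scores) / len(pos_scores)
        if rate >= target_rate:
            lo = mid  # detection rate still met: try a higher threshold
        else:
            hi = mid  # too strict: lower the threshold
    return lo
```

On the earlier example ([9.5, 10.5, 10.2, 5.4, 6.7], 80% target) this converges to 6.7, agreeing with the direct sort-based selection.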
