Not getting the given answer while computing std deviation of a binomial distribution - probability

I am trying to find the mean and standard deviation of a binomial distribution.
A basketball player has the following probabilities of success in two-shot free throws:
P(0) = 0.16,
P(1) = 0.48,
P(2) = 0.36.
I need to find the mean and standard deviation.
I get the mean correctly as 1.2, but I am not able to get the given answer of 0.69 for the standard deviation. Requesting guidance.

Standard deviation equation:
std_dev = sqrt(sum((x_i - mean) ^ 2 * p_i))
So, your example:
std_dev = sqrt((0 - 1.2)^2 * 0.16 + (1 - 1.2)^2 * 0.48 + (2 - 1.2)^2 * 0.36)
= sqrt(1.44 * 0.16 + 0.04 * 0.48 + 0.64 * 0.36)
= sqrt(0.2304 + 0.0192 + 0.2304)
= sqrt(0.48)
~= 0.69282
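A quick check of the same computation in Python (a minimal sketch, not part of the original answer):

```python
import math

# Distribution of the number of successful shots
dist = {0: 0.16, 1: 0.48, 2: 0.36}

mean = sum(x * p for x, p in dist.items())                    # ~1.2
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # ~0.48
std_dev = math.sqrt(variance)

print(mean, std_dev)  # ~1.2 ~0.6928
```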

Why does coxph() combined with cluster() give much smaller standard errors than other methods to adjust for clustering (e.g. coxme() or frailty())?

I am working on a dataset to test the association between empirical antibiotics (variable emp, the antibiotics are cefuroxime or ceftriaxone compared with a reference antibiotic) and 30-day mortality (variable mort30). The data comes from patients admitted in 6 hospitals (variable site2) with a specific type of infection. Therefore, I would like to adjust for this clustering of patients on hospital level.
First I did this using the coxme() function for mixed models. However, based on visual inspection of the Schoenfeld residuals there were violations of the proportional hazards assumption and I tried adding a time transformation (tt) to the model. Unfortunately, the coxme() does not offer the possibility for time transformations.
Therefore, I tried other options to adjust for the clustering, including coxph() combined with frailty() and with cluster(). Surprisingly, the standard errors I get using the cluster() option are much smaller than those from coxme() or frailty().
**Does anyone know what the explanation for this is, and which option would provide the most reliable estimates?**
1) Using coxme:
> uni.mort <- coxme(Surv(FUdur30, mort30num) ~ emp + (1 | site2), data = total.pop)
> summary(uni.mort)
Cox mixed-effects model fit by maximum likelihood
Data: total.pop
events, n = 58, 253
Iterations= 24 147
NULL Integrated Fitted
Log-likelihood -313.8427 -307.6543 -305.8967
Chisq df p AIC BIC
Integrated loglik 12.38 3.00 0.0061976 6.38 0.20
Penalized loglik 15.89 3.56 0.0021127 8.77 1.43
Model: Surv(FUdur30, mort30num) ~ emp + (1 | site2)
Fixed coefficients
coef exp(coef) se(coef) z p
empCefuroxime 0.5879058 1.800214 0.6070631 0.97 0.33
empCeftriaxone 1.3422317 3.827576 0.5231278 2.57 0.01
Random effects
Group Variable Std Dev Variance
site2 Intercept 0.2194737 0.0481687
> confint(uni.mort)
2.5 % 97.5 %
empCefuroxime -0.6019160 1.777728
empCeftriaxone 0.3169202 2.367543
2) Using frailty()
uni.mort <- coxph(Surv(FUdur30, mort30num) ~ emp + frailty(site2), data = total.pop)
> summary(uni.mort)
Call:
coxph(formula = Surv(FUdur30, mort30num) ~ emp + frailty(site2),
data = total.pop)
n= 253, number of events= 58
coef se(coef) se2 Chisq DF p
empCefuroxime 0.6302 0.6023 0.6010 1.09 1.0 0.3000
empCeftriaxone 1.3559 0.5221 0.5219 6.75 1.0 0.0094
frailty(site2) 0.40 0.3 0.2900
exp(coef) exp(-coef) lower .95 upper .95
empCefuroxime 1.878 0.5325 0.5768 6.114
empCeftriaxone 3.880 0.2577 1.3947 10.796
Iterations: 7 outer, 27 Newton-Raphson
Variance of random effect= 0.006858179 I-likelihood = -307.8
Degrees of freedom for terms= 2.0 0.3
Concordance= 0.655 (se = 0.035 )
Likelihood ratio test= 12.87 on 2.29 df, p=0.002
3) Using cluster()
uni.mort <- coxph(Surv(FUdur30, mort30num) ~ emp, cluster = site2, data = total.pop)
> summary(uni.mort)
Call:
coxph(formula = Surv(FUdur30, mort30num) ~ emp, data = total.pop,
cluster = site2)
n= 253, number of events= 58
coef exp(coef) se(coef) robust se z Pr(>|z|)
empCefuroxime 0.6405 1.8975 0.6009 0.3041 2.106 0.035209 *
empCeftriaxone 1.3594 3.8937 0.5218 0.3545 3.834 0.000126 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
empCefuroxime 1.897 0.5270 1.045 3.444
empCeftriaxone 3.894 0.2568 1.944 7.801
Concordance= 0.608 (se = 0.027 )
Likelihood ratio test= 12.08 on 2 df, p=0.002
Wald test = 15.38 on 2 df, p=5e-04
Score (logrank) test = 10.69 on 2 df, p=0.005, Robust = 5.99 p=0.05
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
>

Formula to get the next question in a quiz based on previous statistics

My goal is to dynamically determine the next question in a quiz by using statistics from previous answers.
So, I have:
Question with difficulty field (1-100)
Maximum score you can get in question (let it be 256)
Score the user has reached in the question (x out of max)
I want to somehow combine these parameters in a formula to choose the most suitable next question for the user.
How can I do it?
My idea was to give the user a question of median difficulty first, and then, if the user scored less than 50% of the maximum, pick questions at the 25th-percentile difficulty, otherwise at the 75th percentile. Then repeat this scheme on a smaller range (25th-50th percentile, or 50th-75th, and so on).
Let's assume that the player has a fixed function score = f(difficulty) that gives for each difficulty the expected score percentage. Once we know this function, we can invert it and find the difficulty level that will give us the expected score we want.
However, the function is not known. But we have samples of this function in the form of our previous questions. So, we can fit a function to these samples. If you have knowledge about the form of the dependence, you can include that knowledge in the shape of your fitted function. I will simply assume a truncated linear function:
score = f(difficulty) = max(0, min(m * difficulty + n, 1))
The two parameters that we need to find are m and n. If we remove all sample questions where the user scored 100% or 0%, we can ignore the truncation. Then, we have a list of samples that form a linear system of equations:
score1 = m * difficulty1 + n
score2 = m * difficulty2 + n
score3 = m * difficulty3 + n
...
This system will usually not have a solution. So, we can solve for a least-squares solution. To do this, we will incrementally build a 2x2 matrix A and a 2-dimensional vector b that represent the system A * x = b. We will start with the zero matrix and the zero vector. For each question, we will update:
/ A11 A12 \ += / difficulty * difficulty difficulty \
\ A21 A22 / \ difficulty 1 /
/ b1 \ += / difficulty * score \
\ b2 / \ score /
Once we have added at least two questions, we can solve:
m = (A12 * b2 - A22 * b1) / (A12 * A12 - A11 * A22)
n = (A12 * b1 - A11 * b2) / (A12 * A12 - A11 * A22)
And we can find the difficulty for an expected score of P as:
difficulty = (P - n) / m
Let's do an example. The following table contains a few questions and the state of the function after adding the question.
diff score | A11 A12 A22 b1 b2 | m n
--------------+----------------------------+-------------
70 0.3 | 4900 70 1 21 0.3 |
50 0.4 | 7400 120 2 41 0.7 | -0.005 0.65
40 0.5 | 9000 160 3 61 1.2 | -0.006 0.74
35 0.7 | 10225 195 4 85.5 1.9 | -0.010 0.96
(Plot of the fitted function and the sample questions omitted.)
And if we want to find the difficulty for an expected score of e.g. 75%, we get:
difficulty(0.75) = 21.009
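The incremental scheme above is straightforward to code up. Here is a minimal Python sketch (the class and method names are mine) that reproduces the table and the final difficulty:

```python
class DifficultyModel:
    """Incremental least-squares fit of score = m * difficulty + n."""

    def __init__(self):
        self.a11 = self.a12 = self.a22 = 0.0  # 2x2 system matrix (symmetric)
        self.b1 = self.b2 = 0.0               # right-hand side

    def add_question(self, difficulty, score):
        # Accumulate the normal equations with the new sample.
        self.a11 += difficulty * difficulty
        self.a12 += difficulty
        self.a22 += 1.0
        self.b1 += difficulty * score
        self.b2 += score

    def fit(self):
        det = self.a12 * self.a12 - self.a11 * self.a22
        m = (self.a12 * self.b2 - self.a22 * self.b1) / det
        n = (self.a12 * self.b1 - self.a11 * self.b2) / det
        return m, n

    def difficulty_for(self, target_score):
        # Invert score = m * difficulty + n for a target score.
        m, n = self.fit()
        return (target_score - n) / m

model = DifficultyModel()
for d, s in [(70, 0.3), (50, 0.4), (40, 0.5), (35, 0.7)]:
    model.add_question(d, s)

print(model.fit())                 # ~(-0.0099, 0.958)
print(model.difficulty_for(0.75))  # ~21.009
```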

probability normalization in python

Given the following "un-normalized" set of probabilities (i.e., that do not necessarily sum to 1):
0.22 0.54 0.58 0.36 0.3
What is the normalized set of probabilities? (Enter your answer as a sequence of space-separated numbers.)
0.11 0.27 0.29 0.18 0.15
Normalization factor would be 1 / (0.22 + 0.54 + 0.58 + 0.36 + 0.3) = 0.5.
The normalized value for each probability is then the normalization factor times the value.
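The same computation as a short Python sketch:

```python
probs = [0.22, 0.54, 0.58, 0.36, 0.3]

factor = 1 / sum(probs)                   # 1 / 2.0 = 0.5
normalized = [factor * p for p in probs]  # ~[0.11, 0.27, 0.29, 0.18, 0.15]

# The normalized values now sum to 1 (up to floating-point rounding).
print(normalized)
```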

How to implement branch selection based on probability?

I want the program to choose something with a set probability. For example, there is a 0.312 probability of choosing path A and a 0.688 probability of choosing path B. The only way I can think of is the naive one: select a random number from the interval [0, 1] and check whether it is <= 0.312. Is there a better approach that extends to more than 2 elements?
Following is a way to do it more efficiently than multiple if-else statements:
Suppose
a = 0.2, b = 0.35, c = 0.15, d = 0.3.
Make an array where p[0] corresponds to a, p[1] corresponds to b, and so on.
Run a loop accumulating the sum of the probabilities:
p[0] = 0.2
p[1] = 0.2 + 0.35 = 0.55
p[2] = 0.55 + 0.15 = 0.70
p[3] = 0.70 + 0.30 = 1
Generate a random number in [0, 1]. Do a binary search on p for the random number. The interval the search returns is your branch.
e.g.
random number = 0.6
result = binarySearch(0.6)
result = 2 using the intervals above
2 => branch c
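The cumulative-array plus binary-search scheme can be sketched in Python with the standard library (pick_branch is a name I made up):

```python
import bisect
import random
from itertools import accumulate

def pick_branch(probabilities, r):
    """Map a number r in [0, 1) to a branch index."""
    cumulative = list(accumulate(probabilities))  # e.g. [0.2, 0.55, 0.70, 1.0]
    # bisect_right gives the half-open intervals [0, 0.2), [0.2, 0.55), ...
    # min() guards against r landing past the last boundary when the
    # cumulative sum falls slightly short of 1 due to rounding.
    return min(bisect.bisect_right(cumulative, r), len(probabilities) - 1)

# The example from the answer: 0.6 falls in [0.55, 0.70), i.e. branch c
print(pick_branch([0.2, 0.35, 0.15, 0.3], 0.6))  # 2

# In practice, draw r uniformly:
branch = pick_branch([0.2, 0.35, 0.15, 0.3], random.random())
```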

order of growth in algorithms

Suppose that you time a program as a function of N and produce
the following table.
N seconds
-------------------
19683 0.00
59049 0.00
177147 0.01
531441 0.08
1594323 0.44
4782969 2.46
14348907 13.58
43046721 74.99
129140163 414.20
387420489 2287.85
Estimate the order of growth of the running time as a function of N.
Assume that the running time obeys a power law T(N) ~ a N^b. For your
answer, enter the constant b. Your answer will be marked as correct
if it is within 1% of the target answer - we recommend using
two digits after the decimal separator, e.g., 2.34.
Can someone explain how to calculate this?
Well, it is a simple mathematical problem.
I : a * 387420489^b = 2287.85 -> a = 2287.85 / 387420489^b
II : a * 43046721^b = 74.99 -> a = 74.99 / 43046721^b
III: (I and II) -> 2287.85 / 387420489^b = 74.99 / 43046721^b -> (387420489 / 43046721)^b = 2287.85 / 74.99
-> http://www.purplemath.com/modules/solvexpo2.htm
Use logarithms to solve.
1. Calculate the ratio of growth from one row to the next:
N seconds
--------------------
14348907 13.58
43046721 74.99
129140163 414.2
387420489 2287.85
2. Calculate the ratio of change for N:
43046721 / 14348907 = 3
129140163 / 43046721 = 3
therefore the rate of change for N is 3.
3. Calculate the ratio of change for seconds:
74.99 / 13.58 = 5.52
Let's check the ratio for one more pair of rows to be sure:
414.2 / 74.99 = 5.52
So the ratio of change for seconds is 5.52.
4. Set up the following equation:
3^b = 5.52
b = log(5.52) / log(3) ≈ 1.55
Finally we get that the order of growth of the running time is 1.55.
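The log-ratio computation is easy to check in Python (a quick sketch):

```python
import math

# (N, seconds) for the last rows of the table
data = [(14348907, 13.58), (43046721, 74.99),
        (129140163, 414.2), (387420489, 2287.85)]

# For a power law T(N) ~ a * N^b the time ratio between consecutive rows
# equals (ratio of N)^b, so b = log(t2 / t1) / log(N2 / N1).
for (n1, t1), (n2, t2) in zip(data, data[1:]):
    b = math.log(t2 / t1) / math.log(n2 / n1)
    print(b)  # ~1.555 for every pair
```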
