Why is the answer to this Gaussian probability question zero?

Assume the average weight of an American adult male is 180 pounds with a standard deviation of 34 pounds. The distribution of weights follows a normal distribution. What is the probability that a man weighs exactly 185 pounds?
Why is the answer 0?

Because weight is a continuous variable (a variable that can take any value within an interval). If probability is expressed as favorable cases / possible cases, there is exactly 1 favorable case (a weight of exactly 185 pounds) against infinitely many possible cases (any other value; an interval contains infinitely many values), so the ratio is 0.
For a Gaussian distribution, it makes more sense to ask what the probability is that a man's weight is lower or greater than a certain value, or falls within an interval.
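A quick illustration in R, using the question's numbers (this code is mine, not the asker's; pnorm and dnorm are base R):
mu <- 180
sigma <- 34
pnorm(185, mu, sigma)                          # P(X <= 185), about 0.558
pnorm(186, mu, sigma) - pnorm(184, mu, sigma)  # P(184 < X < 186), about 0.023
dnorm(185, mu, sigma)  # density at 185: a height, not a probability
Only intervals carry probability mass; the probability of any single exact value is 0.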

Related

Calculating NPV using Monte Carlo simulation

I am wondering if anyone can provide links or ideas about how to calculate a stochastic NPV after a Monte Carlo simulation, and how to calculate the probability of NPV > 0. We first calculated the deterministic NPV with all the assumptions, and then I took some important parameters to which I can assign uncertainty and gave them uniform distributions (runif). But the probability of a positive NPV always comes out as 0 or 1, never anything in between. Is there something wrong with how I am calculating the probability of positive NPV, or with how I am calculating npv_vec[i]?
...
ndraw <- 100
a_vec <- runif(ndraw, 10, 20)  # uncertain parameter a
b_vec <- runif(ndraw, 20, 30)  # uncertain parameter b
npv_vec <- rep(NA, ndraw)
profit_vec <- rep(NA, ndraw)
for (i in 1:ndraw) {
  npv_vec[i] <- NPV_fun(a_vec[i], b_vec[i])
  profit_vec[i] <- ifelse(npv_vec[i] > 0, 1, 0)
}
# calculate the probability of positive npv
pb_profit <- mean(profit_vec)
pb_profit
...
On a single flip of a coin, it comes up either heads or tails. This does not mean that the probability of heads is either 0 or 1. To estimate that probability you have to perform multiple trials of the coin flip, and determine the proportion of flips which are heads.
Similarly, on a single draw NPV > 0 either happens or it doesn't, so one trial can only give 0 or 1. As with coin flips, you estimate the probability from multiple trials by calculating the proportion of trials in which NPV > 0.
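The same point in R (a sketch of the coin-flip analogy, not the asker's code):
flips <- runif(10000) < 0.5  # TRUE = heads on a fair coin
mean(flips)                  # close to 0.5; any single flip alone gives only 0 or 1
In the question's code, mean(profit_vec) plays exactly this role: it is the proportion of draws with positive NPV.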

Algorithms to deal with apportionment problems

I need an algorithm, or technique, or any guidance to optimize the following problem:
I have two companies:
Company A has 324 employees
Company B has 190 employees
The total of employees (A+B) is 514. I need to randomly select 28% of these 514 employees.
Ok, so let's do it: 28% of 514 is 143.92; Oh... this is bad, we are dealing with people here, so we cannot have decimal places. Ok then, I'll try rounding that up or down.
If I round down: 143 is about 27.82%, which is not good, since I must have at least 28%, so I must round up to 144.
So now I know that 144 employees must be selected.
The main problem comes now... It's time to check how much percentage I must use for each company to get the total of 144. How do I do that in order to have the percentage as close as possible to 28% for each company?
I'll exemplify:
If I just apply 28% for each company I get:
Company A has 324 employees: 0.28 × 324 = 90.72
Company B has 190 employees: 0.28 × 190 = 53.2
Again, I end up with decimal places. So I must figure out which ones I should round up, and which ones should I round down to get 144 total.
Note: For this example I only used two companies, but in the real problem I have 30 companies.
There are many methods to perform apportionment, and no objective best method.
The following is in terms of states and seats rather than companies and people. Credit probably goes to Dr. Larry Bowen who is cited on the base site for the first link.
Hamilton’s Method
Also known as the Method of Largest Remainders and sometimes as Vinton's Method.
Procedure:
Calculate the Standard Divisor.
Calculate each state’s Standard Quota.
Initially assign each state its Lower Quota.
If there are surplus seats, give them, one at a time, to states in descending order of the fractional parts of their Standard Quota.
Here, the Standard Divisor can be found by dividing the total population (the sum of the population of each company) by the number of people you want to sample (144 in this case). The Standard Quota is the company's population divided by the Standard Divisor. The Lower Quota is this value rounded down. However, this method has some flaws.
Problems:
The Alabama Paradox: an increase in the total number of seats to be apportioned causes a state to lose a seat.
The Population Paradox: an increase in a state's population can cause it to lose a seat.
The New States Paradox: adding a new state with its fair share of seats can affect the number of seats due to other states.
This is probably the simplest method to implement; a minimal sketch in R follows. Below that are some other methods with their procedures and drawbacks.
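Here is that sketch; hamilton is a hypothetical helper name, not anything from the original posts, and the numbers are the question's:
# Hamilton's method (largest remainders).
# pops: vector of group sizes; seats: total number to apportion.
hamilton <- function(pops, seats) {
  quota <- pops * seats / sum(pops)  # Standard Quota = pop / Standard Divisor
  alloc <- floor(quota)              # Lower Quota
  surplus <- seats - sum(alloc)
  # hand out surplus seats in descending order of fractional remainder
  extra <- order(quota - alloc, decreasing = TRUE)[seq_len(surplus)]
  alloc[extra] <- alloc[extra] + 1
  alloc
}
hamilton(c(324, 190), 144)  # 91 53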
Jefferson’s Method
Also known as the Method of Greatest Divisors and, in Europe, as the Method of d'Hondt or the Hagenbach-Bischoff Method.
Procedure:
Calculate the Standard Divisor.
Calculate each state’s Standard Quota.
Initially assign each state its Lower Quota.
Check to see if the sum of the Lower Quotas is equal to the correct number of seats to be apportioned.
If the sum of the Lower Quotas is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Lower Quota.
If the sum of the Lower Quotas is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded DOWN, the sum of all the rounded (down) Modified Quotas is the exact number of seats to be apportioned. (Note: The MD will always be smaller than the Standard Divisor.) These rounded (down) Modified Quotas are sometimes called Modified Lower Quotas. Apportion each state its Modified Lower Quota.
Problem:
Violates the Quota Rule. (However, it can only violate Upper Quota—never Lower Quota.)
Webster’s Method
Also known as the Webster-Willcox Method, as well as the Method of Major Fractions.
Procedure:
Calculate the Standard Divisor.
Calculate each state’s Standard Quota.
Initially assign a state its Lower Quota if the fractional part of its Standard Quota is less than 0.5. Initially assign a state its Upper Quota if the fractional part of its Standard Quota is greater than or equal to 0.5. [In other words, round down or up based on the arithmetic mean (average).]
Check to see if the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned.
If the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Quota (Lower or Upper from Step 3).
If the sum of the Quotas (Lower and/or Upper from Step 3) is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded based on the arithmetic mean (average), the sum of all the rounded Modified Quotas is the exact number of seats to be apportioned. Apportion each state its Modified Rounded Quota.
Problem:
Violates the Quota Rule. (However, violations are rare and are usually associated with contrived situations.)
Huntington-Hill Method
Also known as the Method of Equal Proportions.
This is the current method used to apportion the U.S. House of Representatives. It was developed around 1911 by Joseph A. Hill, Chief Statistician of the Bureau of the Census, and Edward V. Huntington, Professor of Mechanics & Mathematics, Harvard.
Preliminary terminology: The Geometric Mean
Procedure:
Calculate the Standard Divisor.
Calculate each state’s Standard Quota.
Initially assign a state its Lower Quota if its Standard Quota is less than the Geometric Mean of the two whole numbers that the Standard Quota is immediately between (for example, 16.47 is immediately between 16 and 17, and their Geometric Mean is √(16×17) ≈ 16.492). Initially assign a state its Upper Quota if its Standard Quota is greater than or equal to that Geometric Mean. [In other words, round down or up based on the geometric mean.]
Check to see if the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned.
If the sum of the Quotas (Lower and/or Upper from Step 3) is equal to the correct number of seats to be apportioned, then apportion to each state the number of seats equal to its Quota (Lower or Upper from Step 3).
If the sum of the Quotas (Lower and/or Upper from Step 3) is NOT equal to the correct number of seats to be apportioned, then, by trial and error, find a number, MD, called the Modified Divisor to use in place of the Standard Divisor so that when the Modified Quota, MQ, for each state (computed by dividing each State's Population by MD instead of SD) is rounded based on the geometric mean, the sum of all the rounded Modified Quotas is the exact number of seats to be apportioned. Apportion each state its Modified Rounded Quota.
Problem:
Violates the Quota Rule.
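The three divisor methods above (Jefferson, Webster, Huntington-Hill) differ only in the rounding rule, so one trial-and-error search covers all of them. A minimal R sketch, assuming a simple bisection on the divisor suffices (divisor_method and the rounding helpers are hypothetical names; tie cases may need extra care):
# Generic divisor method: bisect on the Modified Divisor until the
# rounded quotas sum to the target number of seats.
divisor_method <- function(pops, seats, round_fn, iters = 200) {
  lo <- 1e-9       # tiny divisor: allocates far too many seats
  hi <- sum(pops)  # huge divisor: allocates too few
  for (k in seq_len(iters)) {
    d <- (lo + hi) / 2
    s <- sum(round_fn(pops / d))
    if (s == seats) return(round_fn(pops / d))
    if (s > seats) lo <- d else hi <- d  # larger divisor -> fewer seats
  }
  round_fn(pops / d)
}
jefferson_round <- function(q) floor(q)  # round down
webster_round <- function(q) ifelse(q - floor(q) < 0.5, floor(q), ceiling(q))  # arithmetic mean
hill_round <- function(q) ifelse(q < sqrt(floor(q) * ceiling(q)), floor(q), ceiling(q))  # geometric mean
divisor_method(c(324, 190), 144, webster_round)  # 91 53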
For reference, the Quota Rule:
Quota Rule
An apportionment method that always assigns each state either its Lower Quota or its Upper Quota follows the quota rule.
The problem can be framed as that of finding the closest integer approximation to a set of ratios. For instance, if you want to assign respectively A, B, C ≥ 0 members from 3 groups to match the ratios a, b, c ≥ 0 (with a + b + c = N > 0), where N = A + B + C > 0 is the total allocation desired, then you're approximating (a, b, c) by (A, B, C) with A, B and C restricted to integers.
One way to solve this may be to set it up as a least squares problem: minimizing |a - A|² + |b - B|² + |c - C|², subject to the constraints A + B + C = N and A, B, C ≥ 0.
A necessary condition for the optimum is that it be a local optimum with respect to discrete unit changes. For instance, the move (A,B,C) → (A+1,B-1,C) is allowed if B > 0, and it changes the objective by 2(A - a) - 2(B - b) + 2; requiring this to be non-negative entails the condition (A - B ≥ a - b - 1 or B = 0).
For the situation at hand, the optimization problem is to minimize
|A - a|² + |B - b|², subject to A + B = 144, with
a = 144×324/(324+190) ≅ 90.770, b = 144×190/(324+190) ≅ 53.230
which leads to the conditions:
A - B ≥ a - b - 1 ≅ +36.541 or B = 0
B - A ≥ b - a - 1 ≅ -38.541 or A = 0
A + B = 144
Since they are integers the inequalities can be strengthened:
A - B ≥ +37 or B = 0
B - A ≥ -38 or A = 0
A + B = 144
The boundary cases A = 0 and B = 0 are ruled out, since they don't satisfy all three conditions. So, you're left with 37 ≤ A - B ≤ 38 or, since A + B = 144: 181 ≤ 2A ≤ 182 or A = 91 ... and B = 53.
It is quite possible that this way of framing the problem may be equivalent, in terms of its results, to one of the algorithms cited in an earlier reply.
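As a quick sanity check, the two-company instance can be brute-forced in R (this snippet is mine, not from the original posts):
# Brute-force the least-squares apportionment for N = 144, two groups.
a <- 144 * 324 / 514
b <- 144 * 190 / 514
A <- 0:144
B <- 144 - A
obj <- (A - a)^2 + (B - b)^2
c(A[which.min(obj)], B[which.min(obj)])  # 91 53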
My suggestion is to just take 28% of each company and round up to the nearest person.
In your case, you would go with 91 and 54, which is 145 in total. Admittedly, this does result in having a bit over 28%.
The most accurate method is as follows:
Calculate the exact number that you want.
Take 28% for each company and round down.
Sort the companies in descending order by the remainder.
Go through the sorted list, adding one extra person to each company in turn, until you reach exactly the number you want. (This is Hamilton's method from the earlier answer.)
Since I originally posted this question, I have come across a description of this exact problem in Martin Fowler's book "Patterns of Enterprise Application Architecture" (pages 489-490).
Martin discusses "Matt Foemmel's simple conundrum" of dividing 5 cents between two accounts while obeying a 70%/30% distribution. This describes my problem in a much simpler form.
Here are the solutions he presents in his book to that problem:
Perhaps the most common is to ignore it; after all, it's only a penny here and there. However, this tends to make accountants understandably nervous.
When allocating, you always do the last allocation by subtracting what you've allocated so far from the total. This avoids losing pennies, but you can get a cumulative amount of pennies on the last allocation.
Allow users of a Money class to declare the rounding scheme when they call the method. This permits a programmer to say that the 70% case rounds up and the 30% case rounds down. Things can get complicated when you allocate across ten accounts instead of two. You also have to remember to round. To encourage people to remember, I've seen some Money classes force a rounding parameter into the multiply operation. Not only does this force the programmer to think about what rounding she needs, it also might remind her of the tests to write. However, it gets messy if you have a lot of tax calculations that all round the same way.
My favorite solution: have an allocator function on the money. The parameter to the allocator is a list of numbers representing the ratio to be allocated (it would look something like aMoney.allocate([7,3])). The allocator returns a list of monies, guaranteeing that no pennies get dropped by scattering them across the allocated monies in a way that looks pseudo-random from the outside. The allocator has faults: you have to remember to use it, and any precise rules about where the pennies go are difficult to enforce.
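For what it's worth, the no-pennies-dropped guarantee can be achieved with the same largest-remainder idea as Hamilton's method above. A minimal R sketch (allocate is a hypothetical helper, not Fowler's actual Money API; it hands the leftover pennies to the largest fractional remainders rather than scattering them pseudo-randomly):
# Allocate `amount` (in cents) across `ratios` without losing a penny.
allocate <- function(amount, ratios) {
  exact <- amount * ratios / sum(ratios)
  alloc <- floor(exact)
  leftover <- amount - sum(alloc)
  # give the leftover cents to the largest fractional remainders
  idx <- order(exact - alloc, decreasing = TRUE)[seq_len(leftover)]
  alloc[idx] <- alloc[idx] + 1
  alloc
}
allocate(5, c(7, 3))  # 4 1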

Markov entropy when probabilities are uneven

I've been thinking about information entropy in terms of the Markov equation:
H = -SUM(p(i) lg(p(i))), where lg is the base-2 logarithm.
This assumes that all selections i have equal probability. But what if the probabilities in the given set of choices are unequal? For example, let's say that StackExchange has 20 sites, and that the probability of a user visiting any StackExchange site except StackOverflow is p(i), but the probability of a user visiting StackOverflow is 5 times p(i).
Would the Markov equation not apply in this case? Or is there an advanced Markov variation that I'm unaware of?
I believe you are mixing up two concepts: entropy and the Markov property. Entropy measures the "disorder" of a distribution over states, using the equation you gave: H = -SUM(p(i) lg(p(i))), where p(i) is the probability of observing state i.
The Markov property does not imply that every state has the same probability. Roughly, a system is said to exhibit the Markov property if the probability to observe a state depends only on observing a few previous states - after a certain limit, the extra states you observe add no information to predicting the next state.
The prototypical Markov model is the Markov chain. It says that from each state i you can move to any state with some probability, represented as a transition probability matrix (rows: current state; columns: next state):
0 1 2
0 0.2 0.5 0.3
1 0.8 0.1 0.1
2 0.3 0.3 0.4
In this example, the probability of moving from state 0 to 1 is 0.5, and depends only on being in state 0 (knowing more about the previous states would not change that probability).
As long as every state can be reached from every other state, then no matter what the initial distribution is, the probability of being in each state converges to a stable long-term (stationary) distribution, and over a long series you'll observe each state with a stable probability, which is not necessarily equal for each state.
In our example, we would end up having probabilities p(0), p(1) and p(2), and you could then compute the entropy of that chain using your formula.
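A short R sketch of that computation (the matrix is the example above; the stationary distribution is taken as the left eigenvector of the transition matrix for eigenvalue 1):
# Transition matrix of the example chain (rows: current state).
P <- matrix(c(0.2, 0.5, 0.3,
              0.8, 0.1, 0.1,
              0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
# Stationary distribution: left eigenvector of P for eigenvalue 1.
v <- Re(eigen(t(P))$vectors[, 1])
p <- v / sum(v)
# Entropy of the long-run state distribution, in bits.
H <- -sum(p * log2(p))
p
H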
From your example, are you thinking of Markov chains?

Algorithm for categorizing values

What would be the best algorithm to solve this problem? I spent a couple of hours on it but couldn't sort it out.
A guy purchased a necklace and plans to cut it into two pieces in such a way that the average brightness of each piece is greater than or equal to that of the original necklace.
The criteria for dividing the necklaces are
1. The difference in the number of pearls between the two pearl sets should not be greater than 10% of the number of pearls in the original necklace, or 3, whichever is higher.
2. The difference between the numbers of pearls in the two necklaces should be minimal.
3. If the average brightness of either necklace is less than the average brightness of the original set, return 0 as output.
4. The two necklaces should have average brightness greater than the original one, and the difference between the average brightnesses of the two pieces should be minimal.
5. The average brightness of each piece should be greater than or equal to that of the original piece.
This problem is rather hard to do efficiently (it is essentially subset sum, which is NP-complete).
Say you had a set that averaged to X. That is, X = (x1 + x2 + ... + xn) / n.
Suppose you break it up into sets that average to S and T with s and t items in each set, respectively.
You can mathematically prove that if one of the averages, S or T, is greater than X, the other must be less than X: X is the weighted average of the two, X = (sS + tT) / (s + t), so S and T cannot both lie on the same side of X.
Hence, both pieces must have average brightness exactly equal to X, because that's the only way your conditions are satisfiable.
Knowing this, you end up with the subset sum problem: you want to find a subset that sums to a prescribed target. That problem is known to be hard (it is NP-complete). Alright, it's not exactly the same as the subset sum problem, but if you subtract the average of the full set from each of the brightness values, a piece whose average equals X is exactly a subset of the shifted values summing to 0, so solving that gives you your answer. (Do the reverse to see how you can solve the subset sum problem from your problem.)
Hence, there's no known fast way of doing this, only approximations or exponential running times. However, subset sum admits a pseudo-polynomial dynamic program, so the running time improves if your brightness values are bounded integers.
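To illustrate the connection, here is a minimal pseudo-polynomial subset-sum check in R (a sketch only: it assumes non-negative integer brightness values and an integer target, and it ignores the size constraints from the original problem):
# Can some subset of vals sum exactly to target?
subset_sum <- function(vals, target) {
  reachable <- c(TRUE, rep(FALSE, target))  # reachable[s + 1]: is sum s attainable?
  for (v in vals) {
    if (v >= 1 && v <= target) {
      # the RHS is fully evaluated before assignment, so each item is used at most once
      reachable[(v + 1):(target + 1)] <- reachable[(v + 1):(target + 1)] |
        reachable[1:(target + 1 - v)]
    }
  }
  reachable[target + 1]
}
vals <- c(4, 6, 3, 7)            # toy brightness values, total 20
subset_sum(vals, sum(vals) / 2)  # TRUE: 4 + 6 = 10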

How do I determine the bias of an algorithm?

Let's say I have an algorithm that is supposed to represent a coin flip. How do I determine the bias of this coin? Specifically, I have written the algorithm in this JSFiddle.
The fiddle runs a series of 20 tests. Each test flips the coin 100 times and tallies the results. At the end of the series it reports the ratio of heads to tails over the total number of flips across all tests. This ratio seems to approach 1 (from both sides), but I have not done any rigorous testing.
Note, this is not homework. This is purely a personal interest.
You can't come up with a way to guarantee detecting a bias, but you can determine it to a certain degree of certainty (say 95%). What you do is test n times and count how many times you get heads, call this variable h.
Then if h / n < 0.5 - 1.96 * sqrt(0.25 / n) then the coin is biased towards tails (with a 95% probability) and if h / n > 0.5 + 1.96 * sqrt(0.25 / n) then the coin is biased towards heads.
This decision is based on the normal approximation to the binomial distribution; you can read more about it here: http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Normal_approximation_interval
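A short R sketch of that decision rule, using simulated flips (a fair coin here, so the test should usually report no bias; this code is mine, not from the JSFiddle):
# Test n coin flips against the null hypothesis of fairness (95% level).
n <- 2000
h <- sum(runif(n) < 0.5)  # number of heads in n simulated fair flips
margin <- 1.96 * sqrt(0.25 / n)
if (h / n < 0.5 - margin) {
  "biased towards tails"
} else if (h / n > 0.5 + margin) {
  "biased towards heads"
} else {
  "no evidence of bias at the 95% level"
}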
George Marsaglia's Diehard tests: if you treat each head/tail as generating a 0 or 1 bit, you can assemble random numbers from the flips and test their randomness with those batteries.
