Generate random numbers according to distributions - random

I want to generate random numbers according to some distributions. How can I do this?

The standard random number generator you've got (rand() in C after a simple transformation, equivalents in many languages) is a fairly good approximation to a uniform distribution over the range [0,1]. If that's what you need, you're done. It's also trivial to convert that to a random number generated over a somewhat larger integer range.
Conversion of a Uniform distribution to a Normal distribution has already been covered on SO, as has going to the Exponential distribution.
[EDIT]: For the triangular distribution, converting a uniform variable is relatively simple (in something C-like):
#include <math.h>
#include <stdlib.h>

/* Triangular distribution with minimum a, maximum b, and mode c. */
double triangular(double a, double b, double c) {
    double U = rand() / (double) RAND_MAX;   /* uniform in [0,1] */
    double F = (c - a) / (b - a);            /* CDF value at the mode */
    if (U <= F)
        return a + sqrt(U * (b - a) * (c - a));
    else
        return b - sqrt((1 - U) * (b - a) * (b - c));
}
That's just converting the formula given on the Wikipedia page. If you want others, that's the place to start looking; in general, you use the uniform variable to pick a point on the vertical axis of the cumulative distribution function (CDF) of the distribution you want (assuming it's continuous), and invert the CDF to get the random value with the desired distribution.
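For instance, a minimal Python sketch of that recipe for the exponential distribution, whose CDF inverts in closed form (the function name is mine):
import math
import random

def exponential(lam):
    # Inverse-CDF sampling: F(x) = 1 - exp(-lam*x), so F^-1(u) = -log(1 - u)/lam
    u = random.random()              # uniform in [0, 1)
    return -math.log(1.0 - u) / lam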

The right way to do this is to decompose the distribution into n-1 binary distributions. That is, if you have a distribution like this:
A: 0.05
B: 0.10
C: 0.10
D: 0.20
E: 0.55
You transform it into 4 binary distributions:
1. A/E: 0.20/0.80
2. B/E: 0.40/0.60
3. C/E: 0.40/0.60
4. D/E: 0.80/0.20
Select uniformly from the n-1 distributions, and then select the first or second symbol based on the probability of each in the binary distribution.
Code for this is here
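This pairing scheme is, in essence, Walker's alias method. A minimal Python sketch of the general construction (Vose's variant; the function names are mine) might look like:
import random

def build_alias_table(probs):
    """O(n) preprocessing: scale each probability by n, then pair 'small' bins with 'large' ones."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob_table, alias = [1.0] * n, list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob_table[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]          # the large bin donates the remainder
        (small if scaled[l] < 1.0 else large).append(l)
    return prob_table, alias

def alias_sample(prob_table, alias):
    """O(1) per draw: pick a bin uniformly, then flip its biased coin."""
    i = random.randrange(len(prob_table))
    return i if random.random() < prob_table[i] else alias[i]

# e.g. the distribution above: A..E with weights 0.05, 0.10, 0.10, 0.20, 0.55
table, aliases = build_alias_table([0.05, 0.10, 0.10, 0.20, 0.55])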

It actually depends on the distribution. The most general way is the following. Let P(X) be the probability that a random number generated according to your distribution is less than X.
You start by generating a uniform random X between zero and one. After that you find Y such that P(Y) = X and output Y. You can find such a Y using binary search (since P(X) is an increasing function of X).
This is not very efficient, but it works for any distribution where P(X) can be efficiently computed.
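A minimal Python sketch of that binary search (the bracketing interval and tolerance are assumptions you would tune per distribution):
import math
import random

def sample_by_bisection(cdf, lo, hi, tol=1e-12):
    """Find Y with cdf(Y) ~= X for uniform X, by bisection on the monotone CDF."""
    x = random.random()
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# e.g. exponential with rate 1: CDF(y) = 1 - exp(-y), effectively supported on [0, 50]
draw = sample_by_bisection(lambda y: 1.0 - math.exp(-y), 0.0, 50.0)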

You can look up inverse transform sampling and rejection sampling, as well as the book by Devroye, Non-Uniform Random Variate Generation (Springer-Verlag, 1986).

You can convert from discrete bins to float/double with interpolation. Simple linear interpolation works well. If your table memory is constrained, other interpolation methods can be used. -jlp

It's standard textbook material. See here for some code, or here at Section 3.2 for some reference mathematical background (actually very quick and simple to read).

Related

how to define the probability distribution

I have a small question and I will be very happy if you can give me a solution or any idea for a solution to the following probability-distribution problem:
I have a random variable x which follows an exponential distribution with parameter lambda1, and one more variable y which follows an exponential distribution with parameter lambda2. z is a discrete value. How can I define the probability distribution of k in the following formula?
k = z - x - y
Thank you so much
OK, let's start with rewriting the formula a bit:
k = z - x - y = -(x + y) + z = -(x + y - z)
The part in the parentheses looks manageable. Let's start with x + y. For random variables x and y, the PDF of their sum is the convolution of their PDFs.
q = x + y
PDF_q(t) = ∫ PDF_x(t - s) PDF_y(s) ds
where the integral runs over s. For x and y exponential, the convolution integral is known and equal to the expression here when the lambdas are different, or to Gamma(2, lambda) when the lambdas are equal, Gamma being the Gamma distribution.
If z is some constant discrete value, then we can express -z as a continuous RV with PDF
PDF(t) = δ(t + z)
where δ is the Dirac delta function, and indeed the peak is at -z, as expected. It is normalized, so the integral over t is equal to 1. This extends easily to a discrete RV: a sum of δ-functions at its values, multiplied by probabilities that sum to 1.
Again, we have a sum of two RVs with known PDFs, and the solution is a convolution, which is easy to compute thanks to the sifting property of the δ-function. So the final PDF of x + y + (-z) is
PDF_q(t + z)
where PDF_q is taken from the sum expression on the Exponential distribution wiki, or from the Gamma distribution wiki.
You just have to negate the argument, since k = -(x + y - z): PDF_k(t) = PDF_q(z - t). And that's it.
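A quick Monte Carlo sketch in Python (with example parameters that are my assumptions) to check the derivation against the analytic density for distinct lambdas:
import math
import random

lam1, lam2, z = 1.0, 2.0, 5.0    # assumed example parameters

def sample_k():
    """One Monte Carlo draw of k = z - x - y."""
    return z - random.expovariate(lam1) - random.expovariate(lam2)

def pdf_k(t):
    """Analytic density: PDF_k(t) = PDF_q(z - t), q = x + y, distinct lambdas."""
    s = z - t
    if s < 0:
        return 0.0
    return lam1 * lam2 / (lam2 - lam1) * (math.exp(-lam1 * s) - math.exp(-lam2 * s))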

What is the most numerically precise method for dividing sums or differences?

Consider the (a-b)/(c-d) operation, where a, b, c and d are floating-point numbers (namely, double type in C++). Both (a-b) and (c-d) are (sum-correction) pairs, as in the Kahan summation algorithm. Briefly, what is specific about these (sum-correction) pairs is that sum holds a value that is large relative to what's in correction. More precisely, correction contains what didn't fit in sum during summation due to numerical limitations (53 bits of mantissa in the double type).
What is the numerically most precise way to calculate (a-b)/(c-d) given the above speciality of the numbers?
Bonus question: it would be better to get the result also as (sum-correction), as in Kahan summation algorithm. So to find (e-f)=(a-b)/(c-d), rather than just e=(a-b)/(c-d) .
The div2 algorithm of Dekker (1971) is a good approach.
It requires a mul12(p,q) algorithm which exactly computes a pair u + v = p*q. Dekker uses a method known as Veltkamp splitting, but if you have access to an fma function, then a much simpler method is
u = p*q
v = fma(p,q,-u)
the actual division then looks like (I've had to change some of the signs since Dekker uses additive pairs instead of subtractive):
r = a/c
u,v = mul12(r,c)
s = (a - u - v - b + r*d)/c
The sum r + s is then an accurate approximation to (a-b)/(c-d).
UPDATE: The subtraction and addition are assumed to be left-associative, i.e.
s = ((((a-u)-v)-b)+r*d)/c
This works because if we let rr be the error in the computation of r (i.e. r + rr = a/c exactly), then since u+v = r*c exactly, we have that rr*c = a-u-v exactly, so therefore (a-u-v-b)/c gives a fairly good approximation to the correction term of (a-b)/c.
The final r*d arises due to the following:
(a-b)/(c-d) = (a-b)/c * c/(c-d) = (a-b)/c *(1 + d/(c-d))
= [a-b + (a-b)/(c-d) * d]/c
Now r is also a fairly good initial approximation to (a-b)/(c-d) so we substitute that inside the [...], so we find that (a-u-v-b+r*d)/c is a good approximation to the correction term of (a-b)/(c-d)
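A minimal Python sketch of the above (Python floats are IEEE doubles; since a portable fma is not assumed, mul12 here uses the Veltkamp splitting Dekker describes, and the function names are mine):
def mul12(p, q):
    """Exact product as a pair (u, v) with u + v == p*q, via Veltkamp/Dekker splitting."""
    SPLIT = 2.0**27 + 1.0           # splits a 53-bit double into high and low parts
    cp = SPLIT * p
    p_hi = cp - (cp - p)
    p_lo = p - p_hi
    cq = SPLIT * q
    q_hi = cq - (cq - q)
    q_lo = q - q_hi
    u = p * q
    v = ((p_hi * q_hi - u) + p_hi * q_lo + p_lo * q_hi) + p_lo * q_lo
    return u, v

def div2(a, b, c, d):
    """Compensated (a - b) / (c - d): returns (r, s) with r + s the refined quotient."""
    r = a / c
    u, v = mul12(r, c)
    s = ((((a - u) - v) - b) + r * d) / c
    return r, s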
For tiny corrections, maybe think of
(a - b) / (c - d) = a/c * (1 - b/a) / (1 - d/c) ~ a/c * (1 - b/a + d/c)

example algorithm for generating random value in dataset with normal distribution?

I'm trying to generate some random numbers with simple non-uniform probability to mimic lifelike data for testing purposes. I'm looking for a function that accepts mu and sigma as parameters and returns x, where the probability of x being within certain ranges follows a standard bell curve, or thereabouts. It needn't be super precise or even efficient. The resulting dataset needn't match the exact mu and sigma that I set. I'm just looking for a relatively simple non-uniform random number generator. Limiting the set of possible return values to ints would be fine. I've seen many suggestions out there, but none that seem to fit this simple case.
Box-Muller transform in a nutshell:
First, get two independent, uniform random numbers from the interval (0, 1], call them U and V.
Then you can get two independent, unit-normal distributed random numbers from the formulae
X = sqrt(-2 * log(U)) * cos(2 * pi * V);
Y = sqrt(-2 * log(U)) * sin(2 * pi * V);
This gives you iid random numbers for mu = 0, sigma = 1; to set sigma = s, multiply your random numbers by s; to set mu = m, add m to your random numbers.
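A minimal Python sketch of that recipe (the function name is mine):
import math
import random

def box_muller(mu=0.0, sigma=1.0):
    """Return a pair of independent N(mu, sigma) draws via the Box-Muller transform."""
    u = 1.0 - random.random()        # uniform in (0, 1], avoids log(0)
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))
    return (mu + sigma * r * math.cos(2.0 * math.pi * v),
            mu + sigma * r * math.sin(2.0 * math.pi * v))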
My first thought is why can't you use an existing library? I'm sure that most languages already have a library for generating Normal random numbers.
If for some reason you can't use an existing library, then the method outlined by #ellisbben is fairly simple to program. An even simpler (approximate) algorithm is just to sum 12 uniform numbers:
import random

X = -6.0                   # start at minus the mean of the 12 uniforms (12 * 0.5)
for i in range(12):
    X += random.random()   # add one U(0,1) draw
The value of X is approximately normal. (A figure comparing 10^5 draws from this algorithm with the Normal distribution is omitted here.)

How to calculate the sum of two normal distributions

I have a value type that represents a gaussian distribution:
struct Gauss {
double mean;
double variance;
}
I would like to perform an integral over a series of these values:
Gauss eulerIntegrate(double dt, Gauss iv, Gauss[] values) {
Gauss r = iv;
foreach (Gauss v in values) {
r += v*dt;
}
return r;
}
My question is how to implement addition for these normal distributions.
The multiplication by a scalar (dt) seemed simple enough. But it wasn't simple! Thanks FOOSHNICK for the help:
public static Gauss operator * (Gauss g, double d) {
return new Gauss(g.mean * d, g.variance * d * d);
}
However, addition eludes me. I assume I can just add the means; it's the variance that's causing me trouble. Either of these definitions seems "logical" to me.
public static Gauss operator + (Gauss a, Gauss b) {
double mean = a.mean + b.mean;
// Is it this? (Yes, it is!)
return new Gauss(mean, a.variance + b.variance);
// Or this? (nope)
//return new Gauss(mean, Math.Max(a.variance, b.variance));
// Or how about this? (nope)
//return new Gauss(mean, (a.variance + b.variance)/2);
}
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
I suppose I could switch the code to use interval arithmetic instead, but I was hoping to stay in the world of prob and stats.
The sum of two independent normal random variables is itself normally distributed:
N(mean1, variance1) + N(mean2, variance2) ~ N(mean1 + mean2, variance1 + variance2)
This is all on the Wikipedia page.
Be careful that these really are variances and not standard deviations.
// X + Y
public static Gauss operator + (Gauss a, Gauss b) {
//NOTE: this is valid if X,Y are independent normal random variables
return new Gauss(a.mean + b.mean, a.variance + b.variance);
}
// X*b
public static Gauss operator * (Gauss a, double b) {
return new Gauss(a.mean*b, a.variance*b*b);
}
To be more precise:
If a random variable Z is defined as the linear combination of two uncorrelated Gaussian random variables X and Y, then Z is itself a Gaussian random variable, e.g.:
if Z = aX + bY,
then mean(Z) = a * mean(X) + b * mean(Y), and variance(Z) = a^2 * variance(X) + b^2 * variance(Y).
If the random variables are correlated, then you have to account for that. Variance(X) is defined by the expected value E([X - mean(X)]^2). Working this through for Z = aX + bY, we get:
variance(Z) = a^2 * variance(X) + b^2 * variance(Y) + 2ab * covariance(X,Y)
If you are summing two uncorrelated random variables which do not have Gaussian distributions, then the distribution of the sum is the convolution of the two component distributions.
If you are summing two correlated non-Gaussian random variables, you have to work through the appropriate integrals yourself.
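As a quick sanity check of the covariance formula above, a small Monte Carlo sketch in Python (the toy setup is my assumption):
import random

# Toy setup: Y = X + independent noise, so variance(X) = 1, variance(Y) = 2,
# covariance(X, Y) = 1; for Z = 2X + 3Y the formula predicts 4 + 18 + 12 = 34.
a, b, n = 2.0, 3.0, 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]
zs = [a * x + b * y for x, y in zip(xs, ys)]
mean_z = sum(zs) / n
var_z = sum((t - mean_z) ** 2 for t in zs) / n
print(var_z)   # should be close to 34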
Well, your multiplication by scalar is wrong - you should multiply variance by the square of d. If you're adding a constant, then just add it to the mean, the variance stays the same. If you're adding two distributions, then add the means and add the variances.
Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?
Arguably not, as adding two distributions means different things. Having worked in reliability and maintainability, my first reaction from the title would be the distribution of a system's MTBF, if the MTBF of each part is normally distributed and the system has no redundancy. What you are talking about is the distribution of the sum of two normally distributed independent variates, not the (logical) sum of two normal distributions' effect. Very often, operator overloading has surprising semantics. I'd leave it as a function and call it 'normalSumDistribution' unless your code has a very specific target audience.
Hah, I thought you couldn't add gaussian distributions together, but you can!
http://mathworld.wolfram.com/NormalSumDistribution.html
In fact, the mean of the sum is the sum of the individual means, and the variance of the sum is the sum of the individual variances.
I'm not sure that I like what you're calling "integration" over a series of values. Do you mean that word in a calculus sense? Are you trying to do numerical integration? There are other, better ways to do that. Yours doesn't look right to me, let alone optimal.
The Gaussian distribution is a nice, smooth function. I think a nice quadrature approach or Runge-Kutta would be a much better idea.
I would have thought it depends on what type of addition you are doing. If you just want to get a normal distribution with properties (mean, standard deviation etc.) equal to the sum of two distributions then the addition of the properties as given in the other answers is fine. This is the assumption used in something like PERT where if a large number of normal probability distributions are added up then the resulting probability distribution is another normal probability distribution.
The problem comes when the two distributions being combined are not similar. Take for instance a distribution with a mean of 2 and a standard deviation of 1, and a distribution with a mean of 10 and a standard deviation of 2. If you mix these two distributions (i.e. draw each value from one or the other, rather than summing independent draws), you get a density with two peaks, one at 2-ish and one at 10-ish, and the result is therefore not a normal distribution. That kind of combination only stays approximately normal if the original distributions are either very similar or there are enough of them that the peaks and troughs even out.
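A small Python sketch of that distinction (mixing vs. summing, using the example numbers above):
import random

def mixture_draw():
    """Pick one of the two sources per draw: the resulting density is bimodal."""
    return random.gauss(2, 1) if random.random() < 0.5 else random.gauss(10, 2)

def sum_draw():
    """Sum independent draws from both: the result is again normal, N(12, sqrt(5))."""
    return random.gauss(2, 1) + random.gauss(10, 2)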

Converting a Uniform Distribution to a Normal Distribution

How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution? What if I want a mean and standard deviation of my choosing?
There are plenty of methods:
Do not use Box-Muller, especially if you draw many Gaussian numbers. Box-Muller yields a result which is clamped between -6 and 6 (assuming double precision; things worsen with floats). And it is really less efficient than other available methods.
Ziggurat is fine, but needs a table lookup (and some platform-specific tweaking due to cache size issues).
Ratio-of-uniforms is my favorite: only a few additions/multiplications, and a log roughly 1/50th of the time (e.g. look there); a sketch follows after this list.
Inverting the CDF is efficient (and overlooked - why?); you have fast implementations of it available if you search Google. It is mandatory for quasi-random numbers.
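A minimal Python sketch of the ratio-of-uniforms method for a standard normal (the Kinderman-Monahan acceptance test, without the usual squeeze optimizations; the function name is mine):
import math
import random

def normal_ratio_of_uniforms():
    """Ratio-of-uniforms for N(0,1): accept x = v/u when x^2 <= -4*log(u)."""
    b = math.sqrt(2.0 / math.e)      # bound on |v| for the acceptance region
    while True:
        u = random.random()
        if u == 0.0:
            continue                 # avoid log(0)
        v = random.uniform(-b, b)
        x = v / u
        if x * x <= -4.0 * math.log(u):
            return x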
The Ziggurat algorithm is pretty efficient for this, although the Box-Muller transform is easier to implement from scratch (and not crazy slow).
Changing the distribution of any function to another involves using the inverse of the function you want.
In other words, if you aim for a specific probability density function p(x), you get the distribution by integrating over it -> d(x) = integral(p(x)), and use its inverse: Inv(d(x)). Now take the output of your random probability function (which has a uniform distribution) and pass that value through Inv(d(x)). You should get random values distributed according to the function you chose.
This is the generic math approach - by using it you can now choose any probability or distribution function you have, as long as it has an inverse or a good inverse approximation.
Hope this helped, and thanks for the small remark about using the distribution and not the probability itself.
Here is a javascript implementation using the polar form of the Box-Muller transformation.
/*
* Returns member of set with a given mean and standard deviation
* mean: mean
* standard deviation: std_dev
*/
function createMemberInNormalDistribution(mean,std_dev){
return mean + (gaussRandom()*std_dev);
}
/*
* Returns random number in normal distribution centering on 0.
* ~95% of numbers returned should fall between -2 and 2
* ie within two standard deviations
*/
function gaussRandom() {
var u = 2*Math.random()-1;
var v = 2*Math.random()-1;
var r = u*u + v*v;
/* reject the origin and points outside the unit circle, then start over */
if(r == 0 || r >= 1) return gaussRandom();
var c = Math.sqrt(-2*Math.log(r)/r);
return u*c;
/* todo: optimize this algorithm by caching (v*c)
* and returning next time gaussRandom() is called.
* left out for simplicity */
}
Where R1, R2 are random uniform numbers in (0, 1]:
NORMAL DISTRIBUTION, with SD of 1:
sqrt(-2*log(R1))*cos(2*pi*R2)
This is exact... no need to do all those slow loops!
Reference: dspguide.com/ch2/6.htm
Use the central limit theorem (Wikipedia entry, MathWorld entry) to your advantage.
Generate n of the uniformly distributed numbers, sum them, subtract n*0.5, and you have the output of an approximately normal distribution with mean equal to 0 and variance equal to n/12 (see the Wikipedia entry on uniform distributions for that last one).
n = 10 gives you something half decent, fast. If you want something more than half decent, go for Tyler's solution (as noted in the Wikipedia entry on normal distributions).
I would use Box-Muller. Two things about this:
You end up with two values per iteration
Typically, you cache one value and return the other. On the next call for a sample, you return the cached value.
Box-Muller gives a Z-score
You have to then scale the Z-score by the standard deviation and add the mean to get the full value in the normal distribution.
It seems incredible that I could add something to this after eight years, but for the case of Java I would like to point readers to the Random.nextGaussian() method, which generates a Gaussian distribution with mean 0.0 and standard deviation 1.0 for you.
A simple addition and/or multiplication will change the mean and standard deviation to your needs.
The standard Python library module random has what you want:
normalvariate(mu, sigma)
Normal distribution. mu is the mean, and sigma is the standard deviation.
For the algorithm itself, take a look at the function in random.py in the Python library.
The manual entry is here
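For example, a few draws with a mean and standard deviation of your choosing:
import random

# five draws from N(mean=50, sd=3)
samples = [random.normalvariate(50, 3) for _ in range(5)]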
This is a Matlab implementation using the polar form of the Box-Muller transformation:
Function randn_box_muller.m:
function [values] = randn_box_muller(n, mean, std_dev)
if nargin == 1
mean = 0;
std_dev = 1;
end
r = gaussRandomN(n);
values = r.*std_dev + mean;   % scale by std_dev and shift by mean
end
function [values] = gaussRandomN(n)
[u, v, r] = gaussRandomNValid(n);
c = sqrt(-2*log(r)./r);
values = u.*c;
end
function [u, v, r] = gaussRandomNValid(n)
r = zeros(n, 1);
u = zeros(n, 1);
v = zeros(n, 1);
filter = r==0 | r>=1;
% if outside interval [0,1] start over
while n ~= 0
u(filter) = 2*rand(n, 1)-1;
v(filter) = 2*rand(n, 1)-1;
r(filter) = u(filter).*u(filter) + v(filter).*v(filter);
filter = r==0 | r>=1;
n = size(r(filter),1);
end
end
Invoking histfit(randn_box_muller(10000000), 100) produces a histogram with a fitted normal curve (figure not shown here).
Obviously it is really inefficient compared with the Matlab built-in randn.
This is my JavaScript implementation of Algorithm P (Polar method for normal deviates) from Section 3.4.1 of Donald Knuth's book The Art of Computer Programming:
function normal_random(mean,stddev)
{
var V1
var V2
var S
do{
var U1 = Math.random() // uniformly distributed in [0,1)
var U2 = Math.random()
V1 = 2*U1-1
V2 = 2*U2-1
S = V1*V1+V2*V2
}while(S >= 1)
if(S===0) return 0
return mean+stddev*(V1*Math.sqrt(-2*Math.log(S)/S))
}
I think you should try this in Excel: =norminv(rand();0;1). This will produce random numbers which are normally distributed with zero mean and unit variance. "0" can be supplied with any value, so that the numbers will be of the desired mean, and by changing "1" you set the standard deviation, so the variance will equal the square of your input.
For example: =norminv(rand();50;3) will yield normally distributed numbers with MEAN = 50 and VARIANCE = 9.
Q: How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution?
For a software implementation, I know a couple of generator names which give you a pseudo-uniform random sequence in [0,1] (Mersenne Twister, Linear Congruential Generator). Let's call it U(x).
The relevant mathematical area is called probability theory.
First thing: if you want to model an r.v. with cumulative distribution F, then you can just evaluate F^-1(U(x)). In probability theory it is proved that such an r.v. will have cumulative distribution F.
This is applicable for generating an r.v. ~ F without any numerical methods whenever F^-1 can be derived analytically without problems (e.g. the exponential distribution).
To model the normal distribution you can calculate y2*cos(y1), where y1 is uniform in [0, 2pi] and y2 follows the Rayleigh distribution.
Q: What if I want a mean and standard deviation of my choosing?
You can calculate sigma*N(0,1) + m.
It can be shown that such shifting and scaling leads to N(m, sigma).
I have the following code which maybe could help:
set.seed(123)
n <- 1000
u <- runif(n)                              # uniform draws; note u = exp(-x) below
x <- -log(u)                               # exponential envelope samples
y <- runif(n, max=u*sqrt((2*exp(1))/pi))   # uniform height under the envelope at x
z <- ifelse (y < dnorm(x)/2, -x, NA)       # accept with a negative sign
z <- ifelse ((y > dnorm(x)/2) & (y < dnorm(x)), x, z)  # accept with a positive sign
z <- z[!is.na(z)]                          # accepted draws are approximately N(0,1)
It is also easier to use the built-in function rnorm(), since it is faster than writing a random number generator for the normal distribution yourself. See the following code as proof:
n <- length(z)
t0 <- Sys.time()
z <- rnorm(n)
t1 <- Sys.time()
t1-t0
// Generic rejection sampling (pseudocode): draw x uniformly over the domain,
// then accept it with probability proportional to the target density at x.
function distRandom(){
  do{
    x=random(DISTRIBUTION_DOMAIN);
  }while(random(DISTRIBUTION_RANGE)>=distributionFunction(x));
  return x;
}
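A concrete rendering of that pseudocode as a Python sketch (the N(0,1) target, the truncation to [-6, 6], and the function name are assumptions for illustration):
import math
import random

def dist_random():
    """Rejection-sample N(0,1) truncated to [-6, 6]; the density peak bounds the range."""
    peak = 1.0 / math.sqrt(2.0 * math.pi)   # density maximum, attained at x = 0
    while True:
        x = random.uniform(-6.0, 6.0)
        if random.uniform(0.0, peak) < peak * math.exp(-x * x / 2.0):
            return x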

Resources