How to generate random number in range of truncated normal distribution - go

I need to generate a value in range of truncated normal distribution, for example, in python you could use scipy.stats.truncnorm() to make
def get_truncated_normal(mean=.0, sd=1., low=.0, upp=10.):
return truncnorm((low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)
how it is described here
Is there any package to make something in go or should I write the following function myself?
I tried following, how the doc says, but it makes number not in the needed range:
func GenerateTruncatedNormal(mean, sd uint64) float64 {
return rand.NormFloat64() * (float64)(sd + mean)
}
GenerateTruncatedNormal(10, 5)
makes 16.61, -14.54, or even 32.8, but I expect a small chance of getting 15 due to mean = 10 -> 10 + 5 = 15 is maximum value which we can get. What is wrong here? 😅

One way of achieving this consists of
generating a number x from the Normal distribution, with the desired parameters for mean and standard deviation,
if it's outside the range [low..high], then throw it away and try again.
This respects the Probability Density Function of the Normal distribution, effectively cutting out the left and right tails.
func TruncatedNormal(mean, stdDev, low, high float64) float64 {
if low >= high {
panic("high must be greater than low")
}
for {
x := rand.NormFloat64()*stdDev + mean
if low <= x && x < high {
return x
}
// fmt.Println("missed!", x)
}
}
Playground
If the [low..high] interval is very narrow, then it will take a bit more computation time, as more generated numbers will be thrown away. However, it still converges very fast in practice.
I checked the code above by plotting its results against and compare them to the results of scipy's truncnorm, and they do produce equivalent charts.

Related

GoLang for loop with floats creates error

Can someone explain the following. I have a function in go that accepts a couple of float64 and then uses this value to calculate a lot of other values. The function looks like
func (g *Geometry) CalcStresses(x, zmax, zmin float64)(Vertical)
the result is put into a struct like
type Vertical struct {
X float64
Stresses []Stress
}
Now the funny thing is this. If I call the function like this;
for i:=14.0; i<15.0; i+=0.1{
result := geo.CalcStresses(i, 10, -10)
}
then I get a lot of results where the Stress array is empty, antoher interesting detail is that x sometimes shows like a number with a LOT of decimals (like 14.3999999999999999998)
However, if I call the function like this;
for i:=0; i<10; i++{
x := 14.0 + float64(i) * 0.1
result := geo.CalcStresses(x,10,-10)
}
then everything is fine.
Does anyone know why this happens?
Thanks in advance,
Rob
Not all real numbers can be represented precisely in binary floating point format, therefore looping over floating point number is asking for trouble.
From Wikipedia on Floating point
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
For example, the non-representability of 0.1 and 0.01 (in binary) means that the result of attempting to square 0.1 is neither 0.01 nor the representable number closest to it.
This code
for i := 14.0; i < 15.0; i += 0.1 {
fmt.Println(i)
}
produces this
14
14.1
14.2
14.299999999999999
14.399999999999999
14.499999999999998
14.599999999999998
14.699999999999998
14.799999999999997
14.899999999999997
14.999999999999996
You may use math.big.Rat type to represent rational numbers accurately.
Example
x := big.NewRat(14, 1)
y := big.NewRat(15, 1)
z := big.NewRat(1, 10)
for i := x; i.Cmp(y) < 0; i = i.Add(i, z) {
v, _ := i.Float64()
fmt.Println(v)
}

How to generate a random number from a random bit generator and guarantee termination?

Assuming I have a function that returns a random bit, is it possible to write a function that uniformly generates a random number within a certain range and always terminates?
I know how to do this so that it should (and probably will) terminate. I was just wondering if it's possible to write one that is guaranteed to terminate (and it doesn't have to be particularly efficient. What complexity would it have?
Here is a code for the not always terminating version
int random(int n)
{
while(true)
{
int r = 0;
for (int i = 0; i < ceil(log(n)); i++)
{
r = r<<1;
r = r|getRandomBit();
}
if(r<n)
{
return r;
}
}
}
I think this will work:
Suppose you want to generate a number in the range [a,b]
Generate a fraction r in range [0,1} using a binary radix. That means generate a number of form 0.x1x2x3.... where every x is either a 0 or 1 using your random function.
Once you have that, you can easily generate a number in the range [0,b-a], by computing ceil(r*(b-a)), and then simply add a to get a number in range [a,b]
If the size of the range isn't a power of 2, you can't get an exactly uniform distribution except through what amounts to rejection sampling. You can get as close as you like to uniform, however, by sampling once from a large range, and dividing the smaller range into it.
For instance, while you can't uniformly sample between 1 and 10, you can quite easily sample between 1 and 1024 by picking 10 random bits, and figure out some way of equitably dividing that into 10 intervals of about the same size.
Choosing additional bits has the effect of halving the largest error (from true uniformity) you have to see in your choices... so the error decreases exponentially as you choose more bits.

How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?

Generate a random number in range [x..y] where x and y are any arbitrary floating point numbers. Use function random(), which returns a random floating point number in range [0..1] from P uniformly distributed numbers (call it "density"). Uniform distribution must be preserved and P must be scaled as well.
I think, there is no easy solution for such problem. To simplify it a bit, I ask you how to generate a number in interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformness and density? Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers.
The obvious simple solution random() * (x - y) + y will generate not all possible numbers because of the lower density for all abs(x-y)>1.0 cases. Many possible values will be missed. Remember, that random() returns only a number from P possible numbers. Then, if you multiply such number by Q, it will give you only one of P possible values, scaled by Q, but you have to scale density P by Q as well.
If I understand you problem well, I will provide you a solution: but I would exclude 1, from the range.
N = numbers_in_your_random // [0, 0.2, 0.4, 0.6, 0.8] will be 5
// This turns your random number generator to return integer values between [0..N[;
function randomInt()
{
return random()*N;
}
// This turns the integer random number generator to return arbitrary
// integer
function getRandomInt(maxValue)
{
if (maxValue < N)
{
return randomInt() % maxValue;
}
else
{
baseValue = randomInt();
bRate = maxValue DIV N;
bMod = maxValue % N;
if (baseValue < bMod)
{
bRate++;
}
return N*getRandomInt(bRate) + baseValue;
}
}
// This will return random number in range [lower, upper[ with the same density as random()
function extendedRandom(lower, upper)
{
diff = upper - lower;
ndiff = diff * N;
baseValue = getRandomInt(ndiff);
baseValue/=N;
return lower + baseValue;
}
If you really want to generate all possible floating point numbers in a given range with uniform numeric density, you need to take into account the floating point format. For each possible value of your binary exponent, you have a different numeric density of codes. A direct generation method will need to deal with this explicitly, and an indirect generation method will still need to take it into account. I will develop a direct method; for the sake of simplicity, the following refers exclusively to IEEE 754 single-precision (32-bit) floating point numbers.
The most difficult case is any interval that includes zero. In that case, to produce an exactly even distribution, you will need to handle every exponent down to the lowest, plus denormalized numbers. As a special case, you will need to split zero into two cases, +0 and -0.
In addition, if you are paying such close attention to the result, you will need to make sure that you are using a good pseudorandom number generator with a large enough state space that you can expect it to hit every value with near-uniform probability. This disqualifies the C/Unix rand() and possibly the*rand48() library functions; you should use something like the Mersenne Twister instead.
The key is to dissect the target interval into subintervals, each of which is covered by different combination of binary exponent and sign: within each subinterval, floating point codes are uniformly distributed.
The first step is to select the appropriate subinterval, with probability proportional to its size. If the interval contains 0, or otherwise covers a large dynamic range, this may potentially require a number of random bits up to the full range of the available exponent.
In particular, for a 32-bit IEEE-754 number, there are 256 possible exponent values. Each exponent governs a range which is half the size of the next greater exponent, except for the denormalized case, which is the same size as the smallest normal exponent region. Zero can be considered the smallest denormalized number; as mentioned above, if the target interval straddles zero, the probability of each of +0 and -0 should perhaps be cut in half, to avoid doubling its weight.
If the subinterval chosen covers the entire region governed by a particular exponent, all that is necessary is to fill the mantissa with random bits (23 bits, for 32-bit IEEE-754 floats). However, if the subinterval does not cover the entire region, you will need to generate a random mantissa that covers only that subinterval.
The simplest way to handle both the initial and secondary random steps may be to round the target interval out to include the entirety of all exponent regions partially covered, then reject and retry numbers that fall outside it. This allows the exponent to be generated with simple power-of-2 probabilities (e.g., by counting the number of leading zeroes in your random bitstream), as well as providing a simple and accurate way of generating a mantissa that covers only part of an exponent interval. (This is also a good way of handling the +/-0 special case.)
As another special case: to avoid inefficient generation for target intervals which are much smaller than the exponent regions they reside in, the "obvious simple" solution will in fact generate fairly uniform numbers for such intervals. If you want exactly uniform distributions, you can generate the sub-interval mantissa by using only enough random bits to cover that sub-interval, while still using the aforementioned rejection method to eliminate values outside the target interval.
well, [0..1] * 2 == [0..2] (still uniform)
[0..1] - 0.5 == [-0.5..0.5] etc.
I wonder where have you experienced such an interview?
Update: well, if we want to start caring about losing precision on multiplication (which is weird, because somehow you did not care about that in the original task, and pretend we care about "number of values", we can start iterating. In order to do that, we need one more function, which would return uniformly distributed random values in [0..1) — which can be done by dropping the 1.0 value would it ever appear. After that, we can slice the whole range in equal parts small enough to not care about losing precision, choose one randomly (we have enough randomness to do that), and choose a number in this bucket using [0..1) function for all parts but the last one.
Or, you can come up with a way to code enough values to care about—and just generate random bits for this code, in which case you don't really care whether it's [0..1] or just {0, 1}.
Let me rephrase your question:
Let random() be a random number generator with a discrete uniform distribution over [0,1). Let D be the number of possible values returned by random(), each of which is precisely 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is precisely 1/D greater than the previous.
--
A couple quick notes.
The problem in this form, and as you phrased it is unsolvable. That
is, if N = 1 there is nothing we can do.
I don't require that 0.0 be one of the possible values for random(). If it is not, then it is possible that the solution below will fail when U - L < 1 / D. I'm not particularly worried about that case.
I use all half-open ranges because it makes the analysis simpler. Using your closed ranges would be simple, but tedious.
Finally, the good stuff. The key insight here is that the density can be maintained by independently selecting the whole and fractional parts of the result.
First, note that given random() it is trivial to create randomBit(). That is,
randomBit() { return random() >= 0.5; }
Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random, that is simple using randomBit(), just generate each of the bits. Call this random2(N).
Using random2() we can select one of {0, 1, 2, ..., N - 1}:
randomInt(N) { while ((val = random2(ceil(log2(N)))) >= N); return val; }
Now, if D is known, then the problem is trivial as we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random and we can do that with randomInt().
So, let's assume that D is not known. Now, let's first make a function to generate random values in the range [0, 2^N) with the proper density. This is simple.
rand2D(N) { return random2(N) + random(); }
rand2D() is where we require that the difference between consecutive possible values for random() be precisely 1/D. If not, the possible values here would not have uniform density.
Next, we need a function that selects a value in the range [0, V) with the proper density. This is similar to randomInt() above.
randD(V) { while ((val = rand2D(ceil(log2(V)))) >= V); return val; }
And finally...
rand(L, U) { return L + randD(U - L); }
We now may have offset the discrete positions if L / D is not an integer, but that is unimportant.
--
A last note, you may have noticed that several of these functions may never terminate. That is essentially a requirement. For example, random() may have only a single bit of randomness. If I then ask you to select from one of three values, you cannot do so uniformly at random with a function that is guaranteed to terminate.
Consider this approach:
I'm assuming the base random number generator in the range [0..1]
generates among the numbers
0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1)
If the target interval length is less than or equal to 1,
return random()*(y-x) + x.
Else, map each number r from the base RNG to an interval in the
target range:
[r*(p-1)*(y-x)/p, (r+1/(p-1))*(p-1)*(y-x)/p]
(i.e. for each of the P numbers assign one of P intervals with length (y-x)/p)
Then recursively generate another random number in that interval and
add it to the interval begin.
Pseudocode:
const p;
function rand(x, y)
r = random()
if y-x <= 1
return x + r*(y-x)
else
low = r*(p-1)*(y-x)/p
high = low + (y-x)/p
return x + low + rand(low, high)
In real math: the solution is just the provided:
return random() * (upper - lower) + lower
The problem is that, even when you have floating point numbers, only have a certain resolution. So what you can do is apply above function and add another random() value scaled to the missing part.
If I make a practical example it becomes clear what I mean:
E.g. take random() return value from 0..1 with 2 digits accuracy, ie 0.XY, and lower with 100 and upper with 1100.
So with above algorithm you get as result 0.XY * (1100-100) + 100 = XY0.0 + 100.
You will never see 201 as result, as the final digit has to be 0.
Solution here would be to generate again a random value and add it *10, so you have accuracy of one digit (here you have to take care that you dont exceed your given range, which can happen, in this case you have to discard the result and generate a new number).
Maybe you have to repeat it, how often depends on how many places the random() function delivers and how much you expect in your final result.
In a standard IEEE format has a limited precision (i.e. double 53 bits). So when you generate a number this way, you never need to generate more than one additional number.
But you have to be careful that when you add the new number, you dont exceed your given upper limit. There are multiple solutions to it: First if you exceed your limit, you start from new, generating a new number (dont cut off or similar, as this changes the distribution).
Second possibility is to check the the intervall size of the missing lower bit range, and
find the middle value, and generate an appropiate value, that guarantees that the result will fit.
You have to consider the amount of entropy that comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how you can accumulate entropy from low-entropy source(s) and end up with a high-entropy random value.
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
namespace SO_8019589
{
class LowEntropyRandom
{
public readonly double EffectiveEntropyBits;
public readonly int PossibleOutcomeCount;
private readonly double interval;
private readonly Random random = new Random();
public LowEntropyRandom(int possibleOutcomeCount)
{
PossibleOutcomeCount = possibleOutcomeCount;
EffectiveEntropyBits = Math.Log(PossibleOutcomeCount, 2);
interval = 1.0 / PossibleOutcomeCount;
}
public LowEntropyRandom(int possibleOutcomeCount, int seed)
: this(possibleOutcomeCount)
{
random = new Random(seed);
}
public int Next()
{
return random.Next(PossibleOutcomeCount);
}
public double NextDouble()
{
return interval * Next();
}
}
class EntropyAccumulator
{
private List<byte> currentEntropy = new List<byte>();
public double CurrentEntropyBits { get; private set; }
public void Clear()
{
currentEntropy.Clear();
CurrentEntropyBits = 0;
}
public void Add(byte[] entropy, double effectiveBits)
{
currentEntropy.AddRange(entropy);
CurrentEntropyBits += effectiveBits;
}
public byte[] GetBytes(int count)
{
using (var hasher = new SHA512Managed())
{
count = Math.Min(count, hasher.HashSize / 8);
var bytes = new byte[count];
var hash = hasher.ComputeHash(currentEntropy.ToArray());
Array.Copy(hash, bytes, count);
return bytes;
}
}
public byte[] GetPackagedEntropy()
{
// Returns a compact byte array that represents almost all of the entropy.
return GetBytes((int)(CurrentEntropyBits / 8));
}
public double GetDouble()
{
// returns a uniformly distributed number on [0-1)
return (double)BitConverter.ToUInt64(GetBytes(8), 0) / ((double)UInt64.MaxValue + 1);
}
public double GetInt(int maxValue)
{
// returns a uniformly distributed integer on [0-maxValue)
return (int)(maxValue * GetDouble());
}
}
class Program
{
static void Main(string[] args)
{
var random = new LowEntropyRandom(2); // this only provides 1 bit of entropy per call
var desiredEntropyBits = 64; // enough for a double
while (true)
{
var adder = new EntropyAccumulator();
while (adder.CurrentEntropyBits < desiredEntropyBits)
{
adder.Add(BitConverter.GetBytes(random.Next()), random.EffectiveEntropyBits);
}
Console.WriteLine(adder.GetDouble());
Console.ReadLine();
}
}
}
}
Since I'm using a 512-bit hash function, that is the max amount of entropy that you can get out of the EntropyAccumulator. This could be fixed, if necessarily.
If I understand your problem correctly, it's that rand() generates finely spaced but ultimately discrete random numbers. And if we multiply it by (y-x) which is large, this spreads these finely spaced floating point values out in a way that is missing many of the floating point values in the range [x,y]. Is that all right?
If so, I think we have a solution already given by Dialecticus. Let me explain why he is right.
First, we know how to generate a random float and then add another floating point value to it. This may produce a round off error due to addition, but it will be in the last decimal place only. Use doubles or something with finer numerical resolution if you want better precision. So, with that caveat, the problem is no harder than finding a random float in the range [0,y-x] with uniform density. Let's say y-x = z. Obviously, since z is a floating point it may not be an integer. We handle the problem in two steps: first we generate the random digits to the left of the decimal point and then generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed across the range [0,z] too. Let w be the largest integer <= z. To answer our simplified problem, we can first pick a random integer from the range {0,1,...,w}. Then, step #2 is to add a random float from the unit interval to this random number. This isn't multiplied by any possibly large values, so it has as fine a resolution as the numerical type can have. (Assuming you're using an ideal random floating point number generator.)
So what about the corner case where the random integer was the largest one (i.e. w) and the random float we added to it was larger than z - w so that the random number exceeds the allowed maximum? The answer is simple: do all of it again and check the new result. Repeat until you get a digit in the allowed range. It's an easy proof that a uniformly generated random number which is tossed out and generated again if it's outside an allowed range results in a uniformly generated random in the allowed range. Once you make this key observation, you see that Dialecticus met all your criteria.
When you generate a random number with random(), you get a floating point number between 0 and 1 having an unknown precision (or density, you name it).
And when you multiply it with a number (NUM), you lose this precision, by lg(NUM) (10-based logarithm). So if you multiply by 1000 (NUM=1000), you lose the last 3 digits (lg(1000) = 3).
You may correct this by adding a smaller random number to the original, which has this missing 3 digits. But you don't know the precision, so you can't determine where are they exactly.
I can imagine two scenarios:
(X = range start, Y = range end)
1: you define the precision (PREC, eg. 20 digits, so PREC=20), and consider it enough to generate a random number, so the expression will be:
( random() * (Y-X) + X ) + ( random() / 10 ^ (PREC-trunc(lg(Y-X))) )
with numbers: (X = 500, Y = 1500, PREC = 20)
( random() * (1500-500) + 500 ) + ( random() / 10 ^ (20-trunc(lg(1000))) )
( random() * 1000 + 500 ) + ( random() / 10 ^ (17) )
There are some problems with this:
2 phase random generation (how much will it be random?)
the first random returns 1 -> result can be out of range
2: guess the precision by random numbers
you define some tries (eg. 4) to calculate the precision by generating random numbers and count the precision every time:
- 0.4663164 -> PREC=7
- 0.2581916 -> PREC=7
- 0.9147385 -> PREC=7
- 0.129141 -> PREC=6 -> 7, correcting by the average of the other tries
That's my idea.

Encode number to a result

In my app I need to run a 5 digits number through an algorithm and return a number between the given interval, ie:
The function encode, gets 3 parameters, 5 digits initial number, interval lower limit and interval superior limit, for example:
int res=encode(12879,10,100) returns 83.
The function starts from 12879 and does something with the numbers and returns a number between 10 and 100. This mustn't be random, every time I pass the number 12879 to the encode function must always return the same number.
Any ideas?
Thanks,
Direz
One possible approach:
compute the range of your interval R = (100 - 10) + 1
compute a hash modulo R of the input H = hash(12879) % R
add the lower bound to the modular hash V = 10 + H
Here the thing though - you haven't defined any constraints or requirements on the "algorithm" that produces the result. If all you want is to map a value into a given range (without any knowledge of the distribution of the input, or how input values may cluster, etc), you could just as easily just take the range modulo of the input without hashing (as Foo Bah demonstrates).
If there are certain constraints, requirements, or distributions of the input or output of your encode method, then the approach may need to be quite different. However, you are the only one who knows what additional requirements you have.
You can do something simple like
encode(x,y,z) --> y + (x mod (z-y))
You don't have an upper limit for this function?
Assume it is 99999 because it is 5 digits. For your case, the simplest way is:
int encode (double N,double H,double L)
{
return (int)(((H - L) / (99999 - 10000)) * (N - 10000) + 10);
}

Converting a Uniform Distribution to a Normal Distribution

How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution? What if I want a mean and standard deviation of my choosing?
There are plenty of methods:
Do not use Box Muller. Especially if you draw many gaussian numbers. Box Muller yields a result which is clamped between -6 and 6 (assuming double precision. Things worsen with floats.). And it is really less efficient than other available methods.
Ziggurat is fine, but needs a table lookup (and some platform-specific tweaking due to cache size issues)
Ratio-of-uniforms is my favorite, only a few addition/multiplications and a log 1/50th of the time (eg. look there).
Inverting the CDF is efficient (and overlooked, why ?), you have fast implementations of it available if you search google. It is mandatory for Quasi-Random numbers.
The Ziggurat algorithm is pretty efficient for this, although the Box-Muller transform is easier to implement from scratch (and not crazy slow).
Changing the distribution of any function to another involves using the inverse of the function you want.
In other words, if you aim for a specific probability function p(x) you get the distribution by integrating over it -> d(x) = integral(p(x)) and use its inverse: Inv(d(x)). Now use the random probability function (which have uniform distribution) and cast the result value through the function Inv(d(x)). You should get random values cast with distribution according to the function you chose.
This is the generic math approach - by using it you can now choose any probability or distribution function you have as long as it have inverse or good inverse approximation.
Hope this helped and thanks for the small remark about using the distribution and not the probability itself.
Here is a javascript implementation using the polar form of the Box-Muller transformation.
/*
* Returns member of set with a given mean and standard deviation
* mean: mean
* standard deviation: std_dev
*/
function createMemberInNormalDistribution(mean,std_dev){
return mean + (gaussRandom()*std_dev);
}
/*
* Returns random number in normal distribution centering on 0.
* ~95% of numbers returned should fall between -2 and 2
* ie within two standard deviations
*/
function gaussRandom() {
var u = 2*Math.random()-1;
var v = 2*Math.random()-1;
var r = u*u + v*v;
/*if outside interval [0,1] start over*/
if(r == 0 || r >= 1) return gaussRandom();
var c = Math.sqrt(-2*Math.log(r)/r);
return u*c;
/* todo: optimize this algorithm by caching (v*c)
* and returning next time gaussRandom() is called.
* left out for simplicity */
}
Where R1, R2 are random uniform numbers:
NORMAL DISTRIBUTION, with SD of 1:
sqrt(-2*log(R1))*cos(2*pi*R2)
This is exact... no need to do all those slow loops!
Reference: dspguide.com/ch2/6.htm
Use the central limit theorem wikipedia entry mathworld entry to your advantage.
Generate n of the uniformly distributed numbers, sum them, subtract n*0.5 and you have the output of an approximately normal distribution with mean equal to 0 and variance equal to (1/12) * (1/sqrt(N)) (see wikipedia on uniform distributions for that last one)
n=10 gives you something half decent fast. If you want something more than half decent go for tylers solution (as noted in the wikipedia entry on normal distributions)
I would use Box-Muller. Two things about this:
You end up with two values per iteration
Typically, you cache one value and return the other. On the next call for a sample, you return the cached value.
Box-Muller gives a Z-score
You have to then scale the Z-score by the standard deviation and add the mean to get the full value in the normal distribution.
It seems incredible that I could add something to this after eight years, but for the case of Java I would like to point readers to the Random.nextGaussian() method, which generates a Gaussian distribution with mean 0.0 and standard deviation 1.0 for you.
A simple addition and/or multiplication will change the mean and standard deviation to your needs.
The standard Python library module random has what you want:
normalvariate(mu, sigma)
Normal distribution. mu is the mean, and sigma is the standard deviation.
For the algorithm itself, take a look at the function in random.py in the Python library.
The manual entry is here
This is a Matlab implementation using the polar form of the Box-Muller transformation:
Function randn_box_muller.m:
function [values] = randn_box_muller(n, mean, std_dev)
if nargin == 1
mean = 0;
std_dev = 1;
end
r = gaussRandomN(n);
values = r.*std_dev - mean;
end
function [values] = gaussRandomN(n)
[u, v, r] = gaussRandomNValid(n);
c = sqrt(-2*log(r)./r);
values = u.*c;
end
function [u, v, r] = gaussRandomNValid(n)
r = zeros(n, 1);
u = zeros(n, 1);
v = zeros(n, 1);
filter = r==0 | r>=1;
% if outside interval [0,1] start over
while n ~= 0
u(filter) = 2*rand(n, 1)-1;
v(filter) = 2*rand(n, 1)-1;
r(filter) = u(filter).*u(filter) + v(filter).*v(filter);
filter = r==0 | r>=1;
n = size(r(filter),1);
end
end
And invoking histfit(randn_box_muller(10000000),100); this is the result:
Obviously it is really inefficient compared with the Matlab built-in randn.
This is my JavaScript implementation of Algorithm P (Polar method for normal deviates) from Section 3.4.1 of Donald Knuth's book The Art of Computer Programming:
function normal_random(mean,stddev)
{
var V1
var V2
var S
do{
var U1 = Math.random() // return uniform distributed in [0,1[
var U2 = Math.random()
V1 = 2*U1-1
V2 = 2*U2-1
S = V1*V1+V2*V2
}while(S >= 1)
if(S===0) return 0
return mean+stddev*(V1*Math.sqrt(-2*Math.log(S)/S))
}
I thing you should try this in EXCEL: =norminv(rand();0;1). This will product the random numbers which should be normally distributed with the zero mean and unite variance. "0" can be supplied with any value, so that the numbers will be of desired mean, and by changing "1", you will get the variance equal to the square of your input.
For example: =norminv(rand();50;3) will yield to the normally distributed numbers with MEAN = 50 VARIANCE = 9.
Q How can I convert a uniform distribution (as most random number generators produce, e.g. between 0.0 and 1.0) into a normal distribution?
For software implementation I know couple random generator names which give you a pseudo uniform random sequence in [0,1] (Mersenne Twister, Linear Congruate Generator). Let's call it U(x)
It is exist mathematical area which called probibility theory.
First thing: If you want to model r.v. with integral distribution F then you can try just to evaluate F^-1(U(x)). In pr.theory it was proved that such r.v. will have integral distribution F.
Step 2 can be appliable to generate r.v.~F without usage of any counting methods when F^-1 can be derived analytically without problems. (e.g. exp.distribution)
To model normal distribution you can cacculate y1*cos(y2), where y1~is uniform in[0,2pi]. and y2 is the relei distribution.
Q: What if I want a mean and standard deviation of my choosing?
You can calculate sigma*N(0,1)+m.
It can be shown that such shifting and scaling lead to N(m,sigma)
I have the following code which maybe could help:
set.seed(123)
n <- 1000
u <- runif(n) #creates U
x <- -log(u)
y <- runif(n, max=u*sqrt((2*exp(1))/pi)) #create Y
z <- ifelse (y < dnorm(x)/2, -x, NA)
z <- ifelse ((y > dnorm(x)/2) & (y < dnorm(x)), x, z)
z <- z[!is.na(z)]
It is also easier to use the implemented function rnorm() since it is faster than writing a random number generator for the normal distribution. See the following code as prove
n <- length(z)
t0 <- Sys.time()
z <- rnorm(n)
t1 <- Sys.time()
t1-t0
function distRandom(){
do{
x=random(DISTRIBUTION_DOMAIN);
}while(random(DISTRIBUTION_RANGE)>=distributionFunction(x));
return x;
}

Resources