Normally distributed random function without irrational operations

I'm working on a game for which I want deterministic demo playback that is portable between architectures that treat floating point numbers differently. I'm using the Racket language, which conveniently has, as a primitive data type, non-floating-point representations of rational-number fractions. I want to use these to implement an approximately normally-distributed random function that accepts parameters for mean and standard deviation (skewness would be gold-plating).
Because of the limitations I've mentioned, any operation that takes in rational numbers and puts out irrational ones will need to be reimplemented from scratch in a way that produces approximations based on Racket's native fractions, not based on floating points. I've looked around at various algorithms for normal random functions, but of these, even many of the "simplest" ones like the Box-Muller transform involve things like square roots, logarithms, and trig functions. Iterated averaging is easy, so square roots aren't a problem, but I don't want to reinvent any more wheels than I need to here.
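To make "iterated averaging" concrete: it is just the Babylonian method, which never leaves the rationals. A minimal Python sketch using exact Fraction arithmetic (my real implementation is in Racket; the limit_denominator cap is my own addition to keep the fractions from growing without bound):

from fractions import Fraction

def sqrt_rational(x, iters=12, max_den=10**9):
    # Babylonian "iterated averaging" square root on exact Fractions.
    # Assumes x > 0; each step averages the guess g with x/g.
    x = Fraction(x)
    g = x if x >= 1 else Fraction(1)
    for _ in range(iters):
        g = ((g + x / g) / 2).limit_denominator(max_den)
    return g

print(sqrt_rational(2))  # a Fraction very close to √2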
What are some algorithms I can use for generating approximately normal random numbers without invoking irrational operations like roots, logarithms, and trig functions?

I settled on a solution after typing up this question but before sending it, so I'll Share My Knowledge Q&A-Style.
After poring over several different SO posts on normally-distributed random numbers, I found that the best solution for my purposes was actually the most naive one: abuse the Central Limit Theorem. Random variables of any distribution, when added up, approximate a normal distribution just fine. In Racket, my solution turned out to be the delightfully concise
(define (random/normal μ σ)
  (+ (* (- (for/sum ([i 12])
             (random/uniform 0 1))
           6)
        σ)
     μ))
where random/uniform is my function for generating uniformly random rational numbers.
In infix, imperative pseudocode, this means:
Function random_normal(μ, σ):
    iterations := 12
    sum := 0
    for i from 1 to iterations:
        sum += random_uniform(0, 1)
    sum -= iterations / 2  # center the distribution on 0
    return σ * sum + μ
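The same recipe in Python for comparison, kept entirely within exact Fraction arithmetic (random_uniform here is a stand-in built on an integer PRNG, so nothing ever leaves the rationals):

from fractions import Fraction
import random

def random_uniform(lo, hi, denom=2**32):
    # A uniformly random rational in [lo, hi) at denominator granularity.
    return lo + (hi - lo) * Fraction(random.randrange(denom), denom)

def random_normal(mu, sigma):
    # CLT trick: a sum of 12 U(0,1) draws has mean 6 and variance 1.
    total = sum(random_uniform(Fraction(0), Fraction(1)) for _ in range(12))
    return sigma * (total - 6) + mu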
Why 12 iterations?
A few SO answers mention this solution, but don't explain why 12 is a magic number here. When we add up those numbers, we want the standard deviation of that random sum to equal 1 so that we can stretch out or squish down the bell curve by the desired amount in a single multiplicative step.
If you sum a sample of N random variables, the standard deviation of the approximately normal distribution this creates is equal to

σ_x·√N

where σ_x is the standard deviation of the variables themselves.* The standard deviation of a uniform random distribution from 0 to 1 is equal to 1/√12,† so by substituting this in for σ_x we see that what we want is just

√N/√12 = 1

which works out easily to N = 12.
* See "Central Limit Theorem" on Wolfram MathWorld. Equation is given under identity (2), here multiplied by N to give the standard deviation of the sum rather than of the average.
† See "Continuous uniform distribution" on Wikipedia. Table on the right, "variance" square-rooted.
But doesn't this limit your range to ±6 standard deviations?
It does, but the range of your distribution has to be truncated somewhere unless you have infinite memory, and ±6σ is A) almost as good as Box-Muller on a 32-bit machine and B) already huge.

Related

How to efficiently sample the continuous negative binomial distribution?

First, for context: I am working on a game where doing something good earns positive credits and doing something bad earns negative credits. Each credit corresponds to flipping a biased coin: if you get heads, something happens (good if it's a positive credit, bad if it's a negative credit), and otherwise nothing happens.
The deal is that I want to handle multiple credits and fractional credits, and I would like flips to use up credits, so that if something good/bad happens the leftover credits carry over. A straightforward way of doing this is to just perform a bunch of trials; in particular, for fractional credits we can multiply the number of credits by X and multiply the likelihood of something happening by 1/X (the distribution has the same expectation but slightly different weights). Unfortunately, this places a practical limit on how many credits the user can get, and also on how many decimal places the number of credits can have, since it results in an unbounded amount of work.
What I would like to do is take advantage of the fact that I am sampling the continuous negative binomial distribution, which is the distribution of how many trials it takes to get heads; i.e., if f(X) is the distribution, then f(X) gives the probability that there will be X tails before we run into a heads, where X need not be an integer. If I can sample this distribution, then I can compare X, the number of tails, against the number of credits: if X is greater, we use up all of the credits but nothing happens; if X is less than or equal, something good happens and we subtract X from the number of credits. Furthermore, because the distribution is continuous, I can easily handle fractional credits.
Does anyone know of a way to efficiently sample the continuous negative binomial distribution (that is, a function that generates random numbers from this distribution)?
This question may be better answered on StatsExchange, but here I will take a stab at it.
You are correct that trying to compute this directly will be computationally expensive, as you cannot avoid the beta and/or gamma function dependencies. The only statistically valid approximation I'm aware of applies when the required number of successes s is large and p is neither very small nor very large; in that case you can approximate the distribution with a normal distribution with specially chosen mean and variance. You can read more here, but I'm guessing this approximation will not be generally applicable for you.
The negative binomial distribution can also be approximated as a mixture of Poisson distributions, but this doesn't save you from the gamma function dependency.
The only efficient class of negative binomial samplers that I'm aware of use optimized accept-reject techniques. Pages 10-11 of this PDF here describe the concept behind the method. Page 6 (page 295 internally) of this PDF here contains source code for sampling binomial deviates using related techniques. Note that even these methods still require random uniform deviates as well as sqrt(), log(), and gammln() calls. For small numbers of trials (less than 100, maybe?) I wouldn't be surprised at all if just simulating the trials with a fast random number generator is faster than even the accept-reject techniques. Definitely start by getting a fast PRNG; they are not all created equal.
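For instance, direct simulation for a single failure (r = 1) is just a couple of lines; a Python sketch, with the standard library PRNG standing in for your fast generator:

import random

def successes_before_failure(p):
    # Flip the coins directly: count "successes" (probability p) until
    # the first "failure". Expected number of flips is 1 / (1 - p).
    k = 0
    while random.random() < p:
        k += 1
    return k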
Edit:
The following pseudo-code would probably be fairly efficient for drawing a random discrete negative-binomial-distributed value, as long as p is not very large (too close to 1.0). It will return the number of trials required before reaching your first "desired" outcome (which is actually the first "failure" in terms of the distribution):
// assume p and r are the parameters to the neg. binomial dist.
// r = number of failures (you'll set it to one for your purpose)
// p = probability of a "success"
double rnd = _rnd.nextDouble(); // uniform in [0.0, 1.0)
int k = 0;                      // # of successes that occur before 1st failure
double lastPmf = Math.pow(1 - p, r);
double cdf = lastPmf;
while (cdf < rnd)
{
    lastPmf *= p * (k + r) / (k + 1);
    cdf += lastPmf;
    k++;
}
return k;
// or return (k + 1) to also count the trial on which the failure occurred
Using the recurrence relationship saves recomputing the factorials independently at each step. I think using this, combined with limiting your fractional precision to 1 or 2 decimal places (so you only need to multiply by 10 or 100, respectively), might work for your purposes. You are drawing only one random number and the rest is just multiplications--it should be quite fast.
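To illustrate, here is a hedged Python sketch of how the sampler might plug into the credit mechanic from the question (the function names and the two-decimal scaling are my own illustration; note the loop runs about scale / p iterations on average, so keep the scale modest):

import random

def ticks_before_event(q):
    # The CDF-inversion loop above with r = 1: q is the per-tick event
    # probability, so pmf(k) = q * (1 - q)^k. Assumes q > 0.
    u = random.random()
    k = 0
    pmf = q            # chance the event fires on the very first tick
    cdf = pmf
    while cdf < u:
        pmf *= 1.0 - q
        cdf += pmf
        k += 1
    return k           # uneventful ticks before the event

def spend_credits(credits, p_event, scale=100):
    # Two-decimal credits become integer ticks (credits * scale) with
    # per-tick probability p_event / scale, per the question's scaling.
    ticks = round(credits * scale)
    k = ticks_before_event(p_event / scale)
    if k >= ticks:
        return 0.0, False                 # credits exhausted, no event
    return (ticks - k - 1) / scale, True  # leftover credits, event fired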

Generating Gaussian Random Numbers without a Uniform Random Number Generator

I know many uniform random number generators (RNGs) based on algorithms, physical systems, and so on. Eventually, all of these lead to uniformly distributed random numbers. It's interesting and important to know whether there are Gaussian RNGs, i.e., algorithms or something else that create Gaussian random numbers directly. More precisely, I don't want to use transformations such as Box–Muller or the Marsaglia polar method to get Gaussian numbers from uniform RNGs. I am interested in any paper, algorithm, or even idea for creating Gaussian random numbers without any use of uniform RNGs. Let's just say we pretend we don't know that uniform random number generators exist.
As already noted in answers/comments, by virtue of the CLT a sum of any iid random variables can be made into something that looks reasonably Gaussian. If the incoming stream is uniform, this is basically the Bates distribution. Ami Tavory's answer pretty much amounts to using Bates in disguise. You could also look at the closely related Irwin–Hall distribution; at n = 12 or higher it looks a lot like a Gaussian.
There is one method which is used in practice and does not rely on transforming U(0,1) variates: the Wallace method (Wallace, C. S. 1996. "Fast Pseudorandom Generators for Normal and Exponential Variates." ACM Transactions on Mathematical Software.), also known as the Gaussian pool method. I would advise reading the description here and seeing if it fits your purpose.
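A toy sketch of the core idea in Python (the real Wallace generator is much more careful about how it reuses the pool, since naive mixing leaves detectable correlations):

import numpy as np

rng = np.random.default_rng(1)
pool = rng.standard_normal(1024)   # seed pool, bootstrapped once up front

def wallace_refresh(pool):
    # Shuffle the pool and transform groups of 4 by a fixed orthogonal
    # matrix. Orthogonal maps send iid N(0,1) vectors to iid N(0,1)
    # vectors, so no log/sqrt/trig is needed per variate.
    H = 0.5 * np.array([[1,  1,  1,  1],
                        [1, -1,  1, -1],
                        [1,  1, -1, -1],
                        [1, -1, -1,  1]], dtype=float)  # scaled Hadamard
    mixed = pool[rng.permutation(pool.size)].reshape(-1, 4)
    return (mixed @ H.T).reshape(-1)

pool = wallace_refresh(pool)  # consume variates from pool, then refresh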
As others have noted, it's a bit unclear what your motivation for this is, and therefore I'm not sure if the following answers your question.
Nevertheless, it is possible to generate (an approximation of) this without the specific formulas transforming uniform RNGs that you mention.
As with any RNG, we have to have some source of randomness (or pseudo-randomness). I'm assuming, therefore, that there is some limitless sequence of binary bits which are independently equally likely to be 0 or 1 (note that it's possible to counter that this is a uniform discrete binary RNG, so I'm unsure if this answers your question).
Choose some large fixed n. For each invocation of the RNG, generate n such bits, sum them as x, and return
(2x - n) / √n
By the de Moivre–Laplace theorem this is approximately normal with mean 0 and variance 1.
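In Python the whole recipe fits in a few lines (n = 4096 is an arbitrary choice; larger n gives a finer-grained approximation at the cost of more bits per variate):

import random

def gaussian_from_bits(n=4096):
    # A sum x of n fair bits has mean n/2 and variance n/4, so by
    # de Moivre-Laplace (2x - n) / sqrt(n) is approximately N(0, 1).
    x = sum(random.getrandbits(1) for _ in range(n))
    return (2 * x - n) / n ** 0.5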

Algorithm to generate a (pseudo-) random high-dimensional function

I don't mean a function that generates random numbers, but an algorithm to generate a random function.
"High dimension" means the function is multi-variable, e.g. a 100-dim function has 100 different variables.
Let's say the domain is [0,1], we need to generate a function f:[0,1]^n->[0,1]. This function is chosen from a certain class of functions, so that the probability of choosing any of these functions is the same.
(This class of functions can consist of all continuous functions, or all K-times differentiable ones, whichever is convenient for the algorithm.)
Since the functions on a closed interval domain are uncountably infinite, we only require the algorithm to be pseudo-random.
Is there a polynomial time algorithm to solve this problem?
I just want to add a possible algorithm to the question (though it is not feasible due to its exponential time complexity). The algorithm was proposed by the friend who actually brought up this question in the first place:
The algorithm can be simply described as follows. Assume the dimension d = 1 for example, and consider smooth functions on the interval I = [a, b]. First, we split the domain [a, b] into N small intervals. For each interval I_i, we generate a random number f_i drawn from some specific distribution (Gaussian or uniform). Finally, we interpolate the series (a_i, f_i), where a_i is a characteristic point of I_i (e.g., we can choose a_i as the midpoint of I_i). After interpolation, we obtain a smooth curve, which can be regarded as a one-dimensional random function living in the function space C^m[a, b] (where m depends on the interpolation algorithm we choose).
This is just to say that the algorithm does not need to be that formal and rigorous, but simply to provide something that works.
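A quick Python sketch of that construction for d = 1 (cubic-spline interpolation, so the result lies in C^2[a, b]; note the spline may overshoot [0, 1] slightly between knots, so clip if that matters):

import numpy as np
from scipy.interpolate import CubicSpline

def random_function_1d(a=0.0, b=1.0, n=16, seed=None):
    # Random values f_i at the midpoints a_i of n subintervals of [a, b],
    # joined by a smooth interpolant.
    rng = np.random.default_rng(seed)
    edges = np.linspace(a, b, n + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])   # characteristic points a_i
    vals = rng.uniform(0.0, 1.0, n)         # random f_i
    return CubicSpline(mids, vals)

f = random_function_1d(seed=42)
print(f(0.5))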
So if I get it right, you need a function returning a scalar from a vector.
The easiest way I see is to use a dot product.
For example, let n be the dimensionality you need.
Create a random vector a[n] containing random coefficients in the range <0,1>, such that the sum of all coefficients is 1:

create float a[n]
fill it with positive random numbers (no zeros)
compute the sum of a[i]
divide a[n] by this sum

Now the function y = f(x[n]) is simply

y = dot(a[n],x[n]) = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1]
If I didn't miss something, the target range should be <0,1>:

if x == (0,0,0,...,0) then y = 0
if x == (1,1,1,...,1) then y = 1
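In Python the whole recipe is a few lines (a sketch; the names are mine):

import numpy as np

def make_random_linear(n, seed=None):
    # Positive weights normalized to sum to 1, so y = dot(a, x) maps
    # [0,1]^n into [0,1] and hits 0 and 1 at the two corners.
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.001, 1.0, n)   # positive random numbers, no zeros
    a /= a.sum()
    return lambda x: float(np.dot(a, x))

f = make_random_linear(100, seed=7)
print(f(np.zeros(100)), f(np.ones(100)))  # 0.0 1.0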
If you need something more complex, use a higher order of polynomial,
something like y = dot(a0[n],x[n]) * dot(a1[n],x[n]^2) * dot(a2[n],x[n]^3) * ...
where x[n]^2 means (x[0]*x[0], x[1]*x[1], ...).
Both approaches result in a function with the same "direction": if any x[i] rises, then y rises too.
If you want to change that, you have to allow negative values for a[] as well, but to make that work you need to add some offset to y to shift it away from negative values, and the a[] normalization process will be a bit more complex, because you need to find the min and max values.
An easier option is to add a random flag vector m[n] to the process, where m[i] flags whether 1-x[i] should be used instead of x[i]. This way everything above stays as is.
You can create more types of mapping to make the functions even more varied.
This might not only be hard, but impossible if you actually want to be able to generate every continuous function.
For the one-dimensional case you might be able to create a useful approximation by looking into the Faber–Schauder system (also see wiki). This gives you a Schauder basis for the continuous functions on an interval. This kind of basis only covers the whole vector space if you include infinite linear combinations of basis vectors. Thus you can create some random functions by building random finite linear combinations from this basis, but in general you won't be able to create functions that are actually represented by an infinite number of basis vectors this way.
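A sketch of such a truncated random combination in Python (the 2^(-k/2) coefficient decay is one common choice, the one behind Lévy's construction of Brownian motion; other schedules give smoother or rougher samples):

import numpy as np

def random_schauder(levels=6, seed=None):
    # Random finite linear combination of Faber-Schauder "tent" basis
    # functions on [0, 1], with level-k coefficients scaled by 2^(-k/2).
    rng = np.random.default_rng(seed)
    coeffs = [(k, j, rng.normal(scale=2.0 ** (-k / 2)))
              for k in range(levels) for j in range(2 ** k)]

    def tent(u):  # hat function supported on [0, 1], peak at 1/2
        return np.maximum(0.0, 1.0 - np.abs(2.0 * u - 1.0))

    def f(t):
        t = np.asarray(t, dtype=float)
        total = np.zeros_like(t)
        for k, j, c in coeffs:
            total += c * tent(t * 2 ** k - j)
        return total

    return f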
Edit in response to your update:
It seems like choosing a random polynomial function of order K (for the class of K-times differentiable functions) might be sufficient for you, since any such function can be approximated (around a given point) by a polynomial (see Taylor's theorem). Choosing a random polynomial function is easy: you can just pick K+1 random real numbers as the coefficients of your polynomial. (Note that this will, for example, not return functions similar to abs(x).)
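Picking such a polynomial really is a one-liner (a sketch; normally distributed coefficients are an arbitrary choice):

import numpy as np

def random_polynomial(K, seed=None):
    # K + 1 random real coefficients define a random degree-K polynomial.
    rng = np.random.default_rng(seed)
    return np.polynomial.Polynomial(rng.normal(size=K + 1))

p = random_polynomial(5, seed=0)
print(p(0.3))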

Pseudorandom Number Generation with Specific Non-Uniform Distributions

I'm writing a program that simulates various random walks (with differing distributions). At each timestep, I need randomly generated, two dimensional step distances and angles from the distribution of the random walk. I'm hoping someone can check my understanding of how to generate these random numbers.
As I understand it, I can use inverse transform sampling as follows:
Suppose f(x) is the pdf of our random walk, which has a non-uniform distribution, and y is a random number from a uniform distribution.
Then if we let f(x) = y and solve to find x, we have a random number from the non-uniform distribution.
Is this a feasible solution?
Not quite. The function that needs to be inverted is not f(x), the pdf, but F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt, the cdf. The good thing is that F is monotone, so it actually has a unique inverse (unlike f).
There are multiple other ways of generating random numbers according to a given distribution. For example, if the cdf F is difficult to compute or to invert, rejection sampling can be a good option if f is easy to compute.
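A worked example where the cdf inverts in closed form, the exponential distribution (a Python sketch):

import math
import random

def sample_exponential(lam):
    # F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
    u = random.random()
    return -math.log(1.0 - u) / lam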
You are close, but not quite. Every probability density function (pdf) has a corresponding cumulative distribution function (cdf). An important property of CDF(x) is that its values always lie between 0 and 1. Because it is relatively easy to draw a random number between 0 and 1, we can use that to work our way backwards to the distribution. So changing the word pdf to CDF in your question makes the statement correct.
As an aside, for this to make sense computationally you need to find an easy-to-calculate inverse of the CDF. One way to do this is to fit a polynomial approximation to the CDF and find the inverse of that function. There are more advanced techniques for simulating probability distributions with messy distributions. See this book chapter for the details.
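A sketch of that polynomial-fit idea (here fitting the quantile function, i.e. the inverse CDF, directly; the degree and the [0.01, 0.99] grid are arbitrary choices, and accuracy degrades in the tails):

import numpy as np
from scipy import stats

u_grid = np.linspace(0.01, 0.99, 99)
approx_ppf = np.poly1d(np.polyfit(u_grid, stats.norm.ppf(u_grid), deg=9))

u = np.random.default_rng(3).uniform(0.01, 0.99, size=5)
print(approx_ppf(u))  # approximately standard-normal deviates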

Generating random numbers with known mean and variance

From a paper I'm reading right now:
...
S(t+1, k) = S(t, k) + ... + C*∆
...
∆ is a standard random variable with mean 0 and variance 1.
...
How do I generate this series of random values with this mean and variance? If someone has links to a C or C++ library, I would be glad, but I wouldn't mind implementing it myself if someone tells me how to do it :)
Do you have any restrictions on the distribution of ∆? If not, you can just use a uniform distribution on [-√3, √3]. The reason this works is that for a uniform distribution on [a, b] the variance is (b - a)^2 / 12, which here gives (2√3)^2 / 12 = 1.
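Concretely (a Python sketch):

import math
import random

def delta():
    # Uniform on [-sqrt(3), sqrt(3)]: mean 0 and variance
    # (b - a)^2 / 12 = (2 * sqrt(3))^2 / 12 = 1.
    return random.uniform(-math.sqrt(3.0), math.sqrt(3.0))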
You can use the Box-Muller transform.
Suppose U1 and U2 are independent random variables that are uniformly distributed in the interval (0, 1]. Let

Z0 = √(-2 ln U1) · cos(2π U2)

and

Z1 = √(-2 ln U1) · sin(2π U2)

Then Z0 and Z1 are independent random variables with a normal distribution of standard deviation 1.
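A direct rendering of those formulas (a Python sketch; translating it to C or C++ is mechanical):

import math
import random

def box_muller():
    # One Box-Muller draw: two independent N(0, 1) variates.
    u1 = 1.0 - random.random()  # in (0, 1], keeps log() finite
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)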
Waffles is a mature, stable C++ library that you can use. In particular, the noise function in the *waffles_generate* module will do what you want.
Aside from center and spread (mean and sd), you also need to know the probability distribution that the random numbers are drawn from. If the paper you are reading doesn't say anything about this, and there's no other reasonable inference supported by context, then the author is probably referring to a normal (Gaussian) distribution, because that's the most common, and because the two parameters needed to completely specify a normal distribution are the mean and sd. Many distributions are not specified this way: e.g., a Gamma distribution is specified by shape and scale (or shape and rate); to specify a Logistic, you need location and scale; etc.
If all you want is a certain mean of 0 and variance of 1, probably the simplest approach is this: do you have a uniform random number generator unif() that gives you numbers between 0 and 1? If you want numbers very close to a normal distribution, you can just add up 12 uniform(0,1) numbers and subtract 6. If you want exactly a normal distribution, you can use the Box–Muller transform, as Mark suggested, if you don't mind throwing in a log, a sine, and a cosine.
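That add-12-subtract-6 trick in full (the same CLT recipe as in the first question above):

import random

def approx_normal():
    # A sum of 12 U(0,1) draws has mean 6 and variance 12 * (1/12) = 1.
    return sum(random.random() for _ in range(12)) - 6.0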
