generate random numbers within a range with different probabilities - algorithm

How can I generate a random number between A = 1 and B = 10 where each number has a different probability?
Example: number / probability
1 - 20%
2 - 20%
3 - 10%
4 - 5%
5 - 5%
...and so on.
I'm aware of some hard-coded workarounds which unfortunately are of no use with larger ranges, for example A = 1000 and B = 100000.
Assume we have a Rand() method which returns a random number R, 0 < R < 1. Can anyone post a code sample showing a proper way of doing this? Preferably in C# / Java / ActionScript.

Build an array of 100 integers and populate it with 20 1's, 20 2's, 10 3's, 5 4's, 5 5's, etc. Then just randomly pick an item from the array.
int[] numbers = new int[100];
// populate the first 20 with the value '1'
for (int i = 0; i < 20; ++i)
{
numbers[i] = 1;
}
// populate the rest of the array as desired.
// To get an item:
// Since your Rand() function returns 0 < R < 1
int ix = (int)(Rand() * 100);
int num = numbers[ix];
This works well if the number of items is reasonably small and your precision isn't too strict. That is, if you wanted 4.375% 7's, then you'd need a much larger array.
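For larger tables, filling the array by hand gets tedious. Here is a minimal Python sketch of the same lookup-table idea. The `build_lookup` helper is a made-up name, and the 40% entry for value 6 is an assumption added only so the example probabilities sum to 100%.

```python
import random

def build_lookup(weights, slots=100):
    """Expand a {value: probability} mapping into a flat list of `slots` entries."""
    table = []
    for value, prob in weights.items():
        # each value occupies a share of the slots equal to its probability
        table.extend([value] * round(prob * slots))
    if len(table) != slots:
        raise ValueError("probabilities do not fill the table exactly")
    return table

# Probabilities from the question; the entry for 6 is a hypothetical filler.
weights = {1: 0.20, 2: 0.20, 3: 0.10, 4: 0.05, 5: 0.05, 6: 0.40}
table = build_lookup(weights)

# random.random() returns a value in [0, 1), so the index is always valid.
pick = table[int(random.random() * len(table))]
print(pick)
```

A probability such as 4.375% would need `slots=100000` here, which is the memory cost mentioned above.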

There is an elegant algorithm attributed by Knuth to A. J. Walker (Electronics Letters 10, 8 (1974), 127-128; ACM Trans. Math Software 3 (1977), 253-256).
The idea is that if you have a total of k * n balls of n different colors, then it is possible to distribute the balls in n containers such that container no. i contains balls of color i and at most one other color. The proof is by induction on n. For the induction step pick the color with the least number of balls.
In your example n = 10. Multiply the probabilities with a suitable m such that they are all integers. So, maybe m = 100 and you have 20 balls of color 0, 20 balls of color 1, 10 balls of color 2, 5 balls of color 3, etc. So, k = 10.
Now generate a table of dimension n with each entry being a probability (the fraction of balls in container i that have color i) and the other color.
To generate a random ball, generate a random floating-point number r in the range [0, n). Let i be the integer part (floor of r) and x the excess (r – i).
if (x < table[i].probability) output i
else output table[i].other
The algorithm has the advantage that for each random ball you only make a single comparison.
Let me work out an example (same as Knuth).
Consider simulating throwing a pair of dice.
So P(2) = 1/36, P(3) = 2/36, P(4) = 3/36, P(5) = 4/36, P(6) = 5/36, P(7) = 6/36, P(8) = 5/36, P(9) = 4/36, P(10) = 3/36, P(11) = 2/36, P(12) = 1/36.
Multiply by 36 * 11 to get 396 balls, 11 of color 2, 22 of color 3, 33 of color 4, …, 11 of color 12.
We have k = 396 / 11 = 36.
Table[2] = (11/36, color 4)
Table[12] = (11/36, color 10)
Table[3] = (22/36, color 5)
Table[11] = (22/36, color 5)
Table[4] = (8/36, color 9)
Table[10] = (8/36, color 6)
Table[5] = (16/36, color 6)
Table[9] = (16/36, color 8)
Table[6] = (7/36, color 8)
Table[8] = (6/36, color 7)
Table[7] = (36/36, color 7)
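For anyone who wants to experiment, here is a rough Python sketch of the table construction and the single-comparison draw. It fills the containers with the common small/large worklist approach rather than the induction order used above, so the table it builds may differ from the one listed while encoding the same probabilities. `build_alias` and `draw` are names made up for this sketch.

```python
import random

def build_alias(probs):
    """Return (cutoff, other) tables for the alias method."""
    n = len(probs)
    scaled = [p * n for p in probs]      # each container must end up holding 1.0
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    cutoff = [1.0] * n                   # share of container i that keeps color i
    other = list(range(n))               # the second color stored in container i
    while small and large:
        s = small.pop()
        l = large.pop()
        cutoff[s] = scaled[s]
        other[s] = l                     # top up container s with color l
        scaled[l] -= 1.0 - scaled[s]     # color l has that much less left
        (small if scaled[l] < 1.0 else large).append(l)
    return cutoff, other

def draw(cutoff, other):
    """Pick a container uniformly, then make one comparison inside it."""
    r = random.random() * len(cutoff)
    i = int(r)
    return i if (r - i) < cutoff[i] else other[i]

# Two dice: index 0 stands for a total of 2, index 10 for a total of 12.
dice = [c / 36 for c in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]
cutoff, other = build_alias(dice)
total = 2 + draw(cutoff, other)
print(total)
```

Building the table takes time linear in n, and every draw afterwards costs one random number and one comparison.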

Assuming that you have a function p(n) that gives you the desired probability for a random number:
r = rand() // a random number between 0 and 1
for i in A to B do
if r < p(i)
return i
r = r - p(i)
done
A faster way is to create an array of (B - A) * 100 elements and populate it with numbers from A to B such that the fraction of array slots holding each number equals its probability. You can then generate a uniform random index and read your random number directly from the array.

Map your uniform random results to the required outputs according to the probabilities.
E.g., for your example:
If `0 <= Rand() <= 0.2`: result = 1.
If `0.2 < Rand() <= 0.4`: result = 2.
If `0.4 < Rand() <= 0.5`: result = 3.
If `0.5 < Rand() <= 0.55`: result = 4.
If `0.55 < Rand() <= 0.6`: result = 5.
...

Here's an implementation of Knuth's Algorithm. As discussed in some of the answers, it works by
1) creating a table of summed frequencies
2) generating a random integer
3) rounding it with the ceiling function
4) finding the "summed" range within which the random number falls and outputting the original array entry based on it

Inverse Transform
In probability speak, a cumulative distribution function F(x) returns the probability that a randomly drawn value, call it X, is <= some given value x. For instance, F(4) in this case is .55, because the running sum of probabilities in your example is {.2, .4, .5, .55, .6, ...}. I.e. the probability of randomly getting a value less than or equal to 4 is .55. However, what I actually want to know is the inverse of the cumulative distribution function, call it F_inv: given a cumulative probability, what is the x value? I want to pass in F_inv(.55) and get back 4. That is why this is called the inverse transform method.
So, in the inverse transform method, we are basically trying to find the interval in the cumulative distribution in which a random Uniform (0,1) number falls. This works out to the algorithm that perreal and icepack posted. Here is another way to state it in terms of the cumulative distribution function
Generate a random number U
for x in A .. B
if U <= F(x) then return x
Note that it might be more efficient to have the loop go from B down to A and return x as soon as U > F(x - 1) (taking F(A - 1) = 0) if the smaller probabilities come at the beginning of the distribution.
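When the range is large, the linear scan can be replaced by a binary search over the precomputed running sums. A small Python sketch of that variant follows. `make_sampler` is a hypothetical name, and the 40% entry for value 6 is only there so the example probabilities add up to 1.

```python
import bisect
import itertools
import random

def make_sampler(values, probs):
    """Return a function that draws from `values` with the given probabilities."""
    cdf = list(itertools.accumulate(probs))   # running sums, i.e. F(x) for each x

    def sample(u=None):
        if u is None:
            u = random.random()
        # index of the first x with F(x) >= u, which is F_inv(u)
        i = bisect.bisect_left(cdf, u)
        # guard in case floating-point rounding leaves the last sum slightly below 1
        return values[min(i, len(values) - 1)]

    return sample

sample = make_sampler([1, 2, 3, 4, 5, 6], [0.20, 0.20, 0.10, 0.05, 0.05, 0.40])
print(sample())
```

Each draw then costs O(log n) comparisons instead of O(n), at the price of storing one running sum per value.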

Related

Pseudo-Random Variable

I have a variable, between 0 and 1, which should dictate the likelihood that a second variable, a random number between 0 and 1, is greater than 0.5. In other words, if I were to generate the second variable 1000 times, the average should be approximately equal to the first variable's value. How do I code this?
Oh, and the second variable should always be capable of producing either 0 or 1 in any condition, just more or less likely depending on the value of the first variable. Here is a link to a graph which models approximately how I would like the program to behave. Each equation represents a separate value for the first variable.
You have a variable p and you are looking for a mapping function f(x) that maps random rolls x in [0, 1] to the same interval [0, 1] such that the expected value, i.e. the average of all rolls, is p.
You have chosen the function prototype
f(x) = pow(x, c)
where c must be chosen appropriately. If x is uniformly distributed in [0, 1], the average value is:
int(f(x) dx, [0, 1]) == p
With the integral:
int(pow(x, c) dx) == pow(x, c + 1) / (c + 1) + K
one gets 1 / (c + 1) == p, and therefore:
c = 1/p - 1
A different approach is to make p the median value of the distribution, such that half of the rolls fall below p, the other half above p. This yields a different distribution. (I am aware that you didn't ask for that.) Now, we have to satisfy the condition:
f(0.5) == pow(0.5, c) == p
which yields:
c = log(p) / log(0.5)
With the current function prototype, you cannot satisfy both requirements. Your function is also asymmetric (f(x, p) != f(1-x, 1-p)).
Python functions below:
import math
import random

def medianrand(p):
    """Random number between 0 and 1 whose median is p"""
    c = math.log(p) / math.log(0.5)
    return math.pow(random.random(), c)

def averagerand(p):
    """Random number between 0 and 1 whose expected value is p"""
    c = 1/p - 1
    return math.pow(random.random(), c)
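A quick way to convince yourself of the two formulas is to sample many values and compare the sample mean and median with p. The sketch below repeats the two functions so that it runs on its own. The optional `rng` parameter, the seed, and the sample count are arbitrary choices made for this check.

```python
import math
import random
import statistics

def averagerand(p, rng=random):
    """Random number between 0 and 1 whose expected value is p."""
    c = 1 / p - 1
    return math.pow(rng.random(), c)

def medianrand(p, rng=random):
    """Random number between 0 and 1 whose median is p."""
    c = math.log(p) / math.log(0.5)
    return math.pow(rng.random(), c)

rng = random.Random(12345)   # fixed seed so the run is repeatable
p = 0.3
avg_samples = [averagerand(p, rng) for _ in range(100_000)]
med_samples = [medianrand(p, rng) for _ in range(100_000)]

# Both printed values should land close to p.
print(sum(avg_samples) / len(avg_samples))
print(statistics.median(med_samples))
```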
You can do this by using a dummy. First set the first variable to a value between 0 and 1. Then create a random number in the dummy between 0 and 1. If this dummy is bigger than the first variable, you generate a random number between 0 and 0.5, and otherwise you generate a number between 0.5 and 1.
In pseudocode:
real a = 0.7
real total = 0.0
for i between 0 and 1000 begin
real dummy = rand(0,1)
real b
if dummy > a then
b = rand(0,0.5)
else
b = rand(0.5,1)
end if
total = total + b
end for
real avg = total / 1000
Please note that this algorithm will generate average values between 0.25 and 0.75. For a = 1 it will only generate random values between 0.5 and 1, which should average to 0.75. For a=0 it will generate only random numbers between 0 and 0.5, which should average to 0.25.
I've made a sort of pseudo-solution to this problem, which I think is acceptable.
Here is the algorithm I made;
a = 0.2 # variable one
b = 0 # variable two
b = random.random()
b = b^(1/(2^(4*a-1)))
It doesn't actually produce the average results that I wanted, but it's close enough for my purposes.
Edit: Here's a graph I made that consists of a large amount of datapoints I generated with a python script using this algorithm;
import random
mod = 6
div = 100
for z in range(div):
    s = 0
    for i in range(100000):
        a = (z+1)/float(div) # variable one
        b = random.random() # variable two
        c = b**(1/(2**((mod*a*2)-mod)))
        s += c
    # one row per value of a: a, then the average of the generated values
    print(str((z+1)/float(div)) + "\t" + str(round(s/100000.0, 3)))
Each point in the table is the result of 100000 randomly generated points from the algorithm; their x positions being the a value given, and their y positions being their average. Ideally they would fit to a straight line of y = x, but as you can see they fit closer to an arctan equation. I'm trying to mess around with the algorithm so that the averages fit the line, but I haven't had much luck as of yet.

Randomly choosing numbers from an array without repetition

I have an algorithm to randomly select an element t from an array without repetition. Here is the algorithm in more detail.
It can be explained as follows:
Initialize an index array u that stores the indices of the numbers from 1 to k (lines 1 to 3)
Set gamma to k initially and reduce it by one in each iteration. The purpose of gamma is to avoid repetition (lines 4, 9, 10)
Randomly choose a number t from 1 to N (at j = 1, choose from 1 to k; N are non-repeated numbers), and then move the number to the end of the array.
Repeat steps 2 to 3
If gamma = 0, reset gamma = k
The function returns t.
For example, I have an array A = [1,2,3,4,5,6,7,8,9], k = 9 = size(A), N = 12 (from 1 to 9, each number is selected only once). Now I want to use this algorithm to randomly select a number t from array A. This is my code. However, it does not match line 6 of the algorithm. Is it right? Please look at my code and help me.
function nonRepeat
k=9;
u=1:k; % initial value of index
N=12
gamma=k;
for j=1:N
index=randi(gamma,1); % use other choosing
t=u(index)
%%swapping
temp=u(t);
u(t)=u(gamma);
u(gamma)=temp;
gamma=gamma-1;
if gamma==0
gamma=k;
end
end
end
I think index=randi(gamma,1); is not right, because the algorithm says to select the number t randomly, but you select an index randomly and then assign t=u(index).
See if this works:
k = 9;
u = 1 : k;
N = 12;
gamma = k;
for j = 1 : N
t = randi(gamma,1);
temp = u(t);
u(t) = u(gamma);
u(gamma) = temp;
gamma = gamma - 1;
if gamma == 0
gamma = k;
end
end
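For comparison, here is how I read the procedure, written as a Python sketch. This is an interpretation, not the original algorithm listing: the unused entries are kept at the front of the array, one of them is picked, swapped to the end of the unused part, and gamma shrinks until it is reset. `non_repeating_picks` is a made-up name.

```python
import random

def non_repeating_picks(items, n_picks, rng=random):
    """Draw n_picks values; no value repeats until every item has been used once."""
    pool = list(items)
    k = len(pool)
    gamma = k                        # number of not-yet-used entries at the front
    picks = []
    for _ in range(n_picks):
        idx = rng.randrange(gamma)   # random position inside the unused prefix
        picks.append(pool[idx])
        # move the chosen entry to the end of the unused prefix
        pool[idx], pool[gamma - 1] = pool[gamma - 1], pool[idx]
        gamma -= 1
        if gamma == 0:               # everything was used once, start a new pass
            gamma = k
    return picks

picks = non_repeating_picks(range(1, 10), 12)
print(picks)
```

With k = 9 and N = 12, the first nine picks are a permutation of 1..9 and the remaining three are distinct values from a fresh pass.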

How to generate a random number with equal probability in a given interval

I tried a lot but could not find a solution to this problem:
Write a function that returns numbers in the range [1,6] with equal probability. You can use the library's rand() function, and you can assume that rand() returns a number in the range [0,RAND_MAX] with equal probability.
We'll do this in multiple steps.
You need to generate a number in the range [1, 6], inclusive.
You have a random number generator that will generate numbers in the range [0..RAND_MAX].
Let's say you wanted to generate numbers in the range [0..5]. You can do this:
int r = rand(); // gives you a number from 0 to RAND_MAX
double d = (double)r / RAND_MAX; // gives you a number from 0 to 1 (the cast avoids integer division)
double val = d * 5; // gives you a number from 0 to 5
int result = (int)round(val); // rounds to an integer
So given a range of [0, high], you can generate a random number, divide by RAND_MAX, multiply by high, and round the result.
Your range is [1, 6], so you have to add another step. You want to generate a random number in the range [0, 5], and then add 1. Or, in general, to generate a random number in a given range, [low, high], you write:
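int r = rand();
double d = (double)r / RAND_MAX; // 0 to 1, cast avoids integer division
int range = high - low;
double val = d * range; // 0 to (high - low)
int result = low + (int)round(val); // note: rounding makes low and high half as likely as the values in between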
Obviously you can combine some of those operations. I just showed them individually to illustrate.
Basically you are looking for the modulus operator (%).
r = rand() % 6 + 1
If you are afraid that RAND_MAX % 6 != 0 and the solution will be biased - you just need to 'throw' some numbers (up to 5) out and redraw if you get them:
let M = (RAND_MAX / 6) * 6 [integer division]
r = dontCare
do {
r = rand()
} while (r >= M)
r = r % 6 + 1
PS, if you want to draw a 'real' number, it can be done with:
r = drawInt(1,5) // draw integer from 1 to 5 inclusive, as previously explained.
r += rand() / (RAND_MAX + 1.0) //the decimal part, in [0, 1)
Note that it is not a 'real' number, and the spacing between two adjacent values is 1/(RAND_MAX + 1)
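The redraw idea can be tried out in Python with a stand-in for the C library call. In this sketch `RAND_MAX = 32767` is an assumed value (the real one is implementation-defined), and `rand()` is simulated with `random.randint`; neither comes from the answers above.

```python
import random

RAND_MAX = 32767                 # assumed value for illustration only

def rand():
    """Stand-in for C's rand(): uniform integer in [0, RAND_MAX]."""
    return random.randint(0, RAND_MAX)

# Largest multiple of 6 that does not exceed RAND_MAX.
LIMIT = (RAND_MAX // 6) * 6

def roll_die():
    """Uniform integer in [1, 6] by rejecting raw values at or above LIMIT."""
    while True:
        r = rand()
        if r < LIMIT:            # accepted values 0 .. LIMIT-1 split evenly into 6 classes
            return r % 6 + 1

print(roll_die())
```

Because LIMIT is a multiple of 6, each remainder 0..5 is produced by exactly LIMIT / 6 accepted raw values, so no face is favoured.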

implementing a simple big bang big crunch (BB-BC) in matlab

I want to implement a simple BB-BC in MATLAB, but there is a problem.
Here is the code to generate the initial population:
pop = zeros(N,m);
for j = 1:m
% formula used to generate random number between a and b
% a + (b-a) .* rand(N,1)
pop(:,j) = const(j,1) + (const(j,2) - const(j,1)) .* rand(N,1);
end
const is a matrix (mx2) which holds constraints for control variables. m is number of control variables. random initial population is generated.
here is the code to compute center of mass in each iteration
sum = zeros(1,m);
sum_f = 0;
for i = 1:N
f = fitness(new_pop(i,:));
%keyboard
sum = sum + (1 / f) * new_pop(i,:);
%keyboard
sum_f = sum_f + 1/f;
%keyboard
end
CM = sum / sum_f;
new_pop holds newly generated population at each iteration, and is initialized with pop.
CM is a 1xm matrix.
fitness is a function to give fitness value for each particle in generation. lower the fitness, better the particle.
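To make the weighting explicit, here is the same center-of-mass computation as a small Python sketch. It is only an illustration: `center_of_mass` and the toy fitness function are invented for the example. One property worth keeping in mind is that a weighted average is guaranteed to stay inside the range spanned by the candidates only when every weight 1/f is positive, that is, when all fitness values are positive.

```python
def center_of_mass(population, fitness):
    """Weighted mean of the candidates, each weighted by 1 / fitness."""
    m = len(population[0])
    weighted = [0.0] * m
    weight_sum = 0.0
    for candidate in population:
        w = 1.0 / fitness(candidate)     # lower fitness gives a larger weight
        weight_sum += w
        for j in range(m):
            weighted[j] += w * candidate[j]
    return [s / weight_sum for s in weighted]

# Toy example: two candidates, fitness is simply the sum of the coordinates.
population = [[1.0, 2.0], [3.0, 6.0]]
cm = center_of_mass(population, fitness=sum)
print(cm)
```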
here is the code to generate new population in each iteration:
for i=1:N
new_pop(i,:) = CM + rand(1) * alpha1 / (n_itr+1) .* ( const(:,2)' - const(:,1)');
end
alpha1 is 0.9.
The problem is that I run the code for 100 iterations, but the fitness just keeps decreasing and becomes negative. That shouldn't happen at all, because all particles are inside the search space and CM should be there too, yet it goes way beyond the limits.
for example, if this is the limits (m=4):
const = [1 10;
1 9;
0 5;
1 4];
then running yields this CM:
57.6955 -2.7598 15.3098 20.8473
which is beyond all limits.
i tried limiting CM in my code, but then it just goes and sticks at all top boundaries, which in this example give CM=
10 9 5 4
I am confused. Is there something wrong in my implementation, or have I misunderstood something about BB-BC?

Randomly Generate a set of numbers of n length totaling x

I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of Length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before I've generated different combinations of a list of numbers that will add up to x. I'm sure that I could simply bruteforce this problem but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
edit2: Sorry for my improper use of 'set', I was using it as a catch all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random
def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it should be if it was picked at random among all distribution with the given sum.
To see the reason you can just picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it will be uniform inside the square [0,1]x[0,1]. The point Q is the point obtained by scaling P so that the sum is 1. As is clear from the picture, the points close to the center of the line x + y = 1 have a higher probability; for example, the exact center will be found by projecting any point on the diagonal (0,0)-(1,1), while the point (0, 1) will be found by projecting only points from (0,0)-(0,1)... the diagonal length is sqrt(2)=1.4142... while the square side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done in the following way: a partition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders in arbitrary places, and stones in the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient (n + x - 1 \atop n - 1).
So your algorithm could be as follows: choose an arbitrary (n - 1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
In Knuth's TAOCP, section 3.4.2 discusses random sampling. See Algorithm S there.
Algorithm S: (choose n arbitrary records from a total of N)
1. t = 0, m = 0;
2. u = random, uniformly distributed on (0, 1)
3. if (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample, increase m and t by 1
4. if m < n, return to step 2, otherwise the algorithm is finished
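The border-and-stone construction for integers can be sketched in a few lines of Python. This version is an illustration under its own assumptions: it places n - 1 borders among n + x - 1 places, and it uses `random.sample` to pick the border positions instead of implementing Algorithm S. `random_composition` is a hypothetical name.

```python
import random

def random_composition(n, x, rng=random):
    """Split the non-negative integer x into n non-negative integer parts."""
    places = n + x - 1
    # positions of the n - 1 borders; every other place holds a stone
    borders = sorted(rng.sample(range(places), n - 1))
    parts = []
    prev = -1
    for b in borders + [places]:     # a virtual border closes the last group
        parts.append(b - prev - 1)   # stones strictly between two borders
        prev = b
    return parts

parts = random_composition(4, 15)
print(parts, sum(parts))
```

Every choice of border positions is equally likely, so every ordered split of x into n non-negative integers is equally likely as well.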
The solution for non-integers is algorithmically trivial: you just select n arbitrary numbers that don't sum up to 0, and normalize them by their sum.
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in Javascript
function getRandomArbitrary(min, max) {
return Math.random() * (max - min) + min;
};
function getRandomArray(min, max, n) {
var arr = [];
for (var i = 0, l = n; i < l; i++) {
arr.push(getRandomArbitrary(min, max))
};
return arr;
};
function randomValuesPrescribedSum(min, max, n, total) {
var arr = getRandomArray(min, max, n);
var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
var k = total/sum;
var delays = arr.map(function(x) { return k*x; })
return delays;
};
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random
def parts(total_sum, num_parts):
    # num_parts - 1 random cut points in (0, 1), plus the two endpoints
    points = [random.random() for i in range(num_parts-1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i-1]) * total_sum)
    return ret
def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)
test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time
TOTAL = 15
PARTS = 4
PLACES = 3
def random_sum_split(parts, total, places):
    a = [0, total] + [random.random()*total for i in range(parts-1)]
    a.sort()
    b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]
    if places is None:
        return b
    else:
        b.pop()
        c = [round(x, places) for x in b]
        # the last part absorbs the rounding error so the total is exact
        c.append(round(total-sum(c), places))
        return c
def tick():
    if info.tick == 1:
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end-start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
to the best of my knowledge this distribution is uniform

Resources