uniform random distribution at the bit level

I would like to understand how a uniform random distribution works at the bit level.
For example, in Fortran, random_number gives a uniform distribution over [0,1). Real numbers have a mantissa and an exponent, so I wonder whether all possible numbers (at the bit level) can be produced. If so, considered at the bit level, the numbers would not all have the same probability of being chosen. The alternative is that not all representable numbers are used and the chosen numbers are equally spaced: the spacing would be the largest gap between two adjacent representable numbers (i.e. between the number with exponent 0 and all mantissa bits set to 1, and the one with all mantissa bits set to 1 except the last, which is 0).
Are there any links explaining this?

In principle, it's easy.
A uniform random variable in (0, 1) is distributed as:
b_0/2 + b_1/4 + b_2/8 + ...,
where the b_i are unbiased random bits (zeros and ones).
This is a very old insight, dating at least from von Neumann (1951, "Various techniques used in connection with random digits").
Thus, in principle, all that's needed is to generate a steady sequence of unbiased random bits.
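As a minimal sketch (assuming Python as the host language and its random module as the bit source), accumulating 53 unbiased bits is enough to fill an IEEE 754 double's significand:

import random

def uniform_from_bits(nbits=53):
    # sum b_i / 2^(i+1) over unbiased random bits b_i; 53 bits
    # matches the precision of an IEEE 754 double
    x = 0.0
    scale = 0.5
    for _ in range(nbits):
        if random.getrandbits(1):
            x += scale
        scale /= 2
    return x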
But generating a "uniform" floating-point number in the interval (0, 1) is non-trivial by comparison. See the following, for example:
Random floating point double in Inclusive Range
To respond to your comment:
In theory, a uniform distribution on (0, 1) is the same as one on [0, 1), (0, 1], or [0, 1]: the values 0 and 1 occur with probability zero, as does any particular number in (0, 1). However, a "uniform" floating-point number on (0, 1) is not the same as one on [0, 1), (0, 1], or [0, 1], since 0 and 1 may occur with positive probability depending on whether the interval contains 0 or 1, respectively. In effect, "throwing away" zeros and ones from a "uniform" floating-point number on [0, 1] is the best that can be done to get a "uniform" floating-point number on (0, 1).


How can I sort a vector of boolean vectors in this way? ('ranking analysis')

We need to sort a large number of vectors (an array of arrays) containing only true and false (1's and 0's), all the same size.
We have the rules that 1 + 1 = 1 (true + true = true), 1 + 0 = 1 and 0 + 0 = 0.
The first vector is the one with the most 1's.
The second vector is the one that brings the most new 1's in addition to the ones we already had in the first vector.
The third vector is the one that brings the most new 1's in addition to the ones we already had in the previous 2 vectors.
And so on.
For example, let's say we have these 3 vectors:
a. (0, 1, 0, 0, 1, 1, 0)
b. (1, 0, 1, 1, 0, 1, 1)
c. (0, 1, 1, 1, 0, 1, 0)
The first one in our sort is b because it has the most 1's.
The next one is a. Even though c has more 1's than a, a adds more new 1's to those we already had in b.
By now, the sum of a + b is (1, 1, 1, 1, 1, 1, 1), so the last one is c, because it brings nothing new to the sorting.
If two vectors bring the same number of extra 1's, their order doesn't really matter. I believe there are multiple possible results for this kind of sorting and they are all equally good.
We call this a 'ranking analysis' here, but we don't have a clear term for this kind of sort and Google doesn't yield very useful info on it.
The easiest method is to just take them one by one with an O(n^2) greedy pass. However, we are working with big data and the software we already have for this is too slow, so we need something really optimized.
How can we achieve this? Programming language doesn't matter, we can use anything. Can this be parallelized (run on multiple CPUs to speed up the process)? Any sources or ideas are welcome.
Edit: I checked; apparently we have a case where the length of these vectors is 103, so they can be longer than 64 slots.
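This is exactly the greedy strategy for the maximum coverage problem. A minimal sketch (assuming Python, whose integers work as arbitrary-width bitsets, so vectors longer than 64 slots cost nothing extra):

def rank_vectors(vectors):
    # vectors: equal-length lists of 0/1; returns them greedily ordered
    masks = [int("".join(map(str, v)), 2) for v in vectors]
    remaining = list(range(len(vectors)))
    covered = 0
    order = []
    while remaining:
        # pick the vector adding the most new 1's to what is covered so far
        best = max(remaining, key=lambda i: bin(masks[i] & ~covered).count("1"))
        covered |= masks[best]
        order.append(vectors[best])
        remaining.remove(best)
    return order

Each round is one popcount per remaining vector, so the worst case is still O(n^2) rounds, but word-level popcounts are cheap and the per-round max is trivially parallelizable across vectors.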

Randomly select N unique elements from a list, given a probability for each

I've run into a problem: I have a list or array (IList) of elements that each have a field (float Fitness). I need to efficiently choose N random unique elements depending on this variable: the bigger it is, the more likely the element is to be chosen.
I searched on the internet, but the algorithms I found were rather unreliable.
The answer given here seems to assign a bigger probability to elements at the beginning, which I need to avoid.
-Edit-
For example I need to choose from objects with the values [-5, -3, 0, 1, 2.5] (negative values included).
The basic algorithm is to sum the values, fix an order for the items, then draw a point from 0 to sum(values) and see which item's window it "intersects".
For the values [0.1, 0.2, 0.3] the "windows" [0-0.1, 0.1-0.3, 0.3-0.6] will look like this:
1 23 456
|-|--|---|
|-*--*---|
And you draw a point [0-0.6] and see what window it hit on the axis.
Pseudo-Python for this (indentation restored; note it assumes non-negative values):
original_values = [val1, val2, ..., valn]
# a list fixes an order; the order doesn't matter outside this context.
values = list(original_values)
limit = sum(values)             # total length of the axis
draw = random() * limit         # the point we drop onto the axis
while True:
    candidate = values.pop()
    if draw < candidate:        # the point falls inside this window
        return candidate
    draw -= candidate
So what shall those numbers represent?
Does 2.5 mean that the probability of being chosen is twice as high as for 1.25? Well, the negative values don't fit into that scheme.
I guess fitness means something like -5: very ill, 2.5: very fit. We have a range of 7.5 and could randomly pick an element, provided we know how many candidates there are and have access by index.
Then take a random number between -5 and 2.5 and see if our number is lower than or equal to the candidate's fitness. If so, the candidate is picked; otherwise we repeat from step 1. I would say we should then generate a new survival threshold each round, because if we drew a 2.5 but no candidate with that fitness remained, we would search forever.
The range of fitnesses has to be known for this, too.
fitnesses    -5   -3    0    1   2.5
rand  -5      x    x    x    x    x
     -2.5     -    -    x    x    x
        0     -    -    x    x    x
      2.5     -    -    -    -    x
If every candidate is to be tested every round, and the -5 guy is to have a chance to survive, you have to stretch the interval of random numbers a bit to give him one, for instance from -6 to 3.
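A minimal sketch of that rejection scheme (assuming Python; the fitness accessor and the stretched interval (-6, 3) are illustrative):

import random

def pick_unique(items, fitness, n, lo=-6.0, hi=3.0):
    # items: candidate list; fitness: function returning a candidate's fitness;
    # (lo, hi) is stretched slightly beyond the fitness range so the weakest
    # candidate still has a chance to survive
    pool = list(items)
    chosen = []
    while len(chosen) < n and pool:
        i = random.randrange(len(pool))      # pick a candidate by index
        threshold = random.uniform(lo, hi)   # fresh survival threshold each round
        if fitness(pool[i]) >= threshold:
            chosen.append(pool.pop(i))
    return chosen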

Generate random integers from random bit sequence

Very basic question but I can't seem to find the answer on Google. A standard PRNG will generate a sequence of random bits. How would I use this to produce a sequence of random integers with a uniform probability distribution in the range [0, N)? Moreover each integer should use (expected value) log_2(N) bits.
If you want a random number between 1 and N:
you calculate how many bits you would need to write N as a binary number. That's:
n_bits = ceiling(log_2(N))
where ceiling is the "round up" operation (e.g. ceiling(3) = 3, ceiling(3.7) = 4).
you pick the first n_bits of your random binary list and turn them into a decimal number.
if your decimal number is above N, well... you discard it and try again with the next n_bits bits until it works.
Example for N = 12:
n_bits = ceiling(log_2(12)) = 4
you take the first 4 bits of your random bit sequence, which might be "1101"
you turn "1101" into a decimal number, which gives 13. That's above 12, no good. So:
take the next 4 bits in your random sequence, which might be "0111".
turn "0111" into a decimal, which gives 7. That works!
Hope it helps.
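A minimal sketch of this rejection loop, aimed at the question's [0, N) range (assuming Python and its getrandbits as the bit stream):

import math
import random

def rand_below(N):
    # draw k = ceiling(log2(N)) bits at a time; reject values >= N
    k = max(1, math.ceil(math.log2(N)))
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | random.getrandbits(1)   # next bit of the stream
        if x < N:
            return x

Each attempt consumes k bits and succeeds with probability N/2^k > 1/2, so the expected cost stays below 2k bits.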
Actually most standard PRNGs such as linear congruential generators or Mersenne twister generate sequences of integer values. Even generalized feedback shift register techniques are usually implemented at the register/word level. I don't know of any common techniques that actually operate at the bit level. That's not to say they don't exist, but they're not common...
Generating values from 1 to N is usually accomplished by taking the integer value produced modulo the desired bound, and then doing an acceptance/rejection stage to make sure you aren't subject to modulo bias. See Java's nextInt(int bound) method, for example, to see how this can be implemented. (Add 1 to the result to get [1,N] rather than [0,N-1].)
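A sketch of that modulo-plus-rejection idea (assuming Python; the 31-bit word mirrors Java's non-negative int, but any word size works):

import random

def next_int(bound, word_bits=31):
    # reject the top sliver of the word range that doesn't divide evenly
    # by bound; otherwise small results would be slightly over-represented
    limit = (1 << word_bits) - ((1 << word_bits) % bound)
    while True:
        word = random.getrandbits(word_bits)   # one PRNG output word
        if word < limit:
            return word % bound                # uniform on [0, bound)

Adding 1 to the result gives [1, N] instead of [0, N-1].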
Theoretically this is possible. Find a, b such that 2^a > N^b but is very close. (This can be done by iterating through multiples of log2(N).) Take the first a bits and, interpreting them as a binary number, convert it to base N (also checking that the number is less than N^b). The digits give b terms of the desired sequence.
The problem is that converting to base N is very expensive and will cost more than essentially any PRNG, so this is mostly a theoretical answer.
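Still, a sketch of the batching idea (assuming Python; a is taken as the least number of bits covering N^b):

import math
import random

def batch_draws(N, b):
    # consume a bits at once and emit b base-N digits, rejecting the
    # (rare, when 2^a is close to N^b) draws that overflow N^b
    a = math.ceil(b * math.log2(N))
    while True:
        x = random.getrandbits(a)
        if x < N ** b:
            break
    digits = []
    for _ in range(b):
        x, d = divmod(x, N)
        digits.append(d)
    return digits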
Calculate the number of bits required for N (= location of the most significant bit with value 1) - let's call it k.
Take the first k bits from your input stream of bits - let's call it number X.
Result = X mod N.
Propagate to the next set of k bits and repeat from step 2 for next random number generation.
Alternatively, for better distribution, this can be applied instead of step 3:
Ratio = N / 2^k
Result = floor(X * Ratio)
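A sketch of both variants (assuming Python). Note that neither is exactly uniform when N is not a power of two; the scaled variant spreads the over-represented values across the range instead of clustering them at the low end:

import random

def k_bit_mod(N):
    k = N.bit_length()            # position of the most significant 1 bit
    x = random.getrandbits(k)     # next k bits of the input stream
    return x % N                  # step 3

def k_bit_scale(N):
    k = N.bit_length()
    x = random.getrandbits(k)
    return (x * N) >> k           # floor(X * N / 2^k)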
Start with the range [0, N-1] then use 0s and 1s to perform a binary search:
0: lower half
1: upper half
e.g. With N = 16, you start with [0, 15], and the sequence 0, 1, 1, 0 would give:
[0, 7]
[4, 7]
[6, 7]
[6]
If N is not a power of 2, then in any iteration the length of the list of remaining numbers could be odd, in which case a decision needs to be made to include the middle number in the lower half or the upper half. This can be decided right at the start of the algorithm. Roll once: 0 means include all instances of middle numbers in the lower half, and 1 means include them in the upper half.
I think this is at least closer to the uniform distribution that you are asking for compared to the common method of generating log(N) bits and taking that or taking the mod N of it.
To illustrate what I mean, using my method to generate a number in the range [0, 9]:
To generate 0
0: 0, 0, 0, 0
1: 0, 0, 0
To generate 1
0: 0, 0, 0, 1
1: 0, 0, 1
To generate 2
0: 0, 0, 1
1: 0, 1, 0
To generate 3
0: 0, 1, 0
1: 0, 1, 1, 0
To generate 4
0: 0, 1, 1
1: 0, 1, 1, 1
To generate 5
0: 1, 0, 0, 0
1: 1, 0, 0
To generate 6
0: 1, 0, 0, 1
1: 1, 0, 1
To generate 7
0: 1, 0, 1
1: 1, 1, 0
To generate 8
0: 1, 1, 0
1: 1, 1, 1, 0
To generate 9
0: 1, 1, 1
1: 1, 1, 1, 1
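A minimal sketch of the bisection (assuming Python; rand_bit stands in for the random bit stream):

import random

def bounded_rand(n, rand_bit=lambda: random.getrandbits(1)):
    # one up-front roll decides whether the middle element of an
    # odd-length range joins the lower half (0) or the upper half (1)
    mid_to_lower = rand_bit() == 0
    lo, hi = 0, n - 1
    while lo < hi:
        size = hi - lo + 1
        half = size // 2 + (1 if size % 2 == 1 and mid_to_lower else 0)
        if rand_bit() == 0:       # 0: keep the lower half
            hi = lo + half - 1
        else:                     # 1: keep the upper half
            lo = lo + half
    return lo

With n = 10, an initial roll of 0 and then the bits 0, 0, 0, 0, it returns 0, matching the first row of the table above.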
The other easy answer is to generate a large enough binary number such that taking mod N does not (statistically) favor some numbers over others. But I figured that you would not like this answer either because judging from your comments to another answer, you seem to be taking into account efficiency in terms of number of bits generated.
In short, I am not sure why I was downvoted for this answer as this algorithm seems to provide a nice distribution compared to the number of bits it uses (~log(N)).

Generating strongly biased random numbers for tests

I want to run tests with randomized inputs and need to generate 'sensible' random
numbers, that is, numbers that match well enough to pass the tested function's
preconditions, but hopefully wreak havoc deeper inside its code.
math.random() (I'm using Lua) produces uniformly distributed random
numbers. Scaling these up will give far more big numbers than small numbers,
and there will be very few integers.
I would like to skew the random numbers (or generate new ones using the old
function as a randomness source) in a way that strongly favors 'simple' numbers,
but will still cover the whole range, i.e., extend up to positive/negative infinity
(or about ±1.8e308 for doubles). This means:
numbers up to, say, ten should be most common,
integers should be more common than fractions,
numbers ending in 0.5 should be the most common fractions,
followed by 0.25 and 0.75; then 0.125,
and so on.
A different description: fix a base probability x such that the probabilities
sum to one, and define the probability of a number n as x^k,
where k is the generation in which n is constructed as a surreal
number[1]. That assigns x to 0, x^2 to -1 and +1,
x^3 to -2, -1/2, +1/2 and +2, and so on. This
gives a nice description of something close to what I want (it skews a bit too
much), but it is near-unusable for computing random numbers. The resulting
distribution is nowhere continuous (it's fractal!), I'm not sure how to
determine the base probability x (I think for infinite precision it would be
zero), and computing numbers based on this by iteration is awfully
slow (spending near-infinite time to construct large numbers).
Does anyone know of a simple approximation that, given a uniformly distributed
randomness source, produces random numbers very roughly distributed as
described above?
I would like to run thousands of randomized tests, quantity/speed is more
important than quality. Still, better numbers mean less inputs get rejected.
Lua has a JIT, so performance is usually not much of an issue. However, jumps based
on randomness will break every prediction, and many calls to math.random()
will be slow, too. This means a closed formula will be better than an
iterative or recursive one.
[1] Wikipedia has an article on surreal numbers, with
a nice picture. A surreal number is a pair of two surreal
numbers, i.e. x := {n|m}, and its value is the number in the middle of the
pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side
of the pair is empty, that's interpreted as increment (or decrement, if right
is empty) by one. If both sides are empty, that's zero. Initially, there are
no numbers, so the only number one can build is 0 := { | }. In generation
two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get
{1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some
more complex representations of known numbers, e.g. {-1|1} = 0). Note that
e.g. 1/3 is never generated by finite numbers because it is an infinite
fraction – the same goes for floats, 1/3 is never represented exactly.
How's this for an algorithm?
Generate a random float in (0, 1) with a library function
Generate a random integral roundoff point according to a desired probability density function (e.g. 0 with probability 0.5, 1 with probability 0.25, 2 with probability 0.125, ...).
'Round' the float at that roundoff point (e.g. floor(float_val * 2^roundoff + 0.5) / 2^roundoff)
Generate a random integral exponent according to another PDF (e.g. 0, 1, 2, 3 with probability 0.1 each, and decreasing thereafter)
Multiply the rounded float by 2^exponent.
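A minimal sketch of this recipe (assuming Python; the two geometric distributions stand in for the suggested PDFs):

import math
import random

def geometric(p):
    # failures before the first success: 0 w.p. p, 1 w.p. p(1-p), ...
    n = 0
    while random.random() >= p:
        n += 1
    return n

def simple_number():
    f = random.random()                       # step 1: uniform in (0, 1)
    roundoff = geometric(0.5)                 # step 2: 0 w.p. 1/2, 1 w.p. 1/4, ...
    scale = 2.0 ** roundoff
    f = math.floor(f * scale + 0.5) / scale   # step 3: keep roundoff binary places
    exponent = geometric(0.25)                # step 4: decreasing PDF over exponents
    return f * 2.0 ** exponent                # step 5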
For a surreal-like decimal expansion, you need a random binary number.
Even bits tell you whether to stop or continue, odd bits tell you whether to go right or left on the tree:
> 0... => 0.0 [50%] Stop
> 100... => -0.5 [<12.5%] Go, Left, Stop
> 110... => 0.5 [<12.5%] Go, Right, Stop
> 11100... => 0.25 [<3.125%] Go, Right, Go, Left, Stop
> 11110... => 0.75 [<3.125%] Go, Right, Go, Right, Stop
> 1110100... => 0.125
> 1110110... => 0.375
> 1111100... => 0.625
> 1111110... => 0.875
One way to quickly generate a random binary number is by looking at the decimal digits in math.random() and replacing 0-4 with '0' and 5-9 with '1':
0.8430419054348022
becomes
1000001010001000
which becomes -0.5
0.5513009827118367
becomes
1100001101001011
which becomes 0.5
etc
Haven't done much lua programming, but in Javascript you can do:
Math.random().toString().substring(2).split("").map(
function(digit) { return digit >= "5" ? 1 : 0 }
);
or true binary expansion:
Math.random().toString(2).substring(2)
Not sure which is more genuinely "random" -- you'll need to test it.
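A sketch of the decoding step (assuming Python; bits is any sequence of 0s and 1s, e.g. produced by the digit trick above, and is assumed long enough to reach a stop bit):

def decode(bits):
    # even-positioned bits: 1 = keep going, 0 = stop;
    # odd-positioned bits: 1 = go right (+), 0 = go left (-)
    it = iter(bits)
    value, step = 0.0, 0.5
    while next(it) == 1:
        value += step if next(it) == 1 else -step
        step /= 2
    return value

For example, decode([1, 1, 1, 0, 0]) returns 0.25, matching the table above.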
You could generate surreal numbers in this way, but most of the results will be decimals in the form a/2^b, with relatively few integers. On Day 3, only 2 integers are produced (-3 and 3) vs. 6 decimals, on Day 4 it is 2 vs. 14, and on Day n it is 2 vs (2^n-2).
If you add two uniform random numbers from math.random(), you get a new distribution which has a "triangle" like distribution (linearly decreasing from the center). Adding 3 or more will get a more 'bell curve' like distribution centered around 0:
math.random() + math.random() + math.random() - 1.5
Dividing by a random number will get a truly wild number:
A/(math.random()+1e-300)
This will return results between A and (theoretically) A*1e+300,
though my tests show that 50% of the time the results are between A and 2*A,
and about 75% of the time between A and 4*A.
Putting them together, we get:
round(6*(math.random()+math.random()+math.random() - 1.5)/(math.random()+1e-300))
This has over 70% of the number returned between -9 and 9 with a few big numbers popping up rarely.
Note that the average and sum of this distribution will tend to diverge towards a large negative or positive number, because the more times you run it, the more likely it is for a small number in the denominator to cause the number to "blow up" to a large number such as 147,967 or -194,137.
See gist for sample code.
Josh
You can immediately calculate the nth born surreal number.
Example, the 1000th Surreal number is:
convert to binary:
1000 dec = 1111101000 bin
1's become pluses and 0's minuses:
1111101000
+++++-+---
The first '1' bit contributes a value of 0; the run of identical bits that follows contributes +1 each (for 1's) or -1 each (for 0's); after that run, each subsequent bit contributes 1/2, 1/4, 1/8, etc., with sign + for 1 and - for 0.
bit:    1   1   1   1   1    0    1    0    0     0
sign:   +   +   +   +   +    -    +    -    -     -
value:  0   1   1   1   1   1/2  1/4  1/8  1/16  1/32
= 0 + 1 + 1 + 1 + 1 - 1/2 + 1/4 - 1/8 - 1/16 - 1/32
= 3+17/32
= 113/32
= 3.53125
The binary length in bits of this representation is equal to the day on which that number was born.
Left and right numbers of a surreal number are the binary representation with its tail stripped back to the last 0 or 1 respectively.
Surreal numbers are evenly distributed between -1 and 1, where half of the numbers created up to a particular day lie. A quarter of the numbers are evenly distributed over -2 to -1 and 1 to 2, and so on. The maximum range is the negative to positive integer matching the number of days you provide. The numbers grow toward infinity slowly, because each day adds only one to the negative and positive ranges, while each day contains twice as many numbers as the last.
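A minimal sketch of that direct calculation (assuming Python):

def nth_surreal(n):
    bits = bin(n)[2:]    # binary expansion, MSB first; the leading bit is always 1
    value = 0.0          # the leading 1 contributes 0
    i = 1
    if i < len(bits):
        run_bit = bits[i]
        unit = 1.0 if run_bit == '1' else -1.0
        while i < len(bits) and bits[i] == run_bit:
            value += unit          # run of similar bits: +1 (1's) or -1 (0's) each
            i += 1
    step = 0.5                     # after the run: +/- 1/2, 1/4, 1/8, ...
    while i < len(bits):
        value += step if bits[i] == '1' else -step
        step /= 2
        i += 1
    return value

nth_surreal(1000) gives 3.53125, matching the worked example above.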
Edit:
A good name for this bit representation is "sinary"
Negative numbers are transpositions, e.g.:
100010101001101s -> negative number (always starts 10...)
111101010110010s -> positive number (always starts 11...)
and we notice that all bits flip except the first one; it is a transposition.
NaN is => 0s (since all other numbers start with 1), which makes it ideal for representation in bit registers in a computer, since leading zeros are required (we don't make ternary computers anymore... too bad).
All Conway surreal algebra can be done on these numbers without needing to convert to binary or decimal.
The sinary format can be seen as a one plus a simple one's counter with a 2's complement decimal representation attached.
Here is an incomplete report on finary (similar to sinary): https://github.com/peawormsworth/tools/blob/master/finary/Fine%20binary.ipynb

Random function returning number from interval

How would you implement a function that returns a random number from the interval 1..1000,
given a number N that determines the chance of reaching higher or lower numbers?
It should behave as follows:
e.g.
if N = 0 and we generate the random number many times, we will get a certain equilibrium (every number from the interval 1..1000 has an equal chance).
if N = 2321 (I call it a positive factor) it will be very hard to get a small number (numbers > 900 will often be generated, sometimes numbers near 500, and rarely numbers < 100). The higher the positive factor, the higher the probability of high numbers.
if N = -2321 (a negative factor) this will be the opposite of the positive factor.
It's clear that for a given N the generated numbers will follow a certain characteristic curve. Could you advise me how to achieve this goal and what curves I can create? What possibilities do I have here? How would you limit the positive and negative factors, etc.?
Thank you for your help.
If you generate a uniform random number, and then raise it to a power > 1, it will get smaller, but stay in the range [0, 1]. If you raise it to a power greater than 0 but less than 1, it will get larger, but stay in the range [0, 1].
So you can use the exponent to pick a power when generating your random numbers.
import random

def biased_random(scale, bias):
    return random.random() ** bias * scale

>>> sum(biased_random(1000, 2.5) for x in range(100)) / 100
291.59652962214676  # average less than 500
>>> max(biased_random(1000, 2.5) for x in range(100))
963.81166161355998  # but still occasionally generates large numbers
>>> sum(biased_random(1000, .3) for x in range(100)) / 100
813.90199860117821  # average > 500
>>> min(biased_random(1000, .3) for x in range(100))
265.25040459294883  # but still occasionally generates small numbers
This problem is severely underspecified; there are a million ways to solve it as stated.
Instead of arbitrary positive and negative values, try to think about what they mean. IMHO, the beta distribution is the one you should consider. By selecting the parameters α and β you can modulate the behavior of your distribution appropriately.
See what shapes you can get with certain α and β: http://en.wikipedia.org/wiki/Beta_distribution#Shapes
http://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg
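A minimal sketch (assuming Python's random.betavariate; the mapping from N to α and β is an illustrative choice, not prescribed by the answer):

import random

def biased_1_to_1000(N, scale=1000.0):
    # N > 0 pushes mass toward 1000, N < 0 toward 1;
    # N = 0 gives Beta(1, 1), i.e. the uniform distribution
    alpha = 1.0 + max(N, 0) / scale
    beta = 1.0 + max(-N, 0) / scale
    return 1 + int(random.betavariate(alpha, beta) * 1000)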
Let's begin by deciding that we will pick numbers from [0,1], because it makes things simpler.
n is the number that represents the distribution (2321 or -2321, as in the example).
We only need a solution for n > 0, because if n < 0 you can take the positive version of n and subtract the result from 1.
One simple idea for a PDF on the interval [0,1] is x^n (or at least something of this shape).
The CDF is then the integral of x^n, which is x^(n+1)/(n+1).
Because the CDF must equal 1 at the end of the interval (in our case at 1), the properly weighted final CDF is x^(n+1).
To generate this kind of distribution, we must calculate the quantile function.
The quantile function is just the inverse of the CDF, in our case x^(1/(n+1)).
And that is it. Your QF is x^(1/(n+1)).
To generate numbers in [0,1] you pick a uniformly distributed random x in [0,1] (the most common random function in programming languages) and raise it to the power 1/(n+1).
The only problem I see is that it may be hard to compute 1 - x^(1/(-n+1)) accurately when n < 0, but I think you can use log1p, so it becomes exp(log1p(-x^(1/(-n+1)))) for n < 0.
Conclusion, with normalization:
if n >= 0: (x^(1/(n/1000+1))) * 1000
if n < 0: exp(log1p(-(x^(1/(-(n/1000)+1))))) * 1000
where x is a uniformly distributed random value in the interval [0,1].
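Putting it together as a minimal sketch (assuming Python; the final mapping onto 1..1000 is an illustrative choice):

import math
import random

def skewed(N, scale=1000.0):
    # quantile-function sampling: x^(1/(n+1)) skews toward 1 for n > 0;
    # for n < 0 the result is mirrored as 1 - x^(1/(n+1)),
    # written via log1p as in the answer above
    x = random.random()
    n = abs(N) / scale                  # the n/1000 normalization
    v = x ** (1.0 / (n + 1.0))
    if N < 0:
        v = math.exp(math.log1p(-v))    # = 1 - v
    return 1 + int(v * 999)             # map [0, 1] onto 1..1000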
