Generating strongly biased random numbers for tests - random

I want to run tests with randomized inputs and need to generate 'sensible' random
numbers, that is, numbers that match well enough to pass the tested function's
preconditions, but hopefully wreak havoc deeper inside its code.
math.random() (I'm using Lua) produces uniformly distributed random
numbers. Scaling these up will give far more big numbers than small numbers,
and there will be very few integers.
I would like to skew the random numbers (or generate new ones using the old
function as a randomness source) in a way that strongly favors 'simple' numbers,
but will still cover the whole range, i.e., extending up to positive/negative infinity
(or ±1e309 for double). This means:
numbers up to, say, ten should be most common,
integers should be more common than fractions,
numbers ending in 0.5 should be the most common fractions,
followed by 0.25 and 0.75; then 0.125,
and so on.
A different description: Fix a base probability x such that the probabilities
will sum to one, and define the probability of a number n as x^k,
where k is the generation in which n is constructed as a surreal
number [1]. That assigns x to 0, x^2 to -1 and +1,
x^3 to -2, -1/2, +1/2 and +2, and so on. This
gives a nice description of something close to what I want (it skews a bit too
much), but is near-unusable for computing random numbers. The resulting
distribution is nowhere continuous (it's fractal!), I'm not sure how to
determine the base probability x (I think for infinite precision it would be
zero), and computing numbers based on this by iteration is awfully
slow (spending near-infinite time to construct large numbers).
Does anyone know of a simple approximation that, given a uniformly distributed
randomness source, produces random numbers very roughly distributed as
described above?
I would like to run thousands of randomized tests, quantity/speed is more
important than quality. Still, better numbers mean fewer inputs get rejected.
Lua has a JIT, so performance is usually not much of an issue. However, jumps based
on randomness will break every prediction, and many calls to math.random()
will be slow, too. This means a closed formula will be better than an
iterative or recursive one.
[1] Wikipedia has an article on surreal numbers, with
a nice picture. A surreal number is a pair of two surreal
numbers, i.e. x := {n|m}, and its value is the number in the middle of the
pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side
of the pair is empty, that's interpreted as incrementing the other by one (if the
right is empty) or decrementing it by one (if the left is empty). If both sides
are empty, that's zero. Initially, there are
no numbers, so the only number one can build is 0 := { | }. In generation
two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get
{1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some
more complex representations of known numbers, e.g. {-1|1} = 0). Note that
e.g. 1/3 is never generated by finite numbers because it is an infinite
fraction – the same goes for floats, 1/3 is never represented exactly.

How's this for an algorithm?
Generate a random float in (0, 1) with a library function
Generate a random integral roundoff point according to a desired probability density function (e.g. 0 with probability 0.5, 1 with probability 0.25, 2 with probability 0.125, ...).
'Round' the float by that roundoff point (e.g. floor(float_val * 2^roundoff + 0.5))
Generate a random integral exponent according to another PDF (e.g. 0, 1, 2, 3 with probability 0.1 each, and decreasing thereafter)
Multiply the rounded float by 2^exponent.
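A minimal Python sketch of this recipe, under a few assumptions of mine: the 'rounding' step keeps roundoff fractional bits (so dyadic fractions survive the later scaling), both PDFs are stand-ins built from geometric draws, and a random sign is added so negative values appear too:

import math
import random

def halving_pdf():
    # 0 with probability 1/2, 1 with probability 1/4, 2 with probability 1/8, ...
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def simple_random():
    f = random.random()                                   # 1. uniform float in (0, 1)
    roundoff = halving_pdf()                              # 2. random roundoff point
    f = math.floor(f * 2**roundoff + 0.5) / 2**roundoff   # 3. keep 'roundoff' fractional bits
    exponent = halving_pdf() + halving_pdf()              # 4. some decreasing PDF over exponents
    sign = 1 if random.random() < 0.5 else -1             # assumption: random sign for negatives
    return sign * f * 2**exponent                         # 5. scale by 2^exponent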

For a surreal-like decimal expansion, you need a random binary number.
Even bits tell you whether to stop or continue, odd bits tell you whether to go right or left on the tree:
> 0... => 0.0 [50%] Stop
> 100... => -0.5 [<12.5%] Go, Left, Stop
> 110... => 0.5 [<12.5%] Go, Right, Stop
> 11100... => 0.25 [<3.125%] Go, Right, Go, Left, Stop
> 11110... => 0.75 [<3.125%] Go, Right, Go, Right, Stop
> 1110100... => 0.125
> 1110110... => 0.375
> 1111100... => 0.625
> 1111110... => 0.875
One way to quickly generate a random binary number is by looking at the decimal digits in math.random() and replacing 0-4 with '0' and 5-9 with '1':
0.8430419054348022
becomes
1000001010001011
which becomes -0.5
0.5513009827118367
becomes
1100001101001011
which becomes 0.5
etc
Haven't done much Lua programming, but in JavaScript you can do:
Math.random().toString().substring(2).split("").map(
    function(digit) { return digit >= "5" ? 1 : 0 }
);
or true binary expansion:
Math.random().toString(2).substring(2)
Not sure which is more genuinely "random" -- you'll need to test it.
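Not Lua either, but here is a small Python sketch of the decode walk described above; the function name is mine, and the bit list could come from either of the expansions shown:

def bits_to_dyadic(bits):
    # Even-position bits say continue (1) or stop (0);
    # odd-position bits say right (1) or left (0).
    value, step, i = 0.0, 0.5, 0
    while i + 1 < len(bits) and bits[i] == 1:
        value += step if bits[i + 1] == 1 else -step
        step /= 2
        i += 2
    return value

print(bits_to_dyadic([1, 0, 0]))                # -0.5
print(bits_to_dyadic([1, 1, 1, 0, 1, 0, 0]))    # 0.125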
You could generate surreal numbers in this way, but most of the results will be decimals in the form a/2^b, with relatively few integers. On Day 3, only 2 integers are produced (-3 and 3) vs. 6 decimals, on Day 4 it is 2 vs. 14, and on Day n it is 2 vs (2^n-2).
If you add two uniform random numbers from math.random(), you get a new "triangle"-like distribution (the density decreases linearly away from the center). Adding three or more (and subtracting the mean) gives a more bell-curve-like distribution centered around 0:
math.random() + math.random() + math.random() - 1.5
Dividing by a random number will get a truly wild number:
A/(math.random()+1e-300)
This will return results between A and (theoretically) A*1e+300,
though my tests show that 50% of the time the results are between A and 2*A
and about 75% of the time between A and 4*A.
Putting them together, we get:
round(6*(math.random()+math.random()+math.random() - 1.5)/(math.random()+1e-300))
This has over 70% of the numbers returned between -9 and 9, with a few big numbers popping up rarely.
Note that the average and sum of this distribution will tend to diverge towards a large negative or positive number, because the more times you run it, the more likely it is for a small number in the denominator to cause the number to "blow up" to a large number such as 147,967 or -194,137.
See gist for sample code.
Josh

You can immediately calculate the nth born surreal number.
Example, the 1000th Surreal number is:
convert to binary:
1000 dec = 1111101000 bin
1's become pluses and 0's minuses:
1111101000
+++++-+---
The first bit contributes 0; each bit in the leading run of identical bits that follows contributes +1 (for a 1) or -1 (for a 0); after that run ends, the remaining bits contribute 1/2, 1/4, 1/8, etc., with the sign given by the bit.
1 1 1 1 1 0 1 0 0 0
+ + + + + - + - - -
0 1 1 1 1 h h h h h
+0+1+1+1+1-1/2+1/4-1/8-1/16-1/32
= 3+17/32
= 113/32
= 3.53125
The binary length in bits of this representation is equal to the day on which that number was born.
Left and right numbers of a surreal number are the binary representation with its tail stripped back to the last 0 or 1 respectively.
Surreal numbers are evenly distributed between -1 and 1, where half of the numbers created up to a particular day lie. A quarter of the numbers lie evenly distributed between -2 and -1 and between 1 and 2, and so on. The maximum range runs from minus to plus the number of days you generate. The numbers grow toward infinity slowly, because each day extends the negative and positive range by only one, while containing twice as many numbers as the day before.
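A small Python sketch of this direct computation (my own rendering of the rule above; Fraction keeps the dyadic values exact):

from fractions import Fraction

def nth_born_surreal(n):
    # Value of the n-th born surreal number (n >= 1): write n in binary, let the
    # leading bit be worth 0, each bit in the run of identical bits that follows
    # be worth +1 or -1, and every bit after that run be worth +/-1/2, +/-1/4, ...
    bits = bin(n)[2:]                          # e.g. 1000 -> '1111101000'
    value = Fraction(0)
    i = 1
    while i < len(bits) and bits[i] == bits[1]:
        value += 1 if bits[i] == '1' else -1
        i += 1
    step = Fraction(1, 2)
    for b in bits[i:]:
        value += step if b == '1' else -step
        step /= 2
    return value

print(nth_born_surreal(1000))                  # 113/32, i.e. 3.53125 as in the example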
Edit:
A good name for this bit representation is "sinary"
Negative numbers are transpositions. Ex:
100010101001101s -> negative number (always starts 10...)
111101010110010s -> positive number (always starts 11...)
and we notice that all bits flip except the first one, so negation is a transposition.
NaN is => 0s (since all other numbers start with 1), which makes it ideal for representation in bit registers in a computer, since leading zeros are required (we don't make ternary computers anymore... too bad).
All Conway surreal algebra can be done on these numbers without needing to convert to binary or decimal.
The sinary format can be seen as a one plus a simple ones counter with a 2's complement decimal representation attached.
Here is an incomplete report on finary (similar to sinary): https://github.com/peawormsworth/tools/blob/master/finary/Fine%20binary.ipynb

Related

Early termination of fractional exponent calculation?

I need to write a function that takes the sixth root of something (equivalently, raises something to the 1/6 power), and checks if the answer is an integer. I want this function to be as fast and as optimized as possible, and since this function needs to run a lot, I'm thinking it might be best to not have to calculate the whole root.
How would I write a function (language agnostic, although Python/C/C++ preferred) that returns False (or 0 or something equivalent) before having to compute the entirety of the sixth root? For instance, if I was taking the 6th root of 65, then my function should, upon realizing that the result is not an int, stop calculating and return False, instead of first computing that the 6th root of 65 is 2.00517474515, then checking if 2.00517474515 is an int, and finally returning False.
Of course, I'm asking this question under the impression that it is faster to do the early termination thing than the complete computation, using something like
print(isinstance(num**(1/6), int))
Any help or ideas would be greatly appreciated. I would also be interested in answers that are generalizable to lots of fractional powers, not just x^(1/6).
Here are some ideas of things you can try that might help eliminate non-sixth-powers quickly. For actual sixth powers, you'll still end up eventually needing to compute the sixth root.
Check small cases
If the numbers you're given have a reasonable probability of being small (less than 12 digits, say), you could build a table of small cases and check against that. There are only 100 sixth powers smaller than 10**12. If your inputs will always be larger, then there's little value in this test, but it's still a very cheap test to make.
Eliminate small primes
Any small prime factor must appear with an exponent that's a multiple of 6. To avoid too many trial divisions, you can bundle up some of the small factors.
For example, 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 = 223092870, which is small enough to fit in a single 30-bit limb in Python, so a single modulo operation with that modulus should be fast.
So given a test number n, compute g = gcd(n, 223092870), and if the result is not 1, check that n is exactly divisible by g ** 6. If not, n is not a sixth power, and you're done. If n is exactly divisible by g**6, repeat with n // g**6.
Check the value modulo 124488 (for example)
If you carried out the previous step, then at this point you have a value that's not divisible by any prime smaller than 25. Now you can do a modulus test with a carefully chosen modulus: for example, any sixth power that's relatively prime to 124488 = 8 * 9 * 7 * 13 * 19 is congruent to one of the six values [1, 15625, 19657, 28729, 48385, 111385] modulo 124488. There are larger moduli that could be used, at the expense of having to check more possible residues.
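As a quick sanity check (not part of the original answer), the quoted residue set can be reproduced by brute force over one period of the modulus:

from math import gcd

residues = sorted({pow(x, 6, 124488) for x in range(124488) if gcd(x, 124488) == 1})
print(residues)   # should print [1, 15625, 19657, 28729, 48385, 111385]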
Check whether it's a square
Any sixth power must be a square. Since Python (at least, Python >= 3.8) has a built-in integer square root function that's reasonably fast, it's efficient to check whether the value is a square before going for computing a full sixth root. (And if it is a square and you've already computed the square root, now you only need to extract a cube root rather than a sixth root.)
Use floating-point arithmetic
If the input is not too large, say 90 digits or smaller, and it's a sixth power then floating-point arithmetic has a reasonable chance of finding the sixth root exactly. However, Python makes no guarantees about the accuracy of a power operation, so it's worth making some additional checks to make sure that the result is within the expected range. For larger inputs, there's less chance of floating-point arithmetic getting the right result. The sixth root of (2**53 + 1)**6 is not exactly representable as a Python float (making the reasonable assumption that Python's float type matches the IEEE 754 binary64 format), and once n gets past 308 digits or so it's too large to fit into a float anyway.
Use integer arithmetic
Once you've exhausted all the cheap tricks, you're left with little choice but to compute the floor of the sixth root, then compare the sixth power of that with the original number.
Here's some Python code that puts together all of the tricks listed above. You should do your own timings targeting your particular use-case, and choose which tricks are worth keeping and which should be adjusted or thrown out. The order of the tricks will also be significant.
from math import gcd, isqrt

# Sixth powers smaller than 10**12.
SMALL_SIXTH_POWERS = {n**6 for n in range(100)}

def is_sixth_power(n):
    """
    Determine whether a positive integer n is a sixth power.
    Returns True if n is a sixth power, and False otherwise.
    """
    # Sanity check (redundant with the small cases check)
    if n <= 0:
        return n == 0

    # Check small cases
    if n < 10**12:
        return n in SMALL_SIXTH_POWERS

    # Try a floating-point check if there's a realistic chance of it working
    if n < 10**90:
        s = round(n ** (1/6.))
        if n == s**6:
            return True
        elif (s - 1) ** 6 < n < (s + 1)**6:
            return False
        # No conclusive result; fall through to the next test.

    # Eliminate small primes
    while True:
        g = gcd(n, 223092870)
        if g == 1:
            break
        n, r = divmod(n, g**6)
        if r:
            return False

    # Check modulo small primes (requires that
    # n is relatively prime to 124488)
    if n % 124488 not in {1, 15625, 19657, 28729, 48385, 111385}:
        return False

    # Find the square root using math.isqrt, throw out non-squares
    s = isqrt(n)
    if s**2 != n:
        return False

    # Compute the floor of the cube root of s
    # (which is the same as the floor of the sixth root of n).
    # Code stolen from https://stackoverflow.com/a/35276426/270986
    a = 1 << (s.bit_length() - 1) // 3 + 1
    while True:
        d = s//a**2
        if a <= d:
            return a**3 == s
        a = (2*a + d)//3

Random Numbers based on the ANU Quantum Random Numbers Server

I have been asked to use the ANU Quantum Random Numbers Service to create random numbers and use Random.rand only as a fallback.
module QRandom
  def next
    RestClient.get('http://qrng.anu.edu.au/API/jsonI.php?type=uint16&length=1'){ |response, request, result, &block|
      case response.code
      when 200
        _json=JSON.parse(response)
        if _json["success"]==true && _json["data"]
          _json["data"].first || Random.rand(65535)
        else
          Random.rand(65535) #fallback
        end
      else
        puts response #log problem
        Random.rand(65535) #fallback
      end
    }
  end
end
Their API service gives me a number between 0-65535. In order to create a random number for a bigger set, like a random number between 0-99999, I have to do the following:
(QRandom.next.to_f*(99999.to_f/65535)).round
This strikes me as the wrong way of doing it, since if I were to use a service (quantum or not) that creates numbers from 0-3 and transpose them into a space of 0-9999, I would always end up with a choice of only 4 numbers. How can I use the service that produces numbers between 0-65535 to create random numbers for a larger number set?
Since 65535 is 1111111111111111 in binary, you can just think of the random number server as a source of random bits. The fact that it gives the bits to you in chunks of 16 is not important, since you can make multiple requests and you can also ignore certain bits from the response.
So after performing that abstraction, what we have now is a service that gives you a random bit (0 or 1) whenever you want it.
Figure out how many bits of randomness you need. Since you want a number between 0 and 99999, you just need to find a binary number that is all ones and is greater than or equal to 99999. Decimal 99999 is equal to binary 11000011010011111, which is 17 bits long, so you will need 17 bits of randomness.
Now get 17 bits of randomness from the service and assemble them into a binary number. The number will be between 0 and 2**17-1 (131071), and it will be evenly distributed. If the random number happens to be greater than 99999, then throw away the bits you have and try again. (The probability of needing to retry should be less than 50%.)
Eventually you will get a number between 0 and 99999, and this algorithm should give you a totally uniform distribution.
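A minimal Python sketch of this rejection scheme; random_bit is a placeholder standing in for the quantum bit source described above:

import random

def rand_up_to_99999(random_bit=lambda: random.getrandbits(1)):
    while True:
        n = 0
        for _ in range(17):          # 17 bits cover 0..131071
            n = (n << 1) | random_bit()
        if n <= 99999:
            return n                 # otherwise throw the bits away and retry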
How about asking for more numbers? Using the length parameter of that API you can just ask for extra numbers and sum them so you get bigger numbers like you want.
http://qrng.anu.edu.au/API/jsonI.php?type=uint16&length=2
You can use inject for the sum and the modulo operation to make sure the number is not bigger than you want.
json["data"].inject(:+) % MAX_NUMBER
I made some other changes to your code like using SecureRandom instead of the regular Random. You can find the code here:
https://gist.github.com/matugm/bee45bfe637f0abf8f29#file-qrandom-rb
Think of the individual numbers you are getting as 16 bits of randomness. To make larger random numbers, you just need more bits. The tricky bit is figuring out how many bits is enough. For example, if you wanted to generate numbers from an absolutely fair distribution from 0 to 65000, then it should be pretty obvious that 16 bits are not enough; even though you have the range covered, some numbers will have twice the probability of being selected than others.
There are a couple of ways around this problem. Using Ruby's Bignum (technically that happens behind the scenes, it works well in Ruby because you won't overflow your Integer type) it is possible to use a method that simply collects more bits until the result of a division could never be ambiguous - i.e. the difference when adding more significant bits to the division you are doing could never change the result.
This is what it might look like, using your QRandom.next method to fetch bits in batches of 16:
def QRandom.rand max
  max = max.to_i # This approach requires integers
  power = 1
  sum = 0
  loop do
    sum = 2**16 * sum + QRandom.next
    power *= 2**16
    lower_bound = sum * max / power
    break lower_bound if lower_bound == ( (sum + 1) * max ) / power
  end
end
Because it costs you quite a bit to fetch random bits from your chosen source, you may benefit from taking this to the most efficient form possible, which is similar in principle to Arithmetic Coding and squeezes out the maximum possible entropy from your source whilst generating unbiased numbers in 0...max. You would need to implement a method QRandom.next_bits( num ) that returned an integer constructed from a bitstream buffer originating with your 16-bit numbers:
def QRandom.rand max
  max = max.to_i # This approach requires integers

  # I prefer this: start_bits = Math.log2( max ).floor
  # But this also works (and avoids suggestions the algo uses FP):
  start_bits = max.to_s(2).length

  sum = QRandom.next_bits( start_bits )
  power = 2 ** start_bits

  # No need for fractional bits if max is power of 2
  return sum if power == max

  # Draw 1 bit at a time to resolve fractional powers of 2
  loop do
    lower_bound = (sum * max) / power
    break lower_bound if lower_bound == ((sum + 1) * max)/ power
    sum = 2 * sum + QRandom.next_bits(1) # 0 or 1
    power *= 2
  end
end
This is the most efficient use of bits from your source possible. It is always as efficient as or better than re-try schemes. The expected number of bits used per call to QRandom.rand( max ) is 1 + Math.log2( max ) - i.e. on average this allows you to draw just over the fractional number of bits needed to represent your range.

Generate a unique number out of the combination of 'n' different numbers?

To clarify, as input I have 'n' numbers (n1, n2, n3, ...), all integers, such that each number is unique within this set.
I would like to generate a number out of this set (let's call the generated number big 'N') that is also unique, and that allows me to verify that a number 'n1' belongs to the set 'n' just by using 'N'.
Is that possible?
Edit:
Thanks for the answers guys, I am looking into them atm. For those requesting an example, here is a simple one:
imagine i have those paths (bi-directional graph) with a random unique value (let's call it identifier):
P1 (N1): A----1----B----2----C----3----D
P2 (N2): A----4----E----5----D
So I want to get the full path (a unique path, not all paths) from A knowing N1, and this path as a result should be P1.
Mind you that 1,2,...are just unique numbers in this graph, not weights or distances, I just use them for my heuristic.
If you are dealing with small numbers, no problem. You are doing the same thing with digits every time you compose a number: a digit is a number from 0 to 9 and a full number is a combination of them that:
is itself a number
is unique for given digits
allows you to easily verify if a digit is inside
The gotcha is that the numbers must have an upper limit, like 10 is for digits. Let's say 1000 here for simplicity, the similar composed number could be:
n1*1000^k + n2*1000^(k-1) + n3*1000^(k-2) ... + nk*1000^(0)
So if you have numbers 33, 44 and 27 you will get:
33*1000000 + 44*1000 + 27, and that is number N: 33044027
Of course you can do the same with bigger limits, and binary like 256,1024 or 65535, but it grows big fast.
A better idea, if possible is to convert it into a string (a string is still a number!) with some separator (a number in base 11, that is 10 normal digits + 1 separator digit). This is more flexible as there are no upper limits. Imagine to use digits 0-9 + a separator digit 'a'. You can obtain number 33a44a27 in base 11. By translating this to base 10 or base 16 you can get an ordinary computer number (65451833 if I got it right). Then converting 65451833 to undecimal (base11) 33a44a27, and splitting by digit 'a' you can get the original numbers back to test.
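A small Python sketch of this base-11 trick (the function names are mine); int() already accepts 'a' as the digit ten in base 11:

def encode(numbers):
    # Join the decimal digits with the separator digit 'a' and read the
    # result as a base-11 numeral.
    return int("a".join(str(n) for n in numbers), 11)

def decode(big_n):
    # Convert back to base 11 and split on the separator digit.
    digits = ""
    while big_n:
        big_n, d = divmod(big_n, 11)
        digits = "0123456789a"[d] + digits
    return [int(part) for part in digits.split("a")]

print(encode([33, 44, 27]))   # 65451833
print(decode(65451833))       # [33, 44, 27]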
EDIT: A VARIABLE BASE NUMBER?
Of course this would work better digitally in base 17 (16 digits + separator). But I suspect there are more optimal ways: for example, if the numbers are unique in the path, then the more numbers you add, the fewer remain, so the base could shrink. Can you imagine a number in which the first digit is in base 20, the second in base 19, the third in base 18, and so on? Can this be done? Meh?
In this variating base world (in a 10 nodes graph), path n0-n1-n2-n3-n4-n5-n6-n7-n8-n9 would be
n0*10^0 + (n1*9^1)+(offset:1) + n2*8^2+(offset:18) + n3*7^3+(offset:170)+...
offset1: 10-9=1
offset2: 9*9^1-1*8^2+1=81-64+1=18
offset3: 8*8^2-1*7^3+1=512-343+1=170
If I got it right, in this fiddle: http://jsfiddle.net/Hx5Aq/ the biggest number path would be: 102411
var path="9-8-7-6-5-4-3-2-1-0"; // biggest number
o2=(Math.pow(10,1)-Math.pow(9,1)+1); // offsets so digits do not overlap
o3=(Math.pow(9,2)-Math.pow(8,2)+1);
o4=(Math.pow(8,3)-Math.pow(7,3)+1);
o5=(Math.pow(7,4)-Math.pow(6,4)+1);
o6=(Math.pow(6,5)-Math.pow(5,5)+1);
o7=(Math.pow(5,6)-Math.pow(4,6)+1);
o8=(Math.pow(4,7)-Math.pow(3,7)+1);
o9=(Math.pow(3,8)-Math.pow(2,8)+1);
o10=(Math.pow(2,9)-Math.pow(1,9)+1);
o11=(Math.pow(1,10)-Math.pow(0,10)+1);
var n=path.split("-");
var res;
res=
n[9]*Math.pow(10,0) +
n[8]*Math.pow(9,1) + o2 +
n[7]*Math.pow(8,2) + o3 +
n[6]*Math.pow(7,3) + o4 +
n[5]*Math.pow(6,4) + o5 +
n[4]*Math.pow(5,5) + o6 +
n[3]*Math.pow(4,6) + o7 +
n[2]*Math.pow(3,7) + o8 +
n[1]*Math.pow(2,8) + o9 +
n[0]*Math.pow(1,9) + o10;
alert(res);
So N<=102411 would represent any path of ten nodes? Just a trial. You have to find a way of naming the nodes; for instance, if they are 1,2,3,4,5,6... and you use 5, you have to compact the remaining ones: 1,2,3,4,6->5,7->6... => 1,2,3,4,5,6... (that is reversible and unique if you start from the first)
Theoretically, yes it is.
By defining p_i as the i'th prime number, you can generate N=p_(n1)*p_(n2)*..... Now, all you have to do is to check if N%p_(n) == 0 or not.
However, note that N will grow to huge numbers very fast, so I am not sure this is a very practical solution.
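A Python sketch of the prime-product idea (nth_prime is a naive helper of mine, fine only for small indices):

def nth_prime(i):
    # The i-th prime, 1-indexed; deliberately simple and slow.
    primes, candidate = [], 1
    while len(primes) < i:
        candidate += 1
        if all(candidate % p for p in primes):
            primes.append(candidate)
    return primes[-1]

def encode(numbers):
    big_n = 1
    for k in numbers:
        big_n *= nth_prime(k)
    return big_n

def contains(big_n, k):
    return big_n % nth_prime(k) == 0

N = encode([33, 44, 27])
print(contains(N, 44), contains(N, 45))   # True False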
One very practical probabilistic solution is using Bloom filters. Note that a Bloom filter is a set of bits, which can easily be read back as a single number N.
Bloom filters have no false negatives (if you said a number is not in the set, it really isn't), but do suffer from false positives with an expected given probability (which depends on the size of the set, the number of hash functions used and the number of bits used).
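A minimal Bloom-filter sketch in Python, storing the whole filter in one integer so it can be carried around as a single number N (the sizes and hash choice are arbitrary assumptions of mine):

import hashlib

class IntBloom:
    def __init__(self, bits=1024, hashes=3):
        self.m, self.k, self.n = bits, hashes, 0

    def _positions(self, value):
        # k independent-ish bit positions derived from a hash of the value.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.n |= 1 << pos

    def might_contain(self, value):
        return all(self.n >> pos & 1 for pos in self._positions(value))

bf = IntBloom()
for x in (33, 44, 27):
    bf.add(x)
print(bf.might_contain(44))   # True
print(bf.might_contain(45))   # False (with high probability)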
As a side note, to get a result that is 100% accurate, you are going to need at the very least 2^k bits (where k is the number of bits used to represent an element) to represent the number N, viewing this number as a bitset where each bit indicates the existence or non-existence of a number in the set. You can show that there is no 100% accurate solution that uses fewer bits (pigeonhole principle). Note that for 32-bit integers, for example, this means you are going to need an N with 2^32 bits, which is impractical.

Random function returning number from interval

How would you implement a function that returns a random number from the interval 1..1000,
given a number N that determines the chance of reaching higher or lower numbers?
It should behave as follows:
e.g.
if N = 0 and we generate the random number many times, we will get a certain equilibrium (every number from the interval 1..1000 has an equal chance).
if N = 2321 (I call it a positive factor) it will be very hard to get a small number (numbers > 900 will be generated often, numbers near 500 sometimes, and numbers < 100 rarely). The higher the positive factor, the higher the probability of high numbers.
if N = -2321 (a negative factor) this will be the opposite of the positive factor.
It's clear that for a given N the generated numbers will follow a certain characteristic curve. Could you advise me how to achieve this goal, and what curves I can create? What possibilities do I have here? How would you limit positive and negative factors, etc.?
thank you for help
If you generate a uniform random number, and then raise it to a power > 1, it will get smaller, but stay in the range [0, 1]. If you raise it to a power greater than 0 but less than 1, it will get larger, but stay in the range [0, 1].
So you can use the exponent to pick a power when generating your random numbers.
import random

def biased_random(scale, bias):
    return random.random() ** bias * scale
sum(biased_random(1000, 2.5) for x in range(100)) / 100
291.59652962214676 # average less than 500
max(biased_random(1000, 2.5) for x in range(100))
963.81166161355998 # but still occasionally generates large numbers
sum(biased_random(1000, .3) for x in range(100)) / 100
813.90199860117821 # average > 500
min(biased_random(1000, .3) for x in range(100))
265.25040459294883 # but still occasionally generates small numbers
This problem is severely underspecified; there are a million ways to solve it as stated.
Instead of arbitrary positive and negative values, try to think about what the meaning behind them is. IMHO, the beta distribution is the one you should consider. By selecting the parameters \alpha and \beta you can appropriately modulate the behavior of your distribution.
See what shapes you can get with certain \alpha and \beta http://en.wikipedia.org/wiki/Beta_distribution#Shapes
http://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg
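For example, a sketch using Python's random.betavariate; the mapping from the question's factor N to \alpha and \beta is an arbitrary choice of mine, not something prescribed by this answer:

import random

def beta_biased(n_factor, lo=1, hi=1000):
    # Positive factors push mass toward hi, negative ones toward lo;
    # n_factor = 0 degenerates to the uniform distribution.
    alpha = 1 + max(n_factor, 0) / 1000.0
    beta = 1 + max(-n_factor, 0) / 1000.0
    return lo + round(random.betavariate(alpha, beta) * (hi - lo))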
Let's begin by deciding that we will pick numbers from [0,1], because it makes things simpler.
n is the number that represents the distribution (0, 2321 or -2321, as in the example).
We only need a solution for n > 0: if n < 0, you can take the positive version of n and subtract the result from 1.
One simple idea for the PDF on the interval [0,1] is x^n (or at least something of this shape).
The CDF is then the integral of x^n, which is x^(n+1)/(n+1).
Because the CDF must be 1 at the right endpoint (in our case at 1), the properly normalized CDF is x^(n+1).
To generate this kind of distribution, we must calculate the quantile function.
The quantile function is just the inverse of the CDF, which in our case is x^(1/(n+1)).
And that is it: your quantile function is x^(1/(n+1)).
To generate numbers in [0,1] you pick a uniformly distributed random value x from [0,1] (the most common random function in programming languages)
and then raise it to the power 1/(n+1).
The only problem I see is that it can be hard to calculate 1-x^(1/(-n+1)) accurately when n < 0, but I think you can use log1p,
so it becomes exp(log1p(-x^(1/(-n+1)))) if n < 0.
Conclusion, with normalization:
if n >= 0: (x^(1/(n/1000+1)))*1000
if n < 0: exp(log1p(-(x^(1/(-(n/1000)+1)))))*1000
where x is a uniformly distributed random value in the interval [0,1].
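A short Python sketch of this recipe (using the /1000 normalization from the conclusion above); the function name is mine:

import random

def skewed_random(n_factor):
    # Returns a float in [0, 1000]; positive factors favor high values,
    # negative factors mirror the distribution toward low values.
    x = random.random()
    n = abs(n_factor) / 1000.0
    y = x ** (1.0 / (n + 1))
    if n_factor < 0:
        y = 1.0 - y
    return y * 1000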

Unbiased random number generator using a biased one

You have a biased random number generator that produces a 1 with a probability p and 0 with a probability (1-p). You do not know the value of p. Using this make an unbiased random number generator which produces 1 with a probability 0.5 and 0 with a probability 0.5.
Note: this problem is an exercise problem from Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (CLRS).
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
In code this is done as easy as:
int UnbiasedRandom()
{
    int x, y;

    do
    {
        x = BiasedRandom();
        y = BiasedRandom();
    } while (x == y);

    return x;
}
The procedure to produce an unbiased coin from a biased one was first attributed to Von Neumann (a guy who has done enormous work in math and many related fields). The procedure is super simple:
Toss the coin twice.
If the results match, start over, forgetting both results.
If the results differ, use the first result, forgetting the second.
The reason this algorithm works is because the probability of getting HT is p(1-p), which is the same as getting TH (1-p)p. Thus two events are equally likely.
I am also reading this book and it asks the expected running time. The probability that two tosses are not equal is z = 2*p*(1-p), therefore the expected running time is 1/z.
The previous example looks encouraging (after all, if you have a biased coin with a bias of p=0.99, you will need to throw your coin approximately 50 times, which is not that many). So you might think that this is an optimal algorithm. Sadly it is not.
Here is how it compares with Shannon's theoretical bound (the image is taken from this answer). It shows that the algorithm is good, but far from optimal.
You can come up with an improvement if you consider that HHTT will be discarded by this algorithm, even though it has the same probability as TTHH. So you can also stop there and return H. The same goes for HHHHTTTT and so on. Using these cases improves the expected running time, but does not make it theoretically optimal.
And in the end - python code:
import random

def biased(p):
    # create a biased coin
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)
    return n1

p = random.random()
print p
tosses = [unbiased_from_biased(p) for i in xrange(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print n_1, n_2
It is pretty self-explanatory, and here is an example result:
0.0973181652114
505 495
As you can see, even though we had a bias of 0.097, we got approximately the same number of 1s and 0s.
The trick attributed to von Neumann of getting two bits at a time, having 01 correspond to 0 and 10 to 1, and repeating for 00 or 11 has already come up. The expected number of bits you need to draw to get a single output bit using this method is 1/(p(1-p)), which can get quite large if p is especially small or large, so it is worthwhile to ask whether the method can be improved, especially since it evidently throws away a lot of information (all 00 and 11 cases).
Googling for "von neumann trick biased" produced this paper that develops a better solution for the problem. The idea is that you still take bits two at a time, but if the first two attempts produce only 00s and 11s, you treat a pair of 0s as a single 0 and a pair of 1s as a single 1, and apply von Neumann's trick to these pairs. And if that doesn't work either, keep combining similarly at this level of pairs, and so on.
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.
You need to draw pairs of values from the RNG until you get a sequence of different values, i.e. zero followed by one or one followed by zero. You then take the first value (or last, doesn't matter) of that sequence. (i.e. Repeat as long as the pair drawn is either two zeros or two ones)
The math behind this is simple: a 0 then 1 sequence has the very same probability as a 1 then zero sequence. By always taking the first (or the last) element of this sequence as the output of your new RNG, we get an even chance to get a zero or a one.
Besides the von Neumann procedure given in other answers, there is a whole family of techniques, called randomness extraction (also known as debiasing, deskewing, or whitening), that serve to produce unbiased random bits from random numbers of unknown bias. They include Peres's (1992) iterated von Neumann procedure, as well as an "extractor tree" by Zhou and Bruck (2012). Both methods (and several others) are asymptotically optimal, that is, their efficiency (in terms of output bits per input) approaches the optimal limit as the number of inputs gets large (Pae 2018).
For example, the Peres extractor takes a list of bits (zeros and ones with the same bias) as input and is described as follows:
Create two empty lists named U and V. Then, while two or more bits remain in the input:
If the next two bits are 0/0, append 0 to U and 0 to V.
Otherwise, if those bits are 0/1, append 1 to U, then write a 0.
Otherwise, if those bits are 1/0, append 1 to U, then write a 1.
Otherwise, if those bits are 1/1, append 0 to U and 1 to V.
Run this algorithm recursively, reading from the bits placed in U.
Run this algorithm recursively, reading from the bits placed in V.
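A minimal Python sketch of this recursion (my own rendering of the steps above):

def peres_extract(bits, out=None):
    # bits: list of 0/1 values from the biased source; returns unbiased bits.
    if out is None:
        out = []
    if len(bits) < 2:
        return out
    u, v = [], []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a == b:
            u.append(0)
            v.append(a)      # 0 for a 0/0 pair, 1 for a 1/1 pair
        else:
            u.append(1)
            out.append(a)    # the plain von Neumann output: 0 for 0/1, 1 for 1/0
    peres_extract(u, out)    # recurse on the same/different pattern
    peres_extract(v, out)    # recurse on the values of the equal pairs
    return out

print(peres_extract([1, 0, 0, 0, 1, 1, 0, 1]))   # extracts several unbiased bits from eight biased ones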
This is not to mention procedures that produce unbiased random bits from biased dice or other biased random numbers (not just biased bits); see, e.g., Camion (1974).
I discuss more on randomness extractors in a note on randomness extraction.
REFERENCES:
Peres, Y., "Iterating von Neumann's procedure for extracting random bits", Annals of Statistics, 1992, 20(1), pp. 590-597.
Zhou, H. and Bruck, J., "Streaming algorithms for optimal generation of random bits", arXiv:1209.0730 [cs.IT], 2012.
Pae, S., "Binarization Trees and Random Number Generation", arXiv:1602.06058v2 [cs.DS].
Camion, Paul, "Unbiased die rolling with a biased die", North Carolina State University, Dept. of Statistics, 1974.
Here's one way, probably not the most efficient. Chew through a bunch of random numbers until you get a sequence of the form [0..., 1, 0..., 1] (where 0... is one or more 0s). Count the number of 0s. If the first sequence is longer, generate a 0, if the second sequence is longer, generate a 1. (If they're the same, try again.)
This is like what HotBits does to generate random numbers from radioactive particle decay:
Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again
HotBits: How It Works
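A sketch of this interval-comparison idea in Python; biased_bit stands in for the biased source, and (as a small variant of the description above) runs of zero length are allowed, which keeps the two run lengths identically distributed:

def unbiased_bit(biased_bit):
    # Measure the lengths of two consecutive runs of 0s (each ended by a 1)
    # and compare them; retry on ties.
    while True:
        runs = []
        for _ in range(2):
            count = 0
            while biased_bit() == 0:
                count += 1
            runs.append(count)
        if runs[0] != runs[1]:
            return 0 if runs[0] > runs[1] else 1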
I'm just explaining the already proposed solution with a runnable demonstration. This solution is unbiased no matter how many times we change the probability: in a coin toss, head-then-tail and tail-then-head are always equally likely.
import random

def biased_toss(probability):
    if random.random() > probability:
        return 1
    else:
        return 0

def unbiased_toss(probability):
    x = biased_toss(probability)
    y = biased_toss(probability)
    while x == y:
        x = biased_toss(probability)
        y = biased_toss(probability)
    else:
        return x

# results will contain counts of heads '0' and tails '1'
results = {'0': 0, '1': 0}
for i in range(1000):
    # on every call we are changing the probability
    p = random.random()
    results[str(unbiased_toss(p))] += 1

# it still returns an unbiased result
print(results)
