Are floats a secure alternative to generating an un-biased random number - algorithm

I have always generated un-biased random numbers by throwing away any numbers in the biased range. Similar to this
int biasCount = MAX_INT % max
int maxSafeNumber = MAX_INT - biasCount;
int generatedNumber = 0;
do
{
generatedNumber = GenerateNumber();
} while (generatedNumber > maxSafeNumber)
return generatedNumber % max;
Today a friend showed me how he generated random numbers by converting the generated number into a float, then multiplying that against the max.
float percent = generatedNumber / (float)MAX_INT;
return (int)(percent * max);
This seems to solve the bias issue by not having to use a modulus in the first place. It also looks simple and fast. Is there any reason why the float approach would not be as secure (unbiased) as the first one?

The float method with a floor (i.e. your cast) introduces a bias
against the largest value in your range.
In order to return max, generatedNumber == MAX_INT must be true.
So max has probability 1/MAX_INT, while every other number in the
range has probability max/MAX_INT
As Henry points out, there's also the issue of aliasing if MAX_INT
is not a multiple of max. This makes some values in the range more
likely than others. The larger the difference between max and MAX_INT the smaller this bias is.
(Assuming you get, and want, a uniform distribution.)
This presentation by Stephan T. Lavavej from GoingNative 2013 goes over a lot of common fallacies with random numbers, including these range schemes. It's C++ centric in the implementations, but all the concepts carry over to any language:
http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful

The float method may not generate uniformly distributed output numbers even when the input numbers are uniformly distributed. To see where it breaks down do some examples with small numbers e.g. max = 6, MAX_INT = 8
it gets better when MAX_INT is large, but it is almost never perfect.

Related

Sampling from all possible floats in D

In the D programming language, the standard random (std.random) module provides a simple mechanism for generating a random number in some specified range.
auto a = uniform(0, 1024, gen);
What is the best way in D to sample from all possible floating point values?
For clarification, sampling from all possible 32-bit integers can be done as follows:
auto l = uniform!int(); // randomly selected int from all possible integers
Depends on the kind of distribution you want.
A uniform distribution over all possible values could be done by generating a random ulong and then casting the bits into floating point. For T being float or double:
union both { ulong input; T output; }
both val;
val.input = uniform!"[]"(ulong.min, ulong.max);
return val.output;
Since roughly half of the positive floating point numbers are between 0 and 1, this method will often give you numbers near zero.`It will also give you infinity and NaN values.
Aside: This code should be fine with D, but would be undefined behavior in C/C++. Use memcpy there.
If you prefer a uniform distribution over all possible numbers in floating point (equal probability for 0..1 and 1..2 etc), you need something like the normal uniform!double, which unfortunately does not work very well for large numbers. It also will not generate infinity or NaN. You could generate double numbers and convert them to float, but I have no answer for generating random large double numbers.

How to generate a random number from a random bit generator and guarantee termination?

Assuming I have a function that returns a random bit, is it possible to write a function that uniformly generates a random number within a certain range and always terminates?
I know how to do this so that it should (and probably will) terminate. I was just wondering if it's possible to write one that is guaranteed to terminate (and it doesn't have to be particularly efficient. What complexity would it have?
Here is a code for the not always terminating version
int random(int n)
{
while(true)
{
int r = 0;
for (int i = 0; i < ceil(log(n)); i++)
{
r = r<<1;
r = r|getRandomBit();
}
if(r<n)
{
return r;
}
}
}
I think this will work:
Suppose you want to generate a number in the range [a,b]
Generate a fraction r in range [0,1} using a binary radix. That means generate a number of form 0.x1x2x3.... where every x is either a 0 or 1 using your random function.
Once you have that, you can easily generate a number in the range [0,b-a], by computing ceil(r*(b-a)), and then simply add a to get a number in range [a,b]
If the size of the range isn't a power of 2, you can't get an exactly uniform distribution except through what amounts to rejection sampling. You can get as close as you like to uniform, however, by sampling once from a large range, and dividing the smaller range into it.
For instance, while you can't uniformly sample between 1 and 10, you can quite easily sample between 1 and 1024 by picking 10 random bits, and figure out some way of equitably dividing that into 10 intervals of about the same size.
Choosing additional bits has the effect of halving the largest error (from true uniformity) you have to see in your choices... so the error decreases exponentially as you choose more bits.

How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?

Generate a random number in range [x..y] where x and y are any arbitrary floating point numbers. Use function random(), which returns a random floating point number in range [0..1] from P uniformly distributed numbers (call it "density"). Uniform distribution must be preserved and P must be scaled as well.
I think, there is no easy solution for such problem. To simplify it a bit, I ask you how to generate a number in interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformness and density? Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers.
The obvious simple solution random() * (x - y) + y will generate not all possible numbers because of the lower density for all abs(x-y)>1.0 cases. Many possible values will be missed. Remember, that random() returns only a number from P possible numbers. Then, if you multiply such number by Q, it will give you only one of P possible values, scaled by Q, but you have to scale density P by Q as well.
If I understand you problem well, I will provide you a solution: but I would exclude 1, from the range.
N = numbers_in_your_random // [0, 0.2, 0.4, 0.6, 0.8] will be 5
// This turns your random number generator to return integer values between [0..N[;
function randomInt()
{
return random()*N;
}
// This turns the integer random number generator to return arbitrary
// integer
function getRandomInt(maxValue)
{
if (maxValue < N)
{
return randomInt() % maxValue;
}
else
{
baseValue = randomInt();
bRate = maxValue DIV N;
bMod = maxValue % N;
if (baseValue < bMod)
{
bRate++;
}
return N*getRandomInt(bRate) + baseValue;
}
}
// This will return random number in range [lower, upper[ with the same density as random()
function extendedRandom(lower, upper)
{
diff = upper - lower;
ndiff = diff * N;
baseValue = getRandomInt(ndiff);
baseValue/=N;
return lower + baseValue;
}
If you really want to generate all possible floating point numbers in a given range with uniform numeric density, you need to take into account the floating point format. For each possible value of your binary exponent, you have a different numeric density of codes. A direct generation method will need to deal with this explicitly, and an indirect generation method will still need to take it into account. I will develop a direct method; for the sake of simplicity, the following refers exclusively to IEEE 754 single-precision (32-bit) floating point numbers.
The most difficult case is any interval that includes zero. In that case, to produce an exactly even distribution, you will need to handle every exponent down to the lowest, plus denormalized numbers. As a special case, you will need to split zero into two cases, +0 and -0.
In addition, if you are paying such close attention to the result, you will need to make sure that you are using a good pseudorandom number generator with a large enough state space that you can expect it to hit every value with near-uniform probability. This disqualifies the C/Unix rand() and possibly the*rand48() library functions; you should use something like the Mersenne Twister instead.
The key is to dissect the target interval into subintervals, each of which is covered by different combination of binary exponent and sign: within each subinterval, floating point codes are uniformly distributed.
The first step is to select the appropriate subinterval, with probability proportional to its size. If the interval contains 0, or otherwise covers a large dynamic range, this may potentially require a number of random bits up to the full range of the available exponent.
In particular, for a 32-bit IEEE-754 number, there are 256 possible exponent values. Each exponent governs a range which is half the size of the next greater exponent, except for the denormalized case, which is the same size as the smallest normal exponent region. Zero can be considered the smallest denormalized number; as mentioned above, if the target interval straddles zero, the probability of each of +0 and -0 should perhaps be cut in half, to avoid doubling its weight.
If the subinterval chosen covers the entire region governed by a particular exponent, all that is necessary is to fill the mantissa with random bits (23 bits, for 32-bit IEEE-754 floats). However, if the subinterval does not cover the entire region, you will need to generate a random mantissa that covers only that subinterval.
The simplest way to handle both the initial and secondary random steps may be to round the target interval out to include the entirety of all exponent regions partially covered, then reject and retry numbers that fall outside it. This allows the exponent to be generated with simple power-of-2 probabilities (e.g., by counting the number of leading zeroes in your random bitstream), as well as providing a simple and accurate way of generating a mantissa that covers only part of an exponent interval. (This is also a good way of handling the +/-0 special case.)
As another special case: to avoid inefficient generation for target intervals which are much smaller than the exponent regions they reside in, the "obvious simple" solution will in fact generate fairly uniform numbers for such intervals. If you want exactly uniform distributions, you can generate the sub-interval mantissa by using only enough random bits to cover that sub-interval, while still using the aforementioned rejection method to eliminate values outside the target interval.
well, [0..1] * 2 == [0..2] (still uniform)
[0..1] - 0.5 == [-0.5..0.5] etc.
I wonder where have you experienced such an interview?
Update: well, if we want to start caring about losing precision on multiplication (which is weird, because somehow you did not care about that in the original task, and pretend we care about "number of values", we can start iterating. In order to do that, we need one more function, which would return uniformly distributed random values in [0..1) — which can be done by dropping the 1.0 value would it ever appear. After that, we can slice the whole range in equal parts small enough to not care about losing precision, choose one randomly (we have enough randomness to do that), and choose a number in this bucket using [0..1) function for all parts but the last one.
Or, you can come up with a way to code enough values to care about—and just generate random bits for this code, in which case you don't really care whether it's [0..1] or just {0, 1}.
Let me rephrase your question:
Let random() be a random number generator with a discrete uniform distribution over [0,1). Let D be the number of possible values returned by random(), each of which is precisely 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is precisely 1/D greater than the previous.
--
A couple quick notes.
The problem in this form, and as you phrased it is unsolvable. That
is, if N = 1 there is nothing we can do.
I don't require that 0.0 be one of the possible values for random(). If it is not, then it is possible that the solution below will fail when U - L < 1 / D. I'm not particularly worried about that case.
I use all half-open ranges because it makes the analysis simpler. Using your closed ranges would be simple, but tedious.
Finally, the good stuff. The key insight here is that the density can be maintained by independently selecting the whole and fractional parts of the result.
First, note that given random() it is trivial to create randomBit(). That is,
randomBit() { return random() >= 0.5; }
Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random, that is simple using randomBit(), just generate each of the bits. Call this random2(N).
Using random2() we can select one of {0, 1, 2, ..., N - 1}:
randomInt(N) { while ((val = random2(ceil(log2(N)))) >= N); return val; }
Now, if D is known, then the problem is trivial as we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random and we can do that with randomInt().
So, let's assume that D is not known. Now, let's first make a function to generate random values in the range [0, 2^N) with the proper density. This is simple.
rand2D(N) { return random2(N) + random(); }
rand2D() is where we require that the difference between consecutive possible values for random() be precisely 1/D. If not, the possible values here would not have uniform density.
Next, we need a function that selects a value in the range [0, V) with the proper density. This is similar to randomInt() above.
randD(V) { while ((val = rand2D(ceil(log2(V)))) >= V); return val; }
And finally...
rand(L, U) { return L + randD(U - L); }
We now may have offset the discrete positions if L / D is not an integer, but that is unimportant.
--
A last note, you may have noticed that several of these functions may never terminate. That is essentially a requirement. For example, random() may have only a single bit of randomness. If I then ask you to select from one of three values, you cannot do so uniformly at random with a function that is guaranteed to terminate.
Consider this approach:
I'm assuming the base random number generator in the range [0..1]
generates among the numbers
0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1)
If the target interval length is less than or equal to 1,
return random()*(y-x) + x.
Else, map each number r from the base RNG to an interval in the
target range:
[r*(p-1)*(y-x)/p, (r+1/(p-1))*(p-1)*(y-x)/p]
(i.e. for each of the P numbers assign one of P intervals with length (y-x)/p)
Then recursively generate another random number in that interval and
add it to the interval begin.
Pseudocode:
const p;
function rand(x, y)
r = random()
if y-x <= 1
return x + r*(y-x)
else
low = r*(p-1)*(y-x)/p
high = low + (y-x)/p
return x + low + rand(low, high)
In real math: the solution is just the provided:
return random() * (upper - lower) + lower
The problem is that, even when you have floating point numbers, only have a certain resolution. So what you can do is apply above function and add another random() value scaled to the missing part.
If I make a practical example it becomes clear what I mean:
E.g. take random() return value from 0..1 with 2 digits accuracy, ie 0.XY, and lower with 100 and upper with 1100.
So with above algorithm you get as result 0.XY * (1100-100) + 100 = XY0.0 + 100.
You will never see 201 as result, as the final digit has to be 0.
Solution here would be to generate again a random value and add it *10, so you have accuracy of one digit (here you have to take care that you dont exceed your given range, which can happen, in this case you have to discard the result and generate a new number).
Maybe you have to repeat it, how often depends on how many places the random() function delivers and how much you expect in your final result.
In a standard IEEE format has a limited precision (i.e. double 53 bits). So when you generate a number this way, you never need to generate more than one additional number.
But you have to be careful that when you add the new number, you dont exceed your given upper limit. There are multiple solutions to it: First if you exceed your limit, you start from new, generating a new number (dont cut off or similar, as this changes the distribution).
Second possibility is to check the the intervall size of the missing lower bit range, and
find the middle value, and generate an appropiate value, that guarantees that the result will fit.
You have to consider the amount of entropy that comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how you can accumulate entropy from low-entropy source(s) and end up with a high-entropy random value.
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
namespace SO_8019589
{
class LowEntropyRandom
{
public readonly double EffectiveEntropyBits;
public readonly int PossibleOutcomeCount;
private readonly double interval;
private readonly Random random = new Random();
public LowEntropyRandom(int possibleOutcomeCount)
{
PossibleOutcomeCount = possibleOutcomeCount;
EffectiveEntropyBits = Math.Log(PossibleOutcomeCount, 2);
interval = 1.0 / PossibleOutcomeCount;
}
public LowEntropyRandom(int possibleOutcomeCount, int seed)
: this(possibleOutcomeCount)
{
random = new Random(seed);
}
public int Next()
{
return random.Next(PossibleOutcomeCount);
}
public double NextDouble()
{
return interval * Next();
}
}
class EntropyAccumulator
{
private List<byte> currentEntropy = new List<byte>();
public double CurrentEntropyBits { get; private set; }
public void Clear()
{
currentEntropy.Clear();
CurrentEntropyBits = 0;
}
public void Add(byte[] entropy, double effectiveBits)
{
currentEntropy.AddRange(entropy);
CurrentEntropyBits += effectiveBits;
}
public byte[] GetBytes(int count)
{
using (var hasher = new SHA512Managed())
{
count = Math.Min(count, hasher.HashSize / 8);
var bytes = new byte[count];
var hash = hasher.ComputeHash(currentEntropy.ToArray());
Array.Copy(hash, bytes, count);
return bytes;
}
}
public byte[] GetPackagedEntropy()
{
// Returns a compact byte array that represents almost all of the entropy.
return GetBytes((int)(CurrentEntropyBits / 8));
}
public double GetDouble()
{
// returns a uniformly distributed number on [0-1)
return (double)BitConverter.ToUInt64(GetBytes(8), 0) / ((double)UInt64.MaxValue + 1);
}
public double GetInt(int maxValue)
{
// returns a uniformly distributed integer on [0-maxValue)
return (int)(maxValue * GetDouble());
}
}
class Program
{
static void Main(string[] args)
{
var random = new LowEntropyRandom(2); // this only provides 1 bit of entropy per call
var desiredEntropyBits = 64; // enough for a double
while (true)
{
var adder = new EntropyAccumulator();
while (adder.CurrentEntropyBits < desiredEntropyBits)
{
adder.Add(BitConverter.GetBytes(random.Next()), random.EffectiveEntropyBits);
}
Console.WriteLine(adder.GetDouble());
Console.ReadLine();
}
}
}
}
Since I'm using a 512-bit hash function, that is the max amount of entropy that you can get out of the EntropyAccumulator. This could be fixed, if necessarily.
If I understand your problem correctly, it's that rand() generates finely spaced but ultimately discrete random numbers. And if we multiply it by (y-x) which is large, this spreads these finely spaced floating point values out in a way that is missing many of the floating point values in the range [x,y]. Is that all right?
If so, I think we have a solution already given by Dialecticus. Let me explain why he is right.
First, we know how to generate a random float and then add another floating point value to it. This may produce a round off error due to addition, but it will be in the last decimal place only. Use doubles or something with finer numerical resolution if you want better precision. So, with that caveat, the problem is no harder than finding a random float in the range [0,y-x] with uniform density. Let's say y-x = z. Obviously, since z is a floating point it may not be an integer. We handle the problem in two steps: first we generate the random digits to the left of the decimal point and then generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed across the range [0,z] too. Let w be the largest integer <= z. To answer our simplified problem, we can first pick a random integer from the range {0,1,...,w}. Then, step #2 is to add a random float from the unit interval to this random number. This isn't multiplied by any possibly large values, so it has as fine a resolution as the numerical type can have. (Assuming you're using an ideal random floating point number generator.)
So what about the corner case where the random integer was the largest one (i.e. w) and the random float we added to it was larger than z - w so that the random number exceeds the allowed maximum? The answer is simple: do all of it again and check the new result. Repeat until you get a digit in the allowed range. It's an easy proof that a uniformly generated random number which is tossed out and generated again if it's outside an allowed range results in a uniformly generated random in the allowed range. Once you make this key observation, you see that Dialecticus met all your criteria.
When you generate a random number with random(), you get a floating point number between 0 and 1 having an unknown precision (or density, you name it).
And when you multiply it with a number (NUM), you lose this precision, by lg(NUM) (10-based logarithm). So if you multiply by 1000 (NUM=1000), you lose the last 3 digits (lg(1000) = 3).
You may correct this by adding a smaller random number to the original, which has this missing 3 digits. But you don't know the precision, so you can't determine where are they exactly.
I can imagine two scenarios:
(X = range start, Y = range end)
1: you define the precision (PREC, eg. 20 digits, so PREC=20), and consider it enough to generate a random number, so the expression will be:
( random() * (Y-X) + X ) + ( random() / 10 ^ (PREC-trunc(lg(Y-X))) )
with numbers: (X = 500, Y = 1500, PREC = 20)
( random() * (1500-500) + 500 ) + ( random() / 10 ^ (20-trunc(lg(1000))) )
( random() * 1000 + 500 ) + ( random() / 10 ^ (17) )
There are some problems with this:
2 phase random generation (how much will it be random?)
the first random returns 1 -> result can be out of range
2: guess the precision by random numbers
you define some tries (eg. 4) to calculate the precision by generating random numbers and count the precision every time:
- 0.4663164 -> PREC=7
- 0.2581916 -> PREC=7
- 0.9147385 -> PREC=7
- 0.129141 -> PREC=6 -> 7, correcting by the average of the other tries
That's my idea.

random permutation

I would like to genrate a random permutation as fast as possible.
The problem: The knuth shuffle which is O(n) involves generating n random numbers.
Since generating random numbers is quite expensive.
I would like to find an O(n) function involving a fixed O(1) amount of random numbers.
I realize that this question has been asked before, but I did not see any relevant answers.
Just to stress a point: I am not looking for anything less than O(n), just an algorithm involving less generation of random numbers.
Thanks
Create a 1-1 mapping of each permutation to a number from 1 to n! (n factorial). Generate a random number in 1 to n!, use the mapping, get the permutation.
For the mapping, perhaps this will be useful: http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
Of course, this would get out of hand quickly, as n! can become really large soon.
Generating a random number takes long time you say? The implementation of Javas Random.nextInt is roughly
oldseed = seed;
nextseed = (oldseed * multiplier + addend) & mask;
return (int)(nextseed >>> (48 - bits));
Is that too much work to do for each element?
See https://doi.org/10.1145/3009909 for a careful analysis of the number of random bits required to generate a random permutation. (It's open-access, but it's not easy reading! Bottom line: if carefully implemented, all of the usual methods for generating random permutations are efficient in their use of random bits.)
And... if your goal is to generate a random permutation rapidly for large N, I'd suggest you try the MergeShuffle algorithm. An article published in 2015 claimed a factor-of-two speedup over Fisher-Yates in both parallel and sequential implementations, and a significant speedup in sequential computations over the other standard algorithm they tested (Rao-Sandelius).
An implementation of MergeShuffle (and of the usual Fisher-Yates and Rao-Sandelius algorithms) is available at https://github.com/axel-bacher/mergeshuffle. But caveat emptor! The authors are theoreticians, not software engineers. They have published their experimental code to github but aren't maintaining it. Someday, I imagine someone (perhaps you!) will add MergeShuffle to GSL. At present gsl_ran_shuffle() is an implementation of Fisher-Yates, see https://www.gnu.org/software/gsl/doc/html/randist.html?highlight=gsl_ran_shuffle.
Not what you asked exactly, but if provided random number generator doesn't satisfy you, may be you should try something different. Generally, pseudorandom number generation can be very simple.
Probably, best-known algorithm
http://en.wikipedia.org/wiki/Linear_congruential_generator
More
http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators
As other answers suggest, you can make a random integer in the range 0 to N! and use it to produce a shuffle. Although theoretically correct, this won't be faster in general since N! grows fast and you'll spend all your time doing bigint arithmetic.
If you want speed and you don't mind trading off some randomness, you will be much better off using a less good random number generator. A linear congruential generator (see http://en.wikipedia.org/wiki/Linear_congruential_generator) will give you a random number in a few cycles.
Usually there is no need in full-range of next random value, so to use exactly the same amount of randomness you can use next approach (which is almost like random(0,N!), I guess):
// ...
m = 1; // range of random buffer (single variant)
r = 0; // random buffer (number zero)
// ...
for(/* ... */) {
while (m < n) { // range of our buffer is too narrow for "n"
r = r*RAND_MAX + random(); // add another random to our random-buffer
m *= RAND_MAX; // update range of random-buffer
}
x = r % n; // pull-out next random with range "n"
r /= n; // remove it from random-buffer
m /= n; // fix range of random-buffer
// ...
}
P.S. of course there will be some errors related with division by value different from 2^n, but they will be distributed among resulted samples.
Generate N numbers (N < of the number of random number you need) before to do the computation, or store them in an array as data, with your slow but good random generator; then pick up a number simply incrementing an index into the array inside your computing loop; if you need different seeds, create multiple tables.
Are you sure that your mathematical and algorithmical approach to the problem is correct?
I hit exactly same problem where Fisher–Yates shuffle will be bottleneck in corner cases. But for me the real problem is brute force algorithm that doesn't scale well to all problems. Following story explains the problem and optimizations that I have come up with so far.
Dealing cards for 4 players
Number of possible deals is 96 bit number. That puts quite a stress for random number generator to avoid statical anomalies when selecting play plan from generated sample set of deals. I choose to use 2xmt19937_64 seeded from /dev/random because of the long period and heavy advertisement in web that it is good for scientific simulations.
Simple approach is to use Fisher–Yates shuffle to generate deals and filter out deals that don't match already collected information. Knuth shuffle takes ~1400 CPU cycles per deal mostly because I have to generate 51 random numbers and swap 51 times entries in the table.
That doesn't matter for normal cases where I would only need to generate 10000-100000 deals in 7 minutes. But there is extreme cases when filters may select only very small subset of hands requiring huge number of deals to be generated.
Using single number for multiple cards
When profiling with callgrind (valgrind) I noticed that main slow down was C++ random number generator (after switching away from std::uniform_int_distribution that was first bottleneck).
Then I came up with idea that I can use single random number for multiple cards. The idea is to use least significant information from the number first and then erase that information.
int number = uniform_rng(0, 52*51*50*49);
int card1 = number % 52;
number /= 52;
int cards2 = number % 51;
number /= 51;
......
Of course that is only minor optimization because generation is still O(N).
Generation using bit permutations
Next idea was exactly solution asked in here but I ended up still with O(N) but with larger cost than original shuffle. But lets look into solution and why it fails so miserably.
I decided to use idea Dealing All the Deals by John Christman
void Deal::generate()
{
// 52:26 split, 52!/(26!)**2 = 495,918,532,948,1041
max = 495918532948104LU;
partner = uniform_rng(eng1, max);
// 2x 26:13 splits, (26!)**2/(13!)**2 = 10,400,600**2
max = 10400600LU*10400600LU;
hands = uniform_rng(eng2, max);
// Create 104 bit presentation of deal (2 bits per card)
select_deal(id, partner, hands);
}
So far good and pretty good looking but select_deal implementation is PITA.
void select_deal(Id &new_id, uint64_t partner, uint64_t hands)
{
unsigned idx;
unsigned e, n, ns = 26;
e = n = 13;
// Figure out partnership who owns which card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx > 0; ) {
uint64_t cut = ncr(idx - 1, ns);
if (partner >= cut) {
partner -= cut;
// Figure out if N or S holds the card
ns--;
cut = ncr(ns, n) * 10400600LU;
if (hands > cut) {
hands -= cut;
n--;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
} else
new_id[idx%NUM_SUITS + NUM_SUITS] |= 1 << (idx/NUM_SUITS);
idx--;
}
unsigned ew = 26;
// Figure out if E or W holds a card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx-- > 0; ) {
if (new_id[idx%NUM_SUITS + NUM_SUITS] & (1 << (idx/NUM_SUITS))) {
uint64_t cut = ncr(--ew, e);
if (hands >= cut) {
hands -= cut;
e--;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
}
}
}
Now that I had the O(N) permutation solution done to prove algorithm could work I started searching for O(1) mapping from random number to bit permutation. Too bad it looks like only solution would be using huge lookup tables that would kill CPU caches. That doesn't sound good idea for AI that will be using very large amount of caches for double dummy analyzer.
Mathematical solution
After all hard work to figure out how to generate random bit permutations I decided go back to maths. It is entirely possible to apply filters before dealing cards. That requires splitting deals to manageable number of layered sets and selecting between sets based on their relative probabilities after filtering out impossible sets.
I don't yet have code ready for that to tests how much cycles I'm wasting in common case where filter is selecting major part of deal. But I believe this approach gives the most stable generation performance keeping the cost less than 0.1%.
Generate a 32 bit integer. For each index i (maybe only up to half the number of elements in the array), if bit i % 32 is 1, swap i with n - i - 1.
Of course, this might not be random enough for your purposes. You could probably improve this by not swapping with n - i - 1, but rather by another function applied to n and i that gives better distribution. You could even use two functions: one for when the bit is 0 and another for when it's 1.

What's a good way to add a large number of small floats together?

Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this
result = 0.0;
for (i = 0; i < 100000000; i++) {
result += array[i];
}
you'd run into problems as result gets much larger than 1.0.
So what are some of the ways to more accurately perform the summation?
Sounds like you want to use Kahan Summation.
According to Wikipedia,
The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).
In pseudocode, the algorithm is:
function kahanSum(input)
var sum = input[1]
var c = 0.0 //A running compensation for lost low-order bits.
for i = 2 to input.length
y = input[i] - c //So far, so good: c is zero.
t = sum + y //Alas, sum is big, y small, so low-order digits of y are lost.
c = (t - sum) - y //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
sum = t //Algebraically, c should always be zero. Beware eagerly optimising compilers!
next i //Next time around, the lost low part will be added to y in a fresh attempt.
return sum
Make result a double, assuming C or C++.
If you can tolerate a little extra space (in Java):
float temp = new float[1000000];
float temp2 = new float[1000];
float sum = 0.0f;
for (i=0 ; i<1000000000 ; i++) temp[i/1000] += array[i];
for (i=0 ; i<1000000 ; i++) temp2[i/1000] += temp[i];
for (i=0 ; i<1000 ; i++) sum += temp2[i];
Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.
But before doing any of that, one might just accumulate the result in a double. That'll help a lot.
If in .NET using the LINQ .Sum() extension method that exists on an IEnumerable. Then it would just be:
var result = array.Sum();
The absolutely optimal way is to use a priority queue, in the following way:
PriorityQueue<Float> q = new PriorityQueue<Float>();
for(float x : list) q.add(x);
while(q.size() > 1) q.add(q.pop() + q.pop());
return q.pop();
(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)
Explanation: given a list of numbers, to add them up as precisely as possible you should strive to make the numbers close, t.i. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers, thus increasing the minimal value of the list, decreasing the difference between the minimum and maximum in the list and reducing the problem size by 1.
Unfortunately I have no idea about how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at the book on vector algorithms, it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing

Resources