Get N samples given iterator - algorithm

Given are an iterator it over data points, the number of data points we have n, and the maximum number of samples we want to use to do some calculations (maxSamples).
Imagine a function calculateStatistics(Iterator it, int n, int maxSamples). This function should use the iterator to retrieve the data and do some (heavy) calculations on the data element retrieved.
If n <= maxSamples, we will of course use each element we get from the iterator.
If n > maxSamples, we will have to choose which elements to look at and which to skip.
I've been spending quite some time on this. The problem is of course how to choose when to skip an element and when to keep it. My approaches so far:
I don't want to take the first maxSamples coming from the iterator, because the values might not be evenly distributed.
Another idea was to use a random number generator to create maxSamples distinct random numbers between 0 and n and take the elements at these positions. But if e.g. n = 101 and maxSamples = 100, it gets more and more difficult to find a new distinct number not yet in the list, losing a lot of time just in the random number generation.
My last idea was to do the opposite: generate n - maxSamples random numbers and exclude the data elements at these positions. But this also doesn't seem to be a very good solution.
Do you have a good idea for this problem? Are there maybe standard known algorithms for this?

To provide some answer: a good way to collect a set of random elements, given that the collection size is larger than the number of elements needed, is the following (in C++-ish pseudocode).
EDIT: you may need to iterate over the data and create the "someElements" vector first. If your elements are large, they can be "pointers" to these elements to save space.
vector randomCollectionFromVector(someElements, numElementsToGrab) {
    while(numElementsToGrab--) {
        randPosition = rand() % someElements.size();
        resultVector.push(someElements.get(randPosition)); // grab the random element
        someElements.remove(randPosition); // remove it so it cannot be grabbed twice
    }
    return resultVector;
}
If you don't care about changing your vector of elements, you could also remove random elements from someElements, as you mentioned. The algorithm would look very similar, and again, this is conceptually the same idea, you just pass someElements by reference, and manipulate it.
Something worth noting is that the quality of a pseudo-random distribution, as far as how random it looks, grows with the number of values you draw from it. So you may tend to get better results if you pick whichever method uses more random numbers. Example: if you have 100 values and need 99, you should probably pick 99 values, as this uses 99 pseudo-random numbers instead of just 1. Conversely, if you have 1000 values and need 99, you should probably prefer the version where you remove 901 values, because you use more numbers from the pseudo-random distribution. If what you want is a solid random distribution, this is a very simple optimization that will increase the quality of the "fake randomness" that you see. Alternatively, if performance matters more than distribution, take the other option, or even just grab the first 99 values.

interval = n/(n-maxSamples) // Euclidean (integer) division, of course
offset = random(0..(n-1)) // a random number between 0 and n-1
totalSkip = 0
indexSample = 0
FOR it IN samples DO
    indexSample++ // goes from 1 to n
    IF totalSkip < (n-maxSamples) AND (indexSample+offset) % interval == 0 THEN
        // do nothing with this sample
        totalSkip++
    ELSE
        // work with this sample
    END IF
END FOR
ASSERT(totalSkip == n-maxSamples) // to be sure
interval represents the distance between two samples to skip.
offset is not mandatory, but it adds a little diversity.

Based on the discussion, and greater understanding of your problem, I suggest the following. You can take advantage of a property of prime numbers that I think will net you a very good solution, that will appear to grab pseudo random numbers. It is illustrated in the following code.
#include <iostream>
using namespace std;
int main() {
    const int SOME_LARGE_PRIME = 577; //This prime should be larger than the size of your data set.
    const int NUM_ELEMENTS = 100;
    int lastValue = 0;
    for(int i = 0; i < NUM_ELEMENTS; i++) {
        lastValue += SOME_LARGE_PRIME;
        cout << lastValue % NUM_ELEMENTS << endl;
    }
    return 0;
}
Using the logic presented here, you can create a table of all values from 0 to "NUM_ELEMENTS" - 1. Because of the properties of prime numbers, you will not get any duplicates until you rotate all the way around back to the size of your data set. If you then take the first "NUM_SAMPLES" of these and sort them, you can iterate through your data structure and grab a pseudo-random distribution of elements (not very good random, but more random than a pre-determined interval), with only one pass over your data and no extra space beyond the sorted positions. Better yet, you can change the layout of the distribution by grabbing a random prime number each time; again, it must be larger than your data set, or it breaks as in the following example.
PRIME = 3, data set size = 99. Won't work: 3 divides 99, so only 33 distinct positions are ever visited.
Of course, ultimately this is very similar to the pre-determined interval, but it inserts a level of randomness that you do not get by simply grabbing every "size/num_samples"th element.

This is called reservoir sampling.
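For reference, here is a minimal sketch of the basic reservoir sampling scheme (often called Algorithm R), written in Python rather than the pseudocode used above. The function name `reservoir_sample` and the `rng` parameter are my own choices for illustration, not something from the question. It makes one pass over the iterator and does not even need n in advance.

```python
import random

def reservoir_sample(iterable, max_samples, rng=random):
    """Return up to max_samples items chosen uniformly at random in one pass."""
    reservoir = []
    for index, item in enumerate(iterable):
        if index < max_samples:
            # Fill the reservoir with the first max_samples items.
            reservoir.append(item)
        else:
            # Keep the new item with probability max_samples / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < max_samples:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(iter(range(1000)), 10, random.Random(42))
print(sample)
```

If n <= maxSamples the reservoir simply ends up holding every element, so the two cases from the question need no special handling.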


generate a matrix with random zeros

Using C, I am trying to generate a matrix which has to have more zero elements than non-zero elements. The zero elements should be at random positions. How can I generate it?
I am able to generate random numbers with some elements as zero, but there should be more zero elements than non-zero elements.
int main(){
    int array[25];
    int i;
    for (i=0;i<s;i++){
        if (rand()%3 == 0)
            array[i] = rand();
    }
}
Is the generated matrix a sparse matrix? How can I understand the difference?
I assume you’d want no more than 10% of the matrix with non-zero values? Probably a lot less if you have a true sparse matrix that is large (thousands or millions of elements).
I would actually not go the route you are going. I would first init the array to all zeros. You can do that with int myArray[25] = {0} or with memset.
Once you have that, you can then calculate how many non-zero elements you need. If you have 30 elements and want 10% non-zero elements, you need to fill in 3 elements. You can google around and find out how to use rand (seeded once with srand) to pick which indices to place the non-zero elements at.
Once you have those, you can use rand again to get and set the actual values to fill in.
I have purposely not given a lot of details here, just a general direction I would take. It would probably be good to try a few things out and also provide code examples that actually compile (your example does not, there are a few variables that aren’t defined).
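To make the direction above a bit more concrete without spoiling the C exercise, here is a rough sketch in Python. The function name `random_sparse_matrix`, the 10% default and the value range 1..100 are all assumptions chosen for illustration. It starts from all zeros, picks distinct positions, and fills only those.

```python
import random

def random_sparse_matrix(rows, cols, nonzero_fraction=0.1, rng=random):
    """Build a rows x cols matrix in which only a fixed fraction of cells is non-zero."""
    total = rows * cols
    nonzero_count = int(total * nonzero_fraction)
    flat = [0] * total                       # start with everything set to zero
    # sample() returns distinct positions, so exactly nonzero_count cells get filled
    for position in rng.sample(range(total), nonzero_count):
        flat[position] = rng.randint(1, 100)  # never zero, so the count stays exact
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

matrix = random_sparse_matrix(5, 6, 0.1, random.Random(1))
for row in matrix:
    print(row)
```

In C, the same idea needs a small loop that retries when a randomly chosen index is already filled, since there is no built-in equivalent of `sample`.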

How to generate a random number from a random bit generator and guarantee termination?

Assuming I have a function that returns a random bit, is it possible to write a function that uniformly generates a random number within a certain range and always terminates?
I know how to do this so that it should (and probably will) terminate. I was just wondering if it's possible to write one that is guaranteed to terminate (it doesn't have to be particularly efficient). What complexity would it have?
Here is the code for the version that does not always terminate:
int random(int n)
{
    while (1) {
        int r = 0;
        for (int i = 0; i < ceil(log2(n)); i++) { // number of bits needed for n values
            r = r<<1;
            r = r|getRandomBit();
        }
        if (r < n)
            return r; // otherwise reject and try again
    }
}
I think this will work:
Suppose you want to generate a number in the range [a,b]
Generate a fraction r in the range [0,1) using a binary radix. That means generating a number of the form 0.x1x2x3... where every x is either 0 or 1, using your random bit function.
Once you have that, you can easily generate a number in the range [0,b-a] by computing floor(r*(b-a+1)), and then simply add a to get a number in the range [a,b].
If the size of the range isn't a power of 2, you can't get an exactly uniform distribution except through what amounts to rejection sampling. You can get as close as you like to uniform, however, by sampling once from a large range, and dividing the smaller range into it.
For instance, while you can't uniformly sample between 1 and 10, you can quite easily sample between 1 and 1024 by picking 10 random bits, and figure out some way of equitably dividing that into 10 intervals of about the same size.
Choosing additional bits has the effect of halving the largest error (from true uniformity) you have to see in your choices... so the error decreases exponentially as you choose more bits.
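A small sketch of both ideas, in Python for brevity. The names `random_below` and `bucket_counts` are made up for this example, and the bit source is passed in as a function. The first function is plain rejection sampling (exactly uniform, terminates only with probability 1). The second shows how unevenly a fixed number of bits divides into n buckets.

```python
import random

def random_below(n, get_bit):
    """Exactly uniform in [0, n): draw enough bits, retry if the value is out of range."""
    bits = max(1, (n - 1).bit_length())
    while True:  # no fixed bound on iterations, but each try succeeds with p > 1/2
        r = 0
        for _ in range(bits):
            r = (r << 1) | get_bit()
        if r < n:
            return r

def bucket_counts(n, bits):
    """How many of the 2**bits equally likely values land in each of n buckets."""
    counts = [0] * n
    for value in range(1 << bits):
        counts[(value * n) >> bits] += 1
    return counts

rng = random.Random(7)
draws = [random_below(10, lambda: rng.getrandbits(1)) for _ in range(1000)]
print(bucket_counts(10, 10))  # bucket sizes for 10 bits split into 10 intervals
```

With 10 bits and 10 buckets the bucket sizes differ by one, so the result is close to uniform but not exact. When n is a power of two the buckets come out equal.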

Algorithms for testing a poker hand for a straight draw (4 to a straight)?

I'm in the throes of writing a poker evaluation library for fun and am looking to add the ability to test for draws (open ended, gutshot) for a given set of cards.
Just wondering what the "state of the art" is for this? I'm trying to keep my memory footprint reasonable, so the idea of using a look up table doesn't sit well but could be a necessary evil.
My current plan is along the lines of:
subtract the lowest rank from the rank of all cards in the set.
look to see if certain sequence i.e.: 0,1,2,3 or 1,2,3,4 (for OESDs) is a subset of the modified collection.
I'm hoping to do better complexity wise, as 7 card or 9 card sets will grind things to a halt using my approach.
Any input and/or better ideas would be appreciated.
The fastest approach is probably to assign a bit mask for each card rank (e.g. deuce=1, three=2, four=4, five=8, six=16, seven=32, eight=64, nine=128, ten=256, jack=512, queen=1024, king=2048, ace=4096), and OR together the mask values of all the cards in the hand. Then use an 8192-element lookup table to indicate whether the hand is a straight, an open-ender, a gut-shot, or nothing of significance (one could also include the various types of backdoor straight draw without affecting execution time).
Incidentally, using different bitmask values, one can quickly detect other useful hands like two-of-a-kind, three-of-a-kind, etc. If one has 64-bit integer math available, use the cube of the indicated bit masks above (so deuce=1, three=8, etc. up to ace=2^36) and add together the values of the cards. If the result, and'ed with 04444444444444 (octal) is non-zero, the hand is a four-of-a kind. Otherwise, if adding plus 01111111111111, and and'ing with 04444444444444 yields non-zero, the hand is a three-of-a-kind or full-house. Otherwise, if the result, and'ed with 02222222222222 is non-zero, the hand is either a pair or two-pair. To see if a hand contains two or more pairs, 'and' the hand value with 02222222222222, and save that value. Subtract 1, and 'and' the result with the saved value. If non-zero, the hand contains at least two pairs (so if it contains a three-of-a-kind, it's a full house; otherwise it's two-pair).
As a parting note, the computation done to check for a straight will also let you determine quickly how many different ranks of card are in the hand. If there are N cards and N different ranks, the hand cannot contain any pairs or better (but might contain a straight or flush, of course). If there are N-1 different ranks, the hand contains precisely one pair. Only if there are fewer different ranks must one use more sophisticated logic (if there are N-2, the hand could be two-pair or three-of-a-kind; if N-3 or fewer, the hand could be a "three-pair" (scores as two-pair), full house, or four-of-a-kind).
One more thing: if you can't manage an 8192-element lookup table, you could use a 512-element lookup table. Compute the bitmask as above, and then do lookups on array[bitmask & 511] and array[bitmask >> 4], and OR the results. Any legitimate straight or draw will register on one or other lookup. Note that this won't directly give you the number of different ranks (since cards six through ten will get counted in both lookups) but one more lookup to the same array (using array[bitmask >> 9]) would count just the jacks through aces.
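Here is one way the 8192-element table could be built, sketched in Python. The classification rule is my own simplification, not something specified above: a hand counts as "open-ended" if two or more different ranks would complete a straight (this also catches double gutshots), and as a "gutshot" if exactly one rank would. All names are made up for the example.

```python
RANKS = "23456789TJQKA"          # bit 0 = deuce ... bit 12 = ace, as in the text
NOTHING, GUTSHOT, OPEN_ENDED, STRAIGHT = 0, 1, 2, 3

def has_straight(mask):
    # Shift up by one and copy the ace into bit 0 so that A-2-3-4-5 is also found.
    ext = (mask << 1) | (mask >> 12)
    return any((ext >> lo) & 0b11111 == 0b11111 for lo in range(10))

def classify(mask):
    if has_straight(mask):
        return STRAIGHT
    # Count the missing ranks that would complete a straight if added.
    outs = sum(1 for r in range(13)
               if not mask & (1 << r) and has_straight(mask | (1 << r)))
    if outs >= 2:
        return OPEN_ENDED
    return GUTSHOT if outs == 1 else NOTHING

# Built once up front; afterwards every query is a single array lookup.
TABLE = [classify(mask) for mask in range(1 << 13)]

def rank_mask(cards):
    mask = 0
    for card in cards:
        mask |= 1 << RANKS.index(card)
    return mask

print(TABLE[rank_mask("5678")], TABLE[rank_mask("5689")], TABLE[rank_mask("TJQKA")])
```

Because the mask ignores duplicates and suits, the same table works unchanged for 5, 7 or 9 card hands.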
I know you said you want to keep the memory footprint as small as possible, but there is one quite memory efficient lookup table optimization which I've seen used in some poker hand evaluators and I have used it myself. If you're doing heavy poker simulations and need the best possible performance, you might wanna consider this. Though I admit in this case the difference isn't that big because testing for a straight draw isn't very expensive operation, but the same principle can be used for pretty much every type of hand evaluation in poker programming.
The idea is that we create a kind of a hash function that has the following properties:
1) calculates a unique value for each different set of card ranks
2) is symmetric in the sense that it doesn't depend on the order of the cards
The purpose of this is to reduce the number of elements needed in the lookup table.
A neat way of doing this is to assign a prime number to each rank (2->2, 3->3, 4->5, 5->7, 6->11, 7->13, 8->17, 9->19, T->23, J->29, Q->31, K->37, A->41), and then calculate the product of the primes. For example if the cards are 39TJQQ, then the hash is 36536259.
To create the lookup table you go through all the possible combinations of ranks, and use some simple algorithm to determine whether they form a straight draw. For each combination you also calculate the hash value and then store the results in a map where Key is the hash and Value is the result of the straight draw check. If the maximum number of cards is small (4 or less) then even a linear array might be feasible.
To use the lookup table you first calculate the hash for the particular set of cards and then read the corresponding value from the map.
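As a quick check of the hashing idea on its own, here is a tiny Python sketch. The names `PRIMES` and `rank_hash` are just for illustration; the prime assignment is the one listed above.

```python
# One prime per rank, in the order 2, 3, ..., K, A.
PRIMES = dict(zip("23456789TJQKA", [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]))

def rank_hash(ranks):
    """Product of the rank primes: identical for any ordering of the same ranks."""
    product = 1
    for rank in ranks:
        product *= PRIMES[rank]
    return product

print(rank_hash("39TJQQ"))
print(rank_hash("QJ9Q3T"))  # same cards in a different order
```

Unique factorization is what guarantees property 1: two different multisets of ranks can never produce the same product.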
Here's an example in C++. I don't guarantee that it's working correctly and it could probably be optimized a lot by using a sorted array and binary search instead of hash_map. hash_map is kinda slow for this purpose.
#include <iostream>
#include <vector>
#include <hash_map> // MSVC-specific; std::unordered_map is the standard equivalent
#include <numeric>
using namespace std;
const int MAXCARDS = 9;
stdext::hash_map<long long, bool> lookup;
//"Hash function" that is unique for each set of card ranks, and also
//symmetric so that the order of cards doesn't matter.
long long hash(const vector<int>& cards)
{
    //One prime per rank (2..A), repeated for each of the four suits
    static const int primes[52] = {
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41
    };
    long long res=1;
    for(vector<int>::const_iterator i=cards.begin();i!=cards.end();i++)
        res *= primes[*i];
    return res;
}
//Tests whether there is a straight draw (assuming there is no
//straight). Only used for filling the lookup table.
bool is_draw_slow(const vector<int>& cards)
{
    int ranks[14] = {0};
    for(vector<int>::const_iterator i=cards.begin();i!=cards.end();i++)
        ranks[ *i % 13 + 1 ] = 1;
    ranks[0]=ranks[13]; //ace counts also as 1
    //Slide a window of five consecutive ranks; four of them present is a draw
    int count = ranks[0]+ranks[1]+ranks[2]+ranks[3];
    for(int i=0; i<=9; i++) {
        count += ranks[i+4];
        if(count==4)
            return true;
        count -= ranks[i];
    }
    return false;
}
//Enumerates every non-decreasing sequence of ranks, so each set is visited once
void create_lookup_helper(vector<int>& cards, int idx)
{
    for(;cards[idx]<13;cards[idx]++) {
        if(idx==(int)cards.size()-1)
            lookup[hash(cards)] = is_draw_slow(cards);
        else {
            cards[idx+1] = cards[idx];
            create_lookup_helper(cards,idx+1);
        }
    }
}
void create_lookup()
{
    for(int i=1;i<=MAXCARDS;i++) {
        vector<int> cards(i);
        create_lookup_helper(cards,0);
    }
}
//Test for a draw using the lookup table
bool is_draw(const vector<int>& cards)
{
    return lookup[hash(cards)];
}
int main(int argc, char* argv[])
{
    create_lookup();
    cout<<lookup.size()<<endl; //497419
    int cards1[] = {1,2,3,4};
    int cards2[] = {0,1,2,7,12};
    int cards3[] = {3,16,29,42,4,17,30,43};
    cout << is_draw(vector<int>(cards1,cards1+4)) <<endl; //true
    cout << is_draw(vector<int>(cards2,cards2+5)) <<endl; //true
    cout << is_draw(vector<int>(cards3,cards3+8)) <<endl; //false
    return 0;
}
This may be a naive solution, but I am pretty sure it would work, although I am not sure about the performance issues.
Assuming again that the cards are represented by the numbers 1 - 13, then if your 4 cards have a numeric range of 3 or 4 (from highest to lowest card rank) and contain no duplicates then you have a possible straight draw.
A range of 3 implies you have an open-ended draw eg 2,3,4,5 has a range of 3 and contains no duplicates.
A range of 4 implies you have a gutshot (as you called it) eg 5,6,8,9 has a range of 4 and contains no duplicates.
Update: per Christian Mann's comment, it can be done like this:
Let's say A is represented as 1, J as 11, Q as 12, etc.
Loop through 1 to 13 as i:
    If my cards already include card i, don't worry about this case; skip to the next card.
    For this card i, look to the left and count how many consecutive cards there are.
    Same as above, but look to the right.
    If count_left_consecutive + count_right_consecutive == 4, then we have found a case.
You will need to define the functions that count the consecutive cards to the left and to the right, and also handle the case where, when looking to the right, the A is consecutive after the K.
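A possible sketch of that loop in Python. The function name is invented for the example, and I use ">= 4" instead of "== 4" so that hands with more than four cards are handled too, which is my own adjustment. The ace is treated as both 1 and 14 so that it is consecutive after the king.

```python
def completing_ranks(ranks):
    """Ranks (A=1 ... K=13) that would complete a straight if added to the hand."""
    have = set(ranks)
    if 1 in have:
        have.add(14)              # the ace also plays high, right after the king
    found = set()
    for i in range(1, 15):        # 14 is the ace in its high position
        if i in have:
            continue
        left = 0
        while (i - left - 1) in have:
            left += 1             # consecutive cards directly below i
        right = 0
        while (i + right + 1) in have:
            right += 1            # consecutive cards directly above i
        if left + right >= 4:
            found.add(1 if i == 14 else i)
    return sorted(found)

print(completing_ranks([2, 3, 4, 5]))     # open-ended: two ranks complete it
print(completing_ranks([5, 6, 8, 9]))     # gutshot: one rank completes it
```

The length of the returned list then separates open-ended draws (two ranks) from gutshots (one rank).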

Interview Question: Find Median From Mega Number Of Integers

There is a file that contains 10G(1000000000) integers; please find the median of these integers. You are given 2G of memory to do this. Can anyone come up with a reasonable way? Thanks!
Create an array of 8-byte longs that has 2^16 entries. Take your input numbers, shift off the bottom sixteen bits, and create a histogram.
Now you count up in that histogram until you reach the bin that covers the midpoint of the values.
Pass through again, ignoring all numbers that don't have that same set of top bits, and make a histogram of the bottom bits.
Count up through that histogram until you reach the bin that covers the midpoint of the (entire list of) values.
Now you know the median, in O(n) time and O(1) space (in practice, under 1 MB).
Here's some sample Scala code that does this:
def medianFinder(numbers: Iterable[Int]) = {
  def midArgMid(a: Array[Long], mid: Long) = {
    val cuml = a.scanLeft(0L)(_ + _).drop(1)
    cuml.zipWithIndex.dropWhile(_._1 < mid).head
  }
  val topHistogram = new Array[Long](65536)
  var count = 0L
  numbers.foreach(number => {
    count += 1
    topHistogram(number>>>16) += 1
  })
  val (topCount,topIndex) = midArgMid(topHistogram, (count+1)/2)
  val botHistogram = new Array[Long](65536)
  numbers.foreach(number => {
    if ((number>>>16) == topIndex) botHistogram(number & 0xFFFF) += 1
  })
  val (botCount,botIndex) =
    midArgMid(botHistogram, (count+1)/2 - (topCount-topHistogram(topIndex)))
  (topIndex<<16) + botIndex
}
and here it is working on a small set of input data:
scala> medianFinder(List(1,123,12345,1234567,123456789))
res18: Int = 12345
If you have 64 bit integers stored, you can use the same strategy in 4 passes instead.
You can use the Medians of Medians algorithm.
If the file is in text format, you may be able to fit it in memory just by converting things to integers as you read them in, since an integer stored as characters may take more space than an integer stored as an integer, depending on the size of the integers and the type of text file. EDIT: You edited your original question; I can see now that you can't read them into memory, see below.
If you can't read them into memory, this is what I came up with:
Figure out how many integers you have. You may know this from the start. If not, then it only takes one pass through the file. Let's say this is S.
Use your 2G of memory to find the x largest integers (however many you can fit). You can do one pass through the file, keeping the x largest in a sorted list of some sort, discarding the rest as you go. Now you know the x-th largest integer. You can discard all of these except for the x-th largest, which I'll call x1.
Do another pass through, finding the next x largest integers less than x1, the least of which is x2.
I think you can see where I'm going with this. After a few passes, you will have read in the (S/2)-th largest integer (you'll have to keep track of how many integers you've found), which is your median. If S is even then you'll average the two in the middle.
Make a pass through the file and find the count of integers and the minimum and maximum integer values.
Take the midpoint of min and max, and get the count, min and max for the values on either side of the midpoint, again by reading through the file.
The partition that contains the middle rank (on the first split, simply the one with the larger count) is the partition the median lies in.
Repeat for that partition, taking into account the size of the 'partitions to the left' (easy to maintain), and also watching for min = max.
I'm sure this would work for an arbitrary number of partitions as well.
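A simplified sketch of this bisection in Python. Instead of tracking per-partition counts it just counts, on every pass, how many values are at or below the current midpoint, which amounts to the same search. `read_numbers` is a hypothetical callable that returns a fresh iterator over the file each time it is called (one call per pass). For an even count this returns the lower of the two middle values.

```python
def median_by_bisection(read_numbers):
    """Find the median with O(1) memory by repeatedly halving the value range."""
    count, lo, hi = 0, None, None
    for x in read_numbers():                 # first pass: count, min and max
        count += 1
        lo = x if lo is None else min(lo, x)
        hi = x if hi is None else max(hi, x)
    target = (count + 1) // 2                # 1-based rank of the (lower) median
    while lo < hi:                           # about log2(max - min) further passes
        mid = (lo + hi) // 2
        at_or_below = sum(1 for x in read_numbers() if x <= mid)
        if at_or_below >= target:
            hi = mid                         # the median is in the lower partition
        else:
            lo = mid + 1                     # the median is in the upper partition
    return lo

data = [1, 123, 12345, 1234567, 123456789]
print(median_by_bisection(lambda: iter(data)))
```

For 32-bit integers this takes at most about 33 passes over the file, which is many more than the histogram approach but needs only a handful of variables.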
Do an on-disk external mergesort on the file to sort the integers (counting them if that's not already known).
Once the file is sorted, seek to the middle number (odd case), or average the two middle numbers (even case) in the file to get the median.
The amount of memory used is adjustable and unaffected by the number of integers in the original file. One caveat of the external sort is that the intermediate sorting data needs to be written to disk.
Given n = number of integers in the original file:
Running time: O(nlogn)
Memory: O(1), adjustable
Disk: O(n)
Check out Torben's method in here: It also has implementation in C at the bottom of the document.
My best guess is that a probabilistic median of medians would be the fastest one. Recipe:
Take the next set of N integers (N should be big enough, say 1000 or 10000 elements).
Then calculate the median of these integers and assign it to the variable X_new.
If this is not the first iteration, calculate the median of the two medians:
X_global = (X_global + X_new) / 2
When you see that X_global no longer fluctuates much, you have found an approximate median of the data.
But there are some notes:
The question arises whether the median error is acceptable or not.
The integers must be randomly distributed in a uniform way for the solution to work.
I've played a bit with this algorithm and changed the idea a bit: in each iteration we should add X_new with decreasing weight, such as:
X_global = k*X_global + (1.-k)*X_new
where k is in [0.5 .. 1.] and increases in each iteration.
The point is to make the calculation converge fast to some number in a very small number of iterations, so that a very approximate median (with a big error) is found among 100000000 array elements in only 252 iterations! Check this C experiment:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#define ARRAY_SIZE 100000000
#define RANGE_SIZE 1000
// probabilistic median of medians method
// should print 5000 as data average
// from ARRAY_SIZE of elements
int main (int argc, const char * argv[]) {
    int iter = 0;
    int X_global = 0;
    int X_new = 0;
    int i = 0;
    float dk = 0.002;
    float k = 0.5;
    srand(time(NULL));
    while (i<ARRAY_SIZE && k!=1.) {
        // average of the next RANGE_SIZE simulated data points
        X_new = 0;
        for (int j=i; j<i+RANGE_SIZE; j++) {
            X_new+=rand()%10000 + 1;
        }
        X_new/=RANGE_SIZE;
        if (iter>0) {
            k += dk;
            k = (k>1.)? 1.:k;
            X_global = k*X_global+(1.-k)*X_new;
        }
        else {
            X_global = X_new;
        }
        i+=RANGE_SIZE;
        iter++;
        printf("iter %d, median = %d \n",iter,X_global);
    }
    return 0;
}
Oops, it seems I'm talking about the mean, not the median. If so, and you need exactly the median and not the mean, ignore my post. In any case, mean and median are closely related concepts.
Good luck.
Here is the algorithm described by @Rex Kerr implemented in Java.
/**
 * Computes the median.
 * @param arr Array of strings, each element represents a distinct binary number and has the same number of bits (padded with leading zeroes if necessary)
 * @return the median (number of rank ceil((m+1)/2) ) of the array as a string
 */
static String computeMedian(String[] arr) {
    // rank of the median element
    int m = (int) Math.ceil((arr.length+1)/2.0);
    String bitMask = "";
    int zeroBin = 0;
    while (bitMask.length() < arr[0].length()) {
        // puts elements which conform to the bitMask into one of two buckets
        for (String curr : arr) {
            if (curr.startsWith(bitMask))
                if (curr.charAt(bitMask.length()) == '0')
                    zeroBin++;
        }
        // decides in which bucket the median is located
        if (zeroBin >= m)
            bitMask = bitMask.concat("0");
        else {
            m -= zeroBin;
            bitMask = bitMask.concat("1");
        }
        zeroBin = 0;
    }
    return bitMask;
}
Some test cases and updates to the algorithm can be found here.
I was also asked the same question and I couldn't give an exact answer, so after the interview I went through some books on interviews, and here is what I found in the book Cracking the Coding Interview.
Example: Numbers are randomly generated and stored into an (expanding) array. How would you keep track of the median?
Our data structure brainstorm might look like the following:
• Linked list? Probably not. Linked lists tend not to do very well with accessing and sorting numbers.
• Array? Maybe, but you already have an array. Could you somehow keep the elements sorted? That's probably expensive. Let's hold off on this and return to it if it's needed.
• Binary tree? This is possible, since binary trees do fairly well with ordering. In fact, if the binary search tree is perfectly balanced, the top might be the median. But, be careful: if there's an even number of elements, the median is actually the average of the middle two elements. The middle two elements can't both be at the top. This is probably a workable algorithm, but let's come back to it.
• Heap? A heap is really good at basic ordering and keeping track of max and mins. This is actually interesting: if you had two heaps, you could keep track of the bigger half and the smaller half of the elements. The bigger half is kept in a min heap, such that the smallest element in the bigger half is at the root. The smaller half is kept in a max heap, such that the biggest element of the smaller half is at the root. Now, with these data structures, you have the potential median elements at the roots. If the heaps are no longer the same size, you can quickly "rebalance" the heaps by popping an element off the one heap and pushing it onto the other.
Note that the more problems you do, the more developed your instinct on which data structure to apply will be. You will also develop a more finely tuned instinct as to which of these approaches is the most useful.
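The two-heap idea from the excerpt can be sketched as follows in Python. The class name `RunningMedian` is mine, and since `heapq` only provides a min heap, the max heap is simulated by storing negated values. Note this keeps every element in memory, so it answers the book's "expanding array" example rather than the 2G-memory file question.

```python
import heapq

class RunningMedian:
    def __init__(self):
        self.small = []   # max heap of the smaller half (values stored negated)
        self.large = []   # min heap of the bigger half

    def add(self, x):
        if self.small and x > -self.small[0]:
            heapq.heappush(self.large, x)
        else:
            heapq.heappush(self.small, -x)
        # Rebalance so that small has either the same size as large or one more.
        if len(self.small) > len(self.large) + 1:
            heapq.heappush(self.large, -heapq.heappop(self.small))
        elif len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def median(self):
        if len(self.small) > len(self.large):
            return -self.small[0]                      # odd count: single middle value
        return (-self.small[0] + self.large[0]) / 2    # even count: average the roots

tracker = RunningMedian()
for value in (5, 1, 3):
    tracker.add(value)
odd_median = tracker.median()
tracker.add(4)
even_median = tracker.median()
print(odd_median, even_median)
```

Each insertion costs O(log n) and reading the median is O(1).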

random permutation

I would like to generate a random permutation as fast as possible.
The problem: the Knuth shuffle, which is O(n), involves generating n random numbers, and generating random numbers is quite expensive.
I would like to find an O(n) function involving a fixed O(1) amount of random numbers.
I realize that this question has been asked before, but I did not see any relevant answers.
Just to stress a point: I am not looking for anything less than O(n), just an algorithm involving less generation of random numbers.
Create a 1-1 mapping of each permutation to a number from 1 to n! (n factorial). Generate a random number in 1 to n!, use the mapping, get the permutation.
For the mapping, perhaps this will be useful:
Of course, this would get out of hand quickly, as n! can become really large soon.
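One such mapping is the factorial number system (Lehmer code). Here is a rough Python sketch; the function names are invented for the example and indices are 0-based, i.e. 0 to n!-1 rather than 1 to n!. It uses a single random draw, but it is not O(n): the list `pop` makes it O(n^2), and the index itself is a big integer for any non-trivial n.

```python
import math
import random

def nth_permutation(index, n):
    """Map an index in [0, n!) to a distinct permutation of range(n)."""
    remaining = list(range(n))
    result = []
    for size in range(n, 0, -1):
        block = math.factorial(size - 1)        # permutations per choice of next element
        position, index = divmod(index, block)
        result.append(remaining.pop(position))
    return result

def random_permutation(n, rng=random):
    # A single random number in [0, n!) selects the whole permutation.
    return nth_permutation(rng.randrange(math.factorial(n)), n)

print(nth_permutation(5, 3))
print(random_permutation(8, random.Random(3)))
```

Python's arbitrary-precision integers hide the cost here; in C or Java the same code needs a bigint library once n exceeds 20.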
Generating a random number takes a long time, you say? The implementation of Java's Random.nextInt is roughly
oldseed = seed;
nextseed = (oldseed * multiplier + addend) & mask;
return (int)(nextseed >>> (48 - bits));
Is that too much work to do for each element?
See for a careful analysis of the number of random bits required to generate a random permutation. (It's open-access, but it's not easy reading! Bottom line: if carefully implemented, all of the usual methods for generating random permutations are efficient in their use of random bits.)
And... if your goal is to generate a random permutation rapidly for large N, I'd suggest you try the MergeShuffle algorithm. An article published in 2015 claimed a factor-of-two speedup over Fisher-Yates in both parallel and sequential implementations, and a significant speedup in sequential computations over the other standard algorithm they tested (Rao-Sandelius).
An implementation of MergeShuffle (and of the usual Fisher-Yates and Rao-Sandelius algorithms) is available at But caveat emptor! The authors are theoreticians, not software engineers. They have published their experimental code to github but aren't maintaining it. Someday, I imagine someone (perhaps you!) will add MergeShuffle to GSL. At present gsl_ran_shuffle() is an implementation of Fisher-Yates, see
Not what you asked exactly, but if provided random number generator doesn't satisfy you, may be you should try something different. Generally, pseudorandom number generation can be very simple.
Probably, best-known algorithm
As other answers suggest, you can make a random integer in the range 0 to N! and use it to produce a shuffle. Although theoretically correct, this won't be faster in general since N! grows fast and you'll spend all your time doing bigint arithmetic.
If you want speed and you don't mind trading off some randomness, you will be much better off using a less good random number generator. A linear congruential generator will give you a random number in a few cycles.
Usually there is no need for the full range of the next random value, so to use exactly the same amount of randomness you can use the following approach (which is almost like random(0,N!), I guess):
// ...
m = 1; // range of random buffer (single variant)
r = 0; // random buffer (number zero)
// ...
for(/* ... */) {
    while (m < n) { // range of our buffer is too narrow for "n"
        r = r*RAND_MAX + random(); // add another random to our random-buffer
        m *= RAND_MAX; // update range of random-buffer
    }
    x = r % n; // pull-out next random with range "n"
    r /= n; // remove it from random-buffer
    m /= n; // fix range of random-buffer
    // ...
}
P.S. Of course there will be some errors related to division by a value other than 2^n, but they will be distributed among the resulting samples.
Generate N numbers in advance (N smaller than the number of random numbers you need) with your slow but good random generator, or store them in an array as data; then, inside your computing loop, pick up a number simply by incrementing an index into the array. If you need different seeds, create multiple tables.
Are you sure that your mathematical and algorithmical approach to the problem is correct?
I hit exactly the same problem, where the Fisher–Yates shuffle becomes the bottleneck in corner cases. But for me the real problem is a brute-force algorithm that doesn't scale well to all problems. The following story explains the problem and the optimizations that I have come up with so far.
Dealing cards for 4 players
The number of possible deals is a 96-bit number. That puts quite a lot of stress on the random number generator to avoid statistical anomalies when selecting a play plan from the generated sample set of deals. I chose to use 2xmt19937_64 seeded from /dev/random because of its long period and the heavy advertisement on the web that it is good for scientific simulations.
The simple approach is to use the Fisher–Yates shuffle to generate deals and filter out deals that don't match the already collected information. The Knuth shuffle takes ~1400 CPU cycles per deal, mostly because I have to generate 51 random numbers and swap entries in the table 51 times.
That doesn't matter for normal cases, where I only need to generate 10000-100000 deals in 7 minutes. But there are extreme cases where the filters select only a very small subset of hands, requiring a huge number of deals to be generated.
Using single number for multiple cards
When profiling with callgrind (valgrind) I noticed that the main slowdown was the C++ random number generator (after switching away from std::uniform_int_distribution, which was the first bottleneck).
Then I came up with the idea that I can use a single random number for multiple cards. The idea is to use the least significant information from the number first and then erase that information.
int number = uniform_rng(0, 52*51*50*49);
int card1 = number % 52;
number /= 52;
int card2 = number % 51;
number /= 51;
Of course, that is only a minor optimization because generation is still O(N).
Generation using bit permutations
The next idea was exactly the solution asked for here, but I still ended up with O(N), and at a larger cost than the original shuffle. But let's look into the solution and why it fails so miserably.
I decided to use the idea from Dealing All the Deals by John Christman
void Deal::generate()
{
    // 52:26 split, 52!/(26!)**2 = 495,918,532,948,104
    max = 495918532948104LU;
    partner = uniform_rng(eng1, max);
    // 2x 26:13 splits, (26!/(13!)**2)**2 = 10,400,600**2
    max = 10400600LU*10400600LU;
    hands = uniform_rng(eng2, max);
    // Create 104 bit presentation of deal (2 bits per card)
    select_deal(id, partner, hands);
}
So far good and pretty good looking but select_deal implementation is PITA.
void select_deal(Id &new_id, uint64_t partner, uint64_t hands)
unsigned idx;
unsigned e, n, ns = 26;
e = n = 13;
// Figure out partnership who owns which card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx > 0; ) {
uint64_t cut = ncr(idx - 1, ns);
if (partner >= cut) {
partner -= cut;
// Figure out if N or S holds the card
cut = ncr(ns, n) * 10400600LU;
if (hands > cut) {
hands -= cut;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
} else
new_id[idx%NUM_SUITS + NUM_SUITS] |= 1 << (idx/NUM_SUITS);
unsigned ew = 26;
// Figure out if E or W holds a card
for (idx = CARDS_IN_SUIT*NUM_SUITS; idx-- > 0; ) {
if (new_id[idx%NUM_SUITS + NUM_SUITS] & (1 << (idx/NUM_SUITS))) {
uint64_t cut = ncr(--ew, e);
if (hands >= cut) {
hands -= cut;
} else
new_id[idx%NUM_SUITS] |= 1 << (idx/NUM_SUITS);
Now that I had the O(N) permutation solution done to prove the algorithm could work, I started searching for an O(1) mapping from a random number to a bit permutation. Too bad it looks like the only solution would be using huge lookup tables that would kill the CPU caches. That doesn't sound like a good idea for an AI that will be using a very large amount of cache for the double dummy analyzer.
Mathematical solution
After all the hard work to figure out how to generate random bit permutations, I decided to go back to maths. It is entirely possible to apply filters before dealing cards. That requires splitting the deals into a manageable number of layered sets and selecting between sets based on their relative probabilities after filtering out impossible sets.
I don't yet have code ready to test how many cycles I'm wasting in the common case where the filter selects the major part of the deals. But I believe this approach gives the most stable generation performance, keeping the cost less than 0.1%.
Generate a 32 bit integer. For each index i (maybe only up to half the number of elements in the array), if bit i % 32 is 1, swap i with n - i - 1.
Of course, this might not be random enough for your purposes. You could probably improve this by not swapping with n - i - 1, but rather by another function applied to n and i that gives better distribution. You could even use two functions: one for when the bit is 0 and another for when it's 1.
