Monte carlo on GPU - random

Today I had a talk with a friend of mine told me he tries to make some monte carlo simulations using GPU. What was interesting he told me that he wanted to draw numbers randomly on different processors and assumed that there were uncorrelated. But they were not.
The question is, whether there exists a method to draw independent sets of numbers on several GPUs? He thought that taking a different seed for each of them would solve the problem, but it does not.
If any clarifications are need please let me know, I will ask him to provide more details.

To generate completely independent random numbers, you need to use a parallel random number generator. Essentially, you choose a single seed and it generates M independent random number streams. So on each of the M GPUs you could then generate random numbers from independent streams.
When dealing with multiple GPUs you need to be aware that you want:
independent streams within GPUs (if RNs are generate by each GPU)
independent streams between GPUs.
It turns out that generating random numbers on each GPU core is tricky (see this question I asked a while back). When I've been playing about with GPUs and RNs, you only get a speed-up generating random on the GPU if you generate large numbers at once.
Instead, I would generate random numbers on the CPU, since:
It's easier and sometimes quicker to generate them on the CPU and transfer across.
You can use well tested parallel random number generators
The types of off-the shelf random number generators available for GPUs is very limited.
Current GPU random number libraries only generate RNs from a small number of distributions.
To answer your question in the comments: What do random numbers depend on?
A very basic random number generator is the linear congruential generator. Although this generator has been surpassed by newer methods, it should give you an idea of how they work. Basically, the ith random number depends on the (i-1) random number. As you point out, if you run two streams long enough, they will overlap. The big problem is, you don't know when they will overlap.

For generating iid uniform variables, you just have to initialize your generators with differents seeds. With Cuda, you may use the NVIDIA Curand Library which implements the Mersenne Twister generator.
For example, the following code executed by 100 kernels in parallel, will draw 10 sample of a (R^10)-uniform
__global__ void setup_kernel(curandState *state,int pseed)
{
int id = blockIdx.x * blockDim.x + threadIdx.x;
int seed = id%10+pseed;
/* 10 differents seed for uncorrelated rv,
a different sequence number, no offset */
curand_init(seed, id, 0, &state[id]);
}

If you take any ``good'' generator (e.g. Mersenne Twister etc), two sequences with different random seeds will be uncorrelated, be it on GPU or CPU. Hence I'm not sure what you mean by saying taking different seeds on different GPUs were not enough. Would you elaborate?

Related

Non-recursive random number generator

I have searched for pseudo-RNG algorithms but all I can find seem to generate the next number by using the previous result as seed. Is there a way to generate them non-recursively?
The scenario where I need this is during OpenCL concurrent programming, each thread/pixel needs an independent RNG. I tried to seed them using BIG_NUMBER + work_id, but the result has a strong visual pattern in it. I tried several different RNG algorithms and all have this problem. Apparently they only guarantee the numbers are independent if you generate recursively, but not when you use sequential numbers as seed.
So my question is: Can I generate an array of random numbers, from an array of sequential numbers, independently and constant time for each number? Or is it mathematically impossible?
As the solution to my openCL problem, I can just pre-generate a huge array of random numbers recursively first and store in GPU memory, then use them later using seed as index. But I'm curious about the question above, because it seems very possible just by doing bunch of overflow and cutoffs, according to my very simple understanding of chaos theory.
Can I generate an array of random numbers, from an array of sequential numbers, independently and constant time for each number? Or is it mathematically impossible?
Sure you can - use block cipher in countnig mode. It is generally known as Counter based RNG, first widely used one was Fortuna RNG

C++ random_shuffle() how does it work?

I have a Deck vector with 52 Card, and I want to shuffle it.
vector<Card^> cards;
So I used this:
random_shuffle(cards.begin(), cards.end());
The problem was that it gave me the same result every time, so I used srand to randomize it:
srand(unsigned(time(NULL)));
random_shuffle(cards.begin(),cards.end());
This was still not truly random. When I started dealing cards, it was the same as in the last run. For example: "1. deal: A,6,3,2,K; 2. deal: Q,8,4,J,2", and when I restarted the program I got exactly the same order of deals.
Then I used srand() and random_shuffle with its 3rd parameter:
int myrandom (int i) {
return std::rand()%i;
}
srand(unsigned(time(NULL)));
random_shuffle(cards.begin(),cards.end(), myrandom);
Now it's working and always gives me different results on re-runs, but I don't know why it works this way. How do these functions work, what did I do here?
This answer required some investigation, looking at the C++ Standard Library headers in VC++ and looking at the C++ standard itself. I knew what the standard said, but I was curious about VC++ (including C++CLI) did their implementation.
First what does the standard say about std::random_shuffle . We can find that here. In particular it says:
Reorders the elements in the given range [first, last) such that each possible permutation of those elements has equal probability of appearance.
1) The random number generator is implementation-defined, but the function std::rand is often used.
The bolded part is key. The standard says that the RNG can be implementation specific (so results across different compilers will vary). The standard suggests that std::rand is often used. But this isn't a requirement. So if an implementation doesn't use std::rand then it follows that it likely won't use std::srand for a starting seed. An interesting footnote is that the std::random_shuffle functions are deprecated as of C++14. However std::shuffle remains. My guess is that since std::shuffle requires you to provide a function object you are explicitly defining the behavior you want when generating random numbers, and that is an advantage over the older std::random_shuffle.
I took my VS2013 and looked at the C++ standard library headers and discovered that <algorithm> uses template class that uses a completely different pseudo-rng (PRNG) than std::rand with an index (seed) set to zero. Although this may vary in detail between different versions of VC++ (including C++/CLI) I think it is probable that most versions of VC++/CLI do something similar. This would explain why each time you run your application you get the same shuffled decks.
The option I would opt for if I am looking for a Pseudo RNG and I'm not doing cryptography is to use something well established like Mersenne Twister:
Advantages The commonly-used version of Mersenne Twister, MT19937, which produces a sequence of 32-bit integers, has the following desirable properties:
It has a very long period of 2^19937 − 1. While a long period is not a guarantee of quality in a random number generator, short periods (such as the 2^32 common in many older software packages) can be problematic.
It is k-distributed to 32-bit accuracy for every 1 ≤ k ≤ 623 (see definition below).
It passes numerous tests for statistical randomness, including the Diehard tests.
Luckily for us C++11 Standard Library (which I believe should work on VS2010 and later C++/CLI) includes a Mersenne Twister function object that can be used with std::shuffle Please see this C++ documentation for more details. The C++ Standard Library reference provided earlier actually contains code that does this:
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(v.begin(), v.end(), g);
The thing to note is that std::random_device produces non-deterministic (non repeatable) unsigned integers. We need non-deterministic data if we want to seed our Mersenne Twister (std::mt19937) PRNG with. This is similar in concept to seeding rand with srand(time(NULL)) (The latter not being an overly good source of randomness).
This looks all well and good but has one disadvantage when dealing with card shuffling. An unsigned integer on the Windows platform is 4 bytes (32 bits) and can store 2^32 values. This means there are only 4,294,967,296 possible starting points (seeds) therefore only that many ways to shuffle the deck. The problem is that there are 52! (52 factorial) ways to shuffle a standard 52 card deck. That happens to be 80658175170943878571660636856403766975289505440883277824000000000000 ways, which is far bigger than the number of unique ways we can get from setting a 32-bit seed.
Thankfully, Mersenne Twister can accept seeds between 0 and 2^19937-1. 52! is a big number but all combinations can be represented with a seed of 226 bits (or ~29 bytes). The Standard Library allow std::mt19937 to accept a seed up to 2^19937-1 (~624 bytes of data) if we so choose. But since we need only 226 bits the following code would allow us to create 29 bytes of non-deterministic data to be used as a suitable seed for std::mt19937:
// rd is an array to hold 29 bytes of seed data which covers the 226 bits we need */
std::array<unsigned char, 29> seed_data;
std::random_device rd;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
// Set the seed for Mersenne *using the 29 byte sequence*
std::mt19937 g(seq);
Then all you need to do is call shuffle with code like:
std::shuffle(cards.begin(),cards.end(), g);
On Windows VC++/CLI you will get a warning that you'll want to suppress with the code above. So at the top of the file (before other includes) you can add this:
#define _SCL_SECURE_NO_WARNINGS 1

Non-recursive pseudo-random number generation with discrete uniform distribution

I wonder is there any cheap and effective function to generate pseudo-random numbers by their indices? With something like that implementation:
var rand = new PseudoRandom(seed); // all sequences for same seeds are equal
trace(rand.get(index1)); // get int number by index1, for example =0x12345678
trace(rand.get(index2));
...
trace(rand.get(index1)); // must return the SAME number, =0x12345678
Probably it isn't about randomness but about good (fast and close as much as possible to uniform distribution) hashing where initial seed used as salt.
You could build such random number generator out of the stream cipher Salsa20. One of the nice features of Salsa20 is that you can jump ahead to any offset very cheaply. And Salsa20 is fast, typically less than 20 cycles per byte. Since the cipher is indistinguishable from a truly random stream, uniformity should be excellent.
Since you probably don't need cryptographically secure random numbers, you could even reduce the number of rounds to something like 8 instead of the usual 20 rounds.
Another option is to just use the ideas behind Salsa20, how to mix up a state array (Bernstein calls that a hashing function), to build your own random number generator.

I'm looking for a good psuedo random number generator, that takes two inputs instead of one

I'm looking for a determenistic psuedo random generator that takes two inputs and always returns the same output. I'm looking for things like uniform distribution, unpredictable as possible, and doesn't repeat for a long long time. Ideally the function doesn't rely on previous values. The reason that is a problem is I'm generating terrain data for an extremely large procedurely generated world and can't afford to store previous values.
Any help is appreciated.
i think what you're looking for is perlin noise - it's a way of generating "random" values in 2d (typically) that look like terrain / clouds / etc.
note that this doesn't have much to do with cryptography etc, but a "real" random number source is probably not what you want for synthetic terrain (it looks too noisy/spikey).
there's a good article on perlin noise here.
the implementation of perlin noise does use a source of random numbers, but typically you can use whatever is present on your system (starting with a known seed if you want to reproduce it later).
Is the problem deciding on a PRNG algorithm to use or an algorithm that accepts 2 inputs?
If it's the former, why not use the built in random class - such as Random class in .NET - since it strives for uniform distribution and long cycles. Also, given the same seed it will generate the same sequence of numbers.
If it's the latter, what you can do is map the 2 inputs to a single ouput and use that as a seed to your random algorithm. You can define a simple hash function that takes a string and calculates an integer from it:
s[0] + s[1]^1 + s[2]^2 + ... s[n]^n = seed
Combination of two inputs (by concatenating each other, provided the inputs are binary integers) into one seed will do, for a PRNG, such as Mersenne Twister.

How exactly does PC/Mac generates random numbers for either 0 or 1?

This question is NOT about how to use any language to generate a random number between any interval. It is about generating either 0 or 1.
I understand that many random generator algorithm manipulate the very basic random(0 or 1) function and take seed from users and use an algorithm to generate various random numbers as needed.
The question is that how the CPU generate either 0 or 1? If I throw a coin, I can generate head or tailer. That's because I physically throw a coin and let the nature decide. But how does CPU do it? There must be an action that the CPU does (like throwing a coin) to get either 0 or 1 randomly, right?
Could anyone tell me about it?
Thanks
(This has several facets and thus several algorithms. Keep in mind that there are many different forms of randomness used for different purposes, but I understand your question in the way that you are interested in actual randomness used for cryptography.)
The fundamental problem here is that computers are (mostly) deterministic machines. Given the same input in the same state they always yield the same result. However, there are a few ways of actually gathering entropy:
User input. Since users bring outside input into the system you can take that to derive some bits from that. Similar to how you could use radioactive decay or line noise.
Network activity. Again, an outside source of stuff.
Generally interrupts (which kinda include the first two).
As alluded to in the first item, noise from peripherals, such as audio input or a webcam can be used.
There is dedicated hardware that can generate a few hundred MiB of randomness per second. Usually they give you random numbers directly instead of their internal entropy, though.
How exactly you derive bits from that is up to you but you could use time between events, or actual content from the events, etc. – generally eliminating bias from entropy sources isn't easy or trivial and a lot of thought and algorithmic work goes into that (in the case of the aforementioned special hardware this is all done in hardware and the code using it doesn't need to care about it).
Once you have a pool of actually random bits you can just use them as random numbers (/dev/random on Linux does that). But this has downsides, since there is usually little actual entropy and possibly a higher demand for random numbers. So you can invent algorithms to “stretch” that initial randomness in a manner that makes it still impossible or at least very difficult to predict anything about following numbers (/dev/urandom on Linux or both /dev/random and /dev/urandom on FreeBSD do that). Fortuna and Yarrow are so-called cryptographically secure pseudo-random number generators and designed with that in mind. You still have a very good guarantee about the quality of random numbers you generate, but have many more before your entropy pool runs out.
In any case, the CPU itself cannot give you a random 0 or 1. There's a lot more involved and this usually includes the complete computer system or special hardware built for that purpose.
There is also a second class of computational randomness: Plain vanilla pseudo-random number generators (PRNGs). What I said earlier about determinism – this is the embodiment of it. Given the same so-called seed a PRNG will yield the exact same sequence of numbers every time¹. While this sounds idiotic it has practical benefits.
Suppose you run a simulation involving lots of random numbers, maybe to simulate interaction between molecules or atoms that involve certain probabilities and unpredictable behaviour. In science you want results anyone can independently verify, given the same setup and procedure (or, with computing, the same algorithms). If you used actual randomness the only option you have would be to save every single random number used to make sure others can replicate the results independently.
But with a PRNG all you need to save is the seed and remember what algorithm you used. Others can then get the exact same sequence of pseudo-random numbers independently. Very nice property to have :-)
Footnotes
¹ This even includes the CSPRNGs mentioned above, but they are designed to be used in a special way that includes regular re-seeding with entropy to overcome that problem.
A CPU can only generate a uniform random number, U(0,1), which happens to range from 0 to 1. So mathematically, it would be defined as a random variable U in the range [0,1]. Examples of random draws of a U(0,1) random number in the range 0 to 1 would be 0.28100002, 0.34522, 0.7921, etc. The probability of any value between 0 and 1 is equal, i.e., they are equiprobable.
You can generate binary random variates that are either 0 or 1 by setting a random draw of U(0,1) to a 0 if U(0,1)<=0.5 and 1 if U(0,1)>0.5, since in theory there will be an equal number of random draws of U(0,1) below 0.5 and above 0.5.

Resources