Random Number Generator in CUDA

Random Number Generator in CUDA - random

I've struggled with this all day, I am trying to get a random number generator for threads in my CUDA code. I have looked through all forums and yes this topic comes up a fair bit but I've spent hours trying to unravel all sorts of codes to no avail. If anyone knows of a simple method, probably a device kernel that can be called to returns a random float between 0 and 1, or an integer that I can transform I would be most grateful.
Again, I hope to use the random number in the kernel, just like rand() for instance.
Thanks in advance

For anyone interested, you can now do it via cuRAND.

I'm not sure I understand why you need anything special. Any traditional PRNG should port more or less directly. A linear congruential should work fine. Do you have some special properties you're trying to establish?

The best way for this is writing your own device function , here is the one
void RNG()
{
unsigned int m_w = 150;
unsigned int m_z = 40;
for(int i=0; i < 100; i++)
{
m_z = 36969 * (m_z & 65535) + (m_z >> 16);
m_w = 18000 * (m_w & 65535) + (m_w >> 16);
cout <<(m_z << 16) + m_w << endl; /* 32-bit result */
}
}
It'll give you 100 random numbers with 32 bit result.
If you want some random numbers between 1 and 1000, you can also take the result%1000, either at the point of consumption, or at the point of generation:
((m_z << 16) + m_w)%1000
Changing m_w and m_z starting values (in the example, 150 and 40) allows you to get a different results each time. You can use threadIdx.x as one of them, which should give you different pseudorandom series each time.
I wanted to add that it works 2 time faster than rand() function, and works great ;)

I think any discussion of this question needs to answer Zenna's orginal request and that is for a thread level implementation. Specifically a device function that can be called from within a kernel or thread. Sorry if I overdid the "in bold" phrases but I really think the answers so far are not quite addressing what is being sought here.
The cuRAND library is your best bet. I appreciate that people are wanting to reinvent the wheel (it makes one appreciate and more properly use 3rd party libraries) but high performance high quality number generators are plentiful and well tested. The best info I can recommend is on the documentation for the GSL library on the different generators here:http://www.gnu.org/software/gsl/manual/html_node/Random-number-generator-algorithms.html
For any serious code it is best to use one of the main algorithms that mathematicians/computer-scientists have into the ground over and over looking for systemic weaknesses. The "mersenne twister" is something with a period (repeat loop) on the order of 10^6000 (MT19997 algorithm means "Mersenne Twister 2^19997") that has been especially adapted for Nvidia to use at a thread level within threads of the same warp using thread id calls as seeds. See paper here:http://developer.download.nvidia.com/compute/cuda/2_2/sdk/website/projects/MersenneTwister/doc/MersenneTwister.pdf. I am actually working to implement somehting using this library and IF I get it to work properly I will post my code. Nvidia has some examples at their documentation site for the current CUDA toolkit.
NOTE: Just for the record I do not work for Nvidia, but I will admit their documentation and abstraction design for CUDA is something I have so far been impressed with.

Depending on your application you should be wary of using LCGs without considering whether the streams (one stream per thread) will overlap. You could implement a leapfrog with LCG, but then you would need to have a sufficiently long period LCG to ensure that the sequence doesn't repeat.
An example leapfrog could be:
template <typename ValueType>
__device__ void leapfrog(unsigned long &a, unsigned long &c, int leap)
{
unsigned long an = a;
for (int i = 1 ; i < leap ; i++)
an *= a;
c = c * ((an - 1) / (a - 1));
a = an;
}
template <typename ValueType>
__device__ ValueType quickrand(unsigned long &seed, const unsigned long a, const unsigned long c)
{
seed = seed * a;
return seed;
}
template <typename ValueType>
__global__ void mykernel(
unsigned long *d_seeds)
{
// RNG parameters
unsigned long a = 1664525L;
unsigned long c = 1013904223L;
unsigned long ainit = a;
unsigned long cinit = c;
unsigned long seed;
// Generate local seed
seed = d_seeds[bid];
leapfrog<ValueType>(ainit, cinit, tid);
quickrand<ValueType>(seed, ainit, cinit);
leapfrog<ValueType>(a, c, blockDim.x);
...
}
But then the period of that generator is probably insufficient in most cases.
To be honest, I'd look at using a third party library such as NAG. There are some batch generators in the SDK too, but that's probably not what you're looking for in this case.
EDIT
Since this just got up-voted, I figure it's worth updating to mention that cuRAND, as mentioned by more recent answers to this question, is available and provides a number of generators and distributions. That's definitely the easiest place to start.

There's an MDGPU package (GPL) which includes an implementation of the GNU rand48() function for CUDA here.
I found it (quite easily, using Google, which I assume you tried :-) on the NVidia forums here.

I haven't found a good parallel number generator for CUDA, however I did find a parallel random number generator based on academic research here: http://sprng.cs.fsu.edu/

You could try out Mersenne Twister for GPUs
It is based on SIMD-oriented Fast Mersenne Twister (SFMT) which is a quite fast and reliable random number generator. It passes Marsaglias DIEHARD tests for Random Number Generators.

In case you're using cuda.jit in Numba for Python, this Random number generator is useful.

Related

cuRAND performs much worse than thrust when generating random numbers inside CUDA kernels

I am trying to generate "random" numbers from a uniform distribution inside a CUDA __global__ kernel using two different approaches. The first is using the cuRAND device API, and the second is using thrust. For each approach I have created a different class.
Here is my cuRAND solution:
template<typename T>
struct RNG1
{
__device__
RNG1(unsigned int tid) {
curand_init(tid, tid, 0, &state);
}
__device__ T
operator ()(void) {
return curand_uniform(&state);
}
curandState state;
};
And here is my thrust solution:
template<typename T>
struct RNG2
{
__device__
RNG2(unsigned int tid)
: gen(tid)
, dis(0, 1) { gen.discard(tid); }
__device__ T
operator ()(void) {
return dis(gen);
}
thrust::default_random_engine gen;
thrust::uniform_real_distribution<T> dis;
};
The way I use them is the following:
template<typename T> __global__ void
mykernel(/* args here */)
{
unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
RNG1<T> rng(tid);
// or
RNG2<T> rng(tid);
T a_random_number = rng();
// do stuff here
}
Both of them work but the cuRAND solution is much slower (more than 3 times slower). If I set the second parameter of curand_init (sequence number) to 0, then the performance is the same as that of the thrust solution, but the random numbers are "bad". I can see patterns and artefacts in the resulting distribution.
Here are my two questions:
Can someone explain to me why the cuRAND solution with a non-zero sequence is slower?
How can thrust be as fast as cuRAND with zero sequence, but also generate good random numbers?
While searching on Google, I noticed that most people use cuRAND, and very few use thrust to generate random numbers inside device code. Is there something I should be aware of? Am I misusing thrust?
Thank you.

Perhaps the performance difference happens because cuRAND and Thrust use different PRNG algorithms with different performance profiles and demands on memory. Note that cuRAND supports five different PRNG algorithms, and your code doesn't give which one is in use.
Thrust's default_random_engine is currently minstd_rand, but its documentation notes that this "may change in a future version". (A comment written after I wrote mine also noted that it's minstd_rand.) minstd_rand is a simple linear congruential generator that may be faster than whatever PRNG cuRAND is using.
This was a comment converted to an answer and edited.

Is it possible to generate a pseudo-random number without using a language's standard library or external OS assistance?

This is a language agnostic question. I'm curious if it's possible to generate a pseudo-random number without:
1. Using the language's builtin/stdlib random functions
2. Using the current time.
3. Getting assistance from the OS, i.e. reading from /dev/random on *nix platforms.
I realize these are some far-out artificial constraints. Also, I want to point out that I'm not concerned about how properly random the result is. Seemingly random would be good enough for the purpose of this question. It's not for cryptographic application.

There are plenty of algorithms for generating pseudo-random numbers that can be implemented without using built in random functions etc.
E.g. (from Wikipedia):
An example of a simple pseudo-random number generator is the
multiply-with-carry method invented by George Marsaglia. It is
computationally fast and has good (albeit not cryptographically
strong) randomness properties:
m_w = <choose-initializer>; /* must not be zero, nor 0x464fffff */
m_z = <choose-initializer>; /* must not be zero, nor 0x9068ffff */
uint get_random()
{
m_z = 36969 * (m_z & 65535) + (m_z >> 16);
m_w = 18000 * (m_w & 65535) + (m_w >> 16);
return (m_z << 16) + m_w; /* 32-bit result */
}
If you don't want to use the clock as the source of your random seed, you could make it a function of the state of your machine (e.g. some hash of memory state).

coding with vectors using the Accelerate framework

I'm playing around with the Accelerate framework for the first time with the goal of implementing some vectorized code into an iOS application. I've never tried to do anything with respect to working with vectors in Objective C or C. Having some experience with MATLAB, I wonder if using Accelerate is indeed that much more of a pain. Suppose I'd want to calculate the following:
b = 4*(sin(a/2))^2 where a and b are vectors.
MATLAB code:
a = 1:4;
b = 4*(sin(a/2)).^2;
However, as I see it after some spitting through the documentation, things are quite different using Accelerate.
My C implementation:
float a[4] = {1,2,3,4}; //define a
int len = 4;
float div = 2; //define 2
float a2[len]; //define intermediate result 1
vDSP_vsdiv(a, 1, &div, a2, 1, len); //divide
float sinResult[len]; //define intermediate result 2
vvsinf(sinResult, a2, &len); //take sine
float sqResult[len]; //square the result
vDSP_vsq(sinResult, 1, sqResult, 1, len); //take square
float factor = 4; //multiply all this by four
float b[len]; //define answer vector
vDSP_vsmul(sqResult, 1, &factor, b, 1, len); //multiply
//unset all variables I didn't actually need
Honestly, I don't know what's worst here: keeping track of all intermediate steps, trying to memorize how the arguments are passed in vDSP with respect to VecLib (quite different), or that it takes so much time doing something quite trivial.
I really hope I am missing something here and that most steps can be merged or shortened. Any recommendations on coding resources, good coding habits (learned the hard way or from a book), etc. would be very welcome! How do you all deal with multiple lines of vector calculations?

I guess you could write it that way, but it seems awfully complicated to me. I like this better (intel-specific, but can easily be abstracted for other architectures):
#include <Accelerate/Accelerate.h>
#include <immintrin.h>
const __m128 a = {1,2,3,4};
const __m128 sina2 = vsinf(a*_mm_set1_ps(0.5));
const __m128 b = _mm_set1_ps(4)*sina2*sina2;
Also, just to be pedantic, what you're doing here is not linear algebra. Linear algebra involves only linear operations (no squaring, no transcendental operations like sin).
Edit: as you noted, the above won't quite work out of the box on iOS; the biggest issue is that there is no vsinf (vMathLib is not available in Accelerate on iOS). I don't have the SDK installed on my machine to test, but I believe that something like the following should work:
#include <Accelerate/Accelerate.h>
const vFloat a = {1, 2, 3, 4};
const vFloat a2 = a*(vFloat){0.5,0.5,0.5,0.5};
const int n = 4;
vFloat sina2;
vvsinf((float *)&sina2, (const float *)&a, &n);
const vFloat b = sina2*sina2*(vFloat){4,4,4,4};
Not quite as pretty as what is possible with vMathLib, but still fairly compact.
In general, a lot of basic arithmetic operations on vectors just work; there's no need to use calls to any library, which is why Accelerate doesn't go out of its way to supply those operations cleanly. Instead, Accelerate usually tries to provide operations that aren't immediately available by other means.

To answer my own question:
In iOS 6, vMathLib will be introduced. As Stephen clarified, vMathLib could already be used on OSX, but it was not available in iOS. Until now.
The functions that vMathLib provides will allow for easier vector calculations.

Using the boost random number generator with OpenMP

I would like to parallelize my boost random number generator code in C++ with OpenMP. I'd like to do it in way that is both efficient and thread safe. Can someone give me pointers on how this is done? I am currently enclosing what I have below; this is clearly not thread safe since the static variable in the sampleNormal function is likely to give
a race condition. The number of samples (nsamples) is much bigger than n.
#pragma omp parallel for private(i,j)
for (i = 0; i < nsamples; i++) {
for (j = 0; j < n; j++) {
randomMatrix[i + nsamples*j] = SampleNormal(0.0, 1.0);
}
}
double SampleNormal (double mean, double sigma)
{
// Create a Mersenne twister random number generator
static mt19937 rng(static_cast<unsigned> (std::time(0)));
// select Gaussian probability distribution
normal_distribution<double> norm_dist(mean, sigma);
// bind random number generator to distribution
variate_generator<mt19937&, normal_distribution<double> > normal_sampler(rng, norm_dist);
// sample from the distribution
return normal_sampler();
}

Do you just need something that's thread-safe or something that scales well? If you don't need very high performance in your PRNG, you can just wrap a lock around uses of the rng object. For higher performance, you need to find or write a parallel pseudorandom number generator -- http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf has a tutorial on them. One option would be to put your mt19937 objects in thread-local storage, making sure to seed different threads with different seeds; that makes reproducing the same results in different runs difficult, if that's important to you.

"find or write a parallel pseudorandom number generator" use TRNG "TINAS random number generator". Its a parallel random number generator library designed to be run on multicore clusters. Much better than Boost. There's an introduction here http://www.lindonslog.com/programming/parallel-random-number-generation-trng/

How to manually generate random numbers [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I want to generate random numbers manually. I know that every language have the rand or random function, but I'm curious to know how this is working.
Does anyone have code for that?

POSIX.1-2001 gives the following example of an implementation of rand() and srand(), possibly useful when one needs the same sequence on two different machines.
static unsigned long next = 1;
/* RAND_MAX assumed to be 32767 */
int myrand(void) {
next = next * 1103515245 + 12345;
return((unsigned)(next/65536) % 32768);
}
void mysrand(unsigned seed) {
next = seed;
}

Have a look at the following:
Random Number Generation
Linear Congruential Generator - a popular approach also used in Java
List of Random Number Generators
And here's another link which elaborates on the use of LCG in Java's Random class

static void Main()
{
DateTime currentTime = DateTime.Now;
int maxValue = 100;
int hour = currentTime.Hour;
int minute = currentTime.Minute;
int second = currentTime.Second;
int milisecond = currentTime.Millisecond;
int randNum = (((hour + 1) * (minute + 1) * (second + 1) * milisecond) % maxValue);
Console.WriteLine(randNum);
Console.ReadLine();
}
Above shows a very simple piece of code to generate random numbers. It is a console program written in C#. If you know any kind of basic programming this should be understandable and easy to convert to any other language desired.
The DateTime simply takes in a current date and time, most programming languages have a facility to do this.
The hour, minute, second and milisecond variables break the date time value it up into its component parts. We are only interested in these parts so can ignore day. Again, in most languages dates and times are usually presented as strings. In .Net we have facilities that allow us to parse this information easily. But in most other languages where times are presented as strings, its is not overly difficult to parse the string for the parts that you want and convert them to their numbers. These facilities are usually provided even in the oldest of languages.
The seed essentially gives us a starting number which always changes. Traditionally you would just multiply this number by a decimal value between 0 and 1 this cuts out that step.
The upperRange defines the maximum value. So the number generated will never be above this value. Also it will never be below 0. So no ngeatives. But if you want negatives you could just negate it manually. (by multiplying it by -1)
The actual variable randNumis what holds the random value you are interested in.
The trick is to get the remainder (the modulus) after dividing the seed by the upper range. The remainder will always be smaller than the divisor which in this case is 100. Simple maths tells you that you cant have a remainder greater than the divisor. So if you divide by 10 you cant have a remainder greater than 10. It is this simple law that gets us our random number between 0 and 100 in this case.
The console.writeline simply outputs it to the screen.
The console.readline simply pauses the program so you can see it.
This is a very simple piece of code to generate random numbers. If you ran this program at the exact same intervil every day (but you would have to do it at the same hour, minute, second and milisecond) for more than 1 day you would begin to generate the same set of numbers again and again each additional day. This is because it is tied to the time. That is the resolution of the generator. So if you know the code of this program, and the time it is run at, you can predict the number generated, but it wont be easy. That is why I used miliseconds. Use seconds or minutes only to see what I mean. So you could write a table showing when 1 goes in, 0 comes out, when 2 goes in 0 comes out and so on. You could then predict the output for every second, and the range of numbers generated. The more you increase the resolution (by increasing the numbers that change) the harder it is and the longer it takes to get a predictable pattern. This method is good enough for most peoples use.
That is the old school way of doing random number generation for basic games. It needed to be fast, and simple. It is. This also highlights exactly why, random numbers genaerators are not really random but psudo random.
I hope this is a reasonable answer to your question.

I assume you mean pseudo-random numbers. The simplest one I know (from writing videogames games back on old machines) worked like this:
seed=seed*5+1;
You do that every time random is called and then you use however many low bits you want. *5+1 has the nice property (IIRC) of hitting every possibility before repeating, no matter how many bits you are looking at.
The downside, of course, is its predictability. But that didn't matter in the games. We were grabbing random numbers like crazy for all sorts of things, and you'd never know what number was coming next.
Do a couple things like this in parallel, and combine the results. This is a linear congruential generator.

http://en.wikipedia.org/wiki/Random_number_generator
Describes the different types of random number generators and how they are created.

Aloha!
By manually do you mean "not using computer" or "write my own code"?
IF it is not using computer you can use things like dice, numbers in a bag and all those methods seen on telly when they select teams, winning Bingo series etc. Las Vegas is filled with these kinds of method used in processes (games) aimed at giving you bad odds and ROI. You can also get the great RAND book and turn to a randomly selected page:
http://www.amazon.com/Million-Random-Digits-Normal-Deviates/dp/0833030477
(Also, for some amusement, read the reviews)
For writing your own code you need to consider why not using the system provided RNG is not good enough. If you are using a modern OS it will have a RNG available for user programs that should be good enough for your application.
If you really need to implement your own there are a huge bunch of generators available. For non security usage you can look at LFSR chains, Congruential generators etc. Whatever the distribution you need (uniform, normal, exponential etc) you should be able to find algorithm descriptions and libraries with implementations.
For security usage you should look at things like Yarrow/Fortuna the NIST SP 800-89 specified PRNGs and RFC 4086 for good entropy sources needed to feed the PRNG. Or even better, use the one in the OS that should meet security RNG requirements.
Implementation of RNGs can be a fun exercise, but is very rarely needed. And don't invent your own algorithm unless it is for toy applications. Do NOT, repeat NOT invent RNGs for security applications (generating cryptographic keys for example), at least unless you do some seripus reading and investigation. You will thank me for it (I hope).

hopefuly im not redundant because i havent read all the links, but i believe you can get pretty close to true random generator. nowadays systems are often so complex that even the best geeks around need a lot of time to understand whats happening inside :) just open your mind and think if you can monitor some global system property, use it to seed to ... pick a network packet (not intended for you?) and compute "something" out of its content and use it to seed to ... etc. you can design the best for your needs with all those hints around ;)

The Mersenne twister has a very long period (2^19937-1).
Here's a very basic implementation in C++:
struct MT{
unsigned int *mt, k, g;
~MT(){ delete mt; }
MT(unsigned int seed) : mt(new unsigned int[624]), k(0), g(0){
for (int i=0; i<624; i++)
mt[i]=!i?seed:(1812433253U*(mt[i-1]^(mt[i-1]>>30))+i);
}
unsigned int operator()(){
unsigned int q=(mt[k]&0x80000000U)|(mt[(k+1)%624]&0x7fffffffU);
mt[k]=mt[(k+397)%624]^(q>>1)^((q&1)?0x9908b0dfU:0);
unsigned int y = mt[k];
y ^= (y >> 11);
y ^= (y << 7) & 0x9d2c5680U;
y ^= (y << 15) & 0xefc60000U;
y ^= (y >> 18);
k = (k+1)%624;
return y;
}
};

One good way to get random numbers is to monitor the ambient level of noise coming through your computer's microphone. If you can get a driver (or language that supports mic input) and convert this to a number, you're well on your way!
It has also been researched in how to get "true randomness" - since computers are nothing more than binary machines, they can't give us "true randomness". After a while, the sequence will begin to repeat itself. The quest for better random number generation is still going, but they say monitoring ambient noise levels in a room is one good way to prevent pattern forming in your random generation.
You can look up this wiki article for more information on the science behind random number generation.

If you are looking for a theoretical treatment on random numbers, probably you can have a look at Volume 2 of the The art of computer programming. It has a chapter dedicated to random numbers. See if it helps you out.

If you are wanting to manually, hard code, your own random generator I can't give you efficiency, however, I can give you reliability. I actually decided to write some code using time to test a computer's processing speed by counting in time and that turned into me writing my own random number generator using the counting algorithm for modulo (the count is random). Please, try it for yourselves and test on number distributions within a large test-set. By the way, this is written in python.
def count_in_time(n):
import time
count = 0
start_time = time.clock()
end_time = start_time + n
while start_time < end_time:
count += 1
start_time += (time.clock() - start_time)
return count
def generate_random(time_to_count, range_nums, rand_lst_size):
randoms = []
iterables = range(range_nums)
count = 0
for i in range(rand_lst_size):
count += count_in_time(time_to_count)
randoms.append(iterables[count%len(iterables)])
return randoms

This document is a very nice write up of pseudo-random number generation and has a number of routines included (in C). It also discusses the need for appropriate seeding of the random number generators (see rule 3). Particularly useful for this is the use of /dev/randon/ (if you are on a linux machine).
Note: the routines included in this document are alot simpler to code up than the Mersenne Twister. See also the WELLRNG generator, which is supposed to have better theoretical properties, as an alternative to the MT.

Read the rands book of random numbers (monte carlo book of random numbers) the numbers in it are randomly generated for you!!! My grandfather worked for rand.

Most RNGs(random number generators) will require a small bit of initialization. This is usually to perform a seeding operation and store the results of the seeded values for later use. Here is an example of a seeding method from a randomizer I wrote for a game engine:
/// <summary>
/// Initializes the number array from a seed provided by <paramref name="seed">seed</paramref>.
/// </summary>
/// <param name="seed">Unsigned integer value used to seed the number array.</param>
private void Initialize(uint seed)
{
this.randBuf[0] = seed;
for (uint i = 1; i < 100; i++)
{
this.randBuf[i] = (uint)(this.randBuf[i - 1] >> 1) + i;
}
}
This is called from the constructor of the randomizing class. Now the real random numbers can be rolled/calculated using the aforementioned seeded values. This is usually where the actual randomizing algorithm is applied. Here is another example:
/// <summary>
/// Refreshes the list of values in the random number array.
/// </summary>
private void Roll()
{
for (uint i = 0; i < 99; i++)
{
uint y = this.randBuf[i + 1] * 3794U;
this.randBuf[i] = (((y >> 10) + this.randBuf[i]) ^ this.randBuf[(i + 399) % 100]) + i;
if ((this.randBuf[i] % 2) == 1)
{
this.randBuf[i] = (this.randBuf[i + 1] << 21) ^ (this.randBuf[i + 1] * (this.randBuf[i + 1] & 30));
}
}
}
Now the rolled values are stored for later use in this example, but those numbers can also be calculated on the fly. The upside to precalculating is a slight performance increase. Depending on the algorithm used, the rolled values could be directly returned or go through some last minute calculations when requested by the code. Here is an example that takes from the prerolled values and spits out a very good looking pseudo random number:
/// <summary>
/// Retrieves a value from the random number array.
/// </summary>
/// <returns>A randomly generated unsigned integer</returns>
private uint Random()
{
if (this.index == 0)
{
this.Roll();
}
uint y = this.randBuf[this.index];
y = y ^ (y >> 11);
y = y ^ ((y << 7) + 3794);
y = y ^ ((y << 15) + 815);
y = y ^ (y >> 18);
this.index = (this.index + 1) % 100;
return y;
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio