I tried the following program, which uses cuRAND to generate random numbers. When the number of elements to generate (the variable n) is odd, like 9849 below, I get an error on the line with curandGenerateNormal. An even number of elements does not have this problem. What is the reason for that?
#include <curand.h>
#include <iostream>
#include <cstdlib>
using namespace std;
#define CHKcuda(x) do { \
cudaError_t y = (x); \
if (y != cudaSuccess) { \
cout << __LINE__ << ": " << y << endl; exit(1); \
} \
} while(0)
#define CHKcurand(x) do { \
curandStatus_t y = (x); \
if (y != CURAND_STATUS_SUCCESS) { \
cout << __LINE__ << ": " << y << endl; exit(1); \
} \
} while(0)
int main(int argc, char** argv) {
curandGenerator_t g_randgen;
float *ptr, *h_ptr;
int n = 9849; // default: an odd count, which triggers the error
if (argc > 1) {
n = atoi(argv[1]);
}
CHKcurand(curandCreateGenerator(&g_randgen, CURAND_RNG_PSEUDO_DEFAULT));
CHKcuda(cudaMalloc((void**)&ptr, n * sizeof(float)));
CHKcurand(curandGenerateNormal(g_randgen, ptr, n, 0, 0.1));
h_ptr = static_cast<float*>(malloc(sizeof(float) * n));
CHKcuda(cudaMemcpy(h_ptr, ptr, sizeof(float) * n, cudaMemcpyDeviceToHost));
CHKcuda(cudaDeviceSynchronize());
for (int i = 0; i < 5; i++) {
cout << h_ptr[i] << ", ";
}
cout << endl;
return 0;
}
EDIT:
I checked the return value of the generating function. The definition of the error code says the following:
CURAND_STATUS_LENGTH_NOT_MULTIPLE = 105, ///< Length requested is not a multple of dimension
However, the documentation only says that when generating quasirandom numbers, the number of elements must be a multiple of the dimension. So why does it affect pseudorandom number generation here? Or does the parameter I use to create the generator (CURAND_RNG_PSEUDO_DEFAULT) actually create a quasirandom number generator? Moreover, what is the exact value of the dimension, and where can I find it?
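In general, the normal generating functions (e.g. curandGenerateNormal, curandGenerateLogNormal, etc.) require the number of requested points to be a multiple of 2 for a pseudorandom RNG. For pseudorandom generators, cuRAND produces normally distributed results with a Box-Muller transform, which yields values in pairs, so n must be even.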
This is documented:
curandStatus_t CURANDAPI curandGenerateNormal ( curandGenerator_t generator, float* outputPtr, size_t n, float mean, float stddev )
Generate normally distributed doubles.
Parameters
generator - Generator to use
outputPtr - Pointer to device memory to store CUDA-generated results, or Pointer to host memory to store CPU-generated results
n - Number of floats to generate
mean - Mean of normal distribution
stddev - Standard deviation of normal distribution
Returns
•CURAND_STATUS_NOT_INITIALIZED if the generator was never created
•CURAND_STATUS_PREEXISTING_FAILURE if there was an existing error from a previous kernel launch
•CURAND_STATUS_LAUNCH_FAILURE if the kernel launch failed for any reason
•CURAND_STATUS_LENGTH_NOT_MULTIPLE if the number of output samples is not a multiple of the quasirandom dimension, or is not a multiple of two for pseudorandom generators
•CURAND_STATUS_SUCCESS if the results were generated successfully
curandGenerateUniform, for example, does not have this restriction.
I'm trying to create a list which contains 10 unique random numbers between 1 and 20 by using a recursive function. Here is the code.
Compiler: GNU g++ 10.2.0 on Windows
Compiler flags: -DDEBUG=9 -ansi -pedantic -Wall -std=c++11
#include <iostream>
#include <vector>
#include <algorithm>
#include <time.h>
using namespace std;
vector<int> random (int size, int range, int randnum, vector<int> randlist ) {
if (size < 1) {
cout << "returning...(size=" << size << ")" << endl;
return randlist;
}
else {
if (any_of(randlist.begin(), randlist.end(),[randnum](int elt) {return randnum == elt;})){
cout << "repeating number: " << randnum << endl;
random(size, range, rand() % range + 1, randlist);
return randlist;
}
else {
cout << "size " << size << " randnum " << randnum << endl;
randlist.push_back(randnum);
random(size-1, range, rand() % range + 1, randlist);
return randlist; }
}
}
int main (int argc, char *argv[]) {
srand (time(NULL));
vector<int> dummy{};
vector<int> uniqrandnums = random(10, 20, (rand() % 20) + 1, dummy );
cout << "here is my unique random numbers list: " ;
for_each(uniqrandnums.begin(),uniqrandnums.end(), [](int n){cout << n << ' ';});
}
To keep track of the unique random numbers, I've added 2 cout lines inside the recursive function random. The recursive function seems to operate correctly, but it doesn't return the resulting vector<int> list randlist correctly; it seems to return a list with just the first random number it found.
Note: Reckoning that the function would finally return from here:
if (size < 1) {
cout << "returning...(size=" << size << ")" << endl;
return randlist;
}
Initially I hadn't added the last 2 return randlist; lines inside the recursive function, but without them I got the compilation warning control reaches end of non-void function [-Wreturn-type]. That's why I added those 2 return statements, but they only made the warnings go away; they didn't make the function work correctly.
Question: How to arrange the code so the recursive function random returns the full list in a correct manner?
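The issue is that you are discarding the result of the recursive calls to random(). Because randlist is passed by value, each call works on its own copy, so the numbers pushed in deeper calls never reach the caller's vector. In the two places where you call: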
random(..., randlist);
return randlist;
Replace that with:
return random(..., randlist);
If I generate random numbers with the following code:
#include <iostream>
#include <random>
int main()
{
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 999);
for (int n=0; n<1000; ++n)
std::cout << dis(gen) << ' ';
std::cout << '\n';
}
is it possible to get the previously generated values in the reverse order (without saving them into an array, etc...) after the loop is finished, and do something like this:
for (int n=0; n<1000; ++n)
std::cout << GetPrev(dis, gen) << ' ';
std::cout << '\n';
?
If you seed the pseudo-random engine with the same value, it will generate the same sequence of bits, which in turn makes the distribution generate the same numbers. So you need to store the seed passed to the constructor of mt19937. Note that this replays the values in their original order; the standard engines have no operation for stepping backwards.
I am using rand and srand from cstdlib, with g++ as the compiler. I was playing around trying to generate some pseudo-random numbers and was getting unexpectedly biased results. I was curious, so I wrote a simple function. The expected behavior is that a random number between 1 and 10 is generated and printed to the screen 100 times, with an average of about 5.5. However, when I run this function it generates a single random number between 1 and 10 and prints it 100 times, with the average equal to that number.
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
float bs(){
float random;
srand(time(0));
random = rand() % 10 + 1;
return random;
}
int main(){
float average;
float random;
for (int i = 1; i < 101; ++i)
{
random += bs();
cout << random << endl;
}
average = random/100;
cout << average << endl;
return 0;
}
If the initial return value of bs() is 7, it stays 7 for the duration of the loop, each time bs() is called. The output is 7 added to itself 100 times, and the average is equal to (gasp) 7. What is going on here?
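The seed should only be applied once. time(0) has a resolution of one second, so every call to bs() within the same second reseeds the generator with the same value, and the first rand() after each reseed returns the same number. Move the
srand(time(0));
to main before the loop.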
I have a 2-dimensional matrix in which each column corresponds to one independent signal, and I want to perform a 1D FFT on each column (one transform per column). In MATLAB, applying fft to a 2D matrix does the trick, but I am porting my code to C++ with FFTW and wonder if there is a way to do the same. I tried the following code, setting the column size to 1 and the row size to 4 (the total number of rows), but it does not help.
#include <iostream>
#include <complex>
#include "fftw3.h"
using namespace std;
int main(int argc, char** argv)
{
complex<double> data[4][2];
data[0][0] = complex<double>(1,1);
data[1][0] = complex<double>(2,1);
data[2][0] = complex<double>(3,1);
data[3][0] = complex<double>(4,1);
data[0][1] = complex<double>(1,1);
data[1][1] = complex<double>(1,2);
data[2][1] = complex<double>(1,3);
data[3][1] = complex<double>(1,4);
cout << "original data ..." << endl;
cout << data[0][0] << '\t' << data[0][1] << endl;
cout << data[1][0] << '\t' << data[1][1] << endl;
cout << data[2][0] << '\t' << data[2][1] << endl;
cout << data[3][0] << '\t' << data[3][1] << endl;
cout << endl << endl;
fftw_plan plan=fftw_plan_dft_2d(4, 1,(fftw_complex*)&data[0][0], (fftw_complex*)&data[0][0], FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(plan);
cout << "after fftw ..." << endl;
cout << data[0][0] << '\t' << data[0][1] << endl;
cout << data[1][0] << '\t' << data[1][1] << endl;
cout << data[2][0] << '\t' << data[2][1] << endl;
cout << data[3][0] << '\t' << data[3][1] << endl;
return 0;
}
The above code takes the first and second rows, reshapes them into a 2x2 matrix, and then performs a 2D FFT.
So far, the only way that comes to mind is the following. Say I have an NxM matrix (N rows, M columns): I create M FFTW plans for M 1D FFTs and execute them serially to get the result. But in practical applications the matrix is very big and M is very large, so this is very inefficient. Any better ideas? Thanks.
For those stumbling across this nowadays, the FFTW devs have implemented routines for this operation, which is faster than looping through each column and taking a separate transform. You certainly don't want to take a 2D transform (as is shown in the question), which is mathematically different than row-wise 1D transforms.
The key to your question is fftw_plan_many_dft. Here is a link to the full documentation.
Here is an example (modified from the above link) that illustrates what you're looking for.
#include "fftw3.h"
int main() {
fftw_complex *A; // array of data
A = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*10*3);
// ...
/* Transform each column of a 2d array with 10 rows and 3 columns */
int rank = 1; /* not 2: we are computing 1d transforms */
int n[] = {10}; /* 1d transforms of length 10 */
int howmany = 3;
int idist = 1;
int odist = 1;
/* distance between two elements in the same column */
int istride = 3;
int ostride = 3;
int *inembed = n, *onembed = n;
/* forward, in-place, 1D transform of each column */
fftw_plan p;
p = fftw_plan_many_dft(rank, n, howmany, A, inembed, istride, idist, A, onembed, ostride, odist, FFTW_FORWARD, FFTW_ESTIMATE);
// ...
/* run transform */
fftw_execute_dft(p, A, A);
// ...
/* we don't want memory leaks */
fftw_destroy_plan(p);
fftw_free(A);
}
I am using the boost::multiprecision library for decimal float types, and wish to compare two floats to the specified precision.
However, cpp_dec_float seems to compare the number not to the specified precision, but also includes the guard digits:
#include <iostream>
#include <iomanip> // for std::setprecision
#include <boost/multiprecision/cpp_dec_float.hpp>
//#include <boost/math/special_functions.hpp>
typedef boost::multiprecision::number<boost::multiprecision::cpp_dec_float<50> > flp_type;
int main(int argc, char* argv[])
{
// 50 decimal digits
flp_type sqrt2("1.4142135623730950488016887242096980785696718753769");
// Contains calculated guard digits
flp_type result(boost::multiprecision::sqrt(flp_type("2")));
// The 50 digits of precision actually compare equal
std::cout << std::setprecision(50) << sqrt2 << std::endl;
std::cout << std::setprecision(50) << result << std::endl;
// I want this to compare to the specified precision of the type, not the guard digits
std::cout << (result==sqrt2) << std::endl;
return 0;
}
Output:
1.4142135623730950488016887242096980785696718753769
1.4142135623730950488016887242096980785696718753769
0
Expected:
1.4142135623730950488016887242096980785696718753769
1.4142135623730950488016887242096980785696718753769
1
I have tried to "truncate" with precision(), but to no avail.
Is there a way to compare the two numbers without resorting to epsilon comparisons?
If you strip the guard bits, you effectively cripple the fidelity of the type as intended.
A surefire way would be to use (de)serialization, really.
So I suggest
// Either
std::cout << std::numeric_limits<flp_type>::epsilon() << "\n";
std::cout << (abs(result-sqrt2) < std::numeric_limits<flp_type>::epsilon()) << std::endl;
// Or
result = flp_type { result.str(49, std::ios::fixed) };
std::cout << (result==sqrt2) << std::endl;
Note that the epsilon is 1e-49 there
Prints
1.4142135623730950488016887242096980785696718753769
1.4142135623730950488016887242096980785696718753769
1e-49
1
1
Obviously the epsilon()-based comparison would appear to be the more efficient:
bool is_equal = abs(result-sqrt2) < std::pow(10, -std::numeric_limits< flp_type >::digits10 );