How to Save State of C++0X Random Number Generator

I'm playing around with the new C++0X random library and based on this question:
What is the standard way to get the state of a C++0x random number generator?
it seems that if you don't know the seed for the current state of a random generator, the only way to save its state is by storing the generator in a stream. To do this I wrote the following:
#include <iostream>
#include <sstream>
#include <random>
int main(int /*argc*/, char** /*argv*/)
{
std::mt19937 engine1;
unsigned int var = engine1(); // Just to get engine1 out of its initial state
std::stringstream input;
input << engine1;
std::mt19937 engine2;
input >> engine2;
std::cout<<"Engine comparison: "<<(engine1 == engine2)<<std::endl;
std::cout<<"Engine 1 random number "<<engine1()<<std::endl;
std::cout<<"Engine 2 random number "<<engine2()<<std::endl;
}
This outputs
Engine comparison: 1
Engine 1 random number 581869302
Engine 2 random number 4178893912
I have a few questions:
Why are the next numbers from engine1 and engine2 different?
Why are the two engines comparing equal even though their next numbers are different?
What am I doing wrong in my example and what is the correct way to save the state of a random engine to get repeatability in later runs (assuming you don't know the seed to set the desired state)?
Thank you.

This looks like a bug to me. I ran your code on libc++ and the output is:
Engine comparison: 1
Engine 1 random number 581869302
Engine 2 random number 581869302


How to change a boost::multiprecision::cpp_int from big endian to little endian

I have a boost::multiprecision::cpp_int in big endian and have to change it to little endian. How can I do that? I tried with boost::endian::conversion but that did not work.
boost::multiprecision::cpp_int bigEndianInt("0xe35fa931a0000");
boost::multiprecision::cpp_int littleEndianInt;
littleEndianInt = boost::endian::endian_reverse(bigEndianInt);
The memory layout of Boost multi-precision types is an implementation detail, so you cannot assume much about it anyway (they're not supposed to be bitwise serializable).
Just read a random section of the docs:
MinBits
Determines the number of Bits to store directly within the object before resorting to dynamic memory allocation. When zero, this field is determined automatically based on how many bits can be stored in union with the dynamic storage header: setting a larger value may improve performance as larger integer values will be stored internally before memory allocation is required.
It's not immediately clear that you have any chance at some level of "normal int behaviour" in memory layout. The only exception would be when MinBits==MaxBits.
Indeed, we can static_assert that the size of cpp_int with such backend configs match the corresponding byte-sizes.
It turns out that there's even a promising tag in the backend base-class to indicate "triviality" (this is truly promising): trivial_tag, so let's use it:
#include <boost/multiprecision/cpp_int.hpp>
namespace mp = boost::multiprecision;
template <int bits> using simple_be =
mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>;
template <int bits> using my_int =
mp::number<simple_be<bits>, mp::et_off>;
using my_int8_t = my_int<8>;
using my_int16_t = my_int<16>;
using my_int32_t = my_int<32>;
using my_int64_t = my_int<64>;
using my_int128_t = my_int<128>;
using my_int192_t = my_int<192>;
using my_int256_t = my_int<256>;
template <typename Num>
constexpr bool is_trivial_v = Num::backend_type::trivial_tag::value;
int main() {
static_assert(sizeof(my_int8_t) == 1);
static_assert(sizeof(my_int16_t) == 2);
static_assert(sizeof(my_int32_t) == 4);
static_assert(sizeof(my_int64_t) == 8);
static_assert(sizeof(my_int128_t) == 16);
static_assert(is_trivial_v<my_int8_t>);
static_assert(is_trivial_v<my_int16_t>);
static_assert(is_trivial_v<my_int32_t>);
static_assert(is_trivial_v<my_int64_t>);
static_assert(is_trivial_v<my_int128_t>);
// however it doesn't scale
static_assert(sizeof(my_int192_t) != 24);
static_assert(sizeof(my_int256_t) != 32);
static_assert(not is_trivial_v<my_int192_t>);
static_assert(not is_trivial_v<my_int256_t>);
}
Concluding: you can have a trivial int representation up to a certain point, after which you get the allocator-based dynamic-limb implementation no matter what.
Note that using unsigned_packed instead of the unsigned_magnitude representation never leads to a trivial backend implementation.
Note that triviality might depend on compiler/platform choices (e.g. it's likely that cpp_128_t uses some builtin compiler/standard library support on GCC).
Given this, you MIGHT be able to pull off what you wanted to do with hacks IF your backend configuration supports triviality. Sadly, I think it requires you to manually overload endian_reverse for the 128-bit case, because the GCC builtins do not include __builtin_bswap128, nor does Boost Endian define one.
I'd suggest working off the information here How to make GCC generate bswap instruction for big endian store without builtins?
Final Demo (not complete)
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/endian/buffers.hpp>
#include <iostream>
namespace mp = boost::multiprecision;
namespace be = boost::endian;
template <int bits> void check() {
using T = mp::number<mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>, mp::et_off>;
static_assert(sizeof(T) == bits/8);
static_assert(T::backend_type::trivial_tag::value);
be::endian_buffer<be::order::big, T, bits, be::align::no> buf;
buf = T("0x0102030405060708090a0b0c0d0e0f00");
std::cout << std::hex << buf.value() << "\n";
}
int main() {
check<128>();
}
(Changing be::order::big to be::order::native obviously makes it compile. The other way to complete it would be to have an ADL accessible overload for endian_reverse for your int type.)
This is both trivial and, in the general case, unanswerable. Let me explain:
For a general N-bit integer, where N is a large number, there is unlikely to be any well defined byte order, indeed even for 64 and 128 bit integers there are more than 2 possible orders in use: https://en.wikipedia.org/wiki/Endianness#Middle-endian.
On any platform, with any native endianness you can always extract the bytes of a cpp_int, the first example here: https://www.boost.org/doc/libs/1_73_0/libs/multiprecision/doc/html/boost_multiprecision/tut/import_export.html#boost_multiprecision.tut.import_export.examples shows you how. When exporting bytes like this, they are always most significant byte first, so you can subsequently rearrange them how you wish. You should not however, rearrange them and load them back into a cpp_int as the class won't know what to do with the result!
If you know that the value is small enough to fit into a native integer type, then you can simply cast to the native integer and use a system API on the result. As in endian_reverse(static_cast<int64_t>(my_cpp_int)). Again, don't assign the result back into a cpp_int as it requires native byte order.
If you wish to check whether a value is small enough to fit in an N-bit integer for the approach above, you can use the msb function, which returns the index of the most significant bit in the cpp_int, add one to that to obtain the number of bits used, and filter out the zero case and the code looks like:
unsigned bits_used = my_cpp_int.is_zero() ? 0 : msb(my_cpp_int) + 1;
Note that all of the above use completely portable code - no hacking of the underlying implementation is required.

Beginner at programming. C++ colon on line 9 included in this example code from textbook. I understand up to line 8 but what does the colon do?

Colon on line 9 is throwing me off. I'm not sure what its purpose is.
#include <iostream>
using namespace std;
const int TOTALYEARS = 100;
int main()
{
int ageFrequency[TOTALYEARS]; // reserves memory for 100 ints
:
return 0;
}
Just checked the "textbook" contents (which are some review notes actually). That's not a colon, that's the academic way of saying "other miscellaneous non-important code goes here". Something like:
int ageFrequency[TOTALYEARS]; // reserves memory for 100 ints
.
.
.
return 0;
Here's a screenshot for confirmation:
You're expected to ignore those symbols.
Not sure if this was insinuating cheating or copying from review notes. The example was, in fact, from the guided reading that is assigned before the knowledge is applied to various exercises that follow.

Explanation of a macro in kernel

In kernel 2.4.37, there is a macro in page.h like this:
struct page *mem_map;
struct page *page;
#define VALID_PAGE(page) ((page - mem_map) < max_mapnr)
I know mem_map is an array of struct page, page is a struct, so what does page - mem_map mean?
It computes the index of the corresponding page in the mem_map array, that is, which entry of mem_map the page is. Call it the pfn, or page frame number, in Linux terms (Linux assumes that the mem_map array starts at the 0th pfn and runs up to the max pfn). Adding PHYS_PFN_OFFSET to the pfn gives you the actual physical page frame in your memory map.
__page_to_pfn
max_mapnr is the limit on the number of mapped pages, i.e. the maximum page frame number.
set_max_mapnr
I hope this clears up your doubts.
Hmm, I'm not sure, but maybe it is a comparison of pointer addresses?
I mean, if one of them is an array and it is not dereferenced, the operations are applied to addresses, I suppose.
Edit (clarification):
So, in this case I think the operation checks whether "page" is within the address range of the "mem_map" array.
We can represent it like this: Graphic representation
Purpose of the macro:
"mem_map" is the address of the beginning of the array; suppose it is 0x0...5.
The size of the "mem_map" array (max_mapnr) is 5.
We want to know whether the address of "page" is within the range of the "mem_map" array.
True case:
Suppose "page" is in "mem_map", at the 2nd element. We can suppose its address is something like 0x0...7.
Now we do the operation: ((0x0...7 - 0x0...5) < 5).
We obtain 2, which is less than 5. So the address of "page" is in mem_map.
False case:
Otherwise, if "page" is beyond the array (0x0...D), the result will be 8. 8 is not less than "max_mapnr" (5), so this page is not in the "mem_map" array.
And if the address is below the array address (0x0...2):
The result of (0x0...2 - 0x0...5) will be a negative value, and in that case the comparison with "max_mapnr" (unsigned long) is not carried out as a signed comparison.
I found this topic, which explains why better than I can:
Signed/unsigned comparisons
To summarize:
In C, when an operation mixes a negative (signed) value with an unsigned value, the signed operand is converted to unsigned automatically. In other words, (-3 - U_nbr) is the same as (((unsigned)-3) - U_nbr). Also, if you compile with gcc warning flags such as -Wall and do not cast the value yourself, you may get a compiler warning.
For testing I tried to run this code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void)
{
unsigned long test = 0x0000F;
unsigned long test2 = 0x0000A;
unsigned long weird = 0x00002;
char* pt1 = "This is first test string !";
char* pt2 = "This is a test string";
printf("Try to make operation on two unsigned long result must be 5: %lu\n", (test - test2));
printf("Try to make operation between unsigned long result must be negative, so he will be cast: %lu\n", (weird - test2));
printf("Let's try the same with real adresses: %lu\n", (pt2 - pt1));
printf("And this is what happens with negative value: %lu\n", (pt1 - pt2));
printf("For be sure, this is the lenght of string 1. %lu\n", strlen(pt1));
return (0);
}
The output is:
Try to make operation on two unsigned long result must be 5: 5
Try to make operation between unsigned long result must be negative, so he will be cast: 18446744073709551608
Let's try the same with real adresses: 28
And this is what happens with negative value: 18446744073709551588
For be sure, this is the lenght of string 1. 27
So, as we can see, the negative value is converted to unsigned long and wraps around to a very large value. If you compare that value with max_mapnr, you will see that it is "out of range".
Thanks to AnshuMan Gupta for the "weird case".

Best way to maintain an RNG state in multiple devices in openCL

So I'm trying to make use of this custom RNG library for openCL:
http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html
The library defines a state struct:
//! Represents the state of a particular generator
typedef struct{ uint x; uint c; } mwc64x_state_t;
And in order to generate a random uint, you pass in the state into the following function:
uint MWC64X_NextUint(mwc64x_state_t *s)
which updates the state, so that when you pass it into the function again, the next "random" number in the sequence will be generated.
For the project I am creating I need to be able to generate random numbers not just in different work groups/items but also across multiple devices simultaneously and I'm having trouble figuring out the best way to design this. Like should I create 1 mwc64x_state_t object per device/commandqueue and pass that state in as a global variable? Or is it possible to create 1 state object for all devices at once?
Or do I not even pass it in as a global variable and declare a new state locally within each kernel function?
The library also comes with this function:
void MWC64X_SeedStreams(mwc64x_state_t *s, ulong baseOffset, ulong perStreamOffset)
which is supposed to split up the RNG into multiple "streams", but including this in my kernel makes it incredibly slow. For instance, if I do something very simple like the following:
__kernel void myKernel()
{
mwc64x_state_t rng;
MWC64X_SeedStreams(&rng, 0, 10000);
}
Then the kernel call becomes around 40x slower.
The library does come with some source code that serves as example usages but the example code is kind of limited and doesn't seem to be that helpful.
So if anyone is familiar with RNGs in openCL or if you've used this particular library before I'd very much appreciate your advice.
The MWC64X_SeedStreams function is indeed relatively slow, at least in comparison
to the MWC64X_NextUint call, but this is true of most parallel RNGs that try
to split a large global stream into many sub-streams that can be used in
parallel. The assumption is that you'll be calling NextUint many times
within the kernel (e.g. a hundred or more), but SeedStreams only once, at the top.
This is an annotated version of the EstimatePi example that comes
with the library (mwc64x/test/estimate_pi.cpp and mwc64x/test/test_mwc64x.cl).
__kernel void EstimatePi(ulong n, ulong baseOffset, __global ulong *acc)
{
// One RNG state per work-item
mwc64x_state_t rng;
// This calculates the number of samples that each work-item uses
ulong samplesPerStream=n/get_global_size(0);
// And then skip each work-item to its part of the stream, which
// will run from stream offset:
// baseOffset+2*samplesPerStream*get_global_id(0)
// up to (but not including):
// baseOffset+2*samplesPerStream*(get_global_id(0)+1)
//
MWC64X_SeedStreams(&rng, baseOffset, 2*samplesPerStream);
// Now use the numbers
uint count=0;
for(uint i=0;i<samplesPerStream;i++){
ulong x=MWC64X_NextUint(&rng);
ulong y=MWC64X_NextUint(&rng);
ulong x2=x*x;
ulong y2=y*y;
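// If the 64-bit sum did not wrap around, then x^2+y^2 < 2^64,
// i.e. the point lies inside the quarter circle of radius 2^32
if(x2+y2 >= x2)
count++;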
}
acc[get_global_id(0)] = count;
}
So the intent is that n should be large and grow as the number
of work items grow, so that samplesPerStream remains around
a hundred or more.
If you want multiple kernels on multiple devices, then you
need to add another level of hierarchy to the stream splitting,
so for example if you have:
K : Number of devices (possibly on parallel machines)
W : Number of work-items per device
C : Number of calls to NextUint per work-item
You end up with N=KWC total calls to NextUint across all
work-items. If your devices are identified as k=0..(K-1),
then within each kernel you would do:
MWC64X_SeedStreams(&rng, W*C*k, C);
Then the indices within the stream would be:
[0 .. N ) : Parts of stream used across all devices
[k*(W*C) .. (k+1)*(W*C) ) : Used within device k
[k*(W*C)+(i*C) .. (k*W*C)+(i+1)*C ) : Used by work-item i in device k.
It is fine if each kernel uses less than C samples, you can
over-estimate if necessary.
(I'm the author of the library).

When I used calloc to dynamically allocate a 1d array, am I supposed to get the same value or different value?

I saw my friend's program.
He used calloc to allocate a 1D array and filled it with values from a random number generator, but every time he compiles and runs the program he gets the same values in the array.
Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
srand(time(NULL));
int *n, s=10;
one=(int*) calloc(s,sizeof(int));
for(m=0;m<s;m++)
{
o[m] =(rand()%20);
printf("%d\n",o[m]);
}
free(one);
The outputs:
First run:
Second run:
You are assigning to o[m] but printing out one[m].
You also need to call srand at the start of your program to initialize the random sequence.
