boost::multiprecision random number with fixed seed and variable precision - random

When using a fixed seed inside an RNG, results are not reproducible when the precision is varied. Namely, if one changes the template argument of cpp_dec_float<xxx> and runs the following code, a different output is seen for each change in precision.
#include <iomanip> // std::setprecision
#include <iostream>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <random>
#include <boost/random.hpp>
typedef boost::multiprecision::cpp_dec_float<350> mp_backend; // <--- change me
typedef boost::multiprecision::number<mp_backend, boost::multiprecision::et_off> big_float;
typedef boost::random::independent_bits_engine<boost::mt19937, std::numeric_limits<big_float>::digits, boost::multiprecision::cpp_int> generator;
int main()
{
std::cout << std::setprecision(std::numeric_limits<big_float>::digits10) << std::showpoint;
auto ur = boost::random::uniform_real_distribution<big_float>(big_float(0), big_float(1));
generator gen = generator(42); // fixed seed
std::cout << ur(gen) << std::endl;
return 0;
}
Seems reasonable I guess. But how do I make it so that for n digits of precision, a fixed seed will produce a number x which is equivalent to y within n digits where y is defined for n+1 digits? e.g.
x = 0.213099234 // n = 9
y = 0.2130992347 // n = 10
...

To add to @user14717's excellent answer: to get reproducible results, you would have to:
Use a wide random-bit generator (wider than the output mantissa + 1). Let's say you need multiprecision floats with no more than a 128-bit mantissa; then use a bit generator that produces 128-bit output. Internally, it could be a standard RNG such as the Mersenne Twister, chaining words together to reach the desired width.
Write your own uniform_real_distribution, which converts these 128 bits to a mantissa.
At the end, discard the rest of the bits in the 128-bit pack.
This approach guarantees the same real output at every precision; the only difference is how many digits are kept.
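The steps above can be sketched with standard C++ only. This is a minimal illustration under stated assumptions, not the Boost.Multiprecision solution itself: it uses double and std::mt19937_64 in place of a multiprecision type and a 128-bit generator, the helper name uniform_prefix is made up for this example, and the agreement it gives is in leading binary bits rather than decimal digits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper: draw one fixed-width 64-bit word per call and keep
// only its top `bits` bits as the fraction of a value in [0, 1).
// The unused low bits are discarded, so the engine advances by the same
// amount whatever precision is requested.
double uniform_prefix(std::mt19937_64& gen, int bits)
{
    assert(bits >= 1 && bits <= 53);              // must fit a double mantissa
    const std::uint64_t word = gen();             // always exactly one draw
    const std::uint64_t top = word >> (64 - bits);
    return std::ldexp(static_cast<double>(top), -bits);  // top / 2^bits
}
```

Two engines seeded identically and queried at 20 and 40 bits then yield values whose first 20 binary digits coincide, call after call, because both consume the same words.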

The way these distributions work is to shift random bits into the mantissa of the floating point number. If you change the precision, you consume more of these bits on every call, so you get different random sequences.
I see no way for you to achieve your goal without writing your own uniform_real_distribution. You probably need two integer RNGs, one which fills the most significant bits, and another which fills the least significant bits.
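To make the "more bits consumed per call" point concrete, here is a small sketch that counts how many 32-bit words a standard facility draws for one value. It uses std::generate_canonical with float and double rather than the Boost types from the question, and counting_engine and words_per_draw are names invented for this illustration.

```cpp
#include <cstddef>
#include <limits>
#include <random>

// Wrapper (hypothetical name) that counts how many 32-bit words are drawn
// from the underlying Mersenne Twister.
struct counting_engine {
    using result_type = std::mt19937::result_type;
    std::mt19937 inner{42};                        // fixed seed
    std::size_t calls = 0;
    static constexpr result_type min() { return std::mt19937::min(); }
    static constexpr result_type max() { return std::mt19937::max(); }
    result_type operator()() { ++calls; return inner(); }
};

// Number of engine invocations needed to fill the mantissa of Real once.
template <class Real>
std::size_t words_per_draw()
{
    counting_engine eng;
    (void)std::generate_canonical<Real, std::numeric_limits<Real>::digits>(eng);
    return eng.calls;
}
```

A 24-bit float mantissa fits in one 32-bit word, while a 53-bit double mantissa needs two, so after the first call the two sequences are already reading different parts of the stream.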

Related

Why std::chrono::time_point is not large enough to store struct timespec?

I'm trying the recent std::chrono api and I found that on 64 bit Linux architecture and gcc compiler the time_point and duration classes are not able to handle the maximum time range of the operating system at the maximum resolution (nanoseconds). In fact it seems the storage for these classes is a 64bit integral type, compared to timespec and timeval which are internally using two 64 bit integers, one for seconds and one for nanoseconds:
#include <iostream>
#include <chrono>
#include <typeinfo>
#include <time.h>
#include <sys/time.h> // struct timeval
using namespace std;
using namespace std::chrono;
int main()
{
cout << sizeof(time_point<nanoseconds>) << endl; // 8
cout << sizeof(time_point<nanoseconds>::duration) << endl; // 8
cout << sizeof(time_point<nanoseconds>::duration::rep) << endl; // 8
cout << typeid(time_point<nanoseconds>::duration::rep).name() << endl; // l
cout << sizeof(struct timespec) << endl; // 16
cout << sizeof(struct timeval) << endl; // 16
return 0;
}
On 64 bit Windows (MSVC2017) the situation is very similar: the storage type is also a 64 bit integer. This is not a problem when dealing with steady (aka monotonic) clocks, but storage limitations make the different API implementations unsuitable for storing bigger dates and wider time spans, creating the ground for Y2K-like bugs. Is the problem acknowledged? Are there plans for better implementations or API improvements?
This was done so that you get maximum flexibility along with compact size. If you need ultra-fine precision, you usually don't need a very large range. And if you need a very large range, you usually don't need very high precision.
For example, if you're trafficking in nanoseconds, do you regularly need to think about more than +/- 292 years? And if you need to think about a range greater than that, well microseconds gives you +/- 292 thousand years.
The macOS system_clock actually returns microseconds, not nanoseconds. So that clock can run for 292 thousand years from 1970 until it overflows.
The Windows system_clock has a precision of 100-ns units, and so has a range of +/- 29.2 thousand years.
If a couple hundred thousand years is still not enough, try out milliseconds. Now you're up to a range of +/- 292 million years.
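The ranges quoted above follow from dividing the largest tick count by the number of ticks per year. A minimal sketch of that arithmetic, assuming an average Gregorian year of 365.2425 days (range_in_years is a name made up for this example):

```cpp
#include <chrono>
#include <cstdint>
#include <limits>
#include <ratio>

// Approximate one-sided range, in average Gregorian years, of a duration
// type that stores its tick count in a signed integer.
template <class Duration>
double range_in_years()
{
    using period = typename Duration::period;
    using rep = typename Duration::rep;
    const double max_ticks = static_cast<double>(std::numeric_limits<rep>::max());
    const double seconds = max_ticks * period::num / period::den;
    return seconds / (365.2425 * 86400.0);         // seconds per average year
}
```

With a 64-bit representation, range_in_years<std::chrono::nanoseconds>() comes out at roughly 292, and each coarser unit scales that figure by the corresponding factor.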
Finally, if you just have to have nanosecond precision out for more than a couple hundred years, <chrono> allows you to customize the storage too:
using dnano = duration<double, nano>;
This gives you nanoseconds stored as a double. If your platform supports a 128 bit integral type, you can use that too:
using big_nano = duration<__int128_t, nano>;
Heck, if you write overloaded operators for timespec, you can even use that for the storage (I don't recommend it though).
You can also achieve precisions finer than nanoseconds, but you'll sacrifice range in doing so. For example:
using picoseconds = duration<int64_t, pico>;
This has a range of only +/- .292 years (a few months). So you do have to be careful with that. Great for timing things though if you have a source clock that gives you sub-nanosecond precision.
Check out this video for more information on <chrono>.
For creating, manipulating and storing dates with a range greater than the validity of the current Gregorian calendar, I've created this open-source date library which extends the <chrono> library with calendrical services. This library stores the year in a signed 16 bit integer, and so has a range of +/- 32K years. It can be used like this:
#include "date.h"
int
main()
{
using namespace std::chrono;
using namespace date;
system_clock::time_point now = sys_days{may/30/2017} + 19h + 40min + 10s;
}
Update
In the comments below the question is asked how to "normalize" duration<int32_t, nano> into seconds and nanoseconds (and then add the seconds to a time_point).
First, I would be wary of stuffing nanoseconds into 32 bits. The range is just a little over +/- 2 seconds. But here's how I separate out units like this:
using ns = duration<int32_t, nano>;
auto n = ns::max();
auto s = duration_cast<seconds>(n);
n -= s;
Note that this only works if n is positive. To correctly handle negative n, the best thing to do is:
auto n = ns::max();
auto s = floor<seconds>(n);
n -= s;
std::chrono::floor is introduced with C++17. If you want it earlier, you can grab it from here or here.
I'm partial to the subtraction operation above, as I just find it more readable. But this also works (if n is not negative):
auto s = duration_cast<seconds>(n);
n %= 1s;
The 1s is introduced in C++14. In C++11, you will have to use seconds{1} instead.
Once you have seconds (s), you can add that to your time_point.
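Putting the floor-based split into a reusable function might look like the sketch below. It assumes C++17 for std::chrono::floor and takes std::chrono::nanoseconds instead of the 32-bit ns alias used above; split_seconds is a name chosen for this example.

```cpp
#include <chrono>
#include <utility>

// Split a (possibly negative) nanosecond count into whole seconds plus a
// non-negative sub-second remainder.
std::pair<std::chrono::seconds, std::chrono::nanoseconds>
split_seconds(std::chrono::nanoseconds n)
{
    // floor rounds toward negative infinity, unlike duration_cast,
    // which truncates toward zero
    const auto s = std::chrono::floor<std::chrono::seconds>(n);
    return {s, n - s};                             // remainder is in [0 s, 1 s)
}
```

For an input of -1 ns this yields -1 s and 999999999 ns, so the seconds part can be added to a time_point and the remainder kept separately.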
std::chrono::nanoseconds is a type alias for std::chrono::duration<some_t, std::nano>, where some_t is a signed integer type with at least 64 bits of storage. This still allows for at least 292 years of range with nanosecond precision.
Notably the only integral types with such characteristics mentioned by the standard are the int(|_fast|_least)64_t family.
You are free to choose a wider type to represent your times, if your implementation provides one. You are further free to provide a namespace with a set of typedefs that mirror the std::chrono ratios, with your wider type as the representation.

Sampling from all possible floats in D

In the D programming language, the standard random (std.random) module provides a simple mechanism for generating a random number in some specified range.
auto a = uniform(0, 1024, gen);
What is the best way in D to sample from all possible floating point values?
For clarification, sampling from all possible 32-bit integers can be done as follows:
auto l = uniform!int(); // randomly selected int from all possible integers
Depends on the kind of distribution you want.
A uniform distribution over all possible values could be done by generating a random ulong and then casting the bits into floating point. For T being float or double:
union both { ulong input; T output; }
both val;
val.input = uniform!"[]"(ulong.min, ulong.max);
return val.output;
Since roughly half of the positive floating point numbers are between 0 and 1, this method will often give you numbers near zero. It will also give you infinity and NaN values.
Aside: This code should be fine in D, but reading the inactive union member would be undefined behavior in C++ (C permits this kind of union type punning). Use memcpy there.
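For comparison, the memcpy route in C++ could be sketched as follows. It assumes a 32-bit IEEE-754 float and uses std::mt19937 as the bit source; both function names are invented for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <random>

// Reinterpret a 32-bit pattern as a float by copying its bytes,
// which is well defined in C++ (unlike reading an inactive union member).
float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Draw every 32-bit pattern with equal probability.
// The result may be NaN, infinity, or a subnormal value.
float random_any_float(std::mt19937& gen)
{
    return float_from_bits(static_cast<std::uint32_t>(gen()));
}
```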
If you prefer a uniform distribution over all possible numbers in floating point (equal probability for 0..1 and 1..2 etc), you need something like the normal uniform!double, which unfortunately does not work very well for large numbers. It also will not generate infinity or NaN. You could generate double numbers and convert them to float, but I have no answer for generating random large double numbers.

C++ 0xC0000094: Integer division by zero

This code works perfectly up to 100000, but if you input 1000000 it starts giving the error C++ 0xC0000094: Integer division by zero. I am sure it has something to do with floating point. I tried all the combinations of (/fp:precise), (/fp:strict), (/fp:except) and (/fp:except-) but nothing helped.
#include "stdafx.h"
#include "time.h"
#include "math.h"
#include "iostream"
#define unlikely(x)(x)
int main()
{
using namespace std;
begin:
int k;
cout<<"Please enter the nth prime you want: ";
cin>>k;
int cloc=clock();
int*p;p=new int [k];
int i,j,v,n=0;
for(p[0]=2,i=3;n<k-1;i+=2)
for(j=1;unlikely((v=p[j],pow(v,2)>i))?!(p[++n]=i):(i%v);++j);
cout <<"The "<<k<<"th prime is "<<p[n]<<"\nIt took me "<<clock()-cloc<<" milliseconds to find your prime.\n";
goto begin;
}
The code displayed in the question does not initialize p[1] or assign a value to it. In the for loop that sets j=1, p[j] is used in an assignment to v. This results in an unknown value for v. Apparently, it happens to be zero, which causes a division by zero in the expression i%v.
As this code is undocumented, poorly structured, and unreadable, the proper solution is to discard it and start from scratch.
Floating point has no bearing on the problem, although the use of pow(v, 2) to calculate v² is a poor choice; v*v would serve better. However, some systems print the misleading message “Floating exception” when an integer division by zero occurs. In spite of the message, this is an error in an integer operation.
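As one possible starting point for such a rewrite, here is a short trial-division sketch. It is not the asker's algorithm restructured; it simply keeps every divisor initialized and uses v*v in place of pow. The name nth_prime is chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Return the k-th prime (1-based) by trial division against the primes
// found so far, stopping once v*v exceeds the candidate.
long long nth_prime(int k)
{
    assert(k >= 1);
    std::vector<long long> primes{2};
    for (long long cand = 3; static_cast<int>(primes.size()) < k; cand += 2) {
        bool is_prime = true;
        for (long long v : primes) {
            if (v * v > cand) break;               // no divisor up to sqrt(cand)
            if (cand % v == 0) { is_prime = false; break; }  // v is never 0 here
        }
        if (is_prime) primes.push_back(cand);
    }
    return primes.back();
}
```

Because the divisors come only from primes already stored in the vector, no uninitialized element can reach the % operator.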

generate random long unsigned C

This is my code:
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <crypt.h>
#include <string.h>
#include <stdlib.h>
int main(void){
int i;
unsigned long seed[2];
/* Generate a (not very) random seed */
seed[0] = time(NULL);
seed[1] = getpid() ^ (seed[0] >> 14 & 0x30000);
printf("Seed 0: %lu ; Seed 1: %lu", seed[0], seed[1]);
return 0;
}
I want to generate a very random seed that will be used in a hash function, but I don't know how to do it!
You can read the random bits you need from /dev/random.
When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered. (http://www.kernel.org/doc/man-pages/online/pages/man4/random.4.html)
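/* needs <fcntl.h> for open() and <unistd.h> for read() and close() */
int randomSrc = open("/dev/random", O_RDONLY);
unsigned long seed[2];
read(randomSrc, seed, sizeof seed); /* check the return value in real code */
close(randomSrc);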
Go for Mersenne Twister. It is a widely used pseudorandom number generator because it is very fast, has a very long period, and has a very good distribution. Do not attempt to write your own implementation; use one of the available ones.
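In C++ the standard library already ships a Mersenne Twister, so a sketch of seeding it from two words could look like this. The thread itself is about C, so treat this as an illustration in a neighbouring language; make_engine and make_os_seeded_engine are hypothetical names, and std::random_device is usually, but not necessarily, backed by the operating system's entropy source.

```cpp
#include <cstdint>
#include <random>

// Build a 64-bit Mersenne Twister from two seed words.
// std::seed_seq spreads the two words over the engine's whole state.
std::mt19937_64 make_engine(std::uint32_t s0, std::uint32_t s1)
{
    std::seed_seq seq{s0, s1};
    return std::mt19937_64(seq);
}

// Same, but with the seed words taken from std::random_device.
std::mt19937_64 make_os_seeded_engine()
{
    std::random_device rd;
    const std::uint32_t s0 = rd();
    const std::uint32_t s1 = rd();
    return make_engine(s0, s1);
}
```

The generator stays deterministic for a given pair of seed words, which is useful when a run has to be reproduced.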
Because the algorithm is deterministic, you can't get truly random output, only pseudo-random. For most cases what you have there is plenty. If you go overboard, e.g.
Mac address + IP address + free space on HD + current free memory + epoch time in ms...
then you risk crippling the performance of your algorithm.
If your solution is interactive, you could set the user a short typing task and let them generate the random data for you: measure the time between keystrokes and multiply it by the code of the key pressed. Even if they retype the same string, the timing will be slightly off. You could mix it up a bit more, for example by taking the seconds mod 10 when they start and counting only those keystrokes.
But if you really want 100% random numbers, you could use the ANU Quantum Vacuum Random Number Generator (article).
There is a project on GitHub; it's a pretty awesome way to beat the bad guys.

How do the digits 1101004800 correspond with the number 20?

I'm trying to learn how to modify memory locations using C++ and when messing with MineSweeper, I noticed that when the clock's value in memory was 1101004800, it was 20 seconds into the game. The digits 1101529088 correspond with 21 seconds into the game. Can someone please explain how to convert those 10-digit numbers to the number of seconds they represent?
They are using floats to represent the timer. Here is a program that converts your integers to floats:
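#include <stdio.h>
#include <string.h>
int main() {
    int n = 1101004800;
    int n2 = 1101529088;
    float f, f2;
    /* copy the bits; casting the pointer to float* would violate strict aliasing */
    memcpy(&f, &n, sizeof f);
    memcpy(&f2, &n2, sizeof f2);
    printf("%f\n", f);
    printf("%f\n", f2);
    return 0;
}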
Output:
20.000000
21.000000
1101004800 decimal is 0x41A00000 hex, which is the IEEE-754 representation of 20.0. 1101529088 decimal is 0x41A80000 hex, which is the IEEE-754 representation of 21.0.
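To see why 0x41A00000 means 20.0, the bit pattern can be taken apart by hand: 1 sign bit, 8 exponent bits with a bias of 127, and 23 fraction bits. The sketch below does that for normal numbers only (no zero, subnormal, infinity, or NaN handling); float_fields, decode, and value_of are names made up for this example.

```cpp
#include <cmath>
#include <cstdint>

struct float_fields {          // hypothetical helper type
    int sign;                  // 0 or 1
    int exponent;              // with the bias of 127 removed
    std::uint32_t fraction;    // the 23 stored fraction bits
};

// Split a 32-bit IEEE-754 single-precision pattern into its three fields.
float_fields decode(std::uint32_t bits)
{
    return { static_cast<int>(bits >> 31),
             static_cast<int>((bits >> 23) & 0xFF) - 127,
             bits & 0x7FFFFFu };
}

// Value of a normal number: (-1)^sign * (1 + fraction / 2^23) * 2^exponent.
double value_of(const float_fields& f)
{
    const double significand = 1.0 + std::ldexp(static_cast<double>(f.fraction), -23);
    return (f.sign ? -1.0 : 1.0) * std::ldexp(significand, f.exponent);
}
```

For 1101004800 the fields are sign 0, exponent 4, and fraction 0x200000, giving 1.25 × 2^4 = 20; for 1101529088 the fraction is 0x280000, giving 1.3125 × 2^4 = 21.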
