Is fftw output depending on size of input? - fftw

In the last week i have been programming some 2-dimensional convolutions with FFTW, by passing to the frequency domain both signals, multiplying, and then coming back.
Surprisingly, I am getting the correct result only when input size is less than a fixed number!
I am posting some working code, in which i take simple initial constant matrixes of value 2 for the input, and 1 for the filter on the spatial domain. This way, the result of convolving them should be a matrix of the average of the first matrix values, i.e., 2, since it is constant. This is the output when I vary the sizes of width and height from 0 to h=215, w=215 respectively; If I set h=216, w=216, or greater, then the output gets corrupted!! I would really appreciate some clues about where could I be making some mistake. Thank you very much!
#include <fftw3.h>
int main(int argc, char* argv[]) {
int h=215, w=215;
//Input and 1 filter are declared and initialized here
float *in = (float*) fftwf_malloc(sizeof(float)*w*h);
float *identity = (float*) fftwf_malloc(sizeof(float)*w*h);
for(int i=0;i<w*h;i++){
in[i]=5;
identity[i]=1;
}
//Declare two forward plans and one backward
fftwf_plan plan1, plan2, plan3;
//Allocate for complex output of both transforms
fftwf_complex *inTrans = (fftwf_complex*) fftw_malloc(sizeof(fftwf_complex)*h*(w/2+1));
fftwf_complex *identityTrans = (fftwf_complex*) fftw_malloc(sizeof(fftwf_complex)*h*(w/2+1));
//Initialize forward plans
plan1 = fftwf_plan_dft_r2c_2d(h, w, in, inTrans, FFTW_ESTIMATE);
plan2 = fftwf_plan_dft_r2c_2d(h, w, identity, identityTrans, FFTW_ESTIMATE);
//Execute them
fftwf_execute(plan1);
fftwf_execute(plan2);
//Multiply in frequency domain. Theoretically, no need to multiply imaginary parts; since signals are real and symmetric
//their transform are also real, identityTrans[i][i] = 0, but i leave here this for more generic implementation.
for(int i=0; i<(w/2+1)*h; i++){
inTrans[i][0] = inTrans[i][0]*identityTrans[i][0] - inTrans[i][1]*identityTrans[i][1];
inTrans[i][1] = inTrans[i][0]*identityTrans[i][1] + inTrans[i][1]*identityTrans[i][0];
}
//Execute inverse transform, store result in identity, where identity filter lied.
plan3 = fftwf_plan_dft_c2r_2d(h, w, inTrans, identity, FFTW_ESTIMATE);
fftwf_execute(plan3);
//Output first results of convolution(in, identity) to see if they are the average of in.
for(int i=0;i<h/h+4;i++){
for(int j=0;j<w/w+4;j++){
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
}
}std::cout<<endl;
//Compute average of data
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
std::cout<<"Mean of input was " << (float)sum/(w*h) << endl;
std::cout<< endl;
fftwf_destroy_plan(plan1);
fftwf_destroy_plan(plan2);
fftwf_destroy_plan(plan3);
return 0;
}

Your problem has nothing to do with fftw ! It comes from this line :
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
if w=216 and h=216 then `w*h*w*h=2 176 782 336. The higher limit for signed 32bit integer is 2 147 483 647. You are facing an overflow...
Solution is to cast the denominator to float.
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(((float)w)*h*w*h) << endl;
The next trouble that you are going to face is this one :
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
Remember that a float has 7 useful decimal digits. If w=h=4000, the computed average will be lower than the real one. Use a double or write two loops and sum on the inner loop (localsum) before summing the outer loop (sum+=localsum) !
Bye,
Francis

Related

Unexpected and large runtime variations in Eigen for matrix multiplies

I am comparing ways to perform equivalent matrix operations within Eigen, and am getting extraordinarily different runtimes, including some non-intuitive results.
I am comparing three mathematically equivalent forms of the matrix multiplication:
wx * transpose(data)
The three forms I'm comparing are:
result = wx * data.transpose() (straight multiply version)
result.noalias() = wx * data.transpose() (noalias version)
result = (data * wx.transpose()).transpose() (transposed version)
I am also testing using both Column Major and Row Major storage.
With column major storage, the transposed version is significantly faster (an order of magnitude) than both the straight multiply and the no alias version, which are both approximately equal in runtime.
With row major storage, the noalias and the transposed version are both significantly faster than the straight multiply in runtime.
I understand that Eigen uses lazy evaluation, and that the immediate results returned from an operation are often expression templates, and are not the intermediate values. I also understand that matrix * matrix operations will always produce a temporary when they are the last operation on the right hand side, to avoid aliasing issues, hence why I am attempting to speed things up through noalias().
My main questions:
Why is the transposed version always significantly faster, even (in the case of column major storage) when I explicitly state noalias so no temporaries are created?
Why does the (significant) difference in runtime only occur between the straight multiply and the noalias version when using column major storage?
The code I am using for this is below. It is being compiled using gcc 4.9.2, on a Centos 6 install, using the following command line.
g++ eigen_test.cpp -O3 -std=c++11 -o eigen_test -pthread -fopenmp -finline-functions
using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor>;
// using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
int wx_rows = 8000;
int wx_cols = 1000;
int samples = 1;
// Eigen::MatrixXf matrix = Eigen::MatrixXf::Random(matrix_rows, matrix_cols);
Matrix wx = Eigen::MatrixXf::Random(wx_rows, wx_cols);
Matrix data = Eigen::MatrixXf::Random(samples, wx_cols);
Matrix result;
unsigned int iterations = 10000;
float sum = 0;
auto before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result = wx * data.transpose();
sum += result(result.rows() - 1, result.cols() - 1);
}
auto after = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "original sum: " << sum << std::endl;
std::cout << "original time (ms): " << duration << std::endl;
std::cout << std::endl;
sum = 0;
before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result.noalias() = wx * data.transpose();
sum += result(wx_rows - 1, samples - 1);
}
after = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "alias sum: " << sum << std::endl;
std::cout << "alias time (ms) : " << duration << std::endl;
std::cout << std::endl;
sum = 0;
before = std::chrono::high_resolution_clock::now();
for (unsigned int ii = 0; ii < iterations; ++ii)
{
result = (data * wx.transpose()).transpose();
sum += result(wx_rows - 1, samples - 1);
}
after = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before).count();
std::cout << "new sum: " << sum << std::endl;
std::cout << "new time (ms) : " << duration << std::endl;
One half of the explanation is because, in the current version of Eigen, multi-threading is achieved by splitting the work over blocks of columns of the result (and the right-hand-side). With only 1 column, multi-threading does not take place. In the column-major case, this explain why cases 1 and 2 underperform. On the other hand, case 3 is evaluated as:
column_major_tmp.noalias() = data * wx.transpose();
result = column_major_tmp.transpose();
and since wx.transpose().cols() is huge, multi-threading is effective.
To understand the row-major case, you also need to know that internally matrix products is implemented for a column-major destination. If the destination is row-major, as in case 2, then the product is transposed, so what really happens is:
row_major_result.transpose().noalias() = data * wx.transpose();
and so we're back to case 3 but without temporary.
This is clearly a limitation of current Eigen's multi-threading implementation for highly unbalanced matrix sizes. Ideally threads should be spread on row-block and/or column-block depending on the size of the matrices at hand.
BTW, you should also compile with -march=native to let Eigen fully exploit your CPU (AVX, FMA, AVX512...).

Iterative multiplication of relative transforms leads to fast instability

I'm writing a program that receives Eigen transforms and stores them in a container after applying some noise. In particular, at time k, I receive transform Tk. I get from the container the transform Tk-1, create the delta = Tk-1-1 · Tk, apply some noise to delta and store Tk-1 · delta as a new element of the container.
I've noticed that after 50 iterations the values are completely wrong and at every iteration I see that the last element of the container, when pre-multiplied by its inverse, is not even equal to the identity.
I've already checked that the container follows the rules of allocation specified by Eigen.
I think the problem is related to the instability of the operations I'm doing.
The following simple code produce the nonzero values when max = 35 and goes to infinity when max is bigger than 60.
Eigen::Isometry3d my_pose = Eigen::Isometry3d::Identity();
my_pose.translate(Eigen::Vector3d::Random());
my_pose.rotate(Eigen::Quaterniond::UnitRandom());
Eigen::Isometry3d my_other_pose = my_pose;
int max = 35;
for(int i=0; i < max; i++)
{
my_pose = my_pose * my_pose.inverse() * my_pose;
}
std::cerr << my_pose.matrix() - my_other_pose.matrix() << std::endl;
I'm surprised how fast the divergence happens. Since my real program is expected to iterate more than hundreds of times, is there a way to create relative transforms that are more stable?
Yes, use a Quaterniond for the rotations:
Eigen::Isometry3d my_pose = Eigen::Isometry3d::Identity();
my_pose.translate(Eigen::Vector3d::Random());
my_pose.rotate(Eigen::Quaterniond::UnitRandom());
Eigen::Isometry3d my_other_pose = my_pose;
Eigen::Quaterniond q(my_pose.rotation());
int max = 35;
for (int i = 0; i < max; i++) {
std::cerr << q.matrix() << "\n\n";
std::cerr << my_pose.matrix() << "\n\n";
q = q * q.inverse() * q;
my_pose = my_pose * my_pose.inverse() * my_pose;
}
std::cerr << q.matrix() - Eigen::Quaterniond(my_other_pose.rotation()).matrix() << "\n";
std::cerr << my_pose.matrix() - my_other_pose.matrix() << std::endl;
If you would have examined the difference you printed out, the rotation part of the matrix gets a huge error, while the translation part is tolerable. The inverse on the rotation matrix will hit stability issues quickly, so using it directly is usually not recommended.

Get a number value from Vector positions

I'm new here and actually
I've got a problem in my mind, and it's like this:
I get an input of a vector of any size, but for this case, let's take this one:
vetor = {1, 2, 3, 4}
Now, all I want to do is to take this numbers and sum each one (considering it's unity, tens, hundred, thousand) and register the result into a integer variable, for the case, 'int vec_value'.
Considering the vector stated above, the answer should be: vec_value = 4321.
I will leave the main.cpp attached to the post, however I will tell you how I calculated the result, but it gave me the wrong answer.
vetor[0] = 1
vetor[1] = 2
vetor[2] = 3
vetor[3] = 4
the result should be = (1*10^0)+(2*10^1)+(3*10^2)+(4*10^3) = 1 + 20 +
300 + 4000 = 4321.
The program is giving me the solution as 4320, and if I change the values randomly, the answer follows the new values, but with wrong numbers still.
If anyone could take a look at my code to see what I'm doing wrong I'd appreciate it a lot!
Thanks..
There's a link to a picture at the end of the post showing an example of wrong result.
Keep in mind that sometimes the program gives me the right answer (what leaves me more confused)
Code:
#include <iostream>
#include <ctime>
#include <cstdlib>
#include <vector>
#include <cmath>
using namespace std;
int main()
{
vector<int> vetor;
srand(time(NULL));
int lim = rand() % 2 + 3; //the minimum size must be 3 and the maximum must be 4
int value;
for(int i=0; i<lim; i++)
{
value = rand() % 8 + 1; // I'm giving random values to each position of the vector
vetor.push_back(value);
cout << "\nPos [" << i << "]: " << vetor[i]; //just to keep in mind what are the elements inside the vector
}
int vec_value=0;
for(int i=0; i<lim; i++)
{
vec_value += vetor[i] * pow(10, i); //here i wrote the formula to sum each element of the vector with the correspondent unity, tens, hundreds or thousands
}
cout << "\n\nValor final: " << vec_value; //to see what result the program will give me
return 0;
}
Example of the program
Try this for the main loop:
int power = 1;
for(int i=0; i<lim; i++)
{
vec_value += vetor[i] * power;
power *= 10;
}
This way, all the computations are in integers, you are not affected by floating point rounding.

Async doesn't work for long vectors

I am doing some parallel programming with async. I have an integrator and in a test program I wanted to see whether if dividing a vector in 4 subvectors actually takes one fourth of the time to complete the task.
I had an initial issue about the time measured, now solved as steady_clock() measures real and not CPU time.
I tried the code with different vector lenghts. For short lenghts (<10e5 elements) the direct integration is faster: normal, as the .get() calls and the sum take their time.
For intermediate lenghts (about 1e8 elements) the integration followed the expected time, giving 1 s as the first time and 0.26 s for the second time.
For long vectors(10e9 or higher) the second integration takes much more time than the first, more than 3 s against a similar or greater time.
Why? What is the process that makes the divide and conquer routine slower?
A couple of additional notes: Please note that I pass the vectors by reference, so that cannot be the issue, and keep in mind that this is a test code, thus the subvector creation is not the point of the question.
#include<iostream>
#include<vector>
#include<thread>
#include<future>
#include<ctime>
#include<chrono>
using namespace std;
using namespace chrono;
typedef steady_clock::time_point tt;
double integral(const std::vector<double>& v, double dx) //simpson 1/3
{
int n=v.size();
double in=0.;
if(n%2 == 1) {in+=v[n-1]*v[n-1]; n--;}
in=(v[0]*v[0])+(v[n-1]*v[n-1]);
for(int i=1; i<n/2; i++)
in+= 2.*v[2*i] + 4.*v[2*i+1];
return in*dx/3.;
}
int main()
{
double h=0.001;
vector<double> v1(100000,h); // a vector, content is not important
// subvectors
vector<double> sv1(v1.begin(), v1.begin() + v1.size()/4),
sv2(v1.begin() + v1.size()/4 +1,v1.begin()+ 2*v1.size()/4),
sv3( v1.begin() + 2*v1.size()/4+1, v1.begin() + 3*v1.size()/4+1),
sv4( v1.begin() + 3*v1.size()/4+1, v1.end());
double a,b;
cout << "f1" << endl;
tt bt1 = chrono::steady_clock::now();
// complete integration: should take time t
a=integral(v1, h);
tt et1 = chrono::steady_clock::now();
duration<double> time_span = duration_cast<duration<double>>(et1 - bt1);
cout << time_span.count() << endl;
future<double> f1, f2,f3,f4;
cout << "f2" << endl;
tt bt2 = chrono::steady_clock::now();
// four integrations: should take time t/4
f1 = async(launch::async, integral, ref(sv1), h);
f2 = async(launch::async, integral, ref(sv2), h);
f3 = async(launch::async, integral, ref(sv3), h);
f4 = async(launch::async, integral, ref(sv4), h);
b=f1.get()+f2.get()+f3.get()+f4.get();
tt et2 = chrono::steady_clock::now();
duration<double> time_span2 = duration_cast<duration<double>>(et2 - bt2);
cout << time_span2.count() << endl;
cout << a << " " << b << endl;
return 0;
}

Simple random number generator that can generate nth number in series in O(1) time

I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems, that I cannot achieve this with rand_r or srand(), since these need are state dependent, and I may need to compute the nth in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
n = used to indicate the index of the term in the sequence. Eg: For
first term, n ==1
prime1 and prime2 are prime numbers where
prime1 > prime2
seed = some number which allows one to use the same function to
produce a different series depending on the seed, but the same series
for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..
EDIT - I don't care if it is predictable. I'm just trying to creating some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, It may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNGs are Linear Congruential Generators, basically, they work as follows:
random(n) = (m*random(n-1) + b) mod p
Where m and b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p
= m^n*seed + b*(m^n - 1)/(m - 1) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and g = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
inline void setPosition(size_t index){
state = seed;
uint64_t mPower = m;
for (uint64_t b = 1; index; b <<= 1){
if (index & b){
state *= mPower;
index ^= b;
}
mPower *= mPower;
}
}
};
#include <cstdio>
void example(){
RANF R(1);
// Gets the number through random-access -- O(log(n))
R.setPosition(12345); // Goes to the nth random number
printf("fast nth number = %lu\n", R.getNext());
// Gets the number through standard, sequential access -- O(n)
R.setPosition(0);
for(size_t i = 0; i < 12345; i++) R.getNext();
printf("slow nth number = %lu\n", R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
// Lookup table
struct lookup_t{
uint64_t v[3][65536];
constexpr lookup_t() : v() {
uint64_t mi = RANF::m;
for (size_t i = 0; i < 3; i++){
v[i][0] = 1;
uint64_t val = mi;
for (uint16_t j = 0x0001; j; j++){
v[i][j] = val;
val *= mi;
}
mi = val;
}
}
};
friend struct lookup_t;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
// Note: idx.u16 indices need to be adapted for big-endian machines!
inline void setPosition(size_t index){
static constexpr auto lookup = lookup_t();
union { uint16_t u16[4]; uint64_t u64; } idx;
idx.u64 = index;
state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
}
};
Basically, what this does is splits the computation of, for example, m^0xAAAABBBBCCCC mod p, into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^CCCC mod p) mod p, and then precomputes tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
RNG in a normal sense, have the sequence pattern like f(n) = S(f(n-1))
They also lost precision at some point (like % mod), due to computing convenience, therefore it is not possible to expand the sequence to a function like X(n) = f(n) = trivial function with n only.
This mean at best you have O(n) with that.
To target for O(1) you therefore need to abandon the idea of f(n) = S(f(n-1)), and designate a trivial formula directly so that the N'th number can be calculated directly without knowing (N-1)'th; this also render the seed meaningless.
So, you end up have a simple algebra function and not a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { 3*n*n + 2*n + 7; }
If you want to put more constraint to the generated pattern (like distribution), it become a complex maths problem.
However, for your original goal, if what you want is constant speed to get pseudo-random numbers, I suggest to pre-generate it with traditional RNG and access with lookup table.
EDIT: I noticed you have concern with a table size for a lot of numbers, however you may introduce some hybrid model, like a table of N entries, and do f(k) = g( tbl[k%n], k), which at least provide good distribution across N continue sequence.
This demonstrates an PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this, I avoided using cryptographically secure primitives: for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
static unsigned JenkinsHash(const void *input, unsigned len) {
unsigned hash = 0;
for(unsigned i=0; i<len; ++i) {
hash += static_cast<const unsigned char*>(input)[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
unsigned long long m_counter;
void IncrementCounter() { ++m_counter; }
public:
unsigned long long GetSeed() const {
return m_counter;
}
void SetSeed(unsigned long long new_seed) {
m_counter = new_seed;
}
unsigned int operator ()() {
// the next random number is generated here
const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
IncrementCounter();
return r;
}
// the default coontructor uses time()
// to seed the counter
CHashedCounterRng() : m_counter(time(0)) {}
// you can supply a predetermined seed here,
// or after construction with SetSeed(seed)
CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};
int main() {
CHashedCounterRng rng;
// time()'s high bits change very slowly, so look at low digits
// if you want to verify that the seed is different between runs
const auto stored_counter = rng.GetSeed();
cout << "initial seed: " << stored_counter << endl;
for(int i=0; i<20; ++i) {
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl;
}
cout << endl;
cout << "The last line again:" << endl;
rng.SetSeed(stored_counter + 19 * 8);
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl << endl;
return 0;
}

Resources