Edit: my main question is that I want to replicate the TI-84 plus RNG algorithm on my computer, so I can write it in a language like Javascript or Lua, to test it faster.
I tried using an emulator, but it turned out to be slower than the calculator.
Just for the people concerned: There is another question like this, but answer to that question just says how to transfer already-generated numbers over to the computer. I don't want this. I already tried something like it, but I had to leave the calculator running all weekend, and it still wasn't done.
The algorithm being used is from the paper Efficient and portable combined random number generators by P. L'Ecuyer.
You can find the paper here and download it for free from here.
The algorithm used by the Ti calculators is on the RHS side of p. 747. I've included a picture.
I've translated this into a C++ program
#include <iostream>
#include <iomanip>
using namespace std;
long s1,s2;
double Uniform(){
long Z,k;
k = s1 / 53668;
s1 = 40014*(s1-k*53668)-k*12211;
if(s1<0)
s1 = s1+2147483563;
k = s2/52774;
s2 = 40692*(s2-k*52774)-k*3791;
if(s2<0)
s2 = s2+2147483399;
Z=s1-s2;
if(Z<1)
Z = Z+2147483562;
return Z*(4.656613e-10);
}
int main(){
s1 = 12345; //Gotta love these seed values!
s2 = 67890;
for(int i=0;i<10;i++)
cout<<std::setprecision(10)<<Uniform()<<endl;
}
Note that the initial seeds are s1 = 12345 and s2 = 67890.
And got an output from a Ti-83 (sorry, I couldn't find a Ti-84 ROM) emulator:
This matches what my implementation produces
I've just cranked the output precision on my implementation and get the following results:
0.9435973904
0.9083188494
0.1466878273
0.5147019439
0.4058096366
0.7338123019
0.04399198693
0.3393625207
Note that they diverge from Ti's results in the less significant digits. This may be a difference in the way the two processors (Ti's Z80 versus my X86) perform floating point calculations. If so, it will be hard to overcome this issue. Nonetheless, the random numbers will still generate in the same sequence (with the caveat below) since the sequence relies on only integer mathematics, which are exact.
I've also used the long type to store intermediate values. There's some risk that the Ti implementation relies on integer overflow (I didn't read L'Ecuyer's paper too carefully), in which case you would have to adjust to int32_t or a similar type to emulate this behaviour. Assuming, again, that the processors perform similarly.
Edit
This site provides a Ti-Basic implementation of the code as follows:
:2147483563→mod1
:2147483399→mod2
:40014→mult1
:40692→mult2
#The RandSeed Algorithm
:abs(int(n))→n
:If n=0 Then
: 12345→seed1
: 67890→seed2
:Else
: mod(mult1*n,mod1)→seed1
: mod(n,mod2)→seed2
:EndIf
#The rand() Algorithm
:Local result
:mod(seed1*mult1,mod1)→seed1
:mod(seed2*mult2,mod2)→seed2
:(seed1-seed2)/mod1→result
:If result<0
: result+1→result
:Return result
I translated this into C++ for testing:
#include <iostream>
#include <iomanip>
using namespace std;
long mod1 = 2147483563;
long mod2 = 2147483399;
long mult1 = 40014;
long mult2 = 40692;
long seed1,seed2;
void Seed(int n){
if(n<0) //Perform an abs
n = -n;
if(n==0){
seed1 = 12345; //Gotta love these seed values!
seed2 = 67890;
} else {
seed1 = (mult1*n)%mod1;
seed2 = n%mod2;
}
}
double Generate(){
double result;
seed1 = (seed1*mult1)%mod1;
seed2 = (seed2*mult2)%mod2;
result = (double)(seed1-seed2)/(double)mod1;
if(result<0)
result = result+1;
return result;
}
int main(){
Seed(0);
for(int i=0;i<10;i++)
cout<<setprecision(10)<<Generate()<<endl;
}
This gave the following results:
0.9435974025
0.908318861
0.1466878292
0.5147019502
0.405809642
0.7338123114
0.04399198747
0.3393625248
0.9954663411
0.2003402617
which match those achieved with the implementation based on the original paper.
I implemented rand, randInt, randM and randBin in Python. Thanks Richard for the C code. All implemented commands work as expected. You can also find it in this Gist.
import math
class TIprng(object):
def __init__(self):
self.mod1 = 2147483563
self.mod2 = 2147483399
self.mult1 = 40014
self.mult2 = 40692
self.seed1 = 12345
self.seed2 = 67890
def seed(self, n):
n = math.fabs(math.floor(n))
if (n == 0):
self.seed1 = 12345
self.seed2 = 67890
else:
self.seed1 = (self.mult1 * n) % self.mod1
self.seed2 = (n)% self.mod2
def rand(self, times = 0):
# like TI, this will return a list (array in python) if times == 1,
# or an integer if times isn't specified
if not(times):
self.seed1 = (self.seed1 * self.mult1) % self.mod1
self.seed2 = (self.seed2 * self.mult2)% self.mod2
result = (self.seed1 - self.seed2)/self.mod1
if(result<0):
result = result+1
return result
else:
return [self.rand() for _ in range(times)]
def randInt(self, minimum, maximum, times = 0):
# like TI, this will return a list (array in python) if times == 1,
# or an integer if times isn't specified
if not(times):
if (minimum < maximum):
return (minimum + math.floor((maximum- minimum + 1) * self.rand()))
else:
return (maximum + math.floor((minimum - maximum + 1) * self.rand()))
else:
return [self.randInt(minimum, maximum) for _ in range(times)]
def randBin(self, numtrials, prob, times = 0):
if not(times):
return sum([(self.rand() < prob) for _ in range(numtrials)])
else:
return [self.randBin(numtrials, prob) for _ in range(times)]
def randM(self, rows, columns):
# this will return an array of arrays
matrixArr = [[0 for x in range(columns)] for x in range(rows)]
# we go from bottom to top, from right to left
for row in reversed(range(rows)):
for column in reversed(range(columns)):
matrixArr[row][column] = self.randInt(-9, 9)
return matrixArr
testPRNG = TIprng()
testPRNG.seed(0)
print(testPRNG.randInt(0,100))
testPRNG.seed(0)
print(testPRNG.randM(3,4))
The algorithm used by the TI-Basic rand command is L'Ecuyer's algorithm according to TIBasicDev.
rand generates a uniformly-distributed pseudorandom number (this page
and others will sometimes drop the pseudo- prefix for simplicity)
between 0 and 1. rand(n) generates a list of n uniformly-distributed
pseudorandom numbers between 0 and 1. seed→rand seeds (initializes)
the built-in pseudorandom number generator. The factory default seed
is 0.
L'Ecuyer's algorithm is used by TI calculators to generate
pseudorandom numbers.
Unfortunately I have not been able to find any source published by Texas Instruments backing up this claim, so I cannot with certainty that this is the algorthm used. I am also uncertain what exactly is referred to by L'Ecuyer's algorithm.
Here is a C++ program that works:
#include<cmath>
#include<iostream>
#include<iomanip>
using namespace std;
int main()
{
double seed1 = 12345;
double seed2 = 67890;
double mod1 = 2147483563;
double mod2 = 2147483399;
double result;
for(int i=0; i<10; i++)
{
seed1 = seed1*40014-mod1*floor((seed1*40014)/mod1);
seed2 = seed2*40692-mod2*floor((seed2*40692)/mod2);
result = (seed1 - seed2)/mod1;
if(result < 0)
{result = result + 1;}
cout<<setprecision(10)<<result<<endl;
}
return 0;
}
Related
So, I'm trying to create a random vector (think geometry, not an expandable array), and every time I call my random vector function I get the same x value, though y and z are different.
int main () {
srand ( (unsigned)time(NULL));
Vector<double> a;
a.randvec();
cout << a << endl;
return 0;
}
using the function
//random Vector
template <class T>
void Vector<T>::randvec()
{
const int min=-10, max=10;
int randx, randy, randz;
const int bucket_size = RAND_MAX/(max-min);
do randx = (rand()/bucket_size)+min;
while (randx <= min && randx >= max);
x = randx;
do randy = (rand()/bucket_size)+min;
while (randy <= min && randy >= max);
y = randy;
do randz = (rand()/bucket_size)+min;
while (randz <= min && randz >= max);
z = randz;
}
For some reason, randx will consistently return 8, whereas the other numbers seem to be following the (pseudo) randomness perfectly. However, if I put the call to define, say, randy before randx, randy will always return 8.
Why is my first random number always 8? Am I seeding incorrectly?
The issue is that the random number generator is being seeded with a values that are very close together - each run of the program only changes the return value of time() by a small amount - maybe 1 second, maybe even none! The rather poor standard random number generator then uses these similar seed values to generate apparently identical initial random numbers. Basically, you need a better initial seed generator than time() and a better random number generator than rand().
The actual looping algorithm used is I think lifted from Accelerated C++ and is intended to produce a better spread of numbers over the required range than say using the mod operator would. But it can't compensate for always being (effectively) given the same seed.
I don't see any problem with your srand(), and when I tried running extremely similar code, I did not repeatedly get the same number with the first rand(). However, I did notice another possible issue.
do randx = (rand()/bucket_size)+min;
while (randx <= min && randx >= max);
This line probably does not do what you intended. As long as min < max (and it always should be), it's impossible for randx to be both less than or equal to min and greater than or equal to max. Plus, you don't need to loop at all. Instead, you can get a value in between min and max using:
randx = rand() % (max - min) + min;
I had the same problem exactly. I fixed it by moving the srand() call so it was only called once in my program (previously I had been seeding it at the top of a function call).
Don't really understand the technicalities - but it was problem solved.
Also to mention, you can even get rid of that strange bucket_size variable and use the following method to generate numbers from a to b inclusively:
srand ((unsigned)time(NULL));
const int a = -1;
const int b = 1;
int x = rand() % ((b - a) + 1) + a;
int y = rand() % ((b - a) + 1) + a;
int z = rand() % ((b - a) + 1) + a;
A simple quickfix is to call rand a few times after seeding.
int main ()
{
srand ( (unsigned)time(NULL));
rand(); rand(); rand();
Vector<double> a;
a.randvec();
cout << a << endl;
return 0;
}
Just to explain better, the first call to rand() in four sequential runs of a test program gave the following output:
27592
27595
27598
27602
Notice how similar they are? For example, if you divide rand() by 100, you will get the same number 3 times in a row. Now take a look at the second result of rand() in four sequential runs:
11520
22268
248
10997
This looks much better, doesn't it? I really don't see any reason for the downvotes.
Your implementation, through integer division, ignores the smallest 4-5 bit of the random number. Since your RNG is seeded with the system time, the first value you get out of it will change only (on average) every 20 seconds.
This should work:
randx = (min) + (int) ((max - min) * rand() / (RAND_MAX + 1.0));
where
rand() / (RAND_MAX + 1.0)
is a random double value in [0, 1) and the rest is just shifting it around.
Not directly related to the code in this question, but I had same issue with using
srand ((unsigned)time(NULL)) and still having same sequence of values being returned from following calls to rand().
It turned out that srand needs to called on each thread you are using it on separately. I had a loading thread that was generating random content (that wasn't random cuz of the seed issue). I had just using srand in the main thread and not the loading thread. So added another srand ((unsigned)time(NULL)) to start of loading thread fixed this issue.
I saw the following interview question on some online forum. What is a good solution for this?
Get the last 1000 digits of 5^1234566789893943
Simple algorithm:
1. Maintain a 1000-digits array which will have the answer at the end
2. Implement a multiplication routine like you do in school. It is O(d^2).
3. Use modular exponentiation by squaring.
Iterative exponentiation:
array ans;
int a = 5;
while (p > 0) {
if (p&1) {
ans = multiply(ans, a)
}
p = p>>1;
ans = multiply(ans, ans);
}
multiply: multiplies two large number using the school method and return last 1000 digits.
Time complexity: O(d^2*logp) where d is number of last digits needed and p is power.
A typical solution for this problem would be to use modular arithmetic and exponentiation by squaring to compute the remainder of 5^1234566789893943 when divided by 10^1000. However in your case this will still not be good enough as it would take about 1000*log(1234566789893943) operations and this is not too much, but I will propose a more general approach that would work for greater values of the exponent.
You will have to use a bit more complicated number theory. You can use Euler's theorem to get the remainder of 5^1234566789893943 modulo 2^1000 a lot more efficiently. Denote that r. It is also obvious that 5^1234566789893943 is divisible by 5^1000.
After that you need to find a number d such that 5^1000*d = r(modulo 2^1000). To solve this equation you should compute 5^1000(modulo 2^1000). After that all that is left is to do division modulo 2^1000. Using again Euler's theorem this can be done efficiently. Use that x^(phi(2^1000)-1)*x =1(modulo 2^1000). This approach is way faster and is the only feasible solution.
The key phrase is "modular exponentiation". Python has that built in:
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> help(pow)
Help on built-in function pow in module builtins:
pow(...)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> digits = pow(5, 1234566789893943, 10**1000)
>>> len(str(digits))
1000
>>> digits
4750414775792952522204114184342722049638880929773624902773914715850189808476532716372371599198399541490535712666678457047950561228398126854813955228082149950029586996237166535637925022587538404245894713557782868186911348163750456080173694616157985752707395420982029720018418176528050046735160132510039430638924070731480858515227638960577060664844432475135181968277088315958312427313480771984874517274455070808286089278055166204573155093723933924226458522505574738359787477768274598805619392248788499020057331479403377350096157635924457653815121544961705226996087472416473967901157340721436252325091988301798899201640961322478421979046764449146045325215261829432737214561242087559734390139448919027470137649372264607375942527202021229200886927993079738795532281264345533044058574930108964976191133834748071751521214092905298139886778347051165211279789776682686753139533912795298973229094197221087871530034608077419911440782714084922725088980350599242632517985214513078773279630695469677448272705078125
>>>
The technique we need to know is exponentiation by squaring and modulus. We also need to use BigInteger in Java.
Simple code in Java:
BigInteger m = //BigInteger of 10^1000
BigInteger pow(BigInteger a, long b) {
if (b == 0) {
return BigInteger.ONE;
}
BigInteger val = pow(a, b/2);
if (b % 2 == 0)
return (val.multiply(val)).mod(m);
else
return (val.multiply(val).multiply(a)).mod(m);
}
In Java, the function modPow has done it all for you (thank Java).
Use congruence and apply modular arithmetic.
Square and multiply algorithm.
If you divide any number in base 10 by 10 then the remainder represents
the last digit. i.e. 23422222=2342222*10+2
So we know:
5=5(mod 10)
5^2=25=5(mod 10)
5^4=(5^2)*(5^2)=5*5=5(mod 10)
5^8=(5^4)*(5^4)=5*5=5(mod 10)
... and keep going until you get to that exponent
OR, you can realize that as we keep going you keep getting 5 as your remainder.
Convert the number to a string.
Loop on the string, starting at the last index up to 1000.
Then reverse the result string.
I posted a solution based on some hints here.
#include <vector>
#include <iostream>
using namespace std;
vector<char> multiplyArrays(const vector<char> &data1, const vector<char> &data2, int k) {
int sz1 = data1.size();
int sz2 = data2.size();
vector<char> result(sz1+sz2,0);
for(int i=sz1-1; i>=0; --i) {
char carry = 0;
for(int j=sz2-1; j>=0; --j) {
char value = data1[i] * data2[j]+result[i+j+1]+carry;
carry = value/10;
result[i+j+1] = value % 10;
}
result[i]=carry;
}
if(sz1+sz2>k){
vector<char> lastKElements(result.begin()+(sz1+sz2-k), result.end());
return lastKElements;
}
else
return result;
}
vector<char> calculate(unsigned long m, unsigned long n, int k) {
if(n == 0) {
return vector<char>(1, 1);
} else if(n % 2) { // odd number
vector<char> tmp(1, m);
vector<char> result1 = calculate(m, n-1, k);
return multiplyArrays(result1, tmp, k);
} else {
vector<char> result1 = calculate(m, n/2, k);
return multiplyArrays(result1, result1, k);
}
}
int main(int argc, char const *argv[]){
vector<char> v=calculate(5,8,1000);
for(auto c : v){
cout<<static_cast<unsigned>(c);
}
}
I don't know if Windows can show a big number (Or if my computer is fast enough to show it) But I guess you COULD use this code like and algorithm:
ulong x = 5; //There are a lot of libraries for other languages like C/C++ that support super big numbers. In this case I'm using C#'s default `Uint64` number.
for(ulong i=1; i<1234566789893943; i++)
{
x = x * x; //I will make the multiplication raise power over here
}
string term = x.ToString(); //Store the number to a string. I remember strings can store up to 1 billion characters.
char[] number = term.ToCharArray(); //Array of all the digits
int tmp=0;
while(number[tmp]!='.') //This will search for the period.
tmp++;
tmp++; //After finding the period, I will start storing 1000 digits from this index of the char array
string thousandDigits = ""; //Here I will store the digits.
for (int i = tmp; i <= 1000+tmp; i++)
{
thousandDigits += number[i]; //Storing digits
}
Using this as a reference, I guess if you want to try getting the LAST 1000 characters of this array, change to this in the for of the above code:
string thousandDigits = "";
for (int i = 0; i > 1000; i++)
{
thousandDigits += number[number.Length-i]; //Reverse array... ¿?
}
As I don't work with super super looooong numbers, I don't know if my computer can get those, I tried the code and it works but when I try to show the result in console it just leave the pointer flickering xD Guess it's still working. Don't have a pro Processor. Try it if you want :P
I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems, that I cannot achieve this with rand_r or srand(), since these need are state dependent, and I may need to compute the nth in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
n = used to indicate the index of the term in the sequence. Eg: For
first term, n ==1
prime1 and prime2 are prime numbers where
prime1 > prime2
seed = some number which allows one to use the same function to
produce a different series depending on the seed, but the same series
for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..
EDIT - I don't care if it is predictable. I'm just trying to creating some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, It may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNGs are Linear Congruential Generators, basically, they work as follows:
random(n) = (m*random(n-1) + b) mod p
Where m and b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p
= m^n*seed + b*(m^n - 1)/(m - 1) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and g = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
inline void setPosition(size_t index){
state = seed;
uint64_t mPower = m;
for (uint64_t b = 1; index; b <<= 1){
if (index & b){
state *= mPower;
index ^= b;
}
mPower *= mPower;
}
}
};
#include <cstdio>
void example(){
RANF R(1);
// Gets the number through random-access -- O(log(n))
R.setPosition(12345); // Goes to the nth random number
printf("fast nth number = %lu\n", R.getNext());
// Gets the number through standard, sequential access -- O(n)
R.setPosition(0);
for(size_t i = 0; i < 12345; i++) R.getNext();
printf("slow nth number = %lu\n", R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
// Lookup table
struct lookup_t{
uint64_t v[3][65536];
constexpr lookup_t() : v() {
uint64_t mi = RANF::m;
for (size_t i = 0; i < 3; i++){
v[i][0] = 1;
uint64_t val = mi;
for (uint16_t j = 0x0001; j; j++){
v[i][j] = val;
val *= mi;
}
mi = val;
}
}
};
friend struct lookup_t;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
// Note: idx.u16 indices need to be adapted for big-endian machines!
inline void setPosition(size_t index){
static constexpr auto lookup = lookup_t();
union { uint16_t u16[4]; uint64_t u64; } idx;
idx.u64 = index;
state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
}
};
Basically, what this does is splits the computation of, for example, m^0xAAAABBBBCCCC mod p, into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^CCCC mod p) mod p, and then precomputes tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
RNG in a normal sense, have the sequence pattern like f(n) = S(f(n-1))
They also lost precision at some point (like % mod), due to computing convenience, therefore it is not possible to expand the sequence to a function like X(n) = f(n) = trivial function with n only.
This mean at best you have O(n) with that.
To target for O(1) you therefore need to abandon the idea of f(n) = S(f(n-1)), and designate a trivial formula directly so that the N'th number can be calculated directly without knowing (N-1)'th; this also render the seed meaningless.
So, you end up have a simple algebra function and not a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { 3*n*n + 2*n + 7; }
If you want to put more constraint to the generated pattern (like distribution), it become a complex maths problem.
However, for your original goal, if what you want is constant speed to get pseudo-random numbers, I suggest to pre-generate it with traditional RNG and access with lookup table.
EDIT: I noticed you have concern with a table size for a lot of numbers, however you may introduce some hybrid model, like a table of N entries, and do f(k) = g( tbl[k%n], k), which at least provide good distribution across N continue sequence.
This demonstrates an PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this, I avoided using cryptographically secure primitives: for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
static unsigned JenkinsHash(const void *input, unsigned len) {
unsigned hash = 0;
for(unsigned i=0; i<len; ++i) {
hash += static_cast<const unsigned char*>(input)[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
unsigned long long m_counter;
void IncrementCounter() { ++m_counter; }
public:
unsigned long long GetSeed() const {
return m_counter;
}
void SetSeed(unsigned long long new_seed) {
m_counter = new_seed;
}
unsigned int operator ()() {
// the next random number is generated here
const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
IncrementCounter();
return r;
}
// the default coontructor uses time()
// to seed the counter
CHashedCounterRng() : m_counter(time(0)) {}
// you can supply a predetermined seed here,
// or after construction with SetSeed(seed)
CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};
int main() {
CHashedCounterRng rng;
// time()'s high bits change very slowly, so look at low digits
// if you want to verify that the seed is different between runs
const auto stored_counter = rng.GetSeed();
cout << "initial seed: " << stored_counter << endl;
for(int i=0; i<20; ++i) {
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl;
}
cout << endl;
cout << "The last line again:" << endl;
rng.SetSeed(stored_counter + 19 * 8);
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl << endl;
return 0;
}
Imagine we have four symbols - 'a', 'b', 'c', 'd'. We also have four given probabilities of those symbols appearing in the function output - P1, P2, P3, P4 (the sum of which is equal to 1). How would one implement a function which would generate a random sample of three of those symbols, such is that the resulting symbols are present in it with those specified probabilities?
Example: 'a', 'b', 'c' and 'd' have the probabilities of 9/30, 8/30, 7/30 and 6/30 respectively. The function outputs various random samples of any three out of those four symbols: 'abc', 'dca', 'bad' and so on. We run this function many times, counting the amount of times each of the symbols is encountered in its output. At the end, the value of counts stored for 'a' divided by the total amount of symbols output should converge to 9/30, for 'b' to 8/30, for 'c' to 7/30, and for 'd' to 6/30.
E.g. the function generates 10 outputs:
adc
dab
bca
dab
dba
cab
dcb
acd
cab
abc
which out of 30 symbols contains 9 of 'a', 8 of 'b', 7 of 'c' and 6 of 'd'. This is an idealistic example, of course, as the values would only converge when the number of samples is much larger - but it should hopefully convey the idea.
Obviously, this all is only possible when neither probability is larger than 1/3, since each single sample output would always contain three distinct symbols. It is ok for the function to enter an infinite loop or otherwise behave erratically if it's impossible to satisfy the values provided.
Note: the function should obviously use an RNG, but should otherwise be stateless. Each new invocation should be independent from any of the previous ones, except for the RNG state.
EDIT: Even though the description mentions choosing 3 out of 4 values, ideally the algorithm should be able to cope with any sample size.
Your problem is underdetermined.
If we assign a probability to each string of three letters that we allow, p(abc), p(abd), p(acd) etc xtc we can gernerate a series of equations
eqn1: p(abc) + p(abd) + ... others with a "a" ... = p1
...
...
eqn2: p(abd) + p(acd) + ... others with a "d" ... = p4
This has more unknowns than equations, so many ways of solving it. Once a solution is found, by whatever method you choose (use the simplex algorithm if you are me), sample from the probabilities of each string using the roulette method that #alestanis describes.
from numpy import *
# using cvxopt-1.1.5
from cvxopt import matrix, solvers
###########################
# Functions to do some parts
# function to find all valid outputs
def perms(alphabet, length):
if length == 0:
yield ""
return
for i in range(len(alphabet)):
val1 = alphabet[i]
for val2 in perms(alphabet[:i]+alphabet[i+1:], length-1):
yield val1 + val2
# roulette sampler
def roulette_sampler(values, probs):
# Create cumulative prob distro
probs_cum = [sum(probs[:i+1]) for i in range(n_strings)]
def fun():
r = random.rand()
for p,s in zip(probs_cum, values):
if r < p:
return s
# in case of rounding error
return values[-1]
return fun
############################
# Main Part
# create list of all valid strings
alphabet = "abcd"
string_length = 3
alpha_probs = [string_length*x/30. for x in range(9,5,-1)]
# show probs
for a,p in zip(alphabet, alpha_probs):
print "p("+a+") =",p
# all valid outputs for this particular case
strings = [perm for perm in perms(alphabet, string_length)]
n_strings = len(strings)
# constraints from probabilities p(abc) + p(abd) ... = p(a)
contains = array([[1. if s.find(a) >= 0 else 0. for a in alphabet] for s in strings])
#both = concatenate((contains,wons), axis=1).T # hacky, but whatever
#A = matrix(both)
#b = matrix(alpha_probs + [1.])
A = matrix(contains.T)
b = matrix(alpha_probs)
#also need to constrain to [0,1]
wons = array([[1. for s in strings]])
G = matrix(concatenate((eye(n_strings),wons,-eye(n_strings),-wons)))
h = matrix(concatenate((ones(n_strings+1),zeros(n_strings+1))))
## target matricies for approx KL divergence
# uniform prob over valid outputs
u = 1./len(strings)
P = matrix(eye(n_strings))
q = -0.5*u*matrix(ones(n_strings))
# will minimise p^2 - pq for each p val equally
# Do convex optimisation
sol = solvers.qp(P,q,G,h,A,b)
probs = array(sol['x'])
# Print ouput
for s,p in zip(strings,probs):
print "p("+s+") =",p
checkprobs = [0. for char in alphabet]
for a,i in zip(alphabet, range(len(alphabet))):
for s,p in zip(strings,probs):
if s.find(a) > -1:
checkprobs[i] += p
print "p("+a+") =",checkprobs[i]
print "total =",sum(probs)
# Create the sampling function
rndstring = roulette_sampler(strings, probs)
###################
# Verify
print "sampling..."
test_n = 1000
output = [rndstring() for i in xrange(test_n)]
# find which one it is
sampled_freqs = []
for char in alphabet:
n = 0
for val in output:
if val.find(char) > -1:
n += 1
sampled_freqs += [n]
print "plotting histogram..."
import matplotlib.pyplot as plt
plt.bar(range(0,len(alphabet)),array(sampled_freqs)/float(test_n), width=0.5)
plt.show()
EDIT: Python code
I think this is a pretty interesting problem. I don't know the general solution, but it's easy enough to solve in the case of samples of size n-1 (if there is a solution), since there are exactly n possible samples, each of which corresponds to the absence of one of the elements.
Suppose we are seeking Fa = 9/30, Fb = 8/30, Fc = 7/30, Fd = 6/30 in samples of size 3 from a universe of size 4, as in the OP. We can translate each of those frequencies directly into a frequency of samples by selecting the samples which do not contain the given object. For example, we wish 9/30 of the selected objects to be a's; we cannot have more than one a in a sample, and we always have three symbols in a sample; consequently, 9/10 of the samples must contain a and 1/10 cannot contain a. But there is only one possible sample which doesn't contain a: bcd. So 10% of the samples must be bcd. Similarly, 20% must be acd; 30% abd and 40% abc. (Or, more generally, Fā = 1 - (n-1)Fa where Fā is the frequency of the (unique) sample which does not include a)
I can't help thinking that this observation combined with one of the classic ways of generating unique samples can solve the general problem. But I don't have that solution. For what it's worth, the algorithm I'm thinking of is the following:
To select a random sample of size k out of a universe U of n objects:
1) Set needed = k; available = n.
2) For each element in U, select a random number in the range [0, 1).
3) If the random number is less than k/n:
3a) Add the element to the sample.
3b) Decrement needed by 1. If it reaches 0, we're finished.
4) Decrement available, and continue with the next element in U.
So my idea is that it should be possible to manipulate the frequency of element by changing the threshold in step 3, making it somehow a function of the desired frequency of the corresponding element.
Assuming that the length of a word is always one less than the number of symbols, the following C# code does the job:
using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Distributions;
namespace RandomSymbols
{
class Program
{
static void Main(string[] args)
{
// Sample case: Four symbols with the following distribution, and 10000 trials
double[] distribution = { 9.0/30, 8.0/30, 7.0/30, 6.0/30 };
int trials = 10000;
// Create an array containing all of the symbols
char[] symbols = Enumerable.Range('a', distribution.Length).Select(s => (char)s).ToArray();
// We're assuming that the word length is always one less than the number of symbols
int wordLength = symbols.Length - 1;
// Calculate the probability of each symbol being excluded from a given word
double[] excludeDistribution = Array.ConvertAll(distribution, p => 1 - p * wordLength);
// Create a random variable using the MathNet.Numerics library
var randVar = new Categorical(excludeDistribution);
var random = new Random();
randVar.RandomSource = random;
// We'll store all of the words in an array
string[] words = new string[trials];
for (int t = 0; t < trials; t++)
{
// Start with a word containing all of the symbols
var word = new List<char>(symbols);
// Remove one of the symbols
word.RemoveAt(randVar.Sample());
// Randomly permute the remainder
for (int i = 0; i < wordLength; i++)
{
int swapIndex = random.Next(wordLength);
char temp = word[swapIndex];
word[swapIndex] = word[i];
word[i] = temp;
}
// Store the word
words[t] = new string(word.ToArray());
}
// Display words
Array.ForEach(words, w => Console.WriteLine(w));
// Display histogram
Array.ForEach(symbols, s => Console.WriteLine("{0}: {1}", s, words.Count(w => w.Contains(s))));
}
}
}
Update: The following is a C implementation of the method that rici outlined. The tricky part is calculating the thresholds that he mentions, which I did with recursion.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// ****** Change the following for different symbol distributions, word lengths, and number of trials ******
double targetFreqs[] = {10.0/43, 9.0/43, 8.0/43, 7.0/43, 6.0/43, 2.0/43, 1.0/43 };
const int WORDLENGTH = 4;
const int TRIALS = 1000000;
// *********************************************************************************************************
const int SYMBOLCOUNT = sizeof(targetFreqs) / sizeof(double);
double inclusionProbs[SYMBOLCOUNT];
double probLeftToIncludeTable[SYMBOLCOUNT][SYMBOLCOUNT];
// Calculates the probability that there will be n symbols left to be included when we get to the ith symbol.
double probLeftToInclude(int i, int n)
{
if (probLeftToIncludeTable[i][n] == -1)
{
// If this is the first symbol, then the number of symbols left to be included is simply the word length.
if (i == 0)
{
probLeftToIncludeTable[i][n] = (n == WORDLENGTH ? 1.0 : 0.0);
}
else
{
// Calculate the probability based on the previous symbol's probabilities.
// To get to a point where there are n symbols left to be included, either there were n+1 symbols left
// when we were considering that previous symbol and we included it, leaving n,
// or there were n symbols left and we didn't included it, also leaving n.
// We have to take into account that the previous symbol may have been manditorily included.
probLeftToIncludeTable[i][n] = probLeftToInclude(i-1, n+1) * (n == SYMBOLCOUNT-i ? 1.0 : inclusionProbs[i-1])
+ probLeftToInclude(i-1, n) * (n == 0 ? 1.0 : 1 - inclusionProbs[i-1]);
}
}
return probLeftToIncludeTable[i][n];
}
// Calculates the probability that the ith symbol won't *have* to be included or *have* to be excluded.
double probInclusionIsOptional(int i)
{
// The probability that inclusion is optional is equal to 1.0
// minus the probability that none of the remaining symbols can be included
// minus the probability that all of the remaining symbols must be included.
return 1.0 - probLeftToInclude(i, 0) - probLeftToInclude(i, SYMBOLCOUNT - i);
}
// Calculates the probability with which the ith symbol should be included, assuming that
// it doesn't *have* to be included or *have* to be excluded.
double inclusionProb(int i)
{
// The following is derived by simple algebra:
// Unconditional probability = (1.0 * probability that it must be included) + (inclusionProb * probability that inclusion is optional)
// therefore...
// inclusionProb = (Unconditional probability - probability that it must be included) / probability that inclusion is optional
return (targetFreqs[i]*WORDLENGTH - probLeftToInclude(i, SYMBOLCOUNT - i)) / probInclusionIsOptional(i);
}
int main(int argc, char* argv[])
{
srand(time(NULL));
// Initialize inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
for (int j=0; j<SYMBOLCOUNT; j++)
probLeftToIncludeTable[i][j] = -1.0;
// Calculate inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
{
inclusionProbs[i] = inclusionProb(i);
}
// Histogram
int histogram[SYMBOLCOUNT];
for (int i=0; i<SYMBOLCOUNT; i++)
{
histogram[i] = 0;
}
// Scratchpad for building our words
char word[WORDLENGTH+1];
word[WORDLENGTH] = '\0';
// Run trials
for (int t=0; t<TRIALS; t++)
{
int included = 0;
// Build the word by including or excluding symbols according to the problem constraints
// and the probabilities in inclusionProbs[].
for (int i=0; i<SYMBOLCOUNT && included<WORDLENGTH; i++)
{
if (SYMBOLCOUNT - i == WORDLENGTH - included // if we have to include this symbol
|| (double)rand()/(double)RAND_MAX < inclusionProbs[i]) // or if we get a lucky roll of the dice
{
word[included++] = 'a' + i;
histogram[i]++;
}
}
// Randomly permute the word
for (int i=0; i<WORDLENGTH; i++)
{
int swapee = rand() % WORDLENGTH;
char temp = word[swapee];
word[swapee] = word[i];
word[i] = temp;
}
// Uncomment the following to show each word
// printf("%s\r\n", word);
}
// Show the histogram
for (int i=0; i<SYMBOLCOUNT; i++)
{
printf("%c: target=%d, actual=%d\r\n", 'a'+i, (int)(targetFreqs[i]*WORDLENGTH*TRIALS), histogram[i]);
}
return 0;
}
To do this you have to use a temporary array storing the cumulated sum of your probabilities.
In your example, probabilities are 9/30, 8/30, 7/30 and 6/30 respectively.
You should then have an array:
values = {'a', 'b', 'c', 'd'}
proba = {9/30, 17/30, 24/30, 1}
Then you pick a random number r in [0, 1] and do like this:
char chooseRandom() {
int i = 0;
while (r > proba[i])
++i;
return values[i];
}
Yesterday I spent the entire day trying to solve a problem that wants me to get the k-th permutation or unrank a permutation.
I found the best way was factoradic numbers, after hours of Googling and reading dozens of pdfs\powerpoints I finally managed to make it work perfectly both with pencil and paper and by code.
Problem now is, when there are repeated items.
I tried everything, but couldn't get the thing to work the way it should.The factoradic always generates much bigger rank for a permutation, can't just let it "recognize" only non-repeated permutations.
Does anyone know a way to use the actoradic system to unrank a permutation with repeated items ? (eg: abaac) ?
If anyone knows, please I would love a small example and intuitive explanation, that sure will benifit many others in the future.
Thanks a lot :)
PS: Here is my attempted C++ code that I wrote MYSELF.I know its not optmized at all, but just to show you what I got so far:
This code will work correct if no repeated items, but will be wrong with repeated items (next_permutation is not usable of course when say, I want the 1 billionth permutation).
#include <iostream>
#include <cstdio>
#include <string>
#include <algorithm>
using namespace std;
int f(int n) {
if(n<2) return 1;
return n*f(n-1);
}
int pos(string& s,char& c) {
for(int i=0;i<s.size();++i) {
if(s[i]==c) return i;
}
return -1;
}
int main() {
const char* perm = "bedac";
string original=perm;
sort(original.begin(),original.end());
string s=original;
string t=perm;
int res=0;
for(;s!=t && next_permutation(s.begin(),s.end());++res);
cout<<"real:"<<res<<endl;
s=original;
string n;
while(!s.empty()) {
int p=pos(s,t[0]);
n+=p;
t.erase(0,1);
s.erase(p,1);
}
for(res=0;!n.empty();(res+=n[0]*f(n.size()-1)),n.erase(0,1));
cout<<"factoradix:"<<res<<endl;
return 0;
}
In a permutation where all elements are unique, we can generate each element, in a recursive fashion. To rewrite your implementation a bit (in pseudo-code)
def map(k,left):
ele = k/(len(left)!)
return [ele] + map( k % (len(left)!), left - left[ele])
Here we know a priori how many elements are in the subcollection, namely (k-1)!.
In a permutation with repeated elements, the number of remaining elements is (k-1)!/((# of 1s)!(# of 2s)! ... (# of ks)!) and this changes based on what element we choose on each level. We need to apply the same idea, but instead of being able to calculate the index on the fly, we need to determine how many sub-permutations there are if we choose element X at each level of the recursion. We subtract that from the permutation number and recurse.
# group_v is the value of an element
# group_members is the number of times it is repeated
# facts_with is group_members[x] factorial
def permap(k,group_v,group_members,facts_with):
n = sum(group_members); # how many elements left
if n == 0:
return []
total = math.factorial(n-1);
total_denom = prod(facts_with);
start_range = 0; end_range = 0;
for group_i in range(len(group_v)):
if group_members[group_i] == 0:
continue
v = (group_members[group_i]*total)/(total_denom) # n-1!/((a-1)!...z!)
end_range += v
if end_range > k:
facts_with[group_i]/=group_members[group_i];
group_members[group_i]-=1;
return [group_v[group_i]] + permap(k-start_range,group_v,group_members,facts_with)
else:
start_range=end_range
raise Exception()
The full listing in Python
#imports
import itertools;
import math;
import operator
def prod(lst):
return reduce(operator.mul,lst);
#mainfunc
def permap(k,group_v,group_members,facts_with):
n = sum(group_members);
if n == 0:
return []
total = math.factorial(n-1);
total_denom = prod(facts_with);
start_range = 0; end_range = 0;
for group_i in range(len(group_v)):
if group_members[group_i] == 0:
continue
v = (group_members[group_i]*total)/(total_denom) # n-1!/(a!...z!)
end_range += v
if end_range > k:
facts_with[group_i]/=group_members[group_i];
group_members[group_i]-=1;
return [group_v[group_i]] + permap(k-start_range,group_v,group_members,facts_with)
else:
start_range=end_range
raise Exception()
items = [1,2,2,1]
n_groups = len(list(itertools.groupby(items)))
facts_with = [0]*(n_groups)
group_v = [0]*(n_groups)
group_members = [0]*(n_groups)
group_i = 0
print [list(g) for k,g in itertools.groupby(items)];
for group in itertools.groupby(items):
group_v[group_i], group_members[group_i] = group;
group_members[group_i] = len(list(group_members[group_i]))
facts_with[group_i] = math.factorial(group_members[group_i]);
group_i+=1
for x in range(6):
print permap(x,list(group_v),list(group_members),list(facts_with));