So, we see a lot of fibonacci questions. I, personally, hate them. A lot. More than a lot. I thought it'd be neat if maybe we could make it impossible for anyone to ever use it as an interview question again. Let's see how close to O(1) we can get fibonacci.
Here's my kick-off, pretty much cribbed from Wikipedia, with of course plenty of headroom. Importantly, this solution will detonate for any particularly large fib, and it contains a relatively naive use of the power function, which places it at O(log(n)) at worst if your library's pow isn't constant-time. I suspect we can get rid of the power function, or at least specialize it. Anyone up for helping? Is there a true O(1) solution, other than the finite* solution of using a look-up table?
http://ideone.com/FDt3P
#include <iostream>
#include <math.h>

using namespace std; // would never normally do this.

int main() {
    int target = 10;
    cin >> target;

    // sqrt(5); should be close enough for anything that won't make us explode anyway.
    float mangle = 2.2360679;
    // (1 + sqrt(5)) / 2 is the golden ratio phi.
    float manglemore = mangle;
    ++manglemore;
    manglemore = manglemore / 2;
    // Rounded Binet formula: fib(n) = floor(phi^n / sqrt(5) + 0.5).
    manglemore = pow(manglemore, target);
    manglemore = manglemore / mangle;
    manglemore += .5;
    cout << floor(manglemore);
}
*I know, I know, it's enough for any of the zero practical uses fibonacci has.
Here is a near-O(1) solution for a Fibonacci sequence term. Admittedly, it is O(log n) depending on the system Math.pow() implementation, but it is Fibonacci without a visible loop, if your interviewer is looking for that. The ceil() is there because of rounding precision: on larger values the division comes back as .9-repeating.
Example in JS:
function fib (n) {
    var A = (1 + Math.sqrt(5)) / 2,
        B = (1 - Math.sqrt(5)) / 2,
        fib = (Math.pow(A, n) - Math.pow(B, n)) / Math.sqrt(5);
    return Math.ceil(fib);
}
Given arbitrarily large inputs, simply reading in n takes O(log n), so in that sense no constant-time algorithm is possible. So, use the closed-form solution, or precompute the values you care about, to get reasonable performance.
Edit: in the comments it was pointed out that it is actually worse than that: Fibonacci(n) is O(phi^n), so printing the result takes O(log(phi^n)) digits, which is O(n)!
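As a quick sanity check of that point, the digit count of F(n) really does grow linearly with n (plain iterative fib, purely for illustration):

# Rough check that the number of digits of F(n) grows linearly in n,
# which is why even writing the answer down is already Omega(n).
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in (1000, 2000, 4000):
    print(n, len(str(fib(n))))   # the digit count roughly doubles with n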
The following answer executes in O(1), though I am not sure whether it qualifies for your question. It is called template meta-programming.
#include <iostream>
using namespace std;

template <int N>
class Fibonacci
{
public:
    // evaluated entirely at compile time; the underlying type is widened to
    // long long because Fibonacci<50> overflows a 32-bit int
    enum : long long {
        value = Fibonacci<N - 1>::value + Fibonacci<N - 2>::value
    };
};

template <>
class Fibonacci<0>
{
public:
    enum : long long {
        value = 0
    };
};

template <>
class Fibonacci<1>
{
public:
    enum : long long {
        value = 1
    };
};

int main()
{
    cout << Fibonacci<50>::value << endl;
    return 0;
}
In Programming: The Derivation of Algorithms, Anne Kaldewaij expands out the linear algebra solution to get (translated and refactored from the programming language used in that book):
template <typename Int_t> Int_t fib(Int_t n)
{
    Int_t a = 0, b = 1, x = 0, y = 1, t0, t1;
    while (n != 0) {
        switch (n % 2) {
        case 1:
            t0 = a * x + b * y;
            t1 = b * x + a * y + b * y;
            x = t0;
            y = t1;
            --n;
            continue;
        default:
            t0 = a * a + b * b;
            t1 = 2 * a * b + b * b;
            a = t0;
            b = t1;
            n /= 2;
            continue;
        }
    }
    return x;
}
This has O(log n) complexity. That's not constant, of course, but I think it's worth adding to the discussion, especially given that it only uses relatively fast integer operations and has no possibility of rounding error.
Yes. Precalculate the values, store them in an array, then use N to do a lookup.
Pick some largest value to handle. For any larger value, raise an error. For any smaller value, store the answer as soon as it is reached, but keep running the calculation all the way up to the "largest" value, and only then return the stored value.
After all, O(1) specifically means "constant", not "fast". With this method, all calculations take the same amount of time.
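A minimal sketch of that idea in Python; MAX_N and the do-nothing padding loop are my own illustration of the constant-time point, not part of the answer above:

# Precompute once up to some chosen limit; fib() then answers by lookup and
# burns the leftover iterations so every call costs about the same.
MAX_N = 90
TABLE = [0, 1]
for _ in range(2, MAX_N + 1):
    TABLE.append(TABLE[-1] + TABLE[-2])

def fib(n):
    if n > MAX_N:
        raise ValueError("n larger than the precomputed table")
    answer = TABLE[n]
    for _ in range(MAX_N - n):   # padding work, purely to equalize the timing
        pass
    return answer

print(fib(50))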
Fibonacci in O(1) space and time (Python implementation):
from math import sqrt

PHI = (1 + sqrt(5)) / 2

def fib(n: int):
    return int(PHI ** n / sqrt(5) + 0.5)
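Note that this leans on double-precision floats, so it silently drifts from the exact values once n is large enough. A quick way to find the cutoff on a given platform (my own check, comparing against an exact integer loop):

from math import sqrt

PHI = (1 + sqrt(5)) / 2

def fib_float(n):
    return int(PHI ** n / sqrt(5) + 0.5)

def fib_exact(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

n = 0
while fib_float(n) == fib_exact(n):
    n += 1
print("first mismatch at n =", n)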
Related
I've been trying to create a generalized Gradient Noise generator (which doesn't use the hash method to get gradients). The code is below:
#include <array>
#include <cmath>
#include <cstdint>
#include <random>
#include <glm/glm.hpp>

class GradientNoise {
std::uint64_t m_seed;
std::uniform_int_distribution<int> distribution; // uint8_t is not a valid IntType for uniform_int_distribution
const std::array<glm::vec2, 4> vector_choice = {glm::vec2(1.0, 1.0), glm::vec2(-1.0, 1.0), glm::vec2(1.0, -1.0),
glm::vec2(-1.0, -1.0)};
public:
GradientNoise(uint64_t seed) {
m_seed = seed;
distribution = std::uniform_int_distribution<int>(0, 3);
}
// 0 -> 1
// just passes the value through; originally this was the Perlin noise activation
double nonLinearActivationFunction(double value) {
//return value * value * value * (value * (value * 6.0 - 15.0) + 10.0);
return value;
}
// 0 -> 1
//cosine interpolation
double interpolate(double a, double b, double t) {
double mu2 = (1 - cos(t * M_PI)) / 2;
return (a * (1 - mu2) + b * mu2);
}
double noise(double x, double y) {
std::mt19937_64 rng;
//first get the bottom left corner associated
// with these coordinates
int corner_x = std::floor(x);
int corner_y = std::floor(y);
// then get the respective distance from that corner
double dist_x = x - corner_x;
double dist_y = y - corner_y;
double corner_0_contrib; // bottom left
double corner_1_contrib; // top left
double corner_2_contrib; // top right
double corner_3_contrib; // bottom right
std::uint64_t s1 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y) + m_seed);
std::uint64_t s2 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y + 1) + m_seed);
std::uint64_t s3 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y + 1) + m_seed);
std::uint64_t s4 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y) + m_seed);
// each xy pair turns into distance vector from respective corner, corner zero is our starting corner (bottom
// left)
rng.seed(s1);
corner_0_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y});
rng.seed(s2);
corner_1_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y - 1});
rng.seed(s3);
corner_2_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y - 1});
rng.seed(s4);
corner_3_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y});
double u = nonLinearActivationFunction(dist_x);
double v = nonLinearActivationFunction(dist_y);
double x_bottom = interpolate(corner_0_contrib, corner_3_contrib, u);
double x_top = interpolate(corner_1_contrib, corner_2_contrib, u);
double total_xy = interpolate(x_bottom, x_top, v);
return total_xy;
}
};
I then generate an OpenGL texture to display it like this:
// temp_1 below is a GradientNoise instance constructed earlier (not shown)
int width = 1024;
int height = 1024;
unsigned char *temp_texture = new unsigned char[width*height * 4];
double octaves[5] = {2,4,8,16,32};
for( int i = 0; i < height; i++){
for(int j = 0; j < width; j++){
double d_noise = 0;
d_noise += temp_1.noise(j/octaves[0], i/octaves[0]);
d_noise += temp_1.noise(j/octaves[1], i/octaves[1]);
d_noise += temp_1.noise(j/octaves[2], i/octaves[2]);
d_noise += temp_1.noise(j/octaves[3], i/octaves[3]);
d_noise += temp_1.noise(j/octaves[4], i/octaves[4]);
d_noise/=5;
uint8_t noise = static_cast<uint8_t>(((d_noise * 128.0) + 128.0));
temp_texture[j*4 + (i * width * 4) + 0] = (noise);
temp_texture[j*4 + (i * width * 4) + 1] = (noise);
temp_texture[j*4 + (i * width * 4) + 2] = (noise);
temp_texture[j*4 + (i * width * 4) + 3] = (255);
}
}
Which gives good results:
But gprof is telling me that the Mersenne Twister is taking up 62.4% of my time, and growing with larger textures. Nothing else individually takes anywhere near as much time. While the Mersenne Twister is fast after initialization, the fact that I initialize it every time I use it seems to make it pretty slow.
This initialization is 100% required to make sure that the same x and y generate the same gradient at each integer point (so you need either a hash function or to seed the RNG each time).
I attempted to change the PRNG to both the linear congruential generator and Xorshiftplus, and while both ran orders of magnitude faster, they gave odd results:
LCG (one time, then running 5 times before using)
Xorshiftplus
After one iteration
After 10,000 iterations.
I've tried:
Running the generator several times before using its output; this results in slow execution or simply different artifacts.
Using the output of two consecutive runs after the initial seed to seed the PRNG again, and using the value afterwards. No difference in the result.
What is happening? What can I do to get faster results that are of the same quality as the Mersenne Twister?
OK BIG UPDATE:
I don't know why this works, and I know it has something to do with the prime number used, but after messing around a bit, it appears that the following works:
Step 1: incorporate the x and y values as seeds separately (and incorporate some other offset or additional seed value with them; this number should be a prime/non-trivial factor).
Step 2: use those two seed results to seed the generator again and feed that back into the function (so, like geza said, the seeds I was making were bad).
Step 3: when getting the result, instead of taking it modulo the number of items wanted (4), or & 3, take it modulo a prime number first and then apply & 3. I'm not sure whether it matters that the prime is a Mersenne prime.
Here is the result with prime = 257 and Xorshiftplus being used! (Note I used 2048 by 2048 for this one; the others were 256 by 256.) A sketch of one possible reading of the three steps follows below.
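For what it's worth, here is one possible reading of those three steps as a Python sketch, with Python's random module standing in for Xorshiftplus and with illustrative constants of my own; I'm not claiming this is exactly what the working code does:

import random

PRIME = 257          # illustrative; any "non-trivial" prime per Step 3
EXTRA_SEED = 12345   # the additional offset/seed value mentioned in Step 1

def corner_gradient_index(x, y):
    # Step 1: seed separately from x and from y, folding in the extra value
    random.seed(x * PRIME + EXTRA_SEED)
    sx = random.getrandbits(64)
    random.seed(y * PRIME + EXTRA_SEED)
    sy = random.getrandbits(64)
    # Step 2: use those two results to seed the generator again
    random.seed((sx << 64) | sy)
    r = random.getrandbits(64)
    # Step 3: reduce by a prime first, then take the low two bits
    return (r % PRIME) & 3

print([corner_gradient_index(x, y) for y in range(3) for x in range(3)])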
LCG is known to be inadequate for your purpose.
Xorshift128+'s results are bad, because it needs good seeding. And providing good seeding defeats the whole purpose of using it. I don't recommend this.
However, I recommend using an integer hash. For example, one from Bob's page.
Here's a result of the first hash on that page; it looks OK to me, and it is fast (I think much faster than the Mersenne Twister):
Here's the code I've written to generate this:
#include <cmath>
#include <stdio.h>

unsigned int hash(unsigned int a) {
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}

unsigned int ivalue(int x, int y) {
    return hash(y << 16 | x) & 0xff;
}

float smooth(float x) {
    return 6*x*x*x*x*x - 15*x*x*x*x + 10*x*x*x;
}

float value(float x, float y) {
    int ix = floor(x);
    int iy = floor(y);
    float fx = smooth(x - ix);
    float fy = smooth(y - iy);
    int v00 = ivalue(iy + 0, ix + 0);
    int v01 = ivalue(iy + 0, ix + 1);
    int v10 = ivalue(iy + 1, ix + 0);
    int v11 = ivalue(iy + 1, ix + 1);
    float v0 = v00*(1 - fx) + v01*fx;
    float v1 = v10*(1 - fx) + v11*fx;
    return v0*(1 - fy) + v1*fy;
}

unsigned char pic[1024*1024];

int main() {
    for (int y = 0; y < 1024; y++) {
        for (int x = 0; x < 1024; x++) {
            float v = 0;
            for (int o = 0; o <= 9; o++) {
                v += value(x/64.0f*(1 << o), y/64.0f*(1 << o))/(1 << o);
            }
            int r = rint(v*0.5f);
            pic[y*1024 + x] = r;
        }
    }
    FILE *f = fopen("x.pnm", "wb");
    fprintf(f, "P5\n1024 1024\n255\n");
    fwrite(pic, 1, 1024*1024, f);
    fclose(f);
}
If you want to understand how a hash function works (or better yet, which properties a good hash has), check out Bob's page, for example this one.
You (unknowingly?) implemented a visualization of PRNG non-random patterns. That looks very cool!
Except for the Mersenne Twister, none of your tested PRNGs seem fit for your purpose. As I have not done further tests myself, I can only suggest trying out and measuring further PRNGs.
The randomness of LCGs is known to be sensitive to the choice of their parameters. In particular, the period of an LCG is at most m (the modulus), and for many parameter choices it is much less.
Similarly, careful parameter selection is required to get a long period out of Xorshift PRNGs.
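As a quick illustration of how strongly the period depends on those choices, here is a toy measurement in Python; the constants are my own small examples chosen so the Hull-Dobell conditions are easy to check, not recommended parameters:

# Period of the LCG x -> (a*x + c) % m, starting from a given seed.
def lcg_period(a, c, m, seed=1):
    seen = {}
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]

# Same modulus, very different periods:
print(lcg_period(a=5, c=0, m=256))   # multiplicative variant: period 64
print(lcg_period(a=5, c=3, m=256))   # satisfies Hull-Dobell: full period 256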
You've noted that some PRNGs give good procedural generation results while others do not. In order to isolate the cause, I would factor out the proc-gen stuff and examine the PRNG output directly. An easy way to visualize the data is to build a greyscale image where each pixel value is a (possibly scaled) random value. For image-based stuff, I find this an easy way to spot things that may lead to visual artifacts. Any artifacts you see with this are likely to cause issues with your proc-gen output.
Another option is to try something like the Diehard tests. If the aforementioned image test failed to reveal any problems, I might use this just to be sure my PRNG techniques were trustworthy.
Note that your code seeds the PRNG, then generates one pseudorandom number from the PRNG. The reason for the non-randomness in xorshift128+ that you discovered is that xorshift128+ simply adds the two halves of the seed (and uses the result mod 2^64 as the generated number) before changing its state (review its source code). This makes that PRNG considerably different from a hash function.
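To see that concretely, here is the xorshift128+ step written out in Python with 64-bit masking (the shift constants 23, 18, 5 follow one published variant and do not matter for the point being made); the returned value is just the sum of the two seed words, computed before any mixing:

MASK = (1 << 64) - 1

def xorshift128plus_step(state0, state1):
    # The output is computed before the state is mixed at all.
    result = (state0 + state1) & MASK
    s1, s0 = state0, state1
    new_state0 = s0
    s1 = (s1 ^ (s1 << 23)) & MASK
    new_state1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
    return result, new_state0, new_state1

# Seeding freshly from grid coordinates and taking only the first output
# therefore exposes the seeds themselves:
print(xorshift128plus_step(3, 40)[0])   # 43
print(xorshift128plus_step(4, 39)[0])   # 43 again: a collision along a diagonal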
What you see is a practical demonstration of PRNG quality. The Mersenne Twister is one of the best PRNGs with good performance; it passes the DIEHARD tests. One should know that generating random numbers is not an easy computational task, so looking for better performance will inevitably cost some quality. An LCG is known to be the simplest and among the worst PRNGs ever designed, and it clearly shows the two-dimensional correlation seen in your picture. The quality of Xorshift generators largely depends on bitness and parameters. They are definitely worse than the Mersenne Twister, but some (xorshift128+) may work well enough to pass the BigCrush battery of the TestU01 tests.
In other words, if you are doing an important physical-modelling numerical experiment, you had better continue to use the Mersenne Twister, as it is known to be a good trade-off between speed and quality and it comes in many standard libraries. In a less important case you may try the xorshift128+ generator. For the ultimate results you would need a cryptographic-quality PRNG (none of those mentioned here may be used for cryptographic purposes).
Edit: my main question is that I want to replicate the TI-84 Plus RNG algorithm on my computer, so that I can write it in a language like JavaScript or Lua and test it faster.
I tried using an emulator, but it turned out to be slower than the calculator.
Just for the people concerned: there is another question like this, but the answer to that question just says how to transfer already-generated numbers over to the computer. I don't want that. I already tried something like it, but I had to leave the calculator running all weekend, and it still wasn't done.
The algorithm being used is from the paper Efficient and portable combined random number generators by P. L'Ecuyer.
You can find the paper here and download it for free from here.
The algorithm used by the TI calculators is on the right-hand side of p. 747. I've included a picture.
I've translated this into a C++ program
#include <iostream>
#include <iomanip>

using namespace std;

long s1, s2;

double Uniform() {
    long Z, k;

    k = s1 / 53668;
    s1 = 40014*(s1 - k*53668) - k*12211;
    if (s1 < 0)
        s1 = s1 + 2147483563;

    k = s2 / 52774;
    s2 = 40692*(s2 - k*52774) - k*3791;
    if (s2 < 0)
        s2 = s2 + 2147483399;

    Z = s1 - s2;
    if (Z < 1)
        Z = Z + 2147483562;

    return Z*(4.656613e-10);
}

int main() {
    s1 = 12345; // Gotta love these seed values!
    s2 = 67890;
    for (int i = 0; i < 10; i++)
        cout << std::setprecision(10) << Uniform() << endl;
}
Note that the initial seeds are s1 = 12345 and s2 = 67890.
And got an output from a Ti-83 (sorry, I couldn't find a Ti-84 ROM) emulator:
This matches what my implementation produces
I've just cranked the output precision on my implementation and get the following results:
0.9435973904
0.9083188494
0.1466878273
0.5147019439
0.4058096366
0.7338123019
0.04399198693
0.3393625207
Note that they diverge from TI's results in the less significant digits. This may be a difference in the way the two processors (TI's Z80 versus my x86) perform floating-point calculations. If so, it will be hard to overcome this issue. Nonetheless, the random numbers will still be generated in the same sequence (with the caveat below), since the sequence relies only on integer mathematics, which is exact.
I've also used the long type to store intermediate values. There's some risk that the Ti implementation relies on integer overflow (I didn't read L'Ecuyer's paper too carefully), in which case you would have to adjust to int32_t or a similar type to emulate this behaviour. Assuming, again, that the processors perform similarly.
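For what it's worth, here is a quick Python check (my own, not from the paper) that the intermediate values in the code above stay inside the signed 32-bit range over many draws, which is what the paper's factorization is designed to ensure:

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def check(draws=100000):
    s1, s2 = 12345, 67890
    for _ in range(draws):
        k = s1 // 53668
        s1 = 40014 * (s1 - k * 53668) - k * 12211
        assert INT32_MIN <= s1 <= INT32_MAX
        if s1 < 0:
            s1 += 2147483563
        k = s2 // 52774
        s2 = 40692 * (s2 - k * 52774) - k * 3791
        assert INT32_MIN <= s2 <= INT32_MAX
        if s2 < 0:
            s2 += 2147483399
    print("no 32-bit overflow in", draws, "draws")

check()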
Edit
This site provides a Ti-Basic implementation of the code as follows:
:2147483563→mod1
:2147483399→mod2
:40014→mult1
:40692→mult2
#The RandSeed Algorithm
:abs(int(n))→n
:If n=0 Then
: 12345→seed1
: 67890→seed2
:Else
: mod(mult1*n,mod1)→seed1
: mod(n,mod2)→seed2
:EndIf
#The rand() Algorithm
:Local result
:mod(seed1*mult1,mod1)→seed1
:mod(seed2*mult2,mod2)→seed2
:(seed1-seed2)/mod1→result
:If result<0
: result+1→result
:Return result
I translated this into C++ for testing:
#include <iostream>
#include <iomanip>

using namespace std;

long mod1  = 2147483563;
long mod2  = 2147483399;
long mult1 = 40014;
long mult2 = 40692;
long seed1, seed2;

void Seed(int n) {
    if (n < 0) // Perform an abs
        n = -n;
    if (n == 0) {
        seed1 = 12345; // Gotta love these seed values!
        seed2 = 67890;
    } else {
        seed1 = (mult1*n) % mod1;
        seed2 = n % mod2;
    }
}

double Generate() {
    double result;

    seed1 = (seed1*mult1) % mod1;
    seed2 = (seed2*mult2) % mod2;
    result = (double)(seed1 - seed2)/(double)mod1;

    if (result < 0)
        result = result + 1;

    return result;
}

int main() {
    Seed(0);
    for (int i = 0; i < 10; i++)
        cout << setprecision(10) << Generate() << endl;
}
This gave the following results:
0.9435974025
0.908318861
0.1466878292
0.5147019502
0.405809642
0.7338123114
0.04399198747
0.3393625248
0.9954663411
0.2003402617
which match those achieved with the implementation based on the original paper.
I implemented rand, randInt, randM and randBin in Python. Thanks, Richard, for the C++ code. All the implemented commands work as expected. You can also find it in this Gist.
import math

class TIprng(object):
    def __init__(self):
        self.mod1 = 2147483563
        self.mod2 = 2147483399
        self.mult1 = 40014
        self.mult2 = 40692
        self.seed1 = 12345
        self.seed2 = 67890

    def seed(self, n):
        n = math.fabs(math.floor(n))
        if (n == 0):
            self.seed1 = 12345
            self.seed2 = 67890
        else:
            self.seed1 = (self.mult1 * n) % self.mod1
            self.seed2 = (n) % self.mod2

    def rand(self, times = 0):
        # like TI, this will return a list (array in python) if times >= 1,
        # or a single number if times isn't specified
        if not(times):
            self.seed1 = (self.seed1 * self.mult1) % self.mod1
            self.seed2 = (self.seed2 * self.mult2) % self.mod2
            result = (self.seed1 - self.seed2) / self.mod1
            if (result < 0):
                result = result + 1
            return result
        else:
            return [self.rand() for _ in range(times)]

    def randInt(self, minimum, maximum, times = 0):
        # like TI, this will return a list (array in python) if times >= 1,
        # or an integer if times isn't specified
        if not(times):
            if (minimum < maximum):
                return (minimum + math.floor((maximum - minimum + 1) * self.rand()))
            else:
                return (maximum + math.floor((minimum - maximum + 1) * self.rand()))
        else:
            return [self.randInt(minimum, maximum) for _ in range(times)]

    def randBin(self, numtrials, prob, times = 0):
        if not(times):
            return sum([(self.rand() < prob) for _ in range(numtrials)])
        else:
            return [self.randBin(numtrials, prob) for _ in range(times)]

    def randM(self, rows, columns):
        # this will return an array of arrays
        matrixArr = [[0 for x in range(columns)] for x in range(rows)]
        # we go from bottom to top, from right to left
        for row in reversed(range(rows)):
            for column in reversed(range(columns)):
                matrixArr[row][column] = self.randInt(-9, 9)
        return matrixArr

testPRNG = TIprng()

testPRNG.seed(0)
print(testPRNG.randInt(0, 100))
testPRNG.seed(0)
print(testPRNG.randM(3, 4))
The algorithm used by the TI-Basic rand command is L'Ecuyer's algorithm according to TIBasicDev.
rand generates a uniformly-distributed pseudorandom number (this page
and others will sometimes drop the pseudo- prefix for simplicity)
between 0 and 1. rand(n) generates a list of n uniformly-distributed
pseudorandom numbers between 0 and 1. seed→rand seeds (initializes)
the built-in pseudorandom number generator. The factory default seed
is 0.
L'Ecuyer's algorithm is used by TI calculators to generate
pseudorandom numbers.
Unfortunately I have not been able to find any source published by Texas Instruments backing up this claim, so I cannot say with certainty that this is the algorithm used. I am also uncertain what exactly is referred to by L'Ecuyer's algorithm.
Here is a C++ program that works:
#include<cmath>
#include<iostream>
#include<iomanip>
using namespace std;
int main()
{
double seed1 = 12345;
double seed2 = 67890;
double mod1 = 2147483563;
double mod2 = 2147483399;
double result;
for(int i=0; i<10; i++)
{
seed1 = seed1*40014-mod1*floor((seed1*40014)/mod1);
seed2 = seed2*40692-mod2*floor((seed2*40692)/mod2);
result = (seed1 - seed2)/mod1;
if(result < 0)
{result = result + 1;}
cout<<setprecision(10)<<result<<endl;
}
return 0;
}
I had a question in my assignment to find whether a number was a perfect square or not:
Perfect square is an element of algebraic structure which is equal to
the square of another element.
For example: 4, 9, 16 etc.
What my friends did is, if n is the number, they looped up to n calculating i * i:
// just a general gist
int is_square = 0;
for (int i = 2; i < n; i++)
{
    if ((i * i) == n)
    {
        std::cout << "Yes , it is";
        is_square = 1;
        break;
    }
}
if (is_square == 0)
{
    std::cout << "No, it is not";
}
I came up with a solution as shown below:
if (ceil(sqrt(n)) == floor(sqrt(n)))
{
    std::cout << "Yes , it is";
}
else
{
    std::cout << "no , it is not";
}
And it works properly.
Can it be called a more optimized solution than the others?
The tried and true remains:
double sqrt(double x); // from lib

bool is_sqr(long n) {
    long root = sqrt(n);
    return root * root == n;
}
You would need to know the complexity of the sqrt(x) function on your computer to compare it against other methods. By the way, you are calling sqrt(n) twice; consider storing it in a variable (even if the compiler probably does this for you).
If using something like Newton's method, the complexity of sqrt(x) is in O(M(d)) where M(d) measures the time required to multiply two d-digits numbers. Wikipedia
Your friend's method does (n - 2) multiplications (worst case scenario), thus its complexity is like O(n * M(x)) where x is a growing number.
Your version only uses sqrt() (ceil and floor can be ignored because of their constant complexity), which makes it O(M(n)).
O(M(n)) < O(n * M(x)) - Your version is more optimized than your friend's, but is not the most efficient. Have a look at interjay's link for a better approach.
#include <iostream>
using namespace std;

// Subtract successive odd numbers 1, 3, 5, ...; n is a perfect square exactly
// when the subtraction lands on 0, and ctr is then its square root.
void isPerfect(int n){
    int ctr = 0, i = 1;
    while (n > 0) {
        n -= i;
        i += 2;
        ctr++;
    }
    if (!n) cout << "\nSquare root = " << ctr;
    else cout << "\nNot a perfect square";
}

int main() {
    isPerfect(3);
    isPerfect(2025);
    return 0;
}
I don't know what limitations you have, but the definition of a perfect square number is clear:
Another way of saying that a (non-negative) number is a square number
is that its square root is again an integer
(from Wikipedia)
IF SQRT(n) == FLOOR(SQRT(n)) THEN
    WRITE "Yes it is"
ELSE
    WRITE "No it isn't"
Examples: sqrt(9) == floor(sqrt(9)), but sqrt(10) != floor(sqrt(10)).
I'll recommend this one
if(((unsigned long long)sqrt(Number))%1==0) // sqrt() from math library
{
//Code goes Here
}
It works because all integers (and only positive integers) are positive multiples of 1, and hence the solution.
You can also run a benchmark test, like I did with the following code in MSVC 2012:
#include <iostream>
#include <conio.h>
#include <time.h>
#include <math.h>
using namespace std;
void IsPerfect(unsigned long long Number);
void main()
{
clock_t Initial,Final;
unsigned long long Counter=0;
unsigned long long Start=Counter;
Start--;
cout<<Start<<endl;
Initial=clock();
for( Counter=0 ; Counter<=100000000 ; Counter++ )
{
IsPerfect( Counter );
}
Final=clock();
float Time((float)Final-(float)Initial);
cout<<"Calculations Done in "<< Time ;
getch();
}
void IsPerfect( unsigned long long Number)
{
if(ceil(sqrt(Number)) == floor(sqrt(Number)))
//if(((unsigned long long)sqrt(Number))%1==0) // sqrt() from math library
{
}
}
Your code took 13590 time units; mine took just 10049 time units.
Moreover, I'm using a few extra steps, namely the type conversion
(unsigned long long)sqrt(Number)
Without this it could do still better.
I hope it helps. Have a nice day.
Your solution is more optimized, but it may not work. Since sqrt(x) may return the true square root +/- epsilon, 3 different candidates must be tested:
bool isPerfect(long x){
    double k = round( sqrt(x) );
    return (x == (k-1)*(k-1)) || (x == k*k) || (x == (k+1)*(k+1));
}
This is simple Python code for finding whether a number is a perfect square or not:
import math
n = int(input())
giv = str(math.sqrt(n))
if (len(giv.split('.')[1]) == 1):
    print("yes")
else:
    print("No")
I saw the following interview question on some online forum. What is a good solution for this?
Get the last 1000 digits of 5^1234566789893943
Simple algorithm:
1. Maintain a 1000-digit array which will hold the answer at the end
2. Implement a multiplication routine like you do in school. It is O(d^2).
3. Use modular exponentiation by squaring.
Iterative exponentiation:
array ans = 1;
int a = 5;
while (p > 0) {
    if (p & 1) {
        ans = multiply(ans, a);
    }
    a = multiply(a, a);
    p = p >> 1;
}
multiply: multiplies two large numbers using the school method and returns the last 1000 digits.
Time complexity: O(d^2 * log p), where d is the number of trailing digits needed and p is the power.
A typical solution for this problem would be to use modular arithmetic and exponentiation by squaring to compute the remainder of 5^1234566789893943 when divided by 10^1000. In your case that would take about log(1234566789893943) multiplications of 1000-digit numbers, which is not too much, but I will propose a more general approach that works for greater values of the exponent.
You will have to use a bit more complicated number theory. You can use Euler's theorem to get the remainder of 5^1234566789893943 modulo 2^1000 a lot more efficiently; denote it r. It is also obvious that 5^1234566789893943 is divisible by 5^1000.
After that you need to find a number d such that 5^1000 * d = r (modulo 2^1000). To solve this equation you should compute 5^1000 (modulo 2^1000). After that, all that is left is division modulo 2^1000. Using Euler's theorem again, this can be done efficiently: use that x^(phi(2^1000) - 1) * x = 1 (modulo 2^1000). This approach is way faster and is the only feasible solution for really huge exponents.
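Here is a sketch of that approach in Python, mainly to check the arithmetic against the built-in three-argument pow; the variable names are mine:

E = 1234566789893943
M2 = 2 ** 1000        # modulus 2^1000
M5 = 5 ** 1000        # 5^E is divisible by this
PHI = 2 ** 999        # phi(2^1000)

# remainder of 5^E modulo 2^1000, reduced via Euler's theorem
r = pow(5, E % PHI, M2)
# d solves 5^1000 * d = r (mod 2^1000): divide by 5^1000 using Euler again
d = (r * pow(M5, PHI - 1, M2)) % M2
# reassemble: the answer is divisible by 5^1000 and congruent to r mod 2^1000
answer = (M5 * d) % 10 ** 1000

assert answer == pow(5, E, 10 ** 1000)
print(str(answer)[-10:])   # spot-check the last few digits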
The key phrase is "modular exponentiation". Python has that built in:
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> help(pow)
Help on built-in function pow in module builtins:
pow(...)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> digits = pow(5, 1234566789893943, 10**1000)
>>> len(str(digits))
1000
>>> digits
4750414775792952522204114184342722049638880929773624902773914715850189808476532716372371599198399541490535712666678457047950561228398126854813955228082149950029586996237166535637925022587538404245894713557782868186911348163750456080173694616157985752707395420982029720018418176528050046735160132510039430638924070731480858515227638960577060664844432475135181968277088315958312427313480771984874517274455070808286089278055166204573155093723933924226458522505574738359787477768274598805619392248788499020057331479403377350096157635924457653815121544961705226996087472416473967901157340721436252325091988301798899201640961322478421979046764449146045325215261829432737214561242087559734390139448919027470137649372264607375942527202021229200886927993079738795532281264345533044058574930108964976191133834748071751521214092905298139886778347051165211279789776682686753139533912795298973229094197221087871530034608077419911440782714084922725088980350599242632517985214513078773279630695469677448272705078125
>>>
The technique we need to know is exponentiation by squaring and modulus. We also need to use BigInteger in Java.
Simple code in Java:
BigInteger m = BigInteger.TEN.pow(1000); // BigInteger of 10^1000

BigInteger pow(BigInteger a, long b) {
    if (b == 0) {
        return BigInteger.ONE;
    }
    BigInteger val = pow(a, b/2);
    if (b % 2 == 0)
        return (val.multiply(val)).mod(m);
    else
        return (val.multiply(val).multiply(a)).mod(m);
}
In Java, the function modPow does it all for you (thank Java).
Use congruence and apply modular arithmetic.
Square and multiply algorithm.
If you divide any number in base 10 by 10 then the remainder represents the last digit, i.e. 23422222 = 2342222*10 + 2.
So we know:
5 = 5 (mod 10)
5^2 = 25 = 5 (mod 10)
5^4 = (5^2)*(5^2) = 5*5 = 5 (mod 10)
5^8 = (5^4)*(5^4) = 5*5 = 5 (mod 10)
... and keep going until you get to that exponent.
OR, you can realize that as we keep going you keep getting 5 as your remainder.
Convert the number to a string.
Loop on the string, starting at the last index up to 1000.
Then reverse the result string.
I posted a solution based on some hints here.
#include <vector>
#include <iostream>
using namespace std;
vector<char> multiplyArrays(const vector<char> &data1, const vector<char> &data2, int k) {
int sz1 = data1.size();
int sz2 = data2.size();
vector<char> result(sz1+sz2,0);
for(int i=sz1-1; i>=0; --i) {
char carry = 0;
for(int j=sz2-1; j>=0; --j) {
char value = data1[i] * data2[j]+result[i+j+1]+carry;
carry = value/10;
result[i+j+1] = value % 10;
}
result[i]=carry;
}
if(sz1+sz2>k){
vector<char> lastKElements(result.begin()+(sz1+sz2-k), result.end());
return lastKElements;
}
else
return result;
}
vector<char> calculate(unsigned long m, unsigned long n, int k) {
if(n == 0) {
return vector<char>(1, 1);
} else if(n % 2) { // odd number
vector<char> tmp(1, m);
vector<char> result1 = calculate(m, n-1, k);
return multiplyArrays(result1, tmp, k);
} else {
vector<char> result1 = calculate(m, n/2, k);
return multiplyArrays(result1, result1, k);
}
}
int main(int argc, char const *argv[]){
vector<char> v=calculate(5,8,1000);
for(auto c : v){
cout<<static_cast<unsigned>(c);
}
}
I don't know if Windows can show a big number (or if my computer is fast enough to show it), but I guess you COULD use this code as an algorithm:
ulong x = 5; //There are a lot of libraries for other languages like C/C++ that support super big numbers. In this case I'm using C#'s default `Uint64` number.
for(ulong i=1; i<1234566789893943; i++)
{
x = x * x; //I will make the multiplication raise power over here
}
string term = x.ToString(); //Store the number to a string. I remember strings can store up to 1 billion characters.
char[] number = term.ToCharArray(); //Array of all the digits
int tmp=0;
while(number[tmp]!='.') //This will search for the period.
tmp++;
tmp++; //After finding the period, I will start storing 1000 digits from this index of the char array
string thousandDigits = ""; //Here I will store the digits.
for (int i = tmp; i <= 1000+tmp; i++)
{
thousandDigits += number[i]; //Storing digits
}
Using this as a reference, I guess if you want to try getting the LAST 1000 characters of this array, change the for loop of the above code to this:
string thousandDigits = "";
for (int i = 1; i <= 1000; i++)
{
    thousandDigits += number[number.Length - i]; //Reverse array... ¿?
}
As I don't work with super super looooong numbers, I don't know if my computer can handle those. I tried the code and it works, but when I try to show the result in the console it just leaves the cursor flickering xD Guess it's still working. I don't have a pro processor. Try it if you want :P
I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems that I cannot achieve this with rand_r() or srand(), since these are state dependent, and I may need to compute the nth term in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
where:
n = the index of the term in the sequence, e.g. for the first term, n == 1
prime1 and prime2 are prime numbers, with prime1 > prime2
seed = some number which allows one to use the same function to produce a different series depending on the seed, but the same series for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this could point out the problems with it, or help me improve it.
EDIT - I don't care if it is predictable. I'm just trying to create some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
I stumbled on this a while back, looking for a solution to the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, it may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary-index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNG is the Linear Congruential Generator; basically, they work as follows:
random(n) = (m*random(n-1) + b) mod p
where m, b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p
= m^n*seed + b*(m^n - 1)/(m - 1) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n modulo p*(m - 1); however, if we take b = 0 (a sub-case of LCGs sometimes called multiplicative congruential generators), we have a much simpler solution, and can do our computations modulo p only.
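Before specializing to b = 0, here is a quick numeric check of the generic formula in Python, using small illustrative LCG constants of my own rather than the RANF parameters used below:

# Small illustrative LCG constants (not the RANF parameters used below).
m, b, p = 1103515245, 12345, 2 ** 31
seed = 42

def random_seq(n):
    x = seed % p
    for _ in range(n):
        x = (m * x + b) % p
    return x

def random_direct(n):
    # m^n is computed modulo p*(m - 1) so that (m^n - 1)/(m - 1) is exact mod p
    mn = pow(m, n, p * (m - 1))
    return (mn * seed + b * ((mn - 1) // (m - 1))) % p

assert all(random_seq(n) == random_direct(n) for n in range(200))
print("closed form matches sequential iteration")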
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and m = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
inline void setPosition(size_t index){
state = seed;
uint64_t mPower = m;
for (uint64_t b = 1; index; b <<= 1){
if (index & b){
state *= mPower;
index ^= b;
}
mPower *= mPower;
}
}
};
#include <cstdio>
void example(){
RANF R(1);
// Gets the number through random-access -- O(log(n))
R.setPosition(12345); // Goes to the nth random number
printf("fast nth number = %lu\n", R.getNext());
// Gets the number through standard, sequential access -- O(n)
R.setPosition(0);
for(size_t i = 0; i < 12345; i++) R.getNext();
printf("slow nth number = %lu\n", R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0000010000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
// Lookup table
struct lookup_t{
uint64_t v[3][65536];
constexpr lookup_t() : v() {
uint64_t mi = RANF::m;
for (size_t i = 0; i < 3; i++){
v[i][0] = 1;
uint64_t val = mi;
for (uint16_t j = 0x0001; j; j++){
v[i][j] = val;
val *= mi;
}
mi = val;
}
}
};
friend struct lookup_t;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
// Note: idx.u16 indices need to be adapted for big-endian machines!
inline void setPosition(size_t index){
static constexpr auto lookup = lookup_t();
union { uint16_t u16[4]; uint64_t u64; } idx;
idx.u64 = index;
state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
}
};
Basically, what this does is split the computation of, for example, m^0xAAAABBBBCCCC mod p into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^0xCCCC mod p) mod p, and then precompute tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
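A short Python check of that decomposition, using the same m and p as the class above:

m, p = 44485709377909, 2 ** 48
idx = 0xAAAABBBBCCCC
direct = pow(m, idx, p)
split = (pow(m, 0xAAAA * 2 ** 32, p) * pow(m, 0xBBBB * 2 ** 16, p) * pow(m, 0xCCCC, p)) % p
assert direct == split
print(hex(direct))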
RNGs in the normal sense have a sequence pattern like f(n) = S(f(n-1)).
They also lose precision at some point (e.g. through % mod), for computing convenience, so it is not possible to expand the sequence into a function like X(n) = f(n) = some trivial function of n only.
This means at best you get O(n) with that.
To target O(1) you therefore need to abandon the idea of f(n) = S(f(n-1)) and designate a trivial formula directly, so that the Nth number can be calculated without knowing the (N-1)th; this also renders the seed meaningless.
So, you end up with a simple algebraic function and not a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { return 3*n*n + 2*n + 7; }
If you want to put more constraints on the generated pattern (like its distribution), it becomes a complex maths problem.
However, for your original goal, if what you want is constant speed to get pseudo-random numbers, I suggest pre-generating them with a traditional RNG and accessing them with a lookup table.
EDIT: I noticed you have a concern about the table size for a lot of numbers; however, you may introduce a hybrid model, like a table of N entries, and do f(k) = g(tbl[k % N], k), which at least provides a good distribution across each run of N consecutive indices. A sketch of this hybrid idea follows below.
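Here is a hedged sketch of that hybrid idea in Python; the table size and the mixing function are my own illustrative choices, not something prescribed by this answer:

import random

TABLE_SIZE = 1024
random.seed(12345)                        # fixed seed: same table every run
tbl = [random.getrandbits(32) for _ in range(TABLE_SIZE)]

def mix(v, k):
    # cheap integer mixing of the table value with the index
    x = (v ^ (k * 0x9E3779B1)) & 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 13
    return x

def f(k):
    # O(1) for any index k, and repeatable for a given table
    return mix(tbl[k % TABLE_SIZE], k)

print([f(i) for i in range(5)])
print(f(1234567) == f(1234567))   # True: same index, same value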
This demonstrates a PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but here I avoided using cryptographically secure primitives, both for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
static unsigned JenkinsHash(const void *input, unsigned len) {
unsigned hash = 0;
for(unsigned i=0; i<len; ++i) {
hash += static_cast<const unsigned char*>(input)[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
unsigned long long m_counter;
void IncrementCounter() { ++m_counter; }
public:
unsigned long long GetSeed() const {
return m_counter;
}
void SetSeed(unsigned long long new_seed) {
m_counter = new_seed;
}
unsigned int operator ()() {
// the next random number is generated here
const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
IncrementCounter();
return r;
}
// the default constructor uses time()
// to seed the counter
CHashedCounterRng() : m_counter(time(0)) {}
// you can supply a predetermined seed here,
// or after construction with SetSeed(seed)
CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};
int main() {
CHashedCounterRng rng;
// time()'s high bits change very slowly, so look at low digits
// if you want to verify that the seed is different between runs
const auto stored_counter = rng.GetSeed();
cout << "initial seed: " << stored_counter << endl;
for(int i=0; i<20; ++i) {
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl;
}
cout << endl;
cout << "The last line again:" << endl;
rng.SetSeed(stored_counter + 19 * 8);
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl << endl;
return 0;
}