How to determine big O complexity for mt19937 algorithm

// Create a length n array to store the state of the generator
int[0..n-1] MT
int index := n+1
const int lower_mask = (1 << r) - 1 // That is, the binary number of r 1's
const int upper_mask = lowest w bits of (not lower_mask)
// Initialize the generator from a seed
function seed_mt(int seed) {
    index := n
    MT[0] := seed
    for i from 1 to (n - 1) { // loop over each element
        MT[i] := lowest w bits of (f * (MT[i-1] xor (MT[i-1] >> (w-2))) + i)
    }
}
// Extract a tempered value based on MT[index]
// calling twist() every n numbers
function extract_number() {
    if index >= n {
        if index > n {
            error "Generator was never seeded"
            // Alternatively, seed with constant value; 5489 is used in reference C code
        }
        twist()
    }
    int y := MT[index]
    y := y xor ((y >> u) and d)
    y := y xor ((y << s) and b)
    y := y xor ((y << t) and c)
    y := y xor (y >> l)
    index := index + 1
    return lowest w bits of (y)
}
// Generate the next n values from the series x_i
function twist() {
    for i from 0 to (n-1) {
        int x := (MT[i] and upper_mask)
                 + (MT[(i+1) mod n] and lower_mask)
        int xA := x >> 1
        if (x mod 2) != 0 { // lowest bit of x is 1
            xA := xA xor a
        }
        MT[i] := MT[(i + m) mod n] xor xA
    }
    index := 0
}
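One way to reason about the cost of the pseudocode above is to transcribe it and count the work. The sketch below is a Python transcription, assuming the usual 32-bit MT19937 constants (the pseudocode leaves w, n, m, r, a, u, d, s, b, t, c, l and f symbolic); the class name and the twists counter are added purely for illustration. Seeding and twist() each loop once over the n state words, so they are O(n). extract_number() does a fixed number of shifts and masks and triggers twist() once every n calls, so the amortized cost per output is O(1). Because n is a fixed constant (624 here), every operation is O(1) with respect to any input size.

```python
# Assumed parameter set: the common 32-bit MT19937 constants.
W, N, M, R = 32, 624, 397, 31
A = 0x9908B0DF
U, D = 11, 0xFFFFFFFF
S, B = 7, 0x9D2C5680
T, C = 15, 0xEFC60000
L = 18
F = 1812433253
WORD_MASK = (1 << W) - 1
LOWER_MASK = (1 << R) - 1
UPPER_MASK = WORD_MASK & ~LOWER_MASK


class MT19937:
    def __init__(self, seed=5489):
        self.twists = 0            # illustration only: counts calls to twist()
        self.mt = [0] * N
        self.index = N
        self.mt[0] = seed & WORD_MASK
        for i in range(1, N):      # O(n) seeding loop
            prev = self.mt[i - 1]
            self.mt[i] = (F * (prev ^ (prev >> (W - 2))) + i) & WORD_MASK

    def twist(self):
        self.twists += 1
        for i in range(N):         # O(n), but runs only once per n outputs
            x = (self.mt[i] & UPPER_MASK) | (self.mt[(i + 1) % N] & LOWER_MASK)
            xa = x >> 1
            if x & 1:
                xa ^= A
            self.mt[i] = self.mt[(i + M) % N] ^ xa
        self.index = 0

    def extract_number(self):
        if self.index >= N:
            self.twist()
        y = self.mt[self.index]    # tempering: constant number of operations
        y ^= (y >> U) & D
        y ^= (y << S) & B
        y ^= (y << T) & C
        y ^= y >> L
        self.index += 1
        return y & WORD_MASK


gen = MT19937()
values = [gen.extract_number() for _ in range(3 * N)]
print(gen.twists)  # 3 twists for 3*n outputs
```

Drawing 3n numbers triggers exactly three twists, which is the amortization argument in concrete form.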

Related

Finding sum of geometric sequence with modulo 10^9+7 with my program

The problem is given as:
Output the answer of (A^1+A^2+A^3+...+A^K) modulo 1,000,000,007, where 1 ≤ A, K ≤ 10^9, and A and K are integers.
I am trying to write a program to compute this. I tried using the closed-form formula for a geometric series and then applying the modulo to the answer. Since the result must be an integer as well, finding a modular inverse is not required.
Below is the code I have now; it's in Pascal:
Var
  a,k,i:longint;
  power,sum: int64;
Begin
  Readln(a,k);
  power := 1;
  For i := 1 to k do
    power := ((power mod 1000000007) * a) mod 1000000007;
  sum := a * (power-1) div (a-1);
  Writeln(sum mod 1000000007);
End.
This task came from my school, which does not give its test data to students, so I do not know why or where my program is wrong. I only know that it outputs the wrong answer for their test data.

If you want to do this without calculating a modular inverse, you can calculate it recursively using:
1 + A + A^2 + A^3 + ... + A^k
= 1 + (A + A^2)(1 + A^2 + (A^2)^2 + ... + (A^2)^(k/2-1))
That's for even k. For odd k:
1 + A + A^2 + A^3 + ... + A^k
= (1 + A)(1 + A^2 + (A^2)^2 + ... + (A^2)^((k-1)/2))
Since k is divided by 2 in each recursive call, the resulting algorithm has O(log k) complexity. In Java:
static int modSumAtoAk(int A, int k, int mod)
{
    return (modSum1ToAk(A, k, mod) + mod-1) % mod;
}
static int modSum1ToAk(int A, int k, int mod)
{
    long sum;
    if (k < 5) {
        //k is small -- just iterate
        sum = 0;
        long x = 1;
        for (int i=0; i<=k; ++i) {
            sum = (sum+x) % mod;
            x = (x*A) % mod;
        }
        return (int)sum;
    }
    //k is big
    int A2 = (int)( ((long)A)*A % mod );
    if ((k%2)==0) {
        // k even
        sum = modSum1ToAk(A2, (k/2)-1, mod);
        sum = (sum + sum*A) % mod;
        sum = ((sum * A) + 1) % mod;
    } else {
        // k odd
        sum = modSum1ToAk(A2, (k-1)/2, mod);
        sum = (sum + sum*A) % mod;
    }
    return (int)sum;
}
Note that I've been very careful to make sure that each product is done in 64 bits, and to reduce by the modulus after each one.
With a little math, the above can be converted to an iterative version that doesn't require any storage:
static int modSumAtoAk(int A, int k, int mod)
{
    // first, we calculate the sum of all 1... A^k
    // we'll refer to that as SUM1 in comments below
    long fac=1;
    long add=0;
    //INVARIANT: SUM1 = add + fac*(sum 1...A^k)
    //this will remain true as we change k
    while (k > 0) {
        //above INVARIANT is true here, too
        long newmul, newadd;
        if ((k%2)==0) {
            //k is even. sum 1...A^k = 1+A*(sum 1...A^(k-1))
            newmul = A;
            newadd = 1;
            k-=1;
        } else {
            //k is odd.
            newmul = A+1L;
            newadd = 0;
            A = (int)(((long)A) * A % mod);
            k = (k-1)/2;
        }
        //SUM1 = add + fac * (newadd + newmul*(sum 1...A^k))
        //     = add+fac*newadd + fac*newmul*(sum 1...A^k)
        add = (add+fac*newadd) % mod;
        fac = (fac*newmul) % mod;
        //INVARIANT is restored
    }
    // k == 0
    long sum1 = fac + add;
    return (int)((sum1 + mod -1) % mod);
}
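For comparison, the even/odd recurrence above can be sketched in a few lines of Python and checked against a direct loop on small inputs. The function names are made up for this illustration. It may also hint at why the program in the question fails: once power has been reduced modulo 1,000,000,007, dividing (power-1) by (a-1) with integer division no longer gives the true quotient, and a = 1 divides by zero.

```python
MOD = 1_000_000_007


def sum_1_to_ak(a, k, mod=MOD):
    """(1 + a + a^2 + ... + a^k) mod `mod`, using O(log k) recursive calls."""
    if k == 0:
        return 1 % mod
    a2 = a * a % mod
    if k % 2 == 0:
        # even k: 1 + (a + a^2) * (1 + a^2 + ... + (a^2)^(k/2 - 1))
        return (1 + (a + a2) * sum_1_to_ak(a2, k // 2 - 1, mod)) % mod
    # odd k: (1 + a) * (1 + a^2 + ... + (a^2)^((k-1)/2))
    return (1 + a) * sum_1_to_ak(a2, (k - 1) // 2, mod) % mod


def sum_a_to_ak(a, k, mod=MOD):
    """(a^1 + ... + a^k) mod `mod`: drop the leading 1 from the sum above."""
    return (sum_1_to_ak(a, k, mod) - 1) % mod


print(sum_a_to_ak(2, 10))  # 2 + 4 + ... + 1024 = 2046
```

Python integers do not overflow, so the 64-bit care taken in the Java version is not needed here; the recursion depth is about log2(k), roughly 30 for k up to 10^9.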

Express a given number as a sum of four squares

I am looking for an algorithm that expresses a given number as a sum of (up to) four squares.
Examples
       120 = 8^2 + 6^2 + 4^2 + 2^2
       6 = 0^2 + 1^2 + 1^2 + 2^2
       20 = 4^2 + 2^2 + 0^2 + 0^2
My approach
Take the integer square root, subtract its square from N, and repeat for the remainder:
while (count != 4) {
    root = (int) Math.sqrt(N);
    N -= root * root;
    count++;
}
But this fails when N is 23, even though there is a solution:
       3^2 + 3^2 + 2^2 + 1^2
Question
Is there any other algorithm to do that?
Is it always possible?

Always possible?
Yes. Lagrange's four-square theorem states that:
every natural number can be represented as the sum of four integer squares.
It has been proved in several ways.
Algorithm
There are some smarter algorithms, but I would suggest the following algorithm:
Factorise the number into factors. They don't have to be prime, but the smaller they are, the better, so primes are best. Then solve the task for each of these factors as described below, and combine each resulting quadruple of squares with the previously found quadruple using Euler's four-square identity:
(a^2 + b^2 + c^2 + d^2)(A^2 + B^2 + C^2 + D^2) =
       (aA + bB + cC + dD)^2 +
       (aB − bA + cD − dC)^2 +
       (aC − bD − cA + dB)^2 +
       (aD + bC − cB − dA)^2
1. Given a number n (one of the factors mentioned above), take the greatest square that is not greater than n, and check whether n minus this square can be written as a sum of three squares using Legendre's three-square theorem: this is possible if and only if the number is NOT of the following form:
       4^a(8b+7)
If this square is not suitable, try the next smaller one, and so on, until you find one. It is guaranteed that there will be one, and most are found within a few retries.
2. Find a second square term in the same way as in step 1, but now test its viability using Fermat's theorem on sums of two squares, which by extension means that:
if all the prime factors of n congruent to 3 modulo 4 occur to an even exponent, then n is expressible as a sum of two squares. The converse also holds.
If this square is not suitable, try the next smaller one, and so on, until you find one. It is guaranteed that there will be one.
3. Now we have a remainder after subtracting two squares. Try subtracting a third square until that yields another square, which means we have a solution. This step can be improved by first factoring out the largest square divisor. Then, once the two square terms are identified, each can be multiplied again by the square root of that square divisor.
This is roughly the idea. For finding prime factors there are several solutions. Below I will just use the Sieve of Eratosthenes.
This is JavaScript code, so you can run it immediately -- it will produce a random number as input and display it as the sum of four squares:
function divisor(n, factor) {
    var divisor = 1;
    while (n % factor == 0) {
        n = n / factor;
        divisor = divisor * factor;
    }
    return divisor;
}
function getPrimesUntil(n) {
    // Prime sieve algorithm
    var range = Math.floor(Math.sqrt(n)) + 1;
    var isPrime = Array(n).fill(1);
    var primes = [2];
    for (var m = 3; m < range; m += 2) {
        if (isPrime[m]) {
            primes.push(m);
            for (var k = m * m; k <= n; k += m) {
                isPrime[k] = 0;
            }
        }
    }
    for (var m = range + 1 - (range % 2); m <= n; m += 2) {
        if (isPrime[m]) primes.push(m);
    }
    return {
        primes: primes,
        factorize: function (n) {
            var p, count, primeFactors;
            // Trial division algorithm
            if (n < 2) return [];
            primeFactors = [];
            for (p of this.primes) {
                count = 0;
                while (n % p == 0) {
                    count++;
                    n /= p;
                }
                if (count) primeFactors.push({value: p, count: count});
            }
            if (n > 1) {
                primeFactors.push({value: n, count: 1});
            }
            return primeFactors;
        }
    }
}
function squareTerms4(n) {
    var n1, n2, n3, n4, sq, sq1, sq2, sq3, sq4, primes, factors, f, f3, factors3, ok,
        res1, res2, res3, res4;
    primes = getPrimesUntil(n);
    factors = primes.factorize(n);
    res1 = n > 0 ? 1 : 0;
    res2 = res3 = res4 = 0;
    for (f of factors) { // For each of the factors:
        n1 = f.value;
        // 1. Find a suitable first square
        for (sq1 = Math.floor(Math.sqrt(n1)); sq1>0; sq1--) {
            n2 = n1 - sq1*sq1;
            // A number can be written as a sum of three squares
            // <==> it is NOT of the form 4^a(8b+7)
            if ( (n2 / divisor(n2, 4)) % 8 !== 7 ) break; // found a possibility
        }
        // 2. Find a suitable second square
        for (sq2 = Math.floor(Math.sqrt(n2)); sq2>0; sq2--) {
            n3 = n2 - sq2*sq2;
            // A number can be written as a sum of two squares
            // <==> all its prime factors of the form 4a+3 have an even exponent
            factors3 = primes.factorize(n3);
            ok = true;
            for (f3 of factors3) {
                ok = (f3.value % 4 != 3) || (f3.count % 2 == 0);
                if (!ok) break;
            }
            if (ok) break;
        }
        // To save time: extract the largest square divisor from the previous factorisation:
        sq = 1;
        for (f3 of factors3) {
            sq *= Math.pow(f3.value, (f3.count - f3.count % 2) / 2);
            f3.count = f3.count % 2;
        }
        n3 /= sq*sq;
        // 3. Find a suitable third square
        sq4 = 0;
        // b. Find square for the remaining value:
        for (sq3 = Math.floor(Math.sqrt(n3)); sq3>0; sq3--) {
            n4 = n3 - sq3*sq3;
            // See if this yields a sum of two squares:
            sq4 = Math.floor(Math.sqrt(n4));
            if (n4 == sq4*sq4) break; // YES!
        }
        // Incorporate the square divisor back into the step-3 result:
        sq3 *= sq;
        sq4 *= sq;
        // 4. Merge this quadruple of squares with any previous
        // quadruple we had, using the Euler square identity:
        while (f.count--) {
            [res1, res2, res3, res4] = [
                Math.abs(res1*sq1 + res2*sq2 + res3*sq3 + res4*sq4),
                Math.abs(res1*sq2 - res2*sq1 + res3*sq4 - res4*sq3),
                Math.abs(res1*sq3 - res2*sq4 - res3*sq1 + res4*sq2),
                Math.abs(res1*sq4 + res2*sq3 - res3*sq2 - res4*sq1)
            ];
        }
    }
    // Return the 4 squares in descending order (for convenience):
    return [res1, res2, res3, res4].sort( (a,b) => b-a );
}
// Produce the result for some random input number
var n = Math.floor(Math.random() * 1000000);
var solution = squareTerms4(n);
// Perform the sum of squares to see it is correct:
var check = solution.reduce( (a,b) => a+b*b, 0 );
if (check !== n) throw "FAILURE: difference " + n + " - " + check;
// Print the result
console.log(n + ' = ' + solution.map( x => x+'²' ).join(' + '));
The article by Michael Barr on the subject probably describes a more time-efficient method, but the text is intended more as a proof than as an algorithm. However, if you need more time efficiency, you could consider it, together with a more efficient factorisation algorithm.
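The merging step (step 4 in the code comments) is the part that is easiest to get wrong by a sign, so here is a small Python sketch of just that step, with the same sign pattern as the identity quoted earlier. The helper names combine and sum_sq are invented for this illustration and are not part of the JavaScript answer.

```python
def sum_sq(q):
    """Sum of the squares of a quadruple."""
    return sum(x * x for x in q)


def combine(p, q):
    """Merge two four-square representations via Euler's four-square identity.

    If sum_sq(p) == m and sum_sq(q) == n, the result r satisfies
    sum_sq(r) == m * n. Absolute values are taken because only the
    squares matter.
    """
    a, b, c, d = p
    A, B, C, D = q
    return (
        abs(a * A + b * B + c * C + d * D),
        abs(a * B - b * A + c * D - d * C),
        abs(a * C - b * D - c * A + d * B),
        abs(a * D + b * C - c * B - d * A),
    )


six = (2, 1, 1, 0)       # 6  = 2^2 + 1^2 + 1^2 + 0^2
twenty = (4, 2, 0, 0)    # 20 = 4^2 + 2^2 + 0^2 + 0^2
merged = combine(six, twenty)
print(merged, sum_sq(merged))  # (10, 0, 4, 2) 120
```

The merged quadruple is a valid representation of 120, though not the same one as in the question's example; representations are generally not unique.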

It's always possible -- it's a theorem in number theory called "Lagrange's four square theorem."
To solve it efficiently: the paper Randomized algorithms in number theory (Rabin, Shallit) gives a method that runs in expected O((log n)^2) time.
There is interesting discussion about the implementation here: https://math.stackexchange.com/questions/483101/rabin-and-shallit-algorithm
Found via Wikipedia: Lagrange's four-square theorem.

Here is a solution using four simple nested loops:
max = square_root(N)
for (int i = 0; i <= max; i++)
    for (int j = 0; j <= max; j++)
        for (int k = 0; k <= max; k++)
            for (int l = 0; l <= max; l++)
                if (i*i + j*j + k*k + l*l == N) {
                    found
                    break;
                }
This way you can test any number. After the first two loops you can also add a break condition: if the partial sum already exceeds N, break out.
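The pruning hinted at above can be sketched in Python as follows. This is a variation, not the exact loops from the answer: each loop is capped by the square root of what is still left, the terms are kept in non-increasing order to skip permutations, and the fourth loop is replaced by a single integer square root. The function name is made up for the example, and math.isqrt needs Python 3.8 or later.

```python
from math import isqrt


def four_squares(n):
    """Return (i, j, k, l) with i >= j >= k >= l >= 0 and
    i*i + j*j + k*k + l*l == n, found by pruned exhaustive search."""
    for i in range(isqrt(n), -1, -1):
        r1 = n - i * i                     # what is left after the first square
        for j in range(min(i, isqrt(r1)), -1, -1):
            r2 = r1 - j * j
            for k in range(min(j, isqrt(r2)), -1, -1):
                r3 = r2 - k * k
                l = isqrt(r3)              # only candidate for the last term
                if l <= k and l * l == r3:
                    return (i, j, k, l)
    return None  # not expected for n >= 0, by Lagrange's theorem


print(four_squares(23))  # (3, 3, 2, 1)
```

This is still a brute-force search, so it is only meant for moderately sized n.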

const fourSquares = (n) => {
const result = [];
for (let i = 0; i <= n; i++) {
for (let j = 0; j <= n; j++) {
for (let k = 0; k <= n; k++) {
for (let l = 0; l <= n; l++) {
if (i * i + j * j + k * k + l * l === n) {
result.push(i, j, k, l);
return result;
}
}
}
}
}
return result;
}
It's running too long
const fourSquares = (n) => {
const result = [];
for (let i = 0; i <= n; i++) {
for (let j = 0; j <= (n - i * i); j++) {
for (let k = 0; k <= (n - i * i - j * j); k++) {
for (let l = 0; l <= (n - i * i - j * j - k * k); l++) {
if (i * i + j * j + k * k + l * l === n) {
result.push(i, j, k, l);
return result;
}
}
}
}
}
return result;
}

const fourSquares = (n) => {
const result = [];
for (let i = 0; i * i <= n; i++) {
for (let j = 0; j * j <= n; j++) {
for (let k = 0; k * k <= n; k++) {
for (let l = 0; l * l <= n; l++) {
if (i * i + j * j + k * k + l * l === n) {
result.push(i, j, k, l);
return result;
}
}
}
}
}
return result;
}
const fourSquares = (n) => {
let a = Math.sqrt(n);
let b = Math.sqrt(n - a * a);
let c = Math.sqrt(n - a * a - b * b);
let d = Math.sqrt(n - a * a - b * b - c * c);
if (n === a * a + b * b + c * c + d * d) {
return [a, b, c, d];
}
}

Smallest number in a range [a,b] with maximum number of '1' in binary representation

Given a range [a,b] (both inclusive), I need to find the smallest number with the maximum number of '1's in its binary representation. My current approach is to count the bits set in every number from a to b and keep track of the maximum.
However, this is very slow. Is there a faster method?

Let's find the most significant bit in which a and b differ. It will be 0 in a and 1 in b. If we take a and set all the bits to the right of that bit to 1, the resulting number is still in the range [a; b], and it is the single number with the maximum number of ones in its representation.
EDIT. This algorithm always returns a number with n-1 bits set to one, where n is the number of bits that can be changed. As pointed out in the comments, there is a bug in the case where all of these n bits in b are set to 1. Here is the fixed code snippet:
int maximizeBits(int a, int b) {
    if (a == b) {
        return a;
    }
    int m = a ^ b, pow2 = 1; // MSB of m=a^b is bit that we need to find
    while (m > pow2) { // Set other bits to 0
        if ((m & pow2) != 0) {
            m ^= pow2;
        }
        pow2 <<= 1;
    }
    int res = a | (m - 1); // Now m is in form of 2^n and m - 1 would be mask of n-1 bits
    if ((res | b) <= b) { // Fix of problem if all n bits in b are set to 1
        res = b;
    }
    return res;
}

You can replace the loop in Jarlax' answer by a "parallel suffix OR", like this
uint32_t m = (a ^ b) >> 1;
m |= m >> 1;
m |= m >> 2;
m |= m >> 4;
m |= m >> 8;
m |= m >> 16;
uint32_t res = a | m;
if ((res | b) <= b)
res = b;
return res;
It generalizes to other integer sizes, using ceil(log2(k)) shift-and-OR steps for a k-bit integer. The initial test a == b is not necessary: a ^ b would be zero, therefore m is zero, so nothing interesting happens anyway.
Alternatively, here's a completely different approach: keep changing the lowest 0 to a 1 until it is no longer possible.
unsigned x = a;
while (x < b) {
    unsigned newx = (x + 1) | x; // set lowest 0
    if (newx <= b)
        x = newx;
    else
        break;
}
return x;
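To build some confidence in the "set the lowest 0" approach, here is a hedged Python sketch that cross-checks it against the slow scan described in the question. Both function names are invented for this example.

```python
def max_ones_smallest(a, b):
    """Smallest number in [a, b] with the maximum number of 1 bits,
    found by repeatedly setting the lowest 0 bit while staying <= b."""
    x = a
    while x < b:
        nxt = x | (x + 1)   # x with its lowest 0 bit set to 1
        if nxt > b:
            break
        x = nxt
    return x


def brute_force(a, b):
    """Reference answer: scan the whole range, prefer more 1 bits,
    and break ties in favour of the smaller number."""
    return max(range(a, b + 1), key=lambda v: (bin(v).count("1"), -v))


print(max_ones_smallest(5, 8))  # 7 (binary 111)
```

The loop runs at most once per bit of b, so it is O(log b), whereas the brute-force scan is linear in the size of the range.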

Fast Iterative GCD

I need GCD(n, i) for every i from 1 up to n, with i increasing by 1 in a loop. Is there an algorithm that calculates all of these GCDs faster than naively computing each one with the Euclidean algorithm?
PS: I've noticed that if n is prime, every number from 1 to n-1 gives 1, because a prime number is coprime to all of them. Any ideas for numbers other than primes?

C++ implementation, works in O(n * log log n) (assuming arithmetic on integers takes O(1) time):
#include <cstdio>
#include <cstring>
using namespace std;
void find_gcd(int n, int *gcd) {
    // divisor[x] - any prime divisor of x
    // or 0 if x == 1 or x is prime
    int *divisor = new int[n + 1];
    memset(divisor, 0, (n + 1) * sizeof(int));
    // This is almost a copy-paste of the sieve of Eratosthenes, but instead of
    // just marking a number as 'non-prime' we remember its divisor.
    // O(n * log log n)
    for (int x = 2; x * x <= n; ++x) {
        if (divisor[x] == 0) {
            for (int y = x * x; y <= n; y += x) {
                divisor[y] = x;
            }
        }
    }
    for (int x = 1; x <= n; ++x) {
        if (n % x == 0) gcd[x] = x;
        else if (divisor[x] == 0) gcd[x] = 1; // x is prime, and does not divide n (previous line)
        else {
            int a = x / divisor[x], p = divisor[x]; // x == a * p
            // gcd(a * p, n) = gcd(a, n) * gcd(p, n / gcd(a, n))
            // gcd(p, n / gcd(a, n)) == 1 or p
            gcd[x] = gcd[a];
            if ((n / gcd[a]) % p == 0) gcd[x] *= p;
        }
    }
    delete[] divisor; // release the sieve table
}
int main() {
    int n;
    scanf("%d", &n);
    int *gcd = new int[n + 1];
    find_gcd(n, gcd);
    for (int x = 1; x <= n; ++x) {
        printf("%d:\t%d\n", x, gcd[x]);
    }
    delete[] gcd;
    return 0;
}
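The recurrence used in that answer, gcd(a*p, n) = gcd(a, n) * gcd(p, n / gcd(a, n)), can be checked on small inputs with a short Python sketch. This is a transcription for illustration (the function name gcd_table is made up), compared against math.gcd.

```python
from math import gcd


def gcd_table(n):
    """Return a list g with g[x] == gcd(x, n) for 1 <= x <= n (g[0] is unused)."""
    # divisor[x] is some prime divisor of x, or 0 if x is 1 or prime
    divisor = [0] * (n + 1)
    x = 2
    while x * x <= n:
        if divisor[x] == 0:                     # x is prime
            for y in range(x * x, n + 1, x):
                divisor[y] = x
        x += 1

    g = [0] * (n + 1)
    for x in range(1, n + 1):
        if n % x == 0:
            g[x] = x
        elif divisor[x] == 0:
            g[x] = 1                            # x is a prime not dividing n
        else:
            p = divisor[x]
            a = x // p                          # x == a * p, and a < x
            g[x] = g[a]
            if (n // g[a]) % p == 0:
                g[x] *= p
    return g


table = gcd_table(12)
print(table[1:])  # [1, 2, 3, 4, 1, 6, 1, 4, 3, 2, 1, 12]
```

Each g[x] is derived from the already computed g[a] with a < x, so the second loop does O(1) work per entry.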

SUMMARY
The possible answers for the gcd consist of the factors of n.
You can compute these efficiently as follows.
ALGORITHM
First factorise n into a product of prime factors, i.e. n=p1^n1*p2^n2*..*pk^nk.
Then you can loop over all factors of n and for each factor of n set the contents of the GCD array at that position to the factor.
If you make sure that the factors are done in a sensible order (e.g. sorted) you should find that the array entries that are written multiple times will end up being written with the highest value (which will be the gcd).
CODE
Here is some Python code to do this for the number 1400=2^3*5^2*7:
prime_factors = [2, 5, 7]
prime_counts = [3, 2, 1]
N = 1
for prime, count in zip(prime_factors, prime_counts):
    N *= prime**count
GCD = [0]*(N+1)
GCD[0] = N
def go(i, n):
    """Try all counts for prime_factors[i]"""
    if i == len(prime_factors):
        # n is a factor of N: write it at every multiple of n
        for x in range(n, N+1, n):
            GCD[x] = n
        return
    n2 = n
    for c in range(prime_counts[i]+1):
        go(i+1, n2)
        n2 *= prime_factors[i]
go(0, 1)
print(N, GCD)

The binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
is faster than the Euclidean algorithm:
https://en.wikipedia.org/wiki/Euclidean_algorithm
I implemented "gcd()" in C for type "__uint128_t" (with gcc on an Intel i7 running Ubuntu), based on the iterative Rust version:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm#Iterative_version_in_Rust
The number of trailing 0s is determined efficiently with "__builtin_ctzll()". I benchmarked 1 million loops over the two biggest 128-bit Fibonacci numbers (they result in the maximal number of iterations) against gmplib's "mpz_gcd()" and saw a 10% slowdown. Utilizing the fact that the u/v values only decrease, I switched to the 64-bit special case "_gcd()" once the values are "<=UINT64_MAX", and now see a speedup of 1.31 over gmplib. For details see:
https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=311893&p=1873552#p1873552
#include <stdint.h> // UINT64_MAX
// Count trailing zero bits of a nonzero 128-bit value
static inline int ctz(__uint128_t u)
{
    unsigned long long h = u;
    return (h!=0) ? __builtin_ctzll( h )
                  : 64 + __builtin_ctzll( u>>64 );
}
// 64-bit special case; expects odd, nonzero u and v
unsigned long long _gcd(unsigned long long u, unsigned long long v)
{
    for(;;) {
        if (u > v) { unsigned long long a=u; u=v; v=a; }
        v -= u;
        if (v == 0) return u;
        v >>= __builtin_ctzll(v);
    }
}
__uint128_t gcd(__uint128_t u, __uint128_t v)
{
    if (u == 0) { return v; }
    else if (v == 0) { return u; }
    int i = ctz(u); u >>= i;
    int j = ctz(v); v >>= j;
    int k = (i < j) ? i : j;
    for(;;) {
        if (u > v) { __uint128_t a=u; u=v; v=a; }
        // widen to 128 bits before shifting: k can be 64 or more
        if (v <= UINT64_MAX) return (__uint128_t)_gcd(u, v) << k;
        v -= u;
        if (v == 0) return u << k;
        v >>= ctz(v);
    }
}
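For readers who just want to see the control flow without the 128-bit details, here is a minimal Python sketch of the same binary GCD loop, checked against math.gcd. It is an illustration only; the helper names are assumptions, and Python's arbitrary-precision integers make the 64-bit fast path unnecessary.

```python
from math import gcd


def ctz(x):
    """Number of trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


def binary_gcd(u, v):
    """Binary (Stein) GCD for non-negative integers."""
    if u == 0:
        return v
    if v == 0:
        return u
    i, j = ctz(u), ctz(v)
    u >>= i                 # make u odd
    v >>= j                 # make v odd
    k = min(i, j)           # common power of two, restored at the end
    while True:
        if u > v:
            u, v = v, u
        v -= u              # odd - odd = even (or zero)
        if v == 0:
            return u << k
        v >>= ctz(v)        # strip factors of two, v is odd again


print(binary_gcd(48, 180))  # 12
```

Consecutive Fibonacci numbers, as used in the benchmark above, are a convenient stress input for a check like this.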

Segment tree Lazy propagation & my code

I am currently solving a problem on segment trees. I think the problem needs lazy propagation to be solved. As I'm very new to this concept, I'm having trouble with my code.
The problem in a nutshell is as follows:
Initially, all array elements are 0, and they are indexed 0 to N-1.
Command 1: 0 x y v - increases the value of each array element with index between x and y by v.
Command 2: 1 x y - outputs the sum of all array elements with index between x and y.
Input starts with an integer T (≤ 5), denoting the number of test cases.
Each case contains two integers n (1 ≤ n ≤ 10^5) and q (1 ≤ q ≤ 50000). Each of the next q lines contains a task in one of the following forms:
0 x y v (0 ≤ x ≤ y < n, 1 ≤ v ≤ 1000)
1 x y (0 ≤ x ≤ y < n)
For each case, print the case number first. Then for each query '1 x y', print the sum of all the array elements between x and y.
Here is my attempt:
template<class T>
class SegmentTree
{
T *tree,*update_tree;
long size;
public:
SegmentTree(long N)
{
long x= (long)ceil(log2(N))+1;
long size = 2*(long)pow(2,x);
tree = new T[size];
update_tree = new T[size];
memset(tree,0,sizeof(tree));
memset(update_tree,0,sizeof(update_tree));
}
void update(long node, long start, long end, long i, long j, long val)
{
if(start>j || end<i) return;
if(start>=i && end<=j){
if(start==end){
tree[node]+=val;
return;
}
tree[node]+=val;
update_tree[2*node] += val;
update_tree[2*node+1]+=val;
return;
}
long mid = (start+end)/2;
update(2*node,start,mid,i,j,val);
update(2*node+1,mid+1,end,i,j,val);
}
T query(long node, long start, long end, long i, long j, long val)
{
if(start>j || end<i) return -1;
if(start>=i && end<=j)
return ((tree[node]+val)*(end-start+1));
long a,b;
a = update_tree[2*node];
b = update_tree[2*node+1];
long mid = (start+end)/2;
long val1 = query(2*node,start,mid,i,j,val+a);
long val2 = query(2*node+1,mid+1,end,i,j,val+b);
if(val1==-1)
return val2;
if(val2==-1)
return val1;
return val1+val2;
}
};
int main()
{
long N,q,x,y,res;
int tc=1, T,v,d;
scanf("%d",&T);
while(tc<=T)
{
scanf("%ld %ld",&N,&q);
SegmentTree<long>s(N);
printf("Case %d:\n",tc++);
while(q--){
scanf("%d",&d);
if(!d){
scanf("%ld %ld %d",&x,&y,&v);
s.update(1,0,N-1,x,y,v);
}
else{
scanf("%ld %ld",&x,&y);
res = s.query(1,0,N-1,x,y,0);
printf("%ld\n",res);
}
}
}
return 0;
}
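As a point of comparison for the attempt above, here is a minimal Python sketch of a segment tree with lazy propagation for "add v to a range" and "sum of a range". It is one common formulation, not the only one, and the class and method names are made up for the example. Each node stores the sum of its segment plus a pending addition that has not yet been pushed to its children. Two things in the C++ attempt are worth checking against it: memset(tree,0,sizeof(tree)) clears only sizeof(T*) bytes because tree is a pointer, and the constructor's "long size" declares a new local variable instead of setting the member.

```python
class LazySegmentTree:
    """Range add / range sum over indices 0..n-1, all values initially 0."""

    def __init__(self, n):
        self.n = n
        self.total = [0] * (4 * n)     # sum of each node's segment
        self.pending = [0] * (4 * n)   # addition not yet pushed to the children

    def _push(self, node, start, end):
        add = self.pending[node]
        if add:
            mid = (start + end) // 2
            for child, lo, hi in ((2 * node, start, mid),
                                  (2 * node + 1, mid + 1, end)):
                self.total[child] += add * (hi - lo + 1)
                self.pending[child] += add
            self.pending[node] = 0

    def update(self, i, j, val, node=1, start=0, end=None):
        if end is None:
            end = self.n - 1
        if j < start or end < i:
            return                                   # no overlap
        if i <= start and end <= j:                  # fully covered: stay lazy
            self.total[node] += val * (end - start + 1)
            self.pending[node] += val
            return
        self._push(node, start, end)
        mid = (start + end) // 2
        self.update(i, j, val, 2 * node, start, mid)
        self.update(i, j, val, 2 * node + 1, mid + 1, end)
        self.total[node] = self.total[2 * node] + self.total[2 * node + 1]

    def query(self, i, j, node=1, start=0, end=None):
        if end is None:
            end = self.n - 1
        if j < start or end < i:
            return 0                                 # no overlap adds nothing
        if i <= start and end <= j:
            return self.total[node]
        self._push(node, start, end)
        mid = (start + end) // 2
        return (self.query(i, j, 2 * node, start, mid)
                + self.query(i, j, 2 * node + 1, mid + 1, end))


tree = LazySegmentTree(5)
tree.update(0, 4, 2)     # array becomes [2, 2, 2, 2, 2]
tree.update(1, 3, 3)     # array becomes [2, 5, 5, 5, 2]
print(tree.query(0, 2))  # 2 + 5 + 5 = 12
```

Returning 0 for non-overlapping segments avoids the need for a -1 sentinel, and recomputing a node's total after updating its children keeps ancestor sums correct when only part of a segment changes.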
