Making a customizable LCG that travels backward and forward

Making a customizable LCG that travels backward and forward - random

How would i go about making an LCG (type of pseudo random number generator) travel in both directions?
I know that travelling forward is (a*x+c)%m but how would i be able to reverse it?
I am using this so i can store the seed at the position of the player in a map and be able to generate things around it by propogating backward and forward in the LCG (like some sort of randomized number line).

All LCGs cycle. In an LCG which achieves maximal cycle length there is a unique predecessor and a unique successor for each value x (which won't necessarily be true for LCGs that don't achieve maximal cycle length, or for other algorithms with subcycle behaviors such as von Neumann's middle-square method).
Suppose our LCG has cycle length L. Since the behavior is cyclic, that means that after L iterations we are back to the starting value. Finding the predecessor value by taking one step backwards is mathematically equivalent to taking (L-1) steps forward.
The big question is whether that can be converted into a single step. If you're using a Prime Modulus Multiplicative LCG (where the additive constant is zero), it turns out to be pretty easy to do. If xi+1 = a * xi % m, then xi+n = an * xi % m. As a concrete example, consider the PMMLCG with a = 16807 and m = 231-1. This has a maximal cycle length of m-1 (it can never yield 0 for obvious reasons), so our goal is to iterate m-2 times. We can precalculate am-2 % m = 1407677000 using readily available exponentiation/mod libraries. Consequently, a forward step is found as xi+1 = 16807 * xi % 231-1, while a backwards step is found as xi-1 = 1407677000 * xi % 231-1.
ADDITIONAL
The same concept can be extended to generic full-cycle LCGs by casting the transition in matrix form and doing fast matrix exponentiation to come up with the equivalent one-stage transform. The matrix formulation for xi+1 = (a * xi + c) % m is Xi+1 = T · Xi % m, where T is the matrix [[a c],[0 1]] and X is the column vector (x, 1) transposed. Multiple iterations of the LCG can be quickly calculated by raising T to any desired power through fast exponentiation techniques using squaring and halving the power. After noticing that powers of matrix T never alter the second row, I was able to focus on just the first row calculations and produced the following implementation in Ruby:
def power_mod(ary, mod, power)
return ary.map { |x| x % mod } if power < 2
square = [ary[0] * ary[0] % mod, (ary[0] + 1) * ary[1] % mod]
square = power_mod(square, mod, power / 2)
return square if power.even?
return [square[0] * ary[0] % mod, (square[0] * ary[1] + square[1]) % mod]
end
where ary is a vector containing a and c, the multiplicative and additive coefficients.
Using this with power set to the cycle length - 1, I was able to determine coefficients which yield the predecessor for various LCGs listed in Wikipedia. For example, to "reverse" the LCG with a = 1664525, c = 1013904223, and m = 232, use a = 4276115653 and c = 634785765. You can easily confirm that the latter set of coefficients reverses the sequence produced by using the original coefficients.

Related

Algorithm for finding a linear dependence with strictly positive coefficients

This must be surely well known, being a particular linear programming problem. What I want is a specific easy to implement efficient algorithm adapted to this very case, for relatively small sizes (about, say, ten vectors of dimension less than twenty).
I have vectors v(1), ..., v(m) of the same dimension. Want an
algorithm that produces strictly positive numbers c(1), ..., c(m)
such that c(1)v(1) + ... + c(m)v(m) is the zero vector, or tells for
sure that no such numbers exist.
What I found (in some clever code by a colleague) gives an approximate algorithm like this:
start with, say, c(1) = ... = c(m) = 1/m;
at each stage, given current approximation v = c(1)v(1) + ... + c(m)v(m), seek for j such that v - v(j) is longer than v(j).
If no such j exists then output "no solution" (or c(1), ..., c(m) if v is zero).
If such j exists, change v to the new approximation (1 - c)v + cv(j) with some small positive c.
This changes c(j) to (1 - c)c(j) + c and each other c(i) to (1 - c)c(i), so that the new coefficients will remain positive and strictly less than 1 (in fact they will sum to 1 all the time, i. e. we will remain in the convex hull of the v(i)).
Moreover the new v will have strictly smaller length, so eventually the algorithm will either discover that there is no solution or will produce arbitrarily small v.
Clearly this is incomplete and not satisfactory from several points of view. Can one do better?
Update
There are by now two useful answers; however one final step is missing.
They both boil down to the following (unless I miss some essential point).
Take a basis of the nullspace of v(1), ..., v(m).
One obtains a collection of not necessarily strictly positive solutions c(1), ..., c(m), c'(1), ..., c'(m), c''(1), ..., c''(m), ... such that any such solution is their linear combination (in a unique way). So we are reduced to the question whether this new collection of m-dimensional vectors admits a linear combination with strictly positive entries.
Example: take four 2d-vectors (2,1), (3,-1), (-1,2), (-3,-3). Their nullspace has a basis consisting of two solutions c = (12,-3,0,5), c' = (-1,1,1,0). None of these are strictly positive but their combination c + 4c' = (8,1,4,5) is. So the latter is the desired solution. But in general it might be not so easy to find out whether a strictly positive solution exists and if yes, how to find it.
As suggested in the answer by btilly one might use Fourier-Motzkin elimination for that, but again, I would be grateful for more details about it.

This is doable as follows.
First write your vectors as columns. Put them into a matrix. Now create a single column with entries c(1), c(2), ..., c(m_)). If you multiply that matrix times that column, you get your linear combination.
Now consider the elementary row operations. Multiply a row by a constant, swap two rows, add a multiple of one row to another. If you do an elementary row operation to the matrix, your linear combination after the row operation will be 0 if and only if it was before the row operation. Therefore doing elementary row operations DOESN'T CHANGE the coefficients that you're looking for.
Therefore you may simplify life by doing elementary row operations to put the matrix into reduced row echelon form. Once it is in reduced row echelon form, life gets easier. Columns which do not contain a pivot correspond to free coefficients. Columns which do contain a pivot correspond to coefficients that must be a specific linear combination of free coefficients. This reduces your problem being to find positive values for the free coefficients that make the others also positive. So you're now just solving a system of inequalities (and generally in far fewer variables).
Whether a system of linear inequalities has a solution can be answered with the FME method.

Denoting by A the matrix where the ith row is v(i) and by x the vector whose ith index is c(i), your problem can be describes as Ax = b where b=0 is the zero vector. The problem of Ax=b when b is not equal to zero is called the least squares problem (or the inhomogeneous least squares) and has a close form solution in the sense of Minimal Mean Square Error (MMSE). In your case however, b = 0 therefore we are in the homogeneous least squares problem. In Linear Algebra this can be looked as an eigenvalue problem, whose solution is the eigenvector x of the matrix A^TA whose eigenvalue is equal to 0. If no such eigenvalue exists, the MMSE solution will the the eigenvalue x whose matching eigenvalue is the smallest (closest to 0). A nice discussion on this topic is given here.
The solution is, as stated above, will be the eigenvector of A^TA with the lowest matching eigenvalue. This can be done using Singular Value Decomposition (SVD), which will decompose the matrix A into
The column of V matching with the lowest eigenvalue in the diagonal matrix Sigma will be your solution.
Explanation
When we want to minimize the Ax = 0 in the MSE sense, we can compute the vector derivative w.r.t x as follows:
Therefore, the eigenvector of A^TA matching the smallest eigenvalue will solve your problem.
Practical solution example
In python, you can use numpy.linalg.svd to perform the SVD decomposition. numpy orders the matrices U and V^T such that the leftmost column matches the largest eigenvalue and the rightmost column matches the lowest eigenvalue. Thus, you need to compute the SVD and take the rightmost column of the resulting V:
from numpy.linalg import svd
[_, _, vt] = svd(A)
x = vt[-1] # we take the last row since this is a transposed matrix, so the last column of V is the last row of V^T
One zero eigenvalue
In this case there is only one non trivial vector who solves the problem and the only way to satisfy the strictly positive condition will be if the values in the vector are all positive or all negative (multiplying the vector by -1 will not change the result)
Multiple zero eigenvalues
In the case where we have multiple zero eigenvalues, any of their matching eigenvectors is a possible solution and any linear combination of them. In this case one would have to check if there is a linear combination of these eigenvectors which creates a vector where all the values are strictly positive in order to satisfy the strictly positive condition.
How do we find the solution if one exists? once we are left with the basis of eigenvectors matching zero eigenvalue (also known as null-space) what we need to do is to solve a system of linear inequalities. I'll explain by example, since it will be clearer this way. Suppose we have the following matrix:
import numpy as np
A = np.array([[ 2, 3, -1, -3],
[ 1, -1, 2, -3]])
[_, Sigma, Vt] = np.linalg.svd(A) # Sigma has only 2 non-zero values, meaning that the null-space have a dimension of 2
We can extract the eigenvectors as explained above:
C = Vt[len(Sigma):]
# array([[-0.10292809, 0.59058542, 0.75313786, 0.27092073],
# [ 0.89356997, -0.15289589, 0.09399548, 0.4114856 ]])
What we want to find are two real coefficients, noted as x and y such that:
-0.10292809*x + 0.89356997*y > 0
0.59058542*x - 0.15289589*y > 0
0.75313786*x + 0.09399548*y > 0
0.27092073*x + 0.4114856*y > 0
We have a system of 4 inequalities with 2 variables, therefore in this case a solution is not promised. A solution can be found in many ways but I will propose the following. We can start with an initial guess and go over each hyperplane to check if the initial guess satisfies the inequality. if not we can reflect the guess to the other side of the hyperplane. After passing all the hyperplanes we check for a solution. (explanation of hot to reflect a point w.r.t a line can be found here). An example for python implementation will be:
import numpy as np
def get_strictly_positive(A):
[_, Sigma, Vt] = np.linalg.svd(A)
if len(Sigma[np.abs(Sigma) > 1e-5]) == Vt.shape[0]: # No zero eigenvalues, taking MMSE solution if exists
c = Vt[-1]
if np.sum(c > 0) == len(c) or np.sum(c < 0) == len(c):
return c if np.sum(c) == np.sum(abs(c)) else -1 * c
else:
return -1
# This means we have a zero solution
# Building matrix C of all the null-space basis vectors
C = Vt[len(Sigma[np.abs(Sigma) > 1e-5]):]
# 1. What we have here is a set of linear system of inequalities. Each equation inequality is a hyperplane and for
# each equation there is a valid half-space. We want to find the intersection of all the half-spaces, if it exists.
# 2. A vey important observations is that the basis of the null-space that we found using SVD is ORTHOGONAL!
coeffs = np.ones(C.shape[0]) # initial guess
for hyperplane in C.T:
if coeffs.dot(hyperplane) <= 0: # the guess is on the wrong side of the hyperplane
orthogonal_part = coeffs - (coeffs.dot(hyperplane) / hyperplane.dot(hyperplane)) * hyperplane
# reflecting the coefficients to the other side of the hyperplane
coeffs = 2 * orthogonal_part - coeffs
# If this yielded a solution, we return it
c = C.T.dot(coeffs)
if np.sum(c > 0) == len(c) or np.sum(c < 0) == len(c):
return c if np.sum(c) == np.sum(abs(c)) else -1 * c
else:
return -1
The equations are taken from one of my summaries and therefore I do not have a link to the source

Is there an easy function from a pair of 32-bit ints to a single 64-bit int that preserves rotational order?

This is a question that came up in the context of sorting points with integer coordinates into clockwise order, but this question is not about how to do that sorting.
This question is about the observation that 2-d vectors have a natural cyclic ordering. Unsigned integers with usual overflow behavior (or signed integers using twos-complement) also have a natural cyclic ordering. Can you easily map from the first ordering to the second?
So, the exact question is whether there is a map from pairs of twos-complement signed 32-bit integers to unsigned (or twos-complement signed) 64-bit integers such that any list of vectors that is in clockwise order maps to integers that are in decreasing (modulo overflow) order?
Some technical cases that people will likely ask about:
Yes, vectors that are multiples of each other should map to the same thing
No, I don't care which vector (if any) maps to 0
No, the images of antipodal vectors don't have to differ by 2^63 (although that is a nice-to-have)
The obvious answer is that since there are only around 0.6*2^64 distinct slopes, the answer is yes, such a map exists, but I'm looking for one that is easily computable. I understand that "easily" is subjective, but I'm really looking for something reasonably efficient and not terrible to implement. So, in particular, no counting every lattice point between the ray and the positive x-axis (unless you know a clever way to do that without enumerating them all).
An important thing to note is that it can be done by mapping to 65-bit integers. Simply project the vector out to where it hits the box bounded by x,y=+/-2^62 and round toward negative infinity. You need 63 bits to represent that integer and two more to encode which side of the box you hit. The implementation needs a little care to make sure you don't overflow, but only has one branch and two divides and is otherwise quite cheap. It doesn't work if you project out to 2^61 because you don't get enough resolution to separate some slopes.
Also, before you suggest "just use atan2", compute atan2(1073741821,2147483643) and atan2(1073741820,2147483641)
EDIT: Expansion on the "atan2" comment:
Given two values x_1 and x_2 that are coprime and just less than 2^31 (I used 2^31-5 and 2^31-7 in my example), we can use the extended Euclidean algorithm to find y_1 and y_2 such that y_1/x_1-y_2/x_2 = 1/(x_1*x_2) ~= 2^-62. Since the derivative of arctan is bounded by 1, the difference of the outputs of atan2 on these values is not going to be bigger than that. So, there are lots of pairs of vectors that won't be distinguishable by atan2 as vanilla IEEE 754 doubles.
If you have 80-bit extended registers and you are sure you can retain residency in those registers throughout the computation (and don't get kicked out by a context switch or just plain running out of extended registers), then you're fine. But, I really don't like the correctness of my code relying on staying resident in extended registers.

Here's one possible approach, inspired by a comment in your question. (For the tl;dr version, skip down to the definition of point_to_line at the bottom of this answer: that gives a mapping for the first quadrant only. Extension to the whole plane is left as a not-too-difficult exercise.)
Your question says:
in particular, no counting every lattice point between the ray and the positive x-axis (unless you know a clever way to do that without enumerating them all).
There is an algorithm to do that counting without enumerating the points; its efficiency is akin to that of the Euclidean algorithm for finding greatest common divisors. I'm not sure to what extent it counts as either "easily computable" or "clever".
Suppose that we're given a point (p, q) with integer coordinates and both p and q positive (so that the point lies in the first quadrant). We might as well also assume that q < p, so that the point (p, q) lies between the x-axis y = 0 and the diagonal line y = x: if we can solve the problem for the half of the first quadrant that lies below the diagonal, we can make use of symmetry to solve it generally.
Write M for the bound on the size of p and q, so that in your example we want M = 2^31.
Then the number of lattice points strictly inside the triangle bounded by:
the x-axis y = 0
the ray y = (q/p)x that starts at the origin and passes through (p, q), and
the vertical line x = M
is the sum as x ranges over integers in (0, M) of ⌈qx/p⌉ - 1.
For convenience, I'll drop the -1 and include 0 in the range of the sum; both those changes are trivial to compensate for. And now the core functionality we need is the ability to evaluate the sum of ⌈qx/p⌉ as x ranges over the integers in an interval [0, M). While we're at it, we might also want to be able to compute a closely-related sum: the sum of ⌊qx/p⌋ over that same range of x (and it'll turn out that it makes sense to evaluate both of these together).
For testing purposes, here are slow, naive-but-obviously-correct versions of the functions we're interested in, here written in Python:
def floor_sum_slow(p, q, M):
"""
Sum of floor(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
return sum(q * x // p for x in range(M))
def ceil_sum_slow(p, q, M):
"""
Sum of ceil(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
return sum((q * x + p - 1) // p for x in range(M))
And an example use:
>>> floor_sum_slow(51, 43, 2**28) # takes several seconds to complete
30377220771239253
>>> ceil_sum_slow(140552068, 161600507, 2**28)
41424305916577422
These sums can be evaluated much faster. The first key observation is that if q >= p, then we can apply the Euclidean "division algorithm" and write q = ap + r for some integers a and r. The sum then simplifies: the ap part contributes a factor of a * M * (M - 1) // 2, and we're reduced from computing floor_sum(p, q, M) to computing floor_sum(p, r, M). Similarly, the computation of ceil_sum(p, q, M) reduces to the computation of ceil_sum(p, q % p, M).
The second key observation is that we can express floor_sum(p, q, M) in terms of ceil_sum(q, p, N), where N is the ceiling of (q/p)M. To do this, we consider the rectangle [0, M) x (0, (q/p)M), and divide that rectangle into two triangles using the line y = (q/p)x. The number of lattice points within the rectangle that lie on or below the line is floor_sum(p, q, M), while the number of lattice points within the rectangle that lie above the line is ceil_sum(q, p, N). Since the total number of lattice points in the rectangle is (N - 1)M, we can deduce the value of floor_sum(p, q, M) from that of ceil_sum(q, p, N), and vice versa.
Combining those two ideas, and working through the details, we end up with a pair of mutually recursive functions that look like this:
def floor_sum(p, q, M):
"""
Sum of floor(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
a = q // p
r = q % p
if r == 0:
return a * M * (M - 1) // 2
else:
N = (M * r + p - 1) // p
return a * M * (M - 1) // 2 + (N - 1) * M - ceil_sum(r, p, N)
def ceil_sum(p, q, M):
"""
Sum of ceil(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
a = q // p
r = q % p
if r == 0:
return a * M * (M - 1) // 2
else:
N = (M * r + p - 1) // p
return a * M * (M - 1) // 2 + N * (M - 1) - floor_sum(r, p, N)
Performing the same calculation as before, we get exactly the same results, but this time the result is instant:
>>> floor_sum(51, 43, 2**28)
30377220771239253
>>> ceil_sum(140552068, 161600507, 2**28)
41424305916577422
A bit of experimentation should convince you that the floor_sum and floor_sum_slow functions give the same result in all cases, and similarly for ceil_sum and ceil_sum_slow.
Here's a function that uses floor_sum and ceil_sum to give an appropriate mapping for the first quadrant. I failed to resist the temptation to make it a full bijection, enumerating points in the order that they appear on each ray, but you can fix that by simply replacing the + gcd(p, q) term with + 1 in both branches.
from math import gcd
def point_to_line(p, q, M):
"""
Bijection from [0, M) x [0, M) to [0, M^2), preserving
the 'angle' ordering.
"""
if p == q == 0:
return 0
elif q <= p:
return ceil_sum(p, q, M) + gcd(p, q)
else:
return M * (M - 1) - floor_sum(q, p, M) + gcd(p, q)
Extending to the whole plane should be straightforward, though just a little bit messy due to the asymmetry between the negative range and the positive range in the two's complement representation.
Here's a visual demonstration for the case M = 7, printed using this code:
M = 7
for q in reversed(range(M)):
for p in range(M):
print(" {:02d}".format(point_to_line(p, q, M)), end="")
print()
Results:
48 42 39 36 32 28 27
47 41 37 33 29 26 21
46 40 35 30 25 20 18
45 38 31 24 19 16 15
44 34 23 17 14 12 11
43 22 13 10 09 08 07
00 01 02 03 04 05 06

This doesn't meet your requirement for an "easy" function, nor for a "reasonably efficient" one. But in principle it would work, and it might give some idea of how difficult the problem is. To keep things simple, let's consider just the case where 0 < y ≤ x, because the full problem can be solved by splitting the full 2D plane into eight octants and mapping each to its own range of integers in essentially the same way.
A point (x1, y1) is "anticlockwise" of (x2, y2) if and only if the slope y1/x1 is greater than the slope y2/x2. To map the slopes to integers in an order-preserving way, we can consider the sequence of all distinct fractions whose numerators and denominators are within range (i.e. up to 231), in ascending numerical order. Note that each fraction's numerical value is between 0 and 1 since we are just considering one octant of the plane.
This sequence of fractions is finite, so each fraction has an index at which it occurs in the sequence; so to map a point (x, y) to an integer, first reduce the fraction y/x to its simplest form (e.g. using Euclid's algorithm to find the GCD to divide by), then compute that fraction's index in the sequence.
It turns out this sequence is called a Farey sequence; specifically, it's the Farey sequence of order 231. Unfortunately, computing the index of a given fraction in this sequence turns out to be neither easy nor reasonably efficient. According to the paper
Computing Order Statistics in the Farey Sequence by Corina E. Pǎtraşcu and Mihai Pǎtraşcu, there is a somewhat complicated algorithm to compute the rank (i.e. index) of a fraction in O(n) time, where n in your case is 231, and there is unlikely to be an algorithm in time polynomial in log n because the algorithm can be used to factorise integers.
All of that said, there might be a much easier solution to your problem, because I've started from the assumption of wanting to map these fractions to integers as densely as possible (i.e. no "unused" integers in the target range), whereas in your question you wrote that the number of distinct fractions is about 60% of the available range of size 264. Intuitively, that amount of leeway doesn't seem like a lot to me, so I think the problem is probably quite difficult and you may need to settle for a solution that uses a larger output range, or a smaller input range. At the very least, by writing this answer I might save somebody else the effort of investigating whether this approach is feasible.

Just some random ideas / observations:
(edit: added two more and marked the first one as wrong as pointed out in the comments)
Divide into 16 22.5° segments instead of 8 45° segments
If I understand the problem correctly, the lines spread out "more" towards 45°, "wasting" resolution that you need for smaller angles. (Incorrect, see below)
In the mapping to 62 bit integers, there must be gaps. Identify enough low density areas to map down to 61 bits? Perhaps plot for a smaller problem to potentially see a pattern?
As the range for x and y is limited, for a given x0, all (legal) x < x0 with y > q must have a smaller angle. Could this help to break down the problem in some way? Perhaps cutting a triangle where points can easily be enumerated out of the problem for each quadrant?

Pollard Rho factorization method

Pollard Rho factorization method uses a function generator f(x) = x^2-a(mod n) or f(x) = x^2+a(mod n) , is the choice of this function (parabolic) has got any significance or we may use any function (cubic , polynomial or even linear) as we have to identify or find the numbers belonging to same congruence class modulo n to find the non trivial divisor ?

In Knuth Vol II (The Art Of Computer Programming - Seminumerical Algorithms) section 4.5.4 Knuth says
Furthermore if f(y) mod p behaves as a random mapping from the set {0,
1, ... p-1} into itself, exercise 3.1-12 shows that the average value
of the least such m will be of order sqrt(p)... From the theory in
Chapter 3, we know that a linear polynomial f(x) = ax + c will not be
sufficiently random for our purpose. The next simplest case is
quadratic, say f(x) = x^2 + 1. We don't know that this function is
sufficiently random, but our lack of knowledge tends to support the
hypothesis of randomness, and empirical tests show that this f does
work essentially as predicted
The probability theory that says that f(x) has a cycle of length about sqrt(p) assumes in particular that there can be two values y and z such that f(y) = f(z) - since f is chosen at random. The rho in Pollard Rho contains such a junction, with the cycle containing multiple lines leading on to it. For a linear function f(x) = ax + b then for gcd(a, p) = 1 mod p (which is likely since p is prime) f(y) = f(z) means that y = z mod p, so there are no such junctions.
If you look at http://www.agner.org/random/theory/chaosran.pdf you will see that the expected cycle length of a random function is about the sqrt of the state size, but the expected cycle length of a random bijection is about the state size. If you think of generating the random function only as you evaluate it you can see that if the function is entirely random then every value seen so far is available to be chosen again at random to find a cycle, so the odds of closing the cycle increase with the cycle length, but if the function has to be invertible the only way to close the cycle is to generate the starting point, which is much less likely.

Bijection on the integers below x

i'm working on image processing, and i'm writing a parallel algorithm that iterates over all the pixels in an image, and changes the surrounding pixels based on it's value. In this algorithm, minor non-deterministic is acceptable, but i'd rather minimize it by only querying distant pixels simultaneously. Could someone give me an algorithm that bijectively maps the integers below n to the integers below n, in a fast and simple manner, such that two integers that are close to each other before mapping are likely to be far apart after application.

For simplicity let's say n is a power of two. Could you simply reverse the order of the least significant log2(n) bits of the number?

Considering the pixels to be a one dimentional array you could use a hash function j = i*p % n where n is the zero based index of the last pixel and p is a prime number chosen to place the pixel far enough away at each step. % is the remainder operator in C, mathematically I'd write j(i) = i p (mod n).
So if you want to jump at least 10 rows at each iteration, choose p > 10 * w where w is the screen width. You'll want to have a lookup table for p as a function of n and w of course.
Note that j hits every pixel as i goes from 0 to n.
CORRECTION: Use (mod (n + 1)), not (mod n). The last index is n, which cannot be reached using mod n since n (mod n) == 0.

Apart from reverting the bit order, you can use modulo. Say N is a prime number (like 521), so for all x = 0..520 you define a function:
f(x) = x * fac mod N
which is bijection on 0..520. fac is arbitrary number different from 0 and 1. For example for N = 521 and fac = 122 you get the following mapping:
which as you can see is quite uniform and not many numbers are near the diagonal - there are some, but it is a small proportion.

Calculating sum of geometric series (mod m)

I have a series
S = i^(m) + i^(2m) + ............... + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the Geometric Progression (GP) formula because then result will have denominator and then I will have to find modular inverse which may not exist (if the denominator and m are not coprime).
So I made an alternate algorithm making an assumption that these powers will make a cycle of length much smaller than k (because it is a modular equation and so I would obtain something like 2,7,9,1,2,7,9,1....) and that cycle will repeat in the above series. So instead of iterating from 0 to k, I would just find the sum of numbers in a cycle and then calculate the number of cycles in the above series and multiply them. So I first found i^m (mod m) and then multiplied this number again and again taking modulo at each step until I reached the first element again.
But when I actually coded the algorithm, for some values of i, I got cycles which were of very large size. And hence took a large amount of time before terminating and hence my assumption is incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.

This is the algorithm for a similar problem I encountered
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right hand to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.

As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i. the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.

This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n]=1+b+b^2+... b^(n-1) mod m can be computed by noting that:
a[j+k]==b^{j}a[k]+a[j]
a[2n]==(b^n+1)a[n]
The second just being the corollary for the first.
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:
def geometric(n,b,m):
T=1
e=b%m
total = 0
while n>0:
if n&1==1:
total = (e*total + T)%m
T = ((e+1)*T)%m
e = (e*e)%m
n = n/2
//print '{} {} {}'.format(total,T,e)
return total
This bit of magic has a mathematical reason - the operation on pairs defined as
(a,r)#(b,s)=(ab,as+r)
is associative, and the rule 1 basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
One way to look at this is that the matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n,1+b+...+b^(n-1))],[0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]

Based on the approach of #braindoper a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
geometricSeriesMod[a_, n_, m_] :=
Module[ {q = a, exp = n, factor = 1, sum = 0, temp},
While[And[exp > 0, q != 0],
If[EvenQ[exp],
temp = Mod[factor*PowerMod[q, exp, m], m];
sum = Mod[sum + temp, m];
exp--];
factor = Mod[Mod[1 + q, m]*factor, m];
q = Mod[q*q, m];
exp = Floor[ exp /2];
];
Return [Mod[sum + factor, m]]
]
Parameters:
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
mis the integer modulus != 0
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio