Bijection on the integers below x - algorithm

i'm working on image processing, and i'm writing a parallel algorithm that iterates over all the pixels in an image, and changes the surrounding pixels based on it's value. In this algorithm, minor non-deterministic is acceptable, but i'd rather minimize it by only querying distant pixels simultaneously. Could someone give me an algorithm that bijectively maps the integers below n to the integers below n, in a fast and simple manner, such that two integers that are close to each other before mapping are likely to be far apart after application.

For simplicity let's say n is a power of two. Could you simply reverse the order of the least significant log2(n) bits of the number?

Considering the pixels to be a one dimentional array you could use a hash function j = i*p % n where n is the zero based index of the last pixel and p is a prime number chosen to place the pixel far enough away at each step. % is the remainder operator in C, mathematically I'd write j(i) = i p (mod n).
So if you want to jump at least 10 rows at each iteration, choose p > 10 * w where w is the screen width. You'll want to have a lookup table for p as a function of n and w of course.
Note that j hits every pixel as i goes from 0 to n.
CORRECTION: Use (mod (n + 1)), not (mod n). The last index is n, which cannot be reached using mod n since n (mod n) == 0.

Apart from reverting the bit order, you can use modulo. Say N is a prime number (like 521), so for all x = 0..520 you define a function:
f(x) = x * fac mod N
which is bijection on 0..520. fac is arbitrary number different from 0 and 1. For example for N = 521 and fac = 122 you get the following mapping:
which as you can see is quite uniform and not many numbers are near the diagonal - there are some, but it is a small proportion.

Related

Is there an easy function from a pair of 32-bit ints to a single 64-bit int that preserves rotational order?

This is a question that came up in the context of sorting points with integer coordinates into clockwise order, but this question is not about how to do that sorting.
This question is about the observation that 2-d vectors have a natural cyclic ordering. Unsigned integers with usual overflow behavior (or signed integers using twos-complement) also have a natural cyclic ordering. Can you easily map from the first ordering to the second?
So, the exact question is whether there is a map from pairs of twos-complement signed 32-bit integers to unsigned (or twos-complement signed) 64-bit integers such that any list of vectors that is in clockwise order maps to integers that are in decreasing (modulo overflow) order?
Some technical cases that people will likely ask about:
Yes, vectors that are multiples of each other should map to the same thing
No, I don't care which vector (if any) maps to 0
No, the images of antipodal vectors don't have to differ by 2^63 (although that is a nice-to-have)
The obvious answer is that since there are only around 0.6*2^64 distinct slopes, the answer is yes, such a map exists, but I'm looking for one that is easily computable. I understand that "easily" is subjective, but I'm really looking for something reasonably efficient and not terrible to implement. So, in particular, no counting every lattice point between the ray and the positive x-axis (unless you know a clever way to do that without enumerating them all).
An important thing to note is that it can be done by mapping to 65-bit integers. Simply project the vector out to where it hits the box bounded by x,y=+/-2^62 and round toward negative infinity. You need 63 bits to represent that integer and two more to encode which side of the box you hit. The implementation needs a little care to make sure you don't overflow, but only has one branch and two divides and is otherwise quite cheap. It doesn't work if you project out to 2^61 because you don't get enough resolution to separate some slopes.
Also, before you suggest "just use atan2", compute atan2(1073741821,2147483643) and atan2(1073741820,2147483641)
EDIT: Expansion on the "atan2" comment:
Given two values x_1 and x_2 that are coprime and just less than 2^31 (I used 2^31-5 and 2^31-7 in my example), we can use the extended Euclidean algorithm to find y_1 and y_2 such that y_1/x_1-y_2/x_2 = 1/(x_1*x_2) ~= 2^-62. Since the derivative of arctan is bounded by 1, the difference of the outputs of atan2 on these values is not going to be bigger than that. So, there are lots of pairs of vectors that won't be distinguishable by atan2 as vanilla IEEE 754 doubles.
If you have 80-bit extended registers and you are sure you can retain residency in those registers throughout the computation (and don't get kicked out by a context switch or just plain running out of extended registers), then you're fine. But, I really don't like the correctness of my code relying on staying resident in extended registers.
Here's one possible approach, inspired by a comment in your question. (For the tl;dr version, skip down to the definition of point_to_line at the bottom of this answer: that gives a mapping for the first quadrant only. Extension to the whole plane is left as a not-too-difficult exercise.)
Your question says:
in particular, no counting every lattice point between the ray and the positive x-axis (unless you know a clever way to do that without enumerating them all).
There is an algorithm to do that counting without enumerating the points; its efficiency is akin to that of the Euclidean algorithm for finding greatest common divisors. I'm not sure to what extent it counts as either "easily computable" or "clever".
Suppose that we're given a point (p, q) with integer coordinates and both p and q positive (so that the point lies in the first quadrant). We might as well also assume that q < p, so that the point (p, q) lies between the x-axis y = 0 and the diagonal line y = x: if we can solve the problem for the half of the first quadrant that lies below the diagonal, we can make use of symmetry to solve it generally.
Write M for the bound on the size of p and q, so that in your example we want M = 2^31.
Then the number of lattice points strictly inside the triangle bounded by:
the x-axis y = 0
the ray y = (q/p)x that starts at the origin and passes through (p, q), and
the vertical line x = M
is the sum as x ranges over integers in (0, M) of ⌈qx/p⌉ - 1.
For convenience, I'll drop the -1 and include 0 in the range of the sum; both those changes are trivial to compensate for. And now the core functionality we need is the ability to evaluate the sum of ⌈qx/p⌉ as x ranges over the integers in an interval [0, M). While we're at it, we might also want to be able to compute a closely-related sum: the sum of ⌊qx/p⌋ over that same range of x (and it'll turn out that it makes sense to evaluate both of these together).
For testing purposes, here are slow, naive-but-obviously-correct versions of the functions we're interested in, here written in Python:
def floor_sum_slow(p, q, M):
"""
Sum of floor(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
return sum(q * x // p for x in range(M))
def ceil_sum_slow(p, q, M):
"""
Sum of ceil(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
return sum((q * x + p - 1) // p for x in range(M))
And an example use:
>>> floor_sum_slow(51, 43, 2**28) # takes several seconds to complete
30377220771239253
>>> ceil_sum_slow(140552068, 161600507, 2**28)
41424305916577422
These sums can be evaluated much faster. The first key observation is that if q >= p, then we can apply the Euclidean "division algorithm" and write q = ap + r for some integers a and r. The sum then simplifies: the ap part contributes a factor of a * M * (M - 1) // 2, and we're reduced from computing floor_sum(p, q, M) to computing floor_sum(p, r, M). Similarly, the computation of ceil_sum(p, q, M) reduces to the computation of ceil_sum(p, q % p, M).
The second key observation is that we can express floor_sum(p, q, M) in terms of ceil_sum(q, p, N), where N is the ceiling of (q/p)M. To do this, we consider the rectangle [0, M) x (0, (q/p)M), and divide that rectangle into two triangles using the line y = (q/p)x. The number of lattice points within the rectangle that lie on or below the line is floor_sum(p, q, M), while the number of lattice points within the rectangle that lie above the line is ceil_sum(q, p, N). Since the total number of lattice points in the rectangle is (N - 1)M, we can deduce the value of floor_sum(p, q, M) from that of ceil_sum(q, p, N), and vice versa.
Combining those two ideas, and working through the details, we end up with a pair of mutually recursive functions that look like this:
def floor_sum(p, q, M):
"""
Sum of floor(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
a = q // p
r = q % p
if r == 0:
return a * M * (M - 1) // 2
else:
N = (M * r + p - 1) // p
return a * M * (M - 1) // 2 + (N - 1) * M - ceil_sum(r, p, N)
def ceil_sum(p, q, M):
"""
Sum of ceil(q * x / p) for 0 <= x < M.
Assumes p positive, q and M nonnegative.
"""
a = q // p
r = q % p
if r == 0:
return a * M * (M - 1) // 2
else:
N = (M * r + p - 1) // p
return a * M * (M - 1) // 2 + N * (M - 1) - floor_sum(r, p, N)
Performing the same calculation as before, we get exactly the same results, but this time the result is instant:
>>> floor_sum(51, 43, 2**28)
30377220771239253
>>> ceil_sum(140552068, 161600507, 2**28)
41424305916577422
A bit of experimentation should convince you that the floor_sum and floor_sum_slow functions give the same result in all cases, and similarly for ceil_sum and ceil_sum_slow.
Here's a function that uses floor_sum and ceil_sum to give an appropriate mapping for the first quadrant. I failed to resist the temptation to make it a full bijection, enumerating points in the order that they appear on each ray, but you can fix that by simply replacing the + gcd(p, q) term with + 1 in both branches.
from math import gcd
def point_to_line(p, q, M):
"""
Bijection from [0, M) x [0, M) to [0, M^2), preserving
the 'angle' ordering.
"""
if p == q == 0:
return 0
elif q <= p:
return ceil_sum(p, q, M) + gcd(p, q)
else:
return M * (M - 1) - floor_sum(q, p, M) + gcd(p, q)
Extending to the whole plane should be straightforward, though just a little bit messy due to the asymmetry between the negative range and the positive range in the two's complement representation.
Here's a visual demonstration for the case M = 7, printed using this code:
M = 7
for q in reversed(range(M)):
for p in range(M):
print(" {:02d}".format(point_to_line(p, q, M)), end="")
print()
Results:
48 42 39 36 32 28 27
47 41 37 33 29 26 21
46 40 35 30 25 20 18
45 38 31 24 19 16 15
44 34 23 17 14 12 11
43 22 13 10 09 08 07
00 01 02 03 04 05 06
This doesn't meet your requirement for an "easy" function, nor for a "reasonably efficient" one. But in principle it would work, and it might give some idea of how difficult the problem is. To keep things simple, let's consider just the case where 0 < y ≤ x, because the full problem can be solved by splitting the full 2D plane into eight octants and mapping each to its own range of integers in essentially the same way.
A point (x1, y1) is "anticlockwise" of (x2, y2) if and only if the slope y1/x1 is greater than the slope y2/x2. To map the slopes to integers in an order-preserving way, we can consider the sequence of all distinct fractions whose numerators and denominators are within range (i.e. up to 231), in ascending numerical order. Note that each fraction's numerical value is between 0 and 1 since we are just considering one octant of the plane.
This sequence of fractions is finite, so each fraction has an index at which it occurs in the sequence; so to map a point (x, y) to an integer, first reduce the fraction y/x to its simplest form (e.g. using Euclid's algorithm to find the GCD to divide by), then compute that fraction's index in the sequence.
It turns out this sequence is called a Farey sequence; specifically, it's the Farey sequence of order 231. Unfortunately, computing the index of a given fraction in this sequence turns out to be neither easy nor reasonably efficient. According to the paper
Computing Order Statistics in the Farey Sequence by Corina E. Pǎtraşcu and Mihai Pǎtraşcu, there is a somewhat complicated algorithm to compute the rank (i.e. index) of a fraction in O(n) time, where n in your case is 231, and there is unlikely to be an algorithm in time polynomial in log n because the algorithm can be used to factorise integers.
All of that said, there might be a much easier solution to your problem, because I've started from the assumption of wanting to map these fractions to integers as densely as possible (i.e. no "unused" integers in the target range), whereas in your question you wrote that the number of distinct fractions is about 60% of the available range of size 264. Intuitively, that amount of leeway doesn't seem like a lot to me, so I think the problem is probably quite difficult and you may need to settle for a solution that uses a larger output range, or a smaller input range. At the very least, by writing this answer I might save somebody else the effort of investigating whether this approach is feasible.
Just some random ideas / observations:
(edit: added two more and marked the first one as wrong as pointed out in the comments)
Divide into 16 22.5° segments instead of 8 45° segments
If I understand the problem correctly, the lines spread out "more" towards 45°, "wasting" resolution that you need for smaller angles. (Incorrect, see below)
In the mapping to 62 bit integers, there must be gaps. Identify enough low density areas to map down to 61 bits? Perhaps plot for a smaller problem to potentially see a pattern?
As the range for x and y is limited, for a given x0, all (legal) x < x0 with y > q must have a smaller angle. Could this help to break down the problem in some way? Perhaps cutting a triangle where points can easily be enumerated out of the problem for each quadrant?

Find modulo of large number with large number

I have a number n and m. They are both very large and exceed the limits of a C++ long long. How do i find n mod m accurately?
Naive n % m only works up to 2^63-1, getting 9 on the online judge.
Adding one digit of n at a time and using % m works for small m, but is quite slow, and without hardcoding for the special case where m = 1, it exceeds the time limit on such a small m. It gets 37 on the online judge.
So is there a method of calculating n mod m given them as strings?
Problem: https://dunjudge.me/analysis/problems/669/
Given that m is constrained to be less than 10! (which is 3628800) you can process the digits of n one at a time in an easy way.
If the digits of n are d[i] where i goes from 0 to N-1 (with d[0] being the most significant digit), then something like this works (pseudocode):
R = 0
for i = 0 to N-1
R = (10 * R + d[i]) % m
return R
One way to handle this is the binary approach. Algorithm steps would go like this:
Set a = n shift a to left as much as possible while still m > a
m = m - a
If m < n DONE result is in m, Else go to step 1
For first step, since you have both numbers in strings, finding the index of the first 1from left for both m and n and taking their difference would give the necessary number of shifts.
I haven't implemented it, however all operations are basic binary ops (avoids relatively costly modulo) and algorithm's complexity is O(N) where N is the number of digits; so it should have a nice performance.

Making a very large calculation

I am want to calculate the value X =n!/2^r
where n<10^6 and r<10^6
and it's guarantee that value of X is between O to 10
How to calculate X since i can't simple divide the factorial and power term since they overflow the long integer.
My Approach
Do with the help of Modulus. Let take a prime number greater than 10 let say 101
X= [(Factorial N%101)*inverse Modulo of(2^r)]%101;
Note that inverse modulo can easily be calculate and 2^r%101 can also be calculated.
Problem:
It's not guarantee that X is always be integer it can be float also.
My method works fine when X is integer ? How to deal when X is a floating point number
If approximate results are OK and you have access to a math library with base-2 exponential (exp2 in C), natural log gamma (lgamma in C), and natural log (log in C), then you can do
exp2(lgamma(n+1)/log(2) - r).
Find the power that 2 appears at in n!. This is:
P = n / 2 + n / 2^2 + n / 2^3 + ...
Using integer division until you reach a 0 result.
If P >= r, then you have an integer result. You can find this result by computing the factorial such that you ignore r powers of 2. Something like:
factorial = 1
for i = 2 to n:
factor = i
while factor % 2 == 0 and r != 0:
factor /= 2
r -= 1
factorial *= factor
If P < r, set r = P, apply the same algorithm and divide the result by 2^(initial_r - P) in the end.
Except for a very few cases (with small n and r) X will not be an integer -- for if n >= 11 then 11 divides n! but doesn't divide any power of two, so if X were integral it would have to be at least 11.
One method would be: initialise X to one; then loop: if X > 10 divide by 2 till its not; if X < 10 multiply by the next factors till its not; until you run out of factors and powers of 2.
An approach that would be tunable for precision/performance would be the following:
Store the factorial in an integer with a fixed number of bits. We can drop the last few digits if the number gets too large, since they won't affect the overall result altogether that much. By scaling this integer larger/smaller the algorithm gets tunable for either performance or precision.
Whenever the integer would overflow due to multiplication, shift it to the right by a few places and subtract that value from r. In the end there should be a small number left as r and an integer v with the most significant bits of the factorial. This v can now be interpreted as a fixed-point number with r fractional digits.
Depending upon the required precision this approach might even work with long, though I haven't had the time to test this approach yet apart from a bit experimenting with a calculator.

Find prime factors such that difference is smallest as possible

Suppose n, a, b are positive integers where n is not a prime number, such that n=ab with a≥b and (a−b) is small as possible. What would be the best algorithm to find the values of a and b if n is given?
I read a solution where they try to represent n as the difference between two squares via searching for a square S bigger than n such that S - n = (another square). Why would that be better than simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized?
Firstly....to answer why your approach
simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized
is not optimal:
Suppose your number is n = 2^7 * 3^4 * 5^2 * 7 * 11 * 13 (=259459200), well within range of int. From the combinatorics theory, this number has exactly (8 * 5 * 3 * 2 * 2 * 2 = 960) factors. So, firstly you find all of these 960 factors, then find all pairs (a,b) such that a * b = n, which in this case will be (6C1 + 9C2 + 11C3 + 13C4 + 14C5 + 15C6 + 16C7 + 16C8) ways. (if I'm not wrong, my combinatorics is a bit weak). This is of the order 1e5 if implemented optimally. Also, implementation of this approach is hard.
Now, why the difference of squares approach
represent S - n = Q, such that S and Q are perfect squares
is good:
This is because if you can represent S - n = Q, this implies, n = S - Q
=> n = s^2 - q^2
=> n = (s+q)(s-q)
=> Your reqd ans = 2 * q
Now, even if you iterate for all squares, you will either find your answer or terminate when difference of 2 consecutive squares is greater than n
But I don't think this will be doable for all n (eg. if n=6, there is no solution for (S,Q).)
Another approach:
Iterate from floor(sqrt(n)) to 1. The first number (say, x), such that x|n will be one of the numbers in the required pair (a,b). Other will be, obvs, y = x/n. So, your answer will be y - x.
This is O(sqrt(n)) time complex algorithm.
A general method could be this:
Find the prime factorization of your number: n = Π pi ai. Except for the worst cases where n is prime or semiprime, this will be substantially faster than O(n1/2) time of the iteration down from the square root, which won't divide the found factors out of the number.
Recall that the simplest, trial division, prime factorization is done by repeatedly trying to divide the number by increasing odd numbers (or by primes) below the number's square root, dividing out of the number each factor -- thus prime by construction -- as it is found (n := n/f).
Then, lazily enumerate the factors of n in order from its prime factorization. Stop after producing half of them. Having thus found n's (not necessarily prime) factor that is closest to its square root, find the second factor by simple division.
In case this must repeatedly run many times, it will greatly pay out to precalculate the needed primes below the n's square root, to use in the factorizations.

How can I evenly distribute distinct keys in a hashtable?

I have this formula:
index = (a * k) % M
which maps a number 'k', from an input set K of distinct numbers, into it's position in a hashtable. I was wondering how to write a non-brute force program that finds such 'M' and 'a' so that 'M' is minimal, and there are no collisions for the given set K.
If, instead of a numeric multiplication you could perform a logic computation (and / or /not), I think that the optimal solution (minimum value of M) would be as small as card(K) if you could get a function that related each value of K (once ordered) with its position in the set.
Theoretically, it must be possible to write a truth table for such a relation (bit a bit), and then simplify the minterms through a Karnaugh Table with a proper program. Depending on the desired number of bits, the computational complexity would be affordable... or not.
If a is co-prime to M then a * k = a * k' mod M if, and only if, k = k' mod M, so you might as well use a = 1, which is always co-prime to M. This also covers all the cases in which M is prime, because all the numbers except 0 are then co-prime to M.
If a and M are not co-prime, then they share a common factor, say b, so a = x * b and M = y * b. In this case anything multiplied by a will also be divisible by b mod M, and you might as well by working mod y, not mod M, so there is nothing to be gained by using an a not co-prime to M.
So for the problem you state, you could save some time by leaving a=1 and trying all possible values of M.
If you are e.g. using 32-bit integers and really calculating not (a * k) mod M but ((a * k) mod 2^32) mod M you might be able to find cases where values of a other than 1 do better than a=1 because of what happens in (a * k) mod 2^32.

Resources