I am looking for an int32->int32 function that
is a bijection (one-to-one correspondence)
is cheap to calculate in at least one direction
transforms the increasing sequence 0, 1, 2, 3, ... into a sequence looking like a good pseudo-random sequence (~ half bits flip when argument changes by a small number, no obvious patterns)
Multiply by a large odd number and xor with a different one.
Bijection: odd numbers have a multiplicative inverse modulo powers of two, so the multiplication is undone by a multiplication by the inverse. And xor is, of course, undone by another xor.
This is basically how a linear congruential pseudo-random number generator works.
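A minimal sketch of the multiply-and-xor idea, treating int32 values as unsigned 32-bit patterns. The two constants are assumptions chosen only for illustration; any odd multiplier and any xor mask would do. Note that the low k bits of the output depend only on the low k bits of the input, so the lowest bits will still look regular.

```python
MASK = 0xFFFFFFFF           # keep results in 32 bits
MULT = 0x9E3779B1           # assumed constant: any odd 32-bit number works
XOR_KEY = 0x5BD1E995        # assumed constant: any 32-bit mask works

# Multiplicative inverse of MULT modulo 2**32 (requires Python 3.8+).
MULT_INV = pow(MULT, -1, 1 << 32)

def forward(x):
    """Map a 32-bit value to a scrambled 32-bit value."""
    return ((x * MULT) & MASK) ^ XOR_KEY

def inverse(y):
    """Undo forward(): xor first, then multiply by the inverse."""
    return ((y ^ XOR_KEY) * MULT_INV) & MASK
```

Calling forward(0), forward(1), forward(2), ... gives the scrambled sequence, and inverse() recovers the original argument.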
Probably overkill for this task, but have you considered applying a cryptographic pseudo-random permutation or another primitive that comes from block ciphers? For example, it could be done using DES with a known key in counter mode:
your_number xor (des (key, number counter))
I want a simple (non-cryptographic) random number generation algorithm where I can freely choose the period.
One candidate would be a special instance of LCG:
X(n+1) = (aX(n)+c) mod m (m,c relatively prime; (a-1) divisible by all prime factors of m and also divisible by 4 if m is).
This has period m and does not restrict possible values of m.
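As a small illustration of those conditions, here is a sketch with m = 36 = 2^2 * 3^2. The parameters a = 13 and c = 7 are example values picked to satisfy the rules above (a-1 = 12 is divisible by 2, 3 and 4; c is coprime to 36); lcg_sequence is a hypothetical helper name.

```python
def lcg_sequence(m, a, c, seed=0):
    """Yield one full period (m values) of X(n+1) = (a*X(n) + c) mod m."""
    x = seed % m
    for _ in range(m):
        yield x
        x = (a * x + c) % m

# Example parameters (assumed for illustration): m = 36, a = 13, c = 7.
values = list(lcg_sequence(36, 13, 7))
```

With these parameters every value in 0..35 appears exactly once before the sequence repeats.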
I intend to use this RNG to create a permutation of an array by generating indices into it. I tried the LCG and it might be OK. However, it may not be "random enough" in that distances between adjacent outputs have very few possible values (i.e., plotting x(n) vs n gives a wrapped line). The arrays I want to index into have some structure that has to do with this distance, and I want to avoid potential issues with this.
Of course, I could use any good PRNG to shuffle (using e.g. Fisher–Yates) an array [1,..., m]. But I don't want to have to store this array of indices. Is there some way to capture the permuted indices directly in an algorithm?
I don't really mind the method ending up biased w.r.t choice of RNG seed. Only the period matters and the permuted sequence (for a given seed) being reasonably random.
Encryption is a one-to-one operation. If you encrypt a range of numbers, you will get the same count of apparently random numbers back. In this case the period will be the size of the chosen range. So for a period of 20, encrypt the numbers 0..19.
If you want the output numbers to be in a specific range, then pick a block cipher with an appropriately sized block and use Format Preserving Encryption if needed, as @David Eisenstat suggests.
It is not difficult to set up a cipher with almost any reasonable block size, so long as it is an even number of bits, using the Feistel structure. If you don't require cryptographic security then four or six Feistel rounds should give you enough randomness.
Changing the encryption key will give you a different ordering of the numbers.
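A rough, non-cryptographic sketch of this idea: a four-round balanced Feistel network over an even number of bits, combined with "cycle walking" (re-encrypting until the result falls inside 0..n-1) to get an arbitrary period n. The round function and its constants are assumptions made up for illustration, and cycle walking is just one simple way to restrict the range.

```python
def feistel_permute(x, key, half_bits, rounds=4):
    """Permute an integer in [0, 2**(2*half_bits)) with a balanced Feistel network."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for r in range(rounds):
        # Assumed round function: any deterministic mix of (right, key, r) keeps
        # the network invertible; these constants are arbitrary.
        f = (right * 0x9E3779B1 + key + r) & 0xFFFFFFFF
        f = (f * 0x85EBCA6B) & 0xFFFFFFFF
        f = (f ^ (f >> 16)) & mask
        left, right = right, left ^ f
    return (left << half_bits) | right

def permuted_index(i, n, key):
    """Map i in [0, n) to a distinct value in [0, n) without storing the permutation."""
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    x = i
    while True:
        x = feistel_permute(x, key, half_bits)
        if x < n:               # cycle walking: retry until we land back in range
            return x
```

For a period of 20, [permuted_index(i, 20, key) for i in range(20)] visits each of 0..19 exactly once, and a different key gives a different permutation function.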
I have an array A (size <= 10^5) of numbers (<= 10^8), and I need to answer about 50000 queries of the form: for L, R, how many subsets of the elements in the range [L, R] have an XOR with 0 or 1 bits set (that is, 0 or a power of 2)? Point modifications to the array happen between the queries, so I can't really do offline processing or use techniques like square root decomposition.
I have an approach where I use DP to calculate for a given range, something on the lines of this:
https://www.geeksforgeeks.org/count-number-of-subsets-having-a-particular-xor-value/
But this is clearly too slow. This feels like a classical segment tree problem, but I can't figure out what data to store at each node so that I can use the left child and right child to compute the answer for the given range.
Yeah, that DP won't be fast enough.
What will be fast enough is applying some linear algebra over GF(2), the Galois field with two elements. Each number can be interpreted as a bit-vector; adding/subtracting vectors is XOR; scalar multiplication isn't really relevant.
The data you need for each segment is (1) how many numbers there are in the segment and (2) a basis for the subspace generated by the numbers in the segment, which will consist of at most 27 numbers because all numbers are less than 2^27. The basis for a one-element segment is just that number if it's nonzero, else the empty set. To find the span of the union of two bases, use Gaussian elimination and discard the zero vectors.
Given the length of an interval and a basis for it, you can count the number of good subsets using the rank-nullity theorem. Basically, for each target number, use your Gaussian elimination routine to test whether the target number belongs to the subspace. If so, there are 2^(length of interval minus size of basis) subsets. If not, the answer is zero.
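A sketch of the per-node data and the counting step (the segment tree itself is omitted). The basis is stored as an array indexed by leading bit; the function names are hypothetical, and this version counts the empty subset as having XOR 0.

```python
BITS = 27  # all inputs are below 2**27

def insert(basis, v):
    """Gaussian-eliminate v into the basis; return True if the rank grew."""
    for bit in reversed(range(BITS)):
        if not (v >> bit) & 1:
            continue
        if basis[bit] == 0:
            basis[bit] = v      # v becomes the vector with this leading bit
            return True
        v ^= basis[bit]         # cancel the leading bit and keep reducing
    return False

def merge(a, b):
    """Basis for the span of the union of two bases (combining two children)."""
    out = list(a)
    for v in b:
        if v:
            insert(out, v)
    return out

def in_span(basis, v):
    """Test whether v can be written as an XOR of basis vectors."""
    for bit in reversed(range(BITS)):
        if (v >> bit) & 1 and basis[bit]:
            v ^= basis[bit]
    return v == 0

def count_good_subsets(length, basis):
    """Number of subsets whose XOR is 0 or a power of two."""
    rank = sum(1 for v in basis if v)
    targets = [0] + [1 << k for k in range(BITS)]
    hits = sum(1 for t in targets if in_span(basis, t))
    return hits * 2 ** (length - rank)

def leaf(value):
    """Basis for a one-element segment."""
    b = [0] * BITS
    insert(b, value)
    return b
```

For the segment [1, 2, 3], merging the three leaf bases gives rank 2, the targets 0, 1 and 2 are reachable, and each is hit by 2^(3-2) = 2 subsets, for a total of 6.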
Which exponent(s) d will require this many?
Would greatly appreciate any advice as to how to go about solving this problem.
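Assuming unsigned integers and a simple power-by-squaring algorithm like:
// DWORD is a 32-bit unsigned integer (from <windows.h>; elsewhere uint32_t from <cstdint> can stand in for it)
DWORD powuu(DWORD a, DWORD b)
{
    const int bits = 32;
    DWORD d = 1;
    for (int i = 0; i < bits; i++)      // scan the exponent from its most significant bit
    {
        d *= d;                         // one squaring per exponent bit
        if (b & 0x80000000u) d *= a;    // one extra multiplication per set bit
        b <<= 1;                        // move the next bit into the top position
    }
    return d;
}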
You just need to replace a*b with modmul(a,b,n) or (a*b)%n, so the answer is:
if the exponent has k bits and l of them are set, you need k+l multiplications
the worst case is 2k multiplications, for the exponent (2^k)-1
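The k+l count can be checked with a small sketch of the same left-to-right square-and-multiply loop. This version is an illustration, not the code above: it starts at the exponent's highest set bit, so k is the exponent's actual bit length rather than a fixed 32.

```python
def modpow_count(a, e, n):
    """Return (a**e mod n, number of modular multiplications used)."""
    result, mults = 1, 0
    for bit in bin(e)[2:]:          # exponent bits, most significant first
        result = result * result % n
        mults += 1                  # one squaring per bit
        if bit == '1':
            result = result * a % n
            mults += 1              # one extra multiplication per set bit
    return result, mults
```

For e = 255 = 2^8 - 1 (k = 8, l = 8) this uses 16 multiplications, while e = 256 (k = 9, l = 1) uses only 10.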
For more info see related QAs:
Power by squaring for negative exponents
modular arithmetics and NTT (finite field DFT) optimizations
For a naive implementation, it's clearly the exponent with the largest Hamming weight (number of set bits). In this case, (2^k - 1) would require the most multiplication steps: k of them, in addition to the squarings.
For k-ary window methods, the number of multiplications can be made independent of the exponent. E.g., for a fixed window size w = 3 we could precompute the powers {m^0, m^1, m^2, m^3, ..., m^7} (all mod n in this case, and probably in Montgomery representation for efficient reduction). The result is ceil(k/w) multiplications on top of the squarings. This is often preferred in cryptographic implementations, as the exponent is not revealed by simple timing attacks: any k-bit exponent has the same timing. (The reality is a bit more complex if it is assumed the attacker has 'fine-grained' access to things like cache performance, etc.)
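A sketch of the fixed-window method with w = 3, assuming plain modular reduction instead of Montgomery form. It only illustrates the operation count: Python integers are not constant-time, so this is not a side-channel-safe implementation, and the function name is hypothetical.

```python
def modpow_fixed_window(m, e, n, w=3):
    """Compute m**e mod n using a fixed window of w exponent bits."""
    # Precompute m^0 .. m^(2^w - 1) mod n.
    table = [1 % n]
    for _ in range((1 << w) - 1):
        table.append(table[-1] * m % n)

    k = max(e.bit_length(), 1)
    windows = -(-k // w)            # ceil(k / w)
    result = 1 % n
    for i in reversed(range(windows)):
        for _ in range(w):          # w squarings per window
            result = result * result % n
        digit = (e >> (i * w)) & ((1 << w) - 1)
        # Always multiply, even when digit == 0 (table[0] is 1), so the
        # sequence of operations does not depend on the exponent bits.
        result = result * table[digit] % n
    return result
```

Every exponent of the same bit length goes through the same ceil(k/w) table multiplications, regardless of how many bits are set.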
Sliding window techniques are typically more efficient, and only slightly more difficult to implement than fixed-window methods. However, they also leak side-channel data, as the timing depends on the exponent. Furthermore, finding the 'best' sequence to use is known to be a hard problem.
For an implementation of Perlin noise, I need to select a vector from a static list of n vectors for each integer coordinate in 3D space. This boils down to generating a pseudo random number in 1..n from four signed integer values x, y, z and seed.
unsigned int pseudo_random_number(int x, int y, int z, int seed);
The algorithm should be stateless, i.e., return the same number each time it is called with the same input values.
An existing Perlin noise implementation I looked at multiplies each integer with a large prime, adds the results, does some bit manipulation on the sum and takes the remainder of a division by n. I don't want to just copy this because I don't understand a few things about it:
How are the primes selected?
Why is the additional bit manipulation done?
How do I know if this is "sufficiently pseudo-random" to generate a visually pleasing result?
I looked for explanations of how a PRNG works but I couldn't find anything about multiple input values.
If you have arbitrary precision pseudo-random number generation then you can just concatenate the four inputs (x,y,z,seed) and call your pseudo-random number generator function on this input to get the "next" pseudo-random number which will serve as your random number. (and then take the appropriate number of high bits if you want to have a random number between 1 and n).
The implementation you mentioned uses the fact that different large prime numbers, modulo n, produce essentially uncorrelated results (modulo n) when multiplied with input integers. Of course, for this to work, your input integers must not all share a common divisor with n. This is why the additional bit manipulation is done, so that if all of your input integers are divisible by k and n is divisible by k, the remainder modulo n will not automatically be divisible by k as well. At any rate, people have put a lot of thought into established pseudo-random number generators, so my advice to you is to trust that they considered all the potential issues and that their generator is "good" if there is a large crowd that uses it without complaints.
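A stateless sketch along those lines: multiply each input by a different large odd constant, combine, scramble the bits, then reduce modulo n. The multipliers and the mixing steps are assumptions chosen for illustration, not the constants of any particular Perlin noise implementation, and the result is a 0-based index into the vector list (add 1 for the range 1..n).

```python
MASK32 = 0xFFFFFFFF

def pseudo_random_number(x, y, z, seed, n):
    """Return a repeatable index in 0..n-1 for the given integer inputs."""
    # Combine the inputs; the masks keep everything in 32 bits, which also
    # handles negative coordinates (two's complement wraparound).
    h = (x * 73856093) & MASK32
    h ^= (y * 19349663) & MASK32
    h ^= (z * 83492791) & MASK32
    h ^= (seed * 0x9E3779B1) & MASK32
    # Bit mixing so that regularities in the inputs (e.g. all even) do not
    # survive into the remainder modulo n.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h % n
```

Whether the result is "random enough" is easiest to judge by rendering the noise and looking for visible grid-aligned patterns.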
I have a list of size n which contains n consecutive members of an arithmetic progression, not in order. I replaced fewer than half of the elements in this list with random integers. From this new list, how can I find the difference of the initial arithmetic progression?
I thought a lot about it but except brute force, I was not able to come up with any other thing :(
Thanks for thinking on this one :)
It's not possible to solve this in general and be 100% sure that your answer is correct. Let's say that the initial list is the following arithmetic progression (not in order):
1 3 2 4
Change less than half the elements at random... let's say for example that we changed 2 to 5:
1 3 5 4
If we could first find out which numbers we need to change to obtain a valid shuffled arithmetic sequence, then we could easily solve the problem stated in the question. However, we can see that there are multiple possible answers depending on which number we choose to change:
6, 3, 5, 4 (difference is 1)
1, 3, 2, 4 (difference is 1)
1, 3, 5, 7 (difference is 2)
There is no way to know which of these possible sequences is the original sequence, so you cannot be sure what the original difference was.
Since there is no deterministic solution for the problem (as stated by @Mark Byers), you can try a probabilistic approach.
It's difficult to recover the original progression, but its rate (the common difference) can be obtained easily by comparing the differences between elements. The difference between any two original elements is a multiple of the rate.
Suppose you take 2 elements from the list (the probability that both of them belong to the original sequence is at least 1/4) and compute the difference. This difference, with probability of at least 1/4, will be a multiple of the rate. Decompose it into prime factors and count them (for example, 12 = 2^2 * 3 will add 2 to 2's counter and 1 to 3's counter).
After many such iterations (it looks like a good problem for probabilistic methods, like Monte Carlo), you could analyze the counters.
If a prime factor belongs to the rate, its counter will be at least num_iterations/4 (or num_iterations/2 if it appears twice).
The main problem is that small factors will have a large probability on random input (for example, the difference between two random numbers has a 50% probability of being divisible by 2). So you'll have to compensate for it: since 3/4 of your differences were random, you'll have to consider that (3/8)*num_iterations of 2's counter must be ignored. Since this also applies to all powers of two, the simplest way is to pregenerate a "white noise mask" by taking the differences only between random numbers.
EDIT: let's take this approach further. Suppose you create this "white noise mask" (let's call it a spectrum) for random numbers, and consider it a base-1 spectrum, since their smallest "largest common factor" is 1. By computing it for the differences of an arithmetic sequence, you'll obtain a base-R spectrum, where R is the rate, and it will be equivalent to a shifted version of the base-1 spectrum. So you have to find the value of R such that
your_spectrum ~= spectrum(1)*3/4 + spectrum(R)*1/4
You could also look for the largest number R such that at least half of the elements are equal modulo R.
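A brute-force sketch of that last check, requiring a strict majority since fewer than half of the elements were changed. It tries every candidate R from the largest possible difference downwards, so it is only practical for small value ranges, and it recovers the absolute value of the difference. The function name and the sample list are made up for illustration; as noted above, the answer is a best guess rather than a guarantee.

```python
from collections import Counter

def estimate_difference(values):
    """Largest R for which more than half of the values share a residue mod R."""
    need = len(values) // 2 + 1                 # strict majority
    span = max(values) - min(values)
    for r in range(span, 0, -1):                # largest candidate first
        most_common = max(Counter(v % r for v in values).values())
        if most_common >= need:
            return r
    return None                                 # all values identical

# Shuffled progression 3, 10, ..., 45 (difference 7) with three of its seven
# elements replaced by 8, 58 and 100.
sample = [24, 100, 3, 45, 58, 31, 8]
rate = estimate_difference(sample)
```

Here the four surviving elements 3, 24, 31 and 45 are all congruent to 3 modulo 7, and no larger R groups four elements together, so rate comes out as 7.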