How to generate uniformly distributed sequences of random numbers? - random

I need to generate uniformly distributed binary sequences (strings) of length 1024, such that the integer obtained from each string is also uniformly distributed. I could generate a random integer directly, but I suppose the range [0 to 2^(2^10)] is too wide. What would you suggest?
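One possible approach, sketched here in Python under the assumption that a non-cryptographic generator is acceptable: draw 1024 independent random bits at once and format them as a zero-padded binary string. Because every bit is independent and unbiased, the integer value of the string is uniform over all 2^1024 possibilities. The function name `random_bit_string` is made up for this illustration.

```python
import random

def random_bit_string(n_bits=1024, rng=random):
    """Return a string of n_bits characters, each '0' or '1'.

    rng is anything with a getrandbits() method: the random module itself
    or a random.Random instance (assumption: no cryptographic strength needed).
    """
    value = rng.getrandbits(n_bits)              # uniform integer in [0, 2**n_bits)
    return format(value, "0{}b".format(n_bits))  # zero-pad on the left to n_bits

bits = random_bit_string()
as_integer = int(bits, 2)  # the same value, recovered from the string
```

If unpredictability matters, `secrets.randbits(1024)` from the standard library could be swapped in for `getrandbits`.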

Related

Random number generator with freely chosen period

I want a simple (non-cryptographic) random number generation algorithm where I can freely choose the period.
One candidate would be a special instance of LCG:
X(n+1) = (aX(n)+c) mod m (m,c relatively prime; (a-1) divisible by all prime factors of m and also divisible by 4 if m is).
This has period m and does not restrict possible values of m.
I intend to use this RNG to create a permutation of an array by generating indices into it. I tried the LCG and it might be OK. However, it may not be "random enough" in that distances between adjacent outputs take very few possible values (i.e., plotting x(n) vs n gives a wrapped line). The arrays I want to index into have some structure related to this distance, and I want to avoid potential issues with this.
Of course, I could use any good PRNG to shuffle (using e.g. Fisher–Yates) an array [1,..., m]. But I don't want to have to store this array of indices. Is there some way to capture the permuted indices directly in an algorithm?
I don't really mind the method ending up biased w.r.t choice of RNG seed. Only the period matters and the permuted sequence (for a given seed) being reasonably random.
Encryption is a one-to-one operation. If you encrypt a range of numbers, you will get the same count of apparently random numbers back. In this case the period will be the size of the chosen range. So for a period of 20, encrypt the numbers 0..19.
If you want the output numbers to be in a specific range, then pick a block cipher with an appropriately sized block and use Format Preserving Encryption if needed, as @David Eisenstat suggests.
It is not difficult to set up a cipher with almost any reasonable block size, so long as it is an even number of bits, using the Feistel structure. If you don't require cryptographic security then four or six Feistel rounds should give you enough randomness.
Changing the encryption key will give you a different ordering of the numbers.
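A minimal sketch of this idea in Python, not a vetted cipher: a four-round Feistel network over an even number of bits, combined with cycle-walking (re-encrypting until the result falls below the period) so that the output is a permutation of 0..period-1. The names `permuted_index` and `_round_function` are made up for this example, and using BLAKE2b as the round function is my own assumption; any reasonable mixing function should do for non-cryptographic use.

```python
import hashlib

def _round_function(value, key, round_index, half_bits):
    # Hypothetical round function: hash (key, round, value), keep half_bits bits.
    data = "{}:{}:{}".format(key, round_index, value).encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)

def permuted_index(i, period, key, rounds=4):
    """Map i in [0, period) to a unique value in [0, period)."""
    if not 0 <= i < period:
        raise ValueError("i must be in [0, period)")
    # Smallest even block size (2 * half_bits) that can hold period - 1.
    half_bits = max(1, ((period - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    x = i
    while True:
        left, right = x >> half_bits, x & mask
        for r in range(rounds):
            left, right = right, left ^ _round_function(right, key, r, half_bits)
        x = (left << half_bits) | right
        if x < period:   # cycle-walking: repeat until back inside the range
            return x

# Period 20: each of 0..19 appears exactly once, in a key-dependent order.
order = [permuted_index(i, 20, key=12345) for i in range(20)]
```

No array of indices is stored; each index is computed on demand. The block holds fewer than 4 * period values, so the cycle-walking loop should need only a few iterations on average.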

How to compress an array of random positive integers in a certain range?

I want to compress an array consisting of about 10^5 random integers in the range 0 to 2^15. The integers are unsorted and I need to compress them losslessly.
I don't care much about the amount of computation and time needed to run the algorithm; I just want the best possible compression ratio.
Are there any suggested algorithms for this?
Assuming you don't need to preserve the original order, store how often each value occurs instead of storing the values themselves. With about 10^5 numbers drawn uniformly from 2^15 possible values, each value appears about 3 times on average. With 3 bits per value you can count up to 7, so make an array of 2^15 * 3 bits in which every 3-bit field holds the count of the corresponding value. To handle the rare values that occur more than 7 times, also send a list of those values with their full counts. The decoder reads the 3-bit array and then overwrites the entries whose count is higher than 7 with this additional information.
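A rough Python sketch of that counting scheme, with made-up names `encode_counts` and `decode_counts`. For readability the 3-bit fields are kept as a plain list of small integers rather than actually packed into bits, and the overflow list is a dictionary.

```python
def encode_counts(values, range_size=1 << 15, bits=3):
    """Return (capped_counts, overflow) describing the multiset of values."""
    cap = (1 << bits) - 1                    # 7 for 3-bit fields
    counts = [0] * range_size
    for v in values:
        counts[v] += 1
    capped = [min(c, cap) for c in counts]   # what would go into the 3-bit array
    overflow = {v: c for v, c in enumerate(counts) if c > cap}
    return capped, overflow

def decode_counts(capped, overflow):
    """Rebuild the values in sorted order; the original order is lost."""
    out = []
    for v, c in enumerate(capped):
        out.extend([v] * overflow.get(v, c))  # the overflow entry overrides the field
    return out
```

Decoding returns the same multiset of numbers, sorted, which is only acceptable under the stated assumption that order does not matter.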
For your exact example: just encode each number as a 15-bit unsigned int and apply bit packing. This is optimal since you have stated that each integer is uniformly random in [0, 2^15), and the Shannon entropy of this distribution is 15 bits.
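Bit packing could look roughly like this in Python. This is only a sketch; `pack15` and `unpack15` are illustrative names, and the element count is assumed to be stored separately so the decoder can ignore the padding bits in the last byte.

```python
def pack15(values):
    """Pack integers in [0, 2**15) into bytes, 15 bits each, most significant bit first."""
    out = bytearray()
    acc = 0      # bits not yet written
    nbits = 0    # how many bits acc currently holds
    for v in values:
        if not 0 <= v < (1 << 15):
            raise ValueError("value does not fit in 15 bits")
        acc = (acc << 15) | v
        nbits += 15
        while nbits >= 8:                 # flush whole bytes
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1           # keep only the unwritten bits
    if nbits:                             # pad the final partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack15(data, count):
    """Inverse of pack15; count is the number of packed values."""
    values = []
    acc = 0
    nbits = 0
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        # At most one 15-bit value becomes available per input byte.
        if nbits >= 15 and len(values) < count:
            nbits -= 15
            values.append((acc >> nbits) & 0x7FFF)
            acc &= (1 << nbits) - 1
    return values
```

For 10^5 values this gives 187,500 bytes, compared with 200,000 bytes for 16-bit storage.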
For a more general solution, apply Quantile Compression (https://github.com/mwlon/quantile-compression/). It takes advantage of any smooth-ish data and compresses near optimally on shuffled data. It works by encoding each integer with a Huffman code for its coarse range in the distribution, followed by an exact offset within that range.
These approaches are both computationally cheap, but more compute won't get you further in this case.

Generating a stateless pseudo-random number from four integers

For an implementation of Perlin noise, I need to select a vector from a static list of n vectors for each integer coordinate in 3D space. This boils down to generating a pseudo random number in 1..n from four signed integer values x, y, z and seed.
unsigned int pseudo_random_number(int x, int y, int z, int seed);
The algorithm should be stateless, i.e., return the same number each time it is called with the same input values.
An existing Perlin noise implementation I looked at multiplies each integer by a large prime, adds the results, does some bit manipulation on the sum, and takes the remainder of a division by n. I don't want to just copy this because I don't understand a few things about it:
How are the primes selected?
Why is the additional bit manipulation done?
How do I know if this is "sufficiently pseudo-random" to generate a visually pleasing result?
I looked for explanations of how a PRNG works but I couldn't find anything about multiple input values.
If you have arbitrary precision pseudo-random number generation then you can just concatenate the four inputs (x,y,z,seed) and call your pseudo-random number generator function on this input to get the "next" pseudo-random number which will serve as your random number. (and then take the appropriate number of high bits if you want to have a random number between 1 and n).
The implementation you mentioned relies on the fact that multiplying the input integers by different large primes produces essentially uncorrelated results modulo n. This only works if your input integers do not all share a common divisor with n. That is why the additional bit manipulation is done: if all of your input integers are divisible by k and n is divisible by k, then without it the remainder modulo n would automatically be divisible by k as well. At any rate, people have put a lot of thought into established pseudo-random number generators, so my advice is to trust that they considered all the potential issues and that a generator is "good" if a large crowd uses it without complaints.
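To make the multiply, add, mix, reduce pattern concrete, here is a hedged Python sketch. It is not the implementation from the question: the four multipliers are arbitrary large odd constants picked for illustration, the mixing steps are borrowed from the 32-bit finalizer of MurmurHash3, and `pseudo_random_index` is a made-up name. It returns a value in 0..n-1; add 1 if you need 1..n.

```python
MASK32 = 0xFFFFFFFF

def pseudo_random_index(x, y, z, seed, n):
    """Stateless hash of four signed integers to an index in [0, n)."""
    # Multiply each input by a different large odd constant and add the results.
    # The constants are illustrative assumptions, not tuned values.
    h = x * 73856093 + y * 19349663 + z * 83492791 + seed * 2654435761
    h &= MASK32          # wrap to 32 bits; this also folds negative inputs
    # Bit mixing (MurmurHash3 32-bit finalizer) so that every input bit
    # affects every output bit before the reduction modulo n.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h % n

# The same inputs always give the same index, with no stored state.
index = pseudo_random_index(3, -7, 12, seed=42, n=16)
```

Whether the result looks good enough for noise is probably easiest to judge by rendering it: visible stripes or repeating patterns would suggest the mixing is too weak.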

How to generate random numbers with a specified mean

I need to generate k random numbers, let's say in the range 1 to 1000, but the generated numbers should have a mean value of 300. I used the rand() function to generate the random numbers, but I am stuck on the mean value. How can I make the generated numbers have a given mean?
I'd generate k-1 random numbers, and then set the k-th number to (mean*k - [sum of all the numbers generated so far]).
Unfortunately, the C standard does not guarantee that the numbers returned by rand() are uniform (it doesn't specify any distribution, for that matter). A way that works regardless of the distribution is to generate all k numbers in advance, calculate their mean M, and subtract (M - 300) from every element.
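Both suggestions above can be sketched in a few lines. The example is in Python rather than C for brevity, and the function names are made up. Note that neither method keeps the results inside the original range: the last element in the first method, or any shifted element in the second, can fall outside 1..1000.

```python
import random

def with_mean_last(k, low, high, target_mean, rng=random):
    """Draw k-1 integers, then choose the k-th so that the mean is exact."""
    xs = [rng.randint(low, high) for _ in range(k - 1)]
    xs.append(target_mean * k - sum(xs))   # may fall outside [low, high]
    return xs

def with_mean_shift(k, low, high, target_mean, rng=random):
    """Draw k numbers, then shift all of them by the same amount."""
    xs = [rng.uniform(low, high) for _ in range(k)]
    m = sum(xs) / k
    return [x - (m - target_mean) for x in xs]  # the mean is now target_mean

sample = with_mean_shift(10, 1, 1000, 300)
```

If the values must stay within the range, drawing from a distribution whose expected value is already 300 would give that mean only approximately, which may or may not be acceptable.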

Is it possible to denote a vector of numbers uniquely as a number?

Given a vector of numbers: V=(v1, v2, ..., vn). These n numbers don't need to be distinct or sorted.
Suppose that we have a few vectors V1, V2, ..., Vm. Is it possible to use a number (integer or floating point) to uniquely denote each vector, such that for any Vi not equal to Vj, the corresponding numbers f(Vi) and f(Vj) are not equal either?
A simple solution is to have one number in the range from 0 to m-1 as an ID to represent a vector, however we assume that this kind of solution cannot work in the case that each vector is stored in a few distributed machines. That is, the portions of vectors in two machines might overlap, and the algorithm doesn't know the distribution of vectors globally.
Of course if you have n numbers you can't compress them to one number of the same length without losing information (e.g. if you calculate some kind of hash from the vector, there will be hash collisions).
If you have unlimited space (like a BigInteger in Java), you can encode the vectors. Assuming that the vector length is fixed, you can simply use some "interlocking" pattern:
vector = [12345,4711,42]
1 2 3 4 5
0 4 7 1 1
0 0 0 4 2
100240370414512 <-- your unique number
It shouldn't be too hard to encode the vector size as well, so this would work for vectors of different sizes as well (e.g. you use the length in octal and an 8 as "prefix").
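The interlocking pattern above, sketched in Python for non-negative integers and a fixed vector length. Python integers are unbounded, so they stand in for BigInteger here; the function names are invented for this example.

```python
def interleave_digits(vector):
    """Encode a fixed-length vector of non-negative ints as one integer."""
    width = max(len(str(v)) for v in vector)
    padded = [str(v).zfill(width) for v in vector]   # e.g. '12345', '04711', '00042'
    # Take the first digit of every number, then the second digit, and so on.
    return int("".join("".join(column) for column in zip(*padded)))

def deinterleave_digits(number, n):
    """Recover the n-element vector from the encoded integer."""
    digits = str(number)
    digits = digits.zfill(-(-len(digits) // n) * n)  # restore dropped leading zeros
    return [int(digits[i::n]) for i in range(n)]

code = interleave_digits([12345, 4711, 42])   # 100240370414512
```

Decoding needs to know n, which is why the vector length has to be fixed or encoded separately.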
I'm assuming the inputs are in principle unbounded and so is the output number, as it's trivial otherwise. A simple way is to just concatenate the representations of n and v1, v2, ..., vn in some base b. Represent them in k-bit digits, then annotate each k-bit digit with a continuation bit (0 if the next k-bit group starts a new number, 1 if it belongs to the same number). This isn't of much use for anything except equality tests, but you did not mention anything else.
If you also care about preserving locality (i.e. nearby points p, q frequently have nearby values f(p), f(q)), some space-filling curves can be used for this purpose. The Hilbert curve is a bit complicated to generalize to higher dimensions, and the calculation is nontrivial. The Z-order curve isn't as good at preserving locality, but it's almost trivial to implement for any number of dimensions -- just interleave the bits of the binary representation.
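Bit interleaving for the Z-order curve might look like this in Python. It is a sketch that assumes non-negative coordinates which each fit in `bits` bits; `morton_encode` and `morton_decode` are illustrative names.

```python
def morton_encode(coords, bits):
    """Interleave the low `bits` bits of each coordinate into one integer."""
    n = len(coords)
    code = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Bit `bit` of coordinate `dim` goes to position bit * n + dim.
            code |= ((c >> bit) & 1) << (bit * n + dim)
    return code

def morton_decode(code, n_dims, bits):
    """Inverse of morton_encode for n_dims coordinates."""
    coords = [0] * n_dims
    for bit in range(bits):
        for dim in range(n_dims):
            coords[dim] |= ((code >> (bit * n_dims + dim)) & 1) << bit
    return coords

code = morton_encode([3, 5], bits=3)   # 0b100111 == 39
```

The same two loops work for any number of dimensions, which is the main appeal over the Hilbert curve.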
