I want a simple (non-cryptographic) random number generation algorithm where I can freely choose the period.
One candidate would be a special instance of LCG:
X(n+1) = (aX(n)+c) mod m (m,c relatively prime; (a-1) divisible by all prime factors of m and also divisible by 4 if m is).
This has period m and does not restrict possible values of m.
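As a quick sanity check of these conditions, here is a minimal Python sketch. The parameters m = 36, a = 13, c = 7 are illustrative choices of mine, and lcg_cycle is a hypothetical helper name.

```python
from math import gcd

def lcg_cycle(m, a, c, seed=0):
    """Collect outputs of X(n+1) = (a*X(n) + c) mod m until the seed recurs."""
    out = []
    x = seed
    while True:
        x = (a * x + c) % m
        out.append(x)
        if x == seed:
            return out

# Illustrative parameters: m = 36 has prime factors 2 and 3 and is divisible
# by 4, so a - 1 must be divisible by 2, 3 and 4 (i.e. by 12); a = 13 works.
# c = 7 is relatively prime to 36.
m, a, c = 36, 13, 7
assert gcd(c, m) == 1 and (a - 1) % 12 == 0

cycle = lcg_cycle(m, a, c)
# Full period: the cycle has length m and visits every residue exactly once.
print(len(cycle) == m, sorted(cycle) == list(range(m)))
```

Printing successive differences of the cycle modulo m would also show how few distinct distances occur, which is the concern raised below.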
I intend to use this RNG to create a permutation of an array by generating indices into it. I tried the LCG and it might be OK. However, it may not be "random enough" in that distances between adjacent outputs have very few possible values (i.e., plotting x(n) vs n gives a wrapped line). The arrays I want to index into have some structure that has to do with this distance and I want to avoid potential issues with this.
Of course, I could use any good PRNG to shuffle (using e.g. Fisher–Yates) an array [1,..., m]. But I don't want to have to store this array of indices. Is there some way to capture the permuted indices directly in an algorithm?
I don't really mind the method ending up biased w.r.t choice of RNG seed. Only the period matters and the permuted sequence (for a given seed) being reasonably random.
Encryption is a one-to-one operation. If you encrypt a range of numbers, you will get the same count of apparently random numbers back. In this case the period will be the size of the chosen range. So for a period of 20, encrypt the numbers 0..19.
If you want the output numbers to be in a specific range, then pick a block cipher with an appropriately sized block and use Format Preserving Encryption if needed, as @David Eisenstat suggests.
It is not difficult to set up a cipher with almost any reasonable block size, so long as it is an even number of bits, using the Feistel structure. If you don't require cryptographic security then four or six Feistel rounds should give you enough randomness.
Changing the encryption key will give you a different ordering of the numbers.
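A minimal sketch of this idea in Python, under some assumptions of my own: the round function is a keyed BLAKE2b hash (chosen only for convenience, not for security), and values that land outside [0, m) are re-encrypted until they fall back in range ("cycle walking", one simple way to get a format-preserving permutation). The name make_permutation and its parameters are hypothetical.

```python
import hashlib

def make_permutation(m, key, rounds=4):
    """Return a function mapping each i in [0, m) to a distinct value in [0, m)."""
    # Smallest even bit width whose range 2**bits covers [0, m).
    bits = max(2, (m - 1).bit_length())
    bits += bits % 2
    half = bits // 2
    mask = (1 << half) - 1

    def round_fn(r, x):
        # Non-cryptographic round function: hash of (key, round, half-block).
        digest = hashlib.blake2b(f"{key}:{r}:{x}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") & mask

    def encrypt(v):
        # Balanced Feistel network: a bijection on [0, 2**bits).
        left, right = v >> half, v & mask
        for r in range(rounds):
            left, right = right, left ^ round_fn(r, right)
        return (left << half) | right

    def permute(i):
        if not 0 <= i < m:
            raise ValueError("index out of range")
        v = encrypt(i)
        # Cycle walking: re-encrypt until the value is back inside [0, m).
        while v >= m:
            v = encrypt(v)
        return v

    return permute

perm = make_permutation(20, key="seed-1")
print([perm(i) for i in range(20)])  # each of 0..19 appears exactly once
```

No index array is stored: perm(i) is computed on demand, and a different key gives a different permutation function.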
Let's say that we have a random number generator that can generate random 32 or 64 bit integers (like rand.Rand in the standard library).
Generating a random int64 in a given range [a,b] is fairly easy:
rand.Seed(time.Now().UnixNano())
// Int63n(k) returns a value in [0, k), so use b-a+1 to make b reachable
n := rand.Int63n(b-a+1) + a
Is it possible to generate random 128 bit decimal (as defined in specification IEEE 754-2008) in a given range from a combination of 32 or 64 bit random integers?
It is possible, but the solution is far from trivial. For a correct solution, there are several things to consider.
For one thing, values with exponent E are 10 times more likely than values with exponent E - 1.
Other issues include subnormal numbers and ranges that straddle zero.
I am aware of the Rademacher Floating-Point Library, which tackled this problem for binary floating-point numbers, but the solution there is complicated and its author has not yet written up how his algorithm works.
EDIT (May 11):
I have now specified an algorithm for generating random "uniform" floating-point numbers in any range, with full coverage, and regardless of the digit base (such as binary or decimal).
Possible, but by no means easy. Here is a sketch of a solution that might be acceptable — writing and debugging it would probably be at least a day of concerted effort.
Let min and max be primitive.Decimal128 objects from go.mongodb.org/mongo-driver/bson. Let MAXBITS be a multiple of 32; 128 is likely to be adequate.
1. Get the significand (as big.Int) and exponent (as int) of min and max using the BigInt method.
2. Align min and max so that they have the same exponent. As far as possible, left-justify the value with the larger exponent by decreasing its exponent and adding a corresponding number of zeroes to the right side of its significand. If this would cause the absolute value of the significand to become >= 2**(MAXBITS-1), then either
   (a) Right-shift the value with the smaller exponent by dropping digits from the right side of its significand and increasing its exponent, causing precision loss,
   (b) Dynamically increase MAXBITS, or
   (c) Throw an error.
3. At this point both exponents will be the same, and both significands will be aligned big integers. Set aside the exponents for now, and let range (a new big.Int) be maxSignificand - minSignificand. It will be between 0 and 2**MAXBITS.
4. Turn range into MAXBITS/32 uint32s using the Bytes or DivMod methods, whatever is easier.
5. If the highest word of range is equal to math.MaxUint32 then set a flag limit to false, otherwise true.
6. For n from 0 to MAXBITS/32:
   - If limit is true, use rand.Int63n (!, not rand.Int31n or rand.Uint32) to generate a value between 0 and the nth word of range, inclusive, cast it to uint32, and store it as the nth word of the output. If the value generated is equal to the nth word of range (i.e. if we generated the maximum possible random value for this word) then let limit remain true, otherwise set it false.
   - If limit is false, use rand.Uint32 to generate the nth word of the output. limit remains false regardless of the generated value.
7. Combine the generated words into a big.Int by building a []byte and using big.Int.SetBytes or multiplication and addition, as convenient.
8. Add the generated value to minSignificand to obtain the significand of the result.
9. Use ParseDecimal128FromBigInt with the result significand and the exponent from steps 2-3 to obtain the result.
The heart of the algorithm is step 6, which generates a uniform random unsigned integer of arbitrary length 32 bits at a time. The alignment in step 2 reduces the problem from a floating-point to an integer one, and the subtraction in step 3 reduces it to an unsigned one, so that we only have to think about one bound instead of 2. The limit flag records whether we're still dealing with that bound, or whether we've already narrowed the result down to an interval that doesn't include it.
Caveats:
I haven't written this, let alone tested it. I may have gotten it quite wrong. A sanity check by someone who does more numerical computation work than me would be welcome.
Generating numbers across a large dynamic range (including crossing zero) will lose some precision and omit some possible output values with smaller exponents unless a ludicrously large MAXBITS is used; however, 128 bits should give a result at least as good as a naive algorithm implemented in terms of decimal128.
The performance is probably pretty bad.
Go has a large number package that can do arbitrary length integers: https://golang.org/pkg/math/big/
It has a pseudo random number generator https://golang.org/pkg/math/big/#Int.Rand, and the crypto package also has https://golang.org/pkg/crypto/rand/#Int
You'd want to specify the max using https://golang.org/pkg/math/big/#Int.Exp as 2^128.
Can't speak to performance, though, or whether this is compliant with the IEEE standard, but large random numbers like what you'd use for UUIDs are possible.
It depends on how many values you want to generate. If it's enough to have no more than 10^34 distinct values in a specified range, it's quite simple.
As I see the problem, a random value in the range min..max can be calculated as random(0..1)*(max-min)+min
It looks like we only need to generate a decimal128 value in the range 0..1. That is a random integer in the range 0..10^34-1 with exponent -34. This value can be generated with Go's standard random package.
To multiply, add and subtract the decimal128 values, Go's math/big package can be used, with normalization of the values.
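A minimal sketch of this approach, written in Python rather than Go because Python's decimal module can emulate decimal128 arithmetic directly. The context settings (34 digits, exponent limits 6144 and -6143) and the name random_decimal are my assumptions. As noted above, this yields at most 10^34 distinct values and does not reach every representable number in the range.

```python
import random
from decimal import Context, Decimal

# Assumed emulation of decimal128: 34 significant digits and its exponent range.
CTX = Context(prec=34, Emax=6144, Emin=-6143)

def random_decimal(lo, hi, rng=random):
    """Return lo + u * (hi - lo), with u one of 10**34 evenly spaced values in [0, 1)."""
    # u = k * 10**-34 for a uniform integer k in [0, 10**34); built from a
    # string so that no digits are lost to the default context's precision.
    k = rng.randrange(10**34)
    u = Decimal(f"{k}E-34")
    span = CTX.subtract(hi, lo)
    # Each operation rounds to 34 digits, so the result may land exactly on hi.
    return CTX.add(lo, CTX.multiply(u, span))

print(random_decimal(Decimal("1.5"), Decimal("2.5")))
```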
This is definitely what you are looking for.
In a counting sort algorithm, we initialize a count array with a size of Maximum Value in a given array. Runtime of this method is O(n + Max value). However, with an extra loop, we can look for the minimum and maximum value of the given array:
min = max = given_array[0]
for i = 0 -> Length(given_array) - 1
    if given_array[i] > max
        max = given_array[i]
    if given_array[i] < min
        min = given_array[i]
Then use that data to create the count array, let's say between 95-100. We could decrease the runtime tremendously in some cases. However, I haven't seen an approach like this. Would it still be a counting sort algorithm, or does it have another name that I don't know?
Counting sort is typically used when we know upfront that values will be restricted to a certain range.
This range doesn't need to start at zero; it's absolutely fine to use an array of length six whose elements represent the counts of values 95 through 100 (or, for that matter, the counts of values from −2 to 3). So, yes, your approach is still "counting sort".
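For illustration, a minimal Python sketch of counting sort with an offset count array; the function name is hypothetical and integer inputs are assumed.

```python
def counting_sort(values):
    """Sort a list of integers using counts indexed by (value - min)."""
    if not values:
        return []
    lo, hi = min(values), max(values)   # the extra pass over the input
    counts = [0] * (hi - lo + 1)        # one slot per value in [lo, hi]
    for v in values:
        counts[v - lo] += 1
    result = []
    for offset, count in enumerate(counts):
        result.extend([lo + offset] * count)
    return result

print(counting_sort([98, 95, 100, 97, 95, 99]))
```

Because only value - lo is used as an index, negative inputs such as values from -2 to 3 work the same way.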
But if you don't know this restriction upfront, you're not likely to get faster results by doing a complete pass over the data to check.
For example: suppose you have 1,000,000 elements, and you know they're all somewhere in the range 0–200, but you think they're probably all in a much narrower range. Well, the cost of prescanning the entire input array is going to be greater than the cost of working with a 201-element working array, which means it costs more than it can possibly save compared to just doing a counting sort with the range 0–200.
Runtime of this method is O(n + Max value).
The runtime is O(max(num_elements, range_size)), which — due to the magic of Landau (big-O) notation — is the same as O(num_elements + range_size). Your approach only affects the asymptotic complexity if max_value is asymptotically greater than both num_elements and range_size.
Suppose you are given a range and a few numbers in the range (exceptions). Now you need to generate a random number in the range except the given exceptions.
For example, if range = [1..5] and exceptions = {1, 3, 5} you should generate either 2 or 4 with equal probability.
What logic should I use to solve this problem?
If you have no constraints at all, I guess this is the easiest way: create an array containing the valid values, a[0]...a[m]. Return a[rand(0,...,m)].
If you don't want to create an auxiliary array, but you can count the number of exceptions e and of elements n in the original range, you can simply generate a random number r = rand(1 ... n-e), and then walk the range with a counter that doesn't tick on exceptions, stopping at the element where the counter equals r.
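A minimal Python sketch of the counter-based variant, assuming an inclusive integer range [lo, hi] and a 1-based rank r; pick_by_counting is a hypothetical name.

```python
import random

def pick_by_counting(lo, hi, exceptions, rng=random):
    """Pick uniformly from [lo, hi] minus exceptions without building an array."""
    excluded = {x for x in exceptions if lo <= x <= hi}
    valid = (hi - lo + 1) - len(excluded)      # n - e
    if valid <= 0:
        raise ValueError("every value in the range is excluded")
    r = rng.randint(1, valid)                  # 1-based rank of the value to return
    count = 0
    for v in range(lo, hi + 1):
        if v not in excluded:
            count += 1                         # the counter does not tick on exceptions
            if count == r:
                return v

print(pick_by_counting(1, 5, {1, 3, 5}))  # prints 2 or 4
```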
Depends on the specifics of the case. For your specific example, I'd return a 2 if a Uniform(0,1) was below 1/2, 4 otherwise. Similarly, if I saw a pattern such as "the exceptions are odd numbers", I'd generate values for half the range and double. In general, though, I'd generate numbers in the range, check if they're in the exception set, and reject and re-try if they were - a technique known as acceptance/rejection for obvious reasons. There are a variety of techniques to make the exception-list check efficient, depending on how big it is and what patterns it may have.
Let's assume, to keep things simple, that arrays are indexed starting at 1, and your range runs from 1 to k. Of course, you can always shift the result by a constant if this is not the case. We'll call the array of exceptions ex_array, and let's say we have c exceptions. These need to be sorted, which shall turn out to be pretty important in a while.
Now, you only have k-c useful numbers to work with, so it'll be meaningful to find a random number in the range 1 to k-c. Say we end up with the number r. Now, we just need to find the r-th valid number in your array. Simple? Not so much. Remember, you can never simply walk over any of your arrays in a linear fashion, because that can really slow down your implementation when you have a lot of numbers. You have to do some sort of binary search, say, to come up with a fast enough algorithm.
So let's try something better. The r-th number would nominally have been at index r in your original array had you had no exceptions. The number at index r is r, of course, since your range and your array indices start from 1. But you have a bunch of invalid numbers between 1 and r, and you want to somehow get to the r-th valid number. So, let's do a binary search on the array of exceptions, ex_array, to find how many invalid numbers are equal to or less than r, because that is how many invalid numbers lie between 1 and r. If this number is 0, we're all done, but if it isn't, we have a bit more work to do.
Assume you found there were n invalid numbers between 1 and r after the binary search. Let's advance n indices in your array to the index r+n, and find the number of invalid numbers lying between 1 and r+n, using a binary search to find how many elements in ex_array are less than or equal to r+n. If this number is exactly n, no more invalid numbers were encountered, and you've hit upon your r-th valid number. Otherwise, repeat again, this time for the index r+n', where n' is the number of invalid numbers that lay between 1 and r+n.
Repeat till you get to a stage where no excess exceptions are found. The important thing here is that you never once have to walk over any of the arrays in a linear fashion. You should optimize the binary searches so they don't always start at index 0. Say you know there are n invalid numbers between 1 and r. Instead of starting your next binary search from 1, you could start it from one index after the index corresponding to n in ex_array.
In the worst case, you'll be doing binary searches for each element in ex_array, which means you'll do c binary searches, the first starting from index 1, the next from index 2, and so on, which gives you a time complexity of O(log(c!)). Now, Stirling's approximation tells us that O(ln(c!)) = O(c ln(c)), so using the algorithm above only makes sense if c is small enough that O(c ln(c)) < O(k), since you can achieve O(k) complexity using the trivial method of extracting valid elements from your array first.
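The repeated binary search described above can be sketched in Python with the standard bisect module. This assumes a range 1..k and exceptions that all lie inside it; rth_valid and pick_excluding are hypothetical names.

```python
import random
from bisect import bisect_right

def rth_valid(r, ex_sorted):
    """Return the r-th (1-based) number in 1, 2, 3, ... that is not in ex_sorted."""
    n = 0  # invalid numbers accounted for so far
    while True:
        # Count exceptions <= r + n, resuming the search after the first n entries.
        found = bisect_right(ex_sorted, r + n, lo=n)
        if found == n:
            return r + n       # no new exceptions were skipped over
        n = found

def pick_excluding(k, exceptions, rng=random):
    """Pick uniformly from 1..k minus exceptions (all assumed to lie in 1..k)."""
    ex_sorted = sorted(set(exceptions))
    r = rng.randint(1, k - len(ex_sorted))
    return rth_valid(r, ex_sorted)

print(pick_excluding(5, [1, 3, 5]))  # prints 2 or 4
```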
In Python the solution is very simple (given your example):
import random
rng = set(range(1, 6))
ex = {1, 3, 5}
random.choice(list(rng-ex))
To optimize the solution, one needs to know how long the range is and how many exceptions there are. If the number of exceptions is very low, it's possible to generate a number from the range and just check that it's not an exception. If exceptions dominate, it probably makes sense to gather the remaining numbers into an array and generate a random index for fetching a non-exception.
In this answer I assume that it is known how to get an integer random number from a range.
Here's another approach...just keep on generating random numbers until you get one that isn't excluded.
Suppose your desired range was [0,100) excluding 25,50, and 75.
Put the excluded values in a hashtable or bitarray for fast lookup.
int randNum = rand(0,100);
while( excludedValues.contains(randNum) )
{
randNum = rand(0,100);
}
The complexity analysis is more difficult, since potentially rand(0,100) could return 25, 50, or 75 every time. However, that is quite unlikely (assuming a good random number generator), even if half of the range is excluded.
In the above case, we re-generate a random value for only 3/100 of the original values.
So 3% of the time you regenerate once. Of those 3%, only 3% will need to be regenerated, etc.
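To put a number on this: assuming independent uniform draws, and writing p for the excluded fraction of the range (my notation), the number of draws follows a geometric distribution.

```latex
\[
  \mathbb{E}[\text{draws}]
  \;=\; \sum_{k=1}^{\infty} k\,p^{\,k-1}(1-p)
  \;=\; \frac{1}{1-p},
  \qquad
  p = \tfrac{3}{100} \;\Rightarrow\; \mathbb{E}[\text{draws}] = \tfrac{100}{97} \approx 1.03 .
\]
```

Even with half of the range excluded (p = 1/2), the expected number of draws is only 2.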
Suppose the initial range is [1,n] and the exclusion set's size is x. First generate a map from [1, n-x] to the numbers [1,n] excluding the numbers in the exclusion set. This mapping is 1-1 since there are equal numbers on both sides. In the example given in the question the mapping will be as follows - {1->2,2->4}.
Another example suppose the list is [1,10] and the exclusion list is [2,5,8,9] then the mapping is {1->1, 2->3, 3->4, 4->6, 5->7, 6->10}. This map can be created in a worst case time complexity of O(nlogn).
Now generate a random number in [1, n-x] and map it to the corresponding number using the mapping. Map lookups can be done in O(logn).
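A minimal Python sketch of this mapping. It uses a dict, so lookups are O(1) on average rather than the O(logn) of a tree-based map; the function names are hypothetical.

```python
import random

def build_mapping(n, excluded):
    """Map 1..(n - x) onto the values of 1..n that are not excluded, in order."""
    excluded = set(excluded)
    valid = [v for v in range(1, n + 1) if v not in excluded]
    return {i: v for i, v in enumerate(valid, start=1)}

def pick_mapped(mapping, rng=random):
    # Draw a key in [1, n - x] and translate it through the mapping.
    return mapping[rng.randint(1, len(mapping))]

mapping = build_mapping(10, [2, 5, 8, 9])
print(mapping)
print(pick_mapped(mapping))
```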
You can do it in a versatile way if you have enumerators or set operations. For example using Linq:
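void Main()
{
    var exceptions = new[] { 1, 3, 5 };
    // Draw random values lazily, skip the exceptions, print the first 10 valid ones.
    var picks = RandomSequence(1, 5).Where(x => !exceptions.Contains(x)).Take(10);
    foreach (var n in picks)
        Console.WriteLine(n);
}
static Random r = new Random();
IEnumerable<int> RandomSequence(int min, int max)
{
    // Infinite sequence; the upper bound of r.Next is exclusive, hence max + 1.
    while (true)
        yield return r.Next(min, max + 1);
}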
I would like to acknowledge some comments that are now deleted:
It's possible that this program never ends (only theoretically) because there could be a sequence that never contains valid values. Fair point. I think this is something that could be explained to the interviewer, however I believe my example is good enough for the context.
The distribution is fair because each of the elements has the same chance of coming up.
The advantage of answering this way is that you show understanding of modern "functional-style" programming, which may be interesting to the interviewer.
The other answers are also correct. This is a different take on the problem.
I am curious to know what the best way is to generate a random integer R that is not in a provided set of integers (R∉N). I can think of several ways of doing this but I'm wondering what you all think.
Let N be the size of the overall set, and let K be the size of the excluded set.
It depends on the size of the set you are sampling from. If the excluded set is much smaller than the overall range, just choose a random number, and if it is in the excluded set, choose again. If we keep the excluded set in a hash table, each try can be done in O(1) time.
If the excluded set is large, choose a random number R in a set of size (N - K) and output the choice as the member of the non excluded elements. If we store just the holes in a hash table keyed with the value of the random number we can generate this in one sample in time O(1).
The cutoff point will depend on the size of K/N, but I suspect that unless this is greater than .5 or so, or your sets are very small, just sampling until you get a hit will be faster in practice.
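One possible reading of the "holes in a hash table" idea, sketched in Python. I assume the overall set is the integers 0..N-1; each excluded value below N - K (a hole) is mapped to a distinct allowed value in the top K slots, so a single draw below N - K always yields an allowed value. Names are hypothetical.

```python
import random

def build_hole_map(n, excluded):
    """Return (cutoff, holes): holes maps each excluded value below cutoff = n - k
    to a distinct allowed value in [cutoff, n)."""
    excluded = {x for x in excluded if 0 <= x < n}
    cutoff = n - len(excluded)
    # Allowed values in the tail [cutoff, n) serve as replacements; there are
    # exactly as many of them as there are excluded values below the cutoff.
    spares = (v for v in range(cutoff, n) if v not in excluded)
    holes = {h: next(spares) for h in sorted(excluded) if h < cutoff}
    return cutoff, holes

def sample_excluding(cutoff, holes, rng=random):
    r = rng.randrange(cutoff)    # one draw from a set of size N - K
    return holes.get(r, r)       # O(1) average-time redirect if r is a hole

cutoff, holes = build_hole_map(10, {2, 5, 8, 9})
print(cutoff, holes, sample_excluding(cutoff, holes))
```

Building the table takes time proportional to K, after which each sample needs exactly one random draw.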
Given your limited description? Find the maximum value of the elements in N. Generate only random numbers greater than that.
Generate a random number R in a domain shrunk by the size of N (subtract the size of N from the max value that you want to use). Then loop through the elements of N in ascending order, and for each one less than or equal to the current R, add 1 to R. This will give a random number in the domain that is not in N.
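A minimal Python sketch of this shifting approach, assuming an inclusive integer domain [lo, hi] and comparing each excluded value against the running value of R; pick_not_in is a hypothetical name.

```python
import random

def pick_not_in(lo, hi, excluded, rng=random):
    """Pick uniformly from [lo, hi] minus excluded by shifting past exclusions."""
    ex_sorted = sorted({x for x in excluded if lo <= x <= hi})
    if len(ex_sorted) > hi - lo:
        raise ValueError("every value in the domain is excluded")
    # Draw from the domain shrunk by the number of excluded values.
    r = rng.randint(lo, hi - len(ex_sorted))
    for x in ex_sorted:
        if x <= r:
            r += 1     # step over an excluded value at or below the current r
        else:
            break      # remaining exclusions are larger and cannot affect r
    return r

print(pick_not_in(1, 5, [1, 3, 5]))  # prints 2 or 4
```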