In what situations does the difference between random numbers generated on [0,1) and those generated on [0,1] make a difference? - algorithm

I'm used to pseudo random number generators that return floating point values in the half open interval [0,1).
I've seen some reference to RNGs that can return values on the closed interval [0,1], e.g. this implementation of the Mersenne Twister.
I can see reasons why you'd want to exclude one, or both, of the endpoints for mathematical reasons, e.g.
exponentially_distributed=-logf( 1.0-rng() )
always yields a valid number if 0.0<=rng()<1.0.
But I can't think of a case where replacing an rng yielding [0,1] with one that yields [0,1) would produce any practical difference.
In what situations does having a floating point pseudo random number generator
that returns values on the closed interval [0,1] absolutely necessary?

Maybe if you're randomly generating the probability of an event occurring? If you allow 0, you have to allow 1.

Can't, figure out when the closed interval would be useful, but the open end interval seems the only reasonable to use way to go.
Lets take coin tossing:
If you say rnd() < 0.5 is head and the rest is tail you will get more tails than heads if you use the closed interval. How many more tails depends on how likely it is to actually get 1.

A compelling reason to use a half-open interval is the use case where you are picking a random array index for some array. When you scale from [0, 1) to integers in [0, arrayLength], it's helpful never to get the value arrayLength, since that is not an index in the array in many language implementations. E.g., Java and ArrayIndexOutOfBoundsException. The half-open interval is a great convenience here.
A reason for having a closed interval [0, 1] is Albin's probability argument. But it's worth noting that mathematically speaking, the probability of picking any particular random number, including 1, in [0, 1], is zero. For pseudo random number generators, though, it will pop up occasionally.

Related

How do you generate random unique numbers in which adjacent numbers are within a specified range?

So let's say I want to generate an array a[] of n random integers between 0 and n (exclusive) which do not repeat. But I also want to ensures that abs(a[i]-a[i+1]) < maxRange. Wrap around should be permitted so that
abs(a[i]-(a[i+1]+a.length)) < maxRange || abs((a[i]+a.length)-a[i+1]) || abs(a[i]-a[i+1]) < maxRange.
is always true. For example if n=6, and maxRange =2 then the array a={1,3,2,4,0,5} is valid because 3-1 = 2, 3-2 = 1, 4-2 = 2, (0+n)-4 = 2, and (0+n)-5 = 1.
But a={0,3,2,4,1,5} is not valid because 3-0=3 and 4-1=3.
The distribution of differences abs(a[i]-(a[i+1]+a.length)) should be uniform and within the range 1 to maxRange
I feel like there must already be an algorithm out there that does this but I can't seem to find it by googling (perhaps I'm using incorrect terminology). The only way I can think to implement this is to brute force search every possible permutation and checking if the criteria is met to determine valid permutations and then choosing one of those at random. However for longer arrays this could become extremely computationally expensive. Perhaps it is better to think of this as a non-repeating random walk or something. But I'm not sure how it could be implemented as such. Any ideas? A language-agnostic solution would be great.
To give a little extra information I want to use this to perform tournament selection in a genetic algorithm with trivial geography or demes. So for example individuals a[0] and a[1] will undergo a tournament (chooses winner based on fitness) and the loser will get crossed-over/replaced. Then a[2] and a[3] etc. The reason I want to do it like this is so that I can evaluate all individuals in one go, then do the crossover stage in one go, then repeat these stages until finish. The reason I want to do it like this is so I can guarantee every individual has been evaluated at every generation, unlike a typical steady-state genetic algorithm.

Dynamic algorithm to multiply elements in a sequence two at a time and find the total

I am trying to find a dynamic approach to multiply each element in a linear sequence to the following element, and do the same with the pair of elements, etc. and find the sum of all of the products. Note that any two elements cannot be multiplied. It must be the first with the second, the third with the fourth, and so on. All I know about the linear sequence is that there are an even amount of elements.
I assume I have to store the numbers being multiplied, and their product each time, then check some other "multipliable" pair of elements to see if the product has already been calculated (perhaps they possess opposite signs compared to the current pair).
However, by my understanding of a linear sequence, the values must be increasing or decreasing by the same amount each time. But since there are an even amount of numbers, I don't believe it is possible to have two "multipliable" pairs be the same (with potentially opposite signs), due to the issue shown in the following example:
Sequence: { -2, -1, 0, 1, 2, 3 }
Pairs: -2*-1, 0*1, 2*3
Clearly, since there are an even amount of pairs, the only case in which the same multiplication may occur more than once is if the elements are increasing/decreasing by 0 each time.
I fail to see how this is a dynamic programming question, and if anyone could clarify, it would be greatly appreciated!
A quick google for define linear sequence gave
A number pattern which increases (or decreases) by the same amount each time is called a linear sequence. The amount it increases or decreases by is known as the common difference.
In your case the common difference is 1. And you are not considering any other case.
The same multiplication may occur in the following sequence
Sequence = {-3, -1, 1, 3}
Pairs = -3 * -1 , 1 * 3
with a common difference of 2.
However this is not necessarily to be solved by dynamic programming. You can just iterate over the numbers and store the multiplication of two numbers in a set(as a set contains unique numbers) and then find the sum.
Probably not what you are looking for, but I've found a closed solution for the problem.
Suppose we observe the first two numbers. Note the first number by a, the difference between the numbers d. We then count for a total of 2n numbers in the whole sequence. Then the sum you defined is:
sum = na^2 + n(2n-1)ad + (4n^2 - 3n - 1)nd^2/3
That aside, I also failed to see how this is a dynamic problem, or at least this seems to be a problem where dynamic programming approach really doesn't do much. It is not likely that the sequence will go from negative to positive at all, and even then the chance that you will see repeated entries decreases the bigger your difference between two numbers is. Furthermore, multiplication is so fast the overhead from fetching them from a data structure might be more expensive. (mul instruction is probably faster than lw).

Find random numbers in a given range with certain possible numbers excluded

Suppose you are given a range and a few numbers in the range (exceptions). Now you need to generate a random number in the range except the given exceptions.
For example, if range = [1..5] and exceptions = {1, 3, 5} you should generate either 2 or 4 with equal probability.
What logic should I use to solve this problem?
If you have no constraints at all, i guess this is the easiest way: create an array containing the valid values, a[0]...a[m] . Return a[rand(0,...,m)].
If you don't want to create an auxiliary array, but you can count the number of exceptions e and of elements n in the original range, you can simply generate a random number r=rand(0 ... n-e), and then find the valid element with a counter that doesn't tick on exceptions, and stops when it's equal to r.
Depends on the specifics of the case. For your specific example, I'd return a 2 if a Uniform(0,1) was below 1/2, 4 otherwise. Similarly, if I saw a pattern such as "the exceptions are odd numbers", I'd generate values for half the range and double. In general, though, I'd generate numbers in the range, check if they're in the exception set, and reject and re-try if they were - a technique known as acceptance/rejection for obvious reasons. There are a variety of techniques to make the exception-list check efficient, depending on how big it is and what patterns it may have.
Let's assume, to keep things simple, that arrays are indexed starting at 1, and your range runs from 1 to k. Of course, you can always shift the result by a constant if this is not the case. We'll call the array of exceptions ex_array, and let's say we have c exceptions. These need to be sorted, which shall turn out to be pretty important in a while.
Now, you only have k-e useful numbers to work with, so it'll be meaningful to find a random number in the range 1 to k-e. Say we end up with the number r. Now, we just need to find the r-th valid number in your array. Simple? Not so much. Remember, you can never simply walk over any of your arrays in a linear fashion, because that can really slow down your implementation when you have a lot of numbers. You have do some sort of binary search, say, to come up with a fast enough algorithm.
So let's try something better. The r-th number would nominally have lied at index r in your original array had you had no exceptions. The number at index r is r, of course, since your range and your array indices start from 1. But, you have a bunch of invalid numbers between 1 and r, and you want to somehow get to the r-th valid number. So, lets do a binary search on the array of exceptions, ex_array, to find how many invalid numbers are equal to or less than r, because we have these many invalid numbers lying between 1 and r. If this number is 0, we're all done, but if it isn't, we have a bit more work to do.
Assume you found there were n invalid numbers between 1 and r after the binary search. Let's advance n indices in your array to the index r+n, and find the number of invalid numbers lying between 1 and r+n, using a binary search to find how many elements in ex_array are less than or equal to r+n. If this number is exactly n, no more invalid numbers were encountered, and you've hit upon your r-th valid number. Otherwise, repeat again, this time for the index r+n', where n' is the number of random numbers that lay between 1 and r+n.
Repeat till you get to a stage where no excess exceptions are found. The important thing here is that you never once have to walk over any of the arrays in a linear fashion. You should optimize the binary searches so they don't always start at index 0. Say if you know there are n random numbers between 1 and r. Instead of starting your next binary search from 1, you could start it from one index after the index corresponding to n in ex_array.
In the worst case, you'll be doing binary searches for each element in ex_array, which means you'll do c binary searches, the first starting from index 1, the next from index 2, and so on, which gives you a time complexity of O(log(n!)). Now, Stirling's approximation tells us that O(ln(x!)) = O(xln(x)), so using the algorithm above only makes sense if c is small enough that O(cln(c)) < O(k), since you can achieve O(k) complexity using the trivial method of extracting valid elements from your array first.
In Python the solution is very simple (given your example):
import random
rng = set(range(1, 6))
ex = {1, 3, 5}
random.choice(list(rng-ex))
To optimize the solution, one needs to know how long is the range and how many exceptions there are. If the number of exceptions is very low, it's possible to generate a number from the range and just check if it's not an exception. If the number of exceptions is dominant, it probably makes sense to gather the remaining numbers into an array and generate random index for fetching non-exception.
In this answer I assume that it is known how to get an integer random number from a range.
Here's another approach...just keep on generating random numbers until you get one that isn't excluded.
Suppose your desired range was [0,100) excluding 25,50, and 75.
Put the excluded values in a hashtable or bitarray for fast lookup.
int randNum = rand(0,100);
while( excludedValues.contains(randNum) )
{
randNum = rand(0,100);
}
The complexity analysis is more difficult, since potentially rand(0,100) could return 25, 50, or 75 every time. However that is quite unlikely (assuming a random number generator), even if half of the range is excluded.
In the above case, we re-generate a random value for only 3/100 of the original values.
So 3% of the time you regenerate once. Of those 3%, only 3% will need to be regenerated, etc.
Suppose the initial range is [1,n] and and exclusion set's size is x. First generate a map from [1, n-x] to the numbers [1,n] excluding the numbers in the exclusion set. This mapping with 1-1 since there are equal numbers on both sides. In the example given in the question the mapping with be as follows - {1->2,2->4}.
Another example suppose the list is [1,10] and the exclusion list is [2,5,8,9] then the mapping is {1->1, 2->3, 3->4, 4->6, 5->7, 6->10}. This map can be created in a worst case time complexity of O(nlogn).
Now generate a random number between [1, n-x] and map it to the corresponding number using the mapping. Map looks can be done in O(logn).
You can do it in a versatile way if you have enumerators or set operations. For example using Linq:
void Main()
{
var exceptions = new[] { 1,3,5 };
RandomSequence(1,5).Where(n=>!exceptions.Contains(n))
.Take(10)
.Select(Console.WriteLine);
}
static Random r = new Random();
IEnumerable<int> RandomSequence(int min, int max)
{
yield return r.Next(min, max+1);
}
I would like to acknowledge some comments that are now deleted:
It's possible that this program never ends (only theoretically) because there could be a sequence that never contains valid values. Fair point. I think this is something that could be explained to the interviewer, however I believe my example is good enough for the context.
The distribution is fair because each of the elements has the same chance of coming up.
The advantage of answering this way is that you show understanding of modern "functional-style" programming, which may be interesting to the interviewer.
The other answers are also correct. This is a different take on the problem.

Given a true random number generator which outputs either a 1 or 0 per call, how do you use this to pick a number from an arbitrary range?

If I have a true random number generator (TRNG) which can give me either a 0 or a 1 each time I call it, then it is trivial to then generate any number in a range with a length equal to a power of 2. For example, if I wanted to generate a random number between 0 and 63, I would simply poll the TRNG 5 times, for a maximum value of 11111 and a minimum value of 00000. The problem is when I want a number in a rangle not equal to 2^n. Say I wanted to simulate the roll of a dice. I would need a range between 1 and 6, with equal weighting. Clearly, I would need three bits to store the result, but polling the TRNG 3 times would introduce two eroneous values. We could simply ignore them, but then that would give one side of the dice a much lower odds of being rolled.
My question of ome most effectively deals with this.
The easiest way to get a perfectly accurate result is by rejection sampling. For example, generate a random value from 1 to 8 (3 bits), rejecting and generating a new value (3 new bits) whenever you get a 7 or 8. Do this in a loop.
You can get arbitrarily close to accurate just by generating a large number of bits, doing the mod 6, and living with the bias. In cases like 32-bit values mod 6, the bias will be so small that it will be almost impossible to detect, even after simulating millions of rolls.
If you want a number in range 0 .. R - 1, pick least n such that R is less or equal to 2n. Then generate a random number r in the range 0 .. 2n-1 using your method. If it is greater or equal to R, discard it and generate again. The probability that your generation fails in this manner is at most 1/2, you will get a number in your desired range with less than two attempts on the average. This method is balanced and does not impair the randomness of the result in any fashion.
As you've observed, you can repeatedly double the range of a possible random values through powers of two by concatenating bits, but if you start with an integer number of bits (like zero) then you cannot obtain any range with prime factors other than two.
There are several ways out; none of which are ideal:
Simply produce the first reachable range which is larger than what you need, and to discard results and start again if the random value falls outside the desired range.
Produce a very large range, and distribute that as evenly as possible amongst your desired outputs, and overlook the small bias that you get.
Produce a very large range, distribute what you can evenly amongst your desired outputs, and if you hit upon one of the [proportionally] few values which fall outside of the set which distributes evenly, then discard the result and start again.
As with 3, but recycle the parts of the value that you did not convert into a result.
The first option isn't always a good idea. Numbers 2 and 3 are pretty common. If your random bits are cheap then 3 is normally the fastest solution with a fairly small chance of repeating often.
For the last one; supposing that you have built a random value r in [0,31], and from that you need to produce a result x [0,5]. Values of r in [0,29] could be mapped to the required output without any bias using mod 6, while values [30,31] would have to be dropped on the floor to avoid bias.
In the former case, you produce a valid result x, but there's some more randomness left over -- the difference between the ranges [0,5], [6,11], etc., (five possible values in this case). You can use this to start building your new r for the next random value you'll need to produce.
In the latter case, you don't get any x and are going to have to try again, but you don't have to throw away all of r. The specific value picked from the illegal range [30,31] is left-over and free to be used as a starting value for your next r (two possible values).
The random range you have from that point on needn't be a power of two. That doesn't mean it'll magically reach the range you need at the time, but it does mean you can minimise what you throw away.
The larger you make r, the more bits you may need to throw away if it overflows, but the smaller the chances of that happening. Adding one bit halves your risk but increases the cost only linearly, so it's best to use the largest r you can handle.

in a series of n elements of arithmetic progression, [n/2] elements are changed. Find the difference in the initial arithmetic progression

I have a list of size n which contains n consecutive members of an arithmetic progression which are not in order. I changed less than half of the elements in this list with some random integer. From this new list, how can I find the difference of the initial arithmetic progression?
I thought a lot about it but except brute force, I was not able to come up with any other thing :(
Thanks for thinking on this one :)
It's not possible to solve this in general and be 100% sure that your answer is correct. Let's say that the initial list is the following arithmetic progression (not in order):
1 3 2 4
Change less than half the elements at random... let's say for example that we changed 2 to 5:
1 3 5 4
If we can first find out which numbers we need to change to obtain a valid shuffled arithmetic sequence then we can easily solve the problem stated in the question. However we can see that there are multiple possible answers depending in which we number we choose to change:
6, 3, 5, 4 (difference is 1)
1, 3, 2, 4 (difference is 1)
1, 3, 5, 7 (difference is 2)
There is no way to know which of these possible sequence is the original sequence, so you cannot be sure what the original difference was.
Since there is no deterministic solution for the problem (as stated by #Mark Byers), you can try a probabilistic approach.
It's difficult to obtain the original progression, but its rate can be obtained easily by comparing the differences between elements. The difference of original ones will be multiples of rate.
Consider you take 2 elements from the list (probability that both of them belongs to the original sequence is 1/4), and compute the difference. This difference, with probability of 1/4, will be a multiple of the rate. Decompose it to prime factors and count them (for example, 12 = 2^^2 * 3 will add 2 to 2's counter and will increment 3's counter).
After many such iterations (it looks like a good problem for probabilistic methods, like Monte Carlo), you could analize the counters.
If a prime factor belongs to the rate, its counter will be at least num_iteartions/4 ( or num_iterations/2 if it appears twice).
The main problem is that small factors will have large probability on random input (for example, the difference between two random numbers will have 50% probability to be divisible by 2). So you'll have to compensate it: since 3/4 of your differences were random, you'll have to consider that (3/8)*num_iterations of 2's counter must be ignored. Since this also applies to all powers of two, the simpliest way is to pregenerate "white noise mask" by taking the differences only between random numbers.
EDIT: let's take this approach further. Consider that you create this "white noise mask" (let's call it spectrum) for random numbers, and consider that it's base-1 spectrum, since their smallest "largest common factor" is 1. By computing it for a differences of the arithmetic sequence, you'll obtain a base-R spectrum, where R is the rate, and it will equivalent to a shifted version of base-1 spectrum. So you have to find the value of R such that
your_spectrum ~= spectrum(1)*3/4 + spectrum(R)*1/4
You could also check for largest number R such that at least half of the elements will be equal modulo R.

Resources