When using random numbers in any language, people often write something like random % 100 + 1 to limit the result to the range 1 to 100. But how does the mod operator limit it? I thought it was just used for calculating remainders? Can someone explain this to me?
mod is used for calculating remainders, that's true. And a remainder is always bounded by divisor - 1, so the result of random % 100 always belongs to [0, 99].
Another useful property of mod is that it maps the initial range onto the result range rather 'fairly'. That is, the probability that random mod N == k tends to 1/N as random's range grows, and is exactly 1/N if N divides random's range (for an evenly distributed random, of course).
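A quick empirical check of both properties (a sketch using Python's random module; exact counts vary run to run):

import random
from collections import Counter

# 2^32 % 100 == 96, so 100 does not divide the generator's range:
# the residues are only approximately uniform, each near 10,000 hits here.
counts = Counter(random.getrandbits(32) % 100 for _ in range(1_000_000))
print(min(counts.values()), max(counts.values()))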
I need to find the smallest number k for which (2^n - 1)^k % M equals a given X.
The catch here is that n can be a very large number, with possibly 10,000 digits, and hence will be stored as a string. I know that this is a hard problem in general, but does the special form of the number imply any property that makes it easier in this case? M is not necessarily prime, but is within reasonable bounds of 10^8.
First, you can't store the value in a string: with n up to 10,000 digits, 2^n - 1 itself has on the order of 10^10000 digits, far more than the total number of particles in the universe (10^80 ≈ 2^265.75). You don't even have enough memory to store it as bits (in fact that's how bigint libraries store their numbers; no good library stores values as characters).
So what you can do is use modular exponentiation to compute the result modulo M. Basically you use the property (a * b) % M = ((a % M) * (b % M)) % M to avoid ever computing the real power. Many languages already have built-in support for this; for example, Python's pow function takes an optional third argument for it: pow(base, exp, mod). The implementation is exactly like normal pow, you just replace power *= base with modpow = (modpow * base) % M (see the sketch after the links below). There are a lot of examples on SO:
Calculating (a^b)%MOD
Calculating pow(a,b) mod n
Calculate (a^b)%c where 0<=a,b,c<=10^18
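Here is a minimal sketch of that replacement in Python (the helper name is mine; the built-in pow(base, exp, mod) already does the same job):

def modpow(base, exp, M):
    # square-and-multiply, reducing mod M at every step so intermediate
    # values never exceed M^2
    result = 1
    base %= M
    while exp > 0:
        if exp & 1:                      # low bit of exp set: fold this factor in
            result = (result * base) % M
        base = (base * base) % M         # square for the next bit
        exp >>= 1
    return result

# sanity check against the built-in:
assert modpow(7, 10**6, 10**9 + 7) == pow(7, 10**6, 10**9 + 7)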
You don't need to loop (2^n - 1)^k times. That's actually impossible: assuming you could loop 2^32 times a second, you would need 2^32 seconds ≈ 136 years just to loop 2^64 times. Imagine how many centuries it would take to count up to 2^10000. Luckily the result repeats in a cycle; you just need to calculate the cycle length.
Those are the hints you need. You can also refer to "how to calculate a^(b^c) mod n?" and "finding a^b^c^... mod m", which are closer to your problem.
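As a hedged sketch of how those hints combine (the function name is mine; the brute-force loop over one cycle is only feasible because M is at most about 10^8):

import sys
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(20_000)  # Python 3.11+ caps str-to-int size by default

def smallest_k(n_str, M, X):
    n = int(n_str)              # Python big ints accept the 10,000-digit string
    b = (pow(2, n, M) - 1) % M  # (2^n - 1) mod M without ever forming 2^n
    v = 1
    for k in range(1, M + 1):   # powers of b mod M repeat within at most M steps
        v = (v * b) % M
        if v == X:
            return k
    return None                 # X never occurs in the cycle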
I have a number (in base 10) represented as a string with up to 10^6 digits. I want to check whether this number is a power of two. One approach I can think of is binary search on the exponent using FFT-based multiplication and fast exponentiation, but that is quite long and complex to code. Let n denote the length of the input (i.e., the number of decimal digits). What's the most efficient algorithm for solving this problem, as a function of n?
There are either three or four powers of 2 with any given number of decimal digits, and it is easy to guess what they are, since the digit count is a good approximation of the base-10 logarithm and you can get the base-2 logarithm by multiplying by a constant (log2(10)). So a binary search would be inefficient and unnecessary.
Once you have a trial exponent, which will be on the order of three million, you can use the squaring exponentiation algorithm with about 22 bignum decimal multiplications. (And up to 21 doublings, but those are relatively easy.)
Depending on how often you do this check, you might want to invest in fast bignum code. But if it is infrequent, simple multiplication should be ok.
If you don't expect the numbers to be powers of 2, you could first do a quick computation mod 10^9 to see if the last 9 digits match. That will eliminate all but a tiny fraction of random numbers. Or, for an even faster but slightly weaker filter, use 64-bit arithmetic to check that the last 20 digits are divisible by 2^20 and not by 10.
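Putting those pieces together, a sketch of the whole test in Python (the search width 6 and the digit-limit call are my additions; int(s) is the expensive bignum step that the cheap filter tries to avoid):

import sys
from math import log2

if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(1_100_000)  # Python 3.11+ limits huge str-to-int conversions

def is_power_of_two(s: str) -> bool:
    lo = int((len(s) - 1) * log2(10))   # ~log2 of the smallest number with len(s) digits
    for k in range(lo, lo + 6):         # wide enough to cover the 3-4 candidate exponents
        # cheap filter: the last 9 digits of 2^k are pow(2, k, 10**9)
        if pow(2, k, 10**9) != int(s[-9:]):
            continue
        if (1 << k) == int(s):          # full bignum comparison only if the filter passes
            return True
    return False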
Here is an easy probabilistic solution.
Say your number is n, and we want to find k such that n = 2^k. Obviously, k = log2(n) = log10(n) * log2(10). We can estimate log10(n) ≈ len(n) and find k' = len(n) * log2(10) with a small error (say, |k - k'| <= 5; I didn't check, but this should be enough). You'll probably need this part in any solution that comes to mind; it was mentioned in other answers as well.
Now let's check that n = 2^k for some known candidate k: select a random prime P from 2 to k^2 and compare n mod P with 2^k mod P. If the remainders are not equal, then k is definitely not a match. But what if they are equal? I claim that the false positive rate is bounded by 2*log(k)/k.
Why is that so? If n ≡ 2^k (mod P), then P divides D = n - 2^k. The number D is about k bits long (because n and 2^k have similar magnitude thanks to the first part) and thus cannot have more than k distinct prime divisors. There are around k^2 / log(k^2) primes less than k^2, so the probability of picking a prime divisor of D at random is less than k / (k^2 / log(k^2)) = 2*log(k)/k.
In practice, primes up to 10^9 (or even up to log(n)) should suffice, but you would have to do a somewhat deeper analysis to prove the probability bound.
This solution does not require any long arithmetic at all; all calculations can be done in 64-bit integers.
P.S. To select a random prime from 1 to T, you may use the following logic: pick a random number from 1 to T and increment it until it is prime. The distribution over primes is then not uniform and the analysis above is not completely correct, but it can be adapted to this kind of randomness as well.
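A sketch of the whole probabilistic check (sympy.randprime is my stand-in for the prime selection described in the P.S.; n mod P is computed from the decimal string by Horner's rule, so the string never becomes a bignum):

from sympy import randprime

def probably_power_of_two(s, k, trials=20):
    for _ in range(trials):
        P = randprime(2, k * k)          # random prime in [2, k^2)
        n_mod = 0
        for ch in s:                     # Horner's rule: n mod P, digit by digit
            n_mod = (n_mod * 10 + int(ch)) % P
        if n_mod != pow(2, k, P):        # compare against 2^k mod P
            return False                 # definitely not 2^k
    return True                          # 2^k with high probability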
I am not sure how easy this is to apply, but I would do it the following way:
1) Write the number in binary. If the number is a power of two, it will look like:
1000000....
with only one 1 and the rest 0s. Checking this would be easy. Now the question is how the number is stored. For example, it could have leading zeroes that complicate the search for the 1:
...000010000....
If there are only a small number of leading zeroes, just search from left to right. If the number of zeroes is unknown, we will have to...
2) Binary search for the 1:
2a) Cut the number in the middle.
2b) If both halves are zero, or neither is (hopefully you can check whether a part is zero in reasonable time), stop and return false (false = not a power of 2).
Otherwise continue with the non-zero half.
Stop when the non-zero part equals 1 and return true.
Estimation: if the number has n decimal digits, it has about n * log2(10) ≈ 3.32 * n binary digits.
The search does O(log n) splits, but each zero check scans the current part, so the total work is O(n) + O(n/2) + O(n/4) + ... = O(n). Therefore the algorithm should take O(n).
Assumptions:
1) You can access the binary representation of the number.
2) You can compare a number to zero in reasonable time.
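If assumption (1) holds because the value already sits in an arbitrary-precision integer (as in Python), the 'exactly one 1 bit' test collapses to the classic trick below (my sketch, equivalent in spirit to the search above):

def is_power_of_two_binary(x: int) -> bool:
    # a power of two has exactly one set bit, so clearing the lowest
    # set bit (x & (x - 1)) must leave zero
    return x > 0 and x & (x - 1) == 0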
There are similar questions, but most of them are too language-specific. I'm looking for a general solution. Given some way to produce k random bytes and a number n, I need to produce a random number in the range 1...n (inclusive).
What I've come up with so far:
To determine the number of bytes needed to represent n, calculate
f(n) := ceiling(ln(n) / (8 * ln(2))) = ceiling(0.180337 * ln(n))
Get a random number in the range 1...2^(8*f(n)) from the 0-indexed bytes b[i]:
r := 0
for i = 0 to k-1:
    r := r + b[i] * 2^(8*i)
end for
To scale to 1...n without bias:
R(n,r) := ceiling(n * (r / 256^f(n)))
But I'm not sure this doesn't introduce bias or some subtle off-by-one error. Could you check whether this is sound and/or make suggestions for improvements? Is this the right way to do it?
In answers, please assume that there are no modular or bit-twiddling operations available, but you can assume arbitrary-precision arithmetic. (I'm programming in Scheme.)
Edit: There is definitely something wrong with my approach, because in my tests rolling a die yielded a few 0s! But where is the error?
This is similar to what you'd do to generate a number from 1 to n from a random floating-point number r in [0, 1):
result = floor(r * n) + 1
If you have arbitrary-precision arithmetic, you can compute r by dividing your k-byte integer by one more than the maximum value expressible in k bytes, i.e., by 256^k.
So if you have 4 bytes 87 6F BD 4A, and n = 200:
((0x876FBD4A / 0x100000000) * 200) + 1
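A sketch of that calculation in Python, using Fraction to keep the arbitrary-precision division exact (the function name is mine):

from fractions import Fraction

def scale_to_range(raw: bytes, n: int) -> int:
    r = int.from_bytes(raw, "big")       # the k-byte integer
    frac = Fraction(r, 256 ** len(raw))  # r / 256^k, always in [0, 1)
    return int(frac * n) + 1             # floor(r / 256^k * n) + 1, lands in 1..n

# the worked example above: bytes 87 6F BD 4A with n = 200
print(scale_to_range(bytes.fromhex("876fbd4a"), 200))  # -> 106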
Page 120 of Programming Pearls 1st edition presents this algorithm for selecting M equally probable random elements out of a population of N integers.
InitToEmpty
Size := 0
while Size < M do
    T := RandInt(1, N)
    if not Member(T) then
        Insert(T)
        Size := Size + 1
It is stated that the expected number of Member tests is less than 2M, as long as M < N/2.
I'd like to know how to prove it, but my algorithm analysis background is failing me.
I understand that the closer M is to N, the longer the program will take, because the result set will have more elements and the likelihood of RandInt selecting an existing one will increase proportionally.
Can you help me figure out this proof?
I am not a math wizard, but I will give it a rough shot. This is NOT guaranteed to be right though.
For each additional member of the sample, you pick a number and check whether it's already there; if it isn't, you add it, otherwise you try again. The number of tries until you succeed follows a geometric distribution.
http://en.wikipedia.org/wiki/Geometric_distribution
So you are running M geometric trials. A geometric trial with success probability p has expected value 1/p, so it takes 1/p tries on average to get a number not already picked. Here p is (N minus the number of values already picked) divided by N (i.e., unpicked items / total items). So for the fourth number, p = (N - 3) / N, and the expected number of picks for the fourth number is N / (N - 3).
The expected value of the run time is all of these added together. So something like
E(run time) = N/N + N/(N - 1) + N/(N - 2) + ... + N/(N - M + 1)
Now if M <= N/2, then the last term in that summation is bounded above by 2 (N/(N - M + 1) < N/(N/2) = 2). It is also obviously the largest term in the whole summation. So if the biggest term is at most two picks, and there are M terms being summed, the expected run time is bounded above by 2M.
Ask me if any of this is unclear. Correct me if any of this is wrong :)
Say we have chosen K elements out of N. Then our next try has probability (N - K)/N of succeeding, so the number of tries it takes to find the (K+1)-st element is geometrically distributed with mean N/(N - K).
So if 2M < N, we expect it to take fewer than two tries to get each element.
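A quick Monte Carlo sanity check of the bound (my sketch; RandInt and Member become random.randint and a set):

import random

def mean_member_tests(N, M, trials=2_000):
    total = 0
    for _ in range(trials):
        chosen, tests = set(), 0
        while len(chosen) < M:
            tests += 1                       # one RandInt call plus one Member test
            chosen.add(random.randint(1, N))
        total += tests
    return total / trials

print(mean_member_tests(1000, 400))  # stays comfortably below 2*M = 800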
Given a function R which produces true random 32 bit numbers, I would like a function that returns random integers in the range 0 to n, where n is arbitrary (less than 2^32).
The function must produce all values 0 to n with equal probability.
I would like a function that executes in constant time with no if statements or loops, so something like the Java Random.nextInt(n) function is out.
I suspect that a simple modulus will not do the job unless n is a power of 2 -- am I right?
I have accepted Jason's answer, despite it requiring a loop of undetermined duration, since it appears to be the best method to use in practice and essentially answers my question. However I am still interested in any algorithms (even if less efficient) which would be deterministic in nature and be guaranteed to terminate, such as Mark Byers has pointed to.
Without discarding some of the values from the source, you cannot do this. For example, a set of size 2^32 cannot be partitioned into three equally sized subsets. Therefore, it is impossible to do this without discarding some of the values and iterating until a non-discarded value is produced.
So, just use this (pseudocode):
rng: a random number generator that produces uniform integers in [0, max)
compute m := max mod (n + 1)
do {
    draw a random number r from rng
} while (r >= max - m)
return r mod (n + 1)
Effectively I am throwing out the top part of the distribution that causes problems. If rng is uniform on [0, max), then this algorithm will be uniform on [0, n].
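The same algorithm in Python, with random.getrandbits(32) standing in for the true-random R (a sketch):

import random

MAX = 2 ** 32                             # R() yields uniform values in [0, MAX)

def uniform_0_to_n(n: int) -> int:
    m = MAX % (n + 1)                     # size of the biased top slice
    while True:
        r = random.getrandbits(32)        # stand-in for the true-random R()
        if r < MAX - m:                   # accept only the unbiased region
            return r % (n + 1)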
What you're asking for is impossible: you can't partition 2^32 numbers into three sets of exactly equal size (consider n = 2).
If you want to guarantee an absolutely perfect uniform distribution over 0 <= x < n, where n is not a power of 2, then you have to be prepared to call R a potentially unbounded number of times. In reality you will typically need only one or two calls, but in theory the code has to be able to call R any number of times, otherwise it can't be completely uniform.
I don't understand why modulus wouldn't do what you want. Since R is a function that produces true random 32-bit numbers, each number has the same probability of being produced, right? So, if you use a modulus n:
randomNumber = R() % (n + 1) //EDITED: n+1 to return values from 0-n
then each number from 0 to n has the same probability!
You can generate two 32-bit numbers and put them together to form a 64-bit number. If you do not discard any values (and you need a number of no more than 32 bits), the worst-case bias factor is 0.99999999976716936 (that is, 1 - 2^-32): some numbers are produced with probability lower than others by that factor.
But if you still want to remove even this small bias, the ratio of 'out of range' hits will be very low, so in that case you will rarely need more than one extra draw.
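A sketch of the composition step (R is assumed to be the question's 32-bit source):

def rand64_mod(R, n):
    r = (R() << 32) | R()  # two 32-bit draws -> one uniform 64-bit value
    return r % (n + 1)     # residual bias at most ~2^-32 when n < 2^32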
Depending upon your problem/use of the random numbers, maybe you could pre-allocate your random numbers using a slow method and put them into a simple array.
Then getNextRnd() can just return the next in the array.
Quick, fixed-time calls, no branches, just spending memory (which is usually pretty cheap) and process-initialization time.
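A sketch of that idea (the class is mine; the slow fill can use any unbiased method, here random.randint):

import random

class PrecomputedRandom:
    def __init__(self, n: int, count: int):
        # slow, unbiased fill done once at initialization
        self.values = [random.randint(0, n) for _ in range(count)]
        self.i = 0

    def getNextRnd(self) -> int:          # quick, fixed-time table read
        v = self.values[self.i]
        self.i = (self.i + 1) % len(self.values)
        return v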