Is there a way to convert uniformly distributed random numbers of one range to uniformly distributed random numbers of another range frugally?
Let me explain what I mean by "frugally".
The typical approach to generating a random number within a given range (e.g. r ∈ [0..10)) is to take some fixed number of random bits, say 31, which yields a non-negative random number less than 2147483648. Then make sure the value is less than 2147483640 (because 2147483648 is not divisible by 10, and taking the remainder directly would give an uneven distribution). If the value is greater than or equal to 2147483640, throw it away and try again (get the next 31 random bits, and so on). If the value is less than 2147483640, just return the remainder of division by 10. This approach consumes at least 31 bits per decimal digit. Since the theoretical limit is log2(10) = 3.321928... bits, it is quite wasteful.
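For reference, a minimal Python sketch of this baseline (next_bits is a hypothetical stand-in for the underlying bit source):

import random

def next_bits(k):
    # stand-in for the random source: k uniformly random bits
    return random.getrandbits(k)

def next_digit():
    # classic rejection sampling: 31 bits per attempt
    while True:
        r = next_bits(31)
        if r < 2147483640:  # largest multiple of 10 below 2**31
            return r % 10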
We can improve on this by using 4 bits instead of 31. Each attempt succeeds with probability 10/16, so the expected number of attempts is 1.6 and we consume 4 × 1.6 = 6.4 bits per decimal digit. This is more frugal, but still far from the ideal.
public int nextDit() {
    int result;
    do {
        result = next4Bits();
    } while (result >= 10);
    return result;
}
We can try to generate 3 decimal digits at once. Since 1024 is quite close to 1000, the probability that the raw source number is rejected is lower than in the previous case. Once we have generated 3 decimal digits, we return 1 digit and keep the remaining 2 in reserve.
Something like the code below:
private int _decDigits = 0;
private int _decCount = 0;

public int nextDit() {
    if (_decCount > 0) {
        // take digits from the reserve
        int result = _decDigits % 10;
        _decDigits /= 10;
        _decCount -= 1;
        return result;
    } else {
        int result;
        do {
            result = next10Bits();
        } while (result >= 1000);
        // reserve 2 decimal digits
        _decCount = 2;
        _decDigits = result % 100;
        result /= 100;
        return result;
    }
}
This approach is much more frugal: the expected number of 10-bit draws per 3 digits is 1024/1000 = 1.024, so it consumes 10 × 1.024 / 3 = 3.41(3) bits per decimal digit.
We can go even further if we reuse the values we previously threw away. A random number r ∈ [0, 1024) falls into one of 3 ranges: [0, 1000), [1000, 1020), [1020, 1024).
If it falls into [0, 1000), we proceed as before: reserve 2 decimal digits (in the decimal digit reserve) and return 1 decimal digit.
If it falls into [1000, 1020), we subtract 1000, mapping it to the range [0, 20). Then we get 1 bit by dividing by 10 and 1 decimal digit as the remainder of division by 10. We put the bit into the binary digit reserve and return the decimal digit.
If it falls into [1020, 1024), we subtract 1020, mapping it to the range [0, 4). Here we get just 2 bits, which we put into the binary digit reserve.
// decimal digit reserve
private int _decDigits = 0;
private int _decCount = 0;
// binary digit reserve
private int _binDigits = 0;
private int _binCount = 0;

private int nextBits(int bits, int n) {
    for (int i = 0; i < n; i += 1) {
        bits = (bits << 1) + _bitRandomDevice.nextBit();
    }
    return bits;
}

private int next10Bits() {
    // take bits from the binary reserve first, then from _bitRandomDevice
    int result;
    if (_binCount >= 10) {
        result = _binDigits >> (_binCount - 10);
        // note the parentheses: '-' binds tighter than '<<' in Java
        _binDigits = _binDigits & ((1 << (_binCount - 10)) - 1);
        _binCount -= 10;
    } else {
        result = nextBits(_binDigits, 10 - _binCount);
        _binCount = 0;
        _binDigits = 0;
    }
    return result;
}

public int nextDit() {
    if (_decCount > 0) {
        // take digits from the decimal reserve
        int result = _decDigits % 10;
        _decDigits /= 10;
        _decCount -= 1;
        return result;
    } else {
        int result;
        while (true) {
            result = next10Bits();
            if (result < 1000) {
                assert result >= 0 && result < 1000;
                // reserve 2 decimal digits
                _decCount = 2;
                _decDigits = result % 100;
                result /= 100;
                // return 1 decimal digit
                return result;
            } else if (result < 1020) {
                result -= 1000;
                assert result >= 0 && result < 20;
                // reserve 1 binary digit
                _binCount += 1;
                _binDigits = (_binDigits << 1) + (result / 10);
                // return 1 decimal digit
                return result % 10;
            } else {
                result -= 1020;
                assert result >= 0 && result < 4;
                // reserve 2 binary digits
                _binCount += 2;
                _binDigits = (_binDigits << 2) + result;
            }
        }
    }
}
This approach consumes about 3.38 bits per decimal digit. It is the most frugal approach I have found, but it still wastes/loses some information from the source of randomness.
Thus, my question is: is there any universal approach/algorithm that converts uniformly distributed random numbers of one arbitrary range [0, s) (later called source numbers) to uniformly distributed random numbers of another arbitrary range [0, t) (later called target numbers), consuming only log_s(t) + C source numbers per target number, where C is some constant?
If there is no such approach, why? What prevents us from reaching the ideal limit?
The purpose of being frugal is to reduce the number of calls to the RNG. This is especially worthwhile when working with a true RNG, which often has limited throughput.
As for the "frugality optimizations", they are based on the following assumptions (demonstrated in the sketch after this list):
given a uniform random number r ∈ [0,N), after checking that r < M (for M <= N), we may treat it as uniformly distributed in [0,M); the traditional rejection approach is based on exactly this assumption. Similarly, after checking that r >= M, we may treat it as uniformly distributed in [M,N).
given a uniform random number r ∈ [A,B), the derived random number (r+C) is uniformly distributed in [A+C,B+C). I.e. we can add or subtract any constant to a random number to shift its range.
given a uniform random number r ∈ [0,N), where N = P × Q, the derived random numbers (r % P) and (r / P) are uniformly distributed in [0,P) and [0,Q) respectively, and they are independent. I.e. we can split one uniform random number into several.
given independent uniform random numbers p ∈ [0,P) and q ∈ [0,Q), the derived random number (q × P + p) is uniformly distributed in [0,P × Q). I.e. we can combine uniform random numbers into one.
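These identities are easy to verify exhaustively for small P and Q; a minimal check:

P, Q = 4, 10
N = P * Q
# split: each r in [0, N) maps to a unique pair (r % P, r // P),
# so a uniform r yields uniform, independent parts
assert len({(r % P, r // P) for r in range(N)}) == N
# combine: each pair (p, q) maps back to a unique value q*P + p in [0, N)
assert {q * P + p for p in range(P) for q in range(Q)} == set(range(N))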
Your goal is ultimately to roll a k-sided die given only a p-sided die, without wasting randomness.
In this sense, by Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner, this waste is inevitable unless "every prime number dividing k also divides p". Thus, for example, if p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k has prime factors other than 2, the best you can do is get arbitrarily close to no waste of randomness.
Also, besides batching of bits to reduce "bit waste" (see also the Math Forum), there is also the technique of randomness extraction, discussed in Devroye and Gravel 2015-2020 and in my Note on Randomness Extraction.
See also the question: How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?, especially my answer there.
Keep adding more digits. Here's some Python to compute expected yields (this is slightly worse for a particular value of n than your approach because it doesn't save leftover bits, but it's good enough to make my point):
import math

def expected_digits(n, b):
    total = 0
    p = 1
    while n >= b:
        p *= 1 - (n % b) / n
        total += p
        n //= b
    return total

def expected_yield(k):
    return expected_digits(2 ** k, 10) / k

print(expected_yield(10))
print(expected_yield(30))
print(expected_yield(100000))
print(math.log10(2))
The output is
0.294921875
0.2952809327592452
0.301018918814536
0.3010299956639812
and as you can see, 100000 binary digits (second to last line) gets quite close to the Shannon limit (last line).
In theoretical terms, we're applying an arithmetic decoder where all output numbers have equal probability to an infinite stream of bits (interpreted as a random number between 0 and 1). The asymptotic efficiency approaches perfection, but the more samples you take, the heavier the arithmetic gets. That tends to be the trade-off.
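To make this concrete, here is a minimal sketch (mine, for illustration) of the scheme that expected_digits analyzes: draw k random bits, then peel off base-10 digits while the remaining range is an exact multiple of 10, discarding the leftover:

import random

def digits_from_bits(k):
    n = random.getrandbits(k)
    m = 1 << k             # n is uniform in [0, m)
    digits = []
    while m >= 10:
        cut = m - m % 10   # largest multiple of 10 not exceeding m
        if n >= cut:
            break          # the leftover in [cut, m) is discarded -- the waste
        digits.append(n % 10)  # a uniform decimal digit
        n //= 10               # still uniform, now in [0, cut // 10)
        m = cut // 10
    return digits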
What is the best way to divide two numbers that have more than 50 digits but fewer than 200?
I have structure to represent a number:
struct number{
int digit[MAX_SIZE]; // MAX_SIZE = 5;
bool negative; // Is it negative or positive number
};
The problem I face when trying to implement this is: if I'm dividing a number n by a number m (n > m) that has more digits than fit in a single variable, how can I divide them?
For example:
1234567891234567891234567 / 12345678912345678
My first guess is to do it with repeated subtraction, but isn't that too slow?
Think about how you do it by hand:
You calculate the most significant digit first, and if the numbers are large enough you find each digit by repeated subtraction, one digit at a time.
In your case:
The first number has 25 digits and the second number has 17 digits.
So you start with the digit corresponding to 1E8.
Here is some C-style pseudocode.
struct number n1 = 1234567891234567891234567;
struct number n2 = 12345678912345678;
int start = floor(log10(n1) - log10(n2)); // position of most significant digit in the answer
struct number result = 0;
int i, j;
struct number remainder = n1;
// Start with the most significant digit
for i = start to 0 {
    // Find the highest digit that gives a remainder >= 0
    for j = 9 to 0 step -1 {
        if (remainder - j * n2 * pow(10, i) >= 0) {
            // We found the digit!
            result = result + j * pow(10, i);
            remainder = remainder - j * n2 * pow(10, i);
            break; // Move on to the next digit
        }
    }
}
// We now have the result and the remainder.
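For illustration, here is a runnable Python sketch of the same digit-by-digit scheme; Python's big integers stand in for the multi-digit subtract/compare routines you would write for the number struct:

def long_divide(n1, n2):
    # returns (quotient, remainder) for n1 >= 0, n2 > 0
    start = len(str(n1)) - len(str(n2))  # position of the leading quotient digit
    quotient, remainder = 0, n1
    for i in range(start, -1, -1):
        shifted = n2 * 10 ** i           # n2 shifted left by i digits
        # find the highest digit j that keeps the remainder >= 0
        for j in range(9, -1, -1):
            if remainder - j * shifted >= 0:
                quotient += j * 10 ** i
                remainder -= j * shifted
                break
    return quotient, remainder

q, r = long_divide(1234567891234567891234567, 12345678912345678)
assert q * 12345678912345678 + r == 1234567891234567891234567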
Without using /, % and * operators, write a function to divide a number by 3. itoa() is available.
The above was asked of me in an interview and I couldn't really come up with an answer. I thought of converting the number to a string and adding all the digits, but that will only tell me whether the number is divisible or not. Repeated subtraction can also give me the remainder. But how do I obtain the quotient on division?
The code below takes 2 integers and divides the first by the second. It supports negative numbers.
int divide(int a, int b) {
    if (b == 0)
        throw new ArithmeticException("division by zero");
    // isPos tracks whether the answer is positive or negative
    boolean isPos = true;
    // if the signs differ, the answer will be negative
    if ((a < 0 && b > 0) || (a > 0 && b < 0))
        isPos = false;
    a = Math.abs(a);
    b = Math.abs(b);
    int ans = 0;
    while (a >= b) {
        a = a - b;
        ans++;
    }
    if (!isPos)
        return 0 - ans;
    return ans;
}
Judging by the mention of itoa(), the number is an integer.
int divide(int a, int b)
{
    int n = 0;
    // count how many times b goes into a by subtracting it
    while (a >= b)
    {
        a -= b;
        n = n + 1;
    }
    return n;
}
Just count how many times b goes into a by subtracting it (assuming a >= 0 and b > 0).
Edit: Removed the limit
The "count how many times you subtract 3" algorithm takes theta(|input|) steps. You could argue that theta(|input|) is fine for 32-bit integers, in which case why do any programming? Just use a lookup table. However, there are much faster methods which can be used for larger inputs.
You can perform a binary search for the quotient, testing whether a candidate quotient q is too large or too small by comparing q+q+q with the input. Binary search takes theta(log |input|) time.
Binary search uses division by 2, which can be done by the shift operator instead of /, or you can implement this yourself on arrays of bits if the shift operator is too close to division.
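For concreteness, a minimal sketch of the binary-search approach, using only +, -, comparisons and shifts:

def div3(n):
    # find the largest q with q + q + q <= n
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) >> 1  # shift stands in for division by 2
        if mid + mid + mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

assert all(div3(n) == n // 3 for n in range(1000))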
It is tempting to use the fact that 1/3 is the sum of the geometric series 1/4 + 1/16 + 1/64 + 1/256 + ... by trying (n>>2) + (n>>4) + (n>>6) + ...; however, this produces the wrong answer for n = 3, 6, 7, 9, 11, 12, 13, 14, 15, 18, ... It is off by two for n = 15, 30, 31, 39, ... In general, it is off by O(log n). For n nonnegative,
(n>>2) + (n>>4) + (n>>6) + ... = (n-wt4(n))/3
where wt4(n) is the sum of the base 4 digits of n, and the / on the right hand side is exact, not integer division. We can compute n/3 by adding wt4(n)/3 to (n>>2)+(n>>4)+(n>>6)+... We can compute the base 4 digits of n and therefore wt4(n) using only addition and the right shift.
int oneThirdOf(int n) {
    if (0 <= n && n < 3)
        return 0;
    if (n == 3)
        return 1;
    return sum(n) + oneThirdOf(wt4(n));
}

// Compute (n>>2) + (n>>4) + (n>>6) + ... recursively.
int sum(int n) {
    if (n < 4)
        return 0;
    return (n >> 2) + sum(n >> 2);
}

// Compute the sum of the digits of n base 4 recursively.
int wt4(int n) {
    if (n < 4)
        return n;
    int fourth = n >> 2;
    int lastDigit = n - fourth - fourth - fourth - fourth;
    return wt4(fourth) + lastDigit;
}
This also takes theta(log input) steps.
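As a sanity check, the recursion is easy to port and test (my port, for illustration):

def one_third(n):
    if 0 <= n < 3:
        return 0
    if n == 3:
        return 1
    return geo_sum(n) + one_third(wt4(n))

def geo_sum(n):
    # (n>>2) + (n>>4) + (n>>6) + ...
    return 0 if n < 4 else (n >> 2) + geo_sum(n >> 2)

def wt4(n):
    # sum of the base-4 digits of n
    return n if n < 4 else wt4(n >> 2) + (n & 3)

assert all(one_third(n) == n // 3 for n in range(10000))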
I need to generate a random number, but it needs to be selected from the set of binary numbers with equal numbers of set bits. E.g. choose a random byte value with exactly 2 bits set...
00000000 - no
00000001 - no
00000010 - no
00000011 - YES
00000100 - no
00000101 - YES
00000110 - YES
...
=> Set of possible numbers 3, 5, 6...
Note that this is a simplified set of numbers. Think more along the lines of 'Choose a random 64-bit number with exactly 40 bits set'. Each number from the set must be equally likely to arise.
Do a random selection from the set of all bit positions, then set those bits.
Example in Python:
import random

def random_bits(word_size, bit_count):
    number = 0
    for bit in random.sample(range(word_size), bit_count):
        number |= 1 << bit
    return number
Results of running random_bits(64, 40) ten times:
0xb1f69da5cb867efbL
0xfceff3c3e16ea92dL
0xecaea89655befe77L
0xbf7d57a9b62f338bL
0x8cd1fee76f2c69f7L
0x8563bfc6d9df32dfL
0xdf0cdaebf0177e5fL
0xf7ab75fe3e2d11c7L
0x97f9f1cbb1f9e2f8L
0x7f7f075de5b73362L
I have found an elegant solution: random dichotomy.
The idea is that, on average:
ANDing with a random number halves the number of set bits;
ORing with a random number sets half of the remaining clear bits.
C code to compile with gcc (to have __builtin_popcountll):
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/// Return a random number, with nb_bits bits set out of the width LSB
uint64_t random_bits(uint8_t width, uint8_t nb_bits)
{
    assert(nb_bits <= width);
    assert(width <= 64);
    uint64_t x_min = 0;
    uint64_t x_max = width == 64 ? (uint64_t)-1 : (1UL << width) - 1;
    int n = 0;
    while (n != nb_bits)
    {
        // generate a random value of at least width bits
        uint64_t x = random();
        if (width > 31)
            x ^= (uint64_t)random() << 31;
        if (width > 62)
            x ^= (uint64_t)random() << 33;
        x = x_min | (x & x_max); // x_min is a subset of x, which is a subset of x_max
        n = __builtin_popcountll(x);
        printf("x_min = 0x%016lX, %d bits\n", x_min, __builtin_popcountll(x_min));
        printf("x_max = 0x%016lX, %d bits\n", x_max, __builtin_popcountll(x_max));
        printf("x = 0x%016lX, %d bits\n\n", x, n);
        if (n > nb_bits)
            x_max = x;
        else
            x_min = x;
    }
    return x_min;
}
In general fewer than 10 loops are needed to reach the requested number of bits (with luck it can take 2 or 3). The corner cases (nb_bits = 0, 1, width-1, width) work, even though special-casing them would be faster.
Example of result:
x_min = 0x0000000000000000, 0 bits
x_max = 0x1FFFFFFFFFFFFFFF, 61 bits
x = 0x1492717D79B2F570, 33 bits
x_min = 0x0000000000000000, 0 bits
x_max = 0x1492717D79B2F570, 33 bits
x = 0x1000202C70305120, 14 bits
x_min = 0x0000000000000000, 0 bits
x_max = 0x1000202C70305120, 14 bits
x = 0x0000200C10200120, 7 bits
x_min = 0x0000200C10200120, 7 bits
x_max = 0x1000202C70305120, 14 bits
x = 0x1000200C70200120, 10 bits
x_min = 0x1000200C70200120, 10 bits
x_max = 0x1000202C70305120, 14 bits
x = 0x1000200C70201120, 11 bits
x_min = 0x1000200C70201120, 11 bits
x_max = 0x1000202C70305120, 14 bits
x = 0x1000200C70301120, 12 bits
width = 61, nb_bits = 12, x = 0x1000200C70301120
Of course, you need a good PRNG; otherwise you can face an infinite loop.
Say the number of bits to set is b and the word size is w. I would create a vector v of length w with the first b values set to 1 and the rest set to 0, then just shuffle v, as in the sketch below.
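A minimal Python sketch of this shuffle approach:

import random

def random_k_bits(w, b):
    # b ones and w - b zeros, shuffled, then packed into an integer
    v = [1] * b + [0] * (w - b)
    random.shuffle(v)
    number = 0
    for bit in v:
        number = (number << 1) | bit
    return number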
Here is another option which is very simple and reasonably fast in practice.
choose a bit at random
if it is already set
do nothing
else
set it
increment count
end if
Repeat until count equals the number of bits you want set.
This will only be slow when the number of bits you want set (call it k) is more than half the word length (call it N). In that case, use the algorithm to set N - k bits instead and then flip all the bits in the result.
I bet the expected running time here is pretty good, although I am too lazy to compute it precisely right now. But I can bound it as less than 2k: the expected number of flips of a coin to get "heads" is two, and each iteration here has a better than 1/2 chance of succeeding.
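A Python sketch of this loop, including the complement trick for k > N/2 (my sketch, not the answerer's):

import random

def random_k_bits(width, k):
    flip = k > width // 2
    target = width - k if flip else k
    x, count = 0, 0
    while count < target:
        bit = 1 << random.randrange(width)
        if not (x & bit):  # set the bit only if it isn't set yet
            x |= bit
            count += 1
    # if we sampled the complement, invert within the word
    return (~x) & ((1 << width) - 1) if flip else x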
If you don't have the convenience of Python's random.sample, you might do this in C using the classic sequential sampling algorithm:
unsigned long k_bit_helper(int n, int k, unsigned long bit, unsigned long accum) {
    if (!(n && k))
        return accum;
    if (k > rand() % n)
        return k_bit_helper(n - 1, k - 1, bit + bit, accum + bit);
    else
        return k_bit_helper(n - 1, k, bit + bit, accum);
}

unsigned long random_k_bits(int k) {
    return k_bit_helper(64, k, 1, 0);
}
The cost of the above will be dominated by the cost of generating the random numbers (true in the other solutions, also). You can optimize this a bit if you have a good prng by batching: for example, since you know that the random numbers will be in steadily decreasing ranges, you could get the random numbers for n through n-3 by getting a random number in the range 0..(n * (n - 1) * (n - 2) * (n - 3)) and then extracting the individual random numbers:
r = randint(0, n * (n - 1) * (n - 2) * (n - 3) - 1);
rn = r % n; r /= n;
rn1 = r % (n - 1); r /= (n - 1);
rn2 = r % (n - 2); r /= (n - 2);
rn3 = r % (n - 3); r /= (n - 3);
The maximum value of n is presumably 64 (i.e. 2^6), so the maximum value of the product above is certainly less than 2^24. Indeed, if you used a 64-bit prng, you could extract as many as 10 random numbers out of it. However, don't do this unless you know the prng you use produces independently random bits.
I have another suggestion based on enumeration: choose a random number i between 1 and n choose k, and generate the i-th combination. For example, for n = 6, k = 3 the 20 combinations are:
000111
001011
010011
100011
001101
010101
100101
011001
101001
110001
001110
010110
100110
011010
101010
110010
011100
101100
110100
111000
Let's say we randomly choose combination number 7. We first check whether it has a 1 in the last position: it does, because the first 10 (5 choose 2) combinations do. We then recursively check the remaining positions. Here is some C++ code:
word ithCombination(int n, int k, word i) {
    // i is zero-based
    word x = 0;
    word b = 1;
    while (k) {
        word c = binCoeff[n - 1][k - 1];
        if (i < c) {
            x |= b;
            --k;
        } else {
            i -= c;
        }
        --n;
        b <<= 1;
    }
    return x;
}

word randomKBits(int k) {
    word i = randomRange(0, binCoeff[BITS_PER_WORD][k] - 1);
    return ithCombination(BITS_PER_WORD, k, i);
}
To be fast, we use precalculated binomial coefficients in binCoeff. The function randomRange returns a random integer between the two bounds (inclusively).
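For reference, the same unranking idea fits in a few lines of Python, with math.comb playing the role of the precalculated binCoeff table:

import math
import random

def ith_combination(n, k, i):
    # build the i-th (zero-based) n-bit word with exactly k bits set
    x, b = 0, 1
    while k:
        c = math.comb(n - 1, k - 1)
        if i < c:
            x |= b
            k -= 1
        else:
            i -= c
        n -= 1
        b <<= 1
    return x

def random_k_bits(n, k):
    return ith_combination(n, k, random.randrange(math.comb(n, k)))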
I did some timings (source). With the C++11 default random number generator, most time is spent in generating random numbers. Then this solution is fastest, since it uses the absolute minimum number of random bits possible. If I use a fast random number generator, then the solution by mic006 is fastest. If k is known to be very small, it's best to just randomly set bits until k are set.
Not exactly an algorithm suggestion, but I just found a really neat way in JavaScript to get random bits directly from Math.random output using an ArrayBuffer.
//Swap var out with const and let for maximum performance! I like to use var because of prototyping ease
var randomBitList = function(n) {
    var floats = Math.ceil(n / 64) + 1;
    var buff = new ArrayBuffer(floats * 8);
    var floatView = new Float64Array(buff);
    var int8View = new Uint8Array(buff);
    var intView = new Int32Array(buff);
    for (var i = 0; i < (floats - 1) * 2; i++) {
        floatView[floats - 1] = Math.random();
        int8View[(floats - 1) * 8] = int8View[(floats - 1) * 8 + 4];
        intView[i] = intView[(floats - 1) * 2];
    }
    this.get = function(idx) {
        var i = idx >> 5; // divide by 32
        var j = idx % 32;
        return (intView[i] >> j) & 1;
    };
    this.getBitList = function() {
        var arr = [];
        for (var idx = 0; idx < n; idx++) {
            var i = idx >> 5; // divide by 32
            var j = idx % 32;
            arr[idx] = (intView[i] >> j) & 1;
        }
        return arr;
    };
};
It's easy enough to make a simple sieve:
for (int i = 2; i <= N; i++) {
    if (sieve[i] == 0) {
        cout << i << " is prime" << endl;
        for (int j = i; j <= N; j += i) {
            sieve[j]++; // count i as a distinct prime factor of j
        }
    }
    cout << i << " has " << sieve[i] << " distinct prime factors\n";
}
But what about when N is very large and I can't hold that kind of array in memory? I've looked up segmented sieve approaches and they seem to involve finding primes up to sqrt(N), but I don't understand how it works. What if N is very large (say 10^18)?
The basic idea of a segmented sieve is to choose the sieving primes less than the square root of n, choose a reasonably large segment size that nevertheless fits in memory, and then sieve each of the segments in turn, starting with the smallest. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked as composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, for each sieving prime you already know the first multiple in the current segment (it was the multiple that ended the sieving for that prime in the prior segment), so you sieve on each sieving prime, and so on until you are finished.
The size of n doesn't matter, except that a larger n will take longer to sieve than a smaller n; the size that matters is the size of the segment, which should be as large as convenient (say, the size of the primary memory cache on the machine).
You can see a simple implementation of a segmented sieve here. Note that a segmented sieve will be very much faster than O'Neill's priority-queue sieve mentioned in another answer; if you're interested, there's an implementation here.
EDIT: I wrote this for a different purpose, but I'll show it here because it might be useful:
Though the Sieve of Eratosthenes is very fast, it requires O(n) space. That can be reduced to O(sqrt(n)) for the sieving primes plus O(1) for the bitarray by performing the sieving in successive segments. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, the smallest multiple of each sieving prime is the multiple that ended the sieving in the prior segment, and so the sieving continues until finished.
Consider the example of sieve from 100 to 200 in segments of 20. The five sieving primes are 3, 5, 7, 11 and 13. In the first segment from 100 to 120, the bitarray has ten slots, with slot 0 corresponding to 101, slot k corresponding to 100+2k+1, and slot 9 corresponding to 119. The smallest multiple of 3 in the segment is 105, corresponding to slot 2; slots 2+3=5 and 5+3=8 are also multiples of 3. The smallest multiple of 5 is 105 at slot 2, and slot 2+5=7 is also a multiple of 5. The smallest multiple of 7 is 105 at slot 2, and slot 2+7=9 is also a multiple of 7. And so on.
Function primesRange takes arguments lo, hi and delta; lo and hi must be even, with lo < hi, and lo must be greater than sqrt(hi). The segment size is twice delta. Ps is a linked list containing the sieving primes less than sqrt(hi), with 2 removed since even numbers are ignored. Qs is a linked list containing the offset into the sieve bitarray of the smallest multiple in the current segment of the corresponding sieving prime. After each segment, lo advances by twice delta, so the number corresponding to an index i of the sieve bitarray is lo + 2i + 1.
function primesRange(lo, hi, delta)
    function qInit(p)
        return (-1/2 * (lo + p + 1)) % p
    function qReset(p, q)
        return (q - delta) % p
    sieve := makeArray(0..delta-1)
    ps := tail(primes(sqrt(hi)))
    qs := map(qInit, ps)
    while lo < hi
        for i from 0 to delta-1
            sieve[i] := True
        for p,q in ps,qs
            for i from q to delta step p
                sieve[i] := False
        qs := map(qReset, ps, qs)
        for i,t from 0,lo+1 to delta-1,hi step 1,2
            if sieve[i]
                output t
        lo := lo + 2 * delta
When called as primesRange(100, 200, 10), the sieving primes ps are [3, 5, 7, 11, 13]; qs is initially [2, 2, 2, 10, 8] corresponding to smallest multiples 105, 105, 105, 121 and 117, and is reset for the second segment to [1, 2, 6, 0, 11] corresponding to smallest multiples 123, 125, 133, 121 and 143.
You can see this program in action at http://ideone.com/iHYr1f. And in addition to the links shown above, if you are interested in programming with prime numbers I modestly recommend this essay at my blog.
It's just that we are segmenting the sieve we already have.
The basic idea: let's say we have to find the prime numbers between 85 and 100.
We apply the traditional sieve, but in the fashion described below (a Python sketch of the whole procedure follows):
We take the first prime, 2, divide the starting number by 2 (85/2) and round down to get p = 42; multiplying by 2 again gives p = 84. From there on we keep adding 2 up to the last number. So we have removed all the multiples of 2 (86, 88, 90, 92, 94, 96, 98, 100) in the range.
We take the next prime, 3, divide the starting number by 3 (85/3) and round down to get p = 28; multiplying by 3 again gives p = 84. From there on we keep adding 3 up to the last number. So we have removed all the multiples of 3 (87, 90, 93, 96, 99) in the range.
Take the next prime, 5, and so on.
Keep doing the above steps. You can get the primes (2, 3, 5, 7, ...) up to sqrt(n) using the traditional sieve, and then use them for the segmented sieve.
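A compact Python sketch of this procedure (illustrative, not the answerer's code):

import math

def primes_in_range(lo, hi):
    # base primes up to sqrt(hi), from a traditional sieve
    limit = math.isqrt(hi)
    base = [True] * (limit + 1)
    base_primes = []
    for p in range(2, limit + 1):
        if base[p]:
            base_primes.append(p)
            for m in range(p * p, limit + 1, p):
                base[m] = False
    # sieve the segment [lo, hi] with the base primes
    seg = [True] * (hi - lo + 1)
    for p in base_primes:
        start = max(p * p, (lo + p - 1) // p * p)  # first multiple of p in range, skipping p itself
        for m in range(start, hi + 1, p):
            seg[m - lo] = False
    return [lo + i for i, ok in enumerate(seg) if ok and lo + i > 1]

print(primes_in_range(85, 100))  # [89, 97]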
There's a version of the Sieve based on priority queues that yields as many primes as you request, rather than all of them up to an upper bound. It's discussed in the classic paper "The Genuine Sieve of Eratosthenes" and googling for "sieve of eratosthenes priority queue" turns up quite a few implementations in various programming languages.
If someone would like to see C++ implementation, here is mine:
#include <cmath>
#include <iostream>
#include <memory>
#include <vector>

void sito_delta(int delta, std::vector<int> &res)
{
    std::unique_ptr<int[]> results(new int[delta + 1]);
    for (int i = 0; i <= delta; ++i)
        results[i] = 1;
    int pierw = sqrt(delta);
    for (int j = 2; j <= pierw; ++j)
    {
        if (results[j])
        {
            for (int k = 2 * j; k <= delta; k += j)
            {
                results[k] = 0;
            }
        }
    }
    for (int m = 2; m <= delta; ++m)
        if (results[m])
        {
            res.push_back(m);
            std::cout << "," << m;
        }
}

void sito_segment(int n, std::vector<int> &fiPri)
{
    int delta = sqrt(n);
    if (delta > 10)
    {
        sito_segment(delta, fiPri);
        // compute using fiPri as the base primes
        std::vector<int> prime = fiPri;
        int offset = delta;
        int low = offset;
        int high = offset * 2;
        while (low < n)
        {
            if (high >= n) high = n;
            int mark[offset + 1]; // variable-length array: a GCC extension
            for (int s = 0; s <= offset; ++s)
                mark[s] = 1;
            for (int j = 0; j < prime.size(); ++j)
            {
                int lowMinimum = (low / prime[j]) * prime[j];
                if (lowMinimum < low)
                    lowMinimum += prime[j];
                for (int k = lowMinimum; k <= high; k += prime[j])
                    mark[k - low] = 0;
            }
            for (int i = low; i <= high; i++)
                if (mark[i - low])
                {
                    fiPri.push_back(i);
                    std::cout << "," << i;
                }
            low = low + offset;
            high = high + offset;
        }
    }
    else
    {
        std::vector<int> prime;
        sito_delta(delta, prime);
        fiPri = prime;
        int offset = delta;
        int low = offset;
        int high = offset * 2;
        // process segments one by one
        while (low < n)
        {
            if (high >= n) high = n;
            int mark[offset + 1]; // variable-length array: a GCC extension
            for (int s = 0; s <= offset; ++s)
                mark[s] = 1;
            for (int j = 0; j < prime.size(); ++j)
            {
                // find the smallest number in [low..high] that is
                // a multiple of prime[j]
                int lowMinimum = (low / prime[j]) * prime[j];
                if (lowMinimum < low)
                    lowMinimum += prime[j];
                // mark multiples of prime[j] in [low..high]
                for (int k = lowMinimum; k <= high; k += prime[j])
                    mark[k - low] = 0;
            }
            for (int i = low; i <= high; i++)
                if (mark[i - low])
                {
                    fiPri.push_back(i);
                    std::cout << "," << i;
                }
            low = low + offset;
            high = high + offset;
        }
    }
}

int main()
{
    std::vector<int> fiPri;
    sito_segment(1013, fiPri);
}
Based on Swapnil Kumar's answer I wrote the following algorithm in C. It was built with mingw32-make.exe.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const int MAX_PRIME_NUMBERS = 5000000; // the number of prime numbers we are looking for
    long long *prime_numbers = malloc(sizeof(long long) * MAX_PRIME_NUMBERS);
    prime_numbers[0] = 2;
    prime_numbers[1] = 3;
    prime_numbers[2] = 5;
    prime_numbers[3] = 7;
    prime_numbers[4] = 11;
    prime_numbers[5] = 13;
    prime_numbers[6] = 17;
    prime_numbers[7] = 19;
    prime_numbers[8] = 23;
    prime_numbers[9] = 29;
    const int BUFFER_POSSIBLE_PRIMES = 29 * 29; // the greatest prime we start with is 29 (10th position), so the block holds 841 numbers
    int qt_calculated_primes = 10; // 10 because we initialized the array with the first ten primes
    int possible_primes[BUFFER_POSSIBLE_PRIMES]; // booleans marking candidate primes
    long long iteration = 0; // multiplier for the range covered by possible_primes
    int i; // simple loop counter
    while (qt_calculated_primes < MAX_PRIME_NUMBERS)
    {
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            possible_primes[i] = 1; // mark the number as prime
        int biggest_possible_prime = sqrt((iteration + 1) * BUFFER_POSSIBLE_PRIMES);
        int k = 0;
        long long prime = prime_numbers[k]; // first prime to be used in the check
        while (prime <= biggest_possible_prime) // we don't need to check primes bigger than the square root
        {
            for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
                if ((iteration * BUFFER_POSSIBLE_PRIMES + i) % prime == 0)
                    possible_primes[i] = 0;
            if (++k == qt_calculated_primes)
                break;
            prime = prime_numbers[k];
        }
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            if (possible_primes[i])
            {
                if ((qt_calculated_primes < MAX_PRIME_NUMBERS) && ((iteration * BUFFER_POSSIBLE_PRIMES + i) != 1))
                {
                    prime_numbers[qt_calculated_primes] = iteration * BUFFER_POSSIBLE_PRIMES + i;
                    printf("%lld\n", prime_numbers[qt_calculated_primes]); // %lld: the values are long long
                    qt_calculated_primes++;
                } else if (!(qt_calculated_primes < MAX_PRIME_NUMBERS))
                    break;
            }
        iteration++;
    }
    return 0;
}
It sets a maximum count of prime numbers to be found, then initializes an array with known primes (2, 3, 5, ..., 29). We then use a buffer that stores segments of candidate primes; this buffer can't be larger than the square of the greatest initial prime, which in this case is 29 (hence 29 × 29 = 841).
I'm sure there are plenty of optimizations that could improve performance, like parallelizing the segment analysis and skipping numbers that are multiples of 2, 3 and 5, but it serves as an example of low memory consumption.
A number is prime if none of the smaller primes divides it. Since we iterate over the primes in order, we have already marked all numbers divisible by at least one smaller prime as composite. Hence, if we reach a cell that is not marked, it isn't divisible by any smaller prime and therefore has to be prime.
Remember these points:
// generate all primes up to sqrt(R)
// create an array of size (R-L+1); true means prime, false means composite
#include <bits/stdc++.h>
using namespace std;
#define MAX 100001

vector<int>* sieve() {
    bool isPrime[MAX];
    for (int i = 0; i < MAX; i++) {
        isPrime[i] = true;
    }
    for (int i = 2; i * i < MAX; i++) {
        if (isPrime[i]) {
            for (int j = i * i; j < MAX; j += i) {
                isPrime[j] = false;
            }
        }
    }
    vector<int>* primes = new vector<int>();
    primes->push_back(2);
    for (int i = 3; i < MAX; i += 2) {
        if (isPrime[i]) {
            primes->push_back(i);
        }
    }
    return primes;
}

void printPrimes(long long l, long long r, vector<int>*& primes) {
    bool isprimes[r - l + 1];
    for (int i = 0; i <= r - l; i++) {
        isprimes[i] = true;
    }
    for (int i = 0; primes->at(i) * (long long)primes->at(i) <= r; i++) {
        int currPrimes = primes->at(i);
        // largest multiple of currPrimes that is <= l
        long long base = (l / currPrimes) * currPrimes;
        if (base < l) {
            base = base + currPrimes;
        }
        // mark all multiples within [l, r] as composite
        for (long long j = base; j <= r; j += currPrimes) {
            isprimes[j - l] = false;
        }
        // base may be the prime itself; keep it marked prime
        if (base == currPrimes) {
            isprimes[base - l] = true;
        }
    }
    for (int i = 0; i <= r - l; i++) {
        if (isprimes[i] && i + l > 1) { // skip 1, which is not prime
            cout << i + l << endl;
        }
    }
}

int main() {
    vector<int>* primes = sieve();
    int t;
    cin >> t;
    while (t--) {
        long long l, r;
        cin >> l >> r;
        printPrimes(l, r, primes);
    }
    return 0;
}