Why do we check up to the square root of a number to determine if the number is prime? - algorithm

To test whether a number is prime or not, why do we have to test whether it is divisible only up to the square root of that number?

If a number n is not a prime, it can be factored into two factors a and b:
n = a * b
Now a and b can't be both greater than the square root of n, since then the product a * b would be greater than sqrt(n) * sqrt(n) = n. So in any factorization of n, at least one of the factors must be smaller than the square root of n, and if we can't find any factors less than or equal to the square root, n must be a prime.

Let's say m = sqrt(n) then m × m = n. Now if n is not a prime then n can be written as n = a × b, so m × m = a × b. Notice that m is a real number whereas n, a and b are natural numbers.
Now there can be 3 cases:
a > m ⇒ b < m
a = m ⇒ b = m
a < m ⇒ b > m
In all 3 cases, min(a, b) ≤ m. Hence if we search till m, we are bound to find at least one factor of n, which is enough to show that n is not prime.

Because if a factor is greater than the square root of n, the other factor that would multiply with it to equal n is necessarily less than the square root of n.

Suppose n is not a prime number (greater than 1). So there are numbers a and b such that
n = ab (1 < a <= b < n)
By multiplying the relation a<=b by a and b we get:
a^2 <= ab
ab <= b^2
Therefore: (note that n=ab)
a^2 <= n <= b^2
Hence: (Note that a and b are positive)
a <= sqrt(n) <= b
So if a number (greater than 1) is not prime and we test divisibility up to square root of the number, we will find one of the factors.

It's all really just basic uses of Factorization and Square Roots.
It may appear to be abstract, but in reality it simply lies with the fact that a non-prime-number's maximum possible factorial would have to be its square root because:
sqrroot(n) * sqrroot(n) = n.
Given that, if any whole number above 1 and below or up to sqrroot(n) divides evenly into n, then n cannot be a prime number.
Pseudo-code example:
i = 2;
is_prime = true;
while loop (i <= sqrroot(n))
{
if (n % i == 0)
{
is_prime = false;
exit while;
}
++i;
}

Let's suppose that the given integer N is not prime,
Then N can be factorized into two factors a and b , 2 <= a, b < N such that N = a*b.
Clearly, both of them can't be greater than sqrt(N) simultaneously.
Let us assume without loss of generality that a is smaller.
Now, if you could not find any divisor of N belonging in the range [2, sqrt(N)], what does that mean?
This means that N does not have any divisor in [2, a] as a <= sqrt(N).
Therefore, a = 1 and b = n and hence By definition, N is prime.
...
Further reading if you are not satisfied:
Many different combinations of (a, b) may be possible. Let's say they are:
(a1, b1), (a2, b2), (a3, b3), ..... , (ak, bk). Without loss of generality, assume ai < bi, 1<= i <=k.
Now, to be able to show that N is not prime it is sufficient to show that none of ai can be factorized further. And we also know that ai <= sqrt(N) and thus you need to check till sqrt(N) which will cover all ai. And hence you will be able to conclude whether or not N is prime.
...

So to check whether a number N is Prime or not.
We need to only check if N is divisible by numbers<=SQROOT(N). This is because, if we factor N into any 2 factors say X and Y, ie. N=XY.
Each of X and Y cannot be less than SQROOT(N) because then, XY < N
Each of X and Y cannot be greater than SQROOT(N) because then, X*Y > N
Therefore one factor must be less than or equal to SQROOT(N) ( while the other factor is greater than or equal to SQROOT(N) ).
So to check if N is Prime we need only check those numbers <= SQROOT(N).

Let's say we have a number "a", which is not prime [not prime/composite number means - a number which can be divided evenly by numbers other than 1 or itself. For example, 6 can be divided evenly by 2, or by 3, as well as by 1 or 6].
6 = 1 × 6 or 6 = 2 × 3
So now if "a" is not prime then it can be divided by two other numbers and let's say those numbers are "b" and "c". Which means
a=b*c.
Now if "b" or "c" , any of them is greater than square root of "a "than multiplication of "b" & "c" will be greater than "a".
So, "b" or "c" is always <= square root of "a" to prove the equation "a=b*c".
Because of the above reason, when we test if a number is prime or not, we only check until square root of that number.

Given any number n, then one way to find its factors is to get its square root p:
sqrt(n) = p
Of course, if we multiply p by itself, then we get back n:
p*p = n
It can be re-written as:
a*b = n
Where p = a = b. If a increases, then b decreases to maintain a*b = n. Therefore, p is the upper limit.
Update: I am re-reading this answer again today and it became clearer to me more. The value p does not necessarily mean an integer because if it is, then n would not be a prime. So, p could be a real number (ie, with fractions). And instead of going through the whole range of n, now we only need to go through the whole range of p. The other p is a mirror copy so in effect we halve the range. And then, now I am seeing that we can actually continue re-doing the square root and doing it to p to further half the range.

Let n be non-prime. Therefore, it has at least two integer factors greater than 1. Let f be the smallest of n's such factors. Suppose f > sqrt n. Then n/f is an integer ≤ sqrt n, thus smaller than f. Therefore, f cannot be n's smallest factor. Reductio ad absurdum; n's smallest factor must be ≤ sqrt n.

Any composite number is a product of primes.
Let say n = p1 * p2, where p2 > p1 and they are primes.
If n % p1 === 0 then n is a composite number.
If n % p2 === 0 then guess what n % p1 === 0 as well!
So there is no way that if n % p2 === 0 but n % p1 !== 0 at the same time.
In other words if a composite number n can be divided evenly by
p2,p3...pi (its greater factor) it must be divided by its lowest factor p1 too.
It turns out that the lowest factor p1 <= Math.square(n) is always true.

Yes, as it was properly explained above, it's enough to iterate up to Math.floor of a number's square root to check its primality (because sqrt covers all possible cases of division; and Math.floor, because any integer above sqrt will already be beyond its range).
Here is a runnable JavaScript code snippet that represents a simple implementation of this approach – and its "runtime-friendliness" is good enough for handling pretty big numbers (I tried checking both prime and not prime numbers up to 10**12, i.e. 1 trillion, compared results with the online database of prime numbers and encountered no errors or lags even on my cheap phone):
function isPrime(num) {
if (num % 2 === 0 || num < 3 || !Number.isSafeInteger(num)) {
return num === 2;
} else {
const sqrt = Math.floor(Math.sqrt(num));
for (let i = 3; i <= sqrt; i += 2) {
if (num % i === 0) return false;
}
return true;
}
}
<label for="inp">Enter a number and click "Check!":</label><br>
<input type="number" id="inp"></input>
<button onclick="alert(isPrime(+document.getElementById('inp').value) ? 'Prime' : 'Not prime')" type="button">Check!</button>

To test the primality of a number, n, one would expect a loop such as following in the first place :
bool isPrime = true;
for(int i = 2; i < n; i++){
if(n%i == 0){
isPrime = false;
break;
}
}
What the above loop does is this : for a given 1 < i < n, it checks if n/i is an integer (leaves remainder 0). If there exists an i for which n/i is an integer, then we can be sure that n is not a prime number, at which point the loop terminates. If for no i, n/i is an integer, then n is prime.
As with every algorithm, we ask : Can we do better ?
Let us see what is going on in the above loop.
The sequence of i goes : i = 2, 3, 4, ... , n-1
And the sequence of integer-checks goes : j = n/i, which is n/2, n/3, n/4, ... , n/(n-1)
If for some i = a, n/a is an integer, then n/a = k (integer)
or n = ak, clearly n > k > 1 (if k = 1, then a = n, but i never reaches n; and if k = n, then a = 1, but i starts form 2)
Also, n/k = a, and as stated above, a is a value of i so n > a > 1.
So, a and k are both integers between 1 and n (exclusive). Since, i reaches every integer in that range, at some iteration i = a, and at some other iteration i = k. If the primality test of n fails for min(a,k), it will also fail for max(a,k). So we need to check only one of these two cases, unless min(a,k) = max(a,k) (where two checks reduce to one) i.e., a = k , at which point a*a = n, which implies a = sqrt(n).
In other words, if the primality test of n were to fail for some i >= sqrt(n) (i.e., max(a,k)), then it would also fail for some i <= n (i.e., min(a,k)). So, it would suffice if we run the test for i = 2 to sqrt(n).

Related

Number of different binary sequences of length n generated using exactly k flip operations

Consider a binary sequence b of length N. Initially, all the bits are set to 0. We define a flip operation with 2 arguments, flip(L,R), such that:
All bits with indices between L and R are "flipped", meaning a bit with value 1 becomes a bit with value 0 and vice-versa. More exactly, for all i in range [L,R]: b[i] = !b[i].
Nothing happens to bits outside the specified range.
You are asked to determine the number of possible different sequences that can be obtained using exactly K flip operations modulo an arbitrary given number, let's call it MOD.
More specifically, each test contains on the first line a number T, the number of queries to be given. Then there are T queries, each one being of the form N, K, MOD with the meaning from above.
1 ≤ N, K ≤ 300 000
T ≤ 250
2 ≤ MOD ≤ 1 000 000 007
Sum of all N-s in a test is ≤ 600 000
time limit: 2 seconds
memory limit: 65536 kbytes
Example :
Input :
1
2 1 1000
Output :
3
Explanation :
There is a single query. The initial sequence is 00. We can do the following operations :
flip(1,1) ⇒ 10
flip(2,2) ⇒ 01
flip(1,2) ⇒ 11
So there are 3 possible sequences that can be generated using exactly 1 flip.
Some quick observations that I've made, although I'm not sure they are totally correct :
If K is big enough, that is if we have a big enough number of flips at our disposal, we should be able to obtain 2n sequences.
If K=1, then the result we're looking for is N(N+1)/2. It's also C(n,1)+C(n,2), where C is the binomial coefficient.
Currently trying a brute force approach to see if I can spot a rule of some kind. I think this is a sum of some binomial coefficients, but I'm not sure.
I've also come across a somewhat simpler variant of this problem, where the flip operation only flips a single specified bit. In that case, the result is
C(n,k)+C(n,k-2)+C(n,k-4)+...+C(n,(1 or 0)). Of course, there's the special case where k > n, but it's not a huge difference. Anyway, it's pretty easy to understand why that happens.I guess it's worth noting.
Here are a few ideas:
We may assume that no flip operation occurs twice (otherwise, we can assume that it did not happen). It does affect the number of operations, but I'll talk about it later.
We may assume that no two segments intersect. Indeed, if L1 < L2 < R1 < R2, we can just do the (L1, L2 - 1) and (R1 + 1, R2) flips instead. The case when one segment is inside the other is handled similarly.
We may also assume that no two segments touch each other. Otherwise, we can glue them together and reduce the number of operations.
These observations give the following formula for the number of different sequences one can obtain by flipping exactly k segments without "redundant" flips: C(n + 1, 2 * k) (we choose 2 * k ends of segments. They are always different. The left end is exclusive).
If we had perform no more than K flips, the answer would be sum for k = 0...K of C(n + 1, 2 * k)
Intuitively, it seems that its possible to transform the sequence of no more than K flips into a sequence of exactly K flips (for instance, we can flip the same segment two more times and add 2 operations. We can also split a segment of more than two elements into two segments and add one operation).
By running the brute force search (I know that it's not a real proof, but looks correct combined with the observations mentioned above) that the answer this sum minus 1 if n or k is equal to 1 and exactly the sum otherwise.
That is, the result is C(n + 1, 0) + C(n + 1, 2) + ... + C(n + 1, 2 * K) - d, where d = 1 if n = 1 or k = 1 and 0 otherwise.
Here is code I used to look for patterns running a brute force search and to verify that the formula is correct for small n and k:
reachable = set()
was = set()
def other(c):
"""
returns '1' if c == '0' and '0' otherwise
"""
return '0' if c == '1' else '1'
def flipped(s, l, r):
"""
Flips the [l, r] segment of the string s and returns the result
"""
res = s[:l]
for i in range(l, r + 1):
res += other(s[i])
res += s[r + 1:]
return res
def go(xs, k):
"""
Exhaustive search. was is used to speed up the search to avoid checking the
same string with the same number of remaining operations twice.
"""
p = (xs, k)
if p in was:
return
was.add(p)
if k == 0:
reachable.add(xs)
return
for l in range(len(xs)):
for r in range(l, len(xs)):
go(flipped(xs, l, r), k - 1)
def calc_naive(n, k):
"""
Counts the number of reachable sequences by running an exhaustive search
"""
xs = '0' * n
global reachable
global was
was = set()
reachable = set()
go(xs, k)
return len(reachable)
def fact(n):
return 1 if n == 0 else n * fact(n - 1)
def cnk(n, k):
if k > n:
return 0
return fact(n) // fact(k) // fact(n - k)
def solve(n, k):
"""
Uses the formula shown above to compute the answer
"""
res = 0
for i in range(k + 1):
res += cnk(n + 1, 2 * i)
if k == 1 or n == 1:
res -= 1
return res
if __name__ == '__main__':
# Checks that the formula gives the right answer for small values of n and k
for n in range(1, 11):
for k in range(1, 11):
assert calc_naive(n, k) == solve(n, k)
This solution is much better than the exhaustive search. For instance, it can run in O(N * K) time per test case if we compute the coefficients using Pascal's triangle. Unfortunately, it is not fast enough. I know how to solve it more efficiently for prime MOD (using Lucas' theorem), but O do not have a solution in general case.
Multiplicative modular inverses can't solve this problem immediately as k! or (n - k)! may not have an inverse modulo MOD.
Note: I assumed that C(n, m) is defined for all non-negative n and m and is equal to 0 if n < m.
I think I know how to solve it for an arbitrary MOD now.
Let's factorize the MOD into prime factors p1^a1 * p2^a2 * ... * pn^an. Now can solve this problem for each prime factor independently and combine the result using the Chinese remainder theorem.
Let's fix a prime p. Let's assume that p^a|MOD (that is, we need to get the result modulo p^a). We can precompute all p-free parts of the factorial and the maximum power of p that divides the factorial for all 0 <= n <= N in linear time using something like this:
powers = [0] * (N + 1)
p_free = [i for i in range(N + 1)]
p_free[0] = 1
for cur_p in powers of p <= N:
i = cur_p
while i < N:
powers[i] += 1
p_free[i] /= p
i += cur_p
Now the p-free part of the factorial is the product of p_free[i] for all i <= n and the power of p that divides n! is the prefix sum of the powers.
Now we can divide two factorials: the p-free part is coprime with p^a so it always has an inverse. The powers of p are just subtracted.
We're almost there. One more observation: we can precompute the inverses of p-free parts in linear time. Let's compute the inverse for the p-free part of N! using Euclid's algorithm. Now we can iterate over all i from N to 0. The inverse of the p-free part of i! is the inverse for i + 1 times p_free[i] (it's easy to prove it if we rewrite the inverse of the p-free part as a product using the fact that elements coprime with p^a form an abelian group under multiplication).
This algorithm runs in O(N * number_of_prime_factors + the time to solve the system using the Chinese remainder theorem + sqrt(MOD)) time per test case. Now it looks good enough.
You're on a good path with binomial-coefficients already. There are several factors to consider:
Think of your number as a binary-string of length n. Now we can create another array counting the number of times a bit will be flipped:
[0, 1, 0, 0, 1] number
[a, b, c, d, e] number of flips.
But even numbers of flips all lead to the same result and so do all odd numbers of flips. So basically the relevant part of the distribution can be represented %2
Logical next question: How many different combinations of even and odd values are available. We'll take care of the ordering later on, for now just assume the flipping-array is ordered descending for simplicity. We start of with k as the only flipping-number in the array. Now we want to add a flip. Since the whole flipping-array is used %2, we need to remove two from the value of k to achieve this and insert them into the array separately. E.g.:
[5, 0, 0, 0] mod 2 [1, 0, 0, 0]
[3, 1, 1, 0] [1, 1, 1, 0]
[4, 1, 0, 0] [0, 1, 0, 0]
As the last example shows (remember we're operating modulo 2 in the final result), moving a single 1 doesn't change the number of flips in the final outcome. Thus we always have to flip an even number bits in the flipping-array. If k is even, so will the number of flipped bits be and same applies vice versa, no matter what the value of n is.
So now the question is of course how many different ways of filling the array are available? For simplicity we'll start with mod 2 right away.
Obviously we start with 1 flipped bit, if k is odd, otherwise with 1. And we always add 2 flipped bits. We can continue with this until we either have flipped all n bits (or at least as many as we can flip)
v = (k % 2 == n % 2) ? n : n - 1
or we can't spread k further over the array.
v = k
Putting this together:
noOfAvailableFlips:
if k < n:
return k
else:
return (k % 2 == n % 2) ? n : n - 1
So far so well, there are always v / 2 flipping-arrays (mod 2) that differ by the number of flipped bits. Now we come to the next part permuting these arrays. This is just a simple permutation-function (permutation with repetition to be precise):
flipArrayNo(flippedbits):
return factorial(n) / (factorial(flippedbits) * factorial(n - flippedbits)
Putting it all together:
solutionsByFlipping(n, k):
res = 0
for i in [k % 2, noOfAvailableFlips(), step=2]:
res += flipArrayNo(i)
return res
This also shows that for sufficiently large numbers we can't obtain 2^n sequences for the simply reason that we can not arrange operations as we please. The number of flips that actually affect the outcome will always be either even or odd depending upon k. There's no way around this. The best result one can get is 2^(n-1) sequences.
For completeness, here's a dynamic program. It can deal easily with arbitrary modulo since it is based on sums, but unfortunately I haven't found a way to speed it beyond O(n * k).
Let a[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 1. Let b[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 0.
Then:
# we can append 1 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in 1, or to any arrangement of (k-1) non-adjacent blocks of contiguous
# 1's that ends in 0:
a[n][k] = a[n - 1][k] + b[n - 1][k - 1]
# we can append 0 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in either 0 or 1:
b[n][k] = b[n - 1][k] + a[n - 1][k]
# complete answer would be sum (a[n][i] + b[n][i]) for i = 0 to k
I wonder if the following observations might be useful: (1) a[n][k] and b[n][k] are zero when n < 2*k - 1, and (2) on the flip side, for values of k greater than ⌊(n + 1) / 2⌋ the overall answer seems to be identical.
Python code (full matrices are defined for simplicity, but I think only one row of each would actually be needed, space-wise, for a bottom-up method):
a = [[0] * 11 for i in range(0,11)]
b = [([1] + [0] * 10) for i in range(0,11)]
def f(n,k):
return fa(n,k) + fb(n,k)
def fa(n,k):
global a
if a[n][k] or n == 0 or k == 0:
return a[n][k]
elif n == 2*k - 1:
a[n][k] = 1
return 1
else:
a[n][k] = fb(n-1,k-1) + fa(n-1,k)
return a[n][k]
def fb(n,k):
global b
if b[n][k] or n == 0 or n == 2*k - 1:
return b[n][k]
else:
b[n][k] = fb(n-1,k) + fa(n-1,k)
return b[n][k]
def g(n,k):
return sum([f(n,i) for i in range(0,k+1)])
# example
print(g(10,10))
for i in range(0,11):
print(a[i])
print()
for i in range(0,11):
print(b[i])

Dividing N items in p groups

You are given N total number of item, P group in which you have to divide the N items.
Condition is the product of number of item held by each group should be max.
example N=10 and P=3 you can divide the 10 item in {3,4,3} since 3x3x4=36 max possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
For fun I worked out a proof of the fact that it in an optimal solution either all numbers = N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: given and i,j we have
|x_i - x_ j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_ j).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence -- a is an integer < N/P which differs from N/P by less than 1 (k/P < 1)
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED

Represent a number as sum of primes

I am given a large number n and I need to find whether it can be represented as sum of K prime numbers.
Ex 9 can be represented as sum of 3 prime number as 2+2+5.
I am trying to use variation of subset sum but number is too large to generate all primes number till then.
The problem is from the current HackerRank contest. The restrictions are 1 <= n, K <= 10^12
For K = 1, the answer is obviously "Yes" iif N is prime
For K = 2, according to the Goldbach conjecture, which is verified for N up to around 10^18, the answer is "Yes" iif N is even and N >= 4 or if N - 2 is prime.
The interesting case is K = 3. Obviously if N < 6, the answer is "No" because the smallest number expressible as the sum of three primes is 2 + 2 + 2 = 6.
If N >= 6, then either N - 2 or N - 3 is even and >= 4, so we can apply Goldbach's conjecture again.
So for K = 3, the answer is "Yes" simply iif N >= 6.
Via induction (hint: just use K - 3 times the prime 2), we can show that for K >= 3, the answer is "Yes" iif N >= 2*K, so only the cases K = 1 and K = 2 are non-trivial and require just a simple primality check, e.g. via Miller–Rabin in O(log^4 N).
EDIT: As a bonus, this proof also gives a constructive algorithm to output the partition. We use a number of 2's and maybe one 3 to get to K = 2. The tricky K = 2, N even case is not as hard as it looks: We know from computational verification of the Goldbach conjecture that for N >= 12, there is a Goldbach partition with a prime < 5200 or so. There are less than 700 such primes, so we can check them all in a reasonable amount of time.
The concept you are looking for is called the prime partitions of a number. The formula to compute the number of prime partitions of a number is \kappa(n) = \frac{1}{n}\left(\mathrm{sopf}(n) + \sum_{j=1}^{n-1} \mathrm{sopf}(j) \cdot \kappa(n-j)\right); I gave that in LaTeX notation because I don't know how to do it in html. The sopf(n) function is the sum of the distinct prime factors of n, so sopf(42) = 12, since 42 = 2 * 3 * 7, but sopf(12) = 5, since 12 = 2 * 2 * 3 but each prime factor is counted once.
I discuss this formula at my blog.
Your input are n and K. There are many cases :
K > n : impossible
K = n : the K prime numbers are all 1
K < n : 4 subcases :
a. n and K are odd
b. n is even, K is odd
c. n is odd, K is even
d. n and K are even
Case a: select any prime p < n and p > 2. The problem reduces to the same problem with input n-p and K-1 instead of n and K respectively, and we fall in case b
Case b: The problem reduces to the same problem with input n-2 and K-1 instead of n and K respectively, and we fall in case d
Case c: idem than b, but we fall in case a instead of d
Case d: if n = 2K, then 2, 2, ..., 2 taken K times is your solution (ie your primes are 2, 2, ..., 2). Otherwise n can be written
n = (\sum_{i=1}^{i=K-2} 2 ) + p + q
where we add the prime 2 (K-2) times in the sum. Then the problem reduces to the same problem with input n-2(K-2) instead of n and 2 instead of K. But this is Goldbach. You can solve it in O(n sqrt(n)) like this : take p and q both equal to n/2. Increment p and decrease q by 1 at each step until they are both prime.

Creating a random number generator from a coin toss

Yesterday i had this interview question, which I couldn't fully answer:
Given a function f() = 0 or 1 with a perfect 1:1 distribution, create a function f(n) = 0, 1, 2, ..., n-1 each with probability 1/n
I could come up with a solution for if n is a natural power of 2, ie use f() to generate the bits of a binary number of k=ln_2 n. But this obviously wouldn't work for, say, n=5 as this would generate f(5) = 5,6,7 which we do not want.
Does anyone know a solution?
You can build a rng for the smallest power of two greater than n as you described. Then whenever this algorithm generates a number larger than n-1, throw that number away and try again. This is called the method of rejection.
Addition
The algorithm is
Let m = 2^k >= n where k is is as small as possible.
do
Let r = random number in 0 .. m-1 generated by k coin flips
while r >= n
return r
The probability that this loop stops with at most i iterations is bounded by 1 - (1/2)^i. This goes to 1 very rapidly: The loop is still running after 30 iterations with probability less than one-billionth.
You can decrease the expected number of iterations with a slightly modified algorithm:
Choose p >= 1
Let m = 2^k >= p n where k is is as small as possible.
do
Let r = random number in 0 .. m-1 generated by k coin flips
while r >= p n
return floor(r / p)
For example if we are trying to generate 0 .. 4 (n = 5) with the simpler algorithm, we would reject 5, 6 and 7, which is 3/8 of the results. With p = 3 (for example), pn = 15, we'd have m = 16 and would reject only 15, or 1/16 of the results. The price is needing four coin flips rather than 3 and a division op. You can continue to increase p and add coin flips to decrease rejections as far as you wish.
Another interesting solution can be derived through a Markov Chain Monte Carlo technique, the Metropolis-Hastings algorithm. This would be significantly more efficient if a large number of samples were required but it would only approach the uniform distribution in the limit.
initialize: x[0] arbitrarily
for i=1,2,...,N
if (f() == 1) x[i] = (x[i-1]++) % n
else x[i] = (x[i-1]-- + n) % n
For large N the vector x will contain uniformly distributed numbers between 0 and n. Additionally, by adding in an accept/reject step we can simulate from an arbitrary distribution, but you would need to simulate uniform random numbers on [0,1] as a sub-procedure.
def gen(a, b):
min_possible = a
max_possible = b
while True:
floor_min_possible = floor(min_possible)
floor_max_possible = floor(max_possible)
if max_possible.is_integer():
floor_max_possible -= 1
if floor_max_possible == floor_min_possible:
return floor_max_possible
mid = (min_possible + max_possible)/2
if coin_flip():
min_possible = mid
else:
max_possible = mid
My #RandomNumberGenerator #RNG
/w any f(x) that gives rand ints from 1 to x, we can get rand ints from 1 to k, for any k:
get ints p & q, so p^q is smallest possible, while p is a factor of x, & p^q >= k;
Lbl A
i=0 & s=1; while i < q {
s+= ((f(x) mod p) - 1) * p^i;
i++;
}
if s > k, goto A, else return s
//** about notation/terms:
rand = random
int = integer
mod is (from) modulo arithmetic
Lbl is a “Label”, from the Basic language, & serves as a coordinates for executing code. After the while loop, if s > k, then “goto A” means return to the point of code where it says “Lbl A”, & resume. If you return to Lbl A & process the code again, it resets the values of i to 0 & s to 1.
i is an iterator for powers of p, & s is a sum.
"s+= foo" means "let s now equal what it used to be + foo".
"i++" means "let i now equal what it used to be + 1".
f(x) returns random integers from 1 to x. **//
I figured out/invented/solved it on my own, around 2008. The method is discussed as common knowledge here. Does anyone know since when the random number generator rejection method has been common knowledge? RSVP.

Find pairs in an array such that a%b = k , where k is a given integer

Here is an interesting programming puzzle I came across . Given an array of positive integers, and a number K. We need to find pairs(a,b) from the array such that a % b = K.
I have a naive O(n^2) solution to this where we can check for all pairs such that a%b=k. Works but inefficient. We can certainly do better than this can't we ? Any efficient algorithms for the same? Oh and it's NOT homework.
Sort your array and binary search or keep a hash table with the count of each value in your array.
For a number x, we can find the largest y such that x mod y = K as y = x - K. Binary search for this y or look it up in your hash and increment your count accordingly.
Now, this isn't necessarily the only value that will work. For example, 8 mod 6 = 8 mod 3 = 2. We have:
x mod y = K => x = q*y + K =>
=> x = q(x - K) + K =>
=> x = 1(x - K) + K =>
=> x = 2(x - K)/2 + K =>
=> ...
This means you will have to test all divisors of y as well. You can find the divisors in O(sqrt y), giving you a total complexity of O(n log n sqrt(max_value)) if using binary search and O(n sqrt(max_value)) with a hash table (recommended especially if your numbers aren't very large).
Treat the problem as having two separate arrays as input: one for the a numbers and a % b = K and one for the b numbers. I am going to assume that everything is >= 0.
First of all, you can discard any b <= K.
Now think of every number in b as generating a sequence K, K + b, K + 2b, K + 3b... You can record this using a pair of numbers (pos, b), where pos is incremented by b at each stage. Start with pos = 0.
Hold these sequences in a priority queue, so you can find the smallest pos value at any given time. Sort the array of a numbers - in fact you could do this ahead of time and discard any duplicates.
For each a number
While the smallest pos in the priority queue is <= a
Add the smallest multiple of b to it to make it >= a
If it is == a, you have a match
Update the stored value of pos for that sequence, re-ordering the priority queue
At worst, you end up comparing every number with every other number, which is the same as the simple solution, but with priority queue and sorting overhead. However, large values of b may remain unexamined in the priority queue while several a numbers pass through, in which case this does better - and if there are a lot of numbers to process and they are all different, some of them must be large.
This answer mentions the main points of an algorithm (called DL because it uses “divisor lists” ) and gives details via a program, called amodb.py.
Let B be the input array, containing N positive integers. Without much loss of generality, suppose B[i] > K for all i and that B is in ascending order. (Note that x%B[i] < K if B[i] < K; and where B[i] = K, one can report pairs (B[i], B[j]) for all j>i. If B is not sorted initially, charge a cost of O(N log N) to sort it.)
In algorithm DL and program amodb.py, A is an array with K pre-subtracted from the input array elements. Ie, A[i] = B[i] - K. Note that if a%b == K, then for some j we have a = b*j + K or a-K = b*j. That is, a%b == K iff a-K is a multiple of b. Moreover, if a-K = b*j and p is any factor of b, then p is a factor of a-K.
Let the prime numbers from 2 to 97 be called “small factors”. When N numbers are uniformly randomly selected from some interval [X,Y], on the order of N/ln(Y) of the numbers will have no small factors; a similar number will have a greatest small factor of 2; and declining proportions will have successively larger greatest small factors. For example, on the average about N/97 will be divisible by 97, about N/89-N/(89*97) by 89 but not 97, etc. Generally, when members of B are random, lists of members with certain greatest small factors or with no small factors are sub-O(N/ln(Y)) in length.
Given a list Bd containing members of B divisible by largest small factor p, DL tests each element of Bd against elements of list Ad, those elements of A divisible by p. But given a list Bp for elements of B without small factors, DL tests each of Bp's elements against all elements of A. Example: If N=25, p=13, Bd=[18967, 23231], and Ad=[12779, 162383], then DL tests if any of 12779%18967, 162383%18967, 12779%23231, 162383%23231 are zero. Note that it is possible to cut the number of tests in half in this example (and many others) by noticing 12779<18967, but amodb.py does not include that optimization.
DL makes J different lists for J different factors; in one version of amodb.py, J=25 and the factor set is primes less than 100. A larger value of J would increase the O(N*J) time to initialize divisor lists, but would slightly decrease the O(N*len(Bp)) time to process list Bp against elements of A. See results below. Time to process other lists is O((N/logY)*(N/logY)*J), which is in sharp contrast to the O(n*sqrt(Y)) complexity for a previous answer's method.
Shown next is output from two program runs. In each set, the first Found line is from a naïve O(N*N) test, and the second is from DL. (Note, both DL and the naïve method would run faster if too-small A values were progressively removed.) The time ratio in the last line of the first test shows a disappointingly low speedup ratio of 3.9 for DL vs naïve method. For that run, factors included only the 25 primes less than 100. For the second run, with better speedup of ~ 4.4, factors included numbers 2 through 13 and primes up to 100.
$ python amodb.py
N: 10000 K: 59685 X: 100000 Y: 1000000
Found 208 matches in 21.854 seconds
Found 208 matches in 5.598 seconds
21.854 / 5.598 = 3.904
$ python amodb.py
N: 10000 K: 97881 X: 100000 Y: 1000000
Found 207 matches in 21.234 seconds
Found 207 matches in 4.851 seconds
21.234 / 4.851 = 4.377
Program amodb.py:
import random, time
factors = [2,3,4,5,6,7,8,9,10,11,12,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
X, N = 100000, 10000
Y, K = 10*X, random.randint(X/2,X)
print "N: ", N, " K: ", K, "X: ", X, " Y: ", Y
B = sorted([random.randint(X,Y) for i in range(N)])
NP = len(factors); NP1 = NP+1
A, Az, Bz = [], [[] for i in range(NP1)], [[] for i in range(NP1)]
t0 = time.time()
for b in B:
a, aj, bj = b-K, -1, -1
A.append(a) # Add a to A
for j,p in enumerate(factors):
if a % p == 0:
aj = j
Az[aj].append(a)
if b % p == 0:
bj = j
Bz[bj].append(b)
Bp = Bz.pop() # Get not-factored B-values list into Bp
di = time.time() - t0; t0 = time.time()
c = 0
for a in A:
for b in B:
if a%b == 0:
c += 1
dq = round(time.time() - t0, 3); t0 = time.time()
c=0
for i,Bd in enumerate(Bz):
Ad = Az[i]
for b in Bd:
for ak in Ad:
if ak % b == 0:
c += 1
for b in Bp:
for ak in A:
if ak % b == 0:
c += 1
dr = round(di + time.time() - t0, 3)
print "Found", c, " matches in", dq, "seconds"
print "Found", c, " matches in", dr, "seconds"
print dq, "/", dr, "=", round(dq/dr, 3)

Resources