Given a number N (1 <= N <= 10^50), find the number of unique pairs (x, y) such that the sum of digits of x plus the sum of digits of y is prime.
x, y <= N.
Test case: N = 5
Output: 6
Explanation: the pairs are (1,1), (1,2), (1,4), (2,3), (2,5), (3,4).
Note: (x,y) and (y,x) are equivalent. So, if (2,5) is included then (5,2) is not.
This question was asked in a competitive programming contest. I couldn't figure out how to do it. Has anyone got some ideas?
Observation 1:
The primes you need to consider are smaller than 1000.
(The digit sum of any number <= 10^50 is at most 50*9 = 450, so the sum of two digit sums is at most 900.)
Observation 2:
A pair (x, x) gives a prime only when the digit sum of x is 1, i.e. when x is a power of 10 (1, 10, 100, ...). (The total is twice the digit sum of x, so it is even, and the only even prime is 2.)
Let's say you have a wizard friend who told you all the values of a function f for a given N, where f(x) = the number of integers from 1 to N whose sum of digits is equal to x.
Now find all primes up to 1000, and for each x from 0 to 500 and each prime p calculate f(x) * f(p - x).
The sum of the values you've calculated is equal to 2 * answer - f(1). (Every pair with x != y is counted twice, while the pairs (x, x) that give a prime, of which there are f(1), are counted once. For N < 10 this is 2 * answer - 1, because (1, 1) is then the only such pair.) So you only check up to 500*1000 possibilities.
The only thing left is to calculate function f.
You can do it using dynamic programming.
Let g(x, d, e) = the number of d-digit prefixes (leading zeros allowed) whose sum of digits is equal to x. If e = 1, the prefix is equal to the first d digits of n; if e = 0, it is smaller than them.
x <= 500, d <= 50, e <= 1
You can easily see that you have up to 500*50*2 states.
Let's say you know all the previous values of g and you want to calculate g(x, d, 0).
You take any (d - 1)-digit number and append a digit y, for each 0 <= y <= 9. Since you want to get x, its previous sum of digits must have been equal to x - y. You also want the result to stay smaller than n, so you take g(x - y, d - 1, 0) and, if y is smaller than the dth digit of n, also add g(x - y, d - 1, 1).
Formula for g(x, d, 1):
You take the (d - 1)-digit prefix of n and append the digit y that is equal to the dth digit of n. Then your result is g(x - y, d - 1, 1).
The number of different options to consider is 500*50*2*10, which is small enough.
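A minimal Python sketch of the whole approach, assuming N is passed as a decimal string. The function names are made up for illustration, and instead of storing the e = 1 state as a table it tracks the single "equal to the prefix of N" number directly. It combines the counts as (ordered sum + f(1)) / 2, because a pair (x, x) gives a prime exactly when the digit sum of x is 1:

```python
def digit_sum_counts(n_str):
    """Return f, where f[s] = how many integers in 1..N have digit sum s."""
    max_sum = 9 * len(n_str)
    free = [0] * (max_sum + 1)   # prefixes already smaller than N's prefix
    tight_sum = 0                # digit sum of N's own prefix (the e = 1 state)
    for ch in n_str:
        digit = int(ch)
        new_free = [0] * (max_sum + 1)
        # A smaller prefix stays smaller whatever digit is appended.
        for s, count in enumerate(free):
            if count:
                for y in range(10):
                    new_free[s + y] += count
        # The tight prefix becomes smaller when the appended digit is below N's digit.
        for y in range(digit):
            new_free[tight_sum + y] += 1
        tight_sum += digit
        free = new_free
    free[tight_sum] += 1   # N itself
    free[0] -= 1           # drop the number 0, which is not in 1..N
    return free


def primes_up_to(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def count_prime_digit_sum_pairs(n_str):
    f = digit_sum_counts(n_str)
    max_sum = len(f) - 1
    ordered = 0  # counts (x, y) and (y, x) separately
    for p in primes_up_to(2 * max_sum):
        for s in range(max(0, p - max_sum), min(p, max_sum) + 1):
            ordered += f[s] * f[p - s]
    diagonal = f[1]  # pairs (x, x): only digit sum 1 gives the prime 2
    return (ordered + diagonal) // 2
```

For the test case above, `count_prime_digit_sum_pairs("5")` returns 6.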
I found this problem in a contest. The question is:
You are given an array of N non-negative integers (A1, A2, A3,..., An) and an integer M. Your task is to find the number of unordered pairs of array elements (X,Y) that satisfies the following bitwise equation:
2 * set_bits(X|Y) = M + set_bits(X ⊕ Y)
Note:
set_bits(n) represents the number of ones in the binary representation of an integer n.
X|Y represents the bitwise OR of integer X and Y.
X ⊕ Y represents the bitwise XOR of integer X and Y.
The unordered pair of array elements is pair (Ai, Aj) where 1 ≤ i < j ≤ N.
Print the number of unordered pairs of array elements that satisfy the above bitwise equation.
Sample Input 1:
N=4 M=2
arr = [3, 0, 4, 5]
Sample Output: 2
2 pairs are (3,0) and (0,5)
Sample Input 2:
N=8 M=2
arr = [3, 0, 4, 5, 6, 8, 1, 8]
Sample Output: 9
Is there any other way except brute force to solve this equation?
A solution with time complexity O(n) exists if the time complexity of set_bits is O(1).
First, we are going to rephrase the condition (the bitwise equation) a bit. Assume a pair of elements (X, Y) is given. Let c_01 represent the number of bit positions where X is 0 but Y is 1, c_10 the number of bit positions where X is 1 and Y is 0, and c_11 the number of bit positions where both X and Y are 1. For example, when X=5 and Y=1 (X=101, Y=001), c_01 = 0, c_10 = 1, c_11 = 1. Now, the condition can be rewritten as
2 * (c_01 + c_10 + c_11) = M + (c_01 + c_10)
because set_bits(X|Y) is equal to c_01 + c_10 + c_11 and set_bits(X^Y) is equal to c_01 + c_10.
We can reorder the equation into
c_01 + c_10 + 2*c_11 = M
by moving the term on the right to the left side. Now, realize that set_bits(X) = c_10 + c_11. Applying this information on the equation we get
c_01 + c_11 = M - set_bits(X)
Now, also realize that set_bits(Y) = c_01 + c_11. The equation becomes
set_bits(Y) = M - set_bits(X)
or
set_bits(X) + set_bits(Y) = M
The problem has turned into counting the number of pairs such that the number of set bits in the first element plus the number of set bits in the second element is equal to M. This can be done in linear time assuming you can compute set_bits in constant time.
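A short Python sketch of that counting step, under the assumption that the array fits in memory. The name `count_pairs` is illustrative, and `bin(a).count("1")` stands in for set_bits:

```python
from collections import Counter


def count_pairs(arr, m):
    """Count unordered pairs (i < j) with set_bits(arr[i]) + set_bits(arr[j]) == m."""
    by_bits = Counter(bin(a).count("1") for a in arr)
    total = 0
    for k, count in by_bits.items():
        other = m - k
        if other < k:
            continue  # this combination is handled when the smaller key is visited
        if other == k:
            total += count * (count - 1) // 2   # both elements from the same group
        else:
            total += count * by_bits.get(other, 0)
    return total
```

On the two samples from the question, `count_pairs([3, 0, 4, 5], 2)` gives 2 and `count_pairs([3, 0, 4, 5, 6, 8, 1, 8], 2)` gives 9.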
The task is to find the number of distinct pairs {x, y} that satisfy the equation 1/x + 1/y = 1/n, with n being the input given by the user. A different ordering of x and y does not count as a new pair.
For example, n = 2 means 1/n = 1/2, which can be formed by two pairs {x, y}: {3, 6} and {4, 4}.
Likewise, n = 3 means 1/n = 1/3, which can be formed by two pairs {x, y}: {4, 12} and {6, 6}.
The equation 1/x + 1/y = 1/n can be rearranged to y = nx/(x-n); whenever x and y in the rearranged equation are both whole numbers, they count as a pair {x, y}. Using this formula, I iterate n times, starting from x = n + 1 and increasing x by 1 per iteration, and check whether nx % (x - n) == 0; if so, x and y form a new distinct pair.
I found the limit of n iterations by computing the answers manually and spotting the pattern. x starts at n + 1 because otherwise a division by zero happens or y becomes negative. The modulo check tests that the resulting y is whole.
Questions:
Is there a mathematical explanation for why the iteration is limited to n times? I found that limit only by manual computation and by spotting the pattern.
Is there another way to find the number of distinct pairs {x, y} other than my method above, which finds the values of the pairs themselves and then counts them? Is there a quick mathematical formula I'm not aware of?
For reference, my code can be seen here: https://gist.github.com/TakeNoteIAmHere/596eaa2ccf5815fe9bbc20172dce7a63
Assuming that x,y,n > 0 we have
Observation 1: both, x and y must be greater than n
Observation 2: since (x,y) and (y,x) do not count as distinct, we can assume that x <= y.
Observation 3: x = y = 2n is always a solution and if x > 2n then y < x (thus no new solution)
This means the possible values for x are from n+1 up to 2n.
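The three observations translate directly into a scan over x = n+1 .. 2n. A small Python sketch (the function name is made up for illustration):

```python
def pairs_by_scanning(n):
    """List all pairs (x, y) with x <= y and 1/x + 1/y == 1/n."""
    pairs = []
    for x in range(n + 1, 2 * n + 1):
        # y = n*x / (x - n) must be a whole number
        if (n * x) % (x - n) == 0:
            pairs.append((x, n * x // (x - n)))
    return pairs
```

For n = 2 this yields [(3, 6), (4, 4)] and for n = 3 it yields [(4, 12), (6, 6)], matching the examples in the question.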
A little algebra converts the equation
1/x + 1/y = 1/n
into
(x-n)*(y-n) = n*n
Since we want a solution in integers, we seek integers f, g so that
f*g = n*n
and then the solution for x and y is
x = f+n, y = g+n
I think the easiest way to proceed is to factorise n, ie write
n = (P[1]^k[1]) * .. *(P[m]^k[m])
where the Ps are distinct primes, the ks positive integers and ^ denotes exponentiation.
Then the possibilities for f and g are
f = (P[1]^a[1]) * .. * (P[m]^a[m])
g = (P[1]^b[1]) * .. * (P[m]^b[m])
where the as and bs satisfy, for each i=1..m
0<=a[i]<=2*k[i]
b[i] = 2*k[i] - a[i]
If we just wanted to count the number of solutions, we would just need to count the number of fs, ie the number of distinct sequences a[]. But this is just
Nall = (2*k[1]+1) * .. * (2*k[m]+1)
However we want to count the solution (f,g) and (g,f) as being the same. There is only one case where f = g (because the factorisation into primes is unique, we can only have f=g if the a[] equal the b[]) and so the number we seek is
1 + (Nall-1)/2
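As a rough Python sketch of this formula, assuming n is small enough for trial division (the function name is hypothetical):

```python
def count_unit_fraction_pairs(n):
    """Number of pairs {x, y} with 1/x + 1/y == 1/n, via the factorisation of n."""
    n_all = 1          # number of divisors of n*n
    remaining = n
    p = 2
    while p * p <= remaining:
        k = 0
        while remaining % p == 0:
            remaining //= p
            k += 1
        n_all *= 2 * k + 1   # exponent a[i] can be 0 .. 2*k
        p += 1
    if remaining > 1:
        n_all *= 3           # one leftover prime with k = 1
    return 1 + (n_all - 1) // 2
```

For n = 2 and n = 3 this returns 2, in line with the examples in the question.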
I am trying to solve the problem below for the last two days. I can't think of any solution for it other than brute force. Any kind of hints or references will be appreciated. TIA.
"Given N distinct prime integers i.e p1, p2,..., pN and an interval [L,R]. Calculate the number of integers in this interval that are divisible by at least one of the given primes."
N is very small (1<=N<=10) and L,R are very big (1<=L<=R<=10^10)
First note that it's easier to restrict the problem and ignore the lower bound (i.e. treat L=1). If we can count the numbers <= X divisible by at least one of the primes for any X, we can also count them on an interval, by subtracting the count of numbers <= L-1 from the count of numbers <= R.
Given any number x, the count of numbers <= R divisible by x is floor(R / x).
Now, we can apply the inclusion-exclusion principle to get the result. First, I'll show the results by hand for 3 primes p1, p2 and p3, and then give the general result.
The count of numbers <= R divisible by p1, p2 or p3 is:
R / p1 + R / p2 + R / p3
- R / (p1p2) - R / (p1p3) - R / (p2p3)
+ R / (p1p2p3)
(Here / is assumed to be rounding-down integer division).
The general case is as follows:
sum((-1)^(|S|+1) * R / prod(S) for S a non-empty subset of {p1, p2, .., pN}).
Here S ranges over all subsets of your primes, prod(S) is the product of the primes in the subset, and the initial term varies between -1 and +1 depending on the size of the subset.
For your problem, N<=10, so there are at most 1023 non-empty subsets, which is a small number of things to iterate over.
Here's some example Python code:
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def prod(ns):
    r = 1
    for n in ns:
        r *= n
    return r

def divs(primes, N):
    r = 0
    for S in powerset(primes):
        if not S: continue
        sign = 1 if len(S) % 2 else -1
        r += sign * (N // prod(S))
    return r

def divs_in_range(primes, L, R):
    return divs(primes, R) - divs(primes, L-1)
Note, that the running time of this code is more-or-less only dependent on the number of primes, and not so much on the magnitudes of L and R.
Assuming n is the interval size and N is const.
For each prime p, there should be roughly (R-L) / p numbers in the interval divisible by the prime.
Finding the first number divisible by p in the interval: L' = L + (p - L % p) % p. (The outer % p keeps L' = L when L itself is divisible by p.)
Now if L' > R, there is none; otherwise there are 1 + floor((R-L') / p).
Example: 3, [10, 20]:
L' = 10 + (3 - 10 % 3) % 3 = 12.
Numbers divisible by 3 in the interval: 1 + floor((20 - 12) / 3) = 3
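The same counting step as a small Python sketch (the name `count_multiples` is illustrative). It reduces the offset modulo p so that a lower bound which is itself a multiple of p is not skipped:

```python
def count_multiples(p, lo, hi):
    """How many integers in [lo, hi] are divisible by p."""
    first = lo + (p - lo % p) % p   # first multiple of p that is >= lo
    if first > hi:
        return 0
    return 1 + (hi - first) // p
```

With the example above, `count_multiples(3, 10, 20)` counts 12, 15 and 18 and returns 3.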
Note: So far we haven't used the fact that p1..pN are primes.
Remaining problem seems to be: How to avoid counting a number divisible by multiple primes multiple times? Example: Assuming we have 3,5 and [10, 20], we need to avoid counting 15 twice...
Maybe we can just count divisibility by (p1*p2) etc. using the counting algorithm above, and reduce the total accordingly? If N is const, this should still be const time. Because p1...pN are prime, all their products must be different (as no number can have more than one prime factorization).
I am solving a competitive programming problem, it was described like this:
Given n < 10^5 integers a1, a2, a3, ..., an and two numbers L, R, how many
subarrays are there such that the sum of their elements is in the range [L, R]?
Example:
Input:
n = 4, L = 2, R = 4
1 2 3 4
Output: 4
(4 = 4, 3 = 3, 1 + 2 = 3, 2 = 2)
One solution I have is bruteforce, but O(n^2) is too slow. What data structures / algorithms should I use to solve this problem efficiently ?
Compute prefix sums(p[0] = 0, p[1] = a1, p[2] = a1 + a2, ..., p[n] = sum of all numbers).
For a fixed prefix sum p[i], you need to find the number of such prefix sums p[j] that j is less than i and p[i] - R <= p[j] <= p[i] - L. One can do it in O(log n) with treap or another balanced binary search tree.
Pseudo code:
treap.add(0)
sum = 0
ans = 0
for i from 1 to n:
    sum += a[i]
    # left: keys < sum - R, right: keys >= sum - R
    left, right = treap.split(sum - R)
    # middle: keys <= sum - L, right: keys > sum - L
    middle, right = right.split(sum - L)
    ans += middle.size()
    treap = merge(left, middle, right)
    treap.add(sum)
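Python has no built-in treap, so here is a sketch of the same idea that swaps in a different structure: a Fenwick (binary indexed) tree over the sorted distinct prefix sums. The function name and the choice of a Fenwick tree are assumptions of this sketch, not part of the answer above; it assumes L <= R:

```python
from bisect import bisect_left, bisect_right


def count_subarrays_in_range(a, lo, hi):
    """Count subarrays of a whose sum lies in [lo, hi]; O(n log n)."""
    prefix = [0]
    for v in a:
        prefix.append(prefix[-1] + v)

    values = sorted(set(prefix))      # coordinate compression
    tree = [0] * (len(values) + 1)    # Fenwick tree of insertion counts

    def add(pos):                     # pos: 0-based index into values
        pos += 1
        while pos <= len(values):
            tree[pos] += 1
            pos += pos & -pos

    def count_first(k):               # inserted sums among values[0:k]
        total = 0
        while k > 0:
            total += tree[k]
            k -= k & -k
        return total

    ans = 0
    for p in prefix:
        # earlier prefix sums q with p - hi <= q <= p - lo
        left = bisect_left(values, p - hi)
        right = bisect_right(values, p - lo)
        ans += count_first(right) - count_first(left)
        add(bisect_left(values, p))
    return ans
```

For the example in the question, `count_subarrays_in_range([1, 2, 3, 4], 2, 4)` returns 4. Negative elements are handled as well, since nothing here relies on the prefix sums being increasing.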
We can do it in linear time if the array contains positive numbers only.
First build an array of prefix sums from left to right.
1. Keep three pointers X, Y and Z, all initialized to 0. X is the right end of the subarray; Y and Z are candidate left ends.
2. At every step increase X by 1.
3. While the sum of the numbers from Y to X is greater than R, keep increasing Y.
4. While the sum of the numbers from Z to X is greater than or equal to L, keep increasing Z.
5. Every left end from Y up to (but not including) Z is now valid, so if Z > Y, add Z - Y to the result.
6. If X is less than the length of the array, go to step 2.
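A Python sketch of this pointer walk, assuming all elements are strictly positive and 0 <= L <= R. It keeps two running window sums instead of a prefix-sum array, and treats Z as the first left end whose sum has dropped below L, so each step adds Z - Y:

```python
def count_subarrays_positive(a, lo, hi):
    """Count subarrays with sum in [lo, hi]; requires positive elements."""
    ans = 0
    y = z = 0            # candidate left ends
    sum_y = sum_z = 0    # sums of a[y..x] and a[z..x]
    for x, v in enumerate(a):
        sum_y += v
        sum_z += v
        # y: first left end whose sum is not greater than hi
        while y <= x and sum_y > hi:
            sum_y -= a[y]
            y += 1
        # z: first left end whose sum is smaller than lo
        while z <= x and sum_z >= lo:
            sum_z -= a[z]
            z += 1
        ans += max(0, z - y)   # left ends y .. z-1 are valid for this x
    return ans
```

Each pointer only moves forward, so the total work is linear in the length of the array.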
My input is three numbers: a number s and the beginning b and end e of a range, with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenshtein distance between s and all numbers in the range [b, e]. It is not necessary to find the number minimizing the distance; the minimal distance is sufficient.
Obviously I have to read the numbers as strings, because no standard C++ type can hold such large numbers. Calculating the Levenshtein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which each b and e have the same number of digits. Suppose e has k >= 0 more digits than b: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
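The splitting step could be sketched in Python as follows, assuming b and e are given as decimal strings without leading zeros and b <= e (the function name is made up for illustration):

```python
def split_by_length(b, e):
    """Split [b, e] into ranges whose endpoints have the same number of digits."""
    parts = []
    for length in range(len(b), len(e) + 1):
        # smallest and largest number with this many digits, clipped to [b, e]
        lo = b if length == len(b) else "1" + "0" * (length - 1)
        hi = e if length == len(e) else "9" * length
        parts.append((lo, hi))
    return parts
```

`split_by_length("5", "14032")` reproduces the five subproblems listed above.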
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m (so its numbers have m+1 digits) and the largest "easy" subproblem has b = 10^u (u+1 digits), then the minimum Levenshtein distance between s and any number in any of these ranges is (m+1)-n if n < m+1, n-(u+1) if n > u+1, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
1. Set x to the first q digits of b.
2. Append a randomly-chosen digit d between b[i] and e[i] to x.
3. If d == b[i], we "hug" the lower bound:
    For i from q+1 to m:
        If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger than b[i], and there is no digit larger than 9!]
        Otherwise, flip a coin:
            Heads: Append b[i].
            Tails: Append a randomly-chosen digit d > b[i], then goto 6.
    Stop.
4. Else if d == e[i], we "hug" the upper bound:
    For i from q+1 to m:
        If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller than e[i], and there is no digit smaller than 0!]
        Otherwise, flip a coin:
            Heads: Append e[i].
            Tails: Append a randomly-chosen digit d < e[i], then goto 6.
    Stop.
5. Otherwise (if d is strictly between b[i] and e[i]), drop through to step 6.
6. Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
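To make the procedure concrete, here is a Python sketch of it, assuming b and e are equal-length digit strings with b <= e. It uses 0-based indices, so q is the index of the first digit where b and e differ, and the hugging loop covers the digits after that one. The function name and the `rng` parameter are assumptions of this sketch:

```python
import random


def random_in_range(b, e, rng=random):
    """Generate a random digit string x with b <= x <= e by 'hugging' a bound."""
    m = len(b)
    q = 0
    while q < m and b[q] == e[q]:
        q += 1
    x = b[:q]                      # digits every x in the range must share
    if q == m:
        return x                   # b == e, nothing to choose
    d = rng.randint(int(b[q]), int(e[q]))
    x += str(d)
    hug_lower = d == int(b[q])
    hug_upper = d == int(e[q])
    if hug_lower or hug_upper:
        bound = b if hug_lower else e
        for i in range(q + 1, m):
            c = int(bound[i])
            # leaving the bound needs a digit above it (lower) or below it (upper)
            can_leave = c < 9 if hug_lower else c > 0
            if can_leave and rng.random() < 0.5:
                x += str(rng.randint(c + 1, 9) if hug_lower else rng.randint(0, c - 1))
                break              # stopped hugging: remaining digits are free
            x += bound[i]
    while len(x) < m:              # free digits
        x += str(rng.randint(0, 9))
    return x
```

Sampling repeatedly with b = "120" and e = "135" produces only strings from "120" to "135", and every one of them has non-zero probability.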
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 0 if the character a is in the range b..c (a matching digit can then be chosen at no cost), and 1 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j)+abs(n-i-m+j)
           else infinity
shc(i, j) = if j == q then
              min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
            else
              min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
                  he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], 0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levenshtein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit distance of s = 12007 and n = 12296, you already have most of the table for n = 12297; you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300 and you need to update the last 3 columns of the previous table, but this happens just once every 100 iterations.
In general, you have to
update the last column on every iteration (so, length(s) cells)
update the second-to-last too, once every 10 iterations
update the third-to-last, too, once every 100 iterations
So let L = length(s) and D = e-b. First you compute the edit distance between s and b. Then you can find the minimum Levenshtein distance over [b,e] by looping over every integer in the interval. There are D of them, so the execution time is about:
L * D * (1 + 1/10 + 1/100 + ...)
Now since
1 + 1/10 + 1/100 + ... = 10/9
we have an algorithm which is
O(L * D)
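A Python sketch of this incremental update, assuming b and e are ordinary integers with e - b small enough to loop over. It stores one DP column per digit of the current number and recomputes only the columns after the digits shared with the previous number; the function name is made up for illustration:

```python
def min_distance_over_range(s, b, e):
    """Minimum Levenshtein distance between s and str(x) for b <= x <= e."""

    def next_column(prev, ch):
        # prev[i] = distance between s[:i] and the current prefix of the number
        col = [prev[0] + 1]
        for i in range(1, len(s) + 1):
            col.append(min(prev[i] + 1,                     # extra digit in the number
                           col[i - 1] + 1,                  # extra digit in s
                           prev[i - 1] + (s[i - 1] != ch))) # match or substitute
        return col

    columns = [list(range(len(s) + 1))]   # column for the empty prefix
    digits = ""
    best = None
    for x in range(b, e + 1):
        new_digits = str(x)
        keep = 0                          # length of the shared prefix
        while (keep < min(len(digits), len(new_digits))
               and digits[keep] == new_digits[keep]):
            keep += 1
        del columns[keep + 1:]            # columns of changed digits are stale
        for ch in new_digits[keep:]:
            columns.append(next_column(columns[-1], ch))
        digits = new_digits
        dist = columns[-1][-1]
        best = dist if best is None else min(best, dist)
    return best
```

With the tables above, `min_distance_over_range("12007", 12296, 12296)` returns 3 and extending the range to 12297 lowers the result to 2.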