Count number of subsequences with given k modulo sum - algorithm

Given an array a of n integers, count how many subsequences (not necessarily contiguous) have sum % k = 0:
1 <= k < 100
1 <= n <= 10^6
1 <= a[i] <= 1000
An O(n^2) solution is easily possible, but a faster approach, O(n log n) or O(n), is needed.

This is a counting variant of the subset sum problem.
A simple solution is this:
s = 0
dp[x] = how many subsequences we can build with sum x
dp[0] = 1, 0 elsewhere
for i = 1 to n:
    s += a[i]
    for j = s down to a[i]:
        dp[j] = dp[j] + dp[j - a[i]]
Then you can simply return the sum of all dp[x] such that x % k == 0. This has a high complexity though: about O(n*S), where S is the sum of all of your elements (up to n * max(a[i]) = 10^9 under these constraints). The dp array must also have size S, which you probably can't even afford to declare for your constraints.
A better solution is to not iterate over sums larger than or equal to k in the first place. To do this, we will use 2 dp arrays:
dp1, dp2 = arrays of size k
dp1[0] = dp2[0] = 1, 0 elsewhere
for i = 1 to n:
    mod_elem = a[i] % k
    for j = 0 to k - 1:
        dp2[j] = dp2[j] + dp1[(j - mod_elem + k) % k]
    copy dp2 into dp1
return dp1[0]
This has complexity O(n*k), which is fast enough for these constraints. Note that dp1[0] also counts the empty subsequence; subtract 1 if only non-empty subsequences are wanted.
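As a sketch, here is the same modular DP in Python (the function name is mine; like dp1[0] above, the count includes the empty subsequence):

def count_mod_k_subsequences(a, k):
    """O(n*k) DP: dp[r] = number of subsequences seen so far with sum congruent to r mod k."""
    dp = [0] * k
    dp[0] = 1                                    # the empty subsequence
    for x in a:
        m = x % k
        # either skip x, or append x to a subsequence whose sum is (r - m) mod k
        dp = [dp[r] + dp[(r - m) % k] for r in range(k)]
    return dp[0]

print(count_mod_k_subsequences([1, 4, 2, 3, 5, 6], 5))  # 14 (13 if the empty subsequence is excluded)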

There's an O(n + k^2 lg n)-time algorithm. Compute a histogram c(0), c(1), ..., c(k-1) of the input array mod k (i.e., there are c(r) elements that are r mod k). Then compute
\prod_{r=0}^{k-1} (1 + x^r)^{c(r)} \bmod (1 - x^k)
as follows, where the constant term of the reduced polynomial is the answer.
Rather than evaluate each factor with a fast exponentiation method and then multiply, we turn things inside out. If all c(r) are zero, then the answer is 1. Otherwise, recursively evaluate
P = \prod_{r=0}^{k-1} (1 + x^r)^{\lfloor c(r)/2 \rfloor} \bmod (1 - x^k)
and then compute
Q = \prod_{r=0}^{k-1} (1 + x^r)^{c(r) - 2\lfloor c(r)/2 \rfloor} \bmod (1 - x^k),
in time O(k^2) for the latter computation by exploiting the sparsity of the factors. The result is P^2 Q mod (1 - x^k), computed in time O(k^2) via naive convolution.
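A Python sketch of this recursive scheme (all names are mine; as above, the constant term counts the empty subsequence too):

def count_divisible_subsequences(a, k):
    """O(n + k^2 log n): constant term of prod_r (1 + x^r)^c(r) mod (x^k - 1)."""
    c = [0] * k
    for x in a:                          # O(n) histogram of residues
        c[x % k] += 1

    def mul(p, q):
        # naive circular convolution mod (x^k - 1), O(k^2)
        res = [0] * k
        for i, pi in enumerate(p):
            if pi:
                for j, qj in enumerate(q):
                    res[(i + j) % k] += pi * qj
        return res

    def power_product(e):
        # returns prod_r (1 + x^r)^e[r] mod (x^k - 1)
        if all(v == 0 for v in e):
            return [1] + [0] * (k - 1)
        half = power_product([v // 2 for v in e])
        res = mul(half, half)                        # P^2
        for r, v in enumerate(e):
            if v % 2:                                # sparse factor (1 + x^r), O(k) each
                res = [res[j] + res[(j - r) % k] for j in range(k)]
        return res

    return power_product(c)[0]

print(count_divisible_subsequences([1, 4, 2, 3, 5, 6], 5))  # 14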

Traverse a and count the residues a[i] mod k; this gives k counts, one per residue class.
Recurse and memoize over the partitions of k, 2*k, 3*k, ... etc. with parts less than or equal to k, adding the products of the appropriate counts. A part may repeat, but a part used m times contributes C(count, m) rather than count^m, since it means choosing m distinct elements from that residue class.
For example, if k were 10, some of the partitions would be 1+2+7 and 1+2+3+4; but while memoizing, we would only need to calculate once how many pairs mod k in the array produce (1 + 2).
For example, k = 5, a = {1,4,2,3,5,6}:
counts of a[i] mod k, for residues 0..4: {1,2,1,1,1}
contributions of the partitions of k (a part r stands for residue r, with part 5 standing for residue 0):
5 => 1
4,1 => 2
3,2 => 1
3,1,1 => 1 (C(2,2) ways to pick two residue-1 elements, times the single residue-3 element)
contributions of the partitions of 2 * k with parts <= k:
5,4,1 => 2
5,3,2 => 1
5,3,1,1 => 1
4,1,3,2 => 2
contributions of the partitions of 3 * k with parts <= k:
5,4,1,3,2 => 2
answer = 13
{1,4} {4,6} {2,3} {5} {1,6,3}
{1,4,2,3} {1,4,5} {4,6,2,3} {4,6,5} {2,3,5} {1,6,3,5}
{1,4,2,3,5} {4,6,2,3,5}
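A quick brute-force check of this example, for verification only (exponential in the array length):

from itertools import combinations

a, k = [1, 4, 2, 3, 5, 6], 5
print(sum(1 for r in range(1, len(a) + 1)
            for combo in combinations(a, r)
            if sum(combo) % k == 0))             # 13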

Related

Count number of subsequences of A such that every element of the subsequence is divisible by its index (starts from 1)

B is a subsequence of A if and only if we can turn A into B by removing zero or more elements.
A = [1,2,3,4]
B = [1,4] is a subsequence of A. (Just remove 2 and 3.)
B = [4,1] is not a subsequence of A.
Count all subsequences B of A that satisfy this condition: B[i] % i = 0 for every i.
Note that i starts from 1 not 0.
Example :
Input :
5
2 2 1 22 14
Output:
13
All of these 13 subsequences satisfy B[i]%i = 0 condition.
{2},{2,2},{2,22},{2,14},{2},{2,22},{2,14},{1},{1,22},{1,14},{22},{22,14},{14}
My attempt :
The only solution that I could come up with has O(n^2) complexity.
Assuming the maximum element in A is C, the following is an algorithm with time complexity O(n * sqrt(C)):
For every element x in A, find all divisors of x.
For every i from 1 to n, find every j such that A[j] is a multiple of i, using the result of step 1.
For every i from 1 to n and every j such that A[j] is a multiple of i (using the result of step 2), find the number of valid subsequences B that have i elements and end with A[j] (dynamic programming).
def find_factors(x):
    """Yields all factors of x"""
    for i in range(1, int(x ** 0.5) + 1):
        if x % i == 0:
            yield i
            if i != x // i:
                yield x // i

def solve(a):
    """Returns the answer for a"""
    n = len(a)
    # b[i] contains every j such that a[j] is a multiple of i+1.
    b = [[] for i in range(n)]
    for i, x in enumerate(a):
        for factor in find_factors(x):
            if factor <= n:
                b[factor - 1].append(i)
    # There are dp[i][j] subsequences of a of length (i+1) ending at b[i][j].
    dp = [[] for i in range(n)]
    dp[0] = [1] * n
    for i in range(1, n):
        k = x = 0
        for j in b[i]:
            while k < len(b[i - 1]) and b[i - 1][k] < j:
                x += dp[i - 1][k]
                k += 1
            dp[i].append(x)
    return sum(sum(dpi) for dpi in dp)
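For the example input from the question, this prints the expected count:

print(solve([2, 2, 1, 22, 14]))  # 13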
For every divisor d of A[i] that is at most i+1, A[i] can serve as the d-th element of every valid subsequence of length d-1 counted so far (with dp[0] = 1 standing for the empty subsequence). The divisors are visited in decreasing order so that the updates for the current element do not feed into each other.
JavaScript code:
function getDivisors(n, max){
    // Returns the divisors of n that are <= max, in decreasing order.
    let m = 1;
    const left = [];
    const right = [];
    while (m*m <= n && m <= max){
        if (n % m == 0){
            left.push(m);
            const l = n / m;
            if (l != m && l <= max)
                right.push(l);
        }
        m += 1;
    }
    return right.concat(left.reverse());
}

function f(A){
    // dp[d] = number of valid subsequences of length d seen so far (dp[0] = 1 for the empty one).
    const dp = [1, ...new Array(A.length).fill(0)];
    let result = 0;
    for (let i=0; i<A.length; i++){
        for (const d of getDivisors(A[i], i+1)){
            result += dp[d-1];
            dp[d] += dp[d-1];
        }
    }
    return result;
}
var A = [2, 2, 1, 22, 14];
console.log(JSON.stringify(A));
console.log(f(A));
I believe that in the general case we cannot do better than O(n^2).
First, an intuitive explanation:
Let's denote the elements of the array by a_1, a_2, a_3, ..., a_n.
If the element a_1 appears in a subsequence, it must be element no. 1.
If the element a_2 appears in a subsequence, it can be element no. 1 or 2.
If the element a_3 appears in a subsequence, it can be element no. 1, 2 or 3.
...
If the element a_n appears in a subsequence, it can be element no. 1, 2, 3, ..., n.
So to take all the possibilities into account, we have to perform the following tests:
Check if a_1 is divisible by 1 (trivial, of course)
Check if a_2 is divisible by 1 or 2
Check if a_3 is divisible by 1, 2 or 3
...
Check if a_n is divisible by 1, 2, 3, ..., n
All in all we have to perform 1 + 2 + 3 + ... + n = n(n + 1) / 2 tests, which gives a complexity of O(n^2).
Note that the above is somewhat inaccurate, because not all the tests are strictly necessary. For example, if a_i is divisible by 2 and 3 then it must be divisible by 6. Nevertheless, I think this gives a good intuition.
Now for a more formal argument:
Define an array like so:
a_1 = 1
a_2 = 1 × 2
a_3 = 1 × 2 × 3
...
a_n = 1 × 2 × 3 × ... × n
By the definition, every subsequence is valid.
Now let (m, p) be such that m <= n and p <= m, and change a_m to a_m / p. We can now choose one of two paths:
If we restrict p to be prime, then each tuple (m, p) represents a mandatory test, because the corresponding change in the value of a_m changes the number of valid subsequences. But that requires the prime factorization of each number between 1 and n. With the known methods, I don't think we can get below O(n^2) here.
If we omit the above restriction, then we clearly perform n(n + 1) / 2 tests, which gives a complexity of O(n^2).

Remove all the multiples of a given set of numbers from given range

I am stuck on a problem where it says: given a number N and a set of numbers S = {s1, s2, ..., sn} where s1 < s2 < ... < sn < N, remove all the multiples of {s1, s2, ..., sn} from the range 1..N.
Example:
Let N = 10
S = {2,4,5}
Output: {1, 3, 7, 9}
Explanation: multiples of 2 within range: 2, 4, 6, 8, 10
multiples of 4 within range: 4, 8
multiples of 5 within range: 5, 10
I would like to have an algorithmic approach, pseudocode rather than a complete solution.
What I have tried:
(Considering the same example as above)
1. For the given N, find all the primes up to N.
Therefore, for 10, the primes are: 2, 3, 5, 7
In the given set, S = {2,4,5}, the primes missing from
{2,3,5,7} are {3,7}.
2. First, check the primes that are present: {2,5}
Hence, all the multiples of them will be removed
{2,4,5,6,8,10}
3. Check for non-prime numbers in S = {4}
4. Check, if any divisor of these numbers has already been
previously processed.
> In this case, 2 is already processed.
> Hence no need to process 4, as all the multiples of 4
would have been implicitly checked by the previous
divisor.
If not,
> Remove all the multiples from this range.
5. Repeat for all the remaining non primes in the set.
Please suggest your thoughts!
It is possible to solve it in O(N log(n)) time and O(N) extra memory using something similar to the Sieve of Eratosthenes.
isMultiple[1..N] = false
for each s in S:
    t = s
    while t <= N:
        isMultiple[t] = true
        t += s
for i in 1..N:
    if not isMultiple[i]:
        print i
This uses O(N) memory to store the isMultiple array.
The time complexity is O(N log(n)). Indeed, the inner while loop will be performed N / s1 times for the first element in S, then N / s2 for the second, and so on.
We need to estimate the magnitude of N / s1 + N / s2 + ... + N / sn.
N / s1 + N / s2 + ... + N / sn
= N * (1/s1 + 1/s2 + ... + 1/sn) <= N * (1/1 + 1/2 + ... + 1/n).
The last inequality is due to the fact that s1 < s2 < ... < sn, thus the worst case is when they take the values {1, 2, ..., n}.
The harmonic sum 1/1 + 1/2 + ... + 1/n is in O(log(n)), thus the time complexity of the above algorithm is O(N log(n)).
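A direct Python translation of the sieve-style pseudocode above, as a sketch (the function name is mine):

def remaining(N, S):
    """Return the numbers in 1..N that are not a multiple of any element of S."""
    is_multiple = [False] * (N + 1)
    for s in S:
        for t in range(s, N + 1, s):             # mark s, 2s, 3s, ... up to N
            is_multiple[t] = True
    return [i for i in range(1, N + 1) if not is_multiple[i]]

print(remaining(10, [2, 4, 5]))  # [1, 3, 7, 9]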
basic solution:
let set X be our output set.
for each number, n, between 1 and N:
    for each number, s, in set S:
        if s divides n:
            stop searching S, and move on to the next number, n.
        else if s is the last element in S:
            add n to the set X.
you can obviously remove multiples in S before running this algorithm, but I don't think prime numbers are the way to go
Since S is sorted, we can guarantee O(N) complexity by skipping elements in S already marked (http://codepad.org/Joflhb7x):
N = 10
S = [2,4,5]
marked = set()
i = 0
curr = 1
while curr <= N:
    while curr < S[i]:
        print curr
        curr = curr + 1
    if not S[i] in marked:
        mult = S[i]
        while mult <= N:
            marked.add(mult)
            mult = mult + S[i]
    i = i + 1
    curr = curr + 1
    if i == len(S):
        while curr <= N:
            if curr not in marked:
                print curr
            curr = curr + 1
print list(marked)

Given 2 arrays of non-negative numbers, find the minimum sum of products

Given two arrays A and B, each containing n non-negative numbers, remove a>0 elements from the end of A and b>0 elements from the end of B. Evaluate the cost of such an operation as X*Y where X is the sum of the a elements removed from A and Y the sum of the b elements removed from B. Keep doing this until both arrays are empty. The goal is to minimize the total cost.
Using dynamic programming and the fact that an optimal strategy will always take exactly one element from either A or B I can find an O(n^3) solution. Now I'm curious to know if there is an even faster solution to this problem?
EDIT: Stealing an example from #recursive in the comments:
A = [1,9,1] and B = [1, 9, 1]. Possible to do with a cost of 20: (1) * (1 + 9) + (9 + 1) * (1).
Here's O(n^2). Let CostA(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from B. Let CostB(i, j) be the min cost of eliminating A[1..i], B[1..j] in such a way that the first removal takes only one element from A. We have mutually recursive recurrences
CostA(i, j) = A[i] * B[j] + min(CostA(i - 1, j),
                                CostA(i - 1, j - 1),
                                CostB(i - 1, j - 1))

CostB(i, j) = A[i] * B[j] + min(CostB(i, j - 1),
                                CostA(i - 1, j - 1),
                                CostB(i - 1, j - 1))
with base cases
CostA(0, 0) = 0
CostA(>0, 0) = infinity
CostA(0, >0) = infinity
CostB(0, 0) = 0
CostB(>0, 0) = infinity
CostB(0, >0) = infinity.
The answer is min(CostA(n, n), CostB(n, n)).
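A bottom-up Python sketch of this recurrence, using 0-based arrays (names are mine):

def min_total_cost(A, B):
    n = len(A)
    INF = float('inf')
    # costA[i][j], costB[i][j] correspond to CostA(i, j), CostB(i, j) above.
    costA = [[INF] * (n + 1) for _ in range(n + 1)]
    costB = [[INF] * (n + 1) for _ in range(n + 1)]
    costA[0][0] = costB[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            p = A[i - 1] * B[j - 1]
            costA[i][j] = p + min(costA[i - 1][j], costA[i - 1][j - 1], costB[i - 1][j - 1])
            costB[i][j] = p + min(costB[i][j - 1], costA[i - 1][j - 1], costB[i - 1][j - 1])
    return min(costA[n][n], costB[n][n])

print(min_total_cost([1, 9, 1], [1, 9, 1]))  # 20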

From a loop index k, obtain pairs i,j with i < j?

I need to traverse all pairs i,j with 0 <= i < n, 0 <= j < n and i < j for some positive integer n.
The problem is that I can only loop over a single variable, say k. I can control the bounds of k. So the task is to determine two arithmetic functions, f(k) and g(k), such that i=f(k) and j=g(k) traverse all admissible pairs as k traverses its consecutive values.
How can I do this in a simple way?
I think I got it (in Python):
def get_ij(n, k):
    j = k // (n - 1)  # // is integer (truncating) division
    i = k - j * (n - 1)
    if i >= j:
        i = (n - 2) - i
        j = (n - 1) - j
    return i, j

for n in range(2, 6):
    print n, sorted(get_ij(n, k) for k in range(n * (n - 1) / 2))
It basically folds the matrix so that it's (almost) rectangular. By "almost" I mean that there could be some unused entries on the far right of the bottom row.
(Pictures in the original answer illustrate how the folding works for n=4 and n=5.)
Now, iterating over the rectangle is easy, as is mapping from folded coordinates back to coordinates in the original triangular matrix.
Pros: uses simple integer math.
Cons: returns the tuples in a weird order.
I think I found another way, that gives the pairs in lexicographic order. Note that here i > j instead of i < j.
Basically the algorithm consists of the two expressions:
i = floor((1 + sqrt(1 + 8*k))/2)
j = k - i*(i - 1)/2
that give i,j as functions of k. Here k is a zero-based index.
Pros: Gives the pairs in lexicographic order.
Cons: Relies on floating-point arithmetic.
Rationale:
We want to achieve the mapping in the following table:
k -> (i,j)
0 -> (1,0)
1 -> (2,0)
2 -> (2,1)
3 -> (3,0)
4 -> (3,1)
5 -> (3,2)
....
We start by considering the inverse mapping (i,j) -> k. It isn't hard to realize that:
k = i*(i-1)/2 + j
Since j < i, it follows that the value of k corresponding to all pairs (i,j) with fixed i satisfies:
i*(i-1)/2 <= k < i*(i+1)/2
Therefore, given k, i=f(k) returns the largest integer i such that i*(i-1)/2 <= k. After some algebra:
i = f(k) = floor((1 + sqrt(1 + 8*k))/2)
After we have found the value i, j is trivially given by
j = k - i*(i-1)/2
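A small sketch to sanity-check these formulas (math.isqrt is used instead of floating-point sqrt so the floor is exact):

from math import isqrt

def pair_from_index(k):
    i = (1 + isqrt(1 + 8 * k)) // 2      # largest i with i*(i-1)/2 <= k
    j = k - i * (i - 1) // 2
    return i, j

n = 5
print([pair_from_index(k) for k in range(n * (n - 1) // 2)])
# [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (4, 3)]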
I'm not sure I understand the question exactly, but to sum up, if 0 <= i < n and 0 <= j < n, then you want to traverse 0 <= k < n*n:
for (int k = 0; k < n*n; k++) {
    int i = k / n;
    int j = k % n;
    // ...
}
[edit] I just saw that i < j; so this solution is not optimal, since there are fewer than n*n necessary iterations ...
If we think of our solution in terms of a number triangle, where k is the sequence
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
...
Then j would be our (non zero-based) row number, that is, the greatest integer such that
j * (j - 1) / 2 < k
Solving for j:
j = ceiling ((sqrt (1 + 8 * k) - 1) / 2)
And i would be k's (zero-based) position in the row
i = k - j * (j - 1) / 2 - 1
The bounds for k are:
1 <= k <= n * (n - 1) / 2
Is it important that you actually have two arithmetic functions f(k) and g(k) doing this? Because you could first create a list such as
L = []
for i in range(n-1):
    for j in range(n):
        if j>i:
            L.append((i,j))
This will give you all the pairs you asked for. Your variable k can now just run along the index of the list. For example, if we take n=5,
for x in L:
    print(x)
gives us
(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
Suppose you have 2 <= k < 5, for example. Then
for k in range(2, 5):
    print(L[k])
yields
(0,3), (0,4), (1,2)

Represent a number as sum of primes

I am given a large number n and I need to find whether it can be represented as sum of K prime numbers.
For example, 9 can be represented as a sum of 3 primes: 2 + 2 + 5.
I am trying to use a variation of subset sum, but the number is too large to generate all primes up to it.
The problem is from the current HackerRank contest. The restrictions are 1 <= n, K <= 10^12
For K = 1, the answer is obviously "Yes" iff N is prime.
For K = 2, according to the Goldbach conjecture, which is verified for N up to around 10^18, the answer is "Yes" iff N is even and N >= 4, or if N - 2 is prime.
The interesting case is K = 3. Obviously if N < 6, the answer is "No" because the smallest number expressible as the sum of three primes is 2 + 2 + 2 = 6.
If N >= 6, then either N - 2 or N - 3 is even and >= 4, so we can apply Goldbach's conjecture again.
So for K = 3, the answer is "Yes" simply iff N >= 6.
Via induction (hint: just use the prime 2 K - 3 times), we can show that for K >= 3, the answer is "Yes" iff N >= 2*K, so only the cases K = 1 and K = 2 are non-trivial and require just a simple primality check, e.g. via Miller–Rabin in O(log^4 N).
EDIT: As a bonus, this proof also gives a constructive algorithm to output the partition. We use a number of 2's and maybe one 3 to get to K = 2. The tricky K = 2, N even case is not as hard as it looks: We know from computational verification of the Goldbach conjecture that for N >= 12, there is a Goldbach partition with a prime < 5200 or so. There are less than 700 such primes, so we can check them all in a reasonable amount of time.
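A Python sketch of the resulting decision procedure; is_prime here is a standard deterministic Miller–Rabin, valid far beyond the 10^12 constraint, and the constructive part is omitted:

def is_prime(n):
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small:                        # deterministic witness set for n < 3.3e24
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def sum_of_k_primes(n, k):
    if k == 1:
        return is_prime(n)
    if k == 2:
        # even n >= 4: Goldbach; odd n: need n - 2 prime
        return (n % 2 == 0 and n >= 4) or is_prime(n - 2)
    return n >= 2 * k                      # K >= 3: possible iff n >= 2K

print(sum_of_k_primes(9, 3))  # True (2 + 2 + 5)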
The concept you are looking for is called the prime partitions of a number. The formula to compute the number of prime partitions of a number is \kappa(n) = \frac{1}{n}\left(\mathrm{sopf}(n) + \sum_{j=1}^{n-1} \mathrm{sopf}(j) \cdot \kappa(n-j)\right); I gave that in LaTeX notation because I don't know how to do it in html. The sopf(n) function is the sum of the distinct prime factors of n, so sopf(42) = 12, since 42 = 2 * 3 * 7, but sopf(12) = 5, since 12 = 2 * 2 * 3 but each prime factor is counted once.
I discuss this formula at my blog.
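For small n the recurrence can be evaluated directly; a Python sketch (names are mine):

def prime_partitions(n):
    """kappa[m] = number of ways to write m as a sum of primes, order ignored."""
    sopf = [0] * (n + 1)                   # sum of distinct prime factors, via a sieve
    for p in range(2, n + 1):
        if sopf[p] == 0:                   # p is prime
            for m in range(p, n + 1, p):
                sopf[m] += p
    kappa = [0] * (n + 1)
    for m in range(1, n + 1):
        kappa[m] = (sopf[m] + sum(sopf[j] * kappa[m - j] for j in range(1, m))) // m
    return kappa

print(prime_partitions(12)[12])  # 7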
Your inputs are n and K. There are several cases:
K > n : impossible, since every prime is at least 2
K = n : likewise impossible for n > 0 (1 is not prime, so the sum of K primes is at least 2K = 2n > n)
K < n : 4 subcases :
a. n and K are odd
b. n is even, K is odd
c. n is odd, K is even
d. n and K are even
Case a: select any odd prime p < n (p > 2). The problem reduces to the same problem with input n-p and K-1 instead of n and K respectively, and we fall in case d
Case b: The problem reduces to the same problem with input n-2 and K-1 instead of n and K respectively, and we fall in case d
Case c: same as case b, but we fall in case a instead of d
Case d: if n = 2K, then 2, 2, ..., 2 taken K times is your solution (i.e. your primes are 2, 2, ..., 2). Otherwise n can be written
n = \left(\sum_{i=1}^{K-2} 2\right) + p + q
where we add the prime 2 (K-2) times in the sum. Then the problem reduces to the same problem with input n - 2(K-2) instead of n and 2 instead of K. But this is Goldbach. You can solve it in O(n sqrt(n)) like this: take p and q both equal to half of the reduced (even) n, then increment p and decrease q by 1 at each step until they are both prime.
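A sketch of the case d search for the reduced even number n - 2(K-2), reusing an is_prime test such as the Miller–Rabin sketch earlier:

def goldbach_pair(n_even):
    """Find primes p, q with p + q = n_even, for even n_even >= 4."""
    p = q = n_even // 2
    while not (is_prime(p) and is_prime(q)):
        p += 1
        q -= 1
    return p, q

print(goldbach_pair(20))  # (13, 7)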
