Subset of values with length>=N and sum>=S - algorithm

Given a list of values (e.g. 10, 15, 20, 30, 70), values N (e.g. 3) and S (e.g. 100), find a subset that satisfies:
1. length of subset >= N
2. sum of subset >= S
The sum of the subset should also be the least possible (the sum of the remaining values should be the greatest possible); e.g. the result subset should be (10,20,70), not (15,20,70), which also satisfies 1. and 2.
I was looking at some problems and solutions (Knapsack problem, Bin packing problem, ...) but didn't find them applicable. Similar problems on the internet were also not suitable for some reason (e.g. number of elements in subset was fixed).
Can someone point me in the right direction? Is there any other solution other than exhausting every possible combination?
Edit - a working algorithm I implemented in Ruby; I guess it can be further optimized:
def find_subset_with_sum_and_length_threshold(vals, min_nr, min_sum)
  sum_map = {}
  vals.sort.each do |v|
    sum_map.keys.sort.each do |k|
      addends = sum_map[k] + [v]
      if addends.length >= min_nr && k + v >= min_sum
        return addends
      else
        sum_map[k + v] = addends
      end
    end
    sum_map[v] = [v] if sum_map[v].nil?
  end
end

This is not very different from the 0-1 knapsack problem.
Zero-initialize a matrix M with S+U rows and N columns (U is the largest list value)
Zero-initialize a bit array A with S+U elements
For each value (v) in the list:
  For each j<S:
    If M[N-1,j] != 0 and M[N-1, j + v] == 0:
      M[N-1, j + v] = v
      A[j + v] = true
  For i=N-2 .. 0:
    For each j<S:
      If M[i,j] != 0 and M[i+1, j + v] == 0:
        M[i+1, j + v] = v
  M[0,v] = v
Find first nonzero element in M[N-1, S..S+U]
Reconstruct the other elements of the subset by subtracting the found value from its index and using the result as an index in the preceding column of the matrix (or in the last column, depending on the corresponding bit in A).
Time complexity is O(L*N*S), where L is the length of the list, N and S are the given limits.
Space complexity is O(N*(S+U)) for the matrix M.
A simpler variant keeps, for each sum, only the maximum achievable subset length (it still minimizes the sum, but not the number of elements):
Zero-initialize an integer array A with S+U elements
i=0
For each value (v) in the list:
  For each j<S:
    If A[j] != 0 and A[j + v] < A[j] + 1:
      A[j + v] = A[j] + 1
      V[i,j + v] = v
      P[i,j + v] = I[j]
      I[j + v] = i
  If A[v] == 0:
    A[v] = 1
    I[v] = i
  ++i
Find first element in A[S..S+U] with value not less than N
Reconstruct elements of the subset using matrices V and P.
Time complexity is O(L*S), where L is the length of the list and S is the given limit.
Space complexity is O(L*S).
Algorithm that also minimizes the subset size:
Zero-initialize a boolean matrix M with S+U rows and N columns (U is the largest list value)
Zero-initialize an integer array A with S+U elements
i=0
For each value (v) in the list:
  For each j<S:
    If A[j] != 0 and ((A[j + v] == 0) || (A[j + v] > A[j] + 1)):
      A[j + v] = A[j] + 1
      V[i,N-1,j + v] = v
      P[i,N-1,j + v] = (I[j,N-1],N-1)
      I[j+v,N-1] = i
  For k=N-2 .. 0:
    For each j<S:
      If M[k,j] and not M[k+1, j + v]:
        M[k+1, j + v] = true
        V[i,k+1,j + v] = v
        P[i,k+1,j + v] = (I[j,k],k)
        I[j+v,k+1] = i
  For each j<S:
    If M[N-1, j]:
      A[j] = N-1
  M[0,v] = true
  I[v,0] = i
  ++i
Find first nonzero element in A[S..S+U] (or the first element with the smallest value, or any other element that suits both minimization criteria)
Reconstruct elements of the subset using matrices V and P.
Time complexity is O(L*N*S), where L is the length of the list, N and S are the given limits.
Space complexity is O(L*N*S).

This is a slight variation of Subset sum problem. Look at section Pseudo-polynomial time dynamic programming solution. In addition to keeping track whether a particular sum is possible (i.e. storing true/false) from the given set, you would need to store the length too to satisfy:
length of subset >= N
if sum of subset = S then it is exactly similar to the above problem. For sum of subset >= S condition, I guess you can test this condition when building the array as mentioned in the Wiki page.
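For concreteness, here is a hedged sketch (my own code, not taken from either answer) of that "subset sum plus length" idea in Python: for every reachable sum keep the longest subset that produces it, then pick the smallest reachable sum >= S whose stored length is >= N. The helper name min_sum_subset is mine.
def min_sum_subset(vals, min_len, min_sum):
    best = {0: []}                            # best[s] = longest subset found with sum exactly s
    for v in vals:
        for s, subset in list(best.items()):  # snapshot, so each list item is used at most once
            if len(subset) + 1 > len(best.get(s + v, ())):
                best[s + v] = subset + [v]
    candidates = [s for s in best if s >= min_sum and len(best[s]) >= min_len]
    return sorted(best[min(candidates)]) if candidates else None

print(min_sum_subset([10, 15, 20, 30, 70], 3, 100))   # -> [10, 20, 70]
Keeping only the longest subset per sum is enough, because a sum is feasible for the length constraint exactly when its maximum achievable length is at least N.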

Related

Formulating dp problem [Codeforces 414 B]

Here is the problem statement from an old contest on Codeforces:
A sequence of l integers b_1, b_2, ..., b_l (1 ≤ b_1 ≤ b_2 ≤ ... ≤ b_l ≤ n) is called good if each number divides (without a remainder) the next number in the sequence; more formally, b_i divides b_{i+1} for all i (1 ≤ i ≤ l - 1).
Given n and k, find the number of good sequences of length k. As the answer can be rather large, print it modulo 1000000007 (10^9 + 7).
I have formulated my dp[i][j] as the number of good sequences of length i which end with the number j, and the transition as the following pseudocode:
dp[k][n] =
  for each factor of n as i do
    for j from 1 to k - 1
      dp[k][n] += dp[j][i]
    end
  end
But in the editorial it is given as
Let's define dp[i][j] as the number of good sequences of length i that end in j.
Let's denote the divisors of j by x_1, x_2, ..., x_l. Then dp[i][j] = Σ_r dp[i - 1][x_r].
But in my understanding, we need two sigmas, one for the divisors and the other for length. Please help me correct my understanding.
My code ->
MOD = 10 ** 9 + 7
N, K = map(int, input().split())
dp = [[0 for _ in range(N + 1)] for _ in range(K + 1)]
for k in range(1, K + 1):
    for n in range(1, N + 1):
        c = 1
        for i in range(1, n):
            if n % i != 0:
                continue
            for j in range(1, k):
                c += dp[j][i]
        dp[k][n] = c
c = 0
for i in range(1, N + 1):
    c = (c + dp[K][i]) % MOD
print(c)
Link to the problem: https://codeforces.com/problemset/problem/414/B
So let's define dp[i][j] as the number of good sequences of length exactly i and which ends with a value j as its last element.
Now, dp[i][j] = Sum(dp[i-1][x]) for all x s.t. x is a divisor of j. Note that x can be equal to j itself.
This is true because if there is some good sequence of length i-1 which we have already found that ends with some value x dividing j, then we can simply add j to its end and form a new sequence which satisfies all the conditions.
I guess your confusion is with the length. The thing is that since our current length is i, we can add j to the end of a sequence only if its length is i-1; we cannot iterate over other lengths.
Hope this is clear.
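For reference, a hedged Python sketch of exactly that recurrence (my own code, with a hypothetical helper name count_good_sequences, not the editorial's implementation):
MOD = 10 ** 9 + 7

def count_good_sequences(n, k):
    # dp[i][j] = number of good sequences of length i ending in j,
    # with dp[i][j] = sum of dp[i-1][x] over all divisors x of j.
    dp = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(1, n + 1):
        dp[1][j] = 1                      # every single value is a good sequence of length 1
    for i in range(2, k + 1):
        for x in range(1, n + 1):
            for j in range(x, n + 1, x):  # every multiple j of x, i.e. x divides j
                dp[i][j] = (dp[i][j] + dp[i - 1][x]) % MOD
    return sum(dp[k][j] for j in range(1, n + 1)) % MOD

print(count_good_sequences(3, 2))   # 5: (1,1), (1,2), (1,3), (2,2), (3,3)
Iterating over multiples of x instead of the divisors of each j gives the harmonic-sum bound of roughly O(n log n) work per length.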

Number of pairs with a given sum and product

I have an array A along with 3 variables k, x and y.
I have to find the number of unordered pairs (i, j) such that the sum of the two elements mod k equals x and the product of the same two elements mod k equals y. Pairs need not be distinct. In other words, the number of pairs (i, j) such that
(A[i]+A[j])%k == x and (A[i]*A[j])%k == y where 0 <= i < j < size of A.
For example, let A={1,2,3,2,1}, k=2, x=1, y=0. Then the answer is 6, because the pairs are: (1,2), (1,2), (2,3), (2,1), (3,2), and (2,1).
I used a brute force approach, but obviously this is not acceptable.
Modulo-arithmetic has the following two rules:
((a mod k) * (b mod k)) mod k = (a * b) mod k
((a mod k) + (b mod k)) mod k = (a + b) mod k
Thus we can sort all values into a hashtable with separate chaining and k buckets.
Addition
Find m < k, such that for a given n < k: (n + m) mod k = x.
There is exactly one solution to this problem:
if n < x: m < x must hold. Thus m = x - n
if n == x: m = 0
if n > x: we need to find m such that n + m = x + k. Thus m = x + k - n
This way, for each congruence class we can easily determine the matching class such that for any pair (a, b) from the cross product of the two classes, (a + b) mod k = x holds.
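The three cases above collapse into the single expression m = (x - n + k) % k (the counting code further down uses the same formula). A quick sanity check, assuming 0 <= n, x < k (the helper name matching_residue is mine):
def matching_residue(n, x, k):
    # equals x - n, 0, or x + k - n in the three cases above
    return (x - n + k) % k

for k in range(1, 10):
    for x in range(k):
        for n in range(k):
            m = matching_residue(n, x, k)
            assert 0 <= m < k and (n + m) % k == x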
Multiplication
Multiplication is a bit trickier. Luckily we've already been given the matching congruence class for addition (see above), which must also be the matching congruence class for multiplication, since both constraints need to hold. To verify that the given congruence class matches, we only need to check that (n * m) mod k = y (n and m defined as above). If this expression holds, we can build pairs; otherwise no matching elements exist.
Implementation
This is working Python code for the above example:
def modmuladd(ls, x, y, k):
    result = []
    # create tuples of indices and values
    indices = zip(ls, range(0, len(ls)))
    # split up into congruence classes
    congruence_cls = [[] for i in range(0, k)]
    for p in indices:
        congruence_cls[p[0] % k].append(p)
    for n in range(0, k):
        # congruence class to match addition
        if n < x:
            m = x - n
        elif n == x:
            m = 0
        else:
            m = x + k - n
        # check if congruence class matches for multiplication
        if (n * m) % k != y or len(congruence_cls[m]) == 0:
            continue  # no matching congruence class
        # add matching tuples to result
        result += [(a, b) for a in congruence_cls[n] for b in congruence_cls[m] if a[1] <= b[1]]
        result += [(a, b) for a in congruence_cls[m] for b in congruence_cls[n] if a[1] <= b[1]]
    # sort result according to indices of first and second element, remove duplicates
    sorted_res = sorted(sorted(set(result), key=lambda p: p[1][1]), key=lambda p: p[0][1])
    # remove indices from result set
    return [(p[0][0], p[1][0]) for p in sorted_res]
Note that sorting and elimination of duplicates is only required because this code concentrates on the usage of congruence classes rather than on perfect optimization. This example can easily be tweaked to provide the desired ordering without the sorting, with minor modifications.
Test run
print(modmuladd([1, 2, 3, 2, 1], 1, 0, 2))
Output:
[(1, 2), (1, 2), (2, 3), (2, 1), (3, 2), (2, 1)]
EDIT:
Worst-case complexity of this algorithm is still O(n^2), due to the fact that building all possible pairs of a list of size n is O(n^2). With this algorithm, however, the search for matching pairs can be cut down to O(k) with O(n) preprocessing. Thus counting the resulting pairs can be done in O(n) with this approach. Assuming the numbers are distributed equally over the congruence classes, this algorithm could build all pairs that are part of the solution set in O(n^2/k^2).
EDIT 2:
An implementation that only counts would work like this:
def modmuladdct(ls, x, y, k):
    result = 0
    # split up into congruence classes
    congruence_class = {}
    for v in ls:
        if v % k not in congruence_class:
            congruence_class[v % k] = [v]
        else:
            congruence_class[v % k].append(v)
    for n in congruence_class.keys():
        # congruence class to match addition
        m = (x - n + k) % k
        # check if congruence class matches for multiplication
        if (n * m % k != y) or m not in congruence_class:
            continue  # no matching congruence class
        # total number of ordered pairs that will be built
        if n == m:
            # pairs within one class: count ordered pairs of distinct elements
            result += len(congruence_class[n]) * (len(congruence_class[n]) - 1)
        else:
            result += len(congruence_class[n]) * len(congruence_class[m])
    # divide by two since each pair would otherwise be counted twice
    return result // 2
Each pair would appear exactly twice in the result: once in-order and once with reversed order. By dividing the result by two this is being corrected. Runtime is O(n + k) (assuming dictionary-operations are O(1)).
The number of iterations is C(n, 2) = 5!/(2!(5-2)!) = 10 in your case, and there is nothing magic that would drastically reduce the number of loops.
In JS you can do:
A = [1, 2, 3, 2, 1];
k = 2;
x = 1;
y = 0;
for (i = 0; i < A.length; i++) {
  for (j = i + 1; j < A.length; j++) {
    if ((A[i] + A[j]) % k !== x) {
      continue;
    }
    if ((A[i] * A[j]) % k !== y) {
      continue;
    }
    console.log('(' + A[i] + ', ' + A[j] + ')');
  }
}
Ignoring A, we can find all solutions of n * (x - n) == y mod k for 0 <= n < k. That's a simple O(k) algorithm -- check each such n in turn.
We can count, for each n, how often A[i] % k == n, and then reconstruct the counts of pairs. For if cs is an array of these counts, and n is a solution of n * (x - n) == y mod k, then there are cs[n] * cs[(x-n) % k] pairs of things in A that solve our equations corresponding to this n. To avoid double counting we only count n such that n < (x - n) % k.
def count_pairs(A, k, x, y):
    cs = [0] * k
    for a in A:
        cs[a % k] += 1
    pairs = ((i, (x - i) % k) for i in range(k) if i * (x - i) % k == y)
    return sum(cs[i] * cs[j] for i, j in pairs if i < j)

print(count_pairs([1, 2, 3, 2, 1], 2, 1, 0))
Overall, this constructs the counts in O(|A|) time, and the remaining code runs in O(k) time. It uses O(k) space.

What is the fastest algorithm for intersection of two sorted lists?

Say that there are two sorted lists: A and B.
The number of entries in A and B can vary. (They can be very small/huge. They can be similar to each other/significantly different).
What is known to be the fastest algorithm for this?
Can any one give me an idea or reference?
Assume that A has m elements and B has n elements, with m ≥ n. Information theoretically, the best we can do is
lg((m + n)! / (m! n!)) = n lg(m/n) + O(n)
comparisons, since in order to verify an empty intersection, we essentially have to perform a sorted merge. We can get within a constant factor of this bound by iterating through B and keeping a "cursor" in A indicating the position at which the most recent element of B should be inserted to maintain sorted order. We use exponential search to advance the cursor, for a total cost that is on the order of
lg x_1 + lg x_2 + ... + lg x_n,
where x_1 + x_2 + ... + x_n = m + n is some composition of m + n. This sum is O(n lg((m + n)/n)) = O(n lg(m/n)) + O(n) by the concavity of lg.
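A hedged Python sketch of that cursor-plus-exponential-search idea (my own code under the stated assumptions: both lists sorted, no duplicates; the function name intersect_galloping is mine):
from bisect import bisect_left

def intersect_galloping(A, B):
    if len(A) < len(B):
        A, B = B, A                      # A is the longer list, B the shorter
    result = []
    cursor = 0                           # position in A where the next element of B would go
    for b in B:
        # exponential (galloping) search: bracket b's position in A
        bound = 1
        while cursor + bound < len(A) and A[cursor + bound] < b:
            bound *= 2
        # binary search inside the bracketed window [cursor, cursor + bound]
        cursor = bisect_left(A, b, cursor, min(cursor + bound + 1, len(A)))
        if cursor < len(A) and A[cursor] == b:
            result.append(b)
    return result

print(intersect_galloping([1, 3, 5, 7, 9, 11], [3, 9, 10]))   # [3, 9]
Each search costs O(lg x_i) where x_i is how far the cursor advances, matching the bound above.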
I don't know if this is the fastest option but here's one that runs in O(n+m) where n and m are the sizes of your lists:
Loop over both lists until one of them is empty, in the following way:
Advance by one on one list.
Advance on the other list until you find a value that is equal to or greater than the current value of the first list.
If it is equal, the element belongs to the intersection and you can append it to another list.
If it is greater, switch roles: advance on the first list until you find a value equal to or greater than this value.
As said, repeat this until one of the lists is empty.
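A minimal sketch of that two-pointer merge in Python (my own code; assumes both lists are sorted):
def intersect_merge(a, b):
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1          # advance on the list with the smaller current value
        else:
            j += 1
    return out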
Here is a simple and tested Python implementation that uses bisect search to advance pointers of both lists.
It assumes both input lists are sorted and contain no duplicates.
import bisect

def compute_intersection_list(l1, l2):
    # A is the smaller list
    A, B = (l1, l2) if len(l1) < len(l2) else (l2, l1)
    i = 0
    j = 0
    intersection_list = []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            intersection_list.append(A[i])
            i += 1
            j += 1
        elif A[i] < B[j]:
            i = bisect.bisect_left(A, B[j], lo=i+1)
        else:
            j = bisect.bisect_left(B, A[i], lo=j+1)
    return intersection_list

# test on many random cases
import random
MM = 100  # max value
for _ in range(10000):
    M1 = random.randint(0, MM)  # random max value
    N1 = random.randint(0, M1)  # random number of values
    M2 = random.randint(0, MM)  # random max value
    N2 = random.randint(0, M2)  # random number of values
    a = sorted(random.sample(range(M1), N1))  # sampling without replacement to have no duplicates
    b = sorted(random.sample(range(M2), N2))
    assert compute_intersection_list(a, b) == sorted(set(a).intersection(b))

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of the minimum elements over all possible contiguous subsequences of A. I know that if N is small we can enumerate all possible subsequences, but since N can be up to 10^5, what is the best way to compute this sum?
Example: let N=3 and A=[1,2,3]. The answer is 10: the possible contiguous subsequences are {(1),(2),(3),(1,2),(1,2,3),(2,3)}, so the sum of minimum elements is 1 + 2 + 3 + 1 + 1 + 2 = 10.
Let's fix one element a[i]. We want to know the position L of the rightmost element smaller than a[i] located to its left, and the position R of the leftmost element smaller than a[i] located to its right.
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer: a[i] is the minimum of exactly those subarrays whose left endpoint lies in (L, i] and whose right endpoint lies in [i, R), and there are (i - L) * (R - i) of them.
It is possible to precompute L and R for all i in linear time using a stack. Pseudo code:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
    while !s.isEmpty() && s.top().first > a[i]:
        s.pop()
    if !s.isEmpty():
        L[i] = s.top().second
    s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is the full pseudo code (I assume that int can hold any integer value here; you should choose a suitable type to avoid overflow in real code. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
    s = new Stack
    L = new int[n]
    fill(L, -1)
    for i <- 0 ... n - 1:
        while !s.isEmpty() && s.top().first > a[i]:
            s.pop()
        if !s.isEmpty():
            L[i] = s.top().second
        s.push(pair(a[i], i))
    return L

int[] getRightSmallerElementPositions(int[] a):
    R = getLeftSmallerElementPositions(reversed(a))
    for i <- 0 ... n - 1:
        R[i] = n - 1 - R[i]
    return reversed(R)

int findSum(int[] a):
    L = getLeftSmallerElementPositions(a)
    R = getRightSmallerElementPositions(a)
    int res = 0
    for i <- 0 ... n - 1:
        res += (i - L[i]) * (R[i] - i) * a[i]
    return res
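A compact runnable Python version of the pseudo code above (my own translation; ties are broken by comparing (value, index) pairs, as suggested, so every element is effectively distinct):
def sum_of_subarray_minimums(a):
    n = len(a)
    key = [(v, i) for i, v in enumerate(a)]   # distinct keys, as suggested above
    L = [-1] * n                              # nearest position to the left with a smaller key
    R = [n] * n                               # nearest position to the right with a smaller key
    stack = []
    for i in range(n):
        while stack and key[stack[-1]] > key[i]:
            stack.pop()
        L[i] = stack[-1] if stack else -1
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):
        while stack and key[stack[-1]] > key[i]:
            stack.pop()
        R[i] = stack[-1] if stack else n
        stack.append(i)
    return sum((i - L[i]) * (R[i] - i) * a[i] for i in range(n))

print(sum_of_subarray_minimums([1, 2, 3]))   # 10, matching the example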
If the list is sorted, you can consider all contiguous subsequences of size 1, then 2, then 3, up to N. The algorithm below is initially somewhat inefficient, but an optimized version follows.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N
    total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element: {{1}, {2}, {3}}: sum each of the elements.
Then, sets of two elements: {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
import numpy as np

A = [1, 2, 3]
# This is [3, 2, 1]
scale = range(len(A), 0, -1)
# Take the element-wise product of the vectors, and sum
sum(a * b for (a, b) in zip(A, scale))
# Or just use the dot product
np.dot(A, scale)

Counting number of points in lower left quadrant?

I am having trouble understanding a solution to an algorithmic problem
In particular, I don't understand how or why this part of the code
s += a[i];
total += query(s);
update(s);
allows you to compute the total number of points in the lower left quadrant of each point.
Could someone please elaborate?
As an analogue for the plane problem, consider this:
For a point (a, b) to lie in the lower left quadrant of (x, y), we need a < x and b < y; thus, points of the form (i, P[i]) lie in the lower left quadrant of (j, P[j]) iff i < j and P[i] < P[j].
When iterating in ascending order, all points that were considered earlier lie on the left compared to the current (i, P[i])
So one only has to locate all P[j]s less than P[i] that have been considered until now.
(*"Current point" refers to the point under consideration in the current iteration of the for loop that you quoted, i.e. (i, P[i]).)
Let's define another array, C[s]:
C[s] = Number of Prefix Sums of array A[1..(i - 1)] that amount to s
So the count required in the previous point becomes the sum ... C[-2] + C[-1] + C[0] + C[1] + C[2] + ... + C[P[i] - 1], i.e. the prefix sum of C up to P[i] - 1.
Use the BIT to store the prefix sum of C, thus defining query(s) as:
query(s) = Number of Prefix Sums of array A[1..(i - 1)] that amount to a value < s
Using these definitions, s in the given code gives you the prefix sum up to the current index i (P[i]). total builds the answer, and update simply adds P[i] to the BIT.
We have to repeat this method for all i, hence the for loop.
PS: It uses a data structure called a Binary Indexed Tree (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees) for operations. If you aren't acquainted with it, I'd recommend that you check the link.
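To make the query/update bookkeeping concrete, here is a hedged Python sketch (my own code, not the code from the solution you are reading; the names BIT and count_lower_left_pairs are mine). It counts, for each prefix sum, how many earlier prefix sums are strictly smaller, shifting sums by an offset so they can serve as 1-based BIT indices:
class BIT:
    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def update(self, i):                     # add 1 at position i (1-based)
        while i < len(self.tree):
            self.tree[i] += 1
            i += i & (-i)

    def query(self, i):                      # how many added positions are <= i
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)
        return total

def count_lower_left_pairs(a):
    # a[i] is assumed to already be +1/-1 after the replacement step described below
    offset = len(a) + 1                      # shifts every prefix sum to a positive index
    bit = BIT(2 * len(a) + 2)
    bit.update(offset)                       # the empty prefix (sum 0) precedes every i
    s, total = 0, 0
    for v in a:
        s += v
        total += bit.query(s - 1 + offset)   # earlier prefix sums strictly smaller than s
        bit.update(s + offset)
    return total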
EDIT:
You are given an array S and a value X. You can split S into two disjoint groups: L, which has all elements of S less than X, and H, which has those that are greater than or equal to X.
A: All elements of L are less than all elements of H.
Any subsequence T of S will have some elements of L and some elements of H. Let's say it has p elements of L and q of H. When T is sorted to give T', all p elements of L appear before the q elements of H because of A.
Median being the central value is the value at location m = (p + q)/2
It is intuitive to think that having q > p implies that the median lies in H (i.e. the median is >= X). As a proof:
Values in locations [1..p] in T' belong to L. Therefore for the median to be in H, it's position m should be greater than p:
m > p
(p + q)/2 > p
p + q > 2p
q > p
B: q - p > 0
To compute q - p, I replace all elements in T' with -1 if they belong to L ( < X ) and +1 if they belong to H ( >= X).
T' then looks something like {-1, -1, -1, ..., 1, 1, 1}.
It has p times -1 and q times 1. Sum of T' will now give me:
Sum = p * (-1) + q * (1)
C: Sum = q - p
I can use this information to find the value in B.
All subsequences are of the form {A[i + 1], A[i + 2], ..., A[j]} since they are contiguous. To compute their sums, I can use prefix sums P[i] = A[1] + A[2] + ... + A[i].
The sum of the subsequence from A[i + 1] to A[j] can then be computed as P[j] - P[i] (with j greater than i).
With C and B in mind, we conclude:
Sum = P[j] - P[i] = q - p (q - p > 0)
P[j] - P[i] > 0
P[j] > P[i]
j > i and P[j] > P[i] for each solution that gives you a median >= X
In summary:
Replace all A[i] with -1 if they are less than X and +1 otherwise
Compute prefix sums of A[i]
For each pair (i, P[i]), count the pairs which lie in its lower left quadrant.
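Tying the summary together, a short usage sketch built on the count_lower_left_pairs helper from the earlier sketch (hypothetical names, assuming the task is counting subarrays whose median is >= X):
def count_subarrays_with_median_at_least(A, X):
    signs = [1 if v >= X else -1 for v in A]   # step 1: replace with +1 / -1
    return count_lower_left_pairs(signs)       # steps 2-3: prefix sums + BIT counting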
