knapsack problem variation with almost no constraints - algorithm

I have this variation of the knapsack problem with very few constraints, and the lack of constraints means I really don't know where to start.
Given a set S of positive integers, for example:
1 2 3 4 5 6 7 8 9 10 11 12 13
find two non-overlapping subsets that have the same total. The two subsets do not need to contain all numbers in S.
So for the example above, one valid answer would be
[1,2] and [3]
Usually these problems have constraints such as the subsets needing to have specific sums, or the subsets needing to span over all elements of S.
This makes it hard for me to imagine how to solve this via brute force. Every time I come up with a dynamic programming table, I can't get it to cover all possible combinations of subsets.

This problem can be solved like the subset sum problem, in pseudopolynomial time O(n*summ).
We fill an array indexed 0..summ with the subset sums that are reachable, and stop as soon as we reach the same sum a second time.
The two subsets with equal sums might share some items; we simply remove those, so the remaining subsets contain only distinct items and still have equal sums.
Example in Python, using bitmasks to store sets (bit i+1 set means the i-th item is used in the sum; bit 0 only marks the cell as reachable). common holds the bits the two masks share, and we remove them with the xor operation.
The last lines recover the sets themselves.
L = [2,5,11,17,29,37,43]
summ = sum(L)
A =[0]*(summ+1)
A[0] = 1
X = 0
Y = 0
for i in range(len(L)):
    for k in range(summ, L[i] - 1, -1):
        if A[k - L[i]]:
            t = A[k - L[i]] | (2 << i)
            if A[k]:
                common = A[k] & t
                X = A[k] ^ common
                Y = t ^ common
                break
            else:
                A[k] = t
    if X:
        break
first = [L[i] for i in range(len(L)) if (X & (2 << i))]
second = [L[i] for i in range(len(L)) if (Y & (2 << i))]
print(first, second)
>>> [2, 11, 29] [5, 37]
In this example the code finds equal sums of 59 for [2, 11, 17, 29] and [5, 17, 37], and removes the common 17 to get the final result with sum 42.
It is not necessary to store whole sets in the A[] cells: we can store just the last item added to each sum, then unwind the item sequence.
L = [2,5,11,17,29,37,43]
summ = sum(L)
A =[0]*(summ+1)
A[0] = -1
last = 0
for i in range(len(L)):
    for k in range(summ, L[i] - 1, -1):
        if A[k - L[i]]:
            t = L[i]
            if A[k]:
                last = k
                break
            else:
                A[k] = t
    if last:
        break
first = set()
k = last
while k:
    first.add(A[k])
    k = k - A[k]
second = set()
second.add(t)
k = last - t
while k:
    second.add(A[k])
    k = k - A[k]
print(first.difference(second),second.difference(first))
>>> {2, 11, 29} {37, 5}
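For small inputs, a brute-force cross-check is also possible. The sketch below is not part of the approach above: it simply tries every way of assigning each item to the first subset, the second subset, or neither, which costs O(3^n). The function name equal_sum_pair_bruteforce is made up for this illustration.

```python
from itertools import product


def equal_sum_pair_bruteforce(items):
    """Return two disjoint non-empty subsets with equal sums, or None.

    Label 0 means "unused", 1 means "first subset", 2 means "second subset".
    Exponential in len(items); intended only for checking small cases.
    """
    for labels in product((0, 1, 2), repeat=len(items)):
        first = [x for x, lab in zip(items, labels) if lab == 1]
        second = [x for x, lab in zip(items, labels) if lab == 2]
        if first and second and sum(first) == sum(second):
            return first, second
    return None


print(equal_sum_pair_bruteforce([2, 5, 11, 17, 29, 37, 43]))
```

It may return a different pair than the dynamic programming versions, since any pair with equal sums is a valid answer.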

Related

Complex Combinatorial Conditions on Dynamic Programming

I am exploring how a Dynamic Programming design approach relates to the underlying combinatorial properties of problems.
For this, I am looking at the canonical instance of the coin change problem: Let S = [d_1, d_2, ..., d_m] and n > 0 be a requested amount. In how many ways can we add up to n using nothing but the elements in S?
If we follow a Dynamic Programming approach to design an algorithm for this problem that would allow for a solution with polynomial complexity, we would start by looking at the problem and how it is related to smaller and simpler sub-problems. This would yield a recursive relation describing an inductive step representing the problem in terms of the solutions to its related subproblems. We can then implement either a memoization technique or a tabulation technique to efficiently implement this recursive relation in a top-down or a bottom-up manner, respectively.
A recursive relation to solve this instance of the problem could be the following (Python 3.6 syntax and 0-based indexing):
def C(S, m, n):
    if n < 0:
        return 0
    if n == 0:
        return 1
    if m <= 0:
        return 0
    count_wout_high_coin = C(S, m - 1, n)
    count_with_high_coin = C(S, m, n - S[m - 1])
    return count_wout_high_coin + count_with_high_coin
This recursive relation yields the correct number of solutions when order is disregarded. However, this relation:
def C(S, n):
    if n < 0:
        return 0
    if n == 0:
        return 1
    return sum([C(S, n - coin) for coin in S])
yields the correct number of solutions when order is taken into account.
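As a minimal sketch of the memoization step mentioned earlier, both relations can be wrapped with functools.lru_cache so that each subproblem is evaluated once. The wrapper names count_unordered and count_ordered are made up for this illustration, and positive coin values are assumed.

```python
from functools import lru_cache


def count_unordered(coins, n):
    """Number of ways to form n from coins when order is disregarded."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def C(m, n):
        if n < 0:
            return 0
        if n == 0:
            return 1
        if m <= 0:
            return 0
        # Either skip the m-th coin entirely, or use it once more.
        return C(m - 1, n) + C(m, n - coins[m - 1])

    return C(len(coins), n)


def count_ordered(coins, n):
    """Number of ways to form n from coins when order matters."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def C(n):
        if n < 0:
            return 0
        if n == 0:
            return 1
        return sum(C(n - coin) for coin in coins)

    return C(n)


print(count_unordered([1, 2, 5], 5), count_ordered([1, 2, 5], 5))  # 4 9
```

The recursion depth still grows with n, so for large amounts a bottom-up table is the safer choice.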
I am interested in capturing more subtle combinatorial patterns through a recursion relation that can be further optimized via memoization/tabulation.
For example, this relation:
def C(S, m, n, p):
    if n < 0:
        return 0
    if n == 0 and not p:
        return 1
    if n == 0 and p:
        return 0
    if m == 0:
        return 0
    # The coin is indexed by m (the number of coins still allowed), not by n.
    return C(S, m - 1, n, p) + C(S, m, n - S[m - 1], not p)
yields the number of solutions disregarding order, but counts only solutions with an even number of summands. The same relation can be modified to take order into account while still counting only solutions with an even number of summands:
def C(S, n, p):
    if n < 0:
        return 0
    if n == 0 and not p:
        return 1
    if n == 0 and p:
        return 0
    return sum([C(S, n - coin, not p) for coin in S])
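The ordered, parity-aware relation also lends itself to tabulation. The sketch below is one possible bottom-up version, assuming positive coin values; the function name count_ordered_by_parity and the two tables even and odd are made up for this illustration.

```python
def count_ordered_by_parity(coins, n):
    """Return (even, odd): the number of ordered sums to n that use an
    even / odd number of summands."""
    even = [0] * (n + 1)
    odd = [0] * (n + 1)
    even[0] = 1  # the empty sum uses zero coins, and zero is even
    for v in range(1, n + 1):
        for coin in coins:
            if coin <= v:
                # Appending one coin flips the parity of the summand count.
                even[v] += odd[v - coin]
                odd[v] += even[v - coin]
    return even[n], odd[n]


print(count_ordered_by_parity([1, 2, 6], 6))  # (7, 7)
```

For n = 6 and coins [1, 2, 6] this gives 7 even solutions out of 14 in total, which appears to match the figures quoted in the question.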
However, what if we have more than 1 person among which we want to split the coins? Say I want to split n among 2 persons s.t. each person gets the same number of coins, regardless of the total sum each gets. From the 14 solutions, only 7 include an even number of coins so that I can split them evenly. But I want to exclude redundant assignments of coins to each person. For example, 1 + 2 + 2 + 1 and 1 + 2 + 1 + 2 are different solutions when order matters, BUT they represent the same split of coins to two persons, i.e. person B would get 1 + 2 = 2 + 1. I am having a hard time coming up with a recursion to count splits in a non-redundant manner.
(Before I elaborate on a possible answer, let me just point out that counting the splits of the coin exchange, for even n, by sum rather than coin-count would be more or less trivial since we can count the number of ways to exchange n / 2 and multiply it by itself :)
Now, if you'd like to count splits of the coin exchange according to coin count, and exclude redundant assignments of coins to each person (for example, where splitting 1 + 2 + 2 + 1 into two equal size parts is only either (1,1) | (2,2), (2,2) | (1,1) or (1,2) | (1,2) and element order in each part does not matter), we could rely on your first enumeration of partitions where order is disregarded.
However, we would need to know the multiset of elements in each partition (or an aggregate of similar ones) in order to count the possibilities of dividing them in two. For example, to count the ways to split 1 + 2 + 2 + 1, we would first count how many of each coin we have:
def partitions_with_even_number_of_parts_as_multiset(n, coins):
    results = []

    def C(m, n, s, p):
        if n < 0 or m <= 0:
            return
        if n == 0:
            if not p:
                results.append(s)
            return
        C(m - 1, n, s, p)
        _s = s[:]
        _s[m - 1] += 1
        C(m, n - coins[m - 1], _s, not p)

    C(len(coins), n, [0] * len(coins), False)
    return results
Output:
=> partitions_with_even_number_of_parts_as_multiset(6, [1,2,6])
=> [[6, 0, 0], [2, 2, 0]]
               ^^^^^^^^^ this one represents two 1's and two 2's
Now since we are counting the ways to choose half of these, we need to find the coefficient of x^2 in the polynomial multiplication
(x^2 + x + 1) * (x^2 + x + 1) = ... 3x^2 ...
which represents the three ways to choose two from the multiset count [2,2]:
2,0 => 1,1
0,2 => 2,2
1,1 => 1,2
In Python, we can use numpy.polymul to multiply polynomial coefficients. Then we look up the appropriate coefficient in the result.
For example:
import numpy

def count_split_partitions_by_multiset_count(multiset):
    coefficients = (multiset[0] + 1) * [1]
    for i in range(1, len(multiset)):
        coefficients = numpy.polymul(coefficients, (multiset[i] + 1) * [1])
    # Integer division, because a sequence index must be an int in Python 3.
    return coefficients[sum(multiset) // 2]
Output:
=> count_split_partitions_by_multiset_count([2,2,0])
=> 3
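If numpy is not available, the same coefficient can be obtained by multiplying the polynomials by hand. This is only a sketch of that idea; the function name count_half_selections is made up here, and the input is assumed to be a list of non-negative counts per coin type.

```python
def count_half_selections(multiset):
    """Number of ways to pick exactly half of the coins, where multiset[i]
    is how many coins of type i are available."""
    total = sum(multiset)
    if total % 2:
        return 0  # an odd number of coins cannot be split evenly
    # coefficients[d] = ways to pick d coins from the coin types seen so far
    coefficients = [1]
    for count in multiset:
        # Multiply by (1 + x + ... + x^count).
        product = [0] * (len(coefficients) + count)
        for d, ways in enumerate(coefficients):
            for take in range(count + 1):
                product[d + take] += ways
        coefficients = product
    return coefficients[total // 2]


print(count_half_selections([2, 2, 0]))  # 3
```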
Here is a table implementation and a little elaboration on algrid's beautiful answer. This produces an answer for f(500, [1, 2, 6, 12, 24, 48, 60]) in about 2 seconds.
The simple declaration of C(n, k, S) = sum(C(n - s_i, k - 1, S[i:])) means adding all the ways to get to the current sum, n using k coins. Then if we split n into all ways it can be partitioned in two, we can just add all the ways each of those parts can be made from the same number, k, of coins.
The beauty of fixing the subset of coins we choose from to a diminishing list means that any arbitrary combination of coins will only be counted once - it will be counted in the calculation where the leftmost coin in the combination is the first coin in our diminishing subset (assuming we order them in the same way). For example, the arbitrary subset [6, 24, 48], taken from [1, 2, 6, 12, 24, 48, 60], would only be counted in the summation for the subset [6, 12, 24, 48, 60] since the next subset, [12, 24, 48, 60] would not include 6 and the previous subset [2, 6, 12, 24, 48, 60] has at least one 2 coin.
Python code:
import time

def f(n, coins):
    t0 = time.time()
    min_coins = min(coins)
    m = [[[0] * len(coins) for k in range(n // min_coins + 1)] for _n in range(n + 1)]
    # Initialize base case
    for i in range(len(coins)):
        m[0][0][i] = 1
    for i in range(len(coins)):
        for _i in range(i + 1):
            for _n in range(coins[_i], n + 1):
                for k in range(1, _n // min_coins + 1):
                    m[_n][k][i] += m[_n - coins[_i]][k - 1][_i]
    result = 0
    for a in range(1, n + 1):
        b = n - a
        for k in range(1, n // min_coins + 1):
            result = result + m[a][k][len(coins) - 1] * m[b][k][len(coins) - 1]
    total_time = time.time() - t0
    return (result, total_time)

print(f(500, [1, 2, 6, 12, 24, 48, 60]))

Counting Inversions In An Array - Special Case

The inversion count of an array indicates how far (or close) the array is from being sorted. If the array is already sorted, the inversion count is 0. If the array is sorted in reverse order, the inversion count is at its maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j. Example:
The sequence 2, 4, 1, 3, 5 has three inversions: (2, 1), (4, 1), (4, 3).
Now, there are various algorithms to solve this in O(n log n).
There is a special case where the array only has 3 types of elements: 1, 2 and 3. Is it possible to count the inversions in O(n)?
E.g. 1,1,3,2,3,1,3
Yes, it is. Just keep 3 integers a, b, c, where a is the number of 1's encountered so far, b is the number of 2's and c is the number of 3's. Given this, follow the algorithm below (I assume the numbers are given in array arr of size n, with 1-based indexing; the following is just pseudocode):
no_of_inv = 0
a = 0
b = 0
c = 0
for i from 1 to n:
    if arr[i] == 1:
        no_of_inv = no_of_inv + b + c
        a++
    else if arr[i] == 2:
        no_of_inv = no_of_inv + c
        b++
    else:
        c++
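The pseudocode above translates almost directly into Python. This is a minimal sketch, assuming the input contains only the values 1, 2 and 3; the function name count_inversions_123 is made up for this illustration. The counter for 1's is dropped because it never contributes to the result.

```python
def count_inversions_123(arr):
    """Count inversions in a sequence whose values are all 1, 2 or 3."""
    inversions = 0
    twos = 0    # number of 2's seen so far
    threes = 0  # number of 3's seen so far
    for x in arr:
        if x == 1:
            # Every earlier 2 or 3 is larger than this 1.
            inversions += twos + threes
        elif x == 2:
            # Only earlier 3's are larger than this 2.
            inversions += threes
            twos += 1
        elif x == 3:
            threes += 1
        else:
            raise ValueError("expected only the values 1, 2 and 3")
    return inversions


print(count_inversions_123([1, 1, 3, 2, 3, 1, 3]))  # 4
```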
(This algorithm is extremely similar to Sasha's. I just wanted to provide an explanation as well.)
Every inversion (i, j) satisfies 0 ≤ i < j < n. Let's define S[j] to be the number of inversions of the form (i, j); that is, S[j] is the number of times A[i] > A[j] for 0 ≤ i < j. Then the total number of inversions is T = S[0] + S[1] + … + S[n - 1].
Let C[x][j] be the number of times A[i] > x for 0 ≤ i < j. Then S[j] = C[A[j]][j] for all j. If we can compute the 3n values C[x][j] in linear time, then we can compute S in linear time.
Here is some Python code:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 3, 1, 3])
>>> C = {x: np.cumsum(A > x) for x in np.unique(A)}
>>> T = sum(C[A[j]][j] for j in range(len(A)))
>>> print(T)
4
This could be made more efficient, although not in asymptotic terms, by not storing all C values at once. The algorithm really only needs a single pass through the array. I have chosen to present it this way because it is the most concise.

Find a subset with sum within a range

How can I find a subset of an array such that the sum of its elements is within a given range?
For example:
let a = [ 1, 1, 3, 6, 7, 50]
let b = getSubsetSumRange(3, 5)
so b could be [1, 1, 3], [1, 3] or [3]. The order doesn't matter; I only need one of them.
You would probably want to use a dynamic programming approach to solve this problem.
Let F[i][j] be true if it is possible to select some numbers among a[1..i] so that their sum is equal to j.
i varies from 0 to the length of a, and j from 0 to max inclusive, where max is the upper bound of your given range.
F[i][0] = true for all i, including i = 0, by definition (you can always select the empty subset), and F[0][j] = false for j > 0.
Then F[i][j] = F[i - 1][j - a[i]] | F[i - 1][j], where the first term is considered only when j >= a[i].
Logically, it means that if you can select a subset with sum j from elements 1..i-1, then you can obviously do it with elements 1..i as well, and if you can select a subset with sum j - a[i] from elements 1..i-1, then by adding the new element a[i] to that subset you get the desired sum j.
After you have calculated the values of F, look for any F[n][j] that is true for a value j lying in your desired range.
Say you have found such a number k. Then the algorithm to find the required subset would look like this:
for i = n..1:
    if k >= a[i] and F[i - 1][k - a[i]] == True then
        output a[i] to the answer
        k -= a[i]
    if k == 0
        break
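Putting the table and the reconstruction together, a Python sketch could look like the following. It assumes non-negative integers and hi >= 0, uses 0-based lists (so a[i - 1] plays the role of a[i] above), and the function name get_subset_sum_range with its lo and hi parameters is made up for this illustration.

```python
def get_subset_sum_range(a, lo, hi):
    """Return one sub-list of a whose sum lies in [lo, hi], or None."""
    n = len(a)
    # F[i][j] is True when some subset of the first i elements sums to j.
    F = [[False] * (hi + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = True  # the empty subset always sums to 0
    for i in range(1, n + 1):
        x = a[i - 1]
        for j in range(1, hi + 1):
            F[i][j] = F[i - 1][j] or (j >= x and F[i - 1][j - x])

    # Pick any reachable sum inside the requested range.
    k = next((j for j in range(max(lo, 0), hi + 1) if F[n][j]), None)
    if k is None:
        return None

    # Walk back through the table to recover the chosen elements.
    subset = []
    for i in range(n, 0, -1):
        x = a[i - 1]
        if k >= x and F[i - 1][k - x]:
            subset.append(x)
            k -= x
        if k == 0:
            break
    return subset


print(get_subset_sum_range([1, 1, 3, 6, 7, 50], 3, 5))
```

The table needs O(n * hi) time and memory, so this only makes sense when the upper bound of the range is moderate.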

Sum of continuous sequences

Given an array A with N elements, I want to find the sum of the minimum elements of all possible contiguous subsequences of A. I know that if N is small we can enumerate all possible subsequences, but N can be up to 10^5, so what is the best way to find this sum?
Example: Let N = 3 and A = [1, 2, 3]. Then the answer is 10: the possible contiguous subsequences are {(1), (2), (3), (1,2), (1,2,3), (2,3)}, so the sum of minimum elements = 1 + 2 + 3 + 1 + 1 + 2 = 10.
Let's fix one element, a[i]. We want to know L, the position of the rightmost element smaller than a[i] located to the left of i (or -1 if there is none). We also need to know R, the position of the leftmost element smaller than a[i] located to the right of i (or n if there is none).
If we know L and R, we should add (i - L) * (R - i) * a[i] to the answer, because a[i] is the minimum of exactly those contiguous subsequences that start after L and at or before i, and end at or after i and before R.
It is possible to precompute L and R for all i in linear time using a stack. Pseudocode:
s = new Stack
L = new int[n]
fill(L, -1)
for i <- 0 ... n - 1:
    while !s.isEmpty() && s.top().first > a[i]:
        s.pop()
    if !s.isEmpty():
        L[i] = s.top().second
    s.push(pair(a[i], i))
We can reverse the array and run the same algorithm to find R.
How to deal with equal elements? Let's assume that a[i] is a pair <a[i], i>. All elements are distinct now.
The time complexity is O(n).
Here is the full pseudocode (I assume that int can hold any integer value here; you should choose a suitable type to avoid overflow in real code. I also assume that all elements are distinct):
int[] getLeftSmallerElementPositions(int[] a):
    s = new Stack
    L = new int[n]
    fill(L, -1)
    for i <- 0 ... n - 1:
        while !s.isEmpty() && s.top().first > a[i]:
            s.pop()
        if !s.isEmpty():
            L[i] = s.top().second
        s.push(pair(a[i], i))
    return L

int[] getRightSmallerElementPositions(int[] a):
    R = getLeftSmallerElementPositions(reversed(a))
    for i <- 0 ... n - 1:
        R[i] = n - 1 - R[i]
    return reversed(R)

int findSum(int[] a):
    L = getLeftSmallerElementPositions(a)
    R = getRightSmallerElementPositions(a)
    int res = 0
    for i <- 0 ... n - 1:
        res += (i - L[i]) * (R[i] - i) * a[i]
    return res
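For reference, here is a Python sketch of the same stack idea. It differs from the pseudocode in one respect, which is my own choice rather than the author's: instead of pairing values with indices, ties are broken by using a strict comparison on the left side and a non-strict one on the right, so that each subsequence is attributed to exactly one of its minimum elements. The function name sum_of_subarray_minimums is made up for this illustration.

```python
def sum_of_subarray_minimums(a):
    """Sum of min(a[i:j]) over all non-empty contiguous slices of a."""
    n = len(a)
    left = [-1] * n  # nearest index to the left with a strictly smaller value
    right = [n] * n  # nearest index to the right with a value <= a[i]

    stack = []  # indices whose values are strictly increasing
    for i in range(n):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        if stack:
            left[i] = stack[-1]
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] > a[i]:
            stack.pop()
        if stack:
            right[i] = stack[-1]
        stack.append(i)

    # a[i] is the attributed minimum of (i - left[i]) * (right[i] - i) slices.
    return sum((i - left[i]) * (right[i] - i) * a[i] for i in range(n))


print(sum_of_subarray_minimums([1, 2, 3]))  # 10
```

Python integers do not overflow, so the concern about choosing a wide enough type does not apply here.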
If the list is sorted, you can consider all subsets for size 1, then 2, then 3, to N. The algorithm is initially somewhat inefficient, but an optimized version is below. Here's some pseudocode.
let A = {1, 2, 3}
let total_sum = 0
for set_size <- 1 to N
    total_sum += sum(A[1:N-(set_size-1)])
First, sets with one element:{{1}, {2}, {3}}: sum each of the elements.
Then, sets of two element {{1, 2}, {2, 3}}: sum each element but the last.
Then, sets of three elements {{1, 2, 3}}: sum each element but the last two.
But this algorithm is inefficient. To optimize to O(n), multiply each ith element by N-i and sum (indexing from zero here). The intuition is that the first element is the minimum of N sets, the second element is the minimum of N-1 sets, etc.
I know it's not a python question, but sometimes code helps:
import numpy as np

A = [1, 2, 3]
# This is [3, 2, 1]
scale = list(range(len(A), 0, -1))
# Take the element-wise product of the vectors, and sum
sum(a*b for (a,b) in zip(A, scale))
# Or just use the dot product
np.dot(A, scale)

Finding largest from each subarray of length k

Interview question: given an array and an integer k, find the maximum of each contiguous subarray of size k.
Sample Input :
1 2 3 1 4 5 2 3 6
3 [ value of k ]
Sample Output :
3
3
4
5
5
5
6
I can't think of anything better than brute force. The worst case is O(nk), when the array is sorted in decreasing order.
Just iterate over the array and keep the last k elements in a self-balancing binary tree.
Adding an element to such a tree, removing an element and finding the current maximum each cost O(log k).
Most languages provide standard implementations of such trees. In the C++ STL, IIRC, it's std::multiset. In Java you'd use TreeMap (a map, because you need to keep a count of how many times each element occurs, and Java doesn't provide Multi- collections).
Pseudocode:
for (int i = 0; i < n; ++i) {
    tree.add(a[i]);
    if (tree.size() > k) {
        tree.remove(a[i - k]);
    }
    if (tree.size() == k) {
        print(tree.max());
    }
}
You can actually do this in O(n) time with O(n) space.
Split the array into blocks of k elements each.
[a1 a2 ... ak] [a(k+1) ... a2k] ...
For each block, maintain two more blocks, the left block and the right block.
The ith element of the left block will be the max of the first i elements of the block, counting from the left.
The ith element of the right block will be the max of the last i elements of the block, counting from the right.
You will have two such blocks for each block of k.
Now if you want to find the max in the range a[i ... i+k-1], say the elements span two of the above blocks of k.
[j-k+1 ... i i+1 ... j] [j+1 ... i+k-1 ... j+k]
All you need to do is take the larger of the right max covering i to j in the first block and the left max covering j+1 to i+k-1 in the second block.
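The block idea can be sketched in Python as follows. Rather than separate per-block arrays, this version stores the left and right maxima in two arrays of length n, which is an implementation choice of mine; the function name sliding_window_max is also made up for this illustration.

```python
def sliding_window_max(a, k):
    """Maximum of every contiguous window of size k, in O(n) time."""
    n = len(a)
    if k <= 0 or k > n:
        return []
    left = [0] * n   # left[i]: max from the start of i's block up to i
    right = [0] * n  # right[i]: max from i up to the end of i's block
    for i in range(n):
        if i % k == 0:  # first element of a block
            left[i] = a[i]
        else:
            left[i] = max(left[i - 1], a[i])
    for i in range(n - 1, -1, -1):
        if i == n - 1 or (i + 1) % k == 0:  # last element of a block
            right[i] = a[i]
        else:
            right[i] = max(right[i + 1], a[i])
    # A window a[i .. i+k-1] is the tail of one block plus the head of the next.
    return [max(right[i], left[i + k - 1]) for i in range(n - k + 1)]


print(sliding_window_max([1, 2, 3, 1, 4, 5, 2, 3, 6], 3))  # [3, 3, 4, 5, 5, 5, 6]
```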
Hope this is the solution you are looking for:
def MaxContigousSum(lst, n):
    m = [0]
    if lst[0] > 0:
        m[0] = lst[0]
    maxsum = m[0]
    for i in range(1, n):
        if m[i - 1] + lst[i] > 0:
            m.append(m[i - 1] + lst[i])
        else:
            m.append(0)
        if m[i] > maxsum:
            maxsum = m[i]
    return maxsum

lst = [-2, 11, -4, 13, -5, 2, 1, -3, 4, -2, -1, -6, -9]
print(MaxContigousSum(lst, len(lst)))
Output:
20 for [11, -4, 13]

Resources