Two knapsacks with smallest delta in sum of values - algorithm

This question is a rephrased version of a problem I encountered while implementing a system at work. I thought it was a bit similar to the knapsack problem and was curious to explore how it can be solved, since I wasn't able to come up with a solution.
Problem statement: Given a set of items, each with a weight and a value, and two knapsacks, determine which items to put into the two knapsacks so that each knapsack has a total weight of exactly K and the difference between the sums of values of the two knapsacks is as small as possible. If it's not possible to satisfy the weight constraint for both knapsacks, the algorithm should return nothing.
I think some sort of greedy algorithm might be a satisfying solution, but I'm not sure how to write it.

This can be solved with dynamic programming. Here is an approach that tracks the chosen items with linked lists.
from collections import namedtuple

ListEntry = namedtuple('ListEntry', 'id weight value prev')
Thing = namedtuple('Thing', 'weight value')

def add_entry_to_list(i, e, l):
    return ListEntry(i, l.weight + e.weight, l.value + e.value, l)

def split_entries(entries, target_weight):
    empty_list = ListEntry(None, 0, 0, None)
    dp_soln = {(0, 0): (empty_list, empty_list)}
    for i in range(len(entries)):
        dp_soln_new = {}
        e = entries[i]
        for k, v in dp_soln.items():
            (weight_l, weight_r) = k
            (l_left, l_right) = v
            this_options = {k: v}
            this_options[(weight_l + e.weight, weight_r)] = (add_entry_to_list(i, e, l_left), l_right)
            this_options[(weight_l, weight_r + e.weight)] = (l_left, add_entry_to_list(i, e, l_right))
            for o_k, o_v in this_options.items():
                if target_weight < max(o_k):
                    pass  # Can't lead to (target_weight, target_weight)
                elif o_k not in dp_soln_new:
                    dp_soln_new[o_k] = o_v
                else:
                    diff = o_v[0].value - o_v[1].value
                    existing_diff = dp_soln_new[o_k][0].value - dp_soln_new[o_k][1].value
                    if existing_diff < diff:
                        dp_soln_new[o_k] = o_v
        dp_soln = dp_soln_new
    final_key = (target_weight, target_weight)
    if final_key in dp_soln:
        return dp_soln[final_key]
    else:
        return None

print(split_entries([
    Thing(1, 3),
    Thing(1, 4),
    Thing(2, 1),
    Thing(2, 5),
], 3))
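The two entries returned are linked lists chained through prev. A small helper (my own addition, not part of the answer above, but reusing the same ListEntry shape) can unwind one into the list of chosen item ids:

```python
from collections import namedtuple

ListEntry = namedtuple('ListEntry', 'id weight value prev')

def collect_ids(entry):
    """Walk the prev pointers and return the item ids, oldest first."""
    ids = []
    while entry is not None and entry.id is not None:
        ids.append(entry.id)
        entry = entry.prev
    return list(reversed(ids))

# Build a chain by hand to demonstrate: items 0 and 3 were chosen.
empty = ListEntry(None, 0, 0, None)
chain = ListEntry(3, 3, 8, ListEntry(0, 1, 3, empty))
print(collect_ids(chain))  # [0, 3]
```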

Related

What is the best algo for this array minimization problem?

I have an array, say A, of size n, containing numbers {a1, a2, …, an}, not necessarily distinct.
I have to create another array B = {b1, b2, …, bn} whose elements are all distinct, such that
sum |ai - bi| over all i (i = 1 to n) is minimized.
What is the best algorithm for this?
I tried a greedy approach:
pseudocode:
for i = 0 to n-1 {
    if (a[i] not in b) {
        b[i] = a[i];
    } else {
        cnt = 1
        assigned = false
        do {
            if (a[i]-cnt not in b) {
                b[i] = a[i]-cnt;
                assigned = true
            } elif (a[i]+cnt not in b) {
                b[i] = a[i]+cnt;
                assigned = true
            } else
                cnt++
        } while (assigned == false)
    } // else
} // for loop
Note:
n is an input variable.
The goal is to minimize sum of |ai - bi| over all i.
I came up with an O(N log N) solution. It's based on sorting the input sequence and greedily expanding the available numbers around each duplicate.
Code implementation in Python:
def get_closest_distinct_tuple(X: list):
    X = sorted(X, reverse=True)
    hmap = {}
    used_set = set()
    Y = []
    for x in X:
        if x not in used_set:
            Y.append(x)
            hmap[x] = 1
            used_set.add(x)
        else:
            Y.append(x + hmap[x])
            used_set.add(x + hmap[x])
            hmap[x] = 1 - hmap[x] if hmap[x] < 0 else -hmap[x]
    dist = sum(abs(X[i] - Y[i]) for i in range(len(X)))
    return dist, Y

print(get_closest_distinct_tuple([20, 1, 1, 1, 1, 1, 1]))
Output:
(9, [20, 1, 2, 0, 3, -1, 4])
i.e. dist = 9 and Y = [20, 1, 2, 0, 3, -1, 4].
I couldn't really find a way to prove that this is optimal, though.
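Short of a proof, one way to gain confidence (or find counterexamples) is to compare the greedy result against an exhaustive search on tiny inputs. This is my own hypothetical checker, not part of the answer above; it assumes, without proof, that some optimal B only uses values within len(a) of the input's min and max, and it is only feasible for very small arrays:

```python
from itertools import permutations

def brute_force_min_dist(a, slack=None):
    """Exhaustively try every assignment of distinct integers to a.

    Assumes (plausibly, but unproven here) that some optimal B stays
    within `slack` of min(a)..max(a). Only usable for tiny inputs.
    """
    n = len(a)
    slack = n if slack is None else slack
    candidates = range(min(a) - slack, max(a) + slack + 1)
    best = None
    for b in permutations(candidates, n):
        dist = sum(abs(x - y) for x, y in zip(a, b))
        if best is None or dist < best:
            best = dist
    return best

print(brute_force_min_dist([1, 1, 1, 1]))  # 4
print(brute_force_min_dist([2, 2, 3]))     # 1
```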

Find two disjoint subsequences with minimum difference (<= n)

You are given an array of positive numbers and you need to find two disjoint subsequences whose sums differ by at most n. These subsequences may or may not be contiguous.
For example, if array = [10, 12, 101, 105, 1000] and n = 10,
Ans = [10, 105] and [12, 101].
If the minimum difference is > n then there is no solution, e.g. array = [100, 150, 125] and n = 7.
I assume this could be done using DP but I fail to derive a recurrence.
If the sum of all your elements is not too big (within millions), then you can use a solution similar to the knapsack problem. Your DP state is the difference between two sets; for every element you iterate over all the differences you know so far and update them. Since a difference cannot exceed the sum of all the elements, the complexity ends up being O(n * s), where n is the number of elements and s is their sum. You will need some kind of logic to restore the answer; for example, if all your elements are non-negative, you can just store the previous difference. Here's example Python code (I slightly modified your sample case, because for your case it finds the uninteresting answer [10], [12]):
a = [5, 20, 30, 1000]
n = 7
# state is (sum, hasPositive, hasNegative) -> (prevSum, prevHasP, prevHasN)
states = {(0, False, False): None}
for el in a:
    newStates = dict(states)
    for (s, hasP, hasN) in states:
        if (s + el, True, hasN) not in newStates:
            newStates[(s + el, True, hasN)] = (s, hasP, hasN)
        if (s - el, hasP, True) not in newStates:
            newStates[(s - el, hasP, True)] = (s, hasP, hasN)
    states = newStates
best = None
for (s, hasP, hasN) in states:
    if -n <= s <= n and hasP and hasN and (best is None or abs(best[0]) > abs(s)):
        best = (s, hasP, hasN)
if best is None:
    print("Impossible")
else:
    ans1 = []
    ans2 = []
    while best[1] or best[2]:  # while hasPositive or hasNegative
        prev = states[best]
        delta = best[0] - prev[0]
        if delta > 0:
            ans1.append(delta)
        else:
            ans2.append(-delta)
        best = prev
    print(ans1)
    print(ans2)
As I mentioned before, it will only work if all your elements are non-negative, but it is easy to adjust the code that restores the answer to also work if the elements can be negative.

Algorithm to partition/distribute sum between buckets in all unique ways

The Problem
I need an algorithm that does this:
Find all the unique ways to partition a given sum across 'buckets', not caring about order.
I hope I was reasonably coherent in expressing myself.
Example
For the sum 5 and 3 buckets, what the algorithm should return is:
[5, 0, 0]
[4, 1, 0]
[3, 2, 0]
[3, 1, 1]
[2, 2, 1]
Disclaimer
I'm sorry if this question is a dupe, but I don't know exactly what this sort of problem is called. Still, I searched on Google and SO using all the wordings I could think of, but only found results for distributing in the most even way, not all unique ways.
It's a bit easier for me to code a few lines than to write a five-page essay on the algorithm.
The simplest version to think of:
#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;

vector<int> ans;
int all_my_buckets;

void printAnswer() {
    for (int i = 0; i < (int)ans.size(); i++) printf("%d ", ans[i]);
    for (int i = 0; i < all_my_buckets - (int)ans.size(); i++) printf("0 ");
    printf("\n");
}

void solve(int amount, int buckets, int max) {
    if (amount == 0) { printAnswer(); return; }
    if (amount > buckets * max) return; // we won't be able to fulfill this request anymore
    for (int i = min(max, amount); i >= 1; i--) { // never pick more than the remaining amount
        ans.push_back(i);
        solve(amount - i, buckets - 1, i);
        ans.pop_back();
    }
}
It's also worth improving this to the point where you stack repeated choices, like solve(amount - k*i, buckets - k, i - 1), so you don't create too deep a recursion. (As far as I know, the stack would then be of size O(sqrt(n)).)
Why no dynamic programming?
We don't want just the count of all those possibilities; even if we reach the same point again, we would have to print every single number anyway, so the complexity stays the same.
I hope this helps you a bit; feel free to ask me any questions.
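The C++ recursion above can also be sketched as a Python generator (the names are mine, not from the answer); enforcing non-increasing parts via the cap avoids duplicates, and zeros fall out naturally at the tail:

```python
def bounded_partitions(total, buckets, cap):
    """Yield the non-increasing partitions of `total` into `buckets`
    non-negative parts, each part at most `cap`."""
    if buckets == 0:
        if total == 0:
            yield []
        return
    if total > buckets * cap:
        return  # can't fulfill this request anymore
    for i in range(min(cap, total), -1, -1):
        for rest in bounded_partitions(total - i, buckets - 1, i):
            yield [i] + rest

print(list(bounded_partitions(5, 3, 5)))
# [[5, 0, 0], [4, 1, 0], [3, 2, 0], [3, 1, 1], [2, 2, 1]]
```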
Here's something in Haskell that relies on this answer:
import Data.List (nub, sort)

parts 0 = []
parts n = nub $ map sort $ [n] : [x:xs | x <- [1..n `div` 2], xs <- parts (n - x)]

partitions n buckets =
    let p = filter (\x -> length x <= buckets) $ parts n
    in  map (\x -> if length x == buckets then x else addZeros x) p
  where addZeros xs = xs ++ replicate (buckets - length xs) 0
OUTPUT:
*Main> partitions 5 3
[[5,0,0],[1,4,0],[1,1,3],[1,2,2],[2,3,0]]
If there are only three buckets, this would be the simplest code:
for (int i = 0; i <= 5; i++) {
    for (int j = 0; j <= 5 - i && j <= i; j++) {
        if (5 - i - j <= i && 5 - i - j <= j)
            System.out.println("[" + i + "," + j + "," + (5 - i - j) + "]");
    }
}
A completely different method, but if you don't care about efficiency or optimization, you could always use the old "bucket-free" partition algorithms. Then, you could filter the search by checking the number of zeroes in the answers.
For example [1,1,1,1,1] would be ignored since it has more than 3 buckets, but [2,2,1,0,0] would pass.
This is called an integer partition.
Fast Integer Partition Algorithms is a comprehensive paper describing all of the fastest algorithms for performing an integer partition.
Just adding my approach here along with the others'. It's written in Python, so it's practically like pseudocode.
My first approach worked, but it was horribly inefficient:
def intPart(buckets, balls):
    return uniqify(_intPart(buckets, balls))

def _intPart(buckets, balls):
    solutions = []
    # base case
    if buckets == 1:
        return [[balls]]
    # recursive strategy
    for i in range(balls + 1):
        for sol in _intPart(buckets - 1, balls - i):
            cur = [i]
            cur.extend(sol)
            solutions.append(cur)
    return solutions

def uniqify(seq):
    seen = set()
    sort = [list(reversed(sorted(elem))) for elem in seq]
    return [elem for elem in sort if str(elem) not in seen and not seen.add(str(elem))]
Here's my reworked solution. It completely avoids the need to 'uniquify' the results by tracking the balls in the previous bucket using the max_ variable. This keeps the lists sorted and prevents any dupes:
def intPart(buckets, balls, max_=None):
    # init vars
    sols = []
    if max_ is None:
        max_ = balls
    min_ = max(0, balls - max_)
    # assert stuff
    assert buckets >= 1
    assert balls >= 0
    # base cases
    if buckets == 1:
        if balls <= max_:
            sols.append([balls])
    elif balls == 0:
        sol = [0] * buckets
        sols.append(sol)
    # recursive strategy
    else:
        for there in range(min_, balls + 1):
            here = balls - there
            ways = intPart(buckets - 1, there, here)
            for way in ways:
                sol = [here]
                sol.extend(way)
                sols.append(sol)
    return sols
Just for comprehensiveness, here's another answer stolen from MJD, written in Perl:
#!/usr/bin/perl

sub part {
    my ($n, $b, $min) = @_;
    $min = 0 unless defined $min;

    # base case
    if ($b == 0) {
        if ($n == 0) { return ([]) }
        else         { return ()   }
    }

    my @partitions;
    for my $first ($min .. $n) {
        my @sub_partitions = part($n - $first, $b - 1, $first);
        for my $sp (@sub_partitions) {
            push @partitions, [$first, @$sp];
        }
    }
    return @partitions;
}

Balanced partition

I know this has been talked over a lot here, but I am struggling with this problem.
We have a set of numbers, e.g. [3, 1, 1, 2, 2, 1], and we need to break it into two subsets so that each sum is equal, or the difference is minimal.
I've seen the Wikipedia entry, this page (problem 7), and a blog entry.
But every algorithm listed gives only a YES/NO result and I really don't understand how to use them to print out the two subsets (e.g. S1 = {5, 4} and S2 = {5, 3, 3}). What am I missing here?
The pseudo-polynomial algorithm is designed to answer the decision problem, not the optimization problem. However, note that the last row in the table of booleans in the example indicates whether the current set is capable of summing to N/2.
In the last row, take the first column where the boolean value is true, and check what the actual summed value for that column is. If that value is N/2, you have found the first set of the partition. Otherwise, you have to check which set can make up the difference to N/2; you can use the same approach as above, this time for the difference d.
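To make the reconstruction concrete, here is a hedged sketch of mine (not from the linked references): a subset-sum DP that keeps a back-pointer per reachable sum, so one subset can be recovered by walking the pointers back:

```python
def balanced_partition(nums):
    """Split nums into two subsets minimizing the difference of their sums.

    reachable maps each achievable subset sum to the (previous_sum, index)
    pair that first produced it, so the items themselves can be recovered.
    """
    reachable = {0: None}
    for i, x in enumerate(nums):
        updates = {}
        for s in reachable:
            if s + x not in reachable:
                updates[s + x] = (s, i)
        reachable.update(updates)
    # the best subset sum is the reachable value closest to half the total
    half = sum(nums) // 2
    best = max(s for s in reachable if s <= half)
    # walk the back-pointers to recover the indices of one subset
    chosen = set()
    s = best
    while reachable[s] is not None:
        prev, i = reachable[s]
        chosen.add(i)
        s = prev
    s1 = [nums[i] for i in chosen]
    s2 = [nums[i] for i in range(len(nums)) if i not in chosen]
    return s1, s2

s1, s2 = balanced_partition([3, 1, 1, 2, 2, 1])
print(sorted(s1), sorted(s2), abs(sum(s1) - sum(s2)))
```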
This will be O(2^N). No dynamic programming is used here. You can print result1, result2, and the difference after executing the function. I hope this helps.
vector<int> p1, p2;
vector<int> result1, result2;
vector<int> array = {12, 323, 432, 4, 55, 223, 45, 67, 332, 78, 334, 23, 5, 98, 34, 67, 4, 3, 86, 99, 78, 1};

void partition(unsigned int i, long &diffsofar, long sum1, long sum2)
{
    if (i == array.size())
    {
        long diff = abs(sum1 - sum2);
        if (diffsofar > diff)
        {
            result1 = p1;
            result2 = p2;
            diffsofar = diff;
        }
        return;
    }
    p1.push_back(array[i]);
    partition(i + 1, diffsofar, sum1 + array[i], sum2);
    p1.pop_back();

    p2.push_back(array[i]);
    partition(i + 1, diffsofar, sum1, sum2 + array[i]);
    p2.pop_back();
    return;
}
I faced this same problem recently, and I posted a question about it (here: Variant of Knapsack). The difference in my case is that the resulting subsets must be the same size (if the original set has an even number of elements). In order to ensure that, I added a few lines to @Sandesh Kobal's answer:
void partition(unsigned int i, long &diffsofar, long sum1, long sum2)
{
    size_t maxsize = (array.size() + 1) / 2;
    if (p1.size() > maxsize)
        return;
    if (p2.size() > maxsize)
        return;
    if (i == array.size())
    {
    ...
Also, after both calls to partition, I added if(diffsofar==0) return;. If we already found an optimal solution, it makes no sense to keep searching...
All the articles I've seen take a dynamic programming approach. Do we really need one?
Suppose the given array is arr.
Use the following algorithm:
Sort the array in descending order
Create two empty arrays, a = [] and b = []
sum_a = sum_b = 0
for x in arr:
    if sum_a > sum_b:
        b.append(x)
        sum_b += x
    else:
        a.append(x)
        sum_a += x
The absolute difference between sum_a and sum_b is the difference between the two subsets this greedy split produces; note, however, that it is not always the minimum possible difference.
Consider arr = [3,1,1,2,2,1]
Sorting the array : arr = [3,2,2,1,1,1]
a = [], b = []
a = [3], b = []
a = [3], b = [2]
a = [3], b = [2,2]
a = [3,1], b = [2,2]
a = [3,1,1], b = [2,2]
a = [3,1,1], b = [2,2,1]
sa = 5, sb = 5
Minimum difference : 5 - 5 = 0
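For what it's worth, a quick sketch of my own (not part of the answer above) shows the greedy split can miss the optimum, e.g. on [4, 3, 3, 2, 2], where {4, 3} vs {3, 2, 2} gives difference 0:

```python
def greedy_split(arr):
    """The greedy heuristic described above: always give the next-largest
    element to the currently lighter subset."""
    a, b = [], []
    sum_a = sum_b = 0
    for x in sorted(arr, reverse=True):
        if sum_a > sum_b:
            b.append(x)
            sum_b += x
        else:
            a.append(x)
            sum_a += x
    return a, b

a, b = greedy_split([4, 3, 3, 2, 2])
print(a, b, abs(sum(a) - sum(b)))  # difference 2, but the optimum is 0
```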

Randomly Generate a set of numbers of n length totaling x

I'm working on a project for fun and I need an algorithm that does the following:
Generate a list of numbers of length n which add up to x.
I would settle for a list of integers, but ideally I would like to be left with a set of floating point numbers.
I would be very surprised if this problem hasn't been heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Before, I've generated different combinations of a list of numbers that add up to x. I'm sure that I could simply brute-force this problem, but that hardly seems like the ideal solution.
Does anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be of length n while the numbers themselves can be of any size.
Edit 2: Sorry for my improper use of 'set'; I was using it as a catch-all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random
def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices; i.e., the distribution you get will tend to be more "egalitarian" than it would be if you picked uniformly at random among all lists with the given sum.
To see why, picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1): the point P obtained by picking two random numbers is uniform inside the square [0,1]x[0,1], and the point Q is obtained by scaling P so that its coordinates sum to 1. Points close to the middle of the target diagonal have a higher probability: the exact center of the diagonal is reached by projecting any point on the segment (0,0)-(1,1), while the point (0,1) is reached only by projecting points from the segment (0,0)-(0,1). The diagonal's length is sqrt(2) = 1.4142..., while the square side is only 1.0.
Actually, you need to generate a random composition (an ordered partition) of x into n parts. This is usually done in the following way: a composition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders in some of them, and stones in the rest. The stone groups between the borders add up to x, so the number of possible compositions is the binomial coefficient C(n + x - 1, n - 1).
So your algorithm could be as follows: choose an arbitrary (n - 1)-subset of an (n + x - 1)-set; it uniquely determines a composition of x into n parts.
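A sketch of that stars-and-bars sampler in Python (the function name is mine):

```python
import random

def random_composition(x, n):
    """Uniformly sample n non-negative integers summing to x:
    pick n - 1 border positions among n + x - 1 slots; the gap sizes
    between consecutive borders are the parts."""
    borders = sorted(random.sample(range(n + x - 1), n - 1))
    parts = []
    prev = -1
    for b in borders:
        parts.append(b - prev - 1)
        prev = b
    parts.append((n + x - 2) - prev)
    return parts

p = random_composition(10, 4)
print(p, sum(p))
```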
In Knuth's TAOCP, section 3.4.2 discusses random sampling; see Algorithm S there.
Algorithm S (choose n arbitrary records from a total of N):
1. Set t = 0, m = 0.
2. Generate u, random and uniformly distributed on (0, 1).
3. If (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample and increase both m and t by 1.
4. If m < n, return to step 2; otherwise the algorithm is finished.
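A direct transcription of those steps in Python (my sketch; termination is guaranteed because when N - t equals n - m the test always includes the record):

```python
import random

def algorithm_s(n, N):
    """Selection sampling (Knuth TAOCP 3.4.2, Algorithm S):
    choose n of the records 0..N-1, each n-subset equally likely."""
    sample = []
    t = 0
    while len(sample) < n:
        u = random.random()
        if (N - t) * u >= n - len(sample):
            t += 1            # skip record t
        else:
            sample.append(t)  # include record t
            t += 1
    return sample

s = algorithm_s(3, 10)
print(s)
```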
The solution for non-integers is algorithmically trivial: just pick n arbitrary numbers that don't sum to 0 and normalize them by their sum.
If you want to sample uniformly in the region of (N-1)-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it in Python:
xs = [random.gammavariate(1,1) for a in range(N)]
xs = [x*v/sum(xs) for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in JavaScript:
function getRandomArbitrary(min, max) {
    return Math.random() * (max - min) + min;
}

function getRandomArray(min, max, n) {
    var arr = [];
    for (var i = 0; i < n; i++) {
        arr.push(getRandomArbitrary(min, max));
    }
    return arr;
}

function randomValuesPrescribedSum(min, max, n, total) {
    var arr = getRandomArray(min, max, n);
    var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
    var k = total / sum;
    return arr.map(function(x) { return k * x; });
}
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random

def parts(total_sum, num_parts):
    points = [random.random() for i in range(num_parts - 1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i - 1]) * total_sum)
    return ret

def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)

test(5.5, 3)
test(10, 1)
test(10, 5)
In python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time

TOTAL = 15
PARTS = 4
PLACES = 3

def random_sum_split(parts, total, places):
    a = [0, total] + [random.random() * total for i in range(parts - 1)]
    a.sort()
    b = [(a[i] - a[i - 1]) for i in range(1, parts + 1)]
    if places is None:
        return b
    else:
        b.pop()
        c = [round(x, places) for x in b]
        c.append(round(total - sum(c), places))
        return c

def log(msg):
    # stand-in for the original scripting environment's log() helper
    print('[%s] %s' % (time.strftime('%Y-%m-%d %H:%M:%S'), msg))

def tick():
    start = time.time()
    alpha = random_sum_split(PARTS, TOTAL, PLACES)
    end = time.time()
    log('alpha: %s' % alpha)
    log('total: %.7f' % sum(alpha))
    log('parts: %s' % PARTS)
    log('places: %s' % PLACES)
    log('elapsed: %.7f' % (end - start))

tick()
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.
