Partition a Set into k Disjoint Subsets - algorithm

Given a set S, partition the set into k disjoint subsets such that the difference of their sums is minimal.
Say S = {1,2,3,4,5} and k = 2; the answer is { {3,4}, {1,2,5} }, since their sums {7,8} have minimal difference. For S = {1,2,3} and k = 2 it will be { {1,2}, {3} }, since the difference in sums is 0.
The problem is similar to The Partition Problem from The Algorithm Design Manual, except that Steven Skiena discusses a method to solve it without rearrangement.
I was going to try simulated annealing, so I was wondering if there is a better method?
Thanks in advance.

The pseudo-polynomial-time algorithm for the knapsack (subset-sum) problem can be used for k = 2. The best we can hope for is a subset summing to sum(S)/2. Run the subset-sum algorithm, with arr[0] = true initially and the sums iterated downward so each element is used at most once:
for s in S:
    for i in sum(S) - s down to 0:
        if arr[i] then arr[i + s] = true
then look at sum(S)/2, followed by sum(S)/2 +/- 1, etc., until you find a reachable sum.
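Here is a small runnable sketch of that idea (my own restatement of the answer above, assuming each element may be used at most once):

def best_two_way_split(S):
    total = sum(S)
    reachable = [False] * (total + 1)   # reachable[i] is True if some subset sums to i
    reachable[0] = True
    for s in S:
        # iterate downward so each element is counted at most once
        for i in range(total - s, -1, -1):
            if reachable[i]:
                reachable[i + s] = True
    # the reachable sum closest to total / 2 gives the minimal difference
    best = min((i for i in range(total + 1) if reachable[i]),
               key=lambda i: abs(total - 2 * i))
    return abs(total - 2 * best)

print(best_two_way_split([1, 2, 3, 4, 5]))   # -> 1, e.g. {3, 4} vs {1, 2, 5}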
For k >= 3 I believe this is NP-complete, like the 3-partition problem.
The simplest way to do it for k >= 3 is just to brute-force it; here's one way, though I'm not sure it's the fastest or cleanest.
import copy

arr = [1, 2, 3, 4]

def t(k, accum, index):
    print(accum, k)
    if index == len(arr):
        if k == 0:
            return copy.deepcopy(accum)
        else:
            return []
    element = arr[index]
    result = []
    for set_i in range(len(accum)):
        if k > 0:
            # start a new subset containing just this element
            clone_new = copy.deepcopy(accum)
            clone_new[set_i].append([element])
            result.extend(t(k - 1, clone_new, index + 1))
        for elem_i in range(len(accum[set_i])):
            # or add the element to an existing subset
            clone_new = copy.deepcopy(accum)
            clone_new[set_i][elem_i].append(element)
            result.extend(t(k, clone_new, index + 1))
    return result

print(t(3, [[]], 0))
Simulated annealing might be good, but since the 'neighbors' of a particular solution aren't really clear, a genetic algorithm might be better suited to this. You'd start out by randomly picking a group of subsets and 'mutate' by moving numbers between subsets.

If the sets are large, I would definitely go for stochastic search. I don't know exactly what spinning_plate means by writing that "the neighborhood is not clearly defined". Of course it is: you either move one item from one set to another, or swap items between two different sets, and that is a simple neighborhood. I would use both operations in stochastic search (which in practice could be tabu search or simulated annealing).
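As a rough illustration (my own, not from either answer), those two moves could serve as the neighbor function of a local search over a k-way partition:

import random

def neighbor(partition):
    # partition is a list of k lists (k >= 2); returns a copy with one random move applied
    new = [list(part) for part in partition]
    non_empty = [idx for idx, part in enumerate(new) if part]
    i = random.choice(non_empty)                                     # source part
    j = random.choice([idx for idx in range(len(new)) if idx != i])  # target part
    if new[j] and random.random() < 0.5:
        # swap one item between the two parts
        a, b = random.randrange(len(new[i])), random.randrange(len(new[j]))
        new[i][a], new[j][b] = new[j][b], new[i][a]
    else:
        # move one item from the source part to the target part
        new[j].append(new[i].pop(random.randrange(len(new[i]))))
    return new

def cost(partition):
    sums = [sum(part) for part in partition]
    return max(sums) - min(sums)   # 0 means the subset sums are perfectly balanced

The annealing or tabu loop itself would repeatedly call neighbor(), keep the move if cost() improves (or sometimes when it doesn't), and stop after a fixed budget.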

Related

Dynamic programming from Cormen's book

While reading about dynamic programming in "Introduction to Algorithms" by Cormen, Chapter 15: Dynamic Programming, I came across this statement:
When developing a dynamic-programming algorithm, we follow a sequence of four steps:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution, typically in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Steps 1–3 form the basis of a dynamic-programming solution to a problem. If we
need only the value of an optimal solution, and not the solution itself, then we
can omit step 4. When we do perform step 4, we sometimes maintain additional
information during step 3 so that we can easily construct an optimal solution.
I did not understand the difference between steps 3 and 4:
computing the value of an optimal solution
versus
constructing the optimal solution.
I was expecting to understand this by reading further, but failed to.
Can someone help me understand this by giving an example?
Suppose we are using dynamic programming to work out whether there is a subset of [1,3,4,6,10] that sums to 9.
The answer to step 3 is the value, in this case "TRUE".
The answer to step 4 is working out the actual subset that sums to 9, in this case "3+6".
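As an illustration (mine, not part of the original answer), here is how that split might look in code: step 3 fills a reachability table, and step 4 uses extra bookkeeping recorded during step 3 to reconstruct an actual subset.

def subset_sum(nums, target):
    # Step 3: reachable[s] is True if some subset of nums sums to s (the "value").
    reachable = [True] + [False] * target
    # Extra bookkeeping for step 4: the last number used to reach each sum.
    used = [None] * (target + 1)
    for n in nums:
        for s in range(target, n - 1, -1):
            if reachable[s - n] and not reachable[s]:
                reachable[s] = True
                used[s] = n
    if not reachable[target]:
        return False, None
    # Step 4: walk the bookkeeping backwards to construct the subset itself.
    subset, s = [], target
    while s > 0:
        subset.append(used[s])
        s -= used[s]
    return True, subset

print(subset_sum([1, 3, 4, 6, 10], 9))   # -> (True, [6, 3]), i.e. 3 + 6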
In dynamic programming we most of the time end up with a huge results hash. Initially it only contains the result of the first, smallest, simplest (bottom) case, and by building further results on top of those we eventually reach the target. At that point the last item in the hash is usually the target (step 3 completed); then we have to process it to get the desired result.
A perfect example is finding the minimum number of cubes that sum to a target: if the target is 500 we should get [5,5,5,5], and if the target is 432 we should get [6,6].
So we can implement this task in JS as follows:
function getMinimumCubes(tgt){
  var maxi = Math.floor(Math.fround(Math.pow(tgt, 1/3))),
      hash = {0: [[]]},
      cube = 0;
  for (var i = 1; i <= maxi; i++){
    cube = i*i*i;
    for (var j = 0; j <= tgt - cube; j++){
      hash[j+cube] = hash[j+cube] ? hash[j+cube].concat(hash[j].map(e => e.concat(i)))
                                  : hash[j].map(e => e.concat(i));
    }
  }
  return hash[tgt].reduce((p,c) => p.length < c.length ? p : c);
}

var target = 432,
    result = [];

console.time("perf:");
result = getMinimumCubes(target);
console.timeEnd("perf:");
console.log(result);
So in this code, hash = {0:[[]]} is step 1; the nested for loops that eventually build up hash[tgt] are in fact step 3; and the .reduce() call at the return stage is step 4, since it shapes the last item of the hash (hash[tgt]) into the desired result by selecting the shortest among all combinations that sum to the target value.
To me, step 2 is somewhat unclear, not only because of the mention of recursion but also in what it means. Besides, I have never used nor seen a recursive approach in dynamic programming; it's best implemented with while or for loops.

Why should we use Dynamic Programming with Memoization in order to solve - Minimum Number of Coins to Make Change

The Problem Statement:
Given an infinite supply of coins of values {C1, C2, ..., Cn} and a sum X, find the minimum number of coins needed to represent the sum X.
Most of the solutions on the web include dynamic programming with memoization. Here is an example from Youtube: https://www.youtube.com/watch?v=Kf_M7RdHr1M
My question is: why don't we sort the array of coins in descending order first and explore recursively, reducing the sum until we reach 0? When we reach 0, we know that we have found the coins needed to make up the sum. Because we sorted the array in descending order, we know that we always choose the greatest coin that fits; therefore, the first time the sum reaches 0, the count has to be minimal.
I'd greatly appreciate it if you could help me understand the complexity of my algorithm and compare it to the dynamic programming with memoization approach.
For simplicity, we are assuming there will always be a "$1" coin, so there is always a way to make up the sum.
import java.util.*;

public class Solution {
    public static void main(String[] args) {
        MinCount cnt = new MinCount(new Integer[]{1, 2, 7, 9});
        System.out.println(cnt.count(12));
    }
}

class MinCount {
    Integer[] coins;

    public MinCount(Integer[] coins) {
        Arrays.sort(coins, Collections.reverseOrder());
        this.coins = coins;
    }

    public int count(int sum) {
        if (sum < 0) return Integer.MAX_VALUE;
        if (sum == 0) return 0;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < coins.length; i++) {
            int val = count(sum - coins[i]);
            if (val < min) min = val;
            if (val != Integer.MAX_VALUE) break;
        }
        return min + 1;
    }
}
Suppose that you have coins worth $1, $50, and $52, and that your total is $100. Your proposed algorithm would produce a solution that uses 49 coins ($52 + $1 + $1 + … + $1 + $1); but the correct minimum result requires only 2 coins ($50 + $50).
(Incidentally, I think it's cheating to write
For simplicity, we are assuming there will always be a "$1" coin, so there is always a way to make up the sum.
when this is not in the problem statement, and therefore not assumed in other sources. That's a bit like asking "Why do sorting algorithms always put a lot of effort into rearranging the elements, instead of just assuming that the elements are in the right order to begin with?" But as it happens, even assuming the existence of a $1 coin doesn't let you guarantee that the naïve/greedy algorithm will find the optimal solution.)
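For what it's worth, here is a small sketch (mine, not from the answer above) that checks the counter-example, comparing a greedy scan of the coins in descending order against a memoized recursion:

from functools import lru_cache

coins = (52, 50, 1)   # sorted in descending order, as in the question

def greedy(amount):
    # always take as many of the largest coin as possible
    count = 0
    for c in coins:
        count += amount // c
        amount %= c
    return count

@lru_cache(maxsize=None)
def min_coins(amount):
    # dynamic programming with memoization over the remaining amount
    if amount == 0:
        return 0
    return 1 + min(min_coins(amount - c) for c in coins if c <= amount)

print(greedy(100))      # -> 49  (52 + 48 * 1)
print(min_coins(100))   # -> 2   (50 + 50)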
I will complement the answer that has already been provided to your question with some algorithm design advice.
The solution that you propose is what is called a "greedy algorithm": a problem solving strategy that makes the locally optimal choice at each stage with the hope of finding a global optimum.
In many problems, a greedy strategy does not produce an optimal solution. The best way to disprove the correctness of an algorithm is to find a counter-example, such as the case of the "$52", "$50", and "$1" coins. To find counter-examples, Steven Skiena gives the following advice in his book "The Algorithm Design Manual":
Think small: when an algorithm fails, there is usually a very simple example on which it fails.
Hunt for the weakness: if the proposed algorithm is of the form "always take the biggest" (that is, a greedy algorithm), think about why that might prove to be the wrong thing to do. In particular, ...
Go for a tie: A devious way to break a greedy algorithm is to provide instances where everything is the same size. This way the algorithm may have nothing to base its decision on.
Seek extremes: many counter-examples are mixtures of huge and tiny, left and right, few and many, near and far. It is usually easier to verify or reason about extreme examples than more muddled ones.
# Recursive solution in Python
import sys

class Solution:
    def __init__(self):
        self.ans = 0
        self.maxint = sys.maxsize - 1

    def minCoins(self, coins, m, v):
        res = self.solve(coins, m, v)
        if res == sys.maxsize - 1:
            return -1
        return res

    def solve(self, coins, m, v):
        if m == 0 and v > 0:
            return self.maxint
        if v == 0:
            return 0
        if coins[m-1] <= v:
            self.ans = min(self.solve(coins, m, v - coins[m-1]) + 1,
                           self.solve(coins, m-1, v))
            return self.ans
        else:
            self.ans = self.solve(coins, m-1, v)
            return self.ans

Forming Dynamic Programming algorithm for a variation of Knapsack Problem

I was thinking I wanted to do a variation on the Knapsack Problem.
Imagine the original problem, with items with various weights/value.
My version will, along with having the normal weights/values, contain a "group" value.
eg.
Item1[5kg, $600, electronic]
Item2[1kg, $50, food]
Now, given a set of items like this, how would I code up the knapsack problem to make sure that at most one item from each "group" is selected?
Notes:
You don't need to choose an item from every group
There are multiple items in each group
You're still minimizing weight, maximizing value
The number of groups is predefined, along with their values.
I'm just writing a draft of the code out at this stage, and I've chosen to use a dynamic approach. I understand the idea behind the dynamic solution for the regular knapsack problem, how do I alter this solution to incorporate these "groups"?
KnapSackVariation(v, w, g, n, W)
{
    for (w = 0 to W)
        V[0, w] = 0;
    for (i = 1 to n)
        for (w = 0 to W)
            if (w[i] <= w)
                V[i, w] = max{ V[i-1, w], v[i] + V[i-1, w - w[i]] };
            else
                V[i, w] = V[i-1, w];
    return V[n, W];
}
That's what I have so far; I still need to change it so that, whenever it takes an item, the remaining items from that item's group are excluded.
I just noticed your question while trying to find an answer to a question of my own. The problem you've stated is a well-known and well-studied problem called the Multiple Choice Knapsack Problem (MCKP). If you google that you'll find all sorts of information, and I can also recommend this book: http://www.amazon.co.uk/Knapsack-Problems-Hans-Kellerer/dp/3642073115/ref=sr_1_1?ie=UTF8&qid=1318767496&sr=8-1, which dedicates a whole chapter to the problem. In the classic formulation of MCKP you have to choose exactly one item from each group, but you can easily convert your version to that one by adding a dummy item with profit and weight 0 to each group, and the same algorithms will work. I would caution you against trying to adapt code for the binary knapsack problem to the MCKP with a few tweaks; that approach is likely to lead to a solution whose performance degrades unacceptably as the number of items in each group increases.
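To make the group constraint concrete, here is a rough sketch (my own, not from the book or the answer above) of a DP that allows at most one item per group, which matches the MCKP formulation with the dummy zero-weight, zero-value item; the input format is made up for the example:

def group_knapsack(groups, capacity):
    # groups: {group_name: [(weight, value), ...]}, capacity: max total weight
    best = [0] * (capacity + 1)   # best value per capacity using the groups seen so far
    for items in groups.values():
        new_best = best[:]        # skipping the group entirely = the implicit dummy item
        for weight, value in items:
            for w in range(weight, capacity + 1):
                # extending `best` (the table from before this group) guarantees
                # that at most one item of the current group is ever counted
                cand = best[w - weight] + value
                if cand > new_best[w]:
                    new_best[w] = cand
        best = new_best
    return best[capacity]

items = {"electronic": [(5, 600)], "food": [(1, 50)]}
print(group_knapsack(items, 6))   # -> 650 (one electronic item plus one food item)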
Assume
c[i] : The category of the ith element
V[i,w,S] : Maximum value of the knapsack such that it contains at most one item from each category in S
Recursive Formulation
V[i,w,S] = max(V[i-1,w,S],V[i,w-w[i],S-{c[i]}] + v[i])
Base Case
V[0,w,S] = -infinity if w != 0 or S != {}

My naive maximal clique finding algorithm runs faster than Bron-Kerbosch's. What's wrong?

In short, my naive code (in Ruby) looks like:
# $seen is a hash to memoize previously seen sets
# $sparse is a hash of usernames to a list of neighboring usernames
# $sets is the list of output clusters
$seen = {}

def subgraph(set, adj)
  hash = (set + adj).sort
  return if $seen[hash]
  $sets.push set.sort.join(", ") if adj.empty? and set.size > 2
  adj.each { |node| subgraph(set + [node], $sparse[node] & adj) }
  $seen[hash] = true
end

$sparse.keys.each do |vertex|
  subgraph([vertex], $sparse[vertex])
end
And my Bron Kerbosch implementation:
def bron_kerbosch(set, points, exclude)
  $sets.push set.sort.join(', ') if set.size > 2 and exclude.empty? and points.empty?
  points.each_with_index do |vertex, i|
    points[i] = nil
    bron_kerbosch(set + [vertex],
                  points & $sparse[vertex],
                  exclude & $sparse[vertex])
    exclude.push vertex
  end
end

bron_kerbosch [], $sparse.keys, []
I also implemented pivoting and degeneracy ordering, which cut down on bron_kerbosch execution time, but not enough to overtake my initial solution. It seems wrong that this is the case; what algorithmic insight am I missing? Here is a writeup with more detail if you need to see fully working code. I've tested this on pseudo-random sets up to a million or so edges in size.
I don't know how you generate the random graphs for your tests, but I suppose you use a function that generates numbers according to a uniform distribution, so you obtain a very homogeneous graph. That's a common problem when testing algorithms on graphs: it is very difficult to create good test cases (it's often as hard as solving the original problem).
The max-clique problem is a well-known NP-hard problem, and both algorithms (the naive one and Bron-Kerbosch) have the same worst-case complexity, so we can't expect a global improvement on all test cases, only improvements on particular cases. But because you used a uniform distribution to generate your graphs, you don't hit those particular cases.
That's why the performance of both algorithms is very similar on your data, and because the Bron-Kerbosch algorithm is a little more complex than the naive one, the naive one is faster.

Algorithm for solving set problem

If I have a set of values (which I'll call x), and a number of subsets of x:
What is the best way to work out all possible combinations of subsets whose union is equal to x, but none of which intersect with each other?
An example might be:
if x is the set of the numbers 0 to 100, and I have four subsets:
a = 0-49
b = 50-100
c = 50-75
d = 76-100
then the possible combinations would be:
a + b
a + c + d
What you describe is called the Exact cover problem. The general solution is Knuth's Algorithm X, with the Dancing Links algorithm being a concrete implementation.
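For illustration, here is a small sketch (mine, using plain Python sets rather than Knuth's Dancing Links structure) of the Algorithm X search; the function and input names are made up for the example:

def solve_exact_cover(universe, subsets, partial=None):
    # subsets: {name: set_of_elements}; yields lists of names that exactly cover universe
    if partial is None:
        partial = []
    if not universe:
        yield list(partial)
        return
    # pick the element covered by the fewest remaining subsets (Knuth's heuristic)
    element = min(universe, key=lambda e: sum(1 for s in subsets.values() if e in s))
    for name, s in subsets.items():
        if element not in s:
            continue
        # keep only the subsets disjoint from the chosen one, shrink the universe, recurse
        remaining = {n: t for n, t in subsets.items() if not (t & s)}
        partial.append(name)
        yield from solve_exact_cover(universe - s, remaining, partial)
        partial.pop()

subsets = {'a': set(range(0, 50)), 'b': set(range(50, 100)),
           'c': set(range(50, 75)), 'd': set(range(75, 100))}
for cover in solve_exact_cover(set(range(100)), subsets):
    print(cover)   # ['a', 'b'] and ['a', 'c', 'd']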
Given a well-order on the elements of x (make one up if necessary; this is always possible for finite or countable sets):
Let "sets chosen so far" be empty. Consider the smallest element of x. Find all sets which contain that element and which do not intersect with any of the sets chosen so far. For each such set in turn, recurse, adding the chosen set to "sets chosen so far" and looking at the smallest element of x not in any chosen set. If you reach a point where there is no element of x left, then you've found a solution. If you reach a point where no unchosen set contains the element you're looking for without intersecting one of the sets you already selected, then you've failed to find a solution, so backtrack.
This uses stack space proportional to the number of non-intersecting subsets, so watch out for that. It also uses a lot of time; you can be far more efficient if, as in your example, the subsets are all contiguous ranges.
Here's a bad way (recursive, and it does a lot of redundant work), but at least it's actual code and is probably halfway to the "efficient" solution.
def unique_sets(sets, target):
    if not sets and not target:
        yield []
    for i, s in enumerate(sets):
        intersect = s.intersection(target) and not s.difference(target)
        sets_without_s = sets[:i] + sets[i+1:]
        if intersect:
            for us in unique_sets(sets_without_s, target.difference(s)):
                yield us + [s]
        else:
            for us in unique_sets(sets_without_s, target):
                yield us

class named_set(set):
    def __init__(self, items, name):
        set.__init__(self, items)
        self.name = name

    def __repr__(self):
        return self.name

a = named_set(range(0, 50), name='a')
b = named_set(range(50, 100), name='b')
c = named_set(range(50, 75), name='c')
d = named_set(range(75, 100), name='d')

for s in unique_sets([a, b, c, d], set(range(0, 100))):
    print(s)
A way (may not be the best way) is:
Create a set of all the pairs of subsets which overlap.
For every combination of the original subsets, reject the combination if it contains one or more of the pairs listed in step 1; otherwise accept it if the union of the subsets equals x (i.e. if the total number of elements in the subsets equals the size of x).
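A short sketch (mine, not the answerer's code) of that approach, using the subsets from the question:

from itertools import combinations

def exact_covers_bruteforce(x, subsets):
    # step 1: every pair of subsets that overlaps
    overlapping = {(i, j) for i, j in combinations(range(len(subsets)), 2)
                   if subsets[i] & subsets[j]}
    covers = []
    # step 2: check every combination of subsets
    for r in range(1, len(subsets) + 1):
        for combo in combinations(range(len(subsets)), r):
            if any(pair in overlapping for pair in combinations(combo, 2)):
                continue                          # contains an overlapping pair
            if set().union(*(subsets[i] for i in combo)) == x:
                covers.append(combo)              # disjoint, and covers all of x
    return covers

subsets = [set(range(0, 50)), set(range(50, 100)),
           set(range(50, 75)), set(range(75, 100))]
print(exact_covers_bruteforce(set(range(100)), subsets))   # -> [(0, 1), (0, 2, 3)]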
The actual algorithm seems largely dependent on the choice of subsets, product operation, and equate operation. For addition (+), it seems like you could find a summation to suit your needs (the sum of 1 to 100 is similar to your a + b example). If you can do this, your algorithm is obviously O(1).
If you have a tougher product or equate operator (let's say taking a product of two terms means summing the strings and finding the SHA-1 hash), you may be stuck doing nested loops, which would be O(n^x) where x is the number of terms/variables.
Depending on the subsets you have to work with, it might be advantageous to use a more naive algorithm, one where you don't have to compare entire subsets but only their upper and lower bounds.
If you are talking about random subsets, not necessarily ranges, then Nick Johnson's suggestion will probably be the best choice.
