Smallest set of multi-sets that together contain all numbers from 1 to N - algorithm

Let's assume that we have only integers whose values are in the range 1 to N. Next, we split them into K-element multi-sets. How would you find the set containing the smallest possible number of those multi-sets such that their union contains all numbers from 1 to N? In case of ambiguity, any set that matches the criteria (the first one found) is an acceptable answer.
For instance, we have N = 9, K = 3
(1,2,3)(4,5,6)(7,8,8)(8,7,6)(1,9,2)(4,4,3)
The smallest number of multi-sets that together contain all the numbers from 1 to 9 is 4, and the answer can be either (1,2,3)(4,5,6)(7,8,8)(1,9,2) or (1,2,3)(4,5,6)(8,7,6)(1,9,2).
Any ideas for an efficient algorithm to find such a set?
PS
After writing this, I found yet another 4-element set: (4,5,6)(1,9,2)(4,4,3)(7,8,8) or (4,5,6)(1,9,2)(4,4,3)(8,7,6). But, as I said, an algorithm that finds any minimum set would be fine.

Your question is a restricted version of the classic Set Covering problem, but it is still easy to show that it is NP-hard.
Any approximation technique for this problem would be reasonable here. In particular, the greedy solution (repeatedly choosing the subset that covers the most uncovered items) is especially easy to implement.
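A minimal Python sketch of that greedy rule, run on the example from the question; the function name greedy_cover is hypothetical. Greedy is only an approximation and is not guaranteed to find a minimum cover, although on this input it happens to pick 4 multi-sets.

```python
def greedy_cover(n, multisets):
    """Greedy set cover: indices of chosen multi-sets, or None if 1..n can't be covered."""
    uncovered = set(range(1, n + 1))
    # Repeated values inside a multi-set add no coverage, so plain sets suffice.
    sets = [set(m) for m in multisets]
    chosen = []
    while uncovered:
        gains = [len(s & uncovered) for s in sets]
        if not gains or max(gains) == 0:
            return None  # some number in 1..n appears in no multi-set
        best = gains.index(max(gains))  # ties go to the first multi-set
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


example = [(1, 2, 3), (4, 5, 6), (7, 8, 8), (8, 7, 6), (1, 9, 2), (4, 4, 3)]
print(greedy_cover(9, example))  # [0, 1, 2, 4] -> (1,2,3)(4,5,6)(7,8,8)(1,9,2)
```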

This problem, as Ami Tavroy said, is NP-hard, by reduction from 3-dimensional matching.
To do the reduction, consider the restricted decision variant of 3-dimensional matching in which it becomes an exact cover problem:
...given a set T and an integer k, decide whether there exists a 3-dimensional matching M ⊆ T with |M| ≥ k. ... The problem is NP-complete even in the special case that k = |X| = |Y| = |Z|.[1][4][5] In this case, a 3-dimensional (dominating) matching is not only a set packing but also an exact cover: the set M covers each element of X, Y, and Z exactly once.[6]
This variant could be solved in polynomial time if the original question could be: turn each triple of T into one 3-element multi-set (there are at most O(N^3) triples), compute a minimum cover, and check whether it uses exactly N / 3 multi-sets, where N = |X| + |Y| + |Z|. Thus, by reduction, the original question is also NP-hard.
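A minimal Python sketch of this reduction, assuming X, Y and Z are each labelled 0..q-1 and T is given as a list of (x, y, z) triples; all function names are hypothetical. The cover search is exhaustive (exponential), so it only illustrates the correspondence on tiny inputs.

```python
from itertools import combinations


def three_dm_to_multisets(q, triples):
    """Relabel X, Y, Z (each 0..q-1) into the disjoint ranges 1..q, q+1..2q, 2q+1..3q."""
    return [(x + 1, q + y + 1, 2 * q + z + 1) for x, y, z in triples]


def min_cover_size(n, multisets):
    """Exhaustive search for the fewest multi-sets whose union is {1, ..., n}."""
    universe = set(range(1, n + 1))
    if not universe:
        return 0
    for size in range(1, len(multisets) + 1):
        for combo in combinations(multisets, size):
            if set().union(*combo) == universe:
                return size
    return None


def has_perfect_matching(q, triples):
    # Every multi-set has exactly 3 elements, so a cover of 3q numbers using
    # only q multi-sets must cover each number exactly once: a perfect matching.
    return min_cover_size(3 * q, three_dm_to_multisets(q, triples)) == q


print(has_perfect_matching(2, [(0, 0, 0), (1, 1, 1), (0, 1, 0)]))  # True
```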

Related

Calculate the probability of a set being covered by random subsets

Suppose these two sets are given as input:
A set U as the universe,
and a set S containing some of the subsets of U.
The members of S are assigned random flags, 0 or 1: for each member of S, the probability of flag 1 is p and of flag 0 is (1 - p).
The desired output is the probability that the union of the flag-1 subsets in S equals U.
Although enumerating all possible combinations of flag-1 subsets in S is a trivial algorithm that produces the output, the running time of this brute-force method is obviously exponential.
Is there any polynomial-time algorithm that produces the exact or an approximate output? Or can we reduce the problem to a well-known one like set cover?
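A minimal Python sketch of the brute-force method, assuming the flags are drawn independently; cover_probability is a hypothetical name. It enumerates all 2^|S| flag assignments, so it is only a baseline for small inputs.

```python
from itertools import product


def cover_probability(universe, subsets, p):
    """Exact Pr[union of flag-1 subsets == universe], by enumerating every flag vector."""
    universe = set(universe)
    total = 0.0
    for flags in product((0, 1), repeat=len(subsets)):
        chosen = [s for s, f in zip(subsets, flags) if f]
        if set().union(*chosen) == universe:
            ones = sum(flags)
            # Independent flags: weight is p^(#ones) * (1-p)^(#zeros).
            total += p ** ones * (1 - p) ** (len(subsets) - ones)
    return total


print(cover_probability({1, 2}, [{1}, {2}, {1, 2}], 0.5))  # 0.625
```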
Getting an exact answer is #P-hard (#P is the counting analog of NP, so this is at least as hard), since this problem generalizes counting the solutions of monotone 2-CNF-SAT, which is known to be #P-hard (Welsh, Dominic; Gale, Amy (2001), "The complexity of counting problems", Aspects of complexity: minicourses in algorithmics, complexity and computational algebra: mathematics workshop, Kaikoura, January 7–15, 2000, pp. 115ff, Theorem 57.). The reduction is: set U to the set of clause identifiers and let each subset in S be the set of clauses in which some variable appears. EDIT: set p = 1/2 for each set, naturally.
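To make the reduction concrete, here is a small Python sketch, assuming a monotone 2-CNF formula is given as a list of (a, b) pairs of 0-based variable indices, each meaning (x_a OR x_b); the function name is hypothetical. Variable v becomes the set of clauses containing it, and a flag vector covers U exactly when the matching assignment satisfies every clause. With p = 1/2 every flag vector has probability 2^-n, so the coverage probability equals (number of satisfying assignments) / 2^n. The counting itself is brute force, only to show the correspondence.

```python
from itertools import product


def count_monotone_2cnf(num_vars, clauses):
    """Count satisfying assignments by counting flag vectors whose union covers U."""
    universe = set(range(len(clauses)))  # U = clause identifiers
    # Subset for variable v: the clauses in which v appears.
    var_sets = [
        {c for c, (a, b) in enumerate(clauses) if v in (a, b)}
        for v in range(num_vars)
    ]
    covering = 0
    for flags in product((0, 1), repeat=num_vars):  # flag 1 <=> variable is true
        chosen = [s for s, f in zip(var_sets, flags) if f]
        if set().union(*chosen) == universe:
            covering += 1
    return covering  # Pr[cover] with p = 1/2 is covering / 2**num_vars


# (x0 OR x1) AND (x1 OR x2)
print(count_monotone_2cnf(3, [(0, 1), (1, 2)]))  # 5, so Pr[cover] = 5/8
```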

Number of ways to represent a number as a sum of K numbers in subset S

Let the set S be {1 , 2 , 4 , 5 , 10}
Now I want to find the number of ways to represent x as a sum of K numbers from the set S (a number can be used any number of times).
If x = 10 and k = 3,
then the answer should be 2: (5,4,1) and (4,4,2).
The order of the numbers doesn't matter, i.e. (4,4,2) and (4,2,4) count as one.
I did some research and found that the set can be represented as the polynomial x^1+x^2+x^4+x^5+x^10, and after raising the polynomial to the power K, the coefficients of the product polynomial give the answer.
But that answer counts (4,4,2) and (4,2,4) as distinct terms, which I don't want.
Is there any way to make (4,4,2) and (4,2,4) count as the same term?
This is NP-complete: it is a variant of the subset-sum problem.
So, frankly, I don't think you can solve it with a non-exponential solution (one that doesn't iterate through all combinations) without any restrictions on the problem input (such as a maximum number range, etc.).
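When the values are positive integers and x is small (the kind of restriction mentioned above), a pseudo-polynomial dynamic program can count multisets directly, so (4,4,2) and (4,2,4) are never counted twice. This is a sketch of the standard coin-change style DP with an extra dimension for the number of parts, not the method from this answer; count_multisets is a hypothetical name. Processing the values one at a time, in a fixed order, is what makes each multiset count once; it runs in O(|S| · k · x) time.

```python
def count_multisets(values, target, k):
    """Number of multisets of exactly k values (repetition allowed) summing to target."""
    values = sorted(set(values))  # assumes positive integers
    # ways[j][s] = multisets of size j with sum s, using the values processed so far
    ways = [[0] * (target + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for v in values:
        for j in range(1, k + 1):  # ascending j lets v be reused any number of times
            for s in range(v, target + 1):
                ways[j][s] += ways[j - 1][s - v]
    return ways[k][target]


print(count_multisets([1, 2, 4, 5, 10], 10, 3))  # 2: (1,4,5) and (2,4,4)
```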
Without any restrictions on the problem domain, I suggest iterating through all possible k-set instances (as described in the pseudo-polynomial time dynamic programming solution) and checking which ones are solutions.
Checking whether two solutions are identical is negligible compared to the complexity of the overall algorithm, so an order-insensitive hash of each solution's elements will work just fine:
E.g. if hash-order-insensitive(4,4,2) == hash-order-insensitive(4,2,4), compare the full multisets; otherwise the solutions are distinct.
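A minimal Python sketch of the order-insensitive check, assuming solutions are generated as ordered k-tuples: a sorted tuple serves as the order-insensitive key, so a Python set drops duplicates. The name distinct_solutions is hypothetical, and the enumeration costs |S|^k, exponential in k.

```python
from itertools import product


def distinct_solutions(values, target, k):
    """All k-element solutions summing to target, with order ignored."""
    seen = set()
    for combo in product(values, repeat=k):  # every ordered k-tuple
        if sum(combo) == target:
            # Sorting maps (4, 4, 2), (4, 2, 4) and (2, 4, 4) to the same key.
            seen.add(tuple(sorted(combo)))
    return seen


print(sorted(distinct_solutions([1, 2, 4, 5, 10], 10, 3)))  # [(1, 4, 5), (2, 4, 4)]
```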
PS: you could also describe your current solution step by step.

Algorithm to find discriminating data points?

Given n samples and p >> n (discrete) data points for each of the n samples, what is a good algorithm for finding a smallest possible set of k data points such that those k data points discriminate between all n samples?
For my purposes, a good algorithm that finds an approximately smallest set would also suffice.
It sounds as though your problem is closely related to the test cover problem. The test cover problem is: given a ground set X = {1, …, n} and a collection T = {T1, …, Tm} of subsets of X, find the smallest subcollection U of T such that for all y ≠ z in X, there exists a set S in U such that either (y in S and z not in S) or (y not in S and z in S).
The test cover problem is NP-hard, so in practice, optimal solutions are found using branch and bound techniques. See De Bontridder et al.
Here is a simple greedy algorithm; it shouldn't give results that are too bad:
1. Check whether two different samples have identical data points; if so, there is no solution.
2. Start with an empty set k and add one new data point to it in each step.
3. In each step, test every data point (out of the p) that is not yet in k by trying to add it to k.
4. The extended k divides the n samples into several groups of samples that agree on every point in k (some groups contain just one sample, others more; eventually all will contain just one).
5. Pick the point that generates the most groups.
6. Repeat until every group contains a single sample, i.e. all samples are distinguished.
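A minimal Python sketch of these steps, assuming each sample is given as a tuple of its p discrete data points; greedy_discriminating_points is a hypothetical name. It returns the indices of the chosen data points, or None when two samples are identical. Like any greedy heuristic, it is not guaranteed to find the smallest set.

```python
def greedy_discriminating_points(samples):
    """Greedily pick data-point indices until every sample has a unique signature."""
    n = len(samples)
    if len(set(samples)) < n:
        return None  # two samples are identical: nothing can tell them apart
    if n <= 1:
        return []
    p = len(samples[0])

    def num_groups(points):
        # Samples that agree on every chosen point fall into the same group.
        return len({tuple(s[c] for c in points) for s in samples})

    chosen = []
    while num_groups(chosen) < n:
        candidates = [c for c in range(p) if c not in chosen]
        # Pick the point whose addition yields the most groups (first one on ties).
        best = max(candidates, key=lambda c: num_groups(chosen + [c]))
        chosen.append(best)
    return chosen


samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(greedy_discriminating_points(samples))  # [0, 1]
```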

an algorithm to find the minimum size set cover for the Set-cover problem

In the Set Covering problem, we are given a universe U, such that |U| = n, and sets S1, …, Sk are subsets of U. A set cover is a collection C of some of the sets from S1, …, Sk whose union is the entire universe U.
I'm trying to come up with an algorithm that finds a minimum set cover, so that I can show that the greedy algorithm for set covering sometimes picks more sets.
Following is what I came up with:
repeat for each set:
1. Cover <- Set_i (i = 1, ..., n)
2. If a set is not a subset of any other set, then take that set into the cover.
But it doesn't work for some instances.
Please help me figure out an algorithm to find the minimum set cover.
I'm still having trouble finding this algorithm online. Does anyone have any suggestions?
Set cover is NP-hard, so it's unlikely that there'll be an algorithm much more efficient than looking at all possible combinations of sets, and checking if each combination is a cover.
Basically, look at all combinations of 1 set, then 2 sets, etc. until they form a cover.
EDIT
Here is some example pseudocode. Note that I do not claim that it is efficient; I simply claim that there isn't a much more efficient algorithm (any algorithm will take worse than polynomial time unless something really cool is discovered).
for size in 1..|S|:
    for C in combination(S, size):
        if (union(C) == U) return C
where combination(K, n) returns all possible sets of size n whose elements come from K.
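A runnable Python version of this pseudocode, assuming the sets are given as Python sets; itertools.combinations plays the role of combination(S, size), and the name min_set_cover is hypothetical. It is exponential in |S|, exactly as the answer warns.

```python
from itertools import combinations


def min_set_cover(universe, sets):
    """Try every combination of 1 set, then 2 sets, ... and return the first cover found."""
    universe = set(universe)
    if not universe:
        return []
    for size in range(1, len(sets) + 1):
        for combo in combinations(sets, size):
            if set().union(*combo) == universe:
                return list(combo)
    return None  # the sets cannot cover the universe


print(min_set_cover({1, 2, 3, 4}, [{1, 2}, {3}, {2, 3, 4}, {1}]))  # [{1, 2}, {2, 3, 4}]
```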
EDIT
However, I'm not too sure why you need an algorithm to find the minimum. In the question, you state that you want to show that the greedy algorithm for set covering sometimes finds more sets. But this is easily achieved via a counterexample (and a counterexample is shown in the Wikipedia entry for set cover). So I am quite puzzled.
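A small counterexample of that kind, checked in Python; the sets are chosen here for illustration and greedy_set_cover is a hypothetical name. Greedy takes three sets, while two (the odd and the even numbers) already cover the universe.

```python
def greedy_set_cover(universe, sets):
    """Repeatedly take the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(s & uncovered), default=None)
        if best is None or not best & uncovered:
            return None  # the sets cannot cover the universe
        chosen.append(best)
        uncovered -= best
    return chosen


U = set(range(1, 15))
S = [
    {1, 2},
    {3, 4, 5, 6},
    set(range(7, 15)),
    {1, 3, 5, 7, 9, 11, 13},   # odd numbers
    {2, 4, 6, 8, 10, 12, 14},  # even numbers
]
greedy = greedy_set_cover(U, S)
print(len(greedy))          # 3: {7..14}, then {3, 4, 5, 6}, then {1, 2}
print((S[3] | S[4]) == U)   # True: the odd and even sets alone cover U
```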
EDIT
A possible implementation of combination(K, n) is:
if n == 0: return [{}] //a list containing an empty set
r = []
for k in K:
    K = K \ {k} // remove k from K.
    for s in combination(K, n-1):
        r.append(union({k}, s))
return r
But when this is combined with the cover problem, one would probably rather perform the coverage test at the base case n == 0, instead of building the whole list first.
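A runnable Python sketch of combination(K, n), assuming K is any iterable of hashable items and returning frozensets; it iterates over a snapshot so that the "remove k from K" step is safe. In practice, itertools.combinations does the same job.

```python
def combination(K, n):
    """All n-element subsets of K, as a list of frozensets (mirrors the pseudocode)."""
    if n == 0:
        return [frozenset()]  # a list containing an empty set
    remaining = set(K)
    result = []
    for k in list(remaining):    # iterate over a snapshot so removal is safe
        remaining.discard(k)     # later subsets never reuse k, so no duplicates
        for s in combination(remaining, n - 1):
            result.append(s | {k})
    return result


print(len(combination({1, 2, 3, 4}, 2)))  # 6
```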
Try Donald E. Knuth's Algorithm X for exact cover, using a sparse matrix (his Dancing Links technique). It must be adapted a little to solve minimum set cover problems as well.
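A compact Python sketch of Algorithm X for exact cover only. Instead of Knuth's linked sparse matrix it uses dictionaries of sets (a commonly used compact substitute), so treat it as an illustration of the search rather than Knuth's implementation; the helper names are hypothetical, and the adaptation to minimum set cover is not shown. Subsets are assumed to be sets drawn from the universe.

```python
def exact_covers(universe, subsets):
    """Yield every exact cover of `universe` by the named `subsets` (Algorithm X)."""
    Y = {name: list(elems) for name, elems in subsets.items()}  # rows
    X = {e: set() for e in universe}                            # columns
    for name, elems in Y.items():
        for e in elems:
            X[e].add(name)
    yield from _search(X, Y, [])


def _search(X, Y, partial):
    if not X:  # every column covered exactly once
        yield list(partial)
        return
    # Knuth's heuristic: branch on the column with the fewest candidate rows.
    col = min(X, key=lambda e: len(X[e]))
    for row in list(X[col]):
        partial.append(row)
        removed = _select(X, Y, row)
        yield from _search(X, Y, partial)
        _deselect(X, Y, row, removed)
        partial.pop()


def _select(X, Y, row):
    """Remove the columns covered by `row` and every row that conflicts with it."""
    removed = []
    for j in Y[row]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def _deselect(X, Y, row, removed):
    """Undo _select in reverse order."""
    for j in reversed(Y[row]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


subsets = {"A": {1, 4, 7}, "B": {1, 4}, "C": {4, 5, 7},
           "D": {3, 5, 6}, "E": {2, 3, 6, 7}, "F": {2, 7}}
print([sorted(s) for s in exact_covers(range(1, 8), subsets)])  # [['B', 'D', 'F']]
```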

Point covering problem

I recently had this problem on a test: given a set of points m (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by a line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i=1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i=0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm, so my gut tells me it won't, but one of my friends, who is much better than me at this, says that for this problem a greedy algorithm like this always finds the minimal solution. To prove that mine always finds the minimal solution, I did a very hand-wavy proof by contradiction in which I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk that are the min points you deal with at each iteration step. The covering C must cover them all as well. Observe that no segment in C covers two of these points: any segment covering pi starts at or before pi, so its right endpoint is at most that of the segment your algorithm chose for pi, and every point up to that endpoint was removed before pi+1 was picked. Therefore, |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) Actually, this problem is a simplified version of the Set Cover problem. While the original is NP-complete, this variant turns out to be solvable in polynomial time.
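A minimal Python sketch of the greedy with the fix from note 1: it only considers segments that actually contain the leftmost uncovered point, and reports failure when there is none. Segments are assumed to be (l, r) pairs with closed endpoints; the name cover_points is hypothetical. This version runs in O(m · n) time.

```python
def cover_points(points, lines):
    """Greedy minimum cover of points by closed segments (l, r); None if impossible."""
    remaining = sorted(points)
    chosen = []
    i = 0
    while i < len(remaining):
        p = remaining[i]  # leftmost uncovered point
        candidates = [seg for seg in lines if seg[0] <= p <= seg[1]]
        if not candidates:
            return None  # no segment contains p
        # Among segments containing p, the one reaching furthest right is best.
        best = max(candidates, key=lambda seg: seg[1])
        chosen.append(best)
        while i < len(remaining) and remaining[i] <= best[1]:
            i += 1  # these points are covered by best
    return chosen


print(cover_points([1, 2, 5, 6], [(0, 3), (1, 2), (4, 7), (2, 5)]))  # [(0, 3), (4, 7)]
```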
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.
