System of nondistinct representatives efficiently solvable? - algorithm

We have sets S1, S2, ..., Sn. These sets do not have to be disjoint. Our task is to select a representative member for each set, such that the total number of elements selected is as small as possible. One element may be present in more than one set, and can represent all the sets it is included in. Is there an algorithm to solve this efficiently?

It is easier to answer this question after a restatement: let the original sets S1, S2, ..., Sn be the elements of the universe, and let the members of the original sets become sets themselves: T1, T2, ..., Tm, where Ti contains exactly those original sets Sj that contain the i-th member.
Now we have to cover the universe S1, S2, ..., Sn with sets chosen from T1, T2, ..., Tm, which is exactly the set cover problem. It is a well-known NP-hard problem, so there is no algorithm that solves it efficiently (unless P=NP, as theorists usually say). As you can see from the Wikipedia page, there is a greedy approximation algorithm; it is efficient, but its approximation ratio is not very good: it is only guaranteed to be within a factor of H(n) ≈ ln n of the optimum.
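A minimal sketch of that greedy heuristic applied directly to the representatives problem, in Python (the function name `greedy_representatives` is hypothetical): at each step it picks the element that lies in the largest number of still-unrepresented sets, which is the greedy set-cover rule on the restated instance. It assumes every Si is non-empty and gives no optimality guarantee beyond the ratio mentioned above.

```python
def greedy_representatives(sets):
    # Map each element x to the indices of the sets containing it
    # (this is the set T_x from the restatement above).
    member_of = {}
    for i, s in enumerate(sets):
        for x in s:
            member_of.setdefault(x, set()).add(i)

    unrepresented = set(range(len(sets)))  # indices of sets with no representative yet
    chosen = set()
    while unrepresented:
        # Greedy rule: take the element that represents the most remaining sets.
        best = max(member_of, key=lambda x: len(member_of[x] & unrepresented))
        if not member_of[best] & unrepresented:
            raise ValueError("some set is empty and cannot be represented")
        chosen.add(best)
        unrepresented -= member_of[best]
    return chosen
```

For example, `greedy_representatives([{1, 2}, {2, 3}, {3, 4}])` returns two representatives, such as `{2, 3}`.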

I'm assuming that by "efficiently," you mean in polynomial time.
Evgeny Kluev is correct: the problem is NP-hard. Its decision version is known as the hitting set problem and was shown to be what we now call NP-complete soon after that concept was introduced. Evgeny's reduction goes from the hitting set problem to the set cover problem, which on its own only shows that your problem is no harder than set cover; but it's not hard to give an explicit reduction in the other direction.
Given a set C={C1,C2,...Cm} whose union is U={u1,u2,...,un}, we want to find a minimal-cardinality subset C' whose union is also equal to U. Define Si in your initial problem as {Cj in C | ui is an element of Cj}. The minimum hitting set of S={S1,S2,...,Sn} is then equal to our desired C'.
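The construction can be checked on a small instance with the Python sketch below. `set_cover_to_hitting_set` builds Si = {j | ui is in Cj}, using the index j to stand for Cj, and `min_hitting_set` is a brute-force solver included only to confirm the correspondence on toy inputs; both names are hypothetical, and the solver is exponential.

```python
from itertools import combinations

def set_cover_to_hitting_set(collection):
    # For every element u_i of U (the union of C), S_i = indices j with u_i in C_j.
    universe = sorted(set().union(*collection))
    return [frozenset(j for j, c in enumerate(collection) if u in c) for u in universe]

def min_hitting_set(family):
    # Brute force: smallest set of indices that meets every S_i.
    candidates = sorted(set().union(*family))
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            chosen = set(combo)
            if all(s & chosen for s in family):
                return chosen
    return None
```

With `C = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]`, the minimum hitting set is `{0, 3}`, i.e. the sets {1, 2, 3} and {4, 5}, which form a minimum set cover of U.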

Not to steal Evgeny's glory, but here's a rather straightforward way of showing perhaps more rigorously that the general case of the poster's problem is NP-hard.
Consider the minimum vertex cover problem: find a minimum set X of vertices of a simple graph (V,E) such that every edge in E is incident to at least one vertex in X.
An edge can be represented by an unordered two-element set {va, vb}, where va and vb are distinct elements of V. Note that an edge e represented as {va, vb} is incident to vc if and only if vc is an element of {va, vb}.
Hence, the minimum vertex cover problem is the same as finding a minimum-size subset X of V such that each edge set {va, vb} defined by an edge in E contains an element of X. This is exactly the original problem, with the edge sets playing the role of S1, ..., Sn.
If one has an algorithm that efficiently solves the originally stated problem, then one can efficiently solve the above problem, and therefore the minimum vertex cover problem as well. Since minimum vertex cover is NP-hard, so is the original problem.
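To make the reduction concrete, the Python sketch below turns each edge into a two-element set and hands those sets to a brute-force solver for the original problem. `vertex_cover_via_representatives` is a hypothetical name, and exhaustive search is used only because the inputs are tiny.

```python
from itertools import combinations

def vertex_cover_via_representatives(edges):
    # Each edge {va, vb} becomes one of the sets S_i of the original problem.
    sets = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*sets))
    # Smallest set of representatives = smallest vertex set touching every edge.
    for size in range(len(vertices) + 1):
        for combo in combinations(vertices, size):
            chosen = set(combo)
            if all(s & chosen for s in sets):
                return chosen
    return set()
```

For the path a-b-c-d it returns `{'a', 'c'}`, a minimum vertex cover of size 2.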

A couple of algorithms to look at are simulated annealing and genetic algorithms, if you can live with a solution close to optimal (they might find the optimal solution, but are not guaranteed to). Simulated annealing can be made to work in production electronic CAD autoplacement software (I was part of the development team for Wintek's autoplacement program).
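A minimal simulated-annealing sketch for the original problem, in Python, to show the general shape of such a heuristic. Everything here is an illustrative assumption, not taken from the answer above: the state is a set of chosen elements, a move toggles one random element, the cost is the number of chosen elements plus a `penalty` (assumed greater than 1) for every set left without a representative, and the geometric cooling schedule (`t0`, `alpha`) is arbitrary. It keeps the best fully valid solution seen, so the result is always valid but not necessarily optimal.

```python
import math
import random

def anneal_representatives(sets, steps=20000, t0=2.0, alpha=0.9995, penalty=2.0, seed=0):
    rng = random.Random(seed)
    elements = sorted(set().union(*sets))

    def cost(chosen):
        # Chosen elements plus a penalty per set that has no representative.
        unrepresented = sum(1 for s in sets if not (s & chosen))
        return len(chosen) + penalty * unrepresented

    current = set(elements)              # "choose everything" is always valid
    current_cost = cost(current)
    best = set(current)
    temperature = t0
    for _ in range(steps):
        candidate = current ^ {rng.choice(elements)}   # toggle one element in or out
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if len(current) < len(best) and all(s & current for s in sets):
                best = set(current)
        temperature *= alpha
    return best
```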

Related

Proof of NP-completeness

Show that the following problem is NP-complete.
The TV problem is to select TV shows for a weekly TV night so that
everyone in a group of people sees something that they like. You are
given a list of people (P1, . . . , Pn) in the group and a list of
possible shows (S1, . . . , Sk). For each show Si, there is a subset
of the people who would like that show. You also get w, the
number of weeks for which you can select shows. The question is
whether w shows can be chosen so that every person likes at
least one of them.
I can't figure out which NP-complete problem can be reduced to this, or how to establish the certificate.
You can model this as the Set cover problem. You have elements {P1, ..., Pn}, and k subsets of these, T1, ..., Tk, defined as Ti = {Pj : Pj likes Si}. You then want to find the smallest collection of subsets such that their union is the whole set of people. Deciding whether the number of necessary subsets is less than or equal to a number is NP-complete. Finding the actual optimal collection of subsets is NP-hard.
As Matt commented above, your problem is a set cover problem. To prove it is NP-complete we must show that it is in NP and that a known NP-complete problem can be reduced to it. As suggested, I will use vertex cover as the known NP-complete problem.
NP Proof
For this we need to phrase the problem as a decision problem and give a certificate that can be verified in polynomial time. Our decision problem will be: can we satisfy all people using at most w shows? The certificate will be a subset of shows, which we will call X. To verify this certificate we need to check:
1) X is a subset of S
This is done by iterating through X and verifying that each item appears in S. With S stored in a hash set, this takes linear time.
2) |X| <= w
This can also be checked in linear time by iterating through X, incrementing a counter, and comparing it to w.
3) All people are satisfied
This can be done by iterating through P and checking whether each person likes at least one show in X. With the audience of each show stored as a hash set, this takes O(|P|·|X|) time, which is polynomial.
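The three checks can be written as a short polynomial-time verifier. This Python sketch assumes a particular input encoding that is not from the question: `likes` maps each show to the set of people who like it, and `chosen` is the certificate X; the function name is hypothetical.

```python
def verify_tv_certificate(people, shows, likes, w, chosen):
    x = set(chosen)
    # 1) X is a subset of the available shows.
    if not x <= set(shows):
        return False
    # 2) |X| <= w.
    if len(x) > w:
        return False
    # 3) Every person likes at least one show in X.
    satisfied = set()
    for show in x:
        satisfied |= likes.get(show, set())
    return all(p in satisfied for p in people)
```

Each step is linear in the size of its input when sets are hash-based, so the whole check runs in polynomial time.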
Since all these steps take polynomial time, the problem is in NP.
NP Complete Proof by Vertex Cover Problem Reduction
Vertex cover asks whether a graph G(V,E) has a set of at most k vertices such that every edge has an endpoint in that set. The input to this problem is the graph G(V,E) and the integer k. To reduce it to the instance of your problem stated above, let each edge be a person (P = E), let each vertex v be a show S_v that is liked by exactly the edges incident to v, and set w = k.
This transformation can easily be done in polynomial time; the most expensive step is building the sets S_v (the edges incident to each vertex v), which takes O(V*E) time.
Now suppose G has a vertex cover V' of size k, and let X = {S_v : v in V'}, so |X| = k = w. X satisfies every person: each edge (u,v) has u or v in V' (because V' is a vertex cover), so S_u or S_v is in X, and that show is liked by the person corresponding to (u,v).
Conversely, if some set X of at most w shows satisfies every person, then the vertices corresponding to the shows in X form a vertex cover of G of size at most k, because every person (edge) likes a chosen show, i.e., is incident to a chosen vertex. So G has a vertex cover of size at most k exactly when the constructed instance is a yes-instance. Solving your problem therefore solves vertex cover, which makes it NP-hard, and together with membership in NP, NP-complete.
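The reduction is mechanical enough to sketch in Python. `vertex_cover_to_tv` builds the TV instance (edges become people, vertices become shows, w = k), and `tv_has_solution` is a brute-force decider used only to check the equivalence on tiny graphs; both names, and the encoding where `likes` maps a show to its set of people, are hypothetical.

```python
from itertools import combinations

def vertex_cover_to_tv(vertices, edges, k):
    people = [frozenset(e) for e in edges]                        # P = E
    shows = list(vertices)                                        # one show S_v per vertex v
    likes = {v: {p for p in people if v in p} for v in shows}     # S_v = edges incident to v
    return people, shows, likes, k                                # w = k

def tv_has_solution(people, shows, likes, w):
    # Exhaustive search over all sets of at most w shows (exponential; toy inputs only).
    for size in range(min(w, len(shows)) + 1):
        for combo in combinations(shows, size):
            satisfied = set().union(*(likes[s] for s in combo))
            if all(p in satisfied for p in people):
                return True
    return False
```

For the path a-b-c-d, the instance built with k = 1 is a no-instance and the one built with k = 2 is a yes-instance, matching the fact that its smallest vertex cover has size 2.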

Is this bipartite graph optimization task NP-complete?

I have been trying to find a polynomial-time algorithm for this problem, without success. I'm not familiar with NP-completeness; I'm just wondering whether this problem is actually NP-complete, in which case I should not waste any further effort trying to come up with a polynomial-time algorithm.
The problem is easy to describe and understand. Given a bipartite graph, what is the minimum number of vertices you have to select from one vertex set, say A, so that each vertex in B is adjacent to at least one selected vertex.
Unfortunately, this is NP-hard; there's an easy reduction from Set Cover (in fact it's arguably just a different way of expressing the same problem). In Set Cover we're given a ground set F, a collection C of subsets of F, and a number k, and we want to know if we can cover all n ground set elements of F by choosing at most k of the sets in C. To reduce this to your problem: Make a vertex in B for each ground element, and a vertex in A for each set in C, and add an edge uv whenever ground element v is in set u. If there was some algorithm to efficiently solve the problem you describe, it could solve the instance I just described, which would immediately give a solution to the original Set Cover problem (which is known to be NP-hard).
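Here is the construction as a small Python sketch (names are hypothetical). `set_cover_to_bipartite` creates one A-vertex per set in C, one B-vertex per ground element, and the edges described above; `min_selection_from_A` solves your problem by brute force, which is only meant to confirm on a toy instance that its answer is a minimum set cover.

```python
from itertools import combinations

def set_cover_to_bipartite(ground, collection):
    A = list(range(len(collection)))          # one A-vertex per set u in C
    B = list(ground)                          # one B-vertex per ground element v
    edges = {(u, v) for u in A for v in B if v in collection[u]}
    return A, B, edges

def min_selection_from_A(A, B, edges):
    # Smallest subset of A such that every vertex of B has a selected neighbour.
    neighbours = {v: {u for (u, x) in edges if x == v} for v in B}
    for size in range(len(A) + 1):
        for combo in combinations(A, size):
            chosen = set(combo)
            if all(neighbours[v] & chosen for v in B):
                return chosen
    return None
```

For ground set {1, ..., 5} and C = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}], it selects A-vertices 0 and 3, i.e. the sets {1, 2, 3} and {4, 5}.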
Interestingly, if we are allowed to choose vertices from the entire graph (rather than just from A), the problem is solvable in polynomial time using bipartite maximum matching algorithms, due to Kőnig's theorem.
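Kőnig's theorem states that in a bipartite graph the size of a maximum matching equals the size of a minimum vertex cover. Assuming the relaxed problem is read as a minimum vertex cover of the bipartite graph, the Python sketch below computes that number with Kuhn's augmenting-path algorithm, a simple O(V·E) matching method chosen here for brevity (Hopcroft-Karp is the faster standard choice). The encoding `adj`, mapping each A-vertex to its B-neighbours, and the function name are assumptions.

```python
def max_bipartite_matching(adj, left):
    match_right = {}   # B-vertex -> A-vertex it is currently matched to

    def try_augment(u, seen):
        for v in adj.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            # Use v if it is free, or if its current partner can be re-matched elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    # Each successful augmenting path increases the matching size by one.
    return sum(1 for u in left if try_augment(u, set()))
```

For `adj = {"a1": ["b1", "b2"], "a2": ["b1"], "a3": ["b1"]}` the maximum matching has size 2, and indeed {a1, b1} is a vertex cover of size 2.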

Algorithm to divide a set of symbols with constraints into minimum number of subsets

I have a set S={a,c,d,e,f,j,m,q,s,t} with a constraint C={am,cm,de,df,dm,ds,ef,em,eq,es,et,fj,fm,fs,jm,js}. xy in C means that x and y cannot be in the same subset. I would like an algorithm to split set S into subsets Sj such that:
1. The number of Sj is minimized
2. The differences between the sizes of the subsets are as large as possible
For example, in this case both {{q,a,c,d,j,t},{m,s},{f},{e}} and {{a,c,e,j},{m,s,q,t},{d},{f}} satisfy condition 1, but the first is optimal with respect to condition 2.
Coming from a computer science background, I wonder whether Mathematicians have devised an algorithm for this problem.
As I understand it, your task can be restated as: find the largest independent set of vertices S' in the graph G=(S, C), then repeat the step on the graph G'=G\S'.
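On the example from the question, this restatement can be tried directly with the Python sketch below. It uses brute force for each largest-independent-set step (fine for 10 symbols, exponential in general), and the helper names are hypothetical. Repeatedly removing a largest independent set is a greedy strategy: on this input it yields {a,c,d,j,q,t}, {m,s}, {e}, {f}, matching the first partition in the question, but in general it is not guaranteed to minimize the number of subsets.

```python
from itertools import combinations

S = ["a", "c", "d", "e", "f", "j", "m", "q", "s", "t"]
C = ["am", "cm", "de", "df", "dm", "ds", "ef", "em", "eq", "es", "et",
     "fj", "fm", "fs", "jm", "js"]
conflicts = {frozenset(pair) for pair in C}   # "am" -> {"a", "m"}

def is_independent(group):
    # No two members of the group appear together in a constraint.
    return all(frozenset((x, y)) not in conflicts for x, y in combinations(group, 2))

def largest_independent_subset(vertices):
    for size in range(len(vertices), 0, -1):
        for group in combinations(vertices, size):
            if is_independent(group):
                return set(group)
    return set()

def split_by_repeated_mis(vertices):
    remaining = list(vertices)
    parts = []
    while remaining:
        part = largest_independent_subset(remaining)
        parts.append(part)
        remaining = [v for v in remaining if v not in part]
    return parts
```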
It's well known (as also pointed out by tobias_k in his comment) that finding a largest independent set of a graph is NP-hard (it is equivalent to the famous clique problem on the complement graph).
I think this is a very hard problem, and here is why: finding the minimum number of subsets means computing the chromatic number of the graph, which is also NP-hard. Exact solutions generally rely on brute-force search.
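For small inputs the chromatic number can be computed exactly by backtracking: try k = 0, 1, 2, ... colours and search for a valid colouring. The Python sketch below (hypothetical helper names; vertices are ordered by decreasing degree, a common heuristic to hit conflicts early) does this on the question's data. It returns 4, which agrees with the lower bound given by d, e, f and m all conflicting with one another, so four subsets really are the minimum here.

```python
S = ["a", "c", "d", "e", "f", "j", "m", "q", "s", "t"]
C = ["am", "cm", "de", "df", "dm", "ds", "ef", "em", "eq", "es", "et",
     "fj", "fm", "fs", "jm", "js"]

def chromatic_number(vertices, edges):
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    # Colour high-degree vertices first so conflicts show up early.
    order = sorted(vertices, key=lambda v: -len(adj[v]))

    def colourable(k):
        colour = {}

        def assign(i):
            if i == len(order):
                return True
            v = order[i]
            taken = {colour[u] for u in adj[v] if u in colour}
            for c in range(k):
                if c not in taken:
                    colour[v] = c
                    if assign(i + 1):
                        return True
                    del colour[v]          # undo and try the next colour
            return False

        return assign(0)

    k = 0
    while not colourable(k):
        k += 1
    return k

print(chromatic_number(S, [tuple(p) for p in C]))
```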

Looking for a better bound to stop earlier in set cover

I am trying to solve the set cover problem in the same way that vertex cover is solved.
Input: a base set X and a collection C of subsets of X.
Output: the size of the smallest subcollection F of C such that the union of the sets in F is X.
I know how to solve this, but I am looking for a heuristic to stop going further down the search tree earlier. Right now I remove each element from C, make a recursive call, and check for the stopping point like this: if(bestsofar <= F.length+1) stop
But I know there should be a better bound. For example, in vertex cover I can check: if K+1 > best stop, where K is the number of vertices already added to the result to cover edges; but a much better test is: if K + (number of remaining edges)/maxdeg >= best stop.
I want the same thing for set cover.
Does anyone have any ideas?
From a theoretical perspective, what your vertex cover heuristic does is construct a feasible solution to the dual of the linear programming relaxation of vertex cover. The same can be done for set cover. If for whatever reason you don't want to use the simplex method to find the optimal dual solution, there are a variety of approximations available. You could use K plus the number of uncovered items divided by the maximum number of uncovered items in any single set (rounded up), which generalizes your vertex cover heuristic. You could also use a greedy algorithm to find a packing, by which I mean the following. For vertex cover, a packing is a set of edges with no endpoints in common (i.e., a matching); every cover contains at least one endpoint of each edge in the packing, so K plus the size of the packing is a lower bound. For set cover, a packing is a collection of items such that no set contains more than one item of the collection; every cover needs a distinct set for each packed item.
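A minimal Python sketch of how both bounds could plug into a branch-and-bound search (function names are hypothetical, and this is one reasonable way to organise the search, not the asker's code). At each node K is the number of sets chosen so far; the node is pruned when K plus the larger of the two lower bounds cannot beat the best cover found. Branching is on an uncovered item contained in the fewest sets, since some set containing it must be in every cover.

```python
from math import ceil

def ratio_bound(uncovered, sets):
    # ceil(#uncovered / most uncovered items any single set can cover)
    if not uncovered:
        return 0
    largest = max((len(s & uncovered) for s in sets), default=0)
    return ceil(len(uncovered) / largest) if largest else float("inf")

def packing_bound(uncovered, sets):
    # Greedy packing: pick items so that no set contains two picked items.
    used, packed = set(), 0          # used = indices of sets hit by a packed item
    for item in uncovered:
        containing = {i for i, s in enumerate(sets) if item in s}
        if not containing:
            return float("inf")      # this item can never be covered
        if not (containing & used):
            used |= containing
            packed += 1
    return packed

def min_set_cover_size(universe, sets):
    sets = [frozenset(s) for s in sets]
    best = [len(sets) + 1]           # sentinel: worse than any real cover

    def search(uncovered, k):
        if not uncovered:
            best[0] = min(best[0], k)
            return
        bound = max(ratio_bound(uncovered, sets), packing_bound(uncovered, sets))
        if k + bound >= best[0]:
            return                   # cannot beat the best cover found so far
        item = min(uncovered, key=lambda x: sum(1 for s in sets if x in s))
        for s in sets:
            if item in s:
                search(uncovered - s, k + 1)

    search(frozenset(universe), 0)
    return best[0] if best[0] <= len(sets) else None
```

For example, `min_set_cover_size(range(1, 6), [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])` returns 2, and the function returns `None` when some item lies in no set.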

"(1:k) Tree-Matching" - Solvable in polynomial time?

Some months ago there was a nice question regarding a "1:n matching problem" and there seems to be no poly-time algorithm.
I would like to add constraints to find a maximum matching for the 1:n matching problem with a polynomial algorithm. I would like to say: "For vertex A1 choose either {B1,B2,B5} or {B2,B3} if the vertices are not already taken from another A-vertex" i.e. I would not allow all possible combinations.
This could be expressed by introducing helper vertices H for each choice and replacing edges with trees, which gives a problem similar to ordinary bipartite matching: every vertex of A or B can be incident to at most one edge in the matching, and the edges to or from a vertex in H are either all in the matching or none of them is. Imagine the following tripartite graph:
Now define h_ij as the tree that contains H_ij, to express the matching easily:
Then in the example, M={h12,h22} would be one 'maximum' matching, although not all vertices of B are involved.
The set {h12,h23} is not a matching, because B3 would then be chosen twice.
Would this problem then be solvable in polynomial time? If yes, is there a polynomial-time solution for the weighted (w(h_ij)) variant? If not, could you argue or even prove it for a "simple man" like me, or suggest other constraints that would make the 1:n matching problem solvable?
For example, could the graph be transformed into a general graph, which could then be solved with weighted matching for general graphs? Or could branchings or even matching forests help here?
PS: not a homework ;-)
There is a difference between maximal and maximum. I have assumed you meant maximum for the below writeup.
You don't seem to have defined your problem very clearly, but if I have understood your intent correctly, it seems your problem is NP-complete (and 'equivalent' to Set Packing).
We can assume that the allowed set size is the same (k) for all A_i when looking for a [1:k] matching, as sets of any other size can be ignored. To find the maximum k, we just run the algorithm for [1:k] with k = 1, 2, 3, ...
So your problem is (I think...):
Given m set families F_i = {S_1i, .., S_n(i)i} (|F_i| = n(i), which need not equal |F_j|), where each set has size k, you have to choose at most one set from each family (say S_i) such that
S_i and S_j are disjoint for any i ≠ j.
number of S_i's is maximum.
We can show that it is NP-Complete for k=3 in two steps:
1) The NP-complete problem Set Packing can be reduced to it. This shows that it is NP-hard.
2) Your problem is in NP and can be reduced to Set Packing. This and 1) imply that your problem is NP-complete. It also lets you leverage any approximation/randomized algorithms that already exist for Set Packing.
Set Packing is the problem:
Given n sets S_1, S_2, ..., S_n, find the maximum number of pairwise disjoint sets among these.
This problem remains NP-Complete even if |S_1| = |S_2| = ... = |S_n| = 3 and is called the 3-Set packing problem.
We will use this to show that your problem is NP-Hard, by providing an easy reduction from 3-Set packing to your problem.
Given S_1, S_2, ..., S_n, just form the families
F_i = {S_i}.
Now if your problem had a polynomial-time solution, it would give us a collection of sets {S_1, S_2, ..., S_r} such that
S_i and S_j are disjoint
the number of S_i is maximum.
This easy reduction gives us a solution to the 3-set Packing problem and thus your problem is NP-Hard.
To see that this problem is in NP, we reduce it to Set-Packing as follows:
Given F_i = {S_1i, S_2i, ..., S_ni}
we consider the sets T_ji = S_ji U {i} (i.e., we add the id of the family, as a new element that appears in no original set, to the set itself) and run them through the Set Packing algorithm. I will leave it to you to see why a solution to Set Packing gives a solution to your problem.
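A small Python sketch of this reduction, in case it helps to experiment (names are hypothetical). The id added to each set is the tuple `("family", i)`, assumed never to occur as a real element, so two sets from the same family always intersect and a packing contains at most one set per family. The packing solver is brute force and only meant for toy inputs.

```python
from itertools import combinations

def families_to_set_packing(families):
    # T_ji = S_ji U {i}: add a fresh tag for family i to every set of that family.
    return [(i, frozenset(s) | {("family", i)})
            for i, family in enumerate(families) for s in family]

def max_set_packing(tagged):
    # Brute force: the largest collection of pairwise disjoint tagged sets.
    for size in range(len(tagged), 0, -1):
        for combo in combinations(tagged, size):
            if all(not (a & b) for (_, a), (_, b) in combinations(combo, 2)):
                return list(combo)
    return []

def solve_via_set_packing(families):
    packing = max_set_packing(families_to_set_packing(families))
    # Drop the tags again to recover one original set per chosen family.
    return [(i, set(t) - {("family", i)}) for i, t in packing]
```

With the hypothetical families `[[{"B1", "B2"}, {"B2", "B3"}], [{"B2", "B4"}, {"B3", "B4"}]]`, it picks {B1, B2} from the first family and {B3, B4} from the second.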
For a maximal (rather than maximum) solution, all you need is a greedy algorithm: just keep picking sets until you can pick no more. This runs in polynomial time.
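The greedy idea for a maximal solution, sketched in Python (the function name is hypothetical): go through the families once and take the first set that is disjoint from everything taken so far. Because the set of used elements only grows, a family that had no compatible set when it was visited never gets one later, so the result is maximal; it can still be far from maximum.

```python
def greedy_maximal(families):
    used = set()        # ground elements already taken
    picked = []
    for i, family in enumerate(families):
        for s in family:
            if not (set(s) & used):
                used |= set(s)
                picked.append((i, set(s)))
                break   # at most one set per family
    return picked
```

For instance, `greedy_maximal([[{"x", "y"}], [{"x"}], [{"y"}]])` takes {x, y} and then nothing else, returning a single set, whereas picking {x} and {y} would give two.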
