backtracking algorithm for set cover - algorithm

Can someone provide me with a backtracking algorithm to solve the "set cover" problem to find the minimum number of sets that cover all the elements in the universe?
The greedy approach almost always selects more sets than the optimal number of sets.

This paper uses Linear Programming Relaxation to solve covering problems.
Basically, the LP relaxation yields good bounds, and can be used to identify solutions that are optimum in many cases. Incidentally, when I last looked at open source LP solvers (~2003) I wasn't impressed (some gave incorrect results), but there seem to be some decent open source LP solvers now.
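For illustration, here is a minimal sketch of the LP-relaxation lower bound for unweighted set cover. It is not the paper's method; it assumes SciPy's linprog with the HiGHS backend is available, and the toy universe and subsets are made up:

    # Sketch: LP-relaxation lower bound for set cover (toy data, assumes SciPy).
    import math
    from scipy.optimize import linprog

    universe = {1, 2, 3, 4, 5}
    subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]   # family S_1..S_n covering the universe

    # Minimize sum_j x_j  subject to  sum_{j : e in S_j} x_j >= 1  for every element e,
    # with 0 <= x_j <= 1.  linprog takes "<=" constraints, so both sides are negated.
    c = [1.0] * len(subsets)
    A_ub = [[-1.0 if e in s else 0.0 for s in subsets] for e in universe]
    b_ub = [-1.0] * len(universe)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(subsets), method="highs")
    lp_bound = res.fun                                # fractional optimum
    print("LP lower bound:", lp_bound)
    print("Any integer cover needs at least", math.ceil(lp_bound - 1e-9), "sets")

The fractional optimum can never exceed the integer optimum, so rounding it up gives a valid lower bound for branch and bound.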

Your problem needs a little more clarification - it seems that you are given a family of subsets $$S_1,\ldots,S_n$$ of a set A, such that the union of the subsets equals A, and you want a minimum number of subsets whose union is still A.
The basic approach is branch and bound with some heuristics. E.g., if a particular element of A is in only one subset $$S_i$$, then you must select $$S_i$$. Similarly, if $$S_k$$ is a subset of $$S_j$$, then there's no reason to consider $$S_k$$; and if element $$a_i$$ is in every subset that $$a_j$$ is in, then you need not consider $$a_i$$ at all.
For branch and bound you need good bounding heuristics. Lower bounds can come from independent sets of elements: if there are k elements $$i_1,\ldots,i_k$$ in A such that no subset contains two of them (any subset containing $$i_p$$ is disjoint from any subset containing $$i_q$$ for $$p \neq q$$), then every cover needs at least k subsets. Better lower bounds come from the LP relaxation described above.
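To make this concrete, here is a small branch-and-bound sketch along those lines (the forced-set rule plus a crude counting lower bound). The data and names are my own; a serious solver would also use the dominance rules and the LP bound:

    # Sketch: exact minimum set cover by branch and bound (illustrative, not tuned).
    def min_set_cover(universe, subsets):
        best = [list(range(len(subsets)))]          # trivially, take every subset

        def lower_bound(uncovered):
            # crude bound: the largest remaining set covers m elements, so we
            # need at least ceil(|uncovered| / m) more sets
            m = max((len(s & uncovered) for s in subsets), default=0)
            return -(-len(uncovered) // m) if m else float("inf")

        def search(uncovered, chosen):
            if not uncovered:
                if len(chosen) < len(best[0]):
                    best[0] = list(chosen)
                return
            if len(chosen) + lower_bound(uncovered) >= len(best[0]):
                return                               # prune: cannot beat the incumbent
            # forced choice: an element covered by exactly one subset
            for e in uncovered:
                covers = [i for i, s in enumerate(subsets) if e in s]
                if len(covers) == 1:
                    i = covers[0]
                    search(uncovered - subsets[i], chosen + [i])
                    return
            # branch on the element with the fewest candidate subsets
            e = min(uncovered, key=lambda x: sum(x in s for s in subsets))
            for i, s in enumerate(subsets):
                if e in s:
                    search(uncovered - s, chosen + [i])

        search(set(universe), [])
        return best[0]

    print(min_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))  # -> [0, 3]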
The Espresso logic minimization system from Berkeley has a very high quality set covering engine.

Related

Number of elements required to occur at least once in each set of a set of sets

I have a list L of lists l[i] of elements e. I am looking for an algorithm that finds a minimum set S_min of elements such that at least one member of S_min occurs in each l[i].
I am not only curious to find a simple algorithm that does this for me, but also to learn what problems of this sort are actually called. I am sure there is something out there.
I have implemented brute-force algorithms that start by adding to S_min every element that occurs in a list with len(l[i]) = 1. The rest is simple trial and error.
The problem you describe is the vertex cover problem on hypergraphs (equivalently, the hitting set / set cover problem), an optimization problem which is NP-hard in the general case but admits approximation algorithms for suitably bounded instances.
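Since this is exactly the set cover / hitting set problem, the standard greedy approximation is a reasonable starting point. A hedged sketch (it is within a ln(n) factor of optimal, not exact, and the toy lists are mine):

    # Sketch: greedy hitting set - repeatedly pick the element in the most unhit lists.
    from collections import Counter

    def greedy_hitting_set(lists):
        remaining = [set(l) for l in lists]
        chosen = set()
        while remaining:
            counts = Counter(e for l in remaining for e in l)
            e, _ = counts.most_common(1)[0]        # element hitting the most unhit lists
            chosen.add(e)
            remaining = [l for l in remaining if e not in l]
        return chosen

    print(greedy_hitting_set([[1, 2], [2, 3], [3, 4], [1, 4]]))   # e.g. {1, 3}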

Object stacking, dynamic programming

I'm working with a problem that is similar to the box stacking problem that can be solved with a dynamic programming algorithm. I read posts here on SO about it but I have a difficult time understanding the DP approach, and would like some explanation as to how it works. Here's the problem at hand:
Given X objects, each with its own weight 'w' and strength 's', how
many can you stack on top of each other? An object can carry its own
weight and the sum of all weights on top of it as long as it does not
exceed its strength.
I understand that it has an optimal substructure, but it's the overlapping subproblems part that confuses me. I'm trying to create a recursion tree to see where it would calculate the same thing several times, but I can't figure out if the function would take one or two parameters, for example.
The first step to solving this problem is proving that you can find an optimal stack with boxes ordered from highest to lowest strength.
Then you just have to sort the boxes by strength and figure out which ones are included in the optimal stack.
The recursive subproblem has two parameters: find the best stack you can put on top of a stack with X remaining strength, using boxes at positions >= Y in the list.
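A minimal memoized sketch of that recursion, assuming the ordering claim holds (strongest objects lowest). `objects` holds (weight, strength) pairs; the names and toy values are my own:

    # Sketch: most objects stackable, DP on (index, remaining supportable weight).
    from functools import lru_cache

    def max_stack(objects):
        # candidate bottom-to-top order: decreasing strength
        objs = sorted(objects, key=lambda ws: -ws[1])

        @lru_cache(maxsize=None)
        def best(i, cap):
            # best(i, cap): most objects we can still add using objs[i:], when the
            # partial stack below can support at most `cap` additional weight
            if i == len(objs):
                return 0
            w, s = objs[i]
            skip = best(i + 1, cap)
            take = 0
            if w <= cap and w <= s:                  # it must also carry its own weight
                take = 1 + best(i + 1, min(cap - w, s - w))
            return max(skip, take)

        return best(0, float("inf"))

    print(max_stack([(1, 3), (2, 2), (3, 10)]))      # 3: all three stack with these toy values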
If a good DP solution exists, it takes two parameters:
the number of objects visited so far (equivalently, the number of objects not yet visited)
the total weight of not-yet-visited objects you can currently afford (the weight of already-placed objects no longer matters)
To make this work you have to find an ordering in which placing an object on top of a later object is never useful; that is, for any solution that violates this ordering there is another solution that follows it and is at least as good.
You have to prove that such an ordering exists and define it clearly. I don't think the simple sort by strength suggested by Matt Timmermans is enough, since weight also matters. But the proof is the hard part...

Assign m students to n groups, but with constraints?

I asked about minimum-cost maximum flow several weeks ago. Kraskevich's answer was brilliant and solved my problem. I've implemented it and it works fine (available only in French, sorry). Additionally, the algorithm can handle the assignment of i (i > 1) projects to each student.
Now I'm trying something more difficult. I'd like to add constraints on the choices. In the case where i (i > 1) projects are assigned to each student, I'd like to be able to specify which projects are compatible with each other.
In the case where some projects are not compatible, I'd like the algorithm to return the global optimum, i.e. assign i projects to each student, maximizing global happiness and respecting the compatibility constraints.
Chaining the original method i times (and checking the constraints at each step) will not help, as it would only return a local optimum.
Any idea about the correct graph to work with?
Unfortunately, it is not solvable in polynomial time (unless P = NP or there are additional constraints).
Here is a polynomial-time reduction from the maximum independent set problem (which is known to be NP-complete) to this one; a small brute-force illustration of the construction follows the steps:
Given a graph G and a number k, do the following:
Create a project for each vertex in the graph G and say that two projects are incompatible iff there is an edge between the corresponding vertices in G.
Create one student who likes each project equally (we can assume that the happiness each project gives to him is equal to 1).
Find the maximum happiness using an algorithm that solves the problem stated in your question. Let's call it h.
A set of projects can be picked iff they all are compatible, which means that the picked vertices of G form an independent set(due to the way we constructed the graph).
Thus, h is equal to the size of the maximum independent set.
Return h >= k.
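As a sanity check, here is a tiny brute-force illustration of the construction (illustrative only, obviously not polynomial time): the best compatible project set for the single student has the same size as a maximum independent set of G.

    # Sketch: brute-force "solver" for the one-student instance built from graph G.
    from itertools import combinations

    def max_happiness(n_projects, incompatible_pairs):
        # exhaustive search over project subsets; every project gives happiness 1
        for r in range(n_projects, 0, -1):
            for subset in combinations(range(n_projects), r):
                if not any((a, b) in incompatible_pairs or (b, a) in incompatible_pairs
                           for a, b in combinations(subset, 2)):
                    return r          # first feasible subset of this size is optimal
        return 0

    # G = path on 4 vertices: edges (0,1), (1,2), (2,3); maximum independent set has size 2
    edges = {(0, 1), (1, 2), (2, 3)}
    h = max_happiness(4, edges)
    print(h, h >= 2)    # h == 2, so "is there an independent set of size k = 2?" answers True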
What does it mean in practice? It means that it is not reasonable to look for a polynomial time solution to this problem. There are several things that can be done:
If the input is small, you can use exhaustive search.
If it is not, you can use heuristics and/or approximations to find a relatively good solution (not necessarily the optimal one, though).
If you can stomach the library dependency, integer programming will be quicker and easier than anything you can implement yourself. All you have to do is formulate the original problem as an integer program and then add your ad hoc constraints at the end.
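As a sketch of what that could look like (using the PuLP library; the happiness values, the per-student quota and the incompatibility list below are invented for the example):

    # Sketch: assignment with compatibility constraints as an integer program (PuLP).
    import pulp

    students, projects, per_student = ["s1", "s2"], ["p1", "p2", "p3", "p4"], 2
    happiness = {(s, p): 1 for s in students for p in projects}
    happiness[("s1", "p1")] = 3
    incompatible = [("p1", "p2")]          # pairs that cannot go to the same student

    prob = pulp.LpProblem("assignment", pulp.LpMaximize)
    x = {(s, p): pulp.LpVariable(f"x_{s}_{p}", cat=pulp.LpBinary)
         for s in students for p in projects}

    prob += pulp.lpSum(happiness[s, p] * x[s, p] for s in students for p in projects)
    for s in students:                                   # exactly i projects per student
        prob += pulp.lpSum(x[s, p] for p in projects) == per_student
    for p in projects:                                   # each project used at most once
        prob += pulp.lpSum(x[s, p] for s in students) <= 1
    for s in students:                                   # the ad hoc compatibility constraints
        for p, q in incompatible:
            prob += x[s, p] + x[s, q] <= 1

    prob.solve()
    print([k for k, v in x.items() if v.value() == 1])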

Approximation for set cover

I started learning about approximation algorithms.
I'm reading a book about them and I don't understand the analysis of the set cover algorithm.
Can someone please explain Lemma 2.3?
It's short, but I don't understand it...
http://view.samurajdata.se/psview.php?id=0482e9ff&page=13
The algorithm is assigning a "price" price(e) to each element of the universe U where that price is the cost of the set S used to cover e divided by the number of elements newly covered by the set S (any elements already covered must have been assigned a lower price by a previous set due to the definition of the algorithm).
There exists an optimal solution which chooses a collection of sets with total cost OPT. As that solution covers all elements, it certainly covers whatever elements have not yet been covered. Covering the remaining elements (the set CBar in the notation of the proof) at cost at most OPT means that the average cost-effectiveness (aka price) over those elements is at most OPT/|CBar|. So among the sets of the optimal solution there is a set S that covers at least one remaining element (e_k in Lemma 2.3) with cost-effectiveness at most OPT/|CBar|. The greedy algorithm always chooses the set with the best cost-effectiveness, so the set it actually picks is at least as cost-effective as S, and hence at most OPT/|CBar|.
The last part is that, by the definitions, |CBar| = n - (k - 1) = n - k + 1, since k - 1 elements have already been covered when element k is covered. Therefore the cost-effectiveness of the chosen set, and hence price(e_k), is at most OPT/(n - k + 1).
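For concreteness, a short sketch of the greedy algorithm the lemma analyses, recording price(e) for each element as it gets covered (the costs and sets below are toy values):

    # Sketch: greedy weighted set cover with per-element prices.
    def greedy_set_cover(universe, sets_with_costs):
        uncovered, prices, chosen = set(universe), {}, []
        while uncovered:
            # most cost-effective set: minimum cost per newly covered element
            s, cost = min(sets_with_costs,
                          key=lambda sc: sc[1] / len(sc[0] & uncovered)
                          if sc[0] & uncovered else float("inf"))
            new = s & uncovered
            for e in new:
                prices[e] = cost / len(new)        # price(e) = cost(S) / |newly covered by S|
            uncovered -= new
            chosen.append(s)
        return chosen, prices

    cover, prices = greedy_set_cover({1, 2, 3, 4},
                                     [({1, 2, 3}, 3.0), ({3, 4}, 1.0), ({4}, 0.5)])
    print(prices)   # prices are non-decreasing in the order elements get covered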

Find the priority function / alphabet order for extreme higher order elements relation

This question is an extension to the following one. The difference is that now our function to optimize will have higher order relations between elements:
We have an array of elements a1,a2,...aN from an alphabet E. Assume N >> |E|.
For each symbol of the alphabet we define a unique integer priority = V(sym). Let's define V{i} := V(symbol(ai)) for simplicity.
The task is to find a priority function V for which:
Count(i)->MIN | V{i} > V{i+1} <= V{i+2}
In other words, I need to find the priorities / permutation of the alphabet for which the number of positions i, satisfying the condition V{i}>V{i+1}<=V{i+2}, is minimum.
Maximum required abstraction (low priority for me). I guess once the solution model for the initial question is extended to cover the first part of this one, extending it further (see below) will be easier.
Given a matrix of signs B of size MxK (basically B[i,j] is from the set {<,>,<=,>=}), find the priority function V for which:
Sum(for all j in range [1,M]) {Count(i)}->EXTREMUM | V{i} B[j,1] V{i+1} B[j,2] ... B[j,K] V{i+K}
As an example, find the priority function V, for which the number of i, satisfying V{i}<V{i+1}<V{i+2} or V{i}>V{i+1}>V{i+2}, is minimum.
My intuition is that all variations on this problem will prove to be NP-hard. So I'd begin looking for heuristics that produce reasonable answers. This may involve some trial and error.
A simplistic approach is to write down a possible permutation and then try swaps until you've arrived at a local minimum. Repeat from several random starting permutations and pick the best answer.
Simulated annealing provides a more sophisticated version of this approach, see http://en.wikipedia.org/wiki/Simulated_annealing for a description. It may take some experimentation to find a set of parameters that seems to converge relatively well.
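Here is a hedged sketch of that restart-plus-swap local search for the first objective (counting positions with V{i} > V{i+1} <= V{i+2}); simulated annealing would replace the "accept only improvements" rule with a temperature-dependent acceptance probability. The sequence and parameters are invented:

    # Sketch: hill climbing over priority permutations with random restarts.
    import random

    def cost(seq, prio):
        v = [prio[s] for s in seq]
        return sum(1 for i in range(len(v) - 2) if v[i] > v[i + 1] <= v[i + 2])

    def local_search(seq, alphabet, restarts=20):
        alphabet = sorted(alphabet)
        best_prio, best_cost = None, float("inf")
        for _ in range(restarts):
            order = list(alphabet)
            random.shuffle(order)
            prio = {sym: r for r, sym in enumerate(order)}
            current = cost(seq, prio)
            improved = True
            while improved:                              # climb until no swap helps
                improved = False
                for a in alphabet:
                    for b in alphabet:
                        if a >= b:
                            continue
                        prio[a], prio[b] = prio[b], prio[a]      # try swapping two priorities
                        c = cost(seq, prio)
                        if c < current:
                            current, improved = c, True
                        else:
                            prio[a], prio[b] = prio[b], prio[a]  # revert the swap
            if current < best_cost:
                best_prio, best_cost = dict(prio), current
        return best_prio, best_cost

    seq = "abacabadabacaba"
    prio, c = local_search(seq, set(seq))
    print(c, prio)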
Another idea is to look for a genetic algorithm. Based on a quick Google search it looks like the standard way to do this is to try to turn an NP-complete problem into a SAT problem, and then use a genetic algorithm on that problem. This approach would require turning this into a SAT problem in some reasonable way. Unfortunately it is not obvious to me how one would go about doing this reduction. Indeed in the first version that you had, your problem was closely connected to a classic NP-hard problem. The fact that it is labeled NP-hard rather than NP-complete is evidence that people haven't found a good way to transform it into a SAT problem. So if it isn't obvious how to turn the simple version into a SAT problem, then you are unlikely to convert the hard problem either.
But you could still try some variation on genetic algorithms. Mutation is pretty simple, just swap some elements around. One way to combine elements would be to take 3 permutations and use quicksort to find the combination as follows: take a random pivot, and then use "majority wins" to bucket elements into bigger and smaller. Sort each half in the same way.
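One possible reading of that crossover, sketched in code (my interpretation, not a standard operator): each parent permutation's positions act as ranks, a random pivot splits the remaining symbols, and each symbol goes to whichever side a majority of the three parents put it on.

    # Sketch: quicksort-style "majority wins" recombination of three permutations.
    import random

    def combine(parents, symbols=None):
        # parents: three permutations (lists) of the same alphabet
        ranks = [{sym: i for i, sym in enumerate(p)} for p in parents]
        symbols = list(parents[0]) if symbols is None else symbols
        if len(symbols) <= 1:
            return list(symbols)
        pivot = random.choice(symbols)
        smaller, bigger = [], []
        for sym in symbols:
            if sym == pivot:
                continue
            votes = sum(1 for r in ranks if r[sym] < r[pivot])   # "before the pivot" votes
            (smaller if votes >= 2 else bigger).append(sym)       # majority wins
        return combine(parents, smaller) + [pivot] + combine(parents, bigger)

    child = combine([list("abcde"), list("baced"), list("abdce")])
    print(child)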
I'm sorry that I can't just give you an approach and say, "This should work." You've got what looks like an open-ended research project, and the best I can do is give you some ideas about things you can try that might work reasonably well.
