Greedy Set Coverage algorithm built by *removing* sets

I am trying to implement a solution for a set coverage problem using a greedy algorithm.
The classic greedy approximation algorithm for it is:
input: collection C of sets over universe U, costs c: C→R≥0
output: set cover S
1. Let S←∅.
2. Repeat until S covers all elements:
3.   Add a set s to S, where s∈C maximizes the number of elements in s not yet covered by the sets in S, divided by the cost c(s).
4. Return S.
I have a question in 2 parts:
a. Would running the algorithm in reverse also be a valid algorithm, i.e.:
input: collection C of sets over universe U, costs c: C→R≥0
output: set cover S
1. Let S←C.
2. Repeat until there is no s∈S such that S−{s} still covers U (i.e. no set whose elements are all redundant):
3.   Remove such a redundant set s from S, where s minimises the number of elements in s divided by the cost c(s).
4. Return S.
b. The nature of my problem is such that it is easy to get C, and there will be only a limited number (<5) of redundant sets. In this case, would the removal algorithm perform better?

The algorithm will surely return a valid set cover, since at every step it only removes a set s if all elements of s are redundant.
Intuitively I feel that part b is true, though I am unable to write a formal proof for it. Chapter 2 of Vijay Vazirani's book Approximation Algorithms covers set cover and might help with the analysis part.
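To experiment with both variants side by side, here is a minimal Python sketch. The function names (covers, greedy_add, greedy_remove) and the toy instance are my own and not from the question; it assumes strictly positive costs and that the union of all sets equals the universe.

```python
def covers(universe, sets, names):
    """True if the sets with the given names together cover the universe."""
    covered = set()
    for name in names:
        covered |= sets[name]
    return covered >= set(universe)


def greedy_add(universe, sets, cost):
    """Classic greedy: add the set with the most newly covered elements per unit cost."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Assumes the union of all sets covers the universe; otherwise max() fails.
        best = max(
            (name for name in sets if name not in chosen and sets[name] & uncovered),
            key=lambda name: len(sets[name] & uncovered) / cost[name],
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def greedy_remove(universe, sets, cost):
    """Reverse variant: start from all sets and drop redundant ones."""
    chosen = set(sets)
    while True:
        # A set is redundant if the remaining sets still cover the universe.
        redundant = [n for n in chosen if covers(universe, sets, chosen - {n})]
        if not redundant:
            return sorted(chosen)
        # Drop the redundant set with the fewest elements per unit cost
        # (ties broken by name so the result is deterministic).
        worst = min(redundant, key=lambda n: (len(sets[n]) / cost[n], n))
        chosen.remove(worst)


universe = {1, 2, 3, 4, 5, 6}
sets = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({4, 5, 6}),
    "C": frozenset({1, 4}),
    "D": frozenset({2, 5}),
    "E": frozenset({3, 6}),
}
cost = {name: 1.0 for name in sets}
print(greedy_add(universe, sets, cost))
print(greedy_remove(universe, sets, cost))
```

Both functions return a valid cover by construction; the sketch says nothing about which one gives the better approximation in general.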

Related

An example of an input to the set cover problem which does not provide a 2-approximation

I need some help with the following question:
Show an example of an input to the set cover problem for which the greedy algorithm shown in class does not provide a 2-approximation.
The greedy algorithm:
X - a finite set
F - family of subsets of X such that the union gives X
C - the desired set of minimal size which covers X.

There is a 3/2-approximation example on the Wikipedia page presenting the greedy algorithm for the set cover problem.
There are two groups of sets composing F: 2 sets (the 'lines'), forming a partition, each containing half of the 'points'; and 3 other sets (the 'rectangles'), forming another partition, with 2, 4 and 8 points respectively.
The greedy algorithm chooses the 3 'rectangles', since it always picks the set covering the most uncovered points (first 8 > 7, then 4 > 3, then 2 > 1), while the 2 lines are optimal.
It is possible to adapt this scheme to get a worse approximation ratio, i.e. to 'trick' the greedy algorithm even more.
Recipe: draw the same figure, but with a 31 x 2 grid instead of a 7 x 2 one. Keep the two lines with half the points in each (still forming a partition), and add two more 'rectangles' (the two biggest, with 16 and 32 'points' respectively) on the right side.
The greedy algorithm will return the 5 'rectangles', while the optimal solution consists of the two lines, giving a ratio of 5/2 > 2.
Note that this process can be extended indefinitely (with a (2^n - 1) x 2 grid, where greedy returns n rectangles against 2 lines), so you can prove that the greedy algorithm for set cover is not a k-approximation for any constant k.
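The construction can be checked with a small Python sketch. The helper names (build_instance, greedy_cover) are hypothetical, and the greedy shown is the unweighted version that always picks the set covering the most uncovered points.

```python
def build_instance(n):
    """Build the 2 x (2**n - 1) grid with its two 'lines' and n 'rectangles'."""
    columns = 2 ** n - 1
    universe = {(row, col) for row in range(2) for col in range(columns)}
    # Each line is one full row of the grid.
    lines = [frozenset((row, col) for col in range(columns)) for row in range(2)]
    # Rectangles span both rows and have widths 1, 2, 4, ..., 2**(n-1),
    # so they contain 2, 4, 8, ..., 2**n points.
    rectangles = []
    start = 0
    for k in range(n):
        width = 2 ** k
        rectangles.append(frozenset(
            (row, col) for row in range(2) for col in range(start, start + width)
        ))
        start += width
    return universe, lines, rectangles


def greedy_cover(universe, family):
    """Unweighted greedy: repeatedly take the set with the most uncovered points."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(family, key=lambda s: len(s & uncovered))
        chosen.append(best)
        uncovered -= best
    return chosen


universe, lines, rectangles = build_instance(5)  # the 31 x 2 grid
picked = greedy_cover(universe, lines + rectangles)
print(len(picked), "sets picked by greedy; the 2 lines are an optimal cover")
```

At every step the largest remaining rectangle covers exactly one point more than either line, so no tie-breaking rule can rescue the greedy choice here.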

Calculate the probability of one set to be covered by random subsets

Suppose these two sets are given as input:
One set U as universe
And one set S containing some of the subsets of U.
The members of S are assigned with random flags 0 or 1. For each member of S, the probability of flag 1 is p and flag 0 is (1-p).
The desired output is: The probability of 'Union of the flag 1 subsets in S = U'
Although considering all the possible combinations of the flag 1 subsets in S is the trivial algorithm to lead to output, the running time of this brute force method is obviously exponential.
Is there any polynomial time algorithm which leads to the exact or approximate output? Or can we reduce the problem to any famous one like set-cover?

Getting an exact answer is #P-hard (#P is the counting analogue of NP, so this is at least as hard as NP), since this problem generalizes counting the satisfying assignments of a monotone 2-CNF formula, which is known to be #P-hard (Welsh, Dominic; Gale, Amy (2001), "The complexity of counting problems", Aspects of complexity: minicourses in algorithmics, complexity and computational algebra: mathematics workshop, Kaikoura, January 7–15, 2000, pp. 115ff, Theorem 57). The reduction is to set U to the set of clause identifiers and let each subset in S be the set of clauses in which some variable appears. Set p = 1/2 for each subset; the desired probability is then the number of satisfying assignments divided by 2^|S|.
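To make the reduction concrete, here is a small Python sketch that checks it by brute force on a toy formula. The function names and the example formula (x0 ∨ x1) ∧ (x1 ∨ x2) are my own assumptions, the flags are taken to be independent, and the enumeration is exponential, so this only illustrates the correspondence and is not a practical algorithm.

```python
from itertools import product


def cover_probability(universe, subsets, p):
    """Exact probability that the flag-1 subsets cover the universe (exponential time)."""
    total = 0.0
    for flags in product([0, 1], repeat=len(subsets)):
        covered = set()
        weight = 1.0
        for flag, subset in zip(flags, subsets):
            weight *= p if flag else 1 - p
            if flag:
                covered |= subset
        if covered >= universe:
            total += weight
    return total


def count_monotone_2cnf(num_vars, clauses):
    """Number of satisfying assignments of a monotone 2-CNF given as variable pairs."""
    return sum(
        all(assignment[a] or assignment[b] for a, b in clauses)
        for assignment in product([0, 1], repeat=num_vars)
    )


# Toy formula: (x0 or x1) and (x1 or x2)
clauses = [(0, 1), (1, 2)]
num_vars = 3
universe = set(range(len(clauses)))  # one element per clause
# One subset per variable: the clauses in which that variable appears.
subsets = [{i for i, c in enumerate(clauses) if v in c} for v in range(num_vars)]

probability = cover_probability(universe, subsets, 0.5)
print(probability * 2 ** num_vars, count_monotone_2cnf(num_vars, clauses))
```

With p = 1/2 every flag assignment has the same weight, so multiplying the probability by 2^|S| recovers the count of satisfying assignments.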

Selecting Maximum Number of Choice

We are given N fruits and M choices for selecting those fruits. Each of the M lines starts with an integer K, followed by K integers giving the indices of the fruits selected by that choice.
I need to find the maximum number of choices that can be selected together.
Note: there is only one fruit at each index, so two choices that share a fruit cannot both be selected.
Sample Input:-
4 3
2 1 2
2 2 3
2 3 4
Output :-
2
As we can select 1st and 3rd choice.
Which algorithm should I use to solve this problem?

This is a variation of the maximum independent set problem: treat each choice as a vertex and join two choices by an edge when they share a fruit.
There are very well detailed algorithms for finding a maximum independent set in this paper:
Algorithms for Maximum Independent Sets
And a parallel approach is given in this one: Lecture Notes on a Parallel Algorithm for Generating a Maximal Independent Set
And this is a Java implementation.

This is the set packing problem, one of the classic NP-complete problems. No polynomial-time algorithm is known for it, but you can try a backtracking algorithm (slow but exact) or a greedy approximation (fast but possibly suboptimal).
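As a rough illustration of the backtracking option, here is a Python sketch run on the sample input from the question. The function name max_packing is my own, and the search is exponential in the worst case, so it is only suitable for small M.

```python
def max_packing(choices):
    """Return the indices of a largest collection of pairwise disjoint choices."""
    choices = [frozenset(c) for c in choices]
    best = []

    def search(index, used, picked):
        nonlocal best
        # Prune: even taking every remaining choice cannot beat the best so far.
        if len(picked) + (len(choices) - index) <= len(best):
            return
        if index == len(choices):
            best = list(picked)
            return
        # Branch 1: take this choice if it shares no fruit with those already taken.
        if not (choices[index] & used):
            picked.append(index)
            search(index + 1, used | choices[index], picked)
            picked.pop()
        # Branch 2: skip this choice.
        search(index + 1, used, picked)

    search(0, frozenset(), [])
    return best


# Sample input from the question: 4 fruits, 3 choices.
sample = [[1, 2], [2, 3], [3, 4]]
result = max_packing(sample)
print(len(result), [i + 1 for i in result])  # 2 [1, 3]
```

The printed choice numbers are 1-based to match the question, which selects the 1st and 3rd choice.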

A greedy or dynamic algorithm to subset selection

I have a simple algorithmic question. I would be grateful if you could help me.
We have some 2-dimensional points, each with a positive weight (a sample problem is attached). We want to select a subset of them that maximizes the total weight such that no two selected points overlap (for example, in the attached sample, we cannot select both A and B because they are in the same row, and in the same way we cannot select both A and C because they are in the same column). Is there any greedy (or dynamic programming) approach I can use? I'm aware of the non-overlapping interval selection algorithm, but I cannot use it here because my problem is 2-dimensional.
Any reference or note is appreciated.
Regards
Attachment:
A simple sample of the problem:
A (30$) -------- B (10$)
|
|
|
|
C (8$)

If you are OK with a good solution and do not demand the best one, you can use heuristic algorithms to solve this.
Let S be the set of points, and w(s) the weight function.
Create a weight function W: 2^S -> R (from the subsets of S to real numbers):
W(U) = -INFINITY if the solution is not feasible
W(U) = Sigma(w(u)) for each u in U otherwise
Also create a function next: 2^S -> 2^2^S (a function that takes a subset of S and returns a set of subsets of S):
next(U) = { V | you can get V from U by adding/removing one element to/from U }
Now, given that data, you can invoke any optimization algorithm in the Artificial Intelligence book, such as a Genetic Algorithm or Hill Climbing.
For example, Hill Climbing with random restarts looks something like this:
1. best <- -INFINITY
2. while there is more time:
3.   choose a random subset s
4.   NEXT <- next(s)
5.   if max{ W(v) | v in NEXT } <= W(s): // s is a local maximum
5.1.   if W(s) > best: best <- W(s) // if s is better than the previous result, store it
5.2.   go to 2. // restart the hill climbing from a different random point
6.   else:
6.1.   s <- argmax{ W(v) | v in NEXT }
6.2.   go to 4.
7. return best // when out of time, return the best solution found so far
The above algorithm is anytime, meaning it will produce better results if given more time.
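A runnable Python sketch of this hill climbing scheme follows. It assumes points are given as a dict name -> (row, column, weight), which is a representation I made up for illustration, and it replaces "while there is more time" with a fixed number of restarts.

```python
import random


def weight(points, subset):
    """W(U): total weight, or -infinity if two points share a row or a column."""
    rows = [points[name][0] for name in subset]
    cols = [points[name][1] for name in subset]
    if len(set(rows)) < len(rows) or len(set(cols)) < len(cols):
        return float("-inf")
    return sum(points[name][2] for name in subset)


def neighbours(points, subset):
    """next(U): all subsets obtained by adding or removing exactly one point."""
    return [subset ^ {name} for name in points]


def hill_climb(points, restarts, rng):
    best_value, best_subset = float("-inf"), frozenset()
    names = list(points)
    for _ in range(restarts):
        # Start from a random subset (it may be infeasible).
        current = frozenset(n for n in names if rng.random() < 0.5)
        while True:
            candidate = max(neighbours(points, current), key=lambda s: weight(points, s))
            if weight(points, candidate) <= weight(points, current):
                break  # local maximum reached
            current = candidate
        if weight(points, current) > best_value:
            best_value, best_subset = weight(points, current), current
    return best_value, best_subset


# Sample from the question: A and B share a row, A and C share a column.
points = {"A": (0, 0, 30), "B": (0, 1, 10), "C": (1, 0, 8)}
print(hill_climb(points, 200, random.Random(0)))
```

On this sample there are two local maxima, {A} with weight 30 and {B, C} with weight 18, which is why the random restarts matter.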

This can be treated as a linear assignment problem between rows and columns, which can be solved using an algorithm like the Hungarian algorithm. The algorithm minimizes the sum of costs, so just negate your weights and use them as the costs. The assignment of rows to columns will give you the subset of points that you need. There are sparse variants for cases where not every (row, column) pair has an associated point, but you can also just use a cost of 0 for these pairs, which corresponds to selecting no point there; a large positive cost would push the solver to use as many real points as possible rather than to maximize the total weight.
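To illustrate the assignment formulation only, here is a Python sketch that tries every row-to-column assignment by brute force. This loop stands in for the Hungarian algorithm and is only usable for tiny inputs. The name best_assignment and the dict representation name -> (row, column, weight) are hypothetical, and the sketch assumes a gain of 0 for cells without a point (meaning nothing is selected there) and maximizes gains directly instead of minimizing negated weights.

```python
from itertools import permutations


def best_assignment(points):
    """Maximum total weight with at most one selected point per row and per column."""
    rows = sorted({r for r, _, _ in points.values()})
    cols = sorted({c for _, c, _ in points.values()})
    size = max(len(rows), len(cols))  # pad to a square matrix
    # gain[i][j] = (weight, name) of the point in that cell, or (0, None) if empty.
    gain = [[(0, None)] * size for _ in range(size)]
    for name, (r, c, w) in points.items():
        i, j = rows.index(r), cols.index(c)
        if w > gain[i][j][0]:
            gain[i][j] = (w, name)
    best_total, best_names = 0, []
    # perm[i] is the column assigned to row i.
    for perm in permutations(range(size)):
        total = sum(gain[i][perm[i]][0] for i in range(size))
        if total > best_total:
            best_total = total
            best_names = [
                gain[i][perm[i]][1] for i in range(size)
                if gain[i][perm[i]][1] is not None
            ]
    return best_total, sorted(best_names)


points = {"A": (0, 0, 30), "B": (0, 1, 10), "C": (1, 0, 8)}
print(best_assignment(points))  # (30, ['A'])
```

On the sample, the assignment that uses A together with the empty cell (total 30) beats the one that uses B and C (total 18).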

Well, you can think of this as a binary constraint optimization problem, for which there are various algorithms. The easiest one for this problem is backtracking with arc propagation. However, it takes exponential time in the worst case. I am not sure whether there are any specific algorithms that take advantage of the geometrical nature of the problem.

This can be solved by a pretty straightforward dynamic programming approach with an exponential time complexity:
s = {A, B, C, ...}
getMaxSum({}) = 0
getMaxSum(s) = max( A.value + getMaxSum(compatibleSubSet(s, A)),
                    B.value + getMaxSum(compatibleSubSet(s, B)),
                    ...)
where compatibleSubSet(s, A) returns the subset of s whose points do not overlap with A (A itself excluded).
To optimize it, you can memoize the result for each subset.
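The recurrence can be sketched in Python with memoization on frozensets. The names max_weight and get_max_sum and the dict representation name -> (row, column, weight) are my own assumptions; the running time is still exponential in the worst case.

```python
from functools import lru_cache


def max_weight(points):
    """Best total weight of a subset with no two points in the same row or column."""

    @lru_cache(maxsize=None)
    def get_max_sum(remaining):
        best = 0  # base case: selecting nothing from the remaining points
        for name in remaining:
            row, col, w = points[name]
            # Keep only points sharing neither row nor column with the chosen one;
            # this also removes the chosen point itself.
            compatible = frozenset(
                other for other in remaining
                if points[other][0] != row and points[other][1] != col
            )
            best = max(best, w + get_max_sum(compatible))
        return best

    return get_max_sum(frozenset(points))


points = {"A": (0, 0, 30), "B": (0, 1, 10), "C": (1, 0, 8)}
print(max_weight(points))  # 30
```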

One way to do it:
Write a function that generates subsets ordered from the subset of maximum weight to the subset of minimum weight, ignoring the constraints.
Then call this function repeatedly until a subset that honors the constraints pops up.
To improve performance, you can write a smarter generator function that, for instance, honors the not-on-the-same-row constraint but ignores the not-on-the-same-column one.

an algorithm to find the minimum size set cover for the Set-cover problem

In the Set Covering problem, we are given a universe U with |U| = n, and sets S1, ..., Sk that are subsets of U. A set cover is a collection C of some of the sets S1, ..., Sk whose union is the entire universe U.
I'm trying to come up with an algorithm that finds a minimum-size set cover, so that I can show that the greedy algorithm for set covering sometimes finds more sets.
Following is what I came up with:
repeat for each set.
1. Cover<-Seti (i = 1, ..., k)
2. if a set is not a subset of any other set, then take that set into the cover.
but it's not working for some instances.
Please help me figure out an algorithm to find the minimum set cover.
I'm still having trouble finding such an algorithm online. Does anyone have any suggestions?

Set cover is NP-hard, so it's unlikely that there'll be an algorithm much more efficient than looking at all possible combinations of sets, and checking if each combination is a cover.
Basically, look at all combinations of 1 set, then 2 sets, etc. until they form a cover.
EDIT
This is example pseudocode. Note that I do not claim that this is efficient. I simply claim that there isn't a much more efficient algorithm (algorithms will be worse than polynomial time unless something really cool is discovered).
for size in 1..|S|:
    for C in combination(S, size):
        if (union(C) == U) return C
where combination(K, n) returns all possible sets of size n whose elements come from K.
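In Python the same brute force can be sketched with itertools.combinations. The function names and the 14-point example (the 7 x 2 grid with two 'lines' and three 'rectangles') are my own choices for illustration; a plain unweighted greedy is included for comparison, since comparing the two was the goal stated in the question.

```python
from itertools import combinations


def minimum_set_cover(universe, family):
    """Try all combinations of 1 set, then 2 sets, ... and return the first cover found."""
    universe = set(universe)
    for size in range(1, len(family) + 1):
        for candidate in combinations(family, size):
            if set().union(*candidate) >= universe:
                return list(candidate)
    return None  # the family does not cover the universe


def greedy_set_cover(universe, family):
    """Unweighted greedy: repeatedly take the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(family, key=lambda s: len(s & uncovered))
        chosen.append(best)
        uncovered -= best
    return chosen


universe = set(range(14))
family = [
    frozenset({3, 4, 5, 6, 10, 11, 12, 13}),  # rectangle with 8 points
    frozenset({1, 2, 8, 9}),                  # rectangle with 4 points
    frozenset({0, 7}),                        # rectangle with 2 points
    frozenset(range(0, 7)),                   # first line
    frozenset(range(7, 14)),                  # second line
]
print(len(minimum_set_cover(universe, family)), len(greedy_set_cover(universe, family)))
```

Because sizes are tried in increasing order, the first cover found is guaranteed to have minimum size.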
EDIT
However, I'm not too sure why you need an algorithm to find the minimum. In the question you state that you want to show that the greedy algorithm for set covering sometimes finds more sets. But this is easily achieved with a counterexample (one is shown in the Wikipedia entry for set cover). So I am quite puzzled.
EDIT
A possible implementation of combination(K, n) is:
if n == 0: return [{}] // a list containing an empty set
r = []
for k in K:
    K = K \ {k} // remove k from K, so later combinations do not repeat it
    for s in combination(K, n-1):
        r.append(union({k}, s))
return r
But in combination with the cover problem, one probably wants to perform the test of coverage from the base case n == 0 instead. Well.

Try Donald E. Knuth's Algorithm X for the exact cover problem, using a sparse matrix. It must be adapted a little to also solve minimum set cover problems.
