Extended version of the set cover problem - algorithm

I don't generally ask questions on SO, so if this question seems inappropriate for SO, just tell me (help would still be appreciated of course).
I'm still a student and I'm currently taking a class in Algorithms. We recently learned about the Branch-and-Bound paradigm and since I didn't fully understand it, I tried to do some exercises in our course book. I came across this particular instance of the Set Cover problem with a special twist:
Let U be a set of elements and S = {S1, S2, ..., Sn} a set of subsets of U, where the union of all sets Si equals U. Outline a Branch-and-Bound algorithm to find a minimal subset Q of S, so that for all elements u in U there are at least two sets in Q, which contain u. Specifically, elaborate how to split the problem up into subproblems and how to calculate upper and lower bounds.
My first thought was to sort all the sets Si in S in descending order, according to how many elements they contain which aren't yet covered at least twice by the currently chosen subsets of S, so our current instance of Q. I was then thinking of recursively solving this, where I choose the first set Si in the sorted order and make one recursive call, where I take this set Si and one where I don't (meaning from those recursive calls onwards the subset is no longer considered). If I choose it I would then go through each element in this chosen subset Si and increase a counter for all its elements (before the recursive call), so that I'll eventually know, when an element is already covered by two or more chosen subsets. Since I sort the not chosen sets Si for each recursive call, I would theoretically (in my mind at least) always be making the best possible choice for the moment. And since I basically create a binary tree of recursive calls, because I always make one call with the current best subset chosen and one where I don't I'll eventually cover all 2^n possibilities, meaning eventually I'll find the optimal solution.
My problem now is I don't know or rather understand how I would implement a heuristic for upper and lower bounds, so the algorithm can discard some of the paths in the binary tree, which will never be better than the current best Q. I would appreciate any help I could get.

Here's a simple lower bound heuristic: Find the set containing the largest number of not-yet-twice-covered elements. (It doesn't matter which set you pick if there are multiple sets with the same, largest possible number of these elements.) Suppose there are u of these elements in total, and this set contains k <= u of them. Then, you need to add at least u/k further sets before you have a solution. (Why? See if you can prove this.)
This lower bound works for the regular set cover problem too. As always with branch and bound, using it may or may not result in better overall performance on a given instance than simply using the "heuristic" that always returns 0.

First, some advice: don't re-sort S every time you recurse/loop. Sorting is an expensive operation (O(N log N)) so putting it in a loop or a recursion usually costs more than you gain from it. Generally you want to sort once at the beginning, and then leverage that sort throughout your algorithm.
The sort you've chosen, descending by the length of the S subsets is a good "greedy" ordering, so I'd say just do that upfront and don't re-sort after that. You don't get to skip over subsets that are not ideal within your recursion, but checking a redundant/non-ideal subset is still faster than re-sorting every time.
Now what upper/lower bounds can you use? Well generally, you want your bounds and bounds-checking to be as simple and efficient as possible because you are going to be checking them a lot.
With this in mind, an upper bounds is easy: use the shortest set-length solution that you've found so far. Initially set your upper-bounds as var bestQlength = int.MaxVal, some maximum value that is greater than n, the number of subsets in S. Then with every recursion you check if currentQ.length > bestQlength, if so then this branch is over the upper-bounds and you "prune" it. Obviously when you find a new solution, you also need to check if it is better (shorter) than your current bestQ and if so then update both bestQ and bestQlength at the same time.
A good lower bounds is a bit trickier, the simplest I can think of for this problem is: Before you add a new subset Si into your currentQ, check to see if Si has any elements that are not already in currentQ two or more times, if it does not, then this Si cannot contribute in any way to the currentQ solution that you are trying to build, so just skip it and move on to the next subset in S.

Related

Mutually Overlapping Subset of Activites

I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use Greedy vs. Dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the #of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to plane sweep algorithmically though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the maximum number of mutually overlapping subset of activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
You don't need to worry about planar sweeping, just use the start/end points. Put the elements in a queue. In every step take the smaller element from the queue front. If it's a start point, increment current tasks count, otherwise decrement it.
Since you don't need to point which tasks are overlapping - just the count of them - you don't need to worry about specific tasks duration.
Regarding your greedy vs DP question, in my non-professional opinion greedy may not always provide valid answer, whereas DP only works for problem that can be divided into smaller subproblems well. In this case, I wouldn't call your sweep-solution either.

Optimal selection election algorithm

Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to get an algorithm for this. I don't think it can be done with a greedy algorithm, more thinking along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic nor greedy. Let's look at a different problem first -- can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set si iff pi is an element of si. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source, add an edge linking it to all people with capacity 1 (representing that each person may only be used once), then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once) and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source to indicate that each person may be used more than once in this case.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.

Minimum ordering of a loop of values

Given a sequence like:
1,2,1,2,1,1,1,2,1,2,1,3,1,2,1,2,1,3
What is an efficient way to obtain the minimum ordering:
1,1,1,2,1,2,1,3,1,2,1,2,1,3,1,2,1,2
The brute force approach is obvious so please don't recommend it -- unless to provide a convincing proof that the brute force approach is the only way to do it!
More detail
I have an algorithm that generates a list of numbers. The output of the algorithm is a list / array, but logically the numbers represent a loop, where only the relative order of the elements matters. In order to store these loops for later comparison I want to put store them in such a way that the one-dimensional list of numbers stored represents the minimum ordering of the elements in the loop. A picture would be most helpful:
This loop describes the path around the T tetromino, where 1 is move forward, 2 is turn right, and 3 is turn left. It doesn't matter where you start or even which direction you go, following this sequence of 18 moves will get you a T tetromino. The output of the algorithm which produces this loop will return the elements with an arbitrary starting point and direction. So the returned array could be:
Arbitrary initial ordering:
1,2,1,2,1,1,1,2,1,2,1,3,1,2,1,2,1,3
There is one minimum ordering, though. It can be obtained from two different circuits, reflecting the fact that the T tetromino is symmetrical:
Minimum ordering:
1,1,1,2,1,2,1,3,1,2,1,2,1,3,1,2,1,2
Brute force
The obvious brute force method is to construct all possible orderings and take the minimum. My question is whether there is a more clever and efficient way of doing this.
This is a well-studied problem called the lexicographically minimal string rotation.
The better algorithms all run in O(n) time as opposed to O(n*n) for the naive algorithm.

Solution to Recursive Relations with Arrays

from Mexico. The truth is almost never asked or open new issues, because really the forum and not only this, if not to work instead of the network, you can find plenty of information about topic x or y, however this time I feel very defeated.
I have two years of recursion.
Define the following recursive algorithms.
a. Calculate the next n integers.
At first not referred to the master with this is that if the algorithm returns a sum, or set of numbers. Furthermore, although in principle and algorithm design for the second case is asked to resolve by its expression as a recurrence relation ... this is where I am more than lost, not how to express this as a RR. And that can be solved
b. Calculate the minimum of a set of integers
In the other case suppose that calls for the minimum of a set of integers. that's solved, but the fact and pass it to a RR fix, has left me completely flooded.
APPRECIATE ANY HELP, thanks
Answering on b)
You have a set of integers. You pick one and you know that minimal element is either that you've picked or the minimal is still in the set. Recursivly you call function unless you pick all elements from set, you assume that minimum of set that contain no elements is infinity. Then your recurrence is going back updateing the minimal value.
minimum (S) = min(any element, minimum(Rest of S))
if (S is empty) then minimum(empty) = infinity.
Not an implementation in any language cause surely depend on representation of set.
P.S why doing this recursivly?

Sorting a list of numbers with modified cost

First, this was one of the four problems we had to solve in a project last year and I couldn’t find a suitable algorithm so we handle in a brute force solution.
Problem: The numbers are in a list that is not sorted and supports only one type of operation. The operation is defined as follows:
Given a position i and a position j the operation moves the number at position i to position j without altering the relative order of the other numbers. If i > j, the positions of the numbers between positions j and i - 1 increment by 1, otherwise if i < j the positions of the numbers between positions i+1 and j decreases by 1. This operation requires i steps to find a number to move and j steps to locate the position to which you want to move it. Then the number of steps required to move a number of position i to position j is i+j.
We need to design an algorithm that given a list of numbers, determine the optimal (in terms of cost) sequence of moves to rearrange the sequence.
Attempts:
Part of our investigation was around NP-Completeness, we make it a decision problem and try to find a suitable transformation to any of the problems listed in Garey and Johnson’s book: Computers and Intractability with no results. There is also no direct reference (from our point of view) to this kind of variation in Donald E. Knuth’s book: The art of Computer Programing Vol. 3 Sorting and Searching. We also analyzed algorithms to sort linked lists but none of them gives a good idea to find de optimal sequence of movements.
Note that the idea is not to find an algorithm that orders the sequence, but one to tell me the optimal sequence of movements in terms of cost that organizes the sequence, you can make a copy and sort it to analyze the final position of the elements if you want, in fact we may assume that the list contains the numbers from 1 to n, so we know where we want to put each number, we are just concerned with minimizing the total cost of the steps.
We tested several greedy approaches but all of them failed, divide and conquer sorting algorithms can’t be used because they swap with no cost portions of the list and our dynamic programing approaches had to consider many cases.
The brute force recursive algorithm takes all the possible combinations of movements from i to j and then again all the possible moments of the rest of the element’s, at the end it returns the sequence with less total cost that sorted the list, as you can imagine the cost of this algorithm is brutal and makes it impracticable for more than 8 elements.
Our observations:
n movements is not necessarily cheaper than n+1 movements (unlike swaps in arrays that are O(1)).
There are basically two ways of moving one element from position i to j: one is to move it directly and the other is to move other elements around i in a way that it reaches the position j.
At most you make n-1 movements (the untouched element reaches its position alone).
If it is the optimal sequence of movements then you didn’t move the same element twice.
This problem looks like a good candidate for an approximation algorithm but that would only give us a good enough answer. Since you want the optimal answer, this is what I'd do to improve on the brute force approach.
Instead of blindly trying every permutations, I'd use a backtracking approach that would maintain the best solution found and prune any branches that exceed the cost of our best solution. I would also add a transposition table to avoid redoing searches on states that were reached by previous branches using different move permutations.
I would also add a few heuristics to explore moves that are more likely to reach good results before any other moves. For example, prefer moves that have a small cost first. I'd need to experiment before I can tell which heuristics would work best if any.
I would also try to find the longest increasing subsequence of numbers in the original array. This will give us a sequence of numbers that don't need to be moved which should considerably cut the number of branches we need to explore. This also greatly speeds up searches on list that are almost sorted.
I'd expect these improvements to be able to handle lists that are far greater than 8 but when dealing with large lists of random numbers, I'd prefer an approximation algorithm.
By popular demand (1 person), this is what I'd do to solve this with a genetic algorithm (the meta-heuristique I'm most familiar with).
First, I'd start by calculating the longest increasing subsequence of numbers (see above). Every item that is not part of that set has to be moved. All we need to know now is in what order.
The genomes used as input for the genetic algorithm, is simply an array where each element represents an item to be moved. The order in which the items show up in the array represent the order in which they have to be moved. The fitness function would be the cost calculation described in the original question.
We now have all the elements needed to plug the problem in a standard genetic algorithm. The rest is just tweaking. Lots and lots of tweaking.

Resources