Assign items to a minimal number of groups - algorithm

This is a popular CS pattern, but I'm apparently missing some keywords because I'm not having any luck searching for it.
I have a set of 4 items: [A,B,C,D].
I have 3 groups: 1, 2, 3.
Group 1 can accept A or B.
Group 2 can accept B or C.
Group 3 can accept C or D.
Assign the items in a way that minimizes the number of groups used.
I.e. the solution would be:
Group 1: [A,B]
Group 2: []
Group 3: [C,D]
How would I solve this programmatically? I know I've seen this before, so any keywords or links to point me in the right direction would be very appreciated.

This is the set covering problem. It is NP-hard, so no polynomial-time algorithm is known for finding a truly minimal cover in general. The greedy algorithm, which repeatedly takes the set covering the most remaining elements, often gives a good approximation. When the covering sets have bounded size, the problem can also be solved exactly in reasonable time. For further details see http://en.m.wikipedia.org/wiki/Set_cover_problem
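As a concrete illustration, here is a minimal C# sketch of that greedy heuristic. The character-based item encoding and all names are illustrative assumptions, not from the question:

using System;
using System.Collections.Generic;
using System.Linq;

static class GreedyCover
{
    // Repeatedly pick the group covering the most still-uncovered items.
    // This is an approximation; the result is not guaranteed to be minimal.
    public static List<int> Cover(List<HashSet<char>> groups, HashSet<char> items)
    {
        var uncovered = new HashSet<char>(items);
        var chosen = new List<int>();
        while (uncovered.Count > 0)
        {
            int best = -1, bestCount = 0;
            for (int i = 0; i < groups.Count; i++)
            {
                int covers = groups[i].Count(uncovered.Contains);
                if (covers > bestCount) { best = i; bestCount = covers; }
            }
            if (best == -1) break;              // remaining items cannot be covered
            chosen.Add(best);
            uncovered.ExceptWith(groups[best]);
        }
        return chosen;                          // indices of the chosen groups
    }
}

On the question's instance ({A,B}, {B,C}, {C,D}), greedy picks a two-group cover such as {A,B} then {C,D}, which here happens to be optimal.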

If we visualize this problem as a graph problem, in which all the groups and items are nodes and edges connect each group to the items it can accept, we can see that this problem is a small case of Vertex Cover.
However, if the number of items is small (fewer than about 16), we can use dynamic programming over subsets to solve it easily.
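For example, here is a C# sketch of that subset DP, representing each group as a bitmask over the items (the bitmask encoding is my own assumption):

using System;

static class SubsetDp
{
    // dp[mask] = minimum number of groups whose union covers the item set 'mask'.
    public static int MinGroups(int[] groupMasks, int nItems)
    {
        int full = (1 << nItems) - 1;
        var dp = new int[full + 1];
        Array.Fill(dp, int.MaxValue);
        dp[0] = 0;
        for (int mask = 0; mask <= full; mask++)
        {
            if (dp[mask] == int.MaxValue) continue;
            foreach (int g in groupMasks)       // try extending the cover with each group
            {
                int next = mask | g;
                if (dp[mask] + 1 < dp[next]) dp[next] = dp[mask] + 1;
            }
        }
        return dp[full];                        // int.MaxValue means no cover exists
    }
}

With items A..D as bits 0..3, the question's groups become 0b0011, 0b0110 and 0b1100, and MinGroups returns 2 (groups 1 and 3).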

Related

Shortest time for everyone to get to the destination

I have to design an algorithm to solve a problem:
We have two groups of people (group A and group B; the number of people in group A is always less than or equal to the number in group B), all standing on a one-dimensional line, and each person has a number indicating their location. When the timer starts, each person in group A must find a partner in group B; people in group B cannot move at all, and each person in group B can have at most one partner.
Suppose that people in group A move 1 unit/sec. How can I find the minimum time for everyone in group A to find a partner?
For example, if there are three people in group A at locations {5,7,8} and four people in group B at locations {2,3,4,9}, the optimal solution would be 3 sec, because max(5-3, 7-4, 9-8) = 3.
I could just use brute-force to solve it, but is there a better way of solving this problem?
This problem is a special case of the edit distance problem, and so a similar Dynamic Programming solution can be used to solve it. It's possible that a faster solution exists for this special case.
Let A = [a_0, a_1, ..., a_(m-1)] be the (sorted) positions of our m moving people, and B = [b_0, b_1, ..., b_(n-1)] be the n (sorted) destination spots, with m <= n. For the edit-distance analogy, the allowed operations are:
Insert a number into A (free), or
Substitute an element a -> a' in A with cost |a-a'|.
We can solve this in O(n*m) time (plus sorting time of both A and B, if necessary).
We can define the dynamic program via a cost function C(i, j), the minimum cost to match the first i people a_0, ..., a_(i-1) using only the first j spots b_0, ..., b_(j-1). You want C(m, n). Define C as follows:
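A C# sketch of one such definition. One assumption to flag: because the question asks for the minimum time (and the example computes max(5-3, 7-4, 9-8)), partial costs below are combined with Math.Max rather than summed:

using System;

static class PartnerMatching
{
    // C[i, j] = minimum achievable time to match the first i people
    // a_0..a_(i-1) using only the first j spots b_0..b_(j-1).
    public static int MinTime(int[] a, int[] b)
    {
        Array.Sort(a);
        Array.Sort(b);
        int m = a.Length, n = b.Length;
        const int INF = int.MaxValue;
        var C = new int[m + 1, n + 1];               // C[0, j] = 0: nobody left to move
        for (int i = 1; i <= m; i++) C[i, 0] = INF;  // people left but no spots: impossible
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
            {
                int skip = C[i, j - 1];              // "insert": leave spot b[j-1] unused, free
                int use = C[i - 1, j - 1] == INF ? INF
                    : Math.Max(C[i - 1, j - 1], Math.Abs(a[i - 1] - b[j - 1])); // "substitute"
                C[i, j] = Math.Min(skip, use);
            }
        return C[m, n];
    }
}

MinTime(new[] {5, 7, 8}, new[] {2, 3, 4, 9}) returns 3, matching the example. Scanning both sorted lists in order is safe because some optimal matching on a line never crosses.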

Why dynamic programming for 0/1 Knapsack?

I looked at many resources and also this question, but am still confused about why we need dynamic programming to solve 0/1 knapsack.
The question is: I have N items, each with value Vi and weight Wi, and a bag of total capacity W. How do I select items to get the best total value without exceeding the weight limit?
I am confused by the dynamic programming approach: why not just calculate the ratio (value / weight) for each item and repeatedly select the item with the best ratio that still fits in the remaining capacity of the bag?
For your fraction-based approach you can easily find a counterexample.
Consider
W=[3, 3, 5]
V=[4, 4, 7]
Wmax=6
Your approach gives Vopt=7 (we take the last item, since 7/5 > 4/3), but taking the first two items gives Vopt=8.
As other answers pointed out, there are edge cases with your approach.
To explain the recursive solution a bit better, and perhaps to understand it better, I suggest you approach it with this reasoning:
For each "subsack":
If no fitting element remains, there is no best element.
If exactly one fitting element remains, the best choice is that element.
If more than one element fits, take each element in turn and compute the best fit for its "subsack". The best choice is the highest-valued element/subsack combination.
This algorithm works because it enumerates all possible combinations of fitting elements and finds the one with the highest value.
A direct greedy solution, by contrast, cannot be guaranteed optimal, because the problem is NP-hard.
Just look at this counterexample:
Capacity 7, (weight/value) pairs (3/10), (4/12), (5/21): the best ratio is 21/5, but taking that item leaves no room for anything else (value 21), whereas (3/10) + (4/12) fills the sack exactly for value 22.
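A minimal C# sketch of that recursive enumeration (exponential without memoization; all names are my own):

using System;

static class KnapsackRecursive
{
    // Best value achievable using items i..n-1 with remaining capacity cap.
    public static int Best(int[] w, int[] v, int i, int cap)
    {
        if (i == w.Length) return 0;                     // no elements left
        int skip = Best(w, v, i + 1, cap);               // leave item i out
        if (w[i] > cap) return skip;                     // item i doesn't fit this "subsack"
        int take = v[i] + Best(w, v, i + 1, cap - w[i]); // take item i, solve the "subsack"
        return Math.Max(skip, take);
    }
}

Best(new[] {3, 4, 5}, new[] {10, 12, 21}, 0, 7) returns 22, confirming the counterexample; memoizing on (i, cap) turns this into the usual dynamic program.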
The greedy algorithm also fails when the item with the better ratio blocks a heavier but more valuable item. Consider the following example:

Item       1    2
Profit P   4   18
Weight W   2   18
P/W        2    1

Knapsack capacity = 18
The greedy algorithm takes the first item, since its P/W ratio is greater, so the total profit is 4: it cannot add the second item afterwards, because the remaining capacity drops to 16 after inserting the first.
But the actual answer is 18, by taking only the second item.
Hence there are multiple corner cases where greedy fails to give an optimal solution, and that is why we use dynamic programming for the 0/1 knapsack problem.
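For comparison, a C# sketch of the standard bottom-up dynamic program (the one-dimensional table variant):

using System;

static class KnapsackDp
{
    // best[c] = best value achievable with capacity c using the items seen so far.
    public static int Solve(int[] w, int[] v, int capacity)
    {
        var best = new int[capacity + 1];
        for (int i = 0; i < w.Length; i++)
            for (int c = capacity; c >= w[i]; c--)   // descending, so item i is used at most once
                best[c] = Math.Max(best[c], best[c - w[i]] + v[i]);
        return best[capacity];
    }
}

Solve(new[] {2, 18}, new[] {4, 18}, 18) returns 18, where the ratio-greedy approach above got stuck at 4.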

How to find the minimum number of groups needed for grouping a list of elements?

For example:
There is a list of elements, e.g. a sorted integer list {1,2,3,4,6,8,11}.
I want to find the minimum number of groups such that the members of each group have a qualified difference between each other, e.g. differ by at most 2.
In this case, we can see the answer should be 4, and one possible solution is {1,2,3}, {4,6}, {8}, {11}.
How to write the algorithm to find the minimum number?
Furthermore, we can consider this in 2 dimensions:
There is a list of points {(3,3),(3,6),(6,9)}. What is the minimum number of squares of side length 3 needed to cover all those points? The answer should be 2 here, but how do I find it programmatically?
I am asking this question for solving the question Square Fields in Google Code Jam Practice Contest, but any answer solving the first part of this question is good enough for me.
For 1-D: sort the elements in ascending order, then greedily form groups: take the smallest element not yet covered by any group as the new group's left bound, and put into that group every element within the allowed difference of it.
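A minimal C# sketch of that greedy sweep, assuming the qualified difference means "at most maxDiff from the group's left bound" (equivalent to pairwise differences of at most maxDiff once the input is sorted):

using System;

static class IntervalGrouping
{
    // Sort, then repeatedly open a group at the smallest uncovered element
    // and absorb every element within maxDiff of that left bound.
    public static int MinGroups(int[] xs, int maxDiff)
    {
        Array.Sort(xs);
        int groups = 0, i = 0;
        while (i < xs.Length)
        {
            groups++;
            int left = xs[i];                        // left bound of the new group
            while (i < xs.Length && xs[i] - left <= maxDiff) i++;
        }
        return groups;
    }
}

MinGroups(new[] {1, 2, 3, 4, 6, 8, 11}, 2) returns 4, matching {1,2,3}, {4,6}, {8}, {11}.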

Combinatorial best match

Say I have a Group data structure which contains a list of Element objects, such that each group has a unique set of elements:
public class Group
{
    public List<Element> Elements;
}
and say I have a list of populations who require certain elements, in such a way that each population has a unique set of required elements:
public class Population
{
    public List<Element> RequiredElements;
}
I have an unlimited quantity of each defined Group, i.e. they are not consumed by populations.
Say I am looking at a particular Population. I want to find the best possible match of groups such that there are minimal excess elements and no unmatched elements.
For example: I have a population which needs wood, steel, grain, and coal. The only groups available are {wood, herbs}, {steel, coal, oil}, {grain, steel}, and {herbs, meat}.
The last group, {herbs, meat}, isn't required at all by my population, so it isn't used. All the others are needed, but herbs and oil are not required, so they are wasted. Furthermore, steel appears twice in the chosen set, so one lot of steel is also wasted. The best match in this example has a wastage of 3.
So for a few hundred Population objects, I need to find the minimum wastage best match and compute how many elements are wasted.
How do I even begin to solve this? Once I have found a match, counting the wastage is trivial. Finding the match in the first place is hard. I could enumerate all possibilities but with a few thousand populations and many hundreds of groups, it's quite a task. Especially considering this whole thing sits inside each iteration of a simulated annealing algorithm.
I'm wondering whether I can formulate the whole thing as a mixed-integer program and call a solver like GLPK at each iteration.
I hope I have explained the problem correctly. I can clarify anything that's unclear.
Here's my binary program, for those of you interested...
x is the decision vector, an element of {0,1}, which says that the population in question does/doesn't receive from group i. There is an entry for each group.
b is the column vector, an element of {0,1}, which says which resources the population in question does/doesn't need. There is an entry for each resource.
A is a matrix, an element of {0,1}, which says what resources are in what groups.
The program is:
Minimise: ((Ax - b)' * 1-vector) + (x' * 1-vector);
Subject to: Ax >= b;
The constraint just says that all required resources must be satisfied. The objective is to minimise all excess and the total number of groups used. (i.e. 0 excess with 1 group used is better than 0 excess with 5 groups used).
You can formulate an integer program for each population P as follows. Use a binary variable xj to denote whether group j is chosen or not. Let A be a binary matrix, such that Aij is 1 if and only if item i is present in group j. Then the integer program is:
min Σ_{i,j} xj Aij
s.t. Σ_j xj Aij >= 1 for all i in P,
     xj ∈ {0, 1} for all j.
Note that you can obtain the minimum wastage by subtracting |P| from the optimal solution of the above IP.
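For small instances you can sanity-check that IP (or the output of a solver such as GLPK) with an exhaustive search over subsets of groups. A hedged C# sketch, with items encoded as ints (my own encoding; exponential in the number of groups, so only for validation):

using System;
using System.Collections.Generic;
using System.Linq;

static class BestMatch
{
    // Minimizes the total number of elements delivered while covering
    // every required item; wastage = result - required.Length.
    public static int MinTotalDelivered(int[][] groups, int[] required)
    {
        int n = groups.Length, best = int.MaxValue;
        for (int mask = 0; mask < (1 << n); mask++)
        {
            var delivered = new List<int>();
            for (int j = 0; j < n; j++)
                if ((mask & (1 << j)) != 0) delivered.AddRange(groups[j]);
            if (required.All(delivered.Contains))       // Σ_j xj Aij >= 1 for every i
                best = Math.Min(best, delivered.Count); // objective Σ_{i,j} xj Aij
        }
        return best;
    }
}

On the wood/steel/grain/coal example this returns 7 delivered elements for 4 required ones, i.e. the wastage of 3 described above.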
Do you mean the Maximum matching problem?
You need to build a bipartite graph, where one side is your populations and the other is your groups, and an edge exists between a group and a population if the group contains an element that the population requires.
To find a maximum matching you can use Kuhn's algorithm, which is well described on TopCoder.
But if you want to find a minimum edge dominating set (the minimum set of edges covering all the vertices), the problem becomes NP-hard and can't be solved in polynomial time (unless P = NP).
Take a look at the weighted set cover problem; I think this is exactly what you described above. A basic description of the (unweighted) problem can be found here.
Finding the minimal waste as you defined it above is equivalent to finding a set cover such that the sum of the cardinalities of the covering sets is minimal. Hence, the weight of each set (= a group of elements) has to be defined as its cardinality.
Since even the unweighted set cover problem is NP-complete, it is not likely that an efficient exact algorithm for your problem instances exists. Maybe a good greedy approximation algorithm will be sufficient for your purpose? Googling weighted set cover provides several promising results, e.g. this script.

Ordering a dictionary to maximize common letters between adjacent words

This is intended to be a more concrete, easily expressable form of my earlier question.
Take a list of words from a dictionary, all with the same letter length.
How can I reorder this list to keep as many letters as possible common between adjacent words?
Example 1:
AGNI, CIVA, DEVA, DEWA, KAMA, RAMA, SIVA, VAYU
reorders to:
AGNI, CIVA, SIVA, DEVA, DEWA, KAMA, RAMA, VAYU
Example 2:
DEVI, KALI, SHRI, VACH
reorders to:
DEVI, SHRI, KALI, VACH
The simplest algorithm seems to be: pick anything, then repeatedly search for the word at the shortest distance?
However, DEVI -> KALI (1 letter in common) is equivalent to DEVI -> SHRI (1 letter in common).
Choosing the first match would result in fewer common letters across the entire list (4 versus 5).
It seems that this should be simpler than full TSP?
What you're trying to do is compute the shortest Hamiltonian path in a complete weighted graph, where each word is a vertex and the weight of each edge is the number of letters that differ between the two words.
For your example, the graph would have edges weighted as follows:

       DEVI  KALI  SHRI  VACH
DEVI    X     3     3     4
KALI    3     X     3     3
SHRI    3     3     X     4
VACH    4     3     4     X
Then it's just a simple matter of picking your favorite TSP solving algorithm, and you're good to go.
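For small lists you can solve it exactly by brute force. A C# sketch (factorial time, so only for a handful of words; Held-Karp bitmask DP or a TSP heuristic would be needed beyond that; Diff reproduces the edge weights in the table above):

using System;
using System.Collections.Generic;
using System.Linq;

static class WordPath
{
    // Edge weight: count of positions where the letters differ.
    static int Diff(string a, string b) => a.Zip(b, (x, y) => x == y ? 0 : 1).Sum();

    // Exact shortest Hamiltonian path by trying every ordering.
    public static (List<string> order, int cost) BestOrder(List<string> words)
    {
        List<string> best = null;
        int bestCost = int.MaxValue;
        Permute(words, 0, perm =>
        {
            int cost = 0;
            for (int i = 1; i < perm.Count; i++) cost += Diff(perm[i - 1], perm[i]);
            if (cost < bestCost) { bestCost = cost; best = new List<string>(perm); }
        });
        return (best, bestCost);
    }

    // Classic in-place permutation generator; visits each ordering once.
    static void Permute(List<string> a, int k, Action<List<string>> visit)
    {
        if (k == a.Count) { visit(a); return; }
        for (int i = k; i < a.Count; i++)
        {
            (a[k], a[i]) = (a[i], a[k]);
            Permute(a, k + 1, visit);
            (a[k], a[i]) = (a[i], a[k]);
        }
    }
}

For the second example, BestOrder(new List<string> {"DEVI", "KALI", "SHRI", "VACH"}) finds a total difference of 9, e.g. DEVI, SHRI, KALI, VACH.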
My pseudo code:
Create a graph of nodes where each node represents a word
Create connections between all the nodes (every node connects to every other node). Each connection has a "value" which is the number of common characters.
Drop connections where the "value" is 0.
Walk the graph, preferring connections with the highest values. If two connections have the same value, try both recursively.
Store the output of each walk in a list along with the sum of the distances between the words in this particular result. I'm not 100% sure ATM whether you can simply sum the connections you used; see for yourself.
From all outputs, choose the one with the highest value.
This problem is probably NP-complete, which means the runtime of the algorithm will become unbearable as the dictionary grows. Right now I see only one way to optimize it: cut the graph into several smaller graphs, run the code on each, and then join the lists. The result won't be as good as when you try every permutation, but the runtime will be much better, and the final result might be "good enough".
[EDIT] Since this algorithm doesn't try every possible combination, it's quite possible to miss the perfect result. It's even possible to get caught in a local maximum: say you have a pair with a value of 7, but if you choose this pair, all other values drop to 1, whereas if you don't take it, most other values are 2, giving a much better overall result.
This algorithm trades perfection for speed. When trying every possible combination would take years, even on the fastest computer in the world, you must find some way to bound the runtime.
If the dictionaries are small, you can simply generate every permutation and then select the best result. If they grow beyond a certain bound, you're doomed.
Another solution is to mix the two. Use the greedy algorithm to find "islands" which are probably pretty good and then use the "complete search" to sort the small islands.
This can be done with a recursive approach. Pseudo-code:
Start with one of the words, call it w
FindNext(w, l)                      // l = list of words without w
    Get a list n of the words in l nearest to w
    If there is only one word in n
        Return that word
    Else
        For every word w' in n, do FindNext(w', l')   // l' = l without w'
You can add some score to count common pairs and to prefer "better" lists.
You may want to take a look at BK-trees, which make it efficient to find words within a given distance of each other. Not a total solution, but possibly a component of one.
This problem has a name: n-ary Gray code. Since you're using English letters, n = 26. The Wikipedia article on Gray code describes the problem and includes some sample code.
