Question about formulating a problem as a Linear Program - algorithm

I have the following problem:
We have 180 students. Each student is required to choose one of 6 courses to get a degree. No course should have more than 30 students in it. Moreover, students must specify three courses with different preferences. The goal is to find an assignment of students to courses in such a way that:
Every student is assigned to a course.
There is no course which has more than 30 students.
The sum of student preferences is maximized.
First question is to formulate the problem as a Linear Program (LP). My formulation is as follows:
Maximize ,
subject to:
.
.
.
Is my formulation correct?
The second part of the question is the following:
Suppose we have a black box which solves the Min Cost Flow problem (https://en.wikipedia.org/wiki/Minimum-cost_flow_problem). How can we use this black box to solve our assignment problem?
Thank you,
Regards.

Your integer linear programming (ILP) formulation is not completely correct. In your last constraint, you write that every class has exactly 30 students, but the requirement is only that a class cannot have more than 30 students.
So the formulation should be something like:
maximize $\sum_{i,j} x_{ij}\, p_{ij}$
subject to:
$\sum_{j} x_{ij} = 1, \quad \forall i$
$\sum_{i} x_{ij} \le 30, \quad \forall j$
As for the min-cost flow, you can represent each student as a node in a network, and each class as a node. For example, with four students and three classes, the graph has a source s, student nodes s1, ..., s4, class nodes c1, c2, c3, and a sink d.
Here the capacity from s to each student si is 1, since each student can make at most one choice, so c(s, si)=1. The capacity of a classroom is 30, so for every class cj it holds that c(cj, d)=30. Furthermore, the capacity between each si and cj is 1 as well (although a larger capacity will not make a difference), so c(si, cj)=1.
We then add a "cost" to the edges between si and cj that is equal to a(si, cj)=-pij, so the higher the preference, the lower the cost. Other edges have a cost of zero, so a(s, si)=a(cj, d)=0. We then assign flows (based on the capacities, one unit per student, such that the total flow into a classroom is at most 30) and minimize the cost, that is, minimize the sum of the -pij's. If a flow exists such that there is a flow of 1 from the source s to every student si, then every student gets a course, and the total cost will be optimized.
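The construction above can be sketched in Python. This is a minimal sketch, not the black box from the question: it assumes a small hand-written successive-shortest-path solver, and the names MinCostFlow and assign_courses are hypothetical. Courses a student did not list are assumed to have preference 0.

```python
from collections import deque

class MinCostFlow:
    """Small successive-shortest-path solver (illustrative stand-in for the black box)."""
    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge = [to, capacity, cost, index of reverse edge]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford, because edge costs may be negative.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for idx, (v, cap, w, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        prev[v] = (u, idx)
                        if not queued[v]:
                            queue.append(v)
                            queued[v] = True
            if prev[t] is None:
                return flow, cost  # no augmenting path left
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the path
                u, idx = prev[v]
                push = min(push, self.graph[u][idx][1])
                v = u
            v = t
            while v != s:  # update residual capacities
                u, idx = prev[v]
                edge = self.graph[u][idx]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]

def assign_courses(prefs, num_courses, capacity):
    """prefs[i] maps course index -> preference p_ij (unlisted courses count as 0)."""
    n = len(prefs)
    s, d = n + num_courses, n + num_courses + 1
    net = MinCostFlow(n + num_courses + 2)
    for i, p in enumerate(prefs):
        net.add_edge(s, i, 1, 0)                      # c(s, si) = 1, cost 0
        for j in range(num_courses):
            net.add_edge(i, n + j, 1, -p.get(j, 0))   # c(si, cj) = 1, cost -p_ij
    for j in range(num_courses):
        net.add_edge(n + j, d, capacity, 0)           # c(cj, d) = capacity, cost 0
    flow, cost = net.solve(s, d)
    # A saturated student->course edge means the student was assigned to that course.
    assignment = {i: v - n for i in range(n)
                  for v, cap, _, _ in net.graph[i] if n <= v < n + num_courses and cap == 0}
    return flow, -cost, assignment

prefs = [{0: 3, 1: 1}, {0: 3, 1: 2}, {0: 3, 1: 1}, {1: 3, 0: 2}]
print(assign_courses(prefs, num_courses=2, capacity=2))
```

If the returned flow is smaller than the number of students, the capacities do not allow every student to be placed.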

Related

A variation of KnapSack

Consider the problem definition of a knapsack problem. Given a set S of objects - each having a profit and weight associated with it, I have to find a subset T of S, which gives me the maximum profit but has a total weight less than or equal to a constant W. Now consider an extra constraint. In the above problem the profit of one object is independent of another. Suppose I say they're interdependent, say I've a factor 0<= S_ij <=1 for two objects i and j. This factor diminishes the effect of the item with minimum profit. Effectively
profit({i,j})=max(profit(i),profit(j))+S_ij * min(profit(i),profit(j))
This keeps the effective sum between max(profit(i),profit(j)) and profit(i)+profit(j) -> "At least as good as the best one but not as good as using both simultaneously". Now I'm trying to extend it for n>2. Is this a standard problem or some variation of knapsack? Can I formulate an LP(?) or NLP for this?
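As a small worked check of the two-object formula, here is a sketch in Python (the function name pair_profit is hypothetical). It only covers the n=2 case defined above and says nothing about how n>2 should be handled.

```python
def pair_profit(profit_i, profit_j, s_ij):
    """Combined profit of two objects with interaction factor 0 <= s_ij <= 1."""
    return max(profit_i, profit_j) + s_ij * min(profit_i, profit_j)

# s_ij = 0 gives the larger profit alone; s_ij = 1 gives the plain sum.
print(pair_profit(10, 4, 0.0), pair_profit(10, 4, 0.5), pair_profit(10, 4, 1.0))
```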
UPDATE:
The set T is a strict subset of S. So you can only use objects in S(use duplicates if it exists in S).
As for the objective function, I'm still not sure how to go about it. Above I've calculated the score for a 2-object sack considering the interactions between them. Now I want to extend it to more than 2 objects, and I'm not sure how to do it. The letter 'n' is the size of the sack. For n=2 I've defined a way of calculating the total profit of the sack, but for n>2 I'm not quite clear.

Variant Scheduling Algorithm

I'm working on a problem from "Algorithm Design" by Kleinberg and Tardos, specifically problem 4.15. I'm not currently enrolled in the class that this relates to -- I'm taking a crack at the problem set before the new quarter starts to see if I'd be able to do it. The question is as follows:
The manager of a large student union on campus comes to you with the
following problem. She’s in charge of a group of n students, each of whom
is scheduled to work one shift during the week. There are different jobs
associated with these shifts (tending the main desk, helping with package
delivery, rebooting cranky information kiosks, etc.), but we can view each
shift as a single contiguous interval of time. There can be multiple shifts
going on at once.
She’s trying to choose a subset of these n students to form a supervising
committee that she can meet with once a week. She considers such
a committee to be complete if, for every student not on the committee,
that student’s shift overlaps (at least partially) the shift of some student
who is on the committee. In this way, each student’s performance can be
observed by at least one person who’s serving on the committee.
Give an efficient algorithm that takes the schedule of n shifts and
produces a complete supervising committee containing as few students
as possible.
Example. Suppose n = 3, and the shifts are
Monday 4 P.M.-Monday 8 P.M.,
Monday 6 P.M.-Monday 10 P.M.,
Monday 9 P.M.-Monday 11 P.M.
Then the smallest complete supervising committee would consist of just
the second student, since the second shift overlaps both the first and the
third.
My attempt (I can't find this problem in my solution manual, so I'm asking here):
Construct a graph G with vertices S1, S2, ..., Sn for each student.
Let there be an edge between Si and Sj iff students i and j have an overlapping
shift. Let C represent the set of students in the supervising committee.
[O(n + 2m) to build an adjacency list, where m is the number of shifts?
Since we have to add at least each student to the adjacency list, and add an
additional m entries for each shift, with two entries added per shift since
our graph is undirected.]
Sort the vertices by degree into a list S [O(n log n)].
While S[0] has degree > 0:
(1) Add Si to C. [O(1)]
(2) Delete Si and all of the nodes that it was connected to, update the
adjacency list.
(3) Update S so that it is once again sorted.
Add any remaining vertices of degree 0 to C.
I'm not sure how to quantify the runtime of (2) and (3). Since the degree of any node is bounded by n, it seems that (2) is bounded by O(n). But the degree of the node removed in (1) also affects the number of iterations performed inside of the while loop, so I suspect that it's possible to say something about the upper bound of the whole while loop -- something to the effect of "Any sequence of deletions will involve deleting at most n nodes in linear time and resorting at most n nodes in linear time, resulting in an upper bound of O(n log n) for the while loop, and therefore of the algorithm as a whole."
You don't want to convert this to a general graph problem, as then it's simply the NP-hard dominating set problem. However, on interval graphs in particular, there is in fact a linear-time greedy algorithm, as described in this paper (which is actually for a more general problem, but works fine here). From a quick read of it, here's how it applies to your problem:
Sort the students by the time at which their shift ends, from earliest to latest. Number them 1 through n.
Initialize a counter k = 1 which represents the earliest student in the ordering not in the committee.
Starting from k, find the first student in the order whose shift does not intersect student k's shift. Suppose this is student i. Add student i-1 to the committee, and update k to be the new earliest student not covered by the committee.
Repeat the previous step until all students are covered.
(This feels correct, but like I said I only had a quick read, so please say if I missed something)
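For comparison, here is a minimal Python sketch of a closely related greedy. It is an assumption on my part rather than the exact procedure from the paper: it takes the earliest-ending uncovered shift and adds, among the shifts overlapping it, the one that ends latest. The version below is quadratic for simplicity, and the name min_committee is hypothetical. Shifts are (start, end) pairs, and two shifts overlap only if they share more than an endpoint.

```python
def overlaps(a, b):
    """True if intervals a and b share more than a single endpoint."""
    return a[0] < b[1] and b[0] < a[1]

def min_committee(shifts):
    """Return indices of a committee such that every other shift overlaps a member."""
    n = len(shifts)
    order = sorted(range(n), key=lambda i: shifts[i][1])  # by end time
    covered = [False] * n
    committee = []
    for k in order:
        if covered[k]:
            continue
        # Among k and the shifts overlapping k, pick the one that ends latest.
        candidates = [j for j in range(n) if j == k or overlaps(shifts[j], shifts[k])]
        best = max(candidates, key=lambda j: shifts[j][1])
        committee.append(best)
        for j in range(n):
            if j == best or overlaps(shifts[j], shifts[best]):
                covered[j] = True
    return committee

# The example from the problem statement, in 24-hour time.
print(min_committee([(16, 20), (18, 22), (21, 23)]))
```

The choice of the latest-ending overlapping shift rests on an exchange argument: every still-uncovered shift ends no earlier than the current one, so anything another overlapping shift could cover is also covered by the one that ends latest.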

Combinatorial best match

Say I have a Group data structure which contains a list of Element objects, such that each group has a unique set of elements:
public class Group
{
public List<Element> Elements;
}
and say I have a list of populations who require certain elements, in such a way that each population has a unique set of required elements:
public class Population
{
public List<Element> RequiredElements;
}
I have an unlimited quantity of each defined Group, i.e. they are not consumed by populations.
Say I am looking at a particular Population. I want to find the best possible match of groups such that there is minimum excess elements, and no unmatched elements.
For example: I have a population which needs wood, steel, grain, and coal. The only groups available are {wood, herbs}, {steel, coal, oil}, {grain, steel}, and {herbs, meat}.
The last group, {herbs, meat}, isn't required at all by my population, so it isn't used. All others are needed, but herbs and oil are not required, so they are wasted. Furthermore, steel exists twice in the minimum set, so one lot of steel is also wasted. The best match in this example has a wastage of 3.
So for a few hundred Population objects, I need to find the minimum wastage best match and compute how many elements are wasted.
How do I even begin to solve this? Once I have found a match, counting the wastage is trivial. Finding the match in the first place is hard. I could enumerate all possibilities but with a few thousand populations and many hundreds of groups, it's quite a task. Especially considering this whole thing sits inside each iteration of a simulated annealing algorithm.
I'm wondering whether I can formulate the whole thing as a mixed-integer program and call a solver like GLPK at each iteration.
I hope I have explained the problem correctly. I can clarify anything that's unclear.
Here's my binary program, for those of you interested...
x is the decision vector, an element of {0,1}, which says that the population in question does/doesn't receive from group i. There is an entry for each group.
b is the column vector, an element of {0,1}, which says which resources the population in question does/doesn't need. There is an entry for each resource.
A is a matrix, an element of {0,1}, which says what resources are in what groups.
The program is:
Minimise: ((Ax - b)' * 1-vector) + (x' * 1-vector);
Subject to: Ax >= b;
The constraint just says that all required resources must be satisfied. The objective is to minimise all excess and the total number of groups used. (i.e. 0 excess with 1 group used is better than 0 excess with 5 groups used).
You can formulate an integer program for each population P as follows. Use a binary variable xj to denote whether group j is chosen or not. Let A be a binary matrix, such that Aij is 1 if and only if item i is present in group j. Then the integer program is:
$\min \sum_{i,j} x_j A_{ij}$
s.t. $\sum_j x_j A_{ij} \ge 1$ for all $i \in P$,
$x_j \in \{0, 1\}$ for all $j$.
Note that you can obtain the minimum wastage by subtracting |P| from the optimal solution of the above IP.
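To make the objective concrete, here is a brute-force Python sketch that evaluates the same quantity on the example from the question. It enumerates every combination of groups, so it is only meant for tiny inputs and is not a substitute for an IP solver; the function name min_wastage is hypothetical.

```python
from itertools import combinations

def min_wastage(required, groups):
    """Return (wastage, chosen group indices) minimizing total group size minus |P|."""
    required = set(required)
    best = None
    for r in range(len(groups) + 1):
        for combo in combinations(range(len(groups)), r):
            covered = set().union(*(groups[j] for j in combo))
            if required <= covered:  # constraint: every required item is covered
                waste = sum(len(groups[j]) for j in combo) - len(required)
                if best is None or waste < best[0]:
                    best = (waste, combo)
    return best  # None if the required items cannot be covered

groups = [{"wood", "herbs"}, {"steel", "coal", "oil"}, {"grain", "steel"}, {"herbs", "meat"}]
print(min_wastage({"wood", "steel", "grain", "coal"}, groups))
```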
Do you mean the Maximum matching problem?
You need to build a bipartite graph, where one side is your populations and the other is your groups, and an edge exists between group A and population B if the group contains an element that the population requires.
To find a maximum matching you can use Kuhn's algorithm, which is described well here on TopCoder.
But if you want to find a minimum edge dominating set (a minimum set of edges that covers all the vertices), the problem becomes NP-hard, and no polynomial-time algorithm is known for it.
Take a look at the weighted set cover problem, I think this is exactly what you described above. A basic description of the (unweighted) problem can be found here.
Finding the minimal waste as you defined above is equivalent to finding a set cover such that the sum of the cardinalities of the covering sets is minimal. Hence, the weight of each set (=a group of elements) has to be defined equal to its cardinality.
Since even the unweighted set cover problem is NP-complete, it is not likely that an efficient algorithm for your problem instances exists. Maybe a good greedy approximation algorithm will be sufficient for your purpose? Googling weighted set cover provides several promising results, e.g. this script.
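A greedy approximation of the kind mentioned above could look like the following Python sketch. It is not the linked script; it assumes the usual greedy rule for weighted set cover (repeatedly pick the set with the lowest weight per newly covered element), with each group's weight equal to its cardinality. The name greedy_cover is hypothetical, and the result is not guaranteed to be optimal in general.

```python
def greedy_cover(required, groups):
    """Greedy weighted set cover; returns (chosen group indices, wastage)."""
    required = set(required)
    uncovered = set(required)
    chosen = []
    while uncovered:
        candidates = [j for j in range(len(groups)) if groups[j] & uncovered]
        if not candidates:
            raise ValueError("some required elements are in no group")
        # Weight of a group is its size; pay as little as possible per new element.
        best = min(candidates, key=lambda j: len(groups[j]) / len(groups[j] & uncovered))
        chosen.append(best)
        uncovered -= groups[best]
    wastage = sum(len(groups[j]) for j in chosen) - len(required)
    return chosen, wastage

groups = [{"wood", "herbs"}, {"steel", "coal", "oil"}, {"grain", "steel"}, {"herbs", "meat"}]
print(greedy_cover({"wood", "steel", "grain", "coal"}, groups))
```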

Is there a well understood algorithm or solution model for this meeting scheduling scenario?

I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai,Aj) which indicates that attendee Ai wishes to meet with attendee Aj, and Aj accepted that invitation.
Output:
For each attendee A, a schedule of all the events they will attend. The main criterion is that each attendee should meet as many of the attendees who accepted their invitations as possible, subject to the space constraints.
So far, we thought of solving with backtracking (trying out all possible solutions), and using linear programming (i.e. defining a model and solving with the simplex algorithm)
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is at least as hard as the minimum maximal matching problem in interval graphs. Without loss of generality, assume the capacity of each room is 2, which means a room can host only one meeting at a time. You can model your problem with interval graphs: each interval (one per person) is a node, and there is an edge between A_i and A_j if they have time in common and also want to see each other; set the weight of each edge to the amount of time they should see each other. If you find the minimum maximal matching in this graph, you have the solution for this restricted case. But notice that this graph is n-partite and each part is an interval graph.
P.S.: note that if the amount of time that people should spend with each other is fixed, the problem is easier than the weighted one.
If you have access to a good MIP solver (CPLEX/Gurobi via the academic initiative, but COIN-OR and lp_solve are open-source and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. An advantage is that they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event n in {1,2,...,N}.
Define z(i,j,n) (binary) to indicate they are together in event n.
Define z(i,n) (binary) to indicate that i is attending n.
z(i,j) and z(i,j,n) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simultaneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2} and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Max sum_(i,j) z(i,j)
z(i,j) <= sum_n z(i,j,n)
(for every i,j) (i and j can meet only if they are in the same event n at least once)
z(i,j,n) <= z(i,n) (for every i,j,n)
(if i and j attend n, then i attends n)
z(i,j,n) <= z(j,n) (for every i,j,n)
(if i and j attend n, then j attends n)
sum_i z(i,n) <= C(n) (for every n)
(only C(n) persons can visit event n)
sum_(n in M^t) z(i,n) <= 1 (for every t and i)
(if n and n' both overlap time t, then no person can visit them both.)
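To illustrate what the constraints and the objective mean, here is a small Python sketch that checks a proposed schedule against the capacity and overlap constraints and counts how many wanted pairs actually meet. It does not solve the MIP; the function name count_meetings and the data layout are assumptions made for this example.

```python
def count_meetings(attend, capacity, overlap_groups, wanted_pairs):
    """attend: attendee -> set of events; capacity: event -> C(n);
    overlap_groups: list of sets M^t; wanted_pairs: list of (i, j)."""
    # sum_i z(i,n) <= C(n): no event may exceed its capacity.
    for event, cap in capacity.items():
        if sum(1 for events in attend.values() if event in events) > cap:
            raise ValueError(f"event {event} is over capacity")
    # sum_(n in M^t) z(i,n) <= 1: nobody attends two simultaneous events.
    for group in overlap_groups:
        for person, events in attend.items():
            if len(events & group) > 1:
                raise ValueError(f"{person} attends overlapping events")
    # Objective: pairs that share at least one event, i.e. z(i,j) = 1.
    return sum(1 for i, j in wanted_pairs if attend.get(i, set()) & attend.get(j, set()))

attend = {"a": {1, 3}, "b": {1}, "c": {3}}
capacity = {1: 2, 2: 2, 3: 2}
overlap_groups = [{1, 2}, {2, 3}]
print(count_meetings(attend, capacity, overlap_groups, [("a", "b"), ("a", "c"), ("b", "c")]))
```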
As pointed out by @SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of attendees grows a bit (maybe in the order of tens of attendees).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer: why do you need to solve this problem? What would be the typical number of attendees? The number of rooms?

Shopping cart minimization algorithm

I have a list of products, each with a list of the shops that sell it.
{
'Book A': [ShopA, ShopB, ShopC],
'Book B': [ShopC, ShopD],
'Movie C': [ShopA, ShopB, ShopD, ShopE],
...
}
(Price differs between the shops)
Each shop also has a shipping cost. It's a "per-order" shipping cost, so it doesn't matter how many items are in my cart. It differs between the shops too.
Ex: if I buy "Book A" from ShopA, "Book B" from ShopC and "Movie C" from ShopA, the resulting price is: Book A price in ShopA + Book B price in ShopC + Movie C price in ShopA + ShopC shipping cost + ShopA shipping cost
If the shipping cost was zero, or if it was on a per-item basis and constant, then I would just sort the offer lists by the price+shipping field and fetch the first result from each set.
I need to buy all the items once and find the minimal price and the resulting set.
I'm not very good with optimization algorithms and dynamic programming so I need a solution or just a nod into the right direction.
This problem is NP Hard.
We will show a reduction from the Hitting Set problem.
Hitting Set problem: Given sets S1,S2,...,Sn and a number k: choose a set S of size k, such that for every Si there is an element s in S such that s is in Si. [alternative definition: the intersection between each Si and S is not empty].
Reduction:
Given an instance of hitting set, in the form of (S1,...,Sn,k) create an instance of this problem:
All books cost nothing. In order to buy from each store you pay 1.
The book i is sold by each store denoted in Si, minimal price for this instance is k.
proof:
Hitting Set -> This problem: Assume there is a minimal hitting set in (S1,...,Sn) of size k. Let this hitting set be S. By buying from each store in S, we can buy all our books at cost k, since the books cost nothing [in our construction], and we bought all books, and we paid for the ordering from stores exactly k, thus the total price was k.
This problem -> Hitting set: Assume there is a pricing of k for the problem at the question. Then, from the building of the problem, and since the books cost nothing, we need to buy in k different stores to get all books. Let these stores be S. From the construction of the problem, S is a hitting set for (S1,...,Sn)
Q.E.D.
Conclusion:
Thus, this problem is "not easier than" the Hitting Set problem, and there is no known polynomial solution for it, so your best shot, if you want an optimal solution, is probably an exponential one, such as backtracking [check all possibilities, and return the minimal solution].
With so few items, I have a solution. It uses dynamic programming.
We will process every shop iteratively. At every step we store the current best price with which we can cover each subset of items. In the beginning all of them are infinity in price except for the empty subset, which has price 0. Note that there are 2^Num_products subsets in total, but in your case that is only about 1000.
Now, how do we process the next shop? Consider covering every possible subset of the products with this shop (I mean every subset that the shop can actually provide), with all the rest of the products being covered by shops you have already processed, thus improving the minimal costs of covering every subset. This step takes 2^Num_products*2^Num_products=4^Num_products, still about a million, which is bearable. You do this for every shop, and at the end the answer is the cost of covering all the elements. The whole complexity of the proposed solution is 4^Num_products * num_shops, which is about 50 million and is good to go.
Note that this is still exponential, and this is not surprising. Thank you to amit for his incredible proof of NP-hardness.
EDIT Adding further explanation of the algorithm in pseudocode:
init:
    costs[subset] = infinity for every subset
    costs[{}] = 0
for shop in shops
    new_prices = costs.dup()
    for set : subsets
        for covered_set : all_subsets(set)
            price = covered_set == {} ? 0 : delivery[shop]
            remaining = copy of set
            for element : covered_set
                if shop does not sell element
                    skip this covered_set, choose next covered_set
                price += el_price[shop][element]
                remaining.remove(element)
            price += costs[remaining]
            new_prices[set] = min(new_prices[set], price)
    costs = new_prices
return costs[all]
Note that here I use sets as indices. This is because I actually use the bitmask representation of the subsets, e.g. 1101 is a subset containing the 1st, 2nd and the fourth element. Thus an iteration over all sets is for (int i = 0; i < (1 << n); i++).
There is also one more thing: if you want to cycle through all the subsets of a subset S, you can actually do it faster than iterating over all the subsets of the initial set and checking whether each one is a subset of S. If S is represented with the bitmask bit_mask, this for loop does the job: for(int i = bit_mask; i > 0; i = (i - 1) & bit_mask). It visits every non-empty subset of S. Using this approach you decrease the complexity of the algorithm to 3^Num_products * num_shops. However, this is a bit harder to understand, and you will probably need to write out one example by hand to make sure the loop actually cycles through all the subsets of S. As for the complexity: across all pairs (S, subset of S), each product is in one of three states (not in S, in S only, or in both), which gives 3^Num_products pairs.
EDIT2 Edited the break condition. Also, let me elaborate on the set remaining and its calculation: as dmzkrsk pointed out, the pseudocode mentions removal from the set, but you can actually just assign remaining = set ^ covered_set (again a bit operation) when using bitmasks to represent the subsets.
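The dynamic programming approach with bitmasks can be sketched in Python as follows. This is a minimal sketch under assumed data structures: each shop is a (shipping_cost, {product_index: price}) pair, and the name cheapest_order is hypothetical. It uses the submask loop described above and only enumerates subsets the shop can actually provide.

```python
import math

def cheapest_order(num_products, shops):
    """shops: list of (shipping_cost, {product_index: price}). Returns the minimal total cost."""
    full = (1 << num_products) - 1
    costs = [math.inf] * (full + 1)  # costs[mask] = cheapest way to buy the products in mask
    costs[0] = 0
    for shipping, prices in shops:
        sellable = 0
        for product in prices:
            sellable |= 1 << product
        new_costs = costs[:]
        for target in range(1, full + 1):
            available = target & sellable
            sub = available
            while sub > 0:  # every non-empty subset of target this shop can provide
                rest = target ^ sub  # products bought from previously processed shops
                if costs[rest] < math.inf:
                    items = sum(prices[p] for p in range(num_products) if sub >> p & 1)
                    new_costs[target] = min(new_costs[target], costs[rest] + shipping + items)
                sub = (sub - 1) & available
        costs = new_costs
    return costs[full]  # math.inf if some product is sold nowhere

# Products: 0 = Book A, 1 = Book B, 2 = Movie C (prices are made up for illustration).
shops = [(5, {0: 10, 2: 8}), (3, {0: 12, 1: 7}), (4, {1: 6, 2: 9})]
print(cheapest_order(3, shops))
```

To recover which shop supplies each product, one would additionally store, for each mask, the shop and submask that produced the minimum.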
I have dealt with this exact problem once. I didn't come up with any other solution than just testing every possible combination of shops but there is an easy way to filter out many of the shops in every product.
1. Calculate the lowest price (shipping cost included) of every product, let's call it best_price.
2. In every product, retain only the shops where price of the shop (without shipping cost) <= best_price (with shipping cost)
3. Test every possible combination of shops for the cheapest.
A good heuristic can be ant colony optimization (ACO). I use it to solve the travelling salesman problem. You can find a working example in the google tsp solver. It's a JavaScript library that also uses a brute force and a dynamic programming solution. ACO is used when you have more cities to compute than the current limit of 20 cities. I believe you can use the library to solve your problem, and it would need just a little rewriting. With 20 cities the program has to check 20! possibilities. In your case it's a bit lighter, but maybe only by an order of magnitude.
