Assignment problem with multiple persons needed on each job - algorithm

First, sorry if my English isn't so good, I'm not a native English speaker.
I'm facing an assignment problem.
I have a list of jobs, with a certain number of persons needed for each job. Each person lets me know how many jobs they want to work on, and their preferences.
I tried the Hungarian algorithm, but I can't seem to get it to work: with a large number of jobs and spots, some persons get the same job multiple times, which isn't OK.
I think this is because I treated each spot as an individual job and listed each person as many times as they need to be placed.
Do you know a better algorithm or a way to do it?
(It's not a coding problem, I'm doing it in Octave/Matlab for now, but I think I'll switch to Python.)
Thanks for your help.

In addition to Henrik's suggestion to use Linear Programming, your specific problem can also be solved using Minimum cost maximum flow.
You make a bipartite graph between people and jobs (like in the Hungarian algorithm), where the cost on each middle edge is the person's preference score for that job and its capacity is 1.
The capacity of the edge from the source to each person is the number of jobs that person wants, and the capacity of the edge from each job to the sink is the number of people you need for that job.
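A minimal sketch of that construction in Python with networkx (the dictionaries prefs, wants, and needed below are made-up placeholder data, not from the question):

import networkx as nx

# prefs[p][j] = preference rank person p gives job j (1 = most wanted)
prefs = {"alice": {"job1": 1, "job2": 2}, "bob": {"job1": 1}}
wants = {"alice": 2, "bob": 1}       # number of jobs each person wants
needed = {"job1": 2, "job2": 1}      # number of people each job needs

G = nx.DiGraph()
for p, k in wants.items():
    G.add_edge("source", p, capacity=k, weight=0)
for p, ranks in prefs.items():
    for j, rank in ranks.items():
        G.add_edge(p, j, capacity=1, weight=rank)   # middle edges
for j, b in needed.items():
    G.add_edge(j, "sink", capacity=b, weight=0)

flow = nx.max_flow_min_cost(G, "source", "sink")
assignment = [(p, j) for p in wants for j, f in flow[p].items() if f == 1]
print(assignment)   # no person can get the same job twice: capacity is 1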

Assignment problems can be solved with linear programming:
Let x_ij = 1 if person i is assigned to job j and 0 otherwise. Let a_ij be person i's rank for job j: a_ij = 1 for the job they want most, a_ij = 2 for the next, and so on. If a person only wants k jobs, set a_ij to a very high number for all jobs beyond those k.
If you need at least b_j workers on job j, you have the constraint
x_1j + ... + x_mj >= b_j (j = 1,...,n)
You also have the constraints x_ij >= 0 and x_ij <= 1.
The linear function to minimize is
sum(a_ij * x_ij) over all i,j
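A sketch of this model on a tiny made-up instance (3 persons, 2 jobs) using scipy.optimize.linprog; for this transportation-like constraint structure the optimum of the LP relaxation happens to be integral, so the rounding at the end is just cleanup:

import numpy as np
from scipy.optimize import linprog

# a[i][j] = rank person i gives job j; 100 marks "job not wanted"
a = np.array([[1, 2],
              [2, 1],
              [1, 100]])
b = np.array([2, 1])         # job 0 needs 2 people, job 1 needs 1
m, n = a.shape

c = a.ravel()                # minimize sum a_ij * x_ij, x flattened row-major
A_ub = np.zeros((n, m * n))
for j in range(n):
    A_ub[j, j::n] = -1.0     # -(x_1j + ... + x_mj) <= -b_j
res = linprog(c, A_ub=A_ub, b_ub=-b, bounds=[(0, 1)] * (m * n))
x = np.round(res.x).reshape(m, n).astype(int)
print(x)                     # x[i][j] == 1 iff person i gets job j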

Related

Proving the greedy solution to the weighted task scheduling problem

I am attempting to prove the following algorithm is fully correct (partial correctness + termination), but I can only seem to prove it for specific example inputs, not general ones.
Here is my pseudo-code:
IN: List of jobs J, max index n
1: S ← an array indexed 0 to n, with null at each index
2: Sort J in non-increasing order of profits
3: for i from 0 to n
4:     Find the largest t such that S[t] = null and t ≤ J[i].deadline (if one exists)
5:     if such an index t was found, S[t] ← J[i]
OUT: S maximizes the profit of scheduled jobs that can be done in n+1 unit-time blocks
So for example, I created a table with jobs and their associated attributes (deadlines and profit):
Job       J1    J2    J3    J4    J5
Deadline   2     1     3     2     1
Profit    62   100    20    40    20
From this example input, we'd be able to do J2,J1,J3 for a total profit of 182.
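For concreteness, here is the pseudo-code as runnable Python, checked against the table above (jobs as (profit, deadline) tuples; I assume usable time slots 1..n, since a job placed in slot t finishes at time t):

def schedule(jobs, n):
    S = [None] * (n + 1)                     # S[t] = job done in time block t (t >= 1)
    for job in sorted(jobs, reverse=True):   # non-increasing profit, ties by deadline
        profit, deadline = job
        for t in range(min(deadline, n), 0, -1):
            if S[t] is None:                 # largest free slot t <= deadline
                S[t] = job
                break
    return S

jobs = [(62, 2), (100, 1), (20, 3), (40, 2), (20, 1)]   # J1..J5 from the table
S = schedule(jobs, n=3)
print(sum(j[0] for j in S if j))                         # -> 182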
Can someone help me form a more generic way of showing my pseudo-code algorithm is fully correct?
Adding constraints helps you here.
In step 2 order J first by non-increasing profits, and then by non-increasing deadline.
Of all optimal solutions, at least one must be lexicographically last under this ordering of job profits over time. (If multiple jobs have the same profit and deadline, there may be several optimal solutions that are lexicographically last.)
Prove that any solution that does not place a job of the same profit at the same time as the first job you place is either not optimal, or not lexicographically last among optimal solutions.
Prove by induction that the solution you found places jobs of the same profits at the same times as any solution that is optimal and lexicographically last among optimal solutions.
Please note that this is an outline only. There are some subtle tricks needed at each step of this proof that I'm deliberately leaving as an exercise.

Dynamic programming algorithm for unweighted interval scheduling?

I was wondering if someone could please help me reason about a DP algorithm for unweighted interval scheduling.
I'm given 2 arrays [t1,...,tn] and [d1,...,dn] where ti is the start time of job i and di is the duration of job i. Also the jobs are sorted by start time, so t1 <= t2 <= ... <= tn. I need to maximize the number of jobs that can be executed without any overlaps. I'm trying to come up with a DP algorithm and runtime for this problem. Any help would be much appreciated!
Thank you!
I am sorry I don't have any more time now to spend on this problem. Here is an idea, I think it lends itself nicely to Dynamic Programming. [Actually I think it is DP, but almost two decades have passed since I last studied such things...]
Suppose T = {t1, t2, ..., tn} is partitioned as follows:
T = {t1, t2, ..., tn} = {t1, t2, ..., tk} U {tk+1, tk+2, ..., tn}
= T1(k) U T2(k)
Let T2'(k) be the subset of T2(k) not containing the jobs overlapping T1(k).
Let opt(X) be the optimal value for a subset X of T. Then
opt(T) = max( opt(T1(k)) + opt(T2'(k)) )
where the maximum is taken over all possible k in {1, 2, ..., n}.
Of course you need to compute opt() recursively, and take into account overlaps.
Hope this helps!
It's easiest for me to think about if I suppose that you work out what the end time would be for each job and sort the jobs into order of increasing end time, although you can probably achieve the same thing using start times working in the opposite direction.
Consider each job in order of increasing end time. For each job work out the maximum number of jobs you can handle up to and including that job if you decide to work on that job. To work this out, look at the answers you have already computed that cover times up to the start time of that job and find the one that covers the maximum number of jobs. The best you can do while handling the job you are considering is one plus that maximum number.
When you have considered all the jobs, the maximum number you can cover is the maximum number you have computed when considering any job. You can work out which jobs to do by storing the previous job that you identified when working out the maximum score possible for a particular job, and then tracing these pointers back from the job with the maximum score.
With N jobs to consider, you look back at at most N previously computed answers when working out the best possible score for each job, so I think this is O(N^2).
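A short Python sketch of that DP, with t and d as in the question (the three-job instance at the end is made up for illustration):

def max_jobs(t, d):
    jobs = sorted((ti + di, ti) for ti, di in zip(t, d))   # (end, start), by end time
    best = [0] * len(jobs)           # best[i] = max #jobs if we decide to do job i
    for i, (end_i, start_i) in enumerate(jobs):
        prev = 0
        for j in range(i):           # earlier-ending jobs that finish by our start
            if jobs[j][0] <= start_i:
                prev = max(prev, best[j])
        best[i] = prev + 1
    return max(best, default=0)

print(max_jobs([0, 1, 3], [2, 3, 2]))   # jobs [0,2), [1,4), [3,5) -> 2

To recover the jobs themselves, store for each i the index j that achieved prev and trace those pointers back from the best i, as described above.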

Resource allocation algorithm

I know an algorithm for this exists, but I am having problems naming it and finding a suitable solution.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each resource R may have any number of instances, from 1 to 100.
A Job may need to use any number of resources R.
A job may need to use multiple instances of a resource R but never more than the resource R has instances. (if a resource only has 2 instances a job will never need more than 2 instances)
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed graph problem; the jobs J may execute in any order as long as they can claim their resources.
My Goal:
Find the optimal way to schedule the jobs, minimizing total run time and/or maximizing resource utilization.
I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested).
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = job j starts at time t
r[i,t] = amount of resource of type i used at time t
slot[t] = is time slot t used
The constraints are,
// every job must start exactly once
(1). for every j, sum[t](x[j,t]) = 1
// a resource can only be used up to its capacity
(2). r[i,t] <= Capacity[i]
// if a job is running, it uses resources
(3). r[i,t] = sum[j, s | s <= t && t < s + Time[j]] (x[j,s] * Use[j,i])
// if a job is running, then the time slot is used
(4). slot[t] >= x[j,s] (for every j, s, t with s <= t && t < s + Time[j])
The third constraint means that if a job was started recently enough that it is still running at time t, then its resource usage is added to the resources in use at time t. The fourth constraint means that if any job is still running at time t, then time slot t is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
You will need enough slots that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to get a hopefully non-trivial upper bound on the number of time slots (there is also the trivial bound: the sum of the lengths of all tasks).
There are many ways to get a greedy solution, for example just schedule the jobs one by one in the earliest time slot it will go. It may work better to order them by some measure of "hardness" and put the hard ones in first, for example you could give them a score based on how badly they use a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then order by that score in decreasing order.
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.
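A sketch of this ILP in Python with PuLP (pip install pulp); the instance data is made up, constraints (2) and (3) are merged by substituting r[i,t] directly, and the slot weights grow polynomially rather than exponentially, per the caveat above:

import pulp

Use = {("j1", "cpu"): 2, ("j2", "cpu"): 3}   # Use[j,i]
Time = {"j1": 2, "j2": 1}                    # Time[j]
Capacity = {"cpu": 3}                        # Capacity[i]
jobs, res = list(Time), list(Capacity)
T = range(4)                                 # enough time slots for a solution

x = pulp.LpVariable.dicts("x", (jobs, T), cat="Binary")      # x[j][t]
slot = pulp.LpVariable.dicts("slot", T, cat="Binary")

prob = pulp.LpProblem("schedule", pulp.LpMinimize)
prob += pulp.lpSum((t + 1) ** 2 * slot[t] for t in T)        # prefer early slots

for j in jobs:                    # (1) every job starts exactly once
    prob += pulp.lpSum(x[j][t] for t in T) == 1
for i in res:                     # (2)+(3) usage of each resource <= capacity
    for t in T:
        prob += pulp.lpSum(Use[j, i] * x[j][s]
                           for j in jobs for s in T
                           if s <= t < s + Time[j]) <= Capacity[i]
for j in jobs:                    # (4) a running job marks its slots as used
    for s in T:
        for t in T:
            if s <= t < s + Time[j]:
                prob += slot[t] >= x[j][s]

prob.solve()
print([(j, next(t for t in T if x[j][t].value() == 1)) for j in jobs])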
This might be a job for Dijkstra's algorithm. For your case, if you want to maximize resource utilization, then each node in the search space is the result of adding a job to the list of jobs you'll do at once. The edges are then the resources which are left after you add a job to that list.
The goal, then, is to find the path to the node whose incoming edge has the smallest value.
An alternative, which is more straightforward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have J jobs, j_1, j_2, ..., j_n and R resources, I want to find the subset of J such that when that subset is scheduled, R is minimized (I'll call that subset J').
in pseudo-code:
def knapsack(J, R, J_prime):
    potential_solutions = []
    for j in J:
        if R > resources_used_by(j):
            potential_solutions.append(
                knapsack(J - {j}, R - resources_used_by(j), J_prime | {j}))
    if not potential_solutions:   # no remaining job fits
        return J_prime, R
    return best_solution_of(potential_solutions)

Resource allocation- Matching

I've got a problem which I'm not sure can be solved by linear programming. Essentially there are 2 groups of people who each list their preferences for one another and will subsequently be matched. I'm writing an algorithm for this. Group A has up to 4 choices from Group B and vice versa.
In formulating a solution, I am currently assigning a cost to each combination of pairs. For example if Person 1 from Group A ranks Person 3 from Group B as his/her number 1 choice and vice versa, then the cost is minimal (Pair 1-3 cost: 0.01). Similarly, I would allot a cost to other pairs, devising an objective function which seeks to have pairings which minimize overall cost.
However, I do not see this being feasible because I don't know how to define my constraints and overall objective function. Reading online and from textbooks, I find resource allocation problems to be different from what I am trying to do.
Can I seek your advice on how to proceed?
Your problem can be formulated as an "Assignment Problem." As a canonical case, assignment problems are for assigning "jobs" to "machines." They can just as easily be used for Matching two sets.
Here's the formulation:
Two sets of people A and B
Decision Variable Xij
Let Xij be 1 if person i (ith person in set A) is matched with jth person in set B; 0 otherwise
Parameters:
Let Cij be the cost of pairing person i with person j
Objective Function: Minimize (Sum over i) (sum over j) Cij * Xij
Constraints:
Every Person i gets paired exactly once
Sum over j Xij = 1 (for each i)
Every Person j gets paired exactly once
Sum over i Xij = 1 (for each j)
Xij are binary variables:
Xij ∈ {0, 1}
The neat thing about Assignment problems is that the optimal pairings can be found using the fairly easy to understand 'Hungarian Method.' You can also use an LP/IP solver you have at your disposal.
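For example, scipy ships the Hungarian method as scipy.optimize.linear_sum_assignment; a minimal sketch with a made-up 3x3 cost matrix:

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.array([[0.01, 2.0, 3.0],    # C[i, j] = cost of pairing A_i with B_j
              [1.5,  0.5, 2.0],
              [2.0,  1.0, 0.3]])
rows, cols = linear_sum_assignment(C)               # optimal one-to-one pairing
print(list(zip(rows, cols)), C[rows, cols].sum())   # -> pairs and total cost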
Hope that helps.

Is there a well understood algorithm or solution model for this meeting scheduling scenario?

I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai, Aj) which indicates that attendant Ai wishes to meet with attendant Aj, and Aj accepted that invitation.
Output:
For each attendant A, a schedule of all the events he will attend. The main criterion is that each attendant should meet as many of the attendants who accepted his invites as possible, while satisfying the space constraints.
So far, we have thought of solving this with backtracking (trying out all possible solutions), and with linear programming (i.e. defining a model and solving it with the simplex algorithm).
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as the minimum maximal matching problem in interval graphs. W.l.o.g., assume the capacity of each room is 2, meaning it can hold only one meeting at a time. You can model your problem with interval graphs: each interval (one per person) is a node, and there is an edge between A_i and A_j if they have common time and want to see each other; set the weight of each edge to the amount of time the pair should spend together. If you find the minimum maximal matching in this graph, you can find the solution for this restricted case. But notice that this graph is n-partite and each part is an interval graph.
P.S.: note that if the amount of time that people should spend with each other is fixed, this is easier than the weighted version.
If you have access to a good MIP solver (CPLEX/Gurobi via academic initiative, but COIN-OR and lp_solve are open source and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch-and-cut-and-price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. The advantage is that they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event m in {1,2,...,N}.
Define z(i,j,m) (binary) to indicate they are together in event m.
Define z(i,m) to indicate that i is attending m.
z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simultaneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2} and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Max sum z(i,j)

z(i,j) <= sum_m z(i,j,m) (for every i,j)
(i and j meet if they are in the same event m at least once)
z(i,j,m) <= z(i,m) (for every i,j,m)
(if i and j attend m together, then i attends m)
z(i,j,m) <= z(j,m) (for every i,j,m)
(if i and j attend m together, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' both overlap time t, then no person can visit them both)
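A sketch of this model in Python with PuLP, on a tiny made-up instance (3 attendees, 2 accepted pairs, 3 events, and conflict sets M^t as in the example above):

import pulp

people = ["a", "b", "c"]
pairs = [("a", "b"), ("b", "c")]           # accepted invitations
events = {"e1": 2, "e2": 2, "e3": 2}       # C(m): room capacities
conflicts = [{"e1", "e2"}, {"e2", "e3"}]   # the sets M^t of simultaneous events

att = pulp.LpVariable.dicts("att", (people, events), cat="Binary")    # z(i,m)
met = pulp.LpVariable.dicts("met", pairs, cat="Binary")               # z(i,j)
both = pulp.LpVariable.dicts("both", (pairs, events), cat="Binary")   # z(i,j,m)

prob = pulp.LpProblem("meetings", pulp.LpMaximize)
prob += pulp.lpSum(met[p] for p in pairs)              # Max sum z(i,j)
for (i, j) in pairs:
    prob += met[(i, j)] <= pulp.lpSum(both[(i, j)][m] for m in events)
    for m in events:
        prob += both[(i, j)][m] <= att[i][m]           # meeting implies attending
        prob += both[(i, j)][m] <= att[j][m]
for m, cap in events.items():                          # capacity of each event
    prob += pulp.lpSum(att[i][m] for i in people) <= cap
for i in people:
    for Mt in conflicts:                               # no two simultaneous events
        prob += pulp.lpSum(att[i][m] for m in Mt) <= 1

prob.solve()
print([(p, int(met[p].value())) for p in pairs])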
As pointed out by @SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of assistants grows a bit (maybe in the order of tens of assistants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer: why do you need to solve this problem? What would be the typical number of attendees? The number of rooms?
