Algorithm to solve an assignment or matching with a constraint

Here is the problem. Suppose we have N workers and N jobs. We want to assign each job exactly one worker. Each worker i can do some of the jobs, each at some cost. Our goal is to minimize the total cost, on the condition that every single cost is less than some value.
For example, take 10 workers and 10 jobs. Worker 1 can do job 1 for $0.8, job 2 for $2.3, job 3 for $15.8, jobs 4 to 8 for $100, job 9 for $3.2, job 10 for $15.3.
Worker 2 can do job 1 for $3.5, job 2 for $2.3, job 3 for $4.6, job 4 for $17, etc.
Our goal is to find a matching (or we can call it an assignment) such that the total cost is minimized but the cost of every assigned pair of worker i and job j is less than a value like $50.
I would very much like to solve it in MATLAB if possible.

This is a slight variation of the Assignment Problem. To handle your additional constraint that no single job cost should be more than some value, just change all entries in the matrix of costs that are greater than this threshold to a huge value (bigger than the sum of all other entries will suffice), and solve as usual, using e.g. the Hungarian Algorithm. If the optimal assignment still uses one of the huge entries, then no assignment satisfying the constraint exists.
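The question asks for MATLAB, but the idea is language independent, so here is a minimal Python sketch of the masking trick. The function names `hungarian` and `assign_with_cap` and the 3x3 cost matrix are made up for illustration, and the solver is a textbook O(n^3) potentials-based Hungarian implementation, not a library call. Entries strictly greater than `limit` are treated as forbidden, following the answer's wording.

```python
def hungarian(cost):
    """Minimum-cost perfect assignment on a square matrix.
    Returns a list a where row i is assigned to column a[i]."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials (1-indexed)
    v = [0] * (n + 1)      # column potentials (1-indexed)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j, 0 = none
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def assign_with_cap(cost, limit):
    """Mask entries above `limit` with a huge value, then solve as usual.
    Returns (assignment, total_cost), or None if the cap cannot be met."""
    big = sum(map(sum, cost)) + 1  # larger than any assignment without masked entries
    masked = [[c if c <= limit else big for c in row] for row in cost]
    assignment = hungarian(masked)
    if any(cost[i][j] > limit for i, j in enumerate(assignment)):
        return None
    return assignment, sum(cost[i][j] for i, j in enumerate(assignment))


# Hypothetical costs: the unconstrained optimum would use the 55 entry.
costs = [[55, 40, 45],
         [40, 1, 45],
         [45, 45, 2]]
print(assign_with_cap(costs, 50))  # worker index -> job index, plus total cost
```

Without the cap the cheapest assignment here is the diagonal (55 + 1 + 2 = 58). With a cap of 50 the 55 entry is masked, so the solver falls back to the cheapest assignment that avoids it.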

Related

Minimize time to complete N tasks on P agents

I have N tasks where the i'th task takes A[i] time to process. The tasks are independent of each other and can be scheduled at any time on any of the P processors. A task can be run on only 1 processor, and a processor can process any number of tasks. Each agent/processor can only work on one task at a time and, once begun, must continue to work on it until the task is complete.
I want to minimize the amount of time it takes to complete all the tasks.
I am implementing this using a min-heap, i.e.
Sort the tasks in descending order
Create a min-heap of size P initialized to 0
For each task i, pull the min from the heap, add the task time A[i] to it and add it back to the heap
The time to complete all the tasks is the maximum value in the heap. This has been working so far and I want to verify its correctness.
Do you think this breaks for any inputs?
I believe I am doing something like Greedy Number Partitioning.
This is a polynomial-time algorithm for a problem that includes NP-complete problems as special cases (for example, with P=2 you have the partition problem, which is a special case of subset sum). Therefore you should expect it to not always work.
A small case where your algorithm breaks is if the weights are 3, 3, 2, 2, 2 and P=2. After sorting in descending order, your algorithm should fill the two processors like this:
3
3 3
3,2 3
3,2 3,2
3,2,2 3,2
and will take 7. The better solution you don't find is:
3,3 2,2,2
which will complete in 6.
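To check a candidate input yourself, here is a small sketch of the heap-based greedy from the question next to an exhaustive search. The names `greedy_makespan` and `optimal_makespan` are made up here, and the brute force tries all P^N assignments, so it is only usable on tiny inputs.

```python
import heapq
from itertools import product


def greedy_makespan(times, p):
    """Sort descending, always give the next task to the least-loaded processor."""
    loads = [0] * p
    heapq.heapify(loads)
    for t in sorted(times, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


def optimal_makespan(times, p):
    """Exhaustive search over every task -> processor mapping (tiny inputs only)."""
    best = sum(times)
    for choice in product(range(p), repeat=len(times)):
        loads = [0] * p
        for t, proc in zip(times, choice):
            loads[proc] += t
        best = min(best, max(loads))
    return best


tasks = [3, 3, 2, 2, 2]
print(greedy_makespan(tasks, 2), optimal_makespan(tasks, 2))  # greedy 7, optimum 6
```

Running both on random small inputs is a cheap way to see how often, and by how much, the greedy result differs from the optimum.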

Max Tasks that can be completed in given time

I recently came across this question in a forum:
You are given a straight line from 0 to 10^9. You start at zero and there are n tasks you can perform. The i-th task is located at some point on the line and requires 't' time to be performed. To perform the task you need to reach that point and spend 't' time at that location.
example: (5,8) lies at 5 so travel distance is 5 and work effort is 8.
Total effort is calculated as travel distance + time required to complete the work.
It takes one sec to travel one unit of path.
Now we are given total T seconds and we need to complete as many tasks as possible and get back to the starting position.
Find the max number of tasks that you can finish in time T.
Example:
3 16 - 3 tasks and 16 units of total time
2 8 - task 1 at position 2 on the line, takes 8 sec to complete
4 5 - task 2 at position 4 on the line, takes 5 sec to complete
5 1 - task 3 at position 5 on the line, takes 1 sec to complete
Output: 2
Explanation:
If we take task 1 at location 2, which requires 8 sec, then getting to location 2 takes 2s and completing the task takes 8s, leaving us with only 6s, which is not enough to complete any other task.
On the other hand, skipping the first task leaves us enough time to complete the other two tasks.
Going to location 5 and coming back costs 2x5 = 10s, and performing the tasks at locations 4 and 5 costs 5+1 = 6s. Total time spent will be 10s+6s = 16s.
I am new to graphs and DP so I was not sure which approach to use: Hamiltonian cycle, Knapsack or Longest Path.
Can someone please help me with the most efficient approach to solve this?
Let's iterate from the first task to the last, according to distance. As we go, it's clear that after subtracting 2 * distance(i) + effort(i) for considering the current task as our last, the most tasks we can achieve can be found by greedily accumulating as many earlier tasks as possible into the remaining time, ordering them by increasing effort.
Therefore, an efficient solution could insert the seen element into a data-structure ordered by effort, dynamically updating the best solution so far. (I originally thought of using a treap and binary search but j_random_hacker suggested a much simpler way in the comments below this answer.)
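The comment mentioned above is not reproduced here, so the following is only one possible simpler structure, assumed rather than quoted: a max-heap of the efforts chosen so far. Because the remaining budget T - 2 * distance only shrinks as we move outward, whenever the chosen efforts no longer fit we can permanently drop the largest one. The function name `max_tasks` is made up for this sketch.

```python
import heapq


def max_tasks(total_time, tasks):
    """tasks: iterable of (position, effort). Returns the max number of tasks
    that fit into total_time, including the walk out and back."""
    chosen = []        # max-heap of chosen efforts, stored negated for heapq
    effort_sum = 0
    best = 0
    for position, effort in sorted(tasks):      # nearest task first
        budget = total_time - 2 * position      # time left after the round trip
        heapq.heappush(chosen, -effort)
        effort_sum += effort
        while chosen and effort_sum > budget:
            effort_sum += heapq.heappop(chosen)  # popped value is -(largest effort)
        best = max(best, len(chosen))
    return best


print(max_tasks(16, [(2, 8), (4, 5), (5, 1)]))  # the example from the question: 2
```

Each task is pushed and popped at most once, so this runs in O(n log n) including the sort.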
Suggestion:
For each task n create a graph like this
Join up these graphs for all the tasks.
Run a travelling salesman algorithm to find the minimum time to do all the tasks ( = visit all the nodes in combined graph )
Remove tasks in an orderly sequence. This will give you a collection of results for different numbers of tasks performed. Choose the one that does the most number of tasks that still remains under the time limit.
Since you are maximizing the number of tasks performed, start by removing the longest tasks so that you will be left with lots of short tasks.

Greedy Algorithm: Assigning jobs to minimize cost

What is the best approach to take if I want to find the minimum total cost when assigning n jobs to one person in a sequence, where each job has a cost assigned to it? For example, I have 2 jobs which have costs 4 and 5 respectively. The jobs take 6 and 10 minutes respectively. The finish time of the second job will be the finish time of the first job plus the time taken by the second job. The total cost is the sum over all jobs of the job's finish time multiplied by its cost.
If you have to assign n jobs to 1 person (or 1 machine, in scheduling literature terminology), you are looking to minimize weighted flow time. The problem is polynomially solvable.
The weighted shortest processing time (WSPT) sequence is optimal.
Sort and reindex jobs such that p_1/w_1 <= p_2/w_2 <= ... <= p_n/w_n,
where, p_i is the processing time of the ith job and w_i is its weight or cost.
Then, assign job 1 first, followed by 2 and so on until n.
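As a quick illustration of the sorting rule, here is a small sketch using the two jobs from the question (times 6 and 10, costs 4 and 5). The function names are invented for this example, costs are assumed to be positive, and the brute-force comparison over all orders is only there as a sanity check.

```python
from itertools import permutations


def weighted_completion_cost(jobs):
    """jobs: sequence of (time, cost) in processing order.
    Returns the sum of finish_time * cost over all jobs."""
    elapsed = total = 0
    for time, cost in jobs:
        elapsed += time
        total += elapsed * cost
    return total


def wspt_order(jobs):
    """Sort by processing time divided by cost, smallest ratio first."""
    return sorted(jobs, key=lambda job: job[0] / job[1])


jobs = [(10, 5), (6, 4)]                 # (time, cost) pairs from the question
order = wspt_order(jobs)
print(order, weighted_completion_cost(order))
print(min(weighted_completion_cost(perm) for perm in permutations(jobs)))
```

For these two jobs the ratios are 6/4 = 1.5 and 10/5 = 2, so the 6-minute job goes first, for a total of 6*4 + 16*5 = 104 versus 114 for the other order.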
If you look at what happens if you swap two adjacent jobs, you will end up comparing terms like (A+c)m + (A+c+d)l and (A+d)l + (A+c+d)m, where A is the time consumed by earlier jobs, c and d are times, and l and m are costs. With some algebra and rearrangement you can see that the first version is smaller if c/m < d/l. So you could work out for each job the time taken by that job divided by its cost, and do first the jobs with the smallest time per unit cost. As a sanity check: if you have a job that takes 10 years and has a cost of 1 cent, you want to do that last so that the 10-year wait doesn't get multiplied by any other costs.
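For reference, the algebra behind that comparison written out, using the same symbols and assuming the costs l and m are positive:

```latex
\begin{align*}
\underbrace{(A+c)\,m + (A+c+d)\,l}_{\text{job } (c,m) \text{ first}}
 \;-\; \underbrace{\bigl[(A+d)\,l + (A+c+d)\,m\bigr]}_{\text{job } (d,l) \text{ first}}
 &= \bigl[(A+c) - (A+c+d)\bigr]\,m + \bigl[(A+c+d) - (A+d)\bigr]\,l \\
 &= c\,l - d\,m .
\end{align*}
% The first order is cheaper exactly when c l - d m < 0, i.e. when
\[
  c\,l < d\,m \quad\Longleftrightarrow\quad \frac{c}{m} < \frac{d}{l} .
\]
```

The earlier time A cancels out, which is why the decision depends only on the two jobs being swapped.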

Priority based preemptive Shortest Job First. How to determine what process comes first

I have a question about the priority-based preemptive Shortest Job First algorithm. If two processes have the same priority, which one goes first: the one that arrived first or the one with the smaller burst time? The same goes for burst time: if I have 2 processes with the same burst time, do I sort by priority? And what happens if 2 processes have the same burst time and priority?
For example what would a Gantt chart based on this table look like?
Process  Arrival Time  Burst Time  Priority
p0       0             8           2
p1       4             15          5
p2       7             9           3
p3       13            5           1
p4       9             13          4
p5       0             6           1
As the name implies, you first pick the set of highest priority jobs.
Then, from that set you select the shortest job. In this case I presume 'burst time' represents the expected execution time (or time to yield).
Therefore, assuming that lower priority numbers represent 'higher' priority jobs, p3 and p5 are the two highest priority jobs.
At that point, what matters is the expected job size (burst time), so you select the one with the shortest burst time. If both were ready at the same time, that would be p3. Note that arrival times still apply: p3 only arrives at time 13, so at time 0 the choice is between p0 and p5, and p5 runs first.
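To see what a Gantt chart could look like for the table in the question, here is a small unit-step simulation. The tie-breaking policy is an assumption, since the question leaves it open: lower priority number wins, then shortest remaining time, then earliest arrival. The function name `schedule` is made up for this sketch.

```python
def schedule(procs):
    """procs: dict name -> (arrival, burst, priority).
    Returns the Gantt chart as a list of (name, start, end) segments."""
    remaining = {name: burst for name, (_, burst, _) in procs.items()}
    timeline = []
    t = 0
    while any(remaining.values()):
        ready = [n for n, (arrival, _, _) in procs.items()
                 if arrival <= t and remaining[n] > 0]
        if not ready:          # CPU idle until the next arrival
            t += 1
            continue
        # Assumed policy: priority, then remaining time, then arrival time.
        cur = min(ready, key=lambda n: (procs[n][2], remaining[n], procs[n][0]))
        if timeline and timeline[-1][0] == cur and timeline[-1][2] == t:
            timeline[-1] = (cur, timeline[-1][1], t + 1)   # extend current segment
        else:
            timeline.append((cur, t, t + 1))               # context switch
        remaining[cur] -= 1
        t += 1
    return timeline


table = {
    "p0": (0, 8, 2), "p1": (4, 15, 5), "p2": (7, 9, 3),
    "p3": (13, 5, 1), "p4": (9, 13, 4), "p5": (0, 6, 1),
}
for name, start, end in schedule(table):
    print(f"{name}: {start}-{end}")
```

Under these assumptions p5 runs from 0 to 6, p0 from 6 to 13, p3 preempts it from 13 to 18, p0 finishes from 18 to 19, and then p2, p4 and p1 run in priority order. A different tie-breaking rule can give a different chart.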

Algorithm for fairly assigning tasks to workers based on skills

(Before anyone asks, this is not homework.)
I have a set of workers with interests, i.e.:
Bob: Java, XML, Ruby
Susan: Java, HTML, Python
Fred: Python, Ruby
Sam: Java, Ruby
etc.
(There are actually somewhere in the range of 10-25 "interests" for each worker, and I have around 40-50 workers)
At the same time, I have a very large set of tasks that need to be distributed among the workers. Each task has to be assigned to at least 3 workers, and the workers must match at least one of the tasks' interests:
Task 1: Ruby, XML
Task 2: XHTML, Python
and so on. So Bob, Fred, or Sam could get Task 1; Susan or Fred could get Task 2.
This is all stored in a database thusly:
Task
    id integer primary key
    name varchar
TaskInterests
    task_id integer
    interest_id integer
Workers
    id integer primary key
    name varchar
    max_assignments integer
WorkerInterests
    worker_id
    interest_id
Assignments
    task_id
    worker_id
    date_assigned
Each worker has a maximum number of assignments they will do, around 10. Some interests are rarer than others (i.e. only 1 or 2 workers have listed them as an interest), some interests are more common (i.e. half of the workers list them).
The algorithm must:
Assign every task to 3 workers (it is assumed that at least 3 of the workers are interested in one of the interests of the task).
Assign every worker 1 or more tasks
Ideally, the algorithm will:
Assign each worker a number of tasks proportional to their maximum assignments and the total number of tasks. For example, if Susan says she will do 20 tasks and most people will only do 10 tasks and there are 50 workers and 300 tasks, she should be assigned 12 tasks (20/10*(300/50)).
Assign a variety of tasks to each worker, so if Susan lists 4 interests she gets tasks that include 4 interests (rather than getting 10 tasks all with the same interest)
The most difficult aspect so far has been dealing with these issues:
tasks having interests with few corresponding workers
workers who have few interests, especially
workers who have a few interests, for which there are relatively few tasks
This problem can be modeled as a Maximum Flow Problem.
In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.
With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.
Let me number the requirements.
Required:
1. Workers are assigned no more than their maximum assignments.
2. Tasks can only be assigned to workers that match one of the task's interests.
3. Every task must be assigned to 3 workers.
4. Every worker must be assigned to at least 1 task.
Optional:
5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments
6. Each worker should be assigned a variety of tasks.
I will assume that the maximum flow is found using the Edmonds-Karp Algorithm.
Let's first find a graph that meets requirements 1-3.
Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.
In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.
In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.
In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.
Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks) then there is no assignment meeting requirements 1-3. If not, there is such an assignment and we can find it by examining the final residual graph. In the residual graph, if there is an edge from a task to a worker with capacity 1, then that worker is assigned to that task.
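Here is a minimal sketch of the four-column graph for requirements 1-3, with a plain BFS-based Edmonds-Karp. The function names, the node labels and the `slots` parameter are invented for this example. The sample data is the four workers and two tasks from the question; since Task 2 only matches two of those workers, the demo uses slots=2 instead of 3 to get a feasible result.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        push, v = float("inf"), t
        while parent[v] is not None:            # bottleneck along the path
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:            # update residual capacities
            u = parent[v]
            cap[u][v] -= push
            cap[v][u] += push
            v = u
        flow += push


def assign(worker_interests, max_assignments, task_interests, slots=3):
    """Returns task -> sorted list of assigned workers, or None if infeasible."""
    cap = defaultdict(lambda: defaultdict(int))
    for w, interests in worker_interests.items():
        cap["source"][("w", w)] = max_assignments[w]        # requirement 1
        for task, wanted in task_interests.items():
            if interests & wanted:
                cap[("w", w)][("t", task)] = 1              # requirement 2
    for task in task_interests:
        cap[("t", task)]["sink"] = slots                    # requirement 3
    if max_flow(cap, "source", "sink") < slots * len(task_interests):
        return None
    # A residual edge task -> worker with capacity 1 means "assigned".
    return {task: sorted(w for w in worker_interests if cap[("t", task)][("w", w)] == 1)
            for task in task_interests}


workers = {"Bob": {"Java", "XML", "Ruby"}, "Susan": {"Java", "HTML", "Python"},
           "Fred": {"Python", "Ruby"}, "Sam": {"Java", "Ruby"}}
limits = {name: 10 for name in workers}
tasks = {"Task 1": {"Ruby", "XML"}, "Task 2": {"XHTML", "Python"}}
print(assign(workers, limits, tasks, slots=2))
```

With slots=3 the same data returns None, which is the "maximum flow is less than 3 * (# of tasks)" case described above.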
Now, we will modify our graph and algorithm to meet the rest of the requirements.
First, let's meet requirement 4. This will require a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Now, in your final residual graph, for each worker the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue the Edmonds-Karp algorithm as before. Basically what we have done is first find an assignment such that each worker is assigned to exactly one task. Then delete the reverse edge from that task so that the worker will always be assigned to at least one task (though it may not be the one assigned to in the first pass).
Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by (sum of all workers' maximum assignments / number of tasks). This will quite possibly not have a satisfying assignment. But that's ok. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from tasks to sink, we are done. Otherwise we can increment the capacities from the source to the workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned too many tasks. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. These are both easy to do.
Finally let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker add a node for each of that worker's interests. From each of these new nodes, add an edge to each task with a matching interest with capacity 1. Add an edge from each worker to each of its interest nodes with capacity 1. Now, a flow in this graph would enforce that if a worker is assigned to n tasks, then the intersection of the union of those tasks' interests with that worker's interests has size at least n. Again, it is possible that there is a satisfying assignment without this constraint, but not one with it. We can handle this the same as requirement 5: run Edmonds-Karp to completion, and if there is no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.
Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each employee, in the new column insert a node for each task (so each employee has its own node for each task). From these new nodes, to their corresponding task to the right, add an edge with capacity 1. From each worker's interests node to that worker's task nodes, add an edge with capacity 1 from each interest to each task that matches.
-
EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.
Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. Now, you can think of a task being matched to a worker-interest pair. The first edge says that for a worker, each of its interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each job. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 flow from the worker to the global task node due to the third edge.
-
Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.
A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)
When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.
For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to the worker with the capacity equal to worker's maximum assignments / (sum of all worker's maximum assignments / number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost such that as you add tasks to a worker the cost to add another task to that user goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1,2,3,4,....
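As a rough sketch of that edge layout, the helper below builds the source-to-worker edges as (from, to, capacity, cost) tuples: one free edge for the proportional share, then unit-capacity edges with costs 1, 2, 3, ... for the rest. The function name, the tuple format and the rounding of the share are assumptions made for this illustration, and a real solver would have to accept parallel edges or insert intermediate nodes.

```python
def source_edges(max_assignments, num_tasks):
    """Build source -> worker edges with progressive costs.
    max_assignments: dict worker -> maximum number of tasks."""
    total = sum(max_assignments.values())
    edges = []
    for worker, limit in max_assignments.items():
        # Proportional share: limit / (total / num_tasks), rounded (an assumption).
        share = min(limit, round(limit * num_tasks / total))
        edges.append(("source", worker, share, 0))        # free up to the share
        for extra in range(1, limit - share + 1):
            edges.append(("source", worker, 1, extra))    # k-th extra task costs k
    return edges


for edge in source_edges({"Susan": 20, "Bob": 10}, num_tasks=15):
    print(edge)
```

With these made-up numbers Susan's free share is 10 and Bob's is 5, and each further task for a worker gets steadily more expensive.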
A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.
This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be made such that they are proportional to a worker's maximum assignments. Then assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra to a worker with max 2.
Complexity Analysis
Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.
Requirement 3 implies that n = O(m). Let's also assume that l = O(m) and j = O(m).
In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.
After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.
Using the min-cost max-flow formulation, we can find a solution in O(V * E * B * log V) where V = # vertices, E = # edges, and B = max capacity on a single edge. In our case this gives O(k * j * n^2 * m^2 * log(nm)).
For problems where finding a direct solution is difficult, it can be a good idea to use an approximation algorithm, an evaluation function and a method to improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.
The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.
With genetic algorithms a group of for example 100 random solutions is generated and scored and the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generations, but with some random mutations.
For simulated annealing the probability of a slightly worse solution being accepted is high initially, but decreases over time. This reduces the risk of getting stuck at a local optimum early on.
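A generic simulated annealing loop might look roughly like this. The `anneal` function, its parameters and the linear cooling schedule are all choices made for this sketch, and the toy objective at the bottom only stands in for a real evaluation function over task assignments.

```python
import math
import random


def anneal(initial, neighbor, score, steps=10000, start_temp=1.0, seed=0):
    """Maximize score(). neighbor(solution, rng) returns a random modification."""
    rng = random.Random(seed)
    current = best = initial
    for step in range(steps):
        # Linear cooling; the tiny offset avoids dividing by zero at the end.
        temp = start_temp * (1 - step / steps) + 1e-9
        candidate = neighbor(current, rng)
        delta = score(candidate) - score(current)
        # Always keep improvements; keep worse candidates with a probability
        # that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
        if score(current) > score(best):
            best = current
    return best


# Toy problem: find the integer x that maximizes -(x - 3)^2.
result = anneal(
    initial=0,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    score=lambda x: -(x - 3) ** 2,
)
print(result)
```

For the assignment problem, `neighbor` could for instance swap one task between two workers, and `score` could penalize unassigned slots and uneven loads.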
Try mapping your task to the stable marriage problem. Tasks become prospective wives, and your staff become suitors.
You might want to add some extra algorithm for assigning preferences of each task to the staff, and vice versa: you could assign some ideal proficiency necessary for the components of each task, and then allow your staff to rank each task. You could assign a proficiency for each component that each staff member possesses and use that to get each task's preference in staff members.
Once you have the preferences then run the algorithm, post the results, then allow people to apply in pairs to you to swap assignments - after all this is a people problem and people work better when they have a degree of control.
So I gave this problem some thought and I think that you can get a good solution (for some definition of "good") by reducing it to an instance of min-cost max-flow (see this, for example). The idea is as follows. Suppose you are given as input a set of jobs J, each of which has a set of skills necessary, along with a set of workers W, each of whom has a set of talents. You are also given for each worker a constant k_i saying how many jobs you'd like them to do, as well as a constant m_i saying the maximum number of jobs you can allocate to them. Your goal is to assign the jobs to the workers in such a way that each job is done by a worker who has the skills, no worker does more than m_i jobs, and the number of the "excess" jobs done by the workers is minimized. For example, if there are five workers who each want to do four tasks and the load is balanced so that two workers do four jobs, one does three, and one does five, the total excess is one, since one worker did one more job than was expected.
The reduction is as follows. For now, we'll ignore the balancing requirement and just see how to reduce this to max-flow; we'll add load balancing at the end. Construct a graph G with a designated start node s and sink node t. Add to this graph a node for each job j and each worker w. There will be an edge from s to each of these j nodes of cost zero and capacity one. There will also be an edge from each w node to t with cost zero and capacity m_i. Finally, for each job j and worker w, if worker w has the talents necessary to complete job j, there is an edge from j to w with cost zero and capacity one.
The idea is that we want to push flow from s to t through the j and w nodes such that each flow path going through some j node to a w node means that job j should be given to worker w. The capacity restrictions on the edges from s to the j nodes ensure that at most one unit of flow enters each j node, so the job is assigned at most once. The capacity restriction on the edges from the w nodes to the node t prevents each worker from being assigned too many times. Since all capacities are integral, an integral max flow exists from s to t, and so a max-flow in this graph corresponds to an assignment of jobs to workers that is legal and doesn't exceed any worker's maximum load. You can check whether all jobs are assigned by looking at the total flow in the graph; if it's equal to the number of jobs, they've all been assigned.
This above construction, however, does nothing to balance worker loads. To fix this, we'll modify the construction a bit. Rather than having an edge from each w node to t, instead, for each w node, add two nodes to the graph, c and e, and connect them as follows. There is an edge from w_i to c_i with capacity k_i and cost zero, and an identical edge from c_i to t. There is also an edge from w_i to e_i with cost 1 and capacity m_i - k_i. There is also an edge from e_i to t with equal capacity and zero cost.
Intuitively, we haven't changed the amount of flow that leaves any w node, but we have changed how much that flow costs. Flow shunted to t via the c node is free, and so the worker can take on k_i jobs without incurring cost. Any jobs after that have to be routed through e, which costs one for each unit of flow crossing it. Finding a max-flow in this new graph still determines an assignment, but finding the min-cost max-flow in the graph finds the assignment that minimizes the excess jobs divvied up to workers.
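Here is a compact sketch of that construction, solved with successive shortest augmenting paths (a queue-based Bellman-Ford). This is just one of the possible min-cost max-flow algorithms, chosen for brevity rather than speed. The function name `balance_jobs`, the node labels and the sample data are invented for illustration.

```python
from collections import defaultdict, deque


def balance_jobs(job_workers, preferred, maximum):
    """job_workers: job -> set of qualified workers.
    preferred / maximum: worker -> k_i / m_i.
    Returns (jobs assigned, total excess, job -> worker)."""
    edges = []                 # edges[i] = [head, residual capacity, cost]; i ^ 1 is its reverse
    graph = defaultdict(list)  # node -> indices of outgoing edges

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])

    pair_edge = {}
    for job, qualified in job_workers.items():
        add_edge("s", ("job", job), 1, 0)
        for w in qualified:
            pair_edge[(job, w)] = len(edges)
            add_edge(("job", job), ("worker", w), 1, 0)
    for w in preferred:
        k, m = preferred[w], maximum[w]
        add_edge(("worker", w), ("c", w), k, 0)       # free lane: up to k_i jobs
        add_edge(("c", w), "t", k, 0)
        add_edge(("worker", w), ("e", w), m - k, 1)   # excess lane: cost 1 per job
        add_edge(("e", w), "t", m - k, 0)

    flow = cost = 0
    while True:
        dist, prev = {"s": 0}, {}
        queue = deque(["s"])
        while queue:                                   # cheapest path in the residual graph
            u = queue.popleft()
            for i in graph[u]:
                v, cap, c = edges[i]
                if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                    dist[v], prev[v] = dist[u] + c, i
                    queue.append(v)
        if "t" not in dist:
            break
        push, v = float("inf"), "t"
        while v != "s":                                # bottleneck along the path
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = "t"
        while v != "s":                                # push flow along the path
            i = prev[v]
            edges[i][1] -= push
            edges[i ^ 1][1] += push
            v = edges[i ^ 1][0]
        flow += push
        cost += push * dist["t"]
    assignment = {job: w for (job, w), i in pair_edge.items() if edges[i][1] == 0}
    return flow, cost, assignment


jobs = {"J1": {"A", "B"}, "J2": {"A", "B"}, "J3": {"A"}, "J4": {"A"}, "J5": {"A"}}
print(balance_jobs(jobs, preferred={"A": 2, "B": 2}, maximum={"A": 4, "B": 4}))
```

In this made-up instance only A can do J3 to J5, so A must take three jobs and exceeds its preferred load by one; the min-cost flow gives J1 and J2 to B rather than adding to A's excess.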
Min-cost max flows can be solved in polynomial time with a few somewhat-well-known algorithms, so hopefully this is a useful answer!

Resources