Assigning jobs to workers - algorithm

There are N plumbers, and M jobs for them to do, where M > N in general.
If N > M then it's time for layoffs! :)
Properties of a job:
Each job should be performed in a certain time window which can vary per-job.
Location of each job varies per-job.
Some jobs require special skills. The skills needed to complete a job can vary per job.
Some jobs have higher priority than others. The "reward" for some jobs is higher than others.
Properties of a plumber:
Plumbers have to drive from one job to the next which takes time. Say it's known what the travel time from each job to every other job site is.
Some plumbers have skills that others don't have.
The task is to find the optimal assignment of jobs to plumbers, so that the reward is maximized.
It's possible that not all jobs can be completed. For example, with one plumber and two jobs, it's possible that if they are doing job A, they can't do job B because there's not enough time to get from A to B once they are done with A and B is supposed to begin. In that case, optimal is to have the plumber do the job with the biggest reward and we are done.
I am thinking of a greedy algorithm that works like this:
sort jobs by reward
while true:
    for each job:
        find plumbers that could potentially handle this job
        make a note of the association, used in next loop
    if each plumber is associated with a different job, break
    for each job that can be handled by a plumber:
        assign job to a plumber:
            if more than one plumber can handle this job, break tie somehow:
                for instance if plumber A can do jobs X,Y but
                plumber B can only do X, then give X to B.
            else just pick a plumber to take it
        remove assigned job from further consideration
    if no jobs got assigned:
        break out of "while true" loop
My question: is there a better way? Seems like an NP-hard problem but I have no proof of that. :)
I guess it's similar to the Assignment Problem.
Seems it's a bit different though because of the space/time wrinkle: plumber could do either A or B, but not both because of the distance between them (can't get to B in time after finishing A). And jobs must be completed in certain time windows.
Also a plumber might not be able to take both jobs if they are too close in time (even if they are close in space). For example if B must be started before time_A_finished + time_to_travel_A_to_B, then B can't be done after A.
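The timing condition described above can be written as a small check. This is a minimal sketch, assuming each job has a latest allowed start time; the function and parameter names are hypothetical and not part of the original question.

```python
def can_follow(time_a_finished: float,
               time_to_travel_a_to_b: float,
               latest_start_b: float) -> bool:
    """Return True if a plumber who finishes job A can still start job B.

    Assumption: B is feasible after A only if the plumber arrives at B
    no later than B's latest allowed start time.
    """
    arrival_at_b = time_a_finished + time_to_travel_a_to_b
    return arrival_at_b <= latest_start_b


# A ends at 10:00, the drive takes 3 hours, B must start by 12:00.
too_late = can_follow(10, 3, 12)
# Same jobs, but B may start as late as 13:00.
in_time = can_follow(10, 3, 13)
```

A pairwise check like this is enough to decide whether two jobs can share a plumber, which is the building block any of the assignment strategies would need.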
Thanks for any ideas! Any pointers to good reading in this area are also appreciated.

Even routing just one plumber between jobs is as hard as the NP-hard traveling salesman problem.
I can suggest two general approaches for improving on your greedy algorithm. The first is local search. After obtaining a greedy solution, see if there are any small improvements to be made by assigning/reassigning/un-assigning a few jobs. Repeat until there are no obvious improvements or CPU time runs out.
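To make the local search idea concrete, here is a minimal sketch under simplifying assumptions: every job has a fixed start time rather than a window, each job needs exactly one skill, and the only moves tried are "insert an unassigned job" and "swap an unassigned job in for an assigned one". The data layout and all names (`feasible`, `local_search`, the dictionary keys) are hypothetical.

```python
from itertools import product


def feasible(schedule, jobs, travel):
    """Check that consecutive jobs (ordered by start) can be reached in time."""
    ordered = sorted(schedule, key=lambda j: jobs[j]["start"])
    for a, b in zip(ordered, ordered[1:]):
        finish_a = jobs[a]["start"] + jobs[a]["duration"]
        if finish_a + travel[a][b] > jobs[b]["start"]:
            return False
    return True


def local_search(assignment, jobs, travel, skills):
    """Apply improving moves until none is left; mutates and returns assignment."""
    def reward_of(schedule):
        return sum(jobs[j]["reward"] for j in schedule)

    improved = True
    while improved:
        improved = False
        assigned = {j for sched in assignment.values() for j in sched}
        unassigned = [j for j in jobs if j not in assigned]
        for j, p in product(unassigned, assignment):
            if jobs[j]["skill"] not in skills[p]:
                continue
            current = assignment[p]
            # Move 1: insert j. Move 2: replace one job k of plumber p by j.
            candidates = [current + [j]]
            candidates += [[x for x in current if x != k] + [j] for k in current]
            for cand in candidates:
                if reward_of(cand) > reward_of(current) and feasible(cand, jobs, travel):
                    assignment[p] = cand
                    improved = True
                    break
            if improved:
                break  # recompute the unassigned list after every accepted move
    return assignment


jobs = {
    "A": {"start": 0, "duration": 2, "reward": 5, "skill": "general"},
    "B": {"start": 3, "duration": 1, "reward": 10, "skill": "general"},
    "C": {"start": 10, "duration": 1, "reward": 1, "skill": "general"},
}
travel = {
    "A": {"B": 5, "C": 1},
    "B": {"A": 5, "C": 1},
    "C": {"A": 1, "B": 1},
}
skills = {"p1": {"general"}}
result = local_search({"p1": ["A"]}, jobs, travel, skills)
```

Every accepted move strictly increases the total reward, so the loop terminates. Like any local search, it can stop at a local optimum; richer moves (moving a job between plumbers, swapping two assigned jobs) are natural extensions.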
Another approach is linear programming with column generation. This is more powerful but a lot more involved. The idea is to set up a master program where we try to capture as much reward as possible by choosing to use or not use every feasible plumber schedule, subject to the packing constraints of only doing a job once and not using more plumber skills than are available. At each stage of solving the master program, the dual values corresponding to jobs and plumbers reflect the opportunity cost of doing a particular job or using a particular plumber. The subproblem is figuring out how to route a plumber so as to capture more (adjusted) reward than the plumber "costs". This subproblem is NP-hard (per the note above), but it may itself be amenable to dynamic programming or further linear programming techniques, depending on how many jobs there are. You'll quickly bump into the outer limits of academic operations research following this path.
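One way to write down the master program is sketched here. The notation is an assumption introduced for illustration: $S_p$ is the set of feasible schedules for plumber $p$, $r_s$ is the total reward of schedule $s$, $a_{js} = 1$ if schedule $s$ contains job $j$ (and 0 otherwise), and the plumber constraint is simplified to "at most one schedule per plumber".

```latex
\begin{aligned}
\max_{x} \quad & \sum_{p} \sum_{s \in S_p} r_s \, x_s \\
\text{s.t.} \quad
  & \sum_{p} \sum_{s \in S_p} a_{js} \, x_s \le 1
      && \text{for every job } j      && (\text{dual } \pi_j \ge 0) \\
  & \sum_{s \in S_p} x_s \le 1
      && \text{for every plumber } p  && (\text{dual } \sigma_p \ge 0) \\
  & x_s \ge 0 .
\end{aligned}
\qquad
\text{Pricing for plumber } p:\ \text{find } s \in S_p \text{ with }
r_s - \sum_{j} a_{js} \, \pi_j - \sigma_p > 0 .
```

Under these assumptions, a schedule is worth adding as a new column exactly when its reward, reduced by the dual prices of the jobs it covers, exceeds the dual price of the plumber.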

Related

Need an advice about algorithm to solve quite specific Job Shop Scheduling Problem

Job Shop Scheduling Problem (JSSP): I have jobs that consist of tasks and I have machines that can perform these tasks.
I should be able to add new jobs dynamically. E.g., I have a schedule for the first 5 jobs, and when the 6th arrives, I need to be able to fit it into the schedule in the best way. It is possible to adjust the existing schedule within the given flexibility constraints.
Look at the picture below.
Jobs have tasks, each task is the same type of action. Think about painting of some objects with paint spray. All the machines are the same (paint sprays), and all of the tasks are the same.
Constraint 1. Jobs have a preferred deadline for completion, but the deadline is flexible to some extent.
Edit after #tucuxi answer: A flexible deadline means that the time of completion can be extended by some delta if necessary.
Constraint 2. Between the jobs there is a resting phase. Think about drying the paint. The resting phase has a minimal required duration. The resting phase can be longer or shorter if necessary.
Edit after #tucuxi answer: So there is a planned rest time Tp, which is the desired value but is flexible: it can be increased or decreased if this allows for better scheduling. And there is a minimal rest time Tm. So Tp - Tadjustment >= Tm.
The machine is occupied by the job from the start to the completion.
Here are the parts that make this problem very distinct from what I have read about.
Jobs arrive in batches of several jobs. For example, a batch can contain 10 jobs of the type Job_1 and 5 of Job_2. Different batches can contain different types of jobs. All the jobs from a batch should be finished as close to each other as possible. Not necessarily at the same time, but we need to minimize the delay between the completion of the first and last jobs from the batch.
Constraint 3. Machines are grouped. In each group only M machines can work simultaneously. Think about paint sprays that are connected to the common pressurizer that has limited performance.
The goal.
Given the description of the problem, it should be possible to solve the JSSP. It should also be possible to add new jobs to the existing schedule.
Edit after #tucuxi answer: This is not a task that should be solved immediately: it is not a time-critical system. But it shouldn't take so long that it irritates a human who enters new tasks into the algorithm.
Question
Which of the many JSSP algorithms can help me solve this? I can implement an algorithm by myself, if there is one. The closest I found is this: the Resource Constrained Project Scheduling Problem. But I was not able to comprehend how I can glue it to a JSSP solving algorithm.
Edit after #tucuxi answer: No, I haven't tried it yet.
Are there any libraries that can be used to solve this problem? Python or C# are the preferred languages, but in the end it doesn't really matter.
I appreciate any help: keyword to search for, link, reference to a book, reference to a library.
Thank you.
I doubt that there is a pre-made algorithm that solves your exact problem.
If I had to solve it, I would first:
compile datasets of inputs that I can feed into candidate solvers.
think of a metric to rank outputs, so that I can compare the candidates to see which is better.
A baseline solver could be a brute-force search: test and rate all possible job schedulings for small sample problems. This is of course infeasible for large inputs, but for small inputs it allows you to compare the outputs of more efficient solvers to a known-best answer.
Your link is to localsolver.com, which appears to provide a library for specifying problem constraints and then solving them. It is not freely available and requires a license to use, but it would seem that your problem can be readily modeled in it. Have you tried to do so? They appear to support both C++ and Python. Free alternatives exist, including optaplanner (2.8k stars on GitHub) or python-constraint (I have not looked into other languages).
Note that a good metric is crucial to choosing a good algorithm: unless you have a clear cost function to minimize, choosing "a good algorithm" is impossible. In your description of the problem, I see several places where cost is unclear (marked in italics):
job deadlines are flexible
minimal required rest times... which may be shortened
jobs from a batch should be finished as close together as possible
(not from specification): how long can you wait for an optimal vs a less-optimal-but-faster solution?
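To illustrate what "a clear cost function" could look like for the first three points, here is a minimal sketch that folds them into one number. The weights, the dictionary keys, and the choice of a plain weighted sum are all assumptions made for illustration; the real trade-offs have to come from whoever owns the problem.

```python
def schedule_cost(jobs, w_late=1.0, w_rest=1.0, w_spread=1.0):
    """Weighted sum of three penalties for a finished schedule; lower is better.

    Each job is a dict with hypothetical keys:
      finish, deadline     actual and preferred completion times
      rest, planned_rest   actual and planned rest durations (Tp in the question)
      batch                identifier of the batch the job arrived in
    """
    # Penalty 1: how far past its preferred deadline each job finishes.
    lateness = sum(max(0.0, j["finish"] - j["deadline"]) for j in jobs)
    # Penalty 2: how much each rest phase was shortened below the planned value.
    rest_cut = sum(max(0.0, j["planned_rest"] - j["rest"]) for j in jobs)
    # Penalty 3: gap between the first and last completion within each batch.
    finishes_by_batch = {}
    for j in jobs:
        finishes_by_batch.setdefault(j["batch"], []).append(j["finish"])
    spread = sum(max(f) - min(f) for f in finishes_by_batch.values())
    return w_late * lateness + w_rest * rest_cut + w_spread * spread


example = [
    {"finish": 10, "deadline": 8, "rest": 3, "planned_rest": 4, "batch": 1},
    {"finish": 12, "deadline": 12, "rest": 4, "planned_rest": 4, "batch": 1},
]
cost = schedule_cost(example)
```

With a function like this in place, the outputs of a brute-force baseline and of any faster candidate solver can be ranked on the same scale.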

Task assigning algorithm

I'm trying to figure out what the most efficient way of assigning tasks to people is. Here is what I'm struggling with:
you have X number of people that are all eligible to do work
each person can do X number of tasks at once
you have X number of waiting tasks
each task takes a variable length of time
The goal of the challenge is to evenly distribute the tasks among the people as best as possible. Once a person finishes one of the given tasks, one of the 'queued' tasks will be fed to them. Here's an example scenario.
There are 500 tasks in the queue with 50 people available to 'take' them. Each person can take 2 tasks at once. Once a person finishes a given task they'll be fed another. The tasks that have been waiting the longest get the highest priority.
One way to possibly do it would be to have each of the 50 people that have the capacity to take a task be assigned one based on their task last given time. For example:
task 1 -> person 1
task 2 -> person 2
task 3 -> person 3
...
task 4 -> person 1
task 5 -> person 2
task 6 -> person 3
Based on the task last assigned to X person, the person with the oldest task last assigned and that's available to take on another task would get it fed to them. I'm unsure if this is the right solution for even task distribution, would love to hear suggestions! Is there a name for this type of algorithm?
Another method could be to assign tasks based on the person currently serving the lowest number of tasks. If multiple people are tied for the lowest number of tasks, the task is assigned to the person who's been available (since their last task was assigned) for the longest period of time.
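The second method (fewest active tasks first, ties broken by the oldest last assignment) can be sketched as follows. The data layout and the names `pick_person` and `assign` are hypothetical, and the sketch ignores task completion, which would simply decrement a person's `active` count.

```python
from collections import deque


def pick_person(people, capacity=2):
    """Return the person with the fewest active tasks, oldest assignment first."""
    available = [name for name, p in people.items() if p["active"] < capacity]
    if not available:
        return None
    return min(available,
               key=lambda n: (people[n]["active"], people[n]["last_assigned"]))


def assign(task_queue, people, now, capacity=2):
    """Feed waiting tasks (oldest first) to people until capacity runs out."""
    assignments = []
    while task_queue:
        person = pick_person(people, capacity)
        if person is None:
            break  # everybody is at capacity; remaining tasks keep waiting
        task = task_queue.popleft()
        people[person]["active"] += 1
        people[person]["last_assigned"] = now
        assignments.append((task, person))
    return assignments


people = {f"person {i}": {"active": 0, "last_assigned": 0} for i in (1, 2, 3)}
queue = deque(f"task {i}" for i in range(1, 8))
done = assign(queue, people, now=1)
```

With three people and a capacity of two, the first six tasks are handed out in the same rotating pattern as in the example list above, and the seventh stays in the queue until someone finishes.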
Please consider looking at this at a higher level.
The proposals so far have been greedy. They build one schedule and hope for the best.
The first thing you'll need to decide is whether that's what you want. Greedy assignment will produce spectacularly bad answers for some inputs, though if the inputs are "reasonable," and all you want is a reasonable answer, it may be fine.
On the other hand, finding the optimal assignment of tasks is NP-hard. With known algorithms, you'll need time exponential in the input size in the worst case to be sure you have the best possible answer.
There are two intermediate approaches.
Randomized task scheduling algorithms. This is a huge topic. This paper is still a decent starting place, though it's now very out-of-date. Richard Karp is amazing. The nice thing about randomized algorithms is that they can provide very useful optimality guarantees.
Heuristic search. Define a single numeric metric of goodness of a schedule. Start with a reasonable schedule (greedily determined or random). Put it on a search queue sorted by metric value, pull the schedule with the best metric off the queue, find all its "children", i.e. schedules that haven't been considered before and that result from all possible simple changes, add these to the queue, and repeat. Stop when you can't wait any longer. The current best is your answer. You can also structure this as a genetic algorithm, which is just a specialized heuristic search.
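The heuristic search loop described above might look like the following sketch. It treats the metric as a cost (lower is better) and uses a toy problem that is purely an assumption for illustration: tasks with known durations are split between two workers, a schedule is a tuple giving the worker index of each task, and a "simple change" moves one task to the other worker.

```python
import heapq


def heuristic_search(start, cost, neighbors, max_steps=1000):
    """Best-first search: always expand the cheapest schedule seen so far."""
    best = start
    seen = {start}
    queue = [(cost(start), start)]
    for _ in range(max_steps):  # "stop when you can't wait any longer"
        if not queue:
            break
        current_cost, current = heapq.heappop(queue)
        if current_cost < cost(best):
            best = current
        for child in neighbors(current):
            if child not in seen:  # only schedules not considered before
                seen.add(child)
                heapq.heappush(queue, (cost(child), child))
    return best


durations = [3, 3, 2, 2, 2]  # hypothetical task lengths


def makespan(schedule):
    """Finish time of the busier of the two workers."""
    loads = [0, 0]
    for task, worker in enumerate(schedule):
        loads[worker] += durations[task]
    return max(loads)


def flip_one(schedule):
    """All schedules that differ by moving a single task to the other worker."""
    for i, worker in enumerate(schedule):
        yield schedule[:i] + (1 - worker,) + schedule[i + 1:]


best = heuristic_search((0, 0, 0, 0, 0), makespan, flip_one)
```

On this tiny instance the search space has only 32 schedules, so the queue empties and the result is optimal; on realistic inputs `max_steps` (or a time limit) is what ends the search.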
Keep two queues: one for tasks, another for free people waiting for a task. If there is a task to be taken, the first person in the queue takes it and goes. You do not need to think about task durations or people in this solution, since it is a fair way of distributing work. If you need some kind of prioritization in the future, you could turn both queues into priority queues with little change.
As zsiar, but use two priority queues. The top task in the task priority queue gets assigned to the top worker, assuming he is capable. If he's not, the task can't be done and must wait.
Workers in the worker priority queue get ordered by capacity or time idle or whatever seems fair. In fact it's not a real priority queue, as when a worker finishes a task, we take him out and put him back in the queue in a higher position.
(If workers can do two tasks at once they are probably computers rather than people, so time idle isn't a useful metric. Only human workers care if they are kept busy whilst others doss about.)

What data structure and algorithms to use to optimize concurrent jobs?

I have a series of file-watchers that trigger jobs. The file-watchers look, every fixed interval of time, in their list and, if they find a file, they trigger a job. If not, they wait, coming back after that mentioned interval.
Some jobs are dependent on others, so running them in a proper order and with proper parallelism would be a good optimization. But I do not want to think about this myself.
What data structure and algorithms should I use to ask a computer to tell me what job to assign to what file-watcher (and in what order to put them)?
As input, I have the dependencies between the jobs, the arrival time of files for each job and a number of watchers. (For starters, I will pretend each job takes the same amount of time.) How do I spread the jobs between the watchers, to avoid unnecessary waiting gaps and to obtain a faster run time?
(I am looking forward to tackling this optimization in an algorithmic way, but would like to start with some expert advice.)
EDIT: so far I have understood that I need a DAG (directed acyclic graph) to represent the dependencies and that I need to play with topological sorting in order to optimize. But this yields a single execution line, one thread. What if I have more, say 7?
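One simple way to extend a topological sort to several watchers is to fill each time slot with as many ready jobs as there are watchers. The sketch below assumes unit-length jobs, as the question proposes, and treats a job as ready once all of its dependencies are done and its file has arrived. The function name and data layout are hypothetical, and this greedy slot filling is not guaranteed to give the shortest possible schedule.

```python
def schedule(deps, workers, arrival=None):
    """Greedy slot-by-slot scheduling of unit-length jobs on a DAG.

    deps:    job -> set of jobs it depends on (every job must be a key)
    workers: number of watchers that can run jobs in parallel
    arrival: optional job -> first time slot in which its file is present
    Returns a list of time slots, each a list of jobs run in parallel.
    """
    arrival = arrival or {}
    remaining = {job: set(d) for job, d in deps.items()}
    slots = []
    t = 0
    while remaining:
        unblocked = [j for j, d in remaining.items() if not d]
        if not unblocked:
            raise ValueError("dependency cycle or unknown dependency")
        ready = sorted(j for j in unblocked if arrival.get(j, 0) <= t)
        batch = ready[:workers]  # may be empty while waiting for a file
        slots.append(batch)
        for j in batch:
            del remaining[j]
        for d in remaining.values():
            d.difference_update(batch)  # these dependencies are now satisfied
        t += 1
    return slots


deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}
two_watchers = schedule(deps, workers=2)
one_watcher = schedule(deps, workers=1)
```

With two watchers the five jobs fit into three slots, while a single watcher needs five; which ready job to prefer when there are more candidates than watchers is where smarter priority rules would plug in.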

Optimally fair load balancing/multiprocessor scheduling of periodic tasks

I’ve been thinking about scheduling and load balancing algorithms, and I came up with a problem that I think is interesting.
There are N cages and M zookeepers. Each cage has a size S and a number of animals A. The frequency with which a cage must be cleaned is computed as some function of A / S (smaller cages with more animals get dirty faster). The difficulty of cleaning a cage is computed as some other function of A and S, the details of which are unimportant (the size of a cage contributes most of the difficulty, and the number of animals contributes a little). Once every three days, any cages that are due for cleaning are cleaned (“cleaning day”). Zookeepers are completely identical and interchangeable. Zookeepers need to do a similar amount of work each cleaning day, and to not have to do much more work than any other zookeeper. The duration of time that a cage takes to clean is not part of the problem (it's assumed that time is roughly reflected by difficulty, and that there is always enough time in the day for a zookeeper to complete their assigned tasks).
The scheduling algorithm must tell each zookeeper which cages to clean on each cleaning day, such that
each cage is cleaned at its ideal frequency or as close as possible,
assigning a minimal and roughly equal number of cleanings per zookeeper per cleaning day,
and assuring as equal a workload as possible across all zookeepers (i.e., over a period of time, the aggregate difficulties of each zookeeper’s workload are as close to equal as possible, and cages are distributed among zookeepers with roughly 1/M probability).
I’m wondering what an approximation algorithm for such an optimization problem would look like. It bears a resemblance to several different classic examples of NP-hard scheduling/resource utilization problems; maybe it is reducible to one such problem and I’m just missing it. If we get rid of the frequency/periodicity of tasks element, it is similar to the classic multiprocessor scheduling or finite bin packing problem.
Given that the objective is to equalize zookeeper effort, the more-or-less standard way to handle such tasks is on-line greedy.
In this case, that amounts to this:
Maintain a tally of the effort each zookeeper has expended so far, initially zero. On each cleaning day, tally up the needed cleaning jobs and use first-fit, best-fit, or random fit to assign jobs in a way that will tend to equalize the work sums. I.e., for best fit, assign the biggest job to the keeper farthest behind in work assigned so far. Repeat until all tasks are assigned.
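The best-fit variant described above can be sketched in a few lines. The function name, the keeper and cage names, and the difficulty numbers are hypothetical; how a cage's difficulty and due date are computed is left out, as in the question.

```python
def assign_cleanings(tally, due):
    """Assign one cleaning day's cages, biggest job to the least-loaded keeper.

    tally: keeper -> total difficulty assigned so far (updated in place)
    due:   cage -> difficulty of the cages that are due today
    Returns keeper -> list of cages to clean today.
    """
    plan = {keeper: [] for keeper in tally}
    # Biggest jobs first, so the small ones can even out the totals afterwards.
    for cage, difficulty in sorted(due.items(), key=lambda kv: -kv[1]):
        keeper = min(tally, key=tally.get)  # keeper farthest behind so far
        plan[keeper].append(cage)
        tally[keeper] += difficulty
    return plan


tally = {"ann": 0, "bob": 0}
today = assign_cleanings(tally, {"lion": 5, "bird": 1, "ape": 3, "fish": 2})
```

Because `tally` persists between calls, a keeper who got a heavy day is automatically favored with lighter work on the following cleaning days.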

What is this algorithm called?

I'm trying to look up this problem but I don't know what it's called. The premise is this:
Given m machines and j jobs, where each job can only be assigned to machines i through j, I need to assign the jobs to machines so that I maximize busy machines at one time. I am only concerned with how they are assigned at time 0. I am not concerned with how I would schedule remaining jobs after a job is completed.
Once a job and a machine are assigned to each other, no other job or machine can act on either member.
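Read literally, with each job occupying exactly one machine and each machine running at most one job, the setup above can be modeled as a maximum matching in a bipartite graph of jobs and machines. That reading is an assumption, not something stated in the question. A minimal sketch using augmenting paths, with hypothetical names and data:

```python
def max_assignment(allowed):
    """Match jobs to machines so that as many machines as possible are busy.

    allowed: job -> iterable of machines the job may run on
    Returns machine -> job.
    """
    machine_job = {}

    def try_place(job, visited):
        """Place `job`, moving already placed jobs elsewhere if necessary."""
        for machine in allowed[job]:
            if machine in visited:
                continue
            visited.add(machine)
            occupant = machine_job.get(machine)
            if occupant is None or try_place(occupant, visited):
                machine_job[machine] = job
                return True
        return False

    for job in allowed:
        try_place(job, set())
    return machine_job


# Each job is allowed on a contiguous range of machines, as in the question.
allowed = {"j1": range(1, 3), "j2": range(1, 2), "j3": range(2, 4)}
matching = max_assignment(allowed)
```

In this example job j1 is first placed on machine 1 and later moved to machine 2 so that j2, which can only use machine 1, also gets a slot.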
Scheduling algorithm
As others said, what you described is a problem, not an algorithm. There are many techniques you could use to solve your problem. Which one you should choose depends on your needs. If you need the optimal solution, you must use a technique called integer programming. If you want a very good solution, not necessarily the optimal one, there are many heuristics you could use.
Like they have said you are basically writing a 'scheduler'.
As your 'j' jobs seem to have equal priority, maybe you are looking for the 'Round robin - time sliced scheduling algorithm'.
The problem is a variant of the bin packing problem, which has a wider variety of literature than processor scheduling.
Typical real-world OS multi-processor scheduling algorithms don't operate with knowledge of how long the jobs will take, and they account for other issues such as memory affinity and trading off the scheduler's complexity against the benefit of scheduling.
I have encountered this kind of problem in modular avionics systems, where you are apportioning jobs to nodes, and there you do know the expected timing and memory requirements of the jobs before they execute.
Sounds like a scheduler.
As others have said, it's a scheduler.
It's also a classic problem used to demonstrate OOPS development, and in particular it used to be used as a very common sample application for Smalltalk programming.

Resources