What data structure and algorithms to use to optimize concurrent jobs?

What data structure and algorithms to use to optimize concurrent jobs? - algorithm

I have a series of file-watchers that trigger jobs. The file-watchers look, every fixed interval of time, in their list and, if they find a file, they trigger a job. If not, they wait, coming back after that mentioned interval.
Some jobs are dependent on others, so running them in a proper order and with proper parallelism would be a good optimization. But I do not want to think about this myself.
What data structure and algorithms should I use to ask a computer to tell me what job to assign to what file-watcher (and in what order to put them)?
As input, I have the dependencies between the jobs, the arrival time of files for each job and a number of watchers. (For starter, I will pretend each jobs takes same amount of time). How do I spread the jobs between the watchers, to avoid unnecessary waiting gaps and to obtain faster run time?
(I am looking forward tackling this optimization in an algorithmic way, but would like to start with some expert advice)
EDIT : so far I understood the fact the I need a DAG (Directed acyclic graph) to represent the dependencies and that I need to play with Topological sorting in order to optimize. But this responds with a one execution line, one thread. What if I have more, say 7?

Related

Need an advice about algorithm to solve quite specific Job Shop Scheduling Problem

Job Shop Scheduling Problem (JSSP): I have jobs that consist of tasks and I have machines that can perform these tasks.
I should be able to add new jobs dynamically. E.g. I have a schedule for the first 5 jobs, and when the 6th arrive - I need to be able to fit it into the schedule in the best way. It is possible to adjust existing schedule within the given flexibility constrains.
Look at the picture below.
Jobs have tasks, each task is the same type of action. Think about painting of some objects with paint spray. All the machines are the same (paint sprays), and all of the tasks are the same.
Constraint 1. Jobs have a preferred deadline for completion, but the deadline is flexible to some extent.
Edit after #tucuxi answer: Flexible deadline mean that the time of completion can be extended by some delta if necessary.
Constraint 2. Between the jobs there is resting phase. Think about drying the paint. Resting phase has minimal required duration. Resting phase can be longer or shorter if necessary.
Edit after #tucuxi answer: So there is planned time of rest Tp which is desired, but flexible value that can be increased or decreased if this allows for better scheduling. And there is minimal time of rest Tm. So Tp-Tadjustmenet>=Tm.
The machine is occupied by the job from the start to the completion.
Here goes parts that make this problem very distinct from what I have read about.
Jobs arrive in batches of several jobs. For example a batch can contain 10 jobs of the type Job_1 and 5 of Job_2. Different batches can contain different types of jobs. All the jobs from the batches should be finished as close to each other as possible. Not necessary at the same time, but we need to minimize the delay between the completion of first and last jobs from the batch.
Constraint 3. Machines are grouped. In each group only M machines can work simultaneously. Think about paint sprays that are connected to the common pressurizer that has limited performance.
The goal.
Having given description of the problem, it should be possible to solve JSSP. It should be also possible to add new jobs to the existing schedule.
Edit after #tucuxi answer: This is not a task that should be solved immediately: it is not a time-critical system. But it shouldn't be too long to irritate a human who put new tasks into the algorithm.
Question
What kind of many JSSP algorithms can help me solve this? I can implement an algorithm by myself, if there is one. The closest I found is This - Resource Constrained Project Scheduling Problem. But I was not able to comprehend how can I glue it to the JSSP solving algorithm.
Edit after #tucuxianswer: No, I haven't tried it yet.
Is there any libraries that can be used to solve this problem? Python or C# are the preferred languages, but in the end it doesn't really matter.
I appreciate any help: keyword to search for, link, reference to a book, reference to a library.
Thank you.

I doubt that there is a pre-made algorithm that solves your exact problem.
If I had to solve it, I would first:
compile datasets of inputs that I can feed into candidate solvers.
think of a metric to rank outputs, so that I can compare the candidates to see which is better.
A baseline solver could be a brute-force search: test and rate all possible job schedulings for small sample problems. This is of course infeasible for large inputs, but for small inputs it allows you to compare the outputs of more efficient solvers to a known-best answer.
Your link is to localsolver.com, which appears to provide a library for specifying problem constraints to then solve them. It is not freely available, requiring a license to use; but it would seem that your problem can be readily modeled in it. Have you tried to do so? They appear to support both C++ and Python. Other free options exist, including optaplanner (2.8k stars in github) or python-constraint (I have not looked into other languages).
Note that a good metric is crucial to choosing a good algorithm: unless you have a clear cost function to minimize, choosing "a good algorithm" is impossible. In your description of the problem, I see several places where cost is unclear (marked in italics):
job deadlines are flexible
minimal required rest times... which may be shortened
jobs from a batch should be finished as close together as possible
(not from specification): how long can you wait for an optimal vs a less-optimal-but-faster solution?

Assigning jobs to workers

There are N plumbers, and M jobs for them to do, where M > N in general.
If N > M then it's time for layoffs! :)
Properties of a job:
Each job should be performed in a certain time window which can vary per-job.
Location of each job varies per-job.
Some jobs require special skill. Skills needed to complete the job can vary per-job
Some jobs have higher priority than others. The "reward" for some jobs is higher than others.
Properties of a plumber:
Plumbers have to drive from one job to the next which takes time. Say it's known what the travel time from each job to every other job site is.
Some plumbers have skills that others don't have.
The task is to find the optimal assignment of jobs to plumbers, so that the reward is maximized.
It's possible that not all jobs can be completed. For example, with one plumber and two jobs, it's possible that if they are doing job A, they can't do job B because there's not enough time to get from A to B once they are done with A and B is supposed to begin. In that case, optimal is to have the plumber do the job with the biggest reward and we are done.
I am thinking of a greedy algorithm that works like this:
sort jobs by reward
while true:
for each job:
find plumbers that could potentially handle this job
make a note of the association, used in next loop
if each plumber is associated with a different job, break
for each job that can be handled by a plumber:
assign job to a plumber:
if more than one plumber can handle this job, break tie somehow:
for instance if plumber A can do jobs X,Y but
plumber B can only do X, then give X to B.
else just pick a plumber to take it
remove assigned job from further consideration
if no jobs got assigned:
break out of "while true" loop
My question: is there a better way? Seems like an NP-hard problem but I have no proof of that. :)
I guess it's similar to the Assignment Problem.
Seems it's a bit different though because of the space/time wrinkle: plumber could do either A or B, but not both because of the distance between them (can't get to B in time after finishing A). And jobs must be completed in certain time windows.
Also a plumber might not be able to take both jobs if they are too close in time (even if they are close in space). For example if B must be started before time_A_finished + time_to_travel_A_to_B, then B can't be done after A.
Thanks for any ideas! Any pointers on good stuff to read in this area is also appreciated.

Even routing just one plumber between jobs is as hard as the NP-hard traveling salesman problem.
I can suggest two general approaches for improving on your greedy algorithm. The first is local search. After obtaining a greedy solution, see if there are any small improvements to be made by assigning/reassigning/un-assigning a few jobs. Repeat until there are no obvious improvements or CPU time runs out.
Another approach is linear programming with column generation. This is more powerful but a lot more involved. The idea is to set up a master program where we try to capture as much reward as possible by choosing to use or not use every feasible plumber schedule, subject to the packing constraints of only doing a job once and not using more plumber skills than are available. At each stage of solving the master program, the dual values corresponding to jobs and plumbers reflect the opportunity cost of doing a particular job/using a particular plumber. The subproblem is figuring how out to route a plumber so as to capture more (adjusted) reward than the plumber "costs". This subproblem is NP-hard (per the note above), but it may be amenable itself to dynamic programming or further linear programming techniques depending on how many jobs there are. You'll quickly bump into the outer limits of academic operations research following this path.

Ensure that a list of dependent tasks dont have cycles

I want to make a CRUD app (SQL or document DB) for tasks: create them one at a time. Tasks may depend on one or more other tasks. Ideally this must be a directed acyclic graph.
How do I enforce that, when I create a new task (or modify the 'depends on'), there are no cycles? Its slow to load all the tasks to memory to traverse for cycles. Is there a better way to enforce them?
A simple way is to ensure that a task can depend only on previously created tasks. But this breaks in case we create task A, then we create task B which A depends on.

In the worst case, you'll definitely need to work with the whole graph - consider when there's a path from the new task through all the other tasks back to itself - you wouldn't be able to know this without processing the whole graph.
If you're primarily adding one task at a time, or modifying one relationship at a time, and the graph is fairly sparse (there aren't too many edges), one solution I can think of is to do a depth-first-search from the new task (perhaps in reverse, if that would have a smaller branching factor). If you get back to that task, you know there's a cycle. This would presumably result in only loading the required vertices one by one, which could be either a lot slower, or a lot faster, depending on how the data is stored (thus how long a single load would take), how many vertices need to be loaded in relation to the total number of vertices, and perhaps a few other factors.
If you make many changes at the same time, you're probably better off running a complete cycle detection algorithm on the graph.

Algorithm to find resource-free slot

I am designing a project management app for a factory. The app is expected to produce draft project plans. To schedule a task, the app should check three conditions:
task dependency - do not start before,
machine availability, and
shift work hours
I keep track of machine engagement in machine_allocations table:
machine_allocations
+------------+--------------+-----------------+---------------+
| machine_id | operation_id | start_timestamp | end_timestamp |
+------------+--------------+-----------------+---------------+
Shift hours follow a pattern.
Now, to find the earliest possible date-time for an operation I am thinking of a function:
function earliest_slot($machine_id, $for_duration, $no_sooner_than) {
// pseudo code
1. get records for the machine in question for after $no_sooner_than
2. put start and end timestamps into $unavailable array
3. add non-working times as new elements to the array
4. in a loop find timeslots which are not in the array
5. if a timeslot is found which is equal to or bigger than $for_duration, return that
}
My question is, is this a good approach? Are there simpler ways to do this?

Finding the earliest date-time for one operation at a time may not give you the best result. Consider the example where operation A uses machine 1 for a long time, operation B uses machine 1 for a short time and operation C uses machine 2 for a short time, but operation C must be done after B.
In this case, it is better to schedule B before A on machine 1, but your approach would not achieve this. Of course, writing and using software to manage this would be more difficult than what you have suggested, so you need to decide whether the benefit is worth the extra effort.
Have a look at Scheduling, Job Shop Scheduling and Scheduling algorithm.
First you need to think about what sort of information you can collect about tasks (such as dependencies, priorities, deadlines) and then decide how best to put it together.
You may find that an approach like you propose is good enough in your case. My addition to your proposed algorithm would be to sort the list of existing machine operations to make searching through them faster, that is you can stop as soon as you find a time where your operation fits because it's guaranteed to be the earliest time.
A relatively simple extension would be a priority system that allows you to bump lower-priority tasks forward (which may require the adjustment of their dependencies as well), but more complicated algorithms would consider multiple tasks at once and try to optimise the outcome. In the end it comes down to what's appropriate for your specific problem.

That depends when You want to plan work. If before starting work of machines then mayby a Branch&Bound algorithm, or something like it (mayby dynamic programming). If work have to planned when machines are working and You can not tell what jobs would be performed then for optimal solution You can not count (well I can't think about it). Mayby put next jobs on machine with smalles max time? Mayby a dynamic version of Ford-Bellmas alg (if you have couple layers of production). Hard to say.
I would do couple of approches and determine witch are best. The You can write an article about this :)

Scheduling with variable Resources

(First of all, sorry for my english, it's not my first language)
I have a list of tasks/jobs, each task must start after a specific start time, needs to run for a certain time and has to be finished after a certain end time.
I can dynamically add and remove workers, so it is possible to execute 2 or more tasks at the same time if I have to. My Goal is to find a scheduling plan that executes each job successfully and uses the minimal amount of workers possible.
I'm currently using an EDF (http://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) Algorithm and recursively call the function with a higher Worker Limit if it can't schedule all jobs correctly, but I think this doesn't work right because I don't have a real way to measure when I can lower the ressource limit again.
Are there any Algorithms that work for my problem, or any other clever ideas?
Thanks for your help.

A scheduling problem can often be solved very effectively by formulating it either as mixed-integer program (MIP)
http://en.wikipedia.org/wiki/Mixed_integer_programming#Integer_unknowns
or expressing it using constraint programming (CP)
http://en.wikipedia.org/wiki/Constraint_programming
For either MIP or CP, you will find both free and commercial solvers that can address your problem.
In both of these approaches, you put your effort into stating the properties that the solution must have, and the hard work of applying an appropriate algorithm is left to a specialized solver.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio