A job allocating proble - algorithm

I got n people, need to do N tasks, each task takes (x man-day), tasks have dependence, such as task A must start after task B finished. How should I arrangement it? Thank you so much!
I set up 4 rules for this question:
one Task could set most num of people to do it (such as 5)
there are logical dependence between tasks
one Task will cost determinated man-days
everyday concurrent working people can not be above current people num(n)
the rules are above, but i don't know how to calculate a minimum total time. Please tell me a method to solve it or some inspiration, thank you so much!

Make topological sort of jobs.
Now you have sequence of events (time; job start/job end)
On start of the job assign a man, if there are free workers, otherwise wait.
On job end free a worker, assign him to the waiting job if exists.

This problem is well-known to be NP-complete. Even the simple version with no dependencies, and only one worker per job, is NP-complete.
So you likely need to accept a reasonable heuristic. Here is one.
With a breadth-first search starting from the jobs which nothing else depends on, figure out for each job the minimum wall-clock time (max out workers on the job, and everything that depends on it, even if that is not actually possible) from starting that job to finishing.
And now you start at the top. Always assign workers to jobs by the following rules:
Longest wallclock time to finish first.
Break ties in favor of longest job.
Break ties in favor of job with fewest max workers.
In other words you're prioritizing putting people to work on the critical path and any slow jobs.

Related

Assign same queue item to multiple workers

We have a logical queue of tasks, where each task has to be assigned to multiple workers.
The numbers of workers to be assigned are based on a configuration of Minimum and Maximum workers.
A worker should not see the same task that they have already completed. It is not necessary that all workers will see all the tasks.
Total number of workers can change dynamically. Each worker can become online or offline anytime.
Each worker can choose to either complete the task or let it expire.
On expiry the task should be assigned to any worker who has not already completed the task.
Is there a good algorithm to solve this scenario ?
The easy solution:
Greedily assign tasks:
For every task that is ready.
Find the Minimum <= N <= Maximum workers that
didn't see the task yet and assign them.
Repeat until you either run
out of workers or finish all the tasks.
If a worker comes online or finishes task, re-check for all the tasks.
If a new task comes, re-check for the available workers.
This solution might be enough if there are not that many tasks, as it is computation heavy and recomputes everything.
The possible optimizations:
If the greedy solution fails(and it probably will), there are ways to improve on it. I will try to list those that come to my mind but it won't be an exhaustive list.
First, my personal favorite: Network flows. Unfortunately I don't see an easy way to resolve the minimal number of workers requirement, however it would be fast and it would result in a maximum possible workers being assigned at any given moment.
Create the network Source - Workers - Tasks - Sink. Edges workers to tasks will be linked and unlinked as needed:
when a worker is available for a task, create the edge with weight 1, otherwise don't create the edge.
From source link an edge with weight one to each online worker.
From every task link an edge with weight equal to its maximal worker capacity.
You could even differentiate between different kinds of workers, network flows are awesome. The algorithms are fast which makes them suitable even for large graphs. Also they are available in many libraries so you won't have to implement them yourself. Unfortunately, there is no easy way to enforce the minimal workers rule. At least I don't see one right now, there might be some way. Or maybe at least a heuristic
Second, be smart while being greedy:
Create a queue for every task.
When a worker is available, register him for every task he can do into its queue.
When a worker is not available, remove him from the queues.
When a task has enough workers, start progress and disable these workers.
This is still the brute force approach however since you keep the queues, you limit the amount of the necessary computation to a reasonable level. The potential downside is that large tasks(with a large minimal number of workers) might be stalled by the small tasks that will be easier to start - and will eat up the workers. So there will probably be some further checking/balancing and prioritizing needed.
There is definitely more to be tried & done for your task at hand, however the information you provided is rather limited so this advice is not that specific.

Max-Min and Min-Min algorithms implementation

I'm trying to simulate the Max-Min and Min-Min scheduling algorithms and code them myself in a simulation. But don't really understand how to implement the way they work in code.
For example, in FCFS algorithm i use 3 servers(vms),each server is faster than the other and 5 tasks with different arrival times. So the first task will check the first server and will be scheduled there, the second if arrive while the first is't completed yet, will check the availability and scheduled to the second server. If all 3 servers are occupied the next task will be scheduled to the one with the min remain executing time.
Now for the Min-Min and Min-Max this is the theoretical background:
Min-Min:
Phase 1: First computes the completion time of every task on each machine and then for every task select the machine which processes the tasks in minimum possible time.
Phase 2: Among all the tasks in Meta task the task with minimum completion time is selected and is assigned to machine on which minimum execution time is expected. The task is removed from the list of Meta Task and the procedure continues until Meta Task list is empty.
Max-Min:
Phase 1: First computes the completion time of every task on each machine and then for every task chooses the machine which processes the tasks in minimum possible time
Phase 2: Among all the tasks in Meta Task the task with maximum completion time is selected and is assigned to machine. The task is removed from the list of Meta Task and the procedure continues until Meta Task list is empty.
I get the phase 1 for both algorithms, I need to check the task's burst time and server's speedup -> burst/speedup = execution time. I will find the best server for each task.
But I can't understand the phase 2. For Min-Min i have to choose every time the fastest task and when i find it I will have to schedule it to the faster server. But the workload will be imbalanced, as I said 3 servers and at least one is the faster, lets say server with ID 1, so every time the tasks will scheduled to this one, I also need the other 2 to work.
Same problem with Max-Min, find the worst task, schedule it to the worst server, but only one server is the worst so the other 2 will not work. How am I suppose to do the balancing and also take in consideration that the tasks arrive in different times?
If you need anything more just let me know and thanks in advance!
You can find nice description of both algorithms in A Comparative Analysis of Min-Min and Max-Min Algorithms based on the Makespan Parameter:
I paste here pseudocode for Min-Min. ETij is execution time for task ti on resource Rj. And rj is ready time of Rj.
That's true that you can have imbalanced load, because all small task will get executed first. Max-Min algorithm overcomes this drawback.
Max-Min algorithm performs the same steps as the Min-Min algorithm but the main difference comes in the second phase, where a task ti is selected which has the maximum completion time instead of minimum completion time as in min-min and assigned to resource Rj, which gives the minimum completion time.

Fair job processing algorithm

I've got a machine that accepts user uploads, performs some processing on them, and then returns the result. It usually takes a few minutes to process each upload received.
The problem is, a few users can upload a lot of jobs that basically deny processing to other users for a long time. I thought of just setting a hard cap and using priority queues, e.g. after 5 uploads in an hour, all new uploads are given a lower processing priority. I basically want to process ALL jobs, but I don't want the user who uploaded 1000 jobs to make everyone wait.
My question is, is there a better way to do this?
My goal is to minimize the time between the upload and the result being returned. It would be ideal if the algorithm could work in a distributed manner as well.
Thanks
Implementation will vary widely depending on what these jobs are and how long they take and how varied the processing times are, as well as how likely there is to be a fatal error during the process.
That being said, an easy way to maintain an even distribution of jobs across users is to maintain a list of all the users who have submitted jobs. When you are ready to get a new job, rather than just taking the next job out of a random queue, cycle through the users taking the top job from each user each time.
Again, this can be accomplished a number of ways, I would recommend a map from users to their respective list of jobs submitted. Cycle through the keys of the map each time you are ready for a new job. then get the list of jobs for whatever key you are on, and do the first job.
This is assuming that each job is "atomic" in that one job is not dependent on being executed next to the jobs it was submitted with.
Hope that helps, of course I could have completely misunderstood what you are asking for.
You don't have to roll-your-own. There is Sun Grid Engine. An open-source tool that is built to do that sort of thing, and if you are willing to pay, there is Platform LSF, which I use at work.
What is the maximum # of jobs a user can submit? Can users submit 1 job a a time OR is it a batch of jobs?
So your algorithm would go something like this
If the User has submitted jobs Then
Check how many jobs per hour
If the jobs per hour > than the average Then
Modify the users profile to a lower priority
Else
Check Users priority level and restore
End If
If the priority = HIGH
process right away
Else If priority = MEDIUM
Check Queue for High Priority
If High Priority Found (rerun this loop)
Else Process
Else If priority = LOW
Check Queue for High Priority
If High Priority Found (rerun this loop)
Else Process
Check Queue for Medium Priority
If Medium Priority Found (rerun this loop)
Else Process
Process Queue
End If
You can use a graph algorithm like Edmond's Blossom V to assign all users and jobs to a process. If a user can upload more then another user it would be more simplier for him to find a process. With the Blossom V algorithm you can define a threshold to not exceed the maximum process the server can handle.

how is task Scheduled(allocated) to CPU using Shortest job first method?

I have some clear understanding about Time slicing algorithm for CPU task scheduling.But i have some confusion with Shortest job first algorithm.
For example:
I have two programs.
one program with infinite loop and another with finite loop.In this situation,how CPU determining job for execute by using shortest job first.
CPU only understand this kind of situation while executing those programs(Whether it is infinite or finite).
Please Guide me to get out of this issue...
Thanks & Regards,
Saravanan.P
It's taking the Job with the lowest process burst assumption (intervals with no I/O-Usage). Computers can't really determine infinite loops or tell if it's just a long query, but it can determine I/O-usage.

Optimum algorithm

Two workers has several tasks.Assume the tasks has duration of 14, 7, 2, 4. The next task goes to first worker that is free. The two workers have to finish several tasks in one day. The same task takes the same time on the two workers. Our goal is to finish the tasks as soon as possible.
Two questions:
1.show that the algorithm always completes the task prior to time 2*T,T is the optimum completion time.
2.express the optimum scheduling with a resursion(multi-dimentonal)
Not HW PRoblem
Please give me some suggestions
What is a multi-dimentional recursion?
Since you ask for suggestions...
Try drawing the problem out. Have a timeline for worker #1 and worker #2 and specify what tasks they are working on for what stretches of time. Once you understand why this algorithm completes in less than 2*T time, you then can start figuring out how to formally prove it.

Resources