Max-Min and Min-Min algorithms implementation

I'm trying to simulate the Max-Min and Min-Min scheduling algorithms and code them myself in a simulation, but I don't really understand how to translate the way they work into code.
For example, in the FCFS algorithm I use 3 servers (VMs), each one faster than the next, and 5 tasks with different arrival times. The first task checks the first server and is scheduled there; if the second task arrives while the first isn't completed yet, it checks availability and is scheduled on the second server. If all 3 servers are occupied, the next task is scheduled on the one with the minimum remaining execution time.
Now for Min-Min and Max-Min, this is the theoretical background:
Min-Min:
Phase 1: Compute the completion time of every task on each machine, then for every task select the machine that processes it in the minimum possible time.
Phase 2: Among all the tasks in the meta-task list, the task with the minimum completion time is selected and assigned to the machine on which that minimum completion time is expected. The task is removed from the meta-task list and the procedure continues until the list is empty.
Max-Min:
Phase 1: Compute the completion time of every task on each machine, then for every task choose the machine that processes it in the minimum possible time.
Phase 2: Among all the tasks in the meta-task list, the task with the maximum completion time is selected and assigned to the machine that gives its minimum completion time. The task is removed from the meta-task list and the procedure continues until the list is empty.
I get phase 1 for both algorithms: I check the task's burst time and the server's speedup, so burst / speedup = execution time, and I find the best server for each task.
But I can't understand phase 2. For Min-Min I have to pick the fastest task each time, and when I find it I have to schedule it on the fastest server. But then the workload will be imbalanced: as I said, there are 3 servers and one of them is the fastest, let's say the server with ID 1, so every task will end up scheduled on that one, while I also need the other 2 to work.
Same problem with Max-Min: find the worst task and schedule it on the worst server, but only one server is the worst, so the other 2 will never work. How am I supposed to do the balancing, and also take into consideration that the tasks arrive at different times?
If you need anything more just let me know and thanks in advance!

You can find a nice description of both algorithms in A Comparative Analysis of Min-Min and Max-Min Algorithms based on the Makespan Parameter.
The pseudocode for Min-Min works in terms of ETij, the execution time of task ti on resource Rj, and rj, the ready time of resource Rj.
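A rough Python sketch of that Min-Min loop, using the same notation (the ET values below are made up for illustration and are not taken from the paper), might look like this:

```python
# Made-up ET matrix: ET[i][j] = execution time of task ti on resource Rj.
ET = [[4, 8, 12],      # task 0
      [2, 4, 6],       # task 1
      [10, 20, 30]]    # task 2
ready = [0.0, 0.0, 0.0]            # rj: ready time of each resource
meta_tasks = set(range(len(ET)))

while meta_tasks:
    # Phase 1: for each remaining task, find the resource with the minimum completion time
    best = {i: min(range(len(ready)), key=lambda j: ET[i][j] + ready[j]) for i in meta_tasks}
    # Phase 2 (Min-Min): pick the task whose minimum completion time is smallest
    task = min(meta_tasks, key=lambda i: ET[i][best[i]] + ready[best[i]])
    resource = best[task]
    ready[resource] += ET[task][resource]      # that resource is now busy for longer
    meta_tasks.remove(task)
    print(f"task {task} -> resource {resource}, ready[{resource}] = {ready[resource]}")
```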
It's true that you can get an imbalanced load, because all the small tasks will get executed first. The Max-Min algorithm overcomes this drawback.
The Max-Min algorithm performs the same steps as Min-Min, but the main difference comes in the second phase: the task ti with the maximum completion time is selected, instead of the minimum as in Min-Min, and is assigned to the resource Rj that gives the minimum completion time.
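In the sketch above, the only change needed for Max-Min is the selection step: take the task whose minimum completion time is largest, and still assign it to its best resource:

```python
# Max-Min selection: the task with the *largest* minimum completion time
task = max(meta_tasks, key=lambda i: ET[i][best[i]] + ready[best[i]])
```

With the made-up ET matrix above, Min-Min puts all three tasks on resource 0 (makespan 16), while Max-Min spreads them over all three resources (makespan 10), which is exactly the load-balancing effect described here.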

Related

A job allocation problem

I have n people and need to do N tasks; each task takes some number of man-days, and tasks have dependencies, such as task A must start after task B has finished. How should I arrange them? Thank you so much!
I set up 4 rules for this question:
each task has a maximum number of people that can work on it (such as 5)
there are logical dependencies between tasks
each task takes a fixed number of man-days
on any day, the number of people working concurrently cannot exceed the total number of people (n)
Those are the rules, but I don't know how to calculate the minimum total time. Please tell me a method to solve it or give me some inspiration, thank you so much!
Make a topological sort of the jobs.
Now you have a sequence of events (time; job start / job end).
At the start of a job, assign a worker if one is free; otherwise wait.
At the end of a job, free the worker and assign them to a waiting job if one exists.
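A minimal sketch of that event-driven simulation, simplified to one worker per job (the job graph, durations, and worker count below are made up):

```python
import heapq
from graphlib import TopologicalSorter

duration = {"A": 3, "B": 2, "C": 4, "D": 1}           # man-days per job (made up)
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
n_workers = 2

order = list(TopologicalSorter(deps).static_order())  # topological sort of jobs
unfinished_deps = {j: len(deps[j]) for j in order}
ready = [j for j in order if unfinished_deps[j] == 0]

now, free, running = 0, n_workers, []                 # running: heap of (end_time, job)
while ready or running:
    # start as many ready jobs as there are free workers
    while ready and free:
        job = ready.pop(0)
        heapq.heappush(running, (now + duration[job], job))
        free -= 1
    # advance to the next "job end" event and release its worker
    now, done = heapq.heappop(running)
    free += 1
    for j in order:
        if done in deps[j]:
            unfinished_deps[j] -= 1
            if unfinished_deps[j] == 0:
                ready.append(j)

print("total time:", now)
```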
This problem is well-known to be NP-complete. Even the simple version with no dependencies, and only one worker per job, is NP-complete.
So you likely need to accept a reasonable heuristic. Here is one.
With a breadth-first search starting from the jobs that nothing else depends on, figure out for each job the minimum wall-clock time from starting that job to finishing everything downstream of it (maxing out workers on the job and on everything that depends on it, even if that is not actually possible).
And now you start at the top. Always assign workers to jobs by the following rules:
Longest wall-clock time to finish first.
Break ties in favor of longest job.
Break ties in favor of job with fewest max workers.
In other words, you're prioritizing putting people to work on the critical path and on any slow jobs.
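A rough sketch of how those priorities could be computed; the job data is made up, and the wall-clock figure assumes every job can always use its maximum number of workers, matching the simplification above:

```python
from functools import lru_cache

# Made-up job data: man-days, the max workers each job can use, and its dependents.
man_days   = {"design": 10, "build": 20, "test": 6, "docs": 4}
max_people = {"design": 2,  "build": 5,  "test": 3, "docs": 2}
dependents = {"design": ["build"], "build": ["test"], "test": [], "docs": []}

@lru_cache(maxsize=None)
def wall_clock(job):
    # minimum time from starting `job` until everything downstream of it is done,
    # pretending every job always gets its maximum number of workers
    own = man_days[job] / max_people[job]
    return own + max((wall_clock(d) for d in dependents[job]), default=0.0)

def priority(job):
    # longest wall-clock time first, then longest job, then fewest max workers
    return (-wall_clock(job), -man_days[job], max_people[job])

ready_jobs = ["docs", "design"]          # jobs whose prerequisites are all done
for job in sorted(ready_jobs, key=priority):
    print(job, round(wall_clock(job), 2))
```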

Measure parallelizability of task scheduler graph in Dask

If I have a large custom task graph fed into Dask, and a list of times that each task will take (e.g. in the simplest case, each task might take the same number of CPU seconds/minutes), is there a way to calculate how "parallelizable" the problem is from the structure of the graph, without actually running the tasks? The assumption would be that the time to calculate scheduling priorities etc is negligible compared to the time of each task.
I'm thinking of a metric either that plots the number of parallel processes being used over time (assuming an infinite number of available cores), or perhaps the number of hours that the whole graph would take, given a fixed number of CPU cores.
If not, this seems like a nice thing to be able to provide for the task scheduler, especially if individual tasks are expected to take many minutes or hours. However, I guess it might be hard to calculate analytically, depending on the algorithm(s) underlying task allocation in Dask.
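One rough way to estimate this without running anything, assuming the per-task durations and dependencies can be pulled out of the graph (the graph below is hypothetical and not in Dask's own format), is the classic work/span calculation: total work is the single-core time, the span is the critical path, and work divided by span is the average parallelism with infinitely many cores.

```python
from functools import lru_cache

# Hypothetical per-task durations (e.g. CPU seconds) and dependencies.
durations = {"load": 5.0, "clean": 3.0, "fit_a": 10.0, "fit_b": 10.0, "report": 1.0}
deps = {"load": [], "clean": ["load"], "fit_a": ["clean"], "fit_b": ["clean"],
        "report": ["fit_a", "fit_b"]}

@lru_cache(maxsize=None)
def finish_time(task):
    # earliest finish with unlimited cores = longest dependency chain ending at `task`
    return durations[task] + max((finish_time(d) for d in deps[task]), default=0.0)

work = sum(durations.values())                    # total time on a single core
span = max(finish_time(t) for t in durations)     # critical path: infinite cores
print(f"work = {work}, span = {span}, average parallelism = {work / span:.2f}")
```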

Managing tasks without using the system clock

For the sake of argument I'm trying to define an algorithm that creates a pool of tasks (each task has an individual time-slice to operate) and manages them without the system clock.
The problem I've encountered is that with each approach I've taken, the tasks with the higher time-slices were starved.
What I've tried to do is create a task vector that contains pairs of task/(int) "time". I've sorted the vector with the lowest "time" first, then iterated and executed each task with a time of zero (0). While iterating through the entire vector I've decreased the "time" of each task.
Is there a better approach to this kind of problem? With my approach, starvation will definitely occur.
Managing tasks may not need a system clock at all.
You only need to figure out a way to determine the priority of each task, then run the tasks in priority order.
You may want to pause a task to execute another task, and then you will need to set a new priority for the paused task. This feature (multitasking) needs an interruption based on an event (usually clock time, but you can use any other event, like a temperature threshold, a monkey pushing a button, or another process sending a signal).
You say your problem is that tasks with the higher time-slice are starving.
Since you decrease the "time" of each task on every pass, and assuming "time" never goes negative, higher time-slice tasks will eventually reach 0, just like every other task.
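A tiny sketch of that countdown idea (one-shot tasks with made-up names and time-slices), showing that nothing starves because every counter strictly decreases toward zero:

```python
# Each entry is (task name, remaining "time"); no system clock involved.
tasks = [("sensor", 1), ("logger", 5), ("backup", 12)]

def run(name):
    print("running", name)

tasks.sort(key=lambda t: t[1])               # lowest remaining time first
while tasks:
    for name, t in tasks:                    # execute everything that reached zero
        if t == 0:
            run(name)
    # drop finished tasks and decrease the "time" of the rest
    tasks = [(name, t - 1) for name, t in tasks if t > 0]
```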

What is a good way to design and build a task scheduling system with lots of recurring tasks?

Imagine you're building something like a monitoring service, which has thousands of tasks that need to be executed in given time interval, independent of each other. This could be individual servers that need to be checked, or backups that need to be verified, or just anything at all that could be scheduled to run at a given interval.
You can't just schedule the tasks via cron though, because when a task is run it needs to determine when it's supposed to run the next time. For example:
schedule server uptime check every 1 minute
first time it's checked the server is down, schedule next check in 5 seconds
5 seconds later the server is available again, check again in 5 seconds
5 seconds later the server is still available, continue checking at 1 minute interval
A naive solution that comes to mind is to simply have a worker that runs every second or so, checks all the pending jobs, and executes the ones that need to be executed. But how would this work if the number of jobs is something like 100,000? It might take longer to check them all than the worker's ticking interval, and the more tasks there are, the longer each polling pass becomes.
Is there a better way to design a system like this? Are there any hidden challenges in implementing this, or any algorithms that deal with this sort of a problem?
Use a priority queue (with the priority based on the next execution time) to hold the tasks to execute. When you're done executing a task, you sleep until the time for the task at the front of the queue. When a task comes due, you remove and execute it, then (if it's recurring) compute the next time it needs to run, and insert it back into the priority queue based on its next run time.
This way you have one sleep active at any given time. Insertions and removals have logarithmic complexity, so it remains efficient even if you have millions of tasks (e.g., inserting into a priority queue that has a million tasks should take about 20 comparisons in the worst case).
There is one point that can be a little tricky: if the execution thread is waiting until a particular time to execute the item at the head of the queue, and you insert a new item that goes at the head of the queue, ahead of the item that was previously there, you need to wake up the thread so it can re-adjust its sleep time for the item that's now at the head of the queue.
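A minimal sketch of this design in Python, using heapq and a condition variable; the check_server task and its intervals are made up, and error handling is omitted:

```python
import heapq
import threading
import time

class Scheduler:
    def __init__(self):
        self._heap = []                    # entries are (run_at, seq, fn)
        self._cond = threading.Condition()
        self._seq = 0                      # tie-breaker so functions are never compared

    def schedule(self, fn, delay_seconds):
        with self._cond:
            heapq.heappush(self._heap, (time.monotonic() + delay_seconds, self._seq, fn))
            self._seq += 1
            self._cond.notify()            # the head of the queue may have changed

    def run_forever(self):
        while True:
            with self._cond:
                # sleep until the earliest task is due (or a new task is inserted)
                while not self._heap or self._heap[0][0] > time.monotonic():
                    timeout = self._heap[0][0] - time.monotonic() if self._heap else None
                    self._cond.wait(timeout)
                _, _, fn = heapq.heappop(self._heap)
            next_delay = fn()              # run the task outside the lock
            if next_delay is not None:     # recurring task: put it back
                self.schedule(fn, next_delay)

def check_server():
    up = True                              # pretend the probe succeeded
    return 60 if up else 5                 # next check in 1 minute, or retry in 5 s

scheduler = Scheduler()
scheduler.schedule(check_server, 0)
scheduler.run_forever()
```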
We encountered this same issue while designing Revalee, an open source project for scheduling triggered callbacks. In the end, we ended up writing our own priority queue class (we called ours a ScheduledDictionary) to handle the use case you outlined in your question. As a free, open source project, the complete source code (C#, in this case) is available on GitHub. I'd recommend that you check it out.

Optimum algorithm

Two workers have several tasks. Assume the tasks have durations of 14, 7, 2, 4. The next task goes to the first worker that is free. The two workers have to finish all the tasks in one day. The same task takes the same time on either worker. Our goal is to finish the tasks as soon as possible.
Two questions:
1. Show that the algorithm always completes the tasks before time 2*T, where T is the optimum completion time.
2. Express the optimum scheduling with a recursion (multi-dimensional).
This is not a homework problem.
Please give me some suggestions.
What is a multi-dimensional recursion?
Since you ask for suggestions...
Try drawing the problem out. Have a timeline for worker #1 and worker #2 and specify what tasks they are working on for what stretches of time. Once you understand why this algorithm completes in less than 2*T time, you can then start figuring out how to formally prove it.
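A quick way to "draw it out" in code: simulate the greedy rule (each task goes to whichever worker becomes free first) on the durations from the question and compare against a lower bound on T:

```python
import heapq

tasks = [14, 7, 2, 4]
workers = [0, 0]                         # times at which each worker becomes free
heapq.heapify(workers)
for t in tasks:
    free_at = heapq.heappop(workers)     # the first worker to become free takes the task
    heapq.heappush(workers, free_at + t)

makespan = max(workers)
lower_bound = max(max(tasks), sum(tasks) / 2)   # the optimum T can beat neither bound
print("greedy finishes at", makespan)           # 14
print("T is at least", lower_bound)             # 14, so greedy is optimal on this input
```

Here greedy already matches the lower bound. The same picture also hints at the 2*T argument: when the last-finishing task starts, no worker has been idle, so its start time is at most the total work divided by the number of workers (at most T), and the task itself takes at most T.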
