Optimum algorithm - algorithm

Two workers has several tasks.Assume the tasks has duration of 14, 7, 2, 4. The next task goes to first worker that is free. The two workers have to finish several tasks in one day. The same task takes the same time on the two workers. Our goal is to finish the tasks as soon as possible.
Two questions:
1.show that the algorithm always completes the task prior to time 2*T,T is the optimum completion time.
2.express the optimum scheduling with a resursion(multi-dimentonal)
Not HW PRoblem
Please give me some suggestions
What is a multi-dimentional recursion?

Since you ask for suggestions...
Try drawing the problem out. Have a timeline for worker #1 and worker #2 and specify what tasks they are working on for what stretches of time. Once you understand why this algorithm completes in less than 2*T time, you then can start figuring out how to formally prove it.

Related

A job allocating proble

I got n people, need to do N tasks, each task takes (x man-day), tasks have dependence, such as task A must start after task B finished. How should I arrangement it? Thank you so much!
I set up 4 rules for this question:
one Task could set most num of people to do it (such as 5)
there are logical dependence between tasks
one Task will cost determinated man-days
everyday concurrent working people can not be above current people num(n)
the rules are above, but i don't know how to calculate a minimum total time. Please tell me a method to solve it or some inspiration, thank you so much!
Make topological sort of jobs.
Now you have sequence of events (time; job start/job end)
On start of the job assign a man, if there are free workers, otherwise wait.
On job end free a worker, assign him to the waiting job if exists.
This problem is well-known to be NP-complete. Even the simple version with no dependencies, and only one worker per job, is NP-complete.
So you likely need to accept a reasonable heuristic. Here is one.
With a breadth-first search starting from the jobs which nothing else depends on, figure out for each job the minimum wall-clock time (max out workers on the job, and everything that depends on it, even if that is not actually possible) from starting that job to finishing.
And now you start at the top. Always assign workers to jobs by the following rules:
Longest wallclock time to finish first.
Break ties in favor of longest job.
Break ties in favor of job with fewest max workers.
In other words you're prioritizing putting people to work on the critical path and any slow jobs.

Assign same queue item to multiple workers

We have a logical queue of tasks, where each task has to be assigned to multiple workers.
The numbers of workers to be assigned are based on a configuration of Minimum and Maximum workers.
A worker should not see the same task that they have already completed. It is not necessary that all workers will see all the tasks.
Total number of workers can change dynamically. Each worker can become online or offline anytime.
Each worker can choose to either complete the task or let it expire.
On expiry the task should be assigned to any worker who has not already completed the task.
Is there a good algorithm to solve this scenario ?
The easy solution:
Greedily assign tasks:
For every task that is ready.
Find the Minimum <= N <= Maximum workers that
didn't see the task yet and assign them.
Repeat until you either run
out of workers or finish all the tasks.
If a worker comes online or finishes task, re-check for all the tasks.
If a new task comes, re-check for the available workers.
This solution might be enough if there are not that many tasks, as it is computation heavy and recomputes everything.
The possible optimizations:
If the greedy solution fails(and it probably will), there are ways to improve on it. I will try to list those that come to my mind but it won't be an exhaustive list.
First, my personal favorite: Network flows. Unfortunately I don't see an easy way to resolve the minimal number of workers requirement, however it would be fast and it would result in a maximum possible workers being assigned at any given moment.
Create the network Source - Workers - Tasks - Sink. Edges workers to tasks will be linked and unlinked as needed:
when a worker is available for a task, create the edge with weight 1, otherwise don't create the edge.
From source link an edge with weight one to each online worker.
From every task link an edge with weight equal to its maximal worker capacity.
You could even differentiate between different kinds of workers, network flows are awesome. The algorithms are fast which makes them suitable even for large graphs. Also they are available in many libraries so you won't have to implement them yourself. Unfortunately, there is no easy way to enforce the minimal workers rule. At least I don't see one right now, there might be some way. Or maybe at least a heuristic
Second, be smart while being greedy:
Create a queue for every task.
When a worker is available, register him for every task he can do into its queue.
When a worker is not available, remove him from the queues.
When a task has enough workers, start progress and disable these workers.
This is still the brute force approach however since you keep the queues, you limit the amount of the necessary computation to a reasonable level. The potential downside is that large tasks(with a large minimal number of workers) might be stalled by the small tasks that will be easier to start - and will eat up the workers. So there will probably be some further checking/balancing and prioritizing needed.
There is definitely more to be tried & done for your task at hand, however the information you provided is rather limited so this advice is not that specific.

Max-Min and Min-Min algorithms implementation

I'm trying to simulate the Max-Min and Min-Min scheduling algorithms and code them myself in a simulation. But don't really understand how to implement the way they work in code.
For example, in FCFS algorithm i use 3 servers(vms),each server is faster than the other and 5 tasks with different arrival times. So the first task will check the first server and will be scheduled there, the second if arrive while the first is't completed yet, will check the availability and scheduled to the second server. If all 3 servers are occupied the next task will be scheduled to the one with the min remain executing time.
Now for the Min-Min and Min-Max this is the theoretical background:
Min-Min:
Phase 1: First computes the completion time of every task on each machine and then for every task select the machine which processes the tasks in minimum possible time.
Phase 2: Among all the tasks in Meta task the task with minimum completion time is selected and is assigned to machine on which minimum execution time is expected. The task is removed from the list of Meta Task and the procedure continues until Meta Task list is empty.
Max-Min:
Phase 1: First computes the completion time of every task on each machine and then for every task chooses the machine which processes the tasks in minimum possible time
Phase 2: Among all the tasks in Meta Task the task with maximum completion time is selected and is assigned to machine. The task is removed from the list of Meta Task and the procedure continues until Meta Task list is empty.
I get the phase 1 for both algorithms, I need to check the task's burst time and server's speedup -> burst/speedup = execution time. I will find the best server for each task.
But I can't understand the phase 2. For Min-Min i have to choose every time the fastest task and when i find it I will have to schedule it to the faster server. But the workload will be imbalanced, as I said 3 servers and at least one is the faster, lets say server with ID 1, so every time the tasks will scheduled to this one, I also need the other 2 to work.
Same problem with Max-Min, find the worst task, schedule it to the worst server, but only one server is the worst so the other 2 will not work. How am I suppose to do the balancing and also take in consideration that the tasks arrive in different times?
If you need anything more just let me know and thanks in advance!
You can find nice description of both algorithms in A Comparative Analysis of Min-Min and Max-Min Algorithms based on the Makespan Parameter:
I paste here pseudocode for Min-Min. ETij is execution time for task ti on resource Rj. And rj is ready time of Rj.
That's true that you can have imbalanced load, because all small task will get executed first. Max-Min algorithm overcomes this drawback.
Max-Min algorithm performs the same steps as the Min-Min algorithm but the main difference comes in the second phase, where a task ti is selected which has the maximum completion time instead of minimum completion time as in min-min and assigned to resource Rj, which gives the minimum completion time.

Why Shortest Job First(SJF) algorithm is not used instead of FCFS at final level in Multilevel Feedback Scheduling

In Multilevel Feedback Scheduling at the base level queue, the processes circulate in round robin fashion until they complete and leave the system. Processes in the base level queue can also be scheduled on a first come first served basis.
Why can't they be scheduled on Shortest Job First (SJF) algorithm instead of First Come First Serve (FCFS) algorithm which seems to improve average performance of the algorithm.
One simple reason:
The processes fall in the base level queue after they fail to finish in the time quantum alloted to them in the higher level queues. If you implement SJF algorithm in the base level queue, you may starve a process because shorter job may keep coming before a longer executing process ever gets the CPU.
The SJF algorithm gives more througput, only when processes differ a lot in their burst time. However its not always the case that it will perform better than FCFS. Take a loot at this answer.
Since in Multilevel Feedback Scheduling algorithm, all the processes that are unable to complete execution within defined time quantum of first 2 queues, are put to the last queue having FCFS, its very likely that they all have large CPU bursts and therefore wont differ much in their burst time. Hence, its preferred to have FCFS, scheduling for the last queue.

A new project management algorithm wanted for finite number of workers

I'm studying task-based parallel computing and got interested in a variation of the old project management problem -- the critical path of an activity-on-vertex (AOV) project network, which can be calculated using the topological sorting algorithm if there's no deadlock cycle. The total time of those activities on a critical path gives the minimum completion time of the project.
But this is assuming we always have enough workers simultaneously finishing the activities with no dependence on each other. If the number of workers (processors/cores) available is finite, certain activities can wait not because some activities they depend on have not yet been finished, but simply because all workers are now busy doing other activities. This is a simplified model for today's multi-core parallel computing. If there's only one worker who has to do all the activities, the project completion time is the total time of all activities. We are back to single-core serial computing that way.
Is there an efficient algorithm that gives the minimum completion time of an AOV network given a finite number of workers available? How should we wisely choose which activities to do first when the doable activities is more than the number of workers so as to minimize the idling time of workers later on? The minimum time should be somewhere in between the critical path time (infinite workers) and the total time of all activities (one worker). It should also be greater than equal to the total time divided by the number of workers (no idling). Is there an algorithm to get that minimum time?
I found a C++ conference video called "work stealing" that almost answers my question. At 18:40, the problem is said on the slide to be NP-hard if activities cannot be paused, further divided, or transferred from worker to worker. Such restrictions make decisions of which workers to finish which jobs (activities) too hard to make. Work stealing is therefore introduced to avoid making such difficult decisions beforehand. Instead, it makes such decions no longer crucial so long as certain apparent greedy rules are followed. The whole project will be always finished as soon as possible under the constraint of either the critical path or the no-idling time of the finite number of workers or both. The video then goes on talking about how to make the procedure of "work stealing" between different workers (processors) more efficient by making the implementation distributed and cache-friendly, etc.
According to the video, future C++ shared-memory parallel coding will be task-based rather than loop-based. To solve a problem, the programmer defines a bunch of tasks to finish and their dependence relations to respect, and then the coding language will automatically schedule the tasks on multiple cores at run time in a flexible way. This "event-driven"-like way of implementing a flexible code by a distributed task queuing system will become very useful in parallel computing.
When an optimization problem is NP-hard, the best way to solve it is to find ways to avoid it.

Resources