We are confronted with a task scheduling problem.
Specs
We have N workers available, and a list of tasks to do.
Each task Ti needs Di worker-days to finish (Demand), and can hold no more than Ci workers working on it simultaneously (Capacity).
Some tasks can only start after other task(s) are done (Dependency).
The goal is to minimize the total duration by allocating workers to the tasks.
Example
Number of workers: 10
Task list: [A, B, C]
Demand: [100, 50, 10] - unit: worker-days (task A needs 100 worker-days to finish, B needs 50, and C needs 10)
Capacity: [10, 10, 2] - unit: workers (at most 10 workers can work on task A at the same time, at most 10 on B, and at most 2 on C)
Dependency: {A: null, B: null, C: B} - A and B can start at any time, C can only start after B is done
Possible approaches to the example problem:
First assign B 10 workers, and it will take 50/10 = 5 days to finish. Then at day 5, we assign 2 workers to C and 8 workers to A, and finishing both takes max(10/2, 100/8) = max(5, 12.5) = 12.5 days. The total duration is 5 + 12.5 = 17.5 days.
First assign A 10 workers, and it takes 100/10 = 10 days to finish. Then at day 10, we assign 10 workers to B, which takes 50/10 = 5 days to finish. Then at day 15, we assign 2 workers to C, which takes 10/2 = 5 days to finish. The total duration is 10 + 5 + 5 = 20 days.
So the first allocation is better, since 17.5 < 20.
But there are many more possible allocations for the example problem, and we are not even sure what the best one is for achieving the minimal total duration.
What we want is an algorithm:
Input:
Nworker, Demand, Capacity, Dependency
Output: a worker allocation with the minimal total duration.
Possible allocation strategies we've considered when allocating to the tasks without dependencies:
First finish the tasks that others depend on as soon as possible (say, finish B as soon as possible in the example)
Allocate workers to the task with the maximum demand first (say, first allocate all workers to A in the example)
But neither of the two proves to be the optimal strategy.
Any idea or suggestion would be appreciated. Thanks!
This sounds like job shop scheduling with dependencies, which is NP-complete (or NP-hard), so scaling up while still delivering an optimal solution in reasonable time is probably impossible.
I've gotten good results on similar cases (task assignment and dependent job scheduling) by first running a construction heuristic (pretty much one of those two allocation strategies you have there) and then a local search (usually late acceptance or tabu search) to get near-optimal results.
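To illustrate what a construction heuristic could look like here, below is a minimal sketch in Julia (my own illustration, not a known-optimal method; the dependency encoding - nothing for no prerequisite, otherwise the index of the prerequisite task - is an assumption). At each event it hands free workers to ready tasks, preferring tasks that others depend on, then tasks with the least remaining demand, and advances time to the next completion:

# Construction heuristic (sketch): repeatedly give free workers to ready
# tasks, then jump to the next task completion. Assumes deps is acyclic.
function greedy_schedule(nworkers, demand, capacity, deps)
    n = length(demand)
    remaining = float.(demand)                  # worker-days left per task
    done = falses(n)
    ndependents = [count(d -> d === i, deps) for i in 1:n]
    t = 0.0
    while !all(done)
        ready = [i for i in 1:n if !done[i] &&
                 (deps[i] === nothing || done[deps[i]])]
        # Most-depended-on first; break ties by least remaining demand.
        sort!(ready, by = i -> (-ndependents[i], remaining[i]))
        alloc = zeros(Int, n)
        free = nworkers
        for i in ready
            alloc[i] = min(capacity[i], free)
            free -= alloc[i]
        end
        # Advance to the earliest completion among the running tasks.
        dt = minimum(remaining[i] / alloc[i] for i in ready if alloc[i] > 0)
        for i in ready
            remaining[i] -= alloc[i] * dt
            remaining[i] <= 1e-9 && (done[i] = true)
        end
        t += dt
    end
    return t
end

# The example instance; [nothing, nothing, 2] encodes "C depends on task 2 (B)".
greedy_schedule(10, [100, 50, 10], [10, 10, 2], [nothing, nothing, 2])  # 16.0

On the example this returns 16.0 days, beating the 17.5-day hand schedule, because the workers freed when C finishes at day 10 are reassigned to A. A local search would then perturb schedules like this one to hunt for further improvements.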
Related
When I read the Julia documentation on multi-process parallel computing, I noticed there are both a parallel map, pmap, and a parallel for-loop, @distributed for.
From the documentation: "Julia's pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny."
What makes the difference between pmap and @distributed for? Why is @distributed for slow for a large amount of work?
Thanks
The issue is that pmap does load balancing while @distributed for splits jobs into equal chunks. You can confirm this by running these two code examples:
julia> @time res = pmap(x -> (sleep(x/10); println(x)), [10; ones(Int, 19)]);
From worker 2: 1
From worker 3: 1
From worker 4: 1
From worker 2: 1
From worker 3: 1
From worker 4: 1
From worker 3: 1
From worker 2: 1
From worker 4: 1
From worker 4: 1
From worker 2: 1
From worker 3: 1
From worker 2: 1
From worker 3: 1
From worker 4: 1
From worker 4: 1
From worker 3: 1
From worker 2: 1
From worker 4: 1
From worker 5: 10
1.106504 seconds (173.34 k allocations: 8.711 MiB, 0.66% gc time)
julia> @time @sync @distributed for x in [10; ones(Int, 19)]
           sleep(x/10); println(x)
       end
From worker 4: 1
From worker 3: 1
From worker 5: 1
From worker 4: 1
From worker 5: 1
From worker 3: 1
From worker 5: 1
From worker 3: 1
From worker 4: 1
From worker 3: 1
From worker 4: 1
From worker 5: 1
From worker 4: 1
From worker 5: 1
From worker 3: 1
From worker 2: 10
From worker 2: 1
From worker 2: 1
From worker 2: 1
From worker 2: 1
1.543574 seconds (184.19 k allocations: 9.013 MiB)
Task (done) @0x0000000005c5c8b0
And you can see that the large job (value 10) makes pmap execute all the small jobs on workers other than the one that got the large job (in my example worker 5 did only job 10, while workers 2 to 4 did all the other jobs). On the other hand, @distributed for assigned the same number of jobs to each worker. Thus the worker that got job 10 (worker 2 in the second example) still had to do four short jobs (as each worker on average has to do 5 jobs - my example has 20 jobs in total and 4 workers).
Now the advantage of @distributed for is that if the job is inexpensive, then equal splitting of jobs among workers avoids having to do the dynamic scheduling, which is not free either.
In summary, as the documentation states, if the job is expensive (and especially if its run time can vary largely), it is better to use pmap as it does load balancing.
pmap has a batch_size argument which is, by default, 1. This means that each element of the collection will be sent one by one to available workers or tasks to be transformed by the function you provided. If each function call does a large amount of work and perhaps each call differs in time it takes, using pmap has the advantage of not letting workers go idle, while other workers do work, because when a worker completes one transformation, it will ask for the next element to transform. Therefore, pmap effectively balances load among workers/tasks.
@distributed for-loop, however, partitions a given range among workers once at the beginning, not knowing how much time each partition of the range will take. Consider, for example, a collection of matrices, where the first hundred elements of the collection are 2-by-2 matrices, the next hundred elements are 1000-by-1000 matrices, and we would like to take the inverse of each matrix using @distributed for-loops and 2 worker processes.
# A and B are assumed to be predefined, distributed-friendly arrays
# (e.g. SharedArrays) holding the 200 matrices and their inverses.
@sync @distributed for i = 1:200
    B[i] = inv(A[i])
end
The first worker will get all the 2-by-2 matrices and the second one will get the 1000-by-1000 matrices. The first worker will complete all its transformations very quickly and go idle, while the other will continue to work for a very long time. Although you are using 2 workers, the major part of the whole work will effectively be executed serially on the second worker, and you will get almost no benefit from using more than one worker. This problem is known as load balancing in the context of parallel computing. The problem may also arise, for example, when one processor is slow and another is fast, even if the work to be completed is homogeneous.
For very small work transformations, however, using pmap with a small batch size creates a communication overhead that might be significant, since after each batch the worker needs to get the next batch from the calling process, whereas with @distributed for-loops each worker process will know, at the beginning, which part of the range it is responsible for.
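As a mitigation (my addition, not part of the original answer), pmap lets you raise batch_size so that cheap iterations are shipped in chunks and the per-message overhead is amortized:

using Distributed
addprocs(4)

# Each message to a worker now carries 100 elements instead of 1, trading
# some load-balancing granularity for much less communication.
squares = pmap(x -> x^2, 1:10_000; batch_size = 100)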
The choice between pmap and @distributed for-loop depends on what you want to achieve. If you are going to transform a collection as in map, and each transformation requires a large amount of work and this amount varies, then you are likely to be better off choosing pmap. If each transformation is very tiny, then you are likely to be better off choosing a @distributed for-loop.
Note that, if you need a reduction operation after the transformation, the @distributed for-loop already provides one: most of the reduction will be applied locally, while the final reduction takes place on the calling process. With pmap, however, you will need to handle the reduction yourself.
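For instance, a sketch along the lines of the coin-flip example in the Julia manual: the (+) reducer combines each worker's partial sums locally, and only the per-worker totals travel back to be added on the calling process.

using Distributed
addprocs(4)

# Count heads in 200 million coin flips; each worker reduces its own
# chunk with (+) before the final sum on the calling process.
nheads = @distributed (+) for i in 1:200_000_000
    Int(rand(Bool))
end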
You can also implement your own pmap function with very complex load balancing and reduction schemes if you really need one.
https://docs.julialang.org/en/v1/manual/parallel-computing/
I have some workers and tasks, and I want to assign the best worker to each task; that's a typical use for the Hungarian method.
But let's add that tasks happen at a certain day/time, and I want to take into consideration that, for a given worker, I'd like his tasks to be as close together in time as possible.
Is there an algorithm where I could set a priority between the two different goals?
edit: let's try a simple example. I have 3 tasks happening at 1:00, 2:00 and 3:00 respectively, and 2 workers for my tasks. If I consider only the quality of the work, I assign worker 1 to slots 1 and 3, and worker 2 to slot 2. But then the 1st worker has a hole in his schedule, so I'd like to put in a "weight" value so that I get an equilibrium between the total quality and the desire for compact work hours.
Thanks,
Guillaume
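One way to make that trade-off concrete (my sketch, not from the question; the quality matrix and λ values are made-up numbers): score a whole assignment as total quality minus λ times the idle gaps in each worker's schedule, and pick the best. Note that the gap term couples a worker's tasks together, so it does not fit the plain Hungarian cost-matrix form; that is why this sketch scores complete assignments by brute force, which is only viable for tiny instances.

# quality[w, t]: benefit of assigning task t (at hour t) to worker w (made up).
quality = [9 5 8;
           6 7 4]

# Idle-gap penalty: holes between consecutive tasks of the same worker,
# using the task index as its hour (tasks at 1:00, 2:00 and 3:00).
function gaps(assign, nworkers)
    total = 0
    for w in 1:nworkers
        hours = findall(==(w), assign)
        for k in 2:length(hours)
            total += hours[k] - hours[k-1] - 1
        end
    end
    return total
end

# Brute force over all worker-per-task assignments; λ trades total
# quality against compact schedules.
function best_assignment(quality, λ)
    nworkers, ntasks = size(quality)
    best, bestscore = Int[], -Inf
    for tup in Iterators.product(ntuple(_ -> 1:nworkers, ntasks)...)
        a = collect(tup)
        s = sum(quality[a[t], t] for t in 1:ntasks) - λ * gaps(a, nworkers)
        if s > bestscore
            best, bestscore = a, s
        end
    end
    return best, bestscore
end

best_assignment(quality, 0.0)   # ([1, 2, 1], 24.0): pure quality, worker 1 has a gap
best_assignment(quality, 5.0)   # ([1, 1, 1], 22.0): compactness wins, no gaps

For realistic sizes you would swap the brute force for a local search or a mixed-integer model while keeping the same weighted objective.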
I got a task at work to evenly schedule timed commercial items into pre-defined commercial breaks (containers).
Each campaign has a set of commercials, with or without a spreading order. I need to allow users to choose multiple campaigns and distribute all the commercials to best fit the breaks within a time window.
Example of Campaign A:
Item | Duration | # of times to schedule | Order
A1   | 15 sec   | 2                      | 1
A2   | 25 sec   | 2                      | 2
A3   | 30 sec   | 2                      | 3
Required outcome:
Each item should appear only once in a break, no repeating.
If there is a specific order, try to best fit while keeping the order. If there is no order, shuffle.
At the end of the process the breaks should contain an even amount of commercial time.
An ideal spread would fully fit all desired campaigns into the breaks.
For example, with the format Campaign {Item, Duration, # of times, Order}:
Campaign A has the set {A1,15,2,1}, {A2,25,2,2}, {A3,10,1,3}
Campaign B has the set {B1,20,2,2}, {B2,35,3,1}
Campaign C has the set {C1,10,1,1}, {C2,15,2,3}, {C3,15,1,2}, {C4,40,1,4}
A client will choose to schedule those campaigns on a specific date that holds 5 breaks of 60 seconds each.
A good outcome would result in:
Container 1: {A1,15}{B2,35}{C1,10} total of 60 sec
Container 2: {C3,15}{A2,25}{B1,20} total of 60 sec
Container 3: {A3,10}{C2,15}{B2,35} total of 60 sec
Container 4: {C4,40}{B1,20} total of 60 sec
Container 5: {C2,15}{A3,10}{B2,35} total of 60 sec
Of course it's rare that everything fits so perfectly in real-life examples.
There are so many combinations with a large number of items, and I'm not sure how to go about it. The order of items inside a break needs to be dynamically calculated so that the end result best fits all the items into the breaks.
If the architecture is poor and someone has a better idea (like giving priority to items over order and scheduling based on priority, or such), I'll be glad to hear it.
It seems like simulated annealing might be a good way to approach this problem. Just incorporate your constraints (keeping order, even spreading, and fitting into the 60-second frame) into the scoring function. Your random neighbor function might just swap two items with each other or move an item to a different frame.
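A sketch of that idea in Julia (my own toy encoding of the question's data, not the poster's system; the ordering constraint is left out of the score for brevity and could be added as another penalty term):

# Simulated annealing sketch: assign each required showing of an item to a
# break, scoring deviation from the 60-second target plus duplicate items.
struct Item
    id::String
    dur::Int
end

# One entry per required showing, built from the question's campaigns.
items = [Item("A1",15), Item("A1",15), Item("A2",25), Item("A2",25),
         Item("A3",10), Item("B1",20), Item("B1",20), Item("B2",35),
         Item("B2",35), Item("B2",35), Item("C1",10), Item("C2",15),
         Item("C2",15), Item("C3",15), Item("C4",40)]
nbreaks, target = 5, 60

# Lower is better: squared deviation from the target length per break,
# plus a large penalty for repeating an item within one break.
function cost(assign)
    total = 0.0
    for b in 1:nbreaks
        ids = [items[i].id for i in eachindex(items) if assign[i] == b]
        dur = sum((items[i].dur for i in eachindex(items) if assign[i] == b); init = 0)
        total += (dur - target)^2 + 100 * (length(ids) - length(unique(ids)))
    end
    return total
end

function anneal(; iters = 100_000, T0 = 50.0)
    assign = rand(1:nbreaks, length(items))
    c = cost(assign)
    best, bestcost = copy(assign), c
    for k in 1:iters
        T = T0 * (1 - k / iters) + 1e-6           # linear cooling schedule
        i = rand(eachindex(items))                # neighbor: move one showing
        old = assign[i]
        assign[i] = rand(1:nbreaks)
        c2 = cost(assign)
        if c2 <= c || rand() < exp((c - c2) / T)  # Metropolis acceptance
            c = c2
            if c < bestcost
                best, bestcost = copy(assign), c
            end
        else
            assign[i] = old                       # reject: undo the move
        end
    end
    return best, bestcost
end

best, bestcost = anneal()

Recomputing the full cost on every move is fine at this toy size; a real implementation would update the affected breaks' scores incrementally.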
I'm currently learning about priority queues, heaps and all that stuff in my Data Structures class, and in the class PowerPoints there is a little section that introduces machine scheduling, which I'm having difficulty understanding.
It begins by giving an example:
m identical machines
n jobs/tasks to be performed
assign jobs to machines so that the time at which the last job completes is minimum --> The wording of this last part sort of throws me off... what exactly does the italicized portion mean? Can somebody word it differently?
Continuing with the example it says:
3 machines and 7 jobs
job times are [6, 2, 3, 5, 10, 7, 14]
possible schedule, followed by this picture:
(The example schedule is constructed by scheduling the jobs in the order they appear in the given job list, left to right; each job is scheduled on the machine on which it will complete earliest.)
Finish time = 21
Objective: find schedules with minimum finish time
And I don't really understand what is going on. I don't understand what is being accomplished, or how they came up with that little picture with the jobs and the different times...Can somebody help me out?
"The time at which the last job completes is minimum" = "the time at which the all jobs are finished", if that helps.
In your example, that happens at time = 21. Clearly there's no jobs still running after that time, and all jobs have been scheduled (i.e. you can't schedule no jobs and say the minimum time is time = 0).
To explain the example:
The given jobs are the duration of the jobs. The job with duration 6 is scheduled first - since scheduling it on machines A, B or C will all end up with it finishing at time 6, which one doesn't really matter, so we just schedule it on machine A. Then the job with duration 2 is scheduled. Similarly it can go on B or C (if it were to go on A, it would finish at time 8, so that's not in line with our algorithm), and we schedule it on B. Then the job with duration 3 is scheduled. The respective end times for machines A, B and C would be 9, 5 and 3, so we schedule it on machine C. And so on.
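Here is that greedy rule as a small Julia sketch (my own code, not from the slides). A min-heap over machine finish times would match the priority-queue theme of your class; with only three machines, a plain argmin does the job:

# Greedy list scheduling: place each job (in the given order) on the
# machine that would complete it earliest, i.e. the least-loaded one.
function list_schedule(jobtimes, nmachines)
    load = zeros(Int, nmachines)        # current finish time per machine
    assignment = similar(jobtimes)
    for (j, t) in pairs(jobtimes)
        m = argmin(load)                # machine that finishes earliest
        assignment[j] = m
        load[m] += t
    end
    return assignment, maximum(load)    # per-job machine and finish time
end

assignment, finish = list_schedule([6, 2, 3, 5, 10, 7, 14], 3)
# finish == 21, matching the slide's picture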
The given algorithm is not the best we can do, though (unless there is something enforcing the order, although that wouldn't make much sense). One better assignment:
Machine A: | 14 | 2 |      (jobs finish at times 14 and 16)
Machine B: | 10 | 6 |      (jobs finish at times 10 and 16)
Machine C: | 7 | 3 | 5 |   (jobs finish at times 7, 10 and 15)
Here all jobs are finished at time = 16.
I've listed the actual job chosen for each slot in the slot itself to hopefully clear up any remaining confusion (for example, on machine A, you can see that the jobs with durations 14 and 2 were scheduled, ending at time 16).
I'm sure the given algorithm was just an introduction to the problem, and you'll soon get to algorithms that always produce the best result.
What's being accomplished by trying to get all jobs to finish as soon as possible: think of a computer with multiple cores, for example. There are many reasons you'd want tasks to finish as soon as possible. Perhaps you're playing a game and you have a bunch of tasks that work out what's happening (maybe there's a task assigned to each unit, or a few units, to determine what it does). You can only display the result after all tasks are finished, so if you don't try to finish as soon as possible, you'll unnecessarily make the game slow.
Waiting time is defined as how long each process has to wait before it gets its time slice.
In scheduling algorithms such as Shortest Job First and First Come First Serve, we can find the waiting time easily once we queue up the jobs and see how long each one had to wait before it got serviced.
When it comes to Round Robin or any other preemptive algorithm, long-running jobs spend a little time in the CPU, are preempted, wait for some time for their next turn, and at some point run to completion. I want to find out the best way to understand the 'waiting time' of jobs under such a scheduling algorithm.
I found a formula which gives waiting time as:
Waiting Time = (Final Start Time - Previous Time in CPU - Arrival Time)
But I fail to understand the reasoning behind this formula. For example, consider a job A which has a burst time of 30 units, with round-robin preemption every 5 units. There are two more jobs, B (burst 10) and C (burst 15).
The order in which these will be serviced would be:
0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55
Waiting time for A = 40 - 5 - 0.
I chose 40 because after 40, A never waits; it just gets its time slices and runs on.
I chose 5 because A previously spent time in the CPU between 30 and 35.
0 is the arrival time.
Well, I have a doubt about this formula: why is the slice 15 A 20 not accounted for?
Intuitively, I am unable to see how this gives us the waiting time for A when we only account for the penultimate execution and then subtract the arrival time.
According to me, the waiting time for A should be:
final start time - (sum of all the time it spent processing).
If this formula is wrong, why is it?
Please help clarify my understanding of this concept.
You've misunderstood what the formula means by "previous time in CPU". This actually means the same thing as what you call the "sum of all the time it spent processing". (I guess "previous time in CPU" is supposed to be short for "total time previously spent running on the CPU", where "previously" means "before the final start".)
You still need to subtract the arrival time because the process obviously wasn't waiting before it arrived. (Just in case this is unclear: The "arrival time" is the time when the job was submitted to the scheduler.) In your example, the arrival time for all processes is 0, so this doesn't make a difference there, but in the general case, the arrival time needs to be taken into account.
Edit: If you look at the example on the webpage you linked to, process P1 takes two time slices of four time units each before its final start, and its "previous time in CPU" is calculated as 8, consistent with the interpretation above.
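To check the formula on the example above (my own verification, not part of the original answer): A's final start is 40, its previous time in CPU is 15 (the slices 0-5, 15-20 and 30-35), and it arrived at 0, so its waiting time is 40 - 15 - 0 = 25. Summing A's idle stretches directly (5-15, 20-30 and 35-40) also gives 25. A small simulation sketch, assuming all jobs arrive at time 0:

# Round-robin simulation: returns each job's waiting time as
# completion time - arrival time - burst time (arrivals are all 0 here).
function rr_waiting(bursts, quantum)
    n = length(bursts)
    remaining = copy(bursts)
    queue = collect(1:n)                # ready queue of job indices
    t = 0
    completion = zeros(Int, n)
    while !isempty(queue)
        j = popfirst!(queue)
        run = min(quantum, remaining[j])
        t += run
        remaining[j] -= run
        if remaining[j] == 0
            completion[j] = t
        else
            push!(queue, j)             # back of the queue for the next turn
        end
    end
    return completion .- bursts         # waiting = turnaround - burst
end

rr_waiting([30, 10, 15], 5)             # A, B, C -> [25, 15, 25]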
Waiting time = last start value - (time quantum × (n - 1)).
Here n denotes the number of times the process enters the CPU in the Gantt chart (counting its final contiguous run once). For A above: 40 - 5 × 3 = 25.