I am trying to understand how Greedy Algorithm scheduling problem works.
So I've been reading and googling for a while since I could not understand Greedy algorithm scheduling problem.
We have n jobs to schedule on a single resource. The job (i) has a requested start time s(i) and finish time f(i).
There are some greedy ideas which we select...
Accept in increasing order of s ("earliest start time")
Accept in increasing order of f - s ("shortest job time")
Accept in increasing order of number of conflicts ("fewest conflicts")
Accept in increasing order of f ("earliest finish time")
And the book says the last one, accept in increasing order of f will always gives an optimal solution.
However it did not mention why it always gives optimal solution and why other 3 will not give optimal solution.
They provided the figure that says why other three will not provide optimal solution but I could not understand what it means.
Since I have low reputation, I can not post any image so I will try to draw it.
|---| |---| |---|
|-------------------------|
increasing order of s
underestimated solution
|-----------| |-----------|
|-----|
increasing order of f-s
underestimated solution
|----| |----| |----| |----|
|-----| |-----| |-----|
|-----| |-----|
|-----| |-----|
increasing order of number of conflicts.
underestimated solution
This is what it looks like and I don't see why this is a counterexample of each scenario.
If anyone can explain why each greedy idea does/ does not work, it will be very helpful.
Thank you.
I think I can explain this.
Lets say, we have n jobs, start times as s[1..n] and finish times as f[1..n]. So if we sort it according to finish times, then, we will always be able to complete most number of tasks. Lets see, how.
If a job is finishing earlier (even if it started later in the series, a short job), then, we always have more time for later jobs. Lets assume, we have other jobs that we could start/complete in this interval so that our number of tasks could increase. Now, this is not actually possible as if any task completed before this, then that would be the one with earliest finish time so we would be working on that one. And, if any task has not been completed till now (but has started), then if we selected that, we would not have completed any task but now we actually have done one at least. So, in any case, this is the most optimal choice.
There are many possible solutions with maximum number of tasks that can be done in an interval, EFT gives one such solution. But it is always the max number possible.
I hope I could explain it well.
Since #vish4071 has already explained why selecting earliest finish time will lead to optimal solution, I'll only explain the counterexamples. Task [a,b] starts at a and ends at b. I'll use the counterexamples you have provided.
Earliest start time
Suppose tasks [1,10], [2,3], [4,5], [6,7]. The earliest start time strategy will choose [1,10] and then refuse the other 3, since they all collide with the first one. Yet we can see that [2,3], [4,5], [6,7] is the optimal solution, so earliest start time strategy will not always yield the optimal result.
Shortest execution time
Suppose tasks [1,10], [11,20], [9,12]. This strategy would choose [9,12] and then reject the other two, but optimal solution is [1,10], [11,20]. Therefore, shortest execution time strategy will not always lead to optimal result.
Least amount of collisions
This strategy seems promising, but your example with 11 task proves it not to be optimal. Suppose tasks: [1,4], 3x[3,6], [5,8], [7,10], [9,12], 3x[11,14] and [13, 16]. [7,10] has only 2 collisions with other tasks, which is less than any other task, so it would be selected first by the least amount of collisions strategy. Then [1,4] and [13, 16] would be selected, and all the other tasks rejected because they collide with already selected tasks. That is 3 tasks, however 4 tasks can be selected without collision: [1,4], [5,8], [9,12] and [13, 16].
You can also see that the earliest finish time strategy will always choose the optimal solution in these examples. Note that more than one optimal solution can exist with same number of selected tasks. In such case, earliest finish time strategy will always choose one of them.
Related
I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
eg. a possible set of events can be: [(1,2),(1,3),(4,2),(4,3),(3,1)] with 4 possible timeslots. All events have to be scheduled in a certain timeslot, however, waiting time per user should be minimised (time between two events) and at the same time, the amount of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sound like a combination of Job Shop Scheduling (video) and Meeting Scheduling (video) with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms, as well be more scalable (see research competitions for proof).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared:
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
I'm unfortunately not an expert on trying to reduce problems to known NP problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
Instead of brute force, you could use a random start point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for maximum-likelihood evolutionary tree, which has a similarly factorial search space.
Start with a random permutation and evaluate its score
make some random changes, then evaluate the score
if it's not an improvement, try another permutation until you find one that is. (maybe with a mechanism to remember that you already tried this modification to the starting tree).
Repeat from 2 with this new tree, until you've converged on a local minimum.
Repeat from 1 for some other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.
Assuming all processes arrive at the same time, shortest job first seems to be optimal in terms of lowering the average turn around time. I also managed to prove that.
However, when procesess arrive at different times I felt like the optimal algorithm would be the Shortest Remaing Time (preemptive shortest job first). But I can't find a way to prove it. Can someone help me/point me to a solution? Or am I flat out wrong?
http://en.wikipedia.org/wiki/Shortest_remaining_time
You can run one process at a time. No context switch time.
EDIT:
Say we have n proccesses.
Each process has an execution time of P(i). 1<= i <= n
Each process becomes available for execution at a specific time R(i)
Each process ends running at some time C(i) (turn-around time) based on when it started running, if it was suspended e.t.c
all times are integers. no specific example. I just have to find an algorithm that optimizes the average turn around time ((C(1)+C(2)+...+C(n))/n) for any given input. (as low aas possible)
I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to
perform:
For each delivery, the pickup needs to come before the
dropoff.
The vehicle is quite small and the packages vary in size.
The total carriage cannot exceed some upper bound (e.g. 1 cubic
metre). Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution:
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to be actually the Knapsack Problem, rather then the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.
Here is the assignment problem http://en.wikipedia.org/wiki/Generalized_assignment_problem
I have a similar task, but can't find the algorithm.
We have m tasks, n laborers, m>n. When task is done, the laborer takes the next one (if there is free one). If task is taken by some laborer, no one else can take it. Each laborer has his own speed: V1..Vn, each task has its own 'volume' - W1..Wm. So, i need to distribute tasks between laborers with the goal of minimization of time doing all tasks.
Please help me to find an algorithm or how this problem is named.)
This problem is scheduling jobs on parallel, uniformly related machines so as to minimize the makespan. There's a polynomial-time approximation scheme due to Hochbaum and Shmoys (Using dual approximation algorithms for scheduling problems: Theoretical and practical results, 1988). btilly is right that the bin-packing problem is closely related; the analyses of both Hochbaum--Shmoys and the previous best approximation MULTIFIT are based on techniques pioneered for bin packing.
This looks like a likely np-complete variation of the http://en.wikipedia.org/wiki/Bin_packing_problem. I would therefore not worry about an exact algorithm.
Assuming that the tasks are independent, my first try would be a greedy heuristic. Given an estimate of finishing time, assign to each worker at all points the longest task that they can finish before that finishing time. Now do a binary search to find the shortest finishing time that you can get away with. Your initial upper time is the time for the fastest worker to do everything. Your initial lower time is the time for all of the workers to complete that much work if all are working at the same time.
This is clearly not always going to be perfectly optimal. But it should work reasonably well.
At work, we are given a set of constraints of the form (taskname, frequency) where frequency is an integer number which means the number of ticks between each invocation of the task "taskname". Two tasks cannot run concurrently, and each task invocation takes one tick to complete. Our goal is to find the best schedule in terms of matching the set of constraints.
For example, if we are given the constraints {(a, 2), (b,2)} the best schedule is "ab ab ab..."
On the other hand, if we are given the constraints ({a,2}, {b, 5}, {c, 5}) the best schedule is probably "abaca abaca abaca..."
Currently we find the best schedule by running a genetic algorithm which tries to minimize the distance between actual frequencies and the given constrains. It actually works pretty well, but I wonder if there's some algorithm which better suits this kind of problem. I've tried to search Google but I seem to lack the right words (scheduling is usually about completing tasks :(). Can you help?
First off, consider the merits of jldupont's comment! :)
Second, I think 'period' is the accurate description of the second element of the tuple, e.g. {Name, Period[icity]}.
That said, look to networking algorithms. Some variant of weighted queuing is probably applicable here.
For example, given N tasks, create N queues corresponding to tasks T0...Tn, and in each cycle ("tick") based on the period of the task, queue an item to the corresponding queue.
The scheduler algorithm would then aim for minimizing (on average) the total number of waiters in the queues. Simple starting off point would be to simply dequeue from the quene Qx which has the highest number of items. (A parameter on queued item to indicate 'age' would assist in prioritization.)