At work, we are given a set of constraints of the form (taskname, frequency), where frequency is an integer giving the number of ticks between consecutive invocations of the task "taskname". Two tasks cannot run concurrently, and each task invocation takes one tick to complete. Our goal is to find the schedule that best matches the set of constraints.
For example, if we are given the constraints {(a, 2), (b, 2)} the best schedule is "ab ab ab..."
On the other hand, if we are given the constraints {(a, 2), (b, 5), (c, 5)} the best schedule is probably "abaca abaca abaca..."
Currently we find the best schedule by running a genetic algorithm that tries to minimize the distance between the actual frequencies and the given constraints. It actually works pretty well, but I wonder if there's some algorithm that better suits this kind of problem. I've tried to search Google but I seem to lack the right words (scheduling is usually about completing tasks :(). Can you help?
First off, consider the merits of jldupont's comment! :)
Second, I think 'period' is the accurate description of the second element of the tuple, e.g. {Name, Period[icity]}.
That said, look to networking algorithms. Some variant of weighted queuing is probably applicable here.
For example, given N tasks, create N queues corresponding to tasks T0...Tn, and on each cycle ("tick"), based on each task's period, enqueue an item to the corresponding queue.
The scheduler would then aim to minimize (on average) the total number of waiters in the queues. A simple starting point would be to dequeue from the queue Qx that has the highest number of items. (A parameter on each queued item indicating its 'age' would assist in prioritization.)
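A rough sketch of that queue-based idea in Python follows; the tie-breaking rule (oldest waiter, then name) and the idle marker are my own choices for illustration, not part of the original suggestion:

```python
from collections import deque

def schedule(constraints, ticks):
    """Queue-based scheduler sketch: constraints is a dict {name: period}."""
    queues = {name: deque() for name in constraints}
    output = []
    for tick in range(ticks):
        # Enqueue a request for every task whose period divides the current tick.
        for name, period in constraints.items():
            if tick % period == 0:
                queues[name].append(tick)          # store arrival tick as the item's "age"
        # Dequeue from the queue with the most waiters, oldest arrival breaking ties.
        backlog = [(len(q), tick - q[0], name) for name, q in queues.items() if q]
        if backlog:
            _, _, chosen = max(backlog)
            queues[chosen].popleft()
            output.append(chosen)
        else:
            output.append('-')                     # idle tick
    return ''.join(output)

print(schedule({'a': 2, 'b': 5, 'c': 5}, 15))
```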
Related
I have stumbled onto a problem that looks similar to the classic interval scheduling problem.
However, in my case I do allow overlap between intervals; I just want to minimize it as much as possible.
Is there a canonical algorithm that solves this? I did not find a 'relaxed' alternative online.
I don't think this problem maps cleanly to any of the classical scheduling literature. I'd try to solve it as an integer program using (e.g.) OR-Tools.
What makes this problem hard is that the order of the jobs is unknown. Otherwise, we could probably write a dynamic program. Handling continuous time efficiently would be tricky but doable.
Similarly, the natural first attempt at a mixed integer programming formulation would be to have a variable for the starting time of each job, but the overlap function is horribly non-convex, and I don't see a good way to encode it using integer variables.
The formulation that I would try first would be to quantize time, then create a 0-1 variable x[j,s] for each valid (job, start time) pair. Then we write constraints to force each job to be scheduled exactly once:
for all j, sum over s of x[j,s] = 1.
As for the objective, we have a couple of different choices. I'll show one, but there's a lot of flexibility as long as one unit of time with i + j jobs running is worse than one unit with i jobs and a different unit with j jobs.
For each time t, we make a non-negative integer variable y[t] that will represent the number of excess jobs running at t. We write constraints:
for all t, -y[t] + sum over (j,s) overlapping t of x[j,s] ≤ 1.
Technically this constraint only forces y[t] to be greater than or equal to the number of excess jobs. But the optimal solution will take it to be equal, because of the objective:
minimize sum over t of y[t].
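If you go the OR-Tools route, a hedged sketch of this quantized-time formulation with CP-SAT might look like the following; the job durations and the number of time slots are made up purely for illustration:

```python
from ortools.sat.python import cp_model

# Illustrative data: each job has a duration, time is quantized into T slots.
durations = {"j1": 3, "j2": 2, "j3": 4}
T = 8  # number of time slots

model = cp_model.CpModel()

# x[j, s] = 1 iff job j starts at slot s (only valid start slots are created).
x = {(j, s): model.NewBoolVar(f"x_{j}_{s}")
     for j, d in durations.items() for s in range(T - d + 1)}

# Each job is scheduled exactly once.
for j, d in durations.items():
    model.Add(sum(x[j, s] for s in range(T - d + 1)) == 1)

# y[t] >= number of excess jobs running at slot t.
y = {t: model.NewIntVar(0, len(durations), f"y_{t}") for t in range(T)}
for t in range(T):
    running = sum(x[j, s] for j, d in durations.items()
                  for s in range(T - d + 1) if s <= t < s + d)
    model.Add(running - y[t] <= 1)

model.Minimize(sum(y[t] for t in range(T)))

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for (j, s), var in x.items():
        if solver.Value(var):
            print(f"{j} starts at slot {s}")
```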
I'm looking for an effective way of achieving an optimal job/worker assignment. I'd use the Hungarian algorithm, but there is a catch: a worker can be assigned to only one job at a time, and each job and each worker has a rating. A job rated 4 can be solved either by one worker rated 4 or by multiple workers whose combined ratings equal the job's rating, e.g. 2+2, 3+1, 2+1+1 or 1+1+1+1. A job rated 2 can be solved by two workers rated 1 or by one worker rated 2. I'd prefer one-to-one assignments whenever possible.
Is there any known algorithm or any simple way to achieve an optimal assignment in this case?
Your problem is clearly at least as hard as the Partition Problem, even just to decide whether a feasible solution exists. To show this, take any Partition instance. It can easily be transformed into your problem by creating two jobs, each rated at half the total sum of the elements, and one worker per element, rated with that element's value. Your problem has a solution if and only if the Partition instance has a solution, which proves that your problem is NP-hard.
I think we could also argue that the problem is at least as hard as Subset Sum, which is NP-complete.
Transform it into this decision problem:
Given one job whose rating N is a real number, and M workers with ratings Ri for i in [0, M), each a real number, is there a subset of workers whose ratings add up to exactly N?
In our case, we may be restricting the problem to positive integers, but the decision problem remains, and is in fact much harder because we have many jobs as well, and we want to maximize the number of jobs completed.
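To make the connection concrete, here is a small sketch of the standard pseudo-polynomial subset-sum check (assuming positive integer ratings, as in your case) for whether a single job can be covered exactly by some subset of workers:

```python
def can_cover(job_rating, worker_ratings):
    """Subset-sum feasibility check: can some subset of worker ratings
    sum exactly to the job's rating? (Positive integers assumed.)"""
    reachable = {0}
    for r in worker_ratings:
        reachable |= {s + r for s in reachable if s + r <= job_rating}
    return job_rating in reachable

print(can_cover(4, [2, 3, 1, 1]))  # True: 3+1 or 2+1+1
```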
I am trying to understand how the greedy algorithm for this scheduling problem works.
I've been reading and googling for a while, but I still could not understand it.
We have n jobs to schedule on a single resource. Job i has a requested start time s(i) and finish time f(i).
There are several greedy orderings in which we could accept jobs:
Accept in increasing order of s ("earliest start time")
Accept in increasing order of f - s ("shortest job time")
Accept in increasing order of number of conflicts ("fewest conflicts")
Accept in increasing order of f ("earliest finish time")
The book says that the last one, accepting in increasing order of f, always gives an optimal solution.
However, it does not explain why that ordering is always optimal, nor why the other three are not.
It provides a figure showing why the other three do not give an optimal solution, but I could not understand what it means.
Since I have low reputation I cannot post images, so I will try to draw it:
|---| |---| |---|
|-------------------------|
increasing order of s
underestimated solution
|-----------| |-----------|
|-----|
increasing order of f-s
underestimated solution
|----| |----| |----| |----|
|-----| |-----| |-----|
|-----| |-----|
|-----| |-----|
increasing order of number of conflicts.
underestimated solution
This is what it looks like, and I don't see why these are counterexamples for each scenario.
If anyone can explain why each greedy idea does or does not work, it would be very helpful.
Thank you.
I think I can explain this.
Let's say we have n jobs, with start times s[1..n] and finish times f[1..n]. If we sort them by finish time and accept greedily, we will always complete the maximum number of tasks. Let's see how.
If a job finishes earlier (even if it started later, i.e. a short job), we have more time left for later jobs. Suppose there were some other job we could have started and completed in that interval, so that our task count would increase. That is not actually possible: if some other task could have completed before this one, it would have the earlier finish time, so we would already be working on it. And if some task has started but not yet finished, then choosing it instead would leave us with no completed task at this point, whereas now we have completed at least one. So in either case, this is the optimal choice.
There may be many solutions that achieve the maximum number of tasks in an interval; EFT gives one such solution, but it always achieves that maximum.
I hope I could explain it well.
Since #vish4071 has already explained why selecting earliest finish time will lead to optimal solution, I'll only explain the counterexamples. Task [a,b] starts at a and ends at b. I'll use the counterexamples you have provided.
Earliest start time
Suppose tasks [1,10], [2,3], [4,5], [6,7]. The earliest start time strategy will choose [1,10] and then refuse the other 3, since they all collide with the first one. Yet we can see that [2,3], [4,5], [6,7] is the optimal solution, so earliest start time strategy will not always yield the optimal result.
Shortest execution time
Suppose tasks [1,10], [11,20], [9,12]. This strategy would choose [9,12] and then reject the other two, but optimal solution is [1,10], [11,20]. Therefore, shortest execution time strategy will not always lead to optimal result.
Least amount of collisions
This strategy seems promising, but your example with 11 tasks proves it is not optimal. Suppose tasks: [1,4], 3x[3,6], [5,8], [7,10], [9,12], 3x[11,14] and [13,16]. [7,10] has only 2 collisions with other tasks, fewer than any other task, so it would be selected first by the fewest-collisions strategy. Then [1,4] and [13,16] would be selected, and all the other tasks rejected because they collide with the already selected ones. That is 3 tasks, yet 4 tasks can be selected without collision: [1,4], [5,8], [9,12] and [13,16].
You can also check that the earliest finish time strategy chooses the optimal solution in all of these examples. Note that more than one optimal solution with the same number of selected tasks can exist; in such a case, earliest finish time will choose one of them.
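For reference, here is a minimal sketch of the earliest finish time greedy, run on the counterexamples above (the `>=` test assumes tasks that merely touch at an endpoint do not collide):

```python
def earliest_finish_first(tasks):
    """Greedy interval scheduling: sort by finish time, take every task
    that starts no earlier than the finish of the last selected task."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(tasks, key=lambda t: t[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

print(earliest_finish_first([(1, 10), (2, 3), (4, 5), (6, 7)]))   # [(2, 3), (4, 5), (6, 7)]
print(earliest_finish_first([(1, 10), (11, 20), (9, 12)]))        # [(1, 10), (11, 20)]
```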
I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
e.g. a possible set of events could be [(1,2), (1,3), (4,2), (4,3), (3,1)] with 4 possible timeslots. All events have to be scheduled in some timeslot; however, the waiting time per user (time between two of their events) should be minimised, and at the same time the number of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sounds like a combination of Job Shop Scheduling (video) and Meeting Scheduling (video) with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms, as well as being more scalable (see research competitions for proof).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared:
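As a rough illustration of that squared penalty (plain Python, assuming the per-user waiting times have already been computed by the scheduler):

```python
def fairness_penalty(waiting_times_per_user):
    """Squared waiting-time penalty: one user waiting 4 slots costs more
    than two users waiting 2 slots each (16 vs. 8), which pushes the
    search toward fairer schedules."""
    return sum(w * w for w in waiting_times_per_user.values())

print(fairness_penalty({"user1": 4, "user2": 0}))  # 16
print(fairness_penalty({"user1": 2, "user2": 2}))  # 8
```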
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
I'm unfortunately not an expert at reducing problems to known NP-complete problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
Instead of brute force, you could start from a random point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for a maximum-likelihood evolutionary tree, which has a similarly factorial search space.
1. Start with a random permutation and evaluate its score.
2. Make some random changes, then evaluate the new score.
3. If it's not an improvement, try another modification until you find one that is (perhaps with a mechanism to remember modifications you have already tried from the current permutation).
4. Repeat from step 2 with the new permutation, until you've converged on a local minimum.
5. Repeat from step 1 for several other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.
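Putting the numbered steps above together, a minimal sketch of the random-restart local search might look like this; the `score` function is assumed to be supplied by you, scheduling the meetings in the given order and returning the total waiting time (lower is better):

```python
import random

def local_search(meetings, score, restarts=20, steps=2000):
    """Random-restart hill climbing over meeting orderings."""
    best_order, best_score = None, float("inf")
    for _ in range(restarts):
        order = meetings[:]
        random.shuffle(order)                      # step 1: random start
        current = score(order)
        for _ in range(steps):                     # steps 2-4: swap moves
            i, j = random.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            candidate = score(order)
            if candidate < current:
                current = candidate                # keep the improvement
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
        if current < best_score:                   # step 5: best of all restarts
            best_order, best_score = order[:], current
    return best_order, best_score
```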
I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
EDIT: There are two reasons this is not the 0-1 knapsack problem:
Knapsack requires integer values for the weights (i.e. resources consumed), whereas my resource consumption (i.e. mass in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small, and the running time of the knapsack dynamic program grows with W.)
I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate its utility based on the items not in the set
Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
fully enumerating a powerset is O(2^n), but I don't need to enumerate it all, because once I've found a set of tasks that exceeds the budget I know that any set containing it is also infeasible and can be rejected. That is, if {1,2,3,4} is infeasible, so is {1,2,3,4} ∪ {n}, where n is any one of the tasks remaining in the larger list.
Since I'm just summing duty, the order of tasks doesn't matter (i.e. if {1,2,3} is feasible, so are {2,1,3}, {3,2,1}, etc.).
All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible sub-sets might speed run-time.)
I've convinced myself a good recursive algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning no polynomial-time algorithm is known for it. However, there's a pseudo-polynomial time solution using dynamic programming. See the Wikipedia section on it for more details.
However, if the maximum utility is large, you may want to use an approximation algorithm instead. One such scheme is to greedily select items that have the greatest utility/cost ratio. If the budget is large and the cost of each item is small, this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
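As a rough illustration of the greedy utility/cost heuristic above (the item data is invented for the example):

```python
def greedy_by_density(items, budget):
    """Greedy knapsack heuristic: take items in decreasing utility/cost
    order while they still fit within the budget."""
    chosen, spent = [], 0.0
    for name, utility, cost in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

items = [("a", 10.0, 2.5), ("b", 6.0, 1.0), ("c", 12.0, 4.0)]
print(greedy_by_density(items, budget=5.0))  # (['b', 'a'], 3.5)
```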
As others have mentioned, you are trying to solve an instance of the knapsack problem. While theoretically you are doomed, in practice you may still do a lot to improve the performance of your algorithm. Here are some (wildly assorted) ideas:
Be aware of backtracking (a sketch is given after this list). This corresponds to your observation that once you have crossed out {1, 2, 3, 4} as a solution, {1, 2, 3, 4} ∪ {n} is not worth looking at.
Apply Dynamic Programming techniques.
Be clear about your actual requirements:
Maybe you don't need the best set? Will a good one do? I am not aware of an algorithm that guarantees a good solution in polynomial time, but there may well be one.
Maybe you don't need the best set every time? Using randomized algorithms you can solve some NP-hard problems in polynomial time, accepting a risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: it's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinitely.)
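Since you mention a recursive formulation in the question, here is a hedged sketch of the backtracking-with-pruning idea from the first bullet; `utility` is assumed to be your set-valued evaluation function (e.g. based on the opportunity cost of the tasks not selected):

```python
def best_subset(costs, budget, utility):
    """Backtracking with pruning: stop extending a branch as soon as the
    budget would be exceeded. `costs` maps task name -> resource use."""
    names = list(costs)
    best = {"set": frozenset(), "utility": utility(frozenset())}

    def recurse(i, selected, spent):
        u = utility(selected)
        if u > best["utility"]:
            best["set"], best["utility"] = selected, u
        for j in range(i, len(names)):
            c = costs[names[j]]
            if spent + c <= budget:                 # prune infeasible extensions
                recurse(j + 1, selected | {names[j]}, spent + c)

    recurse(0, frozenset(), 0.0)
    return best["set"], best["utility"]
```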
I think the following iterative algorithm will traverse the entire feasible solution set, storing for each feasible solution the list of tasks, their total cost, and the opportunity cost of the tasks not performed.
Each pass over the activities is cheap, but the number of stored solutions can grow exponentially in the number of activities that fit within the budget, so this is not polynomial in general.
% Start with the single empty solution.
ixCurrentSolution = 1
oc(ixCurrentSolution) = opportunity cost of doing nothing
tasklist(ixCurrentSolution) = empty set
costTotal(ixCurrentSolution) = 0

for ixTask = 1:cActivities
    % Only extend solutions that existed before this task was considered,
    % so no solution ever contains the same task twice.
    cSolutionsSoFar = ixCurrentSolution
    for ixSolution = 1:cSolutionsSoFar
        costCandidate = costTotal(ixSolution) + cost(ixTask)
        if costCandidate < costMax
            % Feasible extension: record it as a new solution.
            ixCurrentSolution++
            costTotal(ixCurrentSolution) = costCandidate
            tasklist(ixCurrentSolution) = tasklist(ixSolution) U ixTask
            oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
        endif
    endfor
endfor
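For comparison, a rough Python equivalent of the pseudocode above (the `opportunity_cost` function is assumed to be yours):

```python
def enumerate_feasible(costs, cost_max, opportunity_cost):
    """Iteratively extend every previously found feasible task list with
    each new task, keeping only extensions that stay under the budget."""
    solutions = [([], 0.0, opportunity_cost([]))]   # (task list, total cost, OC)
    for task, cost in costs.items():
        for tasks, total, _ in list(solutions):     # snapshot: don't re-extend new entries
            new_total = total + cost
            if new_total < cost_max:
                new_tasks = tasks + [task]
                solutions.append((new_tasks, new_total, opportunity_cost(new_tasks)))
    return min(solutions, key=lambda s: s[2])       # lowest opportunity cost wins
```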