A friend gave me a puzzle that he says can be solved in better than O(n^3) time.
Given a set of n jobs that each have a fixed start time and end time (overlaps are very possible), find the smallest subset of jobs such that every job is either in the subset or overlaps some job in the subset.
I'm pretty sure the optimal solution is to pick the job with the most unmarked overlapping jobs, add it to the solution set, then mark it and everything it overlaps, and repeat until all jobs are marked.
Figuring out which job has the most unmarked overlappers takes O(n^2) with a simple adjacency matrix, and this has to be redone every time a job is selected in order to update the marks, making the whole thing O(n^3).
Is there a better solution?
Let A be the set of jobs which we haven't overlapped yet.
1. Find the job x in A which has the minimal end time (t).
2. From all jobs whose start time is less than t, pick the job j with the maximum end time.
3. Add j to the output set.
4. Remove all jobs which overlap j from A.
Repeat steps 1-4 until A is empty.
A simple implementation will run in O(n^2). Using interval trees it's probably possible to solve it in O(n log n).
The basic idea behind why it's an optimal solution (not a formal proof): We have to pick one job whose start time is less than t, so that x will be overlapped. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any job in S, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.
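For what it's worth, here is a minimal O(n^2) Python sketch of the above (my own rendering, not the answerer's; it assumes half-open (start, end) intervals, so two jobs overlap exactly when each starts before the other ends):

    def greedy_cover(jobs):
        # jobs: list of (start, end) tuples; returns indices of a covering subset
        n = len(jobs)
        by_end = sorted(range(n), key=lambda i: jobs[i][1])
        covered = [False] * n
        chosen = []
        for x in by_end:  # first uncovered x has the minimal end time t
            if covered[x]:
                continue
            t = jobs[x][1]
            # among all jobs starting before t, pick the one ending latest
            j = max((i for i in range(n) if jobs[i][0] < t),
                    key=lambda i: jobs[i][1])
            chosen.append(j)
            # mark every job that overlaps j (x is among them, by choice of j)
            for i in range(n):
                if jobs[i][0] < jobs[j][1] and jobs[i][1] > jobs[j][0]:
                    covered[i] = True
        return chosen

Sorting by end time once replaces the repeated search for the minimal-end job in A; the two inner scans are what keep this version at O(n^2).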
We can achieve an O(n log n) solution with a dynamic programming approach. In particular, we consider S(k), the size of the smallest set that includes the kth job and covers the first k jobs (ordered by start time). We first add an auxiliary job (∞, ∞), so the result will be the DP value for this final job, minus one.
To compute S(k), consider the job p(k) which ends before job k starts but has maximal start time. Note that p is a non-decreasing function. S(k) is then one more than the minimum S(i) over jobs i with end(i) > start(p(k)).
We can find this job efficiently by maintaining a min-heap of candidate jobs, ordered by S value. After computing each S(k), we push the job onto the heap. When we need a job, we pop jobs off the top of the heap that end too early until we find a suitable one; discarding them is safe because start(p(k)) never decreases. This takes O(n log n) in total, since we do at most O(n) of each heap operation (push/peek/pop).
The remainder of the task is to compute the p(k) values efficiently. One way to do this is to iterate over all job starts and ends in increasing time order, keeping track of the latest-starting job that has already ended.
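Concretely, that sweep might look like this (my own sketch; I am assuming "ends before job k" means ends no later than k's start, and that jobs is already sorted by start time):

    def compute_p(jobs):
        # p[k] = the job with maximal start time among jobs that have
        # already ended when job k starts (None if there is no such job)
        events = []
        for i, (s, e) in enumerate(jobs):
            events.append((e, 0, i))  # end event; 0 sorts before 1 on ties
            events.append((s, 1, i))  # start event
        events.sort()
        p = [None] * len(jobs)
        latest_ended = None  # latest-starting job among those already ended
        for _, kind, i in events:
            if kind == 0:
                if latest_ended is None or jobs[i][0] > jobs[latest_ended][0]:
                    latest_ended = i
            else:
                p[i] = latest_ended
        return p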
Let's say I have a set of tasks that need to be done. I have two identical workers that will process them, and (for simplicity's sake) let's assume that I have perfect information on the complexity of the tasks: there are no tasks I don't know about, and I know exactly how long each one will take to complete. (But different tasks will take different amounts of time.) Each worker can only work on one task at a time, and once begun, must continue to work on it until the task is complete. There are no dependencies between tasks, such that one must be finished before another can be worked on.
Given these constraints, is there any known-best algorithm to divide the work between the two workers so that the total time to complete all tasks is minimized? The obvious, naive solution is that each time a worker is free, always assign it the longest (or shortest) remaining task, but is there any method that is more efficient?
This is the partition problem, which is NP-complete, but if the task times are relatively small integers there is a pseudo-polynomial-time dynamic programming solution.
In your case, you are basically given a set of numbers, and you want to split them into two subsets such that the subset sums are equal (or as close to equal as possible, which is the optimization variant of the partition problem).
The recursive formula for the DP solution should be something similar to this:
    DP[0, 0] = true
    DP[0, x] = false                     for x != 0
    DP[i, x] = DP[i-1, x - value[i]]  OR  DP[i-1, x]
               (assigned i to S1)         (assigned i to S2)
Calculate all values needed up to DP[n, SUM] (where SUM is the total sum of the task times and n is the number of tasks); then DP[n, SUM/2] (rounded down) tells you whether a perfect split is possible.
Getting the actual tasks in each subset is done by retracing your steps, similar to what is explained here.
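For concreteness, here is the same table in Python, collapsed to a single boolean row over the achievable sums (a standard space optimization of the 2-D DP above):

    def best_split(times):
        # times: positive integer task lengths
        total = sum(times)
        reachable = [False] * (total + 1)  # reachable[x]: some subset sums to x
        reachable[0] = True
        for t in times:
            for x in range(total, t - 1, -1):  # downwards: use each task once
                if reachable[x - t]:
                    reachable[x] = True
        # the achievable sum closest to SUM/2 gives the best assignment
        s1 = max(x for x in range(total // 2 + 1) if reachable[x])
        return s1, total - s1  # the two workers' loads; makespan is total - s1

    print(best_split([3, 1, 4, 2, 2]))  # -> (6, 6)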
I was wondering if someone could please help me reason about a DP algorithm for unweighted interval scheduling.
I'm given 2 arrays [t1,...,tn] and [d1,...,dn] where ti is the start time of job i and di is the duration of job i. Also the jobs are sorted by start time, so t1 <= t2 <= ... <= tn. I need to maximize the number of jobs that can be executed without any overlaps. I'm trying to come up with a DP algorithm and runtime for this problem. Any help would be much appreciated!
Thank you!
I am sorry I don't have any more time now to spend on this problem. Here is an idea; I think it lends itself nicely to dynamic programming. [Actually I think it is DP, but almost two decades have passed since I last studied such things...]
Suppose T = {t1, t2, ..., tn} is partitioned as follows:
T = {t1, t2, ..., tn} = {t1, t2, ..., tk} U {tk+1, tk+2, ..., tn}
                      = T1(k) U T2(k)
Let T2'(k) be the subset of T2(k) not containing the jobs overlapping T1(k).
Let opt(X) be the optimal value for a subset X of T. Then
opt(T) = max( opt( T1(k) ) + opt( T2'(k) ) )
where the maximum is taken over all possible k in {1, 2, ..., n}
Of course you need to compute opt() recursively, and take into account overlaps.
Hope this helps!
It's easiest for me to think about if I suppose that you work out what the end time would be for each job and sort the jobs into order of increasing end time, although you can probably achieve the same thing using start times working in the opposite direction.
Consider each job in order of increasing end time. For each job work out the maximum number of jobs you can handle up to and including that job if you decide to work on that job. To work this out, look at the answers you have already computed that cover times up to the start time of that job and find the one that covers the maximum number of jobs. The best you can do while handling the job you are considering is one plus that maximum number.
When you have considered all the jobs, the maximum number you can cover is the maximum number you have computed when considering any job. You can work out which jobs to do by storing the previous job that you identified when working out the maximum score possible for a particular job, and then tracing these pointers back from the job with the maximum score.
With N jobs to consider, you look back at at most N previously computed answers when working out the best possible score for each job, so this is O(N^2).
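Here is a short Python sketch of that (my own rendering; it takes (start, duration) pairs as in the question and returns only the count, leaving out the back-pointers for recovering the jobs):

    def max_jobs(starts, durations):
        # sort (end, start) pairs by increasing end time
        iv = sorted((t + d, t) for t, d in zip(starts, durations))
        best = [1] * len(iv)  # best[i]: most jobs doable, ending with job i
        for i, (end_i, start_i) in enumerate(iv):
            for j in range(i):
                if iv[j][0] <= start_i:  # job j ends by the time job i starts
                    best[i] = max(best[i], best[j] + 1)
        return max(best, default=0)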
There are n jobs in a set, each with a start time si and a finish time fi, for i = 1, ..., n.
I'm trying to figure out whether ordering the jobs by ascending start time, by ascending finish time, or by ascending interval length (fi - si) is optimal.
I said that ordering by ascending start time is not optimal, for the case where the first job starts earliest but spans a period in which 3 other jobs could start and finish.
Next I said that ordering by ascending finish time is optimal, because each time a job finishes, the next earliest-finishing compatible job is added, maximizing the number of jobs added to the non-overlapping list.
However, I'm not sure whether ordering by fi - si is optimal.
My logic is that it would be optimal, because it lists the shortest jobs first, so the jobs that span the lengths of other jobs would be considered last.
EDIT: optimize by maximizing the size of the non-overlapping job list.
I think there is a surprisingly simple strategy for choosing the next job which gives you a subset with the maximal number of compatible jobs: among the jobs left which have a valid start time, always choose the job with the earliest finish time. (In the beginning all start times are valid; after a job has been chosen, the start time of the next job must, of course, not precede the finish time of the previously chosen job.)
A proof that this strategy is optimal can start like this: assume you have an optimal (i.e. maximal) subset of compatible jobs whose first job is not the job with the overall earliest finish time. The job with the overall earliest finish time then cannot appear anywhere in that subset, but you can replace the subset's first job with it and obtain another optimal subset, one whose first job is the earliest-finishing job. Continuing in the same way with the second job, and so on, shows that in the subset generated by the above strategy the n-th job has a finish time that does not exceed the finish time of the n-th job of any optimal subset, for every n; hence the subset so created is also optimal.
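In code the strategy is only a few lines (a sketch assuming jobs are (start, finish) pairs and that a job may start exactly when the previous one finishes):

    def earliest_finish_first(jobs):
        chosen = []
        last_finish = float("-inf")
        for start, finish in sorted(jobs, key=lambda j: j[1]):  # by finish time
            if start >= last_finish:  # valid start time
                chosen.append((start, finish))
                last_finish = finish
        return chosen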
I know an algorithm for this exists, but I am having problems naming it and finding a suitable solution.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each resource R may have any number of instances, from 1 to 100.
A Job may need to use any number of resources R.
A job may need to use multiple instances of a resource R, but never more instances than the resource has (if a resource only has 2 instances, a job will never need more than 2).
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed-graph (dependency) problem; the jobs in J may execute in any order as long as they can claim their resources.
My Goal:
Schedule the jobs so as to minimize total run time and/or maximize resource utilization.
I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested)
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = 1 if job j starts at time t, else 0 (binary)
r[i,t] = amount of resource of type i used at time t
slot[t] = 1 if time slot t is used, else 0 (binary)
The constraints are,
// every job must start exactly once
(1). for every j: sum[t]( x[j,t] ) = 1
// a resource can only be used up to its capacity
(2). for every i, t: r[i,t] <= Capacity[i]
// if a job is running, it uses resources
(3). for every i, t: r[i,t] = sum[j, s | s <= t && s + Time[j] > t]( x[j,s] * Use[j,i] )
// if a job is running, then the time slot is used
(4). slot[t] >= x[j,s] for every j, t, s with s <= t && s + Time[j] > t
The third constraint means that if a job was started recently enough that it's still running, then its resource usage is added to the currently used resources. The fourth constraint means that if a job was started recently enough that it's still running, then this time slot is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
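To make this concrete, here is one possible rendering in Python with the PuLP library (my choice of tool, and made-up example data; r[i,t] is folded directly into the capacity constraint, and the slot weights grow only linearly, per the practical note above):

    import pulp  # pip install pulp

    Time = {"j1": 2, "j2": 1, "j3": 2}  # made-up job lengths
    Use = {("j1", "cpu"): 2, ("j2", "cpu"): 1, ("j3", "cpu"): 2}
    Capacity = {"cpu": 3}
    T = sum(Time.values())  # horizon: trivially enough slots

    prob = pulp.LpProblem("schedule", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", [(j, t) for j in Time for t in range(T)], cat="Binary")
    slot = pulp.LpVariable.dicts("slot", range(T), cat="Binary")

    for j in Time:  # (1) every job starts exactly once, early enough to finish
        prob += pulp.lpSum(x[j, t] for t in range(T - Time[j] + 1)) == 1
        for t in range(T - Time[j] + 1, T):
            prob += x[j, t] == 0

    for i in Capacity:  # (2)+(3) running jobs stay within each capacity
        for t in range(T):
            prob += pulp.lpSum(Use.get((j, i), 0) * x[j, s]
                               for j in Time
                               for s in range(max(0, t - Time[j] + 1), t + 1)) <= Capacity[i]

    for t in range(T):  # (4) a slot is used if any job runs during it
        for j in Time:
            for s in range(max(0, t - Time[j] + 1), t + 1):
                prob += slot[t] >= x[j, s]

    prob += pulp.lpSum((t + 1) * slot[t] for t in range(T))  # prefer early slots
    prob.solve()
    print([(j, t) for (j, t), v in x.items() if v.value() == 1])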
You will need enough slots that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to get a hopefully non-trivial upper bound on the number of time slots (the sum of the lengths of all tasks is an obvious upper bound).
There are many ways to get a greedy solution, for example just schedule the jobs one by one, each in the earliest time slot where it fits. It may work better to order them by some measure of "hardness" and put the hard ones in first; for example you could give each job a score based on how badly it uses a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then insert them in decreasing order of that score.
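A sketch of such a greedy bound (again my own code; it uses the sum-based hardness score and earliest-fit placement):

    def greedy_bound(Time, Use, Capacity):
        # returns job start times; the horizon never needs to exceed the
        # sum of all job lengths, so a feasible slot always exists
        horizon = sum(Time.values())
        used = {(i, t): 0 for i in Capacity for t in range(horizon)}
        start = {}

        def fits(j, s):
            return all(used[i, t] + Use.get((j, i), 0) <= Capacity[i]
                       for i in Capacity for t in range(s, s + Time[j]))

        def hardness(j):  # hardest first: the worst resource hogs go in early
            return -sum(Use.get((j, i), 0) / Capacity[i] for i in Capacity)

        for j in sorted(Time, key=hardness):
            s = next(s for s in range(horizon - Time[j] + 1) if fits(j, s))
            start[j] = s
            for i in Capacity:
                for t in range(s, s + Time[j]):
                    used[i, t] += Use.get((j, i), 0)
        return start  # slots needed = max(start[j] + Time[j] for j in start)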
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.
This might be a job for Dijkstra's algorithm. For your case, if you want to maximize resource utilization, each node in the search space is the result of adding a job to the list of jobs you'll run at once, and each edge is labeled with the resources left after adding that job.
The goal, then, is to find the path to the node whose incoming edge has the smallest value.
An alternative, which is more straightforward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have a set J of jobs j_1, j_2, ..., j_n and resources R, I want to find the subset of J such that when that subset is scheduled, the leftover R is minimized (I'll call that subset J').
in pseudo-code:
    def knapsack(J, R, J_prime):
        # J: jobs remaining, R: resources remaining, J_prime: jobs chosen so far
        potential_solutions = []
        for j in J:
            if R > resources_used_by(j):
                potential_solutions.append(knapsack(J - j, R - resources_used_by(j), J_prime + j))
            else:
                return J_prime, R
        return best_solution_of(potential_solutions)
Every day from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and making sure that nothing goes wrong.
There are currently n applicants to the job, and each of them can work from time si to time ci, i = 1, 2, ..., n.
My goal is to minimize the total time during which more than one person is keeping watch at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed at any instant of time to fulfill my needs, but how should I get from here to the final solution?
My first step is to find the time periods where only one applicant is available and to keep those applicants, but the next step is what troubles me.
The algorithm must run in polynomial-time.
Any hints (a certain type of data structure, maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from start of day up to ci?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If si is equal to the start of day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j whose shift reaches the start of i's (that is, with cj >= si). Set cost(i) to the minimum over such j of cost(j) + the overlap between i and j (which is cj - si). Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) for all values of k where ck is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
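In Python, the whole thing might look like this (my sketch; applicants are (s, c) pairs, the day is [day_start, day_end], and the overlap when applicant j hands over to applicant i is taken to be cj - si):

    def min_watch_overlap(applicants, day_start, day_end):
        # assumes the applicants can in fact cover the whole day
        INF = float("inf")
        n = len(applicants)
        order = sorted(range(n), key=lambda k: applicants[k][1])  # by end time
        cost = [INF] * n
        prev = [None] * n
        for i in order:
            s_i, c_i = applicants[i]
            if s_i <= day_start:
                cost[i] = 0  # applicant i alone covers the start of the day
            for j in order:
                c_j = applicants[j][1]
                if c_j >= c_i:
                    break  # only earlier-ending applicants can precede i
                if c_j >= s_i and cost[j] + (c_j - s_i) < cost[i]:
                    cost[i] = cost[j] + (c_j - s_i)  # handover overlap
                    prev[i] = j
        # trace prev[] back from the best k to recover the chosen applicants
        return min(cost[k] for k in range(n) if applicants[k][1] >= day_end)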