Dynamic programming algorithm for unweighted interval scheduling?

I was wondering if someone could please help me reason about a DP algorithm for unweighted interval scheduling.
I'm given 2 arrays [t1,...,tn] and [d1,...,dn] where ti is the start time of job i and di is the duration of job i. Also the jobs are sorted by start time, so t1 <= t2 <= ... <= tn. I need to maximize the number of jobs that can be executed without any overlaps. I'm trying to come up with a DP algorithm and runtime for this problem. Any help would be much appreciated!
Thank you!

I am sorry I don't have any more time now to spend on this problem. Here is an idea, I think it lends itself nicely to Dynamic Programming. [Actually I think it is DP, but almost two decades have passed since I last studied such things...]
Suppose T = {t1, t2, ..., tn} is partitioned as follows:
T = {t1, t2, ..., tn} = {t1, t2, ..., tk} U {tk+1, tk+2, ..., tn}
= T1(k) U T2(k)
Let T2'(k) be the subset of T2(k) not containing the jobs overlapping T1(k).
Let opt(X) be the optimal value for a subset X of T. Then
opt(T) = max( opt( T1(k) ) + opt( T2'(k) ) )
where the maximum is taken over all possible k in {1, 2, ..., n}
Of course you need to compute opt() recursively, and take into account overlaps.
Hope this helps!

It's easiest for me to think about if I suppose that you work out what the end time would be for each job and sort the jobs into order of increasing end time, although you can probably achieve the same thing using start times working in the opposite direction.
Consider each job in order of increasing end time. For each job work out the maximum number of jobs you can handle up to and including that job if you decide to work on that job. To work this out, look at the answers you have already computed that cover times up to the start time of that job and find the one that covers the maximum number of jobs. The best you can do while handling the job you are considering is one plus that maximum number.
When you have considered all the jobs, the maximum number you can cover is the maximum number you have computed when considering any job. You can work out which jobs to do by storing the previous job that you identified when working out the maximum score possible for a particular job, and then tracing these pointers back from the job with the maximum score.
With N jobs to consider, you look back at at most N previously computed answers when working out the best possible score for each job, so I think this is O(N^2).
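Here is a rough Python sketch of that O(N^2) DP, stated in terms of the question's start/duration arrays (end time = ti + di); the names max_jobs/best/prev are just illustrative:

def max_jobs(t, d):
    n = len(t)
    if n == 0:
        return 0, []
    order = sorted(range(n), key=lambda i: t[i] + d[i])   # process jobs by end time
    end = [t[i] + d[i] for i in range(n)]
    best = [1] * n            # best[k] = max jobs doable ending with job order[k]
    prev = [-1] * n           # back-pointer for reconstructing the schedule
    for k in range(n):
        i = order[k]
        for m in range(k):    # look at answers already computed
            j = order[m]
            if end[j] <= t[i] and best[m] + 1 > best[k]:
                best[k] = best[m] + 1
                prev[k] = m
    k = max(range(n), key=lambda k: best[k])
    chosen = []
    while k != -1:            # trace the back-pointers
        chosen.append(order[k])
        k = prev[k]
    return len(chosen), chosen[::-1]

For example, max_jobs([0, 1, 3], [2, 5, 2]) would give (2, [0, 2]): jobs 0 and 2 can run without overlap.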

Related

Assignment problem with multiple persons needed on each job

First, sorry if my English isn't so good, I'm not a native English speaker.
I'm facing an assignment problem.
I have a list of jobs, with a certain number of persons needed for each job. Each person lets me know how many jobs they want to be on and what their preferences are.
I tried to do it with the Hungarian Algorithm, but it seems I'm unable to get it done. With a large number of jobs and spots, some persons got the same job multiple times, which isn't OK.
I think it's all because I considered each spot as an individual job and listed each person as many times as they needed to be placed.
Do you know a better algorithm or a way to do it?
(It's not a coding problem, I'm doing it in Octave/Matlab for now, but I think I'll switch to Python.)
Thanks for your help.
In addition to Henrik's suggestion to use Linear Programming, your specific problem can also be solved using Minimum cost maximum flow.
You make a bipartite graph between people and jobs (like in the Hungarian algorithm), where the costs on the middle edges are the preference scores, and the capacity is 1.
The capacity of the edges from the jobs to the sink is the number of people you need for that job.
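If it helps, here is a rough sketch of that construction in Python using NetworkX's max_flow_min_cost. The people, jobs, needed counts and preference ranks below are made up, and I give each person capacity 1 from the source (raise that capacity if a person can take several spots):

import networkx as nx

people = ["alice", "bob", "carol"]
jobs = {"painting": 2, "wiring": 1}          # job -> number of people needed
pref = {("alice", "painting"): 1, ("alice", "wiring"): 2,
        ("bob", "painting"): 1, ("bob", "wiring"): 1,
        ("carol", "painting"): 2, ("carol", "wiring"): 1}

G = nx.DiGraph()
for p in people:
    G.add_edge("source", p, capacity=1, weight=0)        # each person fills one spot
    for j in jobs:
        G.add_edge(p, j, capacity=1, weight=pref[p, j])   # cost = preference rank
for j, need in jobs.items():
    G.add_edge(j, "sink", capacity=need, weight=0)        # job j needs `need` people

flow = nx.max_flow_min_cost(G, "source", "sink")
assignment = [(p, j) for p in people for j in jobs if flow[p].get(j, 0) == 1]
print(assignment)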
Assignment problems can be solved with linear programming:
Let xij = 1 if person i is assigned to job j and 0 otherwise. Let aij be the rank for person i of job j: aij = 1 for the job they want most, aij = 2 for the next, and so on. If they only want k jobs, set aij to a very high number for all jobs beyond those k.
If you need at least bj workers on job j you have the constraint
x1j + ... + xmj >= bj (j = 1,...,n)
You also have the constraints xij >= 0 and xij <= 1.
The linear function to minimize is
sum( aij xij ) over all i,j
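A rough sketch of this LP with scipy.optimize.linprog, using a made-up 3x3 rank matrix (999 playing the role of the "very high number" for an unwanted job):

import numpy as np
from scipy.optimize import linprog

a = np.array([[1, 2, 999],
              [1, 1, 2],
              [2, 1, 1]])          # a[i][j]: rank of job j for person i (illustrative)
b = np.array([1, 1, 1])            # b[j]: people needed on job j

m, n = a.shape
c = a.ravel()                      # minimize sum over i,j of a_ij * x_ij

# Constraint sum_i x_ij >= b_j, written for linprog as -sum_i x_ij <= -b_j.
A_ub = np.zeros((n, m * n))
for j in range(n):
    for i in range(m):
        A_ub[j, i * n + j] = -1.0

res = linprog(c, A_ub=A_ub, b_ub=-b, bounds=[(0, 1)] * (m * n))
print(np.round(res.x.reshape(m, n), 2))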

Profit dependent on the previous job time - Job Scheduling problem

There are n jobs that need to be processed on a single machine. Job j requires tj time units to execute and has a profit value of pj. All the jobs are to be scheduled within W = sum of tj time units.
Scheduling job j to start at time sj earns a profit (W - sj)*pj.
I have already tried a greedy approach ordering by pj and by sj individually, as well as by pj*tj, but have been able to come up with a counterexample for each. I think it can be solved by a greedy algorithm using pj/tj in decreasing order, but I am not able to prove it. I am just looking for some hints on how to prove it formally.
An approach I have seen before is to consider swapping two adjacent jobs in a proposed schedule. Suppose jobs 1 and 2 are scheduled adjacently in that order, with job 1 starting at time s, so they earn (W - s)p1 and (W - s - t1)p2 respectively. This is better left unswapped if
p1(W - s) + p2(W - s - t1) > p2(W - s) + p1(W - s - t2)
which simplifies to
p1t2 > p2t1
which simplifies to
p1 / t1 > p2 / t2
So if we sort in the way you guessed, no swap of adjacent jobs will increase profits, but if there is a schedule which does not follow this rule you can improve it by swapping adjacent jobs. So I think your guess is correct.
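A small Python sketch of that greedy order, with the profit computed exactly as defined in the question ((W - sj) * pj); the durations and profits at the bottom are made up:

def schedule_profit(t, p):
    order = sorted(range(len(t)), key=lambda j: p[j] / t[j], reverse=True)   # by pj/tj, decreasing
    W = sum(t)
    total, s = 0, 0                      # s = start time of the next job
    for j in order:
        total += (W - s) * p[j]          # profit as defined: (W - sj) * pj
        s += t[j]
    return order, total

print(schedule_profit([2, 1, 3], [3, 5, 4]))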

Resource allocation algorithm

I know an algorithm for this exists, but I am having problems naming it and finding a suitable solution.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each resource R may have anywhere from 1 to 100 instances.
A job may need to use any number of resources R.
A job may need to use multiple instances of a resource R, but never more instances than the resource R has. (If a resource only has 2 instances, a job will never need more than 2.)
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed graph problem; the jobs J may execute in any order as long as they can claim their resources.
My Goal:
The most optimal way to schedule the jobs to minimize run time and/or maximize resource utilization.
I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested)
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = 1 if job j starts at time t (binary)
r[i,t] = amount of resource of type i in use at time t
slot[t] = 1 if time slot t is used (binary)
The constraints are,
// every job must start exactly once
(1). for every j, sum[t](x[j,t]) = 1
// a resource can only be used up to its capacity
(2). for every i and t: r[i,t] <= Capacity[i]
// if a job is running, it uses resources
(3). for every i and t: r[i,t] = sum[(j,s) | s <= t && s + Time[j] > t] (x[j,s] * Use[j,i])
// if a job is running, then the time slot is used
(4). for every j, s, t with s <= t && s + Time[j] > t: slot[t] >= x[j,s]
The third constraint means that if a job was started recently enough that it's still running, then its resource usage is added to the currently used resources. The fourth constraint means that if a job was started recently enough that it's still running, then this time slot is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
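For concreteness, here is a rough PuLP sketch of this model (in the same untested spirit). The Use/Time/Capacity data and the horizon T below are made up, and r[i,t] is inlined into the capacity constraint rather than declared as its own variable; see the discussion of choosing the number of slots below:

import pulp

Time = {0: 2, 1: 1, 2: 2}                    # Time[j]: length of job j (illustrative)
Use = {(0, 0): 1, (1, 0): 1, (2, 0): 2}      # Use[(j, i)]: resource i used by job j
Capacity = {0: 2}                            # Capacity[i]
T = 5                                        # number of time slots (must be enough)

jobs, res, slots = list(Time), list(Capacity), range(T)
prob = pulp.LpProblem("resource_schedule", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", [(j, t) for j in jobs for t in slots], cat="Binary")
slot = pulp.LpVariable.dicts("slot", slots, cat="Binary")

def running(j, s, t):
    return s <= t < s + Time[j]              # job j started at s is still running at t

for j in jobs:
    prob += pulp.lpSum(x[j, t] for t in slots) == 1     # (1) every job starts exactly once
    for t in slots:
        if t + Time[j] > T:
            prob += x[j, t] == 0                         # don't start a job that runs past the horizon
for i in res:
    for t in slots:
        # (2)+(3): total usage of resource i by all running jobs stays within capacity
        prob += pulp.lpSum(Use.get((j, i), 0) * x[j, s]
                           for j in jobs for s in slots if running(j, s, t)) <= Capacity[i]
for j in jobs:
    for s in slots:
        for t in slots:
            if running(j, s, t):
                prob += slot[t] >= x[j, s]               # (4) a running job marks its slot as used

prob += pulp.lpSum((2 ** t) * slot[t] for t in slots)    # prefer early slots
prob.solve()
print([(j, t) for j in jobs for t in slots if x[j, t].value() > 0.5])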
You will need enough slots such that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to give you a hopefully non-trivial upper bound on the number of time slots (obviously there is also the sum of the lengths of all tasks).
There are many ways to get a greedy solution, for example just schedule the jobs one by one in the earliest time slot it will go. It may work better to order them by some measure of "hardness" and put the hard ones in first, for example you could give them a score based on how badly they use a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then order by that score in decreasing order.
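A sketch of that greedy, using the same shape of made-up data as the ILP sketch above and the sum of Use/Capacity as the "hardness" score:

from collections import defaultdict

def greedy_schedule(Time, Use, Capacity):
    used = defaultdict(int)                  # (resource, slot) -> amount currently in use
    hardness = lambda j: sum(Use.get((j, i), 0) / Capacity[i] for i in Capacity)
    starts = {}
    for j in sorted(Time, key=hardness, reverse=True):   # hardest jobs first
        s = 0
        while not all(used[i, t] + Use.get((j, i), 0) <= Capacity[i]
                      for i in Capacity for t in range(s, s + Time[j])):
            s += 1                           # slide right until the job fits
        for i in Capacity:
            for t in range(s, s + Time[j]):
                used[i, t] += Use.get((j, i), 0)
        starts[j] = s
    return starts

The makespan of this greedy schedule, max(starts[j] + Time[j]) over all j, then gives an upper bound on how many slots the ILP needs.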
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.
This might be a job for Dijkstra's Algorithm. For your case, if you want to maximize resource utilization, then each node in the search space is the result of adding a job to the list of jobs you'll do at once. The edges will then be the resources which are left when you add a job to the list of jobs you'll do.
The goal, then, is to find the path to the node whose incoming edge has the smallest value.
An alternative, which is more straight forward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have J jobs, j_1, j_2, ..., j_n and R resources, I want to find the subset of J such that when that subset is scheduled, R is minimized (I'll call that J').
in pseudo-code:
def knapsack(J, R, J'):
    potential_solutions = []
    for j in J:
        if R >= resources_used_by(j):
            potential_solutions.push( knapsack(J - {j}, R - resources_used_by(j), J' + {j}) )
    if potential_solutions is empty:   # no remaining job fits in the leftover resources
        return J', R
    return best_solution_of(potential_solutions)

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Every day from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and making sure that nothing goes wrong.
There are currently n applicants to the job, and each of them can work from time si to time ci, i = 1, 2, ..., n.
My goal is to minimize the time that two or more people are keeping watch over the workers at the same time.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job, and keeping those applicants, is my first step, but finding the next step is what troubles me...
The algorithm must run in polynomial-time.
Any hints (a certain type of data structure, maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from start of day up to ci?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If si is equal to the start of day, then cost(i) = 0 (no overlap is required)
Otherwise, consider all previous applicants j with cj >= si (so that there is no gap in coverage). Set cost(i) to the minimum over such j of cost(j) + the overlap between i and j. Also set prev(i) to the value of j that attains the minimum.
Then the answer to your problem is given by the minimum of cost(k) for all values of k where ck is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
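A quadratic Python sketch of that DP. Applicants are (s, c) pairs; day_start/day_end stand for 9am and 5pm, and the overlap bookkeeping (c_j - max(s_i, s_j)) is my reading of "overlap between i and j":

def min_overlap(applicants, day_start, day_end):
    INF = float("inf")
    n = len(applicants)
    order = sorted(range(n), key=lambda i: applicants[i][1])   # by finishing time c
    cost = [INF] * n
    prev = [-1] * n                                            # for backtracking the chosen people
    for pos, i in enumerate(order):
        s_i, c_i = applicants[i]
        if s_i <= day_start:
            cost[i] = 0                                        # applicant i alone covers the start
        for j in order[:pos]:                                  # applicants finishing no later than i
            s_j, c_j = applicants[j]
            if c_j >= s_i:                                     # no gap in coverage
                cand = cost[j] + (c_j - max(s_i, s_j))         # time both are on watch
                if cand < cost[i]:
                    cost[i], prev[i] = cand, j
    finishers = [i for i in range(n) if applicants[i][1] >= day_end]
    return min(cost[i] for i in finishers) if finishers else None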

Find the smallest set of overlapping jobs

A friend gave me a puzzle that he says can be solved in better than O(n^3) time.
Given a set of n jobs that each have a set start time and end time (overlaps are very possible), find the smallest subset such that, for every job, the subset either includes that job or includes a job that overlaps it.
I'm pretty sure that the optimal solution is to pick the job with the most unmarked overlap, add it to the solution set, then mark it, and its overlap. And repeat until all jobs are marked.
Figuring out which job has the most unmarked overlappers can be done with a simple adjacency matrix (O(n^2)), and this has to be redone every time a job is selected in order to update the marks, making it O(n^3).
Is there a better solution?
Let A be the set of jobs which we haven't overlapped yet.
1. Find the job x in A which has the minimal end time (t).
2. From all jobs whose start time is less than t, pick the job j with the maximum end time.
3. Add j to the output set.
4. Remove all jobs which overlap j from A.
Repeat 1-4 until A is empty.
A simple implementation will run in O(n^2). Using interval trees it's probably possible to solve it in O(n log n).
The basic idea behind why it's an optimal solution (not a formal proof): We have to pick one job whose start time is less than t, so that x will be overlapped. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any job in S, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.
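A quadratic Python sketch of that greedy (jobs as (start, end) tuples; I let a job starting exactly at t count as starting "by" t, so that x itself is always a candidate):

def smallest_overlap_set(jobs):
    n = len(jobs)
    remaining = set(range(n))                                   # A: jobs not yet overlapped
    chosen = []
    while remaining:
        x = min(remaining, key=lambda k: jobs[k][1])            # minimal end time
        t = jobs[x][1]
        j = max((k for k in range(n) if jobs[k][0] <= t),       # starts by t ...
                key=lambda k: jobs[k][1])                       # ... with maximum end time
        chosen.append(j)
        remaining = {k for k in remaining
                     if jobs[k][1] < jobs[j][0] or jobs[k][0] > jobs[j][1]}  # drop jobs overlapping j
    return chosen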
We can achieve an O(n log n) solution with a dynamic programming approach. In particular, we want to consider the size of the smallest set that includes the kth job and matches the first k jobs (ordered by start time), which we denote by S(k). We should first add an auxiliary job (∞,∞), so the result will be our DP value for this final job minus one.
To compute S(k), consider the job p(k) which ends before job k starts but has maximal start time. Note that p is an increasing function. S(k) will then be one more than the minimum S(i) with end(i) > start(p(k)); if no job ends before job k starts, then job k overlaps every earlier job and S(k) = 1.
We can find this minimum efficiently by maintaining a min-heap of potential jobs ordered by S(k). After computing each S(k), we add the job to the heap. When we want to get a job, we remove jobs at the top of the heap which end too early, until we find a suitable one. This takes a total of at most O(n log n), since we do at most O(n) of each heap operation (pop/peek/push).
The remainder of the task is to compute the p(k) values efficiently. One way to do this is to iterate over all job starts and ends (in increasing time), keeping track of the latest-starting job.
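To make the recurrence concrete, here is a quadratic reference implementation (jobs as (start, end) tuples with positive length); the heap and p(k) machinery described above is what would bring this down to O(n log n):

def smallest_set_size(jobs):
    INF = float("inf")
    aug = sorted(jobs, key=lambda j: j[0]) + [(INF, INF)]   # add the auxiliary job
    n = len(aug)
    S = [1] * n                 # S[k]: smallest set containing job k and matching jobs 0..k
    for k in range(n):
        start_k = aug[k][0]
        # p(k): among jobs ending before job k starts, the one with maximal start time
        p = max((i for i in range(k) if aug[i][1] < start_k),
                key=lambda i: aug[i][0], default=None)
        if p is None:
            S[k] = 1            # every earlier job overlaps job k
        else:
            S[k] = 1 + min(S[i] for i in range(k) if aug[i][1] > aug[p][0])
    return S[n - 1] - 1         # subtract the auxiliary job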
