Best scheduling jobs - algorithm

I have been working on this question and can't seem to find the right answer. Can someone please help me with this?
We are given N jobs [1,..,N]. We'll get a salary S(i) >= 0 for getting a job i done, and a deduction D(i) >= 0 that adds up for each day passing.
We'll need T(i) days to complete job i. Suppose job i is done on day d; then we'll get S(i) - d * D(i) in reward. The reward can be negative if d is too big.
We can switch jobs in the process and work on jobs in any order, meaning if we start job 1 that takes 5 days on day 1, we don't have to spend 5 consecutive days working on job 1.
How can we decide the best schedule of the jobs, so that we can complete all the jobs and get maximum salary?

I think shapiro is right. You need to determine an appropriate weighted cost formula for each task. It has to take into account the days remaining, the per day deduction, and maybe total deduction.
Once you have the weighted cost you can sort the task list by the weighted cost and perform one day of work on the first task in the list (should be the one that will cost the most if not completed). Then recalculate the weighted cost for all the tasks now that a day has passed, sort the list, and repeat until all tasks are complete.
Generally when you are optimizing schedules in the real world this is the approach. Figure out which task should be worked on first, do some work on it, then recalculate to see if you should switch tasks or keep working on the current one.

Following the above discussion:
For each job i, calculate the one-day delay cost as X(i) = D(i) / T(i) and order the jobs by it, largest first. Maybe even just order by D(i): when you choose one job you are not choosing the others, so it makes sense to choose the one with the most expensive deduction. Perform the jobs in this order to minimize the deduction fees.
Again, this assumes that S(i) is a fixed reward for each job, independent of the exact day it is finished, and that all jobs need to be performed.
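A minimal sketch of the ordering idea, assuming jobs run back to back without interruption, every T(i) > 0, and a job that finishes at the end of day d is charged d * D(i). It sorts by D(i)/T(i), largest first (the classical ratio rule for minimizing total weighted completion time on one machine, often called Smith's rule), and includes a brute-force check for small inputs. The names `Job`, `ratio_order`, `total_reward` and `best_by_brute_force` are illustrative.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence


@dataclass
class Job:
    salary: int     # S(i)
    deduction: int  # D(i), charged once per day until the job is done
    days: int       # T(i), assumed > 0


def ratio_order(jobs: Sequence[Job]) -> List[int]:
    """Job indices sorted by D(i)/T(i), largest first; ties keep input order."""
    return sorted(range(len(jobs)),
                  key=lambda i: Fraction(jobs[i].deduction, jobs[i].days),
                  reverse=True)


def total_reward(jobs: Sequence[Job], order: Sequence[int]) -> int:
    """Sum of S(i) - d * D(i), where d is the day job i finishes when jobs run back to back."""
    day = 0
    reward = 0
    for i in order:
        day += jobs[i].days  # job i is finished at the end of this day
        reward += jobs[i].salary - day * jobs[i].deduction
    return reward


def best_by_brute_force(jobs: Sequence[Job]) -> int:
    """Best total reward over all orders; only feasible for a handful of jobs."""
    return max(total_reward(jobs, p) for p in permutations(range(len(jobs))))
```

For jobs (S, D, T) = (10, 3, 1), (10, 1, 3), (10, 2, 2), the ratio order is [0, 2, 1] with a total reward of 15, which matches the brute-force best over all six orders.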

First, forget about S(i): you are doing all the jobs, so you get all the rewards anyway.
Second, there's no point in interrupting a task and switching to another. Say you have jobs A and B. The deduction for the one that finishes last is the same either way (it takes T(A) + T(B) days to finish it regardless of how you schedule). The deduction for the other job can only increase if you switch, because it takes longer to finish. So you're better off not switching.
Now the problem is to order the tasks so that you get minimum amount of penalty. I'm not sure what's next.
You can pick the first job to minimize T(x) * sum(d) (since you commit to doing job x, every other job incurs T(x) days of delay).
Or you can pick the last job since you know you're going to pay sum(T) * d(x) (you know when it's going to finish).
One says order by T(x) the other says order by d(x) and they are both wrong.
Likely the solution is some dynamic programming in this space, but it escapes me at the moment.
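One standard way to settle the ordering (an adjacent-swap argument, not part of the original answer): suppose jobs a and b run back to back after t days of earlier work. Only their own deductions depend on which goes first, and the difference combines both quantities considered above:

```latex
\begin{aligned}
\mathrm{cost}(a \text{ then } b) &= (t + T(a))\,D(a) + (t + T(a) + T(b))\,D(b) \\
\mathrm{cost}(b \text{ then } a) &= (t + T(b))\,D(b) + (t + T(a) + T(b))\,D(a) \\
\mathrm{cost}(a \text{ then } b) - \mathrm{cost}(b \text{ then } a) &= T(a)\,D(b) - T(b)\,D(a)
\end{aligned}
```

So a should go before b whenever D(a)/T(a) >= D(b)/T(b). Any order that violates this for some adjacent pair can be improved by swapping that pair, so sorting by D(i)/T(i) in decreasing order minimizes the total deduction. This is why ordering by T alone or by d alone fails in general.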

Related

Optimal job interval algorithm

Let's say you have different jobs that you need to run on a regular basis (for example, you want to make API calls to different endpoints).
Let's say you need to hit two different endpoints and you want your calls to be as far away in time from each other as possible.
Example: You have two jobs, one is run once a minute, another is run twice a minute.
Solution: Start job A with interval of 60 seconds, wait 15 seconds, start job B with interval of 30 seconds.
This way the jobs will run at seconds 0 (job A), 15 (job B), 45 (job B), 60 (job A), 75 (job B), 105 (job B), 120 (job A), ..., making the minimum interval between API calls 15 seconds while maintaining the call frequencies that we need.
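The spacing in this example can be checked with a short script. This is a sketch; the function names and the `horizon` parameter are illustrative. It lists every call time up to a horizon and returns the smallest gap between consecutive calls.

```python
from typing import List, Sequence


def call_times(periods: Sequence[int], offsets: Sequence[int], horizon: int) -> List[int]:
    """Every call time (in seconds) before `horizon` for jobs with the given periods and offsets."""
    times: List[int] = []
    for period, offset in zip(periods, offsets):
        times.extend(range(offset, horizon, period))
    return sorted(times)


def min_gap(periods: Sequence[int], offsets: Sequence[int], horizon: int) -> int:
    """Smallest distance between consecutive calls (0 if two calls collide)."""
    times = call_times(periods, offsets, horizon)
    return min(b - a for a, b in zip(times, times[1:]))
```

`call_times([60, 30], [0, 15], 121)` gives [0, 15, 45, 60, 75, 105, 120], and `min_gap` reports 15. Scanning offsets for job B from 0 to 29 the same way shows that 15 seconds is the best achievable minimum for this pair.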
Can you think of an algorithm for these cases that gives optimal start times for each job so that the minimum time difference between calls is maximized? Ideally this algorithm could handle more than two jobs.
Assume we don't need to wait for the job to be finished to run it once again.
Thanks
Here is my solution if we allow the intervals to be slightly unequal.
Suppose that our calls are A[0], A[1], ..., A[n] with frequencies of f[0], f[1], ..., f[n] where the frequencies are all in the same unit. For example 60/hour, 120/hour, etc.
The total frequency with which events happen will be f = f[0] + f[1] + ... + f[n], which means that some event will be scheduled every hour/f time apart. The question is which one will happen when.
One way to picture this is a row of buckets filling with water. Each time, we dump a unit of water from the fullest bucket in front of us.
Since we don't actually care where we start, let's initialize a vector of bucket levels with arbitrary (for example, random) numbers: fill[0], fill[1], ..., fill[n]. Now our algorithm looks like this pseudocode:
Every hour/f time apart:
    for each i in 0..n:
        fill[i] += f[i]/f
    i_choice = (select i from 0..n with the largest fill[i])
    fill[i_choice] -= 1
    Do event A[i_choice]
This leads to events spaced as far apart as possible, but with repeating events happening in a slightly uneven rhythm. In your example, that leads to one event every 20 seconds, following the pattern ...ABBABBABBABB...
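A runnable sketch of the bucket pseudocode. Assumptions: every bucket starts at zero instead of a random value (for reproducibility), exact fractions are used to avoid floating-point drift, and ties go to the lowest index. The name `bucket_schedule` is illustrative; it returns which job fires at each tick, with ticks spaced hour/f apart as in the answer.

```python
from fractions import Fraction
from typing import List, Optional, Sequence


def bucket_schedule(freqs: Sequence[int], steps: int,
                    initial: Optional[Sequence[Fraction]] = None) -> List[int]:
    """Index of the job fired at each of `steps` evenly spaced ticks.

    freqs[i] is how often job i runs per unit of time; there are sum(freqs)
    ticks per unit of time, so each tick job i's bucket gains freqs[i] / sum(freqs).
    """
    total = sum(freqs)
    if initial is not None:
        fill = [Fraction(x) for x in initial]
    else:
        fill = [Fraction(0)] * len(freqs)  # assumption: start all buckets empty
    fired: List[int] = []
    for _ in range(steps):
        for i, f in enumerate(freqs):
            fill[i] += Fraction(f, total)
        choice = max(range(len(freqs)), key=lambda i: fill[i])  # fullest bucket
        fill[choice] -= 1
        fired.append(choice)
    return fired
```

`bucket_schedule([1, 2], 6)` returns [1, 0, 1, 1, 0, 1], that is B A B B A B: with a tick every 20 seconds this is the ABB cycle from the example, starting one step into it.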

Is ordering processes by ascending run time an optimal way to create a set of non-overlapping processes?

There are n jobs in a set, each with a start time s_i and a finish time f_i, for 1 <= i <= n.
I'm trying to figure out whether ordering jobs by ascending start time, finish time, or interval length (f_i - s_i) is optimal.
I said that ordering by ascending start time is not optimal: the job that starts first might span the time in which 3 other jobs could be started and finished.
Next, I said that ordering by ascending finish time is optimal, because right when a job finishes, the next earliest-ending job is added, maximizing the number of jobs added to the non-overlapping list.
However, I'm not sure whether ordering by f_i - s_i is optimal.
My logic is that it is optimal, because it would list the shortest jobs first, so the jobs that span the lengths of other jobs would be considered last.
EDIT: The goal is to maximize the size of the non-overlapping process list.
I think there is a surprisingly simple strategy for choosing the next job which gives you a subset with the maximal number of consecutive jobs: among the remaining jobs with a valid start time (in the beginning all start times are valid; after the first job has been chosen, the start time of the next job must, of course, not precede the finish time of the previously chosen job), always choose the job with the earliest finish time.
A proof that this strategy is optimal can start like this: assume you have an optimal (i.e. maximal) subset of consecutive jobs and that the first job is not the job with the (overall) earliest finish time, then this job with the overall earliest finish time cannot be in the optimal subset, but you can replace the first job of the optimal subset with this job and you get another optimal subset which has the job with earliest finish time as first job. Now you can continue in the same way with the second job and thus it is clear that in the subset generated with the above strategy the n-th job has a finish time that does not exceed the finish time of the n-th job of any optimal subset, for any n, and hence the so created subset is also optimal.
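The earliest-finish strategy can be sketched as follows, assuming jobs are (start, finish) pairs and a job may start exactly when the previous one finishes. The function names are illustrative; the second function implements the shortest-duration ordering (f_i - s_i) from the question so the two can be compared.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, finish)


def earliest_finish_first(jobs: Sequence[Interval]) -> List[Interval]:
    """Repeatedly take the compatible job that finishes first."""
    chosen: List[Interval] = []
    last_finish = None
    for start, finish in sorted(jobs, key=lambda job: job[1]):
        # Compatible if it starts no earlier than the last chosen job's finish.
        if last_finish is None or start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen


def shortest_first(jobs: Sequence[Interval]) -> List[Interval]:
    """Greedy by ascending duration f_i - s_i, for comparison."""
    chosen: List[Interval] = []
    for start, finish in sorted(jobs, key=lambda job: job[1] - job[0]):
        # Keep the job only if it overlaps none of the jobs chosen so far.
        if all(finish <= s or start >= f for s, f in chosen):
            chosen.append((start, finish))
    return sorted(chosen)
```

For the jobs (0, 10), (9, 12) and (11, 20), `earliest_finish_first` keeps two jobs, while `shortest_first` takes the short job (9, 12) first; it overlaps both others, so only one job is kept. Ordering by f_i - s_i is therefore not optimal in general.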

Resource allocation algorithm

I know the algorithm exists, but I am having problems naming it and finding a suitable solution.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each resource R may have anywhere from 1 to 100 instances.
A Job may need to use any number of resources R.
A job may need to use multiple instances of a resource R but never more than the resource R has instances. (if a resource only has 2 instances a job will never need more than 2 instances)
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed graph problem, the jobs J may execute in any order as long as they can claim their resources.
My Goal:
The optimal way to schedule the jobs to minimize total run time and/or maximize resource utilization.
I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested)
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = 1 if job j starts at time t, 0 otherwise
r[i,t] = amount of resource of type i used at time t
slot[t] = 1 if time slot t is used, 0 otherwise
The constraints are,
// every job must start exactly once
(1). for every j, sum[t](x[j,t]) = 1
// a resource can only be used up to its capacity
(2). for every i, t: r[i,t] <= Capacity[i]
// if a job is running, it uses resources (a job started at s occupies slots s .. s + Time[j] - 1)
(3). for every i, t: r[i,t] = sum[j, s | s <= t < s + Time[j]] (x[j,s] * Use[j,i])
// if a job is running, then the time slot is used
(4). for every j, t and every s with s <= t < s + Time[j]: slot[t] >= x[j,s]
The third constraint means that if a job was started recently enough that it's still running, then its resource usage is added to the currently used resources. The fourth constraint means that if a job was started recently enough that it's still running, then this time slot is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
You will need enough slots such that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to give you a hopefully non-trivial upper bound on the number of time slots (obviously there is also the sum of the lengths of all tasks).
There are many ways to get a greedy solution, for example just schedule the jobs one by one in the earliest time slot it will go. It may work better to order them by some measure of "hardness" and put the hard ones in first, for example you could give them a score based on how badly they use a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then order by that score in decreasing order.
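A sketch of this greedy upper bound, assuming integer time slots, one list of amounts per job (Use[j][i]), the "sum of Use[j,i] / Capacity[i]" hardness score, and that every job fits on its own, as the question guarantees. The function names are illustrative. Each job, hardest first, goes into the earliest start slot where all of its slots have enough free capacity.

```python
from typing import List, Sequence


def greedy_schedule(use: Sequence[Sequence[int]], time: Sequence[int],
                    capacity: Sequence[int]) -> List[int]:
    """Start slot for each job; job j occupies slots start .. start + time[j] - 1."""
    n_res = len(capacity)
    horizon = sum(time)  # running everything back to back always fits
    used = [[0] * n_res for _ in range(horizon)]  # used[t][i]: resource i busy at slot t
    # Hardness score: sum of Use[j,i] / Capacity[i]; place the hardest jobs first.
    order = sorted(range(len(use)),
                   key=lambda j: sum(u / c for u, c in zip(use[j], capacity)),
                   reverse=True)
    start = [0] * len(use)
    for j in order:
        for s in range(horizon - time[j] + 1):
            fits = all(used[t][i] + use[j][i] <= capacity[i]
                       for t in range(s, s + time[j]) for i in range(n_res))
            if fits:
                for t in range(s, s + time[j]):
                    for i in range(n_res):
                        used[t][i] += use[j][i]
                start[j] = s
                break
    return start


def makespan(start: Sequence[int], time: Sequence[int]) -> int:
    """Number of slots the schedule needs: an upper bound for the ILP."""
    return max(s + d for s, d in zip(start, time))
```

With one resource of capacity 2 and jobs using 2, 1 and 1 instances for 2, 3 and 3 slots, the greedy starts them at slots 0, 2 and 2, so 5 slots are enough for the ILP.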
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.
This might be a job for Dijkstra's algorithm. For your case, if you want to maximize resource utilization, then each node in the search space is the result of adding a job to the list of jobs you'll do at once. The edges will then be the resources which are left when you add a job to the list of jobs you'll do.
The goal then, is to find the path to the node which has an incoming edge which is the smallest value.
An alternative, which is more straightforward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have jobs J = j_1, j_2, ..., j_n and resources R, I want to find the subset of J such that, when that subset is scheduled, the leftover resources R are minimized (I'll call that subset J').
in pseudo-code:
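def knapsack(J, R, J_prime):
    # J: jobs not yet considered, R: resources still free, J_prime: jobs chosen so far (J')
    potential_solutions = [(J_prime, R)]  # stopping here is always an option
    for j in J:
        if resources_used_by(j) <= R:
            potential_solutions.append(knapsack(J - {j}, R - resources_used_by(j), J_prime + [j]))
    return best_solution_of(potential_solutions)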
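A runnable sketch of the same exhaustive search, assuming each job's needs are a vector of instance counts (one per resource type) and that "leftover R" is measured as the sum of unused fractions over resource types, which is an illustrative choice. It visits subsets in index order so each subset is explored once; the running time is exponential in the number of jobs. The name `best_batch` is hypothetical.

```python
from typing import List, Sequence, Tuple


def best_batch(uses: Sequence[Sequence[int]],
               capacity: Sequence[int]) -> Tuple[List[int], float]:
    """Jobs to start together so the normalized leftover capacity is smallest."""
    def leftover(remaining: Sequence[int]) -> float:
        # Illustrative measure: sum over resource types of the unused fraction.
        return sum(r / c for r, c in zip(remaining, capacity))

    best: Tuple[List[int], float] = ([], leftover(capacity))

    def search(start: int, remaining: List[int], chosen: List[int]) -> None:
        nonlocal best
        score = leftover(remaining)
        if score < best[1]:
            best = (list(chosen), score)
        for j in range(start, len(uses)):
            # Like "resources_used_by(j) <= R", checked for every resource type.
            if all(u <= r for u, r in zip(uses[j], remaining)):
                chosen.append(j)
                search(j + 1, [r - u for r, u in zip(remaining, uses[j])], chosen)
                chosen.pop()

    search(0, list(capacity), [])
    return best
```

With capacity (4, 2) and jobs needing (2, 1), (3, 0), (1, 1) and (2, 2), the best batch is jobs 0 and 2, which leave one instance of the first resource unused (a leftover score of 0.25).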

Variation to the Set-Covering Prob (Maybe an Activity Selection Prob)

Every day from 9am to 5pm, I am supposed to have at least one person at the factory supervising the workers and making sure that nothing goes wrong.
There are currently n applicants to the job, and each of them can work from time si to time ci, i = 1, 2, ..., n.
My goal is to minimize the time during which more than one person is keeping watch of the workers.
(The applicants' available working hours are able to cover the time period from 9am to 5pm.)
I have proved that at most two people are needed for any instant of time to fulfill my needs, but how should I get from here to the final solution?
Finding the time periods where only one person is available for the job and keeping those people is my first step, but the next step is what troubles me.
The algorithm must run in polynomial-time.
Any hints (a certain type of data structure, maybe?) or references are welcome. Many thanks.
I think you can do this with dynamic programming by solving the sub-problem:
What is the minimum overlap time given that applicant i is the last worker and we have covered all times from start of day up to ci?
Call this value of the minimum overlap time cost(i).
You can compute the value of cost(i) by considering cases:
If si is equal to the start of day, then cost(i) = 0 (no overlap is required)
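Otherwise, consider all applicants j whose shift ends inside applicant i's shift (si <= cj < ci), so that i continues the coverage without a gap. Set cost(i) to the minimum of cost(j) + overlap between i and j, where the overlap is cj - si. Also set prev(i) to the value of j that attains the minimum.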
Then the answer to your problem is given by the minimum of cost(k) for all values of k where ck is equal to the end of the day. You can work out the correct choice of people by backtracking using the values of prev.
This gives an O(n^2) algorithm.
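A sketch of this DP, assuming shifts are (s_i, c_i) pairs of comparable times (for example, hours), that touching shifts (c_j = s_i) count as continuous cover with zero overlap, and, slightly more loosely than the answer, that a shift starting at or before the start of day (or ending at or after the end of day) qualifies. The function name is illustrative; it returns the minimum overlap and the chosen applicants in order, or None if the day cannot be covered.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def min_overlap_cover(shifts: Sequence[Tuple[int, int]], day_start: int,
                      day_end: int) -> Optional[Tuple[int, List[int]]]:
    """O(n^2) DP over applicants sorted by end time c_i."""
    order = sorted(range(len(shifts)), key=lambda k: shifts[k][1])
    cost: Dict[int, int] = {}            # cost(i) from the answer
    prev: Dict[int, Optional[int]] = {}  # prev(i) from the answer
    for i in order:
        s_i, c_i = shifts[i]
        if s_i <= day_start:
            # i covers the start of the day by itself: no overlap needed
            cost[i], prev[i] = 0, None
            continue
        # j must end inside i's shift (s_i <= c_j < c_i) so there is no gap.
        # Every j with c_j < c_i was processed earlier, so its cost is known if reachable.
        options = [(cost[j] + (shifts[j][1] - s_i), j)
                   for j in cost if s_i <= shifts[j][1] < c_i]
        if options:  # otherwise no gap-free chain reaches i
            cost[i], prev[i] = min(options)
    finishers = [k for k in cost if shifts[k][1] >= day_end]
    if not finishers:
        return None
    last = min(finishers, key=lambda k: cost[k])
    chain: List[int] = []
    k: Optional[int] = last
    while k is not None:  # backtrack through prev
        chain.append(k)
        k = prev[k]
    return cost[last], chain[::-1]
```

For a 9-to-17 day with shifts (9, 12), (11, 14) and (13, 17), the only cover overlaps for 2 hours in total; adding shifts (9, 10) and (10, 17) gives a cover with no overlap.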

Trying to gain intuition for work scheduling greedy algorithm

I have the following scenario: (since I don't know of a way to show LaTeX, here's a screenshot)
I'm having some trouble conceptualizing what's going on here. If I were to program this, I would probably attempt to structure this as some kind of heap where each node represents a worker, from earliest-to-latest, then run Prim's/Kruskal's algorithm on it. I don't know if I'm on the right track with that idea, but I need to flesh out my understanding of this problem so I can do the following:
Describe in detail the greedy choice
Show that if there's an optimal solution for which the greedy choice was not made, then an exchange can be made to conform with the greedy choice
Know how to implement a greedy algorithm solution, and its running time
So where should I be going with this idea?
This problem is very similar in nature to "Roster Scheduling problems." Think of the committee as, say, a set of 'supervisors': you want to have a supervisor present whenever a worker is present. In this case, the supervisor comes from the same set as the workers.
Here are some modeling ideas, and an Integer Programming formulation.
Time Slicing Idea
This sounds like a bad idea initially, but works really well in practice. We are going to create a lot of "time instants" T_i from the start time of the first shift to the end time of the very last shift. It sometimes helps to think of T1, T2, T3, ..., TN as being time instants (say) five minutes apart. For every Ti, at least one worker is working on a shift. Therefore, that time instant has to be covered (coverage means there has to be at least one member of the committee also working at time Ti).
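We really only need to worry about 2n time instants, the start and finish times of each of the n workers, plus one instant between each pair of consecutive ones, so that a gap between two chosen shifts is not missed.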
Coverage Property Requirement
For every time instant Ti, we want a worker from the Committee present.
Let w1, w2...wn be the workers, sorted by their start times s_i. (Worker w1 starts the earliest shift, and worker wn starts the very last shift.)
Introduce a new Indicator variable (boolean):
Y_i = 1 if worker i is part of the committee
Y_i = 0 otherwise.
Visualization
Now think of a 0-1 matrix, where the rows are the SORTED workers, and the columns are the time instants...
Construct a Time-Worker Matrix (0/1)
t1 t2 t3 t4 t5 t6 ... tN
-------------------------------------------
w1 1 1
w2 1 1
w3 1 1 1
w4 1 1 1
...
...
wn 1 1 1 1
-------------------------------------------
Total 2 4 3 ... ... 1 2 4 5
So the problem is to make sure that for each column, at least 1 worker is Selected to be part of the committee. The Total shows the number of candidates for the committee at each Time instant.
An Integer Programming based formulation
Objective: Minimize Sum(Y_i)
Subject to:
Y1 + Y2 >= 1 # coverage for time t1
Y1 + Y2 + Y3 >= 1 # coverage for time t2
...
More generally, the constraints are:
# Set Covering constraint for each time instant T_k
Sum of Y_i over all workers i that are working at time T_k >= 1
Y_i Binary for all i's
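For small instances, the formulation can be checked by plain enumeration instead of an IP solver. This is a sketch under stated assumptions: shifts are closed intervals (start, finish), a column only needs coverage if somebody works then, and besides the 2n start and finish times it also checks the midpoint between each consecutive pair. Without the midpoints, shifts [0, 1], [2, 3] and [0, 3] would let the first two "cover" every endpoint while nobody chosen is present between 1 and 2. The function names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Shift = Tuple[int, int]  # (start, finish), treated as a closed interval


def check_points(shifts: Sequence[Shift]) -> List[float]:
    """All start/finish times plus the midpoint between each consecutive pair."""
    ends = sorted({t for s, f in shifts for t in (s, f)})
    mids = [(a + b) / 2 for a, b in zip(ends, ends[1:])]
    return sorted(ends + mids)


def working_at(shifts: Sequence[Shift], t: float) -> Set[int]:
    """One column of the time-worker matrix: indices of workers on shift at t."""
    return {i for i, (s, f) in enumerate(shifts) if s <= t <= f}


def min_committee(shifts: Sequence[Shift]) -> List[int]:
    """Minimize sum(Y_i) so that every non-empty column has a chosen worker.

    Exhaustive search by committee size, so only suitable for small n.
    """
    columns = [c for c in (working_at(shifts, t) for t in check_points(shifts)) if c]
    for size in range(1, len(shifts) + 1):
        for committee in combinations(range(len(shifts)), size):
            chosen = set(committee)
            if all(col & chosen for col in columns):
                return list(committee)
    return []
```

For shifts [0, 4], [3, 8], [7, 10] and [2, 9], workers 0 and 2 are forced (each is alone at one end of the day), but together they leave the time between 4 and 7 uncovered, so the smallest committee is [0, 1, 2].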
Preprocessing
This Integer program, if attempted without preprocessing can be very difficult, and end up choking the solvers. But in practice there are quite a number of preprocessing ideas that can help immensely.
Make any forced assignments. (If ever there is a time instant with only one worker working, that worker has to be in the committee C.)
Separate into nice subproblems. Look at the time-worker matrix. If there are nice 'rectangles' in it that can be cut out without impacting any other time instant, then each is a wholly separate sub-problem to solve. Makes the solver go much, much faster.
Identical shifts - If lots of workers have the exact same start and end times, then you can simply choose ANY one of them (say, the lexicographically first worker, WLOG) and remove all the other workers from consideration. (Makes a ton of difference in real-life situations.)
Dominating shifts: If one worker's shift starts no later and ends no earlier than another worker's shift, the 'dominating' worker can stay, and the 'dominated' worker can be removed from consideration for C.
All the identical rows (and columns) in the time-worker Matrix can be fused. You need to only keep one of them. (De-duping)
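A sketch of two of these reductions (forced assignments, and identical or dominated shifts), assuming closed-interval shifts given as (start, finish) pairs; the function names are illustrative. Forced workers are looked for only at start and finish times, so this can miss some forced assignments but never forces a worker who is not actually alone at some instant.

```python
from typing import List, Sequence, Set, Tuple

Shift = Tuple[int, int]  # (start, finish), treated as a closed interval


def forced_workers(shifts: Sequence[Shift]) -> Set[int]:
    """Workers who are alone on duty at some start or finish time."""
    instants = sorted({t for s, f in shifts for t in (s, f)})
    forced: Set[int] = set()
    for t in instants:
        on_duty = [i for i, (s, f) in enumerate(shifts) if s <= t <= f]
        if len(on_duty) == 1:
            forced.add(on_duty[0])
    return forced


def drop_dominated(shifts: Sequence[Shift]) -> List[int]:
    """Indices kept after removing duplicate shifts and shifts contained in another."""
    kept: List[int] = []
    for i, (s, f) in enumerate(shifts):
        dominated = any(
            s2 <= s and f <= f2                # shift j contains shift i ...
            and ((s2, f2) != (s, f) or j < i)  # ... strictly, or is an earlier duplicate
            for j, (s2, f2) in enumerate(shifts) if j != i
        )
        if not dominated:
            kept.append(i)
    return kept
```

For shifts (0, 4), (1, 3), (0, 4), (3, 8) and (7, 10), `drop_dominated` keeps indices [0, 3, 4]; on the reduced list (0, 4), (3, 8), (7, 10), `forced_workers` returns {0, 2}, the first and last shifts.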
You could throw this into an IP solver (CPLEX, Excel, lp_solve etc.) and you will get a solution, if the problem size is not an issue.
Hope some of these ideas help.

Resources