Optimal job interval algorithm

Optimal job interval algorithm - algorithm

Let's say you have different jobs that you need to run on a regular basis (for example, you want to make API calls to different endpoints).
Let's say you need to hit two different endpoints and you want your calls to be as far away in time from each other as possible.
Example: You have two jobs, one is run once a minute, another is run twice a minute.
Solution: Start job A with interval of 60 seconds, wait 15 seconds, start job B with interval of 30 seconds.
This way the jobs will run at seconds: 0(job A), 15(job B), 45(job B), 60(job A), 75(job B), 105(job B), 120(job A), ... making a maximum interval between API calls 15 seconds while maintaining the call frequency that we need.
Can you think of an algorithm for these cases that will give optimal start times for each job so that the minimum time difference between calls in maximized? Ideally this algorithm could handle more than two jobs.
Assume we don't need to wait for the job to be finished to run it once again.
Thanks

Here is my solution if we allow the intervals to be slightly unequal.
Suppose that our calls are A[0], A[1], ..., A[n] with frequencies of f[0], f[1], ..., f[n] where the frequencies are all in the same unit. For example 60/hour, 120/hour, etc.
The total frequency with which events happen will be f = f[0] + f[1] + ... + f[n], which means that some event will be scheduled every hour/f time apart. The question is which one will happen when.
The way to imagine this is imagine we have a row of buckets filling with water. Each time we will dump a unit of water from the fullest bucket in front of us.
Since at the start we don't actually care where we start, let's initialize a vector of numbers by just assigning random numbers to them, full[0], full[1], ..., full[n]. And now our algorithm looks like this pseudocode:
Every hour/f time apart:
for each i in 0..n:
fill[i] += f[i]/f
i_choice = (select i from 0..n with the largest f[i])
fill[i_choice] -= 1
Do event A[i_choice]
This leads to events spaced as far apart as possible, but with repeating events happening in a slightly uneven rhythm. In your example that will lead to every 20 seconds doing events following the pattern ...ABBABBABBABB....

Related

Dynamic Programming - Break Scheduling Problem With Decreasing Work Capacity

Assume you are given t1, t2, t3, ..., tn amount of tasks to finish every day. And once you start working, you can only finish c1, c2, c3, ..., cn tasks until spending 1 day resting. You can spend multiple days resting too. But you can only do the tasks which are given you that day. For example;
T[] = {10, 1, 4, 8} given tasks;
C[] = {8, 4, 2, 1} is the capacity of doing tasks for each day.
For this example, optimal solution is giving a break on the 3rd day. That way you can complete 17 tasks in 4 days:
1st day 8 (maximum 10 tasks, but c1=8)
2nd day 1 (maximum 1 task, c2=4)
3rd day 0 (rest to reset to c1)
4th day 8 (maximum 8 tasks, c1=8)
Any other schedule would result with fewer tasks getting done.
I'm trying to find the recurrence relation for this dynamic programming problem. Can anyone help me? I find this question but mine is different because of the decreasing work capacity and there are different number of jobs each day. Reference

If I got you right you have an amount of tasks to do, t(i) for every day i. Also you have some kind of a given internal restriction sequence c(j) for a current treak day j where j can be reseted to 0 if no task was done that day. Goal is to maximizie the solved tasks.
Naive approach is to store for each day i a list for every state j how many tasks were done. Fill the data for the first day. Then for every following day fill the values for the "no break" case - which is
value(i-1,j-1)+min(t(i),c(j)). Choose the maximum from the previous day to fill the "break" entry. Repeat until last day. Choose the highest value and trace back the path.
Example for above
Memory consumtption is pretty easy: O(number of days * number of states).
If you are only interested in the value and not the schedule the memory consumption would be the O(number of states).
Time consumption is a bit more complex, so lets write some pseudo code:
For each day i
For each possible state j
add and write values
choose maximum from previous day for break state
choose maximum
For each day
trace back path
The choose maximum-function would have a complexity of O(number of states).
This pseudo code results in time consumption O(number of days * number of states) as well.

Greedy Algorithm: Assigning jobs to minimize cost

What is the best approach to take if I want to find the minimum total cost if I want to assign n jobs to a person in a sequence which have cost assigned to them? For eg. I have 2 jobs which have costs 4 and 5 respectively. Both jobs take 6 and 10 minutes respectively. So the finish time of the second job will be finish time of first job + time taken by this job. So the total cost will be finish time of each job multiplied by its cost.

If you have to assign n jobs to 1 person (or 1 machine) in scheduling literature terminology, you are looking to minimize weighted flow time. The problem is polynomially solvable.
The shortest weighted processing time sequence is optimal.
Sort and reindex jobs such that p_1/w_1 <= p_2/w_2 <= ... <= p_n/w_n,
where, p_i is the processing time of the ith job and w_i is its weight or cost.
Then, assign job 1 first, followed by 2 and so on until n.

If you look at what happens if you swap two adjacent values you will end up comparing terms like (A+c)m + (A+c+d)l and (A+d)l + (A+c+d)m, where A is the time consumed by earlier jobs, c and d are times, and l and m are costs. With some algebra and rearrangement you can see that the first version is smaller if c/m < d/l. So you could work out for each job the time taken by that job divided by its cost, and do first the jobs with smallest time per unit cost. - check: if you have a job that takes 10 years and has a cost of 1 cent, you want to do that last so that 10 year wait doesn't get multiplied by any other costs.

Best scheduling jobs

I have been working on this question and can't seem to find the right answer. Can someone please help me with this?
We are given N jobs [1,..,N]. We'll get a salary S(i) >= 0 for getting a job i done, and a deduction D(i) >= 0 that adds up for each day passing.
We'll need T(i) days to complete job i. Suppose the job i is done on day d, we'll get S(i) - d.D(i) in reward. The reward can be negative if d is too big.
We can switch jobs in the process and work on jobs in any order, meaning if we start job 1 that takes 5 days on day 1, we don't have to spend 5 consecutive days working on job 1.
How can we decide the best schedule of the jobs, so that we can complete all the jobs and get maximum salary?

I think shapiro is right. You need to determine an appropriate weighted cost formula for each task. It has to take into account the days remaining, the per day deduction, and maybe total deduction.
Once you have the weighted cost you can sort the task list by the weighted cost and perform one day of work on the first task in the list (should be the one that will cost the most if not completed). Then recalculate the weighted cost for all the tasks now that a day has passed, sort the list, and repeat until all tasks are complete.
Generally when you are optimizing schedules in the real world this is the approach. Figure out which task should be worked on first, do some work on it, then recalculate to see if you should switch tasks or keep working on the current one.

Following the above discussion:
For each job i, calculate the one day delay cost as X(i) = D(i) / T(i) and order the jobs by it. Maybe even just order by D(i) since when you choose one job you are not choosing the others - so it makes sense to choose the one with the most expensive deduction. Perform the jobs by this order to minimize the deduction fees.
Again, this is assuming that S(i) is a fixed reward for each job, independent on the exact day it is finished by, and that all jobs need to be performed.

First forget about S(i). You are doing all the jobs you get all the rewards anyway.
Second there's no point to interrupt a task and switch to another.Let's say you have jobs A and B. The deduction you get for the one that finishes last is the same (it's going to take T(A) + T(B) to finish it regardless of how you schedule). The deduction for the other job can only increase if you switch because it's going to take longer to finish it. So you're best if you drop the switch.
Now the problem is to order the tasks so that you get minimum amount of penalty. I'm not sure what's next.
You can pick the first job to minimize T(x) * sum(d) (since you commit to dong job x everything will incur T(x) days delay).
Or you can pick the last job since you know you're going to pay sum(T) * d(x) (you know when it's going to finish).
One says order by T(x) the other says order by d(x) and they are both wrong.
Likely the solution is some dynamic programming in this space, but it escapes me at the moment.

Resource allocation algorithm

I know the algorithm exists but i and having problems naming it and finding a suitable solutions.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each recourse R may have any number from 1 to 100 instances.
A Job may need to use any number of resources R.
A job may need to use multiple instances of a resource R but never more than the resource R has instances. (if a resource only has 2 instances a job will never need more than 2 instances)
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed graph problem, the jobs J may execute in any order as long as they can claim their resources.
My Goal:
The most optimal way to schedule the jobs to minimize run time and/or maximize resource utilization.

I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested)
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = job j starts at time t
r[i,t] = amount of resource of type i used at time t
slot[t] = is time slot t used
The constraints are,
// every job must start exactly once
(1). for every j, sum[t](x[j,t]) = 1
// a resource can only be used up to its capacity
(2). r[i,t] <= Capacity[i]
// if a job is running, it uses resources
(3). r[i,t] = sum[j | s <= t && s + Time[j] >= t] (x[j,s] * Use[j,i])
// if a job is running, then the time slot is used
(4). slot[t] >= x[j,s] iff s <= t && s + Time[j] >= t
The third constraint means that if a job was started recently enough that it's still running, then its resource usage is added to the currently used resources. The fourth constraint means that if a job was started recently enough that it's still running, then this time slot is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
You will need enough slots such that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to give you a hopefully non-trivial upper bound on the number of time slots (obviously there is also the sum of the lengths of all tasks).
There are many ways to get a greedy solution, for example just schedule the jobs one by one in the earliest time slot it will go. It may work better to order them by some measure of "hardness" and put the hard ones in first, for example you could give them a score based on how badly they use a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then order by that score in decreasing order.
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.

This might be a job for Dykstra's Algorithm. For your case, if you want to maximize resource utilization, then each node in the search space is the result of adding a job to the list of jobs you'll do at once. The edges will then be the resources which are left when you add a job to the list of jobs you'll do.
The goal then, is to find the path to the node which has an incoming edge which is the smallest value.
An alternative, which is more straight forward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have J jobs, j_1, j_2, ..., j_n and R resources, I want to find the subset of J such that when that subset is scheduled, R is minimized (I'll call that J').
in pseudo-code:
def knapsack(J, R, J`):
potential_solutions = []
for j in J:
if R > resources_used_by(j):
potential_solutions.push( knapsack(J - j, R - resources_used_by(j), J' + j) )
else:
return J', R
return best_solution_of(potential_solutions)

Trying to gain intuition for work scheduling greedy algorithm

I have the following scenario: (since I don't know of a way to show LaTeX, here's a screenshot)
I'm having some trouble conceptualizing what's going on here. If I were to program this, I would probably attempt to structure this as some kind of heap where each node represents a worker, from earliest-to-latest, then run Prim's/Kruskal's algorithm on it. I don't know if I'm on the right track with that idea, but I need to flesh out my understanding of this problem so I can do the following:
Describe in detail the greedy choice
Show that if there's an optimal solution for which the greedy choice was not made, then an exchange can be made to conform with the greedy choice
Know how to implement a greedy algorithm solution, and its running time
So where should I be going with this idea?

This problem is very similar in nature to "Roster Scheduling problems." Think of the committee as say a set of 'supervisors' and you want to have a supervisor present, whenever a worker is present. In this case, the supervisor comes from the same set as the workers.
Here are some modeling ideas, and an Integer Programming formulation.
Time Slicing Idea
This sounds like a bad idea initially, but works really well in practice. We are going to create a lot of "time instants" T i from the start time of the first shift, to the end time of the very last shift. It sometimes helps to think of
T1, T2, T3....TN as being time instants (say) five minutes apart. For every Ti at least one worker is working on a shift. Therefore, that time instant has be be covered (Coverage means there has to be at least one member of the committee also working at time Ti.)
We really need to only worry about 2n Time instants: The start and finish times of each of the n workers.
Coverage Property Requirement
For every time instant Ti, we want a worker from the Committee present.
Let w1, w2...wn be the workers, sorted by their start times s_i. (Worker w1 starts the earliest shift, and worker wn starts the very last shift.)
Introduce a new Indicator variable (boolean):
Y_i = 1 if worker i is part of the committeee
Y_i = 0 otherwise.
Visualization
Now think of a 0-1 matrix, where the rows are the SORTED workers, and the columns are the time instants...
Construct a Time-Worker Matrix (0/1)
t1 t2 t3 t4 t5 t6 ... tN
-------------------------------------------
w1 1 1
w2 1 1
w3 1 1 1
w4 1 1 1
...
...
wn 1 1 1 1
-------------------------------------------
Total 2 4 3 ... ... 1 2 4 5
So the problem is to make sure that for each column, at least 1 worker is Selected to be part of the committee. The Total shows the number of candidates for the committee at each Time instant.
An Integer Programming based formulation
Objective: Minimize Sum(Y_i)
Subject to:
Y1 + Y2 >= 1 # coverage for time t1
Y1 + Y2 + Y3 >= 1 # coverage for time t2
...
More generally, the constraints are:
# Set Covering constraint for time T_i
Sum over all worker i's that are working at time t_i (Y_i) >= 1
Y_i Binary for all i's
Preprocessing
This Integer program, if attempted without preprocessing can be very difficult, and end up choking the solvers. But in practice there are quite a number of preprocessing ideas that can help immensely.
Make any forced assignments. (If ever there is a time instant with only one
worker working, that worker has to be in the committee ∈ C)
Separate into nice subproblems. Look at the time-worker Matrix. If there are nice 'rectangles' in it that can be cut out without
impacting any other time instant, then that is a wholly separate
sub-problem to solve. Makes the solver go much, much faster.
Identical shifts - If lots of workers have the exact same start and end times, then you can simply choose ANY one of them (say, the
lexicographically first worker, WLOG) and remove all the other workers from
consideration. (Makes a ton of difference in real life situations.)
Dominating shifts: If one worker starts before and stays later than any other worker, the 'dominating' worker can stay, all the
'dominated' workers can be removed from consideration for C.
All the identical rows (and columns) in the time-worker Matrix can be fused. You need to only keep one of them. (De-duping)
You could throw this into an IP solver (CPLEX, Excel, lp_solve etc.) and you will get a solution, if the problem size is not an issue.
Hope some of these ideas help.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio