Greedy Algorithm: Assigning jobs to minimize cost

What is the best approach for finding the minimum total cost of assigning n jobs, each with a cost attached, to one person in sequence? For example, I have 2 jobs with costs 4 and 5, which take 6 and 10 minutes respectively. The finish time of the second job is the finish time of the first job plus the time taken by the second job. The total cost is the sum, over all jobs, of each job's finish time multiplied by its cost.

If you have to assign n jobs to 1 person (or 1 machine, in scheduling literature terminology), you are looking to minimize weighted flow time. The problem is polynomially solvable.
The weighted shortest processing time sequence is optimal.
Sort and reindex the jobs such that p_1/w_1 <= p_2/w_2 <= ... <= p_n/w_n,
where p_i is the processing time of the ith job and w_i is its weight or cost.
Then assign job 1 first, followed by 2, and so on until n.
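As a rough illustration of the sorting rule above, here is a minimal Python sketch. The function name min_weighted_flow_time and the (processing_time, weight) tuple layout are assumptions made for this example and are not part of the original answer.

```python
def min_weighted_flow_time(jobs):
    """jobs: list of (processing_time, weight) pairs, every weight > 0.

    Returns (order, total_cost), where total_cost is the sum of
    weight * finish_time over the jobs in the chosen order.
    """
    # Smallest processing time per unit of weight goes first.
    order = sorted(jobs, key=lambda job: job[0] / job[1])
    elapsed = 0
    total_cost = 0
    for processing_time, weight in order:
        elapsed += processing_time          # finish time of this job
        total_cost += weight * elapsed
    return order, total_cost


# The two jobs from the question: times 6 and 10, costs 4 and 5.
order, total_cost = min_weighted_flow_time([(10, 5), (6, 4)])
print(order, total_cost)
```

For the two jobs in the question this puts the 6-minute job first, for a total of 4*6 + 5*16 = 104, compared with 5*10 + 4*16 = 114 in the reverse order.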

If you look at what happens when you swap two adjacent jobs, you end up comparing terms like (A+c)m + (A+c+d)l and (A+d)l + (A+c+d)m, where A is the time consumed by earlier jobs, c and d are the two jobs' times, and m and l are their respective costs. Expanding both and cancelling the common terms leaves cl on one side and dm on the other, so the first version is smaller if cl < dm, that is, if c/m < d/l. So you could work out, for each job, the time taken by that job divided by its cost, and do the jobs with the smallest time per unit cost first. Sanity check: if you have a job that takes 10 years and has a cost of 1 cent, you want to do that last so that the 10-year wait doesn't get multiplied by any other costs.

Related

Max Tasks that can be completed in given time

I recently came across this question in a forum:
You are given a straight line running from 0 to 10^9. You start at zero, and there are n tasks you can perform. The i-th task is located at a given point on the line and requires 't' time to be performed. To perform the task you need to reach that point and spend 't' time at that location.
example: (5,8) lies at 5, so the travel distance is 5 and the work effort is 8.
Total effort is calculated as travel distance + time required to complete the work.
It takes one sec to travel one unit of path.
We are given T seconds in total, and we need to complete as many tasks as possible and get back to the starting position.
Find the max number of tasks that you can finish in time T.
example :
3 16 - 3 tasks and 16 units of total time
2 8 - task 1 at position 2 in line and takes 8 sec to complete
4 5 - task 2 at position 4 in line and takes 5 sec to complete
5 1 - task 3 at position 5 in line and takes 1 sec to complete
Output : 2
Explanation :
If we take task 1 at location 2, which requires 8 sec, then getting to location 2 takes 2s and completing the task takes 8s, leaving us with only 6s, which is not enough to complete any other task and return.
On the other hand, skipping the first task leaves us enough time to complete the other two tasks.
Going to location 5 and coming back costs 2x5 = 10s, and performing the tasks at locations 4 and 5 costs 5+1 = 6s. Total time spent will be 10s+6s = 16s.
I am new to graphs and DP, so I was not sure which approach to use: Hamiltonian cycle, Knapsack or Longest Path.
Can someone please help me with the most efficient approach to solve this?
Let's iterate from the first task to the last, in order of increasing distance. As we go, treat the current task i as the farthest one we visit: after subtracting 2 * distance(i) + effort(i) from T, the most tasks we can achieve is found by greedily packing as many earlier tasks as possible into the remaining time, taking them in order of increasing effort.
Therefore, an efficient solution could insert the seen element into a data-structure ordered by effort, dynamically updating the best solution so far. (I originally thought of using a treap and binary search but j_random_hacker suggested a much simpler way in the comments below this answer.)
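The simpler method mentioned in the comments is not reproduced in this text, so the following Python sketch is only one possible realization of the idea, assuming a max-heap of efforts from which the most expensive tasks are evicted as the travel budget shrinks. The name max_tasks and the (position, effort) tuple layout are assumptions for this example.

```python
import heapq


def max_tasks(total_time, tasks):
    """tasks: list of (position, effort) pairs.

    Returns the largest number of tasks that can be completed within
    total_time, including the walk out from 0 and back to 0.
    """
    best = 0
    kept = []          # max-heap of kept efforts, stored negated
    effort_sum = 0     # sum of the efforts currently kept
    for position, effort in sorted(tasks):
        # If this task is the farthest point visited, the round trip
        # costs 2 * position and the rest of the time is for work.
        budget = total_time - 2 * position
        heapq.heappush(kept, -effort)
        effort_sum += effort
        # The budget only shrinks as we move outward, so a task evicted
        # here never needs to be reconsidered later.
        while kept and effort_sum > budget:
            effort_sum += heapq.heappop(kept)   # popped value is negative
        best = max(best, len(kept))
    return best


print(max_tasks(16, [(2, 8), (4, 5), (5, 1)]))
```

On the sample input this keeps the tasks at positions 4 and 5 and evicts the 8-second task, in line with the expected output of 2. Each task is pushed and popped at most once, so the sketch runs in O(n log n) after sorting.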
Suggestion:
For each task n create a graph like this
Join up these graphs for all the tasks.
Run a travelling salesman algorithm to find the minimum time to do all the tasks ( = visit all the nodes in combined graph )
Remove tasks in an orderly sequence. This will give you a collection of results for different numbers of tasks performed. Choose the one that does the most number of tasks that still remains under the time limit.
Since you are maximizing the number of tasks performed, start by removing the longest tasks so that you will be left with lots of short tasks.

Dynamic Programming - Break Scheduling Problem With Decreasing Work Capacity

Assume you are given t1, t2, t3, ..., tn tasks to finish on days 1 to n. Once you start working, you can finish at most c1, c2, c3, ..., cn tasks on the 1st, 2nd, 3rd, ..., nth consecutive working day, until you spend 1 day resting, which resets your capacity to c1. You can spend multiple days resting too. But you can only do the tasks which are given to you on that day. For example:
T[] = {10, 1, 4, 8} given tasks;
C[] = {8, 4, 2, 1} is the capacity of doing tasks for each day.
For this example, the optimal solution is to take a break on the 3rd day. That way you can complete 17 tasks in 4 days:
1st day 8 (maximum 10 tasks, but c1=8)
2nd day 1 (maximum 1 task, c2=4)
3rd day 0 (rest to reset to c1)
4th day 8 (maximum 8 tasks, c1=8)
Any other schedule would result in fewer tasks getting done.
I'm trying to find the recurrence relation for this dynamic programming problem. Can anyone help me? I found this question, but mine is different because of the decreasing work capacity and because there is a different number of jobs each day.
If I understood you correctly, you have a number of tasks t(i) to do on each day i. You also have a given capacity sequence c(j) for the current streak day j, where j is reset to 0 if no task was done that day. The goal is to maximize the number of solved tasks.
The naive approach is to store, for each day i, a list holding for every state j how many tasks have been done so far. Fill in the data for the first day. Then, for every following day, fill in the values for the "no break" case, which is value(i-1,j-1)+min(t(i),c(j)). Choose the maximum from the previous day to fill the "break" entry. Repeat until the last day. Choose the highest value and trace back the path.
Example for above
Memory consumption is pretty easy: O(number of days * number of states).
If you are only interested in the value and not the schedule, the memory consumption is O(number of states).
Time consumption is a bit more complex, so let's write some pseudo code:
For each day i
    For each possible state j
        add and write values
    choose maximum from previous day for break state
choose maximum
For each day
    trace back path
The choose-maximum step has a complexity of O(number of states).
This pseudo code therefore results in a time consumption of O(number of days * number of states) as well.
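The recurrence above could be sketched in Python roughly as follows. This is a minimal version that returns only the best total, not the schedule, so it keeps just one row of states. The name max_tasks_done and the convention that state 0 means "rested today" while state j means "today is the j-th consecutive working day" are assumptions for this example.

```python
def max_tasks_done(tasks, capacity):
    """tasks[i]: tasks available on day i; capacity[j - 1]: limit on the
    j-th consecutive working day. Returns the maximum number completed."""
    n = len(tasks)
    NEG = float("-inf")                 # marks an unreachable state
    # Before day 1 we behave as if we had just rested.
    prev = [0] + [NEG] * n
    for i in range(n):
        cur = [NEG] * (n + 1)
        # "Break" entry: rest today, carrying over yesterday's best value.
        cur[0] = max(prev)
        # "No break" entries: value(i-1, j-1) + min(t(i), c(j)).
        for j in range(1, min(i + 1, len(capacity)) + 1):
            if prev[j - 1] != NEG:
                cur[j] = prev[j - 1] + min(tasks[i], capacity[j - 1])
        prev = cur
    return max(prev)


print(max_tasks_done([10, 1, 4, 8], [8, 4, 2, 1]))
```

For the example in the question this yields 17, reached by working on days 1 and 2, resting on day 3, and working at full capacity on day 4.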

Profit dependent on the previous job time - Job Scheduling problem

There are n jobs that need to be processed on a single machine. Job j requires tj time units to execute and has a profit value of pj. All the jobs are to be scheduled within W time units, where W is the sum of all tj.
Scheduling job j to start at time sj earns a profit of (W - sj)*pj.
I have already tried greedy approaches on pj and on sj individually, as well as on pj*tj, but was able to come up with a counterexample for each. I think it can be solved by a greedy algorithm that takes jobs in decreasing order of pj/tj, but I am not able to prove it. I am just looking for some hints on how to prove it formally.
An approach I have seen before is to consider swapping two adjacent jobs in a proposed schedule. Suppose jobs 1 and 2 are adjacent, in that order, and the jobs after them take total time K before we reach the end of the schedule at W. Dropping the term p1t1 + p2t2, which appears in both orders, the schedule is better left unswapped if
p1(K + t2) + p2K > p2(K + t1) + p1K
which simplifies to
p1t2 > p2t1
which simplifies to
p1 / t1 > p2 / t2
So if we sort in the way you guessed, no swap of adjacent jobs will increase profits, whereas if there is a schedule which does not follow this rule you can improve it by swapping adjacent jobs. So I think your guess is correct.
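As an informal check of the exchange argument (not a proof), the Python sketch below compares the pj/tj ordering against an exhaustive search on a small instance. The job data and the names profit and greedy_schedule are made up for this example.

```python
from itertools import permutations


def profit(schedule):
    """schedule: list of (t, p) pairs in execution order.
    Job j started at time s_j earns (W - s_j) * p_j, with W = sum of t."""
    W = sum(t for t, _ in schedule)
    start = 0
    total = 0
    for t, p in schedule:
        total += (W - start) * p
        start += t
    return total


def greedy_schedule(jobs):
    """Order jobs by decreasing profit per unit of processing time."""
    return sorted(jobs, key=lambda job: job[1] / job[0], reverse=True)


jobs = [(3, 4), (1, 5), (2, 2), (4, 9)]      # hypothetical (t, p) pairs
greedy_profit = profit(greedy_schedule(jobs))
best_profit = max(profit(list(perm)) for perm in permutations(jobs))
print(greedy_profit, best_profit)
```

The exhaustive search is only feasible for a handful of jobs; it is there to confirm on small inputs that the greedy order is not beaten by any other permutation.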

Algorithm to solve an assignment or matching with a constraint

Here is the problem. Suppose we have N workers and N jobs. We want to assign exactly one worker to each job. Each worker i can do each job at some cost. Our goal is to minimize the total cost, on the condition that every single cost is less than some value.
For example, 10 workers and 10 jobs. Worker 1 can do job 1 for $0.8, job 2 for $2.3, job 3 for $15.8, jobs 4 to 8 for $100, job 9 for $3.2, job 10 for $15.3.
Worker 2 can do job 1 for $3.5, job 2 for $2.3, job 3 for $4.6, job 4 for $17, etc.
Our goal is to find a matching, or we can call it an assignment, such that the total cost is minimized but the cost of every matched pair of worker i and job j is less than a value like $50.
I would very much like to solve it in MATLAB if possible.
This is a slight variation of the Assignment Problem. To handle your additional constraint that no single job cost may exceed some value, change every entry in the cost matrix that violates the threshold to a huge value (anything bigger than the sum of all other entries will suffice), and solve as usual, using e.g. the Hungarian Algorithm. If the optimal assignment still contains one of these huge entries, no assignment satisfying the constraint exists.
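The masking step can be illustrated with a small Python sketch. Note the assumptions: the question asks for MATLAB, the answer suggests the Hungarian Algorithm, and this example instead uses an exhaustive search over permutations purely as a stand-in solver that needs no external library, so it is only usable for very small N. The name constrained_assignment and the sample matrix are hypothetical.

```python
from itertools import permutations


def constrained_assignment(cost, limit):
    """cost[i][j]: cost of worker i doing job j.

    Returns (assignment, total) where assignment[i] is the job given to
    worker i, or None if every complete assignment needs a pair whose
    cost is not below limit.
    """
    n = len(cost)
    # A value larger than the sum of all entries, as the answer suggests.
    big = sum(sum(row) for row in cost) + 1
    masked = [[c if c < limit else big for c in row] for row in cost]

    def total_of(perm):
        return sum(masked[i][perm[i]] for i in range(n))

    # Stand-in for the Hungarian Algorithm: try every permutation.
    best = min(permutations(range(n)), key=total_of)
    total = total_of(best)
    if total >= big:                 # a masked entry was unavoidable
        return None
    return list(best), total


cost = [[1, 100, 4],
        [2, 3, 100],
        [100, 1, 2]]
print(constrained_assignment(cost, 50))
```

With a real assignment solver, only the min(...) line would change; the masking and the final feasibility check stay the same.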

Find the smallest set of overlapping jobs

A friend gave me a puzzle that he says can be solved in better than O(n^3) time.
Given a set of n jobs that each have a set start time and end time (overlaps are very possible), find the smallest subset that for every job either includes that job or includes a job that has overlap with that job.
I'm pretty sure that the optimal solution is to pick the job with the most unmarked overlap, add it to the solution set, then mark it, and its overlap. And repeat until all jobs are marked.
Figuring out which job has the most unmarked overlappers is a simple adjacency matrix (O(n^2)), and this has to be redone every time a job is selected, in order to update the marks, making it O(n^3).
Is there a better solution?
Let A be the set of jobs which we haven't overlapped yet.
1. Find the job x in A which has the minimal end time (t).
2. From all jobs whose start time is less than t: pick the job j with the maximum end time.
3. Add j to the output set.
4. Remove all jobs which overlap j from A.
5. Repeat 1-4 until A is empty.
A simple implementation will run in O(n^2). Using interval trees it's probably possible to solve in O(n*logn).
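A simple O(n^2) version of the greedy steps above might look like the Python sketch below. It assumes every job is a (start, end) pair with start < end, and that two jobs overlap when each starts strictly before the other ends; the function name smallest_covering_subset is an assumption for this example.

```python
def smallest_covering_subset(jobs):
    """jobs: list of (start, end) pairs with start < end.

    Returns indices of a subset such that every job is either in the
    subset or overlaps a job in the subset.
    """
    uncovered = set(range(len(jobs)))       # the set A from the answer
    chosen = []
    while uncovered:
        # Step 1: minimal end time among the uncovered jobs.
        t = min(jobs[i][1] for i in uncovered)
        # Step 2: among ALL jobs starting before t, take the latest end.
        j = max((i for i in range(len(jobs)) if jobs[i][0] < t),
                key=lambda i: jobs[i][1])
        # Step 3: add j to the output.
        chosen.append(j)
        # Step 4: drop every uncovered job that overlaps j (j included).
        j_start, j_end = jobs[j]
        uncovered -= {i for i in uncovered
                      if jobs[i][0] < j_end and j_start < jobs[i][1]}
    return chosen


print(smallest_covering_subset([(0, 3), (2, 5), (4, 7), (6, 9)]))
```

Each round scans all jobs once and removes at least the job x found in step 1, so there are at most n rounds of O(n) work each.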
The basic idea behind why it's an optimal solution (not a formal proof): We have to pick one job whose start time is less than t, so that x will be overlapped. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any job in S, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.
We can achieve an O(nlogn) solution with a dynamic programming approach. In particular, we want to consider the size of the smallest set including the kth job and matching the first k jobs (ordered by start time), which we denote by S(k). We should first add an auxiliary job (∞,∞), so the result will be our DP solution for this final job minus one.
To compute S(k), consider the job p(k) which ends before job k, but has maximal start time. Note that p is an increasing function. S(k) will then be one more than the minimum S(i) with end(i) > start(p(k)).
We can efficiently find this job by maintaining a min-heap of candidate jobs ordered by S(k). After computing each S(k), we add the job to the heap. When we want to get a job, we remove jobs from the top of the heap which end too early, until we find a suitable one. This will take a total of at most O(nlogn), since we do at most O(n) of each heap operation (pop/peek/push).
The remainder of the task is to compute the p(k) values efficiently. One way to do this is to iterate over all job start and ends (in increasing time), keeping track of the latest starting job.
