Introduction
I have a bipartite graph with workers (W) and tasks (T).
The goal is to assign all tasks to the workers so as to minimize the maximum time spent, i.e. finish the last task as soon as possible.
Question
What modifications to the Hungarian algorithm are needed to accomplish this?
If the Hungarian algorithm is not useful, what would be a good mathematical approach?
Mathematically, I don't know how to handle multiple task assignments per worker.
I will implement it in Python once I understand the math theory.
Problem
Conditions:
A task can only be assigned to one worker
There is no restriction on the number of tasks
All tasks must be assigned
A worker can have multiple tasks assigned
There is no restriction on the number of workers
A worker may have no assignment at all
Workers are not all free to start working at the same time
Example
If I have 7 tasks T={T₁, T₂, T₃, T₄, T₅, T₆, T₇} and 3 workers W={W₁, W₂, W₃}, the workers will be free to start working at F={4, 7, 8} (where Fᵢ is the time at which Wᵢ becomes free to start working) and the cost matrix is:
A possible matching (not necessarily correct for this instance, just an example) could be:
W₁ = {T₁, T₂, T₃}
W₂ = {T₄, T₅}
W₃ = {T₆, T₇}
In this case the time spent by each worker is:
Time(W₁) = 4+5+4+3 = 16
Time(W₂) = 7+4+9 = 20
Time(W₃) = 8+1+7 = 16
Explained as:
For W₁, we have to wait for:
4 till he is free
after that he will finish T₁ in 5
T₂ in 4
T₃ in 3
giving a total time of 16.
For W₂, we have to wait for:
7 till he is free
After that he will finish T₄ in 4
T₅ in 9
Giving a total time of 20.
For W₃, we have to wait for:
8 till he is free
after that he will finish T₆ in 1
T₇ in 7
Giving a total time of 16.
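The bookkeeping in this example can be reproduced with a short sketch (the durations and free times are taken from the worked example above; the full cost matrix is not shown in the question):

```python
# Reproduces the worked example: each worker's finish time is their free
# time F_i plus the durations of their assigned tasks; the objective is
# the maximum finish time (the makespan), not the sum.
free = {"W1": 4, "W2": 7, "W3": 8}   # F_i from the example
assigned = {                          # task durations from the example
    "W1": [5, 4, 3],                  # T1, T2, T3
    "W2": [4, 9],                     # T4, T5
    "W3": [1, 7],                     # T6, T7
}

finish = {w: free[w] + sum(assigned[w]) for w in free}
makespan = max(finish.values())

print(finish)    # {'W1': 16, 'W2': 20, 'W3': 16}
print(makespan)  # 20
```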
Goal
Minimize the maximum total time, not the sum of totals.
If the total times {9, 6, 6} (sum 21) are a solution, then {9, 9, 9} (sum 27) is a solution too, while {10, 1, 1} (sum 12) is not: in the first two cases the last task finishes at time 9, in the third at time 10.
Related
This was asked by a friend of mine. I had no previous context, so I want to know what type of algorithm this problem belongs to. Any hint or suggestions will do.
Suppose we have a group of N workers working at a car assembly line. Each worker can do 3 types of work, and their skills are rated from 1 to 10. For example, Worker1's "paint surface" efficiency is rated 8, but his "assemble engine" efficiency is only rated 5.
The manager has a list of M jobs defined by start time, duration, job type, and importance, rated from 0 to 1. Each worker can only work on 1 job at a time, and each job can be worked on by only 1 worker. How can the manager assign the jobs properly to get maximum output?
The maximum output for a job = worker skill rating * job importance * duration.
For example, we have workers {w1, w2}
w1: paint_skill = 9, engine_skill = 8
w2: paint_skill = 10,engine_skill = 5
We have jobs {j1, j2}
j1: paint job, start_time = 0, duration = 10, importance = 0.5
j2: engine job, start_time = 3, duration = 10, importance = 0.9
We should assign w1 to j2, and w2 to j1. output = 8 * 10 * 0.9 + 10 * 10 * 0.5 = 72 + 50 = 122
A greedy solution that matches the next available worker with the next job is clearly sub-optimal, as in the example we could have matched w1 to j1, which is not optimal.
An exhaustive brute-force solution would guarantee the best output, but uses exponentially more time to compute with large job lists.
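For an instance this small, the brute force can be written directly by trying every one-to-one worker-to-job assignment (values taken from the example above; start times are ignored here since the two jobs overlap anyway and each worker can take only one):

```python
from itertools import permutations

# Brute force over all one-to-one worker/job assignments for the small
# example above. output(job) = worker skill * job importance * duration.
workers = {"w1": {"paint": 9, "engine": 8},
           "w2": {"paint": 10, "engine": 5}}
jobs = [("paint", 10, 0.5),    # j1: type, duration, importance
        ("engine", 10, 0.9)]   # j2

best = max(
    sum(workers[w][jtype] * imp * dur
        for w, (jtype, dur, imp) in zip(order, jobs))
    for order in permutations(workers)
)
print(best)  # 122.0, for w1 -> j2 and w2 -> j1
```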
How can this problem be approached?
Assume we have a set of n jobs to execute, each of which takes unit time. At any time we can serve exactly one job. Job i, 1<=i<=n earns us a profit if and only if it is executed no later than its deadline.
We call a set of jobs feasible if there exists at least one sequence that allows each job in the set to be performed no later than its deadline. In particular, a set is feasible if and only if scheduling it "earliest deadline first" meets every deadline.
Show that the greedy algorithm is optimal: Add in every step the job with the highest value of profit among those not yet considered, provided that the chosen set of jobs remains feasible.
MUST DO THIS FIRST: show first that it is always possible to re-schedule two feasible sequences (one computed by Greedy) in such a way that every job common to both sequences is scheduled at the same time. This new sequence might contain gaps.
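For reference, the greedy being analysed can be sketched as follows, using the earliest-deadline-first test from the feasibility definition above (the job data at the bottom is made up for illustration):

```python
# Greedy for unit-time jobs: take jobs in decreasing profit order and
# keep a job only if the chosen set stays feasible. Feasibility test:
# schedule the set earliest-deadline-first and check every deadline.
def feasible(deadlines):
    # slot k (1-based) must not exceed the k-th smallest deadline
    return all(slot + 1 <= d for slot, d in enumerate(sorted(deadlines)))

def greedy(jobs):  # jobs: list of (profit, deadline), unit durations
    chosen = []
    for profit, deadline in sorted(jobs, reverse=True):
        if feasible([d for _, d in chosen] + [deadline]):
            chosen.append((profit, deadline))
    return sum(p for p, _ in chosen)

# hypothetical instance: (profit, deadline) pairs
print(greedy([(50, 2), (10, 1), (15, 2), (30, 1)]))  # 80
```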
UPDATE
I created an example that seems to disprove the algorithm:
Assume 4 jobs:
Job A has profit 1, time duration 2, deadline before day 3;
Job B has profit 4, time duration 1, deadline before day 4;
Job C has profit 3, time duration 1, deadline before day 3;
Job D has profit 2, time duration 1, deadline before day 2.
If we use the greedy algorithm with highest profit first, then we only get jobs B & C. However, if we go deadline first, then we can get all of B, C and D, in the order CDB.
I'm not sure if I'm approaching this question in the right way, since I've created an example that disproves what the question asks. (Note, though, that the original statement assumes unit durations, while Job A here has duration 2, so this example falls outside the stated model.)
This problem looks like Job Shop Scheduling, which is NP-complete (which means there's no optimal greedy algorithm, even though experts have been trying to find one since the '70s). Here's a video on a more advanced form of that use case that is being solved with a Greedy algorithm followed by Local Search.
If we presume your use case can indeed be relaxed to Job Shop Scheduling, then there are many optimization algorithms that can help, such as Metaheuristics (including Local Search such as Tabu Search and Simulated Annealing), (M)IP, Dynamic Programming, Constraint Programming, ... The reason there are so many choices is that none are perfect. I prefer Metaheuristics, as they out-scale the others in all the research challenges I've seen.
In fact, neither "earliest deadline first", "highest profit first" nor "highest profit/duration first" is a correct algorithm...
Assume 2 jobs:
Job A has profit 1, time duration 1, deadline before day 1;
Job B has profit 2, time duration 2, deadline before day 2;
Then "earliest deadline first" fails to get the correct answer. The correct answer is B.
Assume another 5 jobs:
Job A has profit 2, time duration 3, deadline before day 3;
Job B has profit 1, time duration 1, deadline before day 1;
Job C has profit 1, time duration 1, deadline before day 2;
Job D has profit 1, time duration 1, deadline before day 3;
Job E has profit 1, time duration 1, deadline before day 4;
Then "highest profit first" fails to get the correct answer. The correct answer is BCDE.
Assume another 4 jobs:
Job A has profit 6, time duration 4, deadline before day 6;
Job B has profit 4, time duration 3, deadline before day 6;
Job C has profit 4, time duration 3, deadline before day 6;
Job D has profit 0.0001, time duration 2, deadline before day 6;
Then "highest profit/duration first" fails to get the correct answer. The correct answer is BC (thanks to @dognose for the counter-example; see comments).
One correct algorithm is Dynamic Programming:
First order by deadline ascending. dp[i][j] = k means that within the first i jobs, using exactly j time units, the highest profit we can get is k. Initially dp[0][0] = 0.
Job info is stored in 3 arrays: profits in profit[i], durations in time[i], and deadlines in deadline[i], 1<=i<=n.
// sort by deadline in ascending order
...
// initially 2 dimension dp array are all -1, -1 means this condition unreachable
...
dp[0][0] = 0;
int maxDeadline = max(deadline); // max value of deadline
for(int i=0;i<n;i++) {
    for(int j=0;j<=maxDeadline;j++) {
        if(dp[i][j] == -1) continue;
        // skip job i+1: carry the state forward unchanged
        dp[i+1][j] = max(dp[i+1][j], dp[i][j]);
        // do job i+1 if it still meets its deadline: "within the first i+1 jobs,
        // using j+time[i+1] time units, what's the highest total profit?"
        if(j + time[i+1] <= deadline[i+1]) {
            dp[i+1][j+time[i+1]] = max(dp[i+1][j+time[i+1]], dp[i][j] + profit[i+1]);
        }
    }
}
// the max total profit is the max value in the 2-dimensional dp array
The time/space complexity of the DP is O(n*m), where n is the job count and m is the maximum deadline. If n and/or m is very large it may be impractical, but for common inputs it works well.
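The DP above can be transcribed into Python; since row i+1 only depends on row i, the sketch below uses a single 1-D table, iterating time backwards so each job is used at most once (a compact variant, not a literal transcription):

```python
# DP for job sequencing with durations, deadlines and profits.
# After sorting by deadline, dp[j] = best profit achievable using exactly
# j time units so far (-1 = unreachable). Each job is either skipped or,
# if it still meets its deadline, appended to the schedule.
def max_profit(jobs):  # jobs: list of (profit, duration, deadline)
    jobs = sorted(jobs, key=lambda j: j[2])           # deadline ascending
    max_deadline = max(d for _, _, d in jobs)
    dp = [-1] * (max_deadline + 1)
    dp[0] = 0
    for profit, duration, deadline in jobs:
        for j in range(deadline - duration, -1, -1):  # backwards: use each job once
            if dp[j] != -1:
                dp[j + duration] = max(dp[j + duration], dp[j] + profit)
    return max(dp)

# dognose's counter-example from above: BC (profit 8) beats the ratio greedy
print(max_profit([(6, 4, 6), (4, 3, 6), (4, 3, 6), (0.0001, 2, 6)]))  # 8
```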
The problem is called Job Sequencing with Deadlines, and it can be solved by two algorithms based on a greedy strategy:
Sort input jobs by decreasing profit. Insert each job into the solution list, kept sorted by increasing deadline. If after including a job some job in the solution has an index greater than its deadline, do not include this job.
Sort input jobs by decreasing profit. Place each job into the solution list at the last possible index. If there is no free index less than or equal to the job's deadline, do not include the job.
public class JOB {
    public static void main(String[] args) {
        // jobs already sorted by decreasing profit; only 2 time slots available
        char name[] = {'1', '2', '3', '4'};
        int dl[] = {1, 1, 4, 1};
        int profit[] = {40, 30, 20, 10};
        char cap[] = new char[2];      // the schedule: one job per slot
        for (int i = 0; i < 2; i++) {
            cap[i] = '\0';             // '\0' marks a free slot
        }
        int i = 0;
        int j = dl[i] - 1;             // try the latest slot allowed by the deadline
        while (i < 4) {
            if (j < 0) {               // no free slot for job i: drop it
                i++;
                if (i < 4)
                    j = dl[i] - 1;
            } else if (j < 2 && cap[j] == '\0') {
                cap[j] = name[i];      // place job i in the latest free slot
                i++;
                if (i < 4)
                    j = dl[i] - 1;
            } else {
                j = j - 1;             // slot taken or out of range: try an earlier one
            }
        }
        for (int i1 = 0; i1 < 2; i1++)
            System.out.println(cap[i1]);
    }
}
This is an interview question:
4 men - each can cross a bridge in 1, 3, 7, and 10 min. Only 2 people
can walk the bridge at a time. How many minutes would they take
to cross the bridge?
I can manually think of a solution: 10 and 7 go together; as soon as 7 reaches the destination, 3 hops in, and 10 and 3 finish together. Then 1 goes by itself, and the total time taken is 11. So: {10, 7}, followed by {10, 3}, followed by {1}.
I am unable to see how to turn this idea into a general algorithm. Can someone help me convert it into real code?
The problem you describe is not subset sum.
Yet you can:
order the array a by descending time
int time1 = 0; // total time taken by the first lane
int time2 = 0; // total time taken by the second lane
for i : 0..n
if(time1 < time2) // add the time to the most "available" lane
time1 += a[i];
else
time2 += a[i];
endif
endfor
return max(time1, time2);
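A sketch of that pseudocode in Python (this is the longest-processing-time greedy for two lanes; it works well here, though it is not guaranteed optimal for every input):

```python
# Greedy from the pseudocode above: sort times descending, then always
# put the next person in the lane that is currently free sooner.
def cross_time(times):
    lane1 = lane2 = 0
    for t in sorted(times, reverse=True):
        if lane1 <= lane2:   # add the time to the more "available" lane
            lane1 += t
        else:
            lane2 += t
    return max(lane1, lane2)

print(cross_time([1, 3, 7, 10]))  # 11
```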
This is not a subset sum problem but a job shop scheduling problem; see the Wikipedia entry on Job shop scheduling. You have four "jobs", taking 1, 3, 7 and 10 minutes respectively, and two "lanes" for conducting them, that is, the capacity 2 of the bridge. Calculating an exact solution for general job shop scheduling is hard.
The problem description is as follows:
There are n events for particular day d having start time and duration. Example:
e1 10:15:06 11ms (ms = milli seconds)
e2 10:16:07 12ms
......
I need to find the time x and the count n, where x is the time at which the maximum number of events were executing.
Solution I am thinking is:
Scan every millisecond of day d. But that requires 86400000*n calculations in total. Example:
Check at 00:00:00.001 how many events are running
Check at 00:00:00.002 how many events are running
Take the max over the whole range
Second solution I am thinking is:
For each eventi in all events
    Set running_event = 1
    For each eventj in all events where eventj != eventi
        if eventj.start_time in Range(eventi.start_time, eventi.start_time + eventi.execution_time)
            running_event++
Then take the max of running_event
Is there any better solution for this?
This can be solved in O(n log n) time:
Make an array of all events. This array is already partially sorted: O(n)
Sort the array: O(n log n); your library should be able to make use of the partial sortedness (timSort does that very well); look into distribution-based sorting algorithms for better expected running time.
Sort event boundaries ascending w.r.t. the boundary time
Sort event ends before event starts if touching intervals are considered non-overlapping
(Sort event ends after event starts if touching intervals are considered overlapping)
Initialise running = 0, running_best = 0, best_at = 0
For each event boundary:
If it's a start of an event, increment running
If running > running_best, set running_best = running and best_at = current event time
If it's an end of an event, decrement running
output best_at
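The sweep above can be sketched in Python; sorting (time, delta) pairs with delta = -1 for ends makes ends sort before starts at equal times, i.e. touching intervals count as non-overlapping (the event data below is made up):

```python
# Sweep line: turn each event into a +1 boundary at its start and a -1
# boundary at its end, sort, then track the running count and remember
# the time at which it peaks.
def busiest_time(events):  # events: list of (start, end)
    boundaries = []
    for start, end in events:
        boundaries.append((start, 1))
        boundaries.append((end, -1))
    boundaries.sort()      # at equal times, -1 (end) sorts before +1 (start)
    running = best = 0
    best_at = None
    for time, delta in boundaries:
        running += delta
        if running > best:
            best, best_at = running, time
    return best_at, best

print(busiest_time([(0, 1.2), (0.5, 1.5), (1.25, 3)]))  # (0.5, 2)
```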
You could reduce the number of points you check by checking only the endpoints of all intervals: for each interval (task) I lasting from t1 to t2, you only need to check how many tasks are running at t1 and at t2 (assuming the task runs from t1 to t2 inclusive; if exclusive, check t1-EPSILON, t1+EPSILON, t2-EPSILON, t2+EPSILON).
It is easy to see (convince yourself why) that you cannot do better at any point these candidates do not cover.
Example:
tasks run in `[0.5,1.5],[0,1.2],[1,3]`
candidates: 0,0.5,1,1.2,1.5,3
0 -> 1 tasks
0.5 -> 2 tasks
1 -> 3 tasks
1.2 -> 3 tasks (assuming inclusive, end of interval)
1.5 -> 2 tasks (assuming inclusive, end of interval)
3 -> 1 task (assuming inclusive, end of interval)
I was reading a problem which seemed to me to be an assignment problem. Here is the abstract:
A company has N jobs. N candidates come to apply for them, but at different times.
Given an NxN matrix in which cell (i, j) denotes the time when job-seeker i approaches the company for job j, you have to find a valid one-to-one assignment. If a job is assigned to a candidate, that candidate does not look for more jobs. No two candidates may be given the same job. Also, at any given moment no two candidates may be at the same job office. The output should be any one permutation which satisfies the above constraints.
eg:
Input:
1 2 3
4 5 6
7 8 9
Output:
3 2 1
Explanation: at time = 1 sec the 1st candidate goes to the first job, then at time = 2 sec to the second job, but he is finally assigned job 3 at time 3. Then at the 5th sec job 2 is assigned to the 2nd candidate, so he will not go for job 3 at time = 6. Finally the 1st job is assigned to the 3rd candidate at t = 7.
Note that any other permutation is incorrect. For example, output (1 2 3) would be wrong because the 1st candidate would be assigned the first job, so he would not look for jobs 2 and 3; but at the 4th sec the 2nd candidate would also apply for the 1st job, which already has the 1st person in its office.
My question is: how does one deal with such assignment problems?
If you order the (i, j) pairs by time, then whichever person applied for a job last, give that person that job. There will still be someone available for every other job at an earlier time (because otherwise that time wouldn't have been the maximum).
Keep repeating this and you get an assignment fairly quickly:
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

dictionary = {}
for person in range(3):
    for job in range(3):
        time = matrix[person][job]
        dictionary[time] = (person, job)

ordered_time = sorted(dictionary.keys(), reverse=True)

taken_job = set()
taken_person = set()
assignment = []
for time in ordered_time:
    person, job = dictionary[time]
    if person not in taken_person and job not in taken_job:
        assignment.append("t=%s, i=%s, j=%s" % (time, person, job))
        taken_job.add(job)
        taken_person.add(person)

print(assignment)
# ['t=9, i=2, j=2', 't=5, i=1, j=1', 't=1, i=0, j=0']
This is the BLOCKING problem from the CodeChef August Challenge programming competition, which is currently running. It is against the rules to ask for hints of this sort while the competition is running.
http://www.codechef.com/AUG12/problems/BLOCKING
Once the competition has completed at the weekend, you will be able to get your answer by looking at other competitors' solutions.