Minimize time to complete N tasks on P agents - algorithm

I have N tasks, where the i'th task takes A[i] time to process. Every task is independent of the others and can be scheduled at any time on any of the P processors. A task runs on exactly one processor, a processor can process any number of tasks, and each agent/processor works on only one task at a time; once begun, it must continue until the task is complete.
I want to minimize the amount of time it takes to complete all the tasks.
I am implementing this using a min-heap, i.e.
Sort the tasks in descending order.
Create a min-heap of size P, initialized to 0.
For each task i, pull the minimum from the heap, add the task time A[i] to it, and push it back onto the heap.
The time to complete all the tasks is the maximum value in the heap. This has been working so far, and I want to verify its correctness.
Do you think this breaks for any inputs?
I believe I am doing something like Greedy Number Partitioning

This is a polynomial-time algorithm for a problem that includes NP-complete problems as special cases (for example, with P=2 you have the partition problem, a form of subset sum). Therefore you should expect it not to always find the optimum.
The simplest case I could find where your algorithm breaks is weights 3, 3, 2, 2, 2 with P=2. After the descending sort, your algorithm assigns:
3 | 3
3,2 | 3
3,2 | 3,2
3,2,2 | 3,2
and will take 7. The better solution you don't find is:
3,3 | 2,2,2
which will complete in 6. (Note that with weights like 1, 1, 5, 5 the descending sort happens to save you: the greedy finds the optimal 1,5 | 1,5 split.)

Related

Algorithm to find the most efficient way to distribute non-identical work tasks between workers

Let's say I have a set of tasks that need to be done. I have two identical workers that will process them, and (for simplicity's sake) let's assume that I have perfect information on the complexity of the tasks: there are no tasks I don't know about, and I know exactly how long each one will take to complete. (But different tasks will take different amounts of time.) Each worker can only work on one task at a time, and once begun, must continue to work on it until the task is complete. There are no dependencies between tasks, such that one must be finished before another can be worked on.
Given these constraints, is there any known-best algorithm to divide the work between the two workers so that the total time to complete all tasks is minimized? The obvious, naive solution is that each time a worker is free, always assign it the longest (or shortest) remaining task, but is there any method that is more efficient?
This is the partition problem, which is NP-complete, but if the task times are relatively small integers there is a pseudo-polynomial time dynamic programming solution.
In your case, you are basically given a set of numbers - and you want to assign them to two subsets, such that the sum of subsets is equal (or closest as possible to being equal, which is the optimization variant of partition problem).
The recursive formula for the DP solution should be something similar to this:
DP[0, 0] = true
DP[0, x] = false for x != 0
DP[i, x] = DP[i-1, x - value[i]] OR DP[i-1, x]
           (assigned i to S1)       (assigned i to S2)
Calculate all values needed for DP[n, SUM] (where SUM is the total sum of task times and n is the number of tasks); DP[n, SUM/2] tells you whether a perfect split exists. For the optimization variant, take the largest x <= SUM/2 with DP[n, x] = true.
Getting the actual tasks for each subset is done by retracing your steps, similar to explained here.

Max Tasks that can be completed in given time

I recently came across this question in a forum:
You are given a straight line from 0 to 10^9. You start at zero, and there are n tasks you can perform. The i'th task is located at some point on the line and requires time t to be performed. To perform the task you need to reach that point and spend t time at that location.
Example: a task (5, 8) lies at point 5, so the travel distance is 5 and the work effort is 8.
Total effort is calculated as travel distance + time required to complete the work.
It takes one sec to travel one unit of path.
Now we are given T total seconds, and we need to complete as many tasks as possible and return to the starting position.
Find the maximum number of tasks that you can finish in time T.
Example:
3 16 - 3 tasks and 16 units of total time
2 8 - task 1 at position 2 in line, takes 8 sec to complete
4 5 - task 2 at position 4 in line, takes 5 sec to complete
5 1 - task 3 at position 5 in line, takes 1 sec to complete
Output: 2
Explanation :
If we take task 1 at location 2, which requires 8 sec: getting to location 2 takes 2s, completing the task takes 8s, and returning takes another 2s, leaving only 4s, which is not enough to complete any other task.
On the other hand, skipping the first task leaves us enough time to complete the other two tasks.
Going to location 5 and coming back costs 2 x 5 = 10s, and performing the tasks at locations 4 and 5 costs 5 + 1 = 6s. Total time spent is 10s + 6s = 16s.
I am new to graphs and DP, so I was not sure which approach to use: Hamiltonian cycle, knapsack, or longest path.
Can someone please help me with the most efficient approach to solve this?
Let's iterate over the tasks from first to last, in order of distance. When we consider task i as the farthest one performed, we subtract 2 * distance(i) + effort(i) from the budget; the most tasks we can then achieve is found by greedily packing as many earlier tasks as possible into the remaining time, taking them in order of increasing effort.
Therefore, an efficient solution could insert each element seen so far into a data structure ordered by effort, dynamically updating the best solution so far. (I originally thought of using a treap and binary search, but j_random_hacker suggested a much simpler way in the comments below this answer.)
Suggestion:
For each task n create a graph like this
Join up these graphs for all the tasks.
Run a travelling salesman algorithm to find the minimum time to do all the tasks ( = visit all the nodes in combined graph )
Then remove tasks one at a time, in an orderly sequence. This gives you a collection of results for different numbers of tasks performed. Choose the one that performs the most tasks while still remaining under the time limit.
Since you are maximizing the number of tasks performed, start by removing the longest tasks, so that you are left with lots of short tasks.

Algorithm to minimizing switching between paths

I have a list of lists in the form l = [[1,2,3],[3,4,6],...]. There are m sublists each representing a player. Each player can perform a number of tasks (there are n tasks). I would like to find the shortest path through all the steps by minimizing the number of switches between players. So basically have the same player perform the tasks consecutively as often as possible. I'm trying to write an algorithm to optimize this that runs in polynomial time but I'm having a bit of trouble coming up with a good scheme. I was thinking it could be like Dijkstra's algorithm, but I'm not exactly sure how to adapt it to fit my case. Below a concrete example of what I want.
Example
n = 5 and m = 3 such that we have a list of lists l = [[1,2,5],[1,3,5],[2,3,4]]
The algorithm would return [0,2,2,2,0]
i.e. player 0 would be chosen first, then we swap to player 2 for three tasks, then back to player 0 for the last task.
I'm just looking for pseudo code or a push in the right direction. Really struggling and brute force won't work for large numbers!
Since it is never beneficial to have a player perform fewer consecutive tasks than he can, a simple greedy algorithm suffices to find the optimal solution:
Starting with task 1, find the player that can execute the largest number of consecutive tasks starting with that first task.
Starting with the first task that the previously found player can't do, find the player that can execute the largest number of consecutive tasks starting with that task.
Repeat until all the tasks are done.
Here's a proof that this algorithm is optimal:
Let's say there's an optimal solution that has player A performing tasks i through j and then player B performing tasks j+1 through k.
If there is any player (including A) that can perform tasks i through j+1, then we can use that player to do those tasks instead, and the solution will be as good or better. Either B will perform tasks j+2 through k, and the number of player switches will be the same, or j+1 = k and we won't need player B at all.
Therefore there is an optimal solution in which every chosen player maximizes the number of consecutive moves that can be performed by that player. In fact, since every such solution is equivalent, they are all optimal.
EDIT: As I was writing this, Pham suggested using a segment tree, but no such complex data structure is necessary. If the sublists are sorted and you build an index from each task number to the sublist positions at which it can be found, you can do this in O(N) time.

Is there an exact algorithm for the minimum makespan scheduling with 2 identical machines and N processes that exists for small constraints?

If 2 identical machines are given, with N jobs where the i'th job takes T[i] time to complete, is there an exact algorithm to assign these N jobs to the 2 machines so that the makespan (the total time required to complete all N jobs) is minimized?
I need to solve the problem only for N=50.
Also note that total execution time of all the processes is bounded by 10000.
Does greedily allocating the largest remaining job to whichever machine becomes free first work?
// s1 -> machine 1, s2 -> machine 2
// a[i] -> job i, time -> total time consumed so far
// jobs are sorted in descending order and allocated one by one
// to whichever machine becomes free
long long ans = INT_MAX;
sort(a, a + n);
reverse(a, a + n);
int i = 2;
int s1 = a[0];  // remaining work on machine 1
int s2 = a[1];  // remaining work on machine 2
long long time = min(s1, s2);
s1 -= time;
s2 -= time;
while (i < n)
{
    if (s1 == 0 && s2 == 0)
    {
        // both machines are free: give each the next job
        s1 = a[i];
        if (i + 1 < n) s2 = a[i + 1];
        int c = min(s1, s2);
        time += c;
        s1 -= c;
        s2 -= c;
        i += 2;
    }
    else
    {
        // exactly one machine is free: give it the next job
        if (s1 < s2) swap(s1, s2);  // now s2 == 0 is the free machine
        s2 = a[i];
        int c = min(s1, s2);
        time += c;
        s1 -= c;
        s2 -= c;
        i++;
    }
}
assert(s1 * s2 == 0);  // at least one machine has finished
ans = min(ans, time + max(s1, s2));
The problem you described is NP-hard via a more or less straightforward reduction from subset sum, which makes an exact polynomial-time algorithm impossible unless P=NP. Greedy assignment will not yield an optimal solution in general. However, as the number of jobs is bounded by 50, any exact algorithm with running time exponential in N alone is in fact an algorithm with constant running time.
The problem can be tackled via dynamic programming as follows. Let P be the sum of all processing times, which is an upper bound for the optimal makespan. Define an array S[N][P] as the state space. The meaning of S[i][j] is the minimum makespan attainable for jobs indexed by 1,...,i where the load of machine 1 is exactly j. An outer loop iterates over the jobs, an inner loop over the target load of machine 1. In each iteration, we have to decide whether job i should run on machine 1 or machine 2. The state value of course has to be determined in such a way that only assignments which actually exist are taken into account.
In the first case (job i runs on machine 1), the load of machine 1 becomes j, which requires that S[i-1][j - T[i]] is a feasible state; the load of machine 2, namely the sum of T[i'] for i' in {1,...,i-1} minus (j - T[i]), is unchanged by this choice. The makespan for this case is the maximum of the two loads.
In the second case (job i runs on machine 2), the load of machine 1 stays j, which requires that S[i-1][j] is feasible; the load of machine 2 becomes the sum of T[i'] for i' in {1,...,i} minus j. Again the makespan for this case is the maximum of the two loads, and S[i][j] is the minimum over the two cases.
Finally, the optimal makespan can be found by taking the minimum of S[N][j] over all j. Note that the approach only calculates the optimal value, not an optimal solution itself. An optimal solution can be recovered by backtracking or by using suitable auxiliary data structures. The running time and space requirement are O(N*P), i.e. pseudo-polynomial in N.
Note that the problem and the approach are very similar to the knapsack problem. For the scheduling problem, however, the choice is not whether or not to include an item, but whether to execute a job on machine 1 or on machine 2.
Also note that the problem is actually well-studied; in the usual three-field notation it is P2||Cmax. List scheduling (assigning each job to whichever machine becomes free first) has approximation ratio 2 - 1/m, as proved in the first article below; scheduling jobs in non-increasing order of processing time (LPT) improves this to 4/3 - 1/(3m), as shown in the second.
R. L. Graham, "Bounds for certain multiprocessing anomalies," Bell System Technical Journal 45 (1966) 1563-1581.
R. L. Graham, "Bounds on multiprocessing timing anomalies," SIAM Journal on Applied Mathematics 17 (1969) 416-429.

Find the smallest set of overlapping jobs

A friend gave me a puzzle that he says can be solved in better than O(n^3) time.
Given a set of n jobs that each have a set start time and end time (overlaps are very possible), find the smallest subset such that, for every job, the subset either includes that job or includes a job that overlaps it.
I'm pretty sure that the optimal solution is to pick the job with the most unmarked overlap, add it to the solution set, then mark it, and its overlap. And repeat until all jobs are marked.
Figuring out which job has the most unmarked overlappers takes a simple adjacency matrix (O(n^2)), and this has to be redone every time a job is selected in order to update the marks, making the whole thing O(n^3).
Is there a better solution?
Let A be the set of jobs which we haven't covered yet.
1. Find the job x in A which has the minimal end time t.
2. From all jobs whose start time is less than t (i.e., all jobs overlapping x), pick the job j with the maximum end time.
3. Add j to the output set.
4. Remove all jobs which overlap j from A.
Repeat 1-4 until A is empty.
A simple implementation will run in O(n^2). Using interval trees it's probably possible to solve it in O(n log n).
The basic idea behind why it's an optimal solution (not a formal proof): we have to pick one job whose start time is less than t, so that x will be covered. If we let S be the set of all jobs whose start time is less than t, it can be shown that j will overlap the same jobs as any job in S, plus possibly more. Since we have to pick one job in S, the best choice is j. We can use this idea to form a proof by induction on the number of jobs.
We can achieve an O(n log n) solution with a dynamic programming approach. In particular, let S(k) denote the size of the smallest set that includes the k'th job and covers the first k jobs (ordered by start time). We should first add an auxiliary job (∞,∞), so the result will be the DP value for this final job, minus one.
To compute S(k), consider the job p(k) which ends before job k starts but has maximal start time. Note that p is an increasing function. S(k) will then be one more than the minimum S(i) with end(i) > start(p(k)).
We can efficiently find this job by maintaining a min-heap of potential jobs ordered by S(k). After computing each S(k), we add the job to the heap. When we want to get a job, we pop jobs from the top of the heap which end too early, until we find a suitable one. This takes a total of at most O(n log n), since we do at most O(n) of each heap operation (pop/peek/push).
The remaining task is to compute the p(k) values efficiently. One way to do this is to iterate over all job starts and ends (in increasing time order), keeping track of the latest-starting job that has ended.
