So this is a bit of a thought-provoking question my professor posed to get across the idea of NP-completeness. I get WHY there should be a solution, given the rules of NP-completeness, but I don't know how to find it. So here is the problem:
The problem is a simple task-scheduling problem with two processors. Each processor can handle one task at a time, and any two tasks can run simultaneously. Each task has an end time e, by which it must be completed, and a duration d. All time values (end times, durations, and the current time in the system, which starts at 0) are integers. So we are given a list of n tasks, and we need to use these two processors to schedule ALL of them. If any one cannot be scheduled, the algorithm has to return no solution. Keep in mind that the order does not matter, and it doesn't matter which processor gets which task, as long as there is no overlap and each task finishes before its deadline.
Here is where the problem gets conceptual/abstract. Say we have access to a special little function. We have no idea how it works; all we know is this: given a list of n tasks and the current schedule, it returns true or false depending on whether the problem is still solvable from this point. The function assumes the already-scheduled tasks are set in stone and will only consider placements of the unscheduled tasks. However, all it does is return true or false; it will not give you the actual schedule even if it finds that a solution exists. The point is that you can use this special function in the implementation of the scheduling algorithm. The goal is to solve the scheduling problem, returning a working schedule with every job scheduled properly, using a polynomial number of calls to the special function.
EDIT: To clarify, the question is this: create a solution that schedules all n tasks with none going over its deadline, using a polynomial number of calls to the "special function".
I thought this problem was meant to show that verifying a solution is polynomial while finding one is not. But my professor is insistent that there is a way to solve it using a polynomial number of calls to the special function. Since the problem as a whole is NP-complete, this would show that the nonpolynomial part of the runtime comes in during the "decision portion" of the algorithm.
If you would like me to clear up anything just leave a comment, I know this wasn't a perfect explanation of the problem.
Given an oracle M that returns only true or false:

input:
    tasks - a list of tasks
output:
    schedule - a triplet (task, processor, start) for each task
algorithm:
    while there is some unscheduled task:
        let p be the processor whose scheduled tasks finish first
        let x be the first available time on p
        for each unscheduled task t:
            tentatively assign t the triplet (t, p, x)
            run M on the current tasks and partial schedule
            if M answers true:
                keep the assignment, break out of the for loop, and
                continue to the next iteration of the while loop
            else:
                unassign t and continue to the next task
        if no triplet was assigned, return NO_SOLUTION
    return all assigned triplets
The above uses a polynomial number of calls to M - O(n^2) in the worst case, since each of the at most n iterations of the while loop makes at most n oracle calls.
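Here is a minimal Java sketch of the same procedure, assuming a hypothetical Oracle interface standing in for M (the names Oracle, solvable, Task, and the int[]{processor, start} encoding are all invented for illustration):

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class OracleScheduler {
        // Hypothetical stand-in for the "special function" M: given every task
        // and the partial schedule (treated as fixed), can the remaining tasks
        // still be scheduled so that all deadlines are met?
        interface Oracle {
            boolean solvable(List<Task> tasks, Map<Task, int[]> partialSchedule);
        }

        static class Task {
            final int duration, deadline;
            Task(int duration, int deadline) { this.duration = duration; this.deadline = deadline; }
        }

        // Returns task -> {processor, start}, or null for NO_SOLUTION.
        static Map<Task, int[]> schedule(List<Task> tasks, Oracle m) {
            Map<Task, int[]> assigned = new LinkedHashMap<>();
            int[] freeAt = new int[2];                    // next free instant on each processor
            while (assigned.size() < tasks.size()) {
                int p = (freeAt[0] <= freeAt[1]) ? 0 : 1; // processor that frees up first
                int x = freeAt[p];                        // earliest available start time
                Task chosen = null;
                for (Task t : tasks) {
                    if (assigned.containsKey(t)) continue;
                    assigned.put(t, new int[]{p, x});     // tentative assignment
                    if (m.solvable(tasks, assigned)) { chosen = t; break; }
                    assigned.remove(t);                   // undo; try the next task
                }
                if (chosen == null) return null;          // NO_SOLUTION
                freeAt[p] = x + chosen.duration;
            }
            return assigned;
        }
    }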
Correctness of the above algorithm can be proved by induction, where the induction hypothesis is: "after round k of the while loop, if there was a solution before the round, there is still a solution after it (and after the assignment of some task)". Once this claim is proved, correctness of the algorithm follows trivially by taking k = #tasks.
Formally proving the above claim:
Base of induction is trivial for k=0.
Hypothesis: for every k < i, the claim "if there was a solution before round k, there is still one after round k" holds.
Proof:
Assume there is some solution {(t_j, p_j, x_j) | j = 1, ..., n}, ordered so that j < u if and only if x_j < x_u, and assume that t_1, t_2, ..., t_(i-1) are assigned exactly as the algorithm assigned them (induction hypothesis).
Now we are going to assign a task in round i, and we will be able to do so: the algorithm finds the smallest available time stamp x_i and places some task on it. Since t_i itself is one of the candidates, the loop will not "fail" and yield NO_SOLUTION.
Also, since the algorithm does not yield NO_SOLUTION in iteration i, by the correctness of M it picks some task t such that after assigning (t, p, x) there is still a solution, and the claim for step i is proven.
Related
I have stumbled onto a problem that looks similar to the classic interval scheduling problem.
However, in my case I do allow overlap between intervals - I just want to minimize it as much as possible.
Is there a canonical algorithm that solves this? I did not find a 'relaxed' alternative online.
I don't think this problem maps cleanly to any of the classical scheduling literature. I'd try to solve it as an integer program using (e.g.) OR-Tools.
What makes this problem hard is that the order of the jobs is unknown. Otherwise, we could probably write a dynamic program. Handling continuous time efficiently would be tricky but doable.
Similarly, the natural first attempt at a mixed integer programming formulation would be to have a variable for the starting time of each job, but the overlap function is horribly non-convex, and I don't see a good way to encode it using integer variables.
The formulation that I would try first would be to quantize time, then create a 0-1 variable x[j,s] for each valid (job, start time) pair. Then we write constraints to force each job to be scheduled exactly once:
for all j, sum over s of x[j,s] = 1.
As for the objective, we have a couple of different choices. I'll show one, but there's a lot of flexibility as long as one unit of time with i + j jobs running is worse than one unit with i jobs and a different unit with j jobs.
For each time t, we make a non-negative integer variable y[t] that will represent the number of excess jobs running at t. We write constraints:
for all t, -y[t] + sum over (j,s) overlapping t of x[j,s] ≤ 1.
Technically this constraint only forces y[t] to be greater than or equal to the number of excess jobs. But the optimal solution will take it to be equal, because of the objective:
minimize sum over t of y[t].
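To make the formulation concrete, here is a rough OR-Tools sketch in Java (MPSolver) on a toy instance; the horizon, durations, and all names are invented for illustration, and this is a sketch of the model above rather than a tuned implementation:

    import com.google.ortools.Loader;
    import com.google.ortools.linearsolver.MPConstraint;
    import com.google.ortools.linearsolver.MPObjective;
    import com.google.ortools.linearsolver.MPSolver;
    import com.google.ortools.linearsolver.MPVariable;

    public class MinOverlapSchedule {
        public static void main(String[] args) {
            Loader.loadNativeLibraries();
            MPSolver solver = MPSolver.createSolver("SCIP");

            int horizon = 10;        // quantized time steps 0..horizon-1 (assumed)
            int[] dur = {3, 4, 2};   // toy job durations
            int n = dur.length;

            // x[j][s] = 1 iff job j starts at step s (only valid starts get a variable)
            MPVariable[][] x = new MPVariable[n][horizon];
            for (int j = 0; j < n; j++) {
                MPConstraint once = solver.makeConstraint(1, 1, "once_" + j); // schedule j exactly once
                for (int s = 0; s + dur[j] <= horizon; s++) {
                    x[j][s] = solver.makeBoolVar("x_" + j + "_" + s);
                    once.setCoefficient(x[j][s], 1);
                }
            }

            // y[t] counts the excess jobs running at step t: -y[t] + sum x <= 1
            MPObjective obj = solver.objective();
            for (int t = 0; t < horizon; t++) {
                MPVariable y = solver.makeIntVar(0, n, "y_" + t);
                MPConstraint cap = solver.makeConstraint(Double.NEGATIVE_INFINITY, 1, "cap_" + t);
                cap.setCoefficient(y, -1);
                for (int j = 0; j < n; j++)
                    for (int s = Math.max(0, t - dur[j] + 1); s <= t && s + dur[j] <= horizon; s++)
                        cap.setCoefficient(x[j][s], 1);   // job j started at s covers step t
                obj.setCoefficient(y, 1);                 // minimize total excess
            }
            obj.setMinimization();

            System.out.println("status: " + solver.solve() + ", total overlap: " + obj.value());
        }
    }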
I am trying to understand how the greedy algorithm for the scheduling problem works.
I've been reading and googling for a while, but I could not understand it.
We have n jobs to schedule on a single resource. Job i has a requested start time s(i) and finish time f(i).
There are several greedy orderings by which we might accept jobs...
Accept in increasing order of s ("earliest start time")
Accept in increasing order of f - s ("shortest job time")
Accept in increasing order of number of conflicts ("fewest conflicts")
Accept in increasing order of f ("earliest finish time")
And the book says the last one, accepting in increasing order of f, always gives an optimal solution.
However, it does not explain why that one always gives an optimal solution, nor why the other three do not.
It provides a figure showing why the other three fail, but I could not understand what it means.
Since I have low reputation, I cannot post the image, so I will try to draw it.
|---| |---| |---|
|-------------------------|
increasing order of s
underestimated solution
|-----------| |-----------|
|-----|
increasing order of f-s
underestimated solution
|----| |----| |----| |----|
|-----| |-----| |-----|
|-----| |-----|
|-----| |-----|
increasing order of number of conflicts.
underestimated solution
This is what it looks like, and I don't see why each of these is a counterexample.
If anyone can explain why each greedy idea does or does not work, it would be very helpful.
Thank you.
I think I can explain this.
Let's say we have n jobs, with start times s[1..n] and finish times f[1..n]. If we sort them by finish time, we will always be able to complete the largest number of tasks. Let's see how.
If a job finishes earliest (even if it started later, like a short job), then we have the most time left for later jobs. Suppose there were some other job we could start or complete in this interval so that our total count increases. That is not actually possible: if some other task completed earlier, then it would be the one with the earliest finish time, so we would be working on that one instead; and if some other task has started but not yet completed, then by selecting it we would have completed nothing so far, whereas now we have completed at least one task. So in either case, this is the optimal choice.
There may be many solutions achieving the maximum number of tasks in an interval; EFT gives one such solution, but it always achieves that maximum.
I hope I could explain it well.
Since vish4071 has already explained why selecting by earliest finish time leads to an optimal solution, I'll only explain the counterexamples. Task [a,b] starts at a and ends at b. I'll use the counterexamples you have provided.
Earliest start time
Suppose tasks [1,10], [2,3], [4,5], [6,7]. The earliest start time strategy will choose [1,10] and then refuse the other 3, since they all collide with the first one. Yet we can see that [2,3], [4,5], [6,7] is the optimal solution, so earliest start time strategy will not always yield the optimal result.
Shortest execution time
Suppose tasks [1,10], [11,20], [9,12]. This strategy would choose [9,12] and then reject the other two, but optimal solution is [1,10], [11,20]. Therefore, shortest execution time strategy will not always lead to optimal result.
Least amount of collisions
This strategy seems promising, but your example with 11 tasks proves it is not optimal. Suppose tasks: [1,4], 3x[3,6], [5,8], [7,10], [9,12], 3x[11,14] and [13,16]. [7,10] has only 2 collisions with other tasks ([5,8] and [9,12]), fewer than any other task, so it is selected first by the fewest-collisions strategy. Then [1,4] and [13,16] are selected, and all the other tasks are rejected because they collide with already selected tasks. That is 3 tasks, but 4 tasks can be selected without collision: [1,4], [5,8], [9,12] and [13,16].
You can also see that the earliest finish time strategy chooses the optimal solution in all of these examples. Note that more than one optimal solution may exist with the same number of selected tasks; in such cases, the earliest finish time strategy will always choose one of them.
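For completeness, here is a small Java sketch of the earliest finish time greedy; the int[]{start, finish} representation is just an illustration. Run on the earliest-start-time counterexample above, it returns the optimal three tasks:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class IntervalScheduling {
        // Earliest-finish-time greedy: sort by finish time, then accept every
        // interval that starts no earlier than the last accepted finish.
        static List<int[]> earliestFinishFirst(int[][] intervals) {
            int[][] sorted = intervals.clone();
            Arrays.sort(sorted, Comparator.comparingInt((int[] iv) -> iv[1]));
            List<int[]> accepted = new ArrayList<>();
            int lastFinish = Integer.MIN_VALUE;
            for (int[] iv : sorted) {
                if (iv[0] >= lastFinish) {   // no collision with the last accepted task
                    accepted.add(iv);
                    lastFinish = iv[1];
                }
            }
            return accepted;
        }

        public static void main(String[] args) {
            // the "earliest start time" counterexample above
            int[][] tasks = {{1, 10}, {2, 3}, {4, 5}, {6, 7}};
            for (int[] iv : earliestFinishFirst(tasks))
                System.out.println("[" + iv[0] + "," + iv[1] + "]");  // [2,3] [4,5] [6,7]
        }
    }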
Say I want to find a natural number n such that n + n = 3.
To solve this computationally, I would run an algorithm:
int n = 1;
while (n + n != 3)
    n++;
System.out.println(n);
Of course we know that this loop is infinite. But is there an algorithm that can predict whether this particular loop will be infinite or finite? (This is similar to, but different from, the halting problem: my desired algorithm only has to examine this one loop, whereas a halting-problem solver would have to handle all programs.) If there is, what would the algorithm be?
In answer to the specific question asked ("is there an algorithm that can predict if this loop will be infinite or finite"), the algorithm would be to simply report "INFINITE".
If you are looking for something more general (i.e. something that works on arbitrary source code), there are algorithms that work on various restricted classes of programs. But it has been known for a long time that no such algorithm exists for the general case.
While that program is running, its state can be fully defined by the value of the variable 'n' and by the next operation (amongst a finite set of operations) to be executed. An algorithm can simulate the execution of that program step by step, at each step checking whether both the data and the next operation match those of an earlier step. If a match is found, then the algorithm stops the simulation and reports that the program does not halt. If a match is not found, the new state of the program is logged for future comparisons. If there is no further operation to be executed, i.e. the program finishes running by itself, then the algorithm reports that the program halts.
This algorithm is of course very inefficient, but it demonstrates a very general fact: it is possible to predict whether a specific program halts whenever that program uses only a bounded amount of memory (for code and data).
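As an illustration, here is that state-logging idea applied to the loop above, in Java. The only state that matters is the value of n, so it suffices to remember the values already seen. (Since int has 2^32 possible values, the set can grow enormous; this is a conceptual sketch, not a practical tool.)

    import java.util.HashSet;
    import java.util.Set;

    public class LoopChecker {
        public static void main(String[] args) {
            Set<Integer> seen = new HashSet<>();  // states (values of n) already visited
            int n = 1;
            while (n + n != 3) {
                if (!seen.add(n)) {               // state repeated => the loop can never halt
                    System.out.println("INFINITE");
                    return;
                }
                n++;                              // int wraps around, so states must repeat
            }
            System.out.println("halts with n = " + n);
        }
    }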
I am working on a problem that appears to be a variant of the assignment problem. There are tasks that need to be assigned to servers, and the sum of costs over servers needs to be minimized. The following conditions hold:
Each task has a unit size.
A task may not be divided among more than one server; it must be handled by exactly one server.
A server has a limit on the maximum number of tasks that may be assigned to it.
The cost function for task assignment is a staircase function. A server incurs a minimum cost 'a', and for each task handled by the server the cost increases by 1. If the number of tasks assigned to a particular server exceeds half of its capacity, there is a jump in that server's cost equal to a positive number 'd' (see the sketch after this list).
Tasks have preferences, i.e., a given task may only be assigned to one of a few specific servers.
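For concreteness, the staircase cost could be written like this in Java (a sketch; whether an idle server still pays the minimum cost 'a' is left open by the description above, so that case is an assumption):

    // Cost of one server with base cost a, jump d, and the given capacity,
    // when k tasks are assigned to it.
    static int serverCost(int k, int capacity, int a, int d) {
        if (k == 0) return 0;                        // assumption: unused servers cost nothing
        return a + k + (k > capacity / 2 ? d : 0);   // jump of d once past half capacity
    }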
I have a feeling that this is an NP-Hard problem, but I can't seem to find an NP-Complete problem to map to it. I've tried Bin Packing, Assignment problem, Multiple Knapsacks, bipartite graph matching but none of these problems have all the key characteristics of my problem. Can you please suggest some problem that maps to it?
Thanks and best regards
Saqib
Have you tried reducing the set partitioning problem to yours?
The SET-PART (stands for "set partitioning") decision problem asks whether there exists a partition of a given set S of numbers into two sets S1 and S2, so that the sum of the elements in S1 equals the sum of elements in S2. This problem is known to be NP-complete.
Your problem seems related to the m-PROCESSOR decision problem. Given a nonempty set A of n>0 tasks {a1,a2,...,an} with processing times t1,t2,...,tn, the m-PROCESSOR problem asks if you can schedule the tasks among m equal processors so that all tasks finish in at most k>0 time steps. (Processing times are (positive) natural numbers.)
The reduction of SET-PART to m-PROCESSOR is very easy: first show that the special case, with m=2, is NP-complete; then use this to show that m-PROCESSOR is NP-complete for all m>=2. (A reduction in Slovene.)
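For instance (an illustrative example, not from the linked reduction): given S = {3, 1, 1, 2, 2, 1} with sum 10, create six tasks with processing times 3, 1, 1, 2, 2, 1 and ask the 2-PROCESSOR question with k = 5; the tasks can all finish within 5 time steps exactly when S can be partitioned into two subsets that each sum to 5.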
Hope this helps.
EDIT 1: Oops, this m-PROCESSOR thingy seems very similar to the assignment problem.
Here is the assignment problem http://en.wikipedia.org/wiki/Generalized_assignment_problem
I have a similar task, but I can't find the algorithm.
We have m tasks and n laborers, m > n. When a laborer finishes a task, he takes the next one (if a free one remains); once a task is taken by some laborer, no one else can take it. Each laborer has his own speed, V1..Vn, and each task has its own 'volume', W1..Wm. So I need to distribute the tasks among the laborers with the goal of minimizing the time to finish all tasks.
Please help me find an algorithm, or the name of this problem. :)
This problem is scheduling jobs on parallel, uniformly related machines so as to minimize the makespan. There's a polynomial-time approximation scheme due to Hochbaum and Shmoys (Using dual approximation algorithms for scheduling problems: Theoretical and practical results, 1988). btilly is right that the bin-packing problem is closely related; the analyses of both Hochbaum--Shmoys and the previous best approximation MULTIFIT are based on techniques pioneered for bin packing.
This looks like a likely NP-complete variation of the bin packing problem (http://en.wikipedia.org/wiki/Bin_packing_problem). I would therefore not worry about an exact algorithm.
Assuming that the tasks are independent, my first try would be a greedy heuristic. Given an estimate of the finishing time, have each worker, whenever free, take the longest remaining task that they can finish before that time. Now do a binary search to find the shortest finishing time that you can get away with. Your initial upper bound is the time for the fastest worker to do everything; your initial lower bound is the time needed if every worker is busy the whole time.
This is clearly not always going to be perfectly optimal. But it should work reasonably well.
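A rough Java sketch of that heuristic, under the assumption that a worker with speed V finishing by time T can process at most V*T total volume (all names here are illustrative):

    import java.util.Arrays;

    public class MakespanHeuristic {
        // Greedy feasibility check: can every task finish by time T?
        // Each worker grabs the largest remaining tasks that fit its budget v * T.
        static boolean feasible(double[] volumesDesc, double[] speeds, double T) {
            boolean[] taken = new boolean[volumesDesc.length];
            int remaining = volumesDesc.length;
            for (double v : speeds) {
                double budget = v * T;
                for (int j = 0; j < volumesDesc.length; j++) {
                    if (!taken[j] && volumesDesc[j] <= budget) {
                        budget -= volumesDesc[j];
                        taken[j] = true;
                        remaining--;
                    }
                }
            }
            return remaining == 0;
        }

        static double minFinishTime(double[] volumes, double[] speeds) {
            double[] vols = volumes.clone();
            Arrays.sort(vols);                                 // ascending...
            for (int i = 0, j = vols.length - 1; i < j; i++, j--) {
                double tmp = vols[i]; vols[i] = vols[j]; vols[j] = tmp;  // ...then reversed
            }
            double total = 0, speedSum = 0, maxSpeed = 0;
            for (double w : vols) total += w;
            for (double v : speeds) { speedSum += v; maxSpeed = Math.max(maxSpeed, v); }
            double lo = total / speedSum;   // lower bound: everyone busy the whole time
            double hi = total / maxSpeed;   // upper bound: fastest worker does everything
            for (int iter = 0; iter < 60; iter++) {  // bisect to high precision
                double mid = (lo + hi) / 2;
                if (feasible(vols, speeds, mid)) hi = mid; else lo = mid;
            }
            return hi;
        }

        public static void main(String[] args) {
            double[] volumes = {7, 3, 5, 2};   // W1..Wm
            double[] speeds = {2, 1};          // V1..Vn
            System.out.printf("estimated finish time: %.3f%n", minFinishTime(volumes, speeds));
        }
    }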