I have stumbled into a problem that looks similar to the classic interval scheduling problem
However, in my case I do allow overlap between intervals- just want to minimize it as much as possible.
Is there a canonic algorithm that solves this? I did not find a 'relaxed' alternative online.
I don't think this problem maps cleanly to any of the classical scheduling literature. I'd try to solve it as an integer program using (e.g.) OR-Tools.
What makes this problem hard is that the order of the jobs is unknown. Otherwise, we could probably write a dynamic program. Handling continuous time efficiently would be tricky but doable.
Similarly, the natural first attempt for a mixed integer programming formulation would be to have a variable for the starting time of each job, but the overlap function is horribly non-convex, and I don't see a good way to encode using integer variables.
The formulation that I would try first would be to quantize time, then create a 0-1 variable x[j,s] for each valid (job, start time) pair. Then we write constraints to force each job to be scheduled exactly once:
for all j, sum over s of x[j,s] = 1.
As for the objective, we have a couple of different choices. I'll show one, but there's a lot of flexibility as long as one unit of time with i + j jobs running is worse than one unit with i jobs and a different unit with j jobs.
For each time t, we make a non-negative integer variable y[t] that will represent the number of excess jobs running at t. We write constraints:
for all t, -y[t] + sum over (j,s) overlapping t of x[j,s] ≤ 1.
Technically this constraint only forces y[t] to be greater than or equal to the number of excess jobs. But the optimal solution will take it to be equal, because of the objective:
minimize sum over t of y[t].
Related
I am dealing with a weighted interval problem. In the traditional formulation, we have we have a list {i_1, ..., i_n} of jobs with weights w_j. I found a pretty straightforward approach with example from the book "Algorithm Design" by Kleinberg and Tardos where Dynamic Programming that is based on initially sorting the intervals by finishing time (https://www.cs.princeton.edu/courses/archive/spr05/cos423/lectures/06dynamic-programming.pdf). The algorithm makes use of the concept p_j (predecessor) which is the largest job i non-conflicting with job j. In my specific case, however, I am dealing with a problem where there are are several jobs with the same finish time, so I would have several p_js. Because of that I am not sure how straightforward or appropriate would be this DP approach for my problem. Do you have any suggestion?
Observe that you need to order jobs using <= not < operator:
For this formula it doesn't matter if you have several jobs with the same ending time.
p(j) is one with the largest index among jobs with the same finishing time.
We have four events x,y,z,w that could be ran on a machine M. Each of the event need to use 1/3 of the machine’s capacity. Some of them cannot be ran simultaneously in one batch (say x and y cannot be ran in one batch), how to determine the minimum number of batches expected to run? The time of the event does not matter so the objective is the minimum number of batches.
My intuition is I can formulate it as a integer linear programming. Any ideas?
Based on what you described, the minimal number of events that can be run is 0, as you didn't mention any objective function as mentioned by #cricket_007.
Then the maximal number of events is less than 2, as you mentioned at least two events can not be run simultaneously.
At last, the definition of batch is not given, so not sure about that.
I am working on a problem that appears like a variant of the assignment problem. There are tasks that need to be assigned to servers. The sum of costs over servers needs to be minimized. The following conditions hold:
Each task has a unit size.
A task may not be divided among more than one servers. A task must be handled by exactly one server.
A server has a limit on the maximum number of tasks that may be assigned to it.
The cost function for task assignment is a staircase function. A server incurs a minimum cost 'a'. For each task handled by the server, the cost increases by 1. If the number of tasks assigned to a particular server exceeds half of it's capacity, there is a jump in that server's cost equal to a positive number 'd'.
Tasks have preferences, i.e., a given task may be assigned to one of a few of the servers.
I have a feeling that this is an NP-Hard problem, but I can't seem to find an NP-Complete problem to map to it. I've tried Bin Packing, Assignment problem, Multiple Knapsacks, bipartite graph matching but none of these problems have all the key characteristics of my problem. Can you please suggest some problem that maps to it?
Thanks and best regards
Saqib
Have you tried reducing the set partitioning problem to yours?
The SET-PART (stands for "set partitioning") decision problem asks whether there exists a partition of a given set S of numbers into two sets S1 and S2, so that the sum of the elements in S1 equals the sum of elements in S2. This problem is known to be NP-complete.
Your problem seems related to the m-PROCESSOR decision problem. Given a nonempty set A of n>0 tasks {a1,a2,...,an} with processing times t1,t2,...,tn, the m-PROCESSOR problem asks if you can schedule the tasks among m equal processors so that all tasks finish in at most k>0 time steps. (Processing times are (positive) natural numbers.)
The reduction of SET-PART to m-PROCESSOR is very easy: first show that the special case, with m=2, is NP-complete; then use this to show that m-PROCESSOR is NP-complete for all m>=2. (A reduction in Slovene.)
Hope this helps.
EDIT 1: Oops, this m-PROCESSOR thingy seems very similar to the assignment problem.
I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
EDIT: There are two reasons this is not the 0-1 knapsack problem:
Knapsack requires integer values for the weights (i.e. resources consumed) whereas my resource consumption (i.e. mass in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small and Knapsack is O(2^n) in W.
I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate it's utility based on the items not in the set
Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
fully enumerating a powerset is O(2^n), but I don't need to fully enumerate the list because once I've found a set of tasks that exceeds the budget I know that any set that adds more tasks is infeasible and can be rejected. That is if {1,2,3,4} is infeasible, so is {1,2,3,4} U {n}, where n is any one of the tasks remaining in the larger list.
Since I'm just summing duty the order of tasks doesn't matter (i.e. if {1,2,3} is feasible, so are {2,1,3}, {3,2,1}, etc.).
All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible sub-sets might speed run-time.)
I've convinced myself a good recursion algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning that there's no efficient way of solving the problem. However there's a pseudo-polynomial time solution using dynamic programming. See the Wikipedia section on it for more details.
However if the maximum utility is large, you should stick with an approximation algorithm. One such approximation scheme is to greedily select items that have the greatest utility/cost. If the budget is large and the cost of each item is small, then this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
As others have mentioned, you are trying to solve some instance of the Knapsack problem. While theoretically, you are doomed, in practice you may still do a lot to increase the performance of your algorithm. Here are some (wildly assorted) ideas:
Be aware of Backtracking. This corresponds to your observation that once you crossed out {1, 2, 3, 4} as a solution, {1, 2, 3, 4} u {n} is not worth looking at.
Apply Dynamic Programming techniques.
Be clear about your actual requirements:
Maybe you don't need the best set? Will a good one do? I am not aware if there is an algorithm which provides a good solution in polynomial time, but there might well be.
Maybe you don't need the best set all the time? Using randomized algorithms you can solve some NP-Problems in polynomial time with the risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: It's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinetly.)
I think the following iterative algorithm will traverse the entire solution set and store the list of tasks, the total cost of performing them, and the opportunity cost of the tasks not performed.
It seems like it will execute in pseudo-polynomial time: polynomial in the number of activities and exponential in the number of activities that can fit within the budget.
ixCurrentSolution = 1
initialize empty set solution {
oc(ixCurrentSolution) = opportunity cost of doing nothing
tasklist(ixCurrentSolution) = empty set
costTotal(ixCurrentSolution) = 0
}
for ixTask = 1:cActivities
for ixSolution = 1:ixCurrentSolution
costCurrentSolution = costTotal(ixCurrentSolution) + cost(ixTask)
if costCurrentSolution < costMax
ixCurrentSolution++
costTotal(ixCurrentSolution) = costCurrentSolution
tasklist(ixCurrentSolution) = tasklist(ixSolution) U ixTask
oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
endif
endfor
endfor
I work at a publishing house and I am setting up one of our presses for "ganging", in other words, printing multiple jobs simultaneously. Given that different print jobs can have different quantities, and anywhere from 1 to 20 jobs might need to be considered at a time, the problem would be to determine which jobs to group together to minimize waste (waste coming from over-printing on smaller-quantity jobs in a given set, that is).
Given the following stable data:
All jobs are equal in terms of spatial size--placement on paper doesn't come into consideration.
There are three "lanes", meaning that three jobs can be printed simultaneously.
Ideally, each lane has one job. Part of the problem is minimizing how many lanes each job is run on.
If necessary, one job could be run on two lanes, with a second job on the third lane.
The "grouping" waste from a given set of jobs (let's say the quantities of them are x, y and z) would be the highest number minus the two lower numbers. So if x is the higher number, the grouping waste would be (x - y) + (x - z). Otherwise stated, waste is produced by printing job Y and Z (in excess of their quantities) up to the quantity of X. The grouping waste would be a qualifier for the given set, meaning it could not exceed a certain quantity or the job would simply be printed alone.
So the question is stated: how to determine which sets of jobs are grouped together, out of any given number of jobs, based on the qualifiers of 1) Three similar quantities OR 2) Two quantities where one is approximately double the other, AND with the aim of minimal total grouping waste across the various sets.
(Edit) Quantity Information:
Typical job quantities can be from 150 to 350 on foreign languages, or 500 to 1000 on English print runs. This data can be used to set up some scenarios for an algorithm. For example, let's say you had 5 jobs:
1000, 500, 500, 450, 250
By looking at it, I can see a couple of answers. Obviously (1000/500/500) is not efficient as you'll have a grouping waste of 1000. (500/500/450) is better as you'll have a waste of 50, but then you run (1000) and (250) alone. But you could also run (1000/500) with 1000 on two lanes, (500/250) with 500 on two lanes and then (450) alone.
In terms of trade-offs for lane minimization vs. wastage, we could say that any grouping waste over 200 is excessive.
(End Edit)
...Needless to say, quite a problem. (For me.)
I am a moderately skilled programmer but I do not have much familiarity with algorithms and I am not fully studied in the mathematics of the area. I'm I/P writing a sort of brute-force program that simply tries all options, neglecting any option tree that seems to have excessive grouping waste. However, I can't help but hope there's an easier and more efficient method.
I've looked at various websites trying to find out more about algorithms in general and have been slogging my way through the symbology, but it's slow going. Unfortunately, Wikipedia's articles on the subject are very cross-dependent and it's difficult to find an "in". The only thing I've been able to really find would seem to be a definition of the rough type of algorithm I need: "Exclusive Distance Clustering", one-dimensionally speaking.
I did look at what seems to be the popularly referred-to algorithm on this site, the Bin Packing one, but I was unable to see exactly how it would work with my problem.
This seems similar to the classic Operations Research 'cutting stock' problem. For the formal mathematical treatment try
http://en.wikipedia.org/wiki/Cutting_stock_problem
I've coded solutions for the cutting stock problems using delayed column generation technique from the paper "Selection and Design of Heuristic Procedures for Solving Roll Trim Problems" by Robert W. Haessler (Management Sci. Dec '88). I tested it up to a hundred rolls without problem. Understanding how to get the residuals from the first iteration, and using them to craft the new equation for the next iteration is quite interesting. See if you can get hold of this paper, as the author discusses variations closer to your problem.
If you get to a technique that's workable I recommend using a capable linear algebra solver, rather than re-inventing the wheel. Whilst simplex method is easy enough to code yourself for fractional solutions, what you are dealing with here is harder - it's a mixed integer problem. For a modern C mixed integer solver (MIP) using eg. branch & bound, with Java/python bindings I recommend lp_solve.
When I wrote this I found this NEOS guide page useful. Online solver looks defunct through (for me it returns perl code rather than executing it). There's still some background information.
Edit - a few notes: I'll summarise the differences between your problem and that of the cutting stock:
1) cutting stock has input lengths that are indivisible. You can simulate your divisible problems by running the problem multiple times, breaking up the jobs into 1.0, {0.5, 0.5} time original lengths.
2) your 'length of print run' map to the section length
3) choose a large stock length
I'm going to try and attack the "ideal" case, in which no jobs are split between lanes or printed alone.
Let n be the number of jobs, rounded up to the nearest multiple of 3. Dummy zero-length jobs can be created to make the number of jobs a multiple of 3.
If n=3, this is trivial, because there's only one possible solution. So assume n>3.
The job (or one of the jobs if there are several) with the highest quantity must inevitably be the highest or joint-highest of the longest job group (or one of the joint longest job groups if there is a tie). Equal-quantity jobs are interchangeable, so just pick one and call that the highest if there is a tie.
So if n=6, you have two job groups, of which the longest-or-equal one has a fixed highest or joint-highest quantity job. So the only question is how to arrange the other 5 jobs between the groups. The formula for calculating the grouping waste can be expressed as 2∑hi - ∑xj where the his are the highest quantities in each group and the xjs are the other quantities. So moving from one possible solution to another is going to involve swapping one of the h's with one of the x's. (If you swapped one of the h's with another one of the h's, or one of the x's with another one of the x's, it wouldn't make any difference, so you wouldn't have moved to a different solution.) Since h2 is fixed and x1 and x2 are useless for us, what we are actually trying to minimise is w(h1, x3, x4) = 2h1 - (x3 + x4). If h1 <= x3 <= x4, this is an optimal grouping because no swap can improve the situation. (To see this, let d = x3 - h1 and note that w(x3, h1, x4) - w(h1, x3, x4) = 3d which is non-negative, and by symmetry the same argument holds for swapping with x4). So that deals with the case n=6.
For n=9, we have 8 jobs that can be moved around, but again, it's useless to move the shortest two. So the formula this time is w(h1, h2, x3, x4, x5, x6) = 2h1 + 2h2 - (x3 + x4 + x5 + x6), but this time we have the constraint that h2 must not be less than the second-smallest x in the formula (otherwise it couldn't be the highest or joint-highest of any group). As noted before, h1 and h2 can't be swapped with each other, so either you swap one of them with an appropriate x (without violating the constraint), or you swap both of them, each with a distinct x. Take h1 <= x3 <= x4 <= h2 <= x5 <= x6. Again, single swaps can't help, and a double swap can't help either because its effect must necessarily be the sum of the effects of two single swaps. So again, this is an optimal solution.
It looks like this argument is going to work for any n. In which case, finding an optimal solution when you've got an "ideal case" (as defined at the top of my answer) is going to be simple: sort the jobs by quantity and then chop the sorted list into consecutive groups of 3. If this solution proves not to be suitable, you know you haven't got an ideal case.
I will have a think about the non-ideal cases, and update this answer if I come up with anything.
If I understand the problem (and I am not sure I do), the solution could be as simple as printing job 1 in all three lanes, then job 2 in all three lanes, then job 3 in all three lanes.
It has a worst case of printing two extra sheets per job.
I can think of cases where this isn't optimal (e.g. three jobs of four sheets each would take six pages rather than four), but it is likely to be far far simpler to develop than a Bin Packing solution (which is NP-complete; each of the three lanes, over time, are representing the bins.)