I am dealing with a weighted interval scheduling problem. In the traditional formulation, we have a list {i_1, ..., i_n} of jobs with weights w_j. I found a pretty straightforward approach, with an example from the book "Algorithm Design" by Kleinberg and Tardos, where dynamic programming is based on initially sorting the intervals by finishing time (https://www.cs.princeton.edu/courses/archive/spr05/cos423/lectures/06dynamic-programming.pdf). The algorithm makes use of the concept p_j (predecessor), which is the largest index i < j such that job i does not conflict with job j. In my specific case, however, several jobs have the same finish time, so I would have several p_js. Because of that, I am not sure how straightforward or appropriate this DP approach would be for my problem. Do you have any suggestions?
Observe that you need to order the jobs by finishing time using <=, not <: f_1 <= f_2 <= ... <= f_n.
For this formula it doesn't matter if you have several jobs with the same ending time.
If several compatible jobs share a finishing time, p(j) is the one with the largest index among them.
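To make the tie handling concrete, here is a minimal Python sketch of the DP from the linked slides. The function name and the (start, finish, weight) tuple layout are my own assumptions, and a job is treated as compatible with j when its finish time is <= the start of j.

```python
from bisect import bisect_right

def max_weight_schedule(jobs):
    """jobs: list of (start, finish, weight). Returns (best_weight, chosen_jobs)."""
    # Sort by finish time; jobs with equal finish times may come in any order.
    jobs = sorted(jobs, key=lambda job: job[1])
    finishes = [finish for _, finish, _ in jobs]
    n = len(jobs)

    # p[j] = how many of the jobs before j finish no later than job j starts.
    # In 1-based terms that is the largest compatible index, so ties in
    # finish time are resolved towards the largest index automatically.
    p = [bisect_right(finishes, jobs[j][0], 0, j) for j in range(n)]

    # opt[j] = best total weight using only the first j jobs.
    opt = [0] * (n + 1)
    for j in range(1, n + 1):
        weight = jobs[j - 1][2]
        opt[j] = max(opt[j - 1], weight + opt[p[j - 1]])

    # Walk back through the table to recover one optimal set of jobs.
    chosen, j = [], n
    while j > 0:
        if opt[j] == opt[j - 1]:
            j -= 1                      # job j was not needed
        else:
            chosen.append(jobs[j - 1])  # job j is part of the solution
            j = p[j - 1]
    return opt[n], chosen[::-1]
```

Several jobs finishing at the same time need no special case here: the binary search over the sorted finish times simply returns the last of them.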
I have stumbled into a problem that looks similar to the classic interval scheduling problem
However, in my case I do allow overlap between intervals; I just want to minimize it as much as possible.
Is there a canonical algorithm that solves this? I did not find a 'relaxed' alternative online.
I don't think this problem maps cleanly to any of the classical scheduling literature. I'd try to solve it as an integer program using (e.g.) OR-Tools.
What makes this problem hard is that the order of the jobs is unknown. Otherwise, we could probably write a dynamic program. Handling continuous time efficiently would be tricky but doable.
Similarly, the natural first attempt at a mixed integer programming formulation would be to have a variable for the starting time of each job, but the overlap function is horribly non-convex, and I don't see a good way to encode it using integer variables.
The formulation that I would try first would be to quantize time, then create a 0-1 variable x[j,s] for each valid (job, start time) pair. Then we write constraints to force each job to be scheduled exactly once:
for all j, sum over s of x[j,s] = 1.
As for the objective, we have a couple of different choices. I'll show one, but there's a lot of flexibility as long as one unit of time with i + j jobs running is worse than one unit with i jobs and a different unit with j jobs.
For each time t, we make a non-negative integer variable y[t] that will represent the number of excess jobs running at t. We write constraints:
for all t, -y[t] + sum over (j,s) overlapping t of x[j,s] ≤ 1.
Technically this constraint only forces y[t] to be greater than or equal to the number of excess jobs. But the optimal solution will take it to be equal, because of the objective:
minimize sum over t of y[t].
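To sanity-check the quantized model on a tiny instance before handing it to a solver, a brute-force sketch in Python could look like this. It is not an integer program, just an enumeration of the same x[j,s] choices with the same objective. The names `durations`, `allowed_starts` and `horizon` are assumptions of mine, and every allowed start is assumed to keep the job inside the horizon.

```python
from itertools import product

def excess_overlap(starts, durations, horizon):
    """Objective value: sum over slots t of y[t] = max(0, jobs running at t - 1)."""
    running = [0] * horizon
    for start, duration in zip(starts, durations):
        for t in range(start, start + duration):
            running[t] += 1
    return sum(max(0, count - 1) for count in running)

def best_schedule(durations, allowed_starts, horizon):
    """Try every choice of one allowed start slot per job (x[j,s] = 1 exactly once)."""
    best = None
    for starts in product(*allowed_starts):
        cost = excess_overlap(starts, durations, horizon)
        if best is None or cost < best[0]:
            best = (cost, starts)
    return best
```

This is exponential in the number of jobs, so it is only useful for checking a real formulation against small cases.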
I'm looking for an effective way of achieving an optimal job/worker assignment. I'd use the Hungarian Algorithm, but there is a catch: a worker can be assigned to only one job at a time, and each job and each worker has its own rating. A job rated 4 can be solved either by a worker with rating 4 or by multiple workers whose combined ratings equal the rating of the job, e.g. 2+2 or 3+1 or 2+1+1 or 1+1+1+1. A job rated 2 can be solved by two workers rated 1 or one worker rated 2. I'd like to prefer one-to-one assignment whenever possible.
Is there any known algorithm or any simple way to achieve an optimal assignment in this case?
Your problem is clearly at least as hard as the Partition Problem, even just to decide whether a feasible solution exists. To show this, take a Partition instance. It can easily be transformed into your problem by creating two jobs, each rated half of the total sum of the elements, and as many workers as there are elements in the Partition instance. Each worker has a rating equal to the value of the corresponding element. Both jobs can be solved if and only if the Partition instance has a solution, which proves that your problem is NP-hard.
I think we could also make an argument that the problem is NP-hard by considering Subset Sum.
Transform it into this decision problem:
Given one job with rating N whose value is in the set of all real numbers, and M workers with ratings Ri for i in [0, M), where each rating is in the set of all real numbers, is there a subset of workers whose rating adds up to N?
In our case, we may be restricting the problem to positive integers, but the decision problem remains, and is in fact much harder because we have many jobs as well, and we want to maximize the number of jobs completed.
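For the single-job decision problem with positive integer ratings, the usual pseudo-polynomial subset-sum DP is enough. Here is a small Python sketch; the function name is hypothetical, and it ignores both the many-jobs case and the one-to-one preference from the question.

```python
def workers_for_job(job_rating, worker_ratings):
    """Return indices of workers whose ratings sum exactly to job_rating, or None."""
    # reachable[total] = one list of worker indices that adds up to `total`.
    reachable = {0: []}
    for index, rating in enumerate(worker_ratings):
        # Iterate over a snapshot so each worker is used at most once.
        for total, team in list(reachable.items()):
            new_total = total + rating
            if new_total <= job_rating and new_total not in reachable:
                reachable[new_total] = team + [index]
    return reachable.get(job_rating)
```

The table never holds more than job_rating + 1 entries, so this runs in O(M * N) steps for M workers and a job rated N. That is polynomial in the value of N but not in its bit length, which is why it does not contradict the hardness argument.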
I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
E.g., a possible set of events can be: [(1,2),(1,3),(4,2),(4,3),(3,1)] with 4 possible timeslots. All events have to be scheduled in a certain timeslot; however, waiting time per user should be minimised (time between two events) and, at the same time, the number of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sounds like a combination of Job Shop Scheduling and Meeting Scheduling with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms and is also more scalable (see research competitions for proof).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared, so that one long wait costs more than several short ones.
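As a rough illustration of such a penalty, here is a Python sketch of a score function that a local search could minimise. The schedule layout (timeslot index mapped to a list of meetings) and the function name are assumptions of mine, and each user is assumed to have at most one meeting per timeslot.

```python
def waiting_penalty(schedule):
    """schedule maps a timeslot index to the list of (user_a, user_b) meetings in it."""
    slots_by_user = {}
    for slot, meetings in schedule.items():
        for a, b in meetings:
            slots_by_user.setdefault(a, []).append(slot)
            slots_by_user.setdefault(b, []).append(slot)

    penalty = 0
    for slots in slots_by_user.values():
        slots.sort()
        for earlier, later in zip(slots, slots[1:]):
            gap = later - earlier - 1   # idle timeslots between two meetings
            penalty += gap * gap        # squared, so long waits dominate
    return penalty
```

With this score, a user who waits two slots in a row contributes 4, while two separate one-slot waits contribute only 2.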
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
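The three steps above can be sketched in Python roughly like this. The function names are made up, and the sketch does not enforce a limit on the number of timeslots.

```python
def sorted_meetings(meetings):
    # Lower-numbered user first within each pair, then sort the pairs.
    return sorted(tuple(sorted(pair)) for pair in meetings)

def schedule_in_order(meetings):
    """Place each meeting in the earliest slot after both users' previous meetings."""
    next_free = {}   # user -> first slot in which that user is free again
    schedule = []    # schedule[slot] = list of meetings held in that slot
    for a, b in meetings:
        slot = max(next_free.get(a, 0), next_free.get(b, 0))
        while len(schedule) <= slot:
            schedule.append([])
        schedule[slot].append((a, b))
        next_free[a] = next_free[b] = slot + 1
    return schedule
```

For the example from the question, `schedule_in_order(sorted_meetings([(1,2),(1,3),(4,2),(4,3),(3,1)]))` uses four slots, with (1,3) and (2,4) sharing the second one.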
I'm unfortunately not an expert on trying to reduce problems to known NP problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
Instead of brute force, you could use a random start point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for maximum-likelihood evolutionary tree, which has a similarly factorial search space.
1. Start with a random permutation and evaluate its score.
2. Make some random changes, then evaluate the score.
3. If it's not an improvement, try another modification until you find one that is (maybe with a mechanism to remember that you already tried this modification to the starting permutation).
4. Repeat from 2 with this new permutation, until you've converged on a local minimum.
5. Repeat from 1 for some other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
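Here is a minimal Python sketch of that search. Two things are my own simplifications: it tries pairwise swaps systematically instead of making random changes, and it recomputes the score from scratch after every swap rather than computing the change incrementally. The function names and the default number of restarts are arbitrary.

```python
import random

def span_score(order):
    """Schedule meetings in this order; return sum over users of (last slot - first slot)."""
    next_free, first, last = {}, {}, {}
    for a, b in order:
        slot = max(next_free.get(a, 0), next_free.get(b, 0))
        for user in (a, b):
            first.setdefault(user, slot)
            last[user] = slot
            next_free[user] = slot + 1
    return sum(last[user] - first[user] for user in last)

def hill_climb(meetings, restarts=20, seed=0):
    rng = random.Random(seed)
    best_order, best_score = list(meetings), span_score(meetings)
    for _ in range(restarts):
        order = list(meetings)
        rng.shuffle(order)                       # random starting permutation
        score = span_score(order)
        improved = True
        while improved:                          # stop at a local minimum
            improved = False
            for i in range(len(order)):
                for j in range(i + 1, len(order)):
                    order[i], order[j] = order[j], order[i]      # try a swap
                    new_score = span_score(order)
                    if new_score < score:        # keep only improvements
                        score, improved = new_score, True
                    else:
                        order[i], order[j] = order[j], order[i]  # undo the swap
        if score < best_score:                   # best result over all restarts
            best_order, best_score = list(order), score
    return best_order, best_score
```

The inner loop always terminates because the score is a non-negative integer that strictly decreases with every accepted swap.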
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.
Here is the assignment problem http://en.wikipedia.org/wiki/Generalized_assignment_problem
I have a similar task, but can't find the algorithm.
We have m tasks, n laborers, m>n. When a task is done, the laborer takes the next one (if there is a free one). If a task is taken by some laborer, no one else can take it. Each laborer has his own speed, V1..Vn, and each task has its own 'volume', W1..Wm. So, I need to distribute the tasks among the laborers with the goal of minimizing the total time needed to finish all tasks.
Please help me find an algorithm, or tell me what this problem is called.
This problem is scheduling jobs on parallel, uniformly related machines so as to minimize the makespan. There's a polynomial-time approximation scheme due to Hochbaum and Shmoys (Using dual approximation algorithms for scheduling problems: Theoretical and practical results, 1988). btilly is right that the bin-packing problem is closely related; the analyses of both Hochbaum-Shmoys and the previous best approximation, MULTIFIT, are based on techniques pioneered for bin packing.
This looks like a variation of the bin packing problem (http://en.wikipedia.org/wiki/Bin_packing_problem) that is likely NP-complete. I would therefore not worry about an exact algorithm.
Assuming that the tasks are independent, my first try would be a greedy heuristic. Given an estimate of finishing time, assign to each worker at all points the longest task that they can finish before that finishing time. Now do a binary search to find the shortest finishing time that you can get away with. Your initial upper time is the time for the fastest worker to do everything. Your initial lower time is the time for all of the workers to complete that much work if all are working at the same time.
This is clearly not always going to be perfectly optimal. But it should work reasonably well.
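A rough Python sketch of this heuristic follows. The names `fits` and `min_makespan`, the fixed number of bisection steps, and the small floating-point tolerance are all my own choices. The bisection also assumes the greedy check behaves monotonically in the deadline, which is not guaranteed in general.

```python
def fits(deadline, volumes, speeds, eps=1e-9):
    """Greedy check: give each worker the largest tasks it can finish by `deadline`."""
    remaining = sorted(volumes, reverse=True)        # largest tasks first
    plan = []
    for speed in sorted(speeds, reverse=True):       # fastest worker first
        capacity = deadline * speed                  # volume this worker can process
        taken, left = [], []
        for volume in remaining:
            if volume <= capacity + eps:
                taken.append(volume)
                capacity -= volume
            else:
                left.append(volume)
        plan.append((speed, taken))
        remaining = left
    return plan if not remaining else None           # None: some task did not fit

def min_makespan(volumes, speeds, iterations=60):
    lo = sum(volumes) / sum(speeds)   # lower bound: everyone busy the whole time
    hi = sum(volumes) / max(speeds)   # upper bound: fastest worker does everything
    best = fits(hi, volumes, speeds)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        plan = fits(mid, volumes, speeds)
        if plan is not None:
            hi, best = mid, plan      # feasible: try a shorter deadline
        else:
            lo = mid                  # infeasible: allow more time
    return hi, best
```

The returned plan lists, per worker speed, the task volumes assigned to that worker at the smallest deadline the greedy check accepted.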
I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
EDIT: There are two reasons this is not the 0-1 knapsack problem:
Knapsack requires integer values for the weights (i.e. resources consumed), whereas my resource consumption (i.e. mass in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small, and the knapsack DP runs in O(nW) time, which is exponential in the number of bits of W.)
I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate its utility based on the items not in the set
Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
fully enumerating a powerset is O(2^n), but I don't need to fully enumerate the list because once I've found a set of tasks that exceeds the budget I know that any set that adds more tasks is infeasible and can be rejected. That is if {1,2,3,4} is infeasible, so is {1,2,3,4} U {n}, where n is any one of the tasks remaining in the larger list.
Since I'm just summing duty the order of tasks doesn't matter (i.e. if {1,2,3} is feasible, so are {2,1,3}, {3,2,1}, etc.).
All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible sub-sets might speed run-time.)
I've convinced myself a good recursion algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning that no efficient (polynomial-time) algorithm for it is known. However, there's a pseudo-polynomial time solution using dynamic programming. See the Wikipedia section on it for more details.
However if the maximum utility is large, you should stick with an approximation algorithm. One such approximation scheme is to greedily select items that have the greatest utility/cost. If the budget is large and the cost of each item is small, then this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
As others have mentioned, you are trying to solve some instance of the Knapsack problem. While theoretically, you are doomed, in practice you may still do a lot to increase the performance of your algorithm. Here are some (wildly assorted) ideas:
Be aware of Backtracking. This corresponds to your observation that once you crossed out {1, 2, 3, 4} as a solution, {1, 2, 3, 4} U {n} is not worth looking at.
Apply Dynamic Programming techniques.
Be clear about your actual requirements:
Maybe you don't need the best set? Will a good one do? I am not aware if there is an algorithm which provides a good solution in polynomial time, but there might well be.
Maybe you don't need the best set all the time? Using randomized algorithms you can solve some NP-Problems in polynomial time with the risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: It's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinitely.)
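To make the backtracking idea concrete, here is a minimal recursive sketch in Python. The `utility` callback stands in for whatever set-level evaluation you have (for instance the negated opportunity cost of the tasks left out). Its name and signature are assumptions of mine, and the pruning assumes all costs are non-negative.

```python
def best_subset(costs, budget, utility):
    """Search all feasible subsets; `utility` scores a tuple of selected indices."""
    best = {"value": utility(()), "items": ()}

    def extend(start, chosen, spent):
        # Only add items with a larger index, so each set is visited once
        # regardless of the order in which its tasks could be listed.
        for item in range(start, len(costs)):
            new_spent = spent + costs[item]
            if new_spent > budget:
                continue                 # prune: every superset is infeasible too
            candidate = chosen + (item,)
            value = utility(candidate)
            if value > best["value"]:    # remember only the best set seen so far
                best["value"], best["items"] = value, candidate
            extend(item + 1, candidate, new_spent)

    extend(0, (), 0.0)
    return best["items"], best["value"]
```

Memory use stays proportional to the recursion depth, because only the current branch and the best set found so far are kept. The worst case is still exponential when most subsets fit within the budget.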
I think the following iterative algorithm will traverse the entire solution set and store the list of tasks, the total cost of performing them, and the opportunity cost of the tasks not performed.
It seems like it will execute in pseudo-polynomial time: polynomial in the number of activities and exponential in the number of activities that can fit within the budget.
ixCurrentSolution = 1
initialize empty set solution {
    oc(ixCurrentSolution) = opportunity cost of doing nothing
    tasklist(ixCurrentSolution) = empty set
    costTotal(ixCurrentSolution) = 0
}
for ixTask = 1:cActivities
    % extend only the solutions that existed before this task was considered
    cSolutions = ixCurrentSolution
    for ixSolution = 1:cSolutions
        % cost of adding ixTask to the existing solution ixSolution
        costCurrentSolution = costTotal(ixSolution) + cost(ixTask)
        if costCurrentSolution < costMax
            ixCurrentSolution++
            costTotal(ixCurrentSolution) = costCurrentSolution
            tasklist(ixCurrentSolution) = tasklist(ixSolution) U ixTask
            oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
        endif
    endfor
endfor