algorithm similar to 'assignment task' - algorithm

Here is the assignment problem http://en.wikipedia.org/wiki/Generalized_assignment_problem
I have a similar task, but can't find the algorithm.
We have m tasks and n laborers, with m > n. When a laborer finishes a task, he takes the next free one, if there is one. Once a task is taken by a laborer, no one else can take it. Each laborer has his own speed, V1..Vn, and each task has its own 'volume', W1..Wm. So I need to distribute the tasks among the laborers so as to minimize the time needed to complete all tasks.
Please help me find an algorithm, or tell me what this problem is called.

This problem is scheduling jobs on parallel, uniformly related machines so as to minimize the makespan. There's a polynomial-time approximation scheme due to Hochbaum and Shmoys (Using dual approximation algorithms for scheduling problems: Theoretical and practical results, 1988). btilly is right that the bin-packing problem is closely related; the analyses of both Hochbaum--Shmoys and the previous best approximation MULTIFIT are based on techniques pioneered for bin packing.

This looks like a variation of the http://en.wikipedia.org/wiki/Bin_packing_problem, and it is likely NP-hard. I would therefore not worry about an exact algorithm.
Assuming that the tasks are independent, my first try would be a greedy heuristic. Given an estimate of the finishing time, assign to each worker, whenever he becomes free, the longest remaining task that he can finish before that finishing time. Now do a binary search to find the shortest finishing time that you can get away with. Your initial upper bound is the time for the fastest worker to do everything, sum(W)/max(V). Your initial lower bound is the time for all of the workers to complete that much work if all are working at the same time, sum(W)/sum(V).
This is clearly not always going to be perfectly optimal. But it should work reasonably well.
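A minimal Python sketch of the heuristic described above, under stated assumptions: the names try_assign and schedule_tasks are hypothetical, workers are tried fastest first, and a small relative tolerance absorbs floating-point error. It is a heuristic, so the deadline it returns is feasible but not necessarily optimal.

```python
def try_assign(volumes, speeds, deadline, tol=1e-9):
    """Greedy check: each worker keeps taking the largest remaining task that still fits."""
    remaining = sorted(range(len(volumes)), key=lambda t: volumes[t], reverse=True)
    plan = {}
    # Assumption: fill the fastest workers first.
    for w in sorted(range(len(speeds)), key=lambda w: speeds[w], reverse=True):
        capacity = deadline * speeds[w] * (1 + tol)  # work this worker can do in time
        mine, left = [], []
        for t in remaining:
            if volumes[t] <= capacity:
                mine.append(t)
                capacity -= volumes[t]
            else:
                left.append(t)
        plan[w] = mine
        remaining = left
    return plan if not remaining else None


def schedule_tasks(volumes, speeds, iterations=50):
    """Binary search on the finishing time; returns (deadline, {worker: [task indices]})."""
    lo = sum(volumes) / sum(speeds)   # everyone busy all the time
    hi = sum(volumes) / max(speeds)   # fastest worker does everything
    fastest = max(range(len(speeds)), key=lambda w: speeds[w])
    best = {w: [] for w in range(len(speeds))}
    best[fastest] = list(range(len(volumes)))  # always feasible at time hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        plan = try_assign(volumes, speeds, mid)
        if plan is None:
            lo = mid
        else:
            hi, best = mid, plan
    return hi, best
```

For example, schedule_tasks([4, 3, 3, 2], [2, 1]) returns a deadline together with a plan that maps each worker index to the task indices assigned to him.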

Related

Traveling Salesman Problem for a large number of vertices

I need to solve TSP for a large number of vertices (30-100) with good accuracy and in adequate time (around 1-2 days). My graph can contain asymmetric edges (g[i][j] not equal to g[j][i]).
I tried greedy, Little's algorithm (maybe my fault, but it shows worse results than greedy), a simple genetic algorithm (barely better than greedy), and dynamic programming in O(2^n*n) (quickly runs out of memory).
Well, 30-100 is not really a large number of vertices. Did you miss some zeroes? Or are you facing some special hard-to-solve cases like p43 from TSPLIB?
In any case, if you are looking for a good heuristic, I used to use Ant Colony Optimization for asymmetric TSP. It is easy to implement and provides quite good performance.
You might take a look at my old implementation: https://github.com/aligusnet/optimer/tree/master/src/heuristics/aco
If you can accept a solution that is not optimal but "close to optimal", I can suggest the "random traveling" algorithm. The idea is not to run BFS/DFS through the entire combination tree, but to search only random DFS subtrees.
For example, say you have vertices [A-Z] and your start point is [A]. Make 10,000 attempts for each possible first step (25 prefixes in total): [A-B-...], [A-C-...] and so on, where [...] is a randomly selected full-depth path through your graph that obeys your rules. Keep the costs in an array with one entry per prefix, where each entry is the sum of the costs of all attempts made from that prefix. Because every prefix gets the same number of attempts, the prefix with the smallest sum indicates the best step from [A]. Of course, this is no guarantee of optimality, but there is a high probability that it is.
For example, suppose the sum over the 10,000 attempts for prefix [A-K] is the lowest. Then accept [A-K] as the first step and repeat the algorithm from there, until you have built the full solution.
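The procedure above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the graph is a complete asymmetric cost matrix dist, vertex 0 is the start, and the names tour_cost and rollout_tour, the attempts default and the seed parameter are all hypothetical.

```python
import random


def tour_cost(dist, tour):
    """Cost of the closed tour, using the directed (possibly asymmetric) edge costs."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def rollout_tour(dist, attempts=200, seed=0):
    """Fix one vertex at a time, choosing the next step with the lowest summed random-completion cost."""
    rng = random.Random(seed)
    tour = [0]                      # assumption: start at vertex 0
    unvisited = set(range(1, len(dist)))
    while unvisited:
        totals = {}
        for nxt in unvisited:
            rest = [v for v in unvisited if v != nxt]
            total = 0
            for _ in range(attempts):
                rng.shuffle(rest)   # random full-depth completion of the prefix
                total += tour_cost(dist, tour + [nxt] + rest)
            totals[nxt] = total     # same number of attempts for every prefix
        best = min(totals, key=totals.get)
        tour.append(best)
        unvisited.remove(best)
    return tour
```

Each step costs on the order of n * attempts * n edge lookups, so for 30-100 vertices the attempts count can be raised considerably within the stated time budget.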
Here's the TSP source code of the OptaPlanner implementation, fwiw. It deals with datasets of up to 10k visits pretty well when NearbySelection is activated (or up to 500 visits or so if it's not activated); to go above 10k you'll need to activate Partitioned Search, which comes at a trade-off.
It has asymmetric datasets (using OpenStreetMap data) in the import/belgium/road-time directory. It can't prove if it reaches the optimal solution or not. Usually termination is set on either a few minutes or on a few unimproved minutes.
Benchmarks showed that Late Acceptance has slightly better results than Simulated Annealing and Tabu Search, given a specific set of MoveSelectors configured, but your mileage may vary...

Efficient scheduling jobs with declining profits on multiple machines

Problem: Consider the problem of scheduling n jobs on M machines, where each job i has a processing time pi and gives a profit gi(t) if completed by time t. All the jobs are released at time 0. All gi(t) are non-increasing functions. For simplicity, we can assume that the machines are non-preemptive.
For M=1 and linearly decreasing profit functions, this problem can be solved in O(n) using the greedy algorithm. But for general functions it is NP-complete.
I am interested in the general case. Please give me any link of papers or resource materials for the problem. I searched on the internet but didn't find anything interesting for M>1, though there is previous work on approximating the bounds for M=1.
Please note that I am not expecting you to solve this but just need previous work on the similar problems, if any. And if you have any ideas which can help please feel free to share.
I want to know what bounds are known for this problem with m machines and n jobs with the same release dates and general non-increasing profit functions. I found a paper in that direction:
http://arxiv.org/pdf/1008.4889v1.pdf
They gave an O(1) approximation when all jobs have identical release times. I want to find similar literature on this problem and learn what ideas were used to solve it.
You can start with a "greedy solution" by using a dispatch rule that e.g. minimizes
gi(t0+pi)/pi
on the first free machine (i runs over all jobs not yet scheduled, and t0 is the time at which the first machine becomes free).
Then this solution can be improved using Meta-Heuristics like Simulated Annealing. A solution can be represented as a tuple of job sequences (one job sequence for each machine). A crucial point is, what "moves" are allowed to change a solution. Maybe for this problem one can find already good solutions with the two basic moves:
Take one job from one machine and find a new place to insert it.
Exchange two jobs in the machines' job sequences.
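As a rough illustration of the starting solution only, here is a sketch of the dispatch rule in Python. It applies the rule exactly as stated above (minimizing gi(t0+pi)/pi); the function name greedy_dispatch and the pick parameter are hypothetical, and pick is included only so that the opposite rule (max) can be tried as well. The result is the tuple-of-sequences representation that the two moves would then operate on.

```python
import heapq


def greedy_dispatch(proc_times, profit_fns, machines, pick=min):
    """Build one job sequence per machine by repeatedly dispatching to the first free machine."""
    free_at = [(0.0, m) for m in range(machines)]   # (time machine becomes free, machine)
    heapq.heapify(free_at)
    unscheduled = set(range(len(proc_times)))
    sequences = [[] for _ in range(machines)]
    while unscheduled:
        t0, m = heapq.heappop(free_at)              # first machine to become free
        job = pick(
            sorted(unscheduled),                    # sorted for deterministic tie-breaking
            key=lambda i: profit_fns[i](t0 + proc_times[i]) / proc_times[i],
        )
        unscheduled.remove(job)
        sequences[m].append(job)
        heapq.heappush(free_at, (t0 + proc_times[job], m))
    return sequences
```

Here profit_fns is a list of callables, one per job, each mapping a completion time to a profit.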

greedy algorithm, scheduling

I am trying to understand how the greedy algorithm for the scheduling problem works.
I've been reading and googling for a while, but I still do not understand it.
We have n jobs to schedule on a single resource. The job (i) has a requested start time s(i) and finish time f(i).
There are some greedy ideas which we select...
Accept in increasing order of s ("earliest start time")
Accept in increasing order of f - s ("shortest job time")
Accept in increasing order of number of conflicts ("fewest conflicts")
Accept in increasing order of f ("earliest finish time")
And the book says that the last one, accepting in increasing order of f, will always give an optimal solution.
However, it did not mention why it always gives an optimal solution and why the other three do not.
It provided a figure showing why the other three do not give an optimal solution, but I could not understand what it means.
Since I have low reputation, I cannot post an image, so I will try to draw it.
 |---| |---| |---|
|-------------------------|
increasing order of s
underestimated solution
|-----------| |-----------|
   |-----|
increasing order of f-s
underestimated solution
|----|  |----| |----|  |----|
 |-----| |-----| |-----|
 |-----|    |-----|
 |-----|    |-----|
increasing order of number of conflicts.
underestimated solution
This is what it looks like and I don't see why this is a counterexample of each scenario.
If anyone can explain why each greedy idea does/ does not work, it will be very helpful.
Thank you.
I think I can explain this.
Let's say we have n jobs, with start times s[1..n] and finish times f[1..n]. If we sort them by finish time and accept each job that does not conflict with the ones already accepted, we will always complete the maximum number of tasks. Let's see how.
If a job finishes earlier than every other job (even if it starts later than some of them, i.e. it is a short job), then picking it leaves the most time for later jobs. Suppose there were other jobs that we could start and complete in this interval so that our number of tasks could increase. This is not actually possible: if any task finished before this one, then that would be the one with the earliest finish time, so we would have picked it instead. And if a task has started but not yet finished by this point, then picking it instead would mean that we have completed no task so far, whereas now we have completed at least one. So, in either case, this is the optimal choice.
There can be many solutions with the maximum number of tasks that fit in an interval; EFT gives one of them, and it always reaches the maximum number possible.
I hope I could explain it well.
Since #vish4071 has already explained why selecting earliest finish time will lead to optimal solution, I'll only explain the counterexamples. Task [a,b] starts at a and ends at b. I'll use the counterexamples you have provided.
Earliest start time
Suppose tasks [1,10], [2,3], [4,5], [6,7]. The earliest start time strategy will choose [1,10] and then refuse the other 3, since they all collide with the first one. Yet we can see that [2,3], [4,5], [6,7] is the optimal solution, so earliest start time strategy will not always yield the optimal result.
Shortest execution time
Suppose tasks [1,10], [11,20], [9,12]. This strategy would choose [9,12] and then reject the other two, but optimal solution is [1,10], [11,20]. Therefore, shortest execution time strategy will not always lead to optimal result.
Least amount of collisions
This strategy seems promising, but your example with 11 tasks proves it not to be optimal. Suppose tasks: [1,4], 3x[3,6], [5,8], [7,10], [9,12], 3x[11,14] and [13, 16]. [7,10] has only 2 collisions with other tasks, which is fewer than any other task, so it would be selected first by the least amount of collisions strategy. Then [1,4] and [13, 16] would be selected, and all the other tasks rejected because they collide with already selected tasks. That is 3 tasks, whereas 4 tasks can be selected without collision: [1,4], [5,8], [9,12] and [13, 16].
You can also see that the earliest finish time strategy chooses an optimal solution in all of these examples. Note that more than one optimal solution can exist with the same number of selected tasks. In such a case, the earliest finish time strategy will always choose one of them.
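To check the three counterexamples above by machine, here is a small Python sketch of the earliest finish time strategy. The function name is hypothetical, and it assumes that a task may start exactly when the previous one finishes; none of the examples depend on that choice.

```python
def earliest_finish_first(tasks):
    """Select a maximum-size set of non-colliding (start, finish) tasks."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(tasks, key=lambda task: task[1]):
        if start >= last_finish:      # does not collide with anything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen


# The three counterexamples from the answer above.
by_start = [(1, 10), (2, 3), (4, 5), (6, 7)]
by_length = [(1, 10), (11, 20), (9, 12)]
by_conflicts = ([(1, 4)] + [(3, 6)] * 3 + [(5, 8), (7, 10), (9, 12)]
                + [(11, 14)] * 3 + [(13, 16)])

print(len(earliest_finish_first(by_start)))      # 3
print(len(earliest_finish_first(by_length)))     # 2
print(len(earliest_finish_first(by_conflicts)))  # 4
```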

Optimal scheduling system in terms of lowest waiting time for users and maximum users in waiting intervals

I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
eg. a possible set of events can be: [(1,2),(1,3),(4,2),(4,3),(3,1)] with 4 possible timeslots. All events have to be scheduled in a certain timeslot, however, waiting time per user should be minimised (time between two events) and at the same time, the amount of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sounds like a combination of Job Shop Scheduling and Meeting Scheduling with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms, as well as being more scalable (see research competitions for proof).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared:
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
I'm unfortunately not an expert on trying to reduce problems to known NP problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
Instead of brute force, you could use a random start point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for maximum-likelihood evolutionary tree, which has a similarly factorial search space.
1. Start with a random permutation and evaluate its score
2. make some random changes, then evaluate the score
3. if it's not an improvement, try another permutation until you find one that is. (maybe with a mechanism to remember that you already tried this modification to the starting tree).
4. Repeat from 2 with this new tree, until you've converged on a local minimum.
5. Repeat from 1 for some other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.
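A minimal Python sketch of the brute-force variant, under assumptions: timeslots are numbered from 0, each meeting goes into the earliest slot in which both of its users are free (so independent meetings share a slot), the score is the sum over users of last slot minus first slot, and the function names are hypothetical. It does not enforce a limit on the number of timeslots.

```python
from itertools import permutations


def list_schedule(order):
    """Assign each meeting the earliest slot after both users' previous meetings."""
    next_free = {}
    slots = []
    for a, b in order:
        slot = max(next_free.get(a, 0), next_free.get(b, 0))
        slots.append(slot)
        next_free[a] = next_free[b] = slot + 1
    return slots


def waiting_score(order):
    """Sum over users of (last meeting slot - first meeting slot)."""
    first, last = {}, {}
    for (a, b), slot in zip(order, list_schedule(order)):
        for user in (a, b):
            first.setdefault(user, slot)
            last[user] = slot
    return sum(last[user] - first[user] for user in first)


def best_order(events):
    """Brute force: only practical for short lists, since there are n! permutations."""
    return min(permutations(events), key=waiting_score)


events = [(1, 2), (1, 3), (4, 2), (4, 3), (3, 1)]
order = best_order(events)
print(order, list_schedule(order), waiting_score(order))
```

The same waiting_score function could serve as the objective for the random-restart local search in the numbered steps, replacing the loop over all permutations.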

How can I efficiently find the subset of activities that stay within a budget and maximizes utility?

I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
EDIT: There are two reasons this is not the 0-1 knapsack problem:
Knapsack requires integer values for the weights (i.e. resources consumed), whereas my resource consumption (i.e. mass in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small, and the knapsack dynamic program is O(nW), which is exponential in the number of bits of W.)
I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate its utility based on the items not in the set
Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
fully enumerating a powerset is O(2^n), but I don't need to fully enumerate the list because once I've found a set of tasks that exceeds the budget I know that any set that adds more tasks is infeasible and can be rejected. That is if {1,2,3,4} is infeasible, so is {1,2,3,4} U {n}, where n is any one of the tasks remaining in the larger list.
Since I'm just summing duty the order of tasks doesn't matter (i.e. if {1,2,3} is feasible, so are {2,1,3}, {3,2,1}, etc.).
All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible sub-sets might speed run-time.)
I've convinced myself a good recursion algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning that there's no known efficient (polynomial-time) way of solving it exactly. However, there's a pseudo-polynomial time solution using dynamic programming. See the Wikipedia section on it for more details.
However if the maximum utility is large, you should stick with an approximation algorithm. One such approximation scheme is to greedily select items that have the greatest utility/cost. If the budget is large and the cost of each item is small, then this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
As others have mentioned, you are trying to solve some instance of the Knapsack problem. While theoretically, you are doomed, in practice you may still do a lot to increase the performance of your algorithm. Here are some (wildly assorted) ideas:
Be aware of Backtracking. This corresponds to your observation that once you crossed out {1, 2, 3, 4} as a solution, {1, 2, 3, 4} u {n} is not worth looking at.
Apply Dynamic Programming techniques.
Be clear about your actual requirements:
Maybe you don't need the best set? Will a good one do? I am not aware if there is an algorithm which provides a good solution in polynomial time, but there might well be.
Maybe you don't need the best set all the time? Using randomized algorithms you can solve some NP-Problems in polynomial time with the risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: It's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinitely.)
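A minimal Python sketch of the backtracking idea, under assumptions: costs are non-negative, the utility of a set is available only as a black-box function of the whole selected set (as the question describes), and the names best_subset and extend are hypothetical. Each subset is visited once in increasing index order, and a branch is cut as soon as adding an activity would exceed the budget.

```python
def best_subset(costs, budget, utility):
    """Return (best set of activity indices, its utility) among all sets within budget."""
    best = {"set": (), "value": utility(())}

    def extend(start, chosen, spent):
        # Only indices >= start are tried, so {1,2,3} and {3,2,1} are not both visited.
        for i in range(start, len(costs)):
            if spent + costs[i] <= budget:          # prune infeasible supersets
                candidate = chosen + (i,)
                value = utility(candidate)
                if value > best["value"]:
                    best["set"], best["value"] = candidate, value
                extend(i + 1, candidate, spent + costs[i])

    extend(0, (), 0.0)
    return best["set"], best["value"]


# Usage with a stand-in utility (sum of hypothetical per-activity values).
values = [3, 2, 4]
print(best_subset([2.5, 1.5, 3.0], 4.0, lambda s: sum(values[i] for i in s)))
# ((0, 1), 5)
```

Only the best set found so far is kept, so memory use stays proportional to the recursion depth rather than to the number of feasible sets.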
I think the following iterative algorithm will traverse the entire solution set and store the list of tasks, the total cost of performing them, and the opportunity cost of the tasks not performed.
It seems like its running time will be polynomial in the number of activities and exponential in the number of activities that can fit within the budget, since it enumerates every feasible subset.
% Solution 1 is the empty set (do nothing)
ixCurrentSolution = 1
oc(1) = opportunity cost of doing nothing
tasklist(1) = empty set
costTotal(1) = 0
for ixTask = 1:cActivities
    % Extend only the solutions that existed before this task was considered
    cExisting = ixCurrentSolution
    for ixSolution = 1:cExisting
        % Cost of adding this task to the existing solution ixSolution
        costCurrentSolution = costTotal(ixSolution) + cost(ixTask)
        if costCurrentSolution < costMax
            % Still within budget: record it as a new solution
            ixCurrentSolution++
            costTotal(ixCurrentSolution) = costCurrentSolution
            tasklist(ixCurrentSolution) = tasklist(ixSolution) U ixTask
            oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
        endif
    endfor
endfor

Resources