Solving the assignment problem with dynamically updating tasks and agents - algorithm

I have agents, say A1, A2, A3, and so on, along with tasks, say T1, T2, T3, and so on. I have to efficiently assign at most one task to each agent based on some eligibility parameter, e.g. T1 can be assigned to A1 or A2; T2 to A2 or A3; and T3 to A3 or A1. I have built an unweighted bipartite graph and computed a maximum-cardinality matching using a max-flow algorithm with unit capacities. Since my lists of agents and tasks change dynamically, is there any way to avoid rebuilding the graph from scratch and rerunning the flow algorithm on every change? Can I keep the same graph and somehow rerun the max-flow algorithm incrementally?

It depends on what you mean by "efficiently assign".
Although you do not say, I assume that you are optimizing some calculated value that measures how "efficient" a particular solution is, compared to others.
But perhaps you will be satisfied with a very fast determination of a pretty good solution, based on the optimal solution you first found and modified slightly to reflect the change in circumstances (e.g. assign the cheapest free agent to a new task). The modified solution might not be optimal, but it will be close or equal. Every few changes, as the modifications of the optimal solution begin to build up, you can stop and run the whole thing again from scratch.
However, if you insist on a guaranteed optimal solution after every change, then you will have to run from scratch each time.
It all depends on whether this is a practical, real-world problem you are tackling, where a pretty good, possibly even optimal, solution is fine, or merely an academic exercise.
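Worth noting: for the unweighted maximum matching in the question, the "modify the previous solution" idea can actually be made exact. When a new task (or agent) arrives, a single augmenting-path search from the new vertex restores a maximum matching; when a matched vertex leaves, one augmenting-path search from its newly exposed partner does the same, so a full recomputation is at most a periodic sanity check. A minimal Python sketch under these assumptions (the names eligible and match_of_agent are illustrative, not from the question):

def try_assign(task, eligible, match_of_agent, visited):
    # Kuhn's augmenting-path search: try to give `task` an agent,
    # possibly reassigning tasks that are already matched.
    for agent in eligible[task]:
        if agent in visited:
            continue
        visited.add(agent)
        current = match_of_agent.get(agent)
        if current is None or try_assign(current, eligible, match_of_agent, visited):
            match_of_agent[agent] = task
            return True
    return False

def add_task(task, eligible, match_of_agent):
    # One augmenting-path search after an insertion keeps the matching maximum.
    return try_assign(task, eligible, match_of_agent, set())

# Example with the eligibility lists from the question:
eligible = {"T1": ["A1", "A2"], "T2": ["A2", "A3"], "T3": ["A3", "A1"]}
match_of_agent = {}
for t in ["T1", "T2", "T3"]:
    add_task(t, eligible, match_of_agent)
# match_of_agent now holds a maximum matching, e.g. {"A1": "T1", "A2": "T2", "A3": "T3"}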

Related

Algorithm to place N agents into M shelters with minimal cost

TL;DR: Short problem description:
I am looking for an efficient algorithm that optimizes how N agents, located in 2D space, can be placed into M shelters by minimizing the distance the agents need to travel.
Each shelter can only hold 1 agent. If N > M (more agents than available shelters), then some agents will not get placed into shelters (all agents are the same).
(Optional simplification: while agents can be freely located in 2D space, shelters are always arranged on a square grid. No agent is located outside of the convex hull of shelters.)
This is all you need to know. However, if you think that this problem has no efficient solution then here is ...
a more specific (and to me most relevant) version of the problem:
There are exactly 9 shelters, arranged on a square grid (with spacing d). All N agents are located around the central shelter (in a box of size d*d centered on the central shelter). However, in this case, the central shelter is always empty, but all other shelters may or may not be available (empty) at the beginning.
For this case, I need an algorithm that solves the problem for arbitrarily many agents N (typically N < 9) and an arbitrary set of available shelters (either all 9, or in the extreme case only the central shelter).
The algorithm should be efficient, since I need to solve many of these problems quickly.
Example:
Here is an example with N=3 agents (black dots) and M=5 available shelters (green dots); the red dots show non-available shelters. (The example image is omitted here.) I use letters for shelters and numbers for agents.
What I did so far:
I am sure that this problem has a specific name and has been solved/studied already, but I cannot find its name or any solutions. I need to solve many of those problems fast and I always want the optimal solution (if that's not possible, an almost optimal solution is also sufficient). Here is what I tried/thought of so far:
Brute force: I know that the optimal solution to the problem can be found with brute force by checking all possible options, calculating the total travel distance for each and picking the option with smallest total travel distance. This may involve many computations if M and N are large.
A fast but very non-optimal solution works as follows: for each agent i, calculate the distance to the central shelter E. Starting from the agent i with the smallest distance to E, assign i to its closest shelter (in this case: E). Then assign the next agent to its closest shelter, considering that E is now unavailable, etc., until all agents are assigned, or stop if no more free shelters are available. This works and is fast, but of course produces non-optimal results (in the example image: 2->E, 1->B, 3->F, while the optimal solution should be 3->E, 2->F, 1->B).
Another idea I'm working on is to first find the agents that are under the most "pressure", i.e. all of their good options are far away. Starting with the agent under highest pressure, assign it to the closest shelter. Continue for all other agents. However, I am not sure how to properly define "pressure" for this problem, as it likely should be a combination of the distances to the first few shelters. Also, I am not sure that this will lead to the optimal solution, but may result in an almost optimal solution.
I am trying to think of this problem as some sort of weighted permutation; that is, I need to select N shelters and map them to the N agents, where each mapping comes at a cost. I need to minimize the total cost, but I have no idea how to do this.
Also, I am thinking of some sort of Simulated Annealing, or some form of push-and-pull algorithm where each shelter is attracting agents, or agents are attracted to shelters based on their distance. While this may sound interesting, I would expect that this is computationally not efficient.
I am happy for any input, especially if this problem already has a proper name and solutions. I am also happy for a simple and fast-to-compute algorithm that achieves an almost optimal solution.
As suggested in the comments (thanks again!), this is indeed answered by this post.
Specifically, this is an assignment problem, which gets solved by the Hungarian algorithm, considering agents as workers, shelters as tasks, and the cost of worker i doing task j being the Manhattan distance between agent i and shelter j.
The Python package munkres implements this algorithm and is very fast for the 9-shelter problem. If there are more shelters than agents, the package handles it automatically. For the case of more agents than shelters, I am satisfied with deleting random agents until the number of agents equals the number of shelters. Therefore my problem is solved.
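For reference, a minimal sketch of that setup (the coordinates are made up for illustration; only the Munkres().compute call is the package's documented API):

from munkres import Munkres

agents   = [(0.2, 0.1), (-0.3, 0.4), (0.0, -0.2)]     # N agent positions
shelters = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]  # M available shelter positions

# cost[i][j] = Manhattan distance from agent i to shelter j
cost = [[abs(ax - sx) + abs(ay - sy) for (sx, sy) in shelters]
        for (ax, ay) in agents]

assignment = Munkres().compute(cost)  # list of (agent_index, shelter_index) pairs

Each pair maps an agent to its shelter; the total travel distance is sum(cost[i][j] for i, j in assignment).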

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to perform:
For each delivery, the pickup needs to come before the dropoff.
The vehicle is quite small and the packages vary in size. The total carriage cannot exceed some upper bound (e.g. 1 cubic metre).
Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. (The original post illustrated an example problem and a non-optimal but valid solution.)
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
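As a concrete illustration, here is a sketch of that improvement loop using only the first (always-safe) move; the route representation, the .kind attribute, and total_distance are hypothetical stand-ins:

def improve(route, total_distance):
    # Greedy local search: keep applying any consecutive same-kind swap
    # (always feasible per the first rule above) until nothing improves.
    improved = True
    while improved:
        improved = False
        for i in range(len(route) - 1):
            a, b = route[i], route[i + 1]
            if a.kind != b.kind:
                continue  # mixed pairs need the capacity/precedence checks above
            candidate = route[:i] + [b, a] + route[i + 2:]
            if total_distance(candidate) < total_distance(route):
                route, improved = candidate, True
    return route

The other two moves slot into the same loop once their feasibility checks are added.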
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
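As code, the suggested score is simply the following (both helper names are hypothetical):

def score(solution):
    # Deadline violations dominate; travel time breaks ties (assumed < 1000).
    return 1000 * num_deadline_violations(solution) + total_travel_time(solution)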
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
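A minimal sketch of that pool, assuming solutions are comparable tuples, score() is as sketched above, and children(s) is a hypothetical generator yielding the results of all safe moves on s:

import heapq

def search(initial_solutions, steps, k=10000):
    pool = [(score(s), s) for s in initial_solutions]
    heapq.heapify(pool)
    best = min(pool)
    for _ in range(steps):
        if not pool:
            break
        top = heapq.heappop(pool)
        while pool and pool[0] == top:   # discard duplicates behind the extracted best
            heapq.heappop(pool)
        for child in children(top[1]):
            entry = (score(child), child)
            if entry[0] < top[0]:        # keep only strict improvements (avoids cycles)
                heapq.heappush(pool, entry)
                best = min(best, entry)
        if len(pool) > 2 * k:            # prune back to the best k when the pool doubles
            pool = heapq.nsmallest(k, pool)
            heapq.heapify(pool)
    return best[1]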
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to be the Knapsack Problem rather than the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine solutions for the Knapsack Problem with those for the Travelling Salesman Problem.
If you really only have one second max for calculations, a greedy algorithm with backtracking might actually be one of the best solutions you can get.

NP-Hardness proof for constrained scheduling with staircase cost

I am working on a problem that appears like a variant of the assignment problem. There are tasks that need to be assigned to servers. The sum of costs over servers needs to be minimized. The following conditions hold:
Each task has a unit size.
A task may not be divided among multiple servers; it must be handled by exactly one server.
A server has a limit on the maximum number of tasks that may be assigned to it.
The cost function for task assignment is a staircase function. A server incurs a minimum cost 'a'. For each task handled by the server, the cost increases by 1. If the number of tasks assigned to a particular server exceeds half of its capacity, there is a jump in that server's cost equal to a positive number 'd'.
Tasks have preferences, i.e., a given task may be assigned to one of a few of the servers.
I have a feeling that this is an NP-Hard problem, but I can't seem to find an NP-Complete problem to map to it. I've tried Bin Packing, Assignment problem, Multiple Knapsacks, bipartite graph matching but none of these problems have all the key characteristics of my problem. Can you please suggest some problem that maps to it?
Have you tried reducing the set partitioning problem to yours?
The SET-PART (stands for "set partitioning") decision problem asks whether there exists a partition of a given set S of numbers into two sets S1 and S2, so that the sum of the elements in S1 equals the sum of elements in S2. This problem is known to be NP-complete.
Your problem seems related to the m-PROCESSOR decision problem. Given a nonempty set A of n>0 tasks {a1,a2,...,an} with processing times t1,t2,...,tn, the m-PROCESSOR problem asks if you can schedule the tasks among m equal processors so that all tasks finish in at most k>0 time steps. (Processing times are (positive) natural numbers.)
The reduction of SET-PART to m-PROCESSOR is very easy: first show that the special case, with m=2, is NP-complete; then use this to show that m-PROCESSOR is NP-complete for all m>=2. (A reduction in Slovene.)
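For concreteness, here is the standard m=2 construction (sketched from the description above; the linked write-up itself is in Slovene): given a SET-PART instance S = {s1, s2, ..., sn} with s1 + s2 + ... + sn = 2B, create n tasks with processing times ti = si, set m = 2 processors, and set the deadline k = B. All tasks can finish within k time steps if and only if each of the two processors does exactly B units of work, which is exactly a partition of S into two sets of equal sum.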
Hope this helps.
EDIT 1: Oops, this m-PROCESSOR thingy seems very similar to the assignment problem.

How can I efficiently find the subset of activities that stay within a budget and maximizes utility?

I am trying to develop an algorithm to select a subset of activities from a larger list. If selected, each activity uses some amount of a fixed resource (i.e. the sum over the selected activities must stay under a total budget). There could be multiple feasible subsets, and the means of choosing from them will be based on calculating the opportunity cost of the activities not selected.
EDIT: There are two reasons this is not the 0-1 knapsack problem:
Knapsack requires integer values for the weights (i.e. resources consumed), whereas my resource consumption (i.e. mass, in the knapsack parlance) is a continuous variable. (Obviously it's possible to pick some level of precision and quantize the required resources, but my bin size would have to be very small, and the dynamic-programming solution to knapsack is pseudo-polynomial in W, so a tiny bin size blows up the running time.)
I cannot calculate the opportunity cost a priori; that is, I can't evaluate the fitness of each one independently, although I can evaluate the utility of a given set of selected activities or the marginal utility from adding an additional task to an existing list.
The research I've done suggests a naive approach:
Define the powerset
For each element of the powerset, calculate its utility based on the items not in the set
Select the element with the highest utility
However, I know there are ways to speed up execution time and required memory. For example:
fully enumerating a powerset is O(2^n), but I don't need to fully enumerate the list, because once I've found a set of tasks that exceeds the budget I know that any superset of it is infeasible and can be rejected. That is, if {1,2,3,4} is infeasible, so is {1,2,3,4} U {n}, where n is any one of the tasks remaining in the larger list.
Since I'm just summing duty, the order of tasks doesn't matter (i.e. if {1,2,3} is feasible, so are {2,1,3}, {3,2,1}, etc.).
All I need in the end is the selected set, so I probably only need the best utility value found so far for comparison purposes.
I don't need to keep the list enumerations, as long as I can be sure I've looked at all the feasible ones. (Although I think keeping the duty sum for previously computed feasible subsets might speed up the run-time.)
I've convinced myself a good recursion algorithm will work, but I can't figure out how to define it, even in pseudo-code (which probably makes the most sense because it's going to be implemented in a couple of languages--probably Matlab for prototyping and then a compiled language later).
The knapsack problem is NP-complete, meaning that no polynomial-time algorithm for it is known. However, there's a pseudo-polynomial-time solution using dynamic programming. See the Wikipedia section on it for more details.
However, if the maximum utility is large, you should stick with an approximation algorithm. One such approximation scheme is to greedily select the items that have the greatest utility/cost ratio. If the budget is large and the cost of each item is small, then this can work out very well.
EDIT: Since you're defining the utility in terms of items not in the set, you can simply redefine your costs. Negate the cost and then shift everything so that all your values are positive.
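A minimal sketch of that greedy heuristic (the (name, cost, utility) item format is assumed for illustration):

def greedy_select(items, budget):
    # items: list of (name, cost, utility) triples; take best utility-per-cost first
    chosen, spent = [], 0.0
    for name, cost, util in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

If utilities are only defined for whole sets, as in the question, the ratio can instead be computed from the marginal utility of adding each candidate to the current selection.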
As others have mentioned, you are trying to solve some instance of the Knapsack problem. While theoretically, you are doomed, in practice you may still do a lot to increase the performance of your algorithm. Here are some (wildly assorted) ideas:
Be aware of Backtracking. This corresponds to your observation that once you crossed out {1, 2, 3, 4} as a solution, {1, 2, 3, 4} u {n} is not worth looking at.
Apply Dynamic Programming techniques.
Be clear about your actual requirements:
Maybe you don't need the best set? Will a good one do? I am not aware of an algorithm that guarantees a good solution in polynomial time, but there might well be one.
Maybe you don't need the best set all the time? Using randomized algorithms, you can solve some NP problems in polynomial time while accepting a risk of failure in 1% (or whatever you deem "safe enough") of all executions.
(Remember: it's one thing to know that the halting problem is not solvable, but another to build a program that determines whether "hello world" implementations will run indefinitely.)
I think the following iterative algorithm will traverse the entire feasible solution set and store the list of tasks, the total cost of performing them, and the opportunity cost of the tasks not performed.
Its running time is roughly the number of activities times the number of feasible subsets encountered, which is polynomial in the number of activities but exponential in the number of activities that can fit within the budget.
ixCurrentSolution = 1
% start from the empty selection
oc(1) = opportunity cost of doing nothing
tasklist(1) = empty set
costTotal(1) = 0

for ixTask = 1:cActivities
    cSolutions = ixCurrentSolution   % snapshot: only extend solutions that existed before this task
    for ixSolution = 1:cSolutions
        % extend solution ixSolution (not ixCurrentSolution) with the new task
        costCandidate = costTotal(ixSolution) + cost(ixTask)
        if costCandidate < costMax
            ixCurrentSolution = ixCurrentSolution + 1
            costTotal(ixCurrentSolution) = costCandidate
            tasklist(ixCurrentSolution) = tasklist(ixSolution) U {ixTask}
            oc(ixCurrentSolution) = OC of tasks not in tasklist(ixCurrentSolution)
        endif
    endfor
endfor
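For what it's worth, a runnable sketch of the same enumeration in Python (oc_of is a hypothetical function scoring the opportunity cost of the left-out tasks):

def enumerate_feasible(cost, cost_max, oc_of):
    # each entry: (frozenset of task indices, total cost)
    solutions = [(frozenset(), 0.0)]
    for task, c in enumerate(cost):
        for tasks, total in list(solutions):   # snapshot before extending
            if total + c < cost_max:
                solutions.append((tasks | {task}, total + c))
    # attach opportunity costs once the enumeration is complete
    return [(tasks, total, oc_of(tasks)) for tasks, total in solutions]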

Prescheduling Recurrent Tasks

At work, we are given a set of constraints of the form (taskname, frequency), where frequency is an integer giving the number of ticks between invocations of the task "taskname". Two tasks cannot run concurrently, and each task invocation takes one tick to complete. Our goal is to find the schedule that best matches the set of constraints.
For example, if we are given the constraints {(a, 2), (b,2)} the best schedule is "ab ab ab..."
On the other hand, if we are given the constraints ({a,2}, {b, 5}, {c, 5}) the best schedule is probably "abaca abaca abaca..."
Currently we find the best schedule by running a genetic algorithm which tries to minimize the distance between the actual frequencies and the given constraints. It actually works pretty well, but I wonder if there's some algorithm which better suits this kind of problem. I've tried to search Google but I seem to lack the right words (scheduling is usually about completing tasks :(). Can you help?
First off, consider the merits of jldupont's comment! :)
Second, I think 'period' is the accurate description of the second element of the tuple, e.g. {Name, Period[icity]}.
That said, look to networking algorithms. Some variant of weighted queuing is probably applicable here.
For example, given N tasks, create N queues corresponding to tasks T0...Tn, and on each cycle ("tick"), based on the period of each task, enqueue an item to the corresponding queue.
The scheduler algorithm would then aim to minimize (on average) the total number of waiters in the queues. A simple starting point would be to dequeue from the queue Qx which currently has the highest number of items. (A parameter on each queued item indicating its 'age' would assist in prioritization.)
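A minimal sketch of that starting point (the tie-breaking rule, using the tick stamp as the 'age', is illustrative):

from collections import deque

def schedule(periods, ticks):
    # periods: {task_name: period in ticks}; returns the schedule as a string
    queues = {name: deque() for name in periods}
    out = []
    for t in range(ticks):
        for name, p in periods.items():
            if t % p == 0:
                queues[name].append(t)   # enqueue an invocation, stamped with its tick
        ready = [n for n in queues if queues[n]]
        if ready:
            # longest queue first; the oldest queued item breaks ties
            run = max(ready, key=lambda n: (len(queues[n]), -queues[n][0]))
            queues[run].popleft()
            out.append(run)
    return "".join(out)

# schedule({"a": 2, "b": 2}, 6) yields "ababab", matching the first example.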
