Problem
There are N points in 2D space (with coordinates in the range 10^9). All these points must be processed (once each).
Processing can use P parallel threads (with typical hardware, P ≈ 6).
The time it takes to process a point is different for each point and unknown beforehand.
All points being processed in parallel must be at least D apart from each other (Euclidean or other distance measure are all okay).
Attempts
I would imagine the algorithm would be an implementation of two parts:
Which points to schedule initially
Which new point to schedule (if possible) when a point finishes being processed
My solutions have not been much better than a naive method, which is simply to keep trying random points until one is at least D away from all points being processed.
(I have thought about making P of points so that every element of one group is at least D away from every element of every other group, and then when a point from a group finishes, take the next point in the group. This only saves time in some scenarios though, and I have not determined how to get a good set of groups either.)
I have recently came across a lot of questions that involve time intervals as an input. Some of the time intervals are overlapping. And depending upon that you have to perform an optimization, maximization or minimization operation on the input. I am not able to solve such problems. In fact, I am not able to even start thinking on these problems.
Here is an example:
Let us say, you are a resource holder. There can be an infinite supply of such a resource.
There are people who want that resource for a particular time interval. For ex: 4 pm to 8 pm
There can be an overlapping interval. ex: 5 pm to 7 pm, 3 pm to 6 pm
etc.
Depending upon these intervals, and their overlapping nature, you have to figure out how many distinct instances of these resources are required.
Ex. Input:
8 am - 9 am
8:30 am to 9:15 am
9.30 am to 1040 am
In this case, the first two intervals overlap. So two instances of resources will be required. The third interval is not overlapping, so the person with that interval can reuse the resource returned by any of the earlier ones.
Hence, in this case, minimum resources required are 2.
I don't need a solution. I need some pointers on how to solve. Are there any algorithms that address such questions? What should I read/ study. Are there any data structures that might help.
The number of intervals overlapping any time instant T is the number of interval start times less than T, minus the number of interval end times less than or equal to T.
Many of these problems, like the specific one above, can be solved by putting the start and end times separately into a sorted list or tree so you can figure out stuff about how these counts change over time.
To solve this problem, for example, sort the start and end times in a single list:
800S, 900E, 830S, 915E, 930S, 1040E
then sort them:
800S, 830S, 900E, 915E, 930S, 1040E
The run through the list and count, adding 1 for each start time and subtracting one for each end time:
1 2 1 0 1 0
The highest number of overlapping intervals is 2.
The data structure you need to use in order to solve this type of problems is The Interval Graph. The Interval Graph has a vertex for every interval and an edge between every pair of vertices corresponding to intervals that intersect.
The following interval graph corresponds to the set of three intervals in your example:
A: 8:00-9:00
B: 8:30-9:15
C: 9:30-10:40
This data structure captures the relevant aspects of most problems involving intervals and thus helps to solve them efficiently. Also, given the set of intervals (represented by a list of 2-tuples), you can construct the interval graph in Polynomial time.
Many problems that are NP-hard in general graphs, such as finding the Maximum Weight Independent Set or finding the Optimal Coloring, can be efficiently solved for interval graphs.
To solve the particular problem you've specified, first construct the interval graph G, while storing for each vertex the finish time of its corresponding interval. Also initialize a set of resources R={1} that at first contains only a single resource: resource number 1. Consider each vertex v of G in sorted order according to their finish time. Assign to v resource number i where i is the smallest resource in R not used by the neighbors of v. If no such a resource exists (because the neighbors of v use all the resources in R), insert a new resource i=max{R}+1 to R and assign it to v. The optimal number of resources (aka, the solution to your problem) is the size of the set R.
I have created the following mathematical abstraction for an optimisation problem.
O - Order
C - customer
P - Producer
S - Productionslot(related to producer)
pc - Productioncost per slot
tc - Transportcost for an order produceded in a specific slot
a - Productamount per order
ca - Slotcapacity
e - Endtime of a slot
dt - Latest order arrival time
tt - Transporttime
Decisionvariables:
x - Produce in a specific slot
y - Produce an order in an specific slot
The following cost function has to be minimised:
Meaning each order has to been processed in one production slot. Depending on the producer connected to this slot and its distance to the customer transportcost arise. If at least one order is produced in an slot, some sort of fixed cost for this production will occure. The aggregated sum of this costs should be minimised.
The following conditions apply:
This optimisation has to been repeated every day, with a different set of orders(o and a will vary).
Each producer will have 4 to 5 slots he produces in. The number of producers is 20. Orders are round about 100.
The last condition can be used to reduce the problem in advance. Meaning set all y's to 0 where the arrival time can not be reached, because arrival time is before productionend time plus transport time.
As far as I can see this problem and its size can not be easy used be just iterating through all feasible solutions. Even if you use condition 2 to reduced the combinations of "y" strongly, I will be left with approximately (5*20)^200 solutions to check(assign each order to each slot and check all combinations of order associations with respect to rest condtion fulfillment and cost function value).
If I do some mathematical operations I can form the problem to look pretty close to an Multidimensional Knapsack problem.
E.g.
- multiplie the objective function by -1 to have a max problem.
- Split condition 2 in two conditions to get less equal conditions(multiply the greater than with -1 as well)
- Move the capacity term in condition 3 to the left side
But these operations lead to partly negative coefficient. Algorithms found in the literature to approach the MDK often assume them to be positiv, is that a problem?
Has someone a good branch and bound approach to capture the problem? Or do I have to use I Metaheuristic as often propossed in literature, and has someone used a specific approach to solve a similar problem before to tell me about his/her experience?
Has someone other moddeling tricks to reduce the problem complexity in the given situation.
I have a pool of worker threads in which I send request to them based on percentage. For example, worker 1 must process 60% of total requests, worker 2 must process 31% of total requests and lastly worker 3 processes 9%. I need to know mathematically how to scale down the numbers and maintain ratio so I don't have to send 60 requests to thread 1 and then start sending requests to worker 2. It sounds like a "Linear Scale" math approach. In any case, all inputs on this issue are appreciated
One way to think about this problem makes it quite similar to the problem of drawing a sloped line on a pixel-based display, which can be done with Bresenham's algorithm.
First let's assume for simplicity that there are only 2 workers, and that they should take a fraction p (for worker 1) and (1-p) (for worker 2) of the incoming requests. Imagine that "Requests sent to worker 1" is the horizontal axis and "Requests sent to worker 2" is the vertical axis of a graph: what we want to do is draw a (pixelated) line in this graph that starts at (0, 0) and has slope (1-p)/p (i.e. it advances (1-p) units upwards for every p units it advances rightwards). When a new request comes in, a new pixel gets drawn. This new pixel will always be either immediately to the right of the previous pixel (if we assign the job to worker 1) or immediately above it (if we assign it to worker 2), so it's not quite like Bresenham's algorithm where diagonal movements are possible, but there are similarities.
With each new request that comes in, we have to assign that request to one of the workers, corresponding to drawing the next pixel rightwards or upwards from the previous one. I propose that a good way to pick the right direction is to pick the one that minimises an error function. The easiest thing to do is to take the slope of the line between (0, 0) and the point that would result from each of the 2 possible choices, and compare these slopes to the ideal slope (1-p)/p; then pick whichever one produces the lowest difference. This will cause the drawn pixels to "track" the ideal line as closely as possible.
To generalise this to more than 2 dimensions (workers), we can't use slope directly. If there are W workers, we need to come up with some function error(X, Y), where X and Y are both W-dimensional vectors, one representing the ideal direction (the ratios of requests to assign, analogous to the slope (1-p)/p earlier), the other representing the candidate point, and returning some number representing how different their directions are. Fortunately this is easy: we can take the cosine of the angle between two vectors by dividing their dot product by the product of their magnitudes, which is easy to calculate. This will be 1 if their directions are identical, and less than 1 otherwise, so when a new request arrives, all we need to do is perform this calculation for each of worker 1 <= i <= W and see which one's error(X, Y[i]) is closest to 1: that's the worker to give the request to.
[EDIT]
This procedure will also adapt to changes in the ideal direction. But as it stands, it tries (as hard as it can) to make the overall ratios of every request assigned so far track the ideal direction, so if the procedure has been running a long time, then even a small adjustment in the target direction could result in large "swings" to compensate. In that case, when calling error(X, Y[i]), it might be better to compute the second argument using the difference between the latest pixel (request assignment) and the pixel from some number k (e.g. k=100) steps ago. (In the original algorithm, we are implicitly subtracting the starting point (0, 0), i.e. k is as large as possible.) This only requires you to keep the last k chosen endpoints. Picking k too large will mean you can still get large swings, while picking k too small might mean that the "line" drifts well off-course, with some workers never picked at all, because each assignment alters the direction so drastically. You might need to experiment to find a good k.
To keep the assignments non-clustered, associate merits with each workers jobs inversely proportional to the intended share, e.g., 31 * 9 for w1, 60 * 9 for w2, and 31 * 60 for w3. Start mit no merits for each worker, next job goes to worker with least merits, and lesser ordinal in case of ties. Accumulate merits for jobs done. (On overflow from one accumulator, subtract MAXVALUE - 31 * 60 from each.)
(Before anyone asks, this is not homework.)
I have a set of workers with interests, i.e.:
Bob: Java, XML, Ruby
Susan: Java, HTML, Python
Fred: Python, Ruby
Sam: Java, Ruby
etc.
(There are actually somewhere in the range of 10-25 "interests" for each worker, and I have around 40-50 workers)
At the same time, I have a very large set of tasks that need to be distributed among the workers. Each task has to be assigned to at least 3 workers, and the workers must match at least one of the tasks' interests:
Task 1: Ruby, XML
Task 2: XHTML, Python
and so on. So Bob, Fred, or Sam could get Task 1; Susan or Fred could get Task 2.
This is all stored in a database thusly:
Task
id integer primary key
name varchar
TaskInterests
task_id integer
interest_id integer
Workers
id integer primary key
name varchar
max_assignments integer
WorkerInterests
worker_id
interest_id
Assignments
task_id
worker_id
date_assigned
Each worker has a maximum number of assignments they will do, around 10. Some interests are more rare than others (i.e. only 1 or 2 workers have listed them as a interest), some interests are more common (i.e. half of the workers list them).
The algorithm must:
Assign every task to 3 workers (it is
assumed that at least 3 of the
workers are interested in one of the
interests of the task).
Assign every worker 1 or more tasks
Ideally, the algorithm will:
Assign each worker a number of tasks proportional to their maximum assignments and the total number of tasks. For example, if Susan says she will do 20 tasks and most people will only do 10 tasks and there are 50 workers and 300 tasks, she should be assigned 12 tasks (20/10*(300/50)).
Assign a variety of tasks to each worker, so if Susan lists 4 interests she gets tasks that include 4 interests (rather than getting 10 tasks all with the same interest)
The most difficult aspect so far has been dealing with theses issues:
tasks having interests with few corresponding workers
workers who have few interests, especially
workers who have a few interests, for which there are relatively few tasks
This problem can be modeled as a
Maximum Flow Problem.
In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.
With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.
Let me number the requirements.
Required:
1. Workers are assigned no more than their maximum assignments.
2. Tasks can only be assigned to workers that match one of the task's interests.
3. Every task must be assigned to 3 workers.
4. Every worker must be assigned to at least 1 task.
Optional:
5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments
6. Each worker should be assigned a variety of tasks.
I will assume that the maximum flow is found using the
Edmonds-Karp Algorithm.
Let's first find a graph that meets requirements 1-3.
Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.
In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.
In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.
In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.
Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks) then there is no assignment meeting requirements 1-3. If not, there is such an assignment and we can find it by examining the final augmented graph. In the augmented graph, if there is an edge from a task to a worker with capacity 1, then that worker is assigned to that task.
Now, we will modify our graph and algorithm to meet the rest of the requirements.
First, let's meet requirement 4. This will require a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Now, in your final residual graph, for each worker the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue Edmonds-Karp algorithm as before. Basically what we have done is first find an assignment such that each worker is assigned to exactly one task. Then delete the reverse edge from that task so that the worker will always be assigned to at least one task(though it may not be the one assigned to in the first pass).
Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by sum of all worker's maximum assignments / number of tasks. This will quite possibly not have a satisfying assignment. But that's ok. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from tasks to sink, we are done. Otherwise we can increment the capacities from sink to workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned too many tasks. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. These are both easy to do.
Finally let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker add a node for each of that workers interests. From each of these new nodes, add an edge to each task with a matching interest with capacity 1. Add an edge from each worker to each of its interest nodes with capacity 1. Now, a flow in this graph would enforce that if a worker is assigned to n tasks, then the intersection of the union of those task's interests with that worker's interests has size at least n. Again, it is possible that there is a satisfying assignment without this assignment, but there is not one with it. We can handle this the same as requirement 5: run Edmonds-Karp to completion, if no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.
Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each employee, in the new column insert a node for each task (so each employee has its own node for each task). From these new nodes, to their corresponding task to the right, add an edge with capacity 1. From each worker's interests node to that worker's task nodes, add an edge with capacity 1 from each interest to each task that matches.
-
EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.
Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. Now, you can think of a task being matched to a worker-interest pair. The first edge says that for a worker, each of its interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each job. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 flow from the worker to the global task node due to the third edge.
-
Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.
A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)
When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.
For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to the worker with the capacity equal to worker's maximum assignments / (sum of all worker's maximum assignments / number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost such that as you add tasks to a worker the cost to add another task to that user goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1,2,3,4,....
A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.
This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be made such that they are proportional to a worker's maximum assignments. Then assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra to a worker with max 2.
Complexity Analysis
Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.
Requirement 3 implies that n = O(m). Let's also assume that l = O(m) and j = O(m).
In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.
After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.
Using the min-cost max-flow formulation, we can find a solution in O(VEBlogV) where V = # vertices, E = # edges, and B = max capacity on a single edge. In our case this gives O(kjn^2m^2log(nm)).
For problems where finding a direct solution is difficult it can be a good idea to use an approximation algorithm, an evaulation function and a method to improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.
The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.
With genetic algorithms a group of for example 100 random solutions is generated and scored and the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generations, but with some random mutations.
For simulated annealing the probablility of a slightly worse solutions being accepted is high initially, but decreases over time. This reduces the risk of getting stuck at a local optimium early on.
Try mapping your task to the stable marriage problem. Tasks become prospective wives `, and your staff become suitors.
You might want to add some extra algorithm for assigning preferences of each task to the staff, and vice-versa - you could assign some ideal proficiency neccessary for the components of each task, and then allow your staff to rank each task. You could assign a proficiency for each component that each staff member posses and use that to get each tasks preference in staff members.
Once you have the preferences then run the algorithm, post the results, then allow people to apply in pairs to you to swap assignments - after all this is a people problem and people work better when they have a degree of control.
So I gave this problem some thought and I think that you can get a good solution (for some definition of "good") by reducing it to an instance of min-cost max-flow (see this, for example). The idea is as follows. Suppose you are given as input a set of jobs J, each of which has a set of skills necessary, along with a set of workers W, each of whom has a set of talents. You are also given for each worker a constant k_i saying how many jobs you'd like them to do, as well as a constant m_i saying the maximum number of jobs you can allocate to them. Your goal is to assign the jobs to the workers in such a way that each job is done by a worker who has the skills, no worker does more than m_i jobs, and the number of the "excess" jobs done by the workers is minimized. For example, if the re are five workers who each want to do four tasks and the load is balanced so that two workers do four jobs, one does three, and one does five, the total excess is one, since one worker did one more job than was expected.
The reduction is as follows. For now, we'll ignore the balancing requirement and just see how tom reduce this to max-flow; we'll add load balancing at the end. Construct a graph G with a designated start node s and sink node t. Add to this graph a node for each job j and each worker w. There will be an edge from s to each of these j nodes of cost zero and capacity one. There will also be an edge from each w node to t with cost zero and capacity m_i. Finally, for each job j and worker w, if worker w has the talents necessary to complete job j, there is an edge from j to w with cost zero and capacity one.
The idea is that we want to push flow from s to t through the j and w nodes such that each flow path going through some j node to a w node means that job j should be given to worker w. The capacity restrictions on the edges from s to j nodes ensures that at most one unit of flow enters the j node, so the job is only assigned at most once. The capacity restriction on the edges from the w nodes to the node t prevent each worker from being assigned too many times. Since all capacities are integral, an integral max flow exists from s to t, and so a max-flow in this graph corresponds to an assignments of jobs to workers that is legal and doesn't exceed any worker's maximum load. You can check whether all jobs are assigned by looking at the total flow in the graph; if it's equal to the number of jobs, they've all been assigned.
This above construction, however, does nothing to balance worker loads. To fix this, we'll modify the construction a bit. Rather than having an edge from each w node to t, instead, for each w node, add two nodes to the graph, c and e, and connect them as follows. There is an edge from w_i to c_i with capacity k_i and cost zero, and an identical edge from c_i to t. There is also an edge from w_i to e_i with cost 1 and capacity m_i - k_i. There is also an edge from e_i to t with equal capacity and zero cost.
Intuitively, we haven't changed the amount of flow that leaves any w node, but we have changed how much that flow costs. Flow shunted to t via the c node is free, and so the worker can take on k_i jobs without incurring cost. Any jobs after that have to be routed through e, which costs one for each unit of flow crossing it. Finding a max-flow in this new graph still determines an assignment, but finding the min-cost max-flow in the graph finds the assignment that minimizes the excess jobs divvied up to workers.
Min-cost max flows can be solved in polynomial time with a few somewhat-well-known algorithms, so hopefully this is a useful answer!