Optimally fair load balancing/multiprocessor scheduling of periodic tasks - algorithm

I’ve been thinking about scheduling and load balancing algorithms, and I came up with a problem that I think is interesting.
There are N cages and M zookeepers. Each cage has a size S and a number of animals A. The frequency with which a cage must be cleaned is computed as some function of A / S (smaller cages with more animals get dirty faster). The difficulty of cleaning a cage is computed as some other function of A and S, the details of which are unimportant (the size of a cage contributes most of the difficulty, and the number of animals contributes a little). Once every three days, any cages that are due for cleaning are cleaned (“cleaning day”). Zookeepers are completely identical and interchangeable. Zookeepers need to do a similar amount of work each cleaning day, with no one having to do much more work than any other zookeeper. The duration of time that a cage takes to clean is not part of the problem (it's assumed that time is roughly reflected by difficulty, and that there is always enough time in the day for a zookeeper to complete their assigned tasks).
The scheduling algorithm must tell each zookeeper which cages to clean on each cleaning day, such that
each cage is cleaned at its ideal frequency or as close as possible,
assigning a minimal and roughly equal number of cleanings per zookeeper per cleaning day,
and assuring as equal a workload as possible across all zookeepers (i.e., over a period of time, the aggregate difficulties of each zookeeper’s workload are as close to equal as possible, and cages are distributed among zookeepers with roughly 1/M probability).
I’m wondering what an approximation algorithm for such an optimization problem would look like. It bears a resemblance to several different classic examples of NP-hard scheduling/resource utilization problems; maybe it is reducible to one such problem and I’m just missing it. If we get rid of the frequency/periodicity of tasks element, it is similar to the classic multiprocessor scheduling or finite bin packing problem.

Given that the objective is to equalize zookeeper effort, the more-or-less standard way to handle such tasks is on-line greedy.
In this case, that amounts to this:
Maintain a tally of the effort each zookeeper has expended so far, initially zero. On each cleaning day, tally up the needed cleaning jobs and use first-fit, best-fit, or random fit to assign jobs in a way that will tend to equalize the work sums, i.e., for best fit, assign the biggest job to the keeper farthest behind in work assigned so far. Repeat until all tasks are assigned.
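A minimal sketch of that best-fit rule in Java (all names here are illustrative; difficulty is whatever function of A and S you settle on):

import java.util.*;

// Illustrative types for the sketch; not part of the original problem statement.
record Job(String cage, double difficulty) {}

class Keeper {
    final String name;
    double totalEffort = 0;                      // tally across all cleaning days
    final List<Job> today = new ArrayList<>();   // today's assignment
    Keeper(String name) { this.name = name; }
}

class GreedyBalancer {
    // Best fit: biggest job first, each to the keeper farthest behind in effort.
    static void assign(List<Job> dueToday, List<Keeper> keepers) {
        dueToday.sort(Comparator.comparingDouble(Job::difficulty).reversed());
        PriorityQueue<Keeper> byEffort =
                new PriorityQueue<>(Comparator.comparingDouble((Keeper k) -> k.totalEffort));
        byEffort.addAll(keepers);
        for (Job job : dueToday) {
            Keeper least = byEffort.poll();      // keeper with the least effort so far
            least.today.add(job);
            least.totalEffort += job.difficulty();
            byEffort.add(least);                 // re-queue with the updated tally
        }
    }
}

To also keep the count of cleanings per keeper roughly equal, break effort ties on job count, or cap the number of jobs a keeper may take per day.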

Related

Parallel Solving with PDPTW in OptaPlanner

I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have PDPTW:
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them what time slots are available in a day (8am, 9am, 10am, etc). Each time slot has different score outcomes. Some are very efficient and some aren't bookable depending on the time/situation with increased drive times.
For performance, I don't want to try each of the hour times in sequence as it's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first, before adding the customer's potential delivery, and then share that solved original state across all the candidate time slots, each being solved independently.
Is there an intuitive way to do this? Eg:
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!
A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already assigned customers, then you forgo all optimization benefits. It's not NP-hard.
You can still use OptaPlanner <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just be the incremental score calculation, not the AI algorithms.
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO₂ emissions).
What about the response within one second?
At that point, you don't need to publish (= share info with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There are two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if it gets 5 customers in the 5 corners of the country/state, then it might still be infeasible. Factor in the average Euclidean distance between any 2 customer location pairs to influence the decision.
D) By running OptaPlanner until termination on feasible=true, starting from a warm start of the previous schedule. If no such feasible solution is found within 1000ms, reject the time window proposal.
Pitfall with D): if 2 requests come in at the same time, and you run them in parallel, so neither takes into account the other one, they could be feasible individually but infeasible together.
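For completeness, the fan-out the question asks about (one solver run per candidate hour, all sharing the already-solved base state) can be sketched in plain Java. solveWithCandidate and the types around it are hypothetical placeholders, not OptaPlanner API:

import java.util.*;
import java.util.concurrent.*;

class SlotProbe {
    // Hypothetical stand-ins: 'baseSchedule' is your already-solved plan,
    // SlotResult the outcome for one candidate hour.
    record SlotResult(int hour, boolean feasible, long score) {}

    // Placeholder: clone baseSchedule, insert the delivery at 'hour', run a
    // construction heuristic / feasibility check, and return its score.
    static SlotResult solveWithCandidate(Object baseSchedule, int hour) {
        return new SlotResult(hour, true, 0);    // dummy result for the sketch
    }

    static List<SlotResult> probeAll(Object baseSchedule, List<Integer> hours)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(hours.size());
        try {
            List<Future<SlotResult>> futures = new ArrayList<>();
            for (int h : hours)                   // one task per candidate slot
                futures.add(pool.submit(() -> solveWithCandidate(baseSchedule, h)));
            List<SlotResult> results = new ArrayList<>();
            for (Future<SlotResult> f : futures)
                results.add(f.get());             // gather all slot outcomes
            return results;
        } finally {
            pool.shutdown();
        }
    }
}

Note the pitfall above still applies: two such probes running concurrently do not see each other's tentative bookings.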

Algorithm for detecting clusters in peaks in time series signal data

I have a binary time series data set with on/off data. The 'on' state is usually short-lived, hence it looks like a peak.
I have detected the peaks and extracted the time intervals between the peaks, and have data for those too. The issue is that the peaks are clustered, and I would want quantification regarding the burst size (number of peaks in a cluster), the interburst interval (distance between the last peak of one cluster and the first peak of the next), the number of bursts, etc.
All this is easy to do once the clusters are identified. This can be easily done by thresholding the interpeak interval to be greater than some value. But all of my data doesn't have such well-defined clusters, and the interburst interval varies largely. Some of the datasets do not even have clusters. So my main issue here would be to identify clusters based on some automated and relative (not fixed) thresholding.
Could someone please help me with an algorithm for this?
The answer to your question is: No. No one can (yet) help you with an algorithm for what you want.
The problem is that you don't have anything well quantified. You're asking for a reliable algorithm that can identify clusters, when you can't identify what a cluster is.
I wrote a previous answer that recommended you look at the ratio of one inter-peak interval to the next. If the ratio is above a certain threshold, then it's an inter-cluster gap; otherwise it's an intra-cluster gap. That can work, but it does still have a threshold.
The problem is - you need one. You can't just eyeball each graph and say "Oh, there's a cluster." If you don't define a cluster, you can't identify one. There are ways to make your thresholds more generic; the ratio is one of the simpler ways that lets you avoid scaling issues, and is generally effective. You could look at rolling averages. There are all sorts of ways to play with your data, but somewhere in there, you have to define what you want. Even if you trained some artificial intelligence, you should ideally be doing it with a fixed criterion as to what's a cluster and what isn't. And once you have the fixed criterion, you don't need artificial intelligence.
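To make the ratio suggestion concrete, here is a minimal sketch. It uses the median gap as the reference (a variant of the same relative-threshold idea), and the ratio parameter is still a threshold you must choose:

import java.util.*;

class GapClassifier {
    // Split ordered peak times into clusters wherever a gap exceeds
    // 'ratio' times the median gap: a relative threshold, not a fixed one.
    // Assumes at least two peaks.
    static List<List<Double>> clusters(double[] peakTimes, double ratio) {
        double[] gaps = new double[peakTimes.length - 1];
        for (int i = 0; i < gaps.length; i++)
            gaps[i] = peakTimes[i + 1] - peakTimes[i];
        double[] sorted = gaps.clone();
        Arrays.sort(sorted);
        double medianGap = sorted[sorted.length / 2];

        List<List<Double>> out = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        current.add(peakTimes[0]);
        for (int i = 0; i < gaps.length; i++) {
            if (gaps[i] > ratio * medianGap) {   // treat as an inter-cluster gap
                out.add(current);
                current = new ArrayList<>();
            }
            current.add(peakTimes[i + 1]);
        }
        out.add(current);
        return out;
    }
}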
So, define a cluster. Once you can quantify what a cluster means to you, you can work on making an algorithm for it.
Start by answering these questions:
How many peaks at a minimum are needed to define a cluster?
Is there a minimum or maximum time between peaks that makes it not a cluster? How about a minimum or maximum time that's relative to the entire time of the dataset?
Is there a minimum distance between clusters that makes it two instead of one?
If it helps, look at simplified plots like these to help you come up with your answers. Can you define a cluster for each of these?
..||.|.|.|.||
|.|.|.|.|.|.|
||..||..||..|
||....||....|
|...||||.....

Assignment Problem when jobs are available more than once

I have a normal assignment problem, where I want to match workers to jobs. But there are several kinds of jobs, each with a set amount of positions. So for example I would need 10,000 builders, 5,000 welders etc. Each worker has of course the same preference for each position of the same kind of job.
My current approach is to use the Hungarian Algorithm and to just extend the matrix columns to account for that. So for example it would have 10,000 builder columns, 5,000 welder columns, etc. Of course with O(n³) and a matrix that big, getting results may take a while.
Is there any variation of the Hungarian algorithm, or a different one, which uses the fact that there can be multiple connections to one job node? Or should I rather look into Monte Carlo or genetic search tree algorithms?
edit:
Formal description as Sascha proposed:
Set $W$ for workers, $J$ for jobs, weight function $c : W \times J \to \mathbb{R}$ for the preference, function $a : J \to \mathbb{N}$ for the amount of positions available per job.
So the function I want to minimize would be:
$\min \sum_{w \in W} \sum_{j \in J} c(w, j) \, x_{w,j}$
where $x_{w,j} \in \{0, 1\}$ indicates whether worker $w$ is assigned to job $j$.
Constraints would be:
$\sum_{j \in J} x_{w,j} = 1 \quad \forall w \in W$ (each worker fills exactly one position)
and
$\sum_{w \in W} x_{w,j} \le a(j) \quad \forall j \in J$ (no job gets more workers than it has positions).
As asked by Yay295, it would be ok if it ran for a day or two on a normal consumer machine. There are 50k workers right now with 10 kinds of jobs and 50k jobs total. So the matrix is 50k x 50k (extended) in the case of the Hungarian algorithm I'm using right now, or 50k x 10 for LP with the additional constraint $\sum_{w \in W} x_{w,j} \le a(j)$, while $\sum_{j \in J} a(j) = 50\text{k}$, and preference values in the matrix would go from 0-100.
This is actually called the Transportation Problem. The Transportation Problem is similar to the Assignment Problem in that they both have sources and destinations, but the Transportation Problem has two more values: each source has a supply, and each destination has a demand. The Assignment Problem is a simplification of the Transportation Problem in which the supply of each source and the demand of each destination is 1.
In your case, you have 50,000 sources (your workers) each with a supply of 1 (each worker can only work one job). You also have 10 destinations (the job types) each with some amount of demand (the number of openings for that type).
The Transportation Problem is traditionally solved with the Simplex Algorithm. I couldn't tell you how it works off the top of my head, but there is plenty of information available elsewhere online on how to do it. I would recommend these two videos: first, second.
 
Alternatively, the Transportation Problem can actually also be solved using the Hungarian Algorithm. The idea is to keep track of your supply and demand separately, and then use the Hungarian Algorithm (or any other algorithm for the Assignment Problem) to solve it as if the supply and demand were 1 (this can be incredibly fast when it's as lopsided as 50,000 sources to 10 destinations as in your case). Once you've solved it once, use the results to decrement the supply and demand of the assigned solution appropriately. Repeat until the sum of either supply or demand is zero.
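A structural sketch of that decrement-and-repeat loop; solveAssignment is a hypothetical stand-in for whatever Assignment Problem solver you already have, and is deliberately left unimplemented:

import java.util.*;

class TransportationViaAssignment {
    // Hypothetical helper: any Assignment Problem solver (e.g. Hungarian).
    // cost[i][t]: cost of remaining worker i taking open job type t.
    // Returns, for each open type t, the index i of the worker matched to it.
    static int[] solveAssignment(double[][] cost) {
        throw new UnsupportedOperationException("plug in your solver here");
    }

    // pref[w][j] = worker w's cost for job type j; demand[j] = open positions.
    static int[] assignAll(double[][] pref, int[] demand) {
        int[] jobOf = new int[pref.length];
        Arrays.fill(jobOf, -1);
        List<Integer> remaining = new ArrayList<>();
        for (int w = 0; w < pref.length; w++) remaining.add(w);
        int[] open = demand.clone();

        while (Arrays.stream(open).sum() > 0) {
            // One column per job type that still has demand (unit supply/demand).
            List<Integer> types = new ArrayList<>();
            for (int j = 0; j < open.length; j++) if (open[j] > 0) types.add(j);
            double[][] cost = new double[remaining.size()][types.size()];
            for (int i = 0; i < remaining.size(); i++)
                for (int t = 0; t < types.size(); t++)
                    cost[i][t] = pref[remaining.get(i)][types.get(t)];

            int[] worker = solveAssignment(cost);  // one worker per open type
            for (int t = 0; t < types.size(); t++) {
                jobOf[remaining.get(worker[t])] = types.get(t);
                open[types.get(t)]--;              // decrement demand
            }
            int[] used = worker.clone();           // remove matched workers
            Arrays.sort(used);                     // (decrement supply)
            for (int k = used.length - 1; k >= 0; k--)
                remaining.remove(used[k]);         // remove by index, descending
        }
        return jobOf;
    }
}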
However, none of this may be necessary. I wrote my own Assignment Problem solver in C++ a few years ago, and despite using 2.5GB of RAM, it can solve a 50,000 by 50,000 assignment problem in less than 5 seconds. The trick is to write your own. Before I wrote mine I had a look around at what was available online, and they were all pretty bad, often with obvious bugs. If you are going to write your own code for this though, it would be better to use the Simplex Algorithm as described in the videos I linked above. I don't know that one is faster than the other, but the Hungarian Algorithm wasn't made for the Transportation Problem.
 
ps: The same person who did the two lectures I linked above also did one on the Assignment Problem and the Hungarian Algorithm.

Simulation Performance Metrics

This is a semi-broad question, but it's one that I feel on some level is answerable or at least approachable.
I've spent the last month or so making a fairly extensive simulation. In order to protect the interests of my employer, I won't state specifically what it does... but an analogy of what it does may be explained by... a high school dance.
A girl or boy enters the dance floor, and based on the selection of free dance partners, an optimal choice is made. After a period of time, two dancers finish dancing and are now free for a new partnership.
I've been making partner selection algorithms designed to maximize average match outcome while not sacrificing wait time for a partner too much.
I want a way to gauge / compare versions of my algorithms in order to make a selection of the optimal algorithm for any situation. This is difficult, however, since the inputs of my simulation are extremely large matrices of input parameters (2-5 per dancer), and the simulation takes several minutes to run (a fact that makes it difficult to test a large number of simulation inputs). I have a few output metrics, but linking them to the large number of inputs is extremely hard. I'm also interested in finding which algorithms completely fail under certain input conditions...
Any pro tips / online resources which might help me in defining input constraints / output variables which might give clarity on an optimal algorithm?
I might not understand what you exactly want. But here is my suggestion. Let me know if my solution is inaccurate/irrelevant and I will edit/delete accordingly.
Assume you have a certain metric (say compatibility of the pairs, or waiting time). If you just have the average or total number for this metric over all the users, it is kind of useless. Instead you might want to find the distribution of this metric over all users. If nothing else, you should always keep track of the variance. Once you have the distribution, you can calculate the probability that a particular algorithm A is better than B for a certain metric.
If you do not have the distribution of the metric within an experiment, you can always run multiple experiments; the number of experiments you need to run depends on the variance of the metric and the difference between the two algorithms.
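For example, a minimal way to track the distribution instead of just the average is Welford's online mean/variance, fed one metric value per run (or per user):

// Welford's online algorithm: numerically stable running mean and variance.
class RunningStats {
    private long n = 0;
    private double mean = 0, m2 = 0;

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }
    double mean()     { return mean; }
    double variance() { return n > 1 ? m2 / (n - 1) : 0; }  // sample variance
}

With a mean and variance per algorithm, you can then compare confidence intervals (or run a t-test) to estimate the probability that algorithm A beats algorithm B.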

Clustering Algorithm for Paper Boys

I need help selecting or creating a clustering algorithm according to certain criteria.
Imagine you are managing newspaper delivery persons.
You have a set of street addresses, each of which is geocoded.
You want to cluster the addresses so that each cluster is assigned to a delivery person.
The number of delivery persons, or clusters, is not fixed. If needed, I can always hire more delivery persons, or lay them off.
Each cluster should have about the same number of addresses. However, a cluster may have fewer addresses if a cluster's addresses are more spread out. (Worded another way: minimum number of clusters where each cluster contains a maximum number of addresses, and any address within a cluster must be separated by a maximum distance.)
For bonus points, when the data set is altered (address added or removed), and the algorithm is re-run, it would be nice if the clusters remained as unchanged as possible (i.e., this rules out simple k-means clustering, which is random in nature). Otherwise the delivery persons will go crazy.
So... ideas?
UPDATE
The street network graph, as described in Arachnid's answer, is not available.
I've written an inefficient but simple algorithm in Java to see how close I could get to doing some basic clustering on a set of points, more or less as described in the question.
The algorithm works on a list of (x,y) coords ps that are specified as ints. It takes three other parameters as well:
radius (r): given a point, what is the radius for scanning for nearby points
max addresses (maxA): what are the maximum number of addresses (points) per cluster?
min addresses (minA): minimum addresses per cluster
Set limitA=maxA.
Main iteration:
Initialize empty list possibleSolutions.
Outer iteration: for every point p in ps.
Initialize empty list pclusters.
A worklist of points wps=copy(ps) is defined.
Workpoint wp=p.
Inner iteration: while wps is not empty.
Remove the point wp from wps. Determine all the points wpsInRadius in wps that are at a distance < r from wp. Sort wpsInRadius in ascending order of distance from wp. Keep the first min(limitA, sizeOf(wpsInRadius)) points in wpsInRadius. These points form a new cluster (list of points) pcluster. Add pcluster to pclusters. Remove the points in pcluster from wps. If wps is not empty, set wp=wps[0] and continue the inner iteration.
End inner iteration.
A list of clusters pclusters is obtained. Add this to possibleSolutions.
End outer iteration.
We have for each p in ps a list of clusters pclusters in possibleSolutions. Every pclusters is then weighted. If avgPC is the average number of points per cluster in possibleSolutions (global) and avgCSize is the average number of clusters per pclusters (global), then this is the function that uses both these variables to determine the weight:
private static WeightedPClusters weigh(List<Cluster> pclusters, double avgPC, double avgCSize)
{
    double weight = 0;
    for (Cluster cluster : pclusters)
    {
        int ps = cluster.getPoints().size();
        double psAvgPC = ps - avgPC;
        weight += psAvgPC * psAvgPC / avgCSize;
        weight += cluster.getSurface() / ps;
    }
    return new WeightedPClusters(pclusters, weight);
}
The best solution is now the pclusters with the least weight. We repeat the main iteration as long as we can find a better solution (less weight) than the previous best one with limitA=max(minA,(int)avgPC). End main iteration.
Note that for the same input data this algorithm will always produce the same results. Lists are used to preserve order, and there is no randomness involved.
To see how this algorithm behaves, this is an image of the result on a test pattern of 32 points. If maxA=minA=16, then we find 2 clusters of 16 addresses.
(source: paperboyalgorithm at sites.google.com)
Next, if we decrease the minimum number of addresses per cluster by setting minA=12, we find 3 clusters of 12/12/8 points.
(source: paperboyalgorithm at sites.google.com)
And to demonstrate that the algorithm is far from perfect, here is the output with maxA=7, yet we get 6 clusters, some of them small. So you still have to guess too much when determining the parameters. Note that r here is only 5.
(source: paperboyalgorithm at sites.google.com)
Just out of curiosity, I tried the algorithm on a larger set of randomly chosen points. I added the images below.
Conclusion? This took me half a day, it is inefficient, the code looks ugly, and it is relatively slow. But it shows that it is possible to produce some result in a short period of time. Of course, this was just for fun; turning this into something that is actually useful is the hard part.
(images: paperboyalgorithm at sites.google.com)
What you are describing is a (Multi)-Vehicle-Routing-Problem (VRP). There's quite a lot of academic literature on different variants of this problem, using a large variety of techniques (heuristics, off-the-shelf solvers etc.). Usually the authors try to find good or optimal solutions for a concrete instance, which then also implies a clustering of the sites (all sites on the route of one vehicle).
However, the clusters may be subject to major changes with only slightly different instances, which is what you want to avoid. Still, something in the VRP-Papers may inspire you...
If you decide to stick with the explicit clustering step, don't forget to include your distribution center in all clusters, as it is part of each route.
For evaluating the clusters, using a graph representation of the street grid will probably yield more realistic results than connecting the dots on a white map (although both are TSP variants). If a graph model is not available, you can use the taxicab metric (|x_1 - x_2| + |y_1 - y_2|) as an approximation for the distances.
I think you want a hierarchical agglomeration technique rather than k-means. If you get your algorithm right, you can stop it when you have the right number of clusters. As someone else mentioned, you can seed subsequent clusterings with previous solutions, which may give you a significant performance improvement.
You may want to look closely at the distance function you use, especially if your problem has high dimension. Euclidean distance is the easiest to understand, but may not be the best; look at alternatives such as the Mahalanobis distance.
I'm presuming that your real problem has nothing to do with delivering newspapers...
Have you thought about using an economic/market based solution? Divide the set up by an arbitrary (but constant, to avoid randomness effects) split into even subsets (as determined by the number of delivery persons).
Assign a cost function to each point by how much it adds to the graph, and give each extra point an economic value.
Iterate allowing each person in turn to auction their worst point, and give each person a maximum budget.
This probably matches fairly well how the delivery people would think in real life, as people will find swaps, or will say "my life would be so much easier if I didn't do this one or two." It is also pretty flexible (for example, it would allow one point miles away from any others to be given a premium fairly easily).
I would approach it differently: Considering the street network as a graph, with an edge for each side of each street, find a partitioning of the graph into n segments, each no more than a given length, such that each paperboy can ride a single continuous path from the start to the end of their route. This way, you avoid giving people routes that require them to ride the same segments repeatedly (eg, when asked to cover both sides of a street without covering all the surrounding streets).
This is a very quick and dirty method of discovering where your "clusters" lie. This was inspired by the game "Minesweeper."
Divide your entire delivery space up into a grid of squares. Note - it will take some tweaking of the size of the grid before this will work nicely. My intuition tells me that a square size roughly the size of a physical neighbourhood block will be a good starting point.
Loop through each square and store the number of delivery locations (houses) within each block. Use a second loop (or some clever method on the first pass) to store the number of delivery points for each neighbouring block.
Now you can operate on this grid in a similar way to photo manipulation software. You can detect the edges of clusters by finding blocks where some neighbouring blocks have no delivery points in them.
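A sketch of that grid pass, assuming the geocoded addresses are already projected to planar x/y coordinates (the cell size is the knob that needs tweaking):

import java.util.*;

class DeliveryGrid {
    // Count delivery points per grid cell; the key packs (col, row) into a long.
    static Map<Long, Integer> bucket(double[][] points, double cellSize) {
        Map<Long, Integer> counts = new HashMap<>();
        for (double[] p : points) {
            long col = (long) Math.floor(p[0] / cellSize);
            long row = (long) Math.floor(p[1] / cellSize);
            counts.merge((col << 32) | (row & 0xffffffffL), 1, Integer::sum);
        }
        return counts;
    }

    // "Edge" cells in the Minesweeper sense: occupied cells with at least
    // one empty 8-neighbour.
    static boolean isEdge(Map<Long, Integer> counts, long col, long row) {
        for (int dc = -1; dc <= 1; dc++)
            for (int dr = -1; dr <= 1; dr++) {
                if (dc == 0 && dr == 0) continue;
                long key = ((col + dc) << 32) | ((row + dr) & 0xffffffffL);
                if (!counts.containsKey(key)) return true;
            }
        return false;
    }
}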
Finally you need a system that combines number of deliveries made as well as total distance travelled to create and assign routes. There may be some isolated clusters with just a few deliveries to be made, and one or two super clusters with many homes very close to each other, requiring multiple delivery people in the same cluster. Every home must be visited, so that is your first constraint.
Derive a maximum allowable distance to be travelled by any one delivery person on a single run. Next do the same for the number of deliveries made per person.
The first ever run of the routing algorithm would assign a single delivery person, send them to any random cluster with not all deliveries completed, let them deliver until they hit their delivery limit or they have delivered to all the homes in the cluster. If they have hit the delivery limit, end the route by sending them back to home base. If they could safely travel to the nearest cluster and then home without hitting their max travel distance, do so and repeat as above.
Once the route is finished for the current delivery person, check if there are homes that have not yet had a delivery. If so, assign another delivery person, and repeat the above algorithm.
This will generate initial routes. I would store all the info - the location and dimensions of each square, the number of homes within a square and all of its direct neighbours, the cluster to which each square belongs, the delivery people and their routes - I would store all of these in a database.
I'll leave the recalc procedure up to you - but having all the current routes, clusters, etc in a database will enable you to keep all historic routes, and also try various scenarios to see how to best to adapt to changes creating the least possible changes to existing routes.
This is a classic example of a problem that deserves an optimized solution rather than trying to solve for "The OPTIMUM". It's similar in some ways to the "Travelling Salesman Problem", but you also need to segment the locations during the optimization.
I've used three different optimization algorithms to good effect on problems like this:
Simulated Annealing
Great Deluge Algorithm
Genetic Algorithms
Using an optimization algorithm, I think you've described the following "goals":
1. The geographic area for each paper boy should be minimized.
2. The number of subscribers served by each should be approximately equal.
3. The distance travelled by each should be about equal.
4. (And one you didn't state, but might matter) The route should end where it began.
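To make that concrete, here is a minimal simulated-annealing sketch for goals 1 and 2 (compact areas, equal subscriber counts). The cost weights, cooling rate, and iteration budget are arbitrary placeholders, and the fixed random seed keeps re-runs repeatable, which also helps with the bonus requirement:

import java.util.*;

class AnnealRoutes {
    // pts[i] = {x, y}; assign[i] = carrier index. Cost mixes the two goals:
    // small geographic spread per carrier, and equal subscriber counts.
    static double cost(double[][] pts, int[] assign, int carriers) {
        double[] cx = new double[carriers], cy = new double[carriers];
        int[] cnt = new int[carriers];
        for (int i = 0; i < pts.length; i++) {
            int c = assign[i];
            cx[c] += pts[i][0]; cy[c] += pts[i][1]; cnt[c]++;
        }
        double spread = 0;
        for (int i = 0; i < pts.length; i++) {
            int c = assign[i];
            double dx = pts[i][0] - cx[c] / cnt[c], dy = pts[i][1] - cy[c] / cnt[c];
            spread += dx * dx + dy * dy;            // goal 1: compact areas
        }
        double balance = 0, ideal = pts.length / (double) carriers;
        for (int c = 0; c < carriers; c++)
            balance += (cnt[c] - ideal) * (cnt[c] - ideal);  // goal 2: equal counts
        return spread + 100 * balance;              // arbitrary weighting
    }

    static int[] anneal(double[][] pts, int carriers, long iters) {
        Random rnd = new Random(42);                // fixed seed => repeatable runs
        int[] assign = new int[pts.length];
        for (int i = 0; i < assign.length; i++) assign[i] = i % carriers;
        double cur = cost(pts, assign, carriers), temp = 1000;
        for (long it = 0; it < iters; it++, temp *= 0.9995) {
            int i = rnd.nextInt(pts.length), old = assign[i];
            assign[i] = rnd.nextInt(carriers);      // move one address elsewhere
            double next = cost(pts, assign, carriers);
            if (next > cur && rnd.nextDouble() >= Math.exp((cur - next) / temp))
                assign[i] = old;                    // reject uphill move
            else cur = next;                        // accept (Metropolis rule)
        }
        return assign;
    }
}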
Hope this gets you started!
* Edit *
If you don't care about the routes themselves, that eliminates goals 3 and 4 above, and perhaps allows the problem to be more tailored to your bonus requirements.
If you take demographic information into account (such as population density, subscription adoption rate and subscription cancellation rate) you could probably use the optimization techniques above to eliminate the need to rerun the algorithm at all as subscribers adopted or dropped your service. Once the clusters were optimized, they would stay in balance because the rates of each for an individual cluster matched the rates for the other clusters.
The only time you'd have to rerun the algorithm would be when an external factor (such as a recession/depression) caused changes in the behavior of a demographic group.
Rather than a clustering model, I think you really want some variant of the Set Covering location model, with an additional constraint to cover the number of addresses covered by each facility. I can't really find a good explanation of it online. You can take a look at this page, but they're solving it using areal units and you probably want to solve it in either euclidean or network space. If you're willing to dig up something in dead tree format, check out chapter 4 of Network and Discrete Location by Daskin.
Good survey of simple clustering algos. There is more though:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/index.html
Perhaps a minimum spanning tree of the customers, broken into sets based on locality to the paper boy. Prim's or Kruskal's to get the MST, with the distance between houses as the weight.
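A sketch of that idea using Kruskal's algorithm with union-find; stopping the merging once k components remain is equivalent to building the full MST and cutting its k-1 longest edges (single-link clustering):

import java.util.*;

class MstClusters {
    // Kruskal's algorithm, stopped early: merge components until k remain.
    static int[] cluster(double[][] pts, int k) {
        int n = pts.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        List<int[]> edges = new ArrayList<>();       // all pairwise edges
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) edges.add(new int[]{i, j});
        edges.sort(Comparator.comparingDouble(
                (int[] e) -> dist2(pts[e[0]], pts[e[1]])));

        int components = n;
        for (int[] e : edges) {
            if (components == k) break;
            int a = find(parent, e[0]), b = find(parent, e[1]);
            if (a != b) { parent[a] = b; components--; }
        }
        int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(parent, i);
        return label;                                // equal labels = same cluster
    }

    static int find(int[] parent, int x) {           // union-find with halving
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    static double dist2(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1];
        return dx * dx + dy * dy;
    }
}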
I know of a pretty novel approach to this problem that I have seen applied to Bioinformatics, though it is valid for any sort of clustering problem. It's certainly not the simplest solution but one that I think is very interesting. The basic premise is that clustering involves multiple objectives. For one, you want to minimise the number of clusters, the trivial solution being a single cluster with all the data. The second standard objective is to minimise the amount of variance within a cluster, the trivial solution being many clusters each with only a single data point. The interesting solutions come about when you try to include both of these objectives and optimise the trade-off.
At the core of the proposed approach is something called a memetic algorithm, which is a little like a genetic algorithm (which steve mentioned), however it not only explores the solution space well but also has the ability to focus in on interesting regions, i.e. solutions. At the very least I recommend reading some of the papers on this subject, as memetic algorithms are an unusual approach, though a word of warning: it may lead you to read The Selfish Gene, and I still haven't decided whether that was a good thing... If algorithms don't interest you then maybe you can just try and express your problem as the format requires and use the source code provided. Related papers and code can be found here: Multi Objective Clustering
This is not directly related to the problem, but something I've heard and which should be considered if this is truly a route-planning problem you have. This would affect the ordering of the addresses within the set assigned to each driver.
UPS has software which generates optimum routes for their delivery people to follow. The software tries to maximize the number of right turns that are taken during the route. This saves them a lot of time on deliveries.
For people that don't live in the USA the reason for doing this may not be immediately obvious. In the US people drive on the right side of the road, so when making a right turn you don't have to wait for oncoming traffic if the light is green. Also, in the US, when turning right at a red light you (usually) don't have to wait for green before you can go. If you're always turning right then you never have to wait for lights.
There's an article about it here:
http://abcnews.go.com/wnt/story?id=3005890
You can have k-means or expectation maximization remain as unchanged as possible by using the previous clusters as a clustering feature. Getting each cluster to have the same number of items seems a bit trickier. I can think of how to do it as a post-clustering step: do k-means and then shuffle some points until things balance, but that doesn't seem very efficient.
A trivial answer which does not get any bonus points:
One delivery person for each address.
You have a set of street addresses, each of which is geocoded.
You want to cluster the addresses so that each cluster is assigned to a delivery person.
The number of delivery persons, or clusters, is not fixed. If needed, I can always hire more delivery persons, or lay them off.
Each cluster should have about the same number of addresses. However, a cluster may have fewer addresses if a cluster's addresses are more spread out. (Worded another way: minimum number of clusters where each cluster contains a maximum number of addresses, and any address within a cluster must be separated by a maximum distance.)
For bonus points, when the data set is altered (address added or removed), and the algorithm is re-run, it would be nice if the clusters remained as unchanged as possible (i.e., this rules out simple k-means clustering, which is random in nature). Otherwise the delivery persons will go crazy.
As has been mentioned, a Vehicle Routing Problem is probably better suited... Although it strictly isn't designed with clustering in mind, it will optimize assignments based on the nearest addresses. Therefore your clusters will actually be the recommended routes.
If you provide a maximum number of deliverers and try to reach the optimal solution, this should tell you the minimum that you require. This deals with point 2.
The same number of addresses can be obtained by providing a limit on the number of addresses to be visited, basically assigning a stock value (now it's a capacitated vehicle routing problem).
Adding time windows or hours that the delivery persons work helps reduce the load if addresses are more spread out (now a capacitated vehicle routing problem with time windows).
If you use a nearest neighbour algorithm then you can get identical results each time; removing a single address shouldn't have too much impact on your final result, so that should deal with the last point.
I'm actually working on a C# class library to achieve something like this, and think it's probably the best route to go down, although not necessarily easy to implement.
I acknowledge that this will not necessarily provide clusters of roughly equal size:
One of the best current techniques in data clustering is Evidence Accumulation. (Fred and Jain, 2005)
What you do is:
Given a data set with n patterns.
Use an algorithm like k-means over a range of k, or use a set of different algorithms; the goal is to produce an ensemble of partitions.
Create a co-association matrix C of size n x n.
For each partition p in the ensemble:
3.1 Update the co-association matrix: for each pattern pair (i, j) that belongs to the same cluster in p, set C(i, j) = C(i, j) + 1/N, where N is the number of partitions in the ensemble.
Use a clustering algorithm such as Single Link and apply the matrix C as the proximity measure. Single Link gives a dendrogram as a result, in which we choose the clustering with the longest lifetime.
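A minimal sketch of step 3, assuming each partition in the ensemble is an int[] of cluster labels over the same n patterns:

import java.util.List;

class EvidenceAccumulation {
    // Build the n x n co-association matrix from the ensemble.
    // partitions.get(p)[i] = cluster label of pattern i in partition p.
    static double[][] coAssociation(List<int[]> partitions, int n) {
        double[][] c = new double[n][n];
        double inc = 1.0 / partitions.size();        // the 1/N in the update rule
        for (int[] labels : partitions)
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    if (labels[i] == labels[j]) {    // same cluster in this partition
                        c[i][j] += inc;
                        c[j][i] += inc;
                    }
        return c;
    }
}

Single Link then takes C as the proximity (similarity) matrix, and the dendrogram is cut at the level with the longest lifetime.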
I'll provide descriptions of SL and k-means if you're interested.
I would use a basic algorithm to create a first set of paperboy routes according to where they live, and current locations of subscribers, then:
when paperboys are:
Added: They take locations from one or more paperboys working in the same general area as where the new guy lives.
Removed: His locations are given to the other paperboys, using the closest locations to their routes.
when locations are:
Added : Same thing, the location is added to the closest route.
Removed: just removed from that boy's route.
Once a quarter, you could re-calculate the whole thing and change all the routes.
