I have a set of points (UK full postcode centroids). There is a hierarchical relationship in the postcodes to postcode sectors and postcode districts. The original sectors and districts are contiguous. I wish to derive approximate boundaries for the sectors and districts such that any part of country falls into exactly one sector and exactly one district, all resulting polygons should ideally be contiguous and (obviously?) all original points should be in the appropriate polygons.
Is there some appropriate algorithm? Better yet, is there some appropriate implementation?
I think I must have explained it poorly since I don't think that answers my question.
Let's just talk about sectors since the answer would also apply to districts.
There are 1.8m coordinates. Consider each of these is tagged with a postcode such as "SG13 7AT" The postcode tag can itself reflects the postcode-sector-district structure - the sector in this case is "SG13 7"
There is no other data than these points and their postcode tags.
I know that there exists a boundary that defines the sector. However, this boundary data is not freely available. Each postcode point is known to be inside its true sector boundary.
What I want is to recreate approximations of the sector boundaries such that the points fall within the newly created polygons and so that the polygons I create are contiguous. These boundaries will not be an accurate reflection of the originals, but they are good enough for my purposes.
To obtain the partition of the plane into sectors, according to the sampled postal codes, use the Voronoi diagram calculated on the full set of points, then assign each diagram cell to the sector containing the cell's site.
I am illustrating this with an example on two sectors, red and blue. Say your initial data is this:
Then after computing the Voronoi diagram, the division to cells will look as follows. I outlined the boundary between the red and blue sectors. Note that they are both unbounded, but that's just because the data doesn't include other sectors.
Now to my answer before you clarified things...
What you need is a data structure for "Point Location Queries": given a subdivision of space (in your case, the plane) and a query point, find the object that contains the query point. There are efficient algorithms (log(n) query time) for doing that on lines, segments, and polygons, and they have been implemented in the computational geometry library CGAL.
Note that I used the CGAL 2D triangulation demo to illustrate the solution
Check out this link for the documentation of point location queries.
Related
Given a "density" scalar field in the plane, how can I divide the plane into nice (low moment of inertia) regions so that each region contains a similar amount of "mass"?
That's not the best description of what my actual problem is, but it's the most concise phrasing I could think of.
I have a large map of a fictional world for use in a game. I have a pretty good idea of approximately how far one could walk in a day from any given point on this map, and this varies greatly based on the terrain etc. I would like to represent this information by dividing the map into regions, so that one day of walking could take you from any region to any of its neighboring regions. It doesn't have to be perfect, but it should be significantly better than simply dividing the map into a hexagonal grid (which is what many games do).
I had the idea that I could create a gray-scale image with the same dimensions as the map, where each pixel's color value represents how quickly one can travel through the pixel in the same place on the map. Well-maintained roads would be encoded as white pixels, and insurmountable cliffs would be encoded as black, or something like that.
My question is this: does anyone have an idea of how to use such a gray-scale image (the "density" scalar field) to generate my "grid" from the previous paragraph (regions of similar "mass")?
I've thought about using the gray-scale image as a discrete probability distribution, from which I can generate a bunch of coordinates, and then use some sort of clustering algorithm to create the regions, but a) the clustering algorithms would have to create clusters of a similar size, I think, for that idea to work, which I don't think they usually do, and b) I barely have any idea if any of that even makes sense, as I'm way out of my comfort zone here.
Sorry if this doesn't belong here, my idea has always been to solve it programatically somehow, so this seemed the most sensible place to ask.
UPDATE: Just thought I'd share the results I've gotten so far, trying out the second approach suggested by #samgak - recursively subdividing regions into boxes of similar mass, finding the center of mass of each region, and creating a voronoi diagram from those.
I'll keep tweaking, and maybe try to find a way to make it less grid-like (like in the upper right corner), but this worked way better than I expected!
Building upon #samgak's solution, if you don't want the grid-like structure, you can just add a small random perturbation to your centers. You can see below for example the difference I obtain:
without perturbation
adding some random perturbation
A couple of rough ideas:
You might be able to repurpose a color-quantization algorithm, which partitions color-space into regions with roughly the same number of pixels in them. You would have to do some kind of funny mapping where the darker the pixel in your map, the greater the number of pixels of a color corresponding to that pixel's location you create in a temporary image. Then you quantize that image into x number of colors and use their color values as co-ordinates for the centers of the regions in your map, and you could then create a voronoi diagram from these points to define your region boundaries.
Another approach (which is similar to how some color quantization algorithms work under the hood anyway) could be to recursively subdivide regions of your map into axis-aligned boxes by taking each rectangular region and choosing the optimal splitting line (x or y) and position to create 2 smaller rectangles of similar "mass". You would end up with a power of 2 count of rectangular regions, and you could get rid of the blockiness by taking the centre of mass of each rectangle (not simply the center of the bounding box) and creating a voronoi diagram from all the centre-points. This isn't guaranteed to create regions of exactly equal mass, but they should be roughly equal. The algorithm could be improved by allowing recursive splitting along lines of arbitrary orientation (or maybe a finite number of 8, 16, 32 etc possible orientations) but of course that makes it more complicated.
Task with example
I'm working with geodata (country-size) from openstreetmap. Buildings are often polygons without housenumbers and a single point with the housenumber is placed within the polygon of the building. Buildings may have multiple housenumbers.
I want to match the housenumbers to the polygons of the buildings.
Simple solution
Foreach housenumber perform a point-in-polygon-test with each building-polygon.
Problem
Way too slow for about 50,000,000 buildings and 10,000,000 address-points.
Idea
Build and index for the building-polygons to accelerate the search for the surrounding polygon for each housenumber-point.
Question
What index or strategy would you recommend for this polygon-structure? The polygons never overlap and the area is sparsly covered.
This question is duplicated to gis.stackexchange.com. It was recommendet to post the question there.
Since it sounds like you have well-formed polygons to test against, I'd use a spatial hash with a AABB check, and then finally the full point-in-polygon test. Hopefully at that point you'll be averaging three or less point-in-polygon tests per address.
Break the area your data is over into a simple grid where a grid is a small multiple (2 to 4) of the median building size. (Maybe 100-200 meters?)
Compute the axis aligned bounding box of every polygon, add it (with its bounding box) to each grid location which the bounding box intersects. (It's pretty simple to figure out where an axis aligned bounding box overlaps regular axis aligned grid cells. I wouldn't store the grid in a simple 2D array -- I'd use a hash table that maps 2D integer grid coordinates, e.g. (1023, 301), to a list of polygons)
Then go through all your address points. Look up in your hash table what cell that point is in. Go through all the polygons in that cell and if the point is within any polygon's axis aligned bounding box do the full point-in-polygon test.
This has several advantages:
The data structures are simple -- no fancy libraries needed (other than handling polygons). With C++, your polygon library, and the std namespace this could be implemented in less than an hour.
Spatial structure isn't hierarchical -- when you're looking up the points you only have to do one O(1) lookup in the hash table.
And of course, the usual disadvantage of grids as a spatial structure:
Doesn't handle wildly varying sized polygons particularly well. However, I'm hoping since you're using map data the sizes are almost always within an order of magnitude, and probably much less.
Assuming you end up with N maximum polygons in each of grid and each polygon has P points and you've got B buildings and A addresses, you're looking at O(B*P + N*A). Since B and P are likely relatively small, especially on average, you could consider this O(B + N) -- pretty much linear.
Here is my specific problem. I need to write an algorithm that:
1) Takes these 2 arrays:
a) An array of about 3000 postcodes (or zip codes if you're in the US), with the longitude and latitude of the center point of the areas they cover (that is, 3 numbers per array element)
b) An array of about 120,000 locations, consisting of longitude and latitude
2) Converts each location to the postcode whose centerpoint is closests to the given longitude and latitude
Note that the longitudes and latitudes of the locations are very unlikely to precisely match those in the postcodes array. That's why I'm looking for the shortest distance to the center point of the area covered by the postcode.
I know how to calculate the distance between two longitude/latitude pairs. I also appreciate that being closests to the center point of an area covered by a postcode doesn't necessarily mean you are in the area covered by that postcode - if you're in a very big postcode area but close to the border, you may be closer to the center point of a neighbouring postcode area. However, in this case I don't have to take this into account - shortest distance to center point is enough.
A very simple way to solve this problem would be to visit each of the 120,000 locations, and find the postcode with the closest centerpoint by calculating the distance to each of the 3000 postcode centerpoints. That would mean 3000 x 120,000 = 360,000,000 distance calculations though.
If postcodes and locations were in a one-dimensional space (that is, identified by 1 number instead of 2), I could simply sort the postcode array by its one-dimensional centerpoint and then do a binary search in the postcode array for each location.
So I guess what I'm looking for is a way to sort the two dimensional space of longitudes and latitudes of the postcode center points, so I can perform a two dimensional binary search for each location. I've seen solutions to this problem, but those only work for direct matches, while I'm looking for the center point closests to a given location.
I am considering caching solutions, but if there is a fast two-dimensional binary search that I could use, that would make the solution much simpler.
This will be part of a batch program, so I'm not counting milli seconds but it can't take days either. It will run once a month without manual intervention.
You can use a space-filling-curve and a quadkey instead of a quadtree or a spatial index. There are some very interesting sfc like the hilbert curve and the moore curve with very interesting patterns.
I was wondering if anybody could point me to the best algorithm/heuristic which will fit my particular polygon packing problem. I am given a single polygon as a boundary (convex or concave may also contain holes) and a single "fill" polygon (may also be convex or concave, does not contain holes) and I need to fill the boundary polygon with a specified number of fill polygons. (I'm working in 2D).
Many of the polygon packing heuristics I've found assume that the boundary and/or filling polygons will be rectangular and also that the filling polygons will be of different sizes. In my case, the filling polygons may be non-rectangular, but all will be exactly the same.
Maybe this is a particular type of packing problem? If somebody has a definition for this type of polygon packing I'll gladly google away, but so far I've not found anything which is similar enough to be of great use.
Thanks.
The question you ask is very hard. To put this in perspective, the (much) simpler case where you're packing the interior of your bounded polygon with non-overlapping disks is already hard, and disks are the simplest possible "packing shape" (with any other shape you have to consider orientation as well as size and center location).
In fact, I think it's an open problem in computational geometry to determine for an arbitrary integer N and arbitrary bounded polygonal region (in the Euclidean plane), what is the "optimal" (in the sense of covering the greatest percentage of the polygon interior) packing of N inscribed non-overlapping disks, where you are free to choose the radius and center location of each disk. I'm sure the "best" answer is known for certain special polygonal shapes (like rectangles, circles, and triangles), but for arbitrary shapes your best "heuristic" is probably:
Start your shape counter at N.
Add the largest "packing shape" you can fit completely inside the polygonal boundary without overlapping any other packing shapes.
Decrement your shape counter.
If your shape counter is > 0, go to step 2.
I say "probably" because "largest first" isn't always the best way to pack things into a confined space. You can dig into that particular flavor of craziness by reading about the bin packing problem and knapsack problem.
EDIT: Step 2 by itself is hard. A reasonable strategy would be to pick an arbitrary point on the interior of the polygon as the center and "inflate" the disk until it touches either the boundary or another disk (or both), and then "slide" the disk while continuing to inflate it so that it remains inside the boundary without overlapping any other disks until it is "trapped" - with at least 2 points of contact with the boundary and/or other disks. But it isn't easy to formalize this "sliding process". And even if you get the sliding process right, this strategy doesn't guarantee that you'll find the biggest "inscribable disk" - your "locally maximal" disk could be trapped in a "lobe" of the interior which is connected by a narrow "neck" of free space to a larger "lobe" where a larger disk would fit.
Thanks for the replies, my requirements were such that I was able to further simplify the problem by not having to deal with orientation and I then even further simplified by only really worrying about the bounding box of the fill element. With these two simplifications the problem became much easier and I used a stripe like filling algorithm in conjunction with a spatial hash grid (since there were existing elements I was not allowed to fill over).
With this approach I simply divided the fill area into stripes and created a spatial hash grid to register existing elements within the fill area. I created a second spatial hash grid to register the fill area (since my stripes were not guaranteed to be within the bounding area, this made checking if my fill element was in the fill area a little faster since I could just query the grid and if all grids where my fill element were to be placed, were full, I knew the fill element was inside the fill area). After that, I iterated over each stripe and placed a fill element where the hash grids would allow. This is certainly not an optimal solution, but it ended up being all that was required for my particular situation and pretty fast as well. I found the required information about creating a spatial hash grid from here. I got the idea for filling by stripes from this article.
This type of problem is very complex to solve geometrically.
If you can accept a good solution instead of the 100% optimal
solution then you can to solve it with a raster algorithm.
You draw (rasterize) the boundary polygon into one in-memory
image and the fill polygon into another in-memory image.
You can then more easily search for a place where the fill polygon will
fit in the boundary polygon by overlaying the two images with
various (X, Y) offsets for the fill polygon and checking
the pixel values.
When you find a place that the fill polygon fits,
you clear the pixels in the boundary polygon and repeat
until there are no more places where the fill polygon fits.
The keywords to google search for are: rasterization, overlay, algorithm
If your fill polygon is the shape of a jigsaw piece, many algorithms will miss the interlocking alignment. (I don't know what to suggest in that case)
One approach to the general problem that works well when the boundary is much larger than
the fill pieces is to tile an infinite plane with the pieces in the best way you can, and then look for the optimum alignment of the boundary on this plane.
First I will define:
Region: big stuff manually created I want to divide.
Zone: small stuff I want to generate.
I have a map. The world map in fact. And I want to divide it into small zones. The size of the zones will be dependent on what region the zone is in. For instance very small for Europe (maybe Europe will have like 200 zones) but only a couple of huge ones for the Atlantic Ocean.
I can manually create points to enclose a region. I will create regions for each big space I want it to have different size than other spaces. For instance I will create an enclosed region for Europe. So I got a butch of (latitude, longitude) points defining the limits of the Europe region. The shape is of course not regular and there are holes in the middle of it (I don't want to create small zones over the Mediterranean sea but a big one). So what we got is a huge 2D shape to be filled up with zones.
Zones themselves are n-sized polygons, number of sizes can be randomly chosen or subject to other constraints. The area of each zone is also limited random (like 50 plus/minus 40%) although this constraint again can be relaxed (as exception, not as rule). Zones can not overlap and the whole region must be divided.
The obvious question, any algorithm that look like can be used to solve this problem?
I even have problems to determine if a given point is inside or outside an enclosed region.
Me, I'd do it the other way round, put a point in the (approximate) centre of all the zones and compute the Voronoi Diagram of the resulting point set.
EDIT: in response to #Unreason's comments. I don't claim that computing the Voronoi diagram is an answer to the question asked. I do claim that computing the Voronoi diagram is a suitable method for dividing a planar map into zones which are defined by their closeness to a point. This may, or may not, satisfy OP's underlying requirement and OP is free to use or ignore my suggestion.
I implied the following, but will now make it explicit: OP, if taken with this suggestion, should define the points (lat,long) at the 'centres' of each zone required and run the algorithm. Voronoi diagrams are not computed iteratively, if OP doesn't like the solution then OP would have to shift the points around and re-compute. I guess it would be feasible to write a routine to do this; the hard part, as ever with computational cartography, is in defining a computable rule about how well a trial solution fits (quasi-)aesthetic requirements.
I wouldn't bother, I'd use country capital cities as the points for my zones (relatively densely packed in Europe, relatively sparse in the Atlantic) and let the algorithm run. Job done.
Perhaps OP might use the locations of all cities with populations over 5 x 10^5 (there are probably about 200 of those in Europe). Or some other points.
Oh, and computing the Voronoi diagram isn't random either, it's entirely deterministic. Again, this may or may not satisfy the underlying requirement.
To determine if a point is inside a polygon follow point in polygon in wikipedia or use some geometry framework.
The restrictions to divide a polygon into smaller polygons of loosely same size are not very limiting at all, for example if you
cut the big polygons with vertical and horizontal lines spaced such that on land you will get exactly the targeted are size, then for europe you will satisfy your criteria for most of the zones.
Inspect them all and for the ones that do not satisfy the criteria you can start modifying the borders with the neighbouring zones in such a way to reach desired size (as you have +/- 40% this should not be hard).
You can do this by moving the shared nodes or by adding points to the borders and moving only those lines.
Also, before the above, join the zones from the initial cut that are smaller then certain percentage (for example 20% of the target size; these could be islands and other small pieces).
The algorithm would work well for the large number of small zones, but will not work as well for regions that need to be cut into only a few zones (but it would work).