Randomly dividing a 2d complex enclosed region - algorithm

First I will define:
Region: big stuff manually created I want to divide.
Zone: small stuff I want to generate.
I have a map. The world map in fact. And I want to divide it into small zones. The size of the zones will be dependent on what region the zone is in. For instance very small for Europe (maybe Europe will have like 200 zones) but only a couple of huge ones for the Atlantic Ocean.
I can manually create points to enclose a region. I will create regions for each big space I want it to have different size than other spaces. For instance I will create an enclosed region for Europe. So I got a butch of (latitude, longitude) points defining the limits of the Europe region. The shape is of course not regular and there are holes in the middle of it (I don't want to create small zones over the Mediterranean sea but a big one). So what we got is a huge 2D shape to be filled up with zones.
Zones themselves are n-sized polygons, number of sizes can be randomly chosen or subject to other constraints. The area of each zone is also limited random (like 50 plus/minus 40%) although this constraint again can be relaxed (as exception, not as rule). Zones can not overlap and the whole region must be divided.
The obvious question, any algorithm that look like can be used to solve this problem?
I even have problems to determine if a given point is inside or outside an enclosed region.

Me, I'd do it the other way round, put a point in the (approximate) centre of all the zones and compute the Voronoi Diagram of the resulting point set.
EDIT: in response to #Unreason's comments. I don't claim that computing the Voronoi diagram is an answer to the question asked. I do claim that computing the Voronoi diagram is a suitable method for dividing a planar map into zones which are defined by their closeness to a point. This may, or may not, satisfy OP's underlying requirement and OP is free to use or ignore my suggestion.
I implied the following, but will now make it explicit: OP, if taken with this suggestion, should define the points (lat,long) at the 'centres' of each zone required and run the algorithm. Voronoi diagrams are not computed iteratively, if OP doesn't like the solution then OP would have to shift the points around and re-compute. I guess it would be feasible to write a routine to do this; the hard part, as ever with computational cartography, is in defining a computable rule about how well a trial solution fits (quasi-)aesthetic requirements.
I wouldn't bother, I'd use country capital cities as the points for my zones (relatively densely packed in Europe, relatively sparse in the Atlantic) and let the algorithm run. Job done.
Perhaps OP might use the locations of all cities with populations over 5 x 10^5 (there are probably about 200 of those in Europe). Or some other points.
Oh, and computing the Voronoi diagram isn't random either, it's entirely deterministic. Again, this may or may not satisfy the underlying requirement.

To determine if a point is inside a polygon follow point in polygon in wikipedia or use some geometry framework.
The restrictions to divide a polygon into smaller polygons of loosely same size are not very limiting at all, for example if you
cut the big polygons with vertical and horizontal lines spaced such that on land you will get exactly the targeted are size, then for europe you will satisfy your criteria for most of the zones.
Inspect them all and for the ones that do not satisfy the criteria you can start modifying the borders with the neighbouring zones in such a way to reach desired size (as you have +/- 40% this should not be hard).
You can do this by moving the shared nodes or by adding points to the borders and moving only those lines.
Also, before the above, join the zones from the initial cut that are smaller then certain percentage (for example 20% of the target size; these could be islands and other small pieces).
The algorithm would work well for the large number of small zones, but will not work as well for regions that need to be cut into only a few zones (but it would work).

Related

Points within a distance of a moving point

I'm working on a evolutionary simulation with predators, prey, and food (plants that grow on terrain depending on the conditions and meat that creatures give off when they die).
Each of them ocupy an (x,y) position.
At the moment, each creature has a few "eyes" which are sensible to red, green and blue color channels, and when a creature or a piece of food is within their viewing distance, the eyes react sending an input to their neural network, depending on the color of the object they are seeing, it's relative angle and it's distance from the creature.
What I'm doing right now is iterating through ALL the plants, meat pieces, and creatures, and checking if they are within the creature's viewing distance. If that condition is true, then the inputs for the network are calculated.
The problem is that the world is massive (about 10,000*10,000 "units") compared to the creatures viewing distance, which is normally between 150 and 300 "units". On top of that, plant number can get really high depending on terrain conditions (up to a few thousand, too), together whith all the other creatures and meat pieces.
So, I normally end up with a massive loop being performed for each creature, which really slows down the simulation, when most of the creatures and food pieces checked are completely irrelevant (are too far away).
What I'm asking for is some method or algorithm that can reduce the number of points being checked for distance in each loop, limiting the distance of the points being checked, or some other technique.
PS: I thought about dividing the simulation in various "zones" so if a creature was in a zone it would only check for other points (food and other creatures) in that particular zone. However, as they are continuously moving, if they were on the edge of a zone it would make their view very inaccurate.
I also slightly improved the speed by checking for distance^2 (not doing the sqrt), and thein calculating it only if it was smaller than viewing_distance^2.
Divide the world in zones. You only need to check at most 4 zones if zone width is slightly larger than the maximum viewing distance.
Using a quad-tree or a kd-tree has the disadvantage that you need to constantly update the structure. But it might work better, do some profiling.
The quadtree structure might be used for geometrical representation:
http://en.wikipedia.org/wiki/Quadtree
It all depends on how much efficiency you need. The zones answer also has the problem that if your creature is near the boundary of a zone you might endup scanning 4 zones..

Algorithm for partially filling a polygonal mesh

Overview:
I have a simple plastic sandbox represented by a 3D polygonal mesh. I need to be able to determine the water level after pouring a specific amount of water into the sandbox.
The water is distributed evenly from above when poured
No fluid simulation, the water is poured very slowly
It needs to be fast
Question:
What kind of techniques/algorithms could I use for this problem?
I'm not looking for a program or similar that can do this, just algorithms - I'll do the implementation.
Just an idea:
First you compute all saddle points. Tools like discrete morse theory or topological persistence might be useful here, but I know too little about them to be sure. Next you iterate over all saddle points, starting at the lowest, and compute the point in time after which water starts to cross that point. This is the time at which the shallower (in terms of volume versus surface area) of the two adjacent basins has reached a level that matches the height of that saddle point. From that point onward, water pouring onto that surface will flow over to the other basin and contribute to its water level instead, until the two basins have reached an equal level. After that, they will be treated as one single basin. Along the way you might have to correct the times when other saddle points will be reached, as the area associated with basins changes. You iterate in order of increasing time, not increasing height (e.g. using a heap with decrease-key operation). Once the final pair of basins have equal height, you are done; afterwards there is only a single basin left.
On the whole, this gives you a sequence of “interesting” times, where things change fundamentally. In between these, the problem will be much more local, as you only have to consider the shape of a single basin to compute its water level. In this local problem, you know the volume of water contained in that basin, so you can e.g. use bisection to find a suitable level for that. The adjacent “interesting” times might provide useful end points for your bisection.
To compute the volume of a triangulated polytope, you can use a 3D version of the shoelace formula: for every triangle, you take its three vertices and compute their determinant. Sum them up and divide by 6, and you have the volume of the enclosed space. Make sure that you orientate all your triangles consistently, i.e. either all as seen from the inside or all as seen from the outside. The choice decides the overall sign, try it out to see which one is which.
Note that your question might need refinement: when the level in a basin reaches two saddle points at exactly the same height, where does the water flow? Without fluid simulation this is not well defined, I guess. You could argue that it should be distributed equally among all adjacent basins. You could argue that such situations are unlikely to occur in real data, and therefore choose one neighbour arbitrarily, implying that this saddle point has infinitesimally less height than the other ones. Or you could come up with a number of other solutions. If this case is of interest to you, then you might need to clarify what you expect there.
A simple solution comes to mind:
binary-search your way through different heights of water, computing the volume of water contained.
i.e.
Start with an upper-estimate for the water's height of the depth D of the sandbox.
Note that since sand is porous, the maximum volume will be with the box filled to the brim with water;
Any more water would just pout back out into the grass in our hypothetical back-yard.
Also note, that this means that you don't need to worry about saddle points, or multiple water levels in your solution;
Again, we're assuming regular porous sand here, not mountains made out of rock.
Compute the volume of water contained by height D.
If it is within your approximation threshold, quit.
Otherwise, adjust your estimate with a different height, and repeat.
Note that computing the volume of water above the surface of the sand for any given triangular piece of the sand is easy.
It's the volume of a triangular prizm plus the volume of the tetrahedron which is in contact with the sand:
Note that the volume of water below the sandline would be calculated similarly, but the volume would be less, since part of it would be occupied by the sand.
I suggest internet-searching for typical air-void contents of sand, or water-holding capacity.
Or whatever phrases would return a sane result.
Also note, that some triangles may have zero water above the sand, if the sand is above the water-line.
Once you have the volume of water both above and below the sand-line for a single triangle of your mesh, you just loop over all of the triangles to get the total volume for your entire sandbox, for the given height.
Note that this is a pretty dumb algorithm, but I suspect it will have decent performance compared to a fancy-schmancier algorithm which would try to do anything more clever.
Remember that this is just a handful of multiplications and summations for each triangle, and that blind loops with few if statements or other flow-control tend to execute quickly, since the processor can pipeline it well.
This method may be parallelized easily instead of looping over each triangle, if you have a highly-detailed mesh of a sandbox, and want to shove the calculations into multiple cores.
Or keep the loops, and shove different heights into each core.
Or something else; I leave parallelization and speeding up as an exercise for the reader.

What "boundary conditions" can make a rectangle "look" like a circle?

I am solving a fourth order non-linear partial differential equation in time and space (t, x) on a square domain with periodic or free boundary conditions with MATHEMATICA.
WITHOUT using conformal mapping, what boundary conditions at the edge or corner could I use to make the square domain "seem" like a circular domain for my non-linear partial differential equation which is cartesian?
The options I would NOT like to use are:
Conformal mapping
changing my equation to polar/cylindrical coordinates?
This is something I am pursuing purely out of interest just in case someone screams bloody murder if misconstrued as a homework problem! :P
That question was asked on the time people found out that the world was spherical. They wanted to make rectangular maps of the surface of the world...
It is not possible.
The reason why is not possible is because the sphere has an intrinsic curvature, while the cube/parallelepiped has not. It can be shown that for two elements with different intrinsic curvatures, their surfaces cannot be mapped while either keeping constant infinitesimal distances, either the distance between two points is given by the euclidean distance.
The easiest way to understand this problem is to pick some rectangular piece of paper and try to make a sphere of it without locally stretch it or compress it (you can fold). You can't. On the other hand, you can make a cylinder surface, because the cylinder has also no intrinsic curvature.
In maps, normally people use one of the two options:
approximate the local surface of the sphere by a tangent plane and make a rectangle out of it. (a local map of some region)
make world maps but implement some curved lines everywhere identifying that the measuring distances must be made according to those lines.
This is also the main reason why when traveling from Europe to North America the airplanes seems to make a curve always trying to pass near canada. If we measured the distance from the rectangular map, we see that they should go on a strait line to minimize the distance. However, because we are mapping two different intrinsic curvatures, the real distance must be measured in a different way (and not via a strait line).
For 2D (in fact for nD) the same reasoning applies.

How to subsample a 2D polygon?

I have polygons that define the contour of counties in the UK. These shapes are very detailed (10k to 20k points each), thus rendering the related computations (is point X in polygon P?) quite computationaly expensive.
Thus, I would like to "subsample" my polygons, to obtain a similar shape but with less points. What are the different techniques to do so?
The trivial one would be to take one every N points (thus subsampling by a factor N), but this feels too "crude". I would rather do some averaging of points, or something of that flavor. Any pointer?
Two solutions spring to mind:
1) since the map of the UK is reasonably squarish, you could choose to render a bitmap with the counties. Assign each a specific colour, and then render the borders with a 1 or 2 pixel thick black line. This means you'll only have to perform the expensive interior/exterior calculation if a sample happens to lie on the border. The larger the bitmap, the less often this will happen.
2) simplify the county outlines. You can use a recursive Ramer–Douglas–Peucker algorithm to recursively simplify the boundaries. Just make sure you cache the results. You may also have to solve this not for entire county boundaries but for shared boundaries only, to ensure no gaps. This might be quite tricky.
Here you can find a project dealing exactly with your issues. Although it works primarily with an area "filled" by points, you can set it to work with a "perimeter" type definition as yours.
It uses a k-nearest neighbors approach for calculating the region.
Samples:
Here you can request a copy of the paper.
Seemingly they planned to offer an online service for requesting calculations, but I didn't test it, and probably it isn't running.
HTH!
Polygon triangulation should help here. You'll still have to check many polygons, but these are triangles now, so they are easier to check and you can use some optimizations to determine only a small subset of polygons to check for a given region or point.
As it seems you have all the algorithms you need for polygons, not only for triangles, you can also merge several triangles that are too small after triangulation or if triangle count gets too high.

Comparison of Lat, Long Coordinates

I have a list of more than 15 thousand latitude and longitude coordinates. Given any X,Y coordinates, what is the fastest way to find the closest coordinates on the list?
I did this once for a web site. I.e. find the dealer within 50 miles of your zip code. I used the great circle calculation to find the coordinates that were 50 miles north, 50 miles east, 50 miles south, and 50 miles west. That gave me a min and max lat and a min and max long. From there then I did a database query:
select *
from dealers
where latitude >= minlat
and latitude <= maxlat
and longitude >= minlong
and longitude <= maxlong
Since some of those results will still be more than 50 miles away, then I used the great circle formula once more on that small list of coordinates. Then I printed out the list along with the distance from the target.
Of course, if you wanted to search for points near the international date line or the poles, than this won't work. But it works great for searches inside North America!
You will want to use a geometric construction called a Voronoi diagram. This divides up the plane into a number of areas, one for each point, that encompass all the points that are closest to each of your given points.
The code for the exact algorithms to create the Voronoi diagram and arrange the data structure lookups are too large to fit in this little edit box. :)
#Linor: That's essentially what you would do after creating a Voronoi diagram. But instead of making a rectangular grid, you can choose dividing lines that closely match the lines of the Voronoi diagram (this way you will get fewer areas that cross dividing lines). If you recursively divide your Voronoi diagram in half along the best dividing line for each subdiagram, you can then do a tree search for each point you want to look up. This requires a bit of work up front but saves time later. Each lookup would be on the order of log N where N is the number of points. 16 comparisons is a lot better than 15,000!
The general concept you're describing is nearest-neighbour search, and there are a whole raft of techniques which deal with solving these types of queries, either exactly or approximately. The basic idea is to use a spatial partitioning technique to reduce the complexity from O(n) per query to (approximately) O( log n ) per query.
KD-Trees, and variants of KD-Trees seem to work very well, but quad-trees will also work. The quality of these searches depends on whether your set of 15,000 data points are static (you're not adding a-lot of data points to the reference set). Mount and Arya's work on the Approximate Nearest Neighbour library is both easy to use and understand, even without a good grounding in the math. It also gives you some flexibility in the types and tolerances of your queries.
It rather depends how many times you want to do it, and what resources are available - if you're doing the test once, then the O(log N) techniques are good. If you're doing it a thousand times on a server, constructing a bitmap lookup table would be faster, either giving the result directly or as a first stage of. 2GB of bitmap can map the whole world lat-lon to a 32bit value at 0.011 degree pixels (1.2km at equator), and should fit into memory. If you're only doing single country, or can exclude the poles, you can have a smaller map or higher resolution. For 15,000 points you probably have a much smaller map - I first sized it up as a first step to doing lat-lon to postcode searches, which needs higher resolution. Depending on requirements, you use the mapped value to point at the result directly, or to short list of the candidates (which would allow a smaller map, but requires greater subsequent processing - you're not in O(1) lookup territory any more).
You didn't specify what you meant by fastest. If you want to get the answer quickly without writing any code, I would give the gpsbabel radius filter a go.
Based on your clarifications, I would use a geometrical data structure such as a KD-tree or an R-tree. MySQL has a SPATIAL data type which does this. Other languages/frameworks/databases have libraries to support this. Basically, such a data structure embeds the points in a tree of rectangles, and searches the tree using a radius. This should be fast enough, and I believe is simpler than building a Voronoi diagram. I guess there is some threshold above which you would prefer the added performance of a Voronoi diagram so you will be ready to pay the added complexity.
This can be solved in several ways. I would first approach this problem by generating a Delaunay network connecting closest points to each other. This can be accomplished with the v.delaunay command in the open source GIS application GRASS. You could complete the problem in GRASS using one of the many network analysis modules in GRASS. Alternatively, you could use the free spatial RDBMS PostGIS to do the distance queries. The PostGIS spatial queries are considerably more powerful than those in MySQL, as they are not constrained to BBOX operations. For example:
SELECT network_id, ST_Length(geometry) from spatial_table where ST_Length(geometry) < 10;
Since you are using Longitude and Latitude, you probably want to use the Spheroid-Distance functions. With a spatial index, PostGIS scales very well for large datasets.
Even if you create a voronoi diagram, that still means you need to compare your x, y coordinates to all 15 thousand created areas. To make that easier, the first thing that popped into my mind though was to create some sort of grid over the possible values, so that you can easily place and x/y coordinate into one of the boxes in a grid, if the same is done for the list of areas you should quickly shrink the possible candidates for comparison (because the grid would be more rectangular, it's possible for an area to be in multiple grid positions).
Premature optimization is the root of all evil.
15K coordinates aren't that much. Why not iterate over the 15K coordinates and see if that's really a performance problem? You could save a lot of work and maybe it never gets too slow to even notice.
How large an area are these coordinates spread out over? What latitude are they at? How much accuracy do you require? If they're fairly close together, you can probably ignore the fact that the earth is round and just treat this as a Cartesian plane rather than messing about with spherical geometry and great circle distances. Of course, as you get further from the equator, degrees of longitute get smaller compared to degrees of latitude, so some sort of scaling factor may be appropriate.
Start with a fairly simple distance formula and a brute force search and see how long that's going to take and if the results are accurate enough before you get fancy.
Thanks everyone for the answers.
#Tom, #Chris Upchurch: The coordinates are fairly close to each others, and they are in a relatively small area of about 800 sq km. I guess I can assume the surface to be flat. I need to process the requests over and over again, and the response should be faster enough for more web experience.
A grid is very simple, and very fast. It's basically just a 2D array of lists. Each array entry represents the points that fall inside a grid cell. Very easy to set the grid up:
for each point p
get cell that contains p
add point to that cell's list
and it's very easy to look things up:
given a query point p
get cell that contains p
check points in that cell (and its 8 neighbors), against query point p
Alejo
Just to be contrairian, do you mean close in distance or (driving) time? In an urban area I'd gladly drive 5 miles (5min) on the highway than 4 miles (20min stop and go) in another direction.
Thus if it's a 'closest' metric you need, I'd look into GIS databases with travel time metrics.

Resources