I'm considering trying to make a game that takes place on an essentially infinite grid.
The grid is very sparse. Certain small regions of relatively high density. Relatively few isolated nonempty cells.
The amount of the grid in use is too large to implement naively but probably smallish by "big data" standards (I'm not trying to map the Internet or anything like that)
This needs to be easy to persist.
Here are the operations I may want to perform (reasonably efficiently) on this grid:
Ask for some small rectangular region of cells and all their contents (a player's current neighborhood)
Set individual cells or blit small regions (the player is making a move)
Ask for the rough shape or outline/silhouette of some larger rectangular regions (a world map or region preview)
Find some regions with approximately a given density (player spawning location)
Approximate shortest path through gaps of at most some small constant empty spaces per hop (it's OK to be a bad approximation often, but not OK to keep heading the wrong direction searching)
Approximate convex hull for a region
Here's the catch: I want to do this in a web app. That is, I would prefer to use existing data storage (perhaps in the form of a relational database) and relatively little external dependency (preferably avoiding the need for a persistent process).
Guys, what advice can you give me on actually implementing this? How would you do this if the web-app restrictions weren't in place? How would you modify that if they were?
Thanks a lot, everyone!
I think you can do everything using quadtrees, as others have suggested, and maybe a few additional data structures. Here's a bit more detail:
Asking for cell contents, setting cell contents: these are the basic quadtree operations.
Rough shape/outline: Given a rectangle, go down sufficiently many steps within the quadtree that most cells are empty, and make the nonempty subcells at that level black, the others white.
Region with approximately given density: if the density you're looking for is high, then I would maintain a separate index of all objects in your map. Take a random object and check the density around that object in the quadtree. Most objects will be near high density areas, simply because high-density areas have many objects. If the density near the object you picked is not the one you were looking for, pick another one.
If you're looking for low-density, then just pick random locations on the map - given that it's a sparse map, that should typically give you low density spots. Again, if it doesn't work right try again.
Approximate shortest path: if this is a not-too-frequent operation, then create a rough graph of the area "between" the starting point A and end point B, for some suitable definition of between (maybe the square containing the circle with the midpoint of AB as center and 1.5*AB as diameter, except if that diameter is less than a certain minimum, in which case... experiment). Make the same type of grid that you would use for the rough shape / outline, then create (say) a Delaunay triangulation of the black points. Do a shortest path on this graph, then overlay that on the actual map and refine the path to one that makes sense given the actual map. You may have to redo this at a few different levels of refinement - start with a very rough graph, then "zoom in" taking two points that you got from the higher level as start and end point, and iterate.
If you need to do this very frequently, you'll want to maintain this type of graph for the entire map instead of reconstructing it every time. This could be expensive, though.
Approx convex hull: again start from something like the rough shape, then take the convex hull of the black points in that.
I'm not sure if this would be easy to put into a relational database; a file-based storage could work but it would be impractical to have a write operation be concurrent with anything else, which you would probably want if you want to allow this to grow to a reasonable number of players (per world / map, if there are multiple worlds / maps). I think in that case you are probably best off keeping a separate process alive... and even then making this properly respect multithreading is going to be a headache.
A kd tree or a quadtree is a good data structure to solve your problem. Especially the latter it's a clever way to address the grid and to reduce the 2d complexity to a 1d complexity. Quadtrees is also used in many maps application like bing and google maps. Here is a good start: Nick quadtree spatial index hilbert curve blog.
Related
I've an indeterminate number of closed CGPath elements of various shapes and sizes all containing a single concave bezier curve, like the red and blue shapes in the diagram below.
What is the simplest and most efficient method of dividing these shapes into n regions of (roughly) equal size?
What you want is Delaunay triangulation. Here is an example which resembles what you want to do. It uses an as3 library. Here is an iOS port, that should help you:
https://github.com/czgarrett/delaunay-ios
I don't really understand the context of what you want to achieve and what the constraints are. For instance, is there a hard requirement that the subdivided regions are equal size?
Often the solutions to a performance problem is not a faster algorithm but a different approach, usually one or more of the following:
Pre-compute the values, or compute as much as possible offline. Say by using another server API which is able to do the subdivision offline and cache the results for multiple clients. You could serve the post-computed result as a bitmap where each colour indexes into the table of values you want to display. Looking up the value would be a simple matter of indexing the pixel at the touch position.
Simplify or approximate a solution. Would a grid sub-division be accurate enough? At 500 x 6 = 3000 subdivisions, you only have about 51 square points for each region, that's a region of around 7x7 points. At that size the user isn't going to notice if the region is perfectly accurate. You may need to end up aggregating adjacent regions anyway due to touch resolution.
Progressive refinement. You often don't need to compute the entire algorithm up front. Very often algorithms run in discrete (often symmetrical) units, meaning you're often re-using the information from previous steps. You could compute just the first step up front, and then use a background thread to progressively fill in the rest of the detail. You could also defer final calculation until the the touch occurs. A delay of up to a second is still tolerable at that point, or in the worst case you can display an animation while the calculation is in progress.
You could use some hybrid approach, and possibly compute one or two levels using Delaunay triangulation, and then using a simple, fast triangular sub-division for two more levels.
Depending on the required accuracy, and if discreet samples are not required, the final levels could be approximated using a weighted average between the points of the triangle, i.e., if the touch is halfway between two points, pick the average value between them.
I have polygons that define the contour of counties in the UK. These shapes are very detailed (10k to 20k points each), thus rendering the related computations (is point X in polygon P?) quite computationaly expensive.
Thus, I would like to "subsample" my polygons, to obtain a similar shape but with less points. What are the different techniques to do so?
The trivial one would be to take one every N points (thus subsampling by a factor N), but this feels too "crude". I would rather do some averaging of points, or something of that flavor. Any pointer?
Two solutions spring to mind:
1) since the map of the UK is reasonably squarish, you could choose to render a bitmap with the counties. Assign each a specific colour, and then render the borders with a 1 or 2 pixel thick black line. This means you'll only have to perform the expensive interior/exterior calculation if a sample happens to lie on the border. The larger the bitmap, the less often this will happen.
2) simplify the county outlines. You can use a recursive Ramer–Douglas–Peucker algorithm to recursively simplify the boundaries. Just make sure you cache the results. You may also have to solve this not for entire county boundaries but for shared boundaries only, to ensure no gaps. This might be quite tricky.
Here you can find a project dealing exactly with your issues. Although it works primarily with an area "filled" by points, you can set it to work with a "perimeter" type definition as yours.
It uses a k-nearest neighbors approach for calculating the region.
Samples:
Here you can request a copy of the paper.
Seemingly they planned to offer an online service for requesting calculations, but I didn't test it, and probably it isn't running.
HTH!
Polygon triangulation should help here. You'll still have to check many polygons, but these are triangles now, so they are easier to check and you can use some optimizations to determine only a small subset of polygons to check for a given region or point.
As it seems you have all the algorithms you need for polygons, not only for triangles, you can also merge several triangles that are too small after triangulation or if triangle count gets too high.
I'm building a 2D physics engine and I want to add broad-phase collision detection, though I only know of 2 or 3 types:
Check everything against everything else (O(n^2) complexity)
Sweep and Prune (sort and sweep)
something about Binary Space Partition (not sure how to do this)
But surely there's more options right? what are they? And can either a basic description of each be provided or links to descriptions?
I've seen this but I'm asking for a list of algorithms available, not the best one for my needs.
In this case, "Broad phase collision detection" is a method used by physics engines to determine which bodies in their simulation are close enough to warrant further investigation and possibly collision resolution.
The best approach depends on the specific use, but the bottom line is that you want to subdivide your world space such that (a) every body is in exactly one subdivision, (b) every subdivision is large enough that a a body in a particular subdivision can only collide with bodies in that same subdivision or an adjacent subdivision, and (c) the number of bodies in a particular subdivision is as small as possible.
How you do that depends on how many bodies you have, how they're moving, what your performance requirements are, and how much time you want to spend on your engine. If you're talking about bodies moving around in a largely open space, the simplest technique would be divide the world into a grid where each cell is larger than your largest object, and track the list of objects in each cell. If you're building something on the scale of a classic arcade game, this solution may well suffice.
If you're dealing with bodies moving in a larger open world, a simple grid will become overwhelming pretty quickly, and you'll probably want some sort of a tree-based structure like quadtrees, as Arriu suggests.
If you're talking about moving bodies around within bounded spaces instead of open spaces, then you may consider a BSP tree; the tree partitions the world into 'space you can walk in' and 'walls', and clipping a body into the tree determines whether it's in a legal position. Depending on the world geometry, you can also use a BSP for your broad-phase detection of collisions between bodies in the world.
Another option for bodies moving in bounded space would be a portal engine; if your world can consist of convex polygonal regions where each side of the polygon is either a solid wall or a 'portal' to another concave space, you can easily determine whether a body is within a region with a point-in-polygon test and simplify collision detection by only looking at bodies in the same region or connected regions.
An alternative to QuadTrees or BSPTrees are SphereTrees (CircleTrees in 2D, the implementation would be more or less the same). The advantage that SphereTrees have are that they handle large loads of dynamic objects very well. If you're objects are constantly moving, BSPTrees and QuadTrees are much slower in their updates than a Sphere/Circle Tree would be.
If you have a good mix of static and dynamic objects, a reasonably good solution is to use a QuadTree/BSPTree for the statics and a Sphere/Cicle Tree for the dynamic objects. Just remember that for any given object, you would need to check it against both trees.
I recommend quadtree partitioning. It's pretty simple and it works really well. Here is a Flash demo of brute-force collision detection vs. quadtree collision detection. (You can tell it to show the quadtree structure.) Did you notice how quadtree collision detection is only 3% of brute force in that demo?
Also, if you are serious about your engine then I highly recommend you pick up real-time collision detection. It's not expensive and it's a really great book which covers everything you would ever want to know. (Including GPU based collision detection.)
All of the acceleration algorithms depend on using an inexpensive test to quickly rule out objects (or groups of objects) and thereby cut down on the number of expensive tests you have to do. I view the algorithms in categories, each of which has many variations.
Spatial partitioning. Carve up space and cheaply exclude candidates that are in different regions. For example, BSP, grids, octrees, etc.
Object partitioning. Similar to #1, but the clustering is focused on the objects themselves more than the space they reside in. For example, bounding volume hierarchies.
Sort and sweep methods. Put the objects in order spatially and rule out collisions among ones that aren't adjacent.
1 and 2 are often hierarchical, recursing into each partition as needed. With 3, you can optionally iterate along different dimensions.
Trade-offs depend a lot on scene geometry. Do objects cluster or are they evenly or sparsely distributed? Are they all about the same size or are there huge variations in size? Is the scene dynamic? Can you afford a lot of preprocessing time?
The "inexpensive" tests are actually along a spectrum of really-cheap to kind-of-expensive. Choosing the best algorithm means minimizing the ratio of the cost of the inexpensive testing to the reduction in the number of expensive tests. Beyond the algorithmic concerns, you get into tuning, like worrying about cache locality.
An alternative are plain grids, say 20x20 or 100x100 (depends on your world and memory size). The implementation is simpler than a recursive structure such as quad/bsp-trees (or sphere trees for that matter).
Objects crossing cell borders are a bit simpler in this case, and do not degenerate as much as an naive implementation of a bsp/quad/oct-tree might do.
Using that (or other techinques), you should be able to quickly cull many pairs and get a set of potential collisions that need further investigation.
I just came up with a solution that doesn't depend on grid size and is probably O(nlogn) (that is the optimum when there are no collisions) though worst at O(nnlogn) (when everything collides).
I also implemented and tested it, here is the link to the source. But I haven't compared it to the brute force solution.
A description of how it works:
(I'm looking for collisions of rectangles)
on x axis I sort the rectangles according to their right edge ( O(nlogn) )
for every rect in the sorted list I take the left edge and do a binary search until I find the rightmost edge at the left of the left edge and insert these rectangles between these indices in a possible_Collision_On_X_Axis set ( those are n rectangles, logn for the binary search, n inserts int the set at O(log n)per insert)
on y axis I do the same
in each of the sets I now have possible collisions on x (in one set) and on y(int the other), I intersect these sets and now I have the collisions on both the x axis and y axis (that means I take the common elements) (O(n))
sorry for the poor description, I hope you understand better from the source, also an example illustrated here: image
You might want to check out what Scott did in Chipmunk with spacial hashing. The source is freely available. I think he used a similar technique to Box-2D if not for collision, definitely for contact persistence.
I've used a quad-tree in a larger project, which is good for game objects that don't move much (less removals & re-insertions). The same principle applies for octrees.
The basic Idea Is, you create a recursive tree structure, which stores 4(for quad), or 8(oct) child objects of the same type as the tree root. Each node in the tree represents a section of Cartesian space, for example, -100 -> +100 on each applicable axis. each child represents that same space, but subdivided by half (an immediate child of the example would represent, for example, -100->0 on X axis, and -100->0 on Y axis).
Imagine a square, or plane, with a given set of dimensions. Now draw a line through the centre on each axis, dividing that plane into 4 smaller planes. Now take one of them and do It again, and again, until you reach a point when the size of the plane segment Is roughly the size of a game object. Now you have your tree. Only objects occupying the same plane, can possibly collide. When you have determined which objects can collide, You generate pairs of possible collisions from them. At this stage, broadphase Is complete, and you can move onto narrow phase collision detection, which Is where your more precise, and expensive calculations are.
The purpose of Broadphase, Is to use inexpensive calculations to generate possible collisions, and cull out contacts that cannot occur, thus reducing the work your narrow phase algorithm has to perform, In turn, making your entire collision detection algorithm more efficient.
So before you go ahead and attempt to implement such an algorithm, as yourself:
How many objects are in my game?
If there are a lot, I probably need a broadphase. If not, then the Nnarrowphase should suffice.
Also, am I dealing with many moving objects?
Tree structures generally are slowed down by moving objects, as they can change their position in the tree over time, simply by moving. This requires that objects be removed and reinserted each frame (potentially), which is less than Ideal.
If this is the case, you would be better off with Sweep and Prune, which maintains min/max heaps of the extents of the shapes in your world. Objects do not need to be reinserted, but the heaps need to be resorted each frame, thought this is generally faster than a tree wide, traversal with removals and reinsertions.
Depending on your coding experience, one may be more complicated to code over another. Personally I have found trees to be more intuitive to code, but less efficient, and more prone to error, since they raise other issues, such as what to do if you have an object that sits directly on top of an axis, or the centre of the root node. This can be solved by using loose trees, which have some spacial overlap, so that one child node is always prioritised over others when such a case occurs.
If the space that your objects move within is bounded, then you could use a grid to subdivide your objects. Put each object into every grid cell which the object covers (fully or partially). To check if object A collides with any other object, determine which grid cells object A covers, then get the list of unique objects in those cells, and finally test object A against each unique object. This approach works best if most objects are usually contained in a single grid cell.
If your space is not bounded, then you will need to implement some sort of dynamic grid that can grow on demand in each of the four directions (in 2D).
The advantage of this approach over more adaptive algorithsm (i.e. BSP, Quadtree, Circletree) is that the cells can be accessed in O(1) time (i.e. constant time) rather than O(log N) time (i.e. logarithmic time). However these latter algorithms are able to adapt themselves to large variability in the density of objects. The grid approach works best when object density is fairly constant across the space.
I would like to recommend the introductory reference to game physics from Ian Millington, Game Physics Engine Development. It has a great section on broad phase collision detection with sample code.
I was wondering what is the best data structure to deal with a lot of moving objects(spheres, triangles, boxes, points, etc)? I'm trying to answer two questions, Nearest Neighbor and Collsion detection.
I do realize that traditionally, data structures like R trees are used for nearest neighbor queries and Oct/Kd/BSP are used for collision detection problems dealing with static objects, or with very few moving objects.
I'm just hoping that there is something else out there that is better.
I appreciate all the help.
You can partition the scene in an octree and utilize scene coherence. Your moving object that you are testing is going to be in a specific node of the tree for a specific amount of frames depending on its speed. This means you can assume it will be in the node and therefore quickly cut out a lot of objects that are not in the node. Of course the tricky bit is when your object is close to the edge of your partition, you would have to make sure you update which node the object is in.
A moving object has a direction and a speed. You can easily divide the scene in two sections by using a perpendicular division from your objects direction of movement. Any object on the wrong side of this division do not need to be checked. Of course compensate for the other object's velocity. So if the other object is reasonable slow, you can easily cut it out from further checks.
Always simplify whatever shape you are testing with something like an axis aligned bounding box. An initial collision test
You can take the distance between your object and another moving object plus the velocities. If the other moving object is moving in the same general direction at a faster velocity, you can eliminate it from your check.
There are many other optimizations depending on the object's shape. Circles or squares or more complex shapes all have specific optimizations you can do while moving.
Assuming some objects do stop or can stop moving, you can keep track of objects that stop. these objects can than be treated like static objects and therefore the checks are faster and you can apply all the static optimization techniques to them.
I know of more but can't think of any off the top of my head. Haven't done this in a while.
Now of course this depends on how you are doing the collision detection. Are you incrementally updating the object's position based on velocity and checking as though it is static. Or are you compensating for velocity by extruding the shape and figuring out initial collision points. The former needs a small step for fast moving object. The latter is more complicated and costly but gives better results. Also if you are going to have rotating objects then things get a little more tricky.
Bounding Spheres probably would help you with many moving objects; you calculate the radius squared of the object, and track it from it's center. If the squared distance between the centers of two objects is less than the sum of the squared radii of the two objects, then you have a potential collision. Everything done with squared distances; no square roots.
You can sort nearest neighbors by the minimum squared distance between your objects. Collision detection can get complex, of course, and with non-spherically shaped objects, Bounding Spheres won't necessarily get you collision information, but it can prune your tree of objects you need to compare for collision quite nicely.
You'll need, of course, to track the CENTER of your object; and ideally you'd like for each object to be rigid, to avoid having to recalculate bounding sphere sizes (although the recalculation isn't particularly tough, particularly if you use a tree of rigid objects each with their own bounding sphere for the objects which are non-rigid; but it gets complicated).
Basically, to answer your question about data structures, you can keep all of your objects in a master array; I'd have a set of "region" arrays which consist of references to the objects in the master array that you can sort the objects into quickly based upon their cartesian coordinates; the "region' arrays should have an overlap defined of at least 2x the largest object radius in your master object array (if that's feasible; a question of object bounding sphere scaling vs. number of objects obviously comes up).
Once you've got that, you can do a quick collision test by comparing distances of all objects within a region to each other; again, this is where region definition becomes important, because you're doing a significant tradeoff of number of regions to number of comparisons. However, it's a little simpler just because your distance comparisons come down to simple subtractions (and abs() operations, of course).
Of course, then you have to do actual collision detection between your non-spherical objects, and that can be non-trivial, but you've reduced the number of potential comparisons very dramatically at that point.
Basically, it's a two-tiered system; the first is the region array, whereby you do a rough sort on your scene. Second, you have your intra-region distance comparison; wherein you're going to do your basic collision detection and collision flagging on the objects which have collided.
There probably is room in this sort of algorithm for use of trees in dynamic region determination to even out your region sizes to assure that your region size doesn't grow too rapidly with "crowded" regions; that sort of thing, though, is non-trivial, because with objects of differing sizes, your sort on density becomes... complex, to say the least.
One interesting method for doing collision detection is to use axially aligned bounding boxes (AABB's) organised within a special octree structure. The AABB's help because they make the coarse collision computations very quick to do because you only need to perform up to 6 comparisons.
There are a couple of tricks you should use with the octree structure:
1) The bounding area which a node occupies should be twice as large as it would be for a normal octree (where the octree partitions the space without overlap). As each node should overlap half of the area of its adjacent nodes. Since the AABB can belong to only one node this extra size and overlap allows it to remain in the one node for a longer period of time.
2) Also in each node - and probably in each level of the hierarchy - you keep links to the node's neighbours. This will involve a lot of extra code but it will allow you to move elements between nodes in close to O(1) time rather than O(2logn) time.
If the octree is taking up too much memory you could change it to use a sparse octree structure, only keeping the tree for the parts of the game world that actually contained entities. However this would mean that you'd have to perform more computations for each frame when entities move through the world.
Some other ideas you might want to try instead of an octree is to use a kd-tree (I believe that's the correct name), or use AABB's to build the structure from the bottom up.
KD trees (from memory) partition the space using axially aligned planes, thus they are a good fit for use with AABB.
The other idea is instead of forcing the octree generation from the top down (starting with a box envoloping the whole world), you build it up from the base entities and construct bigger AABB's that grow until the biggest one encompasses the whole world. Though I'm unsure of how it would work with many fast moving entities.
(It's been a while since I've done this kind of spatial game development coding.)
RDC may be of use, especially if your objects are sparse (not many intersections).
sweep and prune broad phase + gjk narrow phase
I have a list of more than 15 thousand latitude and longitude coordinates. Given any X,Y coordinates, what is the fastest way to find the closest coordinates on the list?
I did this once for a web site. I.e. find the dealer within 50 miles of your zip code. I used the great circle calculation to find the coordinates that were 50 miles north, 50 miles east, 50 miles south, and 50 miles west. That gave me a min and max lat and a min and max long. From there then I did a database query:
select *
from dealers
where latitude >= minlat
and latitude <= maxlat
and longitude >= minlong
and longitude <= maxlong
Since some of those results will still be more than 50 miles away, then I used the great circle formula once more on that small list of coordinates. Then I printed out the list along with the distance from the target.
Of course, if you wanted to search for points near the international date line or the poles, than this won't work. But it works great for searches inside North America!
You will want to use a geometric construction called a Voronoi diagram. This divides up the plane into a number of areas, one for each point, that encompass all the points that are closest to each of your given points.
The code for the exact algorithms to create the Voronoi diagram and arrange the data structure lookups are too large to fit in this little edit box. :)
#Linor: That's essentially what you would do after creating a Voronoi diagram. But instead of making a rectangular grid, you can choose dividing lines that closely match the lines of the Voronoi diagram (this way you will get fewer areas that cross dividing lines). If you recursively divide your Voronoi diagram in half along the best dividing line for each subdiagram, you can then do a tree search for each point you want to look up. This requires a bit of work up front but saves time later. Each lookup would be on the order of log N where N is the number of points. 16 comparisons is a lot better than 15,000!
The general concept you're describing is nearest-neighbour search, and there are a whole raft of techniques which deal with solving these types of queries, either exactly or approximately. The basic idea is to use a spatial partitioning technique to reduce the complexity from O(n) per query to (approximately) O( log n ) per query.
KD-Trees, and variants of KD-Trees seem to work very well, but quad-trees will also work. The quality of these searches depends on whether your set of 15,000 data points are static (you're not adding a-lot of data points to the reference set). Mount and Arya's work on the Approximate Nearest Neighbour library is both easy to use and understand, even without a good grounding in the math. It also gives you some flexibility in the types and tolerances of your queries.
It rather depends how many times you want to do it, and what resources are available - if you're doing the test once, then the O(log N) techniques are good. If you're doing it a thousand times on a server, constructing a bitmap lookup table would be faster, either giving the result directly or as a first stage of. 2GB of bitmap can map the whole world lat-lon to a 32bit value at 0.011 degree pixels (1.2km at equator), and should fit into memory. If you're only doing single country, or can exclude the poles, you can have a smaller map or higher resolution. For 15,000 points you probably have a much smaller map - I first sized it up as a first step to doing lat-lon to postcode searches, which needs higher resolution. Depending on requirements, you use the mapped value to point at the result directly, or to short list of the candidates (which would allow a smaller map, but requires greater subsequent processing - you're not in O(1) lookup territory any more).
You didn't specify what you meant by fastest. If you want to get the answer quickly without writing any code, I would give the gpsbabel radius filter a go.
Based on your clarifications, I would use a geometrical data structure such as a KD-tree or an R-tree. MySQL has a SPATIAL data type which does this. Other languages/frameworks/databases have libraries to support this. Basically, such a data structure embeds the points in a tree of rectangles, and searches the tree using a radius. This should be fast enough, and I believe is simpler than building a Voronoi diagram. I guess there is some threshold above which you would prefer the added performance of a Voronoi diagram so you will be ready to pay the added complexity.
This can be solved in several ways. I would first approach this problem by generating a Delaunay network connecting closest points to each other. This can be accomplished with the v.delaunay command in the open source GIS application GRASS. You could complete the problem in GRASS using one of the many network analysis modules in GRASS. Alternatively, you could use the free spatial RDBMS PostGIS to do the distance queries. The PostGIS spatial queries are considerably more powerful than those in MySQL, as they are not constrained to BBOX operations. For example:
SELECT network_id, ST_Length(geometry) from spatial_table where ST_Length(geometry) < 10;
Since you are using Longitude and Latitude, you probably want to use the Spheroid-Distance functions. With a spatial index, PostGIS scales very well for large datasets.
Even if you create a voronoi diagram, that still means you need to compare your x, y coordinates to all 15 thousand created areas. To make that easier, the first thing that popped into my mind though was to create some sort of grid over the possible values, so that you can easily place and x/y coordinate into one of the boxes in a grid, if the same is done for the list of areas you should quickly shrink the possible candidates for comparison (because the grid would be more rectangular, it's possible for an area to be in multiple grid positions).
Premature optimization is the root of all evil.
15K coordinates aren't that much. Why not iterate over the 15K coordinates and see if that's really a performance problem? You could save a lot of work and maybe it never gets too slow to even notice.
How large an area are these coordinates spread out over? What latitude are they at? How much accuracy do you require? If they're fairly close together, you can probably ignore the fact that the earth is round and just treat this as a Cartesian plane rather than messing about with spherical geometry and great circle distances. Of course, as you get further from the equator, degrees of longitute get smaller compared to degrees of latitude, so some sort of scaling factor may be appropriate.
Start with a fairly simple distance formula and a brute force search and see how long that's going to take and if the results are accurate enough before you get fancy.
Thanks everyone for the answers.
#Tom, #Chris Upchurch: The coordinates are fairly close to each others, and they are in a relatively small area of about 800 sq km. I guess I can assume the surface to be flat. I need to process the requests over and over again, and the response should be faster enough for more web experience.
A grid is very simple, and very fast. It's basically just a 2D array of lists. Each array entry represents the points that fall inside a grid cell. Very easy to set the grid up:
for each point p
get cell that contains p
add point to that cell's list
and it's very easy to look things up:
given a query point p
get cell that contains p
check points in that cell (and its 8 neighbors), against query point p
Alejo
Just to be contrairian, do you mean close in distance or (driving) time? In an urban area I'd gladly drive 5 miles (5min) on the highway than 4 miles (20min stop and go) in another direction.
Thus if it's a 'closest' metric you need, I'd look into GIS databases with travel time metrics.