2D spatial index optimized for queries of multiple regions - data-structures

I have a search space of a large number of axis-aligned boxes. A normal spatial index, like an r-tree, will rapidly give me a list of boxes that overlap one search area.
However, I have a large number (hundreds) of potentially overlapping search areas I'd like to query all at once. In other words, I want all objects in my data structure that overlap at least one of these five hundred boxes.
Is there a data structure optimized for this kind of query?

You can use a quadkey and a placeholder. You can create a quadkey by interleaving the bits of the x and y coordinates. It is used in the Morton curve, a.k.a. the Z-order curve.
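As a rough illustration of the bit interleaving mentioned above, here is a minimal Python sketch. The function names (`part1by1`, `morton_encode`, `quadkey`) and the 16-bit coordinate limit are assumptions made for this example, not part of the original answer.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_encode(x: int, y: int) -> int:
    """Interleave x (even bit positions) and y (odd bit positions)."""
    return part1by1(x) | (part1by1(y) << 1)


def quadkey(x: int, y: int, levels: int) -> str:
    """Write the Morton code as base-4 digits, most significant level first."""
    code = morton_encode(x, y)
    return "".join(str((code >> (2 * i)) & 3) for i in reversed(range(levels)))
```

For example, `quadkey(3, 5, 3)` yields `"213"`. Cells that share a quadkey prefix lie in the same quadrant at that level, which is what makes the key usable for prefix or range lookups.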

Related

Using a spatial index to find points within range of each other

I'm trying to find a spatial index structure suitable for a particular problem: using a union-find data structure, I want to connect/associate points that are within a certain range of each other.
I have a lot of points and I'm trying to optimize an existing solution by using a better spatial index.
Right now, I'm using a simple 2D grid indexing each square of width [threshold distance] of my point map, and I look for potential unions by searching for points in adjacent squares in the grid.
Then I compute the squared Euclidean distance for each pair of points across the adjacent cell combinations, which I compare to my squared threshold, and I use the union-find structure (optimized using path compression, etc.) to build groups of points.
Here is an illustration of the method. The single black points actually represent the set of points that belong to a cell of the grid, and the outgoing colored arrows represent the actual distance comparisons with the outside points.
(I'm also checking for potential connected points that belong to the same cells).
By using this pattern I make sure I'm not doing any distance comparison twice by using a proper "neighbor cell" pattern that doesn't overlap with already tested stuff when I iterate over the grid cells.
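For reference, the grid-plus-union-find scheme described above could be sketched roughly as follows in Python. This is only an assumed reconstruction: the names `UnionFind`, `FORWARD_NEIGHBOURS`, and `cluster` are hypothetical, and the "half neighbourhood" pattern shown is one possible way to avoid testing a cell pair twice.

```python
from collections import defaultdict
from math import floor


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


# Half of the 8-neighbourhood; the opposite half is covered when the
# neighbouring cell is itself the current cell.
FORWARD_NEIGHBOURS = [(1, -1), (1, 0), (1, 1), (0, 1)]


def cluster(points, threshold):
    """Return one group label per point; points within threshold are merged."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(floor(x / threshold), floor(y / threshold))].append(i)

    uf = UnionFind(len(points))
    t2 = threshold * threshold

    def close(i, j):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        return dx * dx + dy * dy <= t2

    for (cx, cy), members in cells.items():
        # Pairs inside the same cell.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                if close(members[a], members[b]):
                    uf.union(members[a], members[b])
        # Pairs between this cell and its forward neighbours.
        for dx, dy in FORWARD_NEIGHBOURS:
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if close(i, j):
                        uf.union(i, j)

    return [uf.find(i) for i in range(len(points))]
```

Points that share a label in the returned list belong to the same group.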
The issue is: this approach is not even close to being fast enough, and I'm trying to replace the "spatial grid index" method with something that could maybe be faster.
I've looked into quadtrees as a suitable spatial index for this problem, but I don't think it is suitable to solve it (I don't see any way of performing repeated "neighbours" checks for a particular cell more effectively using a quadtree), but maybe I'm wrong on that.
Therefore, I'm looking for a better algorithm/data structure to efficiently index my points and query them for proximity.
Thanks in advance.
I have some comments:
1) I think your problem is equivalent to a "spatial join". A spatial join takes two sets of geometries, for example a set R of rectangles and a set P of points, and finds for every rectangle all points in that rectangle. In your case, R would be the rectangles (edge length = 2 * max distance) around each point and P the set of your points. Searching for "spatial join" may give you some useful references.
2) You may want to have a look at space-filling curves. Space-filling curves create a linear order for a set of spatial entities (points) with the property that points that are close in the linear ordering are usually also close in space (and vice versa). This may be useful when developing an algorithm.
3) Have a look at OpenVDB. OpenVDB has a spatial index structure that is highly optimized to traverse 'voxel' cells and their neighbors.
4) Have a look at the PH-Tree (disclaimer: this is my own project). The PH-Tree is somewhat like a quadtree but uses low-level bit operations to optimize navigation. It is also Z-ordered/Morton-ordered (see space-filling curves above). You can create a window query for each point which returns all points within that rectangle. To my knowledge, the PH-Tree is the fastest index structure for this kind of operation, especially if you typically have only 9 points in a rectangle. If you are interested in the code, the V13 implementation is probably the fastest; however, the V16 should be much easier to understand and modify.
I tried it on my rather old desktop machine: using about 1,000,000 points, I can do about 200,000 window queries per second, so it should take about 5 seconds to find all neighbors for every point.
If you are using Java, my spatial index collection may also be useful.
A standard approach to this is the "sweep and prune" algorithm. Sort all the points by X coordinate, then iterate through them. As you do, maintain the lowest index of the point which is within the threshold distance (in X) of the current point. The points within that range are candidates for merging. You then do the same thing sorting by Y. Then you only need to check the Euclidean distance for those pairs which showed up in both the X and Y scans.
Note that with your current union-find approach, you can end up unioning points which are quite far from each other, if there are a bunch of nearby points "bridging" them. So your basic approach -- of unioning groups of points based on proximity -- can induce an arbitrary amount of distance error, not just the threshold distance.
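The sort-and-scan idea from this answer can be sketched as below. Note that this is a simplified single-axis variant, not the exact two-scan procedure described: it sorts by X only and then filters each candidate by its Y difference and squared distance directly, instead of intersecting the results of separate X and Y scans. The function name `close_pairs` is hypothetical.

```python
def close_pairs(points, threshold):
    """Return index pairs (i, j), i < j, whose points lie within threshold."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    t2 = threshold * threshold
    pairs = []
    lo = 0  # position in `order` of the first point still within range in X
    for k, i in enumerate(order):
        xi, yi = points[i]
        # Advance the window start past points too far to the left.
        while points[order[lo]][0] < xi - threshold:
            lo += 1
        for j in order[lo:k]:
            xj, yj = points[j]
            if abs(yj - yi) <= threshold and (xi - xj) ** 2 + (yi - yj) ** 2 <= t2:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

Each returned pair could then be passed to the union-find structure. The scan is efficient when few points share a narrow X band; on data that is dense along X, the per-point window can still grow large.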

Search bounding rectangles (axis aligned) for a given query point in 2 dimensions

I have a set of very many axis-aligned rectangles which may be nested and intersecting. I want to be able to find all the rectangles that enclose/bound a query point. What would be a good approach for this?
EDIT : Additional information-
1. By very many I meant ~100 million or more.
2. The rectangles are distributed across a huge span (span of a country). There is no restriction on the sizes.
3. Yes the rectangles can be pre-processed and stored in a tree structure.
4. No real-time insertions and deletions are required.
5. I only need to find all the rectangles enclosing/bounding a given query point. I do not need the Nearest Neighbors.
As you might have guessed, this is for a real-time geo-fencing application on a mobile unit and hence -
6. The search need not be repeated for rectangles sufficiently far from the point.
I've tried KD trees and Quad-Trees by approximating each Rectangle to a point. They've given me variable performances depending on the size of the rectangles.
Is there a more direct way of doing it? How about R-trees?
I would consider using a quadtree. (Posting from mobile so it's too much effort to link, but Wikipedia has a decent explanation.) You can split at the left, right, top, and bottom bound of any rectangle, and store each rectangle in the node representing the smallest region that contains the rectangle. To search for a point, you go down the quadtree towards the point and check every rectangle that you encounter along that path.
This will work well for small rectangles, but if many rectangles cover almost the entire region you'll still have to check all of those.
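A minimal sketch of this idea in Python follows. It makes one simplifying assumption that differs from the answer: nodes are split at their midpoint rather than at rectangle bounds. The class name `QuadNode`, the `(x0, y0, x1, y1)` rectangle layout, and the `max_depth` limit are hypothetical choices for the example.

```python
class QuadNode:
    """Quadtree node that keeps each rectangle in the smallest node containing it."""

    def __init__(self, x0, y0, x1, y1, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.rects = []       # rectangles that do not fit into any child
        self.children = {}    # quadrant index -> QuadNode, created lazily
        self.depth = depth
        self.max_depth = max_depth

    def _quadrant_bounds(self, q):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return [(x0, y0, mx, my), (mx, y0, x1, my),
                (x0, my, mx, y1), (mx, my, x1, y1)][q]

    def insert(self, rect):
        if self.depth < self.max_depth:
            for q in range(4):
                bx0, by0, bx1, by1 = self._quadrant_bounds(q)
                if bx0 <= rect[0] and rect[2] <= bx1 and by0 <= rect[1] and rect[3] <= by1:
                    child = self.children.get(q)
                    if child is None:
                        child = QuadNode(bx0, by0, bx1, by1,
                                         self.depth + 1, self.max_depth)
                        self.children[q] = child
                    child.insert(rect)
                    return
        self.rects.append(rect)  # straddles a split line or depth limit reached

    def query(self, px, py):
        """Return all stored rectangles that contain the point (px, py)."""
        hits = [r for r in self.rects
                if r[0] <= px <= r[2] and r[1] <= py <= r[3]]
        for child in self.children.values():
            bx0, by0, bx1, by1 = child.bounds
            if bx0 <= px <= bx1 and by0 <= py <= by1:
                hits.extend(child.query(px, py))
        return hits
```

Large rectangles that straddle a split line stay near the root, so every query has to test them, which is the weakness noted above.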
You need to look at the R*-tree data structure.
In contrast to many other structures, the R*-tree is well capable of storing (overlapping) rectangles. It is not limited to point data. You will be able to find many publications on how to best approximate polygons before putting them into the index, too. Also, it scales up to pretty large data, as it can operate on disk, too.
R*-trees are faster when bulk loaded, as bulk loading can reduce the overlap of index pages and ensure a near-perfectly balanced tree, whereas dynamic insertions only guarantee that each page is at least about half full. That is, a bulk-loaded tree will often use only about half as much memory/storage.
For 2D data and your type of queries, a quadtree or grid may, however, work well enough. It depends on how much the local data density varies.

Spatial index on a sorted set

I have a large set of objects to render in 2D, which I have sorted from bottom to top. I'm currently using an R-tree to get a subset of them out that are within the current viewport. However, after getting them out of the spatial index, I have to re-sort them by their Z order. That sorting takes about 6 times longer than looking up the list of them in the spatial index (where several hundred items have matched my query).
Is there a kind of 2D spatial index which has fast lookup by rectangular bounding box, which will return the elements in a sorted order?
You can build the R-tree on the Z-order directly.
Usually, the Hilbert order is preferred; this is known as a Hilbert R-tree.
But you can do the same with the Z-order, too.
However, you may also consider storing the data fully in Z-order right away, in a B+-tree for example.
Instead of querying with a rectangle, translate your query into Z-order intervals, and query for the Z indexes. This is a very classic approach predating the R-trees:
Morton, G. M. (1966)
A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing
Technical Report, Ottawa, Canada: IBM Ltd.
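As a rough illustration of querying data stored in Z-order, the sketch below keeps points sorted by their interleaved key and answers a rectangle query with a single coarse key interval followed by a filter. This is a simplification: a full implementation would split the rectangle into several tighter Z intervals. A sorted Python list stands in for the B+-tree, and the names `interleave` and `ZOrderIndex` are hypothetical.

```python
import bisect


def interleave(x, y, bits=16):
    """Build a Z-order key: x bits at even positions, y bits at odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


class ZOrderIndex:
    def __init__(self, points):
        # Sorted (key, point) pairs; assumes non-negative integer coordinates.
        self.entries = sorted((interleave(x, y), (x, y)) for x, y in points)
        self.keys = [k for k, _ in self.entries]

    def query(self, x0, y0, x1, y1):
        """Return the points inside the rectangle, in Z-order."""
        # Every point in the rectangle has a key between the keys of the
        # lower-left and upper-right corners, so one interval is sufficient
        # but may include false positives that the filter removes.
        lo = bisect.bisect_left(self.keys, interleave(x0, y0))
        hi = bisect.bisect_right(self.keys, interleave(x1, y1))
        return [p for _, p in self.entries[lo:hi]
                if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
```

The results come back already ordered by Z-order key, so no separate sort by that key is needed afterwards.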

Match housenumbers on buildings (special case of point-in-polygon-test)

Task with example
I'm working with geodata (country-size) from openstreetmap. Buildings are often polygons without housenumbers and a single point with the housenumber is placed within the polygon of the building. Buildings may have multiple housenumbers.
I want to match the housenumbers to the polygons of the buildings.
Simple solution
For each housenumber, perform a point-in-polygon test against each building polygon.
Problem
Way too slow for about 50,000,000 buildings and 10,000,000 address-points.
Idea
Build an index for the building polygons to accelerate the search for the surrounding polygon of each housenumber point.
Question
What index or strategy would you recommend for this polygon structure? The polygons never overlap and the area is sparsely covered.
This question is also posted on gis.stackexchange.com; it was recommended to post the question there.
Since it sounds like you have well-formed polygons to test against, I'd use a spatial hash with an AABB check, and then finally the full point-in-polygon test. Hopefully at that point you'll be averaging three or fewer point-in-polygon tests per address.
Break the area your data covers into a simple grid where a grid cell is a small multiple (2 to 4) of the median building size. (Maybe 100-200 meters?)
Compute the axis aligned bounding box of every polygon, add it (with its bounding box) to each grid location which the bounding box intersects. (It's pretty simple to figure out where an axis aligned bounding box overlaps regular axis aligned grid cells. I wouldn't store the grid in a simple 2D array -- I'd use a hash table that maps 2D integer grid coordinates, e.g. (1023, 301), to a list of polygons)
Then go through all your address points. Look up in your hash table what cell that point is in. Go through all the polygons in that cell and if the point is within any polygon's axis aligned bounding box do the full point-in-polygon test.
This has several advantages:
The data structures are simple -- no fancy libraries needed (other than handling polygons). With C++, your polygon library, and the std namespace this could be implemented in less than an hour.
Spatial structure isn't hierarchical -- when you're looking up the points you only have to do one O(1) lookup in the hash table.
And of course, the usual disadvantage of grids as a spatial structure:
Doesn't handle wildly varying sized polygons particularly well. However, I'm hoping since you're using map data the sizes are almost always within an order of magnitude, and probably much less.
Assuming you end up with at most N polygons in each grid cell, each polygon has P points, and you've got B buildings and A addresses, you're looking at O(B*P + N*A). Since P and N are likely relatively small, especially on average, you could consider this O(B + A) -- pretty much linear.
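The grid approach described above might look roughly like this in Python (the answer suggests C++; Python is used here only for brevity). The function names, the polygon representation as a list of `(x, y)` vertices, and the ray-casting point-in-polygon test are assumptions made for this sketch.

```python
from collections import defaultdict
from math import floor


def point_in_polygon(px, py, poly):
    """Ray-casting test: count edge crossings of a ray going in +x direction."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge spans the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def build_grid(polygons, cell):
    """Map each grid cell to the (polygon id, bounding box) pairs touching it."""
    grid = defaultdict(list)
    for pid, poly in enumerate(polygons):
        xs = [x for x, _ in poly]
        ys = [y for _, y in poly]
        box = (min(xs), min(ys), max(xs), max(ys))
        for gx in range(floor(box[0] / cell), floor(box[2] / cell) + 1):
            for gy in range(floor(box[1] / cell), floor(box[3] / cell) + 1):
                grid[(gx, gy)].append((pid, box))
    return grid


def find_building(px, py, polygons, grid, cell):
    """Return the id of the polygon containing the point, or None."""
    key = (floor(px / cell), floor(py / cell))
    for pid, (x0, y0, x1, y1) in grid.get(key, ()):
        # Cheap bounding-box rejection before the full polygon test.
        if x0 <= px <= x1 and y0 <= py <= y1 and point_in_polygon(px, py, polygons[pid]):
            return pid
    return None
```

A typical use would build the grid once with `build_grid(polygons, cell)` and then call `find_building` for every address point. Since the polygons never overlap, returning the first match is enough.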

Data structure to hold list of rectangles?

I was wondering if there is a good data structure to hold a list of axis-aligned non overlapping discrete space rectangles. Thus each rectangle could be stored as the integers x, y, width, and height. It would be easy to just store such a list but I also want to be able to query if a given x,y coordinate is inside any other rectangle.
One easy solution would be to create a hash and fill it with the hashed lower left coordinates of the start of each rectangle. This would not allow me to test a given x,y coordinate because it would hit an empty space in the middle. Another answer is to create a bunch of edges into the hash table that cover the entire rectangle with unit squares. This would create too many needless entries for a rectangle of say 100 by 100.
An R-tree can be used. R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles, or polygons. The information for all rectangles can be stored in tree form, so searching will be easy.
The Wikipedia page, a short PPT, and the research paper will help you understand the concept.
