Spatial index on a sorted set

I have a large set of objects to render in 2D, which I have sorted from bottom to top. I'm currently using an R-tree to get a subset of them out that are within the current viewport. However, after getting them out of the spatial index, I have to re-sort them by their Z order. That sorting takes about 6 times longer than looking up the list of them in the spatial index (where several hundred items have matched my query).
Is there a kind of 2D spatial index which has fast lookup by rectangular bounding box, which will return the elements in a sorted order?

You can build the R-tree on the Z-order directly.
Usually the Hilbert order is preferred; the result is known as a Hilbert R-tree.
But you can do the same with the Z-order, too.
However, you may also consider storing the data fully in Z-order right away, in a B+-tree for example.
Instead of querying with a rectangle, translate your query into Z-order intervals, and query for the Z indexes. This is a very classic approach predating the R-trees:
Morton, G. M. (1966).
A Computer Oriented Geodetic Data Base; and a New Technique in File Sequencing.
Technical Report, Ottawa, Canada: IBM Ltd.
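
As a rough illustration of the Z-order idea, here is a minimal Python sketch. It assumes non-negative integer cell coordinates, and the names interleave_bits, z_range_of_box and query are made up for this example. A sorted list with bisect stands in for the B+-tree. The sketch covers the query box with a single key interval and filters the candidates afterwards; a real implementation would usually split the box into several tighter intervals.

```python
import bisect


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Morton (Z-order) key of non-negative integer coordinates."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def z_range_of_box(x_min, y_min, x_max, y_max, bits=16):
    """Smallest and largest key inside the box (the key grows with x and with y)."""
    return (interleave_bits(x_min, y_min, bits),
            interleave_bits(x_max, y_max, bits))


# Hypothetical sample data, stored sorted by Morton key.
points = [(1, 1), (2, 3), (5, 4), (6, 6), (3, 2)]
index = sorted((interleave_bits(x, y), (x, y)) for x, y in points)
keys = [k for k, _ in index]


def query(x_min, y_min, x_max, y_max):
    """Return the points inside the box, in Z-curve order."""
    lo, hi = z_range_of_box(x_min, y_min, x_max, y_max)
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    # The key interval is a superset of the box, so filter the candidates.
    return [p for _, p in index[start:end]
            if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]
```

Because the candidates are read from a structure that is already sorted by key, they come back in key order without a separate sort.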

Related

Using a spatial index to find points within range of each other

I'm trying to find a spatial index structure suitable for a particular problem: using a union-find data structure, I want to connect/associate points that are within a certain range of each other.
I have a lot of points and I'm trying to optimize an existing solution by using a better spatial index.
Right now, I'm using a simple 2D grid that indexes each square of width [threshold distance] of my point map, and I look for potential unions by searching for points in adjacent squares of the grid.
Then I compute the squared Euclidean distance for each combination of points in adjacent cells, compare it to my squared threshold, and use the union-find structure (optimized with path compression, etc.) to build groups of points.
Here is some illustration of the method. The single black points actually represent the set of points that belong to a cell of the grid, and the outgoing colored arrows represent the actual distance comparisons with the outside points.
(I'm also checking for potential connected points that belong to the same cells).
I make sure I'm not doing any distance comparison twice by using a "neighbor cell" pattern that doesn't overlap with cells already tested when I iterate over the grid.
The issue is that this approach is not even close to being fast enough, and I'm trying to replace the grid index with something that could be faster.
I've looked into quadtrees as a spatial index for this problem, but I don't think they are suitable (I don't see any way of performing repeated "neighbours" checks for a particular cell more effectively using a quadtree), though maybe I'm wrong on that.
Therefore, I'm looking for a better algorithm/data structure to index my points efficiently and query them for proximity.
Thanks in advance.

I have some comments:
1) I think your problem is equivalent to a "spatial join". A spatial join takes two sets of geometries, for example a set R of rectangles and a set P of points, and finds for every rectangle all points in that rectangle. In your case, R would be the rectangles (edge length = 2 * max distance) around each point and P the set of your points. Searching for "spatial join" may give you some useful references.
2) You may want to have a look at space-filling curves. Space-filling curves create a linear order for a set of spatial entities (points) with the property that points that are close in the linear ordering are usually also close in space (and vice versa). This may be useful when developing an algorithm.
3) Have a look at OpenVDB. OpenVDB has a spatial index structure that is highly optimized to traverse 'voxel' cells and their neighbors.
4) Have a look at the PH-Tree (disclaimer: this is my own project). The PH-Tree is somewhat like a quadtree but uses low-level bit operations to optimize navigation. It is also Z-ordered/Morton-ordered (see space-filling curves above). You can create a window query for each point which returns all points within that rectangle. To my knowledge, the PH-Tree is the fastest index structure for this kind of operation, especially if you typically have only 9 points in a rectangle. If you are interested in the code, the V13 implementation is probably the fastest; however, the V16 should be much easier to understand and modify.
I tried it on my rather old desktop machine: with about 1,000,000 points I can do about 200,000 window queries per second, so it should take about 5 seconds to find all neighbors for every point.
If you are using Java, my spatial index collection may also be useful.

A standard approach to this is the "sweep and prune" algorithm. Sort all the points by X coordinate, then iterate through them. As you do, maintain the lowest index of the point which is within the threshold distance (in X) of the current point. The points within that range are candidates for merging. You then do the same thing sorting by Y. Then you only need to check the Euclidean distance for those pairs which showed up in both the X and Y scans.
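
A minimal Python sketch of the sweep, in simplified form: it sorts by X only and tests Y directly inside the X window, instead of running a second sort on Y and intersecting the two candidate sets. The function name close_pairs and the tuple representation of points are assumptions made for this example.

```python
def close_pairs(points, threshold):
    """Return index pairs (i, j), i < j, whose points lie within `threshold`."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    limit_sq = threshold * threshold
    pairs = []
    start = 0  # lowest position in `order` still within `threshold` in X
    for pos, i in enumerate(order):
        xi, yi = points[i]
        # Advance the window start past points that are too far left in X.
        while points[order[start]][0] < xi - threshold:
            start += 1
        for j in order[start:pos]:
            xj, yj = points[j]
            dy = yi - yj
            if abs(dy) > threshold:
                continue  # cheap reject on Y before the full distance test
            dx = xi - xj
            if dx * dx + dy * dy <= limit_sq:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

Each returned pair could then be passed to the union operation of the union-find structure. Every pair is tested at most once, because a point is only compared with points that precede it in the X order.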
Note that with your current union-find approach, you can end up unioning points which are quite far from each other, if there are a bunch of nearby points "bridging" them. So your basic approach -- of unioning groups of points based on proximity -- can induce an arbitrary amount of distance error, not just the threshold distance.

Search bounding rectangles (axis aligned) for a given query point in 2 dimensions

I have a set of very many axis-aligned rectangles which may be nested and intersecting. I want to be able to find all the rectangles that enclose/bound a query point. What would be a good approach for this?
EDIT: Additional information:
1. By very many I meant ~100 million or more.
2. The rectangles are distributed across a huge span (span of a country). There is no restriction on the sizes.
3. Yes the rectangles can be pre-processed and stored in a tree structure.
4. No real-time insertions and deletions are required.
5. I only need to find all the rectangles enclosing/bounding a given query point. I do not need the Nearest Neighbors.
As you might have guessed, this is for a real-time geo-fencing application on a mobile unit, and hence:
6. The search need not be repeated for rectangles sufficiently far from the point.
I've tried k-d trees and quadtrees by approximating each rectangle by a point. Their performance varied depending on the size of the rectangles.
Is there a more direct way of doing it? How about R-trees?

I would consider using a quadtree. (Posting from mobile so it's too much effort to link, but Wikipedia has a decent explanation.) You can split at the left, right, top, and bottom bound of any rectangle, and store each rectangle in the node representing the smallest region that contains the rectangle. To search for a point, you go down the quadtree towards the point and check every rectangle that you encounter along that path.
This will work well for small rectangles, but if many rectangles cover almost the entire region you'll still have to check all of those.

You need to look at the R*-tree data structure.
In contrast to many other structures, the R*-tree is well capable of storing (overlapping) rectangles. It is not limited to point data. You will be able to find many publications on how to best approximate polygons before putting them into the index, too. Also, it scales up to pretty large data, as it can operate on disk, too.
R*-trees are faster when bulk loaded, since bulk loading can reduce overlap of index pages and ensure a near-perfectly balanced tree, whereas dynamic insertions only guarantee that each page is at least half full or so. That is, a bulk-loaded tree will often use only half as much memory or storage.
For 2D data and your type of queries, however, a quadtree or grid may work well enough. It depends on how much the local data density varies.
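
To make the grid option concrete, here is a minimal Python sketch of a uniform grid for point-in-rectangle queries. The class name RectGrid, its methods and the fixed cell_size parameter are assumptions made for this example, not part of any library.

```python
from collections import defaultdict


class RectGrid:
    """Uniform grid; each rectangle is registered in every cell it overlaps."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, v):
        # Floor division also maps negative coordinates to the right cell.
        return int(v // self.cell_size)

    def insert(self, rect_id, x_min, y_min, x_max, y_max):
        rect = (rect_id, x_min, y_min, x_max, y_max)
        for cx in range(self._cell(x_min), self._cell(x_max) + 1):
            for cy in range(self._cell(y_min), self._cell(y_max) + 1):
                self.cells[(cx, cy)].append(rect)

    def query(self, x, y):
        """Return the ids of all rectangles that contain the point (x, y)."""
        hits = []
        candidates = self.cells.get((self._cell(x), self._cell(y)), ())
        for rect_id, x_min, y_min, x_max, y_max in candidates:
            if x_min <= x <= x_max and y_min <= y <= y_max:
                hits.append(rect_id)
        return hits
```

A query only looks at one cell, but a rectangle much larger than cell_size is stored in many cells. That is where widely varying rectangle sizes and local density start to hurt this kind of structure.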

2D spatial index optimized for queries of multiple regions

I have a search space of a large number of axis-aligned boxes. A normal spatial index, like an r-tree, will rapidly give me a list of boxes that overlap one search area.
However, I have a large number (hundreds) of potentially overlapping search areas I'd like to query all at once. In other words, I want all objects in my data structure that overlap at least one of these five hundred boxes.
Is there a data structure optimized for this kind of query?

You can use a quadkey and a placeholder. You can create a quadkey by interleaving the bits of the x and y coordinates, as is done in a Morton curve, a.k.a. Z-curve.

CUDA Thrust find near neighbor points

In my problem, there are N points in the domain and they are somehow randomly distributed. For each point I need to find all neighbor points with distance less than a given double precision floating number, DIST.
Is there an efficient way to do this in Thrust?
In serial, I would use a neighborhood table and hope to achieve approximately O(n) instead of the naive O(n^2) algorithm.
I have found a Thrust example for 2D bucket sort, which is a perfect fit for the first part of my problem. But that is not enough, because for each bucket I need to find all points in the neighbor buckets, then compute their distances and see if any of them is less than DIST. Finding neighbors and computing distances should be relatively easy, but adding those eligible points to a result array seems really difficult for me to implement in Thrust.
A way to rephrase this particular problem is this: I have two 2D arrays A1 and A2, where the column number represents the index of the 2D bucket and each column has a different number of elements, which are indices of my points. Each element in column(i) of A1 will form a potential pair with each element in column(i) of A2, and all eligible pairs should be recorded in a result array.
I could use a CUDA kernel and allocate tons of potentially unused memory as a workaround, but that would be the last thing I would want to do.
Thanks in advance.

The full solution is out of the scope of a single Stack Overflow answer, but there's a discussion on how to use Thrust to build a 2D spatial index in this repository:
https://github.com/jaredhoberock/thrust-workshop

Another possibility, simpler than creating a quad-tree, is using a neighborhood matrix.
First place all your points into a 2D square matrix (or 3D cubic grid, if you are dealing with three dimensions). Then you can run a full or partial spatial sort, so the points will become ordered inside the matrix.
Points with small Y could move to the top rows of the matrix, and likewise, points with large Y would go to the bottom rows. The same will happen with points with small X coordinates, that should move to the columns on the left. And symmetrically, points with large X value will go to the right columns.
After you have done the spatial sort (there are many ways to achieve this, with both serial and parallel algorithms) you can look up the nearest points of a given point P by just visiting the cells adjacent to the one where P is stored in the neighborhood matrix.
If this matrix is placed into texture memory, you can use all the spatial caching from CUDA to have very fast accesses to all neighbors!
You can read more details for this idea in the following paper (you will find PDF copies of it online): Supermassive Crowd Simulation on GPU based on Emergent Behavior.
The sorting step gives you interesting choices. You can use just the even-odd transposition sort described in the paper, which is very simple to implement (even in CUDA). If you run just one pass of this, it will give you a partial sort, which can already be useful if your matrix is nearly sorted. That is, if your points move slowly, it will save you a lot of computation.
If you need a full sort, you can run the even-odd transposition pass several times (as described in the following Wikipedia page):
http://en.wikipedia.org/wiki/Odd%E2%80%93even_sort
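
To show what a single pass and a full sort look like, here is a minimal serial Python sketch on a 1D list. The function names are made up for this example, and it does not reproduce the matrix layout used in the paper. The compare-and-swap steps within one pass touch disjoint pairs, which is what makes each pass easy to run in parallel.

```python
def odd_even_pass(values, parity):
    """Run one transposition pass in place.

    Compares neighbouring pairs starting at index `parity` (0 or 1) and
    swaps them when out of order. Returns True if anything was swapped.
    """
    swapped = False
    for i in range(parity, len(values) - 1, 2):
        if values[i] > values[i + 1]:
            values[i], values[i + 1] = values[i + 1], values[i]
            swapped = True
    return swapped


def odd_even_sort(values):
    """Full sort: n alternating passes are enough for n elements."""
    for step in range(len(values)):
        odd_even_pass(values, step % 2)
    return values
```

Calling odd_even_pass once on nearly sorted data corresponds to the cheap partial sort mentioned above.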
There is a second paper from the same authors, describing an extension to 3D and using three passes of the bitonic sort (which is highly parallel, but is not a spatial sort). They claim it is both more precise than a single even-odd transposition pass and more efficient than a full sort. The paper is A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU.

Different Searching Methods In Spatial Data Structures

I am trying to write a spatial data structure (such as a K-D tree or a QuadTree) which, given a point, will find the x closest points to it.
The issue with the data structures I mentioned above is that they support mostly a radial/region search. So they will obtain the points that are within a radius of y of a given point/node.
Altering the search in those structures to do what I want would be inefficient. I am assuming I will need to repeat the radial search several times, starting from a short radial distance and increasing it until I have the wanted x points close to the given point. Of course, this defeats the whole purpose of the data structure.
Almost all spatial data structures operate on radial search. What are other efficient search methods I could apply to a QuadTree, or any other spatial data structures I need to consider to achieve what I mean? Any suggestions?

I'm not sure that you are right in your assumptions. The Wikipedia article on kd-trees indicates how the structure can be used to support finding the x nearest neighbours to a search point. Yes, it is essentially a repetition of finding the nearest neighbour x times, but I'm not sure that you have a right to expect a more efficient performance from an algorithm over a kd-tree.
If that is not good enough for you perhaps you need to store your points in a different data structure. If x is small and bounded you could store your points in a weighted graph where the edge weights are, of course, the distances between points.
If x is neither small nor bounded you might employ a simple subdivision of space into k*m uniform cells (2D here, inflate to 3+D if necessary). For each search point go straight to the cell which contains it, find the other points in the same cell. If x of them are closer to the search point than the boundary of the cell, those are what you are looking for. If not, search in the cells on the other side of the near boundaries too.
If you find yourself needing to support both radial/region searches and x-nearest neighbour searches it's not the end of the world if you have to maintain 2 data structures, one to support each type of query. For many search problems the first step to an efficient solution is to put the data into the right structure for efficient searching. Making this decision depends on numbers you simply haven't provided us.

If you do call the search method several times over on a quadtree (which is what I've done a few times), and you double the search radius on each call until you have the correct number of points, the search is not that inefficient.
Assuming a 2D space, if the minimum radius that contains the X points is R1, and you keep on doubling until you find a radius R2 which contains them, then (a) R2 must be less than 2×R1 and (b) the area searched becomes 4 times bigger on each search, which (I think) gives you a worst-case scenario in which only about half of the area you've searched through was actually unnecessary.
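
A minimal Python sketch of the radius-doubling search. The brute-force range_query is only a stand-in for the quadtree's radial search, and the names k_nearest and start_radius are assumptions made for this example.

```python
import math


def range_query(points, center, radius):
    """Stand-in for a quadtree radial search (brute force here)."""
    cx, cy = center
    return [p for p in points
            if math.hypot(p[0] - cx, p[1] - cy) <= radius]


def k_nearest(points, center, k, start_radius=1.0):
    """Double the radius until at least k points are found, then keep the k closest."""
    if k > len(points):
        raise ValueError("fewer than k points available")
    if start_radius <= 0:
        raise ValueError("start_radius must be positive")
    cx, cy = center
    radius = start_radius
    while True:
        found = range_query(points, center, radius)
        if len(found) >= k:
            # Every point outside the radius is farther away than every
            # point inside it, so the k nearest are among `found`.
            found.sort(key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
            return found[:k]
        radius *= 2
```

The final sort only touches the points returned by the last radial search, not the whole data set.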
