I have a collection of objects. Each object represents a coordinate range (i.e., a block). What I want is to find the object nearest to another coordinate in a given direction.
Is there a way to do this without traversing the whole collection all the time?
You may want to look into binary space partitioning and similar algorithms (a quadtree comes to mind, along with variations on plane sweep algorithms).
While inserting the objects, keep them sorted by their coordinates; then you can use a divide-and-conquer (binary) search to find the nearest candidate in a given direction.
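A minimal sketch of that idea in Python, assuming each block has one sortable start coordinate and using the standard `bisect` module; the class and method names here are illustrative, not a definitive design:

```python
import bisect

class BlockIndex:
    """Keeps blocks sorted by start coordinate for O(log n) directional lookups."""
    def __init__(self):
        self.starts = []  # sorted start coordinates
        self.blocks = []  # blocks, kept parallel to self.starts

    def insert(self, start, block):
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.blocks.insert(i, block)

    def nearest_at_or_after(self, coord):
        """Nearest block in the increasing direction, or None."""
        i = bisect.bisect_left(self.starts, coord)
        return self.blocks[i] if i < len(self.blocks) else None

    def nearest_before(self, coord):
        """Nearest block in the decreasing direction, or None."""
        i = bisect.bisect_left(self.starts, coord)
        return self.blocks[i - 1] if i > 0 else None
```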
I'm trying to find a spatial index structure suitable for a particular problem: using a union-find data structure, I want to connect/associate points that are within a certain range of each other.
I have a lot of points and I'm trying to optimize an existing solution by using a better spatial index.
Right now, I'm using a simple 2D grid over my point map, with cell width equal to the threshold distance, and I look for potential unions by searching for points in adjacent squares of the grid.
Then I compute the squared Euclidean distance for each candidate pair from adjacent cells, compare it to my squared threshold, and use the union-find structure (optimized with path compression, etc.) to build groups of points.
Here is an illustration of the method. The single black points actually represent the set of points that belong to a cell of the grid, and the outgoing colored arrows represent the actual distance comparisons with the outside points.
(I also check for potential connections between points that belong to the same cell.)
This pattern ensures I never do a distance comparison twice: the "neighbour cell" pattern doesn't overlap cells that were already tested as I iterate over the grid cells.
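A condensed sketch of that grid approach, assuming 2D points given as (x, y) tuples; the half-neighbourhood offsets ensure each pair of cells is visited only once:

```python
from collections import defaultdict
from math import floor

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path compression (halving)
        a = parent[a]
    return a

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def connect_within(points, threshold):
    """Union every pair of points at most `threshold` apart, via a uniform grid."""
    cell = lambda p: (floor(p[0] / threshold), floor(p[1] / threshold))
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell(p)].append(i)
    parent = list(range(len(points)))
    t2 = threshold * threshold
    # Own cell plus half of the 8-neighbourhood: each cell pair is seen once.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (1, -1)]
    for (cx, cy), members in grid.items():
        for dx, dy in offsets:
            for i in members:
                for j in grid.get((cx + dx, cy + dy), ()):
                    if (dx, dy) == (0, 0) and j <= i:
                        continue  # avoid duplicate pairs within the same cell
                    if (points[i][0] - points[j][0]) ** 2 + \
                       (points[i][1] - points[j][1]) ** 2 <= t2:
                        union(parent, i, j)
    return parent
```

After the call, `find(parent, i)` gives the group representative of point i.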
The issue is that this approach is not even close to fast enough, and I'm trying to replace the "spatial grid index" method with something faster.
I've looked into quadtrees as a spatial index for this problem, but I don't think they are suitable (I don't see any way of performing the repeated "neighbour" checks for a particular cell more efficiently with a quadtree), but maybe I'm wrong about that.
Therefore, I'm looking for a better algorithm/data structure to efficiently index my points and query them for proximity.
Thanks in advance.
I have some comments:
1) I think your problem is equivalent to a "spatial join". A spatial join takes two sets of geometries, for example a set R of rectangles and a set P of points, and finds for every rectangle all points in that rectangle. In your case, R would be the rectangles (edge length = 2 * max distance) around each point and P the set of your points. Searching for "spatial join" may give you some useful references.
2) You may want to have a look at space-filling curves. A space-filling curve creates a linear order for a set of spatial entities (points) with the property that points that are close in the linear ordering are usually also close in space (and vice versa). This may be useful when developing an algorithm; see the Morton-code sketch after this list.
3) Have a look at OpenVDB. OpenVDB has a spatial index structure that is highly optimized for traversing 'voxel' cells and their neighbors.
4) Have a look at the PH-Tree (disclaimer: this is my own project). The PH-Tree is somewhat like a quadtree but uses low-level bit operations to optimize navigation. It is also Z-ordered/Morton-ordered (see space-filling curves above). You can create a window query for each point which returns all points within that rectangle. To my knowledge, the PH-Tree is the fastest index structure for this kind of operation, especially if you typically have only 9 points in a rectangle. If you are interested in the code, the V13 implementation is probably the fastest; however, the V16 should be much easier to understand and modify.
In a trial on my rather old desktop machine with about 1,000,000 points, I could do about 200,000 window queries per second, so it should take about 5 seconds to find all neighbors for every point.
If you are using Java, my spatial index collection may also be useful.
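As a minimal illustration of point 2 above, here is one way to compute a 2D Morton (Z-order) key by interleaving coordinate bits; this is a sketch assuming non-negative 16-bit integer coordinates:

```python
def interleave16(x):
    """Spread the 16 bits of x so there is a zero bit between each."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x

def morton2d(x, y):
    """Z-order key: points close in key order tend to be close in space."""
    return interleave16(x) | (interleave16(y) << 1)

# Sorting points by their Morton key yields the linear order described above.
points = [(3, 5), (3, 6), (100, 2)]
points.sort(key=lambda p: morton2d(*p))
```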
A standard approach to this is the "sweep and prune" algorithm. Sort all the points by X coordinate, then iterate through them. As you do, maintain the lowest index of the point which is within the threshold distance (in X) of the current point. The points within that range are candidates for merging. You then do the same thing sorting by Y. Then you only need to check the Euclidean distance for those pairs which showed up in both the X and Y scans.
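A hedged sketch of that idea; `axis_candidates` is the sweep along one axis, the Y pass is identical, and only pairs appearing in both candidate sets get the exact distance check:

```python
def axis_candidates(points, threshold, axis):
    """Index pairs whose coordinates along `axis` differ by at most threshold."""
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    pairs = set()
    lo = 0
    for k, i in enumerate(order):
        # Advance lo past points too far behind the current one on this axis.
        while points[i][axis] - points[order[lo]][axis] > threshold:
            lo += 1
        for j in order[lo:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs

def close_pairs(points, threshold):
    """Pairs within `threshold` (Euclidean), found via X and Y sweeps."""
    cand = (axis_candidates(points, threshold, 0)
            & axis_candidates(points, threshold, 1))
    t2 = threshold * threshold
    return [(i, j) for i, j in cand
            if (points[i][0] - points[j][0]) ** 2
             + (points[i][1] - points[j][1]) ** 2 <= t2]
```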
Note that with your current union-find approach, you can end up unioning points which are quite far from each other, if there are a bunch of nearby points "bridging" them. So your basic approach -- of unioning groups of points based on proximity -- can induce an arbitrary amount of distance error, not just the threshold distance.
I'm facing a mapping problem: I need to map N-dimensional vectors to a single group/point, e.g. [0, 1, ..., N-1] to 1 and [1, 2, ..., N-1] to 2.
Right now I have a function that receives an N-dimensional vector and returns a point; that point is the result. I want to avoid calling the function, since I already have all the results stored in a table. The problem is that I'll remove the function, and I then need to map a new entry to an existing point.
Is there some way to map a new entry to the correct point? Is there an algorithm for doing so?
Some help or advice?
I already saw the topic below, but I'm not sure whether a Hilbert curve is the solution; I need to study it more.
Mapping N-dimensional value to a point on Hilbert curve
I'll be grateful.
Mapping n-dimensional data to one-dimensional data is called projection. There are lots of ways to project n-dimensional data to a lower dimension; the most well-known ones are PCA, SVD, and radial basis functions.

If you no longer have your method of projection, you probably can't project another point unless you have a hash table of the previously projected points. If you happen to have exactly the same point, then you can map it to the same result. However, pay attention that the projection is not one-to-one, meaning that two points may be mapped to the same point in the lower dimension. An example of such a case is the projection of 3D points onto the screen, where many points may get mapped to exactly the same screen position. As a result, inverse-projecting the points usually has ambiguities.

Regarding the link you sent about the Hilbert curve: that is a general approach for projecting a point in N-D to a point on a space-filling curve (SFC) such as Hilbert, Peano, etc. This website at MIT has interesting material about dimension reduction using SFCs:
http://people.csail.mit.edu/jaffer/Geometry/MDSFC
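A minimal sketch of the hash-table idea mentioned above, assuming the precomputed results are available as (vector, point) pairs; the data and names here are made up for illustration:

```python
# Previously computed (vector, point) pairs; in practice these come
# from the table of results produced before the function was removed.
rows = [([0, 1, 2], 1), ([1, 2, 3], 2)]

# Tuples are hashable, so they can serve as dictionary keys.
lookup = {tuple(vec): point for vec, point in rows}

def map_entry(vec):
    """Return the stored point for vec, or None if it was never projected."""
    return lookup.get(tuple(vec))

print(map_entry([0, 1, 2]))  # 1
print(map_entry([9, 9, 9]))  # None: unknown without the original function
```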
Interview question:
Given billions of rectangles, find the rectangle with the minimum area overlapping a given point P(x,y).
There is a simple way to get the answer in O(n) time by processing each rectangle sequentially, but can it be optimized further, given the large number of rectangles?
My best approach would be to check each rectangle, see if the point is inside, then calculate the area and compare it with the current smallest area. This can be done in a single pass. I cannot conceive of any other method that doesn't require checking all rectangles.
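A quick sketch of that single pass, assuming rectangles given as corner tuples:

```python
def smallest_covering(rects, px, py):
    """One O(n) pass; rects are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    best, best_area = None, float("inf")
    for x1, y1, x2, y2 in rects:
        if x1 <= px <= x2 and y1 <= py <= y2:  # does the rectangle cover P?
            area = (x2 - x1) * (y2 - y1)
            if area < best_area:
                best, best_area = (x1, y1, x2, y2), area
    return best
```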
If you use the same rectangle set with many point queries, then an R-tree data structure lets you find which rectangles contain a given point without checking all rectangles.
You do need to process all rectangles at least once, of course. But use that first pass wisely and you can get multiple faster lookups later on.
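One possible sketch of that, if you happen to be in Python, using the third-party `rtree` package (a wrapper around libspatialindex); availability of that package is an assumption here:

```python
from rtree import index  # third-party wrapper around libspatialindex

def build_index(rects):
    """rects: list of (x1, y1, x2, y2); pay the build cost once."""
    idx = index.Index()
    for i, r in enumerate(rects):
        idx.insert(i, r)
    return idx

def smallest_at(idx, rects, px, py):
    """Visit only the rectangles whose extent contains the query point."""
    hits = idx.intersection((px, py, px, py))  # degenerate query box = point
    return min((rects[i] for i in hits),
               key=lambda r: (r[2] - r[0]) * (r[3] - r[1]),
               default=None)
```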
I would insert the rectangles into a K-d tree or a Quadtree.
Most probably you are not telling us the full question, because as it stands now, your solution is optimal.
No matter what, you have to go through each rectangle at least once, to check whether it actually covers the point and to calculate its area. There is no point in preprocessing them in any way when you need to answer only one question.
Preprocessing only makes sense if you will need to answer many similar questions in the future.
I've got the following theoretical problem:
I have n cuboids in 3-dimensional space.
They are axis-aligned, so a cuboid can be described by a point (x,y,z) and dimensions (dimX,dimY,dimZ).
I want to organize these cuboids in a way that lets me check whether a newly inserted cuboid intersects one of the existing ones (collision detection).
To do this I decided to use hierarchical bounding boxes.
So, in sum, I have a binary tree structure of bounding volumes.
Insertion is done by recursively computing the distance to both children (the distance between the centers of the two bounding cuboids) and descending into the child with the smaller distance.
Collision detection works similarly, but we follow every sub-path whose bounding volume intersects the given cuboid, as in the sketch below.
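A minimal sketch of the overlap test and the tree descent, assuming a node type with hypothetical `volume`, `left`, and `right` fields:

```python
class AABB:
    """Axis-aligned box given by a corner point and dimensions, as above."""
    def __init__(self, x, y, z, dx, dy, dz):
        self.min = (x, y, z)
        self.max = (x + dx, y + dy, z + dz)

def intersects(a, b):
    # Boxes overlap iff their extents overlap on every single axis.
    return all(a.min[i] <= b.max[i] and b.min[i] <= a.max[i] for i in range(3))

def collide(node, box, hits):
    """Collect every leaf volume in the tree that intersects `box`."""
    if node is None or not intersects(node.volume, box):
        return  # prune: nothing below this bounding volume can collide
    if node.left is None and node.right is None:  # leaf
        hits.append(node.volume)
    else:
        collide(node.left, box, hits)
        collide(node.right, box, hits)
```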
The tricky part is how to balance this tree to get good performance when some cuboids are very close to each other and others are far away.
So far, I've found no way to use e.g. an AVL tree, because then I'd have to be able to compare two cuboids in some way that does not break the conditions on which collision detection depends.
P.S.: I know there are libraries to do this, but I want to understand the principles of collision detection e.g. in games in detail and therefore want to implement this by myself.
I've now tried it with space partitioning instead of object partitioning. That's not exactly what I wanted to do, but I found much more helpful information about it, e.g.: https://en.wikipedia.org/wiki/Kd-tree
With this information it should be possible to implement it.
I am trying to write a spatial data structure (such as a K-D tree or a QuadTree) which, given a point, will find the x closest points to it.
The issue with the data structures I mentioned above is that they mostly support radial/region searches, i.e. they return the points that are within a radius y of a given point/node.
Altering those structures to search for what I want seems inefficient: I assume I would need to repeat the radial search several times, starting from a short radius and increasing it until I have the wanted x points close to the given point. Of course, this defeats the whole purpose of the data structure.
Almost all spatial data structures seem to operate on radial search. What other efficient search methods could I apply to a quadtree, and what other spatial data structures should I consider to achieve what I mean? Any suggestions?
I'm not sure that you are right in your assumptions. The Wikipedia article on kd-trees indicates how the structure can be used to support finding the x nearest neighbours to a search point. Yes, it is essentially a repetition of finding the nearest neighbour x times, but I'm not sure that you have a right to expect a more efficient performance from an algorithm over a kd-tree.
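For reference, if Python with SciPy happens to be an option, k-nearest-neighbour queries over a kd-tree are already built in; a quick sketch with made-up data:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))  # 10k random 2D points

tree = cKDTree(points)
query = np.array([0.5, 0.5])
x = 5

# Distances and indices of the x nearest stored points to `query`.
dists, idxs = tree.query(query, k=x)
print(points[idxs], dists)
```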
If that is not good enough for you perhaps you need to store your points in a different data structure. If x is small and bounded you could store your points in a weighted graph where the edge weights are, of course, the distances between points.
If x is neither small nor bounded you might employ a simple subdivision of space into k*m uniform cells (2D here, inflate to 3+D if necessary). For each search point go straight to the cell which contains it, find the other points in the same cell. If x of them are closer to the search point than the boundary of the cell, those are what you are looking for. If not, search in the cells on the other side of the near boundaries too.
If you find yourself needing to support both radial/region searches and x-nearest neighbour searches it's not the end of the world if you have to maintain 2 data structures, one to support each type of query. For many search problems the first step to an efficient solution is to put the data into the right structure for efficient searching. Making this decision depends on numbers you simply haven't provided us.
If you do call the search method several times over on a quadtree (which is what I've done a few times), then if you double the search radius on each call until you have the correct number of points, the search is not that inefficient.
Assuming a 2D space, if the correct minimum radius to contain the x points is R1, and you keep doubling until you find a radius R2 which contains them, then (a) R2 must be less than 2×R1, and (b) the area searched becomes 4 times bigger on each search, which (I think) gives a worst case of only about half the area you've searched through actually being unnecessary (or thereabouts).
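A sketch of that doubling loop, assuming a hypothetical `tree.query_radius(point, r)` method that returns all stored 2D points within distance r of `point`:

```python
def k_nearest(tree, point, k, start_radius):
    """Repeat radial queries, doubling the radius, until k points are found."""
    r = start_radius
    while True:
        hits = tree.query_radius(point, r)  # hypothetical radial query
        if len(hits) >= k:
            # Keep only the k closest of the returned points.
            hits.sort(key=lambda q: (q[0] - point[0]) ** 2
                                  + (q[1] - point[1]) ** 2)
            return hits[:k]
        r *= 2  # searched area grows 4x per iteration, so few iterations
```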