Good data structure for euclidean 3d data queries? - data-structures

What's a good way to store point cloud data so that it's optimal for an application that will do one of these two queries?
Nearest (i.e. lowest euclidean distance) data point to (x,y,z)
Get all the points inside a sphere with radius R around a point (x,y,z)
The structure will only be filled once, but read many times. A lowish memory footprint would be nice as I may be dealing with datasets of > 7 million points, but speed should be of primary concern. A library would be nice, but I wouldn't mind implementing it myself if it's something doable with limited expertise in the area.
Thanks in advance!

A Kd-Tree you get O(log(n)) nearest neighbor, and usually range queries will be fast.
There are a bunch of libraries referenced there. I have not used any of them.
You might also look at CGAL. I have used CGAL for other things, it is tolerably fast, extremely comprehensive, but the documentation will drive you to drink.

A huge portion of the decision in data structures will depend on the spatial organization of the data. For example, highly clustered data tends to have different performance charateristics in kd-trees than evenly distributed data.
KD-Trees are very good for both of these queries.
Octree can be a good option in many cases, as well, and is potentially easier to implement.
There are many libraries out there that do this, using various algorithms. Searching for k-nearest neighbor searching will reveal many useful libraries. I've had pretty good luck with ANN in the past, for example.

Related

N dimensional space search algorithms

I have an input N dimensional space. I also have a function f which given some point in the space, would return how likely this point is a "good" point [some measure between [0, 1]]. I know that "good" points are often close together in the space. But there could be clusters of these good points spread across the entire search space. So there could be regions which are excellent in producing these good points.
What are some good approximate algorithms / statistics / techniques I could apply to get as many of these points as possible, and also as extensive as possible (covering as many cluster as possible).
Thanks
Take a look at the topics of cluster analysis and statistical classification.
The point here is that there are many different algorithms and it really depends a lot on your application and the structure of your input data if a certain method is appropriate.
You might want to use a data mining tool for evaluating different algorithms on your specific data. I have used RapidMiner in the past to do so and learned a lot about what worked well for my application and what did not.

best algorithm for spacial partitioning/collision detection on objects from tiny to massive size?

I've looked around and found a billion questions, articles, studies, theses, etc, but I haven't been able to really figure out or find an answer to this question.
Basically, I'm just wondering what the best algorithm(s) for spacial partitioning/collision detection between objects from 1 pixel to the size of the screen itself is. Currently, i'm leaning towards the the loose quadtree.
First, have a look here.
It depends on your requirements. Quadtrees are perfect for small to medium sized datasets up to 100K entries or so. If that fits your requirements, there is no need to read further.
However, normal quadtrees tend to have difficulties with very large or strongly clustered (high point density in some areas) datasets. They are also not that straight forward to implement because with larger tree you may run into precision problems (divide a number by 2 30 times if you go deep in a tree and your quadrants start overlapping or have gaps between them). Otherwise they are relatively easy to implement.
I found my PH-Tree quite useful, it is somewhat similar to a quadtree, but as no precision problems and inherently limited depth. In my experience it's quite fast, especially if you do window queries with a small result set (that's basically what you do in collision detection). Unfortunately it's not that easy to implement. The link above references my own Java implementation, but the doc also contains a link to a C++ version.
You can also try the 'qthypercube2' here, it's a quadtree with some of the PH-Tree's navigation techniques. The current 1.6 version has some precision problems for extreme datasets, but you will find it to be very fast, and a solution to the precision problem is already in the pipeline.

Algorithm for 2D nearest-neighbour queries with dynamic points

I am trying to find a fast algorithm for finding the (approximate, if need be) nearest neighbours of a given point in a two-dimensional space where points are frequently removed from the dataset and new points are added.
(Relatedly, there are two variants of this problem that interest me: one in which points can be thought of as being added and removed randomly and another in which all the points are in constant motion.)
Some thoughts:
kd-trees offer good performance, but are only suitable for static point sets
R*-trees seem to offer good performance for a variety of dimensions, but the generality of their design (arbitrary dimensions, general content geometries) suggests the possibility that a more specific algorithm might offer performance advantages
Algorithms with existing implementations are preferable (though this is not necessary)
What's a good choice here?
I agree with (almost) everything that #gsamaras said, just to add a few things:
In my experience (using large dataset with >= 500,000 points), kNN-performance of KD-Trees is worse than pretty much any other spatial index by a factor of 10 to 100. I tested them (2 KD-trees and various other indexes) on a large OpenStreetMap dataset. In the following diagram, the KD-Trees are called KDL and KDS, the 2D dataset is called OSM-P (left diagram): The diagram is taken from this document, see bullet points below for more information.
This research describes an indexing method for moving objects, in case you keep (re-)inserting the same points in slightly different positions.
Quadtrees are not too bad either, they can be very fast in 2D, with excellent kNN performance for datasets < 1,000,000 entries.
If you are looking for Java implementations, have a look at my index library. In has implementations of quadtrees, R-star-tree, ph-tree, and others, all with a common API that also supports kNN. The library was written for the TinSpin, which is a framework for testing multidimensional indexes. Some results can be found enter link description here (it doesn't really describe the test data, but 'OSM-P' results are based on OpenStreetMap data with up to 50,000,000 2D points.
Depending on your scenario, you may also want to consider PH-Trees. They appear to be slower for kNN-queries than R-Trees in low dimensionality (though still faster than KD-Trees), but they are faster for removal and updates than RTrees. If you have a lot of removal/insertion, this may be a better choice (see the TinSpin results, Figures 2 and 46). C++ versions are available here and here.
Check the Bkd-Tree, which is:
an I/O-efficient dynamic data structure based on the kd-tree. [..] the Bkd-tree maintains its high space utilization and excellent
query and update performance regardless of the number of updates performed on it.
However this data structure is multi dimensional, and not specialized to lower dimensions (like the kd-tree).
Play with it in bkdtree.
Dynamic Quadtrees can also be a candidate, with O(logn) query time and O(Q(n)) insertion/deletion time, where Q(n) is the time
to perform a query in the data structure used. Note that this data structure is specialized for 2D. For 3D however, we have octrees, and in a similar way the structure can be generalized for higher dimensions.
An implentation is QuadTree.
R*-tree is another choice, but I agree with you on the generality. A r-star-tree implementations exists too.
A Cover tree could be considered as well, but I am not sure if it fits your description. Read more here,and check the implementation on CoverTree.
Kd-tree should still be considered, since it's performance is remarkable on 2 dimensions, and its insertion complexity is logarithic in size.
nanoflann and CGAL are jsut two implementations of it, where the first requires no install and the second does, but may be more performant.
In any case, I would try more than one approach and benchmark (since all of them have implementations and these data structures are usually affected by the nature of your data).

Quadtree performance issue

The problem I have is that a game I work on uses a quadtree for fast proximity detection, used for range checks when weapons are firing. I'm using the classic "4 wide" quadtree, meaning that I subdivide when I attempt to add a 5th child node to an already full parent node.
Initially the set of available targets was fairly evenly spread out, so the quadtree worked very well. Due to design changes, we now get clusters of large numbers of enemies in a relatively small space, leading to performance problems because the quadtree becomes significantly unbalanced.
There are two possible solutions that occur to me, either modify the quadtree to handle this, or switch to an alternative representation.
The only other representation I'm familiar with is a spatial hash, and not very familiar at that. From what I understand, this risks suffering the same problem since the cluster would wind up in a relatively small number of hash buckets. From what I know of it, a BSP is a possible solution that will deal with the uneven distribution better than a quadtree or spatial hash.
No fair, I know, there are actually three questions now.
Are there any modifications I can make to the quadtree, e.g. increasing the "width" of nodes that would help deal with this?
Is it even worth my time to consider a spatial hash?
Is a BSP, or some other data structure a better bet to deal with the uneven distribution?
I usually use quadtree with at least 10 entries per node, but you'll have to try it out.
I have no experience with spatial hashing.
Other structures you could look into are:
KD-Trees: They are quite simple to implement and are also good for neighbour search, but get slower with string clustering. They are a bit slow to update and may get imbalanced.
R*Tree: More complex, very good with neighbour search, but even slower to update than KD-Trees. They won't get imbalanced because of automatic rebalancing. Rebalancing is mostly fast, but in extreme cases it can occasionally slow thing down further.
PH-Tree: Quite complex to implement. Good neighbour search. Has very good update speed (like quadtree), maximum depth is limited by the bit width of your coordinates (usually 32 or 64bit), so they can't really get imbalanced. Scales very well with large datasets (1 million and more), I have little experience with small datasets.
If you are using Java, I have Apache licensed versions available here (R*Tree) and here (PH-Tree).

How should I index for a simple world of rectangles?

The world consists of many (1k-10k) rectangles of similar sizes, and I need to be able to quickly determine potential overlaps when trying to add a new rectangle. Rectangles will be added and removed dynamically. Are R-Trees appropriate here? If so, are there any good libraries I should consider? (I'm open to suggestions in any language).
R-Trees would be suitable, yes.
quad trees are also a good data structure for quickly finding objects in a region of 2D space. They are really a more uniform version of r-trees. Using these structures you can quickly zero in on a small region of space, with very few tests, even with massive data sets.
There is a c# implementation here, though I have not looked at it.
This kind of data structure (and it's 3D version called Octrees) are often used in games to manage the large data sets of objects that need to know if they are near any other objects for collision testing, and all kinds of other fun reasons.
You should be able to find lots of articles and examples of these kinds of data structures in the games industry sites, like gamasutra and opengl.org
You can also look up to kd-trees.
I don't know of any implementation but in 3D at least they are usually considered more performant than Octrees. For example, here is a return of experience I just googled it.
You may want to consider alternative to quad trees if you ever have a problem of performance.
However it should be noted that kd-trees are hard to rebalance...

Resources