What is the difference between a KD-tree and an R-tree? - data-structures

I looked at the definition of KD-tree and R-tree. It seems to me that they are almost the same.
What's the difference between a KD-tree and an R-tree?

They are actually quite different. They serve a similar purpose (region queries on spatial data), and both are trees (and both belong to the family of bounding volume hierarchy indexes), but that is about all they have in common.
R-Trees are balanced, k-d-trees are not (unless bulk-loaded). This is why R-trees are preferred for changing data, as k-d-trees may need to be rebuilt to re-optimize.
R-Trees are disk-oriented. They actually organize the data in areas that directly map to the on-disk representation. This makes them more useful in real databases and for out-of-memory operation. k-d-trees are memory-oriented and are non-trivial to put into disk pages.
k-d-trees are elegant when bulk-loaded (kudos to SingleNegationElimination for pointing this out), while R-trees are better for changing data (although they do benefit from bulk loading, when used with static data).
R-Trees do not cover the whole data space. Empty areas may be uncovered. k-d-trees always cover the whole space.
k-d-trees split the data space with binary splits; R-trees partition the data into rectangles. The binary splits are obviously disjoint, while the rectangles of an R-tree may overlap (which is actually sometimes good, although one tries to minimize overlap).
k-d-trees are a lot easier to implement in memory, which is actually their key benefit.
R-trees can store rectangles and polygons; k-d-trees only store point vectors (as overlap is needed for polygons).
R-trees come with various optimization strategies, different splits, bulk-loaders, insertion and reinsertion strategies etc.
k-d-trees use the one-dimensional distance to the separating hyperplane as bound; R-trees use the d-dimensional minimum distance to the bounding hyperrectangle for bounding (they can also use the maximum distance for some counting queries, to filter true positives).
k-d-trees support squared Euclidean distance and Minkowski norms, while R-trees have been shown to also support geodetic distance (for finding nearby points on geodata).
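The one-dimensional hyperplane bound mentioned above shows up clearly in a minimal in-memory k-d tree. The sketch below is illustrative Python (the names `build` and `nearest` are hypothetical, not from any library). It bulk-loads by median splits, which keeps it balanced, and it skips the far subtree whenever the squared distance to the splitting hyperplane is already no better than the best match. It has no insert or delete, and that is exactly where R-trees have the edge.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.point = point   # the median point stored at this node
        self.axis = axis     # the splitting hyperplane is perpendicular to this axis
        self.left = left     # points with coordinate <= point[axis]
        self.right = right   # points with coordinate >= point[axis]


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Bulk-load by splitting at the median of a cycling axis (balanced by construction)."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build(ordered[:mid], depth + 1),
                  build(ordered[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (squared distance, point) for the stored point closest to target."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or d2 < best[0]:
        best = (d2, node.point)
    # Signed one-dimensional distance from the target to the splitting hyperplane.
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # The far side can only help if the hyperplane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best
```

For example, `nearest(build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]), (9, 2))` returns `(2, (8, 1))`, which is a squared distance of 2.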

R-trees and kd-trees are based on similar ideas (space partitioning based on axis-aligned regions), but the key differences are:
Nodes in kd-trees represent separating planes, whereas nodes in R-trees represent bounding boxes.
kd-trees partition the whole of space into regions whereas R-trees only partition the subset of space containing the points of interest.
kd-trees represent a disjoint partition (points belong to only one region) whereas the regions in an R-tree may overlap.
(There are lots of similar kinds of tree structures for partitioning space: quadtrees, BSP-trees, R*-trees, etc. etc.)

A major difference between the two not mentioned in this answer is that KD-trees are only efficient in bulk-loading situations. Once built, modifying or rebalancing a KD-tree is non-trivial. R-trees do not suffer from this.

Related

A data structure to handle moving points contact and containment within bounding boxes?

I have many points in space moving over time. They move in a space full of AABB bounding boxes (including nested ones; there are fewer BBs than points). I wonder if there is a data structure that would help detect when points enter bounding boxes.
Currently I thought of a kd-tree based on box centers to perform ANN on point movement, together with a box intersection/nesting hierarchy (who is inside/beside whom) for box detection.
Yet this is slow for so many points, so I wonder if there is some specialized algorithm/data structure for such a case? Is there a way to make such a query for many points at the same time?
I would suggest using some kind of quadtree.
Basic quadtrees are already quite good with fast deletion/insertion, but there are special variants, that are better:
The Quadtree proposed by Samet et al. uses overlapping regions, which allow moving objects to stay in the same node for longer before requiring reinsertion in a different node.
The PH-Tree is technically a 'bit-level trie'. It basically looks like a quadtree, but has limited depth (64 for 64-bit values), never requires reinsertion or rebalancing, and is guaranteed to modify at most two nodes for every insertion or removal. At the same time, node size is limited by dimensionality, so at most 8 points per node for 3D points, or 64 entries when storing points and rectangles (boxes). Important for your case: very good update performance (better than R-trees or KD-trees) and very good window-query performance. Window queries would represent your AABBs when you look for points that overlap with an AABB. Moreover, you could also store the AABBs in the tree; then any window query would return all points and other AABBs that it overlaps with (if that is useful for you).
Unfortunately, I'm not aware of any free implementation of Samet's quadtree. For the PH-Tree, you can find a Java version on the link given above and a C++ version here. You can also look at my index collection for Java implementations of various multidimensional indexes.
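For comparison, here is a minimal sketch of a plain bucket point-region quadtree in Python. It is neither the PH-Tree nor Samet's overlapping-region variant. It only shows the basic operations the answer relies on: insert, remove (so a moving point is a remove plus an insert), and a window query that plays the role of an AABB. The class name, the `capacity` and `max_depth` parameters, and the fixed square root region are my assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class QuadTree:
    """Bucket point-region quadtree over a fixed region (sketch)."""

    def __init__(self, box: Box, capacity: int = 8, max_depth: int = 16) -> None:
        self.box = box
        self.capacity = capacity      # points per leaf before it splits
        self.max_depth = max_depth    # stops endless splitting on duplicate points
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None

    def _child_for(self, p: Point) -> "QuadTree":
        xmin, ymin, xmax, ymax = self.box
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        return self.children[(p[0] >= cx) + 2 * (p[1] >= cy)]

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.max_depth > 0:
            xmin, ymin, xmax, ymax = self.box
            cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
            # Child order matches the index (x >= cx) + 2 * (y >= cy) used above.
            self.children = [QuadTree(b, self.capacity, self.max_depth - 1) for b in (
                (xmin, ymin, cx, cy), (cx, ymin, xmax, cy),
                (xmin, cy, cx, ymax), (cx, cy, xmax, ymax))]
            old, self.points = self.points, []
            for q in old:
                self._child_for(q).insert(q)

    def remove(self, p: Point) -> bool:
        """Remove one copy of p; empty children are not merged back (kept simple)."""
        if self.children is not None:
            return self._child_for(p).remove(p)
        if p in self.points:
            self.points.remove(p)
            return True
        return False

    def window(self, q: Box) -> List[Point]:
        """All points inside the closed query box q."""
        xmin, ymin, xmax, ymax = self.box
        if q[0] > xmax or q[2] < xmin or q[1] > ymax or q[3] < ymin:
            return []
        if self.children is None:
            return [p for p in self.points
                    if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]
        return [p for c in self.children for p in c.window(q)]
```

For instance, with `capacity=2`, inserting (10, 10), (20, 20), (80, 80), (15, 12) and (60, 30), then moving (20, 20) to (22, 21) via `remove` and `insert`, `sorted(tree.window((0, 0, 25, 25)))` gives `[(10, 10), (15, 12), (22, 21)]`. A production version would probably also merge emptied children.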

How to best represent a grid graph in 3D Euclidean Space?

I am looking for an efficient way to represent a connected graph where the nodes are spatially located in 3D Euclidean space and each node may have 6 edges (4 directions on its respective 2D plane, plus up and down), but I have not yet found any examples, perhaps because I am not using the correct keywords.
Any guidance would be much appreciated.
Is there any library for such a structure?
Maybe you are looking for a 'spatial' data structure.
A simple example is an oct-tree (three dimensions), which is easy to implement; there are also plenty of implementations on the web.
Is the grid extended one node at a time? Or planes at a time? Or are you adding cubes (for example 10x10x10) of nodes?
I wrote my own multi-dimensional structure a while ago, called PH-Tree. If you add individual nodes, you could add them one by one. If you add cubes of nodes, maybe it's best to store these cubes in 3D arrays, then you add these arrays to the ph-tree, with their position in space as key.
The PH-Tree is somewhat complex to implement, but it's faster and more space-efficient than octrees, at least for large datasets.
The PH-Tree sources are in Java.
Other key-words to look up: R-Trees (R*-Tree, R+Tree, X-Tree) and kd-Trees.
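One simple way to get the 6-neighbour structure without a spatial tree is to make the edges implicit. Store the nodes in a hash map keyed by integer (x, y, z) lattice coordinates and derive each node's neighbours from the six unit offsets. This is a sketch under two assumptions: nodes sit on an integer lattice, and an edge exists between every pair of present face-adjacent nodes (if edges can be missing, they would need to be stored explicitly). `GridGraph3D` is a hypothetical name, and the breadth-first search is only there to show that the structure works as a graph.

```python
from collections import deque
from typing import Dict, Iterator, Optional, Tuple

Coord = Tuple[int, int, int]

# The 4 in-plane directions plus up and down.
OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


class GridGraph3D:
    """Sparse 3D grid graph: nodes keyed by integer lattice coordinates, edges implicit."""

    def __init__(self) -> None:
        self.nodes: Dict[Coord, Optional[object]] = {}

    def add(self, c: Coord, payload: Optional[object] = None) -> None:
        self.nodes[c] = payload

    def neighbours(self, c: Coord) -> Iterator[Coord]:
        """Yield the (up to 6) existing face-adjacent nodes of c."""
        x, y, z = c
        for dx, dy, dz in OFFSETS:
            n = (x + dx, y + dy, z + dz)
            if n in self.nodes:
                yield n

    def bfs_distance(self, start: Coord, goal: Coord) -> int:
        """Number of edges on a shortest path from start to goal, or -1 if unreachable."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            c = queue.popleft()
            if c == goal:
                return dist[c]
            for n in self.neighbours(c):
                if n not in dist:
                    dist[n] = dist[c] + 1
                    queue.append(n)
        return -1
```

Adding a whole cube of nodes is then just a loop of `add` calls. Lookup cost does not depend on how far apart the occupied regions are.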

Quadtree equivalent of AVL tree

I am looking for a quadtree/octree/2^n tree that self-balances as it accepts new observations, without knowledge of every other point; in other words, it cannot rely on the median, as I am writing in a 'streaming' context. The AVL tree balances as it goes by pivoting; is there a similar data structure for higher-dimensional data?
An AVL tree returns only one result: the element you search for.
But quadtrees, especially bucket-based quadtrees, return a list of objects near the queried location. The calling program then has to inspect all objects in the result to find the ones that fulfil the application's task.
From that perspective, balancing makes little sense. A denser region (e.g. a city) has more detailed structure and therefore a deeper quadtree.
This is not bad. I don't see any need for quad balancing.
Further, for all quadtree types (point, line, object quadtrees), when a quad node is split, it always splits into 4 equal-size rectangular or square sub-nodes. These types are called restricted quad trees. There is only one hint on balanced quadtrees I have found in the literature (M. Bern, D. Eppstein and J. Gilbert: Provably Good Mesh Generation, cited in Hanan Samet: Foundations of Multidimensional and Metric Data Structures). If you have an academic interest you might read the paper; otherwise I doubt it has value for you.
Anything else is not a normal (i.e. restricted) quad tree. Read more on R-trees for subdividing the space into individual rectangles (R-trees are a competitor to quad trees).
The only kind of balancing that still corresponds to a quad tree would be a dynamic bucket size, but I don't see an advantage in that.
About guarantees:
The maximum depth of the final built up static quad tree, gives an upper bound. (Feel free to measure an average depth). The max bucket size, too gives an upper bound. (Again measure the avg bucket size).
Balancing:
The structure of a quad tree depends on the order of the inserted values.
The values to insert into a quad tree are usually static, so they can be ordered in advance. There are specific pre-orderings that give a (slightly) better balance.
Please note that a quad tree is a spatial index that is not well suited to deletions.

Search bounding rectangles (axis aligned) for a given query point in 2 dimensions

I have a set of very many axis-aligned rectangles which may be nested and intersecting. I want to be able to find all the rectangles that enclose/bound a query point. What would be a good approach for this?
EDIT : Additional information-
1. By very many I meant ~100 million or more.
2. The rectangles are distributed across a huge span (span of a country). There is no restriction on the sizes.
3. Yes the rectangles can be pre-processed and stored in a tree structure.
4. No real-time insertions and deletions are required.
5. I only need to find all the rectangles enclosing/bounding a given query point. I do not need the Nearest Neighbors.
As you might have guessed, this is for a real-time geo-fencing application on a mobile unit and hence -
6. The search need not be repeated for rectangles sufficiently far from the point.
I've tried KD trees and Quad-Trees by approximating each rectangle with a point. They've given me variable performance depending on the size of the rectangles.
Is there a more direct way of doing it? How about R-trees?
I would consider using a quadtree. (Posting from mobile so it's too much effort to link, but Wikipedia has a decent explanation.) You can split at the left, right, top, and bottom bound of any rectangle, and store each rectangle in the node representing the smallest region that contains the rectangle. To search for a point, you go down the quadtree towards the point and check every rectangle that you encounter along that path.
This will work well for small rectangles, but if many rectangles cover almost the entire region you'll still have to check all of those.
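The approach above can be sketched in Python as follows. One deliberate difference: this version splits each node at its midpoint (an MX-CIF-style quadtree) rather than at rectangle edges, which keeps the code short. It assumes all rectangles and query points lie inside the root box, and `RectQuadTree`, `stabbing` and `max_depth` are made-up names. A rectangle that straddles a split line stays in the node where it straddles, so a point query walks a single root-to-leaf path and checks only the rectangles stored along it.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed


class RectQuadTree:
    """Each rectangle is stored in the deepest quadrant that wholly contains it."""

    def __init__(self, box: Rect, depth: int = 0, max_depth: int = 12) -> None:
        self.box = box
        self.depth = depth
        self.max_depth = max_depth
        self.rects: List[Rect] = []
        self.children: Optional[List["RectQuadTree"]] = None

    def _centre(self) -> Tuple[float, float]:
        xmin, ymin, xmax, ymax = self.box
        return (xmin + xmax) / 2, (ymin + ymax) / 2

    def _quadrant_of(self, r: Rect) -> Optional[int]:
        """Child index if r lies wholly on one side of both split lines, else None.

        Uses the same convention as point routing: x >= cx goes right, y >= cy goes up.
        """
        cx, cy = self._centre()
        qx = 0 if r[2] < cx else 1 if r[0] >= cx else None
        qy = 0 if r[3] < cy else 1 if r[1] >= cy else None
        if qx is None or qy is None:
            return None
        return qx + 2 * qy

    def insert(self, r: Rect) -> None:
        i = self._quadrant_of(r) if self.depth < self.max_depth else None
        if i is None:
            self.rects.append(r)  # straddles a split line (or max depth): keep it here
            return
        if self.children is None:
            xmin, ymin, xmax, ymax = self.box
            cx, cy = self._centre()
            self.children = [RectQuadTree(b, self.depth + 1, self.max_depth) for b in (
                (xmin, ymin, cx, cy), (cx, ymin, xmax, cy),
                (xmin, cy, cx, ymax), (cx, cy, xmax, ymax))]
        self.children[i].insert(r)

    def stabbing(self, x: float, y: float) -> List[Rect]:
        """All stored rectangles containing (x, y), found along one root-to-leaf path."""
        out: List[Rect] = []
        node: Optional["RectQuadTree"] = self
        while node is not None:
            out.extend(r for r in node.rects if r[0] <= x <= r[2] and r[1] <= y <= r[3])
            if node.children is None:
                break
            cx, cy = node._centre()
            node = node.children[(x >= cx) + 2 * (y >= cy)]
        return out
```

As the answer says, rectangles that cross the root's centre lines end up at the root and are checked by every query.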
You need to look at the R*-tree data structure.
In contrast to many other structures, the R*-tree is well capable of storing (overlapping) rectangles. It is not limited to point data. You will be able to find many publications on how to best approximate polygons before putting them into the index, too. Also, it scales up to pretty large data, as it can operate on disk, too.
R*-trees are faster when bulk loaded, because bulk loading can be used to reduce the overlap of index pages and ensure a near-perfectly balanced tree, whereas dynamic insertions only guarantee that each page is at least half full or so. I.e., a bulk-loaded tree will often use only half as much memory/storage.
For 2D data and your type of query, a quadtree or grid may, however, work well enough. It depends on how much the local data density varies.
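Bulk loading can be illustrated with a simplified, in-memory Sort-Tile-Recursive (STR) style packer. It sorts the rectangles into vertical slices by x-centre, sorts each slice by y-centre, and packs consecutive runs into nodes, repeating level by level until one root remains. This is a sketch, not an R*-tree: there are no R* split or reinsertion heuristics, and nothing is disk-based. `bulk_load`, `query_point` and the default `capacity` are hypothetical choices.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    def __init__(self, children: list, leaf: bool) -> None:
        self.children = children  # leaf: data rectangles; inner: child Nodes
        self.leaf = leaf
        boxes = children if leaf else [c.box for c in children]
        # Minimum bounding rectangle of everything below this node.
        self.box: Rect = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                          max(b[2] for b in boxes), max(b[3] for b in boxes))


def _box(item, leaf: bool) -> Rect:
    return item if leaf else item.box


def _pack_level(items: list, capacity: int, leaf: bool) -> List[Node]:
    """One STR-style level: vertical slices by x-centre, then runs by y-centre."""
    n_nodes = math.ceil(len(items) / capacity)
    slice_size = math.ceil(math.sqrt(n_nodes)) * capacity
    # xmin + xmax is twice the x-centre, so it sorts in the same order.
    by_x = sorted(items, key=lambda it: _box(it, leaf)[0] + _box(it, leaf)[2])
    nodes: List[Node] = []
    for s in range(0, len(by_x), slice_size):
        part = sorted(by_x[s:s + slice_size],
                      key=lambda it: _box(it, leaf)[1] + _box(it, leaf)[3])
        for k in range(0, len(part), capacity):
            nodes.append(Node(part[k:k + capacity], leaf))
    return nodes


def bulk_load(rects: List[Rect], capacity: int = 8) -> Node:
    if not rects or capacity < 2:
        raise ValueError("need at least one rectangle and capacity >= 2")
    level = _pack_level(rects, capacity, leaf=True)
    while len(level) > 1:
        level = _pack_level(level, capacity, leaf=False)
    return level[0]


def query_point(node: Node, x: float, y: float) -> List[Rect]:
    """All stored rectangles that contain (x, y)."""
    b = node.box
    if not (b[0] <= x <= b[2] and b[1] <= y <= b[3]):
        return []  # the point misses this bounding box, so skip the whole subtree
    if node.leaf:
        return [r for r in node.children if r[0] <= x <= r[2] and r[1] <= y <= r[3]]
    # Sibling boxes may overlap, so more than one child can contain the point.
    return [r for child in node.children for r in query_point(child, x, y)]
```

The query only descends into nodes whose bounding box contains the point, and because sibling boxes may overlap, it can descend into several children at the same level.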

Broad-phase collision detection methods?

I'm building a 2D physics engine and I want to add broad-phase collision detection, though I only know of 2 or 3 types:
Check everything against everything else (O(n^2) complexity)
Sweep and Prune (sort and sweep)
something about Binary Space Partition (not sure how to do this)
But surely there are more options, right? What are they? Could a basic description of each, or links to descriptions, be provided?
I've seen this but I'm asking for a list of algorithms available, not the best one for my needs.
In this case, "Broad phase collision detection" is a method used by physics engines to determine which bodies in their simulation are close enough to warrant further investigation and possibly collision resolution.
The best approach depends on the specific use, but the bottom line is that you want to subdivide your world space such that (a) every body is in exactly one subdivision, (b) every subdivision is large enough that a body in a particular subdivision can only collide with bodies in that same subdivision or an adjacent subdivision, and (c) the number of bodies in a particular subdivision is as small as possible.
How you do that depends on how many bodies you have, how they're moving, what your performance requirements are, and how much time you want to spend on your engine. If you're talking about bodies moving around in a largely open space, the simplest technique would be to divide the world into a grid where each cell is larger than your largest object, and track the list of objects in each cell. If you're building something on the scale of a classic arcade game, this solution may well suffice.
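Here is a minimal Python sketch of that simplest grid variant. It assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax) and a cell size at least as large as the largest box. Under that assumption, two overlapping boxes always have their centres in the same or neighbouring cells, so each box is stored once (by its centre) and only compared against a 3x3 block of cells. The function names are illustrative.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

AABB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: AABB, b: AABB) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_broadphase(boxes: List[AABB], cell: float) -> Set[Tuple[int, int]]:
    """Index pairs (i < j) of overlapping boxes, using a uniform grid.

    Assumes `cell` is at least the largest box width/height, so each box can be
    filed under the one cell holding its centre and only needs comparing with the
    same cell and the 8 cells around it.
    """
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        grid[(int(((x0 + x1) / 2) // cell), int(((y0 + y1) / 2) // cell))].append(i)
    pairs: Set[Tuple[int, int]] = set()
    for (gx, gy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() does not insert missing keys, so the dict is not mutated here.
            for j in grid.get((gx + dx, gy + dy), ()):
                for i in members:
                    if i < j and overlaps(boxes[i], boxes[j]):
                        pairs.add((i, j))
    return pairs
```

If a few objects are much larger than the rest, this assumption forces a coarse grid for everyone. That is one reason the answers below move on to trees or to grids that store objects in several cells.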
If you're dealing with bodies moving in a larger open world, a simple grid will become overwhelming pretty quickly, and you'll probably want some sort of a tree-based structure like quadtrees, as Arriu suggests.
If you're talking about moving bodies around within bounded spaces instead of open spaces, then you may consider a BSP tree; the tree partitions the world into 'space you can walk in' and 'walls', and clipping a body into the tree determines whether it's in a legal position. Depending on the world geometry, you can also use a BSP for your broad-phase detection of collisions between bodies in the world.
Another option for bodies moving in bounded space would be a portal engine: if your world consists of convex polygonal regions where each side of the polygon is either a solid wall or a 'portal' to another convex region, you can easily determine whether a body is within a region with a point-in-polygon test and simplify collision detection by only looking at bodies in the same region or connected regions.
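For the portal-engine idea, the point-in-region test for a convex polygon reduces to checking that the point lies on the inner side of every edge. The sketch below assumes each region is a convex polygon with its vertices in counter-clockwise order. It also assumes portals are given as a hypothetical adjacency map from region name to the names of connected regions. A body then only needs to be tested against bodies in the regions returned by `regions_to_check`. All names here are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]


def in_convex_polygon(p: Point, poly: List[Point]) -> bool:
    """True if p is inside (or on the edge of) a convex polygon listed counter-clockwise."""
    for i in range(len(poly)):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % len(poly)]
        # Cross product of edge a->b with a->p: negative means p is on the outer side.
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < 0:
            return False
    return True


def locate(p: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Name of the region containing p, or None if p is outside every region."""
    for name, poly in regions.items():
        if in_convex_polygon(p, poly):
            return name
    return None


def regions_to_check(region: str, portals: Dict[str, Set[str]]) -> Set[str]:
    """A body's own region plus the regions connected to it through portals."""
    return {region} | portals.get(region, set())
```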
An alternative to QuadTrees or BSPTrees is SphereTrees (CircleTrees in 2D; the implementation would be more or less the same). The advantage of SphereTrees is that they handle large numbers of dynamic objects very well. If your objects are constantly moving, BSPTrees and QuadTrees are much slower in their updates than a Sphere/Circle Tree would be.
If you have a good mix of static and dynamic objects, a reasonably good solution is to use a QuadTree/BSPTree for the static objects and a Sphere/Circle Tree for the dynamic objects. Just remember that you would need to check any given object against both trees.
I recommend quadtree partitioning. It's pretty simple and it works really well. Here is a Flash demo of brute-force collision detection vs. quadtree collision detection. (You can tell it to show the quadtree structure.) Did you notice how quadtree collision detection is only 3% of brute force in that demo?
Also, if you are serious about your engine, I highly recommend picking up Real-Time Collision Detection. It's not expensive, and it's a really great book that covers everything you would ever want to know (including GPU-based collision detection).
All of the acceleration algorithms depend on using an inexpensive test to quickly rule out objects (or groups of objects) and thereby cut down on the number of expensive tests you have to do. I view the algorithms in categories, each of which has many variations.
1. Spatial partitioning. Carve up space and cheaply exclude candidates that are in different regions. For example, BSP, grids, octrees, etc.
2. Object partitioning. Similar to #1, but the clustering is focused on the objects themselves more than on the space they reside in. For example, bounding volume hierarchies.
3. Sort and sweep methods. Put the objects in order spatially and rule out collisions among ones that aren't adjacent.
1 and 2 are often hierarchical, recursing into each partition as needed. With 3, you can optionally iterate along different dimensions.
Trade-offs depend a lot on scene geometry. Do objects cluster or are they evenly or sparsely distributed? Are they all about the same size or are there huge variations in size? Is the scene dynamic? Can you afford a lot of preprocessing time?
The "inexpensive" tests are actually along a spectrum of really-cheap to kind-of-expensive. Choosing the best algorithm means minimizing the ratio of the cost of the inexpensive testing to the reduction in the number of expensive tests. Beyond the algorithmic concerns, you get into tuning, like worrying about cache locality.
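The sort-and-sweep idea (category 3) can be sketched along a single axis. Sort the boxes by their minimum x, keep a list of 'active' boxes whose x-extent still reaches the current box, and test y-overlap only against those. This is a one-axis sketch with illustrative names. Boxes are (xmin, ymin, xmax, ymax), and touching edges count as overlapping.

```python
from typing import List, Set, Tuple

AABB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def sort_and_sweep(boxes: List[AABB]) -> Set[Tuple[int, int]]:
    """Index pairs (i < j) of overlapping boxes, sweeping along the x axis."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])  # by xmin
    active: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for i in order:
        b = boxes[i]
        # Boxes ending before this one starts can't overlap it or anything after it.
        active = [j for j in active if boxes[j][2] >= b[0]]
        for j in active:
            a = boxes[j]
            if a[1] <= b[3] and b[1] <= a[3]:  # x already overlaps; check y
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```

Its cost depends on how many boxes overlap on the sweep axis, which is why the choice of axis (or sweeping several axes) matters for clustered scenes.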
An alternative is a plain grid, say 20x20 or 100x100 cells (depending on your world and memory size). The implementation is simpler than a recursive structure such as a quad/BSP-tree (or a sphere tree, for that matter).
Objects crossing cell borders are a bit simpler to handle in this case, and the structure does not degenerate as much as a naive implementation of a BSP/quad/oct-tree might.
Using that (or other techniques), you should be able to quickly cull many pairs and get a set of potential collisions that need further investigation.
I just came up with a solution that doesn't depend on grid size and is probably O(n log n) (which is the optimum when there are no collisions), though at worst O(n^2 log n) (when everything collides).
I also implemented and tested it, here is the link to the source. But I haven't compared it to the brute force solution.
A description of how it works:
(I'm looking for collisions of rectangles)
On the x axis, I sort the rectangles by their right edge (O(n log n)).
For every rectangle in the sorted list, I take its left edge and binary-search for the rightmost right edge that lies to the left of it; the rectangles between that index and the current one are inserted into a possible_Collision_On_X_Axis set (n rectangles, log n for each binary search, and O(log n) per insert into the set).
On the y axis, I do the same.
Each set now holds the possible collisions on one axis (x in one set, y in the other). Intersecting the two sets (taking their common elements) gives the pairs that collide on both the x and y axes (O(n)).
Sorry for the poor description; I hope the source makes it clearer.
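Here is a minimal Python sketch of the steps above. It is written from the description rather than from the linked source, so details such as tie handling are my assumptions. Rectangles are (xmin, ymin, xmax, ymax) tuples with closed edges (touching counts as overlapping), pairs are index tuples, and the function names are hypothetical. The key observation is that, after sorting by right edge, every rectangle between the binary-search result and the current position has a right edge inside the current rectangle's x-span, so it overlaps on x.

```python
import bisect
from typing import List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlapping_on_axis(rects: List[Rect], axis: int) -> Set[Tuple[int, int]]:
    """Pairs (i < j) whose intervals overlap on one axis (0 = x, 1 = y)."""
    lo, hi = axis, axis + 2  # (xmin, xmax) for axis 0, (ymin, ymax) for axis 1
    order = sorted(range(len(rects)), key=lambda i: rects[i][hi])  # by right/top edge
    highs = [rects[i][hi] for i in order]
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        # First sorted position whose right edge is >= this rectangle's left edge.
        start = bisect.bisect_left(highs, rects[i][lo])
        for k in range(start, pos):
            j = order[k]
            pairs.add((min(i, j), max(i, j)))
    return pairs


def collisions(rects: List[Rect]) -> Set[Tuple[int, int]]:
    """Pairs that overlap on both axes, i.e. colliding rectangles."""
    return overlapping_on_axis(rects, 0) & overlapping_on_axis(rects, 1)
```

Note that each per-axis set can grow quadratically when many rectangles overlap on one axis without colliding, even if the final intersection is small.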
You might want to check out what Scott did in Chipmunk with spatial hashing. The source is freely available. I think he used a technique similar to Box2D's, if not for collision then definitely for contact persistence.
I've used a quad-tree in a larger project, which is good for game objects that don't move much (less removals & re-insertions). The same principle applies for octrees.
The basic idea is that you create a recursive tree structure in which each node stores 4 (for a quadtree) or 8 (for an octree) child objects of the same type as the tree root. Each node in the tree represents a section of Cartesian space, for example -100 -> +100 on each applicable axis. Each child represents part of that same space, halved along each axis (an immediate child of the example would represent, for example, -100 -> 0 on the X axis and -100 -> 0 on the Y axis).
Imagine a square, or plane, with a given set of dimensions. Now draw a line through the centre on each axis, dividing that plane into 4 smaller planes. Now take one of them and do it again, and again, until the size of the plane segment is roughly the size of a game object. Now you have your tree. Only objects occupying the same plane segment can possibly collide. Once you have determined which objects can collide, you generate pairs of possible collisions from them. At this stage the broadphase is complete, and you can move on to narrow-phase collision detection, which is where your more precise and expensive calculations are.
The purpose of the broadphase is to use inexpensive calculations to generate possible collisions and cull contacts that cannot occur, thus reducing the work your narrow-phase algorithm has to perform and, in turn, making your entire collision detection algorithm more efficient.
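The broadphase described above can be sketched as a quadtree that is rebuilt from scratch each frame. Every box goes down into the quadrant that wholly contains it, and boxes that straddle a centre line stay in the current node. Candidate pairs are then (a) the straddlers paired with each other and with everything below them and (b) whatever the child quadrants produce. Boxes in different quadrants cannot overlap, so no real contact is lost. The depth limit stands in for "roughly the size of a game object"; `max_depth`, `leaf_size` and the function name are my assumptions, and the result still needs a narrow-phase test.

```python
from itertools import combinations
from typing import List, Set, Tuple

AABB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def quadtree_pairs(boxes: List[AABB], ids: List[int], region: AABB,
                   depth: int = 0, max_depth: int = 8,
                   leaf_size: int = 4) -> Set[Tuple[int, int]]:
    """Candidate pairs (i < j) among `ids`, all of which lie inside `region`."""
    if depth == max_depth or len(ids) <= leaf_size:
        # Small enough (or deep enough): fall back to all pairs in this node.
        return {(min(a, b), max(a, b)) for a, b in combinations(ids, 2)}
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    quads: List[List[int]] = [[], [], [], []]
    stay: List[int] = []
    for i in ids:
        bx0, by0, bx1, by1 = boxes[i]
        qx = 0 if bx1 < cx else 1 if bx0 >= cx else None
        qy = 0 if by1 < cy else 1 if by0 >= cy else None
        if qx is None or qy is None:
            stay.append(i)  # straddles a centre line: stays in this node
        else:
            quads[qx + 2 * qy].append(i)
    below = [j for q in quads for j in q]
    pairs: Set[Tuple[int, int]] = set()
    for k, i in enumerate(stay):
        # A straddling box may touch any other box in this node's region.
        for j in stay[k + 1:] + below:
            pairs.add((min(i, j), max(i, j)))
    children = [(x0, y0, cx, cy), (cx, y0, x1, cy), (x0, cy, cx, y1), (cx, cy, x1, y1)]
    for members, sub in zip(quads, children):
        pairs |= quadtree_pairs(boxes, members, sub, depth + 1, max_depth, leaf_size)
    return pairs
```

Rebuilding every frame sidesteps the remove-and-reinsert problem mentioned below, at the cost of redoing the partitioning each time.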
So before you go ahead and attempt to implement such an algorithm, ask yourself:
How many objects are in my game?
If there are a lot, I probably need a broadphase. If not, the narrowphase alone should suffice.
Also, am I dealing with many moving objects?
Tree structures are generally slowed down by moving objects, since objects can change their position in the tree over time simply by moving. This requires objects to be removed and reinserted (potentially) every frame, which is less than ideal.
If this is the case, you would be better off with Sweep and Prune, which maintains sorted lists of the min/max extents of the shapes in your world. Objects do not need to be reinserted, but the lists need to be re-sorted each frame, though this is generally faster than a tree-wide traversal with removals and reinsertions.
Depending on your coding experience, one may be more complicated to code than the other. Personally, I have found trees more intuitive to code, but less efficient and more prone to error, since they raise other issues, such as what to do with an object that sits directly on top of an axis, or on the centre of the root node. This can be solved by using loose trees, which have some spatial overlap, so that one child node is always prioritised over the others when such a case occurs.
If the space that your objects move within is bounded, then you could use a grid to subdivide your objects. Put each object into every grid cell which the object covers (fully or partially). To check if object A collides with any other object, determine which grid cells object A covers, then get the list of unique objects in those cells, and finally test object A against each unique object. This approach works best if most objects are usually contained in a single grid cell.
If your space is not bounded, then you will need to implement some sort of dynamic grid that can grow on demand in each of the four directions (in 2D).
The advantage of this approach over more adaptive algorithms (e.g. BSP, quadtree, circle tree) is that the cells can be accessed in O(1) time (i.e. constant time) rather than O(log N) time (i.e. logarithmic time). However, these latter algorithms are able to adapt themselves to large variability in the density of objects. The grid approach works best when object density is fairly constant across the space.
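Here is a minimal sketch of that grid. It uses a dictionary keyed by integer cell coordinates, so it grows on demand in every direction and also covers the unbounded case (this is essentially a simple spatial hash). Each object is filed under every cell its box covers. A query collects the unique ids from those cells before testing for overlap. The class and method names are hypothetical, and boxes are (xmin, ymin, xmax, ymax).

```python
from collections import defaultdict
from math import floor
from typing import Dict, Iterator, List, Set, Tuple

AABB = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class SparseGrid:
    """Objects are filed under every cell their box touches. Cells live in a dict,
    so the grid grows on demand in every direction (in effect, a spatial hash)."""

    def __init__(self, cell: float) -> None:
        self.cell = cell
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        self.boxes: Dict[int, AABB] = {}

    def _covered(self, b: AABB) -> Iterator[Tuple[int, int]]:
        """Integer coordinates of every cell the box b touches (fully or partially)."""
        s = self.cell
        for gx in range(floor(b[0] / s), floor(b[2] / s) + 1):
            for gy in range(floor(b[1] / s), floor(b[3] / s) + 1):
                yield gx, gy

    def insert(self, oid: int, b: AABB) -> None:
        self.boxes[oid] = b
        for c in self._covered(b):
            self.cells[c].append(oid)

    def remove(self, oid: int) -> None:
        for c in self._covered(self.boxes.pop(oid)):
            self.cells[c].remove(oid)

    def query(self, b: AABB) -> Set[int]:
        """Unique ids of stored objects whose boxes overlap b (the set removes repeats)."""
        found: Set[int] = set()
        for c in self._covered(b):
            for oid in self.cells.get(c, ()):
                o = self.boxes[oid]
                if o[0] <= b[2] and b[0] <= o[2] and o[1] <= b[3] and b[1] <= o[3]:
                    found.add(oid)
        return found
```

To test object A, call `query` with A's own box and discard A's id from the result. Moving an object is a `remove` followed by an `insert`.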
I would like to recommend the introductory reference to game physics from Ian Millington, Game Physics Engine Development. It has a great section on broad phase collision detection with sample code.

Resources