I am looking for a quadtree/octree/2^n tree that self-balances as it accepts new observations, without knowledge of every other point, iow, it cannot rely on the median as I am writing in a 'streaming' context. The AVL tree balances as it goes by pivoting, is there another similar data structure for higher dimensioned data?
The AVL tree, returns only one result, the element to find.
But especiall the ebucket based quadtrees return a list of objects near the queried location. The calling programm finally has to inspect all objects in the result for that ones that fullfill the application task.
From that perspective balancing makes little sense. A more dense region (e.g city) has more detailed structures and therefore has a deeper quadtree.
This is not bad. I don't see any need for quad balancing.
Further for all quadtree types (point, lines, object quadtrees) a quad node when it is splitted, it always splitts in 4 equal size sub rectangular or quadratic sub nodes. These types are called restricted quad trees. There is only one hint in literature I have found on balanced quadtrees (M.Bern, D-Eppstein and J.Gilbert: Probably good mesh generations, cited in Hanan Samet: Foundations on Multidimensional Spatial Data Strcutures). If you have academic interest you might read the paper, otherwise I doubt it has value to you.
Otherwise it is not a normal (i.e restricted) quad tree. Read more on R-Trees for sub dividinbg the space in individual rectangles. (R-trees are a competitor to quad trees)
The only quad type balancing that corresponds to a quad tree, could be a dynamic bucket size. But for that I don't see an advantage.
About garuantees:
The maximum depth of the final built up static quad tree, gives an upper bound. (Feel free to measure an average depth). The max bucket size, too gives an upper bound. (Again measure the avg bucket size).
Balancing:
The structure of a quad tree depends on the order of the inserted values.
The values to insert into a quad tree are usually static, so the can be ordered in advance. There are specific pre-orderings that give a (slightly) better balance.
Please note that a quad tree is a spatial index wich is not well usefull for deletions.
Related
I have a problem understanding this algorithm, labeled "Algorithm 1" from this paper.
It says:
If at leaf, check for intersection with contained triangles.
What am I missing? As far as I know, the kd-tree nodes only hold one value and pointers to left and right children. Do you know any reference kd-tree structures for me to investigate? At insert, I calculate middlepoints for each axis and based on that, I place one triangle for each node.
In case of GPU Ray Tracing, KD-Trees are used as acceleration structures. We group geometry into larger chunks to cull ray misses early. However, it might happen, that the tree gets too deep at certain nodes. To avoid this, we can limit the height of the tree. This is why we might end up with more geometry in a leaf.
Note, that Algorithm 3 also mentions this case.
The exit step
24: Intersect ray with contained geometry
Clarifying "too deep":
When a branch goes much deeper than it's neighbours, the threads diverge, causing performance degradation .
More on the topic here: Nvidia RTX best practices
Storing only one triangle per node is probably sub-optimal. Generally, it is faster to iterate through a moderately sized bucket of objects than to navigate the same sized binary tree.
Also, there is no guarantee that you will be able to sort out all your triangles such that your k-D tree has one per node. Hierarchical bounding volumes (where, unlike k-D trees, sibling volumes may overlap) should be more flexible in this respect, but they are also conventionally used with buckets of target objects, rather than one object per node.
I'm solving this problem, and I don't know which data structure to use.
I have multiple objects (convex polygons and circles) on a 2D plane, and for a given point, I have to calculate the objects the point lies within (they can overlap).
I've been reading about K-D trees, but I don't know how to "bend" it for this kind of objects. I've been also reading about bounding volume hierarchy, but I don't know if it would be optimal.
So, what do you think would be the best data structure for this problem? Time performance is more important than memory usage).
Thanks!
For most part, the "efficiency" of space partitioning schemes like BVH, kd-tree, R-tree etc, comes from smart tree construction. As long as you can build your tree well, you will have fast performance. For you case, I would say kd-tree is fine - it's very common with lots of source code available. So are R-trees. I don't understand what you mean by "bend" it for your objects. For Kd-Tree, all you have to decide, is given an axis aligned plane - for 2D case it would be either x = c or y = c, if the circle (or poly) lies to one side, or straddles. Rather trivial problem.
I have a set of very many axis-aligned rectangles which maybe nested and intersecting. I want to be able to find all the rectangles that enclose/bound a query point. What would be a good approach for this?
EDIT : Additional information-
1. By very many I meant ~100 million or more.
2. The rectangles are distributed across a huge span (span of a country). There is no restriction on the sizes.
3. Yes the rectangles can be pre-processed and stored in a tree structure.
4. No real-time insertions and deletions are required.
5. I only need to find all the rectangles enclosing/bounding a given query point. I do not need the Nearest Neighbors.
As you might have guessed, this is for a real-time geo-fencing application on a mobile unit and hence -
6. The search need not be repeated for rectangles sufficiently far from the point.
I've tried KD trees and Quad-Trees by approximating each Rectangle to a point. They've given me variable performances depending on the size of the rectangles.
Is there a more direct way of doing it ? How about r trees?
I would consider using a quadtree. (Posting from mobile so it's too much effort to link, but Wikipedia has a decent explanation.) You can split at the left, right, top, and bottom bound of any rectangle, and store each rectangle in the node representing the smallest region that contains the rectangle. To search for a point, you go down the quadtree towards the point and check every rectangle that you encounter along that path.
This will work well for small rectangles, but if many rectangles cover almost the entire region you'll still have to check all of those.
You need to look at the R*-tree data structure.
In contrast to many other structures, the R*-tree is well capable of storing (overlapping) rectangles. It is not limited to point data. You will be able to find many publications on how to best approximate polygons before putting them into the index, too. Also, it scales up to pretty large data, as it can operate on disk, too.
R*-trees are faster when bulk loaded; as this can be used to reduce overlap of index pages and ensure a near-perfectly balanced tree, whereas dynamic insertions only guarantee each page to be at least half full or so. I.e. a bulk loaded tree will often use only half as much memory / storage.
For 2d data, and your type of queries, a quadtree or grid may however work just well enough. It depends on how much local data density varies.
I looked at the definition of KD-tree and R-tree. It seems to me that they are almost the same.
What's the difference between a KD-tree and an R-tree?
They are actually quite different. They serve similar purpose (region queries on spatial data), and they both are trees (and both belong to the family of bounding volume hierarchy indexes), but that is about all they have in common.
R-Trees are balanced, k-d-trees are not (unless bulk-loaded). This is why R-trees are preferred for changing data, as k-d-trees may need to be rebuilt to re-optimize.
R-Trees are disk-oriented. They actually organize the data in areas that directly map to the on-disk representation. This makes them more useful in real databases and for out-of-memory operation. k-d-trees are memory oriented and are non-trivial to put into disk pages
k-d-trees are elegant when bulk-loaded (kudos to SingleNegationElimination for pointing this out), while R-trees are better for changing data (although they do benefit from bulk loading, when used with static data).
R-Trees do not cover the whole data space. Empty areas may be uncovered. k-d-trees always cover the whole space.
k-d-trees binary split the data space, R-trees partition the data into rectangles. The binary splits are obviously disjoint; while the rectangles of an R-tree may overlap (which actually is sometimes good, although one tries to minimize overlap)
k-d-trees are a lot easier to implement in memory, which actually is their key benefit
R-trees can store rectangles and polygons, k-d-trees only stores point vectors (as overlap is needed for polygons)
R-trees come with various optimization strategies, different splits, bulk-loaders, insertion and reinsertion strategies etc.
k-d-trees use the one-dimensional distance to the separating hyperplane as bound; R-trees use the d-dimensional minimum distance to the bounding hyperrectangle for bounding (they can also use the maximum distance for some counting queries, to filter true positives).
k-d-trees support squared Euclidean distance and Minkowski norms, while Rtrees have been shown to also support geodetic distance (for finding near points on geodata).
R-trees and kd-trees are based on similar ideas (space partitioning based on axis-aligned regions), but the key differences are:
Nodes in kd-trees represent separating planes, whereas nodes in R-trees represent bounding boxes.
kd-trees partition the whole of space into regions whereas R-trees only partition the subset of space containing the points of interest.
kd-trees represent a disjoint partition (points belong to only one region) whereas the regions in an R-tree may overlap.
(There are lots of similar kinds of tree structures for partitioning space: quadtrees, BSP-trees, R*-trees, etc. etc.)
A major difference between the two not mentioned in this answer is that KD-trees are only efficient in bulk-loading situations. Once built, modifying or rebalancing a KD-tree is non-trivial. R-trees do not suffer from this.
I'm interested in efficiently keeping track of the center of mass of a large box on the integer lattice from which orthant-shaped regions are repeatedly carved out. I've been poking around in the computational geometry literature, and there's a variety of data structures that might be relevant, but most of the discussion is about visibility computations (for computer graphics) or nearest neighbor finding (for data mining and such).
The paper
http://www.graphicsinterface.org/pre1996/92-NaylorSolidGeometry.pdf, i.e.:
Naylor, Bruce F. Interactive Solid Geometry via Partitioning Trees,
Proc. Graphics Interface '92, 1992, pp 11-18.
discusses a system that represents geometric objects by "binary space partitioning trees", supports set operations, and has an intriguing mention (without details) that the center of mass of objects is recomputed after set operations. Perhaps I have a blind spot, but it's not immediately apparent to me how to efficiently update the center of mass during the tree merging algorithm, and I haven't found a discussion of center of mass computations in papers that cite the Naylor one. Any pointers?
A k-d tree is quite literally what you're looking for here seeing as your cuts are inherently axis-aligned but at arbitrary positions. Dealing with generalizations such as binary space partitioning schemes as mentioned in the paper sounds like a layer of complexity that you don't need and will reduce performance.
With a k-d tree you'll be able to calculate and remove intersections with large regions disappearing. If you store the weight and center of mass for each k-d tree node's region within node itself you ought to be able to erase its contribution to the total center of mass without needing to consider child nodes.
By doing this, effectively you'll be building a tree of volumes represented as point masses. Each node can be calculated from its children as needed.