I am looking for a quadtree/octree/2^n tree that self-balances as it accepts new observations, without knowledge of every other point, i.e. it cannot rely on the median, as I am working in a 'streaming' context. An AVL tree balances as it goes by rotations; is there a similar data structure for higher-dimensional data?
An AVL tree returns only one result: the element you search for.
But quadtrees, especially the bucket-based ones, return a list of objects near the queried location. The calling program then has to inspect every object in the result to find the ones that fulfill the application's task.
From that perspective, balancing makes little sense. A denser region (e.g. a city) has more detailed structures and therefore a deeper quadtree.
This is not bad. I don't see any need for quad balancing.
Furthermore, for all quadtree types (point, line, and object quadtrees), a node that is split is always split into 4 equal-sized rectangular or square child nodes. These types are called restricted quadtrees. I have found only one hint in the literature on balanced quadtrees (M. Bern, D. Eppstein and J. Gilbert: Provably Good Mesh Generation, cited in Hanan Samet: Foundations of Multidimensional and Metric Data Structures). If you have an academic interest you might read the paper; otherwise I doubt it has value for you.
Any scheme that splits nodes differently is not a normal (i.e. restricted) quadtree. Read about R-trees for subdividing space into individual rectangles (R-trees are a competitor to quadtrees).
The only kind of balancing compatible with a quadtree would be a dynamic bucket size, but I don't see an advantage in that.
About guarantees:
The maximum depth of the final, fully built static quadtree gives an upper bound on the length of a search path (feel free to also measure the average depth). The maximum bucket size likewise gives an upper bound on the number of candidates per bucket (again, measure the average bucket size).
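As a rough illustration of the restricted splitting and the two measurements above, here is a minimal sketch of a bucket (PR) point quadtree. It assumes the points lie in a known square region; `BucketQuadtree`, `capacity`, `max_depth` and `stats()` are hypothetical names, not taken from any library. A query returns every point in the overlapping buckets, so the caller still has to filter the candidates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    x0: float
    y0: float
    size: float                      # square cell [x0, x0 + size) x [y0, y0 + size)
    depth: int = 0
    points: list = field(default_factory=list)
    children: Optional[list] = None  # four equal quadrants once split


class BucketQuadtree:
    """Hypothetical bucket quadtree: a full bucket splits into 4 equal quadrants."""

    def __init__(self, x0, y0, size, capacity=4, max_depth=16):
        self.root = Node(x0, y0, size)
        self.capacity = capacity
        self.max_depth = max_depth

    def _child_for(self, node, p):
        half = node.size / 2
        i = (1 if p[0] >= node.x0 + half else 0) + (2 if p[1] >= node.y0 + half else 0)
        return node.children[i]

    def _split(self, node):
        half = node.size / 2
        node.children = [Node(node.x0 + dx * half, node.y0 + dy * half, half, node.depth + 1)
                         for dy in (0, 1) for dx in (0, 1)]
        pts, node.points = node.points, []
        for p in pts:
            self._child_for(node, p).points.append(p)
        for c in node.children:          # a quadrant can overflow again if points cluster
            if len(c.points) > self.capacity and c.depth < self.max_depth:
                self._split(c)

    def insert(self, p):
        node = self.root
        while node.children is not None:
            node = self._child_for(node, p)
        node.points.append(p)
        if len(node.points) > self.capacity and node.depth < self.max_depth:
            self._split(node)

    def query(self, xmin, ymin, xmax, ymax):
        """Contents of every bucket overlapping the box; the caller filters them."""
        out, stack = [], [self.root]
        while stack:
            n = stack.pop()
            if n.x0 > xmax or n.x0 + n.size < xmin or n.y0 > ymax or n.y0 + n.size < ymin:
                continue
            if n.children is None:
                out.extend(n.points)
            else:
                stack.extend(n.children)
        return out

    def stats(self):
        """(maximum leaf depth, average size of the non-empty buckets)."""
        leaves, stack = [], [self.root]
        while stack:
            n = stack.pop()
            if n.children is None:
                leaves.append(n)
            else:
                stack.extend(n.children)
        filled = [leaf for leaf in leaves if leaf.points]
        avg = sum(len(leaf.points) for leaf in filled) / max(len(filled), 1)
        return max(leaf.depth for leaf in leaves), avg
```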
Balancing:
The structure of a quad tree depends on the order of the inserted values.
The values to insert into a quadtree are usually static, so they can be ordered in advance. There are specific pre-orderings that give a (slightly) better balance.
Please note that a quadtree is a spatial index that is not well suited to deletions.
I'm solving this problem, and I don't know which data structure to use.
I have multiple objects (convex polygons and circles) on a 2D plane, and for a given point, I have to calculate the objects the point lies within (they can overlap).
I've been reading about K-D trees, but I don't know how to "bend" it for this kind of objects. I've been also reading about bounding volume hierarchy, but I don't know if it would be optimal.
So, what do you think would be the best data structure for this problem? (Time performance is more important than memory usage.)
Thanks!
For the most part, the "efficiency" of space-partitioning schemes like BVHs, kd-trees, R-trees, etc. comes from smart tree construction. As long as you can build your tree well, you will get fast performance. For your case, I would say a kd-tree is fine: it's very common, with lots of source code available. So are R-trees. I don't understand what you mean by "bend" it for your objects. For a kd-tree, all you have to decide is, given an axis-aligned plane (in 2D either x = c or y = c), whether the circle (or polygon) lies on one side of it or straddles it. That's a rather trivial problem.
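To make the "which side of x = c or y = c" decision concrete, here is a minimal kd-tree sketch for point queries over circles and convex polygons. The shape encoding, the counter-clockwise vertex order, the median split of shape centres, and keeping straddling shapes at the node whose plane they straddle (another common option is to push them into both children) are all assumptions made for this illustration; `build`, `query`, `side` and `leaf_size` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape encoding: ("circle", ((cx, cy), r)) or ("poly", [(x, y), ...]),
# with polygon vertices convex and in counter-clockwise order.

def bbox(shape):
    kind, data = shape
    if kind == "circle":
        (cx, cy), r = data
        return (cx - r, cy - r, cx + r, cy + r)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return (min(xs), min(ys), max(xs), max(ys))

def side(shape, axis, c):
    """-1: entirely below the plane (x = c or y = c), +1: entirely above, 0: straddles."""
    b = bbox(shape)
    if b[axis + 2] < c:
        return -1
    if b[axis] > c:
        return 1
    return 0

def contains(shape, p):
    kind, data = shape
    if kind == "circle":
        (cx, cy), r = data
        return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r
    n = len(data)
    for i in range(n):              # convex CCW polygon: p must not be right of any edge
        (x1, y1), (x2, y2) = data[i], data[(i + 1) % n]
        if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) < 0:
            return False
    return True

@dataclass
class KdNode:
    axis: int = 0
    c: float = 0.0
    here: list = field(default_factory=list)  # straddling shapes (or everything, in a leaf)
    lo: Optional["KdNode"] = None
    hi: Optional["KdNode"] = None

def build(shapes, depth=0, leaf_size=4):
    if len(shapes) <= leaf_size:
        return KdNode(here=list(shapes))
    axis = depth % 2
    centers = sorted((bbox(s)[axis] + bbox(s)[axis + 2]) / 2 for s in shapes)
    c = centers[len(centers) // 2]          # the median shape always straddles, so we progress
    node = KdNode(axis=axis, c=c, here=[s for s in shapes if side(s, axis, c) == 0])
    node.lo = build([s for s in shapes if side(s, axis, c) < 0], depth + 1, leaf_size)
    node.hi = build([s for s in shapes if side(s, axis, c) > 0], depth + 1, leaf_size)
    return node

def query(node, p):
    """All shapes containing point p: straddlers on the path, then the leaf."""
    out = []
    while node is not None:
        out.extend(s for s in node.here if contains(s, p))
        node = None if node.lo is None else (node.lo if p[node.axis] < node.c else node.hi)
    return out
```

A point only ever descends one side, because a shape stored entirely below the plane cannot contain a point above it, and vice versa.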
Problem
I'm working with OpenStreetMap data and want to determine, for point features, which polygon they lie in. In total there are tens of thousands of polygons and 100,000,000 points. I can hold all this data in memory. The polygons usually have thousands of vertices, which makes point-in-polygon tests very expensive.
Idea
I could index all polygons with an R-Tree, allowing me to only check the polygons whose bounding-box is hit.
Probable new problem
As the polygons touch each other (think of administrative boundaries), many points lie in the bounding box of more than one polygon, which forces many point-in-polygon tests.
Question
Do you have any better suggestion than using an R-Tree?
Quadtrees will likely work worse than rasterization: they are essentially repeated rasterization into 2x2 images. But definitely exploit rasterization for all the easy cases, because testing a raster should be as fast as it gets. If you can solve 90% of your points easily, you have more time for the rest.
Also make sure to remove duplicates first. The indexes often suffer from duplicates, and those are obviously redundant to test.
R*-trees are probably a good thing to try, but you need to implement them really carefully.
The operation you are looking for is a containment spatial join. I don't think there is any implementation around that you could use, but given your performance requirements I would carefully implement it myself anyway. Also make sure to tune parameters and profile your code!
The basic idea of the join is to build two trees: one for the points and one for the polygons.
You then start with the root nodes of each tree and repeat the following recursively until you reach the leaf level:
If at least one of the two nodes is a directory (non-leaf) node:
  If the two nodes do not overlap: return.
  Decide by a heuristic (you'll need to figure this part out; "larger extent" may do for a start) which directory node to expand.
  Recurse on each child of the expanded node, paired with the other, unexpanded node.
If both are leaf nodes, for each point/polygon pair:
  fast test: point vs. bounding box of the polygon
  slow test: point in polygon (only if the fast test passes)
You can accelerate this further if you have a fast interior test for the polygon, in particular a rectangle-in-polygon test. It may be good enough if it is approximate (it only has to avoid false positives), as long as it is fast.
For more detailed information, search for "R-tree spatial join".
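The recursion above could be sketched as follows. This is not an R*-tree: to stay short, both trees are built by a plain top-down median split of bounding-box centres (a simplification), "larger extent" is taken to mean the larger bounding-box area, and polygons are assumed to be simple vertex lists tested with even-odd ray casting. `build`, `join`, `containment_join` and `leaf_size` are hypothetical names.

```python
from dataclasses import dataclass, field

def pip(p, poly):
    """Slow test: even-odd ray casting point-in-polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

@dataclass
class Node:
    box: tuple
    items: list = field(default_factory=list)     # leaf entries: (id, box, payload)
    children: list = field(default_factory=list)  # directory entries

def build(entries, depth=0, leaf_size=8):
    """Simplified bulk load: median split on box centres, alternating x and y."""
    box = entries[0][1]
    for e in entries[1:]:
        box = union(box, e[1])
    if len(entries) <= leaf_size:
        return Node(box, items=entries)
    axis = depth % 2
    entries = sorted(entries, key=lambda e: e[1][axis] + e[1][axis + 2])
    mid = len(entries) // 2
    return Node(box, children=[build(entries[:mid], depth + 1, leaf_size),
                               build(entries[mid:], depth + 1, leaf_size)])

def join(a, b, out):
    """a: point-tree node, b: polygon-tree node; appends (point id, polygon id)."""
    if not overlap(a.box, b.box):
        return
    if not a.children and not b.children:
        for pid, pbox, p in a.items:
            for gid, gbox, poly in b.items:
                if overlap(pbox, gbox) and pip(p, poly):   # fast test, then slow test
                    out.append((pid, gid))
        return
    # "larger extent" heuristic: expand the directory node with the larger box
    if b.children and (not a.children or area(b.box) >= area(a.box)):
        for child in b.children:
            join(a, child, out)
    else:
        for child in a.children:
            join(child, b, out)

def containment_join(points, polygons):
    point_tree = build([(i, (x, y, x, y), (x, y)) for i, (x, y) in enumerate(points)])
    poly_boxes = [(min(x for x, _ in g), min(y for _, y in g),
                   max(x for x, _ in g), max(y for _, y in g)) for g in polygons]
    poly_tree = build([(j, poly_boxes[j], g) for j, g in enumerate(polygons)])
    out = []
    join(point_tree, poly_tree, out)
    return out
```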
Try using quad trees.
Basically, you recursively partition space into 4 parts, and for each part you should know:
a) polygons which are superset of given part
b) polygons which intersect given part
This adds an O(log n) overhead factor, which you might not be happy with.
The other option is to just partition space using a grid. You keep the same information for each grid cell as above. This has only constant overhead.
Both options assume that the polygons are distributed roughly uniformly.
There is another option if you can process the points offline (in other words, you can choose the order in which points are processed). Then you can use sweep-line techniques: sort the points by one coordinate, iterate over them in that order, and maintain only the set of polygons that is relevant at the current position.
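A minimal sketch of the grid variant, assuming simple polygons without holes given as vertex lists, query points inside a known extent, and a hypothetical `n` x `n` cell count. A polygon goes into a cell's "superset" list (`covered`) when none of its edges touches the cell and a cell corner lies inside it; if an edge touches the cell, it goes into the "intersects" list (`partial`) and still needs an exact test at query time. `PolygonGrid`, `covered`, and `partial` are hypothetical names; the edge-versus-cell check uses Liang-Barsky segment clipping.

```python
def pip(p, poly):
    """Even-odd ray casting point-in-polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def seg_hits_box(p, q, box):
    """Liang-Barsky clipping: does segment pq touch the closed box (xmin, ymin, xmax, ymax)?"""
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for pk, qk in ((-dx, x1 - box[0]), (dx, box[2] - x1), (-dy, y1 - box[1]), (dy, box[3] - y1)):
        if pk == 0:
            if qk < 0:
                return False        # parallel to this boundary and outside it
        else:
            r = qk / pk
            if pk < 0:
                t0 = max(t0, r)
            else:
                t1 = min(t1, r)
            if t0 > t1:
                return False
    return True

class PolygonGrid:
    """Uniform n x n grid over `extent`; per cell: polygons covering it, polygons crossing it."""

    def __init__(self, polygons, extent, n):
        self.polygons = polygons
        self.x0, self.y0, self.x1, self.y1 = extent
        self.n = n
        self.cw, self.ch = (self.x1 - self.x0) / n, (self.y1 - self.y0) / n
        self.covered = [[[] for _ in range(n)] for _ in range(n)]   # no test needed
        self.partial = [[[] for _ in range(n)] for _ in range(n)]   # exact test needed
        for pid, poly in enumerate(polygons):
            i0, j0 = self._cell(min(x for x, _ in poly), min(y for _, y in poly))
            i1, j1 = self._cell(max(x for x, _ in poly), max(y for _, y in poly))
            for i in range(i0, i1 + 1):
                for j in range(j0, j1 + 1):
                    box = (self.x0 + i * self.cw, self.y0 + j * self.ch,
                           self.x0 + (i + 1) * self.cw, self.y0 + (j + 1) * self.ch)
                    if any(seg_hits_box(poly[k], poly[(k + 1) % len(poly)], box)
                           for k in range(len(poly))):
                        self.partial[i][j].append(pid)
                    elif pip((box[0], box[1]), poly):   # no edge in the cell: all in or all out
                        self.covered[i][j].append(pid)

    def _cell(self, x, y):
        i = min(max(int((x - self.x0) / self.cw), 0), self.n - 1)
        j = min(max(int((y - self.y0) / self.ch), 0), self.n - 1)
        return i, j

    def query(self, p):
        """Ids of the polygons containing point p."""
        if not (self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1):
            return []
        i, j = self._cell(*p)
        return self.covered[i][j] + [g for g in self.partial[i][j] if pip(p, self.polygons[g])]
```

Only cells crossed by a polygon boundary ever cost a point-in-polygon test, which is where the constant overhead comes from.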
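For the offline case, a sweep along x might look like the following sketch. Assumptions: polygons are vertex lists, their bounding boxes decide when a polygon enters and leaves the active set (a heap ordered by maximum x handles the leaving), and even-odd ray casting is the exact test. `sweep_containment` is a hypothetical name.

```python
import heapq

def pip(p, poly):
    """Even-odd ray casting point-in-polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def sweep_containment(points, polygons):
    """(point index, polygon index) pairs, processing the points in x order."""
    boxes = [(min(x for x, _ in g), min(y for _, y in g),
              max(x for x, _ in g), max(y for _, y in g)) for g in polygons]
    by_min_x = sorted(range(len(polygons)), key=lambda j: boxes[j][0])
    active, expiry = set(), []        # expiry: min-heap of (max x, polygon index)
    nxt, out = 0, []
    for i in sorted(range(len(points)), key=lambda k: points[k][0]):
        x, y = points[i]
        while nxt < len(by_min_x) and boxes[by_min_x[nxt]][0] <= x:   # sweep line reached it
            j = by_min_x[nxt]
            active.add(j)
            heapq.heappush(expiry, (boxes[j][2], j))
            nxt += 1
        while expiry and expiry[0][0] < x:                             # polygon is behind the line
            active.discard(heapq.heappop(expiry)[1])
        for j in active:
            if boxes[j][1] <= y <= boxes[j][3] and pip((x, y), polygons[j]):
                out.append((i, j))
    return out
```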
I'm building a 2D physics engine and I want to add broad-phase collision detection, though I only know of 2 or 3 types:
Check everything against everything else (O(n^2) complexity)
Sweep and Prune (sort and sweep)
something about Binary Space Partition (not sure how to do this)
But surely there are more options, right? What are they? And can a basic description of each, or links to descriptions, be provided?
I've seen this, but I'm asking for a list of the available algorithms, not the best one for my needs.
In this case, "Broad phase collision detection" is a method used by physics engines to determine which bodies in their simulation are close enough to warrant further investigation and possibly collision resolution.
The best approach depends on the specific use, but the bottom line is that you want to subdivide your world space such that (a) every body is in exactly one subdivision, (b) every subdivision is large enough that a body in a particular subdivision can only collide with bodies in that same subdivision or an adjacent subdivision, and (c) the number of bodies in a particular subdivision is as small as possible.
How you do that depends on how many bodies you have, how they're moving, what your performance requirements are, and how much time you want to spend on your engine. If you're talking about bodies moving around in a largely open space, the simplest technique would be to divide the world into a grid where each cell is larger than your largest object, and track the list of objects in each cell. If you're building something on the scale of a classic arcade game, this solution may well suffice.
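The grid described above, with cells at least as large as the largest object, could be sketched like this for circles. Each circle is binned by its centre only; since two overlapping circles have centres at most one cell width apart, checking the 3x3 block of neighbouring cells finds every candidate pair. The `(x, y, r)` representation and the names `grid_pairs` and `colliding` are assumptions for illustration, and a dictionary stands in for a fixed array.

```python
from collections import defaultdict
from math import floor


def grid_pairs(circles, cell):
    """Candidate pairs (i < j) for circles (x, y, r); `cell` must be >= the largest diameter."""
    assert all(2 * r <= cell for _, _, r in circles)
    grid = defaultdict(list)
    for i, (x, y, _) in enumerate(circles):
        grid[(floor(x / cell), floor(y / cell))].append(i)   # binned by centre only
    pairs = set()
    for (cx, cy), ids in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in ids:
                        if i < j:
                            pairs.add((i, j))
    return pairs


def colliding(circles, cell):
    """Narrow phase on the candidates: keep the pairs whose circles actually overlap."""
    hits = set()
    for i, j in grid_pairs(circles, cell):
        (x1, y1, r1), (x2, y2, r2) = circles[i], circles[j]
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2:
            hits.add((i, j))
    return hits
```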
If you're dealing with bodies moving in a larger open world, a simple grid will become overwhelming pretty quickly, and you'll probably want some sort of a tree-based structure like quadtrees, as Arriu suggests.
If you're talking about moving bodies around within bounded spaces instead of open spaces, then you may consider a BSP tree; the tree partitions the world into 'space you can walk in' and 'walls', and clipping a body into the tree determines whether it's in a legal position. Depending on the world geometry, you can also use a BSP for your broad-phase detection of collisions between bodies in the world.
Another option for bodies moving in bounded space would be a portal engine; if your world can consist of convex polygonal regions where each side of the polygon is either a solid wall or a 'portal' to another convex region, you can easily determine whether a body is within a region with a point-in-polygon test and simplify collision detection by only looking at bodies in the same region or connected regions.
An alternative to QuadTrees or BSPTrees are SphereTrees (CircleTrees in 2D; the implementation would be more or less the same). The advantage that SphereTrees have is that they handle large numbers of dynamic objects very well. If your objects are constantly moving, BSPTrees and QuadTrees are much slower to update than a Sphere/Circle Tree would be.
If you have a good mix of static and dynamic objects, a reasonably good solution is to use a QuadTree/BSPTree for the static objects and a Sphere/Circle Tree for the dynamic objects. Just remember that any given object needs to be checked against both trees.
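A circle tree could be sketched as below: a binary hierarchy of bounding circles built by a top-down median split, with each parent circle enclosing its two children. One way (an assumption here, not stated in the answer) to keep updates cheap for moving objects is to refit the bounding circles bottom-up instead of removing and reinserting objects. `enclose`, `build`, `refit` and `query` are hypothetical names; objects are represented by their bounding circles `(x, y, r)`.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def enclose(a, b):
    """Smallest circle containing circles a and b, each given as (x, y, r)."""
    (x1, y1, r1), (x2, y2, r2) = a, b
    d = math.hypot(x2 - x1, y2 - y1)
    if d + r2 <= r1:
        return a                      # b already inside a (also covers d == 0)
    if d + r1 <= r2:
        return b
    R = (d + r1 + r2) / 2
    t = (R - r1) / d                  # move the centre from a towards b
    return (x1 + (x2 - x1) * t, y1 + (y2 - y1) * t, R)


@dataclass
class CNode:
    circle: tuple
    obj: Optional[int] = None         # object index, leaves only
    kids: list = field(default_factory=list)


def build(ids, circles, depth=0):
    if len(ids) == 1:
        return CNode(circles[ids[0]], obj=ids[0])
    ids = sorted(ids, key=lambda i: circles[i][depth % 2])   # alternate x / y
    mid = len(ids) // 2
    kids = [build(ids[:mid], circles, depth + 1), build(ids[mid:], circles, depth + 1)]
    return CNode(enclose(kids[0].circle, kids[1].circle), kids=kids)


def refit(node, circles):
    """After objects move, recompute the bounding circles bottom-up without restructuring."""
    if node.obj is not None:
        node.circle = circles[node.obj]
    else:
        for k in node.kids:
            refit(k, circles)
        node.circle = enclose(node.kids[0].circle, node.kids[1].circle)


def overlaps(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def query(node, q, out):
    """Indices of the objects whose circles overlap the query circle q."""
    if not overlaps(node.circle, q):
        return out
    if node.obj is not None:
        out.append(node.obj)
    for k in node.kids:
        query(k, q, out)
    return out
```

Refitting keeps the tree valid but can loosen it as objects drift apart, so a periodic rebuild may still be worthwhile.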
I recommend quadtree partitioning. It's pretty simple and it works really well. Here is a Flash demo of brute-force collision detection vs. quadtree collision detection. (You can tell it to show the quadtree structure.) Did you notice how quadtree collision detection is only 3% of brute force in that demo?
Also, if you are serious about your engine, I highly recommend you pick up Real-Time Collision Detection. It's not expensive, and it's a really great book that covers everything you would ever want to know (including GPU-based collision detection).
All of the acceleration algorithms depend on using an inexpensive test to quickly rule out objects (or groups of objects) and thereby cut down on the number of expensive tests you have to do. I view the algorithms in categories, each of which has many variations.
1. Spatial partitioning. Carve up space and cheaply exclude candidates that are in different regions. For example, BSP, grids, octrees, etc.
2. Object partitioning. Similar to #1, but the clustering is focused on the objects themselves more than on the space they reside in. For example, bounding volume hierarchies.
3. Sort and sweep methods. Put the objects in order spatially and rule out collisions among ones that aren't adjacent.
Methods 1 and 2 are often hierarchical, recursing into each partition as needed. With 3, you can optionally iterate along different dimensions.
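As an example of the third category, here is a minimal sort-and-sweep sketch over axis-aligned boxes `(xmin, ymin, xmax, ymax)`. It sweeps along x only and checks y directly for the surviving candidates; `sort_and_sweep` is a hypothetical name.

```python
def sort_and_sweep(boxes):
    """Overlapping index pairs (i < j) among boxes (xmin, ymin, xmax, ymax)."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])   # sort by xmin
    active, pairs = [], []
    for i in order:
        xmin = boxes[i][0]
        # boxes ending before this one starts can't touch it or any later box
        active = [j for j in active if boxes[j][2] >= xmin]
        for j in active:                                 # x-intervals overlap here
            if boxes[i][1] <= boxes[j][3] and boxes[j][1] <= boxes[i][3]:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```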
Trade-offs depend a lot on scene geometry. Do objects cluster or are they evenly or sparsely distributed? Are they all about the same size or are there huge variations in size? Is the scene dynamic? Can you afford a lot of preprocessing time?
The "inexpensive" tests are actually along a spectrum of really-cheap to kind-of-expensive. Choosing the best algorithm means minimizing the ratio of the cost of the inexpensive testing to the reduction in the number of expensive tests. Beyond the algorithmic concerns, you get into tuning, like worrying about cache locality.
An alternative is plain grids, say 20x20 or 100x100 cells (depending on your world and memory size). The implementation is simpler than a recursive structure such as a quadtree or BSP tree (or a sphere tree, for that matter).
Objects crossing cell borders are a bit simpler to handle in this case, and performance does not degenerate as much as with a naive implementation of a BSP tree, quadtree, or octree.
Using that (or other techniques), you should be able to quickly cull many pairs and get a set of potential collisions that need further investigation.
I just came up with a solution that doesn't depend on grid size and is probably O(n log n) (which is optimal when there are no collisions), though it is O(n² log n) in the worst case (when everything collides).
I also implemented and tested it; here is the link to the source. But I haven't compared it to the brute-force solution.
A description of how it works:
(I'm looking for collisions of rectangles)
On the x axis, I sort the rectangles by their right edge (O(n log n)).
For every rectangle in the sorted list, I take its left edge and binary-search for the rightmost right edge that lies to the left of it; every rectangle between that index and the current one is inserted into a possible_Collision_On_X_Axis set (n rectangles, log n per binary search, and up to n inserts into the set at O(log n) each).
On the y axis, I do the same.
Each set now holds the possible collisions on one axis (x in one set, y in the other). I intersect the two sets, i.e. take the common elements, which leaves the pairs that collide on both the x axis and the y axis (O(n)).
Sorry for the poor description; I hope the source makes it clearer.
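Since the linked source isn't reproduced here, the following is an independent sketch of the same idea, not the author's code: sort by the right edge, binary-search for the rectangles whose right edge falls inside the current rectangle's range, do the same on y, and intersect the two candidate sets. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)`; `axis_candidates` and `rect_collisions` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def axis_candidates(lo, hi):
    """Index pairs (i < j) whose closed intervals [lo, hi] overlap on one axis."""
    order = sorted(range(len(lo)), key=lambda i: hi[i])     # sort by right (upper) edge
    his = [hi[i] for i in order]
    pairs = set()
    for i in order:
        # every rectangle whose right edge lies in [lo_i, hi_i] overlaps rectangle i
        start, end = bisect_left(his, lo[i]), bisect_right(his, hi[i])
        for k in range(start, end):
            j = order[k]
            if j != i:
                pairs.add((min(i, j), max(i, j)))
    return pairs


def rect_collisions(rects):
    xs = axis_candidates([r[0] for r in rects], [r[2] for r in rects])
    ys = axis_candidates([r[1] for r in rects], [r[3] for r in rects])
    return xs & ys                                          # overlapping on both axes
```

For any overlapping pair, the rectangle with the smaller right edge has that edge inside the other's range, so every pair is found. Note that each per-axis set can itself hold O(n²) pairs when many rectangles share an x or y range.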
You might want to check out what Scott did in Chipmunk with spatial hashing. The source is freely available. I think he used a technique similar to Box2D's, if not for collision then definitely for contact persistence.
I've used a quadtree in a larger project, which is good for game objects that don't move much (fewer removals and reinsertions). The same principle applies to octrees.
The basic idea is that you create a recursive tree structure in which each node stores 4 (quadtree) or 8 (octree) child nodes of the same type as the root. Each node in the tree represents a section of Cartesian space, for example -100 to +100 on each applicable axis. Each child represents part of that same space, halved along each axis (an immediate child of the example would represent, for example, -100 to 0 on the X axis and -100 to 0 on the Y axis).
Imagine a square, or plane, with a given set of dimensions. Now draw a line through the centre on each axis, dividing that plane into 4 smaller planes. Now take one of them and do it again, and again, until the size of the plane segment is roughly the size of a game object. Now you have your tree. Only objects occupying the same plane segment can possibly collide. Once you have determined which objects can collide, you generate pairs of possible collisions from them. At this stage the broad phase is complete, and you can move on to narrow-phase collision detection, which is where the more precise and expensive calculations happen.
The purpose of the broad phase is to use inexpensive calculations to generate possible collisions and cull contacts that cannot occur, thus reducing the work your narrow-phase algorithm has to perform and, in turn, making your entire collision detection more efficient.
So before you go ahead and attempt to implement such an algorithm, ask yourself:
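A minimal broad-phase quadtree sketch, under these assumptions: objects are axis-aligned boxes, each box is stored in the deepest node that fully contains it (so a box lying across a centre line stays in the parent, which is one way to deal with objects sitting on a dividing line), and child nodes are created lazily up to a hypothetical `max_depth`. Because a box can be stored above the leaf level, candidate pairs are formed between objects in the same node and between a node's objects and those of its ancestors. `QNode`, `insert`, and `candidate_pairs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    x0: float
    y0: float
    x1: float
    y1: float
    depth: int = 0
    objs: list = field(default_factory=list)   # (id, box) pairs stored at this node
    kids: list = field(default_factory=list)   # four quadrants, created on demand


def insert(node, oid, box, max_depth=6):
    """Store box (xmin, ymin, xmax, ymax) in the deepest node that fully contains it."""
    while node.depth < max_depth:
        mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
        if box[2] < mx:
            qx = 0
        elif box[0] >= mx:
            qx = 1
        else:
            break                     # straddles the vertical centre line
        if box[3] < my:
            qy = 0
        elif box[1] >= my:
            qy = 1
        else:
            break                     # straddles the horizontal centre line
        if not node.kids:
            xs, ys = (node.x0, mx, node.x1), (node.y0, my, node.y1)
            node.kids = [QNode(xs[a], ys[b], xs[a + 1], ys[b + 1], node.depth + 1)
                         for b in (0, 1) for a in (0, 1)]
        node = node.kids[qx + 2 * qy]
    node.objs.append((oid, box))


def candidate_pairs(node, above=(), out=None):
    """Pairs (i < j) stored in the same node, or in a node and one of its ancestors."""
    if out is None:
        out = set()
    for k, (i, _) in enumerate(node.objs):
        for j, _ in list(above) + node.objs[k + 1:]:
            out.add((min(i, j), max(i, j)))
    for kid in node.kids:
        candidate_pairs(kid, tuple(above) + tuple(node.objs), out)
    return out
```

Objects in sibling subtrees are never paired: a box in a left quadrant ends before the centre line and a box in a right quadrant starts at or after it, so they cannot overlap.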
How many objects are in my game?
If there are a lot, I probably need a broad phase. If not, the narrow phase alone should suffice.
Also, am I dealing with many moving objects?
Tree structures are generally slowed down by moving objects, as objects can change their position in the tree over time simply by moving. This requires that objects be removed and reinserted (potentially) every frame, which is less than ideal.
If this is the case, you would be better off with sweep and prune, which maintains sorted lists of the min/max extents of the shapes in your world. Objects do not need to be reinserted, but the lists need to be re-sorted each frame, though this is generally faster than a tree-wide traversal with removals and reinsertions.
Depending on your coding experience, one may be more complicated to code than the other. Personally, I have found trees more intuitive to code, but less efficient and more prone to error, since they raise other issues, such as what to do with an object that sits directly on top of an axis, or on the centre of the root node. This can be solved by using loose trees, whose nodes have some spatial overlap, so that such an object can always be assigned to one child node.
If the space that your objects move within is bounded, then you could use a grid to subdivide your objects. Put each object into every grid cell which the object covers (fully or partially). To check if object A collides with any other object, determine which grid cells object A covers, then get the list of unique objects in those cells, and finally test object A against each unique object. This approach works best if most objects are usually contained in a single grid cell.
If your space is not bounded, then you will need to implement some sort of dynamic grid that can grow on demand in each of the four directions (in 2D).
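A sketch of such a covering grid, assuming axis-aligned boxes and a dictionary keyed by integer cell coordinates, which also gives the unbounded "grow on demand" behaviour for free. `HashGrid`, `insert`, `remove`, and `colliders` are hypothetical names.

```python
from collections import defaultdict
from math import floor


class HashGrid:
    """Unbounded uniform grid: a cell exists only while some object touches it."""

    def __init__(self, cell):
        self.cell = cell
        self.cells = defaultdict(set)     # (cx, cy) -> ids of objects covering that cell
        self.boxes = {}

    def _span(self, box):
        c = self.cell
        return [(cx, cy) for cx in range(floor(box[0] / c), floor(box[2] / c) + 1)
                for cy in range(floor(box[1] / c), floor(box[3] / c) + 1)]

    @staticmethod
    def _hit(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def insert(self, oid, box):
        """Put the object into every cell its box covers, fully or partially."""
        self.boxes[oid] = box
        for key in self._span(box):
            self.cells[key].add(oid)

    def remove(self, oid):
        for key in self._span(self.boxes.pop(oid)):
            self.cells[key].discard(oid)
            if not self.cells[key]:
                del self.cells[key]

    def colliders(self, oid):
        """Unique objects sharing a cell with oid, then an exact box test on each."""
        a = self.boxes[oid]
        near = set()
        for key in self._span(a):
            near |= self.cells.get(key, set())
        near.discard(oid)
        return {o for o in near if self._hit(a, self.boxes[o])}
```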
The advantage of this approach over more adaptive algorithms (e.g. BSP, quadtree, circle tree) is that the cells can be accessed in O(1) time (i.e. constant time) rather than O(log N) time (i.e. logarithmic time). However, these latter algorithms are able to adapt themselves to large variability in the density of objects. The grid approach works best when object density is fairly constant across the space.
I would like to recommend the introductory reference to game physics from Ian Millington, Game Physics Engine Development. It has a great section on broad phase collision detection with sample code.
I have a polygon soup of triangles that I would like to construct a BSP tree for. My current program simply constructs a BSP tree by inserting a random triangle from the model one at a time until all the triangles are consumed, then it checks the depth and breadth of the tree and remembers the best score it achieved (lowest depth, lowest breadth).
By definition, the best depth would be log2(n) (or less if co-planar triangles are grouped?) where n is the number of triangles in my model, and the best breadth would be n (meaning no splitting has occurred). But, there are certain configurations of triangles for which this pinnacle would never be reached.
Is there an efficient test for checking the quality of my BSP tree? Specifically, I'm trying to find a way for my program to know it should stop looking for a more optimal construction.
Construction of an optimal tree is an NP-complete problem. Determining if a given tree is optimal is essentially the same problem.
From this BSP FAQ: "The problem is one of splitting versus tree balancing. These are mutually exclusive requirements. You should choose your strategy for building a good tree based on how you intend to use the tree."
Randomly building BSP trees until you chance upon a good one will be really, really inefficient.
Instead of choosing a tri at random to use as a split-plane, you want to try out several (maybe all of them, or maybe a random sampling) and pick one according to some heuristic. The heuristic is typically based on (a) how balanced the resulting child nodes would be, and (b) how many tris it would split.
You can trade off performance and quality by considering a smaller or larger sampling of tris as candidate split-planes.
But in the end, you can't hope to get a totally optimal tree for any real-world data so you might have to settle for 'good enough'.
Try to pick, as splitting planes, the planes that would (potentially) be split by the most other planes. A plane used as a splitting plane can't itself be split.
Try to pick a plane that has close to the same number of planes in front as in back.
Try to pick a plane that doesn't cause too many splits.
Try to pick a plane that is coplanar with a lot of other surfaces.
You'll have to sample these criteria and come up with a scoring system to decide which candidate is most likely to be a good splitting plane. For example, the further off balance it is, the more score it loses; if it causes 20 splits, the penalty is -5 * 20. Choose the one that scores best. You don't have to sample every polygon; just search for a pretty good one.
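The scoring idea could be sketched as follows. To keep it short, this works in 2D, with line segments standing in for triangles and the line through a segment standing in for its plane; the weights (5 per split, 1 per unit of imbalance, a bonus of 1 per coplanar segment) and the names `classify`, `score` and `choose_splitter` are hypothetical.

```python
import random

EPS = 1e-9


def side_of(seg, p):
    """Cross product: > 0 if p is in front of (left of) the line through seg."""
    (x1, y1), (x2, y2) = seg
    return (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)


def classify(splitter, seg):
    a, b = side_of(splitter, seg[0]), side_of(splitter, seg[1])
    if abs(a) <= EPS and abs(b) <= EPS:
        return "coplanar"
    if a >= -EPS and b >= -EPS:
        return "front"
    if a <= EPS and b <= EPS:
        return "back"
    return "spanning"                 # would have to be split


def score(splitter, segs, split_weight=5, balance_weight=1, coplanar_bonus=1):
    counts = {"front": 0, "back": 0, "spanning": 0, "coplanar": 0}
    for s in segs:
        if s is not splitter:
            counts[classify(splitter, s)] += 1
    return (-balance_weight * abs(counts["front"] - counts["back"])
            - split_weight * counts["spanning"]
            + coplanar_bonus * counts["coplanar"])


def choose_splitter(segs, samples=None, rng=random):
    """Score all candidates, or only a random sample of `samples` of them."""
    if samples is None or samples >= len(segs):
        candidates = segs
    else:
        candidates = rng.sample(segs, samples)
    return max(candidates, key=lambda s: score(s, segs))
```

Lowering `samples` trades tree quality for build time, in line with the advice above about searching for a pretty good splitter rather than the best one.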