KDTree Splitting - algorithm

I am currently writing a KDTree for a physics engine (Hobby project).
The KDTree does not contain points.
Instead it contains Axis Aligned bounding boxes which bound the different objects in the environment.
My problem is deciding on how to split the KDTree nodes when they get full.
I am trying 2 methods:
Method 1: Always split the node exactly in half on the biggest axis.
This has the advantage of a pretty evenly spaced out tree.
Big disadvantage: If objects are concentrated in a small area of the node, redundant sub-divisions will be created, because all volumes are split exactly in half.
Method 2: Find the area of the node which contains objects. Split the node on the plane which splits that area in half on its biggest axis. Example - if all objects are concentrated at the bottom of the node, then the node is split length-wise, thereby dividing the bottom in two.
This solves the problem with Method 1, but it introduces a new one: when indexing something that lies on a single plane (terrain, for example), it creates long and narrow nodes. If I later add other objects which are not on that plane, these elongated nodes provide very poor indexing.
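For concreteness, here is a minimal sketch of the two strategies (illustrative Aabb type and names, not the actual engine code):

    #include <algorithm>
    #include <vector>

    struct Aabb { float min[3], max[3]; };

    // Longest axis of a box: 0 = x, 1 = y, 2 = z.
    int longestAxis(const Aabb& b) {
        float ext[3] = { b.max[0] - b.min[0], b.max[1] - b.min[1], b.max[2] - b.min[2] };
        return int(std::max_element(ext, ext + 3) - ext);
    }

    // Method 1: split the node's own volume in half on its longest axis.
    float splitMethod1(const Aabb& node, int& axis) {
        axis = longestAxis(node);
        return 0.5f * (node.min[axis] + node.max[axis]);
    }

    // Method 2: bound the objects first, then split that occupied sub-volume in half
    // on *its* longest axis, so empty space is not subdivided needlessly.
    // Assumes `objects` is non-empty.
    float splitMethod2(const std::vector<Aabb>& objects, int& axis) {
        Aabb used = objects.front();
        for (const Aabb& o : objects)
            for (int i = 0; i < 3; ++i) {
                used.min[i] = std::min(used.min[i], o.min[i]);
                used.max[i] = std::max(used.max[i], o.max[i]);
            }
        axis = longestAxis(used);
        return 0.5f * (used.min[axis] + used.max[axis]);
    }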
So what I'm looking for here is a better way to split my KD-Tree node.
Considering that this is going to be a physics engine the decision needs to be simple enough to be made in real time.

The "surface area heuristic" (SAH) is considered the best splitting method for building kd-trees, at least within the raytracing community. The idea is to add the plane so that the surface areas of the two child spaces, weighted by the number of objexts in each child, are equal.
A good reference on the subject is Ingo Wald's thesis, in particular chapter 7.3, "High-quality BSP Construction", which explains SAH better than I can.
I can't find a good link at the moment, but you should look around for papers on "binned" SAH, which is an approximation to the true SAH but much faster.
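To make the idea concrete, here is a minimal sketch of scoring one candidate plane with the SAH (illustrative names and cost constants; a real builder bins candidates along each axis rather than scoring every possible plane):

    #include <vector>

    struct Aabb { float min[3], max[3]; };

    float surfaceArea(const Aabb& b) {
        float dx = b.max[0] - b.min[0], dy = b.max[1] - b.min[1], dz = b.max[2] - b.min[2];
        return 2.0f * (dx * dy + dy * dz + dz * dx);
    }

    // SAH cost of splitting `node` at `pos` on `axis`: each child's surface area
    // (relative to the parent's) weighted by the number of objects overlapping it.
    // The builder keeps the candidate plane with the lowest cost.
    float sahCost(const Aabb& node, const std::vector<Aabb>& objects, int axis, float pos,
                  float traversalCost = 1.0f, float intersectCost = 4.0f) {
        Aabb left = node, right = node;
        left.max[axis] = pos;
        right.min[axis] = pos;

        int nLeft = 0, nRight = 0;
        for (const Aabb& o : objects) {
            if (o.min[axis] < pos) ++nLeft;    // overlaps the left child
            if (o.max[axis] > pos) ++nRight;   // overlaps the right child
        }
        float invParent = 1.0f / surfaceArea(node);
        return traversalCost + intersectCost *
               (surfaceArea(left) * invParent * nLeft + surfaceArea(right) * invParent * nRight);
    }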
All that being said, bounding-volume hierarchies (BVH) a.k.a. AABB trees, seem to be much more popular than kd-trees these days. Again, Ingo Wald's publication page is a good starting point, probably with the "On fast Construction of SAH based Bounding Volume Hierarchies" paper, although it's been a while since I read it.
The OMPF forums are also a good place to discuss these sorts of things.
Hope that helps. Good luck!

Certainly for a physics engine, where the premise is lots of moving geometry, a BVH is probably the better choice. BVHs don't traverse quite as quickly, but they are much faster to build, are much easier to refit/restructure on a frame-by-frame basis, and often don't need a complete rebuild every frame (a rebuild can be done in parallel over a series of frames while the refitted BVH suffices in the meantime; again, refer to Wald).
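As a rough sketch of what a refit pass looks like (an illustrative flat-array layout where children are stored after their parent, not code from Wald's papers):

    #include <algorithm>
    #include <vector>

    struct Aabb { float min[3], max[3]; };

    Aabb merge(const Aabb& a, const Aabb& b) {
        Aabb out;
        for (int i = 0; i < 3; ++i) {
            out.min[i] = std::min(a.min[i], b.min[i]);
            out.max[i] = std::max(a.max[i], b.max[i]);
        }
        return out;
    }

    struct BvhNode {
        Aabb bounds;
        int left = -1, right = -1;  // -1 for leaves
        int object = -1;            // index into the object array for leaves
    };

    // Refit: walk the nodes in reverse so every child is updated before its parent,
    // then re-expand each internal node around its children. The tree topology is
    // untouched; only the boxes are recomputed from the moved objects.
    void refit(std::vector<BvhNode>& nodes, const std::vector<Aabb>& objectBounds) {
        for (int i = int(nodes.size()) - 1; i >= 0; --i) {
            BvhNode& n = nodes[i];
            if (n.left < 0)
                n.bounds = objectBounds[n.object];
            else
                n.bounds = merge(nodes[n.left].bounds, nodes[n.right].bounds);
        }
    }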
An exception to this in physics could be when you're dealing with entities that have no volume, such as particles or photons; building the kd-tree is then simplified by the fact that you don't need to resolve the bounds of the individual primitives. It really depends on the application. A good physics engine should use a balanced combination of spatial acceleration structures. It's common practice to resolve broad-phase partitioning with, say, a shallow octree and then extend the leaf nodes with another scheme that better fits the nature of what you are doing. BSPs are ideal for static geometry, especially in 2D and when the structure isn't changing. The best thing to do is experiment with as many different schemes and structures as you can and get a feel for how and when they work best.

Related

Quadtree performance issue

The problem I have is that a game I work on uses a quadtree for fast proximity detection, used for range checks when weapons are firing. I'm using the classic "4 wide" quadtree, meaning that I subdivide a node when I attempt to add a fifth entry to an already full node.
Initially the set of available targets was fairly evenly spread out, so the quadtree worked very well. Due to design changes, we now get clusters of large numbers of enemies in a relatively small space, leading to performance problems because the quadtree becomes significantly unbalanced.
There are two possible solutions that occur to me, either modify the quadtree to handle this, or switch to an alternative representation.
The only other representation I'm familiar with is a spatial hash, and not very familiar at that. From what I understand, this risks suffering the same problem since the cluster would wind up in a relatively small number of hash buckets. From what I know of it, a BSP is a possible solution that will deal with the uneven distribution better than a quadtree or spatial hash.
No fair, I know, there are actually three questions now.
Are there any modifications I can make to the quadtree, e.g. increasing the "width" of nodes that would help deal with this?
Is it even worth my time to consider a spatial hash?
Is a BSP, or some other data structure a better bet to deal with the uneven distribution?
I usually use a quadtree with at least 10 entries per node, but you'll have to try it out.
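For illustration, here is a rough sketch of a node with a larger capacity and a depth cap, so a dense cluster stops subdividing and simply accumulates in one leaf (illustrative types and untuned constants):

    #include <cstddef>
    #include <memory>
    #include <vector>

    struct Point { float x, y; };

    struct QuadNode {
        float cx, cy, halfSize;                     // centre and half-extent of this node's square
        std::vector<Point> entries;                 // entries stored here while this is a leaf
        std::unique_ptr<QuadNode> child[4];         // null while this node is a leaf

        static constexpr std::size_t kCapacity = 10;  // subdivide only above this many entries
        static constexpr int kMaxDepth = 8;           // clusters deeper than this just pile up in one leaf

        QuadNode(float x, float y, float h) : cx(x), cy(y), halfSize(h) {}

        void insert(const Point& p, int depth = 0) {
            if (!child[0]) {                                    // leaf
                entries.push_back(p);
                if (entries.size() <= kCapacity || depth >= kMaxDepth) return;
                for (int i = 0; i < 4; ++i) {                   // over capacity: subdivide
                    float ox = (i & 1) ? 0.5f : -0.5f;
                    float oy = (i & 2) ? 0.5f : -0.5f;
                    child[i] = std::make_unique<QuadNode>(cx + ox * halfSize, cy + oy * halfSize, 0.5f * halfSize);
                }
                std::vector<Point> old;
                old.swap(entries);
                for (const Point& q : old) insert(q, depth);    // re-route through the new children
                return;
            }
            int i = (p.x >= cx ? 1 : 0) | (p.y >= cy ? 2 : 0);  // pick the child quadrant
            child[i]->insert(p, depth + 1);
        }
    };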
I have no experience with spatial hashing.
Other structures you could look into are:
KD-Trees: They are quite simple to implement and are also good for neighbour search, but get slower with strong clustering. They are a bit slow to update and may get imbalanced.
R*Tree: More complex, very good for neighbour search, but even slower to update than KD-Trees. They won't get imbalanced because of automatic rebalancing. Rebalancing is mostly fast, but in extreme cases it can occasionally slow things down further.
PH-Tree: Quite complex to implement. Good neighbour search. Has very good update speed (like a quadtree), and the maximum depth is limited by the bit width of your coordinates (usually 32 or 64 bit), so it can't really get imbalanced. Scales very well with large datasets (1 million entries and more); I have little experience with small datasets.
If you are using Java, I have Apache licensed versions available here (R*Tree) and here (PH-Tree).

What is the best data structure for an AABB collision checking physics engine?

I need an engine which consists of a world populated with axis-aligned bounding boxes (AABBs). A continuous loop will be executed, doing the following:
for box_a in world
    box_a = do_something(box_a)
    for box_b in world
        if (box_a != box_b and collides(box_a, box_b))
            collide(box_a, box_b)
            collide(box_b, box_a)
The problem with that is, obviously, that this is O(n^2). I have managed to make this loop much faster by partitioning the space into chunks, so it became:
for box_a in world
    box_a = do_something(box_a)
    for chunk in box_a.neighbor_chunks
        for box_b in chunk
            if (box_a != box_b and collides(box_a, box_b))
                collide(box_a, box_b)
                collide(box_b, box_a)
This is much faster but a little crude. Given that such a speedup was possible with so little effort, I'd bet there is a data structure I'm not aware of that generalizes what I've done here and provides much better scalability.
So, my question is: what is the name of this problem and what are the optimal algorithms and data-structures to implement it?
This is indeed a generic problem in computer science: space partitioning.
It's used in raytracing, path tracing, raster rendering, physics, AI, games, and almost certainly in HPC, databases, matrix maths, the sciences (molecular modelling, pharmacy, ...), and I'd bet thousands of other things.
There is no single best structure. A friend of mine did his master's on an algorithm to tessellate a point cloud coming out of a laser scanner (billions of points), and in his case the best data structure was a mix of uniform 3D grids with some octrees.
For some people a kd-tree is best; for others, BVH trees are.
I like the grid system, but it cannot work if the space is too wide, because all cells have to exist.
One day I even implemented a sparse grid system using a hash map. It worked; I didn't bother to profile or investigate the performance, so I can't say whether it's an excellent way, but I know it's one way.
To do that, you make a KEY class which is basically a hasher for 3D positions: first you apply an integer division to the coordinates to define the size of one grid cell, then you hash all three coordinates into one value and provide a hash_value method (or friend function) and an equality operator, and it becomes usable in a hash map.
You can use google::sparse_map or something along those lines. I personally used boost::unordered and it was enough in my case.
Then the thing to consider is the presence of an AABB in more than one cell. You can store a reference in every cell covered by your AABB; it's just something to be aware of in every algorithm: "there is no 1-1 relationship between cell references and AABBs", that's all.
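A rough sketch of that key class with std::unordered_map (boost::unordered works the same way; the names and the hash mix are just illustrative):

    #include <cmath>
    #include <cstddef>
    #include <functional>
    #include <unordered_map>
    #include <vector>

    struct Aabb { float min[3], max[3]; };

    // Grid cell key: world position divided down to integer cell coordinates.
    struct CellKey {
        int x, y, z;
        bool operator==(const CellKey& o) const { return x == o.x && y == o.y && z == o.z; }
    };

    CellKey makeKey(float px, float py, float pz, float cellSize) {
        return { int(std::floor(px / cellSize)),
                 int(std::floor(py / cellSize)),
                 int(std::floor(pz / cellSize)) };
    }

    // Mix the three cell coordinates into one hash value (good enough for a sketch).
    struct CellKeyHash {
        std::size_t operator()(const CellKey& k) const {
            std::size_t h = std::hash<int>()(k.x);
            h = h * 31 + std::hash<int>()(k.y);
            h = h * 31 + std::hash<int>()(k.z);
            return h;
        }
    };

    // Each cell holds indices of the AABBs overlapping it; one AABB may appear in many cells.
    using SparseGrid = std::unordered_map<CellKey, std::vector<int>, CellKeyHash>;

    void insertAabb(SparseGrid& grid, const Aabb& box, int id, float cellSize) {
        CellKey lo = makeKey(box.min[0], box.min[1], box.min[2], cellSize);
        CellKey hi = makeKey(box.max[0], box.max[1], box.max[2], cellSize);
        for (int x = lo.x; x <= hi.x; ++x)
            for (int y = lo.y; y <= hi.y; ++y)
                for (int z = lo.z; z <= hi.z; ++z)
                    grid[{x, y, z}].push_back(id);
    }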
good luck

Does the Barnes Hut Tree need to be recreated each iteration of a loop

I'm coding an application which is required to perform an n-body simulation between a few hundred particles which are constantly in motion. The application has real time requirements and thus the algorithm performing the simulation needs to be fast.
I've done a fair amount of research on the matter and have come to the conclusion that the Barnes Hut algorithm would be most suitable for my needs, it seems very efficient for large particle sets.
http://arborjs.org/docs/barnes-hut gave a very clear explanation of how the algorithm works, but as the title implies, I'd like to know whether the tree needs to be recreated for each iteration, considering that the particles used in the simulation are always in motion. And if the tree does need to be recreated, how does one do it in the most efficient way (in terms of processing power and memory)?
Usually with motion based indexes there is no "update" for the index after movement has occurred and you must rebuild the entire index.
The Barnes Hut Tree is the same and will have to be rebuilt. Here is an example I found online with a code outline of the process.
This is one of the reasons so much effort has gone into build optimizations for things like the KD-Tree, and I'm sure the Barnes-Hut tree has seen the same. Also, I'm sure there is research on dynamically updating the tree, but most of the time these implementations are much harder than simply rebuilding.
Pretty late answering this question; I guess I had never worked on this problem back then. But now that I have been looking into it for some time, I can share some insight. What @greedybuddha says is mostly right, but there are tricks and techniques to avoid recreating the tree entirely every timestep. As usual, these techniques come with their own overheads (memory footprint etc.) and the necessary trade-offs need to be made. Also, it may not be possible to apply them in all situations.
Allocate enough memory upfront to handle an octree of a given depth, and deallocate it only when all timesteps have finished. For example, suppose that for all the n-body input data sets you are going to use, you know your octree will never go beyond a depth of, say, 10. In that case it is easy to figure out the maximum number of nodes the octree might have (usually a geometric series sum). With a bit of bookkeeping (valid child indices of each node), it is easy to reuse this buffer to fill in octree nodes without allocating/deallocating nodes every timestep; a sketch of this follows below. The likely overhead is that you may waste a lot of memory, but you gain performance by not doing allocation/deallocation every timestep.
The usual way of bounding the number of octree levels is to allow multiple bodies per leaf node (the last level of the octree, i.e. the smallest cell size of the octree grid).
Create a new octree only when needed: it is possible to imagine a situation where the octree doesn't change for some series (burst) of timesteps, then it changes, and the pattern repeats. This can happen only when body positions change insignificantly over a timestep burst, so that the octree structure remains the same during that burst. And when will bodies move that slowly? When the chosen timestep is pretty small and the forces exerted on the bodies plus their initial momentum are small. How to dynamically detect such bursts is a difficult problem; this is the trickier technique and I don't know an easy way to do it. It requires some insight into the timestep granularity, the initial velocity/acceleration of the bodies, and the kind of forces the bodies are dealing with.
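A rough sketch of the first technique, the pre-allocated node pool (illustrative names; for deeper trees you would size the pool by the body count rather than the full geometric sum, since an octree over N bodies only needs O(N) nodes):

    #include <cstddef>
    #include <vector>

    struct OctreeNode {
        float center[3];
        float halfSize;
        int child[8];     // indices into the pool, -1 if absent
        int firstBody;    // head of this leaf's body list, -1 if empty
    };

    // Maximum nodes in a full octree of depth d: 1 + 8 + 8^2 + ... + 8^d.
    std::size_t maxNodes(int maxDepth) {
        std::size_t levelCount = 1, total = 0;
        for (int level = 0; level <= maxDepth; ++level) { total += levelCount; levelCount *= 8; }
        return total;
    }

    class OctreePool {
    public:
        explicit OctreePool(std::size_t capacity) : nodes_(capacity) {}

        // "Rebuilding" just rewinds the cursor; the buffer itself is reused every timestep.
        void reset() { used_ = 0; }

        int allocate() { return int(used_++); }              // caller fills in the node's fields
        OctreeNode& operator[](int i) { return nodes_[i]; }

    private:
        std::vector<OctreeNode> nodes_;
        std::size_t used_ = 0;
    };

At the start of every timestep you call reset() and build the new tree into the same buffer, so no per-timestep allocation or deallocation happens.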

Ask for resource about fast ray-tracing algorithm

First, I am sorry for this rough question, but I don't want to introduce too much details, so I just ask for related resource like articles, libraries or tips.
My program needs to do intensive computation of ray-triangle intersections (there are millions of rays and triangles), and my goal is to make it as fast as I can.
What I have done is:
Use the fastest ray-triangle algorithm that I know.
Use an octree (from Game Programming Gems 1, sections 4.10 and 4.11).
Use "An Efficient and Robust Ray-Box Intersection Algorithm", which is used in the octree traversal.
It is faster than before I applied those better algorithms, but I believe it could be faster still. Could you please shed light on anything that could make it faster?
Thanks.
The place to ask these questions is ompf2.com, a forum with topics about realtime (and also non-realtime) raytracing.
OMPF forum is the right place for this question, but since I'm here today...
Don't use a ray/box intersection for octree traversal. You may use it for the root node of the tree, but that's it. Once you know the distances to the entry and exit of the root box, you can calculate the distances to the x, y, and z partition planes - the planes that subdivide the box. If the distances to the front and back are f and b respectively, then you can determine which child nodes of the box are hit by analyzing the f, b, x, y, z distances. You can also determine the order in which to traverse the child nodes and completely reject many of them.
At most 4 of the children can be hit since the ray starts in one octant and only changes octants when it crosses one of the 3 partition planes.
Also, since it becomes recursive you'll be needing the entry and exit distances for the child nodes. These distances are chosen from the set (f,b,x,y,z) which you've already computed.
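A rough sketch of those distance computations (a slab test for the root box, then one divide per axis for the partition planes; names are illustrative and zero direction components are ignored for brevity):

    #include <algorithm>
    #include <cmath>

    struct Vec3 { float x, y, z; };
    struct Ray  { Vec3 origin, dir; };   // dir components assumed non-zero for brevity
    struct Box  { Vec3 min, max; };

    // Standard slab test: entry/exit distances of a box; returns false on a miss.
    bool rayBox(const Ray& r, const Box& b, float& tEnter, float& tExit) {
        float origin[3] = { r.origin.x, r.origin.y, r.origin.z };
        float dir[3]    = { r.dir.x, r.dir.y, r.dir.z };
        float lo[3]     = { b.min.x, b.min.y, b.min.z };
        float hi[3]     = { b.max.x, b.max.y, b.max.z };
        tEnter = -INFINITY; tExit = INFINITY;
        for (int i = 0; i < 3; ++i) {
            float t0 = (lo[i] - origin[i]) / dir[i];
            float t1 = (hi[i] - origin[i]) / dir[i];
            if (t0 > t1) std::swap(t0, t1);
            tEnter = std::max(tEnter, t0);
            tExit  = std::min(tExit,  t1);
        }
        return tEnter <= tExit && tExit >= 0.0f;
    }

    // Distances to the three mid-planes of a node: one divide per axis, no box test.
    // Together with the node's entry/exit distances, these classify which children
    // the ray can hit and in what order.
    void midPlaneDistances(const Ray& r, const Box& b, float& tx, float& ty, float& tz) {
        tx = (0.5f * (b.min.x + b.max.x) - r.origin.x) / r.dir.x;
        ty = (0.5f * (b.min.y + b.max.y) - r.origin.y) / r.dir.y;
        tz = (0.5f * (b.min.z + b.max.z) - r.origin.z) / r.dir.z;
    }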
I have been optimizing this for a very long time, and can safely say you have about an order of magnitude performance still on the table for trees many levels deep. I started right where you are now.
There are several optimizations you can do, but all of them depend on the exact domain of your problem. As far as general algorithms go, you are on the right track. Depending on the domain, you could:
Introduce a portal system
Move the calculations to a GPU and take advantage of parallel computation
A quite popular trend in raytracing recently is Bounding Volume Hierarchies
You've already gotten a good start using a spatial sort coupled with fast intersection algorithms. For tracing single rays at a time, one of the best structures out there (for static scenes) is a K-d tree built using the Surface Area Heuristic.
However, for truly high-speed ray tracing you need to take advantage of:
Coherent packets of rays
Frusta
SIMD
I would suggest you start with "Ray Tracing Animated Scenes using Coherent Grid Traversal". It gives an easy-to-follow example of such a modern approach. You can also follow the references to see how these ideas are applied to K-d trees and BVHs.
On the same page, also check out "State of the Art in Ray Tracing Animated Scenes".
Another great set of resources are all the SIGGRAPH publications over the years. This is a very competitive conference, so these papers tend to be top-notch.
Finally, if you're willing to use existing code, check out the project page for OpenRT.
A useful resource I've seen is the Journal of Graphics Tools. Depending on your scenes, another BVH might be more appropriate than an octree.
Also, if you haven't looked at your performance with a profiler, then you should. Shark is great on OS X, and I've gotten good results with Very Sleepy on Windows.

How to subdivide a 2d game world for better collision detection

I'm developing a game which features a sizeable square 2d playing area. The gaming area is tileless with bounded sides (no wrapping around). I am trying to figure out how I can best divide up this world to increase the performance of collision detection. Rather than checking each entity for collision with all other entities I want to only check nearby entities for collision and obstacle avoidance.
I have a few special concerns for this game world...
I want to be able to use a large number of entities in the game world at once. However, a percentage of entities won't collide with entities of the same type. For example, projectiles won't collide with other projectiles.
I want to be able to use a large range of entity sizes. I want there to be a very large size difference between the smallest entities and the largest.
There are very few static or non-moving entities in the game world.
I'm interested in using something similar to what's described in the answer here: Quadtree vs Red-Black tree for a game in C++?
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
I'm mostly looking for any good algorithms or ideas that will help reduce the number of collision detection and obstacle avoidance calculations.
If I were you I'd start off by implementing a simple BSP (binary space partition) tree. Since you are working in 2D, bound box checks are really fast. You basically need three classes: CBspTree, CBspNode and CBspCut (not really needed)
CBspTree has one root node instance of class CBspNode
CBspNode has an instance of CBspCut
CBspCut symbolizes how you cut a set into two disjoint sets. This can neatly be solved by introducing polymorphism (e.g. CBspCutX or CBspCutY or some other cutting line). CBspCut also has two CBspNode children.
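A rough structural sketch of those classes in 2D with axis-aligned cuts (names follow the ones above, but the details are illustrative; insert/query are left as declarations since only the layout matters here):

    #include <memory>
    #include <vector>

    struct Aabb2 { float minX, minY, maxX, maxY; };
    struct Entity { int id; Aabb2 bounds; };

    // CBspCut: how a node's region is cut into two disjoint halves. Polymorphism
    // lets you add other cut types later (an arbitrary line, for instance).
    struct CBspCut {
        virtual ~CBspCut() = default;
        virtual bool isInFront(const Aabb2& box) const = 0;   // true -> front child, false -> back child
    };

    struct CBspCutX : CBspCut {          // vertical cutting line x = pos
        float pos;
        explicit CBspCutX(float p) : pos(p) {}
        bool isInFront(const Aabb2& b) const override { return 0.5f * (b.minX + b.maxX) >= pos; }
    };

    struct CBspCutY : CBspCut {          // horizontal cutting line y = pos
        float pos;
        explicit CBspCutY(float p) : pos(p) {}
        bool isInFront(const Aabb2& b) const override { return 0.5f * (b.minY + b.maxY) >= pos; }
    };

    // CBspNode: a leaf holding entities, or an internal node with a cut and two children.
    struct CBspNode {
        std::vector<Entity> entities;
        std::unique_ptr<CBspCut> cut;
        std::unique_ptr<CBspNode> front, back;
    };

    // CBspTree: the interface the rest of the game talks to, so the whole structure
    // can be swapped for e.g. a quad tree later without touching callers.
    class CBspTree {
    public:
        void insert(const Entity& e);                        // descend via cut->isInFront(), split leaves past a threshold
        std::vector<Entity> query(const Aabb2& area) const;  // collect entities from leaves overlapping `area`
    private:
        CBspNode root_;
    };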
The interface towards the divided world will be through the tree class, and it can be a really good idea to create one more layer on top of that, in case you would like to replace the BSP solution with e.g. a quad tree once you get the hang of it. But in my experience, a BSP will do just fine.
There are different strategies for how to store your items in the tree. What I mean by that is that you can choose to have e.g. some kind of container in each node that contains references to the objects occupying that area. This means though (as you are asking yourself) that large items will occupy many leaves, i.e. there will be many references to large objects, while very small items will show up at single leaves.
In my experience this doesn't have that large impact. Of course it matters, but you'd have to do some testing to check if it's really an issue or not. You would be able to get around this by simply leaving those items at branched nodes in the tree, i.e. you will not store them on "leaf level". This means you will find those objects quick while traversing down the tree.
When it comes to your first question: if you are only going to use this subdivision for collision testing and nothing else, I suggest that things that can never collide are never inserted into the tree. A missile, for example, as you say, can't collide with another missile, which means you don't even have to store the missile in the tree.
However, you might want to use the BSP for other things as well; you didn't specify, but keep that in mind (e.g. for picking objects with the mouse). In that case I propose that you store everything in the BSP and resolve the collisions later on. Just ask the BSP for a list of objects in a certain area to get a limited set of possible collision candidates and perform the check after that (assuming objects know what they can collide with, or some other external mechanism).
If you want to speed up things, you also need to take care of merge and split, i.e. when things are removed from the tree, a lot of nodes will become empty or the number of items below some node level will decrease below some merge threshold. Then you want to merge two subtrees into one node containing all items. Splitting happens when you insert items into the world. So when the number of items exceed some splitting threshold you introduce a new cut, which splits the world in two. These merge and split thresholds should be two constants that you can use to tune the efficiency of the tree.
Merge and split are mainly used to keep the tree balanced and to make sure that it works as efficiently as it can according to its specifications. This is really what you need to worry about. Moving things from one location to another and thus updating the tree is, imo, fast. But when it comes to merging and splitting, it might become expensive if you do it too often.
This can be avoided by introducing some kind of lazy merge and split system, i.e. you have some kind of dirty flagging or modify count. Batch up all operations that can be batched, i.e. moving 10 objects and inserting 5 might be one batch. Once that batch of operations is finished, you check if the tree is dirty and then you do the needed merge and/or split operations.
Post some comments if you want me to explain further.
Cheers !
Edit
There are many things that can be optimized in the tree. But as you know, premature optimization is the root of all evil. So start off simple. For example, you might create some generic callback system that you can use while traversing the tree. This way you don't have to query the tree to get a list of objects that matched the bounding-box "question"; instead you can just traverse down the tree and execute that callback each time you hit something: "If this bounding box I'm providing intersects you, then execute this callback with these parameters."
You most definitely want to check out this list of collision detection resources from gamedev.net. It's full of resources with game development conventions.
For other than collision detection only, check their entire list of articles and resources.
"My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system."
Use a quad tree. For objects that exist in multiple areas you have a few options:
Store the object in both branches, all the way down. Everything ends up in leaf nodes but you may end up with a significant number of extra pointers. May be appropriate for static things.
Split the object on the zone border and insert each part in their respective locations. Creates a lot of pain and isn't well defined for a lot of objects.
Store the object at the lowest point in the tree you can. Sets of objects now exist in leaf and non-leaf nodes, but each object has one pointer to it in the tree. Probably best for objects that are going to move.
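A rough sketch of that third option (illustrative types): the object is pushed down only while exactly one child fully contains it, so large or straddling objects come to rest at an interior node and each object has a single home in the tree.

    #include <memory>
    #include <vector>

    struct Aabb2 { float minX, minY, maxX, maxY; };
    struct Object { int id; Aabb2 bounds; };

    struct QuadNode {
        float cx, cy, halfSize;                // centre and half-extent of this node's square
        std::vector<Object> objects;           // objects stored *at* this node
        std::unique_ptr<QuadNode> child[4];

        QuadNode(float x, float y, float h) : cx(x), cy(y), halfSize(h) {}

        // Index of the single child that fully contains `b`, or -1 if it straddles a boundary.
        int childIndexFor(const Aabb2& b) const {
            bool east = b.minX >= cx, west = b.maxX <= cx;
            bool north = b.minY >= cy, south = b.maxY <= cy;
            if (east && north) return 0;
            if (west && north) return 1;
            if (west && south) return 2;
            if (east && south) return 3;
            return -1;
        }

        void insert(const Object& o, int depth = 0, int maxDepth = 8) {
            int i = childIndexFor(o.bounds);
            if (i < 0 || depth >= maxDepth) { objects.push_back(o); return; }  // stays here
            if (!child[i]) {
                float ox = (i == 0 || i == 3) ? 0.5f : -0.5f;
                float oy = (i == 0 || i == 1) ? 0.5f : -0.5f;
                child[i] = std::make_unique<QuadNode>(cx + ox * halfSize, cy + oy * halfSize, 0.5f * halfSize);
            }
            child[i]->insert(o, depth + 1, maxDepth);
        }
    };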
By the way, the reason you're using a quad tree is because it's really really easy to work with. You don't have any heuristic based creation like you might with some BSP implementations. It's simple and it gets the job done.
"My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions."
There will be overhead to keeping your entities in the correct spots in the tree every time they move, yes, and it can be significant. But the whole point is that you're doing much much less work in your collision code. Even though you're adding some overhead with the tree traversal and update it should be much smaller than the overhead you just removed by using the tree at all.
Obviously depending on the number of objects, size of game world, etc etc the trade off might not be worth it. Usually it turns out to be a win, but it's hard to know without doing it.
There are lots of approaches. I'd recommend setting some specific goals (e.g., x collision tests per second with a ratio of y between the smallest and largest entities), and doing some prototyping to find the simplest approach that achieves those goals. You might be surprised how little work you have to do to get what you need. (Or it might be a ton of work, depending on your particulars.)
Many acceleration structures (e.g., a good BSP) can take a while to set up and thus are generally inappropriate for rapid animation.
There's a lot of literature out there on this topic, so spend some time searching and researching to come up with a list candidate approaches. Mock them up and profile.
I'd be tempted just to overlay a coarse grid over the play area to form a 2D hash. If the grid is at least the size of the largest entity then you only ever have 9 grid squares to check for collisions and it's a lot simpler than managing quad-trees or arbitrary BSP trees. The overhead of determining which coarse grid square you're in is typically just 2 arithmetic operations and when a change is detected the grid just has to remove one reference/ID/pointer from one square's list and add the same to another square.
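A rough sketch of such a coarse grid (assumes the play area spans [0, worldSize) on both axes and the cell size is at least the largest entity; names are illustrative):

    #include <vector>

    struct Entity { int id; float x, y; };

    // Coarse uniform grid: with the cell size >= the largest entity, any collision
    // partner of an entity is in its own cell or one of the 8 surrounding cells.
    class CoarseGrid {
    public:
        CoarseGrid(float worldSize, float cellSize)
            : cellSize_(cellSize), cols_(int(worldSize / cellSize) + 1), cells_(cols_ * cols_) {}

        int cellIndex(float x, float y) const {            // the two arithmetic operations noted above
            return int(y / cellSize_) * cols_ + int(x / cellSize_);
        }

        void insert(const Entity& e) { cells_[cellIndex(e.x, e.y)].push_back(e.id); }

        // Collect candidates from the 3x3 block of cells around (x, y).
        std::vector<int> nearby(float x, float y) const {
            std::vector<int> out;
            int cx = int(x / cellSize_), cy = int(y / cellSize_);
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = cx + dx, ny = cy + dy;
                    if (nx < 0 || ny < 0 || nx >= cols_ || ny >= cols_) continue;
                    const auto& cell = cells_[ny * cols_ + nx];
                    out.insert(out.end(), cell.begin(), cell.end());
                }
            return out;
        }

    private:
        float cellSize_;
        int cols_;
        std::vector<std::vector<int>> cells_;
    };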
Further gains can be had from keeping the projectiles out of the grid/tree/etc lookup system - since you can quickly determine where the projectile would be in the grid, you know which grid squares to query for potential collidees. If you check collisions against the environment for each projectile in turn, there's no need for the other entities to then check for collisions against the projectiles in reverse.
