How should I index for a simple world of rectangles?

The world consists of many (1k-10k) rectangles of similar sizes, and I need to be able to quickly determine potential overlaps when trying to add a new rectangle. Rectangles will be added and removed dynamically. Are R-Trees appropriate here? If so, are there any good libraries I should consider? (I'm open to suggestions in any language).

R-Trees would be suitable, yes.
Quad trees are also a good data structure for quickly finding objects in a region of 2D space. They are essentially a more uniform cousin of R-trees: space is subdivided into fixed quadrants rather than into boxes that adapt to the data. Using these structures you can zero in on a small region of space with very few tests, even with massive data sets.
There is a C# implementation here, though I have not looked at it.
This kind of data structure (and its 3D version, the octree) is often used in games to manage large sets of objects that need to know whether they are near any other objects, for collision testing and all kinds of other fun reasons.
You should be able to find lots of articles and examples of these kinds of data structures on games-industry sites such as Gamasutra and opengl.org.
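To make the idea concrete for the rectangle world in the question, here is a minimal, illustrative quadtree sketch in C++; the Rect/QuadTree names and the max-depth value are my own, not from any particular library. Each rectangle is stored in the smallest node that fully contains it, and an overlap query only visits nodes whose bounds intersect the query rectangle.

    // A minimal quadtree sketch for rectangle overlap queries; Rect, QuadTree
    // and kMaxDepth are illustrative names, not from any particular library.
    #include <memory>
    #include <vector>

    struct Rect {
        float x, y, w, h;   // top-left corner and size
        bool overlaps(const Rect& o) const {
            return x < o.x + o.w && o.x < x + w &&
                   y < o.y + o.h && o.y < y + h;
        }
        bool contains(const Rect& o) const {
            return o.x >= x && o.y >= y &&
                   o.x + o.w <= x + w && o.y + o.h <= y + h;
        }
    };

    class QuadTree {
    public:
        explicit QuadTree(Rect bounds, int depth = 0) : bounds_(bounds), depth_(depth) {}

        // Store r in the smallest node that fully contains it.
        void insert(const Rect& r) {
            if (depth_ < kMaxDepth) {
                if (!children_[0]) subdivide();
                for (auto& c : children_)
                    if (c->bounds_.contains(r)) { c->insert(r); return; }
            }
            items_.push_back(r);   // straddles a boundary, or max depth reached
        }

        // Collect every stored rectangle that overlaps 'query'.
        void query(const Rect& query, std::vector<Rect>& out) const {
            if (!bounds_.overlaps(query)) return;
            for (const auto& r : items_)
                if (r.overlaps(query)) out.push_back(r);
            if (children_[0])
                for (const auto& c : children_) c->query(query, out);
        }

    private:
        static constexpr int kMaxDepth = 8;

        void subdivide() {
            float hw = bounds_.w / 2, hh = bounds_.h / 2;
            children_[0] = std::make_unique<QuadTree>(Rect{bounds_.x,      bounds_.y,      hw, hh}, depth_ + 1);
            children_[1] = std::make_unique<QuadTree>(Rect{bounds_.x + hw, bounds_.y,      hw, hh}, depth_ + 1);
            children_[2] = std::make_unique<QuadTree>(Rect{bounds_.x,      bounds_.y + hh, hw, hh}, depth_ + 1);
            children_[3] = std::make_unique<QuadTree>(Rect{bounds_.x + hw, bounds_.y + hh, hw, hh}, depth_ + 1);
        }

        Rect bounds_;
        int depth_;
        std::vector<Rect> items_;
        std::unique_ptr<QuadTree> children_[4] = {};
    };

Adding a new rectangle then becomes: query() with the candidate, inspect the (usually short) list of potential overlaps, and insert() if it is acceptable; removal walks down to the node holding the rectangle the same way.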

You can also look at kd-trees.
I don't know of a particular implementation, but in 3D at least they are usually considered more performant than octrees. For example, here is a write-up of practical experience that I found with a quick search.
You may want to consider them as an alternative to quad trees if you ever run into performance problems.
However, it should be noted that kd-trees are hard to rebalance...

Related

Best algorithm for spatial partitioning/collision detection on objects from tiny to massive size?

I've looked around and found a billion questions, articles, studies, theses, etc., but I haven't been able to really figure out or find an answer to this question.
Basically, I'm just wondering what the best algorithm(s) are for spatial partitioning/collision detection between objects ranging from 1 pixel to the size of the screen itself. Currently, I'm leaning towards the loose quadtree.
First, have a look here.
It depends on your requirements. Quadtrees are perfect for small to medium sized datasets up to 100K entries or so. If that fits your requirements, there is no need to read further.
However, normal quadtrees tend to have difficulties with very large or strongly clustered (high point density in some areas) datasets. They are also not entirely straightforward to implement, because with a large tree you can run into precision problems (divide a number by 2 thirty times as you descend, and your quadrants start overlapping or leaving gaps between them). Otherwise they are relatively easy to implement.
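As a tiny, self-contained illustration of that precision problem (my own example, not from the answer above): with single-precision coordinates around 1000, repeatedly halving an interval stops producing new midpoints after roughly two dozen levels, which is exactly when neighbouring quadrants start to overlap or leave gaps.

    // Halve a single-precision interval far from the origin: well before 30
    // levels the computed midpoint equals an endpoint.
    #include <cstdio>

    int main() {
        float lo = 1000.0f, hi = 2000.0f;
        for (int depth = 1; depth <= 30; ++depth) {
            float mid = 0.5f * (lo + hi);
            if (mid == lo || mid == hi) {
                std::printf("midpoint collapsed at depth %d\n", depth);
                break;
            }
            hi = mid;   // always descend into the lower half
        }
        return 0;
    }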
I have found my PH-Tree quite useful; it is somewhat similar to a quadtree, but has no precision problems and an inherently limited depth. In my experience it's quite fast, especially for window queries with a small result set (which is basically what you do in collision detection). Unfortunately it's not that easy to implement. The link above references my own Java implementation, but the documentation also contains a link to a C++ version.
You can also try the 'qthypercube2' here; it's a quadtree that borrows some of the PH-Tree's navigation techniques. The current 1.6 version has some precision problems with extreme datasets, but you will find it to be very fast, and a solution to the precision problem is already in the pipeline.

Concerning Octrees

I am creating a Minecraft-like terrain engine, and I was wondering what exactly octrees are. With my engine I have separated each part of it into chunks or regions, which, from what I have read, has something to do with octrees. Also, I was wondering whether indices increase performance within a game and, if so, by how much. Any other ideas/ways to increase performance would be much appreciated. Note that I have already included backface culling, and that if a box or a side is hidden I don't draw that side.
Read this excellent article on FlipCode.
Googling for "octree" together with flipcode or GameDev.net will give you a lot of references.
Thoughts on performance are hard to give, because a lot depends on what you are doing: how many changes are being made to the world, whether any objects are moving, and what you want to use the octree for (visibility, collision detection, rendering, ...). Read about k-d trees too, because they might be more appropriate for your problem.
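For orientation, here is a minimal, hypothetical sketch of what an octree node for chunked voxel terrain can look like in C++; the names and the uniform-block idea are illustrative and not tied to any engine. A node either describes a uniform cube of one block type or splits into eight child octants, and a lookup walks down until it reaches a uniform cube.

    // Illustrative octree node for voxel terrain; not tied to any engine.
    #include <array>
    #include <cstdint>
    #include <memory>

    struct OctreeNode {
        std::uint8_t uniformType = 0;                        // e.g. 0 = air
        std::array<std::unique_ptr<OctreeNode>, 8> children; // all empty => leaf

        bool isLeaf() const { return !children[0]; }
    };

    // Return the block type at (x, y, z) inside a cube of side 'size' whose
    // minimum corner is at the origin. Assumes size is a power of two and that
    // a non-leaf node has all eight children allocated.
    std::uint8_t lookup(const OctreeNode& node, int x, int y, int z, int size) {
        if (node.isLeaf()) return node.uniformType;
        int half = size / 2;
        int index = (x >= half ? 1 : 0) | (y >= half ? 2 : 0) | (z >= half ? 4 : 0);
        return lookup(*node.children[index],
                      x >= half ? x - half : x,
                      y >= half ? y - half : y,
                      z >= half ? z - half : z,
                      half);
    }

The payoff for terrain is that large homogeneous regions (solid stone, open air) collapse into single nodes, so both memory use and traversal cost shrink compared with storing every voxel individually.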

Good data structure for Euclidean 3D data queries?

What's a good way to store point cloud data so that it's optimal for an application that will do one of these two queries?
Nearest (i.e. lowest Euclidean distance) data point to (x,y,z)
Get all the points inside a sphere with radius R around a point (x,y,z)
The structure will only be filled once, but read many times. A lowish memory footprint would be nice as I may be dealing with datasets of > 7 million points, but speed should be of primary concern. A library would be nice, but I wouldn't mind implementing it myself if it's something doable with limited expertise in the area.
Thanks in advance!
With a kd-tree you get O(log n) nearest-neighbor queries, and range queries will usually be fast as well.
There are a bunch of libraries referenced there. I have not used any of them.
You might also look at CGAL. I have used CGAL for other things; it is tolerably fast and extremely comprehensive, but the documentation will drive you to drink.
A huge portion of the decision about data structures will depend on the spatial organization of the data. For example, highly clustered data tends to have different performance characteristics in kd-trees than evenly distributed data.
KD-Trees are very good for both of these queries.
Octree can be a good option in many cases, as well, and is potentially easier to implement.
There are many libraries out there that do this, using various algorithms. Searching for k-nearest-neighbor search will turn up many useful libraries. I've had pretty good luck with ANN in the past, for example.
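For the second query above (all points within radius R), here is a hedged C++ sketch of a kd-tree built once over a fixed point set; the KdTree/KdNode names are mine, and the code skips details like duplicate points, but it shows the build-by-median and the prune-by-splitting-plane logic that makes these queries fast.

    // Illustrative kd-tree sketch, not a library API.
    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <vector>

    using Point = std::array<float, 3>;

    struct KdNode {
        Point pt;
        int axis;        // splitting dimension: 0, 1 or 2
        int left, right; // indices into the node pool, -1 = none
    };

    struct KdTree {
        std::vector<KdNode> nodes;

        // Build over pts[begin, end); returns the index of the subtree root.
        int build(std::vector<Point>& pts, int begin, int end, int depth) {
            if (begin >= end) return -1;
            int axis = depth % 3;
            int mid = begin + (end - begin) / 2;
            std::nth_element(pts.begin() + begin, pts.begin() + mid, pts.begin() + end,
                             [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
            int id = static_cast<int>(nodes.size());
            nodes.push_back({pts[mid], axis, -1, -1});
            int left  = build(pts, begin, mid, depth + 1);
            int right = build(pts, mid + 1, end, depth + 1);
            nodes[id].left = left;
            nodes[id].right = right;
            return id;
        }

        // Collect every stored point within distance r of center c.
        void radiusQuery(int id, const Point& c, float r, std::vector<Point>& out) const {
            if (id < 0) return;
            const KdNode& n = nodes[id];
            float dx = n.pt[0] - c[0], dy = n.pt[1] - c[1], dz = n.pt[2] - c[2];
            if (dx * dx + dy * dy + dz * dz <= r * r) out.push_back(n.pt);
            float d = c[n.axis] - n.pt[n.axis];                // signed distance to the split plane
            radiusQuery(d < 0 ? n.left : n.right, c, r, out);  // side containing the center
            if (std::fabs(d) <= r)                             // sphere reaches across the plane
                radiusQuery(d < 0 ? n.right : n.left, c, r, out);
        }
    };

    // Usage: int root = tree.build(points, 0, (int)points.size(), 0);
    //        tree.radiusQuery(root, center, R, results);

Nearest-neighbor search works the same way, except you shrink the search radius to the best distance found so far as you descend.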

Storing Large 2D Game Worlds

I've been experimenting with different ideas of how to store a 2D game world. I'm interested in hearing techniques for storing large quantities of objects while managing the set that's visible (let's say a world 100,000 tiles square). Obviously the techniques can vary based on how the game renders that space.
Let's assume that we're describing a scrolling 2D game world rather than a screen-based one, since you could fairly easily do screen-based rendering from such a setup while the converse is a bit messier.
Looking for language-agnostic solutions here so it's more helpful to others.
Edit: I think a good answer here would be a general review of the ideas to consider when thinking about this, as some of the responders have attempted, but also begin to explain how different solutions would apply to those scenarios. It's a somewhat complex question, so I would expect a good answer to reflect that.
Quadtrees are a fairly efficient solution for storing data about a large 2-dimensional world and the objects within it.
You might get some ideas on how to implement this from some spatial data structures like range or kd trees.
However, the answer to this question would vary considerably depending exactly on how your game works.
Are we talking a 2D platformer with 10 enemies onscreen, 20 more offscreen but "active", and an unknown number more "inactive"? If so, you can probably store your whole level as an array of "screens" where you manipulate the ones closest to you.
Or do you mean a true 2D game with lots of up/down movement too? You might have to be a bit more careful here.
The platform is also of some importance. If you're implementing a simple platformer for desktop PCs, you probably wouldn't have to worry about performance as much as you would on an embedded device. This is no excuse to be naive about it, but you might not have to be terribly clever either.
This is a somewhat interesting question I think. Presumably someone smarter than I who has experience with implementing platformers has thought these things out already.
Break the world into smaller areas, and deal with them. Any solution to this problem is going to boil down to this concept (such as quadtrees, mentioned in another answer). The differences will be in how they subdivide the world.
How much data is stored per tile? How fast can players move across the world? What's the behavior of NPCs, etc., that are offscreen? Do they just reset when the player comes back (like old Zelda games)? Do they simply resume where they were? Do they do some kind of catch-up script?
How much different rendering data is going to be needed for different areas?
How much of the world can be seen at one time?
All of these questions are going to impact your solution, as will the capabilities of your platform. Coming up with a general answer without a reasonable idea of these parameters is going to be a bit difficult.
Assuming that your game will only update what is visible and some area around what is visible, just break the world into "screens" (a "screen" is a rectangular area of the tilemap that can fill the whole screen). Keep in memory the "screens" around the visible area (and a few more if you want to update entities that are close to the character; there is little reason to update an entity that is far away) and keep the rest on disk, with a cache to avoid loading/unloading commonly visited areas as you move around. Something like this setup:
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|VVV|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
Where "V" part is the "screen" where the center (hero or whatever) is, the "N" parts are those who are nearby and have active (updating) entities, are checked for collisions, etc and "F" parts are far parts which might get updated infrequently and are prone to be "swapped" out (stored to disk). Of course you might want to use more "N" screens than two :-).
Note, by the way, that since 2D games do not usually hold much data, instead of saving the far-away parts to disk you might want to just keep them in memory, compressed.
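Here is a minimal sketch of that scheme with chunks ("screens") keyed by integer coordinates; the ChunkManager name, the radii, and the loadChunk/saveChunk stubs are hypothetical placeholders for your own cache/disk layer, and tile coordinates are assumed non-negative.

    // Illustrative chunk streaming sketch; loadChunk/saveChunk are stubs.
    #include <cstdint>
    #include <cstdlib>
    #include <map>
    #include <utility>
    #include <vector>

    struct Chunk {
        static constexpr int kSize = 64;   // tiles per side of one "screen"
        std::vector<std::uint8_t> tiles = std::vector<std::uint8_t>(kSize * kSize);
    };

    class ChunkManager {
    public:
        void update(int heroTileX, int heroTileY) {
            int cx = heroTileX / Chunk::kSize, cy = heroTileY / Chunk::kSize;
            // Make sure the "V" and "N" screens are loaded.
            for (int dy = -kActiveRadius; dy <= kActiveRadius; ++dy)
                for (int dx = -kActiveRadius; dx <= kActiveRadius; ++dx)
                    touch(cx + dx, cy + dy);
            // Swap out anything farther away than the ring we keep cached.
            for (auto it = loaded_.begin(); it != loaded_.end();) {
                if (std::abs(it->first.first - cx) > kKeepRadius ||
                    std::abs(it->first.second - cy) > kKeepRadius) {
                    saveChunk(it->first, it->second);   // hypothetical disk write
                    it = loaded_.erase(it);
                } else {
                    ++it;
                }
            }
        }

    private:
        static constexpr int kActiveRadius = 2;  // the "N" screens
        static constexpr int kKeepRadius = 3;    // beyond this, swap out ("F")
        std::map<std::pair<int, int>, Chunk> loaded_;

        void touch(int cx, int cy) {
            auto key = std::make_pair(cx, cy);
            if (!loaded_.count(key)) loaded_[key] = loadChunk(key);   // hypothetical disk read
        }
        Chunk loadChunk(std::pair<int, int>) { return Chunk{}; }      // stub
        void saveChunk(const std::pair<int, int>&, const Chunk&) {}   // stub
    };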
You probably want to use a single int or byte array that maps each tile to a block type. If you need more optimization from there, then you can link to more complicated data structures like octrees from your array. There is a good discussion on a Java game forum here: http://www.javagaming.org/index.php/topic,20505.30.html
Anything with links becomes very expensive, because each pointer takes up something like 8 bytes depending on the language, so depending on how densely populated your world is it can get expensive very quickly (8 pointers at 8 bytes each is 64 bytes per item, versus 1 byte per item for a byte array). So unless less than 1/64th of your world is actually occupied, a byte array is going to be a much better option. You're also going to spend a lot of time iterating down the tree whenever you do a lookup for collision or whatever else, whereas a byte array gives you an instantaneous lookup.
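A sketch of that flat layout (illustrative names only): one byte of block type per tile, addressed directly from the coordinates.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct TileWorld {
        int width, height;
        std::vector<std::uint8_t> blocks;   // width * height block types, one byte each

        TileWorld(int w, int h)
            : width(w), height(h),
              blocks(static_cast<std::size_t>(w) * h, 0) {}

        std::uint8_t get(int x, int y) const {
            return blocks[static_cast<std::size_t>(y) * width + x];
        }
        void set(int x, int y, std::uint8_t type) {
            blocks[static_cast<std::size_t>(y) * width + x] = type;
        }
    };

Note that at the 100,000 x 100,000 tiles mentioned in the question, one byte per tile is already about 10 GB, so in practice you would apply this layout per chunk and combine it with a swapping scheme like the one described above.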
Hopefully that's detailed enough for you. :-)

Improving raytracer performance

I'm writing a comparatively straightforward raytracer/path tracer in D (http://dsource.org/projects/stacy), but even with full optimization it still needs several thousand processor cycles per ray. Is there anything else I can do to speed it up? More generally, do you know of good optimizations / faster approaches for ray tracing?
Edit: this is what I'm already doing.
Code is already running highly parallel
temporary data is structured in a cache-efficient fashion and aligned to 16 bytes
Screen divided into 32x32 tiles
Destination array is arranged in such a way that all subsequent pixels in a tile are sequential in memory
Basic scene graph optimizations are performed
Common combinations of objects (plane-plane CSG as in boxes) are replaced with preoptimized objects
Vector struct capable of taking advantage of GDC's automatic vectorization support
Subsequent hits on a ray are found via lazy evaluation; this prevents needless calculations for CSG
Triangles are neither supported nor a priority: plain primitives only, plus CSG operations and basic material properties
Bounding is supported
The typical first order improvement of raytracer speed is some sort of spatial partitioning scheme. Based only on your project outline page, it seems you haven't done this.
Probably the most common approach is an octree, but the best approach may well be a combination of methods (e.g. spatial partitioning trees plus things like mailboxing). Bounding box/sphere tests are a quick, cheap and nasty approach, but you should note two things: 1) they don't help much in many situations and 2) if your objects are already simple primitives, you aren't going to gain much (and might even lose). A regular grid is easier to implement than an octree for spatial partitioning, but it will only work really well for scenes whose surfaces are somewhat uniformly distributed.
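For reference, the cheap bounding-box test mentioned above is usually the classic "slab" ray/AABB intersection; here is a hedged C++ sketch, which assumes the reciprocal ray direction is precomputed once per ray and ignores the zero-component edge cases.

    #include <algorithm>

    struct Vec3 { float x, y, z; };

    // Slab test: intersect the ray with the three pairs of axis-aligned planes
    // and check that the resulting parameter intervals overlap.
    bool rayHitsBox(const Vec3& origin, const Vec3& invDir,
                    const Vec3& boxMin, const Vec3& boxMax)
    {
        float t1 = (boxMin.x - origin.x) * invDir.x;
        float t2 = (boxMax.x - origin.x) * invDir.x;
        float tmin = std::min(t1, t2), tmax = std::max(t1, t2);

        t1 = (boxMin.y - origin.y) * invDir.y;
        t2 = (boxMax.y - origin.y) * invDir.y;
        tmin = std::max(tmin, std::min(t1, t2));
        tmax = std::min(tmax, std::max(t1, t2));

        t1 = (boxMin.z - origin.z) * invDir.z;
        t2 = (boxMax.z - origin.z) * invDir.z;
        tmin = std::max(tmin, std::min(t1, t2));
        tmax = std::min(tmax, std::max(t1, t2));

        return tmax >= std::max(tmin, 0.0f);   // hit, and in front of the origin
    }

In a partitioning scheme this kind of test is what lets you skip whole groups of primitives at once; the "ray slopes" paper linked in a later answer is a faster formulation of the same ray/AABB test.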
A lot depends on the complexity of the objects you represent, your internal design (i.e. do you allow local transforms, referenced copies of objects, implicit surfaces, etc), as well as how accurate you're trying to be. If you are writing a global illumination algorithm with implicit surfaces the tradeoffs may be a bit different than if you are writing a basic raytracer for mesh objects or whatever. I haven't looked at your design in detail so I'm not sure what, if any, of the above you've already thought about.
Like any performance optimization process, you're going to have to measure first to find where you're actually spending the time, then improve things (algorithmically by preference, then low-level code tuning by necessity).
One thing I learned with my ray tracer is that a lot of the old rules don't apply anymore. For example, many ray tracing algorithms do a lot of testing to get an "early out" of a computationally expensive calculation. In some cases, I found it was much better to eliminate the extra tests and always run the calculation to completion. Arithmetic is fast on a modern machine, but a missed branch prediction is expensive. I got something like a 30% speed-up on my ray-polygon intersection test by rewriting it with minimal conditional branches.
Sometimes the best approach is counter-intuitive. For example, I found that many scenes with a few large objects ran much faster when I broke them down into a large number of smaller objects. Depending on the scene geometry, this can allow your spatial subdivision algorithm to throw out a lot of intersection tests. And let's face it, intersection tests can be made only so fast. You have to eliminate them to get a significant speed-up.
Hierarchical bounding volumes help a lot, but I finally grokked the kd-tree, and got a HUGE increase in speed. Of course, building the tree has a cost that may make it prohibitive for real-time animation.
Watch for synchronization bottlenecks.
You've got to profile to be sure to focus your attention in the right place.
Is there anything else I can do to speed it up?
D, depending on the implementation and compiler, delivers reasonably good performance. As you haven't explained what ray tracing methods and optimizations you're already using, I can't give you much help there.
The next step, then, is to run a timing analysis on the program, and rewrite the most frequently executed or slowest code, whatever impacts performance the most, in assembly.
More generally, check out the resources in these questions:
Literature and Tutorials for Writing a Ray Tracer
Anyone know of a really good book about Ray Tracing?
Computer Graphics: Raytracing and Programming 3D Renders
raytracing with CUDA
I really like the idea of using a graphics card (a massively parallel computer) to do some of the work.
There are many other raytracing related resources on this site, some of which are listed in the sidebar of this question, most of which can be found in the raytracing tag.
I don't know D at all, so I'm not able to look at the code and find specific optimizations, but I can speak generally.
It really depends on your requirements. One of the simplest optimizations is just to reduce the number of reflections/refractions that any particular ray can follow, but then you start to lose out on the "perfect result".
Raytracing is also an "embarrassingly parallel" problem, so if you have the resources (such as a multi-core processor), you could look into calculating multiple pixels in parallel.
Beyond that, you'll probably just have to profile and figure out what exactly is taking so long, then try to optimize that. Is it the intersection detection? Then work on optimizing the code for that, and so on.
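As a minimal illustration of the parallel-pixels point (sketched in C++ with OpenMP purely as an example; tracePixel is a trivial stand-in for whatever the real renderer does for one primary ray):

    #include <cstddef>
    #include <vector>

    struct Color { float r, g, b; };

    Color tracePixel(int x, int y) {            // trivial stand-in so the sketch is self-contained
        return { x * 0.001f, y * 0.001f, 0.0f };
    }

    void renderFrame(int width, int height, std::vector<Color>& framebuffer)
    {
        framebuffer.resize(static_cast<std::size_t>(width) * height);
        // Every pixel is independent, so rows can simply be handed out to cores.
        #pragma omp parallel for schedule(dynamic, 1)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                framebuffer[static_cast<std::size_t>(y) * width + x] = tracePixel(x, y);
    }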
Some suggestions.
Use bounding objects to fail fast.
Project the scene in a first pass (as common graphics cards do) and use raytracing only for the light calculations.
Parallelize the code.
Raytrace every other pixel and get the color in between by interpolation. If the colors vary greatly (you are on the edge of an object), raytrace the pixel in between as well; a sketch follows after this list. It is cheating, but on simple scenes it can almost double the performance while sacrificing only some image quality.
Render the scene on the GPU, then read it back. This will give you the first ray/scene hit at GPU speeds. If you do not have many reflective surfaces in your scene, this reduces most of your work to plain old rendering. Rendering CSG on the GPU is unfortunately not completely straightforward.
Read source code of PovRay for inspiration. :)
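Here is a hedged sketch of the every-other-pixel idea from the list above, assuming a hypothetical tracePixel(x, y) and a simple per-channel color-difference test: pixels on a checkerboard are traced, the rest are interpolated from their horizontal neighbours unless those differ too much (a likely object edge), in which case the skipped pixel is traced after all.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Color { float r, g, b; };

    Color tracePixel(int x, int y) {             // trivial stand-in so the sketch compiles
        return { x * 0.001f, y * 0.001f, 0.0f };
    }

    static float diff(const Color& a, const Color& b) {
        return std::fabs(a.r - b.r) + std::fabs(a.g - b.g) + std::fabs(a.b - b.b);
    }

    void renderCheckerboard(int width, int height, std::vector<Color>& img,
                            float edgeThreshold = 0.1f)
    {
        img.assign(static_cast<std::size_t>(width) * height, Color{0, 0, 0});
        auto at = [&](int x, int y) -> Color& {
            return img[static_cast<std::size_t>(y) * width + x];
        };
        for (int y = 0; y < height; ++y)          // pass 1: trace the checkerboard
            for (int x = y % 2; x < width; x += 2)
                at(x, y) = tracePixel(x, y);
        for (int y = 0; y < height; ++y)          // pass 2: fill in the gaps
            for (int x = 1 - y % 2; x < width; x += 2) {
                if (x == 0 || x == width - 1) { at(x, y) = tracePixel(x, y); continue; }
                const Color& l = at(x - 1, y);
                const Color& r = at(x + 1, y);
                if (diff(l, r) > edgeThreshold)   // likely an edge: pay for the ray
                    at(x, y) = tracePixel(x, y);
                else                              // flat region: cheap average
                    at(x, y) = Color{(l.r + r.r) * 0.5f, (l.g + r.g) * 0.5f,
                                     (l.b + r.b) * 0.5f};
            }
    }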
You first have to make sure that you use very fast algorithms (implementing them can be a real pain, but what you want to do, how far you want to go and how fast it should be is a kind of trade-off).
Some more hints from me:
- don't use mailboxing techniques; papers sometimes note that they don't scale well on current architectures because of the bookkeeping overhead
- don't use BSP trees or octrees; they are relatively slow
- don't use the GPU for raytracing; it is far too slow for advanced effects like reflection, shadows, refraction, photon mapping and so on (I only use it for shading, but that is my own preference)
For a completely static scene, kd-trees are unbeatable, and for dynamic scenes there are clever algorithms that scale very well on a quad core (I am not sure about performance beyond that).
And of course, for really good performance you need a lot of SSE code (without too many jumps), but for slightly lower performance (we are talking maybe 10-15% here) compiler intrinsics are enough to implement your SSE code.
And some decent papers about some of the algorithms I was talking about:
"Fast Ray/Axis-Aligned Bounding Box Overlap Tests using Ray Slopes"
(a very fast and very parallelizable (SSE) AABB/ray hit test; note that the code in the paper is not the complete code, just google for the title of the paper and you'll find the rest)
http://graphics.tu-bs.de/publications/Eisemann07RS.pdf
"Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies"
http://www.sci.utah.edu/~wald/Publications/2007///BVH/download//togbvh.pdf
If you know how the above algorithm works, then this is an even better one:
"The Use of Precomputed Triangle Clusters for Accelerated Ray Tracing in Dynamic Scenes"
http://garanzha.com/Documents/UPTC-ART-DS-8-600dpi.pdf
I'm also using the Plücker test to determine quickly (not that accurately, but well, you can't have everything) whether I hit a polygon; it works very nicely with SSE and up.
So my conclusion is that there are so many great papers out there about so many topics related to raytracing (how to build fast, efficient trees, how to shade (BRDF models), and so on and so on) that it is a really amazing and interesting field to experiment in, but you also need a lot of spare time, because it is damn complicated, yet fun.
My first question is: are you trying to optimize the tracing of one single still image, or is this about optimizing the tracing of multiple frames in order to render an animation?
Optimizing for a single shot is one thing; if you want to calculate successive frames in an animation there are lots of new things to think about and optimize.
You could
use an SAH-optimized bounding volume hierarchy...
...possibly using packet traversal,
introduce importance sampling,
access the tiles ordered by Morton code for better cache coherency (see the sketch at the end of this answer), and
much more - but those were the suggestions I could immediately think of. In more words:
You can build an optimized hierarchy based on statistics in order to quickly identify candidate nodes when intersecting geometry. In your case you will have to combine the automatic hierarchy with the modeling hierarchy, that is, either constrain the build or have it clone modeling information where necessary.
"Packet traversal" means you use SIMD instructions to trace several (typically 4) rays at once, advancing them through the hierarchy together (traversal is typically the hot spot), in order to squeeze the most performance out of the hardware.
You can perform some per-ray-statistics in order to control the sampling rate (number of secondary rays shot) based on the contribution to the resulting pixel color.
Using a space-filling curve over the tile lets you decrease the average distance between successively traced pixels and thus raise the probability that your performance benefits from cache hits.
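As a small illustration of the Morton-code point above (standard bit interleaving, no particular library assumed): interleaving the bits of a tile's (x, y) coordinates gives its index along a Z-order curve, and walking the tiles in that order keeps successive rays close together both on screen and in memory.

    #include <cstdint>

    // Spread the lower 16 bits of v so that one zero bit sits between each bit.
    static std::uint32_t spreadBits(std::uint32_t v) {
        v &= 0x0000FFFFu;
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    }

    // 32-bit Morton (Z-order) code for 16-bit tile coordinates.
    std::uint32_t mortonCode(std::uint32_t x, std::uint32_t y) {
        return spreadBits(x) | (spreadBits(y) << 1);
    }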
