I am creating a Minecraft like terrain engine thing, and I was wondering what exactly octrees are. With my engine I have seperated each part of it into chunks or regions - which from what I have read has something to do with it. Also, I was wondering if indices do increase performance within a game and if so how much? Any other ideas/ways to increase performance would be much appreciated. Note that I have already included backface culling and that if the box or a side is hidden don't show that side.
Read this excellent article on FlipCode
Googleing for Octree and flipcode or Gamedev.net will give you a lot of references.
Thoughts on performance are hard to give because a lot depends on what you are doing. (how many changes are being made to the 'world', are there any objects moving, what do you want to use the Octree for (visibility, collision detection, rendering, ...) Read about K-d-trees too because they might be more appropriate for your problem.
Related
The world consists of many (1k-10k) rectangles of similar sizes, and I need to be able to quickly determine potential overlaps when trying to add a new rectangle. Rectangles will be added and removed dynamically. Are R-Trees appropriate here? If so, are there any good libraries I should consider? (I'm open to suggestions in any language).
R-Trees would be suitable, yes.
quad trees are also a good data structure for quickly finding objects in a region of 2D space. They are really a more uniform version of r-trees. Using these structures you can quickly zero in on a small region of space, with very few tests, even with massive data sets.
There is a c# implementation here, though I have not looked at it.
This kind of data structure (and it's 3D version called Octrees) are often used in games to manage the large data sets of objects that need to know if they are near any other objects for collision testing, and all kinds of other fun reasons.
You should be able to find lots of articles and examples of these kinds of data structures in the games industry sites, like gamasutra and opengl.org
You can also look up to kd-trees.
I don't know of any implementation but in 3D at least they are usually considered more performant than Octrees. For example, here is a return of experience I just googled it.
You may want to consider alternative to quad trees if you ever have a problem of performance.
However it should be noted that kd-trees are hard to rebalance...
I've been experimenting with different ideas of how to store a 2D game world. I'm interested in hearing techniques of storing large quantities of objects while managing the set that's visible ( lets say 100,000 tiles square ). Obviously the techniques can vary based on how the game renders that space.
Lets assume that we're describing a scrolling 2d game world rather than screen based as you could fairly easily do screen based rendering from such a setup while the converse is a bit more messy.
Looking for language agnostic solutions here so it's more helpful to others.
Edit: I think a good answer here would be a general review of the ideas to consider when thinking about this, as some of the responders have attempted, but also begin to explain how different solutions would apply to those scenarios. It's a somewhat complex question, so I would expect a good answer to reflect that.
Quadtrees are a fairly efficient solution for storing data about a large 2-dimensional world and the objects within it.
You might get some ideas on how to implement this from some spatial data structures like range or kd trees.
However, the answer to this question would vary considerably depending exactly on how your game works.
Are we talking a 2D platformer with 10 enemies onscreen, 20 more offscreen but "active", and an unknown number more "inactive"? If so, you can probably store your whole level as an array of "screens" where you manipulate the ones closest to you.
Or do you mean a true 2D game with lots of up/down movement too? You might have to be a bit more careful here.
The platform is also of some importance. If you're implementing a simple platformer for desktop PCs, you probably wouldn't have to worry about performance as much as you would on an embedded device. This is no excuse to be naive about it, but you might not have to be terribly clever either.
This is a somewhat interesting question I think. Presumably someone smarter than I who has experience with implementing platformers has thought these things out already.
Break the world into smaller areas, and deal with them. Any solution to this problem is going to boil down to this concept (such as quadtrees, mentioned in another answer). The differences will be in how they subdivide the world.
How much data is stored per tile? How fast can players move across the world? What's the behavior of NPCs, etc., that are offscreen? Do they just reset when the player comes back (like old Zelda games)? Do they simply resume where they were? Do they do some kind of catch-up script?
How much different rendering data is going to be needed for different areas?
How much of the world can be seen at one time?
All of these questions are going to immpact your solution, as well as the capabilities of your platform. Coming up with a general answer for these without having a reasonable idea of these parameters is going to be a bit difficult.
Assuming that your game will only update what is visible and some area around what is visible, just break the world in "screens" (a "screen" is a rectangular area on the tilemap that can fill the whole screen). Keep in memory the "screens" around the visible area (and some more if you want to update entities which are close to the character - but there is little reason to update an entity that far away) and have the rest on disk with a cache to avoid loading/unloading of commonly visited areas when you move around. Some setup like:
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|VVV|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
Where "V" part is the "screen" where the center (hero or whatever) is, the "N" parts are those who are nearby and have active (updating) entities, are checked for collisions, etc and "F" parts are far parts which might get updated infrequently and are prone to be "swapped" out (stored to disk). Of course you might want to use more "N" screens than two :-).
Note btw that since 2D games do not usually hold much data instead of saving the far away parts to disk you might want to just keep them in memory compressed.
You probably want to use a single int or byte array that links to block types. If you need more optimization from there, then you'll want to link to more complicated data structures like oct trees from your array. There is a good discussion on a Java game forum here: http://www.javagaming.org/index.php/topic,20505.30.html text
Anything with links becomes very expensive because the pointer takes up something like 8 bytes each, depending upon the language, so depending upon how populated your world is it can get expensive very quickly (8 pointers 8 bytes each is 64 bytes per item, and a byte array is 1 byte per item). So unless 1/64 of your world is empty, a byte array is going to be a much better option. You're also going to need to spend a lot of time iterating down the tree whenever you're doing a lookup for collision or whatever else - a byte array will be an instantaneous lookup.
Hopefully that's detailed enough for you. :-)
Disclaimer: I'm not actually trying to make one I'm just curious as to how it could be done.
When I say "Most Accurate" I include the basics
wall
distance
light levels
and the more complicated
Dust in Atmosphere
rain, sleet, snow
clouds
vegetation
smoke
fire
If I were to want to program this, what resources should I look into and what things should I watch out for?
Also, are there any relevant books on the theory behind line of sight including all these variables?
I personally don't know too much about this topic but a quick couple of Google searches turns up some formal papers that contain some very relevant information:
http://www.tecgraf.puc-rio.br/publications/artigo_1999_efficient_lineofsight_algorithms.pdf - Provides a detailed description of two different methods of efficiently performing an LOS calculation, along with issues involved
http://www.agc.army.mil/operations/programs/LOS/LOS%20Compendium.doc - This one aims to maintain "a current list of unique LOS algorithms"; it has a section listing quite a few and describing them in detail with a focus on military applications.
Hope this helps!
Typically, one represents the world as a set of volumes of space held in some kind of space partitioning data structure, then intersects the ray representing your "line of sight" with that structure to find the set of objects it hits; these are then walked in order from ray origin to determine the overall result. Reflective objects cause further rays to be fired, opaque objects stop the walk and semitransparent objects partially contribute to the result.
You might like to read up on ray tracing; there is a great body of literature on the subject and well-understood ways of solving what are basically the same problems you list exist.
The obvious question is do you really want the most accurate, and why?
I've worked on games that depended on line of sight and you really need to think clearly about what kind of line of sight you want.
First, can the AI see any part of your body? Or are you talking about "eye to eye" LOS?
Second, if the player's camera view is not his avatar's eye view, the player will not perceive your highly accurate LOS as highly accurate. At which point inaccuracies are fine.
I'm not trying to dissuade you, but remember that player experience is #1, and that might mean not having the best LOS.
A good friend of mine has done the AI for a long=-running series of popular console games. He often tells a story about how the AIs are most interesting (and fun) in the first game, because they stumble into you rather than see you from afar. Now, he has great LOS and spends his time trying to dumb them down to make them as fun as they were in the first game.
So why are you doing this? Does the game need it? Or do you just want the challenge?
There is no "one algorithm" for these since the inputs are not well defined.
If you treat Dust-In-Atmosphere as a constant value then there is an algorithm that can take it into account, but the fact is that dust levels will vary from point to point, and thus the algorithm you want needs to be aware of how your dust-data is structured.
The most used algorithm in todays ray-tracers is just incremental ray-marching, which is by definition not correct, but it does approximate the Ultimate Answer to a fair degree.
Even if you managed to incorporate all these properties into a single master-algorithm, you'd still have to somehow deal with how different people perceive the same setting. Some people are near-sighted, some far-sighted. Then there's the colour-blind. Not to mention that Dust-In-Atmosphere levels also affect tear-glands, which in turn affects visibility. And then there's the whole dichotomy between what people are actually seeying and what they think they are seeying...
There are far too many variables here to aim for a unified solution. Treat your environment as a voxelated space and shoot your rays through it. I suspect that's the only solution you'll be able to complete within a single lifetime...
I am doing a maprouting application. Several people have suggested me, that I do a datastructure where I split the map in a grid. In theory it sounds really good, but I am not to sure because of the bad performance I get when I implement it.
In the worst case you have to draw every road. If you divide the map in a grid, the sum of roads in all the cells in the grid, will be much larger than if you put all roads in a list.(each cell must have more roads than actually needed if a road goes through it).
If I have to zoom in I can see some smartness in using a grid, but if I keep it in a list I can just decrease the numbers of roads each time I zoom in.
As it is now(by using the list) it is not really fast, so I am all for making it faster. But in practice dividing in a grid makes it slower for me.
Any suggestigion for what datastructure I should be using and/or what I might be doing wrong?
See this question for related information:
What algorithms compute directions from point A to point B on a map?
Somebody who writes this kind of software for a living has answered it.
Also for rendering see:
What is the best way to read, represent and render map data?
I'm not quite sure if you're trying to do routing quick or rendering!
If you want it to go quick, you might be better off organizing your roads in to major and minor roads.
Use the list of minor roads to find a route to the nearest major road.
Use the major roads to get you near the destination.
Then go back to the minor roads to complete the route.
Without a split like this, there are a heck of a lot of roads to search, most of which are quite slow routes.
google does not draw each road every time the screen is refreshed. They used pre-drawn tiles of the map. They can redraw them as needed. e.g. when there is a map update. They even use transparent overlays, stacks of tiles to add and remove layers of details.
Very clever, but very simple.
You may want to look at openlayers javascript library. Free and can do just about anything you need to do with a map.
Maptraction JS is also available - its not as complete as OpenLayers
More optimal then using a grid as your spatial data structure, might be a quadtree because it logarithmically breaks down the map. And from studying the source, my guesstimate is that google uses (that or) a similar data structure.
As for getting directions, you might want to look in to hierarchical path finding to approximate the direction at first and to speed up the process; generic path finding algorithms tend to be quite slow at that level of complexity.
I'm writing a comparatively straightforward raytracer/path tracer in D (http://dsource.org/projects/stacy), but even with full optimization it still needs several thousand processor cycles per ray. Is there anything else I can do to speed it up? More generally, do you know of good optimizations / faster approaches for ray tracing?
Edit: this is what I'm already doing.
Code is already running highly parallel
temporary data is structured in a cache-efficient fashion as well as aligned to 16b
Screen divided into 32x32-tiles
Destination array is arranged in such a way that all subsequent pixels in a tile are sequential in memory
Basic scene graph optimizations are performed
Common combinations of objects (plane-plane CSG as in boxes) are replaced with preoptimized objects
Vector struct capable of taking advantage of GDC's automatic vectorization support
Subsequent hits on a ray are found via lazy evaluation; this prevents needless calculations for CSG
Triangles neither supported nor priority. Plain primitives only, as well as CSG operations and basic material properties
Bounding is supported
The typical first order improvement of raytracer speed is some sort of spatial partitioning scheme. Based only on your project outline page, it seems you haven't done this.
Probably the most usual approach is an octree, but the best approach may well be a combination of methods (e.g. spatial partitioning trees and things like mailboxing). Bounding box/sphere tests are a quick cheap and nasty approach, but you should note two things: 1) they don't help much in many situations and 2) if your objects are already simple primitives, you aren't going to gain much (and might even lose). You can more easily (than octree) implement a regular grid for spatial partitioning, but it will only work really well for scenes that are somewhat uniformly distributed (in terms of surface locations)
A lot depends on the complexity of the objects you represent, your internal design (i.e. do you allow local transforms, referenced copies of objects, implicit surfaces, etc), as well as how accurate you're trying to be. If you are writing a global illumination algorithm with implicit surfaces the tradeoffs may be a bit different than if you are writing a basic raytracer for mesh objects or whatever. I haven't looked at your design in detail so I'm not sure what, if any, of the above you've already thought about.
Like any performance optimization process, you're going to have to measure first to find where you're actually spending the time, then improving things (algorithmically by preference, then code bumming by necessity)
One thing I learned with my ray tracer is that a lot of the old rules don't apply anymore. For example, many ray tracing algorithms do a lot of testing to get an "early out" of a computationally expensive calculation. In some cases, I found it was much better to eliminate the extra tests and always run the calculation to completion. Arithmetic is fast on a modern machine, but a missed branch prediction is expensive. I got something like a 30% speed-up on my ray-polygon intersection test by rewriting it with minimal conditional branches.
Sometimes the best approach is counter-intuitive. For example, I found that many scenes with a few large objects ran much faster when I broke them down into a large number of smaller objects. Depending on the scene geometry, this can allow your spatial subdivision algorithm to throw out a lot of intersection tests. And let's face it, intersection tests can be made only so fast. You have to eliminate them to get a significant speed-up.
Hierarchical bounding volumes help a lot, but I finally grokked the kd-tree, and got a HUGE increase in speed. Of course, building the tree has a cost that may make it prohibitive for real-time animation.
Watch for synchronization bottlenecks.
You've got to profile to be sure to focus your attention in the right place.
Is there anything else I can do to speed it up?
D, depending on the implementation and compiler, puts forth reasonably good performance. As you haven't explained what ray tracing methods and optimizations you're using already, then I can't give you much help there.
The next step, then, is to run a timing analysis on the program, and recode the most frequently used code or slowest code than impacts performance the most in assembly.
More generally, check out the resources in these questions:
Literature and Tutorials for Writing a Ray Tracer
Anyone know of a really good book about Ray Tracing?
Computer Graphics: Raytracing and Programming 3D Renders
raytracing with CUDA
I really like the idea of using a graphics card (a massively parallel computer) to do some of the work.
There are many other raytracing related resources on this site, some of which are listed in the sidebar of this question, most of which can be found in the raytracing tag.
I don't know D at all, so I'm not able to look at the code and find specific optimizations, but I can speak generally.
It really depends on your requirements. One of the simplest optimizations is just to reduce the number of reflections/refractions that any particular ray can follow, but then you start to lose out on the "perfect result".
Raytracing is also an "embarrassingly parallel" problem, so if you have the resources (such as a multi-core processor), you could look into calculating multiple pixels in parallel.
Beyond that, you'll probably just have to profile and figure out what exactly is taking so long, then try to optimize that. Is it the intersection detection? Then work on optimizing the code for that, and so on.
Some suggestions.
Use bounding objects to fail fast.
Project the scene at a first step (as common graphic cards do) and use raytracing only for light calculations.
Parallelize the code.
Raytrace every other pixel. Get the color in between by interpolation. If the colors vary greatly (you are on an edge of an object), raytrace the pixel in between. It is cheating, but on simple scenes it can almost double the performance while you sacrifice some image quality.
Render the scene on GPU, then load it back. This will give you the first ray/scene hit at GPU speeds. If you do not have many reflective surfaces in the scene, this would reduce most of your work to plain old rendering. Rendering CSG on GPU is unfortunately not completely straightforward.
Read source code of PovRay for inspiration. :)
You have first to make sure that you use very fast algorithms (implementing them can be a real pain, but what do you want to do and how far want you to go and how fast should it be, that's a kind of a tradeof).
some more hints from me
- don't use mailboxing techniques, in papers it is sometimes discussed that they don't scale that well with the actual architectures because of the counting overhead
- don't use BSP/Octtrees, they are relative slow.
- don't use the GPU for Raytracing, it is far too slow for advanced effects like reflection and shadows and refraction and photon-mapping and so on ( i use it only for shading, but this is my beer)
For a complete static scene kd-Trees are unbeatable and for dynamic scenes there are clever algorithms there that scale very well on a quadcore (i am not sure about the performance above).
And of course, for a realy good performance you need to use very much SSE code (with of course not too much jumps) but for not "that good" performance (im talking here about 10-15% maybe) compiler-intrinsics are enougth to implement your SSE stuff.
And some decent Papers about some Algorithms i was talking about:
"Fast Ray/Axis-Aligned Bounding Box - Overlap Tests using Ray Slopes"
( very fast very good paralelisizable (SSE) AABB-Ray hit test )( note, the code in the paper is not all code, just google for the title of the paper, youll find it)
http://graphics.tu-bs.de/publications/Eisemann07RS.pdf
"Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies"
http://www.sci.utah.edu/~wald/Publications/2007///BVH/download//togbvh.pdf
if you know how the above algorithm works then this is a much greater algorithm:
"The Use of Precomputed Triangle Clusters for Accelerated Ray Tracing in Dynamic Scenes"
http://garanzha.com/Documents/UPTC-ART-DS-8-600dpi.pdf
I'm also using the pluecker-test to determine fast (not thaat accurate, but well, you can't have all) if i hit a polygon, works very pretty with SSE and above.
So my conclusion is that there are so many great papers out there about so much Topics that do relate to raytracing (How to build fast, efficient trees and how to shade (BRDF models) and so on and so on), it is an realy amazing and interesting field of "experimentating", but you need to have also much sparetime because it is so damn complicated but funny.
My first question is - are you trying to optimize the tracing of one single still screen,
or is this about optimizing the tracing of multiple screens in order to calculate an animation ?
Optimizing for a single shot is one thing, if you want to calculate successive frames in an animation there are lots of new things to think about / optimize.
You could
use an SAH-optimized bounding volume hierarchy...
...eventually using packet traversal,
introduce importance sampling,
access the tiles ordered by Morton code for better cache coherency, and
much more - but those were the suggestions I could immediately think of. In more words:
You can build an optimized hierarchy based on statistics in order to quickly identify candidate nodes when intersecting geometry. In your case you'll have to combine the automatic hierarchy with the modeling hierarchy, that is either constrain the build or have it eventually clone modeling information.
"Packet traversal" means you use SIMD instructions to compute 4 parallel scalars, each of an own ray for traversing the hierarchy (which is typically the hot spot) in order to squeeze the most performance out of the hardware.
You can perform some per-ray-statistics in order to control the sampling rate (number of secondary rays shot) based on the contribution to the resulting pixel color.
Using an area curve on the tile allows you to decrease the average space distance between the pixels and thus the probability that your performance benefits from cache hits.