Data structure / approach for efficient raytracing - performance

I'm writing a 3D raytracer as a personal learning project (Enlight) and have run into an interesting problem related to doing intersection tests between a ray and a scene of objects.
The situation is:
- I have a number of primitives that rays can intersect with (spheres, boxes, planes, etc.) and groups thereof. Collectively I'm calling these scene objects.
- I want to be able to transform primitives with arbitrary affine transformations by wrapping them in a Transform object (importantly, this will enable multiple instances of the same primitive(s) to be used in different positions in the scene, since primitives are immutable).
- Scene objects may be stored in a bounding volume hierarchy (i.e. I'm doing spatial partitioning).
- My intersection tests work with Ray objects that represent a partial ray segment (start vector, normalised direction vector, start distance, end distance).
The problem is that when a ray hits the bounding box of a Transform object, it looks like the only way to do an intersection test with the transformed primitives inside is to transform the Ray into the transformed co-ordinate space. This is easy enough, but if the ray then doesn't hit any of the transformed objects I need to fall back to the original Ray to continue the trace. Since Transforms may be nested, this means I have to maintain a whole stack of Rays for each intersection trace.
This is of course within the inner loop of the whole application and the primary performance bottleneck. It will be called millions of times a second so I'm keen to minimise complexity / avoid unnecessary memory allocation.
Is there a clever way to avoid having to allocate new Rays / keep a Ray stack?
Or is there a cleverer way of doing this altogether?

Most of the time in ray-tracing you have a few hundred or thousand objects and quite a few more rays, probably millions. That being the case, it makes sense to see what kind of computation you can spend on the objects in order to make it faster/easier for the rays to interact with them.
A cache will be very helpful, as boyfarrell suggested. It might make sense to not only create the forward and reverse transforms on the objects that would move them to or from the global frame, but also to keep a copy of each object in the global frame. This makes it more expensive to create objects or move them (because the transform changes, and so do the cached global-frame copies), but that's probably okay.
If you cast N rays and have M objects with N >> M, then it stands to reason that every object will have multiple rays hit it. If we assume every ray hits an object, then every object has N/M rays that strike it. That means transforming N/M rays into each object's frame, hit testing, and possibly reversing the transform back out: N/M transforms per object at minimum. But if we cache the transformed object, we can perform a single transform per object to get it into the global frame and then need no additional transforms, at least for the hit testing.
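To make the caching idea concrete, here is a minimal TypeScript sketch. Matrix4, Primitive, Ray and Hit are hypothetical stand-ins for whatever types the raytracer already defines, not a real library's API:

```typescript
// Hypothetical stand-ins for the raytracer's own types; illustrative only.
interface Matrix4 { invert(): Matrix4; }
interface Ray { /* start vector, direction, start/end distances */ }
interface Hit { distance: number; }
interface Primitive {
  transform(m: Matrix4): Primitive;   // returns a new primitive in the target frame
  intersect(ray: Ray): Hit | null;
}

class CachedSceneObject {
  readonly localToWorld: Matrix4;     // forward transform, cached
  readonly worldToLocal: Matrix4;     // reverse transform, computed once, not per ray
  readonly worldPrimitive: Primitive; // copy pre-transformed into the global frame

  constructor(readonly basePrimitive: Primitive, localToWorld: Matrix4) {
    this.localToWorld = localToWorld;
    this.worldToLocal = localToWorld.invert();
    this.worldPrimitive = basePrimitive.transform(localToWorld);
  }

  // Rays stay in the global frame: no per-ray transform and no Ray stack.
  intersect(ray: Ray): Hit | null {
    return this.worldPrimitive.intersect(ray);
  }
}
```

The cost moves from per-ray to per-object-change, which is the right trade when N >> M.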

Define your primitives in their base form (unit scale, centred on (0,0,0), not rotated) and then place them in the scene using transformations only. Cache the complete forward and reverse transformations in each object. (Do not forget the normal vectors; you will need them for reflections.)
This gives you the ability to test for a hit using simplified math (you reverse-transform the ray into object space and compute the hit against the base-form object) and then transform the hit point and any reflection vector back into real world space using the other transform.
You will need to compute intersections with all objects in the scene and select the hit that is closest to the ray origin (but not at a negative distance). To speed this up even more, enclose multiple objects in bounding boxes that are very cheap to hit-test and that pass the real-world ray on to the enclosed objects on a hit (all objects still use their precomputed matrices).
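A sketch of what the object-space test can look like for a base-form unit sphere, assuming plain array vectors. The cached reverse transform is applied to the ray's origin and direction before calling this, and the hit point and normal are mapped back with the forward transform (normals via its inverse transpose):

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Intersect a ray (origin o, direction d) with the unit sphere at the origin.
// d is deliberately NOT renormalised after the reverse transform; the a = d.d
// term accounts for that, and it keeps the returned t valid in world space too,
// since worldHit = worldOrigin + t * worldDirection under the same affine map.
function hitUnitSphere(o: Vec3, d: Vec3): number | null {
  const a = dot(d, d);
  const b = 2 * dot(o, d);
  const c = dot(o, o) - 1;
  const disc = b * b - 4 * a * c;
  if (disc < 0) return null;          // ray misses the sphere entirely
  const sq = Math.sqrt(disc);
  const t0 = (-b - sq) / (2 * a);
  const t1 = (-b + sq) / (2 * a);
  const t = t0 > 0 ? t0 : t1;         // near root, or far root if origin is inside
  return t > 0 ? t : null;
}
```

Because the ray parameter t survives the affine transform unchanged, there is no need to transform the ray back on a miss: the original world-space ray was never modified.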

Related

Efficient way to create click targets larger than the actual scene object

What’s a good way to have click targets that are larger than the actual scene object?
So far we have been using a larger invisible (yet raycastable) object to do this but it comes at the cost of requiring two draw calls instead of one.
Are there any better solutions?
So far we have been using a larger invisible (yet raycastable) object to do this but it comes at the cost of requiring two draw calls instead of one.
There is no additional draw call if you set Object3D.visible to false. However, you can still perform raycasting against invisible 3D objects. Use Raycaster.layers to selectively ignore 3D objects when performing intersection tests.
So what you are doing is already fine. You might want to consider raycasting only against bounding volumes if raycasting performance becomes a bottleneck in your app. The idea is to create an instance of Box3 (AABB) or Sphere (bounding sphere) for your actual scene object and use only that for raycasting.
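As an illustration, a possible Three.js setup along these lines; the names addClickProxy and pick are just for this sketch, and layer 1 is an arbitrary choice:

```typescript
import * as THREE from 'three';

// An enlarged, invisible proxy used only for raycasting. It never costs a
// draw call because visible is false, but it can still be hit by a Raycaster.
function addClickProxy(mesh: THREE.Mesh, scale = 1.2): THREE.Mesh {
  const proxy = new THREE.Mesh(mesh.geometry, new THREE.MeshBasicMaterial());
  proxy.scale.setScalar(scale); // 20% larger hit area by default
  proxy.visible = false;        // not rendered, but still raycastable
  proxy.layers.set(1);          // click targets live on their own layer
  mesh.add(proxy);
  return proxy;
}

// At pick time, test only the click-target layer and ignore everything else.
function pick(raycaster: THREE.Raycaster, pointer: THREE.Vector2,
              camera: THREE.Camera, scene: THREE.Scene): THREE.Intersection[] {
  raycaster.setFromCamera(pointer, camera);
  raycaster.layers.set(1);
  return raycaster.intersectObjects(scene.children, true);
}
```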

What is the most efficient way to create a reservoir in Three.js?

I am creating a 3D reservoir model. It's made of hundreds of thousands of cells with outlines. The outlines are needed for all the cells underneath, because an IJK filter is used to hide cells at any level and thus show the rest. Once the model is rendered, it shouldn't need to be updated in terms of position or scale.
That's enough about the background. The approach I'm using is to create one large geometry that stores all vertices across the reservoir in one triangle strip. It also stores an IJK index for each cell, so the IJK filter works at the shader level. That covers the mesh part. Then I create another object to draw all the outlines using one THREE.LineSegments.
The approach works pretty well for a small number of cells, but for large data sets the frame rate drops.
I'm proposing another way of doing this, using barycentric outlines and instanced drawing. Barycentric outline drawing removes the extra LineSegments object, since it draws the outline in the fragment shader. However, it comes with drawbacks. Because WebGL lacks geometry shaders, I have to use full triangles rather than a triangle strip to store barycentric coordinates for each vertex. I'm OK with this extra memory usage if instanced drawing can boost the performance. That is to say, I draw a cube with an outline, then create as many instances as I need and put them in the right positions.
I am wondering whether this approach would actually increase performance in theory. Any thoughts are welcome!
OK, I think I am going to answer this question myself. I implemented the change based on the ideas above, and it works pretty well compared to the original version.
Let's put the result first: this approach has no problem rendering hundreds of thousands of cells at a reasonable frame rate. My demo contains 400,000 cells, with the frame rate at 50 fps in the worst case, running on my NVIDIA GTX 1050 Ti card and a 4K monitor. For comparison, drawing 400,000 cells with the previous version could drop the frame rate to 10 fps.
This means using instanced drawing for a large object is faster than composing a single large geometry. For rendering performance, the instanced cube is rendered single-sided, while the triangle-stripped cube is two-sided. Once I can draw a single unit cube with the ideal outline, I can transform it to any position, in "any" shape, in the vertex shader. But of course instanced drawing comes with its restrictions: each cell doesn't have to be the same shape, but it has to have the same number of vertices, faces, etc.; I also lose the ability to change vertex colors...
As for memory usage, the new approach actually uses less. I provide positions for 8 vertices per cell, instead of 14. Even though the unit cube has 36 vertices, I can use their unit positions (0/1, 0/1, 0/1) as indices into the 8 real positions for each instance; that is, for 36 unit vertices I only need to provide 8 real positions.
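For reference, a simplified sketch of the instanced setup using THREE.InstancedMesh with one matrix per cell. The approach described above goes further, storing the 8 real corner positions per instance as instanced attributes and deforming a unit cube in the vertex shader, but the draw-call structure is the same. The grid layout below is hypothetical:

```typescript
import * as THREE from 'three';

// One unit cube, instanced once per reservoir cell: a single draw call.
function buildCells(scene: THREE.Scene, cellCount: number): THREE.InstancedMesh {
  const geometry = new THREE.BoxGeometry(1, 1, 1);
  const material = new THREE.MeshBasicMaterial({ side: THREE.FrontSide }); // single-sided
  const cells = new THREE.InstancedMesh(geometry, material, cellCount);

  const m = new THREE.Matrix4();
  for (let i = 0; i < cellCount; i++) {
    // Hypothetical layout; a real reservoir positions cells from its IJK data.
    m.setPosition(i % 100, Math.floor(i / 100) % 100, Math.floor(i / 10000));
    cells.setMatrixAt(i, m);
  }
  cells.instanceMatrix.needsUpdate = true;
  scene.add(cells);
  return cells;
}
```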
Hope this helps people who want to implement the same optimization.

Is there a more efficient method than this to calculate the volume of an advanced 3D object?

In a project I'm planning to calculate the volume of an advanced 3D object by creating an array of 1x1 unit squares that make up a grid, passing that grid through the object and, every 1 unit of distance, running collision detection between each square and the object. Essentially we're creating a cubic grid and running a simplified implementation of 1x1x1 cube collision detection throughout the object. The sum of the volumes of all collided cubes approximates the volume of the advanced 3D object.
(Figure omitted: the grid, shown less subdivided for the sake of demonstration.)
I can then control a balance between computational cost and accuracy by further subdividing the grid.
This seems like it would work but I wanted to make sure I wasn't making a mess of a task that could be much cleaner / simpler before I started. Is there a better way of calculating the volume of an advanced 3D object?
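For what it's worth, a sketch of the counting step, assuming an isInside(x, y, z) point-containment test for the object (however it is implemented, e.g. ray-parity testing or a signed distance field):

```typescript
// Estimate volume by sampling the centre of every cell in a bounding grid
// and summing the volume of the cells whose centres fall inside the object.
function estimateVolume(
  isInside: (x: number, y: number, z: number) => boolean,
  min: [number, number, number],
  max: [number, number, number],
  cellSize: number
): number {
  let count = 0;
  for (let x = min[0] + cellSize / 2; x < max[0]; x += cellSize)
    for (let y = min[1] + cellSize / 2; y < max[1]; y += cellSize)
      for (let z = min[2] + cellSize / 2; z < max[2]; z += cellSize)
        if (isInside(x, y, z)) count++;
  return count * cellSize ** 3; // smaller cellSize: more accuracy, more cost
}
```

Halving cellSize multiplies the number of samples by eight, which is exactly the accuracy/cost balance described above.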

Do elements drawn outside the clip plane affect OpenGL performance?

An OpenGL question: I have something to ask about the clip space transformation. I am reading an online tutorial and it says that everything you draw outside the clip space will be clipped. Given that, do elements outside the clip space affect performance or not? Since they will not be drawn, it would seem they have no effect.
Assuming that they do affect performance, in the case of a 2D game like Super Mario I am thinking about not drawing the elements outside the clip space to achieve better performance. Please clarify. Thanks.
OpenGL has only a certain amount of knowledge about your scene and clips very late in the pipeline. It can't apply a broad-phase test. Assuming you can, you should.
Suppose you had a model with 30,000 triangles: OpenGL would transform each and every one of those 30,000 triangles before considering clipping. If you know something as simple as the bounding sphere for the model, you could see that the whole thing is completely outside the frustum in a single test and save almost 30,000 triangles' worth of effort.
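A sketch of such a broad-phase test, assuming the six frustum planes have already been extracted from the view-projection matrix with inward-pointing normals:

```typescript
// A plane is [nx, ny, nz, d], with n·p + d >= 0 meaning "on the inside".
type Plane = [number, number, number, number];

// A sphere is outside the frustum if it lies entirely behind any one plane.
function sphereInFrustum(
  center: [number, number, number],
  radius: number,
  planes: Plane[] // six frustum planes
): boolean {
  for (const [nx, ny, nz, d] of planes) {
    const dist = nx * center[0] + ny * center[1] + nz * center[2] + d;
    if (dist < -radius) return false; // completely outside this plane: cull
  }
  return true; // potentially visible: submit the 30,000 triangles
}
```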
In a 2D game like Mario, what this usually means is using the scroll position to index into the map and to generate geometry only for potentially visible tiles and sprites within the visible area.
For the map, that will generally just mean figuring out the (x, y) of one corner and then generating geometry for the known width and height of the screen, which means discarding the vast majority of the geometry with zero processing.
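For example, the visible tile range can be computed like this (tile size and scroll offset in pixels are assumptions of the sketch):

```typescript
// Which map columns are visible for a given horizontal scroll position?
function visibleTileRange(scrollX: number, screenWidth: number, tileSize: number) {
  const first = Math.floor(scrollX / tileSize);
  const count = Math.ceil(screenWidth / tileSize) + 1; // +1 for the partial tile
  return { first, count }; // generate geometry only for these columns
}
```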
For the sprites, this is generally why in those sort of games you often see enemies reset to their starting position if you walk a little way from them and then walk back: they're added to the active list based on a map location trigger and removed when you walk far enough away. While not active, no mutable storage is afforded to them.

Vertex buffer objects and glutSolidSphere

I have to draw a large collection of spheres in a 3D physical simulation of a spring-mass-like system.
I would like to know an efficient method to draw spheres without having to compile a display list at every step of my simulation (each step may vary from milliseconds to seconds, depending on the number of bodies involved in the computation).
I've read that vertex-buffer objects are an efficient method to draw objects which need also to be sometimes updated.
Is there any method to draw OpenGL spheres in a way faster than glutSolidSphere?
Spheres are self-similar; every sphere is just a scaled version of any other sphere. I see no need to regenerate any geometry. Indeed, I see no need to have more than one sphere at all.
It's simply a matter of providing the proper scaling matrix. I would suggest a sphere of radius one centered at the origin for your display list or buffer object mesh. Then you can just transform it to different locations, using a scale to set the new radius.
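A sketch of the per-sphere matrix, assuming a column-major 4x4 layout as OpenGL expects; the same unit-sphere mesh is drawn once per body with this matrix applied:

```typescript
// Scale-then-translate model matrix for a unit sphere: radius r, centre (cx, cy, cz).
function sphereModelMatrix(cx: number, cy: number, cz: number, r: number): Float32Array {
  return new Float32Array([
    r,  0,  0,  0,
    0,  r,  0,  0,
    0,  0,  r,  0,
    cx, cy, cz, 1, // translation lives in the fourth column (column-major)
  ]);
}
// Each frame: for every body, upload sphereModelMatrix(...) as the model-view
// uniform and redraw the same unit-sphere mesh; no geometry is regenerated.
```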
I would like to know an efficient method to draw spheres without having to compile a display list at every step of my simulation (each step may vary from milliseconds to seconds, depending on the number of bodies involved in the computation).
Why are you generating a display list at all if the geometry you put into it is dynamic? Display lists are meant for static geometry that never, or only seldom, changes.
I've read that vertex-buffer objects are an efficient method to draw objects which need also to be sometimes updated.
Actually, VBOs are most efficient with static geometry as well. In general you want to keep the number of actual geometry updates as low as possible. In your case the only things updating are the positions (and maybe the sizes) of the spheres. This is a prime example for instanced drawing. However, it also works well to update only a uniform or the transformation matrix and then issue the draw call for a single sphere.
The idea of Vertex Arrays and VBOs is that you draw a whole batch of geometry with a single call. A sphere would be such a batch.
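A sketch of the instanced variant using WebGL2 from TypeScript (the browser analogue of OpenGL's glDrawElementsInstanced); all handles are assumed to have been created elsewhere, and only the per-instance data is re-uploaded each simulation step:

```typescript
// Draw all spheres in one instanced call. The sphere mesh itself is static;
// only the per-instance positions/radii change between simulation steps.
function drawSpheres(
  gl: WebGL2RenderingContext,
  sphereVao: WebGLVertexArrayObject,
  instanceBuffer: WebGLBuffer,
  instanceData: Float32Array, // per-sphere centre + radius, updated each step
  sphereIndexCount: number,
  bodyCount: number
): void {
  gl.bindVertexArray(sphereVao);
  gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, instanceData); // geometry stays untouched
  gl.drawElementsInstanced(gl.TRIANGLES, sphereIndexCount, gl.UNSIGNED_SHORT, 0, bodyCount);
}
```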
