I want to create a 2D OpenGL ES engine to use for my apps. This engine should support a scene graph. Each node of the graph can have its own shader, texture, material and a transformation matrix relative to its parent node. But I'm new to OpenGL ES 2.0, so I have some questions:
How does matrix multiplication work in OpenGL ES 2.0? Is it a good approach to draw the parent first, then its child nodes (does that save work when multiplying modelview matrices)? Where does the matrix multiplication take place: should I do it on the CPU or on the GPU?
Is it a better approach to sort nodes by the shader they use, then by texture and material? How should I implement scene graph transformations in this case (on the CPU or the GPU, and how)?
How does matrix multiplication work in OpenGL ES 2.0? Is it a good approach to draw the parent first, then its child nodes (does that save work when multiplying modelview matrices)? Where does the matrix multiplication take place: should I do it on the CPU or on the GPU?
Except for some ancient SGI machines, transformation matrix multiplication has always taken place on the CPU. While it is possible to do matrix multiplication in a shader, this should not be used to implement a transformation hierarchy. You should use, or implement, a small linear algebra library. If it is tailored to the 4×4 homogeneous transformations used in 3D graphics, it can be written in under 1k lines of C code.
Instead of relying on the old OpenGL matrix stack, which has been deprecated and removed from later versions, you create, for every level of the transformation hierarchy, a copy of the current matrix, apply the node's local transform to it, and supply the resulting matrix to the transformation uniform.
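For example, a minimal sketch of that CPU-side traversal in C++ (the Node structure and the "u_ModelView" uniform name are just placeholders for whatever your engine uses):

```cpp
#include <GLES2/gl2.h>
#include <vector>

struct Mat4 { float m[16]; };   // column-major, as OpenGL expects

// out = a * b for column-major 4x4 matrices
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[k * 4 + row] * b.m[col * 4 + k];
            out.m[col * 4 + row] = sum;
        }
    return out;
}

struct Node {
    Mat4 local;                  // transform relative to the parent node
    GLuint program;              // this node's shader program
    std::vector<Node*> children;
};

// Depth-first traversal: every level works on a copy of the parent's matrix,
// multiplies its own local transform onto it and uploads the result.
void drawNode(const Node& node, const Mat4& parentWorld) {
    Mat4 world = multiply(parentWorld, node.local);
    glUseProgram(node.program);
    GLint loc = glGetUniformLocation(node.program, "u_ModelView"); // cache this in practice
    glUniformMatrix4fv(loc, 1, GL_FALSE, world.m);
    // ...bind this node's texture/vertex buffers and issue glDrawArrays() here...
    for (const Node* child : node.children)
        drawNode(*child, world);
}
```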
Is it a better approach to sort nodes by the shader they use, then by texture and material? How should I implement scene graph transformations in this case (on the CPU or the GPU, and how)?
Usually one uses a two-phase approach to rendering. In the first phase you collect all the information about which objects to draw, together with their drawing input data (shaders, textures, transformation matrices), and put it in a list. Then you sort that list so as to minimize the total cost of state changes.
The most expensive kind of state change is switching textures, as this invalidates the caches. Swapping between textures already bound to texturing units has some cost, but it's still cheaper than switching shaders, which invalidates the state of the execution path predictor. Changing uniform data, however, is very cheap, so you should not be afraid of updating uniforms often. So if you can make substantial settings through uniforms, do it that way. However, be aware that conditional code in shaders carries a performance cost of its own, so you must balance the cost of switching shaders against the cost of conditional shader code.
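For example, a minimal sketch of such a two-phase draw list; the DrawItem fields are placeholders for whatever your engine tracks, and the sort order follows the cost ranking above:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem {
    uint32_t texture;       // texture object to bind
    uint32_t shader;        // shader program to use
    uint32_t materialId;    // index into per-item uniform data
    float    modelView[16]; // transformation uniform for this item
    // ...vertex buffer handle, draw range, etc.
};

// Phase 1 produces the list; this sorts it so that the most expensive state
// changes happen as rarely as possible. Uniforms are updated per item anyway.
void sortDrawList(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.texture != b.texture) return a.texture < b.texture;
                  if (a.shader  != b.shader)  return a.shader  < b.shader;
                  return a.materialId < b.materialId;
              });
}
// Phase 2 then walks the sorted list and only switches texture/shader when
// the value actually differs from the previous item.
```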
I want to do some raytracing from a fragment shader. I don't want to use a compute shader, as I need to do rasterization anyway and want to do some simple raytracing as part of the fragment shader evaluation, in a single pass. Simple in the sense that the mesh I want to raytrace against consists of only a few hundred triangles.
I have written plenty of shaders and have also written (CPU-based) raytracers, so I'm familiar with all the concepts. What I wonder, though, is what the best representation is for the mesh plus an acceleration structure (probably some kd-tree) to pass to the fragment shader. Most likely they'd be converted to textures in some way (for example, three pixels to represent the vertex positions of a triangle).
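For concreteness, the kind of encoding I have in mind (three texels per triangle) might look roughly like the sketch below, assuming float-texture support (ES 3.0 or desktop GL; on ES 2.0 you'd need OES_texture_float and GL_RGB/GL_FLOAT instead of GL_RGB32F). The one-row layout and the sampler name are made up, and kd-tree nodes could be packed into a second texture the same way:

```cpp
#include <GLES3/gl3.h>
#include <vector>

struct Vec3     { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

// Packs the triangle list into a 1-row RGB32F texture: texel 3*i + k holds
// vertex k of triangle i. In the fragment shader the positions can then be
// read back with texelFetch(uTriangles, ivec2(3*i + k, 0), 0).rgb.
GLuint uploadTriangles(const std::vector<Triangle>& tris) {
    std::vector<float> texels;
    texels.reserve(tris.size() * 9);
    for (const Triangle& t : tris) {
        const Vec3 verts[3] = { t.v0, t.v1, t.v2 };
        for (const Vec3& v : verts) {
            texels.push_back(v.x);
            texels.push_back(v.y);
            texels.push_back(v.z);
        }
    }

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F,
                 static_cast<GLsizei>(tris.size() * 3), 1, 0,
                 GL_RGB, GL_FLOAT, texels.data());
    return tex;
}
```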
I am creating a 3D reservoir model which looks like this.
It's made of hundreds of thousands of cells with outlines. The outlines are needed for all cells, including those underneath, because an IJK filter is used to hide cells at any level and thus reveal the rest. Once the model is rendered, it shouldn't need to be updated in terms of position or scale.
That's enough background. The approach I'm using is to create one large geometry which stores all vertices across the reservoir in one triangle strip. It also stores the IJK index for each cell, so the IJK filter works at the shader level. That takes care of the mesh part. I then create another object to draw all the outlines using one THREE.LineSegments.
The approach works pretty well for a small number of cells, but for large data sets the frame rate drops.
I'm considering another way of doing this, using barycentric outlines and instanced drawing. Barycentric outline drawing removes the extra LineSegments object, since it draws the outline in the fragment shader. However, it comes with drawbacks: because WebGL has no geometry shader, I have to use full triangles rather than a triangle strip in order to store barycentric coordinates for each vertex. I'm OK with this extra memory usage if instanced drawing can boost performance (can it?). That is to say, I draw one cube with an outline, then create as many instances as I need and put them in the right positions.
I am wondering whether this approach will actually improve performance. Any thoughts are welcome!
OK, I think I'm going to answer this question myself.
I implemented the change based on the ideas above and it works pretty well compared to the original version.
Let's put the result first: this approach has no problem rendering hundreds of thousands of cells at a reasonable frame rate. My demo contains 400,000 cells, with the frame rate at 50 fps in the worst case, running on my Nvidia 1050 Ti and a 4K monitor. For comparison, drawing 400,000 cells with the previous version could drop the frame rate to 10 fps.
This means that using instanced drawing for a large model is faster than composing a single large geometry. It also helps rendering performance that the instanced cube is rendered single-sided, while the triangle-stripped cube was two-sided. Once I can draw a single unit cube with the ideal outline, I can transform it to any position, in "any" shape, in the vertex shader. But of course instanced drawing comes with restrictions: each cell doesn't have to be the same shape, but it does have to have the same number of vertices, faces, etc.; and I lose the ability to change individual vertex colors...
As for memory usage, the new approach actually uses less. I provide positions for 8 vertices per cell instead of 14. Even though the unit cube itself has 36 vertices, their unit positions (0/1, 0/1, 0/1) can serve as indices into the 8 real positions of every instance. That is, for the 36 unit vertices I only need to provide 8 real positions per cell.
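For reference, here's roughly what that setup looks like expressed in plain OpenGL ES 3.0 / WebGL 2 terms (in Three.js, InstancedBufferGeometry and InstancedBufferAttribute do the equivalent); the buffer names and attribute locations below are just placeholders:

```cpp
#include <GLES3/gl3.h>

// Draws `numCells` instances of one 36-vertex unit cube. The unit corner
// coordinates (0/1, 0/1, 0/1) live in `unitCubeVbo` (per vertex), while the
// 8 real corner positions of every cell live in `cellCornersVbo`
// (per instance). The vertex shader uses the unit coordinate to pick one of
// the 8 per-instance corners.
void drawInstancedCells(GLuint unitCubeVbo, GLuint cellCornersVbo, GLsizei numCells) {
    // Attribute 0: unit cube corner, advances per vertex.
    glBindBuffer(GL_ARRAY_BUFFER, unitCubeVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

    // Attributes 1..8: the 8 real corners of each cell, advancing per instance.
    glBindBuffer(GL_ARRAY_BUFFER, cellCornersVbo);
    const GLsizei stride = 8 * 3 * sizeof(float);
    for (int corner = 0; corner < 8; ++corner) {
        const GLuint loc = 1 + corner;
        glEnableVertexAttribArray(loc);
        glVertexAttribPointer(loc, 3, GL_FLOAT, GL_FALSE, stride,
                              reinterpret_cast<const void*>(corner * 3 * sizeof(float)));
        glVertexAttribDivisor(loc, 1);   // advance once per instance, not per vertex
    }

    glDrawArraysInstanced(GL_TRIANGLES, 0, 36, numCells);
}
```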
Hope this helps people who want to implement the same optimization.
I am trying to make a particle-based simulation using the Three.js particle system. To control the vertices I am using fragment shaders, so each texel written by the fragment shader holds the position of one particle. I can get velocity by storing the old texture and computing the difference in position. The problem I am confused about is how to implement collision handling between the particles.
Instead of having each particle check every other particle in the array (which I can't imagine would work, since it would be something like 1,000,000 checks per particle), I was thinking of grouping the points into grid cells and only checking the adjacent cells (as I would do on the CPU). However, I am unsure how to group the particles on the GPU. I basically want a texture representing a grid, where each cell contains information about the particles that fall within the area that cell covers.
Any advice or strategies would be greatly appreciated!
So, I want to start making a game engine, and I realized that I would have to draw 3D objects and a GUI (immediate mode) at the same time.
3D objects will use a perspective projection matrix, and since the GUI is in 2D space I will have to use an orthographic projection matrix for it.
How can I implement that? Can anyone please guide me? I'm not a professional graphics programmer.
Also, I'm using DirectX 11, so please keep the answer within that.
To preface my answer, when I say "draw at the same time", I mean all drawing that takes place with a single call to ID3D11DeviceContext::Draw (or DrawIndexed/DrawAuto/etc). You might mean something different.
You are not required to draw objects with orthographic and perspective projections at the same time, and doing so isn't very common.
Generally the projection matrix is provided to a vertex shader via a shader constant (frequently as a concatenation of the World, View and Projection matrices). When drawing a perspective object, you would bind one set of constants; when drawing an orthographic one, you'd set different ones. Frequently, different shaders are used to render perspective and orthographic objects, because they generally have completely different properties (e.g. lighting).
You could draw the two different types of objects at the same time, and there are several ways you could accomplish that. A straightforward way would be to provide both projection matrices to the vertex shader, and have an additional vertex stream which determines which projection matrix to use.
In some edge cases, you might get a small performance benefit from this sort of batching. I don't suggest you do that. Make your life easier and use separate draw calls for orthographic and perspective objects.
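For example, a minimal sketch of the separate-draw-call route with DirectXMath, assuming a DEFAULT-usage constant buffer in slot 0; the drawSceneObjects/drawGui helpers are placeholders for your own drawing code:

```cpp
#include <d3d11.h>
#include <DirectXMath.h>
using namespace DirectX;

struct FrameConstants { XMFLOAT4X4 projection; };

void renderFrame(ID3D11DeviceContext* ctx, ID3D11Buffer* constantBuffer,
                 float width, float height)
{
    FrameConstants cb;

    // 3D pass: perspective projection for the scene geometry.
    XMMATRIX persp = XMMatrixPerspectiveFovLH(XM_PIDIV4, width / height, 0.1f, 1000.0f);
    XMStoreFloat4x4(&cb.projection, XMMatrixTranspose(persp)); // HLSL defaults to column-major
    ctx->UpdateSubresource(constantBuffer, 0, nullptr, &cb, 0, 0);
    ctx->VSSetConstantBuffers(0, 1, &constantBuffer);
    // drawSceneObjects(ctx);    // perspective-projected 3D objects

    // GUI pass: orthographic projection in pixel space, drawn on top.
    XMMATRIX ortho = XMMatrixOrthographicOffCenterLH(0.0f, width, height, 0.0f, 0.0f, 1.0f);
    XMStoreFloat4x4(&cb.projection, XMMatrixTranspose(ortho));
    ctx->UpdateSubresource(constantBuffer, 0, nullptr, &cb, 0, 0);
    // drawGui(ctx);             // immediate-mode GUI quads
}
```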
Is there any way to render to a different set of screen coordinates than the standard equidistant grid from (-1,-1) to (1,1)?
I am not talking about a transformation that can be accomplished by transformations in the vertex shader.
Specifically, ES 2.0 would be nice, but any version is fine.
Is this even directly OpenGL-related, or is the standard grid typically provided by the plumbing libraries?
No, there isn't any other way. The values you write to gl_Position in the vertex (or tessellation or geometry) shader are clip-space coordinates. The GPU will convert these to normalized device coordinates (the "[-1,1] grid") by dividing by the w coordinate (after the actual primitive clipping, of course) and will finally use the viewport parameters to transform the results to window space.
There is no way to directly use window coordinates when you want to use the rendering pipeline. There are some operations which bypass large parts of that pipeline, like framebuffer blitting, which provides a limited way to draw some things directly to the framebuffer.
However, working with pixel coordinates isn't hard to achieve. You basically have to invert the transformations the GPU will be doing. While "ortho" matrices are typically used for this, the only required operations are a scale and a translation, which boils down to a fused multiply-add per component, so you can implement it extremely efficiently in the vertex shader.
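For example, a minimal sketch of that inverse transform, assuming window y grows downward (drop the y flip if your convention is bottom-up); the same two multiply-adds can be done directly in the vertex shader with the viewport size passed in as a uniform:

```cpp
struct Vec2 { float x, y; };

// Window-space pixel coordinates -> normalized device coordinates.
// One multiply-add per component; y is flipped because window y grows downward.
Vec2 pixelToNdc(Vec2 pixel, float viewportWidth, float viewportHeight) {
    Vec2 ndc;
    ndc.x =  pixel.x * (2.0f / viewportWidth)  - 1.0f;
    ndc.y = -pixel.y * (2.0f / viewportHeight) + 1.0f;
    return ndc;
}
// In the vertex shader, write the result with w = 1 so the perspective
// divide becomes a no-op: gl_Position = vec4(ndc, 0.0, 1.0);
```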