I'm currently rewriting a shader written in GLES30 for the GLES20 shader language.
I've hit a snag where the shader I need to convert makes a call to the function textureLod, which samples the currently bound texture using a specific level-of-detail. This call is made within the fragment shader, but in GLES20 the equivalent function can only be called within the vertex shader.
I'm wondering: if I replace this with a call to the function texture2D, am I likely to compromise the function of the shader, or just reduce its performance? All instances where the textureLod call is made within the original shader use a level of detail of zero.
If you switch calls from textureLod to texture2D, you will lose control over which mip-level is being sampled.
If the texture being sampled only has a single mip-level, then the two calls are equivalent, regardless of the lod parameter passed to textureLod, because there is only one level that could be sampled.
If the original shader always samples the top mip level (level 0), it is unlikely that the change will hurt performance, as sampling lower mip-levels would more likely give better texture cache performance. If possible, you could have your sampled texture only include a top level to guarantee equivalence (unless the mip levels are required somewhere else). If this isn't possible, then the execution will be different. If the sample is used for 'direct' texturing, it is likely that the results will be fairly similar, assuming a nicely generated mip-chain. If it is used for other purposes (e.g. logic within the shader), then the divergence might be larger. It's difficult to predict without seeing the actual shader.
Also note that, if the texture sample is used within a loop or conditional, and the shader has been ported to/from a DirectX HLSL shader at any point in its lifetime, the call to textureLod may be an artifact of HLSL not allowing gradient instructions within dynamic loops (the HLSL equivalent of texture2D is a gradient instruction, but the equivalent of textureLod is not). HLSL requires this even if the texture only has a single mip-level.
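One further option, not mentioned above and offered only as a hedged sketch: some GLES20 implementations expose the GL_EXT_shader_texture_lod extension, which makes texture2DLodEXT available in fragment shaders. The C helpers below assume the extension list is the string returned by glGetString(GL_EXTENSIONS); the function names and the SAMPLE_LOD0 macro are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
 * `extensions` (the string returned by glGetString(GL_EXTENSIONS)). */
static int has_gl_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}

/* GLSL ES 1.00 preamble to pass before the fragment shader body.
 * SAMPLE_LOD0 is a hypothetical macro name used only in this sketch. */
static const char *lod0_preamble(int has_shader_texture_lod)
{
    if (has_shader_texture_lod)
        return "#extension GL_EXT_shader_texture_lod : require\n"
               "#define SAMPLE_LOD0(s, uv) texture2DLodEXT(s, uv, 0.0)\n";
    /* Fallback: implicit LOD. This matches level 0 only when no other
     * mip level can be selected for the texture. */
    return "#define SAMPLE_LOD0(s, uv) texture2D(s, uv)\n";
}
```

The shader body would then call SAMPLE_LOD0(tex, uv) wherever the GLES30 version called textureLod(tex, uv, 0.0), with the preamble passed as the first string so that the #extension line precedes all other shader code.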
Related
I have an existing OpenGL ES 3.1 application that renders a scene to an FBO with color and depth/stencil attachments. It uses the usual methods for drawing (glBindBuffer, glDrawArrays, glBlend*, glStencil*, etc.). My task is now to create a depth-only pass that fills the depth attachment with the same values as the main pass.
My question is: What is the minimum number of steps necessary to achieve this and avoid the GPU doing superfluous work (unnecessary shader invocations etc.)? Is deactivating the color attachment enough, or do I also have to set null shaders, disable blending, etc.?
I assume you need this before the main pass runs, otherwise you would just keep the main pass depth.
Preflight
Create specialized buffers which contain only the mesh data needed to compute position (which are deinterleaved from all non-position data).
Create specialized vertex shaders (which compute only the output position).
Link programs with the simplest valid fragment shader.
Rendering
Render the depth-only pass using the specialized buffers and shaders, masking out all color writes.
Render the main pass with the full buffers and shaders.
Options
At step (2) above it might be beneficial to load the depth-only pass depth results as the starting depth for the main pass. This will give you better early-zs test accuracy, at the expense of the readback of the depth value. Most mobile GPUs have hidden surface removal, so this isn't always going to be a net gain - it depends on your content, target GPU, and how good your front-to-back draw order is.
You probably want to use the specialized buffers (position data interleaved in one buffer region, non-position interleaved in a second) for the main draw, as many GPUs will optimize out the non-position calculations if the primitive is culled.
The specialized buffers and optimized shaders can also be used for shadow mapping, and other such depth-only techniques.
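The deinterleaving from the preflight step can be sketched on the CPU as follows. This is a minimal illustration rather than a definitive implementation: it assumes every vertex is a run of floats with the position first, and the function and parameter names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Splits an interleaved vertex array into a position-only stream and a
 * stream holding the remaining attributes.
 * Assumed layout: each vertex is `floats_per_vertex` floats, and the
 * first `position_floats` of them are the position.
 * Both output buffers must be allocated by the caller. */
static void split_position_stream(const float *interleaved,
                                  size_t vertex_count,
                                  size_t floats_per_vertex,
                                  size_t position_floats,
                                  float *positions_out,
                                  float *others_out)
{
    size_t other_floats = floats_per_vertex - position_floats;
    size_t i;

    for (i = 0; i < vertex_count; ++i) {
        const float *src = interleaved + i * floats_per_vertex;
        /* Position goes to the stream used by the depth-only pass. */
        memcpy(positions_out + i * position_floats, src,
               position_floats * sizeof(float));
        /* Everything else goes to the stream used only by the main pass. */
        memcpy(others_out + i * other_floats, src + position_floats,
               other_floats * sizeof(float));
    }
}
```

The position stream is then the only vertex data the depth-only pass has to fetch, while the main pass binds both streams.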
I plan to eliminate all glUniform calls from my GLSL shaders in order to save costs in state switching. For that purpose, I plan to use a UBO that is bound to the shader permanently. Different draw calls use different parts of the UBO (it's basically an array). In order to tell the draw call which entry to use, I have to submit an integer to the vertex/fragment shaders. The problem is that, on the system I have to use, even issuing a single glUniform call will cause an expensive state update, so I cannot use glUniform at all.
Do you know a solution that will work on GLES 3.1 and one that will work on GLES 2?
GLES doesn't have glMulti* calls yet and base vertex only from 3.2 upwards as far as I know. And adding another vertex attribute may be costly.
I'm working on an OpenGL visualisation for navigating a 3D dataset. Briefly, the visualisation takes in a large (~1 million data points) array of matrices, which are then eigendecomposed and visualised as ellipsoids.
I have found that performance improves significantly when I calculate ellipsoid vertex transformations "up-front" (i.e. calculate all model transformations once only on the CPU), rather than in shaders (where the model transformations have to be calculated for each draw). For scene navigation/lighting etc., view and projection transformations are calculated as normal and passed as uniforms to the relevant shaders.
The result of this approach is the program taking longer to initialise (due to the CPU being tied up calculating all the model transformations), but significantly higher frame rates.
I understand from this, that it is common to decompose matrices to avoid unnecessary shader computations, however I haven't come across anything describing this practice of completely pre-calculating the world space.
I understand that this approach is only appropriate for my narrow usecase (i.e. where the scene is static, meaning there will never be a situation where a vertex's position in world space will change while the program is running). Apart from that, are there any significant reasons that I should avoid doing this?
It's a common optimization to remove redundant transformations from static objects. Your objects are static in the world, so you've collapsed all the redundant transformations right up to the root of your scene, which is not a problem.
Having said that, the performance gain you're seeing is probably not coming from the cost of doing the model transform in the shader, but from passing that transform to the shader for each object. You have not said much about how you organize the ellipsoids, but if you are updating a program with the model matrix uniform and issuing a DrawElements call for each ellipsoid, that is very slow indeed. Even doing something more exotic -- like using instances and passing each transform in a VBO -- you would still have the overhead of updating them, which you can now avoid. If you are not doing this already, you can group your ellipsoid vertices into large arrays and draw them with only a few DrawElements calls.
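As a rough sketch of that grouping, the C function below bakes one mesh into a shared batch by applying its model matrix on the CPU and rebasing its indices. It assumes xyz positions, an affine column-major 4x4 matrix (the layout OpenGL uses) and 32-bit indices; all names are hypothetical, and the caller is assumed to have sized the batch arrays in advance.

```c
#include <stddef.h>

/* Appends one mesh to a shared batch, applying `model` (column-major 4x4)
 * to every position on the CPU. Positions are xyz triples; indices are
 * rebased so they point at the mesh's vertices inside the batch. */
static void append_baked_mesh(const float model[16],
                              const float *positions, size_t vertex_count,
                              const unsigned int *indices, size_t index_count,
                              float *batch_positions, size_t *batch_vertex_count,
                              unsigned int *batch_indices, size_t *batch_index_count)
{
    size_t base = *batch_vertex_count;
    size_t i;

    for (i = 0; i < vertex_count; ++i) {
        const float x = positions[3 * i + 0];
        const float y = positions[3 * i + 1];
        const float z = positions[3 * i + 2];
        float *out = batch_positions + 3 * (base + i);
        /* world = model * vec4(x, y, z, 1); model is assumed affine */
        out[0] = model[0] * x + model[4] * y + model[8]  * z + model[12];
        out[1] = model[1] * x + model[5] * y + model[9]  * z + model[13];
        out[2] = model[2] * x + model[6] * y + model[10] * z + model[14];
    }
    for (i = 0; i < index_count; ++i)
        batch_indices[*batch_index_count + i] = (unsigned int)(base + indices[i]);

    *batch_vertex_count += vertex_count;
    *batch_index_count += index_count;
}
```

Calling this once per ellipsoid at start-up leaves one large vertex array and one large index array, which can be uploaded once and drawn with a handful of DrawElements calls using only the view and projection uniforms.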
I'm working on a 3D engine that should work on mobile platforms. Currently I just want to make a prototype that will work on iOS and use forward rendering. In the engine, a scene can have a variable number of lights of different types (directional, spot, etc.). When rendering, for each object (mesh) an array of lights that affect this object is constructed. The array will always have 1 or more elements. I can pack the light source information into a 1D texture and pass it to the shader. The number of lights can be put into this texture or passed as a separate uniform (I did not try it yet, but these are my thoughts after googling).
The problem is that not all GLSL ES implementations support for loops with variable limits. So I can't write a shader that will loop through light sources and expect it to work on a wide range of platforms. Are there any techniques to support a variable number of lights in a shader if for loops with variable limits are not supported?
The idea I have:
Implement some preprocessing of shader source to unroll loops manually for different number of lights.
So in that case, if I render all objects with one type of shader and the number of lights ranges from 1 to 3, I will end up having 3 different shaders (generated automatically) for 1, 2 and 3 lights.
Is it a good idea?
Since the source code for a shader consists of strings that you pass in at runtime, there's nothing stopping you from building the source code dynamically, depending on the number of lights, or any other parameters that control what kind of shader you need.
If you're using a setup where the shader code is in separate text files, and you want to keep it that way, you can take advantage of the fact that you can use preprocessor directives in shader code. Say you use LIGHT_COUNT for the number of lights in your shader code. Then when compiling the shader code, you prepend it with a definition for the count you need, for example:
#define LIGHT_COUNT 4
Since glShaderSource() takes an array of strings, you don't even need any string operations to connect this to the shader code you read from the file. You simply pass it in as an additional string to glShaderSource().
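A minimal sketch of that in C, with hypothetical helper names: one function formats the define line, the other fills the two-element string array that glShaderSource() expects.

```c
#include <stdio.h>

/* Writes "#define LIGHT_COUNT <n>\n" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
static int make_light_count_define(char *buf, size_t buf_size, int light_count)
{
    int written = snprintf(buf, buf_size, "#define LIGHT_COUNT %d\n", light_count);
    return (written < 0 || (size_t)written >= buf_size) ? -1 : 0;
}

/* Fills sources[0..1] for glShaderSource: the define line first, then
 * the unmodified shader text that was read from the file. */
static void make_shader_sources(const char *define_line, const char *file_text,
                                const char *sources[2])
{
    sources[0] = define_line;
    sources[1] = file_text;
}
```

The array would then be handed over with glShaderSource(shader, 2, sources, NULL). One caveat: if the shader file starts with a #version directive, that directive has to stay first, so in that case the define needs to go after it rather than in front of the whole file.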
Shader compilation is fairly expensive, so you'll probably want to cache the shader program for each light count.
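Such a cache can be as simple as an array indexed by light count. The sketch below assumes a small fixed upper bound (MAX_LIGHT_COUNT is an arbitrary value chosen for illustration) and relies on 0 never being a valid GL program name; the type and function names are hypothetical.

```c
#include <stddef.h>

#define MAX_LIGHT_COUNT 8  /* assumed upper bound for this sketch */

/* One slot per light count; 0 means "not compiled yet", since 0 is
 * never a valid program name returned by glCreateProgram(). */
typedef struct {
    unsigned int programs[MAX_LIGHT_COUNT + 1];
} program_cache;

/* Returns the cached program for light_count, or 0 if none is stored. */
static unsigned int cache_lookup(const program_cache *cache, int light_count)
{
    if (light_count < 1 || light_count > MAX_LIGHT_COUNT)
        return 0;
    return cache->programs[light_count];
}

/* Stores a linked program for light_count. Returns 0 on success,
 * -1 if light_count is outside the supported range. */
static int cache_store(program_cache *cache, int light_count, unsigned int program)
{
    if (light_count < 1 || light_count > MAX_LIGHT_COUNT)
        return -1;
    cache->programs[light_count] = program;
    return 0;
}
```

At draw time you would look up the program for the object's light count and compile, link and store it only when the lookup returns 0.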
Another option is what Andon suggested in a comment. You can write the shader for the upper limit of the light count you need, and then pass in uniforms that serve as multipliers for each light source. For the lights you don't need, you set the multiplier to 0. That's not very efficient since you're doing extra calculations for light sources you don't need, but it's simple, and might be fine if it meets your performance requirements.
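A hedged sketch of the multiplier approach: the C function fills one multiplier per shader light slot, and the string shows the kind of constant-bound loop the fragment shader would contain. MAX_LIGHTS, u_lightScale and u_lightColor are hypothetical names, and the lighting term is reduced to a plain color sum to keep the example short.

```c
#include <stddef.h>

#define MAX_LIGHTS 4  /* assumed upper limit, mirrored by a define in the shader */

/* Fills one multiplier per shader light slot: 1.0 for lights in use,
 * 0.0 for unused slots, so their contribution cancels out. */
static void fill_light_multipliers(int active_lights, float multipliers[MAX_LIGHTS])
{
    int i;
    for (i = 0; i < MAX_LIGHTS; ++i)
        multipliers[i] = (i < active_lights) ? 1.0f : 0.0f;
}

/* Fragment shader excerpt: the loop bound is a compile-time constant,
 * which GLSL ES 1.00 accepts. The lighting math is a placeholder. */
static const char *const k_light_loop_glsl =
    "vec3 sum = vec3(0.0);\n"
    "for (int i = 0; i < MAX_LIGHTS; i++) {\n"
    "    sum += u_lightScale[i] * u_lightColor[i];\n"
    "}\n";
```

The multipliers can be uploaded in one call with glUniform1fv(location, MAX_LIGHTS, multipliers) before drawing each object.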
I'm using a ParticleSystem with PointSprites (inspired by the Cocos2D source), but I wonder how to rebuild this functionality for OpenGL ES 2.0:
glEnable(GL_POINT_SPRITE_OES);
glEnableClientState(GL_POINT_SIZE_ARRAY_OES);
glPointSizePointerOES(GL_FLOAT,sizeof(PointSprite),(GLvoid*) (sizeof(GL_FLOAT)*2));
glDisableClientState(GL_POINT_SIZE_ARRAY_OES);
glDisable(GL_POINT_SPRITE_OES);
These calls generate BAD_ACCESS when using an OpenGL ES 2.0 context.
Should I simply go with 2 TRIANGLES per PointSprite? But that's probably not very efficient (overhead for the extra vertices).
EDIT:
So, my new problem with the suggested solution from:
https://gamedev.stackexchange.com/questions/11095/opengl-es-2-0-point-sprites-size/15528#15528
is how to pass many different sizes in a single batch call. I thought of using an attribute instead of a uniform, but then I would always need to pass a point size to my shaders, even if I'm not drawing GL_POINTS. So, maybe a second shader (a shader only for GL_POINTS)? I'm not aware of the overhead of switching shaders every frame in the draw routine (because if the particle system is used, I naturally also want to render regular GL_TRIANGLES without a point size)... Any ideas on this?
As I already commented, what you need is described here: https://gamedev.stackexchange.com/questions/11095/opengl-es-2-0-point-sprites-size/15528#15528
As for which approach to take, you can either use different shaders for the different types of drawables in your application, or add a boolean uniform to your shader and use it to enable or disable writing gl_PointSize in the shader code. It's up to you. What you need to keep in mind is that changing the shader program is one of the most costly operations, so drawing objects of the same type in a batch will be better in that case. I'm not really sure whether using an if statement in your shader code will have a large performance impact.
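For the separate-shader route, a minimal sketch of a points-only vertex shader with a per-particle size attribute, embedded in C. The struct layout is an assumption chosen to match the stride and two-float offset used by the glPointSizePointerOES call in the question, and a_position, a_pointSize and u_mvp are hypothetical names.

```c
#include <stddef.h>

/* Assumed vertex layout: two position floats followed by the size,
 * matching the stride and offset used in the question's ES 1.1 code. */
typedef struct {
    float x, y;  /* position */
    float size;  /* per-particle point size in pixels */
} PointSprite;

/* Byte offset of the size field, for use with glVertexAttribPointer. */
static size_t point_size_offset(void)
{
    return offsetof(PointSprite, size);
}

/* GLSL ES 1.00 vertex shader intended for GL_POINTS draws only. */
static const char *const k_point_vertex_shader =
    "attribute vec2 a_position;\n"
    "attribute float a_pointSize;\n"
    "uniform mat4 u_mvp;\n"
    "void main() {\n"
    "    gl_Position = u_mvp * vec4(a_position, 0.0, 1.0);\n"
    "    gl_PointSize = a_pointSize;\n"
    "}\n";
```

The size attribute would be bound with glVertexAttribPointer using one GL_FLOAT component, a stride of sizeof(PointSprite) and the offset returned above, which replaces the glPointSizePointerOES call from the ES 1.1 code. The fragment shader can then sample the sprite texture with gl_PointCoord.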