I want to draw a map in a shader, composed of five textures. One holds the blend ratios for the others (CC_Texture0: 512x512), and the other four are the different elements (u_tex_r, u_tex_g, u_tex_b, u_tex_a: 256x256).
My shader is:
varying vec4 v_fragmentColor;
#ifdef GL_ES
varying highp vec2 v_uv0;
#else
varying vec2 v_uv0;
#endif
//uniform vec2 repeated_r;
//uniform vec2 repeated_g;
//uniform vec2 repeated_b;
//uniform vec2 repeated_a;
uniform vec4 u_repeat_1;
uniform vec4 u_repeat_2;
uniform sampler2D u_tex_r;
uniform sampler2D u_tex_g;
uniform sampler2D u_tex_b;
uniform sampler2D u_tex_a;
void main()
{
vec4 weights = texture2D(CC_Texture0, v_uv0); // renamed from 'mix', which shadows the built-in function
vec2 uvTiled = v_uv0 * 20.0; // float literal: int * float is invalid in GLSL ES 1.00
vec4 r = texture2D(u_tex_r, uvTiled);
vec4 g = texture2D(u_tex_g, uvTiled);
vec4 b = texture2D(u_tex_b, uvTiled);
vec4 a = texture2D(u_tex_a, uvTiled);
gl_FragColor = vec4((r * weights.r + g * weights.g + b * weights.b + a * weights.a).rgb, 1.0);
//gl_FragColor = vec4((r * weights.r + g * weights.g + b * weights.b).rgb, 1.0);
}
The device is a Samsung G5308W and the frame rate is only 50 fps, even if I remove the scale. When I just draw CC_Texture0, the frame rate reaches 60 fps. Why? GPU memory bandwidth, or the scaling? Can anybody help me improve it?
Your shader is clearly pretty simple in terms of computation, so I suspect that the limiting factor is the cost of fetching the texture data. This is not unusual on mobile: memory bandwidth is often the limiting factor for frame rate, and it is also one of the top contributors to energy use, affecting battery life and device heat.
Some suggestions:
You mentioned that removing the scale doesn't help performance (I presume you mean removing the * 20.0 when constructing the UVs). Even if it doesn't have a measurable impact on this device, I'd still recommend avoiding the dependent texture read: it will probably improve performance slightly, and may improve it a lot on some older devices. Add a second set of UVs, calculated in the vertex shader and passed in as a varying; see the sketch below. If you need different scales for each texture, add 4 new varyings. Varyings are very cheap. Don't be tempted to pack multiple UVs into a single vec4, as that can itself cause dependent texture reads.
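A minimal vertex-shader sketch of that suggestion (the attribute and uniform names a_position, a_uv0, u_MVPMatrix and u_uvScale are my assumptions, not from the question):
attribute vec4 a_position; // hypothetical attribute/uniform names
attribute vec2 a_uv0;
uniform mat4 u_MVPMatrix;
uniform float u_uvScale; // e.g. 20.0
varying vec2 v_uv0;
varying vec2 v_uvTiled; // pre-scaled UVs: the fragment shader samples with these directly
void main()
{
v_uv0 = a_uv0;
v_uvTiled = a_uv0 * u_uvScale;
gl_Position = u_MVPMatrix * a_position;
}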
You mention the texture resolution but not the texture format. If you can lower the bits per pixel of these textures, you will see a big impact on performance. Compressed textures (e.g. ETC1, which is 4 bpp) are best, but even switching from RGBA8888 to RGB565 or RGBA4444 will help a lot.
Could you use a simpler shader in some cases? You haven't mentioned the context, but this has the feel of terrain texture splatting, and in terrain you often find that very few chunks of geometry actually use all of the tiled textures. If you can identify chunks which reference fewer textures and use a more specialized fragment shader for them (see the sketch below), you would get a good performance boost. The trade-off is more complex code, and potentially more draw calls and state changes which might cost CPU time, so it can be a tricky balancing act.
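For example, a chunk known to blend only the r and g layers could use a fragment shader like this (a sketch only: it assumes the v_uvTiled varying from the previous sketch, and declares CC_Texture0 explicitly, which the engine normally injects for you):
#ifdef GL_ES
precision mediump float;
#endif
varying vec2 v_uv0;
varying vec2 v_uvTiled;
uniform sampler2D CC_Texture0; // blend weights
uniform sampler2D u_tex_r;
uniform sampler2D u_tex_g;
void main()
{
vec4 weights = texture2D(CC_Texture0, v_uv0);
vec4 r = texture2D(u_tex_r, v_uvTiled);
vec4 g = texture2D(u_tex_g, v_uvTiled);
// Two texture fetches fewer per fragment than the general four-layer shader.
gl_FragColor = vec4((r * weights.r + g * weights.g).rgb, 1.0);
}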
Finally, the G5308W is a six-year-old device; if you can't hit 60 fps on it, it isn't the end of the world.
I was playing around with "drawing" millions of triangles and found something interesting: switching the index type from VK_INDEX_TYPE_UINT32 to VK_INDEX_TYPE_UINT16 increased the number of triangles drawn per second by a factor of 1.5! How is the difference in speed so large?
I use indirect indexed instanced drawing (so much "i"): 25 vertices, 138 indices (46 triangles), 2^21 ≈ 2M instances (I am too lazy to find where to disable vSync), 1 draw call per frame; 96,468,992 triangles per frame in total. To get the clearest results I look away from the triangles (discarding rasterisation gives pretty much the same performance).
I have a very simple vertex shader:
layout(set = 0, binding = 0) uniform A
{
mat4 cam;
};
layout(location = 0) in vec3 inPosition; // per-vertex data: color and position are de-interleaved
layout(location = 1) in vec4 inColor;
layout(location = 2) in vec3 inGlob; // per-instance data, interleaved
layout(location = 3) in vec4 inQuat;
layout(location = 0)out vec4 fragColor;
vec3 vecXquat(const vec3 v, const vec4 q)
{ // rotate vector v by quaternion q
    return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);
}
void main(){
gl_Position = vec4(vecXquat(inPosition, inQuat)+inGlob, 1.0f)*cam;
fragColor = inColor;
}
and pass-through fragment shader.
Primitives - VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST
The results:
~1950MTris/s with 32bit indices
~2850MTris/s with 16bit indices
GPU - GTX1050Ti
Since your shaders are so simple, your rendering performance will likely be dominated by factors that would otherwise be negligible, such as the vertex-data transfer rate.
138 indices have to be read by the GPU for each instance. With 2^21 instances and 32-bit indices, that is 138 × 4 bytes × 2,097,152 instances ≈ 1.08 GiB of index data the GPU has to read per frame. With 16-bit indices the transfer is halved, and with half as much data there is also a better chance that the index data fits entirely in the vertex-pulling cache.
I've got a very detailed texture (with false-color information that I'm rendering via a false-color lookup in the fragment shader). My problem is that sometimes the user will zoom far away from this texture, and the fine detail is lost: fine lines in the texture can't be seen. I would like to modify my code to make these lines pop out.
My thinking is that I can run a fast filter over neighboring texels and pick out the biggest/smallest/most interesting value to render. What I'm not sure how to do is find out whether (and how much) to do this. When the user is zoomed in on a triangle, I want the standard lookup; when they are zoomed out, a single pixel on the screen maps to many texture pixels.
How do I get an estimate of this? I am doing this with both orthographic and perspective cameras.
My thinking is that I could somehow use the vertex shader to get an estimate of how big one screen pixel is in UV space and pass that as a varying to the fragment shader, but I still don't have a solid enough grasp of the transforms and spaces to work out the idea.
My current vertex shader is quite simple:
varying vec2 vUv;
varying vec3 vPosition;
varying vec3 vNormal;
varying vec3 vViewDirection;
void main() {
vUv = uv;
vec4 mvPosition = modelViewMatrix * vec4( position, 1.0 );
vPosition = (modelMatrix * vec4(position, 1.0)).xyz;
gl_Position = projectionMatrix * mvPosition;
vec3 transformedNormal = normalMatrix * vec3( normal );
vNormal = normalize( transformedNormal );
vViewDirection = normalize(mvPosition.xyz);
}
How do I get something like vDeltaUV, which gives the distance between screen pixels in UV units?
Constraints: I'm working in WebGL, inside three.js.
Here is an example of one image, where the user has zoomed in close to my texture with the perspective camera:
Here is the same example, but zoomed out; the feature above is a barely perceptible diagonal line near the center (see the coordinates to get a sense of scale). I want this line to pop out by rendering all pixels with the reddest color of the corresponding array of texels.
Addendum (re LJ's comment)...
No, I don't think mipmapping will do what I want here, for two reasons.
First, I'm not actually mapping the texture directly to the screen; I'm doing something like this:
vec4 enc = texture2D(inputtexture, vUv);
gl_FragColor = texture2D(mappingtexture, vec2(enc.g, enc.r));
The user dynamically creates the mappingtexture, which allows me to vary the false-color map in real time. I think it's actually a very elegant solution for my application.
Second, I don't want to draw the AVERAGE value of neighboring pixels (i.e. smoothing); I want the most EXTREME value of neighboring pixels (i.e. something more akin to edge finding). "Extreme" in this case is defined by my encoding of the g/r color values in the input texture.
Solution:
Thanks to the answer below, I've now got a working solution.
In my javascript code, I had to add:
extensions: {derivatives: true}
to my declaration of the ShaderMaterial. Then in my fragment shader:
float dUdx = dFdx(vUv.x); // difference in U between this pixel and the one to the right
float dUdy = dFdy(vUv.x); // difference in U between this pixel and the one above
float dU = sqrt(dUdx*dUdx + dUdy*dUdy);
float pixel_ratio = dU * uInputTextureResolution;
This allows me to do things like this:
float x = ... the u coordinate in pixels in the input texture
float y = ... the v coordinate in pixels in the input texture
vec4 inc = get_encoded_adc_value(x,y);
// Extremum mapping:
if(pixel_ratio>2.0) {
inc = most_extreme_value(inc, get_encoded_adc_value(x+1.0, y));
}
if(pixel_ratio>3.0) {
inc = most_extreme_value(inc, get_encoded_adc_value(x-1.0, y));
}
The effect is subtle, but definitely there! The lines pop much more clearly.
Thanks for the help!
You can't do this in the vertex shader, since it runs before rasterization and is therefore output-resolution agnostic. In the fragment shader, however, you can use dFdx, dFdy and fwidth via the GL_OES_standard_derivatives extension (which is available pretty much everywhere) to estimate the sampling footprint, as sketched below.
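A minimal fragment-shader sketch of that estimate (uInputTextureResolution is assumed to be the input texture's size in pixels, as in the solution above; the grayscale output is just to visualize the footprint):
#extension GL_OES_standard_derivatives : enable
precision highp float;
varying vec2 vUv;
uniform float uInputTextureResolution;
void main() {
// fwidth(v) = abs(dFdx(v)) + abs(dFdy(v)): how much vUv changes per screen pixel.
vec2 uvPerPixel = fwidth(vUv);
// Approximate texels covered by one screen pixel; > 1.0 means minification.
float texelsPerPixel = max(uvPerPixel.x, uvPerPixel.y) * uInputTextureResolution;
gl_FragColor = vec4(vec3(texelsPerPixel / 8.0), 1.0);
}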
If you're not updating the texture in real time, a simpler and more efficient solution would be to generate custom mip levels for it on the CPU.
I am working on a WebGL scene with ~100 different 2048 x 2048 px textures. I'm rendering point primitives, and each point has a texture index and texture UV offsets that indicate the region of the given texture that should be used on the point.
Initially, I attempted to pass each point's texture index as a varying, then use that index to fetch from an array of sampler2Ds. However, this yielded an error that one may only fetch from a sampler2D array with a "constant integer expression", so now I'm using a gnarly if/else chain to handle each point's texture index:
/**
* The fragment shader's main() function must define `gl_FragColor`,
* which determines the color of each pixel on the screen.
*
* To do so, we can use uniforms passed into the shader and varyings
* passed from the vertex shader.
*
* Attempting to read a varying not generated by the vertex shader will
* throw a warning but won't prevent the shader from compiling.
**/
// set float precision
precision highp float;
// repeat identifies the size of each image in an atlas
uniform vec2 repeat;
// textures contains an array of textures with length n textures
uniform sampler2D textures[42];
// identify the uv values as a varying attribute
varying vec2 vUv; // blueprint uv coords
varying vec2 vTexOffset; // instance uv offsets
varying float vTexture; // set index of each object's vertex
void main() {
int textureIndex = int(floor(vTexture));
vec2 uv = vec2(gl_PointCoord.x, 1.0 - gl_PointCoord.y);
vec4 color = vec4(0.0); // declared once, so it is still in scope for gl_FragColor below
// The block below is automatically generated
if (textureIndex == 0) { color = texture2D(textures[0], uv * repeat + vTexOffset); }
else if (textureIndex == 1) { color = texture2D(textures[1], uv * repeat + vTexOffset); }
else if (textureIndex == 2) { color = texture2D(textures[2], uv * repeat + vTexOffset); }
else if (textureIndex == 3) { color = texture2D(textures[3], uv * repeat + vTexOffset); }
[ more lines of the same ... ]
gl_FragColor = color;
}
If the number of textures is small, this works fine. But if the number of textures is large (e.g. 40), this approach throws:
ERROR: 0:58: '[' : memory exhausted
I've tried reading up on this error but am still not sure what it means. Have I exceeded the GPU's available memory? If anyone knows what this error means and/or what I can do to resolve the problem, I'd be grateful for any tips.
More details:
Total size of all textures to be loaded: 58MB
Browser: recent Chrome
Graphics card: AMD Radeon R9 M370X 2048 MB graphics (stock 2015 OSX card)
There is a limit on how many samplers a fragment shader can access. It can be obtained via gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS). It is guaranteed to be at least 8, and is typically 16 or 32.
To circumvent the limit, texture arrays are available in WebGL2; they also allow indexing layers with any variable, as sketched below. In WebGL1 your only option is atlases, but since your textures are already 2048 x 2048, you can't make them any bigger.
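A minimal WebGL2 (GLSL ES 3.00) fragment-shader sketch of the texture-array approach, reusing the names from the shader above (on the JavaScript side the images would be uploaded as layers of one TEXTURE_2D_ARRAY texture):
#version 300 es
precision highp float;
uniform vec2 repeat;
uniform mediump sampler2DArray textures; // one array texture, ~100 layers
in vec2 vTexOffset;
in float vTexture;
out vec4 fragColor;
void main() {
vec2 uv = vec2(gl_PointCoord.x, 1.0 - gl_PointCoord.y);
// The layer index may be any expression; no "constant integer expression" restriction.
fragColor = texture(textures, vec3(uv * repeat + vTexOffset, floor(vTexture)));
}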
If you don't want to require WebGL2, you will have to split your rendering into multiple draw calls with different textures bound.
Also consider that 100 8-bit RGBA 2048 x 2048 textures use up 1.6 gigabytes of VRAM (100 × 2048 × 2048 × 4 bytes, before mipmaps). Texture compression via WEBGL_compressed_texture_s3tc can reduce that by 8x or 4x, depending on how much alpha precision you need.
I'm trying to display textures on quads (2 triangles) using OpenGL 3.3.
Drawing a texture on a quad works great; however, when I have ONE texture (a sprite atlas) but use 2 quads (objects) to display different parts of the atlas, the quads end up switching back and forth in the draw loop (one disappears then appears again, etc.) at their individual translated locations.
The way I'm drawing this is not the standard DrawElements call per quad (or object); instead I package all quads, UVs, translations, etc. and send them up to the shader as one big chunk (as "in" variables). Vertex shader:
#version 330 core
// Input vertex data, different for all executions of this shader.
in vec3 vertexPosition_modelspace;
in vec3 vertexColor;
in vec2 vertexUV;
in vec3 translation;
in vec4 rotation;
in vec3 scale;
// Output data ; will be interpolated for each fragment.
out vec2 UV;
// Output data ; will be interpolated for each fragment.
out vec3 fragmentColor;
// Values that stay constant for the whole mesh.
uniform mat4 MVP;
...
void main(){
mat4 Model = mat4(1.0);
mat4 t = translationMatrix(translation);
mat4 s = scaleMatrix(scale);
mat4 r = rotationMatrix(vec3(rotation), rotation[3]);
Model *= t * r * s;
gl_Position = MVP * Model * vec4 (vertexPosition_modelspace,1); //* MVP;
// The color of each vertex will be interpolated
// to produce the color of each fragment
fragmentColor = vertexColor;
// UV of the vertex. No special space for this one.
UV = vertexUV;
}
Is the vertex shader working as I think it would with a large chunk of data, drawing each packaged segment individually? It does not seem like it. Is my train of thought correct on this?
For completeness this is my fragment shader:
#version 330 core
// Interpolated values from the vertex shaders
in vec3 fragmentColor;
// Interpolated values from the vertex shaders
in vec2 UV;
// Ouput data
out vec4 color;
// Values that stay constant for the whole mesh.
uniform sampler2D myTextureSampler;
void main()
{
// Output color = color of the texture at the specified UV
color = texture(myTextureSampler, UV); // 'texture' replaces the deprecated texture2D in GLSL 330 core
}
A request for more information was made, so here is how I bind this data for the vertex shader. The following code is just the part I use for my translations; I have more of the same for color, rotation, scale, UV, etc.:
gl.BindBuffer(gl.ARRAY_BUFFER, tvbo)
gl.BufferData(gl.ARRAY_BUFFER, len(data.Translations)*4, gl.Ptr(data.Translations), gl.DYNAMIC_DRAW)
tAttrib := uint32(gl.GetAttribLocation(program, gl.Str("translation\x00")))
gl.EnableVertexAttribArray(tAttrib)
gl.VertexAttribPointer(tAttrib, 3, gl.FLOAT, false, 0, nil)
...
gl.DrawElements(gl.TRIANGLES, int32(len(elements)), gl.UNSIGNED_INT, nil)
You have just a single sampler2D, which means you have just a single texture at your disposal, regardless of how many you bind. If you really need to pass the data as a single block, then you should add a sampler for each texture you have. I am not sure how many objects/textures you have, but with this way of passing data you are limited by the hardware's texture-unit limit. You also need to add another value to your data telling which primitive uses which texture unit, and then select the right sampler inside the fragment shader.
You should add stuff like this:
// vertex
in int usedtexture;
flat out int txr; // integer varyings must be flat-qualified
void main()
{
txr = usedtexture;
}
// fragment
uniform sampler2D myTextureSampler0;
uniform sampler2D myTextureSampler1;
uniform sampler2D myTextureSampler2;
uniform sampler2D myTextureSampler3;
in vec2 UV;
flat in int txr; // must match the flat vertex output
out vec4 color;
void main()
{
if (txr==0) color = texture(myTextureSampler0, UV);
else if (txr==1) color = texture(myTextureSampler1, UV);
else if (txr==2) color = texture(myTextureSampler2, UV);
else if (txr==3) color = texture(myTextureSampler3, UV);
else color = vec4(0.0, 0.0, 0.0, 0.0);
}
This way of passing is not good, for these reasons:
- the number of used textures is limited by the hardware texture-unit limit
- if your rendering needs additional textures like normal/shininess/light maps, then you need more than one texture per object type, and your limit is suddenly divided by 2, 3, 4, ...
- you need if/switch statements inside the fragment shader, which can slow things down considerably. Yes, you can do it branchless, but then you would need to access all the textures all the time, increasing heat stress on the GPU without reason...
This kind of passing is suitable when all the textures are inside a single image (the texture atlas you mentioned), which can be faster this way and is reasonable for scenes with a small number of object types (or materials) but a large object count; see the sketch below.
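A minimal fragment-shader sketch of the atlas variant (the tileOrigin/tileSize inputs are my assumptions: a per-quad sub-rectangle of the atlas, passed through from the per-instance vertex data):
#version 330 core
in vec2 UV; // 0..1 within the quad, as in the question
in vec2 tileOrigin; // hypothetical: atlas sub-rectangle for this quad
in vec2 tileSize;
out vec4 color;
uniform sampler2D myTextureSampler;
void main()
{
// Remap the quad-local UV into this quad's atlas sub-rectangle.
color = texture(myTextureSampler, tileOrigin + UV * tileSize);
}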
Since I needed more input on this matter, I linked this page on reddit and someone was able to help me with one response! The reddit link is here:
https://www.reddit.com/r/opengl/comments/3gyvlt/opengl_passing_all_scene_data_into_shader_each/
The issue of seeing the two textures/quads flicker after passing all vertices as one data structure to the vertex shader was that my element indices were off. I needed to determine the correct index for each set of vertices of my 2-triangle (quad) objects. I simply had to do something like this:
vertexInfo.Elements = append(vertexInfo.Elements, uint32(idx*4), uint32(idx*4+1), uint32(idx*4+2), uint32(idx*4), uint32(idx*4+2), uint32(idx*4+3))
I have a large C++ OpenGL application that ran with very high performance under Mountain Lion.
After updating to Mavericks and recompiling, performance has dropped significantly.
Switching between triangle strips and triangles as the primitive type drops performance by a further factor of 2 or 3, so I am under the impression that the vertex shader is the cause of the issue; given how simple it is, I suspect that it is running in software on the CPU rather than on the GPU.
How can I recover the performance I had under Mountain Lion? Are there changes I need to make?
The source of my vertex shader is given below. It feeds into a geometry shader.
#version 410
uniform mat3 normalMatrix;
uniform mat4 modelMatrix;
uniform mat4 modelProjMatrix;
uniform vec3 color = vec3(0.4,0.4, 0.4);
in vec3 vertex;
in vec3 normal;
out NodeData {
vec3 normal, position;
vec4 refColor;
} v;
void main()
{
vec4 position = modelMatrix * vec4(vertex, 1.0);
vec3 vertNormal = normal;
v.normal = normalize(normalMatrix * vertNormal);
v.position = position.xyz;
v.refColor = vec4(color, 1.0);
gl_Position = modelProjMatrix * vec4(vertex, 1.0);
}
For 180,000 triangles I can only get 3 fps when feeding them as triangles, and about 8 fps when fed as strips. The triangles are ordered according to Forsyth's post-transform-cache optimization algorithm.
Solution: Make sure all vertex buffers that are added to the VAO are used in the vertex shader.