WebGL batch drawing textures. Which has better performance and why? - opengl-es

I'm working on a WebGL batch renderer (the question is equally valid in OpenGL land), i.e. one that draws all graphics in a scene in the fewest possible drawArrays/drawElements calls (ideally one). Part of this involves selecting textures based on attributes.
Therefore in my fragment shader I'm contemplating two scenarios:
1. Draw texture 0 to the screen and use attributes to determine the "frame" in which the texture lies on a sprite sheet that's in memory. The fragment shader would look something like:
precision mediump float;
uniform sampler2D u_spriteSheet;
// Represents position that's framed (texture2D takes a vec2 coordinate).
varying vec2 v_texturePosition;
void main() {
gl_FragColor = texture2D(u_spriteSheet, v_texturePosition);
}
2. Use "if" statements in the shader to determine which uniform sampler2D to sample. The fragment shader would look something like:
precision mediump float;
uniform sampler2D u_image1;
uniform sampler2D u_image2;
uniform sampler2D u_image3;
uniform sampler2D u_image4;
....
uniform sampler2D u_image32;
// GLSL ES 1.00 has no uint and varyings cannot be integers, so the id arrives as a float.
varying float v_imageId;
// Represents texture position that's framed
varying vec2 v_texturePosition;
void main() {
// Round the interpolated id to the nearest integer.
int imageId = int(v_imageId + 0.5);
if (imageId == 1) {
gl_FragColor = texture2D(u_image1, v_texturePosition);
}
else if (imageId == 2) {
gl_FragColor = texture2D(u_image2, v_texturePosition);
}
...
else if (imageId == 32) {
gl_FragColor = texture2D(u_image32, v_texturePosition);
}
}
I understand that option 1 is limited by the maximum texture size and option 2 by the number of texture units available. For the sake of discussion, let's assume these limits will never be exceeded.
I'm trying to determine the more performant approach before investing a significant amount of time into either one... Sooo any thoughts?

If statements in shaders are generally slow, because typical GPU hardware
executes shaders as SIMD, i.e. many fragments are processed in parallel,
instruction by instruction. Put simply, when a group of threads reaches an if,
all of them step through the then part, but only the threads whose condition
is true really execute it and store the result, while the other threads wait
(or even do the calculation but discard the result). Afterwards all
threads step through the else part, and the threads whose condition was true wait.
So in your solution #2 the fragment shader on many cards would execute all
32 branches and not just one lookup as in solution #1. (Some cards are said
to skip a branch if no thread in the group takes it, so it may be fewer
than 32.)
So I would expect solution #1 to be faster as far as the fragment shader is concerned. However, your renderer might have other bottlenecks, so the performance of the fragment shader
might turn out to be irrelevant for the overall performance.
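The lockstep behaviour described above can be illustrated with a small cost model. This is only a sketch under a simplifying assumption (not a description of any particular GPU): fragments run in fixed-size groups, and a group pays for every branch that at least one of its fragments takes. The function name `simdBranchCost` and the group size are hypothetical.

```javascript
// Simplified model (an assumption, not a hardware spec): fragments execute in
// lockstep groups, and each group pays once for every distinct branch that
// any of its fragments takes.
function simdBranchCost(imageIds, groupSize) {
  let cost = 0;
  for (let start = 0; start < imageIds.length; start += groupSize) {
    const group = imageIds.slice(start, start + groupSize);
    // Number of distinct texture ids in the group = branch bodies executed.
    cost += new Set(group).size;
  }
  return cost;
}

// Four fragments sharing one texture cost a single branch body,
// four fragments with four different textures cost four.
console.log(simdBranchCost([1, 1, 1, 1], 4));
console.log(simdBranchCost([1, 2, 3, 4], 4));
```

Under this model solution #2 is cheap only when neighbouring fragments use the same texture; hardware that cannot skip untaken branches would pay for all of them regardless.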
Additionally, many GPUs support fewer than 32 texture units, so you
probably cannot use 32 if you want to stay compatible with many devices.
According to webglstats.com, 99% of devices have 16 texture units, and since most
scenes have more than 16 textures there is probably no way around
implementing something like your solution #1.
On the other hand, once you hit the maximum texture size you might need
something like #2 as well.
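For solution #1, the per-sprite "frame" can be turned into texture coordinates on the CPU and written into the vertex attribute buffer, so the fragment shader only does one lookup. A minimal sketch, assuming a frame is described by a pixel rectangle `{x, y, w, h}` on the sheet; the helper names `frameToUvRect` and `quadTexCoords` are hypothetical and not part of WebGL.

```javascript
// Convert a pixel rectangle on the sprite sheet into normalized UV bounds.
function frameToUvRect(frame, sheetWidth, sheetHeight) {
  return {
    u0: frame.x / sheetWidth,
    v0: frame.y / sheetHeight,
    u1: (frame.x + frame.w) / sheetWidth,
    v1: (frame.y + frame.h) / sheetHeight,
  };
}

// Texture coordinates for one quad drawn as two triangles (6 vertices, 12 floats).
// These values would be appended to the batch's attribute buffer.
function quadTexCoords(uv) {
  return [
    uv.u0, uv.v0, uv.u1, uv.v0, uv.u0, uv.v1, // first triangle
    uv.u0, uv.v1, uv.u1, uv.v0, uv.u1, uv.v1, // second triangle
  ];
}

const uv = frameToUvRect({ x: 64, y: 0, w: 32, h: 32 }, 256, 128);
console.log(uv, quadTexCoords(uv));
```

Because the frame is baked into per-vertex data, sprites from different frames can share one draw call as long as they live on the same sheet.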

Related

Is there a way to access a vector by coordinates as you would a texture in GLSL?

I am implementing a feature extraction algorithm with OpenGL ES 3.0 (given an input texture with some 1's and mostly 0's, produce an output texture that has feature regions labeled). The problem I face in my fragment shader is how to perform a “lookup” on an intermediate vec or float rather than a sampler.
Conceptually every vec or float is just a texel-valued function, so there ought to be a way to get its value given texel coordinates, something like textureLikeLookup(texel_valued, v_color) - but I haven’t found anything that parallels the texture* functions.
The options I see are:
1. Render my vector to a framebuffer and pass that as a texture into another shader/rendering pass - undesirable because I have many such passes in a loop, and I want to avoid interleaving CPU calls;
2. Switch to ES 3.1 and take advantage of imageStore (https://www.khronos.org/registry/OpenGL-Refpages/es3.1/html/imageStore.xhtml) - it seems clear that if I can update an intermediate image within my shader then I can achieve this within the fragment shader (cf. https://www.reddit.com/r/opengl/comments/279fc7/glsl_frag_update_values_to_a_texturemap_within/), but I would rather stick to 3.0 if possible.
Is there a better/natural way to deal with this problem? In other words, do something like this
// works, because tex_sampler is a sampler2D
vec4 texel_valued = texture(tex_sampler, v_color);
when the data is not a sampler2D but a vec:
// doesn't work, because texel_valued is not a sampler but a vec4
vec4 oops = texture(texel_valued, v_color);

Pixelated results when altering texture coordinates in fragment shader

I am trying to alter the texture coordinates in the fragment shader.
Precision is set to medium.
It works on some devices but causes severe pixelation on some (cheaper) devices.
I assume this is a precision problem, but it's odd because I have set the default to medium, which should be available on all devices. Any idea? Thanks for your time.
+"mediump vec2 coords=v_TexCoordinate; \n"
+"coords+=0.1; \n"
mediump is only defined in the spec in terms of a minimum allowable precision. There are devices out there which implement mediump as highp (i.e. as more or less a single-precision 32-bit float).
There may be other devices which genuinely have a mediump (similar to a half-precision float) and a separate highp in hardware, but use some heuristic to notice that the value is a texture coordinate and decide that you're better off with the higher precision.
Still other devices may move your trivial tex coordinate modification to the vertex shader, and will use the texture coordinates output by the vertex shader to prefetch the texels before the fragment shader even executes (and thus avoid the fragment shader's shonky precision altogether).
So even though you ask for mediump everywhere, it's not unusual that you can only repro precision issues on a subset of your devices.
In terms of fixing it - from what you've included in your question, you could just do the tex coordinate modification in the vertex shader.
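To get a feel for why a genuine mediump can pixelate, the sketch below estimates the spacing between representable values, assuming mediump is implemented at its minimum of a 10-bit mantissa (roughly a half-precision float). The function names are hypothetical, and denormals and range limits are ignored.

```javascript
// Spacing between adjacent representable values near x for a float with a
// 10-bit mantissa (assumed minimum mediump precision).
function mediumpStep(x) {
  const ax = Math.abs(x);
  let e = Math.floor(Math.log2(ax));
  // Guard against rounding in Math.log2 at exact powers of two.
  if (Math.pow(2, e + 1) <= ax) e += 1;
  if (Math.pow(2, e) > ax) e -= 1;
  return Math.pow(2, e - 10);
}

// Round x to the nearest value representable with that mantissa.
function toMediump(x) {
  if (x === 0) return 0;
  const step = mediumpStep(x);
  return Math.round(x / step) * step;
}

// How many texels one representable step covers at a given coordinate.
function texelsPerStep(coord, textureSize) {
  return mediumpStep(coord) * textureSize;
}

console.log(texelsPerStep(0.6, 4096)); // texels skipped per step on a 4096-wide texture
console.log(toMediump(0.6));
```

When `texelsPerStep` exceeds 1, neighbouring fragments can no longer address every texel, which shows up as blockiness; larger coordinates (for example after adding an offset) make the step coarser.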

building a type of pixel sorting in glsl

I'm working on emulating in glsl an effect I've seen used pretty widely (example in image). I am new to glsl, but have a decent amount of experience in max msp and jitter, so that is where I am trying to implement it (syntax should be similar enough though, but it's glsl version 2.1).
Currently, I am taking in one texture, then feeding the output back in as the second texture. Using a luma threshold, I am selecting pixels to be sorted, but from here, I can't figure out how to continue. I assume that I need to:
1. Take the color value from the initial, luma-selected pixel.
2. Apply that color to the subsequent pixels along a certain axis.
And now I can't figure out how to continue. I'll include the code that I do have, which obviously just has nothing after the luma threshold stuff. Any advice or resources would be greatly appreciated.
uniform sampler2DRect tex0;
uniform sampler2DRect tex1;
varying vec2 texcoord0;
varying vec2 texcoord1;
uniform float thresh;
const vec4 lumcoeff = vec4(0.299,0.587,0.114,0.0);
void main()
{
vec4 input0 = texture2DRect(tex0, texcoord0);
vec4 input1 = texture2DRect(tex1, texcoord1);
float luma = dot(input0,lumcoeff);
if(luma > thresh){
// Placeholder: the sorting step is not written yet, so pass the pixel through.
gl_FragColor = input0;
}else{
gl_FragColor = input0;
}
}
I actually made the original image you posted up above. Sorting on the GPU can be a complicated task because we can't easily make comparisons the same way that a CPU can.
I'd take a look at this nvidia paper on a bitonic merge sort. I implemented it in glsl here if you'd like to take a look, though I think chrome auto play killed it so you may need to try in FF or Safari.
There are also a few examples on shadertoy if you search around. I think this one might be pretty close to what you're hoping for.
Good luck!
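For reference, the index pattern of a bitonic merge sort can be sketched on the CPU. This is not the GLSL implementation mentioned above, just an illustration written in gather form: in every pass each element reads its own value and one partner from the previous pass and keeps either the minimum or the maximum, which is the kind of per-pixel work a fragment shader pass with a feedback texture can do. The function name `bitonicSort` is hypothetical, and the input length is assumed to be a power of two.

```javascript
// Bitonic merge sort in gather form. Each pass only reads from the previous
// pass's array, mirroring a render pass that reads one texture and writes another.
function bitonicSort(values) {
  const n = values.length;
  if (n === 0 || (n & (n - 1)) !== 0) {
    throw new Error("length must be a power of two");
  }
  let data = values.slice();
  for (let k = 2; k <= n; k *= 2) {          // size of the sequences being merged
    for (let j = k >> 1; j >= 1; j >>= 1) {  // compare distance within this stage
      const next = new Array(n);
      for (let i = 0; i < n; i++) {
        const partner = i ^ j;
        const ascending = (i & k) === 0;
        // The lower index keeps the minimum in ascending blocks, the maximum otherwise.
        const keepMin = (i < partner) === ascending;
        next[i] = keepMin
          ? Math.min(data[i], data[partner])
          : Math.max(data[i], data[partner]);
      }
      data = next;
    }
  }
  return data;
}

console.log(bitonicSort([5, 3, 8, 1, 9, 2, 7, 4]));
```

For a pixel-sorting effect the values would be something like luma along one row or column, and the threshold would decide which runs of pixels take part in the sort.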

Can OpenGL shader compilers optimize expressions on uniforms?

I've got an OpenGL ES shader with some uniforms in it. I do some math on the uniforms in my fragment shader. Will the shader compiler generally optimize those expressions on the uniforms so they happen once instead of on every pixel? Or should I do the computations outside the shader and pass the results in?
Specifically, I've got three uniform coordinates I'm passing into the shader:
uniform vec2 u_a;
uniform vec2 u_b;
uniform vec2 u_c;
And then I compute some vectors among those points:
vec2 v0 = u_c - u_a;
vec2 v1 = u_b - u_a;
I'm curious if the shader compiler can optimize these such that they happen once per render, or if I should compute these outside the shader and pass them in as additional uniforms. Of course, for optimizations, I should really measure things to find the difference in my specific situation (since the GPU might be faster doing this on every pixel than my CPU), but I'm interested in how much scope the shader compilers have for optimizations like this in general.
Bonus points if you know that shader compilers on Android/iPhone devices might behave differently, or if different GLSL versions make a difference.
Could a compiler optimize that? Yes, it could. Will it? Probably not, because the driver would have to pre-compute every uniform-only expression each time it begins rendering an object, which adds significant overhead.
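If you would rather not rely on the compiler, the two edge vectors can be computed once per draw on the CPU and uploaded as extra uniforms. A minimal sketch; `edgeVectors` and the uniform locations in the comment are hypothetical names, not something from the question's code.

```javascript
// Component-wise difference of two 2D points given as [x, y].
function subtract2(p, q) {
  return [p[0] - q[0], p[1] - q[1]];
}

// Same math as the shader: v0 = u_c - u_a, v1 = u_b - u_a.
function edgeVectors(a, b, c) {
  return { v0: subtract2(c, a), v1: subtract2(b, a) };
}

const edges = edgeVectors([1, 2], [4, 6], [0, 5]);
console.log(edges);

// Hypothetical upload, assuming a WebGL context `gl` and uniform locations
// obtained with gl.getUniformLocation:
//   gl.uniform2fv(v0Location, edges.v0);
//   gl.uniform2fv(v1Location, edges.v1);
```

Whether this is actually faster than two subtractions per fragment is something only a measurement on the target devices can tell.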

Render depth buffer to texture

Quite new at shaders, so please bear with me if I am doing something silly here. :)
I am trying to render the depth buffer of a scene to a texture using OpenGL ES 2.0 on iOS, but I do not seem to get entirely accurate results unless the models have a relatively high density of polygons showing on the display.
So, for example if I render a large plane consisting of only four vertices, I get very inaccurate results, but if I subdivide this plane the results get more accurate for each subdivision, and ultimately I get a correctly rendered depth buffer.
This reminds me a lot of affine versus perspective-projected texture mapping issues, and I guess I need to play around with the ".w" component somehow to fix this. But I thought the "varying" variables should take this into account already, so I am a bit at a loss here.
This is my vertex and fragment shader:
[vert]
uniform mat4 uMVPMatrix;
attribute vec4 aPosition;
varying float objectDepth;
void main()
{
gl_Position=uMVPMatrix * aPosition;
objectDepth=gl_Position.z;
}
[frag]
precision mediump float;
varying float objectDepth;
void main()
{
//Divide by scene clip range, set to a constant 200 here
float grayscale=objectDepth/200.0;
gl_FragColor = vec4(grayscale,grayscale,grayscale,1.0);
}
Please note that this shader is simplified a lot just to highlight the method I am using. Although for the naked eye it seems to work well in most cases, I am in fact rendering to 32 bit textures (by packing a float into ARGB), and I need very high accuracy for later processing or I get noticeable artifacts.
I can achieve pretty high precision by cranking up the polygon count, but that drives my framerate down a lot, so, is there a better way?
You need to divide z by the w component.
This is very simple: the depth is not linear in screen space, so you cannot use linear interpolation for z. You will solve it very easily if you interpolate 1/z instead of z. You can also perform some w math, exactly as suggested by rasmus.
You can read more about coordinate interpolation at http://www.luki.webzdarma.cz/eng_05_en.htm (page about implementing a simple software renderer)
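The difference between interpolating z and interpolating 1/z can be checked with a little arithmetic. The sketch below compares the two at a point partway along an edge in screen space; the function names are hypothetical and the numbers are only an illustration, not taken from the question's scene.

```javascript
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Interpolating z directly across the screen (affine).
function depthLinearInScreenSpace(z0, z1, t) {
  return lerp(z0, z1, t);
}

// Interpolating 1/z across the screen and inverting (perspective-correct).
function depthPerspectiveCorrect(z0, z1, t) {
  return 1 / lerp(1 / z0, 1 / z1, t);
}

// Halfway across the screen between depths 2 and 6:
console.log(depthLinearInScreenSpace(2, 6, 0.5)); // 4
console.log(depthPerspectiveCorrect(2, 6, 0.5));  // approximately 3
```

The gap between the two results grows with the depth range a single triangle spans, which would be consistent with large, sparsely tessellated planes showing the biggest errors.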

Resources