WebGL: Framebuffers and textures with one, one-byte channel?

I'm generating blurred drop shadows in WebGL by drawing the object to be blurred onto an off-screen framebuffer/texture, then applying a few passes of a filter to it (back and forth between two off-screen framebuffers), then copying the result to the final output.
However, I'm just dropping the RGB channels, overwriting them with the desired color of the drop shadow (usually black) while maintaining the alpha channel. It seems like I could probably get better performance by just having my off-screen framebuffers be a single (alpha) channel.
Is there a way to do that, and would it actually help?
Also, is there a better way to apply multiple passes of a filter than just alternating between two framebuffers and using the previous framebuffer's texture as the input to the next pass?

Assuming WebGL follows GLES, then per the spec (page 91):
The name of the color buffer of an application-created framebuffer object
is COLOR_ATTACHMENT0 ... Color buffers consist of R, G, B, and,
optionally, A unsigned integer values.
So you can't attach only to A, or only to any single colour channel.
Options to explore:
Use colorMask to disable writing to R, G and B. Depending on what data layout your GPU uses internally, that could effectively achieve exactly what you want, or it could have no effect whatsoever (a minimal sketch follows this list).
Is there a way you could render to the depth channel instead of to the alpha channel?
Reducing memory bandwidth is often helpful but if it's not a bottleneck then you could end up prematurely optimising.
To avoid excessive per-frame ping-ponging you'd normally try to restructure your shader so that it applies the effect of all the stages in one pass. Otherwise, consider whether there's any better-than-linear way to combine multiple passes: instead of knowing only how to get from stage n to stage n+1, can you go from stage n to stage 2n? Or even just n+2?
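A minimal sketch of the colorMask idea and the ping-pong loop, written against the GL ES C API (WebGL's gl.colorMask, gl.bindFramebuffer and gl.bindTexture are the direct equivalents); blur_fbo[], blur_tex[] and num_passes are hypothetical names for the ping-pong targets and pass count:
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);   // only alpha is written during the blur passes
int src = 0, dst = 1;
for (int pass = 0; pass < num_passes; ++pass) {
    glBindFramebuffer(GL_FRAMEBUFFER, blur_fbo[dst]);
    glBindTexture(GL_TEXTURE_2D, blur_tex[src]);      // previous pass's texture as the input
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);            // full-screen quad with the blur shader bound
    src ^= 1; dst ^= 1;                               // swap roles for the next pass
}
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);      // restore full writes for the final composite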

Related

Mutably borrow multiple disjoint sections of an ImageBuffer

I would like to wrap an inner image with tiled versions of wrapper images using the image-rs crate.
Like so (the blue square is the inner image, and the other colors are the wrappers. Sizes may differ. One layer may be N pixels wide, while the next is K pixels wide. The wrapper layers are not solid colors, but scaled images. This is simply example output illustrating the problem):
I have some code that accomplishes this, in a single RgbaImage allocation, and in a single thread:
let out: RgbaImage = construct_image(); // constructed differently, but here to show types
...
// LocalizedWrapper just holds coords, size, and a wrapper that knows how to wrap
// around an image, given coords and size.
// TODO: This loop can be parallelized?
for LocalizedWrapper {
    coordinates, // top-left coordinates
    size,        // size of bounding rectangle
    wrapper,     // abstract logic for wrapping differently
} in localized_wrappers
{
    // wrap_mut() calls image::ImageBuffer::sub_image(), image::imageops::tile(), and friends
    wrapper.wrap_mut(&mut out, coordinates, size, ppi)?;
}
This works great (it generated the image above). However, it mutably borrows the entire RgbaImage on every iteration, and I'd like to parallelize the loop so that each layer is written concurrently. This seems sound, as I can guarantee the layers will never overlap.
I initially thought to convert the RgbaImage to its raw contents using ImageBuffer::into_raw(), then use slice::split_at_mut() repeatedly and have Rayon populate each slice in parallel; but I end up with a messy splitting job and multiple &mut [u8] per layer, with no way of recreating a GenericImage for each layer.
tl;dr How do I wrap multiple images around a central image, in parallel, in a single allocation, using GenericImage to allow using image::imageops::tile() and friends in wrap_mut()?

OpenGL ES 3.x How to (performantly) render blended triangles front-to-back with alpha-blending and early-reject occluded fragments?

I recently found out that one can render alpha-blended primitives correctly not just back-to-front but also front-to-back (http://hacksoflife.blogspot.com/2010/02/alpha-blending-back-to-front-front-to.html) by using GL_ONE_MINUS_DST_ALPHA, GL_ONE, premultiplying the fragment's alpha in the fragment shader, and clearing the destination alpha to zero before rendering.
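For reference, a minimal sketch of that blend setup in GL ES C calls, assuming the premultiplication happens in the fragment shader as the article describes:
// Clear the destination alpha to zero before rendering.
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
glClear(GL_COLOR_BUFFER_BIT);
glEnable(GL_BLEND);
// result = src * (1 - dst.a) + dst * 1; the fragment shader must output
// premultiplied alpha (rgb *= a) for the color channels.
glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE);
// ... draw primitives sorted front-to-back ...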
It occurred to me that it would then be great if one could combine this with EITHER early-z rejection OR some kind of early "destination-alpha testing" in order to discard fragments that won't contribute to the final pixel color.
When rendering with front-to-back alpha-blending, a fragment can be skipped if the destination-alpha at this location already contains the value 1.0.
I did prototype-implement that by using GL_EXT_shader_framebuffer_fetch to test the destination alpha at the start of the pixel shader and then manually discard the fragment if the value is above a certain threshold. That works but it made things actually slower on my test hardware (Snapdragon XR2) - so I wonder:
whether it's somehow possible to not even have the fragment shader execute if destination alpha is already above a certain threshold?
alternatively, if it would be possible to only write to the depth buffer for fragments that are completely opaque and leave the current depth buffer value unchanged for all fragments that have an alpha value of less than 1 (but still depth-test every fragment), that should allow the hardware to use early-z rejection for occluded fragments. So,
Is this possible somehow (i.e. use depth testing, but update the depth buffer value only for opaque fragments and leave it unchanged for others)?
Bottom line: this would reduce the overdraw of alpha-blended sprites to only those fragments that contribute to the final pixel color, and I wonder whether there is a performant way of doing this.
For number 2, I think you could modify gl_FragDepth in the fragment shader to achieve something close, but doing so disables early-z rejection, so it wouldn't really help.
I think one viable way to reduce overdraw would be to create a tool to generate a mesh for each sprite which aims to cover a decent proportion of the opaque part of the sprite without using too many verts. I imagine for a typical sprite, even just a well placed quad could cover 80%+.
You'd render the generated opaque geometry of your sprites with depth write enabled, and do a second pass the ordinary way with depth testing enabled to cover the transparent parts.
You would massively reduce overdraw, but significantly increase the complexity of your code and the number of verts rendered. You would also double your draw calls, but if you're atlasing and using texture arrays you might just be going from 1 to 2 draw calls, which is fine. I've never tried it, so I can't say whether it's worth all the effort involved.
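A rough sketch of that two-pass scheme in GL ES C calls; draw_opaque_cores() and draw_sprites_back_to_front() are placeholders for your own draw submission:
// Pass 1: the generated opaque interiors, written with depth so that later
// fragments behind them can be early-z rejected (front-to-back order helps here).
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
draw_opaque_cores();
// Pass 2: the full sprites with alpha blending, depth-tested against pass 1
// but not writing depth, drawn back-to-front as usual.
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA); // assuming premultiplied alpha
glDepthMask(GL_FALSE);
draw_sprites_back_to_front();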

Minimum steps to implement depth-only pass

I have an existing OpenGL ES 3.1 application that renders a scene to an FBO with color and depth/stencil attachments. It uses the usual methods for drawing (glBindBuffer, glDrawArrays, glBlend*, glStencil*, etc.). My task is now to create a depth-only pass that fills the depth attachment with the same values as the main pass.
My question is: what is the minimum number of steps necessary to achieve this and avoid the GPU doing superfluous work (unnecessary shader invocations, etc.)? Is deactivating the color attachment enough, or do I also have to set null shaders, disable blending, etc.?
I assume you need this before the main pass runs, otherwise you would just keep the main pass depth.
Preflight
Create specialized buffers which contain only the mesh data needed to compute position (which are deinterleaved from all non-position data).
Create specialized vertex shaders (which compute only the output position).
Link programs with the simplest valid fragment shader.
Rendering
Render the depth-only pass using the specialized buffers and shaders, masking out all color writes.
Render the main pass with the full buffers and shaders.
Options
At step (2) above it might be beneficial to load the depth-only pass depth results as the starting depth for the main pass. This will give you better early-zs test accuracy, at the expense of the readback of the depth value. Most mobile GPUs have hidden surface removal, so this isn't always going to be a net gain - it depends on your content, target GPU, and how good your front-to-back draw order is.
You probably want to use the specialized buffers (position data interleaved in one buffer region, non-position interleaved in a second) for the main draw, as many GPUs will optimize out the non-position calculations if the primitive is culled.
The specialized buffers and optimized shaders can also be used for shadow mapping, and other such depth-only techniques.
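A condensed sketch of the rendering half of this in GL ES C calls; the FBO, program, VAO and vertex-count names are placeholders, and the preflight work above is assumed to have been done:
// Depth-only pre-pass: no color writes, position-only buffers and shaders.
glBindFramebuffer(GL_FRAMEBUFFER, scene_fbo);        // same FBO, same depth/stencil attachment
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // nothing to shade, nothing to blend
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
glUseProgram(depth_only_program);                    // position-only VS, trivial FS
glBindVertexArray(position_only_vao);
glDrawArrays(GL_TRIANGLES, 0, vertex_count);
// Main pass, reusing the pre-pass depth: GL_EQUAL needs invariant vertex positions
// between the two passes; GL_LEQUAL is the safe fallback.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthFunc(GL_LEQUAL);
glDepthMask(GL_FALSE);
glUseProgram(main_program);
glBindVertexArray(full_vao);
glDrawArrays(GL_TRIANGLES, 0, vertex_count);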

Not calling glClear results in weird artifacts

I want to NOT call glClear for the depth or color bit, because I want to be able to see all the previously rendered frames. It does work, except that it repeats the model all over the X and Y axes and also causes some strange grey blocky lines. Is there a way to accomplish this? I'm using OpenGL ES 3 on Android. Thank you for any help.
The contents of the default framebuffer at the start of a frame are undefined, especially on tile-based renderers, which most mobile GPUs are. Your "repeats" in the X and Y axes are likely just showing how big the tiles are on your particular GPU (i.e. it's just dumping out whatever is in the GPU-local tile RAM, repeated N times to completely cover the screen).
If you want to render on top of the previous frame, you need to configure the EGL window surface to use EGL_BUFFER_PRESERVED (the default is EGL_BUFFER_DESTROYED), e.g.:
eglSurfaceAttrib(m_display, m_surface, EGL_SWAP_BEHAVIOR, EGL_BUFFER_PRESERVED);
Note 1: this will incur some overhead (the surface is effectively copied back into tile-local memory), whereas starting with a surface discard or invalidate, or a clear is usually free.
Note 2: this will only preserve color data; there is no means to preserve depth or stencil across frames for the default framebuffer.
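For comparison, the "usually free" start mentioned in Note 1 looks roughly like this in OpenGL ES 3.0, assuming you do not need the previous frame's contents after all:
// Tell the driver the old contents are dead so nothing is reloaded into tile memory.
const GLenum attachments[] = { GL_COLOR, GL_DEPTH, GL_STENCIL };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 3, attachments);
// ...or simply clear everything:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);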

OpenGL ES 2.0 - Reducing memcopies when rendering to textures

Let's say I have four content layers: A, B, C and D; each one represents one type of visual content.
Each layer does several sequential render calls (there are no interleaved render calls from the various layers).
Also, layer B and D need to be rendered to textures in order to apply visual effects. In order to reduce the memory footprint, I use only one FBO with only one texture.
So, at the moment I do:
Render A content;
Bind FBO > Render B content > Unbind FBO > Render texture (B content);
Render C;
Bind FBO > Render D content > Unbind FBO > Render texture (D content).
My main problem with this approach is that every time I bind/unbind the FBO, the default framebuffer is saved/restored to/from memory.
I cannot simply draw layers B and D to the FBO first, since I cannot change the rendering order of the layers.
Is there any better way to do this and avoid many saves/restores of the main framebuffer? Keep in mind that this is an example and the real case is more complex (more layers).
Being able to switch between render targets quickly is the primary original purpose of the FBO feature. It is generally faster than the older pbuffer approach because it does not have to deal with switching rendering contexts, and FBOs are not as dependent on EGL to allocate the rendering surfaces as pbuffers are. If the Adreno 200 cannot switch FBOs quickly, then that is an implementation problem specific to Adreno.
The driver will bring the contents of the FBO back from memory (which is a costly operation) unless you clear the FBO with glClear() before drawing. Clearing the FBO hints the driver to discard the current contents instead of reloading them from main memory. If your FBOs have a depth buffer, make sure you also include that bit when calling glClear().
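A minimal sketch of that per-layer pattern (GL ES 2.0 calls; layer_fbo is a placeholder):
glBindFramebuffer(GL_FRAMEBUFFER, layer_fbo);
// Clear immediately after binding so the driver knows the previous contents are
// dead and never reloads them; include the depth bit if the FBO has a depth attachment.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// ... render layer B (or D) ...
glBindFramebuffer(GL_FRAMEBUFFER, 0);
// ... draw the FBO's color texture into the default framebuffer ...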
Since you are only using one FBO, you may not be able to get around this entirely. Check whether your memory requirements really are so strict that you cannot use a second FBO: if you can, render B to your first FBO, then D to your second, and then all your content layers together to the screen.
