OpenGL, alpha blending with packed buffer and multiple textures - sorting

I'm drawing a fairly simple 2D scene containing only rectangles. I have one FloatBuffer into which I put X, Y, Z, R, G, B, A, U, and V data for each vertex.
I draw using glDrawArrays and GL_TRIANGLE_STRIP, keeping the rectangles separate with degenerate vertices.
To facilitate the use of multiple textures, I keep separate float arrays for each texture's draw calls. The texture is binded, the float array is put into the FloatBuffer, and I draw.
Then the next texture is then binded and this continues until I have drawn all of my textures for this render.
I use an Orthographic projection so I can use the Z coordinates and GL_DEPTH_TEST for setting depth independently of the draw order.
To use alpha blending, every piece of advice on the internet seems to say:
GLES20.glEnable(GLES20.GL_BLEND);
GLES20.glBlendFunc(GLES20.GL_SRC_ALPHA, GLES20.GL_ONE_MINUS_SRC_ALPHA);
This works fine per each texture's "draw", because I have the draw calls sorted in the buffer from back to front before drawing. I have no way to correctly draw texture2 under a partially transparent texture1 because of the depth test and texture1 being drawn before texture2. texture1 chops off the overlapping part of texture2 because the depth test says that texture1 is in front of texture2.
The only ways I see around this are
1) only using 1 texture in the whole program, and 2) not using transparent textures. Neither of these are acceptable options.
Basically, I need a way to have alpha blending without needing to sort back-to-front. Is this possible?

It sounds like you might need to do Depth Peeling. Here's a PDF that shows how to do it.

Related

In SceneKit, how can one tile a texture on differently sized objects while keeping draw calls minimal?

To improve performance/fps in a SceneKit scene, I would like to minimise the number of draw calls. The scene contains a procedurally generated city, for which I generate houses of random heights (each an SCNBox) and tile them with a single, identical repeating facade texture, like so:
The proper way to apply the textures appears to be as follows:
let material = SCNMaterial()
material.diffuse.contents = image
material.diffuse.wrapS = SCNWrapMode.repeat
material.diffuse.wrapT = SCNWrapMode.repeat
buildingGeometry.firstMaterial = material
This works. But as written, it stretches the material to fit the size of the faces of the box. To resize the textures to maintain aspect ratio, one needs to add the following code:
material.diffuse.contentsTransform = SCNMatrix4MakeScale(sx, sy, sz)
where sx, sy and sz are appropriate scale factors derived from size of the faces in the geometry. This also works.
But that latter approach implies that every node needs a custom material, which in turn means that I cannot re-use a single material for all of the houses, which in turn means that every single node requires an extra draw call.
Is there a way to use a single texture material to tile all of the houses (without stretching the texture)?
Using a surface shader modifier (SCNShaderModifierEntryPointSurface) you could modify _surface.diffuseTexcoord based on scn_node.boundingBox.
Since the bounding box is dynamically fed to the shader all the objects will be using the same shader and will benefit from instancing (reducing the number of draw calls).
The SCNShadable.h header file has more details on that.

Per-object post-processing

Suppose I need to render the following scene:
Two cubes, one yellow, another red.
The red cube needs to 'glow' with red light, the yellow one does not glow.
The cubes are rotating around the common center of gravity.
The camera is positioned in
such a way that when the red, glowing cube is close to the camera,
it partially obstructs the yellow cube, and when the yellow cube is
close to the camera, it partially obstructs the red, glowing one.
If not for the glow, the scene would be trivial to render. With the glow, I can see at least 2 ways of rendering it:
WAY 1
Render the yellow cube to the screen.
Compute where the red cube will end up on the screen (easy, we have the vertices +the model view matrix), so render it to an off-screen
FBO just big enough (leave margins for the glow); make sure to save
the Depths to a texture.
Post-process the FBO and make the glow.
Now the hard part: merge the FBO with the screen. We need to take into account the Depths (which we have stored in a texture) so looks
like we need to do the following:
a) render a quad , textured with the FBO's color attachment.
b) set up the ModelView matrix appropriately (
we need to move the texture by some vector because we intentionally
rendered the red cube to a smaller than the screen FBO in step 2 (for
speed reasons!)) c) in the 'merging' fragment shader, we need to write
the gl_FragDepth from FBO's Depth attachment texture (and not from
FragCoord.z)
WAY2
Render both cubes to a off-screen FBO; set up stencil so that the unobstructed part of the red cube is marked with 1's.
Post-process the FBO so that the marked area gets blurred and blend this to make the glow
Blit the FBO to the screen
WAY 1 works, but major problem with it is speed, namely step 4c. Writing to gl_FragDepth in fragment shader disables the early z-test.
WAY 2 also kind of works, and looks like it should be much faster, but it does not give 100% correct results.
The problem is when the red cube is partially obstructed by the yellow one, pixels of the red cube that are close to the yellow one get 'yellowish' when we blur them, i.e. the closer, yellow cube 'creeps' into the glow.
I guess I could kind of remedy the above problem by, when I am blurring, stop blurring when the pixels I am reading suddenly decrease in Depth (means we just jumped from a further object to a closer one) but that would mean twice as many texture accesses when blurring (in addition to fetching the COLOR texture we need to keep fetching the DEPTH texture), and a conditional statement in the blurring fragment shader. I haven't tried, but I am not convinced it would be any faster than WAY 1, and even that wouldn't give 100% correct results (the red pixels close to the border with the yellow cube would be only influenced by the visible part of the red cube, rather than the whole (-blurRadius,+blurRadius) area so in this area the glow would not be 100% the same).
Would anyone have suggestions how to best implement such 'per-object post-processing' ?
EDIT:
What I am writing is a sort of OpenGL ES library for graphics effects. Clients are able to give it a series of instructions like 'take this Mesh, texture it with this, apply the following matrix transformations it its ModelView matrix, apply the following distortions to its vertices, the following set of fragment effects, render to the following Framebuffer'.
In my library, I already have what I call 'matrix effects' (modifying the Model View) 'vertex effects' (various vertex distortions) and 'fragment effects' (various changes of RGBA per-fragment).
Now I am trying to add what I call 'post-processing' effects, this 'GLOW' being the first of them. I define the effect and I vision it exactly as you described above.
The effects are applied to whole Meshes; thus now I need what I call 'per-object post-processing'.
The library is aimed mostly at kind of '2.5D' usages, like GPU-accelerated UIs in Mobile Apps, 2-2.5D games (think Candy Crush), etc. I doubt people will actually ever use it for any real 3D, large game.
So FPS, while always important, is a bit less crucial then usually.
I try really hard to keep the API 'Mesh-local', i.e. the rendering pipeline only knows about the current Mesh it is rendering. Main complaint about the above is that it has to be aware of the whole set me meshes we are going to render to a given Framebuffer. That being said, if 'mesh-locality' is impossible or cannot be done efficiently with post-processing effects, then I guess I'll have to give it up (and make my Tutorials more complicated).
Yesterday I was thinking about this:
# 'Almost-Mesh-local' algorithm for rendering N different Meshes, some of them glowing
Create FBO, attach texture the size of the screen to COLOR0, another texture 1/4 the size of the screen to COLOR1.
Enable DEPTH test, clear COLOR/DEPTH
FOREACH( glowing Mesh )
{
use MRT to render it to COLOR0 and COLOR1 in one go
}
Detach COLOR1, attach STENCIL texture
Set up STENCIL so that the test always passes and writes 1s when Depth test passes
Switch off DEPTH/COLOR writes
FOREACH( glowing Mesh )
{
enlarge it by N% (amount of GLOW needs to be modifiable!)
render to STENCIL // i.e. mark the future 'glow' regions with 1s in stencil
}
Set up STENCIL so that test always passes and writes 0 when Depth test passes
Switch on DEPTH/COLOR writes
FOREACH( not glowing Mesh )
{
render to COLOR0/STENCIL/DEPTH // now COLOR0 contains everything rendered, except for the GLOW. STENCIL marks the unobstructed glowing areas with 1s
}
Blur the COLOR1 texture with BLUR radius 'N'
Merge COLOR0 and COLOR1 to the screen in the following way:
IF ( STENCIL==0 ) take pixel from COLOR0
ELSE blend COLOR0 and COLOR1
END
This is not Mesh-local (we still need to be able to process all 'glowing' Meshes first) although I call it 'almost Mesh-local' because it differentiates between meshes only on the basis of the Effects being applied to them, and not which one is where or which obstructs which.
It also can have problems when two GLOWING Meshes obstruct each other (blend does not have to be done in the right order) although with the GLOW being half-transparent, I am hoping the final look will be more or less ok.
Looks like it can even be turned into a completely 'Mesh-local' algorithm by doing one giant
FOREACH(Mesh)
{
if( glowing )
{
}
else
{
}
}
although at a cost of having to attach and detach stuff from FBO and setting STENCILS differently at each loop iteration.
A knee-jerk suggestion is to do the hybrid:
compute where the red cube will end up on screen, so render it to an off-screen FBO just big enough (or one the same size as the screen, since creating FBOs on the hoof may not be efficient); don't worry about depths, it's only the colours you're after;
render both cubes to an off-screen FBO; set up stencil so that the unobstructed part of the red cube is marked with 1s;
post-process to the screen by using an original pixel from (2) wherever the stencil is 0, or a blurred pixel computed by sampling (1) wherever the stencil is 1.

How to access the depth value of the neighbor pixels in WebGL?

I want to make a toon border effect. For it, I'll use the depth value of the neighbor pixels of each pixel to determine if it is or isn't supposed to be blacked. How can I access that information inside the fragment shader?
When you render your scene in a normal way (vertex shader, then fragment shader - single pass) then in the fragment shader there is no way to access depth values for another pixels.
But:
You can render scene twice and perform some postprocessing effects. In the first run you store depth values and others (like normals, etc) in RenderTarget (in texture) then you use those textures in the second pass.
Here you have effect from XNA, but can be quickly ported to GLSL: http://xnameetingpoint.weebly.com/shader7f31.html
Here some link about Render to Texture: http://learningwebgl.com/blog/?p=1786
Hint: depth values will not be enough for border detection, you have to use normals as well. But it is covered in the above tutorial from XNA.

performance - drawing many 2d circles in opengl

I am trying to draw large numbers of 2d circles for my 2d games in opengl. They are all the same size and have the same texture. Many of the sprites overlap. What would be the fastest way to do this?
an example of the kind of effect I'm making http://img805.imageshack.us/img805/6379/circles.png
(It should be noted that the black edges are just due to the expanding explosion of circles. It was filled in a moment after this screen-shot was taken.
At the moment I am using a pair of textured triangles to make each circle. I have transparency around the edges of the texture so as to make it look like a circle. Using blending for this proved to be very slow (and z culling was not possible as they were rendered as squares to the depth buffer). Instead I am not using blending but having my fragment shader discard any fragments with an alpha of 0. This works, however it means that early z is not possible (as fragments are discarded).
The speed is limited by the large amounts of overdraw and the gpu's fillrate. The order that the circles are drawn in doesn't really matter (provided it doesn't change between frames creating flicker) so I have been trying to ensure each pixel on the screen can only be written to once.
I attempted this by using the depth buffer. At the start of each frame it is cleared to 1.0f. Then when a circle is drawn it changes that part of the depth buffer to 0.0f. When another circle would normally be drawn there it is not as the new circle also has a z of 0.0f. This is not less than the 0.0f that is currently there in the depth buffer so it is not drawn. This works and should reduce the number of pixels which have to be drawn. However; strangely it isn't any faster. I have already asked a question about this behavior (opengl depth buffer slow when points have same depth) and the suggestion was that z culling was not being accelerated when using equal z values.
Instead I have to give all of my circles separate false z-values from 0 upwards. Then when I render using glDrawArrays and the default of GL_LESS we correctly get a speed boost due to z culling (although early z is not possible as fragments are discarded to make the circles possible). However this is not ideal as I've had to add in large amounts of z related code for a 2d game which simply shouldn't require it (and not passing z values if possible would be faster). This is however the fastest way I have currently found.
Finally I have tried using the stencil buffer, here I used
glStencilFunc(GL_EQUAL, 0, 1);
glStencilOp(GL_KEEP, GL_INCR, GL_INCR);
Where the stencil buffer is reset to 0 each frame. The idea is that after a pixel is drawn to the first time. It is then changed to be none-zero in the stencil buffer. Then that pixel should not be drawn to again therefore reducing the amount of overdraw. However this has proved to be no faster than just drawing everything without the stencil buffer or a depth buffer.
What is the fastest way people have found to write do what I am trying?
The fundamental problem is that you're fill limited, which is the GPUs inability to shade all the fragments you ask it to draw in the time you're expecting. The reason that you're depth buffering trick isn't effective is that the most time-comsuming part of processing is shading the fragments (either through your own fragment shader, or through the fixed-function shading engine), which occurs before the depth test. The same issue occurs for using stencil; shading the pixel occurs before stenciling.
There are a few things that may help, but they depend on your hardware:
render your sprites from front to back with depth buffering. Modern GPUs often try to determine if a collection of fragments will be visible before sending them off to be shaded. Roughly speaking, the depth buffer (or a represenation of it) is checked to see if the fragment that's about to be shaded will be visible, and if not, it's processing is terminated at that point. This should help reduce the number of pixels that need to be written to the framebuffer.
Use a fragment shader that immediately checks your texel's alpha value, and discards the fragment before any additional processing, as in:
varying vec2 texCoord;
uniform sampler2D tex;
void main()
{
vec4 texel = texture( tex, texCoord );
if ( texel.a < 0.01 ) discard;
// rest of your color computations
}
(you can also use alpha test in fixed-function fragment processing, but it's impossible to say if the test will be applied before the completion of fragment shading).

iPhone OpenGL ES: Applying a Depth Test on Textures that have transparent pixels for 2D game

Currently, I have blending and depth testing turn on for a 2D game. When I draw my textures, the "upper" texture remove some portion of the lower textures if they intersect. Clearly, transparent pixels of the textures are taken into account of the depth test, and it clear out all the colors of the drawn lower textures if they intersect. Moreover, alpha blendings are incorrectly rendered. Are there any sort of functions that can tell OpenGL to not include transparent pixels into depth testing?
glEnable( GL_ALPHA_TEST );
glAlphaFunc( GL_EQUAL, 1.0f );
This will discard all pixels with an alpha of anything other than fully opaque. These pixels will, then, not be rendered to the Z-Buffer. This does, however, affect various Z-Buffer pipeline optimisations so it may cause some serious slowdowns. Only use it if you really have too.
No it's not possible. This is true of all hardware depth testing.
GL (full or ES -- and D3D) all have the same model -- they paint in the order you specify polygons. If you draw polygon A in before polygon B, and logically polygon A should be in front on polygon B, polygon B won't be painted (courtesy of the depth test).
The solution is to draw you polygons in order from farthest to nearest the current view origin. Happily in a 2D game this should just be a simple sort (one you probably won't even need to do very often).
In 3D games BSPs are the basic solution to this issue.
if you're using shaders, can try disabling blending and discard the pixels with alpha 0
if(texColor.w == 0.0)
discard;
What type of blending are you using?
glEnable(GL_BLEND);
glBlendFunc (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
Should prevent any fragments with alpha of 0 from writing to the depth buffer.

Resources