Reading depth buffer using OpenGLES3 - opengl-es

In OpenGL I am able to read the z-buffer values, using glReadPixels, like so:
glReadPixels(scrx, scry, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth);
If I do the same in OpenGL ES 3.2 I get a GL_INVALID_OPERATION error.
Checking the specification, I see that OpenGL allows GL_DEPTH_COMPONENT, but OpenGLES3 does not.
As a work-around, I copied the fragment depth to the alpha value in the colour buffer using this GLSL:
#version 320 es
...
outCol = vec4(psCol.rgb, gl_FragCoord.z);
After doing glReadPixels() on the GL_RGBA part of the framebuffer, I use rgba[3]/255.0 as the depth value.
Although this works, the 8-bit precision on the alpha value is insufficient for my purpose of picking what is under the mouse cursor.
Is there a way to get Z values from the framebuffer in OpenGL ES3?

There is an OpenGL ES Extension NV_read_depth which allows reading from the depth buffer by glReadPixels. The extension is written against the OpenGL ES 2.0 Specification, but is still not standard in OpenGL ES 3.2.
Get the set of OpenGL ES extension names like this:
std::set<std::string> ogl_es_extensions;
GLint no_of_extensions = 0;
glGetIntegerv(GL_NUM_EXTENSIONS, &no_of_extensions);
for (int i = 0; i < no_of_extensions; ++i)
{
    std::string ext_name = reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i));
    ogl_es_extensions.insert(ext_name);
}
Note that you can either try to read the depth buffer (NV_read_depth) or the depth and stencil buffers (NV_read_depth_stencil).
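Once the extension names are collected, choosing a read path is a simple set lookup. The following is only a sketch: the enum and function names are hypothetical, and it assumes the driver reports the extensions with the usual GL_ prefix (GL_NV_read_depth, GL_NV_read_depth_stencil), which is worth confirming against the strings your device actually returns.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical names, for illustration only.
enum class DepthReadPath { Depth, DepthStencil, ColourFallback };

// Pick the most direct way to obtain a depth value, given the extension
// names gathered with glGetStringi. Assumes the GL_-prefixed spellings.
DepthReadPath chooseDepthReadPath(const std::set<std::string>& extensions)
{
    if (extensions.count("GL_NV_read_depth") != 0)
        return DepthReadPath::Depth;          // glReadPixels with GL_DEPTH_COMPONENT
    if (extensions.count("GL_NV_read_depth_stencil") != 0)
        return DepthReadPath::DepthStencil;   // packed depth + stencil read
    return DepthReadPath::ColourFallback;     // e.g. depth written into a colour channel
}
```

If neither extension is present, the colour-buffer workaround from the question remains the fallback.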

Related

Texture crop replacement in OpenGL ES 2.0?

I am trying to upgrade code that uses OpenGL ES 1 to ES 2. The conversion is more or less understandable, but I could not find an ES 2 alternative for these lines. I am not even sure why they are necessary.
GLint crop[4] = { 0, h, w, -h };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_CROP_RECT_OES, crop);
If it is for drawing animation, use UV coordinates instead; it is faster.
If you need only part of the texture, use
Bitmap newbitmap = Bitmap.createBitmap(oldbitmap, x, y, w, h);
GLUtils.texImage2D(GL_TEXTURE_2D, 0, newbitmap, 0);
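The UV-coordinate route can be sketched as a small conversion from the old crop rectangle to texture coordinates. This is written in C++ rather than the Java of the answer, the names UvRect and cropToUv are made up for illustration, and it assumes the crop rectangle is {x, y, width, height} in texels as in the question's array. With a negative height, as in { 0, h, w, -h }, the result is a vertical flip.

```cpp
#include <cassert>

// Texture coordinates for the two opposite corners of a quad.
struct UvRect { float u0, v0, u1, v1; };

// Convert a crop rectangle {x, y, width, height} in texels into
// normalized UV coordinates for a texture of the given size.
UvRect cropToUv(const int crop[4], int texWidth, int texHeight)
{
    assert(texWidth > 0 && texHeight > 0);
    const float u0 = static_cast<float>(crop[0]) / static_cast<float>(texWidth);
    const float v0 = static_cast<float>(crop[1]) / static_cast<float>(texHeight);
    const float u1 = static_cast<float>(crop[0] + crop[2]) / static_cast<float>(texWidth);
    const float v1 = static_cast<float>(crop[1] + crop[3]) / static_cast<float>(texHeight);
    return {u0, v0, u1, v1};
}
```

The four values would then be assigned to the quad's vertex attributes, so no per-texture crop state is needed in ES 2.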

How can I properly create an array texture in OpenGL (Go)?

I have a total of two textures. The first is used as a framebuffer to work with inside a compute shader and is later blitted using BlitFramebuffer(...). The second is supposed to be an OpenGL array texture, which is used to look up textures and copy them onto the framebuffer. It's created in the following way:
var texarray uint32
gl.GenTextures(1, &texarray)
gl.ActiveTexture(gl.TEXTURE0 + 1)
gl.BindTexture(gl.TEXTURE_2D_ARRAY, texarray)
gl.TexParameteri(gl.TEXTURE_2D_ARRAY, gl.TEXTURE_MIN_FILTER, gl.LINEAR)
gl.TexImage3D(
	gl.TEXTURE_2D_ARRAY,
	0,
	gl.RGBA8,
	16,
	16,
	22*48,
	0,
	gl.RGBA, gl.UNSIGNED_BYTE,
	gl.Ptr(sheet.Pix))
gl.BindImageTexture(1, texarray, 0, false, 0, gl.READ_ONLY, gl.RGBA8)
sheet.Pix is just the pixel array of an image loaded as a *image.NRGBA
The compute-shader looks like this:
#version 430
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img;
layout(binding = 1) uniform sampler2DArray texAtlas;
void main() {
    ivec2 iCoords = ivec2(gl_GlobalInvocationID.xy);
    vec4 c = texture(texAtlas, vec3(iCoords.x%16, iCoords.y%16, 7));
    imageStore(img, iCoords, c);
}
When I run the program, however, the result is just a window filled with a single color.
So my question is: What did I do wrong during the shader creation and what needs to be corrected?
For any open code questions, here's the corresponding repo
vec4 c = texture(texAtlas, vec3(iCoords.x%16, iCoords.y%16, 7))
That can't work. texture samples the texture at normalized coordinates, so the texture is in [0,1] (in the st domain; the third dimension is the layer and is correct here). Coordinates outside of that range are handled via the GL_TEXTURE_WRAP_... modes you specified (repeat, clamp to edge, clamp to border). Since int % 16 is always an integer, and with repetition only the fractional part of the coordinate matters, you are basically sampling the same texel over and over again.
If you need full texture sampling (texture filtering, sRGB conversions etc.), you have to use normalized coordinates instead. But if you only want to access individual texel data, you can use texelFetch and keep the integer coordinates.
Note that since you set the texture filter to GL_LINEAR, you seem to want filtering; however, your coordinates look as if you want to access the texel centers. So if you go the texture route, then vec3(vec2(iCoords.xy)/vec2(16) + vec2(1.0/32.0), layer) would be the proper normalization to reach the texel centers (together with GL_REPEAT), but then GL_LINEAR filtering would yield results identical to GL_NEAREST.
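The normalization above is just (i + 0.5) / size, which for a 16-texel-wide layer is i/16 + 1/32. A minimal C++ sketch of that arithmetic follows; the helper name texelCentre is made up, and the explicit wrap stands in for what GL_REPEAT would do on the GPU.

```cpp
#include <cassert>

// Normalized coordinate of the centre of texel `texel` along an axis of
// `size` texels. The index is wrapped first, mimicking GL_REPEAT.
float texelCentre(int texel, int size)
{
    assert(size > 0);
    const int wrapped = ((texel % size) + size) % size;  // also handles negatives
    return (static_cast<float>(wrapped) + 0.5f) / static_cast<float>(size);
}
```

For size 16 this gives 1/32 for texel 0 and 31/32 for texel 15, so every sample lands exactly on a texel centre rather than on the shared edge at integer coordinates.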

glsl es 2.0 discard and depth buffer

OpenGL ES 2.0 can't write to Depth Buffer, since it doesn't have gl_FragDepth, but if I do discard - will it affect Depth Buffer (I mean, not overwrite it with CURRENT gl_FragCoord.z, but leave as is)?
Something like this:
void main() {
    if (sin(x) > 0.5) {
        discard;
    } else {
        gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
    }
}
I expect to get holes in Depth Buffer, in discard cases.
Your statement saying "OpenGL ES 2.0 can't write to Depth Buffer" is somewhat misleading. It will write to the depth buffer, as long as there is a depth buffer in the current framebuffer, and depth buffer writes are enabled.
Not having a writable gl_FragDepth variable in the fragment shader means that you can't modify the depth value in the fragment shader. It's always the incoming depth value that is written to the depth buffer.
If you discard a fragment, neither the color nor the depth value is written. So yes, if you discard a fragment, the depth buffer will not be modified for that fragment.
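The behaviour described here can be modelled on the CPU. The sketch below is a toy software model, not how any driver is implemented: Framebuffer and writeFragment are hypothetical names, and it assumes a GL_LESS depth test with depth writes enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy framebuffer: one colour value and one depth value per pixel.
struct Framebuffer {
    std::vector<float> colour;
    std::vector<float> depth;
};

// Returns true if the fragment was written. A discarded fragment leaves
// both the colour and the depth of the pixel untouched.
bool writeFragment(Framebuffer& fb, std::size_t pixel,
                   float fragDepth, float fragColour, bool discarded)
{
    assert(pixel < fb.colour.size() && pixel < fb.depth.size());
    if (discarded)
        return false;                    // nothing is written at all
    if (!(fragDepth < fb.depth[pixel]))
        return false;                    // fails the GL_LESS depth test
    fb.colour[pixel] = fragColour;
    fb.depth[pixel] = fragDepth;         // always the incoming depth
    return true;
}
```

In this model a discarded fragment at depth 0.25 leaves a cleared depth of 1.0 in place, which is the "hole" the question expects.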

Storing floats in a texture in OpenGL ES

In WebGL, I am trying to create a texture with texels each consisting of 4 float values. Here I attempt to create a simple texture with one vec4 in it.
var textureData = new Float32Array(4);
var texture = gl.createTexture();
gl.activeTexture( gl.TEXTURE0 );
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texImage2D(
    // target, level, internal format, width, height
    gl.TEXTURE_2D, 0, gl.RGBA, 1, 1,
    // border, data format, data type, pixels
    0, gl.RGBA, gl.FLOAT, textureData
);
My intent is to sample it in the shader using a sampler like so:
uniform sampler2D data;
...
vec4 retrieved = texture2D(data, vec2(0.0, 0.0));
However, I am getting an error during gl.texImage2D:
WebGL: INVALID_ENUM: texImage2D: invalid texture type
WebGL error INVALID_ENUM in texImage2D(TEXTURE_2D, 0, RGBA, 1, 1, 0, RGBA, FLOAT,
[object Float32Array])
Comparing the OpenGL ES spec and the OpenGL 3.3 spec for texImage2D, it seems like I am not allowed to use gl.FLOAT. In that case, how would I accomplish what I am trying to do?
You can create a byte array from your float array. Each float takes 4 bytes (a 32-bit float). This array can be uploaded as a texture using the standard RGBA format with unsigned bytes. This creates a texture where each texel holds a single 32-bit floating-point number, which seems to be exactly what you want.
The only problem is that your floating-point value is split into 4 separate channel values when you retrieve it from the texture in your fragment shader. So what you are looking for is most likely "how to convert a vec4 into a single float".
Note that what you are trying to do, an RGBA internal format consisting of 32-bit floats, will not work: your texture will always be 32 bits per texel, so even if you somehow forced floats into the texture, the result would be clamping or precision loss. And even if each texel did consist of four 32-bit floats, your shader would most likely treat them as lowp when using texture2D at some point.
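The byte-array idea from this answer can be sketched on the CPU side as follows. This is C++ rather than the question's JavaScript, the function names are hypothetical, and it only covers the host-side packing and a round-trip check; reassembling the float from four normalized channels inside the shader is a separate step that is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// View each 32-bit float as four bytes, suitable for an RGBA /
// UNSIGNED_BYTE upload with one float per texel.
std::vector<std::uint8_t> packFloats(const std::vector<float>& values)
{
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
    std::vector<std::uint8_t> bytes(values.size() * sizeof(float));
    if (!bytes.empty())
        std::memcpy(bytes.data(), values.data(), bytes.size());
    return bytes;
}

// Inverse of packFloats, used here only to show that no bits are lost.
std::vector<float> unpackFloats(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() % sizeof(float) == 0);
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!values.empty())
        std::memcpy(values.data(), bytes.data(), bytes.size());
    return values;
}
```

The byte order inside each texel follows the host's endianness, so a shader-side decoder would have to agree on which channel holds which byte.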
The solution to my problem is actually quite simple! I just needed to type
var float_texture_ext = gl.getExtension('OES_texture_float');
Now WebGL can use texture floats!
This MDN page tells us why:
Note: In WebGL, unlike in other GL APIs, extensions are only available if explicitly requested.

How can I improve performance of Direct3D when I'm writing to a single vertex buffer thousands of times per frame?

I am trying to write an OpenGL wrapper that will allow me to use all of my existing graphics code (written for OpenGL) and will route the OpenGL calls to Direct3D equivalents. This has worked surprisingly well so far, except performance is turning out to be quite a problem.
Now, I admit I am most likely using D3D in a way it was never designed. I am updating a single vertex buffer thousands of times per render loop. Every time I draw a "sprite" I send 4 vertices to the GPU with texture coordinates, etc and when the number of "sprites" on the screen at one time gets to around 1k to 1.5k, then the FPS of my app drops to below 10fps.
Using the VS2012 Performance Analysis (which is awesome, btw), I can see that the ID3D11DeviceContext->Draw method is taking up the bulk of the time:
Screenshot Here
Is there some setting I'm not using correctly while setting up my vertex buffer, or during the draw method? Is it really, really bad to be using the same vertex buffer for all of my sprites? If so, what other options do I have that wouldn't drastically alter the architecture of my existing graphics code base (which are built around the OpenGL paradigm...send EVERYTHING to the GPU every frame!)
The biggest FPS killer in my game is when I'm displaying a lot of text on the screen. Each character is a textured quad, and each one requires a separate update to the vertex buffer and a separate call to Draw. If D3D or hardware doesn't like many calls to Draw, then how else can you draw a lot of text to the screen at one time?
Let me know if there is any more code you'd like to see to help me diagnose this problem.
Thanks!
Here's the hardware I'm running on:
Core i7 @ 3.5GHz
16 gigs of RAM
GeForce GTX 560 Ti
And here's the software I'm running:
Windows 8 Release Preview
VS 2012
DirectX 11
Here is the draw method:
void OpenGL::Draw(const std::vector<OpenGLVertex>& vertices)
{
    auto matrix = *_matrices.top();
    _constantBufferData.view = DirectX::XMMatrixTranspose(matrix);
    _context->UpdateSubresource(_constantBuffer, 0, NULL, &_constantBufferData, 0, 0);
    _context->IASetInputLayout(_inputLayout);
    _context->VSSetShader(_vertexShader, nullptr, 0);
    _context->VSSetConstantBuffers(0, 1, &_constantBuffer);
    D3D11_PRIMITIVE_TOPOLOGY topology = D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP;
    ID3D11ShaderResourceView* texture = _textures[_currentTextureId];
    // Set shader texture resource in the pixel shader.
    _context->PSSetShader(_pixelShaderTexture, nullptr, 0);
    _context->PSSetShaderResources(0, 1, &texture);
    D3D11_MAPPED_SUBRESOURCE mappedResource;
    D3D11_MAP mapType = D3D11_MAP::D3D11_MAP_WRITE_DISCARD;
    auto hr = _context->Map(_vertexBuffer, 0, mapType, 0, &mappedResource);
    if (SUCCEEDED(hr))
    {
        OpenGLVertex *pData = reinterpret_cast<OpenGLVertex *>(mappedResource.pData);
        memcpy(&(pData[_currentVertex]), &vertices[0], sizeof(OpenGLVertex) * vertices.size());
        _context->Unmap(_vertexBuffer, 0);
    }
    UINT stride = sizeof(OpenGLVertex);
    UINT offset = 0;
    _context->IASetVertexBuffers(0, 1, &_vertexBuffer, &stride, &offset);
    _context->IASetPrimitiveTopology(topology);
    _context->Draw(vertices.size(), _currentVertex);
    _currentVertex += (int)vertices.size();
}
And here is the method that creates the vertex buffer:
void OpenGL::CreateVertexBuffer()
{
    D3D11_BUFFER_DESC bd;
    ZeroMemory(&bd, sizeof(bd));
    bd.Usage = D3D11_USAGE_DYNAMIC;
    bd.ByteWidth = _maxVertices * sizeof(OpenGLVertex);
    bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
    bd.CPUAccessFlags = D3D11_CPU_ACCESS_FLAG::D3D11_CPU_ACCESS_WRITE;
    bd.MiscFlags = 0;
    bd.StructureByteStride = 0;
    D3D11_SUBRESOURCE_DATA initData;
    ZeroMemory(&initData, sizeof(initData));
    _device->CreateBuffer(&bd, NULL, &_vertexBuffer);
}
Here is my vertex shader code:
cbuffer ModelViewProjectionConstantBuffer : register(b0)
{
    matrix model;
    matrix view;
    matrix projection;
};
struct VertexShaderInput
{
    float3 pos : POSITION;
    float4 color : COLOR0;
    float2 tex : TEXCOORD0;
};
struct VertexShaderOutput
{
    float4 pos : SV_POSITION;
    float4 color : COLOR0;
    float2 tex : TEXCOORD0;
};
VertexShaderOutput main(VertexShaderInput input)
{
    VertexShaderOutput output;
    float4 pos = float4(input.pos, 1.0f);
    // Transform the vertex position into projected space.
    pos = mul(pos, model);
    pos = mul(pos, view);
    pos = mul(pos, projection);
    output.pos = pos;
    // Pass through the color without modification.
    output.color = input.color;
    output.tex = input.tex;
    return output;
}
What you need to do is batch vertexes as aggressively as possible, then draw in large chunks. I've had very good luck retrofitting this into old immediate-mode OpenGL games. Unfortunately, it's kind of a pain to do.
The simplest conceptual solution is to use some sort of device state (which you're probably tracking already) to create a unique stamp for a particular set of vertexes. Something like blend modes and bound textures is a good set. If you can find a fast hashing algorithm to run on the struct that holds that state, you can store it pretty efficiently.
Next, you need to do the vertex caching. There are two ways to handle that, both with advantages. The most aggressive, most complicated, and in the case of many sets of vertexes with similar properties, most efficient is to make a struct of device states, allocate a large (say 4KB) buffer, and proceed to store vertexes with matching states in that array. You can then dump the entire array into a vertex buffer at the end of the frame, and draw chunks of the buffer (to recreate original order). Keeping track of all the buffer and state and order is difficult, however.
The simpler method, which can provide a good bit of caching under good circumstances, is to cache vertexes in a large buffer until device state changes. At that point, prior to actually changing state, dump the array into a vertex buffer and draw. Then reset the array index, commit state changes, and go again.
If your application has large numbers of similar vertexes, which is very possible working with sprites (texture coordinates and colors may change, but good sprites will use a single texture atlas and few blending modes), even the second method can give some performance boosts.
The trick here is to build up a cache in system memory, preferably a large chunk of pre-allocated memory, then dump it to video memory just prior to drawing. This allows you to perform far fewer writes to video memory and draw calls, which tend to be expensive (especially together). As you've seen, the number of calls you make gets to be slow, and batching stands a good chance of helping with that. The trick is to not allocate memory each frame if you can help it, batch large enough chunks to be worthwhile, and maintain correct device state and order for each draw.
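The simpler of the two caching methods (accumulate until device state changes, then flush) might look roughly like this. It is a sketch under several assumptions: Vertex, RenderState and SpriteBatcher are hypothetical types, the flush callback stands in for the Map/memcpy/Draw sequence from the question, and each sprite is assumed to arrive as independent triangles (six vertices) so that consecutive sprites can share one draw call, which separate triangle strips cannot do directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Vertex { float x, y, u, v; };

// The state "stamp": sprites with an equal stamp can share a draw call.
struct RenderState {
    int textureId;
    int blendMode;
    bool operator==(const RenderState& o) const {
        return textureId == o.textureId && blendMode == o.blendMode;
    }
};

class SpriteBatcher {
public:
    using FlushFn = std::function<void(const RenderState&, const std::vector<Vertex>&)>;

    SpriteBatcher(std::size_t capacity, FlushFn flush)
        : capacity_(capacity), flush_(std::move(flush))
    {
        assert(capacity_ > 0);
        cache_.reserve(capacity_);   // allocate once, not every frame
    }

    // Queue vertices; flush first if the state changed or the cache is full.
    void add(const RenderState& state, const std::vector<Vertex>& verts)
    {
        assert(verts.size() <= capacity_);
        const bool stateChanged = !(state == state_);
        const bool wouldOverflow = cache_.size() + verts.size() > capacity_;
        if (!cache_.empty() && (stateChanged || wouldOverflow))
            flush();
        state_ = state;
        cache_.insert(cache_.end(), verts.begin(), verts.end());
    }

    // One upload and one draw for everything queued so far.
    void flush()
    {
        if (cache_.empty())
            return;
        flush_(state_, cache_);
        cache_.clear();              // keeps the reserved capacity
    }

private:
    std::size_t capacity_;
    FlushFn flush_;
    RenderState state_{0, 0};
    std::vector<Vertex> cache_;
};
```

A caller would invoke flush() once more at the end of the frame. With text drawn from a single font atlas, a whole string then costs one buffer update and one draw call instead of one per character.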

Resources