I need to create a texture for an OpenGL ES 2.0 application with the following specs:
Each pixel has two components (let's call them r and g in the fragment shader).
Each pixel component is a 16-bit float.
That means every pixel in the texture is 4 bytes (2 bytes / 16 bits per component).
The fragment shader should be able to sample the texture as two float16 components.
All formats must be supported on OpenGL ES 2.0 and be as efficient as possible.
How would the appropriate glTexImage2D call look?
Regards
Neither floating point textures nor floating point render targets are supported in OpenGL ES 2.x. The short answer is therefore "you can't do what you are trying to do", at least not natively.
You can emulate higher precision by packing pairs of values into an RGBA8 texture or render target, e.g. the RG pair holds one value and the BA pair the other, but you'll have to pack/unpack the 8-bit unorm components yourself in shader code. This is quite a common solution in deferred rendering G-buffers, for example, but it can be relatively expensive on some of the lower-end mobile GPU parts (given it's basically just overhead, rather than useful rendering).
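A minimal sketch of that packing scheme, assuming the two values are 16-bit unorms split into high/low byte pairs on the CPU; the helper name and the shader reconstruction in the trailing comment are illustrative, not part of the original answer:

#include <GLES2/gl2.h>
#include <stdint.h>
#include <vector>

// Pack two 16-bit unorm values per pixel into an RGBA8 texture:
// R/G hold the high/low bytes of the first value, B/A those of the second.
GLuint createPackedTexture(const uint16_t* first, const uint16_t* second,
                           int width, int height)
{
    std::vector<uint8_t> rgba(width * height * 4);
    for (int i = 0; i < width * height; ++i) {
        rgba[i * 4 + 0] = first[i] >> 8;     // R: high byte of first value
        rgba[i * 4 + 1] = first[i] & 0xFF;   // G: low byte of first value
        rgba[i * 4 + 2] = second[i] >> 8;    // B: high byte of second value
        rgba[i * 4 + 3] = second[i] & 0xFF;  // A: low byte of second value
    }

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Use nearest filtering: linear filtering would blend the packed bytes
    // before the shader can reconstruct the 16-bit values.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
    return tex;
}

// In the fragment shader the reconstruction is the reverse, e.g.:
//   vec4 p = texture2D(u_tex, v_uv);
//   float r = (p.r * 255.0 * 256.0 + p.g * 255.0) / 65535.0;
//   float g = (p.b * 255.0 * 256.0 + p.a * 255.0) / 65535.0;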
I am new to OpenGL ES. I am currently reading the docs for version 2.0 of OpenGL ES. I have a triangular 2D mesh, a 2D RGB texture, and I need to compute, for every triangle, the following quantities:
where N is the number of pixels of a given triangle. These quantities are needed for further CPU processing. The idea would be to use GPU rasterization to sum the quantities over triangles. I am not able to see how to do this with OpenGL ES 2.0 (which is the most popular version among Android devices). Another question I have is: is it possible to do this type of computation with OpenGL ES 3.0?
I am not able to see how to do this with OpenGL ES 2.0
You can't; the API simply isn't designed to do it.
Is it possible to do this type of computation with OpenGL ES 3.0?
In the general case, no. If you can use OpenGL ES 3.1 and if you can control the input geometry then a viable algorithm would be:
Add a vertex attribute which is the primitive ID for each triangle in the mesh (which we can use as an array index).
Allocate an atomic counter buffer (GL_ATOMIC_COUNTER_BUFFER) with one counter per primitive, pre-zeroed.
In the fragment shader, increment the counter corresponding to the current primitive (loaded from the vertex attribute); see the sketch below.
Performance is likely to be pretty horrible though; atomics generally suck on most GPU implementations.
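A rough sketch of the buffer setup for that algorithm (ES 3.1). The binding point, array size, and the u_pixelCount / v_primitiveId names are illustrative, and the implementation's GL_MAX_FRAGMENT_ATOMIC_COUNTERS limit still applies:

#include <GLES3/gl31.h>
#include <vector>

// Pre-zeroed atomic counter buffer with one 32-bit counter per triangle.
GLuint createPerPrimitiveCounters(GLsizei primitiveCount)
{
    std::vector<GLuint> zeros(primitiveCount, 0u);
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, buf);
    glBufferData(GL_ATOMIC_COUNTER_BUFFER,
                 primitiveCount * sizeof(GLuint), zeros.data(),
                 GL_DYNAMIC_COPY);
    // Binding point 0 must match the binding declared in the fragment shader.
    glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, buf);
    return buf;
}

// Fragment shader side (GLSL ES 3.10), with a flat varying carrying the
// per-triangle index from the vertex attribute:
//   layout(binding = 0) uniform atomic_uint u_pixelCount[MAX_PRIMS];
//   ...
//   atomicCounterIncrement(u_pixelCount[v_primitiveId]);
// After the draw call, map the buffer (glMapBufferRange) to read the
// per-triangle counts back on the CPU.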
As OpenGL ES does not support shared "uniform blocks", I was wondering if there is a way I can store matrices that can be referenced by a number of different shaders. A simple example would be a worldToViewport or worldToEye matrix, which would not change for an entire frame and which all shaders would reference. I saw one post where someone uses 3 or 4 dot calls on a vertex to transform it from 4 "column vectors", but I am wondering if there is a way to assign the buffer data to a "mat4" in the shader.
Ah yes, the need for this is WebGL, which at the moment seems to support only OpenGL ES 2.0.
I wonder if it supports indexed attribute buffers, as I assume they don't need to be any particular size relative to the size of the position vertex array.
Then if one can use a hard-coded or calculated index into the attribute buffer (in the shader), and if one can bind more than one attribute buffer at a time and access all buffers "bound to the shader" simultaneously in a shader ...
If all of that is true, I can see it might work. I need a good language/architecture reference on shaders, as I am somewhat new to shader programming; I'm trying to design a wall without knowing the shapes of the bricks :)
Vertex attributes are per-vertex, so there is no way to share vertex attributes amongst multiple vertices.
OpenGL ES 2.0 upwards has CPU-side uniforms, which must be uploaded individually from the CPU at draw time. Uniforms belong to the program object, so for uniforms which are constant for a frame you only have to modify each program once per frame, meaning the cost isn't necessarily proportional to draw count.
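A sketch of that ES 2.0 pattern, setting a per-frame matrix once on each program that uses it; u_worldToViewport and the helper name are illustrative:

#include <GLES2/gl2.h>

// Upload a per-frame, column-major 4x4 matrix to every program that uses it.
void setFrameMatrix(const GLuint* programs, int programCount,
                    const GLfloat worldToViewport[16])
{
    for (int i = 0; i < programCount; ++i) {
        glUseProgram(programs[i]);
        GLint loc = glGetUniformLocation(programs[i], "u_worldToViewport");
        // ES 2.0 requires the transpose argument to be GL_FALSE.
        if (loc != -1)
            glUniformMatrix4fv(loc, 1, GL_FALSE, worldToViewport);
    }
}

Caching the uniform locations once after linking, rather than querying them every frame, avoids the repeated glGetUniformLocation lookups.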
OpenGL ES 3.0 onwards has Uniform Buffer Objects (UBOs) which allow you to load uniforms from a buffer in memory.
I'm not sure what you mean by "doesn't support shared uniform blocks", as that's pretty much what a UBO is, although it won't work on older hardware which only supports OpenGL ES 2.x.
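For the ES 3.0 path, a minimal UBO sketch where every program binds the same block to binding point 0; the FrameBlock name, binding index, and helper names are illustrative:

#include <GLES3/gl3.h>

// GLSL ES 3.00 side, shared by all shaders:
//   layout(std140) uniform FrameBlock { mat4 u_worldToViewport; };

GLuint createFrameUbo()
{
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 16 * sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  // binding point 0
    return ubo;
}

// Done once per program after linking.
void bindProgramToFrameUbo(GLuint program)
{
    GLuint blockIndex = glGetUniformBlockIndex(program, "FrameBlock");
    if (blockIndex != GL_INVALID_INDEX)
        glUniformBlockBinding(program, blockIndex, 0);
}

// Done once per frame; every bound program then sees the new matrix.
void updateFrameUbo(GLuint ubo, const GLfloat worldToViewport[16])
{
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, 16 * sizeof(GLfloat), worldToViewport);
}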
This question is for OpenGL ES 2.0 (on Android) but may be more general to OpenGL.
Ultimately all performance questions are implementation-dependent, but if anyone can answer this question in general or based on their experience that would be helpful. I'm writing some test code as well.
I have a YUV (12bpp) image I'm loading into a texture and color-converting in my fragment shader. Everything works fine but I'd like to see where I can improve performance (in terms of frames per second).
Currently I'm actually loading three textures for each image - one for the Y component (of type GL_LUMINANCE), one for the U component (of type GL_LUMINANCE and of course 1/4 the size of the Y component), and one for the V component (of type GL_LUMINANCE and of course 1/4 the size of the Y component).
Assuming I can get the YUV pixels in any arrangement (e.g. the U and V in separate planes or interspersed), would it be better to consolidate the three textures into only two or only one? Obviously it's the same number of bytes to push to the GPU no matter how you do it, but maybe with fewer textures there would be less overhead. At the very least, it would use fewer texture units. My ideas:
If the U and V pixels were interspersed with each other, I could load them in a single texture of type GL_LUMINANCE_ALPHA which has two components.
I could load the entire YUV image as a single texture (of type GL_LUMINANCE but 3/2 the size of the image) and then, in the fragment shader, call texture2D() three times on the same texture, doing a bit of arithmetic to figure out the correct co-ordinates to pass to texture2D for the Y, U and V components.
I would combine the data into as few textures as possible. Using fewer textures is usually the better option, for a few reasons:
Fewer state changes to set up the draw call.
The fewer texture fetches in a fragment shader the better.
Less upload time.
Sources:
I understand some of these are focused on more specific hardware, but the principles apply to most mobile graphics architectures.
Best Practices for Working with Texture Data
Optimize OpenGL for Tegra
Optimizing performance of a heavy fragment shader
"Binding to a texture takes time for OpenGL ES to process. Apps that reduce the number of changes they make to OpenGL ES state perform better. "
"In my experience mobile GPU performance is roughly proportional to the number of texture2D calls." "There are two texture loads, so the minimum cycle count for the texture sub-unit is two." (Tegra has a texture unit which has to run a cycle for reach texture read)
"making calls to the glTexSubImage and glCopyTexSubImage functions particularly expensive" - upload operations must stall the pipeline until textures are uploaded. It is faster to batch these into a single upload than block a bunch of separate times.
In OpenGL ES 1.x, one could do glTranslate first, and then glRotate, to modify where the center of rotation is located (i.e. rotate around a given point). As far as I understand, in OpenGL ES 2.0 matrix computations are done on the CPU side. I am using IwGeom (from the Marmalade SDK) – a typical (probably) matrix package. From the documentation:
Matrices in IwGeom are effectively in 4x3 format, comprising a 3x3 rotation and a 3-component vector translation.
I find it hard to obtain the same effect using this method: the translation is always applied after the rotation. Furthermore, in Marmalade one also sets the model matrix:
IwGxSetModelMatrix( &modelMatrix );
And, apparently, rotation and translation are also applied in one order: a) rotation, b) translation.
How to obtain the OpenGL ES 1.x effect?
Marmalade's IwGx wraps OpenGL, and it is more similar to GLES 1.x than GLES 2.0, as it does not require shaders.
glTranslate and glRotate modify the view matrix.
You can replace them with:
CIwFMat viewMat1 = IwGxGetModelMatrix();   // save the current model matrix
CIwFMat rot;
rot.SetIdentity();
rot.SetRotZ(.....);                        // or another matrix rotation function
CIwFMat viewMat2 = viewMat1;
viewMat2.PostMult(rot);                    // or viewMat2.PreMult(rot), depending on the order you need
IwGxSetModelMatrix(&viewMat2);
// Draw something
IwGxSetModelMatrix(&viewMat1);             // restore the original matrix
If you use GLES 2.0 then the matrix can be computed in the vertex shader as well. That might be faster than doing it on the CPU, although a CPU with NEON instructions has similar performance on an iPhone 4S.
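If the goal is just to rotate around a given point while the matrix is stored as a 3x3 rotation plus a translation applied afterwards (the IwGeom layout quoted above), the translation part can be computed directly from the pivot. A library-agnostic sketch; the Mat4x3 struct and function name are illustrative:

#include <cmath>

// 3x3 rotation R plus translation t, applied as v' = R * v + t
// (rotation first, then translation, as described in the question).
struct Mat4x3 {
    float r[3][3];  // rotation
    float t[3];     // translation
};

// Rotation by 'angle' radians about the Z axis, centred on (px, py).
// Derivation: v' = R * (v - p) + p, so the translation part is p - R * p.
Mat4x3 rotateAroundPointZ(float angle, float px, float py)
{
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    Mat4x3 m = {};
    m.r[0][0] = c;    m.r[0][1] = -s;   m.r[0][2] = 0.0f;
    m.r[1][0] = s;    m.r[1][1] =  c;   m.r[1][2] = 0.0f;
    m.r[2][0] = 0.0f; m.r[2][1] = 0.0f; m.r[2][2] = 1.0f;
    m.t[0] = px - (c * px - s * py);
    m.t[1] = py - (s * px + c * py);
    m.t[2] = 0.0f;
    return m;
}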
I have googled around but haven't found an answer that suits me for OpenGL.
I want to construct a sparse matrix with a single diagonal and around 9 off-diagonals. These diagonals aren't necessarily next to the main diagonal, and they wrap around. Each diagonal is an image in row-major format, i.e. a vector of size NxM.
The size of the matrix is (NxM)x(NxM).
My question is as follows:
After some messing around with the math I have arrived at the basic units of my operation. It involves a pixel-by-pixel multiplication of two images (WITHOUT limiting the value of the result, i.e. so it can be above 1 or below 0), storing the resulting image, and then adding a bunch of the resulting images (SAME as above).
How can I multiply and add images on a pixel-by-pixel basis in OpenGL? Is it easier in 1.1 or 2.0? Will the use of textures cause hard clamping of the results to between 0 and 1? Will this maximize the use of the GPU cores?
In order to be able to store values outside the [0, 1] range you would have to use floating point textures. There is no support in OpenGL ES 1.1, and for OpenGL ES 2.0 it is an optional extension (see the other SO question).
If your implementation supports it, you can then write a fragment shader to do the required math.
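If the extension is available, the float texture itself is created with an ordinary glTexImage2D call using a GL_FLOAT type; a minimal sketch (the helper name is illustrative):

#include <GLES2/gl2.h>
#include <string.h>

// Create an unclamped floating point RGBA texture on ES 2.0, if the driver
// exposes the OES_texture_float extension.
GLuint createFloatTexture(const float* pixels, int width, int height)
{
    const char* ext = (const char*)glGetString(GL_EXTENSIONS);
    if (ext == NULL || strstr(ext, "GL_OES_texture_float") == NULL)
        return 0;  // not supported on this device

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Linear filtering of float textures needs OES_texture_float_linear,
    // so stick to nearest filtering here.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_FLOAT, pixels);
    return tex;
}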
In OpenGL ES 1.1 you could use the glTexEnv call to set up how the pixels from different texture units are to be combined. You could then use "modulate" or "add" to multiply/add the values. The result would be clamped to the [0, 1] range, though.
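A sketch of that ES 1.1 fixed-function setup, multiplying the texels of two bound textures when a quad is drawn (swap GL_MODULATE for GL_ADD to add instead; either way the result is clamped to [0, 1]):

#include <GLES/gl.h>

// Combine two textures with the fixed-function texture environment:
// unit 0 outputs texA unchanged, unit 1 multiplies it by texB.
void setupModulateCombine(GLuint texA, GLuint texB)
{
    glActiveTexture(GL_TEXTURE0);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texA);
    glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

    glActiveTexture(GL_TEXTURE1);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texB);
    glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
}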