Optimizing an OpenGL ES operation?

I'm using the following code to draw characters on screen (each UTF8 character is a texture):
int row = 0;
glTexCoordPointer(2, GL_FLOAT, 0, texCoords);
glVertexPointer(2, GL_FLOAT, 0, vertices);
for (StdStr* line in _lines) {
const char* str = [line cString];
for (int i = 0; i < strlen(str); i++) {
((XX::OGL::GLESContext*)context)->viewport(C_WIDTH*i,
C_HEIGHT*row,
C_WIDTH,
C_HEIGHT);
glColor4f(1.0, 1.0, 1.0, 1.0);
glBindTexture(GL_TEXTURE_2D, _textures[0] + *(str + i));
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
row++;
}
When there are a lot of characters, the code takes longer to run. In this case, almost 99% of the time is spent in the glDrawArrays routine. Is it possible to reduce the number of calls to glDrawArrays? The OpenGL ES version is 1.1.

Actually, I think you should try to limit the number of calls to viewport, glBindTexture and glDrawArrays.
Concretely, you should pack all your characters into a single texture (an atlas), so that you can bind it once.
Then you could compute the vertices and texcoords in a loop like you do now, but do the viewport maths yourself and accumulate the results in a CPU-side array. Once the array is built, submit a single draw call with it.
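A minimal sketch of that batching in plain C, under assumptions that are not in the question: all glyphs are packed into one atlas texture laid out as a hypothetical 16x16 grid of equal cells indexed by the character's byte value, and each character becomes two triangles (GL_TRIANGLES), because separate triangle strips cannot be concatenated into one draw call without degenerate triangles. The function name `append_text_quads` and the atlas layout are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical atlas layout: all glyphs packed into ONE texture as a
   16 x 16 grid of equal cells, indexed by the character's byte value. */
#define ATLAS_COLS 16
#define ATLAS_ROWS 16

/* Appends 6 vertices (two triangles) per character of `str`:
   2 position floats into `vertices`, 2 texcoord floats into `texcoords`.
   Writing starts at vertex index `first_vertex`, so several lines can be
   accumulated into the same arrays. Returns the index one past the last
   vertex written. Both arrays need room for 12 floats per character. */
size_t append_text_quads(const char *str, int row,
                         float cell_w, float cell_h,
                         float *vertices, float *texcoords,
                         size_t first_vertex)
{
    size_t v = first_vertex;
    size_t len = strlen(str);              /* computed once, not per iteration */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        /* Screen rectangle that the per-character viewport() call selected. */
        float x0 = cell_w * (float)i, y0 = cell_h * (float)row;
        float x1 = x0 + cell_w,       y1 = y0 + cell_h;
        /* Atlas cell of this glyph; swap t0/t1 if glyphs come out upside down. */
        float s0 = (float)(c % ATLAS_COLS) / ATLAS_COLS;
        float t0 = (float)(c / ATLAS_COLS) / ATLAS_ROWS;
        float s1 = s0 + 1.0f / ATLAS_COLS;
        float t1 = t0 + 1.0f / ATLAS_ROWS;
        /* Two triangles: (x0,y0)-(x1,y0)-(x1,y1) and (x0,y0)-(x1,y1)-(x0,y1). */
        const float px[6] = { x0, x1, x1, x0, x1, x0 };
        const float py[6] = { y0, y0, y1, y0, y1, y1 };
        const float ps[6] = { s0, s1, s1, s0, s1, s0 };
        const float pt[6] = { t0, t0, t1, t0, t1, t1 };
        for (int k = 0; k < 6; k++, v++) {
            vertices[2 * v]      = px[k];
            vertices[2 * v + 1]  = py[k];
            texcoords[2 * v]     = ps[k];
            texcoords[2 * v + 1] = pt[k];
        }
    }
    return v;
}
```

Once all lines have been appended, the atlas is bound once, glVertexPointer and glTexCoordPointer point at the two arrays, and a single glDrawArrays(GL_TRIANGLES, 0, vertex_count) draws everything. Since the positions are now in pixels, the projection would be set up once (for example with glOrthof over the screen size) instead of changing the viewport per character.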
You can probably find inspiration here:
http://www.angelcode.com/products/bmfont/
http://sourceforge.net/projects/oglbmfont/

Related

Why Compute Shader is slowing down the rendering API calls?

I am using a compute shader to process the input buffer data and store it in an output texture using imageStore().
After executing the compute shader, I have three render calls in sequence.
Compute Shader Code:
#version 310 es
precision mediump image2D;
layout(std430) buffer; // Sets the default layout for SSBOs
layout(local_size_x = 256) in; // 256 threads per work group
layout(binding = 0) readonly buffer InputBuf
{
uint input_buff[];
} inputbuff;
layout (rgba32f, binding = 1 ) uniform writeonly image2D out_teximg;
void main()
{
int idx = int(gl_GlobalInvocationID.x);
int idy = int(gl_GlobalInvocationID.y);
uint inputpix = inputbuff.input_buff[1024 * idy + idx];
// some calculation on inputpix and output is rcolor, bcolor, gcolor
imageStore(out_teximg, ivec2(idx , idy), vec4(rcolor, bcolor, gcolor, 1.0));
barrier();
}
Code:
void initCompute()
{
glGenTextures(1, &computeOutTex);
glGenBuffers(1, &inSSBOId);
}
GLuint inputBuffData[] = { .... }; // input buffer data
void execute_compute()
{
// compute shader code starts...
glUseProgram(computePgmId);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, computeOutTex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32F, width, height);
glBindImageTexture(1, computeOutTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F); // binding is 1
glUniform1i( glGetUniformLocation(computePgmId, "out_teximg"), 0);
uint inputBuffSize = 1024 * 512 * 3;
glBindBuffer(GL_SHADER_STORAGE_BUFFER, inSSBOId);
glBufferData(GL_SHADER_STORAGE_BUFFER, inputBuffSize, inputBuffData, GL_STATIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0 , inSSBOId); // binding is 0
glDispatchCompute(width / 256, height, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
// glFinish();
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
glBindImageTexture(1, 0, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F); // binding is 1
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);// binding is 0
}
int draw()
{
glBindFramebuffer(GL_FRAMEBUFFER, m_FBOId); // Offscreen Rendering
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(compute_pgm);
execute_compute();
glUseProgram(render_pgm1);
glViewport(0,0,w,h);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, computeOutTex);
glDrawElements(); // Render the texture data
// 2nd draw call
glUseProgram(render_pgm2);
....
....
glDrawElements();
// 3rd draw call
glUseProgram(render_pgm3);
....
....
glDrawElements();
glBindFramebuffer(GL_FRAMEBUFFER, 0); // unbind FBO
}
Here, only the 2nd draw call takes more time after the compute shader is used.
If glFinish() is called after glMemoryBarrier(), then only the execute_compute() call is slowed down.
Why is the compute shader slowing down the subsequent draw calls?
Is glFinish() really needed?
The compute shader does not slow down the subsequent draw call. However, the compute shader itself takes some time to execute, and since you set a memory barrier, the subsequent draws have to wait for it.
OpenGL commands are queued and not executed immediately when they are called. The GPU and CPU work in parallel: the CPU sends instructions to the GPU, and the GPU processes them as quickly as possible.
glFinish does not return until all previously issued commands have completed. glFinish itself is not "costly". It only seems costly when you measure time on the CPU, because what you are measuring is the time it takes to complete the previously issued OpenGL commands.
Anyway, glFinish is not needed here. All you need is the memory barrier. With a memory barrier, the subsequent OpenGL commands that depend on it appear to take longer to complete. They don't actually take longer; they just have to wait until the condition indicated by the barrier is met.
In your case you need to use GL_ALL_BARRIER_BITS or GL_TEXTURE_FETCH_BARRIER_BIT, which makes incoherent memory writes issued before the barrier (e.g. imageStore) visible to texture fetches issued after it.

OpenGL ES: Handling a large amount of matrix data to improve performance

I am using instancing in my OpenGL app, and since only one draw call is made, I have to build a larger matrix that consists of smaller matrices. That larger matrix is sent to the shader, where gl_InstanceID distinguishes between successive matrices.
It's uploaded with the following call:
GLES30.glUniformMatrix4fv(mMVPMatrixHandleBall, nBalls, false, mMVPMatrixMajor, 0);
and in the shader the multiplication is done by
gl_Position = u_MVPMatrix[gl_InstanceID] * a_Position;
Simple!
On the client-side the larger matrix is created by the following code:
private void setLargeMVPmatrix() {
int cnt = 0;
for (Iterator<Ball> shapeIterator = arrayListBalls.iterator(); shapeIterator.hasNext(); ) {
Ball ball = shapeIterator.next();
mModelMatrix = ball.getmModelMatrix();
//multipl.
Matrix.multiplyMM(mMVPMatrix, 0, mViewMatrix, 0, mModelMatrix, 0);
Matrix.multiplyMM(mMVPMatrix, 0, mProjectionMatrix, 0, mMVPMatrix, 0);
// copy the matrix data into a larger array, i.e. we get one big array containing several smaller matrices
for (int i = 0; i < CreateGLContext.MATRIX_SIZE; i++) {
mMVPMatrixMajor[i + CreateGLContext.MATRIX_SIZE * cnt] = mMVPMatrix[i];
}
cnt++;
}
}
If I have moving objects on the screen, like bouncing balls (for instance 100 balls bouncing around), I have to continuously update their positions each frame, which in turn means I have to call this method every frame. The consequence is that it becomes a real performance bottleneck. I know this because commenting out the method gives a real performance boost, but then, of course, the balls no longer move.
So my question is: is there a solution to this problem? If I use instancing, I have to send a large matrix as shown above.
Edit:
I've even tried the following, which I thought could at least partially solve my problem. In the draw method:
int cnt = 0;
for (Iterator <Ball> it = arrayListBalls.iterator(); it.hasNext();) {
Ball ball = it.next();
mModelMatrix = ball.getmModelMatrix();
//multipl.
Matrix.multiplyMM(mMVPMatrix, 0, mViewMatrix, 0, mModelMatrix, 0);
Matrix.multiplyMM(mMVPMatrix, 0, mProjectionMatrix, 0, mMVPMatrix, 0);
GLES30.glUniformMatrix4fv( (mMVPMatrixHandleBall + cnt), 1, false, mMVPMatrix, 0);
cnt++;
}
Thanks in advance!!!
If the data that change are positions and rotations, then that's what you should upload to the shader.
Doing most of the matrix work on the CPU is slow, unless the operations needed are tiny, like computing the new view and projection matrices, which are the same for all objects and cheap to pass as uniforms.
For every frame I'd re-fill a buffer with the new positions and rotations, perhaps with the help of glMapBufferRange or glBufferSubData.
Then build the needed matrices in the shader and do the matrix multiplications there.
If initial positions and rotations are needed to build the new matrices, then you must also pass them in another buffer, although you only need to upload that one for the first frame.
With the proper attribute setup, you read these positions and rotations in the shader as per-instance attributes (e.g. via glVertexAttribDivisor). gl_InstanceID is then not needed for the gl_Position calculation, though you may still need it for other object properties.
If you need help building matrices inside the shaders, look up glRotate and glTranslate in the OpenGL 2.1 docs, where you can find their definitions.
Also note that passing one big matrix array for all objects as a uniform may exceed the limit on the total size of uniform data (see GL_MAX_VERTEX_UNIFORM_VECTORS).

OpenGL ES 2.0 Convert int[] to GLubyte[]

The following works as a texture...
GLubyte bytePix[4 * 3] ={
255, 0, 0, //red
0, 255, 0, //green
0, 0, 255, //blue
255, 255, 0 //yellow
};
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, pixelWidth, pixelHeight, 0, GL_RGB, GL_UNSIGNED_BYTE, bytePix);
The problem is that I am passing in my BMP as an int[], so I would need something more like this...
int bytePix[4 * 3] ={
255, 0, 0, //red
0, 255, 0, //green
0, 0, 255, //blue
255, 255, 0 //yellow
};
But this doesn't show the same result.
My question is: how do I convert the latter into a GLubyte[] or some other recognizable format?
On your platform, sizeof(int) clearly isn't equal to sizeof(GLubyte). I guess the immediate question is: why are you using int? It's likely to be a huge waste of space if you're storing only values in the range 0-255.
You can't just use GL_INT or GL_UNSIGNED_INT in place of GL_UNSIGNED_BYTE, even if they are the same size as your int, because you're using only a byte's range within each integer.
That aside, you'll notice that glTexImage2D, unlike glVertexAttribPointer and most of the other functions that exist primarily to provide data, doesn't have a stride parameter. So even though your values fit within bytes and those bytes are a predictable distance apart, OpenGL can't pull them apart and repack them for you.
So the easiest option is to do it yourself:
void glTexImage2DWithStride(..., GLsizei stride, ...)
{
// the following is written to assume GL_RGB; adapt as necessary
GLubyte *byteBuffer = (GLubyte *)malloc(width * height * 3);
for(int c = 0; c < width * height * 3; c++)
byteBuffer[c] = originalBuffer[c];
glTexImage2D(..., byteBuffer, ...);
free(byteBuffer);
}
Failing that, supposing your int is four times as large as a byte, you could upload the original as an RGBA texture that's four times as large as its real size, then shrink it down in a shader, combining the .r or .a channels (depending on your endianness) into the correct output channels.
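To make the endianness remark concrete, here is a small C sketch (assuming a 32-bit int, as the answer does) that finds which byte of each int, and therefore which RGBA channel after uploading the int[] as bytes, holds the 0-255 value. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Index (0..3) of the byte holding the least-significant 8 bits of a
   32-bit int on this machine: 0 on little-endian hosts, 3 on big-endian
   ones. After uploading the int[] as RGBA bytes, 0 means the value is in
   .r and 3 means it is in .a. */
int low_byte_channel(void)
{
    int32_t probe = 1;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);   /* inspect the in-memory byte order */
    for (int i = 0; i < (int)sizeof probe; i++)
        if (bytes[i] == 1)
            return i;
    return -1; /* not reached on little- or big-endian hosts */
}
```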
Since ubyte and int are different in size, I guess you have to create a new ubyte array and convert explicitly element by element with a for loop, before passing it to OpenGL.

How to use a fragment shader to draw a sphere illusion in OpenGL ES?

I am using this simple function to draw a quad in 3D space that faces the camera. Now I want to use a fragment shader to draw the illusion of a sphere inside it. The problem is that I'm new to OpenGL ES, so I don't know how.
void draw_sphere(view_t view) {
set_gl_options(COURSE);
glPushMatrix();
{
glTranslatef(view.plyr_pos.x, view.plyr_pos.y, view.plyr_pos.z - 1.9);
#ifdef __APPLE__
#undef glEnableClientState
#undef glDisableClientState
#undef glVertexPointer
#undef glTexCoordPointer
#undef glDrawArrays
static const GLfloat vertices []=
{
0, 0, 0,
1, 0, 0,
1, 1, 0,
0, 1, 0,
0, 0, 0,
1, 1, 0
};
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, vertices);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 6);
glDisableClientState(GL_VERTEX_ARRAY);
#else
#endif
}
glPopMatrix();
}
More exactly, I want to achieve this:
There are quite a few things you need to do to achieve this. The sphere drawn in the last image you posted is the result of using lighting, shine and color. In general you need a shader that can process all of that and that works for any shape.
This specific case (and some others that can be described mathematically) can be drawn with a single quad, without even pushing normals to the program. What you need to do is create the normal in the fragment shader: if you receive the vectors sphereCenter and fragmentPosition and the float sphereRadius, then sphereNormal is a vector such that
sphereNormal = (fragmentPosition-sphereCenter)/sphereRadius; //taking into account all have .z = .0
sphereNormal.z = -sqrt(1.0 - dot(sphereNormal.xy, sphereNormal.xy)); //only if(length(sphereNormal.xy) < 1.0), i.e. the fragment lies inside the sphere's disc
and real sphere position:
spherePosition = sphereCenter + sphereNormal*sphereRadius;
Now all you need to do is add your lighting. Static or not, it is most common to use an ambient factor, linear and square distance factors, and a shine factor:
color = ambient*materialColor; //apply ambient
vec3 fragmentToLight = lightPosition-spherePosition;
float lightDistance = length(fragmentToLight);
fragmentToLight = normalize(fragmentToLight); //can also just divide by lightDistance
float dotFactor = dot(sphereNormal, fragmentToLight); //dot factor takes into account the angle between the light and the surface normal
if(dotFactor > .0) {
color += (materialColor*dotFactor)/(1.0 + lightDistance*linearFactor + lightDistance*lightDistance*squareFactor); //apply dot factor and distance factors (in many cases the distance factors are 0)
}
vec3 shineVector = (sphereNormal*(2.0*dotFactor)) - fragmentToLight; //the light vector mirrored through the normal, i.e. a reflection vector
float shineFactor = dot(shineVector, normalize(cameraPosition-spherePosition)); //how strong the light reflection towards the viewer is
if(shineFactor > .0) {
color += materialColor*(shineFactor*shineFactor * shine); //or some power other than 2 (shineFactor*shineFactor)
}
This pattern for creating lights in a fragment shader is one of very many. If you don't like it or can't make it work, I suggest you find another one on the web; otherwise I hope you understand it and can play around with it.

OpenGL image quality (blurred)

I use OpenGL to create a slideshow app. Unfortunately, the images rendered with OpenGL look blurred compared to the GNOME image viewer.
Here are the 2 Screenshots
(opengl) http://tinyurl.com/dxmnzpc
(image viewer) http://tinyurl.com/8hshv2a
and this is the base image:
http://tinyurl.com/97ho4rp
The image has the native size of my screen (2560x1440).
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/freeglut.h>
#include <SDL/SDL.h>
#include <SDL/SDL_image.h>
#include <unistd.h>
#include <stdio.h> // for printf
GLuint text = 0;
GLuint load_texture(const char* file) {
SDL_Surface* surface = IMG_Load(file);
GLuint texture;
glPixelStorei(GL_UNPACK_ALIGNMENT,4);
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
SDL_PixelFormat *format = surface->format;
printf("%d %d \n",surface->w,surface->h);
if (format->Amask) {
gluBuild2DMipmaps(GL_TEXTURE_2D, 4,surface->w, surface->h, GL_RGBA,GL_UNSIGNED_BYTE, surface->pixels);
} else {
gluBuild2DMipmaps(GL_TEXTURE_2D, 3,surface->w, surface->h, GL_RGB, GL_UNSIGNED_BYTE, surface->pixels);
}
SDL_FreeSurface(surface);
return texture;
}
void display(void) {
GLdouble offset_x = -1;
GLdouble offset_y = -1;
int p_viewport[4];
glGetIntegerv(GL_VIEWPORT, p_viewport);
GLfloat gl_width = p_viewport[2];//width(); // GL context size
GLfloat gl_height = p_viewport[3];//height();
glClearColor (0.0,2.0,0.0,1.0);
glClear (GL_COLOR_BUFFER_BIT);
glLoadIdentity();
glEnable( GL_TEXTURE_2D );
glTranslatef(0,0,0);
glBindTexture( GL_TEXTURE_2D, text);
gl_width=2; gl_height=2;
glBegin(GL_QUADS);
glTexCoord2f(0, 1); //4
glVertex2f(offset_x, offset_y);
glTexCoord2f(1, 1); //3
glVertex2f(offset_x + gl_width, offset_y);
glTexCoord2f(1, 0); // 2
glVertex2f(offset_x + gl_width, offset_y + gl_height);
glTexCoord2f(0, 0); // 1
glVertex2f(offset_x, offset_y + gl_height);
glEnd();
glutSwapBuffers();
}
int main(int argc, char **argv) {
glutInit(&argc,argv);
glutInitDisplayMode (GLUT_DOUBLE);
glutGameModeString("2560x1440:24");
glutEnterGameMode();
text = load_texture("/tmp/raspberry/out.jpg");
glutDisplayFunc(display);
glutMainLoop();
}
UPDATED TRY
void display(void)
{
GLdouble texture_x = 0;
GLdouble texture_y = 0;
GLdouble texture_width = 0;
GLdouble texture_height = 0;
glViewport(0,0,width,height);
glClearColor (0.0,2.0,0.0,1.0);
glClear (GL_COLOR_BUFFER_BIT);
glColor3f(1.0, 1.0, 1.0);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, width, 0, height, -1, 1);
//Do pixel calculatons
texture_x = ((2.0*1-1) / (2*width));
texture_y = ((2.0*1-1) / (2*height));
texture_width=((2.0*width-1)/(2*width));
texture_height=((2.0*height-1)/(2*height));
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0,0,0);
glEnable( GL_TEXTURE_2D );
glBindTexture( GL_TEXTURE_2D, text);
glBegin(GL_QUADS);
glTexCoord2f(texture_x, texture_height); //4
glVertex2f(0, 0);
glTexCoord2f(texture_width, texture_height); //3
glVertex2f(width, 0);
glTexCoord2f(texture_width, texture_y); // 2
glVertex2f(width,height);
glTexCoord2f(texture_y, texture_y); // 1
glVertex2f(0,height);
glEnd();
glutSwapBuffers();
}
What you're running into is a variation of the fencepost problem, which arises from how OpenGL deals with texture coordinates. OpenGL does not address a texture's pixels (texels) directly, but uses the image data as support for an interpolation that in fact covers a wider range than the image's pixels. So the texture coordinates 0 and 1 don't hit the centers of the left-/bottom-most and right-/top-most pixels, but in fact go a little further.
Let's say the texture is 8 pixels wide:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
^   ^   ^   ^   ^   ^   ^   ^   ^
0.0 |   |   |   |   |   |   |  1.0
|   |   |   |   |   |   |   |   |
0/8 1/8 2/8 3/8 4/8 5/8 6/8 7/8 8/8
The digits denote the texture's pixels; the bars denote the edges of the texture and, in the case of nearest filtering, the borders between pixels. You, however, want to hit the pixels' centers. So you're interested in the texture coordinates
(0/8 + 1/8)/2 = 1 / (2 * 8)
(1/8 + 2/8)/2 = 3 / (2 * 8)
...
(7/8 + 8/8)/2 = 15 / (2 * 8)
Or more generally for pixel i in a N wide texture the proper texture coordinate is
(2i + 1)/(2N)
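The formula is easy to check numerically. A minimal C sketch (the function name is hypothetical): for the 8-texel example it yields 1/16, 3/16, ..., 15/16, and for a 2560-pixel-wide image the first texel center is at 1/5120.

```c
/* Texture coordinate of the center of texel i (zero-based) in a texture
   that is n texels wide: (2i + 1) / (2n). */
double texel_center(int i, int n)
{
    return (2.0 * i + 1.0) / (2.0 * n);
}
```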
However if you want to perfectly align your texture with the screen pixels, remember that what you specify as coordinates are not a quad's pixels, but edges, which, depending on projection may align with screen pixel edges, not centers, thus may require other texture coordinates.
Note that if you follow this, regardless of your filtering mode and mipmaps, your image will always look clear and crisp, because the interpolation hits exactly your sampling support, which is your input image. Switching to another filtering mode, like GL_NEAREST, may look right at first glance, but it's actually not correct, because it will alias your samples. So don't do it.
There are a few other issues with your code as well, but they're not as big a problem. First and foremost, you're choosing a rather arcane way to obtain the viewport dimensions. You're (probably without further thought) exploiting the fact that the default OpenGL viewport is the size of the window the context was created with. You're using SDL, which has the side effect that this approach won't bite you as long as you stick with SDL-1. But switch to any other framework that may create the context via a proxy drawable, and you'll run into a problem.
The canonical way is to retrieve the window size from the windowing system (SDL in your case) and then set the viewport as one of the first actions in the display function.
Another issue is your use of gluBuild2DMipmaps, because a) you don't want to use mipmaps and b) since OpenGL 2.0 you can upload texture images of arbitrary size (i.e. you're not limited to powers of 2 for the dimensions), which completely eliminates the need for gluBuild2DMipmaps. So don't use it. Just use glTexImage2D directly and switch to a non-mipmapping filtering mode.
Update due to question update
The way you calculate the texture coordinates still doesn't look right. It seems like you're starting to count at 1. Texture pixels are indexed from 0, so…
This is how I'd do it:
Assuming the projection maps the viewport
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, win_width, 0, win_height, -1, 1);
glViewport(0, 0, win_width, win_height);
we calculate the texture coordinates as
//Do pixel calculations
glBindTexture( GL_TEXTURE_2D, text);
GLint tex_width, tex_height;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &tex_width);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &tex_height);
GLdouble s1 = 1. / (2*tex_width); // (2*0 + 1) / (2*tex_width)
GLdouble t1 = 1. / (2*tex_height);
GLdouble s2 = (2.*(tex_width -1) + 1) / (2*tex_width);
GLdouble t2 = (2.*(tex_height-1) + 1) / (2*tex_height);
Note that tex_width and tex_height give the number of pixels in each direction, but the coordinates are 0-based, so you have to subtract 1 from them for the texture coordinate mapping. Hence we also use a constant 1 in the numerator for the s1, t1 coordinates.
The rest looks okay, given the projection you choose
glEnable( GL_TEXTURE_2D );
glBegin(GL_QUADS);
glTexCoord2f(s1, t1); //4
glVertex2f(0, 0);
glTexCoord2f(s2, t1); //3
glVertex2f(tex_width, 0);
glTexCoord2f(s2, t2); // 2
glVertex2f(tex_width,tex_height);
glTexCoord2f(s1, t2); // 1
glVertex2f(0,tex_height);
glEnd();
I'm not sure if this is really the problem, but I think you don't need/want mipmaps here. Have you tried using glTexImage2D instead of gluBuild2DMipmaps in combination with nearest neighbor filtering (glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN/MAG_FILTER, GL_NEAREST);)?
