Unbinding a WebGL buffer, worth it? - performance

In various sources I've seen recommendations for 'unbinding' buffers after use, i.e. setting it to null. I'm curious if there is really a need for this. e.g.
var buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
// ... buffer related operations ...
gl.bindBuffer(gl.ARRAY_BUFFER, null); // unbinding
On the one hand, it's likely better for debugging as you'll probably get better error messages, but is there any significant performance loss from unbinding buffers all the time? It's generally recommended to reduce WebGL calls where possible.

The reason people often unbind buffers and other objects is to minimize the side effects of functions/methods. It's a general software development principle that functions should only perform their advertised operations, and not have any unexpected side effects. Therefore, it's a common practice that if a function binds objects, it unbinds them before returning.
Let's look at a typical example (with no particular language syntax). First, we define a function that creates a texture without any defined content:
function GLuint createEmptyTexture(int texWidth, int texHeight) {
GLuint texId;
glGenTextures(1, &texId);
glBindTexture(GL_TEXTURE_2D, texId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
return texId;
}
Then, let's have another function to create a texture. But this one fills the texture with data from a buffer (which I believe is not supported in WebGL yet, but it still helps illustrates the general principle):
function GLuint createTextureFromBuffer(int texWidth, int texHeight,
GLuint bufferId) {
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferId);
GLuint texId;
glGenTextures(1, &texId);
glBindTexture(GL_TEXTURE_2D, texId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
return texId;
}
Now, I can call these functions, and everything works as expected:
GLuint tex1 = createEmptyTexture(width, height);
GLuint tex2 = createTextureFromBuffer(width, height, bufferId);
But see what happens if I call them in the opposite order:
GLuint tex1 = createTextureFromBuffer(width, height, bufferId);
GLuint tex2 = createEmptyTexture(width, height);
This time, both textures will be filled with the buffer content, because the pixel unpack buffer was still bound after the first function returned, and therefore when the second function was called.
One way of avoiding this is to unbind the pixel unpack buffer at the end of the function that binds it. And to make sure that similar issues can not happen because the texture is still bound, it can unbind that one as well:
function GLuint createTextureFromBuffer(int texWidth, int texHeight,
GLuint bufferId) {
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferId);
GLuint texId;
glGenTextures(1, &texId);
glBindTexture(GL_TEXTURE_2D, texId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, texWidth, texHeight, 0,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
glBindTexture(GL_TEXTURE_2D, 0);
return texId;
}
With this implementation, both call sequences of using these two functions will produce the same result.
There are other approaches to address this. For example:
Each function documents its preconditions and side effects, and the caller is responsible to make any necessary state changes to meet the preconditions of the next function after calling a function with side effects.
Each function is completely responsible for setting up all it's state. In the example above, this would mean that the createEmptyTexture() function would have to unbind the pixel unpack buffer, because it relies on none being bound.
Approach 1 does not really scale well, and will be painful to maintain in larger systems. Approach 2 is also unsatisfactory because OpenGL has a lot of state, and having to set up all relevant state in every function would be verbose and inefficient.
This is really part of a bigger question: How do you deal with the state based nature of OpenGL in a modular software architecture? Buffer bindings are just one example of state you need to deal with. This is typically not very difficult to handle in small programs that you write by yourself, but is a possible trouble spot in larger systems. It gets worse if components from different sources (e.g. different vendors) are mixed.
I don't think there's one single approach that is ideal in all possible scenarios. The important thing is that you pick one clearly defined strategy, and use it consistently. How to handle this best in various scenarios is somewhat beyond the scope of an answer here.
While unbinding buffers should be fairly cheap, I'm not a fan of unnecessary calls. So I would try to avoid those calls, unless you really feel you need them to enforce a clear and consistent policy for the software you are writing.

Related

access to VBO from vertex shader with OpenGL ES 3.0

I have four VBO's (BufferA, BufferB, BufferC and BufferD) and two programs (program1 and program2).
Main steps of logic are:
glUseProgram(progran1);
glBindBuffer(GL_ARRAY_BUFFER, BufferA);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, BufferB);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, Vertex1Count);
glEndTransformFeedback();
swap(BufferA, BufferB);
glUseProgram(progran2);
glBindBuffer(GL_ARRAY_BUFFER, BufferC);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, BufferD);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, Vertex2Count);
glEndTransformFeedback();
swap(BufferC, BufferD);
Questions: What do I need to do to gain access to BufferB from program2?
Can I bind BufferB as texture somehow and read it with texelfetch?
I am using iOS 7 and OpenGL es 3.0
Yes, you can. You may use buffer as PBO and then create a texture from your buffer.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, BufferB);
GLuint someTex;
glActiveTexture(GL_TEXTURE0);
glGenTexutre(1, &someTex);
glBindTexture(GL_TEXTURE_2D, someTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, 1, sizeOfYourBuffer, 0, GL_RGBA, GL_FLOAT, nullptr);
// nullptr is interpret as an offset in your buffer
In case of using PBO, TexImage* works fast since CPU is not involved in texture initialization.
Disadvantage of the approach is that the texture is not allowed to be changed. But if you are implementing iterating method you may use the "pin pong strategy" (Have different buffers for previous and new state; Swap it after visualization).

Image Rotation by using Opengl ES

I'm working on Opengl ES 2.0 using OMAP3530 development board on Windows CE 7.
My Task is to Load a 24-Bit Image File & rotate it about an angle in z-Axis & export the image file(Buffer).
For this task I've created a FBO for off-screen rendering & loaded this image file as a Texture by using glTexImage2D() & I've applied this Texture to a Quad & rotate that QUAD by using PVRTMat4::RotationZ() API & Read-Back by using ReadPixels() API. Since it is a single frame process i just made only 1 loop.
Here are the problems I'm facing now.
1) All API's are taking distinct processing time on every run.ie Sometimes when i run my application i get different processing time for all API's.
2) glDrawArrays() is taking too much time (~50 ms - 80 ms)
3) glReadPixels() is also taking too much time ~95 ms for Image(800x600)
4) Loading 32-Bit image is much faster than 24-Bit image so conversion is needed.
I'd like to ask you all if anybody facing/Solved similar problem kindly suggest me any
Here is the Code snippet of my Application.
[code]
[i]
void BindTexture(){
glGenTextures(1, &m_uiTexture);
glBindTexture(GL_TEXTURE_2D, m_uiTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, ImageWidth, ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, pTexData);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,GL_LINEAR );
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
}
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, TCHAR *lpCmdLine, int nCmdShow)
{
// Fragment and vertex shaders code
char* pszFragShader = "Same as in RenderToTexture sample;
char* pszVertShader = "Same as in RenderToTexture sample;
CreateWindow(Imagewidth, ImageHeight);//For this i've referred OGLES2HelloTriangle_Windows.cpp example
LoadImageBuffers();
BindTexture();
Generate& BindFrame,Render Buffer();
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_auiFbo, 0);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT16, ImageWidth, ImageHeight);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, m_auiDepthBuffer);
BindTexture();
GLfloat Angle = 0.02f;
GLfloat afVertices[] = {Vertices to Draw a QUAD};
glGenBuffers(1, &ui32Vbo);
LoadVBO's();//Aps's to load VBO's refer
// Draws a triangle for 1 frames
while(g_bDemoDone==false)
{
glBindFramebuffer(GL_FRAMEBUFFER, m_auiFbo);
glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);
PVRTMat4 mRot,mTrans, mMVP;
mTrans = PVRTMat4::Translation(0,0,0);
mRot = PVRTMat4::RotationZ(Angle);
glBindBuffer(GL_ARRAY_BUFFER, ui32Vbo);
glDisable(GL_CULL_FACE);
int i32Location = glGetUniformLocation(uiProgramObject, "myPMVMatrix");
mMVP = mTrans * mRot ;
glUniformMatrix4fv(i32Location, 1, GL_FALSE, mMVP.ptr());
// Pass the vertex data
glEnableVertexAttribArray(VERTEX_ARRAY);
glVertexAttribPointer(VERTEX_ARRAY, 3, GL_FLOAT, GL_FALSE, m_ui32VertexStride, 0);
// Pass the texture coordinates data
glEnableVertexAttribArray(TEXCOORD_ARRAY);
glVertexAttribPointer(TEXCOORD_ARRAY, 2, GL_FLOAT, GL_FALSE, m_ui32VertexStride, (void*) (3 * sizeof(GLfloat)));
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);//
glReadPixels(0,0,ImageWidth ,ImageHeight,GL_RGBA,GL_UNSIGNED_BYTE,pOutTexData) ;
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindFramebuffer(GL_FRAMEBUFFER, 0);
eglSwapBuffers(eglDisplay, eglSurface);
}
DeInitAll();[/i][/code]
The PowerVR architecture can not render a single frame and allow the ARM to read it back quickly. It is just not designed to work that way fast - it is a deferred rendering tile-based architecture. The execution times you are seeing are too be expected and using an FBO is not going to make it faster either. Also, beware that the OpenGL ES drivers on OMAP for Windows CE are really poor quality. Consider yourself lucky if they work at all.
A better design would be to display the OpenGL ES rendering directly to the DSS and avoid using glReadPixels() and the FBO completely.
I've got improved performance for rotating a Image Buffer my using multiple FBO's & PBO's.
Here is the pseudo code snippet of my application.
InitGL()
GenerateShaders();
Generate3Textures();//Generate 3 Null Textures
Generate3FBO();//Generate 3 FBO & Attach each Texture to 1 FBO.
Generate3PBO();//Generate 3 PBO & to readback from FBO.
DrawGL()
{
BindFBO1;
BindTexture1;
UploadtoTexture1;
Do Some Processing & Draw it in FBO1;
BindFBO2;
BindTexture2;
UploadtoTexture2;
Do Some Processing & Draw it in FBO2;
BindFBO3;
BindTexture3;
UploadtoTexture3;
Do Some Processing & Draw it in FBO3;
BindFBO1;
ReadPixelfromFBO1;
UnpackToPBO1;
BindFBO2;
ReadPixelfromFBO2;
UnpackToPBO2;
BindFBO3;
ReadPixelfromFBO3;
UnpackToPBO3;
}
DeinitGL();
DeallocateALL();
By this way I've achieved 50% increased performance for overall processing.

OES_vertex_array_object and client state

I want a vertex array object in OpenGL ES 2.0 to hold two attributes from different buffers, the second buffer being read from client memory (glBindBuffer(GL_ARRAY_BUFFER, 0)) But I get a runtime error:
GLuint my_vao;
GLuint my_buffer_attrib0;
GLfloat attrib0_data[] = { 0, 0, 0, 0 };
GLfloat attrib1_data[] = { 1, 1, 1, 1 };
void init()
{
// setup vao
glGenVertexArraysOES(1, &my_vao);
glBindVertexArrayOES(my_vao);
// setup attrib0 as a vbo
glGenBuffers( 1, &my_buffer_attrib0 );
glBindBuffer(GL_ARRAY_BUFFER, my_buffer_attrib0);
glBufferData( GL_ARRAY_BUFFER, sizeof(attrib0_data), attrib0_data, GL_STATIC_DRAW );
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray( 0 );
glEnableVertexAttribArray( 1 );
// "end" vao
glBindVertexArrayOES( 0 );
}
void draw()
{
glBindVertexArrayOES(my_vao);
// (now I assume attrib0 is bound to my_buffer_attrib0,
// and attrib1 is not bound. but is this assumption true?)
// setup attrib1
glBindBuffer( GL_ARRAY_BUFFER, 0 );
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, attrib1_data);
// draw using attrib0 and attrib1
glDrawArrays( GL_POINTS, 0, 1 ); // runtime error: Thread1: EXC_BAD_ACCESS (code=2, address=0x0)
}
What I want to achieve is to wrap the binding of two attributes as a vertex array buffer:
void draw_ok()
{
glBindVertexArrayOES( 0 );
// setup attrib0
glBindBuffer( GL_ARRAY_BUFFER, my_buffer_attrib0 );
glVertexAttribPointer( 0, 4, GL_FLOAT, GL_FALSE, 0, 0);
// setup attrib1
glBindBuffer( GL_ARRAY_BUFFER, 0 );
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, attrib1_data);
glEnableVertexAttribArray( 0 );
glEnableVertexAttribArray( 1 );
// draw using attrib0 and attrib1
glDrawArrays( GL_POINTS, 0, 1); // ok
}
Is it possible to bind two different buffers in a vertex array object? Are OES_vertex_array_object's different from (plain) OpenGL vertex array objects? Also note that I get this error in XCode running the iOS simulator. These are related links:
Use of VAO around VBO in Open ES iPhone app Causes EXC_BAD_ACCESS When Call to glDrawElements
OES_vertex_array_object
Well, a quote from the extension specifications explains it quite simply:
Should a vertex array object be allowed to encapsulate client vertex arrays?
RESOLVED: No. The OpenGL ES working group agreed that compatibility with OpenGL and the ability to to guide developers to more performant drawing by enforcing VBO usage were more important than the possibility of hurting adoption of VAOs.
So you can indeed bind two different buffers in a VAO, (well, the buffer binding isn't stored in the VAO, anyway, only the source buffers of the individual attributes, set through glVertexAttribPointer) but you cannot use client space memory in a VAO, only VBOs. This is the same for desktop GL.
So I would advise you to store all your vertex data in VBOs. If you want to use client memory because the data is updated dynamically and you think VBOs won't buy you anything there, that's still the wrong approach. Just use a VBO with a dynamic usage (GL_DYNAMIC_DRAW or even GL_STREAM_DRAW) and update it using glBuffer(Sub)Data or glMapBuffer (or the good old glBufferData(..., NULL); glMapBuffer(GL_WRITE_ONLY) combination).
Remove the following line:
glBindBuffer( GL_ARRAY_BUFFER, 0 );
from the draw() function. You didn't bind any buffer before and it may mess up buffer state.
After some digging (reading), answers was found found in OES_vertex_array_object. It seems that OES_vertex_array_object's focus on state on the server side, and client state are used if and only if the zero object is bound. It remains to answer if OES_vertex_array_object's are the same as plain OpenGL VAO's. Please comment if you know the answer to this. Below are quotations from OES_vertex_array_object:
This extension introduces vertex array objects which encapsulate
vertex array states on the server side (vertex buffer objects).
* Should a vertex array object be allowed to encapsulate client
vertex arrays?
RESOLVED: No. The OpenGL ES working group agreed that compatibility
with OpenGL and the ability to to guide developers to more
performant drawing by enforcing VBO usage were more important than
the possibility of hurting adoption of VAOs.
An INVALID_OPERATION error is generated if
VertexAttribPointer is called while a non-zero vertex array object
is bound, zero is bound to the <ARRAY_BUFFER> buffer object binding
point and the pointer argument is not NULL [fn1].
[fn1: This error makes it impossible to create a vertex array
object containing client array pointers, while still allowing
buffer objects to be unbound.]
And the presently attached vertex array object has the following
impacts on the draw commands:
While a non-zero vertex array object is bound, if any enabled
array's buffer binding is zero, when DrawArrays or
DrawElements is called, the result is undefined.
So EXC_BAD_ACCESS was the undefined result!
The functionality you desire has now been accepted by the community as an extension to WebGL:
http://www.khronos.org/registry/webgl/extensions/OES_vertex_array_object/

White textures with OpenGL and DevIL on some Windows systems

I'm having some problems trying to load images with DevIL and creating textures in OpenGL. When a friend of mine tested my program on his Windows, all the rects, whom were supposed to contain textures, were white, without any image (texture). The problem seems to occur only on Window XP, Windows Vista and Windows 7, but not in every Windows PC: my Windows XP runs the program without problems.
Maybe there're some missing DLLs or files (improbable), or something that don't let the image to be loaded or used as a texture. By the other hand, the program runs fine on *UNIX systems.
This is the code I'm using to load an image and generate a texture:
void Image::load(const char* filename)
{
ILuint ilimg;
ilGenImages(1, &ilimg);
ilBindImage(ilimg);
if (!ilLoadImage(filename))
throw ImageLoadError;
glGenTextures(1, &image);
glBindTexture(GL_TEXTURE_2D, image);
bpp = ilGetInteger(IL_IMAGE_BPP);
width = ilGetInteger(IL_IMAGE_WIDTH);
height = ilGetInteger(IL_IMAGE_HEIGHT);
format = ilGetInteger(IL_IMAGE_FORMAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, bpp, width, height, 0, format,
GL_UNSIGNED_BYTE, ilGetData());
ilDeleteImages(1, &ilimg);
}
This is the code that draws a rect with the texture applied:
void Rect::show()
{
glPushAttrib(GL_CURRENT_BIT);
glEnable(GL_TEXTURE_2D);
glColor4f(1.0, 1.0, 1.0, opacity);
glBindTexture(GL_TEXTURE_2D, texture->get_image());
glBegin(GL_POLYGON);
glTexCoord2i(0, 0); glVertex2f(x, y);
glTexCoord2i(1, 0); glVertex2f(x+width, y);
glTexCoord2i(1, 1); glVertex2f(x+width, y+height);
glTexCoord2i(0, 1); glVertex2f(x, y+height);
glEnd();
glDisable(GL_TEXTURE_2D);
glPopAttrib();
}
If you need some other code that I don't mentioned, ask me and i'll post it.
glTexImage2D(target, level, internal format, width, height, border, format, type, data);
The internal format parameter of glTexImage is not bits per pixel. Actually its numerical value is not related to the format at all. OpenGL defines only specific values to be valid, among them also four with numeric relation, but just as a mnemonic: 1 (shorthand for GL_LUMINANCE), 2 (GL_LUMINANCE_ALPHA), 3 (GL_RGB), 4 (GL_RGBA). There are also a number of other format tokens, but their numeric value is arbitrarily choosen.
Also the bpp for the image data is used to find the format and type parameters. You'll need to implement a LUT or switch-case structure for that.
The issue you are experiencing sounds like an implementation version issue. Probably a "correct" but less "nice" implementation isn't letting you get away with something subtly incorrect. Or on the other hand the implementation may have a subtle bug and isn't accepting your valid code.
I might try swapping
glColor4f(1.0, 1.0, 1.0, opacity);
glBindTexture(GL_TEXTURE_2D, texture->get_image());
to
glBindTexture(GL_TEXTURE_2D, texture->get_image());
glColor4f(1.0, 1.0, 1.0, opacity);
and declaring vertex before texcoord like.
glVertex2f(x, y);
glTexCoord2i(0, 0);
Short of that if you do not already have "The Red Book" consult http://glprogramming.com/red/chapter09.html

glDrawArrays() slow on iPad?

I was wondering how to speed up my iPad application using OpenGLES 2.0. At the moment we have every drawable object draw itself with a call to glDrawArrays(). Blend mode is on, we really need it. Without disabling blendmode, how would we improve performance for this app?
For instances, if we now draw 3 textures (1024x1024, 256x512, 256x512) across the whole screen, the app only gets 15FPS, which is really slow I think? Are we doing something terribly wrong? Our drawing code (for each drawable), is as follows:
- (void) draw {
GLuint textureAvailable = 0;
if(texture != nil){
textureAvailable = 1;
}
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture.name);
glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, vertices);
glEnableVertexAttribArray(ATTRIB_VERTEX);
glVertexAttribPointer(ATTRIB_COLOR, 4, GL_FLOAT, 1, 0, colorsWithMultipliedAlpha);
glEnableVertexAttribArray(ATTRIB_COLOR);
glVertexAttribPointer(ATTRIB_TEXTUREMAP, 2, GL_FLOAT, 1, 0, textureMapping);
glEnableVertexAttribArray(ATTRIB_TEXTUREMAP);
//Note that we are NOT using position.z here because that is only used to determine drawing order
int *jnUniforms = JNOpenGLConstants::getInstance().uniforms;
glUniform4f(jnUniforms[UNIFORM_TRANSLATE], position.x, position.y, 0.0, 0.0);
glUniform4f(jnUniforms[UNIFORM_SCALE], scale.x, scale.y, 1.0, 1.0);
glUniform1f(jnUniforms[UNIFORM_ROTATION], rotation);
glUniform1i(jnUniforms[UNIFORM_TEXTURE_SAMPLE], 0);
glUniform2f(jnUniforms[UNIFORM_TEXTURE_REPEAT], textureRepeat.x, textureRepeat.y);
glUniform1i(jnUniforms[UNIFORM_TEXTURE_AVAILABLE], textureAvailable);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
Possible optimizations I think won't work:
Drawing geometry in batches
I'm only drawing 3 items and the FPS is 15, I don't think batching the geometry would work here because it's such a small number of calls for drawing that it doesn't matter if we kill 2/3 of those calls.
Texture Atlas
Again, only drawing 3 textures. What I do wonder if it would matter (a lot) if we were to convert these to PVR? I haven't looked into it, but I must admit we're loading big PNGs at the moment. Is there any way to see if this is indeed the case, or is it easier just to check it out?
But please tell me if I'm wrong, I'm happy to hear any ideas.
Proposed solutions
Mipmapped textures
Loading mipmapped textures, doing it like this:
- (id) initWithUIImage: (UIImage * const) image {
glGenTextures(1, &name);
//JNLogString(#"Recieved name(%d), binding texture", name);
glBindTexture(GL_TEXTURE_2D, name);
//Set the needed parameters for the texture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
//Load the image data into the texture
glGenerateMipmap(GL_TEXTURE_2D);
return self;
}
This doesn't seem to do anything for our FPS, I think this is because our textures are already roughly at the size they are rendered to on the screen, in most cases even 1:1.
Other solutions are welcome! I will try them out and post the results here
If you are using very large textures, try to create mipmap textures. The cost is basically 1/3 of the original texture memory. I think they can be created with this call when setting up the textures.
glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
Some calculations: If you have 3 textures 2048x2048 (max size) at 15 Hz you will have a texel throughput (if they are fully shown, ie downscaled to screen resolution) of 2048x2048x3x15 = 188,743,680 / sec which is around the value we see at glbenchmark.com for single fill rate (173 Mtexel/sec). But if you are using mipmap textures the texel throughput should be closer to the screen size resolution (1024x768) which should be something like 1/4 of the previous throughput.
I had a branch in my fragment shader. I though that didn't put a lot of strain on it, but it did! Anyhow, that was the whole problem, I removed the branch and now my FPS has almost doubled.

Resources