What command in WebGL causes texture memory to be processed - performance

When setting up a single-texture WebGL program, what command actually causes "work" to be done?
It seems to me that the texImage2D command uploads HTML image data to the GPU:
gl.texImage2D(target, level, internalformat, format, type, HTMLImageElement);
However once that data is uploaded and bound to a texture, that texture still needs to be "bound" to a sampler.
setActiveTexture(gl, 0, this['textureRef0']);
var samplerRef = gl.getUniformLocation(program, 'sampler0');
gl.uniform1i(samplerRef, 0);
Does any memory need to be allocated to bind the texture to the sampler? Or is it just a pointer that changes which points the sampler to the texture data?
Also what about binding textures to frame buffers?
gl.bindFramebuffer(gl.FRAMEBUFFER, this.globalFB);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, textureRef0, 0);
Does that act alone cause any significant performance issues? Or is "real" work only done when the program is called and data is rendered into that texture?

texImage2D allocates memory because the driver needs to make a copy of the data you pass it; the moment texImage2D returns, you are free to change or discard your own copy of the data.
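For illustration, a minimal sketch of that guarantee (the 2x2 size and fill data are arbitrary): once texImage2D returns, the typed array can be reused freely because the driver has already taken its copy.
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
const pixels = new Uint8Array(2 * 2 * 4);          // CPU-side pixel data
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 2, 2, 0,
              gl.RGBA, gl.UNSIGNED_BYTE, pixels);  // driver copies `pixels` here
pixels.fill(255); // safe: the texture keeps the data that was uploaded, not this array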
Framebuffers don't allocate much memory; the memory is in the attachments. But framebuffers need to be validated, so it's better to make a separate framebuffer for every combination of attachments you need rather than change the attachments on a single framebuffer.
In other words, if for example you're ping-ponging between textures for post-processing:
// init time
const fb = gl.createFramebuffer();
// render time
for (each pass) {
  gl.framebufferTexture2D(...., dstTex); // bad!
  ...
  gl.drawXXX
  const t = srcTex; srcTex = dstTex; dstTex = t; // swap textures
}
vs
// init time
let srcFB = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, srcFB);
gl.framebufferTexture2D(...., srcTex);
let dstFB = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, dstFB);
gl.framebufferTexture2D(...., dstTex);
// render time
for (each pass) {
  gl.bindFramebuffer(gl.FRAMEBUFFER, dstFB); // good
  ...
  gl.drawXXX
  const t = srcFB; srcFB = dstFB; dstFB = t; // swap framebuffers
}
Textures also have the issue that, because of the API design, GL has a bunch of work to do the first time you draw with a texture (and any time you change its contents).
Consider that this is a normal sequence in WebGL to supply mips:
texImage2D level 0, 16x16
texImage2D level 1, 8x8
texImage2D level 2, 4x4
texImage2D level 3, 2x2
texImage2D level 4, 1x1
But this is also a completely valid sequence of API calls:
texImage2D level 0, 16x16
texImage2D level 1, 8x8
texImage2D level 2, 137x324 // nothing in the spec prevents this. It's fully valid
texImage2D level 3, 2x2
texImage2D level 4, 1x1
texImage2D level 2, 4x4 // fix level 2 before drawing
That call to level 2 with some strange size is valid. It's not allowed to generate an error. Of course, if you don't replace level 2 before drawing, the texture will fail to draw, but uploading the data is not wrong according to the API. That means it isn't until the texture is actually used that the driver can look at the data, formats, and sizes of each mip, check that they are all correct, and finally arrange the data on the GPU.
texStorage2D was added to fix that issue (available in WebGL2 / OpenGL ES 3.0): it allocates every mip level with an immutable size and format up front.
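A minimal WebGL2 sketch of that approach (16x16 matches the mip chain above; the fill data is illustrative):
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texStorage2D(gl.TEXTURE_2D, 5, gl.RGBA8, 16, 16); // all 5 levels (16,8,4,2,1) allocated up front
// Sizes and formats are now immutable, so the driver never has to re-validate the mip chain.
const level0 = new Uint8Array(16 * 16 * 4);          // illustrative data for level 0
gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, 16, 16,
                 gl.RGBA, gl.UNSIGNED_BYTE, level0);
gl.generateMipmap(gl.TEXTURE_2D);                    // or texSubImage2D each remaining level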
Calling activeTexture, binding textures with bindTexture, and setting uniforms take no memory and have no significant performance issues.

Related

Is it possible to partially update a PVRTC texture

I created a 1024*1024 texture with
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, 1024, 1024, 0, nDataLen*4, pData1);
then updated its first 512*512 part like this:
glCompressedTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, nDataLen, pData2);
This update generated GL error 1282 (invalid operation). If I update the whole 1024*1024 region everything is OK, so it seems that a PVRTC texture cannot be partially updated.
Is it possible to partially update a PVRTC texture, and if so, how?
Sounds to me like you can't on GLES2 (link to spec, see 3.7.3.)
Calling CompressedTexSubImage2D will result in an INVALID_OPERATION error if xoffset or yoffset is not equal to zero, or if width and height do not match the width and height of the texture, respectively. The contents of any texel outside the region modified by the call are undefined. These restrictions may be relaxed for specific compressed internal formats whose images are easily modified
Makes glCompressedTexSubImage2D sound a bit useless to me, tbh, but I guess it's for updating individual mips or texture array levels.
Surprisingly, I copied a small PVRTC texture's data into a large one, and it works just like glCompressedTexSubImage2D. But I'm not sure whether it's safe to use this solution in my engine.
Rightly or wrongly, the reason PVRTC1 does not have CompressedTexSubImage2D support is that unlike, say, ETC* or S3TC, the texture data is not compressed as independent 4x4 squares of texels which, in turn, get represented as either 64 or 128 bits of data depending on the format. With ETC*/S3TC any aligned 4x4 block of texels can be replaced without affecting any other region of the texture simply by just replacing its corresponding 64- or 128-bit data block.
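To make the contrast concrete, here is a small sketch of the block arithmetic for a DXT1-style format (8 bytes per 4x4 block, blocks stored row-major in the data you upload; the function name is just for illustration):
// Byte offset of the 4x4 block containing texel (x, y) in a DXT1 image `width` texels wide
function dxt1BlockByteOffset(x, y, width) {
  const blocksPerRow = Math.ceil(width / 4);
  const blockIndex = Math.floor(y / 4) * blocksPerRow + Math.floor(x / 4);
  return blockIndex * 8; // 8 bytes per DXT1 block; 16 for the 128-bit formats
}
// Overwriting those 8 bytes changes exactly one 4x4 block of texels, which is
// why aligned partial updates are straightforward for S3TC/ETC but not for PVRTC1.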
With PVRTC1, two aims were to avoid block artifacts and to take advantage of the fact that neighbouring areas are usually very similar and thus can share information. Although the compressed data is grouped into 64-bit units, these affect overlapping areas of texels. In the case of 4bpp they are ~7x7 and for 2bpp, 15x7.
As you later point out, you could copy the data yourself but there may be a fuzzy boundary: For example, I took these 64x64 and 32x32 textures (which have been compressed and decompressed with PVRTC1 #4bpp ) ...
and then did the equivalent of "TexSubImage" to get:
As you should be able to see, the border of the smaller texture has smudged as the colour information is shared across the boundaries.
In practice it might not matter but since it doesn't strictly match the requirements of TexSubImage, it's not supported.
PVRTC2 has facilities to do better subimage replacement but is not exposed on at least one well-known platform.
< Unsubtle plug > BTW if you want some more info on texture compression, there is a thread on the Stack Exchange Computer Graphics site < /Unsubtle plug >

OpenGL ES shader program id and VBO buffer id are the same

I am drawing 2 triangles in OpenGL ES 2.0 using VBOs.
The program handle (hProgramHandle)
hProgramHandle = glCreateProgram(); // value is 210003
is the same as the buffer id iVertBuffId3:
glGenBuffers(1, &iVertBuffId1); // for vertices // 70001
...
...
glGenBuffers(1, &iVertBuffId2); // for color // 140002
...
...
glGenBuffers(1, &iVertBuffId3); // for texture // 210003
I have created 3 buffers (one each for position, color and texture).
The issue appears while generating the buffer for the texture.
I am not getting any output.
Will OpenGL generate the same number for a program id and a VBO buffer id?
That is dependent on the implementation of the particular OpenGL ES driver you are running, but yes the values can be the same because they are handles to different types of objects and not necessarily memory pointers. Think of them as indexes into different data structures.
Ids returned by OpenGL are in fact names referring to its internal storage.
Internal OpenGL storage is divided by speciality, so it can optimize its memory access at will.
Where this is counter-intuitive is that ids are in fact not unique, but rather dependent on what kind of object you are talking to OpenGL about, e.g. what is currently bound.
It is absolutely correct for OpenGL to give you identical ids, as long as they refer to something different: texture ids and buffer ids can overlap, and that's not a problem.
Note that they may or may not overlap, may begin at 0 and increase in order, or may simply give you what seem to be random numbers; that is implementation dependent.

Using glGenerateMipmap() with glCopyTexImage2D()

I wish to upload textures with non-zero mipmap levels using glCopyTexImage2D().
I am using the following code for this:
// Render Some Geometry
GLint mipmap_level = 1;
glGenTextures(1, &textureId);
glBindTexture(GL_TEXTURE_CUBE_MAP, textureId);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
glCopyTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X,mipmap_level,GL_RGBA16_SNORM,0,0,128,128,0);
// Five other glCopyTexImage2D() calls to load textures
glGenerateMipmap(GL_TEXTURE_CUBE_MAP);
//Render geometry
Here, if I use mipmap_level = 1 the geometry is not drawn at all. How exactly do mipmap levels work in conjunction with the glCopyTexImage2D() API?
I suppose that using level = 1 loads a 64x64 texture, i.e. the first downsampled mipmap.
Calling glGenerateMipmap() before glCopyTexImage2D() would not make any sense. So how exactly will the driver load a non-zero mipmap level using glCopyTexImage2D()?
You first set the image for mipmap level 1 and then you call glGenerateMipmap to automatically generate mipmaps from the image of the first mipmap level, which is mipmap level 0. So all mipmap images following level 0 just get overwritten by the automatically generated images. glGenerateMipmap doesn't care how you set the image for mipmap level 0, it just assumes you did. On the other hand I don't see you specifying an image for level 0, so it will probably just contain rubbish or be regarded as incomplete, which will make any use of the corresponding texture fail.
In the end I hope this question doesn't just amount to knowing that in most programming languages one usually starts to count at 0 instead of 1, and so does OpenGL.
EDIT: As datenwolf points out in his comment, you can change the base mipmap level to be used for mipmap-filtering and as input for glGenerateMipmap by doing a
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_BASE_LEVEL, 1);
But don't do this just because you want to start counting at 1 instead of 0. If on the other hand you have a valid image for mipmap level 0 and want to set one for level 1 and use this for computing the other levels, then changing the base level to 1 before calling glGenerateMipmap (and probably back to 0 afterwards) can be a good idea, though I cannot come up with a use case for this approach right away.
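A minimal sketch of the usual approach, written here with WebGL names since the calls are the direct equivalents (128x128 follows the question; gl.RGBA stands in for the question's GL_RGBA16_SNORM, which WebGL lacks): specify level 0 first, then let generateMipmap derive the rest.
// Capture the rendered framebuffer into mip level 0 of one cube face
const textureId = gl.createTexture();
gl.bindTexture(gl.TEXTURE_CUBE_MAP, textureId);
gl.texParameteri(gl.TEXTURE_CUBE_MAP, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_CUBE_MAP, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_NEAREST);
gl.copyTexImage2D(gl.TEXTURE_CUBE_MAP_POSITIVE_X, 0, gl.RGBA, 0, 0, 128, 128, 0); // level 0, not 1
// ...same copyTexImage2D for the five remaining faces...
gl.generateMipmap(gl.TEXTURE_CUBE_MAP); // derives levels 1..7 from each face's level 0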

OpenGL rendering performance - default shader or no shader?

I have some code for rendering a video, so the OpenGL side of it (once the rendered frame is available in the target texture) is very simple: Just render it to the target rectangle.
What complicates things a bit is that I am using a third-party SDK to render the UI, so I cannot know what state changes it makes, and therefore every time I am rendering a frame I have to make sure all the states I need are set correctly.
I am using a vertex and a texture coordinate buffer to draw my rectangle like this:
glActiveTexture(GL_TEXTURE0);
glEnable(GL_TEXTURE_RECTANGLE_ARB);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texHandle);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
glPushClientAttrib( GL_CLIENT_VERTEX_ARRAY_BIT );
glEnableClientState( GL_VERTEX_ARRAY );
glEnableClientState( GL_TEXTURE_COORD_ARRAY );
glBindBuffer(GL_ARRAY_BUFFER, m_vertexBuffer);
glVertexPointer(4, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, m_texCoordBuffer);
glTexCoordPointer(2, GL_FLOAT, 0, 0);
glDrawArrays(GL_QUADS, 0, 4);
glPopClientAttrib();
(Is there anything that I can skip - even when not knowing what is happening inside the UI library?)
Now I wonder (and this is more theoretical, as I suppose there won't be much difference when drawing just one quad): is it theoretically faster to render like the above, or to write a simple default vertex and fragment shader that does nothing more than return ftransform() for the position and uses the default path for the fragment color too?
I wonder if by using a shader I can skip certain state changes, or generally speed up things. Or if by using the above code OpenGL internally just does that and the outcome will be exactly the same?
If you are worried about clobbering the UI SDK state, you should wrap the code with glPushAttrib(GL_ENABLE_BIT | GL_TEXTURE_BIT) ... glPopAttrib() as well.
You could simplify the state management code a bit by using a vertex array object.
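For what it's worth, a sketch in modern-GL / WebGL2 terms (generic attributes rather than the legacy glVertexPointer/glTexCoordPointer arrays above, so not a drop-in replacement; vertexBuffer, texCoordBuffer and the attribute locations are illustrative names): a VAO records the whole vertex setup once and restores it with a single bind.
// init time: record all vertex array state into the VAO
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexBuffer);
gl.enableVertexAttribArray(positionLoc);
gl.vertexAttribPointer(positionLoc, 4, gl.FLOAT, false, 0, 0);
gl.bindBuffer(gl.ARRAY_BUFFER, texCoordBuffer);
gl.enableVertexAttribArray(texCoordLoc);
gl.vertexAttribPointer(texCoordLoc, 2, gl.FLOAT, false, 0, 0);
gl.bindVertexArray(null);
// draw time: one bind restores the setup, whatever state the UI SDK changed
gl.bindVertexArray(vao);
gl.drawArrays(gl.TRIANGLE_FAN, 0, 4); // no GL_QUADS in core GL / WebGL; a 4-vertex fan instead
gl.bindVertexArray(null);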
As to using a shader, for this simple program I wouldn't bother. It would be one more bit of state you'd have to save & restore, and you're right that internally OpenGL is probably doing just that for the same outcome.
On speeding things up: performance is going to be dominated by the cost of sending tens or hundreds of kilobytes of video frame data to the GPU, and adding or removing a few OpenGL calls is very unlikely to make a difference. I'd look first at possible differences in frame rate between the UI and the video stream: for example, if the UI redraws more often than the video delivers frames, arrange for the video data to be copied to the GPU once per video frame and re-used, rather than copied every time the UI is redrawn.
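As a rough sketch of that idea (WebGL-style; frameIsNew, videoFrame and drawQuad are illustrative names, with videoFrame being something texSubImage2D accepts, e.g. an HTMLVideoElement):
function renderVideoQuad() {
  gl.bindTexture(gl.TEXTURE_2D, texHandle);
  if (frameIsNew) {                        // only when the decoder produced a new frame
    gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0,
                     gl.RGBA, gl.UNSIGNED_BYTE, videoFrame); // one copy per video frame
    frameIsNew = false;
  }
  drawQuad();                              // every UI redraw re-uses the texture already on the GPU
}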
Hope this helps.

glTexSubImage2D extremely slow on Intel video card

My video card is Mobile Intel 4 Series. I'm updating a texture with changing data every frame, here's my main loop:
for(;;) {
    Timer timer;
    glBindTexture(GL_TEXTURE_2D, tex);
    glBegin(GL_QUADS); ... /* draw textured quad */ ... glEnd();
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512,
                    GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, data);
    swapBuffers();
    cout << timer.Elapsed();
}
Every iteration takes 120ms. However, inserting glFlush before glTexSubImage2D brings the iteration time to 2ms.
The issue is not in the pixel format. I've tried the pixel formats BGRA, RGBA and ABGR_EXT together with the pixel types UNSIGNED_BYTE, BYTE, UNSIGNED_INT_8_8_8_8 and UNSIGNED_INT_8_8_8_8_EXT. The texture's internal pixel format is RGBA.
The order of calls matters. Moving the texture upload before the quad drawing, for example, fixes the slowness.
I also tried this on a GeForce GT 420M card, and it works fast there. My real app does have performance problems on non-Intel cards that are fixed by glFlush calls, but I haven't distilled those into a test case yet.
Any ideas on how to debug this?
One issue is that glTexImage2D performs a full reinitialization of the texture object. If only the data changes, but the format remains the same, use glTexSubImage2D to speed things up (just a reminder).
The other issue is that, despite its name, immediate mode, i.e. glBegin(…) … glEnd(), is not synchronous: the drawing calls return long before the GPU is done drawing. Adding a glFinish() will synchronize, but so will calling anything that modifies data still required by queued operations. So in your case glTexImage2D (and glTexSubImage2D) must wait for the drawing to finish.
Usually it's best to do all volatile resource uploads at either the beginning of the drawing function, or during the SwapBuffers block in a separate thread through buffer objects. Buffer objects have been introduced for that very reason, to allow for asynchronous, yet tight operation.
I assume you're actually using that texture for one or more of your quads?
Uploading textures is one of the most expensive operations possible. Since your texture data changes every frame, the upload is unavoidable, but you should try to do it when the texture isn't in use by shaders. Remember that glBegin(GL_QUADS); ... glEnd(); doesn't actually draw quads, it requests that the GPU render the quads. Until the rendering completes, the texture will be locked. Depending on the implementation, this might cause the texture upload to wait (ala glFlush), but it could also cause the upload to fail, in which case you've wasted megabytes of PCIe bandwidth and the driver has to retry.
It sounds like you already have a solution: upload all new textures at the beginning of the frame. So what's your question?
NOTE: Intel integrated graphics are horribly slow anyway.
When you make a draw call (glDrawElements, etc.), the driver simply adds this call to a command buffer and lets the GPU consume these commands when it can.
If this buffer had to be consumed entirely at glSwapBuffers, this would mean that the GPU would be idle after that, waiting for you to send new commands.
Drivers solve this by letting the GPU lag one frame behind. This is the first reason why glTexSubImage2D blocks: the driver waits for the GPU to stop using the texture (from the previous frame) before beginning the transfer, so that you never get half-updated data.
The other reason is that glTexSubImage2D is synchronous: it will also block for the whole duration of the transfer.
You can solve the first issue by keeping 2 textures: one for the current frame, one for the previous frame. Upload into the former, but draw with the latter.
You can solve the second issue by using a Pixel Buffer Object (a buffer bound to GL_PIXEL_UNPACK_BUFFER), which allows asynchronous transfers.
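A hedged WebGL2-style sketch of that path (the desktop calls are the direct equivalents via glGenBuffers/glBufferData; tex and pixels are illustrative names, and the format must match however the texture was originally allocated):
// Stage the pixels in a buffer object, then let texSubImage2D read from it.
const pbo = gl.createBuffer();
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, pbo);
gl.bufferData(gl.PIXEL_UNPACK_BUFFER, pixels, gl.STREAM_DRAW); // CPU -> buffer copy
// With a PIXEL_UNPACK_BUFFER bound, the last argument is a byte offset into
// that buffer instead of client memory, so the driver can schedule the
// buffer -> texture copy on its own time.
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, 512, 512,
                 gl.RGBA, gl.UNSIGNED_BYTE, 0);
gl.bindBuffer(gl.PIXEL_UNPACK_BUFFER, null);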
In your case, I suspect that calling glTexSubImage2D just before swapBuffers adds an extra synchronization in the driver, whereas drawing the quad just before swapBuffers simply appends the command to the buffer. 120ms is probably a driver bug, though: even an Intel GMA doesn't need 120ms to upload a 512x512 texture.
