I implemented an OpenGL ES application running on a Mali-400 GPU.
I grab a 1280x960 RGB buffer from the camera and render it on the GPU using glTexImage2D.
However, the glTexImage2D call takes around 25 milliseconds for a 1280x960 frame, because it performs an extra memcpy of pCameraBuffer.
1) Is there any way to improve the performance of glTexImage2D?
2) Will an FBO help? How can I use framebuffer objects to render? I found a few FBO examples, but they pass NULL as the last argument (data) of glTexImage2D, so how can I render pCameraBuffer with an FBO?
Below is the code that runs for each camera frame:
glGenTextures(1, &textureID);
glBindTexture(GL_TEXTURE_2D, textureID);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, SCENE_WIDTH, SCENE_HEIGHT, 0, GL_RGB, GL_UNSIGNED_BYTE, pCameraBuffer);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glDeleteTextures(1, &textureID);
The usual approach to this type of thing is to try to import the camera buffer directly into the graphics driver, avoiding the need for any memory allocation or copy at all. Whether this is supported depends a lot on the platform integration and the capabilities of the drivers in the system.
For Linux systems, which is what you indicate you are using, the route is via the EGL_EXT_image_dma_buf_import extension. You need a camera driver which creates a surface backed by dma_buf managed memory, and a side-channel to get the dma_buf file handle into the application running the graphics operations. You can then turn this into an EGLImage using the extension above.
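For illustration, a minimal sketch of that import path might look something like this. It assumes EGL_EXT_image_dma_buf_import and GL_OES_EGL_image_external are available, that the camera driver hands you a hypothetical dmabuf_fd and stride, and that it produces frames the driver can import as DRM_FORMAT_RGB888; none of these names come from your code.

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

GLuint importCameraFrame(EGLDisplay dpy, int dmabuf_fd, int stride)
{
    // Extension entry points are resolved at runtime.
    PFNEGLCREATEIMAGEKHRPROC eglCreateImageKHR =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC glEGLImageTargetTexture2DOES =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)eglGetProcAddress("glEGLImageTargetTexture2DOES");

    // Describe the dma_buf to EGL; the fourcc must match the camera's pixel layout.
    EGLint attrs[] = {
        EGL_WIDTH,                     1280,
        EGL_HEIGHT,                    960,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_RGB888,
        EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE
    };
    EGLImageKHR image = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                          EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    // Bind the imported buffer as a texture; the pixel data is never copied.
    // The fragment shader then samples it through a samplerExternalOES uniform.
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, image);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}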
Related
I render to a texture using a framebuffer, but I am not sure when I should correctly call glDeleteFramebuffers. Should the FBO exist for as long as the texture exists, or can I safely call glDeleteFramebuffers after the last draw to the texture?
You can safely call glDeleteFramebuffers after the last draw to the texture. However, I would work on the assumption that creating and destroying framebuffers is expensive, so I would only do it if I knew for sure I wouldn't be rendering to that texture ever again.
I have experienced bugs in some Android GLES drivers where I had to detach the texture from the framebuffer prior to deleting the framebuffer, so I'd recommend you do that as a precaution:
glBindFramebuffer(GL_FRAMEBUFFER, frameBuffer);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, 0, 0);
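For completeness, the delete itself would then follow in the usual way once nothing is attached or bound, along these lines:

glBindFramebuffer(GL_FRAMEBUFFER, 0);
glDeleteFramebuffers(1, &frameBuffer);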
Is there any way to detect whether a driver supports blending for floating-point render targets in OpenGL ES 2 / WebGL? On some mobile devices, glDrawElements throws GL_INVALID_OPERATION for floating-point textures when blending is enabled.
As no extension guarantees floating point frame buffer support, I check for it like this:
glGenTextures(1, &texture_test);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture_test);
glTexImage2D(GL_TEXTURE_2D, 0, opengl_es_2 ? GL_RGBA : GL_RGBA32F, 1, 1, 0, GL_RGBA, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glGenFramebuffers(1, &fbo_test);
glBindFramebuffer(GL_FRAMEBUFFER, fbo_test);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texture_test, 0);
fbo_test_status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
bool floating_point_supported = (fbo_test_status == GL_FRAMEBUFFER_COMPLETE);
This does not seem to guarantee blending support, though. I could draw something and check the error state with blending enabled, but I am wondering if there's a more elegant way.
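For reference, that fallback could be sketched roughly like this, assuming a trivial program and a small throwaway vertex buffer are already bound for a dummy draw into fbo_test:

// Clear any stale errors so the check below isn't polluted.
while (glGetError() != GL_NO_ERROR) { }

glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);
glDrawArrays(GL_TRIANGLES, 0, 3);

// Drivers that cannot blend into a float target report GL_INVALID_OPERATION here.
bool blending_supported = (glGetError() != GL_INVALID_OPERATION);
glDisable(GL_BLEND);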
OpenGL ES 2.0 doesn't natively support floating point rendering at all - it is only available on some platforms via extensions.
Floating-point render targets only became an official core feature in OpenGL ES 3.2.
Regarding "As no extension guarantees floating point frame buffer support": apart from these two, sure ...
https://developer.mozilla.org/en-US/docs/Web/API/OES_texture_float
https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_color_buffer_float
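On a native GLES 2.0 context, the equivalent check is against the extension string (which extension actually guarantees renderability varies by platform, so the FBO-completeness test from the question is still worth keeping on top of it). A rough sketch:

#include <string.h>

const char* ext = (const char*)glGetString(GL_EXTENSIONS);
bool has_float_textures     = ext && strstr(ext, "GL_OES_texture_float") != NULL;
bool has_float_color_buffer = ext && (strstr(ext, "GL_EXT_color_buffer_float") != NULL ||
                                      strstr(ext, "GL_EXT_color_buffer_half_float") != NULL);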
I'm working on an OS X app in a multi-GPU setup (Mac Pro late-2013) that uses OpenCL (on the secondary GPU) to generate a texture which is later drawn to the screen with OpenGL (on the primary GPU). The app is CPU-bound due to calls to glBindTexture() and glBegin(), both of which are spending basically all of their time in:
_platform_memmove$VARIANT$Ivybridge
which is a part of the video driver:
AMDRadeonX4000GLDriver
Setup: creates the OpenGL texture (glPixelBuffer) and then its OpenCL counterpart (clPixelBuffer).
cl_int clerror = 0;
GLuint glPixelBuffer = 0;
cl_mem clPixelBuffer = 0;
glGenTextures(1, &glPixelBuffer);
glBindTexture(GL_TEXTURE_2D, glPixelBuffer);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2048, 2048, 0, GL_RGBA, GL_FLOAT, NULL);
glBindTexture(GL_TEXTURE_2D, 0);
clPixelBuffer = clCreateFromGLTexture(_clShareGroupContext, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, glPixelBuffer, &clerror);
Drawing code: maps the OpenGL texture onto the viewport. The entire NSOpenGLView is just this one texture.
glClear(GL_COLOR_BUFFER_BIT);
glBindTexture(GL_TEXTURE_2D, _glPixelBuffer); // <- spends cpu time here,
glBegin(GL_QUADS); // <- and here
glTexCoord2f(0., 0.); glVertex3f(-1.f, 1.f, 0.f);
glTexCoord2f(0., hr); glVertex3f(-1.f, -1.f, 0.f);
glTexCoord2f(wr, hr); glVertex3f( 1.f, -1.f, 0.f);
glTexCoord2f(wr, 0.); glVertex3f( 1.f, 1.f, 0.f);
glEnd();
glBindTexture(GL_TEXTURE_2D, 0);
glFlush();
After gaining control of the texture memory (via clEnqueueAcquireGLObjects()), the OpenCL kernel writes data to the texture and then releases control of it (via clEnqueueReleaseGLObjects()). The texture data should never exist in main memory (if I understand all of this correctly).
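That flow looks roughly like the following (clQueue, writeKernel, and globalWorkSize are hypothetical names for my queue, kernel, and work size):

// Hand the shared texture to OpenCL, run the kernel, then hand it back to OpenGL.
clEnqueueAcquireGLObjects(clQueue, 1, &clPixelBuffer, 0, NULL, NULL);
clEnqueueNDRangeKernel(clQueue, writeKernel, 2, NULL, globalWorkSize, NULL, 0, NULL, NULL);
clEnqueueReleaseGLObjects(clQueue, 1, &clPixelBuffer, 0, NULL, NULL);
// ... followed by some form of CL -> GL synchronization before the GL draw ...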
My question is: is it expected that so much CPU time is spent in memmove()? Is it indicative of a problem in my code? Or a bug in the driver, perhaps? My (unfounded) suspicion is that the texture data is moving via: GPUx -> CPU/RAM -> GPUy, which I'd like to avoid.
Before I touch on the memory transfer, my first observation is that you're using glBegin(), which is not going to be your best friend, because:
1) This immediate-mode drawing does not work well with the driver. Use VBOs etc. instead so the vertex data can live on the GPU (see the sketch after this list).
2) On OS X it means you're in the old compatibility context rather than the new core context. As (I understand it) the new context is a complete rewrite, that is where future optimizations will end up, while the context you're using is (probably) simply being maintained.
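A minimal VBO version of your quad might look like this. The attribute locations 0/1 are assumptions for a simple textured-quad shader program (and on a core context you would also need a VAO bound); wr/hr from your code would replace the 1.0 texture coordinates.

// One-time setup: store the quad in a buffer on the GPU.
// Layout per vertex: x, y, u, v.
static const GLfloat quad[] = {
    -1.f,  1.f, 0.f, 0.f,
    -1.f, -1.f, 0.f, 1.f,
     1.f,  1.f, 1.f, 0.f,
     1.f, -1.f, 1.f, 1.f,
};
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);

// Per frame: bind and draw; no vertex data crosses the bus.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), (void*)0);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat),
                      (void*)(2 * sizeof(GLfloat)));
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);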
So, to the memory transfer: on the GL side, are you putting in glCreateSyncFromCLeventARB() and glWaitSync() on the CL completion event? There should be no need for the glFlush() I see in your code. Once you've got rid of the immediate-mode drawing (as mentioned above) and are using sync objects between the two APIs, your host code should be doing nothing except asking the driver to tell the GPU to do things. This will give you the best chance of having speedy buffer copies....
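Sketched with the same hypothetical queue name as above, the sync-object route looks roughly like this (it needs GL_ARB_cl_event / cl_khr_gl_event support):

// Let the release produce an event instead of blocking the host.
cl_event clDone = 0;
clEnqueueReleaseGLObjects(clQueue, 1, &clPixelBuffer, 0, NULL, &clDone);
clFlush(clQueue);

// Convert the CL event into a GL sync object and make the GL server wait on it.
GLsync sync = glCreateSyncFromCLeventARB(_clShareGroupContext, clDone, 0);
glWaitSync(sync, 0, GL_TIMEOUT_IGNORED);

// ... issue the draw that samples glPixelBuffer ...

glDeleteSync(sync);
clReleaseEvent(clDone);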
Yes, copies :( Because your CL texture physically lives in a different piece of GPU memory from the GL texture, there has to be a copy over the PCIe bus, which will be slow(er). This is what you're seeing in your profiling. What's actually happening is that the CPU is mapping GPU memory A and GPU memory B into pinned host memory and then copying between them (hopefully) with a DMA. I doubt the data actually touches system memory, so the move is GPUx -> GPUy.
Try putting your CL and GL contexts on the same GPU and I think you'll see your transfer time disappear.
Final thought: if your CL compute is being dwarfed by the transfer time, it's probably best to stick both contexts on the same GPU. You've got the classic CPU/GPU task split problem.
Is it possible to overlap shader effects in OpenGL ES 2.0? (not using FBOs)
How can I use the result of one shader as input to another shader without having to do a glReadPixels and re-upload the processed pixels?
The following pseudo-code is what I'm trying to achieve:
// Push RGBA pixels into the GPU
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, Pixels_To_Render);
// Apply first shader effect
glUseProgram( FIRST_SHADER_HANDLE);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
// Apply the second shader effect sampling from the result of the first shader effect
glUseProgram( SECOND_SHADER_HANDLE );
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
// Get the overall result
glReadPixels(......)
I presume you're talking about pixel processing with fragment shaders?
With the OpenGL ES 2.0 core API, you can't get pixels from the destination framebuffer into the fragment shader without reading them back from the GPU.
But if you're on a device/platform that supports a shader framebuffer fetch extension (EXT_shader_framebuffer_fetch on at least iOS, NV_shader_framebuffer_fetch in some other places), you're in luck. With that extension, a fragment shader can read the fragment data from the destination framebuffer for the fragment it's rendering to (and only that fragment). This is great for programmable blending or pixel post-processing effects because you don't have to incur the performance penalty of a glReadPixels operation.
Declare that you're using the extension with #extension GL_EXT_shader_framebuffer_fetch : require, then read fragment data from the gl_LastFragData[0] builtin. (The subscript is for the rendering target index, but you don't have multiple render targets unless you're using OpenGL ES 3.0, so it's always zero.) Process it however you like and write to gl_FragColor or gl_FragData as usual.
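As an illustration, a fragment shader (here as a C string; u_texture and v_texCoord are made-up names) that blends the incoming sample over what is already in the framebuffer might look like:

static const char* kFetchBlendFS =
    "#extension GL_EXT_shader_framebuffer_fetch : require\n"
    "precision mediump float;\n"
    "uniform sampler2D u_texture;\n"
    "varying vec2 v_texCoord;\n"
    "void main() {\n"
    "    vec4 src = texture2D(u_texture, v_texCoord);\n"
    "    vec4 dst = gl_LastFragData[0];          // current framebuffer contents\n"
    "    gl_FragColor = mix(dst, src, src.a);    // programmable 'over' blend\n"
    "}\n";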
On my Windows machine I see no difference between the following settings:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
and
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
and both are of quite bad quality. Am I missing some setting in the pipeline?
And if this is some kind of oddity, what are my options for overcoming it by means of OpenGL, without using custom scaling?
(Not sure about the windows tag.)
For minification, you likely want to enable mipmapping (see GL_LINEAR_MIPMAP_LINEAR); otherwise you will get very noticeable aliasing in high-frequency textures once the texture is scaled down by more than 2x.
Of course, you need to generate the mipmaps to use this!
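A minimal sketch, assuming the texture is already uploaded as textureID (glGenerateMipmap needs GL 3.0+ or GL_EXT_framebuffer_object; on older contexts gluBuild2DMipmaps is the legacy route):

glBindTexture(GL_TEXTURE_2D, textureID);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// Build the full mipmap chain from the base level.
glGenerateMipmap(GL_TEXTURE_2D);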
The minification filter only applies when the texture is rendered smaller than its original size.
What are your projection parameters, and how do you display the texture? Answering these questions may help us find the solution.
Probably your texture is not being minified, I suppose. In that case, try setting the GL_TEXTURE_MAG_FILTER texture parameter so that it has an effect with your projection.