AMD OpenGL and HDR display problem on Windows

I have been using OpenGL to display HDR content, following this explanation from NVIDIA:
https://on-demand.gputechconf.com/gtc/2017/presentation/s7394-tom-true-programming-for-high-dynamic-range.pdf
It works great, but only on NVIDIA GPUs.
The method is the same on both vendors:
Specify WGL_PIXEL_TYPE_ARB = WGL_TYPE_RGBA_FLOAT_ARB
with color depth 16 (WGL_RED_BITS_ARB = 16, WGL_GREEN_BITS_ARB = 16, WGL_BLUE_BITS_ARB = 16).
On AMD GPUs this displays an SDR image. That is to say, it clamps the fragment shader output to 1.0, while on NVIDIA GPUs it allows values up to ~25.0 (or 10,000 nits, as I understand it) and displays them correctly.
This is with the same TV (LG B9) and the same OS (Windows 10).
Note that other apps, such as Chrome, display HDR content correctly on AMD GPUs, and so do DirectX test apps.
I've tried a bunch of different AMD GPUs, driver settings, texture formats, pixel types, etc., with no luck.
I've read through the whole of https://gpuopen.com/ for clues, also with no luck.
Does anyone have an idea or an example of how to create a proper OpenGL HDR context/configuration?
I'll give a minimal example here, but it's part of a larger process and written in Delphi, so it's for orientation only:
const
PixelaAttribList: array[0..20] of Integer =( //
WGL_DRAW_TO_WINDOW_ARB, 1, //
WGL_DOUBLE_BUFFER_ARB, 1, //
WGL_SUPPORT_OPENGL_ARB, 1, //
WGL_ACCELERATION_ARB, WGL_FULL_ACCELERATION_ARB, //
WGL_SWAP_METHOD_ARB, WGL_SWAP_EXCHANGE_ARB, //
WGL_PIXEL_TYPE_ARB, WGL_TYPE_RGBA_FLOAT_ARB, //
WGL_RED_BITS_ARB, 16, //
WGL_GREEN_BITS_ARB, 16, //
WGL_BLUE_BITS_ARB, 16, //
WGL_ALPHA_BITS_ARB, 0, //
0);
var
piFormats: array[0..99] of GLint;
nNumFormats: GLuint;
hrc: HGLRC;
begin
wglChoosePixelFormatARB(DC, @PixelaAttribList[0], nil, 100, @piFormats[0], @nNumFormats);
if nNumFormats = 0 then
exit;
if not SetPixelFormat(DC, piFormats[0], nil) then
exit;
hrc := wglCreateContextAttribsARB(DC, 0, nil);
if hrc <> 0 then
ActivateRenderingContext(DC, hrc);
After that code I checked the chosen format with wglGetPixelFormatAttribivARB and I get 16 bits per color, so exactly what's needed.
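For reference, this is roughly what that verification looks like in C against the WGL_ARB_pixel_format API (a snippet, not a complete program; dc and pixelFormat stand for the device context and the format index returned by wglChoosePixelFormatARB above):
// Assumes <stdio.h>, <wglext.h> and a loaded wglGetPixelFormatAttribivARB function pointer.
const int query[4] = { WGL_RED_BITS_ARB, WGL_GREEN_BITS_ARB, WGL_BLUE_BITS_ARB, WGL_PIXEL_TYPE_ARB };
int values[4] = { 0, 0, 0, 0 };
if (wglGetPixelFormatAttribivARB(dc, pixelFormat, 0, 4, query, values))
{
    // Expect 16/16/16 and WGL_TYPE_RGBA_FLOAT_ARB (0x21A0) for an FP16 swap chain.
    printf("R%d G%d B%d type 0x%04X\n", values[0], values[1], values[2], values[3]);
}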
The fragment shader is simple:
gl_FragColor = vec4(25.0,25.0,25.0,1.0);
Regards

Related

ARCore ArSession_setCameraTextureNames

If I call ArSession_setCameraTextureNames with a list of textures, say 2, like this:
GLuint textureHandles[2] = { 0, 0 };
glGenTextures(2, textureHandles);
ArSession_setCameraTextureNames(ar_session, 2, textureHandles);
How does the application know which texture is being used for the latest frame?
Is there any chance that the order of the textures gets mixed up, so that the ring buffer of textures ends up mapped to the wrong frames?
Thanks

Improving the performance of the WebGL2 texSubImage2D call with a large texture

Using WebGL2 I stream a 4K by 2K stereoscopic video as a texture onto the inside of a sphere in order to provide 360° VR video playback capability. I've optimized as much of the codebase as is feasible given the returns on time and the application runs flawlessly when using an .H264 video source.
However, when using 8-bit VP8 or VP9 (which offer superior fidelity and file size; AV1 isn't available to me) I encounter FPS drops on weaker systems due to the extra CPU requirements for decoding VP8/VP9 video.
When profiling the app, I've identified that the per-frame call of texSubImage2D that updates the texture from the video consumes the large majority of each frame (texImage2D was even worse due to its allocations), but I am unsure how to further optimize its use. Below are the things I'm already doing to minimize its impact:
I cache the texture's memory space at initial load using texStorage2D to keep it as contiguous as possible.
let glTexture = gl.createTexture();
let pixelData = new Uint8Array(4096*2048*3);
pixelData.fill(255);
gl.bindTexture(GL.TEXTURE_2D, glTexture);
gl.texStorage2D(GL.TEXTURE_2D, 1, GL.RGB8, 4096, 2048);
gl.texSubImage2D(GL.TEXTURE_2D, 0, 0, 0, 4096, 2048, GL.RGB, GL.UNSIGNED_BYTE, pixelData);
gl.generateMipmap(GL.TEXTURE_2D);
Then, during my render loop, both left and right eye poses are processed for each object before moving on to the next object. This allows me to call gl.bindTexture and gl.texSubImage2D only once per object per frame. Additionally, I skip populating shader program defines if the material for this entity is the same as the one for the previous entity, or if the video is paused or still loading.
/* Main Render Loop Extract */
//Called each frame after pre-sorting entities
function DrawScene(glLayer, pose, scene){
//Entities are pre-sorted for transparency blending, rendering opaque first, and transparent second.
for (let ii = 0; ii < _opaqueEntities.length; ii++){
//Only render if entity and its parent chain are active
if(_opaqueEntities[ii] && _opaqueEntities[ii].isActiveHeirachy){
for (let i = 0; i < pose.views.length; i++) {
_RenderEntityView(pose, i, _opaqueEntities[ii]);
}
}
}
for (let ii = 0; ii < _transparentEntities.length; ii++) {
//Only render if entity and its parent chain are active
if(_transparentEntities[ii] && _transparentEntities[ii].isActiveHeirachy){
for (let i = 0; i < pose.views.length; i++) {
_RenderEntityView(pose, i, _transparentEntities[ii]);
}
}
}
}
let _programData;
function _RenderEntityView(pose, viewIdx, entity){
//Calculates/manipulates view matrix for entity for this view. (<0.1ms)
//...
//Store reference to make stack overflow lines shorter :-)
_programData = entity.material.shaderProgram;
_BindEntityBuffers(entity, _programData);//The buffers Thomas, mind the BUFFERS!!!
gl.uniformMatrix4fv(
_programData.uniformData.uProjectionMatrix,
false,
_view.projectionMatrix
);
gl.uniformMatrix4fv(
_programData.uniformData.uModelViewMatrix,
false,
_modelViewMatrix
);
//Render all triangles that make up the object.
gl.drawElements(GL.TRIANGLES, entity.tris.length, GL.UNSIGNED_SHORT, 0);
}
let _attrName;
let _attrLoc;
let textureData;
function _BindEntityBuffers(entity, programData){
gl.useProgram(programData.program);
//Binds pre-defined shader attributes on an as-needed basis
for(_attrName in programData.attributeData){
_attrLoc = programData.attributeData[_attrName];
//Bind only if exists in shader
if(_attrLoc.key >= 0){
_BindShaderAttributes(_attrLoc.key, entity.attrBufferData[_attrName].buffer,
entity.attrBufferData[_attrName].compCount);
}
}
//Bind triangle index buffer
gl.bindBuffer(GL.ELEMENT_ARRAY_BUFFER, entity.triBuffer);
//If already in use, is instanced material so skip configuration.
if(_materialInUse == entity.material){return;}
_materialInUse = entity.material;
//Use the material by applying its specific uniforms
//Apply base color
gl.uniform4fv(programData.uniformData.uColor, entity.material.color);
//If the shader uses a diffuse texture
if(programData.uniformData.uDiffuseSampler){
//Store reference to make stack overflow lines shorter :-)
textureData = entity.material.diffuseTexture;
gl.activeTexture(gl.TEXTURE0);
//Use assigned texture
gl.bindTexture(gl.TEXTURE_2D, textureData);
//If this is a video, update the texture buffer using the current video's playback frame data
if(textureData.type == TEXTURE_TYPE.VIDEO &&
textureData.isLoaded &&
!textureData.paused){
//This accounts for 42% of all script execution time!!!
gl.texSubImage2D(gl.TEXTURE_2D, textureData.level, 0, 0,
textureData.width, textureData.height, textureData.internalFormat,
textureData.srcType, textureData.video);
}
gl.uniform1i(programData.uniformData.uDiffuseSampler, 0);
}
}
function _BindShaderAttributes(attrKey, buffer, compCount, type=GL.FLOAT, normalize=false, stride=0, offset=0){
gl.bindBuffer(GL.ARRAY_BUFFER, buffer);
gl.vertexAttribPointer(attrKey, compCount, type, normalize, stride, offset);
gl.enableVertexAttribArray(attrKey);
}
I've contemplated using pre-defined counters for all for loops to avoid the let i = 0; allocation, but the gain from that seems hardly worth the effort.
Side note: the source video is actually larger than 4K, but with anything above 4K the FPS grinds down to about 10-12.
Obligatory: The key functionality above is extracted from a larger WebGL rendering framework I wrote that itself runs pretty damn fast already. The reason I'm not 'just using' Three, AFrame, or other such common libraries is that they do not have an ATO from the DOD, whereas in-house developed code is ok.
Update 9/9/21: At some point when Chrome updated from 90 to 93, the WebGL performance of texSubImage2D dropped dramatically, resulting in 100+ ms per frame of execution time regardless of CPU/GPU capability. Changing to texImage2D now results in around 16 ms per frame. In addition, shifting from RGB to RGB565 offers up a few ms of performance while minimally sacrificing color.
I'd still love to hear from GL/WebGL experts as to what else I can do to improve performance.

Screen glitching in OS X Metal app? Error (IOAF code 1)?

I'm making an app on OS X Sierra using Metal. Something I am doing is causing the screen to start glitching badly, flashing black in various places, which quickly escalates to the entire screen going black.
In Xcode, if I use the GPU frame capture, the paused frame appears correct, however -- it suddenly returns from the black abyss. I don't see any errors or warnings in the GPU frame information. That said, I am relatively new to Metal and am not experienced with the frame debugger, so I may not know what to look for.
Usually there is nothing printed to the console, but occasionally I do get one of these:
Execution of the command buffer was aborted due to an error during execution. Internal Error (IOAF code 1)
The same app runs on iOS devices without this problem -- so far it only happens on OS X. Does this sound familiar? Any suggestions on what I should check?
I can post some code if it will be helpful, but right now I'm not sure what part of the program is the problem.
EDIT: In response to Noah Witherspoon -- it seems that the problem is caused by some kind of interaction between my scene drawing and the UI drawing. If I display only my scene, which is composed of fat, fuzzy lines, then the problem does not occur. It also does not occur if I display only the UI, which is an orthographic projection with a bunch of rounded rectangles and some type. The problem happens only when both are showing. This is a lot of code, many buffers and a lot of command buffer usage -- too much to put into a post. But here is a little bit.
My lines are rendered with vertex buffers which are arrays of floats, four per vertex:
let dataSize = count * 4 * MemoryLayout<Float>.size
vertexBuffer = device.makeBuffer(bytes: points, length: dataSize, options: MTLResourceOptions())!
These are rendered like this:
renderEncoder.setVertexBuffer(self.vertexBuffer, offset: 0, index: 0)
renderEncoder.setRenderPipelineState(strokeNode.strokePipelineState)
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: Int(widthCountEdge.count)*2-4)
renderEncoder.setRenderPipelineState(strokeNode.capPipelineState)
renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 12)
Here's my main loop for drawing with the command buffer.
if let commandBuffer = commandQueue.makeCommandBuffer() {
commandBuffer.addCompletedHandler { (commandBuffer) -> Void in
self.strokeNode.bufferProvider.availableResourcesSemaphore.signal()
}
self.updateDynamicBufferState()
self.updateGameState(timeInterval)
let renderPassDescriptor = view.currentRenderPassDescriptor
renderPassDescriptor?.colorAttachments[0].loadAction = .clear
renderPassDescriptor?.colorAttachments[0].clearColor = MTLClearColor(red: 0.0, green: 0.0, blue: 0.0, alpha: 0.0)
renderPassDescriptor?.colorAttachments[0].storeAction = .store
if let renderPassDescriptor = renderPassDescriptor, let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor) {
strokeNode.subrender(renderEncoder, parentModelViewMatrix: viewMatrix, projectionMatrix: projectionMatrix, renderer:self)
mainscreen.draw(renderEncoder);
renderEncoder.endEncoding()
if let drawable = view.currentDrawable {
commandBuffer.present(drawable)
}
}
commandBuffer.commit()
}
The line drawing happens in strokeNode.subrender, and then my UI drawing happens in mainscreen.draw. The UI drawing has a lot of components -- too many to list here -- but I will try taking them out one by one and see if I can narrow it down. If none of this looks problematic, I'll edit and post some of that...
Thanks!

setLineWidth works differently on different test machines

I'm making a game in which I draw a Bézier curve like this:
final VertexBufferObjectManager vbom = engine.getVertexBufferObjectManager();
final HighPerformanceMeshVertexBufferObject pMeshVBOM = new HighPerformanceMeshVertexBufferObject(vbom, pBufferData, pBufferData.length, DrawType.DYNAMIC, true, Mesh.VERTEXBUFFEROBJECTATTRIBUTES_DEFAULT);
final HighPerformanceLineChainVertexBufferObject pLeftCurbLineChainVBOM = new HighPerformanceLineChainVertexBufferObject(vbom, triangleCount * 3, DrawType.DYNAMIC, true, leftCurb.VERTEXBUFFEROBJECTATTRIBUTES_DEFAULT);
final HighPerformanceLineChainVertexBufferObject pRightCurbLineChainVBOM = new HighPerformanceLineChainVertexBufferObject(vbom, triangleCount * 3, DrawType.DYNAMIC, true, rightCurb.VERTEXBUFFEROBJECTATTRIBUTES_DEFAULT);
leftCurb = new LineStrip(0, 0, 10f, triangleCount, pLeftCurbLineChainVBOM){
@Override
protected void onManagedUpdate(final float pSecondsElapsed) {
super.onManagedUpdate(pSecondsElapsed);
drawByBezier(curveOffset);
};
void drawByBezier(float curveOffset){
for (int triangleIndex = 0; triangleIndex < triangleCount; triangleIndex++) {
this.setX(triangleIndex, getBezierX(triangleIndex, -curveBottom, -curveControlPoint, -curveTop + curveOffset));
this.setY(triangleIndex, triangleIndex * heightIncrement);
}
}
};
By changing the value of curveOffset I change the look of the curve.
The 10f parameter is the line width. When I test it on a Galaxy S5 (Android 5) the line is drawn about 2 pixels wide, and if I put a lower value there, like 1.5f, the drawn line is very thin. On the other hand, putting in large numbers like 100f doesn't do anything; the line stays at the same (small) width.
I tested this on a Galaxy S3 mini (Android 4.1.2) and the line width works there (performance is another matter, though...). The line is drawn as I wanted. How can I get that on the Galaxy S5 (Android 5)? To me it looks like a device- or OS-specific problem (OpenGL version?), but is there any way to overcome it?
OpenGL ES implementations do not have to support drawing of wide lines. You can query the range of available line widths with:
float[] range = new float[2];
GLES20.glGetFloatv(GLES20.GL_ALIASED_LINE_WIDTH_RANGE, range, 0);
// range[0] is the minimum supported line width.
// range[1] is the maximum supported line width.
This gives you the range supported by the specific device you're running on. Compliant implementations can have a maximum as low as 1.0. This means that you cannot use wide lines if you want your code to run on all devices.
If you want something that has the appearance of wide lines, and will work on any device, you have to draw polygons. You can draw something that looks like a line as a thin quad that is oriented towards the viewer.
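A minimal C sketch of that idea (names are illustrative, CPU-side vertex building only): offset both endpoints of the segment along its 2D normal by half the desired width, and emit the resulting quad as two triangles.
#include <math.h>

/* Expands the 2D segment (x0,y0)-(x1,y1) into a quad that is `width` units wide.
   Writes 6 vertices (two triangles) as x,y pairs into out[12]. */
static void segment_to_quad(float x0, float y0, float x1, float y1,
                            float width, float out[12])
{
    float dx = x1 - x0, dy = y1 - y0;
    float len = sqrtf(dx * dx + dy * dy);
    if (len == 0.0f) return;                 /* degenerate segment */

    /* Unit normal of the segment, scaled to half the line width. */
    float nx = -dy / len * (width * 0.5f);
    float ny =  dx / len * (width * 0.5f);

    /* The four corners of the quad. */
    float ax = x0 + nx, ay = y0 + ny;
    float bx = x0 - nx, by = y0 - ny;
    float cx = x1 + nx, cy = y1 + ny;
    float ex = x1 - nx, ey = y1 - ny;

    /* Triangle 1: a, b, c.  Triangle 2: b, e, c. */
    out[0] = ax; out[1] = ay; out[2]  = bx; out[3]  = by; out[4]  = cx; out[5]  = cy;
    out[6] = bx; out[7] = by; out[8]  = ex; out[9]  = ey; out[10] = cx; out[11] = cy;
}
The same offset can also be computed per vertex in a vertex shader, but the CPU-side version is enough to replace glLineWidth for a handful of lines.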

texture for YUV420 to RGB conversion in OpenGL ES

I have to convert and display YUV420P images in RGB color space using the AMD GPU on a Freescale i.MX53 processor (OpenGL ES 2.0, EGL), under Linux with no X11. To achieve this I need to be able to create an appropriate image holding the YUV420P data: this could be either a YUV420P/YV12 image type or 3 simple 8-bit images, one for each component (Y, U, V).
glTexImage2D is excluded because it's slow: the YUV420P frames are the result of real-time video decoding at 25 FPS, and with glTexImage2D we can't keep the desired framerate.
There's an alternative: eglCreateImageKHR/glEGLImageTargetTexture2DOES. The only problem is that these can't handle any image format that would be suitable for YUV420/YV12 data.
EGLint attribs[] = {
EGL_WIDTH, 800,
EGL_HEIGHT, 480,
EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_YUV_YV12_FSL,
EGL_NONE
};
EGLint const req_attribs[] = {
EGL_RED_SIZE, 5,
EGL_GREEN_SIZE, 6,
EGL_BLUE_SIZE, 5,
EGL_ALPHA_SIZE, 0,
EGL_SAMPLES, 0,
EGL_COLOR_BUFFER_TYPE, EGL_RGB_BUFFER,
EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
EGL_NONE
};
...
display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
eglInitialize(display, NULL, NULL);
eglBindAPI(EGL_OPENGL_ES_API);
eglChooseConfig(display, req_attribs, config, ARRAY_SIZE(config), &num_configs);
ctx = eglCreateContext(display, curr_config, NULL, NULL);
surface = eglCreateWindowSurface(display, curr_config, fb_handle, NULL);
...
EGLImageKHR yuv_img = eglCreateImageKHR(display, EGL_NO_CONTEXT, EGL_NEW_IMAGE_FSL, NULL, attribs);
eglQueryImageFSL(display, yuv_img, EGL_CLIENTBUFFER_TYPE_FSL, (EGLint *)&ptr);
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, yuv_img);
glEGLImageTargetTexture2DOES(...) fails. If I change the appropriate line in 'attribs' to this:
EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_RGB_565_FSL,
then the image can be assigned to an OpenGL ES texture, but it's not appropriate for holding either 8-bit (Y/U/V) data or YUV420/YV12 data. Searching the net (including the Freescale community forum) I haven't found any solution to this.
How can I create an image which:
is fast to create;
ideally can be assigned to an already existing buffer (a physical or virtual address is given);
can be used in the fragment/vertex shader program to perform the YUV --> RGB conversion (a sample conversion shader is sketched after this list)?
The constraint is to avoid unnecessary memcpy(...)s for performance reasons.
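For the three-plane variant, the shader side of the conversion is straightforward once the Y, U and V planes are bound as three separate single-channel (e.g. GL_LUMINANCE) textures; a minimal GLSL ES sketch, written here as a C string (sampler and varying names are illustrative, BT.601 limited-range coefficients assumed):
static const char *yuv420p_to_rgb_fs =
    "precision mediump float;\n"
    "varying vec2 v_texcoord;\n"
    "uniform sampler2D u_texY;\n"  /* full-resolution Y plane */
    "uniform sampler2D u_texU;\n"  /* half-resolution U plane */
    "uniform sampler2D u_texV;\n"  /* half-resolution V plane */
    "void main()\n"
    "{\n"
    "    float y = 1.1643 * (texture2D(u_texY, v_texcoord).r - 0.0625);\n"
    "    float u = texture2D(u_texU, v_texcoord).r - 0.5;\n"
    "    float v = texture2D(u_texV, v_texcoord).r - 0.5;\n"
    "    gl_FragColor = vec4(y + 1.5958 * v,\n"
    "                        y - 0.39173 * u - 0.81290 * v,\n"
    "                        y + 2.017 * u,\n"
    "                        1.0);\n"
    "}\n";
This does not solve the upload problem by itself, but it covers the shader requirement above.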
I have implemented this on the i.MX53 for several YUV formats and it works really well. I have a published article about it, although it was generalized to cover more Android platforms:
http://software.intel.com/en-us/articles/using-opengl-es-to-accelerate-apps-with-legacy-2d-guis
I suspect your problem is that you are not binding to the correct texture target. It should be like this:
glBindTexture(GL_TEXTURE_EXTERNAL_OES, hTexture[iTextureIndex]);
glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, hEglImage[iTextureIndex]);
And the eglImageAttributes should be one of these:
EGLint eglImageAttributes[] = {EGL_WIDTH, iTextureWidth, EGL_HEIGHT, iTextureHeight, EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_YUV_YV12_FSL, EGL_NONE};
EGLint eglImageAttributes[] = {EGL_WIDTH, iTextureWidth, EGL_HEIGHT, iTextureHeight, EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_YUV_NV21_FSL, EGL_NONE};
EGLint eglImageAttributes[] = {EGL_WIDTH, iTextureWidth, EGL_HEIGHT, iTextureHeight, EGL_IMAGE_FORMAT_FSL, EGL_FORMAT_YUV_UYVY_FSL, EGL_NONE};
hEglImage[iTextureIndex] = eglCreateImageKHR(eglDisplay, EGL_NO_CONTEXT, EGL_NEW_IMAGE_FSL, NULL, eglImageAttributes);
struct EGLImageInfoFSL EglImageInfo;
eglQueryImageFSL(eglDisplay, hEglImage[iTextureIndex], EGL_CLIENTBUFFER_TYPE_FSL, (EGLint *)&EglImageInfo);
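For completeness, with the EGLImage bound to GL_TEXTURE_EXTERNAL_OES the fragment shader does not do the matrix math itself; it simply samples through a samplerExternalOES and the driver performs the YUV to RGB conversion (a minimal sketch, names are illustrative):
static const char *external_oes_fs =
    "#extension GL_OES_EGL_image_external : require\n"
    "precision mediump float;\n"
    "varying vec2 v_texcoord;\n"
    "uniform samplerExternalOES u_texFrame;\n"  /* the texture bound above */
    "void main()\n"
    "{\n"
    "    gl_FragColor = texture2D(u_texFrame, v_texcoord);\n"
    "}\n";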
Although this feature of the Freescale i.MX53 platform makes YUV to RGB color space conversion for video extremely fast, it does have a couple of limitations:
It only supports those 3 YUV formats.
eglCreateImageKHR() must allocate the buffers. There is no way to make it use existing buffers. Freescale confirmed that the NULL pointer cannot be anything else, which technically violates the Khronos specification.
Freescale has resolved these problems on the i.MX6 platform, although the architecture is really different. Hope this helps.
