We built an application that uses Qt WebEngine to test WebGL functionality. It worked, but CPU utilization was very high (>30%) just for rendering some sine waveforms. The root file system was provided by Qt Enterprise, as described here for the i.MX6:
http://doc.qt.digia.com/QtEnterpriseEmbedded/qtee-preparing-hardware-imx6sabresd.html
On inspecting the root file system we found that there were no GPU drivers (usually libVivante.so and libVivante.ko for the i.MX6), so it looks like all the GL rendering is being done by the CPU instead of the GPU, which would explain the high CPU utilization. Does anybody know of any other way to enable hardware acceleration for WebGL in Qt WebEngine?
Qt WebEngine requires hardware acceleration to composite the layers of the page and you would probably not be able to see anything on the screen without it.
Chromium, behind Qt WebEngine, is quite a beast and is more designed for perceived smoothness than to yield CPU cycles; it will use all the resources it can to achieve this.
Any JavaScript WebGL call goes from the main render thread to the GPU process main thread, where it is decoded into GL calls to the driver. Each WebGL canvas triggers a different FBO to be used and bound, requiring GL context switching, and as often as possible the latest state will trigger the Chromium compositor to kick in, send the whole delegated scene to the browser process, and finally end up in Qt Quick's scene graph thread to be composited.
All this to say that a single JavaScript WebGL call triggers a much bigger machine than just telling OpenGL to draw those geometries. A CPU usage of 30% on this kind of device doesn't seem abnormal to me, though there might be ways to avoid bottlenecks.
The most efficient approach would be a custom Qt Quick scene graph geometry, as shown in this example: http://qt-project.org/doc/qt-5/qtquick-scenegraph-customgeometry-example.html, but even then I wouldn't expect CPU usage under 10% on that device.
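For illustration, here is a minimal sketch of such a custom scene graph item that draws a sine wave as a line strip, assuming Qt 5 with Qt Quick; the class name, colour, and sample count are made up for the example, not taken from the original application.

    // Hypothetical SineWaveItem: draws one period of a sine wave directly in
    // Qt Quick's scene graph, bypassing WebGL/Chromium entirely (Qt 5 sketch).
    #include <QQuickItem>
    #include <QSGGeometryNode>
    #include <QSGFlatColorMaterial>
    #include <cmath>

    class SineWaveItem : public QQuickItem
    {
        Q_OBJECT
    public:
        SineWaveItem() { setFlag(ItemHasContents, true); }

    protected:
        QSGNode *updatePaintNode(QSGNode *oldNode, UpdatePaintNodeData *) override
        {
            const int samples = 256; // arbitrary resolution for the example
            auto *node = static_cast<QSGGeometryNode *>(oldNode);
            QSGGeometry *geometry = nullptr;

            if (!node) {
                node = new QSGGeometryNode;
                geometry = new QSGGeometry(QSGGeometry::defaultAttributes_Point2D(), samples);
                geometry->setDrawingMode(GL_LINE_STRIP);
                node->setGeometry(geometry);
                node->setFlag(QSGNode::OwnsGeometry);

                auto *material = new QSGFlatColorMaterial;
                material->setColor(Qt::green);
                node->setMaterial(material);
                node->setFlag(QSGNode::OwnsMaterial);
            } else {
                geometry = node->geometry();
                geometry->allocate(samples);
            }

            // Fill the vertex buffer with one period of a sine wave scaled to the item size.
            const qreal pi = 3.14159265358979323846;
            QSGGeometry::Point2D *v = geometry->vertexDataAsPoint2D();
            for (int i = 0; i < samples; ++i) {
                qreal t = qreal(i) / (samples - 1);
                v[i].set(float(t * width()),
                         float(0.5 * height() * (1.0 + std::sin(t * 2.0 * pi))));
            }
            node->markDirty(QSGNode::DirtyGeometry);
            return node;
        }
    };

Registered with qmlRegisterType and driven by calling update() on the item, all the per-frame work stays in the scene graph render thread instead of going through the whole Chromium pipeline.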
Related
I have a Qt application that is built around a QGraphicsView/Scene. The graphical performance is fine, animations are extremely smooth, and a simple high-resolution timer says the frames are drawing as fast as 400 fps. However, the application is always using 15% CPU according to Task Manager. I have run performance analysis on it in Visual Studio 2012, and it shows that most of the samples are being taken in the QApplication::notify function.
I have set the viewport to render with a QGLWidget in hopes that offloading the drawing functions to the GPU would help, but that had no impact at all on CPU usage.
Is this normal? Is there something I can do to reduce CPU usage?
Well, there you have it: 400 FPS framerates. That loads one of your cores at 100% (15% of total CPU on a machine with several cores is roughly one core maxed out). There is a reason people usually cap framerates. The high framerate is putting a strain on the Qt event system, which is driving the graphics.
Limit your frame rate to 60 FPS and problem solved.
"I'm not updating the view unless an event occurs that updates an individual graphicswidget"
Do not update the scene for each and every scene element change. This is likely the cause of the overhead. You can make multiple scene item changes, but render the scene at a fixed rate.
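As a rough sketch of that idea (assuming a plain QGraphicsView/QGraphicsScene setup; the 60 FPS figure is just an example), turn off automatic viewport updates and repaint from a timer instead:

    // Minimal sketch: let item changes accumulate and repaint the viewport at a
    // fixed rate instead of on every change.
    #include <QApplication>
    #include <QGraphicsScene>
    #include <QGraphicsView>
    #include <QTimer>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        QGraphicsScene scene;
        QGraphicsView view(&scene);

        // Item changes no longer trigger immediate repaints.
        view.setViewportUpdateMode(QGraphicsView::NoViewportUpdate);
        view.show();

        // Repaint the whole viewport roughly 60 times per second instead.
        QTimer frameTimer;
        frameTimer.setInterval(1000 / 60);
        QObject::connect(&frameTimer, &QTimer::timeout,
                         [&view]() { view.viewport()->update(); });
        frameTimer.start();

        return app.exec();
    }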
Also, I notice you said graphicswidget, which I assume is QGraphicsWidget; this might be problematic too. QObject-derived classes are a little on the heavy side, and the Qt event system comes with overhead too, which is the reason the regular QGraphicsItem is not QObject-derived. If you use graphics widgets excessively, that might be a source of overhead, so see if you can get away with using the lighter QGraphicsItem class and some lighter mechanism to drive your scene.
I need to write an application where the main content will be OpenGL-rendered (something like a game engine), but there is no good OpenGL-based GUI library offering anything similar to what Qt Widgets do (and those are software-rendered).
From browsing the Qt source code, all painting is done via QPainter, and there is even an OpenGL implementation of QPainter, but support for multiple graphics backends was dropped in Qt 5, so you can't render Qt Widgets with OpenGL anymore (I don't know why).
The problem is that you can't paint to a window surface using both software and hardware rendering: the window is either associated with an OpenGL context or uses software rendering. That means if I want an app with a complex GUI around OpenGL-based content, I either have to paint everything with OpenGL (which is hard because, as I said, there is no good GUI library for it), or I can render the GUI to an image using software rendering (for example Qt) and then load that image as an OpenGL texture (probably a big performance loss).
Does anyone know of a good application that uses a software-rendered GUI loaded as a texture into OpenGL? I need to be sure this will work without a big performance loss, but I can't find a good example showing that it works well even for apps like game engines.
If you take the "render the UI to a texture, then draw a textured quad over my game" route and are worried about performance, try to avoid transferring the whole texture each frame.
If you think about it:
60 fps is not necessary for the UI: 30 fps is enough, so update it only every other frame.
Most of the time the UI doesn't change between frames, and when it does, only a small portion of it changes.
UI frameworks often keep track of which parts of the UI are "dirty" and need to be redrawn. If you can get your hands on that information, you can stream to the texture only the parts that need to be updated (glTexSubImage2D), as in the sketch below.
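A minimal sketch of that last point, assuming the UI is rendered into a tightly packed RGBA8 buffer and the toolkit reports a dirty rectangle (the function and parameter names are placeholders, not a real API):

    #include <GL/gl.h>

    // Upload only the dirty sub-rectangle of a software-rendered UI into an
    // existing OpenGL texture of the same size.
    void uploadDirtyRegion(GLuint uiTexture,
                           const unsigned char *uiPixels, // full UI image, RGBA8
                           int uiWidth,
                           int dirtyX, int dirtyY, int dirtyW, int dirtyH)
    {
        glBindTexture(GL_TEXTURE_2D, uiTexture);

        // Describe how to step through the full-size client buffer so GL can
        // pull just the dirty sub-rectangle out of it.
        glPixelStorei(GL_UNPACK_ROW_LENGTH, uiWidth);
        glPixelStorei(GL_UNPACK_SKIP_PIXELS, dirtyX);
        glPixelStorei(GL_UNPACK_SKIP_ROWS, dirtyY);

        glTexSubImage2D(GL_TEXTURE_2D, 0,
                        dirtyX, dirtyY, dirtyW, dirtyH,
                        GL_RGBA, GL_UNSIGNED_BYTE, uiPixels);

        // Restore the default unpack state.
        glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
        glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
        glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
    }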
First let me explain the application a little bit. This is video security software that can display up to 48 cameras at once. Each video stream gets its own Windows HDC but they all use a shared OpenGL context. I get pretty good performance with OpenGL and it runs on Windows/Linux/Mac. Under the hood the contexts are created using wxWidgets 2.8 wxGLCanvas, but I don't think that has anything to do with the issue.
Now here's the issue. Say I take the same camera and display it in all 48 of my windows. This basically means I'm only decoding 30fps (which is done on a different thread anyway) but displaying up to 1440fps, to take decoding out of the picture. I'm using PBOs to transfer the images over; depending on whether pixel shaders and multitexturing are supported, I may use those to do YUV->RGB conversion on the GPU. Then I use a quad to position the texture and call SwapBuffers. All the OpenGL calls come from the UI thread. I've also tried doing YUV->RGB conversion on the CPU and messed with using GL_RGBA and GL_BGRA textures, but all formats still yield roughly the same performance.
Now the problem is I'm only getting around 1000fps out of the possible 1440fps (I know I shouldn't be measuring in fps, but it's easier in this scenario). The above scenario is using 320x240 (YUV420) video, which is roughly only 110MB/sec. If I use a 1280x720 camera then I get roughly the same framerate, which is nearly 1.3GB/sec. This tells me that it certainly isn't the texture upload speed. If I do the YUV->RGB conversion and scaling on the CPU and paint using a Windows DC, I can easily get the full 1440fps.
The other thing to mention is that I've disabled vsync both on my video card and through OpenGL using wglSwapIntervalEXT. Also, there are no OpenGL errors being reported. However, profiling the application with Very Sleepy shows it spending most of its time in SwapBuffers. I'm assuming the issue is somehow related to my use of multiple HDCs or to SwapBuffers; however, I'm not sure how else to do what I'm doing.
I'm no expert on OpenGL so if anyone has any suggestions or anything I would love to hear them. If there is anything that I'm doing that sounds wrong or any way I could achieve the same thing more efficiently I'd love to hear it.
Here are some links to glIntercept logs for a better understanding of all the OpenGL calls being made:
Simple RGB: https://docs.google.com/open?id=0BzGMib6CGH4TdUdlcTBYMHNTRnM
Shaders YUV: https://docs.google.com/open?id=0BzGMib6CGH4TSDJTZGxDanBwS2M
Profiling Information:
So after profiling, it reported several redundant state changes, which I'm not surprised by. I eliminated all of them and saw no noticeable performance difference, which I kind of expected. I have 34 state changes per render loop and I am using several deprecated functions. I'll look into using vertex arrays, which would solve these. However, I'm only drawing one quad per render loop, so I don't really expect much performance impact from this. Also keep in mind that I don't want to rip everything out and go all VBOs, because I still need to support some fairly old Intel chipset drivers that I believe are only OpenGL 1.4.
The thing that really interested me, and that hadn't occurred to me before, is that each context has its own front and back buffer. Since I'm only using one context, the previous HDC's render call must finish writing to the back buffer before the swap can occur, and only then can the next one start writing to the back buffer. Would it really be more efficient to use more than one context? Or should I look into rendering to textures (FBOs, I think) instead and continue using one context?
EDIT: The original description mentioned using multiple OpenGL contexts, but I was wrong: I'm only using one OpenGL context and multiple HDCs.
EDIT2: Added some information after profiling with gDEBugger.
Here is what I would try to make your application faster. I would use one OpenGL render thread (or more if you have two or more video cards). A video card cannot process several contexts at the same time; with multiple OpenGL contexts, each one ends up waiting on the others. This thread would do only OpenGL work, such as the YUV->RGB conversion (using an FBO to render to texture). The camera threads send images to this thread, and the UI thread can pick up the results to show in the window.
You would have a queue of work to process in the OpenGL context, and you can combine several frames into one texture to convert them in a single pass. That may be useful, since you have up to 48 cameras. As another option, if the OpenGL thread is busy at the moment, you can convert some frames on the CPU.
From the log I see you often call the same methods:
glEnable(GL_TEXTURE_2D)
glMatrixMode(GL_TEXTURE)
glLoadIdentity()
glColor4f(1.000000,1.000000,1.000000,1.000000)
You can call these once per context instead of calling them for each render; see the sketch below.
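For example, a sketch of what that split could look like (fixed-function GL; the function names are placeholders for wherever your context setup and per-frame paths live):

    #include <GL/gl.h>

    // Called once, right after the context is created and made current.
    // These values never change between frames in the log above.
    void setupStaticState()
    {
        glEnable(GL_TEXTURE_2D);
        glMatrixMode(GL_TEXTURE);
        glLoadIdentity();
        glColor4f(1.0f, 1.0f, 1.0f, 1.0f);
    }

    // Called every frame: only the work that actually changes, e.g. binding
    // the video texture, uploading new data via the PBO, drawing the quad,
    // and calling SwapBuffers.
    void renderFrame()
    {
        // ...
    }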
If I understand correctly, you use three textures, one for each plane of the YUV image:
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,352,240,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
Try to use one texture and compute in the shader which YUV values to sample for each pixel. It is possible; I did it in my application.
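For reference, here is a minimal sketch of the YUV->RGB math in a fragment shader, written against the three per-plane luminance textures the log shows; packing the planes into a single texture as suggested above only changes the texture-coordinate lookup. The sampler names are placeholders and the coefficients are the usual BT.601 ones, not values taken from the original application.

    // Old-style GLSL fragment shader, compiled with glShaderSource/glCompileShader
    // and used with the fixed-function vertex pipeline (gl_TexCoord[0] comes from
    // the quad's texture coordinates). GL_LUMINANCE textures replicate the sample
    // into .r, so .r reads the plane value.
    static const char *kYuvToRgbFragmentShader =
        "uniform sampler2D texY;\n"
        "uniform sampler2D texU;\n"
        "uniform sampler2D texV;\n"
        "void main()\n"
        "{\n"
        "    float y = texture2D(texY, gl_TexCoord[0].st).r;\n"
        "    float u = texture2D(texU, gl_TexCoord[0].st).r - 0.5;\n"
        "    float v = texture2D(texV, gl_TexCoord[0].st).r - 0.5;\n"
        "    vec3 rgb = vec3(y + 1.402 * v,\n"
        "                    y - 0.344 * u - 0.714 * v,\n"
        "                    y + 1.772 * u);\n"
        "    gl_FragColor = vec4(rgb, 1.0);\n"
        "}\n";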
I've been playing around with graphics in Android and I noticed that it takes a lot of time and resources to draw bitmaps with the canvas. Especially in high-end games, which require many images to be drawn at once, this can be pretty bad for things such as the framerate. If I decide to learn and use OpenGL, would it make a big difference? Or maybe I'm not using the canvas right?
It depends on what version of Android you're talking about.
In Android 2.x, canvas operations are not hardware accelerated, so the canvas does not use the GPU at all and processes everything pixel by pixel on the CPU.
Hardware acceleration for the canvas was added in Android 3.0 (and enabled by default for apps targeting 4.0 and later), so you can have a GPU-accelerated canvas.
OpenGL ES always uses hardware acceleration, so on Android 2.x it will always be much, much faster than the canvas (it is your only real option for any kind of game that needs a reasonable framerate).
On hardware-accelerated versions of Android, you probably won't notice much of a difference between the canvas and OpenGL, because both leverage the GPU, provided that hardware acceleration is enabled for your canvas.
I am curious about how graphics cards work in general. Please enlighten me.
If we don't make calls to a graphics library such as DirectX or OpenGL, does the graphics card render everything else on the screen as well? Or do all of those rendering calculations fall to the CPU?
For instance, if I create a simple program that loads an image and renders it in a window, without using DirectX or OpenGL, does having a faster graphics card render this image faster? Or does this depend solely on the CPU if we don't use DirectX or OpenGL?
The simple answer is "yes": in a modern OS the graphics card does render most everything on the screen. This isn't quite a graphics card question, but rather an OS question. Cards have been able to do this since the 3dfx days, but the OS did not use them for things like window compositing until relatively recently.
For your example, the answer really depends on the API you use to render your window. One could imagine an API that is far removed from the OS and chooses to always keep the image data in CPU memory. If every frame is displayed by blitting the visible portion from CPU to GPU, the GPU would likely not be the bottleneck (PCIe probably would be). But other APIs (hopefully the one you use) can store the image data in GPU memory, and the visible portion can be displayed from GPU memory without traversing PCIe every frame. That said, the 'decoration' part of the window is likely drawn by a series of OpenGL or DX calls.
Hopefully that answers things well enough?