One of the things I've noticed (on Windows, at least) is that the mouse cursor is drawn with much less latency than even standard Windows elements.
A good example of this would be to start dragging on the desktop. You can easily notice that the drag rectangle is lagging significantly behind the cursor.
My first question is: why is this the case?
I can't imagine drawing a rectangle being so much more expensive than drawing the cursor. Certainly not by a frame or two.
And my second question is, would it ever be possible to match one's application rendering 1:1 with cursor input?
Good use cases for this would be the selection rectangle or drag previews for draggable items, both of which lag significantly behind the OS mouse pointer (independent of any framework or library used).
Selecting icons on the desktop with the selection rectangle is not that slow on my system (DWM on); it lags a little, but not enough for me to really care.
The "Show Window Contents while Dragging" option has always been rather slow which is why it was not on by default in older Windows versions.
The mouse cursor, on the other hand, can be rendered directly by your hardware: Windows sends the cursor image to your graphics card once, and after that it only has to tell the graphics card the cursor position. That is much faster than all the messages and user/kernel context switches involved when you resize and paint a window. The mouse driver probably also uses hardware interrupts/timers with a higher priority than your normal software.
You can try to disable hardware cursors with a registry hack, but the HID/mouse driver and the raw input thread in win32k will still have a higher priority than your application.
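You cannot beat the hardware cursor, but you can reduce the perceived lag of your own drag visuals by sampling the cursor position as late as possible, right before drawing, instead of using the possibly stale coordinates from the last WM_MOUSEMOVE. A minimal Win32 sketch of the idea (the surrounding window and drag-handling code is assumed):

```cpp
// Sketch: sample the cursor as late as possible before drawing a drag rectangle.
// Assumes an existing window 'hwnd' and a drag anchor point 'dragStart'.
#include <windows.h>

void PaintDragRect(HWND hwnd, POINT dragStart)
{
    // Query the *current* hardware cursor position instead of using the
    // coordinates delivered with the last WM_MOUSEMOVE, which may already
    // be a frame or more old by the time we get to paint.
    POINT cur;
    GetCursorPos(&cur);                 // screen coordinates
    ScreenToClient(hwnd, &cur);         // convert to client coordinates

    RECT r = { min(dragStart.x, cur.x), min(dragStart.y, cur.y),
               max(dragStart.x, cur.x), max(dragStart.y, cur.y) };

    HDC dc = GetDC(hwnd);
    DrawFocusRect(dc, &r);              // XOR-style rectangle, cheap to draw
    ReleaseDC(hwnd, dc);
}
```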
I am looking to create a custom rounded frame for an application window (border-radius and shadow).
From a performance point of view, what would be the best technique for this?
a. Use regions (SetWindowRgn) for the rounded application window and a layered window (UpdateLayeredWindow) for the shadow.
b. Use layered windows for both the rounded application window and the shadow.
The docs for UpdateLayeredWindow specify:
For best drawing performance by the layered window and any underlying
windows, the layered window should be as small as possible.
I am asking this specifically for the application's main window, i.e. a large window that can be quite complex and is visible on screen most of the time.
Should I go with regions or a layered window for the application window? Which one would be lighter on the CPU/memory?
SetWindowRgn disables DWM for the given window. DWM is the component responsible for drawing the window frame efficiently using the available graphics hardware, so that should pretty much rule out SetWindowRgn. Also, SetWindowRgn produces very "ancient" looking results because antialiasing is not possible: a pixel can be either fully transparent or fully opaque.
For best drawing performance by the layered window and any underlying
windows, the layered window should be as small as possible.
I believe that in 2018 this hint is less relevant. The documentation was written about 18 years ago, when hardware was far more limited than it is today.
Still, UpdateLayeredWindow is not the fastest way to draw custom window frames, especially when you have to update the bitmap often (e.g. during window resize). The bottleneck is that these updates have to go from system memory to graphics memory. To minimize window size, create four small windows which are only large enough to draw the borders/corners of your window. Visual Studio uses this trick, for instance: using Spy++ one can see four instances of "VisualStudioGlowWindow", which are layered windows that are just 9 pixels wide/tall (on my system).
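A hedged sketch of that trick, creating one thin layered "glow" strip and pushing a premultiplied-alpha bitmap into it with UpdateLayeredWindow (the window class registration and the actual shadow artwork are assumed):

```cpp
// Sketch: one thin layered "glow" window, as used for per-edge shadows.
// Assumes a registered window class named L"GlowStrip" and that you fill
// the DIB with premultiplied-ARGB shadow pixels.
#include <windows.h>

HWND CreateGlowStrip(HWND owner, int x, int y, int w, int h)
{
    HWND strip = CreateWindowExW(
        WS_EX_LAYERED | WS_EX_TOOLWINDOW | WS_EX_NOACTIVATE,
        L"GlowStrip", L"", WS_POPUP, x, y, w, h,
        owner, nullptr, GetModuleHandleW(nullptr), nullptr);

    // 32-bit top-down DIB we can draw the shadow into.
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
    bmi.bmiHeader.biWidth       = w;
    bmi.bmiHeader.biHeight      = -h;      // negative height = top-down
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    void* bits = nullptr;
    HDC screen  = GetDC(nullptr);
    HDC memDC   = CreateCompatibleDC(screen);
    HBITMAP dib = CreateDIBSection(screen, &bmi, DIB_RGB_COLORS, &bits, nullptr, 0);
    HGDIOBJ old = SelectObject(memDC, dib);

    // ... fill 'bits' with premultiplied ARGB shadow pixels here ...

    // Push the bitmap to the layered window; per-pixel alpha is respected.
    POINT dst = { x, y }, src = { 0, 0 };
    SIZE  size = { w, h };
    BLENDFUNCTION blend = { AC_SRC_OVER, 0, 255, AC_SRC_ALPHA };
    UpdateLayeredWindow(strip, screen, &dst, &size, memDC, &src,
                        0, &blend, ULW_ALPHA);

    SelectObject(memDC, old);
    DeleteObject(dib);
    DeleteDC(memDC);
    ReleaseDC(nullptr, screen);
    return strip;
}
```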
If you want maximum performance, you may also look into DirectComposition combined with the WS_EX_NOREDIRECTIONBITMAP extended window style, as explained in the article "High-Performance Window Layering Using the Windows Composition Engine". This technique requires at least Windows 8.
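A rough sketch of that approach (Windows 8+, error handling omitted; the window is assumed to have been created with WS_EX_NOREDIRECTIONBITMAP, and in real code these COM objects must be kept alive for the window's lifetime rather than released at the end of the function):

```cpp
// Sketch: compose a premultiplied-alpha swap chain onto a window created
// with WS_EX_NOREDIRECTIONBITMAP, via DirectComposition (Windows 8+).
// Error handling omitted; 'hwnd', 'width' and 'height' are assumed.
#include <d3d11.h>
#include <dxgi1_2.h>
#include <dcomp.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;
#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "dcomp.lib")

void SetUpComposition(HWND hwnd, UINT width, UINT height)
{
    // D3D11 device (BGRA support is useful if you layer Direct2D on top).
    ComPtr<ID3D11Device> d3d;
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                      D3D11_CREATE_DEVICE_BGRA_SUPPORT,
                      nullptr, 0, D3D11_SDK_VERSION, &d3d, nullptr, nullptr);

    ComPtr<IDXGIDevice> dxgiDevice;
    d3d.As(&dxgiDevice);

    // Get the factory that created the device's adapter.
    ComPtr<IDXGIAdapter> adapter;
    dxgiDevice->GetAdapter(&adapter);
    ComPtr<IDXGIFactory2> factory;
    adapter->GetParent(IID_PPV_ARGS(&factory));

    // Swap chain created *for composition*, with premultiplied alpha.
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Width            = width;
    desc.Height           = height;
    desc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount      = 2;
    desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
    desc.AlphaMode        = DXGI_ALPHA_MODE_PREMULTIPLIED;

    ComPtr<IDXGISwapChain1> swapChain;
    factory->CreateSwapChainForComposition(d3d.Get(), &desc, nullptr, &swapChain);

    // Wire the swap chain to the window through a DirectComposition visual tree.
    ComPtr<IDCompositionDevice> dcomp;
    DCompositionCreateDevice(dxgiDevice.Get(), IID_PPV_ARGS(&dcomp));

    ComPtr<IDCompositionTarget> target;
    dcomp->CreateTargetForHwnd(hwnd, TRUE, &target);

    ComPtr<IDCompositionVisual> visual;
    dcomp->CreateVisual(&visual);
    visual->SetContent(swapChain.Get());
    target->SetRoot(visual.Get());
    dcomp->Commit();

    // Render into the swap chain's back buffer and Present() as usual; the
    // DWM composes it directly, with no redirection-surface copy.
    // NOTE: keep d3d/swapChain/dcomp/target/visual alive in real code.
}
```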
I'm trying to implement a cross-platform UI library that uses as few system resources as possible. I'm considering either using my own software renderer or OpenGL.
For stationary controls everything's fine; I can repaint only when needed. However, when it comes to implementing animations, especially animated blinking carets like the 'phase' caret in Sublime Text, I don't see an easy way to balance resource usage and performance.
For a blinking caret, the caret needs to be redrawn very frequently (15-20 times per second at least, I guess). On one hand, the software renderer supports partial redraw but is far too slow to be practical (3-4 fps for large redraw regions, say 1000x800, which makes it impossible to implement animations). On the other hand, OpenGL doesn't support partial redraw very well as far as I know, which means the whole screen needs to be rendered at 15-20 fps constantly.
So my question is:
How are carets usually implemented in various UI systems?
Is there any way to have OpenGL render to only a portion of the screen?
I know that glViewport enables rendering to part of the screen, but due to double buffering (among other things) the rest of the screen is not kept as it was, so I still need to render the whole screen again.
First you need to ask yourself:
Do I really need to partially redraw the screen?
OpenGL, or rather the GPU, can draw thousands of triangles with ease. So before you start fiddling with partial redrawing of the screen, benchmark and see whether it's worth looking into at all.
This doesn't, however, imply that you have to redraw the screen endlessly; you can still redraw only when changes happen.
Thus if you have a cursor blinking every 500 ms, then you redraw once every 500 ms. If you have an animation running, then you continuously redraw while that animation is playing (or every time the animation does a change that requires redrawing).
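A minimal sketch of such an event-driven loop, assuming GLFW 3.2+ (for glfwWaitEventsTimeout); drawUI() and the dirty flag are placeholders for your own UI code:

```cpp
// Sketch: event-driven loop that redraws only on changes or caret blinks.
// Assumes GLFW 3.2+; drawUI() is a placeholder for real widget rendering.
#include <GLFW/glfw3.h>

bool gDirty = true;            // set to true by input handlers, setText(), etc.

void drawUI(bool caretVisible) { /* ... draw widgets and caret here ... */ }

int main()
{
    glfwInit();
    GLFWwindow* win = glfwCreateWindow(800, 600, "caret demo", nullptr, nullptr);
    glfwMakeContextCurrent(win);

    bool   caretVisible = true;
    double lastBlink    = glfwGetTime();

    while (!glfwWindowShouldClose(win))
    {
        // Sleep until an event arrives, but wake up at least every 0.5 s
        // so the caret can blink without a busy render loop.
        glfwWaitEventsTimeout(0.5);

        if (glfwGetTime() - lastBlink >= 0.5)
        {
            caretVisible = !caretVisible;
            lastBlink    = glfwGetTime();
            gDirty       = true;
        }

        if (gDirty)                    // redraw only when something changed
        {
            glClear(GL_COLOR_BUFFER_BIT);
            drawUI(caretVisible);
            glfwSwapBuffers(win);
            gDirty = false;
        }
    }
    glfwTerminate();
}
```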
This is what Chrome, Firefox, etc. do. You can see this if you open the Developer Tools (F12) and go to the Timeline tab.
Take a look at the following screenshot. The first row of the timeline shows how often Chrome redraws the windows.
The first section shows a lot of continuous redrawing, which was because I was scrolling around on the page.
The last section shows a single redraw roughly every 500 ms, which was the cursor blinking in a textbox.
Note that it doesn't tell you whether Chrome is fully redrawing the window or only parts of it; it just shows the frequency of the redrawing. (If you want to see the redrawn regions, both Firefox and Chrome have "Show Paint Rectangles".)
To circumvent the problem of double buffering and partial redrawing, you could instead draw to a framebuffer object; then you can use glScissor() as much as you want. If you have mostly static content and only a few dynamic elements, you could use multiple framebuffer objects, draw the static content once, and continuously update only the framebuffer containing the dynamic content.
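A rough sketch of that idea (an OpenGL 3.0+ context and function loader are assumed; renderStaticUI() and renderCaret() are hypothetical callbacks):

```cpp
// Sketch: cache static UI in a framebuffer object and redraw only the caret.
// #include <glad/glad.h>   // or your GL loader of choice (assumed)
void renderStaticUI();       // hypothetical: draws all static widgets
void renderCaret();          // hypothetical: draws the caret

GLuint gFbo = 0, gColorTex = 0;

void createStaticLayer(int width, int height)
{
    glGenTextures(1, &gColorTex);
    glBindTexture(GL_TEXTURE_2D, gColorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &gFbo);
    glBindFramebuffer(GL_FRAMEBUFFER, gFbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, gColorTex, 0);
    renderStaticUI();                      // drawn once; the FBO keeps it
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

void drawFrame(int width, int height, int caretX, int caretY, bool caretVisible)
{
    // Copy the cached static layer into the default framebuffer.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, gFbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);

    // Restrict any further drawing to the small caret rectangle.
    glEnable(GL_SCISSOR_TEST);
    glScissor(caretX, caretY, 2, 16);      // caret size chosen arbitrarily
    if (caretVisible)
        renderCaret();
    glDisable(GL_SCISSOR_TEST);
}
```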
However (and I can't emphasize this enough), benchmark and check whether this is even needed. Having two framebuffer objects could be more expensive than just always redrawing everything. The same goes for, say, having a buffer for each rectangle rather than packing all rectangles into a single buffer.
Lastly, to give an example, take NanoGUI (a minimalistic GUI library for OpenGL): it continuously redraws the screen.
The problem with not simply redrawing the screen continuously is that you now need a system for issuing redraws. Calling setText() on a label has to call back and tell the window to redraw. And what if the parent panel the label sits in isn't even visible? Then setText() just issued a redundant redraw of the screen.
The point I'm trying to make is that a system for issuing redraws is more prone to errors. So unless continuous redrawing turns out to be an issue, it is definitely the simpler starting point.
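For completeness, a tiny sketch of what such a redraw-request system can look like (all names here are hypothetical):

```cpp
// Sketch: a minimal dirty-flag redraw system; widget changes request a
// redraw, and the window only repaints when something actually changed.
#include <string>
#include <atomic>

struct Window
{
    std::atomic<bool> dirty{true};
    void requestRedraw() { dirty = true; }

    void frame()                      // called once per event-loop iteration
    {
        if (dirty.exchange(false))
            paint();
    }
    void paint() { /* ... redraw everything (or just the damaged region) ... */ }
};

struct Label
{
    Window* window;
    std::string text;
    bool visible = true;

    void setText(const std::string& t)
    {
        if (t == text) return;        // avoid redundant redraws
        text = t;
        if (visible)                  // a hidden widget should not trigger a repaint
            window->requestRedraw();
    }
};
```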
If the content of the last frame isn't changed on receiving WM_PAINT, is it possible to simply direct the operating system to redraw the window using the old back buffer instead of redrawing the whole scene again to the new back buffer and swapping it?
No. There is no such backbuffer. And when drawing occurs you don't know what areas may be covered by other windows; the clipping area isn't a really good indicator.
The only thing you know is that such areas need to be redrawn. Each window cares about its own client area. If you want to buffer something, you have to do it on your own.
The reason is simple: imagine you have hundreds of windows. Holding a buffer for each window is inefficient when only a few on top are visible. So the Windows designers decided not to store any window contents and just notify the windows on top to redraw themselves.
OK, since we have the DWM (Desktop Window Manager) things have changed a lot. But the principle is still: you are responsible for drawing. If you want to buffer something, you have to do it on your own.
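A hedged sketch of doing that buffering yourself with plain GDI: keep your own offscreen bitmap, repaint the scene into it only when it actually changes, and in WM_PAINT just blit the cached bitmap (g_width/g_height are assumed to track the client size):

```cpp
// Sketch: application-managed "back buffer" for WM_PAINT.
// The scene is rendered into g_backBuffer only when it changes; WM_PAINT
// merely copies the cached bitmap to the screen.
#include <windows.h>

HBITMAP g_backBuffer = nullptr;   // cached rendering of the last frame
int g_width = 0, g_height = 0;    // assumed to be kept up to date on WM_SIZE

void RenderSceneToBackBuffer(HWND hwnd)   // call only when content changes
{
    HDC window = GetDC(hwnd);
    HDC mem = CreateCompatibleDC(window);
    if (!g_backBuffer)
        g_backBuffer = CreateCompatibleBitmap(window, g_width, g_height);
    HGDIOBJ old = SelectObject(mem, g_backBuffer);

    // ... draw the whole scene into 'mem' here ...

    SelectObject(mem, old);
    DeleteDC(mem);
    ReleaseDC(hwnd, window);
    InvalidateRect(hwnd, nullptr, FALSE);  // ask for a cheap WM_PAINT
}

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (msg == WM_PAINT)
    {
        PAINTSTRUCT ps;
        HDC dc = BeginPaint(hwnd, &ps);
        if (g_backBuffer)
        {
            HDC mem = CreateCompatibleDC(dc);
            HGDIOBJ old = SelectObject(mem, g_backBuffer);
            BitBlt(dc, 0, 0, g_width, g_height, mem, 0, 0, SRCCOPY);  // just copy
            SelectObject(mem, old);
            DeleteDC(mem);
        }
        EndPaint(hwnd, &ps);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wp, lp);
}
```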
I want to make a skinning engine capable of drawing custom-shaped windows with alpha blending. That is, it'll use layered windows (UpdateLayeredWindow). A typical window will contain, on top of its background, a couple dozen other bitmaps ranging from 10×10 to, say, 300×150 pixels. In the worst case most of these elements will have smooth animation up to 30 fps. Everything will be alpha-blended, and I am going to use Direct2D for this (yes, I know older Windows versions don't support it). In general, Winamp's modern skin engine is the closest example.
Given all this, and taking into account modern PC performance, can I just redraw the whole window every single frame, or do I have to constrain drawing to some sort of clip rectangle?
D2D requires you to render in response to WM_PAINT messages.
Honestly, use the IAnimation interface and just let D2D and Windows worry about how often to redraw. Though I will let you know: Winamp is done with Adobe AIR, and layered windows with D2D cause issues. (I think you have to use a DXGI render target, but with the window being layered it needs a DC to be returned to an EndPaint call so it can update its alpha channel.)
I have some experience with this.
If you need to support Windows XP, using UpdateLayeredWindow is the only choice available for solving this problem. The documentation for this call says it copies the whole bitmap to the screen each time it is called and this bottleneck showed up in my benchmarking as the real limiting factor. If your window is 300x300 you pay that price on every update, even if you are careful to modify only a couple of pixels. It would be very easy to over-optimize the rendering side for no real benefit so implement something simple, measure, and then decide if you need to optimize.
If you can drop support for Windows XP then you can avoid UpdateLayeredWindow completely and use DwmExtendFrameIntoClientArea to create the same effect as a layered window. You'll write less code, avoid the UpdateLayeredWindow bottleneck, and D2D will be easier to work with.
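A hedged sketch of that approach (Windows Vista or later; the actual Direct2D/Direct3D drawing is omitted):

```cpp
// Sketch: per-pixel alpha without UpdateLayeredWindow (Vista and later).
// Extending the DWM frame over the whole client area makes the window's
// alpha channel meaningful; you then render premultiplied-alpha content
// with Direct2D/Direct3D as usual.
#include <windows.h>
#include <dwmapi.h>
#pragma comment(lib, "dwmapi.lib")

void EnablePerPixelAlpha(HWND hwnd)
{
    // -1 margins mean "sheet of glass": the frame covers the entire window.
    MARGINS margins = { -1, -1, -1, -1 };
    DwmExtendFrameIntoClientArea(hwnd, &margins);

    // GDI cannot write alpha, so do the real drawing with Direct2D/Direct3D
    // into this window; pixels with zero alpha stay fully transparent.
}
```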
The performance of a Direct3D application seems to be significantly better in full screen mode compared to windowed mode. What are the technical reasons behind this?
I guess it has something to do with the fact that a full screen application can gain exclusive control of the display. But why can't the application gain exclusive control of part of the screen (i.e. a window) and get the same performance benefits?
Here are the cliff notes on how things work underneath.
The monitor always needs to be associated with a so-called primary surface to be able to display anything, i.e. the video card can only scan out of one surface in video memory.
When an application is fullscreen (and everything is set up correctly to enable flipping), the primary surface is just one of the application's back buffers, and it is flipped to another back buffer every frame. This is the most efficient way of presenting to the screen, but it requires the application to own the entire monitor area (i.e. the entire primary surface).
When there's no fullscreen application and DWM is off, the primary surface is owned by the OS, and every windowed application performs a blit from its back buffer to the primary surface. This blit takes some GPU time (as do the blits from the other applications visible on the screen), so it's not as efficient as fullscreen presentation. XP worked that way.
When DWM is composing the screen, things get even more complicated.
Here, DWM owns the primary surface and needs to draw application windows there. To make that possible, every window has an associated surface holding its contents, called a redirection surface (which allows DWM to enable window ghosting, glass effects, and all that good stuff). Every time a D3D application presents a frame, it adds a blit to its redirection surface.
That way, several blits need to happen: a blit to the redirection surface by the app, then a blit from the redirection surface to the primary surface by DWM, which is, again, extra overhead compared to fullscreen.
Note all of that additional work is on the GPU, so it doesn't affect CPU performance.
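For reference, a small sketch of how a DXGI-based application switches between windowed presentation and exclusive fullscreen (where flip presentation is possible); error handling is omitted and the swap chain is assumed to already exist:

```cpp
// Sketch: switching a DXGI swap chain between windowed and exclusive
// fullscreen. In fullscreen the primary surface becomes one of the swap
// chain's buffers and presentation is a flip; in windowed mode each
// Present() involves a copy toward the DWM redirection surface instead.
#include <dxgi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void GoFullscreen(const ComPtr<IDXGISwapChain>& swapChain)
{
    // nullptr lets DXGI pick the output the window is currently on.
    swapChain->SetFullscreenState(TRUE, nullptr);
}

void GoWindowed(const ComPtr<IDXGISwapChain>& swapChain)
{
    swapChain->SetFullscreenState(FALSE, nullptr);
}
```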
Stuff to read further:
http://blogs.msdn.com/greg_schechter/archive/2006/03/19/555087.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/03/05/544314.aspx
There's a bit on MSDN that says full screen mode uses buffer flipping, if set up correctly, as opposed to blitting. It makes sense.
Of course you can (and in a way, do) give exclusive control for part of the screen to an application, but what happens to the rest of the screen? You still have to blit, do occlusion checking, etc. on the rest of the windows, and I think that's what causes the performance hit.
I'll add to #aib's answer that the rest of the screen is being managed by the OS. So, if anything else needs to be drawn/worked upon simultaneously, there has to be a performance hit.
For example, if you have a video playing in Windows Media Player in one window and then start Civilization in another, when Civ starts doing its fancy graphics it will need to share screen space with everything else (like the video).
Whereas if the DirectX app has the full screen, everything else might still be "updating" or "playing", but it is not being drawn.
Basically, the video hardware is completely dedicated to the exclusive mode application.
There is no contention for video resources (pipeline, texture memory, etc...)
In particular, texture upload can be a big bottleneck. The less you have to do it (because you have it all), the better.