On systems with a compositor, a windowed application must render into an off-screen buffer, which is then submitted to the compositor for composition and presentation. How does displaying a windowed desktop work without a compositor?
Suppose we have a 3D application using double buffering to render into a window, fully redrawing on each frame. This is my understanding of the process of presenting a new frame:
1. Application submits a frame buffer for presentation.
2. Compositor receives the buffer.
3. Later, the compositor composites all the windows into the screen's back buffer.
4. Compositor swaps the screen's buffers.
What happens after step 1 if there is no compositor? (For example, on Windows without DWM, or on an X server.) Clearly, something is laying out the windows and making sure they render in the correct position and order. How is that different from compositing?
In my app I render my Direct3D content to the window, as recommended, using a swap chain in flip sequential mode (IDXGISwapChain1, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL), in a window without a redirection bitmap (WS_EX_NOREDIRECTIONBITMAP).
However, for the purpose of smooth window resizing (without lag and flicker), I want to intercept live-resize enter/exit events and temporarily switch the rendering to an ID2D1HwndRenderTarget on the window's redirection surface, which seems to offer the smoothest possible resize.
The question is: how do I render the Direct3D content to an ID2D1HwndRenderTarget?
The problem is that the Direct3D rendering and the ID2D1HwndRenderTarget obviously belong to different devices, and I can't find a way to make them interoperate.
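For reference, here is a minimal sketch of the setup described above (the window-class name, window size, helper function name and lack of error handling are assumptions of the sketch, not part of the original setup):

    #include <windows.h>
    #include <d3d11.h>
    #include <dxgi1_2.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Sketch only: assumes a window class named L"D3DWindowClass" is already
    // registered; error handling is omitted.
    void CreateWindowAndSwapChain(HWND& hwnd, ComPtr<IDXGISwapChain1>& swapChain)
    {
        // Window without a GDI redirection bitmap.
        hwnd = CreateWindowExW(WS_EX_NOREDIRECTIONBITMAP, L"D3DWindowClass", L"3D view",
                               WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                               CW_USEDEFAULT, CW_USEDEFAULT, 800, 600,
                               nullptr, nullptr, GetModuleHandleW(nullptr), nullptr);

        ComPtr<ID3D11Device> device;
        ComPtr<ID3D11DeviceContext> context;
        D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                          D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
                          D3D11_SDK_VERSION, &device, nullptr, &context);

        // Walk up from the D3D device to the DXGI factory of its adapter.
        ComPtr<IDXGIDevice> dxgiDevice;
        device.As(&dxgiDevice);
        ComPtr<IDXGIAdapter> adapter;
        dxgiDevice->GetAdapter(&adapter);
        ComPtr<IDXGIFactory2> factory;
        adapter->GetParent(IID_PPV_ARGS(&factory));

        // Flip-model swap chain presented directly to the window.
        DXGI_SWAP_CHAIN_DESC1 scd = {};
        scd.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
        scd.SampleDesc.Count = 1;
        scd.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        scd.BufferCount      = 2;
        scd.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;

        factory->CreateSwapChainForHwnd(device.Get(), hwnd, &scd,
                                        nullptr, nullptr, &swapChain);
    }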
Here are the ideas that come to mind:
A. Somehow assign ID2D1HwndRenderTarget to be the output frame buffer for 3D rendering. This would be the best case scenario, because it would minimize buffer copying.
1. Somehow obtain an ID3D11Texture2D or at least an IDXGISurface from the ID2D1HwndRenderTarget.
2. Create a render target view from the DXGI surface / Direct3D texture with ID3D11Device::CreateRenderTargetView.
3. Set the render target with ID3D11DeviceContext::OMSetRenderTargets.
4. Render directly to the ID2D1HwndRenderTarget.
The problem is in step 1. ID2D1HwndRenderTarget and its ancestor ID2D1RenderTarget are fairly sparse interfaces and don't seem to provide this capability. Is there a way to obtain a DXGI or D3D surface from the ID2D1HwndRenderTarget?
B. Create an off-screen Direct3D frame buffer texture, render the 3D content there and then copy it to the window render target.
1. Create an off-screen ID3D11Texture2D (ID3D11Device::CreateTexture2D).
2. Create an ID2D1Device using D2D1CreateDevice (from the same IDXGIDevice as my ID3D11Device).
3. Create an ID2D1DeviceContext using ID2D1Device::CreateDeviceContext.
4. Create an ID2D1Bitmap using ID2D1DeviceContext::CreateBitmapFromDxgiSurface.
5. Draw the bitmap using ID2D1HwndRenderTarget::DrawBitmap.
Unfortunately, I get the error message "An operation failed because a device-dependent resource is associated with the wrong ID2D1Device (resource domain)". Obviously, the resource comes from a different ID2D1Device, but how do I draw a texture bitmap from one Direct2D device onto another? (The sequence is sketched below.)
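For reference, a condensed sketch of the sequence in B (the function and variable names are assumptions of the sketch; as described above, the drawing fails with the resource-domain error, reported when EndDraw is called, because the bitmap belongs to the newly created ID2D1Device rather than to the device backing the ID2D1HwndRenderTarget):

    #include <d3d11.h>
    #include <d2d1_1.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Sketch only: d3dDevice, hwndRT, width and height are assumed to exist;
    // error handling is omitted.
    void DrawViaHwndRenderTarget(ComPtr<ID3D11Device> d3dDevice,
                                 ID2D1HwndRenderTarget* hwndRT,
                                 UINT width, UINT height)
    {
        // 1. Off-screen D3D11 texture used as the 3D render target.
        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_DEFAULT;
        desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
        ComPtr<ID3D11Texture2D> tex;
        d3dDevice->CreateTexture2D(&desc, nullptr, &tex);

        // 2./3. D2D device and device context on the same underlying DXGI device.
        ComPtr<IDXGIDevice> dxgiDevice;
        d3dDevice.As(&dxgiDevice);
        ComPtr<ID2D1Device> d2dDevice;
        D2D1CreateDevice(dxgiDevice.Get(), nullptr, &d2dDevice);
        ComPtr<ID2D1DeviceContext> d2dCtx;
        d2dDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &d2dCtx);

        // 4. Wrap the texture's DXGI surface as a D2D bitmap.
        ComPtr<IDXGISurface> surface;
        tex.As(&surface);
        ComPtr<ID2D1Bitmap1> bitmap;
        d2dCtx->CreateBitmapFromDxgiSurface(surface.Get(), nullptr, &bitmap);

        // 5. Draw with the HWND render target: the draw is rejected and EndDraw
        // reports D2DERR_WRONG_RESOURCE_DOMAIN, because 'bitmap' was created on
        // 'd2dDevice', not on the device behind 'hwndRT'.
        hwndRT->BeginDraw();
        hwndRT->DrawBitmap(bitmap.Get());
        hwndRT->EndDraw();
    }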
C. Out of desperation, I tried to map the content of my frame buffer to CPU memory using IDXGISurface::Map. However, this is tricky because it requires the D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ flags when creating the texture, which then seems to make it impossible to use the texture as an output frame buffer (or am I missing something?). And in general this technique will most likely be very slow, because it involves syncing between the CPU and GPU and copying the whole texture at least twice.
Has anyone ever succeeded with the task of rendering 3D content to an ID2D1HwndRenderTarget? Please share your experience. Thanks!
One of the things I've noticed (at least on Windows) is that the mouse cursor is drawn with much less latency than even standard Windows elements.
A good example of this would be to start dragging on the desktop. You can easily notice that the drag rectangle is lagging significantly behind the cursor.
My first question is: why is this the case?
I can't imagine drawing a rectangle being so much more expensive than drawing the cursor. Certainly not by a frame or two.
And my second question is, would it ever be possible to match one's application rendering 1:1 with cursor input?
A good use case for this would be either this selection rectangle or drag previews for draggable items, both of which lag quite significantly behind the OS mouse pointer (independent of any framework or library used).
Selecting icons on the desktop with the selection rectangle is not that slow on my system (DWM on); it lags a little, but not enough for me to really care.
The "Show Window Contents while Dragging" option has always been rather slow which is why it was not on by default in older Windows versions.
The mouse cursor, on the other hand, can be rendered directly by your hardware. That is, Windows sends the cursor image to your graphics card, and after that Windows only has to tell the graphics card the cursor position. This is much faster than all the messages and user/kernel context switches involved when you resize and paint a window. The mouse driver probably uses hardware interrupts/timers with a higher priority than your normal software as well.
You can try to disable hardware cursors with a registry hack, but the HID/mouse driver and the raw input thread in win32k will still have a higher priority than your application.
If the content of the last frame hasn't changed when WM_PAINT is received, is it possible to simply direct the operating system to redraw the window using the old back buffer, instead of redrawing the whole scene again to the new back buffer and swapping it?
No. There is no such "backbuffer". And when drawing occurs, you don't know what areas may be covered by other windows. The clipping area isn't a reliable indicator.
The only thing you know is that such areas need to be redrawn. Each window cares about its own client area. If you want to buffer something, you have to do it on your own.
The reason is simple: imagine you have hundreds of windows. Holding a buffer for each window is inefficient when only a few at the top are visible. So the Windows designers decided not to store any window's contents and just to notify the windows on top to redraw themselves.
OK, since we have the DWM (Desktop Window Manager), things have changed a lot. But the principle is still: you are responsible for drawing. If you want to buffer something, you have to do it on your own.
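To make "do it on your own" concrete, here is a minimal sketch of manual buffering inside a window procedure, using a GDI memory DC as the application-managed back buffer (memDC, memBitmap, sceneDirty and RenderScene are hypothetical names kept by the application, not part of any Windows API):

    // Sketch only: memDC, memBitmap and sceneDirty are per-window state kept by
    // the application; RenderScene is a hypothetical drawing routine.
    case WM_SIZE:
    {
        // (Re)create the buffer to match the new client area.
        HDC hdc = GetDC(hwnd);
        if (memBitmap) DeleteObject(memBitmap);
        if (memDC)     DeleteDC(memDC);
        memDC     = CreateCompatibleDC(hdc);
        memBitmap = CreateCompatibleBitmap(hdc, LOWORD(lParam), HIWORD(lParam));
        SelectObject(memDC, memBitmap);
        ReleaseDC(hwnd, hdc);
        sceneDirty = true;
        return 0;
    }
    case WM_PAINT:
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);
        if (sceneDirty)
        {
            RenderScene(memDC);   // redraw into our own buffer only when needed
            sceneDirty = false;
        }
        // Refresh the screen from the cached buffer, whether or not it was redrawn.
        RECT rc;
        GetClientRect(hwnd, &rc);
        BitBlt(hdc, 0, 0, rc.right, rc.bottom, memDC, 0, 0, SRCCOPY);
        EndPaint(hwnd, &ps);
        return 0;
    }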
I'm using OpenGL to speed up GUI rendering. It works fine, but when the user drags the window onto a different display (potentially connected to a different GPU), it starts rendering a black interior. When the user moves the window back to the original display, it starts working again.
I have this report from Windows XP; unfortunately, I'm unable to check on Windows 7/8 or Mac OS X right now.
Any ideas what to do about it?
In the current Windows and Linux driver models, OpenGL contexts are tied to a certain graphics scanout framebuffer. It's perfectly possible for the scanout framebuffer to span several connected displays and even GPUs, if the GPU architecture allows for it (for example NVIDIA SLI and AMD CrossFire).
However, what does not work (with current driver architectures) is GPUs of different vendors (e.g. NVIDIA and Intel) sharing a scanout buffer, or, in the all-NVIDIA or all-AMD case, GPUs that have not been connected using SLI or CrossFire.
So if the monitors are connected to different graphics cards, this can happen.
This is, however, only a software design limitation. It's perfectly possible to separate graphics rendering from display scanout. This is in fact the technical basis for hybrid graphics, where a fast GPU renders into scanout buffer memory managed by a different GPU (NVIDIA Optimus, for example).
The low-hanging fruit to fix that would be to recreate the context when the window passes over to a screen connected to a different GPU. But this has a problem: if the window is split among screens, it will stay black on one of them. Also, recreating a context, together with uploading all the data, can be a lengthy operation. And often, in situations like yours, the device on the other screen is incompatible with the feature set of the original context.
A workaround for this is to do all rendering in an off-screen framebuffer object (FBO), whose contents you then copy to CPU memory and from there to the target window using GDI operations. This method, however, has the huge drawback of a full memory round trip and increased latency.
The steps to set this up would be:
1. Identify the screen with the GPU you want to use.
2. Create a hidden window centered on that screen (i.e. do not pass WS_VISIBLE as a style to CreateWindow and do not call ShowWindow on it).
3. Create an OpenGL context on this window; the pixel format doesn't have to be double buffered, but double buffering usually gives better performance.
4. Create the target, user-visible window; do not bind an OpenGL context to this window.
5. Set up a framebuffer object (FBO) in the OpenGL context.
   - The renderbuffer attachments of this FBO are created to match the client rect size of the target window; when the window is resized, resize the FBO's renderbuffers.
   - Set up two renderbuffer objects for double-buffered operation.
6. Set up a pixel buffer object (PBO) that matches the dimensions of the renderbuffers.
   - When the renderbuffers' size changes, the PBO must be resized as well.
7. Render to the FBO with OpenGL, then transfer the pixel contents to the PBO (glBindBuffer, glReadPixels).
8. Map the PBO into process memory using glMapBuffer and use SetDIBitsToDevice to transfer the data from the mapped memory region to the target window's device context; then unmap the PBO. (A sketch of this path follows.)
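A minimal sketch of that readback path, assuming a desktop OpenGL 3.x context is current on the hidden window, an extension loader is already initialized, width and height track the visible window's client rect, and targetWindow is that window's HWND; for brevity a single color renderbuffer is shown instead of the two suggested above, and RenderScene is a hypothetical routine:

    // --- one-time setup (repeat on resize) ---
    GLuint fbo = 0, colorRb = 0, pbo = 0;

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    glGenRenderbuffers(1, &colorRb);
    glBindRenderbuffer(GL_RENDERBUFFER, colorRb);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              GL_RENDERBUFFER, colorRb);

    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);

    // --- per frame ---
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, width, height);
    RenderScene();                                  // hypothetical scene rendering

    // Asynchronous readback of the color attachment into the PBO.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    // Map the PBO and push the pixels to the visible window with GDI.
    void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);

    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = height;           // bottom-up rows, matching glReadPixels
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    HDC hdc = GetDC(targetWindow);
    SetDIBitsToDevice(hdc, 0, 0, width, height, 0, 0, 0, height,
                      pixels, &bmi, DIB_RGB_COLORS);
    ReleaseDC(targetWindow, hdc);

    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);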
The performance of a Direct3D application seems to be significantly better in full screen mode compared to windowed mode. What are the technical reasons behind this?
I guess it has something to do with the fact that a full-screen application can gain exclusive control of the display. But why can't an application gain exclusive control of part of the screen (i.e. a window) and get the same performance benefits?
Here are the cliff notes on how things work underneath.
The monitor always needs to be associated with a so-called primary surface to be able to display anything; i.e., the video card can only scan out of one surface in video memory.
When an application is fullscreen (and everything was set up correctly to enable flipping), the primary surface is just one of the application's back buffers, and it is flipped to another back buffer every frame. This is the most efficient way of presenting to the screen, but it requires the application to own the entire monitor area (i.e. the entire primary surface).
When there's no fullscreen application and DWM is off, the primary surface is owned by the OS, and every windowed application performs a blit from its back buffer to the primary surface. This blit takes some GPU time to complete (as do the blits from the other applications visible on the screen), so it's not as efficient as fullscreen presentation. XP worked that way.
When DWM is composing the screen, things get even more complicated.
Here, DWM owns the primary surface and needs to draw application windows there. To make that possible, every window has an associated surface holding its contents, called the redirection surface (which allows DWM to enable window ghosting, glass effects, and all that good stuff). Every time a D3D application presents a frame, it adds a blit to its redirection surface.
That way, several blits need to happen: a blit to the redirection surface by the app, then a blit from the redirection surface to the primary by DWM, which is, again, some overhead compared to fullscreen.
Note all of that additional work is on the GPU, so it doesn't affect CPU performance.
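As a concrete illustration of the two presentation paths, here is a hypothetical helper (not taken from the articles below) that toggles an existing DXGI swap chain between windowed presentation, which is blitted/composed by the OS, and exclusive fullscreen, where the back buffer can become the primary surface and be flipped. It assumes the swap chain was created with DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH and that all outstanding back-buffer references are released before the call:

    #include <dxgi.h>

    // Hypothetical helper: request or leave exclusive fullscreen for an
    // existing swap chain.
    HRESULT SetExclusiveFullscreen(IDXGISwapChain* swapChain, BOOL enable)
    {
        // In exclusive fullscreen the application's back buffer can become the
        // primary (scanout) surface, so Present() can flip instead of blit.
        HRESULT hr = swapChain->SetFullscreenState(enable, nullptr);
        if (FAILED(hr))
            return hr;

        // After the mode switch, resize the back buffers to the new output size;
        // zeros keep the current buffer count, client size and format.
        return swapChain->ResizeBuffers(0, 0, 0, DXGI_FORMAT_UNKNOWN,
                                        DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH);
    }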
Stuff to read further:
http://blogs.msdn.com/greg_schechter/archive/2006/03/19/555087.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/03/05/544314.aspx
There's a bit on MSDN that says full screen mode uses buffer flipping, if set up correctly, as opposed to blitting. It makes sense.
Of course you can (and in a way, do) give exclusive control for part of the screen to an application, but what happens to the rest of the screen? You still have to blit, do occlusion checking, etc. on the rest of the windows, and I think that's what causes the performance hit.
I'll add to #aib's answer that the rest of the screen is being managed by the OS. So, if anything else needs to be drawn/worked upon simultaneously, there has to be a performance hit.
For example, if you have a video playing in Windows Media Player in one window and then start Civilization in another, when Civ starts doing its fancy graphics it will need to share screen space with everything else (like the video).
Whereas if the DirectX app has the full screen, everything else might be "updating" or "playing", but it is not being drawn.
Basically, the video hardware is completely dedicated to the exclusive mode application.
There is no contention for video resources (pipeline, texture memory, etc.).
In particular, texture upload can be a big bottleneck. The less you have to do it (because you have it all), the better.