Alternatives to StretchDIBits (Win32 API)

I am using the StretchDIBits function to draw the pixel data (DIBits) of four different bitmaps to different parts of a window.
StretchDIBits immediately displays the copied bits in the window. I was wondering if there is a way to defer showing the copied bits until after all four bitmaps have been drawn.
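One common way to get this deferral, sketched below, is to compose everything into an off-screen memory DC and then make it visible with a single BitBlt. The function and parameter names here are hypothetical, and 32-bpp sources are assumed:

```cpp
#include <windows.h>
#include <cstdlib>

// Hypothetical sketch: draw all four DIBs into a memory DC, then present them
// with one BitBlt so they appear simultaneously.
void PaintAllBitmaps(HDC windowDC, int clientW, int clientH,
                     const RECT dest[4], const BITMAPINFO* srcInfo[4],
                     const void* srcBits[4])
{
    HDC memDC = CreateCompatibleDC(windowDC);
    HBITMAP backBuffer = CreateCompatibleBitmap(windowDC, clientW, clientH);
    HGDIOBJ oldBmp = SelectObject(memDC, backBuffer);

    for (int i = 0; i < 4; ++i) {
        // Nothing is visible on screen yet; this targets the memory DC.
        StretchDIBits(memDC,
                      dest[i].left, dest[i].top,
                      dest[i].right - dest[i].left, dest[i].bottom - dest[i].top,
                      0, 0,
                      srcInfo[i]->bmiHeader.biWidth,
                      std::abs(srcInfo[i]->bmiHeader.biHeight),
                      srcBits[i], srcInfo[i], DIB_RGB_COLORS, SRCCOPY);
    }

    // A single blit shows all four bitmaps at once.
    BitBlt(windowDC, 0, 0, clientW, clientH, memDC, 0, 0, SRCCOPY);

    SelectObject(memDC, oldBmp);
    DeleteObject(backBuffer);
    DeleteDC(memDC);
}
```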

Related

How does windowed rendering work without a compositor?

On systems with a compositor, a windowed application must render into an off-screen buffer, which is then submitted to the compositor for composition and presentation. How does displaying a windowed desktop work without a compositor?
Suppose we have a 3D application using double buffering to render into a window, fully redrawing on each frame. This is my understanding of the process of presenting a new frame:
1. Application submits a frame buffer for presentation.
2. Compositor receives the buffer.
3. Later, the compositor composites all the windows into the screen's back buffer.
4. Compositor swaps the screen's buffers.
What happens after step 1 if there is no compositor? (For example, on Windows without DWM, or on an X server.) Clearly, something is laying out the windows and making sure they render in the correct position and order. How is that different from compositing?

How to render Direct3D content into ID2D1HwndRenderTarget?

In my app I render my Direct3D content to the window, as recommended, using a swap chain in flip-sequential mode (IDXGISwapChain1, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL), in a window without a redirection bitmap (WS_EX_NOREDIRECTIONBITMAP).
However, for the purpose of smooth window resizing (without lag and flicker), I want to intercept live-resize enter/exit events and temporarily switch the rendering to an ID2D1HwndRenderTarget on the window's redirection surface, which seems to offer the smoothest possible resize.
The question is: how do I render the Direct3D content into an ID2D1HwndRenderTarget?
The problem is that the Direct3D rendering and the ID2D1HwndRenderTarget obviously belong to different devices, and I can't find a way to make them interoperate.
Here are the ideas that come to mind:
A. Somehow assign the ID2D1HwndRenderTarget to be the output frame buffer for the 3D rendering. This would be the best-case scenario, because it would minimize buffer copying:
1. Somehow obtain an ID3D11Texture2D, or at least an IDXGISurface, from the ID2D1HwndRenderTarget.
2. Create a render target view from the DXGI surface / Direct3D texture (ID3D11Device::CreateRenderTargetView).
3. Set the render target (ID3D11DeviceContext::OMSetRenderTargets).
4. Render directly to the ID2D1HwndRenderTarget.
The problem is in step 1. The interfaces of ID2D1HwndRenderTarget and its ancestor ID2D1RenderTarget are pretty sparse and don't seem to provide this capability. Is there a way to obtain a DXGI or D3D surface from an ID2D1HwndRenderTarget?
B. Create an off-screen Direct3D frame buffer texture, render the 3D content into it, and then copy it to the window render target:
1. Create an off-screen texture (ID3D11Device::CreateTexture2D).
2. Create an ID2D1Device using D2D1CreateDevice (from the same IDXGIDevice as my ID3D11Device).
3. Create an ID2D1DeviceContext using ID2D1Device::CreateDeviceContext.
4. Create an ID2D1Bitmap using ID2D1DeviceContext::CreateBitmapFromDxgiSurface.
5. Draw the bitmap using ID2D1HwndRenderTarget::DrawBitmap.
Unfortunately, I get the error message "An operation failed because a device-dependent resource is associated with the wrong ID2D1Device (resource domain)". Obviously, the bitmap comes from a different ID2D1Device. But how do I draw a texture bitmap from one Direct2D device onto another?
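For reference, the sequence above might look like the following sketch (hypothetical names, error handling omitted; the off-screen texture is assumed to be DXGI_FORMAT_B8G8R8A8_UNORM with a single mip level, and the D3D device created with D3D11_CREATE_DEVICE_BGRA_SUPPORT). The final DrawBitmap is the call that fails:

```cpp
#include <d3d11.h>
#include <d2d1_1.h>
#include <d2d1_1helper.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void TryApproachB(ComPtr<ID3D11Device> d3dDevice,
                  ComPtr<ID3D11Texture2D> offscreenTexture,
                  ComPtr<ID2D1HwndRenderTarget> hwndTarget)
{
    // Step 2: an ID2D1Device from the same DXGI device as the D3D device.
    ComPtr<IDXGIDevice> dxgiDevice;
    d3dDevice.As(&dxgiDevice);
    ComPtr<ID2D1Device> d2dDevice;
    D2D1CreateDevice(dxgiDevice.Get(), nullptr, &d2dDevice);

    // Step 3: a device context on that D2D device.
    ComPtr<ID2D1DeviceContext> d2dContext;
    d2dDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &d2dContext);

    // Step 4: wrap the off-screen texture's DXGI surface in a D2D bitmap.
    ComPtr<IDXGISurface> surface;
    offscreenTexture.As(&surface);
    D2D1_BITMAP_PROPERTIES1 props = D2D1::BitmapProperties1(
        D2D1_BITMAP_OPTIONS_NONE,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM,
                          D2D1_ALPHA_MODE_PREMULTIPLIED));
    ComPtr<ID2D1Bitmap1> bitmap;
    d2dContext->CreateBitmapFromDxgiSurface(surface.Get(), &props, &bitmap);

    // Step 5: this DrawBitmap fails with the resource-domain error, because
    // hwndTarget lives on a different ID2D1Device than d2dContext.
    hwndTarget->BeginDraw();
    hwndTarget->DrawBitmap(bitmap.Get());
    hwndTarget->EndDraw();
}
```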
C. Out of desperation, I tried to map the contents of my frame buffer to CPU memory using IDXGISurface::Map. However, this is tricky, because it requires the D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ flags when creating the texture, which then seems to make it impossible to use the texture as an output frame buffer (or am I missing something?). And this technique will most likely be very slow anyway, because it involves syncing between CPU and GPU and copying the whole texture at least twice.
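On the "am I missing something?" point: the usual Direct3D 11 pattern is to keep rendering into a D3D11_USAGE_DEFAULT render-target texture and to copy it into a second, staging texture, which is the one that gets mapped; the render target itself never needs CPU access flags. A minimal sketch (hypothetical names, error handling omitted):

```cpp
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Copy the DEFAULT-usage render target into a staging texture and map *that*.
void ReadBackFrame(ID3D11Device* device, ID3D11DeviceContext* context,
                   ID3D11Texture2D* renderTarget)
{
    D3D11_TEXTURE2D_DESC desc;
    renderTarget->GetDesc(&desc);
    desc.Usage          = D3D11_USAGE_STAGING;
    desc.BindFlags      = 0;   // staging textures cannot be bound to the pipeline
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags      = 0;

    ComPtr<ID3D11Texture2D> staging;
    device->CreateTexture2D(&desc, nullptr, &staging);

    // GPU-side copy from the render target into the staging texture.
    context->CopyResource(staging.Get(), renderTarget);

    D3D11_MAPPED_SUBRESOURCE mapped;
    if (SUCCEEDED(context->Map(staging.Get(), 0, D3D11_MAP_READ, 0, &mapped))) {
        // mapped.pData / mapped.RowPitch now expose the pixels to the CPU.
        context->Unmap(staging.Get(), 0);
    }
}
```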
Has anyone ever succeeded with the task of rendering 3D content into an ID2D1HwndRenderTarget? Please share your experience. Thanks!

Reusing the previous back buffer on WM_PAINT

If the content of the last frame hasn't changed when WM_PAINT arrives, is it possible to simply direct the operating system to redraw the window using the old back buffer, instead of redrawing the whole scene into a new back buffer and swapping?
No. There is no such "back buffer". And when drawing occurs, you don't know which areas may be covered by other windows; the clipping area isn't a really good indicator.
The only thing you know is that such areas need to be redrawn. Each window cares about its own client area. If you want to buffer something, you have to do it on your own.
The reason is simple: imagine you have hundreds of windows. Holding a buffer for each window is inefficient when just a few on top are visible. So the Windows designers decided not to store any window's content and simply to notify the windows on top to redraw themselves.
OK, since we have the DWM (Desktop Window Manager), things have changed a lot. But the principle still holds: you are responsible for drawing. If you want to buffer something, you have to do it on your own.
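Buffering on your own might look like the following sketch: keep the last rendered frame in a memory DC backed by a compatible bitmap and, in WM_PAINT, blit it back instead of re-rendering. The names (g_cacheDC, g_sceneChanged, RenderScene) are hypothetical:

```cpp
// Hypothetical WM_PAINT handler that replays a cached frame. g_cacheDC is a
// memory DC with a CreateCompatibleBitmap-backed bitmap selected into it;
// g_sceneChanged is set whenever the scene actually needs re-rendering.
case WM_PAINT:
{
    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hwnd, &ps);

    if (g_sceneChanged) {
        RenderScene(g_cacheDC);          // expensive: redraw into the cache
        g_sceneChanged = false;
    }
    // Cheap: copy only the invalidated rectangle from the cache.
    BitBlt(hdc, ps.rcPaint.left, ps.rcPaint.top,
           ps.rcPaint.right - ps.rcPaint.left,
           ps.rcPaint.bottom - ps.rcPaint.top,
           g_cacheDC, ps.rcPaint.left, ps.rcPaint.top, SRCCOPY);

    EndPaint(hwnd, &ps);
    return 0;
}
```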

OpenGL context stops working when user moves the window into a different display

I'm using OpenGL to speed up GUI rendering. It works fine, but when the user drags the window onto a different display (potentially connected to a different GPU), it starts rendering a black interior. When the user moves the window back to the original display, it starts working again.
I have this report from Windows XP; I'm unfortunately unable to check on Windows 7/8 or Mac OS X right now.
Any ideas what to do about it?
In the current Windows and Linux driver models, OpenGL contexts are tied to a certain graphics scanout framebuffer. It's perfectly possible for the scanout framebuffer to span several connected displays and even GPUs, if the GPU architecture allows for it (for example, NVIDIA SLI and AMD CrossFire).
However, what does not work (with current driver architectures) is GPUs of different vendors (e.g. NVIDIA and Intel) sharing a scanout buffer, or, in the all-NVIDIA or all-AMD case, GPUs that have not been connected via SLI or CrossFire.
So if the monitors are connected to different graphics cards, this can happen.
This is, however, only a software design limitation. It's perfectly possible to separate graphics rendering from display scanout; this in fact forms the technical basis for hybrid graphics, where a fast GPU renders into scanout buffer memory managed by a different GPU (NVIDIA Optimus, for example).
The low-hanging fruit would be to recreate the context when the window passes over to a screen connected to a different GPU. But this has problems: if the window is split among screens, it will stay black on one of them. Recreating a context, together with re-uploading all the data, can also be a lengthy operation. And often, in situations like yours, the device on the other screen is incompatible with the feature set of the original context.
A workaround is to do all rendering into an off-screen framebuffer object (FBO), whose contents you then copy to CPU memory and from there to the target window using GDI operations. This method, however, has the huge drawback of a full memory round trip and increased latency.
The steps to set this up would be:
1. Identify the screen with the GPU you want to use.
2. Create a hidden window centered on that screen (i.e. do not pass WS_VISIBLE as a style to CreateWindow, and do not call ShowWindow on it).
3. Create an OpenGL context on this window; the pixel format doesn't have to be double-buffered, but double buffering usually gives better performance.
4. Create the target, user-visible window; do not bind an OpenGL context to this window.
5. Set up a framebuffer object (FBO) on the OpenGL context. The renderbuffer target of this FBO is created to match the client rect size of the target window; when the window gets resized, resize the FBO renderbuffer. Set up two renderbuffer objects for double-buffered operation.
6. Set up a pixel buffer object (PBO) that matches the dimensions of the renderbuffers; when the renderbuffers' size changes, resize the PBO as well.
7. Render to the FBO with OpenGL, then transfer the pixel contents to the PBO (glBindBuffer, glReadPixels).
8. Map the PBO into process memory using glMapBuffer and use SetDIBitsToDevice to transfer the data from the mapped memory region to the target window's device context; then unmap the PBO (see the sketch after this list).
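A sketch of the per-frame part (steps 7 and 8), assuming an extension loader such as GLEW, a current context on the hidden window, and a PBO already allocated with width * height * 4 bytes (names are hypothetical):

```cpp
#include <GL/glew.h>
#include <windows.h>

// Read one finished frame out of the FBO and present it with GDI.
void PresentFrame(HDC targetDC, GLuint fbo, GLuint pbo, int width, int height)
{
    // Step 7: GPU-side copy of the frame from the FBO into the PBO.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    // Step 8: map the PBO and hand the pixels to GDI.
    void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (pixels) {
        BITMAPINFO bmi = {};
        bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
        bmi.bmiHeader.biWidth       = width;
        bmi.bmiHeader.biHeight      = height;  // positive = bottom-up, matching glReadPixels
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;
        SetDIBitsToDevice(targetDC, 0, 0, width, height,
                          0, 0, 0, height, pixels, &bmi, DIB_RGB_COLORS);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
```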

Fastest method for blitting from a pixel buffer into a device context

Good evening,
I have several 32-bit images in memory buffers that I wish to "blit" to a device context, quickly. Speed is an issue here, because the buffers will be manipulated constantly and will need to be blitted to the DC repeatedly.
The color depth of the buffers is 32 bits, so they are already in the DIB format expected by SetDIBits(). However, this is rather cumbersome, since the target bitmap of SetDIBits() cannot already be selected into the DC during the operation. So I would need to constantly swap out the DC's bitmap, call SetDIBits(), swap the bitmap back into the DC, and then blit the DC to the window's DC. To me, that just seems like a lot of CPU workload and far too many round trips through the Windows API for optimal performance.
I would be interested in using DirectX if it didn't force me to use device contexts for 2D operations, or to upload textures to video memory before displaying them, because the contents of the image are constantly changing.
My question is simple (despite the long writeup). What would be the fastest way for me to blit an image from a pixel buffer in memory onto the screen? Direct access to the pixel buffer of a DC would be great, but I know that's not going to happen.
Thanks for reading my long writeup.
There is an API function, CreateDIBSection, that creates a DIB applications can write to directly. This allows continuously updating the bitmap (either by memcpy or by writing to it directly).
See the MSDN article for further details.
Access to the bitmap must be synchronized. Do this by calling the GdiFlush function.
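A minimal sketch of this approach, assuming a top-down 32-bpp source buffer (names are hypothetical):

```cpp
#include <windows.h>
#include <cstring>

// Create a DIB section whose pixels the application can write directly,
// select it into a memory DC, and blit it to the window.
void BlitPixelBuffer(HDC windowDC, const void* sourcePixels, int width, int height)
{
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = -height;  // negative = top-down row order
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    void* bits = nullptr;
    HBITMAP dib = CreateDIBSection(nullptr, &bmi, DIB_RGB_COLORS, &bits, nullptr, 0);
    HDC memDC = CreateCompatibleDC(windowDC);
    HGDIOBJ oldBmp = SelectObject(memDC, dib);

    // Write pixels straight into the DIB section's memory...
    std::memcpy(bits, sourcePixels, (size_t)width * height * 4);
    GdiFlush();  // ...and synchronize with GDI before blitting.
    BitBlt(windowDC, 0, 0, width, height, memDC, 0, 0, SRCCOPY);

    SelectObject(memDC, oldBmp);
    DeleteDC(memDC);
    DeleteObject(dib);
}
```

In practice the DIB section would be created once and reused across frames; it is recreated here only to keep the sketch self-contained.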
