How to access bitmap data of Direct2D Hardware RenderTarget? - direct2d

I'm using Direct2D for some simple accelerated image compositing/manipulation, and now need to get the pixel data from the RenderTarget to pass it to an encoder.
So far, I've managed this by rendering to a BitmapRenderTarget, then finally drawing the bitmap from that onto a WicBitmapRenderTarget which allows me to Lock an area and get a pointer to the pixels.
This only works if my initial RenderTarget uses D2D1_RENDER_TARGET_TYPE_SOFTWARE, because a hardware rendertarget's bitmap can't be 'shared' with the WicBitmapRenderTarget which only supports software rendering. The software rendering seems significantly slower than hardware.
Is there any way round this? Would I be better off using Direct3D instead?
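For the "Lock an area and get a pointer to the pixels" step, here is a minimal sketch of copying the locked rows into a tightly packed buffer for an encoder. It assumes the lock reports a row stride in bytes (as IWICBitmapLock::GetStride does) and that the encoder wants rows without padding; packRows and its parameters are hypothetical names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `rowBytes` useful bytes each
// out of a locked buffer whose rows start `stride` bytes apart, and returns
// a tightly packed copy (no padding between rows).
std::vector<std::uint8_t> packRows(const std::uint8_t* src,
                                   std::size_t stride,
                                   std::size_t rowBytes,
                                   std::size_t height) {
    assert(stride >= rowBytes);  // padding may follow a row, never overlap it
    std::vector<std::uint8_t> packed(rowBytes * height);
    if (packed.empty()) {
        return packed;  // nothing to copy
    }
    assert(src != nullptr);
    for (std::size_t y = 0; y < height; ++y) {
        // Copy only the useful part of each source row.
        std::memcpy(packed.data() + y * rowBytes, src + y * stride, rowBytes);
    }
    return packed;
}
```

If the stride already equals the row size, the loop degenerates into a plain copy, so the same helper covers both cases.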


How to render Direct3D content into ID2D1HwndRenderTarget?

In my app I render my Direct3D content to the window, as recommended, using the swap chain in flip sequential mode (IDXGISwapChain1, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL), in a window without a redirection bitmap (WS_EX_NOREDIRECTIONBITMAP).
However, for the purpose of smooth window resizing (without lags and flicker), I want to intercept live resize enter/exit events and temporarily switch the rendering to an ID2D1HwndRenderTarget on the window's redirection surface, which seems to offer the smoothest possible resize.
The question is - how to render the Direct3D content to ID2D1HwndRenderTarget?
The problem is that obviously the Direct3D rendering and ID2D1HwndRenderTarget belong to different devices and I don't seem to find a way to interop them together.
What ideas come to mind:
A. Somehow assign ID2D1HwndRenderTarget to be the output frame buffer for 3D rendering. This would be the best case scenario, because it would minimize buffer copying.
1. Somehow obtain an ID3D11Texture2D or at least an IDXGISurface from the ID2D1HwndRenderTarget
2. Create a render target view with ID3D11Device::CreateRenderTargetView from the DXGI surface / Direct3D texture
3. Set the render target with ID3D11DeviceContext::OMSetRenderTargets
4. Render directly to the ID2D1HwndRenderTarget
The problem is in step 1. ID2D1HwndRenderTarget and its ancestor ID2D1RenderTarget seem pretty scarce and don't seem to provide this capability. Is there a way to obtain a DXGI or D3D surface from the ID2D1HwndRenderTarget?
B. Create an off-screen Direct3D frame buffer texture, render the 3D content there and then copy it to the window render target.
1. Create an off-screen texture ID3D11Texture2D
2. Create an ID2D1Device using D2D1CreateDevice (from the same IDXGIDevice as my ID3D11Device)
3. Create an ID2D1DeviceContext using ID2D1Device::CreateDeviceContext
4. Create an ID2D1Bitmap using ID2D1DeviceContext::CreateBitmapFromDxgiSurface
5. Draw the bitmap using ID2D1HwndRenderTarget::DrawBitmap
Unfortunately, I get the error message "An operation failed because a device-dependent resource is associated with the wrong ID2D1Device (resource domain)". Obviously, the resource comes from a different ID2D1Device. But how can I draw a texture bitmap from one Direct2D device onto another?
C. Out of desperation, I tried to map the content of my frame buffer to CPU memory using IDXGISurface::Map. However, this is tricky because it requires the D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ flags when creating the texture, which then seems to make it impossible to use this texture as an output frame buffer (or am I missing something?). And generally this technique will most likely be very slow, because it involves syncing between CPU and GPU, and copying the whole texture at least two times.
Has anyone ever succeeded with the task of rendering 3D content to a ID2D1HwndRenderTarget? Please share your experience. Thanks!

Performance of Direct2D vs GDI+ when drawing bitmaps

Recently I have been investigating using Direct2D (D2D) instead of GDI+ to draw cached bitmaps. I figured that since D2D is hardware accelerated I would get much lower drawing times (the API also seemed fairly friendly). My prototyping in fact indicates that I should stick with GDI+ for now, as it is faster for this particular scenario.
In my prototype I have an MFC app which draws about 1000 icons to a canvas in the OnPaint function. The icons are not all different; there are probably fewer than 20 different types. From there I initialise both GDI+ and D2D from the GDI drawing context. The first access of each bitmap comes at a cost; after that they are cached. I store them as a map of CachedBitmaps and ID2D1Bitmaps.
My feeling is that at this level (drawing bitmaps) the slowness is caused by interop'ing D2D with GDI (I believe this results in copying the results of the D2D rendering back to the GDI context). I don't know at what level GDI+ is hardware accelerated, but I presume BitBlt is...
Can anyone shed any more light on my findings?

high performance video output with Qt

I'm writing a video player where my code decodes the video to raw YCbCr frames.
What would be the fastest way to output these through the Qt framework? I want to
avoid copying data around too much as the images are in HD format.
I am afraid that software color conversion into a QImage would be slow and that later the QImage will again be copied when drawing into the GUI.
I have had a look at QAbstractVideoSurface and even have running code,
but cannot grasp how this is faster, since, as in the VideoWidget example,
rendering is still done by calling QPainter::drawImage with a
QImage, which has to be in RGB.
The preferred solution seems to me to have access to a hardware surface directly
into which I could decode the YCbCr or at least directly do the RGB conversion (with libswscale) into.
But I cannot see how I could do this (without using OpenGL, which would give me
free scaling too, though).
One common solution is to use QGLWidget with texture mapping. The application allocates a texture buffer on the first frame, then updates that texture on each remaining frame. These are pure GL calls, as Qt does not support texture manipulation yet, but QGLWidget can be used as a container.
Decoding was done using SSE2. Hope this helps.
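For reference, the per-pixel math behind the YCbCr to RGB conversion discussed above can be sketched as follows. This is only an illustration under stated assumptions: 8-bit limited-range ("video range") samples, BT.709 coefficients as is usual for HD, and chroma planes already at full resolution (4:4:4). The names Rgb, ycbcrToRgb and convertFrame are hypothetical. The same arithmetic could be moved into a fragment shader when the frame is uploaded as a texture, which avoids the CPU cost entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Round to nearest and clamp into the 0..255 range of an 8-bit channel.
static std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::max(0L, std::min(255L, std::lround(v))));
}

// Assumption: limited-range BT.709 input (Y in 16..235, Cb/Cr in 16..240).
Rgb ycbcrToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const double luma = (y - 16) * (255.0 / 219.0);   // expand luma to 0..255
    const double pb = (cb - 128) * (255.0 / 224.0);   // centre and expand chroma
    const double pr = (cr - 128) * (255.0 / 224.0);
    return Rgb{clampToByte(luma + 1.5748 * pr),
               clampToByte(luma - 0.1873 * pb - 0.4681 * pr),
               clampToByte(luma + 1.8556 * pb)};
}

// Convert a whole frame whose three planes each hold `pixelCount` samples.
void convertFrame(const std::uint8_t* yPlane, const std::uint8_t* cbPlane,
                  const std::uint8_t* crPlane, std::size_t pixelCount, Rgb* out) {
    assert(yPlane && cbPlane && crPlane && out);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        out[i] = ycbcrToRgb(yPlane[i], cbPlane[i], crPlane[i]);
    }
}
```

Real decoders usually deliver subsampled chroma (for example 4:2:0), so a production path would also need to upsample the Cb and Cr planes, which this sketch leaves out.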

OpenGL Win32 texture not shown in DrawToBitmap (DIB)

I have a realtime OpenGL application rendering some objects with textures. I have built a function to make an internal screenshot of the rendered scene by rendering it to a DIB via PFD_DRAW_TO_BITMAP and copying it to an image. It works quite well except for one kind of texture. These are JPGs with 24bpp (so 8 bits for each of R, G, B). I can load them and they render correctly in realtime, but not when rendered to the DIB. For other textures it works well.
I have the same behaviour when testing my application on a virtual machine (WinXP, no hardware acceleration!). Here these specific textures are not even shown in realtime rendering. Without hardware acceleration I guess WinXP uses its own software implementation of OpenGL and falls back to OpenGL 1.1.
So are there any kinds of textures that can't be drawn without 3D hardware acceleration? Or is there a common pitfall?
PFD_DRAW_TO_BITMAP will always drop you into the fallback OpenGL-1.1 software rasterizer. So you should not use it. Create an off-screen FBO, render to that, retrieve the pixel data using glReadPixels and write it to a file using an image file I/O library.
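One detail worth keeping in mind after the glReadPixels step: OpenGL returns rows starting from the bottom of the image, while most image file formats store the top row first. A minimal sketch of the flip, assuming the pixels were read into a tightly packed buffer (for example with GL_PACK_ALIGNMENT set to 1); flipRowsInPlace is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse the row order of a tightly packed image in place, so that a
// bottom-up buffer (as filled by glReadPixels) becomes top-down.
void flipRowsInPlace(std::vector<std::uint8_t>& pixels,
                     std::size_t rowBytes, std::size_t height) {
    assert(pixels.size() == rowBytes * height);
    std::uint8_t* p = pixels.data();
    for (std::size_t y = 0; y < height / 2; ++y) {
        std::uint8_t* upper = p + y * rowBytes;
        std::uint8_t* lower = p + (height - 1 - y) * rowBytes;
        // Swap row y with its mirror row counted from the other end.
        std::swap_ranges(upper, upper + rowBytes, lower);
    }
}
```

Some image libraries can be told to write bottom-up data directly, in which case the flip is unnecessary.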

Pixel level manipulation windows

I've been using SDL to render graphics in C. I know there are several options to create graphics at the pixel level on Windows, including SDL and OpenGL. But how do these programs do it? Fine, I can use SDL. But I'd like to know what SDL is doing so I don't feel like an ignorant fool. Am I the only one slightly frustrated by the opaque layer of frosting on modern computers?
A short explanation as to how this is done on other operating systems would also be interesting, but I am most concerned with Windows.
Edit: Since this question seems to be somehow unclear, this is precisely what I want:
I would like to know how pixel-level graphics manipulation (drawing on the screen pixel by pixel) works on Windows. What do libraries like SDL do with the operating system to allow this to happen? I can manipulate the screen pixel by pixel using SDL, so what magic happens in SDL to let me do this?
Windows has many graphics APIs. Some are layers built on top of others (e.g., GDI+ on top of GDI), and others are completely independent stacks (like the Direct3D family).
In an API like GDI, there are functions like SetPixel which let you change the value of a single pixel on the screen (or within a region of the screen that you have access to). But using SetPixel to set lots of pixels is generally slow.
If you were to build a photorealistic renderer, like a ray tracer, then you'd probably build up a bitmap in memory (pixel by pixel), and use an API like BitBlt that sends the entire bitmap to the screen at once. This is much faster.
But it still may not be fast enough for rendering something like video. Moving all that data from system memory to the video card memory takes time. For video, it's common to use a graphics stack that's closer to the low-level graphics drivers and hardware. If the graphics card can do the video decompression directly, then sending the compressed video stream to the card will be much more efficient than sending the decompressed data from system memory to the video card--and that's often the limiting factor.
But conceptually, it's the same thing: you're manipulating a bitmap (or texture or surface or raster or ...), but that bitmap lives in graphics memory, and you're issuing commands to the GPU to set the pixels the way you want, and then to display that bitmap at some portion of the screen (often with some sort of transformation).
Modern graphics processors actually run little programs--called shaders--that can (among other things) do calculations to determine the pixel values. The GPUs are optimized to do these types of calculations and can do many of them in parallel. But ultimately, it boils down to getting the pixel values into some sort of bitmap in video memory.
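The "build up a bitmap in memory, pixel by pixel, then send it all at once" idea described above can be sketched in portable C++. This is an illustration only: the Bitmap class, renderGradient and the 32-bit 0xAARRGGBB layout are assumptions made for the example, and the final hand-off to the screen (a call such as BitBlt on Windows) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A plain in-memory bitmap: one 32-bit 0xAARRGGBB value per pixel, row-major.
class Bitmap {
public:
    Bitmap(std::size_t width, std::size_t height)
        : width_(width), height_(height), pixels_(width * height, 0xFF000000u) {}

    void setPixel(std::size_t x, std::size_t y, std::uint32_t argb) {
        assert(x < width_ && y < height_);
        pixels_[y * width_ + x] = argb;  // just a memory write, no API call
    }
    std::uint32_t getPixel(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return pixels_[y * width_ + x];
    }
    // Contiguous buffer that a blit function would be given in one call.
    const std::uint32_t* data() const { return pixels_.data(); }
    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_, height_;
    std::vector<std::uint32_t> pixels_;
};

// Fill every pixel individually: red grows left to right, green top to bottom.
Bitmap renderGradient(std::size_t width, std::size_t height) {
    Bitmap bmp(width, height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint32_t red =
                width > 1 ? static_cast<std::uint32_t>(x * 255 / (width - 1)) : 0;
            const std::uint32_t green =
                height > 1 ? static_cast<std::uint32_t>(y * 255 / (height - 1)) : 0;
            bmp.setPixel(x, y, 0xFF000000u | (red << 16) | (green << 8));
        }
    }
    return bmp;
}
```

The per-pixel work touches only ordinary memory, which is why it is cheap compared with one API call per pixel; the cost of moving the finished buffer to the display is paid once per frame.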
