Why does a Direct3D application perform better in full screen mode?

The performance of a Direct3D application seems to be significantly better in full screen mode compared to windowed mode. What are the technical reasons behind this?
I guess it has something to do with the fact that a full screen application can gain exclusive control of the display. But why can't an application gain exclusive control of part of the screen (i.e. a window) and get the same performance benefits?

Here are the cliff notes on how things work underneath.
The monitor always needs to be associated with a so-called primary surface in order to display anything; that is, the video card can only scan out of one surface in video memory.
When an application is fullscreen (and everything was set up correctly to enable flipping), the primary surface is just one of the application's back buffers, and it is flipped to another back buffer every frame. This is the most efficient way of presenting to the screen, but it requires the application to own the entire monitor area (i.e. the entire primary surface).
When there's no fullscreen application and DWM is off, the primary surface is owned by the OS, and every windowed application performs a blit from its back buffer to the primary surface. This blit takes some GPU time to complete (as do the blits from the other applications visible on the screen), so it's not as efficient as fullscreen presentation. XP worked that way.
When DWM is composing the screen, things get even more complicated.
Here, DWM owns the primary surface and needs to draw application windows onto it. To make that possible, every window has an associated surface holding its contents, called the redirection surface (which is what lets DWM do window ghosting, glass effects, and all that good stuff). Every time a D3D application presents a frame, it adds a blit to its redirection surface.
So several blits need to happen: a blit to the redirection surface by the app, then a blit from the redirection surface to the primary by DWM, which is, again, some overhead compared to fullscreen.
Note that all of this additional work happens on the GPU, so it doesn't affect CPU performance.
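To make this concrete, here is a rough D3D9-style sketch of how the two paths are requested. The window handle, the IDirect3D9 object, and the mode values are placeholders for whatever your app actually uses:

    #include <d3d9.h>

    // Sketch: the same back buffers, requested either as exclusive fullscreen
    // (flip-capable: a back buffer can become the primary surface) or as
    // windowed (every Present() ends up as a blit to the shared desktop /
    // redirection surface). d3d9 and hWnd are assumed to exist already.
    IDirect3DDevice9* CreateDeviceSketch(IDirect3D9* d3d9, HWND hWnd, bool fullscreen)
    {
        D3DPRESENT_PARAMETERS pp = {};
        pp.BackBufferWidth      = 1920;                  // fullscreen: must match a supported mode
        pp.BackBufferHeight     = 1080;
        pp.BackBufferFormat     = D3DFMT_X8R8G8B8;
        pp.BackBufferCount      = 2;
        pp.SwapEffect           = D3DSWAPEFFECT_DISCARD; // lets the driver flip when it owns the primary
        pp.hDeviceWindow        = hWnd;
        pp.Windowed             = fullscreen ? FALSE : TRUE;
        pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;

        IDirect3DDevice9* device = nullptr;
        d3d9->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                           D3DCREATE_HARDWARE_VERTEXPROCESSING, &pp, &device);
        return device;
    }

From the app's point of view the only difference is the Windowed flag; the flip vs. blit behaviour described above is what the runtime, driver, and DWM do behind it.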
Stuff to read further:
http://blogs.msdn.com/greg_schechter/archive/2006/03/19/555087.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/03/05/544314.aspx

There's a bit on MSDN that says full screen mode uses buffer flipping, if set up correctly, as opposed to blitting. It makes sense.
Of course you can (and in a way, do) give exclusive control of part of the screen to an application, but what happens to the rest of the screen? You still have to blit, do occlusion checking, etc. for the rest of the windows, and I think that's what causes the performance hit.

I'll add to #aib's answer that the rest of the screen is being managed by the OS. So, if anything else needs to be drawn/worked upon simultaneously, there has to be a performance hit.
For example, if you have a video playing in Windows Media Player in one window and then start Civilization in another, when Civ starts doing its fancy graphics it will need to share screen space with everything else (like the video).
Whereas if the DirectX app has the full screen, everything else might still be "updating" or "playing", but it isn't being drawn.

Basically, the video hardware is completely dedicated to the exclusive mode application.
There is no contention for video resources (pipeline, texture memory, etc...)
In particular, texture uploads can be a big bottleneck. The fewer you have to do (because you have all the video memory to yourself), the better.

Related

Why is the mouse cursor drawn faster than applications?

One of the things that I've noticed (at least on Windows anyway), is that the mouse cursor is drawn with much less latency than even standard Windows elements.
A good example of this would be to start dragging on the desktop. You can easily notice that the drag rectangle is lagging significantly behind the cursor.
My first question is: why is this the case?
I can't imagine drawing a rectangle being so much more expensive than drawing the cursor. Certainly not by a frame or two.
And my second question is, would it ever be possible to match one's application rendering 1:1 with cursor input?
A good use case for this would be either this selection rectangle, or drag previews for draggable items. Both of which lag behind quite significantly from the OS mouse pointer (independent of any framework or library used).
Selecting icons on the desktop with the selection rectangle is not that slow on my system (DWM on), it is lagging a little bit but not enough for me to really care.
The "Show Window Contents while Dragging" option has always been rather slow which is why it was not on by default in older Windows versions.
The mouse cursor, on the other hand, can be rendered directly by your hardware. That is, Windows sends the cursor image to your graphics card, and after that Windows only has to tell the graphics card the cursor position, which is much faster than all the messages and user/kernel context switches involved when you resize and repaint a window. The mouse driver probably also uses hardware interrupts/timers with a higher priority than your normal software.
You can try to disable hardware cursors with a registry hack, but the HID/mouse driver and the raw input thread in win32k will still have a higher priority than your application.
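As an aside, D3D9 exposes that same hardware cursor to applications; a minimal sketch (the device is assumed to exist, and filling in the actual cursor pixels is omitted):

    #include <d3d9.h>

    // Sketch: hand a 32x32 A8R8G8B8 image to the hardware cursor. After this,
    // the card draws and moves the cursor on its own; the app only reports
    // positions.
    void UseHardwareCursor(IDirect3DDevice9* device)
    {
        IDirect3DSurface9* cursorImage = nullptr;
        device->CreateOffscreenPlainSurface(32, 32, D3DFMT_A8R8G8B8,
                                            D3DPOOL_SYSTEMMEM, &cursorImage, nullptr);
        // ... LockRect() and write the cursor pixels here ...

        device->SetCursorProperties(0, 0, cursorImage); // hot spot at (0,0); the image is copied
        device->ShowCursor(TRUE);
        cursorImage->Release();

        // Later, e.g. in WM_MOUSEMOVE:
        // device->SetCursorPosition(x, y, D3DCURSOR_IMMEDIATE_UPDATE);
    }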

Get pixel color in a Metro app (Windows 8)

I want to get the color of the pixel at a touch point and return it as a string (e.g. #FFADD8E6).
I wonder whether the Windows APIs support this in a Metro app (Windows 8). Can anyone answer this or help me find a solution? Thanks.
In general it isn't easy to do this. Assuming this is a XAML app (although the same logic applies to a WWA or DirectX app), you have a stack of rendering going on. The XAML objects are turned into textures inside the runtime, and the hardware composites those textures together, potentially combining them with other applications (including components from the protected media pipeline) into the image that appears on screen. This image, which is what the user actually sees, only exists in the frame buffer of the GPU, so there really isn't anywhere for the CPU, and therefore your app, to read it from. While it would in principle be possible to read it, it would almost certainly involve stalling the whole system-wide rendering pipeline and then copying the entire frame buffer into system memory. That would be very slow.
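To give an idea of the cost even where readback is allowed (for example a desktop D3D11 app reading back its own render target, which is not a Metro-sandboxed scenario), here is a rough sketch of the copy-to-staging-then-map dance; device, context, and a non-multisampled renderTarget are assumed to exist:

    #include <d3d11.h>

    // Sketch: reading a single rendered pixel back to the CPU. The Map() call
    // stalls until the GPU has finished producing and copying the frame.
    UINT32 ReadPixel(ID3D11Device* device, ID3D11DeviceContext* context,
                     ID3D11Texture2D* renderTarget, UINT x, UINT y)
    {
        D3D11_TEXTURE2D_DESC desc = {};
        renderTarget->GetDesc(&desc);
        desc.Usage          = D3D11_USAGE_STAGING;     // CPU-readable copy
        desc.BindFlags      = 0;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
        desc.MiscFlags      = 0;

        ID3D11Texture2D* staging = nullptr;
        device->CreateTexture2D(&desc, nullptr, &staging);
        context->CopyResource(staging, renderTarget);  // GPU-side copy to the staging texture

        D3D11_MAPPED_SUBRESOURCE mapped = {};
        context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped); // waits for the GPU

        const BYTE*  row   = static_cast<const BYTE*>(mapped.pData) + y * mapped.RowPitch;
        const UINT32 pixel = reinterpret_cast<const UINT32*>(row)[x]; // e.g. B8G8R8A8

        context->Unmap(staging, 0);
        staging->Release();
        return pixel;
    }

And even this only sees your own app's pixels, not the composed desktop image described above.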

What happens during a display mode change?

What happens during a display mode change (resolution, color depth) on an ordinary computer (typical desktop or laptop)?
It might not be so trivial since video cards are so different, but one thing is common to all of them:
The screen goes black (understandable since the signal is turned off)
It takes many seconds for the signal to return with the new mode
and if it is under D3D or GL:
The graphics device is lost and all VRAM objects must be reloaded, making the mode change take even longer
Can someone explain the underlying nature of this, and specifically why a display mode change is not a trivial reallocation of the backbuffer(s) and takes such a "long" time?
The only thing that actually changes are the settings of the so-called RAMDAC (a digital-to-analog converter directly attached to the video RAM); today, with digital connections, it's more like a RAMTX (a DVI/HDMI/DisplayPort transmitter attached to the video RAM). DOS graphics programming veterans probably remember the fights between the RAMDAC, the specification, and the woes of one's own code.
It actually doesn't take seconds until the signal returns. This is a rather quick process, but most display devices take their time to synchronize with the new signal parameters. Actually with well written drivers the change happens almost immediately, between vertical blanks. A few years ago, when the displays were, errr, stupider and analogue, after changing the video mode settings, one could see the picture going berserk for a short moment, until the display resynchronized (maybe I should take a video of this, while I still own equipment capable of this).
Since what is actually going on is just a change of RAMDAC settings, no data is necessarily lost as long as the basic parameters stay the same: number of bits per pixel, number of components per pixel, and pixel stride. And in fact OpenGL contexts usually don't lose their data with a video mode change. Of course the visible framebuffer layout changes, but that also happens when moving the window around.
DirectX Graphics is a bit of a different story, though. There is device-exclusive access, and whenever you switch between Direct3D fullscreen mode and the regular desktop all graphics objects are swapped out, which is why DirectX Graphics is so laggy when switching between a game and the Windows desktop.
If the pixel data format changes, it usually requires a full reinitialization of the visible framebuffer, but today's GPUs are exceptionally good at mapping heterogeneous pixel formats into a target framebuffer, so no delays are necessary there either.
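For reference, on Windows the whole process is kicked off by nothing more than a small mode description handed to the OS; a minimal sketch using the classic GDI call (the width/height/depth values are placeholders):

    #include <windows.h>

    // Sketch: ask Windows to change the display mode. The driver then
    // reprograms the scan-out (RAMDAC/transmitter) parameters as described above.
    bool SwitchMode(int width, int height, int bitsPerPixel)
    {
        DEVMODE dm = {};
        dm.dmSize       = sizeof(dm);
        dm.dmPelsWidth  = width;
        dm.dmPelsHeight = height;
        dm.dmBitsPerPel = bitsPerPixel;
        dm.dmFields     = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;

        // CDS_FULLSCREEN marks the change as temporary (restored when the
        // app exits), the way games typically request it.
        return ChangeDisplaySettings(&dm, CDS_FULLSCREEN) == DISP_CHANGE_SUCCESSFUL;
    }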

A skinning engine in Windows: draw “dirty” regions only or the whole window at once?

I want to make a skinning engine capable of drawing custom-shaped windows with alpha blending. That is, it'll use layered windows (UpdateLayeredWindow). A typical window will contain, on top of its background, a couple dozen other bitmaps ranging from 10×10 to, say, 300×150 pixels. In the worst case most of these elements will have smooth animation at up to 30 fps. Everything will be alpha-blended, and I am going to use Direct2D for this (yes, I know older Windows versions don't support it). In general, Winamp's modern skin engine is the closest example.
Given all this, and taking into account modern PC performance, can I just redraw the whole window every single frame, or do I have to constrain drawing to some sort of clip rectangle?
D2D requires you to render in response to WM_PAINT messages.
Honestly, use the IAnimation interface and just let D2D and Windows worry about how often to redraw. Though I will let you know: Winamp is done with Adobe AIR, and layered windows with D2D cause issues. (I think you have to use a DXGI render target, but with the window being layered it needs a DC to be returned to an EndPaint call so it can update its alpha channel.)
I have some experience with this.
If you need to support Windows XP, using UpdateLayeredWindow is the only choice available for solving this problem. The documentation for this call says it copies the whole bitmap to the screen each time it is called, and this bottleneck showed up in my benchmarking as the real limiting factor. If your window is 300x300, you pay that price on every update, even if you are careful to modify only a couple of pixels. It would be very easy to over-optimize the rendering side for no real benefit, so implement something simple, measure, and then decide if you need to optimize.
If you can drop support for Windows XP then you can avoid UpdateLayeredWindow completely and use DwmExtendFrameIntoClientArea to create the same effect as a layered window. You'll write less code, avoid the UpdateLayeredWindow bottleneck, and D2D will be easier to work with.
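For reference, the UpdateLayeredWindow path boils down to something like this (a sketch; the WS_EX_LAYERED window and the 32-bit premultiplied-alpha DIB already selected into hdcMem are assumed to exist). Note that the whole bitmap is copied on every call, which is exactly the bottleneck mentioned above:

    #include <windows.h>

    // Sketch: push one frame of a premultiplied-alpha bitmap to a layered window.
    void PushFrame(HWND hwnd, HDC hdcMem, int width, int height)
    {
        HDC hdcScreen = GetDC(nullptr);

        POINT srcPos = { 0, 0 };
        SIZE  size   = { width, height };
        BLENDFUNCTION blend = {};
        blend.BlendOp             = AC_SRC_OVER;
        blend.SourceConstantAlpha = 255;          // use per-pixel alpha only
        blend.AlphaFormat         = AC_SRC_ALPHA; // bitmap carries premultiplied alpha

        // Copies the entire bitmap to the screen, even if only a few pixels changed.
        UpdateLayeredWindow(hwnd, hdcScreen, nullptr, &size,
                            hdcMem, &srcPos, 0, &blend, ULW_ALPHA);

        ReleaseDC(nullptr, hdcScreen);
    }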

Fast screen capture and lost Vsync

I'd like to generate a movie in real time with a self-made application doing fast screen captures, with part of the screen occupied by a running 3D application.
I'm aware that several applications already exist for this (like FRAPS or Taksi), and even dedicated DirectShow filters (like UScreenCapture), but I really need to do this with my own external application.
When correctly set up (UScreenCapture + ffdshow), capturing and compressing the full screen does not consume as much CPU as you would expect (about 15%) and does not impair the performance of the 3D app.
The problem with doing the capture from an external application is that the 3D application loses its Vsync and becomes choppy and difficult to use (the 3D app is only presented on a small part of the screen, the rest being GDI and DirectX).
FRAPS solves this problem by allowing you to capture only one application at a time (the one with focus). Depending on the technology used (OpenGL, DirectX, GDI), it hooks the Vsync and does its capture (with glReadPixels, ...) without disturbing it.
Doing this does not solve my problem, since I want the full composed screen image (including 3D and the rest) AND a smooth 3D app.
UScreenCapture seems to use a fast DirectX call to capture the whole screen, but the OpenGL 3D app is still out of sync.
Doing a BitBlt is too slow and too CPU-consuming for real-time 30 fps acquisition (at least under Windows XP; not sure about 7).
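(For reference, by BitBlt capture I mean roughly the following, minus error handling and the actual encoding:)

    #include <windows.h>

    // Sketch: copy the visible desktop into a memory bitmap once per frame.
    void CaptureDesktopOnce(int width, int height)
    {
        HDC     hdcScreen = GetDC(nullptr);              // DC for the whole screen
        HDC     hdcMem    = CreateCompatibleDC(hdcScreen);
        HBITMAP hbmp      = CreateCompatibleBitmap(hdcScreen, width, height);
        HGDIOBJ old       = SelectObject(hdcMem, hbmp);

        // This is the slow, CPU-heavy copy mentioned above.
        BitBlt(hdcMem, 0, 0, width, height, hdcScreen, 0, 0, SRCCOPY);

        // ... hand hbmp to the encoder here ...

        SelectObject(hdcMem, old);
        DeleteObject(hbmp);
        DeleteDC(hdcMem);
        ReleaseDC(nullptr, hdcScreen);
    }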
My question is whether there is a way to achieve my goal with Windows 7 and its brand-new DirectX compositing engine.
Windows 7 manages to show live, VSynced, duplicated previews of every app (in the taskbar), so there must be a way to access the currently displayed screen buffer without disturbing the rendering of the 3D OpenGL app?
Any other suggestions or technologies?
Thank you.
I made a list of possibly useful links at
http://betterlogic.com/roger/?p=3037
Let me know if you have any success -- eventually I would also be interested in a fast open-source screen capture for Windows...
related: Fastest method of screen capturing
