Fast screen capture and lost VSync - Windows 7

I'd like to generate a movie in real time with a self-made application doing fast screen captures with part of the screen occupied by a running 3D application.
I'm aware that several applications already exist for this (like FRAPS or Taksi), and even dedicated DirectShow filters (like UScreenCapture), but I really need to do this with my own external application.
When correctly set up (UScreenCapture + ffdshow), capturing and compressing a full screen does not consume as much CPU as you would expect (about 15%), and does not impair the performance of the 3D app.
The problem with doing the capture from an external application is that the 3D application loses its VSync and becomes choppy and difficult to use (the 3D app is only presented on a small part of the screen, the rest being GDI and DirectX).
FRAPS solves this problem by allowing you to capture only one application at a time (the one with focus). Depending on the technology used (OpenGL, DirectX, GDI), it hooks the VSync and does its capture (with glReadPixels, ...) without perturbing it.
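For reference, the in-process capture such a hook performs boils down to something like the following. This is a minimal sketch, assuming it runs inside the GL application with a current context (e.g. from a hooked SwapBuffers); the width/height parameters are placeholders for the window's client size:

    // Minimal sketch of a FRAPS-style in-process OpenGL capture.
    // Assumes a current GL context; meant to run right before the
    // buffer swap. 'width'/'height' are assumed, not from the post.
    #include <windows.h>
    #include <GL/gl.h>
    #include <vector>
    #pragma comment(lib, "opengl32.lib")

    std::vector<unsigned char> CaptureBackBuffer(int width, int height)
    {
        std::vector<unsigned char> pixels(width * height * 4);
        glPixelStorei(GL_PACK_ALIGNMENT, 1);   // tightly packed rows
        glReadBuffer(GL_BACK);                 // read before the flip
        glReadPixels(0, 0, width, height,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
        return pixels;                         // bottom-up RGBA image
    }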
Doing this does not solve my problem, since I want the full composed screen image (including 3D and the rest) AND a smooth 3D app.
UScreenCapture seems to use a fast DirectX call to capture the whole screen, but the OpenGL 3D app still loses its sync.
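The usual candidate for such a "fast DirectX call" on XP/Vista/7 is IDirect3DDevice9::GetFrontBufferData; whether UScreenCapture actually uses it is an assumption on my part. A rough sketch, assuming a single 1920x1080 monitor and omitting error handling:

    // Sketch: capture the composed desktop with Direct3D 9.
    // A real implementation should query the display mode instead
    // of hard-coding the size, and check every HRESULT.
    #include <windows.h>
    #include <d3d9.h>
    #pragma comment(lib, "d3d9.lib")

    void CaptureDesktopD3D9()
    {
        IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);

        D3DPRESENT_PARAMETERS pp = {};
        pp.Windowed = TRUE;
        pp.SwapEffect = D3DSWAPEFFECT_DISCARD;
        pp.BackBufferFormat = D3DFMT_UNKNOWN;
        pp.hDeviceWindow = GetDesktopWindow();

        IDirect3DDevice9* dev = nullptr;
        d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                          GetDesktopWindow(),
                          D3DCREATE_SOFTWARE_VERTEXPROCESSING, &pp, &dev);

        // GetFrontBufferData requires a lockable system-memory
        // surface in D3DFMT_A8R8G8B8.
        IDirect3DSurface9* surf = nullptr;
        dev->CreateOffscreenPlainSurface(1920, 1080, D3DFMT_A8R8G8B8,
                                         D3DPOOL_SYSTEMMEM, &surf, nullptr);

        dev->GetFrontBufferData(0, surf);   // copies the visible screen

        D3DLOCKED_RECT lr;
        surf->LockRect(&lr, nullptr, D3DLOCK_READONLY);
        // lr.pBits now points at the ARGB pixels; hand them to the encoder.
        surf->UnlockRect();

        surf->Release();
        dev->Release();
        d3d->Release();
    }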
Doing a BitBlt is too slow and too CPU-consuming for real-time 30 fps acquisition (at least under Windows XP; not sure about 7).
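For comparison, the GDI path in question looks roughly like this; a minimal sketch, where the cost is the BitBlt itself plus pulling the bits back into system memory:

    // Sketch of the classic GDI screen capture the question calls
    // too slow for 30 fps.
    #include <windows.h>
    #include <vector>

    std::vector<BYTE> CaptureScreenGDI(int width, int height)
    {
        HDC screen  = GetDC(nullptr);                 // whole-desktop DC
        HDC mem     = CreateCompatibleDC(screen);
        HBITMAP bmp = CreateCompatibleBitmap(screen, width, height);
        HGDIOBJ old = SelectObject(mem, bmp);

        BitBlt(mem, 0, 0, width, height, screen, 0, 0, SRCCOPY);

        BITMAPINFO bi = {};
        bi.bmiHeader.biSize        = sizeof(bi.bmiHeader);
        bi.bmiHeader.biWidth       = width;
        bi.bmiHeader.biHeight      = -height;         // top-down rows
        bi.bmiHeader.biPlanes      = 1;
        bi.bmiHeader.biBitCount    = 32;
        bi.bmiHeader.biCompression = BI_RGB;

        std::vector<BYTE> pixels(width * height * 4);
        GetDIBits(mem, bmp, 0, height, pixels.data(), &bi, DIB_RGB_COLORS);

        SelectObject(mem, old);
        DeleteObject(bmp);
        DeleteDC(mem);
        ReleaseDC(nullptr, screen);
        return pixels;
    }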
My question is whether there is a way to achieve my goal with Windows 7 and its brand-new DirectX compositing engine.
Windows 7 manages to show live, VSynced, duplicated previews of every app (in the taskbar), so there must be a way to access the currently displayed screen buffer without perturbing the rendering of the 3D OpenGL app?
Any other suggestions or technologies?
Thank you.

I made a list of possibly useful links at
http://betterlogic.com/roger/?p=3037
Let me know if you have any success. Eventually I would also be interested in a fast open-source screen capture for Windows...
related: Fastest method of screen capturing

Related

Get pixel color in a Metro app (Windows 8)

I want to get the color of the pixel at a touch point and return it as a string (e.g. #FFADD8E6).
I wonder whether any Windows APIs support this in a Metro app (Windows 8). Can anyone answer this or help me find a solution? Thanks.
In general it isn't easy to do this. Assuming this is a XAML app (although the same logic applies to a WWA or DirectX app), you have a stack of rendering going on. The XAML objects are turned into textures inside the runtime, which get composited together by the hardware, along with potentially being combined with other applications, including components from the protected media pipeline, into the image that appears on screen.

This image, which is what the user sees, only exists in the frame buffer of the GPU, so there really isn't anywhere for the CPU, and therefore your app, to read it from. While it would be possible to read it, it would almost certainly involve stalling the whole system-wide rendering pipeline and then copying the whole frame buffer into system memory. That would be very slow.

Changing how Windows displays using the Win API?

While I have some experience with the WinAPI, I do not have a ton, so I have a question for people who do. My question concerns what the limits of our power are: can we change how Windows fundamentally displays?
For example, can I cause Windows to render a screen size bigger than the display and pan across it, kind of like workspaces but without separation? Can I apply distortion to the top and bottom of the screen? If distortion is not possible, can I have an application mirror what Windows is displaying with very little delay?
The biggest question I have is the first one, because if I can make Windows render virtual workspaces and pan seamlessly between them, then I figure it is possible to make a separate application which handles the distortion on a mirrored image of the desktop. Again, apologies for the vague questions, but I really want to know whether we are able to do this stuff, at least in theory, before I dive deep into learning more of the API. If the WinAPI does not allow it, is there another way to do this kind of stuff in Windows?
EDIT: Some clarification. What I want to do is basically extend the desktop to a very large size (not sure of the exact size yet), both vertically and horizontally, and section the large desktop into workspaces of a specific size which can be seamlessly transitioned across and have windows moved across them. It would transition workspaces based on a head-tracking device and/or mouse movement. Note that when I say workspaces, this could also be achieved by zooming in and then panning the zoom. I also need to be able to distort the screen, such as curving the edges, and render the screen twice. That is the bare minimum of what I am wanting to do.
Yes, you can. The most feasible way I can come up with is using a virtual graphics driver (like what Windows Remote Desktop does, which creates a virtual graphics card and a virtual display). Sadly, you will lose the ability to run some programs that need advanced graphics APIs (such as 3D games or 3D modelling tools).
Here are some examples:
http://virtualmonitor.github.io/
https://superuser.com/questions/62051/is-there-a-way-to-fake-a-dual-second-monitor
And remember that Windows has a limit on display resolution (both per display and for all displays together). I don't remember the exact number, but it should be less than 32768×32768.
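If you just need the current bounds rather than the theoretical maximum, the virtual desktop can be queried at runtime; a small sketch:

    // Query the bounding box of the combined (virtual) desktop,
    // i.e. the union of all attached monitors.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        int x = GetSystemMetrics(SM_XVIRTUALSCREEN);   // left edge
        int y = GetSystemMetrics(SM_YVIRTUALSCREEN);   // top edge
        int w = GetSystemMetrics(SM_CXVIRTUALSCREEN);  // total width
        int h = GetSystemMetrics(SM_CYVIRTUALSCREEN);  // total height
        std::printf("virtual desktop: %dx%d at (%d,%d)\n", w, h, x, y);
        return 0;
    }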

OpenGL maximum of 32 on-screen windows with Vista/7

Before I reduce this to a reasonable example I was hoping someone might have run into this before and can shed some light on the problem.
I have a 32-bit C-based application which uses one OpenGL context per window; all the contexts and windows are set up identically. The requested pixel format is 32-bit color, alpha, depth buffer, accelerated. Everything works flawlessly on Windows 2000 and XP.
Everything works flawlessly on Vista and 7 until the 33rd window/context pair is created. Creating the window has no errors, creating the context has no errors, making the context current has no errors, drawing produces no errors, SwapBuffers does not generate an error. However, the OpenGL contexts fail to produce any output, with Aero the windows are white, with the classic mode they don't draw and are just screen garbage. Killing the DWM doesn't fix the problem, trying different pixel formats (single buffer, diff. depths, etc) and PFD_SUPPORT_COMPOSITION doesn't fix the problem. This is on a number of different machines with Vista/7, never XP.
I can glReadPixels the back buffer and they are the correct pixels. Rendering into a pbuffer with the same context works fine, rendering into >32 pbuffers is fine.
If I free working on-screen context/window pairs, the non-working windows start working again. It's as if Vista/7 simply stops displaying OpenGL rendering after 32 windows are on screen doing this.
If the pixel format descriptor includes PFD_SUPPORT_GDI everything is OK, but it's using the software renderer which is unacceptable.
I am wondering if this is an OS limitation or driver limitation in Vista/7. Thanks for any insight.
The limit is implementation-specific, and all you can do is to run some tests on common hardware.
I ran some tests myself, and it turns out that the limit is pretty high for GeForce cards (maybe even no limit). For a desktop Quadro there was a limit of 128 contexts that were able to repaint correctly; the program was able to create 128 more contexts with no errors, but those windows contained rubbish. I'm not using PFD_SUPPORT_GDI.
It was even more interesting on an ATI Radeon 6950: there, the redrawing stopped at window #105, and creating rendering context #200 failed.
If you want to try it for yourself, the program can be found here: Max OpenGL Contexts test (there is full source code + Win32 binaries). Maybe you can look at the code and track down the culprit; I would be very interested to hear about it.
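In case the link goes stale, the heart of such a test is small. A minimal sketch (error handling and the message pump stripped; contexts are deliberately leaked, since the point is to accumulate them):

    // Context-limit stress test: create N windows, give each its own
    // OpenGL context, and clear each to a solid color. A window that
    // shows black/garbage despite all calls succeeding has hit the limit.
    #include <windows.h>
    #include <GL/gl.h>
    #pragma comment(lib, "opengl32.lib")

    int main()
    {
        WNDCLASS wc = {};
        wc.lpfnWndProc   = DefWindowProc;
        wc.hInstance     = GetModuleHandle(nullptr);
        wc.lpszClassName = TEXT("ctxtest");
        RegisterClass(&wc);

        const int N = 40;                        // go past the reported 32
        for (int i = 0; i < N; ++i)
        {
            HWND wnd = CreateWindow(TEXT("ctxtest"), TEXT("ctx"),
                                    WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                                    20 * i, 20 * i, 100, 100,
                                    nullptr, nullptr, wc.hInstance, nullptr);
            HDC dc = GetDC(wnd);

            PIXELFORMATDESCRIPTOR pfd = {};
            pfd.nSize      = sizeof(pfd);
            pfd.nVersion   = 1;
            pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL
                           | PFD_DOUBLEBUFFER;
            pfd.iPixelType = PFD_TYPE_RGBA;
            pfd.cColorBits = 32;
            pfd.cDepthBits = 24;
            SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);

            HGLRC rc = wglCreateContext(dc);     // one context per window
            wglMakeCurrent(dc, rc);
            glClearColor(0.0f, 0.5f, 1.0f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            SwapBuffers(dc);                     // should show solid blue
        }
        MessageBox(nullptr, TEXT("Check which windows rendered."),
                   TEXT("ctxtest"), MB_OK);
        return 0;
    }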
Those are my results. One piece of advice: avoid using multiple contexts where possible. Multiple contexts are understandable in an application running on multiple monitors, but an application on a single monitor should stick to a single context. Context switching is slow. And that's not all: applications where OpenGL windows are overlapped by other windows require hardware clipping regions. There is one hardware clipping region on GeForce and eight or more on Quadro (CAD applications often use windows and menus that overlap the OpenGL window, in contrast with games). When more regions are needed, rendering falls back to software, so again, having lots of OpenGL windows (contexts) is not a very good idea.
Note this is rather similar to Is there a limit to how many OpenGL rendering contexts you can create simultaneously?

A skinning engine in Windows: draw “dirty” regions only or the whole window at once?

I want to make a skinning engine capable of drawing custom-shaped windows with alpha blending; that is, it'll use layered windows (UpdateLayeredWindow). A typical window will contain, on top of its background, a couple dozen other bitmaps ranging from 10×10 to, say, 300×150 pixels. In the worst case most of these elements will have smooth animation up to 30 fps. Everything will be alpha-blended, and I am going to use Direct2D for this (yes, I know older Windows versions don't support it). In general, Winamp's modern skin engine is the closest example.
Given all this and taking into account modern PC performance, can I just redraw the whole window every single frame, or do I have to constrain myself to some sort of clip rectangle?
D2D requires you to render with WM_PAINT messages.
Honestly, use the IAnimation interface and just let D2D and Windows worry about how often to redraw. Though I will let you know: Winamp is done with Adobe AIR, and layered windows with D2D cause issues. (I think you have to use a DXGI render target, but with the window being layered it needs a DC to be returned to an EndPaint call so it can update its alpha channel.)
I have some experience with this.
If you need to support Windows XP, using UpdateLayeredWindow is the only choice available for solving this problem. The documentation for this call says it copies the whole bitmap to the screen each time it is called, and this bottleneck showed up in my benchmarking as the real limiting factor. If your window is 300x300 you pay that price on every update, even if you are careful to modify only a couple of pixels. It would be very easy to over-optimize the rendering side for no real benefit, so implement something simple, measure, and then decide if you need to optimize.
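For reference, this is roughly what each update looks like; a sketch assuming hwnd was created with WS_EX_LAYERED and memDC holds a 32-bpp premultiplied-alpha DIB section:

    // Sketch: push a premultiplied-alpha bitmap to a layered window.
    // The whole bitmap crosses to the screen on every call - the
    // bottleneck described above.
    #include <windows.h>

    void PushFrame(HWND hwnd, HDC memDC, int width, int height)
    {
        POINT srcPos = { 0, 0 };
        SIZE  size   = { width, height };

        BLENDFUNCTION blend = {};
        blend.BlendOp             = AC_SRC_OVER;
        blend.SourceConstantAlpha = 255;          // per-pixel alpha only
        blend.AlphaFormat         = AC_SRC_ALPHA; // premultiplied alpha

        // nullptr for the destination position keeps the window where
        // it is; only the contents are replaced.
        UpdateLayeredWindow(hwnd, nullptr, nullptr, &size,
                            memDC, &srcPos, 0, &blend, ULW_ALPHA);
    }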
If you can drop support for Windows XP then you can avoid UpdateLayeredWindow completely and use DwmExtendFrameIntoClientArea to create the same effect as a layered window. You'll write less code, avoid the UpdateLayeredWindow bottleneck, and D2D will be easier to work with.
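The DWM route is only a couple of calls; a sketch, assuming Vista or later and a regular (non-layered) top-level window:

    // Sketch: extend the glass frame over the whole client area so the
    // window composites with per-pixel alpha, without UpdateLayeredWindow.
    #include <windows.h>
    #include <dwmapi.h>
    #pragma comment(lib, "dwmapi.lib")

    void EnableGlass(HWND hwnd)
    {
        MARGINS margins = { -1, -1, -1, -1 };   // -1 = "sheet of glass"
        DwmExtendFrameIntoClientArea(hwnd, &margins);
        // From here on, anything drawn with alpha 0 shows through,
        // and D2D can render straight into the client area.
    }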

Why Direct3D application performs better in full screen mode?

The performance of a Direct3D application seems to be significantly better in full screen mode compared to windowed mode. What are the technical reasons behind this?
I guess it has something to do with the fact that a full-screen application can gain exclusive control of the display. But why can't the application gain exclusive control of part of the screen (i.e. a window) and get the same performance benefits?
Here are the cliff notes on how things work underneath.
The monitor always needs to be associated with a so-called primary surface to be able to display anything, i.e. the video card can only scan out of one surface in video memory.
When an application is fullscreen (and everything was set up correctly to enable flipping), the primary surface is just one of the application's backbuffers, and is flipped to another backbuffer every frame. This is the most efficient way of presenting to the screen, but it requires the application to own the entire monitor area (i.e. the entire primary surface).
When there's no fullscreen application and the DWM is off, the primary surface is owned by the OS, and every windowed application performs a blit from its backbuffer to the primary surface. This blit takes some GPU time to complete (as do the blits from the other applications visible on the screen), so it's not as efficient as fullscreen presentation. XP worked that way.
When DWM is composing the screen, things get even more complicated.
Here, the DWM owns the primary surface and needs to draw the application windows there. To make that possible, every window has an associated surface holding its contents, called the redirection surface (which allows the DWM to enable window ghosting, glass effects, and all that good stuff). Every time the D3D application presents a frame, it adds a blit to its redirection surface.
That way, several blits need to happen: a blit to the redirection surface by the app, then a blit from the redirection surface to the primary surface by the DWM, which is, again, some overhead compared to fullscreen.
Note all of that additional work is on the GPU, so it doesn't affect CPU performance.
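In D3D9 terms, the difference shows up at device-creation time; a sketch of the present parameters for the exclusive-fullscreen (flipping) case, with illustrative values:

    // Sketch: D3D9 present parameters for exclusive fullscreen, where
    // the backbuffer can be flipped to the screen instead of blitted.
    #include <d3d9.h>

    D3DPRESENT_PARAMETERS FullscreenParams(HWND hwnd)
    {
        D3DPRESENT_PARAMETERS pp = {};
        pp.Windowed             = FALSE;                 // exclusive mode
        pp.SwapEffect           = D3DSWAPEFFECT_FLIP;    // flip, don't blit
        pp.BackBufferWidth      = 1920;                  // must match a
        pp.BackBufferHeight     = 1080;                  // supported mode
        pp.BackBufferFormat     = D3DFMT_X8R8G8B8;
        pp.BackBufferCount      = 2;
        pp.hDeviceWindow        = hwnd;
        pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;  // vsync
        pp.FullScreen_RefreshRateInHz = 60;
        return pp;
        // In windowed mode you would set Windowed = TRUE and typically
        // D3DSWAPEFFECT_DISCARD; Present() then blits via the DWM.
    }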
Stuff to read further:
http://blogs.msdn.com/greg_schechter/archive/2006/03/19/555087.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/03/05/544314.aspx
There's a bit on MSDN that says full screen mode uses buffer flipping, if set up correctly, as opposed to blitting. It makes sense.
Of course you can (and in a way, do) give exclusive control for part of the screen to an application, but what happens to the rest of the screen? You still have to blit, do occlusion checking, etc. on the rest of the windows, and I think that's what causes the performance hit.
I'll add to #aib's answer that the rest of the screen is being managed by the OS. So, if anything else needs to be drawn/worked upon simultaneously, there has to be a performance hit.
For example, if you have a video playing in Windows Media Player in one window, then start Civilization in another, when Civ starts doing its fancy graphics, it will need to share screen space with everything else (like the video).
Whereas if the DirectX app has the full screen to itself, everything else might be "updating" or "playing", but not being drawn.
Basically, the video hardware is completely dedicated to the exclusive mode application.
There is no contention for video resources (pipeline, texture memory, etc...)
In particular, texture upload can be a big bottleneck. The less you have to do it (because you have it all), the better.
