Before I reduce this to a reasonable example, I was hoping someone might have run into this before and could shed some light on the problem.
I have a 32-bit C-based application that uses one OpenGL context per window; all the contexts and windows are set up identically. The requested pixel format is 32-bit color with alpha, a depth buffer, and hardware acceleration. Everything works flawlessly on Windows 2000 and XP.
Everything also works flawlessly on Vista and 7 until the 33rd window/context pair is created. Creating the window produces no errors, creating the context produces no errors, making the context current produces no errors, drawing produces no errors, and SwapBuffers does not generate an error. However, the OpenGL contexts fail to produce any output: with Aero the windows are white, and with the classic theme they don't draw at all and just show screen garbage. Killing the DWM doesn't fix the problem; neither does trying different pixel formats (single buffer, different depths, etc.) or PFD_SUPPORT_COMPOSITION. This happens on a number of different machines running Vista/7, never on XP.
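For reference, the per-window setup is essentially the standard wgl pattern. A minimal sketch, assuming the window already exists and omitting error handling (the helper name is mine, not from the actual application):

    // Per-window context creation, roughly as described above (sketch only).
    #include <windows.h>
    #include <GL/gl.h>

    HGLRC CreateContextForWindow(HWND hwnd)
    {
        PIXELFORMATDESCRIPTOR pfd = {0};
        pfd.nSize      = sizeof(pfd);
        pfd.nVersion   = 1;
        pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
        pfd.iPixelType = PFD_TYPE_RGBA;
        pfd.cColorBits = 32;   // 32-bit color
        pfd.cAlphaBits = 8;    // alpha
        pfd.cDepthBits = 24;   // depth buffer

        HDC dc = GetDC(hwnd);                       // window class should use CS_OWNDC
        int format = ChoosePixelFormat(dc, &pfd);   // driver picks an accelerated match
        SetPixelFormat(dc, format, &pfd);

        HGLRC rc = wglCreateContext(dc);            // one context per window
        wglMakeCurrent(dc, rc);
        return rc;
    }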
I can glReadPixels the back buffer and the pixels are correct. Rendering into a pbuffer with the same context works fine, and rendering into more than 32 pbuffers is fine too.
If I free working on-screen contexts/windows, the non-working windows start working again. It's as if Vista/7 simply stop displaying OpenGL rendering once 32 windows are doing it on screen.
If the pixel format descriptor includes PFD_SUPPORT_GDI everything is OK, but then it's using the software renderer, which is unacceptable.
I am wondering whether this is an OS limitation or a driver limitation in Vista/7. Thanks for any insight.
The limit is implementation-specific, and all you can do is run some tests on common hardware.
I ran some tests myself, and it turns out the limit is pretty high for GeForce cards (maybe even no limit). For a desktop Quadro there was a limit of 128 contexts that were able to repaint correctly; the program could create 128 more contexts with no errors, but those windows contained rubbish. I'm not using PFD_SUPPORT_GDI.
It was even more interesting on an ATI Radeon 6950: there the redrawing stopped at window #105, and creating rendering context #200 failed.
If you want to try it for yourself, the program can be found here: Max OpenGL Contexts test (there is full source code + Win32 binaries). Maybe you can look at the code and track down the culprit; I would be very interested to hear about it.
That's the result. One piece of advice: avoid using multiple contexts where possible. Multiple contexts are understandable in an application running on multiple monitors, but an application on a single monitor should stick to a single context. Context switching is slow. And that's not all: OpenGL windows that are overlapped by other windows require hardware clipping regions. There is one hardware clipping region on GeForce and eight or more on Quadro (CAD applications often use windows and menus that overlap the OpenGL window, in contrast with games). If more regions are needed, rendering falls back to software, so again, having lots of OpenGL windows (contexts) is not a very good idea.
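For illustration only, a probing loop in the spirit of that test (not taken from it) could look like the sketch below; CreateContextForWindow is the per-window setup sketched in the question above, and the STATIC window class is just a shortcut to keep the sketch short.

    // Hypothetical probing loop: create window + context pairs until creation fails,
    // then check visually how many of them still repaint correctly.
    #include <windows.h>
    #include <GL/gl.h>

    HGLRC CreateContextForWindow(HWND hwnd);   // the setup sketched in the question

    int ProbeContextLimit(HINSTANCE hinst, int maxTries)
    {
        int created = 0;
        for (int i = 0; i < maxTries; ++i)
        {
            // A real test would register its own window class (CS_OWNDC); STATIC keeps this short.
            HWND hwnd = CreateWindowExA(0, "STATIC", "GL test",
                                        WS_OVERLAPPEDWINDOW | WS_VISIBLE |
                                        WS_CLIPCHILDREN | WS_CLIPSIBLINGS,
                                        20 * i, 20 * i, 200, 200, NULL, NULL, hinst, NULL);
            HGLRC rc = hwnd ? CreateContextForWindow(hwnd) : NULL;
            if (!rc)
                break;                          // outright creation failure (e.g. context #200 on the Radeon)

            glClearColor(0.0f, 0.5f, 1.0f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            SwapBuffers(GetDC(hwnd));           // reports success even when the window shows garbage
            ++created;                          // whether it actually repaints must be checked on screen
        }
        return created;
    }

On the cards above, on-screen repainting stopped well before context creation itself failed, so counting successful creations is not enough; the windows have to be inspected.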
Note this is rather similar to Is there a limit to how many OpenGL rendering contexts you can create simultaneously?
I want to get the color of the pixel at a touch point and return it as a string (e.g. #FFADD8E6).
I wonder whether the Windows APIs support this in a Metro app (Windows 8). Can anyone answer this or help me find a solution? Thanks.
In general it isn't easy to do this. Assuming this is a XAML app (although the same logic applies to a WWA or DirectX app), you have a stack of rendering going on. The XAML objects are turned into textures inside the runtime, and the hardware composites those together, potentially combining them with other applications, including components from the protected media pipeline, into the image that appears on screen. This image, which is what the user sees, only exists in the frame buffer of the GPU, so there really isn't anywhere for the CPU, and therefore your app, to read it from. While it would be possible to read it, doing so would almost certainly involve stalling the whole system-wide rendering pipeline and then copying the entire frame buffer into system memory. That would be very slow.
I want to make a skinning engine capable of drawing custom-shaped windows with alpha blending, i.e. it will use layered windows (UpdateLayeredWindow). A typical window will contain, on top of its background, a couple dozen other bitmaps ranging from 10×10 to, say, 300×150 pixels. In the worst case most of these elements will have smooth animation at up to 30 fps. Everything will be alpha-blended, and I am going to use Direct2D for this (yes, I know older Windows versions don't support it). In general, Winamp's modern skin engine is the closest example.
Given all this, and taking into account modern PC performance, can I just redraw the whole window every single frame, or do I have to constrain myself to some sort of clip rectangle?
D2D requires you to render in response to WM_PAINT messages.
Honestly, use the IAnimation interface and just let D2D and Windows worry about how often to redraw. Though I will let you know: Winamp is done with Adobe AIR, and layered windows with D2D cause issues. (I think you have to use a DXGI render target, but with the window being layered it needs a DC to be returned to an EndPaint call so it can update its alpha channel.)
I have some experience with this.
If you need to support Windows XP, UpdateLayeredWindow is the only choice available for solving this problem. The documentation for this call says it copies the whole bitmap to the screen each time it is called, and that copy showed up in my benchmarking as the real limiting factor. If your window is 300x300 you pay that price on every update, even if you are careful to modify only a couple of pixels. It would be very easy to over-optimize the rendering side for no real benefit, so implement something simple, measure, and then decide whether you need to optimize.
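For context, the per-frame push through UpdateLayeredWindow looks roughly like this; a minimal sketch assuming the window was created with WS_EX_LAYERED and that memDC already has a 32-bit premultiplied-alpha DIB section selected into it (the function name and those assumptions are mine):

    // Present a pre-rendered, premultiplied 32-bit ARGB bitmap through a layered window.
    #include <windows.h>

    void PresentLayered(HWND hwnd, HDC memDC, int width, int height)
    {
        POINT srcPos = { 0, 0 };
        SIZE  size   = { width, height };
        BLENDFUNCTION blend = { AC_SRC_OVER, 0, 255, AC_SRC_ALPHA };  // per-pixel alpha

        HDC screenDC = GetDC(NULL);
        // The whole bitmap is copied on every call; this is the bottleneck described above.
        UpdateLayeredWindow(hwnd, screenDC, NULL, &size, memDC, &srcPos,
                            0, &blend, ULW_ALPHA);
        ReleaseDC(NULL, screenDC);
    }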
If you can drop support for Windows XP, then you can avoid UpdateLayeredWindow completely and use DwmExtendFrameIntoClientArea to create the same effect as a layered window. You'll write less code, avoid the UpdateLayeredWindow bottleneck, and find D2D easier to work with.
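A rough sketch of that DWM route (Vista and later only, DWM enabled; the helper name is mine):

    // Extend the glass frame over the entire client area so the window gets per-pixel alpha.
    #include <windows.h>
    #include <dwmapi.h>
    #pragma comment(lib, "dwmapi.lib")

    HRESULT MakeWindowAlphaComposited(HWND hwnd)
    {
        MARGINS margins = { -1, -1, -1, -1 };   // -1 on all sides = "sheet of glass"
        return DwmExtendFrameIntoClientArea(hwnd, &margins);
        // Anything you then render with alpha (e.g. via Direct2D) composes over whatever is behind the window.
    }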
I'd like to generate a movie in real time with a self-made application that does fast screen captures, with part of the screen occupied by a running 3D application.
I'm aware that several applications already exist for this (like FRAPS or Taksi), and even dedicated DirectShow filters (like UScreenCapture), but I really need to do this with my own external application.
When correctly set up (UScreenCapture + ffdshow), capturing and compressing the full screen does not consume as much CPU as you would expect (about 15%) and does not impair the performance of the 3D app.
The problem with capturing from an external application is that the 3D application loses its vsync, which makes it choppy and difficult to use (the 3D app is only presented on a small part of the screen, the rest being GDI and DirectX).
FRAPS solves this problem by capturing only one application at a time (the one with focus). Depending on the technology used (OpenGL, DirectX, GDI), it hooks the swap/vsync and does its capture (with glReadPixels, ...) without perturbing the application.
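For reference, the in-process part of such a FRAPS-style grab is roughly this kind of read-back, called from a hook around SwapBuffers; a minimal sketch under that assumption (the function name is mine):

    // Read the currently bound context's back buffer before it is presented.
    #include <windows.h>
    #include <GL/gl.h>
    #include <vector>

    std::vector<unsigned char> GrabBackBuffer(int width, int height)
    {
        std::vector<unsigned char> pixels(width * height * 4);
        glReadBuffer(GL_BACK);
        glPixelStorei(GL_PACK_ALIGNMENT, 1);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
        return pixels;   // rows come back bottom-up; flip them before encoding if needed
    }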
Doing this does not solve my problem, since I want the full composed screen image (including the 3D and everything else) AND a smooth 3D app.
UScreenCapture seems to use a fast DirectX call to capture the whole screen, but the OpenGL 3D app still loses its sync.
Doing a BitBlt is too slow and too CPU-consuming for real-time 30 fps acquisition (at least under Windows XP; not sure about 7).
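For completeness, the plain GDI grab referred to here is roughly the following; a minimal sketch of the approach that turned out to be too slow, not a recommendation:

    // Full-screen grab via GDI; simple but too slow for sustained 30 fps capture.
    #include <windows.h>

    HBITMAP CaptureScreenGdi(int width, int height)
    {
        HDC screenDC = GetDC(NULL);
        HDC memDC    = CreateCompatibleDC(screenDC);
        HBITMAP bmp  = CreateCompatibleBitmap(screenDC, width, height);
        HGDIOBJ old  = SelectObject(memDC, bmp);

        // CAPTUREBLT also includes layered windows, at extra cost.
        BitBlt(memDC, 0, 0, width, height, screenDC, 0, 0, SRCCOPY | CAPTUREBLT);

        SelectObject(memDC, old);
        DeleteDC(memDC);
        ReleaseDC(NULL, screenDC);
        return bmp;   // caller owns the bitmap and must DeleteObject it after encoding
    }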
My question is whether there is a way to achieve my goal with Windows 7 and its brand new DirectX compositing engine.
Windows 7 manages to show live, vsynced, duplicated previews of every app (in the taskbar), so there must be a way to access the currently displayed screen buffer without perturbing the rendering of the 3D OpenGL app?
Any other suggestions or technologies?
Thank you.
I made a list of possibly useful links at
http://betterlogic.com/roger/?p=3037
Let me know if you have any success; eventually I would also be interested in a fast open source screen capture for Windows...
Related: Fastest method of screen capturing
I was reading Larry Osterman's latest blog post about debugging a flickering problem in the Windows Vista/7 volume control, and I suddenly realized that I can't recall ever seeing an application flicker on my OS X laptop. Even applications that otherwise seem to be poorly written avoid the flicker problem in my experience. Without this turning into an Apple vs Windows debate (please), why do OS X applications not seem to have the same flickering problem?
I have trouble believing that Apple developers are simply amazing at programming flicker-free GUIs while Windows programmers suck, so what's the reason? Does the OS X API require all GUIs to implement double-buffering? While some apps have the slightly sluggish double-buffered resize behavior, many don't, and they still avoid flickering. Is the OS X repaint flow somehow fundamentally different from Windows, avoiding the WM_ERASEBKGND problem entirely? Or is there some other possibility that I'm not seeing?
Update: Thank you for your answers. I wish I could select both ken and cb160's answers, because they are both helpful.
Mac OS X has double buffered windows.
You don't have to do anything to make it happen. It's behind the scenes.
You (almost always) don't explicitly draw to a window in Cocoa when something changes, you invalidate a region of the window. The framework will later descend the hierarchy of views and draw the dirty regions of the window into a secondary buffer. Then it swaps the buffers.
You can optionally make some promises that allow the framework to take shortcuts when redrawing, but they're all opt-in. Only savvy views are affected.
If your subclass of NSView implements the isOpaque method to return YES, then the framework will never clear anything behind your view or draw any of the views under it.
Implementing preservesContentDuringLiveResize to return YES gives you some extra responsibilities, but can improve performance during window resizing.
10.6 added another two new APIs of this sort, layerContentsRedrawPolicy and layerContentsPlacement.
Last, custom drawing is less common than on Windows. The majority of views you see are framework-supplied and not subclassed, and framework-supplied means optimized by Apple.
Both Windows Vista/7 and OS X use compositing engines to draw rasterised bitmaps on the screen. These compositing engines are responsible for processing the output from all windows and drawing the final screen image. This compositing approach is how OS X is able to use the genie effect when minimizing to the Dock, and how Aero draws its translucent borders. It also prevents flickering: if the bitmap to fill a particular area of the screen is not available, the compositor uses the image it already has rather than drawing a blank region.
OS X has had a compositing engine since it first shipped. At the time, lots of people thought this was a crazy approach, since all the video cards shipping at the time were optimized to draw bitmaps (i.e., window buttons and borders), not composited images. In later versions of OS X, the compositing was pushed off to the GPU (in Quartz Extreme), which took a significant load off the CPU and made more effects possible.
Because the Windows compositor was only added in Windows Vista, and even then only when a GPU was available and you had the right version of the OS, it is not as pervasive as the Quartz compositor in OS X. Because the compositor is not always used in Windows, flickering occurs when a region is blanked and the application responsible for drawing it cannot redraw the region quickly enough.
Yup, it's all double buffered automagically. Of course, if you are running legacy code from Mac OS 9, or code ported from Windows, that means you're probably triple buffering without knowing it. Hey, cycles are cheap!
The performance of a Direct3D application seems to be significantly better in full screen mode compared to windowed mode. What are the technical reasons behind this?
I guess it has something to do with the fact that a full screen application can gain exclusive control of the display. But why can't the application gain exclusive control of part of the screen (i.e. a window) and get the same performance benefits?
Here are the cliff notes on how things work underneath.
The monitor always needs to be associated with a so-called primary surface to be able to display anything, i.e. the video card can only scan out of one surface in video memory.
When an application is fullscreen (and everything was set up correctly to enable flipping), the primary surface is just one of the application's backbuffers, and it is flipped to another backbuffer every frame. This is the most efficient way of presenting to the screen, but it requires the application to own the entire monitor area (i.e. the entire primary surface).
When there's no fullscreen application and the DWM is off, the primary surface is owned by the OS, and every windowed application performs a blit from its backbuffer to the primary surface. This blit takes some GPU time to complete (as do the blits from the other applications visible on the screen), so it's not as efficient as fullscreen presentation. XP worked that way.
When the DWM is composing the screen, things get even more complicated.
Here, the DWM owns the primary surface and needs to draw the application windows onto it. To make that possible, every window has an associated surface holding its contents, called the redirection surface (which is what allows the DWM to do window ghosting, glass effects, and all that good stuff). Every time a D3D application presents a frame, it adds a blit to its redirection surface.
So several blits need to happen: the blit to the redirection surface by the app, then the blit from the redirection surface to the primary by the DWM, which is, again, some overhead compared to fullscreen.
Note that all of this additional work happens on the GPU, so it doesn't affect CPU performance.
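To make the two paths concrete, here is a hedged D3D9-era sketch of the two present setups described above; nothing here is from the linked articles, it just shows where the fullscreen/windowed distinction is declared:

    // D3D9 present parameters: exclusive fullscreen flipping vs. windowed blitting.
    #include <d3d9.h>
    #pragma comment(lib, "d3d9.lib")

    D3DPRESENT_PARAMETERS FullscreenParams(HWND hwnd, UINT w, UINT h)
    {
        D3DPRESENT_PARAMETERS pp = {};
        pp.BackBufferWidth      = w;
        pp.BackBufferHeight     = h;
        pp.BackBufferFormat     = D3DFMT_X8R8G8B8;
        pp.SwapEffect           = D3DSWAPEFFECT_FLIP;    // backbuffer becomes the scanned-out primary
        pp.Windowed             = FALSE;                 // exclusive ownership of the whole monitor
        pp.hDeviceWindow        = hwnd;
        pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;
        return pp;
    }

    D3DPRESENT_PARAMETERS WindowedParams(HWND hwnd)
    {
        D3DPRESENT_PARAMETERS pp = {};
        pp.BackBufferFormat = D3DFMT_UNKNOWN;            // match the current desktop mode
        pp.SwapEffect       = D3DSWAPEFFECT_DISCARD;     // Present() ends up as a blit (or a DWM redirection blit)
        pp.Windowed         = TRUE;
        pp.hDeviceWindow    = hwnd;
        return pp;
    }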
Stuff to read further:
http://blogs.msdn.com/greg_schechter/archive/2006/03/19/555087.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx
http://blogs.msdn.com/greg_schechter/archive/2006/03/05/544314.aspx
There's a bit on MSDN that says full screen mode uses buffer flipping, if set up correctly, as opposed to blitting. It makes sense.
Of course you can (and in a way, you do) give exclusive control over part of the screen to an application, but what happens to the rest of the screen? You still have to blit, do occlusion checking, etc. on the rest of the windows, and I think that's what causes the performance hit.
I'll add to #aib's answer that the rest of the screen is being managed by the OS. So, if anything else needs to be drawn/worked upon simultaneously, there has to be a performance hit.
For example, if you have a video playing in Windows Media Player in one window and then start Civilization in another, when Civ starts doing its fancy graphics it will need to share screen space with everything else (like the video).
Whereas if the DirectX app has the full screen, everything else might be "updating" or "playing", but not being drawn.
Basically, the video hardware is completely dedicated to the exclusive mode application.
There is no contention for video resources (pipeline, texture memory, etc...)
In particular, texture upload can be a big bottleneck. The less you have to do it (because you have it all), the better.