GetMessage() while the thread is blocked in SwapBuffers() - windows

Vsync blocks SwapBuffers(), which is what I want. My problem is that, since input messages go to the same thread that owns the window, any messages that come in while SwapBuffers() is blocked won't be processed immediately, but only after the vsync triggers the buffer swap and SwapBuffers() returns. So I have all my compute threads sitting idle instead of processing the scene for rendering in the next frame using the most recent input. I'm particularly concerned with having very low latency. I need some way to access all pending input messages to the window from other threads.
Windows API provides a way to wait for either Windows events or input messages using MsgWaitForMultipleObjects(), yet there's no similar way to wait for a buffer swap together with other things. That's very unfortunate.
I considered calling SwapBuffers() in another thread, but that requires glFinish() to be called in the window's thread before signalling another thread to SwapBuffers(), and glFinish() is still a blocking call so it's not a good solution.
I considered hooking, but that also looks like a dead end. With a WH_GETMESSAGE hook, GetMsgProc() is not called asynchronously but only when the window's thread calls GetMessage()/PeekMessage(), so it's no help. Installing a global hook doesn't help me either, because processing WM_TOUCH requires calling RegisterTouchWindow() with a specific window handle, and my input is touch. And while for mouse and keyboard you can install low-level hooks that capture input as it is posted to the thread's queue, rather than when the thread calls GetMessage()/PeekMessage(), there appears to be no similar option for touch.
I also looked at wglDelayBeforeSwapNV(), but I don't see what would prevent the OS from preempting the thread sometime after the call to that function but before SwapBuffers(), causing it to miss the next vsync signal.
So what's a good workaround? Can I make a second, invisible window that will somehow always be the active one and so get all input messages, while the visible one displays the rendering? According to another discussion, message-only windows (CreateWindow with HWND_MESSAGE) are not compatible with WM_TOUCH. Is there perhaps some undocumented event that SwapBuffers() waits on internally that I could access and pass to MsgWaitForMultipleObjects()? My target is a fixed platform (Windows 8.1 64-bit), so I'm fine with using undocumented functionality, should it exist. I do want to avoid writing my own touchscreen driver, however.

Out of curiosity, why not implement your entire drawing logic in that other thread? It appears the problem you are running into is that the message pump is driven by the same thread that draws. Since Windows does not let you drive the message pump from a different thread than the one that created the window, the easiest solution would just be to push all the GL stuff into a different thread.
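As a rough illustration of that split, the sketch below models the handoff between the window thread and a separate render thread in portable C: the thread that owns the window (and pumps messages) overwrites a single "latest input" slot, and the render thread copies out the newest sample when it starts a frame, so neither has to wait on the other for more than a brief lock. It is a sketch under assumptions: pthreads stand in for Win32 threads and locks, and InputSample, mailbox_publish and mailbox_take_latest are hypothetical names; in a real program mailbox_publish would be called from the window procedure when WM_TOUCH arrives.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical input sample; a real program would fill this from WM_TOUCH data. */
typedef struct {
    float x, y;
    unsigned long seq;          /* increases with every published sample; 0 = none yet */
} InputSample;

/* Single-slot "latest value" mailbox: the window thread overwrites it,
   the render thread reads the newest value at the start of each frame. */
typedef struct {
    pthread_mutex_t lock;
    InputSample latest;
    unsigned long next_seq;
} InputMailbox;

void mailbox_init(InputMailbox *mb) {
    pthread_mutex_init(&mb->lock, NULL);
    mb->latest.x = mb->latest.y = 0.0f;
    mb->latest.seq = 0;
    mb->next_seq = 1;
}

/* Called by the thread that owns the window (the message pump); never touches GL. */
void mailbox_publish(InputMailbox *mb, float x, float y) {
    pthread_mutex_lock(&mb->lock);
    mb->latest.x = x;
    mb->latest.y = y;
    mb->latest.seq = mb->next_seq++;
    pthread_mutex_unlock(&mb->lock);
}

/* Called by the render thread: copies out the newest sample and returns true
   if it is newer than the one the renderer saw last time (last_seen_seq). */
bool mailbox_take_latest(InputMailbox *mb, unsigned long last_seen_seq, InputSample *out) {
    pthread_mutex_lock(&mb->lock);
    *out = mb->latest;
    pthread_mutex_unlock(&mb->lock);
    return out->seq > last_seen_seq;
}
```

Overwriting instead of queueing deliberately drops intermediate samples in favour of the most recent one; if every touch point matters (for example for gesture recognition), a bounded queue would be the better choice.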
SwapBuffers (...) is also not necessarily going to block. As per the requirements of VSYNC, an implementation need only block the next command that would modify the backbuffer while all backbuffers are pending a swap. Triple buffering changes things a little by introducing a second backbuffer.
One possible implementation of triple buffering will discard the oldest backbuffer when it comes time to swap, thus SwapBuffers (...) would never cause blocking (this is effectively how modern versions of Windows work in windowed mode with the DWM enabled). Other implementations will eventually present both backbuffers, this reduces (but does not eliminate) blocking but also results in the display of late frames.
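To make the "discard the oldest backbuffer" behaviour concrete, here is a small model in portable C. It is not a real swap chain (on Windows this logic lives in the driver or the DWM, not in application code): frames are just integer ids and there is no threading. Presenting a frame never blocks, and at each vsync the newest completed frame is shown, so a frame finished between two vsyncs can be silently replaced by a newer one. TripleBuffer, tb_present and tb_vsync are hypothetical names.

```c
#include <stdbool.h>

/* Index-based model of "discard the oldest" triple buffering. */
typedef struct {
    int buf[3];        /* frame id stored in each of the three buffers (-1 = empty) */
    int back;          /* buffer the renderer draws into */
    int ready;         /* most recently completed frame, not yet shown */
    int front;         /* buffer currently being scanned out */
    bool ready_fresh;  /* true if `ready` holds a frame newer than `front` */
} TripleBuffer;

void tb_init(TripleBuffer *tb) {
    tb->buf[0] = tb->buf[1] = tb->buf[2] = -1;
    tb->back = 0;
    tb->ready = 1;
    tb->front = 2;
    tb->ready_fresh = false;
}

/* Renderer finished drawing frame_id: never blocks. If the previous ready
   frame was never shown, it is simply recycled as the next back buffer (discarded). */
void tb_present(TripleBuffer *tb, int frame_id) {
    tb->buf[tb->back] = frame_id;
    int tmp = tb->ready;
    tb->ready = tb->back;
    tb->back = tmp;
    tb->ready_fresh = true;
}

/* Called at each vsync: flip to the newest completed frame, if there is one.
   Returns the id of the frame on screen after this vsync. */
int tb_vsync(TripleBuffer *tb) {
    if (tb->ready_fresh) {
        int tmp = tb->front;
        tb->front = tb->ready;
        tb->ready = tmp;
        tb->ready_fresh = false;
    }
    return tb->buf[tb->front];
}
```

Presenting frames 1, 2 and 3 before a single vsync leaves only frame 3 on screen; frames 1 and 2 are never displayed, which is exactly the trade-off between low latency and dropped frames described above.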
Unfortunately, WGL does not let you request the number of backbuffers in a swap chain (beyond 0 for single-buffered or 1 for double-buffered); the only way to get triple buffering on Windows is through driver settings. The lowest message latency will come from driving GL in a different thread, but triple buffering can help a little while requiring no effort on your part.

Related

WebGL - When to call gl.Flush?

I just noticed today that this method, flush(), is available.
I haven't been able to find detailed documentation on it.
What exactly does it do?
Is it required?
gl.flush in WebGL does have its uses, but they're driver and browser specific.
For example, because Chrome's GPU architecture is multi-process, you can do this:
var loadShader = function(gl, shaderSource, shaderType) {
  var shader = gl.createShader(shaderType);
  gl.shaderSource(shader, shaderSource);
  gl.compileShader(shader);
  return shader;
};
var vs = loadShader(gl, someVertexShaderSource, gl.VERTEX_SHADER);
var fs = loadShader(gl, someFragmentShaderSource, gl.FRAGMENT_SHADER);
var p = gl.createProgram();
gl.attachShader(p, vs);
gl.attachShader(p, fs);
gl.linkProgram(p);
At this point all of the commands might be sitting in the command queue with nothing executing them yet, so issue a flush:
gl.flush();
Now, because we know that compiling and linking programs can be slow, depending on how large and complex they are, we can wait a while before trying to use them and do other stuff:
setTimeout(continueLater, 1000); // continue 1 second later
Now do other things, like setting up the page or UI.
One second later, continueLater will be called. By then it's likely our shaders have finished compiling and linking.
function continueLater() {
  // check results, get locations, etc.
  if (!gl.getShaderParameter(vs, gl.COMPILE_STATUS) ||
      !gl.getShaderParameter(fs, gl.COMPILE_STATUS) ||
      !gl.getProgramParameter(p, gl.LINK_STATUS)) {
    alert("shaders didn't compile or program didn't link");
    // ...etc...
  }
  var someLoc = gl.getUniformLocation(p, "u_someUniform");
  // ...etc...
}
I believe Google Maps uses this technique as they have to compile many very complex shaders and they'd like the page to stay responsive. If they called gl.compileShader or gl.linkProgram and immediately called one of the query functions like gl.getShaderParameter or gl.getProgramParameter or gl.getUniformLocation the program would freeze while the shader is first validated and then sent to the driver to be compiled. By not doing the querying immediately but waiting a moment they can avoid that pause in the UX.
Unfortunately this only works for Chrome AFAIK because other browsers are not multi-process and I believe all drivers compile/link synchronously.
There may be other reasons to call gl.flush, but again it's very driver/OS/browser specific. As an example, let's say you were going to draw 1000 objects and that took 5000 WebGL calls. It would likely require more than that, but just to have a number let's pick 5000: 4 calls to gl.uniformXXX and 1 call to gl.drawXXX per object.
It's possible all 5000 of those calls fit in the browser's (Chrome's) or driver's command buffer. Without a flush they won't start executing until the browser issues a gl.flush for you (which it does so it can composite your results on the screen). That means the GPU might be sitting idle while you issue 1000, then 2000, then 3000, etc. commands, since they're just sitting in a buffer. gl.flush tells the system "Hey, those commands I added, please make sure to start executing them." So you might decide to call gl.flush after every 1000 commands.
The problem, though, is that gl.flush is not free; otherwise you'd call it after every command to make sure each one executes as soon as possible. On top of that, each driver/browser works differently. On some drivers, calling gl.flush every few hundred or thousand WebGL calls might be a win. On others it might be a waste of time.
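One way to express the "flush every N commands" idea is a small counter wrapped around command submission. The sketch below is in C rather than JavaScript; FlushBatcher and batcher_after_command are hypothetical names, and flush_fn is a placeholder for whatever flush call your environment provides (glFlush in desktop GL, gl.flush in WebGL). The threshold of 1000 is only the example figure used above and would need tuning per driver/browser.

```c
#include <stddef.h>

/* Hypothetical helper: counts issued commands and calls flush_fn every flush_every commands. */
typedef struct {
    size_t flush_every;           /* threshold to tune per driver/browser, e.g. 1000 */
    size_t since_flush;           /* commands issued since the last flush */
    size_t flushes;               /* number of flushes issued so far */
    void (*flush_fn)(void *ctx);  /* placeholder for glFlush / gl.flush; may be NULL */
    void *ctx;                    /* passed through to flush_fn */
} FlushBatcher;

/* Call once after each command (uniform update, draw call, ...) has been issued. */
void batcher_after_command(FlushBatcher *b) {
    b->since_flush++;
    if (b->flush_every > 0 && b->since_flush >= b->flush_every) {
        if (b->flush_fn != NULL)
            b->flush_fn(b->ctx);
        b->flushes++;
        b->since_flush = 0;
    }
}
```

With 1000 objects at 5 calls each, this issues 5 flushes per frame; whether that beats letting the browser flush once per frame is exactly the driver-specific question raised above.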
Sorry, that was probably too much info :p
Assuming it's semantically equivalent to the classic GL glFlush, then no, it will almost never be required. OpenGL is an asynchronous API: you queue up work to be done and it is done when it can be. glFlush is still asynchronous, but it forces any accumulated buffers to be emptied as quickly as they can be, however long that may take; it basically says to the driver "if you were planning to hold anything back for any reason, please don't".
It's usually done only for a platform-specific reason related to the coupling of OpenGL and the other display mechanisms on that system. For example, one platform might require all pending GL work to be flushed rather than left queued before the container it will be drawn into can be moved to the screen. Another might allow piping resources from a background thread into the main OpenGL context but not guarantee that they're available to subsequent calls until you've flushed (e.g. if multithreading ends up creating two separate queues where there might otherwise be one, then a flush might insert a synchronisation barrier across both).
Any platform with a double buffer, or with back buffering as per the WebGL model, will automatically ensure that buffered work proceeds in a timely manner. Queueing exists to aid performance but should have no observable negative consequences, so you don't have to do anything manually.
If you skip a flush that is semantically appropriate but not strictly required, and your graphics are displayed in real time anyway, then at worst you're probably going to suffer a fraction of a second of latency.

Async signal or notification between processes on Windows

There are two processes running on Windows. They communicate with each other through a named pipe. When one of them is ready to send a message, I want to notify the other process asynchronously, like a signal on Linux, so that the other process doesn't need to check the pipe continuously. Is there something similar to the signal mechanism on Windows, or another way to solve my problem?
A direct signal mechanism which conceptually works the same way does not exist (one could probably simulate it with a thread injection hack, but don't even think about that). It is not much of a problem, since you can do otherwise.
Every waitable kernel object that can take a name, such as an event or a semaphore, can be accessed by different processes.
You can WaitForSingleObject on the synchronization primitive until the other process signals it. That would be a Unix-like readiness notification mechanism (not quite as elegant, but to the same effect).
However, that isn't even necessary. Named pipes (unlike anonymous pipes!) can be used with overlapped I/O, which means you can use ReadFileEx to initiate a read from the pipe, and it will linger in the background until it can complete.
You can think of this kind of I/O as "fire and forget". Your process continues running while the read operation is pending. When the read operation completes, it signals an event, or posts a completion packet to a completion port (which you can query), or queues an asynchronous procedure call ("APC", a fancier name for "callback") to the thread that originally issued it. That's as close to a "signal" as you can get under Windows.
Unfortunately, APCs don't quite work as one would wish, since they only execute at well-defined points: when a thread is in an "alertable wait state", which you must enter explicitly by setting the alertable flag in a wait function or by calling NtTestAlert.
The likely reasoning behind the Windows designers' choice is that this is "safer", but it is also more annoying from a usability point of view. Alas, that is how it works.
Note that the overlapped I/O model is the exact opposite of the readiness notification system under e.g. Linux. Rather than asking the OS whether a descriptor is ready to be read, you tell the OS to read it, and you can have yourself be notified (or verify) whether this has completed.

Monitoring files asynchronously

On Unix: I’ve been through FAM and Gamin, and both seem to provide a client/server file monitoring system. I would rather have a system where I tell the kernel to monitor some inodes and it pokes me back when events occur. Inotify looked promising at first on that side: inotify_init1 let me pass IN_NONBLOCK which in turn caused poll() to return directly. However I understood that I would have to call it regularly if I wanted to have news about the monitored files. Now I’m a bit short of ideas.
Is there something to monitor files asynchronously?
PS: I haven’t looked at Windows yet, but I would love to have some answers about it too.
As Celada says in the comments above, inotify and poll are the right way to do this.
Signals are not a mechanism for reasonable asynchronous programming -- and signal handlers are remarkably dangerous for the inexperienced and even for the experienced. One does not use them for such purposes voluntarily.
Instead, one should structure one's program around an event loop (see http://en.wikipedia.org/wiki/Event-driven_programming for an overall explanation) using poll, select, or some similar system call as the core of your program's event handling mechanism.
Alternatively, you can use threads, or threads plus an event loop.
However interesting your answers are, I'm sorry, but I can’t accept a mechanism based on blocking calls to poll or select when the question says “asynchronously”, no matter how deeply it is hidden.
On the other hand, I found out that one can run inotify asynchronously by passing the flag IN_NONBLOCK to inotify_init1. Signals are not triggered as they would be with aio, and a read call that would otherwise block fails with errno set to EWOULDBLOCK instead.
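Putting both answers together on Linux, a minimal sketch might look like the following: inotify_init1 with IN_NONBLOCK so reads never block, and poll() used only as a readiness check whose timeout the caller chooses (0 means it returns immediately, so the function can be called from an existing loop without ever waiting). The helper names watch_dir_nonblocking and drain_events are hypothetical, and the watched event mask is an arbitrary example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Opens a non-blocking inotify instance watching `dir`.
   Returns the inotify fd (and the watch descriptor via wd_out), or -1 on error. */
int watch_dir_nonblocking(const char *dir, int *wd_out) {
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0)
        return -1;
    int wd = inotify_add_watch(fd, dir, IN_CREATE | IN_MODIFY | IN_DELETE);
    if (wd < 0) {
        close(fd);
        return -1;
    }
    *wd_out = wd;
    return fd;
}

/* One event-loop step: wait at most timeout_ms (0 = don't wait at all) for the fd
   to become readable, then read every queued event without blocking.
   Returns the number of events consumed, 0 if none were pending, -1 on error. */
int drain_events(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;                       /* 0: nothing pending, -1: poll failed */

    _Alignas(struct inotify_event) char buf[4096];
    int count = 0;
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                      /* queue drained: IN_NONBLOCK makes read fail instead of block */
            return -1;
        }
        if (len == 0)
            break;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            /* ev->mask says what happened; ev->name (if ev->len > 0) names the file */
            count++;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return count;
}
```

Whether calling this with a timeout of 0 from your own loop counts as "asynchronous" enough is a matter of taste, but it never blocks: the only waiting happens if you pass a non-zero timeout.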

Total system freezing when using timers in graphical application

I’m really stuck with this issue and will greatly appreciate any advice.
The problem:
Some of our users complain about total system “freezing” when using our product. No matter how we tried, we couldn’t reproduce it in any of systems available for troubleshooting.
The product:
Physically, it’s a 32-bit/64-bit DLL. The product has a self-refreshing GUI, which draws a real-time spectrogram of an audio signal.
Problem details:
What I managed to collect from a number of fragmentary reports makes the following picture:
When the GUI is opened, the system completely stalls, sometimes immediately and sometimes after the GUI has been visible for a few minutes, with no way to interact with windows, start Task Manager, etc. There is no reaction to the keyboard, and no mouse cursor is seen (or it is seen but doesn't respond to mouse movements; I don't know which). The user has to hard-reset the system in order to reboot. What is important, I think, is that (in some cases) the GUI is responsive for some time and shows correct pictures. Then the freeze happens. One of the reports says that once the system was frozen, the audio continued to be rendered, i.e. heard by the reporter (but the whole graphical shell of Windows was already frozen). Note: in this sort of app it’s usually a dedicated thread that is responsible for sound processing.
The freezing is more or less confirmed to happen for two users on Windows 7 x64 using both the 32-bit and 64-bit versions of the DLL; I have never heard of any other OS being mentioned in connection with this freezing (though there was one report without any OS specified).
That’s all that I managed to collect.
The architecture / suspicions:
I strongly suspect that it’s the GUI refreshing cycle that is a culprit.
Basically, it works like this:
There is a timer that triggers callbacks at a frame rate of approx 25 fps.
In this callback, audio analysis is performed and the GUI is updated.
Some details about the timer:
It’s based on this call:
CreateTimerQueueTimer(&m_timerHandle, NULL, xPlatformTimerCallbackWrapper,
this, m_firstExpInterval, m_period, WT_EXECUTEINTIMERTHREAD);
This creates a timer, and xPlatformTimerCallbackWrapper is called periodically.
Some details about the GUI refreshing:
It works like this:
HDC hdc = GetDC (hwnd);
// Some drawing
ReleaseDC(hwnd,hdc);
My intuition tells me that using CreateTimerQueueTimer might not be the right decision. The reference page says the following about WT_EXECUTEINTIMERTHREAD:
The callback function is invoked by the timer thread itself. This flag should be used only for short tasks or it could affect other timer operations. The callback function is queued as an APC. It should not perform alertable wait operations.
I don’t remember why this WT_EXECUTEINTIMERTHREAD option was chosen actually, now WT_EXECUTEDEFAULT seems equally suitable for me.
In fact, I don’t see any major difference in using any of the options mentioned in the reference page.
Questions:
Does anything described here give anyone a clue as to what might be wrong?
Have you faced similar problems, what was the reason?
Thanks for any info!
==========================================
Update: 2010-02-20
Unfortunately, the advice given here (which I have been able to check so far) didn't help, namely:
changing to WT_EXECUTEDEFAULT in CreateTimerQueueTimer(&m_timerHandle,NULL,xPlatformTimerCallbackWrapper,this,m_firstExpInterval,m_period, WT_EXECUTEDEFAULT);
the reentrancy guard was already there
I haven't yet checked whether updating the GUI in the WM_PAINT handler helps or not
Thanks for the hints anyway.
Now, I've been playing with this for a while and also got a real W7 installation (I used to use a virtual one), and it seems that the problem can be narrowed down.
On my installation, using the app really does make the GUI far less responsive, although I couldn't reproduce a total system freeze as someone reported.
My assumption now is that this responsiveness degradation and the reported total freezing have a common origin.
Then I did some primitive profiling and found that at least one of the culprits is the BitBlt function, which is called approximately 50 times a second:
BitBlt ((HDC)pContext->getSystemContext (), // hdcDest
destRect.left + pContext->offset.h,
destRect.top + pContext->offset.v,
destRect.right - destRect.left,
destRect.bottom - destRect.top,
(HDC)pSystemContext,
srcOffset.h,
srcOffset.v,
SRCCOPY);
The regions being copied are not really large (approx. 400x200 pixels). It is used for displaying the backbuffer and is executed in the timer callback.
If I comment out this BitBlt call, the problem seems to disappear (at least partly).
On the same machine running WinXP everything works just fine.
Any ideas on this?
Most likely what's happening is that your timer callback is taking more than 25 ms to execute. Then another timer tick comes along and it starts processing, too. And so on, and pretty soon you have a whole bunch of threads sucking down CPU cycles, all trying to do your audio analysis and in short order the system is so busy doing thread context switches that no real work gets done. And all the while, more and more timer ticks are getting placed into the queue.
I would strongly suggest that you use WT_EXECUTEDEFAULT here, rather than WT_EXECUTEINTIMERTHREAD. Also, you need to prevent overlapping timer callbacks. There are several ways to do that.
You can use a critical section in your timer callback. When the callback is triggered, it calls TryEnterCriticalSection, and if that isn't successful, it just returns without doing anything.
You can do something similar using a volatile variable and InterlockedCompareExchange.
Or, you can change your timer to be a one-shot (WT_EXECUTEONLYONCE), and then re-set the timer at the end of every callback. That would make the thing execute 25 ms after the last one completed.
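For the second option, a portable C11 sketch of the same guard is shown below; atomic_compare_exchange_strong plays the role that InterlockedCompareExchange(&flag, 1, 0) would play in Win32 code. The function names and the tick counters are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = idle, 1 = a timer callback is currently running. */
static atomic_int callback_busy = 0;

/* Counters for illustration only (atomic because ticks may run on different threads). */
static atomic_int ticks_run = 0;
static atomic_int ticks_skipped = 0;

/* Atomically flips the flag from 0 to 1; fails if another callback already holds it. */
static bool try_begin_tick(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&callback_busy, &expected, 1);
}

static void end_tick(void) {
    atomic_store(&callback_busy, 0);
}

/* Body of a hypothetical timer callback: a tick that arrives while the previous one
   is still running returns immediately instead of piling up behind it. */
void timer_tick(void) {
    if (!try_begin_tick()) {
        atomic_fetch_add(&ticks_skipped, 1);
        return;
    }
    /* ... audio analysis would go here ... */
    atomic_fetch_add(&ticks_run, 1);
    end_tick();
}
```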
Which you choose is up to you. If your analysis often takes longer than 25 ms but not more than 35 ms, then you'll probably get a smoother update rate using WT_EXECUTEONLYONCE. If it's rare that analysis takes more than 25 ms, or if it often takes more than about 35 ms (but less than 50 ms), then you're probably better off using one of the other techniques.
Of course, if it often takes longer than 25 ms, then you probably want to increase the time (reduce the update rate).
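The trade-off in the previous paragraph can be checked with a toy model that computes when each callback starts, assuming known callback durations, a perfectly punctual timer and no other load (real timer queues are less precise). starts_periodic_skip models a periodic timer with a skip-if-busy guard; starts_oneshot_rearm models a one-shot timer re-armed at the end of each callback. Both names are hypothetical.

```c
/* Toy model of callback start times (ms). Assumes every duration is > 0, the timer
   fires exactly on time, and nothing else competes for the CPU. */

/* Periodic timer firing every period_ms; a tick that arrives while the previous
   callback is still running is skipped (reentrancy guard). */
void starts_periodic_skip(const int *dur, int n, int period_ms, int *start) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        start[i] = t;
        int end = t + dur[i];
        /* first tick at or after the moment this callback finishes */
        t = ((end + period_ms - 1) / period_ms) * period_ms;
    }
}

/* One-shot timer (WT_EXECUTEONLYONCE-style) re-armed for delay_ms at the end of each callback. */
void starts_oneshot_rearm(const int *dur, int n, int delay_ms, int *start) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        start[i] = t;
        t += dur[i] + delay_ms;
    }
}
```

With durations alternating between 24 and 26 ms and a 25 ms period, the skip-guard version starts callbacks at 0, 25, 75 and 100 ms (gaps of 25, 50, 25), while the re-armed one-shot timer starts them at 0, 49, 100 and 149 ms (gaps of 49, 51, 49): slightly slower, but steadier. With a constant 40 ms callback, the skip guard gives a 50 ms cadence versus 65 ms for the one-shot timer.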
Also, as one of the commenters pointed out, it's possible that the problem also involves accessing the GUI from the timer thread. You should do all of your analysis in the timer thread, store the results somewhere the main thread can access them, and then send a message to the window proc telling it to update the display.
Have you asked the users to disable Aero/DWM? With Aero enabled, rendering is implemented quite differently. Without Aero, the behaviour will be similar to XP. Not that it solves anything, but it will give you a clue as to what the problem is.

What are event handles?

I had a leaking handle problem ("Not enough quota available to process this command.") in some inherited C# WinForms code, so I used Sysinternals' Handle tool to track it down. It turned out that Event handles were leaking, so I tried googling it (it took a couple of tries to find a query that didn't return "Did you mean: event handler?"). According to Junfeng Zhang, event handles are generated by the use of Monitor, and there may be some weird rules regarding event handle disposal and the synchronization primitives.
I'm not sure that my leaking handles are entirely due to long-lived objects calling lots of synchronization stuff, as this code also deals with HID interfaces and lots of Win32 marshaling and interop, and was not doing any synchronization that I was aware of. Either way, I'm going to run this in WinDbg and start tracing where the handles originate, and also spend a lot of time learning this section of the code, but I had a very hard time finding information about what event handles are in the first place.
The msdn page for the event kernel object just links to the generic synchronization overview... so what are event handles, and how are they different from mutexes/semaphores/whatever?
The NT kernel uses event objects to allow signals to be transferred to entities that wait on the signal. A mutex and a semaphore are also waitable kernel objects (Kernel Dispatcher Objects), but with different semantics. The only time I ever came across them was when waiting for I/O to complete in drivers.
So my theory is that your problem may be a faulty driver. Or are you relying on specialised hardware?
Edit: More info (from Windows Internals 5th Edition, Chapter 3, System Mechanisms):
Some Kernel Dispatcher Objects (e.g. mutex, semaphore) have the concept of ownership: when such an object is signalled, a waiting thread is released and grabs the resource, and the others have to continue waiting. Events are not owned, so any thread can set or reset them.
Also, there are three types of events:
Notification: when signalled, all waiting threads are released
Synchronisation: when signalled, one waiting thread is released and the event is reset
Keyed: when signalled, one waiting thread in the same process as the signaller is released
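The difference between notification and synchronisation events can be sketched with a portable model built on a pthread mutex and condition variable (Win32 code would simply call CreateEvent with bManualReset set to TRUE or FALSE). Event, event_set, event_wait and the other names are hypothetical; keyed events are not modelled, and unlike the kernel object, a waiter woken by event_set re-checks the flag, so a set followed immediately by a reset may not release it.

```c
#include <pthread.h>
#include <stdbool.h>

/* manual_reset = true behaves like a notification event,
   manual_reset = false like a synchronisation (auto-reset) event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
    bool manual_reset;
} Event;

void event_init(Event *e, bool manual_reset) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
    e->manual_reset = manual_reset;
}

void event_set(Event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    if (e->manual_reset)
        pthread_cond_broadcast(&e->cond);  /* wake every waiter */
    else
        pthread_cond_signal(&e->cond);     /* wake (at most) one waiter */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(Event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until signalled. Woken waiters re-check the flag (so, unlike the kernel
   object, a set immediately followed by a reset may not release them); an
   auto-reset event is consumed by the waiter it releases. */
void event_wait(Event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    if (!e->manual_reset)
        e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Non-blocking variant: returns true if a wait would have succeeded immediately. */
bool event_try_wait(Event *e) {
    pthread_mutex_lock(&e->lock);
    bool ok = e->signaled;
    if (ok && !e->manual_reset)
        e->signaled = false;
    pthread_mutex_unlock(&e->lock);
    return ok;
}
```

A notification event stays signalled until someone resets it, so every check succeeds; a synchronisation event lets exactly one check through per set.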
Another interesting thing that I've learned is that critical sections (the lock primitive in C#) are actually not kernel objects; rather, they are implemented using a keyed event, a mutex or a semaphore, as required.
If you're talking about kernel Event Objects, then an event handle is a handle (an integer) that the system keeps on this object so other objects can reference it, i.e. keep a 'handle' on it.
Hope this helps!
