What are event handles? - winapi

I had a leaking handle problem ("Not enough quota available to process this command.") in some inherited C# winforms code, so I went and used Sysinternals' Handle tool to track it down. Turns out it was Event Handles that were leaking, so I tried googled it (took a couple tries to find a query that didn't return "Did you mean: event handler?"). According to Junfeng Zhang, event handles are generated by the use of Monitor, and there may be some weird rules as far as event handle disposal and the synchonization primitives.
I'm not entirely sure that the source of my leaking handles are entirely due to simply long-lived objects calling lots of synchronization stuff, as this code is also dealing with HID interfaces and lots of win32 marshaling and interop, and was not doing any synchronization that I was aware of. Either way, I'm just going to run this in windbg and start tracing down where the handles are originating from, and also spend a lot of time learning this section of the code, but I had a very hard time finding information about what event handles are in the first place.
The msdn page for the event kernel object just links to the generic synchronization overview... so what are event handles, and how are they different from mutexes/semaphores/whatever?

The NT kernel uses event objects to allow signals to transferred to entities that wait on the signal. A mutex and a semaphore are also waitable kernel objects (Kernel Dispatcher Objects), but with different semantics. The only time I ever came across them was when waiting for IO to complete in drivers.
So my theory on your problem is possibly a faulty driver, or are you relying on specialised hardware?
Edit: More info (from Windows Internals 5th Edition - Chapter 3 System Mechanics)
Some Kernel Dispatcher Objects (e.g. mutex, semaphore) have the of concept ownership. So when signalled the released one waiting thread will be released will grab these resources. And others will have to continue to wait. Events are not owned hence are available to be reset by any thread.
Also there are three types of events:
Notification : On signalled all waiting threads are released
Synchronisation : On signalled one waiting thread is released but the event is reset
Keyed : On signalled one waiting thread in the same process as the signaller is released.
Another interesting thing that I've learned is that critical sections (the lock primitive in c#) are actually not kernel objects, rather they are implemented out of a keyed event, or mutex or semaphore as required.

If you're talking about kernel Event Objects, then an event handle will be a handle (Int) that the system keeps on this object so other objects can reference it. IE Keep a 'handle' on it.
Hope this helps!

Related

How does COM avoid deadlocks when 2 objects call into each other?

Let's say there are two apartment-threaded COM objects, located in different apartments. Or maybe they're in different processes altogether. If one object calls a method on another, which in turn calls a method back on the first object, how does COM prevent the whole thing from deadlocking?
What you describe is called reentrancy.
The truth is that COM doesn’t do anything explicit to prevent reentrancy issues. It’s up to the implementer of each object to take precautions where needed, as applicable.
Funny enough, reentrancy in COM is far less common in real life than you would think. Object graphs in COM tend to be mostly trees, which do not exhibit reentrancy. When you have cycles it’s almost always because of objects exposing event-type functionality of some sort, typically Connection Points.
Event callbacks are very limited in scope and they trigger under the explicit control of each object’s code, so the programmer is able to easily time them so they occur at safe places (for example at/near the end of a method’s body after all the real work is done). This prevents serious reentrancy issues from developing.
But nothing stops you from coding something dangerous. For example, if an object triggers an event while its internal object state is inconsistent, all bets are off.
You mention deadlocks. Deadlocks require a locking mechanism of some sort (for example a Critical Section) and should be extremely rare to impossible in COM apartments for the reasons listed above. Any object that triggers an event while holding a lock is asking for serious trouble, and a deadlock is not the biggest of its worries: by virtue of being an STA object the reentrant call will run on the same thread, and it will be able to acquire the locks again and proceed right through, which means it’s very likely that the object will corrupt its internal state, cause a crash, or worse. Note that locks in an STA thread only make sense if the resources controlled by the lock are accessible to threads outside the object’s STA.
And finally, nothing in COM stops you from causing an infinite recursion loop and subsequent stack overflow either. For example, take two COM objects Obj1 and Obj2, with Obj2 implementing an event. We can have Obj1 call a pObj2->SomeMethod(…) which causes Obj2 to fire the event; then have obj1 listen (“sink”) to that event, and have that event handler call SomeMethod() again.
UPDATE:
Profound thanks to Remy Lebeau for pointing to in his comment something I had forgotten to discuss, via a link to CodeGuru article Understanding COM Apartments, Part I. And in the process I also learned something new myself I should have known about.
There is one aspect of reentrancy and locking to consider and that is what happens during inter-apartment calls (either STA<->STA, STA<->MTA, or even STA<->OutofProc). During an inter-apartment call the STA (caller's) thread needs to stall and wait for an answer to the call request; the response cannot (by definition) execute on the same thread. But it can't just fully block (e.g. WaitForSingleObject) waiting for the response because the thread needs to be able to respond and process not only potential callbacks to the original object, but also to callbacks to any other object inside of the same apartment. If it were to fully block, the COM infrastructure itself would be introducing the potential for a deadlock and you wouldn't even need a dependency cycle between objects. So the COM marshalling infrastructure uses a more complex form of Wait that can unblock for a few other situations (Hans Passat points to CoWaitForMultipleHandles which looks right to me but I don't know the infrastructure to that level). If an applicable callback occurs, the marshalling infrastructure will unblock and allow that call to enter the apartment and proceed.
This is a form of locking induced by the COM infrastructure itself, rather than one coded explicitly as part of the object's implementation, which is why I hadn't thought of bringing it up. So COM does in fact "do something to prevent deadlocks", but to prevent deadlock potentials induced by its own infrastructure.
The part that I hadn't consciously realized was that this mechanism is very selective. It only lets through COM calls that form part of the same causality chain, that is, a callback, a direct consequence of the call that the thread was waiting on. Other COM calls into the apartment have to queue up and wait for that call chain to conclude, and for the STA thread to return to the thread's message loop.1
1 It makes complete sense that it needs to be that way, but I don't think I ever realized it.

CoInitializeEx(COINIT_MULTITHREADED) and Goroutines using WMI

We have a monitoring agent written in Go that uses a number of goroutines to gather system metrics from WMI. We recently discovered the program was leaking memory when the go binary is run on Server 2016 or Windows 10 (also possibly on other OS using WMF 5.1). After creating a minimal test case to reproduce the issue it seems that the leak only occurs if you make a large number of calls to the ole.CoInitializeEx method (possibly something changed in WMF 5.1 but we could not reproduce the issue using the python comtypes package on the same system).
We are using COINIT_MULTITHREADED for multithread apartment (MTA) in our application, and my question is this: Because we are issuing OLE/WbemScripting calls from various goroutines, do we need to call ole.CoInitializeEx just once on startup or once in each goroutine? Our query code already uses runtime.LockOSThread to prevent the scheduler from running the method on different OS threads, but the MSDN remarks on CoInitializeEx seem to indicate it must be called at least once on each thread. I am not aware of any way to make sure new goroutines run on an already initialized OS thread, so multiple calls to CoInitializeEx seemed like the correct approach (and worked fine for the last few years).
We have already refactored the code to do all the WMI calls on a dedicated background worker, but I am curious to know if our original code should work using only one CoInitializeEx at startup instead of once for every goroutine.
AFAIK, since Win32 API is defined only in terms of native OS threads, a call to CoInitialize[Ex]() only ever affects the thread it completed on.
Since the Go runtime uses free M×N scheduling of the goroutines to OS threads, and these threads are created / deleted as needed at runtime in a manner completely transparent to the goroutines, the only way to make sure the CoInitialize[Ex]() call has any lasting effect on the goroutine it was performed on is to first bind that goroutine to its current OS thread by calling runtime.LockOSThread() and doing this for every goroutine intended to do COM calls.
Please note that this basically creates an 1×1 mapping between goroutines and OS threads which defeats much of the purpose of goroutines to begin with. So supposedly you might want to consider having just a single goroutine calling into COM and listening for requests on a channel, or having
a pool of such worker goroutines hidden behing another one which would dispatch the clients' requests onto the workers.
Update regarding COINIT_MULTITHREADED.
To cite the docs:
Multi-threading (also called free-threading) allows calls to methods
of objects created by this thread to be run on any thread. There is no
serialization of calls — many calls may occur to the same method or
to the same object or simultaneously. Multi-threaded object
concurrency offers the highest performance and takes the best
advantage of multiprocessor hardware for cross-thread, cross-process,
and cross-machine calling, since calls to objects are not serialized
in any way. This means, however, that the code for objects must
enforce its own concurrency model, typically through the use of
synchronization primitives, such as critical sections, semaphores, or
mutexes. In addition, because the object doesn't control the lifetime
of the threads that are accessing it, no thread-specific state may be
stored in the object (in Thread Local Storage).
So basically the COM threading model has nothing to do with initialization of the threads theirselves—but rather with how the COM subsystem is allowed to call the methods of the COM objects you create on the COM-initialized threads.
IIUC, if you will COM-initialize a thread as COINIT_MULTITHREADED and create some COM object on it, and then pass its reference to some outside client of that object so that it is able to call that object's methods, those methods can be called by the OS on any thread in your process.
I really have no idea how this is supposed to interact with Go runtime,
so I'd start small and would use a single thread with STA model and then
maybe try to make it more complicated if needed.
On the other hand, if you only instantiate external COM objects and not
transfer their descriptors outside (and it appears that's the case),
the threading model should not be relevant. That is, only unless some
code in the WUA API would call some "event-like" method on a COM object you
have instantiated.

GetMessage() while the thread is blocked in SwapBuffers()

Vsync blocks SwapBuffers(), which is what I want. My problem is that, since input messages go to the same thread that owns the window, any messages that come in while SwapBuffers() is blocked won't be processed immediately, but only after the vsync triggers the buffer swap and SwapBuffers() returns. So I have all my compute threads sitting idle instead of processing the scene for rendering in the next frame using the most recent input. I'm particularly concerned with having very low latency. I need some way to access all pending input messages to the window from other threads.
Windows API provides a way to wait for either Windows events or input messages using MsgWaitForMultipleObjects(), yet there's no similar way to wait for a buffer swap together with other things. That's very unfortunate.
I considered calling SwapBuffers() in another thread, but that requires glFinish() to be called in the window's thread before signalling another thread to SwapBuffers(), and glFinish() is still a blocking call so it's not a good solution.
I considered hooking, but that also looks like a dead end. Hooking with WH_GETMESSAGE will have the GetMsgProc() be called not asynchronously, but when the window's thread calls GetMessage()/PeekMessage(), so it's no help. Installing a global hook doesn't help me either due to the need of calling RegisterTouchWindow() with a specific window handle to process WM_TOUCH -- and my input is touch. And, while for mouse and keyboard, you can install low level hooks that capture messages as they're posted to the thread's queue, rather than when the thread calls GetMessage()/PeekMessage(), there appears to be no similar option for touch.
I also looked at wglDelayBeforeSwapNV(), but I don't see what's preventing the OS from preempting a thread sometimes after the call to that function but before SwapBuffers(), causing a miss of the next vsync signal.
So what's a good workaround? Can I make a second, invisible window, that will somehow be always the active one and so get all input messages, while the visible one is displaying the rendering? According to another discussion, message-only windows (CreateWindow with HWND_MESSAGE) are not compatible with WM_TOUCH. Is there perhaps some undocumented event that SwapBuffers() is internally waiting on that I could access and feed to MsgWaitForMultipleObjects()? My target is a fixed platform (Windows 8.1 64-bit) so I'm fine with using undocumented functionality, should it exist. I do want to avoid writing my own touchscreen driver, however.
Out of curiosity, why not implement your entire drawing logic in that other thread? It appears the problem you are running into is that the message pump is driven by the same thread that draws. Since Windows does not let you drive the message pump from a different thread than the one that created the window, the easiest solution would just be to push all the GL stuff into a different thread.
SwapBuffers (...) is also not necessarily going to block. As-per requirements of VSYNC an implementation need only block the next command that would modify the backbuffer while all backbuffers are pending a swap. Triple buffering changes things up a little bit by introducing a second backbuffer.
One possible implementation of triple buffering will discard the oldest backbuffer when it comes time to swap, thus SwapBuffers (...) would never cause blocking (this is effectively how modern versions of Windows work in windowed mode with the DWM enabled). Other implementations will eventually present both backbuffers, this reduces (but does not eliminate) blocking but also results in the display of late frames.
Unfortunately WGL does not let you request the number of backbuffers in a swap-chain (beyond 0 single-buffered or 1 double-buffered); the only way to get triple buffering on Windows is using driver settings. Lowest message latency will come from driving GL in a different thread, but triple buffering can help a little bit while requiring no effort on your part.

Need help in understanding nonarbitrary thread context?

I was reading a MSDN doc about driver synchronization and I come across a statement that goes like this
a driver can wait if
• The driver is executing in a nonarbitrary thread context. That is,
you can identify the thread that will enter a wait state. In practice,
the only driver routines that execute in a nonarbitrary thread context
are the DriverEntry, AddDevice, Reinitialize, and Unload routines of
any driver, plus the dispatch routines of highest-level drivers. All
these routines are called directly by the system
now my question is that why dispatch routines are considered in arbitrary thread context ?? Since read, write and other routines will be invoked when a request will be raised from user space, so we can know that which thread did that in system space ?? My be I am completely messed up or it could be a silly question but still help me coz i am a newbie in windwos.
ok i found the answer in a document :) and here is what it states ..
Although the highest-level drivers receive I/O requests in the context
of the requesting thread, they often forward those requests to their
lower level drivers on different threads. Consequently, you can make
no assumptions about the contents of the user-mode address space at
the time such routines are called

Which Model to use STA/MTA

I'm trying to create COM component, where it will be frequently call by a excel application (excel will load the COM at its initialization) and another process (let say procA) also sends (with high frequency) windows messages to this component. Currently I implemented COM as a STA, however, I experienced that while COM is busy with processing messages from procA, the excel UI get stuck.
Please help me to get around this problem. Can I just create a simple window thread to process messages from procA while keeping COM as STA model ? Or do I need to make COM as MTA model, if so please explain how to handle it .
Thank You
Moving to an MTA requires you to perform all necessary locking to protect the state of your component. And it will add thread switch overhead because Excel's UI runs on a specific thread, which will block1 while calling cross thread into your component. The other process is already incurring the cross process overhead so no real change there.
You could avoid the Excel cross-thread overhead by marking your component model as threading model 'Neutral'—it can still be used from any thread while not being tied to exist in the MTA (i.e. all in process calls will be direct, no thread swicthes). Write it as free threaded (all the locking is still needed) but just change the registration.
Given all the effort to ensure your component is thread safe you may find there is no advantage unless multiple calls into your component can really run concurrently. If you just take a lock for the duration of each method you are not saving anything over being in an STA. Finer grained locking might give an advantage, but you need more detailed analysis of what concurrency is possible, and then profiling to prove you have been able to achieve it. A look at Amdahl's Law will cover these issues.
1 This is very simplistically put... the real situation is rather more complex.

Resources