What's the purpose of multi-threaded apartments?

I understand the reason behind STA, but don't really see the reason for MTA.
A COM object can be loaded without any apartment, right? That means it is already able to take calls asynchronously, since nothing imposes any constraints on it.
Where am I wrong?

First, your assumption is wrong: a COM object cannot be created outside of any apartment. Generally a thread should only create COM objects if it has previously called CoInitialize or CoInitializeEx, which places it in an apartment; otherwise, creation will usually fail. There is the edge case of the implicit multithreaded apartment (if another thread of the same process initialized it), but even then you would be in the MTA, just in an unreliable and hard-to-debug way. No COM object ever exists without being in an apartment.
The reason you want an MTA is that it is not necessarily the only apartment. A process can have one MTA and arbitrarily many STAs. Calls between the MTA and any of the STAs still need to be marshaled; if they weren't, one of the MTA threads could call an STA thread in an unsafe way.
In fact, having at least one STA is the rule rather than the exception: the user interface wants to live in an STA, because it depends on window messages (mouse clicks, for example) being processed in sequence.
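To make the initialization step concrete, here is a minimal C++ sketch (error handling trimmed; the worker function is hypothetical) of a thread choosing its apartment before touching COM:

#include <windows.h>
#include <objbase.h>

DWORD WINAPI WorkerThread(LPVOID)
{
    // Each thread chooses its own apartment before using COM.
    // COINIT_MULTITHREADED joins the single process-wide MTA;
    // COINIT_APARTMENTTHREADED would create a new STA owned by this thread.
    HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    if (FAILED(hr))
        return 1;

    // ... create and use COM objects here (CoCreateInstance, etc.) ...

    CoUninitialize();   // leave the apartment when the thread is done with COM
    return 0;
}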

Related

How does COM avoid deadlocks when 2 objects call into each other?

Let's say there are two apartment-threaded COM objects, located in different apartments. Or maybe they're in different processes altogether. If one object calls a method on another, which in turn calls a method back on the first object, how does COM prevent the whole thing from deadlocking?
What you describe is called reentrancy.
The truth is that COM doesn’t do anything explicit to prevent reentrancy issues. It’s up to the implementer of each object to take precautions where needed, as applicable.
Funny enough, reentrancy in COM is far less common in real life than you would think. Object graphs in COM tend to be mostly trees, which do not exhibit reentrancy. When you have cycles it’s almost always because of objects exposing event-type functionality of some sort, typically Connection Points.
Event callbacks are very limited in scope and they trigger under the explicit control of each object's code, so the programmer can easily time them to occur at safe places (for example at or near the end of a method's body, after all the real work is done). This prevents serious reentrancy issues from developing.
But nothing stops you from coding something dangerous. For example, if an object triggers an event while its internal object state is inconsistent, all bets are off.
You mention deadlocks. Deadlocks require a locking mechanism of some sort (for example a Critical Section) and should be extremely rare to impossible in COM apartments for the reasons listed above. Any object that triggers an event while holding a lock is asking for serious trouble, and a deadlock is not the biggest of its worries: by virtue of being an STA object the reentrant call will run on the same thread, and it will be able to acquire the locks again and proceed right through, which means it’s very likely that the object will corrupt its internal state, cause a crash, or worse. Note that locks in an STA thread only make sense if the resources controlled by the lock are accessible to threads outside the object’s STA.
And finally, nothing in COM stops you from causing an infinite recursion loop and a subsequent stack overflow either. For example, take two COM objects Obj1 and Obj2, with Obj2 implementing an event. Have Obj1 call pObj2->SomeMethod(…), which causes Obj2 to fire the event; have Obj1 listen ("sink") to that event; and have that event handler call SomeMethod() again.
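A deliberately simplified C++ sketch of that scenario; a plain callback interface stands in for a real connection point, and all names are illustrative:

struct IEventSink
{
    virtual void OnSomething() = 0;           // the "event"
};

struct Obj2
{
    IEventSink* m_sink = nullptr;
    void Advise(IEventSink* sink) { m_sink = sink; }
    void SomeMethod()
    {
        // ... real work ...
        if (m_sink) m_sink->OnSomething();    // fire the event
    }
};

struct Obj1 : IEventSink
{
    Obj2* m_obj2 = nullptr;
    void OnSomething() override
    {
        // Calling back into Obj2 fires the event again: unbounded recursion
        // and, eventually, a stack overflow.
        m_obj2->SomeMethod();
    }
};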
UPDATE:
Profound thanks to Remy Lebeau for pointing out in his comment something I had forgotten to discuss, via a link to the CodeGuru article Understanding COM Apartments, Part I. In the process I also learned something new that I should have known about.
There is one aspect of reentrancy and locking to consider: what happens during inter-apartment calls (STA<->STA, STA<->MTA, or even STA<->out-of-proc). During an inter-apartment call the caller's STA thread needs to stall and wait for an answer to the call request; the response cannot (by definition) execute on the same thread. But it can't just fully block (e.g. WaitForSingleObject) waiting for the response, because the thread needs to be able to respond to and process not only potential callbacks to the original object, but also callbacks to any other object inside the same apartment. If it were to fully block, the COM infrastructure itself would be introducing the potential for a deadlock, and you wouldn't even need a dependency cycle between objects. So the COM marshalling infrastructure uses a more complex form of wait that can unblock for a few other situations (Hans Passant points to CoWaitForMultipleHandles, which looks right to me, though I don't know the infrastructure at that level). If an applicable callback occurs, the marshalling infrastructure will unblock and allow that call to enter the apartment and proceed.
This is a form of locking induced by the COM infrastructure itself, rather than one coded explicitly as part of the object's implementation, which is why I hadn't thought of bringing it up. So COM does in fact "do something to prevent deadlocks", but only to prevent deadlock potentials induced by its own infrastructure.
The part that I hadn't consciously realized is that this mechanism is very selective: it only lets through COM calls that form part of the same causality chain, that is, callbacks that are a direct consequence of the call the thread was waiting on. Other COM calls into the apartment have to queue up and wait for that call chain to conclude and for the STA thread to return to its message loop.1
1 It makes complete sense that it needs to be that way, but I don't think I ever realized it.
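For a rough idea of what such a wait looks like at the API level, here is a minimal sketch; hDone is a hypothetical event handle signalled when the outgoing call completes, and the real marshalling code is considerably more involved than this:

#include <windows.h>
#include <objbase.h>

// Called on an STA thread: wait for hDone without blocking the apartment
// outright, so that COM calls belonging to the pending causality can still
// be dispatched while we wait.
HRESULT WaitWithoutBlockingApartment(HANDLE hDone)
{
    DWORD signalledIndex = 0;
    return CoWaitForMultipleHandles(COWAIT_DEFAULT,   // on an STA, keeps pumping pending COM calls
                                    INFINITE,         // no timeout
                                    1, &hDone, &signalledIndex);
}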

Smalltalk's runtime state

For Javascript, there exists this excellent intro that explains the runtime state: http://latentflip.com/loupe/
For Smalltalk, I have never found a similar overview of how the runtime and image snapshots are structured.
It is said that a Smalltalk image consists of objects that can send each other messages. This creates many questions:
Is only one object ever active at a time?
Is there a "root scheduler" that starts up designated "process" objects?
Does each suspended image have a list of active objects?
What happens if two active objects send a message to a third one?
Can only one message be handled at a time? What is the level of "atomicity"?
How do two active objects communicate?
Does every object have an "inbox" of messages received, but not yet processed?
Does every object have an event loop?
Is only one object ever active at a time?
Yes. While the system can schedule different "processes", which are instances of the class Process running at different priorities, these take control one at a time. Since the scheduling is non-preemptive, processes must explicitly yield or wait on a semaphore (an instance of the class Semaphore).
Is there a "root scheduler" that starts up designated "process" objects?
Yes, the global Processor (an instance of ProcessorScheduler) keeps and manages the prioritized list of processes that are ready to run (the others being the ones that are waiting on some semaphore).
Does each suspended image have a list of active objects?
The suspended image is nothing but the image. So, yes, it has everything in it, in particular the Processor, which knows who the activeProcess is.
What happens if two active objects send a message to a third one?
Messages are sent one at a time (even though they may be interrupted by the Virtual Machine)
Can only one message be handled at a time? What is the level of "atomicity"?
The level of atomicity (non-interruptibility) is essentially the bytecode: message sends, assignments, and so on; in other words, any operation perceived as atomic by the programmer.
How do two active objects communicate?
Objects communicate by means of message sends.
To some extent, the answer to your question depends on the virtual machine being used. While many Smalltalk implementations and derivatives stick pretty close to the original concept, the details may vary a lot.
Since your question suggests you are interested in conceptual answers, I recommend reading the original Smalltalk "blue book":
Smalltalk-80: The Language and Its Implementation. By Adele Goldberg and David Robson, 1983, Addison-Wesley
The book talks in depth about the design of the system and the implementation of the core classes, but also has a few sections at the end providing specifications for the VM, the interpreter, object memory, and so on.

CoInitializeEx(COINIT_MULTITHREADED) and Goroutines using WMI

We have a monitoring agent written in Go that uses a number of goroutines to gather system metrics from WMI. We recently discovered that the program was leaking memory when the Go binary is run on Server 2016 or Windows 10 (and possibly on other OSes running WMF 5.1). After creating a minimal test case to reproduce the issue, it seems the leak only occurs if you make a large number of calls to the ole.CoInitializeEx method (possibly something changed in WMF 5.1, but we could not reproduce the issue using the Python comtypes package on the same system).
We are using COINIT_MULTITHREADED for multithread apartment (MTA) in our application, and my question is this: Because we are issuing OLE/WbemScripting calls from various goroutines, do we need to call ole.CoInitializeEx just once on startup or once in each goroutine? Our query code already uses runtime.LockOSThread to prevent the scheduler from running the method on different OS threads, but the MSDN remarks on CoInitializeEx seem to indicate it must be called at least once on each thread. I am not aware of any way to make sure new goroutines run on an already initialized OS thread, so multiple calls to CoInitializeEx seemed like the correct approach (and worked fine for the last few years).
We have already refactored the code to do all the WMI calls on a dedicated background worker, but I am curious to know if our original code should work using only one CoInitializeEx at startup instead of once for every goroutine.
AFAIK, since the Win32 API is defined only in terms of native OS threads, a call to CoInitialize[Ex]() only ever affects the thread it completed on.
Since the Go runtime uses free M×N scheduling of the goroutines to OS threads, and these threads are created / deleted as needed at runtime in a manner completely transparent to the goroutines, the only way to make sure the CoInitialize[Ex]() call has any lasting effect on the goroutine it was performed on is to first bind that goroutine to its current OS thread by calling runtime.LockOSThread() and doing this for every goroutine intended to do COM calls.
Please note that this basically creates a 1×1 mapping between goroutines and OS threads, which defeats much of the purpose of goroutines to begin with. So you might want to consider having just a single goroutine calling into COM and listening for requests on a channel, or having a pool of such worker goroutines hidden behind another one which dispatches the clients' requests onto the workers.
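The underlying Win32 rule is easiest to see outside of Go: initialization is strictly per OS thread, so every thread that makes COM calls does its own initialize/uninitialize pair. A minimal C++ sketch of just that rule (not tied to the Go runtime in any way):

#include <windows.h>
#include <objbase.h>
#include <thread>

void ComWorker()
{
    // CoInitializeEx only affects the calling OS thread, so each worker
    // thread performs its own initialize/uninitialize pair.
    if (SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)))
    {
        // ... per-thread WMI / WbemScripting calls would go here ...
        CoUninitialize();
    }
}

int main()
{
    std::thread t1(ComWorker), t2(ComWorker);  // each thread initializes itself
    t1.join();
    t2.join();
}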
Update regarding COINIT_MULTITHREADED.
To cite the docs:
Multi-threading (also called free-threading) allows calls to methods of objects created by this thread to be run on any thread. There is no serialization of calls — many calls may occur to the same method or to the same object or simultaneously. Multi-threaded object concurrency offers the highest performance and takes the best advantage of multiprocessor hardware for cross-thread, cross-process, and cross-machine calling, since calls to objects are not serialized in any way. This means, however, that the code for objects must enforce its own concurrency model, typically through the use of synchronization primitives, such as critical sections, semaphores, or mutexes. In addition, because the object doesn't control the lifetime of the threads that are accessing it, no thread-specific state may be stored in the object (in Thread Local Storage).
So basically, the COM threading model has nothing to do with the initialization of the threads themselves, but rather with how the COM subsystem is allowed to call the methods of the COM objects you create on the COM-initialized threads.
IIUC, if you COM-initialize a thread as COINIT_MULTITHREADED, create some COM object on it, and then pass a reference to some outside client so that it is able to call that object's methods, those methods can be called on any thread in your process.
I really have no idea how this is supposed to interact with the Go runtime, so I'd start small with a single thread using the STA model and then maybe make it more complicated if needed.
On the other hand, if you only instantiate external COM objects and do not pass their references outside (and it appears that's the case), the threading model should not be relevant: that is, unless some code in the WMI API were to call some "event-like" method on a COM object you have instantiated.

Which model to use: STA or MTA?

I'm trying to create a COM component that will be called frequently by an Excel application (Excel loads the COM component at initialization), while another process (say, procA) also sends Windows messages to this component at high frequency. Currently I have implemented the component as an STA; however, I have found that while the component is busy processing messages from procA, the Excel UI gets stuck.
Please help me get around this problem. Can I just create a simple window thread to process messages from procA while keeping the component as an STA? Or do I need to make the component an MTA? If so, please explain how to handle it.
Thank You
Moving to an MTA requires you to perform all the necessary locking to protect the state of your component. It will also add thread-switch overhead, because Excel's UI runs on a specific thread, which will block1 while calling cross-thread into your component. The other process is already incurring the cross-process overhead, so no real change there.
You could avoid the Excel cross-thread overhead by marking your component's threading model as 'Neutral': it can still be used from any thread while not being tied to the MTA (i.e. all in-process calls will be direct, with no thread switches). Write it as free-threaded (all the locking is still needed) but just change the registration.
Given all the effort needed to ensure your component is thread-safe, you may find there is no advantage unless multiple calls into your component can really run concurrently. If you just take a lock for the duration of each method, you are not saving anything over being in an STA. Finer-grained locking might give an advantage, but you need a more detailed analysis of what concurrency is possible, and then profiling to prove you have been able to achieve it. A look at Amdahl's Law will cover these issues.
1 This is very simplistically put... the real situation is rather more complex.
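To illustrate the point about per-method locking, here is a hedged C++ sketch (the class, interface, and method names are made up): one object-wide lock held for the whole call serializes callers just as an STA already would, so moving to the MTA gains nothing in this shape.

#include <windows.h>
#include <mutex>

class MyComponent /* : public IMyComponent */
{
    std::mutex m_lock;   // one coarse, object-wide lock
    long m_state = 0;    // stand-in for the component's shared state
public:
    HRESULT DoWork()
    {
        std::lock_guard<std::mutex> guard(m_lock); // held for the entire call,
        ++m_state;                                 // so concurrent callers queue up
        return S_OK;                               // just as they would in an STA
    }
};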

Can I be sure that the code I write is always executed in the same thread?

I normally work on single-threaded applications and have generally never really bothered with dealing with threads. My understanding of how things work - which may certainly be wrong - is that as long as we're always dealing with single-threaded code (i.e. no forks or anything like that) it will always be executed in the same thread.
Is this assumption correct? I have a fuzzy idea that UI libraries/frameworks may spawn off threads of their own to handle GUI stuff (which accounts for the fact that the Windows task manager tells me that my 'single threaded' application is actually running on 10 threads) but I'm guessing that this shouldn't affect me?
How does this apply to COM? For instance, if I were to create an instance of a COM component in my code, and that COM component writes some information to a thread-based location (using System.Threading.Thread.GetData, for instance), will my application be able to get hold of that information?
So in summary:
In single-threaded code, can I be sure that whatever I store in a thread-based location is retrievable from anywhere else in the code?
If that single-threaded code were to create an instance of a COM component which stores some information in a thread-based location, would that be similarly retrievable from anywhere else?
UI usually has the opposite constraint (sadly): it's single-threaded, and everything must happen on that thread.
The easiest way to check that you are always on the same thread (in, say, a particular class) is to keep an integer field initialized to -1 and a check function like this (assuming C#):

private int m_ThreadId = -1;   // not yet bound to any thread

void AssertSingleThread()
{
    // Bind to whichever thread calls first, then assert all later calls are on it.
    if (m_ThreadId < 0) m_ThreadId = Thread.CurrentThread.ManagedThreadId;
    Debug.Assert(m_ThreadId == Thread.CurrentThread.ManagedThreadId);
}
That said:
I don't really understand question #1. Why store something in a thread-based location if your purpose is to have global scope?
About the second question: most COM code runs on a single thread and, most often, on the thread where your UI message processing lives. This is because most COM code is designed to be compatible with VB6, which is single-threaded.
The reason your program has about 10 threads is that both Windows (if you use some of its features, like completion ports or certain kinds of timers) and the CLR (for example for the GC or, again, some types of timers) may create threads in your process space (technically, any program with enough privileges can too).
Think about the model of having a single dataStore class running in your main thread that all threads read their values from and write them to. This will avoid a lot of the problems that arise from accessing data across threads all over the shop.
A simple idea, until you reach the fun part of threading: concurrency and synchronization. Put simply, if you have two threads that want to read and write the same variable inside dataStore at the same time, you have a problem.
Java handles this by allowing you to declare a method or block synchronized, allowing only one thread access at a time.
I believe some .NET objects have Lock and Synchronized methods defined on them, but I know no more than this.

Resources