Implementation of MTA COM server - winapi

I can't find any source code showing the prerequisites of an MTA-compliant COM object. I tried changing the ThreadingModel registry key of my object from Apartment to Both, and it results in a crash when a secondary thread calls the method, before any data is accessed.
If STA COM objects require a message pump, what kind of plumbing code do MTA COM objects require?

I do not think that there is anything special about MTA, except that you need to use synchronization primitives like mutexes to synchronize access to your internal structures. Does the "Multithreaded Apartments" documentation page not give you all that you need?
Quoting from the documentation, emphasis is mine:
Because calls to objects are not serialized in any way, multithreaded object concurrency offers the highest performance and takes the best advantage of multiprocessor hardware for cross-thread, cross-process, and cross-machine calling. This means, however, that the code for objects must provide synchronization in their interface implementations, typically through the use of synchronization primitives such as event objects, critical sections, mutexes, or semaphores, which are described later in this section. In addition, because the object doesn't control the lifetime of the threads that are accessing it, no thread-specific state may be stored in the object (in thread local storage).

CoInitializeEx(COINIT_MULTITHREADED) and Goroutines using WMI

We have a monitoring agent written in Go that uses a number of goroutines to gather system metrics from WMI. We recently discovered the program was leaking memory when the Go binary is run on Server 2016 or Windows 10 (and possibly on other OS versions using WMF 5.1). After creating a minimal test case to reproduce the issue, it seems that the leak only occurs if you make a large number of calls to the ole.CoInitializeEx method (possibly something changed in WMF 5.1, but we could not reproduce the issue using the python comtypes package on the same system).
We are using COINIT_MULTITHREADED for multithread apartment (MTA) in our application, and my question is this: Because we are issuing OLE/WbemScripting calls from various goroutines, do we need to call ole.CoInitializeEx just once on startup or once in each goroutine? Our query code already uses runtime.LockOSThread to prevent the scheduler from running the method on different OS threads, but the MSDN remarks on CoInitializeEx seem to indicate it must be called at least once on each thread. I am not aware of any way to make sure new goroutines run on an already initialized OS thread, so multiple calls to CoInitializeEx seemed like the correct approach (and worked fine for the last few years).
We have already refactored the code to do all the WMI calls on a dedicated background worker, but I am curious to know if our original code should work using only one CoInitializeEx at startup instead of once for every goroutine.
AFAIK, since Win32 API is defined only in terms of native OS threads, a call to CoInitialize[Ex]() only ever affects the thread it completed on.
Since the Go runtime uses free M×N scheduling of the goroutines to OS threads, and these threads are created / deleted as needed at runtime in a manner completely transparent to the goroutines, the only way to make sure the CoInitialize[Ex]() call has any lasting effect on the goroutine it was performed on is to first bind that goroutine to its current OS thread by calling runtime.LockOSThread() and doing this for every goroutine intended to do COM calls.
Please note that this basically creates a 1×1 mapping between goroutines and OS threads, which defeats much of the purpose of goroutines to begin with. So you might want to consider having just a single goroutine calling into COM and listening for requests on a channel, or having a pool of such worker goroutines hidden behind another one which would dispatch the clients' requests onto the workers.
Update regarding COINIT_MULTITHREADED.
To cite the docs:
Multi-threading (also called free-threading) allows calls to methods
of objects created by this thread to be run on any thread. There is no
serialization of calls — many calls may occur to the same method or
to the same object or simultaneously. Multi-threaded object
concurrency offers the highest performance and takes the best
advantage of multiprocessor hardware for cross-thread, cross-process,
and cross-machine calling, since calls to objects are not serialized
in any way. This means, however, that the code for objects must
enforce its own concurrency model, typically through the use of
synchronization primitives, such as critical sections, semaphores, or
mutexes. In addition, because the object doesn't control the lifetime
of the threads that are accessing it, no thread-specific state may be
stored in the object (in Thread Local Storage).
So basically the COM threading model has nothing to do with initialization of the threads themselves, but rather with how the COM subsystem is allowed to call the methods of the COM objects you create on the COM-initialized threads.
IIUC, if you COM-initialize a thread as COINIT_MULTITHREADED, create some COM object on it, and then pass its reference to some outside client of that object so that it is able to call that object's methods, those methods can be called by the OS on any thread in your process.
I really have no idea how this is supposed to interact with Go runtime,
so I'd start small and would use a single thread with STA model and then
maybe try to make it more complicated if needed.
On the other hand, if you only instantiate external COM objects and do not
transfer their descriptors outside (and it appears that's the case),
the threading model should not be relevant. That is, only unless some
code in the WUA API would call some "event-like" method on a COM object you
have instantiated.

COM `IStream` interface pointer and access from different threads

Is it an official COM requirement to any IStream implementation, that it should be thread-safe, in terms of concurrent access to IStream methods through the same interface pointer across threads?
I am not talking about data integrity (normally, reads/writes/seeks should be synchronized with locks anyway). The question is about the need to use COM marshaller to pass IStream object to a thread from different COM apartment.
This is a more general question than I asked about IStream as returned by CreateStreamOnHGlobal, please refer there for more technical details. I'm just trying to understand this stuff better.
EDITED, I have found this info on MSDN:
Thread safety. The stream created by SHCreateMemStream is thread-safe
as of Windows 8. On earlier systems, the stream is not thread-safe.
The stream created by CreateStreamOnHGlobal is thread-safe.
Now I believe, the IStream object returned by CreateStreamOnHGlobal is thread-safe, but there is NO requirement that other IStream implementations should follow this.
No, it isn't. And the accepted answer to the other question is dead wrong. Hans Passant's answer is correct. You should delete this question because it presupposes a falsehood, namely that CreateStreamOnHGlobal returns a thread-safe IStream. It doesn't. You then ask if this is true of other IStream implementations. It isn't.
In computer programming generally, and COM in particular, objects have guarantees they give and guarantees they do not give. If you use an object in conformance with its guarantees, then it will work all the time (barring bugs). If you exceed the guarantees, it may still work most of the time, but this is no longer guaranteed.
Generally in COM, the thread-safety guarantee is given by one of the standard threading models.
See here: http://msdn.microsoft.com/en-us/library/ms809971.aspx
Apartment threaded objects can be instantiated on multiple threads, but can only be used from the particular thread they were instantiated on.
Multi-threaded apartment objects can be instantiated in a multi-threaded apartment and can be used from any of those threads.
"Both"-threaded objects can be instantiated in any thread, and used from any thread.
Note: The threading model belongs to the object, not the interface. Some objects supporting IStream may be single-threaded, others may be fully thread-safe. This depends on the code which implements the interface, because an interface is just a specification and thread safety is not something it covers.
It is always harmless to marshal an interface. If the threading models of the threads are compatible with the object's home thread, you will get the exact same interface pointer back. If they are not compatible, you will get a proxy. But it never hurts to marshal, and unless you know that the objects are compatible, you should always marshal.
However it is always open to an implementer to give additional guarantees.
In the case of CoMarshalInterthreadInterfaceInStream, you are told in the documentation that the returned IStream interface can be used for unmarshalling at the destination thread, using CoUnmarshalInterfaceAndReleaseStream.
That is, you have been given an additional guarantee. So you can rely on that working.
But that does not apply to any other instance of IStream at any time.
So you should always marshal them.

Does Go support volatile / non-volatile variables?

I'm new to the language so bear with me.
I am curious how Go handles data storage available to threads, in the sense that non-local variables can also be non-volatile, like in Java for instance.
Go has the concept of a channel, which, by its nature (inter-thread communication), means it bypasses the processor cache and reads/writes to the heap directly.
Also, I have not found any reference to volatile in the Go documentation.
TL;DR: Go does not have a keyword to make a variable safe for multiple goroutines to write/read it. Use the sync/atomic package for that. Or better yet: "Do not communicate by sharing memory; instead, share memory by communicating."
Two answers for the two meanings of volatile
.NET/Java concurrency
Some excerpts from the Go Memory Model.
If the effects of a goroutine must be observed by another goroutine,
use a synchronization mechanism such as a lock or channel
communication to establish a relative ordering.
One of the examples from the Incorrect Synchronization section is an example of busy waiting on a value.
Worse, there is no guarantee that the write to done will ever be
observed by main, since there are no synchronization events between
the two threads. The loop in main is not guaranteed to finish.
Indeed, this code (play.golang.org/p/K8ndH7DUzq) never exits.
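For contrast, here is a sketch of the same hand-off done with a channel instead of a polled flag. The function name `waitForSetup` is made up for this illustration. Closing the channel is a synchronization event, so the receiver is guaranteed to observe the write that preceded it.

```go
package main

import "fmt"

// waitForSetup starts a goroutine that writes msg and then signals
// completion by closing done. The receive from done cannot complete before
// the close, so the write to msg is visible once <-done returns.
func waitForSetup() string {
	var msg string
	done := make(chan struct{})
	go func() {
		msg = "hello, world"
		close(done) // synchronizes with the receive below
	}()
	<-done // blocks instead of spinning on a plain variable
	return msg
}

func main() {
	fmt.Println(waitForSetup()) // hello, world
}
```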
C/C++ non-standard memory
Go's memory model does not provide a way to address non-standard memory. If you have raw access to a device's I/O bus you'll need to use assembly or C to safely write values to the memory locations. I have only ever needed to do this in a device driver which generally precludes use of Go.
The simple answer is that volatile is not supported by the current Go specification, period.
If you do have one of the use cases where volatile is necessary, such as low-level atomic memory access that is unsupported by existing packages in the standard library, or unbuffered access to hardware mapped memory, you'll need to link in a C or assembly file.
Note that if you do use C or assembly as understood by the GC compiler suite, you don't even need cgo for that, since the [568]c C/asm compilers are also able to handle it.
You can find examples of that in Go's source code. For example:
http://golang.org/src/pkg/runtime/sema.goc
http://golang.org/src/pkg/runtime/atomic_arm.c
Grep for many other instances.
For how memory access in Go does work, check out The Go Memory Model.
No, Go does not support the volatile or register keywords.
See this post for more information.
This is also noted in the Go for C++ Programmers guide.
The Go Memory Model documentation explains why the concept of 'volatile' has no application in Go.
Loosely: Among other things, goroutines are free to keep goroutine-local changes cached in registers so those changes are not observable by other goroutines. To "flush" those changes to memory, a synchronization must be performed. Either by using locks or by communicating (channel send or receive).

what is the difference between _EPROCESS object and _KPROCESS object

Upon analysis, I learnt that even _KPROCESS objects can be members of the ActiveProcessLinks list. What is the difference between _EPROCESS and _KPROCESS objects? When is one created and one not? What are the conceptual differences between them?
This is simplified, but the kernel mode portion of the Windows O/S is broken up into three pieces: the HAL, the Kernel, and the Executive Subsystems. The Executive Subsystems deal with general O/S policy and operation. The Kernel deals with processor-architecture-specific details for low level operations (e.g. spinlocks, thread switching) as well as scheduling. The HAL deals with differences that arise in particular implementations of a processor architecture (e.g. how interrupts are routed on this implementation of the x86). This is all explained in greater detail in the Windows Internals book.
When you create a new Win32 process, both the Kernel and the Executive Subsystems want to track it. For example, the Kernel wants to know the priority and affinity of the threads in the process because that's going to affect scheduling. The Executive Subsystems want to track the process because, for example, the Security Executive Subsystem wants to associate a token with the process so we can do security checking later.
The structure that the Kernel uses to track the process is the KPROCESS. The structure that the Executive Subsystems use to track it is the EPROCESS. As an implementation detail, the KPROCESS is the first field of the EPROCESS, so the Executive Subsystems allocate the EPROCESS structure and then call the Kernel to initialize the KPROCESS portion of it. In the end, both structures are part of the Process Object that represents the instance of the user process. This should also all be covered in the Windows Internals book.
-scott
Have a look here:
http://channel9.msdn.com/Shows/Going+Deep/Arun-Kishan-Process-Management-in-Windows-Vista
EPROCESS is the kernel mode equivalent of the PEB from user mode. More details can be found in this document on Alex Ionescu's site as well as the book by Schreiber and other books about the NT internals.
Use dt in WinDbg to get an idea how they look.
EPROCESS is not available in user mode. Neither is KPROCESS.
KPROCESS is a subset of EPROCESS. If you look at the fields in a debugger, you'll see the KPROCESS contains fields more related to scheduling and book-keeping of the process at a lower level, while EPROCESS has higher-level process contexts inside of it. The names, as far as I am aware, come from different subsystems that interact with these structures (the Executive has structures and functions frequently prefixed with Ex while the Kernel has structures and functions frequently prefixed with Ke)
You can see this in different documented functions. Consider the prototype for KeStackAttachProcess ( http://msdn.microsoft.com/en-us/library/ff549659(v=vs.85).aspx ), which is a Ke function and takes a KPROCESS. There aren't any exported and documented Ex functions that accept EPROCESS (or KPROCESS), but Ps functions deal entirely in EPROCESS structures.
A similar divide exists for threads, with KTHREAD and ETHREAD.

Are Interlocked* functions useful on shared memory?

Two Windows processes have memory mapped the same shared file. If the file consists of counters, is it appropriate to use the Interlocked* functions (like InterlockedIncrement) to update those counters? Will those synchronize access across processes? Or do I need to use something heavier, like a mutex? Or perhaps the shared-memory mechanism itself ensures consistent views.
The interlocked functions are intended for exactly that type of use.
From http://msdn.microsoft.com/en-us/library/ms684122.aspx:
The threads of different processes can use these functions if the variable is in shared memory.
Of course, if you need to have more than one item updated atomically, you'll need to go with a mutex or some other synchronization object that works across processes. There's nothing built-in to the shared memory mechanism to provide synchronization to access to the shared memory - you'll need to use the interlocked functions or a synchronization object.
From MSDN:
...
The Interlocked API
The interlocked functions provide a
simple mechanism for synchronizing
access to a variable that is shared by
multiple threads. They also perform
operations on variables in an atomic
manner. The threads of different
processes can use these functions if
the variable is in shared memory.
So, yes, it is safe with your shared memory approach.

Resources