Detect client process termination from an EXE COM server (Windows)

I'm writing an EXE COM server that exposes a class that locks a system resource.
In normal execution the client releases the resource (the COM executable shuts down a couple of seconds later).
In abnormal execution, the client app crashes, leaving the COM server with an instance that has a positive reference count. The COM executable runs for ~12 minutes until termination. This means that the system resource is locked during this time.
Is there a way to detect client termination instantaneously, as in socket IPC or a driver protocol? If not, it would seem that COM is inferior to other IPC mechanisms.

I had the same question a couple of years ago. I found the answer here: How To Turn Off the COM Garbage Collection Mechanism. In short: no, there is no way to detect client termination instantaneously. Excerpts:
When a COM client terminates normally, it releases all references to its server object. When a client terminates abnormally however, there might be outstanding references to the server object. Without a garbage collection mechanism, the server code has no way of knowing when to reclaim the resources allocated for the COM object, which can then cause a resource leak. To address this problem, COM implements an automatic garbage collection mechanism in which the COM resolver process (RPCSS) on the client machine pings the server machine on behalf of the client process.
Alternatives to using COM's GC protocol (for example, using periodic application-level "pings"--method calls that inform the object that clients are still alive, or using an underlying transport mechanism such as TCP keepalives) are demonstrably much less efficient. Therefore, DCOM's default GC mechanism should be used for any objects that must be shut down when their clients disappear or otherwise misbehave if those objects would effectively become memory leaks on the server.
The resolver on the server machine keeps track of the pings for each server object. The ping period is 2 minutes and, currently, it is non-configurable. When the resolver on the server machine detects that an object has not been pinged for 6 minutes, it assumes that all clients of the object have terminated or otherwise are no longer using the object. The resolver will then release all external references to the object. It does this by simply having the object's stub manager (the COM runtime code that delivers calls to each object) call ::Release() on the object's IUnknown interface. At this point, the object's reference count will be zero so far as the COM runtime is concerned. (There may still be references held by local (same-apartment) clients, so the object's internal reference count may not necessarily go to zero at this point.) The object may then shut itself down.
NOTE: Garbage collection applies to all servers regardless of whether their clients are local or remote, or a combination of local and remote. The underlying pinging mechanism is different in the local case as no network packets are generated, but for all practical purposes, the behavior is the same.
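To make the timing in the excerpt concrete, here is a toy model of ping-based cleanup, written in Python purely for illustration. It is not how RPCSS is implemented; the class and method names are made up, and the only figures taken from the excerpt are the 2-minute ping period and the 6-minute timeout.

```python
from dataclasses import dataclass, field

PING_PERIOD_S = 2 * 60   # ping period quoted in the excerpt
TIMEOUT_S = 6 * 60       # an object unpinged for this long is released


@dataclass
class PingTracker:
    """Hypothetical stand-in for the server-side resolver's bookkeeping."""
    last_ping: dict = field(default_factory=dict)  # object id -> time of last ping

    def ping(self, object_id, now):
        """Record that a client pinged the object at time `now` (seconds)."""
        self.last_ping[object_id] = now

    def expired(self, now):
        """Return ids not pinged for TIMEOUT_S or more, and forget them."""
        dead = [oid for oid, t in self.last_ping.items() if now - t >= TIMEOUT_S]
        for oid in dead:
            del self.last_ping[oid]
        return dead


tracker = PingTracker()
tracker.ping("resource-lock", now=0)    # last ping before the client crashes
print(tracker.expired(now=359))         # still within the 6-minute window
print(tracker.expired(now=360))         # window elapsed, object is released
```

In this model a client that crashes right after a ping keeps its object alive for the full timeout, which is why the server cannot learn of the crash immediately.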

Related

Resolve Windows socket error WSAENOBUFS (10055)

Our application has a feature to actively connect to the customers' internal factory network and send a message when inspection events occur. The customer enters the IP address and port number of their machine and application into our software.
I'm using a TClientSocket in blocking mode and have provided callback functions for the OnConnect and OnError events. Assuming the above-mentioned feature has been activated, when the application starts I call the following code in a separate thread:
// Attempt active connection
try
  m_socketClient.Active := True;
except
end;
// Later...
// If `OnConnect` and socket is connected...send some data!
// If `OnError`...call `m_socketClient.Active := True;` again
When IP + port are valid, the feature works well. But if not, after several thousand errors (and many hours or even days) eventually Windows socket error 10055 (WSAENOBUFS) occurs and the application crashes.
Various articles such as this one from ServerFramework and this one from Microsoft talk about exhausting the Windows non-paged pool and mention (1) actively managing the number of outstanding asynchronous send operations and (2) releasing the data buffers that were used for the I/O operations.
My question is how to achieve this, and it is three-fold:
A) Am I doing something wrong that memory is being leaked? For example, is there some missing cleanup code in the OnError handler?
B) How do you monitor I/O buffers to see if they are being exhausted? I've used Process Explorer to confirm my application is the cause of the leak, but ideally I'd need some programmatic way of measuring this.
C) Apart from restarting the application, is there a way to ask Windows to clear out or release I/O operation data buffers?
Code samples in Delphi, C/C++, or C# are fine.
A) The cause of the resource leak was a programming error. When the OnError event occurs, Socket.Close() should be called to release low-level resources associated with the socket.
B) The memory leak does not show up in the standard Working Set memory use of the process. Instead, the open handles belonging to your process need to be monitored, which is possible with GetProcessHandleCount. See this answer in Delphi, which was tested and works well. This answer in C++ was not tested, but it is accepted, so it should work. Of course, you should be able to use GetProcessHandleCount directly in C++.
C) After much research, I must conclude that just like a normal memory leak, you cannot just ask Windows to "clean up" after you! The handle resource has been leaked by your application and you must find and fix the cause (see A and B above).
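The fix from (A) amounts to closing the socket on every failed attempt before retrying. Here is a minimal sketch of that pattern, in Python rather than Delphi so it can be run anywhere; the function name and parameters are hypothetical and this is not the TClientSocket API.

```python
import socket
import time


def connect_with_retry(host, port, attempts=3, delay_s=0.1, timeout_s=1.0):
    """Try to connect; on every failure close the socket before retrying."""
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout_s)
        try:
            sock.connect((host, port))
            return sock        # caller now owns the connected socket
        except OSError:
            sock.close()       # release the OS handle; skipping this leaks it
            time.sleep(delay_s)
    return None                # every attempt failed, nothing left open
```

Each retry creates a fresh socket, so a failed attempt that is never closed would leave one more handle open. That is the kind of slow leak described in the question.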

Sharing Mach ports with child processes

I am doing a comparison of different IPC mechanisms available on Mac OS X (pipes, sockets, System V IPC, etc.), and I would like to see how Mach ports compare to the higher-level alternatives. However, I've run into a very basic issue: getting send rights to ports across processes (specifically, across a parent process and a child process).
Unlike file descriptors, ports are generally not carried over to forked processes. This means that some other way to transfer them must be established. Just about the only relevant page I could find about this was this one, and they state in an update that their method no longer works and never was guaranteed to, even though that method was suggested by an Apple engineer in 2009. (It implied replacing the bootstrap port, and now doing that breaks XPC.) The replacement they suggest uses deprecated functions, so that's not a very appealing solution.
Besides, one thing I liked about the old solution is that ports remained pretty much private between the processes that used it. There was no need to broadcast the existence of the port, just like pipes (from the pipe call) work once forked. (I'll probably live with it if there's another solution, but it's a little annoying.)
So, how do you pass a send right to a Mach port from a parent process to a child process?
bootstrap_register is deprecated but bootstrap_check_in isn't, and can be used to register your port which can later be retrieved by the child process by using bootstrap_look_up. (This still doesn't provide the privacy you're looking for, unfortunately).
The recommended solution is to not use Mach IPC directly at all but to implement your child process as an XPC service. You can then use the XPC API, which uses Mach IPC behind the scenes, without having to deal with any of the details. You have an easy API to send XPC messages in the parent and an easy API to receive XPC messages in the child, which can also pass back replies easily. The system will handle all the hard parts for you.
https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingXPCServices.html
If you cannot use the XPC API, keep in mind that when you register your service with bootstrap_check_in() (which is not deprecated), it won't be private. If you do so in a user space process, however, it will be private to your login session: root processes won't see it, and neither will processes of other users. If you do that in a root process, it will be visible to all sessions, though.
Also note that you can control who may send you IPC messages and who may not. You can request a mach_msg_audit_trailer_t when receiving a Mach message. That way you get access to the audit_token_t of the sender, and using audit_token_to_pid() you can get the pid_t of the sender. As you know the PID of your child, you can simply ignore every message (passing it to mach_msg_destroy() to avoid leaking resources) unless it was sent by your child process. So you cannot prevent your port from being discoverable, but you can prevent any process other than your child process from using it.
Last but not least, you can just give your port a random name. After all, only your child process needs to know it, so you can dynamically generate a name in the parent process and then pass it along to your child process. That way your port can still be seen if software scans for ports, but most software just uses hardcoded names anyway.
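The random-name idea can be sketched as follows. This Python snippet only shows generating a name in the parent and handing it to a child; it performs no Mach or bootstrap calls. The `com.example.ipc` prefix and both function names are placeholders, and the child is a stand-in that simply echoes the name it received.

```python
import subprocess
import sys
import uuid


def make_service_name(prefix="com.example.ipc"):
    """Build a hard-to-guess service name; the prefix is a placeholder."""
    return f"{prefix}.{uuid.uuid4().hex}"


def spawn_child_with_name(name):
    """Start a child process and pass it the name as a command-line argument."""
    child_code = "import sys; print(sys.argv[1])"  # stand-in for the real child
    result = subprocess.run(
        [sys.executable, "-c", child_code, name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # what the child says it received
```

In a real program the parent would register the port under this name and the child would look it up under the same name. Command-line arguments may be visible to other local processes, so this only makes the name harder to guess; it does not make it secret.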
One thing you might try (although it's a gross hack) is hijacking the exception ports as an inheritance mechanism. Set a custom port as an exception port in the parent, fork the child, have the child get the custom port from its exception port, send its task port to the parent, the parent resets its exception port, resets the child's exception port, and then the two proceed from there with a communication channel. See task_set_exception_ports().

Is it true that COM services can't be used by multiple programs at the same time?

Before the application terminates its execution, COM must be shut down again. (Failure to shut down COM could result in execution errors when another program attempts to use COM services.)
The above quote implies that, right?
No it doesn't.
If you fail to properly release all references to an out of process COM server and correctly close down COM it could lead to that instance of that service being in an odd state (everything should be OK after releasing all references, but sometimes COM might cache part of the out of process marshalling layer).
An out of process COM service can be designed to have separate component instances for each client (within or across services) that are completely independent (even if hosted in the same process), in which case it is hard to see how a failure of one client would affect other instances (other than wasting memory on instances until COM finally times them out). If the instances share state they can of course interfere even if the clients operate perfectly to the rules.
It is rather important that you quote the source of that quote so we can get the context. As near as I can see, you got that from a book about DirectShow programming. What it actually refers to is the need to call CoUninitialize().
Yes, that's kinda important. A thread should call CoInitializeEx() to initialize the COM infrastructure before it starts using any of the COM API functions. You really should call CoUninitialize() when that thread ends so stuff is properly cleaned up, typically at the end of your program's main() function. Failure to do so may make another app fail when it finds a registered class factory that is in fact dead.
This otherwise has nothing to do with a COM out-of-process server having to restrict itself in any way. You specify sharing mode with the REGCLS argument to CoRegisterClassObject(). Of course, a server should not exit and call CoUninitialize until all its objects are released.

What determines how long an out-of-process COM server takes to notice that a client has died?

In a simple Windows setup we have a COM singleton that runs as an out-of-process server.
Clients connect by calling cocreate and each receives an interface to the same instance of the server.
If clients shut down normally they release their references.
The server has a bit of logic that keeps it alive for a short time after the last release to allow for new connections.
I'm interested in one special case: the server is running with only one client, which crashes (consider this to be any random unknown crash) and exits without having released its references.
I observe that after an undefined period of time, say 8 minutes, the server receives release calls on the stubs of any objects whose interfaces it had returned to the client. This appears to be an automatic cleanup that I assume is started by the LRPC layer.
Is this documented anywhere, and is the timeout configurable?
Note: Multithreaded apartment model used throughout.
This article, https://web.archive.org/web/20171228092925/http://www.microsoft.com/msj/0398/dcom.aspx, under DCOM Garbage Collection, seems to indicate that DCOM uses a 120-second ping period and that three consecutive pings must be missed, so after about 6 minutes the client will be considered disconnected. Unfortunately, it also indicates that the timeout isn't user-configurable, and I cannot find anything to the contrary.

Cocoa Distributed Objects, GC client, non-GC server

I have a setup where there are two Cocoa processes, communicating with Distributed Objects (DO). The client is using garbage collection, the server is not.
It seems that the client hangs on to the distant objects outside of my direct references to them. This means that even after I don't have references to the objects, they hang around owned by NSDistantObjectTableEntry. Obviously they don't get deallocated on the server.
Only when the client quits does it let go of all the distant objects. Breaking the connection manually would probably also work, but I don't want to do that while the client is running.
Is there a way to tell a GC'd DO client to let go of the distant objects that aren't referenced locally anymore?
There may be a retain cycle that spans the client and server, i.e. a client object is retaining a proxy of a server object, which is in turn retaining the proxy of the client's object.
That's a very simple example of a retain cycle. When more than two objects are involved, it gets more complicated to diagnose.
See The Subtle Dangers Of Distributed Objects for examples of other DO-related gotchas.
