COMGLB_EXCEPTION_DONOT_HANDLE_ANY does not always work for unhandled C++ exceptions - windows

TL;DR
Visual Studio 2019, Windows 10 (so far tested only on 1809 LTSC, because that is my dev machine).
We have an out-of-process COM server
We set COMGLB_EXCEPTION_DONOT_HANDLE_ANY
"Fatal" SEH exceptions are handled OK.
"Non-Fatal" SEH exceptions, among these C++ exceptions, are randomly swallowed or handled by the COM/RPC runtime stack.
Does COMGLB_EXCEPTION_DONOT_HANDLE_ANY work reliably? Are there any additional settings?
Necessary Background
When using COM, the RPC layer will catch (and possibly swallow) all structured (SEH) exceptions, which include C++ exceptions. Raymond explains this very well:
Historically, COM placed a giant try/except around your server’s
methods. If your server encountered what would normally be an
unhandled exception, the giant try/except would catch it and turn it
into the error RPC_E_SERVERFAULT. It then marked the exception as
handled, so that the server remained running ... Mind you, this was
actually a disservice
Now there is a supposed solution, namely IGlobalOptions with setting COMGLB_EXCEPTION_DONOT_HANDLE_ANY.
This is supposed to (to quote The Old New Thing):
... then go ahead and let the process crash.” In Windows 7, you can
ask for the even stronger COMGLB_EXCEPTION_DONOT_HANDLE_ANY, which
means “Don’t even try to catch ‘nonfatal’ exceptions.”
You can even find this recommendation in the docs:
It's important for applications that detect crashes and other
exceptions that might be generated while executing inbound COM calls,
... to set COMGLB_EXCEPTION_HANDLING to COMGLB_EXCEPTION_DONOT_HANDLE
to disable COM behavior of catching exceptions.
And the option is explained as:
COMGLB_EXCEPTION_DONOT_HANDLE_ANY:
When set and a fatal exception
occurs in a COM method, this causes the COM runtime to not handle the
exception. (caveat A)
When set and a non-fatal exception occurs in a COM method, this causes
the COM runtime to create a Windows Error Reporting (WER) dump and
terminate the process. Supported in Windows 7 and later. (caveat B)
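For reference, the option quoted above is set through IGlobalOptions once COM has been initialized. The following is a minimal sketch based on the documented CoCreateInstance(CLSID_GlobalOptions) pattern. The function name let_com_exceptions_crash is made up for illustration, and the non-Windows stub exists only so the snippet compiles on other toolchains.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <objbase.h>   // CoCreateInstance, IID_PPV_ARGS, IGlobalOptions

// Hypothetical helper name. Assumes the caller has already called
// CoInitializeEx on this thread; link with ole32.lib and uuid.lib.
bool let_com_exceptions_crash()
{
    IGlobalOptions* options = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_GlobalOptions, nullptr,
                                  CLSCTX_INPROC_SERVER,
                                  IID_PPV_ARGS(&options));
    if (FAILED(hr))
        return false;   // e.g. CO_E_NOTINITIALIZED when COM is not set up yet

    // Ask the COM/RPC layer not to catch exceptions raised in server methods.
    hr = options->Set(COMGLB_EXCEPTION_HANDLING,
                      COMGLB_EXCEPTION_DONOT_HANDLE_ANY);
    options->Release();
    return SUCCEEDED(hr);
}
#else
// Stub so the sketch still compiles where COM does not exist.
bool let_com_exceptions_crash() { return false; }
#endif
```

A server would presumably call this once at startup, as early as possible after CoInitializeEx, and treat a false return as a reason to log a warning.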
And here's the thing
Neither of the above two statements is really accurate, but specifically for non-fatal exceptions, which C++ exceptions are, we get random behavior:
I have set up a simple client/server COM test program in VS2019 that intentionally generates an unhandled C++ exception. At runtime there are two modes, seemingly chosen at random:
The server is terminated by the COM/RPC stack and we get an ID 1000 entry in the event log with the exception code 0xe06d7363 (and a WER dump is written). The client gets HRESULT 0x800706BE in this case.
This is the advertised behavior.
When starting the client -> server a second (or third, ...) time, the C++ exception DOES NOT terminate the server, and the client gets 0xe06d7363 as the HRESULT of its server call. No event log entry is written!
For those "fatal" SEH exceptions the termination happens reliably; but not for the non-fatal ones.
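The two codes seen in these modes can be decoded with a little arithmetic, which may help when reading logs. A small sketch (the helper names are made up for illustration): 0xe06d7363 is the SEH code the MSVC runtime uses for C++ exceptions, and 0x800706BE is the HRESULT form of the Win32 error RPC_S_CALL_FAILED (1726).

```cpp
#include <cstdint>

// SEH exception code used by MSVC for a C++ throw: the byte 0xE0 followed
// by the ASCII characters 'm', 's', 'c'.
constexpr std::uint32_t msvc_cpp_exception_code()
{
    return (0xE0u << 24) |
           (static_cast<std::uint32_t>('m') << 16) |
           (static_cast<std::uint32_t>('s') << 8) |
            static_cast<std::uint32_t>('c');
}

// Same arithmetic as the HRESULT_FROM_WIN32 macro for a non-zero error:
// failure bit | FACILITY_WIN32 (7) | low 16 bits of the Win32 error.
constexpr std::uint32_t hresult_from_win32(std::uint32_t win32_error)
{
    return 0x80000000u | (7u << 16) | (win32_error & 0xFFFFu);
}

static_assert(msvc_cpp_exception_code() == 0xE06D7363u,
              "code seen in the event log and, in the second mode, by the client");
static_assert(hresult_from_win32(1726u) == 0x800706BEu,
              "RPC_S_CALL_FAILED as seen by the client in the first mode");
```

Read this way, the client in the first mode only learns that the remote call failed, while in the second mode the raw exception code is handed back in place of an HRESULT.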
What is going on here?

Related

Detect UI operation which will "hang" the application if running in service mode

Fellow experts!
I am facing the following dilemma: some of our tools (executables) are started as scheduled tasks, some are started as services, and others as ordinary desktop apps with an interactive Windows user. We use a code-sharing strategy for source management (this is not debatable for this question).
So the solution I want to find is the following:
Detect, at run time, any UI operation that would hang a service or background task (for example a call to Application.ShowException, ShowMessage, MessageDialog, TForm.Show, etc.). When such an action is detected, I want to raise an exception instead. The operation will then fail and we will have a stack trace, but the process will not hang. The most problematic hang occurs when event processing runs inside a transaction and some of the code handling the event unexpectedly (because of an error in code, design, whatever) executes UI code: the process then hangs and parts of the DB can stay locked!
What I think I need to do is use the DDetours library to intercept WinAPI calls to certain routines and raise an exception instead (so that the process does not hang, but just fails in some method). I also know that creating forms and windows does not hang the app; only attempts to show them to the user do.
Is there a known method of handling this problem? Or is there a list of WinAPI routines that hang in service mode?
Thank you in advance.

Any method to catch a fatal error triggered within cgo code?

Our applications use the ODBC driver to access an Impala database. We've discovered that in certain difficult-to-replicate situations, the driver will trigger a segfault within its cgo code, which manifests as a fatal error once it propagates back up through the driver and to our code. Since we want some cleanup and alerting to happen in these situations, I implemented a deferred panic catcher, hoping this might catch them.
However, it isn't working. The fatal error continues straight past the deferred function containing the recover() call (so apparently it's not a panic, despite the print output looking similar), though it does catch other panics. A GitHub issue suggests that cgo signals cannot be caught, and that applications should gracelessly and immediately crash if one occurs. This is an unacceptable crash case for our production applications, so I'm wondering if that's changed in the last 6 years, or if anyone knows of another way of running some cleanup code in the event of a cgo signal. It seems like extremely poor design to have no way at all to catch and handle these fatal errors.

Windows stack guard page not triggering in _chkstk

I have seen a few crashes "in the wild" where the crash dump shows the code throwing an access violation inside _chkstk when attempting to expand the stack. Windbg shows that _chkstk is touching the guard page, however rather than expanding the stack as it should, it just throws an access violation.
I suspected this might be due to user mode structured exception handlers in the code, however my testing shows that under normal conditions the _chkstk guard page exceptions happen in kernel mode and never even reach the user mode exception handlers.
Hence in this case it looks like the kernel mode guard page exceptions are not being handled for some reason, and instead user mode access violations are triggered.
What could cause this to happen?
This turned out to be an XP/Server 2003 kernel-specific issue. On those OSes, if one thread reads another thread's stack, the guard page and TIB state get messed up and any subsequent attempt to grow the stack (_chkstk) results in an access violation. This does not happen on later OSes.
In our case we were writing out an in-process minidump containing thread stacks, which would corrupt the stack state as described when the dbghelp library read each thread's stack.
The moral of the story is that it is not safe to generate in-process minidumps; they should always be generated by an external process.

What is "Microsoft C++ Visual Runtime Library: Runtime error!" and how can I capture it?

Occasionally I receive a report from a user that the application has terminated itself with the following message box:
Microsoft C++ Visual Runtime Library
Runtime error!
Program: XXXXX.exe
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Unfortunately the application terminates silently after showing the message. We generate crash dumps on structured exceptions, but as there is no exception here, no crash dump is generated.
What can be causing this message?
Is there some way to change the application so that instead of (or in addition to) showing the message, a minidump is generated (or some other custom handling is done by the application)?
The message is produced by abort(), which can be called either directly, or by badly designed exceptions - see unexpected() or terminate(), as described in Disable Microsoft Visual C++ Runtime Error. Whether the message is shown or not can be adjusted using _set_abort_behavior call. On XP and later the application should create a minidump by default and send it to Windows Error Reporting service. If you need a custom handler (e.g. custom crash dump), the only (non-standard) possibility seems to be to provide your own implementation for the abort() function.
The default implementation of abort in the Microsoft C Runtime Library does the following:
shows the message box or prints the message to the console
raises the handler for SIGABRT if there is any
if fault reporting is allowed, then
  deletes any handler for unhandled exceptions using SetUnhandledExceptionFilter(NULL)
  executes UnhandledExceptionFilter with artificially prepared exception information
calls _exit(3) to terminate the process without any additional cleanup
Including the following code in your source makes the application perform default structured exception handling (including any filter you may have installed):
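// Replaces the CRT's abort(): fault with a real SEH exception instead,
// so the normal unhandled-exception path (and any installed filter) runs.
extern "C" void __cdecl abort (void)
{
    volatile int a = 0;
    a = 1/a;  // integer divide by zero raises a structured exception
}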
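A less invasive variant, assuming the SIGABRT step listed above runs in your CRT version, is to register a SIGABRT handler rather than replacing abort() itself. This is only a sketch: the names g_abort_seen, on_sigabrt and install_abort_hook are made up, and the place where a real application would write its minidump is left as a comment.

```cpp
#include <csignal>

// Flag written from the signal handler. sig_atomic_t is the type the
// standard guarantees can be written safely from a handler.
volatile std::sig_atomic_t g_abort_seen = 0;

// Hypothetical handler: runs when SIGABRT is raised.
extern "C" void on_sigabrt(int)
{
    g_abort_seen = 1;
    // A real application would trigger its custom crash handling here,
    // for example by signalling an external dump writer (not shown).
}

// Registers the handler; returns false if registration failed.
bool install_abort_hook()
{
    return std::signal(SIGABRT, on_sigabrt) != SIG_ERR;
}
```

When abort() itself raises the signal, the process still terminates after the handler returns, so the handler is a last chance to record state rather than a way to recover.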
The application most likely called abort() because terminate() was called, either after an exception escaped a destructor during stack unwinding or because an exception was not caught.
See an answer to this related question for details. Basically you have to catch and handle all exceptions at the top level and not let exceptions escape destructors. Start your program under a debugger and enable "Stop when exception is thrown" to find out what exactly is going wrong inside, then fix that.
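The advice above can be sketched roughly as follows. run_guarded and Connection are hypothetical names used only for illustration: the first is a top-level wrapper that turns any escaping exception into an error message, the second shows a destructor that contains its own failures.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical top-level guard: runs `body` and returns an empty string on
// success, or a description of the exception that tried to escape.
std::string run_guarded(const std::function<void()>& body)
{
    try {
        body();
        return "";
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";   // non-std exceptions, e.g. `throw 42;`
    }
}

// Hypothetical resource whose cleanup can fail.
class Connection {
public:
    // For the sake of the example, closing always fails.
    void close() { throw std::runtime_error("close failed"); }

    ~Connection()
    {
        // Never let the failure leave the destructor: if it escaped during
        // stack unwinding, terminate() (and then abort()) would be called.
        try {
            close();
        } catch (...) {
            // Log and continue; nothing else is safe here.
        }
    }
};
```

In a real program the body of main (or of each thread entry point) would go through something like run_guarded, and the returned message would be logged before exiting with an error code.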

What errors / exceptions trigger Windows Error Reporting?

When running a Delphi application outside the debugger, most exceptions that occur seem to be silently ignored (like an access violation). Sometimes, however, the Windows Error Reporting dialog appears (send or not send, you probably know what I mean). What exactly does this mean? What errors trigger this behaviour?
Additional info: I have a global exception handler for my application that should log all unhandled exceptions. So, no exceptions should leave the application unhandled.
Thanks.
Most exceptions are not silently ignored when running outside the debugger. They are normally caught by the event loop in VCL applications, or fall through to the main begin/end in console applications, etc. The default action of the VCL event loop is to display a dialog containing the message associated with the exception.
It's if the exception escapes the application, either by reaching the main begin/end without being caught, or not being caught by the event loop, that the Windows error reporting steps in - functionally, it is an exception handler just like any other except at the very base of the stack.
It covers exceptions that are not handled by the application - if an exception propagates outside of the main entry point of the app, then WER will step in. This covers things like AVs, divide by zero, invalid handle access and other out of band or "chip" exceptions. Sometimes your code can attempt to handle those things, but if memory is corrupted too badly or what have you, then your code will die.
You will generally get problems if you have exceptions in threads that are not handled in the Execute method. The program will mostly be killed, but the behaviour is unpredictable and seems to depend on many things (like the number and state of other threads). Often the main window vanishes immediately, and any further exceptions will thus not be handled by the program, and this is probably what causes WER to catch them.
I made it a habit to have an outer exception handler in Execute that logs any unhandled exceptions and allows the thread to terminate cleanly.
Exceptions occurring in initialization and finalization sections would escape your global exception handler and trigger WER.

Resources