Can I disable the unhandled exception handler for a process? - windows

I am performing system tests on a SAP system. From time to time, SAP crashes and I'd like to recover from those crashes by resetting the virtual machine to a previously saved state.
My problem is that I cannot detect such crashes reliably. I have created WER LocalDumps registry entries (roughly the values sketched below), but I don't get any dumps.
It seems SAP has registered an unhandled exception handler and performs different tasks on different types of exceptions. Sometimes it shows a message box and terminates the application (e.g. in case of compression errors), sometimes it goes with a so-called Short Dump.
I am neither interested in the message box, nor in the short dump, so I am looking for a way to disable the unhandled exception handler of SAP. This should bring up WER, which writes the dump file and I can take actions to restart my system tests.
For performance reasons, I'd not like to restart the VM on every test.
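For reference, these are roughly the LocalDumps values I mean, as a sketch (the per-application subkey name "sapprocess.exe" and the dump folder are placeholders rather than my exact setup):
    // Sketch: LocalDumps values for one executable (run elevated; links against advapi32).
    // The same values can also be set directly under LocalDumps to apply to every process.
    #include <windows.h>

    int main()
    {
        HKEY key = nullptr;
        const wchar_t* subkey =
            L"SOFTWARE\\Microsoft\\Windows\\Windows Error Reporting\\LocalDumps\\sapprocess.exe";
        if (RegCreateKeyExW(HKEY_LOCAL_MACHINE, subkey, 0, nullptr, 0,
                            KEY_SET_VALUE, nullptr, &key, nullptr) != ERROR_SUCCESS)
            return 1;

        const wchar_t folder[] = L"C:\\Dumps";
        DWORD dumpType = 2;   // 2 = full dump, 1 = minidump
        DWORD dumpCount = 10; // keep at most 10 dumps

        RegSetValueExW(key, L"DumpFolder", 0, REG_EXPAND_SZ,
                       reinterpret_cast<const BYTE*>(folder), sizeof(folder));
        RegSetValueExW(key, L"DumpType", 0, REG_DWORD,
                       reinterpret_cast<const BYTE*>(&dumpType), sizeof(dumpType));
        RegSetValueExW(key, L"DumpCount", 0, REG_DWORD,
                       reinterpret_cast<const BYTE*>(&dumpCount), sizeof(dumpCount));
        RegCloseKey(key);
        return 0;
    }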
I have tried:
I am basically familiar with unhandled exception handlers. I have applied them to my own .NET code successfully.
I looked at SetUnhandledExceptionFilter (MSDN) and similar APIs, but they apply to the calling process only, and I cannot modify SAP's code.
I read about DisableUserModeCallbackFilter, but I don't think it is helpful in my case.
I wonder whether there is a registry setting (e.g. under Image File Execution Options) or a shim that I could activate.

According to Hans Passant's comment (which I take as an authoritative answer),
There is no boss override switch built into the operating system to stop it from doing this.
I finally attached the debugger to SAP GUI while the process was still alive. Starting with all exceptions enabled, I narrowed down the conditions so that WinDbg would break when SAP GUI crashed (first chance, then second chance).

Related

Detect/Redirect core dumps (when software crashes) on Windows

For my work, I need to create a service that will detect an abnormal program termination and, instead of displaying a message to the user (default behavior), send the generated core dump to a remote server.
I'm pretty sure this can be done, but I have absolutely no clue where to start. Are there any APIs or registry settings for this?
Thank you.
One method is to install an unhandled exception filter and write a minidump from it, which you can then upload to some place of your choosing. I wouldn't disregard Windows Error Reporting entirely; it complements any crash reporting of your own. If your application is for public release, registering for Windows Error Reporting is well worthwhile, as you get information about which crashes users are encountering in the wild, and when crashes have been fixed you can add a response to point them to a new version or other relevant information.
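A minimal sketch of the filter-plus-minidump approach, assuming the dump is written from inside the crashing process (the dump path is a placeholder and error handling is omitted):
    // Sketch: top-level filter that writes a minidump of the current process.
    // Link against dbghelp.lib; the dump path is a placeholder.
    #include <windows.h>
    #include <dbghelp.h>

    static LONG WINAPI WriteDumpFilter(EXCEPTION_POINTERS* info)
    {
        HANDLE file = CreateFileW(L"C:\\Dumps\\crash.dmp", GENERIC_WRITE, 0, nullptr,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file != INVALID_HANDLE_VALUE)
        {
            MINIDUMP_EXCEPTION_INFORMATION mei{};
            mei.ThreadId = GetCurrentThreadId();
            mei.ExceptionPointers = info;
            mei.ClientPointers = FALSE;

            MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                              MiniDumpWithDataSegs, &mei, nullptr, nullptr);
            CloseHandle(file);
            // ...queue the dump file for upload here...
        }
        return EXCEPTION_EXECUTE_HANDLER; // let the process terminate
    }

    int main()
    {
        SetUnhandledExceptionFilter(WriteDumpFilter);
        // ... rest of the application ...
    }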
Another tool that may be useful, depending on how your application is deployed in your organisation, is to run ADPlus on a user's machine to collect crash dumps. This is more useful for one-off crashes that seem to affect an individual user but aren't reproducible in a development environment.
Some other useful links:
http://www.debuginfo.com/articles/effminidumps.html - some useful sample code
http://www.codeproject.com/KB/debug/postmortemdebug_standalone1.aspx
It seems my question was either obvious or stupid (or both?), but anyway, I found some interesting pages after doing some research.
Here are the links I found useful:
Track application crashes and disable Windows Error Reporting at the same time!
Disable error reporting

General Protection Fault

How to detect the process that caused a GPF?
I'm not sure I understand your question. A GPF (general protection fault) is a situation where the processor raises a fault.
If this happens in user mode, it is translated into an SEH exception, which in turn may be handled by the process. If it's not handled, the process "crashes": an ugly message box is displayed and the process is terminated (depending on the settings, the process may also be debugged, a crash dump generated, and so on).
If this happens in kernel mode, there are two possibilities. If it happened in a context where exceptions are allowed, an SEH exception is raised and handled (similarly to user mode). If, however, the exception is not handled, or the context in which the GPF happened doesn't allow exceptions, the OS shuts down, displaying the so-called BSOD (blue screen of death).
Now about your question, I see several possibilities:
1. The OS dies, and you want to know which process made the system call that caused the GPF in kernel mode.
This can be discovered with a kernel debugger attached; you'll also see the driver that caused the error.
2. The GPF happens in user mode inside a process, and it's not handled.
This process will crash, and you'll definitely know which process it was.
3. The GPF happens inside the process, is handled, and the process continues to run, and you want to be notified about this.
For this you can attach to the process with a debugger. Whenever an SEH exception occurs inside the process, the OS notifies the debugger.
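A rough sketch of that third option using the Win32 debug API (the process ID is a placeholder, and a real tool would handle the other debug event types too):
    // Sketch: attach to a process and log its first- and second-chance exceptions.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        DWORD pid = 1234; // placeholder: the process to watch
        if (!DebugActiveProcess(pid))
            return 1;
        DebugSetProcessKillOnExit(FALSE); // don't kill the target if this tool exits

        DEBUG_EVENT ev{};
        while (WaitForDebugEvent(&ev, INFINITE))
        {
            DWORD status = DBG_CONTINUE;
            if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT)
            {
                const EXCEPTION_DEBUG_INFO& x = ev.u.Exception;
                printf("exception 0x%08lx (%s chance) in thread %lu\n",
                       x.ExceptionRecord.ExceptionCode,
                       x.dwFirstChance ? "first" : "second", ev.dwThreadId);
                // hand the exception back so the target's own handlers still run
                status = DBG_EXCEPTION_NOT_HANDLED;
            }
            bool gone = (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT);
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
            if (gone)
                break; // the target has exited
        }
        return 0;
    }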

Disabling Windows error reporting (Dr. Watson) for my process

I have an application that hosts some unstable third-party code, which I can't control, in an external process, to protect my main application from the nasty errors it exhibits. My parent process is monitoring the other process and doing "the right thing (tm)" when it fails.
The problem I have is that Dr. Watson is still detecting crashes in the isolated process and attaching to the process on its way down to take a crash dump. This has two problems:
1. Dramatically slowing down the time that it takes for me to detect a failure because the process stays alive while the crash dump is being taken.
2. Showing annoying popups to the user asking if they want to submit the error reports to Microsoft.
Clearly I would prefer to fix the bugs in the child process, but given that it isn't an option, I would like to be able to selectively disable Dr. Watson (and Windows Error Reporting in Vista+) for that process.
I am running some of my own code in the process before handing off to the untrusted bit, so if there is an API that I can call that affects the current process that would be fine.
I am aware of: http://support.microsoft.com/default.aspx/kb/188296 which would disable Dr. Watson for the entire machine. I don't want to do that because it would make me a bad citizen to trash a machine-wide setting.
I am also aware of the WerSetFlags option in Vista+, which would seem to disable Windows Error Reporting for the current process, but I need something that will disable Dr. Watson on earlier OS versions.
The good doctor is invoked when a process does not handle a certain exception. Therefore, the common way to go would be to handle all exceptions yourself. In your case it is much harder, since you don't own the crashing process's code. What you can do, then, is inject your code into the other process at runtime and install an exception handler that swallows the exception causing the crash. When it is caught, gracefully shut down the process.
There are quite a few questions here about injecting code into another process. As for the crash handler, you can either set an unhandled exception filter or add a vectored exception handler. Note that for the latter, you'll have to be careful not to swallow legitimate exceptions that are in fact handled inside the other process; that is, find a way to recognize the crashing exception and make sure it is the only one you handle.
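As a rough illustration of the vectored-handler variant, to be installed by the code you run inside that process (treating every access violation as "the crash" is a placeholder rule; recognizing the right exception is exactly the hard part mentioned above):
    // Sketch: vectored handler that shuts the process down when the crash is seen.
    #include <windows.h>

    static LONG CALLBACK CrashFilter(EXCEPTION_POINTERS* info)
    {
        // Placeholder rule: treat any access violation as "the crash".
        if (info->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
        {
            // ...flush logs, signal the parent process, etc....
            TerminateProcess(GetCurrentProcess(), 0xDEAD); // exit before Dr. Watson/WER reacts
        }
        return EXCEPTION_CONTINUE_SEARCH; // everything else is handled normally
    }

    int main()
    {
        AddVectoredExceptionHandler(1 /* call this handler first */, CrashFilter);
        // ... hand off to the untrusted code ...
    }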
You want to disable the GPF popup: http://blogs.msdn.com/oldnewthing/archive/2004/07/27/198410.aspx
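If I remember that article correctly, the approach boils down to calling SetErrorMode early in the (child) process; a minimal sketch:
    // Sketch: suppress the crash dialog for the current process; child processes
    // created afterwards inherit the error mode.
    #include <windows.h>

    int main()
    {
        SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
        // ... start / run the unstable code ...
    }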

How to reload a crashed process on Windows

How do I reload a crashed process on Windows? Of course, I could run a custom monitoring Windows service, but Firefox, for example, doesn't seem to install such a thing, and yet it can restart itself when it crashes.
On Vista and above, you can use the RegisterApplicationRestart API (sketched below) to have the application automatically restarted when it crashes or hangs.
Before Vista, you need a top-level exception filter that performs the restart, but be aware that running code inside a compromised process isn't entirely secure or reliable.
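A minimal sketch of the RegisterApplicationRestart registration mentioned above (the command-line argument is a placeholder the new instance can check for at startup):
    // Sketch: ask the OS to relaunch this application if it crashes or hangs
    // (Vista and later; the system only restarts applications that have been
    // running for at least 60 seconds, to avoid restart loops).
    #include <windows.h>

    int main()
    {
        // "/restarted" is a placeholder argument for the relaunched instance.
        HRESULT hr = RegisterApplicationRestart(L"/restarted",
                                                RESTART_NO_PATCH | RESTART_NO_REBOOT);
        if (FAILED(hr))
        {
            // Registration failed (or the OS is too old); fall back to a
            // top-level exception filter that restarts the process itself.
        }
        // ... rest of the application ...
    }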
Firefox constantly saves its state to the hard disk, every time you open a tab or click a link, or perform some other action. It also saves a flag saying it shut down safely.
On startup, it reads this all back, and is able to "restore" based on that info.
Structured exception handling (SEH) allows you to catch program crashes and do something when they happen.
See: __try and __except
SEH can be very dangerous though and could lead to your program hanging instead. Please see this article for more information.
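A small sketch of what the __try/__except form looks like with the Microsoft compiler (the crashing call is a deliberately broken placeholder):
    // Sketch: catch a hardware/SEH exception around a risky call (MSVC-specific).
    #include <windows.h>
    #include <cstdio>

    static void RiskyOperation()
    {
        int* p = nullptr;
        *p = 42; // placeholder crash: access violation
    }

    int main()
    {
        __try
        {
            RiskyOperation();
        }
        __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                      ? EXCEPTION_EXECUTE_HANDLER
                      : EXCEPTION_CONTINUE_SEARCH)
        {
            printf("caught access violation, cleaning up / restarting\n");
        }
        return 0;
    }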
If you write your program as an NT service then you can set the first, second and subsequent failure actions to "Restart the service".
For Windows Server 2008, Windows Vista, and Windows 7, you can use the Win32 API RegisterApplicationRestart.
Please see my answer here for more information about dealing with different types of program crashes.
If I recall correctly, Windows implements at least some subset of POSIX and so "must" have the signal interface (things like SIGKILL, SIGSEGV, SIGQUIT, etc.).
I've only ever done this on Linux, but you could try setting a trap for unexpected termination with signal() (signal.h).
From a quick scan of the docs it seems that very few things can be done while handling a signal; it may even be that starting a new process is on the forbidden list.
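For what it's worth, a minimal sketch of that idea; on Windows this relies on the C runtime mapping the access violation to SIGSEGV, which I haven't verified in detail, and the handler deliberately does almost nothing:
    // Sketch: install a SIGSEGV handler with signal(); keep the handler minimal.
    #include <csignal>
    #include <cstdio>
    #include <cstdlib>

    static void OnSegv(int)
    {
        // Only trivial work here; starting a new process from a handler is dubious.
        fputs("fatal: segmentation fault, exiting\n", stderr);
        _Exit(2); // let an external watchdog notice the exit code and restart us
    }

    int main()
    {
        signal(SIGSEGV, OnSegv);
        // ... rest of the program ...
    }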
Now that I've thought about it, I'd probably go with a master/worker pattern: a very simple parent process that does nothing but spawn the worker (which does all the UI and other work). If the worker dies without having set a specific "I'm going to die now" flag (the parent always gets a notification that the spawned process died), the master respawns the worker. The main theme is to keep the master very simple and hard to kill through bugs of its own.
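A bare-bones sketch of such a master on Windows (the worker executable name and the "clean exit" convention are placeholders):
    // Sketch: master that respawns the worker until it exits cleanly (exit code 0).
    #include <windows.h>

    int main()
    {
        for (;;)
        {
            wchar_t cmdline[] = L"worker.exe"; // placeholder; CreateProcess may modify this buffer
            STARTUPINFOW si{}; si.cb = sizeof(si);
            PROCESS_INFORMATION pi{};

            if (!CreateProcessW(nullptr, cmdline, nullptr, nullptr, FALSE,
                                0, nullptr, nullptr, &si, &pi))
                return 1; // can't even start the worker

            WaitForSingleObject(pi.hProcess, INFINITE);

            DWORD exitCode = 1;
            GetExitCodeProcess(pi.hProcess, &exitCode);
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);

            if (exitCode == 0)  // placeholder convention: 0 means "I meant to quit"
                break;
            Sleep(1000);        // brief pause so a crash loop doesn't spin the CPU
        }
        return 0;
    }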

How does a debugger work?

I keep wondering: how does a debugger work? Particularly one that can be 'attached' to an already running executable. I understand that the compiler translates code to machine language, but then how does the debugger 'know' what it is attached to?
The details of how a debugger works will depend on what you are debugging, and what the OS is. For native debugging on Windows you can find some details on MSDN: Win32 Debugging API.
The user tells the debugger which process to attach to, either by name or by process ID. If it is a name then the debugger will look up the process ID, and initiate the debug session via a system call; under Windows this would be DebugActiveProcess.
Once attached, the debugger will enter an event loop much like for any UI, but instead of events coming from the windowing system, the OS will generate events based on what happens in the process being debugged – for example an exception occurring. See WaitForDebugEvent.
The debugger is able to read and write the target process' virtual memory, and even adjust its register values through APIs provided by the OS. See the list of debugging functions for Windows.
The debugger is able to use information from symbol files to translate from addresses to variable names and locations in the source code. The symbol file information is a separate set of APIs and isn't a core part of the OS as such. On Windows this is through the Debug Interface Access SDK.
If you are debugging a managed environment (.NET, Java, etc.) the process will typically look similar, but the details are different, as the virtual machine environment provides the debug API rather than the underlying OS.
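Going back to the native case, a rough illustration of the memory and register access mentioned above, once a debug event has arrived (x64 register names; the handles are opened from the IDs in the event, and error handling is omitted):
    // Sketch: inspect the stopped target after a debug event (ev is a DEBUG_EVENT).
    // The address being read is a placeholder; error handling is omitted.
    #include <windows.h>

    void InspectTarget(const DEBUG_EVENT& ev)
    {
        HANDLE process = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION,
                                     FALSE, ev.dwProcessId);
        HANDLE thread  = OpenThread(THREAD_GET_CONTEXT, FALSE, ev.dwThreadId);

        // Registers of the stopped thread (x64 names; use Eip/Esp on 32-bit).
        CONTEXT ctx{};
        ctx.ContextFlags = CONTEXT_CONTROL | CONTEXT_INTEGER;
        GetThreadContext(thread, &ctx);
        // ctx.Rip, ctx.Rsp, ... are now available for display.

        // A few bytes of the target's memory, e.g. around the instruction pointer.
        BYTE buffer[16] = {};
        SIZE_T bytesRead = 0;
        ReadProcessMemory(process, reinterpret_cast<LPCVOID>(ctx.Rip),
                          buffer, sizeof(buffer), &bytesRead);

        CloseHandle(thread);
        CloseHandle(process);
    }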
As I understand it:
For software breakpoints on x86, the debugger replaces the first byte of the instruction with CC (int3). This is done with WriteProcessMemory on Windows. When the CPU gets to that instruction and executes the int3, a breakpoint exception is raised. The OS receives it, realizes the process is being debugged, and notifies the debugger process that the breakpoint was hit.
After the breakpoint is hit and the process is stopped, the debugger looks in its list of breakpoints, and replaces the CC with the byte that was there originally. The debugger sets TF, the Trap Flag in EFLAGS (by modifying the CONTEXT), and continues the process. The Trap Flag causes the CPU to automatically generate a single-step exception (INT 1) on the next instruction.
When the process being debugged stops the next time, the debugger again replaces the first byte of the breakpoint instruction with CC, and the process continues.
I'm not sure if this is exactly how it's implemented by all debuggers, but I've written a Win32 program that manages to debug itself using this mechanism. Completely useless, but educational.
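A sketch of those two mechanical steps under the same assumptions (x64 register names; the process/thread handles and the breakpoint address would come from the debug event loop):
    // Sketch: software-breakpoint bookkeeping as described above (x64).
    #include <windows.h>

    // Write 0xCC at the target address, remembering the original byte.
    BYTE SetBreakpoint(HANDLE process, LPVOID address)
    {
        BYTE original = 0, int3 = 0xCC;
        SIZE_T n = 0;
        ReadProcessMemory(process, address, &original, 1, &n);
        WriteProcessMemory(process, address, &int3, 1, &n);
        FlushInstructionCache(process, address, 1);
        return original;
    }

    // After the breakpoint hits: restore the byte, back the instruction pointer up
    // over the int3, and set the trap flag so we get a single-step right after.
    void PrepareResume(HANDLE process, HANDLE thread, LPVOID address, BYTE original)
    {
        SIZE_T n = 0;
        WriteProcessMemory(process, address, &original, 1, &n);
        FlushInstructionCache(process, address, 1);

        CONTEXT ctx{};
        ctx.ContextFlags = CONTEXT_CONTROL;
        GetThreadContext(thread, &ctx);
        ctx.Rip -= 1;          // re-execute the original instruction
        ctx.EFlags |= 0x100;   // TF: single-step exception after that instruction
        SetThreadContext(thread, &ctx);
        // On the single-step event, the debugger writes 0xCC back to re-arm the breakpoint.
    }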
In Linux, debugging a process begins with the ptrace(2) system call. This article has a great tutorial on how to use ptrace to implement some simple debugging constructs.
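For comparison, attaching on Linux starts out roughly like this (the PID is a placeholder; the linked article covers the finer-grained requests):
    // Sketch: attach to a running process with ptrace and wait for it to stop (Linux).
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <cstdio>

    int main()
    {
        pid_t pid = 1234; // placeholder: the process to debug

        if (ptrace(PTRACE_ATTACH, pid, nullptr, nullptr) == -1)
        {
            perror("ptrace(PTRACE_ATTACH)");
            return 1;
        }

        int status = 0;
        waitpid(pid, &status, 0); // the target stops once the attach is delivered

        // ...inspect with PTRACE_PEEKDATA / PTRACE_GETREGS, plant breakpoints, etc....

        ptrace(PTRACE_DETACH, pid, nullptr, nullptr); // detach; the target resumes
        return 0;
    }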
If you're on a Windows OS, a great resource for this would be "Debugging Applications for Microsoft .NET and Microsoft Windows" by John Robbins:
http://www.amazon.com/dp/0735615365
(or even the older edition: "Debugging Applications")
The book has a chapter on how a debugger works that includes code for a couple of simple (but working) debuggers.
Since I'm not familiar with details of Unix/Linux debugging, this stuff may not apply at all to other OS's. But I'd guess that as an introduction to a very complex subject the concepts - if not the details and APIs - should 'port' to most any OS.
I think there are two main questions to answer here:
1. How does the debugger know that an exception occurred?
When an exception occurs in a process that’s being debugged, the debugger gets notified by the OS before any user exception handlers defined in the target process are given a chance to respond to the exception. If the debugger chooses not to handle this (first-chance) exception notification, the exception dispatching sequence proceeds further and the target thread is then given a chance to handle the exception if it wants to do so. If the SEH exception is not handled by the target process, the debugger is then sent another debug event, called a second-chance notification, to inform it that an unhandled exception occurred in the target process. Source
2. How does the debugger stop on a breakpoint?
The simplified answer is: when you put a breakpoint into the program, the debugger replaces your code at that point with an int3 instruction, which is a software interrupt. As a result, the program is suspended and the debugger is called.
Another valuable source for understanding debugging is the Intel CPU manual (Intel® 64 and IA-32 Architectures Software Developer's Manual). Volume 3A, chapter 16, introduces the hardware support for debugging, such as special exceptions and hardware debugging registers. The following is from that chapter:
T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.
I am not sure whether Windows or Linux uses this flag, but that chapter is very interesting to read.
Hope this helps someone.
My understanding is that when you compile an application or DLL file, whatever it compiles to contains symbols representing the functions and the variables.
When you have a debug build, these symbols are far more detailed than in a release build, thus allowing the debugger to give you more information. When you attach the debugger to a process, it looks at which functions are currently being accessed and resolves all the available debugging symbols from there (since it knows what the internals of the compiled file look like, it can ascertain what might be in memory, with the contents of ints, floats, strings, etc.). Like the first poster said, this information and how these symbols work greatly depend on the environment and the language.
