How to detect the process that caused a GPF?
I'm not sure I understand your question. A GPF (general protection fault) is a situation in which the processor raises an interrupt (more precisely, a fault exception).
If this happens in user mode, it is translated into an SEH exception, which in turn may be handled by the process. If it is not handled, the process "crashes": an ugly message box is displayed and the process is terminated (depending on the settings, the process may also be debugged, a crash dump generated, etc.).
If this happens in kernel mode, there are two possibilities. If it happens in a context where exceptions are allowed, an SEH exception is raised and handled (similarly to user mode). If, however, the exception is not handled, or the context in which the GPF happened doesn't allow exceptions, the OS shuts down, displaying the so-called BSOD (blue screen of death).
Now about your question, I see several possibilities:
The OS dies, and you want to know which process made the system call that caused the GPF in kernel mode.
This can be discovered with a kernel debugger attached. You'll also see the driver that caused the error.
The GPF happens in the user-mode inside a process, and it's not handled.
This process will crash, and you'll definitely know which process was that.
The GPF happens inside the process, is handled, and the process continues to run, and you want to be notified about it.
For this you can attach to the process with a debugger. Whenever an SEH exception occurs inside a process, the debugger is notified by the OS.
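For the second case (an unhandled fault in user mode), a parent that launched the process can usually tell from the exit status that the child died of a fault. Below is a minimal sketch, assuming the parent starts the child itself. `describe_exit` and `run_and_classify` are hypothetical helper names, not part of any API. On POSIX, Python's `subprocess` reports death by signal as a negative return code; on Windows, an unhandled access violation normally shows up as exit code 0xC0000005 (STATUS_ACCESS_VIOLATION).

```python
import signal
import subprocess
import sys

STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS for an access violation


def describe_exit(returncode):
    """Classify a finished child's return code (hypothetical helper)."""
    if returncode < 0:
        # POSIX: subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    if (returncode & 0xFFFFFFFF) == STATUS_ACCESS_VIOLATION:
        return "access violation"
    return "exited with code %d" % returncode


def run_and_classify(argv):
    """Run argv to completion; return (pid, description of how it ended)."""
    proc = subprocess.Popen(argv)
    proc.wait()
    return proc.pid, describe_exit(proc.returncode)
```

This only covers processes you launch yourself; for arbitrary processes on the machine you would still need a debugger or the OS crash-reporting facilities.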
Related
We develop a user-space process running on Linux 3.4.11 in an embedded MIPS system. The process creates multiple (>10) threads using pthreads. The process has a SIGSEGV signal handler which, among other things, generates a log message which goes to our log file. As part of this flow, it acquires a semaphore (bad, I know...).
During our testing the process appeared to hang. We're currently unable to build gdb for the target platform, so I wrote a CLI tool that uses ptrace to extract the register values and USER data using PTRACE_PEEKUSR.
What surprised me was that all of our threads were inside our crash handler, trying to acquire the semaphore. This (obviously?) indicates a deadlock on the semaphore, which means that a thread died while holding it. When I dug through the stack, it seemed that almost all of the threads (except one) were in a blocking call (recv, poll, sleep) when the signal handler started running. Manual stack reconstruction on MIPS is a pain, so we have not fully done it yet. One thread appeared to be in the middle of a malloc call, which to me indicates that it crashed due to heap corruption.
A couple of things are still unclear:
1) Assuming one thread crashed in malloc, why would all other threads be running the SIGSEGV handler? As I understand it, a SIGSEGV signal is delivered to the faulting thread, no? Does it mean that each and every one of our threads crashed?
2) Looking at the sigcontext struct for MIPS, it seems it does not contain the memory address which was accessed (badaddr). Is there another place that has it? I couldn't find it anywhere, but it seemed odd to me that it would not be available.
And of course, if anyone can suggest ways to continue the analysis, it would be appreciated!
Yes, it is likely that all of your threads crashed in turn, assuming that you have captured the thread state correctly.
siginfo_t has a si_addr member, which should give you the address of the fault. Whether your kernel fills that in is a different matter.
In-process crash handlers will always be unreliable. You should use an out-of-process handler, and set kernel.core_pattern to invoke it. In current kernels, it is not necessary to write the core file to disk; you can either read the core file from standard input, or just map the process memory of the zombie process (which is still available when the kernel invokes the crash handler).
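As a rough illustration of the out-of-process approach: when kernel.core_pattern starts with a `|`, the kernel pipes the core image to the named program's standard input. The sketch below only checks that the stream looks like an ELF core file and counts its bytes; a real handler would go on to parse the notes and segments. `summarize_core` is a made-up name, and the returned dictionary layout is my own choice.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value used for core files


def summarize_core(stream):
    """Read a core image from a binary stream and return basic facts about it."""
    header = stream.read(20)  # e_ident (16 bytes) + e_type + e_machine
    size = len(header)
    # Drain the rest of the stream so the total size is known.
    while True:
        chunk = stream.read(1 << 16)
        if not chunk:
            break
        size += len(chunk)
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return {"is_core": False, "machine": None, "size": size}
    # EI_DATA (byte 5) is 1 for little-endian, 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    e_type, e_machine = struct.unpack(byte_order + "HH", header[16:20])
    return {"is_core": e_type == ET_CORE, "machine": e_machine, "size": size}
```

A script built around this would call `summarize_core(sys.stdin.buffer)` and be registered with something like `kernel.core_pattern=|/path/to/handler %p`, where the path is a placeholder and `%p` expands to the PID of the crashing process.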
I am performing system tests on a SAP system. From time to time, SAP crashes and I'd like to recover from those crashes by resetting the virtual machine to a previously saved state.
My problem is that I cannot detect such crashes reliably. I have created WER LocalDumps registry entries, but I don't get dumps.
It seems SAP has registered an unhandled exception handler and performs different tasks on different types of exceptions. Sometimes it shows a message box and terminates the application (e.g. in case of compression errors), sometimes it goes with a so-called Short Dump.
I am neither interested in the message box, nor in the short dump, so I am looking for a way to disable the unhandled exception handler of SAP. This should bring up WER, which writes the dump file and I can take actions to restart my system tests.
For performance reasons, I'd not like to restart the VM on every test.
I have tried:
I am basically familiar with unhandled exception handlers. I have applied them to my own .NET code successfully.
I looked at SetUnhandledExceptionFilter (MSDN) and similar but it applies to the calling process only and I cannot modify the code of SAP.
I read about DisableUserModeCallbackFilter but I don't think it is helpful for my case
I wonder whether there is a Registry Setting (e.g. in ImageFileExecutionOptions) or a Shim that I could activate.
According to Hans Passant's comment (which I take as an authoritative answer),
There is no boss override switch built into the operating system to stop it from doing this.
I finally attached the debugger to SAP GUI at a time when the process was alive. Starting with all exceptions enabled, I narrowed down the conditions so that WinDbg would break when SAP GUI crashed (first chance, then second chance).
A debugger makes perfect sense when you're talking about an interpreted program because instructions always pass through the interpreter for verification before execution. But how does a debugger for a compiled application work? If the instructions are already laid out in memory and running, how can I be notified that a 'breakpoint' has been reached, or that an 'exception' has occurred?
With the help of hardware and/or the operating system.
Most modern CPUs have several debug registers that can be set to trigger a CPU exception when a certain address is reached. They often also support address watchpoints, which trigger exceptions when the application reads from or writes to a specified address or address range, and single-stepping, which causes a process to execute a single instruction and throw an exception. These exceptions can be caught by a debugger attached to the program (see below).
Alternatively, some debuggers create breakpoints by temporarily replacing the instruction at the breakpoint with an interrupt or trap instruction (thereby also causing the program to raise a CPU exception). Once the breakpoint is hit, the debugger replaces it with the original instruction and single-steps the CPU past that instruction so that the program behaves normally.
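To make the patching trick concrete, here is a toy model in which a `bytearray` stands in for the target's code memory. It is only a simulation: a real debugger would read and write the target through the OS (for example ptrace on Linux or WriteProcessMemory on Windows). `BreakpointTable` and its method names are invented for this sketch; 0xCC is the one-byte x86 INT3 breakpoint instruction.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class BreakpointTable:
    """Toy model of software breakpoints over a bytearray of 'code'."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}  # address -> original byte

    def insert(self, address):
        """Remember the original byte and overwrite it with the trap."""
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def on_trap(self, address):
        """Restore the original byte so it can be single-stepped; return it."""
        original = self.saved[address]
        self.memory[address] = original
        return original
```

For example, after `table.insert(1)` the byte at offset 1 reads 0xCC, and `table.on_trap(1)` puts the original byte back.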
As far as exceptions go, that depends on the system you're working on. On UNIX systems, debuggers generally use the ptrace() system call to attach to a process and get a first shot at handling its signals.
TL;DR - low-level magic.
I have an application that sometimes causes an access violation on exit. This is quite unpredictable and all attempts to locate the bug have been unsuccessful so far. The bug is harmless, as no data is lost, so I was wondering whether it might be possible to just hide it.
Is it possible to have another app launch the buggy one and catch the Access Violation exception if it occurs? If yes, how?
Thanks in advance!
Yes, if the other application is a debugger. This is a non-trivial amount of work. To become a debugger, you create the process with the DEBUG_PROCESS | DEBUG_ONLY_THIS_PROCESS flags; see the CreateProcess flags for more information.
Once you are the debugger of the process, you get the first chance to handle all exceptions.
You could also attach to the process as a debugger just before it shuts down (assuming that you know when this is going to happen) with DebugActiveProcess.
Call SetErrorMode(SEM_NOGPFAULTERRORBOX) before launching the buggy application as a child process.
The error mode is inherited by child processes and this particular flag will prevent the crash dialog from appearing.
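A launcher along these lines might look like the sketch below, which calls SetErrorMode through `ctypes` and then starts the child. `run_without_crash_dialog` is a hypothetical name; on non-Windows platforms the error-mode step is simply skipped, so the function just runs the child. It assumes the child is created without CREATE_DEFAULT_ERROR_MODE, which is the case for a plain `subprocess` launch.

```python
import ctypes
import subprocess
import sys

SEM_NOGPFAULTERRORBOX = 0x0002


def run_without_crash_dialog(argv):
    """Launch argv as a child with the GPF error box suppressed; return its exit code."""
    previous = None
    if sys.platform == "win32":
        kernel32 = ctypes.windll.kernel32
        # SetErrorMode returns the previous mode; OR the new flag into it
        # so that flags already set for this process are preserved.
        previous = kernel32.SetErrorMode(SEM_NOGPFAULTERRORBOX)
        kernel32.SetErrorMode(previous | SEM_NOGPFAULTERRORBOX)
    try:
        # The child inherits the parent's error mode at creation time.
        return subprocess.call(argv)
    finally:
        if previous is not None:
            ctypes.windll.kernel32.SetErrorMode(previous)
```

The previous mode is restored afterwards so the launcher's own error handling is left as it was.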
I have an application that is hosting some unstable third-party code which I can't control in an external process to protect my main application from nasty errors it exhibits. My parent process is monitoring the other process and doing "the right thing (tm)" when it fails.
The problem that I have is that Dr. Watson is still detecting crashes in the isolated process and attaching to the processes on the way down to take a crash dump. This has the two problems of:
1. Dramatically slowing down the time that it takes for me to detect a failure because the process stays alive while the crash dump is being taken.
2. Showing annoying popups to the user asking if they want to submit the error reports to Microsoft.
Clearly I would prefer to fix the bugs in the child process, but given that it isn't an option, I would like to be able to selectively disable Dr. Watson (and Windows Error Reporting in Vista+) for that process.
I am running some of my own code in the process before handing off to the untrusted bit, so if there is an API that I can call that affects the current process that would be fine.
I am aware of: http://support.microsoft.com/default.aspx/kb/188296 which would disable Dr. Watson for the entire machine. I don't want to do that because it would make me a bad citizen to trash a machine-wide setting.
I am also aware of the WerSetFlags option in Vista+ that would seem to disable Windows Error Reporting for the current process, but I need something that will disable Dr. Watson on earlier OS versions.
The good doctor is invoked when a process does not handle a certain exception. Therefore, the common way to go would be to handle all exceptions yourself. In your case, it is much harder since you don't own the crashing process code. What you can do then, is to inject your code into the other process at runtime, and install an exception handler that will swallow the exception causing the crash. When caught, gracefully shut down the process.
There are quite a few questions here talking about injecting code into another process. As for the crash handler, you can either set an unhandled exception filter, or add a vectored exception handler. Note that for the latter, you'll have to be careful not to swallow legit exceptions that are in fact handled inside the other process, namely find a way to recognize the crashing exception and make sure it is the only one you handle.
You want to disable the GPF popup: http://blogs.msdn.com/oldnewthing/archive/2004/07/27/198410.aspx