How do debuggers/exceptions work on a compiled program? - debugging

A debugger makes perfect sense when you're talking about an interpreted program, because instructions always pass through the interpreter for verification before execution. But how does a debugger for a compiled application work? If the instructions are already laid out in memory and running, how can I be notified that a 'breakpoint' has been reached, or that an 'exception' has occurred?

With the help of hardware and/or the operating system.
Most modern CPUs have several debug registers that can be set to trigger a CPU exception when a certain address is reached. They often also support address watchpoints, which trigger exceptions when the application reads from or writes to a specified address or address range, and single-stepping, which causes a process to execute a single instruction and throw an exception. These exceptions can be caught by a debugger attached to the program (see below).
Alternatively, some debuggers create breakpoints by temporarily replacing the instruction at the breakpoint with an interrupt or trap instruction (thereby also causing the program to raise a CPU exception). Once the breakpoint is hit, the debugger replaces it with the original instruction and single-steps the CPU past that instruction so that the program behaves normally.
As far as exceptions go, that depends on the system you're working on. On UNIX systems, debuggers generally use the ptrace() system call to attach to a process and get a first shot at handling its signals.
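As a rough sketch of that ptrace() flow (assuming Linux; the target PID is whatever process you want to observe, and error handling is omitted):
// Minimal sketch: attach to a running process and see its signals first.
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <stdio.h>

void observe(pid_t pid)
{
    int status;
    long sig = 0;

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);   // target is sent SIGSTOP
    waitpid(pid, &status, 0);                 // wait until it has stopped

    while (1) {
        // Resume the target, delivering (or suppressing) the last signal.
        // It will stop again whenever another signal arrives, and the
        // tracer sees that signal before the target does.
        ptrace(PTRACE_CONT, pid, NULL, (void *)sig);
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))
            break;
        sig = WSTOPSIG(status);
        printf("target stopped by signal %ld\n", sig);
        // A real debugger would inspect registers/memory here and decide
        // whether to forward the signal or treat it as a breakpoint hit.
    }
}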
TL;DR - low-level magic.

Related

Any way to set a breakpoint in GDB without stopping the target?

I'm using GDB (through an IDE) to debug an ARM Cortex microcontroller, and I've encountered an issue where the micro is halted briefly (for 10-20 ms) whenever a breakpoint is set or cleared (not hit, but set, as in the code has not yet reached the breakpoint). A pause this long can cause significant problems when driving an electric motor, for example.
The IDE has a debug console which shows that the GDB client is sending a SIGINT to the GDB server whenever I add or remove a breakpoint. I know that in the command-line client you have to use ctrl+c to interrupt the process to issue any command, but for modern microcontrollers (ARM Cortex-M, etc.) it is not necessary to interrupt the processor to insert breakpoints, read memory, and in some cases to trace program execution. I am wondering if this is something that is being imposed artificially by the GDB interface.
Is there any way to create a new breakpoint without halting the target?
I have tried using "async" mode in GDB, but it informs me that I must halt the program to insert breakpoints. I have also verified that breakpoints can be set with the underlying debug server (OpenOCD) without halting, so in this case GDB is incorrect.
Any input is appreciated, and thanks in advance.

Crash with all threads running SIGSEGV handler

We develop a user-space process running on Linux 3.4.11 in an embedded MIPS system. The process creates multiple (>10) threads using pthreads. The process has a SIGSEGV signal handler which, among other things, generates a log message which goes to our log file. As part of this flow, it acquires a semaphore (bad, I know...).
During our testing the process appeared to hang. We're currently unable to build gdb for the target platform, so I wrote a CLI tool that uses ptrace to extract the register values and USER data using PTRACE_PEEKUSR.
What surprised me to see is that all of our threads were inside our crash handler, trying to acquire the semaphore. This (obviously?) indicates a deadlock on the semaphore, which means that a thread died while holding it. When I dug up the stack, it seemed that almost all of the threads (except one) were in a blocking call (recv, poll, sleep) when the signal handler started running. Manual stack reconstruction on MIPS is a pain so we have not fully done it yet. One thread appeared to be in the middle of a malloc call, which to me indicates that it crashed due to a heap corruption.
A couple of things are still unclear:
1) Assuming one thread crashed in malloc, why would all other threads be running the SIGSEGV handler? As I understand it, a SIGSEGV signal is delivered to the faulting thread, no? Does it mean that each and every one of our threads crashed?
2) Looking at the sigcontext struct for MIPS, it seems it does not contain the memory address which was accessed (badaddr). Is there another place that has it? I couldn't find it anywhere, but it seemed odd to me that it would not be available.
And of course, if anyone can suggest ways to continue the analysis, it would be appreciated!
Yes, it is likely that all of your threads crashed in turn, assuming that you have captured the thread state correctly.
siginfo_t has a si_addr member, which should give you the address of the fault. Whether your kernel fills that in is a different matter.
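For what it's worth, here is a minimal sketch of reading si_addr from a handler installed with SA_SIGINFO (generic Linux/C; whether the field is actually populated still depends on your kernel and architecture):
// Sketch: a SIGSEGV handler installed with SA_SIGINFO can read the faulting
// address from si_addr. Only async-signal-safe calls belong in a real handler;
// fprintf is used here purely for illustration.
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
    fprintf(stderr, "SIGSEGV at address %p\n", info->si_addr);
    _exit(1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    *(volatile int *)0 = 42;   // deliberately fault to trigger the handler
    return 0;
}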
In-process crash handlers will always be unreliable. You should use an out-of-process handler, and set kernel.core_pattern to invoke it. In current kernels, it is not necessary to write the core file to disk; you can either read the core file from standard input, or just map the process memory of the zombie process (which is still available when the kernel invokes the crash handler).

Does the operating system assume anything about callee-saved registers when control returns to it?

I've wondered whether the OS, say Windows, assumes anything about the callee-saved registers like ebp, esi, edi?
In other words, does the OS require the value in any of these registers preserved, when control transfers back to it (ret in main)?
I cannot find anything specified, but I guess the answer is no (having looked at compiler generated code). Is there any documentation on the topic?
Win32 is designed to provide process isolation.
Nothing* that a process does can cause another process (including the operating system itself) to fail.
For this reason it does not matter what you do with the registers upon exit.
The only exception is esp. If the stack pointer is messed up your application will terminate with a stack fault or access violation.
This still won't affect the OS, however; it will merely terminate your app slightly early.
*Obviously this does not include the effects on the system by legitimate system calls, or the exploitation of bugs.
Note that the ret in main does not return control to the OS. Almost all Win32 C applications link in a runtime library. If so, the ret in main returns to some initialization code that looks like this:
// pseudo-init (simplified C runtime startup)
do set up (prepare command-line params for main to read);
result = call main;
call Windows.ExitProcess(result);
Having a 'clean' exit to Windows is important to an application so it can clean up its own resources (close files etc). The OS does not really care. If an application does not clean up after itself, the OS will do the job for it.
Much worse than a crashing app is a 'hung' one. If an application is stuck in an endless loop, or worse, an endless loop that keeps claiming more and more resources, then the system can be brought to its knees quite easily.

Crash after returning from Windows keyboard hook procedure

There is a keyboard hook installed like this:
s_hKeyboardHook = ::SetWindowsHookEx(WH_KEYBOARD, KeyboardHookProc, nullptr, ::GetCurrentThreadId());
(This is a plug-in that wants to intercept keyboard events that get sent to its host (64-bit), even though the host doesn't provide keyboard events to its plugins the normal way. I do not have the source code of the host, though I do have the source code of the plug-in.)
After the keyboard hook procedure successfully runs and returns, the program crashes. The crash happens inside Windows' ZwCallbackReturn(), executing the syscall instruction. The exception is 0xC0000005 (access violation). The crash only happens if a particular key is pressed which triggers some particular logic.
I am stuck diagnosing this crash and could really use some help. I am sure the problem is in this big chunk of code that's in the hook proc. What I am having trouble with is understanding where the crash occurs and where to basically place the breakpoint to preempt it.
Additional info:
1) The hook procedure is really, really heavy, with lots of blocking, I/O, and memory usage (it completes in a couple of seconds on a fast machine). Maybe that's part of the problem.
2) If compiled as 32-bit, the stack right after the crash looks more interesting, but I doubt it can be trusted:
2a71f510() Unknown
ExecuteHandler2@20() Unknown
ExecuteHandler@20() Unknown
_RtlDispatchException@8() Unknown
_KiUserExceptionDispatcher@8() Unknown
2a10f24a() Unknown
_DispatchHookW@16() Unknown
_CallHookWithSEH@16() Unknown
___fnHkINDWORD@4() Unknown
_KiUserCallbackDispatcher@12() Unknown
_LdrAddLoadAsDataTable@20() Unknown
AfxInternalPumpMessage() Line 153 C++
AfxWinMain(0x00000000, 0x00000020, 0x00000001, 1638280) Line 47 C++
@BaseThreadInitThunk@12() Unknown
where the top 5 lines are repeated many times.
Here's what I tried so far. It is my understanding that the syscall instruction itself doesn't generate the exception: the registers look sane, and I guess the stack would remain the same if it crashed. So I think that after this instruction initiates the transition back to kernel mode, from where the "user callback" (the hook procedure call) had originated, the kernel continues to run just fine. Eventually it should return control back to userland (to GetMessage(), I presume). Then down the road, I think, the stack gets corrupted and the program crashes. But unfortunately I can't instruct my Visual C++ debugger to break at the first user-mode instruction executed, before the stack is corrupted. I tried installing conditional breakpoints in TranslateMessage() and DispatchMessage(), which are most likely to run right after GetMessage(), but they don't fire between the last good user-mode instruction and the crash.
The crash happened because the keyboard hook procedure was NOT the first in the hook chain. It was called from a previous hook in the hook chain via CallNextHookEx(). And that previous hook was registered by a DLL which got unloaded inside "our" keyboard hook.
Therefore, after all the hooks had eventually been called, control returned to the first hook procedure, which no longer existed. The crash was the resulting attempt to execute an invalid address.
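For reference, a well-behaved hook procedure forwards every event to the rest of the chain; a minimal sketch (same WH_KEYBOARD style as above, with the heavy work omitted) looks like this:
// Sketch of a minimal keyboard hook procedure that cooperates with the chain.
#include <windows.h>

static HHOOK s_hKeyboardHook;   // returned by SetWindowsHookEx, as above

LRESULT CALLBACK KeyboardHookProc(int code, WPARAM wParam, LPARAM lParam)
{
    if (code == HC_ACTION) {
        // wParam is the virtual-key code; lParam packs repeat count and flags.
        // Keep the work here short; blocking for seconds invites reentrancy
        // and timing problems.
    }
    // Always hand the event to the next hook in the chain.
    return CallNextHookEx(s_hKeyboardHook, code, wParam, lParam);
}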

How does a debugger work?

I keep wondering how a debugger works. Particularly the one that can be 'attached' to an already running executable. I understand that the compiler translates code to machine language, but then how does the debugger 'know' what it is being attached to?
The details of how a debugger works will depend on what you are debugging, and what the OS is. For native debugging on Windows you can find some details on MSDN: Win32 Debugging API.
The user tells the debugger which process to attach to, either by name or by process ID. If it is a name then the debugger will look up the process ID, and initiate the debug session via a system call; under Windows this would be DebugActiveProcess.
Once attached, the debugger will enter an event loop much like for any UI, but instead of events coming from the windowing system, the OS will generate events based on what happens in the process being debugged – for example an exception occurring. See WaitForDebugEvent.
The debugger is able to read and write the target process' virtual memory, and even adjust its register values through APIs provided by the OS. See the list of debugging functions for Windows.
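A rough sketch of that attach-and-loop flow on Windows (the process ID is assumed to come from the user; most event types are ignored for brevity):
// Sketch: attach to a process and pump debug events until it exits.
#include <windows.h>
#include <stdio.h>

void debug_loop(DWORD pid)
{
    DEBUG_EVENT ev;
    DWORD continueStatus;

    if (!DebugActiveProcess(pid))      // attach; the OS starts queueing
        return;                        // debug events for this process

    for (;;) {
        WaitForDebugEvent(&ev, INFINITE);
        continueStatus = DBG_CONTINUE;

        switch (ev.dwDebugEventCode) {
        case EXCEPTION_DEBUG_EVENT:
            // Breakpoints, access violations, single-steps, etc. land here.
            printf("exception 0x%08lx in thread %lu\n",
                   ev.u.Exception.ExceptionRecord.ExceptionCode,
                   ev.dwThreadId);
            continueStatus = DBG_EXCEPTION_NOT_HANDLED;
            break;
        case EXIT_PROCESS_DEBUG_EVENT:
            return;                    // the target has exited
        default:
            break;                     // module loads, thread creation, ...
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, continueStatus);
    }
}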
The debugger is able to use information from symbol files to translate from addresses to variable names and locations in the source code. The symbol file information is a separate set of APIs and isn't a core part of the OS as such. On Windows this is through the Debug Interface Access SDK.
If you are debugging a managed environment (.NET, Java, etc.) the process will typically look similar, but the details are different, as the virtual machine environment provides the debug API rather than the underlying OS.
As I understand it:
For software breakpoints on x86, the debugger replaces the first byte of the instruction with CC (int3). This is done with WriteProcessMemory on Windows. When the CPU gets to that instruction, and executes the int3, this causes the CPU to generate a debug exception. The OS receives this interrupt, realizes the process is being debugged, and notifies the debugger process that the breakpoint was hit.
After the breakpoint is hit and the process is stopped, the debugger looks in its list of breakpoints, and replaces the CC with the byte that was there originally. The debugger sets TF, the Trap Flag in EFLAGS (by modifying the CONTEXT), and continues the process. The Trap Flag causes the CPU to automatically generate a single-step exception (INT 1) on the next instruction.
When the process being debugged stops the next time, the debugger again replaces the first byte of the breakpoint instruction with CC, and the process continues.
I'm not sure if this is exactly how it's implemented by all debuggers, but I've written a Win32 program that manages to debug itself using this mechanism. Completely useless, but educational.
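To make the 'plant 0xCC' step concrete, here's a rough sketch under those same assumptions (Win32; hProcess and addr would come from the debug loop, and error handling is omitted):
// Sketch: set/clear a software breakpoint by patching in an int3 opcode.
#include <windows.h>

BYTE set_breakpoint(HANDLE hProcess, LPVOID addr)
{
    BYTE original, int3 = 0xCC;
    SIZE_T n;
    ReadProcessMemory(hProcess, addr, &original, 1, &n);   // save the old byte
    WriteProcessMemory(hProcess, addr, &int3, 1, &n);      // plant int3
    FlushInstructionCache(hProcess, addr, 1);              // keep caches honest
    return original;                                       // needed to restore later
}

void clear_breakpoint(HANDLE hProcess, LPVOID addr, BYTE original)
{
    SIZE_T n;
    WriteProcessMemory(hProcess, addr, &original, 1, &n);  // restore the byte
    FlushInstructionCache(hProcess, addr, 1);
}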
In Linux, debugging a process begins with the ptrace(2) system call. This article has a great tutorial on how to use ptrace to implement some simple debugging constructs.
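The other common way to start a ptrace session, and the one most tutorials begin with, is to fork a child that asks to be traced; a minimal sketch (Linux, /bin/ls as a stand-in debuggee, error handling omitted):
// Sketch: spawn a traced child and single-step it to completion.
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);    // let the parent trace us
        execl("/bin/ls", "ls", (char *)NULL);     // replaced by the debuggee
    } else {
        int status;
        long steps = 0;
        waitpid(child, &status, 0);               // child stops at the exec
        while (!WIFEXITED(status)) {
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
            waitpid(child, &status, 0);
            steps++;
        }
        printf("child ran for roughly %ld single-stepped instructions\n", steps);
    }
    return 0;
}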
If you're on a Windows OS, a great resource for this would be "Debugging Applications for Microsoft .NET and Microsoft Windows" by John Robbins:
http://www.amazon.com/dp/0735615365
(or even the older edition: "Debugging Applications")
The book has a chapter on how a debugger works that includes code for a couple of simple (but working) debuggers.
Since I'm not familiar with details of Unix/Linux debugging, this stuff may not apply at all to other OS's. But I'd guess that as an introduction to a very complex subject the concepts - if not the details and APIs - should 'port' to most any OS.
I think there are two main questions to answer here:
1. How the debugger knows that an exception occurred?
When an exception occurs in a process that’s being debugged, the debugger gets notified by the OS before any user exception handlers defined in the target process are given a chance to respond to the exception. If the debugger chooses not to handle this (first-chance) exception notification, the exception dispatching sequence proceeds further and the target thread is then given a chance to handle the exception if it wants to do so. If the SEH exception is not handled by the target process, the debugger is then sent another debug event, called a second-chance notification, to inform it that an unhandled exception occurred in the target process. Source
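In terms of a WaitForDebugEvent loop, that distinction is visible on the exception event itself; a rough sketch (names as in the Win32 API, not lifted from any particular debugger):
// Sketch: telling a first-chance notification from a second-chance one.
#include <windows.h>

DWORD handle_exception_event(const DEBUG_EVENT *ev)
{
    if (ev->u.Exception.dwFirstChance) {
        // First chance: the target's own SEH handlers have not run yet, so
        // hand the exception back unless it is one of our own breakpoints.
    } else {
        // Second chance: nothing in the target handled it and the process is
        // about to die; this is the moment to inspect state and report.
    }
    return DBG_EXCEPTION_NOT_HANDLED;
}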
2. How the debugger knows how to stop on a breakpoint?
The simplified answer is: when you put a breakpoint into the program, the debugger replaces your code at that point with an int3 instruction, which is a software interrupt. As a result the program is suspended and the debugger is called.
Another valuable resource for understanding debugging is the Intel CPU manual (Intel® 64 and IA-32 Architectures Software Developer's Manual). Volume 3A, chapter 16, introduces the hardware support for debugging, such as special exceptions and hardware debugging registers. The following is from that chapter:
T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.
I am not sure whether Windows or Linux uses this flag, but it is a very interesting chapter to read.
Hope this helps someone.
My understanding is that when you compile an application or DLL file, whatever it compiles to contains symbols representing the functions and the variables.
When you have a debug build, these symbols are far more detailed than when it's a release build, thus allowing the debugger to give you more information. When you attach the debugger to a process, it looks at which functions are currently being accessed and resolves all the available debugging symbols from there (since it knows what the internals of the compiled file look like, it can ascertain what might be in memory, with the contents of ints, floats, strings, etc.). Like the first poster said, this information and how these symbols work greatly depend on the environment and the language.

Resources