How does a debugger work?

I keep wondering: how does a debugger work? Particularly the kind that can be 'attached' to an already running executable. I understand that the compiler translates code to machine language, but then how does the debugger 'know' what it is being attached to?

The details of how a debugger works will depend on what you are debugging, and what the OS is. For native debugging on Windows you can find some details on MSDN: Win32 Debugging API.
The user tells the debugger which process to attach to, either by name or by process ID. If it is a name then the debugger will look up the process ID, and initiate the debug session via a system call; under Windows this would be DebugActiveProcess.
Once attached, the debugger will enter an event loop much like for any UI, but instead of events coming from the windowing system, the OS will generate events based on what happens in the process being debugged – for example an exception occurring. See WaitForDebugEvent.
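In C, the heart of such a debugger is only a few calls; a minimal sketch (PID taken from the command line, most event types and all error handling omitted):

```c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    DWORD pid = (DWORD)atoi(argv[1]);

    if (!DebugActiveProcess(pid)) {          /* initiate the debug session */
        fprintf(stderr, "attach failed: %lu\n", GetLastError());
        return 1;
    }

    DEBUG_EVENT ev;
    for (;;) {
        WaitForDebugEvent(&ev, INFINITE);    /* block until the OS reports an event */

        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT)
            printf("exception %08lx in thread %lu\n",
                   ev.u.Exception.ExceptionRecord.ExceptionCode,
                   ev.dwThreadId);

        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;                           /* debuggee is gone */

        /* resume the debuggee; NOT_HANDLED passes exceptions on to
           the debuggee's own handlers instead of swallowing them */
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId,
                           ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT
                               ? DBG_EXCEPTION_NOT_HANDLED
                               : DBG_CONTINUE);
    }
    return 0;
}
```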
The debugger is able to read and write the target process' virtual memory, and even adjust its register values through APIs provided by the OS. See the list of debugging functions for Windows.
The debugger is able to use information from symbol files to translate from addresses to variable names and locations in the source code. The symbol file information is a separate set of APIs and isn't a core part of the OS as such. On Windows this is through the Debug Interface Access SDK.
If you are debugging a managed environment (.NET, Java, etc.) the process will typically look similar, but the details are different, as the virtual machine environment provides the debug API rather than the underlying OS.

As I understand it:
For software breakpoints on x86, the debugger replaces the first byte of the instruction with CC (the opcode for int3). This is done with WriteProcessMemory on Windows. When the CPU gets to that instruction and executes the int3, it generates a debug exception. The OS receives this exception, realizes the process is being debugged, and notifies the debugger process that the breakpoint was hit.
After the breakpoint is hit and the process is stopped, the debugger looks in its list of breakpoints and replaces the CC with the byte that was there originally. It also rewinds the instruction pointer to the patched address (the int3 has already executed, so EIP points past it), sets TF, the Trap Flag in EFLAGS (by modifying the CONTEXT), and continues the process. The Trap Flag causes the CPU to generate a single-step exception (INT 1) after the next instruction executes.
When the process being debugged stops the next time, the debugger again replaces the first byte of the breakpoint instruction with CC, and the process continues.
I'm not sure if this is exactly how it's implemented by all debuggers, but I've written a Win32 program that manages to debug itself using this mechanism. Completely useless, but educational.
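A rough sketch of those two steps against the Win32 API; hProcess, hThread and addr are assumed to come from the debug session, and error handling is omitted, so treat it as illustration rather than production code:

```c
#include <windows.h>

/* Plant an int3, remembering the byte it displaced. */
static BYTE set_breakpoint(HANDLE hProcess, LPVOID addr)
{
    BYTE orig, int3 = 0xCC;
    SIZE_T n;

    ReadProcessMemory(hProcess, addr, &orig, 1, &n);   /* save original byte */
    WriteProcessMemory(hProcess, addr, &int3, 1, &n);  /* patch in int3 */
    FlushInstructionCache(hProcess, addr, 1);          /* CPU must see new code */
    return orig;
}

/* After the breakpoint fires: restore the byte, back the instruction
   pointer up over the already-executed int3, and set the Trap Flag so
   the debugger regains control after one instruction to re-arm the CC. */
static void prepare_single_step(HANDLE hProcess, HANDLE hThread,
                                LPVOID addr, BYTE orig)
{
    SIZE_T n;
    CONTEXT ctx;

    WriteProcessMemory(hProcess, addr, &orig, 1, &n);
    FlushInstructionCache(hProcess, addr, 1);

    ctx.ContextFlags = CONTEXT_CONTROL;
    GetThreadContext(hThread, &ctx);
#ifdef _M_X64
    ctx.Rip -= 1;
#else
    ctx.Eip -= 1;
#endif
    ctx.EFlags |= 0x100;       /* TF, the Trap Flag */
    SetThreadContext(hThread, &ctx);
}
```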

In Linux, debugging a process begins with the ptrace(2) system call. This article has a great tutorial on how to use ptrace to implement some simple debugging constructs.
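For a flavor of the API, here is a bare-bones attach/inspect/detach sequence, assuming x86-64 Linux (error handling omitted):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    pid_t pid = (pid_t)atoi(argv[1]);

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);    /* stops the target */
    waitpid(pid, NULL, 0);                     /* wait until it has stopped */

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);  /* read its registers */
    printf("target stopped at rip=%llx\n", (unsigned long long)regs.rip);

    ptrace(PTRACE_DETACH, pid, NULL, NULL);    /* resume it and let go */
    return 0;
}
```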

If you're on a Windows OS, a great resource for this would be "Debugging Applications for Microsoft .NET and Microsoft Windows" by John Robbins:
http://www.amazon.com/dp/0735615365
(or even the older edition: "Debugging Applications")
The book has a chapter on how a debugger works that includes code for a couple of simple (but working) debuggers.
Since I'm not familiar with the details of Unix/Linux debugging, this stuff may not all apply to other OSes. But as an introduction to a very complex subject, the concepts - if not the details and APIs - should 'port' to almost any OS.

I think there are two main questions to answer here:
1. How does the debugger know that an exception occurred?
When an exception occurs in a process that's being debugged, the debugger gets notified by the OS before any user exception handlers defined in the target process are given a chance to respond to it. If the debugger chooses not to handle this (first-chance) exception notification, the exception dispatching sequence proceeds further and the target thread is then given a chance to handle the exception if it wants to do so. If the SEH exception is not handled by the target process, the debugger is sent another debug event, called a second-chance notification, to inform it that an unhandled exception occurred in the target process. (Source)
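In Win32 terms, the notification arrives as an EXCEPTION_DEBUG_EVENT, and its dwFirstChance field tells the two cases apart. A sketch of how a debugger's event loop might act on it:

```c
#include <windows.h>

/* Choose the continuation status for an exception event. 'ev' is the
   DEBUG_EVENT filled in by WaitForDebugEvent; the return value is what
   gets passed to ContinueDebugEvent. */
static DWORD on_exception(const DEBUG_EVENT *ev)
{
    const EXCEPTION_DEBUG_INFO *x = &ev->u.Exception;

    if (x->dwFirstChance) {
        /* First chance: the debuggee's own handlers have not run yet.
           Returning DBG_EXCEPTION_NOT_HANDLED lets dispatching proceed. */
        return DBG_EXCEPTION_NOT_HANDLED;
    }

    /* Second chance: nothing handled it; the process is about to die.
       A real debugger would break in here and show the crash site. */
    return DBG_CONTINUE;
}
```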
2. How does the debugger stop on a breakpoint?
The simplified answer is: when you put a breakpoint into the program, the debugger replaces your code at that point with an int3 instruction, which is a software interrupt. As a result the program is suspended and the debugger is called.

Another valuable source for understanding debugging is the Intel CPU manual (Intel® 64 and IA-32 Architectures Software Developer's Manual). Volume 3A, chapter 16, introduces the hardware support for debugging, such as special exceptions and the hardware debug registers. The following is from that chapter:
T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.
I am not sure whether Windows or Linux uses this particular flag, but that chapter is very interesting reading.
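As a small taste of that chapter, here is how a debugger on Windows might arm one of the four hardware breakpoints (DR0-DR3) it describes - a sketch only, with hThread and addr assumed to come from an existing debug session:

```c
#include <windows.h>

/* Arm a hardware execution breakpoint using the x86 debug registers. */
static void set_hw_breakpoint(HANDLE hThread, DWORD_PTR addr)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    GetThreadContext(hThread, &ctx);

    ctx.Dr0 = addr;    /* DR0 holds the linear address to break on */
    ctx.Dr7 |= 0x1;    /* L0 bit: enable DR0 for this task */
    /* R/W0 and LEN0 (bits 16-19) stay 0 = break on execution */

    SetThreadContext(hThread, &ctx);
}
```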
Hope this helps someone.

My understanding is that when you compile an application or DLL file, the compiled output contains symbols representing the functions and the variables.
When you have a debug build, these symbols are far more detailed than in a release build, thus allowing the debugger to give you more information. When you attach the debugger to a process, it looks at which functions are currently being accessed and resolves all the available debugging symbols from there (since it knows what the internals of the compiled file look like, it can ascertain what might be in memory, with the contents of ints, floats, strings, etc.). Like the first poster said, this information and how these symbols work greatly depends on the environment and the language.
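On Windows, the address-to-name step might look like this with the DbgHelp API (a simpler cousin of the DIA SDK mentioned in the first answer); a sketch, assuming hProcess is a debuggee whose symbols can be found:

```c
#include <windows.h>
#include <dbghelp.h>   /* link against dbghelp.lib */
#include <stdio.h>

/* Translate an address in the debuggee back to a symbol name. */
static void print_symbol(HANDLE hProcess, DWORD64 addr)
{
    char buf[sizeof(SYMBOL_INFO) + MAX_SYM_NAME];
    SYMBOL_INFO *sym = (SYMBOL_INFO *)buf;
    DWORD64 disp = 0;

    SymInitialize(hProcess, NULL, TRUE);  /* load symbols for all modules */

    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen = MAX_SYM_NAME;
    if (SymFromAddr(hProcess, addr, &disp, sym))
        printf("%llx = %s+0x%llx\n",
               (unsigned long long)addr, sym->Name, (unsigned long long)disp);

    SymCleanup(hProcess);
}
```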

Related

How does the Visual Studio Attach to Process work?

I've always wanted to know the inner workings of Visual Studio's debugger and debuggers in general. How does it communicate with and control your code, especially when it's running inside a host process or on an external network server (attach to process)? Does the compiler or linker patch your code with callbacks so that the debugger is given control? If it indeed works this way, how do interpreted languages such as JavaScript, containing no debug code, work?
Generally speaking, Windows provides an API for writing debuggers that let you examine and modify memory in another process and to get notifications when exceptions happen in another process.
The debug process sits in a loop, waiting for notification of events from the process under inspection. To set a breakpoint, the debugger process modifies the code in the debuggee to cause an exception (typically, an int 3 instruction for x86).
The compiler and linker work together to make the symbol information about a program available in a format that can be read by debuggers. On Windows, that's typically CodeView in a separate PDB file.
In the Unix-derived world, there's an API called ptrace that does essentially the same sorts of things as the Windows debugging API.
For remote debugging, a small program is placed on the remote machine that communicates with and acts on behalf of the actual debugger running on the local machine.
For interpreted languages, like JavaScript, the debugger works with the interpreter to give the same sorts of functionality (inspecting memory, setting breakpoints, etc.).
Windows includes support for debuggers. A process has to enable the debug privilege, and once this is done it can attach to any other process and debug it using the Windows debugger functions:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms679303(v=vs.85).aspx
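Enabling that privilege is a standard token-adjustment dance; a sketch (the account must already hold SeDebugPrivilege, e.g. an administrator):

```c
#include <windows.h>

/* Turn on SeDebugPrivilege for the current process. */
static BOOL enable_debug_privilege(void)
{
    HANDLE hToken;
    TOKEN_PRIVILEGES tp;
    BOOL ok;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
        return FALSE;

    LookupPrivilegeValue(NULL, SE_DEBUG_NAME, &tp.Privileges[0].Luid);
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

    ok = AdjustTokenPrivileges(hToken, FALSE, &tp, sizeof(tp), NULL, NULL)
         && GetLastError() != ERROR_NOT_ALL_ASSIGNED;
    CloseHandle(hToken);
    return ok;
}
```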
For something like JavaScript, it seems like you would need the equivalent of a JavaScript debugger.
In the case of a Visual Studio multi-process project, you typically have to switch which process the debugger is attached to in order to debug that process. I don't know if there's a way to have pending breakpoints set for multiple processes at the same time. There could be other debuggers that work better with multiple processes, but I haven't used such a tool.

Live vs. offline debugging

I've been trying to find the difference between these 2 types of debugging, but couldn't find it anywhere (been googling almost 30 minutes), so I'm asking here: What's the difference between live vs. offline debugging? What do people mean when they say a debugger is "live" vs. "offline"?
Debugging types
There are several ways of debugging that can be distinguished:
live debugging vs. post mortem debugging (what you call "offline" debugging, also called "dump debugging")
kernel debugging vs. user mode debugging
local debugging vs. remote debugging
which give 8 combinations in total.
For live debugging, you can distinguish between invasive debugging vs. noninvasive debugging.
Live debugging vs. offline debugging
In live debugging, the program is running and the debugger is attached to it. This means you can still interact with the program. You can set breakpoints, handle exceptions that would normally cause the program to terminate, modify the memory etc.
The downside of live debugging is its transient nature: if you enter a wrong command or step too far, the situation is gone and might not be repeatable.
I mentioned that there are two sub-modes of live debugging, invasive and noninvasive. In noninvasive debugging, the debugger does not attach to the target application; it suspends all of the program's threads and has access to the memory, registers, and other such information, but it cannot control the target.
In post mortem debugging, someone has captured a memory dump of a running program at a certain point in time. In many cases this is done upon a specific event, e.g. an unhandled exception that causes the program to terminate. Since the memory dump is a file on disk, you can analyze it as often as you want and you get the exact same situation.
The downside of post mortem debugging is, of course, that the program is not running: you can't interact with it, and it's very hard to find out what happens next.
"Online" debugging is the normal process:
Tell the debugger to tell the program to step forwards;
Look at what the program state is at the moment;
Set a breakpoint for the future;
Tell the debugger to simply run the program;
If the breakpoint 'fires', have a look at the program state now.
There are two ways to "offline" debug:
You can take your source code and manually step through what the processor ought to be doing, watching for unexpected program paths.
Note if you do this, you need to diligently avoid assuming what the processor is "supposed" to do and simply doing that: you need to honestly obey the code as though you were the computer. Often you get other people, who don't know the code, to do this instead of you.
You take the result of a run-log, usually captured by a hardware probe, and use the debugger to "post mortem" the run.
The latter usually requires a processor that will transmit what it is doing out of a "Trace" port (not all have this), and a hardware device (like a probe) connected to the Trace port to capture the data. That probe then communicates with a debugger, which takes the data and presents it to the programmer. The programmer can work backwards and forwards through this Trace log, and see the execution path that the code actually took, rather than the path the programmer thought it should take.
Some processors not only transmit what instruction they're currently processing, but also what data they read or wrote while doing this. A more sophisticated debugger can take this extra data and provide a 'snapshot' of the system at any time during the run, allowing the programmer to analyse why the code behaved the way it did.
The reason that it is called "offline" is because once the log has been captured, you can disconnect and power down the target, and look at the saved log at any time in the future without still being connected to the probe or processor.

How does gdb set software breakpoints in shared library functions?

I know that software breakpoints in an executable file can work by replacing some assembler instruction at the desired place with another one that causes an interrupt. The debugger can then stop execution exactly at this place, replace the instruction with the original one, ask the user what to do next, run some commands, etc.
But the code of such an executable file is not used by other programs and has only one copy in memory. How can software breakpoints work with shared libraries? For instance, how do software breakpoints work if I set one at some internal function of the C library (as I understand it, there is only one copy for all applications, so we cannot just replace an instruction in it)? Are there any "software breakpoint" techniques for that purpose?
The answer for Linux is that the kernel implements COW (copy-on-write): if the code of a shared library is written to, the kernel first makes a private duplicate of the shared page, remaps that process's virtual memory to the copy, and allows the application to continue. This is completely invisible to userland applications and done entirely in the kernel.
Thus, until the first time a software breakpoint is put into the shared library, its code is indeed shared; afterwards it is not, and the process thereafter operates on a dirty but private copy.
This kernel magic is what allows the debugger to not cause every other application to suddenly stop.
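You can watch the same mechanism from inside a single process: shared libraries are mapped MAP_PRIVATE, so making a libc page writable and poking it only ever dirties your own copy. (A debugger achieves the write through ptrace rather than mprotect, but the kernel's COW path is the same.) A sketch, assuming Linux:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Patch a byte inside libc in our own process. The write dirties a
   private copy of the page; every other process keeps sharing the
   pristine libc. */
int main(void)
{
    unsigned char *target = (unsigned char *)strlen;  /* some libc code */
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)target & ~(pagesz - 1));

    /* make the page writable; the store below then triggers the COW */
    mprotect(page, pagesz, PROT_READ | PROT_WRITE | PROT_EXEC);

    unsigned char saved = *target;
    *target = 0xCC;              /* the int3 a debugger would plant */
    *target = saved;             /* restore immediately */

    mprotect(page, pagesz, PROT_READ | PROT_EXEC);
    puts("patched and restored a private copy of a libc page");
    return 0;
}
```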
On OSes such as VxWorks, however, this is not possible. From personal experience, when I was implementing a GDB remote debug server for VxWorks, I had to forbid my users from ever single-stepping within semTake() and semGive() (the OS semaphore functions), since a) GDB uses software breakpoints in its source-level single-step implementation and b) VxWorks uses a semaphore to protect its breakpoints list...
The unpleasant consequence was an interrupt storm in which a breakpoint would cause an interrupt, and within this interrupt there would be another interrupt, and another and another, in an inescapable chain resistant even to Ctrl-Z. The only way out was to power off the machine.

How do debuggers/exceptions work on a compiled program?

A debugger makes perfect sense when you're talking about an interpreted program, because instructions always pass through the interpreter for verification before execution. But how does a debugger for a compiled application work? If the instructions are already laid out in memory and running, how can I be notified that a 'breakpoint' has been reached, or that an 'exception' has occurred?
With the help of hardware and/or the operating system.
Most modern CPUs have several debug registers that can be set to trigger a CPU exception when a certain address is reached. They often also support address watchpoints, which trigger exceptions when the application reads from or writes to a specified address or address range, and single-stepping, which causes a process to execute a single instruction and throw an exception. These exceptions can be caught by a debugger attached to the program (see below).
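For instance, a write watchpoint on x86 can be armed entirely in the debug registers; a sketch using the Windows CONTEXT API (hThread is a thread in the debuggee, addr the watched variable; both assumed):

```c
#include <windows.h>

/* Arm a hardware write-watchpoint: break whenever the debuggee
   writes to the 4-byte variable at addr. */
static void set_write_watchpoint(HANDLE hThread, DWORD_PTR addr)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    GetThreadContext(hThread, &ctx);

    ctx.Dr1 = addr;            /* slot DR1 */
    ctx.Dr7 |= 0x4;            /* L1 bit: enable DR1 */
    ctx.Dr7 |= 0x1 << 20;      /* R/W1 = 01: break on data writes */
    ctx.Dr7 |= 0x3 << 22;      /* LEN1 = 11: watch 4 bytes */

    SetThreadContext(hThread, &ctx);
}
```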
Alternatively, some debuggers create breakpoints by temporarily replacing the instruction at the breakpoint with an interrupt or trap instruction (thereby also causing the program to raise a CPU exception). Once the breakpoint is hit, the debugger replaces it with the original instruction and single-steps the CPU past that instruction so that the program behaves normally.
As far as exceptions go, that depends on the system you're working on. On UNIX systems, debuggers generally use the ptrace() system call to attach to a process and get a first shot at handling its signals.
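That "first shot" looks roughly like this on Linux; a sketch that simply forwards most signals to the tracee (error handling omitted):

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    pid_t pid = (pid_t)atoi(argv[1]);
    int status;

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);

    /* every signal destined for the tracee now stops it and is
       reported to us first; we decide what it actually receives */
    while (waitpid(pid, &status, 0) > 0 && WIFSTOPPED(status)) {
        int sig = WSTOPSIG(status);
        printf("tracee stopped by signal %d\n", sig);

        /* swallow our attach SIGSTOP and breakpoint SIGTRAPs,
           forward everything else unchanged */
        int deliver = (sig == SIGSTOP || sig == SIGTRAP) ? 0 : sig;
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)deliver);
    }
    return 0;
}
```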
TL;DR - low-level magic.

Fastest way to break in WinDbg for specific exception? .net 4.0 app

Folks,
Debugging a .net 4.0 app using WinDbg (I'm a beginner to WinDbg). I'm trying to break when I hit a stack overflow:
(NTSTATUS) 0xc00000fd – A new guard page for the stack cannot be created
Unfortunately, this overflow happens about 2 hours into a long-running process, and the logs tell me that it doesn't always happen at the same time/place. If I attach the debugger to the process, the program runs terribly slowly... it might take a few days to hit the bug! Is there a way to speed up the app/WinDbg by telling WinDbg to ONLY break for this particular error?
You can instruct ADPLus to create dumps of the process when exceptions occur. John Robbins has a good article on the subject. You can then use WinDbg to debug the dump file(s).
Be aware, that the original adplus.vbs has been replaced by adplus.exe, which is supposed to provide the same functionality. In my experience there are a few problems with the new implementation, so you may need to use the old script, which is still available as adplus_old.vbs.
Usually, attaching a debugger would not slow down an application too much (compared to starting the application from the debugger, which will set the heap in debug mode).
But by default the debugger will trace events (exceptions and OutputDebugString), and in your case there may be too many of them. After attaching the debugger, you can disable all exception handling (menu Debug/Event Filters, or the sxi command). Note that you have to change the handling for each event individually (sxi * means unknown events, not all events). You can also disable all tracing with .outmask-0xFFFFFFFF, and then enable only the stack overflow event with sxe -c ".outmask /d" sov
