I've been trying to find the difference between these 2 types of debugging, but couldn't find it anywhere (been googling almost 30 minutes), so I'm asking here: What's the difference between live vs. offline debugging? What do people mean when they say a debugger is "live" vs. "offline"?
Debugging types
There are several ways of debugging that can be distinguished:
live debugging vs. post mortem debugging (what you call "offline" debugging, also called "dump debugging")
kernel debugging vs. user mode debugging
local debugging vs. remote debugging
which give 8 combinations in total.
For live debugging, you can distinguish between invasive debugging vs. noninvasive debugging.
Live debugging vs. offline debugging
In live debugging, the program is running and the debugger is attached to it. This means you can still interact with the program. You can set breakpoints, handle exceptions that would normally cause the program to terminate, modify the memory etc.
The downside of live debugging is its transient nature. If you enter a wrong command or step too far, the situation is gone and might not be reproducible.
As mentioned, live debugging has two sub-modes, invasive and noninvasive. In noninvasive debugging, the debugger does not attach to the target application; it suspends all of the program's threads and has access to memory, registers, and other such information, but it cannot control the target.
In post mortem debugging, someone has captured a memory dump of a running program at a certain point in time. In many cases this is done upon a specific event, e.g. an unhandled exception that causes the program to terminate. Since the memory dump is a file on disk, you can analyze it as often as you want and you get the exact same situation.
The downside of post mortem debugging is, of course, that the program is no longer running: you can't interact with it, and it's very hard to find out what would have happened next.
"Online" debugging is the normal process:
Tell the debugger to tell the program to step forwards;
Look at what the program state is at the moment;
Set a breakpoint for the future;
Tell the debugger to simply run the program;
If the breakpoint 'fires', have a look at the program state now.
There are two ways to "offline" debug:
You can take your source code and manually step through what the processor ought to be doing, watching for unexpected program paths.
Note that if you do this, you need to diligently avoid "knowing" what the processor is "supposed" to do and just doing that: you must honestly obey the code as though you were the computer. Often you get other people, who don't know the code, to do this instead of you.
You take a run log, usually captured by a hardware probe, and use the debugger to analyze the run "post mortem".
The latter usually requires a processor that will transmit what it is doing out a "Trace" port (not all have this), and a hardware device (like a probe) connected to the Trace port to capture the data. That probe then communicates with a debugger, which takes the data and presents it to the programmer. The programmer can work backwards and forwards through this Trace log, and see the execution path that the code actually took, rather than the code the programmer thought it should take.
Some processors not only transmit what instruction they're currently processing, but also what data they read or wrote while doing this. A more sophisticated debugger can take this extra data and provide a 'snapshot' of the system at any time during the run, allowing the programmer to analyse why the code behaved the way it did.
The reason that it is called "offline" is because once the log has been captured, you can disconnect and power down the target, and look at the saved log at any time in the future without still being connected to the probe or processor.
Related
When a program is misbehaving, it is pretty easy to capture a memory dump of the process and then analyze it with a tool like WinDBG. However, this is pretty limited: you only get a snapshot of what the process is doing, and in some cases finding out why a certain part of the code was reached is really difficult.
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture, which would indicate what changed in that period of time, and the parts of the code that were executed in that time interval?
Recording many memory dumps
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture
Yes, that exists. It's called Procdump, and you can define the number of dumps with the -n parameter and the seconds between dumps with -s. It might not work well for small values of -s, because taking a full dump can take longer than the interval itself.
Example:
procdump -ma -n 10 -s 1 <PID> ./dumps
However, this technique is usually not very helpful, because you now have 10 dumps to analyze instead of just 1 - and analyzing 1 dump is already difficult. AFAIK, there's no tool that would compare two dumps and give you the differences.
Live debugging
IMHO, what you need is live debugging. And that's possible with WinDbg, too. Development debugging (using an IDE) and production debugging are two different skills, so you don't need to install a complete IDE such as Visual Studio on your customer's production environment. Actually, if you copy an existing WinDbg installation onto a USB stick, it will run as a portable application.
Simply start WinDbg, attach to a process (F6), start a log file (.logopen), set up Microsoft symbols, configure exceptions (sx) and let the program run (g).
Remote debugging
You may even want to have a look at WinDbg's remote debugging capabilities; however, that's a bit harder to set up, usually due to IT restrictions (firewalls etc.).
Visual Studio also offers remote debugging, so you can use VS on your machine and just install a smaller program on your customer's machine. I have hardly any experience with it, so I can't tell you much.
Logging
the parts of the code that were executed in that time interval?
The most typical approach I see at almost every company is turning on the application's logging capabilities.
You can also record useful data with WPT (Windows Performance Toolkit), namely WPR (Windows Performance Recorder) and later analyze it with WPA (Windows Performance Analyzer). It will give you call stacks over time.
Whenever one wants to attach to a process from Visual Studio, one receives a nasty security warning message.
This question and its answers show the struggle to get rid of it. This Microsoft article tells us about the potential dangers of attaching for the debugging process/machine:
However, many developers do not realize that the security threat can also flow in the opposite direction. It is possible for malicious code in the debuggee process to jeopardize the security of the debugging machine: there are a number of security exploits that must be guarded against.
Question: how is the debugged process able to exploit the debugging process? (I am interested in just a few highlights, as I imagine one could write a book about it.)
Also, what is the purpose of this warning when debugging the local machine's w3wp.exe process? (I imagine that the vast majority of debugging sessions happen on the development machine.) If the local machine's w3wp.exe process is compromised, you are in deep trouble anyway.
You get this warning when you attach to a process that runs under a limited user account. Like w3wp.exe: a web server is typically configured with such an account so that an attacker cannot do too much damage after figuring out how to compromise the web server. Note how you normally use an account with admin privileges to debug the web server.
This opens up a generic security hole that is very similar to the one exploited by a "shatter attack". A privilege escalation, the unprivileged process exploiting the privileges of another process. The conduit is the debugger transport, the channel that lets a debugger control the debuggee. I think a socket in the case where the process runs on another machine, a named pipe if it runs on the same machine. The compromised process could fake the messages that the debugger interprets as normal responses. Anything is possible, nothing is simple, none of this is documented. Intentionally.
Note how you still use the remote debugger when w3wp.exe runs locally. It is normally a 64-bit process and VS is 32-bit, the remote debugger (msvsmon.exe) is required to bridge the bitness difference.
It is the kind of attack scenario where Microsoft has to throw up their hands and can no longer guarantee that such an attack cannot succeed and do real damage to your machine. The attack surface is too large. So they display the dialog; you have to interpret it as a "we are no longer liable for what happens next". Plausible deniability if it ever comes to a lawsuit. The info it displays is not actually useful for judging whether the process is compromised, but it is all they've got. Life is too short to worry about it every single time you click Attach; lawyers never once made a programmer's job easier :)
When you func eval something in the debuggee, you are effectively running code on the debugger. This is where the potential security problem could be.
For example, suppose the debuggee has some types that will load a natvis into the debugger. And suppose that the C++ Expression Evaluator has a security hole in it, that allows a buffer overrun attack through a natvis. Just by debugging a certain process, the remote process could take control of your local machine. Granted this isn’t likely, but the debugger isn’t hardened against this sort of attack. The nature of debugging means you have to let any code run.
In the other direction, once a process is being debugged, the debugger has the same permissions as it does. You can do anything you want.
The warning below pops up when attaching to an unknown user's process. See this article:
https://msdn.microsoft.com/ro-ro/library/ms241736.aspx
From what I understand, on a high level, user mode debugging provides you with access to the private virtual address space of a process. A debug session is limited to that process and it cannot overwrite or tamper w/ other processes' virtual address space/data.
Kernel mode debug, I understand, provides access to other drivers and kernel processes that need full access to multiple resources, in addition to the original process address space.
From this, I get to thinking that kernel mode debugging seems more robust than user mode debugging. This raises the question for me: is there a time, when both options of debug mode are available, that it makes sense to choose user mode over a more robust kernel mode?
I'm still fairly new to the concept, so perhaps I am thinking of the two modes incorrectly. I'd appreciate any insight there, as well, to better understand anything I may be missing. I just notice that a lot of people seem to try to avoid kernel debugging. I'm not entirely sure why, as it seems more robust.
The following is mainly from a Windows background, but I guess it should be fine for Linux too. The concepts are not so different.
Some inline answers first
From what I understand, on a high level, user mode debugging provides you with access to the private virtual address space of a process.
Correct.
A debug session is limited to that process
No. You can attach to several processes at the same time, e.g. with WinDbg's .tlist/.attach commands.
and it cannot overwrite or tamper w/ other processes' virtual address space/data.
No. You can modify the memory, e.g. with WinDbg's ed command.
Kernel mode debug, I understand, provides access to other drivers and kernel processes that need full access to multiple resources,
Correct.
in addition to the original process address space.
As far as I know, you have access to physical RAM only. Some of the virtual address space may be swapped out, so the full address space is not available.
From this, I get to thinking that kernel mode debugging seems more robust than user mode debugging.
I think the opposite. If you write incorrect values somewhere in kernel mode, the PC crashes with a blue screen. If you do that in user mode, it's only the application that crashes.
This raises the question for me: is there a time, when both options of debug mode are available, that it makes sense to choose user mode over a more robust kernel mode?
If you debug an application only and no drivers are involved, I prefer user mode debugging.
IMHO, kernel mode debugging is not more robust, it's more fragile - you can really break everything at the lowest level. User mode debugging provides the typical protection against crashes of the OS.
I just notice that a lot of people seem to try to avoid kernel debugging
I observe the same. And usually it's not so difficult once they try it. In my debugging workshops, I explain processes and threads from a kernel point of view and do it live in the kernel. And once people try kernel debugging, it's not such a mystery any more.
I'm not entirely sure why, as it seems more robust.
Well, you really can blow up everything in kernel mode.
User mode debugging
User mode debugging is the default that any IDE will do. The integration is usually good, in some IDEs it feels quite native.
During user mode debugging, things are easy. If you access memory that is paged out to disk, the OS is still running and will simply page it in, so you can read and write it.
You have access to everything that you know from application development. There are threads and you can suspend or resume them. The knowledge you have from application development will be sufficient to operate the debugger.
You can set breakpoints and inspect variables (as long as you have correct symbols).
Some kinds of debugging are only available in user mode. E.g. the SOS extension for WinDbg to debug .NET applications only works in user mode.
Kernel debugging
Kernel debugging is quite complex. Typically, you can't simply do local kernel debugging - if you stop somewhere in the kernel, how do you control the debugger? The system will just freeze. So, for kernel debugging, you need 2 PCs (or virtual PCs).
During kernel mode debugging, things are complex. One moment you are inside an application; a millisecond later, some interrupt occurs and does something completely different. You don't only have threads, you also need to deal with call stacks that are outside your application, and you'll see CPU register contents, instruction pointers, etc. That's all stuff a "normal" app developer does not want to care about.
You don't only have access to everything that you implemented. You also have access to everything that Microsoft, Intel, NVidia and lots of other companies developed.
You cannot simply access all memory, because some memory that is paged out to the swap file will first generate a page fault, then involve some disk driver to fetch the data, potentially page out some other data, etc.
There is so much going on in kernel mode that, in order not to break it, you need a really professional comprehension of all those topics.
Conclusion
Most developers just want to care about their own source code. So if they are writing programs (applications, scripts, tools, games), they just want user mode debugging. If "their code" is driver code, of course they want kernel debugging.
And of course Security Specialists and Crackers want kernel mode debugging because they want privileges.
I'm using Atollic TrueSTUDIO for ARM 5.0.0 Lite for debugging an STM32F3 application via the SWD debug interface. The application receives data via interrupts from a USART.
When I "step over" a relatively long function, the application doesn't pause, i.e. the program does not reach the line after the call. When I then manually pause the application, I find it to be at the entry of the USART ISR, so I concluded that the execution was paused, even though Atollic's debugger didn't recognize it.
The bigger problem is that the same happens when I simply resume: I can't run my application with the debugger attached, as every byte on the USART pauses it.
Is my analysis of the situation correct? Is this the expected behavior, and is there a way to work around it? Non-Atollic specific answers are also very welcome!
To be honest, I couldn't form a clear picture in my mind of what's really going on, but here's a possibility: you're not clearing the proper flags using the USART_ClearITPendingBit() function call from the standard peripheral library, or its equivalent in terms of direct register access. If you don't clear the proper bits, as soon as you return from the ISR, the hardware executes it again, so it looks like you're in an infinite loop inside the ISR.
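To illustrate, here is a minimal sketch of what such a receive ISR might look like. This is only a hedged example using Standard Peripheral Library names; the handler name, which interrupts are enabled, and the buffering are assumptions about your setup, not your actual code:

/* Hedged sketch of a USART receive ISR, StdPeriph naming assumed. */
#include "stm32f30x.h"   /* device header; the name depends on your project setup */

void USART1_IRQHandler(void)
{
    if (USART_GetITStatus(USART1, USART_IT_RXNE) != RESET)
    {
        /* Reading the data register clears the RXNE flag. */
        uint8_t byte = (uint8_t)USART_ReceiveData(USART1);
        (void)byte; /* ...store it in your ring buffer... */
    }

    /* If error interrupts are enabled, their flags must be cleared explicitly,
       otherwise the ISR fires again immediately after returning. */
    if (USART_GetFlagStatus(USART1, USART_FLAG_ORE) != RESET)
    {
        USART_ClearFlag(USART1, USART_FLAG_ORE);
    }
}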
I keep wondering how a debugger works. Particularly the kind that can be 'attached' to an already running executable. I understand that the compiler translates code to machine language, but then how does the debugger 'know' what it is being attached to?
The details of how a debugger works will depend on what you are debugging, and what the OS is. For native debugging on Windows you can find some details on MSDN: Win32 Debugging API.
The user tells the debugger which process to attach to, either by name or by process ID. If it is a name then the debugger will look up the process ID, and initiate the debug session via a system call; under Windows this would be DebugActiveProcess.
Once attached, the debugger will enter an event loop much like for any UI, but instead of events coming from the windowing system, the OS will generate events based on what happens in the process being debugged – for example an exception occurring. See WaitForDebugEvent.
The debugger is able to read and write the target process' virtual memory, and even adjust its register values through APIs provided by the OS. See the list of debugging functions for Windows.
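As a rough sketch of that flow (not a complete debugger: error handling and most event types are omitted, and the PID is simply taken from the command line), the attach-plus-event-loop part could look like this:

/* Hedged sketch: attach to a process by PID and pump debug events (Win32). */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : 0;

    if (!DebugActiveProcess(pid))      /* attach; the OS starts sending debug events */
        return 1;

    for (;;)
    {
        DEBUG_EVENT ev;
        if (!WaitForDebugEvent(&ev, INFINITE))
            break;

        /* While the target is stopped, the debugger could call ReadProcessMemory,
           WriteProcessMemory, GetThreadContext/SetThreadContext, etc. */
        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT)
            printf("exception %08lx at %p\n",
                   ev.u.Exception.ExceptionRecord.ExceptionCode,
                   ev.u.Exception.ExceptionRecord.ExceptionAddress);

        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
        {
            ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
            break;
        }

        /* A real debugger would pass DBG_EXCEPTION_NOT_HANDLED for exceptions
           it does not want to swallow; this sketch just continues everything. */
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
    }
    return 0;
}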
The debugger is able to use information from symbol files to translate from addresses to variable names and locations in the source code. The symbol file information is a separate set of APIs and isn't a core part of the OS as such. On Windows this is through the Debug Interface Access SDK.
If you are debugging a managed environment (.NET, Java, etc.) the process will typically look similar, but the details are different, as the virtual machine environment provides the debug API rather than the underlying OS.
As I understand it:
For software breakpoints on x86, the debugger replaces the first byte of the instruction with CC (int3). This is done with WriteProcessMemory on Windows. When the CPU gets to that instruction and executes the int3, this causes the CPU to raise a breakpoint exception. The OS receives this interrupt, realizes the process is being debugged, and notifies the debugger process that the breakpoint was hit.
After the breakpoint is hit and the process is stopped, the debugger looks in its list of breakpoints, and replaces the CC with the byte that was there originally. The debugger sets TF, the Trap Flag in EFLAGS (by modifying the CONTEXT), and continues the process. The Trap Flag causes the CPU to automatically generate a single-step exception (INT 1) on the next instruction.
When the process being debugged stops the next time, the debugger again replaces the first byte of the breakpoint instruction with CC, and the process continues.
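A hedged sketch of that mechanism on Windows/x86 might look like the following; hProcess and address are assumed to come from an existing debug session, and a real debugger would of course track many breakpoints rather than a single saved byte:

/* Sketch: plant an int3 (0xCC) breakpoint and restore the original byte later. */
#include <windows.h>

static BYTE g_originalByte;

BOOL SetBreakpoint(HANDLE hProcess, LPVOID address)
{
    BYTE int3 = 0xCC;
    SIZE_T n;
    if (!ReadProcessMemory(hProcess, address, &g_originalByte, 1, &n))
        return FALSE;
    if (!WriteProcessMemory(hProcess, address, &int3, 1, &n))
        return FALSE;
    /* Make sure the CPU doesn't execute a stale copy of the patched code. */
    return FlushInstructionCache(hProcess, address, 1);
}

BOOL RemoveBreakpoint(HANDLE hProcess, LPVOID address)
{
    SIZE_T n;
    if (!WriteProcessMemory(hProcess, address, &g_originalByte, 1, &n))
        return FALSE;
    return FlushInstructionCache(hProcess, address, 1);
}

/* To single-step past the restored instruction (as described above), the debugger
   would set the Trap Flag: GetThreadContext, ctx.EFlags |= 0x100, SetThreadContext. */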
I'm not sure if this is exactly how it's implemented by all debuggers, but I've written a Win32 program that manages to debug itself using this mechanism. Completely useless, but educational.
In Linux, debugging a process begins with the ptrace(2) system call. This article has a great tutorial on how to use ptrace to implement some simple debugging constructs.
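For a feel of the Linux side, here is a minimal, hedged sketch using ptrace(2): attach to a PID, wait for it to stop, peek at one word of its memory (the address here is just a placeholder), and detach:

/* Hedged sketch: attach to a process with ptrace, read one word, detach. */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("PTRACE_ATTACH");
        return 1;
    }
    waitpid(pid, NULL, 0);                    /* wait until the target has stopped */

    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)0x400000, NULL); /* placeholder address */
    if (word == -1 && errno != 0)
        perror("PTRACE_PEEKDATA");
    else
        printf("word at 0x400000: 0x%lx\n", word);

    ptrace(PTRACE_DETACH, pid, NULL, NULL);   /* resume the target */
    return 0;
}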
If you're on a Windows OS, a great resource for this would be "Debugging Applications for Microsoft .NET and Microsoft Windows" by John Robbins:
http://www.amazon.com/dp/0735615365
(or even the older edition: "Debugging Applications")
The book has a chapter on how a debugger works that includes code for a couple of simple (but working) debuggers.
Since I'm not familiar with details of Unix/Linux debugging, this stuff may not apply at all to other OS's. But I'd guess that as an introduction to a very complex subject the concepts - if not the details and APIs - should 'port' to most any OS.
I think there are two main questions to answer here:
1. How the debugger knows that an exception occurred?
When an exception occurs in a process that’s being debugged, the debugger gets notified by the OS before any user exception handlers defined in the target process are given a chance to respond to the exception. If the debugger chooses not to handle this (first-chance) exception notification, the exception dispatching sequence proceeds further and the target thread is then given a chance to handle the exception if it wants to do so. If the SEH exception is not handled by the target process, the debugger is then sent another debug event, called a second-chance notification, to inform it that an unhandled exception occurred in the target process. Source
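In Win32 API terms, that first-chance/second-chance decision boils down to what the debugger passes back to ContinueDebugEvent. The following is only a hedged sketch; the function is hypothetical and would sit inside the debugger's event loop:

/* Sketch: choose the continuation status for an exception debug event. */
#include <windows.h>

DWORD HandleExceptionEvent(const DEBUG_EVENT *ev)
{
    if (ev->u.Exception.dwFirstChance)
    {
        /* First chance: give the target's own exception handlers a shot. */
        return DBG_EXCEPTION_NOT_HANDLED;
    }
    /* Second chance: nobody handled it; a real debugger would break in here
       and let the user inspect the state before continuing or terminating. */
    return DBG_CONTINUE;
}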
2. How the debugger knows how to stop on a breakpoint?
The simplified answer is: when you put a breakpoint into the program, the debugger replaces your code at that point with an int3 instruction, which is a software interrupt. As a result, the program is suspended and the debugger is called.
Another valuable source for understanding debugging is the Intel CPU manual (Intel® 64 and IA-32 Architectures Software Developer's Manual). In volume 3A, chapter 16, it introduces the hardware support for debugging, such as special exceptions and hardware debugging registers. The following is from that chapter:
T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.
I am not sure whether Windows or Linux uses this flag or not, but it is very interesting to read that chapter.
Hope this helps someone.
My understanding is that when you compile an application or DLL file, whatever it compiles to contains symbols representing the functions and the variables.
When you have a debug build, these symbols are far more detailed than when it's a release build, thus allowing the debugger to give you more information. When you attach the debugger to a process, it looks at which functions are currently being accessed and resolves all the available debugging symbols from there (since it knows what the internals of the compiled file look like, it can ascertain what might be in memory, with the contents of ints, floats, strings, etc.). As the first poster said, this information, and how these symbols work, greatly depends on the environment and the language.