Is there any memory debugger for the Linux kernel?
We have issues with "NULL pointer dereference" kernel oopses, among other crashes, on Android/Linux ARM-based hardware.
Thanks
Modern kernels include a good number of built-in diagnostic tools (available in the "Kernel hacking" sub-menu of the kernel configuration tool). However, on embedded targets you also have the option of using gdb with a good JTAG debugger, such as the Abatron BDI series (this, of course, allows for the most precise diagnostics, including diagnosis of interrupt-related problems).
In the absence of a hardware debugger, the following options can be quite handy for tracking down memory problems (don't forget to build the kernel with "Compile the kernel with debug info" and "Compile the kernel with frame pointers" set):
Kernel memory leak detector (kmemleak) - useful for catching kmalloc/kfree errors; a sample session is sketched just after this list.
KGDB (with suboptions) - enables a gdb server built into the kernel, which a gdb front-end can talk to over a serial port. There is also a KGDB_KDB option for doing the same by hand (omitting the gdb front-end and using a human-readable command interface).
kmemcheck - requires the least human interaction and the most machine resources, but can be handy for an initial analysis of memory-related problems.
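For the kmemleak item above, a typical session looks roughly like this (assuming CONFIG_DEBUG_KMEMLEAK=y and a kernel with debugfs; the paths are the standard ones, though Android builds may mount debugfs elsewhere):

    mount -t debugfs nodev /sys/kernel/debug   # if debugfs is not already mounted
    echo scan > /sys/kernel/debug/kmemleak     # trigger an immediate scan
    cat /sys/kernel/debug/kmemleak             # list suspected leaks with backtraces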
There are plenty of other diagnostic options, useful for more specific classes of problems. Most of them are reasonably documented, both in the kernel configuration tool's help text and in separate documents under the Documentation/ sub-directory of the source tree (plus various online publications).
Related
I want to know whether the normal practice of setting breakpoints and stepping in and out works the same for code that resides in ROM. Do we have to do something extra for ROM debugging?
It will depend largely on the processor and the debug hardware you use. Many microcontrollers include on-chip debug hardware with hardware breakpoints that are essentially program-counter comparators. Other facilities may be supported, such as data-access breakpoints and instruction trace, essentially an on-chip in-circuit emulator (ICE).
Hardware breakpoints are necessarily a limited resource; for example, ARM7 devices have just two, while ARM Cortex-M3/4 parts are endowed with eight.
Either way, to utilise on-chip debug you require suitable debugger hardware (often via JTAG, or a vendor proprietary interface) to interface the target to the host debugger software.
For chips without on-chip debug, you typically use an in-circuit emulator. This is debug hardware that connects to the target board in place of the processor and can be controlled directly by the host debug software. The emulator hardware executes instructions identically to the actual processor, but can be halted and stepped and have breakpoints set. Essentially the ICE works like a special version of the target processor with debug support. A true ICE is uncommon on modern processors, since on-chip debug capabilities are almost ubiquitous even on small devices such as PIC and AVR; however, some external debug hardware supports features not available with on-chip debug alone. For example, SEGGER's J-Link supports unlimited breakpoints on ARM7 and Cortex-M3/4.
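As a small illustration of the above: with GDB and a JTAG probe you can request a hardware breakpoint explicitly, which is what you need for code running from ROM/flash (the function name, address, and file here are made up):

    (gdb) hbreak boot_from_rom
    Hardware assisted breakpoint 1 at 0x8000124: file boot.c, line 42.

A plain break normally plants a software breakpoint, which cannot be written into read-only memory (though some GDB stubs transparently substitute hardware breakpoints).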
What is the difference between hardware and software breakpoints?
Hardware breakpoints are said to be faster than software breakpoints. If that is true, how? And why would we need software breakpoints at all?
This article provides a good discussion of pros and cons:
http://www.nynaeve.net/?p=80
To answer your question directly: software breakpoints are more flexible, because hardware breakpoints are limited in functionality and highly architecture-dependent. One example given in the article is that x86 hardware is limited to four hardware breakpoints.
Hardware breakpoints are faster because they have dedicated registers and less overhead than software breakpoints.
Hardware breakpoints are actually comparators, comparing the current PC with the address in the comparator (when enabled). Hardware breakpoints are the preferred solution when setting breakpoints, and are typically set via the debug probe (using JTAG, SWD, ...). Their downside: they are limited. CPUs have only a small number of hardware breakpoints (comparators), and the number available depends on the CPU: ARM7/ARM9 cores have 2, modern ARM devices (Cortex-M0/M3/M4) between 2 and 6, x86 usually 4.
Software breakpoints are in fact set by replacing the instruction to be breakpointed with a breakpoint instruction. The breakpoint instruction is present in most CPUs and is usually as short as the shortest instruction, so only one byte on x86 (0xCC, INT 3). On Cortex-M CPUs, instructions are 2 or 4 bytes, so the breakpoint instruction is a 2-byte instruction.
Software breakpoints can easily be set if the program is located in RAM (such as on a PC). A lot of embedded systems have the program located in flash memory, where it is not so easy to exchange the instruction, as the flash needs to be reprogrammed; so hardware breakpoints are used primarily. Most debug probes support only hardware breakpoints if the program is located in flash memory. However, some (such as SEGGER's J-Link) allow reprogramming the flash memory with the breakpoint instruction, and so allow an unlimited number of (software) breakpoints even when debugging a program located in flash.
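To make the RAM case concrete, here is a minimal sketch of how a user-space debugger on x86 Linux plants a software breakpoint via ptrace(). This illustrates the technique, not GDB's actual code; error handling and the tracee set-up are omitted:

    /* Plant a software breakpoint: save the original word, then patch the
     * low byte with INT 3 (0xCC). To continue past it later, a debugger
     * restores the word, single-steps, re-inserts 0xCC, then resumes. */
    #include <sys/ptrace.h>
    #include <sys/types.h>

    long set_sw_breakpoint(pid_t pid, unsigned long addr)
    {
        long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        long patched = (orig & ~0xffL) | 0xcc;
        ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched);
        return orig;   /* caller keeps this to restore the instruction */
    }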
More info about software breakpoints in flash memory
You can go through the GDB Internals documentation; it explains HW and SW breakpoints very well.
HW breakpoints require support from the MCU. ARM controllers have special registers into which you can write an address; whenever PC (program counter) == that special register, the CPU halts. JTAG is usually required to write into those special registers.
SW breakpoints are implemented in GDB by inserting a trap, an illegal divide, or some other instruction that will cause an exception; when it is encountered, gdb takes the exception and stops the program. When the user says to continue, gdb restores the original instruction, single-steps, re-inserts the trap, and continues on.
There are a lot of advantages to using HW debuggers over SW debuggers, especially when you are dealing with interrupts and memory-bus devices. AFAIK interrupts cannot be debugged with software debuggers.
In addition to the answers above, it is also important to note that while software breakpoints overwrite specific instructions in the program to know where to stop, the more limited number of hardware breakpoints are actually part of the processor.
Justin Seitz, in his book Gray Hat Python, points out that the important difference here is that, by overwriting instructions, software breakpoints actually change the CRC of the program, so any program (such as a piece of malware) that calculates its own CRC can change its behavior in response to breakpoints being set, whereas with hardware breakpoints it is far less obvious that the debugger is stopping and stepping through certain chunks of code.
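A toy example of that trick (hypothetical code, not from the book): the program checksums the first bytes of one of its own functions, so the 0xCC byte written by a software breakpoint changes the sum, while a hardware breakpoint leaves it untouched.

    #include <stdio.h>

    static int secret(int x) { return x * 41; }

    int main(void)
    {
        /* Non-portable but common: read our own code bytes. The 64-byte
         * length is arbitrary for this sketch. */
        const unsigned char *p = (const unsigned char *)(void *)secret;
        unsigned sum = 0;
        for (int i = 0; i < 64; i++)
            sum += p[i];
        printf("secret(1) = %d, code checksum: %08x\n", secret(1), sum);
        return 0;   /* the checksum differs if 0xCC has been planted */
    }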
In brief, hardware breakpoints make use of dedicated registers and hence are limited in number. They can be set on both volatile and non-volatile memory.
Software breakpoints are set by replacing the opcode of an instruction in RAM with a breakpoint instruction. They can therefore be set only in RAM (flash memory is not feasible to rewrite on the fly) and are unlimited in number.
This article provides a good explanation of breakpoints.
Software breakpoints put an instruction in RAM that is executed like a TRAP when your program reaches that address.
Hardware breakpoints, by contrast, use a CPU register to implement the breakpoint itself. That is why hardware breakpoints are much faster, and also why we still need software breakpoints: hardware breakpoints are limited to the number of registers the processor dedicates to them.
I have learned it at work today :)
Watchpoints are where it makes a huge difference
This is a case where hardware handling is much faster:
watch var
rwatch var
awatch var
When you enter those commands in GDB 7.7 on x86-64, it says:
Hardware watchpoint 2: var
This hardware capability for x86 is mentioned at: http://en.wikipedia.org/wiki/X86_debug_register
It is likely made possible by the existing paging circuitry, which already manages every memory access.
The "software" alternative is to single-step the program, which is very slow.
Compare that to regular breakpoints, where at least the software implementation injects an int3 instruction at the break point and lets the program run, so you only pay overhead when a breakpoint is hit.
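You can see the difference for yourself: GDB has a setting that forbids hardware watchpoints, forcing the single-stepping fallback (the variable names here are just examples):

    (gdb) watch var
    Hardware watchpoint 2: var
    (gdb) set can-use-hw-watchpoints 0
    (gdb) watch other_var
    Watchpoint 3: other_var

The second one is a software watchpoint: GDB single-steps the entire program and re-checks the value after every instruction, which is typically orders of magnitude slower.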
A quote from the Intel System Debugger help doc:
Hardware vs. Software Breakpoints
The debugger can use both hardware and software breakpoints; each has strengths and weaknesses:
Hardware Breakpoints are implemented using the DRx architectural
breakpoint registers described in the Intel SDM. They have the
advantage of being usable directly at reset, being non-volatile, and
being usable with flash or other read-only memory. The downside is
that they are a finite resource. Software Breakpoints require modifying system memory, as they are implemented by replacing the opcode at the desired location with a special instruction. This makes them an unlimited resource, but the memory dependency means you cannot install them prior to a module being loaded in memory, and if the target software overwrites that memory then they will become invalid.
In general, any debug feature that must be enabled by the debugger
does not persist after a reset, and may be impacted after other
architectural mode transitions such as SMM entry/exit or VM
entry/exit. Specific examples include:
CPU Reset will clear all debug features, except for reset break. This means, for example, that user-specified breakpoints will be invalid until the target halts once after reset. Note that this halt can be due either to a reset-break or to a user-initiated halt; in either case the debugger will restore the necessary debug features.
SMM Entry/exit will disable/re-enable breakpoints; this means you cannot specify a breakpoint in SMRAM while halted outside of SMRAM. If you wish to break within SMRAM, you must first halt at the SMM entry-break and manually apply the breakpoint. Alternatively, you can patch the BIOS to re-enable breakpoints when entering SMM, but this requires the ability to modify the BIOS and so cannot be used in production code.
On x86, GDB uses special hardware resources (debug registers?) to set watchpoints. In some situations, when there are not enough of those resources, GDB will set the watchpoint, but it won't work.
Is there any way to programmatically monitor the availability of these resources on Linux? Maybe some info in procfs, or something similar. I need this info to choose a machine from a pool for debugging.
From GDB Internals:
"Since they depend on hardware resources, hardware breakpoints may be limited in number; when the user asks for more, gdb will start trying to set software breakpoints. (On some architectures, notably the 32-bit x86 platforms, gdb cannot always know whether there's enough hardware resources to insert all the hardware breakpoints and watchpoints. On those platforms, gdb prints an error message only when the program being debugged is continued.)"
"Too many different watchpoints requested. (On some architectures, this situation is impossible to detect until the debugged program is resumed.) Note that x86 debug registers are used both for hardware breakpoints and for watchpoints, so setting too many hardware breakpoints might cause watchpoint insertion to fail."
"The 32-bit Intel x86 processors feature special debug registers designed to facilitate debugging. gdb provides a generic library of functions that x86-based ports can use to implement support for watchpoints and hardware-assisted breakpoints."
I need this info to choose a machine from a pool for debugging.
No, you don't. The x86 debug registers (there are 4) are a per-process resource, not a per-machine resource [1]. You can have up to 4 hardware watchpoints for every process you are debugging. If someone else is debugging on the same machine, you are not going to interfere with each other.
[1] More precisely, the registers are multiplexed by the kernel, in the same way as e.g. the EAX register: every process on the system, and the kernel itself, uses EAX; there is only a single EAX register on a (single-core) CPU, yet it all works fine through the magic of time-slicing.
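Incidentally, the per-process nature is visible through ptrace itself: the kernel exposes each tracee's debug registers in its USER area. A sketch for x86 Linux with glibc headers (error handling omitted; pid must already be stopped under ptrace):

    #include <stddef.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>

    /* Read the debug control register DR7 of one particular tracee. */
    unsigned long read_dr7(pid_t pid)
    {
        return ptrace(PTRACE_PEEKUSER, pid,
                      (void *)offsetof(struct user, u_debugreg[7]), NULL);
    }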
I have to test some low-level code on an ARM architecture. Typically, experimentation is quite complicated on the real board, so I was thinking about QEMU.
What I'd like to get is some kind of debugging information, like printfs or gdb. I know this is simple with Linux, since it implements both the device driver for the QEMU Integrator board and the gdb feature, but I'm not working with Linux. I also suspect that extracting this kind of functionality from the Linux kernel source code would be complicated.
I'm searching for some simple operating system that already implements one of those features. Do you have any advice?
You don't need a target OS to debug code that's running inside QEMU -- QEMU already does that for you.
Specifically, QEMU supports remote debugging from GDB -- you can run QEMU with the appropriate command-line options and it will export an interface that a copy of GDB (running on the host machine) can connect to. At that point, you can debug the program in GDB pretty much just as if you were running it on the host machine.
http://wiki.osdev.org/GDB appears to have a bit more basic information; possibly not enough to completely get you started, but at least enough to give you the basic idea and some terms to look for in the QEMU and GDB documentation. Skip over the bit about "Implementing GDB Stubs", which doesn't apply here since QEMU already has one, and start at the section on "Using Emulator Stubs". The short form is simply that you start QEMU with the -s option (export a GDB connection on localhost:1234) and the -S option (wait for a GDB "continue" command before starting execution), and then in GDB on your host you say target remote :1234 instead of run. Also, of course, you need to be using an ARM version of GDB rather than a native x86 one.
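Concretely, a session might look like this (the machine model, image name, and GDB binary name are examples; substitute your own):

    $ qemu-system-arm -M versatilepb -kernel my_os.elf -s -S
    $ arm-none-eabi-gdb my_os.elf
    (gdb) target remote :1234
    (gdb) break main
    (gdb) continue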
(In addition, if you're willing to pay for a commercial solution, CodeSourcery's ARM toolchain has the IDE integration to set all of this up automatically, including support for "printf" to print into the debugger console. That works on a physical board, too, if you've got a hardware debugger. Usual disclaimer about me being a CodeSourcery employee applies -- but I do find it very easy to use.)
Update, 2012: CodeSourcery's toolchain is now called Mentor Graphics Sourcery CodeBench, but all the above still applies.
I realise that I am addressing your original problem here rather than your proposed solution (perhaps that's better?), but to use GDB (or Insight/GDB) directly on the target, use a low-cost JTAG tool and OpenOCD. An example of such a set-up and how to implement it can be found here.
If you have a larger budget, a more fully featured JTAG debugger may be useful, such as the Abatron BDI3000 with bdiGDB firmware which allows remote debugging and device programming over Ethernet with GDB and no special drivers or target debug agent.
Maybe a microkernel like OKL4 would suit your needs?
I'm currently setting up VMware Server 2.0 for kernel debugging with gdb (see this setup guide), and someone asked me: why not use KVM?
So I ask: KVM vs. VMware for kernel debugging / USB driver development - what are the pros and cons of each?
Driver development? Are you working on a driver for a particular piece of hardware? If so, then you probably won't be able to use virtualization, because the virtualized instance won't have access to the new hardware.
For this you will need two machines, one running a remote debugger attached to the other.
Edit: Apparently you're developing a driver for a USB device? This is one area in particular where a VM actually can help. These days most VMs have the ability to delegate specific USB devices to a guest OS.
That said, this situation doesn't really offer any benefits over the remote-debugger option, because you still need a way to inspect the state of the running or crashed OS, and VMs offer very little assistance in this regard. You might be able to replay saved states from just before a crash.
You might be able to get a bit of traction using UML, which would allow you to debug locally as you would a regular user process, which is a little less trouble.
Instead of answering the direct question, I'll add another option... Depending on whether the kernel in question is a Linux kernel, and which part(s) of it you are working on, you might find that User-Mode Linux (included in the 2.6.x source, and available as patch sets for 2.4 and 2.2) trumps both of those options.
As it runs the kernel as a userland process under the host kernel, it is easier to attach common debugging tools to. I believe it is very commonly used in the early stages of updates/additions to filesystem-related code. If you are developing/debugging modules that interact directly with hardware, though, it may be much less useful to you.
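For reference, getting a UML kernel up and under the debugger is roughly this (the root filesystem image name is an example, and details vary between kernel versions):

    $ make ARCH=um defconfig && make ARCH=um    # produces a ./linux binary
    $ gdb --args ./linux ubd0=root_fs mem=256M
    (gdb) handle SIGSEGV pass nostop noprint    # UML uses SIGSEGV internally
    (gdb) run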
I recently started building GNU Mach/HURD and found the combination of QEMU/KVM to work really quite well, for the following reasons:
QEMU presents quite a clean environment
Networking has a lot of options
I can easily mount the filesystem using a raw device file / loopback
The bottom line is that for kernel work I just want the minimum of functionality needed to boot and see the result. VMware is aimed much more at usable virtualization than at down-and-dirty work.
There is, however, no comparison to booting on a real machine with real hardware. The VM environment can seem like a safety blanket sometimes... because even my toaster would know what a Realtek RTL8139C was.
If it is a "real hardware" device, of course, VMware will not emulate it, so you won't be able to debug the driver under it (nor under any other virtualisation software, unless you extend one to do so).
Device driver debugging can be done to some extent on a real hardware machine with a normal kernel, although there are obviously things you can't do, like set breakpoints.
It is still possible to attach a debugger to the kernel and inspect things. Moreover, traditional printf() debugging is quite possible (printk, anyone?), and there are various features in the kernel that make debugging easier. It's possible to build the kernel with various debug options that try to detect pointer problems, memory leaks, etc.
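For example, a trivial module that logs via printk and dumps a backtrace on demand (a minimal sketch of the technique, not a real driver):

    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init dbg_init(void)
    {
        printk(KERN_DEBUG "dbg: module loaded\n");
        dump_stack();   /* log an oops-style backtrace without crashing */
        return 0;
    }
    module_init(dbg_init);

    MODULE_LICENSE("GPL");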
By default, the kernel even gives a nice-ish stack trace in the log when it encounters an OOPS or BUG condition (obviously this does not necessarily get written anywhere if the system hangs or crashes). Of course, a pointer-out-of-range condition happening inside an interrupt is a recipe for disaster, but you could still get a stack trace on the screen immediately before the panic :)