How to find exact line number of FreeBSD kernel panic? - debugging

I am testing a new driver on the FreeBSD kernel.
This might be trivial for experienced developers, but I can't figure out the solution to this problem.
I have a kernel panic, and when it happens I get a backtrace.
The backtrace says that the panic occurred at, say, foo_bar() + 0x94. How can I extract the line number corresponding to foo_bar() + 0x94?
The kernel is built with debugging symbols. I have tried grepping the output of nm on the kernel, but that only gives me symbol addresses, not source lines.
What can I do to find the exact line number?

I suggest reading the FreeBSD Developers' Handbook chapter on kernel debugging:
https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
It has a detailed explanation of how to create a core file and how to invoke gdb on it.
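For the concrete question of turning foo_bar() + 0x94 into a file and line, one approach is to point kgdb (or gdb) at a kernel image that still carries its DWARF info, e.g. the kernel.debug file in the kernel build directory, and ask for the line at that offset. A sketch; foo_bar comes from the question, and the path is illustrative and depends on your FreeBSD version and kernel config name:
kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug
(kgdb) info line *(foo_bar+0x94)
(kgdb) list *(foo_bar+0x94)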

Configure crash dumps - it's usually just a matter of adding 'dumpdev="AUTO"' to /etc/rc.conf and rebooting - and, after the crash and subsequent reboot, analyze the dump with the debugger, like this: "kgdb /boot/kernel/kernel /var/crash/vmcore.latest". The "kgdb" tool is basically GDB hacked up to support kernel debugging; its "where" command should show you the backtrace.
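As a sketch, that whole workflow looks roughly like this (vmcore.latest as in the answer above; the numbered vmcore.N files work the same way, and the frame number is illustrative):
# /etc/rc.conf
dumpdev="AUTO"
# after the panic and the subsequent reboot, savecore(8) has written the dump:
kgdb /boot/kernel/kernel /var/crash/vmcore.latest
(kgdb) where
(kgdb) frame 10    # pick the frame inside your driver
(kgdb) list        # show the source around that frame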

Related

Debugging a custom OS with QEMU

I am trying to write a simple OS. I have already written a bootloader, and now I want to debug it, so I switched from VirtualBox to QEMU because I saw it had better debugging support.
The problem is that after I added the -s parameter to the QEMU command and successfully connected via GDB, it says that the symbol table isn't loaded and that I should use the "file" command.
The only difference between what I did and what I've seen people on the Internet do is that they started GDB with gdb vmlinux, which I can't do because I am not debugging a Linux kernel. So I figured the issue is that I didn't start GDB with an executable. But running the "file" command on my OS image, and on the compiled and linked .out file, tells me they are "DOS/MBR boot sector" files, so I can't start GDB with either of them (I tried anyway, but GDB failed).
Help would be appreciated.
EDIT: also, I did assemble the bootloader with the -g and --gstabs+ options.
gdb would like a file so that it can give you symbolic debugging information. For that you will need to give it a file in a format with debug info which corresponds to where your OS ends up in RAM. The "DOS/MBR boot sector" file is a disk image (the BIOS will load part of this into RAM for you, and it will then presumably finish loading code itself).
But gdb will also entirely happily let you do assembly-level debugging; you can just ignore the warning about not having a symbol table and use single-instruction stepping, disassembly from the PC, and similar commands:
"disas $pc,+32" disassembles 32 bytes from the current PC
the display command prints after execution stops, so "disp /3i $pc" will print the next 3 instructions every time gdb gets control
"stepi" and "nexti" do single-instruction step/next ("step" and "next" are source-line stepping and require debug info)

Is there a way to log a stack trace with symbols from a kext on osx?

I would like to use it to debug kernel drivers, but I want to avoid adding logging to every function. OSReportWithBacktrace seems to work, but I need symbols.
I'm not aware of a way to print symbolicated stack traces directly from a kext. You can get symbolicated panic logs by adding keepsyms=1 to the boot-args nvram variable. I suspect the data structures for this have private linkage so you probably can't replicate the symbolicated panic code in your own kext. (It's in osfmk/i386/AT386/model_dep.c of the xnu source though if you want to try.)
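For example (note that setting nvram boot-args replaces the whole variable, so check what is already there first):
nvram boot-args                        # see what is currently set
sudo nvram boot-args="keepsyms=1"      # replaces any existing boot-args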
Your other option is to send the output from OSReportWithBacktrace through the atos command-line tool. For kext symbols, you'll need to find the kext's load address from kextstat and pass that to the -l command line argument.
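A sketch of that atos step; the bundle ID, path, and addresses here are made-up placeholders:
kextstat | grep com.example.mydriver    # note the kext's load address
atos -o /tmp/MyDriver.kext/Contents/MacOS/MyDriver -arch x86_64 -l 0xffffff7f80a12000 0xffffff7f80a12345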
Finally, you can of course use lldb kernel debugging to get a stack trace. If you need to set a breakpoint during early kext load, before you get a chance to do it from the lldb command line, you can insert __asm__("int $3") (IIRC) at the point in the code where you want to break into the debugger.

osx kernel debug cannot 'malloc_get_all_zones'

I am doing some OS X kernel debugging with lldb and the KDK.
When the kernel crashes, I want to view and search the zones.
So I use:
(lldb) command script import lldb.macosx.heap
(lldb) cstr_refs CSTRING
This command always works when debugging in ring 3, but in kernel debugging lldb gives me an error:
error: error: use of undeclared identifier 'malloc_get_all_zones'
error: 1 errors parsing expression
Is the heap.py script unusable for kernel debugging?
How can I search the kernel zones in this situation?
Someone more familiar with kernel issues can maybe tell you how to get the information you want out of the kernel. I can answer the part about "heap.py": it is only meant to be used when debugging userland programs. It relies on details of the userland malloc implementation, and it relies on being able to call functions in the debuggee, which is not currently possible when debugging the kernel.
Note: if you get the KDK so that you have the dSYM for the mach kernel, it defines a bunch of commands that poke around in the kernel's data structures. It may be that one of them will tell you what you want to know. Remember to run the lldb command:
(lldb) settings set target.load-script-from-symbol-file true
in order to allow lldb to read in the Python from the dSYM that defines all these macros. Then running the lldb help command will show you all the kernel-specific commands.
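A minimal sketch of the order of operations (the KDK path is illustrative); the setting has to be in place before the kernel target and its dSYM are loaded, so that lldb is allowed to run the bundled Python:
(lldb) settings set target.load-script-from-symbol-file true
(lldb) target create /Library/Developer/KDKs/<your KDK>.kdk/System/Library/Kernels/kernel.development
(lldb) help    # the kernel-specific commands from the dSYM show up here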

Is it possible to use gdb and qemu to debug linux user space programs and kernel space simultaneously?

So far, with gdb + qemu, I can step into/over linux kernel source code. Is it possible to debug the user space programs simultaneously? For example, single step a program from user space to kernel space so I can observe the changes of registers on the qemu monitor by issuing info registers?
Minimal step-by-step setup
Mahouk is right, but here is a fully automated QEMU + Buildroot example, which presupposes that you already know how to debug the kernel with QEMU + gdb, together with a more detailed explanation. First, find the entry point of your executable:
readelf -h myexecutable | grep Entry
Gives:
Entry point address: 0x4003a0
So inside GDB we need to do:
add-symbol-file myexecutable 0x4003a0
b main
And only then start the executable in QEMU:
myexecutable
A more reliable way to do that is to set myexecutable as the init process if you can do that.
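For instance, something like this (a sketch assuming a Buildroot-style kernel plus initramfs; the file names are illustrative):
qemu-system-x86_64 -kernel bzImage -initrd rootfs.cpio \
    -append "init=/myexecutable nokaslr console=ttyS0" -nographic -s -S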
add-symbol-file is also mentioned at: How to load multiple symbol files in gdb
Why would you ever want to do this instead of gdbserver?
I can only see one use case for this so far: debugging init itself (see: Debug init on Qemu using gdb).
Otherwise, why not just use the following more reliable method, e.g. to step into a syscall:
start two remote GDB sessions (a concrete sketch follows this list):
one with qemu-system-* -s
the other gdbserver myexecutable as explained at: https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-mips-elf-with-qemu-toolchain/16214#16214
step in gdbserver's GDB as close as possible to the system call, which often means stepping into the libc
on QEMU's GDB, do e.g. b sys_read for the read syscall
back on gdbserver, do continue
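A minimal sketch of that setup; the kernel image, program name, guest address, and ports are illustrative assumptions:
# terminal 1: boot the guest with QEMU's gdbstub listening on :1234
qemu-system-x86_64 -kernel bzImage -append "console=ttyS0" -nographic -s
# inside the guest: run the program under gdbserver
gdbserver :2345 ./myexecutable
# terminal 2: kernel-side gdb
gdb vmlinux -ex 'target remote :1234' -ex 'b sys_read' -ex 'c'
# terminal 3: userland gdb
gdb ./myexecutable -ex 'target remote <guest-ip>:2345' -ex 'b main' -ex 'c'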
I propose this because:
using the QEMU GDB for userland can lead to random jumps as the kernel context switches to another process that uses the same virtual addresses
I was not able to load shared libraries properly without gdbserver: attempting sharedlibrary directly gives:
(gdb) sharedlibrary ../../staging/lib/libc.so.0
No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
As a consequence, since most kernel interactions go through the standard library, you would need to do a lot of smart assembly stepping to find the kernel entry, which could be impractical.
That is, until someone writes a smarter GDB script that steps every instruction until a context switch happens or until source becomes available. I wonder if such a script wouldn't be too slow, as the naive approach has the overhead of communicating to and from GDB for every instruction.
This might get you started: Tell gdb to skip standard files
Parsing Linux kernel data structures
To do userland process debugging properly, that's what we would have to do eventually: thread-aware gdb for the Linux kernel
I achieved it by using the gdb command add-symbol-file to add the userspace programs' debugging information, but you must know these programs' load addresses. So, to be precise, you have to launch the kernel debugging session by connecting gdb to gdbserver as usual, and then you can add the program's debugging information. You can also use a .gdbinit script, though. Read this.
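A .gdbinit along these lines (reusing the load address from the readelf example above; the file names are illustrative) saves retyping the setup each time:
# .gdbinit for the kernel-side gdb session
file vmlinux
target remote :1234
add-symbol-file myexecutable 0x4003a0
break main
continue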

I need to find the point in my userland code that crashes my kernel

I have a big system that makes my machine crash hard. When I boot up, I don't even have a core dump. If I log every line that gets executed until the system goes down, I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not
take the system down. After learning about core dump inspection with mdb, and some gdb stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
Make sure you know which process causes the core dump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (see man ulimit).
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
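A rough sketch of automating that (using nm rather than objdump, and ordinary breakpoints with attached command lists rather than true tracepoints; the binary name is a placeholder):
nm --defined-only ./myapp \
  | awk '$2 ~ /[Tt]/ { print "break " $3 "\ncommands\nsilent\nbt 1\ncontinue\nend" }' \
  > trace.gdb
gdb -x trace.gdb ./myapp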
And don't panic. You're smarter than the bug - you'll find it.
You cannot reasonably track every line of your source using GDB (it is far too slow). Besides, a system crash is most likely the result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused the OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on User-Mode Linux, or on a kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is Linux, seeing that you are using the GNU debugger, but of course that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but whether you will have the right output logged before the system crashes depends on where the problem is. There is plenty of delay in writing to disk; you may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control that debugger remotely over a serial cable (or FireWire) from a second debugger. I'm pretty sure gdb has similar capabilities; I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html
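With gdb the closest equivalent is KGDB over a serial line. A sketch, assuming the target kernel was built with KGDB support and a null-modem cable between the two machines:
# on the target machine's kernel command line
kgdboc=ttyS0,115200 kgdbwait
# on the development machine
gdb ./vmlinux
(gdb) set serial baud 115200
(gdb) target remote /dev/ttyS0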
