gdb loses line number information (on kernel modules) after breakpoint

I am connecting gdb to a virtual machine's kernel and trying to debug the kernel module. I am able to connect to the virtual machine. I have symbol information for kernel code, and can step through kernel code just fine.
When I add the symbol file for my kernel module (whether before or after making the remote connection, incidentally), list function_name shows the function's source until I set a breakpoint on it; after that:
(gdb) b function_name
Breakpoint 1 at 0xffffffffa01d0074 (3 locations)
(gdb) list function_name
No line number known for function_name.
Additional information:
Both host and guest are Fedora 16 64-bit.
The kernel I am debugging is 3.0.8 - note that this kernel worked fine on a prior 32-bit setup with a different environment and remote-connection setup.
I have tried this with gdb 7.2 and 7.3.50.
Any ideas on what's wrong? It would help even to know for certain whether the problem is my kernel, the kernel module compilation, the connection, or gdb.
Update: With gdb 7.1, I get the following:
...
(gdb) b function_name
/gdb/breakpoint.c:7903: internal-error: expand_line_sal_maybe: Assertion `found' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
What does that mean?

A partial answer:
With gdb 7.1, recompiling the kernel and the kernel module with -gdwarf-2, and the module with -O0, seems to have done the trick. I'm not sure yet which of the two changes fixed it, or why.
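Separately from the compiler flags, it can be worth ruling out the way the module's symbol file is loaded. The question does not show the add-symbol-file command that was used, so this is only a sketch and not a confirmed cause: gdb's add-symbol-file accepts the .text load address plus -s pairs for the other sections, and on a Linux guest those addresses can be read from /sys/module/<name>/sections/. The module name and the addresses below are hypothetical.

```python
from typing import Dict


def add_symbol_file_command(module_path: str, sections: Dict[str, int]) -> str:
    """Build a gdb add-symbol-file command from section load addresses.

    `sections` maps section names (".text", ".data", ...) to the addresses
    at which the running kernel loaded them.
    """
    if ".text" not in sections:
        raise ValueError("the .text load address is required")
    # The .text address is the positional argument; the rest use -s pairs.
    parts = ["add-symbol-file", module_path, hex(sections[".text"])]
    for name in sorted(sections):
        if name != ".text":
            parts += ["-s", name, hex(sections[name])]
    return " ".join(parts)


# Hypothetical module name and addresses, for illustration only.
example = add_symbol_file_command(
    "my_module.ko",
    {
        ".text": 0xFFFFFFFFA01D0000,
        ".data": 0xFFFFFFFFA01D2000,
        ".bss": 0xFFFFFFFFA01D2400,
    },
)
print(example)
```

The printed line can be pasted at the (gdb) prompt once the module is loaded in the guest.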

Related

GDB on Windows machine

Let us say I am on a Windows machine, go to its command-line terminal, and type 'gdb'. I get the gdb prompt (gdb), which means gdb.exe is installed on the machine.
My understanding is that GDB is a client-server application. Is this gdb.exe the server or the client? If it is the former, where is the latter, and if it is the latter, where is the former?
GDB can be a client-server application, but it doesn't have to be.
What you started is gdb itself, that is, the client side. The server is a separate program called gdbserver.
Usually, you'd make use of gdbserver when you want to debug something running on a different machine over a network (though there's nothing to stop you running gdbserver on the same machine as gdb itself).
You can also use gdb to directly start an application to debug, so at the (gdb) prompt you might do:
(gdb) file /path/to/some/executable
(gdb) break main
(gdb) run
For further reading, the GDB manual has plenty of detail, including a simple example session and a section on remote debugging.

Cannot access kernel space when debugging xv6 with QEMU and GDB

I am self-studying the 2019 version of MIT 6.828/6.S081: Operating System Engineering.
I was trying to attach GDB to xv6 running on RISC-V using QEMU, to learn about what is going on when context switching happens between user mode and kernel mode.
After doing make qemu-gdb and gdb in the same directory, my GDB connected to QEMU successfully. However:
(gdb) x/2i $pc
=> 0xd8c: ecall
0xd90: ret
The problem is that if I now stepi, it "jumps over" the ecall to 0xd90 instead of stepping into kernel space.
Additionally, accessing any kernel address is not allowed, as if I were debugging a normal userland program:
(gdb) i r stvec
stvec 0x3ffffff000 274877902848
(gdb) x/i $stvec
0x3ffffff000: Cannot access memory at address 0x3ffffff000
Environment:
Host VM: Manjaro 19.0.2
sudo pacman -Syy
sudo pacman -S riscv64-linux-gnu-binutils riscv64-linux-gnu-gcc riscv64-linux-gnu-gdb qemu-arch-extra
GDB: 9.1
QEMU: 4.2.0
GCC: 9.2.0
I would much appreciate it if anyone could share some insight into what is going on here. Thanks a lot!
I guess you are running this on Ubuntu; that is where I hit the same problem. I then switched to a Mac and followed the MIT tools tutorial, and it finally worked:
Run make CPUS=1 qemu-gdb in one window.
Run riscv64-unknown-elf-gdb in another window.
Ignore the Python exception.
I managed to get around this problem by building the RISC-V toolchain as explained here.
Building the toolchain as explained on that site generates a generic ELF/Newlib toolchain, identified by the prefix riscv64-unknown-elf-, in contrast to the more sophisticated Linux-ELF/glibc toolchain identified by the prefix riscv64-unknown-linux-gnu-. The debugger from the Newlib build is able to stepi into kernel space.
For crossdev users it is possible to build the toolchain with Newlib support by running:
crossdev --ex-gcc --ex-gdb --target riscv64-unknown-elf
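As a small worked check on the stvec value shown in the question, the address 0x3ffffff000 matches the top page of the Sv39 address space that xv6 uses for its trampoline. The constants below mirror the definitions of PGSIZE, MAXVA and TRAMPOLINE in the xv6-riscv sources; that the asker's tree uses the same values is an assumption. This only identifies what stvec points at; it does not by itself explain why the Linux-targeted gdb refuses to read it.

```python
# Constants assumed to mirror xv6-riscv's kernel/riscv.h and kernel/memlayout.h.
PGSIZE = 4096                        # bytes per page
MAXVA = 1 << (9 + 9 + 9 + 12 - 1)    # one bit below the Sv39 limit
TRAMPOLINE = MAXVA - PGSIZE          # highest page of the address space

# Value reported by "i r stvec" in the question.
STVEC_FROM_QUESTION = 0x3FFFFFF000


def page_base(addr: int) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(PGSIZE - 1)


print(hex(TRAMPOLINE))
print(page_base(STVEC_FROM_QUESTION) == TRAMPOLINE)
```

So the trap vector points into the trampoline page, which is where execution should land after the ecall when stepping works.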

analyzing crash dumps with debian kernel packages

I’m trying to analyze a Linux kernel crash dump. The kernel was built from the 4.4.77 tree with some custom packages on top of that. The command to build the kernel was make-kpkg kernel_image debug_image, which produces two separate Debian packages. The idea is that the first package runs in production and the second can be used for debugging if a problem is detected. So the “kernel_image” package was installed and configured for collecting crashes per the instructions; it ran, crashed, and wrote a crash dump file.
I am using crash utility to analyze the dump. Running
crash vmlinux file.dump
with a vmlinux file uncompressed per the instructions outputs
crash: vmlinux: no .gnu_debuglink section
crash: vmlinux: no debugging data available
Installing the “debug_image” package does not change that.
I noticed that the “debug-image” package contains its own vmlinux file; it is placed in /usr/lib/debug/lib/modules/4.4.77+/ upon installation. Running
crash /usr/lib/debug/lib/modules/4.4.77+/vmlinux file.dump
outputs
WARNING: kernels compiled by different gcc versions:
/usr/lib/debug/lib/modules/4.4.77+/vmlinux: (unknown)
dump.201802261029 kernel: 4.8.4
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments:
/usr/lib/debug/lib/modules/4.4.77+/vmlinux is not SMP -- dump.201802261029 is SMP
What am I missing? Is it possible to analyze the dump from “kernel_image” using the information available in the “debug_image” package?
Update: Apparently the system's makedumpfile binary was too old for 4.4 kernels. At some point in the past the system's kernel was updated from 3.something to 4.4, but all user-mode binaries stayed as they were. That somehow caused an invalid crash dump. I could not just apt-get install a newer makedumpfile (part of the kdump-tools package) due to binary incompatibilities. The problem was resolved after I rebuilt makedumpfile v 5.9 in the same dev environment as the one used to build user-mode applications for the system. In summary, pointing crash to the vmlinux file from the "debug-image" package worked for me in the end.

Debugging kernel using qemu and gdb

I was trying to debug the kernel using qemu and gdb. For this I set up a bridged connection between qemu and the host machine. In the script I used tcp:17777:127.0.0.1:22 to connect to the qemu machine for gdb.
But when I run ssh 17777 root#localhost (root is the user in the qemu guest), I get no response.
Question 1: How will I know that I am on the right path, that is, that I can debug the kernel using qemu?
When we do:
gdb vmlinux
target remote :1234
Question 2: When I run gdb vmlinux and target remote :1234 without booting the kernel I want to debug, I still get the following output (the same output I get when I boot that kernel with qemu).
(gdb) target remote :1234
Remote debugging using :1234
default_idle () at arch/x86/kernel/process.c:299
299 current_thread_info()->status |= TS_POLLING;
Please help me understand the concept in detail, and share a link on debugging the kernel using qemu and gdb.
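On the concept: gdb does not go through ssh or the guest's network at all. QEMU itself exposes a gdb stub on a TCP port of the host (-gdb tcp::PORT, with -s as shorthand for port 1234), and -S keeps the CPU halted until gdb tells it to continue. The port forwarded to the guest's port 22 is a separate channel used only for logging in. A minimal sketch of the two command lines follows; the qemu-system-x86_64 binary and the bzImage path are assumptions, since the question only shows that the kernel is x86.

```python
import shlex
from typing import List


def qemu_debug_argv(kernel_image: str, gdb_port: int = 1234,
                    halt_at_start: bool = True) -> List[str]:
    """Command line that boots a kernel with QEMU's gdb stub enabled."""
    # Assumption: a 64-bit x86 guest; substitute the emulator binary as needed.
    argv = ["qemu-system-x86_64", "-kernel", kernel_image,
            "-gdb", f"tcp::{gdb_port}"]
    if halt_at_start:
        argv.append("-S")  # do not start the CPU until gdb says continue
    return argv


def gdb_attach_argv(vmlinux: str, gdb_port: int = 1234) -> List[str]:
    """Command line that starts gdb on vmlinux and attaches to the stub."""
    return ["gdb", vmlinux, "-ex", f"target remote :{gdb_port}"]


def as_shell(argv: List[str]) -> str:
    """Quote an argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical image path, for illustration only.
print(as_shell(qemu_debug_argv("arch/x86/boot/bzImage")))
print(as_shell(gdb_attach_argv("vmlinux")))
```

With -S in place, gdb attaching and stopping at the very first instruction is a reasonable sign that the setup works. If gdb attaches to port 1234 when you believe nothing was booted, it may be worth checking whether an earlier QEMU instance is still running and listening on that port.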

Setting up two-machine kernel debugging via firewire

The instructions found at Setting Up Kernel Debugging are what I used to get to this point. On the machine running the kext I want to debug, I do see the message "Connected to remote debugger". On the machine I am running gdb on, I do see:
(gdb) kdp-reattach localhost
Connected.
The problem is that 'showallkmods' returns an empty list and none of the other similar commands appear to be working:
(gdb) showallkmods
kmod address size id refs version name
(gdb) showalltasks
task vm_map ipc_space #acts pid process io_policy wq_state command
Invalid type combination in equality test.
(gdb) showregistry
Please load kgmacros after KDP attaching to the target.
(gdb) source /Volumes/KernelDebugKit/kgmacros
Loading Kernel GDB Macros package. Type "help kgm" for more info.
(gdb) showallkmods
kmod address size id refs version name
(gdb) showregistry
Please load kgmacros after KDP attaching to the target.
(gdb) showbootargs
Invalid cast.
I am running 10.6.8 and am using kernel_debug_kit_10.6.8_10k540.dmg
I am not sure what other details one might need to diagnose what has gone wrong, but if you want to ask questions in the comments, I can certainly attempt to provide additional details.
The error "Invalid type combination in equality test." indicates to me that gdb might be expecting a different CPU architecture than the kernel you're connecting to is running. The 10.6 kernel exists in both 32-bit and 64-bit variants, and by default it's determined by the hardware which one gets loaded. gdb normally defaults to x86_64 if your CPU supports it (true of all Intel Macs except the very early Core Duo based ones) so if you're connecting to a 32-bit kernel (the default on most Macs released before 2011) you need to pass the -arch i386 argument when starting gdb. You can check the current kernel CPU architecture by running the uname -a command.
Update: on OS X Mountain Lion, the kernel always runs in 64-bit (x86_64) mode. On OS X Lion, the kernel defaults to 64-bit mode on Macs which are capable of running Mountain Lion and to 32-bit mode otherwise.
