analyzing crash dumps with debian kernel packages - linux-kernel

I’m trying to analyze a Linux kernel crash dump. The kernel was built out of the 4.4.77 tree with some custom packages on top of that. The command to build the kernel was make-kpkg kernel_image debug_image, producing two different Debian packages. The idea is that the first package runs in production, and the second package can be used for debugging if a problem is detected. So the “kernel_image” package was installed, configured for collecting crashes per instructions, ran, crashed, and wrote a crash dump file.
I am using the crash utility to analyze the dump. Running
crash vmlinux file.dump
with the vmlinux file uncompressed per instructions outputs
crash: vmlinux: no .gnu_debuglink section
crash: vmlinux: no debugging data available
Installing the “debug_image” package does not change that.
I noticed that the “debug-image” package contains its own vmlinux file; it is placed in /usr/lib/debug/lib/modules/4.4.77+/ upon installation. Running
crash /usr/lib/debug/lib/modules/4.4.77+/vmlinux file.dump
outputs
WARNING: kernels compiled by different gcc versions:
/usr/lib/debug/lib/modules/4.4.77+/vmlinux: (unknown)
dump.201802261029 kernel: 4.8.4
WARNING: kernel version inconsistency between vmlinux and dumpfile
crash: incompatible arguments:
/usr/lib/debug/lib/modules/4.4.77+/vmlinux is not SMP -- dump.201802261029 is SMP
What am I missing? Is it possible to analyze the dump from “kernel_image” using the info available in the “debug_image” package?
Update: Apparently the makedumpfile binary on the system was too old for 4.4 kernels. At some point in the past the system's kernel was updated from 3.something to 4.4, but all user-mode binaries stayed as they were. That somehow caused an invalid crash dump. I could not just apt-get install a newer makedumpfile (part of the kdump-tools package) due to binary incompatibilities. The problem was resolved after I rebuilt the makedumpfile binary v 5.9 in the same dev environment as the one used to build user-mode applications for the system. In summary, pointing crash to the vmlinux file from the "debug-image" package worked for me in the end.
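The version warnings above come from comparing the kernel banner in vmlinux with the one recorded in the dump. As a rough way to see what banner a given uncompressed vmlinux carries before handing it to crash, the following is a minimal sketch in Python. The function names parse_banner and banner_of_file are made up for this example, and it assumes the banner is stored as a plain "Linux version ..." string in the file.

```python
import re
from pathlib import Path

# The banner looks roughly like:
#   Linux version 4.4.77+ (user@host) (gcc version 4.8.4 ...) #1 SMP Mon ...
# Requiring a digit after "Linux version " skips the printf-style format string.
BANNER_RE = re.compile(rb"Linux version (\d\S*)[^\x00\n]*")
GCC_RE = re.compile(rb"gcc version (\d+(?:\.\d+)+)")


def parse_banner(blob):
    """Return release, gcc version and SMP flag from the first banner found, or None."""
    match = BANNER_RE.search(blob)
    if match is None:
        return None
    banner = match.group(0)
    gcc = GCC_RE.search(banner)
    return {
        "release": match.group(1).decode("ascii", "replace"),
        "gcc": gcc.group(1).decode("ascii") if gcc else None,
        "smp": b" SMP " in banner,
    }


def banner_of_file(path):
    """Hypothetical helper: read a whole (uncompressed) vmlinux and parse its banner."""
    return parse_banner(Path(path).read_bytes())
```

If the release, gcc version, or SMP flag reported for the debug vmlinux differs from what the running production kernel prints in /proc/version, the two files were probably not produced by the same build. If they match and crash still complains, the dump itself is the more likely suspect, as turned out to be the case here.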

Cannot access kernel space when debugging xv6 with QEMU and GDB

I am self-studying the 2019 version of MIT 6.828/6.S081: Operating System Engineering.
I was trying to attach GDB to xv6 running on RISC-V using QEMU, to learn about what is going on when context switching happens between user mode and kernel mode.
After doing make qemu-gdb and gdb in the same directory, my GDB connected to QEMU successfully. However:
(gdb) x/2i $pc
=> 0xd8c: ecall
0xd90: ret
The problem is: Now if I stepi, it "jumps over" to 0xd90 instead of stepping into the kernel space.
Additionally, accessing any kernel addresses is not allowed, as if I was debugging a normal userland program:
(gdb) i r stvec
stvec 0x3ffffff000 274877902848
(gdb) x/i $stvec
0x3ffffff000: Cannot access memory at address 0x3ffffff000
Environment:
Host VM: Manjaro 19.0.2
sudo pacman -Syy
sudo pacman -S riscv64-linux-gnu-binutils riscv64-linux-gnu-gcc riscv64-linux-gnu-gdb qemu-arch-extra
GDB: 9.1
QEMU: 4.2.0
GCC: 9.2.0
I'd much appreciate it if anyone could share some insight into what is going on here. Thanks a lot!
I guess you are running your code on Ubuntu. I hit the same problem there; after I switched to a Mac and followed the MIT tools tutorial, it finally worked.
Run make CPUS=1 qemu-gdb in one window.
Run riscv64-unknown-elf-gdb in another window.
Ignore the Python exception.
I managed to get around this problem by building the RISC-V toolchain as explained here.
Building the toolchain as explained on that site generates a generic ELF/Newlib toolchain, identified by the prefix riscv64-unknown-elf-, in contrast to the more sophisticated Linux-ELF/glibc toolchain, identified by the prefix riscv64-unknown-linux-gnu-. The Newlib build allows the debugger to stepi into kernel space.
For crossdev users it is possible to build the toolchain with Newlib support by running:
crossdev --ex-gcc --ex-gdb --target riscv64-unknown-elf
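For what it's worth, the stvec value shown in the question is not a bogus address. It appears to be xv6's trampoline page, which sits one page below the top of the virtual address space. The arithmetic can be checked with a small Python sketch; the constants are copied from what I believe kernel/riscv.h and kernel/memlayout.h define in xv6-riscv (an assumption about the 2019 tree), and is_trampoline is a name made up here.

```python
# Constants assumed to mirror xv6-riscv's kernel/riscv.h and kernel/memlayout.h.
PGSIZE = 4096
MAXVA = 1 << (9 + 9 + 9 + 12 - 1)  # Sv39 address space, kept one bit below the top
TRAMPOLINE = MAXVA - PGSIZE        # highest page of the virtual address space


def is_trampoline(stvec):
    """True if stvec points at the trampoline page (low two bits are the mode field)."""
    return (stvec & ~0x3) == TRAMPOLINE


print(hex(TRAMPOLINE), is_trampoline(0x3FFFFFF000))
```

So the ecall really is meant to land at 0x3ffffff000, and gdb refusing to read that address looks like a debugger/toolchain issue rather than a wrong trap vector, which is consistent with the answers above.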

Custom built kernel fails to install correctly - Centos7

I am attempting to build and install multiple kernels on my machine, all of the exact same release (4.19.10, found here) but with different preemption models (for benchmarking). I was successful with the initial vanilla kernel build and install, but none of the subsequent installs have been bootable.
I am building the kernels as rpm packages. Again all are the exact same except for 2 changes in make menuconfig:
General Setup >> Local version - append to kernel release - Here I add a string to indicate preemption model, such as -lld for low-latency desktop
General Setup >> Preemption Model - Here I select the preemption model
All of them (with and without the PREEMPT_RT patch) build fine with no errors.
I am installing with rpm -ivh kernel-4.19.10_lld-1.x86_64.rpm, which appears successful until it reaches 100% and hangs. Eventually I kill the install with ctrl+c and check what is running with top and can see grub2-editenv is still running.
From here, a few different things can happen, but it all ends up the same. The reboot usually hangs, and a second reboot brings me either to the grub command line or back to the command line with Welcome to emergency mode!.
I can add the new kernel to grub with grub2-mkconfig -o /boot/grub2/grub.cfg, which has no issues. But regardless of selecting the boot image from the grub command line directly or adding it to grub and selecting it during boot, I get the same text:
error: invalid magic number.
error: you need to load the kernel first
I recognize that there might not be enough info here to identify my issue, but I was hoping to at least get some direction and answer a few questions:
Is using General Setup >> Local version - append to kernel release sufficient to make these kernels unique, so that they can be installed alongside one another?
Are these symptoms indicative of a bad build, incorrectly configured rpm spec, or just a bad grub configuration?
Thanks
Update: I was able to upgrade my kernel with rpm -Uvh kernel-4.19.10_lld-1.x86_64.rpm successfully and have it correctly boot, although I could not do that with one of the other kernels. Not sure what that indicates, but I'm thinking the issue is probably trying to install the same kernel versions in parallel and the builds themselves are probably OK.
Update 2:
I ditched the rpm solution and tried just make modules_install and make install. These install with no issues, but then running grub2-mkconfig hangs. Booting hangs at a black screen, and rebooting takes me to the grub command line. Manually loading the kernel then gives no errors, but booting ends with a kernel panic right after the hardware is identified. The message is Kernel Panic - not syncing: VFS: Unable to mount.
Probably related: I built the first (working) kernel on a VM (Intel i7 hardware), but have been building the others on an Intel Atom E3950 chipset. I'm thinking that might be the issue, because the menuconfig ends up different. I don't think I've had a healthy build on that chipset yet.
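One thing that can be checked independently of grub is whether the file under /boot is a well-formed bzImage at all. As far as I know, grub's "invalid magic number" refers to the boot-sector signature in the x86 Linux boot protocol header, so a truncated image (for example from an install that was killed partway) or a raw vmlinux in place of vmlinuz would trigger it. Below is a minimal sketch in Python; check_bzimage_header is a name made up here, and the offsets are taken from the kernel's x86 boot protocol documentation.

```python
import struct

BOOT_FLAG_OFFSET = 0x1FE  # 2 bytes, expected value 0xAA55
HEADER_OFFSET = 0x202     # 4 bytes, expected b"HdrS"
VERSION_OFFSET = 0x206    # 2 bytes, boot protocol version


def check_bzimage_header(data):
    """Return (ok, detail) for the first bytes of a candidate kernel image."""
    if len(data) < VERSION_OFFSET + 2:
        return False, "file too short to hold a setup header"
    (boot_flag,) = struct.unpack_from("<H", data, BOOT_FLAG_OFFSET)
    if boot_flag != 0xAA55:
        return False, "boot flag is 0x%04x, expected 0xaa55" % boot_flag
    if data[HEADER_OFFSET:HEADER_OFFSET + 4] != b"HdrS":
        return False, "missing HdrS signature"
    (version,) = struct.unpack_from("<H", data, VERSION_OFFSET)
    return True, "boot protocol %d.%02d" % (version >> 8, version & 0xFF)
```

A possible use is reading the first 1024 bytes of the installed image (opened in binary mode) and passing them to check_bzimage_header. If the check fails, the problem lies in the build or install step rather than in the grub configuration.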

Problems getting (usable) core dump under cygwin

I'm trying to develop code under 64-bit Cygwin, and I'm having trouble getting a core dump file that I can use under GDB. The code is compiled using GCC 7.3.0, and I've just updated my Cygwin bits. ulimit -c is unlimited.
I've got my $CYGWIN variable set to point to dumper, and that appears to be being launched on crashes. I get a pop-up, and the message
*** starting debugger for pid 5288, tid 9464
*** continuing pid 5288 from debugger call (1)
Aborted (core dumped)
and a core file (basic.exe.core) is created in the current dir.
When I try to run (the stock Cygwin) GDB on this
gdb tests/basic.exe --core=basic.exe.core
I get the normal version intro, Reading symbols..., and then a warning
warning: core file may not match specified executable file.
and GDB crashes (and dumps its own core file). The crashing program was launched from the Cygwin bash command line (as ./tests/basic.exe).
It's been a long time since I've tried to develop under Windows or Cygwin, so it's quite possible that I'm doing something stupid. Or, alternatively, it may be that GCC 7.3.0 is doing something wrong or that I configured it poorly when I built it.
Any help will be appreciated.

linux/bounds.h not found while compiling source of my driver

I am developing drivers for my embedded device, which runs Linux kernel version 2.6.32. In the driver code I include linux/module.h, but compilation fails with the error linux/bounds.h not found.
I downloaded the kernel source from the Linux git repository and checked the path settings; they are OK.
I checked my kernel source, and there is no bounds.h file in it. So why is my driver expecting it? The error comes from including module.h.
First, I needed to run the make command in the kernel source so that it could generate and link all the necessary files.
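The reason bounds.h is absent from a fresh checkout is that it is a generated header: kbuild produces it during the build instead of shipping it in the repository. A quick way to tell whether a tree has been prepared is to look for the generated file. The following is a minimal sketch in Python; find_bounds_header is a name made up here, and the two candidate locations are my assumption about where 2.6.32 and later kernels place the file.

```python
from pathlib import Path

# Candidate locations of the generated header. The first is where a 2.6.32
# tree is assumed to put it; the second is the include/generated/ directory
# assumed for later kernels.
CANDIDATES = ("include/linux/bounds.h", "include/generated/bounds.h")


def find_bounds_header(tree):
    """Return the relative path of the generated bounds.h, or None if the tree is unprepared."""
    root = Path(tree)
    for rel in CANDIDATES:
        if (root / rel).is_file():
            return rel
    return None
```

If this returns None for the tree the driver is built against, the tree has not been configured and built yet, which matches the fix described above.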

Setting up two-machine kernel debugging via firewire

The instructions found at Setting Up Kernel Debugging are what I used to get to this point. On the machine running the kext I want to debug, I do see the message "Connected to remote debugger". On the machine I am running gdb on, I do see:
(gdb) kdp-reattach localhost
Connected.
The problem is that 'showallkmods' returns an empty list and none of the other similar commands appear to be working:
(gdb) showallkmods
kmod address size id refs version name
(gdb) showalltasks
task vm_map ipc_space #acts pid process io_policy wq_state command
Invalid type combination in equality test.
(gdb) showregistry
Please load kgmacros after KDP attaching to the target.
(gdb) source /Volumes/KernelDebugKit/kgmacros
Loading Kernel GDB Macros package. Type "help kgm" for more info.
(gdb) showallkmods
kmod address size id refs version name
(gdb) showregistry
Please load kgmacros after KDP attaching to the target.
(gdb) showbootargs
Invalid cast.
I am running 10.6.8 and am using kernel_debug_kit_10.6.8_10k540.dmg
I am not sure what other details one might need to diagnose what has gone wrong, but if you want to ask questions in the comments, I can certainly attempt to provide additional details.
The error "Invalid type combination in equality test." indicates to me that gdb might be expecting a different CPU architecture than the kernel you're connecting to is running. The 10.6 kernel exists in both 32-bit and 64-bit variants, and by default it's determined by the hardware which one gets loaded. gdb normally defaults to x86_64 if your CPU supports it (true of all Intel Macs except the very early Core Duo based ones) so if you're connecting to a 32-bit kernel (the default on most Macs released before 2011) you need to pass the -arch i386 argument when starting gdb. You can check the current kernel CPU architecture by running the uname -a command.
Update: on OS X Mountain Lion, the kernel always runs in 64-bit (x86_64) mode. On OS X Lion, the kernel defaults to 64-bit mode on Macs which are capable of running Mountain Lion and in 32-bit mode otherwise.
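To make the architecture check concrete, here is a minimal sketch in Python that turns the machine field reported by uname on the target (the machine running the kext, not the one running gdb) into a gdb invocation. The function name gdb_command is made up for this example, and the mach_kernel path inside the mounted Kernel Debug Kit is an assumption. Only the -arch flag comes from the answer above.

```python
# Assumed location of the kernel image inside the mounted Kernel Debug Kit.
KERNEL_IMAGE = "/Volumes/KernelDebugKit/mach_kernel"

# Machine names as printed by `uname -m` on the target, mapped to gdb's -arch value.
SUPPORTED = {"i386": "i386", "x86_64": "x86_64"}


def gdb_command(target_machine, kernel_image=KERNEL_IMAGE):
    """Build the gdb argument list matching the target kernel's architecture."""
    arch = SUPPORTED.get(target_machine.strip())
    if arch is None:
        raise ValueError("unexpected machine type: %r" % target_machine)
    return ["gdb", "-arch", arch, kernel_image]


print(" ".join(gdb_command("i386")))
```

For a target booted with the 32-bit kernel this prints a command line with -arch i386, which should be started before running kdp-reattach and sourcing kgmacros.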
