GDB Backtrace Containing Similar Addresses but Different Source Lines - debugging

I was trying to debug inkscape and put a breakpoint at an address in its main shared library (i.e., /usr/lib/inkscape/libinkscape_base.so). When the execution reached that breakpoint, the backtrace was as follows:
#0 0x00007ffff6ecb220 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at /usr/include/c++/7/iostream:74
#1 0x00007ffff6ecb220 in _GLOBAL__sub_I_log_display_config.cpp(void) () at ./src/debug/log-display-config.cpp:83
#2 0x00007ffff7de5733 in call_init (env=0x7fffffffddd8, argv=0x7fffffffddc8, argc=1, l=<optimized out>) at dl-init.c:72
#3 0x00007ffff7de5733 in _dl_init (main_map=0x7ffff7ffe170, argc=1, argv=0x7fffffffddc8, env=0x7fffffffddd8) at dl-init.c:119
#4 0x00007ffff7dd60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ()
#6 0x00007fffffffe176 in ()
#7 0x0000000000000000 in ()
As can be seen, #0 and #1 point to the same address but different source locations. The same is true for #2 and #3. How is it possible?

How is it possible?
It's possible with inlining.
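Suppose a function foo is inlined into bar. GCC emits enough debug info for GDB to tell that a particular address, even though it is physically located inside bar, actually belongs to the inlined foo. In your backtrace, __static_initialization_and_destruction_0 is inlined into _GLOBAL__sub_I_log_display_config.cpp, and call_init is inlined into _dl_init.
Since foo is "not really there" and its frame is synthesized by GDB in the backtrace output, the address GDB prints for it is somewhat irrelevant.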
GDB used to print no address at all (my version 8.3.50.20190824-24.fc31 still does), but I guess this isn't dependable, and sometimes GDB may just repeat the previous return address.

Related

Reading memory with GDB vmlinux /proc/kcore

I am trying to use gdb to read memory from vmlinux. The exact syntax is
sudo gdb vmlinux-4.18.0-rc1+ /proc/kcore
I use this file because vmlinux is a symlink to this file.
The result is the following
Reading symbols from vmlinux-4.18.0-rc1+...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New process 1]
Core was generated by `root=/dev/mapper/rcs--power9--talos--vg-root ro console=hvc0 quiet'.
#0 0x0000000000000000 in ?? ()
(gdb) x/4xb 0xfffffff0
0xfffffff0: Cannot access memory at address 0xfffffff0
(gdb) print &sys_call_table
No symbol table is loaded. Use the "file" command.
(gdb)
The file vmlinux-4.18.0-rc1+ is in /boot. The file type is as follows:
root@rcs-power9-talos:/boot# file vmlinux-4.18.0-rc1+
vmlinux-4.18.0-rc1+: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=a1c9f3fe22ff5cbf419787657c878c8a07e559b2, stripped
I modified the config-4.18.0-rc1+ file such that every CONFIG_DEBUG option is set to yes. I then rebooted the system. My questions are:
Do I need to do anything else for the changes I made to /boot/config-4.18.0-rc1+ to take effect?
Based on the file type of vmlinux-4.18.0-rc1+, does it seem that this file should work for debugging?
I did not build the kernel myself. It is a custom build from Raptor Computer Systems.
The config-* file you've modified is just for reference: all these options were already compiled into the kernel, so changing the file has no effect.
However, you can get the address of any symbol you want in two steps:
First, consult /proc/kallsyms (e.g. grep sys_call_table /proc/kallsyms) and note the address. It might appear as all zeros, which can be fixed by setting /proc/sys/kernel/kptr_restrict to 0.
Then use that address directly as the argument in gdb. You will still run into minor issues (e.g. "print" won't know what data type it is, but x/20x will work), but these can be resolved with a bit of gdb scripting or by providing an external DWARF file.
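The first step can be wrapped in a small shell helper. This is only a sketch: kaddr is a hypothetical name, not an existing tool, and it assumes the usual "address type name" line layout of /proc/kallsyms.

```shell
#!/bin/sh
# kaddr SYMBOL [FILE]: print the address of SYMBOL from a kallsyms-format
# file (default: /proc/kallsyms). Hypothetical helper, not a kernel or gdb tool.
kaddr() {
    sym=$1
    file=${2:-/proc/kallsyms}
    # Each line is "address type name [module]"; match the name exactly.
    addr=$(awk -v s="$sym" '$3 == s { print $1; exit }' "$file")
    if [ -z "$addr" ]; then
        echo "kaddr: $sym not found in $file" >&2
        return 1
    fi
    case $addr in
        # At least one non-zero digit: a real address.
        *[!0]*) echo "0x$addr" ;;
        # All zeros: the kernel is hiding addresses from this reader.
        *) echo "kaddr: address hidden; read as root or set kptr_restrict to 0" >&2
           return 2 ;;
    esac
}
```

After sourcing the file, running kaddr sys_call_table as root prints a value that can be pasted into gdb as the argument of x/20x.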

Why does gdb not show debug symbols of a kernel with debug info?

I am trying to learn more about kernel and driver development, so I decided to use KVM and gdb to establish a debug session with a custom-installed kernel (v5.1.0).
The kernel has debug info included, and here is a chunk of .config I used:
$ rg -i "(debug|kalls|GDB_SCRIPTS).*=y" .config
205:CONFIG_KALLSYMS=y
206:CONFIG_KALLSYMS_ALL=y
...
225:CONFIG_SLUB_DEBUG=y
...
9620:CONFIG_DEBUG_INFO=y
9623:CONFIG_DEBUG_INFO_DWARF4=y
9624:CONFIG_GDB_SCRIPTS=y
9640:CONFIG_DEBUG_KERNEL=y
...
By using "-s" option I can connect to Ubuntu 18.04 kernel in my VM, but gdb does not show any symbols:
Reading symbols from vmlinux...
(gdb) target remote :1234
Remote debugging using :1234
0xffffffff8ea4af66 in ?? ()
(gdb) bt
#0 0xffffffff8ea4af66 in ?? ()
#1 0xffffffff8f603e38 in ?? ()
#2 0xffffffff8ea4abb2 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb) i t
Ambiguous info command "t": target, tasks, terminal, threads, tp, tracepoints, tvariables, type-printers, types.
(gdb) i threads
Id Target Id Frame
* 1 Thread 1 (CPU#0 [halted ]) 0xffffffff8ea4af66 in ?? ()
2 Thread 2 (CPU#1 [halted ]) 0xffffffff8ea4af66 in ?? ()
(gdb) b printk
Breakpoint 1 at 0xffffffff81101fa3: file /home/ilukic/projects/kernel/linux-stable/kernel/printk/printk.c, line 2030.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81101fa3
Command aborted.
(gdb) disassemble 0xffffffff81101f83,100
Dump of assembler code from 0xffffffff81101f83 to 0x64:
End of assembler dump.
(gdb) disassemble 0xffffffff81101f83,+100
Dump of assembler code from 0xffffffff81101f83 to 0xffffffff81101fe7:
0xffffffff81101f83 <kmsg_dump_rewind_nolock+19>: Cannot access memory at address 0xffffffff81101f83
(gdb) disassemble 0xffffffff81101fa3,+10
Dump of assembler code from 0xffffffff81101fa3 to 0xffffffff81101fad:
0xffffffff81101fa3 <printk+0>: Cannot access memory at address 0xffffffff81101fa3
At the end, when inspecting /proc/kallsyms on VM (e.g. searching for printk symbol from previous gdb session), no symbol is found:
~$ cat /proc/kallsyms | grep "t printk"
0000000000000000 t printk_safe_log_store
0000000000000000 t printk_late_init
~$ uname -a
Linux ubuntu18 5.1.0 #2 SMP Tue Nov 12 19:01:21 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
On the other hand, objdump does find "printk" in vmlinux and, as seen above, gdb does not complain about a missing symbol when setting the breakpoint.
I assume the kernel installation went well, as no errors were reported, but I still can't explain why I can't find the corresponding symbols in kallsyms.
Another thing I find strange is that all the lines in /proc/kallsyms start with zeros.
Any ideas why gdb is not showing any symbols?
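As @IanAbbott suggested, KASLR had to be disabled: either build the kernel with CONFIG_RANDOMIZE_BASE=n or pass "nokaslr" on the kernel command line.
With KASLR active, the kernel is loaded at a randomized address, so the addresses recorded in vmlinux (e.g. 0xffffffff81101fa3 for printk) do not match the running kernel. The zeros in /proc/kallsyms are a separate matter: addresses are hidden from unprivileged readers, depending on /proc/sys/kernel/kptr_restrict.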
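To confirm from inside the guest that the running kernel was booted with "nokaslr", the command line can be checked. The function below is a rough sketch; kaslr_disabled_by_cmdline is a made-up name, and it only looks at the command line, not at whether KASLR was compiled in.

```shell
#!/bin/sh
# kaslr_disabled_by_cmdline [FILE]: succeed if the kernel command line in
# FILE (default: /proc/cmdline) contains the standalone word "nokaslr".
# Hypothetical helper for illustration only.
kaslr_disabled_by_cmdline() {
    file=${1:-/proc/cmdline}
    # Put each whitespace-separated word on its own line, then require an
    # exact whole-line match so that e.g. "foo=nokaslrx" does not count.
    tr -s ' \t' '\n\n' < "$file" | grep -qx 'nokaslr'
}
```

For example, kaslr_disabled_by_cmdline && echo "nokaslr is set" prints the message only when the option is present.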

How to stop the GNU Fortran compiler from showing debug info at runtime?

The following line is used to compile an executable from the Fortran source code
gfortran -funderscoring -O3 -Wall -c -fmessage-length=0 -o "src/abc.o" "../src/abc.f"
When I run my program in command prompt and errors occur, it will show runtime errors in the command prompt (see the runtime error example below). I want to disable the display of the runtime errors as I am worried that this will reveal the source code. How can I do that?
At line 429 of file ../src/abc.f (unit = 5, file = 'stdin')
Fortran runtime error: Bad value during integer read
Error termination. Backtrace:
Could not print backtrace: libbacktrace could not find executable to open
#0 0xffffffff
#1 0xffffffff
#2 0xffffffff
#3 0xffffffff
#4 0xffffffff
#5 0xffffffff
#6 0xffffffff
#7 0xffffffff
#8 0xffffffff
#9 0xffffffff
#10 0xffffffff
#11 0xffffffff
#12 0xffffffff
#13 0xffffffff
I am not aware of any such option, and I am not sure of its usefulness anyway. Fortran requires an error condition to be handled, even though it does not prescribe the form of the message.
You can always use the iostat= or err= specifiers to handle the error conditions yourself in any way you like.

Debugging Linux Kernel using GDB in qemu unable to hit function or given address

I am trying to understand kernel bootup sequence step by step using GDB in qemu environment.
Below is my setting:
In one terminal I'm running
~/Qemu_arm/bin/qemu-system-arm -M vexpress-a9 -dtb ./arch/arm/boot/dts/vexpress-v2p-ca9.dtb -kernel ./arch/arm/boot/zImage -append "root=/dev/mmcblk0 console=ttyAMA0" -sd ../Images/RootFS.ext3 -serial stdio -s -S
In other terminal
arm-none-linux-gnueabi-gdb vmlinux
Reading symbols from vmlinux...done.
(gdb) target remote :1234
Remote debugging using :1234
0x60000000 in ?? ()
My question is how to set a breakpoint on code in the /arch/arm/boot/compressed/* files.
E.g. I tried to set a breakpoint on decompress_kernel, defined in misc.c.
Case 1:
(gdb) b decompress_kernel
Function "decompress_kernel" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (decompress_kernel) pending.
(gdb) c
Continuing.
With this, the breakpoint is never hit; qemu just boots.
Case 2:
(gdb) b *0x80008000
Breakpoint 1 at 0x80008000: file arch/arm/kernel/head.S, line 89.
(gdb) c
Continuing.
In this case the breakpoint is not hit either; qemu just boots.
Case 3:
(gdb) b start_kernel
Breakpoint 1 at 0x8064d8d8: file init/main.c, line 498.
(gdb) c
Continuing.
Breakpoint 1, start_kernel () at init/main.c:498
498 {
(gdb)
In this case the breakpoint is hit and I am able to debug step by step.
Note: I have enabled debug and early printk, and I have also tried hbreak.
So my questions are:
Why are breakpoints on some functions never hit?
Is this a qemu limitation, or do I need to enable something more?
Do I need to append any extra parameters?
How can I debug early kernel boot?
You are not able to put breakpoints on any function preceding start_kernel because you are not loading symbols for them. In fact you are starting qemu with a zImage of the kernel but loading the symbols from vmlinux. They are not the same: zImage is basically vmlinux compressed as a data payload which is then attached to a stub which decompresses it in memory then jumps to start_kernel.
start_kernel is the entry point of vmlinux; any function preceding it, including decompress_kernel, is part of the stub and not present in vmlinux.
I don't know whether running "arm-none-linux-gnueabi-gdb zImage" instead would let you debug the stub. I have always done early debugging of ARM kernels with JTAG debuggers on real hardware and never used qemu for that, sorry.

Obtaining backtrace with symbols after SIGSEGV from gfortran application

I'm compiling with gfortran 4.8.1 with the flags -ggdb -O0 -Wall -Wextra -Wtabs -Wsurprising -fbacktrace -fimplicit-none -fcheck=all -std=f2008. Running in gdb I get a backtrace with no procedure names:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000100000000 in ?? ()
#2 0x00007fffffffd760 in ?? ()
#3 0x3f1a36e2eb1c432d in ?? ()
#4 0x3da5fd7fe1796495 in ?? ()
#5 0x4024000000000000 in ?? ()
#6 0x3eb0c6f7a0b5ed8d in ?? ()
#7 0x408f400000000000 in ?? ()
#8 0x408f400000000000 in ?? ()
#9 0x0000000000000000 in ?? ()
After executing ./gyre, which segfaults, I invoke gdb ./gyre core. I see a warning Can't read pathname for load map: Input/output error, but I'm not sure if that's relevant to the problem.
What do I need to do to see where the SIGSEGV occurred?
Update:
So I suspect the stack corruption must be related to a procedure pointer initialisation, since my code was not segfaulting prior to this change. I'm not able to provide the full source code, but the relevant snippet is
pure function new_default_sim_spec() result(spec)
  type(sim_spec_type) :: spec
  spec = new_sim_spec(1.0_dp)
end function new_default_sim_spec
pure function new_sim_spec(max_days) result(spec)
  type(sim_spec_type) :: spec
  real(dp), intent(in) :: max_days
  ! snipped other attribute assignments
  spec%increment_h => increment_h_euler
end function new_sim_spec
abstract interface
  pure function increment_h_iface(spec, state_minus_1) result(state)
    import :: sim_state_type, sim_spec_type
    type(sim_state_type) :: state
    class(sim_spec_type), intent(in) :: spec
    type(sim_state_type), intent(in) :: state_minus_1
  end function increment_h_iface
end interface
type sim_spec_type
  ! snipped other attribute declarations
  procedure(increment_h_iface), pointer :: increment_h => null()
end type sim_spec_type
Running in gdb I get a backtrace with no procedure names:
How did you run GDB?
I am guessing you did gdb /path/to/core. Try gdb /path/to/executable /path/to/core instead.
Update:
gdb ./gyre core. I see a warning ...
That warning is irrelevant (and frequently there, though I don't understand the exact conditions which trigger it).
The other obvious way to check where SIGSEGV occurred is to simply run the binary under GDB from the start. You don't need to wait for core, a simple:
gdb ./gyre
(gdb) run
should suffice.
Update 2:
I've tried running the program itself under gdb and have the same problem.
I see plenty of expected function names listed by nm so the binary cannot have been stripped.
This implies either:
some kind of non-standard setting in ~/.gdbinit, or
a bug in GDB.
To eliminate the former, try gdb -nx ./gyre.
For the latter, try a different version of GDB, or make the binary available somewhere and I can take a look.
Update 3:
The reason GDB can't produce a stack trace is that your stack is getting corrupted on line simulation.f90:45:
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:45
#1 0x0000000000401054 in gyre () at gyre.f90:21
#2 0x0000000000401fad in main (argc=1, argv=0x7fffffffeb24) at gyre.f90:3
#3 0x00007ffff742876d in __libc_start_main (main=0x401f79 <main>, argc=1, ubp_av=0x7fffffffe878, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe868) at libc-start.c:226
#4 0x0000000000400be9 in _start ()
(gdb) n
41 in simulation.f90
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:41
#1 0x0000000000000000 in ?? ()
Notice how before line 45 the stack is good, but after it's not. The particular instruction that "wipes" the stack is this one:
=> 0x408fde <__simulation_MOD_new_default_sim_spec+93>: movq $0x0,0x8(%rbp)
Without access to your sources, and with 20 years since I last touched Fortran, I can't make an intelligent guess at what kind of Fortran code could provoke such a bug.
Newer gcc versions default to dwarf-4 format debug info. If you have an older toolchain it might not understand it. Try -gdwarf-2.
