I'm compiling with gfortran 4.8.1 with the flags -ggdb -O0 -Wall -Wextra -Wtabs -Wsurprising -fbacktrace -fimplicit-none -fcheck=all -std=f2008. Running in gdb I get a backtrace with no procedure names:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000100000000 in ?? ()
#2 0x00007fffffffd760 in ?? ()
#3 0x3f1a36e2eb1c432d in ?? ()
#4 0x3da5fd7fe1796495 in ?? ()
#5 0x4024000000000000 in ?? ()
#6 0x3eb0c6f7a0b5ed8d in ?? ()
#7 0x408f400000000000 in ?? ()
#8 0x408f400000000000 in ?? ()
#9 0x0000000000000000 in ?? ()
After running ./gyre, which segfaults, I invoke gdb ./gyre core. I see the warning "Can't read pathname for load map: Input/output error", but I'm not sure whether it's relevant to the problem.
What do I need to do to see where the SIGSEGV occurred?
Update:
I suspect the stack corruption is related to a procedure pointer initialisation, since my code was not segfaulting before this change. I can't provide the full source code, but the relevant snippet is:
pure function new_default_sim_spec() result(spec)
type(sim_spec_type) :: spec
spec = new_sim_spec(1.0_dp)
end function new_default_sim_spec
pure function new_sim_spec(max_days) result(spec)
type(sim_spec_type) :: spec
real(dp), intent(in) :: max_days
! snipped other attribute assignments
spec%increment_h => increment_h_euler
end function new_sim_spec
abstract interface
pure function increment_h_iface(spec, state_minus_1) result(state)
import :: sim_state_type, sim_spec_type
type(sim_state_type) :: state
class(sim_spec_type), intent(in) :: spec
type(sim_state_type), intent(in) :: state_minus_1
end function increment_h_iface
end interface
type sim_spec_type
! snipped other attribute declarations
procedure(increment_h_iface), pointer :: increment_h => null()
end type sim_spec_type
Running in gdb I get a backtrace with no procedure names:
How did you run GDB?
I am guessing you did gdb /path/to/core. Try gdb /path/to/executable /path/to/core instead.
Update:
gdb ./gyre core. I see a warning ...
That warning is irrelevant (and frequently there, though I don't understand the exact conditions which trigger it).
The other obvious way to find where the SIGSEGV occurred is to simply run the binary under GDB from the start. You don't need to wait for a core dump; a simple
gdb ./gyre
(gdb) run
should suffice.
Update 2:
I've tried running the program itself under gdb and have the same problem.
I see plenty of expected function names listed by nm so the binary cannot have been stripped.
This implies either:
some kind of non-standard setting in ~/.gdbinit, or
a bug in GDB.
To eliminate the former, try gdb -nx ./gyre.
For the latter, try a different version of GDB, or make the binary available somewhere and I can take a look.
Update 3:
The reason GDB can't produce a stack trace is that your stack is getting corrupted on line simulation.f90:45:
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:45
#1 0x0000000000401054 in gyre () at gyre.f90:21
#2 0x0000000000401fad in main (argc=1, argv=0x7fffffffeb24) at gyre.f90:3
#3 0x00007ffff742876d in __libc_start_main (main=0x401f79 <main>, argc=1, ubp_av=0x7fffffffe878, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe868) at libc-start.c:226
#4 0x0000000000400be9 in _start ()
(gdb) n
41 in simulation.f90
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:41
#1 0x0000000000000000 in ?? ()
Notice how before line 45 the stack is good, but after it's not. The particular instruction that "wipes" the stack is this one:
=> 0x408fde <__simulation_MOD_new_default_sim_spec+93>: movq $0x0,0x8(%rbp)
Without access to your sources, and with 20 years since I last touched Fortran, I can't make an intelligent guess at what kind of Fortran code could provoke such a bug.
Newer GCC versions (4.8 and later) emit DWARF 4 debug info by default. If you have an older toolchain, in particular an older GDB, it might not understand it. Try -gdwarf-2.
I am trying to learn more about kernel and driver development, so I decided to use KVM and gdb to establish a debug session with a custom-installed kernel (v5.1.0).
The kernel has debug info included, and here is a chunk of .config I used:
$ rg -i "(debug|kalls|GDB_SCRIPTS).*=y" .config
205:CONFIG_KALLSYMS=y
206:CONFIG_KALLSYMS_ALL=y
...
225:CONFIG_SLUB_DEBUG=y
...
9620:CONFIG_DEBUG_INFO=y
9623:CONFIG_DEBUG_INFO_DWARF4=y
9624:CONFIG_GDB_SCRIPTS=y
9640:CONFIG_DEBUG_KERNEL=y
...
By using the "-s" option, I can connect to the Ubuntu 18.04 kernel in my VM, but gdb does not show any symbols:
Reading symbols from vmlinux...
(gdb) target remote :1234
Remote debugging using :1234
0xffffffff8ea4af66 in ?? ()
(gdb) bt
#0 0xffffffff8ea4af66 in ?? ()
#1 0xffffffff8f603e38 in ?? ()
#2 0xffffffff8ea4abb2 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb) i t
Ambiguous info command "t": target, tasks, terminal, threads, tp, tracepoints, tvariables, type-printers, types.
(gdb) i threads
Id Target Id Frame
* 1 Thread 1 (CPU#0 [halted ]) 0xffffffff8ea4af66 in ?? ()
2 Thread 2 (CPU#1 [halted ]) 0xffffffff8ea4af66 in ?? ()
(gdb) b printk
Breakpoint 1 at 0xffffffff81101fa3: file /home/ilukic/projects/kernel/linux-stable/kernel/printk/printk.c, line 2030.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81101fa3
Command aborted.
(gdb) disassemble 0xffffffff81101f83,100
Dump of assembler code from 0xffffffff81101f83 to 0x64:
End of assembler dump.
(gdb) disassemble 0xffffffff81101f83,+100
Dump of assembler code from 0xffffffff81101f83 to 0xffffffff81101fe7:
0xffffffff81101f83 <kmsg_dump_rewind_nolock+19>: Cannot access memory at address 0xffffffff81101f83
(gdb) disassemble 0xffffffff81101fa3,+10
Dump of assembler code from 0xffffffff81101fa3 to 0xffffffff81101fad:
0xffffffff81101fa3 <printk+0>: Cannot access memory at address 0xffffffff81101fa3
Finally, when inspecting /proc/kallsyms in the VM (e.g. searching for the printk symbol from the previous gdb session), no symbol is found:
~$ cat /proc/kallsyms | grep "t printk"
0000000000000000 t printk_safe_log_store
0000000000000000 t printk_late_init
~$ uname -a
Linux ubuntu18 5.1.0 #2 SMP Tue Nov 12 19:01:21 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
On the other hand, objdump does find "printk" in vmlinux, and as seen above, gdb does not complain about a missing symbol when setting the breakpoint.
I assume the kernel installation went well, since no errors were reported, but I still can't explain why I can't find the corresponding symbols in kallsyms.
Another thing I find strange is that all the lines in /proc/kallsyms start with 0s.
Any ideas why gdb is not showing any symbols?
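As @IanAbbott suggested, the culprit was KASLR (kernel address space layout randomization). With CONFIG_RANDOMIZE_BASE=y, the kernel is loaded at a random offset, so the addresses in vmlinux no longer match the running kernel. Disabling it, either by building with CONFIG_RANDOMIZE_BASE=n or by booting with the "nokaslr" kernel command-line argument, fixes the problem.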
I was trying to debug inkscape and put a breakpoint at an address in its main shared library (i.e., /usr/lib/inkscape/libinkscape_base.so). When the execution reached that breakpoint, the backtrace was as follows:
#0 0x00007ffff6ecb220 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at /usr/include/c++/7/iostream:74
#1 0x00007ffff6ecb220 in _GLOBAL__sub_I_log_display_config.cpp(void) () at ./src/debug/log-display-config.cpp:83
#2 0x00007ffff7de5733 in call_init (env=0x7fffffffddd8, argv=0x7fffffffddc8, argc=1, l=<optimized out>) at dl-init.c:72
#3 0x00007ffff7de5733 in _dl_init (main_map=0x7ffff7ffe170, argc=1, argv=0x7fffffffddc8, env=0x7fffffffddd8) at dl-init.c:119
#4 0x00007ffff7dd60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ()
#6 0x00007fffffffe176 in ()
#7 0x0000000000000000 in ()
As can be seen, #0 and #1 point to the same address but different source locations. The same is true for #2 and #3. How is it possible?
How is it possible?
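It's possible with inlining.
GCC emits enough debug info for GDB to tell that a particular address, even though it is physically located inside a function bar, actually belongs to a function foo that was inlined into bar. In your backtrace, __static_initialization_and_destruction_0 (#0) was inlined into _GLOBAL__sub_I_log_display_config.cpp (#1), and call_init (#2) was inlined into _dl_init (#3).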
Since foo is "not really there", but a call to it is synthesized by GDB in the backtrace output, what address GDB prints for it is somewhat irrelevant.
GDB used to print no address at all (my version 8.3.50.20190824-24.fc31 still does), but I guess this isn't dependable, and sometimes GDB may just repeat the previous return address.
The following line is used to compile the Fortran source code into an object file:
gfortran -funderscoring -O3 -Wall -c -fmessage-length=0 -o "src/abc.o" "../src/abc.f"
When I run my program from the command prompt and an error occurs, the runtime error is printed there (see the example below). I want to disable the display of runtime errors, as I am worried that it will reveal the source code. How can I do that?
At line 429 of file ../src/abc.f (unit = 5, file = 'stdin')
Fortran runtime error: Bad value during integer read
Error termination. Backtrace:
Could not print backtrace: libbacktrace could not find executable to open
#0 0xffffffff
#1 0xffffffff
#2 0xffffffff
#3 0xffffffff
#4 0xffffffff
#5 0xffffffff
#6 0xffffffff
#7 0xffffffff
#8 0xffffffff
#9 0xffffffff
#10 0xffffffff
#11 0xffffffff
#12 0xffffffff
#13 0xffffffff
I am not aware of any such option, and I am not sure of its usefulness anyway. Fortran requires an error condition to be handled, even though it does not prescribe the form of the message.
You can always use the iostat= or err= specifiers to handle the error conditions yourself in any way you like.
I'm developing Fortran code (Fortran 2003 standard) in which I have to control all non-nominal exits.
When executing the program without arguments (it requires a number of them), I get the expected exit code plus some unrequested backtrace output, as you can see below:
./test_1
Error | Wrong number of inputs in test_1
STOP 128
Backtrace for this error:
#0 0x0000003b9b0ac584 in wait () from /lib64/libc.so.6
#1 0x00007ff41d8ff00d in ?? () from /usr//lib64/libgfortran.so.3
#2 0x00007ff41d90082e in ?? () from /usr//lib64/libgfortran.so.3
#3 0x00007ff41d90112f in _gfortran_stop_numeric () from usr//lib64/libgfortran.so.3
#4 0x000000000041f7d4 in _gfortran_stop_numeric_f08 ()
#5 0x000000000041b680 in MAIN__ ()
#6 0x000000000041f74d in main ()
The weird thing is that, as far as I can tell, my optimized build below uses no flag that enables the backtrace:
gfortran -Wall -Wextra -Wuninitialized -Wno-maybe-uninitialized -O2 -finit-local-zero -I/opt/cots/netcdf_4.2_gfortran/include -L/usr//lib64 -Wl,-rpath,/usr//lib64 -L/opt/cots/netcdf_4.2_gfortran/lib -Wl,-rpath,/opt/cots/netcdf_4.2_gfortran/lib -o test_1 test_1.o -lnetcdff -lnetcdf -lz -lm
I do use such a flag in debug mode, but here I'm running the optimized executable.
Does anyone know how I can get rid of the backtrace output?
I'm assuming it's not related to the code, since it appears after the STOP statement.
Thanks a lot!
You can use -fno-backtrace with GCC versions where -fbacktrace is the default (GCC 4.8 and later).
I have a core dump that lacks debugging information. The dump was caused by a not-so-reproducible bug.
Since I know exactly which version of source and the original build commands and optimization levels, is it possible to generate debugging information for this executable?
Yes, this is possible. Here is a rather verbose example.
Program to produce a crash (crash.c):
#include <stdio.h>
#include <string.h>
int func(char *str){
char buff[32];
strcpy(buff,str);
return 0;
}
int main(int argc, char *argv[]){
func(argv[1]);
return 0;
}
Compile a version without debug symbols:
$ gcc crash.c -o crash
Compile a version with debug symbols:
$ gcc -g crash.c -o crash_debug
Generate a core file using the binary without debug symbols:
$ ./crash AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
Use gdb and the binary without debug symbols to look at the core:
$ gdb -q ./crash core
warning: ~/.gdbinit.local: No such file or directory
Reading symbols from ./crash...(no debugging symbols found)...done.
warning: exec file is newer than core file.
[New LWP 7768]
Core was generated by `./crash AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040052b in func ()
As we can see, gdb could not find any debug symbols.
Now let's start gdb with the core file and the binary that was built with debug symbols:
$ gdb -q ./crash_debug core
warning: ~/.gdbinit.local: No such file or directory
Reading symbols from ./crash_debug...done.
warning: core file may not match specified executable file.
[New LWP 7768]
Core was generated by `./crash AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040052b in func (str=0x7fff4bb66f73 'A' <repeats 52 times>) at crash.c:8
8 }
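This works! It relies on the rebuild being otherwise identical: with the same source, compiler, and flags, adding -g changes only the debug information, not the generated code, so the addresses in the core still match.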
Another way, as @dbrank0 suggested, is to use the symbol-file command to load the symbols from a different binary:
$ gdb -q -c core
warning: ~/.gdbinit.local: No such file or directory
[New LWP 7768]
Core was generated by `./crash AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040052b in ?? ()
gdb$ symbol-file crash_debug
Reading symbols from crash_debug...done.
gdb$ bt
#0 0x000000000040052b in func (str=0x7fff4bb66f73 'A' <repeats 52 times>) at crash.c:8
#1 0x4141414141414141 in ?? ()
#2 0x00007f0041414141 in ?? ()
#3 0x0000000200000000 in ?? ()
#4 0x0000000000000000 in ?? ()
gdb$
Hope this helps!