I'm developing a Fortran code (Fortran 2003 standard) in which I have to control all non-nominal exits.
When I run the program without arguments (it requires several), I get the expected exit code plus some backtrace output I did not ask for, as you can see below:
./test_1
Error | Wrong number of inputs in test_1
STOP 128
Backtrace for this error:
#0 0x0000003b9b0ac584 in wait () from /lib64/libc.so.6
#1 0x00007ff41d8ff00d in ?? () from /usr//lib64/libgfortran.so.3
#2 0x00007ff41d90082e in ?? () from /usr//lib64/libgfortran.so.3
#3 0x00007ff41d90112f in _gfortran_stop_numeric () from usr//lib64/libgfortran.so.3
#4 0x000000000041f7d4 in _gfortran_stop_numeric_f08 ()
#5 0x000000000041b680 in MAIN__ ()
#6 0x000000000041f74d in main ()
The weird thing is that, as far as I can tell, my optimized build has no flag that enables the backtrace.
gfortran -Wall -Wextra -Wuninitialized -Wno-maybe-uninitialized -O2 -finit-local-zero -I/opt/cots/netcdf_4.2_gfortran/include -L/usr//lib64 -Wl,-rpath,/usr//lib64 -L/opt/cots/netcdf_4.2_gfortran/lib -Wl,-rpath,/opt/cots/netcdf_4.2_gfortran/lib -o test_1 test_1.o -lnetcdff -lnetcdf -lz -lm
I do have such a flag in the debug build, but I'm running the optimized executable.
Does anyone know how I can get rid of the backtrace info?
I'm assuming it has nothing to do with the code itself, since it appears after the STOP statement.
Thanks a lot!
You can use -fno-backtrace for GCC versions where -fbacktrace is the default.
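If rebuilding is inconvenient, the gfortran runtime also documents an environment variable, GFORTRAN_ERROR_BACKTRACE, that switches the backtrace off at run time. Below is a minimal Python sketch of a wrapper that launches a program with that variable set. The helper name run_without_backtrace is hypothetical, and whether the variable suppresses the trace printed after STOP with this particular libgfortran version is an assumption you should check.

```python
import os
import subprocess

def run_without_backtrace(cmd):
    """Run cmd with the gfortran runtime backtrace disabled via the environment."""
    env = dict(os.environ)
    # Documented gfortran runtime variable: "0" (or "n") asks libgfortran
    # not to print a backtrace. Assumption: it also covers the STOP case.
    env["GFORTRAN_ERROR_BACKTRACE"] = "0"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

For the executable from the question this would be called as run_without_backtrace(["./test_1"]); the exit status is then available as the returncode attribute of the result.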
There are tons of questions and answers about GDB and the "No debugging symbols found" warning, but all of those assume that the debugging symbols are not part of the .elf file.
I'm dealing with the following:
I have a .elf file with debugging symbols. I can verify this by doing objdump, and seeing a disassembly with the subroutine labels being present.
When I load the .elf file in GDB, it loads correctly.
When I then do list to list the C code, I get No symbol table is loaded. Use the "file" command.
However, I can still do things like break main or p/x global_cntr!
When I do file progmem.elf, there's no difference in behavior: I get (No debugging symbols found in progmem.elf), but breakpoints etc still work.
GCC and GDB come from the same version of the toolchain.
I tried using -gdwarf-3 instead of -ggdb. No difference.
I'm lost...
I'm using a RISC-V toolchain, if that matters.
Here's an excerpt of my Makefile:
TARGET = $(TOOLS_PREFIX)/riscv32-unknown-elf
AS = $(TARGET)-as
ASFLAGS = -march=$(MARCH) -mabi=ilp32
LD = $(TARGET)-gcc
LDFLAGS = -march=$(MARCH) -g -mabi=ilp32 -Wl,-Tsections.lds,-Map,progmem.map -ffreestanding -nostartfiles -Wl,--no-relax
CC = $(TARGET)-gcc
CFLAGS = -march=$(MARCH) -g -ggdb -mno-div -mabi=ilp32 -Wall -Wextra -pedantic -DCPU_FREQ=$(CPU_FREQ_MHZ)000000 $(CC_OPT)
...
progmem.elf: $(OBJ_FILES) top_defines.h sections.lds Makefile
$(LD) $(LDFLAGS) -o $@ $(OBJ_FILES) -lm
And here's a log of my GDB session:
/opt/riscv32im/bin//riscv32-unknown-elf-gdb progmem.elf \
-ex "target remote localhost:3333"
...
Remote debugging using localhost:3333
0x0000002e in rdcycle64 ()
(gdb)
(gdb)
(gdb) monitor soft_reset_halt
requesting target halt and executing a soft reset
(gdb) file progmem.elf
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Reading symbols from progmem.elf...
(No debugging symbols found in progmem.elf)
(gdb) br main
Breakpoint 1 at 0x5d6
(gdb) load
Loading section .memory, size 0x5790 lma 0x0
Start address 0x0, load size 22416
Transfer rate: 23 KB/sec, 11208 bytes/write.
(gdb) c
Continuing.
Program stopped.
0x000005d6 in main ()
(gdb) p/x global_cntr
$1 = 0x0
(gdb) l
No symbol table is loaded. Use the "file" command.
(gdb)
I have a .elf file with debugging symbols. I can verify this by doing objdump, and seeing a disassembly with the subroutine labels being present.
Debugging symbols are not the same as symbols. For disassembly, you only need the latter. For source listing you need the former.
I can still do things like break main or p/x global_cntr!
These also require only the symbol table.
You can confirm that you don't have debug symbols using objdump -g progmem.elf or readelf -wi progmem.elf.
Your command lines look like debug symbols should be included, but there is no telling what you do with .debug_* sections in your sections.lds linker script. Probably you discard them, which would explain why they aren't there.
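The same check can be done without binutils. The following rough Python sketch reads the ELF section header table and lists any .debug_* sections, which is roughly the information readelf reports. The function names elf_section_names and debug_sections are hypothetical, and the code assumes a well-formed ELF file with fewer than 65280 sections.

```python
import struct

def elf_section_names(path):
    """Return the section names of an ELF file, in section-header order."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"       # EI_DATA: 1 = little endian
    if is64:
        (shoff,) = struct.unpack_from(end + "Q", data, 0x28)
        entsize, count, strndx = struct.unpack_from(end + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(end + "I", data, 0x20)
        entsize, count, strndx = struct.unpack_from(end + "HHH", data, 0x2E)

    def header(i):
        """Return (name offset, file offset, size) of section header i."""
        base = shoff + i * entsize
        (name,) = struct.unpack_from(end + "I", data, base)
        if is64:
            offset, size = struct.unpack_from(end + "QQ", data, base + 0x18)
        else:
            offset, size = struct.unpack_from(end + "II", data, base + 0x10)
        return name, offset, size

    # The section-name string table is itself a section (index e_shstrndx).
    _, str_off, str_size = header(strndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(count):
        name, _, _ = header(i)
        names.append(strtab[name:strtab.index(b"\0", name)].decode())
    return names

def debug_sections(path):
    """Return only the DWARF sections; an empty list means no debug info."""
    return [n for n in elf_section_names(path) if n.startswith(".debug_")]
```

If debug_sections("progmem.elf") comes back empty while symbols such as main still resolve, the file has a symbol table but no debug information, which matches the behaviour described in the question.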
Update:
Do you by any chance have an example sections.lds file that has them included?
ld --verbose should print the default linker script. Here is one example.
My original linker script was the following:
SECTIONS {
.memory : {
. = 0x00000;
start*(.text);
*(.text);
*(*);
end = .;
}
}
I suspect that my issue was caused by the catch-all *(*); which moved all input sections, including the .debug_* ones, into the .memory output section.
I replaced it with the following script:
MEMORY
{
ram (ax) : ORIGIN = 0x00000000, LENGTH = 16K
}
SECTIONS {
}
After this change, the .debug_* sections are included.
This script should be refined with more precise placement of various sections, but it's good enough to unblock me.
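To catch this kind of problem before linking, a small check can flag catch-all input patterns in a linker script. The following is a rough Python sketch, not a linker-script parser: the function name find_catch_all is hypothetical, and it only looks for the literal *(*) pattern that appears in the original script.

```python
import re

# Matches the catch-all input-section pattern "*(*)", allowing spaces.
# Such a pattern places every input section, .debug_* included, into the
# enclosing output section.
CATCH_ALL = re.compile(r"\*\s*\(\s*\*\s*\)")

def find_catch_all(script_text):
    """Return (line_number, line) pairs for lines containing a *(*) pattern."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if CATCH_ALL.search(line):
            hits.append((number, line.strip()))
    return hits

# The original linker script from the question.
ORIGINAL_SCRIPT = """\
SECTIONS {
.memory : {
. = 0x00000;
start*(.text);
*(.text);
*(*);
end = .;
}
}
"""

for number, line in find_catch_all(ORIGINAL_SCRIPT):
    print(f"line {number}: {line}")
```

Patterns such as *(.text) are left alone because they name a specific input section; only the pattern that matches everything is reported.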
I was having trouble running Fortran code, so I tried the example code from here:
https://gcc.gnu.org/onlinedocs/gcc-8.4.0/gfortran/ICHAR.html
program read_val
integer value
character(len=10) string, string2
string = '154'
! Convert a string to a numeric value
read (string,'(I10)') value
print *, value
! Convert a value to a formatted string
write (string2,'(I10)') value
print *, string2
end program read_val
I did
gfortran -o hello3 hello3.f -g3 -fcheck=all -Wall -fbacktrace
It gave no warnings or errors. However,
./hello3
failed with
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x103eab35c
#1 0x103eaa6f3
#2 0x7fff7376cb5c
#3 0x103fef340
#4 0x103fefd2d
#5 0x103fed78f
#6 0x103ea5cca
#7 0x103ea5e96
Segmentation fault: 11
I somehow feel like my gfortran compiler doesn't work properly. I'm not familiar with Mac OS and feel like Xcode/Anaconda/etc messed up my system.
I'm using GNU Fortran (Homebrew GCC 9.3.0_1) 9.3.0, MacOS Mojave 10.14.6.
gfortran path is /usr/local/bin/gfortran
Currently my gfortran is from 'brew install gcc'. I also tried a manual download from https://github.com/fxcoudert/gfortran-for-macOS/releases, but it didn't work either.
As far as I can see the code is fine, and it compiles and runs correctly on my system:
ian@eris:~/work/stack$ cat busted.f90
program read_val
integer value
character(len=10) string, string2
string = '154'
! Convert a string to a numeric value
read (string,'(I10)') value
print *, value
! Convert a value to a formatted string
write (string2,'(I10)') value
print *, string2
end program read_val
ian@eris:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -fcheck=all -g busted.f90
ian@eris:~/work/stack$ ./a.out
154
154
So as far as I can see, your installation of gfortran is broken. But please always use implicit none.
I get a segmentation fault from a memory allocation statement just because I have linked some unrelated procedures to the binary.
I have a very simple Fortran program:
program whatsoever
!USE payload_modules
double precision,allocatable:: Vmat(:,:,:)
allocate(Vmat(2,2,2))
Vmat=1
write(*,*) Vmat
deallocate (Vmat)
! some more lines of code using procedures from payload_module
end program whatsoever
Compiling this using gfortran whatsoever.f95 -o whatsoever leads to a program with the expected behaviour. Of course, the program's real purpose is not to print 1.000 eight times but to call procedures from payload_modules, which are still commented out. However, if I compile and link the program with the modules by issuing
gfortran -c -g -fPIC -ffpe-trap=overflow -pedantic -fbounds-check \
-fimplicit-none payload_module1.f90 payload_module2.f90 whatsoever.f95
gcc -g -nostdlib -v -Wl,--verbose -std=gnu99 -shared -Wl,-Bsymbolic-functions \
-Wl,-z,relro -o whatsoever whatsoever.o payload_module1.o payload_module2.o
the program whatsoever doesn't run any more. I get a segmentation fault at the allocate statement. I have not yet uncommented the lines related to the modules (however, uncommenting them leads to the same behaviour)!
I know that the payload modules' code is not buggy because I ran it before from R and wrapped this working code into an f90 module. There are no name collisions; nothing in the modules is called Vmat. There is only one other call to allocate in the modules, and it never caused any trouble. There is still plenty of memory left. gdb didn't give me any hints except a memory address.
How can linking routines that are actually not called crash a program?
Compiling your code with
gfortran whatsoever.f95 -o whatsoever
works because gfortran links against the system libraries for you, so everything is in place. This would correspond to
gfortran whatsoever.f95 payload_module1.f90 payload_module2.f90 -o whatsoever
which would also work. The commands you used instead omit the system libraries, so the code fails the first time it calls a function from them (the allocation). You don't notice at link time that the libraries are missing because you are creating a shared object, which is typically linked against its libraries later on.
You chose to compile the objects and link them into an executable in separate steps. When you link a Fortran program with gcc rather than gfortran, you have to name the Fortran runtime library yourself, so a -lgfortran is missing.
I'm not sure about that particular choice of compile options... -shared is usually used for libraries, are you sure you want a shared binary (whatever that is)?
With -nostdlib you tell the compiler not to link against the system libraries. You would then need to specify those libraries (which you don't).
For the main program test.F90 and a module payload.F90, I run
gfortran -c -g -fPIC -ffpe-trap=overflow -pedantic -fbounds-check \
-fimplicit-none payload.F90 test.F90
gcc -g -v -Wl,--verbose -std=gnu99 -Wl,-Bsymbolic-functions \
-Wl,-z,relro -lgfortran -o whatsoever test.o payload.o
This compiles and executes correctly.
It might be easier to pass the advanced options to gfortran directly:
gfortran -g -fPIC -ffpe-trap=overflow -pedantic -fbounds-check \
-fimplicit-none -Wl,-Bsymbolic-functions -Wl,-z,relro \
payload.F90 test.F90 -o whatsoever
The result is the same.
I'm compiling with gfortran 4.8.1 with the flags -ggdb -O0 -Wall -Wextra -Wtabs -Wsurprising -fbacktrace -fimplicit-none -fcheck=all -std=f2008. Running in gdb I get a backtrace with no procedure names:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000100000000 in ?? ()
#2 0x00007fffffffd760 in ?? ()
#3 0x3f1a36e2eb1c432d in ?? ()
#4 0x3da5fd7fe1796495 in ?? ()
#5 0x4024000000000000 in ?? ()
#6 0x3eb0c6f7a0b5ed8d in ?? ()
#7 0x408f400000000000 in ?? ()
#8 0x408f400000000000 in ?? ()
#9 0x0000000000000000 in ?? ()
After executing ./gyre, which segfaults, I invoke gdb ./gyre core. I see a warning Can't read pathname for load map: Input/output error, but I'm not sure if that's relevant to the problem.
What do I need to do to see where the SIGSEGV occurred?
Update:
So I suspect the stack corruption must be related to a procedure pointer initialisation, since my code was not segfaulting prior to this change. I'm not able to provide the full source code, but the relevant snippet is
pure function new_default_sim_spec() result(spec)
type(sim_spec_type) :: spec
spec = new_sim_spec(1.0_dp)
end function new_default_sim_spec
pure function new_sim_spec(max_days) result(spec)
type(sim_spec_type) :: spec
real(dp), intent(in) :: max_days
! snipped other attribute assignments
spec%increment_h => increment_h_euler
end function new_sim_spec
abstract interface
pure function increment_h_iface(spec, state_minus_1) result(state)
import :: sim_state_type, sim_spec_type
type(sim_state_type) :: state
class(sim_spec_type), intent(in) :: spec
type(sim_state_type), intent(in) :: state_minus_1
end function increment_h_iface
end interface
type sim_spec_type
! snipped other attribute declarations
procedure(increment_h_iface), pointer :: increment_h => null()
end type sim_spec_type
Running in gdb I get a backtrace with no procedure names:
How did you run GDB?
I am guessing you did gdb /path/to/core. Try gdb /path/to/executable /path/to/core instead.
Update:
gdb ./gyre core. I see a warning ...
That warning is irrelevant (and frequently there, though I don't understand the exact conditions which trigger it).
The other obvious way to check where SIGSEGV occurred is to simply run the binary under GDB from the start. You don't need to wait for core, a simple:
gdb ./gyre
(gdb) run
should suffice.
Update 2:
I've tried running the program itself under gdb and have the same problem.
I see plenty of expected function names listed by nm so the binary cannot have been stripped.
This implies either:
some kind of non-standard setting in ~/.gdbinit, or
a bug in GDB.
To eliminate the former, try gdb -nx ./gyre.
For the latter, try a different version of GDB, or make the binary available somewhere and I can take a look.
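The GDB invocations suggested so far can be scripted so that the backtrace is produced non-interactively. The following is a minimal Python sketch; the helper names gdb_backtrace_command and print_backtrace are hypothetical, it assumes gdb is on the PATH, and it relies only on the standard gdb options -nx, -batch and -ex.

```python
import subprocess

def gdb_backtrace_command(executable, core=None):
    """Build a non-interactive gdb command line that prints a backtrace."""
    # -nx: skip ~/.gdbinit; -batch: exit once the commands have run.
    cmd = ["gdb", "-nx", "-batch"]
    if core is None:
        # No core file: run the program under gdb, then ask for the backtrace.
        cmd += ["-ex", "run", "-ex", "bt", executable]
    else:
        # Post-mortem: pass the executable first, then the core file.
        cmd += ["-ex", "bt", executable, core]
    return cmd

def print_backtrace(executable, core=None):
    """Run gdb and return its combined stdout/stderr as text."""
    result = subprocess.run(gdb_backtrace_command(executable, core),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True)
    return result.stdout
```

For example, print_backtrace("./gyre", "core") corresponds to the post-mortem case, and print_backtrace("./gyre") to running the binary under GDB from the start.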
Update 3:
The reason GDB can't produce a stack trace is that your stack is getting corrupted on line simulation.f90:45:
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:45
#1 0x0000000000401054 in gyre () at gyre.f90:21
#2 0x0000000000401fad in main (argc=1, argv=0x7fffffffeb24) at gyre.f90:3
#3 0x00007ffff742876d in __libc_start_main (main=0x401f79 <main>, argc=1, ubp_av=0x7fffffffe878, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe868) at libc-start.c:226
#4 0x0000000000400be9 in _start ()
(gdb) n
41 in simulation.f90
(gdb) bt
#0 simulation::new_default_sim_spec () at simulation.f90:41
#1 0x0000000000000000 in ?? ()
Notice how before line 45 the stack is good, but after it's not. The particular instruction that "wipes" the stack is this one:
=> 0x408fde <__simulation_MOD_new_default_sim_spec+93>: movq $0x0,0x8(%rbp)
Without access to your sources, and with 20 years since I last touched Fortran, I can't make an intelligent guess at what kind of Fortran code could provoke such a bug.
Newer gcc versions default to dwarf-4 format debug info. If you have an older toolchain it might not understand it. Try -gdwarf-2.
I have an object file compiled using as (from assembler code).
If I link it using ld, when I try to stepi (or nexti) gdb complains about memory access at address 0x0. If I link it using gcc, all is fine.
I am guessing the problem is caused by ld, which produces fewer sections when compared to the linking result of gcc.
Is there a way to configure gdb to be more verbose so I can maybe figure out what's wrong with the executable?
(gdb) b main
Breakpoint 1 at 0x100000f8e
(gdb) r
Breakpoint 1, 0x0000000100000f8e in main ()
(gdb) x/10i $pc
0x100000f8e <main>: fbld 0x6c(%rip) # 0x100001000 <data1>
0x100000f94 <main+6>: fimul 0x7a(%rip) # 0x100001014 <data2>
0x100000f9a <main+12>: fbstp 0x60(%rip) # 0x100001000 <data1>
0x100000fa0 <main+18>: mov $0x2000001,%rax
0x100000fa7 <main+25>: mov $0x0,%rdi
0x100000fae <main+32>: syscall
(gdb) si
Cannot access memory at address 0x0
0x0000000100000f94 in main ()
PS: The executable itself runs as expected in both versions.
Later edit: the commands I used to assemble and link:
as -arch x86_64 src.s -o src.o
ld -e _main -arch x86_64 src.o -o src
gcc -o src src.o
gdb has a "show debug" command, giving various internal debug settings. E.g. "set debug target 1" will turn on tracing for gdb's interaction with the target process. You might want to experiment with every flag they have (there aren't that many).
GCC doesn't actually do the linking, it just calls ld on your behalf. The options it's providing must be different from the ones you are using.
Per this thread:
How to get GCC linker command?
You should be able to see the ld invocation's command line by running gcc -v.
That should tell you how to modify your ld command line so things work for you.
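Picking the linker line out of the verbose output can be automated. The following is a rough Python sketch with a hypothetical helper name, linker_invocations. It assumes the linker step shows up as a line whose command is named collect2 or ld, which is how GCC drivers usually print it but may differ on your system.

```python
import os
import shlex

# Assumption: the linker step is printed as a command named collect2 or ld.
LINKER_NAMES = {"collect2", "ld"}

def linker_invocations(verbose_output):
    """Return the lines of `gcc -v` output whose command is collect2 or ld."""
    found = []
    for line in verbose_output.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a line: fall back to a plain split.
            tokens = line.split()
        if tokens and os.path.basename(tokens[0]) in LINKER_NAMES:
            found.append(line.strip())
    return found
```

The driver writes its verbose output to stderr, so the input would typically be the stderr captured from running gcc -v -o src src.o. Comparing the reported line with your own ld command shows which extra options and libraries gcc passes.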