gfortran localizing a bug that does not occur in the debugger - windows

I've generated a gfortran executable, mtc08.exe, that exhibits the following behavior:
1) If I run it in gdb, it runs successfully to the end.
2) If I run it normally, redirecting the output with the Windows command
mtc08.exe >out
it gives me "floating point exception - erroneous arithmetic operation" but does not say where that occurs. (The backtrace information is incomprehensible to me, and it seems it cannot contain much information because it consists entirely of the letter "f".)
I'm then trying to localize the problem by seeing where the program stops writing results, but I'm having difficulties there too, because I get the impression that the program may be multitasking: proceeding with "future" arithmetic operations before completing output operations, even though the two do not interfere.
Does such multitasking occur? If so, can I turn it off with a compiler switch, so that I can be sure it performs all operations sequentially?
The currently used compilation command in windows is:
rem debug compilation:
gfortran -static -fdefault-real-8 -fdefault-integer-8 -g -ffpe-trap=invalid,zero,overflow,underflow,denormal -Wall -fcheck=all @mtc08.fls -o mtc08.exe
where mtc08.fls is a file containing the names of all source files (read by gfortran as a response file).
It may be that removing some of the -ffpe-trap options would make it run, but that rattles my confidence; I'd like to get to the bottom of it rather than just find a workaround.
I can of course give more information, but since the error is not localized, that is not very practical.

I still don't know how to get better diagnostics, but I managed to find the cause of the problem by trial and error: an uninitialized variable.
The problem arises because, when run under gdb, the uninitialized variables happened to contain zero, which is what they should have been initialized to anyway, while running the same executable without gdb left values in them that led to the "floating point exception". Fortran does not guarantee any initial value for such variables, so what they contain can differ between runs.
One way to get better diagnostics in gdb is to compile with -finit-real=nan, or better -finit-real=snan together with -ffpe-trap=invalid, so that local REAL variables start out as NaN: with a signaling NaN, the first arithmetic operation on an uninitialized variable traps, and gdb stops at that line.
Perhaps with better coding practice, such as INTENT declarations for all dummy arguments, the compiler's warnings would also be better at picking up uninitialized variables, without having to initialize them to NaN.
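A minimal sketch of that rebuild, assuming a POSIX shell with gfortran and gdb on the PATH (for example an MSYS2 shell on Windows) and that mtc08.fls is passed to gfortran as a response file. The helper names build_debug and locate_fpe are hypothetical; the flags are the ones from the question plus -finit-real=snan.

```shell
#!/bin/sh
# Debug flags from the question, plus -finit-real=snan so that every local REAL
# starts as a signaling NaN; with -ffpe-trap=invalid its first arithmetic use traps.
# (No -O here: the gfortran manual notes optimization may turn sNaNs into quiet NaNs.)
FFLAGS="-g -Wall -fcheck=all -finit-real=snan \
-ffpe-trap=invalid,zero,overflow,underflow,denormal"

# Hypothetical helper: rebuild mtc08.exe from the files listed in mtc08.fls.
build_debug() {
    # $FFLAGS is left unquoted on purpose so the shell splits it into separate options.
    gfortran -static -fdefault-real-8 -fdefault-integer-8 $FFLAGS @mtc08.fls -o mtc08.exe
}

# Hypothetical helper: run under gdb non-interactively; when SIGFPE arrives, gdb
# stops the program and "bt" prints the call stack with file and line numbers.
locate_fpe() {
    gdb -batch -ex run -ex bt ./mtc08.exe
}
```

Running build_debug and then locate_fpe should make the run under gdb fail at the same place as the normal run, as long as the culprit is a local REAL, because its initial value no longer depends on whatever happens to be in memory.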

Related

How to quickly get to the first compiler error message in terminal?

I feel like this is a common question, but I couldn't find anything on it. Often times when I compile a program, I'll have a long list of compilation errors, and I have to scroll up in the terminal to find the first error. This is kind of tedious and sometimes I scroll past the first error without noticing. Is there a quicker way to navigate this?
Terminal-specific approach:
$ clear && make
Then use the shortcut Shift+Home to jump to the top. This shortcut may not be available in all consoles. It seems to work in gnome-terminal and agetty. It does not seem to work in xterm, but I would assume that such consoles can be configured to add a shortcut.
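Another terminal-side option (not from the original answer) is to save the build output and print only the first error line. The helper name first_error is made up for illustration, and the pattern assumes GCC/Clang-style "file:line:col: error: message" diagnostics.

```shell
#!/bin/sh
# Print the first diagnostic line containing "error:" read from standard input.
# GCC and Clang both format errors as "file:line:col: error: message".
first_error() {
    grep -m 1 'error:'
}

# Typical use: capture stdout and stderr of the build, then look at the first error.
#   make 2>&1 | tee build.log
#   first_error < build.log
```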
Compiler-specific approach:
Alternatively, you could use compiler mechanisms for limiting the number of errors to be shown. Both clang and gcc support -Wfatal-errors for exiting on the first error (not to be confused with -Werror which turns warnings into errors). From the gcc man page:
-Wfatal-errors
This option causes the compiler to abort compilation on the first
error occurred rather than trying to keep going and printing
further error messages.
Exiting on the first error might be unhelpful in some cases (e.g. when a header is missing, the first error may be about a missing ; and appear before the error telling you which identifier is undefined).
For this reason, it might be more helpful to limit the number of errors. For gcc there is -fmax-errors=n (for clang -ferror-limit=n) for showing at most n errors. You could adjust it to a small number that allows you to see all errors at once without scrolling.
-fmax-errors=n
Limits the maximum number of error messages to n, at which
point GCC bails out rather than attempting to continue
processing the source code. If n is 0 (the default), there is
no limit on the number of error messages produced. If
-Wfatal-errors is also specified, then -Wfatal-errors takes
precedence over this option.

what do I do with an SIGFPE address in gdb?

While running an executable in gdb, I encountered the following error:
Program received signal SIGFPE, Arithmetic exception.
0x08158307 in radtra_ ()
How can I find out which file and line number 0x08158307 corresponds to, without recompiling or otherwise modifying the source? If it helps, the source language was Fortran.
That isn't easy. You could use GDB's disassemble command, look for accesses to global variables and CALL instructions, and make a guess where inside radtra_ you are. This is harder the larger the routine is, the more optimizations the compiler has applied to it, and the fewer calls and global variable accesses it performs.
If you can't guess, your only options are:
Rebuild the application adding -g flag, but leaving all other compile options unmodified, then use addr2line to translate the address to line number. (This is how you should build the application from the start.)
If you can't rebuild the entire application, rebuild just the source containing radtra_ (again with same flags, but add -g). You should be able to match the output from objdump -d radtra.o with the output from disassemble. Once you have a match, read output from readelf -wl radtra.o or objdump -g radtra.o to associate code offsets within radtra_ with source lines that code was generated from.
Hire an expert to guess for you. This wouldn't be cheap, as people skilled in this kind of reverse engineering are usually gainfully employed and value their time.

Difficulties compiling fortran .f95 file, how to debug?

I am trying to learn Fortran. I wanted to replicate a certain step in a paper, but I ran into trouble.
I compiled the file AERsimulation.f95 with all the gfortran debugging options I am aware of turned on, and I could generate an .out file without any errors (but with a lot of warnings).
When I tried to run the .out file I got the error message
Fortran runtime error: Index '0' of dimension 1 of array 'k' below lower bound of 1
Now, it is quite difficult for me to understand why exactly this happens. I guess my question is whether there is a better way of debugging, so that I can step through the code 'live' and see why the error occurs (I am thinking of the MATLAB debugger, for instance).
Any suggestion/hint is very welcome
The files I use are
AERsimulation.f95
AERDATANB.TXT
Thank you very much
Best
Derrick
The error message means that you are trying to access element 0 of the array k, but arrays in Fortran start at index 1 by default. With bounds checking enabled, gfortran prints the offending source line just before this message ("At line ... of file ...").
If you are looking for a better way to debug, try gdb (command line), or if you prefer a graphical interface, try the NetBeans IDE. It has (limited) support for Fortran and a debugging mode where you can step through the code line by line and see the values of all variables.
On command line try:
gdb name_of_executable
run
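If the error raises a signal (for example SIGSEGV or SIGFPE), the debugger stops at the line that causes it. A gfortran runtime check such as this bounds error instead exits through the runtime library, so first set a breakpoint with break _gfortran_runtime_error_at before run, then use bt to see the calling line.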

How to trace error of OCaml programs?

I am writing a compiler in OCaml. Sometimes when there is a runtime error, it shows the line of the error, but it does not show the context, for instance how the function was called and with which values...
In order to help debugging, does anyone know a way to show the steps of execution till the error with real value of the relevant variables?
By the way, I am using Emacs as editor.
OCaml is compiled. You seem to be used to interpreted languages, where the run-time system has access to the full program source code. With a compiled program, the run-time system doesn't have access to much information. For example, variable names disappear at compile time, and nothing keeps track of the arguments passed to every function except as needed for normal program execution (doing that would incur a lot of overhead).
If you compile your program with debugging symbols (pass the -g option to the compiler), you can get a stack trace if your program dies of an uncaught exception. You'll get function names and some program locations, but not detailed memory contents. Compiling with debugging information results in a bigger executable, but doesn't change the run-time performance. You need to set the OCAMLRUNPARAM environment variable to contain b when running the program.
ocamlc -g -o foo foo.ml
export OCAMLRUNPARAM=b
./foo
If you want more information, you need to run your program inside a debugger.

Using gdb with inlined functions

I'm trying to use gdb in postmortem mode with the core dump of a crashed process. I can get a stack trace, but instead of showing me the actual location in the offending function, gdb shows me the line number of a two-line inlined function that the offending function calls.
The inlined function is called many, many places; how do I find which call triggered the crash? How do I find the code immediately around the inlined function?
Go to the stack frame in question and print the instruction pointer (e.g. p $rip),
then look it up manually with e.g. "addr2line -e ./a.out -i 0x84564756".
This doesn't scale, but at least it works.
I assume that the "many many calls to the inlined function" are all happening from within a single "offending function" (otherwise your question doesn't make sense to me).
Your best bet is to note the instruction pointer (IP) value at the crash point in GDB, then use "objdump -dS ./a.out" and find that address in the output.
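If your build system reads an OPTIMIZE variable, you can try setting it to NO (e.g. setenv OPTIMIZE NO) and rebuilding the project: this tells the compiler not to optimize the code, so it may not inline function calls. With GCC directly, the equivalent is building with -O0 or adding -fno-inline.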
