GDB: how to call functions with modified parameters during debugging

Consider the following trivial Fortran program that adds two integers via a subroutine and prints the result:
PROGRAM MAIN
INTEGER I, J, SUM
I = 1
J = 1
CALL ADD(I, J, SUM)
WRITE(*,*) SUM
END
SUBROUTINE ADD(I, J, SUM)
INTEGER I, J, SUM
SUM = I + J
END
I compile with gfortran -g -O0 gdb-mwe.f -o gdb-mwe and run the program in the GNU Debugger. I want to call ADD from the debugger with modified input arguments right before the WRITE statement. Here's what happens:
Reading symbols from gdb-mwe...done.
(gdb) break 10
Breakpoint 1 at 0x4007dd: file gdb-mwe.f, line 10.
(gdb) r
Starting program: /home/username/Documents/Fortran/gdb-mwe
Breakpoint 1, MAIN__ () at gdb-mwe.f:10
10 WRITE(*,*) SUM
(gdb) p j = j+1
$2 = 2
(gdb) call add(i,j,sum)
Program received signal SIGSEGV, Segmentation fault.
0x000000000040079a in add (
i=<error reading variable: Cannot access memory at address 0x1>,
j=<error reading variable: Cannot access memory at address 0x2>,
sum=<error reading variable: Cannot access memory at address 0x2>)
at gdb-mwe.f:18
18 SUM = I + J
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(add) will be abandoned.
When the function is done executing, GDB will silently stop.
How do I get this right?

As pointed out in the comments, open bugs in gdb currently prevent this.
A possible workaround is to debug a 32-bit build of the code. This changes some behavior, but for simple debugging tasks it may be sufficient.
For Intel Fortran compilers, this only requires adding the -m32 flag (provided the 32-bit libraries are installed).
For gfortran, it seems the multilib package has to be installed first, as shown in this question.

Related

Evaluate an expression in gdb and lldb

I'm trying to understand GDB and LLDB so that I can use them efficiently to debug my programs at any point.
But I'm stuck: I'm not sure how to print the result of C library functions like pow, strnlen, etc. when I want to explore their output.
The following are my LLDB and GDB sessions.
3 int main(int argc,char *argv[]) {
4 int a = pow(3,2);
-> 5 printf("the value of a is %d",a);
6 return 0;
7 }
(lldb) print pow(3,1)
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
error: 'pow' has unknown return type; cast the call to its declared return type
(lldb) print strlen("abc")
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
error: 'strlen' has unknown return type; cast the call to its declared return type
(lldb) expr int a = strlen("abc");
error: 'strlen' has unknown return type; cast the call to its declared return type
(lldb) expr int a = strlen("abc");
GDB output
Starting program: /Users/noobie/workspaces/myWork/pow
[New Thread 0x1903 of process 35243]
warning: unhandled dyld version (15)
Thread 2 hit Breakpoint 1, main (argc=1, argv=0x7fff5fbffb10) at pow.c:5
5 int a = pow(3,2);
(gdb) print pow(3,2)
No symbol "pow" in current context.
(gdb) set pow(3,2)
No symbol "pow" in current context.
(gdb) set pow(3,2);
No symbol "pow" in current context.
(gdb) print pow(3,2);
No symbol "pow" in current context.
(gdb) call pow(3,2)
No symbol "pow" in current context.
(gdb)
I compiled the program using gcc with the -g3 flag, i.e.:
gcc -g3 pow.c -o pow
The error you are getting from lldb, e.g.:
error: 'strlen' has unknown return type; cast the call to its declared return type
is just what it says. You need to cast the call to the proper return type:
(lldb) print (size_t) strlen("abc")
(size_t) $0 = 3
The reason the type information is missing from strlen and printf etc. is that to save space, the compiler only writes the signatures of functions into the debug information when it sees the definition of the function, not at every use site. Since you don't have the debug information for the standard C libraries, you don't have this information.
The reason the debugger requires this information before it will call the function is that if you call a function that returns a structure, but generate code as though the function returned a scalar value, calling the function will corrupt the stack of the thread on which the function was called, ruining your debug session. So lldb doesn't guess about this.
Note, on macOS, the system has "module maps" for most of the system libraries, which allow lldb to reconstruct types from the modules. To tell lldb to load a module when debugging a pure C program, run this command:
(lldb) expr -l objective-c -- #import Darwin
If you are debugging an ObjC program, you can leave off the language specification. After this expression runs, lldb will have loaded the module map and you can call most of the functions in the standard C libraries without casting.
If you look at the disassembly, you'll see it contains only the precomputed result, with no call to pow. gcc knows what pow is and evaluates it during compilation. There is therefore no need to link against libm, which contains the implementation of the function, so no such function is present at run time and the debugger has nothing to call.
You can force the link by, e.g., adding -lm (though this can be overridden with the --as-needed linker flag).

gdb "watch" command breaks my loop?

First, I've got an infinite loop like this:
#include<stdio.h>
int main(){
int i=0;
int b=1;
while(b)
{
++i;
printf("%d\n",i);
}
return 0;
}
I compiled it and ran it inside gdb, trying to break when "i==10":
gcc 5.c -g && gdb a.out
(gdb) b main
Breakpoint 1 at 0x4005a3: file 5.c, line 3.
(gdb) r
Starting program: /home/console/a.out
Breakpoint 1, main () at 5.c:3
3 int i=0;
(gdb) watch i==10
Hardware watchpoint 2: i==10
(gdb) r
The program being debugged has been started already.
Well, the program seems to be terminated after "r". Why doesn't it break when "i==10"?
Thanks.
When you're watching an auto variable, the debugger needs to have its context available.
When you try to run the program again, gdb warns you: you'll lose the context and thus your hardware watchpoint (auto variables will be deallocated and reallocated).
The r/run command runs the program from the start. You're mixing it up with continue. The hardware watchpoints on auto variables are cleaned up with a really unclear warning, and the program then runs as an infinite loop.
To avoid this, there are several alternatives, each with its pros and cons:
Just use c (continue) instead of r: your hardware watchpoint will work. Cons: you cannot type r (you'll survive).
Replace your hardware watchpoint with a conditional breakpoint: b 5.c:9 if i==10 (line 9 is the line of the printf). Cons: performance will suffer because the breakpoint is triggered on every iteration, and gdb decides each time whether to stop depending on the condition.
Make i global: this allows what you wanted (restart) without a warning, because in that case i isn't an auto variable. Cons: global variables are not the best thing (especially when named i :)).

Compiling single vs multiple source files in Intel Fortran

I have been compiling a project with modules and subroutines in different files: each subroutine is written in a separate file, and the same goes for the modules. I tested compiling these files separately to object files (-c) and then linking with the optimization flags, and also using cat to merge the entire source code and applying the same procedure to this single source file. What I found is that the executable generated from the single file was about 40% faster than the one generated from multiple files, despite using exactly the same flags for both.
I would like to know why this happens, and whether there is any flag for the Intel Fortran compiler that compiles multiple files as if they were a single file.
As @chw21 requested, I created a small program showing the problem:
program main
use operators
implicit none
integer :: n
real(8), dimension(:,:), allocatable :: a, b, c
integer :: i,j,k
n = 1000
allocate(a(n,n), b(n,n), c(n,n))
call random_number(a)
call random_number(b)
c = 0.0d0 ! c is accumulated in the loop below, so it must start at zero
do j = 1, n
do i = 1, n
do k = 1, n
!c(i,j) = c(i,j) + a(k,i) * b(k,j)
c(i,j) = add(c(i,j), mul(a(k,i), b(k,j)))
enddo
enddo
enddo
write(*,*) sum(c)
end program
with module:
module operators
contains
function add(a,b) result (c)
real(8), intent(in) :: a, b
real(8) :: c
c = a + b
end function
function mul(a,b) result (c)
real(8), intent(in) :: a, b
real(8) :: c
c = a * b
end function
end module
The idea is that these functions should normally get inlined, if the compiler knows that they are so extremely small. I did three tests with -O2:
1. complete source in a single file
2. split in two files
3. split in two files with -ipo (or -flto)
The results for ifort 13.0.0 and gfortran 5.2.0 on different machines are:
Compiler | Test 1 | Test 2 | Test 3
---------+--------+--------+--------
ifort    | 1.3s   | 15.7s  | 1.9s
gfortran | 1.1s   | 3.7s   | 1.1s
Unfortunately, I don't know why there is still a difference between the 1st and 3rd test with ifort. I guess a look at the generated code would shed some light on this.
Update: The times were measured by executing time ./a.out, which gave stable results. With the default ifort -O2 compilation, the maximum instruction set should be SSE2 (thus, no FMA); the processor supports up to SSE4a (Opteron 6128). An additional test on a recent Intel processor (up to AVX) showed similar results.
An important factor seems to be the lack of inlining and vectorization of the inner loop in the two-file build; both get applied with IPO and with single-file compilation (see --opt-report). Additionally, there seem to be some differences in vectorization between IPO and single-file compilation.

Confusing debugging error in Fortran program

I've been sitting here for a while quite baffled as to why my debugger keeps displaying an error in my code when the program runs fine. There are three parts to a very simple program that is just reading in information from a file.
My code is broken into three Fortran files given below and compiled via
ifort -o test global.f90 read.f90 test.f90
global.f90:
module global
implicit none
integer(4), parameter :: jsz = 904
end module global
read.f90:
subroutine read(kp,q,wt,swt)
implicit none
integer(4) :: i, j
integer(4), intent(in) :: kp
real(8), intent(out) :: swt, q(kp,3), wt(kp)
swt = 0.0d0; q(:,:) = 0.0d0; wt(:) = 0.0d0
open(7,file='test.dat')
read(7,*) ! Skipping a line
do i = 1, kp
read(7,1000)(q(i,j),j=1,3), wt(i)
swt = swt + wt(i)
end do
close(7)
return
1000 format(3F10.6,1X,1F10.6)
end subroutine read
test.f90:
program test
use global
integer(4) :: i, j
real(8) :: tot, qq(jsz,3), wts(jsz)
call read(jsz,qq,wts,tot)
stop
end program test
The error I keep receiving is
Breakpoint 1, read (kp=904,
q=<error reading variable: Cannot access memory at address 0x69bb80>,
wt=..., swt=6.9531436082559572e-310) at read.f90:6
This error appears right when the subroutine of read is called. In other words, I'm adding a breakpoint at the read subroutine and running the code in gdb after the breakpoint is added. The program will continue to run as expected and give the correct outputs when I include write statements in the 'test' program. However, if I use the gdb print options I receive an error of 'Cannot access memory at address 0x69bb80' for array q only. All other arrays and variables can be displayed with no problems.
As I would like the read subroutine to be a stand-alone subroutine that does not rely on any global parameters, I have not used the global module inside it and instead pass the size into the subroutine as kp. I decided to test whether using the global module would help, and if I use jsz in place of kp, the error does indeed go away. However, since this isn't my overall goal with the subroutine, I would like to figure out how to fix this without using the global module. (I also tried not using the global module at all and setting the parameter directly in the test.f90 program, but this also gives the error.)
Any insight on possible reasons for this error, or suggestions to try and fix the memory addressing issue would be greatly appreciated.
I think this is an issue specific to the ifort+gdb combination that is fixed with newer gdb versions. Here's a smaller example to reproduce the issue:
$ cat test.f90
subroutine bar(arg)
integer, intent(inout):: arg
print *, 'bar argument is', arg
arg = 42
end subroutine bar
program test
integer:: param
param = 3
call bar(param)
print *, 'post-bar param:', param
end program test
$ ifort -g -O0 -o test test.f90
$ gdb --quiet test
Reading symbols from /home/nrath/tmp/test...done.
(gdb) b 4
Breakpoint 1 at 0x402bd0: file test.f90, line 4.
(gdb) r
Starting program: /home/nrath/tmp/test
[Thread debugging using libthread_db enabled]
Breakpoint 1, bar (arg=@0x2aaa00000003) at test.f90:4
4 print *, 'bar argument is', arg
(gdb) p arg
$1 = (REF TO -> ( INTEGER(4) )) @0x2aaa00000003: <error reading variable>
(gdb) quit
$ gdb --version | head -1
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
However, if you compile with gfortran instead of ifort, or if you use GDB 7.7.1, it works fine.
Did you add the INTERFACE statement to the end of your programme?
You need it when you call a function that is not contained in the programme.

Fortran module variables not accessible in debuggers

I've compiled a Fortran code, which contains several modules, using both gfortran 4.4 and intel 11.1 and subsequently tried to debug it using both gdb and DDT. In all cases, I cannot see the values of any variables that are declared in modules. These global variables have values, as the code still runs correctly, but I can't see what the values are in my debuggers. Local variables are fine. I've had trouble finding a solution to this problem elsewhere online, so perhaps there is no straightforward solution, but it's going to be really difficult to debug my code if I can't see the values of any of my global variables.
With newer GDBs (7.2 if I recall correctly), debugging modules is simple. Take the following program:
module modname
integer :: var1 = 1 , var2 = 2
end module modname
use modname, only: newvar => var2
newvar = 7
end
You can now run:
$ gfortran -g -o mytest test.f90; gdb --quiet ./mytest
Reading symbols from /dev/shm/mytest...done.
(gdb) b 6
Breakpoint 1 at 0x4006a0: file test.f90, line 6.
(gdb) run
Starting program: /dev/shm/mytest
Breakpoint 1, MAIN__ () at test.f90:6
6 newvar = 7
(gdb) p newvar
$1 = 2
(gdb) p var1
No symbol "var1" in current context.
(gdb) p modname::var1
$2 = 1
(gdb) p modname::var2
$3 = 2
(gdb) n
7 end
(gdb) p modname::var2
$4 = 7
(gdb)
In gdb, try referencing the global variables with names like __modulename__variablename
You can check that this is the right mangling scheme using nm and grep to find one of your global variables in the symbols of your program.
If that doesn't work, make sure you're using a recent version of gdb.
Here's a thread on this issue: http://gcc.gnu.org/ml/fortran/2005-04/msg00064.html
I had the same issue (GNU gdb 7.9 running in parallel with MPI). What worked for me was the following:
p __modname_mod_var
That is: double underscore, the name of the module, underscore, mod, underscore, the name of the variable.
Compiling with -gstabs+ instead of -g may also fix some issues (but not the present one).
