Can I add a watch to a COMMON block? - debugging

I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:
subroutine badsub(ainput)
implicit double precision (a-h,o-z)
include 'commonincludes2.h'
x=dsqrt((r(6)-r(8))**2+(z(6)-z(8))**2)
y=ainput
w=y+x
v=2./dlog(dsqrt(w/y))
This code hits a divide by zero on the last line: x is zero, so w equals y, and thus dlog(dsqrt(1)) is zero.
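The failure is easy to reproduce outside the Fortran code. Below is a minimal Python sketch of the same arithmetic, assuming x comes out as exactly zero. The function name `bad_ratio` is made up for illustration, and Python raises an exception where the Fortran program receives SIGFPE.

```python
import math

def bad_ratio(ainput, x):
    """Mirror the last three assignments of badsub: v = 2 / log(sqrt(w / y))."""
    y = ainput
    w = y + x
    return 2.0 / math.log(math.sqrt(w / y))

# With x == 0, w equals y, so sqrt(w / y) is 1, log(1) is 0,
# and the final division fails.
try:
    bad_ratio(1875.0000521766287, 0.0)
    failed = False
except ZeroDivisionError:
    failed = True

print("division by zero:", failed)
```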
The include file looks something like this:
common /cblk/ r(12),z(12),otherstuff
There are actually 3 include headers with a /cblk/ declaration, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the sections of memory corresponding to r and z are named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:
common /cblk/ x(12),y(12),otherstuff
My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see anywhere that the variables are written to.
If I inspect the actual values in r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), and that's what causes the issue.
I need to find where z and r get written, but I can't find any instructions in the gdb documentation for attaching a watchpoint to a COMMON block. How can I find where these are written to?

I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops due to my divide-by-zero error, I'm able to find the memory address of (in this example) r(8), which is global in scope and shouldn't change on subsequent runs. I can then re-run the code with a watchpoint on that address and it will flag when the value changes anywhere in the code.
In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:
Reading symbols from myprogram...
(gdb) r
Starting program: ************
Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109 v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.
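The `<cblk_+56>` in the gdb output is consistent with a simple layout of the block. As a rough sketch, assuming 8-byte double precision values stored back to back in declaration order with no padding (an assumption, not something gdb states), the byte offset of any element of r or z can be computed as follows. `LAYOUT` and `common_offset` are hypothetical names used only for this illustration.

```python
# Assumed layout of /cblk/: r(12) followed by z(12), each element an
# 8-byte double with no padding. "otherstuff" is left out.
ELEMENT_SIZE = 8
LAYOUT = [("r", 12), ("z", 12)]

def common_offset(name, index):
    """Byte offset of name(index) from the start of the block (1-based index)."""
    base = 0
    for array_name, length in LAYOUT:
        if array_name == name:
            if not 1 <= index <= length:
                raise IndexError(f"{name}({index}) is out of bounds")
            return base + (index - 1) * ELEMENT_SIZE
        base += length * ELEMENT_SIZE
    raise KeyError(name)

print(common_offset("r", 8))  # 56, matching <cblk_+56> in the session above
print(common_offset("z", 6))
```

Because "commonincludes.h" gives the same storage the names x and y, a write to x(8) through that header would land on the same address. Watching the address rather than a name should catch the write either way.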

Related

Segmentation Fault (Core Dumped) in Fortran 90

I'm writing a Fortran 90 code (below) and I get a segfault (core dumped) error. What is Core Dumped and how do I fix it?
program make_pict
IMPLICIT NONE
INTEGER, PARAMETER :: REAL8=SELECTED_REAL_KIND(15,300)
INTEGER, SAVE :: nstp,npr,step
REAL(REAL8), SAVE :: r
REAL(REAL8), DIMENSION(:,:), ALLOCATABLE, SAVE :: f,fa
INTEGER :: xw,yw,x,y
REAL:: ax,ay
INTEGER, DIMENSION(250000) :: pxa
REAL(REAL8) :: s,s2
LOGICAL, SAVE :: initialized=.FALSE.
WRITE(*,*) 'give values ax,ay'
READ(*,*) ax,ay
xw = 256
yw = 256
OPEN(1,FILE='picture.pxa')
do x=0, xw-1
do y=0, yw-1
f(x,y)=(765./2)*(ax*(1-cos(2*3.14159*x*(1.0/xw)))+ay(1+cos(2*3.14159*y*(1.0/yw))))
end do
end do
WRITE(1,'(2I6)') xw,yw
ALLOCATE(f(0:xw-1,0:yw-1),fa(0:xw-1,0:yw-1))
DO y=0,yw-1
WRITE(1,'(256I4)') (f(x,y),x=0,xw-1)
END DO
CLOSE(1)
initialized=.TRUE.
step=0
nstp=100
end program make_pict
You are attempting to set f before it's allocated. You need the allocate statement before the double loop which sets it! One way to solve this problem yourself is to put output statements everywhere, which would pinpoint the location of the error.
Some other problems I noticed:
You're missing a * in ay(. I'm surprised this code compiled for you, actually.
Why are you using such a low-precision value for pi? You're requesting precision to the 15th decimal but your value for pi only goes to 6?
What is the purpose of step, nstp, and initialized? I guess they're for features to be implemented? You should strive to provide a minimal, complete, and verifiable example.
Adding the save attribute doesn't do anything here. You should read about what it actually does, but it's typically not needed. In a program it definitely does nothing.
To answer your second question, segfaults can occur for many reasons. Core dumped only refers to the system's handling of the segmentation fault. There are many causes of segmentation faults; attempting to access an unallocated array is one of them.

Erratic behavior with compiled legacy code using ifort

How I wish I had a minimum working example for this!
I'm doing a bunch of linear algebra using the HSL libraries. I've turned on every debugging flag I can think of.
My "deterministic" code rarely runs to completion. Most of the time, it complains of an indexing error:
On my workstation (Mac OS 10.7.5 and ifort 12):
forrtl: severe (408): fort: (3): Subscript #1 of the array W has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libintlc.dylib 0000000103C83E04 Unknown Unknown Unknown
libintlc.dylib 0000000103C8259E Unknown Unknown Unknown
libifcore.dylib 00000001031FBDA1 Unknown Unknown Unknown
libifcore.dylib 000000010316BA4E Unknown Unknown Unknown
libifcore.dylib 000000010316BFB3 Unknown Unknown Unknown
On my laptop (Mac OS 10.10.5 and ifort 16):
forrtl: severe (408): fort: (3): Subscript #1 of the array A has value 0 which is less than the lower bound of 1
Image PC Routine Line Source
libifcore.dylib 000000010ABDCC96 Unknown Unknown Unknown
Uniform2DSimplifi 00000001068851EE _ma48bd_ 1461 ma48d.f
Uniform2DSimplifi 000000010693619C _solve_sparse_mat 142 solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010693A7D8 _scale_and_solve_ 128 scale_and_solve_sparse_matrix_d.f90
Uniform2DSimplifi 000000010685740D _calc_simplified_ 598 calc_simplified_equations_B.f90
Uniform2DSimplifi 0000000106832176 _MAIN__ 161 uniform_2D_simplified_B.f90
Uniform2DSimplifi 000000010683175E Unknown Unknown Unknown
(You may notice that these are actually two different errors, even though I haven't changed a line of code between them.)
My code runs successfully ~70% of the time using the newer version of ifort on my laptop, but only ~20% of the time using the older version of ifort on my workstation. Oddly, the times that it does work are often after a fresh compilation, and after working one time, it gives that error every time after that. One time it worked, didn't work the second time, then worked the third time. (Sometimes on my laptop, it works for the first 2-3 runs, but throws an error the fourth time.)
My own code is entirely deterministic: it's setting up solving linear algebra routines. It also calls the HSL routines, which evidently call MKL. I would assume that both HSL and MKL are deterministic -- that is, identical inputs produce identical outputs. (They don't call RAND() or do file I/O....) Still, I'm not sure.
Update:
I looked up line 1461 of ma48d.f:
CALL MA50BD(NR,NC,NZB,JOB5,A(K+1),IRN(K+1),KEEP(IPTRD+J1),
+ CNTL5,ICNTL5,IW(J1),IQB,NP,LA-NEWNE-KK,
+ A(NEWNE+KK+1),IRN(NEWNE+KK+1),KEEP(IPTRL+J1),
+ KEEP(IPTRU+J1),W,IW(M+1),INFO5,RINFO5)
On my laptop, it's complaining because k has a value -1 (causing the error) while it normally has a value of 0 (leading to success). What's bizarre about this is that I'm giving these routines the exact same inputs, and the code appears to be deterministic, so they should execute the exact same lines...yet they don't.
My question:
What could be causing this erratic behavior?
So far, I've thought of the following possibilities:
Internal compiler error (Supporting evidence: the newer version of ifort seems to produce better--but even just different--results.)
Something related to the stack/heap (Supporting evidence: it works the first time, but not afterward.)
MKL (BLAS) is non-deterministic (not likely) (Supporting evidence: the traceback points to libintlc.dylib, which is an Intel library...possibly related to MKL?)
HSL is non-deterministic (likely??) (Supporting evidence: the code appears to be deterministic, though at least one of the errors is in this code.)
Something about my installation of ifort or my configuration of Mac OS X? (Supporting evidence: it's an old machine and something may have gone wrong somewhere?)
In my experience, compilers are far more reliable than programmers. That is, I would suspect the program of having a programming error unless it can be proved that bad code was generated.
This kind of error is certainly due to using an uninitialized value. Look for a variable which is not specifically set to some value before being used.
program x
integer :: i, arr(10)
do while (i < 10)
arr (i) = 0
i = i + 1
enddo
print *, arr
end
Sometimes this code will set all the elements to zero. Other times it won't change a thing.
A directly related, but more subtle lack-of-initialization error occurs in this logic:
program y
integer :: sum, i, arrA(10), arrB(10)
real :: ave(2)
do i = 1, 10
arrA(i) = i * 343
arrB(i) = i * 121
enddo
sum = 0
do i = 1, 10
sum = sum + arrA(i)
enddo
ave(1) = sum / 10.0
do i = 1, 10
sum = sum + arrB(i)
enddo
ave(2) = sum / 10.0
print *, 'Averages are', ave
end
No compiler warning will show up for failing to reinitialize sum, though this sort of error is reproducible and deterministic.
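The effect of the missing reset is easy to see in numbers. Here is a small Python sketch of the same logic, using the array values from the example above. The `reset_between` switch and the function name `averages` are additions for illustration only.

```python
arr_a = [i * 343 for i in range(1, 11)]
arr_b = [i * 121 for i in range(1, 11)]

def averages(reset_between):
    """Return (average of arr_a, average of arr_b) using one shared accumulator."""
    total = 0
    for value in arr_a:
        total += value
    ave_a = total / 10.0
    if reset_between:
        total = 0  # the step the Fortran example forgets
    for value in arr_b:
        total += value
    ave_b = total / 10.0
    return ave_a, ave_b

print(averages(reset_between=False))  # second average still includes arr_a's sum
print(averages(reset_between=True))
```

Without the reset, the second average comes out as 2552.0 instead of 665.5, and it does so on every run. That is what separates this bug from the uninitialized-variable case.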
I cannot add a comment - hence the answer.
You can also try -ftrapuv (initialize stack variables to an unusual value). If you are using Intel 15 or higher you can set -init=snan. This initializes 'save'd variables to a signaling NaN.

Translating pseudocode into machine code

For academic purposes, I am being asked to translate this statement
assign x the value 5
into a machine language invented by the author of a computer science book, called Brookshear machine code. I am given the following hint:
(HINTS: Assume that the value of x is to be stored into main memory location 47.
Your program would begin by loading a value into a register. You do not need to
specify the memory locations of your program. Don't forget to end the program with
the HALT instruction.)
I am wondering if anyone knows the best way to approach this? He makes it clear to end with the halt instruction but I am unsure what exactly I should be doing.
0iii - No-operation
1RXY - Load register R with contents of location XY
2RXY - Load register R with value XY
3RXY - Store contents of register R at location XY
4iRS - Move contents of register R to register S
5RST - Add contents of registers S and T as binary numbers, place result in register R
6RST - Add contents of registers S and T as floating-point numbers, place result in register R
7RST - OR together the contents of registers S and T , place result in register R
8RST - AND together the contents of registers S and T , place result in register R
9RST - XOR together the contents of registers S and T , place result in register R
ARiZ - Rotate the contents of register R one bit to the right, Z times
BRXY - Jump to instruction at XY if contents of register R equal contents of register 0
Ciii - Halt
DRXY - Jump to instruction at XY if contents of register R are greater than contents of register 0
R,S,T - Register numbers
XY - A one-byte address or data value
Z - A half-byte value
i - Ignored when the instruction is de-coded: usually entered as 0
Above is the machine language I am expected to use.
If only there were an instruction:
EABXY - Store value XY at location AB
If that command existed, your program would be:
E4705 # store '05' at address '47'
C000 # halt
But, that instruction doesn't exist -- partly because it takes five half-byte characters, and the instructions are meant to fit into four.
So you're going to have to simulate the 'E' instruction using two steps.
You can't specify a value to put into an address directly.
There is one instruction that lets you specify a value and put it somewhere.
There is one instruction that copies a value from somewhere into an address.
That's really enough clues.
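To check a candidate program, a tiny interpreter for the relevant opcodes is enough. The following Python sketch handles only 2RXY, 3RXY and Ciii from the table above. It assumes 16 registers and 256 memory cells, and it runs the instructions from a list rather than from the machine's own memory, which is a simplification. The sample program deliberately uses a different value and address than the exercise.

```python
def run(program):
    """Run a list of 4-hex-digit instructions; return (registers, memory)."""
    registers = [0] * 16   # assumed register count
    memory = [0] * 256     # assumed memory size (one-byte addresses)
    for instruction in program:
        opcode = instruction[0].upper()
        r = int(instruction[1], 16)
        xy = int(instruction[2:], 16)
        if opcode == "2":      # 2RXY: load register R with value XY
            registers[r] = xy
        elif opcode == "3":    # 3RXY: store contents of register R at location XY
            memory[xy] = registers[r]
        elif opcode == "C":    # Ciii: halt
            break
        else:
            raise ValueError(f"opcode {opcode} is not handled in this sketch")
    return registers, memory

# Put the value 2A into register 1, store it at location 10, then halt.
registers, memory = run(["212A", "3110", "C000"])
print(hex(memory[0x10]))
```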

Debugger implementation - Step over issue

I am currently writing a debugger for a script virtual machine.
The compiler for the scripts generates debug information, such as function entry points, variable scopes, names, instruction to line mappings, etc.
However, I have run into an issue with step-over.
Right now, I have the following:
1. Look up the current IP
2. Get the source line from that
3. Get the next (valid) source line
4. Get the IP where the next valid source line starts
5. Set a temporary breakpoint at that instruction
or: if the next source line no longer belongs to the same function, set the temp breakpoint at the next valid source line after return address.
So far this works well. However, I seem to be having problems with jumps.
For example, take the following code:
n = 5; // Line A
if(n == 5) // Line B
{
foo(); // Line C
}
else
{
bar(); // Line D
--n;
}
Given this code, if I'm on line B and choose to step-over, the IP determined for the breakpoint will be on line C. If, however, the conditional jump evaluates to false, it should be placed on line D. Because of this, the step-over wouldn't halt at the expected location (or rather, it wouldn't halt at all).
There seems to be little information on debugger implementation of this specific issue out there. However, I found this. While this is for a native debugger on Windows, the theory still holds true.
It seems though that the author has not considered this issue, either, in section "Implementing Step-Over" as he says:
1. The UI-threads calls CDebuggerCore::ResumeDebugging with EResumeFlag set to StepOver.
This tells the debugger thread (having the debugger-loop) to put IBP on next line.
2. The debugger-thread locates next executable line and address (0x41141e), it places an IBP on that location.
3. It calls then ContinueDebugEvent, which tells the OS to continue running debuggee.
4. The BP is now hit, it passes through EXCEPTION_BREAKPOINT and reaches at EXCEPTION_SINGLE_STEP. Both these steps are same, including instruction reversal, EIP reduction etc.
5. It again calls HaltDebugging, which in turn, awaits user input.
Again:
The debugger-thread locates next executable line and address (0x41141e), it places an IBP on that location.
This statement does not seem to hold true in cases where jumps are involved, though.
Has anyone encountered this problem before? If so, do you have any tips on how to tackle this?
Since this thread comes up first on Google when searching for "debugger implement step over", I'll share my experience with the x86 architecture.
You start by implementing step into: this is basically single-stepping on the instructions and checking whether the line corresponding to the current EIP changes. (You use either the DIA SDK or the DWARF debug data to find the current line for an EIP.)
In the case of step over: before single-stepping to the next instruction, you'll need to check whether the current instruction is a CALL instruction. If it is, put a temporary breakpoint on the instruction following it and continue execution until the breakpoint is hit (then remove it). In this case you have effectively stepped over function calls at the assembly level, and so in the source too.
No need to manage stack frames (unless you'll need to deal with single line recursive functions). This analogy can be applied to other architectures as well.
Ok, so since this seems to be a bit of black magic, in this particular case the most intelligent thing was to enumerate the instruction where the next line starts (or the instruction stream ends + 1), and then run that many instructions before halting again.
The only gotcha was that I have to keep track of the stack frame in case CALL is executed; those instructions should run without counting in case of step-over.
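The approaches in this thread share one idea: single-step, ignore everything that happens at a deeper call depth, and stop when the source line changes. Below is a minimal Python sketch of that idea on a made-up toy VM. The instruction format, the `flag` field standing in for the comparison result, and all names are assumptions for illustration, not any real VM's design.

```python
class ToyVM:
    """Tiny VM: each instruction is (source_line, op, arg)."""

    def __init__(self, code, flag):
        self.code = code
        self.flag = flag      # stands in for the result of a comparison
        self.ip = 0
        self.stack = []       # return addresses; its length is the call depth
        self.halted = False

    def line(self):
        return self.code[self.ip][0]

    def step_instruction(self):
        _, op, arg = self.code[self.ip]
        if op == "jump_if_false":
            self.ip = arg if not self.flag else self.ip + 1
        elif op == "jump":
            self.ip = arg
        elif op == "call":
            self.stack.append(self.ip + 1)
            self.ip = arg
        elif op == "ret":
            self.ip = self.stack.pop()
        elif op == "halt":
            self.halted = True
        else:                 # "nop": fall through to the next instruction
            self.ip += 1


def step_over(vm):
    """Single-step until a different line is reached at the same or a shallower depth."""
    start_line, start_depth = vm.line(), len(vm.stack)
    while not vm.halted:
        vm.step_instruction()
        if len(vm.stack) <= start_depth and vm.line() != start_line:
            break
    return vm.line()


CODE = [
    (1, "nop", None),           # 0: n = 5;
    (2, "jump_if_false", 4),    # 1: if (n == 5)
    (4, "call", 7),             # 2: foo();
    (4, "jump", 6),             # 3: skip the else branch
    (8, "call", 7),             # 4: bar(); (same callee reused for brevity)
    (9, "nop", None),           # 5: --n;
    (11, "halt", None),         # 6: end of script
    (20, "ret", None),          # 7: body of the callee
]

vm = ToyVM(CODE, flag=False)
vm.ip = 1                       # stopped on line 2, the "if"
print(step_over(vm))            # lands on line 8 (the else branch), not line 4
```

Since the stop condition is evaluated after each executed instruction, the conditional jump needs no special handling: wherever the jump goes, the first instruction on a new line at the original depth ends the step.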

implementing step over, dwarf

I'm working on a source-level debugger. The debug info is available in ELF
format. How could 'step over' be implemented?
The problem is at 'Point1'. Otherwise, I can simply wait for the
next source line (reading it from the .debug_line table).
Thanks
if (a == 1)
x = 1; //Point1
else if (a == 2)
x = 1;
z = 1;
I'm not sure I understand the question entirely, but I can tell you how GDB implements its step command.
Once control has entered a particular compilation unit, GDB reads that CU's debugging information; in particular, it reads the CU's portion of the .debug_line section and builds a table that maps instruction addresses to source code positions.
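As an illustration of such a table (not GDB's actual data structure), here is a minimal Python sketch that maps a PC to a source line with a binary search. The addresses and line numbers are invented, and end-of-sequence markers, which a real .debug_line table has, are ignored.

```python
import bisect

# Hypothetical (start_address, source_line) rows, sorted by address, in the
# spirit of what a debugger builds from .debug_line.
LINE_TABLE = [
    (0x1000, 3),
    (0x1008, 4),
    (0x1014, 4),
    (0x1020, 6),
    (0x102C, 7),
]
ADDRESSES = [address for address, _ in LINE_TABLE]

def line_for_pc(pc):
    """Return the source line of the last row whose address is <= pc, or None."""
    index = bisect.bisect_right(ADDRESSES, pc) - 1
    if index < 0:
        return None
    return LINE_TABLE[index][1]

print(line_for_pc(0x1010))  # falls in the row that starts at 0x1008
```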
When the step begins, GDB looks up the source location for the current PC. Then it steps by machine instruction, looking up the source location of the new PC each time, until the source location changes. When the source location changes, the step is complete.
It also computes the frame ID—the base address of the stack frame, and the start address of the function—after each step, and checks if that has changed. If it has, that means that we've stepped into or returned from a recursive call, and the step is complete.
To see why it's necessary to check the frame ID as well as the source location, consider stepping through a call to the following function:
int fact(int n) { if (n > 0) { return n * fact(n-1); } else return 1; }
Since this function is defined entirely on the same source line, stepping by instruction until the source line changes would step you through all the recursive calls without stopping. However, when we enter a new call to fact, the stack frame base address will have changed, indicating that we should stop. This gives us the following behavior:
fact (n=10) at recurse.c:4
(gdb) step
fact (n=9) at recurse.c:4
(gdb) step
fact (n=8) at recurse.c:4
GDB's next command combines this general behavior with appropriate logic for recognizing function calls and letting them return to completion. As before, one must use frame IDs in deciding when calls have truly returned to the original frame; and there are other complications.
It's worth thinking a bit about how to treat inlined instances of functions (which DWARF does describe). But that's a bit much for this question.
Not to discourage experimentation, but if I were beginning a debugger project, I would want to look at Apple's work-in-progress debugger, lldb, which is open source.
