Segmentation Fault (Core Dumped) in Fortran 90 - compilation

I'm writing a Fortran 90 code (below) and I get a segfault (core dumped) error. What is Core Dumped and how do I fix it?
program make_pict
IMPLICIT NONE
INTEGER, PARAMETER :: REAL8=SELECTED_REAL_KIND(15,300)
INTEGER, SAVE :: nstp,npr,step
REAL(REAL8), SAVE :: r
REAL(REAL8), DIMENSION(:,:), ALLOCATABLE, SAVE :: f,fa
INTEGER :: xw,yw,x,y
REAL:: ax,ay
INTEGER, DIMENSION(250000) :: pxa
REAL(REAL8) :: s,s2
LOGICAL, SAVE :: initialized=.FALSE.
WRITE(*,*) 'give values ax,ay'
READ(*,*) ax,ay
xw = 256
yw = 256
OPEN(1,FILE='picture.pxa')
do x=0, xw-1
do y=0, yw-1
f(x,y)=(765./2)*(ax*(1-cos(2*3.14159*x*(1.0/xw)))+ay(1+cos(2*3.14159*y*(1.0/yw))))
end do
end do
WRITE(1,'(2I6)') xw,yw
ALLOCATE(f(0:xw-1,0:yw-1),fa(0:xw-1,0:yw-1))
DO y=0,yw-1
WRITE(1,'(256I4)') (f(x,y),x=0,xw-1)
END DO
CLOSE(1)
initialized=.TRUE.
step=0
nstp=100
end program make_pict

You are attempting to set f before it's allocated. You need the allocate statement before the double loop which sets it! One way to solve this problem yourself is to put output statements everywhere, which would pinpoint the location of the error.
Some other problems I noticed:
You're missing a * in ay(. I'm surprised this code compiled for you, actually.
Why are you using such a low-precision value for pi? You're requesting precision to the 15th decimal but your value for pi only goes to 6?
What is the purpose of step, nstp, and initialized? I guess they're for features to be implemented? You should strive to provide a minimal, complete, and verifiable example.
Adding the save attribute doesn't do anything here. You should read about what it actually does, but it's typically not needed. In a program it definitely does nothing.
To answer your other question: "core dumped" only describes how the system handled the segmentation fault (it wrote the process's memory image to a core file for later debugging). Segmentation faults have many causes; attempting to access an unallocated array is one of them.
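For reference, here is a minimal sketch of the corrected ordering: allocate first, then fill. It assumes fixed example values for ax and ay instead of reading them, leaves out the file output, and uses a double-precision constant for pi. The program name make_pict_fixed is only illustrative.

```fortran
program make_pict_fixed
  implicit none
  integer, parameter :: real8 = selected_real_kind(15, 300)
  real(real8), parameter :: pi = 3.141592653589793_real8
  real(real8), allocatable :: f(:,:)
  real(real8) :: ax, ay
  integer :: xw, yw, x, y

  ! Example values (assumption); the original reads these from standard input.
  ax = 0.5_real8
  ay = 0.5_real8
  xw = 256
  yw = 256

  ! Allocate BEFORE the first assignment to f.
  allocate(f(0:xw-1, 0:yw-1))

  do y = 0, yw-1
    do x = 0, xw-1
      ! Note the explicit "*" after ay, which was missing in the question.
      f(x,y) = (765.0_real8/2) * (ax*(1 - cos(2*pi*x/xw)) &
                                + ay*(1 + cos(2*pi*y/yw)))
    end do
  end do

  print '(a,l1)',   'allocated: ', allocated(f)
  print '(a,f8.2)', 'f(0,0) = ', f(0,0)
end program make_pict_fixed
```

If you keep the file output, note that f is real, so an I4 edit descriptor needs an integer value such as nint(f(x,y)). With gfortran, compiling with -g -fcheck=all turns on run-time checks that may report the offending line directly.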

Related

Can I add a watch to a COMMON block?

I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:
subroutine badsub(ainput)
implicit double precision (a-h,o-z)
include 'commonincludes2.h'
x=dsqrt((r(6)-r(8))**2+(z(6)-z(8))**2)
y=ainput
w=y+x
v=2./dlog(dsqrt(w/y))
This code hits a divide by zero on the last line: x is zero, so w equals y, and thus dlog(dsqrt(1.0)) is zero.
The include file looks something like this:
common /cblk/ r(12),z(12),otherstuff
There are actually 3 include headers with a /cblk/ declaration, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the section of memory corresponding to r and z is named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:
common /cblk/ x(12),y(12),otherstuff
My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see any place where the variables are written to.
If I inspect the actual values in r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), which is what causes the issue.
I need to find where z and r get written, but I can't find any instruction in the gdb documentation for attaching a watchpoint to a COMMON block. How can I find where these are written to?
I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops due to my divide-by-zero error, I'm able to find the memory address of (in this example) r(8), which is global in scope and shouldn't change on subsequent runs. I can then re-run the code with a watchpoint on that address and it will flag when the value changes anywhere in the code.
In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:
Reading symbols from myprogram...
(gdb) r
Starting program: ************
Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109 v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.
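The aliasing described in the question, where the same /cblk/ storage is called r and z in one header and x and y in another, can be sketched as follows. This is only an illustration under assumed names (setter, reader, and storage are hypothetical, and otherstuff is left out); it shows why a text search for r or z can miss a write that a watchpoint on the address still catches.

```fortran
subroutine setter()
  implicit double precision (a-h,o-z)
  common /cblk/ x(12), y(12)      ! layout as in "commonincludes.h"
  x(8) = 1.5d0                    ! never mentions r, yet this writes r(8)
end subroutine setter

subroutine reader()
  implicit double precision (a-h,o-z)
  common /cblk/ r(12), z(12)      ! layout as in "commonincludes2.h"
  print '(a,f4.1)', 'r(8) = ', r(8)
end subroutine reader

program common_alias
  implicit none
  double precision :: storage(24)
  common /cblk/ storage           ! a third view of the same 24 values
  storage = 0.0d0
  call setter()                   ! writes through the name x
  call reader()                   ! reads the same memory through the name r
end program common_alias
```

A block copy into the whole common block (as the memmove frame in the backtrace suggests) behaves the same way: it never names r or z either.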

Maximum elements to loop over in fortran in linux and windows [duplicate]

I am writing some parallel Fortran 90/95 code and I just came across something I can't understand.
I work on a Toshiba laptop with 6 GB of RAM.
In Windows 10, I use Code::Blocks. I have imported gfortran from MinGW as a compiler and compile my code with the -fopenmp flag.
I have Ubuntu 18.04 inside VirtualBox. I let it use half of my RAM, that is 3 GB. I compile my code on this one using gfortran -fopenmp as well.
A minimal version of the code causing the issue is:
program main
implicit none
integer :: i
integer, parameter :: n=500000
real, dimension(n) :: A, B
real :: som
som=0
do i =1, n
A(i)= 1.0
B(i)= 2.0
end do
do i=1, n
som = som + A(i)*B(i)
end do
print *,"somme:", som
end program main
I then varied the value of the parameter n.
Running on Windows: for n up to approximately 200,000 everything is fine. Above that, I get "Process returned -1073741571 (0xC00000FD)".
Running on Ubuntu, I can go up to 1,000,000 with no issue. The barrier seems to be around 2,000,000, after which I get a segfault.
My question is: how can one explain that Ubuntu, in spite of having far less memory available, can handle 10 times more iterations?
Is there anything I can do on the Windows side to make it able to handle more loop iterations?
Following Rodrigo Rodrigues's comment, I added one more flag to my compiler settings:
-fmax-stack-var-size=65535
The documentation says the default is 32767, but I assume the setting differs between Code::Blocks and Ubuntu's native gfortran.
program main
implicit none
real,allocatable :: A(:),B(:)
real :: som
integer :: i
integer, parameter :: n=500000
allocate(A(n))
allocate(B(n))
som=0
do i =1, n
A(i)= 1.0
B(i)= 2.0
end do
do i=1, n
som = som + A(i)*B(i)
end do
print *,"somme:", som
end program main
Replacing
real, dimension(n) :: A, B
with
real,allocatable :: A(:),B(:)
allocate(A(n))
allocate(B(n))
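solved the problem for me; please check. The exit code 0xC00000FD is the Windows status for a stack overflow: the fixed-size arrays A and B were being placed on the stack, whose size limit differs between the two systems, whereas allocatable arrays are allocated on the heap.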

Missing print-out for MPI root process, after its handling data reading alone

I'm writing a project that first designates the root process to read a large data file and do some calculations, and then broadcasts the calculated results to all other processes. Here is my code: (1) it reads random numbers from a txt file with n_sample=30000, (2) generates the dens_ent matrix by some rule, (3) broadcasts it to the other processes. By the way, I'm using OpenMPI with gfortran.
IF (myid==0) THEN
OPEN(UNIT=8,FILE='rnseed_ent20.txt')
DO i=1,n_sample
DO j=1,3
READ(8,*) rn(i,j)
END DO
END DO
CLOSE(8)
END IF
dens_ent=0.0d0
DO i=1,n_sample
IF (myid==0) THEN
!Random draws of productivity and savings
rn_zb=MC_JOINT_SAMPLE((/-0.1d0,mu_b0/),var,rn(i,1:2))
iz=minloc(abs(log(zgrid)-rn_zb(1)),dim=1)
ib=minloc(abs(log(bgrid(1:nb/2))-rn_zb(2)),dim=1) !Find the closest saving grid
CALL SUB2IND(j,(/nb,nm,nk,nxi,nz/),(/ib,1,1,1,iz/))
DO iixi=1,nxi
DO iiz=1,nz
CALL SUB2IND(jj,(/nb,nm,nk,nxi,nz/),(/policybmk_2_statebmk_index(j,:),iixi,iiz/))
dens_ent(jj)=dens_ent(jj)+1.0d0/real(nxi)*markovian(iz,iiz)*merge(1.0d0,0.0d0,vent(j) .GE. -bgrid(ib)+ce)
!Density only recorded if the value of entry is greater than b0+ce
END DO
END DO
END IF
END DO
PRINT *, 'dingdongdingdong',myid
IF (myid==0) dens_ent=dens_ent/real(n_sample)*Mpo
IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent)
PRINT *, 'BLBLALALALALALA',myid
CALL MPI_BCAST(dens_ent,N,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
The problem:
(1) IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent) seems not to be executed, as there is no printout.
(2) I then verified this by adding PRINT *, 'BLBLALALALALALA',myid and similar messages. Again there is no printout for the root process myid=0.
It seems like the root process is not working. How can this be? I'm quite confused. Is it because I'm not using MPI_BARRIER before PRINT *, 'dingdongdingdong',myid?
Is it possible that you are missing the following statement at the very beginning of your code?
CALL MPI_COMM_RANK (MPI_COMM_WORLD, myid, ierr)
IF (ierr /= MPI_SUCCESS) THEN
STOP "MPI_COMM_RANK failed!"
END IF
MPI_COMM_RANK returns in myid (if it succeeds) the identifier of the process within the MPI_COMM_WORLD communicator (i.e. a value between 0 and NP-1, where NP is the total number of MPI ranks).
Thanks for contributions from #cw21 #Harald and #Hristo Iliev.
The failure lies in unit numbering. One reference says:
unit number : This must be present and takes any integer type. Note this ‘number’ identifies the
file and must be unique so if you have more than one file open then you must specify a different
unit number for each file. Avoid using 0,5 or 6 as these UNITs are typically picked to be used by
Fortran as follows.
– Standard Error = 0 : Used to print error messages to the screen.
– Standard In = 5 : Used to read in data from the keyboard.
– Standard Out = 6 : Used to print general output to the screen.
So I changed every unit number i to 1i, which did not work; I then changed them to 10i, and it started to work. Mysteriously, as #Hristo Iliev correctly pointed out, the code should behave properly as long as the unit number is not 0, 5, or 6. I cannot explain why 1i did not work. In any case, the root process is now printing results.
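One way to sidestep hand-picked unit numbers altogether is sketched below. It assumes a Fortran 2008 compiler: the named constants from iso_fortran_env show which units are preconnected on the system at hand, and newunit= lets the runtime choose a free unit. It does not explain why 1i failed in this particular code; the program name and the scratch file are only illustrative.

```fortran
program unit_numbers
  use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
  implicit none
  integer :: u, val

  ! Units the runtime has already connected (commonly 5, 6 and 0).
  print '(a,i0)', 'standard input unit:  ', input_unit
  print '(a,i0)', 'standard output unit: ', output_unit
  print '(a,i0)', 'standard error unit:  ', error_unit

  ! Let the runtime pick a unit that cannot clash with any other open unit.
  open(newunit=u, status='scratch', action='readwrite')
  write(u, *) 42
  rewind(u)
  read(u, *) val
  close(u)

  print '(a,l1)', 'newunit value is negative: ', u < 0
  print '(a,i0)', 'read back: ', val
end program unit_numbers
```

With newunit= the same OPEN statement can be used for every file, so there is no numbering scheme to get wrong.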

How can I debug a Fortran READ/WRITE statement with an implicit DO loop?

The Fortran program I am working is encountering a runtime error when processing an input file.
At line 182 of file ../SOURCE_FILE.f90 (unit = 1, file = 'INPUT_FILE.1')
Fortran runtime error: Bad value during integer read
Looking to line 182 I see a READ statement with an implicit/implied DO loop:
182: READ(IT4, 310 )((IPPRM2(IP,I),IP=1,NP),I=1,16) ! read 6 integers
183: READ(IT4, 320 )((PPARM2(IP,I),IP=1,NP),I=1,14) ! read 5 reals
Format statement:
310 FORMAT(1X,6I12)
When I reach this code in the debugger NP has a value of 2. I has a value of 6, and IP has a value of 67. I think I and IP should be reinitialized in the loop.
My problem is that when I try to step through in the debugger once I get to the READ statement it seems to execute and then throw the error. I'm not sure how to follow it as it reads. I tried stepping into the function, but it seems like that may be a difficult route to take since I am unfamiliar with the gfortran library. The input file looks OK, I think it should be read just fine. This makes me think this READ statement isn't looping as intended.
I am completely new to Fortran and implicit DO loops like this, but from what I can gather line 182 should read in 6 integers according to the format string #310. However, when I arrive NP has a value of 2 which makes me think it will only try to read 2 integers 16 times.
How can I debug this read statement to examine the values read into IPPRM2 as they are read from the file? Will I have to step through the Fortran library?
Any tips that can clear up my confusion regarding these implicit loops would be appreciated!
Thanks!
NOTE: I'm using gfortran/gcc and gdb on Linux.
Is there any reason you need specific formatting on the read? I would use READ(IT4, *) where feasible...
Later versions of gfortran support the Fortran 2008 unlimited format item (see http://fortranwiki.org/fortran/show/Fortran+2008+status)
Then it may be helpful to specify
310 FORMAT(*(1X,6I12))
Or for older compilers
310 FORMAT(1000(1X,6I12))
The variables IP and I are loop indices, so they are reinitialized by the loop. With NP=2 the first statement is going to read a total of 32 integers: NP contributes to determining the list of items to read, while the format determines how they are read. With "1X,6I12" they will be read as 6 integers per line of the input file. When the first 6 of the requested 32 integers have been read from a line/record, Fortran will consider that line/record completed and advance to the next record.
With a format of "1X,6I12" the integers must be precisely arranged in the file. There should be a single blank, then the integers should each be right-justified in fields of 12 columns. If they get out of alignment you could get the wrong value read or a runtime error.
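The interplay between the implied DO list and the format can be tried out in isolation. The sketch below uses hypothetical names and sizes (NP=2 and 6 columns, so 12 items) and an internal file instead of the real input, and adds iostat/iomsg so that a bad field produces a message instead of aborting the program.

```fortran
program implied_do_read
  implicit none
  integer, parameter :: np = 2, ni = 6
  integer :: a(np, ni), ip, i, k, ios
  character(len=73) :: records(2)   ! 1 blank + 6 fields of 12 columns
  character(len=200) :: msg

  ! Build two records of six right-justified integers: 1..6 and 7..12.
  write(records(1), '(1X,6I12)') (k, k = 1, 6)
  write(records(2), '(1X,6I12)') (k, k = 7, 12)

  ! The list asks for np*ni = 12 items; the format supplies 6 per record,
  ! so the read moves on to the second record after the first 6 items.
  read(records, '(1X,6I12)', iostat=ios, iomsg=msg) &
      ((a(ip,i), ip = 1, np), i = 1, ni)
  if (ios /= 0) then
    print *, 'read failed: ', trim(msg)
    error stop
  end if

  ! The inner index ip varies fastest: a(1,1), a(2,1), a(1,2), ...
  print '(a,3(1x,i0))', 'a(1,1) a(2,1) a(1,2):', a(1,1), a(2,1), a(1,2)
  print '(a,i0)', 'a(2,6): ', a(2,6)
end program implied_do_read
```

Adding iostat= and iomsg= to the failing READ in the real program, and printing the array afterwards, is usually easier than stepping through the runtime library in gdb.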

Timing a Fortran multithreaded program

I have a Fortran 90 program calling a multi threaded routine. I would like to time this program from the calling routine. If I use cpu_time(), I end up getting the cpu_time for all the threads (8 in my case) added together and not the actual time it takes for the program to run. The etime() routine seems to do the same. Any idea on how I can time this program (without using a stopwatch)?
Try omp_get_wtime(); see http://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fwtime.html for the signature.
If this is a one-off thing, then I agree with larsmans that using gprof or some other profiler is probably the way to go; but I also agree that it is very handy to have coarser timers in your code for timing different phases of the computation. The best timing information you have is the stuff you actually use, and it's hard to beat stuff that's output every single time you run your code.
Jeremia Wilcock's pointer to omp_get_wtime() is very useful; it's standards compliant, so it should work on any OpenMP compiler. (An earlier version of this answer claimed it only has one-second resolution; that was completely wrong.)
Fortran 90 defines system_clock(), which can also be used with any standards-compliant compiler; the standard doesn't specify a time resolution, but with gfortran it seems to be milliseconds and with ifort microseconds. I usually use it in something like this:
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
And using them:
call tick(calc)
! do big calculation
calctime = tock(calc)
print *,'Timing summary'
print *,'Calc: ', calctime
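A self-contained variant of the same idea is sketched below, assuming a compiler that accepts 64-bit arguments to system_clock (Fortran 2003 and later). It prints wall-clock and CPU time side by side; the loop is only a stand-in for the real multithreaded call, and the program name is illustrative.

```fortran
program wall_clock_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: t0, t1, rate
  real(real64) :: cpu0, cpu1, s
  integer :: i

  call system_clock(t0, rate)     ! wall-clock ticks and ticks per second
  call cpu_time(cpu0)             ! CPU time, summed over threads

  ! Placeholder workload; replace with the call being timed.
  s = 0.0_real64
  do i = 1, 10000000
    s = s + sqrt(real(i, real64))
  end do

  call cpu_time(cpu1)
  call system_clock(t1)

  print *, 'checksum:      ', s
  print *, 'wall time (s): ', real(t1 - t0, real64) / real(rate, real64)
  print *, 'cpu time (s):  ', cpu1 - cpu0
end program wall_clock_demo
```

For a single-threaded loop like this one the two numbers should be close; with 8 busy threads the CPU time can be up to roughly 8 times the wall time, which is the effect described in the question.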

Resources