Parallel Fortran Program Crash - parallel-processing

I have some code that has been optimized and parallelized with OpenMP.
The code works fine with small datasets (tested with 4,000 datapoints) but fails with larger datasets (tested with 12,000 datapoints). I compiled with gfortran (gcc 4.9.3) on Windows 7 under Cygwin.
The program reads data into arrays and calculates some statistics using parameters from an input parameter file:
real,allocatable :: x(:),y(:),z(:),vr(:,:),azm(:),atol(:),
+ bandwh(:),dip(:),dtol(:),bandwd(:)
real*8,allocatable :: sills(:),dis(:),gam(:),hm(:),
+ tm(:),hv(:),tv(:),np(:)
integer,allocatable :: ivtail(:),ivhead(:),ivtype(:)
character*12,allocatable :: names(:)
Here are user defined parameters to run the program
real EPSLON,VERSION
real xlag,xltol,tmin,tmax
integer nd,nlag,ndir,nvarg,isill,test
character outfl*512
This section is related to OpenMP. It declares the reduced variables (reducedVariables) used with OpenMP.
parameter(PI=3.14159265)
real uvxazm(100),uvyazm(100),uvzdec(100),
+ uvhdec(100),csatol(100),csdtol(100)
logical omni
integer threadId,numThreads
real*8,allocatable :: reducedVariables(:,:,:)
integer extractValue
integer i,j,id,ii,il,it,iv,jj
real dx,dy,dz,dxs,dys,dzs,hs,h
integer lagbeg,lagend,ilag
real band,dcazm,dcdec,dxy,gamma,vrh,vrhpr,vrt,vrtpr
real xi,yi,zi
integer liminf,limsup,iinf,isup,k
real xlaginv
The main parallel loop is called here:
c$omp parallel default(firstprivate)
c$omp& shared(x,y,z,reducedVariables,vr)
#ifdef _OPENMP
threadId = int(OMP_get_thread_num())+1
print *,'Thread ',threadId
#else
threadId = 1
#endif
This section recombines the variables:
#ifdef _OPENMP
c$omp barrier
reducedVariables(1,:,threadId)=dis(:)
reducedVariables(2,:,threadId)=gam(:)
reducedVariables(3,:,threadId)=np(:)
reducedVariables(4,:,threadId)=hm(:)
reducedVariables(5,:,threadId)=tm(:)
reducedVariables(6,:,threadId)=hv(:)
reducedVariables(7,:,threadId)=tv(:)
#endif
c$omp end parallel
#ifdef _OPENMP
dis(:)=0.0
gam(:)=0.0
np(:)=0.0
hm(:)=0.0
tm(:)=0.0
hv(:)=0.0
tv(:)=0.0
do ii=1,numThreads
do jj=1,mxdlv
dis(jj) = dis(jj) + reducedVariables(1,jj,ii)
gam(jj) = gam(jj) + reducedVariables(2,jj,ii)
np(jj) = np(jj) + reducedVariables(3,jj,ii)
hm(jj) = hm(jj) + reducedVariables(4,jj,ii)
tm(jj) = tm(jj) + reducedVariables(5,jj,ii)
hv(jj) = hv(jj) + reducedVariables(6,jj,ii)
tv(jj) = tv(jj) + reducedVariables(7,jj,ii)
end do
end do
#endif
I've debugged with gdb and get the following error:
At line 894 of file gamv.fpp Fortran runtime error: Index '13' of
dimension 1 of array 'np' above upper bound of 12
[Thread 9336.0x1494 exited with code 2] [Thread 9336.0x22f4 exited
with code 2] [Inferior 1 (process 9336) exited with code 02]
The offending section of code:
if(it.eq.1.or.it.eq.5.or.it.ge.9) then
do il=lagbeg,lagend
ii = (id-1)*nvarg*(nlag+2)+(iv-1)*(nlag+2)+il
np(ii) = np(ii) + 1.
dis(ii) = dis(ii) + dble(h)
tm(ii) = tm(ii) + dble(vrt)
hm(ii) = hm(ii) + dble(vrh)
gam(ii) = gam(ii) + dble((vrh-vrt)*(vrh-vrt))
end do
I don't see anywhere that array np was defined as having an upper bound of 12.
Is this an issue with using dynamic arrays with OpenMP?
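For reference, the flat index computed in the offending loop can be checked with a small Python sketch. It assumes, although the question does not show this, that id runs from 1 to ndir, iv from 1 to nvarg, and il from 1 to nlag+2. The helper names and the parameter values are hypothetical. Under those assumptions, an upper bound of 12 would be consistent with, for example, one direction, one variogram type, and ten lags, and any il beyond nlag+2 would index past the end of the array.

```python
def flat_index(idir, iv, il, nvarg, nlag):
    """1-based flat index from the loop: (id-1)*nvarg*(nlag+2) + (iv-1)*(nlag+2) + il."""
    return (idir - 1) * nvarg * (nlag + 2) + (iv - 1) * (nlag + 2) + il


def required_length(ndir, nvarg, nlag):
    """Largest reachable index, assuming id <= ndir, iv <= nvarg and il <= nlag+2."""
    return flat_index(ndir, nvarg, nlag + 2, nvarg, nlag)


# Hypothetical parameters: one direction, one variogram type, ten lags.
print(required_length(ndir=1, nvarg=1, nlag=10))  # 12
print(flat_index(1, 1, 13, nvarg=1, nlag=10))     # 13, one past that length
```

If the allocated length of np does not match the value these parameters imply, that mismatch would be the first thing to check.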

Related

OpenMP parameter sweep parallel

I am new to OpenMP. I want to solve a stiff ODE system for a range of parameter values using parallel do loops, with the Fortran code given below. However, I do not know whether calling a stiff solver (as a subroutine) inside a parallel do loop is allowed. Also, I want to write the time series data to files with names such as "r_value_s__value.txt" in the subroutine before it returns to the main program. Can anyone help? Below are the code and the error. I compiled with gfortran using the -fopenmp flag.
PROGRAM OPENMP_PARALLEL_STIFF
USE omp_lib
IMPLICIT NONE
INTEGER :: I, J
INTEGER, PARAMETER :: RTOT=10, STOT=15
INTEGER :: TID
INTEGER, PARAMETER :: NUM_THREADS=8
DOUBLE PRECISION :: T_INITIAL, T_FINAL
CALL OMP_SET_NUM_THREADS(NUM_THREADS)
CALL CPU_TIME(T_INITIAL)
PRINT*, "TIME INITIAL ",T_INITIAL
!$OMP PARALLEL DO PRIVATE(I,J,TID)
DO I=1,RTOT
DO J=1,STOT
TID=OMP_GET_THREAD_NUM()
CALL STIFF_DRIVER(TID,I,J,RTOT,STOT)
END DO
END DO
!$OMP END PARALLEL DO
CALL CPU_TIME(T_FINAL)
PRINT*, "TIME FINAL ",T_FINAL
PRINT*, "TIME ELAPSED ",(T_FINAL-T_INITIAL)/NUM_THREADS
END PROGRAM OPENMP_PARALLEL_STIFF
SUBROUTINE STIFF_DRIVER(TID,II,JJ,RTOT,STOT)
USE USEFUL_PARAMETERS_N_FUNC
USE DVODE_F90_M
! Type declarations:
IMPLICIT NONE
! Number of odes for the problem:
INTEGER :: SERIAL_NUMBER, TID
INTEGER :: II, JJ, RTOT, STOT, IND
INTEGER :: J, NTOUT
INTEGER :: ITASK, ISTATE, ISTATS, I
! parameters : declaration
DOUBLE PRECISION, PARAMETER :: s0=0.450D0, dr=1.0D-4, ds=1.0D-2
DOUBLE PRECISION, DIMENSION(NEQ) :: Y, YOUT
DOUBLE PRECISION :: ATOL, RTOL, RSTATS, T, TOUT, EPS, TFINAL, DELTAT
DIMENSION :: RSTATS(22), ISTATS(31)
DOUBLE PRECISION :: bb, cc, ba, ba1, eta
CHARACTER(len=45) :: filename
TYPE (VODE_OPTS) :: OPTIONS
SERIAL_NUMBER=3011+II+(JJ-1)*RTOT
IND=TID+3011+II+(JJ-1)*RTOT
WRITE (*,12)SERIAL_NUMBER,TID
12 FORMAT ("SL. NO. ",I5," THREAD NO.",I3)
r=(II-1)*dr
s=s0+JJ*ds
EPS = 1.0D-9
! Open the output file:
WRITE (filename,93)r,s
93 FORMAT ("r_",f6.4,"_s_",f4.2,".txt")
OPEN (UNIT=IND,FILE=filename,STATUS='UNKNOWN',ACTION='WRITE')
! Parameters for the stiff ODE system
q0 = 0.60D0; v = 3.0D0
Va = 20.0D-4; Vs = 1.0D-1
e1 = 1.0D-1; e2 = 1.10D-5; e3 = 2.3D-3; e4=3.0D-4
del = 1.7D-4; mu = 5.9D-4
al = 1.70D-4; be = 8.9D-4; ga = 2.5D-1
! S and r dependent parameters
e1s = e1/s; e2s = e2/(s**2); e3s = e3/s; e4s = e4/s
dels = del*s; rs = r*s
e1v = e1/v; e2v = e2/(v**2); e3v = e3/v; e4v = e4/v
delv = del*v; rv = r*v
! SET INITIAL PARAMETERS for INTEGRATION ROUTINES
T = 0.0D0
TFINAL = 200.0D0
DELTAT = 0.10D0
NTOUT = INT(TFINAL/DELTAT)
RTOL = EPS
ATOL = EPS
ITASK = 1
ISTATE = 1
! Set the initial conditions: USING MODULE USEFUL_PARAMETERS_N_FUNC
CALL Y_INITIAL(NEQ,Y)
! Set the VODE_F90 options:
OPTIONS = SET_OPTS(DENSE_J=.TRUE.,USER_SUPPLIED_JACOBIAN=.FALSE., &
RELERR=RTOL,ABSERR=ATOL,MXSTEP=100000)
! Integration:
DO I=1,NTOUT
TOUT = (I-1)*DELTAT
CALL DVODE_F90(F_FUNC,NEQ,Y,T,TOUT,ITASK,ISTATE,OPTIONS)
! Stop the integration in case of an error
IF (ISTATE<0) THEN
WRITE (*,*)"ISTATE ", ISTATE
STOP
END IF
! WRITE DATA TO FILE
WRITE (IND,*) TOUT,T, Y(NEQ-2)
END DO
CLOSE(UNIT=IND)
RETURN
END SUBROUTINE STIFF_DRIVER
At line ** of file openmp_parallel_stiff.f90 (unit = 3013)
Fortran runtime error: File already opened in another unit
The issue is the format that you chose: f6.4 for r will overflow for r>=10. The output will then be six asterisks ****** (depending on the compiler) for all values of r>=10 on all threads. The same holds for s, whose f4.2 field also overflows for s>=10.
I would suggest either limiting/checking the range of these values or extending the format to allow more digits.
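The overflow can be illustrated with a rough Python emulation of the Fw.d edit descriptor. This is only a sketch: the helpers fortran_f and filename are hypothetical, they handle the simple non-negative case, and they ignore processor-dependent details such as an optional leading zero.

```python
def fortran_f(value, width, decimals):
    """Roughly emulate Fortran's Fw.d edit descriptor for non-negative values."""
    text = f"{value:.{decimals}f}"
    # Fortran fills the whole field with asterisks when the number does not fit.
    return text.rjust(width) if len(text) <= width else "*" * width


def filename(r, s):
    # Mirrors FORMAT ("r_",f6.4,"_s_",f4.2,".txt") from the question.
    return f"r_{fortran_f(r, 6, 4)}_s_{fortran_f(s, 4, 2)}.txt"


print(filename(0.0003, 0.46))  # r_0.0003_s_0.46.txt
print(filename(10.5, 0.46))    # r_******_s_0.46.txt
print(filename(12.25, 0.46))   # r_******_s_0.46.txt (same name as above)
```

Two different values of r that both overflow produce the same file name, which is the situation in which a second OPEN on that name would fail.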
As @francescalus mentioned, another possibility is hitting a combination of II and JJ for which r and s are identical.
Just for the fun of it - let's do the math:
r=(II-1)*dr
s=s0+JJ*ds
From r=s follows
(II-1)*dr = s0+JJ*ds
or
II = 1 + s0/dr + JJ*ds/dr
Using the constants s0=0.450D0, dr=1.0D-4, ds=1.0D-2 yields
II = 4501 + JJ*100
So, whenever this combination is true for two (or more) threads at a time, you run into the observed issue.
Simple solution for this case: add the thread number to the file name.
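The r = s condition derived above can be double-checked with exact arithmetic. The following Python sketch uses the constants from the question; the function name is hypothetical.

```python
from fractions import Fraction

# Constants copied from STIFF_DRIVER in the question.
s0 = Fraction(45, 100)   # 0.450D0
dr = Fraction(1, 10000)  # 1.0D-4
ds = Fraction(1, 100)    # 1.0D-2


def ii_where_r_equals_s(jj):
    """Return the II for which r = (II-1)*dr equals s = s0 + JJ*ds."""
    return 1 + s0 / dr + jj * ds / dr


for jj in (1, 2, 3):
    print(jj, ii_where_r_equals_s(jj))
```

For JJ = 1, 2, 3 this gives II = 4601, 4701, 4801. These values can be compared against the loop bound RTOT used in the main program to see whether the condition is reachable in a given run.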

A value sent by the host is not returned correctly by the device using CUDA Fortran

I took an example of data transfer between Host and Device for CUDA Fortran and found this:
Host Code:
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
endif
end program incTest
Device Code:
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
The expected outcome is the console printing "Success", but this did not happen. Nothing appears on the screen: no errors and no messages.
This happens because execution never enters the if block: a_d still has the same value it had before the call to the inc subroutine.
I'm using:
OS: Linux - Ubuntu 16
Cuda 8
PGI to compile
Commands to compile:
pgf90 -Mcuda -c Device.cuf
pgf90 -Mcuda -c Host.cuf
pgf90 -Mcuda -o HostDevice Device.o Host.o
./HostDevice
I tried other examples and they did not work either.
I tried plain Fortran (.f90) code with the same compile commands and it works!
How can I fix this problem?
What type of device are you using? (If you don't know, post the output from the "pgaccelinfo" utility).
My best guess is that you have a Pascal based device in which case you need to compile with "-Mcuda=cc60".
For example, if I add error checking to the example code, we see that we get an invalid device kernel error when running on a Pascal without the "cc60" as part of the compilation.
% cat test.cuf
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i, istat
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
istat=cudaDeviceSynchronize()
istat=cudaGetLastError()
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
else
write(*,*) 'Error code:', cudaGetErrorString(istat)
endif
end program incTest
% pgf90 test.cuf -Mcuda
% a.out
Error code:
invalid device function
% pgf90 test.cuf -Mcuda=cc60
% a.out
Success

How to deal with negative numbers that returncode gets from subprocess in Python?

This piece of script in python:
cmd = 'installer.exe --install ...' #this works fine, the ... just represent many arguments
process = subprocess.Popen(cmd)
process.wait()
print(process.returncode)
I think this code works fine; the problem is the value of .returncode.
installer.exe itself is OK. I have run many tests on it, and now I am trying to create a Python script that automates a test which runs installer.exe over many days.
installer.exe returns:
- 0 on success;
- NEGATIVE numbers on failures and errors
installer.exe returns a specific error, -307, but when Python executes print(process.returncode) it shows 4294966989 ... How can I deal with negative numbers in Python so that, in this case, it shows -307?
I am new to Python; the environment is Windows 7 32-bit and Python 3.4.
EDIT: the final working code
The purpose of this code is to run many simple tests:
import subprocess, ctypes, datetime, time
nIndex = 0
while 1==1:
    cmd = 'installer.exe --reinstall -n "THING NAME"'
    process = subprocess.Popen( cmd, stdout=subprocess.PIPE )
    now = datetime.datetime.now()
    # Reinterpret the unsigned 32-bit exit code as a signed 32-bit integer
    ret = ctypes.c_int32( process.wait() ).value
    nIndex = nIndex + 1
    output = str( now ) + ' - ' + str( nIndex ) + ' - ' + 'Ret: ' + str( ret ) + '\n'
    f = open( 'test_result.txt', 'a+' )
    f.write( output )
    f.close()  # was `f.closed`, which only reads an attribute and never closes the file
    print( output )
Using only the standard library:
>>> import struct
>>> struct.unpack('i', struct.pack('I', 4294966989))
(-307,)
Using NumPy: view the unsigned 32-bit int, 4294966989, as a signed 32-bit int:
In [39]: np.uint32(4294966989).view('int32')
Out[39]: -307
To convert positive 32-bit integer to its two's complement negative value:
>>> 4294966989 - (1 << 32) # mod 2**32
-307
As @Harry Johnston said, Windows API functions such as GetExitCodeProcess() use unsigned 32-bit integers, e.g., DWORD, UINT. But errorlevel in cmd.exe is a 32-bit signed integer, and therefore some exit codes (>= 0x80000000) may be shown as negative numbers.
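The subtraction above can be wrapped in a small helper that leaves ordinary non-negative codes untouched. This is a minimal sketch; the name to_signed32 is hypothetical and not part of the subprocess module.

```python
def to_signed32(code):
    """Reinterpret an unsigned 32-bit exit code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers in two's complement.
    return code - (1 << 32) if code >= (1 << 31) else code


print(to_signed32(4294966989))  # -307
print(to_signed32(0))           # 0
```

It could be applied directly to process.returncode after process.wait() returns.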

Kissfftr different results x86 - Atheros AR9331

This is my first question on Stack Overflow, and my English is unfortunately poor, but I want to try.
A customized version of kissfft's twotonetest routine produces very different results on two different systems.
The program compiled with gcc under Ubuntu on x86 produces the correct values. The program cross-compiled with the OpenWrt SDK for the Arduino YUN (Atheros AR9331) displays incorrect values. It seems as if the definition of FIXED_POINT is being ignored.
Defined is:
#define FIXED_POINT 32
the function:
double GetFreqBuf( tBuf * io_pBuf, int nfft)
{
kiss_fftr_cfg cfg = NULL;
kiss_fft_cpx *kout = NULL;
kiss_fft_scalar *tbuf = NULL;
uint32_t ptr;
int i;
double sigpow=0;
double noisepow=0;
long maxrange = SHRT_MAX;
cfg = kiss_fftr_alloc(nfft , 0, NULL, NULL);
tbuf = KISS_FFT_MALLOC(nfft * sizeof(kiss_fft_scalar));
kout = KISS_FFT_MALLOC(nfft * sizeof(kiss_fft_cpx));
/* generate the array from samples*/
for (i = 0; i < nfft; i++) {
// only one channel; a kludge, it would now also work with 2 channels, but this way is faster
if (io_pBuf->IndexNextValue >= (i*2))
ptr = io_pBuf->IndexNextValue - (i*2);
else
ptr = io_pBuf->bufSize - ((i*2) - io_pBuf->IndexNextValue);
tbuf[i] = io_pBuf->aData[ptr] ;
}
kiss_fftr(cfg, tbuf, kout);
for (i=0;i < (nfft/2+1);++i) {
double tmpr = (double)kout[i].r / (double)maxrange;
double tmpi = (double)kout[i].i / (double)maxrange;
double mag2 = tmpr*tmpr + tmpi*tmpi;
if (i!=0 && i!= nfft/2)
mag2 *= 2; /* all bins except DC and Nyquist have symmetric counterparts implied*/
/* if there is power between the frq's, it is signal, otherwise noise*/
if ( i > nfft/96 && i < nfft/32 )
noisepow += mag2;
else
sigpow += mag2;
}
kiss_fft_cleanup();
//printf("TEST %d Werte, noisepow: %f sigpow: %f noise # %fdB\n",nfft,noisepow,sigpow,10*log10(noisepow/sigpow +1e-30) );
free(cfg);
free(tbuf);
free(kout);
return 10*log10(noisepow/sigpow +1e-30);
}
The input is 16-bit audio samples from the same file. Results differ, for example, from -3 dB to -15 dB. Where could I start troubleshooting?
Possibility #1 (most likely)
You are compiling kissfft.c or kiss_fftr.c differently than the calling code. This happens to a lot of people.
An easy way to force the same FIXED_POINT is to edit the kiss_fft.h directly. Another option: verify with some printf debugging. i.e. place the following in various places:
printf( __FILE__ " sees sizeof(kiss_fft_scalar)=%d\n" , (int)sizeof(kiss_fft_scalar) );
Possibility #2
Perhaps the FIXED_POINT=16 code works but the FIXED_POINT=32 code does not because something is being handled incorrectly either inside kissfft or on the platform. The 32 bit fixed code relies on int64_t being implemented correctly.
Is that Atheros a 16 bit processor? I know kissfft has been used successfully on 16 bit platforms, but I'm not sure if FIXED_POINT=32 real FFTs on a 16 bit fixed point has been used.
Good luck,
Mark

Compile errors with Fortran90

All, I've been fighting these errors for hours, here's my code:
program hello
implicit none
integer :: k, n, iterator
integer, dimension(18) :: objectArray
call SetVariablesFromFile()
do iterator = 1, 18
write(*,*) objectArray(iterator)
end do
contains
subroutine SetVariablesFromFile()
IMPLICIT NONE
integer :: status, ierror, i, x
open(UNIT = 1, FILE = 'input.txt', &
ACTION = 'READ',STATUS = 'old', IOSTAT = ierror)
if(ierror /= 0) THEN
write(*, *) "Failed to open input.txt!"
stop
end if
do i = 1, 18
objectArray(i) = read(1, *, IOSTAT = status) x
if (status > 0) then
write(*,*) "Error reading input file"
exit
else if (status < 0) then
write(*,*) "EOF"
exit
end if
end do
close(1)
END subroutine SetVariablesFromFile
end program hello
I'm getting compile errors:
make: *** [hello.o] Error 1
Syntax error in argument list at (1)
I read online that the latter error could be due to a line of code exceeding 132 characters, which doesn't appear to be the problem. I have no idea where to begin with the first error... any help would be much appreciated!
This,
objectArray(i) = read(1, *, IOSTAT = status) x
is not valid Fortran. You need to write it as,
read(1,*,iostat=status) objectArray(i)
With the statement in this corrected form, I get no compiler errors with ifort 12.1 or with gfortran 4.4.3.
