OpenMP FORTRAN issue with privates

In the following code, when I declare the variable "aa" as private, the results are wrong. The code works fine as posted, but when I replace the line
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt) SHARED(bb)
with
!$OMP PARALLEL PRIVATE(aa,iter,y,i,yt) SHARED(bb)
the code no longer works properly.
!!!!!!!! module
module common
use iso_fortran_env
implicit none
integer,parameter:: dp=real64
real(dp):: aa,bb
contains
subroutine evolve(y,yevl)
implicit none
integer(dp),parameter:: id=2
real(dp),intent(in):: y(id)
real(dp),intent(out):: yevl(id)
yevl(1)=y(2)+1.d0-aa*y(1)**2
yevl(2)=bb*y(1)
end subroutine evolve
end module common
use common
implicit none
integer(dp):: iii,iter,i
integer(dp),parameter:: id=2
real(dp),allocatable:: y(:),yt(:)
integer(dp):: OMP_GET_THREAD_NUM, IXD
allocate(y(id)); allocate(yt(id)); y=0.d0; yt=0.d0; bb=0.3d0
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt) SHARED(bb)
IXD=OMP_GET_THREAD_NUM()
!$OMP DO
do iii=1,20000; print*,iii !! EXPECTED THREADS TO BE OF 5000 ITERATIONS EACH
aa=1.d0+dfloat(iii-1)*0.4d0/2000.d0
loop1: do iter=1,10 !! THE INITIAL CONDITION LOOP
call random_number(y)!! RANDOM INITIALIZATION OF THE VARIABLE
loop2: do i=1,70000 !! ITERATION OF THE SYSTEM
call evolve(y,yt)
y=yt
enddo loop2 !! END OF SYSTEM ITERATION
write(IXD+1,*)aa,yt !!! WRITING FILE CORRESPONDING TO EACH THREAD
enddo loop1 !!INITIAL CONDITION ITERATION DONE
enddo
!$OMP ENDDO
!$OMP END PARALLEL
end
What could be the issue? The code works fine when I generate "aa" from "iii", but not when I declare it as a private variable. Thanks in advance for any comments or suggestions.

aa is a module variable. Module variables can either be shared (the default) or threadprivate. Example A.32.2f from the OpenMP standard document illustrates that when module variables are accessed in the dynamic scope of a construct, it is unspecified whether the original variable or the private thread copy is being accessed. This is not the case with threadprivate variables as they are always stored in the thread-local storage, no matter if used inside the lexical scope of a parallel region or not.
There are many scenarios for what happens if you declare a module variable to be private and then access it in a subroutine. What is most likely to happen depends on what kind of analysis the compiler does on the code. Some compilers might detect that the module subroutine is only called inside the parallel region and hence make aa refer to the private copy of each thread. Other compilers might decide to always access the original module variable. On the other hand, if the subroutine gets inlined into the calling subroutine, it might refer to the same aa that is used in the calling context (e.g. the private version if aa is declared private).
Here is an example of how gfortran handles PRIVATE(iii,aa,iter,y,i,yt) at the default optimisation level:
; aa is declared as a global symbol in the BSS section
.globl __common_MOD_aa
.bss
.align 8
.type __common_MOD_aa, #object
.size __common_MOD_aa, 8
__common_MOD_aa:
.zero 8
; Here is how evolve accesses aa
...
movsd __common_MOD_aa(%rip), %xmm2
...
; Here is how the assignment to aa is done inside the parallel region
...
movsd %xmm0, -72(%rbp)
...
The private aa is implemented as an automatic variable and stored on the stack of the thread, while evolve uses the value of aa from the module. Therefore this statement:
aa=1.d0+dfloat(iii-1)*0.4d0/2000.d0
only alters the value of aa inside the thread, while evolve uses the original value of aa from outside the parallel region.
At the high optimisation level -O3 gfortran inlines evolve into the parallel region and...
...
mulsd __common_MOD_aa(%rip), %xmm2
...
The inlined code also refers to the global value of aa in the module, i.e. the behaviour is consistent between the two optimisation levels.
The same applies to Intel Fortran.
The correct approach is to declare aa to be threadprivate and to not put it in a private clause:
module common
use iso_fortran_env
implicit none
integer,parameter:: dp=real64
real(dp):: aa,bb
!$OMP THREADPRIVATE(aa)
...
!$OMP PARALLEL PRIVATE(iii,iter,y,i,yt) SHARED(bb)
IXD=OMP_GET_THREAD_NUM()
!$OMP DO
do iii=1,20000; print*,iii !! EXPECTED THREADS TO BE OF 5000 ITERATIONS EACH
aa=1.d0+dfloat(iii-1)*0.4d0/2000.d0
...
Now both the parallel region and evolve will use a copy of aa that is private to each thread. As access to threadprivate variables is usually slower than access to normal private (stack) variables, on 64-bit x86 systems it might make more sense to pass the value of aa as an argument to evolve instead, as suggested by @Bálint Aradi.
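The threadprivate approach can be checked with a small self-contained program. This is only a sketch under assumptions: the module name params, the function scaled and the mismatch counter nbad are made up for illustration and are not part of the original code. Each iteration stores a value in the module variable and then verifies that the called function saw the same value.

```fortran
module params
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: aa            ! module variable read by the callee
  !$omp threadprivate(aa)   ! every thread keeps its own persistent copy
contains
  function scaled(x) result(r)
    real(dp), intent(in) :: x
    real(dp) :: r
    r = aa * x              ! reads the calling thread's copy of aa
  end function scaled
end module params

program threadprivate_demo
  use params
  implicit none
  integer :: i, nbad

  nbad = 0
  ! aa appears in no data-sharing clause: it is threadprivate already.
  !$omp parallel do reduction(+:nbad)
  do i = 1, 1000
    aa = real(i, dp)
    ! If scaled() saw another thread's aa, the product would differ.
    if (scaled(2.0_dp) /= 2.0_dp * real(i, dp)) nbad = nbad + 1
  end do
  !$omp end parallel do

  print *, 'mismatches:', nbad
end program threadprivate_demo
```

Compiled with OpenMP enabled (for example gfortran -fopenmp), the program should report zero mismatches. Removing the threadprivate directive makes aa shared again, so the assignment becomes a data race and mismatches may appear.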

You should try to carefully analyze your variables, especially to think about which of them would have different values on the different threads at the same time, as those must be declared OMP private. In your case, both variables aa and iii must be OMP private. Variable iii because it is a counter in a loop which is distributed over the threads, and aa because it gets a value which depends on iii.
EDIT: As each thread calls the evolve subroutine itself and evolve is supposed to use the thread specific value of aa (I guess), you should also pass aa to your subroutine instead of using the module variable aa.
The routine should look like:
subroutine evolve(y, aa, yevl)
integer(dp),parameter:: id=2
real(dp),intent(in):: y(id), aa
real(dp),intent(out):: yevl(id)
yevl(1)=y(2)+1.d0-aa*y(1)**2
yevl(2)=bb*y(1)
end subroutine evolve
and the corresponding call in your main program:
call evolve(y, aa, yt)

Related

Why gdb command "info locals" also print undeclared variable?

int a = 10;
if(a >= 5)
printf("Hello World");
int b;
b = 3;
For example, I run "info locals" before executing line 4 ("int b;"), but gdb prints information for both a and b. Why does gdb work like this, and how can I print only the variables that have already been declared?

It depends on the way the compiler decided to compile your code.
In your case, I believe that since variable b is always used in your code, the compiler may have "declared" both a and b together to optimize execution time.
If you really want to understand what happened, disassemble the program and see the assembly that actually runs when you execute this code.
I assume you will discover that the instruction that allocates the space on the stack frame for the variable b allocated the space for both variables (a & b) at the same time.

gdb shows the variable b because it has already been declared by your compiler.
Assuming this code is inside a function, once execution enters the function, space for its local variables is allocated on the stack; this is where the values of a and b reside. That's because the compiler reads your code and makes all declarations at the beginning, even if they are not declared at the top of your function.
Take a look at How the local variable stored in stack

Reduction of output array dimension in Fortran77 procedure

I am working on a large Fortran code, where parts are written in FORTRAN77.
There is a piece of code, which causes debugger to raise errors like:
Fortran runtime error:
Index '2' of dimension 1 of array 'trigs' above upper bound of 1
but when compiled without debugging options runs and does not crash the program. Debugging options used:
-g -ggdb -w -fstack-check -fbounds-check\
-fdec -fmem-report -fstack-usage
The logic of the problematic piece of code is following: in file variables.cmn I declare
implicit none
integer factors,n
real*8 triggers
parameter (n=32)
common /fft/ factors(19), triggers(6*n)
Variables factors and triggers are initialized in procedure initialize:
include 'variables.cmn'
...
CALL FFTFAX(n,factors,triggers)
...
FFTFAX is declared in another procedure as:
SUBROUTINE FFTFAX(N,IFAX,TRIGS)
implicit real*8(a-h,o-z)
DIMENSION IFAX(13),TRIGS(1)
CALL FAX (IFAX, N, 3)
CALL FFTRIG (TRIGS, N, 3)
RETURN
END
and lets look at procedure FFTRIG:
SUBROUTINE FFTRIG(TRIGS,N,MODE)
implicit real*8(a-h,o-z)
DIMENSION TRIGS(1)
PI=2.0d0*ASIN(1.0d0)
NN=N/2
DEL=(PI+PI)/dFLOAT(NN)
L=NN+NN
DO 10 I=1,L,2
ANGLE=0.5*FLOAT(I-1)*DEL
TRIGS(I)=COS(ANGLE)
TRIGS(I+1)=SIN(ANGLE)
10 CONTINUE
DEL=0.5*DEL
NH=(NN+1)/2
L=NH+NH
LA=NN+NN
DO 20 I=1,L,2
ANGLE=0.5*FLOAT(I-1)*DEL
TRIGS(LA+I)=COS(ANGLE)
TRIGS(LA+I+1)=SIN(ANGLE)
20 CONTINUE
In both FFTFAX and FFTRIG, the declared bounds of the dummy arguments differ from the actual sizes of the arrays passed in: IFAX is declared with 13 elements while factors has 19, and TRIGS is declared with 1 element while triggers has 6*n.
I printed out TRIGS after calling FFTFAX in no-debugger compilation setup:
trigs: 1.0000000000000000 0.0000000000000000\
0.99144486137381038 0.13052619222005157 0.96592582628906831\
0.25881904510252074 0.92387953251128674 0.38268343236508978\
...
My questions are:
Is notation :
DIMENSION TRIGS(1)
something more than setting bound of an array?
Why is the program even working in no-debugger mode?
Is setting:
DIMENSION TRIGS(*)
a good fix if I want variable trigs be a result of the procedure?

In F77, a declaration such as DIMENSION TRIGS(1), or the same with any other number or with (*), when it applies to a dummy argument of a procedure, only tells the compiler the rank of the array. The storage must be provided by the array that is passed in the call of the subroutine, and F77 normally does not check this!
My recommendation: either use (*) or, better, convert the F77 sources to F90 if necessary (the parts shown would compile without change)
and declare the dimension in the subroutines/procedures using n.
Fortran compilers typically pass arrays by address, i.e. trigs(i) in the subroutine simply
refers to the memory location at the address of trigs(1) + (i-1)*size(real*8).
A more consistent way to write the subroutine code could be:
SUBROUTINE FFTRIG(TRIGS,N,MODE)
! implicit real*8(a-h,o-z)
integer, intent(in) :: n
real(kind=8) :: trigs(6*n)
integer :: mode
! DIMENSION TRIGS(1)
.....
PI=2.0d0*ASIN(1.0d0)
.....
or, giving the compiler less ability to check:
SUBROUTINE FFTRIG(TRIGS,N,MODE)
! implicit real*8(a-h,o-z)
integer, intent(in) :: n
real(kind=8) :: trigs(:)
integer :: mode
! DIMENSION TRIGS(1)
.....
PI=2.0d0*ASIN(1.0d0)
.....

To answer your question, I would change TRIGS(1) to TRIGS(*), only to identify more clearly that array TRIGS does not have its dimension provided. TRIGS(1) is a carry-over from pre-F77 code for expressing this.
Using TRIGS(:) is incorrect, as defining array TRIGS in this way requires any routine calling FFTRIG to have an INTERFACE definition. This change would lead to other errors.
Your question mixes up two things: the run-time checker's need to know the array size, and a syntax that deliberately omits that size. To overcome this, you could pass the declared dimension of TRIGS as an extra argument and use it in the declaration, so that the bounds can be checked. In "debugger" mode, some compilers do provide hidden properties, including the declared size of all arrays.
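Passing the declared size as an extra argument could look like the sketch below. It is an illustration under assumptions, not the original library code: the names fill_trigs and ntrigs are hypothetical, and only the first loop of FFTRIG is reproduced. Because trigs is declared with the explicit size ntrigs, run-time bounds checking has a real upper bound to test against, and no INTERFACE block is needed in the caller.

```fortran
program explicit_size_demo
  implicit none
  integer, parameter :: n = 32
  double precision :: triggers(6*n)

  triggers = 0.0d0
  ! The caller passes the declared size along with the array.
  call fill_trigs(triggers, size(triggers), n)
  print *, triggers(1:4)
end program explicit_size_demo

! External subroutine in the legacy style, but with an explicit-shape
! dummy array instead of TRIGS(1) or TRIGS(*).
subroutine fill_trigs(trigs, ntrigs, n)
  implicit none
  integer, intent(in) :: ntrigs, n
  double precision, intent(inout) :: trigs(ntrigs)
  double precision :: pi, del, angle
  integer :: i, nn

  pi = 2.0d0*asin(1.0d0)
  nn = n/2
  ! Manual guard, useful even when bounds checking is switched off.
  if (2*nn > ntrigs) error stop 'fill_trigs: trigs is too small'
  del = (pi + pi)/dble(nn)
  do i = 1, 2*nn, 2
    angle = 0.5d0*dble(i - 1)*del
    trigs(i)     = cos(angle)
    trigs(i + 1) = sin(angle)
  end do
end subroutine fill_trigs
```

With gfortran, building this with -fbounds-check should now only flag accesses that really fall outside the 6*n elements, rather than every index above 1.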

Writing a parallel for loop

I'm confused as to how to declare which parts of a program are accessible or not from the different workers. On a relatively low level in a program of mine I have a for loop I want to parallelize.
module module_name
[...]
addprocs(3)
totalsum = @parallel (+) for i in 1:large_number
tmp_sum = 0
for j in 1:num
... # calls f1 f2
end
tmp_sum # not sure how to 'return' the result, the examples have a conveniently placed calculation at the end
end
rmprocs([2 3 4])
[...]
end
As I understand it, I'd have to put the @everywhere decorator in front of f1 and f2. But the program fails long before that, with the additional workers complaining that UndefVarError: module_name not defined, and I have no clue how to fix that.
I feel like I've missed something needed for setting up parallel processing. As I understood it, other than writing the actual @parallel part, one needs to addprocs and then add the @everywhere decorator to those functions used inside the loop. Is that really it?
I know pmap is better suited for what I'm doing here but I wanted to get the simpler option to work first (I'd need to pass several arguments the pmap function).

OpenMP calling a function gives wrong results

Hi, I am trying to distribute a do loop over different threads. Inside the do loop I call a function, which in turn calls some subroutines, and I add the result to a total sum. If I put a parallel region around the do loop, it gives random results; however, if I put the function call inside a CRITICAL section, it gives the correct result. But this costs more CPU time and does not improve the speed at all.
I tested with a small test program and checked that my logic is correct. However, in a big program (which I cannot post here) this only works when I enclose the function call in CRITICAL.
Below I give the test program (my test program works and gives the correct result; however, in the big program I see that funb is not evaluated correctly in the different threads unless it is in a CRITICAL section):
sum=0d0
!$OMP PARALLEL PRIVATE(i,j,sum1,xcn,fun)
ithrd=OMP_GET_THREAD_NUM()
!$OMP DO
do i=1,5
sum1=0d0
do j=1,3
xcn=i+j+xx
!$OMP CRITICAL
fun=funb(xnc)
write(*,*)fun
!$OMP END CRITICAL
sum1=sum1+fun
enddo
enddo
!$OMP END DO
!$OMP CRITICAL
sum=sum+sum1
!$OMP END CRITICAL
!$OMP END PARALLEL
write(*,*)sum
If I remove OMP CRITICAL in the big program, I see that different threads get the same values for funb when they should be different. Therefore my understanding is that there is some restriction on functions called in a PARALLEL section. I would be thankful if anybody could clarify the issue.
The function funb given as:
COMPLEX*16 FUNCTION FUNB(ZAA)
IMPLICIT COMPLEX*16 (A-H,O-Z)
real*8 X1,X2
COMMON/ZVAR/ZA
COMMON/XVAR/X1,X2
ZA=ZAA
call myinvini
call myinvc(x2,fout)
funb=fout
RETURN
END
myinvini sets up some data for wl8 and xl8, while myinvc is another subroutine:
subroutine myinvc(x,f2)
complex*16 dir,dirc,sta,ss,ssc,cn,cnc,f2,ff,ffc,func
complex*16 f22,ans
integer igauss,inte,l,m
double precision x,range,phi,w,z,zz,zr
double precision st,st0,zint,xbl,a,b,dli,sli
double precision cpar,zero
double precision xl8,wl8,xl32,wl32
dimension zint(51)
COMMON/iinte/inte
complex*16 cbeta
common /wgauss/ xl8(8),wl8(8),xl32(32),wl32(32)
common /ccpar/ cpar
include 'constants.h'
igauss = 8
zero=0.0d0
range=201.0d0
phi=3.0d0/4.0d0*pi
dir=dcmplx(dcos(phi),dsin(phi))
dirc=dcmplx(dcos(phi),-dsin(phi))
sta=dcmplx(cpar,zero)
st =dexp(dlog(range)/dble(inte))
st0=1.0d0
zint(1)=zero
do 11 l=1,inte
st0 =st0*st
zint(l+1)=st0-1.0d0
11 continue
ss=dcmplx(zero,zero)
ssc=dcmplx(zero,zero)
xbl=dlog(x)
do 23 l=1,inte ! inte=5
a=zint(l)
b=zint(l+1)
dli=(b-a)/2.d0
sli=(b+a)/2.d0
do 24 m=1,igauss
if(igauss.eq. 8) w=wl8(m)
if(igauss.eq.32) w=wl32(m)
if(igauss.eq. 8) zz=xl8(m)
if(igauss.eq.32) zz=xl32(m)
z =dli*zz+sli
cn=sta+z*dir
cnc=sta+z*dirc
ff=func(cn)
ffc=func(cnc)
ss=ss+ff*dir*exp(-xbl*cn)*w*dli
ssc=ssc+ffc*dirc*exp(-xbl*cnc)*w*dli
24 continue
23 continue
f2=(ss+ssc)
return
end

In the absence of a threadprivate directive, common block variables are shared. The function referenced inside the parallel section modifies such a common block variable; this causes a data race and is not permitted by the OpenMP standard.
The code uses implicit typing and implicit specification of the data-sharing attributes for most of the variables referenced in the OpenMP construct. Both are appalling from a coding style perspective. The code as shown has one likely variable spelling mistake, which would probably have been avoided if implicit specifications had been avoided.
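One way to remove such a race while keeping the common block is to mark it threadprivate in every program unit that declares it, and to accumulate the sum with a reduction clause instead of a critical section. The sketch below is a simplified stand-in under assumptions: shifted_square and square_za are hypothetical replacements for funb and myinvc, and only the block name /zvar/ is taken from the question.

```fortran
program common_threadprivate_demo
  implicit none
  integer :: i
  double precision :: total
  double precision, external :: shifted_square

  total = 0.0d0
  ! The reduction gives each thread its own partial sum.
  !$omp parallel do reduction(+:total)
  do i = 1, 100
    total = total + shifted_square(dble(i))
  end do
  !$omp end parallel do
  print *, 'total =', total
end program common_threadprivate_demo

! Stores its argument in the common block, like funb does with ZA.
double precision function shifted_square(x)
  implicit none
  double precision, intent(in) :: x
  double precision :: za
  common /zvar/ za
  !$omp threadprivate(/zvar/)   ! must follow the COMMON statement
  za = x
  call square_za(shifted_square)
end function shifted_square

! Reads the value back from the common block, like the inner routines.
subroutine square_za(res)
  implicit none
  double precision, intent(out) :: res
  double precision :: za
  common /zvar/ za
  !$omp threadprivate(/zvar/)   ! repeated in every unit using /zvar/
  res = za*za
end subroutine square_za
```

The loop sums the squares of 1 to 100, so the printed total should be 338350 for any number of threads. Without the two threadprivate directives, za would be shared and threads could read each other's values, which matches the symptom described in the question.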

Fortran dynamic objects

I am trying to create a subroutine that returns data as a pointer:
I want something like that:
subroutine f(p)
type(tra), pointer p
type(tra), target :: instance
p=>instance
do_work(instance)
end subroutine
Strictly speaking I want to implement analogue of c++ "new" operator.
I want then to use such a subroutine as follows:
subroutine other
type(tra), pointer :: p1,p2
call f(p1)
call f(p2)
end subroutine
The above code may not work, as I suppose "instance" inside f is destroyed after f quits, and the next call of f creates "instance" again in the same place in memory.
In particular, I find that p1 and p2 end up pointing to the same object, but I guess this is compiler-dependent. Is that true?
I think that a possible solution is:
subroutine f(p)
type(tra), pointer p
type(tra), allocatable, target :: instance(:)
p=>instance(1)
do_work(instance(1))
end subroutine
Is this the "official" way of doing things?

Strictly speaking I want to implement analogue of c++ "new" operator.
It is ALLOCATE. The thing you are trying to do should be simply this:
subroutine f(p)
type(tra), pointer :: p
! you can actually leak memory this way! caution required.
if(associated(p)) then
stop "possible memory leak - p was associated"
end if
allocate(p)
call do_work(p)
end subroutine
The above code may not work, as I suppose "instance" inside f is destroyed after f quits, and the next call of f creates "instance" again in the same place in memory.
No, this is not true. Local subroutine variables are usually "allocated" once (and even initialized only once), see e.g. Fortran 90 spec, chapter 14, especially section 14.7.
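A complete, minimal version of the ALLOCATE approach might look like the sketch below. The component id, the module tra_mod and the routine new_tra are assumptions made for illustration; the original type tra and do_work are not shown in the question. Each call allocates a distinct object, so the two pointers do not alias.

```fortran
module tra_mod
  implicit none
  type :: tra
    integer :: id = 0
  end type tra
contains
  ! Plays the role of C++ "new": allocates a fresh object for p.
  subroutine new_tra(p, id)
    type(tra), pointer, intent(out) :: p
    integer, intent(in) :: id
    allocate(p)          ! new heap object on every call
    p%id = id
  end subroutine new_tra
end module tra_mod

program new_demo
  use tra_mod
  implicit none
  type(tra), pointer :: p1 => null(), p2 => null()

  call new_tra(p1, 1)
  call new_tra(p2, 2)
  ! associated(p1, p2) is true only if both point to the same target.
  print *, p1%id, p2%id, associated(p1, p2)

  ! The caller owns the objects and must free them explicitly.
  deallocate(p1)
  deallocate(p2)
end program new_demo
```

The last value printed should be F, since p1 and p2 refer to separately allocated objects. As with new in C++, every allocate needs a matching deallocate, otherwise the memory leaks.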
