Scattering matrix column by column in MPI using Fortran and contiguous datatype - parallel-processing

I'm trying to generate a matrix full of random numbers, with Np rows and mysize columns, where Np is a given number (say, 5) and mysize comes from the number of processes. Each process should then take its own column, of Np elements.
My first idea was to create a vector datatype, but then I realized this can be done with the contiguous datatype. The code runs with no errors, but it prints only some of the numbers that were supposed to be there. I'm fairly sure the problem lies in the type arguments (type_column, MPI_REAL, etc.) and what they should actually be, but I can't figure it out. Anyone care to help?
The code:
program main
  use mpi
  integer :: ierr, myrank, mysize
  integer :: Np=5, type_column
  integer*8, external :: seedgen
  real*8, dimension(:,:), allocatable :: x_init
  real*8, dimension(:), allocatable :: block_x
  character(4) :: rank
  integer, dimension(1) :: new_seed

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, mysize, ierr)

  allocate(block_x(0:Np-1))

  if (myrank==0) then
    allocate(x_init(0:Np-1,0:mysize-1))
    new_seed = seedgen(myrank)
    write(*,*) new_seed !This is to check if the seed changes each time
    call random_seed(put=new_seed)
    call random_number(x_init)
    open(unit=1111, file='./x_tot.txt')
    write(1111,*) x_init
    close(1111) !Up to this point, everything works
  end if

  call MPI_TYPE_CONTIGUOUS(Np, MPI_REAL, type_column, ierr)
  call MPI_TYPE_COMMIT(type_column, ierr)

  call MPI_SCATTER(x_init(:,myrank), Np, MPI_REAL, block_x, Np, type_column, 0, MPI_COMM_WORLD, ierr)

  write(rank,'(I4)') myrank
  write(*,*) 'I, process ', myrank, ' received ', block_x(0:Np-1)

  open(unit=myrank, file='./x_teste_'//trim(adjustl(rank))//'.txt')
  write(myrank,*) block_x
  close(myrank)

  call MPI_TYPE_FREE(type_column, ierr)
  call MPI_FINALIZE(ierr)
end program main

!!! You may ignore what is down below, just a function to generate random seeds.
function seedgen(myrank)
  use iso_fortran_env
  implicit none
  integer(kind=int64) :: seedgen
  integer, intent(IN) :: myrank
  integer :: s

  call system_clock(s)
  seedgen = abs( mod((s*181)*((myrank-83)*359), 104729) )
end function seedgen
The output (after running with 3 processors, thus mysize=3)
I, process 0 received 0.247272870046176
0.386432141459887 8.816221414742263E-316 0.000000000000000E+000
0.000000000000000E+000
I, process 1 received 2.737792569698344E-008
I, process 2 received 5.231628354959370E-002
0.454932876242927 3.330771999738651E-315 0.000000000000000E+000
0.000000000000000E+000
3.352416881721526E+270 5.298272496856962E-315 0.000000000000000E+000
0.000000000000000E+000
The matrix x_tot:
0.247272870046176 0.386432141459887 0.473788032900533
0.239586263133600 0.851724848335892 5.231628354959370E-002
0.454932876242927 0.702720716936168 0.559915585253771
0.605745282251549 0.253298270763062 0.809867899324171
0.590174190311136 0.125210425650182 8.138975171285148E-002
So you can see that some of the numbers are the correct ones; the rest is garbage.

Using MPI_DOUBLE as suggested above and changing the write statements as below worked fine. Thanks again.
open(unit=myrank, file='./x_teste_'//trim(adjustl(rank))//'.txt')
do i=0,Np-1
write(myrank,*) block_x(i)
end do
close(myrank)
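
For completeness, here is a hedged sketch of what the corrected scatter itself can look like. It is an illustration, not the poster's exact final code: it assumes the whole matrix (rather than a single column) is passed as the send buffer on the root, and it uses MPI_DOUBLE_PRECISION, the standard Fortran datatype matching real*8 (MPI_DOUBLE, reported to work above, normally maps to the same 8-byte representation):

  ! Sketch only: one type_column element describes Np contiguous doubles,
  ! i.e. one column of x_init (which is stored column-major).
  call MPI_TYPE_CONTIGUOUS(Np, MPI_DOUBLE_PRECISION, type_column, ierr)
  call MPI_TYPE_COMMIT(type_column, ierr)

  ! Root hands out one column per rank. The send arguments are ignored on
  ! non-root ranks, but the buffer should still be a valid array there
  ! (e.g. allocate x_init with size 0 on the other ranks).
  call MPI_SCATTER(x_init, 1, type_column, &
                   block_x, Np, MPI_DOUBLE_PRECISION, &
                   0, MPI_COMM_WORLD, ierr)

  call MPI_TYPE_FREE(type_column, ierr)

The even simpler variant is to drop the derived type entirely and call MPI_SCATTER with count Np and type MPI_DOUBLE_PRECISION on both sides; the essential point is that the datatype must describe 8-byte reals and must match between the send and receive signatures.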

Related

Runtime error claiming a negative or zero argument to the logarithm function in Box-Mueller algorithm

The following code is a part of a Fortran 90 program that I wrote in Plato IDE:
It is just the Box-Mueller algorithm to generate Gaussian random numbers.
Program brownstep2_single_stage
Integer:: i,j,m,n,countsucc!,a
Real:: dt,D,epsa,r1,r2,w,fptsum,fptdef1,fptdef2
Real,Dimension(0:100002) :: fx !gt
!T=1000.0 and n*dt=T
dt=0.001
m=100000
n=100000
D=1.0
!a=7
w=2
epsa=0.00001
fx(0)=6.0
!gt(0)=0
fptsum=0
countsucc=0
Call random_seed()
Do i=0,m
!Call random_seed(a)
Do j=0,n
Do while (w>=1.0.and.w<0.0)
Call random_number(r1)
Call random_number(r2)
!r=rand()
r1=2.0*r1-1
r2=2.0*r2-1
w=r1*r1+r2*r2
End do
w=sqrt((-2.0*log(w))/w)
r1=r1*w
r2=r2*w
If(mod(j,2)==0) then
w=r1
Else if(mod(j,2)==1) then
w=r2
End if
fx(j+1)=fx(j)+w*sqrt(2.0*D*dt)
If(fx(j+1)<epsa) then
fptsum=fptsum+(j+1)*dt
countsucc=countsucc+1
exit
End if
print *,i,j
End do
End do
fptdef1=fptsum/m
fptdef2=fptsum/countsucc
print *,'The value of fpt by 1st definition is:',fptdef1
print *,'The value of fpt by 2nd definition is:',fptdef2
print *,'The number of successful events is:',countsucc
print *,'The total number of events is:',m
End program brownstep2_single_stage
During compilation, it shows no error, but when run, it shows the following runtime error, claiming a negative or zero argument to the logarithm function.
Runtime error from program:e:\my files\sample2brownstep_gauss.exe
Run-time Error
Error: Negative or zero argument to logarithm routine
BROWNSTEP2_SINGLE_STAGE - in file sample2brownstep_gauss.f90 at line 31 [+02cc]
What should I do to avoid this?
The changed code above still has problems: w is still not set before the do while loop is reached for the first time, yet w is used in the condition. Use an 'infinite' do loop with an exit statement instead. This ensures that at least one attempt at computing w is always made. This would be better:
do
Call random_number(r1)
Call random_number(r2)
r1=2.0*r1-1
r2=2.0*r2-1
w=r1*r1+r2*r2
if (w .lt. 1.0) exit
End do
w=sqrt((-2.0*log(w))/w)
r1=r1*w
r2=r2*w
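
For reference, here is a minimal self-contained sketch of the corrected polar Box-Mueller step; the program wrapper and the names g1, g2 are only for illustration and are not in the original code:

program box_muller_sketch
  implicit none
  real :: r1, r2, w, g1, g2

  call random_seed()
  do
    call random_number(r1)
    call random_number(r2)
    r1 = 2.0*r1 - 1.0
    r2 = 2.0*r2 - 1.0
    w  = r1*r1 + r2*r2
    ! Accept only points strictly inside the unit circle and away from the
    ! origin, so the argument of the logarithm is always positive.
    if (w < 1.0 .and. w > 0.0) exit
  end do
  w  = sqrt((-2.0*log(w))/w)
  g1 = r1*w   ! first Gaussian deviate
  g2 = r2*w   ! second Gaussian deviate
  print *, g1, g2
end program box_muller_sketch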

OpenMP slower than serial Fortran90 code

In my Fortran90 code I have an outer cycle with several nested loops. In order to speed up my code, I tried to use OpenMP on the outer loop, but I have a very strange problem: when I use more than 1 thread, the program runs slower than with OpenMP on 1 thread, which is in turn slower than the original serial program (in terms of wall clock time; I tried with 1, 2, 3 and 4 threads). In all cases, however, I get the right result.
I conducted several tests on my code and eventually noticed that the problem is in one subroutine, because if I comment out the call to that routine my parallel program works as expected, i.e. the bigger the number of threads, the lower the wall clock time.
Now, that routine takes as input 4 vectors, "ks1", "ks2", "ket1", "ket2", and performs the union of "ks1" and "ks2", obtaining "kstot". Then it creates 2 new vectors, "ket1tot" and "ket2tot", where ket1tot(i) is equal to ket1(j) if kstot(i) is equal to ks1(j), otherwise ket1tot(i)=0. The same for ket2tot.
Then, combining the values stored in the vectors "kstot", "ket1tot", "ket2tot", I compute which lines of a vector (matFC) contain the values I need and, by multiplying those values, I obtain the final result (FCtot).
So I turned that routine into a simple program, adding some initial lines in order to mimic the real program. Specifically, I added:
1) a loop (on ii) that mimics the outer loop of the real program I am trying to parallelize;
2) the fact that each thread works on a different file (so I should not have a false-sharing problem);
3) another loop (on k) that mimics calling the routine several times.
Here is the code (the part that constitutes the original subroutine that gives me problems is indicated in the text):
program evaluatefc
#ifdef _OPENMP
use omp_lib
#endif
implicit none
integer::i,ii,j,jj,jjj,k,sizeks1,sizeks2,sizec,sizekstot,NR,NR1,maxnq
integer::line,ierr,fileunit,mythread,nfreqtot
real*8::FCtot,time1,time2
integer,allocatable,dimension(:)::ks1,ket1,ks2,ket2
integer,dimension(:),allocatable::c,kstot,ket1tot,ket2tot
real*8,allocatable,dimension(:)::matFC
character*15,allocatable,dimension(:)::matfileFC
character::fileFC*15
real*4::tstarting,tending
! This program was originally a subroutine
! that takes in input 4 vectors, ks1, ks2, ket1, ket2
!---------------------------------------------------------------------------
! I initialize some values that in the original subroutine were computed by
!the main program
allocate(matfileFC(3),stat=ierr)
matfileFC(1)='filea.dat'
matfileFC(2)='fileb.dat'
matfileFC(3)='filec.dat'
sizeks1=2
sizeks2=2
maxnq=11
allocate(ks1(sizeks1),stat=ierr)
allocate(ket1(sizeks1),stat=ierr)
allocate(ks2(sizeks2),stat=ierr)
allocate(ket2(sizeks2),stat=ierr)
nfreqtot=42
NR1=nfreqtot*(maxnq**2)+nfreqtot
NR=nfreqtot*(maxnq**2)
allocate(matFC(NR),stat=ierr)
!Call two intrinsic to evaluate CPU and wall clock time
call cpu_time(time1)
tstarting=secnds(0.0)
!$OMP PARALLEL DO &
!$OMP DEFAULT(NONE) &
!$OMP firstprivate(sizeks1,sizeks2,maxnq,matfileFC,NR,NR1) &
!$OMP PRIVATE(i,ii,j,jj,k,ierr,mythread,fileunit,c,sizec,line,sizekstot) &
!$OMP PRIVATE(jjj,ket1,ks1,ket1tot,kstot,ket2,ks2,ket2tot,FCtot,matFC,fileFC)
do ii=1,3
#ifdef _OPENMP
mythread=OMP_GET_THREAD_NUM()
#else
mythread=10
#endif
fileFC=matfileFC(ii)
! Read some lines of a given file.
fileunit=50+mythread
open(unit=fileunit,name=fileFC,status='old',form='formatted')
read(fileunit,*)!Do not read first line
jjj=0
do jj=1,NR1-1
if(mod(jj,(maxnq**2+1)).eq.0) then
read(fileunit,*)
else
jjj=jjj+1
read(fileunit,*)j,k,i,matFC(jjj)
! I actually need only the fourth value of the line to be stored
endif
enddo
close(fileunit)
do k=1,10000000
! Again I initialize the abovementioned values that in the actual
! subroutine are computed by the main program
ks1(1)=mod(k,30)+1
ks1(2)=mod(k,30)+2
ks2(1)=mod(k,17)+1
ks2(2)=mod(k,17)+3
ket1(1)=mod(k,2)
ket1(2)=mod(k,3)
ket2(1)=mod(k,5)
ket2(2)=mod(k,7)
sizec=sizeks1+sizeks2
allocate(c(sizec),stat=ierr)
do i=1,sizeks1
c(i)=ks1(i)
enddo
do i=sizeks1+1,sizec
c(i)=ks2(i-sizeks1)
enddo
sizekstot=sizec
do i=1,sizeks1
do j=1,sizeks2
if(ks1(i).eq.ks2(j)) then
sizekstot=sizekstot-1
endif
enddo
enddo
allocate(kstot(sizekstot),stat=ierr)
jjj=1
i=1
jj=0
do i=1,sizec-1
jjj=jjj+1
do j=jjj,sizec
if(c(i).eq.c(j)) then
exit
elseif(c(i).ne.c(j).and.j.eq.sizec) then
jj=jj+1
kstot(jj)=c(i)
endif
enddo
enddo
kstot(sizekstot)=c(sizec)
allocate(ket1tot(sizekstot),stat=ierr)
do i=1,sizekstot
ket1tot(i)=0
enddo
allocate(ket2tot(sizekstot),stat=ierr)
do i=1,sizekstot
ket2tot(i)=0
enddo
do i=1,sizekstot
do j=1,sizeks1
if(kstot(i).eq.ks1(j))then
ket1tot(i)=ket1(j)
endif
enddo
enddo
do i=1,sizekstot
do j=1,sizeks2
if(kstot(i).eq.ks2(j))then
ket2tot(i)=ket2(j)
endif
enddo
enddo
FCtot=1
do i=1,sizekstot
line=(kstot(i)-1)*(maxnq)**2+ket1tot(i)*(maxnq)+ket2tot(i)+1
FCtot=matFC(line)*FCtot
enddo
deallocate(c,stat=ierr)
deallocate(kstot,stat=ierr)
deallocate(ket1tot,stat=ierr)
deallocate(ket2tot,stat=ierr)
enddo
enddo
!$OMP END PARALLEL DO
call cpu_time(time2)
tending=secnds(tstarting)
write(*,*)
write(*,*)'CPU time is:'
write(*,*)time2-time1
write(*,*)
write(*,*)'Wall clock time is:'
write(*,*)tending
end program
Still, I get the same problem, i.e. the wall clock time using 4 threads is greater than using 1 thread.
For example I get (in seconds):
type        Wtime    CPU time
1 thread    20.37    20.37
4 threads   31.26    91.61
serial      19.64    19.64
I am aware that the call to the OMP library introduces an overhead and in fact the 1-thread OMP program is slower than the serial one. But I cannot understand why the 4-thread OMP code is slower.
I use Intel fortran compiler 2013 on Linux.
Any suggestions?
Thank you for any time you can dedicate to this problem.
Ok, I fixed my own problem.
Thank you all for your suggestions, in particular @Jorge Bellón and @High Performance Mark.
As their comments said, the problem was actually the high number of allocations/deallocations. If I move the allocations out of the loops, or at least place them right after the first loop, I get the "normal" OpenMP behaviour, i.e. the bigger the number of threads, the lower the wall clock time.
For the example above the wall clock time using 4 threads is now about 7 seconds.
Thank you all for your help.
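
For illustration, a minimal self-contained sketch of that restructuring (not the full evaluatefc program): the work arrays are allocated once per thread and reused, since sizekstot can never exceed sizeks1+sizeks2:

program hoist_alloc_sketch
  implicit none
  integer :: k, ierr
  integer, parameter :: sizeks1 = 2, sizeks2 = 2
  integer, allocatable, dimension(:) :: c, kstot, ket1tot, ket2tot
  integer :: sizec

  sizec = sizeks1 + sizeks2
  ! One allocation instead of one per iteration; sizekstot in the real code
  ! never exceeds sizec, so the same buffers can simply be reused.
  allocate(c(sizec), kstot(sizec), ket1tot(sizec), ket2tot(sizec), stat=ierr)

  do k = 1, 10000000
     c = 0; kstot = 0; ket1tot = 0; ket2tot = 0   ! reset and reuse, no realloc
     ! ... same loop body as before, working on kstot(1:sizekstot) etc. ...
  end do

  deallocate(c, kstot, ket1tot, ket2tot, stat=ierr)
end program hoist_alloc_sketch

Inside the real parallel region these arrays stay in the PRIVATE clause, so each thread keeps its own copy and the allocator is no longer hit ten million times per thread.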

Send and Receive operations between communicators in MPI

Following my previous question: Unable to implement MPI_Intercomm_create
The problem of MPI_INTERCOMM_CREATE has been solved. But when I try to implement a basic send/receive operation between process 0 of color 0 (globally rank = 0) and process 0 of color 1 (i.e. globally rank = 2), the code just hangs up after printing the received buffer.
the code:
program hello
include 'mpif.h'
implicit none
integer tag,ierr,rank,numtasks,color,new_comm,inter1,inter2
integer sendbuf,recvbuf,tag,stat(MPI_STATUS_SIZE)
tag = 22
sendbuf = 222
call MPI_Init(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,numtasks,ierr)
if (rank < 2) then
color = 0
else
color = 1
end if
call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,rank,new_comm,ierr)
if (color .eq. 0) then
if (rank == 0) print*,' 0 here'
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_Comm_world,2,tag,inter1,ierr)
call mpi_send(sendbuf,1,MPI_INT,2,tag,inter1,ierr)
!local_comm,local leader,peer_comm,remote leader,tag,new,ierr
else if(color .eq. 1) then
if(rank ==2) print*,' 2 here'
call MPI_INTERCOMM_CREATE(new_comm,2,MPI_COMM_WORLD,0,tag,inter2,ierr)
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
print*,recvbuf
end if
end
Communication with intercommunicators is not well understood by most users, and there are fewer examples of it than of other MPI operations. You can find a good explanation by following this link.
Now, there are two things to remember:
1) Communication on an intercommunicator always goes from one group to the other group. When sending, the rank of the destination is its local rank in the remote group. When receiving, the rank of the sender is its local rank in the remote group.
2) Point-to-point communication (the MPI_send and MPI_recv family) is between one sender and one receiver. In your case, everyone in color 0 is sending and everyone in color 1 is receiving; however, if I understood your problem correctly, you want process 0 of color 0 to send something to process 0 of color 1.
The sending code should be something like this:
call MPI_COMM_RANK(inter1,irank,ierr)
if(irank==0)then
call mpi_send(sendbuf,1,MPI_INT,0,tag,inter1,ierr)
end if
The receiving code should look like:
call MPI_COMM_RANK(inter2,irank,ierr)
if(irank==0)then
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
print*,'rec buff = ', recvbuf
end if
In the sample code, there is a new variable irank that I use to query the rank of each process in the intercommunicator; that is the rank of the process in its local group. So you will have two processes of rank 0, one for each group, and so on.
It is important to emphasize what other commenters on your post are saying: these days, use modern constructs like use mpi instead of include 'mpif.h' (see the comment from Vladimir F). Another piece of advice from your previous question was to use rank 0 as remote leader in both cases. If I combine those two ideas, your program can look like:
program hello
use mpi !instead of include 'mpif.h'
implicit none
integer :: tag,ierr,rank,numtasks,color,new_comm,inter1,inter2
integer :: sendbuf,recvbuf,stat(MPI_STATUS_SIZE)
integer :: irank
!
tag = 22
sendbuf = 222
!
call MPI_Init(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,numtasks,ierr)
!
if (rank < 2) then
color = 0
else
color = 1
end if
!
call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,rank,new_comm,ierr)
!
if (color .eq. 0) then
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_Comm_world,2,tag,inter1,ierr)
!
call MPI_COMM_RANK(inter1,irank,ierr)
if(irank==0)then
call mpi_send(sendbuf,1,MPI_INT,0,tag,inter1,ierr)
end if
!
else if(color .eq. 1) then
call MPI_INTERCOMM_CREATE(new_comm,0,MPI_COMM_WORLD,0,tag,inter2,ierr)
call MPI_COMM_RANK(inter2,irank,ierr)
if(irank==0)then
call mpi_recv(recvbuf,1,MPI_INT,0,tag,inter2,stat,ierr)
if(ierr/=MPI_SUCCESS)print*,'Error in rec '
print*,'rec buff = ', recvbuf
end if
end if
!
call MPI_finalize(ierr)
end program hello

Fortran array Syntax error

Basically Fortran is being vague about some error in my array. It says:
Newton_Interpolation_3D.f90:19.132:
0,-0.65364361d0,0.28366220d0, 0.27015114d0, -0.20807342d0, -0.49499625d0, -
Nothing else. I checked my array and it looks fine. Can someone please tell me what's wrong with the xnodes array?
implicit none
double precision, allocatable, dimension(:,:) :: nt
double precision, allocatable, dimension(:) :: znodes, ynodes, xnodes, fval
double precision :: x, evalnewton
integer :: i,n,k
n = 24
allocate(xnodes(0:n), ynodes(0:n), znodes(0:n),fval(0:4) ,nt(0:n, 0:n))
xnodes = (/0.54030228d0 ,-0.41614684d0,-0.98999250d0,-0.65364361d0,0.28366220d0,
0.27015114d0, -0.20807342d0, -0.49499625d0, -0.32682180d0, 0.14183110d0,
0.18010077d0,-0.13871562d0,-0.32999751d0,-0.21788120d0,
9.45540667d0,0.13507557d0,-0.10403671d0,-0.24749812d0,
-0.16341090d0,7.09155500d0,0.10806046d0,-8.32293704d0,
-0.19799851d0,-0.13072872d0,5.67324422d0/)
!ynodes = (/0.84147102,0.90929741,0.14112000,-0.75680250,-0.95892429,0.42073551,0.45464870,7.05600008d-02,-0.37840125,-0.47946215,0.28049034,0.30309916, 4.70400006d-02,-0.25226751,-0.31964144,0.21036775,0.227324353,.52800004d-02,-0.18920062,-0.23973107 , -0.19178486 ,-0.15136050, 0.18185948, 2.82240007d-02 ,0.16829421/)
!znodes = (/ -0.41614693 , -0.65364355 , 0.96017027, -0.14550006, -0.83907157, -0.10403673, -0.16341089, 0.24004257 , -3.63750160d-02, -0.20976789, -4.62385453d-02, -7.26270750d-02, 0.10668559, -1.61666758d-02, -9.32301804d-02, -2.60091834d-02,-4.08527218d-02,6.00106418d-02,-9.09375399d-03,-5.24419732d-02, -1.66458786d-02,-2.61457413d-02,3.84068154d-02,-5.82000241d-03, -3.35628614d-02/)
fval = (/5.63,6.11,8.12,4.33,6.15/)
deallocate(xnodes,ynodes,znodes,nt,fval)
You are missing the free form line continuation sentinel (&) at the end of the lines of statements that continue onto the next line.
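For example, the start of the xnodes assignment with the sentinels added (only the trailing & characters are new; the values are unchanged):

xnodes = (/0.54030228d0, -0.41614684d0, -0.98999250d0, -0.65364361d0, 0.28366220d0, &
           0.27015114d0, -0.20807342d0, -0.49499625d0, -0.32682180d0, 0.14183110d0, &
           0.18010077d0, -0.13871562d0, -0.32999751d0, -0.21788120d0, &
           9.45540667d0, 0.13507557d0, -0.10403671d0, -0.24749812d0, &
           -0.16341090d0, 7.09155500d0, 0.10806046d0, -8.32293704d0, &
           -0.19799851d0, -0.13072872d0, 5.67324422d0/)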

Getting fortran runtime error: end of file

I have recently learned how to work with basic files in Fortran
and I assumed it was as simple as:
open(unit=10,file="data.dat")
read(10,*) some_variable, somevar2
close(10)
So I can't understand why this function I wrote is not working.
It compiles fine but when I run it it prints:
fortran runtime error:end of file
Code:
Function Load_Names()
character(len=30) :: Staff_Name(65)
integer :: i = 1
open(unit=10, file="Staff_Names.txt")
do while(i < 65)
read(10,*) Staff_Name(i)
print*, Staff_Name(i)
i = i + 1
end do
close(10)
end Function Load_Names
I am using Fortran 2008 with gfortran.
A common reason for the error you report is that the program doesn't find the file it is trying to open. Sometimes your assumptions about the directory in which the program looks for files at run-time will be wrong.
Try:
using the err= option in the open statement to write code to deal gracefully with a missing file; without this the program crashes, as you have observed;
or
using the inquire statement to figure out whether the file exists where your program is looking for it.
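A small self-contained sketch combining both suggestions, using the file name from the question; iostat= is shown for the open statement, err=<label> being the older spelling of the same idea:

program open_check_sketch
  implicit none
  logical :: there
  integer :: ios

  ! Suggestion 2: ask whether the file exists where the program is looking.
  inquire(file="Staff_Names.txt", exist=there)
  if (.not. there) then
     print *, "Staff_Names.txt not found in the current working directory"
     stop
  end if

  ! Suggestion 1: let open report failure gracefully instead of crashing.
  open(unit=10, file="Staff_Names.txt", status="old", action="read", iostat=ios)
  if (ios /= 0) then
     print *, "could not open Staff_Names.txt, iostat = ", ios
     stop
  end if

  ! ... read the names as in the function above ...
  close(10)
end program open_check_sketch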
You can check when a file has ended. It is done with the option IOSTAT for read statement.
Try:
Function Load_Names()
character(len=30) :: Staff_Name(65)
integer :: i = 1
integer :: iostat
open(unit=10, file="Staff_Names.txt")
do while(i < 65)
read(10,*, IOSTAT=iostat) Staff_Name(i)
if( iostat < 0 )then
write(6,'(A)') 'Warning: File contains less than 65 entries'
exit
else if( iostat > 0 )then
write(6,'(A)') 'Error: error reading file'
stop
end if
print*, Staff_Name(i)
i = i + 1
end do
close(10)
end Function Load_Names
Using Fortran 2003 standard, one can do the following to check if the end of file is reached:
use :: iso_fortran_env
character(len=1024) :: line
integer :: u1,stat
open (newunit=u1,action='read',file='input.dat',status='old')
ef: do
read(u1,'(A)',iostat=stat) line
if (stat == iostat_end) exit ef ! end of file
...
end do ef
close(u1)
Thanks for all your help i did fix the code:
Function Load_Names(Staff_Name)!Loads Staff Names
character(len=30) :: Staff_Name(65)
integer :: i = 1
open(unit=10, file="Staff_Names.txt", status='old', action='read')!opens file for reading
do while(i < 66)!Sets Set_Name() equal to the file one string at a time
read(10,*,end=100) Staff_Name(i)
i = i + 1
end do
100 close(10)!closes file
return!returns Value
end Function Load_Names
I needed to change read(10,*) to read(10,*,END=100)
so it knew what to do when it came to the end of the file
as it was in a loop I assume.
Then your problem was that your file was a row vector, and it was likely
giving you this error immediately after reading the first element, as @M.S.B. was suggesting.
If you have a file with a NxM matrix and you read it in this way (F77):
DO i=1,N
DO j=1,M
READ(UNIT,*) Matrix(i,j)
ENDDO
ENDDO
it will load the first column of your file into the first row of your matrix and will give you an error as soon as it reaches the end of the file's first column, because the loop forces it to read further lines and there are no more lines (if N<M, when j=N+1 for example). To read the different columns you should use an implied do loop, which is why your solution worked:
DO i=1,N
READ(UNIT,*) (Matrix(i,j), j=1,M)
ENDDO
I am using GNU Fortran 5.4.0 on Ubuntu 16.04. Please check that the file you open is really the one you are looking for, because files with the same name can be confusing and one of them may be blank. Also check whether the file path actually points into your working directory.
