partition a 2D array column-wise and use allgather

I have a Fortran MPI code in which a compute-intensive function is invoked on every element of a 2D array. I'm trying to split the work among the ranks. For example, if there are 30 columns and 10 ranks, then each rank gets 3 columns. The following code does this split and gathers the results using allgather, but the final array doesn't have the values from all ranks.
program allgather
include 'mpif.h'
!create a 2 x 30 myarray
integer :: x=2,y=30
integer :: numprocs,myid
integer :: i,j,k,myelements,mycolumns,jb,je
integer*4,dimension(:),allocatable :: displacement,recvcnt
real :: checksum
real,dimension(:,:),allocatable :: myarr,combinedarr
call MPI_INIT(IERR)
call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
mycolumns = y/numprocs
myelements = x * mycolumns
allocate(displacement(numprocs),recvcnt(numprocs))
jb = 1 + ( myid * mycolumns )
je = ( myid + 1 ) * mycolumns
allocate(myarr(x,mycolumns))
allocate(combinedarr(x,y))
myarr(:,:) =0
do j=jb,je
do i=1,x
myarr(i,j) = 1
enddo
enddo
!myarr(:,:)=1
if(mod(y,numprocs) > 0) then
if(myid==numprocs-1) then
jb=(myid + 1) * mycolumns + 1
do j=jb,y
do i=1,x
myarr(i,j) = 1
enddo
enddo
endif
endif
combinedarr(:,:) =0
recvcnt(:)=myelements
do k=1,numprocs
displacement(k) = (k-1) *myelements
enddo
call MPI_ALLGATHERV(myarr,myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
if(mod(y,numprocs) > 0) then
recvcnt(:) = 0
recvcnt(numprocs) = (x*y) - myelements * (numprocs)
displacement(numprocs) = displacement(numprocs) + myelements
call MPI_ALLGATHERV(myarr,recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
endif
if (myid==0) then
checksum=0
write(6,*) "mycolumns:",mycolumns,"myelements:",myelements
do j=1,y
do i=1,x
checksum = checksum + combinedarr(i,j)
enddo
enddo
write(6,*) checksum
endif
end

First of all, you are using MPI_ALLGATHERV() just like MPI_ALLGATHER() and get no benefit from its ability to send a different number of elements from/to each process. But that's not the error in your program. The error lies in the way it fills myarr. You allocate it as myarr(x,mycolumns), but when filling it from column jb to column je you go past the end of the array in every process except rank 0, since jb and je are greater than mycolumns there. Thus myarr contains ones only on rank 0 and zeroes on all other ranks. So yes, the final array does not have the values that you expect, but that's because you filled it wrong, not because of the way the MPI subroutines are used.
Writing past the end of an allocatable array destroys the hidden structures that are used to manage heap allocation and usually crashes the program. In your case you were just lucky that it did not crash for you: when I ran your code with Open MPI, it crashed with core dumps every time.
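For example, keeping myarr at its local size (x,mycolumns), the fill should run over local column indices only. Here is a sketch in terms of the variables you have already declared (f() is just a placeholder for your real per-element computation, not an actual function in your code):

  ! local column j corresponds to global column jb + j - 1
  do j = 1, mycolumns
     do i = 1, x
        myarr(i,j) = 1          ! or f(i, jb + j - 1) for the real computation
     end do
  end do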
And you are also missing a call to MPI_FINALIZE() at the end of your code.
Hint: use the Fortran 90 interface if available - replace include 'mpif.h' with use mpi

Here is the final version of the code. I have implemented the fixes suggested by Hristo Iliev and also fixed the case where the number of ranks does not evenly divide the number of columns; here the last rank does the computation on the leftover columns.
program allgather
include 'mpif.h'
!create a 4 x 6 myarray
integer :: x=4,y=6
integer :: numprocs,myid
integer :: i,j,k,myelements,mycolumns,jb,je,jbb
integer*4,dimension(:),allocatable :: displacement,recvcnt
real :: checksum
real,dimension(:,:),allocatable :: myarr,combinedarr
call MPI_INIT(IERR)
call MPI_COMM_SIZE(MPI_COMM_WORLD,NUMPROCS,IERR)
call MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
mycolumns = y/numprocs
myelements = x * mycolumns
allocate(displacement(numprocs),recvcnt(numprocs))
jb = 1 + ( myid * mycolumns )
je = ( myid + 1 ) * mycolumns
allocate(myarr(x,y))
allocate(combinedarr(x,y))
myarr(:,:) =0
do j=jb,je
do i=1,x
myarr(i,j) = (j-1) * x + i
enddo
enddo
if(mod(y,numprocs) > 0) then
if(myid==numprocs-1) then
jbb=(myid + 1) * mycolumns + 1
do j=jbb,y
do i=1,x
myarr(i,j) = (j-1) * x + i
enddo
enddo
endif
endif
combinedarr(:,:) =0
recvcnt(:)=myelements
do k=1,numprocs
displacement(k) = (k-1) *myelements
enddo
call MPI_ALLGATHERV(myarr(1,jb),myelements,MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
if(mod(y,numprocs) > 0) then
recvcnt(:) = 0
recvcnt(numprocs) = (x*y) - myelements * (numprocs)
displacement(numprocs) = displacement(numprocs) + myelements
call MPI_ALLGATHERV(myarr(1,jbb),recvcnt(numprocs),MPI_REAL,combinedarr,recvcnt,displacement,MPI_REAL,MPI_COMM_WORLD,IERR)
endif
if (myid==0) then
checksum=0
write(6,*) "mycolumns:",mycolumns,"myelements:",myelements
do j=1,y
do i=1,x
checksum = checksum + combinedarr(i,j)
enddo
enddo
write(6,*) checksum
endif
end
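For completeness, the leftover columns can also be handled in a single MPI_ALLGATHERV call by giving every rank its own count and displacement, which is what the routine is designed for. Below is a stand-alone sketch of that variant (it is not the code I am actually running; it also uses the use mpi interface and calls MPI_FINALIZE as suggested above). The checksum printed by rank 0 should equal 1+2+...+x*y.

program allgather_uneven
  use mpi
  implicit none
  integer, parameter :: x = 2, y = 30
  integer :: numprocs, myid, ierr, k, i, j, jb, mycols
  integer, allocatable :: recvcnt(:), displacement(:)
  real, allocatable :: myarr(:,:), combinedarr(:,:)

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

  allocate(recvcnt(numprocs), displacement(numprocs))
  ! every rank gets y/numprocs columns; the last rank also takes the remainder
  do k = 1, numprocs
     mycols = y / numprocs
     if (k == numprocs) mycols = mycols + mod(y, numprocs)
     recvcnt(k) = x * mycols
  end do
  displacement(1) = 0
  do k = 2, numprocs
     displacement(k) = displacement(k-1) + recvcnt(k-1)
  end do

  mycols = recvcnt(myid+1) / x        ! columns owned by this rank
  jb = myid * (y / numprocs)          ! columns owned by the ranks before me
  allocate(myarr(x, mycols), combinedarr(x, y))
  do j = 1, mycols
     do i = 1, x
        myarr(i,j) = real((jb + j - 1) * x + i)   ! stand-in for the real computation
     end do
  end do

  call MPI_ALLGATHERV(myarr, recvcnt(myid+1), MPI_REAL, &
                      combinedarr, recvcnt, displacement, MPI_REAL, &
                      MPI_COMM_WORLD, ierr)
  if (myid == 0) write(6,*) 'checksum:', sum(combinedarr)
  call MPI_FINALIZE(ierr)
end program allgather_uneven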

Related

OpenMP on Fortran 90 takes almost the same time (if not more) as the non-parallelized program

I'm trying to parallelize a simulation of a 2D Ising model to get some expectation values as a function of the temperature of the system. For L=48, the single-threaded version takes about 240 seconds to run 20 temperatures with 1 seed each, but the parallelized version takes about 268 seconds, which is roughly the same.
Per seed and per temperature, that works out to 12 seconds for the single-threaded version and 13.4 seconds for the parallelized one. I'm looking for help with my code because I don't understand these durations. I expected the parallelized version to spread the temperatures over all the threads, so with 20 temperatures on 16 threads each thread would get at most 2 of them and the run should take about 30 seconds to complete.
I need to run the simulation for 50 temperatures and 200 seeds each, for 5 values of L. It would be helpful to reduce the compute time, because otherwise it could take 20 hours for L=48 and some days for L=72.
I'm using an i7-10700KF (8 cores, 16 logical threads).
program Ising
use omp_lib
implicit none
integer L, seed, i, j, seed0, nseed,k
parameter (L=48)
integer s(1:L, 1:L)
integer*4 pbc(0:L+1), mctot, N, mcd, mcini, difE
real*8 genrand_real2, magne, energ, energia, temp, temp1, DE
real*8 mag, w(-8:8)
real*8 start, finish
real*8 sum, sume, sume2, summ, summ2, sumam, vare, varm, maxcv, maxx
real*8 cv, x, Tmaxcv, Tmaxx
integer irand, jrand
11 format(10(f20.6))
! Initialize variables
mctot = 80000
mcd = 20
mcini = 8000
N = L*L
seed0 = 20347880
nseed = 20
maxcv=0.d0
maxx=0.d0
! Initialize vector pbc
pbc(0) = L
pbc(L+1) = 1
do i = 1, L
pbc(i) = i
end do
! Initialize matrix s with random values
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
! Metropolis algorithm
open(1, file='Expectation values.dat')
start = omp_get_wtime()
write(1,*) '#Temp, ','E, ','E2, ','M, ','M2, ','|M|, ','VarE, ','VarM, ',&
'Cv, ','X, '
!Start loop to calculate for different temperatures
!$OMP PARALLEL PRIVATE(s,seed,w,energia,difE,irand,jrand,temp,mag,sum,sume,sume2,summ,summ2,sumam,vare,varm,cv,x)
temp1 = 1.59d0
!$OMP DO ordered schedule(dynamic)
do k = 1, 10
temp = temp1 + (0.01d0*k)
!Define the matrix w, which contains the values of the Boltzmann function for each temperature, so as not to have to calculate them each iteration
do i = -8, 8
w(i) = dexp(-i/temp)
end do
write(*,*) "Temperature: ", temp, "Thread", omp_get_thread_num()
sum = 0.d0
sume = 0.d0
sume2 = 0.d0
summ = 0.d0
summ2 = 0.d0
sumam = 0.d0
do seed = seed0, seed0 + nseed-1, 1
call init_genrand(seed)
call reinicia(s,l)
energia = energ(s,l,pbc)
do i = 1, mctot
do j = 1, N
irand = int(genrand_real2()*L) + 1
jrand = int(genrand_real2()*L) + 1
difE = int(DE(s,l,irand,jrand,pbc))
if (difE < 0) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
else if (genrand_real2() < w(int(difE))) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
endif
end do
if ((i > mcini).and.(mcd*(i/mcd)==i)) then
mag= magne(s,l)
sum = sum + 1.d0
sume = sume + energia
sume2 = sume2 + energia**2
summ = summ + mag
summ2 = summ2 + mag**2
sumam = sumam + abs(mag)
endif
end do
end do
!Energy
sume=sume/(sum*N)
sume2=sume2/(sum*N*N)
!Magnetitzation
summ = summ/(sum*N)
sumam=sumam/(sum*N)
summ2=summ2/(sum*N*N)
!Variances
vare = dsqrt(sume2-sume*sume)/dsqrt(sum)
varm = dsqrt(summ2-summ*summ)/dsqrt(sum)
!Cv
cv = (N*(sume2-sume*sume))/temp**2
if (cv.gt.maxcv) then
maxcv=cv
Tmaxcv=temp
endif
!X
x = (N*(summ2-summ*summ))/temp
if (x.gt.maxx) then
maxx=x
Tmaxx=temp
endif
write(1,11) temp,sume,sume2,summ,summ2,sumam,vare,varm,cv,x
end do
!$OMP END DO
!$OMP END PARALLEL
finish = omp_get_wtime()
close(1)
print*, "Time: ",(finish-start),"Seconds"
end program Ising
! Functions
!Function that calculates the energy of the matrix s
real*8 function energ(S,L, pbc)
implicit none
integer s(1:L, 1:L), i, j, L
integer*4 pbc(0:L+1)
real*8 ene
ene = 0.0d0
do i = 1, L
do j = 1, L
ene = ene - s(i,j) * s(pbc(i+1),j) - s(i,j) * s(i,pbc(j+1))
end do
end do
energ = ene
return
end function energ
!Function that calculates the difference in energy that occurs when the spin of position (i, j) is changed
real*8 function DE(S,L,i,j,pbc)
implicit none
integer s(1:L, 1:L), i, j, L, difE
integer*4 pbc(0:L+1)
real*8 suma
difE = 0
suma = 0.0d0
suma = suma + s(pbc(i-1),j) + s(pbc(i+1),j) + s(i,pbc(j-1)) + s(i,pbc(j+1))
difE = difE + int(2 * s(i,j) * suma)
DE = difE
return
end function DE
!Function that calculates the magnetization of the matrix s
real*8 function magne(S,L)
implicit none
integer s(1:L, 1:L),L
magne = sum(s)
return
end function magne
! SUBRUTINES
!Subroutine that resets the matrix s with random values
subroutine reinicia(S,L)
implicit none
integer s(1:L, 1:L), i,j,L
real*8 genrand_real2
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
return
end subroutine
I have tried parallelizing the seed loop instead of the temperature loop, but it takes almost the same time, so I think I'm not parallelizing it correctly, because it looks like a good loop to parallelize.
The other option I thought of is to parallelize the simulation manually: compile 16 copies of the program, each handling a different range of temperatures, and run them all concurrently so that each copy gets its own thread. However, this approach would require a lot of extra RAM.
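To make the seed-loop idea concrete, here is a minimal, self-contained sketch of how I understand it could be done with reduction clauses, so the accumulators are not shared between threads. run_one_seed is only a hypothetical stand-in for one full Metropolis run at a fixed temperature, not my actual routine:

program seed_reduction_sketch
  use omp_lib
  implicit none
  integer, parameter :: seed0 = 20347880, nseed = 200
  integer :: seed
  real*8 :: nsamples, sume, sume2, e

  nsamples = 0.d0
  sume = 0.d0
  sume2 = 0.d0
  !$omp parallel do private(e) reduction(+:nsamples,sume,sume2)
  do seed = seed0, seed0 + nseed - 1
     e = run_one_seed(seed)       ! one independent run per seed
     nsamples = nsamples + 1.d0
     sume  = sume  + e
     sume2 = sume2 + e**2
  end do
  !$omp end parallel do
  print *, 'mean:', sume/nsamples, 'variance:', sume2/nsamples - (sume/nsamples)**2

contains

  real*8 function run_one_seed(seed)
    integer, intent(in) :: seed
    ! dummy workload standing in for the Metropolis sweeps
    run_one_seed = dble(mod(seed, 13))
  end function run_one_seed

end program seed_reduction_sketch

I assume this only helps if the random number generator and the helper routines are thread-safe, which is part of what I am unsure about.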

Problem in splitting array with MPI_ISEND with fortran

I wrote the following lines of code with the aim of dividing the 1D array Jp (which lives on the master process only) among the different processes (including the master one).
Each process has to receive a block of non-contiguous, modified data (the values are changed in the inner loop), so I created a new type newF with the MPI_TYPE_INDEXED function to select the correct portion of data to send.
I have tried both MPI_RECV and MPI_IRECV to receive the data.
The problem is that this part of the code works fine with any number of tasks (from 1 to 8) as long as Jp is small; when I increase its size (e.g. n = 5000), not all of the processes receive the data, and the split array JpS still shows the values I used to initialize it (i.e. -10000).
The commented lines show all the changes I have tried in order to solve this problem. Does anyone have an idea? (A minimal illustration of the MPI_TYPE_INDEXED pattern appears after the code.)
program test_send
use mpi
implicit none
integer :: rank, nproc, mpi_stat
integer :: n, m, k, io, i, j
integer, allocatable :: Jp(:), JpS(:), JpAux(:)
integer :: count, n_distro,newF
integer, allocatable :: sendcounts(:), displ(:), &
blocklens(:), blockdisp(:), &
request(:)
integer :: ARRAY_OF_STATUS(MPI_STATUS_SIZE), error
data count /3/
call mpi_init(mpi_stat)
call mpi_comm_size(mpi_comm_world, nproc, mpi_stat)
call mpi_comm_rank(mpi_comm_world, rank, mpi_stat)
n = 400*count
allocate(sendcounts(nproc), displ(nproc), &
blocklens(count), blockdisp(count), request(nproc))
if (rank.eq.0) then
allocate(Jp(n+1),JpAux(n+1))
Jp = 0
do i = 1,n+1
Jp(i) = i
enddo
endif
call mpi_barrier(mpi_comm_world, mpi_stat)
m = n/count
n_distro = (m+1)/nproc
k = 0
do i = 1,nproc
if (i<nproc) then
sendcounts(i) = n_distro
else
sendcounts(i) = m - (nproc-1)*n_distro
endif
displ(i) = k
k = k + sendcounts(i)
enddo
call mpi_barrier(mpi_comm_world, mpi_stat)
allocate(JpS(count*sendcounts(rank+1)+1))
call mpi_barrier(mpi_comm_world, mpi_stat)
! call mpi_irecv(JpS, (sendcounts(rank+1))*count+1,mpi_int,0,0,mpi_comm_world, request(rank+1), mpi_stat)
! call mpi_recv(JpS, (sendcounts(rank+1))*count+1,mpi_int,0,0,mpi_comm_world, MPI_STATUS_IGNORE,mpi_stat)
!call mpi_waitall(1,request,ARRAY_OF_STATUS,error)
! call mpi_barrier(mpi_comm_world, mpi_stat)
if (rank.eq.0) then
do i = 0,nproc-1
JpAux = -100000
blocklens = spread(sendcounts(i+1),1,count)
blockdisp = spread(displ(i+1),1,count) + (/ (k*m, k=0,count-1) /)
blocklens(count) = blocklens(count)+1
do j = 1,count
if (j.eq.1) then
JpAux(blockdisp(j)+1:blockdisp(j)+blocklens(j)) = Jp(blockdisp(j)+1:blockdisp(j)+blocklens(j))&
-Jp(blockdisp(j)+1)
else
JpAux(blockdisp(j)+1:blockdisp(j)+blocklens(j)) = Jp( blockdisp(j) + 1 : blockdisp(j) + blocklens(j) )&
-Jp( blockdisp(j)+1 ) + JpAux( blockdisp(j-1) + blocklens(j-1))&
+(Jp( blockdisp(j-1)+blocklens(j-1)+1 )-Jp( blockdisp(j-1)+blocklens(j-1)))
endif
enddo
call mpi_type_indexed(count, blocklens, blockdisp, mpi_int, newF, mpi_stat)
call mpi_type_commit(newF, mpi_stat)
call mpi_isend(JpAux, 1, newF, i, i, mpi_comm_world, request(i+1), mpi_stat)
call mpi_type_free(newF, mpi_stat)
enddo
endif
! call mpi_wait(request(rank+1), ARRAY_OF_STATUS, mpi_stat)
call mpi_barrier(mpi_comm_world, mpi_stat)
!call mpi_waitall(1,request,ARRAY_OF_STATUS,error)
call mpi_recv(JpS, (sendcounts(rank+1))*count+1,mpi_int,0,MPI_ANY_TAG,mpi_comm_world, MPI_STATUS_IGNORE,mpi_stat)
! print*, request
print*, 'rank: ', rank, ', size: ', size(JpS), ', Jp: ', JpS
call mpi_barrier(mpi_comm_world, mpi_stat)
call mpi_finalize(mpi_stat)
end program test_send
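For reference, here is the MPI_TYPE_INDEXED pattern in isolation, as a minimal, self-contained example (the names indexed_demo, buf and recvbuf are illustrative, not from my code): rank 0 sends two non-contiguous blocks of an array to rank 1 as a single typed message. Run it with at least 2 processes.

program indexed_demo
  use mpi
  implicit none
  integer :: rank, nproc, ierr, newtype, i
  integer :: blocklens(2), blockdisp(2)
  integer :: buf(10), recvbuf(5)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)

  blocklens = (/ 2, 3 /)     ! lengths of the two blocks, in elements
  blockdisp = (/ 0, 5 /)     ! element displacements from the start of buf
  call mpi_type_indexed(2, blocklens, blockdisp, mpi_integer, newtype, ierr)
  call mpi_type_commit(newtype, ierr)

  if (rank == 0 .and. nproc > 1) then
     buf = (/ (10*i, i = 1, 10) /)
     ! one typed send transfers buf(1:2) and buf(6:8)
     call mpi_send(buf, 1, newtype, 1, 0, mpi_comm_world, ierr)
  else if (rank == 1) then
     ! the receiver sees 5 contiguous integers
     call mpi_recv(recvbuf, 5, mpi_integer, 0, 0, mpi_comm_world, &
                   MPI_STATUS_IGNORE, ierr)
     print *, 'rank 1 received:', recvbuf
  end if

  call mpi_type_free(newtype, ierr)
  call mpi_finalize(ierr)
end program indexed_demo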

OpenACC constant parameters

I am wondering what is the proper way to handle constants in OpenACC kernels.
For example, in the following code
module vecaddmod
implicit none
integer, parameter :: n = 100000
!$acc declare create(n)
contains
subroutine vecaddgpu(r, a, b)
real, dimension(:) :: r, a, b
integer :: i
!$acc update self(n)
!$acc data present(n)
!$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
do i = 1, n
r(i) = a(i) + b(i)
enddo
!$acc end data
end subroutine vecaddgpu
end module vecaddmod
program main
use vecaddmod
implicit none
integer :: i, errs, argcount
real, dimension(:), allocatable :: a, b, r, e
character*10 :: arg1
allocate( a(n), b(n), r(n), e(n) )
do i = 1, n
a(i) = i
b(i) = 1000*i
enddo
! compute on the GPU
call vecaddgpu( r, a, b )
! compute on the host to compare
do i = 1, n
e(i) = a(i) + b(i)
enddo
! compare results
errs = 0
do i = 1, n
if( r(i) /= e(i) )then
errs = errs + 1
endif
enddo
print *, errs, ' errors found'
if( errs /= 0 ) call exit(errs)
end program main
n is declared as a parameter (a compile-time constant) in a module on the CPU side, and it is used as the loop range. nvfortran warns me about "Constant or Parameter used in data clause". Is the above example the proper way to handle this? Can I take advantage of the constant memory on the GPU, so that I don't need to copy n from the CPU to the GPU for each kernel launch?
Thanks.
The compiler will replace parameters with the literal value, so there is no need to put them in data regions.
module vecaddmod
implicit none
integer, parameter :: n = 100000
contains
subroutine vecaddgpu(r, a, b)
real, dimension(:) :: r, a, b
integer :: i
!$acc kernels loop copyin(a(1:n),b(1:n)) copyout(r(1:n))
do i = 1, n
r(i) = a(i) + b(i)
enddo
end subroutine vecaddgpu
end module vecaddmod
...
% nvfortran -acc -Minfo=accel test.f90
vecaddgpu:
11, Generating copyin(a(:100000)) << "n" is replaced with 100000
Generating copyout(r(:100000))
Generating copyin(b(:100000))
12, Loop is parallelizable
Generating Tesla code
12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x

F95 Send/Receive Memory Errors for Array Sending

I'm new to parallel programming and attempting to implement a sparse matrix-vector product in Fortran 95. I'm working on a subprogram that gathers only the components of the vector that the sparse matrix will touch (instead of using MPI_AllGather), but I keep getting SIGSEGV errors. I know this means I've asked a process to touch something it can't access or that doesn't exist, but I can't for the life of me figure out what it could be.
!Gather the vector matrix in matrix vector multiplication for sparse matrices
subroutine sparsegather(u,BW,myid,nprocs)
use header
include "mpif.h"
type(Vector), intent(inout) :: u
integer,intent(in) :: BW !Bandwidth
integer,intent(in) :: myid !process id
integer,intent(in) :: nprocs !number of processes
integer :: n, i
integer,dimension(BW) :: rlr, rrr, slr, srr !Range of receive left/right, send left/right
real(kind=rk),dimension(BW) :: rl, rr, sl, sr !Arrays of actual values
integer :: ierr
n = u%n !Length of whole vector - used in periodic condition
!Define ranges
do i = 1,BW
rlr(i) = u%ibeg - BW - 1 + i
rrr(i) = u%iend + i
srr(i) = u%iend - i + 1
slr(i) = u%ibeg + i - 1
end do
!Periodic conditions
do i = 1,BW
if (rlr(i) < 1) then
rlr(i) = rlr(i) + n
end if
if (srr(i) < 1) then
srr(i) = srr(i) + n
end if
if (rrr(i) > n ) then
rrr(i) = rrr(i) - n
end if
if (slr(i) > n ) then
slr(i) = slr(i) - n
end if
end do
!Store the matrix values being sent over
sl = u%xx(slr)
sr = u%xx(srr)
!Pass the value parcels around
if (myid == 0) then
call MPI_Recv(rl,BW,MPI_DOUBLE_PRECISION,nprocs-1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sr,BW,MPI_DOUBLE_PRECISION,myid+1,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rr,BW,MPI_DOUBLE_PRECISION,myid+1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sl,BW,MPI_DOUBLE_PRECISION,nprocs-1,0,MPI_COMM_WORLD,ierr)
elseif (myid == nprocs-1) then
call MPI_Send(sr,BW,MPI_DOUBLE_PRECISION,0,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rl,BW,MPI_DOUBLE_PRECISION,myid-1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sl,BW,MPI_DOUBLE_PRECISION,myid-1,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rr,BW,MPI_DOUBLE_PRECISION,0,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
elseif (mod(myid,2) == 0) then
call MPI_Recv(rl,BW,MPI_DOUBLE_PRECISION,myid-1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sr,BW,MPI_DOUBLE_PRECISION,myid+1,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rr,BW,MPI_DOUBLE_PRECISION,myid+1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sl,BW,MPI_DOUBLE_PRECISION,myid-1,0,MPI_COMM_WORLD,ierr)
else
call MPI_Send(sr,BW,MPI_DOUBLE_PRECISION,myid+1,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rl,BW,MPI_DOUBLE_PRECISION,myid-1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
call MPI_Send(sl,BW,MPI_DOUBLE_PRECISION,myid-1,0,MPI_COMM_WORLD,ierr)
call MPI_Recv(rr,BW,MPI_DOUBLE_PRECISION,myid+1,MPI_ANY_TAG,MPI_COMM_WORLD,ierr)
end if
u%xx(rrr) = rr
u%xx(rlr) = rl
end subroutine sparsegather
u is an object with the vector values stored in %xx and its size in %n. The relevant start and end points for each process are in %ibeg and %iend.
BW is the bandwidth of the sparse banded matrix. The equation has periodic boundary conditions, so values to the left of the start of the vector wrap around to the right side (and vice versa); this is handled in the periodic conditions section.
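For clarity, the exchange I am describing is a periodic halo exchange. Below is a minimal sketch of the same ring pattern written with MPI_Sendrecv; this is not my actual subroutine (and I am not claiming it is the fix). The names sl, sr, rl, rr, BW, myid and nprocs follow my code, and real(kind=8) stands in for my real(kind=rk):

subroutine halo_exchange(sl, sr, rl, rr, BW, myid, nprocs)
  use mpi
  implicit none
  integer, intent(in) :: BW, myid, nprocs
  real(kind=8), intent(in)  :: sl(BW), sr(BW)
  real(kind=8), intent(out) :: rl(BW), rr(BW)
  integer :: left, right, ierr

  left  = mod(myid - 1 + nprocs, nprocs)   ! periodic neighbours
  right = mod(myid + 1, nprocs)

  ! send my right edge to the right neighbour, receive my left halo from the left
  call MPI_Sendrecv(sr, BW, MPI_DOUBLE_PRECISION, right, 0, &
                    rl, BW, MPI_DOUBLE_PRECISION, left,  0, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  ! send my left edge to the left neighbour, receive my right halo from the right
  call MPI_Sendrecv(sl, BW, MPI_DOUBLE_PRECISION, left,  1, &
                    rr, BW, MPI_DOUBLE_PRECISION, right, 1, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
end subroutine halo_exchange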

Random number generator in PGI Fortran not so random

The following code just generates a simple triple of random numbers:
program testrand
integer, parameter :: nz = 160, nf = 160, nlt = 90
real :: tmpidx(3)
integer :: idxarr(3), idx1, idx2, idx3, seed_size, ticks
integer, allocatable :: seed(:)
call random_seed(size=seed_size)
allocate(seed(seed_size))
call system_clock(count=ticks)
seed = ticks+37*(/(i-1, i=1,seed_size)/)
call random_seed(put=seed)
deallocate(seed)
call random_number(tmpidx)
idxarr = tmpidx * (/nz, nf, nlt/)
idx1 = max(1,idxarr(1))
idx2 = max(1,idxarr(2))
idx3 = max(1,idxarr(3))
print *,idx1, idx2, idx3
end program
I compile this with gfortran and run a few times and I get:
> gfortran testrand.f90
> ./a.out
74 98 86
> ./a.out
113 3 10
> ./a.out
44 104 27
Looks pretty random. Now I compile with PGI Fortran and run a few times:
> pgf90 testrand.f90
> ./a.out
1 1 1
> ./a.out
1 1 1
> ./a.out
1 1 1
Of course, there's no way to be completely sure, but I suspect this is not random. :) Anyone know what is going on here? Anyone know the right way to get random numbers with PGI Fortran?
Somehow, PGI does not implement system_clock the way the GNU compilers do. I do not know why; I found this out recently while doing something similar to what you are doing.
To see what I am talking about, just print ticks after calling system_clock. Chances are that you get 0 every time with PGI and varying numbers with the GNU compilers (a tiny check program is included at the end of this answer). To solve your problem, you can adapt the code below. It is a slightly modified version of a code sample from the GNU Fortran web site.
program testrand
use iso_fortran_env, only: int64
integer, parameter :: nz = 160, nf = 160, nlt = 90
real :: tmpidx(3)
integer :: idxarr(3), idx1, idx2, idx3, seed_size, ticks
integer, allocatable :: seed(:)
call random_seed(size=seed_size)
allocate(seed(seed_size))
! call system_clock(count=ticks)
! seed = ticks+37*(/(i-1, i=1,seed_size)/)
! call random_seed(put=seed)
!
! deallocate(seed)
call init_random_seed()
call random_number(tmpidx)
idxarr = tmpidx * (/nz, nf, nlt/)
idx1 = max(1,idxarr(1))
idx2 = max(1,idxarr(2))
idx3 = max(1,idxarr(3))
print *,idx1, idx2, idx3
contains
!
subroutine init_random_seed()
implicit none
integer, allocatable :: seed(:)
integer :: i, n, istat, dt(8), pid
integer(int64) :: t
integer, parameter :: un=703
call random_seed(size = n)
allocate(seed(n))
! First try if the OS provides a random number generator
open(unit=un, file="/dev/urandom", access="stream", &
form="unformatted", action="read", status="old", iostat=istat)
if (istat == 0) then
read(un) seed
close(un)
else
! The PID is
! useful in case one launches multiple instances of the same
! program in parallel.
call system_clock(t)
if (t == 0) then
call date_and_time(values=dt)
t = (dt(1) - 1970) * 365_int64 * 24 * 60 * 60 * 1000 &
+ dt(2) * 31_int64 * 24 * 60 * 60 * 1000 &
+ dt(3) * 24_int64 * 60 * 60 * 1000 &
+ dt(5) * 60 * 60 * 1000 &
+ dt(6) * 60 * 1000 + dt(7) * 1000 &
+ dt(8)
end if
pid = getpid()
t = ieor( t, int(pid, kind(t)) )
do i = 1, n
seed(i) = lcg(t)
end do
end if
call random_seed(put=seed)
!print*, "optimal seed = ", seed
end subroutine init_random_seed
!
function lcg(s)
integer :: lcg
integer(int64), intent(in out) :: s
if (s == 0) then
s = 104729
else
s = mod(s, 4294967296_int64)
end if
s = mod(s * 279470273_int64, 4294967291_int64)
lcg = int(mod(s, int(huge(0), 8)), kind(0))
end function lcg
!
!this stub is provided especially for pgf90, to supply a getpid() function
!> @brief Returns the process ID of the current process
!! @todo write the actual code; for now it returns a fixed value
!<
function getpid()result(pid)
integer pid
pid = 53 !just a prime number, no special meaning
end function getpid
end program
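The quick check I mentioned (printing what system_clock returns) is just a few lines; reportedly you get 0 with pgf90 and a varying count with gfortran:

program check_clock
  implicit none
  integer :: ticks
  call system_clock(count=ticks)
  print *, 'system_clock count =', ticks
end program check_clock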
