program array
implicit none
integer(8)::i,j
real(8):: dt
real(8):: RR(1317,26123),t(26123),IR(1317,26123),f(8,26123)
complex(8):: sum(1317),factor(1317),fnl(1317)
complex(8),dimension(:,:),allocatable::R
allocate( R(1317,26123))
sum = (0.d0,0.d0)     ! initialise the whole arrays; index 0 would be out of bounds
factor = (0.d0,0.d0)
Open(10 , File ='/Users/clyrionchen/Desktop/result1-4.log')   ! avoid unit 0, which is often preconnected to stderr
Open(11 , File ='/Users/clyrionchen/Desktop/ReR' )
Open(22 , File ='/Users/clyrionchen/Desktop/ImR' )
do i=1,26123
read(11,*) RR(:,i)
read(10,*) f(:,i)
read(22,*) IR(:,i)
end do
t(:)=f(1,:)
R(:,:)=cmplx(RR(:,:),IR(:,:),kind=8)   ! kind must match complex(8), not 16
do j=1,1317
do i=2,26123
factor(j)=(r(j,i)+r(j,i-1))/2. *(t(i)-t(i-1))
sum(j)=sum(j)+factor(j)
end do
end do
do j=1,1317
write(*,*) sum(j)
end do
end program
“RR” and “IR” form a complex number, and now I need to integrate that complex number. I wrote the trapezoidal rule myself (top base plus bottom base, divided by 2, times the time interval). There are 1317 groups of points, and each group needs to be integrated to get a result. But the value of the integral doesn't match the expected result. Is it a parallelisation error, or a numerical-integration error? How should I write this program? Thank you very much for your answer.
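For reference, here is a minimal, self-contained sketch of the same trapezoidal rule applied to complex samples on a non-uniform time grid. The array size and test data below are made up for illustration; in the real program t, RR and IR would come from the files.

program trapz_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: n = 5
  real(dp)    :: t(n)
  complex(dp) :: z(n), total
  integer :: i
  ! made-up sample data: z(t) = exp(i*t) on t = 0, 0.25, ..., 1
  do i = 1, n
     t(i) = 0.25d0*(i-1)
     z(i) = cmplx(cos(t(i)), sin(t(i)), kind=dp)
  end do
  ! trapezoidal rule: (z(i)+z(i-1))/2 * (t(i)-t(i-1)), summed over the slices
  total = (0.d0, 0.d0)
  do i = 2, n
     total = total + 0.5d0*(z(i) + z(i-1))*(t(i) - t(i-1))
  end do
  write(*,*) total   ! exact value is (sin(1), 1-cos(1)) ~ (0.841, 0.460)
end program trapz_demo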
I'm running a Fortran code which performs a stochastic simulation of a marked Poisson cluster process. In practice, event properties (e.g. times of occurrence) are generated by the inversion method, i.e. by random sampling of the cumulative distribution function.
Because of the Poissonian randomness, I expect each generated sequence to be different, but this is not the case. I guess the reason is that the seed for the pseudorandom number generator is the same in each simulation.
I do not know Fortran, so I have no idea how to solve this issue. Here is the part of the code involved with the pseudorandom number generator. Any ideas?
      subroutine pseud0(r)
c     generation of pseudo-random numbers
c     data ir/584287/
      data ir/574289/
      ir=ir*48828125
      if(ir) 10,20,20
   10 ir=(ir+2147483647)+1
   20 r=float(ir)*0.4656613e-9
      return
      end

      subroutine pseudo(random)
c     wichmann+hill (1982) Appl. Statist. 31
      data ix,iy,iz /1992,1111,1151/
      ix=171*mod(ix,177)-2*(ix/177)
      iy=172*mod(iy,176)-35*(iy/176)
      iz=170*mod(iz,178)-63*(iz/178)
      if (ix.lt.0) ix=ix+30269
      if (iy.lt.0) iy=iy+30307
      if (iz.lt.0) iz=iz+30323
      random=mod(float(ix)/30269.0+float(iy)/30307.0+
     &       float(iz)/30323.0,1.0)
      return
      end
First, I would review the modern literature for PRNG and pick a modern implementation. Second, I would rewrite the code in modern Fortran.
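In modern Fortran the intrinsic random_number/random_seed pair already gives you a reseedable generator. A minimal sketch, where seeding from the system clock is just one possible way to get a different sequence on every run:

program prng_demo
  implicit none
  integer :: n, clock, i
  integer, allocatable :: seed(:)
  real :: r
  call random_seed(size=n)        ! seed length is processor dependent
  allocate(seed(n))
  call system_clock(count=clock)  ! derive a different seed each run
  seed = clock + 37*[(i, i=1, n)]
  call random_seed(put=seed)
  do i = 1, 5
     call random_number(r)        ! uniform in [0,1)
     print *, r
  end do
end program prng_demo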
You need to follow #francescalus' advice and have a method for updating the seed. Without attempting to modernize your code, here is one method for the pseud0 PRNG:
subroutine init0(i)
integer, intent(in) :: i
common /myseed0/iseed
iseed = i
end subroutine init0
subroutine pseud0(r)
common /myseed0/ir
ir = ir * 48828125
if (ir) 10,20,20
10 ir = (ir+2147483647)+1
20 r = ir*0.4656613e-9
end subroutine pseud0
program foo
integer i
real r1
call init0(574289) ! Original seed
do i = 1, 10
call pseud0(r1)
print *, r1
end do
print *
call init0(289574) ! New seed
do i = 1, 10
call pseud0(r1)
print *, r1
end do
print *
end program foo
Lately I have been reading about and playing around with OpenMP parallel do constructs in Fortran 95. However, I still have not figured out how a parallel do would be used in code like the one below:
I=1
DO WHILE (I < 100)
A=2*I
B=3*I
C=A+B
SUM(I)=C
I=I+1
END DO
Simply putting !$OMP PARALLEL DO before the do loop and !$OMP END PARALLEL DO after it doesn't seem to work. I have read a couple of things about private and shared variables; however, I think each iteration of the loop above is completely independent. Any help would be greatly appreciated.
The parallel do construct doesn't work with do while loops. You need to change the do while loop to a standard DO loop. This is from the OpenMP 4.0 standard on the parallel do construct at https://www.openmp.org/wp-content/uploads/OpenMP4.0.0.pdf, page 59:
• The associated do-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a CYCLE statement.
• No statement in the associated loops other than the DO statements can cause a branch out of the loops.
• The do-loop iteration variable must be of type integer.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
The following example may help you with the approach you have outlined.
It shows the use of !$OMP and also identifies the thread being used for each iteration of the loop.
I changed SUM to SUMI to retain SUM as an intrinsic function.
Hopefully you can build on this.
use omp_lib
real sumi(99), a,b,c
integer thread_used(0:9), I
nThreads = omp_get_max_threads ()
thread_used = 0
!$OMP PARALLEL DO &
!$OMP SHARED (SUMI,thread_used) &
!$OMP PRIVATE (i,a,b,c,iThread)
DO I = 1,99
iThread = omp_get_thread_num ()
thread_used(iThread) = thread_used(iThread) + 1
A=2*I
B=3*I
C=A+B
SUMI(I)=C
END DO
!$OMP END PARALLEL DO
write (*,*) sum (SUMI)
do i = 0, nThreads-1   ! thread numbers run from 0 to nThreads-1
write (*,*) i, thread_used(i)
end do
end
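(Note that the example has to be compiled with OpenMP support enabled, e.g. gfortran -fopenmp, so that the !$OMP directives are honoured and the omp_lib routines are available.)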
I don't get any speedup when I try to do the following in the subroutine:
!$ call omp_set_num_threads(threadno)
call system_clock(x1)
!$OMP PARALLEL do private(i), reduction(+:total)
do i = 1,m
total = 0.d0
call result(a,l,b,qm,q,en) !here l is input for subroutine and en is output
qm(:,i) = q
qtv(i) = qt
mean = sum(q)/size(q)
do i2 = 1,k
total = total + ((mean-q(i2))**2)/(a+b)
end do
qvv(i1) = total
end do
call system_clock(x2)
print *, x2-x1
!$OMP END PARALLEL do
Comments on the OpenMP part:
total should not be reset inside the loop but initialised before the !$OMP directive.
i2 and mean should be private.
If q does not change between iterations of the loop, sum(q)/size(q) should be computed outside of it.
The lack of private attributes can lead to memory access conflicts (and thus slowdowns).
I suspect the code you show is close to, but not exactly, the code you actually compile. It would be useful to have code that compiles in order to give better help.
Cheers,
Pierre
EDIT: for timing OpenMP code you should use omp_get_wtime (see https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fwtime.html), which returns the wall-clock time (https://en.wikipedia.org/wiki/Wall-clock_time). The module for the OpenMP routines is loaded with use omp_lib.
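For instance, a minimal, self-contained timing sketch (note that the omp_get_wtime calls and the print sit outside the parallel construct, and that !$OMP END PARALLEL DO immediately follows the loop):

use omp_lib
implicit none
integer :: i
integer, parameter :: m = 1000000
double precision :: t1, t2, total

total = 0.d0
t1 = omp_get_wtime()
!$OMP PARALLEL DO PRIVATE(i) REDUCTION(+:total)
do i = 1, m
   total = total + dble(i)
end do
!$OMP END PARALLEL DO
t2 = omp_get_wtime()
print *, 'sum =', total, '  wall time (s):', t2 - t1
end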
I'm trying to implement a data transfer using Fortran 90 and MPI in which every node sends a specific buffer to every other node, i.e. for N nodes I have N-1 buffers to be sent, each one with a different content specific to the recipient node. This involves a double loop and non-blocking send/recv calls; here's the code:
program a2a
implicit none
include "mpif.h"
integer, parameter :: ILEN=4
integer :: nn,i,j,me,ierr
integer :: status(MPI_status_size)
integer, allocatable :: sndv(:),rcvv(:),ireq(:)
call MPI_init(ierr)
call MPI_comm_size(mpi_comm_world,nn,ierr)
nn=nn-1
call MPI_comm_rank(mpi_comm_world,me,ierr)
allocate(sndv(0:nn),rcvv(0:nn),ireq(0:nn))
do i=0,nn
sndv(i)=10*me+i
rcvv(i)=0
end do
do i=0,nn
if (i == me) then
do j=0,nn
if (i == j) cycle
call MPI_isend(sndv(j),ILEN,MPI_byte,j,1000+j,MPI_comm_world,ireq(j),ierr)
write(*,*) 1000+j,'Send - #',me,' -> #',j,': ',sndv(j),ireq(j)
end do
else
do j=0,nn
if (i == j) cycle
call MPI_irecv(rcvv(j),ILEN,MPI_byte,j,1000+j,MPI_comm_world,ireq(j),ierr)
write(*,*) 1000+j,'Recv0 #',i,' -> #',j,': ',rcvv(j),ireq(j)
end do
end if
end do
do j=0,nn
if (me == j) cycle
call MPI_wait(ireq(j),status,ierr)
write(*,*) 1000+j,'Recv1 #',me,' -> #',j,': ',rcvv(j),ireq(j)
end do
call MPI_barrier(MPI_comm_world,ierr)
do i=0,nn
write(*,*) 'Recv2 #',i,' -> #',me,': ',rcvv(i)
end do
call MPI_finalize(ierr)
end program a2a
The expected result for a run with just two nodes is that node 0 sends "1" to node 1 and node 1 sends "10" to node 0. The actual result is that nothing seems to be sent, although there is no deadlock and the tags and request numbers seem to be correct. What is wrong here?
Thomas
Look at your MPI_irecv call and compare it with the argument list it should have:
MPI_irecv(rcvv(j), ILEN,  MPI_byte, j,      1000+j, MPI_comm_world, ireq(j), ierr)
MPI_irecv(recvbuf, count, datatype, source, tag,    comm,           request, ierr)
Specifically, you have set your source argument to j. If you look at the MPI_isend call, however, the processor that is sending the information is processor i (the send only occurs if i == me). Change the source in your MPI_irecv call to i and it should work fine.
That said, this seems like a perfect use case for MPI_Alltoall; why don't you use that instead?
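For reference, a minimal sketch of what the MPI_Alltoall version could look like. Here each rank exchanges a single default integer with every rank (including itself), so MPI_integer with a count of 1 replaces the MPI_byte/ILEN pair:

program a2a_alltoall
  implicit none
  include "mpif.h"
  integer :: nn, me, i, ierr
  integer, allocatable :: sndv(:), rcvv(:)
  call MPI_init(ierr)
  call MPI_comm_size(MPI_comm_world, nn, ierr)
  call MPI_comm_rank(MPI_comm_world, me, ierr)
  allocate(sndv(0:nn-1), rcvv(0:nn-1))
  do i = 0, nn-1
     sndv(i) = 10*me + i      ! element i is destined for rank i
  end do
  rcvv = 0
  ! every rank sends one integer to every rank and receives one from every rank
  call MPI_alltoall(sndv, 1, MPI_integer, rcvv, 1, MPI_integer, MPI_comm_world, ierr)
  do i = 0, nn-1
     write(*,*) 'rank', me, 'received', rcvv(i), 'from rank', i
  end do
  call MPI_finalize(ierr)
end program a2a_alltoall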
It turns out the whole approach of the program was flawed: in tests with more than two nodes, deadlocks occurred and/or buffers got mixed up. For the record, below is a new program that seems to do the job correctly.
#wolfPack88: concerning the suggestion to use MPI_Alltoallv: yes, in principle that would do it. However, in my actual problem, for which this is just a test, things are even more complicated, in that the nodes involved in the whole task can be only a fairly small subset of all the nodes of the run. In that case MPI_Alltoallv might be overkill and would presumably cause unnecessary communication. Still, pointing out the mistake with the source argument finally opened my eyes to the root of the trouble, so thanks for that.
Here's the code:
program a2a
implicit none
include "mpif.h"
integer, parameter :: ILEN=4
integer :: nn,i,me,ierr
integer :: status(MPI_status_size)
integer, allocatable :: sndv(:),rcvv(:),ireqs(:),ireqr(:)   ! separate request arrays for sends and receives
integer, external :: isend,irecv,mynode,numnodes
call MPI_init(ierr)
call MPI_comm_size(mpi_comm_world,nn,ierr)
nn=nn-1
call MPI_comm_rank(mpi_comm_world,me,ierr)
allocate(sndv(0:nn),rcvv(0:nn),ireqs(0:nn),ireqr(0:nn))
do i=0,nn
sndv(i)=10*me+i
rcvv(i)=0
end do
do i=0,nn
if (i == me) cycle
call MPI_irecv(rcvv(i),ILEN,MPI_byte,i,1000*i+me,MPI_comm_world,ireqr(i),ierr)
end do
do i=0,nn
if (me == i) cycle
call MPI_isend(sndv(i),ILEN,MPI_byte,i,1000*me+i,MPI_comm_world,ireqs(i),ierr)
write(*,*) 1000*me+i,'Send - #',me,' -> #',i,': ',sndv(i),ireqs(i)
end do
do i=0,nn
if (me == i) cycle
call MPI_wait(ireqs(i),status,ierr)   ! complete the send
call MPI_wait(ireqr(i),status,ierr)   ! complete the receive before reading rcvv
end do
call MPI_barrier(MPI_comm_world,ierr)
do i=0,nn
if (i /= me) write(*,*) 'Recv2 #',i,' -> #',me,': ',rcvv(i)
end do
call MPI_finalize(ierr)
end program a2a