MPI_TEST: invalid mpi_request - parallel-processing

I want to test if mpi_iSend and mpi_iRecv have run fine.
I have 2 request(argument) vectors: one vector for all the mpi_iSend, the other for all the mpi_iRecv.
The point is that the program runs fine until it started to run the cycle for MPI_TEST. I have tried even with 2 numbers (do i=1,2), still the same error.
Fatal error in PMPI_Test: Invalid MPI_Request, error stack:
PMPI_Test(166): MPI_Test(request=0x7fff93fd2220, flag=0x7fff93fd1ffc,
status=0x7fff93fd2890) failed PMPI_Test(121): Invalid MPI_Request
INTEGER :: ierr, myid, istatus(MPI_STATUS_SIZE), num, i, n
INTEGER,parameter :: seed = 86456, numbers=200
INTEGER :: req1(numbers), req2(numbers)
LOGICAL :: flag
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
IF (myid==0) THEN
DO n=1, numbers
req1(n)=0
req2(n)=0
num=IRAND()
CALL MPI_ISEND(num,1,MPI_INTEGER,1,1,MPI_COMM_WORLD,req1(n),ierr)
CALL MPI_IRECV(best_prime,1,MPI_INTEGER,1,0,MPI_COMM_WORLD,req2(n),ierr)
END DO
ELSE IF (myid==1) THEN
DO i=1, numbers
CALL MPI_TEST(req2(i),flag,istatus,ierr)
IF (flag .eqv. .false.) THEN
WRITE(*,*)'RECV',i,'non-blocking FAIL'
ELSE IF (flag .eqv. .true.) THEN
WRITE(*,*)'RECV',i,'non-blocking SUCCESS'
END IF
END DO
END IF

Related

A value sended by host not return correctly by device using CUDA Fortran

I took an example of data transfer between Host and Device for CUDA Fortran and found this:
Host Code:
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
endif
end program incTest
Device Code:
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
The expected outcome is the console presenting "Success", but this did not happen. Nothing appears in the screen, nothing errors or messages.
This happen because don't enter in if, because a_d has the same value that before call inc subroutine.
I'm using:
OS: Linux - Ubuntu 16
Cuda 8
PGI to compile
Commands to compile:
pgf90 -Mcuda -c Device.cuf
pgf90 -Mcuda -c Host.cuf
pgf90 -Mcuda -o HostDevice Device.o Host.o
./HostDevice
I tried other examples and they did not work too.
I tried using simple Fortran (.f90) code with the same commands to compile and it works!
How can I fix this problem?
What type of device are you using? (If you don't know, post the output from the "pgaccelinfo" utility).
My best guess is that you have a Pascal based device in which case you need to compile with "-Mcuda=cc60".
For example, if I add error checking to the example code, we see that we get an invalid device kernel error when running on a Pascal without the "cc60" as part of the compilation.
% cat test.cuf
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i, istat
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
istat=cudaDeviceSynchronize()
istat=cudaGetLastError()
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
else
write(*,*) 'Error code:', cudaGetErrorString(istat)
endif
end program incTest
% pgf90 test.cuf -Mcuda
% a.out
Error code:
invalid device function
% pgf90 test.cuf -Mcuda=cc60
% a.out
Success

OpenMPI IPC performance is worse than reading/writing to file

I am trying out various ways of IPC to do the following:
Master starts.
Master starts a slave.
Master passes an array to slave.
Slave processes the array.
Slave sends the array back to master.
I have tried using OpenMPI to solve this by having the parent process spawn a child which in turn does the aforementioned processing. However, I have also tried - what I thought would be the worst possible way to do this - letting master write the data to a file and have slave read and write back to that file. The result is stunning.
Below is the two ways in which I achieve this. The first way is the "file" way, the second one is by using OpenMPI.
Master.f90
program master
implicit none
integer*4, dimension (10000) :: matrix
integer :: length, i, exitstatus, cmdstatus
logical :: waistatus
! put integers in matrix and output data into a file
open(1, file='matrixdata.dat', status='new')
length = 10000
do i=1,length
matrix(i) = i
write(1,*) matrix(i)
end do
close(1)
call execute_command_line("./slave.out", wait = .true., exitstat=exitstatus)
if(exitstatus .eq. 0) then
! open and read the file changed by subroutine slave
open(1, file= 'matrixdata.dat', status='old')
do i = 1, length
read(1,*) matrix(i)
end do
close(1)
endif
end program master
Slave.f90
program slave
implicit none
integer*4, dimension (10000) :: matrix
integer :: length, i
! Open and read the file made by master into a matrix
open (1, file= 'matrixdata.dat', status = 'old')
length = 10000
do i = 1, length
read(1,*) matrix(i)
end do
close(1)
! Square all numbers and write over the file with new data
open(1, file= 'matrixdata.dat', status = 'old')
do i=1,length
matrix(i) = matrix(i)**2
write(1,*) matrix(i)
end do
close(1)
end program slave
* OpenMPI *
Master.f90
program master
use mpi
implicit none
integer :: ierr, num_procs, my_id, intercomm, i, siz, array(10000000), s_tag, s_dest, siffra
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
siz = 10000
!print *, "S.Rank =", my_id
!print *, "S.Size =", num_procs
if (.not. (ierr .eq. 0)) then
print*, "S.Unable to initilaize bös!"
stop
endif
do i=1,size(array)
array(i) = 2
enddo
if (my_id .eq. 0) then
call MPI_Comm_spawn("./slave.out", MPI_ARGV_NULL, 1, MPI_INFO_NULL, my_id, &
& MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierr)
s_dest = 0 !rank of destination (integer)
s_tag = 1 !message tag (integer)
call MPI_Send(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, ierr)
call MPI_Recv(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, MPI_STATUS_IGNORE, ierr)
!do i=1,10
! print *, "S.Array(",i,"): ", array(i)
!enddo
endif
call MPI_Finalize(ierr)
end program master
Slave.f90
program name
use mpi
implicit none
! type declaration statements
integer :: ierr, parent, my_id, n_procs, i, siz, array(10000000), ctag, csource, intercomm, siffra
logical :: flag
siz = 10000
! executable statements
call MPI_Init(ierr)
call MPI_Initialized(flag, ierr)
call MPI_Comm_get_parent(parent, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, my_id, ierr)
call MPI_Comm_size(MPI_COMM_WORLD, n_procs, ierr)
csource = 0 !rank of source
ctag = 1 !message tag
call MPI_Recv(array(1), siz, MPI_INTEGER, csource, ctag, parent, MPI_STATUS_IGNORE, ierr)
!do i=1,10
! print *, "C.Array(",i,"): ", array(i)
!enddo
do i=1,size(array)
array(i) = array(i)**2
enddo
!do i=1,10
! print *, "C.Array(",i,"): ", array(i)
!enddo
call MPI_Send(array(1), siz, MPI_INTEGER, csource, ctag, parent, ierr)
call MPI_Finalize(ierr)
end program name
Now, the interesting part is that by using the time program I have measured that it takes 19.8 ms to execute the "file version of the program". The OpenMPI version takes 60 ms. Why? Is there really so much overhead in OpenMPI that it is faster to read/write to file if you're working with <400 KiB?
I tried increasing the array to 10^5 integers. The file version executes in 114ms, OpenMPI in 53ms. When increasing to 10^6 integers file: 1103 ms, OpenMPI: 77ms.
Is the overhead really that much?
Fundamentally, it doesn't make sense to use distributed processing for problem sizes that fit in cache (except in some trivially parallel cases). The typical usage scenario is for data transfer much larger than LLC. Even you biggest case (10^6) fits in modern caches.
Firstly, for the method of writing to disk, you have to be aware of the influence of a page cache in your operating system. If your MPI processes are on the same chip, the operating system just hears 'do a write' then 'do a read'. If, in the interim, nothing pollutes the page cache then it will just fetch the data from RAM as oppose to the disk. A better experiment would be to flush the page cache between the write and read (this is possible, at least on linux, via a shell command). In effect you are performing shared memory processing if you're grabbing the data from the page cache.
Also, you are using time on the command line so you're incorporating the time it takes for MPI to initialize and establish communication interfaces with a few function calls. This is not a good benchmark because the interface provided for disk IO method has already been initialized by the operating system. Also for such a small problem size, the initialization of MPI is nontrivial compared to the runtime of the body of the program. The proper way to do this is to do the timing in the code.
For both methods, you should expect linear scaling biased by the overhead of the method. In fact, you should see a few regimes as the data size surpasses LLC and page cache. The best way to do this is to repeat your runs with ARRAY_SIZE=2^n for n=12,13,..24 and check out the curve.

Fortran Intent (in)

I am having a problem with this simple code, giving me the following error.
Somehow it is saying that there is a conflict with the Intent (in) attribute.
gfortran -o build/lib/larsa.o -c -ffree-form -g -J./build/lib lib/larsa.f
lib/larsa.f:2701.8:
sep, sty, shr &
1
Error: PROCEDURE attribute conflicts with INTENT attribute in 'sep' at (1)
lib/larsa.f:2710.17:
If (Len_trim (sep) > 0) Then
1
Error: 'string' argument of 'len_trim' intrinsic at (1) must be CHARACTER
This is the subroutine
Subroutine write_separator_new &
( &
sep, sty, shr &
)
Character (len=*), Intent(in) :: sep, sty
Integer, Intent(in), Optional :: shr
Character (len=65) :: a, fmt
If (Len_trim (sep) > 0) Then
a = Repeat (sep(1), 60)
Else
Write (*,*) ""
End If
End Subroutine write_separator_new
Indexing of character strings requires :
a = Repeat (sep(1:1), 60)
The compiler assumed sep is a a function, because you used it as such and not as a character string.

Compile errors with Fortran90

All, I've been fighting these errors for hours, here's my code:
program hello
implicit none
integer :: k, n, iterator
integer, dimension(18) :: objectArray
call SetVariablesFromFile()
do iterator = 1, 18
write(*,*) objectArray(iterator)
end do
contains
subroutine SetVariablesFromFile()
IMPLICIT NONE
integer :: status, ierror, i, x
open(UNIT = 1, FILE = 'input.txt', &
ACTION = 'READ',STATUS = 'old', IOSTAT = ierror)
if(ierror /= 0) THEN
write(*, *) "Failed to open input.txt!"
stop
end if
do i = 1, 18
objectArray(i) = read(1, *, IOSTAT = status) x
if (status > 0) then
write(*,*) "Error reading input file"
exit
else if (status < 0) then
write(*,*) "EOF"
exit
end if
end do
close(1)
END subroutine SetVariablesFromFile
end program hello
I'm getting compile errors:
make: * [hello.o] Error1
Syntax error in argument list at (1)
I read online that the latter error could be due to a long line of code exceeding 132 characters, which doesn't appear to be the problem.I have no where to begin on the first error... any help would be much appreciated!
This,
objectArray(i) = read(1, *, IOSTAT = status) x
is not valid Fortran. You need to write it as,
read(1,*,iostat=status) objectArray(i)
Setting it in this correct form, I received no compiler errors with ifort 12.1, nor with gfortran 4.4.3

generating random numbers in a Fortran Module

Now I am facing the problem that in a module, with a seed I am generating random numbers to be used in a loop of a function but each time I call that function, the same random numbers are generated (because the seed is obviously the same) but it's supposed that it must continue the series or at least it must be different between calls. One solution could be that the main program gives a new seed to be used in the module but I think it there could be another elegant solution.
I am using Mersenne Twister generator by suggestion of many people.
Added
My function in my module (it is a package of functions) escentially makes such a Metropolis test using random numbers generated by a seed, for some reason compilation complains if I put
module mymod
uses mtmod
call sgrnd(4357)!<-- this line causes compilation error
contains
myfunc(args)
implicit none
// declarations etc
!call sgrnd(4357) <-- if I put this call here compilator says ok,
!but re-start random number series each time this function is called :(
....
!the following part is inside a loop
if (prob < grnd()) then
!grnd() is random number generated
return
else continue testing to the end of the loop cycle
end myfunc
But if I put that function in the contains of the main program (using mtmod too) and call sgrnd(4357) before contains section and the calls to myfunc, now everything compile and run nicely. For clarity, I didn't want to put that long function in the main program, it has 70 lines of code, but it seems I have no escape. Notice that so, the seed is once called. The simulations have now physical meanings but with that price payed.
I always used this subroutine (I'm running a MonteCarlo simulation), call it in the beginning of your main program and tis should do the job:
(Source: gfortran 4.6.1)
c initialize a random seed from the system clock at every run (fortran 95 code)
subroutine init_random_seed()
INTEGER :: i, n, clock
INTEGER, DIMENSION(:), ALLOCATABLE :: seed
CALL RANDOM_SEED(size = n)
ALLOCATE(seed(n))
CALL SYSTEM_CLOCK(COUNT=clock)
seed = clock + 37 * (/ (i - 1, i = 1, n) /)
CALL RANDOM_SEED(PUT = seed)
DEALLOCATE(seed)
end
You can find here a subroutine that uses system time to re-seed the random number generator. You shouldn't have to do this every time you call random_number(), just each time you re-start the program.
Honestly, it didn't take me more than ten minutes to find this with Google.
In order to recover my points taken, I was obliged to find my own answer, here it is (after one hour of tries)
main program is
program callrtmod
use mymod
implicit none
real::x
x=1.0
write(*,*) x+writerandnum()
write(*,*) x+writerandnum()
write(*,*) x+writerandnum()
end program callrtmod
here's my module
module mymod
implicit none
!-------------mt variables-------------
! Default seed
integer, parameter :: defaultsd = 4357
! Period parameters
integer, parameter :: N = 624, N1 = N + 1
! the array for the state vector
integer, save, dimension(0:N-1) :: mt
integer, save :: mti = N1
!--------------------------------------
contains
function writerandnum
implicit none
real(8)::writerandnum
writerandnum = grnd()
!if you please, you could perform a Metropolis test here too
end function writerandnum
!Initialization subroutine
subroutine sgrnd(seed)
implicit none
integer, intent(in) :: seed
mt(0) = iand(seed,-1)
do mti=1,N-1
mt(mti) = iand(69069 * mt(mti-1),-1)
enddo
!
return
end subroutine sgrnd
!---------------------------------------------------------------------------
!the function grnd was here
!---------------------------------------------------------------------------
subroutine mtsavef( fname, forma )
character(*), intent(in) :: fname
character, intent(in) :: forma
select case (forma)
case('u','U')
open(unit=10,file=trim(fname),status='UNKNOWN',form='UNFORMATTED', &
position='APPEND')
write(10)mti
write(10)mt
case default
open(unit=10,file=trim(fname),status='UNKNOWN',form='FORMATTED', &
position='APPEND')
write(10,*)mti
write(10,*)mt
end select
close(10)
return
end subroutine mtsavef
subroutine mtsaveu( unum, forma )
integer, intent(in) :: unum
character, intent(in) :: forma
select case (forma)
case('u','U')
write(unum)mti
write(unum)mt
case default
write(unum,*)mti
write(unum,*)mt
end select
return
end subroutine mtsaveu
subroutine mtgetf( fname, forma )
character(*), intent(in) :: fname
character, intent(in) :: forma
select case (forma)
case('u','U')
open(unit=10,file=trim(fname),status='OLD',form='UNFORMATTED')
read(10)mti
read(10)mt
case default
open(unit=10,file=trim(fname),status='OLD',form='FORMATTED')
read(10,*)mti
read(10,*)mt
end select
close(10)
return
end subroutine mtgetf
subroutine mtgetu( unum, forma )
integer, intent(in) :: unum
character, intent(in) :: forma
select case (forma)
case('u','U')
read(unum)mti
read(unum)mt
case default
read(unum,*)mti
read(unum,*)mt
end select
return
end subroutine mtgetu
!===============================================
!Random number generator
! real(8) function grnd()
function grnd !agregue yo
implicit integer(a-z)
real(8) grnd !agregue yo
! Period parameters
integer, parameter :: M = 397, MATA = -1727483681
! constant vector a
integer, parameter :: LMASK = 2147483647
! least significant r bits
integer, parameter :: UMASK = -LMASK - 1
! most significant w-r bits
! Tempering parameters
integer, parameter :: TMASKB= -1658038656, TMASKC= -272236544
dimension mag01(0:1)
data mag01/0, MATA/
save mag01
! mag01(x) = x * MATA for x=0,1
TSHFTU(y)=ishft(y,-11)
TSHFTS(y)=ishft(y,7)
TSHFTT(y)=ishft(y,15)
TSHFTL(y)=ishft(y,-18)
if(mti.ge.N) then
! generate N words at one time
if(mti.eq.N+1) then
! if sgrnd() has not been called,
call sgrnd( defaultsd )
! a default initial seed is used
endif
do kk=0,N-M-1
y=ior(iand(mt(kk),UMASK),iand(mt(kk+1),LMASK))
mt(kk)=ieor(ieor(mt(kk+M),ishft(y,-1)),mag01(iand(y,1)))
enddo
do kk=N-M,N-2
y=ior(iand(mt(kk),UMASK),iand(mt(kk+1),LMASK))
mt(kk)=ieor(ieor(mt(kk+(M-N)),ishft(y,-1)),mag01(iand(y,1)))
enddo
y=ior(iand(mt(N-1),UMASK),iand(mt(0),LMASK))
mt(N-1)=ieor(ieor(mt(M-1),ishft(y,-1)),mag01(iand(y,1)))
mti = 0
endif
y=mt(mti)
mti = mti + 1
y=ieor(y,TSHFTU(y))
y=ieor(y,iand(TSHFTS(y),TMASKB))
y=ieor(y,iand(TSHFTT(y),TMASKC))
y=ieor(y,TSHFTL(y))
if(y .lt. 0) then
grnd=(dble(y)+2.0d0**32)/(2.0d0**32-1.0d0)
else
grnd=dble(y)/(2.0d0**32-1.0d0)
endif
return
end function grnd
end module mymod
test my solution and vote me up ;) [of course, as you see, I modified mt.f90 code to be included conveniently in my module, so I can keep separately the main program from the randon numbers generation part, so I can do a Metropolis test aside the main program. The main program just wants to know if a trial was accepted or not. My solution does give more clarity to the main progam]

Resources