OpenMPI IPC performance is worse than reading/writing to file

OpenMPI IPC performance is worse than reading/writing to file - performance

I am trying out various ways of IPC to do the following:
Master starts.
Master starts a slave.
Master passes an array to slave.
Slave processes the array.
Slave sends the array back to master.
I have tried using OpenMPI to solve this by having the parent process spawn a child which in turn does the aforementioned processing. However, I have also tried - what I thought would be the worst possible way to do this - letting master write the data to a file and have slave read and write back to that file. The result is stunning.
Below is the two ways in which I achieve this. The first way is the "file" way, the second one is by using OpenMPI.
Master.f90
program master
implicit none
integer*4, dimension (10000) :: matrix
integer :: length, i, exitstatus, cmdstatus
logical :: waistatus
! put integers in matrix and output data into a file
open(1, file='matrixdata.dat', status='new')
length = 10000
do i=1,length
matrix(i) = i
write(1,*) matrix(i)
end do
close(1)
call execute_command_line("./slave.out", wait = .true., exitstat=exitstatus)
if(exitstatus .eq. 0) then
! open and read the file changed by subroutine slave
open(1, file= 'matrixdata.dat', status='old')
do i = 1, length
read(1,*) matrix(i)
end do
close(1)
endif
end program master
Slave.f90
program slave
implicit none
integer*4, dimension (10000) :: matrix
integer :: length, i
! Open and read the file made by master into a matrix
open (1, file= 'matrixdata.dat', status = 'old')
length = 10000
do i = 1, length
read(1,*) matrix(i)
end do
close(1)
! Square all numbers and write over the file with new data
open(1, file= 'matrixdata.dat', status = 'old')
do i=1,length
matrix(i) = matrix(i)**2
write(1,*) matrix(i)
end do
close(1)
end program slave
* OpenMPI *
Master.f90
program master
use mpi
implicit none
integer :: ierr, num_procs, my_id, intercomm, i, siz, array(10000000), s_tag, s_dest, siffra
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_id, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
siz = 10000
!print *, "S.Rank =", my_id
!print *, "S.Size =", num_procs
if (.not. (ierr .eq. 0)) then
print*, "S.Unable to initilaize bös!"
stop
endif
do i=1,size(array)
array(i) = 2
enddo
if (my_id .eq. 0) then
call MPI_Comm_spawn("./slave.out", MPI_ARGV_NULL, 1, MPI_INFO_NULL, my_id, &
& MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierr)
s_dest = 0 !rank of destination (integer)
s_tag = 1 !message tag (integer)
call MPI_Send(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, ierr)
call MPI_Recv(array(1), siz, MPI_INTEGER, s_dest, s_tag, intercomm, MPI_STATUS_IGNORE, ierr)
!do i=1,10
! print *, "S.Array(",i,"): ", array(i)
!enddo
endif
call MPI_Finalize(ierr)
end program master
Slave.f90
program name
use mpi
implicit none
! type declaration statements
integer :: ierr, parent, my_id, n_procs, i, siz, array(10000000), ctag, csource, intercomm, siffra
logical :: flag
siz = 10000
! executable statements
call MPI_Init(ierr)
call MPI_Initialized(flag, ierr)
call MPI_Comm_get_parent(parent, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, my_id, ierr)
call MPI_Comm_size(MPI_COMM_WORLD, n_procs, ierr)
csource = 0 !rank of source
ctag = 1 !message tag
call MPI_Recv(array(1), siz, MPI_INTEGER, csource, ctag, parent, MPI_STATUS_IGNORE, ierr)
!do i=1,10
! print *, "C.Array(",i,"): ", array(i)
!enddo
do i=1,size(array)
array(i) = array(i)**2
enddo
!do i=1,10
! print *, "C.Array(",i,"): ", array(i)
!enddo
call MPI_Send(array(1), siz, MPI_INTEGER, csource, ctag, parent, ierr)
call MPI_Finalize(ierr)
end program name
Now, the interesting part is that by using the time program I have measured that it takes 19.8 ms to execute the "file version of the program". The OpenMPI version takes 60 ms. Why? Is there really so much overhead in OpenMPI that it is faster to read/write to file if you're working with <400 KiB?
I tried increasing the array to 10^5 integers. The file version executes in 114ms, OpenMPI in 53ms. When increasing to 10^6 integers file: 1103 ms, OpenMPI: 77ms.
Is the overhead really that much?

Fundamentally, it doesn't make sense to use distributed processing for problem sizes that fit in cache (except in some trivially parallel cases). The typical usage scenario is for data transfer much larger than LLC. Even you biggest case (10^6) fits in modern caches.
Firstly, for the method of writing to disk, you have to be aware of the influence of a page cache in your operating system. If your MPI processes are on the same chip, the operating system just hears 'do a write' then 'do a read'. If, in the interim, nothing pollutes the page cache then it will just fetch the data from RAM as oppose to the disk. A better experiment would be to flush the page cache between the write and read (this is possible, at least on linux, via a shell command). In effect you are performing shared memory processing if you're grabbing the data from the page cache.
Also, you are using time on the command line so you're incorporating the time it takes for MPI to initialize and establish communication interfaces with a few function calls. This is not a good benchmark because the interface provided for disk IO method has already been initialized by the operating system. Also for such a small problem size, the initialization of MPI is nontrivial compared to the runtime of the body of the program. The proper way to do this is to do the timing in the code.
For both methods, you should expect linear scaling biased by the overhead of the method. In fact, you should see a few regimes as the data size surpasses LLC and page cache. The best way to do this is to repeat your runs with ARRAY_SIZE=2^n for n=12,13,..24 and check out the curve.

Related

Why is present() not resolved at runtime in Fortran? [duplicate]

After a search without any result I want to ask you a question regarding subroutines containing optional arguments and how they are treated by the compiler (run time/compile time). Consider the following example program.
module CONTAINS_ABC
contains
subroutine ABC( a, b, c)
implicit none
real, intent(in) :: a
real, intent(in), optional :: b, c
if( present(b) )then ! PATH_B
write(*,*) 'Provide c'
else if( present(c) )then "! PATH_C
write(*,*) 'Provide b'
end if
! Do all the computations with a, b and c
end subroutine ABC
end module CONTAINS_ABC
program CALL_ABC
use CONTAINS_ABC
real :: aa, bb, cc
call ABC( a = aa, b=bb )
call ABC( a = aa, c=cc )
end program CALL_ABC
I wonder how the compiler treats subroutines with optional arguments in terms of optimization. Does the compiler produce implicitly two interfaces for the subroutine FCN and chooses then the right one during compile time in the main program? Furthermore, is the present(b)/present(c) statement evaluated at runtime or during compile time?
In case I understand things right, the compiler might know that the first call to ABC leads to the path B, while the second call to ABC must lead to path C.
I ask this question, since I have a subroutine that is called million-million of times.
I want to avoid that during runtime the decision is made wether to go along path B or path C. It would be possible of course two write simply two subroutines, however, this would produce a lot of additional lines that would actually do the same thing.
Thank you all in advance for your help!

The Fortran standards are silent on pretty much every aspect of the implementation of the language other than the syntax and semantics. For example, people often think that Fortran passes arguments by reference but the standard only requires that programs behave as if arguments are passed by reference. Implementers are free to use pass-by-pixies if they wish and if they can thereby produce a convincing simulation of pass by reference.
That's just a long-winded introduction to telling you that how a particular compiler implements a language feature is, in general, compiler-specific. Implementation details are also know to vary from version to version of the same compiler.
I think that you are wrong to think that the presence of an optional argument will be checked at compile-time even if it is provable that it could be -- I just don't think that that is something that the current crop of compilers do. I expect the compiler to generate a single implementation of the subroutine with optional arguments. It does, after all, rest with the programmer to ensure that a procedure does not attempt to process absent optional arguments.
If you, the programmer, know that a procedure will be executed many times and if you suspect that different implementations, one with argument b and without c, one vice-versa, then it's on you to determine if separate implementations will be faster than one implementation with optional arguments. If you are concerned about code duplication you could always have yet a third procedure to implement the common code, and call it from both variants.
Equally, it is on you to inspect the assembler generated by your compiler to see just what your compiler (version) does with the variations in code that you write.

I will add an example. This is what gfortran-4.8 does with your code with -Ofast (-fdump-tree-optimized). There is only one copy of the subroutine and the if (present()) calls are changed to if (b_1(D) != 0B). Maybe some smaller subroutine would be inlined. Then I would expect such an optimization to kick in, but not here.
;; Function abc (__contains_abc_MOD_abc, funcdef_no=0, decl_uid=1875, cgraph_uid=0)
abc (real(kind=4) & restrict a, real(kind=4) * b, real(kind=4) * c)
{
struct __st_parameter_dt dt_parm.1;
struct __st_parameter_dt dt_parm.0;
<bb 2>:
if (b_1(D) != 0B)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
dt_parm.0.common.filename = &"opt.f90"[1]{lb: 1 sz: 1};
dt_parm.0.common.line = 13;
dt_parm.0.common.flags = 128;
dt_parm.0.common.unit = 6;
_gfortran_st_write (&dt_parm.0);
_gfortran_transfer_character_write (&dt_parm.0, &"Provide c"[1]{lb: 1 sz: 1}, 9);
_gfortran_st_write_done (&dt_parm.0);
dt_parm.0 ={v} {CLOBBER};
goto <bb 6>;
<bb 4>:
if (c_11(D) != 0B)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
dt_parm.1.common.filename = &"opt.f90"[1]{lb: 1 sz: 1};
dt_parm.1.common.line = 15;
dt_parm.1.common.flags = 128;
dt_parm.1.common.unit = 6;
_gfortran_st_write (&dt_parm.1);
_gfortran_transfer_character_write (&dt_parm.1, &"Provide b"[1]{lb: 1 sz: 1}, 9);
_gfortran_st_write_done (&dt_parm.1);
dt_parm.1 ={v} {CLOBBER};
<bb 6>:
return;
}
;; Function main (main, funcdef_no=2, decl_uid=1889, cgraph_uid=2) (executed once)
main (integer(kind=4) argc, character(kind=1) * * argv)
{
real(kind=4) aa;
real(kind=4) bb;
real(kind=4) cc;
static integer(kind=4) options.2[7] = {68, 1023, 0, 0, 1, 1, 0};
<bb 2>:
_gfortran_set_args (argc_2(D), argv_3(D));
_gfortran_set_options (7, &options.2[0]);
abc (&aa, &bb, 0B);
abc (&aa, 0B, &cc);
aa ={v} {CLOBBER};
bb ={v} {CLOBBER};
cc ={v} {CLOBBER};
return 0;
}
I checked this is also the case for the final assembly, but it is much more clearly visible here in GIMPLE.

Parallelization of an openMP nested do loop

I have a nested do loop in an openmp fortran 77 code that I am unable to parallelize (the code gives a segmentation fault error when it is run). I have a very similar nested do loop in a different subroutine of the same code that runs parallel with no issues.
Here is the nested do loop that I am having problems with:
do n=1,num_p
C$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(l,i1,i2,j1,j2,k1,k2
C$OMP& ,i,j,k,i_t,j_t,i_ddf,j_ddf,ddf_dum)
do l=1,n_l(n)
call del_fn(l,n)
i1=p_iw(l,n)
i2=p_ie(l,n)
j1=p_js(l,n)
j2=p_jn(l,n)
k1=p_kb(l,n)
k2=p_kt(l,n)
do i=i1,i2
i_ddf=i-i1+1
if(i .lt. 1) then
i_t=nx+i
elseif (i .gt. nx) then
i_t=i-nx
else
i_t=i
endif
do j=j1,j2
j_ddf=j-j1+1
if(j .lt.1) then
j_t=ny+j
elseif(j .gt. ny) then
j_t=j-ny
else
j_t=j
endif
do k=k1,k2
ddf(l,n,i_ddf,j_ddf,k-k1+1) = ddf_dum(i_t,j_t,k)
enddo
enddo
enddo
enddo
C$OMP END PARALLEL DO
enddo
I have narrowed the problem down to ddf_dum(i_t,j_t,k). When this term is turned off (say I replace it by 0.d0), the code runs fine.
On the other hand, I have a very similar nested do loop that runs parallel with no issues. Below is that nested do loop that runs parallel with no issues. Can anyone please identify what I am missing here?
do n=1,1
C$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(l,i1,i2,j1,j2,k1,k2
C$OMP& ,i,j,k,i_f,j_f,i_ddf,j_ddf)
do l=1,n_l(n)
i1=p_iw(l,n)
i2=p_ie(l,n)
j1=p_js(l,n)
j2=p_jn(l,n)
k1=p_kb(l,n)
k2=p_kt(l,n)
u_forcing(l,n)= (u_p(l,n)-up_tilde(l,n))/dt
v_forcing(l,n)= (v_p(l,n)-vp_tilde(l,n))/dt
w_forcing(l,n)= (w_p(l,n)-wp_tilde(l,n))/dt
do i=i1,i2
i_ddf=i-i1+1
if(i .lt. 1) then
i_f=nx+i
elseif (i .gt. nx) then
i_f=i-nx
else
i_f=i
endif
do j=j1,j2
j_ddf=j-j1+1
if(j .lt.1) then
j_f=ny+j
elseif(j .gt. ny) then
j_f=j-ny
else
j_f=j
endif
do k=k1,k2
forcing_x(i_f,j_f,k)=forcing_x(i_f,j_f,k)+u_forcing(l,n)
& *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
forcing_y(i_f,j_f,k)=forcing_y(i_f,j_f,k)+v_forcing(l,n)
& *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
forcing_z(i_f,j_f,k)=forcing_z(i_f,j_f,k)+w_forcing(l,n)
& *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
enddo
enddo
enddo
enddo
C$OMP END PARALLEL DO
enddo

As you noted, your problem is ddf_dum. It should be a shared variable, not private, because it is only being read from and never written to. You are getting a segfault because you are attempting to access uninitialized memory on all the threads that aren't your master thread.
A good rule of thumb that you could have used to find this mistake yourself: all variables that are only found on the RHS of your equal signs within your parallel region should always be shared.

MPI_TEST: invalid mpi_request

I want to test if mpi_iSend and mpi_iRecv have run fine.
I have 2 request(argument) vectors: one vector for all the mpi_iSend, the other for all the mpi_iRecv.
The point is that the program runs fine until it started to run the cycle for MPI_TEST. I have tried even with 2 numbers (do i=1,2), still the same error.
Fatal error in PMPI_Test: Invalid MPI_Request, error stack:
PMPI_Test(166): MPI_Test(request=0x7fff93fd2220, flag=0x7fff93fd1ffc,
status=0x7fff93fd2890) failed PMPI_Test(121): Invalid MPI_Request
INTEGER :: ierr, myid, istatus(MPI_STATUS_SIZE), num, i, n
INTEGER,parameter :: seed = 86456, numbers=200
INTEGER :: req1(numbers), req2(numbers)
LOGICAL :: flag
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
IF (myid==0) THEN
DO n=1, numbers
req1(n)=0
req2(n)=0
num=IRAND()
CALL MPI_ISEND(num,1,MPI_INTEGER,1,1,MPI_COMM_WORLD,req1(n),ierr)
CALL MPI_IRECV(best_prime,1,MPI_INTEGER,1,0,MPI_COMM_WORLD,req2(n),ierr)
END DO
ELSE IF (myid==1) THEN
DO i=1, numbers
CALL MPI_TEST(req2(i),flag,istatus,ierr)
IF (flag .eqv. .false.) THEN
WRITE(*,*)'RECV',i,'non-blocking FAIL'
ELSE IF (flag .eqv. .true.) THEN
WRITE(*,*)'RECV',i,'non-blocking SUCCESS'
END IF
END DO
END IF

Does Fortran resolve optional arguments and present statements during compile-time?

After a search without any result I want to ask you a question regarding subroutines containing optional arguments and how they are treated by the compiler (run time/compile time). Consider the following example program.
module CONTAINS_ABC
contains
subroutine ABC( a, b, c)
implicit none
real, intent(in) :: a
real, intent(in), optional :: b, c
if( present(b) )then ! PATH_B
write(*,*) 'Provide c'
else if( present(c) )then "! PATH_C
write(*,*) 'Provide b'
end if
! Do all the computations with a, b and c
end subroutine ABC
end module CONTAINS_ABC
program CALL_ABC
use CONTAINS_ABC
real :: aa, bb, cc
call ABC( a = aa, b=bb )
call ABC( a = aa, c=cc )
end program CALL_ABC
I wonder how the compiler treats subroutines with optional arguments in terms of optimization. Does the compiler produce implicitly two interfaces for the subroutine FCN and chooses then the right one during compile time in the main program? Furthermore, is the present(b)/present(c) statement evaluated at runtime or during compile time?
In case I understand things right, the compiler might know that the first call to ABC leads to the path B, while the second call to ABC must lead to path C.
I ask this question, since I have a subroutine that is called million-million of times.
I want to avoid that during runtime the decision is made wether to go along path B or path C. It would be possible of course two write simply two subroutines, however, this would produce a lot of additional lines that would actually do the same thing.
Thank you all in advance for your help!

I will add an example. This is what gfortran-4.8 does with your code with -Ofast (-fdump-tree-optimized). There is only one copy of the subroutine and the if (present()) calls are changed to if (b_1(D) != 0B). Maybe some smaller subroutine would be inlined. Then I would expect such an optimization to kick in, but not here.
;; Function abc (__contains_abc_MOD_abc, funcdef_no=0, decl_uid=1875, cgraph_uid=0)
abc (real(kind=4) & restrict a, real(kind=4) * b, real(kind=4) * c)
{
struct __st_parameter_dt dt_parm.1;
struct __st_parameter_dt dt_parm.0;
<bb 2>:
if (b_1(D) != 0B)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
dt_parm.0.common.filename = &"opt.f90"[1]{lb: 1 sz: 1};
dt_parm.0.common.line = 13;
dt_parm.0.common.flags = 128;
dt_parm.0.common.unit = 6;
_gfortran_st_write (&dt_parm.0);
_gfortran_transfer_character_write (&dt_parm.0, &"Provide c"[1]{lb: 1 sz: 1}, 9);
_gfortran_st_write_done (&dt_parm.0);
dt_parm.0 ={v} {CLOBBER};
goto <bb 6>;
<bb 4>:
if (c_11(D) != 0B)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
dt_parm.1.common.filename = &"opt.f90"[1]{lb: 1 sz: 1};
dt_parm.1.common.line = 15;
dt_parm.1.common.flags = 128;
dt_parm.1.common.unit = 6;
_gfortran_st_write (&dt_parm.1);
_gfortran_transfer_character_write (&dt_parm.1, &"Provide b"[1]{lb: 1 sz: 1}, 9);
_gfortran_st_write_done (&dt_parm.1);
dt_parm.1 ={v} {CLOBBER};
<bb 6>:
return;
}
;; Function main (main, funcdef_no=2, decl_uid=1889, cgraph_uid=2) (executed once)
main (integer(kind=4) argc, character(kind=1) * * argv)
{
real(kind=4) aa;
real(kind=4) bb;
real(kind=4) cc;
static integer(kind=4) options.2[7] = {68, 1023, 0, 0, 1, 1, 0};
<bb 2>:
_gfortran_set_args (argc_2(D), argv_3(D));
_gfortran_set_options (7, &options.2[0]);
abc (&aa, &bb, 0B);
abc (&aa, 0B, &cc);
aa ={v} {CLOBBER};
bb ={v} {CLOBBER};
cc ={v} {CLOBBER};
return 0;
}
I checked this is also the case for the final assembly, but it is much more clearly visible here in GIMPLE.

generating random numbers in a Fortran Module

Now I am facing the problem that in a module, with a seed I am generating random numbers to be used in a loop of a function but each time I call that function, the same random numbers are generated (because the seed is obviously the same) but it's supposed that it must continue the series or at least it must be different between calls. One solution could be that the main program gives a new seed to be used in the module but I think it there could be another elegant solution.
I am using Mersenne Twister generator by suggestion of many people.
Added
My function in my module (it is a package of functions) escentially makes such a Metropolis test using random numbers generated by a seed, for some reason compilation complains if I put
module mymod
uses mtmod
call sgrnd(4357)!<-- this line causes compilation error
contains
myfunc(args)
implicit none
// declarations etc
!call sgrnd(4357) <-- if I put this call here compilator says ok,
!but re-start random number series each time this function is called :(
....
!the following part is inside a loop
if (prob < grnd()) then
!grnd() is random number generated
return
else continue testing to the end of the loop cycle
end myfunc
But if I put that function in the contains of the main program (using mtmod too) and call sgrnd(4357) before contains section and the calls to myfunc, now everything compile and run nicely. For clarity, I didn't want to put that long function in the main program, it has 70 lines of code, but it seems I have no escape. Notice that so, the seed is once called. The simulations have now physical meanings but with that price payed.

I always used this subroutine (I'm running a MonteCarlo simulation), call it in the beginning of your main program and tis should do the job:
(Source: gfortran 4.6.1)
c initialize a random seed from the system clock at every run (fortran 95 code)
subroutine init_random_seed()
INTEGER :: i, n, clock
INTEGER, DIMENSION(:), ALLOCATABLE :: seed
CALL RANDOM_SEED(size = n)
ALLOCATE(seed(n))
CALL SYSTEM_CLOCK(COUNT=clock)
seed = clock + 37 * (/ (i - 1, i = 1, n) /)
CALL RANDOM_SEED(PUT = seed)
DEALLOCATE(seed)
end

You can find here a subroutine that uses system time to re-seed the random number generator. You shouldn't have to do this every time you call random_number(), just each time you re-start the program.
Honestly, it didn't take me more than ten minutes to find this with Google.

In order to recover my points taken, I was obliged to find my own answer, here it is (after one hour of tries)
main program is
program callrtmod
use mymod
implicit none
real::x
x=1.0
write(*,*) x+writerandnum()
write(*,*) x+writerandnum()
write(*,*) x+writerandnum()
end program callrtmod
here's my module
module mymod
implicit none
!-------------mt variables-------------
! Default seed
integer, parameter :: defaultsd = 4357
! Period parameters
integer, parameter :: N = 624, N1 = N + 1
! the array for the state vector
integer, save, dimension(0:N-1) :: mt
integer, save :: mti = N1
!--------------------------------------
contains
function writerandnum
implicit none
real(8)::writerandnum
writerandnum = grnd()
!if you please, you could perform a Metropolis test here too
end function writerandnum
!Initialization subroutine
subroutine sgrnd(seed)
implicit none
integer, intent(in) :: seed
mt(0) = iand(seed,-1)
do mti=1,N-1
mt(mti) = iand(69069 * mt(mti-1),-1)
enddo
!
return
end subroutine sgrnd
!---------------------------------------------------------------------------
!the function grnd was here
!---------------------------------------------------------------------------
subroutine mtsavef( fname, forma )
character(*), intent(in) :: fname
character, intent(in) :: forma
select case (forma)
case('u','U')
open(unit=10,file=trim(fname),status='UNKNOWN',form='UNFORMATTED', &
position='APPEND')
write(10)mti
write(10)mt
case default
open(unit=10,file=trim(fname),status='UNKNOWN',form='FORMATTED', &
position='APPEND')
write(10,*)mti
write(10,*)mt
end select
close(10)
return
end subroutine mtsavef
subroutine mtsaveu( unum, forma )
integer, intent(in) :: unum
character, intent(in) :: forma
select case (forma)
case('u','U')
write(unum)mti
write(unum)mt
case default
write(unum,*)mti
write(unum,*)mt
end select
return
end subroutine mtsaveu
subroutine mtgetf( fname, forma )
character(*), intent(in) :: fname
character, intent(in) :: forma
select case (forma)
case('u','U')
open(unit=10,file=trim(fname),status='OLD',form='UNFORMATTED')
read(10)mti
read(10)mt
case default
open(unit=10,file=trim(fname),status='OLD',form='FORMATTED')
read(10,*)mti
read(10,*)mt
end select
close(10)
return
end subroutine mtgetf
subroutine mtgetu( unum, forma )
integer, intent(in) :: unum
character, intent(in) :: forma
select case (forma)
case('u','U')
read(unum)mti
read(unum)mt
case default
read(unum,*)mti
read(unum,*)mt
end select
return
end subroutine mtgetu
!===============================================
!Random number generator
! real(8) function grnd()
function grnd !agregue yo
implicit integer(a-z)
real(8) grnd !agregue yo
! Period parameters
integer, parameter :: M = 397, MATA = -1727483681
! constant vector a
integer, parameter :: LMASK = 2147483647
! least significant r bits
integer, parameter :: UMASK = -LMASK - 1
! most significant w-r bits
! Tempering parameters
integer, parameter :: TMASKB= -1658038656, TMASKC= -272236544
dimension mag01(0:1)
data mag01/0, MATA/
save mag01
! mag01(x) = x * MATA for x=0,1
TSHFTU(y)=ishft(y,-11)
TSHFTS(y)=ishft(y,7)
TSHFTT(y)=ishft(y,15)
TSHFTL(y)=ishft(y,-18)
if(mti.ge.N) then
! generate N words at one time
if(mti.eq.N+1) then
! if sgrnd() has not been called,
call sgrnd( defaultsd )
! a default initial seed is used
endif
do kk=0,N-M-1
y=ior(iand(mt(kk),UMASK),iand(mt(kk+1),LMASK))
mt(kk)=ieor(ieor(mt(kk+M),ishft(y,-1)),mag01(iand(y,1)))
enddo
do kk=N-M,N-2
y=ior(iand(mt(kk),UMASK),iand(mt(kk+1),LMASK))
mt(kk)=ieor(ieor(mt(kk+(M-N)),ishft(y,-1)),mag01(iand(y,1)))
enddo
y=ior(iand(mt(N-1),UMASK),iand(mt(0),LMASK))
mt(N-1)=ieor(ieor(mt(M-1),ishft(y,-1)),mag01(iand(y,1)))
mti = 0
endif
y=mt(mti)
mti = mti + 1
y=ieor(y,TSHFTU(y))
y=ieor(y,iand(TSHFTS(y),TMASKB))
y=ieor(y,iand(TSHFTT(y),TMASKC))
y=ieor(y,TSHFTL(y))
if(y .lt. 0) then
grnd=(dble(y)+2.0d0**32)/(2.0d0**32-1.0d0)
else
grnd=dble(y)/(2.0d0**32-1.0d0)
endif
return
end function grnd
end module mymod
test my solution and vote me up ;) [of course, as you see, I modified mt.f90 code to be included conveniently in my module, so I can keep separately the main program from the randon numbers generation part, so I can do a Metropolis test aside the main program. The main program just wants to know if a trial was accepted or not. My solution does give more clarity to the main progam]

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio