Related
I'm trying to parallelize a simulation of an Ising 2D model to get some expected values as a function of the temperature of the system. For L=48, the one-threaded version takes about 240 seconds to run 20 temperatures and 1 seed each, but the parallelized version takes about 268 seconds, which is similar.
If you take the time per seed per temperature, it results in 12 seconds for the one-threaded version and 13.4 seconds for the parallelized version. I'm looking for help with my code because I don't understand these durations. I thought that the parallelized version would split one temperature among all threads and therefore should take about 30 seconds to complete.
I need to run the simulation for 50 temperatures and 200 seeds each, for 5 values of L. It would be helpful to reduce the compute time, because otherwise it could take 20 hours for L=48 and some days for L=72.
I'm using an i7-10700KF (8 cores, 16 logical threads).
program Ising
use omp_lib
implicit none
integer L, seed, i, j, seed0, nseed,k
parameter (L=48)
integer s(1:L, 1:L)
integer*4 pbc(0:L+1), mctot, N, mcd, mcini, difE
real*8 genrand_real2, magne, energ, energia, temp, temp1, DE
real*8 mag, w(-8:8)
real*8 start, finish
real*8 sum, sume, sume2, summ, summ2, sumam, vare, varm, maxcv, maxx
real*8 cv, x, Tmaxcv, Tmaxx
integer irand, jrand
11 format(10(f20.6))
! Initialize variables
mctot = 80000
mcd = 20
mcini = 8000
N = L*L
seed0 = 20347880
nseed = 20
maxcv=0.d0
maxx=0.d0
! Initialize vector pbc
pbc(0) = L
pbc(L+1) = 1
do i = 1, L
pbc(i) = i
end do
! Initialize matrix s with random values
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
! Metropolis algorithm
open(1, file='Expectation values.dat')
start = omp_get_wtime()
write(1,*) '#Temp, ','E, ','E2, ','M, ','M2, ','|M|, ','VarE, ','VarM, ',&
'Cv, ','X, '
!Start loop to calculate for different temperatures
!$OMP PARALLEL PRIVATE(s,seed,w,energia,difE,irand,jrand,temp,mag,sum,sume,sume2,summ,summ2,sumam,vare,varm,cv,x)
temp1 = 1.59d0
!$OMP DO ordered schedule(dynamic)
do k = 1, 10
temp = temp1 + (0.01d0*k)
!Define the matrix w, which contains the values of the Boltzmann function for each temperature, so as not to have to calculate them each iteration
do i = -8, 8
w(i) = dexp(-i/temp)
end do
write(*,*) "Temperature: ", temp, "Thread", omp_get_thread_num()
sum = 0.d0
sume = 0.d0
sume2 = 0.d0
summ = 0.d0
summ2 = 0.d0
sumam = 0.d0
do seed = seed0, seed0 + nseed-1, 1
call init_genrand(seed)
call reinicia(s,l)
energia = energ(s,l,pbc)
do i = 1, mctot
do j = 1, N
irand = int(genrand_real2()*L) + 1
jrand = int(genrand_real2()*L) + 1
difE = int(DE(s,l,irand,jrand,pbc))
if (difE < 0) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
else if (genrand_real2() < w(int(difE))) then
s(irand,jrand) = -s(irand,jrand)
energia = energia + difE
endif
end do
if ((i > mcini).and.(mcd*(i/mcd)==i)) then
mag= magne(s,l)
sum = sum + 1.d0
sume = sume + energia
sume2 = sume2 + energia**2
summ = summ + mag
summ2 = summ2 + mag**2
sumam = sumam + abs(mag)
endif
end do
end do
!Energy
sume=sume/(sum*N)
sume2=sume2/(sum*N*N)
!Magnetitzation
summ = summ/(sum*N)
sumam=sumam/(sum*N)
summ2=summ2/(sum*N*N)
!Variances
vare = dsqrt(sume2-sume*sume)/dsqrt(sum)
varm = dsqrt(summ2-summ*summ)/dsqrt(sum)
!Cv
cv = (N*(sume2-sume*sume))/temp**2
if (cv.gt.maxcv) then
maxcv=cv
Tmaxcv=temp
endif
!X
x = (N*(summ2-summ*summ))/temp
if (x.gt.maxx) then
maxx=x
Tmaxx=temp
endif
write(1,11) temp,sume,sume2,summ,summ2,sumam,vare,varm,cv,x
end do
!$OMP END DO
!$OMP END PARALLEL
finish = omp_get_wtime()
close(1)
print*, "Time: ",(finish-start),"Seconds"
end program Ising
! Functions
!Function that calculates the energy of the matrix s
real*8 function energ(S,L, pbc)
implicit none
integer s(1:L, 1:L), i, j, L
integer*4 pbc(0:L+1)
real*8 ene
ene = 0.0d0
do i = 1, L
do j = 1, L
ene = ene - s(i,j) * s(pbc(i+1),j) - s(i,j) * s(i,pbc(j+1))
end do
end do
energ = ene
return
end function energ
!Function that calculates the difference in energy that occurs when the spin of position (i, j) is changed
real*8 function DE(S,L,i,j,pbc)
implicit none
integer s(1:L, 1:L), i, j, L, difE
integer*4 pbc(0:L+1)
real*8 suma
difE = 0
suma = 0.0d0
suma = suma + s(pbc(i-1),j) + s(pbc(i+1),j) + s(i,pbc(j-1)) + s(i,pbc(j+1))
difE = difE + int(2 * s(i,j) * suma)
DE = difE
return
end function DE
!Function that calculates the magnetization of the matrix s
real*8 function magne(S,L)
implicit none
integer s(1:L, 1:L),L
magne = sum(s)
return
end function magne
! SUBRUTINES
!Subroutine that resets the matrix s with random values
subroutine reinicia(S,L)
implicit none
integer s(1:L, 1:L), i,j,L
real*8 genrand_real2
do i = 1, L
do j = 1, L
if (genrand_real2() < 0.5) then
s(i,j) = 1
else
s(i,j) = -1
endif
end do
end do
return
end subroutine
I have tried parallelizing the seeds loop instead of the temperatures, but it lasts almost the same, so i think i'm not parallelizing it correctly, because it looks a nice code to parallelize.
The other option I thought of is to manually parallelize the simulation. I could do this by compiling 16 programs, each of which handles a different range of temperatures. Then I could run all of the programs concurrently, so each program would get its own thread. However, this approach would require a lot of extra RAM.
I am trying to implement qsort algorithm in Fortran.
The implemented qsort is intended to operate over an array of a derived type which contains also another derived type.
The derived types are defined in a separate module as:
MODULE DATA_MODEL
! -------------------
! CONSTANTS
! -------------------
integer,parameter :: max_records = 100000000
type :: timestamp
integer :: year
integer :: month
integer :: day
integer :: hour
integer :: minute
integer :: second
end type
type :: tape
type(timestamp) :: ts
integer :: value1
integer :: value2
end type
END MODULE
This is what I have tried to implement the quicksort algorithm.
! DESCRIPTION:
! THIS MODULE IMPLEMENTS QSORT ALGORITH USING LOMUTO PARTITION SCHEME
! PSEUDOCODE:
! ALGORITHM QUICKSORT(A, LO, HI) IS
! IF LO < HI THEN
! P := PARTITION(A, LO, HI)
! QUICKSORT(A, LO, P - 1)
! QUICKSORT(A, P + 1, HI)
!
! ALGORITHM PARTITION(A, LO, HI) IS
! PIVOT := A[HI]
! I := LO
! FOR J := LO TO HI DO
! IF A[J] < PIVOT THEN
! SWAP A[I] WITH A[J]
! I := I + 1
! SWAP A[I] WITH A[HI]
! RETURN I
!
! SORTING THE ENTIRE ARRAY IS ACCOMPLOMISHED BY QUICKSORT(A, 0, LENGTH(A) - 1).
module qsort
use data_model
contains
subroutine quicksort(a, lo, hi)
implicit none
! SUBROUTINE PARAMETERS
type(tape),allocatable,intent(in out) :: a
integer,intent(in) :: lo, hi
! ALGORITHM INTERNAL VARIABLES
integer :: p
if (lo < hi) then
call partition(a, lo, hi, p)
call quicksort(a, lo, p - 1)
call quicksort(a, p + 1, hi)
end if
end subroutine
subroutine partition(a, lo, hi, p)
implicit none
! SUBROUTINE PARAMETERS
type(tape),allocatable,intent(inout) :: a
integer,intent(in) :: lo
integer,intent(in) :: hi
integer,intent(out) :: p
! ALGORITHM INTERNAL VARIABLES
type(tape) :: pivot
type(tape) :: swap
integer :: i,j
pivot = a(hi)
i = lo
do j = lo, hi
if (compare(a(j), pivot)) then
swap = a(i)
a(i) = a(j)
a(j) = swap
i = i + 1
endif
end do
swap = a(i)
a(i) = a(hi)
a(hi) = swap
p = i
end subroutine
function compare(a,b)
implicit none
! FUNCTION PARAMETERS
type(tape) :: a
type(tape) :: b
logical :: compare
if (a%ts%year < b%ts%year) then
compare = .true.
else if (a%ts%year > a%ts%year) then
compare = .false.
else if (a%ts%month < b%ts%month) then
compare = .true.
else if (a%ts%month > b%ts%month) then
compare = .false.
else if (a%ts%day < b%ts%day) then
compare = .true.
else if (a%ts%day > b%ts%day) then
compare = .false.
else if (a%ts%hour < b%ts%hour) then
compare = .true.
else if (a%ts%hour > b%ts%hour) then
compare = .false.
else if (a%ts%minute < b%ts%minute) then
compare = .true.
else if (a%ts%minute > b%ts%minute) then
compare = .false.
else if (a%ts%second < b%ts%second) then
compare = .true.
else if (a%ts%second > b%ts%second) then
compare = .false.
else
compare = .false.
end if
end function
end module
This is the errors I get while trying to compile it:
$ flang -c data_model.f95
$ flang -c qsort.f95
F90-S-0072-Assignment operation illegal to external procedure a (qsort.f95: 79)
F90-S-0076-Subscripts specified for non-array variable a (qsort.f95: 80)
F90-S-0076-Subscripts specified for non-array variable a (qsort.f95: 84)
F90-S-0076-Subscripts specified for non-array variable a (qsort.f95: 85)
F90-S-0076-Subscripts specified for non-array variable a (qsort.f95: 85)
F90-S-0076-Subscripts specified for non-array variable a (qsort.f95: 86)
0 inform, 0 warnings, 6 severes, 0 fatal for partition
$
Edit 1: I have modified the source code with the subroutine based code, which makes more sense as we want to modify the arguments.
Edit 2: modifying the definition of a to type(tape),intent(in out) :: a(:) in both quicksort and partition subroutines make the module to compile without errors – see comments.
I saw that you got unblocked with your problem with the help of the comments, but let me give you some suggestions to make your implementation more modular, easy to use and modern.
Disclaimer: Some of my suggestions might need a more recent Fortran version than 95.
You can improve your timestamp type definition by providing overloads for the relational operators.
type :: timestamp
integer :: year, month, day, hour = 0, minute = 0, second = 0
contains
procedure, private :: eq, ne, gt, ge, lt, le
generic :: operator(==) => eq
generic :: operator(/=) => ne
generic :: operator(>) => gt
generic :: operator(>=) => ge
generic :: operator(<) => lt
generic :: operator(<=) => le
end type
(A subtle change there is that I have default values for hour, minute and second. So you can instantiate like this: timestamp(2021,5,22))
To get this working, you just need to provide implementations for the functions eq, ne, gt, ge, lt, le available in the module you define your type. Note that, when writing a generic type bound procedure, you must declare your bound parameter as class(timestamp) instead of type(timestamp).
elemental function lt(left, right) result(result)
class(timestamp), intent(in) :: left, right
logical :: result
result = compare(left, right) < 0
end function
elemental function compare(this, other) result(result)
class(timestamp), intent(in) :: this, other
integer :: result
if (this%year /= other%year) then
result = sign(1, this%year - other%year)
else if (this%month /= other%month) then
result = sign(1, this%month - other%month)
else if (this%day /= other%day) then
result = sign(1, this%day - other%day)
else if (this%hour /= other%hour) then
result = sign(1, this%hour - other%hour)
else if (this%minute /= other%minute) then
result = sign(1, this%minute - other%minute)
else if (this%second /= other%second) then
result = sign(1, this%second - other%second)
else
result = 0
end if
end function
Another good practice you can implement is to control access of your module elements by using public and private.
module data_model
implicit none
public :: timestamp, tape
private
type :: timestamp
! (...)
end type
type :: tape
type(timestamp) :: ts
integer :: value1, value2
end type
contains
! (...) implementations of eq, ne, gt, ge, lt, le
end
Then, when you use this module from another program unit, only the public names will be available. You can also use only specific name with the use only clause:
module qsort
use data_model, only: tape
implicit none
public :: quicksort
private
contains
! (...) your quicksort implementation
end
Finally, let me suggest some tweaks on your quicksort implementation.
First, I suggest that you don't need to pass around the boundaries lo and hi everywhere together with your array. One of the most distinctive features of Fortran is how easy it is to operate on array segments. You can call the quicksort procedure on a contiguous portion of your array, and the procedure can work on it in a boundaries-agnostic way, if you use assumed-shape arrays, like this: type(tape) :: a(:). Inside the procedure, the array segment is rebounded to start on index 1, no matter what are the bounds on the call site.
Besides that, as I mentioned in the comments, you don't need to declare the array argument as allocatable in this case. Even if the original array you are passing is originally allocatable, you can pass an allocatable array to a procedure without declaring the argument as allocatable in the procedure, it will be handled as a normal array. It only makes sense to declare the argument as allocatable if you plan to allocate/deallocate inside the procedure.
pure recursive subroutine quicksort(a)
type(tape), intent(inout) :: a(:)
integer :: p
if (size(a) == 0) return
call partition(a, p)
call quicksort(a(:p-1))
call quicksort(a(p+1:))
end
I declared this procedure as pure in this case, but that would depend on your specific use case. Making it pure helps me to remind declaring intents correctly and have well-though procedures (and there is a performance gain in some cases), but this brings many restrictions (like not being able to print inside the procedure). You can search for pure procedures to learn more.
Both quicksort and partition are implemented as subroutines here. I like to do this way always that the procedure performs important side-effects, like updates on the passed argument. If I need a returned value, I can have an intent(out) argument, like the argument out in partition, that returns the pivot position.
pure subroutine partition(a, out)
type(tape), intent(inout) :: a(:)
integer, intent(out) :: out
integer :: i, j
i = 1
do j = 1, size(a)
if (a(j)%ts < a(size(a))%ts) then
call swap(a(i), a(j))
i = i + 1
end if
end do
call swap(a(i), a(size(a)))
out = i
end
elemental subroutine swap(a, b)
type(tape), intent(inout) :: a, b
type(tape) :: temp
temp = a
a = b
b = temp
end
You may note at a(j)%ts < a(size(a))%ts that I am making use of the overloaded operator < to compare two timestamp. This way, the comparison logic belongs to the same module as the type definition.
Finally, you can use the modules and make some tests on your quicksort implementation!
program main
use data_model, only: tape, timestamp
use qsort, only: quicksort
implicit none
type(tape) :: a(8) = [ &
tape(timestamp(2020, 01, 08), 0, 0), &
tape(timestamp(2021, 01, 30), 0, 0), &
tape(timestamp(2020, 01, 06), 0, 0), &
tape(timestamp(2019, 12, 14), 0, 0), &
tape(timestamp(2020, 01, 08), 0, 0), &
tape(timestamp(2020, 05, 05), 0, 0), &
tape(timestamp(2021, 04, 30), 0, 0), &
tape(timestamp(2020, 10, 22), 0, 0) &
]
call quicksort(a(3:7)) ! will sort in place, only from index 3 to 7
call quicksort(a) ! will sort whole array
end
Works like a charm!
This is not an answer directly related to the quicksort algorithm but rather on how to implement type-bound operators.
You can move the compare function inside the data_model module.
This decouples the modules further s.t. the quicksort module only contains the quicksort algorithm.
The compare function can be implemented by a type-bound operator operator(<).
The following shows a quick implementation (only for year/month/day) and it should help you to edit your own code accordingly.
module timestamp_m
implicit none
private
public timestamp
type timestamp
integer :: y, m, d
contains
generic :: operator(<) => timestamp_lt
procedure, private :: timestamp_lt
end type
contains
logical function timestamp_lt(this, rhs) result(tf)
!! result of: this < rhs
class(timestamp), intent(in) :: this
type(timestamp), intent(in) :: rhs
! compare year
if (this%y < rhs%y) then
tf = .true.
else if (this%y > rhs%y) then
tf = .false.
else
! compare month
if (this%m < rhs%m) then
tf = .true.
else if (this%m > rhs%m) then
tf = .false.
else
! compare day
if (this%d < rhs%d) then
tf = .true.
else
tf = .false.
end if
end if
end if
end function
end module
You will need to adjust one line in your quicksort module:
module qsort
..
subroutine quicksort(a, lo, hi)
..
! if (compare(a(j), pivot)) then ! OLD. replace by:
if (a(j)%ts < pivot%ts) then
..
I am using the procedure in the following code (that I took from here) to a program that I am trying to make run as fast as possible. The procedure is, however, very slow since it is probably optimized for pedagogical purposes not speed.
program tree_sort
! Sorts a file of integers by building a
! tree, sorted in infix order.
! This sort has expected behavior n log n,
! but worst case (input is sorted) n ** 2.
implicit none
type node
integer :: value
type (node), pointer :: left, right
end type node
type (node), pointer :: t ! A tree
integer :: number, ios
nullify (t) ! Start with empty tree
do
read (*, *, iostat = ios) number
if (ios < 0) exit
call insert (t, number) ! Put next number in tree
end do
! Print nodes of tree in infix order
call print_tree (t)
contains
recursive subroutine insert (t, number)
type (node), pointer :: t ! A tree
integer, intent (in) :: number
! If (sub)tree is empty, put number at root
if (.not. associated (t)) then
allocate (t)
t % value = number
nullify (t % left)
nullify (t % right)
! Otherwise, insert into correct subtree
else if (number < t % value) then
call insert (t % left, number)
else
call insert (t % right, number)
end if
end subroutine insert
recursive subroutine print_tree (t)
! Print tree in infix order
type (node), pointer :: t ! A tree
if (associated (t)) then
call print_tree (t % left)
print *, t % value
call print_tree (t % right)
end if
end subroutine print_tree
end program tree_sort
Is there any way to speed it up? I am using the procedure to sequentially add elements to a vector without adding repeated ones (so I changed the else in the insert subroutine to else if (number > t % value) then. Other than that, instead of printing I store the values in a global variable.
Edit:
Here is the actual code:
MODULE MOD_PARAMETERS
USE, INTRINSIC :: ISO_FORTRAN_ENV
IMPLICIT NONE
SAVE
INTEGER(INT32), PARAMETER :: d = 10 ! number of dimensions
INTEGER(INT32), PARAMETER :: L_0 = 5 ! after this adaptive grid kicks in, for L <= L_0 usual sparse grid
INTEGER(INT32), PARAMETER :: L_max = 5 ! maximum level
INTEGER(INT32), PARAMETER :: bound = 1 ! 0 -> for f = 0 at boundary
! 1 -> adding grid points at boundary
! 2 -> extrapolating close to boundary
INTEGER(INT32), PARAMETER :: testing_sample = 10**4
INTEGER(INT32), PARAMETER :: error_sample = 10**2
REAL(REAL64), PARAMETER :: eps = 0.001D0 ! epsilon for adaptive grid
TYPE NODE
INTEGER :: value
TYPE (NODE), POINTER :: left, right
END TYPE NODE
INTEGER(INT32), DIMENSION(:), ALLOCATABLE :: tree_vector
INTEGER(INT32) :: iii
END MODULE MOD_PARAMETERS
SUBROUTINE FF(x,output)
USE MOD_PARAMETERS
IMPLICIT NONE
REAL(REAL64), DIMENSION(d), INTENT(IN) :: x
REAL(REAL64) , INTENT(OUT) :: output
output = 1.0D0/(ABS(0.5D0-SUM(x(:)**4.0D0))+0.1D0)
END SUBROUTINE
SUBROUTINE XX(n,L,i,output)
USE MOD_PARAMETERS
IMPLICIT NONE
INTEGER(INT32) , INTENT(IN) :: n
INTEGER(INT32), DIMENSION(n), INTENT(IN) :: L, i
REAL(REAL64), DIMENSION(n), INTENT(OUT) :: output
INTEGER(INT32) :: j
DO j = 1,n
IF ((bound .EQ. 0) .OR. (bound .EQ. 2)) THEN
output(j) = REAL(i(j),REAL64)/REAL(2**L(j),REAL64)
ELSEIF (bound .EQ. 1) THEN
output(j) = REAL(i(j),REAL64)/REAL(2**MAX(L(j)-1,1),REAL64)
ENDIF
ENDDO
END SUBROUTINE
SUBROUTINE XX_INV(L,x,output)
USE MOD_PARAMETERS
IMPLICIT NONE
INTEGER(INT32), DIMENSION(d), INTENT(IN) :: L
REAL(REAL64), DIMENSION(d), INTENT(IN) :: x
INTEGER(INT32), DIMENSION(d), INTENT(OUT) :: output
INTEGER(INT32) :: j
DO j = 1,d
IF ((bound .EQ. 0) .OR. (bound .EQ. 2)) THEN
output(j) = 2*FLOOR(x(j)*REAL(2**(L(j)-1),REAL64))+1
ELSEIF (bound .EQ. 1) THEN
IF (L(j) .EQ. 2) THEN
IF (x(j) .LT. 0.5D0) THEN
output(j) = 0
ELSE
output(j) = 2
ENDIF
ELSE
output(j) = 2*FLOOR(x(j)*(REAL(2**MAX(L(j)-2,0),REAL64)))+1
ENDIF
ENDIF
ENDDO
END SUBROUTINE
SUBROUTINE BASE(x,L,i,output)
USE MOD_PARAMETERS
IMPLICIT NONE
REAL(REAL64), INTENT(IN) :: x
INTEGER(INT32), INTENT(IN) :: L,i
REAL(REAL64), INTENT(OUT) :: output
IF (bound .EQ. 0) THEN
output = MAX((1.0D0-ABS(x*REAL(2**L,REAL64)-REAL(i,REAL64))),0.0D0)
ELSEIF (bound .EQ. 1) THEN
IF ((L .EQ. 1) .AND. (i .EQ. 1)) THEN
output = 1.0D0
ELSEIF ((L .EQ. 2) .AND. (i .EQ. 0)) THEN
output = MAX(1.0D0-2.0D0*x,0.0D0)
ELSEIF ((L .EQ. 2) .AND. (i .EQ. 2)) THEN
output = MAX(2.0D0*x-1.0D0,0.0D0)
ELSE
output = MAX((1.0D0-ABS(x*REAL(2**(L-1),REAL64)-REAL(i,REAL64))),0.0D0)
ENDIF
ELSEIF (bound .EQ. 2) THEN
IF ((L .EQ. 1) .AND. (i .EQ. 1)) THEN
output = 1.0D0
ELSEIF ((L .GT. 1) .AND. (i .EQ. 1)) THEN
output = MAX(2.0D0-REAL(2**L,REAL64)*x,0.0D0)
ELSEIF ((L .GT. 1) .AND. (i .EQ. (2**L)-1)) THEN
output = MAX(REAL(2**L,REAL64)*x+REAL(1-i,REAL64),0.0D0)
ELSE
output = MAX((1.0D0-ABS(x*REAL(2**L,REAL64)-REAL(i,REAL64))),0.0D0)
ENDIF
ENDIF
END SUBROUTINE
PROGRAM MAIN
USE MOD_PARAMETERS
IMPLICIT NONE
INTEGER(INT32), DIMENSION(d,d) :: ident
REAL(REAL64), DIMENSION(1) :: x1
REAL(REAL64), DIMENSION(d) :: xd
INTEGER(INT32), DIMENSION(2*d) :: temp
INTEGER(INT32), DIMENSION(:,:), ALLOCATABLE :: grid_index, temp_grid_index, grid_index_new, J_index, &
adj_list, temp_adj_list
INTEGER(INT32), DIMENSION(:), ALLOCATABLE :: to_do, to_do_new, to_add_ind
REAL(REAL64), DIMENSION(:), ALLOCATABLE :: coeff, temp_coeff, J_coeff
REAL(REAL64) :: temp_min, temp_max, V, T, B, F
INTEGER(INT32) :: i, k, k1, k2, h, j, L, n, dd, dsize, count, count1, count2, count3, flag, &
first, repeated, add, ind, adj_list_ind
INTEGER(INT32) :: time1, time2, time3, time4, clock_rate, clock_max
INTEGER(INT32), DIMENSION(d) :: LL, ii
REAL(REAL64), DIMENSION(error_sample,d) :: sample_x
REAL(REAL64), DIMENSION(error_sample) :: sample_e, interp1
REAL(REAL64) :: max_error, L2_error
REAL(REAL64), DIMENSION(testing_sample,d) :: x_rand
REAL(REAL64), DIMENSION(testing_sample) :: interp2
TYPE(NODE), POINTER :: tree
! ============================================================================
! EXECUTABLE
! ============================================================================
ident = 0
DO i = 1,d
ident(i,i) = 1
ENDDO
! Initial grid point
dsize = 1
ALLOCATE(grid_index(dsize,2*d),grid_index_new(dsize,2*d),adj_list(dsize,2*d))
grid_index(1,:) = 1
grid_index_new = grid_index
adj_list = 0
ALLOCATE(coeff(0:dsize))
coeff(0) = 0.0D0
xd = 0.5D0
CALL FF(xd,coeff(1))
L = 1
n = SIZE(grid_index_new,1)
ALLOCATE(J_index(n*2*d,2*d))
ALLOCATE(J_coeff(n*2*d))
ALLOCATE(to_add_ind(1))
to_add_ind = 1
CALL RANDOM_NUMBER(sample_x)
sample_e = 0.0D0
CALL SYSTEM_CLOCK (time1,clock_rate,clock_max)
DO WHILE (L .LT. L_max)
CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
L = L+1
n = SIZE(grid_index_new,1)
count = 0
first = 1
DEALLOCATE(J_index,J_coeff)
ALLOCATE(J_index(n*2*d,2*d))
ALLOCATE(J_coeff(n*2*d))
J_index = 0
J_coeff = 0.0D0
DO k = 1,n
adj_list_ind = 0
DO i = 1,d
DO j = 1,2
IF ((bound .EQ. 0) .OR. (bound .EQ. 2)) THEN
temp = grid_index_new(k,:)+(/ident(i,:),ident(i,:)*(grid_index_new(k,d+i)-(-1)**j)/)
ELSEIF (bound .EQ. 1) THEN
IF (grid_index_new(k,i) .EQ. 1) THEN
temp = grid_index_new(k,:)+(/ident(i,:),ident(i,:)*(-(-1)**j)/)
ELSE
temp = grid_index_new(k,:)+(/ident(i,:),ident(i,:)*(grid_index_new(k,d+i)-(-1)**j)/)
ENDIF
ENDIF
CALL XX(d,temp(1:d),temp(d+1:2*d),xd)
temp_min = MINVAL(xd)
temp_max = MAXVAL(xd)
IF ((temp_min .GE. 0.0D0) .AND. (temp_max .LE. 1.0D0)) THEN
IF (first .EQ. 1) THEN
first = 0
count = count+1
J_index(count,:) = temp
V = 0.0D0
DO k1 = 1,SIZE(grid_index,1)
T = 1.0D0
DO k2 = 1,d
CALL XX(1,temp(k2),temp(d+k2),x1)
CALL BASE(x1(1),grid_index(k1,k2),grid_index(k1,k2+d),B)
T = T*B
ENDDO
V = V+coeff(k1)*T
ENDDO
CALL FF(xd,F)
J_coeff(count) = F-V
adj_list(to_add_ind(k),adj_list_ind+1) = dsize+count
adj_list_ind = adj_list_ind+1
ELSE
repeated = 0
DO h = 1,count
IF (SUM(ABS(J_index(h,:)-temp)) .EQ. 0) THEN
repeated = 1
adj_list(to_add_ind(k),adj_list_ind+1) = dsize+h
adj_list_ind = adj_list_ind+1
ENDIF
ENDDO
IF (repeated .EQ. 0) THEN
count = count+1
J_index(count,:) = temp
V = 0.0D0
DO k1 = 1,SIZE(grid_index,1)
T = 1.0D0
DO k2 = 1,d
CALL XX(1,temp(k2),temp(d+k2),x1)
CALL BASE(x1(1),grid_index(k1,k2),grid_index(k1,k2+d),B)
T = T*B
ENDDO
V = V+coeff(k1)*T
ENDDO
CALL FF(xd,F)
J_coeff(count) = F-V
adj_list(to_add_ind(k),adj_list_ind+1) = dsize+count
adj_list_ind = adj_list_ind+1
ENDIF
ENDIF
ENDIF
ENDDO
ENDDO
ENDDO
ALLOCATE(temp_grid_index(dsize,2*d))
ALLOCATE(temp_coeff(dsize))
temp_grid_index = grid_index
temp_coeff = coeff
DEALLOCATE(grid_index,coeff)
ALLOCATE(grid_index(dsize+count,2*d))
ALLOCATE(coeff(0:dsize+count))
grid_index(1:dsize,:) = temp_grid_index
coeff(0:dsize) = temp_coeff
DEALLOCATE(temp_grid_index,temp_coeff)
grid_index(dsize+1:dsize+count,:) = J_index(1:count,:)
coeff(dsize+1:dsize+count) = J_coeff(1:count)
IF (L .LT. L_max) THEN ! put this after error threshhold when implemented
ALLOCATE(temp_adj_list(dsize,2*d))
temp_adj_list = adj_list
DEALLOCATE(adj_list)
ALLOCATE(adj_list(dsize+count,2*d))
adj_list = 0
adj_list(1:dsize,:) = temp_adj_list
DEALLOCATE(temp_adj_list)
ENDIF
dsize = dsize + count
IF (L .LE. L_0) THEN
DEALLOCATE(grid_index_new)
ALLOCATE(grid_index_new(count,2*d))
grid_index_new = J_index(1:count,:)
DEALLOCATE(to_add_ind)
ALLOCATE(to_add_ind(count))
to_add_ind = dsize-count + (/ (h,h=1,count) /)
ELSE
DEALLOCATE(to_add_ind)
ALLOCATE(to_add_ind(count))
add = 0
to_add_ind = 0
DO h = 1,count
IF (ABS(J_coeff(h)) .GT. eps) THEN
add = add + 1
J_index(add,:) = J_index(h,:)
to_add_ind(add) = dsize-count+h
ENDIF
ENDDO
DEALLOCATE(grid_index_new)
ALLOCATE(grid_index_new(add,2*d))
grid_index_new = J_index(1:add,:)
ENDIF
DO i = 1,error_sample
V = 0.0D0
DO k1 = 1,SIZE(grid_index,1)
T = 1.0D0
DO k2 = 1,d
CALL BASE(sample_x(i,k2),grid_index(k1,k2),grid_index(k1,k2+d),B)
T = T*B
ENDDO
V = V+coeff(k1)*T
ENDDO
CALL FF(sample_x(i,:),F)
sample_e(i) = F-V
interp1(i) = V
ENDDO
max_error = MAXVAL(ABS(sample_e))
L2_error = (SUM(sample_e**2.0D0)/REAL(error_sample,REAL64))**0.5D0
CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
WRITE(*,'(A,I5,A,F10.5,A,I8,A,F15.10,A,F15.10)') ' level = ', L,&
' time = ',REAL(time4-time3,REAL64)/REAL(clock_rate,REAL64),&
' grid points = ',SIZE(grid_index,1),&
' max error = ',max_error,&
' L2 error = ',L2_error
ENDDO
!PRINT *, ' '
!PRINT *, ' '
!PRINT *, ' '
!DO i = 1,SIZE(adj_list,1)
! PRINT *, i, adj_list(i,:)
!ENDDO
!PRINT *, ' '
!PRINT *, ' '
!PRINT *, ' '
!DO i = 1,dsize
! PRINT *, i, grid_index(i,:), coeff(i)
!ENDDO
!PRINT *, ' '
!PRINT *, ' '
!PRINT *, ' '
ALLOCATE (to_do(dsize),to_do_new(dsize),tree_vector(dsize))
CALL SYSTEM_CLOCK (time2,clock_rate,clock_max)
PRINT *, ' '
WRITE(*,'(A,F10.5)') ' total time for setup = ', REAL(time2-time1,REAL64)/REAL(clock_rate,REAL64)
! ============================================================================
! Compute interpolated values:
! ============================================================================
IF (testing_sample .EQ. error_sample) THEN
! x_rand = sample_x
ELSE
CALL RANDOM_NUMBER(x_rand)
ENDIF
count1 = 0
count2 = 0
count3 = 0
CALL SYSTEM_CLOCK (time1,clock_rate,clock_max)
DO i = 1,testing_sample
V = 0.0D0
to_do = 0
to_do(1) = 1
to_do_new = 0
k = 1
DO L = 1,L_max
NULLIFY (tree)
tree_vector = 0
CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
DO j = 1,k
ind = to_do(j)
T = 1.0D0
DO dd = 1,d
CALL BASE(x_rand(i,dd),grid_index(ind,dd),grid_index(ind,d+dd),B)
T = T*B
ENDDO
V = V + coeff(ind)*T
ENDDO
CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
count1 = count1 + time4-time3
IF (L .LT. L_max) THEN
n = k
k = 0
DO j = 1,n
IF (adj_list(to_do(j),1) .GT. 0) THEN
DO h = 1,2*d
CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
LL = grid_index(adj_list(to_do(j),h),1:d)
ii = grid_index(adj_list(to_do(j),h),d+1:2*d)
flag = 0
k1 = 1
DO WHILE ((flag .EQ. 0) .AND. (k1 .LE. d))
IF ((bound .EQ. 0) .OR. (bound .EQ. 2)) THEN
k2 = 2*FLOOR(x_rand(i,k1)*REAL(2**(LL(k1)-1),REAL64))+1
ELSEIF (bound .EQ. 1) THEN
IF (LL(k1) .EQ. 2) THEN
IF (x_rand(i,k1) .LT. 0.5D0) THEN
k2 = 0
ELSE
k2 = 2
ENDIF
ELSE
k2 = 2*FLOOR(x_rand(i,k1)*(REAL(2**MAX(LL(k1)-2,0),REAL64)))+1
ENDIF
ENDIF
IF (k2 .NE. ii(k1)) THEN
flag = 1
ENDIF
k1 = k1 +1
ENDDO
CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
count2 = count2 + time4-time3
! CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
IF (flag .EQ. 0) THEN
!IF (MINVAL(ABS(to_do_new(1:MAX(k,1))-adj_list(to_do(j),h))) .GT. 0) THEN
to_do_new(k+1) = adj_list(to_do(j),h)
k = k+1
CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
CALL INSERT(tree,to_do_new(k))
CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
count3 = count3 + time4-time3
!ENDIF
ENDIF
! CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
! count3 = count3 + time4-time3
ENDDO
ENDIF
ENDDO
CALL SYSTEM_CLOCK (time3,clock_rate,clock_max)
iii = 0
CALL PRINT_TREE(tree)
to_do = tree_vector
CALL SYSTEM_CLOCK (time4,clock_rate,clock_max)
count3 = count3 + time4-time3
!to_do = to_do_new
to_do_new = 0
ENDIF
ENDDO
interp2(i) = V
ENDDO
CALL SYSTEM_CLOCK (time2,clock_rate,clock_max)
PRINT *, ' '
WRITE(*,'(A,F10.5,A,I10)') ' time for interpolation = ', REAL(time2-time1,REAL64)/REAL(clock_rate,REAL64),&
' points = ', testing_sample
PRINT *, ' '
WRITE(*,'(A,F10.5)') ' time for base = ', REAL(count1,REAL64)/REAL(clock_rate,REAL64)
PRINT *, ' '
WRITE(*,'(A,F10.5)') ' time for x_inv = ', REAL(count2,REAL64)/REAL(clock_rate,REAL64)
PRINT *, ' '
WRITE(*,'(A,F10.5)') ' time for repeated = ', REAL(count3,REAL64)/REAL(clock_rate,REAL64)
!PRINT *, ' '
!WRITE(*,'(A,F20.15)') ' check = ', MAXVAL(ABS(interp2-interp1))
DEALLOCATE(grid_index,grid_index_new,J_index,coeff,J_coeff,adj_list,to_do,to_do_new,to_add_ind,tree_vector)
CONTAINS
RECURSIVE SUBROUTINE INSERT(tree,number)
TYPE(NODE), POINTER :: tree
INTEGER(INT32), INTENT(IN) :: number
IF (.NOT. ASSOCIATED(tree)) THEN
ALLOCATE(tree)
tree%value = number
NULLIFY(tree%left)
NULLIFY(tree%right)
ELSEIF (number .LT. tree%value) THEN
CALL INSERT (tree%left,number)
ELSEIF (number .GT. tree%value) THEN
CALL INSERT(tree%right,number)
ENDIF
END SUBROUTINE INSERT
RECURSIVE SUBROUTINE PRINT_TREE(tree)
TYPE (NODE), POINTER :: tree
IF (ASSOCIATED(tree)) THEN
CALL PRINT_TREE(tree%left)
iii = iii+1
tree_vector(iii) = tree%value
CALL PRINT_TREE (tree%right)
END IF
END SUBROUTINE PRINT_TREE
END PROGRAM
I am using optimization O3 but otherwise no flags. In my computer the time for repeated (which is where I am using the binary tree) is 18.3 seconds, whereas if I use an alternative method that is commented in the version (with MINVAL) it only takes 3.6 seconds.
This question already has an answer here:
Random numbers keep coming out the same, despite random seed being used
(1 answer)
Closed last year.
I am attempting to write a Montecarlo algorithm to simulate interaction between agents in a population. This algorithm needs to call two random numbers at each iteration (say, 10^9 iterations).
My issue here is that everytime I change the seed (to obtain different realizations), the RNG is giving me the same output (same Montecarlo events). I have tried different ways of implementing it with to no avail.
I am completely new to Fortran and copying this code from MATLAB. Am I doing something wrong in the way I'm implementing this code?
Below is what I tried:
program Gillespie
implicit none
integer*8, parameter :: n_max = 10.0**8 ! max. number of iterations
integer*8 :: t_ext, I_init, S_init, jump, S_now, I_now, i, u
real*8 :: t, N, a0, tau, st, r1, r2
real, parameter :: beta = 1000
real, parameter :: gammma = 99.98
real, parameter :: mu = 0.02
real, parameter :: R0 = beta/(gammma+mu)
integer :: seed = 11
real, dimension(n_max) :: S_new ! susceptible pop. array
real, dimension(n_max) :: I_new ! infected pop. array
real, dimension(n_max) :: t_new ! time array
real, dimension(5) :: events ! events array
open(unit=3, file='SIS_output.dat')
t = 0 ! initial time
N = 40 ! initial population size
jump = 1 ! time increment (save in uniform increments)
u = 2
t_ext = 0 ! extiction time
I_init = 2 ! initial infected pop.
S_init = N-I_init ! initial susceptible pop.
S_now = S_init
I_now = I_init
S_new(1) = S_init ! initialize susceptibles array
I_new(1) = I_init ! initialize infected array
t_new(1) = t ! initialize time array
write(3,*) t_new(1), S_new(1), I_new(1) ! write i.c. to array
call random_seed(seed)
do i=2, n_max
call random_number(r1)
call random_number(r2)
events(1) = mu*N ! Birth(S)
events(2) = mu*S_now ! Death(S)
events(3) = mu*I_now ! Death(I)
events(4) = (beta*S_now*I_now)/N ! Infection
events(5) = gammma*I_now ! Recovery
a0 = events(1)+events(2)+events(3)+events(4)+events(5)
tau = LOG(1/r1)*(1/a0) ! time increment
t = t + tau ! update time
st = r2*a0 ! stochastic time???
! update the populations
if (st .le. events(1)) then
S_now = S_now + 1
else if (st .gt. events(1) .AND. st .le.
#(events(1) + events(2))) then
S_now = S_now - 1
else if (st .gt. (events(1) + events(2)) .AND. st .le.
#(events(1) + events(2) + events(3))) then
I_now = I_now - 1
else if (st .gt. (events(1) + events(2) + events(3)) .AND.
#st .le. (events(1) + events(2) + events(3) + events(4))) then
S_now = S_now - 1
I_now = I_now + 1
else
S_now = S_now + 1
I_now = I_now - 1
end if
! save time in uniform increments
if(t .ge. jump) then
t_new(u) = floor(t)
S_new(u) = S_now
I_new(u) = I_now
write(3,*) t_new(u), S_new(u), I_new(u)
jump = jump+1
u = u+1
end if
! write(3,*) t_new(i), S_new(i), I_new(i)
!N = S_now + I_now ! update population post event
if(I_now .le. 0) then ! if extinct, terminate
print *, "extinct"
goto 2
end if
end do
2 end program Gillespie
I appreciate all input. Thanks.
Your call
call random_seed(seed)
is strange. I thought it should not be allowed without a keyword argument, but it actually is inquiring for the size of the random seed array.
For a proper way of initializing seed see the example in
https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fSEED.html
I have been tried to do the Morris Pratt table and the code is basically this one in C:
void preMp(char *x, int m, int mpNext[]) {
int i, j;
i = 0;
j = mpNext[0] = -1;
while (i < m) {
while (j > -1 && x[i] != x[j])
j = mpNext[j];
mpNext[++i] = ++j;
}
}
and here is where i get so far in Fortran
program MP_ALGORITHM
implicit none
integer, parameter :: m=4
character(LEN=m) :: x='abac'
integer, dimension(4) :: T
integer :: i, j
i=0
T(1)=-1
j=-1
do while(i < m)
do while((j > -1) .AND. (x(i+1:i+1) /= (x(j+i+1:j+i+1))))
j=T(j)
end do
i=i+1
j=j+1
T(i)=j
end do
print *, T(1:)
end program MP_ALGORITHM
and the problem is i think i am having the wrong output.
for x=abac it should be (?):
a b a c
-1 0 1 0
and my code is returning 0 1 1 1
so, what i've done wrong?
The problem here is that C indices start from zero, but Fortran indices start from one. You can try to adjust the index for every array acces by one, but this will get unwieldy.
The Morris-Pratt table itself is an array of indices, so it should look different in C and Fortran: The Fortran array should have one-based indices and it should use zero as invalid index.
Together with the error that chw21 pointed out, your function might look like this:
subroutine kmp_table(x, t)
implicit none
character(*), intent(in) :: x
integer, dimension(:), intent(out) :: t
integer m
integer :: i, j
m = len(x)
i = 1
t(1) = 0
j = 0
do while (i < m)
do while(j > 0 .and. x(i:i) /= x(j:j))
j = t(j)
end do
i = i + 1
j = j + 1
t(i) = j
end do
end subroutine
You can then use it in the Morris-Pratt algorithm as taken straight from the Wikipedia page with adjustment for Fortran indices:
function kmp_index(S, W) result(res)
implicit none
integer :: res
character(*), intent(in) :: S ! text to search
character(*), intent(in) :: W ! word to find
integer :: m ! zero-based offset in S
integer :: i ! one-based offset in W and T
integer, dimension(len(W)) :: T ! KMP table
call kmp_table(W, T)
i = 1
m = 0
do while (m + i <= len(S))
if (W(i:i) == S(m + i:m + i)) then
if (i == len(W)) then
res = m + 1
return
end if
i = i + 1
else
if (T(i) > 0) then
m = m + i - T(i)
i = T(i)
else
i = 1
m = m + 1
end if
end if
end do
res = 0
end function
(The index m is zero-based here, because t is only ever used in conjunction with i in S(m + i:m + i). Adding two one-based indices will yield an offset of one, whereas keeping m zero-based makes this a neutral addition. m is a local variable that isn't exposed to code from the outside.)
Alternatively, you could make your Fortran arrays zero-based by specifying a lower bound of zero for your string and array. That will clash with the useful character(*) notation, though, which always uses one-based indexing. In my opinion, it is better to think about the whole algorithm in the typical one-based indexing scheme of Fortran.
this site isn't really a debugging site. Normally I would suggest you have a look at how to debug code. It didn't take me very long to go through your code with a pen and paper and verify that that is indeed the table it produces.
Still, here are a few pointers:
The C code compares x[i] and x[j], but you compare x[i] and x[i+j] in your Fortran code, more or less.
Integer arrays usually also start at index 1 in Fortran. So just like adding one to the index in the x String, you also need to add 1 every time you access T anywhere.