Operations with big real numbers in Fortran - algorithm

I wrote a Fortran code that calculates the ith-permutation of a given list {1,2,3,...,n}, without computing all the others, that are n! I needed that in order to find the ith-path of the TSP (Travelling salesman problem).
When n! is big, the code gives me some error and I tested that the ith-permutation found is not the exact value. For n=10, there are not problems at all, but for n=20, the code crashes or wrong values are found. I think this is due to errors that Fortran makes operating with big numbers (sums of big numbers).
I use Visual Fortran Ultimate 2013. In attached you find the subroutine I use for my goal. WeightAdjMatRete is the distance matrix between each pair of knots of the network.
! Fattoriale
RECURSIVE FUNCTION factorial(n) RESULT(n_factorial)
IMPLICIT NONE
REAL, INTENT(IN) :: n
REAL :: n_factorial
IF(n>0) THEN
n_factorial=n*factorial(n-1)
ELSE
n_factorial=1.
ENDIF
ENDFUNCTION factorial
! ith-permutazione di una lista
SUBROUTINE ith_permutazione(lista_iniziale,n,i,ith_permutation)
IMPLICIT NONE
INTEGER :: k,n
REAL :: j,f
REAL, INTENT(IN) :: i
INTEGER, DIMENSION(1:n), INTENT(IN) :: lista_iniziale
INTEGER, DIMENSION(1:n) :: lista_lavoro
INTEGER, DIMENSION(1:n), INTENT(OUT) :: ith_permutation
lista_lavoro=lista_iniziale
j=i
DO k=1,n
f=factorial(REAL(n-k))
ith_permutation(k)=lista_lavoro(FLOOR(j/f)+1)
lista_lavoro=PACK(lista_lavoro,MASK=lista_lavoro/=ith_permutation(k))
j=MOD(j,f)
ENDDO
ENDSUBROUTINE ith_permutazione
! Funzione modulo, adattata
PURE FUNCTION mood(k,modulo) RESULT(ris)
IMPLICIT NONE
INTEGER, INTENT(IN) :: k,modulo
INTEGER :: ris
IF(MOD(k,modulo)/=0) THEN
ris=MOD(k,modulo)
ELSE
ris=modulo
ENDIF
ENDFUNCTION mood
! Funzione quoziente, adattata
PURE FUNCTION quoziente(a,p) RESULT(ris)
IMPLICIT NONE
INTEGER, INTENT(IN) :: a,p
INTEGER :: ris
IF(MOD(a,p)/=0) THEN
ris=(a/p)+1
ELSE
ris=a/p
ENDIF
ENDFUNCTION quoziente
! Vettori contenenti tutti i payoff percepiti dagli agenti allo state vector attuale e quelli ad ogni sua singola permutazione
SUBROUTINE tuttipayoff(n,m,nodi,nodi_rete,sigma,bvector,MatVecSomma,VecPos,lista_iniziale,ith_permutation,lunghezze_percorso,WeightAdjMatRete,array_perceived_payoff_old,array_perceived_payoff_neg)
IMPLICIT NONE
INTEGER, INTENT(IN) :: n,m,nodi,nodi_rete
INTEGER, DIMENSION(1:nodi), INTENT(IN) :: sigma
INTEGER, DIMENSION(1:nodi), INTENT(OUT) :: bvector
REAL, DIMENSION(1:m,1:n), INTENT(OUT) :: MatVecSomma
REAL, DIMENSION(1:m), INTENT(OUT) :: VecPos
INTEGER, DIMENSION(1:nodi_rete), INTENT(IN) :: lista_iniziale
INTEGER, DIMENSION(1:nodi_rete), INTENT(OUT) :: ith_permutation
REAL, DIMENSION(1:nodi_rete), INTENT(OUT) :: lunghezze_percorso
REAL, DIMENSION(1:nodi_rete,1:nodi_rete), INTENT(IN) :: WeightAdjMatRete
REAL, DIMENSION(1:nodi), INTENT(OUT) :: array_perceived_payoff_old,array_perceived_payoff_neg
INTEGER :: i,j,k
bvector=sigma
FORALL(i=1:nodi,bvector(i)==-1)
bvector(i)=0
ENDFORALL
FORALL(i=1:m,j=1:n)
MatVecSomma(i,j)=bvector(m*(j-1)+i)*(2.**REAL(n-j))
ENDFORALL
FORALL(i=1:m)
VecPos(i)=1.+SUM(MatVecSomma(i,:))
ENDFORALL
DO k=1,nodi
IF(VecPos(mood(k,m))<=factorial(REAL(nodi_rete))) THEN
CALL ith_permutazione(lista_iniziale,nodi_rete,VecPos(mood(k,m))-1.,ith_permutation)
FORALL(i=1:(nodi_rete-1))
lunghezze_percorso(i)=WeightAdjMatRete(ith_permutation(i),ith_permutation(i+1))
ENDFORALL
lunghezze_percorso(nodi_rete)=WeightAdjMatRete(ith_permutation(nodi_rete),ith_permutation(1))
array_perceived_payoff_old(k)=(1./SUM(lunghezze_percorso))
ELSE
array_perceived_payoff_old(k)=0.
ENDIF
IF(VecPos(mood(k,m))-SIGN(1,sigma(m*(quoziente(k,m)-1)+mood(k,m)))*2**(n-quoziente(k,m))<=factorial(REAL(nodi_rete))) THEN
CALL ith_permutazione(lista_iniziale,nodi_rete,VecPos(mood(k,m))-SIGN(1,sigma(m*(quoziente(k,m)-1)+mood(k,m)))*2**(n-quoziente(k,m))-1.,ith_permutation)
FORALL(i=1:(nodi_rete-1))
lunghezze_percorso(i)=WeightAdjMatRete(ith_permutation(i),ith_permutation(i+1))
ENDFORALL
lunghezze_percorso(nodi_rete)=WeightAdjMatRete(ith_permutation(nodi_rete),ith_permutation(1))
array_perceived_payoff_neg(k)=(1./SUM(lunghezze_percorso))
ELSE
array_perceived_payoff_neg(k)=0.
ENDIF
ENDDO
ENDSUBROUTINE tuttipayoff

Don't use floating-point numbers to represent factorials; factorials are products of integers and are therefore best represented as integers.
Factorials grow big fast, so it may be tempting to use reals, because reals can represent huge numbers like 1.0e+30. But floating-point numbers are precise only with relation to their magnitude; their mantissa still has a limited size, they can be huge because their exponents may be huge.
A 32-bit real can represent exact integers up to about 16 million. After that, only every even integer can be represented up to 32 million and every fourth integer up to 64 million. 64-bit integers are better, because they can represent exact integers up to 9 quadrillion.
64-bit integers can go 1024 times further: They can represent 2^63 or about 9 quintillion (9e+18) integers. That is enough to represent 20!:
20! = 2,432,902,008,176,640,000
2^63 = 9,223,372,036,854,775,808
Fortran allows you to select a kind of integer based on the decimal places it should be able to represent:
integer, (kind=selected_int_kind(18))
Use this to do your calculations with 64-bit integers. This will give you factorials up to 20!. It won't go further than that, though: Most machines support only integers up to 64 bit, so selected_int_kind(19) will give you an error.
Here's the permutation part of your program with 64-bit integers. Note how all the type conversions ald floors and ceilings disappear.
program permute
implicit none
integer, parameter :: long = selected_int_kind(18)
integer, parameter :: n = 20
integer, dimension(1:n) :: orig
integer, dimension(1:n) :: perm
integer(kind=long) :: k
do k = 1, n
orig(k) = k
end do
do k = 0, 2000000000000000_long, 100000000000000_long
call ith_perm(perm, orig, n, k)
print *, k
print *, perm
print *
end do
end program
function fact(n)
implicit none
integer, parameter :: long = selected_int_kind(18)
integer(kind=long) :: fact
integer, intent(in) :: n
integer :: i
fact = 1
i = n
do while (i > 1)
fact = fact * i
i = i - 1
end do
end function fact
subroutine ith_perm(perm, orig, n, i)
implicit none
integer, parameter :: long = selected_int_kind(18)
integer, intent(in) :: n
integer(kind=long), intent(in) :: i
integer, dimension(1:n), intent(in) :: orig
integer, dimension(1:n), intent(out) :: perm
integer, dimension(1:n) :: work
integer :: k
integer(kind=long) :: f, j
integer(kind=long) :: fact
work = orig
j = i
do k = 1, n
f = fact(n - k)
perm(k) = work(j / f + 1)
work = pack(work, work /= perm(k))
j = mod(j, f)
end do
end subroutine ith_perm

Related

Logarithm of a random number

I wrote a fortran program to code this algorithm (https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_A-ExpJ). It works on my computer. But after I asked these two questions (Intrinsic Rand, what is the interval [0,1] or ]0,1] or [0,1[ and How far can we trust calculus with infinity?), I think I could have a problem with log(random()) because call random_number(Xw); Xw = log(Xw) is used.
Indeed, random_number(Xw) could return 0 and log(0)=-infinity.
Therefore, I plan to modify this line as follow call random_number(Xw); Xw = log(1-Xw) to change the random value interval from [0,1[ to ]0,1].
Is it a good idea or is there a best solution ?
While mathematically it is true that if X is uniformly distributed on (the real interval) [0,1) then 1-X is uniformly distributed on (0,1], this does not particularly help you.
As noted in the description of the algorithm to which you link, the underlying assumption is that the base uniform distribution is over the interval (0,1). This is not the same as (0,1].
You can use rejection sampling to generate X uniformly over (0,1) from random_number() (which is [0,1)): throw away all zero occurrences.
Not a good idea. If you want your algorithm to be stable, you need to define bounds. your log function represents a priority, it can likely be as low as you want, but it must be a number. You can bind it to numerical precision:
program t
use iso_fortran_env
implicit none
real(real64), parameter :: SAFE = exp(-0.5d0*huge(0.0_real64))
print *, log(randoms_in_range(100,SAFE,1.0_real64))
contains
elemental real(real64) function in_range(f,low,hi) result(x)
real(real64), intent(in) :: f ! in: [0:1]
real(real64), intent(in) :: low,hi
real(real64) :: frac
frac = max(min(f,1.0_real64),0.0_real64)
x = low+frac*(hi-low)
end function in_range
real(real64) function random_in_range(low,hi) result(x)
real(real64), intent(in) :: low,hi
call random_number(x) ! [0,1]
x = in_range(x,low,hi) ! [low,hi]
end function random_in_range
function randoms_in_range(n,low,hi) result(x)
integer , intent(in) :: n
real(real64), intent(in) :: low,hi
real(real64) :: x(n)
call random_number(x) ! [0,1]
x = in_range(x,low,hi) ! [low,hi]
end function randoms_in_range
end program

LAPACK Eigenvectors Sign Inverted for Similar Matrices

I am using LAPACK to perform some eigendecomposition on two relatively small matrices (15x15, in this case). These matrices are from an optimisation problem, but their origin is not of great importance. They are both quite similar as it is a small step in an optimisation algorithm, so I would expect their eigenvectors and eigenvalues to be of similar magnitude, and certainly the same sign. However, after obtaining the eigenvectors of both matrices using LAPACK's DSYEVD (I have also tried DSYEV with the same result), I find that some eigenvectors show expected similarity, but others have total inversion of their signs which is rather strange. WeTransfer link for these matrices: https://wetransfer.com/downloads/784e73883eae7c63aa06aa70b64d462b20220831121100/b7519a.
I am using a module for maths that contains two functions, one for eigenvalues and one for eigenvectors.
MODULE math
implicit none
contains
function EVALS(matrix, n) result(eigenvals)
! Here, the eigenvalues from a square, symmetric, real matrix are calculated.
! The eigenvectors associated with said eigenvalues are not given as a result, but can be obtained with the function EVEVS.
!
! ARGUMENTS: matrix : 2D array containing the matrix which the eigenvalues of will be calculated.
! n : integer which represents the number of rows/columns (it doesn't matter which as the matrix is square).
implicit none
integer(i4b), intent(in) :: n
integer(i4b) :: LDA, LWORK, LIWORK, INFO
real(dp), intent(in) :: matrix(n, n)
real(dp), allocatable :: WORK(:), IWORK(:)
real(dp) :: eigenvals(n), work_mat(n,n)
character :: JOBZ, UPLO
! Assigning working matrix...
work_mat(:,:) = 0.0
work_mat(:,:) = matrix(:,:)
! Initialising some values....
JOBZ = 'N'
UPLO = 'U'
LDA = n
INFO = 0
! Allocating working array...
LWORK = MAX(1, (1 + 6*n + 2*n**2))
LIWORK = MAX(1, (3 + 5*n))
allocate(WORK(LWORK))
allocate(IWORK(LIWORK))
WORK(:) = 0
IWORK(:) = 0
! Obtaining eigenvalues...
eigenvals(:) = 0.0
call DSYEVD(JOBZ, UPLO, n, work_mat, LDA, eigenvals, WORK, LWORK, IWORK, LIWORK, INFO)
deallocate(WORK)
end function EVALS
function EVECS(matrix, n) result(eigenvecs)
! Here, the eigenvectors of a square, symmetric, real matrix are calculated.
! The eigenvalues associated with said eigenvectors are not given as a result, but can be obtained with the function EVALS.
!
! ARGUMENTS: matrix : 2D array containing the matrix which the eigenvalues of will be calculated.
! n : integer which represents the number of rows/columns (it doesn't matter which as the matrix is square).
implicit none
integer(i4b), intent(in) :: n
integer(i4b) :: LDA, LWORK, LIWORK, INFO
real(dp), intent(in) :: matrix(n, n)
real(dp), allocatable :: WORK(:), IWORK(:)
real(dp) :: eigenvecs(n, n), eigenvals(n), work_mat(n,n)
character :: JOBZ, UPLO
! Assigning working matrix...
work_mat(:,:) = 0.0
work_mat(:,:) = matrix(:,:)
! Initialising some values....
JOBZ = 'V'
UPLO = 'U'
LDA = n
INFO = 0
! Allocating working array....
LWORK = MAX(1, (1 + 6*n + 2*n**2))
LIWORK = MAX(1, (3 + 5*n))
allocate(WORK(LWORK))
allocate(IWORK(LIWORK))
WORK(:) = 0
IWORK(:) = 0
! Obtaining eigenvectors...
eigenvals(:) = 0.0
call DSYEVD(JOBZ, UPLO, n, work_mat, LDA, eigenvals, WORK, LWORK, IWORK, LIWORK, INFO)
eigenvecs(:,:) = 0.0
eigenvecs(:,:) = work_mat(:,:)
deallocate(WORK)
end function EVECS
END MODULE math
Now, below is an minimal example of how these functions are used within another source code file. I have printed two eigenvectors (one from each matrix) which are very similar in their values, but the signs are opposite. If you were to look at other corresponding eigenvectors, some are more or less similar (as expected), and others show this same opposite signs behaviour.
program eigen_test
use math
implicit none
integer(i4b) :: i, j
integer(i4b), parameter :: npr=15
real(dp) :: mat_a(npr,npr), mat_b(npr,npr)
real(dp) :: eigenvals_a(npr), eigenvals_b(npr)
real(dp) :: eigenvecs_a(npr,npr), eigenvecs_b(npr,npr)
open(10, file="mat_a")
read(10,*) ((mat_a(i,j), j=1,npr), i=1,npr)
open(11, file="mat_b")
read(11,*) ((mat_b(i,j), j=1,npr), i=1,npr)
eigenvals_a = EVALS(mat_a, npr)
eigenvecs_a = EVECS(mat_a, npr)
eigenvals_b = EVALS(mat_b, npr)
eigenvecs_b = EVECS(mat_b, npr)
print *, eigenvecs_a(:,4)
print *, eigenvecs_b(:,4)
end program eigen_test
Perhaps this is a problem with my eigendecomposition understanding, or maybe I am not using the LAPACK function correctly, but I hope the problem is clear and reproducible.
Thanks in advance!

Big overhead when subroutine is in separate module versus in the same file as the main program

I am evaluating the overhead cost (in wall clock time) of some features in fortran programs. And I came across the following behavior with GNU fortran, that I did not expect: having the subroutine in the same file as the main program (in the contain region or in a module) versus having the subroutine in a separate module (in separate file) has a big impact.
The simple code that reproduces the behavior is:
I have a subroutine that does a matrix-vector multiplication 250000 times. In the first test, I have a subroutine in the contain region of the main program. In the second test, the same subroutine is in a separate module.
The difference in performance between the two is big.
The subroutine in the contain region of the main program, 10 runs yields
min: 1.249
avg: 1.266
1.275 - 1.249 - 1.264 - 1.279 - 1.266 - 1.253 - 1.271 - 1.251 - 1.269 - 1.284
The subroutine in separate module, 10 runs yields
min: 1.848
avg: 1.861
1.848 - 1.862 - 1.853 - 1.871 - 1.854 - 1.883 - 1.810 - 1.860 - 1.886 - 1.884
About 50% slower, this factor seems consistent with the size of the matrix as
well as the number of iterations.
those tests are done with gfortran 4.8.5. With gfortran 8.3.0, the program runs a little faster, but the time doubles from the subroutine in the contain section of the main program to the subroutine in a separate module.
Portland group does not have that problem with my test program and it run even faster than the best case of gfortran.
If I read the size of the matrix from an input file (or runtime command line arg) and do dynamic allocation, then the difference in wall clock time goes away and both cases run slower (wall clock time of the subroutine in the separate module, separate file). I suspect that gfortran is able to optimize the main program better if the size of the matrix is known at compile time in the main program.
What am I doing wrong that GNU Compilers do not like, or what is GNU compiler doing poorly? Are there compiling flags to to help gfortran in such cases?
Everything is compiled with optimization -O3
Code (test_simple.f90)
!< #file test_simple.f90
!! simple test
!>
!
program test_simple
!
use iso_fortran_env
use test_mod
!
implicit none
!
integer, parameter :: N = 100
integer, parameter :: N_TEST = 250000
logical, parameter :: GENERATE=.false.
!
real(real64), parameter :: dx = 10.0_real64
real(real64), parameter :: lx = 40.0_real64
!
real(real64), dimension(N,N) :: A
real(real64), dimension(N) :: x, y
real(real64) :: start_time, end_time
real(real64) :: duration
!
integer :: k, loop_idx
!
call make_matrix(A,dx,lx)
x = A(N/2,:)
!
y = 0
call cpu_time( start_time )
call axpy_loop (A, x, y, N_TEST)
!call axpy_loop_in (A, x, y, N_TEST)
!
call cpu_time( end_time )
!
duration = end_time-start_time
!
if( duration < 0.01 )then
write( *, "('Total time:',f10.6)" ) duration
else
write( *, "('Total time:',f10.3)" ) duration
end if
!
write(*,"('Sum = ',ES14.5E3)") sum(y)
!
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop_in (A, x, y, nloop)
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop_in
!
!> #brief Computes the square exponential correlation kernel matrix for
!! a 1D uniform grid, using coordinate vector and scalar parameters
!! #param [in, out] C square matrix of correlation (kernel)
!! #param [in] dx grid spacing
!! #param [in] lx decorrelation length
!!
!! The correlation betwen the grid points i and j is given by
!! \f$ C(i,j) = \exp(\frac{-(xi-xj)^2}{2l_xi l_xj}) \f$
!! where xi and xj are respectively the coordinates of point i and j
!>
subroutine make_matrix(C, dx, lx)
! some definitions of the square correlation
! uses 2l^2 while other use l^2
! l^2 is used here by setting this factor to 1.
real(real64), parameter :: factor = 1.0
!
real(real64), dimension(:,:), intent(in out) :: C
real(real64), intent(in) :: dx
real(real64) lx
! Local variables
real(real64), dimension(size(x)) :: nfacts
real :: dist, denom
integer :: ii, jj
!
do jj=1, size(C,2)
do ii=1, size(C,1)
dist = (ii-jj)*dx
denom = factor*lx*lx
C(ii, jj) = exp( -dist*dist/denom )
end do
! compute normalization factors
nfacts(jj) = sqrt( sum( C(:, jj) ) )
end do
!
! normalize to prevent arbitrary growth in those tests
! where we apply the exponential of the matrix
do jj=1, size(C,2)
do ii=1, size(C,1)
C(ii, jj) = C(ii, jj)/( nfacts(ii)*nfacts(jj) )
end do
end do
! remove the very small
where( C<epsilon(1.) ) C=0.
!
end subroutine make_matrix
!
end program test_simple
!
Code (test_mod.f90)
!> #file test_mod.f90
!! simple operations
!<
!< #brief module for simple operations
!!
!>
module test_mod
use iso_fortran_env
implicit none
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop( A, x, y, nloop )
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop
!
end module test_mod
compile as
gfortran -O3 -o simple test_mod.f90 test_simple.f90
run as
./simple
The combination of flags -march=native and -flto is the solution to the problem, at least on my testing computers. With those options, the program is fully optimized and there is no difference between having the subroutine in the same file as the main program, or in a separate file (separate module). In addition, the runtime is comparable to the runtime with Portland Group compiler. Any one of these options alone did not solved the problem. -march=native alone speeds the in contain version but makes the module version worse.
My biased thinking is that the option -march=native should be default; users doing something else are experienced and know what they are doing so they can add the appropriate option or disable the default, whereas the common user will not easily think of it.
Thank you for all the comments.

Constant shape error in simple code

I have an error with this code and I don't understand why
-"The module or main program array 'u' at (1) must have constant shape."
-Moreover, how can I do this code with a choice of parameters, I mean [U]=vector(N) where I can chose N and it returns me U.
program vector
!declaration
implicit none
integer :: n
integer, parameter :: N=10
real, dimension(N,1) :: U
do n=1,N
U(1,N)=n
end do
print*,U
end program vector
First up, Fortran is caseINsensitive, so n and N are the same thing, and you can't declare two different variables/parameters n and N.
Then you declare U to have shape (N, 1), but seem to use it in the form (1, N).
As for how to auto-generate something like U, you could use something like this:
function vector(n) result(v)
integer, intent(in) :: n
integer :: v(n)
integer :: i
v = [ (i, i=1, n) ]
return
end function vector
One more thing:
You declare U with dimension(1, N) which creates a 2D array with one dimension having length 1. I'm wondering whether you wanted to create a 1D array with range from 1 to N, for which the declaration would need to be dimension(1:N) (or, since Fortran assumes indices start at 1, just dimension(N)).
Addressing the questions in your comment:
The purpose of intent(in) tells the compiler that n is only read, not written to, in this function. Considering that you want to use n as the size of array v, you want that.
With result(v) I tell the compiler that I want to use the name v to refer to the result of the function, not the default (which is the function name). I do this to avoid confusion.
integer :: v(n) is the same as integer, dimension(n) :: v

Make a subroutine faster

In the main part of my Fortran code I have this lines
Gmat=0
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
l=1
do while (G0poles(l,2)/=0)
Gmat(i,j)=Gmat(i,j)+real(G0int(i,j,l))/(omega(k)-G0poles(l,1))**G0poles(l,3)
l=l+1
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
This part is repeated several times in the code so I defined this subroutine
!=============================================================================
SUBROUTINE EvaluateFunc(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
k=1
do while (matrixPol(k,2)/=0)
matrix(i,j)=matrix(i,j)+real(matrixInt(i,j,k))/(z-matrixPol(k,1))**matrixPol(k,3)
k=k+1
enddo
enddo
enddo
call ExtendBySymmetry(matrix)
end
The problem is that if I use this subroutine the evaluation of the output matrix takes much longer (around 5 times slower) than the same evaluation but made directly in the main part of the code.
How can I optimize the code and make the evaluation made with the subroutine faster?
UPDATE: Thanks for the reply. First, the operation **matrixPol(k,3) is present also in the main code, I forgot to write it in the post.
For the comparison (matrixPol(k,2)/=0) there is no problem because actually starting from a certain position of the vector all the elements are exactly zero.
Computing the prefactor outside the i,j loop helped to speed the subroutine. And switching the two indices i and j has practically no effect. Here are the running times of the subroutine
All in the main part
1.688s
my old subroutine
19.063s
with factor outside the loop i,j
5.193s
Switching the indices i and j
5.281s
with dot_product
4.958s
But the subroutine is still more than 2.5 time slower.
Here is a minimal example:
module NAGmodule
implicit none
real*8, allocatable :: hmat(:,:),eval(:),eigmat(:,:)
real*8, allocatable :: G0poles(:,:)
complex*16, allocatable :: G0int(:,:,:)
complex*16, allocatable :: Gmat(:,:)
real*8, allocatable :: omega(:)
integer :: numpoles, halffillingflag, iter, indCompMax
complex*16 :: omegaComplex
real*8, parameter :: pi=3.141592653589793
integer, parameter :: out_unit=10
integer, parameter :: timeFag=1
integer :: counti, countf, count_rate
real :: dt
integer, parameter :: Npoles=1000
real*8, parameter :: U=4
real*8, parameter :: omegamin=-20
real*8, parameter :: omegamax=20
integer, parameter :: Nomega=1500000
integer, parameter :: nsit = 4
integer, parameter :: nup = 1
integer, parameter :: ndw = 1
integer, parameter :: PBCflag=1
integer, parameter :: useSymFlag=1
end module NAGmodule
use nagmodule
integer :: i,j,k,l,m,n,p,q
REAL*8 t1,t2
allocate(hmat(nsit,nsit),eigmat(nsit,nsit),eval(nsit))
allocate(G0poles(Npoles,3),G0int(nsit,nsit,Npoles))
allocate(omega(Nomega))
allocate(Gmat(nsit,nsit))
indCompMax=1
hmat=0.
do i=1,(nsit-1)
hmat(i,i+1)=-1
hmat(i+1,i)=-1
enddo
if (PBCflag==1) then
hmat(1,nsit)=-1
hmat(nsit,1)=-1
end if
call NAGdiag(nsit)
eigmat=hmat
do k=1,Nomega
omega(k)=(omegamax-omegamin)/(Nomega-1)*(k-1)+omegamin
enddo
do k=1,nup
G0poles(k,1)=eval(k)
G0poles(k,2)=-1
G0poles(k,3)=1
enddo
do k=(nup+1),nsit
G0poles(k,1)=eval(k)
G0poles(k,2)=1
G0poles(k,3)=1
enddo
do k=1,nsit
G0int(k,k,k)=1
if ((k==nup).AND.(abs(eval(k)-eval(k+1))<1e-15)) then
G0int(k,k,k)=0.5
G0int(k+1,k+1,k)=0.5
else if ((k==nup+1).AND.(abs(eval(k)-eval(k-1))<1e-15)) then
G0int(k,k,k)=0.5
G0int(k-1,k-1,k)=0.5
end if
enddo
do k=1,nsit
G0int(:,:,k)=matmul(eigmat,matmul(G0int(:,:,k),transpose(eigmat)))
enddo
t1=0
t2=0
do k=1,Nomega
omegaComplex=CMPLX(omega(k),0)
call system_clock(counti,count_rate)
Gmat=0
call EvaluateFunc3(G0poles,G0int,omegaComplex,Gmat)
call system_clock(countf)
dt=REAL(countf-counti)/REAL(count_rate)
t1=t1+dt
call system_clock(counti,count_rate)
Gmat=0
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
l=1
do while (G0poles(l,2)/=0)
Gmat(i,j)=Gmat(i,j)+real(G0int(i,j,l))/(omega(k)-G0poles(l,1))
l=l+1
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
call system_clock(countf)
dt=REAL(countf-counti)/REAL(count_rate)
t2=t2+dt
enddo
write(*,*)'time with subroutine',t1
write(*,*)'time main',t2
stop
end
!=================================================================================
SUBROUTINE EvaluateFunc3(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z
integer :: maxPoles
COMPLEX*16, DIMENSION(Npoles) :: factor
maxPoles=0
do while (matrixPol(maxPoles+1,2)/=0)
maxPoles=maxPoles+1
enddo
factor(:maxPoles) = (1.,0.)/(z-matrixPol(:maxPoles,1))**matrixPol(:maxPoles,3)
do j=1,indCompMax
do i=(j-1)*useSymFlag+1,nsit-(j-1)*useSymFlag
matrix(i,j)=matrix(i,j)+dot_product(matrixInt(i,j,1:maxPoles),factor(1:maxPoles))
enddo
enddo
call ExtendBySymmetry2(matrix)
end
!=================================================================================
SUBROUTINE ExtendBySymmetry2(matrix)
use NAGmodule
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
integer k,i,j,l,m
if ((PBCflag==1).AND.(useSymFlag==1)) then
do i=2,nsit
matrix(2:nsit,i)=matrix(1:nsit-1,i-1)
matrix(1,i)=matrix(nsit,i-1)
enddo
else if ((PBCflag==0).AND.(useSymFlag==1)) then
do j=1,nsit/2
do i=j,nsit-j+1
matrix(j,i)=matrix(i,j)
matrix(nsit-i+1,nsit-j+1)=matrix(i,j)
matrix(nsit-j+1,nsit-i+1)=matrix(i,j)
enddo
enddo
end if
end
!=================================================================================
SUBROUTINE ExtendBySymmetry(matrix)
use NAGmodule
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
integer k,i,j,l,m
if ((PBCflag==1).AND.(useSymFlag==1)) then
do i=2,nsit
matrix(i,2:nsit)=matrix(i-1,1:nsit-1)
matrix(i,1)=matrix(i-1,nsit)
enddo
else if ((PBCflag==0).AND.(useSymFlag==1)) then
do i=1,nsit/2
do j=i,nsit-i+1
matrix(j,i)=matrix(i,j)
matrix(nsit-i+1,nsit-j+1)=matrix(i,j)
matrix(nsit-j+1,nsit-i+1)=matrix(i,j)
enddo
enddo
end if
end
!=================================================================================
SUBROUTINE NAGdiag(nsit1)
use NAGmodule
real*8, allocatable :: WORK(:)
integer, allocatable :: IWORK(:)
CHARACTER JOB, UPLO
EXTERNAL dsyevd
NMAX=nsit1
LDA=NMAX
LWORK=4*NMAX*NMAX+100
LIWORK=5*NMAX
LIWORK=10*NMAX
ALLOCATE(WORK(LWORK),IWORK(LIWORK))
JOB='V'
UPLO='L'
CALL dsyevd(JOB,UPLO,nsit1,hmat,LDA,eval,WORK,LWORK,IWORK,LIWORK,INFO)
IF (INFO.GT.0) THEN
WRITE (*,*) 'Failure to converge.'
stop
endif
deALLOCATE(WORK,IWORK)
return
end`
Due to several edits of the original question, the answer is partially superfluous by now. However, the optimization part is still valid.
The real issue with you code is that you are passing z as a complex number to the subroutine (omegaComplex), while omega(k) is real. This results in the exponentiation and division being performed as complex operations instead of real ones.
Fixing z to be real (and factor in the optimization as well) leads to the expected results. With optimizations I get:
time with subroutine 0.24000001139938831
time main 0.35700001695659012
Original answer:
First of all, the subroutine does not do the same operations that your inline code does. The operation **matrixPol(k,3) is the power to a complex number which involves a heavy computational effort. No wonder the subroutine is a lot slower.
I see a few possibilities to accelerate your code:
The divisor (z-matrixPol(k,1))**matrixPol(k,3) is independent of i and j and can be taken out of the loop.
Divisions are more expensive than multiplications. Therefore, you should pre-compute factor = 1/div outside the loop, and multiply with factor in the loop.
The comparison (matrixPol(k,2)/=0) will almost never be true, unless you set the corresponding values to exactly zero. I assume you know the order of your poles before you call the subroutine, so why not pass it along and save yourself this comparison? If that is not possible, determine the number of poles inside the subroutine before the main loop. Then, the inner loop over k is much simpler.
Inside the loop, you convert the input matrix to real over and over again. This can be done outside the loop in one go. Or, even better, just pass only the real part to the function!
At this point your code looks something like:
!=============================================================================
SUBROUTINE EvaluateFunc(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z, factor(Npoles)
REAL*8, DIMENSION(nsit,nsit,Npoles) :: matrixInt_re
integer :: maxPoles
! Determine maximum number of poles
do k=1,Npoles
! Only valid if the poles are set to exactly zero outside. If not,
! use ( abs(matrixPol(k,2)) <= someEpsilon )
if ( matrixPol(k,2) == 0 ) then
maxPoles = k-1
exit
endif
enddo
! Pre-compute factors
factor(:maxPoles) = (1.,0.)/(z-matrixPol(:maxPoles,1))**matrixPol(:maxPoles,3)
! Convert input to real
matrixInt_re = real(matrixInt)
do i=1,indCompMax
do j=i,nsit-i+1
do k=1,maxPoles
matrix(i,j)=matrix(i,j)+matrixInt_re(i,j,k)*factor(k)
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
end
Further optimization:
Rewriting the code like this it becomes apparent that the inner loop over k is nothing more than a dot product. This could potentially be sped up by the compiler. The main loop would then look like:
do i=1,indCompMax
do j=i,nsit-i+1
matrix(i,j)=matrix(i,j) + &
dot_product( matrixInt_re(i,j,:maxPoles), factor(:maxPoles) )
enddo
enddo
As chw21 noted, Fortran uses a column major memory layout and you are accessing it in a row major fashion. This potentially loses you a lot of memory. You should consider transposing your arrays in the main program or at least switch the loops over i and j. I would prefer the first option, as the inner dot product would then be performed on contiguous memory chunks.
Try to see whether you can swap the loops around. Since Fortran stores the arrays in the order
(1, 1), (2, 1), (3, 1), ..., (n, 1), (1, 2), (2, 2), ...
Memory access is far faster if you loop along that dimension.

Resources