Ambiguous reference to variable - gcc

So I am doing 2 modules which are linking to the main program. The first one has all the variables defined in it and the second one is with the functions.
Module1:
module zmienne
implicit none
integer, parameter :: ngauss = 8
integer, parameter :: out_unit=1000
integer, parameter :: out_unit1=1001
integer, parameter :: out_unit2=1002, out_unit3=1003
real(10), parameter :: error=0.000001
real(10):: total_calka, division,tot_old,blad
real(10),parameter:: intrange=7.0
real(10),dimension(ngauss),parameter::xx=(/-0.9602898565d0,&
-0.7966664774d0,-0.5255324099d0,-0.1834346425d0,&
0.1834346425d0,0.5255324099d0,0.7966664774d0,0.9602898565d0/)
real(10),Dimension(ngauss),parameter::ww=(/0.1012285363d0,&
0.2223810345d0,0.3137066459d0,0.3626837834d0,&
0.3626837834d0,0.3137066459d0,0.2223810345d0,0.1012285363d0/)
real(10) :: r, u, r6, tempred, f, r2, r1, calka,beta
real(10) :: inte
real :: start, finish
integer:: i,j,irange
real(10),dimension(ngauss)::x,w,integrand
end module zmienne
Module2
module in
implicit none
contains
real(10) function inte(y,beta,r2,r1)
real(kind=10)::r,beta,r6,r2,r1,u,y
r=(r2-r1)*y+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
if (u>100.d0) then
inte=-1.0d0
else
inte=exp(-u)-1.d0
endif
inte=r*r*inte
end function
end module in
And while im calling them like that:
use zmienne; use in
I am getting following error:
Name 'inte' at (1) is an ambiguous reference to 'inte' from module 'zmienne'
I've deleted "inte" in the module1 but now I am getting following error:
irange=inte(intrange/division)
1
Error: Missing actual argument for argument 'beta' at (1)
The main program code is:
program wykres
use zmienne; use in
implicit none
open(unit=out_unit, file='wykresik.dat', action='write', status='replace')
open(unit=out_unit1, file='wykresik1.dat', action='write')
open(unit=out_unit2, file='wykresik2.dat', action='write')
open(out_unit3, file='wykresik3.dat', action='write')
! the gaussian points (xx) and weights (ww) are for the [-1,1] interval
! for [0,1] interval we have (vector instr.)
x=0.5d0*(xx+1.0d0)
w=0.5d0*ww
! plots
tempred = 1.0
call cpu_time(start)
do i=1,1000
r=float(i)*0.01
r6=(1.0/r)**6
u=beta*r6*(r6-1.0)
f=exp(-u/tempred)-1.0
write(out_unit,*) r, u
write(out_unit1,*)r, f
write(out_unit2,*)r, r*r*f
end do
call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start
! end of plots
! integration 1
calka=0.0
r1=0.0
r2=0.5
do i=1,ngauss
r=(r2-r1)*x(i)+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
! check for underflows
if (u>100.d0) then
f=-1.0d0
else
f=exp(-u)-1.d0
endif
! the array integrand is introduced in order to perform vector calculations below
integrand(i)=r*r*f
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
write(*,*)calka
! end of integration
! integration 2
calka=0.0
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
! end of integration 2
write(*,*)calka
! vector integration and analytical result
write(*,*)sum(integrand*w*(r2-r1)),-(0.5**3)/3.0
!**************************************************************
! tot_calka - the sum of integrals all integration ranges
! dividion the initial length of the integration intervals
! tot_old - we will compare the results fro two consecutive divisions.
! at the beginning we assume any big number
! blad - the difference between two consecutive integrations,
! at the beginning we assume any big number
! error - assumed precission, parameter, it is necassary for
! performing do-while loop
total_calka=0.0
division=0.5
tot_old=10000.0
blad=10000.0
do while (blad>error)
! intrange - the upper integration limit, it should be estimated
! analysing the plot of the Mayer function. Here - 7.
! irange = the number of subintegrals we have to calculate
irange=inte(intrange/division)
total_calka=-(0.5**3)/3.0
! the analytical result for the integration range [0,0.5]
! the loop over all the intervals, for each of them we calculate
! lower and upper limits, r1 and r2
do j=1,irange
r1=0.5+(j-1)*division
r2=r1+division
calka=0.0
! the integral for a given interval
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
total_calka=total_calka+calka*(r2-r1)
enddo
! aux. output: number of subintervals, old and new integrals
write(*,*) irange,division,tot_old,total_calka
division=division/2.0
blad=abs(tot_old-total_calka)
tot_old=total_calka
! and the final error
write(*,*) blad
enddo
open(1,file='calka.dat', access='append')
! the secod viarial coefficient=CONSTANT*total_calka,
! CONSTANT is omitted here
write(1,*)tempred,total_calka
close(1)
end program wykres

The inte is declared in both modules.
Upd. The inte(y,beta,r2,r1) function is defined in the module in, and is used in the main program. This function requires four arguments, but this call
irange=inte(intrange/division)
provides only one argument. I'm not sure if this function should be used in this case. Try to use long meaningful names for variables and functions to avoid similar issues.

Related

Racing condition in assignment to variable of unknown scope

a parallel procedure computes the scalar product of two vectors in the following way
Compute the contribution of the current MPI process using OpenMP reduction. The result is stored in the SHARED variable s_loc
OpenMP master calls MPI_Allreduce to compute the sum of s_loc over all MPI processes. The result is stored in the SHARED variable s_glob
Finally, s_glob is assigned to the procedure output argument:
s = s_glob
Note that the scope of s is unknown, because it is defined outside the procedure. In the PRIVATE case, the assignment is just per thread and all is fine. However, if s is SHARED, a racing condition occurs, since s_glob is SHARED. I wonder if this racing condition may pose any problem and, if so, how it could be avoided.
In the following example the MPI part is removed, because it is important only for the scope of variables, but not for the problem itself.
program Scalar_Product
implicit none
integer, parameter :: n = 10000000
real, save :: a(n), b(n)
real :: ab
call random_number(a)
call random_number(b)
!$omp parallel private(ab)
ab = ScalarProduct(a, b)
!$omp end parallel
contains
real function ScalarProduct(a, b) result(s)
real, intent(in) :: a(:) ! left operand
real, intent(in) :: b(:) ! right operand
real, save :: s_loc, s_glob
integer :: i
!$omp master
s_loc = 0
!$omp end master
!$omp do reduction(+:s_loc) schedule(static)
do i = 1, size(a)
s_loc = s_loc + a(i) * b(i)
end do
!$omp master ! in the real application
s_glob = s_loc ! s_glob is computed using
!$omp end master ! MPI_Allreduce
! this is the assignment to which the question refers
s = s_glob
end function ScalarProduct
end program Scalar_Product
Thanks for any helpful comments!
Joerg
I revised the example and think that answers my question.
program Scalar_Product
integer, parameter :: n = 10000000
real :: a(n), b(n), ab
! skipping initialization of a and b ...
!$omp parallel shared(a,b) private(ab)
ab = ScalarProduct(a, b) ! would give a racing condition, if ab is shared
!$omp end parallel
contains
function ScalarProduct(a, b) result(s)
real, intent(in) :: a(:) ! left operand
real, intent(in) :: b(:) ! right operand
real, save :: s ! force result to be shared
integer :: i
!$omp master
s = 0
!$omp end master
!$omp do reduction(+:s) schedule(static)
do i = 1, size(a)
s = s + a(i) * b(i)
end do
end function ScalarProduct
end program Scalar_Product
Observations
By declaring the function result save it becomes shared
This allows to eliminate the problematic assignment inside the function
However, as the function is called inside a parallel region, its result must be assigned to a private variable to avoid a racing condition
Cheers, Joerg

Big overhead when subroutine is in separate module versus in the same file as the main program

I am evaluating the overhead cost (in wall clock time) of some features in fortran programs. And I came across the following behavior with GNU fortran, that I did not expect: having the subroutine in the same file as the main program (in the contain region or in a module) versus having the subroutine in a separate module (in separate file) has a big impact.
The simple code that reproduces the behavior is:
I have a subroutine that does a matrix-vector multiplication 250000 times. In the first test, I have a subroutine in the contain region of the main program. In the second test, the same subroutine is in a separate module.
The difference in performance between the two is big.
The subroutine in the contain region of the main program, 10 runs yields
min: 1.249
avg: 1.266
1.275 - 1.249 - 1.264 - 1.279 - 1.266 - 1.253 - 1.271 - 1.251 - 1.269 - 1.284
The subroutine in separate module, 10 runs yields
min: 1.848
avg: 1.861
1.848 - 1.862 - 1.853 - 1.871 - 1.854 - 1.883 - 1.810 - 1.860 - 1.886 - 1.884
About 50% slower, this factor seems consistent with the size of the matrix as
well as the number of iterations.
those tests are done with gfortran 4.8.5. With gfortran 8.3.0, the program runs a little faster, but the time doubles from the subroutine in the contain section of the main program to the subroutine in a separate module.
Portland group does not have that problem with my test program and it run even faster than the best case of gfortran.
If I read the size of the matrix from an input file (or runtime command line arg) and do dynamic allocation, then the difference in wall clock time goes away and both cases run slower (wall clock time of the subroutine in the separate module, separate file). I suspect that gfortran is able to optimize the main program better if the size of the matrix is known at compile time in the main program.
What am I doing wrong that GNU Compilers do not like, or what is GNU compiler doing poorly? Are there compiling flags to to help gfortran in such cases?
Everything is compiled with optimization -O3
Code (test_simple.f90)
!< #file test_simple.f90
!! simple test
!>
!
program test_simple
!
use iso_fortran_env
use test_mod
!
implicit none
!
integer, parameter :: N = 100
integer, parameter :: N_TEST = 250000
logical, parameter :: GENERATE=.false.
!
real(real64), parameter :: dx = 10.0_real64
real(real64), parameter :: lx = 40.0_real64
!
real(real64), dimension(N,N) :: A
real(real64), dimension(N) :: x, y
real(real64) :: start_time, end_time
real(real64) :: duration
!
integer :: k, loop_idx
!
call make_matrix(A,dx,lx)
x = A(N/2,:)
!
y = 0
call cpu_time( start_time )
call axpy_loop (A, x, y, N_TEST)
!call axpy_loop_in (A, x, y, N_TEST)
!
call cpu_time( end_time )
!
duration = end_time-start_time
!
if( duration < 0.01 )then
write( *, "('Total time:',f10.6)" ) duration
else
write( *, "('Total time:',f10.3)" ) duration
end if
!
write(*,"('Sum = ',ES14.5E3)") sum(y)
!
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop_in (A, x, y, nloop)
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop_in
!
!> #brief Computes the square exponential correlation kernel matrix for
!! a 1D uniform grid, using coordinate vector and scalar parameters
!! #param [in, out] C square matrix of correlation (kernel)
!! #param [in] dx grid spacing
!! #param [in] lx decorrelation length
!!
!! The correlation betwen the grid points i and j is given by
!! \f$ C(i,j) = \exp(\frac{-(xi-xj)^2}{2l_xi l_xj}) \f$
!! where xi and xj are respectively the coordinates of point i and j
!>
subroutine make_matrix(C, dx, lx)
! some definitions of the square correlation
! uses 2l^2 while other use l^2
! l^2 is used here by setting this factor to 1.
real(real64), parameter :: factor = 1.0
!
real(real64), dimension(:,:), intent(in out) :: C
real(real64), intent(in) :: dx
real(real64) lx
! Local variables
real(real64), dimension(size(x)) :: nfacts
real :: dist, denom
integer :: ii, jj
!
do jj=1, size(C,2)
do ii=1, size(C,1)
dist = (ii-jj)*dx
denom = factor*lx*lx
C(ii, jj) = exp( -dist*dist/denom )
end do
! compute normalization factors
nfacts(jj) = sqrt( sum( C(:, jj) ) )
end do
!
! normalize to prevent arbitrary growth in those tests
! where we apply the exponential of the matrix
do jj=1, size(C,2)
do ii=1, size(C,1)
C(ii, jj) = C(ii, jj)/( nfacts(ii)*nfacts(jj) )
end do
end do
! remove the very small
where( C<epsilon(1.) ) C=0.
!
end subroutine make_matrix
!
end program test_simple
!
Code (test_mod.f90)
!> #file test_mod.f90
!! simple operations
!<
!< #brief module for simple operations
!!
!>
module test_mod
use iso_fortran_env
implicit none
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop( A, x, y, nloop )
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop
!
end module test_mod
compile as
gfortran -O3 -o simple test_mod.f90 test_simple.f90
run as
./simple
The combination of flags -march=native and -flto is the solution to the problem, at least on my testing computers. With those options, the program is fully optimized and there is no difference between having the subroutine in the same file as the main program, or in a separate file (separate module). In addition, the runtime is comparable to the runtime with Portland Group compiler. Any one of these options alone did not solved the problem. -march=native alone speeds the in contain version but makes the module version worse.
My biased thinking is that the option -march=native should be default; users doing something else are experienced and know what they are doing so they can add the appropriate option or disable the default, whereas the common user will not easily think of it.
Thank you for all the comments.

Ambiguous reference from a Fortran module [duplicate]

So I am doing 2 modules which are linking to the main program. The first one has all the variables defined in it and the second one is with the functions.
Module1:
module zmienne
implicit none
integer, parameter :: ngauss = 8
integer, parameter :: out_unit=1000
integer, parameter :: out_unit1=1001
integer, parameter :: out_unit2=1002, out_unit3=1003
real(10), parameter :: error=0.000001
real(10):: total_calka, division,tot_old,blad
real(10),parameter:: intrange=7.0
real(10),dimension(ngauss),parameter::xx=(/-0.9602898565d0,&
-0.7966664774d0,-0.5255324099d0,-0.1834346425d0,&
0.1834346425d0,0.5255324099d0,0.7966664774d0,0.9602898565d0/)
real(10),Dimension(ngauss),parameter::ww=(/0.1012285363d0,&
0.2223810345d0,0.3137066459d0,0.3626837834d0,&
0.3626837834d0,0.3137066459d0,0.2223810345d0,0.1012285363d0/)
real(10) :: r, u, r6, tempred, f, r2, r1, calka,beta
real(10) :: inte
real :: start, finish
integer:: i,j,irange
real(10),dimension(ngauss)::x,w,integrand
end module zmienne
Module2
module in
implicit none
contains
real(10) function inte(y,beta,r2,r1)
real(kind=10)::r,beta,r6,r2,r1,u,y
r=(r2-r1)*y+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
if (u>100.d0) then
inte=-1.0d0
else
inte=exp(-u)-1.d0
endif
inte=r*r*inte
end function
end module in
And while im calling them like that:
use zmienne; use in
I am getting following error:
Name 'inte' at (1) is an ambiguous reference to 'inte' from module 'zmienne'
I've deleted "inte" in the module1 but now I am getting following error:
irange=inte(intrange/division)
1
Error: Missing actual argument for argument 'beta' at (1)
The main program code is:
program wykres
use zmienne; use in
implicit none
open(unit=out_unit, file='wykresik.dat', action='write', status='replace')
open(unit=out_unit1, file='wykresik1.dat', action='write')
open(unit=out_unit2, file='wykresik2.dat', action='write')
open(out_unit3, file='wykresik3.dat', action='write')
! the gaussian points (xx) and weights (ww) are for the [-1,1] interval
! for [0,1] interval we have (vector instr.)
x=0.5d0*(xx+1.0d0)
w=0.5d0*ww
! plots
tempred = 1.0
call cpu_time(start)
do i=1,1000
r=float(i)*0.01
r6=(1.0/r)**6
u=beta*r6*(r6-1.0)
f=exp(-u/tempred)-1.0
write(out_unit,*) r, u
write(out_unit1,*)r, f
write(out_unit2,*)r, r*r*f
end do
call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start
! end of plots
! integration 1
calka=0.0
r1=0.0
r2=0.5
do i=1,ngauss
r=(r2-r1)*x(i)+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
! check for underflows
if (u>100.d0) then
f=-1.0d0
else
f=exp(-u)-1.d0
endif
! the array integrand is introduced in order to perform vector calculations below
integrand(i)=r*r*f
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
write(*,*)calka
! end of integration
! integration 2
calka=0.0
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
! end of integration 2
write(*,*)calka
! vector integration and analytical result
write(*,*)sum(integrand*w*(r2-r1)),-(0.5**3)/3.0
!**************************************************************
! tot_calka - the sum of integrals all integration ranges
! dividion the initial length of the integration intervals
! tot_old - we will compare the results fro two consecutive divisions.
! at the beginning we assume any big number
! blad - the difference between two consecutive integrations,
! at the beginning we assume any big number
! error - assumed precission, parameter, it is necassary for
! performing do-while loop
total_calka=0.0
division=0.5
tot_old=10000.0
blad=10000.0
do while (blad>error)
! intrange - the upper integration limit, it should be estimated
! analysing the plot of the Mayer function. Here - 7.
! irange = the number of subintegrals we have to calculate
irange=inte(intrange/division)
total_calka=-(0.5**3)/3.0
! the analytical result for the integration range [0,0.5]
! the loop over all the intervals, for each of them we calculate
! lower and upper limits, r1 and r2
do j=1,irange
r1=0.5+(j-1)*division
r2=r1+division
calka=0.0
! the integral for a given interval
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
total_calka=total_calka+calka*(r2-r1)
enddo
! aux. output: number of subintervals, old and new integrals
write(*,*) irange,division,tot_old,total_calka
division=division/2.0
blad=abs(tot_old-total_calka)
tot_old=total_calka
! and the final error
write(*,*) blad
enddo
open(1,file='calka.dat', access='append')
! the secod viarial coefficient=CONSTANT*total_calka,
! CONSTANT is omitted here
write(1,*)tempred,total_calka
close(1)
end program wykres
The inte is declared in both modules.
Upd. The inte(y,beta,r2,r1) function is defined in the module in, and is used in the main program. This function requires four arguments, but this call
irange=inte(intrange/division)
provides only one argument. I'm not sure if this function should be used in this case. Try to use long meaningful names for variables and functions to avoid similar issues.

Make a subroutine faster

In the main part of my Fortran code I have this lines
Gmat=0
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
l=1
do while (G0poles(l,2)/=0)
Gmat(i,j)=Gmat(i,j)+real(G0int(i,j,l))/(omega(k)-G0poles(l,1))**G0poles(l,3)
l=l+1
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
This part is repeated several times in the code so I defined this subroutine
!=============================================================================
SUBROUTINE EvaluateFunc(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
k=1
do while (matrixPol(k,2)/=0)
matrix(i,j)=matrix(i,j)+real(matrixInt(i,j,k))/(z-matrixPol(k,1))**matrixPol(k,3)
k=k+1
enddo
enddo
enddo
call ExtendBySymmetry(matrix)
end
The problem is that if I use this subroutine the evaluation of the output matrix takes much longer (around 5 times slower) than the same evaluation but made directly in the main part of the code.
How can I optimize the code and make the evaluation made with the subroutine faster?
UPDATE: Thanks for the reply. First, the operation **matrixPol(k,3) is present also in the main code, I forgot to write it in the post.
For the comparison (matrixPol(k,2)/=0) there is no problem because actually starting from a certain position of the vector all the elements are exactly zero.
Computing the prefactor outside the i,j loop helped to speed the subroutine. And switching the two indices i and j has practically no effect. Here are the running times of the subroutine
All in the main part
1.688s
my old subroutine
19.063s
with factor outside the loop i,j
5.193s
Switching the indices i and j
5.281s
with dot_product
4.958s
But the subroutine is still more than 2.5 time slower.
Here is a minimal example:
module NAGmodule
implicit none
real*8, allocatable :: hmat(:,:),eval(:),eigmat(:,:)
real*8, allocatable :: G0poles(:,:)
complex*16, allocatable :: G0int(:,:,:)
complex*16, allocatable :: Gmat(:,:)
real*8, allocatable :: omega(:)
integer :: numpoles, halffillingflag, iter, indCompMax
complex*16 :: omegaComplex
real*8, parameter :: pi=3.141592653589793
integer, parameter :: out_unit=10
integer, parameter :: timeFag=1
integer :: counti, countf, count_rate
real :: dt
integer, parameter :: Npoles=1000
real*8, parameter :: U=4
real*8, parameter :: omegamin=-20
real*8, parameter :: omegamax=20
integer, parameter :: Nomega=1500000
integer, parameter :: nsit = 4
integer, parameter :: nup = 1
integer, parameter :: ndw = 1
integer, parameter :: PBCflag=1
integer, parameter :: useSymFlag=1
end module NAGmodule
use nagmodule
integer :: i,j,k,l,m,n,p,q
REAL*8 t1,t2
allocate(hmat(nsit,nsit),eigmat(nsit,nsit),eval(nsit))
allocate(G0poles(Npoles,3),G0int(nsit,nsit,Npoles))
allocate(omega(Nomega))
allocate(Gmat(nsit,nsit))
indCompMax=1
hmat=0.
do i=1,(nsit-1)
hmat(i,i+1)=-1
hmat(i+1,i)=-1
enddo
if (PBCflag==1) then
hmat(1,nsit)=-1
hmat(nsit,1)=-1
end if
call NAGdiag(nsit)
eigmat=hmat
do k=1,Nomega
omega(k)=(omegamax-omegamin)/(Nomega-1)*(k-1)+omegamin
enddo
do k=1,nup
G0poles(k,1)=eval(k)
G0poles(k,2)=-1
G0poles(k,3)=1
enddo
do k=(nup+1),nsit
G0poles(k,1)=eval(k)
G0poles(k,2)=1
G0poles(k,3)=1
enddo
do k=1,nsit
G0int(k,k,k)=1
if ((k==nup).AND.(abs(eval(k)-eval(k+1))<1e-15)) then
G0int(k,k,k)=0.5
G0int(k+1,k+1,k)=0.5
else if ((k==nup+1).AND.(abs(eval(k)-eval(k-1))<1e-15)) then
G0int(k,k,k)=0.5
G0int(k-1,k-1,k)=0.5
end if
enddo
do k=1,nsit
G0int(:,:,k)=matmul(eigmat,matmul(G0int(:,:,k),transpose(eigmat)))
enddo
t1=0
t2=0
do k=1,Nomega
omegaComplex=CMPLX(omega(k),0)
call system_clock(counti,count_rate)
Gmat=0
call EvaluateFunc3(G0poles,G0int,omegaComplex,Gmat)
call system_clock(countf)
dt=REAL(countf-counti)/REAL(count_rate)
t1=t1+dt
call system_clock(counti,count_rate)
Gmat=0
do i=1,indCompMax
do j=(i-1)*useSymFlag+1,nsit-(i-1)*useSymFlag
l=1
do while (G0poles(l,2)/=0)
Gmat(i,j)=Gmat(i,j)+real(G0int(i,j,l))/(omega(k)-G0poles(l,1))
l=l+1
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
call system_clock(countf)
dt=REAL(countf-counti)/REAL(count_rate)
t2=t2+dt
enddo
write(*,*)'time with subroutine',t1
write(*,*)'time main',t2
stop
end
!=================================================================================
SUBROUTINE EvaluateFunc3(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z
integer :: maxPoles
COMPLEX*16, DIMENSION(Npoles) :: factor
maxPoles=0
do while (matrixPol(maxPoles+1,2)/=0)
maxPoles=maxPoles+1
enddo
factor(:maxPoles) = (1.,0.)/(z-matrixPol(:maxPoles,1))**matrixPol(:maxPoles,3)
do j=1,indCompMax
do i=(j-1)*useSymFlag+1,nsit-(j-1)*useSymFlag
matrix(i,j)=matrix(i,j)+dot_product(matrixInt(i,j,1:maxPoles),factor(1:maxPoles))
enddo
enddo
call ExtendBySymmetry2(matrix)
end
!=================================================================================
SUBROUTINE ExtendBySymmetry2(matrix)
use NAGmodule
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
integer k,i,j,l,m
if ((PBCflag==1).AND.(useSymFlag==1)) then
do i=2,nsit
matrix(2:nsit,i)=matrix(1:nsit-1,i-1)
matrix(1,i)=matrix(nsit,i-1)
enddo
else if ((PBCflag==0).AND.(useSymFlag==1)) then
do j=1,nsit/2
do i=j,nsit-j+1
matrix(j,i)=matrix(i,j)
matrix(nsit-i+1,nsit-j+1)=matrix(i,j)
matrix(nsit-j+1,nsit-i+1)=matrix(i,j)
enddo
enddo
end if
end
!=================================================================================
SUBROUTINE ExtendBySymmetry(matrix)
use NAGmodule
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
integer k,i,j,l,m
if ((PBCflag==1).AND.(useSymFlag==1)) then
do i=2,nsit
matrix(i,2:nsit)=matrix(i-1,1:nsit-1)
matrix(i,1)=matrix(i-1,nsit)
enddo
else if ((PBCflag==0).AND.(useSymFlag==1)) then
do i=1,nsit/2
do j=i,nsit-i+1
matrix(j,i)=matrix(i,j)
matrix(nsit-i+1,nsit-j+1)=matrix(i,j)
matrix(nsit-j+1,nsit-i+1)=matrix(i,j)
enddo
enddo
end if
end
!=================================================================================
SUBROUTINE NAGdiag(nsit1)
use NAGmodule
real*8, allocatable :: WORK(:)
integer, allocatable :: IWORK(:)
CHARACTER JOB, UPLO
EXTERNAL dsyevd
NMAX=nsit1
LDA=NMAX
LWORK=4*NMAX*NMAX+100
LIWORK=5*NMAX
LIWORK=10*NMAX
ALLOCATE(WORK(LWORK),IWORK(LIWORK))
JOB='V'
UPLO='L'
CALL dsyevd(JOB,UPLO,nsit1,hmat,LDA,eval,WORK,LWORK,IWORK,LIWORK,INFO)
IF (INFO.GT.0) THEN
WRITE (*,*) 'Failure to converge.'
stop
endif
deALLOCATE(WORK,IWORK)
return
end`
Due to several edits of the original question, the answer is partially superfluous by now. However, the optimization part is still valid.
The real issue with you code is that you are passing z as a complex number to the subroutine (omegaComplex), while omega(k) is real. This results in the exponentiation and division being performed as complex operations instead of real ones.
Fixing z to be real (and factor in the optimization as well) leads to the expected results. With optimizations I get:
time with subroutine 0.24000001139938831
time main 0.35700001695659012
Original answer:
First of all, the subroutine does not do the same operations that your inline code does. The operation **matrixPol(k,3) is the power to a complex number which involves a heavy computational effort. No wonder the subroutine is a lot slower.
I see a few possibilities to accelerate your code:
The divisor (z-matrixPol(k,1))**matrixPol(k,3) is independent of i and j and can be taken out of the loop.
Divisions are more expensive than multiplications. Therefore, you should pre-compute factor = 1/div outside the loop, and multiply with factor in the loop.
The comparison (matrixPol(k,2)/=0) will almost never be true, unless you set the corresponding values to exactly zero. I assume you know the order of your poles before you call the subroutine, so why not pass it along and save yourself this comparison? If that is not possible, determine the number of poles inside the subroutine before the main loop. Then, the inner loop over k is much simpler.
Inside the loop, you convert the input matrix to real over and over again. This can be done outside the loop in one go. Or, even better, just pass only the real part to the function!
At this point your code looks something like:
!=============================================================================
SUBROUTINE EvaluateFunc(matrixPol,matrixInt,z,matrix)
use NAGmodule
integer i,j,k
REAL*8, DIMENSION(Npoles,3) :: matrixPol
COMPLEX*16, DIMENSION(nsit,nsit,Npoles) :: matrixInt
COMPLEX*16, DIMENSION(nsit,nsit) :: matrix
COMPLEX*16 :: z, factor(Npoles)
REAL*8, DIMENSION(nsit,nsit,Npoles) :: matrixInt_re
integer :: maxPoles
! Determine maximum number of poles
do k=1,Npoles
! Only valid if the poles are set to exactly zero outside. If not,
! use ( abs(matrixPol(k,2)) <= someEpsilon )
if ( matrixPol(k,2) == 0 ) then
maxPoles = k-1
exit
endif
enddo
! Pre-compute factors
factor(:maxPoles) = (1.,0.)/(z-matrixPol(:maxPoles,1))**matrixPol(:maxPoles,3)
! Convert input to real
matrixInt_re = real(matrixInt)
do i=1,indCompMax
do j=i,nsit-i+1
do k=1,maxPoles
matrix(i,j)=matrix(i,j)+matrixInt_re(i,j,k)*factor(k)
enddo
enddo
enddo
call ExtendBySymmetry(Gmat)
end
Further optimization:
Rewriting the code like this it becomes apparent that the inner loop over k is nothing more than a dot product. This could potentially be sped up by the compiler. The main loop would then look like:
do i=1,indCompMax
do j=i,nsit-i+1
matrix(i,j)=matrix(i,j) + &
dot_product( matrixInt_re(i,j,:maxPoles), factor(:maxPoles) )
enddo
enddo
As chw21 noted, Fortran uses a column major memory layout and you are accessing it in a row major fashion. This potentially loses you a lot of memory. You should consider transposing your arrays in the main program or at least switch the loops over i and j. I would prefer the first option, as the inner dot product would then be performed on contiguous memory chunks.
Try to see whether you can swap the loops around. Since Fortran stores the arrays in the order
(1, 1), (2, 1), (3, 1), ..., (n, 1), (1, 2), (2, 2), ...
Memory access is far faster if you loop along that dimension.

Unclassifiable statement at (1) in Fortran

I am pretty new to to fortran and I don't really know why am I getting this error.
integrand(i)=inte(x(i),beta,r2,r1)
1 Error: Unclassifiable statement at (1) calka11.f95:97.6:
I have made all the variables into a module file and then call them using
use
and when I am getting those variables into the code file again It works fluently again.
module zmienne
real(10) :: r, u, r6, tempred, f, r2, r1, calka,beta
real(10) :: inte
real :: start, finish
integer:: i
integer, parameter :: Ngauss = 8
real(10),dimension(ngauss),parameter::xx=(/-0.9602898565d0,&
-0.7966664774d0,-0.5255324099d0,-0.1834346425d0,&
0.1834346425d0,0.5255324099d0,0.7966664774d0,0.9602898565d0/)
real(10),Dimension(ngauss),parameter::ww=(/0.1012285363d0,&
0.2223810345d0,0.3137066459d0,0.3626837834d0,&
0.3626837834d0,0.3137066459d0,0.2223810345d0,0.1012285363d0/)
real(10),dimension(ngauss)::x,w,integrand
integer, parameter :: out_unit=1000
integer, parameter :: out_unit1=1001
integer, parameter :: out_unit2=1002, out_unit3=1003
real(10), parameter :: error=0.000001
real(10):: total_calka, division,tot_old,blad
real(10),parameter:: intrange=7.0
integer::j,irange
end module zmienne
The main program that uses the module:
program wykres
use zmienne
implicit none
open(unit=out_unit, file='wykresik.dat', action='write', status='replace')
open(unit=out_unit1, file='wykresik1.dat', action='write')
open(unit=out_unit2, file='wykresik2.dat', action='write')
open(out_unit3, file='wykresik3.dat', action='write')
! the gaussian points (xx) and weights (ww) are for the [-1,1] interval
! for [0,1] interval we have (vector instr.)
x=0.5d0*(xx+1.0d0)
w=0.5d0*ww
! plots
tempred = 1.0
beta=4.d0/tempred
call cpu_time(start)
do i=1,1000
r=float(i)*0.01
r6=(1.0/r)**6
u=beta*r6*(r6-1.0)
f=exp(-u/tempred)-1.0
write(out_unit,*) r, u
write(out_unit1,*)r, f
write(out_unit2,*)r, r*r*f
end do
call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start
! end of plots
! integration 1
calka=0.0
r1=0.0
r2=0.5
do i=1,ngauss
r=(r2-r1)*x(i)+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
! check for underflows
if (u>100.d0) then
f=-1.0d0
else
f=exp(-u)-1.d0
endif
! the array integrand is introduced in order to perform vector calculations below
integrand(i)=r*r*f
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
write(*,*)calka
! end of integration
! integration 2
calka=0.0
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
! end of integration 2
write(*,*)calka
! vector integration and analytical result
write(*,*)sum(integrand*w*(r2-r1)),-(0.5**3)/3.0
!**************************************************************
! tot_calka - the sum of integrals all integration ranges
! dividion the initial length of the integration intervals
! tot_old - we will compare the results fro two consecutive divisions.
! at the beginning we assume any big number
! blad - the difference between two consecutive integrations,
! at the beginning we assume any big number
! error - assumed precission, parameter, it is necassary for
! performing do-while loop
total_calka=0.0
division=0.5
tot_old=10000.0
blad=10000.0
do while (blad>error)
! intrange - the upper integration limit, it should be estimated
! analysing the plot of the Mayer function. Here - 7.
! irange = the number of subintegrals we have to calculate
irange=int(intrange/division)
total_calka=-(0.5**3)/3.0
! the analytical result for the integration range [0,0.5]
! the loop over all the intervals, for each of them we calculate
! lower and upper limits, r1 and r2
do j=1,irange
r1=0.5+(j-1)*division
r2=r1+division
calka=0.0
! the integral for a given interval
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
total_calka=total_calka+calka*(r2-r1)
enddo
! aux. output: number of subintervals, old and new integrals
write(*,*) irange,division,tot_old,total_calka
division=division/2.0
blad=abs(tot_old-total_calka)
tot_old=total_calka
! and the final error
write(*,*) blad
enddo
open(1,file='calka.dat', access='append')
! the secod viarial coefficient=CONSTANT*total_calka,
! CONSTANT is omitted here
write(1,*)tempred,total_calka
close(1)
end program wykres
I also made the external to this real function and it gives me an error for every variable in the module. It looks like that for example
undefined reference to `__zmienne_MOD_total_calka'
Inte is a real defined variable, which is necessary for me to calculate this integral.
integrand(i)=inte(x(i),beta,r2,r1)
Why it doesn't work when it's in another file, while it was working while its been inside it. That's weird
This is the original code:
program wykres
implicit none
real(10) :: r, u, r6, tempred, f, r2, r1, calka,beta
! beta - an auxiliary variable
real(10) :: inte
! inte - the function defined below
real :: start, finish
integer:: i
integer, parameter :: Ngauss = 8
real(10),dimension(ngauss),parameter::xx=(/-0.9602898565d0,&
-0.7966664774d0,-0.5255324099d0,-0.1834346425d0,&
0.1834346425d0,0.5255324099d0,0.7966664774d0,0.9602898565d0/)
real(10),Dimension(ngauss),parameter::ww=(/0.1012285363d0,&
0.2223810345d0,0.3137066459d0,0.3626837834d0,&
0.3626837834d0,0.3137066459d0,0.2223810345d0,0.1012285363d0/)
real(10),dimension(ngauss)::x,w,integrand
integer, parameter :: out_unit=1000
integer, parameter :: out_unit1=1001
integer, parameter :: out_unit2=1002, out_unit3=1003
real(10), parameter :: error=0.000001
real(10):: total_calka, division,tot_old,blad
real(10),parameter:: intrange=7.0
integer::j,irange
open(unit=out_unit, file='wykresik.dat', action='write', status='replace')
open(unit=out_unit1, file='wykresik1.dat', action='write')
open(unit=out_unit2, file='wykresik2.dat', action='write')
open(out_unit3, file='wykresik3.dat', action='write')
! the gaussian points (xx) and weights (ww) are for the [-1,1] interval
! for [0,1] interval we have (vector instr.)
x=0.5d0*(xx+1.0d0)
w=0.5d0*ww
! plots
tempred = 1.0
call cpu_time(start)
do i=1,1000
r=float(i)*0.01
r6=(1.0/r)**6
u=4.0d0*r6*(r6-1.0)
f=exp(-u/tempred)-1.0
write(out_unit,*) r, u
write(out_unit1,*)r, f
write(out_unit2,*)r, r*r*f
end do
call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start
! end of plots
! integration 1
calka=0.0
r1=0.0
r2=0.5
! auxiliary variable
beta=4.d0/tempred
do i=1,ngauss
r=(r2-r1)*x(i)+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
! check for underflows
if (u>100.d0) then
f=-1.0d0
else
f=exp(-u)-1.d0
endif
! the array integrand is introduced in order to perform vector calculations below
integrand(i)=r*r*f
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
write(*,*)calka
! end of integration
! integration 2
calka=0.0
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
calka=calka*(r2-r1)
! end of integration 2
write(*,*)calka
! vector integration and analytical result
write(*,*)sum(integrand*w*(r2-r1)),-(0.5**3)/3.0
!**************************************************************
! tot_calka - the sum of integrals all integration ranges
! dividion the initial length of the integration intervals
! tot_old - we will compare the results fro two consecutive divisions.
! at the beginning we assume any big number
! blad - the difference between two consecutive integrations,
! at the beginning we assume any big number
! error - assumed precission, parameter, it is necassary for
! performing do-while loop
total_calka=0.0
division=0.5
tot_old=10000.0
blad=10000.0
do while (blad>error)
! intrange - the upper integration limit, it should be estimated
! analysing the plot of the Mayer function. Here - 7.
! irange = the number of subintegrals we have to calculate
irange=int(intrange/division)
total_calka=-(0.5**3)/3.0
! the analytical result for the integration range [0,0.5]
! the loop over all the intervals, for each of them we calculate
! lower and upper limits, r1 and r2
do j=1,irange
r1=0.5+(j-1)*division
r2=r1+division
calka=0.0
! the integral for a given interval
do i=1,ngauss
integrand(i)=inte(x(i),beta,r2,r1)
calka=calka+integrand(i)*w(i)
enddo
total_calka=total_calka+calka*(r2-r1)
enddo
! aux. output: number of subintervals, old and new integrals
write(*,*) irange,division,tot_old,total_calka
division=division/2.0
blad=abs(tot_old-total_calka)
tot_old=total_calka
! and the final error
write(*,*) blad
enddo
open(1,file='calka.dat', access='append')
! the secod viarial coefficient=CONSTANT*total_calka,
! CONSTANT is omitted here
write(1,*)tempred,total_calka
close(1)
end program wykres
Jesus christ, there were a code below the end program... Thread may be closed now. Thank you to everyone.
real(kind=10) function inte(y,beta,r2,r1)
implicit none
real(kind=10)::r,beta,r6,r2,r1,u,y
r=(r2-r1)*y+r1
r6=(1.0/r)**6
u=beta*r6*(r6-1.0d0)
if (u>100.d0) then
inte=-1.0d0
else
inte=exp(-u)-1.d0
endif
inte=r*r*inte
end
real(10) :: inte
is not just a real scalar variable of kind 10, whatever that kind means for your compiler. (For gfortran kind 10 is the x87 extended precision, for many other compilers it is invalid.)
When you then do
integrand(i)=inte(x(i),beta,r2,r1)
it makes no sense to index a scalar.
If inte was meant to be an external function returning a real of kind 10, you should declare it as
real(10), external :: inte
or much better write a complete interface block for it.

Resources