I have a fortran subroutine in the file gfunc.f90. I want to call this subroutine from my main program in test.f90.
If I keep both files in the same directory and compile them with
gfortran gfunc.f90 test.f90 -o test
It works fine.
But I want the subroutine to be in a library. Thus, I create a subfolder called gfunc/ and put gfunc.f90 there. In that folder, I compile the module with
gfortran -fdefault-real-8 -fPIC -c gfunc.f90
and
gfortran -fdefault-real-8 -shared -o gfunc.so gfunc.o
I now compile the main program with
gfortran test.f90 gfunc/gfunc.so
But now I get a segmentation fault as soon as a variable in the subroutine is accessed.
How do I have to compile and link the library correctly?
Here is a minimal working example that reproduces the problem:
gfunc.f90:
module gfunc_module
implicit none
contains
subroutine gfunc(x, n, m, a, b, c)
double precision, intent(in) :: x
integer, intent(in) :: n, m
double precision, dimension(n), intent(in) :: a
double precision, dimension(m), intent(in) :: b
double precision, dimension(n, m), intent(out) :: c
integer :: i, j
do j=1,m
do i=1,n
c(i,j) = exp(-x * (a(i)**2 + b(j)**2))
end do
end do
end subroutine
end module
test.f90:
program main
use gfunc_module
implicit none
integer, parameter :: dp = kind(1.0d0)
real(dp) :: x = 1.
integer, parameter :: n = 4
integer, parameter :: m = 4
real(dp), dimension(n) :: a = [-1., -0.33333333, .033333333, 1.]
real(dp), dimension(m) :: b = [-1., -0.33333333, .033333333, 1.]
real(dp), dimension(n, m) :: c
call gfunc(x, n, m, a, b, c)
write(*,*) c
end program main
You should add -fdefault-real-8 to the compilation of test.f90 as well:
gfortran -fdefault-real-8 test.f90 gfunc/gfunc.so
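According to the gfortran documentation, -fdefault-real-8 also promotes DOUBLE PRECISION variables to 16 bytes where possible. The library was compiled with that flag, so gfunc expects 16-byte arguments, whereas test.f90, compiled without it, passes double precision variables that are only 8 bytes wide: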
-fdefault-real-8
Set the default real type to an 8 byte wide type. Do nothing if this is already the default. This option also affects the kind of non-double real constants like 1.0, and does promote the default width of "DOUBLE PRECISION" to 16 bytes if possible, unless "-fdefault-double-8" is given, too.
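To see the effect of the flag directly, a small check like the following can help. It is only a sketch: the program name show_default_kinds is made up for illustration, and the sizes it prints depend on the compiler and target. Compile it once with and once without -fdefault-real-8 and compare the two outputs.

```fortran
program show_default_kinds
  implicit none
  real :: r
  double precision :: d

  ! storage_size (Fortran 2008) returns the size in bits, so divide by 8.
  print '(a,i0)', 'default real bytes:     ', storage_size(r) / 8
  print '(a,i0)', 'double precision bytes: ', storage_size(d) / 8
end program show_default_kinds
```

If the two builds report different sizes for double precision, a library built with the flag and a caller built without it disagree on the argument layout, which matches the crash described in the question.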
I have a very weird problem when I compile Fortran code using the zgerc subroutine from BLAS. Basically, this subroutine calculates the outer product of vector x with the conjugate of vector y. More about that function here. My simple code is as follows:
program main
implicit none
integer :: i
complex(8), dimension(10) :: a = [(i, i=0,9)]
complex(8), dimension(10) :: b = [(i, i=0,9)]
complex(8), dimension(10, 10) :: c
c = 0
CALL zgerc(10, 10, 1.D0, a, 1, b, 1, c, 10)
WRITE(*, *) c
end program main
I have two complex vectors here, a and b. Both go from 0 to 9 and their imaginary parts are 0.
Now for the weird part. If I compile the code using an absolute path, gfortran -c /home/myUser/Fortran/tests/main.f90 -o main.o, I get the correct result. If I compile with gfortran -c main.f90 -o main.o instead (I am in the right directory, and I have also tried ./main.f90), the real part of the result is correct, but the imaginary part contains numbers like 1E+225 (with ./main.f90 I get numbers like 1E+163).
I can't understand why the path to my code changes the result of the imaginary part. I'd be glad for your help.
I use Ubuntu 20.04.2 with the default gfortran (9.3.0)
P.S, my final goal is to use this as part of a more complex subroutine in Python with f2py.
EDIT: my full commands:
#gfortran -c /home/myUser/Fortran/tests/main.f90 -o main.o
gfortran -c main.f90 -o main.o
gfortran -o test main.o /home/myUser/PycharmProjects/GSIE_2D/fortran_scripts/libblas.a /home/myUser/PycharmProjects/GSIE_2D/fortran_scripts/liblapack.a
rm ./main.o
./test
Lines 1 and 2 are the two cases, so I run only one of them each time.
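You supply 1.D0, which is a double precision real literal, whereas zgerc expects a double complex value for alpha. The routine therefore reads the imaginary part of alpha from whatever happens to follow the literal in memory, which is presumably why the result changes with unrelated details such as the source path. The corrected call is: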
call zgerc(10, 10, cmplx(1, kind=8), a, 1, b, 1, c, 10)
By including explicit interfaces (via some kind of blas module) you would have gotten compile time errors when supplying arguments of wrong data type.
Intel's mkl provides such explicit interfaces in its blas95 module as well as generic routines (gerc instead of {c,z}gerc).
There is also this open-source module providing explicit interfaces to standard blas routines.
Also make use of the portable types defined in iso_fortran_env.
program main
use blas95, only: gerc
use iso_fortran_env, only: real64
implicit none
integer, parameter :: n = 10
integer :: i
complex(real64) :: a(n) = [(i, i=0,n-1)], b(n) = [(i, i=0,n-1)], c(n,n)
c = 0
! call zgerc(10, 10, cmplx(1, kind=8), a, 1, b, 1, c, 10) ! standard blas
call gerc(c, a, b, alpha=cmplx(1, kind=real64)) ! generic blas95
print *, c
end program
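If neither MKL's blas95 nor a ready-made interface module is available, a hand-written interface block gives the same compile-time checking for a single routine. The following is a minimal sketch under stated assumptions: the module name zgerc_interface is made up for illustration, the argument list follows the reference BLAS zgerc, and the program has to be linked against a BLAS library (for example with -lblas).

```fortran
module zgerc_interface
  use iso_fortran_env, only: real64
  implicit none
  interface
    ! Explicit interface for the reference BLAS routine ZGERC:
    ! A := alpha * x * conjg(y)**T + A
    subroutine zgerc(m, n, alpha, x, incx, y, incy, a, lda)
      import :: real64
      integer, intent(in) :: m, n, incx, incy, lda
      complex(real64), intent(in) :: alpha
      complex(real64), intent(in) :: x(*), y(*)
      complex(real64), intent(inout) :: a(lda, *)
    end subroutine zgerc
  end interface
end module zgerc_interface

program main
  use iso_fortran_env, only: real64
  use zgerc_interface
  implicit none
  integer, parameter :: n = 3
  integer :: i
  complex(real64) :: a(n), b(n), c(n, n)

  a = [(cmplx(i, 0, kind=real64), i = 0, n - 1)]
  b = a
  c = 0
  ! With the interface in scope, passing a real alpha is a compile-time error:
  ! call zgerc(n, n, 1.d0, a, 1, b, 1, c, n)
  call zgerc(n, n, cmplx(1, 0, kind=real64), a, 1, b, 1, c, n)
  print *, c   ! each c(i,j) should equal a(i) * conjg(b(j))
end program main
```

Uncommenting the call with 1.d0 should make the compiler reject the program instead of producing garbage at run time.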
I am evaluating the overhead cost (in wall clock time) of some features in Fortran programs, and I came across the following behavior with GNU Fortran that I did not expect: having the subroutine in the same file as the main program (in the contains section or in a module) versus having it in a separate module (in a separate file) has a big impact.
The simple code that reproduces the behavior is given below.
It has a subroutine that does a matrix-vector multiplication 250000 times. In the first test, the subroutine is in the contains section of the main program. In the second test, the same subroutine is in a separate module.
The difference in performance between the two is big.
With the subroutine in the contains section of the main program, 10 runs yield
min: 1.249
avg: 1.266
1.275 - 1.249 - 1.264 - 1.279 - 1.266 - 1.253 - 1.271 - 1.251 - 1.269 - 1.284
With the subroutine in a separate module, 10 runs yield
min: 1.848
avg: 1.861
1.848 - 1.862 - 1.853 - 1.871 - 1.854 - 1.883 - 1.810 - 1.860 - 1.886 - 1.884
That is about 50% slower, and this factor seems consistent across matrix sizes and numbers of iterations.
These tests were done with gfortran 4.8.5. With gfortran 8.3.0, the program runs a little faster, but the time doubles when moving the subroutine from the contains section of the main program to a separate module.
The Portland Group compiler does not have that problem with my test program, and it runs even faster than the best case of gfortran.
If I read the size of the matrix from an input file (or a runtime command-line argument) and allocate dynamically, the difference in wall clock time goes away and both cases run at the slower speed (the wall clock time of the subroutine in the separate module, separate file). I suspect that gfortran can optimize the main program better if the size of the matrix is known at compile time in the main program.
What am I doing that the GNU compilers do not like, or what is the GNU compiler doing poorly? Are there compiler flags to help gfortran in such cases?
Everything is compiled with optimization -O3
Code (test_simple.f90)
!< #file test_simple.f90
!! simple test
!>
!
program test_simple
!
use iso_fortran_env
use test_mod
!
implicit none
!
integer, parameter :: N = 100
integer, parameter :: N_TEST = 250000
logical, parameter :: GENERATE=.false.
!
real(real64), parameter :: dx = 10.0_real64
real(real64), parameter :: lx = 40.0_real64
!
real(real64), dimension(N,N) :: A
real(real64), dimension(N) :: x, y
real(real64) :: start_time, end_time
real(real64) :: duration
!
integer :: k, loop_idx
!
call make_matrix(A,dx,lx)
x = A(N/2,:)
!
y = 0
call cpu_time( start_time )
call axpy_loop (A, x, y, N_TEST)
!call axpy_loop_in (A, x, y, N_TEST)
!
call cpu_time( end_time )
!
duration = end_time-start_time
!
if( duration < 0.01 )then
write( *, "('Total time:',f10.6)" ) duration
else
write( *, "('Total time:',f10.3)" ) duration
end if
!
write(*,"('Sum = ',ES14.5E3)") sum(y)
!
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop_in (A, x, y, nloop)
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop_in
!
!> #brief Computes the square exponential correlation kernel matrix for
!! a 1D uniform grid, using coordinate vector and scalar parameters
!! #param [in, out] C square matrix of correlation (kernel)
!! #param [in] dx grid spacing
!! #param [in] lx decorrelation length
!!
!! The correlation betwen the grid points i and j is given by
!! \f$ C(i,j) = \exp(\frac{-(xi-xj)^2}{2l_xi l_xj}) \f$
!! where xi and xj are respectively the coordinates of point i and j
!>
subroutine make_matrix(C, dx, lx)
! some definitions of the square correlation
! uses 2l^2 while other use l^2
! l^2 is used here by setting this factor to 1.
real(real64), parameter :: factor = 1.0
!
real(real64), dimension(:,:), intent(in out) :: C
real(real64), intent(in) :: dx
real(real64) lx
! Local variables
real(real64), dimension(size(x)) :: nfacts
real :: dist, denom
integer :: ii, jj
!
do jj=1, size(C,2)
do ii=1, size(C,1)
dist = (ii-jj)*dx
denom = factor*lx*lx
C(ii, jj) = exp( -dist*dist/denom )
end do
! compute normalization factors
nfacts(jj) = sqrt( sum( C(:, jj) ) )
end do
!
! normalize to prevent arbitrary growth in those tests
! where we apply the exponential of the matrix
do jj=1, size(C,2)
do ii=1, size(C,1)
C(ii, jj) = C(ii, jj)/( nfacts(ii)*nfacts(jj) )
end do
end do
! remove the very small
where( C<epsilon(1.) ) C=0.
!
end subroutine make_matrix
!
end program test_simple
!
Code (test_mod.f90)
!> #file test_mod.f90
!! simple operations
!<
!< #brief module for simple operations
!!
!>
module test_mod
use iso_fortran_env
implicit none
contains
!
!< #brief compute y = y + A^nx
!! #param[in] A matrix to use
!! #param[in] x vector to used
!! #param[in, out] y output
!! #param[in] nloop number of iterations, power to apply to A
!!
!>
subroutine axpy_loop( A, x, y, nloop )
real(real64), dimension(:,:), intent(in) :: A
real(real64), dimension(:), intent(in) :: x
real(real64), dimension(:), intent(inout) :: y
integer, intent(in) :: nloop
!
real(real64), dimension(size(x)) :: z
integer :: k, iter
!
y = x
do iter = 1, nloop
z = y
y = 0
do k = 1, size(A,2)
y = y + A(:,k)*z(k)
end do
end do
!
end subroutine axpy_loop
!
end module test_mod
compile as
gfortran -O3 -o simple test_mod.f90 test_simple.f90
run as
./simple
Combining the flags -march=native and -flto solves the problem, at least on my test machines. With both options, the program is fully optimized and it makes no difference whether the subroutine is in the same file as the main program or in a separate file (separate module). In addition, the runtime is comparable to that of the Portland Group compiler. Neither option alone solved the problem: -march=native alone speeds up the contains version but makes the module version worse.
My biased thinking is that the option -march=native should be default; users doing something else are experienced and know what they are doing so they can add the appropriate option or disable the default, whereas the common user will not easily think of it.
Thank you for all the comments.
I want to perform a Matrix-Vector product in fortran using the SGEMV subroutine from BLAS.
I have a code that is similar to this:
program test
integer, parameter :: DP = selected_real_kind(15)
real(kind=DP), dimension (3,3) :: A
real(kind=DP), dimension (3) :: XP,YP
call sgemv(A,XP,YP)
A is a 3x3 Matrix, XP and YP are Vectors.
In the included module one can see the following code:
PURE SUBROUTINE SGEMV_F95(A,X,Y,ALPHA,BETA,TRANS)
! Fortran77 call:
! SGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY)
USE F95_PRECISION, ONLY: WP => SP
REAL(WP), INTENT(IN), OPTIONAL :: ALPHA
REAL(WP), INTENT(IN), OPTIONAL :: BETA
CHARACTER(LEN=1), INTENT(IN), OPTIONAL :: TRANS
REAL(WP), INTENT(IN) :: A(:,:)
REAL(WP), INTENT(IN) :: X(:)
REAL(WP), INTENT(INOUT) :: Y(:)
END SUBROUTINE SGEMV_F95
I understand that some of the parameters are optional, so where am I going wrong in the call?
When you look at BLAS or LAPACK routines then you should always have a look at the first letter:
S: single precision
D: double precision
C: single precision complex
Z: double precision complex
You defined your matrix A as well as the vectors XP and YP as double precision using the statement:
integer, parameter :: DP = selected_real_kind(15)
So for this, you need to use dgemv or define your precision as single precision.
There is also a difference between calling dgemv and dgemv_f95. dgemv_f95 is part of Intel MKL and not really a common naming. For portability reasons, I would not use that notation but stick to the classic dgemv which is also part of Intel MKL.
DGEMV performs one of the matrix-vector operations
y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an
m by n matrix.
If you want to know how to call the function, I suggest having a look here, but in the end it should look something like this:
call DGEMV('N',3,3,ALPHA,A,3,XP,1,BETA,YP,1)
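For completeness, here is a minimal sketch of a full program around that call. The values of A and XP, the names alpha and beta, and the comparison against the intrinsic matmul are assumptions added for illustration; the program needs to be linked against a BLAS library (for example with -lblas).

```fortran
program gemv_example
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  real(dp), parameter :: alpha = 1.0_dp, beta = 0.0_dp
  real(dp) :: a(3, 3), xp(3), yp(3)
  integer :: i
  external :: dgemv   ! Fortran 77 BLAS routine, implicit interface

  ! Fill A column by column with 1..9 and pick a simple vector.
  a = reshape([(real(i, dp), i = 1, 9)], [3, 3])
  xp = [1.0_dp, 2.0_dp, 3.0_dp]
  yp = 0.0_dp

  ! yp := alpha * A * xp + beta * yp
  call dgemv('N', 3, 3, alpha, a, 3, xp, 1, beta, yp, 1)

  print *, 'dgemv :', yp
  print *, 'matmul:', matmul(a, xp)   ! intrinsic reference result
end program gemv_example
```

Both lines should show the same vector (30, 36, 42 for these inputs). Because dgemv is called through an implicit interface here, the compiler cannot check the argument kinds, so all real arguments must be double precision.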
The precisions are incompatible. You are calling sgemv which takes single precision arguments but you are passing double precision arrays and vectors.
Perhaps the trans parameter is required?
trans: Must be 'N', 'C', or 'T'.
(As per the note at the bottom of Developer Reference for Intel® Math Kernel Library - Fortran.)
I wrote a Fortran code that calculates the ith permutation of a given list {1,2,3,...,n} without computing all the others, of which there are n!. I need this to find the ith path of the TSP (travelling salesman problem).
When n! is big, the code gives errors, and I verified that the ith permutation it finds is not the correct one. For n=10 there are no problems at all, but for n=20 the code crashes or returns wrong values. I think this is due to errors that Fortran makes when operating on big numbers (sums of big numbers).
I use Visual Fortran Ultimate 2013. Below is the subroutine I use. WeightAdjMatRete is the distance matrix between each pair of nodes of the network.
! Fattoriale
RECURSIVE FUNCTION factorial(n) RESULT(n_factorial)
IMPLICIT NONE
REAL, INTENT(IN) :: n
REAL :: n_factorial
IF(n>0) THEN
n_factorial=n*factorial(n-1)
ELSE
n_factorial=1.
ENDIF
ENDFUNCTION factorial
! ith-permutazione di una lista
SUBROUTINE ith_permutazione(lista_iniziale,n,i,ith_permutation)
IMPLICIT NONE
INTEGER :: k,n
REAL :: j,f
REAL, INTENT(IN) :: i
INTEGER, DIMENSION(1:n), INTENT(IN) :: lista_iniziale
INTEGER, DIMENSION(1:n) :: lista_lavoro
INTEGER, DIMENSION(1:n), INTENT(OUT) :: ith_permutation
lista_lavoro=lista_iniziale
j=i
DO k=1,n
f=factorial(REAL(n-k))
ith_permutation(k)=lista_lavoro(FLOOR(j/f)+1)
lista_lavoro=PACK(lista_lavoro,MASK=lista_lavoro/=ith_permutation(k))
j=MOD(j,f)
ENDDO
ENDSUBROUTINE ith_permutazione
! Funzione modulo, adattata
PURE FUNCTION mood(k,modulo) RESULT(ris)
IMPLICIT NONE
INTEGER, INTENT(IN) :: k,modulo
INTEGER :: ris
IF(MOD(k,modulo)/=0) THEN
ris=MOD(k,modulo)
ELSE
ris=modulo
ENDIF
ENDFUNCTION mood
! Funzione quoziente, adattata
PURE FUNCTION quoziente(a,p) RESULT(ris)
IMPLICIT NONE
INTEGER, INTENT(IN) :: a,p
INTEGER :: ris
IF(MOD(a,p)/=0) THEN
ris=(a/p)+1
ELSE
ris=a/p
ENDIF
ENDFUNCTION quoziente
! Vettori contenenti tutti i payoff percepiti dagli agenti allo state vector attuale e quelli ad ogni sua singola permutazione
SUBROUTINE tuttipayoff(n,m,nodi,nodi_rete,sigma,bvector,MatVecSomma,VecPos,lista_iniziale,ith_permutation,lunghezze_percorso,WeightAdjMatRete,array_perceived_payoff_old,array_perceived_payoff_neg)
IMPLICIT NONE
INTEGER, INTENT(IN) :: n,m,nodi,nodi_rete
INTEGER, DIMENSION(1:nodi), INTENT(IN) :: sigma
INTEGER, DIMENSION(1:nodi), INTENT(OUT) :: bvector
REAL, DIMENSION(1:m,1:n), INTENT(OUT) :: MatVecSomma
REAL, DIMENSION(1:m), INTENT(OUT) :: VecPos
INTEGER, DIMENSION(1:nodi_rete), INTENT(IN) :: lista_iniziale
INTEGER, DIMENSION(1:nodi_rete), INTENT(OUT) :: ith_permutation
REAL, DIMENSION(1:nodi_rete), INTENT(OUT) :: lunghezze_percorso
REAL, DIMENSION(1:nodi_rete,1:nodi_rete), INTENT(IN) :: WeightAdjMatRete
REAL, DIMENSION(1:nodi), INTENT(OUT) :: array_perceived_payoff_old,array_perceived_payoff_neg
INTEGER :: i,j,k
bvector=sigma
FORALL(i=1:nodi,bvector(i)==-1)
bvector(i)=0
ENDFORALL
FORALL(i=1:m,j=1:n)
MatVecSomma(i,j)=bvector(m*(j-1)+i)*(2.**REAL(n-j))
ENDFORALL
FORALL(i=1:m)
VecPos(i)=1.+SUM(MatVecSomma(i,:))
ENDFORALL
DO k=1,nodi
IF(VecPos(mood(k,m))<=factorial(REAL(nodi_rete))) THEN
CALL ith_permutazione(lista_iniziale,nodi_rete,VecPos(mood(k,m))-1.,ith_permutation)
FORALL(i=1:(nodi_rete-1))
lunghezze_percorso(i)=WeightAdjMatRete(ith_permutation(i),ith_permutation(i+1))
ENDFORALL
lunghezze_percorso(nodi_rete)=WeightAdjMatRete(ith_permutation(nodi_rete),ith_permutation(1))
array_perceived_payoff_old(k)=(1./SUM(lunghezze_percorso))
ELSE
array_perceived_payoff_old(k)=0.
ENDIF
IF(VecPos(mood(k,m))-SIGN(1,sigma(m*(quoziente(k,m)-1)+mood(k,m)))*2**(n-quoziente(k,m))<=factorial(REAL(nodi_rete))) THEN
CALL ith_permutazione(lista_iniziale,nodi_rete,VecPos(mood(k,m))-SIGN(1,sigma(m*(quoziente(k,m)-1)+mood(k,m)))*2**(n-quoziente(k,m))-1.,ith_permutation)
FORALL(i=1:(nodi_rete-1))
lunghezze_percorso(i)=WeightAdjMatRete(ith_permutation(i),ith_permutation(i+1))
ENDFORALL
lunghezze_percorso(nodi_rete)=WeightAdjMatRete(ith_permutation(nodi_rete),ith_permutation(1))
array_perceived_payoff_neg(k)=(1./SUM(lunghezze_percorso))
ELSE
array_perceived_payoff_neg(k)=0.
ENDIF
ENDDO
ENDSUBROUTINE tuttipayoff
Don't use floating-point numbers to represent factorials; factorials are products of integers and are therefore best represented as integers.
Factorials grow big fast, so it may be tempting to use reals, because reals can represent huge numbers like 1.0e+30. But floating-point numbers are precise only relative to their magnitude: their mantissa still has a limited size, and they can be huge only because their exponents can be huge.
A 32-bit real can represent exact integers up to about 16 million (2^24). After that, only every even integer can be represented up to 32 million, and only every fourth integer up to 64 million. 64-bit reals are better, because they can represent exact integers up to about 9 quadrillion (2^53).
64-bit integers can go 1024 times further: they can represent integers up to just under 2^63, or about 9 quintillion (9e+18). That is enough to represent 20!:
20! = 2,432,902,008,176,640,000
2^63 = 9,223,372,036,854,775,808
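The limits above can be checked with a short program. This is a sketch that assumes the default real is a 32-bit IEEE type; the program name exact_integers is made up for illustration.

```fortran
program exact_integers
  implicit none
  integer, parameter :: long = selected_int_kind(18)
  real :: r
  integer(kind=long) :: f
  integer :: i

  ! Distance between neighbouring default reals at 2**24.
  r = 2.0**24
  print *, 'spacing of reals near 2**24:', spacing(r)

  ! 20! computed exactly with a 64-bit integer.
  f = 1
  do i = 2, 20
    f = f * i
  end do
  print *, '20!       =', f
  print *, 'huge(f)   =', huge(f)

  ! The same value after conversion to a default real loses its low digits.
  print *, 'real(20!) =', real(f)
end program exact_integers
```

With a 32-bit default real, the reported spacing is expected to be 2, which is the "only every even integer" regime described above, while the integer result still fits below huge(f).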
Fortran allows you to select a kind of integer based on the decimal places it should be able to represent:
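integer(kind=selected_int_kind(18))
Use this to do your calculations with 64-bit integers. This will give you factorials up to 20!. It won't go further than that on most machines, though: where integers wider than 64 bits are not supported, selected_int_kind(19) returns -1, and using that as a kind gives a compile-time error. (Some compilers, such as gfortran on 64-bit targets, also offer a 128-bit integer kind.)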
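Which integer kinds are available depends on the compiler and target, so it can be worth querying them before relying on a particular range. A minimal sketch (the program name query_int_kinds is made up for illustration):

```fortran
program query_int_kinds
  implicit none
  integer :: r

  ! selected_int_kind(r) returns a kind that can hold all integers with
  ! up to r decimal digits, or -1 if no such kind exists.
  do r = 17, 20
    print '(a,i0,a,i0)', 'selected_int_kind(', r, ') = ', selected_int_kind(r)
  end do
end program query_int_kinds
```

A result of -1 means that no integer kind with the requested range exists for that compiler.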
Here's the permutation part of your program with 64-bit integers. Note how all the type conversions and floors and ceilings disappear.
program permute
implicit none
integer, parameter :: long = selected_int_kind(18)
integer, parameter :: n = 20
integer, dimension(1:n) :: orig
integer, dimension(1:n) :: perm
integer(kind=long) :: k
do k = 1, n
orig(k) = k
end do
do k = 0, 2000000000000000_long, 100000000000000_long
call ith_perm(perm, orig, n, k)
print *, k
print *, perm
print *
end do
end program
function fact(n)
implicit none
integer, parameter :: long = selected_int_kind(18)
integer(kind=long) :: fact
integer, intent(in) :: n
integer :: i
fact = 1
i = n
do while (i > 1)
fact = fact * i
i = i - 1
end do
end function fact
subroutine ith_perm(perm, orig, n, i)
implicit none
integer, parameter :: long = selected_int_kind(18)
integer, intent(in) :: n
integer(kind=long), intent(in) :: i
integer, dimension(1:n), intent(in) :: orig
integer, dimension(1:n), intent(out) :: perm
integer, dimension(1:n) :: work
integer :: k
integer(kind=long) :: f, j
integer(kind=long) :: fact
work = orig
j = i
do k = 1, n
f = fact(n - k)
perm(k) = work(j / f + 1)
work = pack(work, work /= perm(k))
j = mod(j, f)
end do
end subroutine ith_perm
I have to use a subroutine (neqnf) from the IMSL library, which lets me solve non-linear systems. (link to user's manual, neqnf page here)
main.f90 is:
program prova_sistema_in_un_modulo
include "link_fnl_shared.h"
use neqnf_int
use modx
implicit none
call d_neqnf(FCN, x, xguess=x_guess, fnorm=f_norm)
end program prova_sistema_in_un_modulo
where subroutine FCN is coded in an external module, modx.f90:
module modx
implicit none
integer, parameter :: ikind = selected_real_kind(8,99)
integer :: n=3
real(kind=ikind) :: f_norm
real(kind=ikind), dimension(3) :: x, x_guess=(/ 4.0, 4.0, 4.0/)
contains
subroutine FCN(x,f,n)
integer :: n !dummy var
real(kind=ikind), dimension(3) :: x, f !dummy var
f(1)=x(1)+A(x(1))+(x(2)+x(3))*(x(2)+x(3))-27.0 ! =0
f(2)=B(x(1),x(2))+x(3)*x(3)-10.0 ! =0
f(3)=Z(x(2),x(3)) ! =0
end subroutine FCN
function A(x)
real(kind=ikind) :: x !dummy var
real(kind=ikind) :: A !function var
A=exp(x-1.0)
end function A
function B(x,y)
real(kind=ikind) :: x,y !dummy var
real(kind=ikind) :: B !function var
B=exp(y-2.0)/x
end function B
function C(x)
real(kind=ikind) :: x !dummy var
real(kind=ikind) :: C !function var
C=sin(x-2.0)
end function C
function Z(x,y)
real(kind=ikind) :: x,y !dummy var
real(kind=ikind) :: Z !function var
Z=y+C(x)+x*x-7.0
end function Z
end module modx
but I get these three errors:
Error 1 error #7061: The characteristics of dummy argument 1 of the associated actual procedure differ from the characteristics of dummy argument 1 of the dummy procedure. (12.2) [FCN]
Error 2 error #7062: The characteristics of dummy argument 2 of the associated actual procedure differ from the characteristics of dummy argument 2 of the dummy procedure. (12.2) [FCN]
Error 3 error #7063: The characteristics of dummy argument 3 of the associated actual procedure differ from the characteristics of dummy argument 3 of the dummy procedure. (12.2) [FCN]
NB: if I put all the code in the main program, everything works fine, whereas if I use a module (as in the code posted here) I get these errors.
Can anyone help me?
The problem is that you declare the dummy arguments with a fixed dimension, x(3) and f(3), in your custom subroutine FCN, while IMSL expects a variable dimension, x(n) and f(n):
subroutine FCN(x,f,n)
integer :: n !dummy var
! real(kind=ikind), dimension(3) :: x, f !<- wrong
real(kind=ikind), dimension(n) :: x, f !<- correct
f(1)=x(1)+A(x(1))+(x(2)+x(3))*(x(2)+x(3))-27.0 ! =0
f(2)=B(x(1),x(2))+x(3)*x(3)-10.0 ! =0
f(3)=Z(x(2),x(3)) ! =0
end subroutine FCN
A working example to reproduce this is (interface borrowed from HYBRD1):
module test_int
contains
subroutine final(FCN, x, f, n)
interface
SUBROUTINE FCN (X, F, N)
INTEGER N
DOUBLE PRECISION X(N), F(N)
END SUBROUTINE
end interface
integer :: n
double precision :: x(n), f(n)
call FCN(x,f,n)
end subroutine
end module
module test_fct
contains
subroutine FCN(X, F, N)
integer :: n
double precision :: x(n), f(n)
print *,X ; print *,F ; print *,N
end subroutine
end module
program prova
use, intrinsic :: iso_fortran_env
use test_int
use test_fct
implicit none
integer,parameter :: n=2
double precision :: x(n), f(n)
x = [ 1.d0, 2.d0 ]
f = [ 3.d0, 4.d0 ]
call final(FCN, x, f, n)
end program prova