Compiling single vs multiple source files in Intel Fortran - performance

I have been compiling a project with modules and subroutines in different files: each subroutine is written in a separate file, and the same goes for the modules. I tested two approaches. In the first, I compiled these files separately to object files (-c) and then linked them, all with the optimization flags. In the second, I used cat to merge the entire source code into a single file and applied the same procedure to it. The executable generated from the single file was about 40% faster than the one generated from the multiple files, despite using exactly the same flags for both.
I would like to know why this happens, and whether the Intel Fortran compiler has a flag that compiles multiple files as if they were a single file.

As chw21 requested, I created a small program showing the problem:
program main
  use operators
  implicit none
  integer :: n
  real(8), dimension(:,:), allocatable :: a, b, c
  integer :: i, j, k
  n = 1000
  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0d0   ! c must be initialized before values are accumulated into it
  do j = 1, n
    do i = 1, n
      do k = 1, n
        !c(i,j) = c(i,j) + a(k,i) * b(k,j)
        c(i,j) = add(c(i,j), mul(a(k,i), b(k,j)))
      enddo
    enddo
  enddo
  write(*,*) sum(c)
end program
with the module:
module operators
  implicit none
contains
  function add(a,b) result (c)
    real(8), intent(in) :: a, b
    real(8) :: c
    c = a + b
  end function
  function mul(a,b) result (c)
    real(8), intent(in) :: a, b
    real(8) :: c
    c = a * b
  end function
end module
The idea is that these functions should normally get inlined, provided the compiler can see that they are this small. I ran three tests with -O2:
1. complete source in a single file
2. split into two files
3. split into two files with -ipo (or -flto)
The results for ifort 13.0.0 and gfortran 5.2.0 on different machines are:
Test | 1. | 2. | 3.
---------+-------+-------+-------
ifort | 1.3s | 15.7s | 1.9s
gfortran | 1.1s | 3.7s | 1.1s
Unfortunately, I don't know why ifort still shows a difference between tests 1 and 3; I guess a look at the generated code would shed some light on this.
Update: The times were measured by running time ./a.out, which gave stable timings. Since ifort -O2 targets SSE2 by default, the maximum instruction set used should be SSE2 (thus, no FMA), although the processor (Opteron 6128) supports up to SSE4a. An additional test on a recent Intel processor (supporting up to AVX) showed similar results.
An important factor seems to be that the inner loop is neither inlined nor vectorized in test 2, whereas both are applied with IPO and with single-file compilation (see -opt-report). Additionally, IPO and single-file compilation seem to differ somewhat in how they vectorize.

Related

Compile Module and Main Program In the Same File Using GFortran?

I am new to Fortran, and I am trying to run a Fortran 90 program in which the module and the main program are in the same file, main.f90:
module real_precision
  implicit none
  integer, parameter :: sp = selected_real_kind(1)
  integer, parameter :: dp = selected_real_kind(15)
end module real_precision
program main_program
  use real_precision
  implicit none
  real(sp) :: a = 1.0_sp
  real(dp) :: b = 1.0_dp
  print *, a
  print *, b
end program main_program
I compiled it with:
gfortran main.f90 -o main.x
and then ran it:
./main.x
However, after I changed the module and saved it, compiling and running the program the same way gives the same output, which makes me think the module needs to be compiled separately. How do I compile both when they're in the same file? I could move the module into a separate file, but I'd like to know how to do it this way.
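selected_real_kind(p) returns the kind parameter of a real with a decimal precision of at least p digits (if one exists). It does not give a kind parameter for a real with exactly that precision.
If your compiler does not have a real with precision less than q, then selected_real_kind(q) and selected_real_kind(q-1) will not return different kind parameters. For example, the smallest real kind in gfortran has a precision of 6 digits, so selected_real_kind(1) returns the same kind as selected_real_kind(6), which is likely why changing the requested precision within that range does not change the output. Note also that gfortran main.f90 -o main.x already compiles both the module and the program in that file, so no separate compilation step is needed.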
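To see which kinds you actually get, you can print each kind together with the intrinsic precision(), which reports the real decimal precision of that kind. This is a minimal sketch assuming a Fortran 2008 compiler such as gfortran; the module name precision_kinds, the constant k_p1 and the subroutine report_kinds are hypothetical names chosen for illustration.

```fortran
! Hypothetical module for inspecting what selected_real_kind actually returns.
module precision_kinds
  implicit none
  ! selected_real_kind(p) gives the kind with the SMALLEST precision that is
  ! still at least p digits, so a request for 1 digit usually maps to the
  ! same kind as a request for 6 digits.
  integer, parameter :: k_p1 = selected_real_kind(1)
  integer, parameter :: sp   = selected_real_kind(6)
  integer, parameter :: dp   = selected_real_kind(15)
contains
  subroutine report_kinds()
    ! precision() returns the actual decimal precision of a real of that kind
    print *, 'selected_real_kind(1):  kind', k_p1, ' precision', precision(1.0_k_p1)
    print *, 'selected_real_kind(6):  kind', sp,   ' precision', precision(1.0_sp)
    print *, 'selected_real_kind(15): kind', dp,   ' precision', precision(1.0_dp)
    if (k_p1 == sp) print *, 'requests for 1 and 6 digits gave the same kind'
  end subroutine report_kinds
end module precision_kinds
```

A main program that does use precision_kinds and then call report_kinds() prints the kinds chosen by your compiler; no particular kind numbers are assumed here, since they are compiler-dependent.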

How does using modules that use other modules affect compilation? (gfortran)

I asked a question about some strange behavior from a Fortran compiler here:
gfortran compiler cannot find misspelled full directory
Basically, the compiler sporadically** complains that a file is missing, but the printed name of the file (actually the full path) is misspelled, so no wonder it's "missing". I thought I had resolved the problem by using relative paths, but it turns out that the problem was just dormant. Here's an example of one such complaint:
C:\Users\charl\Documents\GitHub\MOONS>gfortran -fopenmp -g -fimplicit-none -cpp
C:/Users/charl/Documents/GitHub/MOONS/code/globals/current_precision.f90
...
C:/Users/charl/Documents/GitHub/MOONS/code/solvers/induction/init_Bfield.f90
C:/Users/charl/Documents/GitHub
gfortran: error: C:/Users/charl/Documents/GitHubMOONS/code/solvers/induction/init_Sigma.f90: No such file or directory
Notice that the forward slash ('/') between GitHub and MOONS is missing, so the path reads "GitHubMOONS". The path is written correctly in the makefile.
** I say sporadically because changing lines in the code sometimes makes the error disappear. Similarly, removing some unused modules (that compile just fine) from my compilation list makes the error disappear.
The compiler I'm using is:
GNU Fortran (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 4.9.2 Copyright (C) 2014 Free Software Foundation, Inc.
But I've seen the same problem with more recent compilers, e.g.:
x86_64-7.1.0-release-posix-seh-rt_v5-rev2
I think I see a pattern in when this error occurs: it seems to occur more often when modules pass public interfaces on to other modules.
Question
So, my question is, given the following two modules:
module add_int_mod
  implicit none
  private
  public :: add
  interface add; module procedure add_int; end interface
contains
  subroutine add_int(a,b)
    implicit none
    integer,intent(inout) :: a
    integer,intent(in) :: b
    a = a + b
  end subroutine
end module
module add_all_mod
  use add_int_mod
  implicit none
  private
  public :: add
end module
What is the difference between these two programs, if any?
Program 1
program main
  use add_all_mod ! only difference
  implicit none
  integer :: a,b
  a = 0
  b = 1
  call add(a,b)
  write(*,*) 'a = ',a
  write(*,*) 'b = ',b
end program
Program 2
program main
  use add_int_mod ! only difference
  implicit none
  integer :: a,b
  a = 0
  b = 1
  call add(a,b)
  write(*,*) 'a = ',a
  write(*,*) 'b = ',b
end program
I appreciate any help/suggestions.

Unable to suppress bound checking

Firstly, I wasn't aware that bounds checking was automatic when using gfortran. When compiling with the following command:
gfortran -Wno-array-bounds initial_parameters.f08 derrived_types.f08 lin_alg.f08 constitutive_models.f08 input_subs.f08 Subprograms.f08 mainprog.f08
I still receive the compile-time warnings:
Warning: Array reference at (1) is out of bounds (3 > 2) in dimension 2
I am probably being silly here, but from reading this, I thought that -Wno-array-bounds was supposed to suppress this warning. Compiling with -w successfully inhibits all warnings.
I don't know if it's relevant, but the sources of these warnings are "Subprograms.f08" and "constitutive_models.f08", which are both modules containing subroutines that are used in the main program.
The same behaviour occurs if I attempt to compile an individual module with
gfortran -Wno-array-bounds -c constitutive_models.f08
I can reproduce that compile-time warning with gfortran (4.4) using this simple code:
integer,parameter::dim=3
integer :: x(2)
if(dim.eq.1)write(*,*)x(dim)
end
Warning: Array reference at (1) is out of bounds (3 > 2) in dimension 2
This could arguably be considered a bug, since dim is a parameter, so dim.eq.1 is always false and one would expect the compiler to optimize away the whole if statement. Note that ifort compiles this just fine.
A very simple workaround fixes this example:
integer,parameter::dim=3
integer :: x(2),dimx=dim
if(dim.eq.1)write(*,*)x(dimx)
end
Of course, since it's just a warning and you know it's not a problem, you can also choose to ignore it.
Note that the parameter is still used in the logical test, so the compiler remains free to optimize the if statement away.
So what I would suggest is to use overloaded subroutines to process the data: you then get generic behavior without having to pass the dimension explicitly to the function (which also gets rid of the warning). I would also recommend following Holmz's advice of enabling all warnings during testing and turning them off completely for the production build (-w).
So far I haven't found an efficient way of suppressing this warning other than -w; the compile-time check for array bounds seems to be on by default and is not overridden by -fno-bounds-check or -Wno-array-bounds. Overloaded procedures may be a better solution to your problem anyway; in this case the implementation could look like this:
module functions
  implicit none
  interface test_dim
    module procedure test_func1d, test_func2d, test_func3d
  end interface ! test_dim
contains
  subroutine test_func1d(input1d)
    real, intent(in) :: input1d(:)
    print*, "DOING 1 DIM"
    print*, "SHAPE OF ARRAY:", shape(input1d)
  end subroutine test_func1d
  subroutine test_func2d(input2d)
    real, intent(in) :: input2d(:,:)
    print*, "DOING 2 DIM"
    print*, "SHAPE OF ARRAY:", shape(input2d)
  end subroutine test_func2d
  subroutine test_func3d(input3d)
    real, intent(in) :: input3d(:,:,:)
    print*, "DOING 3 DIM"
    print*, "SHAPE OF ARRAY:", shape(input3d)
  end subroutine test_func3d
end module functions
program test_prog
  use functions
  implicit none
  real :: case1(10), case2(20,10), case3(30, 40, 20)
  call test_dim(case1)
  call test_dim(case2)
  call test_dim(case3)
end program test_prog
The output produced by this program looks like this:
DOING 1 DIM
SHAPE OF ARRAY: 10
DOING 2 DIM
SHAPE OF ARRAY: 20 10
DOING 3 DIM
SHAPE OF ARRAY: 30 40 20

How to measure elapsed time on a processor executing code in a Fortran OpenMP thread

I used cpu_time, but apparently that gives the total time for all threads. I also tried omp_get_wtime, but it gives negative elapsed times, which cannot be correct, and mpi_wtime, for which I am now getting a core dump (earlier it just returned 0.000000000). The relevant code is as follows:
real*8 tbeg, tend
....
!$omp sections private (ie, tbeg, tend)
!$omp section
tbeg = omp_get_wtime()
do ie=1, E
call rmul(u, A, B, dudr, duds, dudt, ie)
enddo
tend = omp_get_wtime()
!Step 4: Print results
print *, tend-tbeg
!$omp end section
!$omp section
....
!$omp end section
!$omp end sections
My compile commands are:
gfortran -Ofast -c mult.f -o mult.o -mcmodel=large -I/usr/lib/openmpi/include -fopenmp
gfortran -o baseline ../lib/performance_test.o mult.o ../lib/rose.o -lcuda -lcudart -L/usr/local/cuda-5.0/lib64 -lcublas -lgomp -lmpi_f77
I finally managed to reproduce your issue (with some difficulty, but I've got it), and I'm pretty sure the bottom line is that you forgot two things in your code:
to include either the OpenMP header (include 'omp_lib.h') or, better, the OpenMP module (use omp_lib);
to forbid implicit variable declarations (implicit none).
Although the latter isn't strictly speaking an error, it's definitely a good habit to adopt, and it would have spared you the actual issue caused by the former, since the compiler would have given you the following message:
tbeg = omp_get_wtime()
       1
Error: Function 'omp_get_wtime' at (1) has no IMPLICIT type
So what happened is that you implicitly declared omp_get_wtime as a function returning a default (single-precision) real, whereas it actually returns a double-precision value. The return value was therefore misinterpreted, and you got garbage.
Just add the right header (or module) and call omp_get_wtime() as you do in your code snippet, and everything should be all right.
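A minimal sketch of the corrected pattern, assuming gfortran built with -fopenmp: the omp_lib module supplies the correct double-precision interface for omp_get_wtime(), the timers are declared double precision under implicit none, and each section times itself on the thread that runs it. The module name section_timing, the subroutine run_sections and the two dummy workloads are hypothetical stand-ins for the rmul loop, and the combined parallel sections construct replaces the enclosing parallel region that is elided in the question.

```fortran
! Hypothetical module: per-section wall-clock timing with OpenMP.
module section_timing
  use omp_lib        ! explicit interface: omp_get_wtime() returns double precision
  implicit none      ! undeclared names (e.g. a missing OpenMP interface) become errors
contains
  ! Runs two independent dummy workloads in OpenMP sections and records the
  ! wall-clock time each section took on the thread that executed it.
  subroutine run_sections(n, sums, elapsed)
    integer, intent(in) :: n
    double precision, intent(out) :: sums(2), elapsed(2)
    double precision :: tbeg, tend   ! must match the return kind of omp_get_wtime
    integer :: i

    !$omp parallel sections private(i, tbeg, tend)
    !$omp section
    tbeg = omp_get_wtime()
    sums(1) = 0.0d0
    do i = 1, n
       sums(1) = sums(1) + sqrt(dble(i))
    end do
    tend = omp_get_wtime()
    elapsed(1) = tend - tbeg
    !$omp section
    tbeg = omp_get_wtime()
    sums(2) = 0.0d0
    do i = 1, n
       sums(2) = sums(2) + 1.0d0 / dble(i)
    end do
    tend = omp_get_wtime()
    elapsed(2) = tend - tbeg
    !$omp end parallel sections
  end subroutine run_sections
end module section_timing
```

Each section writes only its own elements of sums and elapsed, so sharing those arrays does not create a race. Because omp_get_wtime measures wall-clock time, tend - tbeg is the time that section took rather than CPU time accumulated over threads.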

Gfortran exhibits a weird behaviour, is this a bug?

I noticed a weird behaviour with gfortran. The version I am using is
GNU Fortran (MacPorts gcc5 5.2.0_0) 5.2.0
My OS is OS X Yosemite 10.10.3 (14D136).
I ran the following code:
program test
  implicit none
  type :: mytype
    real(kind=8),dimension(:,:,:),allocatable :: f
  end type
  type(mytype),dimension(:,:),allocatable :: tab
  integer i,j
  allocate(tab(3,8))
  do i=1,3
    do j=1,8
      allocate(tab(i,j)%f(i,i,i))
    enddo
  enddo
  call check_shapes(tab(:,1))
contains
  subroutine check_shapes(arg)
    integer :: n,k
    type(mytype),dimension(:) :: arg
    n=size(arg)
    do k=1,n
      print*,shape(arg(k)%f)
    enddo
  end subroutine
end program
The output is as expected:
1 1 1
2 2 2
3 3 3
However, if I change the way I declare the dummy argument in the subroutine from
type(mytype),dimension(:) :: arg
to
class(mytype),dimension(:) :: arg
(i.e., using class instead of type for the dummy argument), I get the following output:
2 2 2
3 3 3
1 1 1
Is this a bug, or am I missing something?
Note that it works fine with ifort,
version Intel(R) 64, Version 15.0.3.187 Build 20150408.
I have checked the already-reported bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61337
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58043
and both of them are (almost completely) fixed on the GCC trunk by a recent commit (probably https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58043). Your bug appears to be just a variant of these reports.
I have added the information about the recent change to the existing reports. You can expect GCC 6 to contain the fix.
