Fortran Compiler Optimization - compilation

I have been working with a Fortran code in a Visual Studio 2015 project. Until now I have always compiled and run the code with Visual Studio 2015 on my own laptop. The simulation takes roughly 2 hours to run to completion on a Dell laptop with an Intel® Core™ i7-8650U CPU (1.90 GHz, 4 cores, base speed 2112 MHz) and 16 GB of RAM (DDR4, 2400 MHz), running Windows 10 Enterprise (64-bit).
I have recently started using a High Performance Computing (HPC) cluster to compile and run my Fortran code. The code is not parallelized, and parallelizing it is not really possible (or straightforward) in my application. The HPC machine obviously has better specifications than my laptop, but I was surprised to see that it took 04:10:23 to finish the same simulation that takes 2 hours on my laptop. Both platforms use only a single core, since parallelization is not an option.
Would you have any suggestions for making the code run faster (at least as fast as on my laptop), such as compilation tips or options? I noticed that turning on compiler optimizations in Visual Studio significantly reduces the run time of simulations (by roughly 2x). Would compiling with specific options help a Fortran code?
I have tried to look online for Intel compiler options, but it all seems really complicated for a beginner, and I could not find a road map for where to look. I would really appreciate any recommendations or a direction to look in. I can provide additional information if necessary. Thank you very much for your time.
Here is the Makefile I use for compilation:
#=============< COMPILER >=============================#
FC = ifort
EXC = DL0D
#=============< SOURCE FILES >=========================#
SRCS = 1Main.for
#==============< OBJECT FILES >========================#
# The substitution must map .for to .o (it was %.for=%.for, a no-op):
OBJS = $(SRCS:%.for=%.o)
#OBJS += lsode_reduced.o utility.o
#======================================================#
COMMON =
LIBES = -lm
FFLAGS = -extend-source 132 -heap-arrays -O2 -g -traceback

$(EXC): $(OBJS)
	$(FC) $(FFLAGS) -o $@ $(OBJS) $(LIBES)
	$(RM) ./*.o ./*~

%.o: %.for
	$(FC) $(FFLAGS) -c -o $@ $<

$(OBJS): $(COMMON)

clean:
	rm -f $(EXC) $(OBJS)
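For reference, the kind of flag experiments the question is about can be sketched as FFLAGS variants in the Makefile above. -O3, -xHost, and -ipo are real ifort options; whether each one helps this particular code is an assumption that has to be benchmarked:

```make
# Current setting:
FFLAGS = -extend-source 132 -heap-arrays -O2 -g -traceback

# Candidates to benchmark one at a time (uncomment a single line):
#FFLAGS = -extend-source 132 -heap-arrays -O3              # higher optimization level
#FFLAGS = -extend-source 132 -heap-arrays -O3 -xHost      # also tune for the build host's CPU
#FFLAGS = -extend-source 132 -heap-arrays -O3 -xHost -ipo # add interprocedural optimization
```

Note that -xHost and aggressive optimization levels can change floating-point results slightly, so the simulation output should be verified after each change.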


How to speed up Compile Time of my CMake enabled C++ Project?

I came across several SO questions lately regarding specific aspects of improving the turnaround time of CMake-enabled C++ projects (like "At what level should I distribute my build process?" or "cmake rebuild_cache for just a subdirectory?"), and I was wondering whether there is more general guidance that makes use of the specific possibilities CMake offers. If there is no cross-platform compile-time optimization, I'm mainly interested in Visual Studio or GNU toolchain based approaches.
And I'm already aware of, and investing in, the generally recommended areas to speed up C++ builds:
Change/Optimize/fine-tune the toolchain
Optimize your code base/software architecture (e.g. by reducing dependencies and using well-defined sub-projects, such as unit tests)
Invest in better hardware (SSD, CPU, memory)
as recommended here, here, or here. So my focus in this question is on the first point.
Plus I know of the recommendations to be found in CMake's Wiki:
CMake: building with all your cores
CMake Performance Tips
The former just handles the basics (parallel make); the latter mostly covers how to speed up the parsing of CMake files.
Just to make this a little more concrete, if I take my CMake example from here with 100 libraries using MSYS/GNU I got the following time measurement results:
$ cmake --version
cmake version 3.5.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ time -p cmake -G "MSYS Makefiles" ..
-- The CXX compiler identification is GNU 4.8.1
...
-- Configuring done
-- Generating done
-- Build files have been written to: [...]
real 27.03
user 0.01
sys 0.03
$ time -p make -j8
...
[100%] Built target CMakeTest
real 113.11
user 8.82
sys 33.08
So I have a total of ~140 seconds and my goal - for this admittedly very simple example - would be to get this down to about 10-20% of what I get with the standard settings/tools.
Here's what I had good results with using CMake and Visual Studio or GNU toolchains:
Exchange GNU make for Ninja. It's faster, makes use of all available CPU cores automatically, and has good dependency management. Just be aware of the following:
a.) You need to set up the target dependencies in CMake correctly. If the build has a dependency on another artifact, it has to wait until that one is compiled (synchronization points).
$ time -p cmake -G "Ninja" ..
-- The CXX compiler identification is GNU 4.8.1
...
real 11.06
user 0.00
sys 0.00
$ time -p ninja
...
[202/202] Linking CXX executable CMakeTest.exe
real 40.31
user 0.01
sys 0.01
b.) Linking is always such a synchronization point, so you can make more use of CMake's object libraries to reduce their number, though it makes your CMake code a little uglier.
$ time -p ninja
...
[102/102] Linking CXX executable CMakeTest.exe
real 27.62
user 0.00
sys 0.04
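The object-library variant measured above can be sketched like this (target and file names are invented):

```cmake
# Sources are compiled once into object files; no intermediate archive is linked.
add_library(common_objs OBJECT a.cpp b.cpp)

# Consumers reuse the object files directly, removing one link synchronization point.
add_executable(CMakeTest main.cpp $<TARGET_OBJECTS:common_objs>)
add_executable(CMakeTestTool tool.cpp $<TARGET_OBJECTS:common_objs>)
```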
Split less frequently changed or stable code parts into separate CMake projects and use CMake's ExternalProject_Add() or - if you e.g. switch to binary delivery of some libraries - find_library().
Think of a different set of compiler/linker options for your daily work (but only if you also have some test time/experience with the final release build options).
a.) Skip the optimization parts
b.) Try incremental linking
If you often make changes to the CMake code itself, think about rebuilding CMake from sources optimized for your machine's architecture. CMake's officially distributed binaries are just a compromise to work on every possible CPU architecture.
When I use MinGW64/MSYS to rebuild CMake 3.5.2 with e.g.
cmake -DCMAKE_BUILD_TYPE:STRING="Release"
-DCMAKE_CXX_FLAGS:STRING="-march=native -m64 -Ofast -flto"
-DCMAKE_EXE_LINKER_FLAGS:STRING="-Wl,--allow-multiple-definition"
-G "MSYS Makefiles" ..
I can accelerate the first part:
$ time -p [...]/MSYS64/bin/cmake.exe -G "Ninja" ..
real 6.46
user 0.03
sys 0.01
Since CMake works with dedicated binary output directories, if your file I/O is very slow, make use of a RAM disk. If you are still using a hard drive, consider switching to a solid-state disk.
Depending on your final output file, exchange the GNU standard linker for the Gold linker. Even faster than Gold is lld from the LLVM project. You have to check whether it already supports the needed features on your platform.
Use Clang/c2 instead of the Visual C++ compiler. For the Visual C++ compiler itself, performance recommendations are provided by the Visual C++ team; see https://blogs.msdn.microsoft.com/vcblog/2016/10/26/recommendations-to-speed-c-builds-in-visual-studio/
Incredibuild can boost the compilation time.
References
CMake: How to setup Source, Library and CMakeLists.txt dependencies?
Replacing ld with gold - any experience?
Is the lld linker a drop-in replacement for ld and gold?
For speeding up the CMake configure time see: https://github.com/cristianadam/cmake-checks-cache
LLVM + Clang got a ~3x speedup.

Makefile library prerequisite

I have a makefile which I am using to cross-compile for an embedded ARM platform with gcc. Specifically, I am using arm-none-eabi-gcc, but the same applies to avr-gcc, msp430-gcc, etc. Typically when using make+gcc (and not cross-compiling) I list libraries as prerequisites as follows:
programA.elf: programA.o foo.o -lm ...etc
programB.elf: programB.o bar.o -lftdi ...etc
%.elf:
	gcc $(LDFLAGS) -o $@ $^
Make handles this -l syntax very nicely, and it's very convenient if you are building multiple programs/targets and want to have a generic rule for linking. The problem I have run into during cross-compiling is that arm-none-eabi-gcc obviously has a different libm.a than my system gcc's libm.so (for example), but Make doesn't know what's going on here and keeps trying to use the x86 libm instead of the ARM one. I can get things to work by adding the line:
.LIBPATTERNS = /usr/lib/arm-none-eabi/newlib/lib%.a
but it seems kind of clunky, and it exposes anyone wanting to compile the project to knowing more about the toolchain's install locations than is normally expected.
My question is: "Is there a better convention for listing a binary's lib dependencies that I should be using here, one that won't break when cross-compiling?"
This can be done, but a general solution is complex. I have Makefiles which build arm, x86, and c67 executables from a single set of sources. The page you reference alludes to the key: VPATH. I suggest a separate subdirectory for each architecture. The following is not working code, but it gives the idea:
all: arm/pgma x86/pgma

vpath %.c $(CURDIR)

arm x86:
	mkdir -p $@

arm/pgma: arm/main.o arm/sub.o | arm
x86/pgma: x86/main.o x86/sub.o more.o | x86

arm/%: CC=arm-none-eabi-gcc
arm/%: CFLAGS += -march=armv7-a -mtune=cortex-a8
x86/%: CC=gcc
arm/%: VPATH = /usr/lib/arm-none-eabi/newlib
# Notice, VPATH not needed for x86 since it is the native host
This entire concept can be extended to build dependency files in each subdirectory, as well as debug and release variants. I have not tried this with -lfoo, but it should work. E.g.,
arm/pgma: arm/main.o arm/sub.o -lmylib | arm

Error compiling with Visual Studio 2013 using VisualMicro addon for Arduino

I am trying to compile a sketch for Arduino using the VisualMicro addon for Visual Studio 2013. But even if I try to compile an empty project, I get an error I cannot understand. What is the source of the problem?
Compiling 'Test' for 'Arduino Uno'
Process: "{runtime.tools.avr-gcc.path}\bin\avr-g++" -c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD -mmcu=atmega328p -DF_CPU=16000000L -DARDUINO=163 -DARDUINO_AVR_UNO -DARDUINO_ARCH_AVR -I"C:\Program Files\Arduino\hardware\arduino\avr\cores\arduino" -I"C:\Program Files\Arduino\hardware\arduino\avr\variants\standard" "C:\Users\Just_a_human\AppData\Local\V.Micro\Arduino\Builds\Test\uno\Test.cpp" -o "C:\Users\Just_a_human\AppData\Local\V.Micro\Arduino\Builds\Test\uno\Test.cpp.o" Error compiling
Could anyone please show me in which direction I have to dig?
Thanks in advance.
Found the problem. It was due to the fact that VisualMicro doesn't support Arduino 1.6.2+ yet. I had to read manuals better ;)
What version of Arduino do you have? If your Arduino version is 1.6.3+, you need to use the Visual Micro beta, since the Visual Micro stable release supports Arduino 1.6.1 or earlier. This works fine for me, so check your Arduino version.
Here's the download link for Visual Micro: http://www.visualmicro.com/page/Arduino-Visual-Studio-Downloads.aspx
Visual Micro support for Arduino 1.6.2 was released a short time after Arduino released it. There were some huge changes, but it should all be working well now :)
This error can also happen if Arduino 1.6.6+ has not been run at least once after install.

ifort 64 bit build; ifort 64 vs. gfortran 64

I have what I imagine will be an easy question: how do I create a 64-bit build using ifort? I'm using "ifort -Ofast -o program.exe *.f". I've set compilervars to intel64 and am working on Windows 7 with a Xeon processor. I've looked through the menu of compiler flags but haven't been able to identify what I need. I see there's a -m64 option for Mac users, but that won't help me.
A second question: would there be that big of a performance difference between a gfortran -m64 build and the same build using ifort?
Thanks!
With ifort, you need to invoke the ifortvars.sh (or .csh) script with the "intel64" argument to get the x64 compiler. In fact, you are required to specify that argument (either intel64 or ia32), so look at how it is invoked in your environment and fix the reference. This is not selected with an option on the ifort command line.
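For example, on Linux this looks like the following (the install path is an assumption here; it varies by compiler version):

```shell
# Put the 64-bit ifort on PATH; the intel64 argument selects the x64 compiler.
source /opt/intel/bin/ifortvars.sh intel64

# On Windows, the equivalent is the matching batch file run from a command
# prompt, e.g. "...\Intel\...\bin\ifortvars.bat" intel64
ifort -V   # should now report the Intel(R) 64 compiler
```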
As for performance comparisons, I would point you to Polyhedron, an independent software reseller in the UK. They do multi-compiler comparisons on fixed hardware; click "Compiler Comparisons" in the left column. In their tests, gfortran is in 5th place (ifort is 1st).

No speedup with precompiled headers on gcc (but large speedup with visual studio)

I'm working on a large project that must build under multiple environments, chiefly linux/gcc and windows/msvc. To speed up the build, we use precompiled headers.
The Windows implementation is very efficient: on my quad-core hyperthreaded i7, build time goes down from 9 minutes to 1.5 minutes. With gcc, however, precompiled headers don't seem to improve performance: with or without them, the project builds in 22 minutes in a VirtualBox VM on the same computer, or about 40 minutes on a real server.
So I'm thinking the obvious: that I somehow got something wrong and the precompiled header isn't kicking in. I can't find what, however.
Our Makefiles are generated by CMake, so I can copy-paste the code used to compile the header and the object files that use it.
Creating the header:
/usr/bin/c++ -O3 -DNDEBUG --no-warnings \
  "-I/mnt/code/server a/src/game" "-I/mnt/code/server a/src/game/vmap" \
  "-I/mnt/code/server a/dep/include/g3dlite" "-I/mnt/code/server a/dep/include" \
  "-I/mnt/code/server a/src/shared" "-I/mnt/code/server a/src/framework" \
  "-I/mnt/code/server a/buildlinux" "-I/mnt/code/server a/buildlinux/src/shared" \
  -I/usr/include/mysql "-I/mnt/code/server a/dep/acelite" \
  -DDO_MYSQL -DHAVE_CONFIG_H -DVERSION=\"0.6.1\" -DSYSCONFDIR=\"../etc/\" \
  -D_RELEASE -D_NDEBUG -x c++-header \
  -o "/mnt/code/server a/buildlinux/src/game/pchdef.h.gch" \
  "/mnt/code/server a/src/game/pchdef.h"
Compiling an object file:
/usr/bin/c++ $(CXX_DEFINES) $(CXX_FLAGS) "-I/mnt/code/server a/buildlinux/src/game" \
  -include pchdef.h -Winvalid-pch \
  -o CMakeFiles/game.dir/AccountMgr.cpp.o -c "/mnt/code/server a/src/game/AccountMgr.cpp"
Insights are appreciated, even if they don't directly derive from the snippets above.
There are a couple of things that you need to pay attention to when using precompiled headers with GCC. First of all, the precompiled header must be created with the same arguments the cpp files are compiled with. Also, I assume you have actually included the precompiled header in AccountMgr.cpp?
Try compiling with the -H flag; this will output which include files are being considered. Check that the pchdef file is mentioned, and see what other include files are being parsed. To have gcc complain about invalid PCH files, consider using -Winvalid-pch.
