Does gcc -Ofast enable vectorization? - gcc

I am currently compiling the spec2000 art benchmark using the following two flag settings:
-Ofast -m32 -march=native
-Ofast -m32 -march=native -fno-tree-vectorize
The second setting only disables the vectorizer. However, when I checked the objdump output of the two builds, both show packed instructions such as vmovapd, vxorpd, etc.
Can anyone explain why? Thanks.
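One way to make the comparison more concrete is to count which kinds of packed mnemonics appear in each build. Compilers commonly use instructions such as vmovapd and vxorpd for register copies and zeroing in scalar code, so their presence alone is weak evidence of vectorization, while packed arithmetic (vaddpd, vmulpd, ...) is a stronger hint. The sketch below is a rough heuristic, not a definitive test: the function name, the mnemonic lists, and the assumption that the input is tab-separated `objdump -d` text are all illustrative.

```python
import re
from collections import Counter

# Mnemonics that also show up in scalar code (register copies, zeroing,
# sign manipulation). Illustrative list, not exhaustive.
WEAK = {"vmovapd", "vmovaps", "vxorpd", "vxorps", "vandpd", "vandps",
        "movapd", "movaps", "xorpd", "xorps", "andpd", "andps"}

# Packed floating-point arithmetic: names ending in "ps" or "pd".
PACKED_ARITH = re.compile(r"^v?(add|sub|mul|div|sqrt|min|max)p[sd]$")

def classify(disassembly: str) -> Counter:
    """Count weak vs. strong vectorization hints in objdump -d text."""
    counts = Counter()
    for line in disassembly.splitlines():
        # Assumed line layout: "address:<TAB>bytes<TAB>mnemonic operands"
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        parts = fields[2].split()
        if not parts:
            continue
        mnemonic = parts[0]
        if mnemonic in WEAK:
            counts["weak"] += 1
        elif PACKED_ARITH.match(mnemonic):
            counts["packed_arith"] += 1
    return counts

# Hand-written sample lines, not taken from the benchmark in the question.
SAMPLE = (
    " 8048400:\tc5 f9 57 c0 \tvxorpd %xmm0,%xmm0,%xmm0\n"
    " 8048404:\tc5 fb 58 c1 \tvaddsd %xmm1,%xmm0,%xmm0\n"
    " 8048408:\tc5 fd 58 c1 \tvaddpd %ymm1,%ymm0,%ymm0\n"
)
print(classify(SAMPLE))
```

Running this on the disassembly of both builds and comparing the "packed_arith" counts would show whether -fno-tree-vectorize actually removed packed arithmetic or whether only the "weak" instructions remain.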

Related

What does the -m32 compiler flag achieve?

In the code for building applications for a vendor driver, I see the following directive:
SAMPLE_CFLAGS = -Wall -m32
I am not sure I understand what the option -m32 stands for.
I searched for it, and the closest I got was the following link, which refers to the flag -m64 but provides no description:
CPP/C++ Compiler Flags and Options
Can someone provide an explanation?

Pytorch Extension: difference of performance between the extension that was compiled by g++ and that was built by setuptools

I wrote a C++ extension for torch that implements a custom convolution function.
First, I compiled this function directly with g++ for testing; the latency was 5 milliseconds.
Second, I tried to integrate this function into torch and installed the extension with setuptools, following the steps in the tutorial provided by torch. However, the latency is now 16 milliseconds.
The function invocation itself takes about 1-2 ms, so why does the performance differ so much?
The direct compilation with g++ was done with
g++ -pthread -mavx2 -mfma ...
and the directives in the source file include
#pragma GCC diagnostic ignored "-Wformat"
#pragma STDC FP_CONTRACT ON
#pragma GCC optimize("O3","unroll-loops","omit-frame-pointer","inline") //Optimization flags
// #pragma GCC option("arch=native","tune=native","no-zero-upper") //Enable AVX
#pragma GCC target("avx")
These directives were also included in the file built by setuptools. The "setup.py" file is
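from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='cusconv_cpp',
    ext_modules=[
        # In the build log, these extra flags appear after the default flags.
        CppExtension(name='cusconv_cpp', sources=['src/cusconv.cpp'],
                     extra_compile_args={'cxx': ['-O3', '-pthread', '-mavx2', '-mfma']})
    ],
    cmdclass={
        'build_ext': BuildExtension
    })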
The build log output by setuptools is
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/include/python3.6m -c src/indconv.cpp -o build/temp.linux-x86_64-3.6/src/indconv.o -O3 -pthread -mavx2 -mfma -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=indconv_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
which does include those flags, but many other flags are used as well.
Does anyone have any ideas?
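The logged command contains both -O2 (from the default flags) and -O3 (from extra_compile_args). GCC's manual states that when several -O options are given, the last one is effective, so one quick check is which -O flag comes last. The sketch below only covers that check; the helper name and the shortened command string are illustrative, and it does not by itself explain the latency difference.

```python
import re
import shlex

# Matches GCC optimization-level flags such as -O, -O2, -O3, -Os, -Ofast.
# Linker pass-through options like -Wl,-O1 do not match on purpose.
OPT_FLAG = re.compile(r"^-O(\d+|s|g|fast)?$")

def effective_opt_level(command: str):
    """Return the last -O flag on the command line, or None if there is none."""
    last = None
    for token in shlex.split(command):
        if OPT_FLAG.match(token):
            last = token
    return last

# Shortened from the build log above; include paths and -D flags are omitted.
LOGGED = ("x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g "
          "-fPIC -c src/indconv.cpp -o build/temp.linux-x86_64-3.6/src/indconv.o "
          "-O3 -pthread -mavx2 -mfma -std=c++11")

print(effective_opt_level(LOGGED))  # prints -O3
```

For this log the last flag is -O3, so the optimization level itself matches the manual g++ build; the remaining differences are in the other default flags.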

Why does qmake pass -Wl,-O1 to gcc when linking, and will it harm LTO?

I noticed that in the default release configuration, qmake (qmake 3.1, Qt 5.9.5, whatever is installed on my Ubuntu build box) passes -Wl,-O1 to g++ when linking. So the linking command line looks like
g++ -Wl,-O1 -flto -O2 -o program program.o lib1.a lib2.a ...
where -flto -O2 are the options that I'm passing via QMAKE_LFLAGS_RELEASE to enable LTO.
Now the question: why does qmake pass this -Wl,-O1 option, and is it going to interfere with LTO?
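QMake passes -Wl,-O1 because it is meant to be a good default.
It will not harm LTO. -Wl,-O1 is forwarded to the linker and sets the linker's own optimization level, which is separate from the compiler's -O level, so it does not override your -O2.
For the compiler's own -O options, g++'s man page says: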
If you use multiple -O options, with or without level numbers, the
last such option is the one that is effective.
You can remove the -Wl,-O1 from the generated Makefile by specifying the following in your project file:
QMAKE_LFLAGS_RELEASE -= -Wl,-O1
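To see which parts of a link command reach the linker rather than the compiler driver, it can help to split the command by hand. GCC documents that -Wl,option passes option to the linker, splitting at commas, and that -Xlinker passes the following argument to the linker. The sketch below applies just those two rules; the function name is hypothetical and real driver behaviour involves more than this.

```python
import shlex

def split_driver_and_linker_args(command: str):
    """Separate arguments the driver keeps from those it forwards to the linker."""
    driver, linker = [], []
    tokens = shlex.split(command)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-Wl,"):
            # -Wl,a,b forwards "a" and "b" as separate linker arguments.
            linker.extend(tok[4:].split(","))
        elif tok == "-Xlinker" and i + 1 < len(tokens):
            # -Xlinker forwards the next argument unchanged.
            i += 1
            linker.append(tokens[i])
        else:
            driver.append(tok)
        i += 1
    return driver, linker

driver, linker = split_driver_and_linker_args(
    "g++ -Wl,-O1 -flto -O2 -o program program.o lib1.a lib2.a")
print(linker)  # prints ['-O1']
print(driver)
```

In the qmake command line, -O1 ends up in the linker list while -flto and -O2 stay with the driver, which is why the two optimization settings do not compete.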

Transitioning makefile from Visual Studio + ifort to gfortran?

I need to use some Fortran code that was developed by a co-worker using Microsoft Visual Studio (it is a modified version of this groundwater flow model). However, I don't have an ifort license, and generally prefer to use open-source alternatives, so I am trying to transition to gfortran. After much googling and time on StackOverflow, I've successfully created a makefile, compiled the code, and run a simulation using gfortran. However, while benchmarking I found that the code I compiled with gfortran runs ~2.5x slower than my coworker's code compiled from Visual Studio (~30 seconds compared to ~13 seconds). Also, the size of the resulting .exe file from gfortran is about half that of Visual Studio.
I understand that there can be some differences in speed just due to switching compilers; however, the difference I observe seems extreme, leading me to believe there are certain compiler settings that differ between the two. I realize that the question, 'how can I optimize my Fortran code?' is vague and has been answered in detail elsewhere. Instead, my question is: What's the most efficient way to duplicate or approximate a compilation carried out using Microsoft Visual Studio (+ ifort) with gfortran?
My first instinct was to have my co-worker export a makefile from Microsoft Visual Studio (as described here), which I could then modify to work with gfortran by using equivalent settings where possible; however, it appears to be no longer possible to export a makefile from within Visual Studio. Second, I looked for a Fortran equivalent to MakeItSo, but didn't have any luck. I also had my co-worker send me screenshots of the Configuration Properties for Fortran in Visual Studio; however, there appear to be Visual Studio settings for which I cannot find clear analogs in gfortran (for example, the "Favor Size or Speed?" switch) seen here:
FWIW, I am running gfortran on Cygwin:
$ gfortran --version
GNU Fortran (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
After help in the comments section from @VladimirF, I have updated my makefile and can confirm that flags are being successfully passed to gfortran:
$ make SUTRA
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_mods.o slake_mods.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o fmods_3_0.o fmods_3_0.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake.o slake.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_blas.o slake_blas.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_linpack.o slake_linpack.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o slake_slatec.o slake_slatec.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o ssubs_3_0.o ssubs_3_0.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o usubs_3_0_non-Jeff.o usubs_3_0_non-Jeff.F
gfortran -O3 -march=native -ffixed-line-length-72 -m64 -c -o sutra_3_0.o sutra_3_0.F
gfortran slake_mods.o fmods_3_0.o slake.o slake_blas.o slake_linpack.o slake_slatec.o ssubs_3_0.o usubs_3_0_non-Jeff.o sutra_3_0.o -m64 \
-O3 -march=native -ffixed-line-length-72 -m64 -o SUTRA
My co-worker is using Intel Visual Fortran Compiler XE 12.0.2.154 on IA-32 with Microsoft Visual Studio 2008.

How to get -flto to work?

I'm using GCC 4.7.2 and LD 2.23 but when I add -flto to my compile options my compile time increases by over 20%! The manual seems to indicate that -fuse-linker-plugin is needed for the optimization to work. It also says that it's enabled by default with -flto but when I add it explicitly I see the following error in the link command:
g++: error: -fuse-linker-plugin is not supported in this configuration
According to the manual, it should be supported by LD 2.21 or greater. Any idea why I'm getting this error? For reference, here are examples of my full compile commands:
g++ -Wall -pipe -O3 -flto -fno-strict-aliasing -mtune=generic --no-exceptions -fPIC -c some.cc
g++ -o exec -Xlinker some1.o some2.o -static some1.a some2.a -Wl,--wrap,open -flto -fuse-linker-plugin
Running 'ld --help | grep plugin' shows "-plugin" option so I don't understand why GCC is complaining:
-plugin PLUGIN Load named plugin
-plugin-opt ARG Send arg to last-loaded plugin
Link-time optimization isn't supposed to reduce compilation time; it optimizes the runtime of your program.
Options: just add "-flto -fuse-linker-plugin" to your CFLAGS (or CXXFLAGS for C++) and LDFLAGS, and it should work just fine.
Gold: ld --version will probably report GNU ld. To switch to gold, make the ld symlink (the path printed by which ld) point to the path printed by which ld.gold,
e.g. ln -s /usr/bin/ld.gold /usr/bin/ld
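Before changing the symlink, it may be worth checking which linker ld currently resolves to. The sketch below assumes that the first line of `ld --version` names the linker ("GNU ld ..." or "GNU gold ..."), which is typical for binutils builds but is an assumption here; both helper names are hypothetical.

```python
import os
import shutil

def linker_flavor(version_output: str) -> str:
    """Guess the linker from the first line of `ld --version` output."""
    lines = version_output.strip().splitlines()
    first = lines[0].lower() if lines else ""
    if "gold" in first:
        return "gold"
    if first.startswith("gnu ld"):
        return "bfd"
    return "unknown"

def resolve_ld():
    """Return the real file behind the `ld` found on PATH, or None if absent."""
    path = shutil.which("ld")
    return None if path is None else os.path.realpath(path)

# Illustrative version strings, not captured from the system in the question.
print(linker_flavor("GNU ld (GNU Binutils) 2.23"))         # prints bfd
print(linker_flavor("GNU gold (GNU Binutils 2.23) 1.11"))  # prints gold
```

Feeding the real output of `ld --version` to linker_flavor, and comparing resolve_ld() before and after the symlink change, shows whether the switch took effect.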
