How to set gcc option -march? - gcc

I got the help text for gcc's -march option by running gcc --target-help:
-march=CPU[,+EXTENSION...]
generate code for CPU and EXTENSION, CPU is one of: i8086,
i186, i286, i386, i486, pentium, pentiumpro, pentiumii,
pentiumiii, pentium4, prescott, nocona, core, core2,
corei7, l1om, k6, k6_2, athlon, k8, amdfam10, generic32,
generic64 EXTENSION is combination of: 8087, 287, 387,
no87, mmx, nommx, sse, sse2, sse3, ssse3, sse4.1, sse4.2,
sse4, nosse, avx, noavx, vmx, smx, xsave, movbe, ept, aes,
pclmul, fma, clflush, syscall, rdtscp, 3dnow, 3dnowa,
sse4a, svme, abm, padlock, fma4, xop, lwp
I tried -march=i686+nommx and -march=i686,+nommx, but neither is accepted. gcc reports: error: bad value (i686,+nommx) for -march= switch
I want to build my program for an i686 target without MMX. How should I set the -march option?
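One way to approach this, offered as a sketch rather than a confirmed answer from the thread: as far as I can tell, the -march=CPU[,+EXTENSION...] syntax printed by --target-help belongs to the assembler's options, while the compiler itself takes a plain CPU name and toggles extensions through separate -m<feature> / -mno-<feature> flags. The script below assumes a gcc with 32-bit x86 support on PATH; the helper name mmx_macros is made up for illustration.

```shell
#!/bin/sh
# Sketch: target i686 with MMX explicitly disabled, then confirm through the
# predefined macros. Assumes gcc on PATH with 32-bit (-m32) x86 support.

# Hypothetical helper: print only MMX-related macros from a `gcc -dM -E`
# dump read on stdin (prints nothing if none are defined).
mmx_macros() {
    grep -E '^#define __MMX' || true
}

if command -v gcc >/dev/null 2>&1; then
    # -mno-mmx is a separate compiler flag, not a suffix of -march.
    gcc -m32 -march=i686 -mno-mmx -dM -E - </dev/null 2>/dev/null | mmx_macros
fi
```

If the last command prints nothing, no MMX macro is defined for that flag combination, which suggests the compiler will not emit MMX instructions on its own.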

Related

For gcc compiler, what x86-64 instruction set does gcc target when you compile without any flags versus -O2?

For x86-64 there are lots of instruction sets that speed up code execution. Here is a list from gcc wiki https://gcc.gnu.org/wiki/FunctionMultiVersioning:
MMX
SSE
SSE2
SSE3
SSSE3
SSE4.1
SSE4.2
POPCNT
AVX
AVX2
For gcc compiler, what x86-64 instruction set does gcc target when you compile without any flags versus -O2?
To keep things simple, let's say the question is about gcc version 12 (the most recent major release). But I would also like to know which gcc command-line switches/options I need to use so that I can see what my own version of gcc does.
I assume that gcc chooses something that is "portable" so that would mean probably something slow. But this is just my assumption... I would like to know does that mean like SSE4.2 or none?
If you don't pass a command-line -march option, then you get whatever was selected when gcc was compiled. The default is -march=x86-64 but it could have been overridden by whoever compiled your gcc (e.g. your binary package distributor). See https://gcc.gnu.org/install/configure.html and note the --with-arch option.
You can compile with -v -Q to see what option is in use. Look for the options passed line.
With -march=x86-64 you get "least common denominator" code that will run on every known x86-64 CPU, all the way back to the AMD K8. This includes SSE2, which was part of the original AMD64 spec, but not SSE3 or anything later. popcnt would not be included either.
The -march option is orthogonal to optimization options like -O2 and the -f... flags (e.g. -funroll-loops). You always get code compatible with whatever is selected by -march, no matter what optimization options are in use. However -m flags (like -mavx) can permit the use of other CPU features beyond what -march implies, in which case your code is only guaranteed to run on CPUs with those features.
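To check this on a local install, a small sketch along these lines may help. It relies on gcc -Q --help=target and gcc -dM -E, and assumes the -march= line in the --help=target output has the form "-march=  <value>"; the helper names default_march and simd_macros are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the default -march of the local gcc and the SIMD feature
# macros it predefines, with and without -O2. Assumes an x86 gcc on PATH.

# Hypothetical helper: pull the -march value out of `gcc -Q --help=target`
# output read on stdin (assumes a line of the form "  -march=   x86-64").
default_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Hypothetical helper: keep only SSE/SSSE3/AVX/POPCNT feature macros from a
# `gcc -dM -E` dump read on stdin.
simd_macros() {
    grep -E '^#define __(SSE[0-9_]*|SSSE3|AVX[0-9A-Z]*|POPCNT)__ ' || true
}

if command -v gcc >/dev/null 2>&1; then
    echo "default -march: $(gcc -Q --help=target 2>/dev/null | default_march)"
    echo "feature macros with no flags:"
    gcc -dM -E - </dev/null 2>/dev/null | simd_macros
    echo "feature macros with -O2 (expected to be the same set):"
    gcc -O2 -dM -E - </dev/null 2>/dev/null | simd_macros
fi
```

Adding a flag such as -mavx or -march=native to either gcc -dM -E line should make the corresponding macros appear, which is a quick way to see what a given option enables.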

How can I determine what architectures gcc supports?

GCC supports a -march switch that allows you to specify the architecture you are targeting - allowing it to tune instruction sequences for that platform as well as using instructions that might be available on the platform which aren't available on the "default" or base version of the architecture.
For example, -march=skylake will tell the compiler to target Skylake CPUs, including using instruction sets available on Skylake such as AVX2.
How can I tell what values for -march the local version of gcc supports? Newer versions helpfully list the valid arguments when an invalid argument is passed, but older versions do not.
With gcc7 and later, gcc will print the values it supports as part of the error message.
$ gcc -E -march=help -xc /dev/null
# 1 "/dev/null"
cc1: error: bad value (‘help’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 bonnell atom silvermont slm knl x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 btver1 btver2
I checked on Godbolt, and x86 gcc6.x and earlier just say error: bad value (invalid) for -march= switch even with -v.
It also doesn't work with clang5.0 or ICC18.
This is target-specific: ARM gcc6.3 does produce a list of supported -march values, or -mcpu=.
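For scripting, the note shown above can be split into one value per line. This is only a sketch: it assumes gcc 7 or later and the English wording "valid arguments to ... switch are:", and the function name list_march_values is made up for illustration.

```shell
#!/bin/sh
# Sketch: turn the "valid arguments" note that gcc 7+ prints for a bad
# -march value into one architecture name per line.

# Hypothetical helper: read gcc diagnostics on stdin and print each valid
# -march value on its own line. Drops any trailing "; did you mean ...?" hint.
list_march_values() {
    sed -n 's/.*valid arguments to .*-march=.* switch are: //p' |
        sed 's/;.*$//' |
        tr -s ' ' '\n' |
        sed '/^$/d'
}

if command -v gcc >/dev/null 2>&1; then
    # LC_ALL=C keeps the message in English; stderr goes to the pipe.
    LC_ALL=C gcc -E -march=help -xc /dev/null 2>&1 >/dev/null | list_march_values
fi
```

On compilers that do not print the note (x86 gcc 6.x and earlier, per the observation above), the script simply prints nothing.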
For gcc-7.2.0, it's here:
https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/x86-Options.html#x86-Options
You could go to the gcc online documentation and find the manual for the version you are interested in. From there, go to the machine-dependent options section. If you are looking at x86, jump to the "x86 Options" section and search for "-march".
I haven't checked the old gcc versions. Another thing you could try is to check out the source code and search the files that hold the literal strings for the supported architectures.
svn checkout svn://gcc.gnu.org/svn/gcc/trunk gcc_trunk
cd gcc_trunk
Then, maybe, you could try like this:
find . -type f | grep -E '\.(c|cc|cpp|h|hpp)$' | xargs grep -E '"skylake-avx'
As of today, the literal strings are kept in ./gcc/config/i386/i386.c in case of x86 architectures.
P.S.
As Peter mentioned, it seems to be machine-specific. I suspect there is no standard or required behavior for listing the available -march values. For example, if gcc has just been ported to a brand-new instruction set architecture (say LEG, as opposed to ARM), it does not necessarily have a command-line option to list all supported -march values.
Fortunately, it seems like some newer gcc versions provide a way to do so. If you do need such an option for old gccs, writing a gcc plugin, which might work from gcc 4.5 or so, could be taken into consideration:
gcc plugin
simple gcc plugin how to
Gcc plugins are plugged in to an existing gcc by adding some command-line options. Gcc has APIs for plugins. All you would need is to write code that checks information such as the gcc version, the architecture gcc runs on, etc., and prints out the list of supported -march values.
Use the detailed help page:
gcc -v --help
Look for the option -march=CPU, for example in gcc v4.8.4
-march=CPU[,+EXTENSION...]
generate code for CPU and EXTENSION, CPU is one of:
generic32, generic64, i386, i486, i586, i686,
pentium, pentiumpro, pentiumii, pentiumiii, pentium4,
prescott, nocona, core, core2, corei7, l1om, k1om,
k6, k6_2, athlon, opteron, k8, amdfam10, bdver1,
bdver2, bdver3, btver1, btver2
EXTENSION is combination of:
8087, 287, 387, no87, mmx, nommx, sse, sse2, sse3,
ssse3, sse4.1, sse4.2, sse4, nosse, avx, avx2,
avx512f, avx512cd, avx512er, avx512pf, noavx, vmx,
vmfunc, smx, xsave, xsaveopt, aes, pclmul, fsgsbase,
rdrnd, f16c, bmi2, fma, fma4, xop, lwp, movbe, cx16,
ept, lzcnt, hle, rtm, invpcid, clflush, nop, syscall,
rdtscp, 3dnow, 3dnowa, padlock, svme, sse4a, abm,
bmi, tbm, adx, rdseed, prfchw, smap, mpx, sha,
clflushopt, xsavec, xsaves, prefetchwt1
Since GCC 4 there's a --target-help which prints the supported parameters for options including
-march
-mtune
-mabi
-masm
Other options which themselves are architecture-specific e.g. -msse2, -mavx2

Can't callgrind support AVX2 instructions?

I'm trying to profile my program, written with Intel AVX2 instructions, using valgrind. The program runs smoothly under memcheck, but when I run it with callgrind (valgrind --tool=callgrind), it terminates with an unrecognised instruction error. I checked the release notes of Valgrind 3.9.0, which say "Support for Intel AVX2 instructions. This is available only on 64 bit code." I compile my program with g++-4.8 -std=c++11 -mavx2 -m64, but the error remains. Part of the output is below:
vex amd64->IR: unhandled instruction bytes: 0x16 0xC5 0xDD 0x64 0xD2 0xC5 0xF5 0xDB
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==6775== valgrind: Unrecognised instruction at address 0x43d1c9.
==6775== at 0x43D1C9: byteslice::ByteSliceColumnBlock<16ul, (byteslice::Direction)1>::Scan(byteslice::Comparator, unsigned long, byteslice::BitVectorBlock*, byteslice::Bitwise) const (avxintrin.h:965)
==6775== by 0x45DEAD: byteslice::Column::Scan(byteslice::Comparator, unsigned long, byteslice::BitVector*, byteslice::Bitwise) const (column.cpp:113)
==6775== by 0x4017C9: main (simple.cpp:89)
Edit: I found that the error depends on the optimization level. There's no error with -O0, but the error shows up with -O1 and above.

What are the proper architecture-specific options (-m) for a Sandy Bridge based Pentium?

I'm trying to figure out how to set the -march option properly, so I can see how much performance difference there is between enabling and disabling it on my PC with gcc 4.7.2.
Before compiling anything, I tried to find the best -march option for my PC. My PC has a Pentium G850, whose architecture is Sandy Bridge. So I referred to the gcc 4.7.2 manual and found that -march=corei7-avx seems the best.
However, I remembered that Sandy Bridge based Pentiums lack AVX and AES-NI instruction set support, which is true for the Pentium G850. So -march=corei7-avx is not a proper option.
I came up with some potential options:
-march=corei7-avx -mno-avx -mno-aes
-march=corei7 -mtune=corei7-avx
-march=native
The first option looks reasonable considering the information I have, but I'm anxious that there may be missing features other than AVX and AES-NI. The second option looks safe, but it could miss some minor features of Sandy Bridge because of -march=corei7. The third option would take care of all my concerns, but I've heard that it sometimes misdetects CPU features, so I would like to know how to do this manually.
I've googled and searched StackOverflow and SuperUser, but I can't find any clear solution.
What options should be set?
What about detecting it via GCC? For me (gcc-5.3.0), on an i5-2450M CPU (Lenovo e520), the following shows:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/5.3.0/cc1 -E -quiet -v - -march=sandybridge
-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
-msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp
-mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd
-mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr
-mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
-mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves
-mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
-mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param
l1-cache-size=32 --param l1-cache-line-size=64 --param
l2-cache-size=3072 -mtune=sandybridge -fstack-protector-strong
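A long cc1 line like the one above is easier to read once it is reduced to the resolved -march value and the features that are actually switched on. The following is a rough sketch; the helper names resolved_march and enabled_isa_flags are made up for illustration, and it assumes the cc1 command line is printed with space-separated options as shown.

```shell
#!/bin/sh
# Sketch: summarize what -march=native resolves to on this machine.

# Hypothetical helper: from a cc1 command line on stdin, print the value of
# the first -march= option.
resolved_march() {
    tr -s ' ' '\n' | sed -n 's/^-march=//p' | head -n 1
}

# Hypothetical helper: from the same input, print the -m<feature> options
# that are switched on (skips -mno-*, -march= and -mtune=).
enabled_isa_flags() {
    tr -s ' ' '\n' | grep -E '^-m' | grep -Ev '^-m(no-|arch=|tune=)' || true
}

if command -v gcc >/dev/null 2>&1; then
    cc1_line=$(gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 || true)
    echo "resolved -march: $(printf '%s\n' "$cc1_line" | resolved_march)"
    echo "enabled feature flags:"
    printf '%s\n' "$cc1_line" | enabled_isa_flags
fi
```

On a CPU without AVX or AES-NI, one would expect -mavx and -maes to be missing from the enabled list (they would appear as -mno-avx and -mno-aes in the raw output instead).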
I would suggest using -march=corei7-avx -mtune=corei7-avx -mno-avx -mno-aes. It is important to specify -mtune because this option tells gcc which CPU model it should use for scheduling instructions in the generated code.
I have a Sandy Bridge based Intel(R) Celeron(R) CPU G530.
When I used -march=native in Gentoo's CFLAGS and then compiled media-video/ffmpeg-1.2.6 (the current stable version in Gentoo), something went wrong when playing video with mplayer (illegal instruction). Just as you said, -march=native sometimes misdetects CPU features.
Then I changed to -march=corei7-avx -mtune=corei7-avx -mno-avx -mno-aes and recompiled ffmpeg-1.2.6 and mplayer, and everything has been fine so far.
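To address the worry about features other than AVX and AES-NI, one option is to compare the predefined macros of the hand-written flag set against what -march=native produces. This is a sketch under the assumption that the predefined macros reflect the enabled instruction sets closely enough; the function name macro_diff is made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the macros predefined by a manual flag set with those
# predefined by -march=native, to spot features that differ.

# Hypothetical helper: print lines present in only one of two sorted macro
# dumps. Lines only in $1 are prefixed "<", lines only in $2 with ">".
macro_diff() {
    diff "$1" "$2" | grep '^[<>]' || true
}

if command -v gcc >/dev/null 2>&1; then
    manual=$(mktemp) || exit 1
    native=$(mktemp) || exit 1
    gcc -march=corei7-avx -mno-avx -mno-aes -dM -E - </dev/null 2>/dev/null | sort >"$manual"
    gcc -march=native -dM -E - </dev/null 2>/dev/null | sort >"$native"
    macro_diff "$manual" "$native"
    rm -f "$manual" "$native"
fi
```

An empty result would suggest the two flag sets enable the same features on this machine. Any "<" line marks something the manual flags enable that native detection does not, which is worth checking against the CPU's actual capabilities.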

gcc doesn't want to use AVX on mac

So I have this brand new MacBook Pro with an Intel Core i7 processor, and sysctl machdep.cpu.features gives
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C
yet when I run gcc (4.7.2 macports), it doesn't #define __AVX__. What's wrong? (Mac OS X 10.8.2)
It depends on the compiler flags you are using whether __AVX__ and __SSEx__ will be defined.
So if you are using g++ -march=corei7-avx, the macro will be defined. -march=native should also suffice, if gcc is able to detect your CPU correctly (it usually is).
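A quick way to see this for a particular compiler is to dump the predefined macros under different flags. The sketch below assumes an x86 gcc on PATH; the function name macro_defined is made up for illustration.

```shell
#!/bin/sh
# Sketch: check which flag sets make gcc predefine __AVX__.

# Hypothetical helper: succeed if the macro named $1 is defined in the
# `gcc -dM -E` dump read on stdin.
macro_defined() {
    grep -q "^#define $1 "
}

if command -v gcc >/dev/null 2>&1; then
    for flags in "" "-mavx" "-march=corei7-avx" "-march=native"; do
        # $flags is left unquoted on purpose so an empty value adds no argument.
        if gcc $flags -dM -E - </dev/null 2>/dev/null | macro_defined __AVX__; then
            echo "gcc $flags: __AVX__ defined"
        else
            echo "gcc $flags: __AVX__ not defined"
        fi
    done
fi
```

With no flags, the macro is normally not defined on a default x86-64 build, regardless of what the CPU itself supports.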
On my i7 MBP 13" (mid 2010) running 10.6.8, the current MacPorts gcc 4.7.3 and 4.8.2 do define __AVX__ when -mavx is specified. However, they crash when compiling code that uses boost::simd (available via www.metascale.org).
Macports clang-3.3 has no such issues, but takes way longer to compile (with or without -mavx, compared to gcc >= 4.7 WITHOUT -mavx).

Resources