What do these compilation options mean?
export FFLAGS = -O3 -r8 -i4 -I${PWD}/headers -nofor_main.
What does -r8 mean? What does -i4 mean? Where can I find the help file? Can anybody explain the FFLAGS compilation options? I really appreciate it.
You apparently already know that FFLAGS is a list of options for a FORTRAN compiler.
-r8 sets the default size of certain data types (chiefly REAL) to 8 bytes; exactly which types depends on the compiler. It is approximately the same as making the default REAL double precision.
-i4 sets the default integer size to 4 bytes.
Do you need more?
EDIT:
There are a lot of different compilers, and versions of compilers. The default for GNU Make is f77, and from the UNIX man page:
-r8
  Double the size of default REAL, DOUBLE, INTEGER, and COMPLEX data.

  NOTE: This option is now considered obsolete and may be removed in
  future releases. Use the more flexible -xtypemap option instead.

  This option sets the default size for REAL, INTEGER, and LOGICAL to 8,
  and for COMPLEX to 16. For INTEGER and LOGICAL the compiler allocates
  8 bytes, but does 4-byte arithmetic. For actual 8-byte arithmetic,
  see -dbl.
Related
I am getting different results from the same code when I compile it with the Intel Fortran compiler and the Portland Group compiler. What would be the closest flag in PGI Fortran to -fp-model precise in ifort?
Is there a set of flags in PGI that would match the combination of the -O2 -fp-model precise ifort flags? Thanks
Programs compiled with different compilers (or even different versions of the same compiler) are not expected to produce exactly the same results. Different levels of optimization (-On flag) are also not equivalent between compilers (except -O0 which requests no optimization at all).
I do not think there is an equivalent flag in PGI to ifort's -fp-model precise, but you may want to look into target-specific flags in the manual for PGI Fortran compiler, and more specifically, these:
-K[no]ieee           Use IEEE division, optionally enable traps
-Ktrap=align|denorm|divz|fp|inexact|inv|none|ovf|unf
                     Determine IEEE Trap conditions
-M[no]daz            Treat denormalized numbers as zero
-M[no]flushz         Set SSE to flush-to-zero mode
-M[no]fpapprox[=div|sqrt|rsqrt]
                     Perform certain fp operations using low-precision approximation
    div              Approximate floating point division
    sqrt             Approximate floating point square root
    rsqrt            Approximate floating point reciprocal square root
-Mfpapprox           Approximate div,sqrt,rsqrt
-M[no]fpmisalign     Allow use of vector arithmetic instructions for unaligned operands
-M[no]fprelaxed[=div|recip|sqrt|rsqrt|[no]order]
                     Perform certain fp operations using relaxed precision
    div              Perform divide with relaxed precision
    recip            Perform reciprocal with relaxed precision
    sqrt             Perform square root with relaxed precision
    rsqrt            Perform reciprocal square root with relaxed precision
    [no]order        Allow expression reordering, including factoring
-Mfprelaxed          Choose which operations depending on target processor
It is acceptable for the program's output to be different at some less significant digit between different compilers. If your results are very different, your algorithm might not be very robust and may need work.
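To see why the same source can legally print different last digits, here is a small illustration (my own example, not from either compiler's documentation) of how re-associating floating-point operations changes rounding:

#include <stdio.h>

int main(void) {
    double big = 1e16, small = 1.0;
    /* Mathematically identical expressions, two association orders: */
    printf("%.17g\n", (big + small) - big);  /* prints 0: 1e16 + 1 rounds back to 1e16 */
    printf("%.17g\n", (big - big) + small);  /* prints 1 */
    return 0;
}

Settings like ifort's -fp-model precise forbid this kind of value-changing reordering, while relaxed settings (for instance -Mfprelaxed with its order sub-option, listed above) allow it, which is where small cross-compiler differences usually come from.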
Do the -march=corei7-avx -mtune=corei7-avx or -march=corei7 -mtune=corei7 -mavx command-line options to MinGW, together with the -mfpmath=sse option (or even -mfpmath=both), enable the use of AVX instructions for math routines? Note that --with-fpmath=avx from here does not work (it is an "unrecognized option" for recent builds of MinGW).
AVX is enabled by either -march=corei7-avx or -mavx. The -mtune option is neither necessary nor sufficient to enable AVX.
-mfpmath=avx does not make sense, because that switch controls how scalar floating-point code is generated. It makes no difference whether you use only one float of a 4-float vector register or only one element of an 8-float vector register. If AVX is enabled (via -mavx or a -march that implies it), scalar floating-point instructions use the VEX encoding anyway, which saves a few mov instructions.
Note that on x86_64, -mfpmath defaults to SSE, so using this switch is usually unnecessary, and it can even be harmful if you don't know exactly what you are doing.
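As a rough illustration (my own sketch, not from the question), the same scalar C code simply gets the VEX-encoded forms of the instructions once AVX is on, e.g. vmulss/vaddss instead of mulss/addss when compiled with gcc -O2 -mavx -S:

/* Scalar float math: still one element at a time, with or without AVX;
   -mavx only changes the encoding of the instructions gcc emits. */
float scale_and_offset(float x, float scale, float offset) {
    return x * scale + offset;
}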
Consider the following example:
int a[4];

int main() {
    a[4] = 12;   // out-of-bounds write: valid indices are 0..3
    return 0;
}
This is clearly an out of bounds error, is it not? I was wondering when gcc would warn about this, and found that it will only do so if optimisation is -O2 or higher (this is affected by the -ftree-vrp option that is only set automatically for -O2 or higher).
I don't really see why this makes sense and whether it is correct that gcc does not warn otherwise.
The documentation has this to say about the matter:
This allows the optimizers to remove unnecessary range checks like array bound checks and null pointer checks.
Still, I don't see why that check should be unnecessary?
Your example is a case of constant propagation, not value range propagation, and it certainly triggers a warning on my version of gcc (4.5.1) whether or not -ftree-vrp is enabled.
In general, Java and Fortran are the only languages supported by gcc for which it will generate code to check array bounds (Java by default, Fortran only if you explicitly ask for it with -fbounds-check).
However, although C/C++ has no such feature, the compiler will still warn you at compile time if it believes that something is amiss. For constants this is pretty obvious; for variable ranges it is somewhat harder.
The clause "allows the optimizers to remove unnecessary range checks" relates to cases where, for example, you index an array of more than 256 entries with an unsigned 8-bit variable, or an array of more than 65536 entries with an unsigned 16-bit value. It also covers iterating over an array in a loop whose (variable) counter is bounded by values provable at compile time to be legal array indices, so the counter can never go beyond the array bounds.
In such cases, the compiler will neither warn you nor, for target languages that have bounds checks, generate any checking code.
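For a concrete picture, here is my own sketch of the kind of case the documentation means, where the type of the index alone proves every access is in range:

#include <stdint.h>

int table[256];

/* A uint8_t can only hold 0..255, so table[i] is provably in bounds;
   a range check here would be unnecessary and could be removed. */
int lookup(uint8_t i) {
    return table[i];
}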
I'm trying to port code over to compile using Microchip's C18 compiler for a PIC microcontroller. The code includes enums with large values assigned (>8-bit). They are not working properly; for example, 0x02 ends up being treated as the same value as 0x2002.
How can I force the enumerated values to be referenced as 16-bit values?
In the DirectX headers, every enum has a FORCE_DWORD value in it with a value of 0xffffffff. I guess that's basically what you want: it forces the compiler to let the enum have at least 32 bits. So try adding a FORCE_WORD with a value of 0xffff.
This won't solve your problem, of course, if that compiler just does not support enums greater than 8 bits.
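A sketch of that workaround (the names are made up, and as noted it only helps if C18 supports enums wider than 8 bits at all):

typedef enum {
    CMD_READ       = 0x02,
    CMD_READ_EXT   = 0x2002,
    CMD_FORCE_WORD = 0xFFFF   /* forces the enum's value range past 8 bits */
} command_t;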
I found the problem.
For future reference, the C18 compiler will NOT promote variables OR constants to int when performing a math operation, even though the ANSI C standard requires it. This is to increase speed when running on 8-bit processors.
To force ANSI compliance, use the "-Oi" compiler option.
See page 92 of the C18 manual.
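For anyone hitting this later, here is a made-up example of the kind of expression that goes wrong when 8-bit operands are not promoted to int before the arithmetic:

unsigned int square(unsigned char x)   /* say x == 20 */
{
    /* With ANSI integer promotion (e.g. C18 with -Oi) this is 20*20 = 400.
       If the multiply is instead done in 8 bits, it wraps to 400 % 256 = 144
       before being widened. */
    return x * x;
}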
The v4 series of the gcc compiler can automatically vectorize loops using the SIMD processor on some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done?
The original page offers details on getting gcc to automatically vectorize loops, including a few examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
While the examples are great, the syntax for these options has changed a bit in more recent GCC releases; see now:
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-fopt-info
In summary, the following options will work for x86 chips with SSE2, giving a log of loops that have been vectorized:
gcc -O2 -ftree-vectorize -msse2 -mfpmath=sse -ftree-vectorizer-verbose=5
Note that -msse is also a possibility, but it will only vectorize loops using floats, not doubles or ints. (SSE2 is baseline for x86-64; for 32-bit code use -mfpmath=sse as well. That is the default for 64-bit but not for 32-bit.)
Modern versions of GCC enable -ftree-vectorize at -O3, so in GCC 4.x and later just use that:
gcc -O3 -msse2 -mfpmath=sse -ftree-vectorizer-verbose=5
(Clang enables auto-vectorization at -O2. ICC defaults to optimization enabled + fast-math.)
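As a concrete (made-up) example, a simple loop like the one below is a typical candidate; building it with the options above, or with -fopt-info-vec on newer GCC, reports that the loop was vectorized:

/* restrict tells gcc the arrays don't overlap, so it can vectorize
   without emitting a runtime aliasing check. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}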
Most of the following was written by Peter Cordes, who could have just written a new answer. Over time, as compilers change, options and compiler output will change. I am not entirely sure whether it is worth tracking it in great detail here. Comments? -- Author
To also use instruction set extensions supported by the hardware you're compiling on, and tune for it, use -march=native.
Reduction loops (like sum of an array) will need OpenMP or -ffast-math to treat FP math as associative and vectorize. Example on the Godbolt compiler explorer with -O3 -march=native -ffast-math including a reduction (array sum) which is scalar without -ffast-math. (Well, GCC8 and later do a SIMD load and then unpack it to scalar elements, which is pointless vs. simple unrolling. The loop bottlenecks on the latency of the one addss dependency chain.)
Sometimes you don't need -ffast-math, just -fno-math-errno can help gcc inline math functions and vectorize something involving sqrt and/or rint / nearbyint.
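For example (my own sketch), a loop over sqrtf typically only vectorizes once the errno requirement is waived:

#include <math.h>

/* With -O3 -fno-math-errno gcc can use the hardware sqrt instruction
   directly (no errno side effect to preserve) and vectorize this loop;
   with the default errno behaviour it has to keep a path that calls
   libm's sqrtf. */
void root_all(float *restrict out, const float *restrict in, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = sqrtf(in[i]);
}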
Other useful options include -flto (link-time optimization for cross-file inlining, constant propagation, etc) and / or profile-guided optimization with -fprofile-generate / test run(s) with realistic input(s) /-fprofile-use. PGO enables loop unrolling for "hot" loops; in modern GCC that's off by default even at -O3.
There is a GIMPLE (one of GCC's intermediate representations) pass called pass_vectorize. This pass performs auto-vectorization at the GIMPLE level.
To enable auto-vectorization for a port (GCC 4.4.0), the following steps are needed:
Specify how wide a vector is on the target architecture. This is done by defining the macro UNITS_PER_SIMD_WORD.
The possible vector modes need to be defined in a separate file, usually <target>-modes.def. This file has to reside in the directory containing the other machine-description files (as determined by the configure script; if you can change the script you can place the file in whatever directory you want).
It lists the modes to be considered for vectorization on the target architecture: for example, four words form a vector, eight half-words form a vector, or two double-words form a vector. For example:
VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
Build the port. Vectorization can be enabled using the command line options -O2 -ftree-vectorize.