How do the `aapcs` and `aapcs-linux` ABI options differ when compiling for bare-metal ARM with gcc?

I am trying to port an application to ARM's arm-none-eabi-gcc toolchain. This application is intended to run on a bare-metal target.
The only two suitable values for the -mabi option in this case appear to be aapcs and aapcs-linux. From Debian documentation and Embedded Linux from Source I know that aapcs-linux uses a fixed 4-byte enum size, whereas aapcs defines enums as "variable length". However, I can't find any information on what other differences (if any) there might be.
Does anyone know the full list of differences between these two ABI options?
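One way to see the enum difference on a given toolchain is to compare `sizeof` on a small enum with the ACLE macro `__ARM_SIZEOF_MINIMAL_ENUM`, which GCC predefines for ARM targets. This is only an illustration of the enum rule, not a full list of ABI differences. The enum `probe_small` and both function names are hypothetical. With `-mabi=aapcs` on arm-none-eabi, one would expect both functions to report 1, and with `-mabi=aapcs-linux` both should report 4. On a non-ARM host the macro is absent, so the second function returns 0.

```c
#include <stddef.h>

/* Hypothetical enum whose enumerators all fit in one byte. */
enum probe_small { PROBE_A, PROBE_B, PROBE_C };

/* Size the compiler actually chose for a small enum:
   1 with variable-length ("short") enums, 4 with int-sized enums. */
size_t probe_enum_size(void)
{
    return sizeof(enum probe_small);
}

/* Size the ABI uses for minimal enums according to the ACLE macro,
   or 0 when the macro is not defined (e.g. a non-ARM host compiler). */
int acle_minimal_enum_size(void)
{
#ifdef __ARM_SIZEOF_MINIMAL_ENUM
    return __ARM_SIZEOF_MINIMAL_ENUM;
#else
    return 0;
#endif
}
```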

Related

Can I make my compiler use fast-math on a per-function basis?

Suppose I have
template <bool UsesFastMath> void foo(float* data, size_t length);
and I want to compile one instantiation with -ffast-math (--use_fast_math for nvcc), and the other instantiation without it.
This can be achieved by instantiating each of the variants in a separate translation unit, and compiling each of them with a different command-line - with and without the switch.
My question is whether it's possible to indicate to popular compilers (*) to apply or not apply -ffast-math for individual functions - so that I'll be able to have my instantiations in the same translation unit.
Notes:
If the answer is "no", bonus points for explaining why not.
This is not the same question as this one, which is about turning fast-math on and off at runtime. I'm much more modest...
(*) by popular compilers I mean any of: gcc, clang, msvc, icc, nvcc (for GPU kernel code) about which you have that information.
In GCC you can declare functions like following:
__attribute__((optimize("-ffast-math")))
double
myfunc(double val)
{
    return val / 2;
}
This is a GCC-only feature.
See a working example here: https://gcc.gnu.org/ml/gcc/2009-10/msg00385.html
It seems that GCC does not verify optimize() arguments, so typos like "-ffast-match" will be silently ignored.
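Here is a sketch of the GCC attribute approach, written in C rather than as the question's C++ template. The two hypothetical functions `sum_fast` and `sum_strict` share one translation unit, and only the first carries the `optimize` attribute. The `FAST_MATH_FN` macro is an assumption of this sketch: it expands to nothing on clang, which does not implement `optimize`. Recent GCC manuals describe the `optimize` attribute as intended for debugging rather than production code, so per-file flags may still be the more robust choice.

```c
#include <stddef.h>

/* GCC honours the optimize attribute; clang does not implement it, so the
   macro expands to nothing there and both functions use the command-line flags. */
#if defined(__GNUC__) && !defined(__clang__)
#define FAST_MATH_FN __attribute__((optimize("-ffast-math")))
#else
#define FAST_MATH_FN
#endif

/* Hypothetical kernel compiled with fast-math semantics (reassociation allowed). */
FAST_MATH_FN
float sum_fast(const float *data, size_t length)
{
    float acc = 0.0f;
    for (size_t i = 0; i < length; ++i)
        acc += data[i];
    return acc;
}

/* Same kernel with the default, strict IEEE semantics. */
float sum_strict(const float *data, size_t length)
{
    float acc = 0.0f;
    for (size_t i = 0; i < length; ++i)
        acc += data[i];
    return acc;
}
```

Summing small integers is exact in `float`, so both variants agree on such input. A difference would only show up with rounding-sensitive data.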
As of CUDA 7.5 (the latest version I am familiar with, although CUDA 8.0 is currently shipping), nvcc does not support function attributes that allow programmers to apply specific compiler optimizations on a per-function basis.
Since optimization configurations set via command line switches apply to the entire compilation unit, one possible approach is to use as many different compilation units as there are different optimization configurations, as already noted in the question; source code may be shared and #include-ed from a common file.
With nvcc, the command line switch --use_fast_math basically controls three areas of functionality:
Flush-to-zero mode is enabled (that is, denormal support is disabled)
Single-precision reciprocal, division, and square root are switched to approximate versions
Certain standard math functions are replaced by equivalent, lower-precision, intrinsics
You can apply some of these changes with per-operation granularity by using appropriate intrinsics, others by using PTX inline assembly.

gcc; Aarch64; Armv8; enable crypto; -mcpu=cortex-a53+crypto

I am trying to optimize code for an Arm processor (Cortex-A53) with an Armv8 architecture for crypto purposes.
The problem is that although the compiler accepts -mcpu=cortex-a53+crypto etc., it doesn't change the output (I checked the assembly output).
Changing -mfpu or -mcpu, or adding features like crypto or simd, makes no difference; it is completely ignored.
To enable NEON code, -ftree-vectorize is needed; how do I make use of crypto?
(I checked the -O(1,2,3) flags, it won't help).
Edit: I realized I made a mistake by thinking the crypto flag works like an optimization flag solved by the compiler. My bad.
You had two questions...
Why does -mcpu=cortex-a53+crypto not change code output?
The crypto extensions are an optional feature under the AArch64 state of ARMv8-A. The +crypto feature flag indicates to the compiler that these instructions are available for use. From a practical perspective, in GCC 4.8/4.9/5.1, this defines the macro __ARM_FEATURE_CRYPTO and controls whether or not you can use the crypto intrinsics defined in ACLE, for example:
uint8x16_t vaeseq_u8 (uint8x16_t data, uint8x16_t key)
There is no optimisation in current GCC which will automatically convert a sequence of C code to use the cryptography instructions. If you want to make this transformation, you have to do it by hand (and guard it by the appropriate feature macro).
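Here is a minimal sketch of doing that transformation by hand, with the code guarded by the feature macro. It assumes an AArch64 (or ARMv8 AArch32) gcc that provides `arm_neon.h`. `aes_round` is a hypothetical helper that performs one AES round step with `vaeseq_u8` followed by `vaesmcq_u8`, and it reports failure when the build lacks `+crypto`. `__ARM_FEATURE_AES` is the newer ACLE spelling, so checking for both macros is a precaution.

```c
#include <stdint.h>

/* GCC 4.8-5.x define __ARM_FEATURE_CRYPTO for +crypto; newer ACLE
   revisions also provide __ARM_FEATURE_AES. */
#if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#define HAVE_AES_INTRINSICS 1
#else
#define HAVE_AES_INTRINSICS 0
#endif

/* Hypothetical helper: one AES encryption round step on a 16-byte state,
   AESE (AddRoundKey, SubBytes, ShiftRows) followed by AESMC (MixColumns).
   Returns 0 on success, or -1 with the state untouched when the build
   lacks +crypto (a portable fallback would be needed in that case). */
int aes_round(uint8_t state[16], const uint8_t key[16])
{
#if HAVE_AES_INTRINSICS
    uint8x16_t s = vld1q_u8(state);
    uint8x16_t k = vld1q_u8(key);
    s = vaesmcq_u8(vaeseq_u8(s, k));
    vst1q_u8(state, s);
    return 0;
#else
    (void)state;
    (void)key;
    return -1;
#endif
}
```

With an all-zero state and key, one round step should leave every byte at 0x63 (the AES S-box value for 0), which makes a convenient smoke test.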
Why do the +fp and +simd flags not change code output?
For -mcpu=cortex-a53 the +fp and +simd flags are implied by default (for some configurations of GCC +crypto may also be implied by default). Adding these feature flags will therefore not change code generation.
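These flags mostly change predefined macros rather than code generation, so checking the macros is a more direct test than reading assembly. Running the compiler with `-dM -E` on an empty file lists them, and the sketch below performs the same check from C. The bit values and the function name are hypothetical. With `-mcpu=cortex-a53`, one would expect the FP and SIMD bits to be set already, and the crypto bit to be set only with `+crypto` (or with a configuration that implies it).

```c
/* Hypothetical bit flags for the report below. */
enum { FEAT_FP = 1, FEAT_SIMD = 2, FEAT_CRYPTO = 4 };

/* Reports which ACLE feature macros the compiler predefined for this build. */
int compiled_arm_features(void)
{
    int f = 0;
#ifdef __ARM_FP
    f |= FEAT_FP;
#endif
#ifdef __ARM_NEON
    f |= FEAT_SIMD;
#endif
#if defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES)
    f |= FEAT_CRYPTO;
#endif
    return f;
}
```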

iar ewarm linking to gcc eabi built library

I have been able to build code in IAR EWARM (7.40) (for the ST STM32F407IG ARM Cortex-M4) which links to a library built under Ubuntu via gcc (4.9.3). This mostly works, but some build environment adjustments on the IAR side, the gcc side, or both still remain. I would appreciate whatever help you can point me to.
There are no build errors evident, but EWARM and arm-none-eabi-gcc disagree on the locations of parameters being passed to the gcc-built library. The EWARM debugger and the code generated by EWARM agree with each other, but (judging by investigations so far) the locations expected by the gcc-generated code appear to be offset from those expected by EWARM by eight bytes. I've only investigated a single call, so this may not be constant...
IAR's compiler flags include: --aeabi and --guard_calls as per section: "AEABI compliance" in the EWARM help section.
arm-none-eabi-gcc compiler flags include: -gdwarf-3 -mabi=aapcs -march=armv7e-m -mthumb.
I believe this tells both EWARM and gcc to play nice together with ARM AAPCS standard procedure calls and dwarf v3 formats.
EWARM does seem to be happy with either -gdwarf-2 or -gdwarf-3 (but not -4). This selection does not appear to affect the issue discussed above.
What else is required?
The answer to "What else is required?" appears to be nothing. Just be darn sure that all of the macros evaluated by #ifdef statements match in both environments, so you don't end up with differently sized data structures in the two environments! #ifdef code in header files should be carefully evaluated...
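One way to catch a mismatched #ifdef early is to have both sides compute a layout fingerprint of each shared structure and compare the two at startup. In this sketch, `shared_msg_t`, `SHARED_MSG_HAS_TIMESTAMP`, and both functions are hypothetical. The gcc-built library would export its own fingerprint, and the IAR application would pass that value to `layouts_match`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure shared between the IAR application and the gcc library. */
typedef struct {
    uint32_t id;
#ifdef SHARED_MSG_HAS_TIMESTAMP   /* hypothetical config macro: must match in both builds */
    uint64_t timestamp;
#endif
    uint8_t flags;
} shared_msg_t;

/* Packs the structure size and the offset of its last member into one number. */
uint32_t shared_msg_layout_fingerprint(void)
{
    return (uint32_t)(sizeof(shared_msg_t) << 16) | (uint32_t)offsetof(shared_msg_t, flags);
}

/* Returns non-zero when the other build's fingerprint matches this build's. */
int layouts_match(uint32_t other_side)
{
    return other_side == shared_msg_layout_fingerprint();
}
```

Without the macro, the structure is 8 bytes with `flags` at offset 4. Defining the macro on only one side changes both numbers, so the check fails instead of silently passing arguments at the wrong offsets.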

What are the correct options for an ARM cross compiler with crosstool-NG

I am trying to build a cross compiler to target the processor running on my NAS box using crosstool-NG.
The NAS box is a ZyXEL NSA210 (there is an example dmesg output); its /proc/cpuinfo is:
Processor : ARM926EJ-S rev 5 (v5l)
BogoMIPS : 183.09
Features : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant : 0x0
CPU part : 0x926
CPU revision : 5
...
Hardware : Oxsemi NAS
Revision : 0000
Serial : 00000d51caab2d00
These are the options on the target options page, with the flag and my current settings in parentheses:
Target Architecture (arm)
Use the MMU (yes)
Endianness (Little endian)
Bitness (32-bit)
Default instruction set mode (arm)
Use EABI (yes)
Architecture level --with-arch= ()
Emit assembly for CPU --with-cpu= ()
Tune for CPU ()
Use specific FPU ()
Floating point (software)
Target CFLAGS ()
Target LDFLAGS ()
I've been trying various combinations in the 'Architecture level' and 'Emit assembly for CPU', such as arm926ej-s, armv5l, armv5tej, but I don't know which option goes where.
I've set the Target OS to bare-metal as crosstool-NG doesn't have the version of Linux used on the box.
Also, once the toolchain is built, do I need to pass the same options to the compilers again?
So far my attempts have just produced the Illegal instruction message.
Edit
If anyone could point me towards an article on setting up an ARM GCC toolchain with explicit reference of how to find out the correct parameters, that would answer my question.
Try one of these
--with-arch=armv5te
--with-tune=arm926ej-s
or
--with-cpu=arm926ej-s
(there's no point in having both).
Otherwise your options look fine.
If it still doesn't work then you need to look at the libraries and headers. If you want to use dynamically linked libraries then you'll need to have ones that match those on the target, version wise and name wise. If you want to use static linking, or copy your own shared libraries onto the target (in a non-standard place, perhaps, which would need extra config), you should be fine.
Either way, you'll need your kernel headers to match. You can probably just download some contemporary kernel headers from kernel.org.
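To confirm what a freshly built toolchain defaults to (and whether the options need repeating), one can check the predefined architecture macros, either with `-dM -E` or from C as in the sketch below. The function names are hypothetical. The FPU check is an assumption based on the cpuinfo above: the Features line lists no `vfp`, so a build that enables FP instructions is likely to fault on this NAS.

```c
/* Architecture version the compiler targets, from the ACLE macro __ARM_ARCH
   or, for older gcc, the __ARM_ARCH_5TE(J)__ macros; 0 if none is defined
   (e.g. a non-ARM host compiler). */
int compiled_arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#elif defined(__ARM_ARCH_5TEJ__) || defined(__ARM_ARCH_5TE__)
    return 5;
#else
    return 0;
#endif
}

/* 1 if the build looks suitable for an ARM926EJ-S without VFP,
   0 if it targets a newer architecture or enables FP instructions,
   -1 if the target architecture is unknown. */
int safe_for_arm926(void)
{
#ifdef __ARM_FP
    return 0; /* FP instructions enabled, but the cpuinfo lists no vfp */
#else
    int arch = compiled_arm_arch();
    if (arch == 0)
        return -1;
    return arch <= 5 ? 1 : 0;
#endif
}
```

If `safe_for_arm926()` returns 0 for a program built without extra flags, the defaults baked in via `--with-arch`/`--with-cpu` are not the ones intended.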

How can I get a list of legal ARM opcodes from gcc (or elsewhere)?

I'd like to generate pseudo-random ARM instructions. Via assembler directives, I can tell gcc what mode I'm in, and it will complain if I try a set of opcodes and operands that's not legal in that mode, so it must have some internal listing of what can be done in which mode. Where does that live? Would it be easier to extract that info from LLVM?
Is this question "not even wrong"? Should I try a different approach entirely?
To answer my own question, this is actually really easy to do from arm.md and constraints.md in gcc/config/arm/. I probably spent more time asking this question and answering comments on it than I did figuring this out. It turns out I just need to look for 'TARGET_THUMB1', until I get around to implementing Thumb-2.
For the ARM family the buck stops at the ARM ARM (ARM Architecture Reference Manual). There is an ARM instruction set section and a Thumb instruction set section. Within both, each instruction tells you which architecture generation it belongs to (ARMvX, where X is a number like 4 (ARM7 era) or 5 (ARM9 era), etc.). Since the opcode and pseudocode are listed for each instruction, you should be able to figure out which are real instructions and which, if any, are just syntax that saves typing for another instruction (push and pop, for example).
With the Cortex-M3 and Thumb-2 in particular, you also need to look at the TRM (Technical Reference Manual). ARM has a unified syntax, the Unified Assembler Language (UAL), that is meant to work for both Thumb and ARM. For example, on ARM you have three-register instructions:
add r1,r1,r2
In Thumb, many operations take only two registers:
add r1,r2
The goal is basically to meet in the middle, or, more accurately, to encourage ARM assemblers to accept Thumb-style syntax and encode it as the equivalent ARM instruction without complaining. This may have started with Thumb rather than Thumb-2; until recently I always kept the two syntaxes separate in my code (and I still generally use ARM syntax for ARM and Thumb syntax for Thumb).
And then, yes, you have to see what the specific implementation of the assembler tool is, in your case binutils. It sounds like you have found the binutils/GNU secret decoder ring.
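To illustrate working from the ARM ARM's encoding tables, the sketch below generates pseudo-random but legal A32 data-processing instructions (register form, no shift, condition AL). It covers only this one instruction class. The function names and the xorshift PRNG are hypothetical choices, and registers are restricted to r0-r12 to sidestep the special cases for sp, lr, and pc.

```c
#include <stdint.h>

/* Condition field value for "always" (AL). */
#define COND_AL 0xEu

/* Encodes an A32 data-processing instruction, register form, no shift:
   cond[31:28] 00 I=0 opcode[24:21] S[20] Rn[19:16] Rd[15:12] 00000 00 0 Rm[3:0]. */
uint32_t encode_dp_reg(unsigned opcode, unsigned s, unsigned rd,
                       unsigned rn, unsigned rm)
{
    return (COND_AL << 28) | ((opcode & 0xFu) << 21) | ((s & 1u) << 20) |
           ((rn & 0xFu) << 16) | ((rd & 0xFu) << 12) | (rm & 0xFu);
}

/* xorshift32: a small hypothetical PRNG; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Generates a pseudo-random, legal data-processing instruction.
   TST/TEQ/CMP/CMN (opcodes 8-11) need S=1, otherwise the encoding belongs
   to other instructions (e.g. MRS/MSR), and their Rd field should be zero.
   MOV/MVN (opcodes 13 and 15) do not use Rn, which should be zero. */
uint32_t random_dp_reg(uint32_t *state)
{
    unsigned opcode = xorshift32(state) & 0xFu;
    unsigned s = xorshift32(state) & 1u;
    unsigned rd = xorshift32(state) % 13u;
    unsigned rn = xorshift32(state) % 13u;
    unsigned rm = xorshift32(state) % 13u;

    if (opcode >= 8u && opcode <= 11u) {
        s = 1u;
        rd = 0u;
    }
    if (opcode == 13u || opcode == 15u)
        rn = 0u;
    return encode_dp_reg(opcode, s, rd, rn, rm);
}
```

For example, `encode_dp_reg(0x4, 0, 1, 1, 2)` yields 0xE0811002, which is the encoding of `add r1, r1, r2`.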

Resources