Detect ARM NEON availability in the preprocessor?

Detect ARM NEON availability in the preprocessor? - gcc

According to the ARM ARM, __ARM_NEON__ is defined when Neon SIMD instructions are available. I'm having trouble getting GCC to provide it.
Neon available on this BananaPi Pro dev board running Debian 8.2:
$ cat /proc/cpuinfo | grep neon
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
I'm using GCC 4.9:
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Try GCC and -march=native:
$ g++ -march=native -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
OK, try what Google uses for Android when building for Neon:
$ g++ -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
Maybe a ARMv7-a with a hard float:
$ g++ -march=armv7-a -mfloat-abi=hard -dM -E - </dev/null | grep -i neon
#define __ARM_NEON_FP 4
My questions are:
why am I not seeing __ARM_NEON__?
how do I detect Neon availability in the preprocessor?
And maybe:
what GCC switches should I use to enable Neon SIMD instructions?
Related, on a LeMaker HiKey, which is AARCH64/ARM64 running Linaro with GCC 4.9.2, here's the output from the preprocessor:
$ cpp -dM </dev/null | grep -i neon
#define __ARM_NEON 1
According to ARM, this board does have Advanced SIMD instructions even though:
$ cat /proc/cpuinfo
Processor : AArch64 Processor rev 3 (aarch64)
...
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32

There are a number of questions hidden in here, I'll try to extract them in turn...
According to the ARM ARM, __ARM_NEON__ is defined when Neon SIMD instructions are available. I'm having trouble getting GCC to provide it.
That is compiler documentation for [an old version of] the ARM Compiler rather than the ARM Architceture Reference Manual. A better macro to check for the presence of the Advanced SIMD instructions would be __ARM_NEON, which is defined in the ARM C Language Extensions.
Try GCC and -march=native:
As you may have found. GCC for the ARM target separates out -march (For the architecture revision for which GCC should generate code), -mfpu (For the floating point/Advanced SIMD unit available) and -mfloat-abi (For how floating point arguments should be passed, and for the presence or absence of a floating point unit). Finally there is -mtune (Which asks GCC to try to optimise for a particular processor) and -mcpu (which acts as a combination of -mtune and -march).
By asking for -march=native You're asking GCC to generate code appropriate for the detected architecture of the processor on which you are running. This has no impact on the -mfpu setting, and so does not necessarily enable Advanced SIMD instruction generation.
Note that the above only applies to a compiler targeting AArch32. The AArch64 GCC does not support -mfpu and will detect presence of Advanced SIMD support through -march=native.
OK, try what Google uses for Android when building for Neon:
$ g++ -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -dM -E
These build flags are not sufficient to enable support for Advanced SIMD instructions, your notes may be incomplete. Of the -mfpu flags supported by GCC 4.9.2 I'd expect any of:
neon, neon-fp16, neon-vfpv4, neon-fp-armv8, crypto-neon-fp-armv8
To give you what you want.
According to ARM, this board does have Advanced SIMD instructions even though:
Looks like you're running on an AArch64 kernel, which exposes support for Advanced SIMD through the asimd feature - as in your example output.

Related

Cross compiling - Error: selected processor does not support `fmrx r3,fpexc' in ARM mode - Beaglebone

I'm trying to cross-compile a file to flash into the Beaglebone Black.
All works fine, but if I try to enable the FPU with
#define set_en_bit_in_fpexc() do { \
int dummy; \
__asm__ __volatile__ ("fmrx %0,fpexc\n\t" \
"orr %0,%0,#0x40000000\n\t" \
"fmxr fpexc,%0" : "=r" (dummy) : :); \
} while (0)
I get the following error
Error: selected processor does not support `fmrx r3,fpexc' in ARM mode
Error: selected processor does not support `fmxr fpexc,r3' in ARM mode
I also tried with thumb mode, but I get the same errors.
Of course if I remove the part of the code that initialize the FPU it works fine.
Why I get those errors?
Makefile
[...]
CROSSPATH?=/usr/bin
CROSSPFX=$(CROSSPATH)/arm-none-eabi-
CC=$(CROSSPFX)gcc
AS=$(CROSSPFX)as
LD=$(CROSSPFX)ld
NM=$(CROSSPFX)nm
OBJCOPY=$(CROSSPFX)objcopy
OBJDUMP=$(CROSSPFX)objdump
CFLAGS=-Wall -Wextra -O2 -ffreestanding
ARCHFLAGS=-mcpu=cortex-a8 -march=armv7-a -mfpu=neon
CCARCHFLAGS=$(ARCHFLAGS) -marm
[...]
I'm on Arch, kernel 4.8.1
P.S. My professor uses the linaro cross-compiler and it works just fine

Most of the Linaro toolchains are configured for ARMv7 hard-float by default (certainly the Linux ones, I'm less sure about the bare-metal ones). Looking at the configuration of the arm-none-eabi toolchain as packaged by Arch, I surmise it's just using the GCC defaults for things like that, which implies something like ARMv4t, and crucially, soft-float ABI.
Whilst the -mfpu option controls code generation in terms of which floating-point instructions may be used, apparently it's the float ABI which controls whether it'll let you do things which really only make sense on a hardware FPU, rather than under floating-point emulation.
When it's not configured by default, you need to explicitly select a floating-point ABI implying an actual hardware FPU, i.e. -mfloat-abi=hard (or -mfloat-abi=softfp, but there's really no reason to use that unless you need to link against other soft-float code).

-mfpu=vfpv3-d16 -mfloat-abi=hard
Just to give a more direct solution, I had to add -mfpu=vfpv3-d16.
Test code a.S:
fmrx r2, fpscr
Working command:
sudo apt-get install binutils-arm-linux-gnueabihf
arm-linux-gnueabihf-as -mfpu=vfpv3-d16 -mfloat-abi=hard a.S
Note that -mfloat-abi=hard is enabled by default on this particular build of arm-linux-gnueabihf-as, and could be omitted.
The default value of float-abi likely depends on -msoft-float vs -mhard-float controlled at GCC build time with:
./configure --with-float=soft
as documented at: https://gcc.gnu.org/install/configure.html You can get the flags used for your toolchain build with gcc -v as mentioned at: What configure options were used when building gcc / libstdc++? I could not however easily determine its default value if not given.
You may also be interested in -mfloat-abi=softfp which can produce hard floats for the executable, but generate soft function calls: ARM compilation error, VFP registered used by executable, not object file
The possible values of -mfpu= can be found at: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/ARM-Options.html#ARM-Options
Also note that FMRX is the pre-UAL syntax for VMRS which the newer recommended syntax, see also: Are ARM instructuons SWI and SVC exactly same thing?
Tested on Ubuntu 16.04, arm-linux-gnueabihf-as 2.26.1.

Compiling gcc with AVR options

I want to generate the assembly file of my code oriented to the AVR architecture, I am using gcc version 4.7.2 with the following arguments:
g++ -O3 -Wall -S -Wp,-mmcu=atmega8 -o "src\Compression.o" "..\src\Compression.cpp"
but I am getting the following error:
cc1plus.exe: error: unrecognized command line option '-mmcu=atmega8'
But I got the command options from the gcc website:
http://gcc.gnu.org/onlinedocs/gcc-4.7.3/gcc/AVR-Options.html#AVR-Options
There should be something that I am missing, could you help me with this please!

If gcc does not accept -mmcu, you are probably not using a gcc with support for the AVR architecture.
It's normally used like this:
avr-gcc -mmcu=atmega328p
because it's not only the preprocessor, it's actually other tools as well which require this setting (linker, assembler).
Normally the architecture gcc is compiled for is indicated by a prefix, in this case it's avr- by convention.
So the solution is to get a toolchain with AVR support. You can download it from Atmel's web site, even for Linux.
Update
If you like to check the configuration of your gcc, you can use -dumpmachine to check for the target processor
$ gcc -dumpmachine
i486-linux-gnu
$ arm-none-eabi-gcc -dumpmachine
arm-none-eabi
$ avr-gcc -dumpmachine
avr
If you look at the target specific options using --target-help
$ gcc --target-help | grep march
-march= Generate code for given CPU
you can see that the Linux gcc does accept -march as well. It probably fails later.
gcc is a very complex piece of software, because it just supports so many different architectures. From that perspective it works amazingly well.
Another interesting option is -v
$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8'
[...]
to see how that gcc has been built.
And there could be another trap down the road (multi-libs), as you can see here

What is the proper architecture-specific options (-m) for Sandy Bridge based Pentium?

I'm trying to figure out how to set -march option properly to see how much performance difference between the option enabled and disabled can occur on my PC with gcc 4.7.2.
Before trying compiling, I tried to find what is the best -march option for my PC. My PC has Pentium G850, whose architecture is Sandy Bridge. So I referred to the gcc 4.7.2 manual and found that -march=corei7-avx seems the best.
However, I remembered that Sandy Bridge based Pentium lacks AVX and AES-NI instruction set support, which is true for Pentium G850. So -march=corei7-avx is not a proper option.
I come up with some potential options:
-march=corei7-avx -mno-avx -mno-aes
-march=corei7 -mtune=corei7-avx
-march=native
The first option looks reasonable considering information I have, but I'm anxious that there may be missing feature other than AVX and AES-NI. The second option looks safe, but it could miss some minor features on Sandy Bridge because of -march=corei7. The third option will take care of all of my concerns, but I've heard this option sometimes misdetects features of CPU so I would like to know how to manually do that.
I've googled and searched StackOverflow and SuperUser, but I can't find any clear solutions...
What options should be set?

What about detecting via GCC, for me (gcc-5.3.0) on an i5-2450M CPU (Lenovo e520), the following shows:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/5.3.0/cc1 -E -quiet -v - -march=sandybridge
-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
-msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp
-mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd
-mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr
-mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
-mno-vx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves
-mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
-mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param
l1-cache-size=32 --param l1-cache-line-size=64 --param
l2-cache-size=3072 -mtune=sandybridge -fstack-protector-strong

I would suggest to use -march=corei7-avx -mtune=corei7-avx -mno-avx -mno-aes. It is important to specify -mtune because this option tells gcc which CPU model it should use for scheduling instructions in the generated code.

I hava a Sandy Bridge based Intel(R) Celeron(R) CPU G530.
When use -march=native in gentoo's CFLAGS, and then compile media-video/ffmpeg-1.2.6 (current stable version in Gentoo), there is something wrong when playing video with mplayer( illegal instruction). Just like what you said, -mtune=native sometimes misdetects features of CPU.
Then I change to -march=corei7-avx -mtune=corei7-avx -mno-avx -mno-aes, and recompile ffmpeg-1.2.6 and mplayer, things are all ok till now.

How to use gcc and -msoft-float on an i386/x86-64? [duplicate]

Is it (easily) possible to use software floating point on i386 linux without incurring the expense of trapping into the kernel on each call? I've tried -msoft-float, but it seems the normal (ubuntu) C libraries don't have a FP library included:
$ gcc -m32 -msoft-float -lm -o test test.c
/tmp/cc8RXn8F.o: In function `main':
test.c:(.text+0x39): undefined reference to `__muldf3'
collect2: ld returned 1 exit status

It is surprising that gcc doesn't support this natively as the code is clearly available in the source within a directory called soft-fp. It's possible to compile that library manually:
$ svn co svn://gcc.gnu.org/svn/gcc/trunk/libgcc/ libgcc
$ cd libgcc/soft-fp/
$ gcc -c -O2 -msoft-float -m32 -I../config/arm/ -I.. *.c
$ ar -crv libsoft-fp.a *.o
There are a few c files which don't compile due to errors but the majority does compile. After copying libsoft-fp.a into the directory with our source files they now compile fine with -msoft-float:
$ gcc -g -m32 -msoft-float test.c -lsoft-fp -L.
A quick inspection using
$ objdump -D --disassembler-options=intel a.out | less
shows that as expected no x87 floating point instructions are called and the code runs considerably slower as well, by a factor of 8 in my example which uses lots of division.
Note: I would've preferred to compile the soft-float library with
$ gcc -c -O2 -msoft-float -m32 -I../config/i386/ -I.. *.c
but that results in loads of error messages like
adddf3.c: In function '__adddf3':
adddf3.c:46: error: unknown register name 'st(1)' in 'asm'
Seems like the i386 version is not well maintained as st(1) points to one of the x87 registers which are obviously not available when using -msoft-float.
Strangely or luckily the arm version compiles fine on an i386 and seems to work just fine.

Unless you want to bootstrap your entire toolchain by hand, you could start with uclibc toolchain (the i386 version, I imagine) -- soft float is (AFAIK) not directly supported for "native" compilation on debian and derivatives, but it can be used via the "embedded" approach of the uclibc toolchain.

GCC does not support this without some extra libraries. From the 386 documentation:
-msoft-float Generate output containing library calls for floating
point. Warning: the requisite
libraries are not part of GCC.
Normally the facilities of the
machine's usual C compiler are used,
but this can't be done directly in
cross-compilation. You must make your
own arrangements to provide suitable
library functions for
cross-compilation.
On machines where a function returns
floating point results in the 80387
register stack, some floating point
opcodes may be emitted even if
-msoft-float is used
Also, you cannot set -mfpmath=unit to "none", it has to be sse, 387 or both.
However, according to this gnu wiki page, there is fp-soft and ieee. There is also SoftFloat.
(For ARM there is -mfloat-abi=softfp, but it does not seem like something similar is available for 386 SX).
It does not seem like tcc supports software floating point numbers either.
Good luck finding a library that works for you.

G'day,
Unless you're targetting a platform that doesn't have inbuilt FP support, I can't think of a reason why you'd want to emulate FP support.
Doesn't your x386 platform have external FPU support? Pity it's not a x486 with the FPU built in!
In my experience, any soft emulation is bound to be much slower than its hardware equivalent.
That's why I finished up writing a package in Ada to taget the onboard 68k FPU instead of using the soft emulation provided by the compiler manufacturer at the time. They finished up bundling it in their compiler as a matter of fact.
Edit: Just seen your comment below. Hmmm, if you don't need a full suite of FP support is it possible to roll your own for the few math functions you do need? That how the Ada package I mentioned got started.
HTH
cheers,

Does Xcode 4 have support for AVX?

Before I spend time and money downloading Xcode 4, can anyone tell me whether it comes with a version of gcc (or any other compiler, e.g. LLVM) which supports the AVX instruction set on Sandy Bridge CPUs (i.e. gcc -mavx on mainstream gcc builds) ? I don't seen any public release notes anywhere so it's not easy to check, and I don't really need Xcode 4 yet unless it has AVX support.

I eventually cracked and downloaded Xcode 4 - it looks like clang is the only compiler that may support AVX currently, although I haven't tested it properly:
$ clang -dM -E -mavx - < /dev/null | grep -i avx
#define __AVX__ 1

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio