What CPU instruction set extensions are needed to support the target 'riscv32' for Linux/GCC? - gcc

I was looking into Linux support for 32-bit RISC-V and came across the following compile instructions:
make ARCH=riscv CROSS_COMPILE=riscv32-unknown-linux-gnu- -j $(nproc)
The issue is that riscv32 does not make it clear if just the base CPU instruction set is needed (RV32I) or if additional extensions are needed/toggleable (RV32IMAC seems to be a common target).
This leaves me with the following questions:
What CPU instruction set or sets are implied by the above command?
If not RV32I, can optional arguments be added to support RV32I?

From what I can see in the Linux kernel v5.19 the Makefile for RISC-V (arch/riscv/Makefile) looks like this:
# ISA string setting
riscv-march-$(CONFIG_ARCH_RV32I) := rv32ima
riscv-march-$(CONFIG_ARCH_RV64I) := rv64ima
riscv-march-$(CONFIG_FPU) := $(riscv-march-y)fd
riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c
The ISA used is rv32ima for 32-bit RISC-V kernels. Additionally, the f, d and c extensions can be enabled by configuring CONFIG_FPU (for fd) and CONFIG_RISCV_ISA_C (for c). By default these two are both set to y, so you end up with rv32imafdc; however, you can disable fdc by running menuconfig (or similar) and setting CONFIG_FPU=n and CONFIG_RISCV_ISA_C=n.
Note: this is for Linux v5.19. If you have a different version, you will have to check by inspecting your kernel's Makefile.
can optional arguments be added to support RV32I?
No, it does not look like it; rv32ima is the minimum set of extensions selected when targeting 32-bit RISC-V. But again, if you have a kernel version lower than v5.19 you'd have to check your Makefile to be sure (maybe some older kernels are fine with just rv32i).
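If you want to double-check what a given -march string actually enables in your cross compiler (rather than in the kernel config), one option is a small probe like the sketch below. This is my own illustration, not part of the kernel build; the macro names are the ones GCC's RISC-V port defines, and if your toolchain differs you can list them all with gcc -E -dM on an empty file. Build it with something like riscv32-unknown-linux-gnu-gcc -march=rv32ima -mabi=ilp32 and run it on the target or under QEMU user-mode emulation.
#include <stdio.h>
int main(void) {
    /* Each macro is only defined when the corresponding extension is enabled. */
#ifdef __riscv_mul
    puts("M (integer multiply/divide) enabled");
#endif
#ifdef __riscv_atomic
    puts("A (atomics) enabled");
#endif
#ifdef __riscv_compressed
    puts("C (compressed instructions) enabled");
#endif
#ifdef __riscv_flen
    printf("F/D floating point enabled (FLEN=%d)\n", (int)__riscv_flen);
#else
    puts("no hardware floating point");
#endif
    return 0;
}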

Related

How do the `aapcs` and `aapcs-linux` ABI options differ when compiling for bare-metal ARM with gcc?

I am trying to port an application to ARM's arm-none-eabi-gcc toolchain. This application is intended to run on a bare-metal target.
The only two suitable values for the -mabi option in this case appear to be aapcs and aapcs-linux. From Debian documentation and Embedded Linux from Source I know that aapcs-linux uses a fixed 4-byte enum size, whereas aapcs defines enums as "variable length". However, I can't find any information on what other differences (if any) there might be.
Does anyone know the full list of differences between these two ABI options?
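For what it's worth, the enum-size difference you mention is easy to observe directly. The following is a minimal sketch of mine (not an authoritative list of differences), assuming an enum whose values all fit in one byte; compiled once with -mabi=aapcs and once with -mabi=aapcs-linux, the reported size should differ.
#include <stdio.h>
/* All enumerators fit in one byte, so "variable length" enums (aapcs) may
 * shrink the type to 1 byte, while aapcs-linux keeps it at a fixed 4 bytes. */
enum small { FIRST = 0, LAST = 255 };
int main(void) {
    printf("sizeof(enum small) = %u\n", (unsigned)sizeof(enum small));
    return 0;
}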

Shell script: Portable way to programmatically obtain the CPU vendor on POSIX systems

Is there a portable way to programmatically obtain the CPU vendor info on POSIX systems in shell scripts? In particular, I need to tell whether an x86_64/AMD64 CPU is vended by Intel or AMD. The approach does not have to work on all POSIX systems, but it should work on a decent range of common POSIX systems: GNU/Linux, MacOS, and *BSD. As an example, a Linux-only approach is to extract the info from /proc/cpuinfo.
POSIX (IEEE Std 1003.1-2017) does not mandate a system utility or shell variable holding the CPU brand. The closest you'll get is uname -m, which is the "hardware type on which the system is running". Unfortunately, that command doesn't have standardized output, so while you might get amd64 on some older machines, you'll mostly get i686 or x86_64 these days.
POSIX does mandate c99, a basic C compiler interface, be present when a C compiler is available at all. You can use that to compile a naive version of cpuid:
$ cat cpuid.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#ifdef _WIN32
#include <intrin.h>              /* for __cpuidex */
#endif
int main() {
    uint32_t regs[4] = { 0 };
    char brand[13] = { 0 };      /* 12 vendor characters + terminating NUL */
#ifdef _WIN32
    __cpuidex((int *)regs, 0, 0);
#else
    /* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX. */
    __asm volatile("cpuid"
                   : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
                   : "a" (0), "c" (0));
#endif
    memcpy(&brand[0], &regs[1], 4);  /* EBX */
    memcpy(&brand[4], &regs[3], 4);  /* EDX */
    memcpy(&brand[8], &regs[2], 4);  /* ECX */
    printf("%s\n", brand);
    return 0;
}
On a variety of test machines, here's what I get:
$ c99 -o cpuid cpuid.c && ./cpuid # MacOS X
GenuineIntel
$ c99 -o cpuid cpuid.c && ./cpuid # Intel-based AWS EC2 (M5)
GenuineIntel
$ c99 -o cpuid cpuid.c && ./cpuid # AMD-based AWS EC2 (T3a)
AuthenticAMD
Wikipedia lists numerous other possible vendor brands based on the cpuid instruction, but the ones likely most interesting for your defined use case are:
GenuineIntel - Intel
AMDisbetter! - AMD
AuthenticAMD - AMD
Provided you had this simple executable available in your path, the POSIX-y logic would look like:
if cpuid | grep -q AMD; then
: # AMD logic here
elif cpuid | grep -q Intel; then
: # Intel logic here
else # neither Intel nor AMD
echo "Unsupported CPU vendor: $(cpuid)" >&2
fi
If you have a very, very old multi-core motherboard from the days when AMD was pin-equivalent with Intel, then you might care to know if CPU0 and CPU1 are the same vendor, in which case the C program above can be modified in the assembly lines to check processor 1 instead of 0 (the second argument to the respective asm functions).
This illustrates one particular benefit of this approach: if what you really want to know is whether the CPU supports a particular feature set (and are just using vendor as a proxy), then you can modify the C code to check whether the CPU feature is actually available. That's a quick modification to the EAX value given to the assembly code and a change to the interpretation of the E{B,C,D}X result registers.
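As a sketch of that kind of modification (my own, not part of the program above; it assumes GCC/Clang inline assembly on x86), here leaf 1 is queried instead of leaf 0 and a feature bit is tested — SSE4.2, which is bit 20 of ECX:
#include <stdio.h>
#include <stdint.h>
int main() {
    uint32_t regs[4] = { 0 };
    /* CPUID leaf 1: feature bits are returned in ECX and EDX. */
    __asm volatile("cpuid"
                   : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
                   : "a" (1), "c" (0));
    /* Bit 20 of ECX indicates SSE4.2 support. */
    printf("SSE4.2: %s\n", (regs[2] & (1u << 20)) ? "yes" : "no");
    return 0;
}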
With regards to the availability of c99, note that:
A POSIX-conforming system without c99 is proof that no C compiler is available on that system. If your target systems do not have c99, then you need to select and install a C compiler (gcc, clang, msvc, etc.) or attempt a fallback detection with e.g. /proc/cpuinfo.
The standard declares that "Unlike all of the other non-OB-shaded utilities in this standard, a utility by this name probably will not appear in the next version of this standard. This utility's name is tied to the current revision of the ISO C standard at the time this standard is approved. Since the ISO C standard and this standard are maintained by different organizations on different schedules, we cannot predict what the compiler will be named in the next version of the standard." So you should consider compiling with something along the lines of ${C99:-c99} -o cpuid cpuid.c, which lets you adapt as the binary name changes over time.
I would proceed by writing the exact commands to get the CPU vendor for every supported OS and then running the appropriate set of commands based on OS detection.
I wrote an example that can easily be improved / extended, taking into consideration the operating systems in your question:
OS="`uname`"
case "$OS" in
SunOS*) /usr/platform/`uname -m`/sbin/prtdiag -v ;;
Darwin*) sysctl -n machdep.cpu.vendor ;;
Linux*) lscpu | grep Vendor | awk '{print $NF}' ;;
FreeBSD*) sysctl -n hw.model | awk 'NR==1{print $NF}' ;;
*) echo "unknown: $OS" ;;
esac
This is the basic logic you need:
Detect the OS type: Linux or BSD; if BSD, Darwin or another BSD (OpenBSD, FreeBSD, NetBSD, DragonFly BSD). If Darwin, you'll need Darwin-specific handling. If it's not a BSD and not Linux, is it a proprietary type of Unix? Are you going to try to handle it? If not, you need a safe fallback. This will determine what methods you use to do some, but not all, of the detections.
If Linux, it's easy if all you want is Intel or AMD. It's different if you need solid 32/64-bit detection (you specified 64-bit only): is that the running kernel or the CPU? That has to be handled if it's relevant. Does it matter what type of Intel/AMD CPU it is? They make some SoC variants, for example.
sysctl for the BSDs will give whatever each BSD decided to put in there. DragonFly and FreeBSD will be similar or the same; OpenBSD you have to check release to release; NetBSD is tricky. Some installs will require root to read sysctl; that's out of your hands, so you have to handle it case by case and have error handling to detect when root is required. That varies; the usual is to make it user-readable data, but not always. Note that the BSDs can and do change the syntax of some fields' data in the output, so you have to keep up with it if you actually want BSD support. Apple in general does not seem to care at all about real Unix tools being able to work with their data, so it's empirical; don't assume without seeing several generations of the output. And they don't include a lot of standard Unix tools by default, so you can't assume things are actually installed in the first place.
/proc/cpuinfo will cover all Linux systems for AMD/Intel, and a variety of methods can be used to pinpoint whether it's running 32-bit or 64-bit, and whether it's a 32- or 64-bit CPU.
VMs can help, but only go part of the way, since the CPU will be your host machine's, or part of it. Getting current- and last-generation data that is reliable and real is a pain. But if you have Intel and AMD systems to work with, you can install most of the BSD variants except Darwin/OSX and debug on those, so that gets you to most of the OS types, except Darwin, which requires having a Mac of some type available.
Does failure matter? Does it actually matter if the detection fails? If so, how is failure handled? Does ARM/MIPS/PPC matter? What about other CPUs, like Elbrus, that have many Intel-like features but are neither AMD nor Intel?
Like the comment said, read the CPU block in inxi to pick out what you need, but it's not easy to do, and requires a lot of data examples, and you'll be sad because one day FreeBSD or OSX or OpenBSD will change something at random for a new release.
If you ignore OSX and pretend it doesn't exist, on the bright side you'll get 98% support out of the box with very little code if all you need is Intel/AMD detection via /proc/cpuinfo, which prints it out as neat as can be desired. If you must have OSX, then you have to add the full suite of BSD handlers, which is a pain. Personally I wouldn't touch a project like that unless I got paid to do it, re OSX. Usually you can get FreeBSD and maybe OpenBSD reasonably readily, though you have to check every new major release to see if it all still works.
If you add more requirements, like cpus other than intel/amd, then it gets a lot harder and takes much more code.
Note that on Darwin, currently all OSX machines are, I believe, Intel, though there are rumors Apple is looking to leave Intel. Previously they were PowerPC, so it also comes down to how robust the solution has to be; that is, do you care if it fails on a PowerPC Mac? Do you care if it fails on a future Mac that is not Intel-powered?
Further note that if BSD is specified, that excludes a wide variety of even more fragmented Unix systems, like OpenIndiana, Solaris proper, the proprietary Unices of IBM, HP, and so on, which all use different tools.

gcc; Aarch64; Armv8; enable crypto; -mcpu=cortex-a53+crypto

I am trying to optimize for an ARM processor (Cortex-A53) with the ARMv8 architecture, for crypto purposes.
The problem is that although the compiler accepts -mcpu=cortex-a53+crypto etc., it doesn't change the output (I checked the assembly output).
Changing -mfpu or -mcpu to add features like crypto or simd doesn't matter; it is completely ignored.
To enable Neon code, -ftree-vectorize is needed; how do I make use of crypto?
(I checked the -O(1,2,3) flags, it won't help).
Edit: I realized I made a mistake in thinking the crypto flag works like an optimization flag applied automatically by the compiler. My bad.
You had two questions...
Why does -mcpu=cortex-a53+crypto not change code output?
The crypto extensions are an optional feature under the AArch64 state of ARMv8-A. The +crypto feature flag indicates to the compiler that these instructions are available for use. From a practical perspective, in GCC 4.8/4.9/5.1, this defines the macro __ARM_FEATURE_CRYPTO, and controls whether or not you can use the crypto intrinsics defined in ACLE, for example:
uint8x16_t vaeseq_u8 (uint8x16_t data, uint8x16_t key)
There is no optimisation in current GCC which will automatically convert a sequence of C code to use the cryptography instructions. If you want to make this transformation, you have to do it by hand (and guard it by the appropriate feature macro).
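For illustration, a guarded use of that intrinsic could look like the following sketch (mine, not from the original answer); it assumes an AArch64 compiler invoked with something like -mcpu=cortex-a53+crypto, and the input values are arbitrary:
#include <arm_neon.h>
#include <stdio.h>
int main() {
#ifdef __ARM_FEATURE_CRYPTO
    /* AESE performs AddRoundKey + SubBytes + ShiftRows in a single instruction. */
    uint8x16_t state = vdupq_n_u8(0x42);
    uint8x16_t key   = vdupq_n_u8(0x17);
    uint8x16_t out   = vaeseq_u8(state, key);
    printf("first byte of result: %02x\n", vgetq_lane_u8(out, 0));
#else
    puts("crypto extensions were not enabled for this compilation");
#endif
    return 0;
}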
Why do the +fp and +simd flags not change code output?
For -mcpu=cortex-a53 the +fp and +simd flags are implied by default (for some configurations of GCC +crypto may also be implied by default). Adding these feature flags will therefore not change code generation.
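A quick way to see what a given -mcpu value implies for your particular GCC build is to check which ACLE feature macros get defined — for example with a throwaway program like the sketch below (mine, not from the original answer; compile it with the -mcpu string under test):
#include <stdio.h>
int main(void) {
    /* Each macro is defined only when the corresponding feature is enabled. */
#ifdef __ARM_FP
    puts("__ARM_FP defined: floating point enabled");
#endif
#ifdef __ARM_NEON
    puts("__ARM_NEON defined: Advanced SIMD enabled");
#endif
#ifdef __ARM_FEATURE_CRYPTO
    puts("__ARM_FEATURE_CRYPTO defined: crypto extensions enabled");
#endif
    return 0;
}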

What are the correct options for an ARM cross compiler with crosstool-NG

I am trying to build a cross compiler to target the processor running on my NAS box using crosstool-NG.
The NAS box is a ZyXEL NSA210, there is an example dmesg output, the /proc/cpuinfo is:
Processor : ARM926EJ-S rev 5 (v5l)
BogoMIPS : 183.09
Features : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant : 0x0
CPU part : 0x926
CPU revision : 5
...
Hardware : Oxsemi NAS
Revision : 0000
Serial : 00000d51caab2d00
The options on the target options page, with the flag and my current setting in parentheses:
Target Architecture (arm)
Use the MMU (yes)
Endianness (Little endian)
Bitness (32-bit)
Default instruction set mode (arm)
Use EABI (yes)
Architecture level --with-arch= ()
Emit assembly for CPU --with-cpu= ()
Tune for CPU ()
Use specific FPU ()
Floating point (software)
Target CFLAGS ()
Target LDFLAGS ()
I've been trying various combinations in the 'Architecture level' and 'Emit assembly for CPU', such as arm926ej-s, armv5l, armv5tej, but I don't know which option goes where.
I've set the Target OS to bare-metal as crosstool-NG doesn't have the version of Linux used on the box.
Also, once the toolchain is built, do I need to pass the same options again to the compilers?
So far my attempts have just produced the Illegal instruction message.
Edit
If anyone could point me towards an article on setting up an ARM GCC toolchain with explicit reference of how to find out the correct parameters, that would answer my question.
Try one of these
--with-arch=armv5te
--with-tune=arm926ej-s
or
--with-cpu=arm926ej-s
(there's no point in having both).
Otherwise your options look fine.
If it still doesn't work then you need to look at the libraries and headers. If you want to use dynamically linked libraries then you'll need ones that match those on the target, version-wise and name-wise. If you want to use static linking, or copy your own shared libraries onto the target (in a non-standard place, perhaps, which would need extra config), you should be fine.
Either way, you'll need your kernel headers to match. You can probably just download some contemporary kernel headers from kernel.org.

How can I get a list of legal ARM opcodes from gcc (or elsewhere)?

I'd like to generate pseudo-random ARM instructions. Via assembler directives, I can tell gcc what mode I'm in, and it will complain if I try a set of opcodes and operands that's not legal in that mode, so it must have some internal listing of what can be done in which mode. Where does that live? Would it be easier to extract that info from LLVM?
Is this question "not even wrong"? Should I try a different approach entirely?
To answer my own question, this is actually really easy to do from arm.md and constraints.md in gcc/config/arm/. I probably spent more time asking this question and answering comments on it than I did figuring this out. Turns out I just need to look for 'TARGET_THUMB1', until I get around to implementing thumb2.
For the ARM family the buck stops at the ARM ARM (ARM Architectural Reference Manual). There is an ARM instruction set section and a Thumb instruction set section. Within both, each instruction tells you what generation it belongs to (ARMvX, where X is some number like 4 (ARM7) or 5 (ARM9 time frame), etc.). Since the opcode and pseudo-code are listed for each instruction, you should be able to figure out what is a real instruction and what, if anything, is syntax to save typing over another instruction (push and pop, for example).
With the Cortex-M3 and thumb2 in particular you also need to look at the TRM (Technical Reference Manual) as well. ARM has, I forget the name, a universal syntax they are trying to use that should work on both Thumb and ARM. For example, on ARM you have three-register instructions:
add r1,r1,r2
In Thumb there are only two-register operations:
add r1,r2
The desire basically is to meet in the middle or, I would say more accurately, to encourage ARM assemblers to parse Thumb instructions and encode them as the equivalent ARM instruction without complaining. This may have started with Thumb and not thumb2; I have always separated the two syntaxes in my code until recently (and I still generally use ARM syntax for ARM and Thumb for Thumb).
And then yes you have to see what the specific implementation of the assembler tool is, in your case binutils. And it sounds like you have found the binutils/gnu secret decoder ring.
