I am trying to compile my C code to use soft multiplication in MIPS I as my hardware does not have a hard multiplier.
This document (page 10): http://www.sm.luth.se/csee/courses/smd/137/doc/gcc.pdf indicates that the "-mno-mul" option can be used to tell the compiler not to generate integer multiply/divide instructions and instead insert calls to multiply/divide subroutines.
However, when I feed in the "-mno-mul" option to my compiler, the error message returned is:
unrecognized command line option "-mno-mul"
I tried googling for more information on "-mno-mul", but very few search results were returned. The option is not even listed here: https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html
My question is: Has the mno-mul option become obsolete? If so, is there a workaround for the compiler to generate code for soft multiplication?
This option is obsolete, since all MIPS architecture specifications since MIPS1 require an integer multiplier.
You might still be able to track down a copy of GCC 2.96 and compile using that. Or you could write a handler for the illegal instruction trap that implements soft multiplication.
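Whichever route is taken (a subroutine called by the compiler or a trap handler), the core of soft multiplication is usually a shift-and-add loop. The following is only a minimal sketch of that idea, not the routine any particular GCC version emits calls to; the name soft_mul32 is hypothetical, and it returns only the low 32 bits of the product (a real MIPS mult also produces the high word in HI).

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 32-bit values without using a
 * hardware multiply instruction. Returns the low 32 bits of a * b,
 * which is also correct for signed operands in two's complement. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u) {
            product += a;   /* add the current shifted multiplicand */
        }
        a <<= 1;            /* next bit of b is worth twice as much */
        b >>= 1;            /* consume the bit just examined */
    }
    return product;
}
```

The loop runs at most 32 times and uses only shifts, adds, and tests, all of which are available without a hardware multiplier.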
According to gcc MIPS options you can use -mno-mad
-mno-mad
Enable (disable) use of the mad, madu and mul instructions, as provided by the R4650 ISA.
In my C program, I want the processor to compute a*b + c using the FMADD instruction rather than MUL and ADD. How do I tell the compiler to do this? I would also like to see the FMADD instruction in the assembly output after compiling.
gcc version 4.9.2
ARM v7 Processor
You need to have one of the following FPUs,
vfpv4
vfpv4-d16
fpv4-sp-d16
fpv5-sp-d16
fpv5-d16
neon-vfpv4
fp-armv8
neon-fp-armv8
crypto-neon-fp-armv8
You must use the hard-float ABI option.
An example with integers.
An example with floats.
You shouldn't need to make any special function calls; the compiler will use the instruction if it finds it beneficial.
The code in arm.c responsible for generation is,
case FMA:
if (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA)
With TARGET_FMA being a version '4' or better FPU.
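As a rough illustration of the points above, here is a minimal sketch of a function the compiler can turn into a fused multiply-add. The function name mul_add and the command line in the comment are assumptions for illustration (the toolchain prefix in particular depends on your installation). Note that on 32-bit ARM the fused instruction shows up as vfma.f32 in the assembly; FMADD is the AArch64 mnemonic.

```c
/* Assumed build command (adjust the toolchain prefix to your setup):
 *   arm-linux-gnueabihf-gcc -O2 -mfpu=vfpv4 -mfloat-abi=hard -S fma_example.c
 * Then look for vfma.f32 in fma_example.s.
 * GCC only fuses the multiply and add when floating-point contraction
 * is allowed (-ffp-contract=fast, the default in GNU C mode). */
float mul_add(float a, float b, float c)
{
    return a * b + c;   /* candidate for a single fused multiply-add */
}
```

If the assembly still shows separate vmul.f32 and vadd.f32 instructions, the usual suspects are a strict ISO mode such as -std=c99 (which disables contraction), a soft-float ABI, or an FPU older than version 4.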
I am trying to optimize code for an Arm processor (Cortex-A53) with an Armv8 architecture for crypto purposes.
The problem is that although the compiler accepts -mcpu=cortex-a53+crypto etc., it doesn't change the output (I checked the assembly output).
Changing -mfpu or -mcpu to add features like crypto or simd makes no difference; they are completely ignored.
To enable Neon code -ftree-vectorize is needed, how to make use of crypto?
(I checked the -O(1,2,3) flags, it won't help).
Edit: I realized I made a mistake by thinking the crypto flag works like an optimization flag solved by the compiler. My bad.
You had two questions...
Why does -mcpu=cortex-a53+crypto not change code output?
The crypto extensions are an optional feature under the AArch64 state of ARMv8-A. The +crypto feature flag tells the compiler that these instructions are available for use. From a practical perspective, in GCC 4.8/4.9/5.1, this defines the macro __ARM_FEATURE_CRYPTO and controls whether or not you can use the crypto intrinsics defined in ACLE, for example:
uint8x16_t vaeseq_u8 (uint8x16_t data, uint8x16_t key)
There is no optimisation in current GCC which will automatically convert a sequence of C code to use the cryptography instructions. If you want to make this transformation, you have to do it by hand (and guard it by the appropriate feature macro).
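Doing it by hand might look roughly like the sketch below. It assumes the macro behaviour described above (newer compilers may expose finer-grained macros instead); aes_round and hw_aes_available are hypothetical names. The intrinsics vaeseq_u8 and vaesmcq_u8 come from arm_neon.h.

```c
#if defined(__ARM_FEATURE_CRYPTO)
#include <arm_neon.h>

/* One AES encryption round step using the crypto extension:
 * AESE does AddRoundKey + SubBytes + ShiftRows, AESMC does MixColumns. */
uint8x16_t aes_round(uint8x16_t state, uint8x16_t round_key)
{
    return vaesmcq_u8(vaeseq_u8(state, round_key));
}
#endif

/* Reports whether this translation unit was compiled with the crypto
 * intrinsics enabled (e.g. via -mcpu=cortex-a53+crypto). */
int hw_aes_available(void)
{
#if defined(__ARM_FEATURE_CRYPTO)
    return 1;
#else
    return 0;
#endif
}
```

Without +crypto the guarded function is simply not compiled, so callers need a portable C fallback for that case.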
Why do the +fpu and +simd flags not change code output?
For -mcpu=cortex-a53 the +fp and +simd flags are implied by default (for some configurations of GCC +crypto may also be implied by default). Adding these feature flags will therefore not change code generation.
What does the -mlong-calls flag mean? How is it used? For what purpose would I need to enable it?
According to the ARM options page for GCC:
Tells the compiler to perform function calls by first loading the
address of the function into a register and then performing a
subroutine call on this register. This switch is needed if the target
function lies outside of the 64-megabyte addressing range of the
offset-based version of subroutine call instruction.
Basically it means that if your binary is small, you'll likely never have a problem with running the default -mno-long-calls and not have to worry about the option.
If the linker gives you the error Relocation truncated to fit: R_ARM_PC24, you've hit the limit of the defaults and need to compile and link your binary using -mlong-calls.
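If only a few call targets are out of range, GCC's ARM port also documents a per-function long_call attribute, which applies the same register-indirect call sequence to individual functions instead of the whole program. The sketch below assumes that attribute; FAR_CALL, far_helper and call_far_helper are hypothetical names, and in a real program far_helper would live in a distant section or another object file.

```c
/* Apply long_call only when building for 32-bit ARM with GCC-compatible
 * compilers; elsewhere the macro expands to nothing. */
#if defined(__arm__) && defined(__GNUC__)
#define FAR_CALL __attribute__((long_call))
#else
#define FAR_CALL
#endif

/* Calls to this function load its address into a register first,
 * so the target may lie beyond the branch instruction's range. */
FAR_CALL int far_helper(int x);

int call_far_helper(int x)
{
    return far_helper(x) + 1;
}

int far_helper(int x)
{
    return x * 2;
}
```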
I want to write some inline ARM assembly in my C code. For this code, I need to use a register or two more than just the ones declared as inputs and outputs to the function. I know how to use the clobber list to tell GCC that I will be using some extra registers to do my computation.
However, I am sure that GCC enjoys the freedom to shuffle around which registers are used for what when optimizing. That is, I get the feeling it is a bad idea to use a fixed register for my computations.
What is the best way to use some extra register that is neither input nor output of my inline assembly, without using a fixed register?
P.S. I was thinking that using a dummy output variable might do the trick, but I'm not sure what kind of weird other effects that will have...
Ok, I've found a source that backs up the idea of using dummy outputs instead of hard registers:
4.8 Temporary registers:
People also sometimes erroneously use clobbers for temporary registers. The right way is
to make up a dummy output, and use “=r” or “=&r” depending on the permitted overlap
with the inputs. GCC allocates a register for the dummy value. The difference is that
GCC can pick a convenient register, so it has more flexibility.
from page 20 of this pdf.
For anyone who is interested in more info on inline assembly with GCC this website turned out to be very instructive.
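A minimal sketch of the dummy-output technique, assuming a 32-bit ARM target and GCC extended asm; the function name twice_plus is hypothetical, and a plain C fallback is included so the file still builds on other targets. The scratch register is declared as an output with "=&r" (early clobber) because it is written before the input b has been read.

```c
#include <stdint.h>

/* Computes 2*a + b, using one scratch register chosen by GCC. */
uint32_t twice_plus(uint32_t a, uint32_t b)
{
#if defined(__arm__)
    uint32_t result;
    uint32_t tmp;   /* dummy output: only serves as a temporary register */

    __asm__ ("add %[tmp], %[a], %[a]\n\t"    /* tmp = a + a            */
             "add %[res], %[tmp], %[b]"      /* result = tmp + b       */
             : [res] "=r" (result),
               [tmp] "=&r" (tmp)             /* must not overlap b     */
             : [a] "r" (a), [b] "r" (b));
    return result;
#else
    return a + a + b;   /* portable fallback for non-ARM builds */
#endif
}
```

Because tmp is an operand rather than a hard-coded register in the clobber list, GCC remains free to pick whichever register is convenient at each call site.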
I'd like to generate pseudo-random ARM instructions. Via assembler directives, I can tell gcc what mode I'm in, and it will complain if I try a set of opcodes and operands that's not legal in that mode, so it must have some internal listing of what can be done in which mode. Where does that live? Would it be easier to extract that info from LLVM?
Is this question "not even wrong"? Should I try a different approach entirely?
To answer my own question, this is actually really easy to do from arm.md and constraints.md in gcc/config/arm/. I probably spent more time asking this question and answering comments on it than I did figuring this out. Turns out I just need to look for 'TARGET_THUMB1', until I get around to implementing thumb2.
For the ARM family the buck stops at the ARM ARM (ARM Architecture Reference Manual). There is an ARM instruction set section and a Thumb instruction set section. In both, each instruction's entry tells you which architecture generation supports it (ARMvX, where X is a number such as 4 (ARM7) or 5 (ARM9 time frame), etc.). Since the opcode and pseudocode are listed for each instruction, you should be able to figure out which are real instructions and which, if any, are just syntax to save typing for another instruction (push and pop, for example).
With the Cortex-M3 and thumb2 in particular you also need to look at the TRM (Technical Reference Manual). ARM has, I forget the name, a universal syntax they are trying to use that should work on both Thumb and ARM. For example, on ARM you have three-register instructions:
add r1,r1,r2
In thumb there are only two register operations
add r1,r2
The desire basically is to meet in the middle, or, I would say more accurately, to encourage ARM assemblers to parse Thumb instructions and encode them as the equivalent ARM instruction without complaining. This may have started with thumb and not thumb2; I have always kept the two syntaxes separate in my code until recently (and I still generally use ARM syntax for ARM and Thumb syntax for Thumb).
And then yes you have to see what the specific implementation of the assembler tool is, in your case binutils. And it sounds like you have found the binutils/gnu secret decoder ring.