I'm trying to port some Linux C code to an Apple M1 Mac and have been encountering an issue with some inline assembly. And it has me stumped.
I have the following inline assembly block:
#define TEST asm volatile(\
"adr x0, label9 \n"\
: : : "x0");
And have encountered the following error:
test.c:73:5: error: unknown AArch64 fixup kind!
TEST
^
./ARM_branch_macros.h:2862:7: note: expanded from macro 'TEST'
"adr x0, label9 \n"\
^
<inline asm>:1:2: note: instantiated into assembly here
adr x0, label9
^
1 error generated.
make: *** [indirect_branch_latency.o] Error 1
I am using the following compiler:
Apple clang version 12.0.0 (clang-1200.0.32.27)
Target: arm64-apple-darwin20.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
With the command line:
clang -c -o test.o test.c -I. -w -g -lrt -O0 -static -DARM_ASSEMBLY
Any help would be greatly appreciated!
The ADR instruction stores the offset from the current PC value to the label you reference.
When you have an instruction that references a symbol in a different object file (or in a different section in the same object file), the assembler can't encode the exact offset directly as it doesn't know how the linker will lay them out, but has to leave a relocation in the object file, instructing the linker to fix up the instruction once the exact location of the symbol is known.
I think the issue here is simply that the MachO object file (which is used on apple platforms) format doesn't have a relocation type for fixing up an ADR instruction pointing at a symbol elsewhere. And even if it had that, the construct is pretty brittle - the symbol that it points at has to be within +/- 1 MB from the instruction referencing it - that's a limit that is pretty easy to hit.
To get access to a bigger range, an ADRP+ADD instruction pair is often used, which gives you a +/- 4 GB range, and the MachO format does support those.
The assembler syntax for them differs a bit between MachO and ELF (and COFF). For MachO, the syntax looks like this:
adrp x0, symbol#PAGE
add x0, x0, symbol#PAGEOFF
Or if you want to load from it at the same time:
adrp x0, symbol#PAGE
ldr x1, [x0, symbol#PAGEOFF]
On ELF (the object file format used on Linux) and COFF (windows, when assembling GNU style assembly with LLVM) platforms, the syntax looks like this:
adrp x0, symbol
add x0, x0, :lo12:symbol
Related
To test the performance feature of an assembly language feature, I would like to use it inside a C program with inline assembly to replace existing functionality.
My GNU GAS assembler already recognizes the instruction with an appropriate -march.
How to make GCC generate code that accepts such inline assembly instructions that is cannot yet emit and does not recognize?
For example, the ARM SVE extensions were more or less recently added to the ARM architecture.
So in a minimal useless inline SVE assembly example I try:
main.c
int main(void) {
__asm__ __volatile__ ("ld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and I try to compile with:
aarch64-linux-gnu-gcc -c -o main.o --save-temps -v main.c
but assembly fails with:
main.s: Assembler messages:
main.s:10: Error: selected processor does not support `ld1d z1.d,p0/z,[x0,x4,lsl 3]'
The verbose output contains:
/usr/lib/gcc-cross/aarch64-linux-gnu/7/../../../../aarch64-linux-gnu/bin/as -v -EL -mabi=lp64 -o main.o main.s
and main.s contains:
.arch armv8-a
.file "main.c"
.text
.align 2
.global main
.type main, %function
main:
#APP
// 2 "main.c" 1
ld1d z1.d, p0/z, [x0, x4, lsl 3]
// 0 "" 2
#NO_APP
mov w0, 0
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",#progbits
However, my assembler is new enough to assemble that, since I managed to get it to assemble by either:
replacing .arch armv8-a in the generated main.s file manually with .arch armv8-a+sve
removing .arch armv8-a in main.s and adding -march=all or -march=armv8-a+sve on the GAS CLI (TODO .arch all does not exist unfortunately, I wonder why?)
However, if I try to add -march=armv8-a+sve or -march=all to the GCC CLI, it complains respectively with:
cc1: error: invalid feature modifier in ‘-march=armv8-a+sve’
cc1: error: unknown value ‘all’ for -march
I also tried to add -Xassembler -march=all to the GCC CLI:
aarch64-linux-gnu-gcc -Xassembler -march=all -c -o main.o --save-temps -v main.c
and the option does get forwarded to the assembler according to -v, but it still fails, because the .arch armv8-a that GCC puts into the assembly file takes precedence.
Out of desperation, I finally tried to modify my C example to:
int main(void) {
__asm__ __volatile__ (".arch armv8-a+sve\nld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and that did work, but I think it is too insane to be usable. Notably, the .arch armv8-a+sve stays in effect even after my inline assembly statement, and I don't know how to revert to the previous default GCC .arch.
I also looked for GCC -masm options which might be useful like this one but could not find anything interesting.
Tested in Ubuntu 18.04, aarch64-linux-gnu-gcc 7.4.0, aarch64-linux-gnu-as 2.30.
My program works fine on Ubuntu.
It encounters error when I compile it with gcc on a Solaris SPARC system.
I have several pieces of code like:
printf("endian_convert: %s\n", endian_convert);
asm("movl $8, %esi\n\t"
"movl $.LC0, %edi\n\t"
"movl $0, %eax");
This is the error I get on SPARC:
gcc -g -Wall -Werror -pedantic -Wextra src/utfconverter.c -o bin/utf
/usr/ccs/bin/as: "/var/tmp//cc9czJEf.s", line 957: error: unknown "%"-symbol
/usr/ccs/bin/as: "/var/tmp//cc9czJEf.s", line 957: error: statement syntax
.......
/usr/ccs/bin/as: "/var/tmp//cc9czJEf.s", line 1058: error: unknown "%"-symbol
/usr/ccs/bin/as: "/var/tmp//cc9czJEf.s", line 1058: error: statement syntax
*** Error code 1 make: Fatal error: Command failed for target `utf'
So, the "%" symbol is considered as unknown on SPARC?
How can I fix this and make it working on SPARC?
(The original version of the question didn't mention that the errors were from a SPARC Solaris system, and just called it C90 because the old version of gcc installed on it defaulted to -std=c90, leading to error messages about things that are illegal in C90.)
Wait a minute, "works fine on Ubuntu but not on C90"? /usr/ccs/bin/as (in your screenshot) looks like Solaris. That + the hostname is a clue that this might be a SPARC machine, not x86 at all.
Obviously x86 assembly isn't valid SPARC assembly syntax. It's a different CPU architecture.
If you'd used gcc foo.c -S and looked at the resulting foo.s asm output file, you'd see that it was full of SPARC asm, except for text inserted literally by your asm statements.
SPARC syntax does use % decorators on register names, but the register names are different. e.g. add %i0, %i1, %o0 adds input registers i0 and i1, storing the result in output register o0. (Input as in function arg and output as in function result. SPARC uses a sliding window onto a large virtual register file that might or might not spill to memory, depending on whether the CPU microarchitecture is out of registers when the save instruction runs.)
Remember that these errors are from the Solaris assembler, not from gcc. You're using gcc but it's using the system assembler instead of the GNU assembler.
Anyway, I recommend rewriting your code into pure portable C, rather than using #ifdef __x86__ to keep using that inline asm, or writing a SPARC port of it.
BTW, your asm statement looks horrible. A different version of gcc might store a different constant at .LC0, breaking your code. More importantly, you're not using input/output constraints to tell the compiler what value is where. If you're assuming it's ok to set eax in asm inside a function, that's incorrect. The function can and will inline, and then your asm is just floating free in the middle of wherever your function inlined. See the end of this answer for links to some GNU C inline asm tutorials.
Also, you don't need inline asm to endian-convert. You will get better asm from using endian.h functions like uint32_t le32toh(uint32_t little_endian_32bits); which use gcc builtins or inline asm to get the compiler to make optimal assembly output itself.
See also https://gcc.gnu.org/wiki/DontUseInlineAsm, which applies even if you did know how to use it properly.
I have the following code which compiles fine with the gcc command gcc ./example.c. The program itself calls the function "add_two" which simply adds two integers. To use the intel syntax within the extended assembly instructions I need to switch at first to intel and than back to AT&T. According to the gcc documentation it is possible to switch to intel syntax entirely by using gcc -masm=intel ./exmaple.
Whenever I try to compile it with the switch -masm=intel it won't compile and I don't understand why? I already tried to delete the instruction .intel_syntax but it still don't compile.
#include <stdio.h>
int add_two(int, int);
int main(){
int src = 3;
int dst = 5;
printf("summe = %d \n", add_two(src, dst));
return 0;
}
int add_two(int src, int dst){
int sum;
asm (
".intel_syntax;" //switch to intel syntax
"mov %0, %1;"
"add %0, %2;"
".att_syntax;" //switch to at&t syntax
: "=r" (sum) //output
: "r" (src), "r" (dst) //input
);
return sum;
}
The error message by compiling the above mentioned program with gcc -masm=intel ./example.c is:
tmp/ccEQGI4U.s: Assembler messages:
/tmp/ccEQGI4U.s:55: Error: junk `PTR [rbp-4]' after expression
/tmp/ccEQGI4U.s:55: Error: too many memory references for `mov'
/tmp/ccEQGI4U.s:56: Error: too many memory references for `mov'
Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC and I think ICC, and with any constraints you use. Other methods don't. (See Can I use Intel syntax of x86 assembly with GCC? for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)
That also works in clang 14 and later. (Which isn't released yet but the patch is part of current trunk; see https://reviews.llvm.org/D113707).
Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. But even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect-alternatives like asm ("add {att | intel}" : ... )`!
clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction. e.g. Godbolt showing clang13 -masm=intel turning add %0, 1 as add dword ptr [1], eax, but clang trunk producing add eax, 1.
Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.
Clang does support Intel-syntax inside MSVC-style asm-blocks, but that's terrible (no constraints so inputs / outputs have to go through memory.
If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode so you can't let %0 expand to an AT&T-syntax register name.
-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel-syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's asm, hence the error messages like PTR [rbp-4] looking like junk to the assembler (which is expecting AT&T syntax).
The "too many operands for mov" is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx)
Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work but it's problematic. And incompatible with the preferred method of -masm=intel.
Problems with the "sandwich" method:
When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).
It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel-syntax).
Consider this function:
int foo(int x) {
asm(".intel_syntax noprefix \n\t"
"add %0, 1 \n\t"
".att_syntax"
: "+r"(x)
);
return x;
}
It assembles as follows with GCC (Godbolt):
movl %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax
The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.
On a Mac, even using real GCC the asm output has to assemble with an as that's based on clang, not GNU Binutils.
Using clang on that source code complains:
<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^
(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line the clang error message works better but the source is a mess.)
I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $ but IDK if GAS silently ignores that, too, in Intel syntax mode.
PS: your asm statement has a bug: you forgot an early-clobber on your output operand so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.
But using mov as the first or last instruction of an asm-template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r", and let the compiler worry about data movement.) If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.
void atomic_inc(int *p) {
asm( "lock add{l $1, %0 | %0, 1}"
: "+m" (*p)
:: "memory"
);
}
compiles with gcc -O2 (-masm=att is the default) to
atomic_inc(int*):
lock addl $1, (%rdi)
ret
Or with -masm=intel to:
atomic_inc(int*):
lock add DWORD PTR [rdi], 1
ret
Notice that the l suffix is required for AT&T, and the dword ptr is required for intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.
This works with clang, but only the AT&T version ever gets used.
Note that -masm= also affects the default inline assembler syntax:
Output assembly instructions using selected dialect. Also affects
which dialect is used for basic "asm" and extended "asm". Supported
choices (in dialect order) are att or intel. The default is att.
Darwin does not support intel.
That means that your first .intel_syntax directive is superfluous and the final .att_syntax is wrong because your GCC call compiles C to Intel assembler code.
IOW, either stick to -masm=intel or sandwich your inline Intel assembler code sections between .intel_syntax noprefix and .att_syntax prefix directives - but don't do both.
Note that the sandwich method isn't compatible with all inline assembler constraints - e.g. a constraint that involves m (i.e. memory operand) would insert an operand in ATT syntax which would yield an error like 'Error: junk (%rbp) after expression'. In those cases you have to use -masm=intel.
This question already has answers here:
How to generate a nasm compilable assembly code from c source code on Linux?
(3 answers)
Closed 2 years ago.
I am trying to learn assembly language as a hobby and I frequently use gcc -S to produce assembly output. This is pretty much straightforward, but I fail to compile the assembly output. I was just curious whether this can be done at all. I tried using both standard assembly output and intel syntax using the -masm=intel. Both can't be compiled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code, that can be then compiled.
To be more precise I used the following C code.
>> cat csimp.c
int main (void){
int i,j;
for(i=1;i<21;i++)
j= i + 100;
return 0;
}
Generated assembly with gcc -S -O0 -masm=intel csimp.c and tried to compile with nasm -f elf64 csimp.s and link with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to broken assembly syntax. My hope is that I would be able to fix this without having to manually correct the output of gcc -S
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below.
Therefore I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; **the ':' should be removed !!!**
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
All steps can be condensed in a bash script
#! /bin/bash
a=$( echo $1 | sed "s/\.c//" ) # strip the file extension .c
# compile binary with minimal information
gcc -fno-asynchronous-unwind-tables -s -c ${a}.c
# convert the executable to nasm format
./objconv/objconv -fnasm ${a}.o
# remove unnecesairy objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" ${a}.asm
# run nasm for 64-bit binary
nasm -f elf64 ${a}.asm
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s ${a}.o
Running this code I get the ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced in this manner crashes with segmentation fault message. I would appreciate your help.
The difficulty I think you hit with the entry point error was attempting to use ld on an object file containing the entry point named main while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library for the use of functions like printf, linking will expect main as the entry point, but if you are not linking with the C library, ld will expect _start. Your script is very close, but you will need some way to differentiate which entry point you need to fully automate the process for any source file.
For example, the following is a conversion using your approach of a source file including printf. It was converted to nasm using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert with objconv to nasm format assembly file
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings -- probably an option missed, I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if no standard library functions, and using ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this, this was just for example)
Compile with nasm (replacing original object file):
nasm -felf64 -o s3.obj s3.asm
Using gcc for link:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
You basically can't, at least directly. GCC does output assembly in Intel syntax; but NASM/MASM/TASM have their own Intel syntax. They are largely based on it, but there are as well some differences the assembler may not be able to understand and thus fail to compile.
The closest thing is probably having objdump show the assembly in Intel format:
objdump -d $file -M intel
Peter Cordes suggests in the comments that assembler directives will still target GAS, so they won't be recognized by NASM for example. They typically have the same name, but GAS-like directives start with a . as in .section text (vs section text).
There are many different assembly languages - for each CPU there's possibly multiple possible syntaxes (e.g. "Intel syntax", "AT&T syntax"), then completely different directives, pre-processor, etc on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.
GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means it can't work with NASM, FASM, MASM, TASM, A86/A386, etc. It only works for GAS (and possibly YASM in its "AT&T mode" maybe).
Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.
How can I instruct gcc to emit idiv (integer division, udiv and sdiv) instructions for arm application processors?
So far only way I can come up with is to use -mcpu=cortex-a15 with gcc 4.7.
$cat idiv.c
int test_idiv(int a, int b) {
return a / b;
}
On gcc 4.7 (bundled with Android NDK r8e)
$gcc -O2 -mcpu=cortex-a15 -c idiv.c
$objdump -S idiv.o
00000000 <test_idiv>:
0: e710f110 sdiv r0, r0, r1
4: e12fff1e bx lr
Even this one gives idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default] if you add -march=armv7-a next to -mcpu=cortex-a15 and doesn't emit idiv instruction.
$gcc -O2 -mcpu=cortex-a15 -march=armv7-a -c idiv.c
idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default]
$objdump -S idiv.o
00000000 <test_idiv>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <__aeabi_idiv>
8: e8bd8008 pop {r3, pc}
On gcc 4.6 (bundled with Android NDK r8e) it doesn't emit idiv instructions at all but recognizes -mcpu=cortex-a15 also doesn't complain to -mcpu=cortex-a15 -march=armv7-a combination.
Afaik idiv is optional on armv7, so there should be a cleaner way to instruct gcc to emit them but how?
If the instruction is not in the machine descriptions, then I doubt that gcc will emit code. Note1
You can always use inline-assembler to get the instruction if the compiler is not supporting it.Note2 Since your op-code is fairly rare/machine specific, there is probably not so much effort to get it in the gcc source. Especially, there are arch and tune/cpu flags. The tune/cpu is for a more specific machine, but the arch is suppose to allow all machines in that architecture. This op-code seems to break that rule, if I understand.
For gcc 4.6.2, it looks like thumb2 and cortex-r4 are cues to use these instructions and as you have noted with gcc 4.7.2, the cortex-a15 seems to be added to use these instructions. With gcc 4.7.2, the thumb2.md file no longer has udiv/sdiv. However, it might be included somewhere else; I am not 100% familiar with all the machine description language. It also seems that cortex-a7, cortex-a15, and cortex-r5 may enable these instructions with 4.7.2. Note3
This doesn't answer the question directly, but it does give some information/path to get the answer. You can compile the module with -mcpu=cortex-r4, although this may produce linker issues. Also, there is int my_idiv(int a, int b) __attribute__ ((__target__ ("arch=cortexe-r4")));, where you can specify on a per-function basis the machine-description used by the code generator. I haven't used any of these myself, but they are only possibilities to try. Generally you don't want to keep the wrong machine as it could generate sub-optimal (and possibly illegal) op-codes. You will have to experiment and maybe then provide the real answer.
Note1: This is for a stock gcc 4.6.2 and 4.7.2. I don't know if your Android compiler has patches.
gcc-4.6.2/gcc/config/arm$ grep [ius]div *.md
arm.md: "...,sdiv,udiv,other"
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average,
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md: (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md: (eq_attr "insn" "sdiv"))
thumb2.md: "sdiv%?\t%0, %1, %2"
thumb2.md: (set_attr "insn" "sdiv")]
thumb2.md:(define_insn "udivsi3"
thumb2.md: (udiv:SI (match_operand:SI 1 "s_register_operand" "r")
thumb2.md: "udiv%?\t%0, %1, %2"
thumb2.md: (set_attr "insn" "udiv")]
gcc-4.7.2/gcc/config/arm$ grep -i [ius]div *.md
arm.md: "...,sdiv,udiv,other"
arm.md: "TARGET_IDIV"
arm.md: "sdiv%?\t%0, %1, %2"
arm.md: (set_attr "insn" "sdiv")]
arm.md:(define_insn "udivsi3"
arm.md: (udiv:SI (match_operand:SI 1 "s_register_operand" "r")
arm.md: "TARGET_IDIV"
arm.md: "udiv%?\t%0, %1, %2"
arm.md: (set_attr "insn" "udiv")]
cortex-a15.md:(define_insn_reservation "cortex_a15_udiv" 9
cortex-a15.md: (eq_attr "insn" "udiv"))
cortex-a15.md:(define_insn_reservation "cortex_a15_sdiv" 10
cortex-a15.md: (eq_attr "insn" "sdiv"))
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average,
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md: (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md: (eq_attr "insn" "sdiv"))
Note2: See pre-processor as Assembler if gcc is passing options to gas that prevent use of the udiv/sdiv instructions. For example, you can use asm(" .long <opcode>\n"); where opcode is some token pasted stringified register encode macro output. Also, you can annotate your assembler to specify changes in the machine. So you can temporarily lie and say you have a cortex-r4, etc.
Note3:
gcc-4.7.2/gcc/config/arm$ grep -E 'TARGET_IDIV|arm_arch_arm_hwdiv|FL_ARM_DIV' *
arm.c:#define FL_ARM_DIV (1 << 23) /* Hardware divide (ARM mode). */
arm.c:int arm_arch_arm_hwdiv;
arm.c: arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
arm-cores.def:ARM_CORE("cortex-a7", cortexa7, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-a15", cortexa15, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-r5", cortexr5, 7R, ... FL_ARM_DIV
arm.h: if (TARGET_IDIV) \
arm.h:#define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
arm.h:extern int arm_arch_arm_hwdiv;
arm.md: "TARGET_IDIV"
arm.md: "TARGET_IDIV"