How can I instruct gcc to emit idiv (integer division, udiv and sdiv) instructions for arm application processors?
So far only way I can come up with is to use -mcpu=cortex-a15 with gcc 4.7.
$cat idiv.c
int test_idiv(int a, int b) {
return a / b;
}
On gcc 4.7 (bundled with Android NDK r8e)
$gcc -O2 -mcpu=cortex-a15 -c idiv.c
$objdump -S idiv.o
00000000 <test_idiv>:
0: e710f110 sdiv r0, r0, r1
4: e12fff1e bx lr
Even this one gives idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default] if you add -march=armv7-a next to -mcpu=cortex-a15 and doesn't emit idiv instruction.
$gcc -O2 -mcpu=cortex-a15 -march=armv7-a -c idiv.c
idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default]
$objdump -S idiv.o
00000000 <test_idiv>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <__aeabi_idiv>
8: e8bd8008 pop {r3, pc}
On gcc 4.6 (bundled with Android NDK r8e) it doesn't emit idiv instructions at all but recognizes -mcpu=cortex-a15 also doesn't complain to -mcpu=cortex-a15 -march=armv7-a combination.
Afaik idiv is optional on armv7, so there should be a cleaner way to instruct gcc to emit them but how?
If the instruction is not in the machine descriptions, then I doubt that gcc will emit code. Note1
You can always use inline-assembler to get the instruction if the compiler is not supporting it.Note2 Since your op-code is fairly rare/machine specific, there is probably not so much effort to get it in the gcc source. Especially, there are arch and tune/cpu flags. The tune/cpu is for a more specific machine, but the arch is suppose to allow all machines in that architecture. This op-code seems to break that rule, if I understand.
For gcc 4.6.2, it looks like thumb2 and cortex-r4 are cues to use these instructions and as you have noted with gcc 4.7.2, the cortex-a15 seems to be added to use these instructions. With gcc 4.7.2, the thumb2.md file no longer has udiv/sdiv. However, it might be included somewhere else; I am not 100% familiar with all the machine description language. It also seems that cortex-a7, cortex-a15, and cortex-r5 may enable these instructions with 4.7.2. Note3
This doesn't answer the question directly, but it does give some information/path to get the answer. You can compile the module with -mcpu=cortex-r4, although this may produce linker issues. Also, there is int my_idiv(int a, int b) __attribute__ ((__target__ ("arch=cortexe-r4")));, where you can specify on a per-function basis the machine-description used by the code generator. I haven't used any of these myself, but they are only possibilities to try. Generally you don't want to keep the wrong machine as it could generate sub-optimal (and possibly illegal) op-codes. You will have to experiment and maybe then provide the real answer.
Note1: This is for a stock gcc 4.6.2 and 4.7.2. I don't know if your Android compiler has patches.
gcc-4.6.2/gcc/config/arm$ grep [ius]div *.md
arm.md: "...,sdiv,udiv,other"
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average,
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md: (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md: (eq_attr "insn" "sdiv"))
thumb2.md: "sdiv%?\t%0, %1, %2"
thumb2.md: (set_attr "insn" "sdiv")]
thumb2.md:(define_insn "udivsi3"
thumb2.md: (udiv:SI (match_operand:SI 1 "s_register_operand" "r")
thumb2.md: "udiv%?\t%0, %1, %2"
thumb2.md: (set_attr "insn" "udiv")]
gcc-4.7.2/gcc/config/arm$ grep -i [ius]div *.md
arm.md: "...,sdiv,udiv,other"
arm.md: "TARGET_IDIV"
arm.md: "sdiv%?\t%0, %1, %2"
arm.md: (set_attr "insn" "sdiv")]
arm.md:(define_insn "udivsi3"
arm.md: (udiv:SI (match_operand:SI 1 "s_register_operand" "r")
arm.md: "TARGET_IDIV"
arm.md: "udiv%?\t%0, %1, %2"
arm.md: (set_attr "insn" "udiv")]
cortex-a15.md:(define_insn_reservation "cortex_a15_udiv" 9
cortex-a15.md: (eq_attr "insn" "udiv"))
cortex-a15.md:(define_insn_reservation "cortex_a15_sdiv" 10
cortex-a15.md: (eq_attr "insn" "sdiv"))
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average,
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md: (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md: (eq_attr "insn" "sdiv"))
Note2: See pre-processor as Assembler if gcc is passing options to gas that prevent use of the udiv/sdiv instructions. For example, you can use asm(" .long <opcode>\n"); where opcode is some token pasted stringified register encode macro output. Also, you can annotate your assembler to specify changes in the machine. So you can temporarily lie and say you have a cortex-r4, etc.
Note3:
gcc-4.7.2/gcc/config/arm$ grep -E 'TARGET_IDIV|arm_arch_arm_hwdiv|FL_ARM_DIV' *
arm.c:#define FL_ARM_DIV (1 << 23) /* Hardware divide (ARM mode). */
arm.c:int arm_arch_arm_hwdiv;
arm.c: arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
arm-cores.def:ARM_CORE("cortex-a7", cortexa7, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-a15", cortexa15, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-r5", cortexr5, 7R, ... FL_ARM_DIV
arm.h: if (TARGET_IDIV) \
arm.h:#define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
arm.h:extern int arm_arch_arm_hwdiv;
arm.md: "TARGET_IDIV"
arm.md: "TARGET_IDIV"
Related
I have this simple C code:
#include "uart.h"
#include <string.h>
char x[32];
__attribute__((noinline))
void foo(void)
{
strcpy(x, "xxxxxxxxxxxxxxxxxxxxxxxx");
}
int main(void)
{
uart_puts("xxx\n");
foo();
uart_puts("yyy\n");
}
compiled as:
$ aarch64-none-elf-gcc t78.c -mcpu=cortex-a57 -Wall -Wextra -g -O2 -c -std=c11 \
&& aarch64-none-elf-ld -T linker.ld t78.o boot.o uart.o -o kernel.elf
and executed as:
$ qemu-system-aarch64.exe -machine virt -cpu cortex-a57 -nographic -kernel kernel.elf
prints:
xxx
Why yyy is not printed?
By reducing the issue I've found that:
for strcpy GCC generated a code other than "call strcpy" (see below)
ldr q1, [x0] causes yyy to not be printed.
Here is the generated code of foo:
foo:
.LFB0:
.file 1 "t78.c"
.loc 1 6 1 view -0
.cfi_startproc
.loc 1 7 5 view .LVU1
adrp x0, .LC0
add x0, x0, :lo12:.LC0
adrp x1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
ldr q1, [x0] <<== root cause
ldr q0, [x0, 9]
str q1, [x1, #:lo12:.LANCHOR0]
str q0, [x2, 9]
.loc 1 8 1 is_stmt 0 view .LVU2
ret
If I put ret before ldr q1, [x0] the yyy is printed (ax expected).
The question: why ldr q1, [x0] causes yyy to not be printed?
Tool versions:
$ aarch64-none-elf-gcc --version
aarch64-none-elf-gcc.exe (Arm GNU Toolchain 12.2.Rel1 (Build arm-12.24)) 12.2.1 20221205
$ qemu-system-aarch64 --version
QEMU emulator version 7.2.0 (v7.2.0-11948-ge6523b71fc-dirty)
The ldr q1, [x0] instruction is taking an exception because it accesses a floating-point/SIMD register but your startup code does not enable the FPU. The compiler is assuming that it can generate code that uses the FPU, so to meet that assumption one of the things your startup code must do is enable the FPU, via at least CPACR_EL1, and possibly other registers if EL2 or EL3 are enabled.
Alternatively, you could tell the compiler not to emit code that uses the FPU. The Linux kernel takes this approach, using the -mgeneral-regs-only option.
Real hardware probably has more strict requirements for what you need to do to configure the CPU to be able to run C code; QEMU is quite lenient. For instance the architecture defines that the reset value of many system registers is UNKNOWN, though QEMU usually resets them to zero. A robust startup sequence will explicitly set bits in registers like SCTLR_EL1.
You may also need to watch out for whether your compiler and your startup code agree about whether the compiler generated code is allowed to emit unaligned accesses -- if the MMU is not enabled then all memory accesses are treated as of type Device, which means they must be aligned (regardless of SCTLR_EL1.A). So you either need to make sure your compiler doesn't try to emit unaligned loads and stores, or else turn on the MMU and set SCTLR_EL1.A to 0.
You could improve your ability to debug this sort of "exception in early bootup" by installing some exception vectors which do something helpful when an unexpected exception occurs. The ideal is to be able to print registers, especially ELR_EL1 and ESR_EL1, which tell you where and why the exception occurred; printing in early bootup can be tricky, though. An easy compromise is to at least catch the exception and loop; you can then use gdb to see what the CPU state is.
An addition to answer by Peter Maydell.
Here is the code that enables FPU (found here):
mrs x1, cpacr_el1
mov x0, #(3 << 20)
orr x0, x1, x0
msr cpacr_el1, x0
To test the performance feature of an assembly language feature, I would like to use it inside a C program with inline assembly to replace existing functionality.
My GNU GAS assembler already recognizes the instruction with an appropriate -march.
How to make GCC generate code that accepts such inline assembly instructions that is cannot yet emit and does not recognize?
For example, the ARM SVE extensions were more or less recently added to the ARM architecture.
So in a minimal useless inline SVE assembly example I try:
main.c
int main(void) {
__asm__ __volatile__ ("ld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and I try to compile with:
aarch64-linux-gnu-gcc -c -o main.o --save-temps -v main.c
but assembly fails with:
main.s: Assembler messages:
main.s:10: Error: selected processor does not support `ld1d z1.d,p0/z,[x0,x4,lsl 3]'
The verbose output contains:
/usr/lib/gcc-cross/aarch64-linux-gnu/7/../../../../aarch64-linux-gnu/bin/as -v -EL -mabi=lp64 -o main.o main.s
and main.s contains:
.arch armv8-a
.file "main.c"
.text
.align 2
.global main
.type main, %function
main:
#APP
// 2 "main.c" 1
ld1d z1.d, p0/z, [x0, x4, lsl 3]
// 0 "" 2
#NO_APP
mov w0, 0
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",#progbits
However, my assembler is new enough to assemble that, since I managed to get it to assemble by either:
replacing .arch armv8-a in the generated main.s file manually with .arch armv8-a+sve
removing .arch armv8-a in main.s and adding -march=all or -march=armv8-a+sve on the GAS CLI (TODO .arch all does not exist unfortunately, I wonder why?)
However, if I try to add -march=armv8-a+sve or -march=all to the GCC CLI, it complains respectively with:
cc1: error: invalid feature modifier in ‘-march=armv8-a+sve’
cc1: error: unknown value ‘all’ for -march
I also tried to add -Xassembler -march=all to the GCC CLI:
aarch64-linux-gnu-gcc -Xassembler -march=all -c -o main.o --save-temps -v main.c
and the option does get forwarded to the assembler according to -v, but it still fails, because the .arch armv8-a that GCC puts into the assembly file takes precedence.
Out of desperation, I finally tried to modify my C example to:
int main(void) {
__asm__ __volatile__ (".arch armv8-a+sve\nld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and that did work, but I think it is too insane to be usable. Notably, the .arch armv8-a+sve stays in effect even after my inline assembly statement, and I don't know how to revert to the previous default GCC .arch.
I also looked for GCC -masm options which might be useful like this one but could not find anything interesting.
Tested in Ubuntu 18.04, aarch64-linux-gnu-gcc 7.4.0, aarch64-linux-gnu-as 2.30.
I am using as and gcc to assemble and create executables of ARM assembly programs, as recommended by this tutorial, as follows:
Given an assembly source file, program.s, I run:
as -o program.o program.s
Then:
gcc -o program program.o
However, running gcc on the assembly source directly like so:
gcc -o program program.s
yields the same result.
Does gcc call as behind the scenes? Is there any reason at all to use both as and gcc given that gcc alone is able to produce an executable from source?
I am running this on a Raspberry Pi 3, Raspbian Jessie (a Debian derivative), gcc 4.9.2, as 2.25.
GCC calls all kinds of things behind the scenes; not just as but ld as well. This is pretty easy to instrument if you want to prove it (replace the CORRECT as and ld and other binaries with ones that say print out their command line, then run GCC and see that binary gets called).
When you use GCC as an assembler, it goes through a C preprocessor, so you can do some fairly disgusting things like this:
start.s
//this is a comment
#this is a comment
#define FOO BAR
.globl _start
_start:
mov sp,#0x80000
bl hello
b .
.globl world
world:
bx lr
And to see more of what is going on, here are other files:
so.h
unsigned int world ( unsigned int, unsigned int );
#define FIVE 5
#define SIX 6
so.c
#include "so.h"
unsigned int hello ( void )
{
unsigned int a,b,c;
a=FIVE;
b=SIX;
c=world(a,b);
return(c+1);
}
build
arm-none-eabi-gcc -save-temps -nostdlib -nostartfiles -ffreestanding -O2 start.s so.c -o so.elf
arm-none-eabi-objdump -D so.elf
producing
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
being a very simple project. Here is so.i after the pre-processor, which goes and gets the include files and replaces the defines:
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
# 1 "so.h" 1
unsigned int world ( unsigned int, unsigned int );
# 4 "so.c" 2
unsigned int hello ( void )
{
unsigned int a,b,c;
a=5;
b=6;
c=world(a,b);
return(c+1);
}
Then GCC calls the actual compiler (whose program name is not GCC).
That produces so.s:
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global hello
.syntax unified
.arm
.fpu softvfp
.type hello, %function
hello:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
mov r1, #6
mov r0, #5
bl world
pop {r4, lr}
add r0, r0, #1
bx lr
.size hello, .-hello
.ident "GCC: (GNU) 6.3.0"
Which is then fed to the assembler to make so.o. Then the linker is called to turn these into so.elf.
Now, you can do most of the calls directly. That doesn't mean that these programs have other programs they call. GCC still calls one or more programs to actually do the compile.
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -S so.c
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Giving the same result:
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
Using -S with GCC does feel a bit wrong. Using it like this instead feels more natural:
arm-none-eabi-gcc -O2 -c so.c -o so.o
Now there is a linker script that we didn't provide which the toolchain has a default for. We can control that, and depending on what this is being aimed at, perhaps we should.
I am not happy to see that the new/current version of as is tolerant of C comments, etc... Didn't used to be that way, must be a new thing with the latest release.
Thus the term "toolchain" it is a number of tools chained together, one linked to the next in order.
Not all compilers take the assembly language step. Some compile to intermediate code, and then there is another tool that turns that compiler specific intermediate code into assembly language. Then some assembler is called (GCC's intermediate code is inside tables in the compile step, where clang/llvm you can ask it to compile to this code then go from there to assembly language for one of the targets).
Some compilers go straight to machine code and don't stop at assembly language. This is likely one of those "climb the mountain just because it is there" things vs "go around". Like writing an operating system purely in assembly language.
For any decent sized project and a tool that can support it, you are going to have a linker, and an assembler the first tool you make to support a new to you target. The processor (chip or ip or both) vendor is going to have an assembler and then other tools available as well.
Try compiling even the above simple C program by hand using assembly language. Then try it again without using assembly language, by hand, using just machine code. You will find that using assembly language as an intermediate step is far more sane for compiler developers, along with the fact that it has been done this way forever, which is also a good reason to keep doing it this way.
If you wander about in the gnu toolchain directory you are using, you may find programs like cc1:
./libexec/gcc/arm-none-eabi/6.3.0/cc1 --help
The following options are specific to just the language Ada:
None found. Use --help=Ada to show *all* the options supported by the Ada front-end.
The following options are specific to just the language AdaSCIL:
None found. Use --help=AdaSCIL to show *all* the options supported by the AdaSCIL front-end.
The following options are specific to just the language AdaWhy:
None found. Use --help=AdaWhy to show *all* the options supported by the AdaWhy front-end.
The following options are specific to just the language C:
None found. Use --help=C to show *all* the options supported by the C front-end.
The following options are specific to just the language C++:
-Wplacement-new
-Wplacement-new= 0xffffffff
The following options are specific to just the language Fortran:
Now if you run that cc1 program against the so.i file you saved with -save-temps, you get so.s the assembly language file.
You could probably continue to dig into the directory or the gnu tools sources to find even more goodies.
Note this question has been asked before many times here at Stack Overflow in various ways.
Also note main() isn't anything special as I have demonstrated. In some compilers it might be, but I can make programs that don't require that function name.
This question already has answers here:
How to generate a nasm compilable assembly code from c source code on Linux?
(3 answers)
Closed 2 years ago.
I am trying to learn assembly language as a hobby and I frequently use gcc -S to produce assembly output. This is pretty much straightforward, but I fail to compile the assembly output. I was just curious whether this can be done at all. I tried using both standard assembly output and intel syntax using the -masm=intel. Both can't be compiled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code, that can be then compiled.
To be more precise I used the following C code.
>> cat csimp.c
int main (void){
int i,j;
for(i=1;i<21;i++)
j= i + 100;
return 0;
}
Generated assembly with gcc -S -O0 -masm=intel csimp.c and tried to compile with nasm -f elf64 csimp.s and link with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to broken assembly syntax. My hope is that I would be able to fix this without having to manually correct the output of gcc -S
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below.
Therefore I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; **the ':' should be removed !!!**
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
All steps can be condensed in a bash script
#! /bin/bash
a=$( echo $1 | sed "s/\.c//" ) # strip the file extension .c
# compile binary with minimal information
gcc -fno-asynchronous-unwind-tables -s -c ${a}.c
# convert the executable to nasm format
./objconv/objconv -fnasm ${a}.o
# remove unnecesairy objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" ${a}.asm
# run nasm for 64-bit binary
nasm -f elf64 ${a}.asm
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s ${a}.o
Running this code I get the ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced in this manner crashes with segmentation fault message. I would appreciate your help.
The difficulty I think you hit with the entry point error was attempting to use ld on an object file containing the entry point named main while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library for the use of functions like printf, linking will expect main as the entry point, but if you are not linking with the C library, ld will expect _start. Your script is very close, but you will need some way to differentiate which entry point you need to fully automate the process for any source file.
For example, the following is a conversion using your approach of a source file including printf. It was converted to nasm using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert with objconv to nasm format assembly file
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings -- probably an option missed, I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if no standard library functions, and using ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this, this was just for example)
Compile with nasm (replacing original object file):
nasm -felf64 -o s3.obj s3.asm
Using gcc for link:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
You basically can't, at least directly. GCC does output assembly in Intel syntax; but NASM/MASM/TASM have their own Intel syntax. They are largely based on it, but there are as well some differences the assembler may not be able to understand and thus fail to compile.
The closest thing is probably having objdump show the assembly in Intel format:
objdump -d $file -M intel
Peter Cordes suggests in the comments that assembler directives will still target GAS, so they won't be recognized by NASM for example. They typically have the same name, but GAS-like directives start with a . as in .section text (vs section text).
There are many different assembly languages - for each CPU there's possibly multiple possible syntaxes (e.g. "Intel syntax", "AT&T syntax"), then completely different directives, pre-processor, etc on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.
GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means it can't work with NASM, FASM, MASM, TASM, A86/A386, etc. It only works for GAS (and possibly YASM in its "AT&T mode" maybe).
Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.
I'm trying to compile a program for Raspberry Pi 2B (ARMv7 / Neon), but I get an error from an inline assembly code:
Error: VFP single precision register expected -- `vstmia.64
r9,{d16-d31}'
The code is:
asm volatile (
"vstmia.64 %[reg]!, {d0 - d15} # read all regs\n\t"
"vstmia.64 %[reg], {d16 - d31} # read all regs\n\t"
::[reg] "r" (&vregs):
);
Funny thing is that it doesn't complain about the first vstmia.
I tried with single {d0 - d32} first and I thought maybe there were too many 64-bit registers, but that's obviously not the problem.
vregs is a 8-byte aligned storage.
I'm using arm-linux-gnueabihf-gcc 4.8.3, with this command line:
arm-linux-gnueabihf-gcc -mcpu=cortex-a7 -marm -O2 -g -std=gnu11 -MMD -MP -MF"ARM_decode_table.d" -MT"ARM_decode_table.o" -c -o "ARM_decode_table.o" "../ARM_decode_table.c"
By not specifying an appropriate -mfpu option, you get whatever FPU support the compiler's default configuration provides. From your configuration in this case, that is --with-fpu=vfp, which means crusty old VFPv2 with only 16 D registers overlaying the 32 S registers. Thus the first instruction targeting d0-d15 is fine, but the assembler refuses to assemble the second instruction which it knows won't work on the chosen target.
For Cortex-A7 with NEON, -mfpu=neon-vfpv4 will let the toolchain know that it can let rip and use everything you have available.