The following code worked just fine in gcc6.2:
#include <stdio.h>
int main(){
long x, y = 0;
__asm__ (
".data\n\t"
"var_0: .byte 2\n\t"
".text\n\t"
"xor %%eax, %%eax\n\t"
"leaq var_0(%%rax), %%rbx\n\t"
"movb var_0(%%rax), %%al\n\t"
"movq %%rax, %0\n\t"
"movq %%rbx, %1\n\t"
:"=m" (x), "=m" (y)
:
:"%rax", "%rbx"
);
printf("%lx %lx\n", x, y);
}
Sample output of this is 2 601031. This syntax suggests to me that gcc (or gas I suppose) uses instruction relative addressing. If I replace the above lines with
"leaq var_0(%%rip), %%rbx\n\t"
"movb var_0(%%rip), %%al\n\t"
then the output is again 2 601031 (EDIT: previous statement was incorrect as I had the output of slightly different code). I don't know where this is documented but it's great. However, when I upgraded to gcc6.3, the RIP relative addressing version still works fine but the instruction relative version produces the error
/usr/bin/ld: /tmp/cc2E9Upc.o: relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
even if I recompile with gcc -fPIC. Could someone tell me what's going on between the two versions of gcc? Are there any changes I need to be aware of? Flags that I need to be using?
EDIT 2: Playing around with objdump has made it clear to me that the title is nonsense. Gcc6.2 simply uses the translated address of var_0 in the executable. I would like to know why this doesn't work with gcc6.3, and I apologize for the hastily written title.
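A likely explanation, offered as an assumption because the post does not show how the compiler was configured: the gcc 6.3 build links position-independent executables by default, so the absolute reference var_0(%rax) needs an R_X86_64_32S relocation that such a link rejects, while var_0(%rip) does not. A minimal sketch that avoids hard-coding either form is to pass the variable as an "m" operand and let the compiler print the addressing mode. The names read_var, value and address are hypothetical, the template assumes the default AT&T dialect, and the non-x86-64 branch is only a plain C fallback.

```c
#include <stdint.h>

/* The byte that the question defined with ".data / var_0: .byte 2". */
static unsigned char var_0 = 2;

/* Hypothetical helper: reads var_0 and its address through inline asm. */
void read_var(long *value, long *address)
{
#if defined(__x86_64__)
    uint64_t v, a;
    __asm__("movzbl %2, %k0\n\t"   /* zero-extending byte load into v */
            "leaq %2, %1"          /* address of var_0 into a */
            : "=&r"(v), "=r"(a)    /* early-clobber: v is written before %2 is reused */
            : "m"(var_0));         /* compiler chooses absolute or RIP-relative */
    *value = (long)v;
    *address = (long)a;
#else
    /* Fallback so the sketch still builds on other targets. */
    *value = var_0;
    *address = (long)(intptr_t)&var_0;
#endif
}
```

Called as read_var(&x, &y), this should leave 2 in x and the run-time address of var_0 in y, with no .data/.text switching inside the asm statement.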
I'm trying to port some Linux C code to an Apple M1 Mac and have been encountering an issue with some inline assembly. And it has me stumped.
I have the following inline assembly block:
#define TEST asm volatile(\
"adr x0, label9 \n"\
: : : "x0");
And have encountered the following error:
test.c:73:5: error: unknown AArch64 fixup kind!
TEST
^
./ARM_branch_macros.h:2862:7: note: expanded from macro 'TEST'
"adr x0, label9 \n"\
^
<inline asm>:1:2: note: instantiated into assembly here
adr x0, label9
^
1 error generated.
make: *** [indirect_branch_latency.o] Error 1
I am using the following compiler:
Apple clang version 12.0.0 (clang-1200.0.32.27)
Target: arm64-apple-darwin20.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
With the command line:
clang -c -o test.o test.c -I. -w -g -lrt -O0 -static -DARM_ASSEMBLY
Any help would be greatly appreciated!
The ADR instruction encodes the offset from the current PC value to the label you reference, and at run time writes the resulting address into the destination register.
When you have an instruction that references a symbol in a different object file (or in a different section in the same object file), the assembler can't encode the exact offset directly as it doesn't know how the linker will lay them out, but has to leave a relocation in the object file, instructing the linker to fix up the instruction once the exact location of the symbol is known.
I think the issue here is simply that the MachO object file format (which is used on Apple platforms) doesn't have a relocation type for fixing up an ADR instruction pointing at a symbol elsewhere. And even if it had one, the construct is pretty brittle: the symbol that it points at has to be within +/- 1 MB of the instruction referencing it, and that limit is pretty easy to hit.
To get access to a bigger range, an ADRP+ADD instruction pair is often used, which gives you a +/- 4 GB range, and the MachO format does support those.
The assembler syntax for them differs a bit between MachO and ELF (and COFF). For MachO, the syntax looks like this:
adrp x0, symbol@PAGE
add x0, x0, symbol@PAGEOFF
Or if you want to load from it at the same time:
adrp x0, symbol@PAGE
ldr x1, [x0, symbol@PAGEOFF]
On ELF (the object file format used on Linux) and COFF (Windows, when assembling GNU style assembly with LLVM) platforms, the syntax looks like this:
adrp x0, symbol
add x0, x0, :lo12:symbol
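As a sketch of how the two spellings might be selected from C inline assembly (my own illustration, not taken from the question): demo_symbol and address_of_demo_symbol are hypothetical names, the leading underscore on the MachO side assumes the usual Darwin mangling of C symbols, and the last branch is a plain C fallback for every other target.

```c
/* Hypothetical global whose address we want in a register. */
int demo_symbol = 7;

int *address_of_demo_symbol(void)
{
    int *p;
#if defined(__aarch64__) && defined(__APPLE__)
    /* MachO spelling: page address, then offset within the page. */
    __asm__("adrp %0, _demo_symbol@PAGE\n\t"
            "add  %0, %0, _demo_symbol@PAGEOFF"
            : "=r"(p));
#elif defined(__aarch64__) && defined(__ELF__)
    /* ELF spelling of the same ADRP+ADD pair. */
    __asm__("adrp %0, demo_symbol\n\t"
            "add  %0, %0, :lo12:demo_symbol"
            : "=r"(p));
#else
    /* Not AArch64: let the compiler take the address. */
    p = &demo_symbol;
#endif
    return p;
}
```

Using an output operand instead of a hard-coded x0 leaves register allocation to the compiler, so no clobber list is needed.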
I'm on a Cortex-M0+ device (Thumb only) and I'm trying to dynamically generate some code in ram and then jump to it, like so:
uint16_t code_buf[18];
...
void jump() {
register volatile uint32_t* PASET asm("r0") = &(PA->OUTSET.reg);
register volatile uint32_t* PACLR asm("r1") = &(PA->OUTCLR.reg);
register uint32_t set asm("r2") = startset;
register uint32_t cl0 asm("r3") = clears[0];
register uint32_t cl1 asm("r4") = clears[1];
register uint32_t cl2 asm("r5") = clears[2];
register uint32_t cl3 asm("r6") = clears[3];
register uint32_t dl0 asm("r8") = delays[0];
register uint32_t dl1 asm("r9") = delays[1];
register uint32_t dl2 asm("r10") = delays[2];
register uint32_t dl3 asm("r11") = delays[3];
asm volatile (
"bl code_buf\n"
: [set]"+r" (set) : [PASET]"r" (PASET), [PACLR]"r" (PACLR), [cl0]"r" (cl0), [cl1]"r" (cl1), [cl2]"r" (cl2), [cl3]"r" (cl3), [dl0]"r" (dl0), [dl1]"r" (dl1), [dl2]"r" (dl2), [dl3]"r" (dl3) : "lr"
);
}
The code in code_buf will use the arguments passed via registers (that's why I'm forcing specific registers).
This code compiles fine, but when I look at the disassembly the branch instruction has been changed to
a14: f004 ebb0 blx 0x5178
Which would try to switch the cpu to ARM mode and cause a HardFault. Is there a way to force the assembler to keep the branch as a simple bl?
So it turns out that the toolchain I was using (gcc 4.8) is buggy, and makes two errors: it interprets code_buf as an ARM address, and it produces a bogus blx label, which isn't even legal on a Cortex-M0+. I updated it to 6.3.1 and the inline asm was converted to a bl label as it was supposed to.
From section 4.1.1 of the ARMv6-M Architecture Reference Manual:
Thumb interworking is held as bit [0] of an interworking address.
Interworking addresses are used in the following instructions: BX,
BLX, or POP that loads the PC.
ARMv6-M only supports the Thumb
instruction Execution state, therefore the value of address bit [0]
must be 1 in interworking instructions, otherwise a fault occurs. All
instructions ignore bit [0] and write bits [31:1]:’0’ when updating
the PC.
The target of your branch, code_buf, is a uint16_t array, so it will be at least halfword-aligned (and possibly word-aligned), which means bit 0 will be clear in its address. The key is to ensure that bit 0 is set before you branch, and then even if the toolchain selects an interworking instruction you'll remain in thumb mode.
I don't have a development environment in front of me to test this, but I would suggest casting to a pointer-to-single-byte type and using pointer arithmetic to set bit 0:
uint8_t *thumb_target = ((uint8_t *)code_buf) + 1;
asm volatile (
"bl thumb_target\n"
: [set]"+r" (set) : [PASET]"r" (PASET), [PACLR]"r" (PACLR), [cl0]"r" (cl0), [cl1]"r" (cl1), [cl2]"r" (cl2), [cl3]"r" (cl3), [dl0]"r" (dl0), [dl1]"r" (dl1), [dl2]"r" (dl2), [dl3]"r" (dl3) : "lr"
);
Edit: The above doesn't work, as Peter Cordes points out, because a local variable can't be used in inline ASM in this context. Not being well-versed in gcc's inline ASM, I won't attempt to fix it.
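One way around the local-variable limitation, sketched under assumptions and not tested on hardware: compute the address with bit 0 set in C and hand it to the asm statement as a register operand, so the branch becomes the register form of BLX. thumb_entry and call_generated are hypothetical names, and the clobber list is a placeholder that would have to match whatever the generated code actually modifies.

```c
#include <stdint.h>

uint16_t code_buf[18];

/* Hypothetical helper: return the address of `code` with bit 0 set,
   i.e. an interworking address that selects Thumb state. */
uintptr_t thumb_entry(const void *code)
{
    return (uintptr_t)code | 1u;
}

#if defined(__thumb__)
/* Only meaningful when compiling for a Thumb target. */
void call_generated(void)
{
    uintptr_t target = thumb_entry(code_buf);
    __asm__ volatile("blx %[target]"          /* register form of BLX */
                     :
                     : [target] "r"(target)
                     : "lr", "cc", "memory"); /* placeholder clobbers */
}
#endif
```

In the original function most low registers are already pinned to arguments, so the compiler would have to find a free register for target; that is worth checking in the disassembly.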
I have now had a chance to test the supplied code though, and gcc 7.2.1 with -S -mtune=cortex-m0plus -fomit-frame-pointer generates a BL not a BLX.
Edit 2: The documentation (section A6.7.14) suggests that only the register-target version of BLX is present in the ARMv6-M architecture (this is in common with the ARMv7 devices I'm most familiar with) and so it looks to me as if the fault is caused not by an attempt to switch to ARM mode but by an illegal instruction. Is your compiler correctly configured?
IDK why your assembler would be changing bl into blx. Mine doesn't, using arm-none-eabi-gcc 7.3.0 on Arch Linux. arm-none-eabi-as --version shows Binutils 2.30.
unsigned short code_buf[18];
void jump() {
asm("bl code_buf");
asm("blx code_buf"); // still assembles to BL, not BLX
// asm("blx jump");
// asm("bl jump");
}
compiled with arm-none-eabi-gcc -O2 -nostdlib arm-bl.c -mcpu=cortex-m0plus -mthumb (I made a linked executable with -nostdlib so I could see actual branch displacements, not placeholders).
Disassembling with arm-none-eabi-objdump -d a.out shows
00008000 <jump>:
8000: f010 f804 bl 1800c <__data_start>
8004: f010 f802 bl 1800c <__data_start>
8008: 4770 bx lr
800a: 46c0 nop ; (mov r8, r8)
Your f004 ebb0 may be a Thumb2 encoding for BLX. I don't know why you're getting it.
The Thumb encoding for bl is documented in section 5.19 of this ARM7TDMI ISA manual ("long branch with link"), but that manual doesn't mention a Thumb encoding for blx at all (because it's only Thumb, not Thumb 2). The Thumb bl encoding stores the branch displacement right-shifted by 1 (i.e. without the low bit), and always stays in Thumb mode.
It's actually two separate instructions: one which adds the high part of the displacement (an 11-bit field, shifted left by 12) to the PC and puts the result into LR, and another which branches and updates LR to the return address. (This 2-instruction hack allows Thumb1 to work without Thumb2 32-bit instructions). Both instructions start with f, so your disassembly shows that you got something else; the first 16-bit chunk of f004 ebb0 is the LR setup, but ebb0 doesn't match any Thumb 1 instruction.
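To make the two-halfword layout concrete, here is a small decoder sketch for the classic Thumb BL pair. It assumes each halfword carries an 11-bit offset field below a 5-bit prefix (11110 for the first half, 11111 for the second), as in the ARM7TDMI manual's description; the function names are hypothetical.

```c
#include <stdint.h>

/* Nonzero if the two halfwords look like a Thumb-1 BL pair. */
int is_thumb_bl_pair(uint16_t first, uint16_t second)
{
    return (first & 0xF800u) == 0xF000u &&   /* 11110: LR setup half */
           (second & 0xF800u) == 0xF800u;    /* 11111: branch half   */
}

/* Branch target of a BL pair whose first halfword is at insn_addr. */
uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t first, uint16_t second)
{
    int32_t high = first & 0x7FF;            /* displacement bits 22:12 */
    if (high & 0x400)
        high -= 0x800;                       /* sign-extend the 11-bit field */
    uint32_t low = second & 0x7FFu;          /* displacement bits 11:1 */
    uint32_t pc = insn_addr + 4;             /* Thumb PC reads as insn + 4 */
    return pc + (uint32_t)(high * 4096) + (low << 1);
}
```

With the disassembly above, thumb_bl_target(0x8000, 0xf010, 0xf804) and thumb_bl_target(0x8004, 0xf010, 0xf802) both give 0x1800c, while is_thumb_bl_pair(0xf004, 0xebb0) is 0 because ebb0 lacks the 11111 prefix.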
Possibly asm("bl code_buf+1" : ...); or blx code_buf+1 could work, if the +1 convinces the assembler to treat it as a Thumb target. But you might need to use asm to get a .thumb_func directive applied to code_buf somehow to keep your assembler happy.
I have the following code, which compiles fine with the gcc command gcc ./example.c. The program itself calls the function "add_two", which simply adds two integers. To use the Intel syntax within the extended assembly instructions I need to first switch to Intel and then back to AT&T. According to the gcc documentation it is possible to switch to Intel syntax entirely by using gcc -masm=intel ./example.c.
Whenever I try to compile it with the switch -masm=intel it won't compile, and I don't understand why. I already tried deleting the .intel_syntax directive, but it still doesn't compile.
#include <stdio.h>
int add_two(int, int);
int main(){
int src = 3;
int dst = 5;
printf("summe = %d \n", add_two(src, dst));
return 0;
}
int add_two(int src, int dst){
int sum;
asm (
".intel_syntax;" //switch to intel syntax
"mov %0, %1;"
"add %0, %2;"
".att_syntax;" //switch to at&t syntax
: "=r" (sum) //output
: "r" (src), "r" (dst) //input
);
return sum;
}
The error message from compiling the above program with gcc -masm=intel ./example.c is:
/tmp/ccEQGI4U.s: Assembler messages:
/tmp/ccEQGI4U.s:55: Error: junk `PTR [rbp-4]' after expression
/tmp/ccEQGI4U.s:55: Error: too many memory references for `mov'
/tmp/ccEQGI4U.s:56: Error: too many memory references for `mov'
Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC and I think ICC, and with any constraints you use. Other methods don't. (See Can I use Intel syntax of x86 assembly with GCC? for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)
That also works in clang 14 and later. (Which isn't released yet but the patch is part of current trunk; see https://reviews.llvm.org/D113707).
Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. But even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect-alternatives like asm ("add {att | intel}" : ... )!
clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction. e.g. Godbolt showing clang13 -masm=intel turning add %0, 1 into add dword ptr [1], eax, but clang trunk producing add eax, 1.
Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.
Clang does support Intel-syntax inside MSVC-style asm-blocks, but that's terrible (no constraints so inputs / outputs have to go through memory).
If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode so you can't let %0 expand to an AT&T-syntax register name.
-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel-syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's asm, hence the error messages like PTR [rbp-4] looking like junk to the assembler (which is expecting AT&T syntax).
The "too many memory references for mov" error is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx).
Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work but it's problematic. And incompatible with the preferred method of -masm=intel.
Problems with the "sandwich" method:
When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).
It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel-syntax).
Consider this function:
int foo(int x) {
asm(".intel_syntax noprefix \n\t"
"add %0, 1 \n\t"
".att_syntax"
: "+r"(x)
);
return x;
}
It assembles as follows with GCC (Godbolt):
movl %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax
The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.
On a Mac, even using real GCC the asm output has to assemble with an as that's based on clang, not GNU Binutils.
Using clang on that source code complains:
<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^
(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line the clang error message works better but the source is a mess.)
I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $ but IDK if GAS silently ignores that, too, in Intel syntax mode.
PS: your asm statement has a bug: you forgot an early-clobber on your output operand so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.
But using mov as the first or last instruction of an asm-template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r", and let the compiler worry about data movement.) If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
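A minimal sketch of the "just wrap the add" suggestion, assuming an x86 target: dst is a read-write "+r" operand, src is a plain "r" input, and the {att | intel} alternative lets the same template assemble with or without -masm=intel. The non-x86 branch is only a plain C fallback so the sketch builds elsewhere.

```c
int add_two(int src, int dst)
{
#if defined(__x86_64__) || defined(__i386__)
    /* One instruction, no mov: the compiler arranges the registers. */
    __asm__("add{l %1, %0 | %0, %1}"
            : "+r"(dst)     /* read and written by the add */
            : "r"(src)      /* only read */
            : "cc");        /* add modifies the flags */
    return dst;
#else
    return src + dst;
#endif
}
```

Because there is no separate output that is written before an input is read, no early-clobber is needed here.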
PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.
void atomic_inc(int *p) {
asm( "lock add{l $1, %0 | %0, 1}"
: "+m" (*p)
:: "memory"
);
}
compiles with gcc -O2 (-masm=att is the default) to
atomic_inc(int*):
lock addl $1, (%rdi)
ret
Or with -masm=intel to:
atomic_inc(int*):
lock add DWORD PTR [rdi], 1
ret
Notice that the l suffix is required for AT&T, and the dword ptr is required for intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.
This works with clang, but only the AT&T version ever gets used.
Note that -masm= also affects the default inline assembler syntax:
Output assembly instructions using selected dialect. Also affects
which dialect is used for basic "asm" and extended "asm". Supported
choices (in dialect order) are att or intel. The default is att.
Darwin does not support intel.
That means that your first .intel_syntax directive is superfluous and the final .att_syntax is wrong because your GCC call compiles C to Intel assembler code.
IOW, either stick to -masm=intel or sandwich your inline Intel assembler code sections between .intel_syntax noprefix and .att_syntax prefix directives - but don't do both.
Note that the sandwich method isn't compatible with all inline assembler constraints - e.g. a constraint that involves m (i.e. memory operand) would insert an operand in ATT syntax which would yield an error like 'Error: junk (%rbp) after expression'. In those cases you have to use -masm=intel.
I know this is a very basic question but I am really stuck on it. In fact I am an absolute newbie in GCC syntax.
I want to have local variables (Stack addresses with labels) without using extended inline assembly. Something like the following code in Intel syntax:
DATA1 DB 100
MOV AL, DATA1
This is the code I guess may substitute in GCC:
int someFunction(int x)
{
__asm__ volatile(
"function1:"
".data;"
".2byte $4 data1 ;"
".text;"
"pushq %rbp;"
"movq %rsp , %rbp ;"
"movl var , %eax;" // this is source of error
"popq %rbp;"
"leaveq;"
"retq ; "
);
}
But this code results in this error:
symbol(s) not found for architecture x86_64
I can use global variables in x86, but I get the same error on x64 (x86_64).
Setting: LLVM 4.1; Cocoa used in Xcode 4
What is the correct syntax?
GCC inline assembler doesn't support local variables; use GCC's extended syntax instead.
If you are uncomfortable with AT&T syntax there are ways to use Intel syntax on GCC.
This is an excellent how-to on GCC asm.
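For illustration, a minimal sketch of the extended-syntax route, assuming an x86 target and the default AT&T dialect: the C local data1 plays the role of DATA1 DB 100, and the "m" constraint makes the compiler substitute its stack address into the template. load_local_byte is a hypothetical name, and the non-x86 branch is only a plain C fallback.

```c
#include <stdint.h>

int load_local_byte(void)
{
    uint8_t data1 = 100;    /* the local "DATA1 DB 100" */
    int result;
#if defined(__x86_64__) || defined(__i386__)
    /* %1 expands to the stack address of data1, %0 to a register. */
    __asm__("movzbl %1, %0"
            : "=r"(result)
            : "m"(data1));
#else
    result = data1;
#endif
    return result;
}
```

The compiler still generates the function prologue, epilogue and return itself, so the asm statement does not need its own push, pop or ret.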
I am in the process of trying to compile an old project on my modern machine. I know this old project used an old (2.x) version of GCC/GAS so I need to clean it up so that I can compile it with a current version.
The code in question:
__get_flags:
pushfl
popl %eax
ret
gas complains that pushfl is not an instruction. From what I read this is an old deprecated GNU mnemonic, but there must be some sort of x86_64 equivalent, surely?
In addition to answering how to convert this code to x86_64 gas, it would be great if someone could explain where mnemonics like pushfl and pushfq are defined and documented. I am quite sure that they are GAS creations...
Note: I want to compile this on gcc 4.6.1 and gas 2.21
The following:
static inline u16 __get_flags(void) {
u16 flags;
asm ("pushf\n\t"
"pop %0" : : "+r"(flags) : "cc");
return flags;
}
should actually be portable for both 64/32bit, x64/x86. But it only gets you the condition codes, not the high parts of the state register. If you require the entire contents of the flags register when running in 64bit (long) mode, use the PUSHFQ instruction name, with a u64 (or other 64bit quantity) operand.
I can't add a comment to the answer because I don't have 50 reputation. Anyway:
In the answer:
"pop %0" : : "+r"(flags) : "cc");
Should read:
"pop %0" : "=r"(flags) : : "cc");
Because the syntax is asm( code : outputs : inputs : clobbers ) and what you want is flags to be a register that's output from the asm block, not used as input. (I tested this and it worked for me, I also used 64-bit ints)
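Putting the correction together, here is a sketch of a full function, assuming AT&T syntax (the default) and a 64-bit result as in my test. The name get_flags is hypothetical. The lea pair around the push is my own addition: on x86-64 System V targets the compiler may keep data in the red zone below the stack pointer, which a bare push inside inline asm could overwrite. lea is used because it does not change the flags being read.

```c
#include <stdint.h>

static inline uint64_t get_flags(void)
{
#if defined(__x86_64__)
    uint64_t flags;
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"   /* step over the red zone */
        "pushfq\n\t"                   /* push the 64-bit flags register */
        "pop %0\n\t"                   /* pop it into the output register */
        "lea 128(%%rsp), %%rsp"        /* restore the stack pointer */
        : "=r"(flags)                  /* output only, as in the fix above */
        :
        : "memory");                   /* the asm writes to the stack */
    return flags;
#elif defined(__i386__)
    uint32_t flags;
    __asm__ volatile("pushfl\n\tpopl %0" : "=r"(flags) : : "memory");
    return flags;
#else
    return 0;                          /* no x86 flags register here */
#endif
}
```

On x86, bit 1 of the returned value is a reserved bit that always reads as 1, which makes a quick sanity check.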