ARM inline constraints - gcc

Is there a way in ARM assembly to place the address of an array into a register?
Something similar to
__asm__("movl %0,%%eax"::"r"(&array1));
AT&T syntax for X86
My initially attempt when in manner
__asm__("LDR R0,%0" :: "m" (&array`)");
Can you give me any suggestion or point to a place where I can look in for this.

This should work:
int a[10];
asm volatile("mov %r0, %[a]" : : [a] "r" (a));
ARM GCC Inline Assembler Cookbook is a very good resource to get syntax right.
Look also at Specifying Registers for Local Variables in GCC docs. You can directly specify registers for variables.
register int *foo asm ("a5");

Related

gcc inline asm template for constant with out hash

I am trying to emit a global SYMBOL based on a #define VALUE. My attempt is as follows:
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, %0" :: "i" (VALUE));
What is emitted by gcc to the assembler is the following:
.globl SYMBOL
.set SYMBOL, #VALUE
How can I get rid of the hash in the .set before VALUE. FWIW, my target is ARM.
armclang defines various template modifiers that can be used with inline assembly. gcc supports them, in every instance I've checked, although it doesn't document this.
In particular there is
c
Valid for an immediate operand. Prints it as a plain value without a preceding #. Use this template modifier when using the operand in .word, or another data-generating directive, which needs an integer without the #.
So you can do
__asm__ (".set SYMBOL, %c0" : : "i" (VALUE));
Try on godbolt
(There's a few open bugs on the gcc bugzilla suggesting that template / operand modifiers should be documented. The main one seems to be 30527, where I've just posted a comment. The developers' view seems to be that operand modifiers are "compiler internals" that are not meant for end users, but for arm/aarch64 in particular, there are simple things that you just can't do any other way. They made an exception for x86, so why not here?)
You can use stringizing.
#define VALUE 89
#define xstr(s) str(s)
#define str(s) #s
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, " str(VALUE));
The 'VALUE' must conform to something that gas will take as working with set. They could be fixed addresses from some vendor documentation or a listing output that is parsed. If you want 'VALUE' use str(s), if you want '89' then use xstr(s). You did not describe the actual use case.

Retrieving x64 register values with inline asm

I was wondering if there was any way that would allow me to specify anything other than eax, ebx, ecx and edx as output operands.
Lets say I want to put the content of r8 in a variable, Is it possible to write something like this :
__asm__ __volatile__ (""
:"=r8"(my_var)
: /* no input */
);
It is not clear why would you need to put contents of a specific register into a variable, given a volatile nature of the most of them.
GNU C only has specific-register constraints for the original 8 registers, like "=S"(rsi). For r8..r15, your only option (to avoid needing a mov instruction inside the asm statement) is a register-asm variable.
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
You may want to use an extra input/output constraint to control where you sample the value of r8. (e.g. "+rm"(some_other_var) will make this asm statement part of a data dependency chain in your function, but that will also prevent constant-propagation and other optimizations.) asm volatile may help with controlling the ordering, but that's not guaranteed.
It sometimes works to omit the __asm__ ("" :"=r"(my_var)); statement using the register local as an operand, but it's only guaranteed to work if you do use it: https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables. (And see discussion in comments on a previous version of this answer which suggested you could skip that part.) It doesn't make your code any slower, so don't skip that part to make sure your code is safe in general.
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don’t provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable’s declaration. Then use the local variable for the asm operand and specify any constraint letter that matches the register
P.S. This is a GCC extension that may not be portable, but should be available on all compilers that support GNU C inline asm syntax.
gcc doesn't have specific-register constraints at all for some architectures, like ARM, so this technique is the only way for rare cases where you want to force specific registers for input or output operands.
Example:
int get_r8d(void) {
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
return my_var * 2; // do something interesting with the value
}
compiled with gcc7.3 -O3 on the Godbolt compiler explorer
get_r8d():
lea eax, [r8+r8] # gcc can use it directly without a MOV first
ret
It should be possible, based on the answer here:
https://stackoverflow.com/a/43197401/3569229
#include <stdint.h>
uint64_t getsp( void )
{
uint64_t sp;
asm( "mov %%r8, %0" : "=rm" ( sp ));
return sp;
}
You can find a list of register names here: https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html
So your code above would be changed to:
__asm__ __volatile__ ("mov %%r8, %0"
:"=rm"(my_var)
: /* no input */
);

How do I get address of class member function by asm in GCC?

guys! I have a problem. How do I get address of class member function by asm in GCC?
In VS2012, we can do below code to get address.
asm {mov eax, offset TEST::foo}
But, in GCC?
__asm__ __volatile__(
"movq offset %1, %%rdi"
"movq %%rdi, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
It failed...
AT&T syntax doesn't use the offset keyword. And besides, you've asked the compiler to put &TEST::foo in a register already.
__asm__ (
"mov %1, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
Or better:
__asm__ ( "" // no instructions
:"=r"(addr)
:"0"(&TEST::foo) // same register as operand 0
);
Or even better: addr = &TEST::foo; https://gcc.gnu.org/wiki/DontUseInlineAsm for this, because it stops the compiler from knowing what's going on.
But if you are going to use inline asm, make sure you let the compiler do as much for you as it can. Use constraints to tell it where you want the input, and where you left the output. If the first or last instruction of an inline-asm statement is a mov, usually that means you're doing it wrong. (See the inline-assembly tag wiki for some links to guides on how to write GNU C inline asm that doesn't suck.
Bugs in your original: you didn't declare a clobber on RDI, so the compiler will still assume you didn't modify it.
You don't need volatile if the only reason to run the code in the asm statement is to produce the output operands, not for side effects. Leaving out volatile lets the compiler optimize around it, and even drop it entirely if the output is unused.

How kernel know the difference between "int 0x80" and "int x"

int 0x80 is a system call, it's also 128 in hexa.
why kernel use int 0x80 as interrupt and when i declare int x he knows it's just an integer named x and vice versa ?
You appear to be confused about the difference between C and assembly language. Both are programming languages, (nowadays) both accept the 0xNNNN notation for writing numbers in hexadecimal, and there's usually some way to embed tiny snippets of assembly language in a C program, but they are different languages. The keyword int means something completely different in C than it does in (x86) assembly language.
To a C compiler, int always and only means to declare something involving an integer, and there is no situation where you can immediately follow int with a numeric literal. int 0x80 (or int 128, or int 23, or anything else of the sort) is always a syntax error in C.
To an x86 assembler, int always and only means to generate machine code for the INTerrupt instruction, and a valid operand for that instruction (an "imm8", i.e. a number in the range 0–255) must be the next thing on the line. int x; is a syntax error in x86 assembly language, unless x has been defined as a constant in the appropriate range using the assembler's macro facilities.
Obvious follow-up question: If a C compiler doesn't recognize int as the INTerrupt instruction, how does a C program (compiled for x86) make system calls? There are four complementary answers to this question:
Most of the time, in a C program, you do not make system calls directly. Instead, you call functions in the C library that do it for you. When processing your program, as far as the C compiler knows, open (for instance) is no different than any other external function. So it doesn't need to generate an int instruction. It just does call open.
But the C library is just more C that someone else wrote for you, isn't it? Yet, if you disassemble the implementation of open, you will indeed see an int instruction (or maybe syscall or sysenter instead). How did the people who wrote the C library do that? They wrote that function in assembly language, not in C. Or they used that technique for embedding snippets of assembly language in a C program, which brings us to ...
How does that work? Doesn't that mean the C compiler does need to understand int as an assembly mnemonic sometimes? Not necessarily. Let's look at the GCC syntax for inserting assembly—this could be an implementation of open for x86/32/Linux:
int open(const char *path, int flags, mode_t mode)
{
int ret;
asm ("int 0x80"
: "=a" (ret)
: "0" (SYS_open), "d" (path), "c" (flags), "D" (mode));
if (ret >= 0) return ret;
return __set_errno(ret);
}
You don't need to understand the bulk of that: the important thing for purpose of this question is, yes, it says int 0x80, but it says it inside a string literal. The compiler will copy the contents of that string literal, verbatim, into the generated assembly-language file that it will then feed to the assembler. It doesn't need to know what it means. That's the assembler's job.
More generally, there are lots of words that mean one thing in C and a completely different thing in assembly language. A C compiler produces assembly language, so it has to "know" both of the meanings of those words, right? It does, but it does not confuse them, because they are always used in separate contexts. "add" being an assembly mnemonic that the C compiler knows how to use, does not mean that there is any problem with naming a variable "add" in a C program, even if the "add" instruction gets used in that program.

How to specify assembler flavor in gcc

I am compiling a program for an embedded ARM device, and want to switch from one bootloader to another. Both bootloaders are written in assembler (for the same type of device), but the problem is that they are different dialects/flavors (perhaps Intel vs AT&T?). The existing assembler code compiles happily in gcc, but the one I want to use does not.
For example, the existing (working) code looks like this...
/* Comments are c-style */
.syntax unified
.arch armv7-m
.section .stack
.align 3
#ifdef __STACK_SIZE
.equ Stack_Size, __STACK_SIZE
#else
.equ Stack_Size, 0xc00
#endif
.globl __StackTop
.globl __StackLimit
__StackLimit:
.space Stack_Size
.size __StackLimit, . - __StackLimit
__StackTop:
.size __StackTop, . - __StackTop
... and the code I want to use looks like this ...
; comments are lisp-style
Stack_Size EQU 0x00000400
AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem SPACE Stack_Size
__initial_sp
; <h> Heap Configuration
; <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>
Heap_Size EQU 0x00000200
AREA HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base
Heap_Mem SPACE Heap_Size
__heap_limit
PRESERVE8
THUMB
Notice the order of operands, and commenting style is different. What type of assembler is this second block? Can gcc be told to expect this format and parse it?
The first one is GNU gas syntax, the second is ARM's commercial toolchain syntax.
The formats (directives and label definitions) are not compatible, although the instruction syntax itself is. Assembling the one with the other is not possible, but the generated object files can be linked together.
Your code examples contain no instructions however, only various assembler directives allocating space for stack and heap.
The first looks like AT&T to me, and the second like Intel. I don't think GCC has an option to change what flavor it uses (as it runs all of it's assembly through GAS (GNU assembler), which uses AT&T). But, if you spend a little time learning the C calling conventions, you can use NASM (Netwide Assembler, which uses Intel syntax but can't be inline). Just create a definition something like this in one of your C headers:
extern void assembly_boot();
And then in your assembly, implement it (yes, the prefixing underscore is correct):
global _assembly_boot
_assembly_boot:
;Blah blah blah
Note: That example doesn't implement the C calling conventions. If you want your assembly to be callable from C, you need to use the C calling conventions. Google them.

Resources