Is it necessary to initialize all the used registers in inline assembly?


I am testing simple inline assembly code using gcc. And I find the result of the following code unexpected:
#include <stdio.h>
int main(void) {
unsigned x0 = 0, x1 = 1, x2 = 2;
__asm__ volatile("movl %1, %0;\n\t"
"movl %2, %1"
:"=r"(x0), "+r"(x1)
:"r"(x2)
:);
printf("%u, %u\n", x0, x1);
return 0;
}
The printed result is 1, 1, rather than the expected 1, 2. Then I compiled the code with -S option and found out gcc generated the code as
movl %eax, %edx;
movl %edx, %eax;
%0 and %2 are using the same register, why?
I want gcc to generate, say,
movl %eax, %edx;
movl %ecx, %eax;
If I add "0"(x1) to the input constraints, gcc will generate the code above. Does it mean that all registers need to be initialized before being used in inline assembly?

Moving my comment to an 'Answer' so this question can be closed.
To prevent the compiler from reusing a register for both an input and an output, you can use the early-clobber constraint modifier (for example "=&r"(x)), which informs the compiler that the register associated with the operand is written before the instruction is finished using the input operands.
While sharing a register between an input and an output can be a good thing (since it reduces the number of registers that must be made available before calling your asm), it can also cause problems (as you have seen). So either make sure you have finished using all the inputs before writing to the output, or use & to tell the compiler not to do this optimization.
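As a minimal sketch of the early-clobber fix applied to the code from the question: the function name copy_then_replace and the plain-C fallback for non-x86 targets are illustrative assumptions, not part of the original.

```c
#include <stdio.h>

/* Hypothetical helper: stores the old x1 in *out0 and returns x1 after it
 * has been overwritten with x2. */
unsigned copy_then_replace(unsigned x1, unsigned x2, unsigned *out0)
{
    unsigned x0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("movl %1, %0\n\t"   /* x0 = x1 (writes %0 early)   */
                     "movl %2, %1"       /* x1 = x2 (reads %2 after %0) */
                     : "=&r"(x0),        /* & keeps %0 out of %2's register */
                       "+r"(x1)
                     : "r"(x2));
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    x0 = x1;
    x1 = x2;
#endif
    *out0 = x0;
    return x1;
}
```

Because %0 is written by the first instruction while %2 is still needed by the second, the & modifier is what stops the compiler from placing both in the same register.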
For completeness, let me also point out that using inline asm is usually a bad idea.

Related

How get EIP from x86 inline assembly by gcc

I want to get the value of EIP with the following code, but it does not compile.
Command:
gcc -o xxx x86_inline_asm.c -m32 && ./xxx
File content of x86_inline_asm.c:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
unsigned int eip_val;
__asm__("mov %0,%%eip":"=r"(eip_val));
return 0;
}
How can I use inline assembly to get the value of EIP so that the code compiles successfully for x86?
How should I modify the code so that the command above works?

This sounds unlikely to be useful (vs. just taking the address of the whole function like void *tmp = main), but it is possible.
Just get a label address, or use . (the address of the current line), and let the linker worry about getting the right immediate into the machine code. So you're not architecturally reading EIP, just reading the value it currently has from an immediate.
asm volatile("mov $., %0" : "=r"(address_of_mov_instruction) );
AT&T syntax is mov src, dst, so what you wrote would be a jump if it assembled.
(Architecturally, EIP = the end of an instruction while it's executing, so arguably you should do the following.)
asm volatile(
"mov $1f, %0 \n\t" // reference label 1 forward
"1:" // GAS local label
: "=r"(address_after_mov)
);
I'm using asm volatile in case this asm statement gets duplicated multiple times inside the same function by inlining or something. If you want each case to get a different address, it has to be volatile. Otherwise the compiler can assume that all instances of this asm statement produce the same output. Normally that will be fine.
In 32-bit mode you don't have RIP-relative addressing for LEA, so the only good way to actually read EIP architecturally is call / pop (see "Reading program counter directly"). EIP is not a general-purpose register, so you can't just use it as the source or destination of a mov or any other instruction.
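For contrast, a hedged sketch of what the 64-bit case can look like, where RIP-relative LEA is available. The function name address_after_lea and the fallback branch (which just returns the function's own address on other targets) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns the address of the instruction that
 * follows the lea, i.e. the value RIP holds while the lea executes. */
uintptr_t address_after_lea(void)
{
    uintptr_t addr;
#if defined(__x86_64__)
    __asm__ volatile("lea 1f(%%rip), %0\n"  /* RIP-relative reference to label 1 */
                     "1:"                   /* GAS local label right after the lea */
                     : "=r"(addr));
#else
    /* Assumed fallback: no RIP-relative LEA here, so use the function address. */
    addr = (uintptr_t)&address_after_lea;
#endif
    return addr;
}
```

As in the answer's examples, volatile is used so that each inlined copy of the statement yields its own address.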
But really you don't need inline asm for this at all.
Is it possible to store the address of a label in a variable and use goto to jump to it? shows how to use the GNU C extension where &&label takes its address.
int foo;
void *addr_inside_function() {
foo++;
lab1: ; // labels only go on statements, not declarations
void *tmp = &&lab1;
foo++;
return tmp;
}
There's nothing you can safely do with this address outside the function; I returned it just as an example to make the compiler put a label in the asm and see what happens. Without a goto to that label, it can still optimize the function pretty aggressively, but you might find it useful as an input for an asm goto(...) somewhere else in the function.
But anyway, it compiles on Godbolt to this asm
# gcc -O3 -m32
addr_inside_function:
.L2:
addl $2, foo
movl $.L2, %eax
ret
# clang -O3 -m32
addr_inside_function:
movl foo, %eax
leal 1(%eax), %ecx
movl %ecx, foo
.Ltmp0: # Block address taken
addl $2, %eax
movl %eax, foo
movl $.Ltmp0, %eax # retval = label address
retl
So clang loads the global, computes foo+1 and stores it, then after the label computes foo+2 and stores that (instead of loading twice). So you still can't usefully jump to the label from anywhere, because the code after it depends on having foo's old value in eax, and on the desired behaviour being to store foo+2.

I don't know gcc inline assembly syntax for this, but for masm:
call next0
next0: pop eax ;eax = eip for this line
In the case of Masm, $ represents the current location, and since call is a 5 byte instruction, an alternative syntax without a label would be:
call $+5
pop eax

How to have GCC combine "move r10, r3; store r10" into a "store r3"?

I'm working on Power9 and using the hardware random number generator instruction called DARN. I have the following inline assembly:
uint64_t val;
__asm__ __volatile__ (
"xor 3,3,3 \n" // r3 = 0
"addi 4,3,-1 \n" // r4 = -1, failure
"1: \n"
".byte 0xe6, 0x05, 0x61, 0x7c \n" // r3 = darn 3, 1
"cmpd 3,4 \n" // r3 == -1?
"beq 1b \n" // retry on failure
"mr %0,3 \n" // val = r3
: "=g" (val) : : "r3", "r4", "cc"
);
I had to add a mr %0,3 with "=g" (val) because I could not get GCC to produce expected code with "=r3" (val). Also see Error: matching constraint not valid in output operand.
A disassembly shows:
(gdb) b darn.cpp : 36
(gdb) r v
...
Breakpoint 1, DARN::GenerateBlock (this=<optimized out>,
output=0x7fffffffd990 "\b", size=0x100) at darn.cpp:77
77 DARN64(output+i*8);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.ppc64le libgcc-4.8.5-28.el7_5.1.ppc64le libstdc++-4.8.5-28.el7_5.1.ppc64le
(gdb) disass
Dump of assembler code for function DARN::GenerateBlock(unsigned char*, unsigned long):
...
0x00000000102442b0 <+48>: addi r10,r8,-8
0x00000000102442b4 <+52>: rldicl r10,r10,61,3
0x00000000102442b8 <+56>: addi r10,r10,1
0x00000000102442bc <+60>: mtctr r10
=> 0x00000000102442c0 <+64>: xor r3,r3,r3
0x00000000102442c4 <+68>: addi r4,r3,-1
0x00000000102442c8 <+72>: darn r3,1
0x00000000102442cc <+76>: cmpd r3,r4
0x00000000102442d0 <+80>: beq 0x102442c8 <DARN::GenerateBlock(unsigned char*, unsigned long)+72>
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
Notice GCC faithfully reproduces the:
0x00000000102442d4 <+84>: mr r10,r3
0x00000000102442d8 <+88>: stdu r10,8(r9)
How do I get GCC to fold the two instructions into:
0x00000000102442d8 <+84>: stdu r3,8(r9)

GCC will never remove text that's part of the asm template; it doesn't even parse it other than substituting in for %operand. It's literally just a text substitution before the asm is sent to the assembler.
You have to leave out the mr from your inline asm template, and tell gcc that your output is in r3 (or use a memory-destination output operand, but don't do that). If your inline-asm template ever starts or ends with mov instructions, you're usually doing it wrong.
Use register uint64_t foo asm("r3"); to force "=r"(foo) to pick r3 on platforms that don't have specific-register constraints.
(Despite ISO C++17 removing the register keyword, this GNU extension still works with -std=c++17. You can also use register uint64_t foo __asm__("r3"); if you want to avoid the asm keyword. You probably still need to treat register as a reserved word in source that uses this extension; that's fine. ISO C++ removing it from the base language doesn't force implementations to not use it as part of an extension.)
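A small sketch of the local register variable technique. The function name copy_via_fixed_reg, the choice of r10 on x86-64, and the plain-C fallback are assumptions for illustration; only the POWER branch mirrors the r3 case discussed here.

```c
/* Hypothetical helper: copies `in` through an asm statement whose output
 * operand is pinned to one specific register. */
unsigned long copy_via_fixed_reg(unsigned long in)
{
#if defined(__x86_64__)
    register unsigned long val __asm__("r10");    /* "=r"(val) must pick r10 */
    __asm__("mov %1, %0" : "=r"(val) : "r"(in));
    return val;
#elif defined(__powerpc64__)
    register unsigned long val __asm__("r3");     /* "=r"(val) must pick r3 */
    __asm__("mr %0,%1" : "=r"(val) : "r"(in));
    return val;
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    return in;
#endif
}
```

The register binding is only guaranteed where the variable is used as an asm operand, which is why the pinned variable appears directly in the operand list.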
Or better, don't hard-code a register number. Use an assembler that supports the DARN instruction. (But apparently it's so new that even up-to-date clang lacks it, and you'd only want this inline asm as a fallback for gcc too old to support the __builtin_darn() intrinsic)
Using these constraints will let you remove the register setup, too, and use foo=0 / bar=-1 before the inline asm statement, and use "+r"(foo).
But note that darn's output register is write-only. There's no need to zero r3 first. I found a copy of IBM's POWER ISA instruction set manual that is new enough to include darn here: https://wiki.raptorcs.com/w/images/c/cb/PowerISA_public.v3.0B.pdf#page=96
In fact, you don't need to loop inside the asm at all, you can leave that to the C and only wrap the one asm instruction, like inline-asm is designed for.
uint64_t random_asm() {
register uint64_t val asm("r3");
do {
//__asm__ __volatile__ ("darn 3, 1");
__asm__ __volatile__ (".byte 0x7c, 0x61, 0x05, 0xe6 # gcc asm operand = %0\n" : "=r" (val));
} while(val == -1ULL);
return val;
}
compiles cleanly (on the Godbolt compiler explorer) to
random_asm():
.L6: # compiler-generated label, no risk of name clashes
.byte 0x7c, 0x61, 0x05, 0xe6 # gcc asm operand = 3
cmpdi 7,3,-1 # compare-immediate
beq 7,.L6
blr
Just as tight as your loop, with less setup. (Are you sure you even need to zero r3 before the asm instruction?)
This function can inline anywhere you want it to, allowing gcc to emit a store instruction that reads r3 directly.
In practice, you'll want to use a retry counter, as advised in the manual: if the hardware RNG is broken, it might give you failure forever so you should have a fallback to a PRNG. (Same for x86's rdrand)
Deliver A Random Number (darn) - Programming Note
When the error value is obtained, software is
expected to repeat the operation. If a non-error
value has not been obtained after several attempts,
a software random number generation method
should be used. The recommended number of
attempts may be implementation specific. In the
absence of other guidance, ten attempts should be
adequate.
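The retry advice in the programming note could be sketched as follows. Everything named here is hypothetical: the hw_rng_fn callback stands in for a wrapper around the darn instruction (such as random_asm above), and the two stub generators exist only to exercise the loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define HW_RNG_ERROR        UINT64_MAX  /* all-ones error value, as tested by val == -1ULL */
#define HW_RNG_MAX_ATTEMPTS 10          /* "ten attempts should be adequate" */

typedef uint64_t (*hw_rng_fn)(void);    /* hypothetical hardware RNG wrapper */

/* Returns true and stores a value in *out if the hardware RNG succeeds
 * within the attempt limit; returns false so the caller can fall back to
 * a software random number generator. */
bool random_with_retry(hw_rng_fn hw, uint64_t *out)
{
    for (int attempt = 0; attempt < HW_RNG_MAX_ATTEMPTS; attempt++) {
        uint64_t v = hw();
        if (v != HW_RNG_ERROR) {
            *out = v;
            return true;
        }
    }
    return false;
}

/* Stub: models a broken RNG that always reports the error value. */
uint64_t always_failing_rng(void) { return HW_RNG_ERROR; }

/* Stub: fails on its first two calls, then returns 42. */
uint64_t flaky_rng(void)
{
    static int calls = 0;
    return (++calls <= 2) ? HW_RNG_ERROR : 42;
}
```

A caller would check the return value and switch to its software generator when it is false, rather than spinning forever on broken hardware.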
xor-zeroing is not efficient on most fixed-instruction-width ISAs, because a mov-immediate is just as short so there's no need to detect and special-case an xor. (And thus CPU designs don't spend transistors on it). Moreover, dependency rules for the PPC asm equivalent of C++11 std::memory_order_consume require it to carry a dependency on the input register, so it couldn't be dependency-breaking even if the designers wanted it to. xor-zeroing is only a thing on x86 and maybe a few other variable-width ISAs.
Use li r3, 0 like gcc does for int foo(){return 0;} https://godbolt.org/z/-gHI4C.

Assembly inline AT&T Type mismatch

I'm learning assembly and I found nothing that helps me do this. Is it even possible? I can't make this work.
I want this code to take the "b" value, put it in %eax and then move the content of %eax in my output and print that ASCII character, "0" in this case.
char a;
int b=48;
__asm__ (
//Here's the "Error: operand type mismatch for `mov'
"movl %0, %%eax;"
"movl %%eax, %1;"
:"=r"(a)
:"r" (b)
:"%eax"
);
printf("%c\n",a);

The instruction responsible for the error is this one:
movl %0, %%eax
So, in order to figure out why it's causing an error, we need to understand what it says. It's a 32-bit MOV instruction (the l suffix in AT&T syntax means "long", aka DWORD). The destination operand is the 32-bit EAX register. The source operand is the first input/output operand, a. In other words, this:
"=r"(a)
which says that char a; is to be used as an output-only register.
As such, what the inline assembler wants to do is to generate code like the following:
movl %dl, %eax
(assuming, for the sake of argument that a is allocated in the dl register, but it could just as easily have been allocated in any of the 8-bit registers). The problem is, that code is invalid because there is an operand size mismatch. The source operand and destination operand are different sizes: one is 32 bits while the other is 8 bits. This cannot work.
A workaround is the movzx/movsx instructions (introduced with the 80386) which move an 8 (or 16) bit source operand into a 32-bit destination operand, either with zero extension or sign extension, respectively. In AT&T syntax, the form that moves an 8-bit source into a 32-bit destination would be movzbl (for zero extension, used with unsigned values) or movsbl (for sign extension, used with signed values).
But wait—this is the wrong workaround. Your code is invalid for another reason: a is uninitialized! And not only is a uninitialized, but you've told the inline assembler via the output constraints it is an output-only operand (the = sign)! So you can't read from it—you can only store into it.
You have your operand notation backwards. What you really wanted was something like the following:
__asm__(
"movl %1, %%eax;"
"movl %%eax, %0;"
: "=r"(a)
: "r" (b)
: "%eax"
);
Of course, that's still going to give you an operand size mismatch, but it's now on the second assembly instruction. What this is telling the inline assembler to emit is the following code:
movl $48, %edx
movl %edx, %eax
movl %eax, %dl
which is invalid because a 32-bit source (%eax) cannot be moved into an 8-bit destination (%dl). And you can't fix this with movzx/movsx, because that is used to extend, not truncate. The way to write this would be the following:
movl $48, %edx
movl %edx, %eax
movb %al, %dl
where the last instruction is an 8-bit move, from an 8-bit source register to an 8-bit destination register.
In inline assembly, this would be written as:
__asm__(
"movl %1, %%eax;"
"movb %%al, %0;"
: "=r"(a)
: "r" (b)
: "%eax"
);
However, this is not the correct way to use inline assembly. You've manually hard-coded the EAX register inside of the inline assembly block, which means that you had to clobber it. The problem with this is that it ties the compiler's hands behind its back when it comes to register allocation. What you're supposed to do is put everything that goes into and out of the inline assembly block in the input and output operands. This lets the compiler handle all register allocation in the most optimal way possible. The code should look as follows:
char a;
int b = 48;
int temp;
__asm__(
"movl %2, %0\n\t"
"movb %b0, %1"
: "=r"(temp),
"=r"(a)
: "r" (b)
:
);
A lot of changes happened here:
I introduced another temporary variable (appropriately named temp) and added it to the output-only operands list. This causes the compiler to allocate a register for it automatically, which we then use inside of the asm block.
Now that we're letting the compiler do the register allocation, we don't need a clobber list, so that's left empty.
The b modifier is needed on the source operand for the movb instruction to ensure that the byte-sized portion of that register is used, rather than the entire 32-bit register.
Instead of using semicolons at the end of each asm instruction, I used \n\t (except on the last one). This is what is recommended for use in inline assembly blocks, and it gets you nicer assembly output listings because it matches what the compiler does internally.
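To isolate what the b modifier does, here is a reduced sketch with a single instruction. The function name low_byte and the plain-C fallback are assumptions, and the "q" constraint is my substitution for "r": it limits the choice to registers that have a byte-sized name (a, b, c, d in 32-bit mode; any integer register in 64-bit mode).

```c
/* Hypothetical helper: returns the low 8 bits of b as a char. */
char low_byte(int b)
{
    char a;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movb %b1, %0"   /* %b1 prints the byte-sized name of b's register */
            : "=q"(a)        /* byte-addressable register for the char output */
            : "q"(b));       /* byte-addressable register for the int input */
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    a = (char)b;
#endif
    return a;
}
```

Without the b modifier, %1 would expand to the 32-bit register name and the assembler would report the same operand size mismatch.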
Even better would be to introduce symbolic names for the operands, making the code more readable:
char a;
int b = 48;
int temp;
__asm__(
"movl %[input], %[temp]\n\t"
"movb %b[temp], %[dest]"
: [temp] "=r"(temp),
[dest] "=r"(a)
: [input] "r" (b)
:
);
And, at this point, if you hadn't noticed already, you'd see that this code is enormously silly. You don't need all those temporaries and register-register shuffling. You can just do:
movl $48, %eax
and the value 48 is already in al, since al is the low 8 bits of the 32-bit register eax.
Or, you can do:
movb $48, %al
which is just an 8-bit move of the value 48 explicitly into the 8-bit register al.
But, in fact, if you're calling printf, the argument must be passed as an int (not a char, since it's a variadic function), so you definitely want:
movl $48, %eax
When you start using inline assembly, the compiler can't easily optimize through it, so you get inefficient code. All you really needed was:
int a = 48;
printf("%c\n",a);
Which produces the following assembly code:
pushl $48
pushl $AddressOfFormatString
call printf
addl $8, %esp
or, equivalently:
movl $48, %eax
pushl %eax
pushl $AddressOfFormatString
call printf
addl $8, %esp
Now, I imagine you're saying to yourself something like: "Yes, but if I do that, then I'm not using inline assembly!" To which my response is: exactly. You don't need inline assembly here, and in fact, you should not be using it, because it just causes problems. It's more difficult to write and leads to inefficient code generation.
If you want to learn assembly language programming, get an assembler and use that, not a C compiler's inline assembler. NASM is a popular and excellent choice, as is YASM. If you want to stick with the GNU assembler so you can keep using this tortuous AT&T syntax, then run as directly.

Since a is defined as a character (char a;), "=r"(a) will assign an 8-bit register. The 32-bit register EAX cannot be loaded from an 8-bit register with movl: movl %dl, %eax (from movl %0, %%eax) causes this error. The sign-extend and zero-extend instructions movsx and movzx (Intel syntax), written movs... and movz... in AT&T syntax, exist for this purpose.
Change
movl %0, %%eax;
to
movzbl %0, %%eax;

What is the role of the clobber list? [duplicate]

This function "strcpy" aims to copy the content of src to dest, and it works out just fine: display two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried changing the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), and this error occurred: ‘asm’ operand has impossible constraints. I cannot understand why. I thought that "0"/"1" here specify the same constraint as the 0th/1st output operand. The constraint of the 0th output is =&S, and the constraint of the 1st output is =&D. If I change 0 to S and 1 to D, nothing should go wrong. What's the matter with it?
Do "clobbered registers" or the earlyclobber modifier (&) have any use? When I remove "&" or "memory", the result is the same as the original in either case: two lines of "Hello_src" are printed. So why should I use the "clobbered" things?

The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. Apparently using the 0/1 style overrides that behavior.
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.
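A hedged sketch of the honest version, where the asm really does overwrite a register and says so. The function name add_via_scratch, the choice of ecx as scratch, and the plain-C fallback are illustrative assumptions.

```c
/* Hypothetical helper: adds a and b using ecx as a hard-coded scratch
 * register, and declares that in the clobber list. */
unsigned add_via_scratch(unsigned a, unsigned b)
{
    unsigned r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %%ecx\n\t"   /* scratch = a */
            "addl %2, %%ecx\n\t"   /* scratch += b (also modifies flags) */
            "movl %%ecx, %0"       /* r = scratch, written after all inputs are read */
            : "=r"(r)
            : "r"(a), "r"(b)
            : "ecx", "cc");        /* tell the compiler what gets overwritten */
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    r = a + b;
#endif
    return r;
}
```

Listing ecx keeps the compiler from placing a, b, or any live value there; dropping it from the list may appear to work until the register allocator happens to use ecx around the statement.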

Rewrite Intel-style assembly code into GCC inline assembly

How do I write this assembly code as inline assembly? Compiler: gcc (i586-elf-gcc). The GAS syntax confuses me. Please tell me how to write this as inline assembly that works with gcc.
.set_video_mode:
mov ah,00h
mov al,13h
int 10h
.init_mouse:
mov ax,0
int 33h
I have similar routines in assembly. I wrote them as separate assembly routines to call from my C program. I need to call these and some more interrupts from C itself.
I also need to put values in certain registers depending on which interrupt routine I'm calling. Please tell me how to do it.
All that I want to do is call interrupt routines from C. It's OK for me even to do it using int86(), but I don't have the source code of that function.
I want int86() so that I can call interrupts from C.
I am developing my own tiny OS, so I have no restrictions on calling interrupts or on direct hardware access.

I've not tested this, but it should get you started:
void set_video_mode (int x, int y) {
register int ah asm ("ah") = x;
register int al asm ("al") = y;
asm volatile ("int $0x10"
: /* no outputs */
: /* no inputs */
: /* clobbers */ "ah", "al");
}
I've put in two 'clobbers' as an example, but you'll need to set the correct list of clobbers so that the compiler knows you've overwritten register values (maybe none).

First, keep in mind that GCC doesn't support 16-bit code yet, so you'll end up compiling 32-bit code in 16-bit mode, which is very inefficient but doable (it is used, for example, by Linux and SeaBIOS). It can be done with the following at the beginning of each file:
__asm__ (".code16gcc");
Newer GCC versions (since 4.9 IIRC) support the -m16 flag that does the same thing.
Also, there's no mouse driver available unless you load one before your kernel runs init_mouse.
You seem to be using an API commonly available in several x86 DOS systems.
asm can take care of the register assignments, so the code can be reduced to:
void set_video_mode(int mode)
{
mode &= 255;
__asm__ __volatile__ (
"int $0x10"
: "+a" (mode) /* %eax = mode & 255 => %ah = 0, %al = mode */
);
}
void init_mouse(void)
{
/* XXX it is really important to check the IDT entry isn't 0 */
int tmp = 0;
__asm__ __volatile__ (
"int $0x33"
: "+a" (tmp) /* %eax = 0*/
:: "ebx" /* %ebx is also clobbered by DOS mouse drivers */
);
}
The asm statement is documented in the GCC manual, although perhaps not in enough depth, and the manual lacks x86 examples. The outputs (after the first colon) have a distinctively obscure syntax, while the rest is far easier to understand (the second colon specifies the inputs and the third the clobbered registers, flags and/or memory).
The outputs must be prefixed with =, meaning you don't care about the previous value the operand may have had, or +, meaning you want to use it as an input too. In this context we use + rather than a separate input because the value is modified by the interrupt and you're not allowed to specify input registers in the clobbered list (because the compiler is forbidden from using them).
