Inline assembly addressing mode - gcc

I'm trying to write a lidt instruction in inline assembly in gcc with -masm=intel and 32 bits (-m32). So I have defined the following structures:
typedef struct {
uint16_t length;
uint32_t base;
} idt_desc;
static idt_desc desc = {
sizeof(idt)-1,
(uint32_t)(uintptr_t)idt,
};
If I were using nasm, I would be done with lidt [desc]. But I'm using inline assembly and have this inside a function:
asm volatile("lidt %0"::"m"(desc):);
This gives me "Error: unsupported instruction `lidt'". The assembly generated looks like this:
lidt QWORD PTR desc
As far as I know, square brackets are optional in GAS Intel syntax. So the problem here is the QWORD PTR, which lidt does not accept because it expects an m16&32 operand. How can I tell gcc to use that, i.e., drop QWORD PTR and just emit desc?

You need to pack the idt_desc structure, because otherwise the compiler adds padding between the 16-bit length and the 32-bit base members. Even if the compiler had managed to generate code for this, the structure would have been invalid, and LIDT would almost certainly have loaded an incorrect IDT record, leading to an eventual crash/triple fault at runtime. It should be:
typedef struct {
uint16_t length;
uint32_t base;
} __attribute__((packed)) idt_desc;
The -masm=intel option seems to have caused the compiler to see the unpacked version of the structure as a padded 8-byte structure and then treat it as a 64-bit QWORD. In 32-bit code, LIDT doesn't take a pointer to a QWORD; it takes a pointer to a 48-bit value (aka FWORD in some Intel dialects), which is the source of the error. Once the structure is packed, the compiler no longer generates QWORD, since the packed version is 6 bytes in size.
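A quick way to check the layout is to compare the two definitions at compile time. This is only a sketch: the type names idt_desc_padded and idt_desc_packed are made up for the comparison, and the 6-byte / offset-2 figures are what the 32-bit LIDT operand format requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the question's original, unpacked definition. */
typedef struct {
    uint16_t length;
    uint32_t base;
} idt_desc_padded;

/* Hypothetical name: the packed definition from the answer. */
typedef struct {
    uint16_t length;
    uint32_t base;
} __attribute__((packed)) idt_desc_packed;

/* The 32-bit LIDT operand is 6 bytes: a 16-bit limit followed
   immediately by a 32-bit base. Fail the build if that ever changes. */
_Static_assert(sizeof(idt_desc_packed) == 6,
               "packed IDT descriptor must be 6 bytes");
_Static_assert(offsetof(idt_desc_packed, base) == 2,
               "base must directly follow the 16-bit length");
```

On the usual x86 ABIs, sizeof(idt_desc_padded) is 8 and its base member sits at offset 4, which is why the unpacked version cannot work even if it assembles.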

Related

Why doesn't SASM's debugger show the value of a "result" variable updating after a store?

I'm trying to run a simple code in assembly - I want to save an address to memory.
I'm moving the address into a register and then moving it into the memory, but for some reason the memory isn't updated.
.data
str1: .asciz "atm course number is 234118"
str2: .asciz "234118"
result: .space 8
.text
.global main
main:
xorq %rax, %rax
xorq %rbx, %rbx
leaq str1, %rax
mov %rax, result(,%rbx,1)
ret
What am I doing wrong?
Your debugger is looking at the wrong instance of result. Your code was always fine (although inefficient: use mov %rax, result(%rip) and don't zero an index, or use mov %rax, result(%rbx) so the byte offset is a "base" rather than an "index", which is more efficient).
glibc contains several result symbols, and in GDB info var result shows:
All variables matching regular expression "result":
Non-debugging symbols:
0x000000000040404b result # in your executable, at a normal static address
0x00007ffff7f54f20 result_type
0x00007ffff7f821b8 cached_result
0x00007ffff7f846a0 result # in glibc, at a high address
0x00007ffff7f85260 result # where the dynamic linker puts shared libs
0x00007ffff7f85660 result
0x00007ffff7f86ab8 result
0x00007ffff7f86f48 result
When I do p /x &result to see what address the debugger resolved that symbol to, I get one of the glibc instances, not the instance in your .data section. Specifically, I get 0x7ffff7f85660 as the address, with the content = 0.
When I print the value with a cast, p /x (unsigned long)result, or dump the memory with GDB's x command, I find a 0 there after the store.
(gdb) x /xg &result
0x7ffff7f85660 <result>: 0x0000000000000000
It looks like your system picked a different instance, one that contained a pointer to a libc address or something. I can't copy-paste from your image. These other result variables are probably static int result or whatever inside various .c files in glibc. (And BTW, that looks like a sign of poor coding style; usually you want to return a value instead of setting a global or static. But glibc is old and/or maybe there's some justification for some of those.)
Your result: is the asm a compiler would make for static void* result if it didn't get optimized away. Except it would put it in .bss instead of .data because it's zero-initialized.
You're using SASM. I used GDB to get more details on exactly what's going on. Looking at the address of result in SASM's debug pane might have helped. But now that we've identified the problem using GDB, we can change your source to fix it for SASM.
You can use .globl result to make it an externally-visible symbol so it "wins" when the debugger is looking for symbols.
I added that and compiled again with gcc -g -no-pie store.s. It works as expected now, with p /x (unsigned long)result giving 0x404028

GCC inline assembly read value from array

While learning gcc inline assembly I was playing a bit with memory access. I'm trying to read a value from an array using a value from a different array as index.
Both arrays are initialized to something.
Initialization:
uint8_t* index = (uint8_t*)malloc(256);
memset(index, 33, 256);
uint8_t* data = (uint8_t*)malloc(256);
memset(data, 44, 256);
Array access:
unsigned char read(void *index,void *data) {
unsigned char value;
asm __volatile__ (
" movzb (%1), %%edx\n"
" movzb (%2, %%edx), %%eax\n"
: "=r" (value)
: "c" (index), "c" (data)
: "%eax", "%edx");
return value;
}
This is how I use the function:
unsigned char value = read(index, data);
Now I would expect it to return 44. But it actually returns some random value. Am I reading from uninitialized memory? Also, I'm not sure how to tell the compiler that it should assign the value from eax to the variable value.
You told the compiler you were going to put the output in %0, and it could pick any register for that "=r". But instead you never write %0 in your template.
And you use two temporaries for no apparent reason when you could have used %0 as the temporary.
As usual, you can debug your inline asm by adding comments like # 0 = %0 and looking at the compiler's asm output (not disassembly, just gcc -S to see what it fills in), e.g. # 0 = %ecx. (You didn't use an early-clobber "=&r", so it can pick the same register as one of the inputs.)
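The comment trick can be sketched like this, assuming x86 and AT&T syntax. The function name copy_via_asm is made up for illustration; the only point is that the first template line is an assembler comment, so the operand substitutions show up in the gcc -S output without changing the generated code.

```c
/* Build with something like: gcc -O2 -S file.c
   then search the .s output for the "# out =" line to see
   which registers the compiler substituted for %0 and %1. */
unsigned int copy_via_asm(unsigned int in)
{
    unsigned int out;
    __asm__ ("# out = %0, in = %1\n\t"   /* assembler comment only */
             "movl %1, %0"               /* the actual instruction */
             : "=r" (out)
             : "r" (in));
    return out;
}
```

A single instruction that reads its input before writing its output doesn't need an early-clobber here, so seeing the same register for both operands in the comment would be fine in this case.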
Also, this has 2 other bugs:
It doesn't compile. Requesting 2 different operands in ECX with "c" constraints can't work unless the compiler can prove at compile time that they have the same value, so that %1 and %2 can be the same register. https://godbolt.org/z/LgR4xS
You dereference pointer inputs without telling the compiler you're reading the pointed-to memory. Use a "memory" clobber or dummy memory operands. See: How can I indicate that the memory *pointed* to by an inline ASM argument may be used?
Or better, don't use inline asm at all (https://gcc.gnu.org/wiki/DontUseInlineAsm), because it's useless for this; just let GCC emit the movzb loads itself. unsigned char* is safe from strict-aliasing UB, so you can safely cast any pointer to unsigned char* and dereference it, without even having to use memcpy or other hacks to fight against language rules for wider unaligned or type-punned accesses.
But if you insist on inline asm, read manuals and tutorials, links at https://stackoverflow.com/tags/inline-assembly/info. You can't just throw code at the wall until it sticks with inline asm: you must understand why your code is safe to have any hope of it being safe. There are many ways for inline asm to happen to work but actually be broken, or be waiting to break with different surrounding code.
This is a safe and not totally terrible version (other than the unavoidable optimization-defeating parts of inline asm). You do still want a movzbl load for both loads, even though the return value is only 8 bits. movzbl is the natural efficient way to load a byte, replacing instead of merging with the old contents of a full register.
unsigned char read(void *index, void *data)
{
uintptr_t value;
asm (
" movzb (%[idx]), %k[out] \n\t"
" movzb (%[arr], %[out]), %k[out]\n"
: [out] "=&r" (value) // early-clobber output
: [idx] "r" (index), [arr] "r" (data)
: "memory" // we deref some inputs as pointers
);
return value;
}
Note the early-clobber on the output: this stops gcc from picking the same register for output as one of the inputs. It would be safe for it to destroy the [idx] register with the first load, but I don't know how to tell GCC that in one asm statement. You could split your asm statement into two separate ones, each with their own input and output operands, connecting the output of the first to the input of the 2nd via a local variable. Then neither one would need early-clobber because they're just wrapping single instructions like GNU C inline asm syntax is designed to do nicely.
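A rough sketch of that two-statement split, assuming x86. I've used "m" memory operands here rather than pointers in registers, which is a swapped-in variation: it lets the compiler pick the addressing mode and tells it exactly which bytes are read, so no "memory" clobber is needed. The name read_split is made up for illustration.

```c
/* Each asm statement wraps exactly one instruction, so neither
   needs an early-clobber: the load reads its memory input before
   it writes its register output. */
unsigned char read_split(const void *index, const void *data)
{
    const unsigned char *ip = index;
    const unsigned char *dp = data;
    unsigned int idx, val;

    /* first load: idx = *ip, zero-extended to 32 bits */
    __asm__ ("movzbl %1, %0" : "=r" (idx) : "m" (*ip));

    /* second load: val = dp[idx]; the compiler forms the address */
    __asm__ ("movzbl %1, %0" : "=r" (val) : "m" (dp[idx]));

    return (unsigned char)val;
}
```

The local variable idx is what connects the output of the first statement to the input of the second.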
Godbolt with test caller to see how it inlines / optimizes when called twice, with i386 clang and x86-64 gcc. e.g. asking for index in a register forces an LEA, instead of letting the compiler see the deref and letting it pick an addressing mode for *index. Also the extra movzbl %al, %eax done by the compiler when adding to unsigned sum because we used a narrow return type.
I used uintptr_t value so this can compile for 32-bit and 64-bit x86. There's no harm in making the output from the asm statement wider than the return value of the function, and that saves us from having to use size modifiers like movzbl (%1), %k0 to get GCC to print the 32-bit register name (like EAX) if it chose AL for an 8-bit output variable, for example.
I did decide to actually use %k[out] for the benefit of 64-bit mode: we want movzbl (%rdi), %eax, not movzb (%rdi), %rax (wasting a REX prefix).
You might as well declare the function to return unsigned int or uintptr_t, though, so the compiler knows that it doesn't have to redo zero-extension. OTOH, sometimes it can help the compiler to know that the value range is only 0..255. You could tell it that you produce a correctly zero-extended value using if(retval>255) __builtin_unreachable() or something. Or you could just not use inline asm.
You don't need asm volatile. (Assuming you want to let it optimize away if the result is unused, or be hoisted out of loops for constant inputs). You only need a "memory" clobber so if it does get used, the compiler knows that it reads memory.
(A "memory" clobber counts as all memory being an input, and all memory being an output. So it can't CSE, e.g. hoist out of a loop, because as far as the compiler knows one invocation might read something a previous one wrote. So in practice a "memory" clobber is about as bad as asm volatile. Even two back-to-back calls to this function without touching the input array force the compiler to emit the instructions twice.)
You could avoid this with dummy memory-input operands so the compiler knows this asm block doesn't modify memory, only read it. But if you actually care about efficiency, you shouldn't be using inline asm for this.
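Here is one way the dummy-operand version might look, assuming x86 and assuming the data table is exactly 256 bytes (which covers every possible byte index). The function name read_dummy is hypothetical. The two unnamed "m" inputs are never referenced in the template; they only tell the compiler which memory the asm reads.

```c
#include <stdint.h>

unsigned char read_dummy(const void *index, const void *data)
{
    uintptr_t value;
    __asm__ ("movzbl (%[idx]), %k[out]\n\t"
             "movzbl (%[arr], %[out]), %k[out]"
             : [out] "=&r" (value)               /* early-clobber output */
             : [idx] "r" (index), [arr] "r" (data),
               /* dummy inputs: one byte at *index, 256 bytes at *data
                  (the 256 is an assumption about the table size) */
               "m" (*(const unsigned char *)index),
               "m" (*(const unsigned char (*)[256])data));
    return (unsigned char)value;
}
```

Without a "memory" clobber the compiler is free to CSE or hoist this when the inputs and the memory they point to haven't changed.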
But like I said there is zero reason to use inline asm:
This will do exactly the same thing in 100% portable and safe ISO C:
// safe from strict-aliasing violations
// because unsigned char* can alias anything
inline
unsigned char read(void *index, void *data) {
unsigned idx = *(unsigned char*)index;
unsigned char * dp = data;
return dp[idx];
}
You could cast one or both pointers to volatile unsigned char* if you insist on the access happening every time and not being optimized away.
Or maybe even to atomic<unsigned char> * depending on what you're doing. (That's a hack, prefer C++20 atomic_ref to atomically load/store on objects that are normally not atomic.)

Retrieving x64 register values with inline asm

I was wondering if there was any way that would allow me to specify anything other than eax, ebx, ecx and edx as output operands.
Let's say I want to put the content of r8 in a variable. Is it possible to write something like this:
__asm__ __volatile__ (""
:"=r8"(my_var)
: /* no input */
);
It is not clear why you would need to put the contents of a specific register into a variable, given the volatile nature of most of them.
GNU C only has specific-register constraints for the original 8 registers, like "=S"(rsi). For r8..r15, your only option (to avoid needing a mov instruction inside the asm statement) is a register-asm variable.
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
You may want to use an extra input/output constraint to control where you sample the value of r8. (e.g. "+rm"(some_other_var) will make this asm statement part of a data dependency chain in your function, but that will also prevent constant-propagation and other optimizations.) asm volatile may help with controlling the ordering, but that's not guaranteed.
It sometimes works to omit the __asm__ ("" :"=r"(my_var)); statement using the register local as an operand, but it's only guaranteed to work if you do use it: https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables. (And see discussion in comments on a previous version of this answer which suggested you could skip that part.) It doesn't make your code any slower, so don't skip that part to make sure your code is safe in general.
Quoting that page: "The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don’t provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable’s declaration. Then use the local variable for the asm operand and specify any constraint letter that matches the register."
P.S. This is a GCC extension that may not be portable, but should be available on all compilers that support GNU C inline asm syntax.
gcc doesn't have specific-register constraints at all for some architectures, like ARM, so this technique is the only way for rare cases where you want to force specific registers for input or output operands.
Example:
int get_r8d(void) {
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
return my_var * 2; // do something interesting with the value
}
compiled with gcc7.3 -O3 on the Godbolt compiler explorer
get_r8d():
lea eax, [r8+r8] # gcc can use it directly without a MOV first
ret
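The same register-asm technique works for inputs. A minimal sketch, assuming x86-64 (the function name roundtrip_r8 is made up): the local is pinned to r8, and using it as an "r" input guarantees the asm statement sees the value in that register.

```c
long long roundtrip_r8(long long x)
{
    /* Pinned to r8; only guaranteed while used as an asm operand. */
    register long long in_r8 __asm__ ("r8") = x;
    long long out;

    /* %1 is r8 here; %0 is whatever register the compiler picks. */
    __asm__ ("mov %1, %0" : "=r" (out) : "r" (in_r8));
    return out;
}
```

The mov is only there so the example does something observable; in real code the instruction would be whatever actually needs its operand in r8.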
It should be possible, based on the answer here:
https://stackoverflow.com/a/43197401/3569229
#include <stdint.h>
uint64_t getsp( void )
{
uint64_t sp;
asm( "mov %%r8, %0" : "=rm" ( sp ));
return sp;
}
You can find a list of register names here: https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html
So your code above would be changed to:
__asm__ __volatile__ ("mov %%r8, %0"
:"=rm"(my_var)
: /* no input */
);

How do I get address of class member function by asm in GCC?

Guys, I have a problem. How do I get the address of a class member function with asm in GCC?
In VS2012, we can use the code below to get the address.
asm {mov eax, offset TEST::foo}
But, in GCC?
__asm__ __volatile__(
"movq offset %1, %%rdi"
"movq %%rdi, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
It failed...
AT&T syntax doesn't use the offset keyword. And besides, you've asked the compiler to put &TEST::foo in a register already.
__asm__ (
"mov %1, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
Or better:
__asm__ ( "" // no instructions
:"=r"(addr)
:"0"(&TEST::foo) // same register as operand 0
);
Or even better, just write addr = &TEST::foo; and don't use inline asm for this (https://gcc.gnu.org/wiki/DontUseInlineAsm), because it stops the compiler from knowing what's going on.
But if you are going to use inline asm, make sure you let the compiler do as much for you as it can. Use constraints to tell it where you want the input, and where you left the output. If the first or last instruction of an inline-asm statement is a mov, usually that means you're doing it wrong. (See the inline-assembly tag wiki for some links to guides on how to write GNU C inline asm that doesn't suck.)
Bugs in your original: you didn't declare a clobber on RDI, so the compiler will still assume you didn't modify it.
You don't need volatile if the only reason to run the code in the asm statement is to produce the output operands, not for side effects. Leaving out volatile lets the compiler optimize around it, and even drop it entirely if the output is unused.
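To show the matching-constraint idea in isolation, here is a small sketch in plain C, using an ordinary function as a stand-in for the member function (foo and address_of_foo are hypothetical names, and a plain function pointer is simpler than a C++ pointer-to-member). The empty template emits no instructions; the "0" constraint just makes the input live in the same register as output operand 0.

```c
#include <stdint.h>

/* Hypothetical stand-in for TEST::foo. */
int foo(int x) { return x + 1; }

uintptr_t address_of_foo(void)
{
    uintptr_t addr;
    __asm__ (""                          /* no instructions */
             : "=r" (addr)
             : "0" ((uintptr_t)&foo));   /* same register as operand 0 */
    return addr;
}
```

Since there is no volatile, the compiler may drop the statement entirely if addr is never used.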

What are the limitations on the use of output registers in avr-gcc inline assembly?

Output registers in inline assembly must be declared with the "=" constraint, meaning "write-only" [1]. What exactly does this mean: is it truly forbidden to read and modify them within the assembly? For example, consider this code:
uint8_t one ()
{
uint8_t res;
asm("ldi %[res],0\n"
"inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
The assembly sets the output register to 0 then increments it. Is this breaking the "write-only" constraint?
UPDATE
I'm seeing problems where my inline asm breaks when I change it to work directly on an output register, as opposed to using r16 for the computation and finally mov'ing r16 into the output register. The code is here: http://ideone.com/JTpYma . It prints results to serial, you just need to define F_CPU and BAUD. The problem appears only when using gcc-4.8.0 and not using gcc-4.7.2.
[1] http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
The compiler doesn't care whether you read it or not, it just won't put the initial value of the variable into the register. Your example is entirely legal, but people often wrongly expect to get result 2 from this code:
uint8_t one ()
{
uint8_t res = 1;
asm("inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
Since it's only an output constraint, the initial value of res is not guaranteed to be loaded into the register. In fact, the initializer may even be optimized away on the assumption that the asm block will overwrite it anyway. The above code is compiled to this by my version of avr-gcc:
inc r24
ret
As you can see, the compiler indeed removed the load of 1 into res (and hence into r24), thus producing an undefined result.
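If you do want the initial value to reach the asm, the operand has to be declared read-write with "+" instead of "=". A minimal sketch, written for x86 rather than AVR so it can be tried on a desktop (an assumption on my part; the constraint modifier means the same thing on both targets). The function name two is made up.

```c
unsigned int two(void)
{
    unsigned int res = 1;
    /* "+r": res is both read and written, so the compiler must
       load the initial value 1 into the register first. */
    __asm__ ("incl %0"
             : "+r" (res)
             :
             : "cc");      /* inc modifies the flags */
    return res;
}
```

With "+r" the initializer can no longer be optimized away, because the asm statement is now a consumer of it.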
Update
The problem with the updated program in the question is that it also has an input register operand. By default the compiler assumes that all inputs are consumed before the outputs are assigned so it's safe to allocate overlapping registers. That's clearly not the case for your example. You should use an "early clobber" modifier (&) for the output. This is what the manual has to say about that:
& Means (in a particular alternative) that this operand is an
earlyclobber operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is used as an input operand or as part of any
memory address.
Nobody said gcc inline asm was easy :D
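A small illustration of why the early-clobber matters, again sketched for x86 rather than AVR (an assumption, so it can be run on a desktop; twice is a hypothetical name). The output is written by the first instruction while the input is still needed by the later ones, so the two must not share a register.

```c
unsigned int twice(unsigned int a)
{
    unsigned int out;
    __asm__ ("movl $0, %0\n\t"    /* writes the output first...       */
             "addl %1, %0\n\t"    /* ...then still reads the input    */
             "addl %1, %0"
             : "=&r" (out)        /* & keeps out and a in different registers */
             : "r" (a)
             : "cc");
    return out;
}
```

With a plain "=r" the compiler would be allowed to put out and a in the same register, in which case the first instruction would zero the input and the function would return 0.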

Resources