Consider the following code:
#include <stdio.h>
#include <stdint.h>
int main(void) {
uint32_t num = 2;
__asm__ __volatile__ ("CPUID");
__asm__ __volatile__ ("movl $1, %%ecx":);
__asm__ __volatile__ ("andl $0, %%ecx": "=r"(num));
printf("%i\n", num);
}
My initial expectation was that this code would print 0, and it does if I comment out the CPUID line, but as-is it was giving me garbage. After some trial, error, and research I realized that I was getting the value of a random register. Apparently GCC doesn't assume that I want the result of the statement being executed.
The problem is that I've seen (other people's) code that relies on that statement properly getting the result of the AND, regardless of what is going on with the other registers. Obviously such code is broken, given my observations, and the "=r" should be replaced with "=c".
My question is, can we ever rely on the "=r" constraint behaving consistently or according to the obvious expectation? Or is GCC's implementation too opaque/weird/other and it's best just to avoid it in every situation?
In order to use the =r output specifier you need to give gcc the freedom to pick the register that it wants to use. You do that by specifying the inputs and outputs generically with %0 for the output and the inputs starting with %1 for the first input.
In your case you are saying that num can be in a register. But there is nothing in the asm instruction that uses the output register. So gcc will essentially ignore this.
The reason you get a different value depending on whether the CPUID instruction is commented out is that CPUID can write to eax, ebx, ecx, and edx. I tried your example on my system and got 0 as the result in both cases. But I noticed that the generated assembly prints the value of eax, so I guess that when I ran this program CPUID was writing 0 to eax.
If you did want to use the =r constraint you would need to do something like this:
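asm("CPUID \n\t"
"movl $1, %0 \n\t"
"andl $0, %0 \n\t"
: "=r"(num)
: /* no inputs */
: "eax", "ebx", "ecx", "edx"); /* CPUID overwrites all four, so declare them as clobbers */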
Otherwise if your asm code specifically mentions a register then you will need to specify it in the constraint list. In your example that means using =c.
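The two options above can be put side by side. This is a minimal sketch, assuming GCC or Clang on x86; the function names are made up for illustration, and the non-x86 branch exists only so the file still builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: the template hard-codes ECX, so the output must be "=c". */
static uint32_t and_zero_fixed(void)
{
    uint32_t num = 2;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl $1, %%ecx\n\t"
             "andl $0, %%ecx"
             : "=c"(num)   /* result is read back from ECX */
             :             /* no inputs */
             : "cc");      /* AND modifies the flags */
#else
    num = 0;               /* assumption: plain-C fallback for non-x86 builds */
#endif
    return num;
}

/* Hypothetical helper: the template uses %0, so the compiler may pick any register. */
static uint32_t and_zero_generic(void)
{
    uint32_t num = 2;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl $1, %0\n\t"
             "andl $0, %0"
             : "=r"(num)   /* %0 is whatever register the compiler chose */
             :             /* no inputs */
             : "cc");
#else
    num = 0;               /* assumption: plain-C fallback for non-x86 builds */
#endif
    return num;
}
```

Both functions should return 0 regardless of what the surrounding code does with other registers, because the constraint and the template agree on where the result lives.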
I am trying to emit a global SYMBOL based on a #define VALUE. My attempt is as follows:
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, %0" :: "i" (VALUE));
What is emitted by gcc to the assembler is the following:
.globl SYMBOL
.set SYMBOL, #VALUE
How can I get rid of the hash before VALUE in the .set? FWIW, my target is ARM.
armclang defines various template modifiers that can be used with inline assembly. gcc supports them, in every instance I've checked, although it doesn't document this.
In particular there is
c
Valid for an immediate operand. Prints it as a plain value without a preceding #. Use this template modifier when using the operand in .word, or another data-generating directive, which needs an integer without the #.
So you can do
__asm__ (".set SYMBOL, %c0" : : "i" (VALUE));
(There are a few open bugs on the gcc bugzilla suggesting that template / operand modifiers should be documented. The main one seems to be 30527, where I've just posted a comment. The developers' view seems to be that operand modifiers are "compiler internals" that are not meant for end users, but for arm/aarch64 in particular, there are simple things that you just can't do any other way. They made an exception for x86, so why not here?)
You can use stringizing.
#define VALUE 89
#define xstr(s) str(s)
#define str(s) #s
__asm__ (".globl SYMBOL");
__asm__ (".set SYMBOL, " xstr(VALUE));
The expansion of VALUE must be something that gas accepts as the operand of .set. It could be a fixed address from some vendor documentation, or come from a listing output that is parsed. If you want the literal text 'VALUE', use str(VALUE); if you want '89', use xstr(VALUE). You did not describe the actual use case.
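The difference between the two macros is easy to check outside of any asm statement. This is a small sketch; the helper names unexpanded and expanded are made up for illustration, and only the macros come from the answer.

```c
#define VALUE 89
#define str(s)  #s        /* stringizes the argument exactly as written */
#define xstr(s) str(s)    /* macro-expands the argument first, then stringizes */

/* Hypothetical helpers returning the strings that would be passed to the assembler. */
static const char *unexpanded(void) { return ".set SYMBOL, " str(VALUE); }
static const char *expanded(void)   { return ".set SYMBOL, " xstr(VALUE); }
```

Here unexpanded() yields ".set SYMBOL, VALUE" and expanded() yields ".set SYMBOL, 89", since adjacent string literals are concatenated by the compiler.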
While learning gcc inline assembly I was playing a bit with memory access. I'm trying to read a value from an array using a value from a different array as index.
Both arrays are initialized to something.
Initialization:
uint8_t* index = (uint8_t*)malloc(256);
memset(index, 33, 256);
uint8_t* data = (uint8_t*)malloc(256);
memset(data, 44, 256);
Array access:
unsigned char read(void *index,void *data) {
unsigned char value;
asm __volatile__ (
" movzb (%1), %%edx\n"
" movzb (%2, %%edx), %%eax\n"
: "=r" (value)
: "c" (index), "c" (data)
: "%eax", "%edx");
return value;
}
This is how I use the function:
unsigned char value = read(index, data);
Now I would expect it to return 44. But it actually returns some random value. Am I reading from uninitialized memory? Also, I'm not sure how to tell the compiler that it should assign the value from eax to the variable value.
You told the compiler you were going to put the output in %0, and it could pick any register for that "=r". But instead you never write %0 in your template.
And you use two temporaries for no apparent reason when you could have used %0 as the temporary.
As usual, you can debug your inline asm by adding comments like # 0 = %0 and looking at the compiler's asm output (not disassembly, just gcc -S to see what it fills in), e.g. # 0 = %ecx. (You didn't use an early-clobber "=&r", so it can pick the same register as one of the inputs.)
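As a sketch of that debugging trick, assuming GCC or Clang on x86 (where # starts an assembler comment): the function name is hypothetical, and the comment line has no effect on the generated machine code.

```c
/* Hypothetical helper: copies `in` to `out` through inline asm. */
static unsigned copy_with_comment(unsigned in)
{
    unsigned out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("# out = %0, in = %1\n\t"   /* operand choices become visible in gcc -S output */
             "movl %1, %0"
             : "=r"(out)
             : "r"(in));
#else
    out = in;                            /* assumption: plain-C fallback for non-x86 builds */
#endif
    return out;
}
```

Compiling with gcc -S and searching the .s file for "out =" shows which registers the compiler substituted for %0 and %1.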
Also, this has 2 other bugs:
It doesn't compile. Requesting 2 different operands in ECX with "c" constraints can't work unless the compiler can prove at compile-time that they have the same value so %1 and %2 can be the same register. https://godbolt.org/z/LgR4xS
You dereference pointer inputs without telling the compiler you're reading the pointed-to memory. Use a "memory" clobber or dummy memory operands; see the Stack Overflow question "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?"
Or better, don't use inline asm at all (https://gcc.gnu.org/wiki/DontUseInlineAsm), because it's useless for this; just let GCC emit the movzb loads itself. unsigned char* is safe from strict-aliasing UB so you can safely cast any pointer to unsigned char* and dereference it, without even having to use memcpy or other hacks to fight against language rules for wider unaligned or type-punned accesses.
But if you insist on inline asm, read manuals and tutorials, links at https://stackoverflow.com/tags/inline-assembly/info. You can't just throw code at the wall until it sticks with inline asm: you must understand why your code is safe to have any hope of it being safe. There are many ways for inline asm to happen to work but actually be broken, or be waiting to break with different surrounding code.
This is a safe and not totally terrible version (other than the unavoidable optimization-defeating parts of inline asm). You do still want a movzbl load for both loads, even though the return value is only 8 bits. movzbl is the natural efficient way to load a byte, replacing instead of merging with the old contents of a full register.
unsigned char read(void *index, void *data)
{
uintptr_t value;
asm (
" movzb (%[idx]), %k[out] \n\t"
" movzb (%[arr], %[out]), %k[out]\n"
: [out] "=&r" (value) // early-clobber output
: [idx] "r" (index), [arr] "r" (data)
: "memory" // we deref some inputs as pointers
);
return value;
}
Note the early-clobber on the output: this stops gcc from picking the same register for output as one of the inputs. It would be safe for it to destroy the [idx] register with the first load, but I don't know how to tell GCC that in one asm statement. You could split your asm statement into two separate ones, each with their own input and output operands, connecting the output of the first to the input of the 2nd via a local variable. Then neither one would need early-clobber because they're just wrapping single instructions like GNU C inline asm syntax is designed to do nicely.
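The split into two single-instruction statements could look roughly like this. It is a sketch assuming GCC or Clang on x86; read_split is a made-up name, and the plain-C branch is only a fallback so the file builds on other targets.

```c
#include <stdint.h>

/* Hypothetical variant of read(): each asm statement wraps one load. */
static unsigned char read_split(const void *index, const void *data)
{
    uintptr_t idx, value;
#if defined(__x86_64__) || defined(__i386__)
    /* First load: idx = *(unsigned char *)index, zero-extended. */
    __asm__ ("movzbl (%[p]), %k[out]"
             : [out] "=r"(idx)
             : [p] "r"(index)
             : "memory");                /* we dereference an input pointer */
    /* Second load: value = ((unsigned char *)data)[idx], zero-extended. */
    __asm__ ("movzbl (%[arr], %[i]), %k[out]"
             : [out] "=r"(value)
             : [arr] "r"(data), [i] "r"(idx)
             : "memory");
#else
    idx = *(const unsigned char *)index; /* assumption: fallback for non-x86 builds */
    value = ((const unsigned char *)data)[idx];
#endif
    return (unsigned char)value;
}
```

Neither statement needs an early-clobber, because a single instruction reads all of its inputs before it writes its output, so the compiler is free to reuse an input register for the result.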
Godbolt with test caller to see how it inlines / optimizes when called twice, with i386 clang and x86-64 gcc. e.g. asking for index in a register forces an LEA, instead of letting the compiler see the deref and letting it pick an addressing mode for *index. Also the extra movzbl %al, %eax done by the compiler when adding to unsigned sum because we used a narrow return type.
I used uintptr_t value so this can compile for 32-bit and 64-bit x86. There's no harm in making the output from the asm statement wider than the return value of the function, and that saves us from having to use size modifiers like movzbl (%1), %k0 to get GCC to print the 32-bit register name (like EAX) if it chose AL for an 8-bit output variable, for example.
I did decide to actually use %k[out] for the benefit of 64-bit mode: we want movzbl (%rdi), %eax, not movzb (%rdi), %rax (wasting a REX prefix).
You might as well declare the function to return unsigned int or uintptr_t, though, so the compiler knows that it doesn't have to redo zero-extension. OTOH sometimes it can help the compiler to know that the value-range is only 0..255. You could tell it that you produce a correctly zero-extended value using if(retval>255) __builtin_unreachable() or something. Or you could just not use inline asm.
You don't need asm volatile. (Assuming you want to let it optimize away if the result is unused, or be hoisted out of loops for constant inputs). You only need a "memory" clobber so if it does get used, the compiler knows that it reads memory.
(A "memory" clobber counts as all memory being an input, and all memory being an output. So it can't CSE, e.g. hoist out of a loop, because as far as the compiler knows one invocation might read something a previous one wrote. So in practice a "memory" clobber is about as bad as asm volatile. Even two back-to-back calls to this function without touching the input array force the compiler to emit the instructions twice.)
You could avoid this with dummy memory-input operands so the compiler knows this asm block doesn't modify memory, only read it. But if you actually care about efficiency, you shouldn't be using inline asm for this.
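A dummy memory-input version might look like the following sketch. It assumes GCC or Clang on x86 and, importantly, that data points at a table of exactly 256 bytes (that size is my assumption for the cast, not something the question states). The name read_dummy_mem is hypothetical.

```c
#include <stdint.h>

/* Hypothetical variant of read(): memory inputs instead of a "memory" clobber. */
static unsigned char read_dummy_mem(const unsigned char *index,
                                    const unsigned char *data)
{
    uintptr_t value;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movzbl %[idx], %k[out]\n\t"
             "movzbl (%[arr], %[out]), %k[out]"
             : [out] "=&r"(value)             /* early-clobber: written before arr is read */
             : [idx] "m"(*index),             /* real memory input, compiler picks the addressing mode */
               [arr] "r"(data),
               "m"(*(const unsigned char (*)[256])data)); /* dummy input: the asm reads this table */
#else
    value = data[*index];                     /* assumption: fallback for non-x86 builds */
#endif
    return (unsigned char)value;
}
```

Because the statement only declares memory inputs and is not volatile, the compiler is allowed to merge repeated calls with unchanged inputs, which a "memory" clobber would prevent.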
But like I said there is zero reason to use inline asm:
This will do exactly the same thing in 100% portable and safe ISO C:
// safe from strict-aliasing violations
// because unsigned char* can alias anything
inline
unsigned char read(void *index, void *data) {
unsigned idx = *(unsigned char*)index;
unsigned char * dp = data;
return dp[idx];
}
You could cast one or both pointers to volatile unsigned char* if you insist on the access happening every time and not being optimized away.
Or maybe even to atomic<unsigned char> * depending on what you're doing. (That's a hack, prefer C++20 atomic_ref to atomically load/store on objects that are normally not atomic.)
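For the volatile variant mentioned above, a minimal sketch in plain C could be the following; read_volatile is a made-up name, and the only change from the ISO C version is the volatile qualifier on both accesses.

```c
/* Hypothetical variant: both loads are volatile, so the compiler must perform them. */
static inline unsigned char read_volatile(const void *index, const void *data)
{
    unsigned idx = *(const volatile unsigned char *)index;  /* forced load of the index byte */
    const volatile unsigned char *dp = data;
    return dp[idx];                                         /* forced load of the data byte */
}
```

This keeps the access portable while still guaranteeing that each call touches memory, which was the main thing asm volatile was buying in the original.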
I was wondering if there was any way that would allow me to specify anything other than eax, ebx, ecx and edx as output operands.
Let's say I want to put the content of r8 in a variable. Is it possible to write something like this:
__asm__ __volatile__ (""
:"=r8"(my_var)
: /* no input */
);
It is not clear why you would need to put the contents of a specific register into a variable, given the volatile nature of most of them.
GNU C only has specific-register constraints for the original 8 registers, like "=S"(rsi). For r8..r15, your only option (to avoid needing a mov instruction inside the asm statement) is a register-asm variable.
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
You may want to use an extra input/output constraint to control where you sample the value of r8. (e.g. "+rm"(some_other_var) will make this asm statement part of a data dependency chain in your function, but that will also prevent constant-propagation and other optimizations.) asm volatile may help with controlling the ordering, but that's not guaranteed.
It sometimes works to omit the __asm__ ("" :"=r"(my_var)); statement using the register local as an operand, but it's only guaranteed to work if you do use it: https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables. (And see discussion in comments on a previous version of this answer which suggested you could skip that part.) It doesn't make your code any slower, so don't skip that part to make sure your code is safe in general.
The GCC manual says: "The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don’t provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable’s declaration. Then use the local variable for the asm operand and specify any constraint letter that matches the register."
P.S. This is a GCC extension that may not be portable, but should be available on all compilers that support GNU C inline asm syntax.
gcc doesn't have specific-register constraints at all for some architectures, like ARM, so this technique is the only way for rare cases where you want to force specific registers for input or output operands.
Example:
int get_r8d(void) {
register long long my_var __asm__ ("r8");
__asm__ ("" :"=r"(my_var)); // guaranteed that r chooses r8
return my_var * 2; // do something interesting with the value
}
compiled with gcc7.3 -O3 on the Godbolt compiler explorer
get_r8d():
lea eax, [r8+r8] # gcc can use it directly without a MOV first
ret
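The same technique works for input operands. Here is a sketch assuming GCC or Clang targeting x86-64; plus_one_via_r8 is a hypothetical name, and other targets take the plain-C branch.

```c
/* Hypothetical helper: forces the input into R8, then reads R8 by name in the template. */
static long plus_one_via_r8(long x)
{
    long out;
#if defined(__x86_64__)
    register long in_r8 __asm__("r8") = x;   /* register-asm local bound to R8 */
    __asm__ ("lea 1(%%r8), %0"               /* out = r8 + 1, flags untouched */
             : "=r"(out)
             : "r"(in_r8));                  /* "r" must pick R8 for this variable */
#else
    out = x + 1;                             /* assumption: fallback for other targets */
#endif
    return out;
}
```

The register-asm variable is used as an operand of the asm statement, which is the supported way to rely on it being in R8 at that point.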
It should be possible, based on the answer here:
https://stackoverflow.com/a/43197401/3569229
#include <stdint.h>
uint64_t getsp( void )
{
uint64_t sp;
asm( "mov %%r8, %0" : "=rm" ( sp ));
return sp;
}
You can find a list of register names here: https://www3.nd.edu/~dthain/courses/cse40243/fall2015/intel-intro.html
So your code above would be changed to:
__asm__ __volatile__ ("mov %%r8, %0"
:"=rm"(my_var)
: /* no input */
);
How do I get the address of a class member function with inline asm in GCC?
In VS2012, we can use the code below to get the address.
asm {mov eax, offset TEST::foo}
But, in GCC?
__asm__ __volatile__(
"movq offset %1, %%rdi"
"movq %%rdi, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
It failed...
AT&T syntax doesn't use the offset keyword. And besides, you've asked the compiler to put &TEST::foo in a register already.
__asm__ (
"mov %1, %0"
:"=r"(addr)
:"r"(&TEST::foo)
);
Or better:
__asm__ ( "" // no instructions
:"=r"(addr)
:"0"(&TEST::foo) // same register as operand 0
);
Or even better, just write addr = &TEST::foo; and don't use inline asm for this (https://gcc.gnu.org/wiki/DontUseInlineAsm), because it stops the compiler from knowing what's going on.
But if you are going to use inline asm, make sure you let the compiler do as much for you as it can. Use constraints to tell it where you want the input, and where you left the output. If the first or last instruction of an inline-asm statement is a mov, usually that means you're doing it wrong. (See the inline-assembly tag wiki for some links to guides on how to write GNU C inline asm that doesn't suck.)
Bugs in your original: you didn't declare a clobber on RDI, so the compiler will still assume you didn't modify it.
You don't need volatile if the only reason to run the code in the asm statement is to produce the output operands, not for side effects. Leaving out volatile lets the compiler optimize around it, and even drop it entirely if the output is unused.
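The empty-template form with a matching constraint can be tried in isolation. This sketch uses C and an ordinary function instead of the C++ member function from the question (that substitution is mine, to keep the example self-contained); foo, fn_ptr and address_of_foo are hypothetical names, and GCC or Clang is assumed.

```c
typedef int (*fn_ptr)(void);

/* Hypothetical target function whose address we want. */
static int foo(void) { return 7; }

/* The compiler materializes &foo in a register; the asm emits no instructions. */
static fn_ptr address_of_foo(void)
{
    fn_ptr addr;
    __asm__ (""                /* no instructions */
             : "=r"(addr)
             : "0"(&foo));     /* same register as operand 0 */
    return addr;
}
```

There is no volatile and no clobber here: the statement has no side effects, so the compiler may drop it if addr is never used.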
Output registers in inline assembly must be declared with the "=" constraint, meaning "write-only" [1]. What exactly does this mean: is it truly forbidden to read and modify them within the assembly? For example, consider this code:
uint8_t one ()
{
uint8_t res;
asm("ldi %[res],0\n"
"inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
The assembly sets the output register to 0 then increments it. Is this breaking the "write-only" constraint?
UPDATE
I'm seeing problems where my inline asm breaks when I change it to work directly on an output register, as opposed to using r16 for the computation and finally mov'ing r16 into the output register. The code is here: http://ideone.com/JTpYma . It prints results to serial, you just need to define F_CPU and BAUD. The problem appears only when using gcc-4.8.0 and not using gcc-4.7.2.
[1] http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
The compiler doesn't care whether you read it or not, it just won't put the initial value of the variable into the register. Your example is entirely legal, but people often wrongly expect to get result 2 from this code:
uint8_t one ()
{
uint8_t res = 1;
asm("inc %[res]\n"
: [res] "=r" (res)
);
return res;
}
Since it's only an output constraint, the initial value of res is not guaranteed to be loaded into the register. In fact, the initializer may even be optimized away on the assumption that the asm block will overwrite it anyway. The above code is compiled to this by my version of avr-gcc:
inc r24
ret
As you can see, the compiler indeed removed the load of 1 into res (and hence into r24), so the result is undefined.
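If the asm is meant to read the old value as well as write the new one, the usual GNU C answer is a read-write "+r" operand instead of "=r". The answer above doesn't spell this out, so treat the following as a sketch: it assumes GCC or Clang on x86 rather than AVR, and increment is a made-up name.

```c
/* Hypothetical helper: "+r" tells the compiler the register is both read and written. */
static unsigned increment(unsigned res)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("incl %[res]"
             : [res] "+r"(res)   /* initial value is loaded, result is read back */
             :                   /* no separate inputs */
             : "cc");            /* INC modifies the flags */
#else
    res += 1;                    /* assumption: fallback for non-x86 builds */
#endif
    return res;
}
```

With "+r" the initializer can no longer be optimized away, so increment(1) gives 2 as people expect.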
Update
The problem with the updated program in the question is that it also has an input register operand. By default the compiler assumes that all inputs are consumed before the outputs are assigned so it's safe to allocate overlapping registers. That's clearly not the case for your example. You should use an "early clobber" modifier (&) for the output. This is what the manual has to say about that:
& Means (in a particular alternative) that this operand is an
earlyclobber operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may not lie
in a register that is used as an input operand or as part of any
memory address.
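A small case where the early-clobber matters is a template that writes its output before it has finished reading its inputs. This is a sketch assuming GCC or Clang on x86; add_early is a hypothetical name and is not the code from the question.

```c
/* Hypothetical helper: the output is written by the first instruction,
 * while input b is still needed by the second one. */
static unsigned add_early(unsigned a, unsigned b)
{
    unsigned sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %[a], %[sum]\n\t"   /* writes the output first... */
             "addl %[b], %[sum]"       /* ...then still reads an input */
             : [sum] "=&r"(sum)        /* & keeps sum out of the registers holding a and b */
             : [a] "r"(a), [b] "r"(b)
             : "cc");
#else
    sum = a + b;                       /* assumption: fallback for non-x86 builds */
#endif
    return sum;
}
```

Without the &, the compiler would be allowed to put sum in the same register as b, and the first mov would then destroy b before the add reads it.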
Nobody said gcc inline asm was easy :D