Which inline assembly code is correct for rdtscp? - gcc

Disclaimer: Words cannot describe how much I detest AT&T style syntax
I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.
The first version I used was
static unsigned long long rdtscp(void)
{
unsigned int hi, lo;
__asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
return (unsigned long long)lo | ((unsigned long long)hi << 32);
}
I notice there is no 'clobbering' stuff in this version. Whether or not this is a problem I don't know... I suppose it depends if the compiler inlines the function or not. Using this version causes me problems that aren't always reproducible.
The next version I found is
static unsigned long long rdtscp(void)
{
unsigned long long tsc;
__asm__ __volatile__(
"rdtscp;"
"shl $32, %%rdx;"
"or %%rdx, %%rax"
: "=a"(tsc)
:
: "%rcx", "%rdx");
return tsc;
}
This is reassuringly unreadable and official looking, but like I said my issue isn't always reproducible so I'm merely trying to rule out one possible cause of my problem.
The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.
What's correct... version 1, or version 2, or both?

Here's C++ code that will return the TSC and store the auxiliary 32-bits into the reference parameter
static inline uint64_t rdtscp( uint32_t & aux )
{
uint64_t rax,rdx;
asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
return (rdx << 32) + rax;
}
It is better to do the shift and add to merge both 32-bit halves in C++ statement rather than inline, this allows the compiler to schedule those instructions as it sees fit.

According to this, this operation clobbers EDX and ECX. You need to mark those registers as clobbered which is what the second one does. BTW, is this the link where you got the above code or did you find it elsewhere? It also shows a few other variaitions for timings as well which is pretty neat.

Related

Confusion about different clobber description for arm inline assembly

I'm learning ARM inline assembly, and is confused about a very simple function: assign the value of x to y (both are int type), on arm32 and arm64 why different clobber description required?
Here is the code:
#include <arm_neon.h>
#include <stdio.h>
void asm_test()
{
int x = 10;
int y = 0;
#ifdef __aarch64__
asm volatile(
"mov %w[in], %w[out]"
: [out] "=r"(y)
: [in] "r"(x)
: "r0" // r0 not working, but r1 or x1 works
);
#else
asm volattile(
"mov %[in], %[out]"
: [out] "=r"(y)
: [in] "r"(x)
: "r0" // r0 works, but r1 not working
);
#endif
printf("y is %d\n", y);
}
int main() {
arm_test();
return 0;
}
Tested on my rooted android phone, for arm32, r0 generates correct result but r1 won't. For arm64, r1 or x1 generate correct result, and r0 won't. Why on arm32 and arm64 they are different? What is the concrete rule for this and where can I find it?
ARM / AArch64 syntax is mov dst, src
Your asm statement only works if the compiler happens to pick the same register for both "=r" output and "r" input (or something like that, given extra copies of x floating around).
Different clobbers simply perturb the compiler's register-allocation choices. Look at the generated asm (gcc -S or on https://godbolt.org/, especially with -fverbose-asm.)
Undefined Behaviour from getting the constraints mismatched with the instructions in the template string can still happen to work; never assume that an asm statement is correct just because it works with one set of compiler options and surrounding code.
BTW, x86 AT&T syntax does use mov src, dst, and many GNU C inline-asm examples / tutorials are written for that. Assembly language is specific to the ISA and the toolchain, but a lot of architectures have an instruction called mov. Seeing a mov does not mean this is an ARM example.
Also, you don't actually need a mov instruction to use inline asm to copy a valid. Just tell the compiler you want the input to be in the same register it picks for the output, whatever that happens to be:
// not volatile: has no side effects and produces the same output if the input is the same; i.e. the output is a pure function of the input.
asm (""
: "=r"(output) // pick any register
: "0"(input) // pick the same register as operand 0
: // no clobbers
);

GCC Inline Assembly 'Nd' constraint

I'm developing a small toy kernel in C. I'm at the point where I need to get user input from the keyboard. So far, I have implemented inb using the following code:
static inline uint8_t inb(uint16_t port) {
uint8_t ret;
asm volatile("inb %1, %0" : "=a"(ret) : "Nd"(port));
return ret;
}
I know that the "=a" constraint means that al/ax/eax will be copied to ret as output, but I'm still confused about the "Nd" constraint. Can anyone provide some insight on why this constraint is necessary? Or why I can't just use a general purpose register constraint like "r" or "b"? Any help would be appreciated.
The in instruction (returning a byte) can either take an immediate 8 bit value as a port number, or a port specified in the dx register. More on the in instruction can be found in the instruction reference (Intel syntax) . The machine constraints being used can be found in the GCC docs . If you scroll down to x86 family you'll see:
d
The d register
N
Unsigned 8-bit integer constant (for in and out instructions).

can't find a register in class 'CREG' while reloading 'asm' - memcpy inline asm

I am trying to make an earlier verion Linux got compiled, you can download the source code from git://github.com/azru0512/linux-0.12.git. While compiling ''kernel/blk_drv/ramdisk.c'', I got error message below,
ramdisk.c:36:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:40:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:36:10: error: 'asm' operand has impossible constraints
ramdisk.c:40:10: error: 'asm' operand has impossible constraints
What in ramdisk.c are,
if (CURRENT-> cmd == WRITE) {
(void) memcpy(addr,
CURRENT->buffer,
len);
} else if (CURRENT->cmd == READ) {
(void) memcpy(CURRENT->buffer,
addr,
len);
} else
panic("unknown ramdisk-command");
And the memcpy is,
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__("cld\n\t"
"rep\n\t"
"movsb"
::"c" (n),"S" (src),"D" (dest)
:"cx","si","di");
return dest;
}
I guess it's memcpy (include/string.h) inline asm problem, so I remove the clobber list from it but without luck. Could you help me to find out what's going wrong? Thanks!
GCC's syntax for this has changed / evolved a bit.
You must now specify each of the special target registers as an output operand:
...("...instructions..."
: "=c"(n), "=S"(src), "=D"(dest)
and then additionally as the same registers as source operands:
: "0"(n), "1"(src), "2"(dest)
and finally you need to clobber "memory" (I can't remember offhand if this affects condition codes, if so you would also need "cc"):
: "memory")
Next, because this instruction should not be moved or deleted, you need to use either volatile or __volatile__ (I'm not entirely sure why but without this the instructions were deleted, in my test-case).
Last, it's no longer a good idea to attempt to override memcpy because gcc "knows" how to implement the function. You can override gcc's knowledge with -fno-builtin.
This compiles (for me anyway, with a somewhat old gcc on an x86-64 machine):
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__ volatile("cld\n\t"
"rep\n\tmovsb\n\t"
: "=c" (n), "=S" (src), "=D" (dest)
: "0" (n), "1" (src), "2" (dest)
: "memory", "cc");
return dest;
}
This exact problem & its reasons are discussed on GCC's bugzilla :
Bug 43998 - inline assembler: can't set clobbering for input register
gcc wont allow input & output registers as clobbers.
If you corrupt input register, do a dummy output to same register :
unsigned int operation;
unsigned int dummy;
asm ("cpuid" : "=a" (dummy) : "0" ( operation) :);

Loading SSE registers

I'm working on homework project for OS development class. One task is to save context of SSE registers upon interrupt. Now, saving and restoring context is easy (fxsave/fxsave). But I have problem with testing. I want to put same sample date into one of registers, but all I get is error interrupt 6. Here is code:
// load some SSE registers
struct Vec4 {
int x, y, z, w;
} vec = { 0, 1, 2, 3 };
asm volatile ( "movl %0, %%eax"
: /* no output */
: "r"( &vec )
:
);
asm volatile ( "movups (%eax), %xmm0" );
I searched on internet for solution. All I got is that it might something to do with effective address space. But I don't know what it is.
You need to use a memory operand as a constraint in the inline assembly. This is much better than generating the address by yourself (as you tried with the & operator) and loading in in a register, because the latter will not work if the address is rip relative or relocatable.
asm volatile ( "movups %0, %%xmm0"
: /* no output */
: "m"( vec )
:
);
And you need to use two "%%" before register names.
Read more about gcc's constraints here: http://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints . The title is somewhat misleading, as this concept is far from simple :-)
I found out what is problem. Execution of SSE instructions must be enabled by setting some flags in CR0 and CR4 registers. More info here: http://wiki.osdev.org/SSE
You're making this way harder than it needs to be - just use the intrinsics in the *mmintrin.h headers, e.g.
#include <emmintrin.h>
__m128i vec = _mm_set_epi32(3, 2, 1, 0);
If you need to put this in a specific XMM register then use the above example as a starting point, then generate asm, e.g. using gcc -S and use the generated asm as a template for your own code.

Any constraints to refer to the high half of a register in gcc inline assembly?

In my C code there are some inlined assembly calling PCI BIOS service. Now the problem is that one of the results is returned in the %ah register, but I can't find a constrant to refer to that register.
What I want is to write like following:
asm("lcall *%[call_addr]" : "something here"(status) :);
and the variable status contains the value of %ah register.
If I use "=a"(status) and add a mov %%ah, %%al instruction it will work. But it seems ugly.
Any suggestions?
I don't think there is a way to specify %ah in a constraint - the GCC x86 backend knows that sub-registers contain particular parts of values, but doesn't really treat them as independent entities.
Your approach will work; another option is to shift status down outside the inline assembler. e.g. this:
unsigned int foo(void)
{
unsigned int status;
asm("movb $0x12, %%ah; movb $0x34, %%al" : "=a"(status) : );
return (status >> 8) & 0xff;
}
...implements the (status >> 8) & 0xff as a single movzbl %ah, %eax instruction.
A third option is to use a small structure:
unsigned int bar(void)
{
struct { uint8_t al; uint8_t ah; } status;
asm("movb $0x12, %%ah; movb $0x34, %%al" : "=a"(status) : );
return status.ah;
}
I'm not sure whether this is nicer or not - it seems a little more self-documenting, but the use of a small structure with a register constraint looks less obviously correct. However, it generates the same code as foo() above.
(Disclaimer: code generation tested with gcc 4.3.2 only; results may well differ on other versions.)

Resources