With inline assembly in GCC, you can specify an immediate asm operand with the "i" constraint, like so:
void set_to_five(int* p)
{
asm
(
"movl %1, (%0);"
:: "r" (p)
, "i" (5)
);
}
int main()
{
int i;
set_to_five(&i);
assert(i == 5);
}
Nothing wrong with this so far, except that it's in horrible AT&T syntax. So let's try again with .intel_syntax noprefix:
void set_to_five(int* p)
{
asm
(
".intel_syntax noprefix;"
"mov [%0], %1;"
".att_syntax prefix;"
:: "r" (p)
, "i" (5)
);
}
But this doesn't work, since the compiler inserts a $ prefix before the immediate value, which the assembler no longer understands.
How do I use the "i" constraint with Intel syntax?
You should be able to use %c1 (see modifiers).
Note that if you are using symbolic names (which I find easier to read/maintain), you can use %c[five].
Lastly, I realize code this is just a "for-instance," but you are modifying memory without telling the compiler. This is a "bad thing." Consider either using a output constraint for the memory ("=m") or adding the "memory" clobber.
Related
I'm learning ARM inline assembly, and is confused about a very simple function: assign the value of x to y (both are int type), on arm32 and arm64 why different clobber description required?
Here is the code:
#include <arm_neon.h>
#include <stdio.h>
void asm_test()
{
int x = 10;
int y = 0;
#ifdef __aarch64__
asm volatile(
"mov %w[in], %w[out]"
: [out] "=r"(y)
: [in] "r"(x)
: "r0" // r0 not working, but r1 or x1 works
);
#else
asm volattile(
"mov %[in], %[out]"
: [out] "=r"(y)
: [in] "r"(x)
: "r0" // r0 works, but r1 not working
);
#endif
printf("y is %d\n", y);
}
int main() {
arm_test();
return 0;
}
Tested on my rooted android phone, for arm32, r0 generates correct result but r1 won't. For arm64, r1 or x1 generate correct result, and r0 won't. Why on arm32 and arm64 they are different? What is the concrete rule for this and where can I find it?
ARM / AArch64 syntax is mov dst, src
Your asm statement only works if the compiler happens to pick the same register for both "=r" output and "r" input (or something like that, given extra copies of x floating around).
Different clobbers simply perturb the compiler's register-allocation choices. Look at the generated asm (gcc -S or on https://godbolt.org/, especially with -fverbose-asm.)
Undefined Behaviour from getting the constraints mismatched with the instructions in the template string can still happen to work; never assume that an asm statement is correct just because it works with one set of compiler options and surrounding code.
BTW, x86 AT&T syntax does use mov src, dst, and many GNU C inline-asm examples / tutorials are written for that. Assembly language is specific to the ISA and the toolchain, but a lot of architectures have an instruction called mov. Seeing a mov does not mean this is an ARM example.
Also, you don't actually need a mov instruction to use inline asm to copy a valid. Just tell the compiler you want the input to be in the same register it picks for the output, whatever that happens to be:
// not volatile: has no side effects and produces the same output if the input is the same; i.e. the output is a pure function of the input.
asm (""
: "=r"(output) // pick any register
: "0"(input) // pick the same register as operand 0
: // no clobbers
);
Trying to multiply 400 by 2 with inline assembly, using the fact imul implicity multiplies by eax. However, i'm getting "undefined reference" compile errors to $1 and $2
int c;
int a = 400;
int b = 2;
__asm__(
".intel_syntax;"
"mov eax, $1;"
"mov ebx, $2;"
"imul %0, ebx;"
".att_syntax;"
: "=r"(c)
: "r" (a), "r" (b)
: "eax");
std::cout << c << std::endl;
Do not use fixed registers in inline asm, especially if you have not listed them as clobbers and have not made sure inputs or outputs don't overlap them. (This part is basically a duplicate of segmentation fault(core dumped) error while using inline assembly)
Do not switch syntax in inline assembly as the compiler will substitute wrong syntax. Use -masm=intel if you want intel syntax.
To reference arguments in an asm template string use % not $ prefix. There's nothing special about $1; it gets treated as a symbol name just like if you'd used my_extern_int_var. When linking, the linker doesn't find a definition for a $1 symbol.
Do not mov stuff around unnecessarily. Also remember that just because something seems to work in a certain environment, that doesn't guarantee it's correct and will work everywhere every time. Doubly so for inline asm. You have to be careful. Anyway, a fixed version could look like:
__asm__(
"imul %0, %1"
: "=r"(c)
: "r" (a), "0" (b)
: );
Has to be compiled using -masm=intel. Notice b has been put into the same register as c.
using the fact imul implicity multiplies by eax
That's not true for the normal 2-operand form of imul. It works the same as other instructions, doing dst *= src so you can use any register, and not waste uops writing the high half anywhere if you don't even want it.
I'm trying to write a custom memory-copy function for AVR as inline assembly, because avr-gcc will always use a loop for memcpy and struct assignment, which is inefficient in terms of time. I want to use memory operands to avoid having to add a "memory" clobber. I currently have this:
void copy_2_bytes (char *restrict dst, char *restrict src)
{
struct S {
char x[2];
};
__asm__(
" ld __tmp_reg__,%[src]+\n"
" st %[dst]+,__tmp_reg__\n"
" ld __tmp_reg__,%[src]+\n"
" st %[dst]+,__tmp_reg__\n"
: [dst] "=m" ( *(struct S *)dst )
: [src] "m" ( *(struct S *)src )
);
}
This compiles, but it's incorrect in general because it modifies the pointer register pairs corresponding to the memory operands. It's easy to see that gcc assumes that the registers stay unchanged, for example by adding "*dst = 0;" after the assembly.
On the other hand, the Y and Z registers support the "ldd" and "std" instructions, which also take an immediate offset, so they can be used to access multiple bytes without being modified. But then there doesn't seem to be a way to force gcc to not select the X register, which doesn't support that.
UPDATE
Actually, if gcc determines that the address of the memory operand is constant, it will pass the constant address into the assembly, instead of a register pair. So now, I have absolutely no idea how to deal with this. Are there some magic instructions or assembly macros which can deal with both pointer registers and constant addresses at the same time?
I am trying to make an earlier verion Linux got compiled, you can download the source code from git://github.com/azru0512/linux-0.12.git. While compiling ''kernel/blk_drv/ramdisk.c'', I got error message below,
ramdisk.c:36:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:40:10: error: can't find a register in class 'CREG' while reloading 'asm'
ramdisk.c:36:10: error: 'asm' operand has impossible constraints
ramdisk.c:40:10: error: 'asm' operand has impossible constraints
What in ramdisk.c are,
if (CURRENT-> cmd == WRITE) {
(void) memcpy(addr,
CURRENT->buffer,
len);
} else if (CURRENT->cmd == READ) {
(void) memcpy(CURRENT->buffer,
addr,
len);
} else
panic("unknown ramdisk-command");
And the memcpy is,
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__("cld\n\t"
"rep\n\t"
"movsb"
::"c" (n),"S" (src),"D" (dest)
:"cx","si","di");
return dest;
}
I guess it's memcpy (include/string.h) inline asm problem, so I remove the clobber list from it but without luck. Could you help me to find out what's going wrong? Thanks!
GCC's syntax for this has changed / evolved a bit.
You must now specify each of the special target registers as an output operand:
...("...instructions..."
: "=c"(n), "=S"(src), "=D"(dest)
and then additionally as the same registers as source operands:
: "0"(n), "1"(src), "2"(dest)
and finally you need to clobber "memory" (I can't remember offhand if this affects condition codes, if so you would also need "cc"):
: "memory")
Next, because this instruction should not be moved or deleted, you need to use either volatile or __volatile__ (I'm not entirely sure why but without this the instructions were deleted, in my test-case).
Last, it's no longer a good idea to attempt to override memcpy because gcc "knows" how to implement the function. You can override gcc's knowledge with -fno-builtin.
This compiles (for me anyway, with a somewhat old gcc on an x86-64 machine):
extern inline void * memcpy(void * dest,const void * src, int n)
{
__asm__ volatile("cld\n\t"
"rep\n\tmovsb\n\t"
: "=c" (n), "=S" (src), "=D" (dest)
: "0" (n), "1" (src), "2" (dest)
: "memory", "cc");
return dest;
}
This exact problem & its reasons are discussed on GCC's bugzilla :
Bug 43998 - inline assembler: can't set clobbering for input register
gcc wont allow input & output registers as clobbers.
If you corrupt input register, do a dummy output to same register :
unsigned int operation;
unsigned int dummy;
asm ("cpuid" : "=a" (dummy) : "0" ( operation) :);
I'd like to use two array passed into a C function as below in assembly using a GCC compiler (Xcode on Mac). It has been many years since I've written assembly, so I'm sure this is an easy fix.
The first line here is fine. The second line fails. I'm trying to do the following, A[0] += x[0]*x[0], and I want to do this for many elements in the array with different indices. I'm only showing one here. How do I use a read/write array in the assembly block?
And if there is a better approach to do this, I'm open ears.
inline void ArrayOperation(float A[36], const float x[8])
{
float tmp;
__asm__ ( "fld %1; fld %2; fmul; fstp %0;" : "=r" (tmp) : "r" (x[0]), "r" (x[0]) );
__asm__ ( "fld %1; fld %2; fadd; fstp %0;" : "=r" (A[0]) : "r" (A[0]), "r" (tmp) );
// ...
}
The reason why the code fails is not because of arrays, but because of the way fld and fst instructions work. This is the code you want:
float tmp;
__asm__ ( "flds %1; fld %%st(0); fmulp; " : "=t" (tmp) : "m" (x[0]) );
__asm__ ( "flds %1; fadds %2;" : "=t" (A[0]) : "m" (A[0]), "m" (tmp) );
fld and fst instructions need a memory operand. Also, you need to specify if you want to load float (flds), double (fldl) or long double (fldt). As for the output operands, I just use a constraint =t, which simply tells the compiler that the result is on the top of the register stack, i.e. ST(0).
Arithmetic operations have either no operands (fmulp), or a single memory operand (but then you have to specify the size again, fmuls, fadds etc.).
You can read more about inline assembler, GNU Assembler in general, and see the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
Of course, it is best to get rid of the temporary variable:
__asm__ ( "flds %1; fld %%st(0); fmulp; fadds %2;" : "=t" (A[0]) : "m" (x[0]), "m" (A[0]));
Though if a performance improvement is what you're after, you don't need to use assembler. GCC is completely capable of producing this code. But you might consider using vector SSE instructions and other simple optimization techniques, such as breaking the dependency chains in the calculations, see Agner Fog's optimization manuals