How to make inline assembly work under -O1 optimization in GCC

I have the following dispatch code for my user-level thread library.
The code compiles with GCC and runs correctly without optimization, but with -O1 (or any higher level) the program crashes with a segmentation fault at run time.
The function saves the current context and jumps to the next context.
void __attribute__ ((noinline)) __lwt_dispatch(lwt_context *curr, lwt_context *next)
{
__asm__ __volatile
(
"mov 0xc(%ebp),%eax\n\t"
"mov 0x4(%eax),%ecx\n\t"
"mov (%eax),%edx\n\t"
"mov 0x8(%ebp),%eax\n\t"
"add $0x4,%eax\n\t"
"mov 0x8(%ebp),%ebx\n\t"
"push %ebp\n\t"
"push %ebx\n\t"
"mov %esp,(%eax)\n\t"
"movl $return,(%ebx)\n\t"
"mov %ecx,%esp\n\t"
"jmp *%edx\n\t"
"return: pop %ebx\n\t"
"pop %ebp\n\t"
);
}

Thanks for the help, I figured out some ways to solve it.
One is to compile this function normally (without optimization) into a separate .o file and then build the other files with -O3.
Another is to use inline assembly with operand constraints, which is much easier and simpler than this function, like below:
int foo = 10, bar = 15;
asm volatile("addl %%ebx,%%eax"
:"=a"(foo)
:"a"(foo), "b"(bar)
);
printf("foo+bar=%d\n", foo);
Another post helped me figure out the labeling problem; see "Labels in GCC inline assembly".
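On the labeling problem: a named label such as return: is emitted verbatim, so if the optimizer inlines or duplicates the asm statement, the assembler sees the same symbol twice. A minimal sketch of the usual workaround, local numeric labels (1: referenced as 1f or 1b), follows. The function name clamp_nonneg is hypothetical, and the plain C fallback is there only so the snippet builds on non-x86 targets.

```c
/* Hypothetical helper: returns x, or 0 when x is negative. */
int clamp_nonneg(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "test %0, %0\n\t"   /* set the sign flag from x */
        "jns 1f\n\t"        /* non-negative: jump forward to local label 1 */
        "xor %0, %0\n"      /* negative: zero the register */
        "1:"                /* local label, may be emitted more than once */
        : "+r"(x)           /* x is read and written in a compiler-chosen register */
        :
        : "cc");            /* test and xor modify the flags */
#else
    if (x < 0)
        x = 0;              /* portable fallback for non-x86 targets */
#endif
    return x;
}
```

Because the register is chosen by the compiler through the "+r" constraint, nothing here depends on the stack frame layout, which is what changes between -O0 and -O1.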

Related

How to set a bit in Control Register (cr0) with inline assembly?

I'm trying to set bit 30 of cr0 register with inline assembly. I'm using the following assembly in my kernel module,
__asm__ (
"mov %%cr0, %%rax\n\t"
"or 0x40000000, %%eax\n\t"
"mov %%rax, %%cr0\n\t"
::
:"%rax"
);
My module compiles but upon inserting the module my terminal freezes. From a new terminal when I try to remove the module, it shows the following,
rmmod: ERROR: Module xxx is in use
and dmesg shows the following in red color,
RIP [<ffffffffc09c604c>] hello_start+0x4c/0x1000 [ModuleName]
The questions "how to set control register 0 (cr0) bits in x86-64 using gcc assembly on linux" and "Trying to disable paging through cr0 register" discuss the same problem. I tried to follow their solutions but I just can't make it work. Where is the mistake in my inline assembly?
Update:
I have fixed the code according to @prl's suggestions. The following is my full source code:
u64 get_cr0(void){
u64 cr0;
__asm__ (
"mov %%cr0, %%rax\n\t"
"mov %%eax, %0\n\t"
: "=m" (cr0)
: /* no input */
: "%rax"
);
return cr0;
}
static int __init hello_start(void){
printk(KERN_INFO "Loading hello module...\n");
printk(KERN_INFO "Hello world\n");
printk(KERN_INFO "cr0 = 0x%8.8X\n", get_cr0());
__asm__ (
"mov %%cr0, %%rax\n\t"
"or $0x40000000, %%eax\n\t"
"mov %%rax, %%cr0\n\t"
::
:"%rax"
);
printk(KERN_INFO "cr0 after change = 0x%8.8X\n", get_cr0());
return 0;
}
static void __exit hello_end(void){
printk(KERN_INFO "Goodbye Mr.\n");
__asm__ (
"mov %%cr0, %%rax\n\t"
"and $~(0x40000000), %%eax\n\t"
"mov %%rax, %%cr0\n\t"
::
:"%rax"
);
}
Indeed, my system runs extremely slowly after loading the module. But after changing the bit I still do not see any difference in the cr0 register value. The following is the dmesg output:
[ +0.000400] Loading hello module...
[ +0.000001] Hello world
[ +0.000002] cr0 = 0x80050033
[ +0.000312] cr0 after change = 0x80050033
[ +6.085675] perf interrupt took too long (2522 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Why can't I see the change in bit 30 of the cr0 register?
Upon removing the module, I tried to clear bit 30, hoping my system would start responding normally again. But that did not seem to work; my system is still running slowly. Any thoughts on how to bring the system back to its normal state after modifying cr0?
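One thing worth checking in the code above: the asm in get_cr0() has an output but is not volatile, so GCC is allowed to assume it produces the same value every time and reuse the first result for the second call. It also copies only %eax into a 64-bit variable. A minimal sketch that avoids both, assuming x86-64 kernel code, is given here. All names (SKETCH_CR0_CD, cr0_set_cd, cr0_clear_cd, sketch_read_cr0, sketch_write_cr0) are hypothetical, and the two asm helpers are compiled only when __KERNEL__ is defined, because mov to or from cr0 faults in user space.

```c
#define SKETCH_CR0_CD (1UL << 30)   /* bit 30 of cr0: cache disable */

/* Pure helpers: compute the new value without touching the register. */
unsigned long cr0_set_cd(unsigned long cr0)   { return cr0 | SKETCH_CR0_CD; }
unsigned long cr0_clear_cd(unsigned long cr0) { return cr0 & ~SKETCH_CR0_CD; }

#if defined(__KERNEL__) && defined(__x86_64__)
/* volatile: every call re-reads cr0 instead of reusing an earlier result.
   "=r" with a 64-bit variable lets the compiler pick a full 64-bit register. */
static inline unsigned long sketch_read_cr0(void)
{
    unsigned long val;
    __asm__ volatile ("mov %%cr0, %0" : "=r"(val));
    return val;
}

/* "memory" keeps the compiler from moving memory accesses across the write. */
static inline void sketch_write_cr0(unsigned long val)
{
    __asm__ volatile ("mov %0, %%cr0" : : "r"(val) : "memory");
}
#endif
```

With these, the printk calls would print the value with %lx rather than %X. Note also that cr0 is a per-CPU register, so clearing the bit in the exit function only affects the CPU that happens to run it.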

Inline assembly multiplication "undefined reference" on inputs

I'm trying to multiply 400 by 2 with inline assembly, using the fact that imul implicitly multiplies by eax. However, I'm getting "undefined reference" errors for $1 and $2 when compiling:
int c;
int a = 400;
int b = 2;
__asm__(
".intel_syntax;"
"mov eax, $1;"
"mov ebx, $2;"
"imul %0, ebx;"
".att_syntax;"
: "=r"(c)
: "r" (a), "r" (b)
: "eax");
std::cout << c << std::endl;
Do not use fixed registers in inline asm, especially if you have not listed them as clobbers and have not made sure that inputs or outputs don't overlap them. (This part is basically a duplicate of "segmentation fault(core dumped) error while using inline assembly".)
Do not switch syntax inside inline assembly: the compiler still substitutes operands in its own syntax (AT&T by default), so they will not match the surrounding Intel code. Use -masm=intel if you want Intel syntax.
To reference operands in an asm template string, use the % prefix, not $. There's nothing special about $1; it gets treated as a symbol name, just as if you'd used my_extern_int_var. At link time, the linker doesn't find a definition for a $1 symbol.
Do not mov stuff around unnecessarily. Also remember that just because something seems to work in a certain environment, that doesn't guarantee it's correct and will work everywhere every time. This applies doubly to inline asm, so you have to be careful. Anyway, a fixed version could look like:
__asm__(
"imul %0, %1"
: "=r"(c)
: "r" (a), "0" (b)
: );
This has to be compiled with -masm=intel. Notice that b has been put into the same register as c (that is what the "0" constraint does).
"using the fact imul implicitly multiplies by eax"
That's not true for the normal 2-operand form of imul. It works the same as other instructions, doing dst *= src so you can use any register, and not waste uops writing the high half anywhere if you don't even want it.
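For readers who stay with GCC's default AT&T syntax, the same two-operand imul can be sketched without -masm=intel. Note that the operand order is reversed (source first, destination last). The function name mul_asm is hypothetical, and the C fallback exists only so the snippet builds on non-x86 targets.

```c
/* Hypothetical helper: returns a * b using the two-operand form of imul. */
int mul_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("imul %1, %0"   /* AT&T order: a *= b */
             : "+r"(a)       /* a is both input and output, any register */
             : "r"(b)        /* b in any other register */
             : "cc");        /* imul modifies the flags */
#else
    a *= b;                  /* portable fallback for non-x86 targets */
#endif
    return a;
}
```

No fixed register appears anywhere, so there is nothing to list as a register clobber and nothing for the compiler's own allocation to collide with.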

Can I use function macros inside gcc inline assembly block?

#define MOV_MACRO(R0,R1)\
{\
"mov R0, R1 \n\t"\
}
__asm__ volatile(\
MOV_MACRO(r4,r5)
:"r4","r5"\
);\
Is it possible to use a function-like macro like this in an asm block?
If not, please suggest another way to do it.
Yes, it's possible, as long as the macro expands to a string literal (the # operator stringifies each argument). But if you need such functionality, you probably shouldn't be using inline asm but a separate asm module.
#define MOV_MACRO(R0,R1) "mov " #R0 ", " #R1 "\n\t"
void foo()
{
__asm__ volatile(
MOV_MACRO(r4,r5)
::
:"r4","r5"
);
}
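One detail of the # operator that may matter here: it stringifies the argument exactly as written, so a register name that is itself a macro does not get expanded. A small sketch of the difference, using the usual two-level stringification idiom, follows. The names STR, XSTR, MOV_EXPANDED and DST_REG are hypothetical additions; only MOV_MACRO comes from the answer above.

```c
/* From the answer above: stringifies the arguments as written. */
#define MOV_MACRO(R0, R1) "mov " #R0 ", " #R1 "\n\t"

/* Two-level stringification: the argument is macro-expanded first. */
#define STR(x)  #x
#define XSTR(x) STR(x)
#define MOV_EXPANDED(R0, R1) "mov " XSTR(R0) ", " XSTR(R1) "\n\t"

/* Hypothetical alias for a register name. */
#define DST_REG r4

/* Yields "mov DST_REG, r5\n\t": the alias is NOT expanded by #. */
const char *mov_plain(void)    { return MOV_MACRO(DST_REG, r5); }

/* Yields "mov r4, r5\n\t": the alias is expanded before stringification. */
const char *mov_expanded(void) { return MOV_EXPANDED(DST_REG, r5); }
```

Both macros produce ordinary string literals, so they concatenate with neighboring literals inside an asm template in the same way.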

GCC inline assembly error: Error: junk `(%esp)' after expression

I'm studying gcc inline assembly. My environment is Windows 7 32-bit with mingw-gcc 4.6.1.
I have a problem with the 'm' constraint. Here is my C function:
static int asm_test(int a, int b)
{
int c = 0;
__asm__ __volatile__(".intel_syntax\n"
"mov eax, %1\n" //error
"mov edx, %2\n" //error
"add eax, edx\n"
"mov %0, eax\n" //error
".att_syntax"
:"=m"(c)\
:"m"(a),"m"(b)\
:"eax","edx"
);
return c;
}
In AT&T syntax, the code looks like this:
static int asm_test(int a, int b)
{
int c = 0;
__asm__ __volatile__(
"movl %1, $eax\n" //error
"movl %2, $edx\n" //error
"addl $edx, $eax\n"
"movl $eax, %0\n" //error
:"=m"(c)\
:"m"(a),"m"(b)\
:"eax","edx"
);
return c;
}
For each of the three lines that use input/output operands, gcc reports an error when compiling, like this:
C:\Users\farta\AppData\Local\Temp\cc99HxYj.s:22: Error: junk `(%esp)' after expression
If I use 'r' for the input/output constraints, the code works. But I cannot understand why it works or what the error means. Can anyone tell me? As far as I know, 'm' just tells gcc not to allocate registers but to access the operands directly in memory when the inline asm code uses them. Is this correct?
Thanks a lot.
The problem here is that GCC generates AT&T syntax constructs for %0, %1 and %2. If you look at the generated assembly, it looks like:
.intel_syntax
mov eax, 8(%ebp)
mov edx, 12(%ebp)
add eax, edx
mov -4(%ebp), eax
which is not valid Intel syntax.
Generally, you don't need explicit load/store operations in the inline assembly: just specify register constraints and the compiler will generate the loads and stores itself. This has the advantage that your code is still correct even if your variables (parameters, locals) do not reside in memory at all but live in registers, which is not the case if you hard-code memory loads and stores.
For your example, try the following code, look at the generated assembly (gcc -S), and notice how the compiler performs the moves from the argument area (e.g. the stack on x86) all by itself.
int asm_test(int a, int b)
{
__asm__ __volatile__ (
".intel_syntax\n"
"add %0, %1 \n"
".att_syntax \n"
:"+r"(a)
:"r"(b));
return a;
}
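On the 'm' question: the constraint does ask GCC for a memory operand, but the operand text is still printed in AT&T form, so it only assembles when the template is AT&T as well. A minimal sketch under that assumption (default AT&T syntax, no .intel_syntax switch) is shown here. The name add_mem is hypothetical, and the C fallback exists only so the snippet builds on non-x86 targets.

```c
/* Hypothetical helper: returns a + b, reading b straight from memory. */
int add_mem(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"   /* a += b; %1 is printed as a memory reference */
             : "+r"(a)       /* a lives in a compiler-chosen register */
             : "m"(b)        /* b must be addressed in memory */
             : "cc");        /* addl modifies the flags */
#else
    a += b;                  /* portable fallback for non-x86 targets */
#endif
    return a;
}
```

Using "rm" instead of "m" would leave the choice between register and memory to the compiler, which is usually the more flexible option.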

Error in my first assembly program (GCC Inline Assembly)

After a lot of internet research I implemented a small assembler routine in my C++ program to get the CPU's L1 cache size using cpuid.
int CPUID_getL1CacheSize() {
int l1CacheSize = -1;
asm ( "mov $5, %%eax\n\t" // EAX=80000005h: L1 Cache and TLB Identifiers
"cpuid\n\t"
"mov %%eax, %0" // eax into l1CacheSize
: "=r"(l1CacheSize) // output
: // no input
: "%eax" // clobbered register
);
return l1CacheSize;
}
It works perfectly on Windows 7 64 bit with MinGW (GCC, G++). Next I tried this on my Mac computer using GCC 4.0 and there must be an error somewhere because my program shows strange strings in the ComboBoxes and some signals cannot be connected (Qt GUI).
This is my first assembler program, I hope someone can give me a hint, Thanks!
I think that CPUID actually clobbers EAX, EBX, ECX, EDX, so it's probably just a register trashing problem. The following code appears to work OK with gcc 4.0.1 and 4.2.1 on Mac OS X:
#include <stdio.h>
int CPUID_getL1CacheSize()
{
int l1CacheSize = -1;
asm ( "mov $5, %%eax\n\t" // selects CPUID leaf 5; the L1 Cache and TLB leaf is 80000005h
"cpuid\n\t"
"mov %%eax, %0" // eax into l1CacheSize
: "=r"(l1CacheSize) // output
: // no input
: "%eax", "%ebx", "%ecx", "%edx" // clobbered registers
);
return l1CacheSize;
}
int main(void)
{
printf("CPUID_getL1CacheSize = %d\n", CPUID_getL1CacheSize());
return 0;
}
Note that you need to compile with -fno-pic as EBX is reserved when PIC is enabled. (Either that or you need to take steps to save and restore EBX).
$ gcc-4.0 -Wall -fno-pic cpuid2.c -o cpuid2
$ ./cpuid2
CPUID_getL1CacheSize = 64
$ gcc-4.2 -Wall -fno-pic cpuid2.c -o cpuid2
$ ./cpuid2
CPUID_getL1CacheSize = 64
$
I finally resolved the problem. I got a compiler error while playing around: "error: PIC register '%ebx' clobbered in 'asm'" and after some internet research I modified my code to:
int CPUID_getL1CacheSize() {
int l1CacheSize = -1;
asm volatile ( "mov $5, %%eax\n\t"
"pushl %%ebx; cpuid; popl %%ebx\n\t"
"mov %%eax, %0"
: "=r"(l1CacheSize)
:
: "%eax", "%ecx", "%edx" // cpuid also overwrites ECX and EDX
);
return l1CacheSize;
}
Thanks Paul, the compiler option -fno-pic is a nice solution too.
Greetings
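An alternative to listing the registers as clobbers is to declare all four of them as outputs, so the compiler knows exactly what cpuid writes and no extra mov is needed. The following is only a sketch: the name cpuid_query is hypothetical, and it assumes x86-64 or a compiler that does not reserve EBX for PIC (on 32-bit PIC builds with compilers as old as those in this thread, the push/pop approach above is still needed). Elsewhere it just returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: runs cpuid for (leaf, subleaf) and stores
   EAX, EBX, ECX, EDX in regs[0..3]. Returns 1 on x86, 0 elsewhere. */
int cpuid_query(unsigned leaf, unsigned subleaf, unsigned regs[4])
{
    assert(regs != NULL);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("cpuid"
                      : "=a"(regs[0]), "=b"(regs[1]),
                        "=c"(regs[2]), "=d"(regs[3])  /* all four are written */
                      : "a"(leaf), "c"(subleaf));     /* leaf in EAX, subleaf in ECX */
    return 1;
#else
    regs[0] = regs[1] = regs[2] = regs[3] = 0;        /* no cpuid off x86 */
    return 0;
#endif
}
```

Called as cpuid_query(0, 0, regs), regs[0] holds the highest basic leaf the CPU supports, which is a reasonable check to make before querying any other leaf.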
