GCC with the -fomit-frame-pointer option

I'm using GCC with the -fomit-frame-pointer and -O2 options. When I looked through the assembly code it generated,
push %ebp
movl %esp, %ebp
at the start and pop %ebp at the end are removed. But some redundant subl/addl instructions on %esp are left in: subl $12, %esp at the start and addl $12, %esp at the end.
How can I remove them? Some inline assembly will jmp to another function before the addl is executed.

You probably don't want to remove those -- that's usually the code that allocates and deallocates your local variables. If you remove those, your code will trample all over the return addresses and such.
The only safe way to get rid of them is not to use any local variables. Even in macros. And be really careful about inline functions, as they often have their own locals that'll get put in with yours. You may want to consider explicitly disabling function inlining for that section of code, if you can.
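As a rough illustration of the "disable inlining" suggestion, here is a minimal sketch assuming GCC or Clang, both of which accept the noinline function attribute. The names scale_and_offset, call_helper and demo_noinline are hypothetical and exist only for this example. Whether the caller ends up with no stack adjustment still depends on the compiler version and flags, so check the generated assembly.

```c
#include <assert.h>

/* Hypothetical helper. noinline asks GCC/Clang to keep this a real call,
   so its locals stay in its own frame instead of being merged into the
   caller's. */
__attribute__((noinline))
static int scale_and_offset(int x)
{
    int scaled = x * 3;       /* local variable owned by this function */
    int offset = scaled + 1;
    return offset;
}

/* The caller declares no locals of its own, which gives the compiler
   less reason to reserve stack space here (not guaranteed). */
int call_helper(int x)
{
    return scale_and_offset(x);
}

/* Usage sketch. */
void demo_noinline(void)
{
    assert(call_helper(2) == 7);   /* 2 * 3 + 1 */
}
```

Compiling with gcc -O2 -S and comparing the output with and without the attribute shows whether the helper's locals were folded into the caller.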
If you're absolutely sure that the adds and subs aren't needed (and I mean really, really sure), note that on my machine GCC apparently does some stack manipulation to keep the stack aligned on 16-byte boundaries. You may be able to pass -mpreferred-stack-boundary=2, which aligns to 4-byte (2^2) boundaries -- which x86 processors like to do anyway, so no code is generated to realign it. It works on my box with my GCC; int main() { return 0; } turned into
main:
xorl %eax, %eax
ret
but the alignment code looked different to start with...so that may not be the problem for you.
Just so you're warned: optimization causes a lot of weird stuff like that to happen. Be careful with hand-coded assembler language and optimized <insert-almost-any-language-here> code, especially when you're doing something as unstructured as a jump from the middle of one function into another.

I solved the problem by giving a function prototype, then defining it manually like this:
void my_function();
asm (
".globl _my_function\n"
"_my_function:\n\t"
/* Assembler instructions go here */
);
Later I also wanted the function to be exported, so I added this at the end of the source file:
asm (
".section .drectve\n\t"
".ascii \" -export:my_function\"\n"
);

How will I be able to remove them as some inline assembly will jmp to another function before addl is executed.
This will corrupt your stack: the caller expects the stack pointer to be corrected when the function returns. Does the other function return with a ret instruction? What exactly are you trying to achieve? Maybe there's another solution.
Please show us the lines around the function call (in the caller) and the entry/exit part of the function in question.

Related

0xbffff8a8: aam $-0x8 error when saving the base pointer

I am currently following an introductory course in microelectronics and assembly programming in Uni. At the beginning of every function, I'm saving the caller's base pointer by pushing it onto the stack. Given the following function, I get an error:
.globl my_func
.globl _my_func
my_func:
_my_func:
pushl %ebp
movl %esp,%ebp
movl 4(%esp),%ebx
subl $1,%ebx
movl %ebx,%eax
ret
0xbffff8a8: aam $-0x8 <-EXC_BAD_ACCESS (code=2, address=0xbffff8a8)
I've figured out this is a memory exception, I just don't understand why it's being thrown. When I skip the first two instructions in the function (the base pointer saving), the function runs well. And before you point it out -- yes, I know the function is pointless and slow, I'm just trying to learn how the instructions work, and how to use the stack and registers.
I'm assembling it for IA32 on an Intel Mac with OS X 10.9 using LLVM 5.1.
You need to reset the stack pointer at the end of the function, either explicitly or by popping a register to match what you pushed at the start of the function, otherwise when you return it will be to an invalid address:
popl %ebp # restore %ebp; %esp now points at the return address again
ret

Why is an empty function not just a return

If I compile an empty C function
void nothing(void)
{
}
using gcc -O2 -S (and clang) on MacOS, it generates:
_nothing:
pushq %rbp
movq %rsp, %rbp
popq %rbp
ret
Why does gcc not remove everything but the ret? It seems like an easy optimisation to make unless it really does something (seems not to, to me). This pattern (push/move at the beginning, pop at the end) is also visible in other non-empty functions where rbp is otherwise unused.
On Linux using a more recent gcc (4.4.5) I see just
nothing:
rep
ret
Why the rep ? The rep is absent in non-empty functions.
Why the rep ?
The reasons are explained in this blog post. In short, jumping directly to a single-byte ret instruction would mess up the branch prediction on some AMD processors. And rather than adding a nop before the ret, a meaningless prefix byte was added to save instruction decoding bandwidth.
The rep is absent in non-empty functions.
To quote from the blog post I linked to: "[rep ret] is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...)".
In the case of an empty function, the ret would have been the direct target of a call. In a non-empty function, it wouldn't be.
Why does gcc not remove everything but the ret?
It's possible that some compilers won't omit frame pointer code even if you've specified -O2. At least with gcc, you can explicitly tell the compiler to omit them by using the -fomit-frame-pointer option.
As explained here: http://support.amd.com/us/Processor_TechDocs/25112.PDF, a two-byte near-return instruction (i.e. rep ret) is used because a single-byte return can be mispredicted on some amd64 processors in some situations, such as this one.
If you fiddle around with the processor targeted by gcc you may find that you can get it to generate a single-byte ret. -mtune=nocona worked for me.
I suspect your last listing is a bug, as johnfound says. The first listing appears because a C compiler always follows the _cdecl calling convention, which for a function means the following (in Intel syntax; sorry, I don't know the AT&T syntax):
Function Definition
_functionA:
push rbp
mov rbp, rsp
;Some function
pop rbp
ret
In the caller:
call _functionA
add esp, 0 ; caller removes the arguments; when there are none, a compiler can strip this
Why does GCC always follow the _cdecl calling convention, even when doing so is pointless? Because the compiler isn't smarter than an advanced assembly programmer, so it follows _cdecl at all costs.
That is because even so-called "optimizing compilers" are too dumb to always generate good machine code.
They can't generate better code than their creators made them generate.
Since an empty function is nonsense, they probably simply didn't bother to optimize it, or even to detect this very special case.
A lone "rep" prefix is probably a bug, though. It does nothing when used without a string instruction, but on some newer CPUs it could theoretically cause an exception (and IMHO it should).

Why does GCC add assembly commands to my inline assembly?

I'm using Apple's llvm-gcc to compile some code with inline assembly. I wrote what I want it to do, but it adds extraneous commands that keep writing variables to memory. Why is it doing this and how can I stop it?
Example:
__asm__{
mov r11, [rax]
and r11, 0xff
cmp r11, '\0'
}
becomes (in the "assembly" assistant view):
mov 0(%rax), %r11 // correct
movq %r11, -104(%rbp) // no, GCC, obviously wrong
and $255, %r11
movq %r11, -104(%rbp)
cmp $0, %r11
Cheers.
You need to use GCC's extended asm syntax to tell it which registers you're using as input and output and which registers get clobbered. If you don't do that, it has no idea what you're doing, and the assembly it generates can easily interfere with your code.
By informing it about what your code is doing, it changes how it does register allocation and optimization and avoids breaking your code.
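To make the extended asm advice concrete, here is a minimal sketch of the question's load-and-mask sequence, assuming GCC or Clang on x86-64 with AT&T syntax. The function name low_byte_is_zero and the portable fallback branch are illustrative assumptions and are not from the original post. The operand lists tell the compiler which value is produced, which memory is read, and that the flags are modified, so it has no reason to spill around the asm.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the low byte of *p is zero. */
int low_byte_is_zero(const uint64_t *p)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t v;
    __asm__ ("movq %1, %0\n\t"     /* v = *p */
             "andq $0xff, %0"      /* v &= 0xff */
             : "=r"(v)             /* output: any general-purpose register */
             : "m"(*p)             /* input: the memory that p points to */
             : "cc");              /* clobber: andq modifies the flags */
    return v == 0;
#else
    /* Fallback for other targets (an assumption added for portability). */
    return (*p & 0xff) == 0;
#endif
}

/* Usage sketch. */
void demo_low_byte(void)
{
    uint64_t with_zero_byte = 0x1200;   /* low byte is 0x00 */
    uint64_t with_letter = 0x41;        /* low byte is 'A' */
    assert(low_byte_is_zero(&with_zero_byte));
    assert(!low_byte_is_zero(&with_letter));
}
```

The comparison is left to C here so the compiler can pick the branch or setcc sequence itself; only the part that needs specific instructions sits inside the asm.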
It's because GCC tries to optimize your code. You can prevent optimizations by adding -O0 to the command line.
Try adding volatile after __asm__ if you don't want that. Those additional commands are probably part of the previous/next C statements. Without volatile, the compiler is allowed to do this (as it probably executes faster this way -- not your code, the whole routine).

Why does a printf() stop a crash from occurring?

I have been looking all over the Internet for an answer to this question (see subject of post). I have been asked this exact question twice: once at an interview for a company and once by a friend, and I cannot find the answer for the life of me.
I have actually experienced this error on multiple occasions when debugging without a debugger, and just using print statements to isolate the error. I cannot recall any exact situations, though I am positive I have experienced it. If anyone can provide a link or a reference or point me to something in printf() source that might cause an error to stop occurring when using print statements to debug code I would greatly appreciate the good read.
Thank you,
Matthew Hoggan
I am currently reading the link provided but for further conversation I have posted some of my weak attempts to investigate:
Okay, so I have started to play around myself to try to answer my own question, but things are still not 100% clear to me. Below is the output from the g++ compiler using the -S option to output the assembly instead of the executable. The equivalent C++ code is also posted below. My goal is to recreate a simple scenario and then work out from the instructions what might be happening at the processor level. So let's say that right after the "call printf" assembly code, which I assume is linked from the library files stored in /usr/lib or another lib directory, I tried to access a NULL pointer (not in this code), or performed some other operation that would traditionally crash the program. Would I have to find out what printf is doing instruction-wise to get a deeper look into this?
.file "assembly_test_printf.cpp"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
movl $0, 28(%esp)
movl $.LC0, (%esp)
call printf
movl 28(%esp), %eax
leave
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5"
.section .note.GNU-stack,"",@progbits
Equivalent C++ code:
#include <stdio.h>
int main ( int argc, char** argv ) {
int x = 0;
printf ("Hello World");
return x;
}
There are several reasons adding a printf() can change the behavior of a bug. Some of the more common ones might be:
changing the timing of execution (particularly for threading bugs)
changing memory use patterns (the compiler might change how the stack is used)
changing how registers are used
For example, an uninitialized local variable might be allocated to a register. Before adding the printf(), the uninitialized variable is used and gets some garbage value that's in the register (maybe the result of a previous call to rand(), so it really is indeterminate). Adding the printf() causes the register to be used in printf(), and printf() always happens to leave that register set to 0 (or whatever). Now your buggy program is still buggy, but with different behavior. And maybe that behavior happens to be benign.
I've seen it before, for example in Java, in cases where initialization code isn't complete when another thread attempts to access an object assumed to have already been created. The System.out.println() slows down the other thread enough for the initialization to complete.

cdecl calling convention -- compiled asm instructions cause crash

Treat this more as pseudocode than anything. If there's some macro or other element that you feel should be included, let me know.
I'm rather new to assembly. I programmed on a pic processor back in college, but nothing since.
The problem here (segmentation fault) is the first instruction after "Compile function entrance, setup stack frame." or "push %ebp". Here's what I found out about those two instructions:
http://unixwiz.net/techtips/win32-callconv-asm.html
Save and update the %ebp :
Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp // ebp « esp
Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
Here's the code. This is from a JIT compiler for a project I'm working on. I'm doing this more for the learning experience than anything.
IL_CORE_COMPILE(avs_x86_compiler_compile)
{
X86GlobalData *gd = X86_GLOBALDATA(ctx);
ILInstruction *insn;
avs_debug(print("X86: Compiling started..."));
/* Initialize X86 Assembler opcode context */
x86_context_init(&gd->ctx, 4096, 1024*1024);
/* Compile function entrance, setup stack frame*/
x86_emit1(&gd->ctx, pushl, ebp);
x86_emit2(&gd->ctx, movl, esp, ebp);
/* Setup floating point rounding mode to integer truncation */
x86_emit2(&gd->ctx, subl, imm(8), esp);
x86_emit1(&gd->ctx, fstcw, disp(0, esp));
x86_emit2(&gd->ctx, movl, disp(0, esp), eax);
x86_emit2(&gd->ctx, orl, imm(0xc00), eax);
x86_emit2(&gd->ctx, movl, eax, disp(4, esp));
x86_emit1(&gd->ctx, fldcw, disp(4, esp));
for (insn=avs_il_tree_base(tree); insn != NULL; insn = insn->next) {
avs_debug(print("X86: Compiling instruction: %p", insn));
compile_opcode(gd, obj, insn);
}
/* Restore floating point rounding mode */
x86_emit1(&gd->ctx, fldcw, disp(0, esp));
x86_emit2(&gd->ctx, addl, imm(8), esp);
/* Cleanup stack frame */
x86_emit0(&gd->ctx, emms);
x86_emit0(&gd->ctx, leave);
x86_emit0(&gd->ctx, ret);
/* Link machine */
obj->run = (AvsRunnableExecuteCall) gd->ctx.buf;
return 0;
}
And when obj->run is called, it's called with obj as its only argument:
obj->run(obj);
If it helps, here are the instructions for the entire function call. It's basically an assignment operation: foo=3*0.2;. foo is pointing to a float in C.
0x8067990: push %ebp
0x8067991: mov %esp,%ebp
0x8067993: sub $0x8,%esp
0x8067999: fnstcw (%esp)
0x806799c: mov (%esp),%eax
0x806799f: or $0xc00,%eax
0x80679a4: mov %eax,0x4(%esp)
0x80679a8: fldcw 0x4(%esp)
0x80679ac: flds 0x806793c
0x80679b2: fsts 0x805f014
0x80679b8: fstps 0x8067954
0x80679be: fldcw (%esp)
0x80679c1: add $0x8,%esp
0x80679c7: emms
0x80679c9: leave
0x80679ca: ret
Edit: Like I said above, in the first instruction in this function, %ebp is void. This is also the instruction that causes the segmentation fault. Is that because it's void, or am I looking for something else?
Edit: Scratch that. I keep typing edp instead of ebp. Here are the values of ebp and esp.
(gdb) print $esp
$1 = (void *) 0xbffff14c
(gdb) print $ebp
$3 = (void *) 0xbffff168
Edit: Those values above are wrong. I should have used the 'x' command, like below:
(gdb) x/x $ebp
0xbffff168: 0xbffff188
(gdb) x/x $esp
0xbffff14c: 0x0804e481
Here's a reply from someone on a mailing list regarding this. Anyone care to illuminate what he means a bit? How do I check to see how the stack is set up?
An immediate problem I see is that the stack pointer is not properly aligned. This is 32-bit code, and the Intel manual says that the stack should be aligned at 32-bit addresses. That is, the least significant digit in esp should be 0, 4, 8, or c.
I also note that the values in ebp and esp are very far apart. Typically, they contain similar values -- addresses somewhere in the stack.
I would look at how the stack was set up in this program.
He replied with corrections to the above comments. He was unable to see any problems after further input.
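The alignment rule from the quoted reply can be checked with a little arithmetic on the esp value that gdb printed earlier (0xbffff14c). The sketch below is only an illustration: the helper names is_aligned and demo_alignment are hypothetical. An address is aligned to a power-of-two boundary when its low bits are all zero.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align. align must be a power of two. */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Usage sketch with the esp value from the gdb session above. */
void demo_alignment(void)
{
    uintptr_t esp = 0xbffff14cu;
    assert(is_aligned(esp, 4));      /* last hex digit is c: 4-byte aligned */
    assert(!is_aligned(esp, 16));    /* low four bits are 0xc, not zero */
}
```

So the reported esp does end in one of 0, 4, 8, or c and satisfies the 32-bit alignment rule, which is consistent with the reply later being corrected.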
Another edit: Someone replied that the code page may not be marked executable. How can I ensure it is marked as such?
The problem had nothing to do with the code. Adding -z execstack to the linker fixed the problem.
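The -z execstack flag is what fixed it here. Another approach often used for JIT buffers, shown only as a sketch and not as what this project does, is to place the generated code in its own mapping and mark it executable with mprotect. This assumes a POSIX system with mmap and mprotect. The helper names make_executable_copy and demo_executable_copy are hypothetical, and since the post does not show how gd->ctx.buf is allocated, the example uses a stand-in byte array.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy len bytes of machine code into a fresh
   page-aligned mapping, then switch it from read+write to read+execute.
   Returns NULL on failure. */
void *make_executable_copy(const void *code, size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Usage sketch: 0xc3 is the one-byte x86 ret instruction. The copy is
   only inspected here, not called. */
void demo_executable_copy(void)
{
    static const unsigned char code[] = { 0xc3 };
    void *p = make_executable_copy(code, sizeof code);
    assert(p != NULL);
    assert(memcmp(p, code, sizeof code) == 0);
    munmap(p, sizeof code);
}
```

Keeping the buffer writable only while the code is being emitted, and executable only afterwards, avoids making the whole stack executable.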
If push %ebp is causing a segfault, then your stack pointer isn't pointing at valid stack. How does control reach that point? What platform are you on, and is there anything odd about the runtime environment? At the entry to the function, %esp should point to the return address in the caller on the stack. Does it?
Aside from that, the whole function is pretty weird. You go out of your way to set the rounding bits in the fp control word, and then don't perform any operations that are affected by rounding. All the function does is copy some data, but uses floating-point registers to do it when you could use the integer registers just as well. And then there's the spurious emms, which you need after using MMX instructions, not after doing x87 computations.
Edit: See Scott's (the original questioner's) answer for the actual reason for the crash.

Resources