Why does GCC use frame pointer when I call Win32 functions with arguments? - gcc

When I compile 32-bit C code with GCC and the -fomit-frame-pointer option, the frame pointer (ebp) is not used unless my function calls Windows API functions with stdcall and atleast one parameter.
For example, if I only use GetCommandLine() from the Windows API, which has no parameters/arguments, GCC will omit the frame pointer and use ebp for other things, speeding up the code and not having that useless prologue.
But the moment I call a stdcall Win32 function that accepts at least one argument, GCC completely ignores the -fomit-frame-pointer and uses the frame pointer anyway, and the code is worse in inspection as it can't use ebp for general purpose things. Not to mention I find the frame pointer quite pointless. I mean, I want to compile for release and distribution, why should I care about debugging? (if I want to debug I'll just use a debug build instead after reproducing the bug)
My stack most certainly does NOT contain dynamic allocation like alloca. So, the stack has a defined structure yet GCC chooses the dumb method despite my options? Is there something I'm missing to force it to not use frame pointer?
My second grip I have with it is that it refuses to use "push" instructions for Win32 functions. Every other compiler I tried, they used push instructions to push on the stack, resulting in much better more compact code, not to mention it is the most natural way to push arguments for stdcall. Yet GCC stubbornly uses "mov" instructions to move in each spot, manually, at offsets relative to esp because it needs to keep the stack pointer completely static. stdcall is made to be easy on the caller, and yet GCC completely misses the point of stdcall since it generates this crappy code when interfacing with it. What's worse, since the stack pointer is static, it still uses a frame pointer? Just why?
I tried -mpush-args, it doesn't do anything.
I also noticed that if I make my stack big enough for it to exceed a page (4096 bytes), GCC will add a prologue with a function that does nothing but "bitwise or" the stack every 4096 bytes with zero (which does nothing). I assume it's for touching the stack and automatically commiting memory with page faults if the stack was reserved? Unfortunately, it does this even if I set the initial commit of the stack (not reserve) to high enough to hold my stack, not to mention this shouldn't even be needed in the first place. Redundant code at its best.
Are these bugs in GCC? Or something I'm missing in options? Should I use something else? Please tell me if I'm missing some options.
I seriously hope I won't have to make an inline asm macro just to call stdcall functions and use push instructions (and this will avoid frame pointer too I guess). That sounds really overkill for something so basic that should be in compilers of today. And yes I use GCC 4.8.1 so not an ancient version.
As extra question, is it possible to force GCC to not save registers on the stack at function prologue? I use my own direct entry point with -nostartfiles argument, because it is a pure Windows application and it works just fine without standard lib startup. If I use attribute((noreturn)), it will discard the epilogue restoring the registers but it will still push them on the stack at prologue, I don't know if there's a way to force it to not save registers for this entry point function. Either way not a big deal in the least, it would just feel more complete I guess. Thanks!

See the answer Force GCC to push arguments on the stack before calling function (using PUSH instruction)
I.e. try -mpush-args -mno-accumulate-outgoing-args. It may also require -mno-stack-arg-probe if gcc complains.

It looks like supplying the -mpush-args -mno-accumulate-outgoing-args -mno-stack-arg-probe works, specifically the last one. Now the code is cleaner and more normal like other compilers, and it uses PUSH for arguments, even makes it easier to track in OllyDbg this way.
Unfortunately, this FORCES the stupid frame pointer to be used, even in small functions that absolutely do not need it at all. Seriously is there a way to absolutely force GCC to disable the frame pointer?!

Related

How do symbols solve walking the stack with FPO in x86 debugging?

In this answer: https://stackoverflow.com/a/8646611/192359 , it is explained that when debugging x86 code, symbols allow the debugger to display the callstack even when FPO (Frame Pointer Omission) is used.
The given explanation is:
On the x86 PDBs contain FPO information, which allows the debugger to reliably unwind a call stack.
My question is what's this information? As far as I understand, just knowing whether a function has FPO or not does not help you finding the original value of the stack pointer, since that depends on runtime information.
What am I missing here?
Fundamentally, it is always possible to walk the stack with enough information1, except in cases where the stack or execution context has been irrecoverably corrupted.
For example, even if rbp isn't used as the frame pointer, the return address is still on the stack somewhere, and you just need to know where. For a function that doesn't modify rsp (indirectly or directly) in the body of the function it would be at a simple fixed offset from rsp. For functions that modify rsp in the body of the function (i.e., that have a variable stack size), the offset from rsp might depend on the exact location in the function.
The PDB file simply contains this "side band" information which allows someone to determine the return address for any instruction in the function. Hans linked a relevant in-memory structure above - you can see that since it knows the size of the local variables and so on it can calculate the offset between rsp and the base of the frame, and hence get at the return address. It also knows how many instruction bytes are part of the "prolog" which is important because if the IP is still in that region, different rules apply (i.e., the stack hasn't been adjusted to reflect the locals in this function yet).
In 64-bit Windows, the exact function call ABI has been made a bit more concrete, and all functions generally have to provide unwind information: not in a .pdb but directly in a section included in the binary. So even without .pdb files you should be able to unwind a properly structured 64-bit Windows program. It allows any register to be used as the frame pointer, and still allows frame-pointer omission (with some restrictions). For details, start here.
1 If this weren't true, ask yourself how the currently running function could ever return? Now, technically you could design a program which clobbers or forgets the stack in a way that it cannot return, and either never exits or uses a method like exit() or abort() to terminate. This is highly unusual and not possibly outside of assembly.

What is the gcc -fstack-protector -S XXX.c function? [duplicate]

I have enabled the -Wstack-protector warning when compiling the project I'm working on (a commercial multi-platform C++ game engine, compiling on Mac OS X 10.6 with GCC 4.2).
This flag warns about functions that will not be protected against stack smashing even though -fstack-protector is enabled.
GCC emits some warnings when building the project:
not protecting function: no buffer at least 8 bytes long
not protecting local variables: variable length buffer
For the first warning, I found that it is possible to adjust the minimum size a buffer must have when used in a function, for this function to be protected against stack smashing: --param ssp-buffer-size=X can be used, where X is 8 by default and can be as low as 1.
For the second warning, I can't suppress its occurrences unless I stop using -Wstack-protector.
When should -fstack-protector be used? (as in, for instance, all the time during dev, or just when tracking bugs down?)
When should -fstack-protector-all be used?
What is -Wstack-protector telling me? Is it suggesting that I decrease the buffer minimum size?
If so, are there any downsides to putting the size to 1?
It appears that -Wstack-protector is not the kind of flag you want enabled at all times if you want a warning-free build. Is this right?
Stack-protection is a hardening strategy, not a debugging strategy. If your game is network-aware or otherwise has data coming from an uncontrolled source, turn it on. If it doesn't have data coming from somewhere uncontrolled, don't turn it on.
Here's how it plays out: If you have a bug and make a buffer change based on something an attacker can control, that attacker can overwrite the return address or similar portions of the stack to cause it to execute their code instead of your code. Stack protection will abort your program if it detects this happening. Your users won't be happy, but they won't be hacked either. This isn't the sort of hacking that is about cheating in the game, it's the sort of hacking that is about someone using a vulnerability in your code to create an exploit that potentially infects your user.
For debugging-oriented solutions, look at things like mudflap.
As to your specific questions:
Use stack protector if you get data from uncontrolled sources. The answer to this is probably yes. So use it. Even if you don't have data from uncontrolled sources, you probably will eventually or already do and don't realize it.
Stack protections for all buffers can be used if you want extra protection in exchange for some performance hit. From gcc4.4.2 manual:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
The warnings tell you what buffers the stack protection can't protect.
It is not necessarily suggesting you decrease your minimum buffer size, and at a size of 0/1, it is the same as stack-protector-all. It is only pointing it out to you so that you can, if you decide redesign the code so that buffer is protected.
No, those warnings don't represent issues, they just point out information to you. Don't use them regularly.
You indeed should not care about the warning for normal builds. It's really more of an informational message. I hope it's obvious that you do have an inherent security concern with variable-sized buffers on the stack; get the size calculation wrong and you're opening a big hole.

Optimizing used registers when using inline ARM assembly in GCC

I want to write some inline ARM assembly in my C code. For this code, I need to use a register or two more than just the ones declared as inputs and outputs to the function. I know how to use the clobber list to tell GCC that I will be using some extra registers to do my computation.
However, I am sure that GCC enjoys the freedom to shuffle around which registers are used for what when optimizing. That is, I get the feeling it is a bad idea to use a fixed register for my computations.
What is the best way to use some extra register that is neither input nor output of my inline assembly, without using a fixed register?
P.S. I was thinking that using a dummy output variable might do the trick, but I'm not sure what kind of weird other effects that will have...
Ok, I've found a source that backs up the idea of using dummy outputs instead of hard registers:
4.8 Temporary registers:
People also sometimes erroneously use clobbers for temporary registers. The right way is
to make up a dummy output, and use “=r” or “=&r” depending on the permitted overlap
with the inputs. GCC allocates a register for the dummy value. The difference is that
GCC can pick a convenient register, so it has more flexibility.
from page 20 of this pdf.
For anyone who is interested in more info on inline assembly with GCC this website turned out to be very instructive.

How to debug stack-overwriting errors with Valgrind?

I just spent some time chasing down a bug that boiled down to the following. Code was erroneously overwriting the stack, and I think it wrote over the return address of the function call. Following the return, the program would crash and stack would be corrupted. Running the program in valgrind would return an error such as:
vex x86->IR: unhandled instruction bytes: 0xEA 0x3 0x0 0x0
==9222== valgrind: Unrecognised instruction at address 0x4e925a8.
I figure this is because the return jumped to a random location, containing stuff that were not valid x86 opcodes. (Though I am somehow suspicious that this address 0x4e925a8 happened to be in an executable page. I imagine valgrind would throw a different error if this wasn't the case.)
I am certain that the problem was of the stack-overwriting type, and I've since fixed it. Now I am trying to think how I could catch errors like this more effectively. Obviously, valgrind can't warn me if I rewrite data on the stack, but maybe it can catch when someone writes over a return address on the stack. In principle, it can detect when something like 'push EIP' happens (so it can flag where the return addresses are on the stack).
I was wondering if anyone knows if Valgrind, or anything else can do that? If not, can you comment on other suggestions regarding debugging errors of this type efficiently.
If the problem happens deterministically enough that you can point out particular function that has it's stack smashed (in one repeatable test case), you could, in gdb:
Break at entry to that function
Find where the return address is stored (it's relative to %ebp (on x86) (which keeps the value of %esp at the function entry), I am not sure whether there is any offset).
Add watchpoint to that address. You have to issue the watch command with calculated number, not an expression, because with an expression gdb would try to re-evaluate it after each instruction instead of setting up a trap and that would be extremely slow.
Let the function run to completion.
I have not yet worked with the python support available in gdb7, but it should allow automating this.
In general, Valgrind detection of overflows in stack and global variables is weak to non-existant. Arguably, Valgrind is the wrong tool for that job.
If you are on one of supported platforms, building with -fmudflap and linking with -lmudflap will give you much better results for these kinds of errors. Additional docs here.
Udpdate:
Much has changed in the 6 years since this answer. On Linux, the tool to find stack (and heap) overflows is AddressSanitizer, supported by recent versions of GCC and Clang.

How does the gcc option -fstack-check exactly work?

My program crashed when I added the option -fstack-check and -fstack-protector. __stack_chk_fail is called in the back trace.
So how could I know where the problem is ? What does -fstack-check really check ?
The information about gcc seems too huge to find out the answer.
After checked the assembly program.
I think -fstack-check, will add code write 0 to an offset of the stack pointer, so to test if the program visit a violation address, the program went crash if it does.
e.g. mov $0x0,-0x928(%esp)
-fstack-check: If two feature macros STACK_CHECK_BUILTIN and STACK_CHECK_STATIC_BUILTIN are left at the default 0, it just inserts a NULL byte every 4kb (page) when the stack grows.
By default only one, but when the stack can grow more than one page, which is the most dangerous case, every 4KB. linux >2.6 only has only one small page gap between the stack and the heap, which can lead to stack-gap attacks, known since 2005.
See What exception is raised in C by GCC -fstack-check option for assembly.
It is enabled in gcc at least since 2.95.3, in clang since 3.6.
__stack_chk_fail is the inserted -fstack-protector code which verifies an inserted stack canary value which might be overwritten by a simple stack overflow, e.g. by recursion.
"`-fstack-protector' emits extra code to check for buffer overflows, such as stack
smashing attacks. This is done by adding a guard variable to
functions with vulnerable objects. This includes functions that
call alloca, and functions with buffers larger than 8 bytes. The
guards are initialized when a function is entered and then checked
when the function exits. If a guard check fails, an error message
is printed and the program exits"
GCC Options That Control Optimization
GCC extension for protecting applications from stack-smashing attacks
Smashing The Stack For Fun And Profit
I Hope this will give some clue..

Resources