I'm working on the Pintos toy operating system at university, but there's a strange bug when using GCC 4.6.2. When I push my system call arguments (just 3 pushl-s in inline assembly), some mysterious data also appears on the stack, and the arguments are in the wrong order. Setting -fno-omit-frame-pointer gets rid of the strange data, but the arguments are still in the wrong order. GCC 4.5 works fine. Any idea what specific option could fix this?
NOTE: the problem still occurs with -O0.
Without a code example and a listing of the result from your different compilations, it's difficult to help you. But here are three possible causes for your problems:
Make sure you understand how arguments are pushed to the stack. Arguments are pushed from the back. This makes it possible for printf(char *, ...) to examine the first item to find out how many more there are. If you want to call the function int foo(int a, int b, int c), you'll need to push c, then b and finally a.
Could the strange data on the stack be a return address or EFLAGS? I don't know Pintos and how system calls are made, but make sure that you understand the difference between CALL/RET and INT/IRET. INT pushes the flags onto the stack.
If your inline assembly has side effects, you might want to write volatile/__volatile__ in front of it. Otherwise GCC is allowed to move it when optimizing.
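The point about argument order can be illustrated with a small variadic function. This is only a sketch: `sum_n` is a hypothetical helper, not part of Pintos. With the 32-bit x86 cdecl convention the caller pushes the last argument first, so `count` ends up nearest the top of the stack and the callee can read it before walking the rest. On other ABIs some arguments travel in registers, but `va_arg` hides that difference.

```c
#include <stdarg.h>

/* Hypothetical helper: the first argument says how many ints follow. */
int sum_n(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);          /* start right after the named argument */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next int argument */
    va_end(args);

    return total;
}
```

A call such as `sum_n(3, 10, 20, 30)` only works because the callee can always find `count` first, whatever the number of trailing arguments.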
I need to see your code to better understand what's going on.
The culprit was -fomit-frame-pointer, which has been enabled by default since 4.6.2. -fno-omit-frame-pointer fixed the issue.
Did you clean the parameters off the stack after the syscall? GCC may not be aware that you touch the stack, and may generate code that depends on the stack pointer value it expects.
-fno-omit-frame-pointer forces GCC to use ebp/rbp for accessing local data, but that just hides the actual problem.
With C++ code built for debugging with g++ (i.e. options "-O0 -ggdb") and using the newest gcc (5.1.0) and gdb (7.9) the display of source code in gdb is still painfully non-linear when using the "next" command. As an example this function call might be expected to step through with a single "next":
7757| SDValue NewRoot = TLI->LowerFormalArguments(
7758| DAG.getRoot(), F.getCallingConv(), F.isVarArg(), Ins, dl, DAG, InVals);
however it takes four, with the displayed execution line being first 7757, then 7758, then again 7757, then again 7758. If the function call is condensed to a single line then just one "next" is needed. If the call is absurdly inflated then seven "next"s are needed (shown as the '#' annotations)
7757| SDValue
7758| NewRoot
7759| =
#1,6 7760| TLI
7761| ->
7762| LowerFormalArguments(
#5 7763| DAG.getRoot(),
7764| F.getCallingConv(),
#3 7765| F.isVarArg(),
7766| Ins,
7767| dl,
7768| DAG,
7769| InVals
#2,4,7 7770| );
So it's related to but not as simple as "each function call on a distinct line is a stepping point". This gets especially confusing with breakpoints in recursive functions, where I find myself checking the callstack to see whether it's really a new invocation or just a phony backwards step.
Since reflowing all of the LLVM source to contain function calls in a single line isn't really a viable option, is there some gcc/gdb option for controlling this behaviour?
EDIT: now checked with clang 3.5 and lldb 3.5: when built with clang only three "next"s occur. And gdb and lldb see the same "next" behaviour in either case (i.e. 4 with gcc, 3 with clang)
This sort of behavior from the debugger is a "GIGO" situation -- that is, normally gdb is just doing whatever the debug info tells it to do. That is, when there is odd behavior, it is generally due to decisions made by the compiler. It may be a bug, and probably worth a bug report, but I also wouldn't be surprised if it is intended to work this way for some reason.
You can investigate these kinds of problems by using readelf or objdump to examine the line table.
I have enabled the -Wstack-protector warning when compiling the project I'm working on (a commercial multi-platform C++ game engine, compiling on Mac OS X 10.6 with GCC 4.2).
This flag warns about functions that will not be protected against stack smashing even though -fstack-protector is enabled.
GCC emits some warnings when building the project:
not protecting function: no buffer at least 8 bytes long
not protecting local variables: variable length buffer
For the first warning, I found that it is possible to adjust the minimum size a buffer must have when used in a function, for this function to be protected against stack smashing: --param ssp-buffer-size=X can be used, where X is 8 by default and can be as low as 1.
For the second warning, I can't suppress its occurrences unless I stop using -Wstack-protector.
When should -fstack-protector be used? (as in, for instance, all the time during dev, or just when tracking bugs down?)
When should -fstack-protector-all be used?
What is -Wstack-protector telling me? Is it suggesting that I decrease the buffer minimum size?
If so, are there any downsides to putting the size to 1?
It appears that -Wstack-protector is not the kind of flag you want enabled at all times if you want a warning-free build. Is this right?
Stack-protection is a hardening strategy, not a debugging strategy. If your game is network-aware or otherwise has data coming from an uncontrolled source, turn it on. If it doesn't have data coming from somewhere uncontrolled, don't turn it on.
Here's how it plays out: If you have a bug and make a buffer change based on something an attacker can control, that attacker can overwrite the return address or similar portions of the stack to cause it to execute their code instead of your code. Stack protection will abort your program if it detects this happening. Your users won't be happy, but they won't be hacked either. This isn't the sort of hacking that is about cheating in the game, it's the sort of hacking that is about someone using a vulnerability in your code to create an exploit that potentially infects your user.
For debugging-oriented solutions, look at things like mudflap.
As to your specific questions:
Use stack protector if you get data from uncontrolled sources. The answer to this is probably yes. So use it. Even if you don't have data from uncontrolled sources, you probably will eventually or already do and don't realize it.
Stack protection for all functions can be used if you want extra protection in exchange for some performance hit. From the GCC 4.4.2 manual:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
The warnings tell you what buffers the stack protection can't protect.
It is not necessarily suggesting you decrease your minimum buffer size, and at a size of 0/1, it is the same as stack-protector-all. It is only pointing it out to you so that you can, if you decide to, redesign the code so that the buffer is protected.
No, those warnings don't represent issues, they just point out information to you. Don't use them regularly.
You indeed should not care about the warning for normal builds. It's really more of an informational message. I hope it's obvious that you do have an inherent security concern with variable-sized buffers on the stack; get the size calculation wrong and you're opening a big hole.
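To make the buffer-size threshold concrete, here is a minimal sketch with two hypothetical functions (the names are made up for illustration). Assuming the default `--param ssp-buffer-size=8`, the first function has a character array large enough to qualify for a guard under -fstack-protector, while the second falls below the threshold and is the kind of function -Wstack-protector would be expected to report. Both copies are bounded, so the code itself is safe to run.

```c
#include <stddef.h>
#include <string.h>

/* 16-byte array: at or above the default 8-byte threshold, so this
 * function is a candidate for a stack guard with -fstack-protector. */
size_t copy_into_large_buffer(const char *src)
{
    char buf[16];

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';     /* strncpy may not terminate */
    return strlen(buf);
}

/* 4-byte array: below the default threshold, so no guard is added
 * unless ssp-buffer-size is lowered or -fstack-protector-all is used. */
size_t copy_into_small_buffer(const char *src)
{
    char buf[4];

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}
```

Comparing the assembly of both functions (for example with `gcc -S -fstack-protector`) is one way to check which of them actually received a guard on a given compiler version.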
When I compile 32-bit C code with GCC and the -fomit-frame-pointer option, the frame pointer (ebp) is not used unless my function calls Windows API functions with stdcall and at least one parameter.
For example, if I only use GetCommandLine() from the Windows API, which has no parameters/arguments, GCC will omit the frame pointer and use ebp for other things, speeding up the code and not having that useless prologue.
But the moment I call a stdcall Win32 function that accepts at least one argument, GCC completely ignores the -fomit-frame-pointer and uses the frame pointer anyway, and the code is worse in inspection as it can't use ebp for general purpose things. Not to mention I find the frame pointer quite pointless. I mean, I want to compile for release and distribution, why should I care about debugging? (if I want to debug I'll just use a debug build instead after reproducing the bug)
My stack most certainly does NOT contain dynamic allocation like alloca. So, the stack has a defined structure yet GCC chooses the dumb method despite my options? Is there something I'm missing to force it to not use frame pointer?
My second gripe is that it refuses to use "push" instructions for Win32 functions. Every other compiler I tried used push instructions to push arguments on the stack, resulting in much better, more compact code, not to mention that it is the most natural way to push arguments for stdcall. Yet GCC stubbornly uses "mov" instructions to write each slot manually, at offsets relative to esp, because it needs to keep the stack pointer completely static. stdcall is made to be easy on the caller, and yet GCC completely misses the point of stdcall since it generates this crappy code when interfacing with it. What's worse, since the stack pointer is static, it still uses a frame pointer? Just why?
I tried -mpush-args, it doesn't do anything.
I also noticed that if I make my stack frame big enough to exceed a page (4096 bytes), GCC will add a prologue with a function that does nothing but "bitwise or" the stack every 4096 bytes with zero (which changes nothing). I assume it's for touching the stack and automatically committing memory with page faults if the stack was only reserved? Unfortunately, it does this even if I set the initial commit of the stack (not the reserve) high enough to hold my stack, not to mention this shouldn't even be needed in the first place. Redundant code at its best.
Are these bugs in GCC? Or something I'm missing in options? Should I use something else? Please tell me if I'm missing some options.
I seriously hope I won't have to make an inline asm macro just to call stdcall functions and use push instructions (and this will avoid frame pointer too I guess). That sounds really overkill for something so basic that should be in compilers of today. And yes I use GCC 4.8.1 so not an ancient version.
As an extra question, is it possible to force GCC to not save registers on the stack in the function prologue? I use my own direct entry point with the -nostartfiles argument, because it is a pure Windows application and it works just fine without the standard library startup. If I use __attribute__((noreturn)), it will discard the epilogue restoring the registers but it will still push them on the stack in the prologue. I don't know if there's a way to force it to not save registers for this entry point function. Either way it's not a big deal in the least, it would just feel more complete I guess. Thanks!
See the answer Force GCC to push arguments on the stack before calling function (using PUSH instruction)
I.e. try -mpush-args -mno-accumulate-outgoing-args. It may also require -mno-stack-arg-probe if gcc complains.
It looks like supplying the -mpush-args -mno-accumulate-outgoing-args -mno-stack-arg-probe works, specifically the last one. Now the code is cleaner and more normal like other compilers, and it uses PUSH for arguments, even makes it easier to track in OllyDbg this way.
Unfortunately, this FORCES the stupid frame pointer to be used, even in small functions that absolutely do not need it at all. Seriously is there a way to absolutely force GCC to disable the frame pointer?!
I want to write some inline ARM assembly in my C code. For this code, I need to use a register or two more than just the ones declared as inputs and outputs to the function. I know how to use the clobber list to tell GCC that I will be using some extra registers to do my computation.
However, I am sure that GCC enjoys the freedom to shuffle around which registers are used for what when optimizing. That is, I get the feeling it is a bad idea to use a fixed register for my computations.
What is the best way to use some extra register that is neither input nor output of my inline assembly, without using a fixed register?
P.S. I was thinking that using a dummy output variable might do the trick, but I'm not sure what kind of weird other effects that will have...
Ok, I've found a source that backs up the idea of using dummy outputs instead of hard registers:
4.8 Temporary registers:
People also sometimes erroneously use clobbers for temporary registers. The right way is
to make up a dummy output, and use “=r” or “=&r” depending on the permitted overlap
with the inputs. GCC allocates a register for the dummy value. The difference is that
GCC can pick a convenient register, so it has more flexibility.
from page 20 of this pdf.
For anyone who is interested in more info on inline assembly with GCC this website turned out to be very instructive.
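The dummy-output technique quoted above can be sketched as follows. The question is about ARM, but to keep the example short this sketch uses x86 instructions and falls back to plain C on other targets; `add_via_scratch` and its variable names are made up for illustration. The scratch register is declared as an early-clobber output (`"=&r"`) because it is written before the second input is read, so GCC must not place it in the same register as an input.

```c
/* Hypothetical example: add two ints using a compiler-chosen scratch
 * register instead of naming a fixed register in the clobber list. */
static inline int add_via_scratch(int a, int b)
{
    int result;
    int scratch;    /* dummy output: GCC picks the register */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("movl %2, %1\n\t"   /* scratch = a                     */
            "addl %3, %1\n\t"   /* scratch += b                    */
            "movl %1, %0"       /* result = scratch                */
            : "=r"(result), "=&r"(scratch)
            : "r"(a), "r"(b));
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    scratch = a + b;
    result = scratch;
#endif
    return result;
}
```

`result` does not need the early-clobber marker here because it is only written by the last instruction, after both inputs have been consumed.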
My program crashed when I added the option -fstack-check and -fstack-protector. __stack_chk_fail is called in the back trace.
So how can I find out where the problem is? What does -fstack-check really check?
There is so much information about GCC that I could not find the answer.
After checking the generated assembly, I think -fstack-check adds code that writes 0 to an offset from the stack pointer, to test whether the program touches an invalid address; the program crashes if it does.
e.g. mov $0x0,-0x928(%esp)
-fstack-check: If the two feature macros STACK_CHECK_BUILTIN and STACK_CHECK_STATIC_BUILTIN are left at the default 0, it just writes a zero every 4 KB (one page) as the stack grows.
By default there is only one such write, but when the stack can grow by more than one page, which is the most dangerous case, there is one every 4 KB. Linux >2.6 has only one small page gap between the stack and the heap, which can lead to stack-gap attacks, known since 2005.
See What exception is raised in C by GCC -fstack-check option for assembly.
It is enabled in gcc at least since 2.95.3, in clang since 3.6.
__stack_chk_fail is called by the code that -fstack-protector inserts. That code verifies a stack canary value, which gets overwritten when a buffer on the stack overflows.
"`-fstack-protector' emits extra code to check for buffer overflows, such as stack
smashing attacks. This is done by adding a guard variable to
functions with vulnerable objects. This includes functions that
call alloca, and functions with buffers larger than 8 bytes. The
guards are initialized when a function is entered and then checked
when the function exits. If a guard check fails, an error message
is printed and the program exits"
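The guard-variable idea in the quoted text can be modelled in plain C. This is only a toy model, not what GCC emits: the struct, the constant and the function name are all made up, the real canary is a random value placed by the compiler next to the saved return address, and a failed check calls __stack_chk_fail instead of returning a flag. The copy is clamped to the size of the struct so the demonstration itself stays well-defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CANARY 0xDEADC0DEu      /* made-up constant for the demo */

/* Toy stand-in for a stack frame: a small buffer followed by a guard. */
struct toy_frame {
    char     buffer[8];
    uint32_t guard;
};

/* Copies len bytes to the start of the frame and reports whether the
 * guard survived. Returns 1 if intact, 0 if it was overwritten. */
int toy_guard_intact_after_copy(const char *src, size_t len)
{
    struct toy_frame frame;

    frame.guard = TOY_CANARY;       /* "initialized on function entry" */
    if (len > sizeof frame)
        len = sizeof frame;         /* never write outside the struct */
    memcpy(&frame, src, len);       /* may run past buffer into guard */
    return frame.guard == TOY_CANARY;   /* "checked on function exit" */
}
```

Copying 8 bytes or fewer leaves the guard untouched; copying more spills into it, which is the situation the real check detects before the function returns.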
GCC Options That Control Optimization
GCC extension for protecting applications from stack-smashing attacks
Smashing The Stack For Fun And Profit
I hope this gives you some clue.