I have a few questions about Stack Guard and SSP protections. First question is about Stack Guard and its three types of canaries, if I am correctly - terminator, random and random XOR.
I'd like to know, how to disabled Stack Guard on x86 Linux system? Somewhere I read, it's possible with this command, while compiling with gcc '-disable-stackguard-randomization', it's same like with this command for enable '-enable-stackguard-randomization', both doesn't work. If needed, my gcc version is 4.8.2.
Next question about Stack guard, when I will able to enable/disable it, how can I set, which type of canaries I want to use? What I read, terminator canaries are used by default, for random I have to compiled with '-enable-stackguard-randomization', but how about random XOR? (Or with null 0x00000000)
Now about SSP(ProPolice), I know, for random canary I have to compiled with 'fstack-protector-all', but how about terminator, is it same as in Stack Guard, by default?
Last one, if anyone of you, can tell me, where I can find random canary in memory. For example, I have this scenario - compiled C program, like 'gcc -g example.c -o example -fstack-protector-all', so with random canaries. Let's say, I'm able to get address of canary, after every execution. So expect, I have: Canary = 0x1ae3f900. From a different papers, I get some info, that canary is located in .bss segment. So I get address of .bss segment using readelf: 'readelf -a ./example | grep bss'. It's 080456c9. In gdb I set some breakpoints, to get address of canary, but when I check .bss address x/20x 0x080456c9, all I see are only 0x00000000 addresses, but
canary is nowhere. Plus, I checked __stack_chk_fail's if it isn't there, but with same result, I can't see it there. I get address of stack_chk_fail from PLT/GOT.
Thank in advance for your answer and time.
Stack Smashing Protection (SSP) is an improvement over StackGuard. SSP was first implemented in gcc 4.1.
I'd like to know, how to disabled Stack Guard on x86 Linux system?
Use -fno-stack-protector to disable the userland SSP.
The --disable-stackguard-randomization and --enable-stackguard-randomization are build options for glibc source code.
when I will able to enable/disable it, how can I set, which type of
canaries I want to use?
This is not configurable in gcc as far as I know. Since glibc 2.10, the stack canary is generated in a function called _dl_setup_stack_chk_guard. Here is some part of its code:
if (dl_random == NULL)
{
ret.bytes[sizeof (ret) - 1] = 255;
ret.bytes[sizeof (ret) - 2] = '\n';
}
else
{
memcpy (ret.bytes, dl_random, sizeof (ret));
ret.num &= ~(uintptr_t) 0xff;
}
dl_random holds the address of the auxiliary vector entry for AT_RANDOM, which is a 16-byte random value initialized by the kernel while creating the process. If you are running on a kernel or an emulator that doesn't initialize AT_RANDOM, the check dl_random == NULL would be true and the canary used is the terminator value with the first and second most significant bytes initialized to 255 and \n, respectively. All other bytes are zero. Usually AT_RANDOM is initialized by the kernel and so the least 7 significant bytes of AT_RANDOM are copied. The last byte of canary is set to zero.
So if you want to use a particular method to generate the canary, you can change this code and build you own glibc.
As an alternative method, #PeterCordes have suggested in the comments to write your canary value to memory location %%fs:0x28 (see the code below) at the top of the main function and restore the runtime-generated canary just before returning from main.
Now about SSP(ProPolice), I know, for random canary I have to compiled
with 'fstack-protector-all', but how about terminator, is it same as
in Stack Guard, by default?
All variants of the -fstack-protector option use SSP. These don't affect how the canary is generated.
Last one, if anyone of you, can tell me, where I can find random
canary in memory.
The canary is generated dynamically early at process startup; you can't use readelf to get the canary. According to this article, you can use the following code to get the canary when compiling for i386:
int read_canary()
{
int val = 0;
__asm__("movl %%gs:0x14, %0;"
: "=r"(val)
:
:);
return val;
}
and for x86_64:
long read_canary()
{
long val = 0;
__asm__("movq %%fs:0x28, %0;"
: "=r"(val)
:
:);
return val;
}
Related
I have a process where one .o file is built without any .eh_frame or .debug_frame section (via an assembler) but with other types of debug info such as .debug_info. Apparently this triggers gdb to stop using frame-pointer (rbp) based unwinding for any functions from that object, and it produces invalid backtraces (it isn't clear how it is trying to unwind the stack at all).
Now the functions in this binary set up the stack frame properly (i.e., rbp points to correctly to the base of the frame) and if GDB were just to use that to unwind, everything would be great. Is there some way I can tell it to ignore the dwarf2 info and use frame-pointer based unwinding?
if gcc were just to use that to unwind, everything would be great.
You mean GDB.
I use the following routine in my ~/.gdbinit to unwind $rbp frame chain:
define xbt
set $xbp = (void **)$arg0
while 1
x/2a $xbp
set $xbp = (void **)$xbp[0]
end
end
Call it with the initial base pointer address you want to start from, e.g., xbt $rbp to use the current base pointer.
This isn't as good as allowing GDB to do it (no access to parameters or locals), but it does get at least the call trace.
For making GDB to ignore existing DWARF unwind info, you'll have to patch it out and build your own GDB.
P.S. Using --strip-dwo will not help.
Update:
why stripping isn't feasible?
Well, --strip-dwo only strips .dwo sections, and that's not where unwind info is (it's in .eh_frame and .debug_frame sections).
That said, you should try to strip .debug_frame with strip -g bad.o -- if your file only has bad .debug_frame but correct (or missing) .eh_frame, then removing .debug_frame should work.
strip doesn't remove .eh_frame because that info is usually required for unwinding.
If .eh_frame is also bad, you may be able to remove it with objcopy.
Some more info on unwinding here.
I've found a very simple hack that was good enough for my purposes.
In my case there is a single function that didn't work with up command.
Here are the steps:
set $rip = *((void**)$rbp+ 1)
set $rbp = *((void**)$rbp)
First line manually patches the instruction pointer. This seems similar to calling up on gdb, but function arguments are still broken. Second line sets rbp to it's value from caller - this fixes arguments for me.
It's probably ok to call this multiple times to go up multiple functions. In my case after single iteration of these commands up and frame start to work. You might also need to set rsp.
Warning: there is no easy way to go back (down)
I have to use a Microchip PIC for a new project (needed high pin count on a TQFP60 package with 5V operation).
I have a huge problem, I might miss something (sorry for that in advance).
IDE: MPLAB X 3.51
Compiler: XC8 1.41
The issue is that if I initialize an object to anything other than 0, it will not be initialized, and always be zero when I reach the main();
In simulator it works, and the object value is the proper one.
Simple example:
#include <xc.h>
static int x= 0x78;
void main(void) {
while(x){
x++;
}
return;
}
In simulator the x is 0x78 and the while(x) is true.
BUT when I load the code to the PIC18F67K40 using PICKIT3, the x is 0.
This happening even if I do a simple sprintf, and it does nothing as the formatting text string (char array) is full of zeros.
sprintf(buf,"Number is %u",x")
I can not initialize any object apart to be zero.
What is going on? Any help appreciated!
Found the problem, The chip has an errata issues, and I got the one which is effected, strange, Farnell sells it. More strange that the compiler is not prepared for that, does not even give a warning to say to be careful!
Errata note:
Module: PIC18 Core
3.1 TBLRD requires NVMREG value to point to
appropriate memory
The affected silicon revisions of the PIC18FXXK40
devices improperly require the NVMREG<1:0>
bits in the NVMCON register to be set for TBLRD
access of the various memory regions. The issue
is most apparent in compiled C programs when the
user defines a const type and the compiler uses
TBLRD instructions to retrieve the data from
program Flash memory (PFM). The issue is also
apparent when the user defines an array in RAM
for which the complier creates start-up code,
executed before main(), that uses TBLRD
instructions to initialize RAM from PFM.
I'm using MinGW64 build based on GCC 4.6.1 for Windows 64bit target. I'm playing around with the new Intel's AVX instructions. My command line arguments are -march=corei7-avx -mtune=corei7-avx -mavx.
But I started running into segmentation fault errors when allocating local variables on the stack. GCC uses the aligned moves VMOVAPS and VMOVAPD to move __m256 and __m256d around, and these instructions require 32-byte alignment. However, the stack for Windows 64bit has only 16 byte alignment.
How can I change the GCC's stack alignment to 32 bytes?
I have tried using -mstackrealign but to no avail, since that aligns only to 16 bytes. I couldn't make __attribute__((force_align_arg_pointer)) work either, it aligns to 16 bytes anyway. I haven't been able to find any other compiler options that would address this. Any help is greatly appreciated.
EDIT:
I tried using -mpreferred-stack-boundary=5, but GCC says that 5 is not supported for this target. I'm out of ideas.
I have been exploring the issue, filed a GCC bug report, and found out that this is a MinGW64 related problem. See GCC Bug#49001. Apparently, GCC doesn't support 32-byte stack alignment on Windows. This effectively prevents the use of 256-bit AVX instructions.
I investigated a couple ways how to deal with this issue. The simplest and bluntest solution is to replace of aligned memory accesses VMOVAPS/PD/DQA by unaligned alternatives VMOVUPS etc. So I learned Python last night (very nice tool, by the way) and pulled off the following script that does the job with an input assembler file produced by GCC:
import re
import fileinput
import sys
# fix aligned stack access
# replace aligned vmov* by unaligned vmov* with 32-byte aligned operands
# see Intel's AVX programming guide, page 39
vmova = re.compile(r"\s*?vmov(\w+).*?((\(%r.*?%ymm)|(%ymm.*?\(%r))")
aligndict = {"aps" : "ups", "apd" : "upd", "dqa" : "dqu"};
for line in fileinput.FileInput(sys.argv[1:],inplace=1):
m = vmova.match(line)
if m and m.group(1) in aligndict:
s = m.group(1)
print line.replace("vmov"+s, "vmov"+aligndict[s]),
else:
print line,
This approach is pretty safe and foolproof. Though I observed a performance penalty on rare occasions. When the stack is unaligned, the memory access crosses the cache line boundary. Fortunately, the code performs as fast as aligned accesses most of the time. My recommendation: inline functions in critical loops!
I also attempted to fix the stack allocation in every function prolog using another Python script, trying to align it always at the 32-byte boundary. This seems to work for some code, but not for other. I have to rely on the good will of GCC that it will allocate aligned local variables (with respect to the stack pointer), which it usually does. This is not always the case, especially when there is a serious register spilling due to the necessity to save all ymm register before a function call. (All ymm registers are callee-save). I can post the script if there's an interest.
The best solution would be to fix GCC MinGW64 build. Unfortunately, I have no knowledge of its internal workings, just started using it last week.
You can get the effect you want by
Declaring your variables not as variables, but as fields in a struct
Declaring an array that is larger than the structure by an appropriate amount of padding
Doing pointer/address arithmetic to find a 32 byte aligned address in side the array
Casting that address to a pointer to your struct
Finally using the data members of your struct
You can use the same technique when malloc() does not align stuff on the heap appropriately.
E.g.
void foo() {
struct I_wish_these_were_32B_aligned {
vec32B foo;
char bar[32];
}; // not - no variable definition, just the struct declaration.
unsigned char a[sizeof(I_wish_these_were_32B_aligned) + 32)];
unsigned char* a_aligned_to_32B = align_to_32B(a);
I_wish_these_were_32B_aligned* s = (I_wish_these_were_32B_aligned)a_aligned_to_32B;
s->foo = ...
}
where
unsigned char* align_to_32B(unsiged char* a) {
uint64_t u = (unit64_t)a;
mask_aligned32B = (1 << 5) - 1;
if (u & mask_aligned32B == 0) return (unsigned char*)u;
return (unsigned char*)((u|mask_aligned_32B) + 1);
}
I just ran in the same issue of having segmentation faults when using AVX inside my functions. And it was also due to the stack misalignment. Given the fact that this is a compiler issue (and the options that could help are not available in Windows), I worked around the stack usage by:
Using static variables (see this issue). Given the fact that they are not stored in the stack, you can force their alignment by using __attribute__((align(32))) in your declaration. For example: static __m256i r __attribute__((aligned(32))).
Inlining the functions/methods receiving/returning AVX data. You can force GCC to inline your function/method by adding inline and __attribute__((always_inline)) to your function prototype/declaration. Inlining your functions increase the size of your program, but they also prevent the function from using the stack (and hence, avoids the stack-alignment issue). Example: inline __m256i myAvxFunction(void) __attribute__((always_inline));.
Be aware that the usage of static variables is no thread-safe, as mentioned in the reference. If you are writing a multi-threaded application you may have to add some protection for your critical paths.
I am porting a program for an ARM chip from a IAR compiler to gcc.
In the original code, IAR specific operators such as __segment_begin and __segment_size are used to obtain the beginning and size respectively of certain memory segments.
Is there any way to do the same thing with GCC? I've searched the GCC manual but was unable to find anything relevant.
More details:
The memory segments in question have to be in fixed locations so that the program can interface correctly with certain peripherals on the chip. The original code uses the __segment_begin operator to get the address of this memory and the __segment_size to ensure that it doesn't overflow this memory.
I can achieve the same functionality by adding variables to indicate the start and end of these memory segments but if GCC had similar operators that would help minimise the amount of compiler dependent code I end up having to write and maintain.
What about the linker's flag --section-start? Which I read is supported here.
An example on how to use it can be found on the AVR Freaks Forum:
const char __attribute__((section (".honk"))) ProjString[16] = "MY PROJECT V1.1";
You will then have to add to the linker's options: -Wl,--section-start=.honk=address.
Modern versions of GCC will declare two variables for each segment, namely __start_MY_SEGMENT and __stop_MY_SEGMENT. To use these variables, you need to declare them as externs with the desired type. Following that, you and then use the '&' operator to get the address of the start and end of that segment.
extern uint8_t __start_MY_SEGMENT;
extern uint8_t __stop_MY_SEGMENT;
#define MY_SEGMENT_LEN (&__stop_MY_SEGMENT - &__start_MY_SEGMENT)
My program crashed when I added the option -fstack-check and -fstack-protector. __stack_chk_fail is called in the back trace.
So how could I know where the problem is ? What does -fstack-check really check ?
The information about gcc seems too huge to find out the answer.
After checked the assembly program.
I think -fstack-check, will add code write 0 to an offset of the stack pointer, so to test if the program visit a violation address, the program went crash if it does.
e.g. mov $0x0,-0x928(%esp)
-fstack-check: If two feature macros STACK_CHECK_BUILTIN and STACK_CHECK_STATIC_BUILTIN are left at the default 0, it just inserts a NULL byte every 4kb (page) when the stack grows.
By default only one, but when the stack can grow more than one page, which is the most dangerous case, every 4KB. linux >2.6 only has only one small page gap between the stack and the heap, which can lead to stack-gap attacks, known since 2005.
See What exception is raised in C by GCC -fstack-check option for assembly.
It is enabled in gcc at least since 2.95.3, in clang since 3.6.
__stack_chk_fail is the inserted -fstack-protector code which verifies an inserted stack canary value which might be overwritten by a simple stack overflow, e.g. by recursion.
"`-fstack-protector' emits extra code to check for buffer overflows, such as stack
smashing attacks. This is done by adding a guard variable to
functions with vulnerable objects. This includes functions that
call alloca, and functions with buffers larger than 8 bytes. The
guards are initialized when a function is entered and then checked
when the function exits. If a guard check fails, an error message
is printed and the program exits"
GCC Options That Control Optimization
GCC extension for protecting applications from stack-smashing attacks
Smashing The Stack For Fun And Profit
I Hope this will give some clue..