Reordering functions in gcc assembly - gcc

I am writing a program which encrypts/decrypts itself in memory and then writes the .text memory region to a copy of the executable so I can change the encryption key each time.
This is mainly for a challenge as I am not great with C, and I'm incorporating parts in assembly as well.
My system is x86_64 Linux but I'm compiling with -m32.
I am also using -nostartfiles (with gcc) so that I can write my own _start function. This function is written in assembly and this decrypts/encrypts the rest of the .text section. My problem is that the external functions are being compiled in the wrong order, such that when I try to dump the memory after it has been encrypted it calls an encrypted function which therefore doesn't work.
This is the current order of the functions:
some from -static
my functions which are in the correct order (assembly functions and then the ones from the main C file)
some more from -static
This doesn't work because the assembly encrypts from the main C file 'downwards', also encrypting some -static functions which are needed by the assembly functions.
This is the order I would like the functions to be in:
all -static functions & anything from an #include <>
functions from the .S assembly file (the whole .S in order)
functions from the .c main file (the whole .c in order)
any non-standard includes for the .c main file (ie not stdio.h etc, things from #include "")
Is there any way, short of manually mangling the ELF file, for me to reorder these functions so that the functions I need are not encrypted while the ones I want encrypted can be encrypted easily?
Edit: upon compiling with musl (an alternative libc) I can get all of my functions at the start, with the rest of the static functions following. However, this is still the wrong way around.

The "wrong" order of functions inside the binary comes from optimization efforts of the compiler. Functions that are used often (or often together) are placed near each other, so that calling them is less likely to cause a page fault.
You can turn off part of these optimizations with the flag -fno-toplevel-reorder. Note that this flag only affects the order within a single translation unit; where code from different object files and static libraries ends up is decided by the linker. You can also use the section attribute to group a subset of functions together (e.g. the ones to encrypt), or you can write your own linker script.
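A minimal sketch of the section-attribute approach, assuming GCC and GNU ld on an ELF target. The section name enc_text and all function names are hypothetical. Because the name is a valid C identifier and the default linker script does not claim it, GNU ld keeps it as a separate output section and defines __start_enc_text and __stop_enc_text, which bound the region a decryptor would have to process:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; noinline keeps the code from being
   copied into callers that live in the ordinary .text section. */
#define ENCRYPTED __attribute__((section("enc_text"), noinline, used))

/* Defined automatically by GNU ld for an orphan section named enc_text. */
extern const unsigned char __start_enc_text[];
extern const unsigned char __stop_enc_text[];

ENCRYPTED int secret_add(int a, int b) { return a + b; }
ENCRYPTED int secret_mul(int a, int b) { return a * b; }

/* Stays in the ordinary .text section. */
int plain_helper(int x) { return x + 1; }

/* Size in bytes of the region to encrypt or decrypt. */
size_t encrypted_region_size(void)
{
    return (size_t)((uintptr_t)__stop_enc_text - (uintptr_t)__start_enc_text);
}

/* Returns 1 if the given code address lies inside enc_text. */
int in_encrypted_region(uintptr_t addr)
{
    return addr >= (uintptr_t)__start_enc_text &&
           addr <  (uintptr_t)__stop_enc_text;
}
```

With this layout the startup code only needs the two boundary symbols, and anything from libc stays outside the range regardless of where the linker places it.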
See also this question.

Related

C code Optimization by compiler for atmel studio

I am using Atmel Studio 7 and in that, optimization level is -O1.
Can I check what portion of code is being optimized by the compiler itself?
If I disable optimization, my binary file is 12KB, and with optimization level -O1 it is 5.5KB.
Can I check what portion of code is being optimized by the compiler itself?
All the code is optimized by the compiler, i.e. affected by optimization flags, except:
Code that's dragged from libraries (libgcc.a, libc.a, libm.a, lib<device>.a).
Startup code (crt<device>.o) which also includes the vector table, or code from other objects that already exist and are not (re-)compiled in the current compilation. The latter can happen with Makefiles when you change flags therein: If the modules do not depend on the Makefile itself, make will not rebuild them.
Code from assembly modules (*.S, *.sx, *.s), provided the preprocessed assembly code does not use conditional assembly by means of #ifdef __OPTIMIZE__ or similar.
Code in inline assembly, provided the inline asm is not optimized away.
In order to determine whether anything of this is in effect, you can respectively:
Link with -Wl,-Map,file.map and inspect that map file (a text file). It will list which objects have been dragged from where due to which undefined symbol.
Startup code is linked unless you pass -nostartfiles. Add -Wl,-v to the link stage and you'll see crt<device>.o being linked.
You know your compilation units, assembly modules, don't you?
Add -save-temps to the compilation. Inline asm will show in the intermediate *.s file as
/* #APP */
; <line> "<compilation-unit>"
<inline-asm-code>
/* #NOAPP */
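A small sketch of the two effects mentioned above, assuming GCC. The function names are hypothetical, and nop is assumed to be a valid instruction on the target (it is on AVR and x86). GCC predefines __OPTIMIZE__ in optimizing compilations, so a module can detect whether optimization flags reached it, and the inline asm statement is the kind of code that appears between the #APP and #NOAPP markers in the -save-temps output:

```c
/* Reports whether this translation unit was compiled with optimization. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* An asm statement without output operands is implicitly volatile;
   writing volatile explicitly makes the intent clear to readers. */
int add_with_marker(int a, int b)
{
    __asm__ volatile ("nop");
    return a + b;
}
```

Compiling this once with -O0 and once with -O1, both with -save-temps, and comparing the two *.s files is a quick way to see which parts the optimizer changed.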

What is the name for the structure of the gcc assembly output

I'm trying to learn assembly. At first I was using NASM for compiling, but then I realized that I could use .s files with gcc. This interested me greatly, since my goal is to be able to write a compiler for a custom language, and it would allow me to link and compile with C code. So, filled with excitement, I started compiling C to assembly (.s files) with gcc and examining the output. As I was doing this, it seemed to be structured differently from NASM assembly, with only a main label, for example, and no _start, and other unfamiliar structure, and I'm not talking about Intel vs. AT&T syntax. So my question follows:
Is the structure of gcc's .s files really different from normal assembly, or is it just me not having good enough knowledge of assembly? If it is a different structure, does it have a name?
I have been trying to google my way to this for hours, but when I search for gcc assembly, and other things I can think of, I only get C inline assembly...
Please help, I'm going crazy from not figuring this out.
gcc emits definitions for all the functions present in the translation unit. (unless they're static inline or static and unused or it chooses to inline them everywhere...).
The CRT start files (linked by default by gcc, not re-built from source every time you compile) provide the definition for _start and the other functions you'll see if you disassemble the binary. They're only linked in at the link stage, not as part of compiling a .c to a .s, so you don't see them in gcc -S output.
Related: How to remove "noise" from GCC/clang assembly output? for tips on making compiler asm output human-readable.

Static library "interface"

Is there any way to tell the compiler (gcc/mingw32) when building an object file (lib*.o) to only expose certain functions from the .c file?
The reason I want to do this is that I am statically linking to a 100,000+ line library (SQLite), but am only using a select few of the functions it offers. I am hoping that if I can tell the compiler to only expose those functions, it will optimize out all the code of the functions that are never needed for those few I selected, thus drastically decreasing the size of the library.
I found several possible solutions:
This is what I asked about. It is the gcc equivalent of Windows' dllexport:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Code-Gen-Options.html (-fvisibility)
http://gcc.gnu.org/wiki/Visibility
I also discovered link-time code-generation. This allows the linker to see what parts of the code are actually used and get rid of the rest. Using this together with strip and -fwhole-program has given me drastically better results.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html (see -flto and -fwhole-program)
Note: This flag only makes sense if you are not compiling the whole program in one call to gcc, which is what I was doing (making a sqlite.o file and then statically linking it in).
The third option which I found but have not yet looked into is mentioned here:
How to remove unused C/C++ symbols with GCC and ld?
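A minimal sketch of the visibility approach from the first option, assuming GCC on an ELF target (on MinGW, which produces PE files, the attribute is expected to be ignored with a warning). The macro and function names are hypothetical. Compiling with -fvisibility=hidden would make hidden the default, so only functions marked API_PUBLIC are exported:

```c
/* Hypothetical helper macros wrapping GCC's visibility attribute. */
#define API_PUBLIC __attribute__((visibility("default")))
#define API_HIDDEN __attribute__((visibility("hidden")))

/* static: internal linkage, not visible outside this .c file at all. */
static int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* hidden: callable from the library's other .c files,
   but not exported from a shared object. */
API_HIDDEN int lib_scale(int v, int factor)
{
    return clamp_to_byte(v * factor);
}

/* default: part of the library's public interface. */
API_PUBLIC int lib_brighten(int v)
{
    return lib_scale(v, 2);
}
```

Visibility controls what a shared object exports. For a statically linked library, the size reduction itself comes from the linker or from link-time optimization discarding unreferenced code, as described in the second option.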
That's probably the linker's job, not the compiler's. When linking that as a program (.exe), the linker will take care of only importing the relevant symbols, and when linking a DLL, the __declspec(dllexport) mechanism is probably what you are looking for, or some flags of ld can help you (man ld).

g++ vs gcc, linking problem with a static library (.a)

I'm trying to link a static library (.a) file with a .o file which supposedly uses the symbols from the library. However, when using gcc, the usual linker error comes up, even though I pass the .a file as
gcc -L. a.c staticlib.a
However, the same command works with g++ flawlessly.
Why is this happening?
I can see that the .c file is totally legal C (and hence C++), but then why isn't gcc able to find the symbols in the library?
Tried finding the symbols in the library using objdump, was able to find closely resembling symbols, but not exact ones. e.g:
Got
00000000000000b0 g F .text 000000000000004e _Z15PhttsFn_InitTTSPh
for the symbol PhttsFn_InitTTS
Can someone please explain this phenomenon? I've also checked the architecture the library file was compiled for, and it's the same as my architecture.
Thanks!
C++ uses something called name mangling, in order for namespaces, overloaded function names, etc, to get unique symbols in the compiled object file.
Your C code refers to a symbol PhttsFn_InitTTS explicitly. If compiled as C, it will produce that very symbol name. However, since C++ needs to deal with all these different variations of the same name (e.g. overloading, with different parameter lists), it creates a 'mangled' version encoding the namespace and parameter types. In your case it was mangled to _Z15PhttsFn_InitTTSPh: _Z is the mangling prefix, 15 is the length of the name, and Ph encodes a single parameter of type unsigned char * (P for pointer, h for unsigned char).
Invoking GCC as gcc allows it to pick the language itself, based on file extension (.c -> C; .cc, .cpp, etc. -> C++). Invoking it as g++ forces C++ mode.
Your .a file was evidently compiled as C++, as it exposes that mangled symbol.
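If the library's sources are under your control, the usual way to make such a function callable from C is to give it C linkage in the header. The sketch below assumes that situation. The int return type and the stand-in body are assumptions, since the mangled name only reveals the unsigned char * parameter:

```c
/* Header part: the guard makes a C++ compiler emit the unmangled
   symbol PhttsFn_InitTTS, while a C compiler skips the extern "C". */
#ifdef __cplusplus
extern "C" {
#endif

/* Return type is assumed for illustration. */
int PhttsFn_InitTTS(unsigned char *buffer);

#ifdef __cplusplus
}
#endif

/* Stand-in body so the sketch links on its own;
   the real implementation lives in the library. */
int PhttsFn_InitTTS(unsigned char *buffer)
{
    return buffer != 0;
}
```

If the library cannot be rebuilt, the remaining option is the one the question already found: compile the calling code as C++ so that it refers to the mangled name.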

Is there a way to strip all functions from an object file that I am not using?

I am trying to save space in my executable and I noticed that several functions are being added into my object files, even though I never call them (the code is from a library).
Is there a way to tell gcc to remove these functions automatically or do I need to remove them manually?
If you are compiling into object files (not executables), then a compiler will never remove any non-static functions, since it's always possible you will link the object file against another object file that will call that function. So your first step should be declaring as many functions as possible static.
Secondly, the only way for a compiler to remove any unused functions would be to statically link your executable. In that case, there is at least the possibility that a program might come along and figure out what functions are used and which ones are not used.
The catch is, I don't believe that gcc actually does this type of cross-module optimization. Your best bet is the -Os flag to optimize for code size, but even then, if you have an object file abc.o which has some unused non-static functions and you link statically against some executable def.exe, I don't believe that gcc will go and strip out the code for the unused functions.
If you truly desperately need this to be done, I think you might have to actually #include the files together so that after the preprocessor pass, it results in a single .c file being compiled. With gcc compiling a single monstrous jumbo source file, you stand the best chance of unused functions being eliminated.
Have you looked into calling gcc with -Os (optimize for size)? I'm not sure if it strips unreached code, but it would be simple enough to test. You could also, after getting your executable back, 'strip' it. I'm sure there's a gcc command-line arg to do the same thing - is it --dead_strip?
In addition to -Os to optimize for size, this link may be of help.
Since I asked this question, GCC 4.5 was released which includes an option to combine all files so it looks like it is just 1 gigantic source file. Using that option, it is possible to easily strip out the unused functions.
More details here
IIRC the linker by default does what you want in some specific cases. The short of it is that library files contain a bunch of object files and only referenced files are linked in. If you can figure out how to get GCC to emit each function into its own object file and then build this into a library, you should get what you are looking for.
I only know of one compiler that can actually do this: here (look at the -lib flag)
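A related technique that gets close to "one object file per function" without splitting the sources is to have GCC place every function in its own section and let the linker discard unreferenced sections. The sketch below assumes GCC with GNU ld. The file names in the comment and the function names are hypothetical:

```c
/* Possible build, with hypothetical file names:
 *   gcc -Os -ffunction-sections -fdata-sections -c lib.c
 *   gcc main.o lib.o -Wl,--gc-sections -o app
 * With -ffunction-sections each function below gets its own
 * .text.<name> section, so --gc-sections can drop the ones that
 * nothing references.
 */

/* Referenced by the program: kept by the linker. */
int used_checksum(const unsigned char *data, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Never referenced by the program: a candidate for removal. */
int unused_square(int v)
{
    return v * v;
}
```

Linking with -Wl,-Map,app.map and looking for the discarded input sections in the map file is one way to check what was actually removed.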
