What is the name for the structure of the gcc assembly output - gcc

I'm trying to learn assembly. At first I was using NASM to assemble my code, but then I learned that gcc accepts .s files. This interested me greatly, since my goal is to be able to write a compiler for a custom language, and this would let me compile and link together with C code. So, filled with excitement, I started compiling C to assembly (.s files) with gcc and examining the output. It seemed to be structured differently from NASM assembly: for example, there is only a main label and no _start, plus other unfamiliar structure. I'm not talking about Intel vs. AT&T syntax. So my question is:
Is the structure actually different between normal assembly and the .s files gcc produces, or is it just me not having good enough knowledge of assembly? If it is a different structure, does it have a name?
I have been trying to google my way to an answer for hours, but when I search for gcc assembly and other things I can think of, I only get results about C inline assembly...
Please help, I'm going crazy from not figuring this out.

gcc emits definitions for all the functions present in the translation unit (unless they're static inline, or static and unused, or it chooses to inline them everywhere...).
The CRT start files (linked by default by gcc, not rebuilt from source every time you compile) provide the definition of _start and the other functions you'll see if you disassemble the binary. They're only linked in at the link stage, not as part of compiling a .c to a .s, so you don't see them in gcc -S output.
Related: How to remove "noise" from GCC/clang assembly output? for tips on making compiler asm output human-readable.
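As a small illustration of the point above, here is a sketch of a translation unit that could be passed to gcc -S. The file name demo.c and the function names add, twice and double_it are hypothetical, chosen only for this example. The resulting .s file should contain labels for the functions with external linkage, and nothing named _start, because that symbol comes from the CRT start files at link time.

```c
/* demo.c: hypothetical file name, used only for this sketch. */

/* Internal linkage: gcc may emit a local label for this function,
 * or inline it into its callers and emit no separate label at all. */
static int double_it(int x)
{
    return x * 2;
}

/* External linkage: gcc -S emits a global label named "add". */
int add(int a, int b)
{
    return a + b;
}

/* External linkage: gcc -S emits a global label named "twice". */
int twice(int x)
{
    return double_it(x);
}
```

Running gcc -S demo.c should write demo.s next to the source. Whether double_it keeps its own label depends on the optimization level, so comparing the -O0 and -O2 output is a reasonable way to see the difference.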

Related

Where is __builtin_va_start defined?

I'm trying to locate where __builtin_va_start is defined in GCC's source code, and see how it is implemented. (I was looking for where va_start is defined and then found that this macro is defined as __builtin_va_start.) I used cscope -r in GCC 9.1's source code directory to search for the definition but haven't found it. Can anyone point out where this function is defined?
That __builtin_va_start is not defined anywhere. It is a GCC compiler builtin (a bit like sizeof is a compile-time operator). It is an implementation detail related to the <stdarg.h> standard header (provided by the compiler, not the C standard library implementation libc). What really matters are the calling conventions and ABI followed by the generated assembler.
GCC has special code to deal with compiler builtins. That code does not define the builtin; it implements its ad-hoc behavior inside the compiler. __builtin_va_start is expanded into some compiler-specific internal representation of your compiled C/C++ code, specific to GCC (some GIMPLE, perhaps).
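For context, this is roughly what user code relying on that builtin looks like. It is a minimal sketch using only the standard <stdarg.h> macros; the function name sum_ints is hypothetical. With GCC's own <stdarg.h>, the va_start call here is the macro that expands to __builtin_va_start.

```c
#include <stdarg.h>

/* Sums the `count` int arguments that follow `count`.
 * sum_ints is a hypothetical name used only for illustration. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* macro from <stdarg.h>; GCC maps it to a builtin */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);

    return total;
}
```

Compiling this with gcc -S and reading the body of sum_ints is one way to see how the builtin ends up as ordinary register and stack accesses dictated by the target's ABI.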
From a comment of yours, I would infer that you are interested in implementation details. But that should be in your question.
If you study the GCC 9.1 source code, look inside gcc-9.1.0/gcc/builtins.c (the expand_builtin_va_start function there), and for other builtins inside gcc-9.1.0/gcc/c-family/c-cppbuiltin.c, gcc-9.1.0/gcc/cppbuiltin.c, and gcc-9.1.0/gcc/jit/jit-builtins.c.
You could write your own GCC plugin (in 2Q2019, for GCC 9, and the C++ code of your plugin might have to change for the future GCC 10) to add your own GCC builtins. BTW, you might even overload the behavior of the existing __builtin_va_start by your own specific code, and/or you might have -at least for research purposes- your own stdarg.h header with #define va_start(v,l) __my_builtin_va_start(v,l) and have your GCC plugin understand your __my_builtin_va_start plugin-specific builtin. Be however aware of the GCC runtime library exception and read its rationale: I am not a lawyer, but I tend to believe that you should (and that legal document requires you to) publish your GCC plugin with some open source license.
You first need to read a textbook on compilers, such as the Dragon book, to understand that an optimizing compiler is mostly transforming internal representations of your compiled code.
You further need to spend months studying the many internal representations of GCC. Remember, GCC is a very complex program (of about ten million lines of code). Don't expect to understand it with only a few days of work. Look inside the GCC resource center website.
My dead GCC MELT project had references and slides explaining more of GCC (the design philosophy and architecture of GCC change slowly, so the concepts are still relevant, even if individual details changed). It took me almost ten years full time to partly understand some of the middle-end layers of GCC. I cannot transmit that knowledge in a StackOverflow answer.
My draft Bismon report (work in progress, funded by H2020, so a lot of bureaucracy) has a dozen pages (in its sections §1.3 and 1.4) introducing the internal representations of GCC.

C code Optimization by compiler for atmel studio

I am using Atmel Studio 7, where the optimization level is set to -O1.
Can I check what portion of code is being optimized by the compiler itself?
If I disable optimization, my binary file is 12 KB; with optimization level -O1, it is 5.5 KB.
Can I check what portion of code is being optimized by the compiler itself?
All the code is optimized by the compiler, i.e. affected by optimization flags, except:
Code that's dragged in from libraries (libgcc.a, libc.a, libm.a, lib<device>.a).
Startup code (crt<device>.o), which also includes the vector table, or code from other objects that already exist and are not (re-)compiled in the current compilation. The latter can happen with Makefiles when you change flags therein: if the modules do not depend on the Makefile itself, make will not rebuild them.
Code from assembly modules (*.S, *.sx, *.s), provided the preprocessed assembly code does not use conditional assembly by means of #ifdef __OPTIMIZE__ or similar.
Code in inline assembly, provided the inline asm is not optimized away.
In order to determine whether anything of this is in effect, you can respectively:
Link with -Wl,-Map,file.map and inspect that map file (a text file). It will list which objects have been dragged from where due to which undefined symbol.
Startup code is linked unless you pass -nostartfiles. Add -Wl,-v to the link stage and you'll see crt<device>.o being linked.
You know your compilation units, assembly modules, don't you?
Add -save-temps to the compilation. Inline asm will show up in the intermediate *.s file as:
/* #APP */
; <line> "<compilation-unit>"
<inline-asm-code>
/* #NOAPP */
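The __OPTIMIZE__ macro mentioned above is predefined by GCC whenever an optimization level other than -O0 is in effect, so it can also be tested from C. The following is a minimal sketch; the function name built_with_optimization is hypothetical.

```c
/* Returns 1 if this translation unit was compiled with optimization
 * enabled (GCC predefines __OPTIMIZE__ in that case), 0 otherwise.
 * built_with_optimization is a hypothetical name for this sketch. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Code that branches on this macro compiles differently at -O0 and -O1, which is why preprocessed assembly using it counts as affected by the optimization flags.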

Reordering functions in gcc assembly

I am writing a program which encrypts/decrypts itself in memory and then writes the .text memory region to a copy of the executable so I can change the encryption key each time.
This is mainly for a challenge as I am not great with C, and I'm incorporating parts in assembly as well.
My system is x86_64 Linux but I'm compiling with -m32
I am also using -nostartfiles (with gcc) so that I can write my own _start function. This function is written in assembly and this decrypts/encrypts the rest of the .text section. My problem is that the external functions are being compiled in the wrong order, such that when I try to dump the memory after it has been encrypted it calls an encrypted function which therefore doesn't work.
This is the current order of the functions:
some from -static
my functions which are in the correct order (assembly functions and then the ones from the main C file)
some more from -static
This doesn't work because the assembly encrypts from the main C file 'downwards', also encrypting some -static functions which are needed by the assembly functions.
This is the order I would like the functions to be in:
all -static functions & anything from an #include <>
functions from the .S assembly file (the whole .S in order)
functions from the .c main file (the whole .c in order)
any non-standard includes for the .c main file (ie not stdio.h etc, things from #include "")
Is there any way, short of manually mangling the ELF file, for me to reorder these functions so that the functions I need are not encrypted while the ones I want encrypted can be easily?
Edit: upon compiling with musl (an alternative libc), I can get all of my functions at the start, with the rest of the static functions following. However, this is still the wrong way around.
The "wrong" order of functions inside the binary comes from optimization efforts of the compiler. Functions that are used often (or often together) are placed near each other, so that calling them does not generate a page fault.
You can turn off part of these optimizations with the flag -fno-toplevel-reorder. You can also use the section attribute to group only a subset of functions together (e.g. to encrypt them), or you can write your own linker script.
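A minimal sketch of the section attribute approach, assuming GCC (or clang) on an ELF target such as Linux. The section name .text.protected and both function names are hypothetical; where the section finally lands in the executable still depends on the linker script in use.

```c
/* Place this function in a separate, named section so that a linker
 * script (or a tool inspecting the ELF file) can treat it as a group.
 * ".text.protected" is a hypothetical section name for this sketch.
 * noinline keeps the body from being merged into callers elsewhere. */
__attribute__((section(".text.protected"), noinline))
int secret_transform(int x)
{
    return x ^ 0x5a;
}

/* No attribute: this function goes to the default .text section. */
int plain_helper(int x)
{
    return x + 1;
}
```

After compiling with gcc -c, objdump -h on the object file should list .text.protected as its own section, which is what makes it possible to address the group separately from the rest of the code.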
See also this question.

Static library "interface"

Is there any way to tell the compiler (gcc/mingw32) when building an object file (lib*.o) to only expose certain functions from the .c file?
The reason I want to do this is that I am statically linking to a 100,000+ line library (SQLite), but am only using a select few of the functions it offers. I am hoping that if I can tell the compiler to only expose those functions, it will optimize out all the code of the functions that are never needed by those few I selected, thus drastically decreasing the size of the library.
I found several possible solutions:
This is what I asked about. It is the gcc equivalent of Windows' dllexport:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Code-Gen-Options.html (-fvisibility)
http://gcc.gnu.org/wiki/Visibility
I also discovered link-time code-generation. This allows the linker to see what parts of the code are actually used and get rid of the rest. Using this together with strip and -fwhole-program has given me drastically better results.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html (see -flto and -fwhole-program)
Note: This flag only makes sense if you are not compiling the whole program in one call to gcc, which is what I was doing (making a sqlite.o file and then statically linking it in).
The third option which I found but have not yet looked into is mentioned here:
How to remove unused C/C++ symbols with GCC and ld?
That's probably the linker's job, not the compiler's. When linking that as a program (.exe), the linker will take care of only importing the relevant symbols, and when linking a DLL, the __declspec(dllexport) mechanism is probably what you are looking for, or some flags of ld can help you (man ld).
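To make the visibility idea concrete, here is a minimal sketch assuming GCC on an ELF target (on mingw32 the visibility attribute is typically ignored with a warning). All three function names are hypothetical. static gives a helper internal linkage, hidden visibility keeps a symbol usable across object files but out of a shared object's export table, and default visibility marks the public entry point.

```c
/* Internal linkage: not visible outside this translation unit at all. */
static int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Hidden visibility: other object files of the same library can call it,
 * but it is not exported from a shared object built from them. */
__attribute__((visibility("hidden")))
int scale(int v)
{
    return v * 2;
}

/* Default visibility: the one function intended as the public interface. */
__attribute__((visibility("default")))
int api_scale_to_byte(int v)
{
    return clamp_to_byte(scale(v));
}
```

Compiling everything with -fvisibility=hidden and marking only the public functions as default is the usual way to apply this to a whole library without annotating every internal function.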

Is there any disassembler which generates compilable assembly source code?

I would like to know: is there any disassembler for the Windows platform that can generate assembly source code which can also be assembled again?
Since a disassembler can generate assembly code from an EXE file, is it possible to use that assembly code directly as source code and build it with an assembler like NASM?
IDA can generate the source code. But in most cases you can't edit it. Assume the following code:
loc_401020:
ret
; ...
dd 0FFFFFFFFh, 0, 1, 401020h, 0
; ^^^^^^^ can you find it in big real program?
To insert any new bytes, you must either be sure that every sub_XXXX or loc_XXXX remains at the same offset, or you must replace all such raw address references with labels.
If you don't move any code, you don't need to recompile it - just patch and maybe extend the code section.
I think IDA is quite good at this.
Anyway, the main problem would be that the generated assembly code would be quite unreadable and very hard to maintain (no variable names, function names or signatures), so although technically it would be ASM code, it's still better to use IDA as a clever editor to deduce this information.
I'd patch the executable directly in a debugger and then save the modified executable.
Disassembling and then reassembling is a fragile process, since any change in the position of the reassembled code can break the program. And I think there are multiple binary representations of certain asm instructions, complicating the matter even further.
You are losing some information when disassembling an executable so you are unlikely to get a fully working executable when assembling the disassembly. If you are clever, you can extract single functions from an executable, but not the whole program.
The objconv disassembler can produce assembly code in masm, nasm, yasm and gas syntax.

Resources