C code Optimization by compiler for atmel studio - gcc

I am using Atmel Studio 7 and in that, optimization level is -O1.
Can I check what portion of code is being optimized by the compiler itself?
If I am disabling the optimization, my binary file size is of 12KB and on using optimization level -O1, binary file size if 5.5KB.

Can I check what portion of code is being optimized by the compiler itself?
All the code is optimized by the compiler, i.e affected by optimization flags except
It's code that's dragged from libraries (libgcc.a, libc.a, libm.a, lib<device>.a).
Startup code (crt<device>.o) which also includes the vector table, or code from other objects that already exist and are not (re-)compiled in the current compilation. The latter can happen with Makefiles when you change flags therein: If the modules do not depend on the Makefile itself, make will not rebuild them.
Code from assembly modules (*.S, *.sx, *.s) provided preprocessed assembly code does not use conditional assemblation by means of #ifdef __OPTIMIZE__ or similar.
Code in inline assembly, provided the inline asm is not optimized away.
In order to determine whether anything of this is in effect, you can respectively:
Link with -Wl,-Map,file.map and inspect that map file (a text file). It will list which objects have been dragged from where due to which undefined symbol.
Startup code is linked except you -nostartfiles. Add -Wl,-v to the link stage, you'll see crt<device>.o being linked.
You know your compilation units, assembly modules, don't you?
Add -save-temps to the compilation. Inline asm will show in the intermediate *.s file as
/* #APP */
; <line> "<compilation-unit>"
<inline-asm-code>
/* #NOAPP */

Related

GCC __attribute__((used)) vs. linker file KEEP statement?

Using "Gnu Arm Embedded Toolchain", it seems that I need to have both this statement in my .c file:
__attribute__ ((section("section_name"),used))
and this statement in my .ld file:
KEEP(sectionname)
in order for that particular section to not get removed by linker garbage collection (--gc-sections).
Can anyone explain why or guide me to some documentation mentioning this?
Both compiler and linker may remove functions which they consider to be unused (which usually means not reachable from main) so to preserve a function you need to inform both of the tools.
In theory compiler could automatically generate KEEP statements based on used attributes but this isn't done for historical reasons.

What is the name for the structure fo the gcc assembly output

Im trying to learn assembly, first i was using NASM for the compiling, but then i understood that i could use .s files in gcc. This interested me greatly, since my goal for this is to be able to write a compiler for a custom language, so this was very intriguing, as it would allow me to link and compile with c code. So filled with excitement, I started compiling c to assembly (.s files) with gcc, and examen it. As I was doing this, it seamed to be structured in a different way then NASM assembly, with only main label, f.eks, and not _start, and other weird structure, and im not talking about Intel- vs AT&T syntax. So then my question follows:
Is it a different structure, in normal assembly and the .s files in gcc, or is it just me not having a good enough knowlage of assembly? If it is a different structure, does it have a name?
I have been trying to google my way to this for hours, but when i search for gcc assembly, and other things I can think of, I only get c inline assembly...
Please help, im going crazy from not figuring this out.
gcc emits definitions for all the functions present in the translation unit. (unless they're static inline or static and unused or it chooses to inline them everywhere...).
The CRT start files (linked by default by gcc, not re-built from source every time you compile) provides the definition for _start and the other functions you'll see if you disassemble the binary. They're only linked in at the link stage, not as part of compiling a .c to a .s, so you don't see them in gcc -S output.
Related: How to remove "noise" from GCC/clang assembly output? for tips on making compiler asm output human-readable.

Using Linker Symbol from C++ code as a fixed constant (NOT relocated) in a shared library (DLL)

Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build a x86 32-bit DLL on Windows (so far). I'll spare you the details why I need hacky offsets amongst its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of say __imp__MessageBoxW#16 relative to __IAT_start__, in case you don't know what they are, __imp__MessageBoxW#16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW#16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a Linker Symbol from within inline asm/C++ without having it relocated whatsoever? No entry to the .reloc section or anything, basically same as a constant like $1234. So if a DLL gets loaded into another base address, that constant would be the same everytime.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? it is replacing their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.

Reordering functions in gcc assembly

I am writing a program which encrypt/decrypts itself in memory and then writes the .text memory region to a copy of the executable so I can change the encryption key each time.
This is mainly for a challenge as I am not great with C, and I'm incorporating parts in assembly as well.
My system is x86_64 Linux but I'm compiling with -m32
I am also using -nostartfiles (with gcc) so that I can write my own _start function. This function is written in assembly and this decrypts/encrypts the rest of the .text section. My problem is that the external functions are being compiled in the wrong order, such that when I try to dump the memory after it has been encrypted it calls an encrypted function which therefore doesn't work.
This is the current order of the functions:
some from -static
my functions which are in the correct order (assembly functions and then the ones from the main C file)
some more from -static
This doesn't work becuase the assembly encrypts from the main C file 'downwards', also encrypting some -static functions which are needed from the assembly functions.
This is the order I would like the functions to be in:
all -static functions & anything from an #include <>
functions from the .S assembly file (the whole .S in order)
functions from the .c main file (the whole .c in order)
any non-standard includes for the .c main file (ie not stdio.h etc, things from #include "")
Is there any way, short of manually mangling the ELF file, for me to reorder these functions so that the functions I need are not encrypted while the ones I want encrypted can be easily?
edit upon compiling with the musl (alternative libc) I can get all of my functions at the start, and the rest of the static functions following. However, This is the wrong way around still.
The "wrong" order of functions inside the binary comes from optimization efforts of the compiler. Functions that are used often (or often together) are near each other, so that no pagefault is generated by calling them.
You can turn off part of these optimizations with the flag -fno-toplevel-reorder. You can also use the attribute section to order only a subset of functions together (eg to encrypt them) or you can write your own linker scripts.
See also this question.

LLVM translation unit

I try to understand LLVM program high level structure.
I read in the book that "programs are composed of modules ,each of which correspons to tranlation unit".Can someone explain me in more details the above and what is the diffrenece between modules and translation units(if any).
I am also interested to know which part of the code is called when translation unit starts and completes debugging information encoding?
Translation unit is term from language standard. For example, this is from C (c99 iso draft)
5.1 Conceptual models; 5.1.1 Translation environment; 5.1.1.1 Program structure
A C program need not all be translated at the same time. The text of the program is kept
in units called source files, (or preprocessing files) in this International Standard. A
source file together with all the headers and source files included via the preprocessing
directive #include is known as a preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.
So, translation unit is the single source file (file.c) after preprocessing (all #included *.h files instantiated, all macro are expanded, all comments are skipped, and file is ready for tokenizing).
Translation unit is a unit of compiling, because it didn't depend on any external resource until linking step. All headers are within TU.
Term module is not defined in the language standard, but it AFAIK refers to translation unit at deeper translation phases.
LLVM describes it as: http://llvm.org/docs/ProgrammersManual.html
The Module class represents the top level structure present in LLVM programs. An LLVM module is effectively either a translation unit of the original program or a combination of several translation units merged by the linker.
The Module class keeps track of a list of Functions, a list of GlobalVariables, and a SymbolTable. Additionally, it contains a few helpful member functions that try to make common operations easy.
About this part of your question:
I am also interested to know which part of the code is called when translation unit starts and completes debugging information encoding?
This depends on how LLVM is used. LLVM itself is a library and can be used in various ways.
For clang/LLVM (C/C++ complier build on libclang and LLVM) the translation unit created after preprocessing stage. It will be parsed into AST, then into LLVM assembly and saved in Module.
For tutorial example, here is a creation of Modules http://llvm.org/releases/2.6/docs/tutorial/JITTutorial1.html

Resources