How does the gcc linker get the size of a function?

As a result of studying the ELF format, I can see that the object file has a symbol corresponding to each function, and the corresponding symbol table entry has an st_size value, which holds the size of the function.
The problem is that the executable file was still created successfully even though I changed the st_size of a specific function in the object file and linked it. The following is the test code I used.
// In main.c
void myprintf(const char *str);   // forward declaration so the call compiles cleanly

int main(void)
{
    myprintf("TEST");
}

// In log.c
#include <stdio.h>

void myprintf(const char *str)
{
    printf(str);
}
In the code above, I changed the st_size value of the myprintf function in log.o and then linked log.o and main.o. By default, the st_size value was 0x13. I tested changing it to 0x00, and also to 0x40. Either way, the myprintf function in the resulting a.out works fine. How does the linker determine the size of each function?
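For reference, one way to see exactly what the linker is handed is to dump the symbol table of log.o yourself (readelf -s log.o shows the same information). Below is a minimal sketch for 64-bit relocatable ELF objects, using only the structures from <elf.h>; error handling is kept to a bare minimum, and it simply prints the st_value (offset within the section) and st_size of every function symbol.

/* Minimal sketch: list the function symbols of a 64-bit relocatable ELF
 * object (e.g. log.o) together with their st_value and st_size fields.
 * Equivalent to the FUNC rows of `readelf -s log.o`. */
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s object.o\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);

    unsigned char *buf = (unsigned char *)malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size) {
        perror("read");
        return 1;
    }
    fclose(f);

    Elf64_Ehdr *eh = (Elf64_Ehdr *)buf;                  /* ELF header        */
    Elf64_Shdr *sh = (Elf64_Shdr *)(buf + eh->e_shoff);  /* section headers   */

    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)                 /* find .symtab      */
            continue;

        Elf64_Sym *syms = (Elf64_Sym *)(buf + sh[i].sh_offset);
        size_t nsyms    = sh[i].sh_size / sh[i].sh_entsize;
        const char *str = (const char *)(buf + sh[sh[i].sh_link].sh_offset);

        for (size_t j = 0; j < nsyms; j++) {
            if (ELF64_ST_TYPE(syms[j].st_info) != STT_FUNC)
                continue;
            printf("%-20s st_value=%#llx st_size=%#llx\n",
                   str + syms[j].st_name,
                   (unsigned long long)syms[j].st_value,
                   (unsigned long long)syms[j].st_size);
        }
    }
    free(buf);
    return 0;
}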

Well, firstly I'd like to begin with an old saying: it is more likely for humanity to find a theory of everything and unify quantum mechanics with general relativity than to understand the optimizations and decision tree of a linker.
Back to business: I've played with this on my machine and came to the conclusion that the only reasonable explanation is that the linker doesn't truly need the size of a function in order to combine raw machine instructions from different compilation units into a single executable. Let's discuss why.
Say you have two compilation units, each containing three consecutive functions.
Why would one need to know the size of each function? Isn't the fixed, resolved virtual address granted to that function by the linker enough for relocation? The honest answer is: nothing but the offset of a function within the object file is needed to link different compilation units into one executable.
However, with that being said, some executable formats such as ELF don't hand you a ready-made offset for a function's machine code within a compilation unit; you have to calculate it yourself from the offset of that section within the ELF file and the size of each symbol entry in the section pointed to by the symbol table. Which simply means: if you had, as I said earlier, two compilation units with three functions each, and you corrupted the size entry within the symbol table, then as the linker attempted to resolve the compilation units into a single executable it would simply corrupt it, and your executable would quickly give you segfaults. I've tried this at home, and these are the results I got:
When corrupting a symbol table's size entry of a compilation unit with one function, nothing happens, as the entire text section's size (for this matter) is exactly the same as that function's size, so the linker has no problem resolving it.
When doing the same thing for compilation units with three functions each, it corrupts my executable, as the linker starts copying corrupted offsets of text from one compilation unit into the final executable.
Generally speaking, if you were to use an executable format that gives the linker an immediate offset of the function within the object file, without any calculation from size and section offset, you'd probably end up with the same result even with more than one function per compilation unit, unless the linker performs some sanity check. In my opinion, the only reason a linker would need the size rather than such an offset is probably to clean a section of redundant functions or variables that nobody else references (link-time optimization), which requires recalculating relocation offsets for the other referenced functions within that compilation unit, or recalculating relative jumps within that same compilation unit.
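To make the offset argument concrete, here is a tiny hypothetical sketch (not real linker internals; all names and numbers are made up): once the linker has decided where an input .text section lands in the output image, a defined symbol's final address follows from that placement plus the symbol's offset within its section (st_value in ELF), and st_size never enters the arithmetic.

/* Hypothetical sketch, not real linker code: resolving a function's final
   address from its section's placement plus its offset within that section. */
#include <stdint.h>
#include <stdio.h>

struct placed_section {
    uint64_t output_base;   /* where the input .text section was placed      */
};

struct func_symbol {
    uint64_t st_value;      /* offset of the function inside its section     */
    uint64_t st_size;       /* size of the function: unused below            */
};

static uint64_t resolve(const placed_section &sec, const func_symbol &sym)
{
    return sec.output_base + sym.st_value;   /* st_size plays no part */
}

int main()
{
    placed_section text = { 0x401000 };         /* assumed placement           */
    func_symbol myprintf_sym = { 0x20, 0x13 };  /* assumed offset 0x20, size 0x13 */
    printf("myprintf would resolve to %#llx\n",
           (unsigned long long)resolve(text, myprintf_sym));
    return 0;
}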
Hope this somehow answers your question; I'd be more than glad to help if you'd like a deeper demonstration of this.

Related

What content/factors increase a C++ program's exe file size?

Can someone explain in detail exactly what factors increase an executable's output file size in C++? Things I know from testing that increase file size are including libraries and built-in variable types. I also read somewhere on here that the build mode (release vs debug) can increase exe file size. What I'm not sure of is whether the actual value stored in a variable (an int with a value of 5 vs an int with a value of 100,000,000) increases exe size, whether the number of actual lines in the program (depending on the lines' content) plays a role, and what other factors increase the .exe file size.
Debug vs Release
In debug mode, the debug info is sometimes in the .exe file and sometimes in an external pdb file (depending on the compiler used).
Furthermore, debug mode often applies fewer optimizations and uses a lot more instructions (like loading and storing a variable unnecessarily often).
In release builds, on the other hand, optimizations are applied. They can go either way: they either increase the size of the file (like function inlining or loop unrolling) or decrease it (removing unnecessary instructions or merging them).
Furthermore, symbols are often stripped away in release builds.
Static vs dynamic linking
If you link statically, all required libraries/functions are included in the final executable ==> bigger executable (often).
If you link dynamically, only a reference to the library is in the final executable, so the file size is smaller.
(C++ specific) Templates
Templates can increase the size of the file considerably, because machine code has to be generated for each type the template is instantiated with.
Because of this, if you make heavy use of templates (like using Boost), you will get a big executable.
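A small hypothetical illustration: each distinct type a function template is instantiated with gets its own copy of the generated machine code in the binary.

#include <string>
#include <vector>

// Each instantiation of this template becomes a separate function in the binary.
template <typename T>
T sum(const std::vector<T>& v)
{
    T total{};
    for (const T& x : v)
        total += x;
    return total;
}

int main()
{
    std::vector<int>         a{1, 2, 3};
    std::vector<double>      b{1.5, 2.5};
    std::vector<std::string> c{"a", "b"};

    // Three instantiations (sum<int>, sum<double>, sum<std::string>):
    // three separate chunks of machine code end up in the executable.
    return static_cast<int>(sum(a) + sum(b)) + static_cast<int>(sum(c).size());
}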
What I'm not sure of is if the actual value size in a variable like int with a value of 5, vs int with a value of 100,000,000, increase exe size
There won't be a difference, because both will take 4 bytes, EXCEPT if the compiler decides to optimize and uses only one byte for the 5. Like:
mov al, 5   ; 2-byte encoding (saves some bytes)
vs
mov eax, 5  ; 5-byte encoding
if the number of actual lines in the program [...] plays a role
Yes, more lines means more code. But you cannot translate the number of lines directly into the size of the executable. Consider:
void foo(int a,
         int b,
         int c,
         double d)
{
    if (a < 50)
    {
        baz(a);
    }
    //Do something
}
vs
void foo(int a,int b,int c, double d){
if(a<50) baz(a);
}
Both will compile to the same code. Furthermore, comments don't contribute to the executable, unless a compiler decided to embed comments into the executable (I don't know of any compiler that does that).

Instruction pointer values of dynamic linking and static linking

Using Intel's Pin, I printed out the instruction pointer (IP) values for a program built with dynamic linking and with static linking.
I found that their IP values are quite different, even though they are the same program.
The statically linked program shows 0x400f50 as its very first IP value,
but the dynamically linked program shows 0x7f94f0762090 as its first IP value.
I am not sure why there is such a large gap.
It would be appreciated if anyone could help me find out the reason.
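For context, the kind of Pin tool involved here is roughly the classic instruction-trace example from the Pin manual; an untested sketch (assuming the standard Pin kit headers and build setup) looks something like this:

// Rough sketch of an instruction-trace Pintool, modeled on Pin's classic
// itrace example: it prints the instruction pointer of every executed instruction.
#include "pin.H"
#include <stdio.h>

static FILE *trace;

// Called before every instruction executes; receives its address.
static VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }

// Instrumentation callback: insert a call to printip before each instruction.
static VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
}

static VOID Fini(INT32 code, VOID *v) { fclose(trace); }

int main(int argc, char *argv[])
{
    trace = fopen("itrace.out", "w");
    if (PIN_Init(argc, argv)) return 1;

    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();   // never returns
    return 0;
}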
I am not sure why they have that quite a large gap.
Because a dynamically linked program does not start executing in your binary: the first few thousand instructions are executed in the dynamic linker (ld-linux) before control is transferred to _start in the main executable.
See also this answer.
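One way to see both addresses from inside the running process is to query the auxiliary vector (a glibc-specific sketch): AT_ENTRY is the executable's own entry point, while AT_BASE is the base address of the program interpreter, i.e. the dynamic linker that runs first; for a statically linked binary AT_BASE is typically 0.

// Glibc-specific sketch: print the executable's entry point and the base
// address of the program interpreter (the dynamic linker).
#include <sys/auxv.h>
#include <elf.h>
#include <cstdio>

int main()
{
    std::printf("AT_ENTRY (executable entry point): %#lx\n", getauxval(AT_ENTRY));
    std::printf("AT_BASE  (dynamic linker base)   : %#lx\n", getauxval(AT_BASE));
    return 0;
}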

Why do ELF executables have a fixed load address?

ELF executables have a fixed load address (0x08048000 for 32-bit x86 Linux binaries, and 0x400000 for 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why a fixed load address is used rather than a randomized one (within some given range)?
why to use a fixed load address and not a randomized one
Traditionally that's how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of shared library that has startup code in it) with -fPIE and link with -pie flags.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.
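A quick way to observe the difference is to print an address from inside the program and run it a few times (a hypothetical demo): built as a traditional non-PIE executable, the address is the same on every run; built with -fPIE and linked with -pie, it changes from run to run once ASLR is enabled.

// Print the load address of one of the program's own functions.
// Without PIE the value is fixed across runs; with -fPIE/-pie and ASLR
// enabled it differs from run to run.
#include <cstdio>

static void marker() {}

int main()
{
    std::printf("marker() is loaded at %p\n",
                reinterpret_cast<void *>(&marker));
    return 0;
}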
I'm not sure I understood your question correctly, but assuming I did: this is sort of a legacy/historical issue. ELF is the file format used by UNIX-derived operating systems such as Linux,
and the format simply states that there must be some resolved, absolute virtual address that the code is loaded at and begins running from.
That's simply how the file format is, and for historical reasons it can't be changed. You couldn't just "throw" the executable at any memory address and have it run successfully; back in the 90s, when the ELF format was introduced, problems such as calling functions through virtual tables were raised, and it was decided that the ELF format would carry absolute addresses.
Also, think about it; take a look at the ELF format: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
How would you design an OS executable loader that could take an executable, load it at ANY desired virtual address, and have the code run successfully without changing the binary itself? To do something like that you'd either need to vastly change the output compilers generate or change the format itself, which again isn't possible.
As time passed, the requirement for position-independent execution (PIE/PIC) arose, and shared objects were introduced to allow it, along with ASLR
(Address Space Layout Randomization). This means the code can be loaded at any memory address and still execute. That is implemented by making sure all calls within the code itself are relative to the address of the currently executing instruction, and by having the OS loader patch, at load time, only data that is not executable instructions (R E) but actual data (RW, e.g. the .data segment). Calls out of the module go through jump tables that are filled in at load time, for example the PLT/GOT. Shared objects therefore allow complete randomization of the addresses the code is loaded at; if you want code to run in this more "secure" way, you have to compile it as position-independent code and link it dynamically, resolving it at load time or run time.
(Hope I've cleared some things up :) )

Delete dynamically allocated memory twice?

First I'd like to point out that I'm using the GNU GCC compiler. I'm using Code::Blocks as my IDE so I don't have to type all the compiler options into the Windows command prompt. To be more specific about my compiler, the line that shows up at the bottom of Code::Blocks when I compile successfully is
mingw32-g++.exe -std=c++11 -g
Anyways, my question involves using the delete operator to release dynamically allocated memory. When I compile this code snippet:
int* x;
x = new int;
delete x;
delete x;   // deleting the same pointer a second time: undefined behavior
I don't get any warnings or errors or crashes. According to the book I'm learning C++ from, releasing a pointer to a dynamically allocated memory chunk can only be done once; after that the pointer is invalid, and if you use delete on the same pointer again there will be problems. However, I don't get this problem.
Likewise, if I pass an object by value to a function, so that it is shallow-copied, I get no error even though I haven't provided a copy constructor to ensure a deep copy (the object holds raw pointers). This means that when the function returns, the shallow copy goes out of scope and invokes its destructor (where I use delete on a pointer). When main returns, the original object goes out of scope, its destructor is invoked, and that same shallow-copied pointer is deleted again. But I have no problems.
I tried to find documentation online about the compiler I'm using and couldn't find any. Does this mean that the mingw32 compiler has some sort of default copy constructor it uses, so that I don't have to worry about writing copy constructors?
The compiler documentation is not likely to be helpful in this case: If it exists, it is likely to list exceptions to the C++ spec. It's the C++ spec that you need here.
When you delete the same pointer twice, the result--according to the C++ spec--is undefined. The compiler could do anything at all and have it meet spec. The compiler is allowed to recognize the fault and give an error message, or not recognize the fault and blow up immediately or sometime later. That your compiler appeared to work this time is no indication that the double delete is safe. It could be mucking up the heap in a way that results in a seg fault much later.
If you do not define a copy constructor, C++ defines one for you. The default copy constructor does a memberwise copy.
When you have the same object pointed to by multiple pointers, as you do here, consider using a smart pointer such as std::shared_ptr.
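A hedged sketch of the by-value case (the class names are made up): with a raw pointer, the compiler-generated memberwise copy means the copy and the original both delete the same allocation, whereas storing the resource in a std::shared_ptr deletes it exactly once.

#include <memory>

// With a raw pointer, the compiler-generated (memberwise) copy constructor
// copies the pointer itself, so the copy and the original both delete the
// same int: undefined behavior when the second destructor runs.
struct Risky {
    int *p = new int(42);
    ~Risky() { delete p; }
    // no user-defined copy constructor: the default memberwise copy is used
};

// A shared_ptr counts its owners and deletes the int exactly once, when the
// last owner goes away.
struct Safer {
    std::shared_ptr<int> p = std::make_shared<int>(42);
};

void byValue(Safer) { /* the copy shares ownership; nothing is double-deleted */ }

int main()
{
    Safer original;
    byValue(original);          // fine: the int is freed once at program end
    // Risky r; Risky copy = r; // would double-delete when both go out of scope
    return 0;
}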

GCC: avoiding long time linking while using static arrays

My question practically repeats this one, which asks why this issue occurs. I would like to know whether it is possible to avoid it.
The issue is: if I allocate a huge amount of memory statically:
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
then the linker (ld) takes a very long time to produce the executable. There is a good explanation of this behaviour by @davidg in the question linked above:
This leaves us with the following series of steps:
1. The assembler tells the linker that it needs to create a section of memory that is 1GB long.
2. The linker goes ahead and allocates this memory, in preparation for placing it in the final executable.
3. The linker realizes that this memory is in the .bss section and is marked NOBITS, meaning that the data is just 0 and doesn't need to be physically placed into the final executable. It avoids writing out the 1GB of data, instead just throwing the allocated memory away.
4. The linker writes out to the final ELF file just the compiled code, producing a small executable.
A smarter linker might be able to avoid steps 2 and 3 above, making your compile time much faster.
OK. @davidg explained why the linker takes so long, but I want to know how I can avoid it. Does GCC have some option that tells the linker to be a little smarter and avoid steps 2 and 3 above?
Thank you.
P.S. I use GCC 4.5.2 on Ubuntu.
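To see the NOBITS point from the quoted steps concretely, here is a small sketch (the sizes are made up): a zero-initialized array ends up in .bss and takes no space in the file, while the same array with a non-zero initializer ends up in .data and is written out byte for byte; the section sizes can be compared with size or readelf -S on the resulting object.

// Sketch: zeroed goes to .bss (SHT_NOBITS, occupies no file space), while
// filled goes to .data and is stored in the object file in full.
#define BYTES_IN_MEGABYTE (1024 * 1024)

unsigned char zeroed[8 * BYTES_IN_MEGABYTE];          // .bss
unsigned char filled[8 * BYTES_IN_MEGABYTE] = { 1 };  // .data, ~8 MB on disk

int main() { return zeroed[0] + filled[0]; }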
You can allocate the static memory in the release version only:
#ifndef _DEBUG
unsigned char static_data[ 8 * BYTES_IN_GYGABYTE ];
#else
unsigned char *static_data;
#endif
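In the debug branch the pointer then has to be set up at run time; a sketch, keeping the same names as above, might be:

#ifdef _DEBUG
#include <cstdlib>

// Debug builds: allocate the buffer at startup (call this early in main) so
// the linker never sees a gigantic zero-initialized object.
void allocate_static_data()
{
    static_data = static_cast<unsigned char *>(
        std::calloc(8ULL * BYTES_IN_GYGABYTE, 1));
}
#endif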
I have two ideas in mind that could help:
As already mentioned in a comment: place it in a separate compilation unit. That by itself will not reduce linking time, but maybe together with incremental linking (ld option -r) it helps.
The other is similar: place it in a separate compilation unit and generate a shared library from it, then just link against the shared library later.
Sadly I cannot promise that either of these helps, as I have no way to test: my gcc (4.7.2) and binutils don't show this time-consuming behaviour; 8, 16 or 32 gigabyte test programs compile and link in under a second.
