What happens when compiling against a shared library? - gcc

I understand that when linking against a static library i.e. libname.a, the binary code for the used functions is taken out of the archive and inserted in the application binary. Therefore, the static library MUST be present at compilation time.
However, with shared libraries I am lost. The function definitions are not copied. Then why is it needed that the shared library be provided on the linker command line? Also, are there different ways to link against shared libraries and what are they?

The shared libraries need to be fed to the linker's command line so that a reference to the specific functions and the file in which these functions reside, is stored into the executable. When the executable is run, the dynamic linker (/lib/ld-linux.so, /libexec/ld-elf.so, etc, depending on your system) is loaded first and checks these references. Once it finds the lib files, it maps them (using the mmap() system call) to your program's adress space.
You can see these references by running
objdump -T a.out
or
nm -D a.out
For ELF executables, the existence of the .interp section implies that the program uses dynamic linking.

See the man pages for dlopen and dlsym for explicit dynamic link loader management.

Related

Is ld called at both compile time and runtime?

I am trying to understand how linking and loading work. My understanding is that the Unix program "ld" contains both linking and loading functionality. When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable, along with minimal instructions for how shared libraries should be "connected" (what is the correct terminology here?) at runtime. This linker is ld.
At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how. My specific questions are as follows:
1) Are shared object files being "linked" at compile time, or is there another word for what is happening?
2) At runtime, is ld being called for a second time? How can I see proof of this for my executable (on Linux and on MacOS)?
3) Are shared object files being "linked" at runtime, or is there another word for the process when shared objects are read from the location in LD_LIBRARY_PATH at runtime?
Thanks!
Is ld called at both compile time and runtime?
No: ld is not called at either compile or runtime.
When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable
Most moderately complicated programs use separate compilation and linking steps.
At compilation, a set of relocatable object files is produced (preprocessing, compilation and assembling are invoked at that step). Optionally the .o files are archived into an archive library (libsomething.a).
Then a link step is performed (often this is called "static linking", to differentiate this step from "dynamic loading" that will happen at runtime), producing an executable, or a shared library. Only at this step is /usr/bin/ld is invoked. On Linux, ld is part of the binutils package.
along with minimal instructions for how shared libraries should be "connected"
The linker records which shared libraries are required at runtime, and possibly which versions of libraries or symbols are required.
It also records which runtime loader should be used to load the required shared libraries.
At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how.
The kernel loads executable into memory, and checks whether runtime loader was requested at static link time. If it was, the dynamic loader is also loaded into memory, and execution control is passed to it (instead of the main executable).
It is then the job of the dynamic loader to examine the executable for instructions on which other libraries are required, check whether correct versions can be found, loading them into memory, and arranging things such that symbol resolution will work between the main executable and the shared libraries. This is the runtime loading step, often also called dynamic linking.
The dynamic loader can be part of the OS, but on Linux it's part of libc (GLIBC, uClibc and musl each have their own loader).
No. ld is linking as in creating a library or exe, ld*.so is the loading part. Also ld*.so is part of the OS, not the gcc suite afaik. ld is generally part of (GNU) binutils on a gcc based system (but e.g. usually LLVM lld in a LLVM based system)
ld*.so is ld-linux-{arch}.so.2 on Linux and /libexec/ld-elf.so on e.g. FreeBSD.

Generate library from ELF file

I'm trying to generate a static library from a compiled ELF file.
Previously, I've been able to generate the library by compiling my source code to object files, then passing those objects to avr-ar to successfully create my library. In order to reduce code space of the project, I've switched over to using link-time optimisations so save ~1.5 kB of space - however, in order to do so I end up passing all my source and header files to avr-gcc in one invocation and it spits out a .elf file.
I can't seem to get the -flto option working with the linker (I'm using a custom linker script) and compiler driver, otherwise I'd have the object files I need.
Is it possible to take this generated .elf and push it through ar to generate a library?
Problem Context:
This is related to this problem. I've written the shared libraries and bootloader section, and am using this linker script to set out my flash space. Here's the Makefile that drives all this - it's very hacked together.
Ideally, what I'd like to happen is to be able to compile my src/ director to separate object files in obj/, all with link time optimisation enabled to cut down on code space as much as possible but still leaving unused functions in the output (the shared library that is stored in flash is not fully utilised by the bootloader application, but may be linked against by the loaded applications). I'd then like to be able link those objects together to create a .elf and libbootloader.a. The elf is then used to generate a binary to flash to my AVR and the bootloader library is referenced when building user applications that refer to the stored library in flash space already. (Perhaps I want to just link against a list of symbols referencing the shared library section?)

Linking binary against functions/data in specific location in memory

I'm currently in the process of writing an intermediate-memory bootloader for an ATMega.
I'd like to place a section of commonly used functions and data in a specific location in memory, such that:
limited size of the bootloader section is not overcome
library functions, drivers, etc, are not reproduced by the application section and thus wasting space
For illustrative purposes, a map of the desired memory layout is below:
Following some help in this thread on avrfreaks, I'm to the point where I've been able to move all code (in my bootloader + library development environment - applications will be developed in separate projects) not tagged with __attribute__ ((section(".boot"))) to the shared library section successfully via means of a custom linker script.
It was suggested in the avrfreaks thread that I can link my applications by using avr-objcopy --strip-all --keep-symbol=fred --keep-symbol=greg ... boot.elf dummy.elf to create a symbol reference of what I have in my shared library, and then linking my applications against this memory layout with avr-gcc -o app.elf -Wl,--just-symbols=dummy.elf app1.o app2.o ....
The problem I face here is that I need to specify each symbol I want to keep in my dumy.elf. I can use the keep-symbols=<file> directive to specify a text file list of symbols to keep, but I still must generate this list.
I've noticed that there is a bunch of symbols that I don't want to include (stuff like C environment set-up code that is common in name, but different in functionality, for both the bootloader and application) that seems to start with the prefix '_' (but of course, there are some useful and large library functions with the same prefix, e.g. *printf and math routines). Perhaps there won't be conflicts if I link my application against the existing runtime code in the application/bootloader?
How can I generate a list of symbols for my library section that contains the code that I've written (maybe some sed magic and scanning header files)/excludes any symbols that may conflict in linking the application?
The project can be viewed in its current state at this github repository.
Edit: I want to make clear that I could tag everything I want to be in the shared library section with __attribute__ ((section(".library"))), but as I also want to share some rather large libc stuff (vsprintf, etc) between the bootloader and application, this becomes cumbersome very quickly. As such, I've elected to put everything not tagged as boot in the library memory region via a linker script.
Perhaps I just need some advice on my linker script, as I'm not super sure what I'm doing there.
Consider using -R <file> as linker option (gcc -Wl,-R -Wl,<file>).
This will generate references to (global) symbols in <file> just as if they were linked normally, but not include the referenced code.

How to detect code compiled with LTO?

Exist any way to detect if code is compiled with -flto?
Example is classic library or executable under Linux compiled with GCC (4.9.1), without debugging.
Considering that LTO information is stored in several ELF sections inside object files (see LTO file sections), you could try and see what readelf returns (as used for instance in this answer).
Look for .gnu.lto_.xxx entries.

How does gcc/ld find zlib.so?

I've used zlib for ages and never thought about the fact that it is named slightly unconventionally. While most libraries on Linux follow the naming convention of lib<name>.so for shared objects and lib<name>.a for archives, zlib is named zlib.so/zlib.a. My question is: how does gcc/ld know to look for zlib.so when I use -lz as a link flag?
I understand that for linking, gcc invokes ld, which searches for libraries in certain default paths and any path specified with -L, and it appends the lib and .so or .a. parts as necessary. Oddly, gcc's manual page for linking options only mentions that the linker can find archives; there is no mention of the .so extension. The man page for ld at least mentions both extensions, but still only mentions searching by prepending lib to the specified library name. How does ld know to add the lib after the z for zlib? I've never seen this happen to another library.
gcc has several different methods for linking libraries, shared or static. If you specify -lz, gcc is going to look for libz.so (possibly with some version bits between the libz and the .so, but the important part is the file name will start with libz and end with .so), or for libz.a (again, possibly with version info) if you are compiling statically, or as a fallback if the shared library does not exist. If you specify -lzlib it will look for libzlib.so (which is not the standard name - the package is often named zlib, but the library itself is libz). Another way of linking would be to not use the -l<lib> option, and just specify /path/to/zlib.so or -L /path/to zlib.so (or zlib.a if you want). In this case, the library doesn't have to have the lib prefix, but you would have to explicitly provide any version info, unless provisions are made for a symbolic link or something similar to provide the literal name zlib.so.
Applications can also load shared libraries at runtime via dlopen() and it's other associated functions, in which case the library can also be named whatever you want it to be (this doesn't work for static libraries, of course).
So, if the library you are looking at is actually called zlib.so, then it is not being found by gcc ... -lz, unless it just happens to be a symbolic link to libz.so (or vice versa, in which case gcc is really just using libz.so, which happens to have the same content as your zlib.so). However gcc might be using it if the build process explicitly names the library in the link stage (not using -l<lib>) or if your application loads it via dlopen() (but in that case, it's not really linked to your program - it's just loaded at run time).

Resources