Is ld called at both compile time and runtime?

I am trying to understand how linking and loading work. My understanding is that the Unix program "ld" contains both linking and loading functionality. When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable, along with minimal instructions for how shared libraries should be "connected" (what is the correct terminology here?) at runtime. This linker is ld.
At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how. My specific questions are as follows:
1) Are shared object files being "linked" at compile time, or is there another word for what is happening?
2) At runtime, is ld being called for a second time? How can I see proof of this for my executable (on Linux and on MacOS)?
3) Are shared object files being "linked" at runtime, or is there another word for the process when shared objects are read from the location in LD_LIBRARY_PATH at runtime?
Thanks!

Is ld called at both compile time and runtime?
No: ld is not called at either compile or runtime.
When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable
Most moderately complicated programs use separate compilation and linking steps.
At compilation, a set of relocatable object files is produced (preprocessing, compilation and assembling are invoked at that step). Optionally the .o files are archived into an archive library (libsomething.a).
Then a link step is performed (often this is called "static linking", to differentiate this step from "dynamic loading" that will happen at runtime), producing an executable or a shared library. Only at this step is /usr/bin/ld invoked. On Linux, ld is part of the binutils package.
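As an illustration, a typical two-step build might look like this (the file and library names are just placeholders):

gcc -c main.c util.c              # compile only: preprocess, compile and assemble each source into a relocatable object
ar rcs libutil.a util.o           # optionally archive objects into a static library
gcc -o myprog main.o libutil.a    # link step: this is where gcc invokes /usr/bin/ld behind the scenes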
along with minimal instructions for how shared libraries should be "connected"
The linker records which shared libraries are required at runtime, and possibly which versions of libraries or symbols are required.
It also records which runtime loader should be used to load the required shared libraries.
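For an ELF executable on Linux you can inspect what the linker recorded, e.g. (a.out stands for your executable):

readelf -d a.out | grep NEEDED      # shared libraries required at runtime
readelf -l a.out | grep -i interp   # the requested runtime loader (the PT_INTERP entry)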
At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how.
The kernel loads the executable into memory and checks whether a runtime loader was requested at static link time. If one was, the dynamic loader is also loaded into memory, and execution control is passed to it (instead of to the main executable).
It is then the job of the dynamic loader to examine the executable for instructions on which other libraries are required, check whether correct versions can be found, load them into memory, and arrange things so that symbol resolution works between the main executable and the shared libraries. This is the runtime loading step, often also called dynamic linking.
The dynamic loader can be part of the OS, but on Linux it's part of libc (GLIBC, uClibc and musl each have their own loader).
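To see evidence of this for your own executable (a.out below is a placeholder name):

ldd ./a.out                      # Linux: which shared libraries the loader would resolve, and from where
LD_DEBUG=libs ./a.out            # Linux/GLIBC: the loader prints its library search while the program runs
otool -L ./a.out                 # macOS: list the dylibs the executable depends on
DYLD_PRINT_LIBRARIES=1 ./a.out   # macOS: dyld prints each library as it loads it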

No. ld does the linking, as in creating a library or executable; ld*.so is the loading part. Also, ld*.so is part of the OS, not the gcc suite, AFAIK. ld is generally part of (GNU) binutils on a gcc-based system (but usually LLVM's lld on an LLVM-based system).
ld*.so is ld-linux-{arch}.so.2 on Linux and /libexec/ld-elf.so on e.g. FreeBSD.

Related

Create a statically linked shared library

Is it possible to create a shared library which is itself statically linked, i.e. it does not depend on other shared libraries?
Let me be a little bit more concrete..
I want to create a shared library, say mylib.so, which makes use of some other special libraries (in my case it's Intel MKL and OpenMP). Since I have installed these libraries, I can build mylib.so and include it in other programs without any problem.
However, if I want to use the library (or the executables including it) on another machine I first have to install all the intel stuff. Is there a way to avoid this? My first try was to add the option -static when building mylib.so but this doesn't seem to do anything..
I'm using icc..
Is it possible to create a shared library which is itself statically linked, i.e. it does not depend on other shared libraries?
Not on Linux, not when using GLIBC (your shared library will always depend on at least ld-linux*.so*).
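You can observe this dependency yourself, e.g.:

ldd mylib.so    # will list ld-linux-*.so.* among the resolved entries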
I want to create a shared library, say mylib.so, which makes use of some other special libraries (in my case it's Intel MKL and OpenMP).
There is no problem1 statically linking Intel MKL and OpenMP libraries into mylib.so -- you just don't want to depend on these libraries dynamically (in other words, you are asking for an impossible thing which you don't actually need).
To do so, you need two things (a minimal command sketch follows this list):
Link mylib.so with archive versions of the libraries you don't want to depend on dynamically, e.g. gcc -o mylib.so -shared mylib.c .../libmkl.a ...
The libraries which you want to statically link into mylib.so must have been built with position-independent code (i.e. with -fPIC flag).
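A minimal sketch of such a build, assuming archive versions of the Intel libraries are available (the library paths and names below are placeholders, not the real MKL file names, and the real MKL may need additional linker options):

gcc -fPIC -c mylib.c                                                              # position-independent objects for your own code
gcc -shared -o mylib.so mylib.o /path/to/libmkl_placeholder.a /path/to/libomp_placeholder.a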
Update:
What if the archived version isn't available?
Then you can't link it into your library.
E.g. I'm using intel/oneapi/intelpython/latest/lib/libstdc++.so and there is no corresponding .a file..
This is a special case: you wouldn't want to link that version into your library even if it were available.
Instead, your program should use the version installed on the target system.
Having two separate versions of libstdc++ (e.g. one statically linked, and the other dynamically linked) into a single process will end very badly -- either with a crash, or with silent stack or heap corruption.
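If in doubt, you can check which libstdc++ a binary will actually resolve at runtime, e.g.:

ldd ./myprog | grep libstdc++    # myprog is a placeholder for your executable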
1 Note that linking in somebody else's library and distributing it may have licensing implications.

Cross compilation of libraries that dynamically or statically linked with system libraries

I am trying to cross compile some dependency libs for RaspberryPi target system, and host system is Linux with GCC compiler. For example, let's say that one of those libs has dependency on linkage stage and being linked with one of the system's static or dynamic libraries.
How is this case resolved by the linker? (Those .a or .so files can be different on the target system, so the program will probably crash on the Raspberry Pi in this case.) How do I make it work in the right way?
The build environment that the cross-compiler provides is more accurately described as a cross-toolchain. It needs to provide everything you need: Not just the compiler, but also the assembler, linker, and all run-time support libraries. That includes a C library (maybe glibc, maybe something else), the GCC run-time library (libgcc and libgcc_s), and the C++ run-time library (libstdc++). But the build environment also needs copies of all the libraries your software needs to build, typically both header files and static libraries or dynamic shared objects for the target. In particular, you cannot use the installed header files on the host because they might have the wrong definitions and declarations for the target.
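As a rough sketch of what that looks like in practice (the toolchain prefix, paths and library name below are placeholders that depend on the cross-toolchain you installed), you point the cross-compiler at a copy of the target's headers and libraries, often called a sysroot:

arm-linux-gnueabihf-gcc --sysroot=/path/to/rpi-sysroot -o myprog main.c -lfoo   # link against the target's copy of libfoo
file myprog                                                                     # sanity check: should report an ARM binary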
Some programmers simply copy their dependencies (which are not system libraries) into their source tree, so that the cross-build environment can stay minimal. But then these libraries have to be tracked and updated as part of the project, which can be cumbersome.

GCC, PIE, PIC, archives and shared objects - what works with what?

I have a question about GCC, ThreadSanitizer and the use of archives, shared libraries and PIE and PIC.
I've been reading about as best I can all morning, but I just can't find useful, clear information on-line.
I understand what PIC does. I think I understand that PIE is, if you like, an optimized version of PIC which is only for executables.
Now come the questions...
Can I compile an executable with PIC, rather than PIE?
If I compile a shared library (.so) with PIC, must I then use PIC with any executable which uses that library, rather than PIE?
If I compile an archive (.a), can I use PIE? (I have read -static and -pie should not be used together, which implies not).
I'm using ThreadSanitizer. This requires PIC (and perhaps PIE is okay too - but as you can see, I'm not clear about this). I have a library, which can be compiled as an archive (.a) or a shared library (.so). The library needs to use ThreadSanitizer. However, the binary which uses it also needs to use ThreadSanitizer (as it has some code of its own which needs checking).
The library, when built as a shared library, in fact fails to link when used with ThreadSanitizer - I think the link is failing to link against libtsan (but this is, I suspect, not a real library, but a bunch of compiler intrinsics built into GCC). This is almost certainly me getting something wrong somewhere.
What I really want to do is use an archive (.a) since the binary is a test programme and should be able to compile without the library being installed (so users can conveniently check/test the library - the makefile for the test binary has a hard coded path to the archive binary).
If I can use PIE with archives (.a), then I'd PIE the library and the test binary. If PIE cannot be used with archives, then I think I need to use PIC with both the library and test binary. I don't want to use a shared library at all, since ThreadSanitizer uses TLS (thread local store) heavily and shared libraries with PIC have absolutely terrible TLS performance.
The ultimate functionality of PIC and PIE is the same, but in gcc -fpic is used to create shared libraries, whereas -fpie is used for executables.
No, you cannot use PIC for an executable.
Shared libraries don't care whether a PIE (PIE just makes the executable position independent) or a normal executable is using them. It is the dynamic linker's (ld.so) job to link in the shared library.
No, you can't make an executable position independent while using a static library. When you link an executable with a static library it creates a dependency in the executable, and the symbols have to be resolved at compile time. So, in short, you can't.
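For reference, the flags mentioned above look roughly like this in practice (file names are placeholders):

gcc -fpic -shared -o libfoo.so foo.c   # position-independent code, for a shared library
gcc -fpie -pie -o myprog main.c        # position-independent executable (PIE)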
I have to go; I'll answer the rest after office.

How does gcc/ld find zlib.so?

I've used zlib for ages and never thought about the fact that it is named slightly unconventionally. While most libraries on Linux follow the naming convention of lib<name>.so for shared objects and lib<name>.a for archives, zlib is named zlib.so/zlib.a. My question is: how does gcc/ld know to look for zlib.so when I use -lz as a link flag?
I understand that for linking, gcc invokes ld, which searches for libraries in certain default paths and any path specified with -L, and that it adds the lib prefix and the .so or .a suffix as necessary. Oddly, gcc's manual page for linking options only mentions that the linker can find archives; there is no mention of the .so extension. The man page for ld at least mentions both extensions, but still only mentions searching by prepending lib to the specified library name. How does ld know to add the lib after the z for zlib? I've never seen this happen with another library.
gcc has several different methods for linking libraries, shared or static. If you specify -lz, gcc is going to look for libz.so (possibly with some version bits between the libz and the .so, but the important part is the file name will start with libz and end with .so), or for libz.a (again, possibly with version info) if you are compiling statically, or as a fallback if the shared library does not exist. If you specify -lzlib it will look for libzlib.so (which is not the standard name - the package is often named zlib, but the library itself is libz). Another way of linking would be to not use the -l<lib> option, and just specify /path/to/zlib.so or -L /path/to zlib.so (or zlib.a if you want). In this case, the library doesn't have to have the lib prefix, but you would have to explicitly provide any version info, unless provisions are made for a symbolic link or something similar to provide the literal name zlib.so.
Applications can also load shared libraries at runtime via dlopen() and its other associated functions, in which case the library can be named whatever you want it to be (this doesn't work for static libraries, of course).
So, if the library you are looking at is actually called zlib.so, then it is not being found by gcc ... -lz, unless it just happens to be a symbolic link to libz.so (or vice versa, in which case gcc is really just using libz.so, which happens to have the same content as your zlib.so). However, gcc might be using it if the build process explicitly names the library in the link stage (not using -l<lib>) or if your application loads it via dlopen() (but in that case, it's not really linked to your program - it's just loaded at run time).
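One way to watch this searching happen (the paths and file names are illustrative) is to ask the linker to be verbose, or to bypass -l and name the file explicitly:

gcc -o myprog main.c -lz -Wl,--verbose 2>&1 | grep libz   # shows ld trying libz.so / libz.a in each search directory
gcc -o myprog main.c /path/to/libz.so                     # linking by explicit path; no lib prefix or -l search involved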

What happens when compiling against a shared library?

I understand that when linking against a static library i.e. libname.a, the binary code for the used functions is taken out of the archive and inserted in the application binary. Therefore, the static library MUST be present at compilation time.
However, with shared libraries I am lost. The function definitions are not copied. Then why is it needed that the shared library be provided on the linker command line? Also, are there different ways to link against shared libraries and what are they?
The shared libraries need to be fed to the linker's command line so that a reference to the specific functions, and the file in which these functions reside, is stored into the executable. When the executable is run, the dynamic linker (/lib/ld-linux.so, /libexec/ld-elf.so, etc., depending on your system) is loaded first and checks these references. Once it finds the lib files, it maps them (using the mmap() system call) into your program's address space.
You can see these references by running
objdump -T a.out
or
nm -D a.out
For ELF executables, the existence of the .interp section implies that the program uses dynamic linking.
See the man pages for dlopen and dlsym for explicit dynamic link loader management.
