Build library archive with undefined references - gcc

A colleague of mine told me yesterday that building libfoo.a doesn't require all its functions to be defined, as long as the executable that links to it defines the missing references.
He said that archives are only a collection of object files with an index, and since object files can be built with undefined references, so can archives.
Is this true? If so, does this imply that reference resolution is ONLY performed during the linking stage (i.e. never at compilation or archiving)?
Thanks a lot. The compiler is gcc, by the way, and the language is C/C++.

Yes, all that is perfectly true. You seem to know that libfoo.a
is an ar archive. ar is the GNU
general purpose archiver. It is quite as happy to archive the contents
of your Documents, Pictures and/or Music folders as a collection of object
files.
External symbol resolution is linkage: it is the core business of linkage, and linkage
is done only by the linker. If ar were supposed to resolve the external symbol references of object files in an archive, then ar,
like the linker, would require command options to specify the external libraries in
which symbol definitions are to be searched for, and the directories in which
those libraries are to be searched for. It hasn't any.
An ar archive may be used as a linker input file. In this case the linker will
search in the archive for any object files that provide definitions for unresolved
symbol references that have accrued from object files already consumed. It will
not care at all what other kinds of files are in the archive, with or
without object files. If it finds any object files that define unresolved references, it extracts
them from the archive and adds them to linkage, exactly as if they had been individually
specified on the command line and the archive not mentioned at all. So the only role of
an archive in linkage is as a bag of object files from which the linker can pick ones
it needs to carry on.
If we know the right bag to offer the linker, we're spared the difficulty of knowing
exactly which object files within it the linkage will need. That's the usefulness
of static libraries. In principle, any archive format might have been adopted (.tar, .gz...) But ar was first in
the field, is not burdened with unwanted functionality (directory serialization, compression ...), and was history's choice.
Microsoft LIB format, incidentally, is the same as ar format.
For this role in the service of the linker, GNU ar has specialized a little
for the presence of object files. The s option - which is a default, overrideable by
S - adds a fake "file" to the archive with an empty filename and data that the
linker is able to read as a lookup table from the global symbols defined by any object files in the archive to
the names and positions of those object files. Formerly
(and in non-GNU variants of ar) this kludge was applied by running a separate
program, ranlib, on an archive to make it accessible to the linker.
The injection of a ranlib table is what enables the linker to pick the object files it needs out of the archive.
Any undefined references that are brought in with these object files are for the linker to
resolve as usual, from object files or libraries subsequently consumed.
The wording of your question suggests you may be under the impression that "archiving" -
e.g. creating libfoo.a - is one of the processes that can be invoked, like compilation
and linkage, through the GCC frontends (gcc, g++, gfortran, etc.) This isn't
so. Those frontends invoke only (one or more of) a preprocessor, a compiler, an assembler
and the linker. Archives are an auxiliary convenience for delivering object files
to the linker and are created straightforwardly with ar:
ar cr libfoo.a file.o...
When this is done, the undefined references within libfoo.a are exactly the
undefined references within file.o ....

Related

Is ld called at both compile time and runtime?

I am trying to understand how linking and loading work. My understanding is that the Unix program "ld" contains both linking and loading functionality. When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable, along with minimal instructions for how shared libraries should be "connected" (what is the correct terminology here?) at runtime. This linker is ld.
At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how. My specific questions are as follows:
1) Are shared object files being "linked" at compile time, or is there another word for what is happening?
2) At runtime, is ld being called for a second time? How can I see proof of this for my executable (on Linux and on MacOS)?
3) Are shared object files being "linked" at runtime, or is there another word for the process when shared objects are read from the location in LD_LIBRARY_PATH at runtime?
Thanks!
"Is ld called at both compile time and runtime?"
No: ld is not called at either compile or runtime.
"When gcc is invoked, after preprocessing, compiling, and assembling, the linker is called which links all object files and .a files into an executable"
Most moderately complicated programs use separate compilation and linking steps.
At compilation, a set of relocatable object files is produced (preprocessing, compilation and assembling are invoked at that step). Optionally the .o files are archived into an archive library (libsomething.a).
Then a link step is performed (often this is called "static linking", to differentiate this step from "dynamic loading" that will happen at runtime), producing an executable or a shared library. Only at this step is /usr/bin/ld invoked. On Linux, ld is part of the binutils package.
"along with minimal instructions for how shared libraries should be "connected""
The linker records which shared libraries are required at runtime, and possibly which versions of libraries or symbols are required.
It also records which runtime loader should be used to load the required shared libraries.
"At runtime, my understanding is that the executable is loaded into memory, although I'm not sure how."
The kernel loads the executable into memory and checks whether a runtime loader was requested at static link time. If it was, the dynamic loader is also loaded into memory, and execution control is passed to it (instead of the main executable).
It is then the job of the dynamic loader to examine the executable for instructions on which other libraries are required, check whether correct versions can be found, load them into memory, and arrange things such that symbol resolution will work between the main executable and the shared libraries. This is the runtime loading step, often also called dynamic linking.
The dynamic loader can be part of the OS, but on Linux it's part of libc (GLIBC, uClibc and musl each have their own loader).
No. ld is linking as in creating a library or exe, ld*.so is the loading part. Also ld*.so is part of the OS, not the gcc suite afaik. ld is generally part of (GNU) binutils on a gcc based system (but e.g. usually LLVM lld in a LLVM based system)
ld*.so is ld-linux-{arch}.so.2 on Linux and /libexec/ld-elf.so on e.g. FreeBSD.

How does gcc/ld find zlib.so?

I've used zlib for ages and never thought about the fact that it is named slightly unconventionally. While most libraries on Linux follow the naming convention of lib<name>.so for shared objects and lib<name>.a for archives, zlib is named zlib.so/zlib.a. My question is: how does gcc/ld know to look for zlib.so when I use -lz as a link flag?
I understand that for linking, gcc invokes ld, which searches for libraries in certain default paths and any path specified with -L, and it appends the lib and .so or .a parts as necessary. Oddly, gcc's manual page for linking options only mentions that the linker can find archives; there is no mention of the .so extension. The man page for ld at least mentions both extensions, but still only mentions searching by prepending lib to the specified library name. How does ld know to add the lib after the z for zlib? I've never seen this happen to another library.
gcc has several different methods for linking libraries, shared or static. If you specify -lz, gcc is going to look for libz.so (possibly with some version bits between the libz and the .so, but the important part is the file name will start with libz and end with .so), or for libz.a (again, possibly with version info) if you are compiling statically, or as a fallback if the shared library does not exist. If you specify -lzlib it will look for libzlib.so (which is not the standard name - the package is often named zlib, but the library itself is libz). Another way of linking would be to not use the -l<lib> option, and just specify /path/to/zlib.so or -L /path/to zlib.so (or zlib.a if you want). In this case, the library doesn't have to have the lib prefix, but you would have to explicitly provide any version info, unless provisions are made for a symbolic link or something similar to provide the literal name zlib.so.
Applications can also load shared libraries at runtime via dlopen() and its associated functions, in which case the library can also be named whatever you want it to be (this doesn't work for static libraries, of course).
So, if the library you are looking at is actually called zlib.so, then it is not being found by gcc ... -lz, unless it just happens to be a symbolic link to libz.so (or vice versa, in which case gcc is really just using libz.so, which happens to have the same content as your zlib.so). However gcc might be using it if the build process explicitly names the library in the link stage (not using -l<lib>) or if your application loads it via dlopen() (but in that case, it's not really linked to your program - it's just loaded at run time).

Even with /Zf, not all names are in MAP file

I'm building a project for Wintel-32 with Visual Studio's MASM (it's called ML). I request map file generation in linker options. I'm specifying the /Zf option for the assembler (make all symbols global). Yet not all functions appear in the generated map file. Looks like only ones that are imported by other modules appear.
EDIT: there's a bunch of functions that are used only statically (i. e. within the same source file). They are not eliminated from the executable and they shouldn't be. But they don't appear in the MAP file. I want them there.
Those names can be seen if I call dumpbin /symbols on the object file (but only with the /Zf). Yet linker strips it from the final executable's map for some reason. The linker options /MAP and /MAPINFO:EXPORTS are there. What am I missing?
EDIT: and /OPT:NOREF too.
Probably one or both of the following options are enabled in your project.
- COMDAT generation, i.e. Function-Level Linking, during compilation.
- The /OPT:REF linker optimization during linking.
Together these let the linker discard functions that are never referenced.
For example, pass the /OPT:NOREF linker option after /link on the ML command line (ML is the MASM assembler driver and forwards everything after /link to the linker) to retain even unused functions in the final executable:
ML [your-options] <your-file-name> /link /OPT:NOREF

'Undefined reference' despite class being linked properly

I have the following problem - I'm trying a sort of bastardized build of the poco library for C++ (ie, using a premake-generated makefile instead of the poco makefile because I'm building on windows without msvc)
I've actually managed to get all the libraries built into .a files. The problem arises when I try to actually use classes - and then gcc swears up and down that it can't find the reference. This despite the fact that I have checked the libraries with ar -t and seen that the classes in question do indeed exist there.
In general, what could be the problem? I have a library that at least claims to have the requisite .o files, yet the references are still undefined.
For example, I have an undefined reference to Poco::XML::InputSource::InputSource(std::istream&), yet "InputSource.o" is in the linked library, and the requisite ctor is in the header file.

Problem with linking in gcc

I am compiling a program in which a header file with the same name exists in multiple places. The contents of each copy are different: the variable names are the same, but the internal members of the structures differ.
Now at link time the linker is picking up a library file that belongs to a different header, not the one used during compilation. Because of this I get an error at link time.
Since there are so many libraries with the same name, I don't know which library is being picked up. I have a lot of OEM and other customized libraries which are part of this build.
I checked the gcc options that deal with selecting which library files to include, but nowhere can I see an option that reports which libraries are being picked up by the linker.
If the linker is able to find more than one library file with the same name, I don't understand which one it picks. I don't want to specify any path; rather, I want to understand how the linker chooses among the multiple libraries it is able to locate. I tried the -v option, but that doesn't list the path from which gcc picks up the library.
I am using gcc on linux.
Any help in this regard is highly appreciated.
Regards,
Chitra
Passing -Wl,-t to gcc will tell ld to dump which files it's reading.
