Simulating --whole-archive behavior on a per-symbol basis


I am familiar with what the --whole-archive linker option does when using a static archive.
Is there a way to achieve the same effect on a per symbol basis, either via some symbol attributes, or any other trick?
To be clear, let's say I have a .a archive that contains two functions:
void foo() {}
void bar() {}
I would like to make sure that any executable built by linking against this archive will always have the foo() symbol, regardless of whether foo() is used or not. I do not similarly care about bar().
Thanks.

You can use the option -u for that:
gcc -O2 foo.c -o foo -umysymbol -lmylib
This forces the linker to treat mysymbol as undefined and resolve it by linking it from the specified libraries.
From the manpage of ld:
-u symbol
--undefined=symbol
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional
modules from standard libraries. `-u' may be repeated with different
option arguments to enter additional undefined symbols.

I'm afraid you'll need to modify your linkage to achieve this.
You should be clear that the entities within an archive (.a)
that can be linked, or not linked, with your executable are not
individual symbol definitions but individual archive members,
which are object files, each of which may define arbitrarily
many symbols. If you want to link a symbol from an archive, you link the whole
archive member that defines it.[1]
The foremost difference between linking an archive (.a)
with your executable and linking an object file (.o) is that
an object file will be linked, unconditionally, whereas an archive member
will be linked only if it provides a definition for at least one symbol that
has been referenced but not defined when the archive is examined.
Hence the ordinary, non-roundabout way of ensuring that a symbol foo is linked
unconditionally is to link an object file that defines it. There is no
way to mark an archive member foo.o as "must be linked" because, if you
must link foo.o, you do it by linking foo.o.
So if foo resides in an archive member foo.o, you can
extract that member from the archive:
ar x libthing.a foo.o
and add foo.o to your linkage, even if you don't have the foo source
from which to compile foo.o.
If there are a lot of functions that you want to link unconditionally
then you might either compile them all from source into a single object
file, if you have the source, or you might collect all the object files
that define them into a single archive that you link with --whole-archive.
"I was hoping to find a way to advertise the public nature of the symbol in the library itself"
For linkage purposes, there are nothing but public symbols in a library: by
definition any symbols that the linker can see are public.
[1] If you are in a position to compile an object file that goes into
your linkage, then by the use of appropriate compiler and linker flags you
can ensure that redundant symbols it may contribute are finally discarded by
the linker. See this answer.

The following seems to have worked for me.
In a function in my library that I know is always called, I do the following:
static volatile auto var = &foo;
Now, without changing anything in the linking either when creating the archive or building the executable, I see that the executable has the foo symbol.

Related

Why is it possible to override symbols from some static libraries but not others?

I'm working on a tool that links against Clang, and I need to implement a small number of changes to some operations. To improve development times, instead of rebuilding Clang, I decided to redefine the symbols of interest in my program code, and let the linker take care of the rest: in most cases, the program version of a symbol that is defined in both program code and a static library takes precedence at link-time without a fuss. (The linked answer relates to Linux, but I found that to work on macOS too–usually.)
This worked great when I was using the stock Clang build for macOS that can be downloaded from the LLVM website. However, I am currently trying to switch to my company's customized Clang (which I built once from source, and hoped to further modify in the same way), and now I get duplicate symbol errors.
I don't know what is causing this issue. My project's linker flags have remained unchanged (save for one new static library): importantly, they do not contain -all_load or its -force_load cousin, which tell the linker to try to include every symbol defined in static libraries. The symbols that I'm trying to override look defined the same way when I check them with nm in the stock archive and in the custom archive. The difference has to be with how I built LLVM, but just knowing that doesn't really help me figure out what I need to change.
For instance, say that I want to redefine clang::Qualifiers::getAsString() const. I could do that just fine using the stock LLVM libraries, but now I would get a duplicate symbol error:
duplicate symbol __ZNK5clang10Qualifiers11getAsStringEv in:
.../Objects-normal/x86_64/TypePrinter.o
clang+llvm-internal/lib/libclangAST.a(TypePrinter.cpp.o)
Using nm -f darwin to inspect both archives, I would get very similar results for __ZNK5clang10Qualifiers11getAsStringEv:
# clang+llvm-6.0.0/lib/libclangAST.a
(undefined) external __ZNK5clang10Qualifiers11getAsStringEv
0000000000000bb0 (__TEXT,__text) external __ZNK5clang10Qualifiers11getAsStringEv
# clang+llvm-internal/lib/libclangAST.a
(undefined) external __ZNK5clang10Qualifiers11getAsStringEv
0000000000000d00 (__TEXT,__text) external __ZNK5clang10Qualifiers11getAsStringEv
So, assuming more or less identical symbol definitions, and identical linker flags, why was I able to override static library symbols this way before, and why am I no longer able to?

This part of the premise isn't quite correct:
In most cases, the program version of a symbol that is defined in both program code and a static library takes precedence at link-time without a fuss. (The linked answer relates to Linux, but I found that to work on macOS too–usually.)
The linked answer appears correct, but I originally misunderstood it. The actual behavior, as evidenced by passing -Wl,-why_load to Clang (or -why_load to the linker), goes as follows:
1. If a symbol is referenced, try to find its definition in the program code.
2. If it is defined in the program code, you're done; do not search static libraries.
3. If it is not defined in the program code, look up the __.SYMDEF table of contents in the static library to find which object file defines it.
4. Use all the definitions from that object file.
The issue was that switching to the custom Clang, I accidentally pulled in references to symbols that were defined in the same object file as symbols that I am redefining, causing the linker to see both definitions. I was able to solve the problem by using the -why_load argument to the linker, and then looking for which symbol caused the problem object file to be loaded. I then duplicated the definition of that symbol to my program, and now the linker doesn't complain anymore.
The moral of the story is that this technique isn't as reliable on macOS as it is on Linux, and that if you do it, you kind of have to go all in. It's better to take the entire source file and copy it to your project than to try to pick symbols piecewise.

Actually this behavior is the same for Linux, see this reproducer:
First case: build library where symbols are in different object files:
//val.cpp - contains needed symbol
int val=42;
//wrong_main.cpp - contains duplicate symbol
int main(){
return 21;
}
>>> g++ -c val.cpp -o val.o
>>> g++ -c wrong_main.cpp -o wrong.o
>>> ar rcs libsingle.a val.o wrong.o
Linking against this library works and no multiple-definition error for main is issued, because no symbol from the object file wrong.o is used:
//main.cpp
extern int val;
int main(){
return val;
}
>>> g++ main.cpp -L. -lsingle -o works
Second case: both symbols are in the same object file:
//together.cpp - contains both, needed and duplicate, symbols
#include "val.cpp"
#include "wrong_main.cpp"
>>> g++ -c together.cpp -o together.o
>>> ar rcs libtogether.a together.o
Linking against libtogether.a doesn't work:
>>> g++ main.cpp -L. -ltogether -o doesntwork
./libtogether.a(together.o): In function `main':
together.cpp:(.text+0x0): multiple definition of `main'
/tmp/cc38isDb.o:main.cpp:(.text+0x0): first defined here
collect2: ld returned 1 exit status
The linker takes either the whole object file from a static library or nothing. In this case val is needed and so the object file together.o will be taken, but it also contains the duplicate symbol main and thus the linker issues an error.
A great description of how the linker works on Linux (and very similarly on macOS) is this article.

undefined reference to `cudaFree' and many other errors when compiling program [duplicate]

I'm attempting to do a release of some software and am currently working through a script for the build process. I'm stuck on something I never thought I would be: statically linking LAPACK on x86_64 Linux. During configuration AC_SEARCH_LIB([main],[lapack]) works, but compilation of the LAPACK units does not work, for example undefined reference to 'dsyev_'; every LAPACK/BLAS routine is reported the same way.
I've confirmed I have the libraries installed and even compiled them myself with the appropriate options to make them static with the same results.
Here is an example I had used in my first experience with LAPACK a few years ago that works dynamically, but not statically: http://pastebin.com/cMm3wcwF
The two methods I'm using to compile are the following,
gcc -llapack -o eigen eigen.c
gcc -static -llapack -o eigen eigen.c

Your linking order is wrong. Link libraries after the code that requires them, not before. Like this:
gcc -o eigen eigen.c -llapack
gcc -static -o eigen eigen.c -llapack
That should resolve the linkage problems.
To answer the subsequent question of why this works, the GNU ld documentation says this:
It makes a difference where in the command you write this option; the
linker searches and processes libraries and object files in the order
they are specified. Thus, `foo.o -lz bar.o' searches library `z' after
file foo.o but before bar.o. If bar.o refers to functions in `z',
those functions may not be loaded.
........
Normally the files found this way are library files—archive files
whose members are object files. The linker handles an archive file by
scanning through it for members which define symbols that have so far
been referenced but not defined. But if the file that is found is an
ordinary object file, it is linked in the usual fashion.
That is, the linker makes one pass through each file looking for unresolved symbols, and it processes files in the order you provide them (i.e. "left to right"). If nothing has referenced a symbol by the time a library is read, the linker will not pull in the member that defines it. Every input in the link list is parsed only once.
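Note also that object files are not subject to this: every object file named on the command line is linked in full, so circular references between object files resolve regardless of order. A static library, however, is scanned for unresolved symbols only once, at its position on the command line, unless it is listed again or enclosed in --start-group ... --end-group.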

gcc archives not considered when linking?

Should it make a difference whether a gcc linker links archive files or object files (or both)?
Background:
In an embedded project, an ISR (which is of course not referenced by any other source code) is located as the only function in a file. This file is compiled to an object file and then put into an archive file.
Other functions in other files are compiled to separate object files.
The binary is built without complaints and runs on the target with no exceptions, no matter whether the linker uses the ISR object file or the ISR archive file.
However, if using the archive file, the ISR is not linked.
Plus, if there is any other reference (e.g. a variable used by some other function in some other file) in the same file, it is linked completely.
Why this?

Yes, it makes a difference.
Any object file that is specified on the linker commandline is linked
into the executable, regardless of whether any of the symbols that it
defines are referenced by the executable.
The linkage of a static library is different. It is an archive of object
files. For each object file in the archive, the linker will determine
whether that object file provides a definition for any of the symbols
that are so far undefined at that point in the linkage. If it does so,
then the linker will extract that object file from the archive and
link it in the executable; otherwise not.
This behaviour is documented for the -l | --library option of ld, the GNU linker:
-l namespec
--library=namespec
...
The linker will search an archive only once, at the location where
it is specified on the command line. If the archive defines a symbol
which was undefined in some object which appeared before the archive
on the command line, the linker will include the appropriate file(s)
from the archive. However, an undefined symbol in an object appearing
later on the command line will not cause the linker to search the
archive again.
...
(To see that this applies to linkages invoked with gcc or another GNU compiler, you may need to know that the named compiler is simply a tool-driver that delegates to the appropriate tool for discharging the commandline options that are presented: when it sees options that call for a linkage, it calls ld.)
Hence the object file containing the unreferenced ISR is not linked when
it is in a library, and contains no other referenced symbols, and it is
linked when it is not in a library, or when it contains some other
referenced symbol.

Link multiple object files in gfortran

I have a "library" folder with multiple object (.o) files. These files contain subroutines that do not change from project to project. Each new project uses some of those object files, but not all of them.
Could you please tell me whether there is any way to tell gfortran to search that folder for the necessary .o files?
I've tried the -I and -L options, without success. When I write the .o names directly, it works: gfortran main.for ./library/obj1.o ./library/obj2.o but I have many .o files and writing all of them out wastes time.
I could write gfortran main.for ./*.o but then the main program would be linked with all the .o files, although it needs only some of them.
I hoped that something like gfortran main.for -L./library/ would work, but it doesn't.
I use OS X with gcc version 5.1.0.
And I'm pretty sure that I should use a makefile for such a case.

You are confusing object files with static libraries. An object file
is not a static library and the gfortran linker - which is simply
the GNU system linker, invoked by gfortran - will not treat it as such.
You need a static library and you are trying to use object files in lieu.
The linker recognizes an object file by the extension .o. It recognizes
a static library by the extension .a, and it expects the contents of an .a file
to have the form of a static library, not the format of an object file. (So you
cannot make an object file into a static library just by renaming it).
The linker will link into your program every object file that appears on its
commandline, whether or not it is needed. It does not expect you to mention
object files if you don't want them linked. The linker options -L and -l for
locating libraries have no application to object files.
A static library is a fairly simple archive containing some number of
object files, plus a house-keeping header and typically an index of the
public symbols defined in the contained object files.
When the linker encounters a static library on its commandline, it does not
link the entire contents of the library (unless you expressly tell it to). It inspects
the contained object files to determine which, if any, of them contain
definitions for symbols that are as yet undefined at that point in the linkage
of the program. If any object file in the library is found to provide any
of the missing definitions, then that object file is extracted from the library
and linked into the program. Object files in the library that provide no
missing definitions are not linked. Libraries on the commandline are sequentially
inspected in this way until either all the symbols referred to by the program
have definitions in linked object files or there are no more libraries.
If as you say the object files that you are trying to use as libraries are stable
resources that you never have to build for your projects, then you can just make a static
library out of them and link that library with your per-project programs.
To make a static library from object files, use the ar tool.
See man ar.
When you have made your library, say, libsubs.a, and have decided it shall reside
in some directory, /path/to/subs, then you link it with a program by adding
-L/path/to/subs -lsubs
to the commandline in which your program is linked. This will cause the linker
to search for a library called libsubs.a in directory /path/to/subs.
So if you are compiling and linking in a single step, use it like:
gfortran -o myprog myprog.f90 -L/path/to/subs -lsubs
And if you are compiling and linking in distinct steps, use it like:
gfortran -c -o myprog_1st_file.o myprog_1st_file.f90
gfortran -c -o myprog_2nd_file.o myprog_2nd_file.f90
gfortran -o myprog myprog_1st_file.o myprog_2nd_file.o -L/path/to/subs -lsubs
This is how you are supposed to use a set of object file resources of which
different subsets will be required for linkage with different programs: you put
them in a library and link the library.

Makefile automatic link dependency?

It's easy to let a program figure out the dependencies at compile time (with gcc -MM). Link dependencies (deciding which libraries to link against), however, seem difficult to figure out. The issue becomes pressing when multiple targets, each with its own libraries to link, are needed.
For instance, three dynamic library targets t1.so, t2.so and t3.so need to be built. t1.so needs the math library (-lm), while t2.so and t3.so don't. It would be tedious to write separate rules. A single rule that links all three targets with the math library saves the trouble. However, it inflates the target size, since the math library is unused by t2.so and t3.so.
Any ideas?

This is not as easy to figure out as finding the needed headers. gcc -MM is just a fancy way to use the preprocessor, and it knows pretty much nothing about the way the code is used or works: you could include headers full of #defines or introduce complex library dependencies.
I would stick with writing explicit linking dependencies for all targets (3 in your case). You can collect common dependencies in LDFLAGS.

It looks like ld's --trace option is a good start. The output needs formatting, but I think it contains all the right information.
My invocation looks something like this:
$ g++ -o foo a.o b.o -l sfml-graphics -l sfml-window -Wl,--trace
/usr/bin/ld: mode elf_i386
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crti.o
/usr/lib/gcc/i686-linux-gnu/4.6/crtbegin.o
a.o
b.o
-lsfml-graphics (/usr/lib/gcc/i686-linux-gnu/4.6/../../../../lib/libsfml-graphics.so)
-lsfml-window (/usr/lib/gcc/i686-linux-gnu/4.6/../../../../lib/libsfml-window.so)
-lstdc++ (/usr/lib/gcc/i686-linux-gnu/4.6/libstdc++.so)
-lm (/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/libm.so)
-lgcc_s (/usr/lib/gcc/i686-linux-gnu/4.6/libgcc_s.so)
/lib/i386-linux-gnu/libc.so.6
(/usr/lib/i386-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/i386-linux-gnu/ld-linux.so.2
-lgcc_s (/usr/lib/gcc/i686-linux-gnu/4.6/libgcc_s.so)
/usr/lib/gcc/i686-linux-gnu/4.6/crtend.o
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crtn.o

Have you tried using 'nm'? It gives you a list of defined and undefined symbols in object/library files (see the documentation here).
There's an approach mentioned in this post by Bernd Strieder that I am considering using:
1. Use nm to generate a list of symbols in all object/library files involved.
2. This file is parsed and basically the (U)ndefined and (T)ext symbols
and the symbols of main functions are filtered out and mapped to their
object files. I found that U and T symbols suffice, which reduces the
overall problem considerably compared to the linker, which has to
consider all symbols.
3. The transitive hull of the dependency relation according to U and T
symbols between object files is being calculated.
4. A list of object files needed to resolve all dependencies can be
printed for any object file.
5. For any main object file, a make target to link it is arranged.
