I've recently become aware of gcc's incremental linking feature, and I want to use it. The thing is, I don't write my own Makefile's - I use CMake. The inter-file dependencies and the targets are essentially the same, but I want to have CMake try to obtain them using incremental linking rather than linking from scratch?
Moreover, if it's possible to have this even for files-within libraries, i.e. when you recompile one .o within a .a file, instead of that whole file be reconsidered when linking the executable, only the single .o within it is reconsidered/reapplied.
To illustrate, suppose my CMakeLists.txt has:
add_executable(
foo
a.cpp
b.cpp
c.cpp
)
Right now, when a.cpp changes, we get a compilation a.cpp -> a.o then a regular linkage a.o b.o c.o -> foo. I want it to be a.o foo something_else_maybe -> foo
Note: This question is not about MSVC and its own incremental linking capabilities.
I've recently become aware of gcc's incremental linking feature
I think we need to clarify some terms. Incremental links is linker feature which allows you to speed up linking when only small subset of object files has changed. It does so by re-using results of previous link.
GNU ld does not have such a feature. What it can do is relocatable link i.e. combine several objects into one. If you link a.o and b.o to ab.o and then modify a.o, it'll not be able to reuse results of relocatable link so you'll have to re-link ab.o from scratch (as opposed to honest incremental linking).
I want to have CMake try to obtain them using incremental linking rather than linking from scratch
I'm afraid that CMake (or any other build system) does not provide support for this for several reasons.
First of all in this case you'll have to cache results of ld -rfor all possible subsets of your object files. This number grows exponentially which makes it non quite practical.
Secondly, there is more to linking the app besides linking it's object files: library linking, generation of dynamic sections (PLT, relocations, etc.), relaxation, etc. which will have to be done from scratch every time, even if you somehow manage to use -r. It can easily turn up to take much longer time than just linking object files.
Related
Normally, I would compile a program that requires a specific library, e.g. math, by passing the linker flag after the sources that need it like so:
gcc foo.c -lm
However, it seems that older versions of gcc work equally well with the reverse order (let's call this BAD ORDER):
gcc -lm foo.c
I wouldn't worry about it if some popular open-source projects I'm trying to compile didn't use the latter while my version of gcc (or is it ld that's the problem?) work only in the former case (also, the correct one in my opinion).
My question is: when did the BAD ORDER stop working and why? It seems that not supporting it breaks legacy packages.
when did the BAD ORDER stop working and why? It seems that not supporting it breaks legacy packages.
When?
Not dead sure but I think pre-GCC 4.5. Long ago. Subsequently, the --as-needed option is operative for shared libraries by default,
so like static libraries, they must occur in the linkage sequence later than the objects for which they provide definitions.
This is a change in the default options that the gcc/g++/gfortran etc. tool-driver passes to ld.
Why?
It was considered confusing to inexpert users that static libraries by default has to appear
later that the objects to which they provided definitions while shared libraries by default did
not - the difference between the two typically being concealed by the -l<name> convention
for linking either libname.a or libname.so.
It was perhaps an unforeseen consequence that inexpert users who
had formerly had a lot of luck with the mistaken belief that a GCC
[compile and] link command conforms to the normal Unix pattern:
command [OPTION...] FILE [FILE...]
e.g.
gcc -lthis -lthat -o prog foo.o bar.o
now fare much worse with it.
I have a program foo I'm trying to compile and link and I'm running into a chicken and egg dillemma.
For reasons I'll explain below, Within a given directory I'm forced to add a link to several libraries we build (let's call them libA and libB) regardless of my target. I know I only actually need libA for my program; so after all libs are built and this binary is built I verified with ldd -u -r foo to show that libB is an unused direct dependency.
Being unused I altered the makefiles and flags such that libB is enveloped with -Wl --as-needed and -Wl --no-as-needed. I make rebuild, use ldd again and this time it doesn't show any unused deps. So far so good.
Now the fun part: Since its unused I would expect that if libB is not found/available/built that I should still be able to compile and link foo as long is libA is available. (example: If I did a fresh checkout and only built libA before trying to compile this specific test). But ld errors out with /usr/bin/ld: cannot find -lB
This suggests that ld needs to locate libB even if it won't need any of the symbols it provides? That doesn't seem to make sense. If all symbolic dependencies are already met, why does it even need to look at this other library? (That would explain the problem ld has and why this is not possible)
Is there a way I can say "Hey don't complain if you can't find this library and we shouldn't need to link with it?"
The promised reasons below
For various reasons beyond my control I have to share makeflags with many other tests in this directory due to the projects makefile hierarchy. There is a two level makefile for all these tests that says foo is a phony target, his recipe is make -f generictest.mk target=foo, and the generictest.mk just says that the source file is $(target).C, that this binary needs to use each library we build, specifies relative path to our root directory and then includes root's generic makefile. The root directory generic makefile expands all the other stuff out (flags, options, compiler, auto-gen of dependencies through g++ etc), and most importantly for each statement that said "use libX" in generictest.mk it adds -lX to the flags (or in my case enveloped in as-needed's)
While I'm well aware there are lots of things that are very unideal and/or horribly incorrect in terms of makefile best practices with this, I don't have the authority/physical ability to change it. And compared to the alternative employed in other folders, where others make individual concrete copies of this makefile for each target, I greatly prefer it; because that forces me to edit all of them whenever want to revise our whole make pattern, and yields lot of other typos and problems.
I could certainly create another generictest.mk like file to use for some tests and group together those using each based on actual library needs, but it would be kind of neat if I didn't have to as long as I said "you don't all of them, you need each of them but only if you actually use it".
There's no way that the linker can know that the library is not needed. Even after all your "normal" libraries are linked there are still lots and lots of unresolved symbols: symbols for the C runtime library (printf, etc.). The linker has no idea where those are going to come from.
Personally I'd be surprised if the linker didn't complain, even if every single symbol was already resolved. After all there may be fancy things at work here: weak bindings, etc. which may mean that symbols found later on the link line would be preferred over symbols found earlier (I'm not 100% sure this is possible but I wouldn't be surprised).
As for your situation, if you know that the library is not needed can't you just use $(filter-out ...) on the link command line to get rid of it? You'd have to write your own explicit rule for this with your own recipe, rather than using a default one, but at least you could use all the same variables.
Alternatively it MIGHT be possible to play some tricks with target-specific variables. Declare a target-specific variable for that target that resets the variable containing the "bad library" with a value that doesn't contain it (maybe by using $(filter-out ...) as above), and it will override that value for that target only. There are some subtle gotchas with target-specific variables overriding "more general" variables but I think it would work.
In what cases exactly do you need -all_load flag?
Lets say I have something like
g++ source.cpp -o test libA.a libB.a libC.a
From what i recall if there is some reference to a symbol used in source.cpp that is present
in say libB.a file then that libB.a will be linked (just that symbol or whole code in that library? ) and libA.a and libC.a will be ignored (their code will not be present in final executable).
What happens to other libraries when i use -all_load flag as follows
g++ source.cpp -o test -Wl,-all_load libA.a libB.a libC.a
how does 'strip' command effect the output with all_load flag?
-all_load is for when you want to link compile units that are (to the linker) unnecessary. For instance, perhaps you will dynamically access functions within the static library at runtime that you know the addresses of, but haven't actually made any explicit function calls to. How would you do that? Well, the compiler could help you by storing a bunch of function pointers in the executable to be read at run time, and then you'd build a lookup system for finding those functions using a string, and you'd call the whole thing Objective-C, which is probably the most common user of -all_load (at least if Google is any guide).
The most common case of this in ObjC is when you have a category in its own compile unit. The complier may not be able to tell that you reference it and so won't link it. So ObjC programmers use -all_load (or -force_load) more often than other C-like programmers. In fact, -all_load is a Darwin-specific extension in gcc.
But there are cases where people might want to use -all_load outside of ObjC. For instance, there might be some inter-dependencies in libA and libB. Consider this case:
source.cpp requires A() and B()
libA defines A() in a.o and Aprime() in aprime.o
libB defines B() in b.o and requires Aprime()
This typically won't link (*). The compiler will start with source.o and make a list of requirements: A() and B(). It'll then look at libA and see that it defines A(), so it'll link a.o (but not aprime.o). Then it will look at libB and see that it defines B() and requires Aprime(). It is now out of libraries, and it hasn't resolved Aprime(). It fails.
(*) Actually, it will with clang because clang is quite smart about this. But it won't with g++ at least up through 4.6.
The best solution would be to reorder it so that libB comes first (**). But if the dependencies were circular, you could get completely stuck. -all_load and -force_load let you work around these situations by turning off the linker's optimization.
(**) The really best solution is usually to redesign your libraries to avoid this kind of interdependency, but that may be hoping too much.
If you want to play around with the issue, see https://gist.github.com/rnapier/5710509.
strip just removes symbols from executables. That's not particularly related to static linking and -all_load (though it does impact dynamic linking). strip(1) has lots of discussion of that.
I can never remember what to type when linking include files in GCC, in fact the only one I can remember is -lm for math.h. The one I am specifically concerned with right now is sys/time.h.
This page clears things up some, but I would still like a list.
Does anyone know of a good list of linking options?
EDIT:
Maybe my question was not clear. I want to know what I need to type at the command line (like -lm for math or -lpthread for pthread) for the various libraries I might need to link when making C programs.
The functionality provided in <sys/time.h> is implemented in libc.so (C library). You don't need to link anything else in as gcc should automatically link to libc.so by itself. There is no 'linking of include files', rather you are linking against libraries that contain the symbols defined by code.
The -l flag is one of GCC's linker options and is used to specify additional libraries to link against.
edit because my gcc was performing optimizations on my source code at compile time
Also, the information in that link is a little outdated - you should not need an explicit link to libm (which is what -l m or -lm does) in modern GCC.
I'm not sure i understand your question but -lm is not an ld option, -l is an option and -lx links libx.a (or .so, it depends). you might want to look at the ld manual for a full list of options.
I think all other standard libraries other than math are included in libc.so(.a) (-lc)
I've read that the gcc compiler can perform certain optimization when compiling an application that references a static library, for instance - it will "pull" in only that code from the static library that the application depends upon. This helps keep the size of the application's executable to a minimum if portions of the static library are not being used by the app.
1) Is this true?
2) How does GCC know what code from the static library the application is actually using? Does it only look t the header files that are included (directly and indirectly) in the application and then pull code accordingly? Or does it actually look at what methods from the static library are being called?
A static library is just a bag of object files. The linker (ld) will keep track of which object files are used (i.e. contains a function referenced from somewhere), and not include unreferenced code in the final executable image.
gcc does nothing of the sort. Everything you describe is linking, which is handled by ld.
ld examines the symbol tables of the object files in order to determine which symbols need to be linked, and then pulls the relevant object files from the libraries and links them into the executable.
Answers
1) Yes, only the code referenced will be pulled in. Besides the smaller size there is also a gain in link speed since the static library contains a index table of all the symbols exported by the library. It is quicker doing lookups in this table as opposed to looking up in object files one by one.
Alternatively, if you wanted to pull in all the symbols in the static library regardless of reference. You can pass the --whole-archive switch to ld.
2) It would be more correct to ask this question in the context of ld (the gnu linker) since that is what actually pulls in the references. GCC just invokes the linker after its done compiling (unless you do gcc -c, which causes it to stop after compilation).
So, after compilation is done, ld is invoked with a ordered list of object(.o) files and libraries . ld processes the .o files one by one, and for each the linker
a) Notes down the external symbols needed by this file that cannot be resolved yet. Adds these to a (say) unresolved table.
b) Looks at the symbols (functions, global variables) exported by this file and resolves any previous refrences that it can.
This is a very simplified overview of the linking process.
Now when the linker comes to the static library, it essentially does the same thing, this time using the static library to resolve symbols. However there is one difference, the linker pulls in only the unresolved symbols and its dependencies. So assume we have
a.o and libstatic.a which in turn contains b.o and c.o.
b.o defines bar() and moreBar();
c.o defines baz() and moreBaz();
a.o defines foo();
where foo calls bar which calls baz. Now when you do
gcc -o app a.o libstatic.a
After processing a.o the linker knows that it needs to resolves bar, this gets resolved from the static library, however while resolving bar the linker notices that bar needs baz. This again gets resolved from libstatic.a. moreBar() and moreBaz() have no references and get ignored.