How does GCC compile applications that reference a static library

I've read that the gcc compiler can perform certain optimizations when compiling an application that references a static library. For instance, it will "pull in" only the code from the static library that the application depends upon. This helps keep the size of the application's executable to a minimum if portions of the static library are not being used by the app.
1) Is this true?
2) How does GCC know what code from the static library the application is actually using? Does it only look at the header files that are included (directly and indirectly) in the application and then pull in code accordingly? Or does it actually look at which functions from the static library are being called?

A static library is just a bag of object files. The linker (ld) keeps track of which object files are used (i.e. contain a function or variable referenced from somewhere) and leaves unreferenced object files out of the final executable image.

gcc does nothing of the sort. Everything you describe is linking, which is handled by ld.
ld examines the symbol tables of the object files in order to determine which symbols need to be linked, and then pulls the relevant object files from the libraries and links them into the executable.

Answers
1) Yes, at the granularity of object files: only the archive members that are actually referenced are pulled in. Besides the smaller size there is also a gain in link speed, since a static library contains an index table of all the symbols exported by the library. Looking symbols up in this table is quicker than searching the object files one by one.
Alternatively, if you want to pull in every object file in the static library regardless of references, you can pass the --whole-archive switch to ld.
2) It would be more correct to ask this question in the context of ld (the GNU linker), since that is what actually pulls in the references. GCC just invokes the linker after it's done compiling (unless you run gcc -c, which makes it stop after compilation).
So, after compilation is done, ld is invoked with an ordered list of object (.o) files and libraries. ld processes the .o files one by one, and for each the linker
a) Notes down the external symbols needed by this file that cannot be resolved yet, and adds them to a (say) unresolved table.
b) Looks at the symbols (functions, global variables) exported by this file and resolves any previous references that it can.
This is a very simplified overview of the linking process.
Now when the linker comes to the static library, it essentially does the same thing, this time using the static library to resolve symbols. However, there is one difference: the linker pulls in only those member object files that define a currently unresolved symbol, plus whatever members those in turn depend on. So assume we have
a.o and libstatic.a which in turn contains b.o and c.o.
b.o defines bar() and moreBar();
c.o defines baz() and moreBaz();
a.o defines foo();
where foo calls bar which calls baz. Now when you do
gcc -o app a.o libstatic.a
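After processing a.o the linker knows that it needs to resolve bar. This is resolved from the static library by pulling in b.o; in doing so the linker notices that bar needs baz, which is again resolved from libstatic.a by pulling in c.o. Because members are pulled in as whole object files, moreBar() and moreBaz() come along with b.o and c.o even though nothing references them; only members that define no needed symbol are left out. (Discarding unused functions inside an object file requires compiling with -ffunction-sections and linking with --gc-sections.)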
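To make the selection process concrete, here is a rough Python model of it. This is only a sketch of the idea, not how ld is implemented: object files are reduced to sets of defined and referenced symbol names, and d.o is a hypothetical extra member added to show a file that gets left out. Selection happens per object file, in line with the "bag of object files" description above.

```python
# Toy model of how a linker selects members from one static library.
# Each object file is (name, symbols it defines, symbols it references).
A_O = ("a.o", {"foo"}, {"bar"})
B_O = ("b.o", {"bar", "moreBar"}, {"baz"})
C_O = ("c.o", {"baz", "moreBaz"}, set())
D_O = ("d.o", {"unused"}, set())  # hypothetical member that nothing references


def link(objects, archive):
    """Return (object files linked, symbols still unresolved)."""
    linked, defined, undefined = [], set(), set()

    def add(obj):
        name, defs, refs = obj
        linked.append(name)
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)

    # Object files named on the command line are always linked.
    for obj in objects:
        add(obj)

    # Keep rescanning the archive until no member satisfies anything new.
    progress = True
    while progress:
        progress = False
        for member in archive:
            if member[0] not in linked and member[1] & undefined:
                add(member)
                progress = True
    return linked, undefined


linked, unresolved = link([A_O], [B_O, C_O, D_O])
print(linked)      # ['a.o', 'b.o', 'c.o']
print(unresolved)  # set()
```

In this model d.o never makes it into the link, while b.o and c.o are taken whole.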

Related

Creating and linking static rust library and link to c

I tried to create a Rust library that is callable by a C program. So far I have managed to create a dynamic library and call it (library created using rustc --crate-type=cdylib src/lib.rs -o libCustomlib.so, linked using gcc main.o -lCustomlib).
When I now take the same code but compile it as a static library (rustc --crate-type=staticlib src/lib.rs -o libCustomlib.a), gcc throws errors when linking (using gcc main.o -L. -l:libCustomlib.a).
The errors are all undefined references to various functions.
First few lines:
/usr/bin/ld: ./libCustomlib.a(std-b1b61f01951b016b.std.5rqysbiy-cgu.2.rcgu.o): in function `std::sys::unix::mutex::Mutex::init':
/usr/src/rustc-1.43.0//src/libstd/sys/unix/mutex.rs:46: undefined reference to `pthread_mutexattr_init'
/usr/bin/ld: /usr/src/rustc-1.43.0//src/libstd/sys/unix/mutex.rs:48: undefined reference to `pthread_mutexattr_settype'
/usr/bin/ld: /usr/src/rustc-1.43.0//src/libstd/sys/unix/mutex.rs:52: undefined reference to `pthread_mutexattr_destroy'
The full error is over 100 lines long, but the lines are all of this form.
lib.rs currently contains only one hello-world test function:
#[no_mangle]
pub extern "C" fn fn_test() {
println!("Hello, world!");
}
with the header file included in the caller part being:
extern void fn_test();
The question is: is my error in creating the static library or in linking it? Or does the problem lie somewhere else, and this is not supposed to work with static libraries? Should I just use the dynamic approach (which I would like to avoid, since static libraries feel more like using multiple languages in one executable because you don't have to distribute the library)?
(Disclaimer for everyone asking why I would do something like that without a good reason: it's a fun project, the entire program should be as overcomplicated as possible, and that's why I want to use different languages.)

On Linux, std dynamically links to pthreads and libdl. You need to link these in as well to create the executable:
gcc main.o libCustomlib.a -lpthread -ldl
The result is a binary that links dynamically to a handful of fundamental libraries, but statically to Customlib.
If you want a purely statically linked binary, you will probably need to use no_std and enable only the specific features of core that do not depend on dynamically linked system libraries. (Certain libraries cannot be statically linked on Linux; read Statically linking system libraries, libc, pthreads, to aid in debugging) Just for a toy program like hello, world you may get away with simply passing -static to gcc, but for anything robust it's better to dynamically link these fundamental libraries.

Can I get CMake to generate Makefiles utilizing gcc incremental linking?

I've recently become aware of gcc's incremental linking feature, and I want to use it. The thing is, I don't write my own Makefiles; I use CMake. The inter-file dependencies and the targets are essentially the same, but can I have CMake build them using incremental linking rather than linking from scratch?
Moreover, is it possible to have this even for files within libraries? That is, when you recompile one .o within a .a file, only that single .o is reconsidered/reapplied when linking the executable, instead of the whole .a file.
To illustrate, suppose my CMakeLists.txt has:
add_executable(
foo
a.cpp
b.cpp
c.cpp
)
Right now, when a.cpp changes, we get a compilation a.cpp -> a.o then a regular linkage a.o b.o c.o -> foo. I want it to be a.o foo something_else_maybe -> foo
Note: This question is not about MSVC and its own incremental linking capabilities.

"I've recently become aware of gcc's incremental linking feature"
I think we need to clarify some terms. Incremental linking is a linker feature that speeds up linking when only a small subset of object files has changed. It does so by reusing the results of the previous link.
GNU ld does not have such a feature. What it can do is a relocatable link (ld -r), i.e. combine several objects into one. If you link a.o and b.o into ab.o and then modify a.o, it cannot reuse the result of the relocatable link, so you have to relink ab.o from scratch (as opposed to honest incremental linking).
"I want to have CMake try to obtain them using incremental linking rather than linking from scratch"
I'm afraid that CMake (or any other build system) does not provide support for this, for several reasons.
First of all, in this case you would have to cache the results of ld -r for all possible subsets of your object files. Their number grows exponentially, which makes this quite impractical.
Secondly, there is more to linking the app than linking its object files: library linking, generation of dynamic sections (PLT, relocations, etc.), relaxation, etc., all of which have to be done from scratch every time, even if you somehow manage to use -r. This can easily take much longer than just linking the object files.

including static libraries with -all_load flag

In what cases exactly do you need -all_load flag?
Let's say I have something like
g++ source.cpp -o test libA.a libB.a libC.a
From what I recall, if source.cpp references a symbol that is present in, say, libB.a, then libB.a will be linked (just that symbol, or the whole code in that library?) and libA.a and libC.a will be ignored (their code will not be present in the final executable).
What happens to the other libraries when I use the -all_load flag as follows?
g++ source.cpp -o test -Wl,-all_load libA.a libB.a libC.a
How does the strip command affect the output with the -all_load flag?

-all_load is for when you want to link compile units that are (to the linker) unnecessary. For instance, perhaps you will dynamically access functions within the static library at runtime that you know the addresses of, but haven't actually made any explicit function calls to. How would you do that? Well, the compiler could help you by storing a bunch of function pointers in the executable to be read at run time, and then you'd build a lookup system for finding those functions using a string, and you'd call the whole thing Objective-C, which is probably the most common user of -all_load (at least if Google is any guide).
The most common case of this in ObjC is when you have a category in its own compile unit. The linker may not be able to tell that you reference it and so won't link it. So ObjC programmers use -all_load (or -force_load) more often than other C-like programmers. In fact, -all_load is a Darwin-specific extension in gcc.
But there are cases where people might want to use -all_load outside of ObjC. For instance, there might be some inter-dependencies in libA and libB. Consider this case:
source.cpp requires A() and B()
libA defines A() in a.o and Aprime() in aprime.o
libB defines B() in b.o and requires Aprime()
This typically won't link (*). The linker will start with source.o and make a list of requirements: A() and B(). It'll then look at libA and see that it defines A(), so it'll link a.o (but not aprime.o). Then it will look at libB and see that it defines B() and requires Aprime(). It is now out of libraries, and it hasn't resolved Aprime(). It fails.
(*) Actually, it will with clang because clang is quite smart about this. But it won't with g++ at least up through 4.6.
The best solution would be to reorder it so that libB comes first (**). But if the dependencies were circular, you could get completely stuck. -all_load and -force_load let you work around these situations by turning off the linker's optimization.
(**) The really best solution is usually to redesign your libraries to avoid this kind of interdependency, but that may be hoping too much.
If you want to play around with the issue, see https://gist.github.com/rnapier/5710509.
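The ordering problem can be reproduced with a small Python model. This is a sketch under simplifying assumptions, not the real linker: each library is visited once, left to right, object files are just sets of symbol names, and the all_load parameter is a stand-in for what -all_load does. It models the traditional behaviour described above, not the smarter one from footnote (*).

```python
# Toy model of single-pass, left-to-right archive processing.
# Each object file is (name, symbols it defines, symbols it references).
SOURCE_O = ("source.o", {"main"}, {"A", "B"})
LIB_A = [("a.o", {"A"}, set()), ("aprime.o", {"Aprime"}, set())]
LIB_B = [("b.o", {"B"}, {"Aprime"})]


def link(objects, archives, all_load=False):
    """Return (object files linked, sorted list of unresolved symbols)."""
    linked, defined, undefined = [], set(), set()

    def add(obj):
        name, defs, refs = obj
        linked.append(name)
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)

    for obj in objects:
        add(obj)

    for archive in archives:  # each library is visited exactly once
        progress = True
        while progress:
            progress = False
            for member in archive:
                if member[0] in linked:
                    continue
                # Normally a member is taken only if it defines a needed symbol;
                # with all_load every member is taken unconditionally.
                if all_load or member[1] & undefined:
                    add(member)
                    progress = True
    return linked, sorted(undefined)


print(link([SOURCE_O], [LIB_A, LIB_B]))
# (['source.o', 'a.o', 'b.o'], ['Aprime'])   -> link fails
print(link([SOURCE_O], [LIB_B, LIB_A]))
# (['source.o', 'b.o', 'a.o', 'aprime.o'], [])   -> reordering fixes it
print(link([SOURCE_O], [LIB_A, LIB_B], all_load=True))
# (['source.o', 'a.o', 'aprime.o', 'b.o'], [])   -> so does loading everything
```

The third call shows the trade-off: everything resolves regardless of order, but every member is linked whether it is needed or not.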
strip just removes symbols from executables. That's not particularly related to static linking and -all_load (though it does impact dynamic linking). strip(1) has lots of discussion of that.

g++ vs gcc, linking problem with a static library (.a)

I'm trying to link a static library (.a) file with a .o file which supposedly uses the symbols from the library. However, when using gcc - the normal linker error comes up, regardless of using the .a file as
gcc -L. a.c staticlib.a
However, the same command works with g++ flawlessly.
Why is this happening ?
I can see that the .c file is totally legal C (and hence C++), but then why isn't gcc able to detect the symbols in the library?
I tried finding the symbols in the library using objdump and was able to find closely resembling symbols, but not exact ones. E.g. I got
00000000000000b0 g F .text 000000000000004e _Z15PhttsFn_InitTTSPh
for the symbol PhttsFn_InitTTS.
Can someone please explain this phenomenon ? I've also checked the architecture the library file was compiled for, and it's the same as my architecture.
Thanks!

C++ uses something called name mangling, in order for namespaces, overloaded function names, etc, to get unique symbols in the compiled object file.
Your C code refers to a symbol PhttsFn_InitTTS explicitly. If compiled as C, it produces that very symbol name. However, since C++ needs to deal with all these different variations of the same name (e.g. overloading, with different parameter lists), it creates a 'mangled' version encoding the namespace and the parameter types. In your case it was mangled to _Z15PhttsFn_InitTTSPh: _Z marks a mangled name, 15 is the length of the name PhttsFn_InitTTS (no namespace), and Ph encodes a single parameter of type pointer (P) to unsigned char (h).
Invoking GCC as gcc lets it pick the language itself, based on the file extension (.c -> C; .cc, .cpp, etc. -> C++). Invoking it as g++ forces C++ mode.
Your .a file was evidently compiled as C++, since it exposes that mangled symbol.
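For illustration, the simplest form of this mangling scheme can be decoded by hand. The Python sketch below is a deliberately tiny decoder and not a replacement for c++filt: it handles only a plain function name followed by built-in parameter types and pointers to them, and the function name demangle_simple is made up for this example.

```python
import re

# Single-letter codes for some built-in types in the mangling scheme GCC uses.
BUILTIN = {
    "v": "void", "b": "bool", "c": "char", "a": "signed char",
    "h": "unsigned char", "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int", "l": "long", "m": "unsigned long",
    "f": "float", "d": "double",
}


def demangle_simple(symbol):
    """Decode _Z<length><name><params>; return None for anything else."""
    match = re.fullmatch(r"_Z(\d+)(.*)", symbol)
    if not match:
        return None  # not mangled (e.g. a plain C symbol)
    length, rest = int(match.group(1)), match.group(2)
    if length > len(rest):
        return None
    name, encoded = rest[:length], rest[length:]
    if not encoded:
        return name  # no parameter list, e.g. a variable

    params, i = [], 0
    while i < len(encoded):
        pointers = 0
        while i < len(encoded) and encoded[i] == "P":  # P = pointer to
            pointers += 1
            i += 1
        if i >= len(encoded) or encoded[i] not in BUILTIN:
            return None  # something this sketch does not understand
        params.append(BUILTIN[encoded[i]] + "*" * pointers)
        i += 1
    if params == ["void"]:
        params = []  # a lone "v" means an empty parameter list
    return f"{name}({', '.join(params)})"


print(demangle_simple("_Z15PhttsFn_InitTTSPh"))  # PhttsFn_InitTTS(unsigned char*)
print(demangle_simple("PhttsFn_InitTTS"))        # None
```

A C caller asks for the second, unmangled name, which is why the linker cannot match it against the first.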

OSX 10.5 Leopard Symbol Mangling with $non_lazy_ptr

Why does Leopard mangle some symbols with $non_lazy_ptr? More importantly what is the best method to fix undefined symbol errors because a symbol has been mangled with $non_lazy_ptr?

From: Developer Connection - Indirect Addressing
Indirect addressing is the name of the code generation technique that allows symbols defined in one file to be referenced from another file, without requiring the referencing file to have explicit knowledge of the layout of the file that defines the symbol. Therefore, the defining file can be modified independently of the referencing file. Indirect addressing minimizes the number of locations that must be modified by the dynamic linker, which facilitates code sharing and improves performance.
When a file uses data that is defined in another file, it creates symbol references. A symbol reference identifies the file from which a symbol is imported and the referenced symbol. There are two types of symbol references: nonlazy and lazy.
Nonlazy symbol references are resolved (bound to their definitions) by the dynamic linker when a module is loaded.
A nonlazy symbol reference is essentially a symbol pointer—a pointer-sized piece of data. The compiler generates nonlazy symbol references for data symbols or function addresses.
Lazy symbol references are resolved by the dynamic linker the first time they are used (not at load time). Subsequent calls to the referenced symbol jump directly to the symbol’s definition.
Lazy symbol references are made up of a symbol pointer and a symbol stub, a small amount of code that directly dereferences and jumps through the symbol pointer. The compiler generates lazy symbol references when it encounters a call to a function defined in another file.
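The difference between the two kinds of reference can be sketched in a few lines of Python. This is only a conceptual model with invented class names (Loader, NonLazyPointer, LazyStub); the real mechanism lives in the dynamic linker and in generated stub code, not in anything resembling these classes.

```python
class Loader:
    """Toy dynamic linker: maps symbol names to definitions, counts lookups."""

    def __init__(self, definitions):
        self.definitions = definitions
        self.lookups = 0

    def resolve(self, name):
        self.lookups += 1
        return self.definitions[name]


class NonLazyPointer:
    """Bound to its definition immediately, when the module is loaded."""

    def __init__(self, loader, name):
        self.target = loader.resolve(name)


class LazyStub:
    """Bound on first call; later calls go straight through the pointer."""

    def __init__(self, loader, name):
        self.loader, self.name, self.target = loader, name, None

    def __call__(self, *args):
        if self.target is None:
            self.target = self.loader.resolve(self.name)
        return self.target(*args)


loader = Loader({"limit": 10, "square": lambda x: x * x})
limit = NonLazyPointer(loader, "limit")  # data symbol: resolved right away
square = LazyStub(loader, "square")      # function call: nothing resolved yet
print(loader.lookups)                    # 1
print(square(3), loader.lookups)         # 9 2
print(square(4), loader.lookups)         # 16 2
```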

In human-speak: the compiler generates stubs with $non_lazy_ptr appended to them to speed up linking. You're probably seeing that function Foo referenced from _Foo$non_lazy_ptr is undefined, or something like that - these are not the same thing. Make sure that the symbol is actually declared and exported in the object files/libraries you're linking your app to. At least that was my problem, I also thought it's a weird linker thing until I found that my problem was elsewhere - there are several other possible causes found on Google.

ranlib -c libwhatever.a
is a solid fix for the issue. I had the same problem when building the PJSIP library for iOS. This library sort-of uses an autoconf based make system, but needs a little tweaking to various files to make everything alright for iOS. In the process of doing that I managed to remove the ranlib line in the rule for libraries and then started getting an error in the link of my project about _PJ_NO_MEMORY_EXCEPTION referenced from _PJ_NO_MEMORY_EXCEPTION$non_lazy_ptr being undefined.
Adding the ranlib line back to the library file solved it. Now my full entry for LIBS in rules.mak is
$(LIB): $(OBJDIRS) $(OBJS) $($(APP)_EXTRA_DEP)
if test ! -d $(LIBDIR); then $(subst ##,$(subst /,$(HOST_PSEP),$(LIBDIR)),$(HOST_MKDIR)); fi
$(LIBTOOL) -o $(LIB) $(OBJS)
$(RANLIB) -c $(LIB)
Hope this helps others as well trying to use general UNIX configured external libraries with iPhone or iOS.

If someone else stumbles on the same problem I had:
I had extern NSString* const someString; in the header file, but forgot to define it in the implementation file as NSString* const someString = @"someString";
Adding the definition solved it.

ranlib -c on your library file fixes the problem
