Strip some public symbols from a Windows static library

We want to produce Windows static libraries that have public symbols only for the documented API functions. We want to strip out all other symbols.
This is easy for a *NIX library; you can use the strip(1) utility and specify --keep-symbol=foo, and you can even put a whole list of symbols into a file and specify the file.
How can we do this for a Windows library?
A little more detail: suppose we are making a library that is built from multiple .c files.
util.c defines the functions util_foo() and util_bar(). math.c defines some_math_func(). Then lib.c defines the functions api_func_0() and api_func_1(). The API functions call the utility and math functions, so those functions must not be declared static. When we compile each .c file, the public symbols are in the object file, and then when we link the object files to make a library, the linker will leave all the public symbols visible. Once the linker has produced the static library, we only want the symbols api_func_0 and api_func_1 visible.
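To make the setup concrete, here is a minimal sketch of the three files collapsed into one listing. Only the function names come from the question; the signatures and bodies are hypothetical. Because lib.c calls helpers defined in other files, those helpers need external linkage, which is exactly why they show up as public symbols in the object files and in the finished library.

```c
/* internal.h: hypothetical private header shared by the .c files (not shipped) */
int util_foo(int x);
int util_bar(int x);
int some_math_func(int a, int b);

/* util.c (hypothetical bodies) */
int util_foo(int x) { return x + 1; }
int util_bar(int x) { return x * 2; }

/* math.c (hypothetical body) */
int some_math_func(int a, int b) { return a * b + 1; }

/* lib.c: api_func_0 and api_func_1 are the only documented API.
 * They call helpers from other translation units, so those helpers
 * cannot be static and remain visible in the library. */
int api_func_0(int x) { return util_foo(util_bar(x)); }
int api_func_1(int a, int b) { return some_math_func(util_foo(a), b); }
```

For example, api_func_0(3) computes util_foo(util_bar(3)) = 7.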

Related

How to make a C function dynamically exported

My application works with a static library that has an extensions API. The API can call an extension's init function either from an external shared library or from the "local" binary; that is, I can link the extension's init function statically into the main executable.
The local function is looked up with a dlsym call, so the init function must be dynamically exported from the main binary. That is, the following nm call:
nm -CD <binary>
should list my init function.
Let's assume the init function has this signature:
int init_func(INIT_STRUCT *);
This function is not called directly; it is only supposed to be loaded by a dlsym call.
So I have two related questions:
How do I force the linker not to discard this function from the generated binary?
How do I force the compiler/linker to export this function dynamically?
(I use gcc to compile and link my program)
Unfortunately, the default behavior of the GNU toolchain is not to export symbols from executables (as opposed to shared libraries, which export all their symbols by default). You can use the big-hammer -rdynamic flag, which tells the linker to export all symbols from your executable. A less intrusive solution is to provide an explicit exports file via -Wl,--dynamic-list when linking (see example usage in the Clang sources).
Ok, I will post an answer based on previous comments.
To make all functions dynamically exported: -rdynamic.
For a single function to be always linked (even if not referenced) you need to add -u<function> to the link line.
To link in every object file from a static library (even unreferenced ones), use --whole-archive before the library. To return to normal linking, use --no-whole-archive after it.
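A minimal sketch of the whole round trip, assuming glibc and a hypothetical INIT_STRUCT layout: the executable defines init_func, never calls it directly, and looks it up in itself with dlopen(NULL, ...) plus dlsym. The build lines and the exports.txt file name in the comment are assumptions for illustration. Without -rdynamic (or a dynamic list), the lookup fails and the helper returns -2.

```c
/* Assumed build line (glibc 2.34+; older glibc also needs -ldl):
 *     gcc -rdynamic -o app app.c
 * -rdynamic exports every symbol of the executable. To export only
 * init_func, link with -Wl,--dynamic-list=exports.txt instead, where the
 * (hypothetical) file exports.txt contains:  { init_func; };
 * If init_func lived in a static library, -Wl,-u,init_func would force
 * the archive member defining it to be linked in. */
#include <dlfcn.h>
#include <stddef.h>

typedef struct { int version; } INIT_STRUCT;  /* hypothetical layout */

/* Never called directly; only reached through dlsym. */
int init_func(INIT_STRUCT *s) { return s->version > 0 ? 0 : -1; }

/* Look up init_func in the running executable and call it.
 * Returns -2 when the symbol is not in the dynamic symbol table. */
int call_local_init(INIT_STRUCT *s)
{
    void *self = dlopen(NULL, RTLD_NOW);  /* handle for the main program */
    if (self == NULL)
        return -2;

    /* POSIX permits converting dlsym's void * to a function pointer. */
    int (*fn)(INIT_STRUCT *) = (int (*)(INIT_STRUCT *))dlsym(self, "init_func");
    int rc = (fn != NULL) ? fn(s) : -2;
    dlclose(self);
    return rc;
}
```

Running nm -CD on the resulting binary should list init_func only when it was linked with -rdynamic or a matching dynamic list.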

Does gcc ld only include the .text of the relevant functions being used inside an executable?

I have compiled a variety of .c source files into their respective .o object files and archived them into a .a archive file. Say in the main() function I use the function foo(). After compiling and linking, does the executable 1) include only the .text of the foo() function plus all other functions recursively called by foo(), 2) include the entire .o in which foo() resides, or 3) include the .text of the entire .a file?
It is desirable to go for option 1) as it will only include the bare minimum amount of instructions in a space constrained environment. How do I go about accomplishing this?
Normally, the loader will include all the material in all the object files (.o files on Unix/Linux) listed on the link line. If it processes any static libraries (.a files on Unix/Linux), then the object files that are needed from the library are included in toto (but any object files which do not define a symbol needed by the program are left out of the executable). If it processes any shared libraries (usually .so files on Unix/Linux), then it doesn't load any of the material into the binary, but it does keep a record of all the symbols provided by the shared library, so that it does not try to satisfy any of those symbols from later files or libraries.
The loader processes the argument list in left-to-right order. This means that you should list static libraries after object files. It isn't quite so critical to list shared libraries after object files, though it is still best to do so anyway, just in case the program is ever linked with static libraries instead of shared libraries.
If you end up with doubly defined symbols from the explicitly listed object files or object files extracted from static libraries, then the loader will fail. If you end up with doubly defined symbols in some of the shared libraries, the duplicates in the shared libraries are effectively ignored.
I think that's a reasonable summary, written at high speed and without getting too lost in the details. There are a lot of ifs and buts that could be added to the discussion; whole books have been written on the subject of how executables are created from object files and libraries.

GCC proper visibility for shared object written in C++

I have a huge project written in C++. It's all split into multiple static libraries that are eventually linked into one final shared library which has to export only a few simple functions.
If I run objdump on that final .so, I see all my internal names, etc. Because the code uses long class names and namespaces, these strings become excessively long, and as a result the final binary is big.
So, my question is how do I do it properly with GCC to make sure that all these internal functions do not show up in the final binary?
I'm aware of all these GCC-specific visibility modifiers: I use -fvisibility=hidden -fvisibility-inlines-hidden, and I use -Wl,--no-whole-archive. I disable C++ exceptions and RTTI (-fno-exceptions -fno-rtti), but I still can't get GCC to generate a final .so that doesn't contain the names of my namespaces and classes that aren't supposed to be there at all!
I tried to use -Wl,--version-script= to control which functions should be visible, but I still see lots of internal names in the final stripped shared object. I read multiple similar entries on SO, but didn't see anything that does the job.
Note: I compile for multiple platforms (Linux, Windows, iPhone, etc.), and only on Windows with VS do I have no problems.
thanks
You might want to try the --retain-symbols-file linker option when linking the final .so file (-Wl,--retain-symbols-file=filename) to specify JUST the symbols you want to keep (export) and delete everything else. The file is just a text file with symbols (one per line) to keep.
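For reference, the source-level counterpart of the -fvisibility=hidden flag mentioned in the question is to mark only the API with default visibility. A minimal sketch, where the file names, the API_EXPORT macro, and both function names are hypothetical. Note that these flags only affect code compiled with them, so every static library that feeds the final .so needs to be built the same way.

```c
/* Assumed build lines (file and library names are hypothetical):
 *     gcc -fPIC -fvisibility=hidden -c api.c
 *     gcc -shared -o libapi.so api.o
 * Every symbol now defaults to hidden; only API_EXPORT functions land in
 * the dynamic symbol table (check with nm -D libapi.so). */
#if defined(__GNUC__)
#define API_EXPORT __attribute__((visibility("default")))
#else
#define API_EXPORT
#endif

/* Not static, so other files in the library can call it, yet hidden
 * from users of the .so because of -fvisibility=hidden. */
int internal_helper(int x) { return 3 * x; }

/* The only exported entry point. */
API_EXPORT int api_entry(int x) { return internal_helper(x) + 1; }
```

Hidden symbols are not exported, but they can still sit in the regular symbol table until the .so is stripped.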

g++ vs gcc, linking problem with a static library (.a)

I'm trying to link a static library (.a) file with a .o file which supposedly uses the symbols from the library. However, with gcc the usual linker error (unresolved symbols) comes up, even when I pass the .a file explicitly:
gcc -L. a.c staticlib.a
However, the same command works with g++ flawlessly.
Why is this happening?
I can see that the .c file is perfectly legal C (and hence C++), so why isn't gcc able to find the symbols in the library?
I tried finding the symbols in the library using objdump and found closely resembling symbols, but not exact ones. For example, I got
00000000000000b0 g F .text 000000000000004e _Z15PhttsFn_InitTTSPh
for the symbol PhttsFn_InitTTS.
Can someone please explain this phenomenon? I've also checked the architecture the library file was compiled for, and it's the same as my architecture.
Thanks!
C++ uses something called name mangling, in order for namespaces, overloaded function names, etc, to get unique symbols in the compiled object file.
Your C code refers to the symbol PhttsFn_InitTTS explicitly. A definition compiled as C produces exactly that symbol name. However, since C++ needs to deal with many variations of the same name (e.g. overloads with different parameter lists), it creates a 'mangled' name that encodes the namespace and the parameter types. In your case it was mangled to _Z15PhttsFn_InitTTSPh: _Z marks a mangled name, 15 is the length of the identifier PhttsFn_InitTTS (which is not in any namespace), and the trailing Ph encodes a single parameter of type unsigned char *.
Invoking GCC as gcc lets it pick the language itself, based on the file extension (.c -> C; .cc, .cpp, etc. -> C++). Invoking it as g++ forces C++ mode, even for .c files.
Your .a file was evidently compiled as C++, since it exposes that mangled symbol.
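If you control the library's source, the usual fix is to give the function C linkage when it is compiled as C++, so that C callers and the library agree on the plain symbol name. A minimal header sketch follows. The header name, the parameter name, and the stub body are hypothetical; the unsigned char * parameter type is read off the mangled name, and the int return type is an assumption, because the mangled name of an ordinary function does not encode it. If the library cannot be rebuilt, the alternative is to compile a.c as C++, which is effectively what invoking g++ does.

```c
/* phtts.h: hypothetical header, included by the library's C++ source
 * and by C callers alike. */
#ifndef PHTTS_H
#define PHTTS_H

#ifdef __cplusplus
extern "C" {   /* C linkage: the symbol is plain PhttsFn_InitTTS */
#endif

int PhttsFn_InitTTS(unsigned char *config);   /* return type assumed */

#ifdef __cplusplus
}
#endif

#endif /* PHTTS_H */

/* Hypothetical stub standing in for the library's real definition,
 * so this sketch compiles on its own. */
int PhttsFn_InitTTS(unsigned char *config) { return config != 0 ? 0 : -1; }
```

After rebuilding the library with this declaration in scope, objdump should show PhttsFn_InitTTS instead of _Z15PhttsFn_InitTTSPh.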

How does GCC compile applications that reference a static library

I've read that the gcc compiler can perform certain optimizations when compiling an application that references a static library, for instance, it will "pull" in only the code from the static library that the application depends upon. This helps keep the size of the application's executable to a minimum if portions of the static library are not being used by the app.
1) Is this true?
2) How does GCC know what code from the static library the application is actually using? Does it only look at the header files that are included (directly and indirectly) in the application and then pull code accordingly? Or does it actually look at what methods from the static library are being called?
A static library is just a bag of object files. The linker (ld) will keep track of which object files are used (i.e. contain a function referenced from somewhere), and will not include unreferenced object files in the final executable image.
gcc does nothing of the sort. Everything you describe is linking, which is handled by ld.
ld examines the symbol tables of the object files in order to determine which symbols need to be linked, and then pulls the relevant object files from the libraries and links them into the executable.
Answers
1) Yes, only the referenced object files will be pulled in. Besides the smaller size, there is also a gain in link speed, since the static library contains an index table of all the symbols exported by the library. Looking symbols up in this table is quicker than searching the object files one by one.
Alternatively, if you want to pull in everything in the static library regardless of references, you can pass the --whole-archive switch to ld.
2) It would be more correct to ask this question in the context of ld (the GNU linker), since that is what actually pulls in the references. GCC just invokes the linker after it's done compiling (unless you do gcc -c, which causes it to stop after compilation).
So, after compilation is done, ld is invoked with an ordered list of object (.o) files and libraries. ld processes the .o files one by one, and for each one the linker
a) Notes down the external symbols needed by this file that cannot be resolved yet, and adds these to a (say) unresolved table.
b) Looks at the symbols (functions, global variables) exported by this file and resolves any previous references that it can.
This is a very simplified overview of the linking process.
Now when the linker comes to the static library, it essentially does the same thing, this time using the static library to resolve symbols. However, there is one difference: the linker pulls in only the archive members (object files) that define currently unresolved symbols, plus whatever those members need in turn. So assume we have
a.o and libstatic.a which in turn contains b.o and c.o.
b.o defines bar() and moreBar();
c.o defines baz() and moreBaz();
a.o defines foo();
where foo calls bar which calls baz. Now when you do
gcc -o app a.o libstatic.a
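After processing a.o, the linker knows that it needs to resolve bar. This gets resolved from the static library; however, while resolving bar, the linker notices that bar needs baz, which again gets resolved from libstatic.a. moreBar() and moreBaz() are not referenced by anything, but because the linker pulls in whole object files, they are still included along with b.o and c.o. Dropping individual unreferenced functions requires compiling with -ffunction-sections and linking with -Wl,--gc-sections.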
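The member-selection step described in this answer can be modeled in a few lines of C. This is a toy sketch, not ld's actual code: each object file is reduced to a set of defined and needed symbols (bit masks named after the example), and the archive is rescanned until a pass adds nothing, which mirrors how GNU ld handles references between members of the same archive. It also shows that selection works on whole members, consistent with the object-file granularity described in the earlier answers.

```c
#include <stddef.h>

/* Symbols from the example, one bit each. */
enum { FOO = 1 << 0, BAR = 1 << 1, MOREBAR = 1 << 2, BAZ = 1 << 3, MOREBAZ = 1 << 4 };

typedef struct {
    const char *name;   /* member name, for readability only */
    unsigned defines;   /* symbols this object file defines */
    unsigned needs;     /* symbols it references but does not define */
} ObjFile;

/* The members of libstatic.a from the example. */
static const ObjFile libstatic[] = {
    { "b.o", BAR | MOREBAR, BAZ },      /* bar() calls baz() */
    { "c.o", BAZ | MOREBAZ, 0 },
};

/* Given what the preceding object files define and still need, return a
 * bit mask of the archive members that get linked (bit i = member i).
 * Members are taken whole, and the archive is rescanned until a pass adds
 * nothing, so members may depend on each other in any order. */
unsigned resolve_archive(unsigned defined, unsigned undefined)
{
    const size_t n = sizeof libstatic / sizeof libstatic[0];
    unsigned pulled = 0;
    int changed = 1;

    undefined &= ~defined;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if ((pulled & (1u << i)) || !(libstatic[i].defines & undefined))
                continue;  /* already linked, or resolves nothing we need */
            pulled |= 1u << i;
            defined |= libstatic[i].defines;
            undefined = (undefined | libstatic[i].needs) & ~defined;
            changed = 1;
        }
    }
    return pulled;
}
```

For a.o (defines foo, needs bar), resolve_archive(FOO, BAR) returns 3: both b.o and c.o are linked, and moreBar and moreBaz come along with them.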

Resources