Is there a way to strip all functions from an object file that I am not using? - gcc

I am trying to save space in my executable and I noticed that several functions are being added into my object files, even though I never call them (the code is from a library).
Is there a way to tell gcc to remove these functions automatically or do I need to remove them manually?

If you are compiling into object files (not executables), then a compiler will never remove any non-static functions, since it's always possible you will link the object file against another object file that will call that function. So your first step should be declaring as many functions as possible static.
Secondly, the only way for a compiler to remove any unused functions would be when you statically link your executable. In that case, there is at least the possibility that a tool might come along and figure out which functions are used and which are not.
The catch is, I don't believe that gcc actually does this type of cross-module optimization. Your best bet is the -Os flag to optimize for code size, but even then, if you have an object file abc.o that has some unused non-static functions and you statically link it into some executable def.exe, I don't believe that gcc will go and strip out the code for the unused functions.
If you truly desperately need this to be done, I think you might have to actually #include the files together so that after the preprocessor pass, it results in a single .c file being compiled. With gcc compiling a single monstrous jumbo source file, you stand the best chance of unused functions being eliminated.

Have you looked into calling gcc with -Os (optimize for size)? I'm not sure if it strips unreached code, but it would be simple enough to test. You could also 'strip' your executable after you get it back. I'm sure there's a gcc command-line argument to do the same thing; is it --dead_strip?

In addition to -Os to optimize for size, this link may be of help.

Since I asked this question, GCC 4.5 has been released, which includes an option to combine all files so that it looks like one gigantic source file. Using that option, it is possible to easily strip out the unused functions.
More details here

IIRC the linker by default does what you want in some specific cases. The short of it is that library files contain a bunch of object files, and only the referenced object files are linked in. If you can figure out how to get GCC to emit each function into its own object file and then build these into a library, you should get what you are looking for.
I only know of one compiler that can actually do this: here (look at the -lib flag)

Related

how does ld deal with code that is supplied twice (in a source file and in a library)?

Suppose we call
gcc -Dmyflag -lmylib mycode.c
where mylib contains all of mycode but is compiled without -Dmyflag. So all functions and other entities implemented in mycode are available to the linker in two versions. Empirically, I find that the version from mycode is taken. Can I rely on that? Will mycode always override mylib?
Empirically, I find that the version from mycode is taken.
Read this explanation of how the linker works with archive libraries, and possibly this one.
Can I rely on that?
You should rely on understanding how this works.
If you understood the material in the referenced links, you'll observe that adding main to libmylib.a will invert the answer (and if mycode.c also contains main, you'll get a duplicate symbol definition error).
If you are using a dynamic library libmylib.so, the rules are different, and the library will always lose to the main binary, although there are many complications, such as LD_PRELOAD, linking the library with -Bsymbolic, and others.
In short, you should prefer to not do this at all.

G++/LD fails: can't find library when library isn't actually needed

I have a program foo that I'm trying to compile and link, and I'm running into a chicken-and-egg dilemma.
For reasons I'll explain below, within a given directory I'm forced to add a link to several libraries we build (let's call them libA and libB) regardless of my target. I know I only actually need libA for my program, so after all the libraries and this binary were built, I used ldd -u -r foo to verify that libB is an unused direct dependency.
Since it's unused, I altered the makefiles and flags so that -lB is wrapped between -Wl,--as-needed and -Wl,--no-as-needed. I run make rebuild and use ldd again, and this time it doesn't show any unused dependencies. So far so good.
Now the fun part: since it's unused, I would expect that if libB is not found/available/built, I should still be able to compile and link foo as long as libA is available (for example, if I did a fresh checkout and only built libA before trying to compile this specific test). But ld errors out with /usr/bin/ld: cannot find -lB
Does this mean that ld needs to locate libB even if it won't need any of the symbols it provides? That doesn't seem to make sense. If all symbolic dependencies are already met, why does it even need to look at this other library? (An answer to that would explain the problem ld has and why this is not possible.)
Is there a way I can say "Hey, don't complain if you can't find this library; we shouldn't need to link with it"?
The promised reasons:
For various reasons beyond my control, I have to share makeflags with many other tests in this directory due to the project's makefile hierarchy. There is a two-level makefile for all these tests that says foo is a phony target whose recipe is make -f generictest.mk target=foo, and generictest.mk just says that the source file is $(target).C, that this binary needs to use each library we build, specifies the relative path to our root directory, and then includes the root's generic makefile. The root directory's generic makefile expands all the other stuff out (flags, options, compiler, auto-generation of dependencies through g++, etc.), and, most importantly, for each statement that said "use libX" in generictest.mk it adds -lX to the flags (or, in my case, -lX wrapped in the as-needed/no-as-needed pair).
While I'm well aware that a lot of this is far from ideal and/or flat-out incorrect in terms of makefile best practices, I don't have the authority or physical ability to change it. And compared to the alternative employed in other folders, where people make individual concrete copies of this makefile for each target, I greatly prefer it, because that alternative forces me to edit all of them whenever I want to revise our whole make pattern, and yields lots of other typos and problems.
I could certainly create another generictest.mk-like file to use for some tests and group the tests by their actual library needs, but it would be kind of neat if I didn't have to, as long as I could say "you don't need all of them; you need each of them, but only if you actually use it".
There's no way that the linker can know that the library is not needed. Even after all your "normal" libraries are linked there are still lots and lots of unresolved symbols: symbols for the C runtime library (printf, etc.). The linker has no idea where those are going to come from.
Personally I'd be surprised if the linker didn't complain, even if every single symbol was already resolved. After all there may be fancy things at work here: weak bindings, etc. which may mean that symbols found later on the link line would be preferred over symbols found earlier (I'm not 100% sure this is possible but I wouldn't be surprised).
As for your situation, if you know that the library is not needed can't you just use $(filter-out ...) on the link command line to get rid of it? You'd have to write your own explicit rule for this with your own recipe, rather than using a default one, but at least you could use all the same variables.
Alternatively it MIGHT be possible to play some tricks with target-specific variables. Declare a target-specific variable for that target that resets the variable containing the "bad library" with a value that doesn't contain it (maybe by using $(filter-out ...) as above), and it will override that value for that target only. There are some subtle gotchas with target-specific variables overriding "more general" variables but I think it would work.

how can I verify that dead code was stripped from the binary?

My C/Obj-C code (an iOS app built with clang) has some functions excluded by #ifdefs. I want to make sure that code called from those functions, but not from any others (dead code), gets stripped out (eliminated) at link time.
I tried:
Adding a local literal char[] in a function that should be eliminated; the string is still visible when running strings on the executable.
Adding a function that should be eliminated; the function name is still visible when running strings.
Before you ask, I'm building for release, and all strip settings (including dead-code stripping, obviously) are enabled.
The question is not really xcode/apple/iOS specific; I assume the answer should be pretty much the same on any POSIX development platform.
In binutils, ld has the --gc-sections option, which does what you want at the granularity of sections within object files. You have several options:
use gcc's flags -ffunction-sections and -fdata-sections to isolate each symbol into its own section, then use --gc-sections;
put all candidates for removal into a separate file and the linker will be able to strip the whole section;
disassemble the resulting binary, remove dead code, assemble again;
use strip with appropriate -N options to discard the offending symbols from the symbol table; this will leave the code and data there, but it won't show up in the symbol table.

Static library "interface"

Is there any way to tell the compiler (gcc/mingw32) when building an object file (lib*.o) to only expose certain functions from the .c file?
The reason I want to do this is that I am statically linking to a 100,000+ line library (SQLite), but am only using a select few of the functions it offers. I am hoping that if I can tell the compiler to only expose those functions, it will optimize out all the code of the functions that are never needed for those few I selected, thus drastically decreasing the size of the library.
I found several possible solutions:
This is what I asked about. It is the gcc equivalent of Windows' dllexport:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Code-Gen-Options.html (-fvisibility)
http://gcc.gnu.org/wiki/Visibility
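Here is a minimal sketch of the visibility approach, assuming gcc on an ELF platform. The macro and function names are hypothetical. Compiling with -fvisibility=hidden makes every symbol hidden by default, and the attribute re-exports only the chosen entry points. Visibility controls what a shared object exports. In a purely static link like the SQLite case, hidden visibility does not remove unused code by itself, which is why the link-time options that follow matter.

```c
/* api.c (hypothetical library source)
   Build as a shared library, assuming gcc on ELF:
     gcc -fPIC -fvisibility=hidden -shared api.c -o libapi.so
   Afterwards, `nm -D libapi.so` should list api_checksum but not mix_byte. */
#include <stddef.h>

/* Hypothetical export macro: overrides -fvisibility=hidden for one symbol. */
#define API_EXPORT __attribute__((visibility("default")))

/* Hidden under -fvisibility=hidden: usable inside the library only. */
unsigned mix_byte(unsigned acc, unsigned char byte) {
    return acc * 31u + byte;
}

/* The only symbol exported from libapi.so. */
API_EXPORT unsigned api_checksum(const unsigned char *data, size_t len) {
    unsigned acc = 17u;
    for (size_t i = 0; i < len; i++)
        acc = mix_byte(acc, data[i]);
    return acc;
}
```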
I also discovered link-time code-generation. This allows the linker to see what parts of the code are actually used and get rid of the rest. Using this together with strip and -fwhole-program has given me drastically better results.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html (see -flto and -fwhole-program)
Note: This flag only makes sense if you are not compiling the whole program in one call to gcc, which is what I was doing (making a sqlite.o file and then statically linking it in).
The third option which I found but have not yet looked into is mentioned here:
How to remove unused C/C++ symbols with GCC and ld?
That's probably the linker's job, not the compiler's. When linking that as a program (.exe), the linker will take care of only importing the relevant symbols, and when linking a DLL, the __declspec(dllexport) mechanism is probably what you are looking for, or some flags of ld can help you (man ld).

Size of a library and the executable

I have a static library *.lib created using MSVC on Windows. The size of the library is, say, 70KB. Then I have an application which links this library. But now the size of the final executable (*.exe) is 29KB, less than the library. What I want to know is:
Since the library is statically linked, I was thinking it should add directly to the executable size, and the final exe size should be more than that? Does the Windows exe format also do some compression of the binary data?
How does this work on Linux systems; that is, how does the size of a library on Linux (*.a/*.la file) relate to the size of a Linux executable (*.out)?
-AD
A static library on both Windows and Unix is a collection of .obj/.o files. The linker looks at each of these object files and determines if it is needed for the program to link. If it isn't needed, then the object file won't get included in the final executable. This can lead to executables that are smaller than the library.
EDIT: As MSalters points out, on Windows the VC++ compiler now supports generating object files that enable function-level linking, e.g., see here. In fact, edit-and-continue requires this, since the edit-and-continue needs to be able to replace the smallest possible part of the executable.
There is additional bookkeeping information in the .lib file that is not needed for the final executable. This information helps the linker find the code to actually link. Also, debug information may be stored in the .lib file but not in the .exe file (I don't recall where debug info is stored for objs in a lib file, it might be somewhere else).
The static library probably contains several functions which are never used. When the linker links the library with the main executable, it sees that certain functions are never used (and that their addresses are never taken and stored in function pointers), it just throws away the code. It can also do this recursively: if function A() is never called, and A() calls B(), but B() is never otherwise called, it can remove the code for both A() and B(). On Linux, the same thing happens.
A static library has to contain every symbol defined in its source code, because it might get linked into an executable which needs just that specific symbol. But once it is linked into an executable, we know exactly which symbols end up being used, and which ones don't. So the linker can trivially remove unused code, trimming the file size by a lot. Similarly, any duplicate symbols (anything that's defined in both the static library and the executable it's linked into) get merged into a single instance.
Disclaimer: It's been a long time since I dealt with static linking, so take my answer with a grain of salt.
You wrote: I was thinking it should add directly to the executable size and final exe size should be more than that?
Naive linkers work exactly this way - back when I was doing hobby development for CP/M systems (a LONG time ago), this was a real problem.
Modern linkers are smarter, however - they only link in the functions referenced by the original code, or as required.
In addition to the current answers, the linker is allowed to remove function definitions if they have identical object code; this is intended to help reduce the bloating effects of templated code.
@All: Thanks for the pointers.
@Greg Hewgill - Your answer was a good pointer. Thanks.
The answer I found was as follows:
1.) During library building, if the option "Keep Program debug database" in MSVC (or something like it) is ON, then the library will have this debug info bloating its size.
But when I statically include that library and create an executable, the linker strips all that debug info from the library before generating the exe, and hence the exe size is less than that of the library.
2.) When I disabled the option "Keep Program debug database", I got a library whose size was smaller than the final executable, which is what I thought was normal in most situations.
-AD

Resources