linker option to ignore unused dependencies - gcc

I would like to remove all unused symbols from my compiled C++ binary. I found "How to remove unused C/C++ symbols with GCC and ld?", which gives an overview for gcc, the toolchain I'm using.
However, on my system, the linking option (-Wl,--gc-sections) is rejected:
$ gcc -fdata-sections -ffunction-sections a.c -o a.o -Wl,--gc-sections
ld: fatal: unrecognized option '--'
ld: fatal: use the -z help option for usage information
collect2: error: ld returned 1 exit status
I'm running on illumos, which is a (relatively) recent fork of Solaris, with GCC 4.7. Does anybody know what the correct linker option is here?
Edit: searching the man pages more closely turned up "-zignore":
-z ignore | record
Ignores, or records, dynamic dependencies that are not
referenced as part of the link-edit. Ignores, or
records, unreferenced ELF sections from the relocatable
objects that are read as part of the link-edit. By
default, -z record is in effect.
If an ELF section is ignored, the section is eliminated
from the output file being generated. A section is
ignored when three conditions are true. The eliminated
section must contribute to an allocatable segment. The
eliminated section must provide no global symbols. No
other section from any object that contributes to the
link-edit, must reference an eliminated section.
However, the following sequence still puts FUNCTION_SHOULD_BE_REMOVED in the ELF section .text.FUNCTION:
$ cat a.c
int main() {
return 0;
}
$ cat b.c
int FUNCTION_SHOULD_BE_REMOVED() {
return 0;
}
$ gcc -fdata-sections -ffunction-sections -c a.c -Wl,-zignore
$ gcc -fdata-sections -ffunction-sections -c b.c -Wl,-zignore
$ gcc -fdata-sections -ffunction-sections a.o b.o -Wl,-zignore
$ elfdump -s a.out # I removed a lot of output for brevity
Symbol Table Section: .dynsym
[2] 0x08050e72 0x0000000a FUNC GLOB D 1 .text.FUNCTION FUNCTION_SHOULD_BE_REMOVED
Symbol Table Section: .symtab
[71] 0x08050e72 0x0000000a FUNC GLOB D 0 .text.FUNCTION FUNCTION_SHOULD_BE_REMOVED
Because the man pages say "no global symbols", I tried making the function "static" and that had the same end result.

The ld '-z ignore' option is positional: it applies only to the input objects that appear after it on the command line. The example you gave:
gcc a.o b.o -Wl,-zignore
applies the option to no objects, so nothing is done.
gcc -Wl,-zignore a.o b.o
should work.
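To make the "positional" point concrete, here is a toy C model of how such an option is applied while the command line is read from left to right. It is only a sketch, assuming that -zignore and -zrecord simply switch a mode for the inputs that follow them; the function name inputs_under_ignore is made up, and this is not the illumos ld source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a positional linker option (illustration only, not the
 * illumos ld source): every input that follows -zignore is subject to it,
 * until -zrecord (the default mode) switches it back.
 * Returns how many inputs were read while "ignore" mode was active. */
static size_t inputs_under_ignore(const char *const *args, size_t n) {
    bool ignore = false;              /* -z record is in effect by default */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(args[i] != NULL);
        if (strcmp(args[i], "-zignore") == 0)
            ignore = true;
        else if (strcmp(args[i], "-zrecord") == 0)
            ignore = false;
        else if (ignore)
            count++;                  /* an input object seen in ignore mode */
    }
    return count;
}
```

With the arguments in the question's order (a.o, b.o, -zignore) the count is 0; with -zignore first it is 2.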

Related

How to link multiple Mach-O files with Terminal on macOS

I created two C source files, a.c and b.c.
Then I compiled both in Terminal with:
gcc -c a.c
gcc -c b.c
This gave me two Mach-O object files, a.o and b.o.
How can I link them and generate a linked object file like ab.o?
I tried the following:
ld a.o b.o -e main -o ab
But it failed with:
ld: warning: No version-min specified on command line
ld: dynamic main executables must link with libSystem.dylib for inferred architecture x86_64
What should I do next?
Just feed them back to gcc:
gcc -o ab a.o b.o
Side note: you might want to call the resulting file ab.out or just ab, but probably not ab.o, since that usually implies an unlinked object file.
Why, after linking a.o and b.o into ab, is the text size of ab not the sum of the text sizes of a.o and b.o?
0000005c != 0000002e + 0000002c

GCC differently treats an object and a static library regarding undefined symbols

Recently I discovered that the Linux linker does not fail on undefined symbols in static libraries, but does fail on the same undefined symbols if I link directly with the object files. Here is a simple example:
Source code:
$ cat main.c
int main() { return 0; }
$ cat src.c
int innerUndefinedFunc();
int outerUnusedFunc() {
return innerUndefinedFunc();
}
Creating *.o and *.a from it, comparing using "nm":
$ gcc -c -o main.o main.c
$ gcc -c -o src.o src.c
$ ar r src.a src.o
ar: creating src.a
$ nm src.o
U innerUndefinedFunc
0000000000000000 T outerUnusedFunc
$ nm src.a
src.o:
U innerUndefinedFunc
0000000000000000 T outerUnusedFunc
(Here we clearly see that both *.o and *.a contain the same symbol list.)
And now...
$ ld -o exe main.o src.o
src.o: In function `outerUnusedFunc':
src.c:(.text+0xa): undefined reference to `innerUndefinedFunc'
$ echo $?
1
$ ld -o exe main.o src.a
$ echo $?
0
Why does GCC treat them differently?
If you read the static-libraries tag wiki, it will explain why no object files from src.a are linked into your program, and therefore why it doesn't matter what undefined symbols are referenced in them.
The difference between an object file foo.o and a static library libfoo.a, as linker inputs, is that an object file is always linked into your program, unconditionally, whereas the same object file in a static library, libfoo.a(foo.o), is extracted from libfoo.a and linked into the program only if the linker needs it to carry on the linkage, as explained by the tag wiki.
Naturally, the linker will give errors only for undefined references in object files that are linked into the program.
The behaviour you are observing is behaviour of the linker, whether or not you invoke it via a GCC front-end.
Giving the linker foo.o tells it: I want this in the program. Giving the linker libfoo.a tells it: Here are some object files that you might or might not need.
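The rule above can be sketched as a toy C model: plain objects are always loaded, while an archive member is loaded only if it defines a symbol that is undefined at that moment. The struct layout, the function link_inputs and its forced parameter (standing in for ld -u) are hypothetical simplifications, not how ld is implemented; each object here defines at most one symbol and references at most one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32

/* Toy model: an object defines at most one symbol and references at most one. */
struct object { const char *name; const char *def; const char *ref; };

/* One linker input: a plain object file, or an archive of member objects. */
struct input { bool is_archive; const struct object *objs; size_t n; };

struct symtab {
    const char *defs[MAX_SYMS]; size_t nd;
    const char *refs[MAX_SYMS]; size_t nr;
};

static bool has(const char *const *v, size_t n, const char *s) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i], s) == 0) return true;
    return false;
}

/* Referenced so far but not (yet) defined. */
static bool undefined(const struct symtab *t, const char *s) {
    return has(t->refs, t->nr, s) && !has(t->defs, t->nd, s);
}

static void load(struct symtab *t, const struct object *o) {
    assert(t->nd < MAX_SYMS && t->nr < MAX_SYMS);
    if (o->def) t->defs[t->nd++] = o->def;
    if (o->ref) t->refs[t->nr++] = o->ref;
}

/* Processes inputs left to right and returns how many distinct symbols are
 * still undefined. 'forced' mimics ld -u SYMBOL (NULL for none). Archive
 * members are scanned once, which is enough for these one-member archives. */
static size_t link_inputs(const struct input *in, size_t n, const char *forced) {
    struct symtab t = {0};
    if (forced) t.refs[t.nr++] = forced;
    for (size_t k = 0; k < n; k++) {
        for (size_t m = 0; m < in[k].n; m++) {
            const struct object *o = &in[k].objs[m];
            /* Plain objects: always. Archive members: only if needed now. */
            if (!in[k].is_archive || (o->def && undefined(&t, o->def)))
                load(&t, o);
        }
    }
    size_t missing = 0;
    for (size_t i = 0; i < t.nr; i++)
        if (undefined(&t, t.refs[i]) && !has(t.refs, i, t.refs[i])) missing++;
    return missing;
}

/* main.o and src.o from the question. */
static const struct object main_o = {"main.o", "main", NULL};
static const struct object src_o = {"src.o", "outerUnusedFunc", "innerUndefinedFunc"};
```

Linking main.o with src.o leaves innerUndefinedFunc undefined; linking main.o with an archive holding src.o leaves nothing undefined, unless outerUnusedFunc is forced in the way ld -u does.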
In the second case (with the static library), the command line says "build exe from main.o and add all required things from src.a".
ld just ignores the library because main.o requires no external symbols from it (outerUnusedFunc is not referenced from main.o).
But in the first case, the command line says "build exe from main.o and src.o".
ld must place the contents of src.o in the output file.
Hence it is obliged to analyze the src.o module, add outerUnusedFunc to the output file, and resolve all symbols referenced by outerUnusedFunc even though it is unused.
You can enable garbage collection for code sections:
gcc -ffunction-sections -Wl,--gc-sections -o exe main.c src.c
In this case outerUnusedFunc (like every other function) will be placed in a separate section. ld will see that this section is unused (no symbols in it are referenced) and remove the whole section from the output file, so innerUndefinedFunc is never referenced and does not need to be resolved: the same result as in the library case.
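Garbage collection of sections is essentially mark-and-sweep from the entry point. The following toy C sketch assumes one function per section (as -ffunction-sections arranges) and uses main as the root; a real link starts from the entry symbol (usually _start), and the names struct section and undefined_refs are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SECTIONS 16

/* Toy model: each section holds one function and lists that function's
 * outgoing references (NULL-terminated). */
struct section {
    const char *defines;
    const char *refs[4];
};

static int find_section(const struct section *s, size_t n, const char *sym) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(s[i].defines, sym) == 0) return (int)i;
    return -1;                        /* defined nowhere in this link */
}

/* Mark phase: every section reachable from 'sym' is live. */
static void mark(const struct section *s, size_t n, bool *live, const char *sym) {
    int i = find_section(s, n, sym);
    if (i < 0 || live[i]) return;
    live[i] = true;
    for (size_t r = 0; s[i].refs[r] != NULL; r++)
        mark(s, n, live, s[i].refs[r]);
}

/* Counts references from kept sections to symbols that no section defines,
 * i.e. the "undefined reference" errors the link would report. With
 * gc == false every section is kept, as without --gc-sections. */
static size_t undefined_refs(const struct section *s, size_t n, const char *entry, bool gc) {
    bool live[MAX_SECTIONS] = {false};
    assert(n <= MAX_SECTIONS);
    for (size_t i = 0; i < n; i++) live[i] = !gc;
    if (gc) mark(s, n, live, entry);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!live[i]) continue;       /* sweep: dead sections are discarded */
        for (size_t r = 0; s[i].refs[r] != NULL; r++)
            if (find_section(s, n, s[i].refs[r]) < 0) count++;
    }
    return count;
}

/* main.c and src.c from the question, one section per function. */
static const struct section sections[] = {
    {"main", {NULL}},
    {"outerUnusedFunc", {"innerUndefinedFunc", NULL}},
};
```

Without collection the reference to innerUndefinedFunc survives (one error); with collection the section holding outerUnusedFunc is never marked, so the reference disappears with it.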
Conversely, you can mark outerUnusedFunc as undefined with -u, so that ld must find it in the library and add it to the output file:
ld -o exe main.o -u outerUnusedFunc src.a
In this case, the same error (undefined reference to innerUndefinedFunc) will be produced.

GCC 4.5 vs 4.4 linking with dependencies

I am observing a difference when trying to do the same operation on GCC 4.4 and GCC 4.5. Because the code I am doing this with is proprietary, I am unable to provide it, but I am observing a similar failure with this simple test case.
What I am basically trying to do is have one shared library (libb) depend on another shared library (liba). When loading libb, I assume that liba should be loaded as well - even though libb is not necessarily using the symbols in liba.
What I observe is that when I compile with GCC 4.4, liba is loaded, but when I compile with GCC 4.5, it is not.
I have a small test case that consists of four files, a.c, b.c, c.c and test.sh. The contents of the files:
//a.c
int a(){
return 0;
}
//b.c
int b(){
return 0;
}
//c.c
#include <stdio.h>
int a();
int b();
int main()
{
printf("%d\n", a()+b());
return 0;
}
//test.sh
$CC -o liba.so a.c -shared
$CC -o libb.so b.c -shared -L. -la -Wl,-rpath-link .
$CC c.c -L. -lb -Wl,-rpath-link .
LD_LIBRARY_PATH=. ./a.out
This is my output with different versions of GCC
$ CC=gcc-4.4 ./test.sh
1
$ CC=gcc-4.5 ./test.sh
/tmp/cceJhAqy.o: In function `main':
c.c:(.text+0xf): undefined reference to `a'
collect2: ld returned 1 exit status
./test.sh: line 4: ./a.out: No such file or directory
$ CC=gcc-4.6 ./test.sh
/tmp/ccoovR0x.o: In function `main':
c.c:(.text+0xf): undefined reference to `a'
collect2: ld returned 1 exit status
./test.sh: line 4: ./a.out: No such file or directory
$
Can anyone explain what is happening? One more piece of information: ldd on libb.so does show liba.so with GCC 4.4 but not with GCC 4.5.
EDIT
I changed test.sh to the following:
$CC -shared -o liba.so a.c
$CC -L. -Wl,--no-as-needed -Wl,--copy-dt-needed-entries -la -shared -o libb.so b.c -Wl,-rpath-link .
$CC -L. c.c -lb -Wl,-rpath-link .
LD_LIBRARY_PATH=. ./a.out
This gave the following output with GCC 4.5:
/usr/bin/ld: /tmp/cc5IJ8Ks.o: undefined reference to symbol 'a'
/usr/bin/ld: note: 'a' is defined in DSO ./liba.so so try adding it to the linker command line
./liba.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
./test.sh: line 4: ./a.out: No such file or directory
There seem to have been changes in how DT_NEEDED libraries are treated during linking by ld. Here's the relevant part of the current man ld:
With --copy-dt-needed-entries dynamic libraries mentioned on the command
line will be recursively searched, following their DT_NEEDED tags to other libraries, in order to resolve symbols required by the output binary. With the
default setting however the searching of dynamic libraries that follow it will stop with the dynamic library itself. No DT_NEEDED links will be traversed
to resolve symbols.
(part of the --copy-dt-needed-entries section).
Sometime between GCC 4.4 and GCC 4.5 (apparently; see some reference here, though I can't find anything really authoritative), the default was changed from the recursive search to no recursive search (as you are seeing with the newer GCCs).
In any case, you can (and should) fix it by specifying liba in your final link step:
$CC c.c -L. -lb -la -Wl,-rpath-link .
You can check that this linker setting is indeed (at least part of) the issue by running with your newer compilers and this command line:
$CC c.c -L. -Wl,--copy-dt-needed-entries -lb -Wl,--no-copy-dt-needed-entries \
-Wl,-rpath-link .
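The difference between the two search modes can be sketched as a toy C model in which each shared library lists its own exports and its DT_NEEDED libraries. It assumes libb.so recorded liba.so as DT_NEEDED (as in the GCC 4.4 case, where ldd shows it); dso_provides and link_resolves are hypothetical helpers, not ld code, and there is no cycle detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a shared library with its exported symbols and its DT_NEEDED
 * entries (both NULL-terminated). */
struct dso {
    const char *name;
    const char *exports[4];
    const struct dso *needed[4];
};

/* Does 'd' provide 'sym'? With follow_needed (the old default, or
 * --copy-dt-needed-entries) the search recurses into DT_NEEDED libraries;
 * otherwise it stops at 'd' itself. */
static bool dso_provides(const struct dso *d, const char *sym, bool follow_needed) {
    for (size_t i = 0; d->exports[i] != NULL; i++)
        if (strcmp(d->exports[i], sym) == 0) return true;
    if (follow_needed)
        for (size_t i = 0; d->needed[i] != NULL; i++)
            if (dso_provides(d->needed[i], sym, true)) return true;
    return false;
}

/* Can 'sym' be resolved from the libraries named on the command line? */
static bool link_resolves(const struct dso *const *cmdline, size_t n,
                          const char *sym, bool copy_dt_needed) {
    assert(sym != NULL);
    for (size_t i = 0; i < n; i++)
        if (dso_provides(cmdline[i], sym, copy_dt_needed)) return true;
    return false;
}

/* The question's setup: libb.so defines b() and was linked with -la. */
static const struct dso liba = {"liba.so", {"a", NULL}, {NULL}};
static const struct dso libb = {"libb.so", {"b", NULL}, {&liba, NULL}};
```

With only -lb on the command line, a() is found only when DT_NEEDED entries are followed; naming -la explicitly makes it resolvable in either mode, which is why adding -la to the final link is the robust fix.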

How can I tell, with something like objdump, if an object file has been built with -fPIC?

The answer depends on the platform. On most platforms, if output from
readelf --relocs foo.o | egrep '(GOT|PLT|JU?MP_SLOT)'
is empty, then either foo.o was not compiled with -fPIC, or foo.o doesn't contain any code where -fPIC matters.
I just had to do this on a PowerPC target to find which shared object (.so) was being built without -fPIC. What I did was run readelf -d libMyLib1.so and look for TEXTREL. If you see TEXTREL, one or more source files that make up your .so were not built with -fPIC. You can substitute readelf with elfdump if necessary.
E.g.,
[user#host lib]$ readelf -d libMyLib1.so | grep TEXT # Bad, not -fPIC
0x00000016 (TEXTREL)
[user#host lib]$ readelf -d libMyLib2.so | grep TEXT # Good, -fPIC
[user#host lib]$
And to help people searching for solutions, the error I was getting when I ran my executable was this:
root#target:/# ./program: error while loading shared libraries: /usr/lib/libMyLib1.so: R_PPC_REL24 relocation at 0x0fc5987c for symbol 'memcpy' out of range
I don't know whether this info applies to all architectures.
Source: blogs.oracle.com/rie
I assume what you really want to know is whether a shared library is composed of object files compiled with -fPIC.
As already mentioned, if there are TEXTRELs, then -fPIC was not used.
There is a great tool called scanelf which can show you the symbols that caused .text relocations.
More information can be found at HOWTO Locate and Fix .text Relocations (TEXTRELs).
-fPIC means that the code will be able to execute at addresses different from the address it was compiled for.
To achieve this, the disassembly will look something like this (pseudo-assembly):
call get_offset_from_compilation_address
get_offset_from_compilation_address: pop ax
sub ax, &get_offset_from_compilation_address
Now ax holds the offset (run-time address minus compile-time address) that we need to add to every memory access:
load bx, [ax + var_address]
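On MIPS, the ELF header flags show it directly:
readelf -a *.so | grep Flags
Flags: 0x50001007, noreorder, pic, cpic, o32, mips32
This should work most of the time on MIPS; on other architectures such as x86 the Flags field says nothing about PIC.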
Another way to tell whether your program was built with -fPIC works provided that it was compiled with -g3 -gdwarf-2 (other GCC debug formats may also contain the macro info).
Note that the following $'...' syntax assumes bash:
echo $'#include <stdio.h>\nint main(void) { printf("%d\\n", \n#ifdef __PIC__\n__PIC__\n#else\n0\n#endif\n); return 0; }' | gcc -fPIC -g3 -gdwarf-2 -o test -x c -
readelf --debug-dump=macro ./test | grep __PIC__
This works because the GCC manual states that -fpic defines __PIC__ as 1 and -fPIC defines it as 2.
The answers above that check the GOT are the better way, because I guess the prerequisite -g3 -gdwarf-2 is seldom used.
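The same __PIC__ macro can also be checked directly from C, without any debug info, for the translation unit being compiled. This is a small sketch; pic_level and pic_description are illustrative names, and it only reports how this one file was compiled, not how some other object or library was built. A toolchain that builds position-independent executables by default will typically define __PIC__ as well.

```c
#include <assert.h>

/* The __PIC__ level this translation unit was compiled with. Per the GCC
 * manual, -fpic defines __PIC__ as 1 and -fPIC defines it as 2. */
static int pic_level(void) {
#ifdef __PIC__
    return __PIC__;
#else
    return 0;                 /* not compiled as position-independent code */
#endif
}

/* Human-readable form of pic_level(). */
static const char *pic_description(void) {
    static const char *const names[] = {"not PIC", "level 1 (-fpic)", "level 2 (-fPIC)"};
    int level = pic_level();
    assert(level >= 0 && level <= 2);
    return names[level];
}
```

Compiling the same file once with -fpic and once with -fno-pic and printing pic_description() from a small main would show the difference.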
From The Linux Programming Interface:
On Linux/x86-32, it is possible to create a shared library using modules compiled without the -fPIC option. However, doing so loses some of the benefits of shared libraries, since pages of program text containing position-dependent memory references are not shared across processes. On some architectures, it is impossible to build shared libraries without the -fPIC option.
In order to determine whether an existing object file has been compiled with the -fPIC option, we can check for the presence of the name _GLOBAL_OFFSET_TABLE_ in the object file’s symbol table, using either of the following commands:
$ nm mod1.o | grep _GLOBAL_OFFSET_TABLE_
$ readelf -s mod1.o | grep _GLOBAL_OFFSET_TABLE_
Conversely, if either of the following equivalent commands yields any output, then the specified shared library includes at least one object module that was not compiled with -fPIC:
$ objdump --all-headers libfoo.so | grep TEXTREL
$ readelf -d libfoo.so | grep TEXTREL
However, neither the quote above nor any other answer to this question works for x86_64.
What I've observed on my x86_64 Ubuntu machine is that GCC generates a PIC .o whether or not -fPIC is specified. That is:
gcc -g -Wall -c -o my_so.o my_so.c # has _GLOBAL_OFFSET_TABLE_
gcc -g -Wall -fPIC -c -o my_so_fpic.o my_so.c # has _GLOBAL_OFFSET_TABLE_
readelf -s my_so.o > 1.txt && readelf -s my_so_fpic.o > 2.txt && diff 1.txt 2.txt
shows no difference, and both my_so.o and my_so_fpic.o can be used to create a shared library.
To generate a non-PIC object file, I found the GCC flag -fno-pic in the first comment of How to test whether a Linux binary was compiled as position independent code?.
This works:
gcc -g -Wall -fno-pic -c -o my_so_fnopic.o my_so.c # no _GLOBAL_OFFSET_TABLE_
and
gcc -g -Wall -shared -o libdemo.so my_so_fnopic.o
gives this error:
/usr/bin/ld: my_so_fnopic.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
collect2: error: ld returned 1 exit status
So a shared library cannot be created from a non-PIC .o.

Why does the order in which libraries are linked sometimes cause errors in GCC?

(See the history on this answer for the more elaborate text, but I now think it's easier for the reader to see real command lines.)
Common files shared by all the commands below:
// a depends on b, b depends on d
$ cat a.cpp
extern int a;
int main() {
return a;
}
$ cat b.cpp
extern int b;
int a = b;
$ cat d.cpp
int b;
Linking to static libraries
$ g++ -c b.cpp -o b.o
$ ar cr libb.a b.o
$ g++ -c d.cpp -o d.o
$ ar cr libd.a d.o
$ g++ -L. -ld -lb a.cpp # wrong order
$ g++ -L. -lb -ld a.cpp # wrong order
$ g++ a.cpp -L. -ld -lb # wrong order
$ g++ a.cpp -L. -lb -ld # right order
The linker searches from left to right and notes unresolved symbols as it goes. If a library resolves a symbol, the linker takes the object files from that library that resolve it (b.o out of libb.a in this case).
Dependencies of static libraries on each other work the same way: the library that needs symbols must come first, then the library that resolves them.
If a static library depends on another library, but the other library in turn depends on the former, there is a cycle. You can resolve this by enclosing the cyclically dependent libraries in -( and -), such as -( -la -lb -) (you may need to escape the parentheses, such as -\( and -\)). The linker then searches the enclosed libraries repeatedly until the cyclic dependencies are resolved. Alternatively, you can specify the libraries multiple times, so that each one appears before the other: -la -lb -la.
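The left-to-right search, the cycle problem, and both remedies can be reproduced with a toy C model. It assumes each archive member defines one symbol and references at most one, and that a single archive is rescanned until it yields nothing new; libx.a and liby.a are hypothetical archives forming a cycle, and link_program is not a real linker API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32

/* Toy model: an archive member defines one symbol, references at most one. */
struct member { const char *def; const char *ref; };
struct archive { const char *name; struct member m[4]; size_t n; };
struct state {
    const char *defs[MAX_SYMS]; size_t nd;
    const char *refs[MAX_SYMS]; size_t nr;
};

static bool has(const char *const *v, size_t n, const char *s) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(v[i], s) == 0) return true;
    return false;
}

static bool undefined(const struct state *st, const char *s) {
    return has(st->refs, st->nr, s) && !has(st->defs, st->nd, s);
}

/* One pass over an archive: extract every member that defines a symbol
 * which is undefined right now. Returns how many members were extracted. */
static size_t scan(struct state *st, const struct archive *a) {
    size_t pulled = 0;
    for (size_t i = 0; i < a->n; i++) {
        if (!undefined(st, a->m[i].def)) continue;
        assert(st->nd < MAX_SYMS && st->nr < MAX_SYMS);
        st->defs[st->nd++] = a->m[i].def;
        if (a->m[i].ref) st->refs[st->nr++] = a->m[i].ref;
        pulled++;
    }
    return pulled;
}

/* Links a program object that references 'main_ref', followed by 'libs' in
 * command-line order. Each archive is searched only at its own position;
 * with 'grouped' the whole list is rescanned until a pass extracts nothing,
 * like -Wl,--start-group ... -Wl,--end-group. Returns the number of
 * distinct symbols left undefined. */
static size_t link_program(const char *main_ref, const struct archive *const *libs,
                           size_t nlibs, bool grouped) {
    struct state st = {0};
    st.refs[st.nr++] = main_ref;
    size_t pulled;
    do {
        pulled = 0;
        for (size_t i = 0; i < nlibs; i++)
            while (scan(&st, libs[i]) > 0)   /* rescan until it yields nothing */
                pulled++;
    } while (grouped && pulled > 0);
    size_t missing = 0;
    for (size_t i = 0; i < st.nr; i++)
        if (undefined(&st, st.refs[i]) && !has(st.refs, i, st.refs[i])) missing++;
    return missing;
}

/* The answer's example: a.o needs a (in libb.a), b.o needs b (in libd.a). */
static const struct archive libb = {"libb.a", {{"a", "b"}}, 1};
static const struct archive libd = {"libd.a", {{"b", NULL}}, 1};
/* Hypothetical cycle: libx.a needs y from liby.a, which needs z from libx.a. */
static const struct archive libx = {"libx.a", {{"x", "y"}, {"z", NULL}}, 2};
static const struct archive liby = {"liby.a", {{"y", "z"}}, 1};
```

In this model -lb -ld links cleanly while -ld -lb leaves b undefined; for the cycle, either grouping or listing libx.a a second time resolves z.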
Linking to dynamic libraries
$ export LD_LIBRARY_PATH=. # not needed if libs go to /usr/lib etc
$ g++ -fpic -shared d.cpp -o libd.so
$ g++ -fpic -shared b.cpp -L. -ld -o libb.so # specifies its dependency!
$ g++ -L. -lb a.cpp # wrong order (works on some distributions)
$ g++ -Wl,--as-needed -L. -lb a.cpp # wrong order
$ g++ -Wl,--as-needed a.cpp -L. -lb # right order
It's the same here - the libraries must follow the object files of the program. The difference here compared with static libraries is that you need not care about the dependencies of the libraries against each other, because dynamic libraries sort out their dependencies themselves.
Some recent distributions apparently default to using the --as-needed linker flag, which enforces that the program's object files come before the dynamic libraries. If that flag is passed, the linker will not link to libraries that are not actually needed by the executable (and it detects this from left to right). My recent Arch Linux installation doesn't use this flag by default, so it didn't give an error for not following the correct order.
It is not correct to omit the dependency of libb.so on libd.so when creating the former. You would then be required to specify the library when linking a, but a doesn't really need the integer b itself, so it should not be made to care about b's own dependencies.
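A toy C model of the --as-needed behaviour described above: without the flag every shared library is kept, so a library listed before the object can still resolve its references; with the flag a library is kept only if, where it appears, it resolves a reference already seen. This is a simplification (one exported symbol per library, a hypothetical link_resolves helper), not ld's actual algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LIBS 8

/* Toy model: each shared library exports one symbol. */
struct shlib { const char *name; const char *exports; };

/* 'libs' are in command-line order; the program object, which references
 * 'ref', sits at position obj_pos among them (0 = before every library).
 * Returns true if the reference is resolved by a library that is kept. */
static bool link_resolves(const char *ref, const struct shlib *libs, size_t nlibs,
                          size_t obj_pos, bool as_needed) {
    bool kept[MAX_LIBS];
    assert(nlibs <= MAX_LIBS && obj_pos <= nlibs);
    for (size_t i = 0; i < nlibs; i++) {
        bool ref_seen = i >= obj_pos;          /* object already processed? */
        bool useful = ref_seen && strcmp(libs[i].exports, ref) == 0;
        kept[i] = as_needed ? useful : true;   /* --as-needed drops useless libs */
    }
    for (size_t i = 0; i < nlibs; i++)
        if (kept[i] && strcmp(libs[i].exports, ref) == 0) return true;
    return false;
}

/* libb.so from the answer exports the integer 'a' that a.cpp needs. */
static const struct shlib libb = {"libb.so", "a"};
```

This mirrors the three command lines above: -lb a.cpp resolves without --as-needed, fails with it, and a.cpp -lb works with it.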
Here is an example of the implications if you miss specifying the dependencies for libb.so
$ export LD_LIBRARY_PATH=. # not needed if libs go to /usr/lib etc
$ g++ -fpic -shared d.cpp -o libd.so
$ g++ -fpic -shared b.cpp -o libb.so # wrong (but links)
$ g++ -L. -lb a.cpp # wrong, as above
$ g++ -Wl,--as-needed -L. -lb a.cpp # wrong, as above
$ g++ a.cpp -L. -lb # wrong, missing libd.so
$ g++ a.cpp -L. -ld -lb # wrong order (works on some distributions)
$ g++ -Wl,--as-needed a.cpp -L. -ld -lb # wrong order (like static libs)
$ g++ -Wl,--as-needed a.cpp -L. -lb -ld # "right"
If you now look at the dependencies the binary has, you will notice that the binary itself also depends on libd, not just on libb as it should. If you do it this way, the binary will need to be relinked if libb later depends on another library. And if someone else loads libb using dlopen at runtime (think of loading plugins dynamically), the call will fail as well. So the "right" one should really be marked wrong as well.
The GNU ld linker is a so-called smart linker. It will keep track of the functions used by preceding static libraries, permanently tossing out those functions that are not used from its lookup tables. The result is that if you link a static library too early, then the functions in that library are no longer available to static libraries later on the link line.
The typical UNIX linker works from left to right, so put all your dependent libraries on the left, and the ones that satisfy those dependencies on the right of the link line. You may find that some libraries depend on others while at the same time other libraries depend on them. This is where it gets complicated. When it comes to circular references, fix your code!
Here's an example to make it clear how things work with GCC when static libraries are involved. So let's assume we have the following scenario:
myprog.o - containing main() function, dependent on libmysqlclient
libmysqlclient - static, for the sake of the example (you'd prefer the shared library, of course, as the libmysqlclient is huge); in /usr/local/lib; and dependent on stuff from libz
libz (dynamic)
How do we link this? (Note: examples from compiling on Cygwin using gcc 4.3.4)
gcc -L/usr/local/lib -lmysqlclient myprog.o
# undefined reference to `_mysql_init'
# myprog depends on libmysqlclient
# so myprog has to come earlier on the command line
gcc myprog.o -L/usr/local/lib -lmysqlclient
# undefined reference to `_uncompress'
# we have to link with libz, too
gcc myprog.o -lz -L/usr/local/lib -lmysqlclient
# undefined reference to `_uncompress'
# libz is needed by libmysqlclient
# so it has to appear *after* it on the command line
gcc myprog.o -L/usr/local/lib -lmysqlclient -lz
# this works
If you add -Wl,--start-group to the linker flags, the linker does not care which order the libraries are in or whether there are circular dependencies.
On Qt this means adding:
QMAKE_LFLAGS += -Wl,--start-group
Saves loads of time messing about and it doesn't seem to slow down linking much (which takes far less time than compilation anyway).
Another alternative would be to specify the list of libraries twice:
gcc prog.o libA.a libB.a libA.a libB.a -o prog.x
Doing this, you don't have to bother with the right sequence since the reference will be resolved in the second block.
A quick tip that tripped me up: if you're invoking the linker as "gcc" or "g++", then using "--start-group" and "--end-group" won't pass those options through to the linker -- nor will it flag an error. It will just fail the link with undefined symbols if you had the library order wrong.
You need to write them as "-Wl,--start-group" etc. to tell GCC to pass the argument through to the linker.
You can also use the -Xlinker option:
g++ -o foobar -Xlinker -start-group -Xlinker libA.a -Xlinker libB.a -Xlinker libC.a -Xlinker -end-group
is ALMOST equal to
g++ -o foobar -Xlinker -start-group -Xlinker libC.a -Xlinker libB.a -Xlinker libA.a -Xlinker -end-group
Careful!
The order within a group is important!
Here's an example: a debug library has a debug routine, but the non-debug
library has a weak version of the same. You must put the debug library
FIRST in the group or you will resolve to the non-debug version.
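For readers unfamiliar with weak symbols, here is a minimal sketch of what the non-debug library's weak version might look like. debug_routine and using_debug_build are hypothetical names, and __attribute__((weak)) is a GCC/Clang extension. A strong debug_routine only replaces this one if its object is actually linked in; if the weak definition already satisfied the reference, the linker has no reason to pull the debug member, which is why the debug library must be searched first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical non-debug library code: a weak default definition. A strong
 * definition of debug_routine in another linked object overrides it. */
__attribute__((weak)) const char *debug_routine(void) {
    return "non-debug";
}

/* True if a strong (debug) definition replaced the weak default. */
static int using_debug_build(void) {
    const char *who = debug_routine();
    assert(who != NULL);
    return strcmp(who, "non-debug") != 0;
}
```

Linked on its own, as here, the program resolves to the weak default.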
You need to precede each library in the group list with -Xlinker
Link order certainly does matter, at least on some platforms. I have seen crashes for applications linked with libraries in wrong order (where wrong means A linked before B but B depends on A).
I have seen this a lot; some of our modules link in excess of 100 libraries of our own code plus system and third-party libraries.
Depending on the linker (HP, Intel, GCC, Sun, SGI, IBM, etc.), you can get unresolved functions or variables; on some platforms you have to list libraries twice.
For the most part we use a structured hierarchy of libraries (core, platform, different layers of abstraction), but on some systems you still have to play with the order in the link command.
Once you hit upon a solution, document it so the next developer does not have to work it out again.
My old lecturer used to say, "high cohesion & low coupling", it’s still true today.

Resources