Only one --hash-style in embedded Linux. Why?

Only one --hash-style in embedded Linux. Why? - embedded-linux

I am trying to get a software package built and deployed into rootfs with OpenEmbedded-based Arago. Unfortunately the software package includes prebuilt shared libs. As far as I understand, Arago builds the entire Linux distro with --hash-style=gnu, while those shared libs have been built with --hash-style=sysv, I suspect. At least the build stops with "No GNU_HASH in the ELF binary" QA issue.
I understand what hashes are for. But I guess I do not understand how they are being used when the system is running.
Why is it necessary to have one hash style for all ELFs in the system? Why can't the dynamic linker determine the hash style on the fly and just use it?

The dynamic linker can and does figure out the kind of hash table ("sysv" or "gnu") present in the ELF and works accordingly.
Unfortunately what you see is a case where support for gnu hash sections has NOT been back-ported to older version of the dynamic linker in use on your system.
A similar situation exists wherein binaries built for RHEL5/FC6 do NOT work on RHEL4/FC5.
Why .gnu.hash is incompatible with .hash(sysv)?
Generating an ELF with the gnu hash section imposes certain restrictions(additional rules) on the construction of the dynamic symbol table.
With GNU hash, the dynamic symbol table is divided into two parts. The first part receives the symbols that can be omitted from the hash table. GNU hash does not impose any specific order for the symbols in this part of the dynamic symbol table.
The second part of the dynamic symbol table receives the symbols that are accessible from the hash table. These symbols are required to be sorted by increasing (hash % nbuckets) value, using the GNU hash function described above. The number of hash buckets (nbuckets) is recorded in the GNU hash section, described below. As a result, symbols which will be found in a single hash chain are adjacent in memory, leading to better cache performance.
Reference: blogs.oracle.com/ali/entry/gnu_hash_elf_sections

I had the same No GNU_HASH in the ELF binary issue with my Yocto Arago build, but it turned out that my application's Makefile wasn't using $(LDFLAGS) which is set by the Yocto build and contains -Wl,--hash-style=gnu among other important things.
I'm mentioning this because this question is the top search result for that error message, and this might help other people.

Related

Indirect symbols in mach-o files

Symbols in mach-o files can be marked as 'indirect' (I in output from nm, defined in the headers using the constant N_INDR). It seems this type of symbol is rarely used in practice (or perhaps has a very specific, limited use). The only example I've come across is in the macOS system library /usr/lib/system/liblaunch.dylib. Here's a partial output from running nm on that file:
U __spawn_via_launchd
0000000000000018 I __spawn_via_launchd
U __vproc_get_last_exit_status
0000000000000049 I __vproc_get_last_exit_status
U __vproc_grab_subset
000000000000007a I __vproc_grab_subset
As this output shows, the same symbol is listed twice, once as U and again as I. This is true of all the I symbols in the file.
Examining the symbol table entries in more detail show that while the U entries reference another linked library (libxpc.dylib), the I entries reference nothing.
Searching for "mach-o indirect symbols" doesn't seem to produce many useful results. Most, including Apple's documentation, cover the stubs used to implement lazy weak symbols. I don't think this is related to the I indirect symbols, mainly because lazy weak linked symbols in other files are not marked as I, but I'll be happy to be told different.
The source files from the Darwin archives are also lacking useful comments.
So I have 3 questions:
What is the canonical explanation for what indirect symbols are?
How does the linker at compile time and the link loader at load time interpret and process these symbols?
How do you define a symbol in a source file as indirect? (I'm mostly interested in a C / Clang answer, but I'm sure future readers will welcome answers for other languages too.)
(For what it's worth, my current best guess is that this is a way to allow symbols to be removed from a library while maintaining compatibility, as long as those symbols are available elsewhere. A way of keeping the linker happy at compile time and pointing the link loader to the moved implementation at load time. But that's just a guess.)
UPDATE
After some more research I can answer question 3. This is, of course, controlled by the linker, and the option I was looking for was -reexport_symbol. This will create the I entry in the symbol table (and if the symbol isn't already present it will also create the U entry). There is also the -reexport_symbols_list option which takes a file of symbol names and creates indirect symbols for each.

Trying deterministic gcc compilation, symbol table problems

I work for embedded systems and I am trying to make a build that yields exactly the same executable each time. Using -frandom-seed certainly helped to stabilize names that were otherwise variable, but still I have a couple of symbols that I have problems with. For example:
0x00003bfc _ZN13WorkingMemory17ReadTransactionalERN3HSL4FileERN58_GLOBAL__N_......_.._working_memory.cc_AE42A16A_FF4623503AllE
The ".._.." etc. part was evidently worked out of what I passed as -frandom-seed, id est, the source filename. Of the couple of hex number that follows sometimes, the second one sometimes is different, and I guess it is probably linked to the compilation date, but I am not sure.
I am working on ARM, using gcc 3.4.0, using FLAT executables. I tried to remove symbols using strip on the ELF file, but that prevents FLAT conversion.
Any ideas?

how can I verify that dead code was stripped from the binary?

My c/obj-c code (an iOS app built with clang) has some functions excluded by #ifdefs. I want to make sure that code that gets called from those functions, but not from others (dead code) gets stripped out (eliminated) at link time.
I tried:
Adding a local literal char[] in a function that should be eliminated; the string is still visible when running strings on the executable.
Adding a function that should be eliminated; the function name is still visible when running strings.
Before you ask, I'm building for release, and all strip settings (including dead-code stripping, obviously) are enabled.
The question is not really xcode/apple/iOS specific; I assume the answer should be pretty much the same on any POSIX development platform.

(EDIT)
In binutils, ld has the --gc-sections option which does what you want for sections on object level. You have several options:
use gcc's flags -ffunction-sections and -fdata-sections to isolate each symbol into its own section, then use --gc-sections;
put all candidates for removal into a separate file and the linker will be able to strip the whole section;
disassemble the resulting binary, remove dead code, assemble again;
use strip with appropriate -N options to discard the offending symbols from the
symbol table - this will leave the code and data there, but it won't show up in the symbol table.

Static library "interface"

Is there any way to tell the compiler (gcc/mingw32) when building an object file (lib*.o) to only expose certain functions from the .c file?
The reason I want to do this is that I am statically linking to a 100,000+ line library (SQLite), but am only using a select few of the functions it offers. I am hoping that if I can tell the compiler to only expose those functions, it will optimize out all the code of the functions that are never needed for those few I selected, thus dratically decreasing the size of the library.

I found several possible solutions:
This is what I asked about. It is the gcc equivalent of Windows' dllexpoort:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Code-Gen-Options.html (-fvisibility)
http://gcc.gnu.org/wiki/Visibility
I also discovered link-time code-generation. This allows the linker to see what parts of the code are actually used and get rid of the rest. Using this together with strip and -fwhole-program has given me drastically better results.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html (see -flto and -fwhole-program)
Note: This flag only makes sense if you are not compiling the whole program in one call to gcc, which is what I was doing (making a sqlite.o file and then statically linking it in).
The third option which I found but have not yet looked into is mentioned here:
How to remove unused C/C++ symbols with GCC and ld?

That's probably the linkers job, not the compilers. When linking that as a program (.exe), the linker will take care of only importing the relevant symbols, and when linking a DLL, the __dllexport mechanism is probably what you are looking for, or some flags of ld can help you (man ld).

Is there a way to strip all functions from an object file that I am not using?

I am trying to save space in my executable and I noticed that several functions are being added into my object files, even though I never call them (the code is from a library).
Is there a way to tell gcc to remove these functions automatically or do I need to remove them manually?

If you are compiling into object files (not executables), then a compiler will never remove any non-static functions, since it's always possible you will link the object file against another object file that will call that function. So your first step should be declaring as many functions as possible static.
Secondly, the only way for a compiler to remove any unused functions would be to statically link your executable. In that case, there is at least the possibility that a program might come along and figure out what functions are used and which ones are not used.
The catch is, I don't believe that gcc actually does this type of cross-module optimization. Your best bet is the -Os flag to optimize for code size, but even then, if you have an object file abc.o which has some unused non-static functions and you link statically against some executable def.exe, I don't believe that gcc will go and strip out the code for the unused functions.
If you truly desperately need this to be done, I think you might have to actually #include the files together so that after the preprocessor pass, it results in a single .c file being compiled. With gcc compiling a single monstrous jumbo source file, you stand the best chance of unused functions being eliminated.

Have you looked into calling gcc with -Os (optimize for size.) I'm not sure if it strips unreached code, but it would be simple enough to test. You could also, after getting your executable back, 'strip' it. I'm sure there's a gcc command-line arg to do the same thing - is it --dead_strip?

In addition to -Os to optimize for size, this link may be of help.

Since I asked this question, GCC 4.5 was released which includes an option to combine all files so it looks like it is just 1 gigantic source file. Using that option, it is possible to easily strip out the unused functions.
More details here

IIRC the linker by default does what you want ins some specific cases. The short of it is that library files contain a bunch of object files and only referenced files are linked in. If you can figure out how to get GCC to emit each function into it's own object file and then build this into a library you should get what you are looking.
I only know of one compiler that can actually do this: here (look at the -lib flag)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio