Indirect symbols in mach-o files

Indirect symbols in mach-o files - macos

Symbols in mach-o files can be marked as 'indirect' (I in output from nm, defined in the headers using the constant N_INDR). It seems this type of symbol is rarely used in practice (or perhaps has a very specific, limited use). The only example I've come across is in the macOS system library /usr/lib/system/liblaunch.dylib. Here's a partial output from running nm on that file:
U __spawn_via_launchd
0000000000000018 I __spawn_via_launchd
U __vproc_get_last_exit_status
0000000000000049 I __vproc_get_last_exit_status
U __vproc_grab_subset
000000000000007a I __vproc_grab_subset
As this output shows, the same symbol is listed twice, once as U and again as I. This is true of all the I symbols in the file.
Examining the symbol table entries in more detail show that while the U entries reference another linked library (libxpc.dylib), the I entries reference nothing.
Searching for "mach-o indirect symbols" doesn't seem to produce many useful results. Most, including Apple's documentation, cover the stubs used to implement lazy weak symbols. I don't think this is related to the I indirect symbols, mainly because lazy weak linked symbols in other files are not marked as I, but I'll be happy to be told different.
The source files from the Darwin archives are also lacking useful comments.
So I have 3 questions:
What is the canonical explanation for what indirect symbols are?
How does the linker at compile time and the link loader at load time interpret and process these symbols?
How do you define a symbol in a source file as indirect? (I'm mostly interested in a C / Clang answer, but I'm sure future readers will welcome answers for other languages too.)
(For what it's worth, my current best guess is that this is a way to allow symbols to be removed from a library while maintaining compatibility, as long as those symbols are available elsewhere. A way of keeping the linker happy at compile time and pointing the link loader to the moved implementation at load time. But that's just a guess.)
UPDATE
After some more research I can answer question 3. This is, of course, controlled by the linker, and the option I was looking for was -reexport_symbol. This will create the I entry in the symbol table (and if the symbol isn't already present it will also create the U entry). There is also the -reexport_symbols_list option which takes a file of symbol names and creates indirect symbols for each.

Related

Can the gnat compiler find unused specification procedures/functions/variables?

Is there a warning option switch that will identify spec-level procedures, functions, or variables that are not called or referenced anywhere? I've tried the switches below without luck.
This is what I'm currently using:
-gnatwfilmopuvz
-- m turn on warnings for variable assigned but not read
-- u turn on warnings for unused entity
-- v turn on warnings for unassigned variable
When I move unused variables from the spec to the body, the compiler correctly identifies them as not referenced. I would like to understand why the compiler won't identify unused code in the spec, and if there is a way to get it to do so. An excessive number of warnings isn't a concern, because I use the filter field in gnat studio to only look at a few files at a time, and I can easily filter to ignore library packages.
Any help is very appreciated.

The compiler will only detect unused items in the unit it is compiling.
If you have items in a package spec, you can know they are used (or not) only by exploring the whole project's Ada sources. Some tools like AdaControl can do it.

You need a tool for that: gnatelim. Its main use is to reduce the size of the executable, eliminating the object code for unused subprograms, but you can use its output just to get the list of unused subprograms. As far as I know, it will not detect unused variables in the spec, only procedures and functions.
https://gcc.gnu.org/onlinedocs/gcc-4.5.4/gnat_ugn_unw/About-gnatelim.html

Use link-time garbage collection: https://docs.adacore.com/live/wave/gnat_ugn/html/gnat_ugn/gnat_ugn/gnat_and_program_execution.html#reducing-size-of-executables-with-unused-subprogram-data-elimination
You can then add the linker option --print-gc-sections to instruct the linker to print out a list of all symbols that were garbage collected.

Using Linker Symbol from C++ code as a fixed constant (NOT relocated) in a shared library (DLL)

Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build a x86 32-bit DLL on Windows (so far). I'll spare you the details why I need hacky offsets amongst its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of say __imp__MessageBoxW#16 relative to __IAT_start__, in case you don't know what they are, __imp__MessageBoxW#16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW#16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a Linker Symbol from within inline asm/C++ without having it relocated whatsoever? No entry to the .reloc section or anything, basically same as a constant like $1234. So if a DLL gets loaded into another base address, that constant would be the same everytime.
UPDATE: I forgot about this question but decided to bring an update, since it seems it's likely not possible as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and it doesn't seem there's a way against this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? it is replacing their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.

how does ld deal with code that is supplied twice (in a source file and in a library)?

Suppose we call
gcc -Dmyflag -lmylib mycode.c
where mylib contains all of mycode but is compiled without -Dmyflag. So all functions and other entities implemented in mycode are available in two versions to the loader. Empirically, I find that the version from mycode is taken. Can I rely on that? Will mycode always overwrite mylib?

Empirically, I find that the version from mycode is taken.
Read this explanation of how linker works with archive libraries, and possibly this one.
Can I rely on that?
You should rely on understanding how this works.
If you understood material in referenced links, you'll observe that adding main to libmylib.a will invert the answer (and if mycode.c also contains main, you'll get duplicate symbol definition error).
If you are using a dynamic library libmylib.so, the rules are different, and the library will always lose to the main binary, although there are many complications, such as LD_PRELOAD, linking the library with -Bsymbolic, and others.
In short, you should prefer to not do this at all.

Rust library for inspecting .rlib binaries

I'm looking for a way to load and inspect .rlib binaries generated by rustc. I've hunted around the standard library without much luck. My assumption is that an .rlib contains all the type information necessary to statically type check programs that "extern crate" it. rustc::metadata is where my hunt ended. I can't quite figure out if the structures available at this point in the compiler are intended as entry points for users, or if they are solely intermediate abstractions depending on a chain of previously initialized data.
Alternatively, If there's a way to dump an .rlib to stdout in a parsable form then that's also fantastic. I tried /usr/bin/nm, but it seemed to be excluding function type signatures. Maybe I'm missing something.
Anyways, I'm working on an editor utility for emacs that I hope at some point will provide contextually relevant information such as available methods, module items and their types, etc. I'd really appreciate any hints anyone has.

The .rlib file is an ar archive file. You can use readelf to read its content.
Try readelf -s <your_lib>.rlib. The type name may be mingled/decorated by the compiler so it may not be exactly the same as in .rs file.

Size of a library and the executable

I have a static library *.lib created using MSVC on windows. The size of library is say 70KB. Then I have an application which links this library. But now the size of the final executable (*.exe) is 29KB, less than the library. What i want to know is :
Since the library is statically linked, I was thinking it should add directly to the executable size and the final exe size should be more than that? Does windows exe format also do some compression of the binary data?
How is it for linux systems, that is how do sizes of library on linux (*.a/*.la file) relate with size of linux executable (*.out) ?
-AD

A static library on both Windows and Unix is a collection of .obj/.o files. The linker looks at each of these object files and determines if it is needed for the program to link. If it isn't needed, then the object file won't get included in the final executable. This can lead to executables that are smaller then the library.
EDIT: As MSalters points out, on Windows the VC++ compiler now supports generating object files that enable function-level linking, e.g., see here. In fact, edit-and-continue requires this, since the edit-and-continue needs to be able to replace the smallest possible part of the executable.

There is additional bookkeeping information in the .lib file that is not needed for the final executable. This information helps the linker find the code to actually link. Also, debug information may be stored in the .lib file but not in the .exe file (I don't recall where debug info is stored for objs in a lib file, it might be somewhere else).

The static library probably contains several functions which are never used. When the linker links the library with the main executable, it sees that certain functions are never used (and that their addresses are never taken and stored in function pointers), it just throws away the code. It can also do this recursively: if function A() is never called, and A() calls B(), but B() is never otherwise called, it can remove the code for both A() and B(). On Linux, the same thing happens.

A static library has to contain every symbol defined in its source code, because it might get linked into an executable which needs just that specific symbol. But once it is linked into an executable, we know exactly which symbols end up being used, and which ones don't. So the linker can trivially remove unused code, trimming the file size by a lot. Similarly, any duplicate symbols (anything that's defined in both the static library and the executable it's linked into gets merged into a single instance.

Disclaimer: It's been a long time since I dealt with static linking, so take my answer with a grain of salt.
You wrote: I was thinking it should add directly to the executable size and final exe size should be more than that?
Naive linkers work exactly this way - back when I was doing hobby development for CP/M systems (a LONG time ago), this was a real problem.
Modern linkers are smarter, however - they only link in the functions referenced by the original code, or as required.

Additionally to the current answers, the linker is allowed to remove function definitions if they have identical object code - this is intended to help reduce the bloating effects of templated code.

#All: Thanks for the pointers.
#Greg Hewgill - Your answer was a good pointer. Thanks.
The answer i found out was as follows:
1.)During Library building what happens is if the option "Keep Program debug databse" in MSVC (or something alike ) is ON, then library will have this debug info bloating its size.
but when i statically include that library and create a executable, the linker strips all that debug info from the library before geenrating the exe and hence the exe size is less than that of the library.
2.) When i disabled the option "Keep Program debug databse", i got an library whose size was smaller than the final executable, which was what i thought is nromal in most situations.
-AD

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio