What does *COM* stand for in objdump symbol table? - compilation

There are a few section names in objdump output that have some unique names, like
*ABS*
*COM*
*UND*
I guess *ABS* stands for ABSolute and denotes a symbol that doesn't belong to any section. External variables go to *COM* section. It seems that external functions go down to UNDefined. My questions are — what does COM stand for? What does it contain besides references to external variables? What are other sections like those two?

This indeed refers to "Common". From the ELF spec:
SHN_COMMON Symbols defined relative to this section are common symbols,
such as FORTRAN COMMON or unallocated C external variables.

Related

Indirect symbols in mach-o files

Symbols in mach-o files can be marked as 'indirect' (I in output from nm, defined in the headers using the constant N_INDR). It seems this type of symbol is rarely used in practice (or perhaps has a very specific, limited use). The only example I've come across is in the macOS system library /usr/lib/system/liblaunch.dylib. Here's a partial output from running nm on that file:
U __spawn_via_launchd
0000000000000018 I __spawn_via_launchd
U __vproc_get_last_exit_status
0000000000000049 I __vproc_get_last_exit_status
U __vproc_grab_subset
000000000000007a I __vproc_grab_subset
As this output shows, the same symbol is listed twice, once as U and again as I. This is true of all the I symbols in the file.
Examining the symbol table entries in more detail show that while the U entries reference another linked library (libxpc.dylib), the I entries reference nothing.
Searching for "mach-o indirect symbols" doesn't seem to produce many useful results. Most, including Apple's documentation, cover the stubs used to implement lazy weak symbols. I don't think this is related to the I indirect symbols, mainly because lazy weak linked symbols in other files are not marked as I, but I'll be happy to be told different.
The source files from the Darwin archives are also lacking useful comments.
So I have 3 questions:
What is the canonical explanation for what indirect symbols are?
How does the linker at compile time and the link loader at load time interpret and process these symbols?
How do you define a symbol in a source file as indirect? (I'm mostly interested in a C / Clang answer, but I'm sure future readers will welcome answers for other languages too.)
(For what it's worth, my current best guess is that this is a way to allow symbols to be removed from a library while maintaining compatibility, as long as those symbols are available elsewhere. A way of keeping the linker happy at compile time and pointing the link loader to the moved implementation at load time. But that's just a guess.)
UPDATE
After some more research I can answer question 3. This is, of course, controlled by the linker, and the option I was looking for was -reexport_symbol. This will create the I entry in the symbol table (and if the symbol isn't already present it will also create the U entry). There is also the -reexport_symbols_list option which takes a file of symbol names and creates indirect symbols for each.

Extract Structure definitions from executable

I need to extract structure definitions from an executable. How can I do that?
I read we can do it using ELF, but not sure how to do this. Any help here?
I read we can do it using ELF, but not sure how to do this.
What you probably read is that if a binary contains debug info, then the types of variables, structures, and great many other kinds of info can be extracted from that binary.
This isn't specific to ELF: many other executable formats (such as COFF) allow for embedding of debugging info as well.
Further, the format of that debugging info is different between different platforms. Some of the common UNIX ones are DWARF and STABS (with DWARF being more recent and much more powerful).
If you have an ELF binary, and you suspect that it may contain DWARF debug info, you can decode it using readelf -wi a.out (be prepared for there to be a lot of info, if any is present at all). objdump -g can be used to decode STABS (recent objdump versions can decode DWARF as well).
Or, as suggested by tristan, you can load the executable into GDB and use info types and ptype commands.
If the binary doesn't contain debug info, then DrPrItay's answer is correct: you can't easily recover structure definitions from it. However, you still can recover them by using reverse-engineering techniques. For example, many struct definitions used by the Wine project (example) were obtained by such techniques.
As much as I know, you can't. c / c++ programs are not like java, structs dont gain a symbol. Their just definitions for your compiler about how to align and pack variables within stack frames or some other memory (struct data members). For example unlike java you dont have what resembles class loading when loading shared objects's (no header file included within your c program ) you can only load global variables and functions. Defining a struct is much as creating some data type, it's definition should be only present for compilation, you dont get a symbol within the symtable for int or char then why should you for some struct? It simply makes no sense. Symbols aee soley meant for objects that your compiler doesn't recognize during compilation - link time/load time/run time

Absolute value of symbol

I was trying to understand the System.map file that gets created every time one compiles the Linux kernel, I was trying to understand the values presented in the System.map file.
Following is a sample information from it
000001d5 A kexec_control_code_size
00400000 A phys_startup_32
c0400000 T _text
c0400000 T startup_32
c04000b4 T start_cpu0
c04000c4 T startup_32_smp
c04000e0 t default_entry
c0400158 t enable_paging
c04001da t is486`
If you see the first line, the type of the symbol kexec_control_code_size is shown as A, I know that A means value of the symbol is absolute, but I wasn't able to completely decode what that exactly means. Does value mean the address of the symbol? Does absolute address mean that this symbol will be present at this address everytime the kernel gets loaded in to the memory?
Please forgive, if the questions are too basic.
You can examine symbol type via "man nm". nm tool shows all symbols in object file. Details about type of symbols you can find under man nm. Linux kernel modules .ko file and kernel object file can be examine with nm tool. Also you can investigate symbols from zImage or uImage or any kernel image and from kernel modules using objdump and readelf. Try use man pages for detail descriptions. Address of symbol can be calculated like offset from some main point for example section start. Other approach of symbols address calculation is absolute value of address (probably absolute value related to address space?). External symbols should be absolute. Symbols marked like absolute retain the same address through any link operation.
When the linker evaluates an expression, the result is either absolute or relative to some section. A relative expression is expressed as a fixed offset from the base of a section.
The position of the expression within the linker script determines whether it is absolute or relative. An expression which appears within an output section definition is relative to the base of the output section. An expression which appears elsewhere will be absolute.
A symbol set to a relative expression will be relocatable if you request relocatable output using the -r option. That means that a further link operation may change the value of the symbol. The symbol's section will be the section of the relative expression.
A symbol set to an absolute expression will retain the same value through any further link operation. The symbol will be absolute, and will not have any particular associated section. Taken from this manual
An example is here. Look for line "The following example shows how two absolute symbol definitions can be defined. "

Using DLLs with NASM

I have been doing some x86 programming in Windows with NASM and I have run into some confusion. I am confused as to why I must do this:
extern _ExitProcess#4
Specifically I am confused about the '_' and the '#4'. I know that the '#4' is the size of the stack but why is it needed? When I looked in the kernel32.dll with a hex editor I only saw 'ExitProcess' not '_ExitProcess#4'.
I am also confused as to why C Functions do not need the underscore and the stack size such as this:
extern printf
Why don't C Functions need decorations?
My third question is "Is this the way I should be using these functions?" Right now I am linking with the actual dll files themselves.
I know that the '#4' is the size of the stack but why is it needed?
To enable the linker to report a fatal error if your compiler assumed the wrong calling convention for the function (this can happen if you forget to include header files in C and ignore all the compiler warnings or if a declaration doesn't exactly match the function in the shared library).
Why don't C Functions need decorations?
Functions that use the cdecl calling convention are decorated with a single leading (so it would actually be _printf).
The reason why no parameter size is encoded into the decorated name is that the caller is responsible for both setting up and tearing down the stack, so an argument count mismatch will not be fatal for the stack setup (though the calling function might still crash if it isn't given the right arguments, of course). It might even be possible that the argument count is variable, like in the case of printf.
When I looked in the kernel32.dll with a hex editor I only saw ExitProcess not _ExitProcess#4.
The mangled names are usually mapped to the actual exported names of the DLL using definition files (*.def), which then get compiled to *.lib import library files that can be used in your linker invocation. An example of such a definition file for kernel32.dll is this one. The following line defines the mapping for ExitProcess:
_ExitProcess#4 = ExitProcess
Is this the way I should be using these functions?
I don't know NASM very well, but the code I've seen so far usually specifies the decorated name, like in your example.
You can find more information on this excellent page about Win32 calling conventions.

define a program section in C code (GCC)

In assembly language, it's easy to define a section like:
.section foo
How can this be done in C code? I want to put a piece of C code in a special section rather than .text, so I will be able to put that section in a special location in the linker script.
I'm using GCC.
The C standard doesn't say anything about "sections" in the sense that you mean, so you'll need to use extensions specific to your compiler.
With GCC, you will want to use the section attribute:
extern void foobar(void) __attribute__((section("bar")));
There is some limited documentation here, including a warning:
Some file formats do not support
arbitrary sections so the section
attribute is not available on all
platforms. If you need to map the
entire contents of a module to a
particular section, consider using the
facilities of the linker instead.

Resources