Absolute value of symbol - debugging

I was trying to understand the values in the System.map file that gets created every time one compiles the Linux kernel.
Here is a sample from it:
000001d5 A kexec_control_code_size
00400000 A phys_startup_32
c0400000 T _text
c0400000 T startup_32
c04000b4 T start_cpu0
c04000c4 T startup_32_smp
c04000e0 t default_entry
c0400158 t enable_paging
c04001da t is486
If you look at the first line, the type of the symbol kexec_control_code_size is shown as A. I know that A means the value of the symbol is absolute, but I wasn't able to work out what exactly that means. Does value mean the address of the symbol? Does absolute address mean that this symbol will be present at this address every time the kernel gets loaded into memory?
Please forgive me if the questions are too basic.

You can look up the symbol types with "man nm". The nm tool lists all symbols in an object file, and its man page describes each type letter. Kernel module .ko files and kernel object files can be examined with nm. You can also inspect symbols in a zImage, uImage, or any other kernel image, and in kernel modules, using objdump and readelf; see their man pages for detailed descriptions. The address of a symbol can be expressed as an offset from some base point, for example the start of a section. The other approach is an absolute value, which is not relative to any section. External symbols should be absolute. Symbols marked as absolute retain the same value through any link operation.

When the linker evaluates an expression, the result is either absolute or relative to some section. A relative expression is expressed as a fixed offset from the base of a section.
The position of the expression within the linker script determines whether it is absolute or relative. An expression which appears within an output section definition is relative to the base of the output section. An expression which appears elsewhere will be absolute.
A symbol set to a relative expression will be relocatable if you request relocatable output using the -r option. That means that a further link operation may change the value of the symbol. The symbol's section will be the section of the relative expression.
A symbol set to an absolute expression will retain the same value through any further link operation. The symbol will be absolute, and will not have any particular associated section.
(Taken from the GNU ld manual.)
An example is here. Look for the line "The following example shows how two absolute symbol definitions can be defined."
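To make the "A" entries concrete, here is a minimal Python sketch that parses the sample lines from the question and lists the absolute symbols. The function names parse_system_map and absolute_symbols are made up for illustration. Note that the value of an absolute symbol need not be an address at all: judging by its name, kexec_control_code_size looks like a size constant rather than a location.

```python
# Sample lines copied from the System.map excerpt in the question.
SAMPLE = """\
000001d5 A kexec_control_code_size
00400000 A phys_startup_32
c0400000 T _text
c0400000 T startup_32
c04000b4 T start_cpu0
c04000c4 T startup_32_smp
c04000e0 t default_entry
c0400158 t enable_paging
c04001da t is486
"""


def parse_system_map(text):
    """Return a list of (value, type_letter, name) tuples."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        value, kind, name = parts
        symbols.append((int(value, 16), kind, name))
    return symbols


def absolute_symbols(symbols):
    """Pick the type 'A' symbols: fixed values not tied to any section."""
    return [(name, value) for value, kind, name in symbols if kind == "A"]


symbols = parse_system_map(SAMPLE)
for name, value in absolute_symbols(symbols):
    print(f"{name} = {value:#x} ({value})")
```

Read as a plain number, 0x1d5 is 469, which is plausible as a byte count but not as a kernel virtual address next to the c04xxxxx entries.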

Related

Using Linker Symbol from C++ code as a fixed constant (NOT relocated) in a shared library (DLL)

Sorry if the title is not very clear. I am using MinGW with GCC 6.3.0 to build an x86 32-bit DLL on Windows (so far). I'll spare you the details of why I need hacky offsets among its sections accessible from code, so please do not ask if it's useful or not (because I don't want to bother explaining that).
So, if I can get the following testcase to work, I'm good. Here's my problem:
In a C++ file, I want to access a linker symbol as an absolute numeric value, not relocated, directly. Remember that I am building a 32-bit DLL which requires a .reloc section for relocations, but in this case I do NOT want relocation, in fact a relocation would screw it up completely.
Here's an example: retrieve the offset of, say, __imp__MessageBoxW@16 relative to __IAT_start__. In case you don't know what they are: __imp__MessageBoxW@16 is the relocated pointer to the actual function at runtime, and __IAT_start__ is a linker symbol in the default script file. Here's where it is defined:
.idata BLOCK(__section_alignment__) :
{
/* This cannot currently be handled with grouped sections.
See pe.em:sort_sections. */
KEEP (SORT(*)(.idata$2))
KEEP (SORT(*)(.idata$3))
/* These zeroes mark the end of the import list. */
LONG (0); LONG (0); LONG (0); LONG (0); LONG (0);
KEEP (SORT(*)(.idata$4))
__IAT_start__ = .;
KEEP (SORT(*)(.idata$5))
__IAT_end__ = .;
KEEP (SORT(*)(.idata$6))
KEEP (SORT(*)(.idata$7))
}
So far, no problem. Because GAS doesn't allow me to "subtract" two externally defined symbols (both symbols are defined in the linker), I have to define the symbol in the linker script, so at the end of the linker script I have this:
test_symbol = ABSOLUTE("__imp__MessageBoxW@16" - __IAT_start__);
Then in C++ I use this little inline asm to retrieve this relative difference which is supposed to be a fixed value once linked:
asm("movl $test_symbol, %0":"=r"(var));
Now var should contain that fixed number right? Wrong!
Because test_symbol is an "undefined" symbol as far as the assembler is concerned, it makes it relocated. Or I don't know why, but I tried so many things to force it to be an "absolute constant value symbol" instead of a "relocated symbol" to no avail. Even editing the linker script with many things like LD_FEATURE("SANE_EXPR") and others, doesn't work at all.
Its value is correct only if the DLL does not get relocated.
You see, either GNU LD or the assembler adds an entry in the .reloc section for that movl instruction, which is WRONG!
Is there a way to force it to treat an external/undefined symbol as a fixed CONSTANT and apply no relocation to it whatsoever? Basically, omit it from the .reloc section.
I am going crazy with this, please tell me there's something easy I overlooked, I searched for hours!
In other words, is there a way to use a linker symbol from within inline asm/C++ without having it relocated at all? No entry in the .reloc section or anything, basically the same as a constant like $1234. So if the DLL gets loaded at another base address, that constant would be the same every time.
UPDATE: I forgot about this question but decided to post an update, since it seems it's likely not possible, as nobody even commented. For anyone else in the same boat as me, I presume this is a limitation of the COFF object format itself. In other words, external symbols are implicitly relocated, and there doesn't seem to be a way around this.
I didn't "fix" it the way I wanted, I did it in a very hacky way though. If anyone is interested, here's my ugly "hack":
First I put a special "custom" instruction in the inline assembly where I reference this external symbol from C++. This "custom" instruction holds a placeholder instruction that grabs the symbol (normal x86 asm instruction with a dummy constant, e.g. 1234) and a way to identify it. Then let GCC generate the assembly files (.S files), then I parse the assembly with a simple script and when I find that "custom" instruction I insert a label for the linker (make it .global) and at the same time add a directive to a custom "on-the-fly" generated linker script that gets included from my main linker script at the end.
This places data in a temporary section in the resulting DLL with absolute offsets to the custom instruction that I need, but without relocation.
Next, I parse the binary DLL itself, in particular that temporary section I added with all this hack. I take the offsets from there, convert them to file offsets, and modify the DLL's .text section directly where those offsets point (remember those placeholder instructions? This replaces their immediate constants 1234 with the respective value from the linker's non-relocated constant). Then I strip the temporary section from the DLL, and it's done. Of course, all of this is done automatically by a helper program and script.
It's an insane hack, but it works and it's fully automatic now that I got it going. If my assumption is correct that COFF doesn't support non-relocated external symbols, then it's really the only way to use linker constants from C++ without them being relocated, which would be a disaster.
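The final patching step described above can be sketched roughly as follows. This is not the author's helper program, only a minimal Python illustration of the idea: the function name patch_immediates, the offsets, and the toy instruction bytes are all assumptions, and the conversion from section-relative offsets to file offsets is left out.

```python
import struct

PLACEHOLDER = 1234  # dummy immediate used in the placeholder instruction


def patch_immediates(image, patches):
    """Overwrite 32-bit little-endian placeholder immediates in `image`.

    `image` is a bytearray with the file contents; `patches` is a list of
    (file_offset, new_value) pairs. Both are hypothetical inputs that a
    helper would derive from the temporary section.
    """
    for offset, value in patches:
        (current,) = struct.unpack_from("<I", image, offset)
        if current != PLACEHOLDER:
            # Refuse to touch bytes that do not hold the expected dummy value.
            raise ValueError(f"no placeholder at offset {offset:#x}")
        struct.pack_into("<I", image, offset, value & 0xFFFFFFFF)
    return image


# Toy "code": B8 imm32 (mov eax, imm32) with the placeholder, then C3 (ret).
code = bytearray(b"\xB8" + struct.pack("<I", PLACEHOLDER) + b"\xC3")
patch_immediates(code, [(1, 0x40)])
print(code.hex())
```

Checking for the placeholder before writing is a cheap guard against a wrong offset silently corrupting unrelated code.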

How to use the fixups attribute on a section?

What exactly does "fixups" do when applied on a section?
In a fasm sample I found the following section declaration, and I'm really not sure what the fixups attribute does. I couldn't find much information on it in the fasm documentation.
section '.reloc' fixups data readable discardable
if $=$$
dd 0,8 ; if there are no fixups, generate dummy entry
end if
This appears to be a workaround for a bug in how FASM generates PECOFF DLLs. The .reloc section only applies to PECOFF images (EXEs and DLLs), and provides relocations (or "fixups") that allow the image to be loaded at any address. (Relocations of a different sort are used in PECOFF object files; these fixups aren't put in the .reloc section.)
The bug in FASM is that it will generate an empty .reloc section if the DLL doesn't need any relocations, rather than not generating one at all. Windows will refuse to load a DLL (or EXE) if it has an empty section. The workaround forces a non-empty .reloc section by adding a dummy "base relocation block" if the .reloc section doesn't have any contents.
Apparently the developer of FASM doesn't think this is a bug in FASM, but rather a bug in Windows, and so hasn't fixed it.
To answer your question directly, the fixups keyword appears to indicate that this section is special to FASM: it's used for image relocations as described above. Unlike the other attributes, it doesn't correspond to one of the section flags used in PECOFF images, so it appears to be used only internally by FASM.
fixups is just another name for relocation entries.
If you are new to relocation on PE, take a look at the official specifications.
Relocation entries tell the loader how to fix (hence the name fixups) the addresses in the compiled code.
The fixups directive tells FASM that the declared section is the one where the relocation entries should be generated (automatically).
You can still add your own data, though; presumably the fixups are written before any user-supplied data [1].
The test if $=$$ checks whether the current address counter ($) is equal to the value of the address counter when the section started ($$).
If that is true, the user data will be written at the start of the section, hence no fixups have been generated.
The two dwords dd 0, 8 create an empty entry (a dummy entry).
The second DWORD specifies the length of the whole entry including the 8-byte header; a value of 8 means there is no additional data.
I don't know why such a dummy entry is created.
[1] Just inferring this from the snippet; I don't know for sure.
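To see why dd 0,8 is an "empty" entry, here is a minimal Python sketch that walks base relocation blocks as laid out in the PE format: an 8-byte header (page RVA, block size) followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page. The function name and the second sample block are made up for illustration.

```python
import struct


def parse_base_relocations(data):
    """Return a list of (page_rva, [(type, offset), ...]) for .reloc data."""
    blocks = []
    pos = 0
    while pos + 8 <= len(data):
        page_rva, block_size = struct.unpack_from("<II", data, pos)
        if block_size < 8 or pos + block_size > len(data):
            break  # malformed or truncated block
        count = (block_size - 8) // 2  # number of 16-bit entries
        entries = []
        for i in range(count):
            (raw,) = struct.unpack_from("<H", data, pos + 8 + 2 * i)
            entries.append((raw >> 12, raw & 0x0FFF))
        blocks.append((page_rva, entries))
        pos += block_size
    return blocks


# The dummy entry from the snippet: dd 0,8
dummy = struct.pack("<II", 0, 8)
print(parse_base_relocations(dummy))

# A made-up block for page 0x1000 with one type-3 (HIGHLOW) fixup at
# offset 4, plus a type-0 padding entry to keep the block 4-byte aligned.
sample = struct.pack("<IIHH", 0x1000, 12, (3 << 12) | 4, 0)
print(parse_base_relocations(sample))
```

With a block size of 8 the entry count comes out as zero, so the dummy block is a header that describes no fixups.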

GDB using variable name to access local variable name

gdb provides a command "print localx" which prints the value stored in the localx variable. So, it basically must be using the symbol table to find the mapping (localx -> addressx on stack). I am unable to understand how this mapping can be created.
What I tried
I studied the intermediate temporary files of gcc using -save-temps option, and observed that a local variable local1 was mapped to a symbol name "LASF8". However, the objdump utility tool did not show this symbol name.
Context :
I am working on a project which requires building a pin-tool to print the accesses to local variables. Given a function, I would like to be able to say that this address corresponds to this variable name. This requires reading the symbol table to map an address to a symbol table entry. GDB does the exact reverse mapping, hence I would like to understand how it does it.
The symbol table is contained in the debugging information. This debugging information is emitted by gcc -g. gdb reads the debugging information to get symbolic information, among other things.
Typically the debugging information is in DWARF format. See http://www.dwarfstd.org/ for the specification.
You can also see DWARF more directly using readelf. For example readelf -wi will show the main (".debug_info") debugging information for an ELF file.
Note that doing the mapping in reverse -- that is, assigning a name to every stack slot -- is not entirely easy. First, not every stack slot will have a name. This is because the compiler may spill temporaries to the stack. Second, many locals will have DWARF location expressions to represent their location. This means you'll need to write an expression evaluator (not hard but also not trivial); you could conceivably (unlikely in practice but possible in theory) run into expressions which cannot be evaluated without a real stack frame; and finally the names will therefore generally only be valid at a given PC.
I believe there's a feature request in gdb bugzilla to add this feature to gdb.
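As a rough idea of what the "expression evaluator" mentioned above involves, here is a minimal Python sketch that handles only the simplest common case, a location expression consisting of DW_OP_fbreg (opcode 0x91) followed by a signed LEB128 offset from the frame base. The frame base value used below is invented; a real tool would have to compute it from the function's DW_AT_frame_base and the live registers, and would need to handle many more opcodes.

```python
DW_OP_FBREG = 0x91  # DWARF opcode: signed LEB128 offset from the frame base


def decode_sleb128(data, pos=0):
    """Decode a signed LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):  # last byte of the number
            if byte & 0x40:  # sign bit set: extend the sign
                result -= 1 << shift
            return result, pos


def eval_fbreg(expr, frame_base):
    """Evaluate a location expression that is a single DW_OP_fbreg."""
    if not expr or expr[0] != DW_OP_FBREG:
        raise NotImplementedError("only DW_OP_fbreg is handled in this sketch")
    offset, _ = decode_sleb128(expr, 1)
    return frame_base + offset


# Expression bytes 91 6c mean "frame base minus 20".
address = eval_fbreg(bytes([0x91, 0x6C]), frame_base=0x7FFF0000)
print(hex(address))
```

The expression bytes themselves can be seen in the DW_AT_location attributes printed by readelf -wi.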

How can I force the order of functions in a binary with the gcc toolchain?

I'm building a static binary out of several source files and libraries, and I want to control the order in which the functions are put into the resulting binary.
The background is, I have external code which is linked against offsets in this binary. Now if I change the source, all the offsets change because gcc may decide to order the functions differently, so I want to put the referenced functions at the beginning in a fixed order so their offsets stay unchanged...
I looked through ld's documentation but couldn't find anything about order of functions.
The only thing I found was -fno-toplevel-reorder, which doesn't really help me.
There is really no clean and reliable way of forcing a function to a particular address (except for the entry function) or even forcing functions having a particular order (and if you could enforce the order that would still not mean that the addresses stay the same when the source is changed!).
The biggest problem that I see is that even if it may be possible to fix a function to some address, it will be all but impossible to fix all of them to exactly the addresses that the already existing external program expects (assuming you cannot modify this program). If that actually worked, it would be pure coincidence and sheer luck.
It might be almost easiest to provide trampolines at the addresses that the other program expects, and have these point to the real functions (wherever they may be). That would require your code to use a different base address, so the actual program code doesn't collide with the trampolines.
There are three things that almost work for giving functions fixed addresses:
You can place each function that isn't allowed to move in its own section using __attribute__ ((section ("some name"))). Unfortunately, .text always appears as the first section, so if anything in .text changes enough to push its size over the next 512-byte boundary, your offsets will change. By default (but see below) you can't get a section to start before .text.
The -falign-functions=n commandline option lets you align functions to a boundary. Normally this is something around 16 bytes. Now, you could choose a large value like for example 1024. That will waste an immense amount of space, but it will also make sure that as long as functions only change moderately, the addresses of the following functions will remain the same. Obviously it still does not prevent the compiler/linker from reordering entire blocks when it feels like it (though -fno-toplevel-reorder will prevent this at least partially).
If you are willing to write a custom linker script, you can assign a start address for each section. These are virtual memory addresses, not positions in the executable, but I assume the hard linking works with VMAs (based on the default image base) too. So that could kind of work, although with much trouble and not in a pretty way.
When writing your own linker script, you could also consider putting the functions that must not move into their own sections and moving these sections at the beginning of the executable (in front of .text), so changes in .text won't move your functions around.
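The effect of a large -falign-functions value can be illustrated with a toy layout model in Python. This only models alignment padding; it is not what gcc or ld actually computes, and the function sizes are invented.

```python
def layout(sizes, alignment):
    """Return start offsets when every function starts on an aligned boundary."""
    offsets = []
    cursor = 0
    for size in sizes:
        # Round the cursor up to the next multiple of `alignment`.
        cursor = (cursor + alignment - 1) // alignment * alignment
        offsets.append(cursor)
        cursor += size
    return offsets


before = layout([300, 120, 900], 1024)
after = layout([340, 120, 900], 1024)  # first function grew by 40 bytes
tight_before = layout([300, 120, 900], 16)
tight_after = layout([340, 120, 900], 16)

print(before, after)            # unchanged with 1024-byte alignment
print(tight_before, tight_after)  # shifted with 16-byte alignment
```

As long as each function stays below the alignment size, growth is absorbed by the padding; once a function crosses the boundary, everything after it moves anyway.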
Update:
The "gcc" tag suggests that you probably target *NIX, so again this is probably not going to help you, but... if you have the option to use COFF, dollar-sign sections might work (the info might be interesting for others, in any case).
I just stumbled across this today (emphasis mine):
The "$" character (dollar sign) has a special interpretation in section names in object files. When determining the image section that will contain the contents of an object section, the linker discards the "$" and all characters that follow it. Thus, an object section named .text$X actually contributes to the .text section in the image. However, the characters following the "$" determine the ordering of the contributions to the image section. All contributions with the same object-section name are allocated contiguously in the image, and the blocks of contributions are sorted in lexical order by object-section name. Therefore, everything in object files with section name .text$X ends up together, after the .text$W contributions and before the .text$Y contributions.
If the documentation does not lie (and if I'm not reading wrong), this means you should be able to pack all the functions that you want located in the front into one section .text$A, and everything else into .text$B, and it should do just that.
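The ordering rule from the quoted passage can be modelled in a few lines of Python. This is only a sketch of the rule as quoted, not of any real linker: the function and label names are invented, and keeping the input order for contributions with the same section name is an assumption (the quote does not say how those are ordered).

```python
def image_layout(contributions):
    """Group (object_section_name, label) pairs into image sections.

    Contributions are sorted lexically by the full object-section name,
    then the "$" and everything after it is dropped to get the image section.
    """
    ordered = sorted(contributions, key=lambda c: c[0])  # stable sort
    layout = {}
    for name, label in ordered:
        image_section = name.split("$", 1)[0]
        layout.setdefault(image_section, []).append(label)
    return layout


contributions = [
    (".text$B", "helper"),
    (".text$A", "fixed_entry"),
    (".text$B", "main"),
    (".data", "table"),
    (".text$A", "fixed_api"),
]
result = image_layout(contributions)
print(result[".text"])
```

In this model everything tagged .text$A lands in front of everything tagged .text$B, regardless of the order in which the object files supplied it.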
Build your code with -ffunction-sections -- this will place each function into its own section.
If you are using GNU-ld, the linker script gives you absolute control, but is a very platform-specific and somewhat painful solution.
A better solution might be to use the recent work on gold, which allows exactly the function ordering you are seeking.
A lot of it comes from the order the functions are in the file and the order the files are on the command line when you link.
Embed something in the code that your external code can find, for example a const structure with some ASCII tag followed by the addresses of the functions. Then no matter where the compiler puts the functions, you can find them.
That, or use the normal .dll or .so mechanisms, and not have to mess with it.
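The lookup side of the "embedded tag" idea might look like the following Python sketch. The tag FNTAB1, the table layout (tag followed by 32-bit little-endian addresses), and the sample addresses are all hypothetical; the binary would have to contain a matching const structure.

```python
import struct

MAGIC = b"FNTAB1\x00\x00"  # hypothetical 8-byte ASCII tag embedded in the binary


def find_function_table(image, count):
    """Find MAGIC in `image` and read `count` 32-bit addresses that follow it."""
    pos = image.find(MAGIC)
    if pos < 0:
        raise LookupError("function table tag not found")
    return list(struct.unpack_from(f"<{count}I", image, pos + len(MAGIC)))


# Toy image: some filler, the tag, two made-up addresses, more filler.
blob = (
    b"\x90" * 37
    + MAGIC
    + struct.pack("<2I", 0x00401000, 0x00401230)
    + b"\xCC" * 5
)
table = find_function_table(blob, 2)
print([hex(a) for a in table])
```

The external code then depends only on the tag and the table layout, not on where the functions end up.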
In my experience, gcc -O0 will fix the binary order of functions to match the order in the source code.
However as others have mentioned, even if the order is fixed, the offsets can change as you modify the source code or upgrade your toolchain.

Wondering about COFF Externs

The Microsoft PE/COFF SPEC (v8, section 5.4.4) says that when a symbol has:
A storage class of IMAGE_SYM_CLASS_EXTERNAL
And a section number of 0 (IMAGE_SYM_UNDEFINED)
its "Value" field (in the symbol table) "indicates the size".
This confuses me. In particular, I'm wondering: "indicates the size of what?"
Generally, IMAGE_SYM_CLASS_EXTERNAL and IMAGE_SYM_UNDEFINED are used by CL (Visual C++) to represent externs.
Why would the linker need to know, or care, about the symbol's size? Doesn't it just need to know a name, that it's an extern, and have the appropriate relocation entries set? None of this should depend on size. Now, admittedly, the compiler needs to know this, but it would get that information from a header file, not from an object file.
I've looked at some simple example externs compiled by CL, and the Value field always seems to be zero. So, it's clearly not being used to encode the size of the field.
Does anyone know what "size" the spec is referring to? Are there any scenarios where the Visual Studio linker might use that field, or is that blurb in the spec just nonsense? My limited brain is unable to think of any such scenarios.
Update:
Please note that it does not, at least not always, appear to be the size of the symbol. In the cases I've observed the VALUE IS ALWAYS 0, hence the question.
Mr. Wisniewski, I believe I found the answer.
I'm a student and I've tried to write my own linker. The very first version of it could link OBJ files and dump them to my own binary format. But I soon realized that many C++ language features are unsupported without LIBCMT.LIB.
So first I wrote a lib parser... and got stuck while trying to link the CRT. The second linker member of LIBCMT.LIB stated that the object file crt0.obj (inside libcmt) contains the symbol __acmdln (a global pointer to the command line)... but I couldn't manage to find it there! I was really frustrated... The symbol had IMAGE_SYM_CLASS_EXTERNAL and section IMAGE_SYM_UNDEFINED, but why?
In the source file crt0.c there is a declaration:
#ifdef WPRFLAG
wchar_t *_wcmdln; /* points to wide command line */
#else /* WPRFLAG */
char *_acmdln; /* points to command line */
#endif /* WPRFLAG */
My investigation was rather long, and the result is this:
The C++ compiler places uninitialized data into the .bss section and marks it with IMAGE_SCN_CNT_UNINITIALIZED_DATA, but the pure C compiler behaves differently (libcmt was written in C). There, it is the linker's duty to place uninitialized data into sections.
If the C compiler emits a symbol without a section (0), marked as external, and its Value field is zero, then the symbol is defined somewhere else. But if the Value field is nonzero, the given OBJ file really does contain that symbol, only uninitialized. So the linker should reserve space in the .bss section for it: SPACE OF 'VALUE' BYTES.
And when you change those lines to:
#ifdef WPRFLAG
wchar_t *_wcmdln = 0xCCCCCCCC; /* points to wide command line */
#else /* WPRFLAG */
char *_acmdln = 0xCCCCCCCC; /* points to command line */
#endif /* WPRFLAG */
the Value field will be zero and both of them will be placed in the .data section.
Good luck, and sorry for my bad English.
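The rule described in this answer can be sketched in Python against the 18-byte COFF symbol record (Name, Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols). The classify function and the sample records are made up for illustration, and only short (8 bytes or less) inline names are handled.

```python
import struct

IMAGE_SYM_UNDEFINED = 0
IMAGE_SYM_CLASS_EXTERNAL = 2
# Name[8], Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols
SYMBOL_FORMAT = "<8sIhHBB"


def classify(record):
    """Return (name, kind, size) for one 18-byte COFF symbol record."""
    name, value, section, _type, storage, _aux = struct.unpack(SYMBOL_FORMAT, record)
    text = name.rstrip(b"\x00").decode("ascii")
    if storage == IMAGE_SYM_CLASS_EXTERNAL and section == IMAGE_SYM_UNDEFINED:
        if value == 0:
            return text, "undefined external", 0  # defined in another object
        return text, "uninitialized (common)", value  # linker reserves `value` bytes
    return text, "other", 0


# Made-up records: a 4-byte uninitialized pointer and a plain extern.
common = struct.pack(SYMBOL_FORMAT, b"__acmdln", 4, 0, 0, 2, 0)
extern = struct.pack(SYMBOL_FORMAT, b"_printf", 0, 0, 0x20, 2, 0)
print(classify(common))
print(classify(extern))
```

Under this reading, a Value of zero on every extern that merely references a symbol is exactly what one would expect, which matches the observation in the question.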
How about an extern declaration for an array that declares the size:
a.cpp:
extern int example[42];
b.cpp:
int example[13];
The fact that the linker doesn't catch this mismatch suggests, however, that Value isn't used. I have no easy way to check that.
It's the size of the data structure referred to by the symbol.
Basically, if the symbol is undefined, the linker can't otherwise find the size of the data structure, and therefore needs to know in advance how big it is when it's instantiated so it can deal with those issues during linkage.
You have a very exotic but interesting question. Is it correct that the only way to produce a symbol table inside a COFF image is the /Zd compiler switch, which is supported only up to Visual C++ 6.0, together with the old linker switch /debugtype:coff (see http://www.debuginfo.com/articles/gendebuginfo.html#gendebuginfovc6)? Is there any way to produce a symbol table inside a COFF image with at least Visual Studio 2008?
My idea is to try to produce a PE whose symbol table has an entry with storage class IMAGE_SYM_CLASS_EXTERNAL and section number 0 (IMAGE_SYM_UNDEFINED), using the linker switch /FORCE (/FORCE:UNRESOLVED or /FORCE:MULTIPLE) and an unresolved symbol introduced either by /INCLUDE:dummySymbol or by /NODEFAULTLIB. My problem is that it's not easy to produce a symbol table inside a COFF image. Where did you get the test PEs?

Resources