Trying to understand how the relocation happens in the situation below
I have a shared library libbigshr.so which uses another shared library libfunlib.so. The latter defines a global variable foo. To compile the former I had to add a forward declaration, extern int foo.
.rel.dyn at libbigshr.so
Offset Info Type Sym.Value Sym. Name
000005a5 00000401 R_386_32 00000000 foo
000005ab 00000401 R_386_32 00000000 foo
000005c7 00000401 R_386_32 00000000 foo
.rel.dyn at libfunlib.so
000005a5 00000901 R_386_32 00002010 foo
In libfunlib the symbol value is a proper value (0x2010), so I have no problem there. But I want to know how the correct addresses get inserted in libbigshr. I understand that once a variable has been allocated memory and its location is known, the same address can be used everywhere else. What I am interested in is the procedure for doing so.
Due to my ignorance my question may not have sufficient data to answer it - so please let me know and I will furnish more details.
The correct address is generated by relocation processing at run time, by the dynamic linker. For example, in glibc, R_386_32 is processed in the i386 version of the elf_machine_rel function in the file sysdeps/i386/dl-machine.h. The glibc wiki has an overview of process startup. For details, see the references in the Linux Standard Base, particularly those concerning the System V Application Binary Interface. For machine-specific information, H.J. Lu maintains a set of x86 ABI documents.
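As a rough illustration of what that relocation processing amounts to, here is a minimal Python sketch. It assumes the i386 psABI rule that R_386_32 computes S + A, where S is the run-time address of the symbol and A is the addend already stored at the relocated location (REL-style entries). The function names, the bytearray standing in for the mapped libbigshr.so, and the load address 0xB7700000 for libfunlib.so are all hypothetical; this is not glibc's code.

```python
import struct

R_386_32 = 1  # i386 relocation type: S + A


def decode_rel_info(info):
    """Split an Elf32_Rel r_info field into (symbol index, relocation type)."""
    return info >> 8, info & 0xFF


def apply_r_386_32(image, r_offset, sym_addr):
    """Patch the 32-bit little-endian word at r_offset with S + A.

    image    -- bytearray modelling the mapped library (hypothetical stand-in)
    r_offset -- offset of the word to patch, relative to the load base
    sym_addr -- run-time address of the symbol (S)
    """
    (addend,) = struct.unpack_from("<I", image, r_offset)  # A lives in place
    value = (sym_addr + addend) & 0xFFFFFFFF
    struct.pack_into("<I", image, r_offset, value)
    return value


# Values from the .rel.dyn dump above; the load address is an assumption.
image = bytearray(0x600)
sym_index, rel_type = decode_rel_info(0x00000401)
foo_addr = 0xB7700000 + 0x2010  # hypothetical base of libfunlib.so + Sym.Value
patched = apply_r_386_32(image, 0x5A5, foo_addr)
```

The Info column 00000401 decodes to symbol index 4 and type 1 (R_386_32). The Sym.Value of 0 in libbigshr.so only means foo is undefined there; the dynamic linker looks the name up in the other loaded objects and uses the address it finds.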
The question is about loading portable executable images to a random address.
Let's take kernel32.dll as an example, loaded at 0x75A00000.
I can see that at offset 0x10e15 from the image, there is an assembler instruction, which depends on where the image is located.
address:
75A10E13
bytes:
8B 35 18 03 AE 75
command:
MOV ESI,DWORD PTR DS:[75AE0318]
It turns out that when launching the executable file, we must tell the system that this address needs to be relocated.
The system looks at the relocation table, which is in the executable file, and sees the following:
base relocation table
To get the absolute address of the first element to be relocated, I do the following: add the block's virtual address to the address of the image, and then add the first element of the block to the resulting number.
0x75A00000 + 0x10000 + 0x3E15 = 0x75A13E15
It's a good number, but always 0x3000 more than I expect. I just subtract 0x3000 and it works. Please help me find the answer: where does the 0x3000 for x86 come from?
Relocations in Portable Executables are resolved when the file is linked. The base relocation table you are referring to has a different function: it is used by the Windows loader when the PE cannot be loaded at the preferred ImageBase address specified by the linker, usually 0x0040_0000.
Dynamic-link libraries shipped with MS Windows are linked to ImageBase addresses that differ for each core DLL and are chosen not to collide with one another, so an executable which imports the usual combination of libraries doesn't have to relocate them.
You misinterpreted the format of base relocation section .reloc.
Those 16-bit TypeOrOffset words which follow PageRVA and BlockSize have their Base Relocation Type encoded in the four most significant bits; the remaining 12 bits are the offset within the page.
For instance, the first TypeOrOffset entry in your dump, 0x3E15, has type IMAGE_REL_BASED_HIGHLOW (3) and offset 0x0E15, which is the number to be added to PageRVA.
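The decoding described above can be sketched in a few lines of Python. The helper names are hypothetical; the constants follow the PE format's base relocation types, and the example values are the ones from the question. The rebase_dword helper shows what the loader does for a HIGHLOW fixup when the image lands somewhere other than its preferred ImageBase; the bases used to exercise it are made up for illustration.

```python
IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, no fixup applied
IMAGE_REL_BASED_HIGHLOW = 3   # add the full 32-bit delta to the dword


def decode_entry(entry):
    """Split a 16-bit TypeOrOffset word into (type, 12-bit page offset)."""
    return entry >> 12, entry & 0x0FFF


def fixup_address(image_base, page_rva, entry):
    """Return (type, absolute address of the dword that needs fixing)."""
    rel_type, offset = decode_entry(entry)
    return rel_type, image_base + page_rva + offset


def rebase_dword(value, preferred_base, actual_base):
    """Apply a HIGHLOW fixup: add (actual - preferred) to the stored dword."""
    return (value + (actual_base - preferred_base)) & 0xFFFFFFFF


# Values from the question: image at 0x75A00000, PageRVA 0x10000, entry 0x3E15.
rel_type, address = fixup_address(0x75A00000, 0x10000, 0x3E15)

# Hypothetical bases, only to show the delta arithmetic.
rebased = rebase_dword(0x100E0318, 0x10000000, 0x20000000)
```

With the type nibble masked off, the address comes out as 0x75A10E15, which is exactly the operand of the MOV instruction at 0x75A10E13 (two opcode bytes, then the 32-bit address). The unexplained 0x3000 was the type value 3 sitting in the top four bits.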
I have some startup assembly for RISCV which defines the .text section as beginning at .globl _start.
I know what this is - as a disassembly shows me the address, but I cannot see where it is defined. It's not in the linker script and a grep in the build directories shows it is in various binary files, but I cannot find a definition.
I am guessing this appears in a file somewhere as a function of the architecture, but can anyone tell me where? (This is all being built using RISCV GNU cross compilers on Linux)
Unless you control it yourself, in the GNU tools world there is usually a file called crt0.s, or perhaps some other name. There should be one per architecture, since it is written in assembly language. It is the default bootstrap: it zeros .bss, copies .data as needed, etc.
I don't remember whether it is part of the C library (glibc, newlib, etc.) or whether it is added later by the people who build a toolchain targeting some specific platform.
It is certainly not required, but it is not uncommon to see _start as the label at the beginning of the binary; it is supposed to be the entry point. So if you have an operating system or loader that uses a binary format with labels present (ELF, etc.), it can load the binary and branch to the entry point instead of branching to the first address.
So the _start is merely defined as being at the start of the .text section, and the address of the .text section is defined in the linker script.
I believe that MODULE_VERSION does not work if the driver is statically compiled into the kernel. The version number was nowhere to be seen in sysfs, and modinfo does not work as it's not a loaded module.
So what's the best way to either get the MODULE_VERSION of this driver or encode a version number in the driver? Is there a standard way of doing this, or should I simply use sysfs?
First of all, there is not much sense in having a module version for in-tree modules. Otherwise, it is kept in a special section called __modver.
$ objdump -h ~/prj/TMP/out/mfld/vmlinux -j __modver
/home/andy/prj/TMP/out/mfld/vmlinux: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
12 __modver 00000c40 c1a003c0 01a003c0 00a013c0 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
It contains pointers to corresponding structures defined in include/linux/module.h in macro MODULE_VERSION.
Since the contents of the .eh_frame section are used at runtime by the C++ library, I can only assume that it is mapped into the running process and that there is some way for the library to "get at it". Likewise, I can only assume that this way is accessible outside the library as well, but how? Are there some symbols pointing to the relevant parts of it, and, if so, what names do they bear?
A very similar question was asked in Accessing .eh_frame data during execution, but I would like to point out that this is not a direct duplicate, since that one was asked in the context of the Linux kernel and was correctly answered by noting that the final compiled kernel does not include the .eh_frame sections. This question is about normal, userspace ELF binaries.
I was wondering how exactly .cfi_remember_state is implemented. I know it is a pseudo-op, so I suppose it is converted into a couple of instructions when assembling. I am interested what exact instructions are used to implement it. I tried many ways to figure it out. Namely:
Read GAS source code. But failed to find anything useful enough.
Read GAS documentation. But the .cfi_remember_state entry is just a simple joke (literally).
Tried to find a gcc switch that would make gcc generate asm from C code with pseudo-ops "expanded". Failed to find such a switch for x86 / x86-64. (Would be nice if someone could point me to such a switch, assuming it exists, BTW.)
Google-fu && searching on SO did not yield anything useful.
The only other solution in my mind would be to read the binary of an assembled executable file and try to deduce the instructions. Yet I would like to avoid such a daunting task.
Could any of you who knows enlighten me on how exactly it is implemented on x86 and/or x86-64? Maybe along with sharing how / where that information was acquired, so I could check other pseudo-ops if I ever have the need to?
This directive is part of the DWARF information (all it really does is emit a DW_CFA_remember_state instruction). Excerpt from the DWARF3 standard:
The DW_CFA_remember_state instruction takes no operands. The required
action is to push the set of rules for every register onto an implicit
stack.
You can play with DWARF information using objdump. Let's begin with a simple empty assembler file:
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
#.cfi_remember_state
.cfi_endproc
.LFE0:
.size main, .-main
Compile it with gcc cfirem.s -c -o cfirem.o
Now disassemble generated DWARF section with objdump --dwarf cfirem.o
You will get:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_nop
DW_CFA_nop
...
If you uncomment .cfi_remember_state, you will see instead:
00000018 00000014 0000001c FDE cie=00000000 pc=00000000..00000000
DW_CFA_remember_state
DW_CFA_nop
DW_CFA_nop
...
So it is not really converted into assembler instructions (try objdump -d to see that there are no assembler instructions in our sample at all). It is converted into DWARF pseudo-instructions, which are used when a debugger like GDB processes your variable locations, stack information and so on.
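To make the "implicit stack" from the DWARF excerpt concrete, here is a toy Python model of how a consumer of the call frame information might handle this instruction. It is only a sketch: the class, its method names and the rule representation are hypothetical, and real unwinders track far more state. The opcode values 0x0a and 0x0b are the DWARF encodings of DW_CFA_remember_state and its counterpart DW_CFA_restore_state, which pops the saved rules back.

```python
import copy

DW_CFA_remember_state = 0x0A
DW_CFA_restore_state = 0x0B


class CfiInterpreter:
    """Toy model of the register-rule state a CFI consumer maintains."""

    def __init__(self):
        self.rules = {}   # register name -> rule (hypothetical representation)
        self.stack = []   # the "implicit stack" of saved rule sets

    def execute(self, opcode):
        if opcode == DW_CFA_remember_state:
            # Push a snapshot of the rules for every register.
            self.stack.append(copy.deepcopy(self.rules))
        elif opcode == DW_CFA_restore_state:
            # Pop the snapshot and make it the current rule set again.
            self.rules = self.stack.pop()
        else:
            raise ValueError("opcode not modelled in this sketch")


interp = CfiInterpreter()
interp.rules["ebp"] = ("offset", -8)      # e.g. after the prologue
interp.execute(DW_CFA_remember_state)
interp.rules["ebp"] = ("same_value",)     # e.g. inside an early-return epilogue
interp.execute(DW_CFA_restore_state)      # back to the body's rules
```

After the restore, the rule for ebp is ("offset", -8) again and the stack is empty. Nothing here runs on the CPU at program run time; the opcodes are just bytes in the frame description that an unwinder or debugger interprets.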