Loading data segment of already loaded shared library - gcc

For the global offset table to work, the GOT must be at a fixed offset from the text segment. Now assume that a program needs a shared library. Assume also that the shared library has already been loaded by the OS for some other process. For our program, since the text section of the shared library is already loaded, only the data segment needs to be loaded; the shared library's text section is mapped into the virtual address space of our process. But what if there is already some data/text or whatever at the fixed offset from the virtual address of our shared library? How does the dynamic linker resolve that conflict? One approach would be to leave R_386_GOTPC in the text section until load time and let the dynamic linker change it to the new offset. Is this how it is done in practice?

On GNU, even the same DSO is mapped at different addresses in different processes. No data at all is shared between them. This means that the GOT is just private data (like .data), and is initialized at load time with the proper addresses (either stubs or the proper function addresses with BIND_NOW).
(This assumes that prelink is not in use, which is somewhat broken anyway.)
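To make this concrete, here is a minimal sketch of a tiny DSO built with -fPIC on 32-bit x86 (the file and symbol names are just for illustration). The point is that the offset from the code to the GOT is fixed when the shared object is linked, so R_386_GOTPC never has to survive until load time; only the contents of the GOT slots are filled in by the dynamic linker, per process.

    /* got_demo.c -- build with: gcc -m32 -fPIC -shared got_demo.c -o libgot_demo.so */
    int shared_value = 42;          /* gets a GOT slot in each process that maps the DSO */

    int get_shared_value(void)
    {
        /* With -fPIC the compiler emits roughly:
         *     call  __x86.get_pc_thunk.ax
         *     add   $_GLOBAL_OFFSET_TABLE_, %eax   ; R_386_GOTPC, resolved by the static linker
         *     mov   shared_value@GOT(%eax), %eax   ; address read from the GOT slot
         *     mov   (%eax), %eax
         * The GOT slot itself carries an R_386_GLOB_DAT dynamic relocation,
         * which the dynamic linker fills in at load time. */
        return shared_value;
    }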

Related

What is there before ImageBase address in Virtual Address?

I know from the Microsoft documentation that the image base is set to 0x140000000 for 64-bit images, and that it is the base address where the executable file is first loaded into memory.
So my questions are as follows:
What comes between the start of the virtual address space (0x00000000) and the 0x140000000 address?
What does it mean for the executable to be "first loaded"? Is it the entry point of the program (which is of course not the main function) or something else?
What comes between the start of the virtual address space (0x00000000) and the 0x140000000 address?
Whatever happens to be allocated there: DLLs, file mappings, heap memory; or the memory may simply be free. The first page is always inaccessible.
What does it mean for the executable to be "first loaded"? Is it the entry point of the program (which is of course not the main function) or something else?
Loaded means mapped into memory. After it is mapped into memory, its imports are resolved, the DLLs it links against are mapped into memory, their entry points are executed, and only then does execution reach the executable's entry point. The executable's entry point is not really the first function executed from the executable if it has TLS callbacks.
I don't know the technical reason why the 64-bit default is so high; perhaps just to make sure your app does not have 32-bit pointer-truncation bugs with data/code in the module? It is important to note that this default comes from the Microsoft compiler; Windows itself will accept a lower value. The default for 32-bit applications is 0x00400000, and there are actual hardware and technical reasons for that.
The first page starting at 0 is off limits in most operating systems to prevent issues with de-referencing a NULL pointer. The first couple of megabytes might have BIOS/firmware or other legacy things mapped there.
By first loaded, it means the loader will map the file into memory starting at that address: first the MZ part (DOS header and stub code) and the PE header, and after this the various sections listed in the PE header.
Most applications use ASLR these days, so the base address will be random and not the preferred address listed in the PE. ntdll and kernel32 are mapped before the exe, so if you choose their base address you will also be relocated.
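A quick way to see this for yourself is to compare the preferred ImageBase in the headers with the address the loader actually chose. The following is a minimal sketch that reads both for the current process (error handling omitted):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* GetModuleHandle(NULL) returns the base the loader actually chose for the exe */
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
        IMAGE_NT_HEADERS *nt = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

        printf("preferred ImageBase: 0x%llx\n",
               (unsigned long long)nt->OptionalHeader.ImageBase);
        printf("actual load address: %p\n", (void *)base);   /* differs under ASLR */
        return 0;
    }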

Low-level details on linking and loading of (PE) programs in Windows

I'm looking for an answer or tutorial that clarifies how a Windows program is linked and loaded into memory after it has been assembled.
Especially, I'm uncertain about the following points:
After the program is assembled, some instructions may reference memory within the .DATA section. How are these references translated when the program is loaded into memory starting at some arbitrary address? Do RVAs and relative memory references take care of these issues (the BaseOfCode and BaseOfData RVA fields of the PE header)?
Is the program always loaded at the address specified in the ImageBase header field? What if an already loaded (DLL) module specifies the same base?
First I'm going to answer your second question:
No, a module (be it an exe or a dll) is not always loaded at its base address. This can happen for two reasons: either some other module is already loaded and there is no room to load it at the base address contained in the headers, or because of ASLR (Address Space Layout Randomization), which means modules are loaded at random slots for exploit-mitigation purposes.
To address the first question (it is related to the second one):
The way a memory location is referred to can be relative or absolute. Jumps and function calls are usually relative (though they can be absolute), which says: "go this many bytes from the current instruction pointer". Regardless of where the module is loaded, relative jumps and calls will work.
When it comes to addressing data, the references are usually absolute, that is, "access this 4-byte datum at this address", and a full virtual address is specified: not an RVA but a VA.
If a module is not loaded at its base address, absolute references will all be broken; they no longer point to the place the linker assumed they would. Let's say the ImageBase is 0x04000000 and you have a variable at RVA 0x000000F4; the VA will be 0x040000F4. Now imagine the module is loaded not at its base address but at 0x05000000: everything is moved 0x01000000 bytes forward, so the VA of your variable is actually 0x050000F4, but the machine code that accesses the data still has the old address hardcoded, so the program is broken. In order to fix this, linkers store in the executable where these absolute references are, so they can be fixed by adding to them the amount the executable has been displaced by: the delta, the difference between where the image is loaded and the image base contained in the headers of the executable file, in this case 0x01000000. This process is called Base Relocation and is performed at load time by the operating system, before the code starts executing.
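A rough sketch of what the loader does with that information (the real .reloc section is organized into per-page blocks of IMAGE_BASE_RELOCATION entries; here it is flattened into a plain list of RVAs for clarity):

    #include <stdint.h>
    #include <stddef.h>

    /* apply base relocations: add the load delta to every recorded absolute address */
    void apply_base_relocations(uint8_t *actual_base, uint64_t preferred_base,
                                const uint32_t *reloc_rvas, size_t count)
    {
        uint64_t delta = (uint64_t)(uintptr_t)actual_base - preferred_base;
        for (size_t i = 0; i < count; i++) {
            uint64_t *slot = (uint64_t *)(actual_base + reloc_rvas[i]);
            *slot += delta;   /* the hard-coded VA now points into the relocated image again */
        }
    }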
Sometimes a module has no relocations, so it can't be loaded anywhere else but at its base address. See How do I determine if an EXE (or DLL) participate in ASLR, i.e. is relocatable?
For more information on ASLR: https://insights.sei.cmu.edu/cert/2014/02/differences-between-aslr-on-windows-and-linux.html
There is another way to move the executable in memory and still have it run correctly. There exists something called Position Independent Code: code crafted in such a way that it will run anywhere in memory without the need for the loader to perform base relocations.
This is very common in Linux shared libraries, and it is done by addressing data relatively ("access this data item at this distance from the instruction pointer").
To do this, the x64 architecture has RIP-relative addressing; on x86 a trick is used to emulate it: get the content of the instruction pointer, then calculate the VA of a variable by adding a constant offset to it.
This is very well explained here:
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
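As a small sketch of the difference (names are illustrative; compare the objdump -d output when compiling with -fPIC for -m64 versus -m32):

    /* pic_data.c -- a file-local variable accessed position-independently */
    static int counter;

    int next(void)
    {
        /* x86-64: the load is roughly one RIP-relative instruction, no fixups needed
         *     mov   counter(%rip), %eax
         * i386 has no IP-relative data addressing, so gcc emulates it roughly as:
         *     call  __x86.get_pc_thunk.cx        ; load the return address (EIP) into %ecx
         *     add   $_GLOBAL_OFFSET_TABLE_, %ecx
         *     mov   counter@GOTOFF(%ecx), %eax   ; constant offset from the recovered PC/GOT
         */
        return ++counter;
    }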
I don't think PIC code is common on Windows; more often than not, Windows modules contain base relocations to fix absolute addresses when they are loaded somewhere other than their preferred base address, although I'm not exactly sure about this last paragraph, so take it with a grain of salt.
More info:
http://opensecuritytraining.info/LifeOfBinaries.html
How are Windows DLLs actually shared? (a bit confusing because I didn't explain myself well when asking the question).
https://www.iecc.com/linker/
I hope I've helped :)

Why ELF executables have a fixed load address?

ELF executables have a fixed load address (0x8048000 for 32-bit x86 Linux binaries, and 0x400000 for 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why use a fixed load address and not a randomized one (given some range to be randomized within)?
why use a fixed load address and not a randomized one
Traditionally that's how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of a shared library that has startup code in it) with -fPIE and link with the -pie flag.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.
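A quick way to see the difference is to print the address of a function in the binary itself; a minimal sketch (note that many newer toolchains already default to PIE):

    /* addr.c
     *   gcc -no-pie    addr.c -o fixed   # prints the same address every run
     *   gcc -fPIE -pie addr.c -o pie     # address changes from run to run under ASLR
     */
    #include <stdio.h>

    int main(void)
    {
        printf("main is at %p\n", (void *)main);
        return 0;
    }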
Not sure if I understood your question correctly, but assuming I did, this is sort of a "legacy"/historical issue. ELF is the file format used by UNIX-derived operating systems, both POSIX and Unix-like systems such as Linux.
The ELF format simply states that there must be some resolved, absolute virtual address that the code is loaded at and begins running from...
That's simply how the file format is, and for historical reasons it can't be changed... You couldn't just "throw" the executable at any memory address and have it run successfully; back in the 90s, when the ELF format was introduced, problems such as calling functions through virtual tables were raised, and it was decided that the ELF format would have absolute addresses within it.
Also, think about it; take a look at the ELF format: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
How would you design an OS executable loader that could take an executable, load it at ANY desired virtual address, and have the code run successfully without actually having to change the binary itself? If you wanted to do something like that, you'd either need to vastly change the output compilers generate or change the format itself, which again isn't possible.
As time passed, the requirement for position-independent execution (PIE/PIC) arose, and shared objects were introduced to allow it, along with ASLR
(Address Space Layout Randomization). This means the code can be placed at any memory address and still execute. It is implemented by making sure that all calls within the code itself are relative to the address of the currently executing instruction, and that when the shared object is loaded the OS loader changes some data within the binary, where what is changed is not executable instructions (R-X) but actual data (RW, e.g. the .data segment). This is also done by calling functions through "jump tables" (which are changed at load time), for example the PLT/GOT... Such shared objects allow full randomization of the addresses the code is loaded at, and if you want to execute more "secure" code you have to compile it as a shared object and dynamically link it at load time or run time.
( hope I've cleared some things out :) )
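As a small illustration of the jump-table mechanism mentioned above: a call to a function in another shared object goes through a PLT stub, and with default lazy binding the corresponding GOT slot is only patched by the dynamic linker on the first call (linking with -Wl,-z,now, or setting LD_BIND_NOW=1, forces everything to be resolved at load time instead):

    #include <stdio.h>

    int main(void)
    {
        /* first call: the PLT stub jumps to the dynamic linker's resolver,
         * which looks up puts in libc and writes its address into the GOT slot */
        puts("first call resolves puts through the PLT");

        /* second call: the same stub now jumps straight to the resolved address in libc */
        puts("second call goes directly to libc");
        return 0;
    }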

How does a PE file get mapped into memory?

So I have been researching the PE format for the last couple of days, and I still have a couple of questions.
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes. That's unlikely to survive for very long; the program is apt to write to that section, which triggers a copy-on-write page copy that gets the page backed by the paging file instead of the PE file.
how can the process acquire the offset of the section?
The linker already calculated the offsets of variables in the section. The module might be relocated, which is common for DLLs that have an awkward base address that's already in use when the DLL gets loaded. In that case the relocation table in the PE file is used by the loader to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section; they are no longer backed by the PE file and cannot be shared between processes.
Is there any way to get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that it is painful if you do this for another process, since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), which is fairly little joy and unlikely to be faster than reading the file; the file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache, being designed to make it hard to do these kinds of things.
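For the current process this boils down to a few pointer hops through the mapped headers; a minimal sketch (error handling omitted, and as noted above it is much more awkward for another process):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);           /* mapped image base */
        IMAGE_NT_HEADERS *nt =
            (IMAGE_NT_HEADERS *)(base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);

        DWORD entry_rva = nt->OptionalHeader.AddressOfEntryPoint;
        printf("entry point: RVA 0x%lx, VA %p\n",
               (unsigned long)entry_rva, (void *)(base + entry_rva));
        return 0;
    }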
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
By means of a relocation table: every reference to a global object (data or function) from the executable code that uses direct addressing has an entry in this table, so that the loader can patch the code, fixing the original offset. Note that you can make a PE file without a relocation section, in which case all data and code sections have a fixed offset and the executable has a fixed entry point.
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the Windows loader also loads the PE header into memory at the base address of the module. There you can find all the info that was in the file's PE header, including the Entry Point.
I can recommend this article for everything about the PE format, especially on relocations.
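A short sketch of walking that section table in the mapped image of the current module; note how the in-memory size (Misc.VirtualSize) and the on-disk size (SizeOfRawData) are reported separately for each section:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        BYTE *base = (BYTE *)GetModuleHandle(NULL);
        IMAGE_NT_HEADERS *nt =
            (IMAGE_NT_HEADERS *)(base + ((IMAGE_DOS_HEADER *)base)->e_lfanew);
        IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);

        for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++, sec++)
            printf("%-8.8s  RVA 0x%06lx  raw %7lu  virtual %7lu\n",
                   (const char *)sec->Name,
                   (unsigned long)sec->VirtualAddress,
                   (unsigned long)sec->SizeOfRawData,
                   (unsigned long)sec->Misc.VirtualSize);
        return 0;
    }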
Does the data section get mapped into the process' memory, or does the program read it from the disk?
Yes; before execution, everything must be mapped into memory by the loader of the operating system, whether Windows or Linux.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
The PE file has a well-defined structure, and the loader parses that information to acquire the relative virtual addresses of the sections around the ImageBase. Also, if ASLR (the address-randomization feature) is active on the system, the loader has to use relocation information to resolve those offsets.
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Nope. To calculate the OEP, the operating system's loader uses the ImageBase + AddressOfEntryPoint values from the optional header structure, and in some particular cases, when address randomization is enabled, it uses the relocation table to resolve the addresses. So we can't do anything without parsing the PE file on disk.

How much of shared object is loaded to memory

Say there is a shared object file, libComponent.so, which is made up of two object files, Component_1.o and Component_2.o.
And there is an application which links to libComponent.so but only uses functions from Component_1.o.
Will the entire shared object, i.e. libComponent.so, be loaded into memory when the application runs and uses the shared object file, or just Component_1.o?
Is there an option available in the gcc compiler to toggle this behaviour of only loading the required symbols from a shared object?
Well, it depends on what you mean by 'loaded'.
The dynamic linker will map all of the library into the process's virtual memory space and will fill in entries in the executable's import table for each library function used with the addresses of functions in the shared library. But filling in the import table doesn't actually load from those addresses, so they won't be loaded into physical memory.
From then on, the library code will be paged into physical memory on demand when the function is called, just like any other pageable memory in the process's virtual address space. If a function is never called (directly from the application or indirectly from another library function called by the application), it won't be paged in. (Well, paging occurs with page size granularity, so you might pull in a function the application doesn't call if it's next to a function it does call. Some compilers use profile-guided optimization to place functions commonly called together next to each other to minimize the number of pages used.)
(Aside: if your library wasn't compiled to use position-independent code and it's loaded at its non-default base address, the linker will need to fix up addresses in the code when it's loaded, which would cause the entire library to be paged in. This could be done lazily when each page is first loaded, though I'm not sure which linkers do this.)
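If you want to watch the demand paging at work, one rough way (on Linux) is to dlopen the library and ask mincore() whether the page holding a particular function is resident yet. The library name and symbol below are just the ones implied by the question and are hypothetical; error handling is omitted:

    /* resident.c -- build with: gcc resident.c -ldl */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        void *handle = dlopen("./libComponent.so", RTLD_NOW);           /* hypothetical library */
        int (*fn)(void) = (int (*)(void))dlsym(handle, "component1_func"); /* hypothetical symbol */

        long page = sysconf(_SC_PAGESIZE);
        void *page_start = (void *)((uintptr_t)fn & ~(uintptr_t)(page - 1));

        unsigned char resident = 0;
        mincore(page_start, page, &resident);
        printf("before calling: resident = %d\n", resident & 1);

        fn();                                   /* touching the code pages it in */
        mincore(page_start, page, &resident);
        printf("after calling:  resident = %d\n", resident & 1);
        return 0;
    }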

Resources