I wonder whether Linux shared libraries such as the GNU libc are shared between processes, or whether the dynamic linker ld.so maps a new copy of libc into every single process. If the latter, doesn't that waste RAM, since the same library is loaded repeatedly for every process in different regions? The same question applies to the Linux vDSO, the fast virtual system call mechanism.
Or has the Linux kernel already mapped all shared libraries into RAM, so that each process that needs a library is simply given access to that library's region and no extra RAM pages are used?
On Linux, libraries are typically compiled as position-independent code, which means that their code can be mapped anywhere in the address space without needing relocation fixups.
Each process that loads the library uses a private mapping of the library's segments, but because relocation fixups are not required the text and read-only data mappings remain clean (unmodified), which means that these mappings are backed by only one set of physical pages no matter how many processes they are mapped in.
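To see the private mappings described above, one can inspect /proc/<pid>/maps on Linux. The following is a minimal sketch, not a definitive tool: `parse_maps` and `library_mappings` are hypothetical helper names, and `SAMPLE` is made-up illustrative text in the maps format (addresses, device, and inode are invented). The trailing `p` in the permissions column marks a private (copy-on-write) mapping.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse text in the /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return mappings


def library_mappings(text: str, name_fragment: str) -> List[Mapping]:
    """Return the mappings whose pathname contains name_fragment."""
    return [m for m in parse_maps(text) if name_fragment in m.path]


# Illustrative sample only; the values are invented for this sketch.
SAMPLE = """\
7f1c2a000000-7f1c2a028000 r--p 00000000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7f1c2a028000-7f1c2a1bd000 r-xp 00028000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7f1c2a1bd000-7f1c2a215000 r--p 001bd000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7f1c2a215000-7f1c2a21b000 rw-p 00214000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffd5a1f0000-7ffd5a1f2000 r-xp 00000000 00:00 0 [vdso]
"""

if __name__ == "__main__":
    for m in library_mappings(SAMPLE, "libc"):
        size_kib = (m.end - m.start) // 1024
        print(f"{m.start:#x}-{m.end:#x} {m.perms} {size_kib} KiB {m.path}")
```

On a Linux machine, passing `open("/proc/self/maps").read()` instead of `SAMPLE` lists the real mappings of the running interpreter. Every libc segment shows up as private, yet the read-only and executable ones stay backed by the same page-cache pages as long as nothing writes to them.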
Related
Say two processes are using Kernel32.dll. Does Windows map the DLL to the same virtual address in both processes? If not, how does the paging mechanism end up using the same physical address where the DLL is in fact loaded for both processes? I tried finding this information in the Windows Internals book but didn't find anything.
TL;DR: No, it might be loaded somewhere else in another process.
Ntdll and Kernel32 are special and always load at the same address so it is better to focus on something else, Shell32 for example.
A dll has what is known as a preferred base address and this is stored in the PE header (ImageBase). The loader will first attempt to load the dll at this address. If that address range is free then loading will succeed with no extra work required.
If the address is not free then the loader has to load it somewhere else. Loading at a different address usually requires relocation information and if this was removed during linking (/FIXED) then loading will fail! If there was space somewhere else to load the dll, the loader will use the relocation information to patch the given locations in the dll with the new base address. Because dlls are loaded as copy-on-write, this will cause extra memory usage compared to loading at the preferred address since each memory page that needed a patch is now a private copy in the process. This means that the answer to your question is no, a dll might not load at the same address in a different process if that process already has something else loaded there.
So far I have only talked about the loader. The loader is implemented in Ntdll as normal usermode code and is not involved in how a file mapped into memory actually works. Memory-mapped files (known as Sections internally in NT) are a cooperation between the operating system kernel and the CPU hardware. This is a whole topic in and of itself, but the important thing to know is that physical memory and the page/swap file mechanism are completely decoupled from how a usermode process accesses its virtual memory pages. The kernel can map a physical memory page to zero, one, or multiple places in a process's virtual memory, and the CPU will automatically translate the address when the process accesses a virtual page.
As a final note, ASLR does complicate things a little bit but the "offset" only changes on reboot and should not have an impact on this specific question in current implementations. In theory Windows could change this in the future and always load things at different addresses in different processes but this is unlikely to happen because of the copy-on-write downsides.
Linux and macOS leverage the power of position-independent code. There's no such thing on Windows, and yet programs can link against shared DLLs normally. I can't seem to find good documentation on this topic, besides a couple of terse articles on the Microsoft website (here and here).
Does Windows just copy the DLL code in memory and adjust function addresses as needed? What if two programs link against the same library? Could the virtual memory mechanism be involved somehow?
Windows PE (.exe/.dll) files contain relocation data that allows the loader to adjust addresses as required if the code is loaded at an address other than the intended base address.
The relocation table is essentially just a list of offsets within the binary that need to be adjusted, such that e.g. if a .dll with a base address of 0x100000, is instead loaded at 0x300000, each of the addresses included in the relocation table will have (0x300000 - 0x100000) = 0x200000 added to them.
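The arithmetic above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual loader logic: `apply_relocations` is a hypothetical name, the relocation table is modelled as a flat list of offsets, and every fixup is a 32-bit little-endian absolute address. Real PE files group base relocations into per-page blocks with typed entries.

```python
import struct


def apply_relocations(image: bytes, reloc_offsets, preferred_base: int,
                      actual_base: int) -> bytes:
    """Return a copy of image with each 32-bit address at the given offsets
    shifted by (actual_base - preferred_base)."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        # Wrap to 32 bits, as this sketch assumes 32-bit absolute addresses.
        struct.pack_into("<I", patched, off, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Toy 16-byte "image" holding one absolute pointer at offset 4.
PREFERRED = 0x100000
ACTUAL = 0x300000
image = bytearray(16)
struct.pack_into("<I", image, 4, PREFERRED + 0x100)

patched = apply_relocations(bytes(image), [4], PREFERRED, ACTUAL)
value = struct.unpack_from("<I", patched, 4)[0]
print(hex(value))  # 0x300100: the pointer moved by the 0x200000 delta
```

When the actual base equals the preferred base, the delta is zero and the image is left unchanged, which is why loading at the preferred address needs no extra work.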
Further details on the format of the relocation data with the PE file, and the structure of such files generally can be found here: https://learn.microsoft.com/en-us/previous-versions/ms809762(v=msdn.10)#pe-file-base-relocations
I'm studying Windows system internals and the question is just a guess.
I have learned that a DLL is a form of shared library, so at least the code section of the same DLL is shared between the processes using it (by adding the same page entries to the page tables of these processes). The code section usually contains things like jump tables, which need to be relocated (i.e. the run-time virtual address is written in to fix the pointer) before the code is ready to be executed.
Assume that the same DLL aa.dll is mapped in two different processes at different virtual addresses (e.g. a.exe 0x00400000, b.exe 0x00410000). The same pointer (at .text+0x100) will be fixed up to different addresses (e.g. a.exe 0x00400100, b.exe 0x00410100). So we have to make a copy of the code section and patch it for each process. Then how can the code section be shared?
Am I right?
Answering my own question. The first time a DLL is loaded, Windows tries to load it at the preferred address, which does not require relocation (i.e. fixing up addresses because the code segment is located at x). If it cannot be loaded at the preferred address, it is allocated virtual pages at a free address, backed by the DLL file itself (not the swap file) but marked as copy-on-write. Now Windows has to go and fix up the machine code using the relocation table. Hopefully only a small percentage of the code needs to be fixed up; each page that is changed is copied on write and placed into physical memory somewhere.
Each time a process cannot load a DLL at the preferred address, I believe this process would happen. This is why sometimes popular DLLs need to be rebased so that their preferred addresses don't conflict.
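To get a feel for the copy-on-write cost described above, one can count how many pages a set of fixups would touch. This is a rough sketch under stated assumptions: `pages_touched` and `private_fraction` are hypothetical names, the page size is assumed to be 4 KiB, each fixup is assumed to be 4 bytes wide, and the offsets are invented for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_touched(reloc_offsets, fixup_size=4, page_size=PAGE_SIZE):
    """Return the set of page indices written to by the given fixups."""
    pages = set()
    for off in reloc_offsets:
        pages.add(off // page_size)
        # A fixup that straddles a page boundary dirties both pages.
        pages.add((off + fixup_size - 1) // page_size)
    return pages


def private_fraction(image_size, reloc_offsets, fixup_size=4,
                     page_size=PAGE_SIZE):
    """Fraction of the image's pages that would become private copies."""
    total_pages = -(-image_size // page_size)  # ceiling division
    touched = pages_touched(reloc_offsets, fixup_size, page_size)
    return len(touched) / total_pages


# Invented example: a 64 KiB image with four fixups.
offsets = [0x10, 0x20, 0x1FFE, 0x5000]
print(sorted(pages_touched(offsets)))       # [0, 1, 2, 5]
print(private_fraction(0x10000, offsets))   # 0.25
```

Pages that contain no fixups stay clean and remain shared with other processes, so the extra memory grows with the number of distinct pages touched rather than with the number of relocation entries.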
In Windows, if two processes each use the same DLL, then apparently each process separately loads the DLL into its address space, while in Linux a shared object is loaded once and mapped into different processes. Can someone explain to me the pros and cons of the Windows approach?
I'm not sure the difference is so stark. Windows shares everything except the data segment between all users of a DLL by loading the DLL once and mapping the shared parts into each process. However, any global data in the DLL is loaded separately for each process so that processes don't unintentionally share data. I'd be surprised if Linux wasn't very similar; otherwise shared libraries could pose significant security risks, not to mention potential reliability issues as well. Here are a couple of references:
From stackoverflow:
Are .dll files loaded once for every program or once for all programs?
From wikipedia:
http://en.wikipedia.org/wiki/Dynamic-link_library
I can readily dump the entire memory space of a process using various tools.
But is it possible to dump just the memory space used by a DLL loaded by some process? What tools should I use?
Thanks,
Jim
You probably mean looking at the memory allocated by code in the DLL.
I think this is impossible. If the DLL allocates memory, and the DLL is written in C++, and the C/C++ Run Time is dynamically linked (i.e. as a DLL), then it will use the same C/C++ Run Time as the main application, and all memory allocated by the DLL will come from the same heap.
Even if the DLL would have the C/C++ Run Time statically linked, or the DLL is written in a different language, it will probably use the same default Windows heap.
If you have control over the DLL yourself, you could try to implement a custom memory manager for your DLL (in C++ this means overriding new and delete, 6 global operators in total), make it use a different (i.e. non-default) Windows heap, and then use the heapwalk methods of the low-level Windows debugger WinDbg, but it will be quite difficult to get this all working. Alternatively, your DLL's custom memory manager could allocate memory at a fixed address using VirtualAlloc (or at a non-fixed address, logging the virtual address it gets back). Then you can look at this address range in the normal process memory dump.