In answer to an earlier question about mapping non-contiguous blocks of files into contiguous memory, one respondent suggested that I use VirtualAllocEx() with MEM_RESERVE to establish a 'safe' value for the final (lpBaseAddress) parameter of MapViewOfFileEx().
Further investigation revealed that this approach causes MapViewOfFileEx() to fail with error 487: "Attempt to access invalid address." The MSDN page says:
"No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc or VirtualAllocEx function to reserve memory."
While the documentation might be considered ambiguous with respect to valid sequences of calls, experimentation suggests that it is not valid to reserve memory for MapViewOfFileEx() using VirtualAllocEx().
On the web, I've found examples with hard-coded values - example:
#define BASE_MEM (VOID*)0x01000000
...
hMap = MapViewOfFileEx( hFile, FILE_MAP_WRITE, 0, 0, 0, BASE_MEM );
To me, this seems inadequate and unreliable... It is far from clear to me why this address is safe, or how many blocks can safely be mapped there. It seems even more shaky given that I need my solution to work in the context of other allocations... and that I need my source to compile and work in both 32-bit and 64-bit contexts.
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
You almost got to the solution by yourself but fell short of the last small step.
As you figured, use VirtualAlloc (with MEM_RESERVE) to find room in your address space, but after that (and before MapViewOfFileEx) use VirtualFree (with MEM_RELEASE). Now the address range will be free again. Then use the same memory address (returned by VirtualAlloc) with MapViewOfFileEx.
What you are trying to do is impossible.
From the MapViewOfFileEx docs, the pointer you supply is "A pointer to the memory address in the calling process address space where mapping begins. This must be a multiple of the system's memory allocation granularity, or the function fails."
The memory allocation granularity is 64K, so you cannot map disparate 4K pages from the file into adjacent 4K pages in virtual memory.
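To make the granularity constraint concrete, here is a minimal sketch of the alignment arithmetic in portable C. The 64 KiB constant is an assumption taken from the answer above (real code would read dwAllocationGranularity from GetSystemInfo()), and the helper names align_down, align_up and is_valid_base are hypothetical.

```c
#include <stdint.h>

/* Assumed value: 64 KiB, as stated in the answer above. Real Windows code
   should query dwAllocationGranularity via GetSystemInfo() instead. */
#define ASSUMED_GRANULARITY ((uint64_t)0x10000)

/* Hypothetical helper: largest multiple of the granularity that is <= x. */
static uint64_t align_down(uint64_t x)
{
    return x & ~(ASSUMED_GRANULARITY - 1);
}

/* Hypothetical helper: smallest multiple of the granularity that is >= x. */
static uint64_t align_up(uint64_t x)
{
    return (x + ASSUMED_GRANULARITY - 1) & ~(ASSUMED_GRANULARITY - 1);
}

/* Hypothetical helper: nonzero if x is a multiple of the granularity,
   i.e. could be passed as a base address at all. */
static int is_valid_base(uint64_t x)
{
    return (x & (ASSUMED_GRANULARITY - 1)) == 0;
}
```

Under that assumption, is_valid_base(0x01000000) holds but is_valid_base(0x01001000) does not: an address one 4K page past a 64K boundary cannot be the start of a view, which is why separately mapped 4K blocks cannot be placed back to back.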
If you provide a base address, the function will try to map your file at that address. If it cannot use that base address (because something is already using all or part of the requested memory region), then the call will fail.
For most applications, there's no real point trying to fix the address yourself. If you're a sophisticated database process and you're trying to carefully manage your own memory layout on a machine with a known configuration for efficiency reasons, then it might be reasonable. But you'd have to be prepared for failure.
In 64-bit processes, the virtual address space is pretty wide open, so it might be possible to select a base address with some certainty, but I don't think I'd bother.
From MSDN:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address.
I believe "over time" refers to future versions of the OS and whatever run-time libraries you're using (e.g., for memory allocation), which might take a different approach to memory layout.
Also:
If the lpBaseAddress parameter specifies a base offset, the function succeeds if the specified memory region is not already in use by the calling process. The system does not ensure that the same memory region is available for the memory mapped file in other 32-bit processes.
So basically, your instinct is right: specifying a base address is not reliable. You can try, but you must be prepared for failure.
So to directly answer your question:
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
No, there isn't. Not without applying many constraints on the runtime environment (e.g., limiting to a specific version of the OS, setting base addresses for all of your DLLs, disallowing DLL injection, etc.).
Related
On Linux x86_64, I have a simple application for which I track all memory accesses with Intel's PIN. The program uses only "a bit" of memory, most of it for dynamically allocated matrices (I've bisected the right value with ulimit). However, the memory accesses span the whole range of the VM address space, low addresses for what I presume global variables in the code, high addresses for the malloc()ed arrays.
There's a huge gap in the middle, and even in the high addresses the range is between 0x7fff4e1a37f4 and 0x7fea3af99000, which is much larger than what I would assume my application to use in total.
The post-processing that I need to do on the memory accesses deals very badly with these sparse accesses, so I'm looking for a way to restrict the virtual address range available to the process so that "it just fits", and accesses will show addresses between 0 and some more reasonable value for dynamically allocated memory (somewhere around the 40 MB that I've discovered through ulimit).
Q: Is there an easy way to limit the available address space (and hence implicitly, available memory) to an individual process on Linux, ideally from the command line on a per-process basis?
Further notes:
I can link my application statically.
Even if I limit the memory with ulimit, the process still uses the full VM address range (not entirely unexpected).
I know about /proc/${pid}/maps, but would like to avoid creating wrappers to deal with this, and how to actually use the data in there.
I've heard about prelink (which may not apply to my static binary, but only to libraries?) and can imagine that there are more intrusive ways to interfere with malloc(), but these solutions are too far out of my expertise to evaluate their usefulness (Limiting the heap area's Virtual address range, https://stackoverflow.com/a/29960208/60462)
If there's no simple command-line solution, instead of going for any elaborate hack, I'll probably just wing it in the post-processing and "normalize" the addresses (e.g. via a few lines of perl).
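One possible reading of "normalizing" the addresses, sketched in C rather than perl: lay the occupied regions end to end and translate each traced address into that dense space. The region table would have to come from somewhere such as /proc/${pid}/maps; the struct and function names here are hypothetical, and this is only a sketch of the idea, not something taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one occupied range, half-open [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Translate addr into a dense address space in which the regions are packed
   one after another starting at 0. 'regions' must be sorted by start and
   must not overlap. Returns UINT64_MAX if addr lies in none of them. */
static uint64_t normalize(const struct region *regions, size_t n, uint64_t addr)
{
    uint64_t base = 0;  /* dense offset at which regions[i] begins */
    for (size_t i = 0; i < n; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return base + (addr - regions[i].start);
        base += regions[i].end - regions[i].start;
    }
    return UINT64_MAX;
}
```

With a low region of 0x2000 bytes followed by a high region starting at 0x7fea3af99000, an access 0x10 bytes into the high region would be reported at 0x2010.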
ld.so(8) lists LD_PREFER_MAP_32BIT_EXEC, which makes the dynamic linker prefer the lower 2 GiB when mapping executable pages. That is significantly less than the normal 64-bit address space, but it applies only to those mappings and is possibly not small enough for your purposes.
It may also be possible to use the prctl(2) PR_SET_MM options to control the addresses of different parts of the program to solve your problem.
When a fork is called, the stack and heap are both copied from the parent process to the child process. Before using the fork system call, I malloc() some memory; let's say its address was A. After using the fork system call, I print the address of this memory in both parent and child processes. I see both are printing the same address: A. The child and parent processes are capable of writing any value to this address independently, and modification by one process is not reflected in the other process. To my knowledge, addresses are globally unique within a machine.
My question is: Why is it that the same address location A stores different values at the same time, even though the heap is copied?
There is a difference between the "real" (physical) memory address and the memory address you usually work with, i.e. the "virtual" memory address. Virtual memory is an abstraction provided by the operating system: each process has its own mapping from virtual pages to physical pages, which also allows the OS to move pages from RAM to disk (the page file) and back.
This allows the OS to continue operating even when RAM capacity has been reached, and to put a page at any location in RAM without changing your program's logic (otherwise, a pointer pointing to 0x1234 would suddenly point to 0x4321 after a page had been swapped back in).
When you fork your process, the child gets a copy of the parent's page mappings rather than an immediate copy of the memory itself. This allows for smarter strategies such as copy-on-write, where a page is physically copied only when one of the processes actually modifies it. So A is the same virtual address in both processes, but once either process writes to it, A refers to different physical memory in each.
One important aspect to mention is that forking should not change any memory addresses, since (e.g. in C) there can be quite a bit of pointer logic in your application, relying on the consistency of the memory you allocated. If the addresses were to suddenly change after forking, it would break most, if not all, of this pointer logic.
You can read more on this here: http://en.wikipedia.org/wiki/Virtual_memory or, if you're truly interested, I recommend reading "Operating Systems - Internals and Design Principles" by William Stallings, which should cover most things including why and how virtual memory is used.
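The behaviour described in the question can be reproduced with a short POSIX sketch. The function name fork_demo and its return convention are hypothetical; the point is only that the child writes through the same pointer value while the parent's copy stays untouched.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Allocates an int, forks, and lets the child overwrite
   it through the very same pointer value. Returns 1 if the child saw its own
   value while the parent still sees parent_value, 0 if not, -1 on setup
   failure. */
static int fork_demo(int parent_value, int child_value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = parent_value;

    pid_t pid = fork();
    if (pid < 0) {
        free(p);
        return -1;
    }
    if (pid == 0) {
        /* Child: same virtual address, but the write lands in a private
           copy of the page, not in the parent's memory. */
        *p = child_value;
        _exit(*p == child_value ? 0 : 1);
    }

    /* Parent: wait for the child, then check that *p was not changed. */
    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid
          && WIFEXITED(status)
          && WEXITSTATUS(status) == 0
          && *p == parent_value;
    free(p);
    return ok ? 1 : 0;
}
```

Calling fork_demo(1, 42) should return 1: the child stores 42 at address A, and the parent still reads 1 from the same address A afterwards.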
I'm implementing IPC between two processes on the same machine (Linux x86_64 shmget and friends), and I'm trying to maximize the throughput of the data between the processes: for example I have restricted the two processes to only run on the same CPU, so as to take advantage of hardware caching.
My question is, does it matter where in the virtual address space each process puts the shared object? For example would it be advantageous to map the object to the same location in both processes? Why or why not?
It doesn't matter as far as the OS is concerned. It would have been advantageous to use the same base address in both processes if the TLB cache wasn't flushed between context switches. The Translation Lookaside Buffer (TLB) cache is a small buffer that caches virtual to physical address translations for individual pages in order to reduce the number of expensive memory reads from the process page table. Whenever a context switch occurs, the TLB cache is flushed - you don't want processes to be able to read a small portion of the memory of other processes, just because its page table entries are still cached in the TLB.
No context switch occurs between processes running on different cores. But then each core has its own TLB cache, and its content is completely uncorrelated with the content of the TLB cache of the other core. A TLB flush does not occur when switching between threads of the same process, but threads share their whole virtual address space anyway.
It only makes sense to attach the shared memory segment at the same virtual address if you pass around absolute pointers to areas inside it. Imagine, for example, a linked list structure in shared memory. The usual practice is to use offsets from the beginning of the block instead of absolute pointers. But this is slower as it involves additional pointer arithmetic. That's why you might get better performance with absolute pointers, but finding a suitable place in the virtual address space of both processes might not be an easy task (at least not in a portable way), even on platforms with vast VA spaces like x86-64.
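As a rough illustration of the offset approach, here is a sketch of a linked list whose links are byte offsets from the start of the block. The node layout and all function names are hypothetical, and two ordinary heap buffers stand in for the two processes' attachments of the same segment at different addresses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: 'next' is a byte offset from the start of the
   block. The head lives at offset 0, so 0 can double as "end of list". */
struct node {
    uint32_t next;
    int32_t  value;
};

/* The extra pointer arithmetic mentioned above: base + offset. */
static struct node *node_at(void *base, uint32_t off)
{
    return (struct node *)((char *)base + off);
}

/* Build the list 1 -> 2 -> 3 at offsets 0, 16 and 32 of the block. */
static void build_list(void *base)
{
    struct node *a = node_at(base, 0);
    struct node *b = node_at(base, 16);
    struct node *c = node_at(base, 32);
    a->value = 1; a->next = 16;
    b->value = 2; b->next = 32;
    c->value = 3; c->next = 0;
}

/* Walk the list knowing only where this "process" sees the block. */
static long sum_list(void *base)
{
    long total = 0;
    uint32_t off = 0;
    do {
        struct node *n = node_at(base, off);
        total += n->value;
        off = n->next;
    } while (off != 0);
    return total;
}

/* Build at one address, copy the bytes to another, wipe the original and
   walk the copy. Returns the sum, or -1 if allocation failed. */
static long sum_after_relocation(void)
{
    void *first = calloc(1, 64);
    void *second = calloc(1, 64);
    long total = -1;
    if (first != NULL && second != NULL) {
        build_list(first);
        memcpy(second, first, 64);  /* same bytes, different base address */
        memset(first, 0, 64);       /* absolute pointers would now dangle */
        total = sum_list(second);
    }
    free(first);
    free(second);
    return total;
}
```

Because no absolute address is stored in the block, the copy can be walked correctly wherever it lives; the price is one addition per link followed.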
I'm not an expert here, but seeing as there are no other answers I will give it a go. I don't think it will really make a difference, because the virtual address does not necessarily correspond to the physical address. Said another way, the underlying physical address the OS maps your virtual address to is not dependent on the virtual address the OS gives you.
Again, I'm not a memory master. Sorry if I am way off here.
My main problem is that I need to enable multiple OS processes to communicate via a large shared memory heap that is mapped to identical address ranges in all processes. (To make sure that pointer values are actually meaningful.)
Now I run into trouble because part of the program/library uses standard malloc/free, and it seems to me that the underlying implementation does not respect mappings I create with mmap.
Or, another option is that I create mappings in regions that malloc already planned to use.
Unfortunately, I am not able to guarantee 100% identical malloc/free behavior in all processes before I establish the mmap-mappings.
This leads me to give the MAP_FIXED flag to mmap. The first process is using 0x0 as base address to ensure that the mapping range is at least somehow reasonable, but that does not seem to transfer to other processes. (The binary is also linked with -Wl,-no_pie.)
I tried to figure out whether I could query the system to know which pages it plans to use for malloc by reading up on malloc_default_zone, but that API does not seem to offer what I need.
Is there any way to ensure that malloc is not using particular memory pages/address ranges?
(It needs to work on OSX. Linux tips that point me in the right direction are appreciated, too.)
I notice this in the mmap documentation:
If MAP_FIXED is specified, a successful mmap deletes any previous mapping in the allocated address range
However, malloc won't use MAP_FIXED, so as long as you get in before malloc, you'd be okay. You could test whether a region is free by first trying to map it without MAP_FIXED; if that succeeds at the same address (which it will do if the address is free), then you can remap with MAP_FIXED knowing that you're not choosing a section of address space that malloc has already grabbed.
The only guaranteed way to ensure that the same block of logical memory will be available in two processes is to have one fork from the other.
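The probing step might look roughly like this. It is a sketch under the assumption of a POSIX system with anonymous mappings (MAP_ANON); the helper names reserve_exactly_at and demo_reserve are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANON on glibc in strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: try to reserve 'len' bytes at exactly 'want' WITHOUT
   MAP_FIXED, so that an existing mapping is never silently replaced.
   Returns 'want' on success, or NULL if the call failed or the kernel chose
   a different address because the range was not free. */
static void *reserve_exactly_at(void *want, size_t len)
{
    void *got = mmap(want, len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != want) {
        munmap(got, len);  /* not where we asked: give it back */
        return NULL;
    }
    return got;
}

/* Hypothetical self-check: find an address known to be free, reserve it,
   then confirm that a second attempt at the same address is refused.
   Returns 1 if both observations hold, 0 if not, -1 on setup failure. */
static int demo_reserve(void)
{
    size_t len = (size_t)1 << 20;
    void *hint = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (hint == MAP_FAILED)
        return -1;
    munmap(hint, len);                             /* hint is free again */

    void *first = reserve_exactly_at(hint, len);   /* should succeed */
    void *second = reserve_exactly_at(hint, len);  /* range now occupied */
    int ok = (first == hint) && (second == NULL);
    if (first != NULL)
        munmap(first, len);
    return ok;
}
```

Once reserve_exactly_at() has returned the requested address, the range is held by a placeholder of your own, so mapping the real shared object over it with MAP_FIXED replaces only that placeholder and nothing malloc owns.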
However, if you're compiling with 64-bit pointers, then you can just pick an (unusual) region of memory, and hope for the best, since the chance of collision is tiny.
See also this question about valid address spaces.
The OpenBSD malloc() implementation uses mmap() for memory allocation. I suggest you look at how it works, then write your own custom implementation of malloc() and make your program and the libraries it uses call your implementation.
Here is OpenBSD malloc():
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc.c?rev=1.140
The title says pretty much all of it: is there a way to get the lowest free virtual memory address under Windows? I should add that I am interested in this information at the beginning of the program (before any dynamic memory allocation has been done).
Why I need it: I am trying to build a malloc implementation under Windows. If it is not possible, I would have to rely on whatever VirtualAlloc() returns when given NULL as its first parameter. While you would expect it to do something sensible, like allocating memory at the bottom of what is available, there are no guarantees.
This can be implemented yourself by using VirtualQuery looking for pages that are marked as free. It would be relatively slow though. (You will also need to consider allocation granularity which is different from page size.)
I will say that unless you need contiguous blocks of memory, trying to keep everything close together is mostly meaningless: even if two pages of virtual memory are next to each other in the address space, there is no reason to assume they are close to each other in physical memory. In fact, even if they are close to each other at some point in time, once those pages get moved to backing store and then faulted back into memory, they will not necessarily land at the same physical addresses.
The OS uses more complicated metrics than just what is the "lowest" memory address available. Specifically, VirtualAlloc allocates pages of memory, so depending on how much you're asking for, at least one page of unused address space has to be available at the starting address. So even if you think there's a "lower" address that it should have used, that address might not have been compatible with the operation that you asked for.