Determining the memory access pattern of a program - memory-management

I want to extract all the virtual memory addresses accessed by a program, in the sequence in which they are accessed. Is there some tool/system call on Windows (preferred) or Linux which would let me do this?
Say there is a program P which accesses memory addresses m1, m2, m3, etc. I want to log which addresses were accessed and the sequence in which they were accessed. I also want to know whether each address belongs to the stack or the heap.
Thanks!

What you need is called a dynamic binary instrumentation tool. A well-known one, which I am currently working with, is Intel Pin. Be warned that you will have to do a lot of reading and review the examples in order to produce something functional.

Restricting virtual address range of a process?

On Linux x86_64, I have a simple application for which I track all memory accesses with Intel's PIN. The program uses only "a bit" of memory, most of it for dynamically allocated matrices (I've bisected the right value with ulimit). However, the memory accesses span the whole range of the VM address space: low addresses for what I presume are global variables in the code, high addresses for the malloc()ed arrays.
There's a huge gap in the middle, and even within the high addresses the range spans from 0x7fea3af99000 to 0x7fff4e1a37f4, which is much larger than what I would assume my application uses in total.
The post-processing that I need to do on the memory accesses deals very badly with these sparse accesses, so I'm looking for a way to restrict the virtual address range available to the process so that "it just fits", and accesses will show addresses between 0 and some more reasonable value for dynamically allocated memory (somewhere around the 40 MB that I've discovered through ulimit).
Q: Is there an easy way to limit the available address space (and hence implicitly, available memory) to an individual process on Linux, ideally from the command line on a per-process basis?
Further notes:
I can link my application statically.
Even if I limit the memory with ulimit, the process still uses the full VM address range (not entirely unexpected).
I know about /proc/${pid}/maps, but would like to avoid creating wrappers to deal with this, and how to actually use the data in there.
I've heard about prelink (which may not apply to my static binary, but only to libraries?) and can imagine that there are more intrusive ways to interfere with malloc(), but these solutions are too far out of my expertise to evaluate their usefulness (Limiting the heap area's Virtual address range, https://stackoverflow.com/a/29960208/60462).
If there's no simple command-line solution, instead of going for an elaborate hack, I'll probably just wing it in the post-processing and "normalize" the addresses (e.g., via a few lines of Perl).
ld.so(8) lists LD_PREFER_MAP_32BIT_EXEC as a way to restrict the address space to the lower 2GiB which is significantly less than the normal 64-bit address space, but possibly not small enough for your purposes.
It may also be possible to use the prctl(2) PR_SET_MM options to control the addresses of different parts of the program to solve your problem.

Questions about memory management of a process

I have a few questions about operating systems. I Googled a lot but was not able to find any answers. Can anyone please help me?
Q1. How much memory is made available to a user program by the kernel, is there any limit to it?
Q2. What is the range of addresses a user program can have at max, what determines it?
Q3. What happens if excess memory is allocated to a user program, say malloc in an infinite loop?
Q1. How much memory is made available to a user program by the kernel, is there any limit to it?
Varies. In a modern system, this is limited by process quotas, system parameters, and the page file size (and ultimately by the virtual address space size of the hardware).
Q2. What is the range of addresses a user program can have at max, what determines it?
Varies. This is determined by both the hardware and the operating system's configuration of the page tables. Some CPU types assign a fixed range of user addresses within the total virtual address space. Others have a single range of logical addresses and allow the OS to divide it up into user and system addresses. The latter is more flexible. The former allows paging of page tables.
Q3. What happens if excess memory is allocated to a user program, say malloc in an infinite loop?
The allocation requests will fail. In the case of malloc, you'd get a null pointer returned. At the system service level, you'd get a failure code.

MapViewOfFileEx - valid lpBaseAddress

In answer to a question about mapping non-contiguous blocks of files into contiguous memory, here, it was suggested by one respondent that I should use VirtualAllocEx() with MEM_RESERVE in order to establish a 'safe' value for the final (lpBaseAddress) parameter for MapViewOfFileEx().
Further investigation revealed that this approach causes MapViewOfFileEx() to fail with error 487: "Attempt to access invalid address." The MSDN page says:
"No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc or VirtualAllocEx function to reserve memory."
While the documentation might be considered ambiguous with respect to valid sequences of calls, experimentation suggests that it is not valid to reserve memory for MapViewOfFileEx() using VirtualAllocEx().
On the web, I've found examples with hard-coded values - example:
#define BASE_MEM (VOID*)0x01000000
...
hMap = MapViewOfFileEx( hFile, FILE_MAP_WRITE, 0, 0, 0, BASE_MEM );
To me, this seems inadequate and unreliable. It is far from clear to me why this address is safe, or how many blocks can safely be mapped there. It seems even shakier given that I need my solution to work in the context of other allocations, and that I need my source to compile and work in both 32- and 64-bit contexts.
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
You almost got to the solution by yourself but fell short of the last small step.
As you figured, use VirtualAlloc (with MEM_RESERVE) to find room in your address space, but after that (and before MapViewOfFileEx) use VirtualFree (with MEM_RELEASE). Now the address range will be free again. Then use the same memory address (returned by VirtualAlloc) with MapViewOfFileEx.
What you are trying to do is impossible.
From the MapViewOfFileEx docs, the pointer you supply is "A pointer to the memory address in the calling process address space where mapping begins. This must be a multiple of the system's memory allocation granularity, or the function fails."
The memory allocation granularity is 64K, so you cannot map disparate 4K pages from the file into adjacent 4K pages in virtual memory.
If you provide a base address, the function will try to map your file at that address. If it cannot use that base address (because something is already using all or part of the requested memory region), then the call will fail.
For most applications, there's no real point trying to fix the address yourself. If you're a sophisticated database process and you're trying to carefully manage your own memory layout on a machine with a known configuration for efficiency reasons, then it might be reasonable. But you'd have to be prepared for failure.
In 64-bit processes, the virtual address space is pretty wide open, so it might be possible to select a base address with some certainty, but I don't think I'd bother.
From MSDN:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address.
I believe "over time" refers to future versions of the OS and whatever run-time libraries you're using (e.g., for memory allocation), which might take a different approach to memory layout.
Also:
If the lpBaseAddress parameter specifies a base offset, the function succeeds if the specified memory region is not already in use by the calling process. The system does not ensure that the same memory region is available for the memory mapped file in other 32-bit processes.
So basically, your instinct is right: specifying a base address is not reliable. You can try, but you must be prepared for failure.
So to directly answer your question:
What I'd like to know is if there is any way to reliably reserve a pool of address space in order that - subsequently - it can be reliably used by MapViewOfFileEx to map blocks to explicit memory addresses.
No, there isn't. Not without applying many constraints on the runtime environment (e.g., limiting to a specific version of the OS, setting base addresses for all of your DLLs, disallowing DLL injection, etc.).

How to ensure that malloc and mmap cooperate, i.e., work on non-overlapping memory regions?

My main problem is that I need to enable multiple OS processes to communicate via a large shared memory heap that is mapped to identical address ranges in all processes. (To make sure that pointer values are actually meaningful.)
Now, I run into trouble that part of the program/library is using standard malloc/free and it seems to me that the underlying implementation does not respect mappings I create with mmap.
Or, another option is that I create mappings in regions that malloc already planned to use.
Unfortunately, I am not able to guarantee 100% identical malloc/free behavior in all processes before I establish the mmap-mappings.
This leads me to give the MAP_FIXED flag to mmap. The first process is using 0x0 as base address to ensure that the mapping range is at least somehow reasonable, but that does not seem to transfer to other processes. (The binary is also linked with -Wl,-no_pie.)
I tried to figure out whether I could query the system to know which pages it plans to use for malloc by reading up on malloc_default_zone, but that API does not seem to offer what I need.
Is there any way to ensure that malloc is not using particular memory pages/address ranges?
(It needs to work on OS X. Linux tips which guide me in the right direction are appreciated, too.)
I notice this in the mmap documentation:
If MAP_FIXED is specified, a successful mmap deletes any previous mapping in the allocated address range
However, malloc won't use MAP_FIXED, so as long as you get in before malloc, you'd be okay. You could test whether a region is free by first trying to map it without MAP_FIXED; if that succeeds at the same address (which it will if the address is free), then you can remap it with MAP_FIXED, knowing that you're not choosing a section of address space that malloc had already grabbed.
The only way to guarantee that the same block of logical memory will be available in two processes is to have one fork from the other.
However, if you're compiling with 64-bit pointers, then you can just pick an (unusual) region of memory, and hope for the best, since the chance of collision is tiny.
See also this question about valid address spaces.
The OpenBSD malloc() implementation uses mmap() for memory allocation. I suggest you look at how it works, then write your own custom implementation of malloc() and tell your program and the libraries it uses to use your implementation instead.
Here is OpenBSD malloc():
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc.c?rev=1.140

Getting the lowest free virtual memory address in windows

The title pretty much says it all: is there a way to get the lowest free virtual memory address under Windows? I should add that I am interested in this information at the beginning of the program (before any dynamic memory allocation has been done).
Why I need it: I am trying to build a malloc implementation under Windows. If it is not possible, I would have to rely on whatever VirtualAlloc() returns when given NULL as its first parameter. While you would expect it to do something sensible, like allocating memory at the bottom of the available space, there are no guarantees.
This can be implemented yourself by using VirtualQuery looking for pages that are marked as free. It would be relatively slow though. (You will also need to consider allocation granularity which is different from page size.)
I will say that unless you need contiguous blocks of memory, trying to keep everything close together is mostly meaningless: even if two pages of virtual memory are next to each other in the address space, there is no reason to assume they are close to each other in physical memory. In fact, even if they are close at some point in time, if those pages are moved to backing store and then faulted back in, they will not necessarily be faulted to the same physical pages.
The OS uses more complicated metrics than just what is the "lowest" memory address available. Specifically, VirtualAlloc allocates pages of memory, so depending on how much you're asking for, at least one page of unused address space has to be available at the starting address. So even if you think there's a "lower" address that it should have used, that address might not have been compatible with the operation that you asked for.
