Implement Virtual Memory with Memory Mapped Files

Implement Virtual Memory with Memory Mapped Files - windows

Is it possible to wrap up memory mapped files something like this?
TVirtualMemoryManager = class
public
function AllocMem (Size : Integer) : Pointer;
procedure FreeMem (Ptr : Pointer);
end;
Since the memory mapped file API functions all take offsets I don't know how to manage the free areas in the memory mapped files. My only idea is to implement some kind of basic memory management (mainting free lists for different block sizes) but I don' t know how efficient this will be.
EDIT: What I really want (as David made clear to me) is this:
IVirtualMemory = interface
function ReadMem (Addr : Int64) : TBytes;
function AllocateMem (Data : TBytes) : Int64;
procedure FreeMem (Addr : Int64);
end;
I need to store continous blocks of bytes (each relatively small) in virtual memory and be able to read them back into memory using a 64-bit adress. Most of the time access is read-only. If a write is necessary I would just use FreeMem followed by AllocMem since the size will be different anyway.
I want a wrapper for a memory mapped file with this interface. Internally it has a handle to a memory mapped files and uses MapViewOfFile on each ReadMem request. The Addr 64-bit integers are just offsets into the memory mapped file. The open question is how to assign those adresses - I currently keep a list of free blocks that I maintain.

Your proposal that "Internally it has a handle to a memory mapped files and uses MapViewOfFile on each ReadMem request" will be just a waste of CPU resource, IMHO.
It is worth saying that your GetMem / FreeMem requirement won't be able to break the 3/4 GB barrier. Since all allocated memory will be mapped into memory until a call to FreeMem, you'll be short of memory space, just as with the regular Delphi memory manager. The best you can do is to rely of FastMM4, and change your program to reduce its memory use.
IMHO you'll have to change/update your specification. For instance, your "updated" question sounds just like a regular storage problem.
What you want is to be able to allocate more than 3/4 GB of data for your application. You have a working implementation of such a feature in our SynBigTable open source unit. This is a fast and light NoSQL solution in pure Delphi.
It is able to create a file of any size (only 64 bit limited), then will map the content of each record into memory, on request. It will use a memory mapping of the file, if possible. You can implement your interface very directly with TSynBigTable methods: ReadMem=Get, AllocMem=Add, FreeMem=Delete. The IDs will be your pointer-like values, and RawByteString will be used instead of TBytes.
You can access any block of data using an integer ID, or a string ID, or even use a sophisticated field layout (inside the record, or as in-memory metadata - including indexes and fast search).
Or rely on a regular embedded SQL database. For instance, SQLite3 is very good at handling BLOB fields, and is able to store huge amount of data. With a simple in-memory caching mechanism for most used records, it could be a powerful solution.

Related

How to overwrite portions of a DriverKit OSData internal buffer?

The documentation of OSData says that "...You can add bytes to them and overwrite portions of the byte array.". I can see a method to append bytes, but I don't understand how I am able to overwrite a portion of the buffer.
Another option would be to use IONewZero to allocate a number of elements of the type I need. I my case I just need this for ints.
Example:
T* dataBuffer = IONewZero(T, SIZE);
And then deallocate with:
IOSafeDeleteNULL(dataBuffer_, T, SIZE);
What are the advantages of using an OSData object compared to the solution with IONewZero / IOSafeDeleteNULL?

I think the documentation might just be copy-pasted from the kernel variant of OSData. I've seen that in a bunch of places, especially USBDriverKit.
OSData is mostly useful for dealing with plist-like data structures (i.e. setting and getting properties on service objects) in conjunction with the other OSTypes: OSArray, OSDictionary, OSNumber, etc. It's also used for in-band (<= 4096 byte) "struct" arguments of user client external methods.
The only use I can see outside of those scenarios is when you absolutely have to reference-count a blob of data. But it's certainly not a particularly convenient or efficient container for data-in-progress. If you subsequently need to send the data to a device or map it to user space, IOBufferMemoryDescriptor is probably a better choice (and also reference counted) though it's even more heavyweight.

accessing process memory parts

I'm currently studying memory management of OS by the video lecture. The instructor says,
In fact, you may have, and it is quite often the case that there may
be several parts of the process memory, which are not even accessed at
all. That is, they are neither executed, loaded or stored from memory.
I don't understand the saying since even if in a simple C program, we access whole address space of it. Don't we?
#include <stdio.h>
int main()
{
printf("Hello, World!");
return 0;
}
Could you elucidate the saying? If possible could you provide an example program wherein "several parts of the process memory, which are not even accessed at all" when it is run.

Imagine you have a large and complicated utility (e.g. a compiler), and the user asks it for help (e.g. they type gcc --help instead of asking it to compile anything). In this case, how much of the utility's code and data is used?
Most programs have various optional parts that aren't used (e.g. maybe something that works with graphics will have some code for 16 bits per pixel and other code for 32 bits per pixel, and will determine which code to use and not use the other code). Most heap allocators are "eager" (e.g. they'll ask the OS for 20 MiB of space and then might only "malloc() 2 MiB of it). Sometimes a program will memory map a huge file but then only access a small part of it.
Even for your trivial "hello world" example code; the virtual address space probably contains a huge (several MiB) shared library to support lots of C standard library functions (e.g. puts(), fprintf(), sprintf(), ...) and your program only uses a small part of that shared library; and your program probably reserves a conservative amount of space for its stack (e.g. maybe 20 KiB of space for its stack) and then probably only uses a few hundred bytes of stack.

In a virtual memory system, the address space of the process is created in secondary store at start up. Little or nothing gets placed in memory. For example, the operating system may use the executable file as the page file for the code and static data. It just sets up an internal structure that says some range of memory is mapped to these blocks in the executable file. The same goes for shared libraries. The other data gets mapped to the page file.
As your program runs it starts page faulting rapidly because nothing is in memory and the operating system has to load it from secondary storage.
If there is something that your program does not reference, it never gets loaded into memory.
If you had global variable declared like
char somedata [1045] ;
and your program never references that variable, it will never get loaded into memory. The same goes for code. If you have pages of code that done get execute (e.g. error handling code) it does not get loaded. If you link to shared libraries, you will likely bece including a lot of functions that you never use. Likewise, they will not get loaded if you do not execute them.

To begin with, not all of the address space is backed by physical memory at all times, especially if your address space covers 248+ bytes, which your computer doesn't have (which is not to say you can't map most of the address space to a single physical page of memory, which would be of very little utility for anything).
And then some portions of the address space may be purposefully permanently inaccessible, like a few pages near virtual address 0 (to catch NULL pointer dereferences).
And as it's been pointed out in the other answers, with on-demand loading of programs, you may have some portions of the address space reserved for your program but if the program doesn't happen to need any of its code or data there, nothing needs to be loader there either.

Does mlock prevent the page from appearing in a core dump?

I have a process with some sensitive memory which must never be written to disk.
I also have a requirement that I need core dumps to satisfy first-time data capture requirements of my client.
Does locking a page using mlock() prevent the page from appearing in a core dump?
Note, this is an embedded system and we don't actually have any swap space.

Taken from man 2 madvise:
The madvise() system call advises the kernel about how to handle
paging input/output in the address range beginning at address addr and
with size length bytes. It allows an application to tell the kernel
how it expects to use some mapped or shared memory areas, so that the
kernel can choose appropriate read-ahead and caching techniques.
This call does not influence the semantics of the application (except
in the case of MADV_DONTNEED), but may influence its performance. The
kernel is free to ignore the advice.
Particularly check the option MADV_DONTDUMP :
Exclude from a core dump those pages in the range specified by addr
and length. This is useful in applications that have large areas of
memory that are known not to be useful in a core dump. The effect of
MADV_DONTDUMP takes precedence over the bit mask that is set via the
/proc/PID/coredump_filter file (see core(5)).

Sharing GlobalAlloc() memory from DLL to multiple Win32 applications

I want to move my caching library to a DLL and allow multiple applications to share a single pointer allocated within the DLL using GlobalAlloc(). How could I accomplish this, and would it result in a significant performance decrease?

You could certainly do this and there won't be any performance implication for a single pointer.
Rather than use GlobalAlloc, a legacy API, you should opt for a different shared heap. For example the simplest to use is the COM allocator, CoTaskMemAlloc. Or you can use HeapAlloc passing the process heap obtained by GetProcessHeap.
For example, and neglecting to show error checking:
void *mem = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
Note that you only need to worry about heap sharing if you expect the memory to be deallocated in a different module from where it was created. If your DLL both creates and destroys the memory then you can use plain old malloc. Because all modules live in the same process address space, memory allocated by any module in that process, can be used by any other module.
Update
I failed on first reading of the question to pick up on the possibility that you may be wanting multiple process to have access to the same memory. If that's what you need then it is only possible with memory mapped files, or perhaps with some form of IPC.

Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?

Consider a complex, memory hungry, multi threaded application running within a 32bit address space on windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnMapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that windows does not (as far as I know) provide any handshake type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried Calling GetProcessHeaps then using HeapLock to lock all of the heaps)
What I'd like to know is there anyway to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control

You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.

Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFileto use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detour Library that I'm not directly familiar with however. I know that detouring is surprisingly common. Security software, virus scanners etc all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;
HANDLE WINAPI GetProcessHeapImpl() {
return g_hndPrivateHeap;
}
struct SDetourGetProcessHeap { // object for exception safety
SDetourGetProcessHeap() {
// put detour in place
}
~SDetourGetProcessHeap() {
// remove detour again
}
};
void MapFile() {
g_hndPrivateHeap = HeapCreate( ... );
{
SDetourGetProcessHeap d;
MapViewOfFile(...);
}
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf

Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documention for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.

As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio