How do I create an instance of a class, variably sized, at a specific memory location? - c++11

I'm working on a project involving writing packets to a memory-mapped file. Our current strategy is to create a packet class containing the following members
uint32_t packetHeader;
uint8_t packetPayload[];
uint32_t packetChecksum;
When we create a packet, first we'd like to have its address in memory be a specified offset within the memory mapped file, which I think can be done with placement-new(). However, we'd also like for the packetPayload not to be a pointer to some memory from the heap, but contiguous with the rest of the class (so we can avoid memcpying from heap to our eventual output file)
i.e.
Memory
Beginning of class | BOC + 4 | (length of Payload) |
Header Payload Checksum
Would this be achievable using a length argument for the Packet class constructor? Or would we have to template this class for variably sized payloads?

Forget about trying to make that the layout of your class. You'll be fighting against the C++ language all day long. Instead, write a class that provides access to the binary layout (in shared memory). The class instance itself will not be in shared memory, and the byte range in shared/mapped memory will not be a C++ object at all; it just exists within the file mapping address range.
Presumably the length is fixed from the moment of creation? If so, then you can safely cache the length, pointer to the checksum, etc in your accessor object. Since this cache isn't inside the file, you can store whatever you want however you want without concern for its layout. You can even use virtual member functions, because the v-table is going in the class instance, not the range of the binary file.
Also, given that this lives in shared memory, if there are multiple writers you'll have to be very careful to synchronize between them. If you're just prepositioning a buffer in shared/mapped memory to avoid a copy later, but totally handing off ownership between processes so that the data is never shared by simultaneous accesses, it will be easier. You also probably want to calculate the checksum once after all the data is written, instead of trying to keep it up-to-date (and risking data races in the process) for every single write into the buffer.
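To make the accessor idea concrete, here is a minimal sketch. The name PacketView, its member functions, and the byte-sum checksum are all assumptions made for illustration, not part of the original answer. The view caches the payload length outside the mapped range and uses memcpy for the header and checksum, since the checksum may be unaligned when the payload length is not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical accessor for a packet laid out as
// [4-byte header][payload bytes][4-byte checksum] inside a mapped region.
// The view object itself lives outside the mapping.
class PacketView {
public:
    // base: start of the packet inside the mapped range, assumed valid for
    // totalSize(payloadLen) bytes. payloadLen is fixed at creation.
    PacketView(std::uint8_t* base, std::size_t payloadLen)
        : base_(base), payloadLen_(payloadLen) {}

    static std::size_t totalSize(std::size_t payloadLen) {
        return sizeof(std::uint32_t) + payloadLen + sizeof(std::uint32_t);
    }

    void setHeader(std::uint32_t h) { std::memcpy(base_, &h, sizeof h); }
    std::uint32_t header() const { return load(base_); }

    std::uint8_t* payload() { return base_ + sizeof(std::uint32_t); }
    std::size_t payloadSize() const { return payloadLen_; }

    // Compute the checksum once, after all payload bytes are written.
    // A plain byte sum stands in for whatever checksum the format uses.
    void finalize() {
        std::uint32_t sum = 0;
        const std::uint8_t* p = base_ + sizeof(std::uint32_t);
        for (std::size_t i = 0; i < payloadLen_; ++i) sum += p[i];
        std::memcpy(checksumPtr(), &sum, sizeof sum);
    }
    std::uint32_t checksum() const { return load(checksumPtr()); }

private:
    static std::uint32_t load(const std::uint8_t* p) {
        std::uint32_t v;
        std::memcpy(&v, p, sizeof v);
        return v;
    }
    std::uint8_t* checksumPtr() const {
        return base_ + sizeof(std::uint32_t) + payloadLen_;
    }

    std::uint8_t* base_;      // points into the mapped range
    std::size_t payloadLen_;  // cached outside the file
};
```

In real use, base would be the mapping's start address plus the chosen offset; any plain byte buffer works for trying it out.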

First remember, that you need to know what your payload length is, somehow. Either you specify it in your instance somewhere, or you template your class over the payload length.
Having said that - you will need one of:
packetPayload being a pointer
A payload length member
A checksum offset member
and you'll want to use a named constructor idiom which takes the allocation length, and performs both the allocation and the setup of the offset/length/pointer member to a value corresponding to the length.
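A rough sketch of the named constructor idiom with a payload length member follows. The class name Packet, the factory withPayloadLength, and the 4-byte header and checksum sizes are assumptions for illustration. It allocates from the heap only to keep the example self-contained; the same factory shape could instead place the bytes at an offset inside a mapped file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

class Packet {
    enum : std::size_t { kHeaderSize = 4, kChecksumSize = 4 };

public:
    // Named constructor: performs the allocation and records the payload
    // length in one step, so the two can never disagree.
    static Packet withPayloadLength(std::size_t payloadLen) {
        const std::size_t total = kHeaderSize + payloadLen + kChecksumSize;
        std::unique_ptr<std::uint8_t[]> bytes(new std::uint8_t[total]());
        return Packet(std::move(bytes), payloadLen);
    }

    std::size_t payloadLength() const { return payloadLen_; }
    std::size_t checksumOffset() const { return kHeaderSize + payloadLen_; }
    std::size_t totalSize() const { return checksumOffset() + kChecksumSize; }

    std::uint8_t* payload() { return bytes_.get() + kHeaderSize; }
    const std::uint8_t* data() const { return bytes_.get(); }

private:
    // Private constructor: callers must go through withPayloadLength().
    Packet(std::unique_ptr<std::uint8_t[]> bytes, std::size_t payloadLen)
        : bytes_(std::move(bytes)), payloadLen_(payloadLen) {}

    std::unique_ptr<std::uint8_t[]> bytes_;  // header + payload + checksum
    std::size_t payloadLen_;
};
```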

Related

How to overwrite portions of a DriverKit OSData internal buffer?

The documentation of OSData says that "...You can add bytes to them and overwrite portions of the byte array.". I can see a method to append bytes, but I don't understand how I am able to overwrite a portion of the buffer.
Another option would be to use IONewZero to allocate a number of elements of the type I need. In my case I just need this for ints.
Example:
T* dataBuffer = IONewZero(T, SIZE);
And then deallocate with:
IOSafeDeleteNULL(dataBuffer, T, SIZE);
What are the advantages of using an OSData object compared to the solution with IONewZero / IOSafeDeleteNULL?
I think the documentation might just be copy-pasted from the kernel variant of OSData. I've seen that in a bunch of places, especially USBDriverKit.
OSData is mostly useful for dealing with plist-like data structures (i.e. setting and getting properties on service objects) in conjunction with the other OSTypes: OSArray, OSDictionary, OSNumber, etc. It's also used for in-band (<= 4096 byte) "struct" arguments of user client external methods.
The only use I can see outside of those scenarios is when you absolutely have to reference-count a blob of data. But it's certainly not a particularly convenient or efficient container for data-in-progress. If you subsequently need to send the data to a device or map it to user space, IOBufferMemoryDescriptor is probably a better choice (and also reference counted) though it's even more heavyweight.

Working of mmap()

I am trying to understand how memory mapping takes place using the mmap system call.
So far I know that mmap takes arguments from the user and returns a logical address where the file is mapped. When the user accesses that address, it is looked up in the page table, converted to a physical address, and the requested operation is carried out.
However, I found articles that give a code example and a theoretical explanation.
What it mentions is the memory mapping is carried out as:
A. Using system call mmap ()
B. file operations using (struct file *filp, struct vm_area_struct *vma)
What I am trying to figure out is:
How are the arguments passed in the mmap system call used in the struct vm_area_struct *vma? More generally, how are these two related?
For instance, struct vm_area_struct has fields such as starting address, ending address, permissions, etc. How are the values sent by the user used to fill in these fields?
I am trying to write a driver, so: does the kernel fill in the fields of the structure for us, so that I simply use it to call and pass values to remap_pfn_range?
And a more fundamental question: why is a separate file operation needed? The fact that mmap returns a virtual address means that it has already achieved a mapping, doesn't it?
Finally, I am not that clear about how the entire process works in user space as well as kernel space. Any documentation explaining the process in detail would be helpful.
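For orientation, here is a small user-space sketch of the call being asked about. The function name mmapRoundTrip and the temporary file path are assumptions for illustration, and it targets Linux/POSIX. The comments note where each argument would typically show up in the vm_area_struct that the kernel prepares before invoking a driver's mmap file operation; treat those as a summary rather than a full description of the kernel path.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps one page of a temporary file, writes through the mapping, and reads
// the bytes back with pread(). Returns true if the file saw the write.
bool mmapRoundTrip() {
    char path[] = "/tmp/mmap_sketch_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // the file disappears once fd is closed

    const long page = sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, page) != 0) { close(fd); return false; }

    // addr   = nullptr : the kernel picks the address (becomes vma->vm_start)
    // length = page    : rounded up to pages (vma->vm_end - vma->vm_start)
    // prot / flags     : reflected in vma->vm_flags and vma->vm_page_prot
    // offset = 0       : stored in pages as vma->vm_pgoff
    void* p = mmap(nullptr, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return false; }

    std::memcpy(p, "hello", 5);
    msync(p, page, MS_SYNC);

    char buf[5] = {};
    bool ok = pread(fd, buf, 5, 0) == 5 && std::memcmp(buf, "hello", 5) == 0;

    munmap(p, page);
    close(fd);
    return ok;
}
```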

Transfer a pointer through boost::interprocess::message_queue

What I am trying to do is have application A send application B a pointer to an object which A has allocated on shared memory ( using boost::interprocess ). For that pointer transfer I intend to use boost::interprocess::message_queue. Obviously a direct raw pointer from A is not valid in B so I try to transfer an offset_ptr allocated on the shared memory. However that also does not seem to work.
Process A does this:
typedef offset_ptr<MyVector> MyVectorPtr;
MyVectorPtr * myvector;
myvector = segment->construct<MyVectorPtr>( boost::interprocess::anonymous_instance )();
*myvector = segment->construct<MyVector>( boost::interprocess::anonymous_instance )
(*alloc_inst_vec); ;
// myvector gets filled with data here
//Send on the message queue
mq->send(myvector, sizeof(MyVectorPtr), 0);
Process B does this:
// Create a "buffer" on this side of the queue
MyVectorPtr * myvector;
myvector = segment->construct<MyVectorPtr>( boost::interprocess::anonymous_instance )();
mq->receive( myvector, sizeof(MyVectorPtr), recvd_size, priority);
As I see it, this does a bitwise copy of the offset pointer, which invalidates it in process B. How do I do this right?
It seems you can address it as described in this post on the boost mailing list.
I agree there is some awkwardness here and offset_ptr doesn't really work for what you are trying to do. offset_ptr is useful if the pointer itself is stored inside of another class/struct which also is allocated in your shared memory segment, but generally you have some top-level item which is not a member of some object allocated in shared memory.
You'll notice the offset_ptr example kind of glosses over this - it just has a comment "Communicate list to other processes" with no details. In some cases you may have a single named top-level object and that name can be how you communicate it, but if you have an arbitrary number of top-level objects to communicate, it seems like just sending the offset from the shared memory's base address is the best you can do.
You calculate the offset on the sending end, send it, and then add it to the base address on the receiving end. If you want to be able to send nullptr as well, you could do as offset_ptr does and agree that 1 is an offset that is sufficiently unlikely to be used, or pick another unlikely sentinel value.
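A minimal sketch of that offset scheme, independent of Boost. The helper names toOffset and fromOffset and the constant kNullOffset are made up for this example; base is whatever address the shared segment is mapped at in the calling process, and the resulting integer is what would go through the message queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset value reserved to mean "null", as suggested in the answer above.
// Assumption: no real object ever starts one byte past the segment base.
const std::uint64_t kNullOffset = 1;

// Sender side: convert a pointer into the segment to an offset from base.
std::uint64_t toOffset(const void* base, const void* ptr) {
    if (ptr == nullptr) return kNullOffset;
    return static_cast<std::uint64_t>(
        static_cast<const char*>(ptr) - static_cast<const char*>(base));
}

// Receiver side: rebuild a pointer from this process's own base address.
void* fromOffset(void* base, std::uint64_t offset) {
    if (offset == kNullOffset) return nullptr;
    return static_cast<char*>(base) + offset;
}
```

Each process passes its own mapping address as base, so the same offset resolves correctly even when the segment is mapped at different addresses. Boost.Interprocess managed segments also appear to offer get_handle_from_address and get_address_from_handle for the same purpose, which may be preferable to hand-rolled arithmetic.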

Partial unmap of Win32 memory-mapped file

I have some code (which I cannot change) that I need to get working in a native Win32 environment. This code calls mmap() and munmap(), so I have created those functions using CreateFileMapping(), MapViewOfFile(), etc., to accomplish the same thing. Initially this works fine, and the code is able to access files as expected. Unfortunately the code goes on to munmap() selected parts of the file that it no longer needs.
x = mmap(0, size, PROT_READ, MAP_SHARED, fd, 0);
...
munmap(x, hdr_size);
munmap(x + foo, bar);
...
Unfortunately, when you pass a pointer into the middle of the mapped range to UnmapViewOfFile() it destroys the entire mapping. Even worse, I can't see how I would be able to detect that this is a partial un-map request and just ignore it.
I have tried calling VirtualFree() on the range but, unsurprisingly, this produces ERROR_INVALID_PARAMETER.
I'm beginning to think that I will have to use static/global variables to track all the open memory mappings so that I can detect and ignore partial unmappings, but I hope you have a better idea...
edit:
Since I wasn't explicit enough above: the docs for UnmapViewOfFile do not accurately reflect the behavior of that function.
Un-mapping the whole view and remapping pieces is not a good solution because you can only suggest a base address for a new mapping, you can't really control it. The semantics of munmap() don't allow for a change to the base address of the still-mapped portion.
What I really need is a way to find the base address and size of an already-mapped memory area.
edit2: Now that I restate the problem that way, it looks like the VirtualQuery() function will suffice.
It is quite explicit in the MSDN Library docs for UnmapViewOfFile:
lpBaseAddress: A pointer to the base address of the mapped view of a file that is to be unmapped. This value must be identical to the value returned by a previous call to the MapViewOfFile or MapViewOfFileEx function.
You change the mapping by unmapping the old one and creating a new one. Unmapping bits and pieces isn't well supported, nor would it have any useful side effects from a memory management point of view. You don't want to risk getting the address space fragmented.
You'll have to do this differently.
You could keep track of each mapping and how many pages of it are still allocated by the client, and only free the mapping when that counter reaches zero. The middle sections would still be mapped, but it wouldn't matter since the client wouldn't be accessing that memory anyway.
Create a global dictionary of memory mappings through this interface. When a mapping request comes through, record the address, size and number of pages that are in the range. When an unmap request is made, find out which mapping owns that address and decrease the page count by the number of pages that are being freed. When that count reaches zero, really unmap the view.
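The bookkeeping described above might look roughly like this. MappingTracker and its members are hypothetical names, and the release callback stands in for the real UnmapViewOfFile call so that the sketch stays portable. It does not guard against the same pages being unmapped twice, which a real munmap() shim would probably want to detect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class MappingTracker {
public:
    // release: called with the view's base address once every page of the
    // view has been "unmapped" by the client.
    MappingTracker(std::size_t pageSize, std::function<void(void*)> release)
        : pageSize_(pageSize), release_(std::move(release)) {}

    // Record a new view: base address, size, and page count.
    void add(void* base, std::size_t size) {
        Entry e;
        e.size = size;
        e.pagesLeft = pagesFor(size);
        entries_[reinterpret_cast<std::uintptr_t>(base)] = e;
    }

    // Handle a (possibly partial) unmap request.
    // Returns true when this request released the whole view.
    bool unmap(void* addr, std::size_t size) {
        const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addr);
        auto it = entries_.upper_bound(a);
        if (it == entries_.begin()) return false;  // below every tracked view
        --it;                                      // last view starting <= a
        if (a >= it->first + it->second.size) return false;  // not inside it

        std::size_t pages = pagesFor(size);
        if (pages > it->second.pagesLeft) pages = it->second.pagesLeft;
        it->second.pagesLeft -= pages;
        if (it->second.pagesLeft != 0) return false;  // keep the view mapped

        release_(reinterpret_cast<void*>(it->first));
        entries_.erase(it);
        return true;
    }

private:
    struct Entry {
        std::size_t size;
        std::size_t pagesLeft;
    };

    std::size_t pagesFor(std::size_t bytes) const {
        return (bytes + pageSize_ - 1) / pageSize_;
    }

    std::size_t pageSize_;
    std::function<void(void*)> release_;
    std::map<std::uintptr_t, Entry> entries_;  // keyed by view base address
};
```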

Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?

Consider a complex, memory hungry, multi threaded application running within a 32bit address space on windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnmapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that windows does not (as far as I know) provide any handshake type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried Calling GetProcessHeaps then using HeapLock to lock all of the heaps)
What I'd like to know is: is there any way to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control
You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.
Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFile to use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detours library, though I'm not directly familiar with it. I know that detouring is surprisingly common. Security software, virus scanners, etc. all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;

HANDLE WINAPI GetProcessHeapImpl() {
    return g_hndPrivateHeap;
}

struct SDetourGetProcessHeap { // object for exception safety
    SDetourGetProcessHeap() {
        // put detour in place
    }
    ~SDetourGetProcessHeap() {
        // remove detour again
    }
};

void MapFile() {
    g_hndPrivateHeap = HeapCreate( ... );
    {
        SDetourGetProcessHeap d;
        MapViewOfFile(...);
    }
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf
Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documentation for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.
As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows
