How to map memory in DriverKit using IOMemoryDescriptor::CreateMapping?

I am trying to learn more about DriverKit and memory management, and I read this question:
How to allocate memory in a DriverKit system extension and map it to another process?
And I would like to understand how to use IOMemoryDescriptor::CreateMapping.
I wrote a little app to test this where I do (very simplified code):
uint8_t * buffer = new uint8_t[256];
for (int i = 0; i < 256; i++)
    buffer[i] = 0xC6;

clientData in, out;
in.nbytes = 256;
in.pbuffer = buffer;
size_t sout = sizeof(out);

IOConnectCallStructMethod(connection, selector, &in, sizeof(in), &out, &sout);
// out.pbuffer now has new values in it
In my kext user client class, I was doing (I am simplifying):
IOReturn UserClient::MyExtFunction(clientData * in, clientData * out, IOByteCount inSize, IOByteCount * outSize)
{
    MyFunction(in->nbytes, in->pbuffer); // this will change the content of pbuffer
    *out = *in;
}

IOReturn UserClient::MyFunction(SInt32 nBytesToRead, void* pUserBuffer, SInt32* nBytesRead)
{
    PrepareBuffer(nBytesToRead, &pUserBuffer);
    ...
    (call function that will fill the buffer)
}

IOReturn UserClient::PrepareBuffer(UInt32 nBytes, void** pBuffer)
{
    IOMemoryDescriptor * desc = IOMemoryDescriptor::withAddressRange((mach_vm_address_t)*pBuffer, nBytes, direction, owner task);
    desc->prepare();
    IOMemoryMap * map = desc->map();
    *pBuffer = (void*)map->getVirtualAddress();
    return kIOReturnSuccess;
}
This is what I don't know how to reproduce in a dext, and where I think I really don't understand the basics of CreateMapping.
Or is what I used to do not possible?
In my driver, this is where I don't know how to use CreateMapping and IOMemoryMap so this buffer can be mapped to a memory location and updated with different values.
I can create an IOBufferMemoryDescriptor but how do I tie it to the buffer from my application? I also don't understand the various options for CreateMapping.
Please note that in another test app I have successfully used IOConnectMapMemory64()/CopyClientMemoryForType() but I would like to learn specifically about CreateMapping().
(I hope it is alright I edited this question a lot... still new to StackOverflow)

Or is what I used to do not possible?
In short, no.
You're attempting to map arbitrary user process memory, which the client application did not explicitly mark as available to the driver using IOKit. This doesn't fit with Apple's ideas about safety, security, and sandboxing, so this sort of thing isn't available in DriverKit.
Obviously, kexts are all-powerful, so this was possible before, and indeed, I've used the technique myself in shipping drivers and then ran into trouble when porting said kexts to DriverKit.
The only ways to gain direct access to the client process's memory, as far as I'm aware, are:
By passing buffers >= 4097 bytes as struct input or output arguments to IOConnectCall…Method()s so they arrive as IOMemoryDescriptors in the driver. Note that these can be retained in the driver longer term, but at least for input structs, updates on the user space side won't be reflected on the driver side as a copy-on-write mapping is used. So they should be used purely for sending data in the intended direction.
By the user process mapping an existing IOMemoryDescriptor into its space using IOConnectMapMemory64()/CopyClientMemoryForType().
This does mean you can't use indirect data structures like the one you are using. You'll have to use "packed" structures, or indices into long-lasting shared buffers.
By "packed" structures I mean buffers containing a header struct such as your clientData which is followed in contiguous memory by further data, such as your buffer, referencing it by offset into this contiguous memory. The whole contiguous memory block can be passed as an input struct.
I have filed feedback with Apple requesting a more powerful mechanism for exchanging data between user clients and dexts; I have no idea if it will be implemented, but if such a facility would be useful, I recommend you do the same, explaining what you'd like to use it for, with examples. The more of us report it, the more likely it is to happen. (IOMemoryDescriptor::CreateSubMemoryDescriptor() was added after I filed a request for it; I won't claim I was the first to do so, or that Apple wasn't planning to add it prior to my suggestion, but they are actively improving the DriverKit APIs.)
Original answer before question was edited to be much more specific:
(Retained because it explains in general terms how buffer arguments to external methods are handled, which is likely helpful for future readers.)
Your question is a little vague, but let me see if I can work out what you did in your kext, vs what you're doing in your dext:
You're calling IOConnectCallStructMethod(connection, selector, buffer, 256, NULL, NULL); in your app. This means buffer is passed as a "struct input" argument to your external method.
Because your buffer is 256 bytes long, which is less than or equal to sizeof(io_struct_inband_t), the contents of the buffer are sent to the kernel in-band - in other words, they are copied at the time of the IOConnectCallStructMethod() call.
This means that in your kext's external method dispatch function, the struct input is passed via the structureInput/structureInputSize fields in the incoming IOExternalMethodArguments struct. structureInput is a pointer in the kernel context and can be dereferenced directly. The pointer is only valid during execution of your method dispatch, and can't be used once the method has returned synchronously.
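In a kext, accessing such an in-band struct input looks roughly like this (a sketch; MyUserClient is an illustrative name, clientData is the struct from the question):

IOReturn MyUserClient::externalMethod(uint32_t selector, IOExternalMethodArguments* args,
                                      IOExternalMethodDispatch* dispatch, OSObject* target, void* reference)
{
    if (args->structureInput != nullptr && args->structureInputSize >= sizeof(clientData)) {
        // Valid only until this method returns:
        const clientData* in = reinterpret_cast<const clientData*>(args->structureInput);
        // ... use in->nbytes etc. ...
    }
    return super::externalMethod(selector, args, dispatch, target, reference);
}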
If you need to use the buffer for device I/O, you may need to wrap it in an IOMemoryDescriptor. One way to do this is indeed via IOMemoryDescriptor::CreateMapping().
If the buffer was 4097 bytes or larger, it would be passed via the structureInputDescriptor IOMemoryDescriptor, which can either be passed along to device I/O directly, or memory-mapped for dereferencing in the kernel. This memory descriptor directly references the user process's memory.
DriverKit extensions are considerably more limited in what they can do, but external method arguments arrive in almost exactly the same way.
Small structs arrive via the IOUserClientMethodArguments' structureInput field, which points to an OSData object. You can access the content via the getBytesNoCopy()/getLength() methods.
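In a dext, that access might look roughly like this (a sketch based on the standard ExternalMethod override; MyUserClient is an illustrative name):

kern_return_t MyUserClient::ExternalMethod(uint64_t selector, IOUserClientMethodArguments* arguments,
                                           const IOUserClientMethodDispatch* dispatch, OSObject* target, void* reference)
{
    if (arguments->structureInput != nullptr) {
        const void* bytes = arguments->structureInput->getBytesNoCopy();
        size_t length = arguments->structureInput->getLength();
        // ... parse or copy bytes/length here ...
    }
    return kIOReturnSuccess;
}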
If you need this data in an IOMemoryDescriptor for onward I/O, the only way I know of is to create an IOBufferMemoryDescriptor using either IOUSBHostDevice::CreateIOBuffer() or IOBufferMemoryDescriptor::Create() and then copying the data from the OSData object into the buffer.
Large buffers again arrive already wrapped in an IOMemoryDescriptor, via the structureInputDescriptor field. You can pass this on to I/O functions, or map it into the driver's address space using CreateMapping().
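A minimal sketch of that mapping step, assuming arguments->structureInputDescriptor is non-null (the zeros request default options, kernel-chosen placement, zero offset, the descriptor's whole length, and default alignment):

IOMemoryMap* map = nullptr;
kern_return_t ret = arguments->structureInputDescriptor->CreateMapping(
    0 /* options */, 0 /* address: let the kernel choose */, 0 /* offset */,
    0 /* length: whole descriptor */, 0 /* alignment */, &map);
if (ret == kIOReturnSuccess) {
    uint64_t address = map->GetAddress();
    uint64_t length  = map->GetLength();
    // ... access the client's bytes through address/length ...
    OSSafeReleaseNULL(map);
}

The longer example below, taken from a real dext's user client, shows the related pattern of creating an IOBufferMemoryDescriptor, mapping it with Map(), and copying the input bytes into it: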

namespace
{
/*
**********************************************************************************
** create a memory descriptor and map its address
**********************************************************************************
*/
IOReturn arcmsr_userclient_create_memory_descriptor_and_map_address(const void* address, size_t length, IOMemoryDescriptor** memory_descriptor)
{
    IOBufferMemoryDescriptor* buffer_memory_descriptor = nullptr;
    uint64_t buffer_address;
    uint64_t len;

#if ARCMSR_DEBUG_IO_USER_CLIENT
    arcmsr_debug_print("ArcMSRUserClient: *******************************************************\n");
    arcmsr_debug_print("ArcMSRUserClient: ** IOUserClient IOMemoryDescriptor create_with_bytes \n");
    arcmsr_debug_print("ArcMSRUserClient: *******************************************************\n");
#endif

    if (!address || !memory_descriptor) {
        return kIOReturnBadArgument;
    }
    if (IOBufferMemoryDescriptor::Create(kIOMemoryDirectionInOut, length, 0, &buffer_memory_descriptor) != kIOReturnSuccess) {
        OSSafeReleaseNULL(buffer_memory_descriptor);
        return kIOReturnError;
    }
    if (buffer_memory_descriptor->Map(0, 0, 0, 0, &buffer_address, &len) != kIOReturnSuccess) {
        OSSafeReleaseNULL(buffer_memory_descriptor);
        return kIOReturnError;
    }
    if (length != len) {
        OSSafeReleaseNULL(buffer_memory_descriptor);
        return kIOReturnNoMemory;
    }
    memcpy(reinterpret_cast<void*>(buffer_address), address, length);
    *memory_descriptor = buffer_memory_descriptor;
    return kIOReturnSuccess;
}
} /* namespace */

Related

External Procedure memory allocations

I have some existing functions written in C that take pointers to two arrays and do some calculations on the data. I wanted to call those functions from PL/SQL as external procedures. Our data is stored in Oracle as a BLOB. So, I made a wrapper shared object to be called from PL/SQL. It felt like overkill to pass the BLOB into every wrapper function and parse it into these two arrays with every single external procedure call. So, instead I made a function called ParseBlobToArrays
which looks like:
int OracleBlobToDataArrays(OCIExtProcContext* ctx, OCILobLocator* blob,
                           /* some other stuff I can't post, used to parse the blob, */
                           unsigned int* address1, unsigned int* address2)
{
    // this isn't all the code, error checking etc. here as well
    unsigned char* buf = OCIExtProcAllocCallMemory(ctx, lobLen);
    // read blob into allocated buf
    double* arr1 = ParseToArray1(buf /* , some additional params */);
    float** arr2 = ParseToArray2(buf /* , some additional params */);
    *address1 = (unsigned int)((uintptr_t) arr1);
    *address2 = (unsigned int)((uintptr_t) arr2);
    return 0; // real code returns a status
}
I then return the addresses and feed them into each subsequent function that needs them and set them appropriately:
double OtherFunctionWrapper(unsigned int address1, unsigned int address2)
{
    double* arr1 = (double*)address1;
    float** arr2 = (float**)address2;
    return DoCalculation(arr1, arr2);
}
I then call a Free wrapper function I wrote which takes the addresses, casts again to appropriate types and then frees all the pointers.
There is a code smell I don't like:
The C code is passing memory addresses around like this even though PL/SQL doesn't seem to support a uintptr_t/64-bit binary integer (or expose raw addresses at all). Casting the address to a UINT32 and returning it would, I believe, potentially blow up when moving to a 64-bit OS in the future, if the address doesn't fit in a UINT32.
I could parse the data into arrays and return them to PL/SQL as OCIColls, but then I need to use OCI specific data types etc. This is safer (I think?), but I already have the BLOB parsed and in memory to be used. Seems silly to pass data back to PL/SQL as PL/SQL supported types just to be passed in to other functions, when all I really need is a pointer to data that already exists in the format I want in memory.
Any advice on handling this differently? If the right option is to do everything with supported PL/SQL datatypes and return them accordingly, I will. But I am looking for other options as well to minimize duplicate code with each call to various calculation functions since speed of these external procedures is very important.
Thanks!
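One pointer-free alternative the question hints at is a handle table inside the shared object: the parsed arrays stay in process memory, and PL/SQL only ever sees a small integer handle, so pointer width stops being PL/SQL's concern. A rough sketch, with all names hypothetical (none of this is OCI API), assuming the arrays are allocated with something that outlives a single call (memory from OCIExtProcAllocCallMemory is freed when the call returns):

#define MAX_HANDLES 64

typedef struct {
    double* arr1;
    float** arr2;
    int in_use;
} ParsedData;

static ParsedData g_table[MAX_HANDLES];

/* Returns a small integer that fits in any PL/SQL integer type, or -1 if full. */
static int alloc_handle(double* arr1, float** arr2)
{
    int i;
    for (i = 0; i < MAX_HANDLES; i++) {
        if (!g_table[i].in_use) {
            g_table[i].arr1 = arr1;
            g_table[i].arr2 = arr2;
            g_table[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

static void free_handle(int h)
{
    if (h >= 0 && h < MAX_HANDLES && g_table[h].in_use) {
        /* free arr1/arr2 with whatever allocator created them */
        g_table[h].in_use = 0;
    }
}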

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that used to run on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit to the usual acquire-release ordering:
class RWSync {
    std::atomic<int> version;   // incremented after every modification
    std::atomic_bool invalid;   // true during write
public:
    RWSync() : version(0), invalid(false) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            do { // wait until the object is valid
                currentVersion = version.load(std::memory_order_acquire);
            } while (invalid.load(std::memory_order_acquire));
            lambda();
            std::atomic_thread_fence(std::memory_order_seq_cst);
            // check if something changed
        } while (version.load(std::memory_order_acquire) != currentVersion
                 || invalid.load(std::memory_order_acquire));
    }

    void beginWrite() {
        invalid.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void endWrite() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        version.fetch_add(1, std::memory_order_release);
        invalid.store(false, std::memory_order_release);
    }
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea as the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid), you may use a single version variable with the semantics "odd values are invalid". This is known as the "sequential lock" (seqlock) mechanism.
Reducing the number of atomic variables simplifies things a lot:
class RWSync {
    // Incremented before and after every modification.
    // Odd values mean the object is in an invalid state.
    std::atomic<int> version;
public:
    RWSync() : version(0) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            currentVersion = version.load(std::memory_order_seq_cst);
            // This may reduce calls to lambda(), nothing more
            if (currentVersion | 1) continue;
            lambda();
            // Repeat until something changed or object is in an invalid state.
        } while ((currentVersion | 1) ||
                 version.load(std::memory_order_seq_cst) != currentVersion);
    }

    void beginWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Invalidation requires sequential order
        version.store(currentVersion + 1, std::memory_order_seq_cst);
    }

    void endWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Release order is sufficient for marking an object as valid
        version.store(currentVersion + 1, std::memory_order_release);
    }
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all previous modifications of the object have completed. Release memory order is sufficient for that.
beginWrite() makes sure that the reader will detect the object being in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order, which is why the reader uses seq_cst memory order too.
As for fences, it is better to incorporate them into the preceding/following atomic operation: the compiler knows how to make the result fast.
Explanations of some modifications to the original code:
1) An atomic read-modify-write like fetch_add() is intended for cases when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very costly architecture-specific mechanisms.
An atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which differentiate load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provides a total order between these accesses.
The accepted answer is not correct. I guess the code should be something like "currentVersion & 1" instead of "currentVersion | 1". And there is a subtler mistake: the reader thread can go into lambda(), and after that the writer thread could run beginWrite() and write a value to the non-atomic variable. In this situation the write to the payload and the read of the payload have no happens-before relationship. Concurrent access (without a happens-before relationship) to a non-atomic variable is a data race. Note that the single total order of memory_order_seq_cst does not imply a happens-before relationship; the two are consistent with each other, but they are two different things.
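For reference, a reader loop with the fixes this comment suggests might look like the sketch below. Note that the payload reads inside lambda() are still formally a data race on a torn snapshot under the C++ memory model, so lambda() must be side-effect free and its results discarded on retry:

template<typename F> void sync(F lambda) {
    int v1, v2;
    do {
        do { // spin while a write is in progress (odd version)
            v1 = version.load(std::memory_order_acquire);
        } while (v1 & 1);
        lambda(); // may observe a torn payload; discard on retry
        std::atomic_thread_fence(std::memory_order_acquire);
        v2 = version.load(std::memory_order_relaxed);
    } while (v1 != v2); // a writer intervened; try again
}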

How can I safely append data to a sk_buff for IPTables target

I am working on a Linux kernel module that needs to modify network packets and append an extra header. I already implemented the modification part and recomputed the checksums, and it worked nicely. But I don't know how to safely append an extra header. If my input packet is something like:
ip-header / tcp-header / data
I would like to have an output packet like:
ip-header / tcp-header / my-header / data
From what I read, I think I need something like the following code. I wrote my specific questions in the code as comments. My general concern is whether the code I am writing here is memory-safe, or what I should do to append the new header in a memory-safe way. Also, if I am doing something wrong or there is a better way to do it, I will appreciate the comment. I have tried to find examples but no luck so far. Here is the code:
static unsigned int my_iptables_target(struct sk_buff *skb, const struct xt_action_param *par) {
    const struct xt_mytarget_info *info = par->targinfo;

    /* Some code ... */

    if (!skb_make_writable(skb, skb->len)) {
        // Drop the packet
        return NF_DROP;
    }
    struct newheader* myheader;
    // Check if there is enough space and do something about it
    if (skb_headroom(skb) < sizeof(struct newheader)) {
        // So there is not enough space.
        /* I don't know well what to put here. I read that a function called pskb_expand_head might
         * do the job. I do not understand very well how it works, or why it might fail (return value
         * different from zero). Does this code work:
         */
        if (pskb_expand_head(skb, sizeof(struct newheader) - skb_headroom(skb), 0, GFP_ATOMIC) != 0) {
            // What does it mean if the code reaches this point?
            return NF_DROP;
        }
    }
    // At this point, there should be enough space
    skb_push(skb, sizeof(struct newheader));
    /* I also think that skb_push() creates space at the beginning; to open space between the header and
     * the body I guess I must move the network/transport headers up. Perhaps something like this:
     */
    memcpy(skb->data, skb->data + sizeof(struct newheader), size_of_all_headers - sizeof(struct newheader));
    // Then set myheader address and fill data.
    myheader = skb->data + size_of_all_headers;
    // Then just set the new header, and recompute checksums.
    return XT_CONTINUE;
}
I assumed that the variable size_of_all_headers contains the size in bytes of the network and transport headers. I also think that memcpy copies bytes in increasing order, so that call shouldn't be a problem. So, does the above code work? Is it all memory-safe? Are there better ways to do it? Are there examples (or can you provide one) that do something like this?
I used code similar to the one in the question, and so far it has worked very well for all the tests I have done. To answer some of the specific questions, I used something like:
if (skb_headroom(skb) < sizeof(struct newheader)) {
    printk("I got here!\n");
    if (pskb_expand_head(skb, sizeof(struct newheader) - skb_headroom(skb), 0, GFP_ATOMIC) != 0) {
        printk("And also here\n");
        return NF_DROP;
    }
}
But none of the print statements ever executed. I suppose that happens because the OS already reserves enough headroom given the limits of the IP header, but I think it is better to leave that if statement in, to grow the packet if necessary.
The other difference of the code that I tested and worked is that instead of moving all the other headers up to create a space for my header, I chose to move the body of the packet down.
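A sketch of that body-down variant might look like the following (unverified; size_of_all_headers is the same assumption as in the question, and the tailroom check mirrors the headroom check above):

/* Grow the packet at the tail instead of the head. */
if (skb_tailroom(skb) < sizeof(struct newheader)) {
    if (pskb_expand_head(skb, 0, sizeof(struct newheader) - skb_tailroom(skb), GFP_ATOMIC) != 0)
        return NF_DROP;
}
skb_put(skb, sizeof(struct newheader));
/* Shift the payload toward the tail; memmove is required because the regions overlap. */
memmove(skb->data + size_of_all_headers + sizeof(struct newheader),
        skb->data + size_of_all_headers,
        skb->len - size_of_all_headers - sizeof(struct newheader));
struct newheader *myheader = (struct newheader *)(skb->data + size_of_all_headers);
/* Fill *myheader, update the IP total length, and recompute checksums. */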

create device mapper target

I am trying to implement device mapper target by referring to the already existing ones dm-linear, dm-snapshot, dm-cache etc. In my implementation, I need to perform a read/modify/write operation on a certain sector range. Since the device mapper directly talks to the block layer, I am not sure what data structures/functions to use to read the sectors in the memory, modify the buffer and write it back to another sector range.
At the application level, we have syscalls and below we have vfs_read/vfs_write. Is there anything similar for device mapper layer?
I have been stuck here for very long. Any help will be appreciated.
NOTE: My answer applies to kernel versions < 3.14, because the API changed slightly in 3.14.
In the kernel, you read/write sectors with struct bio. This struct is used for all block-level I/O. Comprehensive documentation can be found in the kernel tree and on LWN. These are the most significant members of this structure:
bio->bi_sector - first sector of the block I/O request
bio->bi_size - size of the I/O request
bio->bi_bdev - device to read/write
bio->bi_end_io - callback that the kernel will call at the end of the request
What you do in a device-mapper target is map incoming bios. When you create your device-mapper target, you supply at least two callbacks: ctr and map. For example, the simplest device-mapper target, dm-zero, declares its callbacks as follows:
static struct target_type zero_target = {
    .name    = "zero",
    .version = {1, 1, 0},
    .module  = THIS_MODULE,
    .ctr     = zero_ctr,
    .map     = zero_map,
};
map is the key callback - it's the heart of every device-mapper target. map receives an incoming bio and can do anything with it. For example, dm-linear just shifts the sector of every incoming bio by a predefined offset. See the code:
static sector_t linear_map_sector(struct dm_target *ti, sector_t bi_sector)
{
    struct linear_c *lc = ti->private;

    return lc->start + dm_target_offset(ti, bi_sector);
}

static void linear_map_bio(struct dm_target *ti, struct bio *bio)
{
    struct linear_c *lc = ti->private;

    bio->bi_bdev = lc->dev->bdev;
    if (bio_sectors(bio))
        bio->bi_sector = linear_map_sector(ti, bio->bi_sector);
}

static int linear_map(struct dm_target *ti, struct bio *bio)
{
    linear_map_bio(ti, bio);

    return DM_MAPIO_REMAPPED;
}
Because map receives a pointer to the bio, it can change values under that pointer, and that's it.
That's how you map I/O requests. If you want to create your own requests, then you must allocate a bio; fill in its sector, device, size, and end callback; and add buffers to read into or write from. Basically, it's just a few steps:
Call bio_alloc() to allocate the bio.
Set bio->bi_bdev, bio->bi_sector, bio->bi_size, bio->bi_end_io.
Add pages via bio_add_page().
Call submit_bio().
Handle results and errors in the bio->bi_end_io callback.
An example can be found in the dm-crypt target, in the crypt_alloc_buffer function.
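Put together, and still for the pre-3.14 API described above, those steps might look roughly like this sketch (bdev, sector, and page are assumed to be set up by the caller):

static void my_end_io(struct bio *bio, int error)
{
    /* called when the request completes; runs in interrupt context */
    if (error)
        pr_err("my target: I/O error %d\n", error);
    /* ... process the data read into the page here ... */
    bio_put(bio);
}

static void my_read_sector(struct block_device *bdev, sector_t sector, struct page *page)
{
    struct bio *bio = bio_alloc(GFP_NOIO, 1);   /* 1 = room for one bio_vec */

    bio->bi_bdev = bdev;
    bio->bi_sector = sector;
    bio->bi_end_io = my_end_io;
    bio_add_page(bio, page, PAGE_SIZE, 0);      /* also sets bio->bi_size */
    submit_bio(READ, bio);
}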

How to convert a shared_ptr<FILE> to a FILE* in C++?

I am trying to use a FILE pointer multiple times throughout my application,
so I thought I'd create a function and obtain the pointer through that. Basically I have this bit of code
FILE* fp;
_wfopen_s (&fp, L"ftest.txt", L"r");
_setmode (_fileno(fp), _O_U8TEXT);
wifstream file(fp);
which is repeated and now instead I want to have something like this:
wifstream file(SetFilePointer(L"ftest.txt",L"r"));
....
wofstream output(SetFilePointer(L"flist.txt",L"w"));
and for the function :
FILE* SetFilePointer(const wchar_t* filePath, const wchar_t* openMode)
{
    shared_ptr<FILE> fp = make_shared<FILE>();
    _wfopen_s (fp.get(), L"ftest.txt", L"r");
    _setmode (_fileno(fp.get()), _O_U8TEXT);
    return fp.get();
}
this simply doesn't work. I tried using &*fp instead of fp.get(), but still no luck.
You aren't supposed to create FILE instances with new and destroy them with delete, like make_shared does. Instead, FILEs are created with fopen (or in this case, _wfopen_s) and destroyed with fclose. These functions do the allocating and deallocating internally using some unspecified means.
Note that _wfopen_s does not take a pointer but a pointer to pointer - it changes the pointer you gave it to point to the new FILE object it allocates. You cannot get the address of the pointer contained in shared_ptr to form a pointer-to-pointer to it, and this is a very good thing - it would horribly break the ownership semantics of shared_ptr and lead to memory leaks or worse.
However, you can use shared_ptr to manage arbitrary "handle"-like types, as it can take a custom deleter object or function:
FILE* tmp;
shared_ptr<FILE> fp;
if (_wfopen_s(&tmp, L"ftest.txt", L"r") == 0) {
    // Note that we use the shared_ptr constructor, not make_shared
    fp = shared_ptr<FILE>(tmp, std::fclose);
} else {
    // Remember to handle errors somehow!
}
Please do take a look at the link #KerrekSB gave; it covers this same idea in more detail.
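Wrapping that idea into the kind of helper the question asks for might look like this sketch (returning the shared_ptr itself, so ownership stays managed; OpenFile is an illustrative name):

std::shared_ptr<FILE> OpenFile(const wchar_t* filePath, const wchar_t* openMode)
{
    FILE* tmp = nullptr;
    if (_wfopen_s(&tmp, filePath, openMode) != 0)
        return nullptr;                      // or report the error
    _setmode(_fileno(tmp), _O_U8TEXT);
    return std::shared_ptr<FILE>(tmp, std::fclose);
}

// Usage: keep the shared_ptr alive for as long as the stream uses the FILE*.
auto fp = OpenFile(L"ftest.txt", L"r");
wifstream file(fp.get());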
