share data with tasklet bottom half in linux driver - linux-kernel

I am writing a Linux PHY driver that handles packet timestamping. The bottom half calculates timestamps and sends this info to the kernel networking stack and then to user space. The bottom half needs some information from the skb (packet) which the caller of the tasklet has. I am having difficulty passing this skb to the tasklet: the tasklet handler function does not take any input other than an unsigned long. I am stuck here. Below is a code snippet for your understanding -
static void tx_ts_task(unsigned long val)
{
    struct phyts *phyts = container_of(&val, struct phyts, int_flags);
    //skb_copy(skb); // want to access skb in this tasklet but I am unable to do this.
    .
    .
}

int tx_timestamp(struct phyts *phyts, struct sk_buff *skb, int len)
{
    .
    .
    tasklet_schedule(&tx_ts_tasklet);
}
Appreciate your inputs. Thanks

The tasklet function receives the same data parameter that was specified in DECLARE_TASKLET/tasklet_init. Usually this is a pointer to some (large) driver struct.
So basically, you can't pass runtime data between the ISR and the tasklet directly; you should use some sort of shared variable (perhaps the above-mentioned struct) with proper locking.
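For illustration, a minimal sketch of that approach might look like the following (the tx_queue field, the init snippet, and the return value are assumptions for the example, not taken from your driver; it needs <linux/interrupt.h> and <linux/skbuff.h>). The skb is handed over through a queue embedded in the shared phyts struct, and the pointer to that struct is the tasklet's data argument:

struct phyts {
    struct sk_buff_head tx_queue;         /* skbs waiting for timestamp processing */
    struct tasklet_struct tx_ts_tasklet;
    /* ... */
};

static void tx_ts_task(unsigned long data)
{
    struct phyts *phyts = (struct phyts *)data;
    struct sk_buff *skb;

    /* drain everything queued by the top half */
    while ((skb = skb_dequeue(&phyts->tx_queue)) != NULL) {
        /* ... compute the timestamp and push it up the stack ... */
    }
}

int tx_timestamp(struct phyts *phyts, struct sk_buff *skb, int len)
{
    skb_queue_tail(&phyts->tx_queue, skb);   /* skb_queue_* take their own lock */
    tasklet_schedule(&phyts->tx_ts_tasklet);
    return 0;
}

/* once, in probe/init: */
skb_queue_head_init(&phyts->tx_queue);
tasklet_init(&phyts->tx_ts_tasklet, tx_ts_task, (unsigned long)phyts);

struct sk_buff_head already contains its own spinlock, and skb_queue_tail()/skb_dequeue() take it with interrupts disabled, so no extra locking is needed for this particular handover; any other fields shared between the two contexts still need their own protection.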

Related

How to map memory in DriverKit using IOMemoryDescriptor::CreateMapping?

I am trying to learn more about DriverKit and memory management, and I read this question:
How to allocate memory in a DriverKit system extension and map it to another process?
And I would like to understand how to use IOMemoryDescriptor::CreateMapping.
I wrote a little app to test this where I do (very simplified code):
uint8_t * buffer = new uint8_t[256];
for (int i = 0 ; i < 256 ; i++)
    buffer[i] = 0xC6;

clientData in, out;
in.nbytes = 256;
in.pbuffer = buffer;
size_t sout = sizeof(out);

IOConnectCallStructMethod(connection, selector, &in, sizeof(in), &out, &sout);
// out.pbuffer now has new values in it
In my Kext user client class, I was doing (I am simplifying):
IOReturn UserClient::MyExtFunction(clientData * in, clientData * out, IOByteCount inSize, IOByteCount * outSize)
{
    MyFunction(in->nBytes, in->pbuffer); // this will change the content of pbuffer
    *out = *in;
}

IOReturn UserClient::MyFunction(SInt32 nBytesToRead, void* pUserBuffer, SInt32* nBytesRead)
{
    PrepareBuffer(nBytesToRead, &pBuffer);
    ...
    (call function that will fill pBuffer)
}

IOReturn UserClient::PrepareBuffer(UInt32 nBytes, void** pBuffer)
{
    IOMemoryDescriptor * desc = IOMemoryDescriptor::withAddressRange((mach_vm_address_t)*pBuffer, nBytes, direction, owner task);
    desc->prepare();
    IOMemoryMap * map = desc->map();
    *pBuffer = (void*)map->getVirtualAddress();
    return kIOReturnSuccess;
}
This is what I don't know how to reproduce in a DExt, and where I think I really don't understand the basics of CreateMapping.
Or is what I used to do not possible?
In my driver, this is where I don't know how to use CreateMapping and IOMemoryMap so this buffer can be mapped to a memory location and updated with different values.
I can create an IOBufferMemoryDescriptor but how do I tie it to the buffer from my application? I also don't understand the various options for CreateMapping.
Please note that in another test app I have successfully used IOConnectMapMemory64()/CopyClientMemoryForType() but I would like to learn specifically about CreateMapping().
(I hope it is alright I edited this question a lot... still new to StackOverflow)
Or is what I used to do not possible?
In short, no.
You're attempting to map arbitrary user process memory, which the client application did not explicitly mark as available to the driver using IOKit. This doesn't fit with Apple's ideas about safety, security, and sandboxing, so this sort of thing isn't available in DriverKit.
Obviously, kexts are all-powerful, so this was possible before, and indeed, I've used the technique myself in shipping drivers and then ran into trouble when porting said kexts to DriverKit.
The only ways to gain direct access to the client process's memory, as far as I'm aware, are:
By passing buffers >= 4097 bytes as struct input or output arguments to IOConnectCall…Method()s so they arrive as IOMemoryDescriptors in the driver. Note that these can be retained in the driver longer term, but at least for input structs, updates on the user space side won't be reflected on the driver side as a copy-on-write mapping is used. So they should be used purely for sending data in the intended direction.
By the user process mapping an existing IOMemoryDescriptor into its space using IOConnectMapMemory64()/CopyClientMemoryForType().
This does mean you can't use indirect data structures like the one you are using. You'll have to use "packed" structures, or indices into long-lasting shared buffers.
By "packed" structures I mean buffers containing a header struct such as your clientData which is followed in contiguous memory by further data, such as your buffer, referencing it by offset into this contiguous memory. The whole contiguous memory block can be passed as an input struct.
I have filed feedback with Apple requesting a more powerful mechanism for exchanging data between user clients and dexts; I have no idea if it will be implemented, but if such a facility would be useful, I recommend you do the same, explaining what you'd like to use it for, with examples. The more of us report it, the more likely it is to happen. (IOMemoryDescriptor::CreateSubMemoryDescriptor() was added after I filed a request for it; I won't claim I was the first to do so, or that Apple wasn't planning to add it prior to my suggestion, but they are actively improving the DriverKit APIs.)
Original answer before question was edited to be much more specific:
(Retained because it explains in general terms how buffer arguments to external methods are handled, which is likely helpful for future readers.)
Your question is a little vague, but let me see if I can work out what you did in your kext, vs what you're doing in your dext:
You're calling IOConnectCallStructMethod(connection, selector, buffer, 256, NULL, NULL); in your app. This means buffer is passed as a "struct input" argument to your external method.
Because your buffer is 256 bytes long, which is less than or equal to sizeof(io_struct_inband_t), the contents of the buffer are sent to the kernel in-band - in other words, they are copied at the time of the IOConnectCallStructMethod() call.
This means that in your kext's external method dispatch function, the struct input is passed via the structureInput/structureInputSize fields in the incoming IOExternalMethodArguments struct. structureInput is a pointer in the kernel context and can be dereferenced directly. The pointer is only valid during execution of your method dispatch, and can't be used once the method has returned synchronously.
If you need to use the buffer for device I/O, you may need to wrap it in an IOMemoryDescriptor. One way to do this is indeed via IOMemoryDescriptor::CreateMapping().
If the buffer was 4097 bytes or larger, it would be passed via the structureInputDescriptor IOMemoryDescriptor, which can either be passed along to device I/O directly, or memory-mapped for dereferencing in the kernel. This memory descriptor directly references the user process's memory.
DriverKit extensions are considerably more limited in what they can do, but external method arguments arrive in almost exactly the same way.
Small structs arrive via the IOUserClientMethodArguments' structureInput field, which points to an OSData object. You can access the content via the getBytesNoCopy()/getLength() methods.
If you need this data in an IOMemoryDescriptor for onward I/O, the only way I know of is to create an IOBufferMemoryDescriptor using either IOUSBHostDevice::CreateIOBuffer() or IOBufferMemoryDescriptor::Create and then copying the data from the OSData object into the buffer.
Large buffers are again already referenced via an IOMemoryDescriptor. You can pass this on to I/O functions, or map it into the driver's address space using CreateMapping().
namespace
{
    /*
    **********************************************************************************
    ** create a memory descriptor and map its address
    **********************************************************************************
    */
    IOReturn arcmsr_userclient_create_memory_descriptor_and_map_address(const void* address, size_t length, IOMemoryDescriptor** memory_descriptor)
    {
        IOBufferMemoryDescriptor *buffer_memory_descriptor = nullptr;
        uint64_t buffer_address;
        uint64_t len;

#if ARCMSR_DEBUG_IO_USER_CLIENT
        arcmsr_debug_print("ArcMSRUserClient: *******************************************************\n");
        arcmsr_debug_print("ArcMSRUserClient: ** IOUserClient IOMemoryDescriptor create_with_bytes \n");
        arcmsr_debug_print("ArcMSRUserClient: *******************************************************\n");
#endif
        if (!address || !memory_descriptor)
        {
            return kIOReturnBadArgument;
        }
        if (IOBufferMemoryDescriptor::Create(kIOMemoryDirectionInOut, length, 0, &buffer_memory_descriptor) != kIOReturnSuccess)
        {
            if (buffer_memory_descriptor)
            {
                OSSafeReleaseNULL(buffer_memory_descriptor);
            }
            return kIOReturnError;
        }
        if (buffer_memory_descriptor->Map(0, 0, 0, 0, &buffer_address, &len) != kIOReturnSuccess)
        {
            if (buffer_memory_descriptor)
            {
                OSSafeReleaseNULL(buffer_memory_descriptor);
            }
            return kIOReturnError;
        }
        if (length != len)
        {
            if (buffer_memory_descriptor)
            {
                OSSafeReleaseNULL(buffer_memory_descriptor);
            }
            return kIOReturnNoMemory;
        }
        memcpy(reinterpret_cast<void*>(buffer_address), address, length);
        *memory_descriptor = buffer_memory_descriptor;
        return kIOReturnSuccess;
    }
} /* namespace */

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that ran on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution for a single-writer, multiple-readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit the usual acquire-release ordering:
class RWSync {
    std::atomic<int> version; // incremented after every modification
    std::atomic_bool invalid; // true during write
public:
    RWSync() : version(0), invalid(0) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            do { // wait until the object is valid
                currentVersion = version.load(std::memory_order_acquire);
            } while (invalid.load(std::memory_order_acquire));
            lambda();
            std::atomic_thread_fence(std::memory_order_seq_cst);
            // check if something changed
        } while (version.load(std::memory_order_acquire) != currentVersion
                 || invalid.load(std::memory_order_acquire));
    }

    void beginWrite() {
        invalid.store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void endWrite() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        version.fetch_add(1, std::memory_order_release);
        invalid.store(false, std::memory_order_release);
    }
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free and work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
Plus one: I happily accept any better idea for the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid) you may use a single version variable with the semantics "odd values are invalid". This is known as the "sequential lock" (seqlock) mechanism.
Reducing the number of atomic variables simplifies things a lot:
class RWSync {
    // Incremented before and after every modification.
    // Odd values mean that the object is in an invalid state.
    std::atomic<int> version;
public:
    RWSync() : version(0) {}

    template<typename F> void sync(F lambda) {
        int currentVersion;
        do {
            currentVersion = version.load(std::memory_order_seq_cst);
            // This may reduce calls to lambda(), nothing more
            if (currentVersion | 1) continue;
            lambda();
            // Repeat until something changed or object is in an invalid state.
        } while ((currentVersion | 1) ||
                 version.load(std::memory_order_seq_cst) != currentVersion);
    }

    void beginWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Invalidation requires sequential order
        version.store(currentVersion + 1, std::memory_order_seq_cst);
    }

    void endWrite() {
        // Writer may read version with relaxed memory order
        int currentVersion = version.load(std::memory_order_relaxed);
        // Release order is sufficient for marking an object as valid
        version.store(currentVersion + 1, std::memory_order_release);
    }
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all of the previous modifications to the object have been completed. It is sufficient to use release memory order for that.
beginWrite() makes sure that the reader will detect the object being in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order. Because of that, the reader uses seq_cst memory order too.
As for fences, it is better to incorporate them into the preceding/following atomic operation: the compiler knows how to make the result fast.
Explanations of some modifications to the original code:
1) An atomic modification like fetch_add() is intended for cases when concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other time-costly architecture-specific mechanisms.
An atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which differentiate between load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access, and provides a total order over these accesses.
The accepted answer is not correct. I guess the code should be something like "currentVersion & 1" instead of "currentVersion | 1". A subtler mistake is that the reader thread can go into lambda(), and after that the writer thread could run beginWrite() and write a value to the non-atomic variable. In this situation, the write to the payload and the read from the payload have no happens-before relationship. Concurrent access (without a happens-before relationship) to a non-atomic variable is a data race. Note that the single total order of memory_order_seq_cst does not by itself imply a happens-before relationship; the two are consistent with each other, but they are different things.

create device mapper target

I am trying to implement a device mapper target by referring to the already existing ones: dm-linear, dm-snapshot, dm-cache, etc. In my implementation, I need to perform a read/modify/write operation on a certain sector range. Since the device mapper talks directly to the block layer, I am not sure which data structures/functions to use to read the sectors into memory, modify the buffer, and write it back to another sector range.
At the application level we have syscalls, and below that we have vfs_read/vfs_write. Is there anything similar for the device mapper layer?
I have been stuck here for very long. Any help will be appreciated.
NOTE: My answer relates to kernel versions < 3.14, because in 3.14 the API changed slightly.
In the kernel you read/write certain sectors with struct bio. This struct is used for all block-level I/O. Comprehensive documentation can be found in the kernel sources and on LWN. These are the most significant members of this structure:
bio->bi_sector - first sector of block I/O request
bio->bi_size - size of I/O request
bio->bi_bdev - device to read/write
bio->bi_end_io - callback that kernel will call on the end of request
What you do in a device mapper target is map the incoming bio. When you create your device mapper target you supply at least two callbacks: ctr and map. For example, the simplest device-mapper target, dm-zero, declares its callbacks as follows:
static struct target_type zero_target = {
    .name    = "zero",
    .version = {1, 1, 0},
    .module  = THIS_MODULE,
    .ctr     = zero_ctr,
    .map     = zero_map,
};
map is the key callback - it's the heart of every device-mapper target. map receives the incoming bio and can do anything with it. For example, dm-linear just shifts the sector of every incoming bio by a predefined offset. See the code:
static sector_t linear_map_sector(struct dm_target *ti, sector_t bi_sector)
{
    struct linear_c *lc = ti->private;

    return lc->start + dm_target_offset(ti, bi_sector);
}

static void linear_map_bio(struct dm_target *ti, struct bio *bio)
{
    struct linear_c *lc = ti->private;

    bio->bi_bdev = lc->dev->bdev;
    if (bio_sectors(bio))
        bio->bi_sector = linear_map_sector(ti, bio->bi_sector);
}

static int linear_map(struct dm_target *ti, struct bio *bio)
{
    linear_map_bio(ti, bio);
    return DM_MAPIO_REMAPPED;
}
Because map receives a pointer to the bio, it can change the values under that pointer, and that's it.
That's how you map I/O requests. If you want to create your own requests, then you must allocate a bio; fill in its sector, device, size, and end callback; and add buffers to read into / write from. Basically, it's just a few steps:
Call bio_alloc() to allocate a bio.
Set bio->bi_bdev, bio->bi_sector, bio->bi_size, bio->bi_end_io.
Add pages via bio_add_page().
Call submit_bio().
Handle results and errors in the bio->bi_end_io callback.
An example can be found in the dm-crypt target, in the crypt_alloc_buffer function.
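For instance, a rough sketch of those steps for reading a single 512-byte sector might look like this (my_read_sector and my_read_end_io are made-up names, error handling is abbreviated, and it uses the pre-3.14 bio fields described above; it needs <linux/bio.h>):

static void my_read_end_io(struct bio *bio, int error)
{
    struct page *page = bio->bi_io_vec[0].bv_page;

    if (!error) {
        /* ... inspect / modify the data in 'page' here ... */
    }

    __free_page(page);
    bio_put(bio);
}

static int my_read_sector(struct block_device *bdev, sector_t sector)
{
    struct bio *bio = bio_alloc(GFP_NOIO, 1);
    struct page *page = alloc_page(GFP_NOIO);

    if (!bio || !page)
        goto fail;

    bio->bi_bdev   = bdev;
    bio->bi_sector = sector;
    bio->bi_end_io = my_read_end_io;

    if (!bio_add_page(bio, page, 512, 0))   /* one 512-byte sector; also sets bi_size */
        goto fail;

    submit_bio(READ, bio);   /* completion arrives asynchronously in bi_end_io */
    return 0;

fail:
    if (page)
        __free_page(page);
    if (bio)
        bio_put(bio);
    return -ENOMEM;
}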

current->mm gives NULL in linux kernel

I would like to walk the page tables, so I accessed current->mm, but it gives a NULL value.
I'm working on Linux kernel 3.9 and I don't understand how current->mm could be NULL.
Is there something I am missing here?
It means you are in a kernel thread.
In Linux, kernel threads have no mm struct. A kernel thread borrows the mm from the previous user thread and records it in active_mm. So you should use active_mm instead.
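A minimal sketch of that fallback (hypothetical code for a ~3.9 kernel; reference counting on the mm is omitted for brevity):

struct mm_struct *mm = current->mm;

if (!mm)                        /* kernel thread: borrow the last user mm */
    mm = current->active_mm;

if (!mm)
    return;                     /* e.g. idle/early-boot task: nothing to walk */

down_read(&mm->mmap_sem);       /* protect VMAs and page tables while walking */
/* ... walk pgd/pud/pmd/pte here ... */
up_read(&mm->mmap_sem);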
More details:
In kernel/sched/core.c you can find the following code:
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next)
{
    ...
    if (!mm) {
        next->active_mm = oldmm;
        atomic_inc(&oldmm->mm_count);
        enter_lazy_tlb(oldmm, next);
    } else
        switch_mm(oldmm, mm, next);
    ...
}
If the next thread has no mm (a kernel thread), the scheduler would not switch mm and just reuse the mm of the previous thread.
Need for the active_mm assignment: the call to switch_mm(), which results in a TLB flush, is avoided by "borrowing" the mm_struct used by the previous task and placing it in task_struct->active_mm. This technique has made large improvements to context switch times.

boost signals - How to control the lifetime of objects sent to subscribers? Smart pointers?

I am using boost::signals2 under Red Hat Enterprise Linux 5.3.
My signal creates an object copy and sends its pointer to subscribers. This was implemented for thread safety, to prevent the worker thread from updating a string property on the object at the same time it is being read (perhaps I should revisit the use of locks?).
Anyway, my concern is with multiple subscribers that dereference the pointer to the copied object on their own thread. How can I control object lifetime? How can I know all subscribers are done with the object and it is safe to delete the object?
typedef boost::signals2::signal< void ( Parameter* ) > signalParameterChanged_t;
signalParameterChanged_t m_signalParameterChanged;

// Worker Thread - Raises the signal
void Parameter::raiseParameterChangedSignal()
{
    Parameter* pParameterDeepCopied = new Parameter(*this);
    m_signalParameterChanged(pParameterDeepCopied);
}

// Read-Only Subscriber Thread(s) - GUI (and Event Logging thread) handles signal
void ClientGui::onDeviceParameterChangedHandler( Parameter* pParameter)
{
    cout << pParameter->toString() << endl;
    delete pParameter; // **** This only works for a single subscriber !!!
}
Thanks in advance for any tips or direction,
-Ed
If you really have to pass Parameter by pointer to your subscribers, then you should use boost::shared_ptr:
typedef boost::shared_ptr<Parameter> SharedParameterPtr;
typedef boost::signals2::signal< void ( SharedParameterPtr ) > signalParameterChanged_t;
signalParameterChanged_t m_signalParameterChanged;

// The signal source
void Parameter::raiseParameterChangedSignal()
{
    SharedParameterPtr pParameterDeepCopied(new Parameter(*this));
    m_signalParameterChanged(pParameterDeepCopied);
}

// The subscriber's handler
void ClientGui::onDeviceParameterChangedHandler( SharedParameterPtr pParameter)
{
    cout << pParameter->toString() << endl;
}
The shared parameter object sent to your subscribers will be automatically deleted when its reference count becomes zero (i.e. it goes out of scope in all the handlers).
Is Parameter really so heavyweight that you need to send it to your subscribers via pointer?
EDIT:
Please note that using shared_ptr takes care of lifetime management, but will not relieve you of the responsibility to make concurrent reads/writes to/from the shared parameter object thread-safe. You may well want to pass by copy to your subscribers for thread-safety reasons alone. In your question, it's not clear enough to me what goes on thread-wise, so I can't give you more specific recommendations.
Is the thread calling raiseParameterChangedSignal() the same as your GUI thread? Some GUI toolkits don't allow concurrent use of their API by multiple threads.