Workqueues in userspace (or userspace device drivers) - linux-kernel

I'm working on device drivers (HDMI, HDCP) that have been implemented in user space.
Now I'm looking for functionality similar to the Linux workqueue in user space.
What I want:
a.) Tie different work items/functions to a workqueue and run them.
b.) Flush the workqueue when shutting down the driver or resetting the driver state machine.
c.) Add delayed execution of work items.
d.) Cancel the current work item, etc.
I'm familiar with Linux kernel workqueues and work structures (though not an expert), hence my curiosity about how a similar mechanism can be emulated at the user-space level.
I could probably write such a library using the pthread APIs combined with a global queue.
Any ideas/suggestions?

An event loop library such as libev or libevent would get the job done.
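If you would rather roll your own with pthreads, as the question suggests, here is a minimal sketch of the idea, not a finished library. It covers queuing work (a) and flushing (b) with a single worker thread. All names here (struct workqueue, wq_init, wq_queue, wq_flush, wq_destroy, count_work) are made up for illustration and do not come from any existing library. Delayed work (c) and cancellation (d) are not covered; one possible way to add delays would be a deadline per item combined with pthread_cond_timedwait, but that is left out to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*work_fn)(void *arg);

struct work {                 /* one queued item (hypothetical layout) */
    work_fn fn;
    void *arg;
    struct work *next;
};

struct workqueue {
    pthread_mutex_t lock;
    pthread_cond_t more;      /* signalled when work is queued or stop is set */
    pthread_cond_t idle;      /* signalled when the queue has drained */
    struct work *head, *tail; /* FIFO list of pending items */
    bool running;             /* an item is executing right now */
    bool stop;                /* wq_destroy() was called */
    pthread_t thread;
};

/* Worker thread: pop items in FIFO order and run them outside the lock. */
static void *wq_worker(void *p)
{
    struct workqueue *wq = p;
    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (!wq->head && !wq->stop)
            pthread_cond_wait(&wq->more, &wq->lock);
        if (!wq->head)        /* stop requested and nothing left to run */
            break;
        struct work *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = NULL;
        wq->running = true;
        pthread_mutex_unlock(&wq->lock);
        w->fn(w->arg);
        free(w);
        pthread_mutex_lock(&wq->lock);
        wq->running = false;
        if (!wq->head)
            pthread_cond_broadcast(&wq->idle);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

/* Returns 0 on success, otherwise the pthread_create() error code. */
int wq_init(struct workqueue *wq)
{
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->more, NULL);
    pthread_cond_init(&wq->idle, NULL);
    wq->head = wq->tail = NULL;
    wq->running = wq->stop = false;
    return pthread_create(&wq->thread, NULL, wq_worker, wq);
}

/* Append fn(arg) to the queue. Returns 0 on success, -1 if out of memory. */
int wq_queue(struct workqueue *wq, work_fn fn, void *arg)
{
    assert(fn != NULL);
    struct work *w = malloc(sizeof *w);
    if (!w)
        return -1;
    w->fn = fn;
    w->arg = arg;
    w->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    return 0;
}

/* Block until every queued item has finished running. */
void wq_flush(struct workqueue *wq)
{
    pthread_mutex_lock(&wq->lock);
    while (wq->head || wq->running)
        pthread_cond_wait(&wq->idle, &wq->lock);
    pthread_mutex_unlock(&wq->lock);
}

/* Let the worker drain what is left, then stop and join it. */
void wq_destroy(struct workqueue *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stop = true;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->thread, NULL);
    pthread_mutex_destroy(&wq->lock);
    pthread_cond_destroy(&wq->more);
    pthread_cond_destroy(&wq->idle);
}

/* Example work item: increments the int that arg points to. */
static void count_work(void *arg)
{
    ++*(int *)arg;
}
```

Typical use would be wq_init() at driver start, wq_queue(&wq, count_work, &counter) whenever something needs deferring, wq_flush() before resetting the state machine, and wq_destroy() at shutdown. Because there is only one worker thread, items never run concurrently with each other, which is roughly what an ordered workqueue gives you in the kernel.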

Related

What is irq_bypass and how to use it in Linux?

I am learning VFIO in Linux and found that there is a kernel module, irq_bypass, which is used by VFIO. I read its code and found that it has functions to add/delete IRQ producers and consumers. The code submitter described it as follows:
The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanism
So I wrote a module to call the irq_bypass interfaces to figure out its working flow.
The kernel module registers an IRQ producer and consumer for an interrupt generated by executing the 'int' instruction within the module. But I am not sure I am doing it right, since I did NOT see the consumer being triggered.
Maybe I misunderstand the mechanism of the irq_bypass module. If so, how does it work in a virtualization system, and why is it needed by KVM/VFIO in Linux?

Is there a workqueue feature in the XNU kernel?

I need to use a workqueue-like feature on Mac OS X (kernel-mode driver) and am looking for a way to add work to a queue to be processed by a kernel thread later. Conceptually this is the same thing as the workqueue feature available in the Linux kernel. Is there something similar in the XNU kernel as well?
I don't think there's a direct equivalent as such, although I admit I'm not intimately familiar with the Linux side, so I'll avoid comparing and just tell you about what's available on macOS/xnu.
I/O Kit IOWorkLoops
If you're building an I/O Kit driver, and especially if you're writing a secondary interrupt handler, you'll be using IOWorkLoops. Interrupts are abstracted by IOEventSource objects, which schedule secondary interrupt handlers to run on the driver's IOWorkLoop.
Each IOWorkLoop wraps one kernel thread and also provides a serialisation/locking mechanism for resources shared with that thread. All jobs submitted to a workloop, whether explicitly through an IOCommandGate or the workloop object directly, or as a result of an IOEventSource event, will be serialised. Note that IOCommandGate jobs will run synchronously on the calling thread, not the workloop thread.
As always with macOS/OSX internals, you will want to look at the header file comments and possibly the implementation in the xnu source for details. I personally find IOWorkLoops a bit clumsy for some tasks, but if you're dealing with PCI devices, etc. you don't really have a choice.
thread_call
A more lightweight background work mechanism is the thread_call API. It's defined in <kern/thread_call.h> and supports running functions on an OS-managed background thread, optionally after a delay or with a specific priority. This is probably closer to what you know from Linux, has a fairly straightforward API, but is not suitable for secondary interrupt handlers.

Is it possible to allow a particular user-level application to run in kernel-mode?

This is a hypothetical question. Suppose there is an application (which typically executes in user mode) that wants to access kernel data structures, read register values, and perform some kernel-level functions.
Is there a way for kernel and/or CPU to allow this application to perform its functions while maintaining the normal user-level/kernel-level isolation for other applications except this one?
To either put your app in kernel space (kernel memory) or run it in ring 0 CPU mode, you would need to do that from kernel code. In the normal state of operation you can't run an app from the kernel with the mentioned privileges (at least there is no existing API to do that). It's probably possible to implement some kernel code that is capable of this. But it would be tricky and would undermine the whole concept of kernel-space/user-space separation, and if any advanced user-space API were used, it wouldn't work anyway.
If you are thinking about just giving your app ring 0 privileges, that won't work either: the kernel has its own stack, and because of kernel-space/user-space memory separation you wouldn't be able to call the internal kernel API.
Basically, you can achieve the same thing by writing a kernel module instead. And to run some kernel code on behalf of a user-space app, you can use the system call interface.
So, to answer your question: no, it's not possible to run a user-space app in kernel mode so that it can use the internal kernel API.

Two-way communication between kernel-mode driver and user-mode application?

I need a two-way communication between a kernel-mode WFP driver and a user-mode application. The driver initiates the communication by passing a URL to the application which then does a categorization of that URL (Entertainment, News, Adult, etc.) and passes that category back to the driver. The driver needs to know the category in the filter function because it may block certain web pages based on that information. I had a thread in the application that was making an I/O request that the driver would complete with the URL and a GUID, and then the application would write the category into the registry under that GUID where the driver would pick it up. Unfortunately, as the driver verifier pointed out, this is unstable because the Zw registry functions have to run at PASSIVE_LEVEL. I was thinking about trying the same thing with mapped memory buffers, but I’m not sure what the interrupt requirements are for that. Also, I thought about lowering the interrupt level before the registry function calls, but I don't know what the side effects of that are.
You just need to have two different kinds of I/O request.
If you're using DeviceIoControl to retrieve the URLs (I think this would be the most suitable method) this is as simple as adding a second I/O control code.
If you're using ReadFile or equivalent, things would normally get a bit messier, but as it happens in this specific case you only have two kinds of operations, one of which is a read (driver->application) and the other of which is a write (application->driver). So you could just use WriteFile to send the reply, including of course the GUID so that the driver can match up your reply to the right query.
Another approach (more similar to your original one) would be to use a shared memory buffer. See this answer for more details. The problem with that idea is that you would either need to use a spinlock (at the cost of system performance and power consumption, and of course not being able to work on a single-core system) or to poll (which is both inefficient and not really suitable for time-sensitive operations).
There is nothing unstable about PASSIVE_LEVEL. Access to the registry must happen at PASSIVE_LEVEL, so it's not possible directly if the driver is running at a higher IRQL. You can do it by offloading to a work item, though. Lowering the IRQL is usually not recommended, as it contradicts the OS's intentions.
Your protocol indeed sounds somewhat cumbersome and doing a direct app-driver communication is probably preferable. You can find useful information about this here: http://msdn.microsoft.com/en-us/library/windows/hardware/ff554436(v=vs.85).aspx
Since the callouts are at DISPATCH_LEVEL, your processing has to be done either in a worker thread or a DPC, which will allow you to use ZwXXX. You should look into inverted callbacks for communication purposes; there's a good document on OSR.
I've just started poking around WFP, but it looks like even in the samples they provide, Microsoft reinjects the packets. I haven't looked into it that closely, but it seems that they drop the packet and re-inject it once it has been processed. That would be enough for your user-mode engine to make the decision. You should also limit the packet capture to a specific port (80 in your case) so that you don't do extra processing that you don't need.

How to call usermode from Windows kernel?

I'd like to call my app from my driver when an interesting event happens in the Windows kernel. I need to be able to pass at least 4 bytes of data back to user mode. How to achieve this? These events might happen quite, but not too, often, so I don't want to build a queue system and use IOCTLs.
I was thinking of something like this: the driver gets loaded, the user-mode app registers its callback using an IOCTL, the kernel keeps calling that callback when events happen, and finally the user-mode client unregisters the callback and no more data is sent to user mode. Is this possible?
I'm new to kernel programming, so after a day of googling I decided to ask here. I've noticed that there isn't much discussion about the kernel and drivers. And even less proper docs.
