What is irq_bypass and how to use it in Linux? - linux-kernel

I am learning VFIO in Linux, and found there is a kernel module irq_bypass, which is being used by VFIO. I read its codes, and found it has functions to add/delete irq producer and consumer. The code submitter described as follows,
The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanism
So I wrote a module to call the irq_bypass interfaces to figure out its working flow.
The kernel module register irq producer and consumer for an interrupt generated by calling 'int' instruction within this module. But I am not sure if I am doing right, since I did NOT see the consumer is being triggered.
Maybe I am wrong in understanding the mechanism of the irq_bypass module, if so, how does it work in virtualization system or why it is needed in KVM/VFIO in Linux?

Related

Serializing the driver's init call order, which were registered at the same init level

I have come across a scenario where two drivers (driver1 and driver2) are registering with kernel using module_init(). driver1 is configuring the chip and driver2 is accessing that chip, so driver1's module_init() has to be called before driver2's module_init() call. How we can create this order, as both driver1 and driver2 are registering at the same init call level (i.e using module_init()).
Can driver2 somehow determine that the chip has been properly configured, using either device status (preferred) or the value of a shared variable (messy)?
If so, then driver2's probe routine should return with -EPROBE_DEFER when the chip has not yet been configured (by driver1).
See https://lwn.net/Articles/485194/
drivercore: Add driver probe deferral mechanism
Allow drivers to report at probe time that they cannot get all the resources
required by the device, and should be retried at a later time.
This should completely solve the problem of getting devices
initialized in the right order. Right now this is mostly handled by
mucking about with initcall ordering which is a complete hack, and
doesn't even remotely handle the case where device drivers are in
modules. This approach completely sidesteps the issues by allowing
driver registration to occur in any order, and any driver can request
to be retried after a few more other drivers get probed.

is there a workqueue feature in xnu kernel?

I need to use workqueue-like feature on Mac OSX (kernel mode driver) and am looking for a way to add work into a queue to be processed by a kernel thread later. Conceptually this is the same thing as workqueue feature available in Linux kernel. Is there something similar on XNU kernel as well?
I don't think there's a direct equivalent as such, although I admit I'm not intimately familiar with the Linux side, so I'll avoid comparing and just tell you about what's available on macOS/xnu.
I/O Kit IOWorkLoops
If you're building an I/O Kit driver, and especially if you're writing a secondary interrupt handler, you'll be using IOWorkLoops. Interrupts are abstracted by IOEventSource objects, which schedule secondary interrupt handlers to run on the driver's IOWorkLoop.
Each IOWorkLoop wraps one kernel thread and also provides a serialisation/locking mechanism for resources shared with that thread. All jobs submitted to a workloop either explicitly through an IOCommandGate or the workloop object directly, or as a result of an IOEventSource event will be serialised. Note that IOCommandGate jobs will run synchronously on the calling thread, not the workloop thread.
As always with macOS/OSX internals, you will want to look at the header file comments and possibly the implementation in the xnu source for details. I personally find IOWorkLoops a bit clumsy for some tasks, but if you're dealing with PCI devices, etc. you don't really have a choice.
thread_call
A more lightweight background work mechanism is the thread_call API. It's defined in <kern/thread_call.h> and supports running functions on an OS-managed background thread, optionally after a delay or with a specific priority. This is probably closer to what you know from Linux, has a fairly straightforward API, but is not suitable for secondary interrupt handlers.

module context of execution

I work on module for ipsec in linux. Look at two different situations when code from my module will be executed.
Executing from process context: application generate some traffic to transmit via network, application should call some syscall to transfer data, then process switch to kernel space and packet go through network subsystem of linux, somewere here will be executed my module, and all finished after affording task to network card. All these steps performed from process context and in any moment scheduler can switch process from one to another. Is as follows fist case of using my module - from process context.
Executing from softirq context: when network card receive packet it generate hardware interrupt, which "prepare" appropriate softirq to run. And packet go through network subsystem of linux (including my module) until some application got it. These steps performed from softirq context and could be interrupted only by hardware interrupt, but not by scheduler work.
The question is: How can I programmatically determine in module, from which context module is executing? It can be some element of struct task_struct or some syscall or something else. I couldn't find it by myself.
It is considered as a bad practice to make a function's control flow dependent from whether it is executed in interrupt context or not.
Citation from the Linux kernel developer (Andrew Morton):
The consistent pattern we use in the kernel is that callers keep track of whether they are running in a schedulable context and, if necessary, they will inform callees about that. Callees don't work it out for themselves.
However, there are several functions(macros) defined in linux/preempt.h for detect current scheduling context: in_atomic(), in_interrupt(). But see that LWN article about their usage.

Accessing Platform Device from Userpace

From a general standpoint, I am trying to figure out how to access a platform device from userspace. To be more specific, I have a EMIF controller on and SoC of which I have added to my device tree and I believe it is correctly bound to a pre-written EMIF platform device driver. Now I am trying to figure out how I can access this EMIF device from a userspace application. I have come accross a couple different topics that seem to have some connection to this issue but I cannot quite find out how they relate.
1) As I read it seems like most I/O is done through the use of device nodes which are created by mknod(), do I need to create a device node in order to access this device?
2) I have read a couple threads that talk about writting a Kernel module (Character?, Block?) that can interface with both userspace and the platform device driver, and use it as an intermediary.
3) I have read about the possibility of using mmap() to map the memory of my platform device into my virtual memory space. Is this possible?
4) It seems that when the EMIF driver is instantiated, it calls the probe() fucntion. What functions would a userpace application call in the driver?
It's not completely clear what you're needing to do (and I should caveat that I have no experience with EMIF or with "platform devices" specifically), but here's some overview to help you get started:
Yes, the usual way of providing access to a device is via a device node. Usually this access is provided by a character device driver unless there's some more specific way of providing it. Most of the time if an application is talking "directly" to your driver, it's a character device. Most other types of devices are used in interfacing with other kernel subsystems: for example, a block device is typically used to provide access from a file system driver (say) to an underlying disk drive; a network driver provides access to the network from the in-kernel TCP/IP stack, etc.
There are several char device methods or entry points that can be supported by your driver, but the most common are "read" (i.e. if a user-space program opens your device and does a read(2) from it), "write" (analogous for write(2)) and "ioctl" (often used for configuration/administrative tasks that don't fall naturally into either a read or write). Note that mknod(2) only creates the user-space side of the device. There needs to be a corresponding device driver in the kernel (the "major device number" given in the mknod call links the user-space node with the driver).
For actually creating the device node in the file system, this can be automated (i.e. the node will automatically show up in /dev) if you call the right kernel functions while setting up your device. There's a special daemon that gets notifications from the kernel and responds by executing the mknod(2) system call.
A kernel module is merely a dynamically loadable way of creating a driver or other kernel extension. It can create a character, block or network device (et al.), but then so can a statically linked module. There are some differences in capability mostly because not all kernel functions you might want to use are "exported" to (i.e. visible to) dynamically loaded modules.
It's possible to support mapping of the device memory into user virtual memory space. This would be implemented by yet another driver entry point (mmap). See struct file_operations for all the entry points a char driver can support.
This is pretty much up to you: it depends on what the application needs to be able to do. There are many drivers in the kernel that provide no direct function to user-space, only to other kernel code. As to "probe", there are many probe functions defined in various interfaces. In most cases, these are called by the kernel (or perhaps by a 'higher level "class" driver') to allow the specific driver to discover, identify and "claim" individual devices. They (probe functions) don't usually have anything directly to do with providing access from user-space but I might well be missing something in a particular interface.
You need to create a device node in order to access the device.
The probe function is called when the driver finds a matching device.
For information on platform device API, the following articles could be useful.
The platform device API
Platform devices and device trees

workqueues in userspace(or userpace device drivers)

I'm working on device drivers(HDMI, HDCP) which had been implemented in user-space.
Now, I'm looking for similar-to-linux-workqueue functionality in user-space.
What I want:
a.) To Tie-up different work/functions on a workqueue and run it.
b.) Able to flush the workqueue when you are shutting down your driver or resetting your driver state machine.
c.) Add delayed execution of work-items.
d.) cancel current-work item etc.
I'm familiar with Linux kernel work-queues and work structures(though not expert) and hence, my curiosity that how we can emulate similar mechanism in user-space level ?
Probably,I can write such kind of library by using Pthread APIs mixing it with some global queue.
Any idea/suggestions?
Using an eventloop library, such as libev or libevent would get the job done.

Resources