Can any body tell why MSI interrupts are not shareable in linux.
PIN based interrupts can be shared by devices, but MSI interrupts are not shared by devices, each device gets its own MSI IRQ number. Why can't MSI interrupts be shared ?
The old INTx interrupts have two problematic properties:
Each INTx signal requires a separate signal line in hardware; and
the interrupt signal is independent of the other data signals, and this is sent asynchronously.
The consequences are that
multiple devices and drivers need to be able to share interrupts (the interrupt handler needs to check if its device actually raised the interrupt); and
when a driver receives an interrupt, it needs to do a read of some device register to ensure that any previous DMA writes made by the device are visible on the CPU.
Typically, both cases are handled by the driver reading its device's interrupt status register.
Message-Signaled Interrupts do not require a separate signal line but are sent as a message over the data bus. This means that the same hardware can support many more interrupts (so sharing it not necessary), and that the interrupt message is automatically synchronized with any DMA accesses. As a consequence, the interrupt handler does not need to do anything; the interrupt is guaranteed to come from its device, and DMA'd data is guaranteed to have already arrived.
If some drivers were written to share some MSI, the interrupt handler would again have to check whether the interrupt actually came from its own device, and there would be no advantage over INTx interrupts.
MSIs are not shared because it would not be possible, but because it is not necessary.
Please note that sharing an MSI is actually possible: as seen in this excerpt from /proc/interrupts, the Advanced Error Reporting, Power Management Events, and hotplugging drivers share one interrupt:
64: 0 0 PCI-MSI-edge aerdrv, PCIe PME, pciehp
These drivers are actually attached to the same device, but they still behave similar to INTx drivers, i.e., they register their interrupt with IRQF_SHARED, and the interrupt handlers check whether it was their own function that raised the interrupt.
Interrupt sharing is a hack due to resource constraints, like not having enough physical IRQ lines for each device that wants attention. If interrupts are represented by messages that have a large ID space, why would you do that?
"That" meaning: giving them the same identity so that devices then have to be probed to figure out which of the ones clashing to the same ID actually interrupted.
In fact, we would sometimes like to have multiple interrupts for one device. For instance, it's useful if the interrupt ID tells us not only which device interrupted by also why: like is it due to the arrival of input, or the draining of an output buffer? If interrupt lines are "cheap" because they are just software ID's with lots of bits, we can have that.
Related
I have a Linux device driver which allows a userspace process to mmap() certain regions of the device's MMIO space for writing. The device may at some point decide to revoke access to the region, and will notify the driver when this happens. The driver (asynchronously) notifies the userspace process to stop using this region.
I'd like the driver to immediately zap the PTEs for this mapping so they can be returned to device control, however, the userspace process might still be finishing a write. I'd like to simply discard these writes. The user does not need to know which writes made it to the device and which writes were discarded. What can the driver's fault handler do after zapping the PTEs that can discard writes to the region harmlessly?
For the userspace process to make progress, the PTE needs to end up pointing to a writeable page.
If you don't want it writing to your device MMIO region, this implies you'll need to allocate a page of normal memory for the write to go to, just like the fault handler does for an anonymous VMA.
Alternatively, you could let your userspace task take a SIGBUS when this revocation event occurs, and just specify that a task using this device should expect this to happen and must install a SIGBUS handler that uses longjmp() to cancel its attempt to write to the device. The downside of this approach - apart from the additional complexity it dumps onto userspace - is that it makes using your device difficult from a library, as signal handlers are process-global state.
When we use irq_set_chained_handler the irq line will not be disabled or not, when we are servicing the associated handler, as in case of request_irq.
It doesn't matter how the interrupt was setup. When any interrupt occurred, all interrupts (for this CPU) will be disabled during the interrupt handler. For example, on ARM architecture first place in C code where interrupt handling is found is asm_do_IRQ() function (defined in arch/arm/kernel/irq.c). It's being called from assembler code. For any interrupt (whether it was requested by request_irq() or by irq_set_chained_handler()) the same asm_do_IRQ() function is called, and interrupts are disabled automatically by ARM CPU. See this answer for details.
Historical notes
Also, it worth to be mentioned that some time ago Linux kernel was providing two types of interrupts: "fast" and "slow" ones. Fast interrupts (when using IRQF_DISABLED or SA_INTERRUPT flag) were running with disabled interrupts, and those handlers supposed to be very short and quick. Slow interrupts, on the other hand, were running with re-enabled interrupts, because handlers for slow interrupts may take much of time to be handled.
On modern versions of Linux kernel all interrupts are considered as "fast" and are running with interrupts disabled. Interrupts with huge handlers must be implemented as threaded (or enable interrupts manually in ISR using local_irq_enable_in_hardirq()).
That behavior was changed in Linux kernel v2.6.35 by this commit. You can find more details about this here.
Refer https://www.kernel.org/doc/Documentation/gpio/driver.txt
This means the GPIO irqchip is registered using
irq_set_chained_handler() or the corresponding
gpiochip_set_chained_irqchip() helper function, and the GPIO irqchip
handler will be called immediately from the parent irqchip, while
holding the IRQs disabled. The GPIO irqchip will then end up calling
something like this sequence in its interrupt handler:
I have been trying to understand how do h/w interrupts end up in some user space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from CPU
2- It signals the CPU by raising an interrupt (h/w trance to cpu or bus)
3- The CPU asserts, saves current context, looks up address of ISR in the
interrupt descriptor table (vector)
4- CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store ISR address in interrupt vector table? It might probably be done by sending the CPU some piece of assembly described in the CPUs user manual? The more detail on this subject the better please.
In user space how can a programmer write a piece of code that listens to a h/w device notifications?
This is what I understand so far.
5- The kernel driver for that specific device has now the message from the device and is now executing the ISR.
Question #3:If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific systemcall so that it can execute the request from the user? And then what happens, how does the driver gives back the requested data to user space?
I might be completely off track here, any guidance would be appreciated.
I am not looking for specific details answers, I am only trying to understand the general picture.
Question #1: How did the kernel store ISR address in interrupt vector table?
Driver calls request_irq kernel function (defined in include/linux/interrupt.h and in kernel/irq/manage.c), and Linux kernel will register it in right way according to current CPU/arch rules.
It might probably be done by sending the CPU some piece of assembly described in the CPUs user manual?
In x86 Linux kernel stores ISR in Interrupt Descriptor Table (IDT), it format is described by vendor (Intel - volume 3) and also in many resources like http://en.wikipedia.org/wiki/Interrupt_descriptor_table and http://wiki.osdev.org/IDT and http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
Pointer to IDT table is registered in special CPU register (IDTR) with special assembler commands: LIDT and SIDT.
If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific systemcall so that it can execute the request from the user? And then what happens, how does the driver gives back the requested data to user space?
Driver usually registers some device special file in /dev; pointers to several driver functions are registered for this file as "File Operations". User-space program opens this file (syscall open), and kernels calls device's special code for open; then program calls poll or read syscall on this fd, kernel will call *poll or *read of driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). Driver may put caller to sleep (wait_event*) and irq handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2 ).
You can read more about linux driver creation in book LINUX DEVICE DRIVERS (2005) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf
Our group is using a custom driver to interface four MAX3107 UARTs on a shared I2C bus. The interrupts of the four MAX3107's are connected (i.e. shared interrupt via logic or'ing)) to a GPIO pin on the ARM9 processor (LPC3180 module). When one or more of these devices interrupt, they pull the GPIO line, which is configured as a level-sensitive interrupt, low. My question concerns the need, or not, to disable the specific interrupt line in the handler code. (I should add that we are running Linux 2.6.10).
Based on my reading of several ARM-specific app notes on interrupts, it seems that when the ARM processor receives an interrupt, it automatically disables (masks?) the corresponding interrupt line (in our case this would seem to be the line corresponding to the GPIO pin we selected). If this is true, then it seems that we should not have to disable interrupts for this GPIO pin in our interrupt handler code as doing so would seem redundant (though it seems to work okay). Stated differently, it seems to me that if the ARM processor automatically disables the GPIO interrupt upon an interrupt occurring, then if anything, our interrupt handler code should only have to re-enable the interrupt once the device is serviced.
The interrupt handler code that we are using includes disable_irq_nosync(irqno); at the very beginning of the handler and a corresponding enable_irq() at the end of the handler. If the ARM processor has already disabled the interrupt line (in hardware), what is the effect of these calls (i.e. a call to disable_irq_nosync() followed by a call to enable(irq())?
From the Arm Information Center Documentation:
On entry to an exception (interrupt):
interrupt requests (IRQs) are disabled for all exceptions
fast interrupt requests (FIQs) are disabled for FIQ and Reset exceptions.
It then goes on to say:
Handling an FIQ causes IRQs and subsequent FIQs to be disabled,
preventing them from being handled until after the FIQ handler enables
them. This is usually done by restoring the CPSR from the SPSR at the
end of the handler.
So you do not have to worry about disabling them, but you do have to worry about re-enabling them.
You will need to include enable_irq() at the end of your routine, but you shouldn't need to disable anything at the beginning. I wouldn't think that calling disable_irq_nosync(irqno) in software after it has been called in hardware would effect anything. Since the hardware call is most definitely called before the software call has a chance to take over. But it's probably better to remove it from the code to follow convention and not confuse the next programmer who takes a look at it.
More info here:
Arm Information Center
I am actually reading Windows Internals 5th edition and i am enjoying, although isn't a easy book to read and understand.
I am confused about IRQLs and IDT Table.
I read that windows implement custom priorization levels with IRQL and the Plug and Play Manager maps IRQ from devices to IRQL.
Alright, so, IRQLs are used for Software and Hardware interrupts, and for exceptions is used the Exception Dispatch handler.
When one device generates an interrupt, the interrupt controller pass this information to the CPU with the IRQ.
So Windows takes this IRQ and translates to IRQL to schedule when to execute the routine (routine that IDT[IRQ_VALUE] is pointing to?
Is that what is happening?
Yes, on a very high level.
Everything starts with a kernel trap. Kernel trap handler handles interrupts, exceptions, system service calls and virtual memory pager.
When an interrupt happens (line based - using dedicated pin or message based- writing to an address) windows uses IRQL to determine the priority of the interrupt and uses this to see if the interrupt can be served or not during that time. HAL does the job of translating the IRQ to IRQL.
It then uses IRQ to get an index of the IDT to find the appropriate ISR routing to invoke. Note there can be multiple ISR associated for a given IRQ. All of them execute in order.
Each processor has its own IDT so you could potentially have multiple ISR's running at the same time.
Exception dispatch, as I mentioned before, is also handled by the kernel trap but the procedure for it is different. It usually starts by checking for any exception handlers by stack unwinding, then checking for debugger port etc.