Interrupt handling on an SMP ARM system with a GIC - linux-kernel

I wanted to know how interrupt handling works from the point at which a device raises an interrupt. I know interrupt handling in bits and pieces and would like a clear end-to-end picture of it. Let me put across what little I know about interrupt handling.
Suppose an FPGA device is interrupted through electrical lines and gets some data. The device driver for this FPGA device already has code (an interrupt handler) registered using the request_irq function.
So now the FPGA device has an IRQ line, which it got after the call to request_irq. Using this IRQ line the device signals the Generic Interrupt Controller (GIC); the GIC does a many-to-one translation of IRQ lines and sends the signal to a CPU core, which then calls the minimal code below:
IRQ_handler
SUB lr, lr, #4 ; modify LR
SRSFD #0x12! ; store SPSR and LR to IRQ mode stack
PUSH {r0-r3, r12} ; store AAPCS registers on to the IRQ mode stack
BL IRQ_handler_to_specific_device
POP {r0-r3, r12} ; restore registers
RFEFD sp! ; and return from the exception using pre-modified LR
IRQ_handler_to_specific_device is nothing but what we registered in the device driver using the request_irq() call.
I still don't understand how the CPU core comes to know the interrupt source (i.e., which device the interrupt is coming from).
Also, what is the role of calls like do_IRQ, and how do shared interrupts work?
I need some help in understanding the end-to-end picture of how interrupts are handled on the ARM architecture.

The GIC is divided into two sections. The first is called the distributor. This is global to the system and has several interrupt sources physically routed to it, although it may be within an SoC package. The second section is replicated per CPU and is called the CPU interface. The distributor has logic for how to distribute the shared peripheral interrupts, or SPIs. These are the type of interrupt your question is asking about: they are global hardware interrupts.
In the context of Linux, this is implemented in irq-gic.c. There is some documentation in gic.txt. Of specific interest,
reg : Specifies base physical address(s) and size of the GIC registers. The
first region is the GIC distributor register base and size. The 2nd region is
the GIC cpu interface register base and size.
The distributor must be accessed globally, so care must be taken to manage its registers. The CPU interface has the same physical address for each CPU, but each CPU has a separate implementation. The distributor can be set up to route interrupts to specific CPUs (including multiple CPUs); see gic_set_affinity() for an example. It is also possible for any CPU to handle the interrupt. The ACK register allocates the IRQ: the first CPU to read it gets the interrupt. If multiple IRQs are pending and there are two ACK reads from different CPUs, each will get a different interrupt. A third CPU reading would get a spurious IRQ.
As well, each CPU interface has some private interrupt sources that are used for CPU-to-CPU interrupts, private timers and the like. But I believe the focus of the question is how a physical peripheral (unique to a system) gets routed to a CPU in an SMP system.
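To tie this back to how a CPU core learns the interrupt source: the low-level IRQ entry ends up reading the CPU interface's acknowledge register, which both claims the interrupt for that CPU and returns its ID, and the kernel then dispatches to whatever was registered with request_irq(). Below is a rough sketch of that acknowledge/EOI handshake, loosely modeled on irq-gic.c. The register offsets match GICv2, but dispatch_irq() is a hypothetical placeholder and this is not the driver's actual code.

#include <linux/io.h>
#include <linux/types.h>

/* GICv2 CPU interface registers (GIC_CPU_INTACK / GIC_CPU_EOI in irq-gic.c) */
#define GICC_IAR   0x0c   /* Interrupt Acknowledge Register: reading it claims the IRQ */
#define GICC_EOIR  0x10   /* End Of Interrupt Register: writing it signals completion */

static void __iomem *gic_cpu_base;   /* assumed to be ioremap()ed already */

/* Hypothetical stand-in for the kernel's dispatch (handle_domain_irq() in the
 * real driver), which ends up in the handler registered with request_irq(). */
static void dispatch_irq(u32 irqnr) { }

static void sketch_gic_handle_irq(void)
{
    u32 irqstat, irqnr;

    do {
        irqstat = readl(gic_cpu_base + GICC_IAR);   /* this CPU claims one pending IRQ */
        irqnr = irqstat & 0x3ff;                    /* interrupt ID is in bits [9:0] */

        if (irqnr >= 1020)                          /* 1023 = spurious, nothing left for this CPU */
            break;

        dispatch_irq(irqnr);                        /* the ID tells us which device fired */

        writel(irqstat, gic_cpu_base + GICC_EOIR);  /* tell the GIC we are done with it */
    } while (1);
}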

Related

Why are MSI interrupts not shared?

Can anybody tell me why MSI interrupts are not shareable in Linux?
Pin-based interrupts can be shared by devices, but MSI interrupts are not shared; each device gets its own MSI IRQ number. Why can't MSI interrupts be shared?
The old INTx interrupts have two problematic properties:
Each INTx signal requires a separate signal line in hardware; and
the interrupt signal is independent of the other data signals and is sent asynchronously.
The consequences are that
multiple devices and drivers need to be able to share interrupts (the interrupt handler needs to check if its device actually raised the interrupt); and
when a driver receives an interrupt, it needs to do a read of some device register to ensure that any previous DMA writes made by the device are visible on the CPU.
Typically, both cases are handled by the driver reading its device's interrupt status register.
Message-Signaled Interrupts do not require a separate signal line but are sent as a message over the data bus. This means that the same hardware can support many more interrupts (so sharing it not necessary), and that the interrupt message is automatically synchronized with any DMA accesses. As a consequence, the interrupt handler does not need to do anything; the interrupt is guaranteed to come from its device, and DMA'd data is guaranteed to have already arrived.
If some drivers were written to share some MSI, the interrupt handler would again have to check whether the interrupt actually came from its own device, and there would be no advantage over INTx interrupts.
MSIs are not shared not because it would be impossible, but because it is not necessary.
Please note that sharing an MSI is actually possible: as seen in this excerpt from /proc/interrupts, the Advanced Error Reporting, Power Management Events, and hotplugging drivers share one interrupt:
64: 0 0 PCI-MSI-edge aerdrv, PCIe PME, pciehp
These drivers are actually attached to the same device, but they still behave similarly to INTx drivers, i.e., they register their interrupt with IRQF_SHARED, and the interrupt handlers check whether it was their own function that raised the interrupt.
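For comparison, the INTx-style sharing described here looks roughly like the following in a driver. This is a minimal sketch: the mydev structure and its status-register offset are invented for illustration, not taken from a real device.

#include <linux/interrupt.h>
#include <linux/io.h>

#define MYDEV_IRQ_STATUS 0x10   /* hypothetical interrupt-status register offset */

struct mydev {
    void __iomem *regs;
};

static irqreturn_t mydev_isr(int irq, void *dev_id)
{
    struct mydev *dev = dev_id;
    u32 status = readl(dev->regs + MYDEV_IRQ_STATUS);

    if (!status)              /* not our device: another handler on this line will claim it */
        return IRQ_NONE;

    writel(status, dev->regs + MYDEV_IRQ_STATUS);   /* acknowledge in the device */
    return IRQ_HANDLED;
}

/* Registration: IRQF_SHARED allows other devices on the same line and
 * requires a non-NULL dev_id so free_irq() can tell the handlers apart. */
static int mydev_setup_irq(struct mydev *dev, int irq)
{
    return request_irq(irq, mydev_isr, IRQF_SHARED, "mydev", dev);
}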
Interrupt sharing is a hack due to resource constraints, like not having enough physical IRQ lines for each device that wants attention. If interrupts are represented by messages that have a large ID space, why would you do that?
"That" meaning: giving them the same identity so that devices then have to be probed to figure out which of the ones clashing to the same ID actually interrupted.
In fact, we would sometimes like to have multiple interrupts for one device. For instance, it's useful if the interrupt ID tells us not only which device interrupted but also why: is it due to the arrival of input, or the draining of an output buffer? If interrupt lines are "cheap" because they are just software IDs with lots of bits, we can have that.
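To illustrate, the Linux PCI API lets a driver allocate several MSI/MSI-X vectors and attach a different handler to each, so the vector's identity already encodes the "why". A hedged sketch; the rx/tx split and all names here are hypothetical:

#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t rx_isr(int irq, void *data) { return IRQ_HANDLED; }   /* "input arrived" */
static irqreturn_t tx_isr(int irq, void *data) { return IRQ_HANDLED; }   /* "output drained" */

static int setup_msi_vectors(struct pci_dev *pdev)
{
    int err;
    int nvec = pci_alloc_irq_vectors(pdev, 2, 2, PCI_IRQ_MSI | PCI_IRQ_MSIX);
    if (nvec < 0)
        return nvec;

    /* Each vector has its own kernel IRQ number and its own handler,
     * so the interrupt's identity already says what happened. */
    err = request_irq(pci_irq_vector(pdev, 0), rx_isr, 0, "mydev-rx", pdev);
    if (err)
        goto free_vectors;
    err = request_irq(pci_irq_vector(pdev, 1), tx_isr, 0, "mydev-tx", pdev);
    if (err)
        goto free_rx;
    return 0;

free_rx:
    free_irq(pci_irq_vector(pdev, 0), pdev);
free_vectors:
    pci_free_irq_vectors(pdev);
    return err;
}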

From Kernel Space to User Space: Inner-workings of Interrupts

I have been trying to understand how h/w interrupts end up in some user space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from the CPU
2- It signals the CPU by raising an interrupt (a h/w trace to the CPU or bus)
3- The CPU asserts, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector)
4- The CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store the ISR address in the interrupt vector table? Is it perhaps done by sending the CPU some piece of assembly described in the CPU's manual? The more detail on this subject, the better, please.
Question #2: In user space, how can a programmer write a piece of code that listens for h/w device notifications?
This is what I understand so far.
5- The kernel driver for that specific device now has the message from the device and is executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understand so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens? How does the driver give back the requested data to user space?
I might be completely off track here, any guidance would be appreciated.
I am not looking for specific details answers, I am only trying to understand the general picture.
Question #1: How did the kernel store the ISR address in the interrupt vector table?
The driver calls the request_irq kernel function (declared in include/linux/interrupt.h and implemented in kernel/irq/manage.c), and the Linux kernel will register it in the right way according to the current CPU/arch rules.
Is it perhaps done by sending the CPU some piece of assembly described in the CPU's manual?
On x86, the Linux kernel stores ISRs in the Interrupt Descriptor Table (IDT). Its format is described by the vendor (Intel Software Developer's Manual, volume 3) and also in many resources such as http://en.wikipedia.org/wiki/Interrupt_descriptor_table, http://wiki.osdev.org/IDT, http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
The pointer to the IDT is kept in a special CPU register (IDTR), which is loaded and stored with the special assembler instructions LIDT and SIDT.
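For a concrete feel, here is a minimal sketch of the IDTR descriptor and the LIDT/SIDT instructions wrapped in inline assembly for x86-64. The struct and function names are arbitrary; real kernels have their own equivalents.

#include <stdint.h>

/* The 10-byte operand of LIDT/SIDT on x86-64: a 16-bit limit and a 64-bit base. */
struct idt_descriptor {
    uint16_t limit;     /* size of the IDT in bytes, minus one */
    uint64_t base;      /* linear address of the first IDT entry */
} __attribute__((packed));

static inline void load_idt(const struct idt_descriptor *desc)
{
    __asm__ volatile("lidt %0" : : "m"(*desc));   /* point the CPU at a new IDT */
}

static inline void store_idt(struct idt_descriptor *desc)
{
    __asm__ volatile("sidt %0" : "=m"(*desc));    /* read back the current IDTR */
}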
If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understand so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens? How does the driver give back the requested data to user space?
The driver usually registers a special device file in /dev; pointers to several driver functions are registered for this file as "file operations". A user-space program opens this file (the open syscall), and the kernel calls the device's open code; the program then calls the poll or read syscall on this fd, and the kernel calls the *poll or *read of the driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver may put the caller to sleep (wait_event*), and the irq handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2).
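Put together, the classic LDD3-style pattern looks roughly like this. It is a sketch only: the names, the single data_ready flag, and the elided copy_to_user() step are placeholders.

#include <linux/fs.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(sample_waitq);
static int sample_data_ready;

/* The interrupt handler (registered elsewhere with request_irq()) only
 * notes that data arrived and wakes up any sleeping readers. */
static irqreturn_t sample_isr(int irq, void *dev_id)
{
    sample_data_ready = 1;
    wake_up_interruptible(&sample_waitq);
    return IRQ_HANDLED;
}

/* read() on the /dev node: the calling process sleeps until the ISR wakes it. */
static ssize_t sample_read(struct file *filp, char __user *buf,
                           size_t count, loff_t *ppos)
{
    if (wait_event_interruptible(sample_waitq, sample_data_ready))
        return -ERESTARTSYS;
    sample_data_ready = 0;
    /* ...copy_to_user() the data fetched from the device here... */
    return 0;
}

static const struct file_operations sample_fops = {
    .owner = THIS_MODULE,
    .read  = sample_read,
};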
You can read more about Linux driver creation in the book Linux Device Drivers, 3rd edition (2005) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf

Linux PCI Device Driver - Bus v. Kernel IRQ

I am writing a device driver for a PCIe card in Linux. I am trying to use interrupts in my driver.
Reading the "IRQ Line" section of the PCI configuration register (offset 0x3C) reports that the assigned IRQ line for the device is 11. lspci -b -vv also reports that my device's interrupt number is 11.
Here's where it gets weird... cat /sys/bus/pci/devices/<my_device>/irq reports that the interrupt number is 19. lspci -vv also reports that the interrupt number is 19.
Requesting 11 in my driver does not work. If I request 19 in the driver, I catch interrupts just fine.
What gives?
Thanks!!!
I believe that it has to do with the difference between "physical" and "virtual" IRQ lines. Because the processor has a limited number of physical IRQ lines it assigns virtual IRQ lines to allow the total number of PCI devices to exceed the number of physical lines.
In this instance, 19 is your virtual IRQ line (as recognized by the processor) while 11 is the physical line (as recognized by the PCI device).
By the way, you should probably really get the IRQ number from the struct pci_dev for that device since they're dynamically generated.
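In code, that advice amounts to something like the following probe() fragment; the handler and device name are placeholders and error handling is minimal.

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Placeholder handler; a real driver would check its device's status here. */
static irqreturn_t my_pci_isr(int irq, void *dev_id)
{
    return IRQ_HANDLED;
}

static int my_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    int err = pci_enable_device(pdev);
    if (err)
        return err;

    /* pdev->irq is the kernel IRQ number (19 in the question), which is what
     * request_irq() wants -- not the raw config-space Interrupt Line value (11). */
    err = request_irq(pdev->irq, my_pci_isr, IRQF_SHARED, "my_pci_dev", pdev);
    if (err)
        pci_disable_device(pdev);
    return err;
}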
Sean's answer is easy to understand. However here I would try to make it more complete.
The CPU's IRQ pin is almost never connected directly to a peripheral device, but via a programmable interrupt controller (PIC, e.g. the Intel 8259A). This helps handle large device fan-out and also heterogeneous interrupt formats (pin-based vs. message-based, as in PCIe).
If you run a recent version of lspci, it would print information like
Interrupt: pin A routed to IRQ 26
Here, pin A, like the 11 in the OP, is the physical pin. This is something saved by the PCI device and used by the hardware to talk to the interrupt controller. From LDP:
The PCI set up code writes the pin number of the interrupt controller
into the PCI configuration header for each device. It determines the
interrupt pin (or IRQ) number using its knowledge of the PCI interrupt
routing topology together with the devices PCI slot number and which
PCI interrupt pin that it is using. The interrupt pin that a device
uses is fixed and is kept in a field in the PCI configuration header
for this device. It writes this information into the interrupt line
field that is reserved for this purpose. When the device driver runs,
it reads this information and uses it to request control of the
interrupt from the Linux kernel.
The IRQ 26, like the 19 in the OP, is something that kernel code and the CPU deal with. According to Linux Documentation/IRQ.txt:
An IRQ number is a kernel identifier used to talk about a hardware
interrupt source. Typically this is an index into the global irq_desc
array, but except for what linux/interrupt.h implements the details
are architecture specific.
So the PIC first receives the interrupt from the device, translates the interrupt source into an IRQ number and informs the CPU. The CPU uses the IRQ number to look into the Interrupt Descriptor Table (IDT) and find the correct software handler.
Ref:
http://www.tldp.org/LDP/tlk/dd/interrupts.html
http://www.brokenthorn.com/Resources/OSDevPic.html

Trap Dispatching on Windows

I am actually reading Windows Internals, 5th edition, and I am enjoying it, although it isn't an easy book to read and understand.
I am confused about IRQLs and the IDT.
I read that Windows implements custom prioritization levels with IRQLs, and that the Plug and Play Manager maps IRQs from devices to IRQLs.
Alright, so IRQLs are used for software and hardware interrupts, and exceptions are handled by the exception dispatcher.
When a device generates an interrupt, the interrupt controller passes this information to the CPU along with the IRQ.
So Windows takes this IRQ and translates it to an IRQL to schedule when to execute the routine (the routine that IDT[IRQ_VALUE] is pointing to)?
Is that what is happening?
Yes, on a very high level.
Everything starts with a kernel trap. The kernel trap handler handles interrupts, exceptions, system service calls and the virtual memory pager.
When an interrupt happens (line-based, using a dedicated pin, or message-based, writing to an address), Windows uses the IRQL to determine the priority of the interrupt and uses this to decide whether the interrupt can be served at that time. The HAL does the job of translating the IRQ to an IRQL.
It then uses the IRQ as an index into the IDT to find the appropriate ISR routine to invoke. Note that there can be multiple ISRs associated with a given IRQ; all of them execute in order.
Each processor has its own IDT, so you could potentially have multiple ISRs running at the same time.
Exception dispatch, as I mentioned before, is also handled by the kernel trap but the procedure for it is different. It usually starts by checking for any exception handlers by stack unwinding, then checking for debugger port etc.

How to send an NMI on the same system

I need to send an NMI on the system I am working on. I want to test a few things which I have implemented. Is there any Windows driver routine which allows us to do that? I think I can write to a port using __outword. Is there any other way to do it?
I have one more question. Are there any specific scenarios which cause an NMI? (However, I don't want the system to BSOD or triple fault.)
Thanks
From Intel's Software Development Manual: System Programming Guide:
The nonmaskable interrupt (NMI) can be generated in either of two ways:
External hardware asserts the NMI pin.
The processor receives a message on the system bus (Pentium 4, Intel Core Duo, Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6 family and Pentium processors) with a delivery mode NMI.
and
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector 2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI interrupt. A true NMI interrupt that activates the processors NMI-handling hardware can only be delivered through one of the mechanisms listed above.
So, if all you want to do is trigger the NMI handler, you can simply use int $2 (int 02h in Intel syntax). But, if you need to ensure that it is not masked, you will either need external hardware to trigger it, or to use the APIC.
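For the simple case, the software-interrupt route is one line of inline assembly (GCC syntax shown); as the manual excerpt says, this invokes the vector-2 handler but is not a true NMI.

/* Invokes the handler installed at vector 2 via a software interrupt.
 * This is maskable and does not exercise the CPU's real NMI hardware. */
static inline void raise_vector2(void)
{
    __asm__ volatile("int $2");
}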
If you choose to use the APIC to send an NMI, the easiest way to do it is to send an inter-processor interrupt. To do this, you will need access to the local APIC's registers, which are mapped into physical memory, by default at the address 0xFEE00000, although that can be changed. You will need to find the physical page containing the APIC's registers and map it into virtual memory so that you can access them.
In order to send an IPI, you need to write into the interrupt command register (ICR). The ICR's low 32 bits are located at 0x300 within the APIC's page, and the upper 32 bits are at 0x310. To send the NMI, you need to:
Get the APIC ID of the processor you want to send the NMI to. If you want to send it to the processor you are running on, this is simple since you can read it from the APIC at 0x20 in bits 24-31.
Write the APIC ID into the destination field, bits 24-31 of the high ICR register.
Write the value 0x4400 into the low ICR register. Bits 8-10 of this value indicate that you are sending an NMI, and bit 14 indicates that you are using the assert trigger mode.
When writing to an APIC register, you must write a full 32 bit value. Also, bits 13, 16-17, and 20-55 in the ICR are reserved, so you should not change their values. You also must write to the high bits of the ICR before the low bits, since the IPI is triggered by the write to the low bits.
Here is an example of sending an NMI to the current processor in C.
#include <stdint.h>

#define APIC_ID_OFFSET  0x20
#define ICR_LOW_OFFSET  0x300
#define ICR_HIGH_OFFSET 0x310

// Convenience macro used to access APIC registers; volatile because they are memory-mapped I/O
#define APIC_REG(offset) (*(volatile uint32_t *)((uintptr_t)apicAddress + (offset)))

void *apicAddress; // This should contain the virtual address that the APIC registers are mapped to

void send_nmi_to_self(void)
{
    // Get the current APIC ID. Leave it in the high 8 bits since that is where it needs to be written anyway
    uint32_t apicID = APIC_REG(APIC_ID_OFFSET) & 0xFF000000;

    // Preserve the reserved bits of the ICR, then set the destination and
    // the delivery fields (0x4400: delivery mode = NMI, level = assert)
    uint32_t high = (APIC_REG(ICR_HIGH_OFFSET) & 0x00FFFFFF) | apicID;
    uint32_t low  = (APIC_REG(ICR_LOW_OFFSET)  & 0xFFF32000) | 0x4400;

    // Write the high half first; the write to the low half is what triggers the IPI
    APIC_REG(ICR_HIGH_OFFSET) = high;
    APIC_REG(ICR_LOW_OFFSET)  = low;
}
