Can I increase a thread irq priority - linux-kernel

I am writing a device driver that receives data from hardware. When device data is ready, the device raises an IRQ to the Linux host. The communication interface is SPI. However, the SPI controller driver's read function cannot be called directly from the hard IRQ handler, because it may sleep.
So I use request_threaded_irq() to create a threaded IRQ and put the data read function in the threaded handler (the bottom half).
Unfortunately, I found that the bottom half runs after a large and unstable delay, varying from tens to hundreds of microseconds.
My question: is there a way to keep this delay below some bound, for example by increasing the priority of the bottom-half IRQ thread?

Related

Interrupt Nested, Sequencing

I am reading the Linux kernel documentation and I have these questions (x86_64 architecture):
When the PIC sends an interrupt to the CPU, does it disable that specific interrupt until the acknowledgement comes from the CPU? If so, why do we need local_irq_disable() in the ISR?
Related to the above question: if the CPU is processing an interrupt in its ISR and the same device sends 3 more interrupts to the CPU, how is this handled? Are they serialised in some buffer (and if so, where)?
Does the x86 architecture support priority-based interrupts?
The PIC is a very old interrupt controller; today interrupts are mostly delivered through MSI or through the APIC hierarchy.
The matter is actually more complicated with the IRQ routing, virtualization and so on.
I won't discuss these.
The interrupt priority concept still exists (though a bit simplified) and it works like this:
When an interrupt request is received by the interrupt controller, all the lower priority interrupts are masked and the interrupt is sent to the CPU.
What actually happens is that interrupts are ordered by their request number, with lower numbers having higher priority (0 has more priority than 1).
When any request line is toggled or asserted, the interrupt controller will scan the status of each request line from the number 0 up to the last one.
It stops as soon as it finds a line that is asserted or that is marked as in processing (by means of a secondary register).
This way, if request line 2 is asserted first and then request line 4 is, the interrupt controller won't serve the latter request until the first one is "done", because line 2 stops the scanning.
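The scan described above can be sketched in a few lines. This is only a toy model under stated assumptions: 8 request lines, and the names `next_to_serve`, `requested` and `in_service` are made up for illustration, not taken from any real controller or kernel API.

```python
NUM_LINES = 8  # assumption: an 8-line controller, for illustration only


def next_to_serve(requested, in_service):
    """Return the line the controller would deliver next, or None.

    requested:  set of line numbers with a pending request
    in_service: set of line numbers being processed (no "done" signal yet)

    The scan goes from line 0 upward and stops at the first line that is
    either in service or requested. Only a requested line is delivered.
    """
    for line in range(NUM_LINES):
        if line in in_service:
            # An in-service line stops the scan: nothing at this priority
            # or lower can be delivered until it is "done".
            return None
        if line in requested:
            return line
    return None
```

With line 2 in service, a request on line 4 is held back (`next_to_serve({4}, {2})` returns `None`), while a request on line 1 still gets through (`next_to_serve({1, 4}, {2})` returns `1`). Once line 2 is done, `next_to_serve({4}, set())` returns `4`.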
So local_irq_disable may be used to disable all interrupts, including those with higher priority.
AFAIK, this function should be rarely used today. It is a very simple, but inefficient, way to make sure no other code can run (potentially altering common structures).
In general, there needs to be some coordination between the ISR and the device to avoid losing interrupts.
Some devices require the software to write to a special register to let them know it is able to process the next interrupt. This way the device may implement an internal queue of notifications.
The keyboard controller works kind of like this: if you don't read the scancodes fast enough, you simply lose them.
If the device fires interrupts at will and too frequently, the interrupt controller can buffer the requests so they don't get lost.
Both the PIC and the LAPIC can buffer at most one request while another one is in progress (they basically use the fact that they have a request register and an in-progress register for each interrupt).
So in the case of three interrupts in a row, one is surely lost. If the interrupt controller couldn't deliver the first one to the CPU because a higher priority interrupt was in progress, then two will be lost.
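To make the counting concrete, here is a toy model of a single interrupt line with one request bit and one in-progress bit, as described above. The class and function names are hypothetical; real controllers have more state than this.

```python
class Line:
    """One interrupt line: a request bit plus an in-progress bit."""

    def __init__(self):
        self.requested = False   # request register bit
        self.in_service = False  # in-progress register bit
        self.lost = 0            # requests that could not be recorded

    def raise_irq(self):
        if self.requested:
            self.lost += 1       # bit already set, nothing left to buffer in
        else:
            self.requested = True

    def deliver(self):
        """Hand the pending request to the CPU if nothing is in progress."""
        if self.requested and not self.in_service:
            self.requested = False
            self.in_service = True
            return True
        return False


def count_lost(num_requests, first_delivered):
    """Fire num_requests back to back and count how many are lost.

    first_delivered: whether the controller manages to deliver the first
    request to the CPU before the following ones arrive.
    """
    line = Line()
    for i in range(num_requests):
        line.raise_irq()
        if i == 0 and first_delivered:
            line.deliver()
    return line.lost
```

In this model `count_lost(3, first_delivered=True)` gives 1 and `count_lost(3, first_delivered=False)` gives 2, matching the two cases above.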
In general, the software doesn't expect the interrupt controller to buffer any request.
So you shouldn't find code that relies on this (after all, the only numbers in CS are 0, 1, and infinity, so 2 doesn't exist as far as the software is concerned).
The x86, as a CPU core, doesn't support priority when dealing with interrupts. If interrupts are not masked and a hardware interrupt arrives, it is served. It's up to the software and the interrupt controller to prioritize interrupts.
The PIC and LAPIC (and so the MSIs and the IOAPIC) both give interrupts a priority, so for all practical purposes the x86 supports a priority-based interrupt mechanism.
Note, however, that giving interrupts a priority is not necessarily good: it's hard to tell whether a network packet is more important than a keystroke.
So Linux has the guideline to do as little work as possible in the ISR and instead to queue the rest of the work to be processed asynchronously out of the ISR.
This may mean simply returning from the ISR and leaving the rest to a work function, so as not to block other interrupts.
In the vast majority of cases, only a small portion of code needs to be run in a critical section, a condition where no other interrupt should occur, so the general approach is to return the EOI to the interrupt controller and unmask the interrupt in the CPU as early as possible and write the code so that it can be interrupted.
In case one needs to hold off other interrupts for performance reasons, the approach usually taken is to spread the interrupts across different cores so that the load stays within the required metrics.
Before multi-core systems were widespread, having too many interrupts would effectively slow down some operations.
I guess it would be possible to load a driver that would deny other interrupts for its own performance, but that is a form of QoS/real-time requirement that is up to the user to settle.

Trigger packet transmit for DPDK/DPAA2 from FPGA

I want to transmit a small static UDP packet upon receiving a trigger signal from an FPGA via GPIO. This has to happen within about 1 microsecond, with low latency and no jitter. My setup consists of an FPGA card connected to an NXP processor via a PCIe lane.
My current experimentation showed that even starting the transmit from the GPIO interrupt handler in the kernel typically exhibits too high a jitter to be useful for the application (about one microsecond should be doable). As I am not familiar with DPDK, I wanted to ask whether it can be of any help in this situation.
Can I use DPDK to do the following?
1. Prepare the UDP payload in a buffer.
2. Push the buffer to DPAA2.
3. Poll periodically for the GPIO from the FPGA over an mmaped area on PCIe in the DPDK application.
4. Trigger the transmit of the buffer in DPAA2 (and not CPU DDR memory).
Question: instead of issuing the transmit through DPDK's rte_eth_tx_burst, the FPGA should interact directly with the networking hardware to queue the packet. Can DPDK on NXP do this for my use case?
Note: if DPDK is not going to help, I think I would need to map an IO portal of the DPAA2 management complex directly into the FPGA. But according to the documentation from NXP, they do not consider DPAA2 a public API (unlike USDPAA) and only support it through e.g. DPDK.

What is the kernel timer system and how is it related to the scheduler?

I'm having a hard time understanding this.
How does the scheduler know that a certain period of time has passed?
Does it use some sort of syscall or interrupt for that?
What's the point of using the constant HZ instead of seconds?
What does the system timer have to do with the scheduler?
How does the scheduler know that a certain period of time has passed?
The scheduler consults the system clock.
Does it use some sort of syscall or interrupt for that?
Since the system clock is updated frequently, it suffices for the scheduler to just read its current value. The scheduler is already in kernel mode so there is no syscall interface involved in reading the clock.
Yes, there are timer interrupts that trigger an ISR, an interrupt service routine, which reads hardware registers and advances the current value of the system clock.
What's the point of using the constant HZ instead of seconds?
Once upon a time there was a significant cost to invoking the ISR, and on each invocation it performed a certain amount of bookkeeping, such as checking whether a scheduler quantum had expired and firing TCP RTO retransmit timers. The hardware had limited flexibility and could only invoke the ISR at fixed intervals, e.g. every 10 ms if HZ is 100. Higher HZ values made it more likely that the ISR would run and find nothing to do because no events had occurred since the previous run, in which case the ISR was pure overhead, cycles stolen from a foreground user task. Lower HZ values would hurt dispatch latency, leading to sluggish network and interactive response times. The HZ tuning tradeoff tended to wind up somewhere near 100 or 1000 for practical hardware systems. APIs that reported system clock time could only do so in units of ticks, where each ISR invocation advanced the clock by one tick. So callers needed to know the value of HZ in order to convert from tick units to S.I. units. Modern systems perform network tasks on a separately scheduled TCP kernel thread, and may support tickless kernels, which discard many of these outdated assumptions.
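As a rough illustration of the tick arithmetic, assuming one tick per timer interrupt and HZ interrupts per second. The helper names below are made up for this sketch and are not the kernel's own conversion functions.

```python
def tick_period_ms(hz):
    """Length of one tick in milliseconds for a given HZ."""
    return 1000 / hz


def ticks_to_seconds(ticks, hz):
    """Convert a tick count to seconds."""
    return ticks / hz


def ms_to_ticks(ms, hz):
    """Convert milliseconds to ticks, rounding up so a timeout never fires early."""
    return (ms * hz + 999) // 1000
```

With HZ = 100, `tick_period_ms(100)` is 10.0 and 250 ticks are 2.5 seconds. A 15 ms timeout becomes 2 ticks at HZ = 100 but 15 ticks at HZ = 1000, which is the resolution side of the tradeoff described above.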
What does the system timer have to do with the scheduler?
The scheduler runs when the system timer fires an interrupt.
The nature of a pre-emptive scheduler is that it can pause "spinning" usermode code, e.g. while (1) {}, and manipulate the run queue, even on a single-core system.
Additionally, the scheduler runs when a process gives up the CPU before its time slice ends, e.g. when it issues a syscall or takes a page fault.

Ring buffers and DMA

I'm trying to understand everything that happens in between the time a packet reaches the NIC until the time the packet is received by the target application.
Assumption: buffers are big enough to hold an entire packet. [I know it is not always the case, but I don't want to introduce too many technical details]
One option is:
1. Packet reaches the NIC.
2. Interrupt is raised.
3. Packet is transferred from the NIC buffer to the OS's memory by means of DMA.
4. Interrupt is raised and the OS copies the packet from its buffer to the relevant application.
The problem with the above is when there is a short burst of data and the kernel can't keep up with the pace. Another problem is that every packet triggers an interrupt, which sounds very inefficient to me.
I know that, to solve at least one of the above problems, several buffers are used [a ring buffer]. However, I don't understand the mechanism that makes this work.
Suppose that:
1. Packet arrives to the NIC.
2. DMA is triggered and the packet is transferred to one of the buffers [from the ring buffer].
3. Handling of the packet is then scheduled for a later time [bottom half].
Will this work?
Is this what happens in a real NIC driver within the Linux kernel?
According to this SlideShare presentation, the correct sequence of actions is:
1. The network device receives frames, and these frames are transferred to the DMA ring buffer.
2. After this transfer, an interrupt is raised to let the CPU know that the transfer has been made.
3. In the interrupt handler routine, the CPU moves the data from the DMA ring buffer to the CPU network input queue for later processing.
4. The bottom half of the handler routine processes the packets from the CPU network input queue and passes them to the appropriate layers.
So the difference from a traditional DMA transfer lies in when the CPU gets involved.
Here the CPU is involved only after the data has been transferred to the DMA ring buffer. In a traditional DMA transfer, the interrupt is generated as soon as data is available, and the CPU is expected to program the DMA device with the appropriate memory locations before the transfer can happen.
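The sequence above can be modelled in userspace to see how the pieces fit together. This is only a sketch: `RxRing`, `irq_handler` and `bottom_half` are hypothetical names, and a real driver works with DMA descriptors and NAPI rather than Python lists.

```python
from collections import deque


class RxRing:
    """Fixed-size ring that the (simulated) NIC fills by DMA."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0      # next slot the NIC writes
        self.tail = 0      # next slot the driver reads
        self.count = 0     # filled slots
        self.dropped = 0   # frames that arrived while the ring was full

    def dma_write(self, frame):
        """The NIC places a frame in the ring without involving the CPU."""
        if self.count == len(self.slots):
            self.dropped += 1
            return False
        self.slots[self.head] = frame
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def drain(self):
        """Remove and return all filled slots, oldest first."""
        frames = []
        while self.count:
            frames.append(self.slots[self.tail])
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % len(self.slots)
            self.count -= 1
        return frames


def irq_handler(ring, input_queue):
    """Top half: move frames from the ring to the input queue and return."""
    input_queue.extend(ring.drain())


def bottom_half(input_queue):
    """Bottom half: process queued frames later, outside the interrupt."""
    delivered = []
    while input_queue:
        delivered.append(input_queue.popleft())
    return delivered
```

A burst of frames can land in the ring before a single interrupt is handled, so one interrupt may cover several packets. Frames are only dropped when the burst is larger than the ring.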
Read this as well: https://www.safaribooksonline.com/library/view/linux-device-drivers/0596000081/ch13s04.html

What is the advantage of using GPIO as IRQ?

I know that we convert the GPIO to an IRQ, but I want to understand what the advantage of doing so is.
If we need an interrupt, why can't we have a dedicated interrupt line in the first place and use it directly as an interrupt?
What is the advantage of using GPIO as IRQ?
If I get your question, you are asking why even bother having a GPIO? The other answers show that someone may not even want the IRQ feature of an interrupt. Typical GPIO controllers can configure an I/O as either an input or an output.
Many GPIO pads have the flexibility to be open drain. With an open drain configuration, you may have a bi-directional 'BUS' and data can be both sent and received. Here you need to change from an input to an output. You can imagine this if you bit-bash I2C communications. This type of use may be fine if the I2C is only used to initialize some other interface at boot.
Even if the interface is not bi-directional, you might wish to capture on each edge. Various peripherals use zero crossing and a timer to decode a signal. For example a laser bar code reader, a magnetic stripe reader, or a bit-bashed UART might look at the time between zero crossings. Is the time double a bit width? Is the line high or low; then shift previous value and add two bits. In these cases you have to look at the signal to see whether the line is high or low. This can happen even if polarity shouldn't matter as short noise pulses can cause confusion.
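As a rough sketch of the edge-timing idea: each interrupt gives you the time since the previous edge, and reading the pin tells you which level the line held during that time. The function name, the `(duration, level)` input format and the rounding rule are assumptions made for this illustration, not a real decoder for any particular protocol.

```python
def decode_edges(intervals, bit_width):
    """Rebuild a bit string from the time spent at each level.

    intervals: list of (duration, level) pairs, one per gap between edges,
               where level is 0 or 1 as read from the GPIO
    bit_width: nominal duration of one bit, in the same unit as duration

    Returns (value, nbits). A gap of about two bit widths contributes
    two bits of the same level, and so on.
    """
    value = 0
    nbits = 0
    for duration, level in intervals:
        # Round to the nearest whole number of bits to tolerate small jitter.
        n = round(duration / bit_width)
        for _ in range(n):
            value = (value << 1) | level
            nbits += 1
    return value, nbits
```

For example, `decode_edges([(100, 1), (200, 0), (100, 1)], 100)` yields `(9, 4)`, i.e. the bits 1001. Without the level read from the GPIO, the durations alone could not tell 1001 from 0110.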
So even for the case where you have only the input as an interrupt, the current level of the signal is often very useful. If this GPIO interrupt happens to be connected to an Ethernet controller and active high means data is ready, then you don't need to have the 'I/O' feature. However, this case is using the GPIO interrupt feature as glue logic. Often this signalling will be integrated into a dedicated module. The case where you only need the interrupt is typically some custom hardware to detect a signal (case open, power disconnect, etc) which is not industry standard.
The ARM SOC vendor has no idea which of the cases above the OEM might use. The SOC vendor gives lots of flexibility, as the transistors on the die are cheap compared to the wire bonds/pins on the package. It means that you, who only use the interrupt feature, get economies of scale (and a cheaper part) because others might be using these features, and the ARM SOC vendor gets to distribute the NRE cost among more people.
In a perfect world, there is maybe no need for this. Not so long ago, when transistors were more expensive, some lines behaved only as interrupts (some M68k CPUs have this). Historically, the ARM has only a single interrupt line with one common routine (the Cortex-M parts are different). So the interrupt source has to be determined by reading another register. As the hardware needs to capture the state of the line on the ARM, it is almost free to add the 'input controller' portion.
Also, for this reason, all of the ARM Linux GPIO drivers have a macro to convert from a GPIO pin to an interrupt number, as they are usually one-to-one mapped. There is usually a single 'GIC' interrupt for the GPIO controller. There is a 'GPIO' interrupt controller, which forms a tree of interrupt controllers with the GIC as the root. Typically, the GPIO IRQ numbers are Max GIC IRQ + port * 32 + pin; so the GPIO IRQ numbers are just appended to the 'GIC' IRQ numbers.
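The numbering formula can be written out directly. The function name and the example value of 160 for the highest GIC IRQ number are made up for illustration; the real values depend on the SOC.

```python
PINS_PER_PORT = 32  # from the formula above: 32 pins per GPIO port


def gpio_to_irq(max_gic_irq, port, pin):
    """Map (port, pin) to an IRQ number appended after the GIC range."""
    if not 0 <= pin < PINS_PER_PORT:
        raise ValueError("pin must be in the range 0..31")
    if port < 0:
        raise ValueError("port must not be negative")
    return max_gic_irq + port * PINS_PER_PORT + pin
```

Assuming a highest GIC IRQ number of 160, pin 5 on port 2 maps to `gpio_to_irq(160, 2, 5)`, which is 160 + 64 + 5 = 229.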
If you were designing a bespoke ASIC for one specific system you could indeed do precisely that - only implement exactly what you need.
However, most processors/SoCs are produced as commodity products, so more flexibility allows them to be integrated in a wider variety of systems (and thus sell more). Given modern silicon processes, chip size tends to be constrained by the physical packaging, so pin count is at an absolute premium. Therefore, allowing pins to double up as either I/O or interrupt sources depending on the needs of the user offers more functionality in a given space, or the same functionality in less space, depending on which way you look at it.
It is not about "converting" anything - on a typical processor or microcontroller, a number of peripherals are connected to an interrupt controller; GPIO is just one of those peripherals. It is also by no means universally true; different devices have different capabilities, but in any case you are simply configuring a GPIO pin to generate an interrupt - that's a normal function of the GPIO not a "conversion".
Prior to ARM Cortex, ARM did not define an interrupt controller, and the core itself had only two interrupt sources (IRQ and FIQ). A vendor-defined interrupt controller was required to multiplex the single IRQ over multiple peripherals. ARM Cortex defines an interrupt controller and a more flexible interrupt architecture; it is possible to achieve a zero-latency interrupt from a GPIO, so there is no real advantage in having a dedicated interrupt line. A dedicated line might also require adding external signal-conditioning circuitry of the kind that is often incorporated into the GPIO on the die.
