Who passes control to the ISR while the CPU is executing a BH? - linux-kernel

Assume that an interrupt occurs on a unicore (single-core) processor.
As a general practice, the scheduler is disabled while the CPU is serving the ISR.
The ISR disables the current IRQ and schedules the bottom half (a tasklet here) for the deferred work.
After the ISR has been served (and the IRQ re-enabled), the processor gets the chance to run the scheduled bottom half.
Meanwhile the interrupt occurs again, so the currently running BH is preempted and the CPU executes the new ISR.
In this case, who is responsible for switching control from the BH to the ISR?
My question assumes that the scheduler is disabled and that the system has a unicore processor.

Passing control to interrupt handlers is not done by the process scheduler. It is a hardware mechanism. On x86, the interrupt handling routines are specified by the IDT, which the kernel sets up early during boot.
If you are really interested in the mechanism, you can read further in this and this, or in your processor architecture's developer manual.
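The table-driven dispatch described in this answer can be modelled in plain user-space C. This is only a sketch under stated assumptions: sim_idt, sim_register_handler and sim_raise_interrupt are hypothetical names, and a real IDT holds gate descriptors rather than bare function pointers.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_VECTORS 256

typedef void (*sim_isr_t)(void);

/* Hypothetical stand-in for the IDT: one handler slot per vector. */
static sim_isr_t sim_idt[SIM_NR_VECTORS];

/* Counts how often the example handler ran. */
static int sim_timer_ticks;

/* Example handler installed by the "kernel". */
void sim_timer_isr(void)
{
    sim_timer_ticks++;
}

/* What the kernel does once, early on: fill in the table. */
void sim_register_handler(unsigned int vector, sim_isr_t isr)
{
    assert(vector < SIM_NR_VECTORS);
    sim_idt[vector] = isr;
}

/*
 * What the "hardware" does on every interrupt: index the table and jump.
 * No scheduler is consulted. Returns 0 if a handler ran, -1 otherwise.
 */
int sim_raise_interrupt(unsigned int vector)
{
    if (vector >= SIM_NR_VECTORS || sim_idt[vector] == NULL)
        return -1;
    sim_idt[vector]();
    return 0;
}
```

Whatever code is running when sim_raise_interrupt() is called (a bottom half included) is simply suspended until the handler returns, which is the point of the answer.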

Related

Bottom Halves on FreeRTOS?

I have heard about deferred interrupts in FreeRTOS, but as I understand it, the task that the ISR hands off to in order to do the necessary work runs in task/process context. Is there a scheme similar to tasklets or softirqs, in which the deferred work runs in interrupt context rather than process context?
The ISR and the deferred interrupt handler task in FreeRTOS work similarly to the top half and bottom half (tasklets) in Linux.
They are often used to handle frequent interrupt requests when the ISR would otherwise need to perform lengthy operations.
In FreeRTOS, to defer the processing of a function to the RTOS daemon task, use xTimerPendFunctionCall() or, from an ISR, xTimerPendFunctionCallFromISR(). The deferred function must match the PendedFunction_t prototype (shown as vPendableFunction in the FreeRTOS documentation). This is similar to implementing a bottom half (tasklet) in Linux.
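The deferral pattern can be sketched without FreeRTOS itself. The model below is not the FreeRTOS implementation: sim_pend_from_isr and sim_daemon_drain are hypothetical names, and only the shape of the callback (a void pointer plus a 32-bit value) is borrowed from PendedFunction_t. The "ISR" only records the work; the "daemon task" runs it later in task context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same parameter shape as FreeRTOS's PendedFunction_t. */
typedef void (*sim_pended_fn_t)(void *param1, uint32_t param2);

struct sim_pended_call {
    sim_pended_fn_t fn;
    void *param1;
    uint32_t param2;
};

#define SIM_QUEUE_LEN 4

/* Hypothetical ring buffer standing in for the daemon task's command queue. */
static struct sim_pended_call sim_queue[SIM_QUEUE_LEN];
static size_t sim_head, sim_count;

/* Called from the simulated ISR: records the work, does not run it.
 * Returns 1 on success, 0 if the queue is full. */
int sim_pend_from_isr(sim_pended_fn_t fn, void *param1, uint32_t param2)
{
    assert(fn != NULL);
    if (sim_count == SIM_QUEUE_LEN)
        return 0;
    size_t tail = (sim_head + sim_count) % SIM_QUEUE_LEN;
    sim_queue[tail] = (struct sim_pended_call){ fn, param1, param2 };
    sim_count++;
    return 1;
}

/* Called later by the simulated daemon task. Returns the number of calls run. */
int sim_daemon_drain(void)
{
    int ran = 0;
    while (sim_count > 0) {
        struct sim_pended_call c = sim_queue[sim_head];
        sim_head = (sim_head + 1) % SIM_QUEUE_LEN;
        sim_count--;
        c.fn(c.param1, c.param2);
        ran++;
    }
    return ran;
}

/* Example deferred handler: adds param2 to the counter that param1 points to. */
void sim_process_sample(void *param1, uint32_t param2)
{
    *(uint32_t *)param1 += param2;
}
```

Nothing is processed until sim_daemon_drain() runs, so the lengthy part of the work stays out of the interrupt path.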

When we use irq_set_chained_handler, is the IRQ line disabled or not?

When we use irq_set_chained_handler, is the IRQ line disabled while the associated handler is being serviced, as it is with request_irq?
It doesn't matter how the interrupt was set up. When any interrupt occurs, all interrupts (for this CPU) are disabled for the duration of the interrupt handler. For example, on the ARM architecture the first place in C code where interrupt handling happens is the asm_do_IRQ() function (defined in arch/arm/kernel/irq.c), which is called from assembler code. For any interrupt (whether it was requested by request_irq() or by irq_set_chained_handler()) the same asm_do_IRQ() function is called, and interrupts are disabled automatically by the ARM CPU. See this answer for details.
Historical notes
It is also worth mentioning that some time ago the Linux kernel provided two types of interrupts: "fast" and "slow" ones. Fast interrupts (registered with the IRQF_DISABLED or SA_INTERRUPT flag) ran with interrupts disabled, and their handlers were supposed to be very short and quick. Slow interrupts, on the other hand, ran with interrupts re-enabled, because their handlers could take a long time to complete.
On modern versions of the Linux kernel all interrupts are treated as "fast" and run with interrupts disabled. Interrupts with long-running handlers must be implemented as threaded handlers (or must enable interrupts manually in the ISR using local_irq_enable_in_hardirq()).
That behavior was changed in Linux kernel v2.6.35 by this commit. You can find more details about this here.
Refer to https://www.kernel.org/doc/Documentation/gpio/driver.txt:
This means the GPIO irqchip is registered using
irq_set_chained_handler() or the corresponding
gpiochip_set_chained_irqchip() helper function, and the GPIO irqchip
handler will be called immediately from the parent irqchip, while
holding the IRQs disabled. The GPIO irqchip will then end up calling
something like this sequence in its interrupt handler:

Can an interrupt handler be preempted by the same interrupt handler?

Does the CPU disable all interrupts on local CPU before calling the interrupt handler?
Or does it only disable that particular interrupt line, which is being served?
x86 disables all local interrupts (except NMI, of course) before jumping to the interrupt vector. Linux normally masks the specific interrupt and re-enables the rest of the interrupts (those which aren't masked), unless a specific flag is passed when the interrupt handler is registered.
Note that while this means your interrupt handler will not race with itself on the same CPU, it can and will race with itself running on other CPUs in an SMP / SMT system.
Normally (at least in x86), an interrupt disables interrupts.
When an interrupt is received, the hardware does these things:
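1. Save the interrupted context in a predetermined place (on x86, the hardware itself pushes at least the flags, code segment, and instruction pointer onto the stack; the handler saves the remaining registers).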
2. Set the instruction pointer (AKA program counter) to the interrupt handler's address.
3. Set the register that controls interrupts to a value that disables all (or most) interrupts. This prevents another interrupt from interrupting this one.
An exception is NMI (non maskable interrupt) which can't be disabled.
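The three steps above can be sketched as a toy CPU model. This is an illustration only, not how any real processor is implemented: struct sim_cpu, sim_deliver_irq and sim_return_from_irq are hypothetical names, and the saved context is reduced to a program counter and the interrupt-enable flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_HANDLER_ADDR 0x1000UL  /* assumed address of the handler */

struct sim_cpu {
    unsigned long pc;          /* instruction pointer */
    unsigned long saved_pc;    /* where the interrupted code resumes */
    bool irq_enabled;          /* models the interrupt-enable flag */
    bool saved_irq_enabled;
    bool irq_pending;          /* a maskable interrupt is waiting */
    bool in_handler;
};

/* Deliver a maskable interrupt. Returns true if the handler was entered. */
bool sim_deliver_irq(struct sim_cpu *cpu)
{
    if (!cpu->irq_enabled) {
        cpu->irq_pending = true;               /* masked: remember it for later */
        return false;
    }
    cpu->saved_pc = cpu->pc;                   /* 1. save the context */
    cpu->saved_irq_enabled = cpu->irq_enabled;
    cpu->pc = SIM_HANDLER_ADDR;                /* 2. jump to the handler */
    cpu->irq_enabled = false;                  /* 3. disable further interrupts */
    cpu->in_handler = true;
    return true;
}

/* Return from the handler: restore the saved context. */
void sim_return_from_irq(struct sim_cpu *cpu)
{
    assert(cpu->in_handler);
    cpu->pc = cpu->saved_pc;
    cpu->irq_enabled = cpu->saved_irq_enabled;
    cpu->in_handler = false;
}
```

A second interrupt that arrives while the handler runs is only marked pending in this model, which is why the handler cannot be re-entered on the same CPU.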
Yes, that's fine.
I'd like to also add what I think might be relevant.
In many real-world drivers and kernel code, "bottom-half" (bh) handlers such as tasklets and softirqs are used quite often. These bh's run in interrupt context and can run in parallel with their top-half (th) handlers on SMP (especially softirqs).
Of course, there has recently been a move (mainly code migrating from the PREEMPT_RT project into mainline) that essentially gets rid of the 'bh' mechanism: all interrupt handlers run with all interrupts disabled. Not only that, handlers can be converted to kernel threads: these are the so-called "threaded" interrupt handlers.
As of today, the choice is still left to the developer: you can use the 'traditional' th/bh style or the threaded style.
Ref and Details:
http://lwn.net/Articles/380931/
http://lwn.net/Articles/302043/
Quoting Intel's own, surprisingly well-written "Intel® 64 and IA-32 Architectures Software Developer’s Manual", Volume 1, pages 6-10:
If an interrupt or exception handler is called
through an interrupt gate, the processor clears the interrupt enable (IF) flag in the EFLAGS register to prevent
subsequent interrupts from interfering with the execution of the handler. When a handler is called through a trap
gate, the state of the IF flag is not changed.
So just to be clear: yes, effectively the CPU "disables" all interrupts before calling the interrupt handler. More precisely, the processor clears a flag, which makes it ignore all maskable interrupt requests. The exceptions are probably non-maskable interrupts and/or its own software exceptions (someone please correct me on this; not verified).
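The rule in the quoted passage is small enough to write down directly. The sketch below only restates the manual's sentence as code; enum sim_gate_type and sim_if_after_entry are hypothetical names, and nothing else about gate descriptors is modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_gate_type { SIM_INTERRUPT_GATE, SIM_TRAP_GATE };

/*
 * Value of the interrupt-enable (IF) flag once the handler starts running:
 * an interrupt gate clears it, a trap gate leaves it unchanged.
 */
bool sim_if_after_entry(enum sim_gate_type gate, bool if_before)
{
    if (gate == SIM_INTERRUPT_GATE)
        return false;
    return if_before;
}
```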
We want an ISR to be atomic, so nothing should be able to preempt it.
Therefore, an ISR runs with local interrupts disabled (i.e. interrupts on the current processor), and once the ISR reaches ret_from_intr() (i.e. the ISR has finished), interrupts are enabled again on the current processor.
If another interrupt occurs in the meantime, it can be served by another processor (on an SMP system), and the ISR for that interrupt starts running there.
On an SMP system, we therefore also need a proper synchronization mechanism (a spinlock) in the ISR.
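A minimal user-space sketch of that idea, using C11 atomics rather than the kernel's own spinlock API: struct sim_spinlock, struct sim_device and sim_isr are hypothetical names. The handler busy-waits for the lock instead of sleeping, so two CPUs running the same handler cannot corrupt the shared counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sim_spinlock {
    atomic_flag locked;
};

struct sim_device {
    struct sim_spinlock lock;
    unsigned long rx_count;   /* shared by handlers running on different CPUs */
};

/* Try once to take the lock. Returns true on success. */
bool sim_spin_trylock(struct sim_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is free; this never sleeps. */
void sim_spin_lock(struct sim_spinlock *l)
{
    while (!sim_spin_trylock(l))
        ;
}

void sim_spin_unlock(struct sim_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Handler body: the critical section is short and cannot block. */
void sim_isr(struct sim_device *dev)
{
    sim_spin_lock(&dev->lock);
    dev->rx_count++;
    sim_spin_unlock(&dev->lock);
}
```

In the kernel itself, code in process context that shares such data with a handler would normally also disable local interrupts while holding the lock (for example with spin_lock_irqsave()), so that the handler cannot spin on a lock held by the code it interrupted.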

Trap Dispatching on Windows

I am currently reading Windows Internals, 5th edition, and I am enjoying it, although it isn't an easy book to read and understand.
I am confused about IRQLs and the IDT.
I read that Windows implements its own prioritization levels with IRQLs, and that the Plug and Play Manager maps IRQs from devices to IRQLs.
So IRQLs are used for software and hardware interrupts, while exceptions go through the exception dispatcher.
When a device generates an interrupt, the interrupt controller passes this information to the CPU along with the IRQ.
So Windows takes this IRQ and translates it to an IRQL to decide when to execute the routine (the routine that IDT[IRQ_VALUE] points to)?
Is that what is happening?
Yes, on a very high level.
Everything starts with a kernel trap. The kernel trap handler handles interrupts, exceptions, system service calls, and the virtual memory pager.
When an interrupt happens (line-based, using a dedicated pin, or message-based, writing to an address), Windows uses the IRQL to determine the priority of the interrupt and to decide whether the interrupt can be serviced at that time. The HAL does the job of translating the IRQ to an IRQL.
It then uses the IRQ to get an index into the IDT to find the appropriate ISR to invoke. Note that there can be multiple ISRs associated with a given IRQ; all of them execute in order.
Each processor has its own IDT, so you could potentially have multiple ISRs running at the same time.
Exception dispatch, as I mentioned before, is also handled by the kernel trap handler, but the procedure is different. It usually starts by looking for exception handlers by unwinding the stack, then checking for a debugger port, etc.
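The priority check described above can be sketched as a small model. It is not Windows code: struct sim_processor, sim_request_interrupt and sim_lower_irql are hypothetical names, the level numbers are arbitrary, and the only rule modelled is that an interrupt is serviced immediately if its IRQL is above the processor's current IRQL and otherwise stays pending.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_IRQL 32

struct sim_processor {
    int current_irql;             /* 0 = lowest level */
    bool pending[SIM_MAX_IRQL];   /* interrupts held back until the IRQL drops */
};

/*
 * An interrupt arrives at the given IRQL. Returns true if it is serviced
 * now (the processor's IRQL is raised to match), false if it stays pending.
 */
bool sim_request_interrupt(struct sim_processor *cpu, int irql)
{
    assert(irql > 0 && irql < SIM_MAX_IRQL);
    if (irql > cpu->current_irql) {
        cpu->current_irql = irql;
        return true;
    }
    cpu->pending[irql] = true;
    return false;
}

/*
 * Lower the IRQL, e.g. when an ISR finishes. Returns the highest pending
 * level that is now deliverable (and clears it), or -1 if there is none.
 */
int sim_lower_irql(struct sim_processor *cpu, int new_irql)
{
    cpu->current_irql = new_irql;
    for (int level = SIM_MAX_IRQL - 1; level > new_irql; level--) {
        if (cpu->pending[level]) {
            cpu->pending[level] = false;
            return level;
        }
    }
    return -1;
}
```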

Why can't kernel code/threads executing in interrupt context sleep?

I am reading the following article by Robert Love
http://www.linuxjournal.com/article/6916
that says
"...Let's discuss the fact that work queues run in process context. This is in contrast to the other bottom-half mechanisms, which all run in interrupt context. Code running in interrupt context is unable to sleep, or block, because interrupt context does not have a backing process with which to reschedule. Therefore, because interrupt handlers are not associated with a process, there is nothing for the scheduler to put to sleep and, more importantly, nothing for the scheduler to wake up..."
I don't get it. AFAIK, the scheduler in the kernel is O(1), implemented with a bitmap. So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
The problem is that the interrupt context is not a process, and therefore cannot be put to sleep.
When an interrupt occurs, the processor saves the registers onto the stack and jumps to the start of the interrupt service routine. This means that when the interrupt handler is running, it is running in the context of the process that was executing when the interrupt occurred. The interrupt is executing on that process's stack, and when the interrupt handler completes, that process will resume executing.
If you tried to sleep or block inside an interrupt handler, you would wind up not only stopping the interrupt handler, but also the process it interrupted. This could be dangerous, as the interrupt handler has no way of knowing what the interrupted process was doing, or even if it is safe for that process to be suspended.
A simple scenario where things could go wrong would be a deadlock between the interrupt handler and the process it interrupts.
1. Process1 enters kernel mode.
2. Process1 acquires LockA.
3. Interrupt occurs.
4. ISR starts executing using Process1's stack.
5. ISR tries to acquire LockA.
6. ISR calls sleep to wait for LockA to be released.
At this point, you have a deadlock. Process1 can't resume execution until the ISR is done with its stack. But the ISR is blocked waiting for Process1 to release LockA.
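The scenario can be replayed as a tiny state model. This is a sketch for illustration only: struct sim_system and the sim_* functions are hypothetical, and the model tracks just two facts, namely who holds LockA and whether an ISR frame is sitting on top of Process1's stack.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_owner { SIM_NOBODY, SIM_PROCESS1, SIM_ISR };

struct sim_system {
    enum sim_owner lock_a;   /* current holder of LockA */
    bool isr_on_stack;       /* an ISR frame sits on top of Process1's stack */
};

/* Process1 acquires LockA. */
void sim_process1_takes_lock(struct sim_system *s)
{
    assert(s->lock_a == SIM_NOBODY);
    s->lock_a = SIM_PROCESS1;
}

/* The interrupt arrives and the ISR starts on Process1's stack. */
void sim_interrupt(struct sim_system *s)
{
    s->isr_on_stack = true;
}

/* Process1 may resume (and later unlock) only after the ISR has returned. */
bool sim_process1_can_run(const struct sim_system *s)
{
    return !s->isr_on_stack;
}

/* The ISR may finish only if LockA is not held by Process1. */
bool sim_isr_can_finish(const struct sim_system *s)
{
    return s->lock_a != SIM_PROCESS1;
}

/* Deadlock: the ISR waits for Process1, and Process1 waits for the ISR. */
bool sim_deadlocked(const struct sim_system *s)
{
    return s->isr_on_stack && !sim_isr_can_finish(s) && !sim_process1_can_run(s);
}
```

If Process1 does not hold LockA when the interrupt arrives, sim_deadlocked() stays false, which matches the point that the danger comes from the ISR waiting on state owned by the code it interrupted.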
I think it's a design idea.
Sure, you can design a system in which interrupt handlers can sleep, but apart from making the system complicated and hard to comprehend (there are many, many situations you would have to take into account), that does not help anything. So from a design point of view, declaring that interrupt handlers cannot sleep is very clear and easy to implement.
From Robert Love (a kernel hacker):
http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have
a backing process context, and thus there is nothing to reschedule back
into. In other words, interrupt handlers are not associated with a task,
so there is nothing to "put to sleep" and (more importantly) "nothing to
wake up". They must run atomically.
This is not unlike other operating systems. In most operating systems,
interrupts are not threaded. Bottom halves often are, however.
The reason the page fault handler can sleep is that it is invoked only
by code that is running in process context. Because the kernel's own
memory is not pagable, only user-space memory accesses can result in a
page fault. Thus, only a few certain places (such as calls to
copy_{to,from}_user()) can cause a page fault within the kernel. Those
places must all be made by code that can sleep (i.e., process context,
no locks, et cetera).
Because the thread switching infrastructure is unusable at that point. When servicing an interrupt, only stuff of higher priority can execute - See the Intel Software Developer's Manual on interrupt, task and processor priority. If you did allow another thread to execute (which you imply in your question that it would be easy to do), you wouldn't be able to let it do anything - if it caused a page fault, you'd have to use services in the kernel that are unusable while the interrupt is being serviced (see below for why).
Typically, your only goal in an interrupt routine is to get the device to stop interrupting and queue something at a lower interrupt level (in unix this is typically a non-interrupt level, but for Windows, it's dispatch, apc or passive level) to do the heavy lifting where you have access to more features of the kernel/os. See - Implementing a handler.
It's a property of how O/S's have to work, not something inherent in Linux. An interrupt routine can execute at any point so the state of what you interrupted is inconsistent. If you interrupted the thread scheduling code, its state is inconsistent so you can't be sure you can "sleep" and switch threads. Even if you protect the thread switching code from being interrupted, thread switching is a very high level feature of the O/S and if you protected everything it relies on, an interrupt becomes more of a suggestion than the imperative implied by its name.
So what stops the scheduler from putting the interrupt context to sleep, taking the next schedulable process, and passing control to it?
Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.
Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep, and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.
Disallowing an interrupt handler from blocking is a design choice. When data is available on the device, the interrupt handler interrupts the current process, prepares the transfer of the data, and re-enables the interrupt; until the handler re-enables the current interrupt, the device has to wait. We want to keep our I/O busy and our system responsive, so we had better not block the interrupt handler.
I don't think the "unstable states" are an essential reason. Processes, whether in user mode or kernel mode, should be aware that they may be interrupted. If some kernel-mode data structure is accessed by both the interrupt handler and the current process, and a race condition exists, then the current process should disable local interrupts; moreover, on multi-processor architectures, spinlocks should be used during the critical sections.
I also don't think that a blocked interrupt handler could not be woken up. When we say "block", we basically mean that the blocked process is waiting for some event or resource, so it links itself into a wait queue for that event or resource. Whenever the resource is released, the releasing process is responsible for waking up the waiting process(es).
However, the really annoying thing is that the interrupted process can do nothing while the handler is blocked; it did nothing wrong to deserve this punishment, which is unfair. And nobody can reliably predict the blocking time, so the innocent process has to wait for an unclear reason and for an unbounded time.
Even if you could put an ISR to sleep, you wouldn't want to do it. You want your ISRs to be as fast as possible to reduce the risk of missing subsequent interrupts.
The Linux kernel has two ways to allocate the interrupt stack. One is on the kernel stack of the interrupted process; the other is a dedicated interrupt stack per CPU. If the interrupt context is saved on the dedicated per-CPU interrupt stack, then the interrupt context is indeed not associated with any process at all. The "current" macro will then yield an invalid pointer to the currently running process, since on some architectures "current" is computed from the stack pointer, and the stack pointer in interrupt context may point to the dedicated interrupt stack rather than to the kernel stack of some process.
Fundamentally, the question is whether an interrupt handler can obtain a valid "current" (the address of the current process's task_struct). If so, it would be possible to modify its contents to put it into the "sleep" state, and the scheduler could bring it back later once the state changes. The answer may be hardware-dependent.
But on ARM it is impossible, since 'current' is unrelated to the process in interrupt mode. See the code below:
/* linux/arch/arm/include/asm/thread_info.h */
static inline struct thread_info *current_thread_info(void)
{
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
The sp in USER mode and the sp in SVC mode are the "same". "Same" here does not mean they are equal: user mode's sp points to the user-space stack, while SVC mode's sp (r13_svc) points to the kernel stack, where the process's task_struct was updated at the previous task switch. When a system call occurs and the process enters kernel space again, sp_svc is still unchanged, so the two stack pointers are associated with each other; in this sense they are the "same". So in SVC mode, kernel code can get a valid 'current'. But in the other privileged modes, say interrupt mode, sp is "different": it points to a dedicated address defined in cpu_init(). The 'current' calculated in these modes is unrelated to the interrupted process, and accessing it results in unexpected behavior. That is why it is always said that a system call can sleep but an interrupt handler can't: a system call runs in process context, but an interrupt does not.
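The masking arithmetic in current_thread_info() can be tried out in user space. The sketch below assumes an 8 KiB, size-aligned kernel stack; SIM_THREAD_SIZE and sim_thread_info_base are hypothetical names, and the addresses used with it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stack size: 8 KiB. The mask trick only works for powers of two. */
#define SIM_THREAD_SIZE 8192UL

static_assert((SIM_THREAD_SIZE & (SIM_THREAD_SIZE - 1)) == 0,
              "stack size must be a power of two");

/*
 * Round a stack pointer down to the start of its stack, which is where
 * the per-thread structure lives in the scheme quoted above.
 */
uintptr_t sim_thread_info_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(SIM_THREAD_SIZE - 1);
}
```

Any two stack pointers inside the same 8 KiB stack yield the same base, for example 0xC1234ABC and 0xC1235FFC both map to 0xC1234000. A stack pointer on some other stack, such as a dedicated interrupt stack, yields a base that has nothing to do with the interrupted process, which is the problem the answer describes.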
High-level interrupt handlers mask the operations of all lower-priority interrupts, including those of the system timer interrupt. Consequently, the interrupt handler must avoid involving itself in an activity that might cause it to sleep. If the handler sleeps, then the system may hang because the timer is masked and incapable of scheduling the sleeping thread.
Does this make sense?
If a higher-level interrupt routine gets to the point where the next thing it must do has to happen after a period of time, then it needs to put a request into the timer queue, asking that another interrupt routine be run (at lower priority level) some time later.
When that interrupt routine runs, it would then raise priority level back to the level of the original interrupt routine, and continue execution. This has the same effect as a sleep.
It is just a design/implementation choice in Linux. The advantage of this design is simplicity, but it may not suit real-time OS requirements.
Other OSes have other designs and implementations.
For example, in Solaris, interrupts can have different priorities, which allows most device interrupts to be handled in interrupt threads. Interrupt threads are allowed to sleep because each interrupt thread has its own stack in the context of the thread.
The interrupt-thread design is good for real-time threads, which should have higher priority than interrupts.

Resources