I am writing a Linux kernel module that uses kprobes to trace specific system calls, and I need to write to a file from within a kprobe handler (specifically, a kretprobe handler). I know this is generally not advised, but I need to write the output to a very specific location, so I can't use any standard logging mechanism.
I can open and write the file without problems from the module's init() function, but when I try to do so from within a probe handler, the kernel crashes.
From Documentation/kprobes.txt:
Probe handlers are run with preemption disabled. Depending on the
architecture and optimization state, handlers may also run with
interrupts disabled (e.g., kretprobe handlers and optimized kprobe
handlers run without interrupt disabled on x86/x86-64). In any case,
your handler should not yield the CPU (e.g., by attempting to acquire
a semaphore).
In other words, you cannot sleep inside a probe handler. Because file read/write operations may sleep (for example, while waiting for disk I/O or taking a lock), you cannot use them inside the handler.
I need to write the output to a very specific location, so I can't use any standard logging mechanisms.
You can output the trace from the probe handler, e.g., into a special device file, and run a user-space program in parallel that simply reads that file and writes its contents to the very specific location.
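A minimal sketch of the user-space side, assuming the module exposes its trace through a character device (the device path and the helper names write_all() and relay() are hypothetical). The function copies everything read from one descriptor to another, retrying on EINTR and on short writes, so the write to the specific location happens in process context, where blocking is allowed.

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy data from in_fd (e.g. the module's trace device) to out_fd
 * (the target file) until end of file.
 * Returns the number of bytes copied, or -1 on error. */
static long long relay(int in_fd, int out_fd)
{
    char buf[4096];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;              /* EOF: the writer side was closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

In a real setup, in_fd would come from open() on the module's device and out_fd from open() on the target file; with a character device, read() simply blocks until the module has new trace data.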
I'm making an emulation driver that requires me to call schedule() in ATOMIC contexts in order to make the emulation part work. For now I have this hack that allows me to call schedule() inside ATOMIC (e.g. spinlock) context:
int p_count = current_thread_info()->preempt_count;
current_thread_info()->preempt_count = 0;
schedule();
current_thread_info()->preempt_count = p_count;
But that doesn't work inside IRQs: the system just stops after calling schedule().
Is there any way to hack the kernel in a way to allow me to do it? I'm using Linux kernel 4.2.1 with User Mode Linux
In kernel code you can be either in interrupt context or in process context.
When you are in interrupt context, you cannot call any blocking function (e.g., schedule()) or rely on the current pointer (it just refers to whatever task happened to be interrupted). That follows from how the kernel is designed, and there is no way to get such functionality in interrupt context. See also this answer.
Depending on what is your purpose, you can find some strategy that allows you to reach your goal. To me, it sounds strange that you have to call schedule() explicitly instead of relying on the natural kernel flow.
One possible approach follows (but, again, it depends on your specific goal). From the IRQ handler you can schedule work on a work queue through schedule_work(). The work queue, by design, executes kernel code in process context, so from there you are allowed to call blocking functions. Note, however, that current then refers to the kernel worker thread, not to the process that was running when the interrupt arrived.
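Work queues exist only inside the kernel, so the sketch below is a user-space analogue rather than kernel code. A POSIX signal handler is restricted in a similar way (it may only call async-signal-safe functions), and the classic self-pipe trick defers the real work just as schedule_work() does: the handler only records that work is pending by writing one byte to a pipe, and the main flow, running in normal context, reads that byte and is free to block. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int work_pipe[2];   /* [0] read end, [1] write end */
static int work_done;      /* items handled in normal context */

/* "Top half": runs asynchronously, so it only does an async-signal-safe write(). */
static void on_event(int sig)
{
    int saved = errno;     /* write() may clobber errno */
    char token = 1;
    ssize_t r = write(work_pipe[1], &token, 1);

    (void)sig;
    (void)r;
    errno = saved;
}

/* Create the pipe and install the handler for SIGUSR1. Returns 0 on success. */
static int setup_deferral(void)
{
    struct sigaction sa;

    if (pipe(work_pipe) < 0)
        return -1;
    /* Non-blocking write end: the handler must never block on a full pipe. */
    if (fcntl(work_pipe[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_event;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* "Bottom half": runs in normal context, where blocking calls are allowed. */
static int process_one(void)
{
    char token;
    ssize_t n;

    do {
        n = read(work_pipe[0], &token, 1);   /* blocks until the handler fires */
    } while (n < 0 && errno == EINTR);
    if (n != 1)
        return -1;
    /* Blocking work (file I/O, sleeping, ...) would go here. */
    work_done++;
    return 0;
}
```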
I'm trying to make sure that a unique user process executes as soon as possible after a particular hardware interrupt occurs.
One mechanism I'm aware of for doing this is to write a small kernel module that exports a device whose read handler sleeps. The module also registers an IRQ handler, which does nothing but wake the process. Then, from the user's perspective, reads from that device block until the relevant interrupt occurs.
(1) On a modern CPU with a mainline kernel, can you reliably expect sub-millisecond latency between the kernel seeing the interrupt and the user process regaining control with this?
(2) Are there any lower-latency mechanisms on a mainline kernel?
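Inside the module, this read-side sleep is usually built from a wait queue plus a condition flag (wait_event_interruptible() in the read handler, wake_up_interruptible() in the IRQ handler). The sketch below shows the same flag-plus-wakeup pattern with POSIX threads in user space, purely as an analogue: the names are hypothetical, and a second thread stands in for the interrupt.

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared state: a flag plus a condition variable, like a wait queue + condition. */
static pthread_mutex_t ev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ev_cond = PTHREAD_COND_INITIALIZER;
static bool event_pending;
static unsigned events_seen;

/* Reader side ("read handler"): sleep until an event is pending, then consume it. */
static void wait_for_event(void)
{
    pthread_mutex_lock(&ev_lock);
    while (!event_pending)             /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev_cond, &ev_lock);
    event_pending = false;
    events_seen++;
    pthread_mutex_unlock(&ev_lock);
}

/* Notifier side ("IRQ handler"): mark the event and wake the sleeper. */
static void signal_event(void)
{
    pthread_mutex_lock(&ev_lock);
    event_pending = true;
    pthread_cond_signal(&ev_cond);
    pthread_mutex_unlock(&ev_lock);
}

/* Thread body standing in for the interrupt source. */
static void *fake_irq(void *arg)
{
    (void)arg;
    signal_event();
    return NULL;
}
```

Unlike this sketch, a real IRQ handler must not take a sleeping lock such as a mutex; the kernel's wait-queue API takes care of the locking internally.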
Apply the PREEMPT_RT patch to the kernel and, when configuring it through make menuconfig, select the fully preemptible (RT) preemption model.
This will allow you to have threaded interrupts (i.e., interrupt handlers executed as kernel threads). Then, you can assign maximum priority (i.e., RT prio > 50) to your specific interrupt handler (check its PID using ps aux) and to your specific process, and a lower priority to anything else.
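For the process half of this, a program can switch itself to a real-time policy with sched_setscheduler(); the IRQ thread's priority is typically changed from the shell, for example with chrt -f -p <prio> <pid>. A minimal sketch, assuming SCHED_FIFO and a caller with sufficient privilege (root or CAP_SYS_NICE); the helper name set_fifo_priority() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Give process `pid` (0 = the caller) the SCHED_FIFO policy at `prio`.
 * Returns 0 on success, -1 with errno set on failure (EPERM without privilege). */
static int set_fifo_priority(pid_t pid, int prio)
{
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo < 0 || hi < 0)
        return -1;
    if (prio < lo || prio > hi) {
        errno = EINVAL;                /* reject out-of-range values up front */
        return -1;
    }
    sp.sched_priority = prio;
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

For example, the process could call set_fifo_priority(0, 80) before entering its blocking read loop; values above 50 outrank the default SCHED_FIFO priority that threaded IRQ handlers get.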
Is it possible to run multiple instances of the same interrupt simultaneously on a multiprocessor system in Linux?
If not possible, why do we need to synchronize between interrupt handlers using spin locks?
Thanks
Venkatesh
On an SMP architecture, the Advanced Programmable Interrupt Controller (APIC) routes interrupts from peripherals to the CPUs.
The APIC picks the target CPU based on:
1. the routing table (where the interrupt affinity to a particular processor is set),
2. the priority of the interrupt,
3. the load on the CPUs.
For example, consider an interrupt received on IRQ line 32. It goes through the APIC and is routed to a particular CPU, say CPU0. The interrupt line is then masked until the ISR has been handled, which means you will not get another interrupt of the same type while the ISR is executing.
Only once the ISR has completed is the line unmasked for future interrupts.
Is it possible to run multiple instances of the same interrupt simultaneously on a multiprocessor system in Linux?
Interrupt handlers are generally serialized: only one instance of a given handler runs at a time (on any of the processors). If the same interrupt is raised again while the handler is running, it is processed only after the current one is done. While this handler is executing on one core, another core may be servicing the handler of a different interrupt.
Why do we need to synchronize between interrupt handlers using spin locks?
Spinlocks are still needed because the data also has to be protected against other execution contexts (for example, bottom halves or user read/write handler functions).
The scenario could be something like this:
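#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(l);

static irqreturn_t my_ISR(int irq, void *dev_id)
{
	spin_lock(&l);	/* local IRQs are already off while the ISR runs */
	/* data is accessed here */
	spin_unlock(&l);
	return IRQ_HANDLED;
}

static void my_other_thread(void)
{
	unsigned long flags;

	/* also disable local IRQs, or my_ISR could spin forever on this CPU */
	spin_lock_irqsave(&l, flags);
	/* same data is accessed here */
	spin_unlock_irqrestore(&l, flags);
}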
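The same reasoning can be exercised in user space with POSIX spin locks, as an analogue only (kernel spinlocks have their own API and IRQ-disabling variants). Two threads stand in for the ISR and the other thread; without the lock, their read-modify-write increments of the shared counter could be lost. The names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define ITERATIONS 100000

static pthread_spinlock_t data_lock;
static long shared_counter;            /* the "data" both contexts touch */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_spin_lock(&data_lock);
        shared_counter++;              /* unsafe without the lock */
        pthread_spin_unlock(&data_lock);
    }
    return NULL;
}

/* Run two workers concurrently; return the final counter value, or -1 on error. */
static long run_two_workers(void)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_spin_init(&data_lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_spin_destroy(&data_lock);
    return shared_counter;
}
```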
I am doing an assignment in which I have to write an interrupt handler for the keyboard. The assignment requires logging the keystrokes, so file I/O is allowed, and we are using a work queue for that.
I know it is not allowed to sleep in an interrupt handler, so we cannot use any file I/O or printk in the interrupt handler.
So in real industry, how do you debug an interrupt handler, or what can I do if I want to debug something?
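Partly: sleeping and file I/O are indeed forbidden inside an ISR, but printk() itself does not sleep and may be called from interrupt context; it is just slow, and the console output it triggers can disturb time-critical code. In the RTOS (real-time operating system) I studied, interrupt handling instead writes messages into a log, saving the required information so you can inspect it later.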
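A similar facility is available in recent Linux kernels: with trace_printk() you can debug time-critical places, because it writes into the ftrace ring buffer instead of the console, and the buffer can be read later (e.g., from /sys/kernel/debug/tracing/trace). I haven't used it myself, so I have no sample for it. You can follow this link to learn more about trace_printk.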
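Trying trace_printk() requires a kernel build, but the idea behind it and the RTOS-style log (record cheaply now, read later) can be sketched in plain C. This is a hypothetical single-producer/single-consumer ring buffer, not how ftrace is implemented: the write side does no allocation, no I/O and no locking, so it never waits, and the read side drains records later from normal context.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_SLOTS 8                    /* must be a power of two */

struct log_rec {
    uint32_t seq;                      /* event sequence number */
    uint32_t value;                    /* e.g. a scancode */
};

static struct log_rec log_buf[LOG_SLOTS];
static atomic_uint log_head;           /* next slot to write (producer only) */
static atomic_uint log_tail;           /* next slot to read (consumer only) */

/* Producer ("ISR"): never blocks; drops the record if the buffer is full. */
static bool log_push(uint32_t seq, uint32_t value)
{
    unsigned head = atomic_load_explicit(&log_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&log_tail, memory_order_acquire);

    if (head - tail == LOG_SLOTS)
        return false;                  /* full: drop rather than wait */
    log_buf[head % LOG_SLOTS] = (struct log_rec){ seq, value };
    atomic_store_explicit(&log_head, head + 1, memory_order_release);
    return true;
}

/* Consumer (normal context): copy out the oldest record, if any. */
static bool log_pop(struct log_rec *out)
{
    unsigned tail = atomic_load_explicit(&log_tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&log_head, memory_order_acquire);

    if (head == tail)
        return false;                  /* empty */
    *out = log_buf[tail % LOG_SLOTS];
    atomic_store_explicit(&log_tail, tail + 1, memory_order_release);
    return true;
}
```

Dropping records when the buffer is full is a deliberate choice here: the producer side must never wait, so losing a debug record is preferable to stalling the handler.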
I am actually reading Windows Internals, 5th edition, and I am enjoying it, although it isn't an easy book to read and understand.
I am confused about IRQLs and the IDT.
I read that Windows implements custom prioritization levels with IRQLs and that the Plug and Play Manager maps device IRQs to IRQLs.
Alright, so IRQLs are used for software and hardware interrupts, while exceptions are handled by the exception dispatcher.
When a device generates an interrupt, the interrupt controller passes this information, along with the IRQ, to the CPU.
So Windows takes this IRQ and translates it to an IRQL to decide when to execute the routine (the routine that IDT[IRQ_VALUE] points to)?
Is that what is happening?
Yes, on a very high level.
Everything starts with a kernel trap. The kernel trap handler dispatches interrupts, exceptions, system service calls, and page faults (to the virtual memory pager).
When an interrupt happens (line-based, using a dedicated pin, or message-based, writing to an address), Windows uses the IRQL to determine the priority of the interrupt and whether it can be serviced at that moment. The HAL does the job of translating the IRQ to an IRQL.
It then uses the IRQ to get the IDT index and find the appropriate ISR to invoke. Note that there can be multiple ISRs associated with a given IRQ; they are chained and invoked in order.
Each processor has its own IDT, so you could potentially have multiple ISRs running at the same time.
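The IRQL rule above can be modeled in a few lines: a processor running at some IRQL only takes interrupts whose IRQL is strictly higher, and anything at or below the current level stays pending until the level drops. This is a toy model, not Windows code; the numeric values follow the x64 numbering of PASSIVE_LEVEL (0), APC_LEVEL (1), DISPATCH_LEVEL (2) and HIGH_LEVEL (15), while DEVICE_IRQL and the helper functions are hypothetical.

```c
#include <stdbool.h>

/* Toy IRQL values (x64 numbering); DEVICE_IRQL is a made-up example DIRQL. */
enum {
    PASSIVE_LEVEL  = 0,
    APC_LEVEL      = 1,
    DISPATCH_LEVEL = 2,
    DEVICE_IRQL    = 5,
    HIGH_LEVEL     = 15
};

struct cpu {
    int irql;          /* current IRQL of this processor */
    unsigned pending;  /* bit n set: an interrupt at IRQL n is waiting */
};

/* An incoming interrupt is serviced at once only if its IRQL is above the CPU's. */
static bool irql_deliver(struct cpu *c, int irql)
{
    if (irql > c->irql)
        return true;             /* preempts whatever is running now */
    c->pending |= 1u << irql;    /* masked: remember it for later */
    return false;
}

/* Lower the CPU's IRQL; return the highest pending IRQL now deliverable, or -1. */
static int irql_lower(struct cpu *c, int new_irql)
{
    c->irql = new_irql;
    for (int l = HIGH_LEVEL; l > new_irql; l--) {
        if (c->pending & (1u << l)) {
            c->pending &= ~(1u << l);
            return l;
        }
    }
    return -1;
}
```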
Exception dispatch, as I mentioned before, is also handled by the kernel trap, but the procedure is different. It usually starts by searching for exception handlers by unwinding the stack, then checking the debugger port, etc.