BUG: Scheduling while atomic ... using sysfs_notify()

I have a kernel module that uses hrtimers to notify userspace when the timer has fired. I understand I could just use userspace timers, but the module emulates a driver that will actually talk to hardware in the future. Every once in a while I get "BUG: Scheduling while atomic". After doing some research, I assume that the hrtimer.function I register as a callback is called by the kernel internals from an interrupt routine (which puts my callback in an "atomic context"). Then, when I call sysfs_notify() within the callback, I hit the kernel bug, because sysfs_notify() acquires a mutex.
1) Is this a correct assumption?
If this is correct, I have seen that there is a function called sysfs_notify_dirent() that I can use to notify userspace from an atomic context. But according to this source:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-10/msg07510.html
It can only be called from a "process" context, and not an interrupt context (due to the spinlock).
2) Could someone explain the difference between process, interrupt, and atomic context?
3) If this cannot be used in an interrupt context, what is an alternative to notifying userspace in this context?

Correct, sysfs_notify() cannot be called from atomic context. And yes, sysfs_notify_dirent() appears to be safe to call from atomic context. The source you cite is a bug report noting that, in an old kernel version, this wasn't actually true, along with a patch to fix it. It now appears to be safe to call.
Follow the source code in gpiolib_sysfs.c, and you'll notice that sysfs_notify_dirent() eventually calls schedule_work(), which defers the actual notification work to a workqueue instead of doing it directly the way sysfs_notify() does. That is exactly what the comments on your question advise you to do; it's just wrapped inside a convenience function.
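The same split can be sketched in userspace, with a POSIX signal handler standing in for the hrtimer callback and an ordinary function call standing in for the work item. This is only an analogy under that assumption, not kernel code: blocking_notify(), on_timer(), run_deferred_work() and simulate_timer_once() are hypothetical names, and in a real module the callback would call schedule_work() (or sysfs_notify_dirent()) rather than set a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t work_pending = 0;
static int notifications_sent = 0;

/* Hypothetical stand-in for sysfs_notify(): allowed to block, so never call it from the handler. */
static void blocking_notify(void) {
    notifications_sent++;
    printf("notified userspace (%d)\n", notifications_sent);  /* printf is not async-signal-safe */
}

/* "Atomic" half: runs asynchronously, so it only records that work is due. */
static void on_timer(int signo) {
    (void)signo;
    work_pending = 1;
}

/* "Process context" half: runs from the normal flow of control and may block. */
static void run_deferred_work(void) {
    if (work_pending) {
        work_pending = 0;
        blocking_notify();
    }
}

int simulate_timer_once(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    raise(SIGALRM);       /* stand-in for the hrtimer firing; the handler runs before raise() returns */
    run_deferred_work();  /* stand-in for the deferred work item running later */
    return notifications_sent;
}
```

Calling simulate_timer_once() prints "notified userspace (1)" and returns 1. The handler does nothing that can block; everything that might sleep happens afterwards, outside the asynchronous context.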

Related

atomic context and process context/interrupt context

In Linux Device Drivers, 3rd edition (LDD3) and Understanding the Linux Kernel, some buzzwords appear many times without a definition:
process context: referenced in both books, but never defined
interrupt context: Understanding the Linux Kernel gives a definition
atomic context: only appears in LDD3, without a definition. "it specifies that the kernel is currently executing either an interrupt handler or a deferrable function"
When reading tutorials, I see these three buzzwords used in many places. So I think the most important thing is to figure out their exact definitions; then I can understand those references.
I also searched online but found no very clear sources. Could anyone give a good definition and the source of that definition? Thanks so much
Process context is the values of the registers. When a context switch occurs, one process is switched out and the contents of its registers are saved, so that when the process runs again it can continue from the same spot: stack pointer, instruction pointer and so on.
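The "saved registers" idea can be seen directly in userspace with getcontext()/makecontext()/swapcontext(), which save and restore a register set (stack pointer, instruction pointer and so on) much as a context switch does. This is a hedged illustration of that one sense of the word "context", not of the kernel's process-vs-interrupt distinction; the ucontext functions are obsolescent in POSIX but still provided by glibc, and run_demo() and task() are hypothetical names.

```c
#define _XOPEN_SOURCE 600
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static int trace[8];
static int trace_len = 0;

static void task(void) {
    trace[trace_len++] = 1;
    swapcontext(&task_ctx, &main_ctx);  /* save this task's registers, resume main */
    trace[trace_len++] = 3;             /* execution later resumes exactly here */
}                                       /* returning follows uc_link back to main_ctx */

int run_demo(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  /* run task until it switches back */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &task_ctx);  /* restore task's saved registers */
    trace[trace_len++] = 4;
    return trace_len;
}
```

run_demo() returns 4 and leaves trace holding 1, 2, 3, 4: the second swapcontext() resumes task() right after the point where it saved its registers.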
This article gives an excellent explanation. Let me summarize it here:
Process Context - Regular processes and syscall invocations execute in this context, and it can be interrupted by IRQs.
Atomic Context - IRQs are generally executed in this context. They don't belong to any specific process, but rather are invoked by some device (ignoring exceptions for simplicity). Because interrupt context is not tied to a schedulable task, once it sleeps or gives up the CPU, nothing can wake it up again. That is why it is also called atomic context.
A basic principle of the kernel is that in an interrupt or atomic context, the kernel cannot access user space, and the kernel cannot sleep.
Quoting from the book Linux Kernel Programming by Kaiwan N Billimoria:

Calling schedule() inside Linux IRQ

I'm making an emulation driver that requires me to call schedule() in ATOMIC contexts in order to make the emulation part work. For now I have this hack that allows me to call schedule() inside ATOMIC (e.g. spinlock) context:
int p_count = current_thread_info()->preempt_count;
current_thread_info()->preempt_count = 0;
schedule();
current_thread_info()->preempt_count = p_count;
But that doesn't work inside IRQs: the system just stops after calling schedule().
Is there any way to hack the kernel in a way to allow me to do it? I'm using Linux kernel 4.2.1 with User Mode Linux
In kernel code you can be either in interrupt context or in process context.
When you are in interrupt context, you cannot call any blocking function (e.g., schedule()) or access the current pointer. That's related to how the kernel is designed and there is no way for having such functionalities in interrupt context. See also this answer.
Depending on what is your purpose, you can find some strategy that allows you to reach your goal. To me, it sounds strange that you have to call schedule() explicitly instead of relying on the natural kernel flow.
One possible approach follows (but, again, it depends on your specific goal). From the IRQ you can queue work on a work queue through schedule_work(). By design, the work queue executes kernel code in process context, so from there you are allowed to call blocking functions and access the current pointer (which, inside a work item, refers to the kernel worker thread).
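A userspace analogy of "queue from the IRQ, do the work in a worker" is sketched below, assuming a POSIX signal as the stand-in for the interrupt and a dedicated pthread as the stand-in for the kernel worker thread. The handler only calls sem_post(), which POSIX lists as async-signal-safe; anything that blocks runs in the worker. All names are hypothetical, and real kernel code would use schedule_work() with a struct work_struct instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <time.h>

static sem_t work_sem;        /* posted by the "IRQ": one post per queued work item */
static sem_t done_sem;        /* posted by the worker after each item */
static int stop_worker = 0;   /* ordered by the sem_post()/sem_wait() pair */
static int work_done = 0;

static void sem_wait_retry(sem_t *s) {
    while (sem_wait(s) == -1 && errno == EINTR)
        ;  /* interrupted by a signal handler: wait again */
}

/* "IRQ handler": only hands off; sem_post() is async-signal-safe. */
static void on_irq(int signo) {
    (void)signo;
    sem_post(&work_sem);
}

/* "Worker thread": an ordinary thread, so it may sleep or block freely. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        sem_wait_retry(&work_sem);
        if (stop_worker)
            break;
        struct timespec ts = {0, 1000000};  /* 1 ms: a blocking call, fine here */
        nanosleep(&ts, NULL);
        work_done++;
        sem_post(&done_sem);
    }
    return NULL;
}

int run_irq_demo(void) {
    pthread_t tid;
    struct sigaction sa = {0};
    sa.sa_handler = on_irq;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sem_init(&work_sem, 0, 0);
    sem_init(&done_sem, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);

    raise(SIGUSR1);             /* the "interrupt": the handler only queues work */
    sem_wait_retry(&done_sem);  /* wait until the worker has handled it */

    stop_worker = 1;
    sem_post(&work_sem);
    pthread_join(tid, NULL);
    return work_done;
}
```

run_irq_demo() returns 1. The design mirrors a workqueue: the asynchronous side does the minimum needed to hand off, and the thread that is allowed to sleep does the rest.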

Is it safe to call getrawmonotonic() in Linux interrupt handler?

I did some research online, and people suggest using getrawmonotonic() to get a timestamp in the kernel. Now I need to get a timestamp in an ISR, and I'm wondering whether it's safe. The Linux kernel version is 2.6.34.
Thanks
Yes, it is safe to use getrawmonotonic in interrupt handler.
The implementation of that function (in kernel/time/timekeeping.c) uses the seqlock functions (read_seqbegin() and read_seqretry() calls), which are interrupt-safe, plus a timespec_add_ns() call, which is pure arithmetic.
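To show why a seqlock reader is usable from an interrupt handler, here is a hedged userspace model of the read_seqbegin()/read_seqretry() pattern using C11 atomics. The reader never blocks; it simply retries if a writer was active (odd sequence number) or the sequence changed while it was reading. struct seq_clock and its functions are hypothetical names, and the memory-ordering scheme (relaxed data accesses bracketed by fences) is one common way to express a seqlock in C11, not the kernel's implementation. One caveat carries over: a reader that interrupts the writer on the same CPU would spin forever, which is why the kernel provides write_seqlock_irqsave() for writers whose data is also read from interrupt handlers.

```c
#include <stdatomic.h>

/* Hypothetical seqlock-protected timestamp (userspace model, not kernel code). */
struct seq_clock {
    atomic_uint seq;     /* even: stable, odd: write in progress */
    atomic_llong sec;
    atomic_long nsec;
};

static void seq_clock_init(struct seq_clock *c) {
    atomic_init(&c->seq, 0u);
    atomic_init(&c->sec, 0);
    atomic_init(&c->nsec, 0);
}

/* Writer: writers must be serialized externally (the kernel pairs a seqlock with a spinlock). */
static void seq_clock_write(struct seq_clock *c, long long sec, long nsec) {
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);  /* odd: readers will retry */
    atomic_thread_fence(memory_order_release);                     /* seq bump ordered before data */
    atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);  /* even again: data published */
}

/* Like read_seqbegin(): wait out an in-progress write by spinning, never sleeping. */
static unsigned seq_read_begin(struct seq_clock *c) {
    unsigned s;
    while ((s = atomic_load_explicit(&c->seq, memory_order_acquire)) & 1u)
        ;
    return s;
}

/* Like read_seqretry(): nonzero if a write happened during the read. */
static int seq_read_retry(struct seq_clock *c, unsigned start) {
    atomic_thread_fence(memory_order_acquire);  /* data loads ordered before re-reading seq */
    return atomic_load_explicit(&c->seq, memory_order_relaxed) != start;
}

static void seq_clock_read(struct seq_clock *c, long long *sec, long *nsec) {
    unsigned start;
    do {
        start = seq_read_begin(c);
        *sec = atomic_load_explicit(&c->sec, memory_order_relaxed);
        *nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
    } while (seq_read_retry(c, start));
}
```

After seq_clock_write(&clk, 5, 250), seq_clock_read() yields 5 and 250, and the sequence counter is back to an even value (2).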

What does the term "interrupt safe" mean?

I come across this term every now and then.
And now I really need a clear explanation, as I wish to use some MPI routines that are said not to be interrupt-safe.
I believe it's another word for reentrant. If a function is reentrant, it can be interrupted in the middle and safely called again.
For example:
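#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void function(void)
{
    pthread_mutex_lock(&mtx);
    /* code ... */
    pthread_mutex_unlock(&mtx);
}
This function can safely be called by different threads (the mutex protects the code inside). But if a signal arrives after pthread_mutex_lock(&mtx) and the signal handler calls the function again on the same thread, it will deadlock, because the thread waits for a mutex it already holds. So it's not interrupt-safe.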
Code that is safe from concurrent access from an interrupt is said to be interrupt-safe.
Consider a situation where your process is in a critical section and an asynchronous event interrupts it, and the handler accesses the same shared resource the process was accessing before it was preempted.
It is a major bug if an interrupt occurs in the middle of code that is manipulating a resource and the interrupt handler can access the same resource. Locking can save you!

Avoiding sleep while holding a spinlock

I've recently read section 5.5.2 (Spinlocks and Atomic Context) of LDDv3 book:
Avoiding sleep while holding a lock can be more difficult; many kernel functions can sleep, and this behavior is not always well documented. Copying data to or from user space is an obvious example: the required user-space page may need to be swapped in from the disk before the copy can proceed, and that operation clearly requires a sleep. Just about any operation that must allocate memory can sleep; kmalloc can decide to give up the processor, and wait for more memory to become available unless it is explicitly told not to. Sleeps can happen in surprising places; writing code that will execute under a spinlock requires paying attention to every function that you call.
It's clear to me that spinlocks must always be held for the minimum time possible and I think that it's relatively easy to write correct spinlock-using code from scratch.
Suppose, however, that we have a big project where spinlocks are widely used.
How can we make sure that functions called from critical sections protected by spinlocks will never sleep?
Thanks in advance!
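What about enabling "Sleep-inside-spinlock checking" (CONFIG_DEBUG_SPINLOCK_SLEEP; in newer kernels it is "Sleep inside atomic section checking", CONFIG_DEBUG_ATOMIC_SLEEP) for your kernel? It is usually found under Kernel Debugging when you run make config. You might also try to duplicate its behavior in your code.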
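Duplicating that behavior in your own code could look like the sketch below: track an "atomic depth" that the lock functions raise and lower, and call a check at the top of every function that may block, much as kernel code calls might_sleep(). Everything here is a hypothetical userspace model (debug_spin_lock(), debug_might_sleep() and may_block_alloc() are made-up names), not the kernel's implementation.

```c
#include <stdio.h>

static _Thread_local int atomic_depth = 0;  /* > 0 while "holding a spinlock" */
static int sleep_violations = 0;

static void debug_spin_lock(void)   { atomic_depth++; /* a real lock would also be taken */ }
static void debug_spin_unlock(void) { atomic_depth--; }

/* Call at the top of every function that may block, as kernel code calls might_sleep(). */
static void debug_might_sleep(const char *func) {
    if (atomic_depth > 0) {
        sleep_violations++;
        fprintf(stderr, "BUG: %s may sleep in atomic context (depth %d)\n", func, atomic_depth);
    }
}

/* Hypothetical function that can block, e.g. because it allocates memory. */
static void may_block_alloc(void) {
    debug_might_sleep(__func__);
    /* ... work that might block ... */
}
```

Calling may_block_alloc() outside the lock records no violation; calling it between debug_spin_lock() and debug_spin_unlock() records one and prints a BUG line to stderr.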
One thing I've noticed on a lot of projects is that people misuse spinlocks: they use them where other locking primitives should have been used.
A Linux spinlock only truly spins in multiprocessor builds: in uniprocessor builds the lock itself compiles away, leaving only preemption disabling and, for the _irq/_irqsave variants, interrupt disabling. Spinlocks are meant for short-duration locks on multiprocessor platforms.
If code fails to acquire a spinlock, it just spins the processor until the lock is free. So either code running on a different processor must release the lock, or possibly an interrupt handler could release it, but the wait-event mechanism is a much better way of waiting on an interrupt.
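The spin-until-free behavior can be illustrated with a minimal test-and-set spinlock in C11, under the assumption that a userspace model is enough to show the idea. The names are hypothetical, and unlike the kernel's spin_lock() this version does not disable preemption, so a holder that gets preempted leaves waiters burning CPU; that is one reason kernel spinlocks disable preemption and must only be held briefly.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static long counter = 0;

static void tas_lock(atomic_flag *f) {
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;  /* spin: this CPU stays busy until the holder releases the flag */
}

static void tas_unlock(atomic_flag *f) {
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void *add_many(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        tas_lock(&counter_lock);
        counter++;  /* keep the critical section tiny */
        tas_unlock(&counter_lock);
    }
    return NULL;
}

long run_spin_demo(long per_thread) {
    pthread_t a, b;
    pthread_create(&a, NULL, add_many, &per_thread);
    pthread_create(&b, NULL, add_many, &per_thread);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

run_spin_demo(100000) returns 200000: the two threads never lose an increment, because each counter++ happens while the flag is held.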
The irqsave spinlock primitives are a tidy way of disabling and re-enabling interrupts so a driver can lock out an interrupt handler, but the lock should only be held long enough for the process to update some variables shared with the interrupt handler: while interrupts are disabled, you are not going to be scheduled.
If you need to lock out an interrupt handler, use a spinlock with irqsave.
For general kernel locking, use the mutex/semaphore API, which sleeps on the lock when it needs to.
To lock against code running in other processes, use a mutex/semaphore.
To lock against code running in interrupt context, use local_irq_save()/local_irq_restore() or spin_lock_irqsave()/spin_unlock_irqrestore().
To lock against code running on other processors, use spinlocks and avoid holding the lock for long.
I hope this helps

Resources