Avoiding sleep while holding a spinlock - linux-kernel

I've recently read section 5.5.2 (Spinlocks and Atomic Context) of the LDDv3 book:
Avoiding sleep while holding a lock can be more difficult; many kernel functions can sleep, and this behavior is not always well documented. Copying data to or from user space is an obvious example: the required user-space page may need to be swapped in from the disk before the copy can proceed, and that operation clearly requires a sleep. Just about any operation that must allocate memory can sleep; kmalloc can decide to give up the processor, and wait for more memory to become available unless it is explicitly told not to. Sleeps can happen in surprising places; writing code that will execute under a spinlock requires paying attention to every function that you call.
It's clear to me that spinlocks must always be held for the minimum time possible, and I think that it's relatively easy to write correct spinlock-using code from scratch.
Suppose, however, that we have a big project where spinlocks are widely used.
How can we make sure that functions called from critical sections protected by spinlocks will never sleep?
Thanks in advance!

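What about enabling "Sleep-inside-spinlock checking" (CONFIG_DEBUG_SPINLOCK_SLEEP; renamed CONFIG_DEBUG_ATOMIC_SLEEP, "Sleep inside atomic section checking", in newer kernels) for your kernel? It is found under "Kernel hacking" when you run make menuconfig. When it is enabled, functions that may sleep call might_sleep(), which prints a warning with a stack trace if they are invoked in atomic context, for example with a spinlock held. You might also try to duplicate its behavior in your code by calling might_sleep() at the top of your own functions that can block.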
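The idea behind that option can be modeled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel's implementation: a per-thread counter (the hypothetical atomic_depth, standing in for the kernel's preempt count) is raised and lowered by hypothetical my_spin_lock()/my_spin_unlock() wrappers, and every function that may block first calls check_might_sleep(), playing the role of might_sleep(). The helper blocking_alloc() is likewise hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of sleep-in-atomic checking: a per-thread depth
 * counter stands in for the kernel's preempt count. */
static _Thread_local int atomic_depth = 0;
static _Thread_local int sleep_violations = 0;

/* Hypothetical lock wrappers: only the bookkeeping is modeled;
 * a real wrapper would also take and release an actual lock. */
static void my_spin_lock(void)
{
    atomic_depth++;
}

static void my_spin_unlock(void)
{
    assert(atomic_depth > 0); /* unlock without a matching lock */
    atomic_depth--;
}

/* Call at the top of every function that may block, like might_sleep(). */
static void check_might_sleep(const char *func)
{
    if (atomic_depth > 0) {
        sleep_violations++;
        fprintf(stderr, "warning: %s may sleep but was called with %d spinlock(s) held\n",
                func, atomic_depth);
    }
}

/* Hypothetical helper that may block, e.g. an allocation that can wait for memory. */
static void blocking_alloc(void)
{
    check_might_sleep(__func__);
    /* ... work that may block ... */
}
```

Calling blocking_alloc() between my_spin_lock() and my_spin_unlock() prints a warning and bumps sleep_violations; outside the lock it stays silent. In a real module you would not reimplement this: annotate your own blocking helpers with might_sleep() and let the debug option do the checking.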

One thing I have noticed on a lot of projects is that people misuse spinlocks: they get used where other locking primitives should have been used.
A Linux spinlock only really exists in multiprocessor builds (in uniprocessor builds the lock itself compiles away, leaving at most the preemption and interrupt disabling). Spinlocks are for short-duration locks on a multiprocessor platform.
If code fails to acquire a spinlock, it just spins the processor until the lock is free. So either another process running on a different processor must free the lock, or possibly it could be freed by an interrupt handler; but the wait-event mechanism (wait_event()/wake_up()) is a much better way of waiting on an interrupt.
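To make "spins the processor until the lock is free" concrete, here is a minimal test-and-test-and-set spinlock in C11. It is an illustrative sketch only (the toy_spinlock names are hypothetical, and the kernel's own spinlocks are ticket or queued locks with considerably more machinery), but it shows why a spinlock holder must never sleep: a waiter just burns CPU in the loop until the holder, running elsewhere, stores false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-test-and-set spinlock (illustrative sketch only). */
typedef struct {
    atomic_bool locked;
} toy_spinlock;

static bool toy_spin_trylock(toy_spinlock *l)
{
    /* Atomically set the flag; we own the lock if it was previously clear. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l)) {
        /* Busy-wait with plain loads until the lock looks free. Nothing here
         * sleeps or yields: the holder must be running elsewhere. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static void toy_spin_unlock(toy_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed)); /* must be held */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The inner read-only loop is a common refinement: waiters spin on a plain load, which keeps the cache line shared, and only retry the atomic exchange once the lock looks free.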
The irqsave spinlock primitive (spin_lock_irqsave()/spin_unlock_irqrestore()) is a tidy way of disabling and re-enabling interrupts so a driver can lock out an interrupt handler. It should only be held long enough for the process to update some variables shared with the interrupt handler: if you disable interrupts, you are not going to be scheduled.
If you need to lock out an interrupt handler, use a spinlock with irqsave.
For general kernel locking you should be using the mutex/semaphore APIs, which will sleep on the lock if they need to.
To lock against code running in other processes, use a mutex/semaphore.
To lock against code running in interrupt context, use local_irq_save()/local_irq_restore() or spin_lock_irqsave()/spin_unlock_irqrestore().
To lock against code running on other processors, use spinlocks and avoid holding the lock for long.
I hope this helps

Related

What happens when a task is executing critical section but it needs to be scheduled out on a uniprocessor system with preemption disabled?

Here is a scenario. Let’s say that a kernel task is running on a uniprocessor system with preemption disabled. The task acquires a spin lock. Now it is executing its critical section. At this time, what if the time slice available for this task expires and it has to be scheduled out?
1. Does the spin_lock have a mechanism to prevent this?
2. Can it be scheduled out? If yes, then what happens to the critical section?
3. Can it be interrupted by an IRQ? (Assuming that preemption is disabled.)
4. Is this scenario feasible? In other words, could this scenario happen?
From the kernel code, I understand that spin_lock is basically a no-op on a uniprocessor with preemption disabled. To be accurate, all it does is barrier().
I understand why it is a no-op (it is a uniprocessor, so no other task could be manipulating the data at that instant), but I still don't understand how the critical section can run uninterrupted (by IRQs or by scheduling).
What am I missing here? Pointers to the relevant Linux kernel code would be really helpful.
My basic assumptions:
32-bit Linux kernel
Actually, spin_lock() disables preemption by calling preempt_disable() before it tries to acquire the lock, so scenarios #1 and #2 could never happen. (#3 is a different matter; see below.)
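In recent source code, spin_lock() eventually calls __raw_spin_lock(), which calls preempt_disable() before calling spin_acquire() (a lockdep annotation) and then do_raw_spin_lock() (via LOCK_CONTENDED()) to actually acquire the lock. spin_lock_irqsave(), which is commonly used for data shared with interrupt context, follows a similar path but also disables local interrupts first.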
Regarding #3: if the variable is shared between process and interrupt context, process context should always use spin_lock_irq()/spin_lock_irqsave() instead of spin_lock() to avoid a deadlock.
The mechanism that handles time slices expiring is a timer interrupt. The interrupt will set the TIF_NEED_RESCHED flag for the process. When returning from the timer's interrupt context back to your critical section, a check is made whether to preempt the process because of the TIF_NEED_RESCHED flag. Since preemption is disabled, nothing happens, and execution returns to your critical section.
When your critical section is over, releasing the lock calls preempt_enable() to re-enable preemption. At that moment another check is done as to whether to preempt. Since the TIF_NEED_RESCHED flag is set and preemption is now enabled, the process will be preempted.
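The sequence described above can be modeled as a single-CPU simulation. This is a sketch with hypothetical names (preempt_count, need_resched and the *_model functions only mirror the kernel's concepts), assuming a kernel built with preemption support; it shows the reschedule request being recorded during the critical section and honored only when preemption is re-enabled at unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model with hypothetical names mirroring the kernel's concepts,
 * assuming a kernel built with preemption support. */
static int  preempt_count    = 0;     /* > 0 means preemption is disabled   */
static bool need_resched     = false; /* stands in for TIF_NEED_RESCHED     */
static int  context_switches = 0;     /* times the task was "scheduled out" */

static void schedule_model(void)
{
    need_resched = false;
    context_switches++;
}

/* Preemption point: switch only if a reschedule is pending AND allowed. */
static void maybe_preempt(void)
{
    if (need_resched && preempt_count == 0)
        schedule_model();
}

/* Timer interrupt: the time slice expired, so request a reschedule, then
 * check for preemption on the way back out of the interrupt. */
static void timer_interrupt(void)
{
    need_resched = true;
    maybe_preempt();
}

static void preempt_disable_model(void)
{
    preempt_count++;
}

static void preempt_enable_model(void)
{
    assert(preempt_count > 0);
    preempt_count--;
    maybe_preempt(); /* a deferred reschedule happens here */
}

/* In this model, a uniprocessor spin_lock()/spin_unlock() is just this. */
static void up_spin_lock(void)   { preempt_disable_model(); }
static void up_spin_unlock(void) { preempt_enable_model(); }
```

A timer tick inside up_spin_lock()/up_spin_unlock() leaves context_switches unchanged and need_resched set; the switch happens inside up_spin_unlock(). With one CPU there is nobody to spin against, which is why disabling preemption is all the lock needs to do.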
1. Spin locks disable preemption.
2. No, because preemption is disabled.
3. Yes. There are spin lock versions that disable IRQs to prevent this.
4. No, because spin locks disable preemption.
Spinlocks don't exist on uniprocessor systems anyway, because they don't make sense there. If a thread that doesn't own the lock attempts to acquire it, the thread that does own it must currently be descheduled (there is only one CPU), so there is no reason to spin-wait for something that isn't running. For this reason, in these cases spinlocks are optimized away to just a preemption disable, so that no other thread can enter the critical section.

What is the channel event system?

I am working on a project where I have to deal with the ATxmega128A1 microcontroller, but as a beginner with microcontrollers I want to know what this channel event system is.
I have referred to http://www.atmel.com/Images/doc8071.pdf but I'm not getting it.
The traditional way to do what the channel system does is to use interrupts.
In the interrupt model, the CPU runs the code starting with main(), and usually continues with some loop. When a particular event occurs, such as a button being pressed, the CPU is "interrupted". The current processing is stopped, some registers are saved, and execution jumps to code pointed to by an interrupt vector, called an interrupt handler. This code usually begins with instructions to save register values, which the compiler adds automatically.
When the interrupting code is finished, the CPU restores the values that the registers previously had and execution jumps back to the point in the main code where it was interrupted.
But this approach takes valuable CPU cycles, and some interrupt handlers don't do very much except trigger some peripheral to take an action. Wouldn't it be great if these kinds of interrupt handlers could be avoided, and the peripherals of the mC could talk directly to each other without pausing the CPU?
This is what the event channel system does. It allows peripherals to trigger each other directly without involving the CPU. The CPU continues to execute instructions while the channel system operates in parallel. This doesn't mean you can replace all interrupt handlers, though. If complicated processing is involved, you still need a handler to act. But the channel system does allow you to avoid using very simple interrupt handlers.
The paper you reference describes this in a little more detail (but assumes a lot of knowledge on the reader's part). You have to read the actual datasheet of your mC to find the exact details.

use of spin variants in network processing

I have written a kernel module that interacts with netfilter hooks.
The netfilter hooks operate in softirq context.
I am accessing a global data structure, a "Hash Table", from softirq context as well as from process context. The process-context access is due to a sysctl file being used to modify the contents of the hash table.
I am using spin_lock_irqsave().
Is this choice of spinlock API correct, in terms of performance and locking standards?
What would happen if an interrupt is scheduled on another processor while, on the current processor, the lock is already held by process-context code?
Firstly:
So, with all the above details I concluded that my softirqs can run concurrently on both cores.
Yes, this is correct. Your softirq handler may be executed "simultaneously on more than one CPU".
Your conclusion to use spinlocks sounds correct to me. However, this assumes that the critical section (i.e., the code executed with the spinlock held) has the following properties:
It must not sleep (for example, acquire a blocking mutex)
It should be as short as possible
Generally, if you're just updating your hash table, you should be fine here.
If an IRQ handler on another CPU tries to acquire a spinlock that is held by process context, that's fine. As long as your process context does not sleep with that lock held, the lock will be released within a short amount of time, allowing the IRQ handler to make forward progress.
I think the solution is appropriate. Softirqs run with preemption disabled anyway. To share data with them, the process must also disable both preemption and interrupts. A timer that only decrements the timestamp of an entry can do so atomically, i.e., the timestamp variable must be atomic. If a softirq running on another core wants to acquire the spinlock while it is already held on this core, it must wait.
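The remark about the timer can be illustrated with C11 atomics. In this sketch, struct entry and its fields are hypothetical: the key and value would stay under the table's spinlock, while the timestamp (here a ttl counter) is an atomic_int that a timer can age without taking the lock. A compare-and-swap loop is used instead of a plain atomic_fetch_sub() so the counter never goes below zero and exactly one caller observes each expiry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical hash-table entry: key and value are protected by the
 * table's spinlock, while ttl is atomic so a timer can age the entry
 * without taking the lock. */
struct entry {
    unsigned key;
    unsigned value;
    atomic_int ttl; /* remaining lifetime in timer ticks */
};

/* Timer path: lock-free decrement that never goes below zero.
 * Returns true only for the tick that moves ttl from 1 to 0. */
static bool entry_tick(struct entry *e)
{
    int old = atomic_load_explicit(&e->ttl, memory_order_relaxed);
    while (old > 0) {
        if (atomic_compare_exchange_weak_explicit(&e->ttl, &old, old - 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return old == 1;
        /* CAS failed: old now holds the current value, so retry. */
    }
    return false; /* already expired */
}

/* Refresh path (e.g. a packet matched the entry): also lock-free. */
static void entry_refresh(struct entry *e, int ttl)
{
    assert(ttl >= 0);
    atomic_store_explicit(&e->ttl, ttl, memory_order_relaxed);
}
```

Removing an expired entry from the table still modifies the shared structure, so that step should take the spinlock like any other update.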

How does Kernel handle the lock in process context when an interrupt comes?

First of all, sorry for the ambiguity in the question. What I want to understand is the scenario below:
Suppose a process is running and holds a lock. After it acquires the lock, a hardware interrupt is generated. How will the kernel handle this situation? Will it wait for the lock? If so, what if the interrupt handler needs to access that lock, or the shared data that the lock protects in the process?
The Linux kernel has a few functions for acquiring spinlocks, to deal with issues like the one you're raising here. In particular, there is spin_lock_irq(), which disables interrupts (on the CPU the process is running on) and acquires the spinlock. This can be used when the code knows interrupts are enabled before the spinlock is acquired; in case the function might be called in different contexts, there is also spin_lock_irqsave(), which stashes away the current state of interrupts before disabling them, so that they can be reenabled by spin_unlock_irqrestore().
In any case, if a lock is used in both process and interrupt context (which is a good and very common design if there is data that needs to be shared between the contexts), then process context must disable interrupts (locally on the CPU it's running on) when acquiring the spinlock to avoid deadlocks. In fact, lockdep ("CONFIG_PROVE_LOCKING") will verify this and warn if a spinlock is used in a way that is susceptible to the "interrupt while process context holds a lock" deadlock.
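The deadlock and the save/restore behavior can be modeled on a single simulated CPU. This is a hypothetical sketch, not kernel code: irqs_enabled stands for the local interrupt flag, raise_irq() for the hardware, and lock_irqsave()/unlock_irqrestore() for spin_lock_irqsave()/spin_unlock_irqrestore(). An interrupt raised while the lock is held stays pending and is delivered only after the unlock, so irq_handler() never finds the lock held by the code it interrupted.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model (hypothetical names, not kernel code). */
static bool irqs_enabled = true;  /* local interrupt state of this CPU    */
static bool lock_held    = false; /* the spinlock shared with the handler */
static bool irq_pending  = false; /* interrupt raised while masked        */
static int  handler_runs = 0;

static void irq_handler(void)
{
    /* With one CPU, nobody could ever release the lock for us: spinning
     * here would be the deadlock lockdep warns about. */
    assert(!lock_held);
    handler_runs++;
}

/* Hardware raises an interrupt: delivered now, or held pending while masked. */
static void raise_irq(void)
{
    if (irqs_enabled)
        irq_handler();
    else
        irq_pending = true;
}

static void set_irqs(bool on)
{
    irqs_enabled = on;
    if (on && irq_pending) { /* deliver what arrived while masked */
        irq_pending = false;
        irq_handler();
    }
}

/* Models spin_lock_irqsave(): remember the old state, then mask and lock. */
static unsigned long lock_irqsave(void)
{
    unsigned long flags = irqs_enabled;
    set_irqs(false);
    lock_held = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): unlock, then restore the saved state. */
static void unlock_irqrestore(unsigned long flags)
{
    assert(lock_held && !irqs_enabled);
    lock_held = false;
    set_irqs(flags != 0);
}
```

Because unlock_irqrestore() restores the saved state rather than forcing interrupts on, it is safe to call from code that was already running with interrupts disabled; spin_unlock_irq() would unconditionally re-enable them in that case.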
Let me explain some basic properties of interrupt handlers and bottom halves.
A handler can’t transfer data to or from user space, because it doesn’t execute in the context of a process.
Handlers also cannot do anything that would sleep, such as calling wait_event(), allocating memory with anything other than GFP_ATOMIC, or locking a semaphore.
Handlers cannot call schedule().
What I am trying to say is that interrupt handlers run in atomic context. They cannot sleep because they cannot be rescheduled: interrupts do not have a backing process context.
The above is by design. You can do whatever you want in code; just be prepared for the consequences.
Let us assume that, in an interrupt handler, you try to acquire a lock that the interrupted process already holds (bad design).
When the interrupt occurs, the process's registers are saved on the stack and the ISR starts. The ISR now spins on the lock and you are in a deadlock, as there is no way for the ISR to know what the process was doing.
The process will not be able to resume execution, and release the lock, until the ISR is done.
Kernel preemption does not get you out of this either, because the interrupted process cannot run again on that CPU until the ISR returns.

How best to synchronize memory access shared between kernel and user space, in Windows

I can't find any function to acquire a spinlock in the Win32 APIs.
Is there a reason?
When I need to use a spinlock, what do I do?
I know there is an InitializeCriticalSectionAndSpinCount function.
But that's not what I want.
Edit:
I want to synchronize access to memory that will be shared between kernel space and user space. The memory will be mapped.
I need to lock it when I access the data structure, and the lock will be held for a very short time.
The data structure (suppose it is a queue) manages event handles that the two sides use to interact with each other.
What synchronization mechanism should I use?
A spinlock is clearly not appropriate for user-level synchronization. From http://www.microsoft.com/whdc/driver/kernel/locks.mspx:
All types of spin locks raise the IRQL to DISPATCH_LEVEL or higher. Spin locks are the only synchronization mechanism that can be used at IRQL >= DISPATCH_LEVEL. Code that holds a spin lock runs at IRQL >= DISPATCH_LEVEL, which means that the system’s thread switching code (the dispatcher) cannot run and, therefore, the current thread cannot be pre-empted.
Imagine if it were possible to take a spin lock in user mode: Suddenly the thread would not be able to be pre-empted. So on a single-cpu machine, this is now an exclusive and real-time thread. The user-mode code would now be responsible for handling interrupts and other kernel-level tasks. The code could no longer access any paged memory, which means that the user-mode code would need to know what memory is currently paged and act accordingly. Cats and dogs living together, mass hysteria!
Perhaps a better question would be to tell us what you are trying to accomplish, and ask what synchronization method would be most appropriate.
There is a managed user-mode SpinLock in .NET (System.Threading.SpinLock). Handle it with care, as the docs advise; it's easy to go badly wrong with these locks.
The only way to get similar behavior in native code is via the Win32 API you already named: InitializeCriticalSectionAndSpinCount and its siblings (such as SetCriticalSectionSpinCount).
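For user-mode threads, the "spin count" idea behind InitializeCriticalSectionAndSpinCount can be sketched in portable C11. This is a hypothetical illustration, not how Windows implements critical sections: it spins a bounded number of times and then calls POSIX sched_yield() (SwitchToThread() would be the rough Windows counterpart) instead of spinning indefinitely, whereas a real critical section eventually blocks on a kernel object. It only coordinates user threads and, per the quote above, is not a way to synchronize with kernel code running at raised IRQL.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-then-yield lock illustrating a bounded spin count. */
typedef struct {
    atomic_bool locked;
    unsigned spin_count; /* attempts before giving up the CPU */
} spin_yield_lock;

static void syl_init(spin_yield_lock *l, unsigned spin_count)
{
    atomic_init(&l->locked, false);
    l->spin_count = spin_count;
}

static bool syl_trylock(spin_yield_lock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void syl_lock(spin_yield_lock *l)
{
    for (;;) {
        /* Spin briefly, hoping the holder releases the lock soon. */
        for (unsigned i = 0; i < l->spin_count; i++) {
            if (!atomic_load_explicit(&l->locked, memory_order_relaxed) && syl_trylock(l))
                return;
        }
        /* The holder is probably not running right now: let it run. */
        sched_yield();
    }
}

static void syl_unlock(spin_yield_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed)); /* must be held */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Yielding rather than spinning forever matters precisely because a user-mode lock holder can be preempted at any time, which is the situation the quoted DISPATCH_LEVEL rules exist to rule out in the kernel.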

Resources