What are the implications of a Linux kernel being preemptive, particularly for writing device drivers? I'm guessing you need to be more diligent about resource locking, but is there anything more to this?
There are a lot more opportunities for race conditions, as you mentioned, so yes, you have to be very diligent with locks. You also have to be careful about timing, such as when you enable/disable interrupts or other hardware resources. You don't always have to use locks in these situations, but you may have to reorder your code. Finally, it also affects scheduling, allowing high-priority tasks to be much more responsive, which in turn may have a negative effect on lower-priority tasks.
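For example, a minimal sketch (the lock, counter, and function names here are made up) of a driver read-modify-write protected so that preemption in the middle can't corrupt it:

/* Hypothetical driver state shared between two code paths; the spinlock
 * keeps the read-modify-write atomic even if the kernel preempts us
 * between the load and the store. */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_stats_lock);   /* made-up lock name */
static unsigned long my_packet_count;    /* made-up shared state */

void my_driver_account_packet(void)
{
    spin_lock(&my_stats_lock);     /* also disables preemption */
    my_packet_count++;             /* safe read-modify-write */
    spin_unlock(&my_stats_lock);   /* preemption point: may reschedule here */
}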
If you are not on SMP, make sure this locking patch has been applied: "Guarantee spinlocks implicit barrier for !PREEMPT_COUNT", from April 2013.
Be aware that each time the code runs spin_unlock_*() or preempt_enable(), a preemption could kick in. The same is true whenever an exception or interrupt returns. Beyond those cases and the like, there should be no other concerns. The kernel design guarantees that exceptions and interrupts are handled in a strictly nested manner, though with SMP multiple instances may run in parallel.
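To make that concrete, here's a small sketch (the per-CPU counter name is hypothetical) of a section protected purely by disabling preemption; the reschedule, if any, happens exactly at preempt_enable():

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(unsigned long, scratch_uses);  /* made-up counter */

void touch_per_cpu_scratch(void)
{
    preempt_disable();              /* no preemption from here ... */
    __this_cpu_inc(scratch_uses);   /* safe: we cannot migrate CPUs meanwhile */
    preempt_enable();               /* ... to here; a reschedule may happen now */
}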
I have a textbook statement that says disabling interrupts is not recommended in multi-processor systems, and that it takes too much time. But I don't understand this; can anyone show me how a multi-processor system would disable interrupts? Thanks
on x86 (and other architectures, AFAIK), enabling/disabling interrupts is on a per-core basis. You can't globally disable interrupts on all cores.
Software can communicate between cores with inter-processor interrupts (IPIs) or atomic shared variables, but even so it would be massively expensive to arrange for all cores to sit in a spin-loop waiting for a notification from this core that they can re-enable interrupts. (Interrupts are disabled on other cores, so you can't send them an IPI to let them know when you're done with your block of atomic operations.) You have to interrupt whatever all 7 other cores (e.g. on an 8-way SMP system) are doing, with many cycles of round-trip communication overhead.
It's basically ridiculous. It would be clearer to just say you can't globally disable interrupts across all cores, and that it wouldn't help anyway for anything other than interrupt handlers. It's theoretically possible, but it's not just "slow", it's impractical.
Disabling interrupts on one core doesn't make something atomic if other threads are running on other cores. Disabling interrupts works on uniprocessor machines because it makes a context-switch impossible. (Or it makes it impossible for the same interrupt handler to interrupt itself.)
But I think my confusion is that the difference between 1 core and 8 cores doesn't seem like a big number to me; why is disabling interrupts on all of them so time-consuming?
Anything other than uniprocessor is a fundamental qualitative difference, not quantitative. Even a dual-core system, like early multi-socket x86 and the first dual-core-in-one-socket x86 systems, completely changes your approach to atomicity. You need to actually take a lock or something instead of just disabling interrupts. (Early Linux, for example, had a "big kernel lock" that a lot of things depended on, before it had fine-grained locking for separate things that didn't conflict with each other.)
The fundamental difference is that on a UP system, only interrupts on the current CPU can cause things to happen asynchronously to what the current code is doing. (Or DMA from devices...)
On an SMP system, other cores can be doing their own thing simultaneously.
For multithreading, getting atomicity for a block of instructions by disabling interrupts on the current CPU is completely ineffective; threads could be running on other CPUs.
For atomicity of something in an interrupt handler, if this IRQ is set up to only ever interrupt this core, disabling interrupts on this core will work. Because there's no threat of interference from other cores.
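A minimal sketch of the per-core point (the counter name is hypothetical): local_irq_save() only masks interrupts on the CPU that executes it, which is enough on a uniprocessor or for data only touched by this core and its IRQ handler, but does nothing about other cores:

#include <linux/irqflags.h>

static unsigned long some_counter;   /* made-up shared variable */

void update_shared_counter(void)
{
    unsigned long flags;

    local_irq_save(flags);      /* this core stops taking interrupts */
    some_counter++;             /* atomic vs. local IRQs only */
    local_irq_restore(flags);   /* previous interrupt state comes back */
}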
I came across a few articles talking about the differences between mutexes and critical sections.
One of the major differences I came across is that mutexes run in kernel mode whereas critical sections mainly run in user mode.
So if this is the case, then aren't applications which use mutexes harmful for the system in case the application crashes?
Thanks.
Use Win32 mutex handles when you need a lock or synchronization across threads in different processes.
Use Win32 CRITICAL_SECTIONs when you need to have a lock between threads within the same process. It's cheaper as far as time and doesn't involve a kernel system call unless there is lock contention. Critical Section objects in Win32 can't span process boundaries anyway.
"Harmful" is the wrong word to use. More like "Win32 mutexes are slightly more expensive that Win32 Critical Sections in terms of performance". A running app that uses mutexes instead of critical sections won't likely hurt system performance. It will just run minutely slower. But depending on how often your lock is acquired and released, the difference may not even be measurable.
I forget the perf metrics I did a long time ago. The bottom line is that EnterCriticalSection and LeaveCriticalSection APIs are on the order of 10-100x faster than the equivalent usage of WaitForSingleObject and ReleaseMutex. (on the order of 1 microsecond vs 1 millisecond).
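For reference, roughly what the two patterns look like in code (illustrative names, error handling omitted):

#include <windows.h>

CRITICAL_SECTION g_cs;   /* intra-process lock, mostly user mode */
HANDLE g_mutex;          /* kernel object, usable across processes */

void init_locks(void)
{
    InitializeCriticalSection(&g_cs);
    g_mutex = CreateMutex(NULL, FALSE, NULL);   /* unnamed, not initially owned */
}

void with_critical_section(void)
{
    EnterCriticalSection(&g_cs);    /* stays in user mode when uncontended */
    /* ... access shared data ... */
    LeaveCriticalSection(&g_cs);
}

void with_mutex(void)
{
    WaitForSingleObject(g_mutex, INFINITE);   /* kernel transition every time */
    /* ... access shared data ... */
    ReleaseMutex(g_mutex);
}

The cost difference quoted above comes from the kernel transition in WaitForSingleObject/ReleaseMutex, which the critical section avoids on the uncontended path.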
This is a two-fold question that arose from my trivial observation that I am running an SMP-enabled Linux on our ARM Cortex-A8 based SoC. The first part is about the performance (memory space/CPU time) difference between SMP and non-SMP Linux kernels on a uniprocessor system. Does any difference exist?
The second part is about the use of spinlocks. AFAIK, spinlocks are no-ops on a uniprocessor. Since there is only one CPU and only one process can be running on it (at a time), there is no other process to busy-loop against. So for synchronization I just need to disable interrupts to protect my critical section. Is this understanding of mine correct?
Ignore portability of drivers factor for this discussion.
A large amount of synchronisation code in the kernel compiles away to almost nothing in uniprocessor kernels, which explains the behaviour you describe. The performance of an n-way system is definitely not n times that of a uniprocessor, and the scaling gets worse as the number of CPUs grows.
You should continue to write your driver with using synchronisation mechanisms for SMP systems - safe in the knowledge that you'll get the correct single-processor case when the kernel is configured for uni-processor.
Disabling interrupts globally is like taking a sledge-hammer to a nut - maybe just disabling pre-emption on the current CPU is enough - which the spinlock does even on uni-processor systems.
If you've not already done so, take a look at Chapter 5 of Linux Device Drivers 3rd Edition - there are a variety of spinlock options depending on the circumstance.
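As a quick illustrative sketch of those options (the lock name is made up; see the chapter for when each variant is appropriate):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);   /* illustrative name */

static void demo_spinlock_variants(void)
{
    unsigned long flags;

    spin_lock(&demo_lock);                    /* no IRQ masking; disables preemption */
    spin_unlock(&demo_lock);

    spin_lock_irq(&demo_lock);                /* also masks IRQs on this CPU */
    spin_unlock_irq(&demo_lock);

    spin_lock_irqsave(&demo_lock, flags);     /* masks IRQs, remembers prior state */
    spin_unlock_irqrestore(&demo_lock, flags);

    spin_lock_bh(&demo_lock);                 /* masks bottom halves (softirqs) only */
    spin_unlock_bh(&demo_lock);
}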
As you have stated, you are running a kernel compiled in SMP mode on a uniprocessor system, so it's clear that you won't get any benefit in terms of speed or memory.
The Linux kernel uses extensive locking for synchronization. In uniprocessor mode there may theoretically be no need for locking, but there are many cases where it is still necessary, so use locking where it is needed, although not as much as in SMP.
You should also know that spinlocks are implemented by a set of macros; some prevent concurrency with IRQ handlers while others do not. Spinlocks are suitable for protecting small pieces of code that are intended to run for a very short time.
As for your second question, you are trying to replace spinlocks by disabling interrupts in uniprocessor mode, but in non-preemptible UP (uniprocessor) kernels the spinlock macros evaluate to empty macros (or, for some of them, to macros that just disable/enable interrupts). UP kernels with preemption enabled use spinlocks to disable preemption. For most purposes, preemption can be thought of as the SMP equivalent. So in a UP kernel, if you use spinlocks they will just be empty macros, and I think it is better to keep using them.
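Conceptually (this is a rough sketch with made-up macro names, not the actual kernel source), the UP expansions being described boil down to something like:

#include <linux/preempt.h>

/* CONFIG_SMP=n, CONFIG_PREEMPT=n: the lock itself compiles away */
#define up_spin_lock(l)            do { (void)(l); } while (0)
#define up_spin_unlock(l)          do { (void)(l); } while (0)

/* CONFIG_SMP=n, CONFIG_PREEMPT=y: the lock degenerates to preemption control */
#define up_preempt_spin_lock(l)    preempt_disable()
#define up_preempt_spin_unlock(l)  preempt_enable()

So code written against the spinlock API costs you essentially nothing on UP, while staying correct if the same driver is later built for SMP or a preemptible kernel.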
There are basically four techniques for synchronization: 1) non-preemptability, 2) atomic operations, 3) interrupt disabling, 4) locks.
Since you are considering disabling interrupts for synchronization, remember that, because of its simplicity, interrupt disabling is used by kernel functions for implementing a critical region.
This technique does not always prevent kernel control path interleaving.
The critical section should be short, because any communication between the CPU and I/O is blocked while a kernel control path is running in this section.
So if you need synchronization on a uniprocessor, use a semaphore.
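For instance, a sketch (the names are hypothetical) of semaphore-based mutual exclusion for data that is only ever touched from process context:

#include <linux/semaphore.h>
#include <linux/errno.h>

static struct semaphore my_sem;   /* made-up name */

static void my_init(void)
{
    sema_init(&my_sem, 1);        /* count of 1 => mutual exclusion */
}

static int update_config(void)
{
    if (down_interruptible(&my_sem))   /* may sleep: process context only */
        return -ERESTARTSYS;
    /* ... touch data shared between process-context paths ... */
    up(&my_sem);
    return 0;
}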
I know that spinlocks work by spinning, that different kernel paths exist, and that kernels are preemptive, so why don't spinlocks work in uniprocessor systems? (For example, in Linux.)
If I understand your question, you're asking why spin locks are a bad idea on single core machines.
They should still work, but can be much more expensive than true thread-sleeping concurrency:
When you use a spinlock, you're essentially asserting that you don't think you will have to wait long. You are saying that you think it's better to keep your processor time slice with a busy loop than to pay the cost of sleeping your thread and context-switching to another thread or process. If you only have to wait a very short amount of time, you can sleep and be reawakened almost immediately, but the cost of going down and up is more expensive than just waiting around.
This is more likely to be OK on multi-core processors, since they have much better concurrency profiles than single core processors. On multi core processors, between loop iterations, some other thread may have taken care of your prerequisite. On single core processors, it's not possible that someone else could have helped you out - you've locked up the one and only core.
The problem here is that if you wait or sleep on a lock, you hint to the system that you don't have everything you need yet, so it should go do some other stuff and come back to you later. With a spin lock, you never tell the system this, so you lock it up waiting for something else to happen - but, meanwhile, you're holding up the whole system, so something else can't happen.
The nature of a spinlock is that it does not deschedule the process - instead it spins until the process acquires the lock.
On a uniprocessor, it will either immediately acquire the lock or it will spin forever - if the lock is contended, then there will never be an opportunity for the process which currently holds the resource to give it up. Spinlocks are only useful when another process can execute while one is spinning on the lock - which means multiprocessor systems.
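To illustrate what "spins until it acquires the lock" means, here is a toy test-and-set lock in C11 atomics (illustrative only, not something to use for real):

#include <stdatomic.h>

static atomic_flag lock_taken = ATOMIC_FLAG_INIT;   /* toy lock */

static void toy_spin_lock(void)
{
    /* Burn CPU until the flag was previously clear. On a non-preemptive
     * uniprocessor this loop never ends if another task already holds
     * the lock, because that task can never run again to release it. */
    while (atomic_flag_test_and_set_explicit(&lock_taken, memory_order_acquire))
        ;
}

static void toy_spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_taken, memory_order_release);
}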
there are different versions of spinlock:
spin_lock_irqsave(&xxx_lock, flags);
... critical section here ..
spin_unlock_irqrestore(&xxx_lock, flags);
On a uniprocessor, spin_lock_irqsave() should be used when data needs to be shared between process context and interrupt context, as in this case IRQs also get disabled. spin_lock_irqsave() works under all circumstances, but partly because it is safe it is also fairly slow.
However, if the data only needs to be protected across different CPUs (not against interrupt context), then it is better to use the versions below; they are cheaper because IRQs don't get disabled:
spin_lock(&lock);
...
spin_unlock(&lock);
In uniprocessor systems calling spin_lock_irqsave(&xxx_lock, flags); has the same effect as disabling interrupts which will provide the needed interrupt concurrency protection without unneeded SMP protection. However, in multiprocessor systems this covers both interrupt and SMP concurrency issues.
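Putting the two variants together, a sketch (the device, lock, and handler names are hypothetical) of the usual pattern for data shared between process context and an interrupt handler:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(fifo_lock);   /* made-up driver FIFO lock */
static int fifo_level;               /* data shared with the IRQ handler */

/* Process-context side: must mask local IRQs so the handler can't deadlock us. */
static void fifo_drain(void)
{
    unsigned long flags;

    spin_lock_irqsave(&fifo_lock, flags);
    fifo_level = 0;
    spin_unlock_irqrestore(&fifo_lock, flags);
}

/* Interrupt handler: local IRQs are already off while it runs, so plain
 * spin_lock() is enough to exclude other CPUs. */
static irqreturn_t fifo_irq(int irq, void *dev_id)
{
    spin_lock(&fifo_lock);
    fifo_level++;
    spin_unlock(&fifo_lock);
    return IRQ_HANDLED;
}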
Spinlocks are, by their nature, intended for use on multiprocessor systems, although a uniprocessor workstation running a preemptive kernel behaves like SMP, as far as concurrency is concerned. If a nonpreemptive uniprocessor system ever went into a spin on a lock, it would spin forever; no other thread would ever be able to obtain the CPU to release the lock. For this reason, spinlock operations on uniprocessor systems without preemption enabled are optimized to do nothing, with the exception of the ones that change the IRQ masking status. Because of preemption, even if you never expect your code to run on an SMP system, you still need to implement proper locking.
Ref: Linux Device Drivers, by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman
The following two paragraphs from Operating Systems: Three Easy Pieces might be helpful:
For spin locks, in the single CPU case, performance overheads can be quite painful; imagine the case where the thread holding the lock is pre-empted within a critical section. The scheduler might then run every other thread (imagine there are N − 1 others), each of which tries to acquire the lock. In this case, each of those threads will spin for the duration of a time slice before giving up the CPU, a waste of CPU cycles.
However, on multiple CPUs, spin locks work reasonably well (if the number of threads roughly equals the number of CPUs). The thinking goes as follows: imagine Thread A on CPU 1 and Thread B on CPU 2, both contending for a lock. If Thread A (CPU 1) grabs the lock, and then Thread B tries to, B will spin (on CPU 2). However, presumably the critical section is short, and thus soon the lock becomes available, and is acquired by Thread B. Spinning to wait for a lock held on another processor doesn't waste many cycles in this case, and thus can be effective.
Can breakpoints be used in interrupt service routines (ISRs)?
Yes - in an emulator.
Otherwise, no. It's difficult to pull off, and a bad idea in any case. ISRs are (usually) supposed to work with the hardware, and hardware can easily behave very differently when you leave a gap of half a second between each instruction.
Set up some sort of logging system instead.
ISRs also ungracefully "steal" the CPU from other processes, so many operating systems recommend keeping your ISRs extremely short and doing only what is strictly necessary (such as dealing with any urgent hardware stuff, and scheduling a task that will deal with the event properly). So in theory, ISRs should be so simple that they don't need to be debugged.
If it's hardware behaviour that's the problem, use some sort of logging instead, as I've suggested. If the hardware doesn't really mind long gaps of time between instructions, then you could just write most of the driver in user space - and you can use a debugger on that!
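As a sketch of that advice (the names are made up): log cheaply in the ISR and push the interesting, debuggable work out to process context:

#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct work_struct my_work;   /* hypothetical deferred-work item */

static void my_bottom_half(struct work_struct *w)
{
    /* The slow, debuggable part lives here, in process context. */
}

static void my_setup(void)
{
    INIT_WORK(&my_work, my_bottom_half);
}

static irqreturn_t my_isr(int irq, void *dev_id)
{
    trace_printk("irq %d fired\n", irq);   /* cheap logging instead of a breakpoint */
    schedule_work(&my_work);               /* defer the real work */
    return IRQ_HANDLED;
}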
Depending on your platform, you can do this by accessing the debug port of your processor, typically via the JTAG interface. Keep in mind that you're drastically changing everything timing-related with that method, so your debug session may be useless. But then again, many bugs can be caught this way. Also be mindful of MMU-based memory mappings, as JTAG debuggers often don't take them into account.
In Windows, with a kernel debugger attached, you can indeed place breakpoints in interrupt handlers.