Why does the Linux kernel panic pin the CPU at 100%? - linux-kernel

Take a look at the following:
qemu-system-x86_64 -kernel /boot/vmlinuz-linux
This will cause a kernel panic, as expected (there is no init). Much less expected is that it causes one processor core to spin until qemu is killed. Why is this? What exactly causes the kernel to leave the CPU in this state?

Why does the Linux kernel panic pin the CPU at 100%?
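Because of this loop in kernel/panic.c (https://code.woboq.org/linux/linux/kernel/panic.c.html#74):
while (1)
    cpu_relax();
cpu_relax() is only a hint to the processor, not a halt, so the core keeps executing the loop at full speed.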
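To see why such a loop shows up as 100% CPU, here is a minimal userspace sketch of the same pattern. It assumes GCC or Clang inline assembly; `cpu_relax_sketch` and `spin_for` are hypothetical names, not kernel functions. On x86 the kernel's cpu_relax() boils down to the PAUSE instruction, which tells the core it is in a spin-wait but never puts it to sleep.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's cpu_relax(). On x86 this emits PAUSE;
 * on other architectures this sketch falls back to a plain compiler barrier. */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Bounded version of `while (1) cpu_relax();` so that it can return.
 * Returns the number of iterations executed. */
static uint64_t spin_for(uint64_t iterations)
{
    uint64_t done = 0;

    while (done < iterations) {
        cpu_relax_sketch();   /* a hint to the core, not a sleep */
        done++;
    }
    return done;
}
```

Calling `spin_for` with a large count from a small test program and watching it in top should show one core fully busy for as long as the loop runs, which is what the host sees from the guest's vCPU.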

Related

What's the process of disabling interrupt in multi-processor system?

My textbook says that disabling interrupts is not recommended on a multi-processor system, and that it would take too much time. I don't understand this. Can anyone show me how a multi-processor system goes about disabling interrupts? Thanks.
On x86 (and other architectures, AFAIK), enabling/disabling interrupts is on a per-core basis. You can't globally disable interrupts on all cores.
Software can communicate between cores with inter-processor interrupts (IPIs) or atomic shared variables, but even so it would be massively expensive to arrange for all cores to sit in a spin-loop waiting for a notification from this core that they can re-enable interrupts. (Interrupts are disabled on the other cores, so you can't send them an IPI to let them know when you're done with your block of atomic operations.) You have to interrupt whatever all 7 other cores (e.g. on an 8-way SMP system) are doing, with many cycles of round-trip communication overhead.
It's basically ridiculous. It would be clearer to just say you can't globally disable interrupts across all cores, and that it wouldn't help anyway for anything other than interrupt handlers. It's theoretically possible, but it's not just "slow", it's impractical.
Disabling interrupts on one core doesn't make something atomic if other threads are running on other cores. Disabling interrupts works on uniprocessor machines because it makes a context-switch impossible. (Or it makes it impossible for the same interrupt handler to interrupt itself.)
But my confusion is that the difference between 1 core and 8 cores doesn't seem like a big number to me, so why is disabling interrupts on all of them so time-consuming?
Anything other than uniprocessor is a fundamental qualitative difference, not quantitative. Even a dual-core system, like early multi-socket x86 and the first dual-core-in-one-socket x86 systems, completely changes your approach to atomicity. You need to actually take a lock or something instead of just disabling interrupts. (Early Linux, for example, had a "big kernel lock" that a lot of things depended on, before it had fine-grained locking for separate things that didn't conflict with each other.)
The fundamental difference is that on a UP system, only interrupts on the current CPU can cause things to happen asynchronously to what the current code is doing. (Or DMA from devices...)
On an SMP system, other cores can be doing their own thing simultaneously.
For multithreading, getting atomicity for a block of instructions by disabling interrupts on the current CPU is completely ineffective; threads could be running on other CPUs.
For atomicity of something in an interrupt handler, if this IRQ is set up to only ever interrupt this core, disabling interrupts on this core will work, because there's no threat of interference from other cores.
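As a rough userspace illustration of the SMP point, the sketch below uses POSIX threads and C11 atomics instead of anything interrupt-related. The names `worker`, `run_workers` and `INCREMENTS_PER_THREAD` are made up for this example. Threads running on different cores update one counter, and the update is correct only because each increment is an atomic read-modify-write. Nothing a single thread could do locally, such as blocking interruptions of itself, would protect the counter from the other threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000
#define MAX_THREADS 16

static atomic_long shared_counter;

/* Each worker may run on a different core, concurrently with the others. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        atomic_fetch_add(&shared_counter, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final counter
 * value, or -1 if nthreads is out of range or a thread could not be created. */
static long run_workers(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    atomic_store(&shared_counter, 0);
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == nthreads ? atomic_load(&shared_counter) : -1;
}
```

With the atomic increment, `run_workers(4)` returns exactly 4 * INCREMENTS_PER_THREAD. Replacing it with a plain `long` and `counter++` is a data race and will typically lose updates on a multi-core machine.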

What is the relation between reentrant kernel and preemptive kernel?

What is the relation between reentrant kernel and preemptive kernel?
If a kernel is preemptive, must it be reentrant? (I guess yes)
If a kernel is reentrant, must it be preemptive? (I am not sure)
I have read https://stackoverflow.com/a/1163946, but I am not sure whether there is a relation between the two concepts.
I guess my questions are about operating system concepts in general. But if it matters, I am mostly interested in the Linux kernel, and I encountered the two concepts when reading Understanding the Linux Kernel.
What is reentrant kernel:
As the name suggests, a reentrant kernel is the one which allows
multiple processes to be executing in the kernel mode at any given
point of time and that too without causing any consistency problems
among the kernel data structures.
What is kernel preemption:
Kernel preemption is a method used mainly in monolithic and hybrid
kernels where all or most device drivers are run in kernel space,
whereby the scheduler is permitted to forcibly perform a context
switch (i.e. preemptively schedule; on behalf of a runnable and higher
priority process) on a driver or other part of the kernel during its
execution, rather than co-operatively waiting for the driver or kernel
function (such as a system call) to complete its execution and return
control of the processor to the scheduler.
Can I imagine a preemptive kernel which is not reentrant? Hardly, but I can. Let's consider an example: some thread performs a system call. On entering the kernel it takes a big kernel lock and disables all interrupts except the scheduler timer IRQ. After that, the scheduler preempts this thread while it is in the kernel, and we may switch to another userspace thread. That thread does some work in userspace and then enters the kernel, tries to take the big kernel lock, sleeps, and so on. In practice it looks like this design can't be implemented, because disabling interrupts for long intervals causes huge latency.
Can I imagine a reentrant kernel which is not preemptive? Why not? Just use cooperative scheduling in the kernel. Thread 1 enters the kernel and calls thread_yield() after some time. Thread 2 enters the kernel and does its own work, maybe calling thread_yield() as well, maybe not. There is nothing special here.
As for the Linux kernel, it is absolutely reentrant, and kernel preemption can be configured with CONFIG_PREEMPT. Voluntary preemption is also possible, along with many other options.
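The cooperative case can be sketched with a toy round-robin loop. This is only a model, not kernel code: `struct task` and `run_cooperative` are hypothetical, and "yielding" is modelled by a task finishing one unit of work and returning control to the loop. Several tasks are part-way through their work at the same time (the reentrant part), yet none is ever interrupted in the middle of a unit (the non-preemptive part).

```c
#include <stddef.h>

/* A hypothetical task that needs `remaining` units of in-kernel work. */
struct task {
    int remaining;   /* work units left */
    int id;          /* label recorded in the trace */
};

/* Runs the tasks round-robin. Each task does one unit of work and then
 * voluntarily gives up the CPU, which models a thread_yield() call.
 * The id of every task that ran is written to trace, up to trace_cap
 * entries. Returns the total number of units executed. */
static size_t run_cooperative(struct task *tasks, size_t ntasks,
                              int *trace, size_t trace_cap)
{
    size_t steps = 0;
    int progress = 1;

    while (progress) {
        progress = 0;
        for (size_t i = 0; i < ntasks; i++) {
            if (tasks[i].remaining == 0)
                continue;               /* this task has finished */
            tasks[i].remaining--;       /* one uninterrupted unit of work */
            if (steps < trace_cap)
                trace[steps] = tasks[i].id;
            steps++;
            progress = 1;               /* control returns here: the "yield" */
        }
    }
    return steps;
}
```

A switch can only happen where a task returns, so the order of execution is fully determined by the tasks themselves rather than by a timer.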

How the different kernel subsystems share CPU time

Processes in userspace are scheduled by the kernel scheduler to get processor time, but how do the different kernel tasks get CPU time? I mean, when no userspace process is requesting CPU time (so the CPU is idle, executing NOP instructions) but some kernel subsystem needs to carry out a task regularly, are timers and other hardware and software interrupts the common ways to get CPU time in kernel space?
It's pretty much the same scheduler. The only difference I can think of is that kernel code has much more control over execution flow. For example, it can call the scheduler directly via schedule().
Also, in the kernel you have 3 execution contexts: hardware interrupt, softirq/bh, and process. In hard and soft interrupt context you can't sleep, so no scheduling is done while code runs in those contexts.

Difference between User vs Kernel System call

A system call is how a program requests a service from an operating system's kernel.
They can occur in user-mode and kernel-mode.
What are the differences?
For example:
Overhead
System time
A system call is the way you transition between the application ("user mode") and the kernel.
Syscalls are slower than normal function calls, but newer x86 chips from Intel and AMD have a special sysenter/syscall opcode to make it take just a hundred nanoseconds or so, give or take.
@Leo,
Could you elaborate on how system calls vary when made from within kernel space? I am asking to better understand the Linux kernel, which is written in C and assembly.
Note that system calls are just an interface between user space and kernel space. When you need some computer resource (files, networks, ...), you ask the kernel to give it to you (under the hood, you ask the kernel to run the kernel code that is responsible for it).
The overhead of a system call comes from having to trap into the kernel, either through a software interrupt or through a dedicated instruction such as sysenter/syscall. As Will mentioned, the time this takes depends very much on the CPU type.
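To get a feel for that overhead on a particular machine, a rough measurement sketch follows. It assumes Linux with glibc; `raw_getpid` and `avg_syscall_ns` are hypothetical helper names. It times a loop of getpid system calls issued through syscall(2), so the C library cannot answer from a cached value. The number it reports depends on the CPU, the kernel version and any mitigations in effect, so treat it as an estimate only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for the syscall() declaration in unistd.h */
#endif
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Issues a real getpid system call instead of going through the libc wrapper. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Returns the average wall-clock cost, in nanoseconds, of one getpid
 * system call over `iterations` calls (0.0 if iterations is not positive). */
static double avg_syscall_ns(long iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void)raw_getpid();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Printing `avg_syscall_ns(1000000)` next to the time of an equally long loop that calls an ordinary function gives a quick comparison of the two costs.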

Linux programmable interval timers

Let's assume the scenario below:
In a multi-processor system, the PIT raises an interrupt that may be handled by any CPU in the system, and the handler updates the jiffies value, which is protected by write_seqlock(&xtime_lock).
When all CPUs receive the PIT interrupt, each of them does jiffies++. In that case, if we have 4 CPUs, jiffies is incremented by 4 on every tick, so our time is wrong.
Is this scenario correct or not?
I believe it could be, because the book Understanding the Linux Kernel says:
The local APIC timer sends an interrupt only to its processor, while the PIT raises a global interrupt, which may be handled by any CPU in the system.
What is your comment?
False. Only one CPU receives the interrupt.
I found the answer to my question; I hope it is useful for you.
There are two components in the Intel APIC system, the local APIC (LAPIC) and the I/O APIC.
We already know about the LAPIC, but about the I/O APIC:
I/O APICs contain a redirection table, which is used to route the interrupts it receives from peripheral buses to one or more local APICs.
(This is from Wikipedia.)
Therefore just one CPU receives the interrupt, or in some cases more than one.
Thanks for your attention.
