Linux kernel: why preemption is disabled when use per-CPU variable? - linux-kernel

Im looking at this macro from the linux kernel which has to do with handling cpu-specific variables
#define get_cpu_var(var) \
(*({ \
preempt_disable(); \
this_cpu_ptr(&var); \
}))
Why do we disable preemption? Isnt preemption something that cant happen when you are in the kernel? (Since the kernel is the one doing the preemption)

Why do we disable preemption?
To avoid having the thread preempted and rescheduled on a different processor core.
Isnt preemption something that cant happen when you are in the kernel?
This was true when there was still a big kernel lock. Having one global lock means that if you block in-kernel, no other thread may enter the kernel. Now, with the more fine-grained locking, sleeping in the kernel is possible. Linux can be configured at build-time for other preemption models, e.g. CONFIG_PREEMPT.
While your normal desktop kernel is probably configured with CONFIG_PREEMPT_VOLUNTARY, some distributions also ship CONFIG_PREEMPT as a separate low-latency kernel package, e.g. for audio use. For real-time use cases, The preempt_rt patchset even makes most spinlocks preemptible (hence the name).

Related

Are PCIe device drivers beneficial if using Linux as a bootloader for bare-metal code?

I am developing an embedded system on a PowerPC processor and there is need for communication with an FPGA via PCIe. I wish to use Linux/embedded-Linux as a bootloader to leverage its PCIe initialization code and driver API for simplified PCIe driver development. However in the end I want to be running bare-metal code (no OS running). So I am looking at using PetitBoot/kexec to jump from Linux to my own code.
Is this possible?
My current understanding of PCIe drivers leads me to believe that once the device is initialized, so long as I have a pointer to the address space, I should be able to simply execute MMIO R/W operations directly to the memory space. So even if kexec overwrites the driver code I should be able to use the device because the driver has done its job already.
Is this correct?
If not, what are my alternatives?
I don't think this approach would be a good idea. Drivers that were written with the Linux OS in mind are going to assume that all of the OS's resources are available, not just memory allocations. For example, it may configure interrupt handlers, but when the OS is not longer available, your hardware may get hung because nothing is acknowledging and servicing its interrupt requests.
I'm skeptical of the memory initialization as well. I suppose you could theoretically allocate some DMA memory and pass the resulting physical address to your bare-metal application as it takes over, but the whole process seems sketchy. It would be very difficult to make sure everything in Linux is shut down cleanly while leaving the PCIe subsystem running. You'll have to look at the driver's shut down routines and see what it does to the card to make sure it doesn't shut down the device and make it unresponsive to your bare-metal code.
I would suggest that you instead go through the Linux-based driver and use it as a guide to construct a new bare-metal driver. Copy the initialization code that you need, and leave out the Linux-specific configuration details.

Can a Linux kernel run as an ARM TrustZone secure OS?

I am trying to run a Linux kernel as the secure OS on a TrustZone enabled development board(Samsung exynos 4412). Although somebody would say secure os should be small and simple. But I just want to try. And if it is possible, then write or port a trustlet application to this secure os will be easy, especially for applications with UI(trusted UI).
I bought the development board with a runnable secure OS based on Xv6 and the normal os is Android(android version 4.2.2, kernel version 3.0.15). I have tried to replace the simple secure os with the android Linux kernel, that is, with a little assembly code ahead, such as clearing the NS bit of SCR register, directly called the Linux kernel entry(with necessary kernel tagged list passed in).
The kernel uncompressed code is executed correctly and the first C function of the kernel, start_kernel(), is also executed. Almost all the initialization functions run well except running to calibrate_delay(). This function will wait for the jiffies changed:
/* wait for "start of" clock tick */
ticks = jiffies;
while (ticks == jiffies);
I guess the reason is no clock interrupt is generated(I print logs in clock interrupt callback functions, they are never gotten in). I have checked the CPSR state before and after the local_irq_enable() function. The IRQ and FIQ bit are set correctly. I also print some logs in the Linux kernel's IRQ handler defined in the interrupt vectors table. Nothing logged.
I know there may be some differences in interrupt system between secure world and non secure world. But I can't find the differences in any documentation. Can anybody point out them? And the most important question is, as Linux is a very complicated OS, can Linux kernel run as a TrustZone secure OS?
I am a newbie in Linux kernel and ARM TrustZone. Please help me.
Running Linux as a secure world OS should be standard by default. Ie, the secure world supervisor is the most trusted and can easily transition to the other modes. The secure world is an operating concept of the ARM CPU.
Note: Just because Linux runs in the secure world, doesn't make your system secure! TrustZone and the secure world are features that you can use to make a secure system.
But I just want to try. And if it is possible, then write or port a trustlet application to this secure os will be easy, especially for applications with UI(trusted UI).
TrustZone allows partitioning of software. If you run both Linux and the trustlet application in the same layer, there is no use. The trustlet is just a normal application.
The normal mode for trustlets is to setup a monitor vector page on boot and lock down physical access. The Linux kernel can then use the smc instruction to call routines in the trustlet to access DRM type functionality to decrypt media, etc. In this mode, Linux runs as a normal world OS, but can call limited functionality with-in the secure world through the SMC API you define.
Almost all the initialization functions run well except running to calibrate_delay().
This is symptomatic of non-functioning interrupts. calibrate_delay() runs a tight loop while waiting for a tick count to increase via system timer interrupts. If you are running in the secure world, you may need to route interrupts. The register GICD_ISPENDR can be used to force an interrupt. You may use this to verify that the ARM GIC is functioning properly. Also, the kernel command line option lpj=XXXXX (where XXXX is some number) can skip this step.
Most likely some peripheral interrupt router, clock configuration or other interrupt/timer initialization is done by a boot loader in a normal system and you are missing this. Booting a new board is always difficult; even more so with TrustZone.
There is nothing technically preventing Linux from running in the Secure state of an ARM processor. But it defeats the entire purpose of TrustZone. A large, complex kernel and OS such as Linux is infeasible to formally verify to the point that it can be considered "Secure".
For further reading on that aspect, see http://www.ok-labs.com/whitepapers/sample/sel4-formal-verification-of-an-os-kernel
As for the specific problem you are facing - there should be nothing special about interrupt handling in Secure vs. Non-secure state (unless you explicitly configure it to be different). But it could be that the secure OS you have removed was performing some initial timer initializations that are now not taking place.
Also 3.0.15 is an absolutely ancient kernel - it was released 2.5 years ago, based on something released over 3 years ago.
There are several issues with what you are saying that need to be cleared up. First, are you trying to get the Secure world kernel running or the Normal world kernel? You said you wanted to run Linux in the SW and Android in the NW, but your question stated, "I have tried to replace the simple secure os with the android Linux kernel". So which kernel are you having problems with?
Second, you mentioned clearing the NS-bit. This doesn't really make sense. If the NS-bit is set, clearing it requires means that you are running in the NW already (as the bit being set would indicate), you have executed a SMC instruction, switched to Monitor mode, set the NS-bit to 0, and then restored the SW registers. Is this the case?
As far as interrupts are concerned, have you properly initialized the VBAR for each execution mode, i.e. Secure world VBAR, Normal world VBAR, and MVBAR. All three will need to be setup, in addition to setting the proper values in the NSACR and others to ensure interrupts are channeled to the correct execution world and not all just handled by the SW. Also, you do need separate exception vector tables and handlers for all three modes. You might be able to get away with only one set initially, but once you partition your memory system using the TZASC, you will need separate everything.
TZ requires a lot of configuration and is not simply handled by setting/unsetting the NS-bit. For the Exynos 4412, numerous TZ control registers that must be properly set in order to execute in the NW. Unfortunately, none of the information on them is covered in the Public version of the User's Guide. You need the full version in order to get all the values and address necessary to actually run a SW and NW kernel on this processor.

Setting a MCU in low power mode on Linux

I'm running embedded Linux (Angstrom) on an Atmel board (at91 sam9g25) mounting an ARM MCU.
I'd like to set the CPU in idle mode, ideally from userspace by using a function (then the system would be waken up by an hardware gpio interrupt). How can I do that? Alternatively, how can it be done in kernelspace?
I cannot find much, maybe somebody has some example to start from?
Try checking this page . Try also reading Optimizing Power Consumption for AT91SAM9261-based Systems to have an idea about what you can do with power management.
What you can basically do is setting the state you want in /sys/power/state but before entering the low-power state you need to set how your system can be awakened.
Be advised that, in my experience, I have seen a lot of different behaviors by changing the kernel, so by patient and try different versions.

What is a Windows Kernel Driver?

What is Windows Kernel Driver written with the WDK?
What is different from normal app or service?
Kernel drivers are programs written against Windows NT's native API (rather than the Win32 Subsystem's API) and which execute in kernel mode on the underlying hardware. This means that a driver needs to be able to deal with switching virtual memory contexts between processes, and needs to be written to be incredibly stable -- because kernel drivers run in kernel mode, if one crashes, it brings down the entire system. Kernel drivers are unsuitable for anything but hardware devices because they require administrative access to install or start, and because they remove the security the kernel normally provides to programs that crash -- namely, that they crash themselves and not the entire system.
Long story short:
Drivers use the native API rather than the Win32 API
This means that drivers generally cannot display any UI.
Drivers need to manage memory and how memory is paged explicitly -- using things like paged pool and nonpaged pool.
Drivers need to deal with process context switching and not depend on which process happens to have the page table while they're running.
Drivers cannot be installed into the kernel by limited users.
Drivers run with privileged rights at the processor level.
A fault in a user-level program results in termination of that program's process. A fault in a driver brings down the system with a Blue Screen of Death.
Drivers need to deal with low level hardware bits like Interrupts and Interrupt Request Levels (IRQLs).
It is code that runs in kernel mode rather than user mode. Kernel mode code has direct access to the internals of the OS, hardware etc.
Invariably you write kernel mode modules to implement device drivers.
A kernel driver is a low-level implementation of an "application".
Because it runs in the kernel context, it has the ability to access the kernel API and memory directly.
For example, a kernel driver should be used to:
Control access to files (password protection,hiding)
Allow accessing non-standard filesystems (like ext, reiserfs, zfs and etc.) and devices
True API hooks
...and for many other reasons
If you'd like to get know more, you can search for keyword "ring0" with your favorite search engine.
Others have explained the difference as the perspective of system level.
If you are doing development in C++, there are below differences in User mode development and kernel-mode development.
Unhandled exceptions crash the process in User mode, but in kernel mode, it crashes the whole system(face BSOD).
When the user-mode process terminates without free private memory, the system implicitly free process memory. But in kernel mode, remaining memory free after system boot.
The user-mode code is written and execute in PASSIVE_LEVEL. In kernel mode, there are more IRQL level.
Kernel code debugging done using separate machines. But you can debug user mode on same machine.
you can't use all C++ functionality in kernel-mode such as Exception handling and STL.
Entry points are different, in user mode, you use the main as the entry point. But in kernel mode, we need to use DriverEntry.
You can't use new operator in kernel mode, you need to overload it explictly.

How can a kernel be non preemptive and still have multiple control paths

In an operating systems course I took a while ago we were working on an old, non-preemptive kernel of Linux (2.4.X). However, we were told that there could be multiple control paths in the kernel simultaneously. Doesn't that contradict the non-preemptive nature of the kernel?
EDIT: I mean, there is no context switch inside the kernel. Last time I tried asking this question I got the response "well, the Linux kernel is preemptive, so there's no problem".
Within the 2.4 kernel, although kernel code could not be arbitrarily pre-empted by other kernel code, kernel code could still voluntarily give up the CPU by sleeping (this is obviously quite a common case).
In addition, kernel code could always be pre-empted by interrupt handlers (unless it specifically disabled interrupts), and the 2.4 kernel also supported SMP, allowing multiple CPUs to be executing within the kernel simultaneously.
The Linux kernel can run in interrupt context or in process (user) context. Process context means it is running on behalf of a process, which has called a syscall. Interrupt context means it is running on behalf of some kind of interrupt (hardware interrupt, softirq, ...).
When you talk about preemptive multitasking, it means the kernel can decide to preempt some task to run another task. When you talk about preemptive kernels, it means the kernel can decide to preempt itself running to run some other kernel code.
Now, before Linux was a preemptive kernel, you could run kernel code on several CPUs, and kernel code could be interrupted by hardware interrupts (which could end up running softirqs before returning,...). Preemptive kernels mean the kernel can also be preemptied by process context kernel code, to avoid long latencies (preemptive Linux came from the Linux realtime tree).
Of course, all of this is better explained in Rusty Russell's Unreliable Guide to Kernel Hacking and Unreliable Guide to Kernel Locking.
EDIT:
Or, trying to explain it better, when a task calls a syscall on a non-preemptive kernel, that task cannot be preemptied until the syscall ends (maybe with EINTR, but this could be a long time). A preemptive kernel allows that task to be preemptied, leading to lower average-case and worst-case latencies for other tasks waiting to run.
A non-preemptive kernel means that the kernel does not perform context switching on behalf of another process, or interrupt another running process. It can still be multi-processing by implementing cooperative multitasking where the actually running processes themselves yield control to the kernel or other processes. So yes, you can have multi-tasking and a non-preemptive kernel.
There is no context switching within the kernel for MONOLITHIC kernels, but of course there is still multitasking performed by the kernel....therefore you still have multi-tasking and non-premptiveness
The Linux kernel offloads a lot of work to kernel threads, which may be scheduled in and out alongside userspace tasks, independent of kernel preemption. Even your old 2.4 kernel has these kernel threads, albeit less of them than a modern 2.6 kernel. The 2.6 kernel now has several levels of preemption that can be chosen at compile time, but full preemption is not the default.

Resources