infinite loop in linux/windows kernel module - windows

In ubuntu10.04 linux kernel if I insmod a module which runs
while(1);
in init_module part, entire system stops.
However, if I load a sys file in Windows 7
which runs while(1); in DriverEntry part,
system gets slow but still works.
can someone explain me why two system differs
and what is happening inside kernel?...
I think in first case(infinite loop in init_module),
there is no reason the system stops. because
even if I make while(1); in init_module, it is running
in context of insmod user application program.
so the flow infinite loop has to be scheduled by hardware interrupt signal.
This is just my opinion, I want to know the details if I am wrong...

init_module() is a system call, it runs in kernel space and not in user space.
From what you have observed, it looks like the NT kernel performs module initialization in parallel, whereas the Linux kernel does it sequentially. It might have to do with their respective architectures, NT being a hybrid kernel and Linux being monolithic.

Adding to Frédéric's answer: on Windows the DriverEntry function runs at IRQL PASSIVE_LEVEL (same as virtually all user mode code, all if we exclude APCs). Which means that it can be interrupted by any code running at a higher IRQL at any point. So what you probably encounter here is that the thread that goes into the infinite loop is still being scheduled (thus consuming CPU time), but due to its (low) IRQL it isn't able to starve the system threads or much of the other code that is running. It will, however, be able to starve user mode threads. The effect can be anything from a slowdown to a perceived hanging system.

Related

Calling local_irq_disable() from the kernel also disable local interrupts in userspace?

From the kernel I can call local_irq_disable(). To my understanding it will disable the interrupts of the current CPU. And interrupts will remain disabled until I call local_irq_enable(). Please correct me if my understanding is incorrect.
If my understanding is correct, does it mean upon calling local_irq_disable() interrupt is also disabled for a process in the user space that is running on that same CPU?
More details:
I have a process running in the user space which I want to run without affected by interrupts and context switch. As it is not possible from the user space, I thought disabling interrupt and kernel preemption for a particular CPU from kernel will help in this case. Therefore, I wrote a simple device driver to disable kernel preemption and local interrupt by using the following code,
int i = irqs_disabled();
pr_info("before interrupt disable: %d\n", i);
pr_info("module is loaded on processor: %d\n", smp_processor_id());
id = get_cpu();
message[1] = smp_processor_id() + '0';
local_irq_disable();
printk(KERN_INFO " Current CPU id is %c\n", message[1]);
printk(KERN_INFO " local_irq_disable() called, Disable local interrupts\n");
pr_info("After interrupt disable: %d\n", irqs_disabled());
output: $dmesg
[22690.997561] before interrupt disable: 0
[22690.997564] Current CPU id is 1
[22690.997565] local_irq_disable() called, Disable local interrupts
[22690.997566] After interrupt disable: 1
I think the output confirms that local_irq_disable() does disable local interrupts.
After I disable the kernel preemption and interrupts, In the userspace I use CPU_SET() to pin my process into that particular CPU. But after doing all these I'm still not getting the desired outcome. So, it seems like disabling interrupt of a particular CPU from kernel also disable interrupts for a user space process running on that CPU is not true. I'm confused.
I was looking for an answer to the above question but could not get any suitable answer.
Duration of the CPU state with disabled interrupts should be short, because it affects the whole OS. For that reason allowing user space code to be run with disabled interrupts is considered as bad practice and is not supported by the Linux kernel.
It is responsibility of the kernel module to wrap by local_irq_disable / local_irq_enable only the kernel code. Sometimes the kernel itself could "fix" incorrect usage of these functions, but that fact shouldn't be relied upon when write a module.
I have a process running in the userspace which I want to run without affected by interrupts and context switch.
Protection from the context switch could be achieved by proper setting of scheduling policy, affinity and priority of the process. That way, the scheduler will never attempt to reschedule your process. There are several questions on Stack Overflow about making a CPU to be exclusive for a selected process.
As for interrupts, they shouldn't be disabled for a user code.
If user code accesses some hardware which should have interrupts disabled, then consider moving your code into the kernel space.
If even rare interrupts badly affect on the performance of your process or its timing, then try to reconfigure Linux kernel to be "more real time". There are also some boot-time configuration options, which could help in further reducing number of interrupts on a specific core(s). See e.g. that question: Why does using taskset to run a multi-threaded Linux program on a set of isolated cores cause all threads to run on one core?.
Note, that Linux kernel is not a base for real-time OS and never intended to be. So, if no configuration and boot settings could help you, consider to choose for your application another OS, which is real time.

What happens to program when it accesses kernel space

I have read about user space and kernel space and how a program's execution path can bring it from user space to kernel space, I suppose an example of this is if my program runs like this
Poco::Net::SocketAddress sender;
char buffer[64000];
.
.
.
socket.receiveFrom(buffer, sizeof(buffer), sender);
since this call requires accessing the network card, I think it should go into kernel space.
My question is:
What happens as the program makes the socket.receivefrom(...) call
Does the thread go to sleep and give up its core since it is going
to kernel space and only gets woken up when the char buffer has been
written
Does the thread go directly to kernel space and come back to user space after writing into the char buffer
No. The thread goes to execute kernel code, at kernel-permissions (ring 0 in an x86). The thread might go to "sleep" (i.e. the CPU might go and execute a different program, or will go idle, this depends on what the scheduler decides) inside the kernel. However, it might not go to sleep at all, if, for example, data is already available from the network card. From a user perspective, you know that when the call returns you have your data in the buffer, and you might expect the call to take a while.
It depends on the scheduler. You might get an interrupt at any time and execute something else. But generally, yes, you go to the kernel and back.

Linux (or other *nix): Attaching an interrupt to userspace

I'm trying to make sure that a unique user process executes as soon as possible after a particular hardware interrupt occurs.
One mechanism I'm aware of for doing this is to write a small kernel module that exports a device while sleeping inside the read handler. The module also registers an irq handler, which does nothing but wake the process. Then from the user's perspective, reads to that device block until the relevant interrupt occurs.
(1) On a modern CPU with a mainline kernel, can you reliably expect sub millisecond latency between the kernel seeing the interrupt and the user process regaining control with this?
(2) Are there any lower latency mechanisms on a mainline kernel?
Apply the PREEMPT_RT patch to the kernel and compile it configuring full preemptability through make menuconfig.
This will allow you to have threaded interrupts (i.e., interrupt handlers executed as kernel threads). Then, you can assign maximum priority (i.e., RT prio > 50) to your specific interrupt handler (check its PID using ps aux) and to your specific process, and a lower priority to anything else.

who is running kernel if cpu is running processes?

Suppose in a two process environment, one process is scheduled for execution by the kernel, and it demanded for some data which is not available in the RAM. So the cpu will indicate the kernel that something is not available and the process will be suspended. Then after kernel loads the second process for execution through the CPU and start investigating about the data in secondary memory location (say virtual memory) and gets it, puts it back to main memory by a swap to the memory data which is currently inactive, and puts the process back in the ready queue for execution.
We know that everything in computer system is get manipulated by CPU only and if CPU is busy executing continuously the process code then who is executing the kernel code to perform the tasks done by kernel?
Please let me know if i am able to explain the scenario.
At any point in time, CPU (/s) will be
Running a process in User Mode.
Running on behalf of a process in Kernel Mode to execute previleged instruction or access hardware (for example when system call read / write is issued).
Running in repsonse to a hardware interrupt. i.e. running in interrupt context. (Not associated with any process in particular) and yes in kernel mode.
Running some kernel threads to serve deferred work like soft irq. (Tasklet / Softirq)
Running CPU idle thread if nothing is there to execute.
If you are in particular asking about scheduling, then
Suppose a process is running and now it has issued a read call to retrieve data from hard disk, say, then process is removed from cpu and kernel invokes schedule() functions. So here, first process issues read system call, which results in switching from user mode to kernel mode. The kernel which is running on behalf of the process prepares for the hard disk read operation and then calls schedule() function
Suppose a hardware interrupt has come, then currently running process is removed, and interrupt service handler for that interrupt begins to execute in kernel mode (obviously).
Basically, kernel runs in between user processes !!
Clear now ?
Shash
The kernel runs either as a result of a hardware interrupt, or as a result of being invoked by a process to do something. In both cases the code which was executing at that moment stops running until the kernel finishes its job.
It is similar to a function call: when function A calls function B, function A has to wait until function B is done doing what it does, and returns control to function A. You do not need multiple CPUs, or any kind of magic to accomplish this.
The CPU is not continuously executing process code. The CPU is interrupted to perform various operations. Interrupts can occur for various reasons: a resource becomes available, a previous action completes, or simply a timer goes off.
I recommend this series of videos for more in-depth information: http://academicearth.org/courses/operating-systems-and-system-programming

How does Windows protect transition into kernel mode?

How does Windows protect against a user-mode thread from arbitrarily transitioning the CPU to kernel-mode?
I understand these things are true:
User-mode threads DO actually transition to kernel-mode when a system call is made through NTDLL.
The transition to kernel-mode is done through processor-specific instructions.
So what is special about these system calls through NTDLL? Why can't the user-mode thread fake-it and execute the processor-specific instructions to transition to kernel-mode? I know I'm missing some key piece of Windows architecture here...what is it?
You're probably thinking that thread running in user mode is calling into Ring 0, but that's not what's actually happening. The user mode thread is causing an exception that's caught by the Ring 0 code. The user mode thread is halted and the CPU switches to a kernel/ring 0 thread, which can then inspect the context (e.g., call stack and registers) of the user mode thread to figure out what to do. Before syscall, it really was an exception rather than a special exception specifically to invoke ring 0 code.
If you take the advice of the other responses and read the Intel manuals, you'll see syscall/sysenter don't take any parameters - the OS decides what happens. You can't call arbitrary code. WinNT uses function numbers that map to which kernel mode function the user mode code will execute (for example, NtOpenFile is fnc 75h on my Windows XP machine (the numbers change all the time; it's one of the jobs of NTDll is to map a function call to a fnc number, put it in EAX, point EDX to the incoming parameters then invoke sysenter).
Intel CPUs enforce security using what's called 'Protection Rings'.
There are 4 of these, numbered from 0 to 3. Code running in ring 0 has the highest privileges; it can (practically) do whatever it pleases with your computer. The code in ring 3, on the other hand, is always on a tight leash; it has only limited powers to influence things. And rings 1 and 2 are currently not used for any purpose at all.
A thread running in a higher privileged ring (such as ring 0) can transition to lower privilege ring (such as ring 1, 2 or 3) at will. However, the transition the other way around is strictly regulated. This is how the security of high privileged resources (such as memory) etc. is maintained.
Naturally, your user mode code (applications and all) runs in ring 3 while the OS's code runs in ring 0. This ensures that the user mode threads can't mess with the OS's data structures and other critical resources.
For details on how all this is actually implemented you could read this article. In addition, you may also want to go through Intel Manuals, especially Vol 1 and Vol 3A, which you can download here.
This is the story for Intel processors. I'm sure other architectures have something similar going on.
I think (I may be wrong) that the mechanism which it uses for transition is simple:
User-mode code executes a software interrupt
This (interrupt) causes a branch to a location specified in the interrupt descriptor table (IDT)
The thing that prevents user-mode code from usurping this is as follows: you need to be priviledged to write to the IDT; so only the kernel is able to specify what happens when an interrupt is executed.
Code running in User Mode (Ring 3) can't arbitrarily change to Kernel Mode (Ring 0). It can only do so using special routes -- jump gates, interrupts, and sysenter vectors. These routes are highly protected and input is scrubbed so that bad data can't (shouldn't) cause bad behavior.
All of this is set up by the kernel, usually on startup. It can only be configured in Kernel Mode so User-Mode code can't modify it.
It's probably fair to say that it does it in a (relatively) similar way to what Linux does. In both cases it's going to be CPU-specific, but on x86 probably either a software interrupt with the INT instruction, or via SYSENTER instruction.
The advantage of looking at how Linux does it is that you can do so without a Windows source licence.
The userspace source part is here here at LXR and the
kernel space bit - look at entry_32.S and entry_64.S
Under Linux on x86 there are three different mechanisms, int 0x80, syscall and sysenter.
A library which is built at runtime by the kernel called vdso is called by the C library to implement the syscall function, which uses a different mechanism depending on the CPU and which system call it is. The kernel then has handlers for those mechanisms (if they exist on the specific CPU variant).

Resources