How does kernel code run on SMP machines - linux-kernel

How does the kernel code run on SMP machines? I know that module (driver) code can run on several processors/cores, but is this also true for the core kernel code?

Drivers are part of the kernel, whether they are modular or built in.
It is the scheduler that assigns tasks (processes/threads) to each CPU/core.
The scheduler is a single piece of kernel code, but it executes on every CPU and selects what runs next there: kernel threads, driver work, system calls made on behalf of processes, applications, and so on.
Every task is scheduled according to the scheduling algorithm in use.
It is the scheduler that decides which process runs on which CPU/core.
Example: consider a round-robin scheduler. It assigns a time slice to every process that enters the ready queue (RQ). If the scheduler finds an idle processor/core and there are processes in the RQ, it hands a process from the RQ to that core and starts a timer that raises an interrupt when the time slice expires. The interrupt handler invokes the scheduler again, which picks the next process from the RQ.
Thus, at any point in time all the processors can be kept busy running tasks, which achieves high throughput as long as there are enough tasks to run.
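The round-robin description above can be sketched as a small simulation. This is a toy model, not kernel code: the function name rr_smp_makespan, the single shared ready queue, and the one-tick time unit are all assumptions made for illustration. Real schedulers are considerably more elaborate.

```c
#define MAX_TASKS 16
#define MAX_CORES 8

/*
 * Toy round-robin over one shared ready queue (hypothetical model).
 * burst[t] is the CPU time task t needs, in ticks.
 * Returns the number of ticks until every task has finished, or -1 on bad input.
 */
int rr_smp_makespan(const int *burst, int ntasks, int ncores, int slice)
{
    int remaining[MAX_TASKS];
    int queue[MAX_TASKS];       /* circular FIFO of task ids */
    int head = 0, count = 0;
    int running[MAX_CORES];     /* task id on each core, -1 = idle */
    int left[MAX_CORES];        /* ticks left in the current slice */
    int unfinished = 0, ticks = 0;

    if (ntasks > MAX_TASKS || ncores < 1 || ncores > MAX_CORES || slice < 1)
        return -1;

    for (int t = 0; t < ntasks; t++) {
        remaining[t] = burst[t];
        if (burst[t] > 0) {
            queue[(head + count) % MAX_TASKS] = t;
            count++;
            unfinished++;
        }
    }
    for (int c = 0; c < ncores; c++)
        running[c] = -1;

    while (unfinished > 0) {
        /* Dispatch: every idle core takes the task at the head of the queue. */
        for (int c = 0; c < ncores; c++) {
            if (running[c] < 0 && count > 0) {
                running[c] = queue[head];
                head = (head + 1) % MAX_TASKS;
                count--;
                left[c] = slice;
            }
        }
        /* Run one tick on every busy core. */
        for (int c = 0; c < ncores; c++) {
            int t = running[c];
            if (t < 0)
                continue;
            remaining[t]--;
            left[c]--;
            if (remaining[t] == 0) {
                unfinished--;           /* task finished, core becomes idle */
                running[c] = -1;
            } else if (left[c] == 0) {
                /* Slice expired: back to the tail of the ready queue. */
                queue[(head + count) % MAX_TASKS] = t;
                count++;
                running[c] = -1;
            }
        }
        ticks++;
    }
    return ticks;
}
```

With four tasks of 4 ticks each and a slice of 2, this model needs 16 ticks on one core, 8 on two cores, and 4 on four cores, which is the throughput gain the answer describes.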

Related

OS thread scheduling and cpu usage relations

As far as I know, for thread scheduling Linux implements a fair scheduler and Windows implements round-robin (RR) scheduling: each thread has a time slice for its execution (correct me if I'm wrong).
I wonder: is CPU usage related to thread scheduling?
For example: there are 2 threads executing at the same time, and the system's time slice is 15 ms. The CPU has only 1 core.
Thread A needs 10 ms to finish its job and then sleeps for 5 ms, in a loop.
Thread B needs 5 ms to finish its job and then sleeps for 10 ms, also in a loop.
Will the CPU usage be 100%?
How is the thread scheduled? Will thread A use up all its time and then schedule out?
One more scenario:
If thread A is running and then gets blocked on some condition (e.g. network), will the CPU being at 100% affect the wakeup time of this thread? For example, thread B may be running in this time window; will thread B be preempted by the OS so that thread A can run?
As far as I know, Linux implements a fair scheduler and Windows implements round-robin (RR) scheduling for threads,
Both Linux and Windows use priority-based, preemptive thread schedulers. Fairness matters, but it is not, strictly speaking, the objective. Exactly how these schedulers work depends on the version and the flavor (client vs. server) of the system. Generally, thread schedulers are designed to maximize responsiveness and alleviate scheduling hazards such as priority inversion and starvation. Although some scheduling decisions are made in a round-robin fashion, there are situations in which the scheduler may insert the preempted thread at the front of the queue rather than at the back.
each thread has a time slice for its execution.
The time slice (or quantum) is really more like a guideline than a rule. It's important to understand that a time slice is divisible and equals some variable number of clock cycles. The scheduler charges CPU usage in terms of clock cycles, not time slices. A thread may run for more than a time slice (e.g., a time slice and a half). A thread may also voluntarily relinquish the rest of its time slice. The only way for a thread to do that is by performing a system call (sleep, yield, acquire a lock, request synchronous I/O). All of these are privileged operations that cannot be performed in user mode (otherwise, a thread could go to sleep without telling the OS!). Because the call goes through the kernel, the scheduler can change the state of the thread from "running" to "waiting" and schedule some other ready thread to run. If a thread relinquishes the rest of its time slice, it will not be compensated the next time it is scheduled to run.
One particularly interesting case is when a hardware interrupt occurs while a thread is running. The processor automatically switches to the interrupt handler, forcibly preempting the thread even if its time slice has not finished yet. The thread is not charged for the time it takes to handle the interrupt, even though the interrupt handler is indeed using the CPU. By the way, the overhead of context switching itself is also not charged towards any time slice. Moreover, on Windows, whether a thread is running in user mode or kernel mode does not by itself affect its priority or time slice. On Linux, the scheduler is invoked at specific places in the kernel to avoid starvation (kernel preemption was implemented in Linux 2.5+).
So the CPU usage will be 100%? And how is the thread scheduled? Will
thread A use up all its time and then schedule out?
It's easy to answer these questions now. When a thread goes to sleep, the other gets scheduled. Note that this happens even if the threads have different priorities.
If I have a thread running that gets blocked on some condition (e.g. network), will the CPU being at 100% affect the wakeup time of this thread? For example, another thread may be running in its time window and will not be scheduled out by the OS?
Linux and Windows schedulers implement techniques that let threads waiting on I/O operations "wake up quickly" and get a higher chance of being scheduled soon. For example, on Windows, the priority of a thread waiting on an I/O operation may be boosted a bit when the I/O operation completes. This means that it can preempt another running thread before that thread finishes its time slice, even if both threads had the same priority initially. The boost is temporary: it decays as the woken thread uses up its time slices, until the thread is back at its original (base) priority.
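To make the effect of such a boost concrete, here is a minimal sketch of a priority-based pick with a temporary bonus. It is a toy model only: the struct toy_thread, its fields, and pick_next are hypothetical names, and this is not how Windows or Linux actually compute or decay boosts.

```c
/* Hypothetical thread descriptor for illustration only. */
struct toy_thread {
    int base_prio;  /* higher value = more urgent */
    int boost;      /* temporary bonus, e.g. granted when I/O completes */
    int ready;      /* nonzero if the thread is runnable */
};

/*
 * Return the index of the ready thread with the highest effective
 * priority (base_prio + boost), or -1 if no thread is ready.
 * Ties go to the lowest index.
 */
int pick_next(const struct toy_thread *th, int n)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (!th[i].ready)
            continue;
        if (best < 0 ||
            th[i].base_prio + th[i].boost >
            th[best].base_prio + th[best].boost)
            best = i;
    }
    return best;
}
```

In this model, a thread that becomes ready with a boost of 1 is picked ahead of a running thread with the same base priority; once the boost is reset to 0, the two compete on equal terms again.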
So the CPU usage will be 100%?
Ideally speaking, the answer is yes; by "ideally" I mean ignoring the time spent performing context switches. In practice, CPU utilization is maximized by keeping the CPU busy all of the time, but some time is still lost to context switches (the time it takes to switch from one process or thread to another).
But I would say that in your case the timing of both threads is aligned perfectly for maximum CPU utilization.
And how is the thread scheduled? Will thread A use up all its time and
then schedule out?
Well, it really depends. In most modern operating system implementations, if there is another process in the ready queue, the current process is scheduled out as soon as it is done with the CPU, regardless of whether it still has time quantum left. So yes, if you are considering a modern OS design, thread A is scheduled out right after 10 ms.
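The scenario from the question can be checked with a small single-core simulation. This is a sketch under simplifying assumptions: time advances in 1 ms steps, context switches cost nothing, a running thread keeps the CPU until it goes to sleep, and work_ms must be greater than zero. The names sim_thread and simulate_busy_ms are made up for this example.

```c
/* Hypothetical model of a thread that loops: work, then sleep. */
struct sim_thread {
    int work_ms;     /* CPU time needed per loop iteration (must be > 0) */
    int sleep_ms;    /* sleep time after each iteration */
    int run_left;    /* CPU time left in the current iteration */
    int sleep_left;  /* > 0 while the thread is sleeping */
};

/* Simulate one core for total_ms milliseconds; return how many were busy. */
int simulate_busy_ms(struct sim_thread *th, int n, int total_ms)
{
    int busy = 0;
    int current = -1;   /* index of the running thread, -1 = core idle */

    for (int i = 0; i < n; i++) {
        th[i].run_left = th[i].work_ms;
        th[i].sleep_left = 0;
    }

    for (int now = 0; now < total_ms; now++) {
        /* Keep the current thread if it is still runnable,
           otherwise pick the first ready thread (if any). */
        if (current < 0 || th[current].sleep_left > 0) {
            current = -1;
            for (int i = 0; i < n; i++) {
                if (th[i].sleep_left == 0) {
                    current = i;
                    break;
                }
            }
        }
        /* Sleep timers of the other threads advance by one millisecond. */
        for (int i = 0; i < n; i++) {
            if (i != current && th[i].sleep_left > 0)
                th[i].sleep_left--;
        }
        /* The chosen thread, if any, runs for this millisecond. */
        if (current >= 0) {
            busy++;
            if (--th[current].run_left == 0) {
                /* Iteration done: go to sleep and reset the work counter. */
                th[current].sleep_left = th[current].sleep_ms;
                th[current].run_left = th[current].work_ms;
            }
        }
    }
    return busy;
}
```

For thread A (10 ms work, 5 ms sleep) and thread B (5 ms work, 10 ms sleep), this model reports 30 busy milliseconds out of 30, because each thread's work fits exactly into the other's sleep. With thread A alone it reports 20 out of 30.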

In Windows, what does the CPU do while blocking?

One has blocking calls whenever the CPU is waiting for some system to respond, e.g. waiting for an internet request. Is the CPU literally wasting time during these calls? (I don't know whether there are machine instructions other than no-op that would correspond to the CPU literally wasting time.) If not, what is it doing?
The thread is simply skipped when the operating system scheduler looks for work to hand off to a core. Very commonly, the outcome is that nothing needs to be done; the processor core then executes the HLT instruction.
In the HALT state it consumes (almost) no power. An interrupt is required to bring it back to life. Most typically that will be the clock interrupt, which ticks 64 times per second by default. It could also be a device interrupt. The scheduler then again looks for work to do. Rinse and repeat.
Basically, the kernel maintains run queues or something similar to schedule threads. Each thread receives a time slice in which it gets to execute until the slice expires or the thread voluntarily yields it. When a thread yields or its slice expires, the scheduler decides which thread gets to execute next.
A blocking system call results in a yield. It also results in the thread being removed from the run queue and placed in a sleep/suspend queue, where it is not eligible to receive time slices. It remains in the sleep/suspend queue until some criterion is met (e.g. timer tick, data available on a socket, etc.). Once the criterion is met, it is placed back into the run queue.
Sleep(1); // Yield, install a timer, and place the thread in a sleep queue.
As long as there are tasks in any of the run queues (there may be more than one, commonly one per processor core), the scheduler will keep handing out time slices. Depending on scheduler design and hardware constraints, these time slices may vary in length.
When there are no tasks in the run queue, the core can enter a power-saving state until an interrupt is received.
In essence, the processor never wastes time. It is either executing other threads, servicing interrupts, or in a power-saving state (even for very short durations).
While a thread is blocked, especially if it is blocked on an efficient wait object that puts the blocked thread to sleep, the CPU is busy servicing other threads in the system. If there are no application threads running, there are always system threads running. The CPU is never truly idle.

Instruct win32 threads to run on a single processor core

I have a test program which would be much simpler if it could rely on threads being scheduled in strict priority order on Windows. I'm seeing a low priority thread running alongside higher priority threads and wonder if this is happening because the different threads are being scheduled on different processor cores.
Is there a way to force all Win32 threads in a process to use a single processor core? SetThreadAffinityMask looks like it might be interesting but its docs aren't entirely clear and I'm not sure how to use it.
SetThreadAffinityMask function: Sets a processor affinity mask for the specified thread.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686247%28v=vs.85%29.aspx
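SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << CoreNumber);
This restricts the current thread to the single core whose zero-based index is CoreNumber. The cast to DWORD_PTR keeps the shift from overflowing a 32-bit int when CoreNumber is 31 or higher.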
Even if you force all threads onto one virtual processor, you will still often have low-priority threads running while high-priority threads wait for them (priority inversion). Once a thread is scheduled by the Windows scheduler, it runs until it is either preempted or goes to sleep (through sleep or some other sleep-inducing system call). You will have to change the design of your application so that it no longer assumes that no low-priority thread runs while a high-priority thread is ready to run.

How does a cpumask affect scheduling of other processes in the linux kernel?

I'm using a Linux 2.6.x kernel on my machine, which has Ubuntu installed (Ubuntu is just mentioned in case this changes anything). The kernel runs on a machine that has 8 cores. The machine also runs OpenVZ, but I don't think this changes the context of the question.
I have software installed that is only allowed to use two CPUs, and it sets a hard CPU affinity to the first two CPUs (cpumask 3). I'm asking myself how the scheduling of the other processes is affected by this. I think I read something about it, but I assume for now that processes are likely to be attached to the first CPUs, and that the kernel tries to always keep a process on the same CPU to avoid cache invalidation.
There are quite a few processes running on the machine. How does the kernel handle this situation? Can the processes with hard CPU affinity run slower because they are bound to a crowded zone? How does the kernel take the hard affinity into account?
What will happen in the long run is that the load balancing code of the scheduler will move more of the unbound tasks to the rest of the CPUs to account for this task being bound to the first two.
The way it works is that each task starts on the CPU where it was created, and at the micro level the Linux task scheduler makes scheduling decisions on each CPU without regard for the others. But then there is the more macro-level process migration load balancing code that will step in and say: "the run queue (list of processes waiting to be scheduled) on this CPU is longer than on that CPU, let's move some over to balance the loads".
Of course, since your specific task is bound to the first two CPUs, the load balancer will pick other tasks to move. So in the long run your bound task will "push out" enough of the other, unbound tasks to the other CPUs, and balance will be preserved.
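The migration idea described above can be illustrated with a toy balancer. This is only a sketch under strong assumptions: load is simply the number of tasks on a CPU, one task is moved at a time from the busiest to the idlest CPU, and the balancer gives up as soon as the busiest CPU has no task allowed on the idlest one. The names toy_task and toy_balance are hypothetical, and the real Linux load balancer is far more sophisticated.

```c
#define MAX_CPUS 32

/* Hypothetical task descriptor for illustration only. */
struct toy_task {
    unsigned allowed;  /* affinity mask: bit c set = task may run on CPU c */
    int cpu;           /* CPU whose run queue currently holds the task */
};

/*
 * Move tasks one at a time from the busiest CPU to the idlest CPU,
 * respecting each task's affinity mask, until the two differ by less
 * than 2 tasks. Returns the number of migrations, or -1 on bad input.
 */
int toy_balance(struct toy_task *tasks, int ntasks, int ncpu)
{
    int moves = 0;

    if (ncpu < 1 || ncpu > MAX_CPUS)
        return -1;

    for (;;) {
        int load[MAX_CPUS] = {0};
        int busiest = 0, idlest = 0, victim = -1;

        for (int i = 0; i < ntasks; i++)
            load[tasks[i].cpu]++;
        for (int c = 1; c < ncpu; c++) {
            if (load[c] > load[busiest])
                busiest = c;
            if (load[c] < load[idlest])
                idlest = c;
        }
        if (load[busiest] - load[idlest] < 2)
            break;      /* balanced enough */

        /* Find a task on the busiest CPU that may run on the idlest one. */
        for (int i = 0; i < ntasks; i++) {
            if (tasks[i].cpu == busiest &&
                ((tasks[i].allowed >> idlest) & 1u)) {
                victim = i;
                break;
            }
        }
        if (victim < 0)
            break;      /* nothing on the busiest CPU is allowed to move */

        tasks[victim].cpu = idlest;
        moves++;
    }
    return moves;
}
```

For one task with mask 3 and eight unbound tasks (mask 0xFF) that all start on CPU 0 of an 8-CPU machine, the model performs 7 migrations. The bound task ends up on CPU 0 or 1, while the unbound tasks spread across all eight CPUs.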

What exactly happens when sleeping a thread

I was wondering how the task scheduler in the operating system handles sleeping threads.
By this I mean whether a sleeping thread is still checked by the scheduler, or just skipped entirely when figuring out which thread to activate for the next 10 ms or however long it's given.
My reason for asking is to figure out whether a sleeping thread consumes CPU cycles (albeit very few).
So does anyone know what happens?
And do you know whether it differs between Windows and Linux?
A thread runs when the CPU is executing instructions for that thread. The scheduler hands the CPU to runnable threads. A sleeping thread is just an entry in the scheduler's internal tables; that thread consumes no CPU by itself, since the scheduler knows that the thread is not runnable and thus does not give it the CPU. The entry conceptually contains the time at which the thread is to be awakened.
A sleeping thread may have an indirect cost, in management time by the scheduler itself. This really depends on the structures and algorithms employed by the scheduler; the Linux kernel scheduler is rumored to be very good at managing thousands of sleeping threads without taking too much time to decide which thread to run. Some other operating systems do not fare as well, but as a rule of thumb this effect is negligible when the total number of threads/processes is less than a thousand.
It depends on the OS implementation, but usually there is a "schedulable thread" data structure to keep things more efficient.
But some housekeeping task probably has to look at the list of all existing threads occasionally, even if not at every scheduling cycle.
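The claim that a sleeping thread consumes essentially no CPU can be observed directly on a POSIX system by comparing wall-clock time with the calling thread's CPU time across a sleep. The sketch below assumes a platform that provides clock_gettime with CLOCK_MONOTONIC and CLOCK_THREAD_CPUTIME_ID (Linux does); the helper names to_ns and measure_sleep are made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static long long to_ns(struct timespec ts)
{
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep for sleep_ms milliseconds and report how much wall-clock time
 * passed and how much CPU time the calling thread was charged.
 * Returns 0 on success, -1 on error.
 */
int measure_sleep(long sleep_ms, long long *wall_ns, long long *cpu_ns)
{
    struct timespec w0, w1, c0, c1;
    struct timespec req;

    req.tv_sec = sleep_ms / 1000;
    req.tv_nsec = (sleep_ms % 1000) * 1000000L;

    if (clock_gettime(CLOCK_MONOTONIC, &w0) != 0 ||
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0) != 0)
        return -1;

    /* If a signal interrupts the sleep, continue with the remaining time. */
    while (nanosleep(&req, &req) != 0) {
        if (errno != EINTR)
            return -1;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &w1) != 0 ||
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1) != 0)
        return -1;

    *wall_ns = to_ns(w1) - to_ns(w0);
    *cpu_ns = to_ns(c1) - to_ns(c0);
    return 0;
}
```

Calling measure_sleep(100, &wall, &cpu) should typically show about 100 ms of wall-clock time but only a tiny amount of CPU time, the small remainder being the cost of the system calls themselves.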

Resources