Why does the Linux kernel need an idle thread? - linux-kernel

Rather than simply "doing nothing" when there is nothing to do (including on SMP), why does the Linux kernel run an idle thread?

When the scheduler decides to switch to the idle task, the dynamic tick machinery kicks in and disables the periodic tick until the next timer expires. The tick is re-enabled after that time span, or earlier if an interrupt occurs.
In the meantime, the CPU goes to a well-deserved sleep in an architecture-specific way, thereby saving power. Take a look at the definition of cpu_idle() in arch/x86/kernel/process.c.
/*
 * The idle thread. There's no useful work to be
 * done, so just try to conserve power and have a
 * low exit latency (ie sit in a loop waiting for
 * somebody to say that they'd like to reschedule)
 */
void cpu_idle(void)
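For reference, here is a simplified sketch of what that loop looked like in 2.6-era kernels (details vary by kernel version and architecture; pm_idle points at an architecture-specific low-power wait, e.g. the HLT instruction on x86):

void cpu_idle(void)
{
	/* endless idle loop with no priority at all */
	while (1) {
		tick_nohz_stop_sched_tick(1);    /* dynamic tick: stop the periodic tick */
		while (!need_resched())
			pm_idle();               /* arch-specific sleep, e.g. HLT */
		tick_nohz_restart_sched_tick();  /* tick back on before scheduling */
		preempt_enable_no_resched();
		schedule();                      /* hand the CPU back to real work */
		preempt_disable();
	}
}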

What do you mean by "do nothing"?
When the CPU is powered up, there is a rather long list of things that happen. Once powered up, the CPU cannot "do nothing": there is a voltage and a periodic clock signal, so it has to do something. You can power it down again and do absolutely nothing, but then you have to go through that long list of things again to get a steady clock signal when you need it.
So the idle thread is a thread that does the bare minimum. That is, if multiplying two floating-point numbers required the fewest cycles and the least circuitry, then the idle thread would be multiplying two floats all the time. In addition, as Wang said, the Linux kernel (in some configurations) monitors when cores start executing the idle thread, switches them to a lower frequency, and disables periodic OS housekeeping. This adds a bit of latency when the core is needed again, but in exchange much less power is used.

Related

What is the kernel timer system and how is it related to the scheduler?

I'm having a hard time understanding this.
How does the scheduler know that a certain period of time has passed?
Does it use some sort of syscall or interrupt for that?
What's the point of using the constant HZ instead of seconds?
What does the system timer have to do with the scheduler?
How does the scheduler know that a certain period of time has passed?
The scheduler consults the system clock.
Does it use some sort of syscall or interrupt for that?
Since the system clock is updated frequently, it suffices for the scheduler to just read its current value. The scheduler is already in kernel mode so there is no syscall interface involved in reading the clock.
Yes, there are timer interrupts that trigger an ISR, an interrupt service routine, which reads hardware registers and advances the current value of the system clock.
What's the point of using the constant HZ instead of seconds?
Once upon a time there was significant cost to invoking the ISR, and on each invocation it performed a certain amount of bookkeeping, such as checking whether the scheduler quantum had expired and firing TCP RTO retransmit timers. The hardware had limited flexibility and could only invoke the ISR at fixed intervals, e.g. every 10 ms if HZ is 100. Higher HZ values made it more likely that the ISR would run and find nothing to do, no events having occurred since the previous run, in which case the ISR was pure overhead, cycles stolen from a foreground user task. Lower HZ values hurt dispatch latency, leading to sluggish network and interactive response times. The HZ tuning tradeoff tended to wind up somewhere near 100 or 1000 on practical hardware. APIs that reported system clock time could only do so in units of ticks, where each ISR invocation advances the clock by one tick, so callers needed to know the value of HZ in order to convert from ticks to S.I. units. Modern systems perform network tasks on a separately scheduled TCP kernel thread, and may support tickless kernels, which discard many of these outdated assumptions.
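To illustrate that tick-to-seconds conversion from userland, here is a minimal sketch: times() reports elapsed time in clock ticks, and the caller must query the tick rate via sysconf(_SC_CLK_TCK) to turn ticks into seconds. (Note that on Linux the rate exposed to userland is a fixed USER_HZ of 100, independent of the kernel's internal HZ.)

#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>

int main(void)
{
	long hz = sysconf(_SC_CLK_TCK);  /* ticks per second, as seen by userland */
	struct tms t;
	clock_t ticks = times(&t);       /* elapsed time since an arbitrary point, in ticks */
	printf("%.2f s (%ld ticks at %ld ticks/s)\n",
	       (double)ticks / hz, (long)ticks, hz);
	return 0;
}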
What does the system timer have to do with the scheduler?
The scheduler runs when the system timer fires an interrupt.
The nature of a pre-emptive scheduler is that it can pause "spinning" usermode code, e.g. while (1) {}, and manipulate the run queue, even on a single-core system.
Additionally, the scheduler runs when a process voluntarily gives up its time slice, e.g. when issuing syscalls or taking page faults.

What exactly is CPU load if instructions are executed one at a time?

I know this question has been asked many times and in many different ways, but it's still not clear to me what the CPU load % means.
I'll start by explaining how I currently perceive the concepts (of course, I might be, and surely am, wrong):
A CPU core can only execute one instruction at a time. It will not execute the next instruction until it finishes executing the current one.
Suppose your box has a single CPU with a single core. Parallel computing is hence not possible. Your OS's scheduler will pick a process, set the IP to its entry point, and send that instruction to the CPU. The CPU won't move to the next instruction until it finishes executing the current one. After a certain amount of time the scheduler will switch to another process, and so on. But it will never switch to another process while the CPU is in the middle of executing an instruction; it will wait until the CPU becomes free. Since you only have a single core, you can't have two processes executing simultaneously.
I/O is expensive. Whenever a process wants to read a file from the disk, it has to wait until the disk accomplishes its task, and the process can't execute its next instruction until then. The CPU is not doing anything while the disk is working, so our OS will switch to another process until the disk finishes its job, in order not to waste time.
Following these principles, I've come myself to the conclusion that CPU load at a given time can only be one of the following two values:
0% - Idle. CPU is doing nothing at all.
100% - Busy. CPU is currently executing an instruction.
This is obviously false, as taskmgr reports CPU usage values of 1%, 12%, 15%, 50%, etc.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core (as reported by taskmgr)? While that process is executing, what happens with the other 99%?
What does it mean that the overall CPU usage is 19% (as reported by Rainmeter at the moment)?
If you look in the Task Manager on Windows, there is the System Idle Process, which does exactly that: it accounts for the cycles spent not doing anything useful. Yes, the CPU is always busy, but it might just be running in a loop waiting for useful things to come along.
Since you only have one single core, you can't have two processes
executing simultaneously.
This is not really true. Yes, true parallelism is not possible with a single core, but you can create the illusion of it with preemptive multitasking. Yes, it is impossible to interrupt an instruction mid-flight, but that is not a problem, because most instructions take a tiny amount of time to finish. The OS divides time into time slices, which are significantly longer than the execution time of a single instruction.
What does it mean that a given process, at a given time, is utilizing 1% of a given CPU core
Most of the time applications are not doing anything useful. Think of an application that waits for the user to click a button before it starts processing something. This app doesn't need the CPU, so it sleeps most of the time; every time it gets a time slice, it just goes back to sleep (see the event loop in Windows). GetMessage is blocking, meaning the thread sleeps until a message arrives. So what does CPU load really mean? When the app receives events or data, it performs operations instead of sleeping. Saying it utilizes X% of the CPU means that over some sampling period the app used X% of the available CPU time. CPU usage is an average metric.
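The classic Windows message loop illustrates this (shown here as the loop portion of a GUI thread): the thread blocks inside GetMessage until a message arrives, so an idle GUI application consumes essentially 0% CPU.

#include <windows.h>

MSG msg;
while (GetMessage(&msg, NULL, 0, 0) > 0) {  /* blocks; the thread sleeps here */
    TranslateMessage(&msg);
    DispatchMessage(&msg);                  /* hand the message to the window procedure */
}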
PS: To summarize the concept of CPU load, think of speed (in the physics sense). There are instantaneous and average speeds, and likewise for CPU load there are instantaneous and average measurements. The instantaneous value is always either 0% or 100%, because at any given point in time a process either uses the CPU or it doesn't. If a process used 100% of the CPU for 250 ms and then none for the next 750 ms, we can say that the process loaded the CPU to 25% over a sampling period of 1 second (an average measurement only makes sense with respect to some sampling period).
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.
This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
Also see: http://en.wikipedia.org/wiki/Load_(computing)

forced preemption on windows (occurs or not here)

By preemption I mean a forced context (process) switch applied to my process.
My question is:
If I write and run my own game program in such a way that it does 20 milliseconds of work, then sleeps for 5 milliseconds, and then runs the Windows message pump (PeekMessage/DispatchMessage) in a loop, again and again - is it ever forcibly preempted by Windows, or does this preemption not occur?
I suppose that this preemption would occur if I did not voluntarily give control back to the system via sleep or peek/dispatch for a longer amount of time. In this case, will it occur or not?
The short answer is: Yes, it can be, and it will be preempted.
Not only can driver events (interrupts) preempt your thread at any time; preemption may also happen due to a temporary priority boost, for example when a waitable object that a thread is blocked on becomes signalled, or when another window becomes the topmost window. Or another process might simply adjust its priority class.
There is no way (short of giving your process realtime priority, and that is a very bad idea - forget about it immediately) to guarantee that no "normal" thread will preempt you, and even then hardware interrupts will preempt you, and certain threads, such as those handling disk I/O and the mouse, will compete with you over time quanta. So even if you run with realtime priority (which is not truly "realtime"), you still have no guarantee, and you seriously interfere with important system services.
On top of that, sleeping for 5 milliseconds is imprecise at best, and unreliable otherwise.
Sleeping will make your thread ready on the next scheduler tick (ready does not mean "it will run"; it merely means that it may run, if and only if a time slice becomes available and no other ready thread is first in line). This effectively means that the amount of time you sleep is rounded up to the granularity of the system timer resolution (see the timeBeginPeriod function), plus some unknown extra time.
By default, the timer resolution is 15.6 ms, so your 5 ms will be 7.8 ms on average (assuming the best, uncontended case), but possibly a lot more. If you adjust the system timer resolution to 1 ms (often the lowest possible, though some systems allow 0.5 ms), it's somewhat better, but still neither precise nor reliable. Plus, making the scheduler run more often burns a considerable number of CPU cycles, and power, in servicing interrupts. Therefore, it is not generally advisable.
To make things even worse, you cannot even rely on Sleep's rounding mode, since Windows 2000/XP round differently from Windows Vista/7/8.
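For what it's worth, raising the timer resolution around a sleep looks like the sketch below, using the real winmm APIs; keep in mind that this increases interrupt load system-wide for as long as the request is held:

#include <windows.h>
#pragma comment(lib, "winmm.lib")   /* timeBeginPeriod/timeEndPeriod live in winmm */

int main(void)
{
    timeBeginPeriod(1);   /* request 1 ms scheduler tick resolution */
    Sleep(5);             /* now wakes after roughly 5-6 ms instead of ~15.6 ms */
    timeEndPeriod(1);     /* always pair with timeBeginPeriod */
    return 0;
}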
It can be interrupted by a driver at any time. The driver may signal another thread and then ask the OS to schedule/dispatch. The newly-ready thread may well run instead of yours.
Desktop OSes like Windows do not provide any real-time guarantees - they were not designed to.

How do you limit a process' CPU usage on Windows? (need code, not an app)

There are programs that are able to limit the CPU usage of processes on Windows, for example BES and ThreadMaster. I need to write my own program that does the same thing as these programs, but with different configuration capabilities. Does anybody know how the CPU throttling of a process is done (in code)? I'm not talking about setting the priority of a process, but rather about limiting its CPU usage to, for example, 15%, even if there are no other processes competing for CPU time.
Update: I need to be able to throttle any process that is already running and whose source code I don't have access to.
You probably want to run the process(es) in a job object, and set the maximum CPU usage for the job object with SetInformationJobObject, with JOBOBJECT_CPU_RATE_CONTROL_INFORMATION.
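A minimal sketch of that approach, assuming Windows 8 / Server 2012 or later (where CPU rate control exists) and an already-opened handle hProcess to the target; error handling omitted:

#include <windows.h>

void cap_process_at_15_percent(HANDLE hProcess)
{
    HANDLE job = CreateJobObjectW(NULL, NULL);
    AssignProcessToJobObject(job, hProcess);

    JOBOBJECT_CPU_RATE_CONTROL_INFORMATION info;
    ZeroMemory(&info, sizeof(info));
    info.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                        JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP;
    info.CpuRate = 15 * 100;  /* units are 1/100th of a percent: 1500 == 15% */
    SetInformationJobObject(job, JobObjectCpuRateControlInformation,
                            &info, sizeof(info));
}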
Very simplified, it could work somehow like this:
Create a periodic waitable timer with some reasonably small wait time (maybe 100 ms). Get a "last" value for each relevant process by calling GetProcessTimes once.
Loop forever, blocking on the timer. Each time you wake up:
- if GetProcessAffinityMask returns 0, call SetProcessAffinityMask(old_value). This means we suspended the process in our last iteration and are now giving it a chance to run again;
- otherwise, call GetProcessTimes to get the "current" value;
- call GetSystemTimeAsFileTime;
- calculate the delta by subtracting last from current;
- cpu_usage = (deltaKernelTime + deltaUserTime) / deltaTime;
- if that is more than you want, call old_value = GetProcessAffinityMask followed by SetProcessAffinityMask(0), which takes the process offline.
This is basically a very primitive version of the scheduler that runs in the kernel, implemented in userland. It puts a process "to sleep" for a small amount of time if it has used more CPU time than you deem right. A more sophisticated measurement, maybe over a second or 5 seconds, would be possible (and probably desirable). A code sketch of this loop follows below.
You might be tempted to suspend all threads in the process instead. However, it is important not to fiddle with priorities and not to use SuspendThread unless you know exactly what a program is doing, as this can easily lead to deadlocks and other nasty side effects. Think for example of suspending a thread holding a critical section while another thread is still running and trying to acquire the same object. Or imagine your process gets swapped out in the middle of suspending a dozen threads, leaving half of them running and the other half dead.
Setting the affinity mask to zero on the other hand simply means that from now on no single thread in the process gets any more time slices on any processor. Resetting the affinity gives -- atomically, at the same time -- all threads the possibility to run again.
Unfortunately, SetProcessAffinityMask does not return the old mask the way SetThreadAffinityMask does, at least according to the documentation, so an extra Get... call is necessary.
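Here is a minimal sketch of that loop, assuming hProcess is an open handle with the PROCESS_QUERY_INFORMATION and PROCESS_SET_INFORMATION rights, and taking the answer's premise at face value (some Windows versions may reject a zero affinity mask, so treat this as an illustration of the idea, not production code):

#include <windows.h>

/* FILETIME (100-ns units) to a plain 64-bit integer. */
static ULONGLONG ft2ull(FILETIME ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

void throttle(HANDLE hProcess, double limit)  /* limit: e.g. 0.15 for 15% */
{
    HANDLE timer = CreateWaitableTimerW(NULL, FALSE, NULL);
    LARGE_INTEGER due;
    due.QuadPart = -1000000LL;  /* first fire in 100 ms (negative = relative) */
    SetWaitableTimer(timer, &due, 100 /* ms period */, NULL, NULL, FALSE);

    FILETIME ftCreate, ftExit, ftKern, ftUser, ftNow;
    GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKern, &ftUser);
    GetSystemTimeAsFileTime(&ftNow);
    ULONGLONG lastCpu  = ft2ull(ftKern) + ft2ull(ftUser);
    ULONGLONG lastWall = ft2ull(ftNow);
    DWORD_PTR savedMask = 0, procMask, sysMask;

    for (;;) {
        WaitForSingleObject(timer, INFINITE);
        GetProcessAffinityMask(hProcess, &procMask, &sysMask);
        if (procMask == 0) {  /* we parked it last round: let it run again */
            SetProcessAffinityMask(hProcess, savedMask);
            continue;
        }
        GetProcessTimes(hProcess, &ftCreate, &ftExit, &ftKern, &ftUser);
        GetSystemTimeAsFileTime(&ftNow);
        ULONGLONG cpu  = ft2ull(ftKern) + ft2ull(ftUser);
        ULONGLONG wall = ft2ull(ftNow);
        double usage = (double)(cpu - lastCpu) / (double)(wall - lastWall);
        lastCpu  = cpu;
        lastWall = wall;
        if (usage > limit) {  /* over budget: take the process offline */
            savedMask = procMask;
            SetProcessAffinityMask(hProcess, 0);
        }
    }
}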
CPU usage is fairly simple to estimate using QueryProcessCycleTime. The machine's processor speed can be obtained from HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\<n>\~MHz (where <n> is the processor number; there is one entry for each processor present). With these values, you can estimate your process's CPU usage and yield the CPU as necessary, using Sleep() to keep your usage in bounds.

How to generate a ~100 kHz clock signal in a Linux kernel module with bit-banging?

I'm trying to generate a clock signal on a GPIO pin (ARM platform, mach-davinci, kernel 2.6.27) at something around 100 kHz, using a high-priority tasklet. The theory is simple: set the GPIO high, udelay for 5 us, set the GPIO low, wait another 5 us. But strange problems appear. First of all, I can't get this 5 us delay; that's fine, it looks like a hardware performance problem, so I moved to a period of 40 us (which gives ~25 kHz). The second problem is worse: once per ~10 ms, udelay waits 3x longer than usual. I'm thinking it's the heartbeat taking this time, but that is unacceptable from the point of view of the protocol which will be implemented on top of this. Is there any way to temporarily disable the heartbeat procedure, let's say for 500 ms? Or maybe I'm doing it wrong from the beginning? Any comments?
You cannot use a tasklet for this kind of job. Tasklets can be preempted by interrupts, and in some cases your tasklet may even be executed in process context!
If you absolutely have to do it this way, use an interrupt handler - get in, disable interrupts, do whatever you have to do and get out as fast as you can.
Generating the clock asynchronously in software is not the right thing to do. I can think of two alternatives that will work better:
Your processor may have a built-in clock generator peripheral that isn't already being used by the kernel or another driver. When you set one of these up, you tell it how fast to run its clock, and it just starts putting out the pulses on its own.
Get your processor's datasheet and study it.
You might not find a peripheral called a "clock" per se, but might find something similar that you can press into service, like a PWM peripheral.
The other device you are talking to may not actually require a regular clock. Some chips that need a "clock" line merely need a line that goes high when there is a bit to read, which then goes low while the data line(s) are changing. If this is the case, the 100 kHz thing you're reading isn't a hard requirement for a clock of exactly that frequency, it is just an upper limit on how fast the clock line (and thus the data line(s)) are allowed to transition.
With a CPU so much faster than the clock, you want to split this into two halves:
The "top half" sets the data line(s) state correctly, then brings the clock line up. Then it schedules the bottom half to run 5 μs later, using an interrupt or kernel timer.
In the "bottom half", called by the interrupt or timer, bring the clock line back down, then schedule the top half to run again 5 μs later.
Unless you can run your timer tasklet at a higher priority than the kernel timer, you will always be susceptible to this kind of jitter. Do you really have to do this by bit-banging? It would be far easier to use a hardware timer or PWM generator: configure the timer to run at your desired rate, set the pin to output, and you're done.
If you need software control over each bit period, you can try to work around the other tasks by setting your tasklet to run at a short period, say three-fourths of your 40 us delay. In the tasklet, disable interrupts and poll the clock until you reach the correct 40 us timeslot, set the I/O state, re-enable interrupts, and exit. But this effectively ties up 25% of your system in watching a clock.
